Lecture 4: Neural Networks and Backpropagation


Fei-Fei Li, Justin Johnson, and Serena Yeung. April 11, 2019.

Administrative
- Assignment 1 is due Wednesday, April 17 at 11:59pm. If you are using Google Cloud, you do not need GPUs for this assignment; Google Cloud coupons will be distributed by this weekend.
- Alternate midterm time: see Piazza for the request form and fill it out by 4/25 (two weeks from today).
- The project proposal is due 4/24.
- Discussion section tomorrow (1:20pm in Gates B03): how to pick a project and how to read a paper.

Where we are
So far we have a linear score function, an SVM (or softmax) loss on the scores, and a total loss consisting of data loss plus regularization. The remaining question: how do we find the best W?

Finding the best W: optimize with gradient descent
- Numerical gradient: slow and approximate, but easy to write.
- Analytic gradient: fast and exact, but error-prone.
In practice: derive the analytic gradient, then check your implementation against the numerical gradient (a short gradient-check sketch appears at the end of this part).

Problem: linear classifiers are not very powerful
- Visual viewpoint: a linear classifier learns only one template per class.
- Geometric viewpoint: a linear classifier can only draw linear decision boundaries.

One solution: feature transformation
Transform the data with a cleverly chosen feature transform f, then apply a linear classifier in the new space. For example, mapping to polar coordinates, f(x, y) = (r(x, y), θ(x, y)), can make data separable that was not linearly separable in the original coordinates (a toy sketch appears at the end of this part). Classic hand-designed image features include color histograms and histograms of oriented gradients (HoG).

Image features vs. ConvNets
With hand-designed features, a fixed feature extractor feeds a trainable classifier that outputs 10 numbers giving scores for the classes; with convolutional networks, the feature extractor and the classifier are trained jointly, end to end (Krizhevsky, Sutskever, and Hinton, "ImageNet classification with deep convolutional neural networks", NIPS 2012; figure copyright the authors, reproduced with permission).
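The advice to check the analytic gradient against the numerical one can be made concrete in a few lines of numpy. This is a minimal sketch and not code from the slides; the toy loss, the array sizes, the centered-difference step h, and the relative-error formula are illustrative choices.

```python
import numpy as np

def numerical_gradient(f, W, h=1e-5):
    """Centered-difference estimate of df/dW, one coordinate at a time."""
    grad = np.zeros_like(W)
    it = np.nditer(W, flags=['multi_index'])
    while not it.finished:
        idx = it.multi_index
        old = W[idx]
        W[idx] = old + h; f_plus = f(W)    # f(W + h * e_idx)
        W[idx] = old - h; f_minus = f(W)   # f(W - h * e_idx)
        W[idx] = old                        # restore the entry
        grad[idx] = (f_plus - f_minus) / (2 * h)
        it.iternext()
    return grad

# Toy example: loss(W) = 0.5 * ||W x||^2 has analytic gradient (W x) x^T.
x = np.random.randn(100)
W = np.random.randn(10, 100) * 1e-2
loss = lambda W: 0.5 * np.sum((W @ x) ** 2)

analytic = np.outer(W @ x, x)
numeric = numerical_gradient(loss, W)

denom = np.maximum(np.abs(analytic), np.abs(numeric)) + 1e-12
print("max relative error:", np.max(np.abs(analytic - numeric) / denom))  # ~1e-7 or smaller
```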
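As a toy illustration of the feature-transform idea (my own example, not the figure from the slides): points on two concentric rings are not linearly separable in (x, y), but after the polar transform f(x, y) = (r, θ) a simple threshold on the radius separates them perfectly.

```python
import numpy as np

rng = np.random.default_rng(0)

def ring(radius, n=200):
    """Sample n points scattered around a circle of the given radius."""
    theta = rng.uniform(0, 2 * np.pi, n)
    r = radius + 0.1 * rng.standard_normal(n)
    return np.stack([r * np.cos(theta), r * np.sin(theta)], axis=1)

inner, outer = ring(1.0), ring(3.0)          # class 0 and class 1, in (x, y)

def polar(points):
    """Feature transform f(x, y) = (r, theta)."""
    x, y = points[:, 0], points[:, 1]
    return np.stack([np.hypot(x, y), np.arctan2(y, x)], axis=1)

# In the transformed space, a linear rule on r alone separates the classes.
threshold = 2.0
acc = np.mean(np.concatenate([polar(inner)[:, 0] < threshold,
                              polar(outer)[:, 0] >= threshold]))
print("accuracy of the linear rule in feature space:", acc)   # 1.0
```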
Today: Neural Networks

Neural networks: without the brain stuff
Before, the score function was linear: f = W x. Now, a 2-layer neural network computes f = W2 max(0, W1 x), and a 3-layer network computes f = W3 max(0, W2 max(0, W1 x)). (In practice we will usually also add a learnable bias at each layer.) "Neural network" is a very broad term; these models are more accurately called "fully-connected networks" or sometimes "multi-layer perceptrons" (MLPs).

For a CIFAR-10 image, the input x has 3072 entries, W1 maps it to a hidden vector h with, say, 100 entries, and W2 maps h to the 10 class scores s.

The function max(0, ·) is called the activation function.
Q: What if we try to build a neural network without one?
A: We end up with a linear classifier again, since W2 (W1 x) = (W2 W1) x is just another linear map.

Activation functions
Common choices include sigmoid, tanh, ReLU, Leaky ReLU, Maxout, and ELU (see the sketch below). ReLU is a good default choice for most problems.

Neural networks: architectures
A "2-layer neural net" (also called a "1-hidden-layer neural net") and a "3-layer neural net" (a "2-hidden-layer neural net") are built from "fully-connected" layers, in which every unit in one layer connects to every unit in the next.

Example feed-forward computation of a neural network: stack matrix multiplies and elementwise activation functions as above. A full implementation of training a 2-layer neural network needs only about 20 lines of code (a sketch in that spirit follows the activation-function definitions below).

Setting the number of layers and their sizes: more neurons mean more capacity.
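The activation functions named above are simple elementwise nonlinearities, except Maxout, which takes a max over several linear functions. A minimal sketch of the standard definitions (the default slopes for Leaky ReLU and ELU are illustrative; the code is not from the slides):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))            # squashes to (0, 1)

def tanh(x):
    return np.tanh(x)                           # squashes to (-1, 1), zero-centered

def relu(x):
    return np.maximum(0.0, x)                   # max(0, x); the usual default

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)        # small slope instead of zero for x < 0

def elu(x, alpha=1.0):
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

# Maxout is not elementwise: it takes max(w1.x + b1, w2.x + b2) over two or more
# separate linear functions, so it needs its own extra set of weights.
```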
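The slide's roughly 20-line implementation is not reproduced in this extraction. The sketch below is a reconstruction in the same spirit rather than the slide's exact code: a 2-layer network with a sigmoid hidden layer, an L2 loss on random data, hand-derived gradients, and plain gradient descent. All sizes, the weight scaling, and the learning rate are illustrative.

```python
import numpy as np
from numpy.random import randn

# Random "data": N examples of dimension D_in, targets of dimension D_out.
N, D_in, H, D_out = 64, 1000, 100, 10
x, y = randn(N, D_in), randn(N, D_out)
w1, w2 = randn(D_in, H) * 1e-2, randn(H, D_out) * 1e-2

for t in range(2000):
    # Forward pass: sigmoid hidden layer, then linear outputs.
    h = 1.0 / (1.0 + np.exp(-x.dot(w1)))
    y_pred = h.dot(w2)
    loss = np.square(y_pred - y).sum()

    # Backward pass: hand-derived gradients of the L2 loss w.r.t. w2 and w1.
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h.T.dot(grad_y_pred)
    grad_h = grad_y_pred.dot(w2.T)
    grad_w1 = x.T.dot(grad_h * h * (1.0 - h))

    # Gradient descent step.
    w1 -= 1e-4 * grad_w1
    w2 -= 1e-4 * grad_w2

print("final loss:", loss)
```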
Do not use the size of the neural network as a regularizer; use stronger regularization instead. (Web demo with ConvNetJS: http://cs.stanford.edu/people/karpathy/convnetjs/demo/classify2d.html)

The brain analogy
In a biological neuron, impulses are carried toward the cell body by the dendrites and away from the cell body along the axon to the presynaptic terminals. In the crude mathematical analogy, a unit sums its weighted inputs and passes the result through an activation function such as a sigmoid. (Neuron photo by Fotis Bobolas, CC-BY 2.0; neuron drawing by Felipe Perucho, CC-BY 3.0.)

Biological neurons have complex connectivity patterns; neurons in an artificial neural network are organized into regular layers for computational efficiency. That said, neural networks with random connections can work too (Xie et al., "Exploring Randomly Wired Neural Networks for Image Recognition", arXiv 2019).

Be very careful with your brain analogies. Biological neurons come in many different types; dendrites can perform complex non-linear computations; synapses are not a single weight but a complex non-linear dynamical system; and a rate code may not be adequate [Dendritic Computation, London and Häusser].

Problem: how to compute gradients?
We have a nonlinear score function, an SVM loss on its predictions, a regularization term, and a total loss equal to data loss plus regularization (written out in the sketch below). If we can compute the gradients of the loss with respect to W1 and W2, then we can learn W1 and W2.

(Bad) idea: derive the gradients on paper
- Very tedious: lots of matrix calculus and lots of paper.
- What if we want to change the loss, e.g. use softmax instead of SVM? We would need to re-derive everything from scratch.
- Not feasible for very complex models.

Better idea: computational graphs + backpropagation
Represent the computation as a graph: x and W feed a multiply node that produces the scores s; the scores feed the hinge loss; the hinge (data) loss and the regularization term R(W) are added to produce the total loss L. The same machinery scales from this tiny graph to a convolutional network such as AlexNet (input image, weights, loss) and to something as elaborate as a Neural Turing Machine, whose unrolled computational graph is far too large to differentiate by hand. (AlexNet figure copyright Krizhevsky, Sutskever, and Hinton, 2012, reproduced with permission; Neural Turing Machine figure reproduced with permission from a Twitter post by Andrej Karpathy.)
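The equations referenced here did not survive the text extraction. As a hedged reconstruction using the standard setup from earlier lectures (two-layer scores, multiclass SVM loss, L2 regularization), the objective whose gradients we need is:

```latex
s = f(x_i; W_1, W_2) = W_2 \max(0,\, W_1 x_i)                        % nonlinear score function

L_i = \sum_{j \neq y_i} \max(0,\; s_j - s_{y_i} + 1)                  % SVM (hinge) loss on the predictions

L = \frac{1}{N} \sum_{i=1}^{N} L_i + \lambda \sum_k \lVert W_k \rVert_2^2   % total loss = data loss + regularization
```

Learning then requires the gradients of L with respect to W_1 and W_2, which is exactly what backpropagation on the computational graph computes for us.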
Backpropagation: a simple example
Work through a small function of three inputs, evaluated at x = -2, y = 5, z = -4. We want the gradient of the output with respect to each input, i.e. ∂f/∂x, ∂f/∂y, ∂f/∂z, computed by applying the chain rule backward through the computational graph, one node at a time (see the worked sketch below).
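The function being differentiated was lost in this extraction; the example classically used with the values x = -2, y = 5, z = -4 is f(x, y, z) = (x + y) z, and the sketch below assumes that choice. The forward pass computes the value through simple nodes; the backward pass multiplies local gradients together via the chain rule in reverse order.

```python
# Backpropagation on f(x, y, z) = (x + y) * z at x = -2, y = 5, z = -4.
x, y, z = -2.0, 5.0, -4.0

# Forward pass: break the computation into simple nodes.
q = x + y          # q = 3
f = q * z          # f = -12

# Backward pass: start from df/df = 1 and apply the chain rule node by node.
df_df = 1.0
df_dq = z * df_df              # multiply gate: d(q*z)/dq = z   -> -4
df_dz = q * df_df              # multiply gate: d(q*z)/dz = q   ->  3
df_dx = 1.0 * df_dq            # add gate routes the gradient   -> -4
df_dy = 1.0 * df_dq            # add gate routes the gradient   -> -4

print(f, df_dx, df_dy, df_dz)  # -12.0 -4.0 -4.0 3.0
```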