Chapter 11. Recurrent Networks


Table of Contents

CHAPTER XI - TRAINING AND USING RECURRENT NETWORKS
1. INTRODUCTION
2. SIMPLE RECURRENT TOPOLOGIES
3. ADAPTING THE FEEDBACK PARAMETER
4. UNFOLDING RECURRENT NETWORKS IN TIME
5. THE DISTRIBUTED TLFN TOPOLOGY
6. DYNAMICAL SYSTEMS
7. RECURRENT NEURAL NETWORKS
8. LEARNING RULES FOR RECURRENT SYSTEMS
9. APPLICATIONS OF DYNAMIC NETWORKS TO SYSTEM IDENTIFICATION AND CONTROL
10. HOPFIELD NETWORKS
11. GROSSBERG’S ADDITIVE MODEL
12. BEYOND FIRST ORDER DYNAMICS: FREEMAN’S MODEL
13. CONCLUSIONS
BACKPROPAGATION VERSUS BPTT
VECTOR SPACE INTERPRETATION OF TLFNS
ADVANTAGE OF LINEAR MEMORY PES
TRAINING FOCUSED TLFNS
TRAINING THE GAMMA FILTER
TRAINING ALTERNATE MEMORIES
TLFN ARCHITECTURES
DYNAMIC BACKPROPAGATION
DERIVATION OF ENERGY FUNCTION
FULLY RECURRENT
TLRN
TRAJECTORY
FIXED POINT
HOPFIELD
EQ.1
EQ.3
UNFOLDING
EQ.6
EQ.8
EQ.9
EQ.10
EQ.12
EQ.4
EQ.15
EQ.11
EQ.24
EQ.5
ATTRACTOR
DKFL
FREEMAN
LUIS ALMEIDA
EQ.46
EQ.48
EQ.22
EQ.25
EQ.31
EQ.34
EQ.33
EQ.38
EQ.45
LEE GILES
EQ.35
EQ.32
NARENDRA
WAN
EQ.14
EQ.31
BENGIO
FELDKAMP

Chapter XI - Training and Using Recurrent Networks
Version 2.0

This Chapter is Part of: Neural and Adaptive Systems: Fundamentals Through Simulation© by Jose C. Principe, Neil R. Euliano, W. Curt Lefebvre

Copyright 1997 Principe

The goal of this chapter is to introduce the following concepts:
• Why backpropagation cannot train recurrent systems.
• Develop the backpropagation through time algorithm.
• Introduce and train distributed TLFNs.
• Provide the basic theory to study the gamma model.
• Introduce and train fully recurrent networks.
• Explain fixed point learning.