Learning in Large Scale Spiking Neural Networks

Learning in large-scale spiking neural networks
by Trevor Bekolay

A thesis presented to the University of Waterloo in fulfillment of the thesis requirement for the degree of Master of Mathematics in Computer Science

Waterloo, Ontario, Canada, 2011
© Trevor Bekolay 2011

I hereby declare that I am the sole author of this thesis. This is a true copy of the thesis, including any required final revisions, as accepted by my examiners. I understand that my thesis may be made electronically available to the public.

Abstract

Learning is central to the exploration of intelligence. Psychology and machine learning provide high-level explanations of how rational agents learn. Neuroscience provides low-level descriptions of how the brain changes as a result of learning. This thesis attempts to bridge the gap between these two levels of description by solving problems using machine learning ideas, implemented in biologically plausible spiking neural networks with experimentally supported learning rules.

We present three novel neural models that contribute to the understanding of how the brain might solve the three main problems posed by machine learning: supervised learning, in which the rational agent has a fine-grained feedback signal; reinforcement learning, in which the agent gets sparse feedback; and unsupervised learning, in which the agent has no explicit environmental feedback.

In supervised learning, we argue that previous models of supervised learning in spiking neural networks solve a problem that is less general than the supervised learning problem posed by machine learning. We use an existing learning rule to solve the general supervised learning problem with a spiking neural network. We show that the learning rule can be mapped onto the well-known backpropagation rule used in artificial neural networks.

In reinforcement learning, we augment an existing model of the basal ganglia to implement a simple actor-critic model that has a direct mapping to brain areas. The model is used to recreate behavioural and neural results from an experimental study of rats performing a simple reinforcement learning task.

In unsupervised learning, we show that the BCM rule, a common learning rule used in unsupervised learning with rate-based neurons, can be adapted to a spiking neural network. We recreate the effects of STDP, a learning rule with strict time dependencies, using BCM, which does not explicitly remember the times of previous spikes. The simulations suggest that BCM is a more general rule than STDP.

Finally, we propose a novel learning rule that can be used in all three of these simulations. The existence of such a rule suggests that the three types of learning examined separately in machine learning may not be implemented with separate processes in the brain.

Acknowledgements

I would like to thank Xuan Choo, Travis DeWolf, Bruce Bobier, Daniel Rasmussen, Terry Stewart, and Charlie Tang for their help getting acquainted with the University of Waterloo, the Center for Theoretical Neuroscience, and the NEF. Extra thanks to Terry and Xuan for continuing help with NEF models and Nengo programming, and for providing some of the figures in this thesis. Some comments on early drafts were provided by Lucy Spardy and Travis DeWolf, to whom I am grateful. Gratitude also goes to my readers, Matthijs van der Meer and Jeff Orchard, for their comments and fruitful discussion. Most of all, I am truly indebted to my supervisor Chris Eliasmith, whose insights are responsible for any good ideas that may have sneaked into this thesis.

Dedication

For my parents, Cathy and David, who would be proud of me if my only publication was on their refrigerator.

Table of Contents

List of Figures
1 Introduction
  1.1 Thesis organization
  1.2 Thesis goals
2 Synaptic plasticity
  2.1 Hebbian learning
    2.1.1 Long-term potentiation (LTP)
    2.1.2 Long-term depression (LTD)
    2.1.3 Spike-timing dependent plasticity (STDP)
  2.2 Non-Hebbian plasticity
  2.3 Dopamine modulated plasticity
  2.4 Explaining behaviour with synaptic strengths
3 Large-scale neural modelling
  3.1 Single-neuron models
    3.1.1 Leaky integrate-and-fire model
  3.2 The Neural Engineering Framework
    3.2.1 Representation
    3.2.2 Transformation
  3.3 Plasticity in the NEF
    3.3.1 Error minimization rule
4 Supervised learning
  4.1 Supervised learning in traditional artificial neural networks
    4.1.1 Backpropagation
  4.2 Supervised learning in spiking neural networks
    4.2.1 Temporal-coding based models
    4.2.2 Biologically plausible backpropagation
  4.3 Supervised learning with the NEF
    4.3.1 Theoretical argument
    4.3.2 Biological plausibility
    4.3.3 Simulation results
    4.3.4 Conclusion
5 Reinforcement learning
  5.1 Reinforcement learning in traditional artificial neural networks
    5.1.1 The agent-environment interface
    5.1.2 Markov decision processes
    5.1.3 Value functions
    5.1.4 Temporal-difference learning
    5.1.5 Using artificial neural networks
    5.1.6 Comparison to supervised learning
  5.2 Reinforcement learning in spiking neural networks
    5.2.1 Dopamine may encode TD-like reward prediction error
    5.2.2 Previous neural models
  5.3 Reinforcement learning in the NEF
    5.3.1 Action selection
    5.3.2 Critiquing the actor
    5.3.3 Simulation results
    5.3.4 Challenges for the current model
    5.3.5 Conclusions
6 Unsupervised learning
  6.1 Unsupervised learning in traditional artificial neural networks
  6.2 Unsupervised learning in spiking neural networks
    6.2.1 Artola, Bröcher, Singer (ABS) rule
    6.2.2 Bienenstock, Cooper, Munro (BCM) rule
    6.2.3 Spike-timing dependent plasticity rules
    6.2.4 Relationship between BCM and STDP rules
    6.2.5 General unsupervised learning
  6.3 Unsupervised learning in the NEF
    6.3.1 Simulation results
    6.3.2 A unifying learning rule
7 Discussion and conclusions
  7.1 Large-scale unsupervised learning
  7.2 Model-based reinforcement learning
  7.3 Supervised error signals
    7.3.1 Solving the supervised spike-time learning problem
  7.4 Computational complexity
    7.4.1 Decoder-level learning
References

List of Figures

2.1 Illustration of the main parts of a neuron
2.2 Illustration of the main parts of a synapse
2.3 First experimental evidence of LTP
2.4 Illustration depicting homosynaptic and associative LTP
2.5 Illustration depicting heterosynaptic and homosynaptic LTD
2.6 Evidence that temporal order of pre- and postsynaptic stimulation affects LTP/LTD induction
2.7 The classical STDP curve
2.8 Five different STDP curves, showing the diversity of STDP in different synapses. Recreated from [1]
2.9 Frequency dependence of the STDP protocol
3.1 Illustration of the bilipid membrane of a neuron. Labelled items refer to elements of the circuit diagram, figure 3.2. Recreated from [58]
3.2 Circuit diagram that corresponds to the leaky integrate-and-fire (LIF) neuron. Recreated from [58]
3.3 Membrane voltage of a LIF neuron with constant input J, from [42]
3.4 Example tuning curves. (Left) Experimentally determined tuning curves, from [137]. (Right) Similar tuning curves for LIF neurons, from [42]
3.5 Illustration showing how the tuning curves of a population of LIF neurons can be linearly combined to estimate an input signal, x. From [42]
3.6 Decoding a time-varying scalar signal using a filtered spike train
3.7 Network structure for computing a non-linear transformation of values encoded in two separate populations
3.8 Illustration showing how the tuning curves of a population of LIF neurons can be linearly combined to estimate a nonlinear function, in this case sin(2πx). From [42]
4.1 The supervised learning problem
4.2 Multi-layer perceptron architecture
4.3 Sigmoid curves
4.4 Architecture of a model of the cerebellum that posits that the cerebellum is

Recommended publications
  • Machine Learning for Neuroscience Geoffrey E Hinton
    Hinton Neural Systems & Circuits 2011, 1:12 http://www.neuralsystemsandcircuits.com/content/1/1/12 QUESTIONS & ANSWERS, Open Access
    Machine learning for neuroscience
    Geoffrey E Hinton
    What is machine learning? Machine learning is a type of statistics that places particular emphasis on the use of advanced computational algorithms. As computers become more powerful, and modern experimental methods in areas such as imaging generate vast bodies of data, machine learning is becoming ever more important for extracting reliable and meaningful relationships and for making accurate predictions. Key strands of modern machine learning grew out of attempts to understand how large numbers of interconnected, more or less neuron-like elements could learn to achieve behaviourally meaningful computations and to extract useful features from images or sound waves. By the 1990s, key approaches had converged on an elegant framework called ‘graphical models’, explained in Koller and Friedman, in which the nodes of a graph represent variables such as edges and corners in an …
    Could you provide a brief description of the methods of machine learning? Machine learning can be divided into three parts: 1) in supervised learning, the aim is to predict a class label or a real value from an input (classifying objects in images or predicting the future value of a stock are examples of this type of learning); 2) in unsupervised learning, the aim is to discover good features for representing the input data; and 3) in reinforcement learning, the aim is to discover what action should be performed next in order to maximize the eventual payoff.
    What does machine learning have to do with neuroscience? Machine learning has two very different relationships to neuroscience. As with any other science, modern machine-learning methods can be very helpful in analysing the data.
  • This Document Is the Result of Extensive Work Approved by the Defense Committee and Made Available to the Wider University Community
    NOTICE: This document is the result of extensive work approved by the defense committee and made available to the wider university community. It is subject to the author's intellectual property rights. This implies an obligation to cite and reference it when using this document. Furthermore, any counterfeiting, plagiarism, or unlawful reproduction is liable to criminal prosecution. Contact: [email protected]
    LINKS: Code de la Propriété Intellectuelle, articles L 122.4. Code de la Propriété Intellectuelle, articles L 335.2 to L 335.10. http://www.cfcopies.com/V2/leg/leg_droi.php http://www.culture.gouv.fr/culture/infos-pratiques/droits/protection.htm
    École doctorale IAEM Lorraine, doctoral training department in computer science
    Apprentissage spatial de corrélations multimodales par des mécanismes d'inspiration corticale (Spatial learning of multimodal correlations through cortically inspired mechanisms)
    Thesis for the degree of Doctor of the Université de Lorraine (specialty: computer science) by Mathieu Lefort
    Jury composition. President: Jean-Christophe Buisson, Professor, ENSEEIHT. Reviewers: Arnaud Revel, Professor, Université de La Rochelle; Hugues Berry, CR HDR, INRIA Rhône-Alpes. Examiners: Benoît Girard, CR HDR, CNRS ISIR; Yann Boniface, MCF, Université de Lorraine. Thesis supervisor: Bernard Girau, Professor, Université de Lorraine.
    Laboratoire Lorrain de Recherche en Informatique et ses Applications, UMR 7503. Typeset with the thloria class.
    To all those who have found or will find an interest in reading this manuscript or in spending time with me.
    Acknowledgements: I would like to thank my reviewers, Arnaud Revel and Hugues Berry, for agreeing to take the time to evaluate my work and for commenting on it constructively despite the relatively tight deadlines.
  • Genetic Algorithms Are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning
    Deep Neuroevolution: Genetic Algorithms are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning
    Felipe Petroski Such, Vashisht Madhavan, Edoardo Conti, Joel Lehman, Kenneth O. Stanley, Jeff Clune. Uber AI Labs. ffelipe.such, [email protected]
    Abstract: Deep artificial neural networks (DNNs) are typically trained via gradient-based learning algorithms, namely backpropagation. Evolution strategies (ES) can rival backprop-based algorithms such as Q-learning and policy gradients on challenging deep reinforcement learning (RL) problems. However, ES can be considered a gradient-based algorithm because it performs stochastic gradient descent via an operation similar to a finite-difference approximation of the gradient. That raises the question of whether non-gradient-based evolutionary algorithms can work at DNN scales. Here we demonstrate they can: we evolve the weights of a DNN with a simple, gradient-free, population-based genetic algorithm (GA) and it performs well on hard deep RL problems, including Atari and humanoid lo-
    1. Introduction: A recent trend in machine learning and AI research is that old algorithms work remarkably well when combined with sufficient computing resources and data. That has been the story for (1) backpropagation applied to deep neural networks in supervised learning tasks such as computer vision (Krizhevsky et al., 2012) and voice recognition (Seide et al., 2011), (2) backpropagation for deep neural networks combined with traditional reinforcement learning algorithms, such as Q-learning (Watkins & Dayan, 1992; Mnih et al., 2015) or policy gradient (PG) methods (Sehnke et al., 2010; Mnih et al., 2016), and (3) evolution strategies (ES) applied to reinforcement learning benchmarks (Salimans et al., 2017). One common theme is that all of these methods are gradient-based, including ES, which involves a gradient approximation similar to finite differences (Williams, 1992; Wierstra et al., 2008; Salimans et al., 2017).
  • Acetylcholine in Cortical Inference
    Neural Networks 15 (2002) 719–730, www.elsevier.com/locate/neunet. 2002 Special issue
    Acetylcholine in cortical inference
    Angela J. Yu*, Peter Dayan. Gatsby Computational Neuroscience Unit, University College London, 17 Queen Square, London WC1N 3AR, UK. Received 5 October 2001; accepted 2 April 2002
    Abstract: Acetylcholine (ACh) plays an important role in a wide variety of cognitive tasks, such as perception, selective attention, associative learning, and memory. Extensive experimental and theoretical work in tasks involving learning and memory has suggested that ACh reports on unfamiliarity and controls plasticity and effective network connectivity. Based on these computational and implementational insights, we develop a theory of cholinergic modulation in perceptual inference. We propose that ACh levels reflect the uncertainty associated with top-down information, and have the effect of modulating the interaction between top-down and bottom-up processing in determining the appropriate neural representations for inputs. We illustrate our proposal by means of an hierarchical hidden Markov model, showing that cholinergic modulation of contextual information leads to appropriate perceptual inference. © 2002 Elsevier Science Ltd. All rights reserved.
    Keywords: Acetylcholine; Perception; Neuromodulation; Representational inference; Hidden Markov model; Attention
    1. Introduction: Neuromodulators such as acetylcholine (ACh), serotonine, dopamine, norepinephrine, and histamine play two characteristic roles. One, most studied in vertebrate systems, concerns the control of plasticity. The other, most studied in invertebrate systems, concerns the control of network …
    … medial septum (MS), diagonal band of Broca (DBB), and nucleus basalis (NBM). Physiological studies on ACh indicate that its neuromodulatory effects at the cellular level are diverse, causing synaptic facilitation and suppression as well as direct hyperpolarization and depolarization, all within the same cortical area (Kimura, Fukuda, & Tsumoto, 1999).
  • Model-Free Episodic Control
    Model-Free Episodic Control
    Charles Blundell, Benigno Uria, Alexander Pritzel, Yazhe Li, Avraham Ruderman, Joel Z Leibo, Jack Rae, Daan Wierstra, Demis Hassabis. Google DeepMind
    Abstract: State of the art deep reinforcement learning algorithms take many millions of interactions to attain human-level performance. Humans, on the other hand, can very quickly exploit highly rewarding nuances of an environment upon first discovery. In the brain, such rapid learning is thought to depend on the hippocampus and its capacity for episodic memory. Here we investigate whether a simple model of hippocampal episodic control can learn to solve difficult sequential decision-making tasks. We demonstrate that it not only attains a highly rewarding strategy significantly faster than state-of-the-art deep reinforcement learning algorithms, but also achieves a higher overall reward on some of the more challenging domains.
    1 Introduction: Deep reinforcement learning has recently achieved notable successes in a variety of domains [23, 32]. However, it is very data inefficient. For example, in the domain of Atari games [2], deep Reinforcement Learning (RL) systems typically require tens of millions of interactions with the game emulator, amounting to hundreds of hours of game play, to achieve human-level performance. As pointed out by [13], humans learn to play these games much faster. This paper addresses the question of how to emulate such fast learning abilities in a machine, without any domain-specific prior knowledge.
  • Enhancing Nervous System Recovery Through Neurobiologics, Neural Interface Training, and Neurorehabilitation
    UCLA Previously Published Works. Permalink: https://escholarship.org/uc/item/0rb4m424. Peer reviewed. REVIEW published: 27 December 2016, doi: 10.3389/fnins.2016.00584
    Enhancing Nervous System Recovery through Neurobiologics, Neural Interface Training, and Neurorehabilitation
    Max O. Krucoff¹*, Shervin Rahimpour¹, Marc W. Slutzky²,³, V. Reggie Edgerton⁴ and Dennis A. Turner¹,⁵,⁶
    ¹ Department of Neurosurgery, Duke University Medical Center, Durham, NC, USA; ² Department of Physiology, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA; ³ Department of Neurology, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA; ⁴ Department of Integrative Biology and Physiology, University of California, Los Angeles, Los Angeles, CA, USA; ⁵ Department of Neurobiology, Duke University Medical Center, Durham, NC, USA; ⁶ Research and Surgery Services, Durham Veterans Affairs Medical Center, Durham, NC, USA
    Edited by: Ioan Opris, University of Miami, USA. Reviewed by: Yoshio Sakurai, Doshisha University, Japan; Brian R.
    After an initial period of recovery, human neurological injury has long been thought to be static. In order to improve quality of life for those suffering from stroke, spinal cord injury, or traumatic brain injury, researchers have been working to restore the nervous system and reduce neurological deficits through a number of mechanisms. For example, neurobiologists have been identifying and manipulating components of the intra- and extracellular milieu to alter the regenerative potential of neurons, neuro-engineers have been producing brain-machine and neural interfaces that circumvent lesions to restore functionality, and neurorehabilitation experts have been developing new ways to revitalize
  • When Planning to Survive Goes Wrong: Predicting the Future and Replaying the Past in Anxiety and PTSD
    Available online at www.sciencedirect.com (ScienceDirect)
    When planning to survive goes wrong: predicting the future and replaying the past in anxiety and PTSD
    Christopher Gagne¹, Peter Dayan³ and Sonia J Bishop¹,²
    We increase our probability of survival and wellbeing by minimizing our exposure to rare, extremely negative events. In this article, we examine the computations used to predict and avoid such events and to update our models of the world and action policies after their occurrence. We also consider how these computations might go wrong in anxiety disorders and Post Traumatic Stress Disorder (PTSD). We review evidence that anxiety is linked to increased simulations of the future occurrence of high cost negative events and to elevated estimates of the probability of occurrence of such events. We also review psychological theories of PTSD in the light of newer, computational models of updating through replay and simulation. We consider whether pathological levels of re-experiencing symptomatology might reflect problems reconciling the traumatic outcome with overly optimistic priors
    … lay out this computational problem as a form of approximate Bayesian decision theory (BDT) [1] and consider how miscalibrated attempts to solve it might contribute to anxiety and stress disorders.
    According to BDT, we should combine a probability distribution over all relevant states of the world with estimates of the benefits or costs of outcomes associated with each state. We must then calculate the course of action that delivers the largest long-run expected value. Individuals can only possibly approximately solve this problem. To do so, they bring to bear different sources of information (e.g. priors, evidence, models of the world) and apply different methods to calculate the expected long-run values of alternate courses of action.
  • Pnas11052ackreviewers 5098..5136
    Acknowledgment of Reviewers, 2013
    The PNAS editors would like to thank all the individuals who dedicated their considerable time and expertise to the journal by serving as reviewers in 2013. Their generous contribution is deeply appreciated.
    A: Scott Aaronson, Stuart Aaronson, Adam Abate, Abul Abbas, Tarek Abbas, Jonathan Abbatt, Shahal Abbo, Patrick Abbot, Albert Abbott, Geoff Abbott, Larry Abbott, Nicholas Abbott, Rasha Abdel Rahman, Zalfa Abdel-Malek, Minori Abe, Kathryn Abel, Asa Abeliovich, John Aber, Geoff Abers, Rohan Abeyaratne, Susan Abmayr, Ehab Abouheif, Soman Abraham, Harald Ade, Karen Adelman, Zach Adelman, Pia Adelroth, David Adelson, Alan Aderem, Neil Adger, Noam Adir, Jess Adkins, Elizabeth Adkins-Regan, Roee Admon, Walter Adriani, Ruedi Aebersold, Ueli Aebi, Martin Aeschlimann, Ruslan Afasizhev, Markus Affolter, Dritan Agalliu, David Agard, Aneel Aggarwal, Anurag Agrawal, Arun Agrawal, Paul Agris, H. …, Takaaki Akaike, Katerina Akassoglou, Arne Akbar, Erol Akcay, Mark Akeson, Anna Akhmanova, Shizuo Akira, Ramesh Akkina, Klaus Aktories, Muhammad Alam, Eric Alani, Pietro Alano, Cedric Alaux, Maher Alayyoubi, Richard Alba, Salim Al-Babili, Salvatore Albani, Silas Alben, Mark Alber, Reka Albert, R. Craig Albertson, Roy Alcalay, Antonio Alcami, Heather Allen, Icarus Allen, John Allen, Karen Allen, Lisa Allen, Nicola Allen, Paul Allen, Philip Allen, Toby Allen, James Allison, Mead Allison, Isabel Allona, Robin Allshire, Abigail Allwood, Julian Allwood, Eric Alm, Benjamin Alman, Steven Almo, Douglas Almond, Genevieve Almouzni, Noga Alon, Uri Alon, Claudio Alonso, Ariel Amir, Ido Amit, Angelika Amon, Hubert Amrein, Serge Amselem, Derk Amsen, Esther Amstad, I. Jonathan Amster, Ronald Amundson, Katrin Amunts, Myron Amusia, Gynheung An, Zhiqiang An, Ranjit Anand, Beau Ances, David Andelman, John Anderies, Gregor Anderluh, Bogi Andersen, George Andersen, Gregers Andersen, Ken Andersen, Olaf Andersen
  • Beyond the “Conceptual Nervous System”: Can Computational Cognitive Neuroscience Transform Learning Theory?
    Fabian A. Soto (a,*). (a) Department of Psychology, Florida International University, 11200 SW 8th St, AHC4 460, Miami, FL 33199
    Abstract: In the last century, learning theory has been dominated by an approach assuming that associations between hypothetical representational nodes can support the acquisition of knowledge about the environment. The similarities between this approach and connectionism did not go unnoticed to learning theorists, with many of them explicitly adopting a neural network approach in the modeling of learning phenomena. Skinner famously criticized such use of hypothetical neural structures for the explanation of behavior (the “Conceptual Nervous System”), and one aspect of his criticism has proven to be correct: theory underdetermination is a pervasive problem in cognitive modeling in general, and in associationist and connectionist models in particular. That is, models implementing two very different cognitive processes often make the exact same behavioral predictions, meaning that important theoretical questions posed by contrasting the two models remain unanswered. We show through several examples that theory underdetermination is common in the learning theory literature, affecting the solvability of some of the most important theoretical problems that have been posed in the last decades. Computational cognitive neuroscience (CCN) offers a solution to this problem, by including neurobiological constraints in computational models of behavior and cognition. Rather than simply being inspired by neural computation, CCN models are built to reflect as much as possible about the actual neural structures thought to underlie a particular behavior. They go beyond the “Conceptual Nervous System” and offer a true integration of behavioral and neural levels of analysis.
  • Technical Note Q-Learning
    Machine Learning, 8, 279-292 (1992). © 1992 Kluwer Academic Publishers, Boston. Manufactured in The Netherlands.
    Technical Note: Q-Learning
    CHRISTOPHER J.C.H. WATKINS, 25b Framfield Road, Highbury, London N5 1UU, England
    PETER DAYAN, Centre for Cognitive Science, University of Edinburgh, 2 Buccleuch Place, Edinburgh EH8 9EH, Scotland
    Abstract. Q-learning (Watkins, 1989) is a simple way for agents to learn how to act optimally in controlled Markovian domains. It amounts to an incremental method for dynamic programming which imposes limited computational demands. It works by successively improving its evaluations of the quality of particular actions at particular states. This paper presents and proves in detail a convergence theorem for Q-learning based on that outlined in Watkins (1989). We show that Q-learning converges to the optimum action-values with probability 1 so long as all actions are repeatedly sampled in all states and the action-values are represented discretely. We also sketch extensions to the cases of non-discounted, but absorbing, Markov environments, and where many Q values can be changed each iteration, rather than just one.
    Keywords. Q-learning, reinforcement learning, temporal differences, asynchronous dynamic programming
    1. Introduction: Q-learning (Watkins, 1989) is a form of model-free reinforcement learning. It can also be viewed as a method of asynchronous dynamic programming (DP). It provides agents with the capability of learning to act optimally in Markovian domains by experiencing the consequences of actions, without requiring them to build maps of the domains. Learning proceeds similarly to Sutton's (1984; 1988) method of temporal differences (TD): an agent tries an action at a particular state, and evaluates its consequences in terms of the immediate reward or penalty it receives and its estimate of the value of the state to which it is taken.
  • Human Functional Brain Imaging 1990–2009
    Portfolio Review: Human Functional Brain Imaging 1990–2009. September 2011
    Acknowledgements: The Wellcome Trust would like to thank the many people who generously gave up their time to participate in this review. The project was led by Claire Vaughan and Liz Allen. Key input and support was provided by Lynsey Bilsland, Richard Morris, John Williams, Shewly Choudhury, Kathryn Adcock, David Lynn, Kevin Dolby, Beth Thompson, Anna Wade, Suzi Morris, Annie Sanderson, and Jo Scott; and Lois Reynolds and Tilli Tansey (Wellcome Trust Expert Group). The views expressed in this report are those of the Wellcome Trust project team, drawing on the evidence compiled during the review. We are indebted to the independent Expert Group and our industry experts, who were pivotal in providing the assessments of the Trust’s role in supporting human functional brain imaging and have informed ‘our’ speculations for the future. Finally, we would like to thank Professor Randy Buckner, Professor Ray Dolan and Dr Anne-Marie Engel, who provided valuable input to the development of the timelines and report.
    The Wellcome Trust is a charity registered in England and Wales, no. 210183.
    Contents
    Acknowledgements
    Key abbreviations used in the report
    Overview and key findings
    Landmarks in human functional brain imaging
    1. Introduction and background
    2. Human functional brain imaging today: the global research landscape
      2.1 The global scene
      2.2 The UK
      2.3 Europe
      2.4 Industry
      2.5 Human brain imaging
  • LTP, STP, and Scaling: Electrophysiological, Biochemical, and Structural Mechanisms
    This position paper has not been peer reviewed or edited. It will be finalized, reviewed and edited after the Royal Society meeting on ‘Integrating Hebbian and homeostatic plasticity’ (April 2016).
    LTP, STP, and scaling: electrophysiological, biochemical, and structural mechanisms
    John Lisman, Dept. Biology, Brandeis University, Waltham Ma. [email protected]
    ABSTRACT: Synapses are complex because they perform multiple functions, including at least six mechanistically different forms of plasticity (STP, early LTP, late LTP, LTD, distance-dependent scaling, and homeostatic scaling). The ultimate goal of neuroscience is to provide an electrophysiologically, biochemically, and structurally specific explanation of the underlying mechanisms. This review summarizes the still limited progress towards this goal. Several areas of particular progress will be highlighted: 1) STP, a Hebbian process that requires small amounts of synaptic input, appears to make strong contributions to some forms of working memory. 2) The rules for LTP induction in the stratum radiatum of the hippocampus have been clarified: induction does not depend obligatorily on backpropagating Na spikes but, rather, on dendritic branch-specific NMDA spikes. Thus, computational models based on STDP need to be modified. 3) Late LTP, a process that requires a dopamine signal (neoHebbian), is mediated by trans-synaptic growth of the synapse, a growth that occurs about an hour after LTP induction. 4) There is no firm evidence for cell-autonomous homeostatic synaptic scaling; rather, homeostasis is likely to depend on a) cell-autonomous processes that are not scaling, b) synaptic scaling that is not cell autonomous but instead depends on population activity, or c) metaplasticity processes that change the propensity of LTP vs LTD.