Meta-Learning with Memory-Augmented Neural Networks

Adam Santoro [email protected], Google DeepMind
Sergey Bartunov [email protected], Google DeepMind and National Research University Higher School of Economics (HSE)
Matthew Botvinick [email protected], Google DeepMind
Daan Wierstra [email protected], Google DeepMind
Timothy Lillicrap [email protected], Google DeepMind

Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA, 2016. JMLR: W&CP volume 48. Copyright 2016 by the author(s).

Abstract

Despite recent breakthroughs in the applications of deep neural networks, one setting that presents a persistent challenge is that of "one-shot learning." Traditional gradient-based networks require a lot of data to learn, often through extensive iterative training. When new data is encountered, the models must inefficiently relearn their parameters to adequately incorporate the new information without catastrophic interference. Architectures with augmented memory capacities, such as Neural Turing Machines (NTMs), offer the ability to quickly encode and retrieve new information, and hence can potentially obviate the downsides of conventional models. Here, we demonstrate the ability of a memory-augmented neural network to rapidly assimilate new data, and to leverage this data to make accurate predictions after only a few samples. We also introduce a new method for accessing an external memory that focuses on memory content, unlike previous methods that additionally use memory location-based focusing mechanisms.

1. Introduction

The current success of deep learning hinges on the ability to apply gradient-based optimization to high-capacity models. This approach has achieved impressive results on many large-scale supervised tasks with raw sensory input, such as image classification (He et al., 2015), speech recognition (Yu & Deng, 2012), and games (Mnih et al., 2015; Silver et al., 2016). Notably, performance in such tasks is typically evaluated after extensive, incremental training on large data sets. In contrast, many problems of interest require rapid inference from small quantities of data. In the limit of "one-shot learning," single observations should result in abrupt shifts in behavior.

This kind of flexible adaptation is a celebrated aspect of human learning (Jankowski et al., 2011), manifesting in settings ranging from motor control (Braun et al., 2009) to the acquisition of abstract concepts (Lake et al., 2015). Generating novel behavior based on inference from a few scraps of information, e.g., inferring the full range of applicability for a new word heard in only one or two contexts, is something that has remained stubbornly beyond the reach of contemporary machine intelligence. It appears to present a particularly daunting challenge for deep learning. In situations when only a few training examples are presented one-by-one, a straightforward gradient-based solution is to completely re-learn the parameters from the data available at the moment. Such a strategy is prone to poor learning and/or catastrophic interference. In view of these hazards, non-parametric methods are often considered to be better suited.

However, previous work does suggest one potential strategy for attaining rapid learning from sparse data, which hinges on the notion of meta-learning (Thrun, 1998; Vilalta & Drissi, 2002). Although the term has been used in numerous senses (Schmidhuber et al., 1997; Caruana, 1997; Schweighofer & Doya, 2003; Brazdil et al., 2003), meta-learning generally refers to a scenario in which an agent learns at two levels, each associated with a different time scale. Rapid learning occurs within a task, for example, when learning to accurately classify within a particular dataset. This learning is guided by knowledge accrued more gradually across tasks, which captures the way in which task structure varies across target domains (Giraud-Carrier et al., 2004; Rendell et al., 1987; Thrun, 1998). Given its two-tiered organization, this form of meta-learning is often described as "learning to learn."

It has been proposed that neural networks with memory capacities could prove quite capable of meta-learning (Hochreiter et al., 2001). These networks shift their bias through weight updates, but also modulate their output by learning to rapidly cache representations in memory stores (Hochreiter & Schmidhuber, 1997). For example, LSTMs trained to meta-learn can quickly learn never-before-seen quadratic functions from a low number of data samples (Hochreiter et al., 2001).

Neural networks with a memory capacity provide a promising approach to meta-learning in deep networks. However, the specific strategy of using the memory inherent in unstructured recurrent architectures is unlikely to extend to settings where each new task requires significant amounts of new information to be rapidly encoded. A scalable solution has a few necessary requirements: First, information must be stored in memory in a representation that is both stable (so that it can be reliably accessed when needed) and element-wise addressable (so that relevant pieces of information can be accessed selectively). Second, the number of parameters should not be tied to the size of the memory. These two characteristics do not arise naturally within standard memory architectures, such as LSTMs. However, recent architectures, such as Neural Turing Machines (NTMs) (Graves et al., 2014) and memory networks (Weston et al., 2014), meet the requisite criteria. And so, in this paper we revisit the meta-learning problem and setup from the perspective of a highly capable memory-augmented neural network (MANN) (note: from here on, the term MANN will refer to the class of external-memory equipped networks, and not other "internal" memory-based architectures, such as LSTMs).

We demonstrate that MANNs are capable of meta-learning in tasks that carry significant short- and long-term memory demands. This manifests as successful classification of never-before-seen Omniglot classes at human-like accuracy after only a few presentations, and principled function estimation based on a small number of samples. Additionally, we outline a memory access module that emphasizes memory access by content, and not additionally by memory location, as in original implementations of the NTM (Graves et al., 2014). Our approach combines the best of two worlds: the ability to slowly learn an abstract method for obtaining useful representations of raw data, via gradient descent, and the ability to rapidly bind never-before-seen information after a single presentation, via an external memory module. The combination supports robust meta-learning, extending the range of problems to which deep learning can be effectively applied.

2. Meta-Learning Task Methodology

Usually, we try to choose parameters $\theta$ to minimize a learning cost $\mathcal{L}$ across some dataset $D$. However, for meta-learning, we choose parameters to reduce the expected learning cost across a distribution of datasets $p(D)$:

$$\theta^{*} = \operatorname*{argmin}_{\theta} \; \mathbb{E}_{D \sim p(D)}\big[\mathcal{L}(D;\theta)\big]. \qquad (1)$$
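To make the structure of this objective concrete, the following minimal sketch (ours, not code from the paper) optimizes Equation (1) by stochastic gradient descent, approximating the expectation by sampling a fresh dataset from p(D) at every step. The model, loss, and dataset generator are toy stand-ins: in the paper, L(D; θ) is the loss a sequential MANN accrues over an entire episode.

```python
import numpy as np

# Minimal sketch of Equation (1): SGD on the expected learning cost
# E_{D ~ p(D)}[L(D; theta)], with the expectation approximated by one
# sampled dataset per gradient step. Toy stand-ins throughout: each
# dataset comes from its own hidden linear function, the model is
# linear, and L is mean squared prediction error.
rng = np.random.default_rng(0)
dim, lr = 5, 0.05
theta = np.zeros(dim)

def sample_dataset(n=20):
    """Draw one dataset D ~ p(D): a fresh hidden function plus n samples."""
    w_hidden = rng.normal(size=dim)
    x = rng.normal(size=(n, dim))
    return x, x @ w_hidden

for step in range(1000):
    x, y = sample_dataset()                      # D ~ p(D)
    grad = 2.0 * x.T @ (x @ theta - y) / len(y)  # grad_theta of L(D; theta)
    theta -= lr * grad                           # descend the expected cost
```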
To accomplish this, proper task setup is critical (Hochreiter et al., 2001). In our setup, a task, or episode, involves the presentation of some dataset $D = \{d_t\}_{t=1}^{T} = \{(x_t, y_t)\}_{t=1}^{T}$. For classification, $y_t$ is the class label for an image $x_t$, and for regression, $y_t$ is the value of a hidden function for a vector with real-valued elements $x_t$, or simply a real-valued number $x_t$ (from here on, for consistency, $x_t$ will be used). In this setup, $y_t$ is both a target and is presented as input along with $x_t$, in a temporally offset manner; that is, the network sees the input sequence $(x_1, \text{null}), (x_2, y_1), \ldots, (x_T, y_{T-1})$. And so, at time $t$ the correct label for the previous data sample ($y_{t-1}$) is provided as input along with a new query $x_t$ (see Figure 1(a)). The network is tasked to output the appropriate label for $x_t$ (i.e., $y_t$) at the given timestep. Importantly, labels are shuffled from dataset to dataset. This prevents the network from slowly learning sample-class bindings in its weights. Instead, it must learn to hold data samples in memory until the appropriate labels are presented at the next time-step, after which sample-class information can be bound and stored for later use (see Figure 1(b)). Thus, for a given episode, ideal performance involves a random guess for the first presentation of a class (since the appropriate label cannot be inferred from previous episodes, due to label shuffling), and the use of memory to achieve perfect accuracy thereafter. Ultimately, the system aims at modelling the predictive distribution $p(y_t \mid x_t, D_{1:t-1}; \theta)$, inducing a corresponding loss at each time step.
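As an illustration of this episode construction, here is a minimal sketch (ours; the data layout and all names are invented for illustration) that performs per-episode label shuffling and the one-step temporal offset pairing each x_t with y_{t-1}:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_episode(samples_by_class, n_classes=5, length=50):
    """Build one episode: pick classes, assign them fresh shuffled labels,
    then pair each input x_t with the previous step's label y_{t-1}."""
    chosen = rng.choice(len(samples_by_class), size=n_classes, replace=False)
    labels = rng.permutation(n_classes)          # label shuffling per episode
    xs, ys = [], []
    for _ in range(length):
        k = rng.integers(n_classes)              # which of the chosen classes
        pool = samples_by_class[chosen[k]]
        xs.append(pool[rng.integers(len(pool))]) # one sample of that class
        ys.append(labels[k])
    x = np.stack(xs)
    y = np.array(ys)
    prev = np.zeros((length, n_classes))
    prev[1:] = np.eye(n_classes)[y[:-1]]         # one-hot y_{t-1}; null at t=0
    inputs = np.concatenate([x, prev], axis=1)   # network sees (x_t, y_{t-1})
    return inputs, y                             # targets are y_t

# Toy stand-in for Omniglot: 100 classes, 20 flattened samples each.
data = [rng.normal(size=(20, 64)) for _ in range(100)]
inputs, targets = make_episode(data)             # shapes: (50, 69), (50,)
```

Because labels are reassigned every episode, the only route to above-chance accuracy on a class's second appearance is to have cached the earlier sample-label binding, which is exactly the behavior the memory module is meant to support.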
This task structure incorporates exploitable meta-knowledge: a model that meta-learns would learn to bind data representations to their appropriate labels regardless of the actual content of the data representation or label, and would employ a general scheme to map these bound representations to appropriate classes or function values for prediction.

Figure 1. Task structure. (a) Task setup: Omniglot images (or x-values for regression), x_t, are presented with time-offset labels (or function values), y_{t-1}, to prevent the network from simply mapping the class labels to the output. Labels are shuffled from episode to episode. (b) Network strategy: bind and encode sample-class information when a label arrives, then retrieve the bound information when the class is presented again.

3. Memory-Augmented Model

3.1. Neural Turing Machines

The Neural Turing Machine is a fully differentiable implementation of a MANN. It consists of a controller, such as a feed-forward network or LSTM, which interacts with an external memory module using a number of read and write heads (Graves et al., 2014).
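As a concrete picture of the content-focused access emphasized above, here is a minimal sketch (ours; the function name, shapes, and sharpening parameter beta are illustrative) of an NTM-style content-addressed read: the controller emits a key, the key is compared against every memory row by cosine similarity, and a softmax over the similarities yields the read weights:

```python
import numpy as np

def content_read(memory, key, beta=1.0):
    """Content addressing: cosine similarity between the controller's key
    and each memory row, sharpened by beta and softmax-normalized into
    read weights; the read vector blends memory rows by those weights."""
    eps = 1e-8
    sim = memory @ key / (np.linalg.norm(memory, axis=1)
                          * np.linalg.norm(key) + eps)
    w = np.exp(beta * sim - np.max(beta * sim))  # numerically stable softmax
    w /= w.sum()
    return w @ memory, w                         # r = sum_i w(i) * M(i)

rng = np.random.default_rng(0)
memory = rng.normal(size=(128, 40))              # 128 slots of width 40
read_vec, weights = content_read(memory, rng.normal(size=40))
```

The memory module proposed in this paper relies on such content-based focusing alone, dropping the location-based addressing mechanisms used in the original NTM.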