Towards Lifelong Reasoning with Sparse and Compressive Memory Systems

Total Pages: 16

File Type: PDF, Size: 1020 KB

Towards Lifelong Reasoning with Sparse and Compressive Memory Systems

Jack William Rae
PhD Thesis
CoMPLEX, University College London

Declaration

I, Jack William Rae, confirm that the work presented in this thesis is my own. Where information has been derived from other sources, I confirm that this has been indicated in the thesis.

Abstract

Humans have a remarkable ability to remember information over long time horizons. When reading a book, we build up a compressed representation of the past narrative, such as the characters and events that have built up the story so far. We can do this even if they are separated by thousands of words from the current text, or by long stretches of time between readings. During our lives, we build up and retain memories that tell us where we live, what we have experienced, and who we are. Adding memory to artificial neural networks has been transformative in machine learning, allowing models to extract structure from temporal data and model the future more accurately. However, the capacity for long-range reasoning in current memory-augmented neural networks is considerably limited in comparison to humans, despite access to powerful modern computers.

This thesis explores two prominent approaches towards scaling artificial memories to lifelong capacity: sparse access and compressive memory structures. With sparse access, only a very small subset of pertinent memory is inspected, retrieved, and updated. It is found that sparse memory access is beneficial for learning, allowing for improved data-efficiency and improved generalisation. From a computational perspective, sparsity allows scaling to memories with millions of entities on a simple CPU-based machine. It is shown that memory systems which compress the past to a smaller set of representations reduce redundancy, can speed up the learning of rare classes, and improve upon classical data structures in database systems. Compressive memory architectures are also devised for sequence prediction tasks and are observed to significantly advance the state of the art in modelling natural language.

Impact Statement

This study investigates memory structures for artificial neural networks that can scale to longer periods of time and larger quantities of data than previously demonstrated possible. It is demonstrated that neural networks can learn to store and retrieve salient information over hundreds of thousands of time-steps, an improvement of around two orders of magnitude over prior work. This is achieved using efficient sparse memory access operations and compressive memory data structures to reduce the computational cost of accessing memory and the amount of space required to store it.

The long-term goal for this work is to embed these memory structures within a complex agent, so it can reason over the stream of data it experiences during its lifetime. This could be a robotic agent which acts in the real world, or a dialogue agent which can reason over past conversations and auxiliary text context (e.g. the web). This study demonstrates that better long-range memory architectures can be used to improve state-of-the-art performance in question answering (Chapter 3), database systems (Chapter 5), and the modelling of language and speech (Chapters 4 and 6).

Acknowledgements

I dedicate this work to my wife Laura and to my son William.
I would like to thank my family and friends for their support throughout this project, in particular my mother, who has given considerable guidance along the way.

I would like to thank my supervisor Tim Lillicrap for both introducing me to the topics of scalable and compressive memory systems, and for guiding me through the research process over the past couple of years. His consistent energy and creativity have propelled me through many patches of uncertainty, and it has been a wonderful experience to work with him. I would also like to thank my supervisor Peter Dayan for his wider perspective spanning multiple fields. Peter's detailed, analytical process of thinking and questioning has shaped the structure of my research.

I would like to thank Yori Zwols, my manager throughout the majority of my doctoral studies. Yori helped me to select research projects that suited the self-contained nature of doctoral study, whilst being relevant to DeepMind's mission of solving intelligence.

I would like to thank Alex Graves and Greg Wayne for their mentorship. Alex has introduced me to many ideas in the space of memory and sequence modelling. He has also taught me how to write, indirectly via his own works and directly via his astute feedback. Greg has taught me to select projects that can be tentatively validated or disproved within a few weeks of work, which has proven to be one of the most important pieces of advice in the uncertain game of research. I would like to thank Demis Hassabis and Joel Leibo for initially introducing me to memory at DeepMind in 2014 and piquing my interest through their own innovative neuroscience-based memory research. Several research leads at DeepMind have overseen these research projects, providing resources and guidance: namely Daan Wierstra, Oriol Vinyals, and Koray Kavukcuoglu.

For work on sparse attention, I would like to thank Jonathan Hunt, who collaborated closely on this project. Jonny provided ideas for sparsifying the DNC's link matrix and instrumented the bAbI evaluation. I would also like to thank Malcolm Reynolds, Fumin Wang and Yori Zwols for their advice and comments.

For work on compressive memory systems for classification, I would like to thank Chris Dyer and Gabor Melis for providing invaluable direction in getting set up with language modelling. I would also like to thank Oriol Vinyals, Pablo Sprechmann, Siddhant Jayakumar, Charles Blundell and Adam Santoro for a multitude of enlightening comments and suggestions.

For work on the Neural Bloom Filter, I would like to thank my collaborator Sergey Bartunov, who implemented the database experiments and made several architectural contributions. I thank Andras Gyorgy and Tor Lattimore for their input on the general space complexity of approximate set membership from an information-theoretic perspective.

For work on the Compressive Transformer, I would like to thank my collaborator Anna Potapenko, who ran several experiments and developed the PG-19 dataset. I would also like to thank Siddhant Jayakumar for advice and experiments on the memory maze tasks, and Chloe Hillier for management of the research project and help in open-sourcing the model code and dataset. I thank Tim Lillicrap for his numerous inputs on model design, experiment design and the manuscript. I thank Chris Dyer, Felix Gimeno, and Koray Kavukcuoglu for reviewing the text.
I also thank Peter Dayan, Adam Santoro, Jacob Menick, Emilio Parisotto, Hyunjik Kim, Simon Osindero, Sergey Bartunov, David Raposo, and Daan Wierstra for ideas regarding model design. I thank Yazhe Li and Aaron van den Oord for their help and advice in instrumenting speech modelling experiments. Finally, I would like to thank the long list of unnamed DeepMind colleagues and UCL staff who have provided library code, talks, discussions, feedback, and endless enthusiasm and inspiration.

Contents

1 Introduction
  1.1 Limitations of Existing Approaches
  1.2 Overview of Contributions
2 Background
  2.1 Memory in Animals
    2.1.1 Tulving's Taxonomy for Long-Term Memories
    2.1.2 The Hippocampus
    2.1.3 Memory as a Complementary Learning System
  2.2 Data Storage in Computing
    2.2.1 Hashing
    2.2.2 Nearest Neighbour Search
    2.2.3 Set Membership
    2.2.4 Bloom Filter
  2.3 Memory-Augmented Neural Networks
    2.3.1 Hopfield Networks
    2.3.2 Recurrent Neural Networks
    2.3.3 Gated Recurrent Neural Networks
    2.3.4 Attention
    2.3.5 Memory Networks
    2.3.6 Neural Turing Machines
    2.3.7 Neural Stacks
    2.3.8 Differentiable Neural Computer
    2.3.9 Transformer
    2.3.10 Non-parametric Classification
  2.4 Memory Tasks
    2.4.1 Selective Processing of a Stream of Numbers
    2.4.2 Algorithmic Tasks
    2.4.3 Language Modelling
    2.4.4 Question Answering
3 Scaling Memory with Sparsity
  3.1 Motivation
  3.2 Architecture
    3.2.1 Read
    3.2.2 Sparse Attention
    3.2.3 Write
    3.2.4 Setting the Memory Size N vs T
    3.2.5 Controller
    3.2.6 Efficient backpropagation through time
    3.2.7 Approximate nearest neighbours
  3.3 Time and space complexity
    3.3.1 Initialisation
    3.3.2 Complexity of read
    3.3.3 Complexity of write
    3.3.4 Content-based addressing in O(log N) time
    3.3.5 Optimality
  3.4
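The abstract above summarises the thesis's two mechanisms: sparse access, in which only a small number of pertinent memory slots are read and written, and compressive memory, in which older memories are condensed into a smaller set of representations. The following NumPy sketch is included only as orientation for these two ideas; it is not the thesis's implementation, and the class name, slot counts, top-k read/write rule, and mean-pooling compression are illustrative assumptions (the architectures developed in Chapter 3 and later chapters use learned, differentiable addressing with approximate nearest neighbours and learned compression functions).

import numpy as np


class SparseCompressiveMemory:
    """Toy memory with (1) sparse top-k content-based reads and writes and
    (2) a compressive step that pools the oldest slots into fewer slots."""

    def __init__(self, num_slots=1024, word_size=64, compression_rate=2):
        self.memory = np.zeros((num_slots, word_size))
        self.word_size = word_size
        self.compression_rate = compression_rate

    def read(self, query, k=4):
        """Sparse read: attend over only the k most similar slots, rather
        than taking a softmax over all num_slots entries."""
        # Cosine similarity against every slot; in practice an approximate
        # nearest-neighbour index would replace this full scan.
        sims = self.memory @ query / (
            np.linalg.norm(self.memory, axis=1) * np.linalg.norm(query) + 1e-8)
        top_k = np.argpartition(-sims, k)[:k]   # indices of the k best slots
        weights = np.exp(sims[top_k])
        weights /= weights.sum()                # softmax over k slots only
        return weights @ self.memory[top_k], top_k

    def write(self, value, k=4):
        """Sparse write: blend the new value into only the k most
        content-similar slots; every other slot is untouched."""
        _, top_k = self.read(value, k)
        self.memory[top_k] = 0.5 * self.memory[top_k] + 0.5 * value

    def compress_oldest(self, num_old=8):
        """Compressive step: replace a block of old slots with their means,
        shrinking the space they occupy by the compression rate; the freed
        slots could then be reused for new memories."""
        rate = self.compression_rate
        old = self.memory[:num_old].reshape(num_old // rate, rate, self.word_size)
        self.memory[:num_old // rate] = old.mean(axis=1)


if __name__ == "__main__":
    # Illustrative usage: reads and writes touch only k = 4 slots.
    mem = SparseCompressiveMemory()
    mem.write(np.random.randn(64))
    read_vector, used_slots = mem.read(np.random.randn(64))
    mem.compress_oldest()   # pools the 8 oldest slots down to 4

With an approximate nearest-neighbour index in place of the full similarity scan, the per-step cost depends on k rather than on the total number of slots, which is the property the abstract appeals to when describing scaling to memories with millions of entities on a CPU-based machine; the compressive step is what keeps total storage bounded as the stored history grows.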