Connectionism and Classical Conditioning
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Arxiv:2107.04562V1 [Stat.ML] 9 Jul 2021 ¯ ¯ Θ∗ = Arg Min `(Θ) Where `(Θ) = `(Yi, Fθ(Xi)) + R(Θ)
The Bayesian Learning Rule Mohammad Emtiyaz Khan H˚avard Rue RIKEN Center for AI Project CEMSE Division, KAUST Tokyo, Japan Thuwal, Saudi Arabia [email protected] [email protected] Abstract We show that many machine-learning algorithms are specific instances of a single algorithm called the Bayesian learning rule. The rule, derived from Bayesian principles, yields a wide-range of algorithms from fields such as optimization, deep learning, and graphical models. This includes classical algorithms such as ridge regression, Newton's method, and Kalman filter, as well as modern deep-learning algorithms such as stochastic-gradient descent, RMSprop, and Dropout. The key idea in deriving such algorithms is to approximate the posterior using candidate distributions estimated by using natural gradients. Different candidate distributions result in different algorithms and further approximations to natural gradients give rise to variants of those algorithms. Our work not only unifies, generalizes, and improves existing algorithms, but also helps us design new ones. 1 Introduction 1.1 Learning-algorithms Machine Learning (ML) methods have been extremely successful in solving many challenging problems in fields such as computer vision, natural-language processing and artificial intelligence (AI). The main idea is to formulate those problems as prediction problems, and learn a model on existing data to predict the future outcomes. For example, to design an AI agent that can recognize objects in an image, we D can collect a dataset with N images xi 2 R and object labels yi 2 f1; 2;:::;Kg, and learn a model P fθ(x) with parameters θ 2 R to predict the label for a new image. -
Training Plastic Neural Networks with Backpropagation
Differentiable plasticity: training plastic neural networks with backpropagation Thomas Miconi 1 Jeff Clune 1 Kenneth O. Stanley 1 Abstract examples. By contrast, biological agents exhibit a remark- How can we build agents that keep learning from able ability to learn quickly and efficiently from ongoing experience, quickly and efficiently, after their ini- experience: animals can learn to navigate and remember the tial training? Here we take inspiration from the location of (and quickest way to) food sources, discover and main mechanism of learning in biological brains: remember rewarding or aversive properties of novel objects synaptic plasticity, carefully tuned by evolution and situations, etc. – often from a single exposure. to produce efficient lifelong learning. We show Endowing artificial agents with lifelong learning abilities that plasticity, just like connection weights, can is essential to allowing them to master environments with be optimized by gradient descent in large (mil- changing or unpredictable features, or specific features that lions of parameters) recurrent networks with Heb- are unknowable at the time of training. For example, super- bian plastic connections. First, recurrent plastic vised learning in deep neural networks can allow a neural networks with more than two million parameters network to identify letters from a specific, fixed alphabet can be trained to memorize and reconstruct sets to which it was exposed during its training; however, au- of novel, high-dimensional (1,000+ pixels) nat- tonomous learning abilities would allow an agent to acquire ural images not seen during training. Crucially, knowledge of any alphabet, including alphabets that are traditional non-plastic recurrent networks fail to unknown to the human designer at the time of training. -
Simultaneous Unsupervised and Supervised Learning of Cognitive
Simultaneous unsupervised and supervised learning of cognitive functions in biologically plausible spiking neural networks Trevor Bekolay ([email protected]) Carter Kolbeck ([email protected]) Chris Eliasmith ([email protected]) Center for Theoretical Neuroscience, University of Waterloo 200 University Ave., Waterloo, ON N2L 3G1 Canada Abstract overcomes these limitations. Our approach 1) remains func- tional during online learning, 2) requires only two layers con- We present a novel learning rule for learning transformations of sophisticated neural representations in a biologically plau- nected with simultaneous supervised and unsupervised learn- sible manner. We show that the rule, which uses only infor- ing, and 3) employs spiking neuron models to reproduce cen- mation available locally to a synapse in a spiking network, tral features of biological learning, such as spike-timing de- can learn to transmit and bind semantic pointers. Semantic pointers have previously been used to build Spaun, which is pendent plasticity (STDP). currently the world’s largest functional brain model (Eliasmith et al., 2012). Two operations commonly performed by Spaun Online learning with spiking neuron models faces signifi- are semantic pointer binding and transmission. It has not yet been shown how the binding and transmission operations can cant challenges due to the temporal dynamics of spiking neu- be learned. The learning rule combines a previously proposed rons. Spike rates cannot be used directly, and must be esti- supervised learning rule and a novel spiking form of the BCM mated with causal filters, producing a noisy estimate. When unsupervised learning rule. We show that spiking BCM in- creases sparsity of connection weights at the cost of increased the signal being estimated changes, there is some time lag signal transmission error. -
Generative Linguistics and Neural Networks at 60: Foundation, Friction, and Fusion*
Generative linguistics and neural networks at 60: foundation, friction, and fusion* Joe Pater, University of Massachusetts Amherst October 3, 2018. Abstract. The birthdate of both generative linguistics and neural networks can be taken as 1957, the year of the publication of foundational work by both Noam Chomsky and Frank Rosenblatt. This paper traces the development of these two approaches to cognitive science, from their largely autonomous early development in their first thirty years, through their collision in the 1980s around the past tense debate (Rumelhart and McClelland 1986, Pinker and Prince 1988), and their integration in much subsequent work up to the present. Although this integration has produced a considerable body of results, the continued general gulf between these two lines of research is likely impeding progress in both: on learning in generative linguistics, and on the representation of language in neural modeling. The paper concludes with a brief argument that generative linguistics is unlikely to fulfill its promise of accounting for language learning if it continues to maintain its distance from neural and statistical approaches to learning. 1. Introduction At the beginning of 1957, two men nearing their 29th birthdays published work that laid the foundation for two radically different approaches to cognitive science. One of these men, Noam Chomsky, continues to contribute sixty years later to the field that he founded, generative linguistics. The book he published in 1957, Syntactic Structures, has been ranked as the most influential work in cognitive science from the 20th century.1 The other one, Frank Rosenblatt, had by the late 1960s largely moved on from his research on perceptrons – now called neural networks – and died tragically young in 1971. -
The Neocognitron As a System for Handavritten Character Recognition: Limitations and Improvements
The Neocognitron as a System for HandAvritten Character Recognition: Limitations and Improvements David R. Lovell A thesis submitted for the degree of Doctor of Philosophy Department of Electrical and Computer Engineering University of Queensland March 14, 1994 THEUliW^^ This document was prepared using T^X and WT^^. Figures were prepared using tgif which is copyright © 1992 William Chia-Wei Cheng (william(Dcs .UCLA. edu). Graphs were produced with gnuplot which is copyright © 1991 Thomas Williams and Colin Kelley. T^ is a trademark of the American Mathematical Society. Statement of Originality The work presented in this thesis is, to the best of my knowledge and belief, original, except as acknowledged in the text, and the material has not been subnaitted, either in whole or in part, for a degree at this or any other university. David R. Lovell, March 14, 1994 Abstract This thesis is about the neocognitron, a neural network that was proposed by Fuku- shima in 1979. Inspired by Hubel and Wiesel's serial model of processing in the visual cortex, the neocognitron was initially intended as a self-organizing model of vision, however, we are concerned with the supervised version of the network, put forward by Fukushima in 1983. Through "training with a teacher", Fukushima hoped to obtain a character recognition system that was tolerant of shifts and deformations in input images. Until now though, it has not been clear whether Fukushima's ap- proach has resulted in a network that can rival the performance of other recognition systems. In the first three chapters of this thesis, the biological basis, operational principles and mathematical implementation of the supervised neocognitron are presented in detail. -
History and Philosophy of Neural Networks
HISTORY AND PHILOSOPHY OF NEURAL NETWORKS J. MARK BISHOP Abstract. This chapter conceives the history of neural networks emerging from two millennia of attempts to rationalise and formalise the operation of mind. It begins with a brief review of early classical conceptions of the soul, seating the mind in the heart; then discusses the subsequent Cartesian split of mind and body, before moving to analyse in more depth the twentieth century hegemony identifying mind with brain; the identity that gave birth to the formal abstractions of brain and intelligence we know as `neural networks'. The chapter concludes by analysing this identity - of intelligence and mind with mere abstractions of neural behaviour - by reviewing various philosophical critiques of formal connectionist explanations of `human understanding', `mathematical insight' and `consciousness'; critiques which, if correct, in an echo of Aristotelian insight, sug- gest that cognition may be more profitably understood not just as a result of [mere abstractions of] neural firings, but as a consequence of real, embodied neural behaviour, emerging in a brain, seated in a body, embedded in a culture and rooted in our world; the so called 4Es approach to cognitive science: the Embodied, Embedded, Enactive, and Ecological conceptions of mind. Contents 1. Introduction: the body and the brain 2 2. First steps towards modelling the brain 9 3. Learning: the optimisation of network structure 15 4. The fall and rise of connectionism 18 5. Hopfield networks 23 6. The `adaptive resonance theory' classifier 25 7. The Kohonen `feature-map' 29 8. The multi-layer perceptron 32 9. Radial basis function networks 34 10. -
Connectionist Models of Cognition Michael S. C. Thomas and James L
Connectionist models of cognition Michael S. C. Thomas and James L. McClelland 1 1. Introduction In this chapter, we review computer models of cognition that have focused on the use of neural networks. These architectures were inspired by research into how computation works in the brain and subsequent work has produced models of cognition with a distinctive flavor. Processing is characterized by patterns of activation across simple processing units connected together into complex networks. Knowledge is stored in the strength of the connections between units. It is for this reason that this approach to understanding cognition has gained the name of connectionism. 2. Background Over the last twenty years, connectionist modeling has formed an influential approach to the computational study of cognition. It is distinguished by its appeal to principles of neural computation to inspire the primitives that are included in its cognitive level models. Also known as artificial neural network (ANN) or parallel distributed processing (PDP) models, connectionism has been applied to a diverse range of cognitive abilities, including models of memory, attention, perception, action, language, concept formation, and reasoning (see, e.g., Houghton, 2005). While many of these models seek to capture adult function, connectionism places an emphasis on learning internal representations. This has led to an increasing focus on developmental phenomena and the origins of knowledge. Although, at its heart, connectionism comprises a set of computational formalisms, -
CAP 5636 - Advanced Artificial Intelligence
CAP 5636 - Advanced Artificial Intelligence Introduction This slide-deck is adapted from the one used by Chelsea Finn at CS221 at Stanford. CAP 5636 Instructor: Lotzi Bölöni http://www.cs.ucf.edu/~lboloni/ Slides, homeworks, links etc: http://www.cs.ucf.edu/~lboloni/Teaching/CAP5636_Fall2021/index.html Class hours: Tue, Th 12:00PM - 1:15PM COVID considerations: UCF expects you to get vaccinated and wear a mask Classes will be in-person, but will be recorded on Zoom. Office hours will be over Zoom. Motivating artificial intelligence It is generally not hard to motivate AI these days. There have been some substantial success stories. A lot of the triumphs have been in games, such as Jeopardy! (IBM Watson, 2011), Go (DeepMind’s AlphaGo, 2016), Dota 2 (OpenAI, 2019), Poker (CMU and Facebook, 2019). On non-game tasks, we also have systems that achieve strong performance on reading comprehension, speech recognition, face recognition, and medical imaging benchmarks. Unlike games, however, where the game is the full problem, good performance on a benchmark does not necessarily translate to good performance on the actual task in the wild. Just because you ace an exam doesn’t necessarily mean you have perfect understanding or know how to apply that knowledge to real problems. So, while promising, not all of these results translate to real-world applications Dangers of AI From the non-scientific community, we also see speculation about the future: that it will bring about sweeping societal change due to automation, resulting in massive job loss, not unlike the industrial revolution, or that AI could even surpass human-level intelligence and seek to take control. -
Stochastic Gradient Descent Learning and the Backpropagation Algorithm
Stochastic Gradient Descent Learning and the Backpropagation Algorithm Oliver K. Ernst Department of Physics University of California, San Diego La Jolla, CA 92093-0354 [email protected] Abstract Many learning rules minimize an error or energy function, both in supervised and unsupervised learning. We review common learning rules and their relation to gra- dient and stochastic gradient descent methods. Recent work generalizes the mean square error rule for supervised learning to an Ising-like energy function. For a specific set of parameters, the energy and error landscapes are compared, and con- vergence behavior is examined in the context of single linear neuron experiments. We discuss the physical interpretation of the energy function, and the limitations of this description. The backpropgation algorithm is modified to accommodate the energy function, and numerical simulations demonstrate that the learning rule captures the distribution of the network’s inputs. 1 Gradient and Stochastic Gradient Descent A learning rule is an algorithm for updating the weights of a network in order to achieve a particular goal. One particular common goal is to minimize a error function associated with the network, also referred to as an objective or cost function. Let Q(w) denote the error function associated with a network with connection weights w. A first approach to minimize the error is the method of gradient descent, where at each iteration a step is taken in the direction corresponding to Q(w). The learning rule that follows is −∇ w := w η Q(w); (1) − r where η is a constant parameter called the learning rate. Consider being given a set of n observations xi; ti , where i = 1; 2:::n. -
The Hebbian-LMS Learning Algorithm
Michael Margaliot TheTel Aviv University, Hebbian-LMS Israel Learning Algorithm ©ISTOCKPHOTO.COM/BESTDESIGNS Bernard Widrow, Youngsik Kim, and Dookun Park Abstract—Hebbian learning is widely accepted in the fields of psychology, neurology, and neurobiol- Department of Electrical Engineering, ogy. It is one of the fundamental premises of neuro- Stanford University, CA, USA science. The LMS (least mean square) algorithm of Widrow and Hoff is the world’s most widely used adaptive algorithm, fundamental in the fields of signal processing, control systems, pattern recognition, and arti- I. Introduction ficial neural networks. These are very different learning onald O. Hebb has had considerable influence paradigms. Hebbian learning is unsupervised. LMS learn- in the fields of psychology and neurobiology ing is supervised. However, a form of LMS can be con- since the publication of his book “The structed to perform unsupervised learning and, as such, LMS can be used in a natural way to implement Hebbian learn- Organization of Behavior” in 1949 [1]. Heb- ing. Combining the two paradigms creates a new unsuper- Dbian learning is often described as: “neurons that fire vised learning algorithm that has practical engineering together wire together.” Now imagine a large network of applications and provides insight into learning in living interconnected neurons whose synaptic weights are neural networks. A fundamental question is, how does increased because the presynaptic neuron and the postsynap- learning take place in living neural networks? “Nature’s little secret,” the learning algorithm practiced by tic neuron fired together. This might seem strange. What nature at the neuron and synapse level, may well be purpose would nature fulfill with such a learning algorithm? the Hebbian-LMS algorithm. -
A Review of "Perceptrons: an Introduction to Computational Geometry"
INFORMATION AND CONTROL 17, 501-522 (1970) A Review of "Perceptrons: An Introduction to Computational Geometry" by Marvin Minsky and Seymour Papert. The M.I.T. Press, Cambridge, Mass., 1969. 112 pages. Price: Hardcover $12.00; Paperback $4.95. H. D. BLOCK Department of Theoretical and Applied 2Plechanics, Cornell University, Ithaca, New York 14850 1. INTRODUCTION The purpose of this book is to present a mathematical theory of the class of machines known as Perceptrons. The theory is carefully formulated and focuses on the theoretical capabilities and limitations of these machines. It is a remarkable book. Not only do the authors formulate a new and fundamental conceptual framework, but they also fill in the details using strikingly ingenious mathematical techniques. They ask some novel questions and find some difficult answers. The most striking of these will be presented in Section 2. The authors address the book to three classes of readers: (1) Computer scientists, specializing in pattern recognition, learning machines, and threshold logic; (2) Abstract mathematicians interested in the d6but of Computational Geometry; (3) Those interested in a general theory of computation leading to decisions based on the weight of partial evidence. The authors hope that this class includes psychologists and biologists. In Section 6 I shall give my estimate of the value of the book to each of these groups. The conversational style and the childlike freehand sketches might mislead the casual reader into believing that this book makes light reading. For example, the review in The American Mathematical Monthly (1969) states that the prospective reader "requires little mathematics beyond the high 501 502 BLOCK school level." This, as we shall see, is somewhat sanguine. -
Artificial Neural Networks Supplement to 2001 Bioinformatics Lecture on Neural Nets
Artificial Neural Networks Supplement to 2001 Bioinformatics Lecture on Neural Nets ಲং ྼႨ E-mail: [email protected] http://scai.snu.ac.kr./~btzhang/ Byoung-Tak Zhang School of Computer Science and Engineering SeoulNationalUniversity Outline 1. Basic Concepts of Neural Networks 2. Simple Perceptron and Delta Rule 3. Multilayer Perceptron and Backpropagation Learning 4. Applications of Neural Networks 5. Summary and Further Information 2 1. Basic Concepts of Neural Networks The Brain vs. Computer 1. 1011 neurons with 1. A single processor with 1014 synapses complex circuits 2. Speed: 10-3 sec 2. Speed: 10 –9 sec 3. Distributed processing 3. Central processing 4. Nonlinear processing 4. Arithmetic operation (linearity) 5. Parallel processing 5. Sequential processing 4 What Is a Neural Network? ! A new form of computing, inspired by biological (brain) models. ! A mathematical model composed of a large number of simple, highly interconnected processing elements. ! A computational model for studying learning and intelligence. 5 From Biological Neuron to Artificial Neuron 6 From Biology to Artificial Neural Networks (ANNs) 7 Properties of Artificial Neural Networks ! A network of artificial neurons ! Characteristics " Nonlinear I/O mapping " Adaptivity " Generalization ability " Fault-tolerance (graceful degradation) " Biological analogy <Multilayer Perceptron Network> 8 Synonyms for Neural Networks ! Neurocomputing ! Neuroinformatics (Neuroinformatik) ! Neural Information Processing Systems ! Connectionist Models ! Parallel Distributed Processing (PDP) Models ! Self-organizing Systems ! Neuromorphic Systems 9 Brief History ! William James (1890): Describes (in words and figures) simple distributed networks and Hebbian Learning ! McCulloch & Pitts (1943): Binary threshold units that perform logical operations (they prove universal computations!) ! Hebb (1949): Formulation of a physiological (local) learning rule ! Rosenblatt (1958): The Perceptron - a first real learning machine ! Widrow & Hoff (1960): ADALINE and the Windrow-Hoff supervised learning rule.