Approximately as appeared in: Learning and Computational Neuroscience: Foundations of Adaptive Networks, M. Gabriel and J. Moore, Eds., pp. 497–537. MIT Press, 1990.

Chapter 12

Time-Derivative Models of Pavlovian Reinforcement

Richard S. Sutton
Andrew G. Barto

This chapter presents a model of classical conditioning called the temporal-difference (TD) model. The TD model was originally developed as a neuron-like unit for use in adaptive networks (Sutton and Barto 1987; Sutton 1984; Barto, Sutton and Anderson 1983). In this chapter, however, we analyze it from the point of view of animal learning theory. Our intended audience is both animal learning researchers interested in computational theories of behavior and machine learning researchers interested in how their learning algorithms relate to, and may be constrained by, animal learning studies. For an exposition of the TD model from an engineering point of view, see Chapter 13 of this volume.

We focus on what we see as the primary theoretical contribution of the TD and related models to animal learning theory: the hypothesis that reinforcement in classical conditioning is the time derivative of a composite association combining innate (US) and acquired (CS) associations. We call models based on some variant of this hypothesis time-derivative models; examples include the models of Klopf (1988), Sutton and Barto (1981a), Moore et al. (1986), Hawkins and Kandel (1984), Gelperin, Hopfield and Tank (1985), Tesauro (1987), and Kosko (1986). We examine several of these models in relation to the TD model. We also briefly explore relationships with animal learning theories of reinforcement, including Mowrer's drive-induction theory (Mowrer 1960) and the Rescorla-Wagner model (Rescorla and Wagner 1972).
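Stated schematically, in simplified notation that anticipates the fuller development later in the chapter: let $\bar{Y}_t$ denote the composite association at time step $t$, the sum of the innate association activated by the US, $\lambda_t$, and the acquired associations activated by whatever CSs are present. A time-derivative model then identifies reinforcement with the rate of change of this composite,

$$ \text{reinforcement at } t \;=\; \dot{\bar{Y}}(t) \;\approx\; \bar{Y}_t - \bar{Y}_{t-1}, $$

so that reinforcement is positive when the composite association is rising (as at US onset, or at the onset of a CS that predicts the US) and negative when it is falling.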