Aversive Reinforcement Learning

Ben Seymour
Wellcome Trust Centre for Neuroimaging, UCL
12 Queen Square, London WC1N 3BG

Supervisors: Ray Dolan, Richard Frackowiak, Karl Friston

Submitted for the consideration of a PhD in Neurological Science.

Declaration.
I, Ben Seymour, confirm that the work presented in this thesis is my own. Where information has been derived from other sources, I confirm that this has been indicated in the thesis.

Abstract.
We hypothesise that human aversive learning can be described algorithmically by Reinforcement Learning models. Our first experiment uses a second-order conditioning design to study sequential outcome prediction. We show that aversive prediction errors are expressed robustly in the ventral striatum, supporting the validity of temporal difference algorithms (as in reward learning), and suggesting a putative critical area for appetitive-aversive interactions. With this in mind, the second experiment explores the nature of pain relief, which, as expounded in theories of motivational opponency, is rewarding. In a Pavlovian conditioning task with phasic relief of tonic noxious thermal stimulation, we show that both appetitive and aversive prediction errors are co-expressed in anatomically dissociable regions (in a mirror opponent pattern) and that striatal activity appears to reflect integrated appetitive-aversive processing. Next we designed a Pavlovian task in which cues predicted either financial gains, losses, or both, thereby forcing integration of both motivational streams. This showed anatomical dissociation of aversive and appetitive predictions along a posterior-anterior gradient within the striatum, respectively. Lastly, we studied aversive instrumental control (avoidance). We designed a simultaneous pain avoidance and financial reward learning task, in which subjects had to learn independently about each, and trade off aversive and appetitive predictions.
We show that predictions for both converge on the medial head of the caudate nucleus, suggesting that this is a critical site for appetitive-aversive integration in instrumental decision making. We also tested whether serotonin (5HT) modulates either phasic or tonic opponency using acute tryptophan depletion. Both behavioural and imaging data confirm the latter, in which it appears to mediate an average reward term, providing an aspiration level against which the benefits of exploration are judged. In summary, our data provide a basic computational and neuroanatomical framework for human aversive learning. We demonstrate the algorithmic and implementational validity of reinforcement learning models for both aversive prediction and control, illustrate the nature and neuroanatomy of appetitive-aversive integration, and discover the critical (and somewhat unexpected) central role for the striatum.

Acknowledgements.
I would like to thank my supervisors Karl Friston, Richard Frackowiak and Ray Dolan for their help and support. I would also especially like to thank John O'Doherty, Nathaniel Daw and Peter Dayan for their substantial scientific guidance and mentorship throughout my doctoral research. I also thank a number of valued collaborators, including Chris Frith, Martin Koltzenburg, Anthony Jones, Katya Wiech, Tania Singer, Jon Roiser, Ivo Vlaev and Nick Chater, and many other fellow fellows at the Functional Imaging Lab and broader UCL neuroscience community. I would also like to thank the Wellcome Trust for their financial support, and the Gatsby Charitable Foundation for supporting the 'Pain and Learning' workshop in 2006, organised by myself, Martin Koltzenburg and Peter Dayan.

Contents.

1. Introduction
    1.1 Summary
    1.2 Animal aversive learning theory
        1.2.1 Innate and Pavlovian value
        1.2.2 Action learning
        1.2.3 Avoidance learning
    1.3 Neurobiological basis of aversive learning
        1.3.1 Ascending nociceptive pathways
        1.3.2 Anticipation of pain
        1.3.3 Aversive learning systems
    1.4 Reinforcement learning
        1.4.1 General principles of a computational framework
        1.4.2 Formalising motivation and learning
2. Methods
    2.1 Physics of MRI
    2.2 Analysis of fMRI data
    2.3 Experimental design
    2.4 Psychophysical measures
    2.5 Pain stimulation
3. Experiment 1: Second order Pavlovian aversive learning
    3.1 Introduction
    3.2 Methods
    3.3 Results
    3.4 Discussion
4. Experiment 2: Appetitive-aversive Pavlovian learning of phasic relief and exacerbations of tonic pain
    4.1 Introduction
    4.2 Methods
    4.3 Results
    4.4 Discussion
5. Experiment 3: Appetitive-aversive Pavlovian learning of mixed monetary gains and losses
    5.1 Introduction
    5.2 Methods
    5.3 Results
    5.4 Discussion
6. Experiment 4: Instrumental learning for monetary gains and avoidance of pain, and the effect of tryptophan depletion
    6.1 Introduction
    6.2 Methods
    6.3 Results
    6.4 Discussion
7. Discussion: contributions
    7.1 Methodological contributions
        7.1.1 Computational approach to pain
        7.1.2 Model-based fMRI
    7.2 Computational and psychological contributions
        7.2.1 The validity of TD models for aversive learning
        7.2.2 Extension of TD to avoidance learning
        7.2.3 The existence of opponent motivational systems
        7.2.4 The integrated choice model
        7.2.5 Average reward models
    7.3 Neurobiological contributions
        7.3.1 The role of the basal ganglia in aversion
        7.3.2 The anatomy of opponent systems
        7.3.3 The role of serotonin
8. Discussion: the architecture of aversive motivation
    8.1 Value systems
        8.1.1 Innate values
        8.1.2 Forward model values
        8.1.3 Cached values
        8.1.4 Long run values
    8.2 Control (actions)
        8.2.1 Goal-directed
        8.2.2 Habits
        8.2.3 Pavlovian
    8.3 Constructing aversive behaviour
        8.3.1 Pavlovian-instrumental interactions
        8.3.2 Avoidance
    8.4 The role of the amygdala in aversive motivation
9. Discussion: consequences for behavioural economics
    9.1 Historical and methodological issues
    9.2 Pavlovian influences on choice
        9.2.1 Impulsivity
        9.2.2 Framing effects
        9.2.3 Depressive realism
        9.2.4 Dread
    9.3 Explicit judgement and value relativity
        9.3.1 Behavioural evidence
        9.3.2 Neurobiological insights
            9.3.2.1 Relative coding of value
            9.3.2.2 Adaptive scaling
            9.3.2.3 Expectation, inference and placebo
            9.3.2.4 Equating value in transactions
        9.3.6 Discussion
    9.4 Aversion in social environments
        9.4.1 Introduction
        9.4.2 Experimental observations of punishment in animals and humans
        9.4.3 Neuroimaging studies in humans
        9.4.3 A neurobiological model
        9.4.4 Altruistic punishment

Publications arising directly from this submission.

1. Ben Seymour, John O'Doherty, Peter Dayan, et al. Temporal difference models describe higher-order learning in humans. Nature. 2004 Jun 10;429(6992):664-7. (Chapter 3).
2. Ben Seymour, John O'Doherty, Martin Koltzenburg, et al. Opponent appetitive-aversive neural processes underlie predictive learning of pain relief. Nature Neuroscience. 2005 Sep;8(9):1234-40. (Chapter 4).
3. Ben Seymour, Tania Singer, Ray Dolan. The neurobiology of punishment. Nature Reviews Neuroscience 2007 8; 300-11. (Chapter 1 and 7).
4. Ben Seymour, Nathaniel Daw, Peter Dayan, Tania Singer, Ray Dolan. Differential responses to gains and losses in human striatum. Journal of Neuroscience. 2007 May 2;27(18):4826-31.
5. Peter Dayan, Ben Seymour. Values and actions in aversion. In Neuroeconomics: decision making in the brain. Eds. Glimcher, Fehr, Camerer and Poldrack. Elsevier 2008. (Chapter 1 and 7).
6. Ben Seymour and Ray Dolan. Emotion, decision making and the amygdala. Neuron June 12th 2008. (Chapter 1 and 7).
7. Ben Seymour and Sam McClure. Anchors, scales and relative coding of value in the brain. Current Opinion in Neurobiology. 2008 Apr;18(2):173-8. (Chapter 7).
8. Ivo Vlaev, Ben Seymour, Ray Dolan, Nick Chater. The price of pain and the value of suffering. Psychological Science, 2009. (Chapter 7).
9. Ben Seymour, Wako Yoshida, Ray Dolan. Altruistic Learning. Frontiers in Neuroscience, 23:3 2009. (Chapter 7).
10. Ben Seymour, Nathaniel Daw, Peter Dayan, Jon Roiser, Karl Friston, Ray Dolan. Serotonin mediates tonic appetitive-aversive interactions in human striatum. Submitted. (Chapter 6).

Figures.

1.1 Basic classification of
