Modeling Reward-Guided Decision-Making with a Biophysically Plausible Attractor Network and the Belief-Dependent Learning Rule
Rebecca Sier

A master project executed at the Neural Information Processing Group of the Technische Universität Berlin as part of the final requirements for fulfillment of the Master of Science in Brain and Cognitive Sciences of the Universiteit van Amsterdam

January – September 2014

Supervisor: Audrey Houillon, PhD
UvA representative: Dr. Leendert van Maanen
Co-assessor: Dr. Leendert van Maanen
41 ECTS
Final date: September 24, 2014
Student ID: 5893011
Study track: Cognition

TABLE OF CONTENTS

Abstract
Acknowledgements
Introduction
Methods
  Theory on models of reward-guided decision-making
    Making decisions: an attractor model of perceptual decision-making
    Learning from decisions: the belief-dependent learning rule
    Approximations of the attractor model for reward-guided decision-making
  Experimental methods
    Participants
    Testing paradigm
    Behavioral and imaging data
  Computational model simulations
    The sigmoid probability function
    The reduced attractor model for decision-making
Results
  Training phase
    Estimating the noise parameter σ
    Learning behavior: age-independent
    Learning behavior: age-dependent
  Performance phase
    Fitting the noise parameter σ
    Age independent
    Age dependent
  Physiological results
Discussion
  Model performance
  Testing the model hypotheses by Frank et al.
References
Appendix
  Part A: surface plots for fitting the sigmoid function to training phase behavior
  Part B: learning curves and learning phase
    Learning curves with noise parameter σ = 0.0970
    Learning curves dependent on age, two values of sigma
  Part C: age independent performance behavior with σ = 0.15

ABSTRACT

Elderly people and patients suffering from Parkinson’s disease learn more from negative than from positive decision outcomes, compared to healthy controls.
Previous research hypothesized that this error-avoidant behavior is partly caused by reduced levels of dopamine, and a model of this effect in the basal ganglia was constructed. However, the model is subject to discussion and hard to verify at the level of the basal ganglia. This thesis hypothesizes that the attractor model for decision-making combined with the belief-dependent learning rule can verify the basal ganglia model. It is shown how this combined model includes features of the basal ganglia model and explains them in greater detail. Simulations of the combined model make both correct and incorrect predictions of the reward-guided decision behavior and neurotransmitter values of subjects performing a probabilistic selection task. However, the empirical data are limited, and future research is needed to decide on the usefulness and the correct implementation of the model.

ACKNOWLEDGEMENTS

The research leading to this thesis was performed at the Neural Information Processing Group at the Technische Universität Berlin, during the spring and summer of 2014. The project was partly funded by Erasmus+ and the Spinozafonds of the Amsterdams Universiteitsfonds. I would like to thank the colleagues I had at the research group, who gave me a warm welcome to the TU Berlin and to their great city. Special thanks to Audrey Houillon, PhD, who was a great supervisor throughout the project, providing comments and asking me the right questions even during an Asian holiday. Thank you, Prof. Dr. Klaus Obermayer, for giving me the opportunity to work in your research group. And finally, I thank co-supervisor Dr. Leendert van Maanen and the people from the University of Amsterdam’s research master Brain & Cognitive Sciences, who gave me the freedom to create my own, more computationally oriented program within the master.

INTRODUCTION

The ability to learn from the punishment or reward that follows a decision is an essential function of cognition.
Consequently, reinforcement learning, or the ability to incorporate feedback to improve future actions, is a much-researched topic in psychology and neuroscience. A correct understanding of how reward-guided decision-making works in the brain is very useful: it opens doors to better care for people with learning and decision-making disabilities and increases fundamental knowledge of an essential part of human functioning.

Deviations from normal reward-guided decision-making behavior provide first insights into its functioning. Compared to healthy controls, patients suffering from Parkinson’s disease (PD) who are off dopamine (DA) medication learn more from negative decision outcomes and less from positive decision outcomes (Frank, Seeberger, & O’Reilly, 2004). Similar results are found in older seniors, who have been shown to be more error-avoidant than younger seniors (Frank & Kong, 2008). Shared by both PD patients and elderly people is a depleted tonic DA level. When this deficit is compensated with DA medication, the learning bias towards negative outcomes is reversed: PD patients on DA medication learn more from correct than from incorrect decisions (Frank et al., 2004). Similarly, the DA hypothesis of cognitive aging states that cognitive changes or impairments related to aging are at least partly caused by a decline in DA level (Frank & Kong, 2008). Younger adults on drugs that reduce DA levels were shown to have the same error-avoidance bias as elderly people, while drugs that increase DA levels remove the negative bias (Frank & Kong, 2008).

In line with the mentioned correlations between DA level and reward-guided decision-making, Frank et al. (Frank & Kong, 2008; Frank et al., 2004) hypothesized that DA plays a crucial role in reward-guided decision-making. Moreover, error-avoidant behavior is supposed to be partly caused by reduced levels of DA. Exactly how DA level can have this effect is described in a biologically based network model.
In the model for reward-guided decision-making by Frank et al. (Frank & Kong, 2008; Frank et al., 2004; Frank, 2005), DA is supposed to play a modulatory role, influencing pathways through nuclei of the basal ganglia (BG) that are related to decision-making. According to the model, the “Go” and “NoGo” (or “direct” and “indirect”) pathways together act as a gate, respectively facilitating and suppressing the different competing response options (Mink, 2003; Nambu, Tokuno, & Takada, 2002; Nambu, 2004). Activity of a choice alternative’s “Go” pathway in the BG increases the probability that this alternative is chosen, while the corresponding “NoGo” pathway subtracts from this probability. With learning, DA modulates the pathways in order to change these probabilities differentially over the available stimulus-response options.

The BG model by Frank et al. correctly predicts how a lower tonic DA level disrupts reinforcement learning (Frank et al., 2004): small negative phasic DA changes suffice to reach the lower threshold for synaptic plasticity, while the positive threshold is reached only in the case of large positive DA changes. This explains why PD patients, elderly people and healthy subjects on DA-reducing drugs learn more easily from negative than from positive reinforcement. The opposite is true for elevated tonic DA levels, which result in reward-seeking instead of error-avoidant decision-making.

Despite the correct predictions that the model by Frank et al. makes, it is hard to verify. The exact function of the BG is still under investigation and has been shown to be less clear than assumed in Frank et al.’s model (Utter & Basso, 2008). The two segregated pathways have been shown to be an oversimplification of the actual circuitry of the BG, even when a third “hyperdirect” pathway is included (Parent & Hazrati, 1995). A closer look at the anatomy of the BG shows that nuclei in the Go pathway are not strictly segregated from the NoGo pathway and vice versa.
DA receptors D1 and D2 – which are supposed to be segregated per pathway according to the Frank et al. model – have been shown to be located at the inputs of both pathways. Due to its simplifications, the model by Frank et al. is as yet unable to give an analytical account of reward-guided decision-making. Model predictions of subjects’ learning and decision behavior are restricted to being qualitative (Frank et al., 2004).

In order to test the model predictions made by Frank et al., this thesis aims to construct and test an analytical model of reward-guided decision-making. The model requires detailed knowledge of the neural system, specifically when it comes to individual neurons’ electrochemical dynamics and interconnectivity. A biophysically plausible attractor model for perceptual decision-making created by Wang (2002) will be used, including parameters for individual neuron, neurotransmitter and synaptic gating dynamics. This model has been shown to successfully predict behavioral and electrophysiological decision-making data. A learning rule at the same, detailed level of analysis was created by Soltani, Lee and Wang (2006). In this thesis it is hypothesized that a combination of the attractor model for decision-making and the learning rule can analytically predict reward-guided decision-making at the neural network level, capturing the effects of age and dopamine on learning behavior.

This thesis is structured as follows. The methods section first introduces the theory behind the attractor model for decision-making (Wang, 2002), the addition of a learning rule (Soltani et al., 2006) and the model approximations that are used in the current experiment. Secondly, the experimental paradigm that was used to empirically test the reward-guided decision-making behavior of human subjects is explained in detail.
The empirical data consist of age-dependent reward-guided learning behavior and neurochemical activity assessed with magnetic resonance spectroscopy (MRS) and positron emission tomography (PET). The third part of the methods section provides the details of the computational model that was used to simulate the tested subjects’ behavior. Moreover, this section shows how the belief-dependent learning rule can be used in combination with a mean-field reduction of the attractor model for decision-making, which has not been done before.
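
As a rough illustration of the threshold account of the Frank et al. model given earlier in this introduction, the following Python sketch treats the DA level as the sum of a tonic level and a phasic change, and assumes that plasticity occurs only when this sum crosses a fixed upper or lower threshold. The threshold values, the tonic levels and all function names are hypothetical and chosen only for illustration; they are not taken from Frank et al. or from the simulations in this thesis.

```python
# Toy illustration of the threshold idea, NOT the network model by Frank et al.
# All numbers below are hypothetical and in arbitrary units.

UPPER_THRESHOLD = 1.5  # assumed DA level needed for learning from reward
LOWER_THRESHOLD = 0.5  # assumed DA level needed for learning from punishment


def plasticity(tonic_da, phasic_change):
    """Return which kind of learning a phasic DA change would trigger.

    "go"   : DA level reaches the upper threshold (learning from reward)
    "nogo" : DA level reaches the lower threshold (learning from punishment)
    "none" : neither threshold is reached
    """
    level = tonic_da + phasic_change
    if level >= UPPER_THRESHOLD:
        return "go"
    if level <= LOWER_THRESHOLD:
        return "nogo"
    return "none"


if __name__ == "__main__":
    # Same phasic burst and dip, applied at three assumed tonic DA levels.
    for label, tonic in [("low", 0.7), ("normal", 1.0), ("high", 1.3)]:
        burst = plasticity(tonic, +0.3)
        dip = plasticity(tonic, -0.3)
        print(f"tonic DA {label}: burst -> {burst}, dip -> {dip}")
```

With these assumed numbers, a phasic change of 0.3 triggers only learning from negative outcomes when the tonic level is low, only learning from positive outcomes when the tonic level is high, and no learning at the intermediate tonic level. This mirrors the qualitative asymmetry described for the BG model, but it says nothing about the actual size of the thresholds or DA changes.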