Supplementary Eye Field Encodes Reward Prediction Error
2950 • The Journal of Neuroscience, February 29, 2012 • 32(9):2950–2963

Behavioral/Systems/Cognitive

NaYoung So¹ and Veit Stuphorn¹,²
¹Department of Neuroscience, The Johns Hopkins University School of Medicine and Zanvyl Krieger Mind/Brain Institute, and ²Department of Psychological and Brain Sciences, The Johns Hopkins University, Baltimore, Maryland 21218

The outcomes of many decisions are uncertain and therefore need to be evaluated. We studied this evaluation process by recording neuronal activity in the supplementary eye field (SEF) during an oculomotor gambling task. While the monkeys awaited the outcome, SEF neurons represented attributes of the chosen option, namely, its expected value and the uncertainty of this value signal. After the gamble result was revealed, a number of neurons reflected the actual reward outcome. Other neurons evaluated the outcome by encoding the difference between the reward expectation represented during the delay period and the actual reward amount (i.e., the reward prediction error). Thus, SEF encodes not only reward prediction error but also all the components necessary for its computation: the expected and the actual outcome. This suggests that SEF might actively evaluate value-based decisions in the oculomotor domain, independent of other brain regions.

Received Aug. 29, 2011; revised Dec. 17, 2011; accepted Jan. 9, 2012.
Author contributions: N.Y.S. and V.S. designed research; N.Y.S. performed research; N.Y.S. analyzed data; N.Y.S. and V.S. wrote the paper.
This work was supported by National Eye Institute Grant R01-EY019039 (V.S.). We are grateful to P. C. Holland and J. D. Schall for comments on this manuscript.
Correspondence should be addressed to Veit Stuphorn, The Johns Hopkins University, 338 Krieger Hall, 3400 North Charles Street, Baltimore, MD 21218. E-mail: [email protected].
DOI:10.1523/JNEUROSCI.4419-11.2012
Copyright © 2012 the authors 0270-6474/12/322950-14$15.00/0

Introduction

In most real-life decisions, the outcomes are uncertain or can change over time. The values assigned to particular options or actions are only approximations and need to be continually updated. This evaluation process can be conceptualized in terms of two different signals: valence and salience, which describe the sign and size of the mismatch between expected and actual outcome, respectively. Valence reflects rewards and punishments in an opponent-like manner, while salience reflects their motivational importance, independent of value.

Various neuronal correlates of outcome evaluation signals have been reported (Stuphorn et al., 2000; Belova et al., 2007; Hayden et al., 2008; Matsumoto and Hikosaka, 2009b; Roesch et al., 2010). Progress in understanding these signals has come from reinforcement learning models (Rescorla and Wagner, 1972; Pearce and Hall, 1980). A key element in many of these models is the “reward prediction error” (RPE) (i.e., the difference between the expected and the actual reward outcome) (Sutton and Barto, 1998). The RPE reflects both valence and salience of an outcome using its sign and magnitude. Although many studies reported RPE-related signals in various brain areas, it is still unclear where these evaluative signals originate.

The outcomes of many real-life decisions not only are uncertain but also manifest themselves only after some delay. To compute a RPE signal, it is therefore necessary to represent information about the choice during this delay period. As a result, investigating these two temporally and qualitatively different evaluative stages can indicate whether a brain area contains all the neuronal signals sufficient to compute RPE signals.

We hypothesized that the supplementary eye field (SEF) plays a critical role in the evaluation of value-based decisions. SEF encodes the incentive value of saccades (i.e., their action value) (So and Stuphorn, 2010; Stuphorn et al., 2010) and therefore likely plays an important role in value-based decisions in the oculomotor domain. It has appropriate anatomical connections both to limbic and prefrontal areas that encode value signals and oculomotor areas that generate eye movements (Huerta and Kaas, 1990). However, SEF neurons responded also to the anticipation of reward delivery (Amador et al., 2000; Stuphorn et al., 2000), as well as to errors and response conflict (Stuphorn et al., 2000; Nakamura et al., 2005). Thus, the SEF contains in the same cortical area both action value signals and evaluative signals that could be used to update the action value signals.

To investigate the role of SEF neurons in value-based decision making, we designed an oculomotor gambling task (see Fig. 1). Our results show that, during the delay period, SEF neurons represented the subjective value of the chosen option and the uncertainty associated with this outcome prediction. After the gamble result was revealed, some neurons represented the actual outcome, while others encoded the difference between the predicted and the actual reward amount (i.e., the RPE). These findings suggest that SEF is capable of independently evaluating value-based decisions, as all the necessary components for the computation of the RPE signals are represented there.

Materials and Methods

General

Two rhesus monkeys (both male; monkey A, 7.5 kg; monkey B, 8.5 kg) were trained to perform the tasks used in this study. All animal care and experimental procedures were approved by The Johns Hopkins University Animal Care and Use Committee. During the experimental sessions, each monkey was seated in a primate chair, with its head restrained, facing a video screen. Eye position was monitored with an infrared corneal reflection system (Eye Link; SR Research) and recorded with the PLEXON system (Plexon) at a sampling rate of 1000 Hz. We used a newly developed fluid delivery system that was based on two syringe pumps connected to a fluid container that were controlled by a stepper motor. This system was highly accurate across the entire range of fluid amounts used in the experiment.

Behavioral task

In the gambling task, the monkeys had to make saccades to peripheral targets that were associated with different amounts of water reward (see Fig. 1A). The fixation window around the central fixation point was ±1° of visual angle. The targets were squares of various colors, 2.25 × 2.25° of visual angle in size. They were always presented 10° away from the central fixation point at a 45, 135, 225, or 315° angle. The luminance of the background was 11.87 cd/m². The luminance of the targets varied with color (ranging from 25.81 to 61.29 cd/m²) but was in all cases highly salient. The task consisted of two types of trials, choice trials and no-choice trials. In choice trials, two targets appeared on the screen and the monkeys were free to choose between them by making an eye movement to the target that was associated with the desired option. In no-choice trials, only one target appeared on the screen so that the monkeys were forced to make a saccade to the given target. No-choice trials were designed as a control to compare the behavior of the monkeys and the cell activities when no decision was required.

Two targets in each choice trial were associated with a gamble option and a sure option, respectively. The sure option always led to a certain reward amount. The gamble option led with a certain set of probabilities to one of two possible reward amounts. We designed a system of color cues, to explicitly indicate to the monkeys the reward amounts and probabilities associated with a particular target (see Fig. 1B). Seven different colors indicated seven reward amounts (increasing from 1 to 7 units of water, where 1 unit equaled 30 μl). Targets indicating a sure option

the same reward options as in the aborted trial. In this way, the monkey was forced to sample every reward contingency evenly. The location of the targets, however, was randomized, so that the monkey could not prepare a saccade in advance.

The sequence of events in no-choice trials was the same as in choice trials, except that only one target was presented (see Fig. 1A). The location of the target was randomized across the same four quadrants on the screen that were used during choice trials. In no-choice trials, we used individually all seven sure and six gamble options that were presented in combination during choice trials. We presented no-choice and choice trials interleaved in a pseudorandomized schedule in blocks of trials that consisted of all 24 different choice trials and 13 different no-choice trials. This procedure ensured that the monkeys were exposed to all the trial types equally often. Within a block, the order of appearance was randomized and a particular trial was never repeated, so that the monkeys could not make a decision before the targets were shown. Randomized locations of the targets in each trial also prevented the monkeys from preparing a movement toward a certain direction before the target appearance. In addition, presenting a target associated with the same option in different locations allowed us to separate the motor decision from the value decision.

Estimation of subjective value of gamble options

In everyday life, a behavioral choice can yield two or more outcomes of varying value with different probabilities. A decision maker that is indifferent to risk should base his decision on the sum of values of the various outcomes weighted by their probabilities (i.e., the expected value of the gamble). However, humans and animals are not indifferent to risk and their actual decisions deviate from this prediction in a systematic fashion. Thus, the subjective value of a gamble depends on the risk attitude of a decision maker.