Reward and Loss Task


Supplementary Information

Reward and Loss Task

Before scanning, participants completed a short training session on the task, using pairs of visual stimuli different from those used in the scanner. Participants were told that they would be presented with three pairs of pictures and that on each trial they would select one of the two stimuli. Depending on their choices, they were shown messages indicating a change or no change in points: "win" or "nothing" on reward trials, "lose" or "nothing" on loss trials, and "no change" (in points) or "nothing" on neutral trials. Participants were instructed to try to accumulate as many points as possible and were told that, at the end of the scanning session, wins and losses would be summed and they would receive a gift voucher for an amount based on their accumulated points. This was calculated as (net points)/2 + 10. On average, volunteers received a gift voucher for £16. Volunteers played four sessions of the task whilst being scanned, each session lasting 13 minutes.

Each session contained 20 trials of each condition (reward, neutral, loss). The sequence of trial types and the inter-trial timing variation ("jitter") were determined using the Optseq algorithm (http://surfer.nmr.mgh.harvard.edu/optseq/), designed to optimize detection of the neural signals of interest.

Reinforcement Learning Model

A reinforcement learning algorithm was used to estimate prediction error signals for modelling neural function in the ventral striatum. Each participant's sequence of choices and outcomes was used as input for the model. Given a pair of stimuli A and B, the model estimates the expected value of choosing A (Qa) and of choosing B (Qb). On reward trials, these Q values can be interpreted as the expected reward if the corresponding action is selected. Analogous Q values for loss-avoidance were also calculated.

On every trial i, a prediction error δ was calculated as:

$$\delta(i) = R(i) - Q_a(i)$$

where a is the chosen action on trial i and R is the reinforcement corresponding to the outcome of trial i. The prediction error was then used to update the Q value of the chosen stimulus according to the rule:

$$Q_a(i+1) = Q_a(i) + \alpha\,\delta(i)$$

where α is the learning rate. The reinforcement magnitude R was coded as 1/0 for win/no-win outcomes on the reward trials and 1/0 for loss-avoidance/loss outcomes on the loss trials. The Q value estimates were initially set to zero. Based on the Q values, the probability of choosing each action was calculated for each trial. The probability of choosing action a was calculated using the softmax rule as:

$$P_a(i) = \frac{e^{\beta Q_a(i)}}{e^{\beta Q_a(i)} + e^{\beta Q_b(i)}}$$

where β is the 'inverse temperature'. The probability of choosing b was calculated in an analogous manner.

A low β implies that all actions become equally probable, whereas a high β implies that choices depend more strongly on differences between the value estimates. For the model-based image analysis, values for the constants α and β had to be chosen. We selected α and β to maximize the log likelihood of the subjects' actual choices according to the model. As in previous studies, a single set of parameters was fitted across groups and subjects, since it has been noted that multi-subject fMRI results are more robust if a single set of parameters is used to generate regressors for all subjects. For the reward/loss conditions we used α = 0.23/0.25 and β = 3.5/2.1, as these values were found to be optimal.
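As a hedged illustration of the fitting step, the sketch below pools the choice sequences of all subjects and picks the (α, β) pair with the highest log likelihood. The text does not state which optimiser was used; the exhaustive grid search, the function names and the 0/1 coding of stimuli are assumptions made for this example.

```python
import math


def neg_log_likelihood(alpha, beta, sequences):
    """Negative log likelihood of the observed choices under the model.

    sequences -- list of (choices, outcomes) pairs, one per stimulus pair
                 and subject; choices coded 0/1, outcomes coded 1/0.
    """
    nll = 0.0
    for choices, outcomes in sequences:
        q = [0.0, 0.0]  # each sequence starts from Q = 0
        for a, r in zip(choices, outcomes):
            p = 1.0 / (1.0 + math.exp(-beta * (q[a] - q[1 - a])))
            nll -= math.log(p)
            q[a] += alpha * (r - q[a])
    return nll


def fit_grid(sequences, alphas, betas):
    """Return (alpha, beta, log_likelihood) maximising the likelihood on a grid."""
    nll, alpha, beta = min(
        (neg_log_likelihood(a, b, sequences), a, b)
        for a in alphas
        for b in betas
    )
    return alpha, beta, -nll


# Toy data: one sequence in which option 0 is always chosen and rewarded.
best_alpha, best_beta, best_ll = fit_grid(
    [([0] * 10, [1] * 10)], alphas=[0.1, 0.5, 0.9], betas=[1.0, 5.0]
)
```

With β = 0 every choice has probability 0.5, so the log likelihood reduces to n·log(0.5) for n trials, which is a convenient sanity check on any implementation.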

fMRI Data Acquisition, Preprocessing and Analysis

For blood oxygen level dependent (BOLD) response imaging, T2*-weighted gradient echo planar images were obtained using a 3T Siemens Magnetom Trio Tim MRI scanner with a 12-channel head coil. A total of 37 sequential slices of 3.5 mm thickness with a 0.5 mm slice gap were obtained for each volume. To minimize susceptibility artifacts, slices were initially oriented parallel to the AC-PC line and then rotated 30 degrees towards the coronal plane for scanning. Three hundred and ten volumes were obtained with a TR of 2.5 s, TE of 30 ms, flip angle of 90°, FOV of 224 mm and a 64×64 matrix. The first four volumes were discarded to allow for scanner transient effects.
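The acquisition figures can be cross-checked with simple arithmetic. The short sketch below assumes that the 310 volumes refer to a single session (the text does not say so explicitly) and that slab coverage is the sum of the slice thicknesses plus the gaps between adjacent slices; the variable and function names are illustrative.

```python
# Acquisition parameters quoted in the text.
N_VOLUMES = 310
TR_S = 2.5
N_DISCARDED = 4
N_SLICES = 37
SLICE_THICKNESS_MM = 3.5
SLICE_GAP_MM = 0.5
FOV_MM = 224
MATRIX = 64


def acquisition_summary():
    """Derive run length, voxel size and coverage from the quoted parameters."""
    return {
        # 310 volumes x 2.5 s = 775 s
        "run_minutes": N_VOLUMES * TR_S / 60.0,
        # field of view divided by matrix size
        "in_plane_mm": FOV_MM / MATRIX,
        # 37 slices plus the 36 gaps between them
        "slab_mm": N_SLICES * SLICE_THICKNESS_MM + (N_SLICES - 1) * SLICE_GAP_MM,
        # volumes left after discarding the first four
        "volumes_analysed": N_VOLUMES - N_DISCARDED,
    }


summary = acquisition_summary()
```

Under these assumptions the run length comes to just under 13 minutes, in line with the session duration reported for the task, and the in-plane voxel size is 3.5 mm.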

SPM8 (http://www.fil.ion.ucl.ac.uk/spm) was used for analysis. The first image from each session was aligned to the first scan of the first session. The images from each session were then aligned to the first image of that session. The average realigned image was used to derive parameters for spatial normalization to the SPM8 Montreal Neurological Institute (MNI) template, and these parameters were applied to each image in each time-series. The resulting realigned and spatially normalized images were then smoothed with an 8 mm FWHM Gaussian kernel.

During reward learning tasks, activity in the ventral striatum is thought to be better described by reward prediction errors (the mismatch between predicted and actual outcomes) than by a simple reward vs. no-reward contrast. To further examine activity in the ventral striatum, we used a standard reinforcement learning algorithm (Supplementary material) that estimated prediction errors for the reward and loss trials at the outcome time of the task. A general linear model was defined in which prediction errors generated by the reinforcement learning model were used as parametric modulators at the outcome time points, separately for reward and loss conditions. Regressors were also included for each trial type onset, for the neutral condition outcomes and for the realignment parameters. Beta images, comprising the linear regression coefficients at each voxel relating modeled activity to the observed BOLD signal, were taken to second-level analyses, and within- and between-group activations were examined using one-sample and two-sample t-tests.
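To make the idea of a parametric modulator concrete, the sketch below builds one prediction-error regressor in pure Python. It is not SPM8 code and makes several simplifying assumptions that are not taken from the text: outcome onsets are rounded to whole scans, the haemodynamic response is a generic double-gamma curve (gamma shapes 6 and 16, undershoot ratio 1/6), and the modulator is mean-centred before convolution.

```python
import math


def double_gamma_hrf(tr, duration=32.0):
    """Generic double-gamma response sampled every `tr` seconds (assumed shape)."""
    def gamma_pdf(t, shape):
        if t <= 0:
            return 0.0
        return t ** (shape - 1) * math.exp(-t) / math.gamma(shape)

    n = int(duration / tr) + 1
    return [gamma_pdf(k * tr, 6.0) - gamma_pdf(k * tr, 16.0) / 6.0 for k in range(n)]


def modulated_regressor(onset_scans, modulator, n_scans, tr):
    """Stick functions at the outcome scans, scaled by the mean-centred
    modulator (here: prediction errors) and convolved with the response."""
    centre = sum(modulator) / len(modulator)
    sticks = [0.0] * n_scans
    for scan, value in zip(onset_scans, modulator):
        sticks[scan] += value - centre

    hrf = double_gamma_hrf(tr)
    regressor = [0.0] * n_scans
    for i, s in enumerate(sticks):
        if s == 0.0:
            continue
        for k, h in enumerate(hrf):
            if i + k < n_scans:
                regressor[i + k] += s * h
    return regressor


# Two outcomes with opposite prediction errors, 306 scans, TR = 2.5 s.
reg = modulated_regressor([10, 100], [1.0, -1.0], n_scans=306, tr=2.5)
```

A positive prediction error produces a positive deflection a few seconds after the outcome and a negative one produces the mirror image, so the fitted coefficient reflects how strongly the BOLD signal scales with the prediction error rather than with outcome occurrence alone.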

Behavioral Analysis of Reinforcement Learning Parameters

In addition, behavioral analyses of the reinforcement learning parameters α and β were carried out as between-group random-effects analyses. First, α and β were estimated for each subject individually by maximizing the likelihood of that subject's choices under the model. Second, each subject's parameters were re-estimated applying prior information about the likely range of parameters (the prior being derived from the previous stage) to regularize the estimates and avoid extreme (implausible) α or β values due to the inherent noisiness of maximum likelihood estimation. Parameter estimates were taken to a second level and t-tests were used to test the null hypothesis of no difference between groups. No between-group differences were identified in the random-effects analyses for either reward or loss trials.
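The two-stage procedure can be illustrated as follows. The text does not specify the form of the prior or the optimiser, so this sketch assumes independent Gaussian priors on α and β whose mean and standard deviation are taken from the first-stage estimates, and it uses a grid search; all names are hypothetical.

```python
import math
from statistics import mean, stdev


def neg_log_likelihood(alpha, beta, choices, outcomes):
    """Negative log likelihood of one subject's choices (0/1) and outcomes (1/0)."""
    q = [0.0, 0.0]
    nll = 0.0
    for a, r in zip(choices, outcomes):
        p = 1.0 / (1.0 + math.exp(-beta * (q[a] - q[1 - a])))
        nll -= math.log(p)
        q[a] += alpha * (r - q[a])
    return nll


def gaussian_log_pdf(x, mu, sd):
    return -0.5 * ((x - mu) / sd) ** 2 - math.log(sd * math.sqrt(2.0 * math.pi))


def fit_subject(choices, outcomes, alphas, betas, prior=None):
    """Grid search; with `prior` set, the log prior is added (MAP estimate)."""
    def objective(alpha, beta):
        value = neg_log_likelihood(alpha, beta, choices, outcomes)
        if prior is not None:
            (mu_a, sd_a), (mu_b, sd_b) = prior
            value -= gaussian_log_pdf(alpha, mu_a, sd_a)
            value -= gaussian_log_pdf(beta, mu_b, sd_b)
        return value

    _, alpha, beta = min((objective(a, b), a, b) for a in alphas for b in betas)
    return alpha, beta


def two_stage_fit(subjects, alphas, betas, min_sd=1e-3):
    """Stage 1: per-subject maximum likelihood. Stage 2: re-estimate with a
    prior built from the stage-1 estimates (assumed Gaussian)."""
    stage1 = [fit_subject(c, o, alphas, betas) for c, o in subjects]
    a_hat = [a for a, _ in stage1]
    b_hat = [b for _, b in stage1]
    prior = (
        (mean(a_hat), max(stdev(a_hat), min_sd)),
        (mean(b_hat), max(stdev(b_hat), min_sd)),
    )
    stage2 = [fit_subject(c, o, alphas, betas, prior) for c, o in subjects]
    return stage1, stage2
```

A subject who always picks the same rewarded option drives the maximum likelihood estimate to the edge of the grid; the prior term pulls such estimates back towards the range seen in the rest of the sample, which is the regularizing effect the text describes.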

Controlling for Potential Confounds

Patients and controls differed in average IQ, symptoms of low mood and anxiety, and nicotine use. We therefore tested whether between-group differences in neural activity remained significant after controlling for these differences. This was done by repeating the image analyses with the WASI IQ score, the HADS depression and anxiety scores and the Fagerstrom score as covariates. Between-group differences (patients vs. controls) in the bilateral ventral striatum and midbrain/para-hippocampal gyrus for the aversive trials at the outcome time remained significant at the same significance threshold. The between-group difference in the insula found on the aversive trials at the anticipation/decision time and the difference in the caudate found on the reward trials remained significant at a reduced threshold of significance (p < 0.005 uncorrected).

Supplementary Table S1 Participant details

| | Controls | All Patients | ADM | BDM |
|---|---|---|---|---|
| n | 23 | 30 | 15 | 15 |
| Age | 31.30±7.17 | 34.07±4.31 | 33.47±4.78 | 34.67±3.84 |
| NART | 118.96±6.37 | 114.40±5.99 | 113.53±6.23 | 115.27±5.82 |
| IQ-WASI | 101.64±9.11 | 91.73±11.60 | 88.60±13.51 | 94.87±8.67 |
| HADS-D | 1.61±2.48 | 4.17±3.38 | 4.67±3.70 | 3.67±3.09 |
| HADS-A | 3.87±3.90 | 6.10±4.40 | 6.67±5.18 | 5.53±3.56 |
| Methadone dose (mg/day) | -- | 74.87±19.01 | 79.20±20.36 | 70.53±17.15 |

Values are mean ± SD; NART, National Adult Reading Test; WASI, Wechsler Abbreviated Scale of Intelligence; HADS-D/A, Hospital Anxiety and Depression Scale, depression/anxiety scores; ADM/BDM, patients scanned after/before the daily methadone intake.

Supplementary Table S2 Behavioral results

| Measure | Condition | Controls | All Patients | ADM | BDM |
|---|---|---|---|---|---|
| Reaction time (s) | Reward | 0.99±0.25 | 1.09±0.29 | 1.03±0.26 | 1.16±0.32 |
| | Neutral | 0.91±0.18 | 1.14±0.31 | 1.09±0.32 | 1.18±0.31 |
| | Loss | 1.19±0.23 | 1.34±0.31 | 1.27±0.31 | 1.41±0.29 |
| Number of high probability choices | Reward | 57.26±16.23 | 62.90±14.17 | 64.13±11.44 | 61.67±16.80 |
| | Neutral | 47.83±19.92 | 54.00±19.25 | 51.73±21.85 | 56.27±16.71 |
| | Loss | 51.04±12.39 | 52.57±13.09 | 53.27±13.49 | 51.87±13.11 |

Reaction times were measured from trial onset to button press. The number of high probability choices indicates the number of times that participants selected the high rewarding stimulus on reward trials, the high loss-avoidance stimulus on the loss trials and the stimulus more associated with the 'No-Change' image on the neutral trials. Data are expressed as mean ± SD. ADM/BDM, patients scanned after/before the daily methadone intake.

Supplementary Table S3 Within and between group activations during reward trials for the contrast win vs. no-win at the outcome time of the task.

Reward trials – outcome time. Contrast: win > no-win

| Region | BA | x | y | z | T |
|---|---|---|---|---|---|
| Controls: | | | | | |
| L ventral striatum | | -16 | 10 | -10 | 7.02 |
| R ventral striatum | | 16 | 10 | -12 | 6.94 |
| L dorsal caudate | | -18 | 20 | 12 | 5.10 |
| Midbrain | | 0 | -20 | -12 | 3.69 |
| Thalamus | | 6 | -6 | 0 | 4.52 |
| Medial prefrontal cortex | 10-11-32 | 0 | 46 | 2 | 6.64 |
| L amygdala-hippocampal complex/PHG | | -30 | -10 | -22 | 6.90 |
| R amygdala-hippocampal complex/PHG | | 20 | -10 | -22 | 6.93 |
| L insula | | -40 | 0 | -4 | 3.40 |
| R insula | | 38 | 2 | -2 | 4.17 |
| Posterior cingulate cortex | 23-24 | 4 | -28 | 32 | 6.45 |
| L occipital lobe, cuneus | 18 | -16 | -102 | 16 | 13.69 |
| R occipital lobe, middle occipital lobe | 18 | 26 | -94 | 10 | 14.39 |
| L cerebellum | | -40 | -62 | -40 | 5.47 |
| R cerebellum | | 42 | -60 | -40 | 5.31 |
| Patients: | | | | | |
| L ventral striatum | | -10 | 10 | -8 | 7.29 |
| R ventral striatum | | 10 | 6 | -8 | 6.25 |
| Thalamus | | 2 | -8 | 0 | 3.33 |
| Medial prefrontal cortex | 10-11 | -2 | 36 | -16 | 6.67 |
| L amygdala-hippocampal complex/PHG | | -22 | -12 | -18 | 5.46 |
| R amygdala-hippocampal complex/PHG | | 32 | -10 | -22 | 3.77 |
| L occipital lobe, middle occipital gyrus | 18 | -6 | -102 | 12 | 9.97 |
| R occipital lobe, middle occipital gyrus | 18 | 16 | -100 | 12 | 9.60 |
| Controls > Patients: | | | | | |
| L dorsal caudate | | -20 | 22 | 12 | 4.06 |
| R dorsal caudate | | 16 | 26 | 4 | 3.22 |
| R parietal lobe, inferior parietal lobe | 40 | 58 | -38 | 42 | 4.59 |
| Parietal lobe, precuneus | 7 | -10 | -70 | 50 | 3.53 |
| L occipital lobe, cuneus | 18 | -8 | -102 | 14 | 5.14 |
| R occipital lobe, cuneus | 18 | 8 | -102 | 2 | 4.71 |
| R cerebellum | | 40 | -76 | -26 | 3.76 |
| Patients > Controls: | | | | | |
| No significant activations | | | | | |

Coordinates (x, y, z) reported in MNI space; R/L = right/left; PHG, para-hippocampal gyrus. All results significant at p<0.05 cluster extent corrected across the whole-brain.

Supplementary Table S4 Between group activations (ADM vs. BDM) during reward trials for the contrast win vs. no-win at the outcome time of the task.

Reward trials – outcome time. Contrast: win > no-win

| Region | BA | x | y | z | T |
|---|---|---|---|---|---|
| ADM > BDM: | | | | | |
| No significant activations | | | | | |
| BDM > ADM: | | | | | |
| L frontal lobe, superior frontal gyrus | 10 | -22 | 56 | 10 | 4.35 |
| R frontal lobe, middle frontal gyrus | 10 | 34 | 54 | 0 | 4.59 |
| L temporal lobe, middle temporal gyrus | 39 | -56 | -58 | 12 | 4.22 |
| R temporal lobe, superior temporal gyrus | 22 | 62 | -56 | 12 | 5.13 |
| L para-hippocampal gyrus | | -44 | -32 | -10 | 4.50 |
| L parietal lobe, precuneus | 31 | -6 | -62 | 20 | 3.65 |

Coordinates (x, y, z) reported in MNI space; R/L = right/left. All results significant at p<0.05 cluster extent corrected across the whole-brain.

Supplementary Table S5 Within and between group activations for the contrast reward vs. neutral at the decision time of the task.

Decision time. Contrast: reward > neutral

| Region | BA | x | y | z | T |
|---|---|---|---|---|---|
| Controls: | | | | | |
| L ventral striatum | | -8 | 6 | 0 | 4.05 |
| R ventral striatum | | 10 | 6 | -4 | 4.94 |
| L insula | | -34 | 20 | -12 | 4.67 |
| R insula | | 34 | 24 | -10 | 4.23 |
| Anterior cingulate cortex | 32 | 4 | 40 | 16 | 3.8 |
| Frontal lobe, medial frontal gyrus | 9 | -2 | 36 | 36 | 3.73 |
| L frontal lobe, middle frontal gyrus | 10 | -32 | 46 | 6 | 3.98 |
| L frontal lobe, middle frontal gyrus | 8 | -48 | 10 | 46 | 4.21 |
| L parietal lobe, inferior parietal lobule | 40 | -34 | -52 | 38 | 3.99 |
| R parietal lobe, superior parietal lobule | 7 | 30 | -60 | 54 | 3.15 |
| L occipital lobe, middle occipital gyrus | 19 | -30 | -94 | 10 | 4.31 |
| R occipital lobe, middle occipital gyrus | 18 | 26 | -96 | 16 | 4.51 |
| Cerebellum | | 12 | -82 | -30 | 4.88 |
| Patients: | | | | | |
| L ventral striatum | | -12 | 4 | -6 | 4.51 |
| R ventral striatum | | 8 | 10 | -2 | 4.79 |
| L insula | | -34 | 22 | -8 | 4.20 |
| R insula | | 38 | 18 | -14 | 4.01 |
| Frontal lobe, medial frontal gyrus and anterior cingulate cortex | 32-9 | -4 | 42 | 38 | 4.91 |
| L frontal lobe, middle frontal gyrus | 8 | -44 | 22 | 48 | 3.35 |
| Left posterior midbrain | | -10 | -28 | -7 | 4.57 |
| Right posterior midbrain | | 8 | -32 | -12 | 3.90 |
| Thalamus | | 9 | -26 | 6 | 3.39 |
| L occipital lobe, cuneus | 19 | -28 | -92 | 22 | 3.73 |
| R occipital lobe, fusiform gyrus | 37 | 30 | -50 | -12 | 3.52 |
| Cerebellum | | 16 | -76 | -32 | 4.63 |
| Controls > Patients: | | | | | |
| No significant activations | | | | | |
| Patients > Controls: | | | | | |
| R temporal lobe, middle temporal gyrus | 21 | 58 | -22 | -12 | 3.71 |

Coordinates (x, y, z) reported in MNI space; R/L = right/left. All results significant at p<0.05 cluster extent corrected across the whole-brain.

Supplementary Table S6 Between group (ADM vs. BDM) activations for the contrast reward vs. neutral at the decision time of the task.

Decision time. Contrast: reward > neutral

| Region | BA | x | y | z | T |
|---|---|---|---|---|---|
| ADM > BDM: | | | | | |
| R frontal lobe, inferior frontal gyrus | 13 | 40 | 30 | 8 | 4.38 |
| R parietal lobe, precuneus | 31 | 18 | -54 | 26 | 3.59 |
| BDM > ADM: | | | | | |
| No significant activations | | | | | |

Coordinates (x, y, z) reported in MNI space; R/L = right/left. All results significant at p<0.05 cluster extent corrected across the whole-brain.

Supplementary Table S7 Within group activations during loss trials for the contrast loss-avoidance > loss at the outcome time.

Loss trials – outcome time. Contrast: loss-avoidance > loss

| Region | BA | x | y | z | T |
|---|---|---|---|---|---|
| Controls: | | | | | |
| L ventral striatum | | -14 | 10 | -10 | 5.37 |
| R ventral striatum | | 14 | 8 | -10 | 5.41 |
| L dorsal caudate | | -20 | 20 | 12 | 4.74 |
| R dorsal caudate | | 20 | 12 | 16 | 4.99 |
| L temporal lobe, middle temporal gyrus | 39 | -50 | -76 | 24 | 4.72 |
| R temporal lobe, superior temporal gyrus | 41 | 36 | -42 | 10 | 5.16 |
| Cerebellum | | 28 | -62 | -44 | 4.82 |
| Patients: | | | | | |
| L dorsal caudate | | -12 | 20 | 14 | 3.58 |
| R dorsal caudate | | 14 | 20 | 12 | 3.46 |

Coordinates (x, y, z) reported in MNI space; R/L = right/left. All results significant at p<0.05 cluster extent corrected across the whole-brain.

Supplementary Table S8 Within group activations during loss trials for the contrast loss > loss-avoidance at the outcome time.

Loss trials – outcome time. Contrast: loss > loss-avoidance

| Region | BA | x | y | z | T |
|---|---|---|---|---|---|
| Controls: | | | | | |
| L posterior midbrain | | -8 | -22 | -6 | 4.17 |
| Dorsal anterior cingulate cortex | 24 | 2 | 28 | 24 | 5.65 |
| Frontal lobe, medial frontal gyrus | 10 | 2 | 52 | 18 | 3.95 |
| L frontal lobe, inferior frontal gyrus | 47 | -34 | 30 | -22 | 4.73 |
| L temporal lobe, superior temporal gyrus | 38 | -42 | 20 | -32 | 4.35 |
| R frontal lobe, inferior frontal gyrus | 47 | 40 | 24 | -22 | 5.35 |
| R temporal lobe, superior temporal gyrus | 38 | 34 | 22 | -36 | 5.71 |
| Frontal lobe, superior frontal gyrus | 6 | -8 | 16 | 68 | 5.04 |
| L frontal lobe, precentral gyrus | 6 | -50 | -2 | 52 | 4.03 |
| L occipital lobe, middle occipital gyrus | 18 | -12 | -96 | 12 | 7.97 |
| R occipital lobe, cuneus | 18 | 12 | -100 | 6 | 7.85 |
| Cerebellum | | 36 | -40 | -28 | 3.67 |
| Cerebellum | | -22 | -64 | -12 | 3.59 |
| Patients: | | | | | |
| Midbrain | | 0 | -22 | -20 | 4.98 |
| L midbrain and para-hippocampal gyrus | | -18 | -22 | -16 | 4.76 |
| R midbrain and para-hippocampal gyrus | | 10 | -20 | -22 | 5.49 |
| L thalamus and globus pallidus | | -16 | -6 | 4 | 4.41 |
| R thalamus and globus pallidus | | 14 | -2 | 8 | 4.40 |
| Dorsal anterior cingulate cortex | 32 | 2 | 32 | 26 | 5.62 |
| Dorsal posterior cingulate cortex | 24 | 0 | -12 | 32 | 4.39 |
| Frontal lobe, superior frontal gyrus | 6 | 0 | 16 | 64 | 7.78 |
| L frontal lobe, middle frontal gyrus | 6 | -46 | 2 | 52 | 4.61 |
| R frontal lobe, middle frontal gyrus | 6 | 50 | 6 | 50 | 5.13 |
| L frontal lobe, inferior frontal gyrus | 47 | -52 | 26 | -4 | 6.55 |
| R frontal lobe, inferior frontal gyrus | 47 | 36 | 24 | -10 | 7.57 |
| R temporal lobe, superior temporal gyrus | 38 | 56 | 18 | -8 | 7.54 |
| L temporal lobe, middle temporal gyrus | 21 | -46 | 4 | -40 | 4.37 |
| R temporal lobe, middle temporal gyrus | 21 | 48 | 4 | -42 | 3.18 |
| L parietal lobe, superior parietal lobule | 7 | -26 | -68 | 60 | 3.63 |
| R parietal lobe, superior parietal lobule | 7 | 28 | -60 | 46 | 5.05 |
| L temporal lobe, inferior temporal gyrus | 20 | -52 | -26 | -16 | 6.28 |
| R temporal lobe, middle temporal gyrus | 21 | 60 | -24 | -14 | 5.76 |
| L occipital lobe, cuneus | 18 | -10 | -100 | 18 | 10.48 |
| R occipital lobe, middle occipital gyrus | 18 | 14 | -102 | 10 | 9.18 |
| Cerebellum | | -36 | -60 | -28 | 7.62 |
| Cerebellum | | 36 | -60 | -28 | 7.45 |

Coordinates (x, y, z) reported in MNI space; R/L = right/left. All results significant at p<0.05 cluster extent corrected across the whole-brain.

Supplementary Table S9 Between group (controls vs. patients) activations during loss trials for the contrast loss-avoidance > loss at the outcome time.

Loss trials – outcome time. Contrast: loss-avoidance > loss

| Region | BA | x | y | z | T |
|---|---|---|---|---|---|
| Controls > Patients: | | | | | |
| L ventral striatum | | -14 | 6 | -10 | 4.64 |
| R ventral striatum | | 16 | 6 | -10 | 3.79 |
| L dorsal caudate | | -20 | -10 | 22 | 4.16 |
| R dorsal caudate | | 16 | -14 | 26 | 3.73 |
| L midbrain and para-hippocampal gyrus | | -18 | -18 | -20 | 3.73 |
| R midbrain and para-hippocampal gyrus | | 16 | -18 | -20 | 3.93 |
| Frontal lobe, superior frontal gyrus | 8 | -8 | 38 | 50 | 3.60 |
| L frontal lobe, middle frontal gyrus | 9 | -32 | 30 | 42 | 4.18 |
| L temporal lobe, inferior temporal gyrus | 20 | -48 | -22 | -16 | 3.40 |
| R temporal lobe, inferior temporal gyrus | 20 | 50 | -24 | -24 | 5.01 |
| Cerebellum | | -16 | -78 | -32 | 4.37 |
| Cerebellum | | 28 | -62 | -44 | 5.18 |
| Patients > Controls: | | | | | |
| No significant activations | | | | | |

Note that the comparison controls > patients for the contrast loss-avoidance > loss is analogous to the comparison patients > controls for the contrast loss > loss-avoidance. Coordinates (x, y, z) reported in MNI space; R/L = right/left. All results significant at p<0.05 cluster extent corrected across the whole-brain.

Supplementary Table S10 Between group (ADM vs. BDM) activations for the contrast loss > loss-avoidance at the outcome time.

Loss trials – outcome time. Contrast: loss > loss-avoidance

| Region | BA | x | y | z | T |
|---|---|---|---|---|---|
| ADM > BDM: | | | | | |
| L midbrain and para-hippocampal gyrus* | | -18 | -26 | -16 | 3.15 |
| R midbrain and para-hippocampal gyrus* | | 12 | -20 | -16 | 3.10 |
| Thalamus | | -2 | -4 | 8 | 3.88 |
| R frontal lobe, superior frontal gyrus | 10 | 34 | 54 | 18 | 4.01 |
| L temporal lobe, middle temporal gyrus | 20 | -50 | -40 | -14 | 5.21 |
| R temporal lobe, fusiform gyrus | 37 | 48 | -42 | -14 | 3.46 |
| Cerebellum | | -20 | -54 | -44 | 3.48 |
| Cerebellum | | 18 | -70 | -28 | 3.31 |
| BDM > ADM: | | | | | |
| No significant activations | | | | | |

Note that the comparison ADM > BDM for the contrast loss > loss-avoidance is analogous to the comparison BDM > ADM for the contrast loss-avoidance > loss. Coordinates (x, y, z) reported in MNI space; R/L = right/left. Regions marked * are significant at p<0.005 uncorrected; all other results are significant at p<0.05 cluster extent corrected across the whole-brain.

Supplementary Table S11 Within and between (controls vs. patients) group activations for the contrast loss vs. neutral at the decision time of the task.

Loss trials – decision time. Contrast: loss > neutral

| Region | BA | x | y | z | T |
|---|---|---|---|---|---|
| Controls: | | | | | |
| L medial caudate and globus pallidus | | -10 | 4 | 2 | 4.47 |
| R medial caudate and globus pallidus | | 10 | 4 | 2 | 4.75 |
| L insula and inferior frontal gyrus | | -48 | 20 | 0 | 5.11 |
| R insula and inferior frontal gyrus | | 32 | 24 | 0 | 5.25 |
| Dorsal anterior cingulate cortex | 32 | -8 | 22 | 44 | 4.97 |
| R frontal lobe, inferior frontal gyrus | 9 | 58 | 10 | 34 | 4.39 |
| L frontal lobe, superior frontal gyrus | 10 | -22 | 58 | -8 | 4.14 |
| L parietal lobe, precuneus | 7 | -16 | -70 | 34 | 4.60 |
| R occipital lobe, middle occipital gyrus | 18 | 24 | -92 | 4 | 4.81 |
| Cerebellum | | 38 | -56 | 28 | 4.99 |
| Cerebellum | | 8 | -80 | -28 | 4.59 |
| Cerebellum | | -38 | -48 | -20 | 3.49 |
| Patients: | | | | | |
| L thalamus | | -10 | -12 | 6 | 3.59 |
| R thalamus | | 10 | -16 | 8 | 3.39 |
| L insula and inferior frontal gyrus | | -32 | 24 | -4 | 5.12 |
| R insula and inferior frontal gyrus | | 34 | 24 | -12 | 4.43 |
| Dorsal anterior cingulate cortex | 32 | -8 | 32 | 34 | 5.04 |
| L frontal lobe, superior frontal gyrus | 10 | -28 | 60 | -2 | 4.24 |
| L frontal lobe, precentral gyrus | 9 | -40 | 18 | 44 | 3.96 |
| R frontal lobe, inferior frontal gyrus | 9 | 46 | 4 | 34 | 3.68 |
| L parietal lobe, precuneus | 7 | -4 | -80 | 48 | 3.84 |
| R parietal lobe, inferior parietal lobule | 40 | 34 | -48 | 40 | 5.06 |
| L occipital lobe, middle occipital gyrus | 19 | -34 | -94 | 2 | 4.80 |
| R occipital lobe, middle occipital gyrus | 19 | 40 | -92 | 4 | 5.63 |
| Cerebellum | | 24 | -68 | -8 | 4.05 |
| Controls > Patients: | | | | | |
| L insula and inferior frontal gyrus | | -50 | 20 | -6 | 3.46 |
| Patients > Controls: | | | | | |
| No significant activations | | | | | |

Coordinates (x, y, z) reported in MNI space; R/L = right/left. All results significant at p<0.05 cluster extent corrected across the whole-brain.

Supplementary Table S12 Between group (ADM vs. BDM) activations for the contrast loss vs. neutral at the decision time of the task.

Decision time. Contrast: loss > neutral

| Region | BA | x | y | z | T |
|---|---|---|---|---|---|
| ADM > BDM: | | | | | |
| Thalamus | | 2 | -22 | 6 | 3.63 |
| R frontal lobe, middle frontal gyrus | 11 | 30 | 46 | -6 | 5.08 |
| R parietal lobe, inferior parietal lobule | 40 | 44 | -56 | 50 | 3.43 |
| BDM > ADM: | | | | | |
| No significant activations | | | | | |

Coordinates (x, y, z) reported in MNI space; R/L = right/left. All results significant at p<0.05 cluster extent corrected across the whole-brain.

Supplementary Figure S1 Neural responses correlating with reward prediction errors in the ventral striatum.

(A) Controls exhibited neural responses correlating with prediction errors in the bilateral ventral striatum ((-10, 16, -12), t=7.25; (-12, 10, -12), t=6.39; p<0.05 cluster extent whole brain corrected). (B) Patients also showed reward prediction error signals in the ventral striatum ((-10, 10, -8), t=6.36; (10, 12, -8), t=5.91; p<0.05 cluster extent whole brain corrected). (C) There were no significant differences between patients and controls in neural encoding of reward prediction errors in the ventral striatum.

Supplementary Figure S2 Brain regions active for reward vs. neutral at the decision time.

Brain regions active in (A) controls and in (B) patients for the contrast reward vs. neutral at the decision time of the task. Except for a minor cluster, there were no significant differences between patients and controls in brain activity. Regions significant at p < 0.05 whole brain cluster extent corrected as described in the methods. dAC/mPFC, dorsal anterior cingulate / medial prefrontal cortex; VS, ventral striatum; I, insula.

Supplementary Figure S3 Neural responses correlating with loss-avoidance prediction errors in the ventral striatum.

(A) Controls exhibited neural responses correlating with loss-avoidance prediction errors in the bilateral ventral striatum ((-16, 8, -12), t=5.70; (14, 8, -10), t=5.35; p<0.05 cluster extent whole brain corrected). (B) Patients did not show significant activation for loss-avoidance prediction errors in the ventral striatum. (C) Controls demonstrated stronger activation than patients correlating with loss-avoidance prediction errors in the ventral striatum ((-14, 6, -10), t=4.07, p<0.05 cluster extent whole brain corrected; (16, 6, -10), t=3.08, p<0.005 uncorrected). Images displayed at p<0.005 uncorrected.

References

Daw ND. Trial-by-trial data analysis using computational models. Attention & Performance; 2009.

Pessiglione M, Seymour B, Flandin G, Dolan RJ, Frith CD. Dopamine-dependent prediction errors underpin reward-seeking behaviour in humans. Nature 2006; 442(7106): 1042-5.

Sutton RS, Barto AG. Reinforcement Learning. Cambridge, MA: MIT Press; 1998.
