Research Collection
Doctoral Thesis
The use of performance feedback and reward for optimization of motor learning and neurorehabilitation of motor functions
Author(s): Widmer, Mario
Publication Date: 2017
Permanent Link: https://doi.org/10.3929/ethz-a-010870008
Rights / License: In Copyright - Non-Commercial Use Permitted
DISS. ETH NO. 24106
THE USE OF PERFORMANCE FEEDBACK AND REWARD FOR OPTIMIZATION OF MOTOR LEARNING AND NEUROREHABILITATION OF MOTOR FUNCTIONS
A thesis submitted to attain the degree of
DOCTOR OF SCIENCES of ETH ZURICH
(Dr. sc. ETH Zurich)
presented by
MARIO WIDMER
MSc ETH HMS, ETH Zürich
born on 22.04.1986
citizen of Gränichen (AG)
accepted on the recommendation of
Prof. Dr. Nicole Wenderoth
Prof. Dr. Andreas Luft
Dr. Kai Lutz
2017
The Use of Performance Feedback and Reward for Optimization of Motor Learning and Neurorehabilitation of Motor Functions
Doctoral Thesis
MARIO WIDMER
Acknowledgments
Writing this dissertation, and finally completing it, would not have been possible without the help of some enthusiastic and intelligent people around me.
First, and foremost, I want to express my sincere gratitude to Prof. Dr. Andreas Luft for giving me the opportunity to work in this interesting research field and to conduct this thesis. His enormous scientific knowledge and experience were highly influential for my development over the course of my PhD.
I would also like to sincerely thank Prof. Dr. Nicole Wenderoth for agreeing to be the head of my committee and for giving me the freedom to perform my research outside of ETH Zurich.
Moreover, I am deeply grateful to Dr. Kai Lutz, my mentor, for his indispensable support during the last few years. I am indebted to him for his scientific advice, but also for his encouragement and understanding in work-related as well as private matters.
I would like to express my thankfulness for the great support from our research team. A special thank you goes to my teammate Jeremia Held, who supported me in every situation, in research and in daily life. I thank him for his helpful advice and for being a friend. Many cordial thanks go to Belen Valladares, who always has an open ear for me (never forget that you make Switzerland a better place), but also to Robinson Kundert, José López Sánchez, Irene Christen, Carola Bade-Daum, and all other members of my study team who have helped me over the years.
Furthermore, I owe a big thank you to Samara Stulz for being an excellent Master's student, for her contribution to our "fMRI Reward Assessment" project, and for her patience in data acquisition, data analysis, and the entry of the data into our electronic database. Many thanks also for taking care of our office plant, which is facing a very insecure future now that you have left our group.
In addition, I would like to take this opportunity to express my sincere appreciation to all participants for their time and enthusiasm during the studies of this thesis. That includes all healthy young and elderly subjects as well as all stroke patients who, at times, needed to bring along a lot of patience.
The biggest thank you goes to my friends and my girlfriend for their unconditional love and support: without you, I have nothing, but with you, I have everything! Last but not least, I would like to offer my gratitude to my family for being there for me.
This research was carried out in collaboration with the University of Zurich, the University Hospital of Zurich, the cereneo - center for neurology and rehabilitation, and ETH Zürich. My position was funded by the Clinical Research Priority Program (CRPP) Neuro-Rehab of the University of Zurich. I am deeply grateful for their financial support.
I would like to dedicate this thesis to Nadja Ziegler, who started her Master thesis in our lab at around the same time as I started my PhD. Nadja sadly passed away in July 2014. "Funny how someone can come into your life for such a brief time but leave such a lasting impression" - Monica Murphy. Through all the pain of losing you I know that I am better for having known you!
Abstract
Intrinsic motivation refers to doing something because it is inherently interesting or enjoyable. Extrinsically motivated actions, on the other hand, are performed because they lead to an outcome. Similar to motivation, reward can be classified as extrinsic or intrinsic. Extrinsic reward refers to the receipt of material goods (e.g., food or money) for a specific activity. The term "intrinsic reward", on the other hand, refers to reward derived from task-inherent stimulation (e.g., information about an achieved performance). This includes stimuli that signal performance accuracy, usually termed feedback, which can then be used to modify future performance. Generally, learners strive for positive feedback, meaning that positive feedback meets the definition of a reinforcer, or reward.
The changes in neural activity in response to the processing of reward (and punishment) have been extensively investigated in healthy as well as clinical populations using the so-called monetary incentive delay (MID) task. Typically, this task requires an individual to react to a target stimulus presented after an incentive cue in order to win, or to avoid losing, the indicated reward. The first part of this thesis (Chapter 2) offers an overview of different uses of the MID task by reviewing literature outlining the neuronal processes involved in distinct aspects of human reward processing, with a special focus on reward-based learning. For instance, in a motor experiment using a MID task combined with functional magnetic resonance imaging (fMRI), both intrinsic and extrinsic rewards have been shown to increase neural activity in the ventral striatum, a key locus of reward processing. In a rewarded task, the hemodynamic ventral striatal response correlates with dopamine release in the ventral striatum, which in turn correlates with reward-related neural activity in the substantia nigra/ventral tegmental area, the origin of the dopaminergic projection. There is evidence from animal studies that dopaminergic projections from the midbrain to the primary motor cortex (M1) are necessary for learning a new motor skill. In M1, dopamine facilitates long-term potentiation, a form of synaptic plasticity that is critically involved in skill learning. Such synaptic plasticity in M1 similarly occurs during recovery and rehabilitation after stroke and likely contributes to their success. This opens the potential to use rewarding feedback in humans to promote motor skill learning and the neurorehabilitation of motor functions.
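The trial logic of the MID task described above can be sketched in a few lines. The function name, timings, and payoff values below are illustrative assumptions, not the parameters of any study in this thesis:

```python
def run_mid_trial(reaction_time_ms, rt_limit_ms, cue):
    """Outcome of one hypothetical monetary incentive delay (MID) trial.

    `cue` is ('win', amount) or ('avoid_loss', amount), shown before the
    target stimulus. Responding within the reaction-time limit wins the
    amount (or avoids losing it); responding too slowly wins nothing
    (or loses the amount). All values here are illustrative only.
    """
    kind, amount = cue
    hit = reaction_time_ms <= rt_limit_ms
    if kind == 'win':
        return amount if hit else 0.0
    return 0.0 if hit else -amount

# A fast response (250 ms, limit 300 ms) to a 1.0-unit win cue earns 1.0;
# a slow response (350 ms) to a loss-avoidance cue loses the stake.
print(run_mid_trial(250, 300, ('win', 1.0)))         # 1.0
print(run_mid_trial(350, 300, ('avoid_loss', 1.0)))  # -1.0
```

In actual MID experiments, the reaction-time limit is usually adapted to each subject's speed so that hit rates remain comparable across individuals; a fixed limit as above is a simplification.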
Based on this evidence, we conducted an fMRI study with healthy young subjects, relating striatal activity to performance feedback with or without monetary consequences during the training of a repetitive arc-tracking task (Chapter 3). The task required subjects to perform wrist movements to steer a cursor on a computer screen through a semicircular channel while undergoing fMRI. Our results demonstrate an influence of the feedback modality on motor skill learning. Adding a monetary reward after good performance led to better consolidation and higher ventral striatal activation than knowledge of performance alone. In conclusion, rewarding strategies that increase the ventral striatal response during the training of a motor skill may be utilized to improve skill consolidation.
In stroke survivors, activity of this dopaminergic pathway may be reduced not only because rewards are small, but also because, after stroke, rewarding feedback might not have the same capacity to increase dopaminergic activity as in healthy subjects. This has been demonstrated for cognitive tasks, and the hypothesis of the study presented in Chapter 4 was that the same holds for motor tasks. To test this hypothesis, we applied a similar arc-tracking task, modified as a motor MID task, using fMRI to measure striatal activity linked to performance-dependent monetary reward. Results from nine stroke patients and nine age-matched healthy individuals show a tendency toward reduced responsiveness of ventral parts of the striatum in stroke patients. This is of particular interest because, in the study described above, ventral striatal activation was found to be the key factor for successful overnight consolidation. We have learned from animal studies that proper functioning of the dopaminergic reward system is necessary for successful motor skill learning. Thus, a reduced responsiveness of the ventral striatum to a reward derived from motor performance, be it extrinsic or intrinsic, could imply a blunted motor learning ability in patients after stroke. The ability to learn, however, is thought to support motor recovery.
After stroke, about 50% of all survivors are left with functional impairments of the upper limb. As we were able to show that training with rewarding feedback improves motor learning in humans, we hypothesize that rehabilitative arm training could likewise be enhanced by rewarding feedback. Amplifying reward during rehabilitative training might be a means to overcome a potentially deficient response to task-inherent feedback and thereby stimulate the dopaminergic system to improve recovery after stroke. A further achievement of this thesis is therefore the development of a clinical trial protocol investigating rewards in the form of performance feedback and monetary gains as ways to improve the effectiveness of rehabilitative training (Chapter 5). This trial will be the first to directly evaluate
the effect of rewarding feedback, including monetary rewards, on the recovery process of the upper limb following stroke. A positive outcome could therefore pave the way for novel types of interventions with significantly improved treatment benefits.
In conclusion, in line with findings from animal studies, we demonstrated a positive influence of reward on motor skill learning in healthy young humans. This effect was linked to an increased ventral striatal response to the presentation of the rewarding feedback. In stroke patients, however, preliminary data point towards a blunted response of the ventral striatum compared to a healthy age-matched control group. Nonetheless, the findings of this thesis emphasize the potential of rewarding feedback to promote neurorehabilitation of motor functions. A trial protocol for a randomized controlled trial investigating rewards in the form of performance feedback and monetary gains as ways to improve the effectiveness of rehabilitative upper limb training after stroke is therefore proposed.
Zusammenfassung
Intrinsic motivation means doing something because it is inherently exciting or enjoyable. Extrinsically motivated actions, by contrast, are performed because they lead to a particular outcome. Rewards can be classified in a similar way as extrinsic or intrinsic. Extrinsic reward is associated with receiving goods (e.g., food or money) for a specific action. The term "intrinsic reward", by contrast, denotes rewards that are inherent to a specific task (e.g., information about an achieved performance). The latter includes stimuli that describe the quality of an execution; such stimuli are usually called feedback and can be used to adjust future executions of the task. In general, learners like to receive positive feedback, which makes positive feedback desirable and therefore allows it to be used as a reward.
Changes in neural activity in response to the processing of reward (and punishment) have been extensively investigated using the so-called monetary incentive delay (MID) task, in healthy as well as clinical populations. Typically, a cue is presented first and participants must then react to a target stimulus in order to win an indicated amount of money, or to avoid losing it. The first part of this dissertation (Chapter 2) provides an overview of the diverse uses of the MID task by reviewing literature that describes the neuronal processes involved in different aspects of reward processing. A special focus was placed on reward-based learning processes. Using a MID task in combination with functional magnetic resonance imaging (fMRI), a motor experiment showed, for example, that both intrinsic and extrinsic rewards can activate the ventral striatum, a key region in reward processing. In such reward tasks, the strength of the hemodynamic response in the ventral striatum correlates with dopamine release in the ventral striatum, which in turn correlates with reward-related neural activity in the substantia nigra and the ventral tegmental area, the origin of the dopaminergic pathways. Animal studies have shown that dopaminergic projections from the midbrain to the primary motor cortex (M1) are necessary for learning a motor skill. In M1, dopamine promotes long-term potentiation, a form of synaptic plasticity that contributes decisively to motor skill learning. Such plasticity, however, also occurs during rehabilitation after a stroke and contributes to its success. This opens up the potential of using rewarding feedback in humans to support motor skill learning and the neurorehabilitation of motor functions.
Based on this evidence, we used fMRI to measure, in healthy young subjects, striatal activity associated with performance feedback with or without monetary consequences during the training of an arc-tracking task (Chapter 3). Using wrist movements, subjects controlled a cursor on a screen and steered it through a semicircular channel. Our results show an influence of the feedback modality on learning this motor skill. When performance feedback after well-solved trials was coupled to a monetary reward, this led to better consolidation of the skill task and higher activation of the ventral striatum. We conclude that reward strategies which increase activity in the ventral striatum during motor skill training may be used to promote skill consolidation.
In stroke patients, however, the activity of these pathways may be reduced. This is not only because everyday rewards tend to be small, but also because patients after a stroke may, compared with healthy individuals of the same age, show a deficit in reward processing. Reduced brain activations in response to rewarding feedback have already been demonstrated in cognitive tasks. In Chapter 4 we test the hypothesis that this also applies to motor tasks. To this end, we used a similar skill task, modified as a motor MID task, together with fMRI to examine the striatal response to a performance-dependent monetary reward in stroke patients. The results of nine stroke patients and nine age-matched healthy controls point to a tendency toward reduced responsiveness of the ventral parts of the striatum in stroke patients. This is of particular interest because, in our previous study, increased striatal activations were associated with better consolidation of the motor task. Moreover, we know from animal studies that proper functioning of the dopaminergic reward system is important for motor skill learning.
A reduced responsiveness of the ventral striatum to rewarding feedback (of intrinsic or extrinsic nature) related to a preceding motor performance could therefore indicate a reduced motor learning ability after stroke. The ability to learn, however, is thought to promote motor recovery.
About 50% of all patients are left with functional impairments of the upper extremity after a stroke. Since we were able to show that motor learning is improved by training with rewarding feedback, we now want to test the hypothesis that rehabilitative arm training can also be promoted by rewarding feedback. Such an amplification of rewarding stimuli during rehabilitation training could be a means of stimulating the otherwise less responsive reward system of stroke patients and thereby promoting motor recovery. A further important achievement of this doctoral thesis is therefore the development and description of a research project investigating whether rewards in the form of performance feedback and small monetary amounts can promote the effectiveness of rehabilitative training (Chapter 5). This randomized controlled trial will be the first to directly evaluate an influence of rewarding feedback, including monetary reward, on the recovery process of the upper extremity after a stroke. A positive result could pave the way for new types of interventions with significantly better treatment benefits.
The present dissertation has confirmed findings from animal studies by showing that rewards have a positive influence on motor skill learning in humans. This effect was associated with increased activation of the ventral striatum. Initial data from stroke patients, however, suggest that the responsiveness of the ventral striatum to rewards is reduced compared with healthy age-matched controls. Nevertheless, this work highlights the potential of using rewarding feedback in the form of performance feedback and monetary gain to promote neurorehabilitative training. A follow-up project investigating a possible effect on rehabilitative upper-limb training after stroke is described in this dissertation.
Contents
Acknowledgments
Abstract
Zusammenfassung
Contents
List of Abbreviations
General Introduction
    Stroke
        The Burden of Stroke
        Motor Learning as a Model for Stroke Recovery and Neurorehabilitation
    Reward and Motivation
    The Use of Performance Feedback and Reward for Optimization of Motor Learning…
    … and Neurorehabilitation of Motor Functions
    Thesis Outline
What can the monetary incentive delay task tell us about the neural processing of reward and punishment?
    Abstract
    Introduction
    Anatomy of the reward system
    Aspects of Processing Reward and Punishment
        Anticipation and Consumption (wanting/liking)
        Reward versus Punishment
        Reward-based Learning
        Goal-oriented Behavior and Reward
        Reward Processing and Error Monitoring
        Discounting of Delayed Reward
    Individual Influences on Reward Processing
    Conclusion
    Acknowledgments
    Disclosure
Rewarding feedback promotes motor skill consolidation via striatal activity
    Abstract
    Introduction
    Methods
        Participants
        Study Design
        Motor Task
        fMRI Measurements
        Analysis of Imaging Data
        Analysis of Behavior
    Results
        fMRI
        Behavioral Results
    Discussion
        Training and Motor Skill Acquisition
        Consolidation
        Limitations
    Conclusion
    Acknowledgments
Processing of Motor Performance Related Reward After Stroke
    Abstract
    Introduction
    Material and Methods
        Participants
        fMRI Task
    Results
    Discussion
    Conclusion
    Acknowledgments
Does motivation matter in upper limb rehabilitation after stroke? ArmeoSenso-Reward: Study protocol for a randomized controlled trial
    Abstract
    Background
    Methods
        Study Design
        Study Population
        Randomization
        ArmeoSenso Training System
        Intervention
            Rewarded Training
            Control Training
        Outcome Measures
        Sample Size
        Statistical Analysis
    Discussion
    Declarations
        Ethics Approval and Consent to Participate
        Competing Interests
        Funding
        Authors' Contributions
    Acknowledgments
General discussion
    Main Findings
        Imaging Studies (Chapters 3 and 4)
        Studies on Stroke Patients (Chapters 4 and 5)
    Open Questions and Outlook
    Conclusion
References
Curriculum Vitae Mario Widmer, MSc ETH
List of Abbreviations
AEs        Adverse Events
BI         Barthel Index
CRO        Clinical Research Organisation
EEG        Electroencephalography
EKNZ       Ethikkommission Nordwest- und Zentralschweiz
FMA-UE     Fugl-Meyer Assessment of the Upper Extremity
fMRI       Functional Magnetic Resonance Imaging
GABRA2     γ-Aminobutyric Acid Type A Receptor Subunit α2
GCP        Good Clinical Practice
GLMM       Generalized Linear Mixed Model
ID         Identification Number
IMU        Inertial Measurement Unit
KPgood     Knowledge of Performance after well-solved trials
KPgood+MR  Knowledge of Performance plus Monetary Reward after well-solved trials
KPrandom   Knowledge of Performance after random selection of trials
M1         Primary Motor Cortex
MAL14      Motor Activity Log 14
MR         Monetary Reward
MID Task   Monetary Incentive Delay Task
NIHSS      National Institutes of Health Stroke Scale
PI         Principal Investigator
RASGRF2    Ras Protein-specific Guanine Nucleotide-releasing Factor 2
RCT        Randomized Controlled Trial
ROM        Range of Motion
SAE        Serious Adverse Event
WHO        World Health Organization
WMFT       Wolf Motor Function Test
General Introduction
Stroke
The Burden of Stroke
The word "stroke" was likely first introduced in 1689 by William Cole; before that, "apoplexy" was commonly used to describe very acute nontraumatic brain injuries (Sacco et al., 2013). Nowadays, the World Health Organization (WHO) defines stroke as "rapidly developing clinical signs of focal (or global) disturbance of cerebral function, lasting more than 24 hours or leading to death, with no apparent cause other than that of vascular origin" (Aho et al., 1980), although other, similar definitions exist (Sacco et al., 2013). A stroke is caused by disruption of the blood supply to the brain, resulting from either blockage (ischemic stroke) or rupture of a blood vessel (hemorrhagic stroke) (Mackay et al., 2004). In both cases, brain tissue is damaged due to a lack of oxygen and nutrient supply. At about 87%, ischemic strokes account for the vast majority of stroke incidents (Go et al., 2014).
Worldwide, about 15 million people suffer a stroke each year (Mackay et al., 2004). While the incidence of stroke has declined by 42% over the past four decades in high-income countries, an increase of more than 100% was observed in low- to middle-income countries. As a consequence, in the period 2000-08, overall stroke incidence rates in low- to middle-income countries exceeded those in high-income countries for the first time (Figure 1.1) (Feigin et al., 2009). Each year, about 5 of the 15 million people suffering a stroke die (Mackay et al., 2004). WHO estimates for 2001 indicate that low- and middle-income countries accounted for 85.5% of stroke deaths worldwide (Feigin et al., 2009; Mathers et al., 2006). In this context, the British newspaper "The Guardian" stated sharply: "The poor are dying more and more like the rich" (Lomborg, 2015). Hence, there is an urgent need for the progress in prevention and mortality reduction achieved in the developed world to be translated to middle- and lower-income societies (Towfighi and Saver, 2011).
Although the implementation of preventive treatments and reductions in risk factors at the population level helped to significantly reduce stroke incidence in well-developed countries (Feigin et al., 2009), stroke prevalence is likely to increase in the future due to the aging population (Veerbeek et al., 2014). In the United States, for example, projections show that by 2030 an additional 3.4 million adults will have had a stroke, reflecting a 20.5% increase in prevalence from 2012 (Go et al., 2014).
Figure 1.1: Age-adjusted stroke incidence rates per 100 000 person-years across the past four decades in (A) high-income countries and (B) low- to middle-income countries. The solid line is a trend line from a regression of average incidence on study period. Adapted from Feigin et al. (2009).
Seen from this angle, the increase in survivors with post-stroke morbidity is a tradeoff of mortality reduction through better acute stroke care (Towfighi and Saver, 2011). In Switzerland, about 14'000 stroke survivors are discharged from the hospital each year (Meyer et al., 2009), and a large number of patients remain disabled. Typical deficits are motor impairments such as paresis, spasticity, and disorders of mobility, together with neuropsychological impairments such as amnesia, agnosia, aphasia, apraxia, executive dysfunction, and mood disorders (Chen et al., 2013). In the long term, 25-74% of patients who have suffered a stroke have to rely on human assistance for basic activities of daily living like feeding, self-care, and mobility (Miller et al., 2010), which naturally has strong consequences for the patients and their families (Anderson et al., 1995). Moreover, stroke-related healthcare costs place a heavy burden on society, one that is likely to grow: it is projected, for instance, that total direct medical stroke-related costs in the United States will triple between 2012 and 2030 (Go et al., 2014).
Hence, although stroke epidemiology is not an explicit topic of this thesis, these staggering numbers highlight that, despite remarkable progress in stroke prevention and post-stroke health care, we are far from having solved the burden of stroke worldwide. This underscores the importance of research in the field of stroke rehabilitation.
Motor Learning as a Model for Stroke Recovery and Neurorehabilitation
Every simple goal-oriented movement is made up of separate operations, each of which, in the context of stroke, may or may not be affected by a lesion (Krakauer, 2006). However, almost all stroke patients experience at least some degree of functional recovery within the first six months post-stroke. Mechanisms like recovery of penumbral tissues, neural plasticity, resolution of diaschisis, and behavioral compensation strategies are presumed to be involved (Kwakkel et al., 2004). Rehabilitation is believed to interact with these underlying processes and, although some aspects of brain reorganization are probably unique to brain injury (Krakauer, 2006), there are large overlaps with development (Carmichael, 2003) and motor learning (Kleim et al., 2004).
"Rehabilitation, for patients, is fundamentally a process of relearning how to move to carry out their needs successfully" (Carr, 1987). This statement illustrates that rehabilitation is based on the assumption that practice or training leads to improvement of skills after hemiparesis (Krakauer, 2006). According to Shadmehr and Wise (2005), all humans must extend their motor repertoire during their lifetime in order to adapt to changing circumstances. This expansion is called skill learning or skill acquisition and may be seen as a practice-dependent reduction of performance errors detected through sensory channels (Krakauer et al., 1999). Moreover, we must continually adapt new as well as existing motor programs to changing circumstances. This modification of existing elements of the motor repertoire is called motor adaptation (Shadmehr and Wise, 2005).
Figure 1.2: Typical force field experiment. (A) A subject sits in front of a manipulandum and executes reaching movements to visual targets. (C) Trajectories are initially straight. When exposed to a force field (B), trajectories are initially perturbed (D). With training, they resume the prototypical shape (E). If the field is removed after learning, subjects display aftereffects (F) as overcompensation for the expected perturbation. Adapted from Gandolfo et al. (1996).
In motor control experiments (Figure 1.2), when a perturbation (e.g., a force field) to which subjects have adapted is suddenly turned off, trajectories are usually skewed in the direction opposite to that seen during initial adaptation (Gandolfo et al., 1996). These "aftereffects" indicate that the central nervous system alters motor commands to the arm to predict the effects of the force field; in turn, a new mapping between limb state and muscle forces (an internal model) is formed. This is of great importance for rehabilitation because it means that the internal model can be updated as the state of the limb changes (Krakauer, 2006).
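The formation of an internal model, and the aftereffects that appear when the perturbation is removed, are often formalized as an error-driven state-space model in the motor adaptation literature. The single-rate sketch below, with arbitrary retention and learning-rate parameters, illustrates the principle rather than the analysis of the cited experiments:

```python
def simulate_adaptation(schedule, retention=0.98, learning_rate=0.2):
    """Single-rate state-space model of motor adaptation.

    x tracks the internal estimate of the perturbation. On each trial
    the prediction error e = p - x drives an update:
        x <- retention * x + learning_rate * e
    Retention and learning rate are illustrative values, not fits.
    """
    x = 0.0
    states = []
    for p in schedule:   # p: perturbation actually applied on this trial
        error = p - x    # the part of the perturbation not yet predicted
        x = retention * x + learning_rate * error
        states.append(x)
    return states

# 80 trials in a force field (perturbation = 1.0), then 20 null trials.
states = simulate_adaptation([1.0] * 80 + [0.0] * 20)
# Compensation builds up during training; on the first null trial it
# persists transiently, reproducing the "aftereffect" of Figure 1.2F,
# and then washes out as the internal model is updated again.
```

The washout phase shows why aftereffects are evidence of an internal model: the compensation does not vanish the instant the field is removed, but decays trial by trial as errors of the opposite sign accumulate.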
However, the most fundamental principle in motor learning is likely that the degree of performance improvement depends on the amount of practice (Schmidt and Lee, 1988). Similarly, there is evidence for a positive influence of intensive training on functional recovery after stroke (Veerbeek et al., 2014), although a recent study among 361 participants could not demonstrate a dose effect of occupational therapy on motor function or recovery after 12 months (Winstein et al., 2016). Nonetheless, improvement is (although certainly not exclusively) limited by a subject's motivation to train, as motivation determines whether an individual is willing to spend time and resources on the training of (rehabilitative) exercises. The following subchapter therefore provides a short introduction to the concept of motivation and describes how it may be influenced, e.g., by rewards.
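One classic formalization of this practice dependence is the power law of practice, under which performance time (or error) on trial N falls as N raised to a negative exponent. The sketch below uses arbitrary parameters to show how gains per trial diminish, but never fully vanish, with practice:

```python
def power_law_performance(trial, t1=10.0, b=0.4):
    """Power law of practice: performance time (or error) on trial N is
    modeled as T(N) = T(1) * N**(-b). The parameters t1 and b are
    arbitrary illustrative values, not fitted to any dataset."""
    return t1 * trial ** (-b)

# Improvement is steep early and flat late: the first 9 practice trials
# buy a larger gain than 900 additional trials much later in practice.
gain_early = power_law_performance(1) - power_law_performance(10)
gain_late = power_law_performance(100) - power_law_performance(1000)
```

On these illustrative parameters, the early gain is several times the late one, which is the shape of most empirical learning curves: practice always helps, but each additional unit of practice helps less.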
Reward and Motivation
Doing something because it is inherently interesting or enjoyable is generally referred to as acting on intrinsic motivation, which is influenced by factors such as the subject's perceived autonomy, competence for, or relatedness to a task (Ryan and Deci, 2007). These factors make up the intrinsic value of the exercise. Extrinsically motivated actions, on the other hand, are performed because they lead to an outcome, e.g., to a reward (Ryan and Deci, 2000). Typically, rewards can be categorized into primary and secondary rewards. Primary rewards have a direct positive value for the individual receiving them; they often have a physiological meaning, like food, beverages, and sex. Secondary rewards, on the other hand, have no direct value, but we learn that receiving them usually has positive consequences (e.g., money, tokens, or some forms of social acknowledgement). While the valuation of primary rewards depends on hunger, thirst, or other states of the organism, secondary rewards are less prone to saturation and thus possess a relatively stable value. Nevertheless, a multitude of factors influence the individual valuation of primary as well as secondary rewards (Lutz and Widmer, 2014; Schultz, 2000; Sescousse et al., 2013).
According to behaviorist concepts, reward increases the probability that a rewarded behavior is shown in the future. Hence, rewards are closely related to motivation, providing incentives to actively seek certain stimuli (Lutz and Widmer, 2014). Generally, an individual's motivation to perform a specific exercise or activity is determined by the subjective benefit and the subjective cost of the activity, as illustrated in Figure 1.3. Both benefits and costs are 1) subjective, i.e., dependent on an individual's predisposition, goals, values, and attitudes, 2) state-dependent (e.g., the benefit of eating a sandwich is higher when hungry than when
satiated, and the cost of a cycling exercise is higher when tired than when well-rested), and 3) multifactorial (i.e., the overall benefit of a given exercise or activity is determined by multiple benefits of different natures) (Studer and Knecht, 2016).
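The benefit–cost framework above can be made concrete with a toy calculation. This is only an illustrative sketch of the idea, not a model from Studer and Knecht (2016); all factor names and numbers are hypothetical.

```python
# Toy illustration (hypothetical factors and weights): motivation as the
# difference between a multifactorial subjective benefit and subjective cost.

def motivation(benefits, costs):
    """Net motivation = sum of subjective benefits - sum of subjective costs."""
    return sum(benefits.values()) - sum(costs.values())

# State-dependence: the same sandwich is valued differently when hungry.
hungry = motivation(
    benefits={"taste": 3.0, "satiation": 4.0},   # high benefit when hungry
    costs={"effort": 1.0, "money": 1.5},
)
sated = motivation(
    benefits={"taste": 3.0, "satiation": 0.5},   # satiation worth little when full
    costs={"effort": 1.0, "money": 1.5},
)
assert hungry > sated
```

Adding a reward, in this picture, simply adds one more term to the benefit side, which is precisely how the studies in this thesis manipulate motivation.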
Figure 1.3: The motivation for a specific exercise is increased by the subjective expected benefit and decreased by the subjective expected cost of the exercise. Both sides contain an intrinsic and an extrinsic component. However, although the tasks that we deal with in this thesis do, of course, also contain a subjective cost component, we mainly aim at manipulating the subjective benefit side in order to influence our subjects' motivation to train and, hence, their behavior. Adapted from Studer and Knecht (2016).
Rewards augment the overall subjective benefit of a task, making people tolerate higher subjective costs; they are thus traditionally defined as stimuli an organism is willing to work for (Knutson and Cooper, 2005; Thorndike, 1931). While extrinsic reward refers to the receipt of material goods (e.g., food or money) for a specific activity, the term "intrinsic reward" describes reward derived from task-inherent stimulation (e.g., information about an achieved performance, looking at a self-painted picture, or feeling self-produced movements). This includes stimuli that signal performance accuracy, usually termed feedback, which can then be used to modify future performance (Kluger and DeNisi, 1996). Generally, learners like to receive positive feedback, giving positive feedback appetitive value and thus allowing it to act as a reward (Elliott et al., 1997; Tricomi and DePasque, 2016).
The studies included in the present thesis were designed to manipulate the benefit side for performing a specific exercise/training (Figure 1.3) by adding performance feedback and/or a monetary incentive, and to investigate a possible effect on behavior (Chapters 3-5).
The Use of Performance Feedback and Reward for Optimization of Motor Learning…
The processing of both intrinsic (in the form of performance feedback) and extrinsic reward (i.e., money) as described above has been shown to increase neural activity in the striatum (Lutz et al., 2012), a key locus of reward processing (Knutson et al., 2009; for a detailed overview of the neural correlates involved in distinct aspects of reward processing, see Chapter 2). Several studies reported activation elicited by feedback alone in the dorsal striatum (Poldrack et al., 2001; Tricomi et al., 2006; Tricomi and Fiez, 2008; Tricomi et al., 2004). However, Lutz et al. (2012) found that only the ventral striatum was active during performance feedback, while feedback plus monetary reward also activated the dorsal parts of the striatum and elicited stronger activation in the ventral parts.
These activations in response to performance feedback (linked or not linked to a monetary outcome) are of particular interest because it is known that, in a rewarded task, the hemodynamic ventral striatal response correlates with dopamine release in the ventral striatum, which in turn correlates with the reward-related neural activity in the substantia nigra/ventral tegmental area, the origin of the dopaminergic projection (Schott et al., 2008).
In humans, indirect evidence for dopamine involvement in motor learning comes from studies showing that long-term potentiation (LTP)-like plasticity in the primary motor cortex (M1) is enhanced by levodopa, a precursor of dopamine (Kuo et al., 2008), is abnormally reduced in Parkinson's disease, and is restored to normal in these patients by dopaminergic treatment (Morgante et al., 2006). Animal studies suggest that motor learning may be mediated by such LTP-like processes in M1 (Rioult-Pedotti et al., 2000). Furthermore, it has been shown in animals that M1 plasticity and skill learning depend on dopamine (Molina-Luna et al., 2009), and Hosp et al. (2011) demonstrated that these processes in rodents rely on midbrain dopaminergic projections involved in signaling reward. Destroying dopaminergic neurons in the ventral tegmental area prevented improvements in forelimb reaching, a deficit that was abolished by administration of levodopa into M1 (Hosp et al., 2011). These findings give strong reasons to assume that dopaminergic reward signals, such as rewarding feedback, may alter LTP and thus lead to increased efficiency in motor skill acquisition and, potentially, motor recovery after stroke.
Indeed, recent work suggests positive effects of reward on procedural learning (Wachter et al., 2009) and motor skill learning (Abe et al., 2011), as well as on motor adaptation (Galea et al., 2015).
Notably, all of these studies reported dissociable effects of positive and negative reward, and the latter two found positive reward to impact task consolidation/retention.
… and Neurorehabilitation of Motor Functions
As mentioned above, neurorehabilitative training for stroke patients is an effective intervention to increase independence in daily life activities (Veerbeek et al., 2014). Part of this training-induced reduction of impairments is mediated by plastic reorganization of cortical circuits (Luft et al., 2004; Nudo, 2003; Schaechter, 2004). Thus, the ability to learn is assumed to support successful recovery and rehabilitation therapy after stroke (Krakauer, 2006; Lam et al., 2016). Reward has been shown to increase the effectiveness of learning a motor task (Abe et al., 2011; Galea et al., 2015; Wachter et al., 2009). However, in stroke survivors the activity of this dopaminergic pathway may be reduced not only because rewards are small, but also because patients after stroke have deficits in reward processing (Lam et al., 2016). Stroke survivors showed reduced brain activation to smiley-face feedback, which was reflected in impaired reinforcement learning in a probabilistic classification task when compared to age-matched healthy individuals (Lam et al., 2016). Whether stroke also affects the processing of reward related to motor performance is, however, as yet unknown. Moreover, amplifying rewarding stimuli during rehabilitative training might be a means to overcome such a deficit and to stimulate the dopaminergic system to improve recovery.
Thesis Outline
The main objective of the present thesis was to translate evidence regarding the role of reward as a facilitator of synaptic plasticity in M1 and, as a consequence, of motor learning from animal models to humans. The ultimate goal, however, is the application of rewarding interventions for the optimization of motor neurorehabilitation in clinical populations that could benefit from improved plasticity, such as patients after stroke. This thesis consists of cumulative research articles originally written for separate peer-reviewed publication in scientific journals (Chapters 2-5).
As a first step, the literature on the most widely used functional imaging task for investigating the processing of reward in healthy, but also in clinical, populations was thoroughly reviewed (Chapter 2).
Based on the acquired knowledge and using a modified version of a recently published motor skill task (Shmuelof et al., 2012), the effect of different reward modalities on motor skill learning was investigated by manipulating either the schedule for, or the extrinsic subjective value of, delayed performance feedback. Using functional magnetic resonance imaging (fMRI), behavioural results could be linked to specific neural activations, with a focus on the striatum as a key region of reward processing in the human brain (Chapter 3).
Based on the findings presented in Chapter 3, the most efficient reward condition was chosen to investigate the neural processing of motor-performance-related reward in patients after stroke. Pilot results of this investigation have been published and are presented in Chapter 4.
Chapter 5 introduces a study protocol for an ongoing randomized controlled trial to investigate the clinical effect of reward on neurorehabilitation of motor functions after stroke.
Finally, Chapter 6 discusses the specific findings of this thesis in conjunction with each other, also mentioning shortcomings and future directions.
What can the monetary incentive delay task tell us about the neural processing of reward and punishment?
Published in:
Neuroscience and Neuroeconomics 2014, 3: 33-45. https://doi.org/10.2147/NAN.S38864
Authors:
Publisher:
Dove Medical Press Limited
Keywords:
Reward, punishment, dopamine, reward system
Abstract
Since its introduction in 2000, the monetary incentive delay (MID) task has been used extensively to investigate changes in neural activity in response to the processing of reward and punishment in healthy, but also in clinical, populations. Typically, the MID task requires an individual to react to a target stimulus presented after an incentive cue in order to win, or to avoid losing, the indicated reward. In doing so, this paradigm allows the detailed examination of different stages of reward processing, like reward prediction, anticipation, outcome processing, and consumption, as well as the processing of tasks under different reward conditions. This review gives an overview of different utilizations of the MID task by outlining the neuronal processes involved in distinct aspects of human reward processing, such as anticipation versus consumption, reward versus punishment, and, with a special focus, reward-based learning processes. Furthermore, literature on specific influences on reward processing, like behavioral, clinical, and developmental influences, is reviewed, describing current findings and possible future directions.
Introduction
Traditionally, rewards are defined as stimuli an organism is willing to work for, and punishments as stimuli an organism tries to avoid (Thorndike, 1931). These concepts have played a central role in the psychology of learning ever since they were introduced by behaviorism in the last century (see recent overviews by Domjan (2009); Miltenberger (2011)). They imply that reward and punishment are linked to an operant, i.e., to an agent's action. According to behaviorist concepts, reward increases the probability that a rewarded behavior is shown in the future, whereas punishment decreases this probability. Therefore, reward and punishment are closely related to motivation, providing incentives to actively seek or avoid certain stimuli, and thus can elicit appetitive or avoidance behavior, respectively.
Rewards have been categorized into primary and secondary rewards. Primary rewards consist of stimuli which have a direct positive value for an individual receiving the reward. Many of these primary rewards or punishments have a physiological meaning, like food, beverages, sex, and pain. In contrast, secondary rewards have no immediate direct value, but an individual learns that receipt of such rewards usually has positive consequences. Such rewards can be money, tokens, some forms of social acknowledgement, or similar. Valuation of primary rewards depends on hunger, thirst, or other states of the organism, often making it necessary to deprive an individual under observation of the respective reward, in order to make sure that the stimulus is indeed rewarding. In comparison, secondary rewards are less prone to saturation and thus possess a relatively stable value. Nevertheless, a multitude of factors exist, influencing the individual valuation of primary as well as secondary rewards.
The neuroscientific study of reward processing flourished with the detailed examination of neuronal activity in rodent brains during consumption and anticipation of rewards and punishment (Hollerman and Schultz, 1998; Schultz, 1998). For a comprehensive review, see Schultz (2006). This work revealed that unexpected presentation of a reward, acting as an unconditioned stimulus, leads to a phasic increase in dopaminergic activity in the substantia nigra/ventral tegmental area. After classical conditioning of such a reward to a conditioned stimulus, the conditioned stimulus elicits a similar phasic increase of dopaminergic activity, but presentation of the unconditioned stimulus does not do so anymore. Correspondingly, if presentation of a conditioned stimulus is not followed by an unconditioned stimulus despite this being expected (leading to extinction), then a phasic decrease of dopaminergic activity can be found at the time when the unconditioned stimulus had been expected. Thus, a
wealth of animal studies has led to the description of a reward system and allowed formulation of hypotheses about reward processing in human brains.
Soon after these groundbreaking investigations, research was extended to human subjects, mainly using neuroimaging methods to assess changes in neuronal activity due to the processing of reward and punishment (Delgado et al., 2000; Knutson et al., 2000). The most important paradigm used for these studies has been the monetary incentive delay (MID) task. This task consists of the announcement of an incentive, which is linked with a certain contingency to the receipt of this incentive. Basically, this reflects the case of classical conditioning. However, the standard version of the MID task requires an individual to react to a target stimulus presented after the incentive cue but before the reward is given. Whether the announced reward is delivered then depends on the individual reaction. Again, contingency can be introduced to make receipt of the reward more or less predictable from the individual action. Examples of such actions include forced-choice behavior, memory tasks, and motor tasks. See Figure 2.1 for a schematic comparison of classical conditioning and the MID task.
Figure 2.1: Schematic drawing of an incentive delay task (B) in comparison with a classical conditioning scheme (A). Note that both settings, instead of using reward/reinforcement, allow for use of aversive stimuli/punishment.
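To make the contingency in a standard MID trial concrete, the following is a minimal sketch of a single trial: an incentive cue announces a possible win or loss, and the outcome depends on whether the subject responds to the target in time. The deadline and payoff values are hypothetical, not parameters from Knutson et al. (2000).

```python
# Minimal sketch of one MID-style trial (hypothetical parameters).

def mid_trial(cue_value, reaction_time_s, deadline_s=0.5):
    """Return the monetary outcome of a single trial.

    cue_value > 0: potential win; cue_value < 0: potential loss to avoid.
    A fast-enough response wins the reward / avoids the loss.
    """
    hit = reaction_time_s <= deadline_s
    if cue_value >= 0:                  # reward trial
        return cue_value if hit else 0.0
    else:                               # punishment-avoidance trial
        return 0.0 if hit else cue_value

assert mid_trial(+1.0, 0.3) == 1.0     # fast response wins the announced reward
assert mid_trial(-1.0, 0.7) == -1.0    # slow response incurs the announced loss
```

The key property, visible in the branches above, is that the outcome is contingent on the subject's own action rather than on the cue alone, which is what distinguishes the MID task from classical conditioning.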
If contingency exists between an action (i.e., task processing) and a consequence, the learning process fits rather into the scheme of operant conditioning. In this context, appetitive stimuli are called reinforcers, since they strengthen the reinforced behavior. If the action is not reinforced (e.g., because it was not performed to a trainer/teacher's satisfaction), according to learning theory, this leads to extinction. Note that in the case of classical conditioning, a stimulus is, or is not, followed by a reward. During the MID task, an action is, or is
not, followed by reinforcement. However, the MID task allows assignation of different stimuli to different behaviors shown during task processing. One important possibility is to assign reinforcement to one action and an aversive stimulus (punishment) to another action triggered by the preceding cue. This is not the same as assigning a pleasant stimulus (UCS1) to a conditioned stimulus in some cases and an aversive one (UCS2) to the same conditioned stimulus in other cases, since during classical conditioning, presentation of the conditioned stimulus is not controllable by the individual, whereas during the MID task, task processing is. Furthermore, both setups, i.e., classical conditioning and MID tasks, allow the use of pleasant (appetitive) as well as unpleasant (aversive) stimuli to generate reward or punishment, respectively. The most important difference between the setups is that reward/punishment in the MID task depends on task processing, whereas in classical conditioning it depends on the conditioned stimulus.
This paradigm allows investigation of different stages of reward processing, like reward prediction, anticipation, outcome processing, and consumption, as well as the processing of tasks under different reward conditions. The current review gives an overview of the different uses of the MID task that have been published since its introduction by Knutson et al. (2000). The review does not attempt to give an exhaustive overview of the literature, but instead presents selected articles in order to highlight how the MID task has been used to investigate neuronal processes involved in distinct aspects of human reward processing, such as anticipation versus consumption, reward versus punishment, and reward-based learning processes. We further highlight work investigating different influences on reward processing, like behavioral, clinical, and developmental influences, as well as reward processing in different contexts. While describing current findings, the review attempts to point to possible future directions of investigation of the human reward system.
Anatomy of the reward system
In order to present an anatomical framework for discussing the neuronal processes involved in reward and punishment, Figure 2.2 gives an overview of the relevant brain structures, as described by Haber and Knutson (2010).
Figure 2.2: The human reward circuit. Evidence from self-stimulation, pharmacological, physiological, and behavioral studies emphasizes the key role of the nucleus accumbens and the ventral tegmental area dopamine neurons in the human reward circuit. However, striatal and midbrain areas involved during reward processing are more extensive than previously thought, including the entire ventral striatum and the dopamine neurons of the substantia nigra, respectively. Thereby, the orbital frontal cortex (dark orange arrow) and the anterior cingulate cortex (light orange arrow) provide the main cortical input to the ventral striatum. Moreover, the ventral striatum receives substantial dopaminergic input from the midbrain. On the other hand, ventral striatum projections target the ventral pallidum and the ventral tegmental area/substantia nigra, which, in turn, via the medial dorsal nucleus of the thalamus, project back to the prefrontal cortex. Additionally, other structures, such as the amygdala, hippocampus, lateral habenular nucleus, and specific brainstem structures, such as the pedunculopontine nucleus and the raphe nuclei, play a key role in the regulation of the reward circuit.
Abbreviations: Amy, amygdala; Hipp, hippocampus; NAcc, nucleus accumbens; dACC, dorsal anterior cingulate cortex; dPFC, dorsal prefrontal cortex; Hypo, hypothalamus; S, shell; STN, subthalamic nucleus; VP, ventral pallidum; vmPFC, ventral medial prefrontal cortex; THAL, thalamus; LHb, lateral habenular nucleus; PPT, pedunculopontine nucleus.
Notes: Reprinted by permission from Macmillan Publishers Ltd: Neuropsychopharmacology. Haber SN, Knutson B. The reward circuit: linking primate anatomy and human imaging. Neuropsychopharmacology. 2010;35(1):4–26. Copyright © 2010.
Aspects of Processing Reward and Punishment
A typical rewarding or punishing situation is a complex phenomenon. It consists of distinct temporal phases and can include different classes of stimuli. In the following sections, the most important aspects, as discussed in the literature, are outlined.
Anticipation and Consumption (wanting/liking)
As described by Knutson et al. (2000) in their original article introducing the monetary incentive delay task, a distinction between anticipation and consumption of rewards should be made when interpreting neuronal activity involved in reward processing. Such a distinction has been suggested based on previous observations in animals (Elliott et al., 2000) and on traditional views (Craig, 1917). Consequently, one year after publication of their original article (Knutson et al., 2000), a report about distinct neuronal activity in humans attributable to anticipation versus consumption of rewards was published (Knutson et al., 2001b). In short, it reports that reward anticipation activates ventral striatal regions, whereas the receipt of reward outcomes activates the ventromedial frontal cortex, thus replicating earlier studies in monkeys (Schultz et al., 2000). This finding has essentially been corroborated over the years with different types of reward (Breiter et al., 2001; Knutson et al., 2003; O'Doherty et al., 2002; Rademacher et al., 2010). Closer inspection of the time course of brain activity involved in reward processing has revealed a more complex pattern: after presentation of monetary gain or loss, activity in the dorsal striatum, particularly the dorsal part of the caudate nucleus, is sensitive to valence (reward/punishment) as well as outcome magnitude (Delgado et al., 2003). This holds at later stages, approximately 9–12 seconds after outcome presentation, when large rewards elicit the strongest increase and large punishments the weakest. On the other hand, the ventral striatum, especially the nucleus accumbens, seems to be strongly influenced by incentives (Knutson et al., 2001b) and shows less reactivity to outcome than the dorsal striatum (Delgado et al., 2003).
Interestingly, initial feedback-related activity in the dorsal striatum seems to be dependent on incentive values, but after a few seconds, activity seems to depend on the size of the outcome (Delgado et al., 2004).
The dynamics of brain activity in relation to the processing of different reward stages has led to the formulation of a temporal difference model of reward-based learning (Knutson and Wimmer, 2007; O'Doherty et al., 2003). In brief, this model describes how error terms are derived from a mismatch between the predicted reward and that actually received. This mismatch can lead to a positive or negative reward prediction error, meaning that an outcome
is better or worse than expected, respectively. Prediction of rewards in response to cues seems to take place in the nucleus accumbens (Pagnoni et al., 2002), whereas the medial prefrontal cortex seems to (re)calculate expectations of gains in response to outcomes.
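The core of this temporal difference account can be sketched in a few lines. This is a generic Rescorla-Wagner/TD-style update, not the specific model fitted in the cited studies; the learning rate and reward values are arbitrary.

```python
# Generic sketch of a reward prediction error (delta) update; alpha and the
# reward values are illustrative, not parameters from the cited studies.

def td_update(value, reward, alpha=0.1):
    """Update a cue's predicted value from one observed outcome."""
    delta = reward - value              # >0: better than expected; <0: worse
    return value + alpha * delta, delta

v = 0.0
for _ in range(50):                     # repeated cue -> reward pairings
    v, delta = td_update(v, reward=1.0)
assert v > 0.99                         # the reward is now largely predicted

# Unexpected omission of the reward yields a negative prediction error,
# mirroring the phasic dopamine dip described for extinction above.
v, delta = td_update(v, reward=0.0)
assert delta < -0.9
```

Note how the error term vanishes once the prediction matches the outcome, which is why a fully predicted reward no longer drives learning.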
While these findings, gleaned by means of functional magnetic resonance imaging, describe brain activities with relatively slow temporal dynamics in the range of seconds, other methods reveal how brain activities with faster temporal dynamics are related to reward prediction and receipt. Thus, a more complex picture emerges: processing negative prediction errors leads to negativity in the posterior cingulate cortex and striatum, whereas processing of reward expectancy corresponds with electrophysiological activity in the posterior cingulate cortex, anterior cingulate cortex, and parahippocampal gyrus (Donamayor et al., 2012). Furthermore, electroencephalography (EEG) and magnetoencephalographic methods reveal that reward cues are coded by neuronal oscillations in the beta (20–30 Hz) and theta (5–8 Hz) range in frontal regions (Bunzeck et al., 2011; Donamayor et al., 2012; Kawasaki and Yamaguchi, 2013). Integration of these results into existing knowledge about the human reward system has only just started, and is likely to benefit from further studies investigating the fast temporal dynamics of human reward processing.
Reward versus Punishment
In addition to the question of temporal dynamics, when discussing rewards, the question remains as to whether rewarding and punishing effects are processed by distinct brain structures. To elaborate on this question, it seems beneficial to briefly review the positive and negative effects of rewards and punishments. By definition, a stimulus that increases the frequency of a behavior upon which the stimulus is contingent is called a reinforcer. Reward and positive reinforcement are commonly considered to be synonymous, although a reward is less strictly defined. Positive reinforcement usually consists of the presentation of an appetitive stimulus contingent on an individual's behavior, whereas negative reinforcement consists of the removal of a noxious or otherwise aversive stimulus. On the other hand, punishment can consist of the presentation of an aversive stimulus or the removal of an appetitive stimulus. The MID task theoretically allows for investigation of all of these entities. However, instead of investigating the removal of stimuli, the incentive delay task has usually been used to investigate negative prediction errors, i.e., an unexpected decrease in reward magnitude or an unexpected increase in punishment.
There have been several investigations comparing neuronal activity correlated with positive and negative prediction errors. Without distinguishing between anticipation and outcome processing, Delgado et al. (2000) found stronger involvement of the ventral striatum (approximate region of the nucleus accumbens) and the dorsal striatum (caudate nucleus) in trials showing a positive rather than a negative outcome. The latter structure was later shown to code reward magnitude in a parametric manner (Delgado et al., 2003). Since the task used in the study of Delgado et al. (2000) involved gambling, and cues were not manipulated to induce expectancies, reward anticipation is unlikely to have varied systematically and therefore should not have influenced these findings.
Rogers et al. (2004) showed that activity in the medial prefrontal cortex (posterior orbitomedial cortex and pregenual anterior cingulate cortex) increases when positive outcomes are given, relative to the situation when subjects are confronted with a loss. Importantly, these outcomes, due to the nature of the task, were unpredictable, so positive outcomes represent a positive prediction error. While Rogers et al. (2004) only reported increased brain activity due to processing of positive outcomes versus negative outcomes, and not vice versa, Ramnani et al. (2004) investigated both types of prediction error separately. Their results corroborate the finding that unexpected rewards activate, among other regions, the medial prefrontal cortex. They also showed that unexpected omission of rewards activates a distinct region of the medial prefrontal cortex, anterior to the aforementioned areas. Negative outcomes in these studies were operationalized as not receiving an expected reward. Alternatively, negative outcomes can be explicitly defined as a loss by deducting a certain amount of money from a participant's credit. In doing so, distinct regions are revealed that code positive (gain) and negative (loss) reward prediction errors (Yacubian et al., 2006). Whereas unexpected reward is confirmed to activate the ventral striatum, unexpected loss is shown to correlate with neuronal activation in the amygdala. Interestingly, using this design, not only receipt of outcomes (prediction errors) but also their anticipation involved activation of the ventral striatum and amygdala; anticipation of positive outcomes activates the ventral striatum, whereas anticipation of negative outcomes activates the amygdala. Further evidence for involvement of the amygdala in anticipation of outcomes comes from a study using a different task (Breiter et al., 2001).
A wheel-of-fortune game presented subjects with several possible gains in some rounds and with possible losses in other rounds. The results differ from those reported by Yacubian et al. (2006) in that activation of the amygdala was increased during anticipation of loss as well as reward in the study by Breiter et al. (2001) but
only during anticipation of loss in the study by Yacubian et al. (2006). Although speculative, the difference might be explained by the subjects' role in the tasks. The wheel-of-fortune task did not give the subjects any control over the outcome, whereas during the guessing task used by Yacubian et al. (2006), subjects might have had a feeling of agency, i.e., of being responsible for the outcome. Thus, anticipation of reward might depend on whether subjects perceive themselves to have control over the outcome or not.
Additionally, introducing high-incentive versus low-incentive trials (operationalized as monetary gain/loss versus knowledge of performance without monetary consequence, respectively), it has been shown that the dorsal striatum/caudate nucleus is mainly sensitive to monetary incentive, even during outcome processing (Delgado et al., 2004).
Reward-based Learning
One of the most important functions of reward processing is to enable the organism to adapt its behavior in order to maximize reward and minimize punishment. A reward prediction error indicates that a cue is not associated with the expected consequence. Thus, in the future, expectations connected to that cue should change. This forms the essence of classical conditioning. As a result, being confronted with the respective cue might be avoided or sought out in the future. Similarly, if reward is dependent on an individual's behavior (e.g., choice behavior or motor accuracy), a prediction error informs the individual that the behavior does not lead to the expected outcome. According to learning theory, behaviors are chosen so that expected reward is maximized and/or expected punishment is minimized. Thus, behaviors leading to reward are strengthened and behaviors leading to punishment are weakened. This is the principle of operant (or instrumental) conditioning.
As a variant of the classical MID task, early studies (Ramnani and Miall, 2003; Ramnani et al., 2004) gave rewards contingent on goal-oriented activities or contingent on stimuli unrelated to behavior. A main result was that, if not contingent on any behavior, unpredicted rewards evoked activity in the orbitofrontal cortex, the frontal pole, the parahippocampal cortex, and the cerebellum. If a monetary incentive is present while a visually triggered action is selected and planned, this results in enhanced activity within the prestriate visual cortex, the premotor cortex, and the lateral prefrontal cortex as compared with action selection and planning without a monetary incentive. These findings, based on goal-oriented behavior, do not involve striatal structures, which makes it hard to integrate them into the reward literature
existing at the time. In fact, there is the possibility that focusing on goal-oriented motor behavior might involve different structures than focusing on the decision process. However, no direct comparison of classical versus operant conditioning processes was performed in these studies. O'Doherty et al. (2004) applied a task requiring a two-alternative forced choice in which fruit juice could be gained as a reward upon selecting the right reactions in response to corresponding cues, thus forming a non-monetary instrumental conditioning task. The investigators compared this task with a condition in which the subject had no influence on the outcome, but the selection was made by a computer (coupled to the subject's previous selections), thus forming a classical conditioning situation in which motor activity is comparable with instrumental conditioning. This approach allows the conditioning process to be viewed within the framework of an actor-critic model, where the actor chooses actions according to expected action outcomes and the critic controls whether the actions lead to the expected rewards (reward prediction error). In this setting, an actor was only assumed to be active in the instrumental conditioning condition. The results show that ventral striatum activity correlates with the reward prediction error signal in both types of conditioning. The prediction error signal during classical conditioning was related to activity in the ventral putamen, and the prediction error signal during instrumental conditioning was related to the nucleus accumbens. The dorsal striatum, on the other hand, was more strongly activated due to outcome processing in instrumental conditioning than in classical conditioning, suggesting its involvement in the role of an "actor".
Importantly, in another study, the dorsal striatum (head of the caudate nucleus) was activated only when a reward was perceived to be contingent on an action (Tricomi et al., 2004).
Considering reward following cues versus reward following actions, Glascher et al. (2009) have set up an experiment discriminating between the two. They found that activity in the ventromedial prefrontal cortex corresponds to the expected reward following actions as well as external cues. On the other hand, using an operant conditioning paradigm, FitzGerald et al. (2012) distinguished action values (the value ascribed to a specific action) from choice values (the value ascribed to either of two choices). They were able to show that the ventromedial prefrontal cortex, along with thalamic and insular structures, encodes action-specific values, and is thus likely to be involved in operant conditioning. This is partly consistent with studies showing brain activity in the ventromedial prefrontal cortex to correspond to the expected value of actions or choices (Palminteri et al., 2009; Wunderlich et al., 2009).
These studies of reward-based learning make it clear that distinct mechanisms are likely to be involved when action is or is not required of the subject. However, when acting, we have to consider whether the actions are goal-oriented or not. If rewards or punishments are offered, we can usually assume that the agent's goal is to maximize reward and to minimize punishment. Some aspects of goal-oriented behavior in the context of reward processing are discussed in the following section.
Goal-oriented Behavior and Reward

One important distinction between classical and operant conditioning lies in the fact that classical conditioning does not assume an agent's action to lead to consequences. Instead, during classical conditioning, stimulus effects on the individual are investigated. Operant conditioning, on the other hand, rewards certain behaviors, leading to an increased probability that these behaviors take place in the future, possibly in order to receive further rewards. Thus, presentation of rewards is commonly understood to be accompanied by emotional reactions which may trigger motivated behavior. This view brings a series of studies into focus, allowing the question of whether goal orientation might involve distinct components of the reward system to be addressed. An early study pointing in this direction showed dorsal striatum activity in response to the presentation of performance feedback in classification learning tasks (Aron et al., 2004; Poldrack et al., 2001; Tricomi et al., 2006). Interestingly, positive performance feedback did not elicit stronger activity than negative feedback in any subregion of the striatum (Aron et al., 2004). However, when giving performance feedback under two conditions, one signaling achievement of the subject's goal and the other signaling the same amount of information but unrelated to any explicit goal, the former condition activated the head of the caudate nucleus more strongly (Tricomi and Fiez, 2008). Similarly, Nees et al. (2012) demonstrated that during an MID task, the anticipation of an optional reward elicited ventral striatum activity dependent on the magnitude of the possible reward. On the other hand, no such dependency existed in a simple guessing task, in which reward magnitude, although being experimentally manipulated, was unrelated to the subject's behavior.
Other studies have used a slot-machine task requiring no action and compared this with tasks in which outcome was contingent on the preceding choice or action (Donkers et al., 2005). Using EEG, the investigators found that action outcomes elicited a transient change in mediofrontal activity when they were unfavorable (errors). On the other hand, when independent of the subject's action, a similar mediofrontal EEG component was elicited in response to both favorable as well as unfavorable salient outcomes. This finding was later substantiated with a different task, again showing a greater difference between win-related and loss-related EEG frontomedial components when the outcome depended on the subject's action (Zhou et al., 2010). These findings can be seen in the context of the abovementioned studies of reward anticipation when there is (Yacubian et al., 2006) or is not (Breiter et al., 2001) a likely feeling of agency. To elaborate further on the role that performance feedback plays in the reward circuitry, a recent study compared performance feedback in a motor task when performance was or was not linked to monetary reward (Lutz et al., 2012). In both cases, being informed about good performance activated the ventral striatum. However, in this study, feedback about bad performance led to less activity in this region than feedback about good performance, in contrast with previous findings using other tasks (Aron et al., 2004). In those tasks, however, information about bad performance was similarly valuable for the overall goal of learning a classification task, whereas the goal in the later study was generally to maximize precision in a motor task that in a subset of trials led to monetary gain. Thus, error was always negatively coupled with reward. Similar results have been found in a study using a category learning task with a monetary incentive in 50% of trials, whereas only cognitive feedback was given in the other 50% (Daniel and Pollmann, 2010). Both kinds of feedback elicited increases in the activity of several basal ganglia structures during the anticipation phase. Activity in the nucleus accumbens was stronger in monetary incentive trials, corresponded to measures of extrinsic motivation in monetary incentive trials, and corresponded to measures of intrinsic motivation in cognitive feedback trials.
Video gaming is another example in which high motivation can be observed without obvious rewards. Performance feedback is stressed in many of these games, and massive release of dopamine into the ventral and dorsal striatum was reported long ago during video gaming (Koepp et al., 1998). An active role of the player seems to be essential for this (Katsyri et al., 2013).
These studies show that performance feedback, especially if it informs about good performance, elicits neuronal activity in many respects comparable with the neuronal activity elicited by the presentation of reward. Given that performance feedback is not regarded as a classical rewarding stimulus, the question about motivation in these tasks without explicit incentive is interesting. The concept of intrinsic motivation (Ryan and Deci, 2000) assumes that some tasks are worked on merely because a subject enjoys working on the task. With certain components of the human reward system being involved in the processing of performance feedback and even being connected to measures of intrinsic motivation, versions of the MID task without monetary (or other forms of) incentive seem to provide a valuable approach to investigate intrinsic motivation and its interaction with extrinsic motivation. This interaction has been discussed controversially in behavioral studies in terms of the interesting notion that monetary reward can undermine intrinsic motivation. A pioneering study using a variant of an MID task investigated the size of the midbrain and striatal activation due to positive performance feedback under several conditions, i.e., when performance was coupled to monetary reward, when performance was not coupled to monetary reward, and when performance was not coupled to monetary reward after having been coupled to it previously. Not differentiating between the dorsal and ventral striatum, the authors found that performance feedback elicited the strongest responses in the midbrain and striatal regions due to feedback of good performance which was monetarily rewarded, and significantly weaker activation if no monetary incentive was given. Interestingly, removing the monetary incentive led to a drop in midbrain and striatal activity to significantly below the level in a control group in which monetary incentives had never been present.
These studies show that, in some tasks, performance feedback can serve as a task-intrinsic reward, so that intrinsic rather than incentive motivation might be at work. Considering the much-discussed advantages of intrinsic versus extrinsic motivation (Frey, 1994; Frey and Jegen, 2001), closer examination of the neural systems involved in the interaction of these motivational systems would be of interest.
Reward Processing and Error Monitoring

A topic closely related to the role of performance feedback in reward processing is the processing of error information in cognitive or motor tasks. As mentioned earlier, processing of error information in a categorical learning task has been shown to elicit activity in the ventral striatum in specific settings (Poldrack et al., 2001). Error information is most frequently investigated in tasks showing similarity to the MID task (consisting of the triad cue, action/choice, and outcome), with the difference being that monetary incentive is not usually coupled to error information. Therefore, although the scope of the present paper is focused on the MID task, a brief comment on the interesting link between human error processing and reward systems seems appropriate.
Processing error information in decision tasks has been studied intensively using electrophysiological methods, mainly EEG. Event-related components time-locked to erroneous behavior (error-related negativity) and time-locked to feedback about an error (feedback-related negativity) have been identified as the most important and most reliable neuronal correlates of error processing. An influential model, the reinforcement learning theory of error processing, assumes that whenever information indicates that an action does not result in an expected consequence, disinhibition of the anterior cingulate cortex mediated by the basal ganglia leads to a negative EEG deflection in the frontocentral electrodes. The precise location and involvement of the basal ganglia, along with the anterior cingulate cortex and other structures not readily amenable to investigation by EEG, have been identified by functional magnetic resonance imaging studies of error perception (Holroyd et al., 2004; Klein et al., 2007; Ullsperger and von Cramon, 2003). The theory is eloquently described by Holroyd et al. (2009). The stronger the expectation of a certain action outcome that is missed, the greater the EEG deflection. Thus, the reinforcement learning theory of error processing only applies if stable expectancies concerning action outcome can be formed, allowing a reward prediction error to be generated. In motor learning theory, the capacity to predict action outcomes is inherent in an internal model mapping actions to consequences in the environment (Wolpert et al., 1995). Confirming the reinforcement learning theory of error processing and its applicability to motor learning, a recent study has shown that error-related negativity increases with the buildup of such an internal model while learning audiomotor mappings on a manipulated piano keyboard layout (Lutz et al., 2013). The evidence presented here demonstrates that reward prediction errors seem to play a greater role than would be expected from the investigation of MID tasks. Rather, reward prediction and outcome monitoring seem to have important features in common.
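The core claim, that the deflection scales with the strength of the violated expectancy, can be illustrated with a simple Rescorla-Wagner-style value update. This is a sketch with an assumed learning rate and trial structure, not a model fitted to any of the cited EEG data:

```python
def omission_prediction_errors(n_blocks=5, trials_per_block=10, alpha=0.2):
    """After each block of rewarded trials, probe with one omission trial.
    As the expectancy v approaches 1 over training, the omission error
    (0 - v) grows in magnitude, mirroring the growth of the feedback-
    related negativity with a stronger violated expectation."""
    v = 0.0
    probe_errors = []
    for _ in range(n_blocks):
        for _ in range(trials_per_block):
            v += alpha * (1.0 - v)      # rewarded trial: v moves toward 1
        probe_errors.append(0.0 - v)    # omission probe: negative RPE
    return probe_errors
```

The probe errors become monotonically more negative: an early omission barely violates any expectation, while a late one violates a strong expectation, which is the pattern reported as error-related negativity growing with the buildup of an internal model (Lutz et al., 2013).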
As proposed by Kaplan and Oudeyer (2007), the goal to minimize prediction error in several domains might be driving intrinsically motivated behavior like playing and exploration, and the nucleus accumbens may play a pivotal role in this process. Interestingly, novelty, as encountered when exploring new environments, activates a neuronal system partly overlapping with the reward network, and contextual novelty seems to boost activity in the striatum (Bunzeck et al., 2011; Bunzeck et al., 2012; Businelle et al., 2010; Guitart-Masip et al., 2010).
Discounting of Delayed Reward

MID tasks also allow investigation of other aspects of reward processing not mentioned so far. One important aspect of the value of a reward is determined by its temporal availability; if available immediately, a reward is valued more highly than if it is available only after a certain period of time. Thus, introducing a choice between small and immediately available or large and not immediately available rewards enables study of the process of devaluation due to temporal delay, known as delayed reward discounting. A wealth of literature has accumulated showing how different periods of delay influence the subjective perception of reward value in different populations. A recent overview of this topic, including underlying neural mechanisms, has been published by Peters and Buchel (2011). Subjective discounting seems to take place in a neural system comprising the ventromedial prefrontal cortex, ventral striatum, and posterior cingulate cortex. Further, the amygdala and hippocampus are involved in delay discounting. The specific contribution of these structures is not yet fully understood, but abnormal delay discounting has been linked to neuropsychiatric disorders related to impulsivity, as well as to addictive behaviors. Typically, addicts discount delayed rewards at a much higher rate than control subjects (Bickel et al., 2007; Businelle et al., 2010). Moreover, this impulsive discounting behavior has been shown to be largely independent of the particular drug of abuse and thus seems to be a reliable trait marker for addiction, especially given that it has been observed not only for drug rewards, but also for nondrug rewards, such as money (Peters and Buchel, 2011). The latter makes the MID task an ideal tool for the study of addictive behavior in the context of reward and loss. The way in which the neural response to reward differs between addicted and healthy subjects is discussed in the following section concerning individual influences on reward processing.
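Delay discounting is commonly modeled with a hyperbolic function, V = A / (1 + kD), where a larger discount rate k (as reported for addicted populations) devalues delayed rewards more steeply. A small sketch of how a steeper rate flips a smaller-sooner versus larger-later choice; the k values, amounts, and delay are illustrative, not estimates from any cited study:

```python
def hyperbolic_value(amount, delay, k):
    """Hyperbolic discounting: subjective value V = A / (1 + k * D)."""
    return amount / (1.0 + k * delay)

def prefers_smaller_sooner(k, small=20.0, large=50.0, delay_days=30.0):
    """Does an agent with discount rate k choose $20 now over $50 in 30 days?"""
    return hyperbolic_value(small, 0.0, k) > hyperbolic_value(large, delay_days, k)
```

With a steep rate (k = 0.1 per day) the delayed $50 is worth only 50 / (1 + 3) = $12.50, so the immediate $20 wins; with a shallow rate (k = 0.005) the delayed reward keeps most of its value and wins instead.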
Individual Influences on Reward Processing

With the observation that certain clinical populations show abnormal delay discounting, a further field of investigation using the MID task is now introduced along with a few examples: alterations in reward processing in clinical populations, as well as influences of personality traits and changes across the lifespan. This can help to increase our knowledge about the relevant diseases and to find new treatment approaches.
For instance, many authors have used the MID task to understand reward and loss in addicts. The observation of steeper delay discounting in addicts raises the question of whether increased discounting is a consequence or a cause of addiction. That is, do genetic factors result in an impulsive personality and thereby increase the likelihood of drug abuse, or is impulsive discounting a repercussion of changes at the neural level due to long-term drug abuse? Generally, while addicts show an increased response of the reward system to drug-related cues (Diekhof et al., 2008), overall the data imply that addiction is associated with reduced activation of the valuation network (i.e., the ventral striatum and orbitofrontal cortex, including the ventromedial prefrontal cortex) during processing of nondrug rewards (Peters and Buchel, 2011). Recent evidence from a longitudinal genetic neuroimaging study links decreased reward sensitivity during the anticipation phase of an MID task to a certain haplotype of the ras protein-specific guanine nucleotide-releasing factor 2 (RASGRF2) gene in 14-year-old males (Stacey et al., 2012). This haplotype has previously been linked to addictive behavior (Schumann et al., 2011), and thus represents a possible genetic risk factor for drug addiction. In contrast with this reward deficiency hypothesis, other studies have observed increases in ventral striatal activity during the anticipation of monetary gains in chronic cannabis users (Filbey et al., 2013; Jager et al., 2013; Nestor et al., 2010), and the blood-oxygen level-dependent response in the right ventral striatum was found to be significantly correlated with lifetime use and the reported number of cannabis joints consumed over the lifetime (Nestor et al., 2010). Therefore, the relationship between chronic cannabis use and activity in the ventral striatum might be qualitatively different from that involving other drugs (Bjork et al., 2008). Concerning the question of cause or consequence, a recent study by Patel et al. (2013), in addition to corroborating the reward deficiency hypothesis, investigated reward processing in former and current cocaine users. Both groups differed similarly from control subjects, but between-group differences were found in the ventral tegmental area during loss outcome and in prefrontal regions during loss anticipation. The authors concluded that current cocaine use may influence reward processing circuits, and that even long-term cocaine abstinence does not normalize most drug-related reward circuit abnormalities. Since both groups showed elevated impulse-related factors that relate to loss, the authors further suggested that these tendencies may predate cocaine addiction.
Further, genetic factors have been shown to be associated with altered reward processing in alcoholism (Villafuerte et al., 2012). Certain variants in the γ-aminobutyric acid A receptor α2 subunit (GABRA2) gene are linked with higher insular cortex activity during anticipation of reward and punishment, as well as with impulsiveness and familial alcohol abuse. Here, however, changes in dopaminergic activity have not been directly reported, since GABRA2 codes for a subunit of GABAA receptors. All in all, further studies are needed to investigate the extent to which functional differences in former users of cocaine and other drugs reflect pre-existing features, exposure, and recovery.
As another example of changed reward processing in a clinical population, patients with attention deficit hyperactivity disorder show decreased activation in the ventral striatum during anticipation of gain, but increased activation of the orbitofrontal cortex in response to gain outcomes (Strohle et al., 2008). However, these observations of decreased activation of the putamen, while being partially confirmed in adults with persistent symptoms of attention deficit hyperactivity disorder, did not prevail in symptom-free adults with childhood attention deficit hyperactivity disorder (Stoy et al., 2011), and behavioral changes in reward processing were not confirmed in other studies (Demurie et al., 2011; Demurie et al., 2013). Thus, although the phenomenon of striatal hypoactivation during reward anticipation is well known in patients with attention deficit hyperactivity disorder (Stoy et al., 2011; Strohle et al., 2008), it would be premature to draw firm conclusions.
Schizophrenia is another disease that has been linked to changes in reward processing. During reward anticipation, schizophrenics show significantly less activation in the ventral striatum (Schlagenhauf et al., 2008), anterior cingulate cortex, and dopaminergic midbrain regions than healthy controls (Nielsen et al., 2012a), possibly explaining the symptoms of apathy (Simon et al., 2010) commonly present in schizophrenia. This attenuation was reduced by treatment with the dopamine antagonist amisulpride (Nielsen et al., 2012b) or with olanzapine (Schlagenhauf et al., 2008). However, this attenuation has not consistently been replicated in other studies (Waltz et al., 2010; White et al., 2013). Although there is no clear picture as yet regarding how the reward system may be modified in patients with schizophrenia, probing the integrity of this system may lead to the identification of subgroups and tailored treatment concepts.
A recent meta-analysis of the literature on reward processing in major depressive disorder has summarized the results of 22 functional magnetic resonance imaging studies, of which five used variations of the MID task and another seven applied conditional learning or guessing tasks with or without rewards to patients. This work yielded rather heterogeneous results, possibly due to the great heterogeneity of the experimental paradigms used. One general finding seemed to be decreased reward-related activity in subcortical and limbic areas and an increased response in cortical areas. The authors concluded that "future studies may be strengthened by paying careful attention to the types of reward used as well as the different components of reward processing examined" (Zhang et al., 2013).
As mentioned above, addictive behavior is linked to changes in reward processing, possibly via greater impulsiveness or altered reward sensitivity in addicted individuals. However, in nonclinical populations, reward sensitivity, as measured by questionnaire (Torrubia et al., 2001), is a stable trait associated with changes in, e.g., reward-based learning and inhibitory control (Corr, 2004). This trait also has neurophysiologic correlates. For example, individuals with high reward sensitivity show increased responses in the nucleus accumbens and midbrain to reward anticipation (Beaver et al., 2006; Camara et al., 2010; Carter et al., 2009; Hahn et al., 2009), as measured during performance of MID tasks. Further, structural and functional correlates of high trait reward sensitivity, not directly related to MID task processing, have been described and complement our understanding of individual differences in reward processing. Such changes include diminished striatal volume (Barros-Loscertales et al., 2006), increased strength of the white matter tract between the nucleus accumbens and amygdala (Cohen et al., 2009), more random resting neural dynamics in the nucleus accumbens and orbitofrontal cortex (Hahn et al., 2012), and less functional connectivity between the midbrain and medial orbitofrontal cortex (Costumero et al., 2013).
Other personality factors have also been shown to relate to reward processing. For example, Wu et al. (2014) have investigated the affective traits of positive arousal and negative arousal, as derived by factor analysis from several standard affective personality subscales (i.e., the extraversion and neuroticism scores from the NEO Five-Factor Inventory; actual high-arousal positive and actual high-arousal negative scores from the A Values Inventory; and behavioral inhibition, behavioral activation-reward, behavioral activation-drive, and behavioral activation-fun scores from the behavioral inhibition/behavioral activation scale). These authors demonstrated that during anticipation of large gains, the nucleus accumbens shows significantly increased activity bilaterally, whereas during anticipation of large losses, activity in the anterior insular cortex is significantly increased bilaterally. Interestingly, activation increases in the left nucleus accumbens during anticipation of large gains correlate with positive arousal scores, whereas activation increases in the right anterior insula during anticipation of large losses correlate with negative arousal scores.
Further, recent studies have shown that reward processing can be influenced by environmental factors such as stress. Treadway et al. (2013) demonstrated that subjects reporting a greater impact of stressors had smaller neural responses in the medial prefrontal cortex in response to both monetary gains and losses in an MID task. Similarly, acute stress, induced before performing a guessing task, blunted activation increases in the dorsal striatum and orbitofrontal cortex when compared with a control group not subjected to stressors (Porcelli et al., 2012). These studies, although not directly corroborating each other, nevertheless draw a comparable picture, revealing a decrease in mediofrontal reward-related brain activity under conditions of perceived stress, which might relate to the role of stress as a risk factor for addictive behavior (Sinha, 2009).
Another topic warranting brief discussion here is the development of reward processing over the lifespan. Although it could be argued that a comparison between healthy adolescents and adults reflects intraindividual development rather than interindividual differences, no longitudinal studies on reward processing beyond adolescence are available, to our knowledge (note, however, the IMAGEN trial following a cohort of 2,000 adolescents and describing, among other things, functional genetics and neuroimaging of the MID task) (Schumann et al., 2011). Thus, studies using the MID task to investigate gain and loss in developmental populations are discussed in this section. Adolescents are of particular interest in this context because of their increased willingness to take risks. Bjork et al. (2010) were the first to compare patterns in the reward circuit in response to incentive cues and outcomes between adolescents and adults. They observed lower right ventral striatal and right extended-amygdala activation due to gain anticipation (but not consumption) in adolescents. These findings were subsequently replicated by the same group (Bjork et al., 2011). In contrast, other studies investigating win versus no-win demonstrated stronger activation of the ventral striatum in adolescents but stronger activation of the amygdala in adults, suggesting that "maturing subcortical systems become disproportionately activated relative to later maturing top-down control systems, biasing the adolescent's action toward immediate over long-term gains" (Ernst et al., 2005; Galvan et al., 2006).
The divergence of findings from these different studies has been attributed to the sensitivity of the incentive-motivational neurocircuitry to the nuances of the incentive task or stimuli, such as behavioral or learning contingencies, and to the specific component of instrumental behavior examined, such as anticipation versus notification (Bjork et al., 2010). More recently, it was found that, compared with adults, adolescents show less of a linear increase in ventral striatal activity during anticipation of increasing reward magnitude (Vaidya et al., 2013). In this study, adults, but not adolescents, demonstrated greater ventral striatal activity in response to the same absolute reward when it was the preferred of two possibilities (i.e., $1 versus $0.20 compared with $1 versus $5), indicating that ventral striatal activity in adolescents is less sensitive to relative reward value. Further, reduced ventral striatal sensitivity to absolute anticipated reward correlated with a higher level of trait impulsivity. This finding is consistent with that of another study, in which healthy young subjects who happened to be steep delay discounters showed lower responses in the left ventromedial caudate during anticipation of potential reward (Benningfield et al., 2014). All in all, although their findings may diverge in some aspects, researchers agree in attributing the increased risk-taking and impulsive behavior during adolescence to developmental differences in the neural processing of rewards. Moreover, with the development of a child-friendly version of the MID task (Helfinstein et al., 2013; Kappel et al., 2013), the investigation of reward processing in developmental populations can now be validly expanded to children.
Conclusion

We conclude with a short final valuation and synopsis of the use of the MID task. One of the most important achievements of the MID task is to provide a paradigm flexible enough to allow investigation of many facets of reward processing while still allowing comparison between studies. By parsing the whole process of reward processing, from incentive presentation, through task performance, display of approach or avoidance behavior, and possible discounting of reward due to delay, to final reward consumption, researchers are free to focus on any of these steps in a multitude of populations, using different reward modalities and introducing other variations. We have briefly mentioned current developments, e.g., the use of the MID task in prospective genetic neuroimaging studies on the development of psychiatric disorders. We have pointed to relationships with other tasks, e.g., some forms of conditioning or error processing, thereby placing a special focus on the possible role that agency and goal orientation might have in the processing of rewards and punishments. These relationships should be further explored in future studies, thus integrating knowledge gathered in different fields of research. Some fields of integration are already emerging, e.g., elucidating the role played by reward processing in learning mechanisms connected with novelty, or investigating the processing of performance feedback in the framework of reward processing, which may yield new insights into the mechanisms of intrinsic motivation.
Acknowledgments

The work was supported by the Clinical Research Priority Program "Neuro-Rehab" of the University of Zurich. The authors would like to thank three anonymous reviewers for their knowledgeable and constructive remarks.
Disclosure

The authors report no conflicts of interest in this work.
Rewarding feedback promotes motor skill consolidation via striatal activity
Published in:
Progress in Brain Research 2016, 229: 303-323. http://dx.doi.org/10.1016/bs.pbr.2016.05.006
Authors:
Widmer M., Ziegler N., Held J.P., Luft A.R. and Lutz K.
Publisher:
Elsevier
Keywords:
Motor skill learning, monetary reward, performance feedback, knowledge of performance, fMRI, striatum, pointing task, consolidation
Abstract

Knowledge of performance can activate the striatum, a key region of the reward system and highly relevant for motivated behavior. Using functional magnetic resonance imaging, striatal activity linked to knowledge of performance was measured during the training of a repetitive arc-tracking task. Knowledge of performance was given after a random selection of trials or after good performance. A third group received knowledge of performance after good performance plus a monetary reward. Skill learning was measured from pre- to post-training (acquisition) and from post- to 24 hours post-training (consolidation). Our results demonstrate an influence of feedback on motor skill learning. Adding a monetary reward after good performance leads to better consolidation and higher ventral striatal activation than knowledge of performance alone. Thus, rewarding strategies that increase the ventral striatal response during training of a motor skill may be utilized to improve skill consolidation.
Introduction
Extrinsically motivated actions are performed because they lead to an outcome, e.g., a reward (Ryan and Deci, 2000). By increasing the extrinsic subjective value, rewards augment the overall subjective benefit of a task, making people tolerate higher subjective costs; rewards are thus traditionally defined as stimuli an organism is willing to work for (Knutson and Cooper, 2005; Lutz and Widmer, 2014). Intrinsic motivation, on the other hand, refers to doing something because it is inherently interesting or enjoyable, which is influenced by factors such as the subject’s perceived autonomy, competence for, or relatedness to a task (Ryan and Deci, 2007). Similar to motivation, reward can be classified as extrinsic or intrinsic (Deci et al., 1999; Deci et al., 2001; Reitman, 1998). While extrinsic reward refers to the receipt of material goods (e.g., food or money) for a specific activity, the term “intrinsic reward” refers to reward derived from task-inherent stimulation (e.g., information about an achieved performance, watching a self-painted picture, or feeling self-produced movements). Evidence from behavioral studies implies that extrinsic reward might undermine intrinsic motivation and thus may lead to a decrease in performance (Callan and Schweighofer, 2008; Deci et al., 1999; Kohn, 1999; Murayama et al., 2010; Spence, 1970). For instance, the time children spend drawing decreases below baseline after this behavior has been (externally) rewarded and the reward is then withdrawn (Greene and Lepper, 1974).
In experiments using functional magnetic resonance imaging (fMRI), both intrinsic and extrinsic (performance-dependent) reward have been shown to increase neural activity in the striatum (Lutz et al., 2012), a key locus of reward processing (Knutson et al., 2009). In these experiments, only the ventral striatum was active during performance feedback, while feedback plus monetary reward activated both ventral and dorsal parts of the striatum. However, other studies found activation elicited by feedback alone also in the dorsal striatum (Poldrack et al., 2001; Tricomi et al., 2006; Tricomi and Fiez, 2008; Tricomi et al., 2004). Furthermore, dorsal striatal activity was shown to be modulated by the subject’s sense of agency for having achieved a goal (Han et al., 2010; Tricomi and Fiez, 2008).
Previous research has investigated the influence of feedback and reward on the acquisition of cognitive tasks, e.g., decision-making paradigms (den Ouden et al., 2013; Frank et al., 2004; Robinson et al., 2010). Our animal studies suggest that dopaminergic signals originating in reward-coding brain regions (ventral tegmental area) are required for motor skill acquisition. In rodents, dopaminergic projections from the ventral tegmental area to the primary motor cortex enable motor learning and long-term potentiation in cortico-cortical projections (Hosp et al., 2011; Molina-Luna et al., 2009). These projections are not necessary for task execution (Molina-Luna et al., 2009). We hypothesize that this system can be used to facilitate motor skill learning by amplification of rewarding stimuli.
Indeed, recent work suggests positive effects of monetary reward on procedural learning (Wachter et al., 2009) and motor skill learning (Abe et al., 2011), as well as on motor adaptation (Galea et al., 2015). Notably, all of these studies reported dissociable effects of positive and negative reward, and the latter two found positive reward to impact task consolidation/retention. Moreover, the reward-related learning effect reported by Wachter et al. (2009) was found to be mediated by the dorsal striatum. However, these studies exclusively used money as an extrinsic reward, although, as illustrated above, intrinsic rewards (e.g., knowledge of performance) have also been shown to activate the human reward circuits and may thereby influence motor learning.
Dopaminergic neurons in the midbrain signal outcomes that are better than expected (positive prediction error; Schultz, 2000). Being informed about unexpectedly good performance may thus cause a positive prediction error. Indeed, being informed only about positive task outcomes resulted in better performance than being informed about the outcome of poorly solved trials (Chiviacowsky and Wulf, 2007). Whether these findings are accompanied by higher reward-related activity after good-performance feedback remains to be elucidated.
In the present study, a modified version of the arc-pointing task, which involves a visually guided precision movement of the wrist (Shmuelof et al., 2012), was used to test the hypothesis that striatal activation is increased if knowledge of performance is given after good performance instead of after a random selection of trials. Adding a performance-dependent monetary reward was expected to further increase this activation. In addition, we hypothesized that motor skill learning is improved in conditions with enhanced striatal activity.
Methods
Participants
Forty-five healthy right-handed volunteers (22 females, 20-34 years of age, 24.5 years on average; Table 3.1) participated in this study, which was approved by the cantonal ethics committee (KEK-LU 13054). Hand preference and dominance were assessed using the Edinburgh Handedness Inventory (Oldfield, 1971) and the Hand Dominance Test (Steingruber and Lienert, 1971), respectively, confirming that all participants were classified as right-handed. Subjects were recruited from the University community or shared a similar educational status. They were not specifically skilled or trained in comparable motor tasks. All participants gave written informed consent before being randomly assigned to one of three groups. Allocation was according to a computer-generated random number sequence. Subjects were unaware of the other groups and of the scientific rationale of the study. All subjects received financial compensation in comparable amounts, but only in one group did payments depend on individual performance during the training of the motor task.
                      Overall       KP_random     KP_good       KP_good+MR
N (dropouts)          44 (1)        15 (-)        14 (1)        15 (-)
Age (SD)              24.5 (3.2)    25.9 (2.8)    25.1 (3.7)    22.9 (2.5)
Sex, male / female    23 / 21       7 / 8         7 / 7         9 / 6

Table 3.1: Subject characteristics. N reports the number of subjects per group, with dropouts listed in brackets. SD is standard deviation. Note that groups were allocated randomly, not matched on any of the reported characteristics.
Study Design
Subjects participated in the study on two consecutive days. Neutral (group-independent) test sessions were performed on day one, before and after the group-specific training, to assess momentary performance. To assess overnight task consolidation, subjects returned 20 to 28 hours after finishing the day-1 training.
Motor Task
Originally, the arc-pointing task (Shmuelof et al., 2012) was developed to investigate the speed-accuracy trade-off function during motor skill learning. To examine the influence of knowledge of performance, with or without monetary reward, on brain activity and motor skill learning, we modified the task. Here, the task required subjects to perform wrist movements to steer a cursor on a computer screen through a semicircular channel (Figure 3.1). To maximize the dynamic range of learning, the non-dominant left wrist rather than the right was chosen, assuming that initial performance would be worse with the left. Ideally, the cursor had to be guided along the middle of an arc-channel, with the nominal movement speed dictated by a clock hand pointing at the current nominal position (Figure 3.2A). For each frame (at 60 frames per second), the absolute distance from the actual to the nominal position was calculated, and the average over the whole movement was used as a performance measure to determine a score (with or without monetary consequences; Figure 3.2B).
Prior to each new block of movements, subjects viewed a computer-generated demonstration of the clock hand moving along the channel in the required movement time. At the beginning of each trial, subjects placed the cursor in the red starting box. After a variable delay (800 to 1’600 ms), the box turned green as an “ok-to-go” signal (reaction time was not a measure of performance, and subjects were told to start any time after the box turned green). As soon as the cursor had left the box in positive y-direction (i.e., upward), the clock hand started to move with uniform angular velocity, continuously pointing at the nominal cursor position that subjects tried to adhere to. The cursor was visible throughout the movement (online feedback; Figure 3.2A), and the trial automatically ended when the clock hand arrived at the end of the channel. Then the screen froze for a variable period of time (500 to 4’500 ms). During test sessions, the subsequent trial directly followed. For training trials, knowledge of performance or knowledge of performance plus monetary reward was presented for 3’000 ms at this point, followed by another variable delay period (500 to 4’500 ms) before the subsequent trial began. Figure 3.1 shows a schematic summary of the paradigm.
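The timing of a single training trial described above can be sketched as follows. The phase names and the uniform sampling of the variable delays are illustrative assumptions for this sketch; the original implementation is not specified at this level of detail.

```python
import random

def sample_training_trial(movement_time_ms=1200, with_feedback=True, rng=random):
    """Return (phase, duration_ms) pairs for one training trial.
    Variable delays are drawn uniformly from the ranges stated in the text
    (an assumption; the actual sampling distribution is not reported here)."""
    phases = [
        ("start_box_delay", rng.uniform(800, 1600)),   # box turns green ("ok-to-go")
        ("movement", movement_time_ms),                # clock hand travels the arc
        ("screen_freeze", rng.uniform(500, 4500)),
    ]
    if with_feedback:
        phases.append(("feedback", 3000))              # KP, or KP plus monetary reward
        phases.append(("post_feedback_delay", rng.uniform(500, 4500)))
    return phases
```

During test sessions the feedback phases are simply omitted (`with_feedback=False`), so the next trial follows directly after the screen freeze.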
To assess skill level in the absence of knowledge of performance and monetary reward, participants had to perform the arc-pointing task at five different movement speeds, defined by the movement time allowed to move the cursor through the arc-channel (the clock hand uniformly travelled along the arc in exactly that time). Per test session, seven consecutive trials were performed as blocks at one of five movement times (in ms: 800, 1’000, 1’200, 1’400 and 1’800), and these blocks were randomly ordered with 15 s breaks in between. Ten familiarization trials were allowed prior to the very first test session (i.e., the pre-training test) and, as already mentioned, a demonstration of the movement time was shown at the beginning of each movement-time block. All in all, participants performed 35 trials per test session.
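The structure of one test session (five randomly ordered blocks of seven same-speed trials, 35 trials in total) can be sketched as a minimal generator; the function name is illustrative:

```python
import random

MOVEMENT_TIMES_MS = [800, 1000, 1200, 1400, 1800]
TRIALS_PER_BLOCK = 7

def build_test_session(rng=random):
    """Ordered list of movement times for one 35-trial test session:
    five blocks of seven identical-speed trials, block order randomized."""
    block_order = MOVEMENT_TIMES_MS[:]
    rng.shuffle(block_order)
    return [mt for mt in block_order for _ in range(TRIALS_PER_BLOCK)]
```

Each session thus probes all five speeds while keeping trials within a block homogeneous, which matches the blocked design described above.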
The training, on the other hand, was composed of five blocks of 50 trials each, with 15 s breaks after 25 movements (within blocks) and two-minute breaks between the blocks. All 250 training trials were performed at one single movement time (i.e., 1’200 ms). After a movement, subjects received terminal feedback with a probability of ~50%. Here, the three groups differed in terms of the selection of feedback trials and in terms of the type of feedback they were given. While the first group received knowledge of performance after randomly selected trials (KP_random), the other groups got either knowledge of performance only (KP_good) or knowledge of performance signifying a monetary reward (KP_good+MR) after relatively good performance, i.e., when they performed better than the moving median over their performance in the last 10 trials. As described above, the tip of the clock hand pointed at the nominal position for each frame during a trial, and the cursor’s mean distance (d̄) to the corresponding nominal position over all 72 frames per training trial (1’200 ms at 60 frames per second) was used as the measure to quantify performance.

Figure 3.1: Trial sequence. After placing the cursor in the start box, the box eventually turned green (“ok-to-go” signal) and subjects were free to start the movement whenever ready. The placing of the cursor in the start box, as well as the period from “ok-to-go” to the actual start of the movement, were self-paced and hence of variable length (var). A specific movement time (MT) according to the speed requirements of the current block of trials was allowed to steer the cursor through the semicircular channel. As soon as the movement time elapsed, the screen froze. During test sessions, the next trial directly followed. In case of a training trial, a group-specific knowledge of performance feedback was presented after feedback trials (FB TRIAL), or subjects were shown a neutral visual control stimulus after no-feedback trials (NO-FB TRIAL). Either way, the next training trial began after another delay period.
$$\bar{d}_t = \frac{1}{72}\sum_{f=1}^{72} d_{t,f},$$

where t is the number of the current trial, f stands for the frame number, and d_{t,f} is the absolute distance between the actual and the nominal cursor position in frame f of trial t.
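A minimal sketch of this performance measure and of the feedback-selection rule described above (feedback for the good-performance groups when the current trial beats the moving median of the last 10 trials). The function names are illustrative, not the original implementation:

```python
from statistics import median

FRAMES_PER_TRIAL = 72  # 1'200 ms at 60 frames per second

def trial_score(frame_distances):
    """Mean absolute distance between actual and nominal cursor position,
    averaged over the 72 frames of one training trial (smaller = better)."""
    assert len(frame_distances) == FRAMES_PER_TRIAL
    return sum(frame_distances) / FRAMES_PER_TRIAL

def feedback_due(previous_scores, current_score):
    """KP_good / KP_good+MR selection rule: feedback is delivered from the
    eleventh trial on, when the current score beats the moving median of
    the scores from the last 10 trials."""
    if len(previous_scores) < 10:
        return False
    return current_score < median(previous_scores[-10:])
```

Because the criterion is a moving median of each subject's own recent performance, roughly half of the trials end up rewarded, keeping feedback frequency comparable across groups.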
Hence, for members of KP_good and KP_good+MR, feedback was delivered from the eleventh trial on, if