
Research Collection

Doctoral Thesis

The use of performance feedback and reward for optimization of motor learning and neurorehabilitation of motor functions

Author(s): Widmer, Mario

Publication Date: 2017

Permanent Link: https://doi.org/10.3929/ethz-a-010870008

Rights / License: In Copyright - Non-Commercial Use Permitted


DISS. ETH NO. 24106

THE USE OF PERFORMANCE FEEDBACK AND REWARD FOR OPTIMIZATION OF MOTOR LEARNING AND NEUROREHABILITATION OF MOTOR FUNCTIONS

A thesis submitted to attain the degree of

DOCTOR OF SCIENCES of ETH ZURICH

(Dr. sc. ETH Zurich)

presented by

MARIO WIDMER

MSc ETH HMS, ETH Zürich

born on 22.04.1986

citizen of Gränichen (AG)

accepted on the recommendation of

Prof. Dr. Nicole Wenderoth
Prof. Dr. Andreas Luft
Dr. Kai Lutz

2017


Acknowledgments

Writing my dissertation, and finally completing it, could not have been done without the help of some enthusiastic and intelligent people around me.

First, and foremost, I want to express my sincere gratitude to Prof. Dr. Andreas Luft for giving me the opportunity to work in this interesting research field and to conduct this thesis. His enormous scientific knowledge and experience were highly influential for my development over the course of my PhD.

I would also like to sincerely thank Prof. Dr. Nicole Wenderoth for agreeing to be the head of my committee and for giving me the freedom to perform my research outside of ETH Zurich.

Moreover, I am deeply grateful to Dr. Kai Lutz, my mentor, for his indispensable support during the last few years. I am indebted to him for his scientific advice, but also for his encouragement and understanding in work-related as well as in private matters.

I would like to express my thankfulness for the great support from our research team. A special thank you goes to my teammate Jeremia Held, who supported me in every situation, in research and in daily life. I thank him for his helpful advice and for being a friend. Many cordial thanks go to Belen Valladares, who always has an open ear for me (never forget that you make Switzerland a better place), but also to Robinson Kundert, José López Sánchez, Irene Christen, Carola Bade-Daum, and all other members of my study team who have helped me over the years.

Furthermore, I owe a big thank you to Samara Stulz for being an excellent Master's student, for her contribution to our "fMRI Reward Assessment" project and for her patience in the data acquisition, data analysis and the entry of the data into our electronic database. Many thanks also for taking care of our office plant, which is facing a very insecure future now that you have left our group.

In addition, I would like to take this opportunity to express my sincere appreciation to all participants for their time and enthusiasm during the studies of this thesis. That includes all healthy young and elderly subjects as well as all stroke patients who, at times, needed to bring along a lot of patience.


The biggest thank you goes to my friends and my girlfriend for their unconditional love and support: without you, I have nothing, but with you, I have everything! Last but not least, I would like to offer my gratitude to my family for being there for me.

This research was carried out in collaboration with the University of Zurich, the University Hospital of Zurich, the cereneo - center for neurology and rehabilitation, and ETH Zürich. My position was funded by the Clinical Research Priority Program (CRPP) Neuro-Rehab of the University of Zurich. I am deeply grateful for their financial support.

I would like to dedicate this thesis to Nadja Ziegler, who started her Master's thesis in our lab at around the same time as I started my PhD. Nadja sadly passed away in July 2014. “Funny how someone can come into your life for such a brief time but leave such a lasting impression” - Monica Murphy. Through all the pain of losing you I know that I am better for having known you!


Abstract

Intrinsic motivation refers to doing something because it is inherently interesting or enjoyable. Extrinsically motivated actions, on the other hand, are performed because they lead to an outcome. Similar to motivation, reward can be classified as extrinsic or intrinsic. Extrinsic reward refers to the receipt of material (e.g., food or money) for a specific activity. The term "intrinsic reward", on the other hand, refers to reward derived from task inherent stimulation (e.g., information about an achieved performance). This includes stimuli that signal performance accuracy, usually termed feedback, which can then be used to modify future performance. Generally, learners strive for positive feedback, which means that positive feedback fulfills the definition of a reinforcer or a reward.

The changes in neural activity in response to the processing of reward (and punishment) have been extensively investigated in healthy, but also clinical populations, using the so-called monetary incentive delay (MID) task. Typically, this task requires an individual to react to a target stimulus presented after an incentive cue to win or to avoid losing the indicated reward. The first part of this thesis (Chapter 2) offers an overview of different utilizations of the MID task by reviewing literature outlining the neuronal processes involved in distinct aspects of human reward processing. A special focus was placed on reward-based learning processes. For instance, in a motor experiment using a MID task combined with functional magnetic resonance imaging (fMRI), both intrinsic and extrinsic rewards have been shown to increase the neural activity in the ventral striatum, a key locus of reward processing. In a rewarded task, hemodynamic ventral striatal response correlates with dopamine release in the ventral striatum, which similarly correlates with the reward-related neural activity in the substantia nigra/ventral tegmental area, the origin of the dopaminergic projection. There is evidence from animal studies that dopaminergic projections from the midbrain to the primary motor cortex (M1) are necessary for the learning of a new motor skill. In M1, dopamine facilitates long-term potentiation, a form of synaptic plasticity that is critically involved in skill learning. Such synaptic plasticity in M1 similarly occurs during recovery/rehabilitation after stroke and likely contributes to its success. Thus, this opens the potential to use rewarding feedback in humans to promote motor skill learning and neurorehabilitation of motor functions.


Based on this evidence, we conducted an fMRI study with healthy young subjects, relating striatal activity to performance feedback with or without monetary consequences during the training of a repetitive arc-tracking task (Chapter 3). The task required subjects to perform wrist movements to steer a cursor on a computer screen through a semicircular channel while undergoing fMRI. Our results demonstrate an influence of the feedback modality on motor skill learning. Adding a monetary reward after good performance led to better consolidation and higher ventral striatal activation than knowledge of performance alone. In conclusion, rewarding strategies that increase ventral striatal response during the training of a motor skill may be utilized to improve skill consolidation.

In stroke survivors, activity of this dopaminergic pathway may not only be reduced because rewards are small, but also because, after stroke, rewarding feedback might not have the same capacity to increase dopaminergic activity as in healthy subjects. This has been demonstrated for cognitive tasks, and the hypothesis for the study presented in Chapter 4 was that this also happens in motor tasks. To test this hypothesis, we applied a similar arc-tracking task, modified as a motor MID task, and used fMRI to measure striatal activity linked to performance-dependent monetary reward. Results of nine stroke patients and nine age-matched healthy individuals show a tendency for reduced responsiveness of ventral parts of the striatum in stroke patients. This is of particular interest because, in the study described above, ventral striatal activation was found to be the key factor for successful overnight consolidation. We have learned from animal studies that proper functioning of the dopaminergic reward system is necessary for successful motor skill learning. Thus, a reduced responsiveness of the ventral striatum to a reward derived from motor performance, be it extrinsic or intrinsic, could imply a blunted motor learning ability in patients after stroke. The ability to learn, however, is supposed to support motor recovery.

After stroke, about 50% of all survivors remain with functional impairments of their upper limb. As we were able to show that training with rewarding feedback improves motor learning in humans, we hypothesize that rehabilitative arm training could also be enhanced by rewarding feedback. This amplification of reward during rehabilitative training might be a means to overcome a potentially deficient response to task inherent feedback in order to stimulate the dopaminergic system to improve recovery after stroke. Therefore, a further achievement of this thesis is the development of a clinical trial protocol, investigating rewards in the form of performance feedback and monetary gains as ways to improve effectiveness of rehabilitative training (Chapter 5). This trial will be the first to directly evaluate the effect of rewarding feedback including monetary rewards on the recovery process of the upper limb following stroke. A positive outcome could therefore pave the way for novel types of interventions with significantly improved treatment benefits.

In conclusion, in line with findings from animal studies we demonstrated a positive influence of reward on motor skill learning in healthy young humans. This effect was linked to an increased ventral striatal response to the presentation of the rewarding feedback. In stroke patients, however, preliminary data point towards a blunted response of the ventral striatum when compared to a healthy age-matched control group. Nonetheless, findings of this thesis emphasize the potential of rewarding feedback to promote neurorehabilitation of motor functions. Therefore, a trial protocol for a randomized controlled trial investigating rewards in the form of performance feedback and monetary gains as ways to improve effectiveness of rehabilitative upper limb training after stroke is proposed.


Zusammenfassung

Intrinsic motivation means doing something because it is inherently exciting and entertaining. Extrinsically motivated actions, by contrast, are performed because they lead to a particular outcome. Rewards can similarly be divided into extrinsic and intrinsic. Extrinsic reward is associated with the receipt of goods (e.g., food or money) for a specific action. The term "intrinsic reward", by contrast, denotes rewards that are inherent to a specific task (e.g., information about an achieved performance). The latter includes stimuli that describe the quality of an execution. Such stimuli are usually called feedback. They can be used to adjust future executions of the task. In general, learners like to receive positive feedback, which makes positive feedback desirable and therefore means that it can be used as a reward.

Changes in neural activity in response to the processing of rewards (and punishments) have been investigated extensively using the so-called monetary incentive delay (MID) task, in both healthy and clinical populations. Typically, a cue is presented first, and participants must then react to a target stimulus in order to win an indicated amount of money or to avoid losing it. The first part of this dissertation (Chapter 2) provides an overview of the many different uses of the MID task by reviewing literature that describes the neuronal processes involved in various aspects of reward processing. A special focus was placed on reward-based learning processes. Using a MID task in combination with functional magnetic resonance imaging (fMRI), it was shown in a motor experiment, for example, that both intrinsic and extrinsic rewards are able to activate the ventral striatum, a key region in reward processing. In such reward tasks, the strength of the hemodynamic response in the ventral striatum correlates with dopamine release in the ventral striatum, which in turn correlates with the reward-related neural activity in the substantia nigra and the ventral tegmental area, the origin of the dopaminergic pathways. Animal studies have shown that dopaminergic projections from the midbrain to the primary motor cortex (M1) are necessary for learning a motor skill. In M1, dopamine promotes long-term potentiation, a form of synaptic plasticity that contributes decisively to motor skill learning. However, plasticity also occurs in rehabilitation after a stroke and contributes to its success. This opens up the potential to use rewarding feedback in humans to support motor skill learning and the neurorehabilitation of motor functions.

Based on this evidence, we used fMRI to measure, in healthy young subjects, the striatal activity associated with performance feedback with or without monetary consequences during the training of an arc-tracking task (Chapter 3). Using wrist movements, the subjects could control a cursor on a screen and steer it through a semicircular channel. Our results show an influence of the feedback modality on the learning of this motor skill. When the performance feedback after well-solved trials was linked to a monetary reward, this led to better consolidation of the skill task and higher activation of the ventral striatum. We conclude that reward strategies that are able to increase activity in the ventral striatum during the training of motor skills can be used to promote consolidation of the skill.

In stroke patients, however, the activity of these pathways may be reduced, not only because everyday rewards are probably rather small, but also because patients after a stroke may, compared with healthy persons of the same age, show a deficit in reward processing. Reduced brain activation in response to rewarding feedback has already been demonstrated in cognitive tasks. In Chapter 4, we test the hypothesis that this also applies to motor tasks. To this end, we used a similar skill task, modified as a motor MID task, together with fMRI to investigate the striatal response to a performance-dependent monetary reward in stroke patients. The results of nine stroke patients and nine age-matched healthy controls point to a tendency towards reduced responsiveness of the ventral parts of the striatum in stroke patients. This is of particular interest because, in our preceding study, increased activation in the striatum was associated with better consolidation of the motor task. Moreover, we know from animal studies that proper functioning of the dopaminergic reward system is important for learning motor skills. A reduced responsiveness of the ventral striatum to rewarding feedback (of an intrinsic or extrinsic nature) related to a preceding motor performance could therefore indicate a reduced motor learning ability after stroke. The ability to learn, however, is supposed to promote motor recovery.

About 50% of all patients are left with functional impairments of the upper extremity after a stroke. Since we were able to show that motor learning is improved by training with rewarding feedback, we now want to test the hypothesis that rehabilitative arm training can also be enhanced by rewarding feedback. Such an amplification of rewarding stimuli during rehabilitation training could be a means of stimulating the otherwise rather weakly responding reward system of stroke patients and thus of promoting motor recovery. A further important achievement of this doctoral thesis is therefore the development and description of a research project that investigates whether rewards in the form of performance feedback and small amounts of money can increase the efficiency of rehabilitative training (Chapter 5). This randomized controlled trial will be the first to directly evaluate an influence of rewarding feedback, including monetary reward, on the recovery process of the upper extremity after a stroke. A positive result could pave the way for new types of interventions with significantly better treatment benefit.

This dissertation has confirmed findings from animal studies by showing that rewards have a positive influence on motor skill learning in humans. This effect was associated with increased activation of the ventral striatum. However, first data from stroke patients indicate that the responsiveness of the ventral striatum to rewards is reduced compared with healthy age-matched controls. Nevertheless, this work demonstrates the potential of using rewarding feedback, in the form of performance feedback and monetary gain, to promote neurorehabilitative training. A follow-up project to investigate a possible effect on rehabilitative training of the upper extremity after stroke is described in this dissertation.


Contents

Acknowledgments
Abstract
Zusammenfassung
Contents
List of Abbreviations
General Introduction
  Stroke
  The Burden of Stroke
  Motor Learning as a Model for Stroke Recovery and Neurorehabilitation
  Reward and Motivation
  The Use of Performance Feedback and Reward for Optimization of Motor Learning…
  … and Neurorehabilitation of Motor Functions
  Thesis Outline
What can the monetary incentive delay task tell us about the neural processing of reward and punishment?
  Abstract
  Introduction
  Anatomy of the reward system
  Aspects of Processing Reward and Punishment
  Anticipation and Consumption (wanting/liking)
  Reward versus Punishment
  Reward-based Learning
  Goal-oriented Behavior and Reward
  Reward Processing and Error Monitoring
  Discounting of Delayed Reward
  Individual Influences on Reward Processing
  Conclusion
  Acknowledgments
  Disclosure
Rewarding feedback promotes motor skill consolidation via striatal activity
  Abstract
  Introduction
  Methods
  Participants
  Study Design
  Motor Task
  fMRI Measurements
  Analysis of Imaging Data
  Analysis of Behavior
  Results
  fMRI
  Behavioral Results
  Discussion
  Training and Motor Skill Acquisition
  Consolidation
  Limitations
  Conclusion
  Acknowledgments
Processing of Motor Performance Related Reward After Stroke
  Abstract
  Introduction
  Material and Methods
  Participants
  fMRI Task
  Results
  Discussion
  Conclusion
  Acknowledgments
Does motivation matter in upper limb rehabilitation after stroke? ArmeoSenso-Reward: Study protocol for a randomized controlled trial
  Abstract
  Background
  Methods
  Study Design
  Study Population
  Randomization
  ArmeoSenso Training System
  Intervention
  Rewarded Training
  Control Training
  Outcome Measures
  Sample Size
  Statistical Analysis
  Discussion
  Declarations
  Ethics Approval and Consent to Participate
  Competing Interests
  Funding
  Authors' Contributions
  Acknowledgments
General discussion
  Main Findings
  Imaging Studies (Chapters 3 and 4)
  Studies on Stroke Patients (Chapters 4 and 5)
  Open Questions and Outlook
  Conclusion
References
Curriculum Vitae Mario Widmer, MSc ETH

List of Abbreviations

AEs          Adverse Events
BI           Barthel Index
CRO          Clinical Research Organisation
EEG          Electroencephalography
EKNZ         Ethikkommission Nordwest- und Zentralschweiz
FMA-UE       Fugl-Meyer Assessment of the Upper Extremity
fMRI         Functional Magnetic Resonance Imaging
GABRA2       γ-Amino Butyric Acid β2 Receptor Subunit
GCP          Good Clinical Practice
GLMM         Generalized Linear Mixed Model
ID           Identification Number
IMU          Inertial Measurement Unit
KP_good      Knowledge of Performance after well-solved trials
KP_good+MR   Knowledge of Performance plus Monetary Reward after well-solved trials
KP_random    Knowledge of Performance after random selection of trials
M1           Primary Motor Cortex
MAL14        Motor Activity Log 14
MR           Monetary Reward
MID Task     Monetary Incentive Delay Task
NIHSS        National Institutes of Health Stroke Scale
PI           Principal Investigator
RASGRF2      Ras Protein-specific Guanine Nucleotide-releasing Factor 2
RCT          Randomized Controlled Trial
ROM          Range of Motion
SAE          Serious Adverse Event
WHO          World Health Organization
WMFT         Wolf Motor Function Test


General Introduction


Stroke

The Burden of Stroke

The word “stroke” was likely first introduced in 1689 by William Cole. Before that, "apoplexy" was commonly used to describe very acute nontraumatic brain injuries (Sacco et al., 2013). Nowadays, the World Health Organization (WHO) defines stroke as "rapidly developing clinical signs of focal (or global) disturbance of cerebral function, lasting more than 24 hours or leading to death, with no apparent cause other than that of vascular origin" (Aho et al., 1980), although other, similar definitions exist (Sacco et al., 2013). A stroke is caused by disruption of the blood supply to the brain, which may result from either blockage (ischemic stroke) or rupture of a blood vessel (hemorrhagic stroke) (Mackay et al., 2004). In both cases, the brain tissue is damaged due to a lack of oxygen and nutrient supply. At about 87%, ischemic strokes account for the vast majority of stroke incidents (Go et al., 2014).

Worldwide, about 15 million people suffer a stroke per year (Mackay et al., 2004). While the incidence of stroke has declined by 42% over the past four decades in high-income countries, a more than 100% increase was observed in low- to middle-income countries. As a consequence, in the period 2000-08, the overall stroke incidence rates in low- to middle-income countries have exceeded the level of stroke incidence in high-income countries for the first time (Figure 1.1) (Feigin et al., 2009). Per year, about 5 of the 15 million people suffering a stroke die (Mackay et al., 2004). WHO estimates for 2001 indicate that death from stroke in low- and middle-income countries accounted for 85.5% of stroke deaths worldwide (Feigin et al., 2009; Mathers et al., 2006). In this context, the British newspaper "The Guardian" stated sharply: "The poor are dying more and more like the rich" (Lomborg, 2015). Hence, there is an urgent need for the progress in prevention and mortality achieved in the developed world to be translated to middle- and lower-income societies (Towfighi and Saver, 2011).

Although the implementation of preventive treatments and reductions in risk factors at the population level helped to significantly reduce stroke incidence in well-developed countries (Feigin et al., 2009), stroke prevalence is likely to increase in the future due to the aging population (Veerbeek et al., 2014). In the United States, for example, projections show that by 2030, an additional 3.4 million adult people will have had a stroke, which reflects a 20.5% increase in prevalence from 2012 (Go et al., 2014).


Figure 1.1: Age-adjusted stroke incidence rates per 100 000 person-years across the past four decades in (A) high-income countries, and (B) low- to middle-income countries. The solid line is the regression trend line, based on a regression of average incidence on study period. Adapted from Feigin et al. (2009).

Seen from this angle, the increase in survivors with post-stroke morbidity is a tradeoff of mortality reduction through better acute stroke care (Towfighi and Saver, 2011). In Switzerland, about 14'000 stroke survivors are discharged from the hospital each year (Meyer et al., 2009) and a large number of patients remain disabled. Typical deficits are motor impairments such as paresis, spasticity, and disorders of mobility together with such neuropsychological impairments as amnesia, agnosia, aphasia, apraxia, executive dysfunction, and mood disorders (Chen et al., 2013). In the long term, 25-74% of patients who have suffered a stroke have to rely on human assistance for basic activities of daily living like feeding, self-care, and mobility (Miller et al., 2010), which has, of course, strong consequences for the patients and their families (Anderson et al., 1995). Moreover, stroke-related healthcare costs place a heavy burden on society, which is likely to increase in the future. It is projected, for instance, that the total direct medical stroke-related costs in the United States will triple between 2012 and 2030 (Go et al., 2014).

Hence, although stroke epidemiology is not an explicit topic of this thesis, these staggering numbers highlight that, even though remarkable progress in stroke prevention and health care after stroke has been made, we are far from having overcome the burden of stroke worldwide. This emphasizes the importance of research in the field of stroke rehabilitation.

Motor Learning as a Model for Stroke Recovery and Neurorehabilitation

Every simple goal-oriented movement is made up of separate operations, each of which, in the context of stroke, may or may not be affected by a lesion (Krakauer, 2006). However, almost all stroke patients experience at least some degree of functional recovery within the first six months post-stroke. Mechanisms like recovery of penumbral tissues, neural plasticity, resolution of diaschisis and behavioral compensation strategies are presumed to be involved (Kwakkel et al., 2004). Rehabilitation is believed to interact with these underlying processes and, although some aspects of brain reorganization are probably unique to brain injury (Krakauer, 2006), there are large overlaps with development (Carmichael, 2003) and motor learning (Kleim et al., 2004).

"Rehabilitation, for patients, is fundamentally a process of relearning how to move to carry out their needs successfully" (Carr, 1987). This statement illustrates that rehabilitation is based on the assumption that practice or training leads to improvement of skills after hemiparesis (Krakauer, 2006). According to Shadmehr and Wise (2005), all humans must extend their motor repertoire during their lifetime in order to be able to adapt to changing circumstances. This expansion is called skill learning or skill acquisition and may be seen as practice-dependent reduction of performance errors detected through sensory channels (Krakauer et al., 1999). Moreover, we must continually adapt those new, but also existing motor programs to changing circumstances. This modification of existing elements of the motor repertoire is called motor adaptation (Shadmehr and Wise, 2005).


Figure 1.2: Typical force field experiment. (A) Subject sits in front of a manipulandum and executes reaching movements to visual targets. (C) Trajectories are initially straight. When exposed to a field (B), trajectories are initially perturbed (D). With training, they resume the prototypical shape (E). If the field is removed after learning, subjects display aftereffects (F) as overcompensation for the expected perturbation. Adapted from Gandolfo et al. (1996).

In motor control experiments (Figure 1.2), after initial adaptation to a certain perturbation (e.g., a force field), which is then suddenly turned off, trajectories are usually skewed in the direction opposite to that seen during initial adaptation (Gandolfo et al., 1996). These "aftereffects" indicate that the central nervous system alters motor commands to the arm to predict the effects of the force field. In turn, a new mapping between limb state and muscle forces (internal model) is formed. This is of great importance for rehabilitation because it means that the internal model can be updated as the state of the limb changes (Krakauer, 2006).
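The aftereffect described above can be illustrated with a toy simulation. The following sketch is not taken from Gandolfo et al. (1996) or from the studies in this thesis; it assumes a simple single-state, error-driven update of the internal model, and the function name `simulate_adaptation` as well as all parameter values (`retention`, `learning_rate`, `field_strength`, and the trial counts) are hypothetical choices made purely for illustration.

```python
def simulate_adaptation(n_baseline=20, n_field=100, n_washout=40,
                        retention=0.99, learning_rate=0.1,
                        field_strength=1.0):
    """Return the signed trajectory error on every trial.

    Assumed (illustrative) model: the internal model holds an estimate of
    the perturbation; the error on a trial is the part of the perturbation
    that the estimate fails to cancel, and the estimate is updated in
    proportion to that error.
    """
    estimate = 0.0  # internal model's current prediction of the field
    schedule = ([0.0] * n_baseline            # no field
                + [field_strength] * n_field  # field switched on
                + [0.0] * n_washout)          # field switched off again
    errors = []
    for perturbation in schedule:
        error = perturbation - estimate       # uncompensated deviation
        errors.append(error)
        # Error-driven update with slight forgetting (hypothetical values).
        estimate = retention * estimate + learning_rate * error
    return errors


errors = simulate_adaptation()
first_field_error = errors[20]     # first trial with the field on
last_field_error = errors[119]     # last trial with the field on
first_washout_error = errors[120]  # first trial after the field is removed
```

Under these assumptions, the error is large on the first trial in the field, shrinks with training, and reverses sign on the first trial after the field is removed, which corresponds qualitatively to the overcompensation observed as an aftereffect.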


However, the most fundamental principle in motor learning likely is that the degree of performance improvement depends on the amount of practice (Schmidt and Lee, 1988). Similarly, there is evidence for a positive influence of intensive training on functional recovery after stroke (Veerbeek et al., 2014), although a recent study among 361 participants could not demonstrate a dose effect of occupational therapy on motor function or recovery after 12 months (Winstein et al., 2016). Nonetheless, improvement is (although certainly not exclusively) limited by a subject's motivation to train, as it determines whether an individual is willing to spend their time and resources on the training of (rehabilitative) exercises. Therefore, the following subchapter provides a short introduction into the concept of motivation and describes how it may be influenced, e.g., by rewards.

Reward and Motivation

Doing something because it is inherently interesting or enjoyable is generally referred to as acting on intrinsic motivation, which is influenced by factors such as the subject’s perceived autonomy, competence for or relatedness to a task (Ryan and Deci, 2007). These factors, hence, make up the intrinsic value of the exercise. Extrinsically motivated actions, on the other hand, are performed because they lead to an outcome, e.g., to a reward (Ryan and Deci, 2000). Typically, rewards can be categorized into primary and secondary rewards. Primary rewards have a direct positive value for an individual receiving the reward. They often have a physiological meaning, like food, beverages and sex. Secondary rewards, on the other hand, have no direct value, but we learn that receipt of such usually has positive consequences (e.g., money, tokens, some forms of social acknowledgement, or similar). While the valuation of primary rewards depends on hunger, thirst, or other states of the organism, secondary rewards are less prone to saturation and thus possess a relatively stable value. Nevertheless, a multitude of factors exist, influencing the individual valuation of primary as well as secondary rewards (Lutz and Widmer, 2014; Schultz, 2000; Sescousse et al., 2013).

According to behaviorist concepts, reward increases the probability that a rewarded behavior is shown in the future. Hence, rewards are closely related to motivation, providing incentives to actively seek certain stimuli (Lutz and Widmer, 2014). Generally, an individual's motivation to perform a specific exercise or activity is determined by the subjective benefit and the subjective cost of the activity, as illustrated in Figure 1.3. Both benefits and costs are 1) subjective (i.e., dependent on the individual's predisposition, goals, values and attitudes), 2) state-dependent (e.g., the benefit of eating a sandwich is higher when hungry than when sated, and the cost of a cycling exercise is higher when tired than when well-rested), and 3) multifactorial (i.e., the overall benefit of a given exercise or activity is determined by multiple benefits of different natures) (Studer and Knecht, 2016).

Figure 1.3: The motivation for a specific exercise is increased by the subjective expected benefit and decreased by the subjective expected cost of the exercise. Both sides contain an intrinsic and an extrinsic component. However, although the tasks that we deal with in this thesis do, of course, also contain a subjective cost component, we mainly aim at manipulating the subjective benefit side in order to influence our subjects' motivation to train and, hence, their behavior. Adapted from Studer and Knecht (2016).

Rewards augment the overall subjective benefit of a task, making people tolerate higher subjective costs, and are thus traditionally defined as stimuli an organism is willing to work for (Knutson and Cooper, 2005; Thorndike, 1931). While extrinsic reward refers to the receipt of material goods (e.g., food or money) for a specific activity, the term "intrinsic reward" is used to describe reward derived from task-inherent stimulation (e.g., information about an achieved performance, looking at a self-painted picture, or feeling self-produced movements). This includes stimuli that signal performance accuracy, usually termed feedback, which can then be used to modify future performance (Kluger and DeNisi, 1996). Generally, learners like to receive positive feedback, which gives positive feedback appetitive value and thus lets it act as a reward (Elliott et al., 1997; Tricomi and DePasque, 2016).

The studies included in the present thesis were designed to manipulate the benefit side of performing a specific exercise/training (Figure 1.3) by adding performance feedback and/or a monetary incentive, and to investigate a possible effect on behavior (Chapters 3-5).

The Use of Performance Feedback and Reward for Optimization of Motor Learning…

The processing of both intrinsic reward (in the form of performance feedback) and extrinsic reward (i.e., money), as described above, has been shown to increase neural activity in the striatum (Lutz et al., 2012), a key locus of reward processing (Knutson et al., 2009; for a detailed overview of the neural correlates involved in distinct aspects of reward processing, see Chapter 2). Several studies reported activation elicited by feedback alone in the dorsal striatum (Poldrack et al., 2001; Tricomi et al., 2006; Tricomi and Fiez, 2008; Tricomi et al., 2004). However, Lutz et al. (2012) found that only the ventral striatum was active during performance feedback, while feedback plus monetary reward also activated the dorsal parts of the striatum and elicited stronger activation in the ventral parts.

These activations in response to performance feedback (linked or not linked to a monetary outcome) are of particular interest because it is known that, in a rewarded task, the hemodynamic ventral striatal response correlates with dopamine release in the ventral striatum, which also correlates with the reward-related neural activity in the substantia nigra/ventral tegmental area, the origin of the dopaminergic projection (Schott et al., 2008).

In humans, indirect evidence for dopamine involvement in motor learning is found in studies showing that long-term potentiation (LTP)-like plasticity in the primary motor cortex (M1) is enhanced by levodopa, a precursor of dopamine (Kuo et al., 2008), and that it is abnormally reduced in Parkinson’s disease and restored to normal in these patients by dopaminergic treatment (Morgante et al., 2006). Animal studies suggest that motor learning may be mediated by such LTP-like processes in M1 (Rioult-Pedotti et al., 2000). Furthermore, it has been shown in animals that M1 plasticity and skill learning depend on dopamine (Molina-Luna et al., 2009), and Hosp et al. (2011) demonstrated that these processes in rodents rely on midbrain dopaminergic projections involved in signaling reward. Destroying dopaminergic neurons in the ventral tegmental area prevented improvements in forelimb reaching, an impairment that was abolished by administration of levodopa into M1 (Hosp et al., 2011). These findings give strong reasons to assume that dopaminergic reward signals, such as rewarding feedback, may alter LTP and thus lead to increased efficiency in motor skill acquisition and, potentially, motor recovery after stroke.

Indeed, recent work suggests positive effects of reward on procedural learning (Wachter et al., 2009) and motor skill learning (Abe et al., 2011) as well as on motor adaptation (Galea et al., 2015). Notably, all of these studies reported dissociable effects of positive and negative reward, and the latter two found positive reward to impact task consolidation/retention.

… and Neurorehabilitation of Motor Functions

As mentioned above, neurorehabilitative training for stroke patients is an effective intervention to increase independence in activities of daily living (Veerbeek et al., 2014). Part of this training-induced reduction of impairments is mediated by plastic reorganization of cortical circuits (Luft et al., 2004; Nudo, 2003; Schaechter, 2004). Thus, the ability to learn is assumed to support successful recovery and rehabilitation therapy after stroke (Krakauer, 2006; Lam et al., 2016). Reward has been shown to increase the effectiveness of learning a motor task (Abe et al., 2011; Galea et al., 2015; Wachter et al., 2009). However, in stroke survivors the activity of the dopaminergic reward pathway may be reduced not only because rewards are small, but also because patients after stroke have deficits in reward processing (Lam et al., 2016). Stroke survivors showed reduced brain activation in response to smiley face feedback, which was reflected in impaired reinforcement learning in a probabilistic classification task when compared to age-matched healthy individuals (Lam et al., 2016). Whether stroke also affects the processing of reward related to motor performance is, however, not yet known. Moreover, amplifying rewarding stimuli during rehabilitative training might be a means to overcome such a deficit and to stimulate the dopaminergic system to improve recovery.

Thesis Outline

The main objective of the present thesis was to translate evidence regarding the role of reward as a facilitator of synaptic plasticity in M1 and, as a consequence, of motor learning from animal models to humans. The ultimate goal, however, is the application of rewarding interventions for the optimization of motor neurorehabilitation in clinical populations that could benefit from improved plasticity, for example, patients after stroke. This thesis consists of cumulative research articles originally written for separate peer-reviewed publications in scientific journals (Chapters 2-5).

As a first step, the literature on the most widely used functional imaging task to investigate the processing of reward in healthy, but also in clinical populations, was thoroughly reviewed (Chapter 2).

Based on the acquired knowledge and using a modified version of a recently published motor skill task (Shmuelof et al., 2012), the effect of different reward modalities on motor skill learning was investigated by manipulating either the schedule for, or the extrinsic subjective value of, delayed performance feedback. Using functional magnetic resonance imaging (fMRI), behavioural results could be linked to specific neural activations, while focusing on the striatum as a key region of reward processing in the human brain (Chapter 3).

Based on findings presented in Chapter 3, the most efficient reward condition was chosen to investigate the neural processing of motor performance related reward in patients after stroke. Pilot results of this investigation have been published and are presented in Chapter 4.

Chapter 5 introduces a study protocol for an ongoing randomized controlled trial to investigate the clinical effect of reward on neurorehabilitation of motor functions after stroke.

Finally, Chapter 6 discusses the specific findings of this thesis in conjunction with each other, also mentioning shortcomings and future directions.

What can the monetary incentive delay task tell us about the neural processing of reward and punishment?

Published in:

Neuroscience and Neuroeconomics 2014, 3: 33-45. https://doi.org/10.2147/NAN.S38864

Authors:

Lutz K. and Widmer M.

Publisher:

Dove Medical Press Limited

Keywords:

Reward, punishment, dopamine, reward system

Abstract

Since its introduction in 2000, the monetary incentive delay (MID) task has been used extensively to investigate changes in neural activity in response to the processing of reward and punishment in healthy, but also in clinical populations. Typically, the MID task requires an individual to react to a target stimulus presented after an incentive cue to win or to avoid losing the indicated reward. In doing so, this paradigm allows the detailed examination of different stages of reward processing like reward prediction, anticipation, outcome processing, and consumption as well as the processing of tasks under different reward conditions. This review gives an overview of different utilizations of the MID task by outlining the neuronal processes involved in distinct aspects of human reward processing, such as anticipation versus consumption, reward versus punishment, and, with a special focus, reward-based learning processes. Furthermore, literature on specific influences on reward processing, like behavioral, clinical and developmental influences, is reviewed, describing current findings and possible future directions.

Introduction

Traditionally, rewards are defined as stimuli an organism is willing to work for and punishments as stimuli an organism is trying to avoid (Thorndike, 1931). These concepts have played a central role in the psychology of learning ever since they were introduced by behaviorism in the last century (see recent overviews by Domjan (2009); Miltenberger (2011)). They imply that reward and punishment are linked to an operant, i.e., to an agent’s action. According to behaviorist concepts, reward increases the probability that a rewarded behavior is shown in the future, whereas punishment decreases this probability. Therefore, reward and punishment are closely related to motivation, providing incentives to actively seek or avoid certain stimuli, and thus can elicit appetitive or avoidance behavior, respectively.

Rewards have been categorized into primary and secondary rewards. Primary rewards consist of stimuli which have a direct positive value for the individual receiving the reward. Many of these primary rewards or punishments have a physiological meaning, like food, beverages, sex, and pain. In contrast, secondary rewards have no immediate direct value, but an individual learns that receipt of such rewards usually has positive consequences. Such rewards can be money, tokens, some forms of social acknowledgement, or similar. Valuation of primary rewards depends on hunger, thirst, or other states of the organism, often making it necessary to deprive an individual under observation of the respective reward, in order to make sure that the stimulus is indeed rewarding. In comparison, secondary rewards are less prone to saturation and thus possess a relatively stable value. Nevertheless, a multitude of factors influence the individual valuation of primary as well as secondary rewards.

The neuroscientific study of reward processing flourished with the detailed examination of neuronal activity in rodent brains during consumption and anticipation of rewards and punishment (Hollerman and Schultz, 1998; Schultz, 1998). For a comprehensive review, see Schultz (2006). This work revealed that unexpected presentation of a reward, acting as an unconditioned stimulus, leads to a phasic increase in dopaminergic activity in the substantia nigra/ventral tegmental area. After classical conditioning of such a reward to a conditioned stimulus, the conditioned stimulus elicits a similar phasic increase of dopaminergic activity, but presentation of the unconditioned stimulus does not do so anymore. Correspondingly, if presentation of a conditioned stimulus is not followed by an unconditioned stimulus despite this being expected (leading to extinction), then a phasic decrease of dopaminergic activity can be found at the time when the unconditioned stimulus had been expected. Thus, a wealth of animal studies has led to the description of a reward system and allowed formulation of hypotheses about reward processing in human brains.
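The pattern of phasic responses described above is often summarized as a prediction-error signal. The following is a minimal sketch of that idea, not a model taken from the cited studies. It assumes a trial made of one conditioned stimulus (CS) followed by one unconditioned stimulus (US), a CS whose onset cannot be predicted (so the expectation just before the CS is fixed at zero), and a hypothetical learning rate and reward magnitude.

```python
# Minimal sketch (assumption-laden, not from the cited studies): a
# prediction-error account of the phasic responses described in the text.

ALPHA = 0.2   # hypothetical learning rate
REWARD = 1.0  # hypothetical reward magnitude


def run_trial(v_cs, reward, alpha=ALPHA):
    """Return (error at CS, error at US, updated CS value) for one trial."""
    delta_cs = v_cs - 0.0      # CS arrives unannounced: error equals its learned value
    delta_us = reward - v_cs   # outcome minus what the CS predicted
    v_cs = v_cs + alpha * delta_us
    return delta_cs, delta_us, v_cs


def simulate(n_trials=50):
    """Pair CS and reward for n_trials, then omit the reward on a probe trial."""
    v_cs = 0.0
    history = []
    for _ in range(n_trials):
        delta_cs, delta_us, v_cs = run_trial(v_cs, REWARD)
        history.append((delta_cs, delta_us))
    omission = run_trial(v_cs, 0.0)[:2]
    return history, omission


if __name__ == "__main__":
    history, omission = simulate()
    print("first trial (CS, US):", history[0])
    print("last trial  (CS, US):", history[-1])
    print("omission    (CS, US):", omission)
```

Under these assumptions, the error is large at the US on the first trial, moves to the CS as conditioning proceeds, and turns negative at the expected time of the US when the reward is omitted.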

Soon after these groundbreaking investigations, research was extended to human subjects, mainly using neuroimaging methods to assess changes in neuronal activity due to the processing of reward and punishment (Delgado et al., 2000; Knutson et al., 2000). The most important paradigm used for these studies has been the monetary incentive delay (MID) task. This task consists of the announcement of an incentive, which is linked with a certain contingency to receipt of this incentive. Basically, this reflects the case of classical conditioning. However, the standard version of the MID task requires an individual to react to a target stimulus presented after the incentive cue but before the reward is given. Whether the announced reward is delivered depends then on the individual reaction. Again, contingency can be introduced to make receipt of the reward more or less predictable from the individual action. Examples of such actions include forced choice behavior, memory tasks, and motor tasks. See Figure 2.1 for a schematic comparison of classical conditioning and the MID task.

Figure 2.1: Schematic drawing of an incentive delay task (B) in comparison with a classical conditioning scheme (A). Note that both settings, instead of using reward/reinforcement, allow for use of aversive stimuli/punishment.

If contingency exists between an action (i.e., task processing) and a consequence, the learning process rather fits into the scheme of operant conditioning. In this context, appetitive stimuli are called reinforcers, since they strengthen the reinforced behavior. If the action is not reinforced (e.g., because it was not performed to a trainer/teacher’s satisfaction), this leads, according to learning theory, to extinction. Note that in the case of classical conditioning, a stimulus is, or is not, followed by a reward. During the MID task, an action is, or is not, followed by reinforcement. However, the MID task allows assignment of different stimuli to different behaviors shown during task processing. One important possibility is to assign reinforcement to one action and an aversive stimulus (punishment) to another action triggered by the preceding cue. This is not the same as assigning a pleasant stimulus (UCS1) to a conditioned stimulus in some cases and an aversive stimulus (UCS2) to the same conditioned stimulus in other cases, since during classical conditioning, presentation of the conditioned stimulus is not controllable by the individual, whereas during the MID task, task processing is. Furthermore, both setups, i.e., classical conditioning and MID tasks, allow the use of pleasant (appetitive) as well as unpleasant (aversive) stimuli to generate reward or punishment, respectively. The most important difference between the setups is that reward/punishment in the MID task depends on task processing whereas in classical conditioning it depends on the conditioned stimulus.

This paradigm allows investigation of different stages of reward processing, like reward prediction, anticipation, outcome processing, and consumption, as well as the processing of tasks under different reward conditions. The current review gives an overview of the different utilities of the MID task that have been published since its introduction by Knutson et al. (2000). The review does not attempt to give an exhaustive overview of the literature, but instead presents selected articles in order to highlight how the MID task has been used to investigate neuronal processes involved in distinct aspects of human reward processing, such as anticipation versus consumption, reward versus punishment, and reward-based learning processes. We further highlight work investigating different influences on reward processing like behavioral, clinical, and developmental influences, as well as reward processing in different contexts. While describing current findings, the review attempts to point to possible future directions of investigation in the human reward system.

Anatomy of the reward system

In order to present an anatomical framework for discussing the neuronal processes involved in reward and punishment, Figure 2.2 gives an overview of the relevant brain structures, as described by Haber and Knutson (2010).

Figure 2.2: The human reward circuit. Evidence from self-stimulation, pharmacological, physiological, and behavioral studies emphasizes the key role of the nucleus accumbens and the ventral tegmental area dopamine neurons in the human reward circuit. However, striatal and midbrain areas involved during reward processing are more extensive than previously thought, including the entire ventral striatum and the dopamine neurons of the substantia nigra, respectively. Thereby, the orbital frontal cortex (dark orange arrow) and the anterior cingulate cortex (light orange arrow) provide the main cortical input to the ventral striatum. Moreover, the ventral striatum receives substantial dopaminergic input from the midbrain. On the other hand, ventral striatum projections target the ventral pallidum and the ventral tegmental area/substantia nigra, which, in turn, via the medial dorsal nucleus of the thalamus, project back to the prefrontal cortex. Additionally, other structures, such as the amygdala, hippocampus, lateral habenular nucleus, and specific brainstem structures, such as the pedunculopontine nucleus and the raphe nuclei, play a key role in the regulation of the reward circuit.

Abbreviations: Amy, amygdala; Hipp, hippocampus; NAcc, nucleus accumbens; dACC, dorsal anterior cingulate cortex; dPFC, dorsal prefrontal cortex; Hypo, hypothalamus; S, shell; STN, subthalamic nucleus; VP, ventral pallidum; vmPFC, ventral medial prefrontal cortex; THAL, thalamus; LHb, lateral habenula; PPT, pedunculopontine nucleus.

Notes: Reprinted by permission from Macmillan Publishers Ltd: Neuropsychopharmacology. Haber SN, Knutson B. The reward circuit: linking primate anatomy and human imaging. Neuropsychopharmacology. 2010;35(1):4–26. Copyright © 2010.

Aspects of Processing Reward and Punishment

A typical rewarding or punishing situation is a complex phenomenon. It consists of distinct temporal phases and can include different classes of stimuli. In the following sections, the most important aspects, as discussed in the literature, are outlined.

Anticipation and Consumption (wanting/liking)

As described by Knutson et al. (2000) in their original article introducing the monetary incentive delay task, a distinction between anticipation and consumption of rewards should be made when interpreting neuronal activity involved in reward processing. Such a distinction has been suggested based on previous observations in animals (Elliott et al., 2000), and on traditional views (Craig, 1917). Consequently, one year after publication of their original article (Knutson et al., 2000), a report about distinct neuronal activity in humans attributable to anticipation versus consumption of rewards was published (Knutson et al., 2001b). In short, it reports that reward anticipation activates ventral striatal regions, whereas the receipt of reward outcomes activates the ventromedial frontal cortex, thus replicating earlier studies in monkeys (Schultz et al., 2000). This finding has essentially been corroborated over the years with different types of reward (Breiter et al., 2001; Knutson et al., 2003; O'Doherty et al., 2002; Rademacher et al., 2010). Closer inspection of the time course of brain activity involved in reward processing has revealed a more complex pattern: after presentation of monetary gain or loss, activity in the dorsal striatum, particularly the dorsal part of the caudate nucleus, is sensitive to valence (reward/punishment) as well as outcome magnitude (Delgado et al., 2003). This is true at later stages, approximately 9–12 seconds after outcome presentation, when large rewards elicit the strongest increase and large punishments the weakest. On the other hand, the ventral striatum, especially the nucleus accumbens, seems to be strongly influenced by incentives (Knutson et al., 2001b) and shows less reactivity to outcome than the dorsal striatum (Delgado et al., 2003). Interestingly, initial feedback-related activity in the dorsal striatum seems to be dependent on incentive values, but after a few seconds, activity seems to depend on the size of outcome (Delgado et al., 2004).

The dynamics of brain activity in relation to processing of different reward stages has led to the formulation of a temporal difference model of reward-based learning (Knutson and Wimmer, 2007; O'Doherty et al., 2003). In brief, this model describes how error terms are derived from a mismatch between the predicted reward and that actually received. This mismatch can lead to a positive or negative reward prediction error, meaning that an outcome is better or worse than expected, respectively. Prediction of rewards in response to cues seems to take place in the nucleus accumbens (Pagnoni et al., 2002), whereas the medial prefrontal cortex seems to (re)calculate expectations of gains in response to outcomes.
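For orientation, the reward prediction error of a temporal difference model is commonly written in the form below. This is the generic textbook form and is given as an assumption; the cited studies may use different parametrizations. The symbols are introduced here for illustration only: $r_t$ is the reward received at time $t$, $V(s)$ the predicted value of state $s$, $\gamma$ a discount factor, and $\alpha$ a learning rate.

```latex
\delta_t = r_t + \gamma\, V(s_{t+1}) - V(s_t),
\qquad
V(s_t) \leftarrow V(s_t) + \alpha\, \delta_t
```

A positive $\delta_t$ corresponds to an outcome that is better than expected and a negative $\delta_t$ to one that is worse than expected; in both cases the prediction is shifted toward the outcome actually experienced.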

While these findings, gleaned by means of functional magnetic resonance imaging, describe brain activities with relatively slow temporal dynamics in the range of seconds, other methods reveal how other brain activities with faster temporal dynamics are related to reward prediction and receipt. Thus, a more complex picture emerges, i.e., processing negative prediction errors leads to negativity in the posterior cingulate cortex and striatum, whereas processing of reward expectancy corresponds with electrophysiological activity in the posterior cingulate cortex, anterior cingulate cortex, and parahippocampal gyrus (Donamayor et al., 2012). Furthermore, electroencephalography (EEG) and magnetoencephalographic methods reveal that reward cues are coded by neuronal oscillations in the beta (20–30 Hz) and theta (5–8 Hz) range in the frontal regions (Bunzeck et al., 2011; Donamayor et al., 2012; Kawasaki and Yamaguchi, 2013). Integration of these results into existing knowledge about the human reward system has only just started, and is likely to benefit from further studies investigating the fast temporal dynamics of human reward processing.

Reward versus Punishment

In addition to the question of temporal dynamics, when discussing rewards, the question remains as to whether rewarding and punishing effects are processed by distinct brain structures. To elaborate on this question, it seems beneficial to briefly overview the positive and negative effects of rewards or punishments; by definition, a stimulus that increases the frequency of a behavior, upon which the stimulus is contingent, is called reinforcement. Reward and positive reinforcement are commonly considered to be synonymous, although a reward is less strictly defined. Positive reinforcement usually consists of the presentation of an appetitive stimulus contingent on an individual’s behavior, whereas negative reinforcement consists of the removal of a noxious or otherwise aversive stimulus. On the other hand, punishment can consist of the presentation of an aversive stimulus or the removal of an appetitive stimulus. The MID task theoretically allows for investigation of all of these entities. However, instead of investigating the removal of stimuli, the incentive delay task has usually been used to investigate negative prediction errors, i.e., an unexpected decrease of reward magnitude or an unexpected increase in punishment.
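The four combinations defined above (appetitive or aversive stimulus, presented or removed contingent on behavior) can be written down as a small lookup table. The sketch below only restates the definitions from the text; the function name and the string labels are hypothetical and chosen for illustration.

```python
# Sketch of the taxonomy described in the text. Names are hypothetical.

def classify_consequence(stimulus, change):
    """Classify a consequence that is contingent on an individual's behavior.

    stimulus: "appetitive" or "aversive"
    change:   "presented" or "removed"
    """
    table = {
        ("appetitive", "presented"): "positive reinforcement",
        ("aversive", "removed"): "negative reinforcement",
        ("aversive", "presented"): "punishment",
        ("appetitive", "removed"): "punishment",
    }
    key = (stimulus, change)
    if key not in table:
        raise ValueError(f"unknown combination: {stimulus!r}, {change!r}")
    return table[key]


if __name__ == "__main__":
    for stimulus in ("appetitive", "aversive"):
        for change in ("presented", "removed"):
            print(stimulus, change, "->", classify_consequence(stimulus, change))
```

Both forms of reinforcement increase the frequency of the behavior they follow, while both forms of punishment decrease it.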

There have been several investigations comparing neuronal activity correlated with positive and negative prediction errors. Without distinguishing between anticipation and outcome processing, Delgado et al. (2000) found stronger involvement of the ventral striatum (approximate region of nucleus accumbens) and the dorsal striatum (caudate nucleus) in trials showing a positive rather than a negative outcome. The latter structure was later shown to code reward magnitude in a parametric manner (Delgado et al., 2003). Since the task used in the study of Delgado et al. (2000) involved gambling, and cues were not manipulated to induce expectancies, reward anticipation is unlikely to have varied systematically and therefore should not have influenced these findings.

Rogers et al. (2004) showed that activity in the medial prefrontal cortex (posterior orbitomedial cortex and pregenual anterior cingulate cortex) increases when positive outcomes are given, relative to the situation when subjects are confronted with a loss. Importantly, these outcomes, due to the nature of the task, were unpredictable, so positive outcomes represent a positive prediction error. While Rogers et al. (2004) only reported increased brain activity due to processing of positive outcomes versus negative outcomes, and not vice versa, Ramnani et al. (2004) investigated both types of prediction error separately. Their results corroborate the finding that unexpected rewards activate, among other regions, the medial prefrontal cortex. They also showed that unexpected omission of rewards activates a distinct region of the medial prefrontal cortex, more anterior to the aforementioned areas. Negative outcomes in these studies were operationalized as not receiving an expected reward. Alternatively, negative outcomes can be explicitly defined as a loss by deducting a certain amount of money from a participant’s credit. In doing so, distinct regions are revealed that code positive (gain) and negative (loss) reward prediction errors (Yacubian et al., 2006). Whereas unexpected reward is confirmed to activate the ventral striatum, unexpected loss is shown to correlate with neuronal activation in the amygdala. Interestingly, using this design, not only receipt of outcomes (prediction errors) but also their anticipation involved activation of the ventral striatum and amygdala; anticipation of positive outcomes activates the ventral striatum, whereas anticipation of negative outcomes activates the amygdala. Further evidence for involvement of the amygdala in anticipation of outcomes comes from a study using a different task (Breiter et al., 2001). A wheel-of-fortune game presented subjects with several possible gains in some rounds and with possible losses in other rounds. The results differ from those reported by Yacubian et al. (2006) in that activation of the amygdala was increased during anticipation of loss as well as reward in the study by Breiter et al. (2001) but only during anticipation of loss in the study by Yacubian et al. (2006). Although speculative, the difference might be explained by the subjects’ role in the tasks. The wheel-of-fortune task did not give the subjects any control over the outcome, whereas during the guessing task used by Yacubian et al. (2006) subjects might have had a feeling of agency, i.e., being responsible for the outcome. Thus, anticipation of reward might depend on whether subjects perceive themselves to have control over the outcome or not.

Additionally, introducing high-incentive versus low-incentive trials (operationalized as monetary gain/loss versus knowledge of performance without monetary consequence, respectively), it has been shown that the dorsal striatum/caudate nucleus is mainly sensitive to monetary incentive, even during outcome processing (Delgado et al., 2004).

Reward-based Learning

One of the most important functions of reward processing is to enable the organism to adapt behavior in order to maximize reward and minimize punishment. A reward prediction error indicates that a cue is not associated with the expected consequence. Thus, in the future, expectations connected to that cue should change. This forms the essence of classical conditioning. As a result, being confronted with the respective cue might be avoided or sought out in the future. Similarly, if reward is dependent on an individual’s behavior (e.g., choice behavior or motor accuracy), a prediction error informs the individual that the behavior does not lead to the expected outcome. According to learning theory, behaviors are chosen so that expected reward is maximized and/or expected punishment is minimized. Thus, behaviors leading to reward are strengthened and behaviors leading to punishment are weakened. This is the principle of operant (or instrumental) conditioning.

As a variant of the classical MID task, early studies (Ramnani and Miall, 2003; Ramnani et al., 2004) gave rewards contingent on goal-oriented activities or contingent on stimuli unrelated to behavior. A main result was that if not contingent on any behavior, unpredicted rewards evoked activity in the orbitofrontal cortex, the frontal pole, the parahippocampal cortex, and the cerebellum. If a monetary incentive is present while a visually triggered action is selected and planned, this results in enhanced activity within the prestriate visual cortex, the premo- tor cortex, and the lateral prefrontal cortex as compared with action selection and planning without a monetary incentive. These findings, based on goal-oriented behavior, do not in- volve striatal structures, which makes it hard to integrate them into the reward literature

22

What can the monetary incentive delay task tell us about the neural processing of reward and punishment? existing at the time. In fact, there is the possibility that focusing on goal-oriented motor be- havior might involve different structures than those involved in focusing on the decision pro- cess. However, no direct comparison of classical versus operant conditioning processes was performed in these studies. O'Doherty et al. (2004) applied a task requiring a two-alternative forced choice in which fruit juice could be gained as a reward upon selecting the right reac- tions in response to corresponding cues, thus forming a non-monetary instrumental condi- tioning task. The investigators compared this task with a condition in which the subject had no influence on the outcome, but the selection was made by a computer (coupled with the subject’s previous selections), thus forming a classical conditioning situation in which motor activity is comparable with instrumental conditioning. This approach allows the conditioning process to be viewed within the framework of an actor-critic model, where the actor chooses actions according to expected action outcomes and the critic controls whether the actions lead to the expected rewards (reward prediction error). In this setting, an actor was only assumed to be active in the instrumental conditioning condition. The results show that ven- tral striatum activity correlates with reward prediction error signal in both types of condi- tioning. The prediction error signal during classical conditioning was related to activity in the ventral putamen and the prediction error signal during instrumental conditioning was related to the nucleus accumbens. The dorsal striatum, on the other hand, was more strongly acti- vated due to outcome processing in instrumental conditioning than in classical conditioning, suggesting its involvement in the role of an “actor”. 
Importantly, in another study, the dorsal striatum (head of the caudate nucleus) was activated only when a reward was perceived to be contingent on an action (Tricomi et al., 2004).
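
The actor-critic framework mentioned above can be made concrete with a small simulation. The following is a minimal sketch only, not the model fitted by O'Doherty et al. (2004): the reward probabilities, learning rates, and function names are hypothetical and chosen for illustration. The critic learns the expected reward and produces a reward prediction error in both conditions, whereas the actor updates its action preferences only in the instrumental condition, where the simulated subject's own choice determines the outcome.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into choice probabilities."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def simulate(n_trials=2000, instrumental=True,
             alpha_critic=0.1, alpha_actor=0.1, seed=0):
    """Two-choice task with a critic (value) and an actor (policy).

    Assumed for illustration: action 0 is rewarded with p = 0.8,
    action 1 with p = 0.2.
    """
    rng = random.Random(seed)
    reward_prob = [0.8, 0.2]
    value = 0.0          # critic: expected reward following the cue
    prefs = [0.0, 0.0]   # actor: preference for each action

    for _ in range(n_trials):
        probs = softmax(prefs)
        if instrumental:
            # The actor selects the action.
            action = 0 if rng.random() < probs[0] else 1
        else:
            # Classical condition: the computer selects, the actor is idle.
            action = rng.randrange(2)

        reward = 1.0 if rng.random() < reward_prob[action] else 0.0

        # Critic: the reward prediction error is computed in both conditions.
        delta = reward - value
        value += alpha_critic * delta

        # Actor: preferences are updated only when the choice was the agent's.
        if instrumental:
            for a in range(2):
                indicator = 1.0 if a == action else 0.0
                prefs[a] += alpha_actor * delta * (indicator - probs[a])

    return value, softmax(prefs)


if __name__ == "__main__":
    print("instrumental:", simulate(instrumental=True))
    print("classical:   ", simulate(instrumental=False))
```

In this sketch the prediction error `delta` exists in both conditions, mirroring the ventral striatal signal described above, while only the instrumental condition changes the policy, which is the role attributed to the "actor".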

Considering reward following cues versus reward following actions, Glascher et al. (2009) have set up an experiment discriminating between the two. They found that activity in the ventromedial prefrontal cortex corresponds to the expected reward following actions as well as external cues. On the other hand, using an operant conditioning paradigm, FitzGerald et al. (2012) distinguished action values (the value ascribed to a specific action) from choice values (the value ascribed to either of two choices). They were able to show that the ventromedial prefrontal cortex along with thalamic and insular structures decode action-specific values, and are thus likely to be involved in operant conditioning. This is partly consistent with studies showing brain activity in the ventromedial prefrontal cortex to correspond to the expected value of actions or choices (Palminteri et al., 2009; Wunderlich et al., 2009).

These studies of reward-based learning make it clear that distinct mechanisms are likely to be involved when action is or is not required by the subject. However, when acting, we have to consider whether the actions are goal-oriented or not. If rewards or punishments are offered, we can usually assume that the agent’s goal is to maximize reward and to minimize punishment. Some aspects of goal-oriented behavior in the context of reward processing are discussed in the following section.

Goal-oriented Behavior and Reward

One important distinction between classical and operant conditioning lies in the fact that classical conditioning does not assume an agent’s action to lead to consequences. Instead, during classical conditioning, stimulus effects on the individual are investigated. Operant conditioning, on the other hand, rewards certain behaviors, leading to increased probability that these behaviors take place in the future, possibly in order to receive further rewards. Thus, presentation of rewards is commonly understood to be accompanied by emotional reactions which may trigger motivated behavior. This view brings a series of studies into focus, allowing the question of whether goal orientation might involve distinct components of the reward system to be addressed. Early studies pointing in this direction showed dorsal striatum activity in response to the presentation of performance feedback in classification learning tasks (Aron et al., 2004; Poldrack et al., 2001; Tricomi et al., 2006). Interestingly, positive performance feedback did not elicit stronger activity than negative feedback in any subregion of the striatum (Aron et al., 2004). However, when giving performance feedback under two conditions, one signaling achievement of the subject’s goal and the other signaling the same amount of information but unrelated to any explicit goal, the former condition activated the head of the caudate nucleus more strongly (Tricomi and Fiez, 2008). Similarly, Nees et al. (2012) demonstrated that during a MID task, the anticipation of an optional reward elicited ventral striatum activity dependent on the magnitude of the possible reward. On the other hand, no such dependency existed in a simple guessing task, in which reward magnitude, although being experimentally manipulated, was unrelated to the subject’s behavior.
Other studies have used a slot-machine task requiring no action and compared this with tasks in which outcome was contingent on the preceding choice or action (Donkers et al., 2005). Using EEG, the investigators found that action outcomes elicited a transient change in mediofrontal activity when they were unfavorable (errors). On the other hand, when independent of the subject’s action, a similar mediofrontal EEG component was elicited in response to both favorable as well as unfavorable salient outcomes. This finding was later substantiated with a different task, again showing a greater difference between win-related and loss-related EEG frontomedial components when the outcome depended on the subject’s action (Zhou et al., 2010). These findings can be seen in the context of the abovementioned studies of reward anticipation when there is (Yacubian et al., 2006) or is not (Breiter et al., 2001) a likely feeling of agency. To elaborate further on the role that performance feedback plays in the reward circuitry, a recent study compared performance feedback in a motor task when performance was or was not linked to monetary reward (Lutz et al., 2012). In both cases, being informed about good performance activated the ventral striatum. However, in this study, feedback about bad performance led to less activity in this region than feedback about good performance, in contrast with previous findings using other tasks (Aron et al., 2004). In those tasks, information about bad performance was similarly valuable for the overall goal of learning a classification task, whereas the goal in the later study was generally to maximize precision in a motor task that in a subset of trials led to monetary gain. Thus, error was always negatively coupled with reward. Similar results have been found in a study using a category learning task with a monetary incentive in 50% of trials whereas only cognitive feedback was given in the other 50% (Daniel and Pollmann, 2010). Both kinds of feedback elicited increases in the activity of several basal ganglia structures during the anticipation phase. Activity in the nucleus accumbens was stronger in monetary incentive trials, corresponded to measures of extrinsic motivation in monetary incentive trials, and corresponded to measures of intrinsic motivation in cognitive feedback trials.
Video gaming is another example in which high motivation can be observed without obvious rewards. Performance feedback is stressed in many of these games, and massive release of dopamine into the ventral and dorsal striatum was reported long ago during video gaming (Koepp et al., 1998). An active role of the player seems to be essential for this (Katsyri et al., 2013).

These studies show that performance feedback, especially if it informs about good performance, elicits neuronal activity in many respects comparable with the neuronal activity elicited by the presentation of reward. Given that performance feedback is not regarded as a classical rewarding stimulus, the question about motivation in these tasks without explicit incentive is interesting. The concept of intrinsic motivation (Ryan and Deci, 2000) assumes that some tasks are worked on merely because a subject enjoys working on the task. With certain components of the human reward system being involved in the processing of performance feedback and even being connected to measures of intrinsic motivation, versions of the MID task without monetary (or other forms of) incentive seem to provide a valuable approach to investigate intrinsic motivation and its interaction with extrinsic motivation. In behavioral studies, this interaction has been discussed controversially, in particular the notion that monetary reward can undermine intrinsic motivation. A pioneering study using a variant of an MID task investigated the size of the midbrain and striatal activation due to positive performance feedback under several conditions, i.e., when performance was coupled to monetary reward, when performance was not coupled to monetary reward, and when performance was not coupled to monetary reward after having been coupled to it previously. Not differentiating between the dorsal and ventral striatum, the authors found that the strongest responses in the midbrain and striatal regions were elicited by feedback of good performance which was monetarily rewarded, with significantly weaker activation if no monetary incentive was given. Interestingly, removing the monetary incentive led to a drop in midbrain and striatal activity to significantly below the level in a control group where monetary incentives had never been present.

These studies show that, in some tasks, performance feedback can serve as a task-intrinsic reward, so that intrinsic motivation rather than incentive motivation might be at work. Considering the much-discussed advantages of intrinsic motivation versus extrinsic motivation (Frey, 1994; Frey and Jegen, 2001), closer examination of the neural systems involved in the interaction of these motivational systems would be of interest.

Reward Processing and Error Monitoring

A topic closely related to the role of performance feedback in reward processing is the processing of error information in cognitive or motor tasks. As mentioned earlier, processing of error information in a categorical learning task has been shown to elicit activity in the ventral striatum in specific settings (Poldrack et al., 2001). Error information is most frequently investigated in tasks showing similarity to the MID task (consisting of the triad cue, action/choice, and outcome), with the difference being that monetary incentive is not usually coupled to error information. Therefore, although the scope of the present paper is focused on the MID task, a brief comment on the interesting link between human error processing and reward systems seems appropriate.

Processing error information in decision tasks has been studied intensively using electrophysiological methods, mainly EEG. Event-related components, time locked to erroneous behavior (error-related negativity), and time locked to feedback about an error (feedback-related negativity) have been identified as the most important and most reliable neuronal correlates of error processing. An influential model, the reinforcement learning theory of error processing, assumes that whenever information indicates that an action does not result in an expected consequence, disinhibition of the anterior cingulate cortex mediated by the basal ganglia leads to a negative EEG deflection in the frontocentral electrodes. The precise location and involvement of the basal ganglia, along with the anterior cingulate cortex and other structures not readily amenable to investigation by EEG, have been identified by functional magnetic resonance imaging studies of error perception (Holroyd et al., 2004; Klein et al., 2007; Ullsperger and von Cramon, 2003). The theory is eloquently described by Holroyd et al. (2009). The stronger the expectation of a certain action outcome that is missed, the greater the EEG deflection. Thus, the reinforcement learning theory of error processing only applies if stable expectancies concerning action outcome can be formed, allowing a reward prediction error to be generated. In motor learning theory, the capacity to predict action outcomes is inherent in an internal model mapping actions to consequences on the environment (Wolpert et al., 1995). Confirming the reinforcement learning theory of error processing and its applicability to motor learning, a recent study has shown that error-related negativity increases with buildup of such an internal model while learning audiomotor mappings on a manipulated piano keyboard layout (Lutz et al., 2013). The evidence presented here demonstrates that reward prediction errors seem to play a greater role than would be expected from the investigation of MID tasks. Rather, reward prediction and outcome monitoring seem to have important features in common.
As proposed by Kaplan and Oudeyer (2007), the goal to minimize prediction error in several domains might be driving intrinsically motivated behavior like playing and exploration, and the nucleus accumbens may play a pivotal role in this process. Interestingly, novelty, as encountered when exploring new environments, activates a neuronal system partly overlapping with the reward network, and contextual novelty seems to boost activity in the striatum (Bunzeck et al., 2011; Bunzeck et al., 2012; Businelle et al., 2010; Guitart-Masip et al., 2010).
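
The statement that a missed outcome produces a larger error signal the more strongly that outcome was expected can be illustrated with a simple delta-rule learner. This is a minimal sketch under stated assumptions and not the model used in the cited studies: the learning rate, the coding of the outcome as 1 (expected consequence occurs) or 0 (it is missed), and the function name are hypothetical.

```python
def omission_error_after(n_confirmed_trials, alpha=0.2):
    """Prediction error when an expected outcome is missed.

    The expectation is built up over n_confirmed_trials in which the
    action led to its usual consequence (outcome = 1.0). On the next
    trial the outcome is missing (outcome = 0.0).
    """
    expectation = 0.0
    for _ in range(n_confirmed_trials):
        # Delta rule: move the expectation toward the observed outcome.
        expectation += alpha * (1.0 - expectation)
    # Negative prediction error on the violation trial.
    return 0.0 - expectation


if __name__ == "__main__":
    for n in (0, 1, 5, 20):
        print(n, round(omission_error_after(n), 3))
```

In this toy learner the negative prediction error grows in magnitude as the expectation becomes more stable, which parallels the reported increase in error-related negativity while an internal model is being built up.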

Discounting of Delayed Reward

MID tasks also allow investigation of other aspects of reward processing, not mentioned so far. One important aspect of the value of the reward is determined by the temporal availability of the reward; if available immediately, a reward is valued more highly than if it is available only after a certain period of time. Thus, introducing a choice between small and immediately available or large and not immediately available rewards enables study of the process of devaluation due to temporal delay, known as delayed reward discounting. A wealth of literature has accumulated showing how different periods of delay influence the subjective perception of reward value in different populations. A recent overview of this topic, including underlying neural mechanisms, has been published by Peters and Buchel (2011). Subjective discounting seems to take place in a neural system comprising the ventromedial prefrontal cortex, ventral striatum, and posterior cingulate cortex. Further, the amygdala and hippocampus are involved in delay discounting. The specific contribution of these structures is not yet fully understood, but abnormal delay discounting has been linked to neuropsychiatric disorders related to impulsivity, as well as to addictive behaviors. Typically, addicts discount delayed rewards at a much higher rate than control subjects (Bickel et al., 2007; Businelle et al., 2010). Moreover, this impulsive discounting behavior has been shown to be largely independent of the particular drug of abuse and thus seems to be a reliable trait marker for addiction, especially given that it has been observed not only for drug rewards, but also for nondrug rewards, such as money (Peters and Buchel, 2011). The latter makes the MID task an ideal tool for the study of addictive behavior in the context of reward and loss. The way in which the neural response to reward differs between addicted and healthy subjects is discussed in the following section concerning individual influences on reward processing.
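
To make the idea of a discounting rate concrete, the sketch below uses the hyperbolic form V = A / (1 + kD), which is widely used in the delay discounting literature. The text above does not commit to a particular functional form, so the hyperbolic model is an assumption here, and the amounts, delay, and k values are hypothetical illustration values rather than data from any cited study.

```python
def discounted_value(amount, delay_days, k):
    """Subjective value of a delayed reward under hyperbolic discounting.

    amount: nominal reward size
    delay_days: delay until the reward is available
    k: discount rate (larger k means steeper discounting)
    """
    return amount / (1.0 + k * delay_days)


def prefers_immediate(small_now, large_later, delay_days, k):
    """True if the small immediate reward is valued above the delayed one."""
    return small_now > discounted_value(large_later, delay_days, k)


if __name__ == "__main__":
    # Same choice, two hypothetical discount rates.
    for k in (0.01, 0.1):
        later = discounted_value(50, 30, k)
        print(f"k={k}: delayed 50 is worth {later:.2f}, "
              f"immediate 20 preferred: {prefers_immediate(20, 50, 30, k)}")
```

With the steeper rate the same delayed reward loses more of its subjective value, so the smaller immediate reward is chosen; this is the pattern described above as a higher discounting rate.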

Individual Influences on Reward Processing

With the observation that certain clinical populations show abnormal delay discounting, a further field of investigation using the MID task is now introduced, along with a few examples, i.e., a detailed description of alterations in reward processing in clinical populations, as well as influences of personality traits and changes across the lifespan. This can help to increase our knowledge about the relevant diseases and find new treatment approaches.

For instance, many authors have used the MID task to understand reward and loss in addicts. The observation of steeper delay discounting in addicts raises the question of whether increased discounting is a consequence or a cause of addiction. That is, do genetic factors result in an impulsive personality and thereby increase the likelihood of drug abuse, or is impulsive discounting a repercussion of changes at the neural level due to long-term drug abuse? Generally, while addicts show an increased response of the reward system to drug-related cues (Diekhof et al., 2008), overall the data imply that addiction is associated with reduced activation of the valuation network (i.e., the ventral striatum and orbitofrontal cortex, including the ventromedial prefrontal cortex) during processing of nondrug rewards (Peters and Buchel, 2011). Recent evidence from a longitudinal genetic neuroimaging study links decreased reward sensitivity during the anticipation phase of an MID task to a certain haplotype of the ras protein-specific guanine nucleotide-releasing factor 2 (RASGRF2) gene in 14-year-old males (Stacey et al., 2012). This haplotype has previously been linked to addictive behavior (Schumann et al., 2011), and thus represents a possible genetic risk factor for drug addiction. In contrast with this reward deficiency hypothesis, other studies have observed increases in ventral striatal activity during the anticipation of monetary gains in chronic cannabis users (Filbey et al., 2013; Jager et al., 2013; Nestor et al., 2010), and a blood-oxygen level-dependent response in the right ventral striatum was found to be significantly correlated with lifetime use and reported lifetime cannabis joints consumed (Nestor et al., 2010). Therefore, the relationship between chronic cannabis use and activity in the ventral striatum might be qualitatively different from that involving other drugs (Bjork et al., 2008). Concerning the question of cause or consequence, a recent study by Patel et al. (2013), in addition to corroborating the reward deficiency hypothesis, investigated reward processing in former and current cocaine users. Both groups differed similarly from control subjects, but between-group differences were found in the ventral tegmental area during loss outcome and in prefrontal regions during loss anticipation. The authors concluded that current cocaine use may influence reward processing circuits, and that even long-term cocaine abstinence does not normalize most drug-related reward circuit abnormalities. Since both groups showed elevated impulse-related factors that relate to loss, the authors further suggested that these tendencies may predate cocaine addiction.
Further, genetic factors have been shown to be associated with altered reward processing in alcoholism (Villafuerte et al., 2012). Certain variants in the gene encoding the α2 subunit of the inhibitory γ-aminobutyric acid type A (GABA-A) receptor (GABRA2) are linked with higher insular cortex activity during anticipation of reward and punishment, as well as with impulsiveness and familial alcohol abuse. Here, however, changes in dopaminergic activity have not been directly reported, since GABRA2 acts on the production of GABA-A receptors. All in all, further studies are needed investigating the extent to which functional differences in former users of cocaine and other drugs reflect pre-existing features, exposure, and recovery.

As another example of changed reward processing in a clinical population, patients with attention deficit hyperactivity disorder show decreased activation in the ventral striatum during anticipation of gain, but increased activation of the orbitofrontal cortex in response to gain outcomes (Strohle et al., 2008). However, these observations of decreased activation of the putamen (Stoy et al., 2011), while being partially confirmed in adults with persistent symptoms of attention deficit hyperactivity disorder, did not prevail in symptom-free adults with childhood attention deficit hyperactivity disorder (Stoy et al., 2011), and behavioral changes in reward processing were not confirmed in other studies (Demurie et al., 2011; Demurie et al., 2013). Thus, although the phenomenon of striatal hypoactivation during reward anticipation is well known in patients with attention deficit hyperactivity disorder (Stoy et al., 2011; Strohle et al., 2008), it would be premature to draw firm conclusions.

Schizophrenia is another disease that has been linked to changes in reward processing. During reward anticipation, patients with schizophrenia show significantly less activation in the ventral striatum (Schlagenhauf et al., 2008), anterior cingulate cortex, and dopaminergic midbrain regions than healthy controls (Nielsen et al., 2012a), possibly explaining the symptoms of apathy (Simon et al., 2010) commonly present in schizophrenia. This attenuation was reduced by treatment with the dopamine antagonist amisulpride (Nielsen et al., 2012b) or with olanzapine (Schlagenhauf et al., 2008). However, this attenuation has not consistently been replicated in other studies (Waltz et al., 2010; White et al., 2013). Although there is no clear picture as yet regarding how the reward system may be modified in patients with schizophrenia, probing the integrity of this system may lead to identification of subgroups and tailored treatment concepts.

A recent meta-analysis of the literature on reward processing in major depressive disorder has summarized the results of 22 functional magnetic resonance imaging studies, of which five used variations of the MID task and another seven used conditional learning or guessing tasks with or without rewards to patients. This work yielded rather heterogeneous results, possibly due to the great heterogeneity in the experimental paradigms used. One general finding seemed to be decreased reward-related activity in the subcortical and limbic areas and an increased response in cortical areas. The authors concluded that “future studies may be strengthened by paying careful attention to the types of reward used as well as the different components of reward processing examined” (Zhang et al., 2013).

As mentioned above, addictive behavior is linked to changes in reward processing, possibly via greater impulsiveness or altered reward sensitivity in addicted individuals. However, in nonclinical populations, reward sensitivity, as measured by questionnaire (Torrubia et al., 2001), is a stable trait associated with changes in, e.g., reward-based learning and inhibitory control (Corr, 2004). This trait also has neurophysiologic correlates. For example, individuals with high reward sensitivity show increased responses in the nucleus accumbens and midbrain to reward anticipation (Beaver et al., 2006; Camara et al., 2010; Carter et al., 2009; Hahn et al., 2009), as measured during performance of MID tasks. Further, structural and functional correlates of high trait reward sensitivity, not directly related to MID task processing, have been described and complement our understanding of individual differences in reward processing. Such changes include diminished striatal volume (Barros-Loscertales et al., 2006), increased strength of the white matter tract between the nucleus accumbens and amygdala (Cohen et al., 2009), more random resting neural dynamics in the nucleus accumbens and orbitofrontal cortex (Hahn et al., 2012), and less functional connectivity between the midbrain and medial orbitofrontal cortex (Costumero et al., 2013).

Other personality factors have also been shown to relate to reward processing. For example, Wu et al. (2014) have investigated the affective traits of positive arousal and negative arousal, as derived by factor analysis from several standard affective personality subscales (i.e., the extraversion and neuroticism scores from the NEO-Five-Factor Inventory; actual high-arousal positive and actual high-arousal negative scores from the A Values Inventory; and behavioral inhibition, behavioral activation-reward, behavioral activation-drive, and behavioral activation-fun scores from the behavioral inhibition/behavioral activation scale). These authors demonstrated that during anticipation of large gains, the nucleus accumbens shows significantly increased activity bilaterally, whereas during anticipation of large losses, activity in the anterior insular cortex is significantly increased bilaterally. Interestingly, activation increases in the left nucleus accumbens during anticipation of large gains correlate with positive arousal scores, whereas activation increases in the right anterior insula during anticipation of large losses correlate with negative arousal scores.

Further, recent studies have shown that reward processing can be influenced by environmental factors such as stress. Treadway et al. (2013) demonstrated that subjects reporting a greater impact of stressors had smaller neural responses in the medial prefrontal cortex in response to both monetary gains and losses in an MID task. Similarly, acute stress, induced before performing a guessing task, blunted activation increases in the dorsal striatum and orbitofrontal cortex when compared with a control group not subject to stressors (Porcelli et al., 2012). These studies, although not directly corroborating each other, nevertheless draw a comparable picture, revealing a decrease in mediofrontal reward-related brain activity under conditions of perceived stress, which might relate to the role of stress as a risk factor for addictive behavior (Sinha, 2009).

Another topic warranting brief discussion here is the development of reward processing over the lifespan. Although it could be argued that a comparison between healthy adolescents and adults reflects intraindividual development rather than interindividual differences, no longitudinal studies on reward processing beyond adolescence are available, to our knowledge (note, however, the IMAGEN trial following a cohort of 2,000 adolescents and describing, among other things, functional genetics and neuroimaging of the MID task; Schumann et al., 2011). Thus, studies using the MID task to investigate gain and loss in developmental populations are discussed in this section. Adolescents are of particular interest in this context because of their increased willingness to take risks. Bjork et al. (2010) were the first to compare patterns in the reward circuit in response to incentive cues and outcomes between adolescents and adults. They observed lower right ventral striatal and right extended amygdala activation due to gain anticipation (but not consumption) in adolescents. These findings were subsequently replicated by the same group (Bjork et al., 2011). In contrast, other studies investigating win versus no-win demonstrated stronger activation of the ventral striatum in adolescents but stronger activation of the amygdala in adults, thus suggesting that “maturing subcortical systems become disproportionately activated relative to later maturing top-down control systems, biasing the adolescent’s action toward immediate over long-term gains” (Ernst et al., 2005; Galvan et al., 2006).
The divergence of findings from these different studies has been attributed to sensitivity of the incentive-motivational neurocircuitry to the nuances of the incentive task or stimuli, such as behavioral or learning contingencies and to the specificity of the component of instrumental behavior, such as anticipation versus notification (Bjork et al., 2010). More recently, it was found that, compared with adults, adolescents show less of a linear increase in ventral striatal activity during anticipation of increasing reward magnitude (Vaidya et al., 2013). In this study, adults, but not adolescents, demonstrated greater ventral striatal activity in response to the same absolute reward when it was the preferred of two possibilities (i.e., $1 versus $0.20 compared with $1 versus $5), indicating that ventral striatal activity in adolescents is less sensitive to relative reward value. Further, reduced ventral striatal sensitivity to absolute anticipated reward correlated with a higher level of trait impulsivity. This finding is consistent with that of another study, in which healthy young subjects, who happened to be steep delay discounters, showed lower responses in the left ventromedial caudate during anticipation of potential reward (Benningfield et al., 2014). All in all, although their findings may diverge in some aspects, researchers agree on the attribution of increased risk-taking and impulsive behavior during adolescence to developmental differences in neural processing of rewards. Moreover, with the development of a child-friendly version of the MID task (Helfinstein et al., 2013; Kappel et al., 2013), the investigation of reward processing in developmental populations can now be validly expanded to children.

Conclusion

We conclude with a short final evaluation and synopsis of the use of the MID task. One of the most important achievements of the MID task is to provide a paradigm flexible enough to allow investigation of many facets of reward processing while still allowing comparison between studies. By parsing the whole process of reward processing, from incentive presentation through task performance, display of approach or avoidance behavior, and possible discounting of reward due to delay, to reward consumption, researchers are free to focus on any of these steps in a multitude of populations, using different reward modalities and introducing other variations. We have briefly mentioned current developments, e.g., the use of the MID task in prospective genetic neuroimaging studies on the development of psychiatric disorders. We have pointed to relationships with other tasks, e.g., some forms of conditioning or error processing, thereby placing a special focus on the possible role that agency and goal orientation might have in the processing of rewards and punishments. These relationships should be further explored in future studies, thus integrating knowledge gathered in different fields of research. Some fields of integration are already emerging, e.g., elucidating the role played by reward processing in learning mechanisms connected with novelty, or investigating the processing of performance feedback in the framework of reward processing, which may yield new insights into the mechanisms of intrinsic motivation.

Acknowledgments

The work was supported by the Clinical Research Priority Program “Neuro-Rehab” of the University of Zurich. The authors would like to thank three anonymous reviewers for their knowledgeable and constructive remarks.

Disclosure

The authors report no conflicts of interest in this work.

Rewarding feedback promotes motor skill consolidation via striatal activity

Published in:

Progress in Brain Research 2016, 229: 303-323. http://dx.doi.org/10.1016/bs.pbr.2016.05.006

Authors:

Widmer M., Ziegler N., Held J.P., Luft A.R. and Lutz K.

Publisher:

Elsevier

Keywords:

Motor skill learning, monetary reward, performance feedback, knowledge of performance, fMRI, striatum, pointing task, consolidation

Abstract

Knowledge of performance can activate the striatum, a key region of the reward system and highly relevant for motivated behavior. Using functional magnetic resonance imaging, striatal activity linked to knowledge of performance was measured during the training of a repetitive arc-tracking task. Knowledge of performance was given after a random selection of trials or after good performance. A third group received knowledge of performance after good performance plus a monetary reward. Skill learning was measured from pre- to post- (acquisition) and from post- to 24 hours post-training (consolidation). Our results demonstrate an influence of feedback on motor skill learning. Adding a monetary reward after good performance leads to better consolidation and higher ventral striatal activation than knowledge of performance alone. In turn, rewarding strategies that increase ventral striatal response during training of a motor skill may be utilized to improve skill consolidation.


Introduction Extrinsically motivated actions are performed because they lead to an outcome, e.g., to a reward (Ryan and Deci, 2000). By increasing the extrinsic subjective value, rewards augment the overall subjective benefit of a task, making people tolerate higher subjective costs, and are thus traditionally defined as stimuli an organism is willing to work for (Knutson and Cooper, 2005; Lutz and Widmer, 2014). Intrinsic motivation, on the other hand, refers to doing something because it is inherently interesting or enjoyable, which is influenced by factors such as the subject’s perceived autonomy, competence for or relatedness to a task (Ryan and Deci, 2007). Similar to motivation, reward can be classified as extrinsic or intrinsic (Deci et al., 1999; Deci et al., 2001; Reitman, 1998). While extrinsic reward refers to the receipt of material goods (e.g., food or money) for a specific activity, the term “intrinsic reward” refers to reward derived from task-inherent stimulation (e.g., information about an achieved performance, watching a self-painted picture, or feeling self-produced movements). Evidence from behavioral studies implies that extrinsic reward might undermine intrinsic motivation and thus may lead to a decrease in performance (Callan and Schweighofer, 2008; Deci et al., 1999; Kohn, 1999; Murayama et al., 2010; Spence, 1970). For instance, the time children spend drawing decreases below baseline after this behavior has been (externally) rewarded and the reward has then been withdrawn (Greene and Lepper, 1974).

In experiments using functional magnetic resonance imaging (fMRI), both intrinsic and extrinsic (performance-dependent) reward have been shown to increase the neural activity in the striatum (Lutz et al., 2012), a key locus of reward processing (Knutson et al., 2009). In these experiments, only the ventral striatum was active during performance feedback, while feedback plus monetary reward activated both ventral and dorsal parts of the striatum. However, other studies found activation elicited by feedback alone also in the dorsal striatum (Poldrack et al., 2001; Tricomi et al., 2006; Tricomi and Fiez, 2008; Tricomi et al., 2004). Furthermore, dorsal striatal activity was shown to be modulated by the subject’s sense of agency for having achieved a goal (Han et al., 2010; Tricomi and Fiez, 2008).

Previous research has investigated the influence of feedback and reward on the acquisition of cognitive tasks, e.g., decision making paradigms (den Ouden et al., 2013; Frank et al., 2004; Robinson et al., 2010). Our animal studies suggest that dopaminergic signals originating in reward-coding brain regions (ventral tegmental area) are required for motor skill acquisition. In rodents, dopaminergic projections from the ventral tegmental area to the primary motor cortex enable motor learning and long-term potentiation in cortico-cortical projections (Hosp et al., 2011; Molina-Luna et al., 2009). These projections are not necessary for task execution (Molina-Luna et al., 2009). We hypothesize that this system can be used to facilitate motor skill learning by amplification of rewarding stimuli.

Indeed, recent work suggests positive effects of monetary reward on procedural (Wachter et al., 2009) and skill motor learning (Abe et al., 2011) as well as on motor adaption (Galea et al., 2015). Notably, all of these studies reported dissociable effects of positive and negative reward, and the latter two found positive reward to impact task consolidation/retention. Moreover, the reward-related learning effect reported by Wachter et al. (2009) was found to be mediated by the dorsal striatum. However, these studies exclusively used money as an extrinsic reward, although, as illustrated above, intrinsic rewards (e.g., knowledge of performance) have also been shown to activate the human reward circuits and may thereby influence motor learning.

Dopaminergic neurons in the midbrain signal outcomes that are better than expected (positive prediction error; Schultz, 2000). Being informed about unexpectedly good performance may thus cause a positive prediction error. Indeed, being informed only about positive task outcomes resulted in better performance than being informed about the outcome of poorly solved trials (Chiviacowsky and Wulf, 2007). Whether these findings are accompanied by higher reward-related activity after good performance feedback remains to be elucidated.

In the present study, a modified version of the arc-pointing task that involves a visually-guided precision movement of the wrist (Shmuelof et al., 2012) was used to test the hypothesis that striatum activation is increased if knowledge of performance is given after good performance instead of a random selection of trials. Adding a performance-dependent monetary reward was expected to further increase this activation. In addition, we hypothesized that motor skill learning is improved in conditions with enhanced striatum activity.

Methods

Participants Forty-five healthy right-handed volunteers (22 females, 20-34 years of age, 24.5 years on average; Table 3.1) participated in this study that was approved by the cantonal ethics committee (KEK-LU 13054). Hand preference and dominance were assessed using the Edinburgh Handedness Inventory (Oldfield, 1971) and the Hand Dominance Test (Steingruber and Lienert, 1971), respectively, confirming that all participants were classified as right-handed. Subjects were recruited from the University community or shared a similar educational status. They were not specifically skilled or trained in comparable motor tasks. All participants gave written informed consent before being randomly assigned to one of three groups. Allocation was according to a computer-generated random number sequence. Subjects were unaware of the other groups and the scientific rationale of the study. All subjects received financial compensation in comparable amounts, but only for one group did payments depend on individual performance during the training of the motor task.

                      Overall       KPrandom      KPgood        KPgood+MR
N (dropouts)          44 (1)        15 (-)        14 (1)        15 (-)
Age in years (SD)     24.5 (3.2)    25.9 (2.8)    25.1 (3.7)    22.9 (2.5)
Sex, male / female    23 / 21       7 / 8         7 / 7         9 / 6

Table 3.1: Subject characteristics. N reports the number of subjects per group with dropouts listed in brackets. SD is standard deviation. Note that groups were allocated randomly, not by matching any of the reported characteristics.

Study Design Subjects participated in the study on two consecutive days. Neutral (group-independent) test sessions were performed to assess momentary performance on day one, before and after the group-specific training. To assess overnight task consolidation, subjects returned 20 to 28 hours after finishing the day-one training.

Motor Task Originally, the arc-pointing task (Shmuelof et al., 2012) was developed to investigate the speed-accuracy trade-off function during motor skill learning. To examine the influence of knowledge of performance with or without monetary reward on brain activity and motor skill learning, we modified the task. Here, the task required subjects to perform wrist movements to steer a cursor on a computer screen through a semicircular channel (Figure 3.1). To maximize the dynamic range of learning, the non-dominant left rather than the right wrist was chosen, assuming that initial performance would be worse with the left. Ideally, the cursor had to be guided along the middle of an arc-channel with the nominal movement speed dictated by a clock hand pointing at the current nominal position (Figure 3.2A). For each frame (at 60 frames per second), the absolute distance from the actual to the nominal position was calculated and the average over the whole movement was used as performance measure to determine a score (with or without monetary consequences, Figure 3.2B).
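The performance measure described above can be sketched in a few lines of Python. This is a minimal illustration and not the original Presentation/Matlab implementation: the function names are hypothetical, the mid-channel radius of 420 pixels and the clockwise sweep over the upper half-circle are taken from the stimulus description later in the Methods, and the use of the Euclidean distance and of frame indices 1 to N for the nominal position are assumptions.

```python
import math

FPS = 60                 # frames per second (from the text)
MID_RADIUS_PX = 420.0    # middle of the channel: (384 + 456) / 2 pixels

def nominal_position(frame, n_frames):
    """Nominal cursor position at a given frame (1..n_frames).

    Assumes the clock hand sweeps the upper half-circle clockwise,
    from (-420, 0) to (+420, 0), at uniform angular velocity.
    """
    angle = math.pi * (1.0 - frame / n_frames)   # pi -> 0 rad
    return (MID_RADIUS_PX * math.cos(angle), MID_RADIUS_PX * math.sin(angle))

def mean_distance(cursor_xy, movement_time_s):
    """Average distance (pixels) between cursor and nominal position."""
    n_frames = round(movement_time_s * FPS)
    if len(cursor_xy) != n_frames:
        raise ValueError("expected one cursor sample per frame")
    total = 0.0
    for frame, (x, y) in enumerate(cursor_xy, start=1):
        nx, ny = nominal_position(frame, n_frames)
        total += math.hypot(x - nx, y - ny)   # assumed Euclidean distance
    return total / n_frames
```

Under these assumptions, a trajectory that follows the clock hand exactly yields a mean distance of zero, and a trajectory that runs 10 pixels outside the mid-line at every frame yields a mean distance of 10 pixels.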

Prior to each new block of movements, subjects viewed a computer-generated demonstration of the clock hand moving along the channel in the required movement time. At the beginning of each trial, subjects placed the cursor in the red starting box. After a variable delay (800 to 1’600 ms), the box turned green as an “ok-to-go” signal (reaction time was not a measure of performance and subjects were told to start any time after the box turned green). As soon as the cursor had left the box in positive y-direction (= upward), the clock hand started to move with uniform angular velocity, continuously pointing at the nominal cursor position that subjects tried to adhere to. The cursor was visible throughout the movement (online feedback; Figure 3.2A) and the trial automatically ended when the clock hand arrived at the end of the channel. Then the screen froze for a variable period of time (500 to 4’500 ms). During test sessions, the subsequent trial directly followed. For training trials, knowledge of performance or knowledge of performance plus monetary reward was presented for 3’000 ms at this point, followed by another variable delay period (500 to 4’500 ms) before the subsequent trial began. Figure 3.1 shows a schematic summary of the paradigm.

To assess skill level in the absence of knowledge of performance and monetary reward, participants had to perform the arc-pointing task at five different movement speeds defined by the movement time that was allowed to move the cursor through the arc-channel (the clock hand uniformly travelled along the arc in exactly that time). Per test session, seven consecutive trials were performed as blocks with one of five movement times (movement time in ms: 800, 1’000, 1’200, 1’400 and 1’800) and these blocks were randomly ordered with 15 s breaks in between. Ten familiarization trials were allowed prior to the very first test session (i.e., pre-training test) and, as already mentioned, a demonstration of the movement time was shown at the beginning of each movement time block. All in all, participants performed 35 trials per test session.

Figure 3.1: Trial sequence. After placing the cursor in the start box, the box eventually turned green (“ok-to-go” signal) and subjects were free to start the movement whenever ready. The placing of the cursor in the start box, as well as the period from “ok-to-go” to the actual start of the movement were self-paced and hence of variable length (var). A specific movement time (MT) according to the speed requirements of the current block of trials was allowed to steer the cursor through the semicircular channel. As soon as movement time elapsed, the screen froze. During test sessions, the next trial directly followed. In case of a training trial, a group-specific knowledge of performance feedback was presented after feedback trials (FB TRIAL), or subjects were shown a neutral visual control stimulus after no-feedback trials (NO-FB TRIAL). Either way, the next training trial began after another delay period.

The training, on the other hand, was composed of five blocks of 50 trials each with 15 s breaks after 25 movements (within blocks) and two-minute breaks between the blocks. All 250 training trials were performed at one single movement time (i.e., 1’200 ms). After a movement, subjects received a terminal feedback by ~50% chance. Here, the three groups differed in terms of the selection of feedback trials and in terms of the type of feedback they were given. While the first group received knowledge of performance after randomly selected trials (KPrandom), the other groups got either knowledge of performance only (KPgood) or knowledge of performance signifying a monetary reward (KPgood+MR) after relatively good performance, i.e., when they performed better than the moving median over their performance in the last 10 trials. As described above, the tip of the clock hand pointed at the nominal position for each frame during a trial and the cursor’s mean distance ($\bar{d}$) to the corresponding nominal position over all 72 frames per training trial (1’200 ms at 60 frames per second) was used as measure to quantify performance:

$$\bar{d}_n = \frac{\sum_{f=1}^{72} d_{n,f}}{72},$$

where $n$ is the number of the current trial and $f$ stands for frame number. For members of KPgood and KPgood+MR, hence, a feedback was delivered from the eleventh trial on, if

$$\bar{d}_n < \operatorname{median}\left(\bar{d}_{n-10}, \bar{d}_{n-9}, \ldots, \bar{d}_{n-1}\right),$$

where $\operatorname{median}(\cdot)$ is the median value. If selected as feedback trial, the feedback included, as a still image, the presentation of the trajectory travelled by the cursor as a series of circles that were colored according to their positions with respect to the channel (green if inside and red if outside of the channel). Moreover, the nominal trajectory was drawn as a series of equally spaced white circles along the middle of the channel and circles of the trajectory travelled by the cursor were linked to the corresponding nominal position by red lines (line width = 2 pixels ≈ 0.02° visual angle) to visualize $d_{n,f}$ (Figure 3.2B). Additionally, a score-feedback, for KPrandom and KPgood, and a monetary reward, for KPgood+MR, was calculated based on $\bar{d}$. The relation between $\bar{d}$ and the monetary reward was chosen, based on pilot measurements, to allow members of KPgood+MR to earn approximately 50 Swiss Francs (CHF; approx. 50 US-Dollars) over the course of the experiment, since their minimal financial compensation was fixed to be 50 CHF less than that of KPrandom and KPgood, if performance-related monetary rewards are not considered. Therefore, the monetary reward in Rappen (1 Rappen = 0.01 CHF; approx. 0.01 US-Dollars) was set to be equal to $100 - \bar{d}/2$, if $\bar{d} < 200$ pixels, and 0 if $\bar{d} \geq 200$ pixels. Accordingly, a maximum of 1 CHF per trial could be won in the unrealistic case of perfect performance (i.e., $\bar{d} = 0$). Note that no money was deducted after poor performance. Knowledge of performance for KPrandom and KPgood was equally calculated, but its unit was points instead of Rappen, and for all groups the result of the current trial as well as the sum over the whole course of the experiment (money in CHF) was presented after feedback trials (all in letters and digits of ≈ 0.38° visual angle; Figure 3.2B). In case of no-feedback trials, subjects were shown a similar screen in which scores or monetary rewards were replaced by question marks and only the nominal trajectory was presented. This ensured a comparable visual stimulus to the feedback conditions (no-feedback screen; Figure 3.2C).

Figure 3.2: A) During the movement, the position of the cursor was indicated with a white circle (online feedback) and a clock hand continuously pointed at the current nominal position, which was defined to be in the middle of the semicircular channel. B) A knowledge of performance feedback was presented after feedback trials, including the trajectory travelled by the cursor (series of green (inside of channel) or red (outside of channel) colored circles), as well as the nominal trajectory (series of uniformly distributed white circles). A red line linked each point of the cursor’s trajectory to its corresponding nominal position and the average length of these lines was used to determine a score (for KPrandom- and KPgood-groups) or a monetary reward (for KPgood+MR). “In diesem Versuch gewonnen: 45 Punkte” is the German expression explaining that the subject has won 45 points in the preceding trial, which, in this example, sums up to a total score of 137 over the whole experiment (“Gesamtpunktzahl: 137”). The neutral visual control stimulus presented after no-feedback trials is shown in C). Note that the travelled trajectory was omitted and numbers specifying the score or monetary reward were replaced by question marks.

fMRI Measurements During the experiment, subjects lay supine in the MR scanner with their left forearm fixated in a customized armrest that was screwed on to the scanner table. A spherical reflective marker was attached to the proximal interphalangeal joint (knuckle) of their left index finger using surgical double-sided adhesive tape. An MRI-compatible motion capture camera set (Oqus MRI, Qualisys AB, Gothenburg, Sweden) consisting of eight cameras was used to continuously track the marker position at a frequency of 400 Hz. This information was imported online into Matlab R2012b (Mathworks Inc., Natick, MA, USA) using the Qualisys Matlab plug-in. A computer program written in "Presentation 16.3" software (Neurobehavioral Systems, Inc., Albany, USA) sampled the Qualisys-marker position via Matlab interface and transformed it into screen coordinates. To do so, in a calibration step, subjects were asked to move their wrist maximally in all directions, with arm movements prevented by the aforementioned armrest. During this step, extreme x- (left-right) and y-positions (up-down) were logged and the screen was adjusted to display, in x- and y-direction, the middle 60% of each subject’s individual range of motion. This procedure ensured that all participants were able to perform the required movements within a comfortable movement range.
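The calibration step can be illustrated with a short Python sketch. It assumes a linear mapping per axis with the screen origin at the centre of the screen; the source does not specify the exact transformation, so the function name `make_screen_mapper`, its parameters and the marker units are hypothetical.

```python
def make_screen_mapper(pos_min, pos_max, screen_extent_px, fraction=0.6):
    """Return a function mapping a marker coordinate to a screen coordinate.

    pos_min, pos_max : extreme marker positions logged during calibration
    screen_extent_px : full screen size along this axis (e.g. 1920 or 1200)
    fraction         : central share of the range of motion that is displayed
    """
    centre = (pos_min + pos_max) / 2.0
    half_shown = fraction * (pos_max - pos_min) / 2.0
    if half_shown <= 0:
        raise ValueError("calibration range must be positive")
    half_screen = screen_extent_px / 2.0

    def to_screen(pos):
        # Assumed linear map: centre of the range -> 0,
        # edges of the displayed range -> +/- half the screen extent.
        return (pos - centre) / half_shown * half_screen

    return to_screen
```

For a subject whose marker moved between -50 and +50 (arbitrary units) along the x-axis, `make_screen_mapper(-50, 50, 1920)` would map positions of -30 and +30 to the left and right screen edges, so that only the middle 60% of the range of motion is needed to reach any point on the screen.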

The computer program also controlled stimulus presentation. While moving within the calibrated area, the marker position was displayed as a circle (≈ 0.13° visual angle) on a screen (0.64 x 0.4 meters; 1920 x 1200 pixels) visible via mirrors to the subject inside the scanner (distance mirror - screen ≈ 1.90 meter). The arc was centered around the middle of the screen (origin of the coordinate system) with an inner and outer radius of 384 (≈ 3.86° visual angle) and 456 pixels (≈ 4.58° visual angle), respectively. However, only the upper arc was used for task execution and all task-movements were performed in clockwise direction. To indicate the start position, a red square with the side length equaling the width of the channel (72 pixels ≈ 0.72° visual angle) was placed at the beginning of the arc (box-center coordinates: x = -420 pixels, y = 0 pixels). Finally, a clock hand used to point at the nominal position for each frame during a trial, starting at the origin of the coordinate system with a length of 384 pixels (≈ 3.86° visual angle) and a width of 10 pixels (≈ 0.10° visual angle), completed the visual stimulus presented during each trial.
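The pixel-to-visual-angle conversions quoted above can be checked with a small Python sketch. It assumes the standard formula θ = 2·atan(s / 2D) and treats the stated mirror-to-screen distance of 1.90 m as the effective viewing distance D; both are assumptions, since the source does not state how the angles were computed.

```python
import math

SCREEN_WIDTH_M = 0.64       # physical screen width (from the text)
SCREEN_WIDTH_PX = 1920      # horizontal resolution (from the text)
VIEWING_DISTANCE_M = 1.90   # assumed effective viewing distance

def visual_angle_deg(size_px):
    """Visual angle (degrees) subtended by a stimulus of size_px pixels."""
    size_m = size_px * SCREEN_WIDTH_M / SCREEN_WIDTH_PX
    return math.degrees(2.0 * math.atan(size_m / (2.0 * VIEWING_DISTANCE_M)))
```

Under these assumptions, the function reproduces the reported values to two decimals: 384 pixels correspond to about 3.86°, 456 pixels to about 4.58°, and 72 pixels to about 0.72°.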

A Philips Ingenia 3.0T MRI scanner (Philips Healthcare, Best, The Netherlands) equipped with a Philips 32-channel head coil was used. During scanning sessions, head movement was minimized using a cushion and foam material parts. Three-dimensional anatomical images of the entire brain were obtained by using a T1-weighted three-dimensional spoiled gradient echo pulse sequence (180 slices, TR = 20 ms, TE = 2.3 ms, flip angle = 20°, FOV = 220 mm x 220 mm x 135 mm, matrix size = 224 x 187, voxel size = 0.98 mm x 1.18 mm x 0.75 mm). Functional data were obtained in 150 scans per testing session and 317 scans per training block, all consisting of 40 slices (slice thickness 3.5 mm, ascending acquisition order, no inter-slice gap) covering the whole brain in oblique acquisition orientation. We used a sensitivity encoded (SENSE, factor 1.8) single-shot echo planar imaging technique (FEEPI; TR = 2.35 s; TE = 32 ms; FOV = 240 mm x 240 mm x 140 mm; flip angle = 82°; matrix size = 80 x 80; voxel size = 3 mm x 3 mm x 3.5 mm) with three dummy scans acquired at the beginning of each run and discarded in order to establish a steady state in T1 relaxation for all functional scans to be analyzed. Moreover, cardiac and respiratory cycles were continuously recorded (Invivo Essential MRI Patient Monitor, Invivo Corporation, Orlando, FL, USA) to allow correction of fMRI data for physiological noise (see “Analysis of Imaging Data”).

Analysis of Imaging Data Artefact minimization and MRI data analysis were performed using Matlab R2013b and the SPM8 software package (Institute of Neurology, London, UK; http://fil.ion.ucl.ac.uk/spm). All images were realigned to the first volume, normalized into standard stereotactic space (using the EPI-template provided by the Montreal Neurological Institute, MNI brain), resliced to 3 mm x 3 mm x 3 mm voxel size and smoothed using a 6 mm full-width-at-half-maximum Gaussian kernel. Since the interest of this study lay in the activation of rather small brain areas, a 6 mm rather than a larger Gaussian kernel was chosen, providing higher spatial resolution of resulting images and thus smaller partial volume effects in region of interest (ROI) analyses. Correction for physiological noise was performed via RETROICOR (Glover et al., 2000; Hutton et al., 2011) using Fourier expansions of different order for the estimated phases of cardiac pulsation (3rd order), respiration (4th order) and cardio-respiratory interactions (1st order) (Harvey et al., 2008). The corresponding confound regressors were created using the Matlab physIO Toolbox (Kasper et al., 2009, open source code available as part of the TAPAS software collection: http://www.translationalneuromodeling.org/tapas/). For first level data analysis of the arc-pointing task training, after highpass filtering (cut-off 128 s), an individual statistical general linear model was set up for each subject (Friston et al., 1995) by defining six regressors, corresponding to six recurring conditions per training block. Onsets and durations (in seconds) for each condition were extracted from Presentation-logfiles using custom Matlab routines. The first regressor defined, for each trial, the time interval needed to place the cursor into the start box. The second condition started immediately after reaching the box and thus the corresponding regressor included both the planning and the execution of the complete movement (“movement phase”). This was followed by a period of variable length where subjects were looking at a still image of the arc waiting to either be shown the feedback screen after feedback trials or the no-feedback screen after no-feedback trials. Feedback screens were then presented for 3 s and were modeled as separate regressors (“feedback presentation” and “no-feedback presentation”). The sixth regressor was a parametric modulation of the feedback regressor by the number of points (when KPrandom or KPgood was presented) or the magnitude of the monetary reward (when KPgood+MR was presented) presented on the feedback screen in case of a feedback trial. Delays were not modeled and thus were used as baseline.
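To make the RETROICOR expansion orders concrete, the following Python sketch builds the confound regressors for a single scan from a given cardiac and respiratory phase. It is an illustration only and not the physIO Toolbox code: the function name is hypothetical, the phases are assumed to be already estimated (in radians), and the interaction terms are written in the sum and difference form for the first-order case used here.

```python
import math

def retroicor_row(cardiac_phase, resp_phase,
                  cardiac_order=3, resp_order=4, interaction_order=1):
    """Fourier confound regressors for one scan (phases in radians)."""
    row = []
    # Cardiac terms: cos and sin of the first `cardiac_order` harmonics
    for k in range(1, cardiac_order + 1):
        row += [math.cos(k * cardiac_phase), math.sin(k * cardiac_phase)]
    # Respiratory terms: cos and sin of the first `resp_order` harmonics
    for k in range(1, resp_order + 1):
        row += [math.cos(k * resp_phase), math.sin(k * resp_phase)]
    # Interaction terms: cos and sin of the phase sum and phase difference
    # (this sketch only covers matched harmonics; order 1 is used in the text)
    for k in range(1, interaction_order + 1):
        for sign in (+1, -1):
            arg = k * cardiac_phase + sign * k * resp_phase
            row += [math.cos(arg), math.sin(arg)]
    return row
```

With the orders reported above (3, 4 and 1), this yields 6 + 8 + 4 = 18 nuisance regressors per scan, which would enter the first-level general linear model alongside the task regressors.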

Based on our hypothesis of improving motor skill learning by reward-induced striatal upregulation, we focused the fMRI analysis on the striatum. To separate the signal change due to knowledge of performance and monetary reward from irrelevant visual input, the linear contrast “feedback vs. no-feedback presentation” was specified. Thus, the signal increase during reward presentation after feedback trials relative to the signal elicited when looking at a visual control stimulus after no-feedback trials (both with respect to baseline signal during break periods) was calculated and represented as beta weights. These contrast values were then averaged over two ROIs (ventral and dorsal striatum) using an in-house Matlab ROI-analysis routine. The striatum was partitioned into ventral and dorsal parts according to Lutz et al. (2012). To test for significant activation of the ROI, average effect sizes per participant were tested against zero by one-tailed one-sample t-tests. All statistical analyses (imaging and behavioural data) were performed using SAS Enterprise Guide (5.1, SAS Institute, Cary, NC, USA). Moreover, beta values from the contrast “feedback vs. no-feedback presentation” were subjected to a one-way ANOVA with the between-subject factor “group” (KPrandom, KPgood and KPgood+MR), and results were Bonferroni-corrected for performing multiple ANOVAs (two ROIs). Dunnett’s two-tailed t-tests were then used to locate possible influences of reward type (KPgood+MR vs. KPgood) and/or feedback schedule (KPrandom vs. KPgood), where applicable (i.e., in case of a significant main effect “group”).

Analysis of Behavior

Boolean cursor position with respect to the arc-channel and $d_{n,f}$ were calculated online for each frame and logged together with all other relevant experimental information. Data were extracted and $\bar{d}$ (according to the formula presented above) and ratios of data points lying within the arc channel were determined using custom Matlab routines. Data were then corrected for outlier-trials ($\bar{d}$ > average $\bar{d}$ of the corresponding block of trials + 2 standard deviations (SD), or $\bar{d}$ < average $\bar{d}$ of the corresponding block of trials - 2 SD) using SAS Enterprise Guide 5.1. For statistical analysis of absolute movement errors during arc-pointing task training, the absolute error was logarithmically transformed in order to fulfill requirements for statistical tests. Performance changes between sessions were calculated, for each subject and movement time, as percentage changes relative to the corresponding baseline, that is, relative to the individual pre-training $\bar{d}$ for task acquisition and relative to post-training $\bar{d}$ for quantification of task consolidation. Generalized linear mixed models (GLMM) for repeated measures were applied using SAS proc mixed. GLMM1: Analysis of absolute errors during training included the main factors “group” (levels: KPrandom, KPgood and KPgood+MR) and “training block” (levels 1 - 5). GLMM2: Analysis of percentage change in performance comprised the main factors “group” (levels: KPrandom, KPgood and KPgood+MR), “learning phase” (levels: acquisition and consolidation) and “movement time” (levels: 0.8, 1.0, 1.2, 1.4 and 1.6 s). For post-hoc analysis, Dunnett’s t-tests, with KPgood acting as control condition, were used to locate whether differential skill development can be attributed to either the usage of different feedback schedules (KPrandom vs. KPgood) or different types of reward (KPgood+MR vs. KPgood). One-tailed (hypothesis driven) Dunnett’s t-tests were performed where differences in striatal activations between two conditions reached significance. Moreover, one-sample t-tests were used to examine whether the groups’ skill level changed during either of the learning phases, i.e., whether percentage changes were significantly different from zero.
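The behavioral preprocessing steps described above can be sketched as follows. This is a minimal Python illustration rather than the original Matlab/SAS code; the function names are hypothetical, and the use of the sample standard deviation, the natural logarithm and the sign convention of the percentage change (negative values meaning smaller errors) are assumptions not specified in the text.

```python
import math
import statistics

def remove_outliers(block_errors, n_sd=2.0):
    """Drop trials whose mean error lies beyond block mean +/- n_sd * SD."""
    mean = statistics.mean(block_errors)
    sd = statistics.stdev(block_errors)   # assumed: sample SD (n - 1)
    return [e for e in block_errors if abs(e - mean) <= n_sd * sd]

def log_transform(errors):
    """Log-transform absolute errors (natural log assumed)."""
    return [math.log(e) for e in errors]

def percent_change(baseline, follow_up):
    """Percentage change of the error relative to the baseline session."""
    return 100.0 * (follow_up - baseline) / baseline
```

For acquisition, `baseline` would be the pre-training error and `follow_up` the post-training error at the same movement time; for consolidation, the post-training error serves as baseline and the 24-hour test as follow-up.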

Results Data from one subject had to be excluded due to a software crash during the training of the task, which required recalibration and a restart of the experiment, thus hampering comparability to the data of other participants.

fMRI Using the contrast “feedback vs. no-feedback presentation”, one-tailed one-sample t-tests revealed significant activations of the ventral striatum for KPrandom and KPgood+MR (t = 2.40, p = 0.0153 and t = 4.57, p = 0.0002, respectively) and of the dorsal striatum for KPgood+MR exclusively (t = 3.11, p = 0.0077; Figure 3.3). The reward condition (main effect “group”) significantly influenced the relative signal increase in the ventral striatum (F = 5.04, p = 0.0220), but less clearly in the dorsal striatum (F = 2.56, p = 0.179). In the ventral striatum, KPgood+MR showed significantly higher activation than KPgood (t = 2.98, pDunnett = 0.0093).
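The Bonferroni correction for the two ROI-wise ANOVAs can be illustrated with a short worked example in Python. The degrees of freedom are not reported in the text; the sketch assumes F(2, 41), which follows from three groups and N = 44, and uses the closed-form upper-tail probability that exists for an F distribution with two numerator degrees of freedom. The function names are hypothetical.

```python
def f_test_p_value_df1_2(f_value, df2):
    """Upper-tail p-value of an F(2, df2) statistic (closed form for df1 = 2)."""
    return (1.0 + 2.0 * f_value / df2) ** (-df2 / 2.0)

def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p_value * n_tests)

# Assumed degrees of freedom: 3 groups, N = 44 -> F(2, 41); two ROIs tested
p_ventral = bonferroni(f_test_p_value_df1_2(5.04, 41), n_tests=2)
p_dorsal = bonferroni(f_test_p_value_df1_2(2.56, 41), n_tests=2)
```

Under these assumptions, the adjusted values come out at roughly 0.022 for the ventral and 0.179 for the dorsal striatum, in line with the p-values reported above.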


Figure 3.3: Striatal activations (β-values) for the “feedback vs. no-feedback presentation” contrast. Group effects were found to be significant in the ventral Striatum (vStriatum), but not significant in the dorsal Striatum (dStriatum). Means ± standard error of the mean (SEM). Significant pairwise comparison ( p < 0.05). N = 44.

Behavioral Results

Task performance is expressed as $\bar{d}$ (the cursor’s mean distance to the nominal position), which was the measure determining knowledge of performance and monetary rewards. As a result of our experimental manipulation (i.e., selecting well-solved trials for feedback), average performance during feedback trials was better ($\bar{d}$ was smaller) in KPgood (54.16 ± 18.12 pixels, t = 3.77, p = 0.0005) and KPgood+MR (50.07 ± 8.851 pixels, t = 4.47, p < 0.0001) compared with KPrandom (75.11 ± 17.22 pixels; GLMM1: main effect “group”: F = 11.58, p < 0.0001). As a consequence, these subjects were shown higher average scores per feedback trial (41.03 ± 7.158 points and 42.59 ± 13.97 Rappen vs. 32.98 ± 5.421 points) and reached higher total scores over the course of the experiment (5203 ± 908.7 points and 5358 ± 572.5 Rappen vs. 4122 ± 677.6 points), all KPgood and KPgood+MR vs. KPrandom.
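The selection rule behind this manipulation, together with the score mapping defined in the Methods, can be sketched in Python as follows. This is an illustration under the definitions given in the Methods (moving median over the preceding 10 trials; score of 100 minus half the mean distance below 200 pixels), not the original Presentation code; the function names are hypothetical.

```python
import statistics

WINDOW = 10  # number of preceding trials entering the moving median

def is_good_trial(mean_distances):
    """True if the latest trial beats the median of the 10 trials before it."""
    if len(mean_distances) <= WINDOW:
        return False          # no feedback before the eleventh trial
    *history, current = mean_distances
    return current < statistics.median(history[-WINDOW:])

def score(mean_distance_px):
    """Points (KPrandom, KPgood) or Rappen (KPgood+MR) for one feedback trial."""
    if mean_distance_px >= 200:
        return 0.0
    return 100.0 - mean_distance_px / 2.0
```

Because the threshold is the median of the subject's own recent trials, roughly half of all trials qualify for feedback regardless of skill level, which keeps the feedback frequency comparable to the ~50% used for the randomly selected trials.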

Considering all trials, including no-feedback trials, overall performance increased (i.e., $\bar{d}$ decreased) over the course of training (Figure 3.4; GLMM1: main effect “training block”: F = 28.02, p < 0.0001). No difference in overall $\bar{d}$ was found between groups (GLMM1: main effect “group”: F = 0.58, p = 0.5599), but performance development over the course of the training was influenced by the group-specific reward condition (GLMM1: interaction “group*training block”: F = 2.20, p = 0.0247).

Figure 3.4: Development of absolute errors ($\bar{d}$) in pixels for all trials (feedback and no-feedback trials) averaged over each training block (1-5) for all three study groups. Means ± SEM. N = 44.

Performance in our version of the arc-pointing task was assessed before, right after and 24 hours after the training of the arc-pointing task without providing additional terminal feedback in these testing sessions. The evolution of absolute errors, i.e., of $\bar{d}$, across the different test sessions is presented in Figure 3.5 (top). Of greater relevance than absolute error values, however, are performance changes between pre- and post-training tests (due to task acquisition), as well as between post- and 24 hours post-training tests (due to task consolidation processes). Figure 3.5 (bottom) displays percentage changes relative to the corresponding baseline value (i.e., relative to pre-training $\bar{d}$ for acquisition and relative to post-training $\bar{d}$ for consolidation). Online learning and consolidation differentially influenced performance (GLMM2: main effect “learning phase”: F = 81.80, p < 0.0001), with greater changes caused by online learning. This change was influenced by task difficulty (GLMM2: interaction “learning phase*movement time”: F = 11.15, p < 0.0001). Performance improved due to online learning at all movement times, while, on the other hand, performance at 24 hours could be maintained for movement times ≥ 1.2 s but significantly suffered from “forgetting” at shorter movement times (i.e., at higher task difficulty). Furthermore, “learning phase” significantly interacted with the “group” factor (GLMM2: F = 3.69, p = 0.0259). While all groups profited similarly from arc-pointing task training, only KPrandom and KPgood+MR consolidated their performance overnight. KPgood’s performance decreased significantly (t = 3.39, p = 0.0008) and this worsening was significantly different compared with KPgood+MR (t = 2.42, pDunnett = 0.0324), and by tendency different compared with KPrandom (t = 2.09, pDunnett = 0.1399).

Figure 3.5: Absolute performance (absolute error) during test sessions (top, upper x-axis, left y-axis) and relative performance change (in %) compared to the preceding test session (bottom, lower x-axis, right y-axis), i.e., to the pre-training error for task acquisition and to the post-training error for consolidation. All data are presented as means ± SEM. Significant post-hoc comparison (p < 0.05). N = 44.

Discussion

Our results demonstrate that both striatal response and motor skill learning, measured as relative change of error from pre- to post-training (= acquisition) and from post-training to 24 hours thereafter (= consolidation), are influenced by manipulations of the schedule for performance feedback and/or the type of reward. Specifically, adding an extrinsic (monetary) reward increases ventral striatal activation to performance feedback, which is associated with better motor skill consolidation overnight.

Training and Motor Skill Acquisition

All groups practiced at identical intensity. Interventions differed only in which trials were selected for KP and in whether performance had monetary consequences. Higher subjective benefit through additional extrinsic (monetary) reward at stable cost should raise the motivation for a specific exercise. Motivation may rely on dopaminergic activity in the nucleus accumbens, as animal studies have shown that dopamine depletion in the nucleus accumbens or low doses of dopamine antagonists reduce the willingness to work for extrinsic rewards (reviewed by Salamone and Correa (2002)). The ventral striatum activations observed during our experiment (Figure 3.3), which encompass the nucleus accumbens, could thus indicate that groups invested varying amounts of effort into training. However, MR improved neither performance during training nor skill acquisition. This is consistent with the results of Abe et al. (2011), who also found no difference in acquisition between reward, punishment and control groups. Other studies, however, showed that punishment, but not reward, improved the acquisition of a motor adaption paradigm (Galea et al., 2015) and induced a performance effect in a procedural motor task (Wachter et al., 2009). Yet Wachter et al. (2009) also found that the acquisition of an implicit motor learning task profited from reward but not from punishment. This apparent inconsistency should be taken as an indication that conclusions across different (motor) learning modalities, such as procedural, skill or adaption learning, must be drawn with caution (Shmuelof et al., 2012).

Consolidation

Our study design allows us to investigate the influence of different schedules for intrinsic reward on neural activity and motor skill learning by comparing the KPgood and KPrandom conditions. While feedback trials were randomly selected in KPrandom, subjects in KPgood were only informed about trials with good performance. Interestingly and contrary to our hypothesis, striatal activation was observed only in KPrandom but not in KPgood. Behaviourally, this resulted in successful task consolidation for KPrandom and significant overnight forgetting in KPgood, with a between-group difference close to significance. Thus, ventral striatal activation during training supports successful consolidation of a newly learned motor skill.

Poor performance and striatal underactivation in KPgood were unexpected. This result contrasts with findings from Chiviacowsky and Wulf (2007), who studied two experimental groups, one receiving knowledge of result after good (KRgood) and the other after bad performance (KRpoor), in a ballistic task that required subjects to throw beanbags at a target with their eyes covered. In their experiment, the KRgood group significantly outperformed the KRpoor group when subjects repeated the task one day after the training without knowledge of result. The authors therefore proposed that the motivational properties of positive feedback have a direct effect on learning. In contrast, the guidance hypothesis of feedback suggests that feedback is more beneficial if presented after larger rather than smaller errors because it then better guides the learner to the correct response (Salmoni et al., 1984; Schmidt, 1991). Relating this controversy to our finding of a tendency towards better consolidation in KPrandom compared with KPgood, it appears that KPrandom combines the best of both theories: it provides adequate error information that guides subjects’ responses towards better performance, while still keeping subjects motivated by frequently including knowledge of performance after good performance. A positive motivational status might be indicated by the observed activation of the ventral striatum in KPrandom, as motivation may rely on dopaminergic activity in the nucleus accumbens (Salamone and Correa, 2002). However, the question remains why knowledge of performance after average performance (KPrandom) led to striatal activation while knowledge of performance after good performance did not. Attentively steering the cursor along the arc-channel under visual control may have enabled subjects to evaluate their performance online and thus to make predictions about the feedback. This, in turn, may have allowed the KPgood group to predict the reception of knowledge of performance, as for them the selection of feedback trials depended on performance. We know from experiments in primates that dopamine neurons appear to emit an alerting message about the surprising presence or absence of rewards and that responses to rewards and reward-predicting stimuli depend on event predictability (Schultz, 1998). It therefore seems to be the unpredictable selection of feedback trials in KPrandom, rather than the magnitude of the score, that drove the activation in the ventral striatum. This interpretation is supported by the absence of significant activations to a parametric modulation of the “feedback presentation” contrast by the number of points won during a trial.

Interestingly, although KPgood failed to induce any striatal activation and was accompanied by overnight forgetting, knowledge of performance after good performance led to the highest ventral striatum response and also activated the dorsal striatum when knowledge of performance signified a monetary outcome. Both ventral striatum activation and overnight task consolidation were significantly higher/better in KPgood+MR compared with KPgood. A beneficial influence of increased motivation due to higher subjective benefit (induced by extrinsic reward) on the consolidation component of motor skill learning thus emerges from our results. This corroborates previous findings on motor skill learning (Abe et al., 2011) and motor adaption (Galea et al., 2015). The former experiment used an isometric pinch force tracking task to investigate motor skill learning under either monetarily rewarded, punished or neutral control training conditions. While at 24 hours post-training, punishment and control groups performed at a similar level as immediately after the training, the rewarded group experienced significant offline gains, which remained present at 30 days post-training. In contrast, the neutral and punished groups showed substantial performance loss at 30 days. The beneficial effect of reward reported by Abe et al. (2011) could be similarly demonstrated in the present study, although, for practical reasons, we did not test further than 24 hours post-training. Some remaining discrepancies in performance changes at 24 hours post-training may be attributed to differential influences of task complexity or difficulty between the pinch force task and the arc-pointing task, as indicated by our finding of a significant “learning phase*movement time” interaction. That is, changes due to task consolidation depended strongly on task difficulty (i.e., movement time).

However, regarding the comparison between KPgood+MR and KPgood, the observed striatal activations are in line with previous work revealing that feedback-related activity in the ventral striatum is increased if knowledge of performance has monetary consequences and that a monetary incentive is needed to elicit a neural response in the dorsal striatum (Lutz et al., 2012). The absence of a response of the dorsal striatum to performance feedback is, on the other hand, in contrast to findings from other studies (Poldrack et al., 2001; Tricomi et al., 2006; Tricomi and Fiez, 2008; Tricomi et al., 2004). Unfortunately, different approaches for defining striatal subdivisions hamper comparability between these results.

To summarize, training under a feedback condition that elicited higher activation of the ventral striatum positively influenced skill development via better task consolidation. Overall, it seems that training under a feedback condition that induces activation in the ventral striatum supports successful task consolidation. It is known that, in a rewarded task, the hemodynamic ventral striatal response correlates with dopamine release in the ventral striatum, which in turn correlates with reward-related neural activity in the substantia nigra/ventral tegmental area, the origin of the dopaminergic projection (Schott et al., 2008). Reward-related ventral striatal activity may thus be an indication of increased dopaminergic function in the midbrain. In rodents, the existence of direct pathways linking midbrain reward centers to the motor cortex has been demonstrated (Hosp et al., 2011). In the motor cortex, dopamine facilitates long-term potentiation (Molina-Luna et al., 2009), a form of synaptic plasticity thought to be critically involved in skill learning (Rioult-Pedotti et al., 2000; Ziemann et al., 2004). In their experiment, Hosp et al. (2011) demonstrated that destroying dopaminergic neurons in the ventral tegmental area prevented improvements in forelimb reaching, a deficit that was abolished on administration of levodopa into the primary motor cortex. Dopamine-dependent long-term potentiation develops gradually over hours (Huang and Kandel, 1995) and persists for days to weeks (Abraham, 2003). We thus propose that increased dopamine release into the primary motor cortex in feedback conditions with significant activation of the ventral striatum is the key factor facilitating motor skill learning via better task consolidation.

Limitations

The striatum is involved in fine motor control. Therefore, it is not surprising that both ventral and dorsal striatum activation was observed during movement execution in this experiment. These activations, however, did not differ between groups (data not shown) and the movement phase was well separated from feedback/no-feedback presentation through a variable delay period (Figure 3.1). Hence, we do not expect striatal involvement in movement control to have an influence on our imaging results observed during reward processing.

Furthermore, the present study does not yield a double dissociation between the influence of feedback schedule (random selection/good performance) and type of reward (knowledge of performance only/knowledge of performance plus monetary reward), because we have not fully balanced the possible conditions (KPrandom, KPgood, KPrandom+MR, KPgood+MR). Nevertheless, we can corroborate influences of monetary reward on striatal activity and can link these to consolidation of a motor skill. The design also allows us to discuss effects of performance feedback schedules on striatal activity and motor skill learning, but it does not allow us to investigate interactions between these two factors.

Moreover, generalization of these findings to other types of motor or non-motor learning is limited. In motor skill learning, motor learning is investigated in the absence of a perturbation and the main goal is to reduce a variable error (Deutsch and Newell, 2004; Guo and Raymond, 2010; Hung et al., 2008; Liu et al., 2006; Muller and Sternad, 2004; Ranganathan and Newell, 2010). Task difficulty limits performance, usually in the form of a trade-off between speed and accuracy. Learning consists of breaking through this limit (i.e., improving the speed-accuracy trade-off) (Reis et al., 2009). In the original work introducing the arc-pointing task, the authors clearly defined speed requirements (i.e., the movement time), checked that they were fulfilled, and then investigated an isolated measure of accuracy (Shmuelof et al., 2012). In contrast, our main outcome measure, the absolute error, is influenced by both speed and accuracy. A reduction in this error can thus occur through improved accuracy, more accurate timing, or a combination of both. Although we refrained from defining a target zone and thus from strictly checking for observance of the movement time, we excluded outlier trials in which, for example, the trial was accidentally started. In conclusion, although we can demonstrate a shift in the speed-accuracy trade-off function for the entire subject population, comparing groups by means of a separable measure of either speed or accuracy is not valid in our case, as it was the combined error measure that determined group-specific feedback conditions. This might be viewed as a shortcoming, hampering a clear definition of the behavior observed during our study as motor skill learning, but on the other hand it allowed effective investigation of learning of goal-oriented movements with clearly set goals and well-defined feedback on goal achievement.

Conclusion

Our results demonstrate that motor skill learning is influenced by different reward conditions applied during the training of a motor task. In particular, linking performance feedback to a monetary outcome efficiently raises ventral striatum activation, which is accompanied by better overnight task consolidation in the corresponding study group. Notably, all groups showing a significant response of the ventral striatum to feedback during training could retain their performance from the first day at the 24 hours post-training test, whereas a lack of ventral striatal response in the other group was accompanied by significant overnight forgetting. This leads us to conclude that increasing ventral striatal activity during acquisition of a motor skill by using appropriate reward improves consolidation of the acquired skill.

Acknowledgments

The authors are indebted to the volunteers for their dedicated participation in this study. Special thanks go to Benjamin Hertler for his support in the implementation of the study and Peter Rasmussen for his help with the statistical analysis of the data. This study was supported by the Clinical Research Priority Program Neuro-Rehab (CRPP) of the University of Zurich. We would like to dedicate this work to Nadja Ziegler who sadly passed away over the course of this project.

Conflict of Interest: The authors have no conflicts of interest to declare. ClinicalTrials.gov Identifier: NCT02189564.

Processing of Motor Performance Related Reward After Stroke

Published in:

Converging Clinical and Engineering Research on Neurorehabilitation II: Proceedings of the 3rd International Conference on NeuroRehabilitation (ICNR2016), October 18-21, 2016, Segovia, Spain 2017. Biosystems & Biorobotics, 15: 1019-1023. http://dx.doi.org/10.1007/978-3-319-46669-9_165

Authors:

Widmer M., Luft A.R. and Lutz K.

Publisher:

Springer International Publishing

Keywords:

Stroke, fMRI, reward, feedback, striatum

Abstract

Performance dependent reward activates the striatum, a key region of the reward system. However, stroke patients have been found to show reduced brain activations to rewarding feedback in cognitive tasks when compared to healthy age-matched controls. This was reflected in impaired reinforcement learning. Whether their response to reward derived from preceding motor performance is also reduced is, however, still unknown. Using functional magnetic resonance imaging, striatal activity linked to performance dependent monetary reward was measured during the training of a repetitive arc-tracking task. Pilot results of nine stroke patients and nine age-matched healthy individuals point towards a tendency for reduced responsiveness of ventral parts of the striatum in stroke patients, while the dorsal striatum, although to a smaller extent, shows an opposite trend. This is of particular interest as ventral striatal activation was found to be the key factor for successful overnight consolidation in an earlier study using a similar task.

Introduction

A monetary reward (MR) that depends on preceding motor task performance activates the striatum (Lutz et al., 2012; Widmer et al., 2016). In addition, such reward has been shown to improve consolidation/retention in motor skill learning studies (Abe et al., 2011; Widmer et al., 2016). This is possibly mediated by a reward-related increase in the activity of dopaminergic projections originating in reward-coding brain regions and targeting the motor cortex (M1). As shown in rodents, these projections enable motor learning and long-term potentiation in cortico-cortical projections (Hosp et al., 2011; Molina-Luna et al., 2009). Plasticity in M1 also occurs during recovery/rehabilitation after stroke and likely contributes to its success (Luft et al., 2004). In stroke survivors, activity of this dopaminergic pathway may be reduced not only because rewards are small, but also because patients after stroke have deficits in reward processing (Lam et al., 2016). In Lam et al. (2016), using a probabilistic classification task, this was reflected in impaired reinforcement learning as compared to age-matched healthy individuals. Whether the processing of reward derived from the performance in a motor task is also impaired after stroke is, however, unclear. In a pilot study using functional magnetic resonance imaging (fMRI), we investigated the neural response to performance dependent MR feedback during the performance of a repetitive arc-tracking task in stroke survivors and healthy age-matched peers.

Material and Methods

Participants

Nine stroke survivors and nine healthy elderly subjects (control) were recruited. Stroke patients were included if they had suffered an ischemic stroke and were measured during inpatient rehabilitation (subacute phase; 48 (25) days post-stroke, mean (SD)). They had to be able to give written informed consent and to understand the task. Exclusion criteria were severe aphasia, dementia or depression, uncorrectable visual disorders or any contraindication to MRI. For subject characteristics see Table 4.1. The study was approved by the regional Ethics Committee (EKNZ-LU 13054). All participants provided written informed consent.

fMRI Task

To assess motor reward related brain activity, participants performed two blocks of fifty trials of a modified arc-pointing task (Widmer et al., 2016) in an MRI scanner (Philips Ingenia 3.0T).

SUBJECT CHARACTERISTICS

Measure    Stroke Patients (n=9)    Controls (n=9)
Age        59.8 (11.3)              66.1 (6.1)
MoCA       24.2 (3.2)               28.2 (2.0)
BDI        7.4 (4.4)                2.7 (2.3)

Table 4.1: Characteristics of stroke patients and controls, including age as well as results from the Montreal Cognitive Assessment (MoCA) and the Beck Depression Inventory (BDI). Data are presented as mean (SD).

A spherical reflective marker was attached to the knuckle of the index finger of the unaffected (stroke patients) or dominant (controls) hand and was continuously tracked with an MRI-compatible motion capture system (Qualisys AB). This enabled participants to control a cursor on a computer screen by moving their wrist while the arm rested on a cushion.

The task required subjects to steer the cursor through a semicircular channel from a start box to a target box in clockwise direction at their preferred movement speed. Thus, no time constraints were imposed. The fraction of samples (at 60 Hz) lying within the arc-channel (PCTin) was calculated for each trial and was used as the main performance measure determining MR.
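The computation of PCTin can be sketched as follows. This is a minimal illustration, not the implementation used in the study: it assumes that the channel is a half-annulus around a known center, described by the radius of its midline and its width, and that the start and target boxes can be ignored. The function name, the coordinate convention and the sample trajectory are hypothetical.

```python
import math


def pct_in(samples, center, radius, width):
    """Fraction of cursor samples lying inside a semicircular channel.

    samples: iterable of (x, y) cursor positions, one per 60 Hz sample.
    center:  (x, y) of the arc's center (assumed geometry).
    radius:  radius of the channel's midline.
    width:   channel width, i.e. the difficulty parameter.
    """
    samples = list(samples)
    if not samples:
        raise ValueError("trial contains no samples")
    cx, cy = center
    inside = 0
    for x, y in samples:
        distance = math.hypot(x - cx, y - cy)
        on_upper_half = y >= cy  # assumption: arc spans the upper half-plane
        if on_upper_half and abs(distance - radius) <= width / 2:
            inside += 1
    return inside / len(samples)


# Synthetic trajectory: 3 samples on the midline, 1 sample far outside.
trajectory = [(-100.0, 0.0), (0.0, 100.0), (100.0, 0.0), (0.0, 150.0)]
print(pct_in(trajectory, center=(0.0, 0.0), radius=100.0, width=20.0))  # 0.75
```

Because no time constraints were imposed, a measure of this kind depends only on where the cursor was sampled, not on how long the movement took.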

Figure 4.1: (a) Monetary reward feedback and (b) visual control stimulus. In the latter, numbers specifying the monetary reward were replaced by question marks and the cursor’s trajectory was omitted.

The assessment started with a familiarization period consisting of twenty trials. Here, PCTin was used to adapt task difficulty (i.e., the width of the channel) to ensure that all participants were able to perform the rewarded task at a similar performance level. In the main part, MR was given after well-solved trials, while a neutral visual control stimulus was presented after the other half of the trials (Figure 4.1). MR depended on PCTin, so that a maximum of 1 Swiss Franc (CHF) could be won per trial.

For the analysis of fMRI data, the presentations of the two stimuli were modeled as separate regressors, and the contrast “MR vs. control stimulus” was defined and used for a region of interest (ROI) analysis.

Results

Stroke patients and controls earned similar amounts of money over the course of the experiment (32.23 (6.96) CHF and 30.87 (6.12) CHF, respectively). In one-tailed one-sample t-tests, both groups showed significant activations of the ventral striatum (t = 2.74, p = 0.013 and t = 4.48, p = 0.001, respectively) and the nucleus accumbens (t = 3.30, p = 0.005 and t = 5.68, p < 0.001) to the “MR vs. control stimulus” contrast. The dorsal striatum, on the other hand, was only activated in stroke patients (t = 3.42, p = 0.005), but not in healthy controls (t = 1.68, p = 0.07).
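The logic of a one-tailed one-sample t-test on ROI values can be sketched as follows. With nine subjects per group the test has 8 degrees of freedom, for which the one-tailed critical value at α = 0.05 is approximately 1.860; this is consistent with the reported t = 1.68 not reaching significance. The function name and the β-values below are hypothetical and serve only to illustrate the calculation; they are not study data, and the study's actual analysis software is not specified here.

```python
import math
import statistics


def one_sample_t(values, popmean=0.0):
    """Return (t statistic, degrees of freedom) for a one-sample t-test."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 in the denominator)
    if sd == 0:
        raise ValueError("values must not all be identical")
    t = (mean - popmean) / (sd / math.sqrt(n))
    return t, n - 1


# Critical t for a one-tailed test at alpha = 0.05 with 8 degrees of freedom.
T_CRIT_DF8 = 1.860

# Hypothetical beta values for nine subjects (not study data).
betas = [0.2, 0.5, -0.1, 0.4, 0.3, 0.6, 0.1, 0.0, 0.7]
t, df = one_sample_t(betas)
print(f"t({df}) = {t:.2f}, significant: {t > T_CRIT_DF8}")
```

Comparing against a tabulated critical value keeps the sketch within the standard library; an exact p-value would require the cumulative t distribution from a statistics package.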

Figure 4.2: BOLD effect to the “monetary reward vs. control stimulus” contrast, expressed as β-values, in ventral (vStriatum) and dorsal (dStriatum) striatal regions of interest (ROIs), as well as in the nucleus accumbens (NAcc). N = 18. Mean and standard error (SE).

With the current sample size, no between-group differences were found in any of the three ROIs. Tendencies emerged, however, mainly in the nucleus accumbens (t = 1.09, p = 0.143), and also in the ventral (t = 0.986, p = 0.169) and the dorsal striatum (t = 0.925, p = 0.184). Interestingly, trends in ventral and dorsal parts of the striatum go in opposite directions (Figure 4.2).

Discussion

In this pilot study, we investigated the integrity of the reward system after stroke using a newly developed tool to measure the neural response to motor performance derived MR. There was a tendency towards lower responsiveness of ventral parts of the striatum, while the dorsal striatum, although to a smaller extent, showed an opposite trend when compared with age-matched healthy subjects. The tendency observed in the ventral striatum is in line with Lam et al. (2016), who found reduced brain activation to smiley feedback in stroke subjects, although using a probabilistic classification task instead of a motor task. Interestingly, ventral striatum activation was found to be the key factor for successful overnight task consolidation in an earlier study with healthy young subjects (Widmer et al., 2016). There, we suggested that the increased ventral striatum activity of the corresponding study group can be taken as a surrogate for increased midbrain dopaminergic activity (Schott et al., 2008), facilitating motor learning dependent plasticity in M1 via dopaminergic projections from the midbrain to M1 (Hosp et al., 2011; Molina-Luna et al., 2009).

This study is limited by the small sample size. It shows, however, that with enough subjects it is possible to generate normative data against which individual activation levels can be judged.

Conclusion

Our data imply that there is a tendency towards altered processing of motor performance derived reward after stroke. Influencing factors (e.g., post-stroke depression, age, lesion location) as well as the impact on the rehabilitation progress will be addressed in a larger follow-up study.

Acknowledgments

This study was supported by the Clinical Research Priority Program (CRPP) Neuro-Rehab of the University of Zurich.

Does motivation matter in upper limb rehabilitation after stroke? ArmeoSenso-Reward: Study protocol for a randomized controlled trial.

Published in:

Submitted for publication in BMC Neurology

Authors:

Widmer M., Held J.P., Wittmann F., Lambercy O., Lutz K. and Luft A.R.

Keywords:

Rehabilitation, virtual reality, stroke, upper extremity, arm, feedback, reward

Abstract

Background: 50% of all stroke survivors are left with functional impairments of their upper limb. While there is a need to improve the effectiveness of rehabilitative training, so far, no new training approach has proven to be clearly superior to conventional therapy. As training with rewarding feedback has been shown to improve motor learning in humans, it is hypothesized that rehabilitative arm training could be enhanced by rewarding feedback. In this paper, we propose a trial protocol investigating rewards in the form of performance feedback and monetary gains as ways to improve effectiveness of rehabilitative training.

Methods: This multicentric, assessor-blinded, randomized controlled trial uses the ArmeoSenso virtual reality rehabilitation system to train 74 first-ever stroke patients (< 100 days post-stroke) to lift their impaired upper limb against gravity and to improve the workspace of the paretic arm. Three sensors are attached to forearm, upper arm, and trunk to track arm movements in three-dimensional space while controlling for trunk compensation. Whole arm movements serve as input for a therapy game. The reward group (n=37) will train with performance feedback and contingent monetary reward. The control group (n=37) uses the same system but without monetary reward and with reduced performance feedback. Primary outcome is the change in the hand workspace in the transversal plane. Standard clinical assessments are used as secondary outcome measures.

Discussion: This randomized controlled trial will be the first to directly evaluate the effect of rewarding feedback including monetary rewards on the recovery process of the upper limb following stroke. This could pave the way for novel types of interventions with significantly improved treatment benefits, e.g. for conditions that impair reward processing (stroke, Parkinson’s disease).

Trial registration: https://clinicaltrials.gov Identifier: NCT02257125

Background

After stroke, 50% of survivors are left with impairments in arm function (Kwakkel et al., 2003; Parker et al., 1986), which is associated with reduced health-related quality of life (Nichols-Larsen et al., 2005). While there is evidence for a positive correlation between therapy dose and functional recovery (Cooke et al., 2010; Kwakkel, 2006; Veerbeek et al., 2014), a higher therapy dose is challenging to implement, as it usually leads to an increase in costs commonly not covered by health insurances. However, when dose is matched, most randomized controlled trials introducing new types of rehabilitative interventions (e.g., robot assisted therapy (Kwakkel et al., 2008)) failed to show a superior effect compared to standard therapy. Thus, the need for improving therapy effectiveness remains. In search for elements of effective therapy, we hypothesize that performance feedback and monetary rewards can improve effectiveness.

It has been shown that reward enhances procedural (Wachter et al., 2009) and motor skill learning (Abe et al., 2011; Widmer et al., 2016) and has a positive effect on motor adaption (Galea et al., 2015). Rewards mainly improve retention of motor skills and motor adaptions (Abe et al., 2011; Galea et al., 2015; Widmer et al., 2016). This effect was not explained by training duration (dose), as rewarded and non-rewarded groups underwent similar training schedules (Abe et al., 2011; Galea et al., 2015; Wachter et al., 2009; Widmer et al., 2016). In a functional magnetic resonance imaging study, Widmer et al. reported that adding monetary rewards after good performance led to better consolidation and higher ventral striatum activation than knowledge of performance alone (Widmer et al., 2016). The striatum is a key locus of reward processing (Knutson et al., 2009), and its activity was shown to be increased by both intrinsic and extrinsic reward (Lutz et al., 2012). As the ventral striatum receives substantial dopaminergic input from the midbrain, its activity can be seen as a surrogate marker for dopaminergic activity in the substantia nigra/ventral tegmental area (Schott et al., 2008). In rodents, Hosp et al. found that dopaminergic projections from the midbrain also terminate directly in the primary motor cortex (M1) (Hosp et al., 2011). Dopamine in M1 is necessary for long-term potentiation of certain cortico-cortical connections and successful motor skill learning (Molina-Luna et al., 2009). As mechanisms of motor learning are also thought to play a role in motor recovery (Krakauer, 2006), rehabilitative interventions may benefit from neuroplasticity enhanced by reward.

Here, we describe a trial protocol to test the effect of enhanced feedback and reward on arm rehabilitation after stroke at matched training dose (time and intensity). We use ArmeoSenso, a standardized virtual reality based training system (Wittmann et al., 2015) that is delivered in two versions for two different study groups, one version with and one without reward and enhanced performance feedback.

Methods

Study Design

This multicentric trial is randomized, controlled and assessor-blinded (Figure 5.1). Patients are unaware of the training characteristics of the other study group.

Study Population

This study includes stroke patients (max. 100 days after stroke) who meet the following criteria: a minimum age of 18 years, hemiparesis of the arm due to cerebrovascular ischemia, the ability to lift the paretic arm against gravity, a minimal arm workspace of 20 cm x 20 cm in the horizontal plane and absence of severe aphasia, depression, dementia and hemianopia. The study is approved by the local ethics committee and all subjects have to give written informed consent in accordance with the Declaration of Helsinki.

Randomization

The randomization procedure was planned and set up by an independent contract research organization (Appletree CI Group, Winterthur, Switzerland). A non-consecutively increasing, pseudo-randomly generated list of subject identification numbers (IDs) was created. IDs are chronologically assigned to each new study participant, stratified by study center. Allocation to one of the two study groups is balanced in blocks of four. The randomization list containing the subject ID, the corresponding group allocation and a randomly generated password was sent to an independent (unblinded) study staff member (“admin”), who set up the respective patient user accounts used for accessing the therapy game. The group-specific version of the game, i.e., either with or without reward, is defined by the account. The admin keeps the assignment list and is not involved in data collection. Immediately before the very first training session, each study participant has to confirm by signature having received a sealed envelope containing a butterfly label with the ID and password to access the account. The patient keeps this label for the entire study duration.
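The allocation scheme described above (two groups, balanced in blocks of four, stratified by study center) can be illustrated with a minimal sketch. This is not the procedure used by the contract research organization; the function names, the group labels, the center names and the seeding are assumptions made for illustration only.

```python
import random


def block_randomization(n_subjects, block_size=4, arms=("reward", "control"), seed=None):
    """Return an allocation list built from permuted blocks (balanced within each block)."""
    rng = random.Random(seed)
    per_arm = block_size // len(arms)  # two subjects per arm in a block of four
    allocation = []
    while len(allocation) < n_subjects:
        block = list(arms) * per_arm
        rng.shuffle(block)  # random order inside the block
        allocation.extend(block)
    return allocation[:n_subjects]


def stratified_lists(centers, n_per_center, seed=0):
    """Stratification: one independent permuted-block list per study center."""
    return {
        center: block_randomization(n_per_center, seed=seed + i)
        for i, center in enumerate(centers)
    }


if __name__ == "__main__":
    # Hypothetical center names; each new participant takes the next entry of the center's list.
    for center, allocation in stratified_lists(["center_A", "center_B"], 8).items():
        print(center, allocation)
```

Because every block of four contains two allocations per group, the group sizes within a center can never differ by more than two at any point during recruitment.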

Figure 5.1: Flow diagram illustrating the trial design and sequence.

ArmeoSenso Training System

The arm rehabilitation system combines motion capture via wearable inertial measurement units (IMUs) with a therapy game running on a touch-screen computer (Inspiron 2330, Dell Inc., USA) (Figure 5.2A). Three wireless IMUs (MotionPod 3, Movea SA, France) are fixed to the functionally impaired lower and upper arm as well as the trunk (Wittmann et al., 2015; Wittmann et al., 2016).


Figure 5.2: A) Healthy subject using the ArmeoSenso training system. B) Arm workspace assessment: grey cubic voxels of 10 cm side length arranged in the transverse plane relative to the patient’s trunk.

In contrast to robot-based virtual reality therapy systems, this sensor-based approach does not offer any weight support for the impaired arm. The ArmeoSenso system specifically requires the patient to lift the arm against gravity and to increase hand workspace in three-dimensional (3D) space. The system was validated in a home feasibility trial with stroke patients (Wittmann et al., 2016). For the present study, the ArmeoSenso system includes two automated functional assessments. The first consists of a pointing task with nine targets arranged in two semicircles in the transverse plane. The second measures the hand workspace of the trained limb (see “Primary Outcome”). While identical assessments are performed in both training groups, the system includes a specific version of a therapy game for each of the two training groups: (A) a rewarding version including monetary rewards, knowledge of performance feedback and graphical special effects (Figure 5.3A), and (B) a non-rewarding version lacking these motivators (Figure 5.3B). A more detailed description follows.

Intervention

Both groups train one hour per day, five days a week for three weeks while being inpatients in a participating rehabilitation hospital. Training is supervised by a therapist. Since one hour of consecutive upper limb training per day without weight support can be too demanding for some patients, deviations from this protocol are allowed down to a minimum cumulative training time of 720 minutes.

A typical ArmeoSenso training session is described in Wittmann et al. (2016). For the present study, patients log in to their user account with their random ID and the password printed on their butterfly label. The IMUs are attached to the affected lower and upper arm and to the trunk using custom-made Velcro straps (Balgrist Tec AG, Zurich, Switzerland). The supervising therapist may help if necessary. The ArmeoSenso system then guides the patient through three calibration poses and two automated assessments (see "Outcome Measures") before training starts (beginning of the targeted 60 min session duration). In order to prevent physical exhaustion, the patient is visually instructed to rest for at least 4 seconds every 40 seconds. Moreover, patients are allowed to interrupt the training session if an additional break is needed. The duration of the additional breaks is added at the end of the training session. After 60 minutes of net training time, the automated assessments are repeated and the patient is asked to fill in a short motivation questionnaire (see “Secondary Outcome”).

Both groups train with modified versions of the ArmeoSenso “METEORS” game (see Wittmann et al., 2015; Wittmann et al., 2016). Although the two versions differ markedly in appearance, they share the underlying game mechanics. That is, in both versions a virtual "hand" that matches the movement of the subject's real hand is used to catch objects that drop from the top of the screen. The targets are placed within or at the border of the patient's virtual 3D workspace, which is continuously estimated and updated using a voxel-based model (Wittmann et al., 2015). The time to complete a round in the METEORS game is T_max = 150 s (excluding rest). If fewer than five targets are missed during these 150 s, the round is won and the difficulty increases by up to three levels, depending on the number of targets that hit the ground. Difficulty is adapted dynamically by changing (i) the average falling speed of the targets, (ii) the target spawn interval and (iii) the number of simultaneously spawned targets (one to a maximum of seven). It increases in this order (i, ii, iii, i, ...). Conversely, difficulty decreases in reverse order if more than four targets are missed and the round is lost after a certain time (T_loss). In that case, the difficulty decrease is calculated by rounding T_max / T_loss to the closest integer, but with a maximum of four difficulty levels.
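The adaptation rule can be sketched in a few lines. This is an illustrative reading of the description, not the ArmeoSenso implementation. Several points are assumptions: the decrease is taken to be the ratio T_max / T_loss rounded to the closest integer (ties rounded up), the mapping from misses to levels gained in `levels_gained` is hypothetical because the text only states "up to three levels", the level floor of zero is invented, and the cap of seven simultaneous targets is not modelled.

```python
import math

T_MAX = 150.0     # round duration in seconds (from the protocol text)
MAX_MISSES = 4    # a round is lost once more than four targets are missed
MAX_DECREASE = 4  # cap on the difficulty decrease (from the protocol text)
PARAMETERS = ("fall_speed", "spawn_interval", "simultaneous_targets")  # order i, ii, iii


def parameter_steps(level):
    """Distribute `level` difficulty steps over the parameters in the order i, ii, iii, i, ..."""
    return {name: (level + 2 - k) // 3 for k, name in enumerate(PARAMETERS)}


def levels_gained(missed):
    """HYPOTHETICAL mapping from misses to levels gained (1 to 3); not specified in the text."""
    return max(1, 3 - missed)


def levels_lost(t_loss):
    """Decrease after a lost round: T_max / T_loss rounded to the closest integer, capped at four."""
    return min(MAX_DECREASE, math.floor(T_MAX / t_loss + 0.5))  # ties round up (assumption)


def next_level(level, missed, t_loss=None):
    """Return the difficulty level for the next round."""
    if missed > MAX_MISSES:  # round lost after t_loss seconds
        return max(0, level - levels_lost(t_loss))  # floor at level 0 (assumption)
    return level + levels_gained(missed)  # round won
```

Deriving the three parameter settings from a single level counter means that lowering the level automatically undoes the most recent steps first, which corresponds to the reverse order described above. Under this reading, a round lost quickly (small T_loss) lowers the difficulty by more levels than a round lost shortly before the 150 s are over.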

Rewarded Training

The reward group will train for 15 hours with a version of the METEORS game that is very similar to the one used in previous studies (Wittmann et al., 2015; Wittmann et al., 2016). Briefly, the hand is used to catch the targets, which are depicted as meteors. The movement of the patient's whole arm is displayed with low latency on the computer screen as a moving virtual arm, a feature implemented to increase the feeling of embodiment and thus improve the motivation to move the arm (Price et al., 2012). Subjects are instructed to use the hand to catch the falling meteors in order to protect their planet from being destroyed (Figure 5.3A). This game theme is easily understood and emotionally involving (Wittmann et al., 2016).

Figure 5.3: A) ArmeoSenso-Reward: METEORS therapy game. The hand of the virtual arm is used to catch the falling meteors before they crash onto the planet. If caught, the meteor explodes and a score appears. If missed, the planet gets damaged (note the impact crater). The current score (=PUNKTE) is displayed on the upper left (white font color) and compared to the patient's all-time record (=REKORD; red font color, upper left). The green bar on the upper right indicates resting time. If completely black, the patient must rest for 4 s before new meteors are spawned. During rest, the bar fills with green. The yellow bar on the left indicates how much playtime is left in the ongoing round (max. 150 s). B) Control game. The virtual hand is a green decagon that can be used to touch the pill-shaped, single-colored targets dropping in from the top of the screen, which then disappear with a delay of 1 s without producing a score. The green bar on the upper right fills up whenever the patient assumes the resting position.

Whenever a meteor is touched by the virtual hand, it explodes, giving the patient immediate knowledge of result. Furthermore, a score appears with each exploding meteor that depends on the falling speed and diminishes with the time the meteor was visible on the screen before being caught. Scores are summed up over a round and reset when the next round starts. However, there is also an all-time high score always visible on the upper left (Figure 5.3A). If a meteor is missed, it crashes on the planet and damages it. Should the patient miss more than four meteors within T_loss < 150 s, the round is lost, which results in visual effects showing the planet being destroyed and the camera shaking, followed by a message encouraging the patient to try again.

After successful level completion, patients are shown a feedback screen illustrating that they have successfully saved the planet, how many meteors they managed to catch and how many they missed (Figure 5.4A). Monetary rewards are given for each completed level. Patients can win up to 1 Swiss Franc (CHF) if they succeed, but 0.1 CHF is deducted for every missed meteor. As a new level can be started approximately every 3 minutes, a maximum of 20 CHF (approx. 20 US dollars) could be won per training session in case of an uninterrupted winning streak. This, however, is unlikely due to the difficulty adaptation described above. Three amounts are presented on the feedback screen (Figure 5.4A): the money won during the preceding round, during the ongoing training session and over the whole course of the study. This screen is followed by a high score list showing the top 10 results (Figure 5.4B). If the current result is in the top 10, it is marked in the list (Figure 5.4B). This feature was also implemented to optimize patient engagement.
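The reward arithmetic can be made explicit with a short sketch. The function and constant names are illustrative, and the assumption that a lost round pays nothing is an interpretation of "rewards are given for each completed level", not a statement from the protocol.

```python
ROUND_MINUTES = 3     # a new level can be started approximately every 3 minutes
SESSION_MINUTES = 60  # net training time per session


def round_reward_chf(missed, won):
    """Reward for one round: 1 CHF if the round is won, minus 0.1 CHF per missed meteor."""
    if not won:
        return 0.0  # assumption: no payout for a lost round
    rappen = max(0, 100 - 10 * missed)  # integer Rappen avoid floating-point drift
    return rappen / 100


def max_session_reward_chf():
    """Upper bound for one session: every round won without a single miss."""
    return (SESSION_MINUTES // ROUND_MINUTES) * round_reward_chf(0, True)
```

Since a round can only be won with at most four misses, the reward for a won round lies between 0.6 and 1 CHF under these assumptions, and twenty flawless rounds reproduce the session maximum of 20 CHF stated above.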

New planets (8 in total) and/or backgrounds (12 in total) are unlocked during the course of the three-week training. These rewards do not have any influence on the gameplay and difficulty but are intended to add variety to the game. Once three planets have been unlocked, the patient can choose between two randomly selected planets at the start of every round.

Figure 5.4: ArmeoSenso-Reward feedback screens. A) "PLANETEN GERETTET": planet saved. This screen is presented after each completed round. The number of meteors caught ("GEFANGEN", top) and meteors hitting the planet ("EINGESCHLAGEN", bottom) is indicated on the left. The monetary reward (“GEWINN”) for the current round ("DIESE RUNDE", top), the current day ("HEUTE", middle) and the total amount of money gathered over the course of the study ("TOTAL", bottom) are displayed on the right. Note that a maximum of 1 Swiss Franc (CHF) can be won per round. B) Hall of fame ("RUHMESHALLE") with the patient's top ten scores. If the current score is in the top ten, it is highlighted in red.

Control Training

The control training consists of the same sensor system and game mechanics with all rewarding feedback removed. In order to reduce the feeling of embodiment (Price et al., 2012), only the position of the hand is shown as a green decagon on a plain black background. Targets are simple pill-shaped, single-colored objects that disappear with a delay of 1 s without producing a score or sound after being touched; hence, knowledge of performance is delayed rather than immediate. Complete removal of knowledge of performance is not possible in this game because patients might then reach for the same target several times, which would hamper comparability with the other study group. The feedback screen, the monetary reward, the high score list and the unlocking of new planets and backgrounds are also removed. Instead, patients look at a blank screen to keep the training time comparable. Most notably, the target placement and difficulty adaptation remain unaffected.

Outcome Measures

The clinical assessments are collected by assessors blinded to treatment allocation. All assessors are trained in performing the assessments before the start of the trial.

Primary Outcome

The primary outcome of this trial is the workspace of the impaired arm in the horizontal plane, measured by using an assessment integrated into the ArmeoSenso platform. Subjects are instructed to actively reach out as far as possible with their impaired arm forward, backward and sideways to explore the entire arm workspace (Wittmann et al., 2015; Wittmann et al., 2016). The workspace is corrected for trunk movements and computed as the number of square pixels of 10 cm side length arranged in the transverse plane relative to the patient’s trunk (Figure 5.2B). This workspace assessment is conducted immediately before and after every therapy session.
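As a rough illustration of how such a workspace count could be obtained, the sketch below bins trunk-relative hand positions into a 10 cm grid and counts the distinct cells visited. It is a simplified assumption about the computation, not the ArmeoSenso algorithm: the function name and input format are hypothetical, and trunk correction is reduced to subtracting the trunk position (trunk rotation is ignored).

```python
import math


def workspace_cells(hand_xy_cm, trunk_xy_cm, cell_cm=10.0):
    """Count distinct 10 cm x 10 cm cells reached by the hand, relative to the trunk.

    hand_xy_cm and trunk_xy_cm are equally long sequences of (x, y) positions in cm
    in the horizontal plane, sampled at the same time points.
    """
    visited = set()
    for (hx, hy), (tx, ty) in zip(hand_xy_cm, trunk_xy_cm):
        rel_x, rel_y = hx - tx, hy - ty  # simple correction for trunk translation
        visited.add((math.floor(rel_x / cell_cm), math.floor(rel_y / cell_cm)))
    return len(visited)


if __name__ == "__main__":
    # Made-up samples: the hand sweeps across three cells while the trunk stays still.
    hand = [(5, 5), (15, 5), (15, 5), (25, 15)]
    trunk = [(0, 0)] * 4
    print(workspace_cells(hand, trunk))
```

Subtracting the trunk position means that leaning forward with a rigid arm does not enlarge the measured workspace, which is the purpose of the trunk correction mentioned above.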

Secondary Outcome

Arm impairment is assessed using the Fugl-Meyer Assessment - Upper Extremity (FMA-UE), arm function using the Wolf Motor Function Test (WMFT), the Box and Block Test, and a pointing task (ArmeoSenso integrated assessment). For the pointing task, nine targets arranged in two semicircles appear one after another in the transverse plane in front of the subject. The goal is to reach each target within 8 s. The number of targets reached and the mean time to target are reported. The Motor Activity Log 14 (MAL-14) for self-reported movement ability, the Barthel Index (BI) as a measure of independence in daily living and the National Institutes of Health Stroke Scale (NIHSS) as a measure of stroke severity are recorded.

Finally, patients fill in a short questionnaire after each training session. Ten questions (five positively and five negatively formulated), given in randomized order, evaluate the subjective appraisal of the training on a five-point Likert scale.

Assessments of Safety

Adverse events (AEs) expected to occur are skeletal or muscular pain and fatigue indicating an overuse syndrome. The quality management system of the Clinical Trials Center Zurich will be followed according to national and international guidelines (ICH, 1996). AEs will be documented and related serious adverse events (SAEs) will be reported to the ethics committee, the competent authority (Swissmedic) and local principal investigators (PIs). All SAEs will be included in an annual report to authorities and PIs. AEs will be recorded from randomization to the end of the trial.

Sample Size

Sample size is estimated to detect a between-group difference of 4.8 voxels in the workspace change from beginning to end of training, based on the improvement in arm workspace from pilot results and an estimated group difference of 20%. This assumes a two-sided alpha level of .05 and a power of 80%. For an effect with a standard deviation of 7 voxels, 35 subjects per group yield 80% power to detect the true alternative. We will randomize 37 subjects in each group, based on our observed attrition rate of 5% in a previous interventional pilot trial (Wittmann et al., 2016).
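The reported numbers can be roughly checked with the standard normal-approximation formula for comparing two means, n = 2 (z_(1-alpha/2) + z_(power))^2 (SD / difference)^2 per group. The sketch below is a plausibility check under that assumption, not the calculation actually used for the protocol; the function names are illustrative.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for a two-sided two-sample comparison."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)


def inflate_for_attrition(n, attrition):
    """Number to randomize so that n subjects remain after the expected dropout."""
    return math.ceil(n / (1 - attrition))


if __name__ == "__main__":
    print(n_per_group(delta=4.8, sd=7.0))   # normal approximation
    print(inflate_for_attrition(35, 0.05))  # 35 completers at 5% attrition
```

The normal approximation yields 34 subjects per group, one fewer than the 35 stated above; a calculation based on the t distribution typically adds about one subject per group, which would be consistent with the reported figure. Inflating 35 by the 5% attrition rate gives the 37 subjects to be randomized per group.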

Statistical Analysis

Our primary analysis is a per-protocol analysis comparing the two groups. Therapy will take place in 15 sessions over three weeks, and there is the possibility that some subjects will not complete the full treatment regimen due to scheduling issues or other time constraints. If they still perform at least 12 hours of therapy, the data will be analysed. All other patients will be considered “non-compliant” in the sense that they do not receive the full treatment dose. According to the per-protocol principle, their outcomes will not be analysed.

A two-sample t-test comparing the mean change in voxel workspace assessment between the two groups will be used; in case of non-normality, a Mann-Whitney test will be computed instead of the t-test. Statistical significance will be based on a p-value threshold of 0.05.
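A minimal sketch of the per-protocol filter and of the test statistic is given below, using only the Python standard library. The record fields (`minutes`, `group`, `change`) are hypothetical names, the pooled-variance form of the t-test is an assumption, and the p-value as well as the Mann-Whitney fallback are left to a statistics package.

```python
import math
from statistics import mean, variance

MIN_MINUTES = 720  # per-protocol threshold: at least 12 hours of therapy


def per_protocol(records):
    """Keep only subjects who reached the minimum cumulative training time."""
    return [r for r in records if r["minutes"] >= MIN_MINUTES]


def pooled_t(a, b):
    """Two-sample t statistic with pooled variance; returns (t, degrees of freedom)."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))
    return t, na + nb - 2


def compare_groups(records):
    """Compare the mean workspace change between the two groups, per protocol."""
    kept = per_protocol(records)
    reward = [r["change"] for r in kept if r["group"] == "reward"]
    control = [r["change"] for r in kept if r["group"] == "control"]
    return pooled_t(reward, control)
```

The resulting statistic would be compared against the t distribution with the returned degrees of freedom at the 0.05 threshold stated above.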

Discussion

This is the first randomized clinical trial to evaluate the effect of enhanced feedback and reward on arm rehabilitative training following stroke. Intrinsic (score, knowledge of performance) and extrinsic rewards (money) hypothetically improve motor cortex plasticity and overall motivation to train. Because motivation affects training time and time is a crucial determinant of effect (Veerbeek et al., 2014), this trial controls for time by using a control intervention that is matched in time and dose of training.

In a motor learning study with healthy young subjects, we have shown that the consolidation/retention of a skilled motor task is more effective if the task was trained in the presence of reward (Widmer et al., 2016). In a rat model, projections from midbrain dopaminergic regions to M1 are required for successful motor learning and functional plasticity at cortical (layer II/III) synapses (Hosp et al., 2011; Molina-Luna et al., 2009), mechanisms that presumably support recovery after stroke (Nudo, 2003). Whether the dopaminergic system can be stimulated to improve recovery remains to be shown. Likewise, whether reward is an appropriate stimulus is as yet unknown.

Although previous studies have assessed the patient’s motivation for a specific training (e.g. Nijenhuis et al., 2015; Wittmann et al., 2016), none of them compared the outcome to an appropriate control condition for the evaluation of the effectiveness of rewarding therapy. Here, we are in search of a clinical effect of reward on a reduction in impairment (shoulder/elbow range of motion) mediated by active and repetitive proximal arm training. We chose this training method because (1) it can be standardized in its conduct and has quantifiable parameters of dose, movement success and arm workspace as primary outcome measure, (2) it is based on a therapy system that was already evaluated with patients and found to be safe, (3) it is easily supported in participating institutions without much training of therapists who provide assistance to the patient and (4) it has shown a moderate effect on chronically arm-impaired stroke survivors (Wittmann et al., 2016). Because the ArmeoSenso training only works on proximal arm function, it is not expected to have a clinically relevant effect on activities of daily living, independence, or quality of life. We therefore chose a primary outcome that is close to what is actually being trained, i.e., arm workspace.

The study is enrolling subjects during the initial three months after stroke. Most recovery occurs in this period (Krakauer et al., 2012; Murphy and Corbett, 2009; Prabhakaran et al., 2008; Zarahn et al., 2011; Zeiler et al., 2013). Therefore, we expect an improvement in arm function in both groups.

A positive outcome of this trial will emphasize the role of reward in rehabilitative training. This result could potentially be applicable to various forms of post-stroke rehabilitative training.

Declarations

Ethics Approval and Consent to Participate

The study will follow GCP guidelines and was approved by the local Cantonal ethics committee “Ethikkommission Nordwest- und Zentralschweiz” (LU2013-079 / PB_2016-01804) and by Swissmedic (2014-MD-0033). All subjects have to give written informed consent in accordance with the Declaration of Helsinki.

Competing Interests

Andreas R. Luft is a scientific advisor to Hocoma AG (Volketswil). The remaining authors have no conflict of interest in the submission of this manuscript.

Funding

This work was supported by the Clinical Research Priority Program (CRPP) NeuroRehab of the University of Zurich, the P&K Pühringer Foundation, the Swiss Commission for Technology and Innovation (CTI Grant 13612.1) and the ETH Foundation (ETH Research Grant ET-17 13-2).

Authors' Contributions

MW and JPH drafted the manuscript. ARL sponsors the study. MW, JPH, KL, ARL, OL and FW participated in the study design and definition of requirements. MW, KL and FW implemented the therapy system. All authors read and approved the final manuscript.

Acknowledgments

The authors would like to thank Mark van Raai for his help in the implementation of the ArmeoSenso software, and Irene Christen, Jose Lopez and Belen Valladares for their support with the study.


General discussion


Main Findings

With the present thesis, we successfully translated to humans evidence from animal studies suggesting a critical role of the dopaminergic reward system in the facilitation of M1 synaptic plasticity and, thus, the promotion of motor skill learning. Our findings are in line with previous research, corroborating the positive influence that reward has on the consolidation of a recently learned motor task. Importantly, the present thesis substantially extends the current scientific knowledge by comparing the impact of an extrinsic monetary reward and of performance feedback as an intrinsic reward on motor skill learning and neural activations. Moreover, an fMRI paradigm and a clinical trial protocol have been developed to investigate the processing of motor reward in stroke patients and to evaluate a potential effect of rewarding feedback, including monetary rewards, on the recovery of upper limb functions following stroke, respectively.

First, this thesis offers a review of studies that have used the MID task to investigate the neural processing of reward (and punishment) in healthy human subjects, but also in clinical populations. Chapter 2 gives an overview of different uses of the MID task by outlining the neuronal processes involved in distinct aspects of human reward processing, such as anticipation versus consumption, reward versus punishment, and, with a special focus, reward-based learning processes. Building on the knowledge gained from the review, we could demonstrate that providing performance feedback linked to a monetary reward (i.e., KP_good+MR) during the training of a motor skill more strongly activates the ventral striatum and improves overnight consolidation of the learned skill when compared to a similar training without monetary rewards (i.e., KP_good). When comparing different schedules for intrinsic reward, on the other hand, it was randomly presented performance feedback (i.e., KP_random), rather than performance feedback provided systematically for good training trials, that was associated with stronger activation of the ventral striatum and better skill consolidation. Nevertheless, these results suggest that increasing ventral striatal activity during motor training through appropriate reward could help to improve motor learning (Chapter 3).

The most efficient learning condition from Chapter 3 (i.e., KP_good+MR) was then used to test the hypothesis of impaired processing of reward related to motor performance after stroke when compared with age-matched healthy individuals. However, certain modifications of the fMRI paradigm were necessary in order to be able to compare the different populations (see "Imaging Studies"). Pilot results revealed a trend towards reduced responsiveness of ventral parts of the striatum in stroke patients (Chapter 4). Linking this to findings from Chapter 3, a reduced responsiveness of the ventral striatum to a motor performance derived reward (extrinsic or intrinsic) in a similar training condition could imply a blunted learning ability in patients after stroke. The ability to learn, however, is supposed to support motor recovery. More data are needed to verify this tendency and to better understand the underlying mechanisms.

Nonetheless, the amplification of rewarding stimuli, e.g., by linking feedback about motor performance to a reward, may be a means to stimulate the dopaminergic system of stroke patients in order to support M1 plasticity and hence recovery of motor functions. This led us to initiate a randomized controlled trial that allows the detection of a potential clinical effect of increased motivation to train induced by amplified reward, but with training dose and intensity matched between the two training conditions. The resulting study protocol (Chapter 5) is therefore an essential outcome of this thesis.

Specific findings have been discussed in the corresponding chapters. Thus, in this last chapter, findings and shortcomings will be discussed in conjunction with each other. Moreover, an outlook and a conclusion will be given.

Imaging Studies (Chapters 3 and 4)

A strength of this thesis is the modification of the arc-pointing task (Shmuelof et al., 2012) into a motor MID task that can be trained in an MRI scanner (Widmer et al., 2016; Widmer et al., 2017). The combined use of the motion capture system and the MRI scanner allows for precise kinematic movement analysis while using fMRI to examine the underlying neural processes. Our imaging data analysis was focused on the striatum, as it has been shown to play a key role in the processing of intrinsic and extrinsic reward derived from motor performance (Lutz et al., 2012; reviewed in Lutz and Widmer, 2014).

Strictly speaking, the term "motor MID task" fully applies only to Widmer et al. (2017) (Chapter 4), as in Widmer et al. (2016) (Chapter 3) just one study group received a delayed monetary reward that depended on the subject's preceding motor performance. For the other two study groups, however, the (intrinsic) reward consisted of a knowledge of performance feedback including a score that was presented either after well-solved or after randomly selected trials. Although the random selection seems to have induced higher prediction errors and thus elicited (descriptively) stronger activation in the ventral striatum (O'Doherty et al., 2004), the strongest striatal involvement was observed when good performance feedback was linked to an additional monetary reward, which, in turn, induced the most stable learning effects (Widmer et al., 2016). Therefore, we have chosen the KP_good+MR condition to investigate the integrity of the reward system in stroke patients. However, while the time that was allowed to perform a movement was prescribed in Widmer et al. (2016), no time limits were set in Widmer et al. (2017). Consequently, while the performance measure in the former experiment depended on speed and accuracy, the latter study used PCTin, a merely spatial performance measure, to determine knowledge of performance and monetary rewards. The removal of time restrictions might therefore have shifted the attention of our subjects away from timed and towards precise movement execution, as reflected in longer average movement times. Moreover, Widmer et al. (2017) allowed for an automated difficulty adaptation by changing the channel size according to the subjects' performance during a familiarization period. This was implemented since performance was expected to vary substantially between individuals and study groups, even though stroke patients performed the task with their ipsilesional upper limb (Winstein et al., 1999). The difficulty adaptation should thus counter this and equalize the range of monetary rewards gained on average in order to prevent differential reward magnitudes from influencing the neural activations (Knutson et al., 2001a).

To summarize, the removal of time restrictions and the implementation of a difficulty adaptation were necessary to investigate the neural processing of a motor performance dependent reward, when otherwise the performance might systematically differ between the two study groups. These adjustments make it impossible to compare strictly defined motor skill learning between individuals and groups, and therefore hamper the comparability of behavioral results between the studies of Chapters 3 and 4. Striatal activations, on the other hand, are still derived from the same contrast, which still depends on the subject's performance in the task. Thus, comparing ventral striatal activations of the KP_good+MR group (Figure 3.3) and the healthy elderly control group (Figure 4.2), they seem to be in a similar range. Whether age influences reward processing in motor tasks is, however, not yet fully clear (see "Open Questions and Outlook").

Moreover, using this newly developed motor MID task, we observed a trend towards lower activation of the ventral parts of the striatum in stroke patients when compared with age-matched healthy individuals (Widmer et al., 2017). This is of interest, as the ventral striatum was more strongly activated in the group that showed significantly improved motor skill consolidation in our study on healthy young subjects (Widmer et al., 2016) and thus seems to play an important role in the improvement of learning induced by rewarding feedback. In a cognitive task, Lam et al. (2016) found a deficit in reward processing after stroke, which was reflected in impaired reinforcement learning when compared to healthy age-matched peers. When using a motor task, on the other hand, Winstein et al. (1999) demonstrated that stroke-related damage in sensorimotor areas primarily affects the processes underlying the control and execution of motor skills rather than the learning of those skills. However, we have learned from animal studies that proper functioning of the dopaminergic reward system is necessary for successful motor skill learning (Molina-Luna et al., 2009). Thus, a reduced responsiveness of the ventral striatum to a reward derived from motor performance, be it extrinsic or intrinsic, could imply a blunted learning ability in stroke patients.

Studies on Stroke Patients (Chapters 4 and 5)

The ability to learn is supposed to support motor recovery after stroke (Dominguez-Borras et al., 2013; Krakauer, 2006; Russell et al., 2013) and motor learning can be improved by reward amplification (Abe et al., 2011; Galea et al., 2015; Wachter et al., 2009; Widmer et al., 2016). Moreover, reward has been shown to increase motivation for a specific exercise (Pessiglione et al., 2007; Studer and Knecht, 2016). Therefore, Nielsen et al. (2015) highlighted, among other factors, the potential of reward for rehabilitative training in a recent review article giving science-based recommendations for neurorehabilitation. It is thus a logical step to try to utilize reward to boost neuroplasticity and, thus, neurorehabilitation of motor functions after stroke. Hence, we came up with a trial protocol investigating rewards in the form of performance feedback and monetary gains as ways to improve the effectiveness of rehabilitative training of the upper limb. Although there are a number of studies that have used enhanced feedback in virtual reality environments for rehabilitative training of the upper limb, none of them, to the best of our knowledge, have used monetary rewards and none of them have compared virtual reality using enhanced feedback to a control condition that is truly similar in dose and intensity (Brunner et al., 2016; Kaur et al., 2012). As described in Chapter 5, our approach therefore was to take a therapy system that is known to be engaging (Wittmann et al., 2015; Wittmann et al., 2016), add a performance dependent monetary reward (intervention group), and then systematically remove rewarding features for the control condition, leaving just enough feedback to enable a similar training. In doing so, any difference between the study groups could clearly be attributed to the reward intervention. However, it might seem puzzling to, on the one hand, hypothesize deficient reward processing after stroke (Widmer et al., 2017), but on the other hand to try to improve the efficacy of rehabilitative training by utilizing reward. The reasoning behind this is that reward amplification during rehabilitative training might be a means to overcome such a deficit in order to stimulate the dopaminergic system to improve recovery.

Open Questions and Outlook

One question that remains unanswered is whether a KP_random+MR training condition in the study reported in Chapter 3 would even further increase the feedback-related response in the ventral striatum and could, hence, further improve motor skill learning. However, based on findings from Chiviacowsky and Wulf (2007) and from Abe et al. (2011), we tried to use performance feedback after good performance, and reward, to influence skill learning. While, unexpectedly, a random feedback schedule was more effective than systematically providing knowledge of performance after well-solved trials, we could verify our hypothesis of a positive influence of performance feedback linked to a monetary outcome on striatal activation and efficacy of motor skill learning. At present, it is not planned to address the question posed at the beginning of this paragraph with further experiments. In Chapter 3, we concluded that increasing ventral striatal activity during training by using appropriate reward improves the consolidation of the acquired skill. Performance feedback with contingent monetary reward delivered after good performance has been proven effective in that regard and will therefore be used for future experiments.

Furthermore, Chapter 4 left us with inconclusive data showing tendencies that must be verified with further measurements. We have thus developed a follow-up study that enables us to gather more data on the processing of motor performance-dependent reward after stroke using our newly developed motor MID task. This proposed study does not only aim at finding general differences between stroke patients and a healthy control group; it is also designed to uncover influencing or mediating factors such as post-stroke depression, age, and lesion location. Moreover, it will test whether deficient reward processing impacts rehabilitation progress in certain stroke subpopulations. However, this follow-up trial and the corresponding protocol are not part of this thesis and will thus not be discussed here in more detail.

In addition, we have used the motor MID task described in Chapter 4 to investigate whether the striatal response to reward derived from motor performance changes over the lifespan by comparing a group of young (26 years on average) and a group of elderly (65 years on average) healthy adults (Master's thesis of Samara Stulz). As mentioned before, the data of this thesis suggest similar activations in young and elderly participants (see Figures 3.3 and 4.2). However, the tasks differed in some essential features (see "Imaging Studies"). Although the data of this investigation are still preliminary, we can already conclude that we will continue to compare stroke patients and healthy controls in an age-matched manner.

Moreover, while the concept and design and, hence, the study protocol for the ArmeoSenso-Reward trial (Chapter 5) were developed and implemented within a reasonable time, more problems have arisen in implementing the trial in everyday clinical practice. In particular, it has become apparent that the recruitment of patients is even more difficult than expected. Competing clinical trials and the overall availability of patients eligible for the study are the main causes of this problem. As a result, we have extended the study to two additional study sites, thereby increasing the number of patients that can be screened for inclusion. We are thus confident that the ambitious target of 74 stroke patients will be reached in due time. A positive outcome of this trial would emphasize the role of rewarding features in rehabilitative training, a result that could potentially be applicable to various forms of post-stroke rehabilitative training.

Conclusion

Based on current literature, we have developed an fMRI paradigm enabling us to investigate the contribution of performance feedback and reward to motor skill learning, and to test for a deficit in the neural processing of motor performance-derived reward after stroke. In line with findings from animal studies, we were able to demonstrate a positive influence of reward on motor skill learning in healthy young humans. This effect was linked to an increased ventral striatal response to the presentation of the rewarding feedback. When comparing different schedules for intrinsic reward, on the other hand, it was randomly presented performance feedback, rather than performance feedback provided systematically for good training trials, that was associated with stronger activation of the ventral striatum and better skill consolidation. In patients after stroke, however, preliminary data point towards a blunted response of the ventral striatum during the processing of performance feedback linked to a monetary reward. Still, amplification of performance feedback by means of monetary rewards could make up for deficient processing of feedback about one's own performance. Therefore, we have designed a randomized controlled trial investigating rewards in the form of performance feedback and monetary gains as ways to improve the effectiveness of rehabilitative upper limb training after stroke. As both study arms use identical training schedules and algorithms for target placement and difficulty adaptation, time and dose of training should be well matched between the intervention and the control group. Hence, the newly developed fMRI paradigm and the initiated randomized controlled trial will allow us to test the hypotheses of deficient motor reward processing and of a clinical effect of reward on the reduction of upper limb impairment in patients after stroke, respectively.

References

Abe, M., Schambra, H., Wassermann, E.M., Luckenbaugh, D., Schweighofer, N., Cohen, L.G., 2011. Reward improves long-term retention of a motor memory through induction of offline memory gains. Curr Biol. 21, 557-562.
Abraham, W.C., 2003. How long will long-term potentiation last? Philos Trans R Soc Lond B Biol Sci. 358, 735-744.
Aho, K., Harmsen, P., Hatano, S., Marquardsen, J., Smirnov, V.E., Strasser, T., 1980. Cerebrovascular disease in the community: results of a WHO collaborative study. Bull World Health Organ. 58, 113-130.
Anderson, C.S., Linto, J., Stewart-Wynne, E.G., 1995. A population-based assessment of the impact and burden of caregiving for long-term stroke survivors. Stroke. 26, 843-849.
Aron, A.R., Shohamy, D., Clark, J., Myers, C., Gluck, M.A., Poldrack, R.A., 2004. Human midbrain sensitivity to cognitive feedback and uncertainty during classification learning. J Neurophysiol. 92, 1144-1152.
Barros-Loscertales, A., Meseguer, V., Sanjuan, A., Belloch, V., Parcet, M.A., Torrubia, R., Avila, C., 2006. Striatum gray matter reduction in males with an overactive behavioral activation system. Eur J Neurosci. 24, 2071-2074.
Beaver, J.D., Lawrence, A.D., Van Ditzhuijzen, J., Davis, M.H., Woods, A., Calder, A.J., 2006. Individual differences in reward drive predict neural responses to images of food. J Neurosci. 26, 5160-5166.
Benningfield, M.M., Blackford, J.U., Ellsworth, M.E., Samanez-Larkin, G.R., Martin, P.R., Cowan, R.L., Zald, D.H., 2014. Caudate responses to reward anticipation associated with delay discounting behavior in healthy youth. Dev Cogn Neurosci. 7, 43-52.
Bickel, W.K., Miller, M.L., Yi, R., Kowal, B.P., Lindquist, D.M., Pitcock, J.A., 2007. Behavioral and neuroeconomics of drug addiction: Competing neural systems and temporal discounting processes. Drug Alcohol Depend. 90, S85-S91.
Bjork, J.M., Momenan, R., Smith, A.R., Hommer, D.W., 2008. Reduced posterior mesofrontal cortex activation by risky rewards in substance-dependent patients. Drug Alcohol Depend. 95, 115-128.
Bjork, J.M., Smith, A.R., Chen, G., Hommer, D.W., 2010. Adolescents, Adults and Rewards: Comparing Motivational Neurocircuitry Recruitment Using fMRI. Plos One. 5, e11440.
Bjork, J.M., Smith, A.R., Chen, G., Hommer, D.W., 2011. Psychosocial problems and recruitment of incentive neurocircuitry: Exploring individual differences in healthy adolescents. Dev Cogn Neurosci. 1, 570-577.
Breiter, H.C., Aharon, I., Kahneman, D., Dale, A., Shizgal, P., 2001. Functional imaging of neural responses to expectancy and experience of monetary gains and losses. Neuron. 30, 619-639.
Brunner, I., Skouen, J.S., Hofstad, H., Aßmuss, J., Becker, F., Pallesen, H., Thijs, L., Verheyden, G., 2016. Is upper limb virtual reality training more intensive than conventional training for patients in the subacute phase after stroke? An analysis of treatment intensity and content. BMC Neurol. 16, 219.
Bunzeck, N., Guitart-Masip, M., Dolan, R.J., Duzel, E., 2011. Contextual novelty modulates the neural dynamics of reward anticipation. J Neurosci. 31, 12816-12822.

Bunzeck, N., Doeller, C.F., Dolan, R.J., Duzel, E., 2012. Contextual interaction between novelty and reward processing within the mesolimbic system. Hum Brain Mapp. 33, 1309-1324.
Businelle, M.S., McVay, M.A., Kendzor, D., Copeland, A., 2010. A comparison of delay discounting among smokers, substance abusers, and non-dependent controls. Drug Alcohol Depend. 112, 247-250.
Callan, D.E., Schweighofer, N., 2008. Positive and negative modulation of word learning by reward anticipation. Hum Brain Mapp. 29, 237-249.
Camara, E., Rodriguez-Fornells, A., Munte, T.F., 2010. Microstructural Brain Differences Predict Functional Hemodynamic Responses in a Reward Processing Task. J Neurosci. 30, 11398-11402.
Carmichael, S.T., 2003. Plasticity of cortical projections after stroke. Neuroscientist. 9, 64-75.
Carr, J.H., 1987. Movement Science: Foundations for Physical Therapy in Rehabilitation. Aspen Publishers, Rockville (MD), USA.
Carter, R.M., MacInnes, J.J., Huettel, S.A., Adcock, R.A., 2009. Activation in the VTA and nucleus accumbens increases in anticipation of both gains and losses. Front Behav Neurosci. 3, 21.
Chen, C., Leys, D., Esquenazi, A., 2013. The interaction between neuropsychological and motor deficits in patients after stroke. Neurology. 80, S27-S34.
Chiviacowsky, S., Wulf, G., 2007. Feedback after good trials enhances learning. Res Q Exerc Sport. 78, 40-47.
Cohen, M.X., Schoene-Bake, J.C., Elger, C.E., Weber, B., 2009. Connectivity-based segregation of the human striatum predicts personality characteristics. Nat Neurosci. 12, 32-34.
Cooke, E.V., Mares, K., Clark, A., Tallis, R.C., Pomeroy, V.M., 2010. The effects of increased dose of exercise-based therapies to enhance motor recovery after stroke: a systematic review and meta-analysis. BMC Med. 8, 60.
Corr, P.J., 2004. Reinforcement sensitivity theory and personality. Neurosci Biobehav Rev. 28, 317-332.
Costumero, V., Barros-Loscertales, A., Bustamante, J.C., Ventura-Campos, N., Fuentes, P., Avila, C., 2013. Reward sensitivity modulates connectivity among reward brain areas during processing of anticipatory reward cues. Eur J Neurosci. 38, 2399-2407.
Craig, W., 1917. Appetites and aversions as constituents of instincts. Proc Natl Acad Sci U S A. 3, 685-688.
Daniel, R., Pollmann, S., 2010. Comparing the Neural Basis of Monetary Reward and Cognitive Feedback during Information-Integration Category Learning. J Neurosci. 30, 47-55.
Deci, E.L., Koestner, R., Ryan, R.M., 1999. A meta-analytic review of experiments examining the effects of extrinsic rewards on intrinsic motivation. Psychol Bull. 125, 627-668; discussion 692-700.
Deci, E.L., Koestner, R., Ryan, R.M., 2001. Extrinsic rewards and intrinsic motivation in education: Reconsidered once again. Rev Educ Res. 71, 1-27.
Delgado, M.R., Nystrom, L.E., Fissell, C., Noll, D.C., Fiez, J.A., 2000. Tracking the hemodynamic responses to reward and punishment in the striatum. J Neurophysiol. 84, 3072-3077.
Delgado, M.R., Locke, H.M., Stenger, V.A., Fiez, J.A., 2003. Dorsal striatum responses to reward and punishment: effects of valence and magnitude manipulations. Cogn Affect Behav Neurosci. 3, 27-38.
Delgado, M.R., Stenger, V.A., Fiez, J.A., 2004. Motivation-dependent responses in the human caudate nucleus. Cereb Cortex. 14, 1022-1030.
Demurie, E., Roeyers, H., Baeyens, D., Sonuga-Barke, E.J.S., 2011. Common alterations in sensitivity to type but not amount of reward in ADHD and autism spectrum disorders. J Child Psychol Psychiatry. 52, 1164-1173.

Demurie, E., Roeyers, H., Wiersema, J.R., Sonuga-Barke, E., 2013. No Evidence for Inhibitory Deficits or Altered Reward Processing in ADHD: Data From a New Integrated Monetary Incentive Delay Go/No-Go Task. J Atten Disord.
den Ouden, H.E.M., Daw, N.D., Fernandez, G., Elshout, J.A., Rijpkema, M., Hoogman, M., Franke, B., Cools, R., 2013. Dissociable Effects of Dopamine and Serotonin on Reversal Learning. Neuron. 80, 1090-1100.
Deutsch, K.M., Newell, K.M., 2004. Changes in the structure of children's isometric force variability with practice. J Exp Child Psychol. 88, 319-333.
Diekhof, E.K., Falkai, P., Gruber, O., 2008. Functional neuroimaging of reward processing and decision-making: A review of aberrant motivational and affective processing in addiction and mood disorders. Brain Res Rev. 59, 164-184.
Dominguez-Borras, J., Armony, J.L., Maravita, A., Driver, J., Vuilleumier, P., 2013. Partial recovery of visual extinction by pavlovian conditioning in a patient with hemispatial neglect. Cortex. 49, 891-898.
Domjan, M.P., 2009. Principles of Learning and Behavior. CengageBrain.com.
Donamayor, N., Schoenfeld, M.A., Munte, T.F., 2012. Magneto- and electroencephalographic manifestations of reward anticipation and delivery. Neuroimage. 62, 17-29.
Donkers, F.C.L., Nieuwenhuis, S., van Boxtel, G.J.M., 2005. Mediofrontal negativities in the absence of responding. Cogn Brain Res. 25, 777-787.
Elliott, R., Frith, C.D., Dolan, R.J., 1997. Differential neural response to positive and negative feedback in planning and guessing tasks. Neuropsychologia. 35, 1395-1404.
Elliott, R., Friston, K.J., Dolan, R.J., 2000. Dissociable neural responses in human reward systems. J Neurosci. 20, 6159-6165.
Ernst, M., Nelson, E.E., Jazbec, S., McClure, E.B., Monk, C.S., Leibenluft, E., Blair, J., Pine, D.S., 2005. Amygdala and nucleus accumbens in responses to receipt and omission of gains in adults and adolescents. Neuroimage. 25, 1279-1291.
Feigin, V.L., Lawes, C.M., Bennett, D.A., Barker-Collo, S.L., Parag, V., 2009. Worldwide stroke incidence and early case fatality reported in 56 population-based studies: a systematic review. Lancet Neurol. 8, 355-369.
Filbey, F.M., Dunlop, J., Myers, U.S., 2013. Neural Effects of Positive and Negative Incentives during Marijuana Withdrawal. Plos One. 8, e61470.
FitzGerald, T.H.B., Friston, K.J., Dolan, R.J., 2012. Action-Specific Value Signals in Reward-Related Regions of the Human Brain. J Neurosci. 32, 16417a-16423a.
Frank, M.J., Seeberger, L.C., O'Reilly, R.C., 2004. By carrot or by stick: cognitive reinforcement learning in parkinsonism. Science. 306, 1940-1943.
Frey, B.S., 1994. How Intrinsic Motivation Is Crowded out and In. Rationality and Society. 6, 334-352.
Frey, B.S., Jegen, R., 2001. Motivation crowding theory. J Econ Surv. 15, 589-611.
Friston, K.J., Holmes, A.P., Poline, J.B., Grasby, P.J., Williams, S.C., Frackowiak, R.S., Turner, R., 1995. Analysis of fMRI time-series revisited. Neuroimage. 2, 45-53.
Galea, J.M., Mallia, E., Rothwell, J., Diedrichsen, J., 2015. The dissociable effects of punishment and reward on motor learning. Nat Neurosci. 18, 597-602.
Galvan, A., Hare, T.A., Parra, C.E., Penn, J., Voss, H., Glover, G., Casey, B.J., 2006. Earlier development of the accumbens relative to orbitofrontal cortex might underlie risk-taking behavior in adolescents. J Neurosci. 26, 6885-6892.
Gandolfo, F., Mussa-Ivaldi, F.A., Bizzi, E., 1996. Motor learning by field approximation. Proc Natl Acad Sci U S A. 93, 3843-3846.
Glascher, J., Hampton, A.N., O'Doherty, J.P., 2009. Determining a Role for Ventromedial Prefrontal Cortex in Encoding Action-Based Value Signals During Reward-Related Decision Making. Cereb Cortex. 19, 483-495.

Glover, G.H., Li, T.Q., Ress, D., 2000. Image-based method for retrospective correction of physiological motion effects in fMRI: RETROICOR. Magnet Reson Med. 44, 162-167.
Go, A.S., Mozaffarian, D., Roger, V.L., Benjamin, E.J., Berry, J.D., Blaha, M.J., Dai, S., Ford, E.S., Fox, C.S., Franco, S., Fullerton, H.J., Gillespie, C., Hailpern, S.M., Heit, J.A., Howard, V.J., Huffman, M.D., Judd, S.E., Kissela, B.M., Kittner, S.J., Lackland, D.T., Lichtman, J.H., Lisabeth, L.D., Mackey, R.H., Magid, D.J., Marcus, G.M., Marelli, A., Matchar, D.B., McGuire, D.K., Mohler, E.R., Moy, C.S., Mussolino, M.E., Neumar, R.W., Nichol, G., Pandey, D.K., Paynter, N.P., Reeves, M.J., Sorlie, P.D., Stein, J., Towfighi, A., Turan, T.N., Virani, S.S., Wong, N.D., Woo, D., Turner, M.B., 2014. Heart Disease and Stroke Statistics—2014 Update. Circulation. 129, e28-e292.
Greene, D., Lepper, M.R., 1974. Effects of extrinsic rewards on children's subsequent intrinsic interest. Child Dev. 45, 1141-1145.
Guitart-Masip, M., Bunzeck, N., Stephan, K.E., Dolan, R.J., Duzel, E., 2010. Contextual Novelty Changes Reward Representations in the Striatum. J Neurosci. 30, 1721-1726.
Guo, C.C., Raymond, J.L., 2010. Motor learning reduces eye movement variability through reweighting of sensory inputs. J Neurosci. 30, 16241-16248.
Haber, S.N., Knutson, B., 2010. The reward circuit: linking primate anatomy and human imaging. Neuropsychopharmacology. 35, 4-26.
Hahn, T., Dresler, T., Ehlis, A.C., Plichta, M.M., Heinzel, S., Polak, T., Lesch, K.P., Breuer, F., Jakob, P.M., Fallgatter, A.J., 2009. Neural response to reward anticipation is modulated by Gray's impulsivity. Neuroimage. 46, 1148-1153.
Hahn, T., Dresler, T., Ehlis, A.C., Pyka, M., Dieler, A.C., Saathoff, C., Jakob, P.M., Lesch, K.P., Fallgatter, A.J., 2012. Randomness of resting-state brain oscillations encodes Gray's personality trait. Neuroimage. 59, 1842-1845.
Han, S., Huettel, S.A., Raposo, A., Adcock, R.A., Dobbins, I.G., 2010. Functional significance of striatal responses during episodic decisions: recovery or goal attainment? J Neurosci. 30, 4767-4775.
Harvey, A.K., Pattinson, K.T.S., Brooks, J.C.W., Mayhew, S.D., Jenkinson, M., Wise, R.G., 2008. Brainstem Functional Magnetic Resonance Imaging: Disentangling Signal From Physiological Noise. J Magn Reson Imaging. 28, 1337-1344.
Helfinstein, S.M., Kirwan, M.L., Benson, B.E., Hardin, M.G., Pine, D.S., Ernst, M., Fox, N.A., 2013. Validation of a child-friendly version of the monetary incentive delay task. Soc Cogn Affect Neurosci. 8, 720-726.
Hollerman, J.R., Schultz, W., 1998. Dopamine neurons report an error in the temporal prediction of reward during learning. Nat Neurosci. 1, 304-309.
Holroyd, C.B., Nieuwenhuis, S., Yeung, N., Nystrom, L., Mars, R.B., Coles, M.G.H., Cohen, J.D., 2004. Dorsal anterior cingulate cortex shows fMRI response to internal and external error signals. Nat Neurosci. 7, 497-498.
Holroyd, C.B., Krigolson, O.E., Baker, R., Lee, S., Gibson, J., 2009. When is an error not a prediction error? An electrophysiological investigation. Cogn Affect Behav Neurosci. 9, 59-70.
Hosp, J.A., Pekanovic, A., Rioult-Pedotti, M.S., Luft, A.R., 2011. Dopaminergic projections from midbrain to primary motor cortex mediate motor skill learning. J Neurosci. 31, 2481-2487.
Huang, Y.Y., Kandel, E.R., 1995. D1/D5 receptor agonists induce a protein synthesis-dependent late potentiation in the CA1 region of the hippocampus. Proc Natl Acad Sci U S A. 92, 2446-2450.
Hung, Y.C., Kaminski, T.R., Fineman, J., Monroe, J., Gentile, A.M., 2008. Learning a multi-joint throwing task: a morphometric analysis of skill development. Exp Brain Res. 191, 197-208.

Hutton, C., Josephs, O., Stadler, J., Featherstone, E., Reid, A., Speck, O., Bernarding, J., Weiskopf, N., 2011. The impact of physiological noise correction on fMRI at 7 T. Neuroimage. 57, 101-112.
ICH, 1996. ICH Harmonised Tripartite Guideline. Guideline for Good Clinical Practice E6(R1).
Jager, G., Block, R.I., Luijten, M., Ramsey, N.F., 2013. Tentative Evidence for Striatal Hyperactivity in Adolescent Cannabis-Using Boys: A Cross-Sectional Multicenter fMRI Study. J Psychoactive Drugs. 45, 156-167.
Kaplan, F., Oudeyer, P.-Y., 2007. In search of the neural circuits of intrinsic motivation. Front Neurosci. 1, 225-236.
Kappel, V., Koch, A., Lorenz, R.C., Bruhl, R., Renneberg, B., Lehmkuhl, U., Salbach-Andrae, H., Beck, A., 2013. CID: a valid incentive delay paradigm for children. J Neural Transm. 120, 1259-1270.
Kasper, L., Marti, S., Vannesjö, S., Hutton, C., Dolan, R., Weiskopf, N., Stephan, K., Prüssmann, K., 2009. Cardiac artefact correction for human brainstem fMRI at 7 Tesla. In: Proceedings of the Organization for Human Brain Mapping, Vol. 15. San Francisco (CA), USA.
Katsyri, J., Hari, R., Ravaja, N., Nummenmaa, L., 2013. Just watching the game ain't enough: striatal fMRI reward responses to successes and failures in a video game during active and vicarious playing. Front Hum Neurosci. 7.
Kaur, G., English, C., Hillier, S., 2012. How physically active are people with stroke in physiotherapy sessions aimed at improving motor function? A systematic review. Stroke Res Treat. 2012.
Kawasaki, M., Yamaguchi, Y., 2013. Frontal theta and beta synchronizations for monetary reward increase visual working memory capacity. Soc Cogn Affect Neurosci. 8, 523-530.
Kleim, J.A., Hogg, T.M., VandenBerg, P.M., Cooper, N.R., Bruneau, R., Remple, M., 2004. Cortical synaptogenesis and motor map reorganization occur during late, but not early, phase of motor skill learning. J Neurosci. 24, 628-633.
Klein, T.A., Neumann, J., Reuter, M., Hennig, J., von Cramon, D.Y., Ullsperger, M., 2007. Genetically determined differences in learning from errors. Science. 318, 1642-1645.
Kluger, A.N., DeNisi, A., 1996. The effects of feedback interventions on performance: A historical review, a meta-analysis, and a preliminary feedback intervention theory. Psychol Bull. 119, 254-284.
Knutson, B., Westdorp, A., Kaiser, E., Hommer, D., 2000. FMRI visualization of brain activity during a monetary incentive delay task. Neuroimage. 12, 20-27.
Knutson, B., Adams, C.M., Fong, G.W., Hommer, D., 2001a. Anticipation of increasing monetary reward selectively recruits nucleus accumbens. J Neurosci. 21, RC159.
Knutson, B., Fong, G.W., Adams, C.M., Varner, J.L., Hommer, D., 2001b. Dissociation of reward anticipation and outcome with event-related fMRI. Neuroreport. 12, 3683-3687.
Knutson, B., Fong, G.W., Bennett, S.M., Adams, C.M., Hommer, D., 2003. A region of mesial prefrontal cortex tracks monetarily rewarding outcomes: characterization with rapid event-related fMRI. Neuroimage. 18, 263-272.
Knutson, B., Cooper, J.C., 2005. Functional magnetic resonance imaging of reward prediction. Curr Opin Neurol. 18, 411-417.
Knutson, B., Wimmer, G.E., 2007. Splitting the difference: how does the brain code reward episodes? Ann N Y Acad Sci. 1104, 54-69.
Knutson, B., Delgado, M.R., Phillips, P.E.M., 2009. Representation of Subjective Value in the Striatum. In: Neuroeconomics. Camerer, C.F., Fehr, E., Poldrack, R.A., (editors). Academic Press, London, 389-406.

Koepp, M.J., Gunn, R.N., Lawrence, A.D., Cunningham, V.J., Dagher, A., Jones, T., Brooks, D.J., Bench, C.J., Grasby, P.M., 1998. Evidence for striatal dopamine release during a video game. Nature. 393, 266-268.
Kohn, A., 1999. Punished by rewards: The trouble with gold stars, incentive plans, A's, praise, and other bribes. Houghton Mifflin Harcourt, Boston (MA), USA.
Krakauer, J.W., Ghilardi, M.F., Ghez, C., 1999. Independent learning of internal models for kinematic and dynamic control of reaching. Nat Neurosci. 2, 1026-1031.
Krakauer, J.W., 2006. Motor learning: its relevance to stroke recovery and neurorehabilitation. Curr Opin Neurol. 19, 84-90.
Krakauer, J.W., Carmichael, S.T., Corbett, D., Wittenberg, G.F., 2012. Getting Neurorehabilitation Right: What Can Be Learned From Animal Models? Neurorehabil Neural Repair. 26, 923-931.
Kuo, M.F., Paulus, W., Nitsche, M.A., 2008. Boosting focally-induced brain plasticity by dopamine. Cereb Cortex. 18, 648-651.
Kwakkel, G., Kollen, B.J., van der Grond, J., Prevo, A.J., 2003. Probability of regaining dexterity in the flaccid upper limb: impact of severity of paresis and time since onset in acute stroke. Stroke. 34, 2181-2186.
Kwakkel, G., Kollen, B., Lindeman, E., 2004. Understanding the pattern of functional recovery after stroke: facts and theories. Restor Neurol Neurosci. 22, 281-299.
Kwakkel, G., 2006. Impact of intensity of practice after stroke: issues for consideration. Disabil Rehabil. 28, 823-830.
Kwakkel, G., Kollen, B.J., Krebs, H.I., 2008. Effects of robot-assisted therapy on upper limb recovery after stroke: a systematic review. Neurorehabil Neural Repair. 22, 111-21.
Lam, J.M., Globas, C., Hosp, J.A., Karnath, H.O., Wachter, T., Luft, A.R., 2016. Impaired implicit learning and feedback processing after stroke. Neuroscience. 314, 116-124.
Liu, Y.T., Mayer-Kress, G., Newell, K.M., 2006. Qualitative and quantitative change in the dynamics of motor learning. J Exp Psychol Hum Percept Perform. 32, 380-393.
Lomborg, B., 2015. The spread of western disease: 'The poor are dying more and more like the rich'. The Guardian, https://www.theguardian.com/global/2015/mar/02/stroke-heart-disease-attack-cancer-developing-countries, accessed 14th of December 2016.
Luft, A.R., McCombe-Waller, S., Whitall, J., Forrester, L.W., Macko, R., Sorkin, J.D., Schulz, J.B., Goldberg, A.P., Hanley, D.F., 2004. Repetitive bilateral arm training and motor cortex activation in chronic stroke: a randomized controlled trial. JAMA. 292, 1853-1861.
Lutz, K., Pedroni, A., Nadig, K., Luechinger, R., Jancke, L., 2012. The rewarding value of good motor performance in the context of monetary incentives. Neuropsychologia. 50, 1739-1747.
Lutz, K., Puorger, R., Cheetham, M., Jancke, L., 2013. Development of ERN together with an internal model of audio-motor associations. Front Hum Neurosci. 7, 471.
Lutz, K., Widmer, M., 2014. What can the monetary incentive delay task tell us about the neural processing of reward and punishment? Neurosci Neuroecon. 3, 33-45.
Mackay, J., Mensah, G.A., Mendis, S., Greenlund, K., 2004. The atlas of heart disease and stroke. World Health Organization.
Mathers, C.D., Lopez, A.D., Murray, C.J.L., 2006. The Burden of Disease and Mortality by Condition: Data, Methods, and Results for 2001. In: Global Burden of Disease and Risk Factors. Lopez, A.D., Mathers, C.D., Ezzati, M., Jamison, D.T., Murray, C.J.L., (editors). World Bank, Washington (DC), USA.
Meyer, K., Simmet, A., Arnold, M., Mattle, H., Nedeltchev, K., 2009. Stroke events, and case fatalities in Switzerland based on hospital statistics and cause of death statistics. Swiss Med Wkly. 139, 65-69.

Miller, E.L., Murray, L., Richards, L., Zorowitz, R.D., Bakas, T., Clark, P., Billinger, S.A., Assoc, A.H., 2010. Comprehensive Overview of Nursing and Interdisciplinary Rehabilitation Care of the Stroke Patient: A Scientific Statement From the American Heart Association. Stroke. 41, 2402-2448.
Miltenberger, R.G., 2011. Behavior modification: Principles and procedures. Cengage Learning.
Molina-Luna, K., Pekanovic, A., Rohrich, S., Hertler, B., Schubring-Giese, M., Rioult-Pedotti, M.S., Luft, A.R., 2009. Dopamine in motor cortex is necessary for skill learning and synaptic plasticity. PLoS One. 4, e7082.
Morgante, F., Espay, A.J., Gunraj, C., Lang, A.E., Chen, R., 2006. Motor cortex plasticity in Parkinson's disease and levodopa-induced dyskinesias. Brain. 129, 1059-1069.
Muller, H., Sternad, D., 2004. Decomposition of variability in the execution of goal-oriented tasks: three components of skill improvement. J Exp Psychol Hum Percept Perform. 30, 212-233.
Murayama, K., Matsumoto, M., Izuma, K., Matsumoto, K., 2010. Neural basis of the undermining effect of monetary reward on intrinsic motivation. Proc Natl Acad Sci U S A. 107, 20911-20916.
Murphy, T.H., Corbett, D., 2009. Plasticity during stroke recovery: from synapse to behaviour. Nat Rev Neurosci. 10, 861-872.
Nees, F., Vollstadt-Klein, S., Fauth-Buhler, M., Steiner, S., Mann, K., Poustka, L., Banaschewski, T., Buchel, C., Conrod, P.J., Garavan, H., Heinz, A., Ittermann, B., Artiges, E., Paus, T., Pausova, Z., Rietschel, M., Smolka, M.N., Struve, M., Loth, E., Schumann, G., Flor, H., Consortium, I., 2012. A target sample of adolescents and reward processing: same neural and behavioral correlates engaged in common paradigms? Exp Brain Res. 223, 429-439.
Nestor, L., Hester, R., Garavan, H., 2010. Increased ventral striatal BOLD activity during non-drug reward anticipation in cannabis users. Neuroimage. 49, 1133-1143.
Nichols-Larsen, D.S., Clark, P.C., Zeringue, A., Greenspan, A., Blanton, S., 2005. Factors influencing stroke survivors' quality of life during subacute recovery. Stroke. 36, 1480-1484.
Nielsen, J.B., Willerslev-Olsen, M., Christiansen, L., Lundbye-Jensen, J., Lorentzen, J., 2015. Science-based neurorehabilitation: recommendations for neurorehabilitation from basic science. J Mot Behav. 47, 7-17.
Nielsen, M.O., Rostrup, E., Wulff, S., Bak, N., Broberg, B.V., Lublin, H., Kapur, S., Glenthoj, B., 2012a. Improvement of Brain Reward Abnormalities by Antipsychotic Monotherapy in Schizophrenia. Arch Gen Psychiatry. 69, 1195-1204.
Nielsen, M.O., Rostrup, E., Wulff, S., Bak, N., Lublin, H., Kapur, S., Glenthoj, B., 2012b. Alterations of the brain reward system in antipsychotic naive schizophrenia patients. Biol Psychiatry. 71, 898-905.
Nijenhuis, S.M., Prange, G.B., Amirabdollahian, F., Sale, P., Infarinato, F., Nasr, N., Mountain, G., Hermens, H.J., Stienen, A.H., Buurke, J.H., 2015. Feasibility study into self-administered training at home using an arm and hand device with motivational gaming environment in chronic stroke. J Neuroeng Rehabil. 12, 1.
Nudo, R.J., 2003. Adaptive plasticity in motor cortex: implications for rehabilitation after brain injury. J Rehabil Med. 7-10.
O'Doherty, J., Dayan, P., Schultz, J., Deichmann, R., Friston, K.J., Dolan, R.J., 2004. Dissociable roles of ventral and dorsal striatum in instrumental conditioning. Science. 304, 452-454.
O'Doherty, J.P., Deichmann, R., Critchley, H.D., Dolan, R.J., 2002. Neural responses during anticipation of a primary taste reward. Neuron. 33, 815-826.

O'Doherty, J.P., Dayan, P., Friston, K., Critchley, H., Dolan, R.J., 2003. Temporal difference models and reward-related learning in the human brain. Neuron. 38 , 329-337. Oldfield, R.C., 1971. The assessment and analysis of handedness: the Edinburgh inventory. Neuropsychologia. 9 , 97-113. Pagnoni, G., Zink, C.F., Montague, P.R., Berns, G.S., 2002. Activity in human ventral striatum locked to errors of reward prediction. Nat Neurosci. 5 , 97-98. Palminteri, S., Boraud, T., Lafargue, G., Dubois, B., Pessiglione, M., 2009. Brain Hemispheres Selectively Track the Expected Value of Contralateral Options. J Neurosci. 29 , 13465- 13472. Parker, V.M., Wade, D.T., Langton Hewer, R., 1986. Loss of arm function after stroke: measurement, frequency, and recovery. Int Rehabil Med. 8 , 69-73. Patel, K.T., Stevens, M.C., Meda, S.A., Muska, C., Thomas, A.D., Potenza, M.N., Pearlson, G.D., 2013. Robust Changes in Reward Circuitry During Reward Loss in Current and Former Cocaine Users During Performance of a Monetary Incentive Delay Task. Biol Psychiatry. 74 , 529-537. Pessiglione, M., Schmidt, L., Draganski, B., Kalisch, R., Lau, H., Dolan, R.J., Frith, C.D., 2007. How the brain translates money into force: A neuroimaging study of subliminal motivation. Science. 316 , 904-906. Peters, J., Buchel, C., 2011. The neural mechanisms of inter-temporal decision-making: understanding variability. Trends Cogn Sci. 15 , 227-239. Poldrack, R.A., Clark, J., Pare-Blagoev, E.J., Shohamy, D., Moyano, J.C., Myers, C., Gluck, M.A., 2001. Interactive memory systems in the human brain. Nature. 414 , 546-550. Porcelli, A.J., Lewis, A.H., Delgado, M.R., 2012. Acute stress influences neural circuits of reward processing. Front Neurosci. 6 , 157. Prabhakaran, S., Zarahn, E., Riley, C., Speizer, A., Chong, J.Y., Lazar, R.M., Marshall, R.S., Krakauer, J.W., 2008. Inter-individual variability in the capacity for motor recovery after ischemic stroke. Neurorehabil Neural Repair. 22 , 64-71. 
Price, T.F., Peterson, C.K., Harmon-Jones, E., 2012. The emotive neuroscience of embodiment. Motivation and Emotion. 36, 27-37.
Rademacher, L., Krach, S., Kohls, G., Irmak, A., Grunder, G., Spreckelmeyer, K.N., 2010. Dissociation of neural networks for anticipation and consumption of monetary and social rewards. Neuroimage. 49, 3276-3285.
Ramnani, N., Miall, R.C., 2003. Instructed delay activity in the human prefrontal cortex is modulated by monetary reward expectation. Cereb Cortex. 13, 318-327.
Ramnani, N., Elliott, R., Athwal, B.S., Passingham, R.E., 2004. Prediction error for free monetary reward in the human prefrontal cortex. Neuroimage. 23, 777-786.
Ranganathan, R., Newell, K.M., 2010. Influence of motor learning on utilizing path redundancy. Neurosci Lett. 469, 416-420.
Reis, J., Schambra, H.M., Cohen, L.G., Buch, E.R., Fritsch, B., Zarahn, E., Celnik, P.A., Krakauer, J.W., 2009. Noninvasive cortical stimulation enhances motor skill acquisition over multiple days through an effect on consolidation. Proc Natl Acad Sci U S A. 106, 1590-1595.
Reitman, D., 1998. The real and imagined harmful effects of rewards: implications for clinical practice. J Behav Ther Exp Psychiatry. 29, 101-113.
Rioult-Pedotti, M.S., Friedman, D., Donoghue, J.P., 2000. Learning-induced LTP in neocortex. Science. 290, 533-536.
Robinson, O.J., Frank, M.J., Sahakian, B.J., Cools, R., 2010. Dissociable responses to punishment in distinct striatal regions during reversal learning. Neuroimage. 51, 1459-1467.
Rogers, R.D., Ramnani, N., Mackay, C., Wilson, J.L., Jezzard, P., Carter, C.S., Smith, S.M., 2004. Distinct portions of anterior cingulate cortex and medial prefrontal cortex are activated by reward processing in separable phases of decision-making cognition. Biol Psychiatry. 55, 594-602.
Russell, C., Li, K., Malhotra, P.A., 2013. Harnessing motivation to alleviate neglect. Front Hum Neurosci. 7, 230.
Ryan, R.M., Deci, E.L., 2000. Intrinsic and extrinsic motivations: Classic definitions and new directions. Contemp Educ Psychol. 25, 54-67.
Ryan, R.M., Deci, E.L., 2007. Active human nature: Self-determination theory and the promotion and maintenance of sport, exercise, and health. In: Intrinsic motivation and self-determination in exercise and sport. Hagger, M.S., Chatzisarantis, N.L.D. (editors). Human Kinetics, Champaign (IL), 1-19.
Sacco, R.L., Kasner, S.E., Broderick, J.P., Caplan, L.R., Culebras, A., Elkind, M.S., George, M.G., Hamdan, A.D., Higashida, R.T., Hoh, B.L., 2013. An updated definition of stroke for the 21st century: a statement for healthcare professionals from the American Heart Association/American Stroke Association. Stroke. 44, 2064-2089.
Salamone, J.D., Correa, M., 2002. Motivational views of reinforcement: implications for understanding the behavioral functions of nucleus accumbens dopamine. Behav Brain Res. 137, 3-25.
Salmoni, A.W., Schmidt, R.A., Walter, C.B., 1984. Knowledge of results and motor learning: a review and critical reappraisal. Psychol Bull. 95, 355-386.
Schaechter, J.D., 2004. Motor rehabilitation and brain plasticity after hemiparetic stroke. Prog Neurobiol. 73, 61-72.
Schlagenhauf, F., Juckel, G., Koslowski, M., Kahnt, T., Knutson, B., Dembler, T., Kienast, T., Gallinat, J., Wrase, J., Heinz, A., 2008. Reward system activation in schizophrenic patients switched from typical neuroleptics to olanzapine. Psychopharmacology. 196, 673-684.
Schmidt, R.A., Lee, T., 1988. Motor learning and control. Human Kinetics, Champaign (IL), USA.
Schmidt, R.A., 1991. Frequent augmented feedback can degrade learning: Evidence and interpretations. In: Tutorials in motor neuroscience. Requin, J., Stelmach, G.E., (editors). Kluwer Academic Publishers, Dordrecht, the Netherlands, 59-75.
Schott, B.H., Minuzzi, L., Krebs, R.M., Elmenhorst, D., Lang, M., Winz, O.H., Seidenbecher, C.I., Coenen, H.H., Heinze, H.J., Zilles, K., Duzel, E., Bauer, A., 2008. Mesolimbic functional magnetic resonance imaging activations during reward anticipation correlate with reward-related ventral striatal dopamine release. J Neurosci. 28, 14311-14319.
Schultz, W., 1998. Predictive reward signal of dopamine neurons. J Neurophysiol. 80, 1-27.
Schultz, W., 2000. Multiple reward signals in the brain. Nat Rev Neurosci. 1, 199-207.
Schultz, W., Tremblay, L., Hollerman, J.R., 2000. Reward processing in primate orbitofrontal cortex and basal ganglia. Cereb Cortex. 10, 272-284.
Schultz, W., 2006. Behavioral theories and the neurophysiology of reward. Annu Rev Psychol. 57, 87-115.
Schumann, G., Coin, L.J., Lourdusamy, A., Charoen, P., Berger, K.H., Stacey, D., Desrivieres, S., Aliev, F.A., Khan, A.A., Amin, N., Aulchenko, Y.S., Bakalkin, G., Bakker, S.J., Balkau, B., Beulens, J.W., Bilbao, A., de Boer, R.A., Beury, D., Bots, M.L., Breetvelt, E.J., Cauchi, S., Cavalcanti-Proenca, C., Chambers, J.C., Clarke, T.K., Dahmen, N., de Geus, E.J., Dick, D., Ducci, F., Easton, A., Edenberg, H.J., Esko, T., Fernandez-Medarde, A., Foroud, T., Freimer, N.B., Girault, J.A., Grobbee, D.E., Guarrera, S., Gudbjartsson, D.F., Hartikainen, A.L., Heath, A.C., Hesselbrock, V., Hofman, A., Hottenga, J.J., Isohanni, M.K., Kaprio, J., Khaw, K.T., Kuehnel, B., Laitinen, J., Lobbens, S., Luan, J.A., Mangino, M., Maroteaux, M., Matullo, G., McCarthy, M.I., Mueller, C., Navis, G., Numans, M.E., Nunez, A., Nyholt, D.R., Onland-Moret, C.N., Oostra, B.A., O'Reilly, P.F., Palkovits, M., Penninx, B.W., Polidoro, S., Pouta, A., Prokopenko, I., Ricceri, F., Santos, E., Smit, J.H., Soranzo, N., Song, K., Sovio, U., Stumvoll, M., Surakk, I., Thorgeirsson, T.E., Thorsteinsdottir, U., Troakes, C., Tyrfingsson, T., Tonjes, A., Uiterwaal, C.S., Uitterlinden, A.G., van der Harst, P., van der Schouw, Y.T., Staehlin, O., Vogelzangs, N., Vollenweider, P., Waeber, G., Wareham, N.J., Waterworth, D.M., Whitfield, J.B., Wichmann, E.H., Willemsen, G., Witteman, J.C., Yuan, X., Zhai, G.J., Zhao, J.H., Zhang, W.H., Martin, N.G., Metspalu, A., Doering, A., Scott, J., Spector, T.D., Loos, R.J., Boomsma, D.I., Mooser, V., Peltonen, L., Stefansson, K., van Duijn, C.M., Vineis, P., Sommer, W.H., Kooner, J.S., Spanagel, R., Heberlein, U.A., Jarvelin, M.R., Elliott, P., 2011. Genome-wide association and genetic functional studies identify autism susceptibility candidate 2 gene (AUTS2) in the regulation of alcohol consumption. Proc Natl Acad Sci U S A. 108, 7119-7124.
Sescousse, G., Caldu, X., Segura, B., Dreher, J.C., 2013. Processing of primary and secondary rewards: a quantitative meta-analysis and review of human functional neuroimaging studies. Neurosci Biobehav Rev. 37, 681-696.
Shadmehr, R., Wise, S.P., 2005. The computational neurobiology of reaching and pointing: a foundation for motor learning. MIT Press, Cambridge (MA), USA.
Shmuelof, L., Krakauer, J.W., Mazzoni, P., 2012. How is a motor skill learned? Change and invariance at the levels of task success and trajectory control. J Neurophysiol. 108, 578-594.
Simon, J.J., Biller, A., Walther, S., Roesch-Ely, D., Stippich, C., Weisbrod, M., Kaiser, S., 2010. Neural correlates of reward processing in schizophrenia - Relationship to apathy and depression. Schizophr Res. 118, 154-161.
Sinha, R., 2009. Stress and addiction: a dynamic interplay of genes, environment, and drug intake. Biol Psychiatry. 66, 100-101.
Spence, J.T., 1970. The distracting effects of material reinforcers in the discrimination learning of lower- and middle-class children. Child Dev. 41, 103-111.
Stacey, D., Bilbao, A., Maroteaux, M., Jia, T.Y., Easton, A.C., Longueville, S., Nymberg, C., Banaschewski, T., Barker, G.J., Buchel, C., Carvalho, F., Conrod, P.J., Desrivieres, S., Fauth-Buhler, M., Fernandez-Medarde, A., Flor, H., Gallinat, J., Garavan, H., Bokde, A.L.W., Heinz, A., Ittermann, B., Lathrop, M., Lawrence, C., Loth, E., Lourdusamy, A., Mann, K.F., Martinot, J.L., Nees, F., Palkovits, M., Paus, T., Pausova, Z., Rietschel, M., Ruggeri, B., Santos, E., Smolka, M.N., Staehlin, O., Jarvelin, M.R., Elliott, P., Sommer, W.H., Mameli, M., Muller, C.P., Spanagel, R., Girault, J.A., Schumann, G., Consortium, I., 2012. RASGRF2 regulates alcohol-induced reinforcement by influencing mesolimbic dopamine neuron activity and dopamine release. Proc Natl Acad Sci U S A. 109, 21128-21133.
Steingruber, H., Lienert, G., 1971. Hand-Dominanz-Test (HDT). Hogrefe, Göttingen, Germany.
Stoy, M., Schlagenhauf, F., Schlochtermeier, L., Wrase, J., Knutson, B., Lehmkuhl, U., Huss, M., Heinz, A., Strohle, A., 2011. Reward processing in male adults with childhood ADHD - a comparison between drug-naive and methylphenidate-treated subjects. Psychopharmacology. 215, 467-481.
Strohle, A., Stoy, M., Wrase, J., Schwarzer, S., Schlagenhauf, F., Huss, M., Hein, J., Nedderhut, A., Neumann, B., Gregor, A., Juckel, G., Knutson, B., Lehmkuhl, U., Bauer, M., Heinz, A., 2008. Reward anticipation and outcomes in adult males with attention-deficit/hyperactivity disorder. Neuroimage. 39, 966-972.
Studer, B., Knecht, S., 2016. A benefit–cost framework of motivation for a specific activity. In: Progress in Brain Research. Volume 229, Studer, B., Knecht, S., (editors). Elsevier, 25-47.
Thorndike, E.L., 1931. Human learning. The Century Co, New York (NY), USA.

Torrubia, R., Avila, C., Moltó, J., Caseras, X., 2001. The Sensitivity to Punishment and Sensitivity to Reward Questionnaire (SPSRQ) as a measure of Gray's anxiety and impulsivity dimensions. Personality and individual differences. 31, 837-862.
Towfighi, A., Saver, J.L., 2011. Stroke declines from third to fourth leading cause of death in the United States: historical perspective and challenges ahead. Stroke. 42, 2351-2355.
Treadway, M.T., Buckholtz, J.W., Zald, D.H., 2013. Perceived stress predicts altered reward and loss feedback processing in medial prefrontal cortex. Front Hum Neurosci. 7, 180.
Tricomi, E., Delgado, M.R., McCandliss, B.D., McClelland, J.L., Fiez, J.A., 2006. Performance feedback drives caudate activation in a phonological learning task. J Cogn Neurosci. 18, 1029-1043.
Tricomi, E., Fiez, J.A., 2008. Feedback signals in the caudate reflect goal achievement on a declarative memory task. Neuroimage. 41, 1154-1167.
Tricomi, E., DePasque, S., 2016. The Role of Feedback in Learning and Motivation. In: Recent Developments in Neuroscience Research on Human Motivation. 175-202.
Tricomi, E.M., Delgado, M.R., Fiez, J.A., 2004. Modulation of caudate activity by action contingency. Neuron. 41, 281-292.
Ullsperger, M., von Cramon, D.Y., 2003. Error monitoring using external feedback: Specific roles of the habenular complex, the reward system, and the cingulate motor area revealed by functional magnetic resonance imaging. J Neurosci. 23, 4308-4314.
Vaidya, J.G., Knutson, B., O'Leary, D.S., Block, R.I., Magnotta, V., 2013. Neural sensitivity to absolute and relative anticipated reward in adolescents. PLoS One. 8, e58708.
Veerbeek, J.M., van Wegen, E., van Peppen, R., van der Wees, P.J., Hendriks, E., Rietberg, M., Kwakkel, G., 2014. What Is the Evidence for Physical Therapy Poststroke? A Systematic Review and Meta-Analysis. PLoS One. 9.
Villafuerte, S., Heitzeg, M.M., Foley, S., Yau, W.Y.W., Majczenko, K., Zubieta, J.K., Zucker, R.A., Burmeister, M., 2012. Impulsiveness and insula activation during reward anticipation are associated with genetic variants in GABRA2 in a family sample enriched for alcoholism. Mol Psychiatry. 17, 511-519.
Wachter, T., Lungu, O.V., Liu, T., Willingham, D.T., Ashe, J., 2009. Differential effect of reward and punishment on procedural learning. J Neurosci. 29, 436-443.
Waltz, J.A., Schweitzer, J.B., Ross, T.J., Kurup, P.K., Salmeron, B.J., Rose, E.J., Gold, J.M., Stein, E.A., 2010. Abnormal Responses to Monetary Outcomes in Cortex, but not in the Basal Ganglia, in Schizophrenia. Neuropsychopharmacology. 35, 2427-2439.
White, T.P., Gilleen, J., Shergill, S.S., 2013. Dysregulated but not decreased salience network activity in schizophrenia. Front Hum Neurosci. 7, 65.
Widmer, M., Ziegler, N., Held, J., Luft, A., Lutz, K., 2016. Rewarding feedback promotes motor skill consolidation via striatal activity. In: Progress in Brain Research. Volume 229, Studer, B., Knecht, S., (editors). Elsevier, 303-323.
Widmer, M., Luft, A.R., Lutz, K., 2017. Processing of Motor Performance Related Reward After Stroke. In: Converging Clinical and Engineering Research on Neurorehabilitation II: Proceedings of the 3rd International Conference on NeuroRehabilitation (ICNR2016), October 18-21, 2016, Segovia, Spain. Ibáñez, J., González-Vargas, J., Azorín, J.M., Akay, M., Pons, J.L., (editors). Springer International Publishing, Cham, 1019-1023.
Winstein, C.J., Merians, A.S., Sullivan, K.J., 1999. Motor learning after unilateral brain damage. Neuropsychologia. 37, 975-87.
Winstein, C.J., Wolf, S.L., Dromerick, A.W., et al., 2016. Effect of a task-oriented rehabilitation program on upper extremity recovery following motor stroke: The ICARE randomized clinical trial. JAMA. 315, 571-581.

Wittmann, F., Lambercy, O., Gonzenbach, R.R., van Raai, M.A., Hover, R., Held, J., Starkey, M.L., Curt, A., Luft, A., Gassert, R., 2015. Assessment-driven arm therapy at home using an IMU-based virtual reality system. In: 2015 IEEE International Conference on Rehabilitation Robotics (ICORR). IEEE, 707-712.
Wittmann, F., Held, J.P., Lambercy, O., Starkey, M.L., Curt, A., Hover, R., Gassert, R., Luft, A.R., Gonzenbach, R.R., 2016. Self-directed arm therapy at home after stroke with a sensor-based virtual reality training system. J Neuroeng Rehabil. 13, 75.
Wolpert, D.M., Ghahramani, Z., Jordan, M.I., 1995. An internal model for sensorimotor integration. Science. 269, 1880-1882.
Wu, C.C., Samanez-Larkin, G.R., Katovich, K., Knutson, B., 2014. Affective traits link to reliable neural markers of incentive anticipation. Neuroimage. 84, 279-289.
Wunderlich, K., Rangel, A., O'Doherty, J.P., 2009. Neural computations underlying action-based decision making in the human brain. Proc Natl Acad Sci U S A. 106, 17199-17204.
Yacubian, J., Glascher, J., Schroeder, K., Sommer, T., Braus, D.F., Buchel, C., 2006. Dissociable systems for gain- and loss-related value predictions and errors of prediction in the human brain. J Neurosci. 26, 9530-9537.
Zarahn, E., Alon, L., Ryan, S.L., Lazar, R.M., Vry, M.S., Weiller, C., Marshall, R.S., Krakauer, J.W., 2011. Prediction of Motor Recovery Using Initial Impairment and fMRI 48 h Poststroke. Cereb Cortex. 21, 2712-2721.
Zeiler, S.R., Gibson, E.M., Hoesch, R.E., Li, M.Y., Worley, P.F., O'Brien, R.J., Krakauer, J.W., 2013. Medial Premotor Cortex Shows a Reduction in Inhibitory Markers and Mediates Recovery in a Mouse Model of Focal Stroke. Stroke. 44, 483-489.
Zhang, W.N., Chang, S.H., Guo, L.Y., Zhang, K.L., Wang, J., 2013. The neural correlates of reward-related processing in major depressive disorder: a meta-analysis of functional magnetic resonance imaging studies. J Affect Disord. 151, 531-539.
Zhou, Z., Yu, R., Zhou, X., 2010. To do or not to do? Action enlarges the FRN and P300 effects in outcome evaluation. Neuropsychologia. 48, 3606-3613.
Ziemann, U., Ilic, T.V., Pauli, C., Meintzschel, F., Ruge, D., 2004. Learning modifies subsequent induction of long-term potentiation-like and long-term depression-like plasticity in human motor cortex. J Neurosci. 24, 1666-1672.

Curriculum Vitae Mario Widmer, MSc ETH

Department of Neurology
University Hospital Zurich
Frauenklinikstrasse 26
8091 Zürich

Personal data

Date of birth: 22/04/1986
Place of origin: Gränichen AG, Switzerland
Nationality: Swiss
Civil status: Unmarried

Education

Since 02/2013 PhD Student, Neural Control of Movement Lab, Department of Health Sciences and Technology, ETH Zurich and Department of Neurology, University Hospital Zurich, Switzerland

09/2010 – 08/2012 Master in Human Movement Sciences, ETH Zurich, Switzerland (Final grade: 5.79, Graduation "with distinction")
Major in Exercise Physiology
Master thesis: "The Influence of Posture and Incremental Exercise in Normoxia and Hypoxia on Middle Cerebral Artery Mean Velocity and Internal Jugular Venous Blood Flow"

09/2009 – 06/2015 Teaching Diploma for secondary (Matura) schools in Sport, ETH Zurich, Switzerland

09/2007 – 08/2011 Bachelor in Human Movement Sciences, ETH Zurich, Switzerland

08/2002 – 07/2006 Matura, Kantonsschule Aarau, Switzerland

Academic experience

11/2012 – 12/2012 Research assistant, Institute of Physiology, University of Zurich, Switzerland

09/2011 – 03/2012 Internship, Institute of Physiology, University of Zurich, Switzerland

Peer-reviewed publications

1. Lutz, K., Widmer, M., 2014. What can the monetary incentive delay task tell us about the neural processing of reward and punishment? Neuroscience and Neuroeconomics 3, 33-45.
2. Widmer, M., Ziegler, N., Held, J., Luft, A., Lutz, K., 2016. Rewarding feedback promotes motor skill consolidation via striatal activity. Progress in Brain Research 229, 303-323.
3. Widmer, M., Luft, A.R., Lutz, K., 2017. Processing of Motor Performance Related Reward After Stroke. Converging Clinical and Engineering Research on Neurorehabilitation II, Springer, 1019-1023.
4. Rasmussen P., Widmer M. (co-first), Hilty M.P., Hug M., Sørensen H., Ogoh S., Sato K., Secher N.H., Maggiorini M., Lundby C. Thermodilution-determined Internal Jugular Venous Flow. Medicine & Science in Sports & Exercise, published ahead of print.

Publications in progress

5. Widmer M., Held J.P., Wittmann F., Lambercy O., Lutz K., Luft A.R. Does motivation matter in upper limb rehabilitation after stroke? ArmeoSenso-Reward: Study protocol for a randomized controlled trial. Submitted for publication.
6. Xu J., Ejaz N., Hertler B., Branscheidt M., Widmer M., Faria A.V., Harran M., Cortes J.C., Kim N., Celnik P.A., Kitago T., Luft A.R., Krakauer J.W., Diedrichsen J. Recovery of hand function after stroke: separable systems for finger strength and control. In preparation.

Presentations

"Processing of Motor Performance Related Reward after Stroke", 3rd International Conference on Neurorehabilitation (ICNR), La Granja, Segovia, Spain, October 2016 (Oral and poster presentation)

"Wie beeinflusst Geld den Lernverlauf in einer Geschicklichkeitsaufgabe?" [How does money influence the course of learning in a dexterity task?], BrainFair 2016: Bewegung, Zurich, Switzerland, March 2016 (Oral presentation)

"Rewarding feedback promotes motor skill consolidation via striatal activity", 15th Day of Clinical Research, Zurich, Switzerland, March 2016 (Poster presentation)

"Motor reward processing after stroke", Neurorehabilitation Symposium at Parkhotel Vitznau, Vitznau, Switzerland, March 2016 (Oral presentation)

"Motor Skill Learning in the Context of Feedback and Reward", ZNZ Symposium 2015, Zurich, Switzerland, September 2015 (Poster presentation)

"Motor Skill Learning in the Context of Feedback and Reward", Congress on NeuroRehabilitation and Neural Repair, Maastricht, the Netherlands, May 2015 (Oral presentation)

"Facilitation of motor skill learning through feedback and rewards in healthy young subjects" and "Do enhanced feedback and reward during motor training facilitate motor rehabilitation after stroke?", International Symposium Neuro-Rehab, Kartause Ittingen, Switzerland, March 2015 (Poster presentation)

Awards

Selected finalist – Best Student Contribution Award, 3rd International Conference on Neurorehabilitation (ICNR), La Granja, Segovia, Spain, October 2016

Selected finalist – Young Scientist Award, Congress on NeuroRehabilitation and Neural Repair, Maastricht, the Netherlands, May 2015

Attended courses and seminars

11/2014 Good Clinical Practice (GCP) of the TRREE training program in research ethics evaluation

11/2014 MRI safety course organized by the MR Group of the Institute for Biomedical Engineering, University and ETH Zurich

05/2014 MRI safety course organized by Dr. Kai Lutz, Scientific Director of the cereneo, center for neurology & rehabilitation, Vitznau

Selected courses:
• Introductory Course in Neurosciences I & II (Neuroscience Center Zurich – ZNZ)
• Scientific Programming for Neuroeconomic Experiments (University of Zurich – UZH)
• Application of Matlab in the HMS (ETH Zurich)
• Improving Time and Self-Management Skills (ETH / UZH)
• Responsible Conduct in Research (ETH)
• PhD Retreat of Neuroscience Center Zurich (ZNZ)

Tutoring

2016 – Samara Stulz, Master Thesis ETH, "Processing of Motor Performance Related Reward in Young and Elderly Healthy Adults"

2013 – Nadja Ziegler, Master Thesis ETH, "The Impact of Anodal Transcranial Direct Current Stimulation on Motor Skill Learning in Healthy Subjects"
