The brain mechanisms of

Philip Jean-Richard-dit-Bressel

Doctor of Philosophy Thesis

August 2015

School of Psychology

The University of New South Wales

Abstract

Punishment involves a reduction in responses that lead to negative outcomes. This thesis investigated the behavioural and neural underpinnings of this phenomenon. A review of the literature suggests that punishment reduces responding via instrumental suppression, and not via other forms of learning. Experiment 1 developed a protocol for punishment whereby rats responded on two levers for food, and responses on one lever (punished), but not the other (unpunished), were additionally punished with brief, aversive footshock. Rats selectively reduced responses on the punished lever and also showed a preference for the unpunished lever in a choice test.

The brain mechanisms of this behaviour were investigated in Experiments 2–9 using bilateral cannulations of BLA, AcbSh, PFC, VTA, LHb and dmStr, permitting microinfusions of selected drugs. Inactivations of the BLA using baclofen/muscimol (BM) impaired the acquisition and expression of punishment suppression, but not unpunished choice. This effect was attributable to manipulations of caudal but not rostral BLA. Microinfusion of a GABAA antagonist into the VTA during punishment acquisition resulted in long-lasting impulsivity/punishment insensitivity and hyperactivity, but had more selective effects in other punishment tasks.

Microinfusions of relevant drugs into the AcbSh, PFC, LHb and dmStr did not cause any change in punished leverpressing during punishment acquisition, expression or choice.

These findings were interpreted to mean that: 1) caudal BLA serves a role in encoding the aversive value of the footshock punisher and this encoding is critical for acquiring and expressing punishment suppression; 2) VTA disinhibition during initial punishment perturbs learning-related plasticity, manifesting as an impulsivity/hyperactivity behavioural phenotype; and 3) AcbSh, PFC, LHb and dmStr are not crucial for punishment learning or behaviour. The pathways that mediate the caudal BLA role in punishment, and the precise neural and behavioural loci of the effects observed in the VTA experiment, remain unclear and are avenues for future research. These findings have important implications for considering the effects of punishment within populations with altered BLA (e.g., ) or VTA activity (e.g. ADHD, substance abuse disorders).

Table of Contents

Abstract

Originality Statement

Acknowledgements

Publications and Presentations

Care and Use of Animals

List of Tables

List of Figures

List of Abbreviations

Chapter 1: Punishment Learning and Behaviour

1. Pavlovian and Instrumental Learning

1.1 Pavlovian conditioning

Prediction error

1.2 Instrumental conditioning

Goal-directed behaviour

Current models of instrumental conditioning

Summary

2. Theories of Punishment

2.1 Negative Law of Effect

2.2 Conditioned Emotional Response theory of punishment

Response-dependent stimuli

2.3 Avoidance theory of punishment

2.4 Instrumental suppression

Negative incentive motivation

Response-punisher associations

2.5 Synthesis and conclusions

Chapter 2: Brain Mechanisms of Punishment

1. General Drug Effects on Punishment

1.1 Anxiolytics, GABA and serotonin

Barbiturates

Benzodiazepines

Ethanol

Serotonin and serotonergic anxiolytics

1.2 Dopamine

1.3 Norepinephrine

1.4 Summary

2. Implicated Circuits and Structures

2.1 Gray’s Behavioural Inhibition System

2.2 Amygdala

2.3 Midbrain dopamine circuits

Mesolimbic DA reward coding

Mesolimbic DA inhibition

The LHb-RMTg-DA circuit

The nigrostriatal and indirect basal ganglia pathway

Aversion-coding neurons of the VTA

Summary of midbrain DA systems involved in punishment

2.4 Response-outcome circuit

2.5 Prefrontal cortex

Anterior cingulate cortex

Prelimbic cortex

Infralimbic cortex

Orbitofrontal cortex

Insular cortex

Summary

3. Summary of Chapter 2

4. Aims

Chapter 3: Assessment of multi-phase punishment protocol

Experiment 1

Methods

Results

Discussion

Chapter 4: General Methods

Chapter 5: The Role of BLA and mAcbSh in Punishment

Experiment 2: BLA inactivation

Methods

Results

Experiment 3: mAcbSh inactivation

Methods

Results

Discussion

Chapter 6: The Role of the Prefrontal Cortex in Punishment

Experiment 4: PL inactivation

Methods

Results

Experiment 5: IL inactivation

Methods

Results

Experiment 6: RAIC inactivation

Methods

Results

Discussion

Chapter 7: The Role of Midbrain Dopamine Circuits in Punishment

Experiment 7: VTA disinhibition

Methods

Results

Experiment 8: LHb inactivation

Methods

Results

Experiment 9: dmStr inactivation

Methods

Results

Discussion

Experiment 7: VTA disinhibition

Experiment 8: LHb inactivation

Experiment 9: dmStr inactivation

Chapter 8: General Discussion

A role for the BLA in encoding the aversive value of a punisher

A role for VTA inhibition in punishment-learning plasticity

A limited role for mAcbSh, PFC, LHb and dmStr in punishment

Specific effects of neural manipulations on choice test

IL inactivations

RAIC inactivations

Bidirectional effect of DA antagonists into the dmStr

Methodological Considerations

Future Directions

Concluding Remarks

References

Originality Statement

I hereby declare that this submission is my own work and to the best of my knowledge contains no material previously published or written by another person, nor material which to a substantial extent has been accepted for the award of any other degree or diploma at UNSW or any other educational institution, except where due acknowledgement is made in the thesis.

Any contribution made to the research by others, with whom I have worked at UNSW or elsewhere, is explicitly acknowledged in the thesis. I also declare that the intellectual content of this thesis is the product of my own work, except to the extent of assistance from others in the project’s conception and design, or in style, presentation and linguistic expression as acknowledged.

Signed: ………………………………..

Date: …………………………………..

Acknowledgements

First and foremost, I would like to thank my supervisor Gavan McNally for his tutelage and steadfast support throughout my candidature. Your knowledge and passion for science have been inspirational, and I am sincerely grateful for all the help you have provided over the years, and for the opportunity you have given me to do this research.

I am grateful to Simon Killcross and Marios Panayi for allowing me to use their lab’s locomotor chambers, and many thanks go to Pascal Carrive, Vincent Laurent and Martine Hoffman for lending kind assistance with drugs. I also wish to thank my co-supervisor, Rick Richardson, and Fred Westbrook for their helpful comments.

Thank you to all those who make up the McNally lab, both past and present, who have helped me along the way. In particular, thanks go to Zayra Millan, Helen Nasser, Asheeta Prasad and Lucy EunA Choi for patiently teaching me the various skills required for the procedures employed throughout my PhD, and Shaun Khoo for helping to code the Med-PC programs.

Last but not least, I would like to thank my dear friends and family, who have been an endless source of encouragement and compassion throughout this journey. Thank you for spurring me on, keeping me sane and standing alongside me.

Publications

Jean-Richard-dit-Bressel, P., & McNally, G. P. (2014). The role of the lateral habenula in punishment. PLoS One, 9, e111699.

This publication forms the basis of Experiment 8 within Chapter 7 of the present thesis.

Jean-Richard-dit-Bressel, P., & McNally, G. P. (2015). The role of the basolateral amygdala in punishment. Learning & Memory, 22, 128-137.

This publication forms the basis of Experiments 1 and 2 within Chapters 3 and 5 of the present thesis.

Oral Presentations

Jean-Richard-dit-Bressel, P., & McNally, G. P. (2013). How does punishment suppress behaviour? University of New South Wales, Postgraduate Research Competition, Sydney, Australia.

Jean-Richard-dit-Bressel, P., & McNally, G. P. (2014). Brain mechanisms of punishment. University of New South Wales, Postgraduate Seminar, Sydney, Australia.

Jean-Richard-dit-Bressel, P., & McNally, G. P. (2014). When sticks become carrots: Dopamine disinhibition during punishment causes enduring impulsivity and hyperactivity. University of New South Wales, Science Research Expo & Competition, Sydney, Australia.

Posters

Jean-Richard-dit-Bressel, P., & McNally, G. P. (2013). The role of the LHb in reversal learning and punishment. Dopamine, Alghero, Sardinia.

Jean-Richard-dit-Bressel, P., & McNally, G. P. (2013). How does punishment suppress behaviour? University of New South Wales, Postgraduate Research Competition, Sydney, Australia.

Jean-Richard-dit-Bressel, P., & McNally, G. P. (2013). The roles of the LHb and BLA in punishment. Society for Neuroscience, San Diego, CA.

Jean-Richard-dit-Bressel, P., & McNally, G. P. (2014). The roles of the LHb and BLA in punishment. Australasian Neuroscience Society, Adelaide, Australia.

Jean-Richard-dit-Bressel, P., & McNally, G. P. (2014). When sticks become carrots: Dopamine disinhibition during punishment causes enduring impulsivity and hyperactivity. University of New South Wales, Science Research Expo & Competition, Sydney, Australia.

Jean-Richard-dit-Bressel, P., & McNally, G. P. (2014). The role of the BLA in punishment and aversive choice. Inter-university Neuroscience and Mental Health Conference, Sydney, Australia.

Jean-Richard-dit-Bressel, P., & McNally, G. P. (2014). The roles of the LHb and BLA in punishment. Society for Neuroscience, Washington DC.

Care and Use of Animals

The experiments presented in this thesis were conducted in accordance with the guidelines for the ethical use of animals maintained by the Australian Code of Practice for the Care and Use of Animals for Scientific Purposes (7th Edition), and all procedures were approved by the Animal Care and Ethics Committee at the University of New South Wales. All efforts were made to minimise both suffering and the number of animals used.

List of Tables

Table 1.1 Summary of evidence for each account of punishment.

Table 3.1 Summary of procedure for Experiment 1.

Table 8.1 Summary of experimental findings.

List of Figures

Figure 2.1 Summary of punishment-implicated midbrain DA circuits.

Figure 2.2 Summary of response-outcome circuit.

Figure 2.3 Regions of the rodent prefrontal cortex.

Figure 3.1 Experiment 1: Mean (±SEM) leverpressing and freezing during punishment.

Figure 3.2 Experiment 1: Mean (±SEM) leverpressing during choice test.

Figure 4.1 Summary of general procedure.

Figure 5.1 Experiment 2: Cannulae placements in the BLA.

Figure 5.2 Experiment 2: Mean (±SEM) leverpressing during punishment acquisition.

Figure 5.3 Experiment 2: Mean (±SEM) leverpressing during punishment expression.

Figure 5.4 Experiment 2: Mean (±SEM) leverpressing of caudal and rostral BLA during punishment expression.

Figure 5.5 Experiment 2: Mean (±SEM) leverpressing during choice test.

Figure 5.6 Experiment 2: Locomotor test.

Figure 5.7 Experiment 3: Cannulae placements in the mAcbSh.

Figure 5.8 Experiment 3: Mean (±SEM) leverpressing during punishment acquisition.

Figure 5.9 Experiment 3: Mean (±SEM) leverpressing during punishment expression.

Figure 5.10 Experiment 3: Mean (±SEM) leverpressing during choice test.

Figure 5.11 Experiment 3: Locomotor test.

Figure 6.1 Experiment 4: Cannulae placements in the PL.

Figure 6.2 Experiment 4: Mean (±SEM) leverpressing during punishment acquisition.

Figure 6.3 Experiment 4: Mean (±SEM) leverpressing during punishment expression.

Figure 6.4 Experiment 4: Mean (±SEM) leverpressing during choice test.

Figure 6.5 Experiment 5: Cannulae placements in the IL.

Figure 6.6 Experiment 5: Mean (±SEM) leverpressing during punishment acquisition.

Figure 6.7 Experiment 5: Mean (±SEM) leverpressing during punishment expression.

Figure 6.8 Experiment 5: Mean (±SEM) leverpressing during choice test.

Figure 6.9 Experiment 6: Cannulae placements in the RAIC.

Figure 6.10 Experiment 6: Mean (±SEM) leverpressing during punishment acquisition.

Figure 6.11 Experiment 6: Mean (±SEM) leverpressing during punishment expression.

Figure 6.12 Experiment 6: Mean (±SEM) leverpressing during choice test.

Figure 7.1 Experiment 7: Cannulae placements in the VTA.

Figure 7.2 Experiment 7: Mean (±SEM) leverpressing during punishment acquisition.

Figure 7.3 Experiment 7: Mean (±SEM) leverpressing during punishment expression.

Figure 7.4 Experiment 7: Mean (±SEM) leverpressing during choice test.

Figure 7.5 Experiment 7: Locomotor test.

Figure 7.6 Experiment 8: Cannulae placements in the LHb.

Figure 7.7 Experiment 8: Mean (±SEM) leverpressing during punishment acquisition.

Figure 7.8 Experiment 8: Mean (±SEM) leverpressing during punishment expression.

Figure 7.9 Experiment 8: Mean (±SEM) leverpressing during extended punishment expression.

Figure 7.10 Experiment 8: Mean (±SEM) leverpressing during choice test.

Figure 7.11 Experiment 8: Locomotor test.

Figure 7.12 Experiment 9: Cannulae placements in the dmStr.

Figure 7.13 Experiment 9: Mean (±SEM) leverpressing during punishment acquisition.

Figure 7.14 Experiment 9: Mean (±SEM) leverpressing during punishment expression.

Figure 7.15 Experiment 9: Mean (±SEM) leverpressing during choice test.

List of Abbreviations

5-HT serotonin

Acb nucleus accumbens

AcbC nucleus accumbens core

AcbSh nucleus accumbens shell

ACC anterior cingulate cortex

Acq acquisition

ADHD attention deficit hyperactivity disorder

AMPA α-amino-3-hydroxy-5-methylisoxazole-4-propionate

ANOVA analysis of variance

AP anterior-posterior

ATD acute tryptophan depletion

BAS behavioural activation system

Bic bicuculline

BIS behavioural inhibition system

BLA lateral and basolateral nucleus of the amygdala

BM baclofen/muscimol

Bup bupivacaine

BZ benzodiazepine

Ca2+ calcium

CeA central nucleus of the amygdala

CER conditioned emotional response

ChR2 channelrhodopsin-2

cm centimetre

CNS central nervous system

CPA conditioned place aversion

CR conditioned response

CRF continuous reinforcement

CS conditioned stimulus

DA dopamine

DAr dopamine receptor

df degrees of freedom

dmPFC dorsomedial prefrontal cortex

dmStr dorsomedial striatum

dStr dorsal striatum

DV dorsal-ventral

EP entopeduncular nucleus

Exp experiment

eYFP enhanced yellow fluorescent protein

FI fixed interval

fMRI functional magnetic resonance imaging

fr fasciculus retroflexus

FR fixed ratio

g gram

GABA gamma-aminobutyric acid

GP globus pallidus

GPi internal segment of the globus pallidus

GPe external segment of the globus pallidus

Hb habenular complex

hr hour

Hz hertz

i.p intraperitoneal

ICSS intracranial self-stimulation

kg kilogram

IL infralimbic cortex

Lat latencies

LC locus coeruleus

LHb lateral habenula

LiCl lithium chloride

LP leverpressing

lPFC lateral prefrontal cortex

LTD long-term depression

LTP long-term potentiation

mA milliamp

mAcbSh medial nucleus accumbens shell

MDT mediodorsal thalamus

MHb medial habenula

Mib mibefradil

mg milligram

min minute

ml millilitre

ML medial-lateral

mm millimetre

mPFC medial prefrontal cortex

mRNA messenger ribonucleic acid

ms millisecond

MSN medium spiny neuron

N/A not available

Na+ sodium

NBQX 2,3-dihydroxy-6-nitro-7-sulfamoylbenzo(F)quinoxaline

NE norepinephrine

NMDA N-methyl-D-aspartate

NMDAr N-methyl-D-aspartate receptor

NpHR halorhodopsin

OFC orbitofrontal cortex

PFC prefrontal cortex

PL prelimbic cortex

Pun punished

r receptor

R-O response-outcome

RAIC rostral agranular insular cortex

RMTg rostromedial mesopontine tegmental nucleus

S stimulus

SD discriminative stimulus

S-R stimulus-response

Sal saline

sec second

SEM standard error of the mean

SN substantia nigra

SNc substantia nigra pars compacta

SNr substantia nigra pars reticulata

SSRI selective serotonin reuptake inhibitor

STN subthalamic nucleus

TH tyrosine hydroxylase

Unp unpunished

US unconditioned stimulus

VI variable interval

vmPFC ventromedial prefrontal cortex

VR variable ratio

vs versus

VT variable time

VTA ventral tegmental area

µm micrometre

Chapter 1: Punishment Learning and Behaviour

Punishment, in its conventional usage, refers to the use of negative outcomes to modify an organism’s behaviour. In this sense, punishment has seen widespread use in various forms throughout society and history; we scold our children to discourage naughtiness, levy fines to curtail unlawful driving, and use punishment to control the behaviour of animals. This diverse range of actions exploits a more general principle: a contingency between behaviour and a negative outcome, such as pain or omission of reward, decreases the likelihood that the behaviour will be emitted again in the future. This broader phenomenon was also termed punishment by Skinner (1953), and this definition will be the one used throughout this thesis.

This general reduction of punished behaviour is highly conserved. It has been studied in mammals, birds and fish (Boe, 1969; Melvin & Ervey, 1973; Pollard & Howard, 1990), consistent with its putative adaptive function. However, despite its pervasiveness in presentation and use, punishment is relatively poorly understood and controversial within scientific disciplines. There are few unifying theories of punishment behaviour or neurophysiology, and the mechanisms of this fundamental process remain unclear. This thesis aims to investigate the brain mechanisms and circuits involved in different aspects of punishment learning and behaviour.

This thesis is organised as follows. This first chapter reviews the behavioural and/or cognitive mechanisms that have been postulated to underlie punishment. It is followed in Chapter 2 by a review of the various neurobiological explanations offered for punishment. Chapters 3–7 present the primary empirical data of the thesis. Chapter 8 discusses these data with reference to the literatures reviewed in Chapters 1 and 2.


1.1 Pavlovian and Instrumental Learning

Before the various explanations of punishment behaviour are considered, it is useful to consider two different ways an organism adapts its behaviour to its environment: Pavlovian and instrumental learning. Both these forms of learning are examples of associative learning, because they involve the modulation of behaviour through the formation and dissolution of associations between events and/or behaviours. The motivational states underlying these associations can generally be understood as either appetitive or aversive, eliciting preservative and protective responses, respectively (Konorski, 1967). Understanding punishment requires a basic understanding of Pavlovian and instrumental learning, hence this section provides a brief and selective review of the relevant literatures.

1.1.1 Pavlovian conditioning

Pavlovian (or classical) conditioning occurs when an originally neutral stimulus (a conditioned stimulus; CS) is paired with a motivationally significant stimulus (an unconditioned stimulus; US) that already elicits a behavioural response (an unconditioned response; UR). Once an association between a CS and a US is formed, the CS comes to elicit a behavioural response (a conditioned response; CR) that is relevant to the US (such as consummatory or preparatory CRs; Konorski, 1967). These conditioned responses can be classified broadly as CS-directed behaviours (e.g. orienting to the source of the CS; Holland, 1977) and US-directed behaviours (e.g. entering the magazine where the food US is delivered; Farwell & Ayres, 1979). The first laboratory demonstration of this was by Pavlov (1927), who reported that if the delivery of food (US) exclusively and consistently followed a clicking metronome (CS), the metronome itself would come to elicit salivation (CR) in the dogs he trained.

These learned behaviours can also be lost via Pavlovian conditioning. For example, Pavlov (1927) reported that if dogs trained on auditory CS-food US pairings subsequently received CS-alone presentations, the salivary CR diminished and was eventually abolished. Pavlov termed this loss of the CR “extinction”. Similarly, rats previously trained on CS-US pairings show a gradual loss of both CS-generated and US-generated behaviours when the CS is presented in the absence of the US.

These concepts equally apply to aversive stimuli. Aversive Pavlovian conditioning typically involves pairing a CS with an aversive US (such as a footshock). This CS gradually comes to elicit conditioned fear behaviour, such as the species-specific defense reaction of freezing in rodents (Bolles, 1970), on its own. Extinction is also observed in conditioned fear, with repeated exposures to the fear CS without the aversive US causing a gradual reduction of fear responses to the CS.

Prediction error

The formation and dissolution of these Pavlovian associations is thought to be driven by surprisingness or prediction error (Kamin, 1969), and not just contiguous pairings of stimuli. Increases in associative strength between a CS and US depend on positive prediction error, where the actual outcome (US) exceeds what was expected (the outcome anticipated on the basis of the CSs present).

A prime example of this need for prediction error, or surprisingness, to learn is demonstrated by the phenomenon known as blocking (Kamin, 1968, 1969). Kamin (1968) trained rats in the experimental group to fear a visual CS by arranging for it to signal an aversive footshock US in Stage I of training. In Stage II, these rats received compound presentations of the visual CS and an auditory CS followed by a footshock. Rats in the control group only received Stage II training, i.e. compound visual and auditory CS presentations followed by footshock. On test, rats were presented with the auditory CS and their fear responses were measured. Kamin (1968) reported that there was significantly less fear to the auditory CS in the experimental group than in the control group, despite both groups receiving the same number of compound CS-US pairings. The prior training of the visual CS in Stage I had blocked fear learning to the auditory CS in Stage II. This blocking of learning about the auditory CS is not observed if the magnitude, duration, or number of footshocks is greater in Stage II than Stage I (as arranged in an unblocking procedure), which has been used to show that reintroducing prediction error allows for associative learning.

Decreases in associative strength can also be explained by prediction error. When the expected outcome is greater than the actual outcome, negative prediction error occurs, causing a reduction in associative strength. This explains extinction, where the associative strength of the CS decreases across trials until the expected outcome matches the actual outcome (i.e. no outcome in the case of extinction), as well as overexpectation (Lattal & Nakajima, 1998; Rescorla, 1973), where CSs lose associative strength due to summed expectations from compound CS presentations exceeding the delivered outcome, and conditioned inhibition, where subjects receive a negative contingency between CS and US, causing negative associative strength to accrue to the CS, making it a conditioned inhibitor (Rescorla, 1969; Rescorla & Lolordo, 1965).

Taken together, the learning phenomena reviewed here give strong support for the notion that Pavlovian learning is driven by prediction error. The acquisition of CRs to a CS occurs when the actual US exceeds the expected US (positive prediction error). The reduction of CRs to a CS occurs when the expected US exceeds the actual US (negative prediction error), as demonstrated in extinction, overexpectation and conditioned inhibition. When no prediction error is present, no learning occurs, as demonstrated by blocking. These learning phenomena of blocking, overexpectation, and conditioned inhibition have been shown in many species, including humans (Baetu & Baker, 2010; Collins & Shanks, 2006; Grillon & Ameli, 2001; Martin & Levey, 1991), and highlight the conclusion that an understanding of Pavlovian learning requires an understanding of prediction error.
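The prediction-error account summarised above can be illustrated with a short simulation. The sketch below assumes the Rescorla-Wagner update rule, a common formalisation that this chapter does not itself name; the learning rate `alpha`, the cue names (A, B, X), the trial counts and the function name are all hypothetical choices made for illustration, not parameters from the experiments cited.

```python
def rescorla_wagner(trials, V=None, alpha=0.3):
    """Apply dV = alpha * (lam - sum of V over the CSs present) on each trial.

    trials: list of (cues, lam) pairs, where cues is a tuple of CS names and
    lam is the outcome actually delivered (0.0 means no US).
    Returns a new dict of associative strengths; the input V is not modified.
    """
    V = dict(V or {})
    for cues, lam in trials:
        error = lam - sum(V.get(c, 0.0) for c in cues)  # prediction error
        for c in cues:
            V[c] = V.get(c, 0.0) + alpha * error
    return V


# Acquisition: the outcome exceeds expectation (positive prediction error).
acquired = rescorla_wagner([(("A",), 1.0)] * 30)

# Extinction: expectation exceeds the outcome (negative prediction error).
extinguished = rescorla_wagner([(("A",), 0.0)] * 30, V=acquired)

# Blocking: A already predicts the US, so AX compound trials carry almost
# no prediction error and X gains little strength. The control group gets
# the same compound trials without prior training of A.
blocked = rescorla_wagner([(("A", "X"), 1.0)] * 30, V=acquired)
control = rescorla_wagner([(("A", "X"), 1.0)] * 30)

# Overexpectation: A and B are each trained to the full outcome, then the
# AB compound is followed by that same single outcome.
a_and_b = rescorla_wagner([(("A",), 1.0), (("B",), 1.0)] * 30)
overexpected = rescorla_wagner([(("A", "B"), 1.0)] * 30, V=a_and_b)

# Conditioned inhibition: A alone is reinforced, the AX compound is not.
inhibition = rescorla_wagner([(("A",), 1.0), (("A", "X"), 0.0)] * 100)

print("blocked X:", blocked["X"], "control X:", control["X"])
print("A after overexpectation:", overexpected["A"])
print("X as conditioned inhibitor:", inhibition["X"])
```

Under these assumptions the blocked cue ends with far less associative strength than the control cue, the overexpected cue loses strength despite continued reinforcement, and the inhibitor acquires negative strength, in line with the qualitative pattern described in the text.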


In summary, behavioural responses can be elicited by environmental stimuli, which can be motivationally significant by nature (USs) or motivationally significant through the formation of associations with these USs (CSs). These associations between CSs and USs are acquired on the basis of a CS’s predictive value of the US, which is driven through a mechanism of prediction error. These CSs will come to elicit CRs corresponding to the motivational valence of the US (appetitive or aversive) if the CS has an excitatory association with that US, or suppress activation of that motivational system if the CS has an inhibitory association with that US. Thus, CSs acquire control over behavioural repertoires relating to fear (excitation of the aversive system), relief (inhibition of the aversive system), approach (excitation of the appetitive system) and frustration (inhibition of the appetitive system). However, the behaviours themselves are not involved in the learning that takes place; they are simply outputs/by-products of the association between stimuli.

1.1.2 Instrumental conditioning

In Pavlovian learning, behaviour is controlled by its antecedents. It is emitted largely as a consequence of the CS and/or US presentations. However, much of human and other animal behaviour is controlled by its consequences. In this sense, behaviour can be instrumental, i.e. causal or effective, at bringing about or preventing the presentation of stimuli. These consequences of behaviour are conventionally known to modify subsequent behaviour, with positive consequences increasing the likelihood of repeating antecedent behaviours and negative consequences decreasing the likelihood of repeating antecedent behaviours.

The first laboratory demonstration of this was provided by Thorndike (1898), who trained cats to make certain responses to escape from a puzzle box. He found that cats would acquire the necessary responses on a trial-and-error basis. From this finding he proposed that the connection between a stimulus and response (S-R association) is strengthened if the consequence of that connection is a “satisfying state of affairs”. In the particular case of his experiments, he posited that each time the cats escaped and received the food, the S-R connection (stimuli within the puzzle box and response necessary to escape) would be strengthened, such that the stimuli within the puzzle box would increasingly activate the particular responses that caused the cat to escape. This constituted the first half of his “law of effect” (Thorndike, 1911).

The second half of the law of effect is the first half’s inverse, and stated that an S-R connection is weakened if the consequence of that connection is an “annoying state of affairs”. The theory was that, over the course of training, behaviours instrumental to achieving a satisfying state of affairs are “stamped in”, i.e. strengthened and maintained, while irrelevant behaviours that did not result in a good outcome are “stamped out”, i.e. weakened and discarded. Though Thorndike used this second half of the law to account for punishment, which involves an annoying state of affairs contingent upon a response, he later abandoned this negatively valenced portion of the law. This will be discussed further in the Theories of Punishment section.

B.F. Skinner elaborated on Thorndike’s work, and in his book The Behavior of Organisms (1938) affirmed the distinction between respondent behaviour, which referred to classically conditioned behaviour, and operant behaviour. Operant, or instrumental, behaviour is not elicited by a CS or US, but is instead emitted as a function of its previous contingency with outcomes. He argued that instrumental learning accounted for the acquisition of non-innate behaviours like leverpressing in rats (Skinner, 1932), though this argument is first credited to Miller and Konorski (1969), who argued for separate Type I (Pavlovian) and Type II (instrumental) conditioning processes in 1928.

Skinner (1938) defined the operant as whatever produced a particular outcome, e.g. if depressing a lever was the operant, it did not matter what response the organism performed to depress that lever. This allowed different responses to be the same operant (such as biting or pressing the lever). It also allowed the same response to be part of different operants, under the control of discriminative stimuli (SD) that signalled the contingencies of reinforcement being enforced. Skinner’s most well-known demonstration of this effect was with pigeons. Pecking a button when it was red caused a food pellet to be delivered to a receptacle in the chamber, but pecking the button when it was green did not cause food delivery. Instead, the pigeons would only receive a pellet if they turned around in response to the green button. Over time the pigeons learned to peck the red button and turn in the presence of the green button. Subsequently removing the contingencies of reinforcement led to a decrease of the responses, which was also termed extinction.

Apart from acquisition and extinction of instrumental responding according to SDs, Skinner also showed that different schedules of reinforcement have different effects on this acquisition and extinction. Rates of responding are generally proportional to rate of reinforcement. Response-independent reward is generally an ineffective reinforcer of behaviour (especially when the behaviour is not an innate appetitive response), providing further evidence that Pavlovian and instrumental conditioning are separate mechanisms. Responses that are continuously reinforced (CRF), where every operant response is rewarded, are acquired quickly but also extinguished quickly. Ratio schedules, where reinforcement depends on the number of responses, such as a set number of responses (fixed ratio; FR) or a fluctuating number of responses (variable ratio; VR), support a high rate of responding, and are resistant to extinction (with increasing resistance for higher ratios and VR schedules). A moderate acquisition rate is found if a response is reinforced only after a fixed or varying amount of time (fixed interval/FI and variable interval/VI, respectively).
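The schedule definitions above can be made concrete with a small sketch. It is not the Med-PC code used in this thesis: the class names, the uniform draws used for the variable schedules, and the simulated subject that responds once per second are all illustrative assumptions. The sketch only shows how each schedule decides whether a given response earns a reward; it does not model how animals adjust their responding.

```python
import random


class FixedRatio:
    """Reinforce every n-th response; CRF is the special case n = 1."""

    def __init__(self, n):
        self.n = n
        self.count = 0

    def respond(self, t):
        self.count += 1
        if self.count >= self.n:
            self.count = 0
            return True
        return False


class VariableRatio:
    """Reinforce after a varying number of responses that averages mean_n."""

    def __init__(self, mean_n, rng):
        self.mean_n = mean_n
        self.rng = rng
        self.count = 0
        self.required = self._draw()

    def _draw(self):
        # Uniform on 1 .. 2*mean_n - 1, so the mean requirement is mean_n.
        return self.rng.randint(1, 2 * self.mean_n - 1)

    def respond(self, t):
        self.count += 1
        if self.count >= self.required:
            self.count = 0
            self.required = self._draw()
            return True
        return False


class FixedInterval:
    """Reinforce the first response at least `interval` s after the last reward."""

    def __init__(self, interval):
        self.interval = interval
        self.last_reward = 0.0

    def respond(self, t):
        if t - self.last_reward >= self.interval:
            self.last_reward = t
            return True
        return False


class VariableInterval(FixedInterval):
    """As FixedInterval, but the required wait is redrawn after every reward."""

    def __init__(self, mean_interval, rng):
        self.mean_interval = mean_interval
        self.rng = rng
        super().__init__(self._draw())

    def _draw(self):
        # Uniform on 0 .. 2*mean_interval, so the mean wait is mean_interval.
        return self.rng.uniform(0, 2 * self.mean_interval)

    def respond(self, t):
        rewarded = super().respond(t)
        if rewarded:
            self.interval = self._draw()
        return rewarded


def count_rewards(schedule, seconds=60):
    """Count rewards earned by one response per second for `seconds` seconds."""
    return sum(schedule.respond(float(t)) for t in range(1, seconds + 1))


rng = random.Random(0)
rewards = {
    "CRF": count_rewards(FixedRatio(1)),
    "FR5": count_rewards(FixedRatio(5)),
    "VR5": count_rewards(VariableRatio(5, rng)),
    "FI10": count_rewards(FixedInterval(10)),
    "VI10": count_rewards(VariableInterval(10, rng)),
}
print(rewards)
```

With the same steady stream of responses, the ratio schedules pay out in proportion to the number of responses made, whereas the interval schedules cap the number of rewards by elapsed time regardless of how many responses occur.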

Interestingly, different schedules also elicit different patterns of responding (Ferster & Skinner, 1957). FI schedules of reinforcement show a scalloped response, with the lowest rate of responding following reward, which increases exponentially up until reinforcement. FR schedules produce high and steady rates of responding, interrupted by a pause in responding after reward that is most evident under especially austere ratios. VI and VR schedules support steady rates of responding with little pause after reinforcement.
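The schedule rules described above can be made concrete with a minimal sketch. The class names, the uniform draw used for the VR requirement, and the exponential draw used for the VI interval are all assumptions made for illustration; they are not taken from Ferster and Skinner (1957) or from the procedures used in this thesis.

```python
import random


class FixedRatio:
    """FR-n: every n-th response is reinforced."""

    def __init__(self, n):
        self.n = n
        self.count = 0

    def respond(self, t):
        self.count += 1
        if self.count >= self.n:
            self.count = 0
            return True
        return False


class VariableRatio:
    """VR-n: the response requirement varies around a mean of n."""

    def __init__(self, n, rng):
        self.n = n
        self.rng = rng
        self.count = 0
        self.required = self._draw()

    def _draw(self):
        # Uniform on 1..(2n - 1), so the mean requirement is n (an assumption).
        return self.rng.randint(1, 2 * self.n - 1)

    def respond(self, t):
        self.count += 1
        if self.count >= self.required:
            self.count = 0
            self.required = self._draw()
            return True
        return False


class FixedInterval:
    """FI-s: the first response at least s seconds after the last reinforcer pays off."""

    def __init__(self, s):
        self.s = s
        self.last = 0.0

    def respond(self, t):
        if t - self.last >= self.s:
            self.last = t
            return True
        return False


class VariableInterval:
    """VI-s: like FI, but the required interval is redrawn after each reinforcer."""

    def __init__(self, s, rng):
        self.s = s
        self.rng = rng
        self.last = 0.0
        self.required = self._draw()

    def _draw(self):
        # Exponentially distributed intervals with mean s (an assumption).
        return self.rng.expovariate(1.0 / self.s)

    def respond(self, t):
        if t - self.last >= self.required:
            self.last = t
            self.required = self._draw()
            return True
        return False


# Ten responses on FR-5: only the 5th and 10th are reinforced.
fr5 = FixedRatio(5)
fr_outcomes = [fr5.respond(t) for t in range(10)]

# Responses at irregular times on FI-10 s: time, not response count, gates reward.
fi10 = FixedInterval(10)
fi_outcomes = [fi10.respond(t) for t in (1, 5, 10, 12, 19, 20, 25)]
```

Note that on the ratio schedules the time argument is ignored, so responding faster earns reinforcers faster, whereas on the interval schedules extra responses before the interval elapses earn nothing. The sketch only encodes the schedule rules; it does not model the response patterns themselves.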

In the situation where several operants are available, responding is not fully directed at the more reinforced operant, but is instead split between responses according to their relative reinforcement. This is Herrnstein’s (1961, 1970) matching law. It can also explain the different rates of responding found with different rates of reinforcement in single schedule, single operant situations. The alternative behaviours the organism can perform instead of the operant are under the control of “extraneous reinforcers” that are not manipulated by the experimenter (i.e. they are inherently rewarding).
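In its conventional notation (the symbols here are assumptions and are not defined elsewhere in this chapter), the matching relation for two concurrently available responses, and its single-operant form, can be written as:

```latex
% Two-response matching: relative response rate equals relative reinforcement rate.
\frac{B_1}{B_1 + B_2} = \frac{R_1}{R_1 + R_2}

% Single-operant form: R_e is the reinforcement obtained from all other
% (extraneous) sources, and k is the asymptotic response rate.
B = \frac{k\,R}{R + R_e}
```

Here B denotes a response rate and R the reinforcement rate obtained by that response. The second equation shows why response rate rises with reinforcement rate even when only one operant is scheduled: the operant always competes with unscheduled alternatives, represented by R_e.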

Goal-directed behaviour

Neither Thorndike’s Law of Effect nor Skinner’s theories of reinforcement required the organism to encode the consequences of their behaviour, and the intention of obtaining the reward was not part of their theories of instrumental behaviour. The behaviour might be reinforced with food or other rewards, but they postulated that Thorndike’s cats did not escape the box for the food reward on that trial as much as they escaped a box because of past food. Though these accounts are conceptually concise, several findings suggest that at least some instrumental behaviours are guided by representations of the consequences of actions, and thus can be goal-directed. Evidence for behaviour that involves representations of outcome, and not just antecedent stimuli, comes from demonstrations of reinforcer devaluation (Adams & Dickinson, 1981; Colwill & Rescorla, 1985a) and instrumental contingency degradation (Colwill & Rescorla, 1986; Hammond, 1980; Williams, 1989).

In a reinforcer devaluation protocol, a behaviour is reinforced with a particular reward. After the response has been acquired, the particular reward is devalued in isolation from the response. The level of responding is then tested against a behaviour that was reinforced with a non-devalued reward. Adams and Dickinson (1981) provided the first demonstration of reinforcer devaluation. After training rats to press a lever for food pellets, they gave the rats the pellets (without rats needing to leverpress) followed by an injection of lithium chloride (LiCl) to induce sickness or saline as a control. The rats that received LiCl pressed the lever less in extinction (leverpressing not reinforced) and reacquisition (reinforced with the pellets) test sessions than saline controls, suggesting that the leverpress-pellet association was formed during training and was used at test to guide behaviour. Colwill and Rescorla (1985a) also devalued food rewards by sating the animal with that reward prior to the session, finding a similar decrease in devalued responding compared to non-devalued responding.

In a typical instrumental contingency degradation protocol a response is reinforced in Stage I. In Stage II the reinforcer used in Stage I is presented independently of responding (free reinforcement), eroding the contingency between response and reinforcer. This causes responding for that particular reinforcer to decrease. Williams (1989) tested the various explanations for this established phenomenon by first training rats to nosepoke and leverpress for oil and pellets. He then freely delivered one of the reinforcers on a superimposed variable time (VT) schedule, causing a specific reduction in responding associated with that reward. However, when the extra deliveries of the reinforcer were instead earned on a VI schedule by a third, rod-pressing response, no contingency degradation was found. This rules out reinforcer-specific satiety, and would not be predicted by an S-R account. It instead suggests the contingency between the response and reinforcer is encoded.
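One standard way to quantify the response-reinforcer contingency being degraded is the difference between two conditional probabilities. The notation below is a conventional formalisation offered as a sketch; it is not necessarily the measure used in the studies cited above.

```latex
% Contingency between a response R and an outcome O within a time window:
% the probability of the outcome given a response, minus the probability
% of the outcome given no response.
\Delta P = P(O \mid R) - P(O \mid \neg R)
```

Under this description, free reinforcement in Stage II raises the second term while leaving the first unchanged, so the contingency shrinks towards zero even though every response is still as likely to be followed by the reinforcer as it was in Stage I.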

A conceptually similar finding was that if delivery of the reinforcer was reliably preceded by an environmental stimulus such as a light or tone, the acquisition of the response was impaired relative to conditions in which no stimulus was presented or the stimulus was not correlated with reinforcement (Pearce & Hall, 1978; St. Claire-Smith, 1979a). This was interpreted to show that a CS could overshadow the response-reinforcer contingency, especially if the Pavlovian contingency had greater relative validity (predicted the reward better than responding). This suggests that it is not the absolute value of the contingency between response and outcome that determines the effectiveness of instrumental learning, but instead the relative validity of an outcome’s antecedents, which affects the likelihood of an organism making a link between its actions and the consequences of those actions.

Finally, it has been shown that particular R-O associations can be encoded according to SDs. Rescorla (1991) trained rats to pull a chain for food pellets and press a lever for sucrose solution when a light SD was present. However, these contingencies were reversed when a noise SD was presented, such that chain-pulling led to sucrose and leverpressing led to food pellets. He then devalued one of the outcomes with LiCl and found that rats diminished the devalued response according to its SD (under extinction conditions). This finding was used to suggest that R-O associations can be encoded within a hierarchy of SDs: an S-(R-O) associative structure. Though this interpretation has been disputed (Dickinson, 1994; Holland, 1992), the experiment demonstrates that animals can anticipate the reward they will receive for making a certain response in the presence of certain stimuli.

Current models of instrumental conditioning

Current models of instrumental behaviour incorporate both S-R associations and R-O associations (Balleine & O’Doherty, 2010; de Wit et al., 2012; Griffiths et al., 2014). It has been demonstrated that instrumental responding is initially sensitive to reinforcer devaluation and contingency, but after extended training (overtraining) becomes relatively insensitive to changes in reinforcer value or contingency, including being impaired at switching behaviour when placed on an omission schedule (where reinforcement becomes contingent on not performing the previously reinforced action) (Balleine & Dickinson, 1998; Dickinson et al., 1995, 1998). This has been used to suggest that instrumental behaviour transitions from being flexible, intentional and goal-directed (R-O), to being reflexive and habitual (S-R).


However, though extended training seems to decrease the relative contribution of outcome representation to behaviour and increase stimulus control over instrumental responding, Colwill and Rescorla (1985b) suggested that the R-O association still exerts significant influence over the behaviour with extended training (2 weeks), and even showed that R-O association contributions can increase over the course of training (Colwill & Rescorla, 1988).

Dickinson (1994, 2010) specified that simple and weaker response-outcome contingencies (e.g. an unchanging interval contingency) were most susceptible to growing S-R control over behaviour, whereas studies that employed complex or strong contingencies (FR contingencies and/or multiple contingencies and outcomes) were resistant to overtraining’s effects on instrumental behaviour, as was the case for Colwill and Rescorla’s (1985b, 1988) studies.

It has also been found that goal-directed behaviour can be rescued from S-R control (Coutureau & Killcross, 2003). This implies that these two forms of learning are not mutually exclusive, but are instead two intact processes/systems that exert relative influence over behaviour. The general consensus is that S-R control generally increases over unchanging conditions, but can under some conditions be suppressed, increasing the relative contribution from the R-O system, such that behaviour will return to being goal-directed and thus susceptible to reinforcer devaluation and contingency changes.

Summary

Two forms of behavioural control have been reviewed in this section: Pavlovian learning and instrumental learning. Pavlovian learning involves the control of behaviour by environmental stimuli that have innate or acquired motivational significance (USs and CSs, respectively), whose presentation elicits an automatic behavioural response in concordance with the motivational system (appetitive or aversive) it excites or inhibits. The modulation of CS-US associative strength, which confers motivational significance to the CS, depends on the action of prediction error. Increases in associative strength are driven by positive prediction error, decreases in associative strength are driven by negative prediction error, and no Pavlovian learning is found in situations where there is no prediction error (i.e. when outcomes are completely anticipated based on preceding CSs).
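The three cases in this summary (positive, negative and zero prediction error) can be illustrated with a minimal error-correction sketch. The update rule, the function names and the learning rate of 0.2 are assumptions chosen for illustration; this paragraph does not commit to a specific formal model.

```python
def update(v, outcome, learning_rate=0.2):
    """Return the associative strength of a CS after one trial.

    v: current associative strength of the CS.
    outcome: magnitude of the US actually delivered (0.0 if omitted).
    learning_rate: hypothetical step size, chosen for illustration only.
    """
    prediction_error = outcome - v
    return v + learning_rate * prediction_error


def run(v, outcomes, learning_rate=0.2):
    """Apply the update over a list of trial outcomes and return the history."""
    history = [v]
    for outcome in outcomes:
        v = update(v, outcome, learning_rate)
        history.append(v)
    return history


# CS paired with the US: positive prediction error, strength grows.
acquisition = run(0.0, [1.0] * 30)

# US omitted after training: negative prediction error, strength declines.
extinction = run(acquisition[-1], [0.0] * 30)

# Fully predicted US: zero prediction error, so nothing changes.
unchanged = update(1.0, 1.0)
```

The size of each change shrinks as the prediction improves, which is why the acquisition and extinction curves produced by this rule are negatively accelerated.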

Distinct from Pavlovian learning, instrumental learning involves the modification of behaviour according to the outcome it causes. Behaviours that result in positive outcomes (reward or omission of anticipated aversive outcomes) are reinforced; that is, they become more likely to be emitted again. The pattern of behaviour evoked is dependent on the schedule of reinforcement, and is also subject to discriminative stimuli (SDs) that signal the reinforcement schedule being enforced. Instrumental learning can also be sensitive to changes in the value of the outcome, and in the contingency between the response and outcome, suggesting instrumental behaviour is supported by response-outcome (R-O) associations. Instrumental behaviour is currently thought to be controlled by both goal-directed R-O and habitual stimulus-response (S-R) associations, with the relative contribution of each association type to behaviour being regulated by reinforcement parameters.

1.2 Theories of Punishment

Punishment, the decrease in likelihood of emitting a behaviour that causes a negative outcome, is paradigmatically instrumental but has evaded a successful, coherent theoretical treatment. Both Thorndike and Skinner included punishment within their original models (Thorndike, 1911; Skinner, 1938) but discarded or neglected it in their later treatments of instrumental behaviour (Thorndike, 1932; Skinner, 1953). Since then reinforcement has received significantly more attention, with punishment theories remaining relatively unconsolidated (Boe, 1969; Estes, 1969). Boe (1969) noted that despite renewed interest in the decade leading up to his review, research into punishment had not resulted in a corresponding distilling of theoretical understanding. Estes (1969) also lamented the lack of progress researchers had made in explaining punishment. He suggested that this was in part due to punishment behaviour evading simple and elegant explanations like other variants of appetitive and aversive associative learning (Church, 1963), but also because punishment was seemingly regarded as a secondary or derivative task for learning theory (Estes, 1969).

Given that punishment, like Pavlovian conditioning and reinforcement, affects behaviour, and in particular tends to decrease responding, the role of learning theory is to account for this effect. Four categories of theories have been proposed: the negative law of effect, the conditioned emotional response (CER), avoidance, and instrumental suppression. None of these theories has been significantly elaborated upon in the last 30 years, with the most thorough treatments of their merits being found in Church’s (1963) review, Punishment and aversive behaviour (1969), and chapters 5 and 6 of Mackintosh’s (1983) Conditioning and associative learning.

1.2.1 Negative Law of Effect

The first significant theory of punishment was proposed by Thorndike, and was, as previously stated, the second half of his law of effect. While the first part states that an S-R association is strengthened if followed by a “satisfying state of affairs” (i.e. reinforcement), the second part states that the S-R association is weakened if the consequences of that connection are an “annoying state of affairs” (Thorndike, 1913, p. 4). Given it is the negatively valenced half of the law it has often been referred to as the negative law of effect (Rachlin & Herrnstein, 1969).

The fundamental implication of the negative law of effect is that punishment is an unlearning of an S-R association; the loss of responding following punishment is due to the dissolution of associations that once elicited that response. This makes punishment conceptually identical to his conception of extinction of a reinforced behaviour. Both are due to a weakening of the S-R association by an annoying state of affairs, with the much faster “unlearning” under punishment than under extinction, as measured by the decrement in responding (Estes, 1944), attributed to the relatively more annoying state of affairs of the former compared to the latter. Thorndike later abandoned this position due to the variable and unreliable effect of punishment on behaviour (Thorndike, 1932), and this view was also challenged and rejected by most subsequent learning theorists (Estes, 1969; Skinner, 1953; Rescorla & Solomon, 1967). However, it was still appealed to by later theorists (Azrin & Holz, 1966; Rachlin & Herrnstein, 1969), though that appeal might have been in name only and did not necessarily assume Thorndike’s connectionistic dissolution of S-R associations (Farley & Fantino, 1978).

Evidence for the negative law of effect, apart from the typical reduction in punished responding, comes from studies testing the effect of punishment within the matching law, as mentioned in the Instrumental conditioning subsection (p. 8). When two responses are available, behaviour is precisely apportioned to each response according to the relative rate of reinforcement associated with each response (Herrnstein, 1961, 1970). It has been found that this choice between responses is sensitive to punishment, such that punishment shifts responding away from punished responses to unpunished responses (Farley & Fantino, 1978; Rachlin & Herrnstein, 1969). Rachlin and Herrnstein (1969) trained pigeons to peck two keylights of different colours. Pecking one keylight caused the delivery of a large food reward while pecking the other keylight caused delivery of a small reward, and pigeons came to peck the large reward keylight proportionally more than the small reward keylight. They then began to punish responding on the large reward keylight with electric shocks of increasing intensity. These shocks significantly decreased the rate of responding on the punished response, shifting choice towards the unpunished response. Crucially, this shift follows the same pattern as the matching law, such that response rate has a linear relationship with response value, regardless of whether the outcome of responding is appetitive and/or aversive (Farley & Fantino, 1978). This symmetry of punishment and reinforcement suggests that they have a similar theoretical basis, e.g. both act on an S-R association. It must be noted that these findings were not explicitly used to endorse unlearning, given that findings irreconcilable with unlearning existed at the time (see below), but interpretations were ambiguous and argued that the negative law of effect was preferable to the popular CER and avoidance theories of the time (Farley & Fantino, 1978; Rachlin & Herrnstein, 1969).

Another experiment performed by Rachlin and Herrnstein (1969) was used to support the negative law of effect over alternative theories. They trained pigeons to peck a red button for food, and not peck for food (reinforced non-pecking) when the button was green. After pecking and non-pecking had been acquired according to their SD, the reinforced response was punished with increasing intensities of shock (i.e. pecking during red led to food and shock, not pecking during green led to food and shock). They found that the pecking response decreased over the course of punishment, but the non-pecking remained at baseline.

Rachlin and Herrnstein used this finding to suggest that punishment only applies to specific responses like pecking and not to non-specific responses like non-pecking, which is easily explained by a traditional conception of the negative law of effect but not avoidance theories (this will be discussed later in the section on avoidance theories, p. 25). Using the negative law of effect to explain these results, the pecking of the red button could be explained as an S-R connection (red button stimulus, peck button response) that had been strengthened with food. The green button did not elicit pecking because the green-peck S-R connection was weakened with non-reinforcement, and “acquiring a non-response” would be, on the surface, indistinguishable from just not responding. When the punishment contingency is introduced, the red-peck S-R connection is quickly weakened, resulting in no pecking when the red button is presented (S-R connection dissolved). However, punishment had no effect on non-pecking because there was no S-R connection to weaken.

However, Beavers and Perkins (1977) disagreed with Rachlin and Herrnstein’s conclusion, showing the same result was found when the shock was delivered regardless of response, and argued that the negative law of effect was still effective for nonspecific responses but might be counteracted by response-independent effects (e.g. conditioned suppression; the cessation of a reinforced behaviour on presentation of a fear CS). This punishment-like behaviour suppression by Pavlovian CSs will be discussed in the next subsection.

One of the first findings that suggested the negative law of effect is an insufficient account for punishment was the observation that behaviour returned when the punishment contingency was removed, even under conditions of non-reinforcement (Estes, 1944). In several experiments described in Estes’ (1944) seminal monograph, rats were trained to press a lever for food on an FI-4min schedule. After training, control group rats were given access to the lever under extinction conditions (no reinforcement) for several days, while experimental group rats were also given access to the lever under extinction conditions, but on the first day of extinction were also “severely” punished with electric footshock for each press of the lever (severity was determined by shock causing complete suppression of responding). He found that the punished experimental group pressed the lever significantly less than the unpunished control group on Day 1, suggesting accelerated extinction. However, leverpressing returned in the experimental group on following extinction sessions (unpunished) while leverpressing continued to decline in the control group. In fact, responding in the experimental group would surpass that found in the control group, despite the extinction conditions. Estes pointed out that if punishment simply removed the S-R association this return of responding would not occur. He concluded that punishment involves the suppression of behaviour, not its elimination, and that the behaviour would therefore resurface following removal of the noxious stimulus. He instead subscribed to a conditioned emotional response theory of punishment.

1.2.2 Conditioned Emotional Response theory of punishment

The Conditioned Emotional Response (CER) theory posits that the suppression of behaviour by punishment is due to the acquisition of a competing fear CR, such as freezing or conditioned suppression (the cessation of a reinforced behaviour on presentation of a fear CS; Estes & Skinner, 1941). This theory is particularly subversive for punishment because it suggests no learning about the response is actually involved and that, despite an instrumental contingency being enforced, nothing about punishment behaviour is in fact instrumental. It argues that punishment can be fully accounted for by Pavlovian conditioning.

Estes and Skinner (1941) showed that CSs can effectively suppress reinforced behaviour.

The aversive stimuli typically used for punishment are similar to the aversive USs employed in aversive Pavlovian conditioning. Therefore, the environmental or proprioceptive stimuli that precede a response can act as CSs that are conditioned by the punishment. For instance, if rats were shocked for pressing a lever (i.e. an instrumental punishment contingency), the shock might function as a US for the lever CS. The rats would then cease to press the lever because the lever CS elicits a CR of freezing, conditioned suppression and/or avoidance. This is a particularly problematic conjecture for the belief that punishment is an instrumental phenomenon because any behaviour and aversive outcome necessarily involve potential CSs and USs. It stands to reason that the CER account of punishment, if taken to its extreme, is unfalsifiable.

However, there are several observations that provide evidence that conditioned fear does not fully account for punishment learning and behaviour. The first line of evidence came from studies by Hunt and Brady (1951), who studied the effects of leverpress-dependent and leverpress-independent shocks on behaviour in rats. They reported that response-independent shocks (Pavlovian contingency) caused responses typical of Pavlovian fear, namely freezing, crouching and defecating (see also Estes & Skinner, 1941), whereas response-dependent shocks (punishment contingency) generated a dominant response pattern of abortive leverpressing without noticeable autonomic disturbance (animals moved freely, absence of piloerection and defecation). These differences in the topography of behaviour suggest that punishment does not involve the same type of behavioural control as Pavlovian fear learning. That is, Pavlovian fear results in diffuse, innate fear responses unrelated to the response, whereas punishment selectively suppresses the punished response without a diffuse “anxiety” response (Hunt & Brady, 1955). Thus, the effect of an aversive stimulus on subsequent behaviour is dependent on the contingency controlling the presentation of that stimulus, so equating punishment suppression with Pavlovian fear seems unwarranted.

It has been noted that response-dependent and response-independent shock cause differing levels of suppression, with several studies finding that a punishment contingency is much more effective at reducing a behaviour than a Pavlovian contingency, so long as the shock contingency is enforced (Annau & Kamin, 1961; Azrin, 1956; Goodall, 1984; Hoffman & Fleshler, 1965; Rachlin & Herrnstein, 1969; Schuster & Rachlin, 1968), though one study found more suppression in CER compared to punished pigeons (Orme-Johnson & Yarczower, 1974). It has also been repeatedly shown that fewer shocks can result in greater suppression of responding when shocks are contingent on responding compared to response-independent shocks (Azrin, 1956; Camp et al., 1967; Church, 1969).

The extent to which a noxious stimulus suppresses behaviour more effectively when it is contingent upon the response is particularly evident at lower shock intensities. For example, Annau and Kamin (1961) observed no suppression of bar pressing for food in the presence of a CS signalling the delivery of a 0.28mA footshock. However, if the same 0.28mA shock is used to punish bar pressing in the presence of an SD, rapid suppression of pressing is found. This stark difference between the suppressive effects of response-dependent and response-independent shock has also been found at higher shock intensities. For example, a recent paper by Bouton and Schepers (2015) found maximal suppression when punishing leverpresses on a VI-60-sec schedule with a 0.6mA shock, but virtually no suppression of leverpressing in a group that received response-independent shocks.

Hunt and Brady (1955) trained rats to press a lever with intermittent 3-min presentations of a clicking noise. For the CER group, this noise served as a CS that signalled two 1.5mA shocks (occurring during the CS), whereas it served as an SD for shock for the punishment group (1.5mA shocks for every leverpress at designated times during the SD). While there was no leverpressing during the clicking noise for either group (maximal suppression attributable to high shock intensity), they observed that leverpressing was more greatly suppressed during periods the noise was not presented for the CER group compared to the punished group. This suggests that Pavlovian fear generalizes to unpunished responses more readily than punishment. Also, when the noise no longer signalled shock (i.e. subsequent extinction sessions), responding rapidly returned for the punishment group but not the CER group. This suggests that punishment-suppression is less resistant to extinction than response-independent shocks – punishment offers a relatively transient suppression of behaviour, which is partially why theorists like Skinner abandoned punishment as an effective modifier of behaviour.

However, a significant caveat of this paper was that the number of shocks received by the punishment group was significantly lower than that received by the CER group; despite the potential for far more shock deliveries for the punished group given the punishment contingency used, response suppression was potent enough to reduce shocks received to fewer than half of those received by the CER group (which received only 2 shocks per 3-min clicker trial). Therefore these differences in generalization and extinction could be attributed to the lower number of shocks received by the punished group.

A solution to this issue has been to yoke noxious stimulus presentations in a CER condition to presentations determined in a punishment condition. This causes the intensity, number, and distribution of noxious stimuli to be matched between conditions, with the punishment condition receiving response-dependent punishers while the CER condition receives identical, but response-independent, noxious stimuli. This can be done between-subjects so that a subject in the punishment group determines noxious stimulus presentations for a subject in the CER group, or within-subject so that a punishment trial determines noxious stimulus presentations on a subsequent CER trial. Though there are potential issues with this approach (see Church, 1963, 1964; Kimmel & Terrant, 1968), it provides a valuable control of stimulus presentations when effects of response-dependent and response-independent contingencies are to be compared.
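The logic of the between-subjects yoked design can be sketched as follows. The function names, the 600-second session, the 40 responses per subject and the 0.5 probability of shock per response are all hypothetical values chosen for illustration; they do not describe any of the cited procedures.

```python
import random


def punished_session(response_times, shock_probability, rng):
    """Response-dependent schedule: a shock can only follow a response."""
    return [t for t in response_times if rng.random() < shock_probability]


def yoked_session(master_shock_times):
    """Response-independent schedule: replay the master's shock times unchanged."""
    return list(master_shock_times)


def proportion_response_contingent(response_times, shock_times):
    """Fraction of shocks that coincide with one of the subject's own responses."""
    if not shock_times:
        return 0.0
    responses = set(response_times)
    return sum(t in responses for t in shock_times) / len(shock_times)


rng = random.Random(0)

# Hypothetical response times (in whole seconds) for a master and a yoked subject.
master_responses = sorted(rng.sample(range(600), 40))
yoked_responses = sorted(rng.sample(range(600), 40))

# The master's behaviour determines when shocks occur for both subjects.
master_shocks = punished_session(master_responses, 0.5, rng)
yoked_shocks = yoked_session(master_shocks)

master_contingency = proportion_response_contingent(master_responses, master_shocks)
yoked_contingency = proportion_response_contingent(yoked_responses, yoked_shocks)
```

Both subjects receive the same number and distribution of shocks, so any difference in suppression can be attributed to the response-shock contingency, which is complete for the master and incidental for the yoked subject.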

It was using this yoked procedure that Bouton and Schepers (2015) showed negligible suppression of barpressing following response-independent 0.6mA shocks if yoked to a punished group (that showed maximal suppression). Hoffman and Fleshler (1965) trained pigeons to peck for food. A tone was presented, and the first peck after 2 minutes (FI-2min) caused termination of the tone and delivery of electric shock for a punished pigeon. Presentations of the tone and shock were yoked to another pigeon (yoked group), whose responses had no influence on tone or shock presentations. Along with the usual finding that punishment suppressed responding more than response-independent shocks, they found that yoked pigeons suppressed responding to a greater extent in the pre-tone period compared to punished pigeons, supporting Hunt and Brady’s (1955) notion that suppression driven by response-independent noxious stimuli generalises more than suppression driven by punishment. They also noted that once shock was no longer delivered (punishment extinction) responding returned much faster within the punished group compared to the yoked group, another difference between punishment and Pavlovian fear noted by Hunt and Brady, except in this case number and distribution of shocks were equated. However, an important consideration with this conclusion of less resistance to extinction in punishment is that there was a relatively low level of suppression in the yoked pigeons at the beginning of extinction.

This issue of unequated suppression and/or shock deliveries seems difficult to surmount, given the difference in suppression generated by the different contingencies, despite the same distribution of shocks. However, when either suppression or shock deliveries are equal, punishment contingencies have resulted in quicker recovery of responding once noxious stimuli presentations are omitted (Hoffman & Fleshler, 1965; Hunt & Brady, 1955).

Punished behaviour returns when it is placed in an unpunished context, regardless of whether being returned to a previously unpunished context (ABA renewal) or a new context (ABC renewal) (Bouton & Schepers, 2015; Marchant et al., 2013). This is unlike fear conditioning, which readily generalizes to new contexts, while fear extinction is more context bound (Bouton, 1988). Thus, even if fear (which would suppress responding) is extinguished in one context, fear is still expressed in non-extinguished contexts, i.e. it is subject to ABA, AAB and ABC renewal (Thomas et al., 2003). In other words, response suppression induced by punishment does not result in suppression within other contexts (does not generalise), whereas response suppression induced by Pavlovian fear does generalise, causing suppression within other contexts, even if suppression was fully abolished within another context through fear extinction. It is notable that punishment’s context specificity is more characteristic of Pavlovian fear extinction, the opposite of Pavlovian fear. This contrast between punishment and fear suggests that punishment and fear are encoded and/or retrieved in fundamentally different ways. However, this difference in renewal has yet to be directly compared, in part due to the difficulty in matching suppression while also matching the number of punishers delivered on these different contingencies, preventing any strong inferences about differences in return of responding (Bouton & Schepers, 2015; Brady & Hunt, 1955).

Response-dependent stimuli

The finding that response-contingent punishers induce more suppression than response-noncontingent noxious stimuli, though a serious challenge to Pavlovian accounts, can still be explained using a CER account of punishment; response-contingent punishers might be more effective because the stimuli preceding the shock are more reliable (as the response determines the organism’s location, etc.). Estes (1969) argued that it was the confluence of reliably predictive stimuli that drove the greater suppression found in punishment versus CER contingencies.

One piece of evidence against this notion comes from an experiment by Beauchamp (1966; as reported in Church, 1969). He trained rats to press a lever on a VI-1min schedule. After this a 10-sec white noise was presented (on a VI-2min basis), which either co-terminated with a .25mA shock regardless of responding during that 10-sec noise (CER group) or only if the rat pressed the lever during that 10-sec period (punished group). By arranging for the shock to be delivered at the end of the signal, the contiguity between response (and related stimuli) and shock was weakened, but the overall contingency between response and shock was controlled (response-dependent versus response-independent contingency). Beauchamp found that rats in the punished group suppressed responding during the noise more than the CER group, eventually decreasing the number of shocks received compared to the CER group. Assuming the rats in the punished group were no more likely to receive shocks immediately after/during pressing the lever than the CER group, this finding cannot be explained by CER accounts. Given the fewer number of shocks received by the punished group, if CS-US association were the only type of association involved they should have undergone extinction and pressed the lever more than CER rats.


Another compelling line of evidence against fear of environmental stimuli (particularly location of punishment) driving punishment suppression came from experiments done by Bolles and colleagues (1980). In one experiment they reinforced two separate responses on the same lever (pulling and pressing), and subsequently punished only one of these responses with a 0.4mA shock on an FR-10 schedule. They argued that if a stimulus-shock contingency drove suppression, and not a response-shock contingency, then suppression of both the punished and unpunished response should occur, since both responses occur in the same location, with no overt discriminating stimuli to drive differential responding. They reported that rats learned to suppress the punished response but not the unpunished one, though there was a brief and minor suppression of the unpunished response initially. This specific suppression of the punished response suggests that rats were sensitive to the response-shock contingency. Bolles and colleagues also concluded that the initial suppression of both responses indicated that the CER does exert an effect, but only initially. Attempts to explain these findings using a CER account are extremely limited and would have to resort to relatively untestable CSs such as proprioceptive feedback.

Finally, punishment is subject to the control of SDs signalling the punishment contingency. The CER account would suggest the SD is in fact a fear CS. Azrin (1956) tested this hypothesis and found that pigeons punished on an FI or VI schedule when an orange light was presented (SD) suppressed responding during the orange light more than pigeons that only received shocks a response-independent amount of time after orange light onset (CS). Azrin also found that the response-independent condition showed more suppression of responding when the orange light was not present (fear generalization) and greater resistance to extinction, replicating the findings of Hunt and Brady (1955).

This control by an SD also allows a more powerful test of the Pavlovian fear/CER account of punishment. Goodall and Mackintosh (1987) trained rats to leverpress for food. In Stage 1 (following training), rats continued to leverpress, but sessions also included two 3-minute presentations of a clicker that functioned as an SD signalling punishment (VI-60-sec), and two 3-minute presentations of a tone that functioned as a CS (shocks were yoked to the preceding SD), whereas a control group did not receive any tone or clicker presentations. In Stage 2, rats received leverpress training with a clicker-light or tone-light compound that functioned as an SD signalling a VI-60-sec punishment contingency. If the punishment SD was simply functioning as a CS, then blocking of learning to the light should occur for both groups (since both the clicker and the tone should already account for the shocks being delivered in Stage 2 compound training), and thus no suppression of responding should be found if the light were presented on its own. Instead, they found that rats that had received tone CS-light presentations and control rats did suppress responding to the light, whereas those that had compound clicker SD-light presentations did not suppress responding to the light alone; SD-SD but not CS-SD blocking had occurred. They also showed in another experiment that if the compound stimulus presented in Stage 2 signalled response-independent shock (by yoking it to an SD group), blocking by the fear CS did occur. This shows that a CS can prevent another stimulus from becoming a fear CS, but it does not prevent another stimulus from becoming an SD for a punishment contingency (whereas an SD does). CSs and SDs are therefore not synonymous or interchangeable, which provides a compelling dissociation between Pavlovian and instrumental accounts of punishment (see also Orme-Johnson & Yarczower, 1974).

Taken together, although Pavlovian fear can contribute to behaviour suppression, this literature confirms that response suppression driven by response-independent contingencies has different characteristics to the response suppression caused by response-dependent contingencies. In other words, there is evidence for an independent instrumental effect of punishment beyond the Pavlovian effects of such aversive reinforcement. Indeed, even Estes (1969), the originator of the CER theory, conceded that CER, in this form, is an inadequate explanation of punishment.

1.2.3 Avoidance theory of punishment

The avoidance theory of punishment suggests that the suppression of a punished behaviour reflects the acquisition of an incompatible or competing response. It was advocated by several theorists including Mowrer (1947), Dinsmoor (1954, 1977, 1998) and, to a lesser extent, Solomon (1964). It draws from the well-known finding that specific behaviours are reinforced if they cause the cessation (escape learning) or prevention (active avoidance learning) of an aversive stimulus. This is formally negative reinforcement, because a behaviour is increased by causing the omission of an aversive stimulus. In the case of a punishment contingency, response suppression could be driven by instrumental avoidance, because responses that replaced the punished response were reinforced through punishment omission. By making punishment the by-product of reinforcement, punishment could be explained using a theory of reinforcement, which as previously mentioned seems the preferred domain for learning theorists (Estes, 1969).

Mowrer’s (1947, 1960) two-factor theory is the most frequently invoked avoidance hypothesis of punishment (Church, 1963). Mowrer proposed that the various stimuli preceding the noxious punishment stimulus come to produce fear (the CER hypothesis), and that subsequent behaviours that eliminated this fear (e.g. avoidance of those CSs) were reinforced. Thus his model involved two processes: Pavlovian fear learning, which gives way to an instrumental avoidance response. Although this model has instrumental components, unlike the CER account, its implicit reliance on stimulus-elicited CER to negatively reinforce avoidance means it generally suffers from the same limitations as the CER theory of punishment.


Dinsmoor (1954, 1977) was also a strong advocate of a two-factor conditioned avoidance account of punishment, and maintained this position throughout his writings on the subject, though sometimes appealing to other learning phenomena (such as discrimination learning) to explain the breadth of punishment phenomena. Dinsmoor was in fact primarily interested in escape, which is a procedure that reliably negatively reinforces a behaviour. He noted that if rats were shocked until they pressed a bar, not only would they learn to press that bar but they would also stay close to the bar, holding it in anticipation of the shock. He surmised that this preparatory holding was a result of shock cessation reinforcing a chain of responding that promotes escape. He applied this logic to punishment, suggesting that a shock could reinforce a chain of responses that promote avoidance. Punishment is thereby dealt with parsimoniously using reinforcement theory.

However, as Dunham (1971, 1972) pointed out, although an alternative response may increase as the punished response decreases, the escalation of the former is more readily accounted for by the decline of the latter, rather than the other way around (see also Mackintosh, 1983). Dunham found that rats increased their running in a wheel after drinking was punished, but similarly so when drinking was prevented by removing the water bottle. The common factor driving the increase in behaviour was not punishment acting as a negative reinforcer, but instead a decrease in the drinking behaviour, so that more time was necessarily accorded to other activities.

Shettleworth (1978) came to a similar conclusion. She punished hamsters with electric shock after rearing, scrabbling or grooming, and found that increases in new responses did not predict the extent to which the punished response was suppressed. The results were complex, with the research question of the paper actually being directed at Pavlovian underpinnings of instrumental behavioural change, but the conclusions were that acquisition of competing responses did not account for the effects of punishment.


These findings are not particularly problematic for an avoidance account, because no particular behaviour is reinforced – any competing response that displaces the delivery of punishment will be reinforced. However, it does make avoidance by punishment extremely difficult to examine experimentally, or even unfalsifiable, because there is no control of the competing response, unlike escape learning.

Estes (1969) said as much in his critique of the avoidance hypothesis, pointing out “the notion that suppression of a response by punishment is primarily the result of its displacement by a competing avoidance response was never founded in direct observation of the supposed process of avoidance conditioning” (p. 63). He also noted that a moderate punisher reliably produced suppression in punishment procedures but rarely did in conditioned avoidance procedures, and that the two phenomena have distinct time courses (punishment suppression is produced significantly faster than avoidance behaviour). Lastly he cited the finding that punishment is usually ineffective at inhibiting a negatively reinforced avoidance response, and sometimes even facilitates the avoidance response.

Another line of evidence that suggests punishment is not simply the reinforcement of an alternative response comes from the previously described experiment by Rachlin and Herrnstein (1969) investigating the negative law of effect. Punished non-pecking did not result in pecking, which is a specific response that is readily reinforced. Though this might be influenced by response-independent effects (Beavers & Perkins, 1977), the fact that an avoidance contingency is often not sufficient to reinforce pecking (see Hoffman & Fleshler, 1959) is problematic for an avoidance account of punishment.

The avoidance hypothesis of punishment is largely an extrapolation from reinforcement theory. However, it has been a dominant account of punishment (Church, 1963; Dinsmoor, 1977; Estes, 1969). Given its conceptual basis, it is perhaps worth applying its logic to reinforcement itself. By the same logic, reinforcement could simply be negative punishment (non-reinforced behaviour is punished by reward omission). This might be a facetious point, but it is equally irrefutable. All logical arguments that can be made against a punishment-based reinforcement model are equally relevant to reinforcement-based punishment.

Given this logical interchangeability of the two phenomena, it is worth reaffirming the difference between the two procedures. It is generally agreed that a punishment procedure is one which involves a specific operant/response causing the delivery of a noxious stimulus, leaving the behaviours that do not cause punishment as unspecified (Boe, 1969; Rachlin & Herrnstein, 1969; Solomon, 1964). In the case of an avoidance procedure, a specific operant/response prevents the delivery of a noxious stimulus, while the behaviours that would result in its delivery are unspecified and diffuse.

This does not preclude the possibility that negative reinforcement drives the decrease in punished responding found in punishment protocols. However, given that evidence for this interpretation has not been demonstrated, the more direct and succinct explanation should be favoured: just as instrumental reward seeking is a more succinct explanation of reinforcement than negative punishment, positive punishment (instrumental suppression) is a more succinct explanation of punishment than negative reinforcement. That said, there are bodies of evidence that lend themselves more to an instrumental suppression interpretation of punishment than to a negative reinforcement interpretation, which will be discussed in the following section.

1.2.4 Instrumental suppression

The final category of theories reviewed in this section conceives of punishment as the opposite of, or “symmetrical” to (Mackintosh, 1983), reinforcement, and so would aptly be called “punishment” as per its original definition by Skinner, were it not for the alternative theories of punishment. Thus, for clarity, this conception of punishment will be termed instrumental suppression, since it argues that a punisher primarily causes suppression of a response via an instrumental association, i.e. not via dissolution of a reinforced S-R, suppression by a Pavlovian association or negative reinforcement of a competing response. This hypothesis of punishment is therefore the only one presented here that does not attempt to explain punishment through other learning/behavioural phenomena. Proponents of an instrumental suppression interpretation of punishment include Bolles and colleagues (1980), Goodall (1980), and Mackintosh (1983), following a resurgence of interest in punishment in the late 1970s to early 1980s that produced a body of evidence that supported this account over the previously dominant interpretations.

Punishment behaviour has many properties that make it seem like the inverse of reinforcement. As reviewed in previous sections, a punishment contingency can be signalled by an SD in the same way a reinforcement contingency can, and this SD does not function as a CS but as a signal of a response contingency (Goodall & Mackintosh, 1987). The level of suppression is generally proportional to the rate of punishment (Azrin et al., 1963; Bolles et al., 1980; Goodall, 1984) and intensity of the punisher (Azrin, 1960; Camp et al., 1967; Church et al., 1967; Karsh, 1962). Increasing the delay between response and delivery of punisher decreases the effectiveness of the contingency to elicit response suppression (Azrin, 1956; Baron, 1965; Camp et al., 1967), just as delay between response and reward delivery weakens reinforcement (Chung & Herrnstein, 1967; Tarpy & Sawabini, 1974).

Appel (1968) showed that punishment on an FI schedule results in an inverted scallop in cumulative responding. A high rate of responding is found immediately after shock punishment, which then exponentially slows down as the time when the response would cause a shock approaches, i.e. the opposite of FI reinforcement. Just as reinforcement on a ratio schedule supports a steady rate of responding, punishment on a ratio schedule supports a relatively steady level of suppression. This indicates that organisms can be sensitive to the schedule of punishment in the same way they are sensitive to schedules of reinforcement. This is not easily explained by the negative law of effect or CER theories, but is exactly what would be predicted if punishment were instrumental suppression.

These effects of reinforcement were elucidated by Skinner, and given his original formulation of punishment as reinforcement’s opposite, this raises the question of why he failed to find these effects for punishment, leading him to dismiss their similarity. A student of Skinner, Estes (1969), attributed the oversight to their use of very mild punishers that did not reliably result in detectable suppression; when suppression did occur, it was slight and transient. This contrasted with the reliable and potent effects that reinforcement had on behaviour, which Skinner (1953) found more compelling given his interest in the societal applications of his findings.

It has been noted by many punishment researchers that, while there is by and large a linear relationship between punishment intensity and response suppression, the effects of a punishment contingency across the spectrum of noxious stimulus intensity can be qualitatively distinguished (Azrin & Holz, 1966; Church, 1963; Solomon, 1964). At the lowest intensities the punisher is not in fact aversive and is only detected; it causes no suppression but can serve as an SD and arousing stimulus (Azrin & Holz, 1961). Mild punishers cause temporary suppression followed by complete recovery, presumably due to habituation to the punisher (Banks, 1976; Miller, 1960). Moderate punishers cause partial suppression, such that responses are greatly diminished without complete recovery. Severe punishers can cause complete suppression of the response with no recovery (e.g. Karsh, 1963; Appel, 1963).

What distinguishes mild from moderate depends on the species and strains being used (Storms et al., 1963). Many studies examining this issue (e.g. Karsh, 1963; Appel, 1963) find that moderate punishers are crucial in the effective suppression of reinforced behaviours, which is to be expected if the punishment contingency is superimposed on a reinforcement contingency under conditions of hunger or thirst. This is not likely due to a limitation in the ability of punishment to control appetitive behaviour; consummatory responses have been permanently suppressed by punishment (Lichtenstein, 1950; Masserman & Pechtel, 1953), effects that seem most attributable to punishment learning (Holz & Azrin, 1962; Masserman, 1943).

Negative incentive motivation

The most intuitive explanation is that the incentive to perform a response (driven by reinforcement) is counteracted by incentives not to press (driven by punishment) (Logan, 1969). In situations where the incentive to perform an action is great (strong response-reinforcer contingency, strong motivation to respond), a sufficient incentive not to perform an action must be introduced to detectably displace the action (strong response-punisher contingency, strong motivation to not respond).

This can be framed within the matching law, where rate of responding and choice between responses are determined by the relative incentive to perform actions (Logan, 1969). As described in the negative law of effect subsection (p. 14), punishment shifts responding away in a manner opposite to reinforcement (Farley & Fantino, 1978; Rachlin & Herrnstein, 1969). Logan (1969) gave rats the choice of running down one of two alleys, one ending in a goal-box containing a large reward, the other ending in a goal-box containing a small reward. After a preference for the large reward alley was established, delivery of shock within the goal-box was introduced. Preference to run down the large reward alley was decreased as a function of shock intensity (0, 75, 85, 100 and 115 volts) and shock contingency (shocked 100% or 50% of trials, 0-12sec delay in shock delivery), with low shock intensities and/or weak contingency only slightly shifting preference away from the punished large reward alley and towards the unpunished small reward alley, and high shock intensity and/or strong contingency shifting preference significantly, even to the extent of choosing the small reward alley over the large reward alley. According to Logan, this demonstrated that the effect of punishment can be understood in terms of negative incentive, with response choice being based upon the net incentive motivation associated with each alternative (influenced by parameters of reward, punishment and motivation).
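The idea of choice following net incentive can be illustrated with a minimal sketch. The subtractive rule (reward value minus shock cost) and the ratio-based choice rule below are simplifying assumptions made for illustration, not Logan's own formulation, and all values are in arbitrary hypothetical units.

```python
def preference_for_large(large_reward: float, small_reward: float,
                         shock_cost: float) -> float:
    """Proportion of choices predicted for the punished large-reward alley.

    Assumes net incentive = reward value - shock cost (floored at zero) and
    that choice proportion matches relative net incentive.
    """
    net_large = max(large_reward - shock_cost, 0.0)  # punished alternative
    net_small = max(small_reward, 0.0)               # unpunished alternative
    total = net_large + net_small
    if total == 0:
        return 0.5  # no incentive either way: assume indifference
    return net_large / total


# Hypothetical values: large reward = 10, small reward = 4.
for shock_cost in (0, 3, 8):
    print(shock_cost, round(preference_for_large(10, 4, shock_cost), 2))
```

Under these assumptions, a weak punisher only slightly reduces preference for the large reward alley, whereas a sufficiently strong punisher pushes preference below indifference, mirroring the qualitative pattern described above.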

This interplay between appetitive and aversive factors was most notably suggested by Konorski (1967), who argued that the appetitive and aversive motivational systems inhibited one another. Thus punishment would activate the aversive system, inhibiting the appetitive system and causing suppression of appetitive behaviour (provided the appetitive system was not more prominently activated, thus overriding the punishment signal). This integrative model has been well established in Pavlovian learning (Dickinson & Dearing, 1979), but has received less direct attention within instrumental paradigms. Nonetheless it was appealed to in Mackintosh’s (1983) treatment of punishment and is alluded to in Balleine’s instrumental theories (Dickinson & Balleine, 2002; Ostlund & Balleine, 2008b). It also finds correspondence in Gray’s (1981, 1982) biopsychological theory of personality, which posits mutually inhibitory behavioural inhibition and behavioural activation systems (BIS and BAS, respectively) that regulate behaviour.

Response-punisher associations

Given the above evidence, punishment seems to have the properties of instrumental learning elucidated by Skinner. However, does punishment learning show properties of R-O associations like reinforced behaviours (i.e. can it be goal-directed)? Is punishment suppression underpinned by the encoding of an R-O association, i.e. a response-punisher association, which causes a reduction in responding due to the undesirability of the outcome?

The premier demonstration of R-O associations, the reinforcer devaluation task, involves modifying the value of the outcome independently of responding and observing a change in responding for that outcome (as described in the instrumental conditioning section, p. 8). It has been reported that a shock can lose its aversive value if its presentation reliably precedes an appetitive stimulus such as food, in a process known as counterconditioning (Erofeeva, 1916; Pearce & Dickinson, 1975). Dearing and Dickinson (1979) reported that a shock counterconditioned with water reinforcers was less able to punish a reinforced response (FI-15sec in presence of a 70-sec tone SD) than if the shock was not counterconditioned (if shock and water were unpaired, or if water preceded shock presentations). However, this is not a demonstration of R-O associations, as the counterconditioning occurred prior to punishment. Thus evidence for response-punisher associations through outcome devaluation is still outstanding.

However, there is evidence showing the effect of contingency degradation (another signature of R-O associations) on punished responding (Bolles et al., 1975; Church et al., 1970). Bolles and colleagues (1975) trained rats to suppress responding during a 3-min white noise SD signalling a shock contingency. They found that introducing free (response-independent) shocks outside of the white noise SD period resulted in more responding during the SD. Though not appealed to by the authors, a simple explanation of this finding is that responding returned because the contingency between response and shock was degraded. Interestingly, over the course of extended training rats came to suppress responding during the SD, when punishment was response-dependent, more than when the SD was absent and shocks were response-independent (yoked to punished shocks during the SD). This suggests rats learned to distinguish between the periods, and exhibited punishment contingency-appropriate suppression during the SD, without suppressing responding as much when shocks were independent of leverpressing.
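One common way to express the strength of a response-outcome contingency is the difference between the probability of the outcome given a response and its probability given no response (often written ΔP). The sketch below applies this index to punishment. It is an illustrative assumption rather than an analysis reported by Bolles and colleagues, and the probabilities are hypothetical.

```python
def delta_p(p_shock_given_response: float,
            p_shock_given_no_response: float) -> float:
    """Contingency index: P(shock | response) - P(shock | no response)."""
    return p_shock_given_response - p_shock_given_no_response


# Hypothetical probabilities per unit of time.
intact = delta_p(0.5, 0.0)    # shocks occur only after responses
degraded = delta_p(0.5, 0.3)  # free shocks also occur without a response

print(intact > degraded)  # True: free shocks weaken the contingency
```

On this view, adding response-independent shocks lowers the index even though the probability of shock following a response is unchanged, which is one way of stating why punished responding might recover.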

Relevant evidence for response-punisher associations can also be drawn from other studies. The studies by Goodall (1984) suggest that response-shock contingencies drive response suppression, and that the stronger this contingency, the lower the level of responding. Estes (1944) showed that removing the punishment contingency caused a rapid and notable return of reinforced behaviour, even under conditions of extinction. This suggests an inhibitory process suppressing responding that is released upon learning that the punishment contingency is no longer enforced. Estes (1944) attributed this to conditioned suppression (CER), but given the evidence reviewed above that CER is not the cause of punishment suppression, the only viable account remaining is an R-O association.

A response-punisher learning account was also endorsed by Bolles and colleagues (1980). As previously described, they trained rats to perform two different responses on the same manipulandum and only punished one of those responses. The result of potent punished-response suppression and relatively unaffected unpunished responding led them to conclude an R-O association was formed.

Lastly, just as arranging a CS to reliably precede reinforcement delivery impairs acquisition of an instrumental behaviour (Pearce & Hall, 1978; St. Claire-Smith, 1979a), arranging a CS to reliably precede punishment delivery impairs the acquisition of punishment suppression (St. Claire-Smith, 1979b). As previously stated, instrumental associations are affected by relative validity, such that an instrumental contingency can be overshadowed by an apparent Pavlovian contingency. The only theory of punishment that can adequately explain this finding is the R-O account.

1.2.5 Synthesis and conclusions

Given the right parameters, an aversive outcome contingent upon a behaviour can potently suppress that behaviour; a punishment protocol effectively alters behaviour. Four distinct theories of punishment have been used to explain this phenomenon: the negative law of effect, the conditioned emotional response, avoidance, and instrumental suppression. These correspond to different types of learning: unlearning, Pavlovian fear conditioning, negative reinforcement, and learning involving the punished response, respectively. While the role these forms of learning play as determinants of at least some behaviours is relatively undisputed, whether they account for the effects of punishment has been subject to ongoing debate.

Part of the problem is that some accounts have proven extremely difficult to refute experimentally (and could even be unfalsifiable). Though there is some evidence against a pure fear or avoidance interpretation of punishment, the fact that environmental and interoceptive stimuli necessarily precede a punished response, and that alternative behaviours necessarily take the place of a punished behaviour, means it is impossible to dismiss the possibility that fear CSs and negatively reinforced competing responses contribute to the decline in responding observed. This has left the field in a state of relative incoherence.

However, when the experimental data are taken as a whole, it can be concluded that an instrumental suppression explanation of punishment accounts for the breadth of findings better than the other theories (see Table 1.1 for summary). In brief, the negative law of effect (unlearning) cannot account for why behaviour returns once the absence of the punishment contingency is detected, despite no further reinforcement (Estes, 1944). While the CER account (Pavlovian fear) can account for this finding, further investigation into this possibility found that response-independent USs are much less effective at suppressing behaviour, and produce markedly different behavioural phenotypes to response-contingent punishers (Annau & Kamin, 1961; Azrin, 1956; Bolles et al., 1980; Bouton & Schepers, 2015; Goodall, 1984; Hunt & Brady, 1951). However, Bolles and colleagues’ (1980) findings suggest that in situations where Pavlovian fear does contribute to suppression, it is most notable during initial response-shock pairings, with weaker correlations between the response and the punisher, and is not restricted to the punished response (i.e. generalizes to unpunished behaviour) in within-subjects procedures.

Table 1.1

Summary of evidence for each account of punishment.

| Observation | Negative Law of Effect | CER | Avoidance | Instrumental Suppression |
| --- | --- | --- | --- | --- |
| Recovery of responding in punishment extinction (Estes, 1944) | x | √ | ? | √ |
| Punishment behavioural phenotype (Hunt & Brady, 1951) | x | x | √ | √ |
| Lack of generalisation to unpunished response (Hoffman & Fleshler, 1965; Hunt & Brady, 1955) | ? | x | √ | √ |
| Response rate dependent on punishment schedule (Appel, 1968; Azrin et al., 1963; Chung & Herrnstein, 1967; Goodall, 1984) | √ | ? | √ | √ |
| Rapid suppression (see Estes, 1969) | √ | √ | x | √ |
| Failure to punish non-responding (Rachlin & Herrnstein, 1969) | √ | √ | x | √ |
| More suppression for response-dependent than independent shock (e.g. Annau & Kamin, 1961; Camp et al., 1967; Schuster & Rachlin, 1968) | √ | x | √ | √ |
| Matching law/negative incentive (Farley & Fantino, 1978; Logan, 1969; Rachlin & Herrnstein, 1969) | √ | x | √ | √ |
| Contingency degradation (Bolles et al., 1975; Church et al., 1970) | x | x | √ | √ |
| Increase in alternative responses (Shettleworth, 1978; see Dunham, 1972) | √ | x | ? | √ |
| Relative validity (Beauchamp, 1966; St. Claire-Smith, 1979b) | x | x | √ | √ |
| Suppression of specific response on single manipulandum (Bolles et al., 1980) | √ | x | √ | √ |
| Punishment SD blocks learning about SD, not CS (Goodall & Mackintosh, 1987) | x | x | √ | √ |
| Punishment renewal (Bouton & Schepers, 2015) | x | x | ? | √ |
| Score | 1 | -7 | 7 | 14 |

Note: √ denotes evidence that is congruous with an account, x denotes evidence that is incongruous with an account, ? denotes evidence that has an unclear bearing on an account. The score at the bottom of the table was calculated as follows: +1 for each √, -1 for each x, 0 for each ?. This score does not refer to an account’s merit, but to its ability to singularly account for the breadth of findings reviewed throughout Chapter 1. As noted in the chapter, CER fails to account for most characteristics of punishment on its own, but likely contributes to initial suppression of a punished response (Bolles et al., 1980). Conversely, avoidance is congruent with most findings, but this may be due to the relative unfalsifiability of avoidance, and avoidance may not in fact contribute to reductions in punished behaviours (see Dunham, 1972; Estes, 1969).
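The scoring rule in the note to Table 1.1 can be checked with a short sketch. The marks are transcribed from the table, one four-character string per observation, in column order (Negative Law of Effect, CER, Avoidance, Instrumental Suppression); the variable names are illustrative only.

```python
ACCOUNTS = ["Negative Law of Effect", "CER", "Avoidance", "Instrumental Suppression"]

# One string per row of Table 1.1, in the order the observations are listed.
MARKS = [
    "x√?√",  # recovery of responding in punishment extinction
    "xx√√",  # punishment behavioural phenotype
    "?x√√",  # lack of generalisation to unpunished response
    "√?√√",  # response rate dependent on punishment schedule
    "√√x√",  # rapid suppression
    "√√x√",  # failure to punish non-responding
    "√x√√",  # more suppression for response-dependent shock
    "√x√√",  # matching law/negative incentive
    "xx√√",  # contingency degradation
    "√x?√",  # increase in alternative responses
    "xx√√",  # relative validity
    "√x√√",  # suppression of specific response on single manipulandum
    "xx√√",  # punishment SD blocks learning about SD, not CS
    "xx?√",  # punishment renewal
]

POINTS = {"√": 1, "x": -1, "?": 0}  # +1 congruous, -1 incongruous, 0 unclear

scores = {
    account: sum(POINTS[row[col]] for row in MARKS)
    for col, account in enumerate(ACCOUNTS)
}
print(scores)
```

Summing the marks this way reproduces the Score row of the table (1, -7, 7 and 14).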


The avoidance hypothesis, while particularly difficult to disprove, lacks any direct, persuasive evidence for its role in punishment (Estes, 1969). It also has great difficulty explaining the findings that punishment is subject to relative validity (St. Claire-Smith, 1979b) and that punishment of non-responding does not result in acquisition of responding (Rachlin & Herrnstein, 1969). These findings are easily accounted for by instrumental suppression, along with other predictions such as sensitivity to schedules of punishment (e.g. Appel, 1968).

The theory that most parsimoniously accounts for the data is instrumental suppression, which argues that punishment is due to formation of an R-O or S-R association symmetrical to those formed during appetitive instrumental learning. Thus it is reasonable to conclude that punishment behaviour is likely driven by instrumental suppression. However, while this best accounts for the data, it is not necessary to rule out contributions from other processes. In fact, experimenters advocating for an instrumental suppression interpretation have not dismissed the influence of other factors, particularly Pavlovian fear, on punishment. Instead they have attempted to elaborate on the various characteristics of each type of learning within specific punishment protocols (e.g. Bolles et al., 1980). Therefore the question becomes how likely contributions from other forms of learning are, how large they are, what effect they have on instrumental suppression, and what factors need to be taken into consideration.

The conclusion reached by this review is that punishment is, at a fundamental level, instrumental suppression of behaviour, but that it can be impacted by other processes of learning, e.g. Pavlovian fear and negative reinforcement. Any investigation into punishment as instrumental suppression will need to control for these other processes through careful selection of protocol parameters that preferentially invoke instrumental suppression learning over alternative processes, and/or account for the relative influence of other processes by measuring process-specific effects. It is suggested that a good protocol for investigating instrumental suppression would use a tight response-punisher contingency (low FR or short VI schedules), include an alternative unpunished response, and measure Pavlovian fear responses such as freezing. This would minimize other forms of learning, and account for influences from conditioned fear.

39 Chapter 2: Brain Mechanisms of Punishment

Chapter 2 Brain Mechanisms of Punishment

The lack of theoretical agreement on the psychological nature of punishment learning has hindered the delineation of its neural underpinnings. This stands in contrast to Pavlovian and appetitive instrumental learning, where consolidated theoretical understanding has benefited the investigation of their brain mechanisms. For instance, understanding of S-R and R-O components of reinforcement has allowed the determination of the structures and circuits underpinning these different processes (Balleine et al., 2007; Balleine & O’Doherty, 2010; Hart et al., 2014), while the understanding of Pavlovian prediction error has promoted an elaborated understanding of the neural mechanisms of Pavlovian fear (McNally et al., 2011).

Understanding the brain mechanisms of punishment is also hampered by a lack of precision in the use of the term “punishment” by neuroscientists. It is not unusual for studies that do not employ any instrumental component (i.e. no response-punisher contingency) to refer to their study as using a “punishment task” because a purportedly non-desirable outcome is experienced. A recent example comes from the Frontiers Research Topics issue on “Punishment-Based Decision Making” (May, 2014), containing 10 articles on the topic. Of the 6 articles that presented experimental data on “punishment”, only one used a task with a response-punisher contingency. The others used response-independent outcomes (aversive Pavlovian procedures), or are better characterised as appetitive extinction (designated unrewarded response) or negative reinforcement (responding prevented aversive outcome) paradigms. In one study, where a choice was required for the task, the outcome was completely random, and would therefore only involve unconditioned responses, or superstitious learning spuriously related to punishment.

With these caveats in mind, the following chapter reviews the various neural systems and circuits potentially involved in punishment. The first section covers general neurotransmitter systems implicated, and the second covers the various brain structures and pathways implicated, followed by a summary. This review is somewhat long, but the intention is to review relevant findings from related literatures to hopefully generate insights into these brain mechanisms.

2.1 General Drug Effects on Punishment

It was noted early on by behavioural pharmacologists that relatively few drugs were able to specifically affect punishment-suppressed responding, even after systemic administration (Witkin, 2002). For example, drugs well known for increasing instrumental response rates, such as amphetamines (Dews & Wenger, 1977), did not necessarily increase response rates suppressed by punishment (Barrett, 1977; Geller & Seifter, 1960). Conversely, drug-induced release of punished responding did not necessarily result in a concurrent increase of unpunished behaviour (Barrett & Vanover, 1993; Witkin, 2002; Witkin et al., 2004). This shows that the increase in responding was specific to punished behaviour, and suggests that a specific neural system mediates this process. The release of punished responding found in these studies is unlikely to be due to drug-induced analgesia preventing detection of the shock punisher, because potent analgesics such as morphine do not increase punished responding (Geller et al., 1963; McCloskey et al., 1987; McMillan & Leander, 1975), and punishment-specific effects are even observed when non-painful punishers, such as pressurized air or reinforcement timeouts, are used (Spealman, 1979; van Haaren & Anderson, 1997).

These findings suggested that punishment is mediated by a distinct neurophysiological system, and that this system could be selectively modulated through drugs that act on that system. Therefore, drugs that selectively affected punishment could be used to identify the neural substrates and mechanisms supporting punishment suppression. This section first reviews anxiolytics, the most notable modulators of punished behaviour, and then reviews the effects on punishment of manipulating dopamine and norepinephrine.


2.1.1 Anxiolytics, GABA and serotonin

Anxiolytics include a wide variety of drug classes such as barbiturates, benzodiazepines, and drugs modulating serotonin transmission (Prut & Belzung, 2003). Anxiolytics are known to release punished behaviour, with this finding being so robust that anti-punishment effects have been considered the primary pre-clinical test for anxiolytics (Barrett & Vanover, 1993; Commissaris, 1993; Witkin et al., 2004). Hence, punishment tasks have been used as pre-clinical screens for anxiolytics without particular interest in punishment itself.

This screening for anxiolytic properties was usually done using a procedure developed by Geller and Seifter (1960), or a slight variation of it (Davidson & Cook, 1969; Vogel et al., 1971). In Geller and Seifter’s original protocol, rats were trained to press a lever for food on a VI-2min schedule. Once responding was acquired, they introduced regular 3-min tones indicating a continuous reinforcement schedule (CRF) of food throughout the 75-min sessions. After 7 days of training with this SD, a CRF leverpress-shock contingency was superimposed on the food-CRF during the tone. Thus the tone is an SD for both reinforcement and punishment, and for this reason the procedure has been called the Geller-Seifter conflict paradigm (Pollard & Howard, 1979). If the shock was of a high enough intensity (0.6-0.85mA), rats would potently suppress responding during the tone, while responding remained high during unpunished periods when the tone was not present (Geller & Seifter, 1960, 1962). This procedure allowed for multiple within-subject assessments of drug effects. It also allowed the concurrent assessment of drug effects on punished and unpunished responding, so anti-punishment effects could be assessed alongside any changes in unpunished responding and general behaviour. Variants of this task include shorter unpunished VI schedules and increased fixed ratio for food and shock during conflict trials (e.g., VI-30sec reinforcement, FR-10 shock; Davidson & Cook, 1969), or using water reinforcers and punishing drinking by electrifying the spout (Vogel et al., 1971). Conflict tests have also been extended to pigeons and monkeys (see Pollard & Howard, 1990).
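The contingencies of the original Geller and Seifter (1960) protocol described above can be written out as a small decision rule. The following is only an illustrative sketch, not code from any of the cited studies: the names `PressOutcome` and `press_outcome`, and the boolean flag `vi_armed` standing in for the state of the VI-2min schedule, are hypothetical and introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PressOutcome:
    """What a single lever press produces (hypothetical helper type)."""
    food: bool
    shock: bool


def press_outcome(tone_on: bool, vi_armed: bool, punishment_phase: bool) -> PressOutcome:
    """Outcome of one lever press under the conflict contingencies.

    tone_on          -- True during a tone (conflict) period
    vi_armed         -- True if the VI schedule currently has food available
                        (assumed flag; only relevant when the tone is off)
    punishment_phase -- True once the press-shock contingency has been added
    """
    if tone_on:
        # The tone signals CRF food; once punishment is introduced,
        # every press during the tone is also shocked (CRF shock).
        return PressOutcome(food=True, shock=punishment_phase)
    # No tone: food only when the VI schedule has armed, and never shock.
    return PressOutcome(food=vi_armed, shock=False)
```

Written this way, the "conflict" is visible in the single branch where a press yields both food and shock, while presses outside the tone remain unpunished and serve as the within-subject comparison.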

It is worth noting that, given the near-maximal suppression induced by the punishment contingency within many of these procedures, only drug-induced release of punished responding (anti-punishment effects), and not increased suppression (pro-punishment effects), could be detected. While this was sufficient for the aims of these studies, it means that drug facilitation of punishment, which is of equal importance in attempting to map neural substrates, would not be detected. Also, because the drug was typically administered after extensive training, effects on acquisition of punishment, or other elements of punishment learning and behaviour, could not be revealed. Finally, because these studies were invariably uninterested in punishment itself, the authors were content with observing any form of increase in punished responding and did not test or analyse the potential psychological loci of an anxiolytic’s anti-punishment effect. While these limitations are important, the failure of drugs such as stimulants and analgesics to reliably increase punished responding helps to narrow interpretations of drug effects to impairments in aversively-motivated suppression.

Barbiturates

Barbiturates are anxiolytics, sedatives and anticonvulsants (Johnston & Willow, 1982). They agonise, i.e. promote, the effect of γ-aminobutyric acid (GABA) on GABAA receptors, which causes hyperpolarization (inhibition) of the neuron via chloride influx (Rudolph & Knoflach, 2011). GABA is regarded as the principal inhibitory neurotransmitter of the central nervous system (CNS), with substantial distribution of GABA receptors throughout the brain (Bowery et al., 1987). It has more recently been discovered that barbiturates also antagonise AMPA/kainate receptors and voltage-activated ion channels (Lösher & Rogawski, 2012).

Early studies by Geller and Seifter (1960, 1962) found that barbiturates (phenobarbital and pentobarbital) significantly increased punished responding, even at low doses, without increasing unpunished responding. In fact, at moderate doses the sedative effects of barbiturates slightly decreased unpunished responding while significantly increasing punished responding. The relationship between dose and release of punished responding was linear until doses were high enough for the side-effects of sedation and ataxia to reduce all behaviours. These results have been replicated using different parameters (Cook & Davidson, 1973; McIntyre & Liddell, 1984), in pigeons (Barrett & Witkin, 1976; Brandao et al., 1980) and in monkeys (Gluckman & Stein, 1978; Patel & Migler, 1982). Koob and colleagues (1988) showed that the anti-punishment effects of barbiturates are likely mediated by their action on GABA receptors; administering a GABA antagonist (IPPO), which had no anti-punishment effect on its own, blocked the increase in punished responding caused by phenobarbital in rats within a Geller-Seifter protocol.

The anti-punishment effects of barbiturates are mostly undisputed (Pollard & Howard, 1990). These effects are dissociable from, and therefore not attributable to, the sedative and anticonvulsant effects of barbiturates, and are linked to their action on GABAA receptors. However, the neural mechanisms that underlie these anti-punishment effects, such as which particular neuronal populations and circuits are involved, remain unclear.

Benzodiazepines

Benzodiazepines (BZs) were first developed in the 1960s and are structurally characterised by a fused benzene and diazepine ring, giving this class of drugs its name. They rapidly replaced barbiturates as the primary drug prescribed for anxiety, due to their much higher therapeutic index (Rang et al., 2012), but still have sedative and anticonvulsant properties (Barrett & Vanover, 1993). BZs, like barbiturates, are GABAA agonists, binding to several allosteric (non-GABA) binding sites on the GABAA receptor, known as BZ-binding sites (Sieghart, 2015). However, unlike barbiturates, BZs do not cause chloride influx without GABA and also do not affect GABAA conductance beyond the maximum response GABA can produce on its own (Rudolph & Knoflach, 2011).

BZs, like barbiturates, release punished behaviour within conflict tasks. This has been demonstrated in a number of studies using different procedures (Geller-Seifter, Davidson-Cook, etc.), different BZs (chlordiazepoxide, diazepam, midazolam, etc.) and in several species (rodents, pigeons and primates). In fact, every study prior to 1990 measuring the effect of BZs within a conflict procedure reported an anti-punishment effect (Pollard & Howard, 1990).

Koob and colleagues (1988) showed that the anti-punishment effects of the BZs are mediated by their action on GABA receptors; administering the GABA antagonist IPPO blocked the increase in punished responding caused by chlordiazepoxide in a Geller-Seifter protocol. Pellon and colleagues (2007) extended these findings to different BZs and isolated actions to BZ-binding sites using punishment suppression of licking in rats. Two different competitive BZ antagonists, flumazenil and RU-34000, blocked diazepam-induced release of punishment.

These anti-punishment effects of BZs are dissociable from their effects on active avoidance. In fact, BZs can enhance the acquisition of active avoidance (Escorihuela et al., 1993; McNaughton & Gray, 2000). This suggests that the anti-punishment effects of BZs are not attributable to impaired aversively-motivated learning, as performance of the active avoidance task is also aversively-motivated. The critical difference is that the conflict test involves instrumental suppression of a punished response (passive avoidance) whereas the active avoidance task depends on negative reinforcement.

Ethanol

Ethanol has complex depressant effects on CNS activity via cellular mechanisms that, although unique, overlap with barbiturates (Rang et al., 2012; Spanagel, 2009). Ethanol, like barbiturates, acts on BZ binding sites and δ subunits of GABAA receptors, glutamate receptors, and voltage-activated ion channels (Glowa et al., 1988; Tabakoff & Hoffman, 1996).

Ethanol has anti-punishment effects in rodents, birds and primates across various conflict protocols, and these effects are not readily attributable to effects on unpunished responding (Barrett et al., 1985; Glowa & Barrett, 1976; Koob et al., 1988; Pollard & Howard, 1990). Ethanol releases punished licking for water in rats and mice to a similar degree as the BZ chlordiazepoxide (Vogel et al., 1980) and the barbiturate pentobarbital (Glowa et al., 1988), without affecting unpunished licking. The effects of ethanol on punishment are mediated by its action on GABA receptors (Koob et al., 1988), and most likely the BZ binding site (Glowa et al., 1988).

Rasmussen and Newland (2009) tested the effects of ethanol on punishment in humans. They reinforced clicking of two boxes on a screen with money, with clicking of a box resulting in +4 cents being added to a counter according to a VI-schedule. In some sessions a monetary punishment (-4 cents) was delivered for clicking one of the boxes on a VI-schedule (25% longer VI for punishment than reinforcement on that same response). When given placebo, participants’ unpunished responding fit the matching law, and in punished sessions responses shifted away from the punished response. Ethanol had no effect on unpunished responding but dose-dependently increased choice of, and rate of responding on, the punished response. Thus ethanol selectively impaired sensitivity to punishment but not reinforcement.
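For readers unfamiliar with the matching law mentioned above, a minimal sketch of its strict form follows: the proportion of responses allocated to an option equals the proportion of reinforcement obtained from that option, B_a / (B_a + B_b) = R_a / (R_a + R_b). The text here does not state which form of the matching law Rasmussen and Newland fitted, so the strict form is an assumption for illustration, and the function name `matching_allocation` is hypothetical.

```python
def matching_allocation(reinf_rate_a: float, reinf_rate_b: float) -> float:
    """Proportion of responses on option A predicted by strict matching.

    Implements B_a / (B_a + B_b) = R_a / (R_a + R_b), where R_a and R_b
    are the obtained reinforcement rates on the two options.
    """
    total = reinf_rate_a + reinf_rate_b
    if total <= 0:
        raise ValueError("at least one option must be reinforced")
    return reinf_rate_a / total


# Example: option A pays off three times as often as option B.
print(matching_allocation(3.0, 1.0))  # 0.75
```

Under this description, a punisher that shifts responding away from the punished option shows up as allocation falling below the value predicted from reinforcement rates alone.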

Serotonin and serotonergic anxiolytics

Stein and colleagues (1977) argued that the anti-punishment effects of GABAergic anxiolytics were due to specific GABAergic inhibition of serotonin (5-hydroxytryptamine; 5-HT). 5-HT is a monoamine neurotransmitter synthesized within nuclei running along the midline of the rostrocaudal extension of the brainstem, most notably in the raphe nuclei (Hornung, 2003). It is known to play an important role in a vast array of physiological and behavioural processes, including homeostasis, mood regulation, sleep and arousal, social behaviour, pain and sensorimotor activity (Jacobs & Azmitia, 1992; Pattij & Schoffelmeer, 2015), but has also been argued to play a role in suppressing behaviour (Harvey et al., 1975) and processing aversive stimuli (Faulkner & Deakin, 2014; Wise et al., 1970).

Graeff and Schoenfield (1970) gave pigeons intramuscular injections of the 5-HT antagonists methysergide and BOL-148 in a conflict procedure. They found a substantial increase in punished responding, of the order of magnitude produced by BZs. Stein and colleagues (1975) destroyed serotonin-containing terminals within rat brains using intraventricular infusions of 5,6-DHT. They found that these lesions caused release of punished behaviour with no effect on unpunished responding within a conflict protocol. Faulkner and Deakin (2014) reviewed the effect of acute tryptophan depletion (ATD), a dietary manipulation that markedly decreases circulatory concentrations of the 5-HT precursor tryptophan. This manipulation is presumed to impair 5-HT synthesis and release. Of the 34 studies that tested the effects of ATD using tasks that involved non-reinforced behavioural inhibition, Pavlovian learning, instrumental reinforcement and instrumental suppression, Faulkner and Deakin noted that ATD only impaired behavioural inhibition if the inhibition was motivated by aversive outcomes. Taken together, these findings suggest that punishment behaviour depends on 5-HT transmission.

Returning to anxiolytics, in the 1970s a novel, non-BZ anxiolytic called buspirone was developed (Wu et al., 1972). It was “anxioselective” (Taylor et al., 1984); that is, it reduced anxiety symptoms as measured by rating scales in humans as effectively as potent BZs, but without the side-effects associated with those BZs, and did not directly interact with BZ binding sites (New, 1990). Buspirone was at first thought to act on the dopamine system (Taylor et al., 1984). However, it has since been demonstrated that buspirone, and related anxiolytics (Barrett, 1992), likely produce their effects as partial agonists at the 5-HT receptor subtype 5-HT1A (Barrett & Vanover, 1993; Peroutka, 1985).


5-HT1A receptors cause hyperpolarization and are found pre- and post-synaptically, and even on 5-HT soma. This means the actions of 5-HT1A agonists, including buspirone, on neuronal firing are complex, including inhibiting 5-HT1A-expressing targets of the 5-HT system but also the 5-HT system pre-synaptically (Barnes & Sharp, 1999).

Although buspirone is as effective as diazepam at reducing clinical anxiety (Goldberg & Finnerty, 1979, 1982; Rickels, 1982), the effects of 5-HT1A anxiolytics on punishment have been less consistent than those of other anxiolytics (Barrett & Witkin, 1991; Howard & Pollard, 1990; Pollard & Howard, 1990; Sanger, 1990, 1992). While some studies report an increase in FR-punished responding following buspirone administration in rats (Sullivan et al., 1983; Young et al., 1987) and monkeys (Geller & Hartman, 1982), several studies have not (Gardner, 1986; Goldberg et al., 1983; Wettstein, 1988). Sanger (1990) reported that 5-HT1A anxiolytics (buspirone and ipsapirone) did not increase punished responding under an FR schedule in rats, but did increase punished responding if responses were punished on a VI schedule (Sanger, 1992). However, a study using VI-punishment in monkeys failed to find an effect of buspirone (Sullivan et al., 1983). In contrast, studies using pigeons have found robust and significant increases in punished responding by 5-HT1A anxiolytics (Barrett, 1992; Brocco et al., 1990; Nanry et al., 1991; Pollard et al., 1992). This release of punished behaviour regularly surpassed 1000% of within-subject control responding and has been demonstrated using several types of 5-HT1A anxiolytics (e.g. buspirone, ipsapirone, flesinoxan) (Barrett, 1992).

This discrepancy in anti-punishment effects between mammals and birds seems unique to buspirone, because all other drugs tested result in similar effects on punished responding across species (Pollard & Howard, 1990). In fact, buspirone’s notable anxiolytic properties without concurrent effect on punishment raised the question of whether punishment only screened anxiolytics based on their ability to act on the benzodiazepine-GABA-chloride ionophore complex (Barrett & Vanover, 1993). The reason for this difference remains unclear. The potency of various 5-HT1A anxiolytics at releasing punished responding in pigeons is highly correlated (r = 0.83) with their IC50 values for 8-OH-DPAT displacement (a measure of 5-HT1A binding), suggesting effects on punished responding are mediated by this receptor subtype (Barrett, 1992). Moreover, stimulants, non-5-HT1A antidepressants, antipsychotics and opioids are equally ineffective at releasing punished responding in pigeons, suggesting that the anti-punishment effects of 5-HT1A anxiolytics are not attributable to action on other receptors/neurotransmitter systems (Barrett & Vanover, 1993).

Another class of serotonergic anxiolytics is the 5-HT2 receptor antagonists, such as ritanserin and ketanserin. This receptor is expressed post-synaptically and, unlike 5-HT1A receptors, modulates ion flux to cause neuronal depolarisation (Barnes & Sharp, 1999). As with 5-HT1A partial agonists, 5-HT2 antagonist effects on punishment have been inconsistent (Barrett, 1991; Howard & Pollard, 1990). However, in pigeons these drugs reliably produce increases in punished responding, though not as potently as benzodiazepines and 5-HT1A partial agonists (Brocco et al., 1990; Gleeson et al., 1989). Inconsistent effects have been attributed to differences in functions of 5-HT2 receptor subtypes, though further research with specific agonists and antagonists of these receptors is needed (Cervo & Samanin, 1995).

It has also been found that blockade of 5-HT2 receptors greatly enhances 5-HT1A receptor-mediated effects on punishment. Using a compound that concurrently antagonized 5-HT2 receptors and agonised 5-HT1A receptors, e.g. WY-50,324 (Millan et al., 1992), several studies using pigeons have produced extremely large increases in punished responding (approximately 10,000% of control responding), while decreasing unpunished responding (Barrett & Zhang, 1991; Colpaert et al., 1992; Millan et al., 1992).

It has also been reported that selective serotonin reuptake inhibitors (SSRIs) have anti-punishment effects (Herzalleh et al., 2013; Macoveanu, 2014). Herzalleh and colleagues (2013) found that major depressive disorder patients receiving SSRI treatment were impaired in punishment learning compared to patients not receiving SSRIs and healthy controls. However, this study did not randomly assign patients to treatment, so confounds including severity of symptoms might underpin the findings.

The mechanism by which SSRIs cause these anti-punishment effects remains unclear (Macoveanu, 2014; Christmas et al., 2008). Anti-punishment effects due to SSRI-blockade of 5-HT reuptake seem to contradict previous findings that decreased 5-HT transmission attenuates punishment. It has been suggested that SSRIs cause inhibition of 5-HT neurons by acting on 5-HT1A autoreceptors (Barnes & Sharp, 1999; Gartside et al., 1997), and may also reduce 5-HT signalling by desensitizing 5-HT2C receptors (Christmas et al., 2008).

2.1.2 Dopamine

Dopamine (DA) is a monoamine of the catecholamine subclass, and is most notably synthesised by two nuclei located within the ventral midbrain: the ventral tegmental area (VTA) and substantia nigra (SN). Dopamine exerts its influence via two families of dopamine receptor: D1-like (D1 and D5 receptors) and D2-like (D2, D3 and D4 receptors), with D1 and D2 being the most abundantly expressed. These receptors are G-protein coupled and expressed post-synaptically, but D2-like receptors are also expressed pre-synaptically and thus also function as autoreceptors (Jaber et al., 1996).

DA has traditionally been associated with reinforcement. Stimulation of the DAergic regions of the brain is reinforcing (Adamantidis et al., 2011; Ilango et al., 2014; Kim et al., 2012), and reinforcing stimuli such as natural rewards and addictive drugs increase DA cell firing and DA transmission (Becker et al., 2001; Everitt & Robbins, 2005; Small et al., 2003). Mice selectively unable to produce DA do not exhibit instrumental behaviour, including basic behaviours necessary for survival, unless DA function is restored, despite having no gross sensory or motor deficits (Cannon & Palmiter, 2003).


It has been suggested that 5-HT inhibits the DA system, with 5-HT and DA being conceived of as Konorskian oppositional systems, promoting aversive/suppressive and appetitive/reinforcing functions respectively (Cools et al., 2010; Daw et al., 2002; Dayan, 2012). This modulation of the DAergic reinforcement system by 5-HT depends on a complex arrangement of various 5-HT receptor subtypes on DA neurons, DA inputs and DA projection targets (Alex & Pehek, 2007).

Dubrovina and Zinov’eva (2010) trained mice to avoid the naturally preferred dark portion of a light-dark box by delivering a 2sec 0.5mA shock if they entered the dark portion. Mice passively avoided the dark portion on subsequent extinction trials (no shock). Mice given systemic administration of the D2 antagonist sulpiride entered the dark portion significantly faster during extinction trials than control mice. This is despite the general suppressive effect of D2 antagonism on locomotion (Hillegaart & Ahlenius, 1987). Conversely, mice that received the D2 agonist quinpirole were significantly slower to extinguish avoidance of the dark portion and entered the dark portion significantly slower than controls. There was no effect of D1 agonism or antagonism. Though this suggests a role for D2 receptors in punishment, this avoidance behaviour could be attributed to effects on Pavlovian fear instead of instrumental suppression. Interestingly, D2 antagonism (by raclopride) decreases active avoidance as well, suggesting blockade of D2 receptors may impair aversive learning (Hillegaart & Ahlenius, 1987), though whether this includes instrumental suppression has yet to be determined.

2.1.3 Norepinephrine

Norepinephrine (NE), like DA, is a monoamine of the catecholamine subclass. It is produced in several regions within the pons and medulla, as well as within the thalamus and throughout the autonomic nervous system. The locus coeruleus (LC), a nucleus located in the dorsal pons, is considered the most important NE system, providing NE innervation throughout the brain crucial for arousal and attention (Aston-Jones et al., 1999; Sara, 2009). There are two categories of NE receptor: α-NE receptors and β-NE receptors, which have various subtypes.

Stein et al. (1977) suggested that the anti-punishment effects of BZs might be partially mediated by their suppressive effect on NE activity. However, 5-HT1A ligands, including the anxiolytic buspirone, tend to cause a large increase in LC activity and increase the release of NE in several brain regions (Barnes & Sharp, 1999; Hajos-Korcsok & Sharp, 1999). These ligands can have potent anti-punishment effects, contradicting the NE-reduction hypothesis of punishment. Also, the non-selective α-NE receptor antagonist phentolamine and the non-selective β-NE receptor antagonist propranolol did not have anti-punishment effects in conflict paradigms (Howard & Pollard, 1990).

However, several studies have found that clonidine, an α2-NE receptor agonist (that also agonises imidazoline receptors), has anti-punishment effects in conflict tasks (Howard & Pollard, 1990; Kruse et al., 1981). Intraventricular infusions of (–)-NE, but not (+)-NE or DA, produced a large increase in punished responding (Stein et al., 1973). Margules (1968, 1971a, 1971b) was a strong advocate for the hypothesis that NE transmission was involved in punishment suppression, and suggested this role was mediated by NE innervation of the amygdala. He found that NE infusions into the amygdala abolished punishment suppression (Margules, 1968). This amygdala-mediated anti-punishment effect was blocked by concurrent infusions into the amygdala of the α-NE receptor antagonist phentolamine, but not the β-NE receptor antagonist LB-46 (Margules, 1971a). Margules (1971b) suggested that NE release onto the amygdala “inhibits punishment” (p. 183), allowing for reinforced behaviour. Given that NE receptor antagonists have no effect on punishment behaviour, but NE and NE agonists show anti-punishment effects, NE transmission within the CNS may mediate endogenous inhibition of punishment (e.g. during extinction of punishment).


Baarendse and colleagues (2013) examined interactions of 5-HT, DA and NE on instrumental decision-making. They trained rats to make nosepokes into various ports for sucrose reward. Nosepoking some ports (advantageous) resulted in immediate small rewards, but occasionally in a short reward timeout (defined as punishment). Nosepoking other ports (disadvantageous) resulted in large rewards, but could result in substantial timeouts, such that nosepoking these ports resulted in fewer pellets than the small reward ports. Rats learned to make advantageous over disadvantageous choices. The selective DA reuptake inhibitor GBR12909, the selective NE reuptake inhibitor atomoxetine, and the SSRI citalopram had no effect when given alone. However, concurrent administration of GBR12909 and atomoxetine decreased bias away from the disadvantageous response, suggesting concurrent upregulation of DA and NE can attenuate instrumental suppression. It is unclear whether this effect was driven by attenuated punishment or simply by impaired instrumental extinction or comparison of appetitive value.

Taken together, increasing NE signalling, particularly within amygdala, attenuates punishment suppression (Howard & Pollard, 1990; Kruse et al., 1981; Margules, 1968, 1971a, 1971b; Stein et al., 1973). However, the reliable anti-punishment effects of anxiolytics, despite opposite effects on NE nuclei activity, indicate a less important role of NE compared to other neurotransmitters. The lack of NE receptor antagonist effects (Howard & Pollard, 1990), apart from blocking anti-punishment effects of NE infusions (Margules, 1971a), may indicate that NE is not directly involved in maintenance of punishment, but instead is responsible for its extinction. Concurrently increasing DA and NE transmission, which theoretically boosts reinforcement signals and inhibits punishment signals respectively, leads to an increase in choice of a disadvantageous response (Baarendse et al., 2013). It was suggested that DA and NE together promote testing a previously punished response and mediate the return of responding following omission of an anticipated aversive outcome.


2.1.4 Summary

The specific release of punished responding by a small subset of drugs, without significant effects on unpunished behaviour, has implicated distinct neurotransmitter systems in punishment. Barbiturates, benzodiazepines, ethanol and serotonergic anxiolytics display potent anti-punishment effects whereas many other classes of drugs such as stimulants and opioids do not. Only 5-HT effects appear subject to protocol parameters and the species used.

It has been suggested that GABAergic anxiolytics derive their anti-punishment effects from inhibition of the 5-HT system. The 5-HT system, in turn, inhibits the DA system, which is known to mediate instrumental responding; punishment is thought to activate the 5-HT system, inhibiting DA neurons to cause instrumental suppression. The NE system has also been implicated in punishment, with NE agonists causing anti-punishment effects. However, the behavioural and neural loci of all of these effects on punishment remain unclear.

2.2 Implicated Circuits and Structures

Relatively few studies have examined the structures and circuits underpinning instrumental suppression. However, studies investigating the mechanisms of instrumental learning, aversion and behavioural control provide likely candidates for the aversively-motivated instrumental suppression that drives punishment behaviour. The following section reviews these circuits and structures: Gray’s Behavioural Inhibition System, the amygdala, midbrain DA pathways, the response-outcome circuit, and the prefrontal cortex.

2.2.1 Gray’s Behavioural Inhibition System

One of the earliest brain circuits implicated in punishment comprises the septum and hippocampus (septohippocampal system), and their monoaminergic afferents from the brainstem, including 5-HT from the raphe, NE from the LC and DA from the ventral midbrain (Eison & Temple, 1986; Gray, 1982; McNaughton & Gray, 2000). Early intracranial electrical stimulation studies in cats, dogs and monkeys by Kaada (1951) showed that stimulation of the anterior septum and anterior cingulate cortex (ACC) produced somato-motor inhibition, leading Kaada and colleagues to suggest the septum and ACC were involved in behavioural inhibition (Kaada et al., 1953). McCleary (1961) tested this hypothesis; lesions of the septum impaired passive avoidance of running to a goal-box or barpressing that led to shock (i.e. impaired suppression of punished responding), whereas ACC lesions had no effect. Interestingly, septal lesions had no effect on active avoidance, where running to the goal-box or pressing the bar prevented the delivery of shock (i.e. negative reinforcement), whereas ACC lesions impaired cats’ ability to acquire these negatively reinforced responses. This distinction led McCleary (1966) to isolate the role of response inhibition to the septum.

Although the hypothalamus and preoptic area were implicated (see Kaada et al., 1962), several researchers suggested that the septum’s role in response inhibition was mediated by the hippocampus (Altman et al., 1973; Kimble, 1969), lesions of which also lead to deficits in response inhibition (see Gray & McNaughton, 1983 for extensive review). This functional pathway is supported by the strong anatomical interconnectivity of the septum and hippocampus; medial septum projects to the hippocampal formation and the lateral septal area receives hippocampal efferents (Swanson, 1978).

The role of the septohippocampal system in response inhibition has been disputed by researchers who argued the septum and hippocampus were necessary for a variety of cognitive functions, particularly memory processes and spatial abilities (Black et al., 1978; Olton et al., 1979; Ursin, 1976); they argued impairments in response inhibition were actually due to impairments in spatial ability and memory. Addressing these alternatives, Gray and McNaughton (1983) reviewed hundreds of studies involving lesions of the septum and/or hippocampus within a variety of procedures testing different cognitive and behavioural processes. While a variety of effects were found following lesions of the septum and/or hippocampus, Gray and McNaughton noted that septal and hippocampal lesions reliably impaired learning and behaviour in tasks that required inhibition of instrumental behaviour, including punishment of instrumental responding that did not require spatial learning (e.g. unitary bar pressing), and in spatial tasks such as maze learning.

However, hippocampal or septal lesions had inconsistent effects in tasks that did not require instrumental behaviour inhibition or spatial abilities. Pavlovian fear conditioning as measured by freezing and conditioned suppression was only impaired in some studies, and this was attributed to the generally observed increase in motor activity following lesions.

Importantly, reinforced responding, especially negatively reinforced active avoidance, was enhanced following lesions, showing that it was not a general impairment in aversively-motivated instrumental behaviour. Enhanced active avoidance also suggests generally impaired memory is not the cause of impaired passive avoidance. Instead, the impairment seems specific to suppression of instrumental behaviour paired with a negative outcome.

Gray (1981, 1982) used the finding that septal and hippocampal lesions impaired passive avoidance but facilitated active avoidance to develop his biopsychological theory of personality. He hypothesized that there were two general systems competing to guide behaviour: the mutually inhibitory behavioural inhibition and behavioural activation systems (BIS and BAS, respectively), akin to Konorski’s opponent process model. Punishment activates the BIS (neurally embodied in the septohippocampal system), which suppresses BAS-driven responding for reward.

Gray (1977, 1982) also noted that the effect of septohippocampal lesions was comparable to the effects of anxiolytics, and suggested that the anti-punishment effects of anxiolytics were due to their influence on the septohippocampal BIS, particularly septal modulation of hippocampal theta activity. McNaughton and Gray (2000) surmised from aggregate findings that hippocampal lesions and anxiolytics (both GABAergic and serotonergic) had the same effects: shock sensitivity, escape and responding for continuous reinforcement are unchanged; two-way and non-spatial avoidance are enhanced; responding during partial reinforcement and extinction of reinforced behaviour is increased; and passive avoidance and spatial abilities are impaired.

This is supported anatomically, because α2 subunit-containing GABAA and 5-HT1A receptors, which various anxiolytics act on, are strongly expressed in the hippocampus and septum (Barnes & Sharp, 1999; Rudolph & Knoflach, 2011). Indeed, Eison and Temple (1986) outlined a model of neurotransmitter system contributions to BIS to explain the different effects of anxiolytics. They suggested the shared anti-punishment effects of anxiolytics are due to preventing anxiogenic 5-HT modulation of septohippocampal BIS neurons.

Scales measuring individual variation in BIS and BAS strength, indexed as trait sensitivity to punishment and reward respectively, have been developed (Torrubia et al., 2001). People with higher scores on the trait sensitivity to punishment scales suppress behaviour in response to negative outcomes more than those with lower scores (Avila & Torrubia, 2006). BIS-related scores are associated with increased amygdala and hippocampal gray matter (Barros-Loscertales et al., 2006), supporting the role of the hippocampus in punishment processing. Functional magnetic resonance imaging (fMRI) studies have also detected a correlation between BIS scores and punishment-induced amygdala-hippocampus co-activation (Hahn et al., 2010).

In summary, stimulation of the septohippocampal system suppresses behaviour, while lesions of this system impair suppression. Thus activity within the septohippocampal system appears to be sufficient and necessary for behavioural suppression. The BIS is recruited to suppress behaviours that result in negative outcomes by inhibiting the Behavioural Activation System that drives the reinforced responding. It has also been suggested that the anti-punishment effects of anxiolytics derive from disruption of BIS activity (Eison & Temple, 1986; Gray, 1977).

2.2.2 Amygdala

The amygdala has long been implicated in Pavlovian fear learning, with a traditional role for the lateral and basolateral nuclei of the amygdala (BLA) in aversive Pavlovian association formation (Kim & Jung, 2006; LeDoux, 1992; Maren, 2001). Principal cells of the BLA receive inputs from the thalamus and cortex conveying information about the CS and US, causing long-term potentiation of CS inputs onto BLA cells (Marek et al., 2013; Sah et al., 2003). This allows CS-driven recruitment of the central nucleus of the amygdala (CeA), which is thought to control the discharge of the various CRs through relevant downstream structures (Davis, 1992). The actions of glutamate at NMDA receptors are essential for this plasticity, as well as for fear learning. Thus in rodents, BLA lesions, reversible inactivation, or microinjections of NMDA receptor antagonists, among other manipulations, each impair acquisition of Pavlovian fear learning. In humans, amygdala damage likewise impairs fear learning and neuroimaging studies show a robust and reliable change in the blood oxygenation level dependent signal during fear learning.

The amygdala also serves roles in instrumental conditioning. Apart from its interaction with the septohippocampal BIS (Gray & McNaughton, 2000), the BLA has also been ascribed a more general role in instrumental conditioning, particularly in encoding response-outcome (R-O) associations (Corbit & Balleine, 2005; Parkes & Balleine, 2013). The evidence for this role of BLA in R-O associations will be discussed later in a section on the R-O circuit (p. 75). The amygdala also has a well-documented sensitivity to aversive stimuli, such that BLA neurons fire in response to unexpected but not expected aversive stimuli (Johansen et al., 2010). This prediction error coding allows the amygdala to form and update aversive associations. Though this has been used to outline the role of the amygdala in the modulation of Pavlovian associations, some research suggests a role for the BLA in encoding aversive associations in relation to their behavioural antecedents, i.e. for aversive instrumental associations.

Killcross and colleagues (1997) tested the role of BLA and CeA in Pavlovian fear versus conditioned punishment. They trained rats to press two levers for food on a VI-60sec schedule. They then made excitotoxic lesions of BLA or CeA, and trained rats on the same reinforcement schedule, except pressing one lever resulted in a 10sec tone CS that terminated with a 0.2mA 0.5sec footshock (CS+) on a VI-120sec schedule, while pressing the other lever resulted in a neutral clicker CS (CS-) on a VI-120sec schedule. Sham-lesioned rats suppressed responding on the CS+ lever compared to the CS- lever, which the authors interpreted as a conditioned punishment effect (suppression of responding that leads to an aversive CS). Sham-lesioned rats also suppressed responding during the CS+ compared to the CS-, which they interpreted as conditioned suppression (Pavlovian fear). Rats that received CeA lesions did not show conditioned suppression (impaired Pavlovian fear), but they did suppress responding on the CS+ lever compared to the CS- lever (intact conditioned punishment). Conversely, BLA-lesioned rats had intact conditioned suppression, but were impaired in conditioned punishment. These results suggest that CeA is crucial for the production of aversive CRs, but BLA may be more important for guiding aversively motivated instrumental actions.

In humans, fMRI studies have implicated the amygdala in punishment. In these studies, punishment has been achieved via different approaches (e.g., monetary loss, loss feedback) and fMRI shows amygdala activation (Zalla et al., 2000) and amygdala interactions with the ventral striatum (Camara et al., 2009) and hippocampus (Hahn et al., 2010). Amygdala volume is correlated with sensitivity to punishment (BIS scale) (Barros-Loscertales et al., 2006). Finally, a study of humans with bilateral amygdala lesions (Bechara et al., 1999) showed that they, unlike healthy controls, did not learn to avoid choosing from disadvantageous decks in the Iowa Gambling Task.

The amygdala has also been proposed as the locus of the anti-punishment effects of anxiolytics. Sommer and colleagues (2011) selectively lesioned 5-HT innervation of the amygdala in rats using microinfusions of the neurotoxin 5,7-DHT. 5-HT lesions caused a failure to suppress drinking of glucose solution when paired with footshock, but did not affect unpunished responding, general anxiety within an elevated plus maze, or ethanol preference. This suggests lesions did not release punished responding due to generally increased responding, diminished anxiety or impairments in taste discrimination. However, localisation of 5-HT action to the amygdala is doubtful because autoradiogram confirmation of lesions revealed significant 5-HT denervation in regions surrounding the amygdala (particularly the caudal portion of the striatum, globus pallidus, and entopeduncular nucleus).

Other systemic effects can also be localised to action on the amygdala. For example, intra-amygdala infusions of NE reduce the effectiveness of shock as an instrumental punisher (Margules, 1971a, 1971b). Using a Vogel conflict procedure (punished licking for water), Liu and Glowa (2000) showed that punishment down-regulated BZ binding sites and α1 GABAA subunit mRNA expression in the rat BLA and thalamus, but not other relevant regions like the cortex or hippocampus, compared to unpunished controls. Administration of the BZ alprazolam counteracted this BZ-site and α1-GABAA receptor downregulation, but had no effect on other analysed brain regions.

Evidence for a role for the amygdala in punishment also comes indirectly from psychopathologies whose symptoms involve perturbations in punishment processing. These include depression, psychopathy and substance abuse disorders, each of which is thought to involve maladaptive punishment processing (oversensitivity to punishment within depression, insensitivity within psychopathy and substance abuse) and perturbations in amygdala activity (Bechara & Damasio, 2002; Blair, 2008; Eshel & Rosier, 2010; Moul et al., 2012). For example, psychopathy, which is a mental disorder characterized by anti-social behaviour, is known to involve a deficit in punishment processing. In tasks where they must choose between an array of stimuli that are associated with monetary gain or loss, psychopaths (as identified by a psychopathy screen) choose punished options more than matched controls and do not learn to suppress punished responses across trials (Blair et al., 2004; Newman & Kosson, 1986). They also choose punished responses instead of withholding a response more than healthy controls (Newman et al., 1987, 1990).

The impaired instrumental suppression in psychopathic populations has been attributed to a deficit in modifying goal-directed responses due to perturbed amygdala activity. Psychopathic individuals have been reported to have both heightened and blunted amygdala activity in response to aversive stimuli (Birbaumer et al., 2005; Müller et al., 2003; Schneider et al., 2000) and smaller amygdala volume compared to controls (Weber et al., 2008; Yang et al., 2009). Moul and colleagues (2012) suggested the heightened amygdala response was driven by overactivation of the valence-coding CeA, whereas underactivation was mediated by the BLA, which encodes the value of an outcome, such that attenuated BLA activity reduces the aversive value of a punisher. Punishment impairments are thus proposed to be the result of a diminished ability to modulate instrumental responding in response to contingent aversive outcomes due to muted BLA-mediated aversion signals.

In summary, the amygdala is involved in Pavlovian aversive learning, as well as instrumental learning. The amygdala has also been directly implicated in punishment learning and behaviour. Amygdala activity corresponds to coding of the aversive value of outcomes, and blunting of this aversion signalling within mental health populations (e.g. psychopaths) corresponds to deficits in punishment learning. The systemic drug effects discussed earlier within this chapter have also been related to actions on the amygdala. This role of the amygdala in punishment has been specifically attributed to activity within the BLA, which is thought to encode the aversive value of instrumental actions and outcomes. Lesions of the amygdala, in particular the BLA, also cause deficits in avoidance of disadvantageous response options and instrumental suppression.

2.2.3 Midbrain dopamine circuits

DA neurons found within the ventral midbrain can be organised into three broad systems: the mesolimbic, mesocortical and nigrostriatal systems. The mesolimbic system consists of projections from the VTA to several parts of the limbic system, including the nucleus accumbens, amygdala and hippocampus. The mesocortical system consists of a VTA projection to the prefrontal cortex (PFC). Finally, the nigrostriatal system consists of projections from the SN to the striatum (caudate nucleus and putamen).

Each of these systems functions in tonic and phasic modes (Grace, 1991; Grace et al., 2007). Tonic, or pacemaker, firing involves a steady rate of firing (~4Hz; Grace, 1991), maintaining a baseline level of DA in downstream structures, which is vital for normal functions such as voluntary movement, cognition and motivation (Schultz, 1998, 2007b; Schultz & Dickinson, 2000). Under particular conditions, e.g. in response to presentation of a stimulus, these neurons phasically increase (on average 20Hz; Blythe et al., 2009; Dreyer et al., 2010) or decrease (no firing) their firing for 100-500ms, causing significant changes in DA concentrations and receptor occupancy at projection targets (Dreyer et al., 2010).

The firing patterns of these neurons, especially in response to motivationally salient stimuli, have been the focus of much research, particularly in appetitive Pavlovian conditioning (Schultz, 2007a). A variety of experimental findings by Schultz and colleagues have shown that midbrain DA neurons display phasic bursts in firing corresponding to appetitive CSs and USs in a prediction error dependent manner (Schultz, 2007a, 2007b; Schultz et al., 1993, 1997), which match relative size of reward (Tobler et al., 2005). However, this body of research is only indirectly relevant to punishment, so will not be extensively discussed here. Instead the role of DA in instrumental behaviour will be reviewed. Specific DA signals arising from differential firing patterns have separately implicated each of the three DA systems in punishment learning and behaviour.

Mesolimbic DA reward coding

There is strong evidence to suggest that burst firing of mesolimbic DA neurons encodes reinforcer value to promote instrumental behaviour. Both natural reinforcers (e.g. food, sex) and addictive drugs stimulate DA activity (Becker et al., 2001; Everitt & Robbins, 2005; Matsumoto & Hikosaka, 2007), while intracranial self-stimulation (ICSS) of DA structures and circuits is inherently reinforcing (Adamantidis et al., 2011; Ilango et al., 2014; Kim et al., 2012; Olds & Milner, 1954; Olds & Olds, 1963), even displacing self-preservative activities such as eating and drinking (Milner, 1991). Preventing these DA signals by genetically knocking out NMDA receptors from DA neurons, which impairs burst firing but not tonic firing within these neurons (Komendantov et al., 2004; Parker et al., 2010), causes a selective impairment in specific forms of learning including instrumental behaviour (Zweifel et al., 2009).

Mesolimbic DA neurons have thus been labelled the “pleasure center” of the brain (Wise, 1980; Wise & Rompré, 1989) and projections from the VTA to the forebrain have been called the “brain reward circuit” (Ikemoto, 2007, 2010; Phillips, 1984). That said, hedonic experience is not dependent on DA neurons (Berridge, 2007; Cannon & Palmiter, 2003; Schultz, 2006; Spanagel & Weiss, 1999). Cannon and Palmiter (2003) showed that mice that were genetically modified to be incapable of producing DA still showed a preference for sucrose over water, but did not acquire reinforced responding despite having no gross sensory or motor deficits. In fact, these mice starve, even in the presence of palatable food, unless given injections of L-DOPA, which rescues DAergic function, or direct placement of food into their mouths, at which point they readily eat. Instead, DA is argued to mediate goal-directed, “wanting” behaviours, and not hedonic “liking” (Berridge, 2007; Cannon & Palmiter, 2003).

Taken together, there is strong evidence that the endogenous activity of midbrain DA neurons is both necessary and sufficient for reinforcement of behaviour. This endogenous activity is typified by burst firing of VTA DA neurons in response to rewarding stimuli, which causes the release of DA at projection targets.

Mesolimbic DA inhibition

It follows therefore that a highly plausible candidate mechanism for punishment is inhibition of DA neurons. Indeed, VTA firing is also sensitive to negative outcomes, with unexpected omission of reward and aversive outcomes causing a pause in firing within DA neurons (Guarraci & Kapp, 1999; Matsumoto & Hikosaka, 2009a; Mirenowicz & Schultz, 1996; Schultz, 2007a, 2007b; Schultz et al., 1997; Tan et al., 2012). Though both these outcomes are sufficient to punish behaviour, it is unclear whether these phasic pauses underpin punishment.

Matsumoto and Hikosaka (2007) trained monkeys to make visual saccades to a target to the left or right of a fixation point. Correct saccades to one side (e.g. a left saccade) were followed by a tone and delivery of a liquid reward from a spout to the monkey’s mouth, whereas saccades to the other side (e.g. a right saccade) were followed by a tone but no reward. This resulted in monkeys making faster saccades to the rewarded target compared to the unrewarded target. They also observed a phasic excitation of DA neurons with reward delivery and a slight inhibition when reward was not delivered during initial trials. However, as the contingency between target direction and reward was learned, the phasic DA response to the outcome was no longer observed and was replaced by phasic excitation and inhibition of DA firing to presentations of the rewarded and unrewarded target, respectively. This replicated the prediction error findings of Schultz and colleagues: a phasic DA burst at the onset of reinforced cues, while the DA response to the reward itself diminished as the reward became increasingly expected. While this was an instrumental task with an unrewarded response that elicited saccades slower than its reinforced alternative, the unrewarded saccade was not technically punished, as monkeys had to make the saccade to move onto the next trial, and the slowing could simply reflect appetitive extinction.
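The prediction error logic described above can be illustrated with a minimal sketch. The snippet below is not taken from any of the cited studies; it assumes a simple Rescorla-Wagner-style update, and the function name, learning rate and trial counts are hypothetical values chosen only for illustration. It shows the error to a reward shrinking as the reward becomes expected, and an unexpected omission producing a negative error, the pattern likened to phasic DA excitation and inhibition. It does not model the transfer of the response from outcome to cue, which would require a temporal-difference model.

```python
def prediction_errors(outcomes, alpha=0.2, v0=0.0):
    """Return (predictions, errors) for a sequence of trial outcomes.

    Assumed Rescorla-Wagner-style rule (illustrative only):
        error = outcome - prediction
        prediction <- prediction + alpha * error
    """
    v = v0
    predictions, errors = [], []
    for r in outcomes:
        delta = r - v            # prediction error: obtained minus expected
        predictions.append(v)    # expectation held before the outcome
        errors.append(delta)
        v += alpha * delta       # move expectation toward the outcome
    return predictions, errors


# Hypothetical schedule: 30 rewarded trials, then one unexpected omission.
preds, errs = prediction_errors([1.0] * 30 + [0.0])

print("error on first rewarded trial:", round(errs[0], 3))
print("error on last rewarded trial:", round(errs[29], 3))
print("error on omission trial:", round(errs[30], 3))
```

Under these assumptions the first reward yields a large positive error, the thirtieth yields almost none, and the omission yields a large negative error.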

Apart from endogenous pauses in DA activity to negative outcomes, evidence for a role of midbrain DA inhibition in aversion comes from studies that inhibited VTA DA neurons. Liu and colleagues (2008) administered quinpirole (D2r agonist) into the VTA, which caused inhibition of the VTA and resulted in rats spending less time in a quinpirole-paired chamber on a subsequent non-infusion day (conditioned place aversion; CPA). Similarly, infusion of a D1r antagonist into the ventral striatum, a major target of reward-coding VTA DA neurons, which might mimic the effect of reduced DA release as caused by VTA DA inhibition, also results in CPA (Shippenberg et al., 1991). Tan and colleagues (2010) showed that the benzodiazepine midazolam inhibits the activity of VTA GABA neurons, consequently increasing the activity of VTA DA neurons, an effect that depended on the α1 GABAA subunit. This impairment in VTA DA inhibition might mediate the previously described punishment-releasing effects of benzodiazepines, though this has yet to be shown directly.

Tan and colleagues (2012) showed that electric footshock caused a phasic excitation of VTA GABA cells and a phasic inhibition of VTA DA cells in anaesthetized mice. To test the causal role of GABA cells, they expressed ChR2 in VTA GABA interneurons under control of a GAD promoter. Stimulation of these GABA neurons caused inhibition of VTA DA neurons, which was blocked by the GABA antagonist bicuculline. When allowed to explore two chambers connected by a passageway, mice that had ChR2 expressed in GABA neurons developed an aversion to a chamber paired with ChR2 stimulation and spent less time in it, even on a subsequent non-stimulation day (CPA), whereas mice that did not have ChR2 expression did not develop a preference and spent equal amounts of time in both chambers.

Tan and colleagues tested whether direct inhibition of VTA DA neurons had the same effect by expressing halorhodopsin (NpHR), a chloride pump activated by orange light (causing hyperpolarisation of neurons expressing it, decreasing cell firing), in VTA TH+ neurons. Halorhodopsin-mediated inhibition of VTA DA neurons caused CPA, just as ChR2 stimulation of VTA GABA neurons had done.

Danjo and colleagues (2014) tested the role of downstream targets in the aversive inhibition of VTA DA. Using TH-Cre mice they expressed a hyperpolarizing opsin, Arch, in VTA DA neurons. Optogenetic inhibition of the VTA caused reduced levels of DA in the accumbens (Acb) and induced CPA to a dark chamber paired with stimulation. Using lentivirus containing short hairpin RNA specific to D1 or D2 receptors injected into the Acb (causing selective knockdown of D1r or D2r), they showed that CPA induced by VTA inhibition depended on accumbal D2r but not D1r; control and D1r knockdown mice exhibited CPA, whereas D2r knockdown mice did not.

A key limitation when considering this literature is that none of the above tasks are clearly punishment-based, and effects could be explained using non-punishment accounts, e.g. CPA being driven by Pavlovian aversion. However, taken together there is strong evidence that inhibition of VTA DA is aversive, and could thus possibly mediate punishment.

The LHb-RMTg-DA circuit

While Tan and colleagues (2012) demonstrated that VTA GABA is able to inhibit VTA DA, another inhibitory input to VTA DA has been speculated to mediate punishment learning and behaviour (Hikosaka, 2010; Hikosaka et al., 2008; Lawson et al., 2014; Stamatakis & Stuber, 2012). The lateral habenula (LHb), a highly conserved structure within the epithalamus (Concha & Wilson, 2001), projects to ventral midbrain neurons and 5-HT neurons in the raphe nuclei via the fasciculus retroflexus (fr) (Hikosaka et al., 2008; Kim, 2009). Ji and Shepard (2007) reported that electrically stimulating LHb in rats inhibited 97% of DA neurons in VTA and SN. This inhibition was not observed if the fr was lesioned before stimulation, suggesting that LHb inhibition of midbrain DA depended critically on the descending projections from the LHb. Ji and Shepard (2007) also showed that infusing a GABAA antagonist into the VTA prevented midbrain DA inhibition via LHb stimulation, showing this inhibition of DA is mediated by GABAA receptors.

However, LHb neurons are glutamatergic, and thus excitatory (Lecourtier & Kelly, 2007; Omelchenko et al., 2009), ruling out the possibility that LHb directly inhibits midbrain DA neurons. It was hypothesized that LHb stimulation inhibited DA neurons by exciting GABA interneurons within the VTA (Lecourtier & Kelly, 2007) but Omelchenko and colleagues (2009) showed that this was unlikely because 85% of LHb terminals in the VTA were on DA neurons, which meant, excluding indirect pathways, LHb excitation would tend to excite VTA DA neurons. LHb-driven GABAergic inhibition of VTA DA is instead mediated by the rostromedial mesopontine tegmental nucleus (RMTg), a population of GABAergic neurons immediately caudal to the VTA (Bourdy & Barrot, 2012; Kaufling et al., 2009). RMTg receives strong inputs from LHb, and in turn densely projects to midbrain DAergic neurons, though it also has projections to other regions including the dorsal raphe and other tegmental nuclei (Jhou et al., 2009a). Balcita-Pedicino and colleagues’ (2011) ultrastructural analyses confirmed that glutamatergic LHb axons terminate on RMTg GABA neurons, which in turn inhibit VTA DA neurons.

Phasic inhibition of midbrain DA to negative outcomes is thought to derive from this LHb-RMTg input. LHb and RMTg neurons are phasically excited by aversive stimuli and reward omission (Hong et al., 2011; Matsumoto & Hikosaka, 2007, 2009a), in a prediction error manner. Crucially, this excitation precedes midbrain DA inhibition. Given the sufficiency of LHb and RMTg excitation to inhibit midbrain DA (Christoph et al., 1986; Hong et al., 2011; Ji & Shepard, 2007; Matsumoto & Hikosaka, 2007), there appears to be a functional pathway of LHb phasic excitation → RMTg phasic excitation → VTA phasic inhibition that codes for negative outcomes in a prediction error manner. This hypothesis is also supported in humans; fMRI studies reveal the habenula is activated in response to aversive shocks (Lawson et al., 2014), as well as negative feedback and absence of expected positive feedback (Ullsperger & von Cramon, 2003).

Jhou and colleagues (2009b) observed that shocks and aversive CSs cause increased c-Fos expression within VTA-projecting RMTg neurons, but not surrounding VTA-projecting neurons, suggesting aversive inputs to VTA derive from RMTg. Some evidence for the importance of LHb inputs onto RMTg for aversion comes from a study by Brown and Shepard (2013), who showed that rats that received four 0.5sec, 0.5mA shocks delivered across a 20min session had a significant increase in c-Fos expression (marker for neural activity) within the LHb and RMTg compared to rats that did not receive shocks. Crucially, lesions of the fr attenuated the shock-induced c-Fos within the RMTg without significantly affecting LHb c-Fos. However, while the LHb and RMTg showed marked increases in c-Fos expression following high-intensity footshocks (120 footshocks of 0.8mA and pseudo-random duration [5-15secs] delivered over 40mins), the RMTg increase was not dependent on having an intact fr, suggesting other pathways are recruited in situations of comparably high-intensity shock. The RMTg receives inputs from a number of other neural structures apart from the LHb, with projections from cortical, striatal and midbrain regions (Jhou et al., 2009a), so aversive signals could be compensated for by other inputs in situations of severe shock.

Jhou and colleagues (2009b) found that lesions of the RMTg attenuated acquisition of freezing to a 20sec CS signalling 0.5mA shock. RMTg lesions also decreased freezing but increased treading in the presence of a predator odour (aversive US-elicited behaviour), such that no difference in overall time spent displaying defense behaviours was found, simply a shift in proportion of defense behaviour from inactive to active. This proportion of freezing was highly correlated (r = 0.86) with the extent of RMTg lesions. RMTg lesions also increased open arm entries within an elevated plus maze.

Evidence that activity in the LHb-RMTg pathway is sufficient to support aversion comes from experiments performed by Stamatakis and Stuber (2012). They expressed ChR2 or eYFP (control) in mouse LHb neurons, and implanted optic fibres directly above the RMTg. Because ChR2 would only be expressed in LHb neurons, light delivery into the RMTg would selectively activate the ChR2-expressing LHb terminals within the RMTg (though possibly LHb axons passing through the RMTg as well). This method allows for the selective manipulation of neural pathways, in this case, the LHb-RMTg pathway. Stamatakis and Stuber showed that optogenetic stimulation of the RMTg caused a CPA in ChR2, but not eYFP, mice. This CPA was observable up to 7 days after stimulation. ChR2 stimulation of the RMTg also negatively reinforced nosepoking, such that nosepoking was acquired if it caused a 20-sec cessation of ChR2 stimulation in ChR2 but not eYFP mice. Also, after training mice to nosepoke for sucrose solution, optogenetic stimulation suppressed responding in ChR2 mice compared to eYFP mice. This suggests that LHb-RMTg activation can function as a punisher and negative reinforcer.

Several lines of evidence suggest the LHb receives these aversion coding signals from the globus pallidus (GP; entopeduncular nucleus [EP] in rodents) (Hong & Hikosaka, 2008;

Shabel et al., 2012; Wickens, 2008). The LHb receives dense projections from this region

(Herkenham & Nauta, 1977; Shabel et al., 2012), LHb-projecting GP neurons have been shown to signal aversion similarly to LHb neurons (Hong & Hikosaka, 2008), and ChR2 stimulation of the GP-LHb pathway causes conditioned place aversion (Shabel et al., 2012).

There is also evidence of motivationally significant excitatory and inhibitory inputs to the LHb coming from the VTA (Good et al., 2013; Root et al., 2014; Stamatakis et al., 2013), with glutamatergic excitation of the LHb causing CPA (Root et al., 2014) and GABAergic inhibition of the LHb causing conditioned place preference and reinforcement (Stamatakis et al., 2013). Serotoninergic modulation of LHb inputs has also been reported (Hwang & Chung, 2014; Shabel et al., 2012, 2014).

In summary, reward-coding DA neurons in the ventral midbrain are excited by appetitive stimuli and positive outcomes in accordance with appetitive prediction error, thus coding for reward expectancy and deviations from expected outcome. Burst firing within these neurons is sufficient to reinforce behaviours and is thought to be necessary for instrumental reward.

These reward-coding neurons are also inhibited by aversive stimuli and negative outcomes, causing a pause in firing below baseline rates on a negative prediction error basis. This phasic inhibition of DA neurons is linked to LHb excitation, which inhibits the VTA via the GABAergic RMTg. Manipulation of this circuit has consistently shown that inhibition of VTA DA (by direct inhibition or stimulation of VTA GABA interneurons, LHb or RMTg) is aversive. These findings have led to the view that the LHb-RMTg-VTA pathway is critical for learning and behaviour within punishment situations (Hikosaka, 2010; Hikosaka et al., 2008; Stamatakis & Stuber, 2012). However, this claim does not have direct support from the literature: no study has yet prevented these signals and observed a deficit in punishment, a finding vital to asserting the necessity of these signals for punishment learning and behaviour. It is just as plausible that the observed effects are driven by negative reinforcement, appetitive extinction and/or punishment-independent aversion.
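The prediction error account summarised above can be made concrete with a minimal sketch. It assumes a textbook Rescorla-Wagner-style update, which is not a model fitted or used in this thesis; the function name, learning rate and trial sequence are hypothetical choices for illustration only. The sign of the error term is meant as a loose analogy to phasic bursts (positive error) and pauses (negative error) in reward-coding DA neurons.

```python
# Illustrative sketch only: a Rescorla-Wagner-style prediction error.
# All names and parameter values here are assumptions, not taken from the thesis.

def simulate_prediction_errors(outcomes, learning_rate=0.2, initial_expectation=0.0):
    """Return (final expectation, list of per-trial prediction errors)."""
    expectation = initial_expectation
    errors = []
    for outcome in outcomes:
        # Positive error: outcome better than expected (analogous to a phasic burst).
        # Negative error: outcome worse than expected (analogous to a pause in firing).
        error = outcome - expectation
        errors.append(error)
        # Expectation moves a fraction of the way towards the obtained outcome.
        expectation += learning_rate * error
    return expectation, errors


if __name__ == "__main__":
    # 50 rewarded trials (outcome = 1.0) followed by one unexpected omission (outcome = 0.0).
    trials = [1.0] * 50 + [0.0]
    _, errors = simulate_prediction_errors(trials)
    print(f"first (unexpected) reward: {errors[0]:+.4f}")
    print(f"last (expected) reward:    {errors[49]:+.4f}")
    print(f"unexpected omission:       {errors[50]:+.4f}")
```

In this sketch an unexpected reward yields a large positive error, a fully expected reward yields an error near zero, and omission of an expected reward yields a negative error, mirroring the description of responses that are larger to unexpected than to expected events.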

The nigrostriatal and indirect basal ganglia pathway

Most of the literature on the motivation and learning aspects of reward and punishment has focused on VTA neurons, particularly the role of the mesolimbic DA system. The nigrostriatal system has traditionally been allocated sensorimotor functions including voluntary motor execution and habit formation (Haber, 2003). However, these neurons also show reward prediction error coding (Matsumoto & Hikosaka, 2007), with the LHb-RMTg pathway innervating the SNc as densely as the VTA (Jhou et al., 2009a) and causing comparable inhibition of SN DA firing (Christoph et al., 1986; Matsumoto & Hikosaka, 2007). This suggests that the activity of SNc neurons shares similar properties with that of reward-coding VTA neurons and, given similar aversion-coding inputs from the LHb-RMTg pathway, it is possible the SNc also has a role in reinforcement and aversion.

Ilango and colleagues (2014) investigated this possibility, expressing ChR2 in VTA and SNc DA neurons of TH-Cre mice. They showed that mice would acquire leverpressing if it caused 20ms, 25Hz intracranial blue light pulse delivery (optogenetic ICSS) into the SNc as much as mice that received VTA stimulation. No leverpressing was found on an unreinforced lever, but if the reinforcement contingency between levers was reversed, such that pressing the unreinforced lever caused intracranial light pulses and pressing the reinforced lever no longer had any consequence, leverpressing rapidly switched to the now reinforced lever for both VTA- and SNc-stimulated mice.

To test the effect of inhibition on these two neuron populations they expressed inhibitory NpHR within the VTA and SNc of mice. Optogenetic inhibition of VTA and SNc DA while mice were in one half of a chamber caused CPA, such that mice with NpHR expressed within their VTA and SNc spent significantly less time in the light-paired side, and crossed over between halves less, compared to control mice. This preference could be reversed by switching which half caused intracranial light pulse deliveries. When NpHR was expressed in the dorsal striatum (dStr), optogenetic inhibition also caused CPA that could be reversed by changing the chamber paired with light deliveries. This provides evidence that excitation and inhibition of SNc DA neurons, which are thought to cause corresponding excitation and inhibition within their downstream striatal targets, can support reinforcement and aversion.

As previously mentioned, the nigrostriatal pathway is part of the basal ganglia. The basal ganglia have two pathways: a direct and an indirect pathway (Graybiel, 2000). Though there are slight variations across models about what structures are within these pathways (Bolam et al., 2000; Graybiel, 2000; Kravitz & Kreitzer, 2012; Wall et al., 2013), both pathways begin in the dStr, and have opposite effects on downstream targets controlling movement, i.e. the internal segment of the globus pallidus (GPi, or EP in rodents), the SN pars reticulata (SNr), and thalamus. Direct pathway neurons in the striatum, mainly GABAergic medium spiny neurons (MSNs), project directly to the GPi and SNr, which have tonic inhibitory influences on the thalamus. Thus activation of direct pathway neurons in the striatum causes inhibition of GPi and SNr neurons, which causes disinhibition of thalamic neurons (Kravitz & Kreitzer, 2012). The subsequent increase of thalamic firing causes the release of movement through glutamatergic activation of cortical and striatal progenitors of motor movement. Indirect pathway neurons instead have an indirect projection to GPi and SNr neurons; indirect pathway MSNs in the striatum project to the external globus pallidus (GPe), which has an inhibitory projection to the glutamatergic subthalamic nucleus (STN), which projects to the inhibitory GPi and SNr (Calabresi et al., 2014). Thus activating indirect pathway neurons causes activation of the GPi and SNr through disinhibition of excitatory inputs, which in turn inhibits thalamic neurons responsible for initiating voluntary movements. Therefore activation of the indirect pathway suppresses behaviours (Kravitz & Kreitzer, 2012).
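The opposing net effects of the two pathways follow from multiplying the signs of the connections described above. The sketch below is a bookkeeping illustration only, assuming each projection can be reduced to a single sign (+1 excitatory, -1 inhibitory); the data structures and function name are hypothetical, and real circuit dynamics are of course not captured by sign arithmetic.

```python
# Illustrative sketch only: net sign of each basal ganglia pathway as described in the text.
# Reducing each projection to +1/-1 is a simplifying assumption made for illustration.
from math import prod

INHIBITORY, EXCITATORY = -1, +1

# Each entry is (source, target, sign of the projection).
DIRECT_PATHWAY = [
    ("striatal MSN", "GPi/SNr", INHIBITORY),   # GABAergic MSNs inhibit GPi/SNr
    ("GPi/SNr", "thalamus", INHIBITORY),       # GPi/SNr tonically inhibit thalamus
]
INDIRECT_PATHWAY = [
    ("striatal MSN", "GPe", INHIBITORY),       # GABAergic MSNs inhibit GPe
    ("GPe", "STN", INHIBITORY),                # GPe inhibits STN
    ("STN", "GPi/SNr", EXCITATORY),            # glutamatergic STN excites GPi/SNr
    ("GPi/SNr", "thalamus", INHIBITORY),       # GPi/SNr tonically inhibit thalamus
]


def net_effect(pathway):
    """Multiply connection signs: +1 means net facilitation, -1 net suppression."""
    return prod(sign for _source, _target, sign in pathway)


if __name__ == "__main__":
    print("direct pathway, net effect on thalamus:  ", net_effect(DIRECT_PATHWAY))
    print("indirect pathway, net effect on thalamus:", net_effect(INDIRECT_PATHWAY))
    # Truncating the indirect pathway gives the net effect on GPi/SNr itself.
    print("indirect pathway, net effect on GPi/SNr: ", net_effect(INDIRECT_PATHWAY[:3]))
```

Two inhibitory links in series give a net facilitation of the thalamus (disinhibition) for the direct pathway, whereas the indirect pathway's additional inhibitory link through the GPe and STN flips the net sign to suppression.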

Although these direct and indirect pathway neurons are intermingled within the dmStr, they can be distinguished by using markers for D1 and D2 receptors (Gerfen et al., 1990; Hikida et al., 2010; Kravitz et al., 2010). This difference in receptor expression has been proposed to mediate differences in recruitment of direct and indirect pathways by DA (Bromberg-Martin et al., 2010; Frank, 2005; Hikosaka, 2007; Kreitzer & Malenka, 2008); burst firing in DA neurons causes high DA concentrations within the striatum, preferentially activating D1r, thus recruiting the direct pathway, while pauses in DA neuron firing cause low DA concentrations within the striatum, preferentially activating D2r, thus recruiting the indirect pathway (Bromberg-Martin et al., 2010; Shen et al., 2008).

Kravitz and colleagues (2012) investigated the direct and indirect pathways using optogenetic techniques. They infused virus containing DIO-ChR2 construct into the dmStr of D1-Cre and D2-Cre transgenic mice, so ChR2 would only be expressed in D1 or D2 MSNs. They found that optogenetic stimulation of D1 neurons reinforced contacts with a touch sensor that caused 1sec blue light into the dmStr, even on following extinction sessions (no light delivery on contact with sensor), suggesting appetitive learning had occurred. In contrast, optogenetic stimulation of D2 neurons reduced likelihood of further contact with the light-paired touch sensor compared to controls.

Kravitz and colleagues (2012) used these findings to argue that activation of the direct pathway is sufficient for persistent reinforcement, whereas activation of the indirect pathway is sufficient for transient punishment, which they suggested was a transient phenomenon anyway, based on the conclusions of Skinner (1953). Combined with the fact that long-term potentiation (LTP) and long-term depression (LTD) can occur separately on direct and indirect pathways (Shen et al., 2008), Kravitz and Kreitzer (2012) suggested that LTP of the direct pathway and LTD of the indirect pathway mediated reinforcement, whereas LTP of the indirect pathway and LTD of the direct pathway mediated punishment. Thus punishment is thought to be mediated by aversive outcome-induced SNc inhibition, causing recruitment of the indirect pathway to suppress behaviours.

Aversion-coding neurons of the VTA

Although reward-coding DA neurons dominate the literature on DA function, several researchers have found that some DA neurons within the VTA are phasically excited by aversive stimuli (e.g. Cohen et al., 2012; Matsumoto & Hikosaka, 2009b). McCutcheon and colleagues (2012) reviewed studies recording from midbrain DA neurons during aversive stimulation and found there was significant variation in the proportion of neurons inhibited or excited by aversive stimulation, and suggested that this variation might be due to differences in the nature of the aversive event (airpuff, pinch, shock, aversive CS etc.), DA region recorded from, method of determining neuron type and whether the animal was awake during recording (see also Ilango et al., 2012 for a similar review). However, all but one study (Ungless et al., 2004) found some neurons that increased firing in response to an aversive event. This finding has been used to suggest that midbrain DA neurons may also encode aversion via excitation (Bromberg-Martin et al., 2010; Lammel et al., 2014), though this role for midbrain DA has been disputed (Fiorillo, 2013; Horvitz, 2000).

Matsumoto and Hikosaka (2009b) argued that there are at least two distinct types of DA neurons that code for aversive stimuli differently. One type, found more ventromedially (including in the VTA), was excited by appetitive stimuli and inhibited by aversive stimuli, thus coding for motivational value. The other type of neuron, found more dorsolaterally and in the SNc, was excited by both aversive and appetitive stimuli, and thus seems to code for motivational salience. This suggests that different DA signals can be coded for by different dopamine neurons, thus not limiting midbrain dopamine to conveying only one type of information. Therefore, the theories that midbrain dopamine codes for salient environmental stimuli and reward prediction error are not mutually exclusive.

Mantz and colleagues (1989) found that tail pinch of anaesthetized rats caused no increase in firing for any recorded mesoaccumbal DA neurons, while 25% decreased firing in response to tail pinch. However, 65% of mesocortical DA neurons increased firing in response to tail pinch. Lammel and colleagues (2011) extended these findings, showing reinforcing cocaine administration increased excitatory synapses onto mesolimbic VTA neurons, but not mesocortical neurons, whereas aversive formalin injections increased excitatory synapses onto mesocortical neurons and lateral AcbSh-projecting VTA neurons, but not medial AcbSh-projecting VTA neurons. Thus the mesocortical pathway seems to encode aversive stimuli, unlike the mostly reward-coding mesolimbic pathway.

Lammel and colleagues (2012) investigated this mesocortical pathway in aversion, as well as the role of LHb modulation of this pathway. They studied VTA-projecting LHb neurons in mice. They injected ChR2-containing rabies virus, which is taken up by terminals to cause expression of ChR2 construct in afferents, into the VTA and implanted optic fibres into the LHb. Stimulation of VTA-projecting LHb neurons induced CPA of a stimulation-paired chamber.

To determine which VTA pathway the LHb projections synapsed onto, the mesolimbic and/or mesocortical system, the authors injected retrogradely-transported fluorescent beads into the mPFC (mesocortical) or Acb (mesolimbic), and ChR2-eYFP into the LHb. They recorded from midbrain slices and found that stimulation of LHb terminals induced cell firing in TH+ neurons projecting to the mPFC and GABAergic RMTg neurons, but not TH+ Acb-projecting or SN neurons. In fact, stimulation of LHb terminals caused an inhibition of VTA DA cells projecting to the lateral shell of the Acb (but not medial shell-projecting neurons), which was blocked by GABA antagonist picrotoxin application. These observations affirmed previous findings that the LHb mostly projects to the GABAergic RMTg and VTA DA cells (Omelchenko et al., 2009), but extended this by isolating the excitatory LHb-VTA projection to mesocortical neurons. It also suggests that the inhibitory influence of the LHb on VTA neurons is on mesolimbic neurons projecting to the lateral AcbSh (though no data on inhibitory currents for mesocortical or SN neurons were reported).

Finally, Lammel and colleagues (2012) tested whether the CPA induced by stimulating VTA-projecting LHb neurons was mediated by mesocortical DA. In a replication of the previously described experiment, they expressed ChR2 in VTA afferents using rabies virus and implanted optic fibres into the LHb, but they also cannulated the PFC to allow for local drug infusions. They replicated their previous findings that optogenetic stimulation of VTA-projecting LHb neurons induced CPA of a stimulation-paired chamber, but infusions of SCH23390 (D1r antagonist) blocked this CPA, showing that aversion induced by LHb-VTA stimulation depends on D1r activity in the PFC. They concluded that the LHb-mesocortical DA pathway contributes to aversive learning and behaviour.

Summary of midbrain DA systems involved in punishment

Midbrain DA can be divided into three systems: the mesolimbic, mesocortical and nigrostriatal systems. Each of these has been implicated in aversion, and by extension, punishment. The mesolimbic and nigrostriatal systems contain reward-coding neurons that emit phasic increases in firing in response to appetitive stimuli and phasically pause firing in response to aversive stimuli. These phasic changes in firing are greater to unexpected than expected stimulus presentations, which has been linked to prediction error coding and a role for these phasic signals in plasticity-mediated learning. Both the mesolimbic and nigrostriatal systems are inhibited by an aversion-coding LHb-RMTg input. Stimulation of the LHb-RMTg pathway, and inhibition of the mesolimbic or nigrostriatal system, is aversive, as measured by place avoidance and ICSS. There is some recent evidence that the effect of aversion-coding pauses in VTA firing is mediated by D2r in the Acb, while SNc pauses in firing are thought to cause activation and LTP of D2r-expressing indirect pathway neurons within the dStr.

In contrast, mesocortical neurons receive a direct, excitatory input from the LHb and are presumably excited by aversive stimuli. Activation of the LHb-VTA pathway causes a CPA that is blocked by D1r antagonist in the mPFC. A summary of these circuits is illustrated in Figure 2.1.

This complex array of contributions of midbrain DA signals to aversion processing has been used to suggest these various signals are important for punishment (Bromberg-Martin et al., 2010; Lammel et al., 2014). However, though some of these pathways have been shown to have punishment-relevant neural activity, with artificial emulation of these signals being sufficient for punishment-like effects, the necessity of these pathways in instrumental punishment has yet to be directly examined.

Figure 2.1. Summary of midbrain DA circuits suggested to mediate punishment in sagittal view. Negative outcomes cause excitation of the EP and LHb, causing inhibition of nigrostriatal and mesolimbic DA neurons, and excitation of mesocortical DA neurons, affecting DA transmission in their respective downstream structures. Adapted from Paxinos and Watson (2007).

2.2.4 Response-outcome circuit

As reviewed in Chapter 1, instrumental behaviour has properties indicative of response-outcome (R-O) and stimulus-response (S-R) associations. A growing body of evidence has suggested that distinct circuits mediate R-O and S-R learning and behaviour (Balleine & O’Doherty, 2010; Yin & Knowlton, 2006). Studies using rodents have shown that pre-training lesions of the BLA, but not CeA or sham lesions, impair choice of a non-devalued response over a satiety-devalued response under extinction conditions but not under continued reinforcement (Corbit & Balleine, 2005). Pre-training BLA lesions also impair contingency degradation effects (Balleine et al., 2003). Ostlund and Balleine (2008a) found that post-training, pre-devaluation lesions had no effect on these R-O dependent behaviours.

These findings were used to suggest that the BLA is required to learn and use R-O associations.

Pre-training lesions of the prelimbic region (PL) slowed acquisition of an instrumental response, and removed normal bias towards responding for a non-devalued outcome over responding for a satiety-devalued outcome under conditions of extinction, but not continued reinforcement (Corbit & Balleine, 2003). Sham-lesioned (control) rats also suppressed responding on an operant that led to an outcome that was also delivered independently of responding (degraded response) compared to a non-degraded response (even in a subsequent extinction trial), whereas PL-lesioned rats performed the degraded response more than the non-degraded response, and under extinction had no preference between responses. Later studies showed that post-training, pre-devaluation lesions had no effect on these R-O dependent behaviours (Ostlund & Balleine, 2005; Tran-Tu-Yen et al., 2009), suggesting the PL is involved in acquisition of R-O associations, but not their storage or expression, nor the updating of outcome value and incorporation of updated value into choice. Corbit and Balleine (2003) also suggested that lesions of the PL prevented formation of R-O associations so responding was acquired through an S-R mechanism.

Pre-training lesions, but not post-training lesions, of the mediodorsal thalamus (MDT) in rats cause similar impairments in outcome devaluation and contingency degradation tasks, with intact response and outcome discrimination (Corbit et al., 2003; Ostlund & Balleine, 2008a). This may reflect the MDT input to the PFC, including projections to the PL (Öngür & Price, 2000).

PL and BLA have robust projections to the dorsomedial striatum (dmStr) (Voorn et al., 2004). Yin and colleagues (2005a) showed that infusing the NMDA receptor antagonist APV into the dmStr immediately prior to training prevented outcome devaluation. Another study by Yin and colleagues (2005b) showed that pre- or post-training lesions of the dmStr abolished sensitivity to outcome devaluation and contingency degradation. The presence of a pre- and post-training impairment in both tasks was used to suggest that, like the PL, the dmStr is critical for the formation of R-O associations but, unlike the PL, is also required for the expression of R-O behaviour. This was supported by findings that dmStr inactivations using infusions of GABAA agonist muscimol following pre-feeding, but prior to extinction test, impaired sensitivity to devaluation by satiety. Also, after being trained in a contingency degradation task drug-free (all rats showed more responding on the non-degraded response), infusions of muscimol into the dmStr removed preferential responding on the non-degraded response under extinction conditions.

These findings have been used by Balleine and colleagues (Balleine et al., 2007; Balleine & O’Doherty, 2010; Hart et al., 2014) to propose a model of R-O action control in the MDT-PL-dmStr pathway alluded to above, with the dmStr directly or indirectly projecting to the SNr, which projects back to the MDT, creating a corticostriatal loop. This circuit is modulated by DAergic signals from the SNc to the dmStr, as well as by signals from the BLA encoding sensory features and value (Balleine & Killcross, 2006) via projections to dmStr (Corbit et al., 2013) but not mPFC (Coutureau et al., 2009) (summarised in Figure 2.2).

Punishment has several characteristics of R-O learning such as contingency degradation, suggesting organisms encode the relationship between the response and a punisher. Indeed, as reviewed in Chapter 1, this was central to the last view articulated by Bolles on the nature of punishment learning. The effect of overtraining on suppression of a response following introduction of response-contingent noxious stimuli has not been directly tested, but the presumption is that aversive conditions would rapidly shift S-R control to R-O control (Ostlund & Balleine, 2008b). Ostlund and Balleine (2008b) trained rats to leverpress for sugar. They then devalued the sugar via conditioned taste aversion. Undertrained rats showed much less leverpressing for the devalued sugar compared to overtrained rats under conditions of extinction. However, when the reinforcement contingency was reinstated (“punishment test”), responding for the devalued sugar decreased over the course of post-devaluation reinforcement sessions in both undertrained and overtrained rats. This finding was used to suggest two things: that the response was being punished by its contingency with the now-aversive sugar, and that this punishment is not impaired by reflexive responding induced by overtraining.

Ostlund and Balleine (2008b) also tested the effect of dmStr lesions on this “punishment” of devalued responding. They found that under extinction conditions, rats with dmStr lesions did not suppress responding on the lever associated with the devalued response, whereas rats with sham lesions did. However, when the contingency between response and outcome (though now devalued) was reinstated (“punishment test”) sham rats had a clear preference in responding for the non-devalued outcome, whereas dmStr-lesioned rats showed no preference. Since punishment in this task is not impaired by S-R control, the authors argued that dmStr lesions prevented rats from shifting back to R-O control during the punishment test. This was linked to the similar effects of BLA lesions, and given both the BLA and dmStr are implicated in the acquisition and expression of goal-directed instrumental behaviour, they are also possibly involved in punishment control of behaviour.

Figure 2.2. Summary of the functional connections within the response-outcome (R-O) circuit in sagittal view. MDT and PL are required for the acquisition of R-O associations, while BLA and dmStr are required for both acquisition and expression of R-O behaviour. Adapted from Paxinos and Watson (2007).

2.2.5 Prefrontal cortex

The PFC can be broadly partitioned into medial and lateral divisions (mPFC and lPFC, respectively). The rodent mPFC largely consists of the ACC and PL cortex dorsally (dmPFC), and the infralimbic (IL) cortex ventrally (vmPFC). The lPFC consists of the ventrally located orbitofrontal (OFC) and insular cortex (Paxinos & Watson, 2007) (see Figure 2.3). It is worth noting that the primate PFC differs from the rodent PFC; the medio-lateral divisions referred to in primate studies do not necessarily contain the same anatomical and functional regions as the rodent mPFC and lPFC, with the homologues between primate and rodent PFC remaining somewhat disputed (Balleine & O’Doherty, 2010; Brown & Bowman, 2002; Seamans, 2008; Uylings et al., 2003). For sake of clarity, any reference to the mPFC and lPFC within this thesis refers to the anatomical divisions within the rodent brain.

The PFC has long been implicated in decision-making and behavioural control (Dias et al., 1996; Onge et al., 2012; Ragozzino, 2007; Szczepanski & Knight, 2014), and has been suggested to mediate punishment behaviour (Kobayashi, 2012; Wiech & Tracey, 2013). The evidence supporting a role for these regions in punishment will now be reviewed.

Anterior cingulate cortex

The ACC has been strongly implicated in the processing of noxious stimuli. ACC neurons fire in response to noxious stimuli (Sikes & Vogt, 1992), and imaging studies in humans have found that ACC activation is correlated with the magnitude of unpleasantness experienced in response to noxious stimuli (Rainville et al., 1997). In rats (Furlong et al., 2010) and humans (Carter et al., 1998; Holroyd & Coles, 2002), the extent of neuronal activation in ACC in response to an aversive event is linked to positive prediction error, so that unexpected but not expected aversive events recruit this region. Moreover, infusions of glutamate into the ACC are sufficient to induce CPA, and intra-ACC infusions of the glutamate antagonist kynurenic acid attenuate formalin-induced CPA, suggesting that activation of the ACC is both necessary and sufficient for CPA (Johansen & Fields, 2004). However, lesion of the ACC following training has no effect on CPA, suggesting this ACC role is isolated to acquisition.

Figure 2.3. Regions of the rodent prefrontal cortex (PFC) in coronal plane (adapted from Paxinos & Watson, 2007). The ACC and PL make up the dorsomedial PFC, the IL is part of the ventromedial PFC, and the OFC and RAIC are part of the lateral PFC.

As reviewed previously, ACC has also been implicated in behavioural inhibition.

Intracranial electrical stimulation of the ACC inhibits behaviour (Kaada, 1951), which was used to suggest the ACC is responsible for the suppression of behaviour (Kaada et al., 1953).

However, lesions of the ACC in cats did not impair passive avoidance of running to a goal-box or suppression of bar pressing responses that led to shock (i.e. suppression of punished responding was intact) (McCleary, 1961). ACC lesions did impair cats’ ability to acquire these responses on a negative reinforcement schedule (not running or barpressing led to shock).

This suggests that ACC aversion-coding is required for active avoidance (negatively reinforced behaviour) but not passive avoidance (punished behaviour).

Prelimbic cortex

The prelimbic cortex (PL), aside from its role in the formation of R-O associations (see above), has been ascribed a role in expression of aversive Pavlovian associations (Blum et al., 2006; Corcoran & Quirk, 2007; Sierra-Mercado et al., 2011) and top-down control of behaviour (Marquis et al., 2007; Ragozzino, 2007).

Inactivations of the PL reduce fear expression (Blum et al., 2006; Corcoran & Quirk, 2007; Sierra-Mercado et al., 2011), whereas stimulation of the PL increases it (Vidal-Gonzalez et al., 2006). Though these findings are from explicitly Pavlovian tasks, this function may reflect a more general role for the PL in encoding aversive stimuli and engaging relevant circuits to execute appropriate behavioural responses.

The PL is also implicated in behavioural flexibility and control, as PL inactivations cause selective impairments in strategy shifting and use of contextual cues to guide instrumental behaviour (Marquis et al., 2007; Ragozzino, 2007). Instrumental suppression can be considered inhibition of a reinforced response due to an additional punishment contingency. The conflict between obtaining the reward and avoiding the aversive outcome requires behavioural flexibility once the punishment contingency is enforced, and behavioural control to maintain suppression despite this suppression preventing the procurement of reward. However, these studies have been conducted under conditions of choice, so it may be the case that the PL only mediates strategy shifting and behaviour when discrete alternatives are available.

Infralimbic cortex

The infralimbic cortex (IL) is considered part of the vmPFC, located immediately ventral to the PL. Although the IL and PL are both part of the mPFC and functional distinctions are often not considered, a growing body of evidence suggests that the IL has roles distinct from, and even opposing to, the PL (Balleine & O’Doherty, 2010; Hayen et al., 2014; Killcross & Coutureau, 2003; Laurent & Westbrook, 2009; Sierra-Mercado et al., 2011; Willcocks & McNally, 2013).

While the PL is required for fear expression, the IL is required for the learning and expression of fear extinction (Laurent & Westbrook, 2009; Sierra-Mercado et al., 2011).

Interestingly, the IL has also been shown to be involved in the extinction of instrumental reward seeking (Marchant et al., 2010; Peters & De Vries, 2013; Peters et al., 2008, 2009).

The IL is recruited during extinction of responding for beer, as measured by c-Fos expression (Marchant et al., 2010), and BM inactivation of the IL impairs extinction of cocaine seeking (reinstates responding) (Peters et al., 2008). Infusions of the partial NMDA agonist D-cycloserine into the IL, which are argued to potentiate IL function, enhance extinction memory (Peters & De Vries, 2013). Therefore it appears that IL activity is correlated with, sufficient for, and necessary for the extinction of instrumental responding. This role may also extend to suppression of responding due to punishment, and fits with a proposed role for the IL in behavioural inhibition (Arnsten & Li, 2005; Killcross & Coutureau, 2003). However, Willcocks and McNally (2013) reported that IL inactivations did not increase responding following extinction of beer seeking, though there was an increase in latencies to perform the extinguished response.

The IL is also thought to play a distinct role from the PL in instrumental learning.

Specifically, IL may mediate the shift to S-R control of behaviour; IL lesions prevented the typical insensitivity to satiety-induced outcome devaluation following overtraining of responding (Killcross & Coutureau, 2003). Infusions of muscimol into the IL immediately prior to choice test (after overtraining and devaluation) also restored goal-directed responding (Coutureau & Killcross, 2003), suggesting the IL mediates S-R control by suppressing the influence of R-O associations. However, it is worth noting that IL inactivations did not restore this difference by decreasing devalued responding, but instead restored non-devalued responding. This suggests that the IL mediates specific aspects of behavioural suppression.

Orbitofrontal cortex

The OFC has been thought to encode the value of stimuli (Cardinal et al., 2003; Schoenbaum & Roesch, 2005), as well as mediate response choice and inhibition (Arana et al., 2003; Hodgson et al., 2002; O’Doherty et al., 2003; Schoenbaum et al., 2009). These functions make the OFC a strong candidate for mediating punishment learning and behaviour, as punishment involves encoding the value of a response and/or punisher, and requires response choice/inhibition to appropriately suppress emission of the punished response.

Recordings of neural activity in rodents and non-human primates, as well as brain imaging studies using humans, have observed that the OFC is activated by motivationally salient stimuli (both outcomes and predictors), in accordance with their value (O’Doherty et al., 2001, 2003; Morrison & Salzman, 2011; Schoenbaum et al., 2009); the OFC contains reward-coding neurons that are activated by appetitive stimuli, and aversion-coding neurons that are activated by aversive stimuli (Morrison & Salzman, 2011). These findings suggest that OFC neural activity encodes the value of outcomes as well as the value of stimuli that predict those outcomes.

Value coding and prediction error coding may also extend to instrumental tasks. For example, O’Doherty and colleagues (2001) found that the medial OFC was activated by stimuli signalling monetary reinforcement, whereas the lateral OFC was activated by stimuli signalling monetary punishment. Moreover, medial OFC activity was correlated with the magnitude of reward and lateral OFC activity with the magnitude of punishment (see also Arana et al., 2003).

The above studies show that OFC activity is correlated with encoding the value of reinforcers and response choice, but do not show whether it is necessary for value-guided behaviour. Humans with bilateral lesions of the OFC are impaired in choosing an advantageous option over a disadvantageous option compared to healthy controls (Bechara et al., 1999, 2000). Pickens and colleagues (2003) trained rats to associate houselight presentations with food delivery. They then devalued this food by pairing it with LiCl. This caused a decrease in magazine responses during subsequent presentations of the houselight compared to control rats that received unpaired food and LiCl. Lesions of the OFC, prior to devaluation, abolished the reinforcer devaluation effect without affecting magazine entries during the houselight if the food was not devalued. These findings were used to support the claim that the OFC encodes the value of a reinforcer, but more specifically is required to update the incentive value of an outcome. However, it remains unclear whether the OFC is required for punishment behaviour.

Insular cortex

The insular cortex stretches along the rostrocaudal axis of the cortex, and the rostral portion of this structure (rostral agranular insular cortex; RAIC), lateral to the OFC within the PFC, has been strongly implicated in aversion (Wiech & Tracey, 2013). The insula is activated in response to aversive stimuli (Coghill et al., 1994; Hayes & Northoff, 2011), as well as the anticipation of aversive stimuli (Franciotti et al., 2009; Simmons et al., 2004, 2006). This has been linked to a role of the RAIC in pain modulation (analgesia and hyperalgesia) (Jasmin et al., 2003), but has also been linked to cognitive and behavioural processes in response to aversive stimuli (Flynn et al., 1999; Furlong et al., 2010; Menon & Uddin, 2010; Wiech & Tracey, 2013). Like the ACC, this role has been linked to aversive prediction error coding in rats (Furlong et al., 2010) and humans (Preuschoff et al., 2008; Seymour et al., 2004).

The RAIC is also implicated in attentional control and triggering cognitive control (Sridharan et al., 2008), with a proposed role for the RAIC in engaging cognitive resources in response to aversive stimuli to ensure appropriate behavioural responses (Menon & Uddin, 2010; Wiech et al., 2010; Wiech & Tracey, 2013). RAIC activity has also been implicated in top-down inhibitory control of behaviour (Cai et al., 2014; Ghahremani et al., 2014). These functions would likely be involved in punishment-driven instrumental suppression.

RAIC activity has also been posited to encode relative instrumental response value (Talmi & Pine, 2012). Parkes and Balleine (2013) found that the rat insula was activated in response to satiety-induced outcome devaluation. Parkes and Balleine also found that infusions of the NMDA receptor (glutamate receptor) antagonist ifenprodil into the insula, immediately before outcome devaluation or choice test, impaired performance of the non-devalued response over the devalued response at test.

Finally, the RAIC has also been directly implicated in aversively-motivated instrumental behaviour. Paulus and colleagues (2003) measured brain activity using fMRI in human subjects while they undertook a Risky-Gains task. In this task, subjects could perform a safe response for a small reward (in the form of points) or could perform a risky response that caused either large reward or punishment (point loss). They found that insula activation was stronger when subjects selected the risky response, and insula activity was related to the probability of selecting a safe response following punishment. Wächter and colleagues (2009) “punished” slow responses on a reaction time task with monetary loss and found, using fMRI, that the insula was activated in response to punishment, and this activation was correlated with subsequent improvement in performance on the task. This effect may be attributable to negative reinforcement; however, a similar task reinforcing fast responses with monetary reward did not activate the insula, while successful avoidance of punishment activated the dStr. These findings suggest that the dStr is involved in negative reinforcement, while the insula is involved in punishment-driven behavioural change.

Summary

In summary, the PFC has been suggested to play a role in punishment by virtue of its involvement in aversive and instrumental behaviour. This potential PFC involvement in punishment is not confined to a particular PFC region or structure; the dmPFC (ACC and PL), vmPFC (IL) and lPFC (OFC and RAIC) have each been implicated in punishment-related functions. Moreover, there is substantial anatomical and functional connectivity between these regions and the other brain regions (hippocampus, amygdala, VTA, striatum) reviewed in this chapter. Indeed, the punishment-related functions of PFC structures have been linked to this connectivity (Baxter et al., 2000; Holland & Gallagher, 2004; Kobayashi, 2012; Lammel et al., 2008; Schoenbaum & Roesch, 2005; Wiech & Tracey, 2013). However, direct evidence for a role of the PFC in punishment is limited, and the particular involvement of specific PFC structures in different aspects of punishment requires further investigation.

2.3 Summary of Chapter 2

This chapter has reviewed the brain mechanisms for punishment. A number of pharmacological manipulations and brain regions have been reviewed. Expressed simply:

1. Drugs that agonise GABA receptors with BZ-binding sites and modulate 5-HT action, particularly anxiolytics, reduce suppression of a concurrently reinforced and punished response without significant perturbation of unpunished behaviour. GABAergic anxiolytics may derive their anti-punishment effects from specific inhibition of the 5-HT system, which interacts with the DA system. NE may have a more specific role in extinction of punishment.

2. Several circuits and structures have been implicated in punishment and/or punishment-related learning and behaviour. The Behavioural Inhibition System, composed of the septohippocampal system and its connections with the PFC and subcortical structures (e.g. amygdala and midbrain), is argued to mediate instrumental suppression.

3. The amygdala has been separately attributed a role in aversively-motivated behaviour, including punishment. It has been proposed that the BLA is responsible for encoding the motivational value of stimuli and response-outcome associations, both of which are required for appropriately suppressing behaviour on the basis of response-punisher associations.

4. Midbrain DA pathways have been implicated in aversive and instrumental behaviour in several ways: direct and indirect LHb modulation of DA neuron activity within the VTA and SNc influences their ventral striatal, dorsal striatal, and cortical targets, differentially implicating mesolimbic, nigrostriatal and mesocortical pathways.

5. In addition, a survey of the appetitive instrumental conditioning literature, which is significantly more advanced, has identified discrete R-O circuits (PL, BLA, MDT and dmStr).

6. Finally, a broader consideration of prefrontal function highlights regions, including dmPFC, vmPFC and lPFC, considered to play a crucial role in decision-making and behavioural control.

2.4 Aims

Remarkably, and importantly, the precise role of most of the aforementioned systems and circuits in punishment remains unclear. This is a fundamental limitation of our current understanding. On the one hand, we know from an extensive literature that GABA and 5-HT mediate suppression of a punished response (with a likely role for DA as well). However, the brain regions for these contributions are poorly understood. On the other hand, we know much about the brain mechanisms for appetitive instrumental learning and the roles of specific regions and circuits in discrete aspects of this learning, but we do not know whether these functions extend to punishment and instrumental suppression. The current thesis seeks to use localised neuropharmacological manipulations into discrete regions of the rodent brain, in particular into the amygdala, striatum, prefrontal cortex, LHb, and midbrain, to determine their roles in discrete aspects of this learning.

91 Chapter 3: Assessment of a multi-phase punishment protocol

Chapter 3 Assessment of a multi-phase punishment protocol

Before investigating the neural underpinnings of punishment learning and behaviour, an appropriate protocol invoking punishment must be established. A punishment procedure is defined as a contingency between a response and an aversive stimulus, causing suppression of that response. Therefore it precludes contingencies that do not require a response and contingencies that do not result in response suppression.

In Chapter 1 it was maintained that punishment, as a form of learning and behaviour, is driven by instrumental suppression. That is, it is suppression of a particular response driven by formation of a response-punisher association (Bolles et al., 1975, 1980; Mackintosh, 1983; St. Claire-Smith, 1979b). This is distinct from Pavlovian fear, which is suppression driven by CS-US associations, and negative reinforcement, which is a displacement of the punished response by a negatively reinforced behaviour. However, it is possible for a punishment procedure to cause a reduction in responding through each or a combination of these processes.

Previous studies have indicated some distinctions between these processes, allowing for means to control and account for their contribution. As instrumental punishment is the process of interest, it is preferable to minimise contributions of negative reinforcement and Pavlovian fear through experimental design, or measure the extent of their possible influence so that appropriate interpretations can be made.

Negative reinforcement contributions to punishment suppression are difficult to directly control or measure within punishment protocols. However, it has been argued that instrumental suppression will be preferentially recruited if the response that causes the aversive stimulus is discrete and specific (Rachlin & Herrnstein, 1969; Solomon, 1964). Also, although alternative behaviours necessarily replace the punished response, this occurs in a manner more attributable to reductions in the punished response than reinforcement of a response competing with the punished one (Dunham, 1971, 1972). When a protocol uses a specific response-punisher contingency (with diffuse unpunished alternative behaviours), the suppression that occurs does not seem to recruit negative reinforcement processes (Estes, 1969; Shettleworth, 1978).

Pavlovian fear can arise from formation of associations between the aversive punisher and its environmental/sensory precedents. Although such precedents are impossible to remove, ample research has found that response-dependent punishers result in a different behavioural phenotype than response-independent punishers, beyond the robust finding that response-dependent punishers result in significantly more suppression of the punished response (Annau & Kamin, 1961; Azrin, 1956; Bolles et al., 1980; Bouton & Schepers, 2015; Goodall, 1984). Response-independent suppression results in measurable species-specific defence reactions such as freezing (Hunt & Brady, 1951) and also generalises, causing suppression of unpunished responses (Bolles et al., 1980; Hunt & Brady, 1955). Bolles and colleagues (1980) also noted that Pavlovian contributions to suppression were most notable during initial trials, and were quickly extinguished, possibly due to a low relative validity of CSs, but high relative validity of the response, for the punisher (St. Claire-Smith, 1979b). This Pavlovian fear contribution is minimized by the use of strong response-punisher contingencies (Goodall, 1984).

Therefore, an investigation into instrumental suppression would use a strong response-punisher contingency, reducing Pavlovian contributions. Also, including an alternative unpunished response and measurement of a fear response, such as freezing, could be used to account for any suppression stemming from Pavlovian fear.

The aim of Experiment 1 was to validate a punishment protocol that invoked instrumental suppression. The design involved training rats to press two individually presented levers by reinforcing leverpressing on a VI-30sec schedule with food. After leverpressing on both levers was acquired, pressing of one lever was punished with a moderate 0.5mA footshock on a superimposed FR-10 schedule (punished lever), providing conflict and incentive to continue pressing the lever. The other lever continued to be reinforced with food on a VI-30sec, but was not punished with footshock (unpunished lever).

The rationale for presenting the levers individually was so that response competition between levers and response bias would not affect pressing of either lever. Also, concurrent presentation would make an alternative reinforced response available during punished response trials, removing conflict (as reinforcement can be earned shock-free by pressing the unpunished lever). It has been shown that providing an unpunished, reinforced alternative response greatly reduces the amount of punished responding (Azrin & Holz, 1966), which may prevent the detection of increases or decreases in punished responding due to neural manipulations, undermining the objectives of this thesis. Also, protocols for detecting effects of neural manipulations on punishment have used non-concurrent punished and unpunished responses (Davidson & Cook, 1969; Geller & Seifter, 1960).

However, a discrete choice between responding on one lever over another is of interest to this thesis, as previous studies show that punishment shifts response allocation away from a previously punished response (Logan, 1969). Therefore this protocol also included a choice test measuring preference between the two levers. So that within-subject measures of choice can be made in future experiments, two choice tests were conducted, with a typical punished session between choice tests to reduce any extinction effects of the initial unpunished choice test.


Experiment 1

Methods

Subjects

Subjects were 8 experimentally naive male Sprague Dawley rats (290-340g) obtained from a commercial supplier (Animal Resources Centre, Perth, Australia). Rats were housed in groups of four in plastic cages and maintained on a 12 hr light-dark cycle (lights on at 7:00 A.M.). The procedures used were approved by the Animal Care and Ethics Committee at the University of New South Wales and were conducted in accordance with the National Institutes of Health (NIH) Guide for the Care and Use of Laboratory Animals (NIH Publications No. 80-23, revised 1996).

Apparatus

All behavioural training was conducted in a set of eight identical experimental chambers (24cm [length] x 30cm [width] x 21cm [height]; Med Associates Inc., VT, USA). Each chamber was enclosed in a sound- and light-attenuating cabinet (55.9cm [length] x 35.6cm [width] x 38.1cm [height]) fitted with a fan for ventilation and background noise.

The chambers were made up of a Perspex rear-wall, ceiling and hinged front-wall, and stainless steel sidewalls. The chamber floors were made of stainless steel rods (4mm in diameter) spaced 15mm apart. Each chamber stood 35mm above a tray of corncob bedding. A recessed magazine (3cm in diameter) within a 4cm x 4cm hollow in the right-side chamber wall received pellets from an external automatic hopper. Infrared photocells detected entries into the magazine. Infrared digital cameras, attached to the ceiling of the cabinets, recorded all activity within the chamber from above.

Two retractable levers were placed on either side of the magazine. In both experiments, a 45mg grain pellet, which was delivered to the magazine from the external hopper, served as the reward. The punisher was a 0.5sec, 0.5mA footshock delivered through the grid floor. All chambers were connected to a computer with Med-PC IV software (Med Associates, VT, USA), which controlled lever, pellet and shock presentations and recorded the leverpresses and magazine entries.

Procedure

A summary of the following procedure is shown in Table 3.1.

Food deprivation and magazine training. Commencing 3 days prior to the start of the experiment, and persisting for the duration of the experiment, rats received daily access to 10-15g of food and unrestricted access to water in their home cages. Rats were placed in the experimental chambers for 30mins to acclimatise and were then given leverpress training, which consisted of two levers (left and right) being extended and reinforced with grain pellets on a fixed ratio-1 (FR-1) schedule for 1 hr, or until each lever had been pressed 25 times (each lever would retract after 25 presses). Houselights were on throughout the session.

Leverpress training. All rats were then given 7 days of leverpress training. Levers were presented individually in an alternating pattern so that one lever was extended for 5 mins while the other lever was retracted. Houselights were off. After 5 mins the extended lever was retracted and the retracted lever was extended, such that each lever was always presented on its own. This alternation occurred throughout the 40min session. Both levers were reinforced with a pellet on a VI-30sec schedule.
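The alternating lever presentation described above can be sketched as a simple session timeline. This is a minimal illustration only, not the Med-PC program used in the experiment; the function name `lever_schedule` is hypothetical, and which lever is extended first is an assumption, since the text does not specify it.

```python
def lever_schedule(session_min=40, block_min=5, first="left"):
    """Return (start_min, end_min, extended_lever) blocks for one session.

    Only one lever is extended in each block; the other is retracted.
    The starting lever is an assumption (not stated in the procedure).
    """
    other = {"left": "right", "right": "left"}
    blocks, lever = [], first
    for start in range(0, session_min, block_min):
        blocks.append((start, min(start + block_min, session_min), lever))
        lever = other[lever]  # swap levers at the end of each block
    return blocks


blocks = lever_schedule()
for start, end, lever in blocks:
    print(f"{start:2d}-{end:2d} min: {lever} lever extended")
```

Under these parameters each lever is available for four 5min periods, giving 20mins of access per lever in a 40min session.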

Punishment. On Days 1 – 8, rats were trained and tested in the punishment task. Punishment sessions were identical to the sessions described above, except that a designated lever was also punished with a 0.5sec, 0.5mA footshock on an FR-10 schedule. The same lever (left or right) was designated as “punished” throughout the experiment for each rat, but which lever was designated as punished was counterbalanced between rats.
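The superimposed schedules on the punished lever (VI-30sec pellets plus FR-10 footshock) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the program used in the experiment: the function name `run_lever_block` is hypothetical, VI intervals are drawn from an exponential distribution (the text does not say how intervals were generated), and the press times are made up for demonstration.

```python
import random


def run_lever_block(press_times, punished, rng, vi_mean=30.0, fr=10):
    """Apply a VI pellet schedule and, for the punished lever, an FR shock schedule.

    press_times: sorted leverpress times (s) within one lever presentation.
    Returns (pellet_times, shock_times).
    """
    pellets, shocks = [], []
    # Assumption: exponentially distributed intervals with the stated mean.
    armed_at = rng.expovariate(1.0 / vi_mean)
    press_count = 0
    for t in press_times:
        if t >= armed_at:  # first press after the interval elapses earns a pellet
            pellets.append(t)
            armed_at = t + rng.expovariate(1.0 / vi_mean)
        if punished:
            press_count += 1
            if press_count % fr == 0:  # every 10th press produces footshock
                shocks.append(t)
    return pellets, shocks


# Hypothetical responding: one press every 3 s across a 5min presentation.
presses = [float(t) for t in range(0, 300, 3)]
pellets, shocks = run_lever_block(presses, punished=True, rng=random.Random(0))
```

The sketch makes the source of conflict explicit: the same presses that earn pellets on the VI schedule also advance the FR counter towards the next shock, whereas calling the function with `punished=False` leaves the pellet schedule unchanged and delivers no shocks.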

Choice test. On Days 9 and 11 rats received a choice test. This involved both levers being extended for 30 mins. Responses on either lever were rewarded on a VI-60sec such that after an average of 60secs the first leverpress of either lever caused immediate pellet delivery and reset the VI timer. Therefore only pressing one lever over the course of the session, or pressing both levers over the course of the session, would not make a difference to the number of rewards obtained. No shocks were delivered. Between the two choice tests, rats received a reminder punished session, under the same conditions as the previous punished sessions, to reduce any effects the initial non-punished session might have had on performance or lever preference.

Table 3.1

Summary of procedure for Experiment 1

                  Food deprivation  Magazine training  Leverpress training  Punishment                    Choice test
                  (3 days)          (2 days)           (7 days)             (Day 1 - 8)                   (Day 9, 11)
Punished lever    -                 FR-1 pellet        VI-30sec pellet      VI-30sec pellet, FR-10 shock  Unitary VI-60sec pellet
Unpunished lever  -                 FR-1 pellet        VI-30sec pellet      VI-30sec pellet               Unitary VI-60sec pellet
Levers extended   -                 Concurrently       Individually         Individually                  Concurrently

Note: Levers were located on the left and right side of the magazine. They were either concurrently extended, or were extended individually on a 5min alternating basis. One lever, left or right, was designated as punished for the entire experiment (counterbalanced across subjects). The punished lever caused 0.5sec, 0.5mA footshock every 10 leverpresses during punishment sessions. During choice test, no shocks were delivered and leverpressing (either lever) resulted in pellet delivery approximately every 60secs.
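The unitary VI-60sec schedule used in the choice test can be sketched with a single timer shared by both levers. This is a hedged illustration rather than the actual control program: the function name `unitary_vi` is hypothetical, the exponential interval distribution is an assumption, and the press times are invented for demonstration.

```python
import random


def unitary_vi(presses, rng, vi_mean=60.0):
    """presses: time-sorted (time_s, lever) tuples pooled across both levers.

    One shared timer: the first press on EITHER lever after the interval
    elapses earns a pellet and resets the timer.
    """
    earned = []
    # Assumption: exponentially distributed intervals with the stated mean.
    armed_at = rng.expovariate(1.0 / vi_mean)
    for t, lever in presses:
        if t >= armed_at:
            earned.append((t, lever))
            armed_at = t + rng.expovariate(1.0 / vi_mean)
    return earned


# Hypothetical responding: one press every 5 s for a 30min test.
times = [float(t) for t in range(0, 1800, 5)]
one_lever = [(t, "unpunished") for t in times]
both_levers = [(t, "unpunished" if i % 2 else "punished") for i, t in enumerate(times)]

# Same seed, so both runs see the same sequence of intervals.
n_one = len(unitary_vi(one_lever, random.Random(1)))
n_both = len(unitary_vi(both_levers, random.Random(1)))
```

Because pellet delivery depends only on when a press occurs and not on which lever is pressed, `n_one` and `n_both` are equal for identical press timing. This is the property the design relies on: lever preference in the choice test cannot be explained by differences in obtainable reward.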

Data Analysis

The dependent measures were total punished and unpunished leverpressing per session, average latency to initially press punished and unpunished levers, and freezing (defined as a crouching posture with the absence of all movement other than that required for respiration; Bouton & Bolles, 1980) during punished and unpunished trials. Within-subjects ANOVAs were used to analyse all data, with lever (punished vs. unpunished) and day (linear contrasts) as the two factors for punishment and aversive choice. A correlation analysis between freezing and leverpressing was also conducted, with the total number of leverpresses during a session correlated with the amount of freezing observed for that session. For all analyses, the type I error rate (α) was controlled at 0.05. Rats were excluded from all analyses if they failed to acquire leverpressing during pre-training.
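One common way to test a within-subjects linear contrast, sketched below, is to reduce each rat's scores to a single contrast value and test whether the mean of those values differs from zero, which yields F(1, n-1). This is only an illustration of the general approach: the thesis does not state the software or exact contrast coding used, the function names are hypothetical, and the data are simulated purely for demonstration (they are not data from Experiment 1).

```python
import random
from math import sqrt
from statistics import mean, stdev


def linear_weights(k):
    """Centred linear contrast weights for k equally spaced days."""
    centre = (k - 1) / 2
    return [j - centre for j in range(k)]


def contrast_F(scores, weights):
    """scores: one list per rat (one value per day). Returns (F, df1, df2)."""
    # Collapse each rat's days into one contrast score.
    L = [sum(w * y for w, y in zip(weights, rat)) for rat in scores]
    n = len(L)
    t = mean(L) / (stdev(L) / sqrt(n))  # one-sample t on contrast scores
    return t * t, 1, n - 1              # F = t^2 with (1, n-1) df


# SIMULATED data for illustration only: 7 rats x 7 days.
rng = random.Random(0)
days = 7
punished = [[60 - 6 * d + rng.gauss(0, 5) for d in range(days)] for _ in range(7)]
unpunished = [[60 + 2 * d + rng.gauss(0, 5) for d in range(days)] for _ in range(7)]

w = linear_weights(days)
F_day_punished, df1, df2 = contrast_F(punished, w)

# Lever x day (linear) interaction: linear trend in the lever difference scores.
diff = [[u - p for u, p in zip(u_rat, p_rat)] for u_rat, p_rat in zip(unpunished, punished)]
F_interaction, _, _ = contrast_F(diff, w)
```

With 7 rats this approach gives 1 and 6 degrees of freedom, matching the form of the F ratios reported in the Results. Approaches that pool error terms across contrasts would give different degrees of freedom.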

Results

Pre-training

In daily 40min pre-training sessions, rats received alternating periods of 5 min access to two levers whereby each lever was reinforced with a food pellet on a VI-30sec schedule for 7 days. One rat was excluded from all analyses due to a failure to acquire leverpressing on either lever during pre-training. Mean and standard error of the mean (SEM) leverpressing, freezing and latency to initial leverpress on the last day of pre-training are shown in Figure 3.1. There was no significant difference in responding on the to-be punished and to-be unpunished levers as measured by leverpressing (F(1,6) < 1; p > .05) or average latency to initially leverpress (F(1,6) < 1; p > .05) on the last day of pre-training. There was also no difference in freezing while the to-be punished and to-be unpunished levers were extended (F(1,6) < 1; p > .05) on the last day of pre-training.

Punishment

Following pre-training, rats received daily 40min punishment sessions consisting of alternating periods of 5 min access to two levers for 7 days. Pressing these levers was reinforced with food pellets via the same VI-30sec schedule used during pre-training, and one of these levers was also punished on an FR-10 schedule with delivery of footshock.

Mean ± SEM leverpressing during the punishment phase is shown in Figure 3.1(A). Over the course of this training, there was a significant effect of lever (punished versus unpunished) (F(1,6) = 25.5; p < .05) and the difference in responding on the levers increased across days (F(1,6) = 30.7; p < .05). Across days, there was an increase in responding on the unpunished lever (F(1,6) = 27.3; p < .05) and a decrease in responding on the punished lever (F(1,6) = 38.3; p < .05).

Mean ± SEM latencies to initially leverpress during the punishment phase are shown in Figure 3.1(B). There was an overall decrease in latencies to press levers across days (F(1,6) = 8.3; p < .05). There was also a significant effect of lever (punished versus unpunished) (F(1,6) = 34.2; p < .05). No significant interaction between lever and day was observed (F(1,6) = 4.6; p > .05), though there was a significant decrease in latencies to press the unpunished lever (F(1,6) = 7.3; p < .05) but no significant change in latencies to press the punished lever (F(1,6) = 1.1; p > .05).


Figure 3.1. (A) Mean ± SEM leverpresses on the punished and unpunished levers for the last day of pre-training (T) and punishment sessions (Day 1-7) (n = 7). (B) Mean ± SEM latency to initially press the punished and unpunished lever (averaged across trials) for the last day of pre-training (T) and punishment sessions. (C) Mean ± SEM levels of freezing while punished and unpunished levers were extended for the last day of pre-training (T) and punishment sessions.


The species-typical defence reaction of freezing during each session was assessed. Mean ± SEM freezing during the punishment phase is shown in Figure 3.1(C). Freezing was initially low (approximately 20% of observations) and significantly decreased across the course of punishment training. Over the course of this training, there was a significant effect of which lever was extended (punished versus unpunished) (F(1,6) = 7.5; p < .05) and a significant decrease of freezing across days (F(1,6) = 23.7; p < .05). There was no lever x day interaction (F(1,6) < 1; p > .05), with freezing decreasing across days for both the punished (F(1,6) = 61.8; p < .05) and unpunished lever (F(1,6) = 12.2; p < .05). Importantly, there was no significant correlation between freezing and responding on the punished lever (r = .236, p > .05). So, although there was some freezing in this task, it was low, it decreased across the course of punishment, and explained only 5.5% of the variance in punished lever responding.

Aversive Choice

Following the punishment expression test, rats were assessed in a choice procedure across two days that involved simultaneous presentations of both the punished and unpunished lever in 30min sessions. Each lever was reinforced with food pellets on a VI-60sec schedule, but no punishment was delivered. A punished session was given between the two choice sessions.

Figure 3.2 shows total responses and latencies to respond on choice test. Rats responded significantly more on the unpunished than the punished lever (F(1,6) = 34.7; p < .05), with no lever x day interaction (F(1,6) = 1.4; p > .05). No significant overall effect of lever (F(1,6) = 4.9; p > .05), day (F(1,6) = 2.0; p > .05), or lever x day interaction (F(1,6) = 2.1; p > .05) on leverpress latency was found. So, rats displayed a preference for responding on the unpunished lever that was stable across two test days and although latencies were faster on the second test day than the first, this decrease was not significant. Freezing during choice test was low and did not exceed 5% of observations.


Figure 3.2. (A) Mean ± SEM lever presses on the punished and unpunished levers across the two days of aversive choice task. (B) Mean ± SEM latency to initially press the punished and unpunished lever across the two days of aversive choice task.


Discussion

This experiment sought to validate a punishment protocol that preferentially invoked instrumental suppression. Rats were trained to respond on two individually presented levers for food. Introduction of a leverpress-shock contingency on one of the levers resulted in rapid and potent reduction in responding on the punished lever and an increase in latencies to respond on the punished lever. Although there was a slight reduction in responding and increase in leverpress latencies on the unpunished lever on initial trials, unpunished leverpressing quickly recovered and was at or above pre-training levels following day 4 of punishment. Crucially, levels of freezing were relatively low throughout punishment sessions, being most notable during initial punishment sessions and present during both lever presentations, and decreased over the course of punishment sessions despite increased suppression on the punished lever. Finally, the unpunished choice test revealed a clear preference for pressing the unpunished lever over the previously-punished lever. The contribution of Pavlovian fear, via freezing, to these effects was minimal.

These findings are consistent with previous findings. Punishment using a tight response-punisher contingency, and moderate punisher, resulted in a stable suppression of responding that could not be accounted for by Pavlovian fear (Annau & Kamin, 1961; Azrin, 1956; Bolles et al., 1980; Bouton & Schepers, 2015; Hunt & Brady, 1955; Goodall, 1984). Pavlovian fear, as indicated by freezing and suppression of unpunished responding (which coincided with each other but not suppression of punished responding), was mostly restricted to initial trials, in agreement with observations by Bolles and colleagues (1980).

Choice to press the unpunished lever over the punished lever, despite no shocks being presented, indicates the ability of punishment to potently affect choice behaviour, shifting responding away from the punished response despite identical appetitive reinforcement, as found in matching law studies (Farley & Fantino, 1978; Logan, 1969; Rachlin & Herrnstein, 1969). This preference was sustained in a second unpunished choice test, given an intervening punishment session, though there was a non-significant decrease in latencies to press the punished lever compared to the first choice test.

This experiment has established a punishment protocol that causes selective instrumental suppression of a punished but not unpunished response. Although there is evidence for Pavlovian fear via freezing, it is slight, largely restricted to initial sessions, and can be detected by decrements in responding on the unpunished lever. Punishment sessions cause a preference to press the unpunished lever in a choice test, which could be sustained over two unpunished sessions.

104 Chapter 4: General Methods

Chapter 4

General Methods

Experiment 1 established a punishment protocol that generated selective instrumental suppression of a punished response. This protocol can be divided into 3 separate phases of punishment behaviour:

1. Punishment acquisition: Initial punishment sessions, where suppression of responding to the punished lever is learned and Pavlovian contributions are extinguished.

2. Punishment expression: Later punishment sessions, after instrumental suppression has been fully acquired, and responding is stable between sessions.

3. Aversive choice: Preference for an unpunished lever over a punished lever.

The neural underpinnings of each of these aspects of punished behaviour were of interest. The current thesis investigated the role of various brain structures and circuits by selectively modulating activity within these structures through bilateral microinfusions of drugs into the region of interest. All experiments achieved this through bilateral implantation of guide cannulae prior to behavioural training, and compared the behavioural effects of these drugs to behaviour following control infusions of saline.

Punishment acquisition was investigated through between-subject infusions immediately before the first two punishment sessions (drug and saline groups). Punishment expression was investigated using within-subject infusions of saline and drugs (counterbalanced across sessions) on days 6 and 7. Choice was also investigated using within-subject infusions of saline and drugs (counterbalanced across sessions) following expression. In some experiments, the effect of drugs on unprovoked locomotor activity was measured following within-subject drug infusions (counterbalanced across sessions). Locations of infusions were verified through histological examination after all behavioural tasks were completed. Some experiments deviated from the design outlined below. Where that is the case, the alteration is explicitly described within that experiment’s methods section.
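The counterbalancing of within-subject infusions across sessions could be implemented along the following lines. This is a minimal sketch under assumptions: the function name `counterbalance`, the random allocation of rats to orders, and the group of 8 rats are all hypothetical, as the thesis does not describe how infusion orders were assigned.

```python
import random


def counterbalance(rat_ids, conditions=("saline", "drug"), seed=0):
    """Assign each rat an infusion order so both orders are used equally often.

    With an odd number of rats, the two orders differ in frequency by one.
    """
    rng = random.Random(seed)
    ids = list(rat_ids)
    rng.shuffle(ids)  # assumption: rats are allocated to orders at random
    orders = [tuple(conditions), tuple(reversed(conditions))]
    return {rat: orders[i % 2] for i, rat in enumerate(ids)}


# Hypothetical group of 8 rats: each receives both infusions, in one of two orders.
orders = counterbalance(range(1, 9))
for rat, (first, second) in sorted(orders.items()):
    print(f"rat {rat}: session 1 = {first}, session 2 = {second}")
```

Balancing the two orders in this way means that any effect of session order (for example, continued learning between days 6 and 7) is distributed equally across the saline and drug conditions.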

Subjects

Subjects were experimentally naive, adult male Sprague Dawley rats (ranging from 260-470g in weight at surgery) obtained from the same source and maintained under the same conditions as reported in Experiment 1.

Apparatus

Apparatus was identical to that used in Experiment 1. Locomotor activity was assessed in Plexiglas chambers (Med Associates, St Albans, VT, USA) 43.2cm (width) x 43.2cm (length) x 30.5cm (height). Movement was tracked through the use of three 16 beam infrared arrays. Infrared beams were located on both the X and Y-axes for positional tracking.

Procedure

A summary of the following procedure is shown in Figure 4.1.

Surgery. Rats were anaesthetized with 1.3ml/kg ketamine (100 mg/ml; Ketapex; Apex Laboratories, Sydney, Australia) and 0.2ml/kg muscle relaxant, xylazine (20mg/ml; Rompun; Bayer, Sydney, Australia) (i.p.) and placed in a stereotaxic apparatus (Model 900, Kopf, Tujunga, CA, USA), with the incisor bar maintained at approximately 3.3 mm below horizontal to achieve a flat skull position. 26 gauge guide cannulae (6 or 11mm in length; Plastics One, Roanoke, VA, USA) were implanted bilaterally into the relevant brain region using coordinates derived from Paxinos and Watson (2007). The guide cannulae were fixed in position with dental cement and jeweller’s screws. Dummy cannulae were kept in the guide at all times except during microinjections. Rats were allowed to recover for at least 5 days prior to the start of the experimental procedure.


Figure 4.1. Summary of general procedure. Behavioural training uses the same procedure as described in Experiment 1 (p. 95). Infusions are of a selected drug or saline (control). Within-subject infusions are counterbalanced across infusion days.


Food deprivation. Commencing at least 5 days after surgery and persisting for the duration of the experiment, rats received daily access to 10-15 g of food (to attain and then maintain approximately 90% of their body weight immediately prior to food deprivation) and unrestricted access to water in their home cages.

Leverpress training. Three days after commencement of food deprivation, rats were placed in the experimental chambers for 30 mins to acclimatise and were then given leverpress training as described in Experiment 1.

Punishment. On Days 1 – 8, rats were trained and tested in the punishment task as described in Experiment 1. Immediately before the first 2 days of punishment, rats received bilateral infusions of 0.9% phosphate-buffered saline or a drug to assess the role of the target structure in the acquisition of punishment. For microinjections, a 33-gauge microinjection cannula (Plastics One, Roanoke, VA, USA) was inserted into the guide cannula and connected to a 10µl glass syringe (Hamilton Company, NV, USA) operated by an infusion pump (World Precision Instruments, FL, USA). The microinjection cannula projected a further 1 mm ventral to the tip of the guide cannula. Drugs were infused at a rate of 0.25 µl/min over 2 min, and the microinjection cannula was left in place for a further 1 min to permit diffusion of the injectate. Rats also received bilateral infusions of either saline or drug on Days 6 and 7 (counterbalanced within-subject) to test for drug effects on expression of punishment.
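As a quick check of the infusion parameters above, the delivered volume follows from rate multiplied by duration. The sketch below is illustrative only: the function and variable names are hypothetical, and it assumes a constant pump rate and that the stated rate applies to each cannula separately, which the text does not state explicitly.

```python
def infusion_volume_ul(rate_ul_per_min: float, duration_min: float) -> float:
    """Volume delivered through one cannula, in microlitres.

    Assumes the pump runs at a constant rate for the whole duration.
    """
    if rate_ul_per_min < 0 or duration_min < 0:
        raise ValueError("rate and duration must be non-negative")
    return rate_ul_per_min * duration_min


# Parameters taken from the procedure: 0.25 µl/min for 2 min.
per_cannula_ul = infusion_volume_ul(0.25, 2.0)

# Assumption: the rate is per cannula, so a bilateral infusion delivers twice this.
bilateral_total_ul = 2 * per_cannula_ul

print(per_cannula_ul, bilateral_total_ul)  # prints: 0.5 1.0
```

Under these assumptions, each infusion delivers 0.5 µl per cannula, followed by the additional 1 min in which the cannula is left in place.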

Choice test. On Days 9 and 11 rats received a choice test. This was the same as Experiment 1. Rats were tested twice, once after bilateral infusions of drug and once after infusions of saline (within-subject, counterbalanced). Between the two choice tests, rats received a reminder punishment session.

Locomotor Test. On Day 13, rats were placed in locomotor chambers for 40 mins to habituate them to the chamber. On Days 14 and 15, rats received bilateral infusions of saline or drug (counterbalanced, within-subject) immediately before being placed into the locomotor chambers for 40 mins. Total distance travelled and velocities were measured.

Histology. At the end of the experiment, rats were injected i.p. with sodium pentobarbital (100 mg/kg) and their brains removed. Unfixed brains were quickly frozen and sectioned coronally (40µm) through the target structure using a cryostat (Microm 560, Germany). Each section was collected and subsequently stained with cresyl violet for histological examination. The boundaries of the target structure were determined according to Paxinos and Watson (2007).

Data Analysis

For all analyses, the type I error rate (α) was controlled at 0.05. Rats were excluded from all analyses if they failed to acquire leverpressing during pre-training or if the cannula tip was not bilaterally located within the target structure.

Leverpress training and punishment acquisition: The dependent measures were total punished and unpunished leverpressing per session and latency to leverpress. Between x within-subjects ANOVAs were used to analyse leverpress training and punishment acquisition data, with lever (punished vs. unpunished) and day (for punishment acquisition, using linear contrasts) as the within-subjects factors, and drug group (saline vs. drug) as the between-subjects factor.

Punishment expression and choice: Within-subjects ANOVAs were used to analyse leverpresses and leverpress latencies for punishment expression and aversive choice. In these analyses lever (punished vs. unpunished) was one within-subjects factor and infusion (saline vs. drug) was the other. If effects were found during punishment acquisition, the role of acquisition group on leverpressing during expression and choice was analysed; a between-subjects factor of acquisition group (saline vs. drug) was included with the aforementioned within-subjects ANOVAs. Leverpress ratios were analysed using a one-sample t-test, using 0.5 (no change in leverpressing after drug compared to after saline) as the test value.

The effect of infusion order on aversive choice was analysed by including a between-subjects factor of infusion order (saline or drug first) with the within-subjects choice ANOVA. To analyse within-session changes in punished leverpressing during the aversive choice test, punished leverpresses in each minute were analysed with minute as one within-subjects factor (using a linear contrast) and infusion (saline vs. drug) as another factor.

Locomotion: Distance travelled and velocity were analysed using within-subjects ANOVAs, with infusion (saline vs. drug) as the within-subjects factor. To analyse the role of punishment acquisition infusions on later unprovoked locomotion a between-subjects factor of acquisition group (saline vs. drug) was included in the within-subjects ANOVAs.


Chapter 5 The Role of BLA and mAcbSh in Punishment

As outlined in Chapter 2, BLA mediates Pavlovian fear conditioning (Maren & Quirk, 2004; Sehlmeyer et al., 2009), encodes R-O associations (Corbit & Balleine, 2005; Parkes & Balleine, 2013) and the values of outcomes (Belova et al., 2007; Zhang et al., 2013). Several lines of evidence directly implicate amygdala in punishment; infusions of norepinephrine or benzodiazepines into the amygdala increase punished responding, and punishment itself increases GABAA receptor subunit mRNA expression and benzodiazepine binding in the amygdala (Liu & Glowa, 2000; Margules, 1971b). Similar anti-punishment effects have been observed with serotonin-depleting amygdala lesions (Sommer et al., 2001). In human fMRI studies, punishment has been achieved via different approaches (e.g., monetary loss, loss feedback) and fMRI shows amygdala activation (Zalla et al., 2000) and amygdala interactions with the ventral striatum (likely the Acb) (Camara et al., 2009) and hippocampus (Hahn et al., 2010) in response to negative outcomes. A study of humans with bilateral amygdala lesions (Bechara et al., 1999) showed that they, unlike healthy controls, did not learn to avoid choosing from disadvantageous decks in the Iowa Gambling Task.

Nonetheless, these studies have not adequately isolated the role of BLA versus other amygdala regions in punishment. Lesion studies in rats, enabling greater anatomical control, have been better able to isolate an effect to BLA. For example, Killcross and colleagues (1997) reported that excitotoxic lesions of BLA, but not central amygdala, impaired conditioned punishment whereby leverpressing was punished via presentations of an aversive CS. Yet these lesion studies have not adequately distinguished between the roles of BLA in different aspects of punishment (e.g., acquisition, expression, choice), and a role for BLA in punishment remains disputed (Maren, 2003).


BLA also has differences in connectivity (Alheid, 2003; Brog et al., 1993; Hamlin et al., 2009; McDonald et al., 1996; Sesack et al., 1989; Shinonaga et al., 1994) and function (Hamlin et al., 2009; Kantak et al., 2002; McLaughlin & Floresco, 2007) between rostral and caudal portions, which have not been explored with regard to punishment. For example, caudal but not rostral BLA projects extensively to the medial portion of the nucleus accumbens shell (mAcbSh), whereas rostral BLA projects to lateral portions of the nucleus accumbens shell (Groenewegen et al., 1999). Consistent with this differential connectivity, caudal BLA has been implicated in extinction of instrumental responding (Hamlin et al., 2009; McLaughlin & Floresco, 2007) whereas rostral BLA has been implicated in reinstatement of extinguished instrumental responding (Kantak et al., 2002).

This role for caudal BLA has been related to its connectivity with the medial accumbens shell (mAcbSh), particularly from data disconnecting these two regions during instrumental tasks, including expression of instrumental extinction (Millan & McNally, 2011) and stimulus-signalled active avoidance (Ramirez et al., 2015), showing BLA-AcbSh connectivity is involved in behavioural suppression and aversively-motivated behaviour. Thus, mAcbSh is a strong candidate as a structure mediating punishment behaviour, possibly through its connection with caudal BLA. The projection from BLA to AcbSh is glutamatergic, and blocked by AMPA receptor antagonists (Howland et al., 2002; Stuber et al., 2011).

The aim of Experiments 2 and 3 was to study the role of BLA and mAcbSh in punishment. In Experiment 2, rats received bilateral cannulations of the rostral or caudal BLA, permitting reversible inactivation using the GABAA and GABAB receptor agonists muscimol and baclofen, and were then run through the protocol outlined in Chapter 4. In Experiment 3, rats received bilateral cannulation of the mAcbSh permitting reversible inactivation with NBQX and were run through the same punishment procedure as BLA rats.


Experiment 2: The role of the BLA in punishment

Methods

Subjects

Subjects were 22 experimentally naive male Sprague Dawley rats (300-380g).

Procedure

Procedures were identical to those described in the General Methods section (p. 105). Cannulae (11mm) were implanted bilaterally according to the coordinates AP: -2.9, ML: ±5.0, DV: -7.9 mm from bregma when targeting caudal BLA (n = 12) and AP: -2.1, ML: ±4.9, DV: -7.9mm from bregma when targeting rostral BLA (n = 10) (Paxinos & Watson, 2007).

The drug used for infusions was an equal mixture of GABA agonists baclofen and muscimol (BM; 1mM baclofen, 0.1mM muscimol; Sigma-Aldrich, Sydney, Australia).

Results

Histology

The locations of microinjection cannulae are shown in Figure 5.1. Examination of placements revealed that 4 rats had misplaced cannulae and did not bilaterally target the BLA. These animals were excluded from the analyses, leaving 18 animals. Of these 18 animals, 9 had placements in rostral BLA and 9 in caudal BLA.

Pre-training

The mean ± SEM responses on the to-be-punished and to-be-unpunished levers are shown in Figure 5.2(A) (data point T). There was no significant overall difference between saline and BM groups in leverpressing at the end of pre-training (F(1,16) < 1; p > .05), no overall difference in responding on the to-be-punished and to-be-unpunished levers (F(1,16) = 1.4; p > .05), and no group x lever interaction (F(1,16) = 2.9; p > .05).


Figure 5.1. Microinfusion cannula placements within the BLA as verified by Nissl-stained sections. Black dots represent the most ventral point of the cannula tract, indicated on coronal sections adapted from Paxinos and Watson (2007). Cannulations anterior to -2.6mm from Bregma were considered as targeting the rostral BLA, while those posterior to -2.6mm were considered as targeting the caudal BLA.


Figure 5.2. (A) Mean ± SEM lever presses on the punished and unpunished levers during the last day of pre-training (T) and punishment acquisition. Arrows indicate days that rats received infusions of either saline (n = 9) or baclofen and muscimol (BM) (n = 9) immediately prior to the session. (B) Mean ± SEM latency to initially press the punished and unpunished lever (averaged across trials) during punishment acquisition. *p < .05


Effects of BLA inactivation on acquisition of punishment

The mean ± SEM leverpressing during the punishment phase is shown in Figure 5.2(A). Over the course of this training, there was a significant effect of lever (punished versus unpunished) (F(1,16) = 35.3; p < .05) and the difference in responding on the levers increased across days (F(1,16) = 51.4; p < .05). Across days, there was an increase in responding on the unpunished lever (F(1,16) = 35.7; p < .05) and a decrease in responding on the punished lever (F(1,16) = 36.8; p < .05).

Rats received BLA infusions of BM or saline prior to the first two days of training. During these infusion days, responding on the punished lever (F(1,16) = 6.3; p < .05) but not responding on the unpunished lever (F(1,16) < 1; p > .05) was significantly increased by BM infusions. There were no differences between groups on the remaining three infusion-free days (all F(1,16) < 1.3; all p > .05). Latencies to emit first responses across trials on the punished and unpunished lever during infusion days were also assessed (Figure 5.2(B)). During these infusion days, latencies to respond on the punished lever increased (F(1,16) = 9.5; p < .05) whereas latencies to respond on the unpunished lever did not (F(1,16) < 1; p > .05). There was no effect of BM infusions on these latencies for either the punished (F(1,16) = 1.4; p > .05) or unpunished (F(1,16) < 1; p > .05) lever.

Effects of BLA inactivation on expression of punishment

At the end of training rats were tested twice, once after BLA infusion of BM and once after infusion of saline, for the effects of BLA inactivation on the expression of punishment. The order of these tests was counterbalanced.

The mean ± SEM levels of performance on test are shown in Figure 5.3(A). There was a significant main effect of lever, such that rats responded more on the unpunished lever than the punished lever (F(1,17*) = 58.6; p < .05). There was no difference in responding between BM and saline tests for the unpunished lever (F(1,17) < 1; p > .05) and the difference between these tests for the punished lever approached, but did not reach, statistical significance (F(1,17) = 4.1; p = .059). There was no effect of acquisition group (saline versus BM) on leverpressing during expression test (all F(1,16) < 1; p > .05).

*In this and remaining experiments, df for expression and other within-subject tests is different (df = n-1) to acquisition (df = n-2) due to the use of appropriate between vs. within-subject analyses.

Figure 5.3. (A) Mean ± SEM lever presses on the punished and unpunished levers during punishment expression. Rats received within-subject infusions of saline and BM (n = 18) immediately prior to the session, counterbalanced across days. (B) Mean ± SEM suppression ratios of BM on lever pressing during punishment expression. (C) Mean ± SEM latency to initially press the punished and unpunished lever across trials, after infusions of saline or BM, during punishment expression. *p < .05
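The error degrees of freedom noted in the footnote on within-subject versus between-subject tests can be checked with simple arithmetic. The sketch below uses hypothetical helper names and assumes two acquisition groups (saline and drug) for the between-subjects case; n = 18 is the number of rats retained in Experiment 2.

```python
def df_within(n_subjects: int) -> int:
    """Error df for a single-df within-subject contrast: n - 1."""
    return n_subjects - 1


def df_between(n_subjects: int, n_groups: int = 2) -> int:
    """Error df for a between-subjects comparison: n minus number of groups.

    Two groups (saline vs. drug) is an assumption matching the acquisition design.
    """
    return n_subjects - n_groups


n = 18  # rats retained in Experiment 2
print(df_within(n), df_between(n))  # prints: 17 16
```

These values match the F(1,17) reported for the expression tests and the F(1,16) reported for the acquisition comparisons.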

To further examine the effects of BLA inactivation on expression of punishment, responding on punished and unpunished levers was computed as a ratio of responding between the BM and saline tests (ratio = A/(A+B)) (Figure 5.3(B)). When this ratio equals 0.5, responding on the lever did not change between the BM and saline tests whereas values greater than 0.5 indicate an increase in responding on the BM test and values less than 0.5 indicate a decrease in responding on the BM test. BM significantly increased this ratio on the punished lever relative to the unpunished lever (F(1,17) = 5.7; p < .05). This was due to the punished leverpress ratio being significantly greater than 0.5 (t (17) = 2.4; p < .05) while the unpunished ratio was no different from 0.5 (t (17) = -.02; p > .05). There was no effect of acquisition group (saline versus BM) on leverpress ratios during the expression test (all F(1,16) < 1; p > .05).
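The leverpress ratio and its comparison against 0.5 can be sketched as follows. This is a minimal illustration, not the analysis code used in the thesis: the function names are hypothetical, A is taken to be responses on the BM test and B responses on the saline test (inferred from the interpretation of values above and below 0.5), and the example numbers are invented purely to show the calculation.

```python
import math
from statistics import mean, stdev


def leverpress_ratio(drug_presses: float, saline_presses: float) -> float:
    """ratio = A / (A + B), with A = drug-test presses, B = saline-test presses."""
    total = drug_presses + saline_presses
    if total == 0:
        raise ValueError("ratio is undefined when both tests have zero presses")
    return drug_presses / total


def one_sample_t(values, test_value=0.5):
    """Return (t, df) for a one-sample t-test of the mean against test_value."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    standard_error = stdev(values) / math.sqrt(n)  # sample SD (n - 1 denominator)
    if standard_error == 0:
        raise ValueError("t is undefined when all values are identical")
    return (mean(values) - test_value) / standard_error, n - 1


# Invented per-rat counts (drug test, saline test), for illustration only.
hypothetical_counts = [(30, 20), (28, 12), (45, 30), (18, 22)]
ratios = [leverpress_ratio(a, b) for a, b in hypothetical_counts]
t_value, df = one_sample_t(ratios, test_value=0.5)
print(ratios, t_value, df)
```

A ratio above 0.5 for a given rat means more responding on the drug test than on the saline test. The resulting t would be compared against the critical value for the stated degrees of freedom at α = 0.05.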

When latencies to respond on test were analysed, rats were significantly slower to respond on the punished lever than the unpunished lever (F(1,17) = 17.9; p < .05) and there was a significant overall increase in latencies to respond across the session (F(1,17) = 5.1; p < .05) (Figure 5.3(C)). This was due to increased latencies to press the punished lever (F(1,17) = 6.9; p < .05), while there was no significant change in latencies to press the unpunished lever (F(1,17) = 1; p > .05). BM infusion into BLA had no significant effect on these latencies (all F(1,17) < 2.4; p > .05).

The results of Experiment 1 showed that the preference for the unpunished lever during choice test, both in terms of responses and latency to respond, did not change significantly across the two test days. The effect of infusion order (BM as first or second infusion) was examined. There was no interaction of infusion order with leverpresses, latencies (all F(1, 15) < 1; p > .05), or within-session leverpresses (all F(1, 15) < 2.2; p > .05).

BLA cannula placements varied from -2.28 through to -3.36 mm relative to Bregma. Given the aforementioned differences in connectivity and function between rostral and caudal BLA, further analyses were conducted to determine whether the effects of BM depended on whether rostral or caudal BLA was targeted. Using a criterion of -2.6 mm from Bregma to separate rostral and caudal BLA (Hamlin et al., 2009; McLaughlin & Floresco, 2007), 9 subjects were identified as having caudal BLA placements and 9 subjects as having rostral placements.
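The rostral/caudal split described here amounts to a threshold on the anterior-posterior (AP) coordinate. The sketch below is illustrative: the function and constant names are hypothetical, and the handling of a placement lying exactly at -2.6 mm is an assumption, since the text does not specify it.

```python
ROSTRAL_CAUDAL_BOUNDARY_MM = -2.6  # AP criterion relative to Bregma


def classify_bla_placement(ap_mm: float) -> str:
    """Label a cannula placement by its AP coordinate (mm from Bregma).

    More negative AP values are more posterior (caudal). A placement exactly
    at the boundary is labelled rostral here, which is an arbitrary choice.
    """
    return "caudal" if ap_mm < ROSTRAL_CAUDAL_BOUNDARY_MM else "rostral"


# The two extremes of the reported placement range.
print(classify_bla_placement(-2.28))  # prints: rostral
print(classify_bla_placement(-3.36))  # prints: caudal
```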

Overall, rats with caudal (F(1,8) = 57.7; p < .05) and rostral (F(1,8) = 16.6; p < .05) placements responded significantly less on the punished lever compared to the unpunished lever (Figure 5.4(A)). However, infusions of BM into the caudal BLA caused a significant increase in responding on the punished (F(1,8) = 5.3; p < .05) but not the unpunished lever (F(1,8) < 1; p > .05). In contrast, infusions of BM into the rostral BLA had no significant effect on responding on either lever (F(1,8) < 1; p > .05). There was also a significant increase in the BM:Saline leverpress ratio for the punished lever compared to the unpunished lever for caudal (F(1,8) = 29.1; p < .05) but not rostral BLA (F(1,8) < 1; p > .05) (Figure 5.4(B)). This was due to the leverpress ratio for caudal BLA being significantly greater than 0.5 for the punished lever (t (8) = 3.7; p < .05) while the unpunished ratio was no different from 0.5 (t (8) = -.81; p > .05). Infusions of BM into the rostral BLA did not significantly change the leverpress ratios from 0.5 for either lever (punished: t (8) = .04; p > .05; unpunished: t (8) = 0.5; p > .05). There was no effect of acquisition group (saline versus BM) on leverpressing or leverpress ratios for either caudal or rostral BLA rats (all F(1,7) < 2; p > .05).

BM infusion into the caudal BLA also had a significant effect on leverpress latencies (Figure 5.4(C)). Specifically, overall, rats with caudal (F(1,8) = 6.9; p < .05) and rostral (F(1,8) = 10.4; p < .05) placements were significantly slower to respond on the punished lever compared to the unpunished lever. However, BM infusion into caudal BLA decreased latency to respond on the punished (F(1,8) = 8.4; p < .05) but not the unpunished lever (F(1,8) < 1; p > .05). Critically, this effect of BM interacted with trials (F(1,8) = 7.7; p < .05) so that response latencies to the punished lever increased over the session after saline infusions into the caudal BLA (F(1,8) = 5.6; p < .05), while response latencies to the punished lever after BM infusions did not significantly change over the session (F(1,8) = 2.2; p > .05). BM into the rostral BLA had no significant effect on leverpress latencies (all F(1,8) < 2.3; all p > .05).

Figure 5.4. (A) Mean ± SEM lever presses on the punished and unpunished levers during punishment expression, separated into rats with cannulae targeting caudal (n = 9) and rostral (n = 9) portions of the BLA. (B) Mean ± SEM suppression ratios of BM on lever pressing during punishment expression, separated into caudal and rostral BLA-targeting cannulae. (C) Mean ± SEM latency to initially press the punished and unpunished lever across trials during punishment expression, separated into caudal and rostral BLA-targeting cannulae. *p < .05

Effects of BLA inactivation on aversive choice

Rats received infusions prior to the two choice tests (BM and saline, counterbalanced). One rat was excluded due to a damaged cannula. Figure 5.5 shows responses on choice test and latencies to respond. Rats responded significantly more on the unpunished than the punished lever (F(1,16) = 38.9; p < .05). There was no difference in responding between BM and saline tests for the unpunished lever (F(1,16) < 1; p > .05) or punished lever (F(1,16) < 1; p > .05). Rats were also significantly slower to respond on the punished relative to the unpunished lever (F(1,16) = 11.5; p < .05). However, there was no effect of BM infusions on leverpress latencies (all F(1,16) < 2.7; p > .05). There was no significant increase in punished leverpresses across the 30 min sessions (F(1,16) = 3.2; p > .05), and no interaction between changes in punished leverpressing across choice test sessions and infusion of BM (F(1,16) < 1; p > .05). Further analysis of these data by location of BLA cannula did not alter these main findings.

Effects of BLA inactivation on locomotor activity

Lastly, rats were placed in a plain locomotor chamber for 40 mins over 3 days. They received BM and saline infusions immediately prior to the last 2 days (counterbalanced across days). BM infusions into the BLA had no effect on locomotor activity (Figure 5.6). This was regardless of whether total distance travelled (F(1,16) = 1.6; p > .05) or average velocity (F(1,16) = 3.3; p > .05) was assessed. This finding was also unaffected by location of cannulae - neither BM infusions into the caudal BLA nor rostral BLA affected distance travelled or velocity (Fs < 2.1; p > .05).

Figure 5.5. (A) Mean ± SEM lever presses on the punished and unpunished levers during the aversive choice task. Rats received within-subject infusions of saline and BM (n = 17) immediately prior to the session, counterbalanced across days. (B) Mean ± SEM latency to initially press the punished and unpunished lever during the aversive choice task.

Figure 5.6. (A) Mean ± SEM distance travelled after within-subject infusions of saline and BM (n = 17), counterbalanced across days. (B) Mean ± SEM total velocity after within-subject infusions of saline and BM (n = 17), counterbalanced across days.

Experiment 3: The role of the mAcbSh in punishment

Methods

Subjects

Subjects were 16 experimentally naive male Sprague Dawley rats (310-400g).

Procedure

Procedures were identical to those described in the General Methods section (p. 105). Cannulae (11mm) were implanted bilaterally according to the coordinates AP: +1.2, ML: ±1.0, DV: -6.6 mm from bregma (Paxinos & Watson, 2007).

Based on the findings of Millan and McNally (2011), the drug used for infusions was the AMPA receptor antagonist NBQX (1µg/µl; Sigma-Aldrich, Sydney, Australia).

Results

Histology

Examination of placements revealed that 5 rats had misplaced cannulae and did not bilaterally target the AcbSh. These animals with misplaced cannulae were excluded from the analyses, leaving 11 animals remaining. The locations of microinjection cannulae are shown in Figure 5.7.

Pre-training

The mean ± SEM responses on the to-be-punished and to-be-unpunished levers are shown in Figure 5.8. There was no significant overall difference between saline and NBQX groups in leverpressing at the end of pre-training (F(1,9) < 1; p > .05), no overall difference in responding on the to-be-punished and to-be-unpunished levers (F(1,9) < 1; p > .05), and no group x lever interaction (F(1,9) < 1; p > .05).

Figure 5.7. Microinfusion cannula placements within the mAcbSh as verified by Nissl-stained sections. Black dots represent the most ventral point of the cannula tract, indicated on coronal sections adapted from Paxinos and Watson (2007).

Effects of AcbSh inactivation on acquisition of punishment

Rats received infusions into the AcbSh immediately prior to the first two days of punishment training. The mean ± SEM leverpressing during the punishment phase is shown in Figure 5.8(A). Over the course of this training, there was a significant effect of lever (punished versus unpunished) (F(1,9) = 21.9; p < .05) and the difference in responding on the levers increased across days (F(1,9) = 23.3; p < .05). Across days, there was an increase in responding on the unpunished lever (F(1,9) = 19.7; p < .05) and a decrease in responding on the punished lever (F(1,9) = 22.4; p < .05).

During infusion days, there was no effect of NBQX on responding to the punished (F(1,9) < 1; p > .05) or unpunished lever (F(1,9) < 1; p > .05). Latencies to emit first responses across trials on the punished and unpunished lever during infusion days were also assessed (Figure 5.8(B)). During infusion days, latencies to respond on the punished lever increased (F(1,9) = 7.2; p < .05) whereas latencies to respond on the unpunished lever decreased (F(1,9) = 9.2; p < .05). There was no effect of NBQX infusions on these latencies for either the punished (F(1,9) < 1; p > .05) or unpunished (F(1,9) = 2.3; p > .05) lever. Thus, AcbSh infusions of NBQX had no effect on responding during punishment acquisition.

Effects of AcbSh inactivation on expression of punishment

At the end of training rats were tested twice, once after AcbSh infusion of NBQX and once after infusion of saline, for the effects of NBQX on the expression of punishment. The mean ± SEM levels of performance on test are shown in Figure 5.9(A). There was a significant main effect of lever, such that rats responded more on the unpunished than the punished lever (F(1,10) = 52.3; p < .05). There was no difference in responding between NBQX and saline tests for the punished (F(1,10) < 1; p > .05) or unpunished (F(1,10) = 1.4; p > .05) lever. NBQX also did not significantly alter punished (t(10) = .72; p > .05) or unpunished (t(10) = -1.3; p > .05) leverpress ratios from 0.5 (Figure 5.9(B)).

Figure 5.8. Effect of NBQX inactivation of the mAcbSh during punishment acquisition. (A) Mean ± SEM leverpresses on the punished and unpunished levers prior to and during punishment acquisition. T represents the last day of leverpress training. Arrows indicate days that rats received infusions of either saline (n = 5) or NBQX (n = 6) immediately prior to the session. (B) Mean ± SEM latency to initially press the punished and unpunished lever (averaged across trials) during punishment acquisition.

Figure 5.9. Effect of NBQX inactivation of the mAcbSh during punishment expression. (A) Mean ± SEM leverpresses on the punished and unpunished levers during punishment expression. Rats received within-subject infusions of saline and NBQX (n = 11) immediately prior to the session, counterbalanced across days. (B) Mean ± SEM suppression ratios of NBQX on leverpressing during punishment expression. (C) Mean ± SEM latency to initially press the punished and unpunished lever across trials, after infusions of saline or NBQX, during punishment expression.

When latencies to emit first responses (averaged across trials) were analysed, rats were significantly slower to respond on the punished lever than the unpunished lever (F(1,10) = 105.1; p < .05) (Figure 5.9(C)). NBQX infusion into AcbSh had no significant main effect on these latencies (F(1,10) < 1; p > .05). There was a trend towards a drug x lever interaction (F(1,10) = 3.4; p < .1), which was mostly driven by a non-significant increase in unpunished leverpress latencies after NBQX (F(1,10) = 2.8; p > .05); punished leverpress latencies were unaffected by NBQX (F(1,10) < 1; p > .05).

Effects of AcbSh inactivation on aversive choice

Figure 5.10 shows responses on choice test and latencies to respond. Rats responded significantly more on the unpunished than the punished lever (F(1,10) = 38.1; p < .05). There was no difference in responding between NBQX and saline tests for the unpunished lever (F(1,10) < 1; p > .05) or punished lever (F(1,10) = 1.25; p > .05). Rats were also significantly slower to respond on the punished relative to the unpunished lever (F(1,10) = 75.1; p < .05). There was no effect of NBQX infusions on these leverpress latencies (all F(1,10) < 1; p > .05).

Effects of AcbSh inactivation on locomotor activity

NBQX infusions into the AcbSh had no effect on locomotor activity. This was regardless of whether total distance travelled (F(1,10) < 1; p > .05) or average velocity (F(1,10) < 1; p > .05) (Figure 5.11) was assessed.


Figure 5.10. Effect of NBQX inactivation of the AcbSh on choice. (A) Mean ± SEM leverpresses on the punished and unpunished levers during the aversive choice task. Rats received within-subject infusions of saline and NBQX (n = 11) immediately prior to the session, counterbalanced across days. (B) Mean ± SEM latency to initially press the punished and unpunished lever during the aversive choice task.


Figure 5.11. Effect of NBQX inactivation of the AcbSh on unprovoked locomotion. (A) Mean ± SEM distance travelled after within-subject infusions of saline and NBQX (n = 11), counterbalanced across days. (B) Mean ± SEM total velocity after within-subject infusions of saline and NBQX (n = 11), counterbalanced across days.


Discussion

The experiments reported in this chapter studied the roles of BLA and mAcbSh in punishment. Experiment 2 investigated the role of the BLA in punishment by reversibly inactivating rostral or caudal BLA, during punishment acquisition, expression and choice, while the role of the mAcbSh was investigated in Experiment 3 by infusions of the AMPA receptor antagonist NBQX. Replicating the findings from Experiment 1, rats learned to reduce responding on the punished lever across the course of punishment training and the latencies with which animals responded on this lever increased. Responding on the unpunished lever increased and latencies to respond on this lever remained low. When confronted with a choice between the unpunished versus punished lever, but in the absence of any punishment, rats showed a clear preference for the unpunished lever both in terms of total leverpresses as well as latencies to respond on the two levers.

Reversible inactivation of the BLA had significant but selective and anatomically specific effects on punishment. First, BLA inactivation significantly impaired the initial acquisition of punishment so that BM animals responded more on the punished lever than saline controls. Second, inactivation of the caudal but not the rostral BLA significantly reduced the expression of punishment, as shown by significantly more responses on the punished lever on BM tests than on saline tests, a significantly increased ratio of responses across these two tests, and significantly shorter latencies to respond on the punished lever. These effects show that caudal but not rostral BLA is important for punishment. Nonetheless, there was no evidence here that BLA, either caudal or rostral, was important for choice between the punished and unpunished levers as measured by responses on either lever or latencies to respond on these levers. In contrast, NBQX infusions into the mAcbSh had no significant effect on any of the behaviours measured.


These results are consistent with a role for the BLA in determining the aversive value of the shock punisher. Three lines of evidence support this interpretation. First, BLA inactivation increased responding on the punished lever, consequently increasing the number of shocks that the animals received. Second, the impact of BLA inactivation depended on the presence of the punisher and was not observed in its absence; BLA inactivation had robust effects on performance during sessions when the shock punisher was delivered but no effect on performances in the choice test when the punisher was absent. The requirement of the footshock punisher to be present to detect an effect of BLA inactivation could be interpreted as BLA being important for footshock sensitivity. However, inactivation of the BLA (via lesions or reversible inactivations using muscimol or APV) does not affect rats’ sensitivity to footshock (Maren et al., 1996; Rabinak & Maren, 2008). Third, latency data from expression tests showed a dual effect of punishment. There was an increased latency to respond on the punished lever compared to the unpunished lever that was present at the start of the session, prior to any shock being delivered. There was also a second increase in these latencies across the session after shock had been delivered. BLA inactivation had no effect on initial latencies to respond on the punished lever. These inactivations did, however, prevent the increase in latencies across the session.

A key finding here was that there were differences between rostral and caudal BLA involvement in punishment. In fact, BLA contributions to punishment were difficult to detect when rostral and caudal inactivations were aggregated. This might explain why previous considerations of a role for the BLA (that did not account for a rostro-caudal subdivision) in instrumental aversive learning have been somewhat ambiguous (Maren, 2003). The role for caudal BLA in both extinction of instrumental responding (Hamlin et al., 2009; McLaughlin & Floresco, 2007) and punishment could reflect this subregion’s general role in suppression of appetitively motivated behaviour.


The role for the caudal BLA in extinction of instrumental responding has been linked to its connectivity with the mAcbSh (Millan & McNally, 2011). The BLA-mAcbSh pathway has also been implicated in stimulus-signalled active avoidance (Ramirez et al., 2015). Despite these roles in behaviour suppression and aversively-motivated behaviour, there was no evidence here for a role of AcbSh in punishment suppression and choice, as reversible inactivation of the AcbSh had no significant effects on punishment suppression within Experiment 3. This lack of NBQX effect suggests that the glutamatergic BLA projection to the mAcbSh is not crucial for punishment learning and behaviour. More broadly, the current results suggest that glutamatergic signals to AcbSh neurons, including inputs from punishment-implicated PFC and hippocampus, may not be necessary for the acquisition or expression of instrumental suppression, or choice. The notable conditioned suppression observed within Experiment 3 was also unaffected by AMPA receptor blockade.

Taken together, these findings support the conclusion that the BLA is important for both the acquisition and expression of punishment but not for unpunished choice. This role appears to be linked to neurons in the caudal BLA, rather than rostral BLA, and is most parsimoniously interpreted as a role for the caudal BLA in determining the aversive value of the shock punisher. However, a role for glutamatergic inputs to the mAcbSh in punishment, including those from the caudal BLA, was not supported.


Chapter 6 The Role of the Prefrontal Cortex in Punishment

Various structures of the PFC have been implicated in punishment-related learning and behaviour in several ways. As outlined in Chapter 2, the PFC has been ascribed roles in modulating aversive Pavlovian associations (Blum et al., 2006; Corcoran & Quirk, 2007; Sierra-Mercado et al., 2011), top-down control of behaviour (Marquis et al., 2007; Ragozzino, 2007), aversive mesocortical signals (Lammel et al., 2012, 2014), and the formation of associations underpinning instrumental behaviour (Balleine & O’Doherty, 2010; Cardinal et al., 2003; Hayen et al., 2013; Killcross & Coutureau, 2003).

A role for the mPFC in punishment has not been demonstrated, though its role in other forms of learning and behaviour has been used to advocate a role for the mPFC in punishment. There is also a growing body of evidence that there are functional distinctions between dorsal (dmPFC) and ventral (vmPFC) portions of the mPFC (Balleine & O’Doherty, 2010; Hayen et al., 2013; Killcross & Coutureau, 2003; Laurent & Westbrook, 2009; Sierra-Mercado et al., 2011; Willcocks & McNally, 2013), which have not been adequately examined with regard to punishment.

The dmPFC is composed of the ACC and PL. The ACC has been implicated in aversively-motivated reinforcement, but lesions of the ACC spare punishment suppression (McCleary, 1961). Thus the ACC does not appear to be important for punishment. However, the role of the PL in punishment has not been adequately examined. The IL is the most studied structure of the vmPFC, and is located immediately ventral to the PL. The importance of the IL in punishment is unknown.

The lPFC consists of the OFC and rostral portions of the insula (RAIC). The medial OFC has mostly been implicated in reward, whereas the lateral OFC and RAIC have been implicated in pain and aversion processing (Arana et al., 2003; O’Doherty et al., 2001; Wiech & Tracey, 2013), suggesting a medio-lateral gradient in appetitive and aversive processing.

However, there is some evidence that the PFC is not required for punishment suppression. Pelloux and colleagues (2013) found that lesions of the ACC, PL, IL, OFC or RAIC did not increase punished leverpressing (previously reinforced with cocaine) under extinction (no cocaine deliveries, only shocks), while BLA lesions did attenuate response reduction. This finding suggests the PFC is not required for punishment suppression. However, prior cocaine self-administration as well as the extinction conditions used in this experiment make interpretation of the role of PFC in punishment difficult. Moreover, there are findings that PFC is required for instrumental extinction in specific ways (McLaughlin & Floresco, 2007; Peters & De Vries, 2013; Peters et al., 2008, 2009; Willcocks & McNally, 2013). Thus the role of the PFC in various aspects of punishment, and punishment of responses reinforced by natural rewards, requires further investigation.

The following experiments investigated the role of the PL, IL and RAIC, as representatives of the dmPFC, vmPFC and lPFC, in punishment acquisition, expression and aversive choice. Rats received bilateral cannulation of the PL, IL or RAIC, permitting reversible inactivation via the GABAA and GABAB receptor agonists muscimol and baclofen (BM).

Experiment 4: The role of the PL in punishment

Methods

Subjects

Subjects were 20 experimentally naive male Sprague Dawley rats (360-470g).


Procedure

Procedures were identical to those described in the General Methods section (p. 105). Cannulae (6mm) were implanted bilaterally according to the coordinates AP: +3.25, ML: ±0.75, DV: -3.3 mm from bregma (Paxinos & Watson, 2007).

The drug used for infusions was an even mixture of GABA agonists baclofen and muscimol (BM; 1mM baclofen, 0.1mM muscimol; Sigma-Aldrich, Sydney, Australia).

Results

Histology

The locations of microinjection cannulae are shown in Figure 6.1. Examination of placements revealed that 5 rats had misplaced cannulae and did not bilaterally target the PL. These animals with misplaced cannulae were excluded from the analyses, leaving 15 animals remaining.

Pre-training

Mean ± SEM responses on the to-be-punished and to-be-unpunished levers are shown in Figure 6.2. There was no significant overall difference between saline and BM groups in leverpressing at the end of pre-training (F(1,13) = 1.3; p > .05), no overall difference in responding on the to-be punished and to-be unpunished levers (F(1,13) < 1; p > .05), and no group x lever interaction (F(1,13) < 1; p > .05).

Effects of PL inactivation on acquisition of punishment

Mean ± SEM leverpressing during the punishment phase is shown in Figure 6.2(A). Over the course of this training, there was a significant effect of lever (punished versus unpunished) (F(1,13) = 134.2; p < .05), and the difference in responding on the levers increased across days (F(1,13) = 116.1; p < .05). Across days, there was an increase in responding on the unpunished lever (F(1,13) = 95.1; p < .05) and a decrease in responding on the punished lever (F(1,13) = 45.4; p < .05).

Figure 6.1. Microinfusion cannula placements within the PL as verified by Nissl-stained sections. Black dots represent the most ventral point of the cannula tract, indicated on coronal sections adapted from Paxinos and Watson (2007).

Figure 6.2. Effect of BM inactivation of the PL during punishment acquisition. (A) Mean ± SEM leverpresses on the punished and unpunished levers prior to and during punishment acquisition. T represents the last day of leverpress training. Arrows indicate days that rats received infusions of either saline (n = 7) or BM (n = 8) immediately prior to the session. (B) Mean ± SEM latency to initially press the punished and unpunished lever (averaged across trials) during punishment acquisition.

Rats received PL infusions of BM or saline prior to the first two days of punishment. During these infusion days, there was no effect of BM on responding to the punished (F(1,13) < 1; p > .05) or unpunished lever (F(1,13) = 1.7; p > .05). Latencies to emit first responses were assessed (averaged across trials) (Figure 6.2(B)). During these infusion days, latencies to respond on the punished lever increased (F(1,13) = 52.1; p < .05) whereas latencies to respond on the unpunished lever did not change (F(1,13) < 1; p > .05). There was no effect of BM infusions on these latencies for either the punished or unpunished lever (all F(1,13) < 1; p > .05). Thus, PL inactivations had no effect on responding during punishment acquisition.

Effects of PL inactivation on expression of punishment

Mean ± SEM leverpresses during expression are shown in Figure 6.3(A). There was a significant main effect of lever, such that rats responded more on the unpunished lever than the punished lever (F(1,14) = 172.7; p < .05). There was no difference in responding between BM and saline tests for the punished or unpunished lever (all F(1,14) < 1; p > .05). BM also did not significantly alter punished (t(14) = 1.22; p > .05) or unpunished (t(14) = .80; p > .05) leverpress ratios from 0.5 (Figure 6.3(B)).
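The ratio test reported above can be illustrated with a minimal sketch. It assumes that each rat's ratio is its responses after BM divided by the sum of its responses after BM and after saline, so that 0.5 indicates no change, and that the ratios are compared to 0.5 with a one-sample t-test. This definition, the function names and the response counts are illustrative assumptions, not data or code from these experiments.

```python
import math
import statistics


def leverpress_ratio(bm_presses, saline_presses):
    """Assumed ratio: BM / (BM + saline); 0.5 means BM did not change responding."""
    total = bm_presses + saline_presses
    if total == 0:
        raise ValueError("ratio is undefined when a rat made no responses")
    return bm_presses / total


def one_sample_t(values, mu=0.5):
    """Return (t, df) for a one-sample t-test of the mean of values against mu."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator)
    t = (mean - mu) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical punished-lever counts for five rats (not real data).
bm = [12, 9, 15, 10, 11]
saline = [10, 11, 13, 10, 12]
ratios = [leverpress_ratio(b, s) for b, s in zip(bm, saline)]
t_stat, df = one_sample_t(ratios)
print(f"mean ratio = {statistics.mean(ratios):.3f}, t({df}) = {t_stat:.2f}")
```

Under this assumed definition, a ratio above 0.5 on the punished lever would indicate more punished responding after BM than after saline, and a ratio below 0.5 would indicate less.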

When latencies to emit first responses (averaged across trials) were analysed, rats were significantly slower to respond on the punished lever than the unpunished lever (F(1,14) = 55.1; p < .05) (Figure 6.3(C)). BM infusion into PL had no significant main effect on punished (F(1,14) = 1.8; p > .05) or unpunished latencies (F(1,14) = 1.4; p > .05).

Effects of PL inactivation on aversive choice

Figure 6.4 shows responses during the choice test and latencies to respond. Rats responded significantly more on the unpunished than the punished lever (F(1,14) = 84.1; p < .05). There was no difference in responding between BM and saline tests for the unpunished lever or punished lever (F(1,14) < 1; p > .05). Rats were also significantly slower to respond on the punished relative to the unpunished lever (F(1,14) = 7.0; p < .05). There was no effect of BM infusions on these leverpress latencies (all F(1,14) < 1; p > .05).

Figure 6.3. Effect of BM inactivation of the PL during punishment expression. (A) Mean ± SEM leverpresses on the punished and unpunished levers during punishment expression. Rats received within-subject infusions of saline and BM (n = 15) immediately prior to the session, counterbalanced across days. (B) Mean ± SEM suppression ratios of BM on leverpressing during punishment expression. (C) Mean ± SEM latency to initially press the punished and unpunished lever across trials, after infusions of saline or BM, during punishment expression.

Figure 6.4. Effect of BM inactivation of the PL on choice. (A) Mean ± SEM leverpresses on the punished and unpunished levers during the aversive choice task. Rats received within-subject infusions of saline and BM (n = 15) immediately prior to the session, counterbalanced across days. (B) Mean ± SEM latency to initially press the punished and unpunished lever during the aversive choice task.

Experiment 5: The role of the IL in punishment

Methods

Subjects

Subjects were 33 experimentally naive male Sprague Dawley rats (290-380g).

Procedure

Procedures were identical to those described in the General Methods section. Drug effects on locomotion were not assessed. Cannulae (6mm) were implanted bilaterally into the IL according to the coordinates AP: +3.1, ML: ±0.5, DV: -4.7 mm from bregma (Paxinos & Watson, 2007).

The drug used for infusions was an even mixture of GABA agonists baclofen and muscimol (BM; 1mM baclofen, 0.1mM muscimol; Sigma-Aldrich, Sydney, Australia).

Results

Histology

The locations of microinjection cannulae are shown in Figure 6.5. Examination of placements revealed that 15 rats had cannulae that did not bilaterally target the IL, with a majority of these misplacements being too ventral and within the dorsal peduncular cortex (Paxinos & Watson, 2007). These animals with misplaced cannulae were excluded from the analyses, leaving 18 animals remaining.


Figure 6.5. Microinfusion cannula placements within the IL as verified by Nissl-stained sections. Black dots represent the most ventral point of the cannula tract, indicated on coronal sections adapted from Paxinos and Watson (2007).


Pre-training

Mean ± SEM responses on the to-be-punished and to-be-unpunished levers are shown in Figure 6.6. There was no significant overall difference between saline and BM groups in leverpressing at the end of pre-training (F(1,16) < 1; p > .05), no overall difference in responding on the to-be punished and to-be unpunished levers (F(1,16) < 1; p > .05), and no group x lever interaction (F(1,16) < 1; p > .05).

Effects of IL inactivation on acquisition of punishment

Mean ± SEM leverpressing during the punishment phase is shown in Figure 6.6(A). Over the course of this training, there was a significant effect of lever (punished versus unpunished) (F(1,16) = 65.2; p < .05), and the difference in responding on the levers increased across days (F(1,16) = 41.1; p < .05). Across days, there was an increase in responding on the unpunished lever (F(1,16) = 22.4; p < .05) and a decrease in responding on the punished lever (F(1,16) = 54.7; p < .05).

Rats received IL infusions of BM or saline prior to the first two days of punishment. During these infusion days, there was no effect of BM on responding to the punished (F(1,16) < 1; p > .05) or unpunished lever (F(1,16) < 1; p > .05). Latencies to emit first responses were also assessed (averaged across trials) (Figure 6.6(B)). During these infusion days, latencies to respond on the punished lever increased (F(1,16) = 46.3; p < .05) whereas latencies to respond on the unpunished lever did not change (F(1,16) < 1; p > .05). There was no effect of BM infusions on these latencies for either the punished or unpunished lever (all F(1,16) < 1; p > .05). Thus, IL inactivations had no effect on responding during punishment acquisition.

Effects of IL inactivation on expression of punishment

Mean ± SEM leverpresses during expression are shown in Figure 6.7(A). There was a significant main effect of lever, such that rats responded more on the unpunished lever than the punished lever (F(1,17) = 101.4; p < .05). There was no difference in responding between BM and saline tests for the punished or unpunished lever (all F(1,17) < 1; p > .05). BM also did not significantly alter punished (t(17) = 0.05; p > .05) or unpunished (t(17) = 1.22; p > .05) leverpress ratios from 0.5 (Figure 6.7(B)).

Figure 6.6. Effect of BM inactivation of the IL during punishment acquisition. (A) Mean ± SEM leverpresses on the punished and unpunished levers prior to and during punishment acquisition. T represents the last day of leverpress training. Arrows indicate days that rats received infusions of either saline (n = 7) or BM (n = 8) immediately prior to the session. (B) Mean ± SEM latency to initially press the punished and unpunished lever (averaged across trials) during punishment acquisition.

Figure 6.7. Effect of BM inactivation of the IL during punishment expression. (A) Mean ± SEM leverpresses on the punished and unpunished levers during punishment expression. Rats received within-subject infusions of saline and BM (n = 18) immediately prior to the session, counterbalanced across days. (B) Mean ± SEM suppression ratios of BM on leverpressing during punishment expression. (C) Mean ± SEM latency to initially press the punished and unpunished lever across trials, after infusions of saline or BM, during punishment expression.

When latencies to emit first responses (averaged across trials) were analysed, rats were significantly slower to respond on the punished lever than the unpunished lever (F(1,17) = 48.2; p < .05) (Figure 6.7(C)). BM infusion into IL had no significant main effect on punished (F(1,17) < 1; p > .05) or unpunished latencies (F(1,17) < 1; p > .05).
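For a within-subject comparison with only two levels (for example, saline versus BM on a single lever), a single-df contrast with F(1, n − 1) is equivalent to a paired t-test on the difference scores, with F = t². The sketch below illustrates that relation only; it assumes the drug effects above were tested as contrasts of this kind, and the function name and latencies are hypothetical rather than taken from these experiments.

```python
import math
import statistics


def paired_contrast(cond_a, cond_b):
    """Return (F, df_error) for a two-level within-subject contrast.

    Computed as the squared paired t statistic on the difference scores.
    """
    if len(cond_a) != len(cond_b):
        raise ValueError("each subject needs a score in both conditions")
    diffs = [a - b for a, b in zip(cond_a, cond_b)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    t = statistics.mean(diffs) / se
    return t * t, n - 1


# Hypothetical latencies (s) to press the punished lever for six rats.
saline_latency = [8.2, 6.5, 9.1, 7.4, 10.3, 6.9]
bm_latency = [7.9, 7.0, 8.8, 7.6, 9.7, 7.2]
f_stat, df_error = paired_contrast(saline_latency, bm_latency)
print(f"F(1,{df_error}) = {f_stat:.2f}")
```

With 18 rats tested under both infusions, a contrast of this form would have 17 error degrees of freedom, matching the F(1,17) values reported for Experiment 5.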

Effects of IL inactivation on aversive choice

Figure 6.8 shows responses during the choice test and latencies to respond. Rats responded significantly more on the unpunished than the punished lever (F(1,17) = 70.5; p < .05). There was significantly more leverpressing following BM infusions compared to saline (F(1,17) = 5.42; p < .05). There was also a drug x lever interaction (F(1,17) = 5.77; p < .05), driven by significantly more unpunished leverpressing following BM compared to saline (F(1,17) = 6.58; p < .05), while there was a non-significant decrease in punished leverpressing following BM (F(1,17) = 1.72; p > .05).

Rats were also significantly slower to respond on the punished relative to the unpunished lever (F(1,17) = 11.8; p < .05). There was a significant effect of drug on overall leverpress latencies, with significantly shorter latencies to press following BM infusions compared to saline (F(1,17) = 4.76; p < .05). However, unlike total leverpresses, there was no interaction effect of drug and lever (F(1,17) = 2.93; p > .05), with simple effects failing to detect significant effects of drug on either the punished (F(1,17) = 3.94; p = .064) or unpunished lever (F(1,17) < 1; p > .05).

Due to the significant effect of IL inactivations on leverpressing, the effect of drug on within-session leverpressing was analysed (data not shown). Rate of pressing the unpunished lever did not significantly change across the session (F(1,17) = 1.53; p > .05), and BM had no effect on this (F(1,17) = 1.27; p > .05). Rate of pressing the punished lever significantly increased across the session (F(1,17) = 5.84; p < .05), but BM had no effect on this increase (F(1,17) < 1; p > .05). Thus IL inactivations increased unpunished leverpressing and decreased leverpress latencies during the choice test, but did not affect the change in rate of leverpressing across the choice test.

Figure 6.8. Effect of BM inactivation of the IL on choice. (A) Mean ± SEM leverpresses on the punished and unpunished levers during the aversive choice task. Rats received within-subject infusions of saline and BM (n = 15) immediately prior to the session, counterbalanced across days. (B) Mean ± SEM latency to initially press the punished and unpunished lever during the aversive choice task. *p < .05

Experiment 6: The role of the RAIC in punishment

Methods

Subjects

Subjects were 20 experimentally naive male Sprague Dawley rats (260-390g).

Procedure

Procedures were identical to those described in the General Methods section. Drug effects on locomotion were not assessed. Cannulae (11mm) were implanted bilaterally according to the coordinates AP: +2.65, ML: ±3.7, DV: -5.1 mm from bregma (Paxinos & Watson, 2007).

The drug used for infusions was an even mixture of GABA agonists baclofen and muscimol (BM; 1mM baclofen, 0.1mM muscimol; Sigma-Aldrich, Sydney, Australia).

Data Analysis

One rat received unilateral BM infusions during punishment acquisition due to cannulae patency issues, and was therefore excluded from pre-training and acquisition analyses, but was included in expression and choice analyses due to resolution of the patency issues. Three rats were excluded from choice analyses due to a program malfunction on their second choice test.


Results

Histology

The locations of microinjection cannulae are shown in Figure 6.9. Examination of placements revealed that 5 rats had cannulae that did not bilaterally target the RAIC. These animals with misplaced cannulae were excluded from the analyses, leaving 15 animals remaining.

Pre-training

There was no significant overall difference between saline and BM groups in leverpressing at the end of pre-training (F(1,12) < 1; p > .05), no overall difference in responding on the to-be punished and to-be unpunished levers (F(1,12) < 1; p > .05), and no group x lever interaction (F(1,12) < 1; p > .05).

Effects of RAIC inactivation on acquisition of punishment

Mean ± SEM leverpressing during the punishment phase is shown in Figure 6.10(A). Over the course of this training, there was a significant effect of lever (punished versus unpunished) (F(1,12) = 42.8; p < .05), and the difference in responding on the levers increased across days (F(1,12) = 21.7; p < .05). Across days, there was an increase in responding on the unpunished lever (F(1,12) = 62.3; p < .05) and a decrease in responding on the punished lever (F(1,12) = 25.4; p < .05).

Rats received RAIC infusions of BM or saline prior to the first two days of punishment. During these infusion days, there was no effect of BM on responding to the punished (F(1,12) < 1; p > .05) or unpunished lever (F(1,12) < 1; p > .05). Latencies to emit first responses were also assessed (averaged across trials) (Figure 6.10(B)). During these infusion days, latencies to respond on the punished lever increased (F(1,12) = 64.0; p < .05) whereas latencies to respond on the unpunished lever did not change (F(1,12) < 1; p > .05). There was no significant effect of BM infusions on these latencies for either the punished or unpunished lever (all F(1,12) < 1.2; p > .05). Thus, RAIC infusions of BM had no effect on responding during punishment acquisition.

Figure 6.9. Microinfusion cannula placements within the RAIC as verified by Nissl-stained sections. Black dots represent the most ventral point of the cannula tract, indicated on coronal sections adapted from Paxinos and Watson (2007).

Figure 6.10. Effect of BM inactivation of the RAIC during punishment acquisition. (A) Mean ± SEM leverpresses on the punished and unpunished levers prior to and during punishment acquisition. T represents the last day of leverpress training. Arrows indicate days that rats received infusions of either saline (n = 7) or BM (n = 7) immediately prior to the session. (B) Mean ± SEM latency to initially press the punished and unpunished lever (averaged across trials) during punishment acquisition.

Effects of RAIC inactivation on expression of punishment

Mean ± SEM leverpressing is shown in Figure 6.11(A). There was a significant main effect of lever, such that rats responded more on the unpunished lever than the punished lever (F(1,14) = 74.5; p < .05). There was no difference in responding between BM and saline tests for the punished (F(1,14) = 2.23; p > .05) or unpunished lever (F(1,14) < 1; p > .05). BM also did not significantly alter punished (t(14) = 0.40; p > .05) or unpunished (t(14) = 1.07; p > .05) leverpress ratios from 0.5 (Figure 6.11(B)).

When latencies to emit first responses (averaged across trials) were analysed, rats were significantly slower to respond on the punished lever than the unpunished lever (F(1,14) = 29.9; p < .05) (Figure 6.11(C)). BM infusion into RAIC had no significant main effect on punished (F(1,14) < 1; p > .05) or unpunished latencies (F(1,14) = 1.30; p > .05).

Effects of RAIC inactivation on aversive choice

Figure 6.12 shows responses during the choice test and latencies to respond. Rats responded significantly more on the unpunished than the punished lever (F(1,11) = 36.7; p < .05). There was no significant difference in overall responding between BM and saline tests (F(1,11) = 1.41; p > .05), but there was a significant drug x lever interaction (F(1,11) = 6.94; p < .05). However, follow-up simple effect analyses failed to detect significant differences in leverpressing between BM and saline sessions on either lever, though the trend was an increase in punished leverpressing (F(1,11) = 2.17; p > .05) and a decrease in unpunished leverpressing (F(1,11) = 4.64; p = .054) following BM infusions.


Figure 6.11. Effect of BM inactivation of the RAIC during punishment expression. (A) Mean ± SEM leverpresses on the punished and unpunished levers during punishment expression. Rats received within-subject infusions of saline and BM (n = 15) immediately prior to the session, counterbalanced across days. (B) Mean ± SEM suppression ratios of BM on leverpressing during punishment expression. (C) Mean ± SEM latency to initially press the punished and unpunished lever across trials, after infusions of saline or BM, during punishment expression.


Figure 6.12. Effect of BM inactivation of the RAIC on choice. (A) Mean ± SEM leverpresses on the punished and unpunished levers during the aversive choice task. Rats received within-subject infusions of saline and BM (n = 12) immediately prior to the session, counterbalanced across days. (B) Mean ± SEM latency to initially press the punished and unpunished lever during the aversive choice task. *p < .05


Although there was a trend towards rats pressing the unpunished lever faster than the punished lever, this difference did not reach statistical significance (F(1,11) = 4.12; p = .067). There were also no significant effects of BM on leverpress latencies (all F(1,11) < 1.2; p > .05).

Discussion

The experiments reported in this chapter studied the roles of PFC regions in punishment by reversibly inactivating the PL, IL or RAIC, using baclofen/muscimol (BM), during various punishment-influenced tasks. As in previous experiments, rats learned to reduce responding on the punished lever across the course of punishment training and the latencies with which animals responded on this lever increased. Responding on the unpunished lever increased and latencies to respond on this lever were low. When confronted with a choice between the unpunished versus punished lever, but in the absence of any punishment, rats showed a clear preference for the unpunished lever both in terms of total leverpresses as well as latencies to respond on the two levers.

In Experiment 4, PL inactivations had no effect on any measures. In Experiment 5, IL inactivations had no effect on punishment acquisition or expression, but significantly increased unpunished leverpressing and decreased overall leverpress latency in the choice test. In Experiment 6, RAIC inactivations had no effect on punishment acquisition or expression, but did shift responding from the unpunished lever to the punished lever in a choice test, though this was not substantiated by statistically significant changes in leverpressing, or latencies to respond, on either lever alone. The specific effects of IL and RAIC inactivations during choice tests will be discussed in the General Discussion (Chapter 8, p. 209), though the lack of detectable effect of inactivations on responding on the punished lever during this task suggests instrumental suppression was relatively unaffected.


Given the lack of isolatable effects on pressing the punished lever, these results suggest the PFC is not necessary for instrumental suppression. This is despite the PFC’s asserted role in aversion processing (Coghill et al., 1994; Franciotti et al., 2009; Lammel et al., 2012; Sierra-Mercado et al., 2011; Vidal-Gonzalez et al., 2006), regulation of R-O associations (Cardinal et al., 2003; Corbit & Balleine, 2003; Killcross & Coutureau, 2003), behavioural flexibility (Marquis et al., 2007; Ragozzino, 2007), and inhibitory control of behaviour (Balleine & O’Doherty, 2010; Cai et al., 2014; Hayen et al., 2013; Sierra-Mercado et al., 2011). These functions are presumed to be important for punishment, though the current experiments suggest other structures and circuits can mediate instrumental suppression.

These findings are also in agreement with the conclusions of Pelloux and colleagues (2013). They found that lesions of the PL, IL, or RAIC did not increase punished leverpressing for cocaine under extinction conditions (no cocaine deliveries, only shocks), which they used to suggest that the PFC is not required for punishment suppression. Though use of cocaine as the reinforcer and testing under extinction conditions made strong interpretations difficult, the current results also suggest the PL, IL and RAIC are not required for the acquisition or expression of punishment suppression of responding for food reward.


Chapter 7 The Role of Midbrain Dopamine Circuits in Punishment

DA neurons of the ventral midbrain have been implicated in aversive and instrumental learning. This includes mesolimbic and mesocortical neurons, which transmit DA signals from the VTA to the limbic system and PFC, and the nigrostriatal system, which comprises DAergic SN projections to the dorsal striatum (dStr).

VTA DA neuron firing is strongly implicated in appetitive behaviour (Ikemoto, 2007; Schultz, 2007b; Wise & Rompré, 1989); phasic increases in DA neuron firing are necessary and sufficient for reinforcement (Adamantidis et al., 2011; Cannon & Palmiter, 2003; Kim et al., 2012; Parker et al., 2010; Zweifel et al., 2009). Conversely, phasic pauses in VTA DA firing can signal negative outcomes (Guarraci & Kapp, 1999; Matsumoto & Hikosaka, 2009a; Tan et al., 2012) and are sufficient to induce CPA (Danjo et al., 2014; Liu et al., 2008; Tan et al., 2012). Therefore VTA DA neuron inhibition can signal punishing outcomes, and this inhibition is sufficient for aversion. However, whether these phasic pauses are involved in punishment learning and behaviour is unclear.

These aversive phasic pauses in DA are hypothesized to come from activation of the LHb (Hikosaka, 2010; Hikosaka et al., 2008; Stamatakis & Stuber, 2012), which inhibits midbrain DA neurons via the GABAergic rostromedial tegmental nucleus (RMTg) (Hong et al., 2010). Stimulation of the LHb or RMTg is aversive and can act as a punisher (Friedman et al., 2010, 2011; Stamatakis & Stuber, 2012). Lammel and colleagues (2012) have also shown that the LHb excites mesocortical DA neurons, which have been ascribed a distinct role in aversion (Lammel et al., 2011; Mantz et al., 1989). These findings have been used to suggest that LHb activity is required for punishment, but this theory has not been directly confirmed.

The nigrostriatal pathway is also implicated in aversion-related processing and plasticity; inhibition of the SNc and the dStr elicits a place aversion (Ilango et al., 2014). The dStr has also been proposed to mediate punishment via its role as the source of the direct and indirect basal ganglia pathways, which mediate response promotion and suppression, respectively (Kravitz & Kreitzer, 2012). In particular, the nigrostriatal DA projection recruits direct and indirect pathways via D1 and D2 receptors (D1r and D2r), as dStr direct pathway neurons tend to express D1r while indirect pathway neurons tend to express D2r (Gerfen et al., 1990; Hikida et al., 2010; Kravitz et al., 2010). Stimulation of D2r-expressing neurons in the dmStr suppresses responding in a similar manner to punishment (Kravitz et al., 2010), but it is unclear whether D2r activation (or other determinants of dStr activity) is required for punishment learning and behaviour.

The nigrostriatal pathway includes innervation of both medial and lateral portions of the dStr (dmStr and dlStr, respectively). The above studies implicating the dStr in aversion and instrumental behaviour either focused on the dmStr, or highlighted the dmStr over the dlStr, in these functions (Ilango et al., 2014; Kravitz et al., 2010). The dmStr has also been separately implicated in R-O association learning (Balleine et al., 2010; Hart et al., 2014); response-punisher associations are proposed to underpin instrumental suppression, and thus the dmStr may also contribute to punishment through this function. The dmStr receives converging inputs from the SNc, BLA, thalamus, and PFC (Voorn et al., 2004), which likely contribute to dmStr involvement in instrumental behaviour (Hart et al., 2014).

To test the role of VTA inhibition in punishment, the VTA was bilaterally cannulated in Experiment 7. This permitted the infusion of the GABAA receptor antagonist bicuculline, which would prevent the inhibition of VTA DA neurons (Lobb et al., 2010, 2011; Tan et al., 2012).

The aim of Experiment 8 was to study the role of the LHb in punishment. Rats received bilateral cannulation of the LHb permitting reversible inactivation using the AMPA/kainate receptor antagonist NBQX, which has been previously shown to attenuate aversion-related signals within the LHb (Hong & Hikosaka, 2010; Shabel et al., 2012). Rats were then subjected to a slightly modified version of the standard punishment protocol to allow a broader assessment of LHb significance within this task; the effects of intra-LHb infusions of sodium (Na+) and calcium (Ca2+) channel blockers (bupivacaine and mibefradil, respectively) on punishment expression were examined. Experiment 9 investigated the role of dmStr in punishment. Rats received bilateral inactivations of the dmStr during punishment acquisition, expression and choice using the GABA agonists baclofen and muscimol (BM). Given the proposed importance of D1r and D2r neurons within the dStr, rats also received D1r and D2r antagonists (SCH39166 and eticlopride) during expression and choice, as the use of within-subject manipulations during these phases allowed for inclusion of these drugs without compromising statistical power.

Experiment 7: The role of VTA inhibition in punishment

Methods

Subjects

Subjects were 16 experimentally naive male Sprague Dawley rats (360-470g).

Procedure

Procedures were identical to those described in the General Methods section (p. 105).

Cannulae (11mm) were implanted bilaterally according to the coordinates AP: -5.8, ML: ±0.75, DV: -8.2 mm from bregma (Paxinos & Watson, 2007).

Given VTA inhibition is likely mediated by GABAA receptors on these neurons (Ji & Shepard, 2007; Lobb et al., 2010, 2011), the drug used for infusions was the GABAA antagonist bicuculline (Bic; 0.1µg/µl; Tocris, Sydney, Australia).


Results

Histology

The locations of microinjection cannulae are shown in Figure 7.1. Examination of placements revealed that 3 rats had misplaced cannulae and did not bilaterally target the VTA. These animals were excluded from the analyses, leaving 13 animals (Saline group n = 7, Bic group n = 6).

Given that over-excitation of neurons can cause excitotoxic cell death (Choi, 1992), histological examination of the VTA for excitotoxic lesions was also conducted. No lesions at infusion sites were observed.

Pre-training

The mean ± SEM responses on the to-be-punished and to-be-unpunished levers are shown in Figure 7.2. There was a trend towards more pressing on the to-be-unpunished lever compared to the to-be-punished lever (F(1,11) = 4.36; p = .061). There was no significant overall difference between acquisition groups in leverpressing at the end of pre-training (F(1,11) < 1; p > .05), and no group x lever interaction (F(1,11) < 1; p > .05).
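The descriptive statistic reported throughout this chapter (mean ± SEM) can be illustrated with a minimal sketch. It assumes the conventional definition of the SEM as the sample standard deviation divided by the square root of the group size; the function name and the leverpress counts below are hypothetical and are not data from this experiment.

```python
import math
import statistics


def mean_sem(values):
    """Return (mean, SEM), with SEM = sample SD / sqrt(n)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations to estimate the SEM")
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(n)
    return mean, sem


# Hypothetical leverpress counts for one group on one lever (illustration only).
presses = [52, 61, 47, 58, 66, 49, 55]
m, s = mean_sem(presses)
print(f"{m:.1f} ± {s:.1f} leverpresses")
```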

Effects of VTA disinhibition on acquisition of punishment

Mean ± SEM leverpresses during punishment acquisition are shown in Figure 7.2(A). Over the course of this training, there was a significant effect of lever (punished versus unpunished) (F(1,11) = 36.7; p < .05), and the difference in responding on the levers increased across days (F(1,11) = 29.1; p < .05). Across days, there was an increase in responding on the unpunished lever (F(1,11) = 14.0; p < .05) and a decrease in responding on the punished lever (F(1,11) = 100.1; p < .05).

Rats received VTA infusions of bicuculline or saline prior to the first two days of punishment (Acq Bic or Acq Sal groups, respectively). During these infusion days, there was no effect of bicuculline infusions on unpunished leverpressing (F(1,11) < 1; p > .05), but Acq Bic rats pressed the punished lever significantly more than Acq Sal rats (F(1,11) = 10.6; p < .05). Interestingly, this difference persisted after infusion days; on the subsequent 3 non-infusion days, Acq Bic rats pressed the punished lever more than Acq Sal rats (F(1,11) = 6.6; p < .05), while there was no difference between groups in unpunished leverpressing (F(1,11) < 1; p > .05).

Figure 7.1. Microinfusion cannula placements within the VTA as verified by Nissl-stained sections. Black dots represent the most ventral point of the cannula tract, indicated on coronal sections adapted from Paxinos and Watson (2007).

Figure 7.2. Effect of bicuculline (Bic) disinhibition of the VTA during punishment acquisition. (A) Mean ± SEM leverpresses on the punished and unpunished levers prior to and during punishment acquisition. T represents the last day of leverpress training. Arrows indicate days that rats received infusions of either saline (n = 7) or Bic (n = 6) immediately prior to the session. (B) Mean ± SEM latency to initially press the punished and unpunished lever (averaged across trials) during punishment acquisition. *p < .05

Latencies to emit first responses were also assessed (averaged across trials) (Figure 7.2(B)). During infusion days, latencies to respond on the punished lever increased (F(1,11) = 30.3; p < .05) whereas latencies to respond on the unpunished lever did not change (F(1,11) < 1; p > .05). There was a significant interaction of drug with this increase in punished latencies (F(1,11) = 6.6; p < .05), with the saline (F(1,11) = 35.2; p < .05) but not the bicuculline (F(1,11) = 4.0; p > .05) group showing a significant increase in latencies to press the punished lever across infusion days. In agreement with the leverpress findings for non-infusion days, there was a significant effect of drug on punished (F(1,11) = 14.0; p < .05) but not unpunished leverpress latencies (F(1,11) < 1; p > .05), revealing that Acq Bic rats initiated punished leverpressing significantly more quickly than Acq Sal rats on subsequent non-infusion days.

Effects of VTA disinhibition on expression of punishment

There were long-lasting consequences of the original infusions for performance in the Acq Sal and Acq Bic groups. Therefore, the figures (Figures 7.3 – 7.5) depict the data according to these acquisition groups, and additional analyses according to acquisition group are reported after the usual aggregated within-subject analyses.

There was a significant main effect of lever, such that rats responded more on the unpunished lever than the punished lever (F(1,12) = 34.6; p < .05) (Figure 7.3(A)). There was no difference in responding between bicuculline and saline tests for the punished lever (F(1,12) < 1; p > .05), and a trend towards less unpunished leverpressing following bicuculline infusions (F(1,12) = 4.1; p = .051). Analyses of suppression ratios revealed that bicuculline caused a significant decrease in unpunished (t(12) = -2.41; p < .05) but no significant change in punished leverpressing ratios (t(12) = .31; p > .05) (Figure 7.3(B)), showing bicuculline generally decreased unpunished leverpressing.

Figure 7.3. Effect of bicuculline (Bic) disinhibition of the VTA during punishment expression across acquisition groups (Acq Sal and Acq Bic). (A) Mean ± SEM leverpresses on the punished and unpunished levers during punishment expression. Rats received within-subject infusions of saline and Bic immediately prior to the session, counterbalanced across days. (B) Mean ± SEM suppression ratios of Bic on leverpressing during punishment expression. (C) Mean ± SEM latency to initially press the punished and unpunished lever across trials, after infusions of saline or Bic, during punishment expression. *p < .05

When latencies to emit first responses (averaged across trials) were analysed, rats were significantly slower to respond on the punished lever than the unpunished lever (F(1,12) = 22.9; p < .05) (Figure 7.3(C)). In agreement with leverpressing results, bicuculline infusion into the VTA had no significant main effect on punished latencies (F(1,12) < 1; p > .05) but significantly increased latencies to press the unpunished lever (F(1,12) = 5.0; p < .05).

Because Acq Bic rats pressed the punished lever more than Acq Sal rats on previous non-infusion days, a between-subject factor of acquisition group was added to the within-subjects analyses for punishment expression. Acq Bic rats continued to press the punished lever significantly more than Acq Sal rats (F(1,11) = 16.4; p < .05) during expression, without any group differences in unpunished leverpressing (F(1,11) < 1; p > .05) (Figure 7.3(A)). There was no interaction of acquisition infusion group and expression infusion on punished (F(1,11) < 1; p > .05) or unpunished leverpressing (F(1,11) < 1; p > .05).

Acq Bic rats were also significantly faster to initially press the punished lever compared to Acq Sal rats (F(1,11) = 5.6; p < .05) during punishment expression, with no significant difference between groups in unpunished leverpress latencies (F(1,11) = 1.2; p > .05). There was no interaction of acquisition group and expression infusion on punished (F(1,11) < 1; p > .05) or unpunished leverpress latencies (F(1,11) = 1.4; p > .05).

In summary, the enduring effect of acquisition infusions was observed during expression sessions; Acq Bic rats pressed the punished lever more and initiated punished leverpressing sooner than Acq Sal rats, regardless of infusion during expression. Separately, bicuculline decreased unpunished leverpressing and increased unpunished leverpress latencies, regardless of acquisition group, showing VTA disinhibition impaired unpunished responding.

Effects of VTA disinhibition on aversive choice

Aggregated across acquisition groups, rats responded significantly more on the unpunished than the punished lever during choice tests (F(1,12) = 22.7; p < .05). Rats were also significantly slower to respond on the punished relative to the unpunished lever (F(1,12) = 5.6; p < .05). No significant effect of bicuculline infusions on pressing unpunished or punished levers was detected (all F(1,12) < 3.1; p > .05). There was also no significant effect of bicuculline infusions on leverpress latencies (all F(1,12) < 2; p > .05).

Analysis of leverpressing by acquisition groups (Figure 7.4(A)) revealed that Acq Bic rats pressed the punished lever more than Acq Sal rats (F(1,11) = 5.6; p < .05), with no differences between Acq Bic and Acq Sal rats in overall unpunished leverpressing (F(1,11) < 1; p > .05), an effect consistent with the group differences observed throughout acquisition and expression. However, there was a significant interaction of acquisition infusion group and choice test infusion on pressing of the punished lever (F(1,11) = 11.6; p < .05), with Acq Sal rats pressing the punished lever more after bicuculline infusions compared to saline (F(1,11) = 8.4; p < .05), whereas there was a non-significant decrease in pressing the punished lever for Acq Bic rats following bicuculline (F(1,11) = 3.8; p > .05). There was also a trend towards an interaction of acquisition infusion group and choice infusion on pressing the unpunished lever (F(1,11) = 4.8; p = .052), driven by Acq Sal rats pressing the unpunished lever less after bicuculline infusions (F(1,11) = 8.8; p < .05), while pressing the unpunished lever was unaffected by bicuculline in Acq Bic rats (F(1,11) < 1; p > .05).

To better understand the increased pressing of the punished lever following bicuculline in the Acq Sal group, within-session changes in leverpressing were analysed. There was a significant effect of infusion on change in leverpressing across choice test for the punished lever (F(1,11) = 6.19; p < .05), but not for the unpunished lever (F(1,11) < 1; p > .05). This significant interaction between drug and rate of punished leverpressing across choice test was driven by a significant within-session increase in pressing the punished lever following bicuculline infusions (F(1,11) = 9.80; p < .05), but no change in pressing of the punished lever following saline infusions (F(1,11) < 1; p > .05).

Figure 7.4. Effect of bicuculline (Bic) disinhibition of the VTA on choice, separated into acquisition groups (Acq Sal and Acq Bic). (A) Mean ± SEM leverpresses on the punished and unpunished levers during the aversive choice task. Rats received within-subject infusions of saline and Bic immediately prior to the session, counterbalanced across days. (B) Mean ± SEM latency to initially press the punished and unpunished lever during the aversive choice task. *p < .05

When choice test latencies were broken down by acquisition group (Figure 7.4(B)), there was a trend towards Acq Bic rats pressing the punished lever faster than Acq Sal rats (F(1,11) = 4.7; p = .053), and no effect of acquisition group on unpunished leverpress latencies (F(1,11) = 1.0; p > .05). However, unlike leverpresses, there was no significant interaction between acquisition infusions and choice infusions on latencies to press the punished (F(1,11) = 1.9; p > .05) or unpunished lever (F(1,11) = 1.0; p > .05).

In summary, Acq Bic rats continued to press the punished lever more than Acq Sal rats in choice tests. Bicuculline infusions immediately prior to choice test selectively increased punished leverpressing in Acq Sal rats, which was characterised by an increase in punished leverpressing across choice test (not observed following saline infusions). Although the pattern of leverpress latencies during choice reflects these effects on leverpressing, no statistically significant effects were detected when latencies were analysed.

Effects of VTA disinhibition on locomotor activity

Bicuculline infusions into the VTA had no effect on total distance travelled (F(1,14) < 1; p > .05) or total velocity (F(1,14) < 1; p > .05). When analysed by acquisition group, a main effect of acquisition group was found for both distance travelled (F(1,11) = 20.2; p < .05) and total velocity (F(1,11) = 17.3; p < .05); Acq Bic rats were hyperactive compared to Acq Sal rats (Figure 7.5). There were no interactions between acquisition group and locomotor test infusion on distance travelled (F(1,11) = 2.0; p > .05) or total velocity (F(1,11) < 1; p > .05).


Figure 7.5. Effect of bicuculline (Bic) disinhibition of the VTA on unprovoked locomotion, separated into acquisition groups (Acq Sal and Acq Bic). (A) Mean ± SEM distance travelled after within-subject infusions of saline and Bic, counterbalanced across days. (B) Mean ± SEM total velocity after within-subject infusions of saline and Bic, counterbalanced across days. *p < .05


Experiment 8: The role of the LHb in punishment

Methods

Subjects

Subjects were 17 experimentally naive male Sprague Dawley rats (300-380g).

Procedure

There were slight differences between the procedure described in the General Methods section and this experiment. Namely, instead of undertaking the choice test, some rats (n = 10) received additional punishment sessions with infusions of Na+ and Ca2+ channel blockers for assessment of ion channel blockade on punishment expression. The remainder of the experiment was identical to that described in the General Methods section, including the subsequent locomotion test. The particular procedures for punishment expression and choice will be described below.

Cannulae (6mm) were implanted bilaterally into the LHb according to the coordinates AP: -3.8, ML: ±0.8, DV: -4.8 mm from bregma. Apart from the Na+ and Ca2+ channel blockers used in the additional expression test, the infused drug was always the AMPA antagonist NBQX (1µg/µl; Sigma-Aldrich, Sydney, Australia).

Punishment expression. On Days 6 – 7 all rats received bilateral infusions of either saline or NBQX (counterbalanced within-subject) to test for the effect of LHb inactivation on expression of punishment, as outlined in the General Methods section.

Choice test. Six rats received the choice test on Days 9 and 11 following counterbalanced saline or NBQX infusions, as outlined in the General Methods section.

Ion channel blockers into LHb on punishment expression. Instead of choice sessions, 10 rats received, on Days 9 and 11 in a fully counterbalanced manner, bilateral LHb infusions of Na+ channel blocker bupivacaine (5 µg/µl) or Ca2+ channel blocker mibefradil (1 µg/µl) immediately before punishment sessions. This was to assess whether blocking other means of signal transmission through the LHb would affect expression of punished and unpunished leverpressing. The dose of bupivacaine used is sufficient to block voltage-gated Na+ channels (Stoetzer et al., 2014), while the dose of mibefradil used would have effectively blocked both L- and T-type Ca2+ channels (Leuranguer et al., 2001; Xia et al., 2004), both of which are found within the LHb (Huguenard et al., 1993; Meye et al., 2013). Rats received standard punishment sessions with no infusions on Days 8 and 10.

Data Analysis

Within-subjects ANOVAs were used to analyse total leverpresses and leverpress latencies for ion channel blockade on punishment expression. In these analyses, lever (punished vs. unpunished) was one within-subjects factor and drug (saline vs. drug) was the other.
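The structure of this lever x drug within-subjects analysis can be sketched as follows. In a fully within-subjects design where both factors have two levels, each single-degree-of-freedom effect can be tested by reducing every rat's four cell scores to one contrast score and testing those scores against zero, which gives F(1, n - 1) = t². This is a general property of such designs and not a description of the software actually used for the thesis; the function names, dictionary keys and cell values below are hypothetical.

```python
import math
import statistics


def contrast_f(scores):
    """F(1, n-1) for a within-subject contrast: squared one-sample t against 0."""
    n = len(scores)
    sd = statistics.stdev(scores)
    if sd == 0:
        raise ValueError("contrast scores have zero variance")
    t = statistics.fmean(scores) / (sd / math.sqrt(n))
    return t * t, (1, n - 1)


def lever_by_drug(rats):
    """rats: list of dicts with one score per lever x drug cell for each rat."""
    lever, drug, interaction = [], [], []
    for r in rats:
        ps, pd = r["pun_sal"], r["pun_drug"]
        us, ud = r["unp_sal"], r["unp_drug"]
        lever.append((us + ud) / 2 - (ps + pd) / 2)   # unpunished minus punished
        drug.append((pd + ud) / 2 - (ps + us) / 2)    # drug minus saline
        interaction.append((pd - ps) - (ud - us))     # drug effect differs by lever
    return {
        "lever": contrast_f(lever),
        "drug": contrast_f(drug),
        "lever_x_drug": contrast_f(interaction),
    }


# Hypothetical leverpress totals for four rats (illustration only).
example = [
    {"pun_sal": 10, "pun_drug": 12, "unp_sal": 40, "unp_drug": 41},
    {"pun_sal": 8, "pun_drug": 7, "unp_sal": 35, "unp_drug": 39},
    {"pun_sal": 15, "pun_drug": 16, "unp_sal": 50, "unp_drug": 47},
    {"pun_sal": 5, "pun_drug": 9, "unp_sal": 30, "unp_drug": 33},
]
for effect, (f_value, df) in lever_by_drug(example).items():
    print(f"{effect}: F{df} = {f_value:.2f}")
```

With 10 rats, as in the ion channel blocker tests, each effect in this scheme has degrees of freedom (1, 9), which matches the form of the F statistics reported for those tests.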

Results

Histology

The locations of microinjection cannulae are shown in Figure 7.6. One rat was excluded due to incorrect cannula placement. The remaining 16 rats had confirmed bilateral placements in LHb.

Leverpress training

There was no significant overall difference between saline and NBQX groups in leverpressing at the end of leverpress training (F(1,14) < 1; p > .05), no overall difference in responding on the to-be punished and to-be unpunished levers (F(1,14) = 1.2; p > .05), and no group x lever interaction (F(1,14) < 1; p > .05) (Figure 7.7).


Figure 7.6. Microinfusion cannula placements within the LHb as verified by Nissl-stained sections. Black dots represent the most ventral point of the cannula tract, indicated on coronal sections adapted from Paxinos and Watson (2007).


Figure 7.7. Effect of NBQX inactivation of the LHb during punishment acquisition. (A) Mean ± SEM leverpresses on the punished and unpunished levers prior to and during punishment acquisition. T represents the last day of leverpress training. Arrows indicate days that rats received infusions of either saline (n = 8) or NBQX (n = 8) immediately prior to the session. (B) Mean ± SEM latency to initially press the punished and unpunished lever (averaged across trials) during punishment acquisition.


Effects of LHb inactivation on acquisition of punishment

Mean ± SEM leverpresses during punishment acquisition are shown in Figure 7.7(A). Over the course of this training, there was a significant effect of lever (punished versus unpunished) (F(1,14) = 170.7; p < .05), and the difference in responding on the levers increased across days (F(1,14) = 36.1; p < .05). Across days, there was an increase in responding on the unpunished lever (F(1,14) = 26.6; p < .05) and a decrease in responding on the punished lever (F(1,14) = 15.4; p < .05).

Rats received LHb infusions of NBQX or saline prior to the first two days of training. During these infusion days, there was no effect of LHb inactivations using NBQX on responding on the punished (F(1,14) < 1; p > .05) or unpunished lever (F(1,14) < 1; p > .05).

Latencies to emit first responses were also assessed (averaged across trials) (Figure 7.7(B)). During infusion days, latencies to respond on the punished lever increased (F(1,14) = 12.0; p < .05) whereas latencies to respond on the unpunished lever did not significantly change (F(1,14) < 1; p > .05). There was no effect of NBQX infusions on these latencies for either the punished (F(1,14) < 1; p > .05) or unpunished (F(1,14) = 1.9; p > .05) lever. Thus, LHb infusions of NBQX had no effect on the acquisition of responding.

Effects of LHb inactivation on expression of punishment

Mean ± SEM leverpresses are shown in Figure 7.8(A). There was a significant main effect of lever, such that rats responded more on the unpunished lever than the punished lever (F(1,14) = 292.2; p < .05). There was no difference in responding between NBQX and saline tests for the punished (F(1,14) < 1; p > .05) or unpunished (F(1,14) < 1; p > .05) lever. NBQX also did not significantly alter punished (t(14) = .79; p > .05) or unpunished (t(14) = 1.6; p > .05) leverpress ratios from 0.5 (Figure 7.8(B)).
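The comparison of leverpress ratios against 0.5 can be illustrated with a short sketch. The exact ratio is defined in the General Methods; the form assumed here, drug / (drug + saline) computed per rat, is a common choice for which 0.5 indicates no change, values above 0.5 indicate more pressing after drug, and values below 0.5 indicate less. The ratios are then tested against 0.5 with a one-sample t-test on n - 1 degrees of freedom. Function names and leverpress counts are hypothetical.

```python
import math
import statistics


def leverpress_ratio(drug_presses, saline_presses):
    """Assumed ratio form: drug / (drug + saline); 0.5 means no drug effect."""
    total = drug_presses + saline_presses
    if total == 0:
        raise ValueError("ratio is undefined when both counts are zero")
    return drug_presses / total


def t_against_half(ratios):
    """One-sample t statistic (and df) testing mean ratio against 0.5."""
    n = len(ratios)
    sem = statistics.stdev(ratios) / math.sqrt(n)
    return (statistics.fmean(ratios) - 0.5) / sem, n - 1


# Hypothetical (drug, saline) leverpress totals for five rats (illustration only).
counts = [(42, 40), (35, 38), (51, 47), (29, 33), (44, 41)]
ratios = [leverpress_ratio(d, s) for d, s in counts]
t_value, df = t_against_half(ratios)
print(f"mean ratio = {statistics.fmean(ratios):.3f}, t({df}) = {t_value:.2f}")
```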

When latencies to emit first responses (averaged across trials) were analysed, rats were significantly slower to respond on the punished lever than the unpunished lever (F(1,14) = 14.8; p < .05) (Figure 7.8(C)). NBQX infusion into LHb had no significant effect on these latencies (all F(1,14) < 1; p > .05).

Figure 7.8. Effect of NBQX inactivation of the LHb during punishment expression. (A) Mean ± SEM leverpresses on the punished and unpunished levers during punishment expression. Rats received within-subject infusions of saline and NBQX (n = 15) immediately prior to the session, counterbalanced across days. (B) Mean ± SEM suppression ratios of NBQX on leverpressing during punishment expression. (C) Mean ± SEM latency to initially press the punished and unpunished lever across trials, after infusions of saline or NBQX, during punishment expression.

Effects of LHb ion channel blockade on expression of punishment

Following the punishment expression sessions, 10 rats were subjected to an extension of the punishment expression test using counterbalanced infusions of Na+ channel blocker bupivacaine and Ca2+ channel blocker mibefradil, each immediately prior to a punishment session given 2 days apart, with an infusion-free punishment session between these two days. Responding on levers after bupivacaine and mibefradil infusions was compared to responding after saline infusions during the preceding punishment expression test (within-subject).

Mean ± SEM leverpresses after channel blocker infusions are shown in Figure 7.9(A). Bupivacaine had no significant effect on punished (F(1,9) < 1; p > .05) or unpunished (F(1,9) < 1; p > .05) leverpressing when compared to saline. Mibefradil had no significant effect on punished leverpressing (F(1,9) < 1; p > .05), but significantly increased responding on the unpunished lever (F(1,9) = 14.8; p < .05) in comparison to saline.

Mean ± SEM average latencies to emit first responses on each lever are shown in Figure 7.9(C). Rats were faster to press the unpunished lever than the punished lever (all F(1,9) > 7.63; p < .05). Bupivacaine had no effect on these latencies (all F(1,9) < 1; p > .05). However, analysis of mibefradil’s effect on latencies yielded a significant drug x lever interaction (F(1,9) = 9.6; p < .05). Simple effects analysis revealed that this interaction was driven by mibefradil significantly decreasing average latencies to press the punished lever (F(1,9) = 5.2; p < .05) compared to saline, while no significant changes to unpunished latencies were found (F(1,9) < 1; p > .05). It is possible that floor effects prevented any decreases in unpunished latencies from being observed.

Figure 7.9. Effect of ion channel blocker inactivation of the LHb during punishment expression. (A) Mean ± SEM leverpresses on the punished and unpunished levers during punishment expression. Rats received within-subject infusions of bupivacaine (Bup) and mibefradil (Mib) (n = 10) immediately prior to the session, counterbalanced across days. (B) Mean ± SEM suppression ratios of bupivacaine and mibefradil on leverpressing during punishment expression. (C) Mean ± SEM latency to initially press the punished and unpunished lever across trials, after infusions of bupivacaine or mibefradil, during punishment expression. *p < .05

When suppression ratios were analysed, results concurring with total leverpresses were found (Figure 7.9(B)). Bupivacaine infusions had no effect on punished (t(9) = -.24; p > .05) or unpunished (t(9) = .27; p > .05) leverpress ratios. Mibefradil significantly increased the unpunished leverpress ratio above 0.5 (t(17) = 3.9; p < .05), while the punished ratio was no different from 0.5 (t(17) = -.40; p > .05). Thus, even though mibefradil significantly decreased latency to press the punished lever, it did not increase overall pressing of that lever.

Effects of LHb inactivation on aversive choice

Instead of receiving channel blocker infusions prior to punishment sessions, 6 rats were assessed in the usual choice test used in previous experiments. Figure 7.10 shows leverpresses on choice test and latencies to respond. Rats responded significantly more on the unpunished than the punished lever (F(1,5) = 36.6; p < .05). There was no difference in responding between NBQX and saline tests for the unpunished (F(1,5) < 1; p > .05) or punished lever (F(1,5) < 1; p > .05). While rats were slower to respond on the punished relative to the unpunished lever, this difference did not reach statistical significance (F(1,5) = 3.1; p > .05). Importantly, there was no effect of NBQX infusions on leverpress latencies (all F(1,5) < 1.2; p > .05).

Effects of LHb inactivation on locomotor activity

Finally, all rats were tested for the effect of NBQX infusions into the LHb on locomotor activity. NBQX significantly increased locomotor activity as measured by total distance travelled (F(1,15) = 21.6; p < .05) and velocity (F(1,15) = 18.4; p < .05) (Figure 7.11).


Figure 7.10. Effect of NBQX inactivation of the LHb on choice. (A) Mean ± SEM leverpresses on the punished and unpunished levers during the aversive choice task. Rats received within-subject infusions of saline and NBQX (n = 6) immediately prior to the session, counterbalanced across days. (B) Mean ± SEM latency to initially press the punished and unpunished lever during the aversive choice task.


Figure 7.11. Effect of NBQX inactivation of the LHb on unprovoked locomotion. (A) Mean ± SEM distance travelled after within-subject infusions of saline and NBQX (n = 16), counterbalanced across days. (B) Mean ± SEM total velocity after within-subject infusions of saline and NBQX (n = 16), counterbalanced across days.


Experiment 9: The role of the dmStr in punishment

Methods

Subjects

Subjects were 20 experimentally naive male Sprague Dawley rats (330-400g).

Procedure

There were a number of differences between the procedure used within this experiment and that described in the General Methods section. Between-subject acquisition infusions used GABA agonists baclofen and muscimol (BM; 1mM baclofen, 0.1mM muscimol; Sigma-Aldrich, Sydney, Australia), but within-subject infusions (expression and choice; locomotion was not assessed) involved infusions of BM, as well as D1r antagonist SCH39166 (2 µg/µl; Tocris, Sydney, Australia) and D2r antagonist eticlopride (2 µg/µl; Tocris, Sydney, Australia). These within-subject infusion sessions were not conducted on consecutive days; they were separated by non-infused punishment sessions (as for choice sessions) to measure any effects on the following punishment session and limit carryover effects. Punishment acquisition was also slightly longer than in previous experiments (7 days instead of 5) due to variations in latencies to respond on the unpunished lever. Given these modifications, procedures following leverpress training will be described in detail below.

Prior to food deprivation and behavioural procedures, cannulae (6mm) were implanted bilaterally according to the coordinates AP: -0.2, ML: ±2.5, DV: -4 mm from bregma (targeted at a 10° angle laterally to avoid lateral ventricles) (Paxinos & Watson, 2007).

Punishment acquisition. Following leverpress training (as described in General Methods), rats were given 7 days of the 40-min punishment sessions. Immediately before the first two days of punishment sessions, rats received bilateral infusions of saline or BM to assess the role of the dmStr in punishment acquisition.


Punishment expression. Rats were then given 8 more days of 40-min punishment sessions. On days 1, 3, 5 and 7 of punishment expression, rats received within-subject infusions of saline, BM, D1r antagonist SCH39166, or D2r antagonist eticlopride (order of drug infusion was determined using a Latin square design). On days 2, 4, 6 and 8, rats were given punishment sessions without infusions to measure any drug effects that may carry over to the next day as well as to space out infusions.

Choice test. Following the punishment phase, rats received 4 days of choice tests, with a punishment session between each of these choice test days. Choice tests involved both levers being extended for 20 mins (instead of the usual 30 mins). Responses on either lever were rewarded on a VI-60 s schedule such that neither pressing only one lever nor pressing both levers over the course of the session conferred any advantage. No shocks were delivered. Immediately before choice tests, rats received within-subject infusions of saline, BM, SCH39166, or eticlopride (order of drug infusion was determined using another Latin square design).
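The Latin square counterbalancing mentioned for the expression and choice infusions can be sketched as follows. This is only an illustration: the thesis does not state which type of Latin square was used, so a simple cyclic square is assumed here, and the function names and the rule for assigning rats to rows are hypothetical.

```python
from typing import List, Sequence

# Treatments named in the Procedure; their order in this list is arbitrary.
TREATMENTS = ["saline", "BM", "SCH39166", "eticlopride"]


def cyclic_latin_square(treatments: Sequence[str]) -> List[List[str]]:
    """Build a cyclic Latin square (an assumption, not the thesis's stated design).

    Each row is an infusion order for one rat; each treatment appears
    exactly once in every row and exactly once in every column (test day).
    """
    n = len(treatments)
    return [[treatments[(row + col) % n] for col in range(n)] for row in range(n)]


def assign_orders(n_rats: int, treatments: Sequence[str]) -> List[List[str]]:
    """Cycle rats through the rows of the square (hypothetical assignment rule)."""
    square = cyclic_latin_square(treatments)
    return [square[rat % len(square)] for rat in range(n_rats)]


if __name__ == "__main__":
    for rat, order in enumerate(assign_orders(4, TREATMENTS), start=1):
        print(f"Rat {rat}: {' -> '.join(order)}")
```

A cyclic square balances which treatment falls on which test day, but it does not balance which treatment immediately precedes which; the intervening non-infused sessions described above address carryover separately.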

Data Analysis

All applicable analyses described within the General Methods section were used. Within-subject ANOVAs were used to analyse total leverpresses and leverpress latencies for punishment expression and aversive choice; lever (punished vs. unpunished) was one within-subjects factor and drug (sal vs. drug) was the other. Days following drug infusion were analysed using the same factors: effects of each drug on the next day’s non-infused punishment session (drug+1) were compared to the non-infused session following saline infusions (sal+1).


Results

Histology

The locations of microinjection cannulae are shown in Figure 7.12. No rats had misplaced cannulae, with all cannulae and gliosis induced by spread of injectate being isolated to the dorsomedial portion of the striatum.

Pre-training

Mean ± SEM responses on the to-be-punished and to-be-unpunished levers for the last day of pre-training are shown in Figure 7.13 (data point T). There was no significant overall difference between the drug groups (F(1,18) < 1; p > .05), no overall difference in responding on the to-be-punished and to-be-unpunished levers (F(1,18) < 1; p > .05), and no group x lever interaction (F(1,18) < 1; p > .05).

Effects of dmStr inactivation on acquisition of punishment

Mean ± SEM leverpressing during the punishment phase is shown in Figure 7.13(A). Over the course of this training, there was a significant effect of lever (punished versus unpunished) (F(1,18) = 110.1; p < .05) and the difference in responding on the levers increased across days (F(1,18) = 114.9; p < .05). Across days, there was an increase in responding on the unpunished lever (F(1,18) = 92.7; p < .05) and a decrease in responding on the punished lever (F(1,18) = 164.8; p < .05). Rats received dmStr infusions of BM or saline prior to the first two days of punishment. During these infusion days, there was no effect of dmStr inactivations using BM on responding to the punished (F(1,18) < 1; p > .05) or unpunished lever (F(1,18) = 3.06; p > .05). However, in contrast to all other experiments reported here, responding on both the punished and unpunished lever decreased across the two infusion days.

Figure 7.12. Microinfusion cannula placements within the dorsomedial striatum (dmStr) as verified by Nissl-stained or fluorescent muscimol sections. Black dots represent the most ventral point of the cannula tract, indicated on coronal sections adapted from Paxinos and Watson (2007).

Figure 7.13. Effect of BM inactivation of the dmStr during punishment acquisition. (A) Mean ± SEM leverpresses on the punished and unpunished levers prior to and during punishment acquisition. T represents the last day of leverpress training. Arrows indicate days that rats received infusions of either saline (n = 10) or BM (n = 10) immediately prior to the session. (B) Mean ± SEM latency to initially press the punished and unpunished lever (averaged across trials) during punishment acquisition.

Latencies to emit first responses (averaged across trials) on the punished and unpunished lever during infusion days are shown in Figure 7.13(B). During these infusion days, latencies to respond on the punished lever increased (F(1,18) = 48.9; p < .05), which was not significantly affected by BM infusions into the dmStr (all F(1,18) < 2.2; p > .05). Unlike previous experiments, latencies to respond on the unpunished lever also increased across infusion days (F(1,18) = 24.7; p < .05). Rats that received BM infusions pressed the unpunished lever significantly faster than rats that received saline (F(1,18) = 4.89; p < .05), but this drug effect did not interact with the increase in unpunished leverpress latencies across infusion days (F(1,18) < 1; p > .05). This increase in unpunished leverpress latencies was seemingly restricted to infusion days, as there was an overall decrease in unpunished leverpress latencies across punishment acquisition sessions (F(1,18) = 95.0; p < .05). Also, though rats that received BM initially pressed the unpunished lever faster than saline rats on the day following acquisition infusions (Day 3), this difference did not reach statistical significance (F(1,18) = 3.49; p = .078). Towards the end of punishment acquisition there were no differences between groups on unpunished leverpress latencies (Days 6 and 7; all F(1,18) < 1; p > .05).

Therefore, in summary: 1) BM infusions into the dmStr did not significantly affect leverpressing relative to saline; 2) BM infusions did significantly decrease latencies to press the unpunished lever relative to saline on infusion days only; 3) infusion of either BM or saline into dmStr appeared to disrupt performance in the task because responding on both levers decreased and latencies to respond on both levers increased across infusion days. This performance effect was transient, with the usual pattern of responding and latencies being observed across remaining days.

Effects of dmStr inactivation on expression of punishment

Following punishment acquisition, rats received infusions (saline, BM, SCH39166 and eticlopride; within-subjects, counterbalanced across days) immediately prior to punishment sessions to test the effect of dmStr inactivation and dopamine receptor antagonism on the expression of punished behaviour. Each infusion day was followed by a day of non-infused punishment to measure any drug effects that carried over to the next day as well as to space out infusions. One rat was excluded from all further analyses due to cannula damage.

Mean ± SEM leverpressing during infusion days is shown in Figure 7.14(A). There was a significant main effect of lever, such that rats responded more on the unpunished lever than the punished lever (F(1,18) = 139.8; p < .05). There was no difference in responding between saline and each drug test for the punished or unpunished lever (all F(1,18) < 1.8; p > .05). There were also no differences between punished or unpunished leverpresses on the days following drug test when each was compared to leverpresses on the day following saline infusions (all F(1,18) < 1; p > .05).

Figure 7.14(B) shows drug:saline leverpress ratios. BM did not significantly alter punished (t(18) = 0.07; p > .05) or unpunished (t(18) = -1.71; p > .05) leverpress ratios from 0.5. D1r antagonist SCH39166 also did not significantly alter punished (t(18) = -1.14; p > .05) or unpunished (t(18) = -1.09; p > .05) leverpress ratios from 0.5. Finally, D2r antagonist eticlopride also did not significantly alter punished (t(18) = 1.27; p > .05) or unpunished (t(18) = -1.02; p > .05) leverpress ratios from 0.5.
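The ratio analysis above can be illustrated with a short sketch. The exact ratio formula is defined in the General Methods, outside this section; the form drug / (drug + saline) is assumed here because it yields 0.5 when a drug has no effect, which matches the test value reported. The function names are hypothetical, and the numbers in the usage example are invented purely for illustration and are not data from the thesis.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def drug_saline_ratio(drug_presses: float, saline_presses: float) -> float:
    """Assumed ratio: drug / (drug + saline); 0.5 means no change from saline."""
    total = drug_presses + saline_presses
    if total == 0:
        raise ValueError("ratio is undefined when both press counts are zero")
    return drug_presses / total


def one_sample_t(values: Sequence[float], mu: float = 0.5) -> Tuple[float, int]:
    """One-sample t statistic and degrees of freedom against the value mu."""
    n = len(values)
    standard_error = stdev(values) / math.sqrt(n)  # sample SD uses n - 1
    return (mean(values) - mu) / standard_error, n - 1


if __name__ == "__main__":
    # Invented press counts for four hypothetical rats (not thesis data).
    drug = [12, 9, 15, 10]
    saline = [11, 10, 13, 12]
    ratios = [drug_saline_ratio(d, s) for d, s in zip(drug, saline)]
    t_value, df = one_sample_t(ratios)
    print(f"t({df}) = {t_value:.2f}")
```

With 19 rats contributing one ratio each, the degrees of freedom are 18, consistent with the t(18) values reported above.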

When latencies to emit first responses (averaged across trials) were analysed, rats were significantly slower to respond on the punished lever than the unpunished lever (F(1,18) = 345.1; p < .05) (Figure 7.14(C)). No effect of drug was found for punished leverpress latencies (all F(1,18) < 1.7; p > .05). Though BM increased average latencies to press the unpunished lever, this increase did not reach statistical significance (F(1,18) = 4.01; p = .060). Dopamine receptor antagonism had no effect on unpunished leverpress latencies (all F(1,18) < 1.9; p > .05).

Figure 7.14. Effect of BM, D1r antagonist (D1; SCH39166) and D2r antagonist (D2; eticlopride) infusions into the dmStr on punishment expression. (A) Mean ± SEM leverpresses on the punished and unpunished levers during punishment expression. Rats received within-subject infusions of saline, BM, SCH39166 and eticlopride (n = 19) immediately prior to the session, counterbalanced across days. (B) Mean ± SEM suppression ratios of drug on leverpressing during punishment expression. (C) Mean ± SEM latency to initially press the punished and unpunished lever across trials during punishment expression.

Effects of dmStr inactivation on choice test

Figure 7.15 shows responses on choice test and latencies to initially press each lever. Rats responded significantly more on the unpunished than the punished lever (F(1,14) = 109.2; p < .05). There was no difference in responding between BM and saline tests for the unpunished lever or punished lever (all F(1,18) < 1; p > .05). D1r antagonist SCH39166 did not significantly alter overall pressing of the punished lever (F(1,18) = 2.81; p > .05), but it did significantly increase overall pressing of the unpunished lever (F(1,18) = 4.94; p < .05). D2r antagonist eticlopride also did not significantly alter overall pressing of the punished lever (F(1,18) = 1.59; p > .05), but it significantly decreased overall pressing of the unpunished lever (F(1,18) = 14.1; p < .05).

Rats were also significantly slower to respond on the punished relative to the unpunished lever (F(1,18) = 37.1; p < .05). There was no effect of drug infusions on latencies to press the punished (all F(1,18) < 1; p > .05) or unpunished (all F(1,18) < 1.1; p > .05) levers.

To further assess the effect of DA receptor antagonists on unpunished leverpressing, leverpressing across choice sessions was analysed (data not shown). For saline and DA antagonist sessions, overall there was a non-significant decrease in leverpressing across the sessions (F(1,18) = 3.29; p > .05) and there were no significant interactions between drug and this linear trend (all F(1,18) < 1; p > .05), suggesting that changes in leverpressing across a choice session were not differentially affected by D1r or D2r antagonist infusions.

Figure 7.15. Effect of drug infusions into the dmStr on choice. (A) Mean ± SEM leverpresses on the punished and unpunished levers during the aversive choice task. Rats received within-subject infusions of BM, D1r antagonist (D1; SCH39166) and D2r antagonist (D2; eticlopride) (n = 19) immediately prior to the session, counterbalanced across days. (B) Mean ± SEM latency to initially press the punished and unpunished lever during the aversive choice task. *p < .05

Discussion

The experiments reported in this chapter studied the roles of midbrain DA circuits in punishment. Experiment 7 investigated the role of VTA inhibition, Experiment 8 investigated the LHb, and Experiment 9 examined the role of the dmStr. There were numerous effects, so each experiment will be discussed in turn.

Experiment 7: VTA disinhibition

Preventing the inhibition of the VTA by local infusions of GABAA antagonist bicuculline attenuated acquisition of punishment; bicuculline-infused rats pressed the punished lever significantly more than saline-infused rats on acquisition infusion days. Bicuculline also attenuated the increase in latencies to press the punished lever across infusion days observed in saline controls. Interestingly, this increased punished leverpressing in Acq Bic rats (those that received bicuculline during acquisition) persisted beyond acquisition infusion days; Acq Bic rats pressed the punished lever significantly more, and faster, than the acquisition-saline (Acq Sal) group on subsequent non-infusion days, expression tests and choice. Acq Bic rats also travelled further and faster than Acq Sal rats in unprovoked locomotion tests.

Analyses of within-subject effects (across both acquisition groups) revealed bicuculline increased latencies to press the unpunished lever and lowered unpunished leverpress ratios during punishment expression, but had no effect on punished leverpressing. Aggregated across groups, bicuculline had no within-subject effects on choice or locomotion. When broken down according to acquisition group, analysis of within-subject infusions revealed no differential effect of bicuculline on responding in the expression test. During the choice tests, bicuculline had no effect on responding in Acq Bic rats but increased pressing of the punished lever in Acq Sal rats (punished leverpressing increased over the choice session following bicuculline but not saline); there were no infusion effects on choosing the unpunished lever.

These results suggest that VTA GABAA receptors are involved in punishment learning, such that blockade of these receptors during initial punishment impairs acquisition of instrumental suppression. In fact, VTA disinhibition during initial punishment caused an enduring impulsive/hyperactive phenotype, as indicated by greater and faster punished leverpressing and increased locomotion. Subsequent bicuculline infusions in the Acq Sal group did not abolish this acquisition group difference, which suggests that this effect of bicuculline depends on its actions during initial punishment sessions. The potential behavioural locus of this group effect will be discussed in greater detail within the General Discussion, but in summary it is suggested that the most parsimonious interpretations involve perturbations in normal punishment-learning plasticity within VTA circuits.

VTA disinhibition also attenuated unpunished leverpressing during punishment expression, which is consistent with suggestions that finely-tuned VTA firing is necessary for directing instrumental responding for reward (Salamone, 1992; Wise, 2004). The lack of disinhibition effect on punished responding during expression test (besides the distal effect of acquisition infusions) suggests VTA inhibition is not responsible for the expression of learned instrumental suppression.

Interestingly, VTA disinhibition increased responding on the punished lever throughout a choice session in the Acq Sal group (Figure 7.4(A)). Given no change in punished leverpressing during the expression test was observed (Figure 7.3(A)), this effect within the Acq Sal group can be interpreted as VTA disinhibition disrupting avoidance of the punished lever under conditions unique to the choice test. There are 3 aspects of choice sessions that distinguish them from punishment expression sessions, which may explain the discrepancy in effects: leverpressing is reinforced on a VI-60sec schedule (instead of a VI-30sec schedule), both levers are extended (instead of individually presented), and no shocks are delivered.

The change in reinforcement schedule is unlikely to directly affect pressing of the previously-punished lever, as rate of leverpressing on that lever is too low (~1 press/min) to cause detection of the changed contingency. It is much more likely that the contingency change would be detected while pressing the unpunished lever, but no significant changes in unpunished leverpressing were detected. However, the significantly lower latency to press the punished lever following bicuculline infusions may reflect an increased likelihood to switch from pressing the unpunished lever to pressing the punished lever, which may have been affected by bicuculline-modulation of reinforcement contingency detection or behavioural response to contingency changes. A general increase in switching does not explain the increase in punished leverpressing observed across the session following bicuculline. This increase across the session may reflect extinction of punishment due to the lack of shock deliveries. It is unclear whether VTA disinhibition directly enhanced the extinction process, or simply permitted it by increasing pressing of the punished lever (e.g. by increasing switching), which resulted in unexpected shock omission and thereby expedited normal punishment extinction.

This within-subject effect of bicuculline was not observed for the Acq Bic group. One reason for this might be that switching to the punished lever was extremely quick within Acq Bic, and thus bicuculline-induced increases in switching were without effect. It is also possible that aberrant plasticity suggested to be responsible for the increased punished leverpressing and locomotion resulted in an insensitivity to the effects of VTA disinhibition on choice behaviour.

Thus increased responding on the punished lever throughout a choice session for Acq Sal rats following bicuculline infusions is proposed to be driven by increased switching and punishment extinction. VTA disinhibition had no direct effect on unprovoked locomotion within a familiar chamber, which is congruent with the proposition that the VTA is responsible for promoting motivated responding, but not general motor activation, e.g. unprovoked locomotion (Haber, 2014; Salamone, 1992). However, previous experiments have found that antagonism of VTA GABAA receptors, including by bicuculline infusions, causes increased unprovoked locomotion (Lavezzi et al., 2015; Mogenson et al., 1979). The reason for this discrepancy is unclear, though it may reflect the fact that rats were habituated to the locomotion chamber in the current experiment, which reduces motivationally-elicited exploratory behaviour (Mällo et al., 2007), which has been suggested to rely on VTA DA (Fink & Smith, 1980).


Experiment 8: LHb inactivation

NBQX inactivations of the LHb did not affect the acquisition or expression of punishment behaviour. NBQX also had no effect on preference to press an unpunished lever over a previously punished lever within a choice test, despite Stopper and Floresco (2014) suggesting the LHb is particularly important in tasks requiring discrete choice. This may indicate the role of LHb in choice is restricted to appetitive determinants of behaviour, as used by Stopper and Floresco (2014).

Ca2+ channel blockade, but not Na+ channel blockade, increased expression of unpunished leverpressing while decreasing latency to press a punished lever. Finally, NBQX into the LHb significantly increased locomotion. Taken together, these results suggest the aversion-related signals of the LHb are not as essential for acquiring and expressing punishment-related behaviour as previously thought. Importantly, LHb infusions of NBQX did act to increase locomotor activity – a well-documented consequence of LHb manipulations (Gifuni et al., 2012; Lecourtier et al., 2008; Nair et al., 2012) – indicating that these infusions were effective in manipulating LHb activity.

It is worth unpacking the observed decrease in latencies to press the punished lever following mibefradil infusions. This effect corresponds with previous findings that lesions of the LHb affect response latencies (Hong et al., 2010; Jhou et al., 2013; Matsumoto & Hikosaka, 2011; Mirrione et al., 2014; Tomaiulo et al., 2014). However, no effect on overall pressing of the punished lever was found (indeed, there were fewer total punished leverpresses after mibefradil compared to saline), suggesting initial avoidance and overall avoidance are distinct processes, of which only the former is affected by Ca2+ channel blockade within the LHb.

The LHb contains both L- and T-type Ca2+ channels (Huguenard et al., 1993; Meye et al., 2013) (high and low voltage-dependent Ca2+ channels, respectively), both of which would be effectively blocked with the mibefradil dosage used (Lee et al., 2006). The behavioural significance of LHb Ca2+ channels has not yet been explored, though a recent finding by Zuo and colleagues (2013) suggests that LHb Ca2+ channels, and not Na+ channels, are necessary for cocaine’s facilitation of spontaneous firing within the LHb. This cocaine-induced firing in the LHb has been linked to the aversive properties of cocaine, which involves D2r feedback to the LHb from the VTA (Jhou et al., 2013; Zuo et al., 2013). Thus, the present results fit broadly with a regulatory role of LHb Ca2+ channels over midbrain DA and hence modulation of reward seeking behaviour by punishment.

However, the effects of mibefradil on latencies were relatively modest and did not result in increased punished leverpressing. Taken together with the finding that NBQX and bupivacaine had no significant effect on leverpressing, despite NBQX having a pronounced effect on locomotor activity, it is suggested that the role of the LHb in punishment behaviour is more limited than previously asserted.

Experiment 9: dmStr inactivation

Inactivations of the dmStr using BM had no effects on punishment suppression, nor did blockade of dmStr D1r and D2r. However, D1r antagonists into the dmStr increased responding on the unpunished lever within the choice test, whereas D2r antagonists into the dmStr decreased responding on the unpunished lever within the choice test. The potential reasons for this bidirectional effect on unpunished responding will be discussed in the General Discussion. In terms of the aim of the experiment, i.e. investigation of the role of dmStr in punishment, there was no evidence here that dmStr is a key structure. This contrasts with findings from appetitive instrumental responding identifying dmStr as a key structure within the R-O circuit (Hart et al., 2014). Moreover, the failure of D2r antagonist to affect punished responding contrasts with claims that signalling via D2r-expressing dmStr neurons mediates punishment (Kravitz & Kreitzer, 2012; Kravitz et al., 2010).


Although there was no specific effect of dmStr manipulation on punishment, there was a notable decrease in leverpressing, and an increase in leverpress latencies, on both levers during acquisition infusions, which recovered after infusion days. The reasons for this are unclear. It is possible that dmStr infusions or surgery may have globally disrupted dmStr function, causing impairments in motor control and response execution (Haber, 2003). The presence of this effect is concerning for the methodology used in this thesis. However, there was no evidence for a generally disruptive effect of infusions on behaviour elsewhere in this thesis, and even for dmStr these effects were transient.

199 Chapter 8: General Discussion

Chapter 8 General Discussion

Punishment involves the reduction of a behaviour that causes a negative outcome. This reduction in responding has been attributed to unlearning of an instrumental response, conditioned suppression due to Pavlovian conditioning, and negative reinforcement of a competing response. Though these drivers of behaviour may contribute to observed reductions in punished responding, the literature suggests that a punishment procedure primarily reduces responding via instrumental suppression, which involves encoding of the response-punisher contingency, devaluing the response.

These experiments investigated the role of specific brain structures in punishment learning and behaviour. A punishment protocol was designed and tested in Experiment 1; leverpressing for food was greatly reduced on a lever that resulted in moderate response-contingent shocks but not for an unpunished lever. Over the course of punishment training, responding on the punished lever decreased and slowed, plateauing after a few days. This suppression of the punished response was not caused by Pavlovian fear as measured by freezing, as the initially low levels of freezing decreased across punishment sessions. When presented with a choice to press the punished or unpunished lever, rats invariably chose the unpunished lever.

The roles of BLA, mAcbSh, PFC (namely the PL, IL and RAIC), VTA, LHb and dmStr within punishment acquisition, expression and aversive choice were examined using microinfusions of drugs into these regions. These regions were chosen because the literature has implicated or hypothesized their involvement in punishment behaviour, but has not provided much direct evidence for the importance of these regions in punishment. In these experiments the same pattern of behaviour as in Experiment 1 was observed; acquisition and expression of punished suppression, and preference for an unpunished response over a concurrently available punished response. This demonstrates the robustness of the effects of punishment on behaviour. Although no manipulation completely abolished all punishment-driven suppression or preference, two manipulations significantly increased responding on the punished lever. BLA inactivations attenuated punishment suppression during acquisition and expression sessions but had no effect on choice, while VTA disinhibition during acquisition permanently impaired suppression of the punished response and independently increased choice of the punished lever. Manipulations of the mAcbSh, PFC, LHb and dmStr did not significantly increase leverpressing of the punished lever, though there were some effects on leverpress latencies and unpunished behaviour. A summary of the findings of this thesis is catalogued in Table 8.1.

This pattern of effects is most parsimoniously interpreted as supporting a role for the caudal BLA in encoding the aversive value of the punisher, and VTA GABA inhibition in punishment-learning plasticity, whereas these other regions are not important for punishment behaviour. The evidence for, and implications of, these interpretations will be discussed below.

A role for the BLA in encoding the aversive value of a punisher

BLA inactivation selectively increased responding on the punished lever during acquisition and expression, sessions in which the shock punisher was delivered, but not during choice, where shocks were not delivered. Importantly, latency data from expression tests showed a selective effect of BLA inactivations. Rats were slower to press the punished lever compared to the unpunished lever at the start of the session, prior to any shock being delivered. BLA inactivation had no effect on these initial latencies. There was an increase in latencies to press the punished lever across the session after shock had been delivered, and BLA inactivations prevented this increase in latencies. These effects are consistent with the proposed role for the BLA in determining the aversive value of a shock punisher (Moul et al., 2012), as all these effects depended on experience of the shock. Previous studies suggest it is unlikely that BLA inactivation reduced rats’ sensitivity to the footshock (Maren et al., 1996; Rabinak & Maren, 2008).

Table 8.1

Summary of experimental findings

Experiment (region - drug) Acquisition Expression Choice Locomotion

Exp 2 ↑ Pun LP Caudal BLA: ↑ Pun - - (BLA - BM)

Exp 3 - - - - (AcbSh - NBQX)

Exp 4 - - - N/A (PL - BM)

Exp 5 ↑ Unp LP - - N/A (IL - BM) ↓ Lat

Exp 6 - - ↕ LP N/A (RAIC - BM)

Exp 7 ↑ Pun LP ↑ Unp Lat AcqSal: ↑ Pun AcqBic > AcqSal (VTA - Bic) ↓ Pun Lat

Exp 8 ↑ Unp LP - Mib: - ↑ (LHb - NBQX) ↓ Pun Lat

Exp 9 D1: ↑ Unp LP - - N/A (dmStr - BM) D2: ↓ Unp LP

Legend: ↑ denotes an increase following drug compared to control, ↓ denotes a decrease following drug compared to control, ↕ denotes an unspecified interaction effect. Pun denotes the punished lever, Unp denotes the unpunished lever. LP denotes leverpressing, Lat denotes latencies. Arrows within the Exp 7 row refer to enduring effects of acquisition infusions. Mib, D1, and D2 refer to effects of mibefradil (Ca2+ channel blocker), SCH39166 (D1r antagonist), and eticlopride (D2r antagonist), respectively. N/A = not available.

Given the well-documented role for the BLA in fear learning, it is worth commenting on the possibility that the results reflect an impairment in Pavlovian fear. As noted by Bolles and colleagues (1980) and demonstrated using the current protocol in Experiment 1, the role of Pavlovian fear is mostly restricted to initial punishment sessions and even then is a relatively minor contribution which is detectable on the unpunished response. While BLA inactivations attenuated acquisition of punishment suppression, where Pavlovian fear exerts the greatest influence, no significant effects were found on the unpunished lever. The attenuation of suppression was therefore specific to the punished lever, which suggests Pavlovian contributions were either negligible or unaffected by BLA inactivations. The possibility of negligible contributions is anticipated, given the parameters used within this design were explicitly chosen to reduce Pavlovian influences. However, the possibility that BLA inactivations had no effect on Pavlovian conditioned suppression is also supported by previous research showing lesions of the BLA do not impair conditioned suppression, but do impair punishment (Killcross et al., 1997). It is much more difficult to attribute the anti-punishment effects of BLA inactivations during expression to attenuation of Pavlovian fear, as freezing and suppressed unpunished responding are not evident during the later stages of punishment. Also, the finding that initially high latencies to press the punished lever during expression were unaffected by BLA inactivation does not accord with attenuated fear.

A key finding was that there were differences between rostral and caudal BLA involvement in punishment. In fact, BLA contributions to punishment were difficult to detect when rostral and caudal inactivations were aggregated. This might explain why previous considerations of a role for the BLA (that did not account for a rostro-caudal subdivision) in instrumental aversive learning have been ambiguous (Maren, 2003). The rostral and caudal BLA have differences in connectivity and function, which could indicate an anatomically specific circuit for punishment processing. However, two projection targets of caudal but not rostral BLA, the PL and mAcbSh, were investigated and the null results observed suggest these structures do not mediate the caudal BLA involvement in punishment.

In summary, the BLA is important for both the acquisition and expression of punishment but not for unpunished choice. This role appears to be linked to neurons in the caudal BLA, rather than rostral BLA, and is most parsimoniously interpreted as a role for the caudal BLA in determining the aversive value of the shock punisher. This is consistent with a previously proposed role for the BLA in encoding outcome value (Balleine et al., 2003; Morrison & Salzman, 2010; Moul et al., 2012), extending this role to primary punishers and localizing it to the functionally and anatomically distinct caudal BLA.

A role for VTA inhibition in punishment-learning plasticity

Preventing the inhibition of the VTA by local infusions of GABAA antagonist bicuculline during initial punishment sessions permanently attenuated suppression of the punished response; rats that received bicuculline during acquisition sessions (Acq Bic) pressed the punished lever significantly more and initiated punished leverpressing significantly faster than saline-infused rats (Acq Sal) during acquisition, expression and choice. Acq Bic rats also travelled further and faster than Acq Sal rats in unprovoked locomotion tests. Subsequent bicuculline infusions in the Acq Sal group did not yield the same phenotype as observed in the Acq Bic rats or abolish this acquisition group difference, which suggests that this effect of bicuculline depends on its actions during initial punishment sessions.


It is worth ruling out alternative explanations for this enduring group effect. Punished responding was already significantly higher in the Acq Bic group on the first day of punishment, which suggests the long-term effect was not caused by consecutive days of bicuculline infusions; such a cause would have produced group differences only from the second infusion day. Histological examination of infusion sites also suggested that excitotoxic lesions were not responsible for the observed effects. It is also difficult to attribute this failure of suppression to increased locomotion, as Experiment 8 (NBQX into the LHb) demonstrated that significant increases in locomotion do not necessarily translate to an increase in leverpressing (punished or otherwise) within the protocol employed. This is supported by observations that punishment suppression is not readily released by drugs that increase locomotion and unpunished responding (Barrett & Vanover, 1993; Witkin, 2002; Witkin et al., 2004).

Thus it is reasonable to argue that bicuculline infusions influenced neural plasticity during punishment acquisition. This may be due to bicuculline modulation of shock-induced DA-mediated plasticity. The evidence for this suggestion is that VTA DA neuron firing, which is sensitive to motivational valence and contingencies (Schultz, 2006), is thought to convey this behaviourally relevant information to downstream structures as a teaching signal, modulating plasticity in these downstream structures to effect learning (Schultz, 1998, 2007b; Wise, 2004). However, it is unclear how DA firing, including firing to the punisher, was modified by bicuculline within the current experimental design.

Greater punished responding in the Acq Bic group may reflect an enduring insensitivity to punishment or an inability to inhibit behaviour. Punishment insensitivity may have been caused by bicuculline permanently shifting the value of the shock. It has been reported that a shock can lose its aversive value if its presentation reliably precedes an appetitive stimulus (Erofeeva, 1916; Pearce & Dickinson, 1975), and this process reduces the ability of a shock to punish behaviour (Dickinson & Dearing, 1979). It is possible that VTA disinhibition during initial shock presentations effectively counterconditioned the shock, which is consistent with the VTA’s role in appetitive conditioning (Schultz, 2006). GABAA antagonism would prevent aversive pauses and increase appetitive burst firing signals within the VTA (Lobb et al., 2010, 2011). This would cause appetitive value to accrue onto the shock, decreasing its aversive value (Dickinson & Dearing, 1979; Konorski, 1967). The consequence of this counterconditioning is a reduced effectiveness of the shock in suppressing responding on the punished lever in subsequent tasks, as observed within this experiment.

The failure of subsequent bicuculline infusions to influence the value of the shock in Acq Sal rats may be due to the value of the shock having already been established. Perhaps VTA signalling is only required for initial valuation of the shock, with VTA disinhibition simply causing plasticity-encoded misvaluation of the shock. However, while this role for VTA DA signalling in counterconditioning fits with its endogenous firing properties, the VTA’s causal role in these processes has yet to be directly demonstrated. Also, this does not explain the increased locomotion observed. The acquisition effects on locomotion may have arisen independently, but an account that requires two separate causes lacks parsimony.

An impairment in the ability to inhibit or regulate behaviour explains both the increase in, and faster initiation of, punished responding throughout the experiment, as well as the general hyperactivity. VTA disinhibition during punishment acquisition may have impaired normal plasticity of circuits mediating behavioural inhibition in response to punishment. Potential loci of these effects are structures within the Behavioural Inhibition System, including the septum, hippocampus, cortex and amygdala, which are modulated by DAergic inputs from the VTA (Abraham et al., 2014; Bromberg-Martin et al., 2010; Eison & Temple, 1986; Lammel et al., 2008). Preventing inhibition of VTA neurons during punishment may disrupt appropriate recruitment and plasticity within these circuits. Alternatively, VTA disinhibition during punishment may have altered normal punishment-induced plasticity onto VTA neurons themselves.

Thus the VTA is implicated in punishment-induced plasticity. However, the nature of this role remains unclear. The critical element of the disinhibition effect also remains unclear, e.g. is this long-lasting impulsivity/hyperactivity effect specifically due to abolishing pauses in reward-coding DA firing during the shock, removing fine-tuned DA coding throughout punishment learning, or another aspect of VTA activity? Investigation into these possibilities requires specifically manipulating VTA GABA inhibition during the shock or other points of the punishment session. This could be achieved through manipulation of neural activity with more temporal specificity than drug microinfusions allow, e.g. using optogenetic techniques.

It is also of interest that psychopathologies known to involve altered DA function (e.g. ADHD, substance abuse disorders; Li et al., 2006; Volkow et al., 2007, 2009) include symptoms of punishment insensitivity and impaired behavioural control (Bechara & Damasio, 2002; Humphreys & Lee, 2011; Petry, 2001; van Meel et al., 2011). The current results suggest the possibility that such symptoms may depend, in part, on the confluence of altered DA signalling and punishment. Moreover, punishment in situations of perturbed DA (e.g. drug addiction, ADHD, and/or in individuals with a genetic predisposition to altered DA signalling) may cause long-lasting impairments in behavioural control. These novel hypotheses require further investigation.

In summary, disinhibition of the VTA resulted in impaired acquisition of instrumental suppression. This also caused a long-lasting impulsivity/hyperactivity behavioural phenotype, possibly due to disruption of normal DA-mediated plasticity during punishment learning. VTA disinhibition attenuated unpunished leverpressing during punishment expression, which is consistent with suggestions that finely-tuned VTA firing is necessary for directing instrumental responding for reward. The lack of effect on punished responding during the expression test suggests VTA inhibition is not responsible for the expression of learned instrumental suppression. VTA disinhibition also increased responding on the punished lever throughout a choice session for non-impulsive rats, which could be driven by increased switching and/or punishment extinction. VTA disinhibition had no direct effect on unprovoked locomotion within a familiar chamber. The precise behavioural and neural bases of these effects are uncertain, and extension of these findings with techniques that allow for temporally- and neuron-specific manipulation of VTA activity is required.

A limited role for the mAcbSh, PFC, LHb and dmStr in punishment

The failure to find notable anti-punishment effects following manipulations of these brain regions, which have been hypothesized to mediate punishment behaviour in relation to their particular functions, was surprising. The roles of the mAcbSh and PFC in behaviour suppression (Arnsten & Li, 2005; Balleine & O’Doherty, 2010; Cai et al., 2014; Millan & McNally, 2011) have not been demonstrated in aversive punishment, and the current experiments suggest that these regions are not required for suppression of behaviours punished by shock.

Inactivations of the PL and dmStr also had no effect, despite their role in encoding R-O associations. However, the involvement of structures within the R-O circuit has only been demonstrated using appetitive R-O associations, and may not extend to aversive response-punisher associations. The lack of PFC involvement in punishment stands in agreement with Pelloux and colleagues’ (2013) findings. This suggests that punishment behaviour can be emitted independently of the PFC, implicating a more fundamental role for subcortical circuits.

The LHb has been hypothesized to mediate punishment behaviour; LHb activity is correlated with and sufficient for punishment learning and behaviour (Hong et al., 2010; Stamatakis & Stuber, 2012). Experiment 8 suggests that LHb activity is not in fact necessary for punishment. This lack of effect is not easily attributable to the nature of the aversive event used in the current protocol because the relatively weak punisher used here (0.5 mA footshock) has been shown to recruit an LHb-RMTg pathway (Brown & Shepard, 2013). It is likely that multiple structures and/or pathways are recruited to mediate punishment, so these other pathways may have compensated for LHb inactivation. The effect of VTA disinhibition suggests inhibition of midbrain DA is involved in punishment but this may not rely on activation of the LHb. The RMTg receives inputs from a multitude of other structures (Jhou et al., 2013), which may allow for the GABAergic inhibition of the VTA in response to punishment. Therefore the LHb may not be as critical for punishment, either acquisition or expression, as widely believed.

Interestingly, Stopper and Floresco (2014) reported that LHb inactivation using a GABA agonist removes a rat’s preference to choose a large but risky/delayed reward over a small but certain/immediate reward. Therefore, it is possible that punishment-related signals within LHb, while sufficient but not necessary for punishment, serve a more fundamental role in decision-making. Indeed, Stopper and Floresco argued that LHb serves as a preference centre involved in decision-making. However, the findings from the choice test here suggest that this role does not extend to choice biased by previous aversive experiences. Perhaps the role of LHb in biasing decision-making is restricted to appetitive, but not aversive, modifiers of subjective value.

Lastly, DAr antagonism in dmStr had no effect on responding during punishment expression sessions, or choosing to press the punished lever in the choice test. This suggests the nigrostriatal and basal ganglia pathways (direct and indirect) are not important for expression of instrumental suppression. The nigrostriatal pathway encodes reinforcement and aversion in a similar manner to the reward-coding mesolimbic pathway (Ilango et al., 2014), while stimulation of D1r-expressing direct pathway neurons and D2r-expressing indirect pathway neurons is rewarding and punishing, respectively (Kravitz & Kreitzer, 2012). It appears that signalling in these pathways is sufficient, but not necessary, for punishment behaviour.

Specific effects of neural manipulations on choice test

Three experiments detected drug effects on responding during the choice test without a significant increase in responding on the punished lever: Experiment 4 (IL), Experiment 5 (RAIC), and Experiment 9 (dmStr). Interestingly, these effects were not observed during punishment expression, suggesting an effect on factors unique to choice sessions. Three aspects of choice sessions distinguish them from expression sessions and may explain the discrepancy in effects found: no shocks are delivered, leverpressing is reinforced on a VI-60sec schedule (instead of a VI-30sec schedule), and both levers are extended (instead of individually presented) during choice.

IL inactivations

IL inactivation during an unpunished choice test increased unpunished leverpressing and decreased latencies to press both levers, but had no effect on overall punished leverpresses; these effects were not observed during expression. Processing of shock omission is unlikely to be the locus of the IL inactivation effect, because there was no significant effect on punished leverpressing. Instead, an effect was found on latencies to initially press both levers, and these initial presses occur prior to any unexpected shock omission. Also, IL inactivations significantly increased unpunished leverpressing, but did not affect rate of leverpressing across the session, suggesting the increase in unpunished leverpressing was not driven by increases in the latter part of the session where the absence of shocks might be registered.

The use of a VI-60sec food pellet schedule, instead of the VI-30sec schedule used during punishment sessions, may mediate the IL’s effect on unpunished responding within the choice test. This reduced schedule of reinforcement may introduce negative prediction error on the unpunished lever (responding on the punished lever is generally too low to detect this contingency change [less than 1 response/min, regardless of infusion]). This may result in a decrease in responding, just as extinction reduces instrumental responding through negative prediction error. Given the IL’s proposed role in instrumental extinction (Marchant et al., 2010; Peters et al., 2008, 2009), it is possible that the increase in unpunished leverpressing following IL inactivations was caused by preventing negative prediction error-driven downregulation of instrumental responding. However, there was no significant decrease in unpunished responding across the session to indicate any downregulation of leverpressing, and IL inactivation had no effect on change in leverpressing across the choice test. This suggests the effects observed are not mediated by this extinction-like process.
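The extinction-like account considered above can be made concrete with a simple delta-rule sketch. This is an illustrative assumption rather than a model fitted or tested in this thesis, and the symbols $V$, $r$, $\alpha$ and $\delta$ are introduced here for illustration only.

```latex
% Illustrative delta-rule sketch (assumed form; not a model used in the thesis).
% V_t: expected reinforcement for a response, r_t: obtained reinforcement,
% \alpha: learning rate (0 < \alpha \le 1), \delta_t: prediction error.
\begin{align}
  \delta_t &= r_t - V_t, \\
  V_{t+1} &= V_t + \alpha\,\delta_t .
\end{align}
% If V_t reflects the richer VI-30sec schedule and r_t is then obtained under
% the sparser VI-60sec schedule, \delta_t is negative on average, so V_t
% (and, by assumption, responding) should decline across the session.
```

On this sketch, an extinction-like process predicts a within-session decline in unpunished responding, which is the pattern that was not observed.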

The last difference between choice and punishment sessions is the concurrent presentation of levers in the choice test. The inactivation effects during choice may be due to IL involvement in decision-making about alternate responses. It is likely that the presentation of both levers requires processing of both levers and allocation of responding, as predicted by the matching law (Herrnstein, 1970; Logan, 1969) and evidenced within the current experiment by eventual leverpressing of the punished lever; processing of the alternative response may even interfere with current responding (Keeler et al., 2014). Rats retained preference for the unpunished lever following IL inactivations, but were quicker to press both levers, indicative of less deliberation and behavioural inhibition. The general increase in unpunished leverpressing, without an increase in punished responding, is congruent with a loss of deliberation of alternate responses (without impairing punishment suppression). This broadly fits with the proposed role for the IL in instrumental behaviour inhibition (Arnsten & Li, 2005; Hitchcott et al., 2007; Killcross & Coutureau, 2003), but also suggests this does not extend to punishment suppression.
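For reference, the matching law invoked above is commonly written in the following strict form. The symbols are generic notation introduced here, not the notation of this thesis, and the equation is stated for the simple case of two concurrently reinforced responses; it is not fitted to the present data.

```latex
% Strict matching for two concurrently available responses.
% B_1, B_2: response rates on the two levers;
% R_1, R_2: reinforcement rates obtained from each lever.
\begin{equation}
  \frac{B_1}{B_1 + B_2} = \frac{R_1}{R_1 + R_2}
\end{equation}
```

If both levers earn food at the same rate in the choice test, as assumed here, strict matching alone predicts equal allocation, so the retained preference for the unpunished lever would reflect the residual negative value of the previously punished lever rather than differences in obtained reinforcement.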


RAIC inactivations

Insula activity has also been posited to encode relative response value (Talmi & Pine, 2012) and retrieve representations of outcome value (Balleine & Dickinson, 2000; Parkes & Balleine, 2013). Interestingly, RAIC inactivations slightly shifted responding towards the punished lever during the choice test, consistent with these proposed valuative functions.

When two levers are presented, with one lever being affected by negative incentive value due to its previous contingency with punishment (Logan, 1969), blocking the retrieval of outcome value via RAIC inactivations attenuated aversively-motivated choice to respond on the more highly valued unpunished lever over the previously-punished lever. However, the shift in choice observed in Experiment 6 was unimpressive compared to the complete elimination of sensitivity to differential incentive value following insula inactivation within outcome devaluation tasks (Balleine & Dickinson, 2000; Parkes & Balleine, 2013), suggesting that aversively-motivated choice is not completely dependent on the RAIC. It is possible that inactivation of a more caudal portion of the insula (as targeted by Balleine & Dickinson, 2000; Parkes & Balleine, 2013) would result in a more substantial attenuation of preference, though evidence to support this proposition is lacking. It is also worth noting that the outcome devaluation employed by previous studies (Balleine & Dickinson, 2000; Parkes & Balleine, 2013) used sensory-specific satiety of a food reinforcer, and thus depends on gustatory function, which the insula is known to be involved in (Flynn et al., 1999). Therefore, it is possible this role in outcome value processing is limited to gustation, though several lines of evidence suggest the RAIC encodes general aversive value (Coghill et al., 1994; Franciotti et al., 2009; Hayes & Northoff, 2011; Preuschoff et al., 2008; Simmons et al., 2004, 2006).

Bidirectional effect of DA antagonists into the dmStr

During the choice test, D1r blockade increased, whereas D2r blockade decreased, leverpressing of the unpunished lever, without affecting the punished lever. This suggests that DAergic signals onto direct or indirect pathway neurons are not required for avoidance of the punished lever within a choice task.

This bidirectional effect contrasts with previous findings that dStr D1r blockade attenuated reward behaviour and dStr D2r blockade attenuated non-rewarded behaviour (Nakamura & Hikosaka, 2006); there are several differences in the tasks, which may explain this discrepancy. Firstly, the present task involved choice, whereas Nakamura and Hikosaka’s task presented only one visual target to saccade to at a time. Secondly, their task did not involve aversive outcomes to motivate alternative behaviour. Lastly, the faster reaction time to complete a rewarded saccade used in Nakamura and Hikosaka’s (2006) task may be more Pavlovian than instrumental (Hearst, 1976; Kaye & Pearce, 1984).

Given that no changes in unpunished leverpressing were found during the expression test, a parsimonious interpretation of the effects of dmStr DAr antagonism would involve factors unique to the choice test. No shocks were delivered during the choice test, so this is unlikely to mediate DA receptor antagonist effects. The use of a sparser reinforcement schedule during choice tests (introducing negative prediction error and contingency degradation) is also unlikely to mediate DAr antagonist effects, because the effects of DAr antagonists did not depend on changes in responding across time. The remaining difference is that choice tests involved concurrent, instead of individual, lever presentations. Given there was no effect of DA receptor blockade on punished leverpressing, it is unlikely that these effects were due to a shift in responding to or away from the punished lever. Therefore it seems that DAergic modulation of unpunished responding is indirectly affected by the concurrent presentation of the punished lever within a choice test.

Bidirectional modulation of unpunished leverpressing by dmStr DA receptors might be mediated by effects on action-selection within the choice test. As previously argued, when both levers are presented, rats must allocate responding across the two levers. DA receptor blockade had no effect on latencies to press either lever. However, it is possible that leverpressing under conditions of choice involves processing of the alternative response that conflicts with the current response (Redgrave et al., 1999); leverpressing may be reduced in the context of another lever, regardless of eventual leverpressing, through response conflict or attention directed towards the alternative lever. Dopamine receptor antagonists into the dmStr may affect this conflict.

Indeed, in recent models of direct and indirect pathway function (Bromberg-Martin et al., 2010; Kravitz & Kreitzer, 2012), dStr D1r are proposed to mediate approach whereas dStr D2r are proposed to mediate avoidance. Given that no drug effects were found on single-option leverpressing, suggesting dmStr DA receptors are not required for approach/avoidance in itself, D1r may specifically promote approach of the alternative lever; blocking this D1r-dependent signal would cause uninterrupted responding on the unpunished lever. Conversely, D2r could promote avoidance of the alternative lever, and blocking this D2r-dependent signal would cause increased response conflict, thus reducing unpunished leverpressing. Both D1r- and D2r-expressing neurons receive multiple other inputs (e.g. glutamatergic inputs from the cortex), which may mediate their role in reinforcement and punishment, while DAergic modulation of dmStr activity may only be important for consideration of alternative responses.

Methodological Considerations

The current protocol used two levers: one that was punished and another that was not. Although Experiment 1 suggests Pavlovian factors do not account for suppression of responding on the punished lever, and robust differences in punished and unpunished leverpressing suggest reliable discrimination of the levers, this design presents some issues in interpreting drug effects. Firstly, the levers occupied distinct spatial locations. This presents a confound, as failure to appropriately suppress punished responding may be due to impairments in distinguishing levers based on their spatial location. This is one reason why the hippocampus, a structure proposed to mediate punishment behaviour, was not manipulated, as it is strongly implicated in encoding spatial information. It was reasoned that impairments in spatial discrimination would confound the task. This possible confound is worth considering but it cannot provide a parsimonious explanation of the effects of BLA and VTA manipulations because these manipulations were selective to the punished lever.

Each experiment involved selection of specific pharmacological agents to manipulate neural transmission. Although drugs and dosages were carefully selected to block neural signals of interest, neural activity was not directly recorded within the current experiments, so it remains unclear how drug infusions affected signalling within the targeted structures. In some experiments (e.g., LHb, dmStr) a variety of compounds were used to provide as broad an investigation as possible. In other experiments fewer compounds were used, and their probable efficacy and caveats are discussed below.

Most experiments employed infusions of BM, which is commonly used in inactivation studies (for example, Corbit & Janak, 2010; Millan & McNally, 2011; Rogers & See, 2007) and was effective after infusions into the BLA. However, whether it was effective at preventing punishment signals within the PFC and dmStr is unknown, though it is reasoned that BM would suppress general activity within the targeted structure. For mAcbSh, it is possible that other compounds could have yielded different effects, e.g. DAr antagonists; Acb DA has been proposed to encode aversion (Danjo et al., 2014; Roitman et al., 2008; Shippenberg et al., 1991; but see Lammel et al., 2012). That said, glutamatergic innervation of AcbSh modulates DA release, both directly (Howland et al., 2002) and indirectly (Floresco et al., 2001; Yin et al., 2008). Lammel and colleagues (2012) suggest that VTA innervation of the mAcbSh is in fact glutamatergic, and DA terminals within the Acb have even been shown to co-release glutamate (Stuber et al., 2010). AMPA antagonism by NBQX would attenuate these mesolimbic contributions to AcbSh activity and Experiment 3 suggests that these contributions are not important for punishment-influenced behaviour.

Likewise, for VTA, the GABA system was targeted. However, GABA is not the sole inhibitory influence on VTA neurons. The VTA receives inputs from the 5-HT dorsal and medial raphe (DRN/MRN), which have direct inhibitory and excitatory influences on VTA DA neurons (Alex & Pehek, 2007; Dray et al., 1978; Gervais & Rouillard, 2000; Ugedo et al., 1989). Given the role for 5-HT in punishment (see Chapter 2), it is worth investigating a role for 5-HT actions in VTA in future studies.

Although dmStr activity was manipulated in multiple ways, the effect of D1r and D2r antagonism on punishment acquisition was not assessed, as a between-subjects manipulation did not allow additional groups without a drastic reduction in statistical power. It was reasoned that BM inactivations of the dmStr would be sufficient to disrupt punishment-relevant signalling during acquisition, but this is nonetheless a caveat of that experiment.

Future Directions

Though the current experiments reveal a role of the BLA and VTA in punishment, the remaining neural mechanisms are unclear. No manipulation within this thesis completely abolished avoidance of the punished lever, and no evidence for particular circuits was provided. For instance, the inputs and outputs of the caudal BLA that support its role in punishment are unclear. Two strong candidates that receive caudal, but not rostral, BLA inputs are the PL and mAcbSh, which have been separately implicated in punishment-related behaviour. However, Experiments 3 and 4 suggest that these structures are not necessary for punishment. Thus it is likely that other structures mediate caudal BLA-dependent punishment behaviour. One possibility is the OFC, which also encodes outcome value and guides instrumental behaviour (Arana et al., 2003; O’Doherty et al., 2003; Schoenbaum et al., 2009). However, the lack of anti-punishment effect following OFC lesions in a previous study (Pelloux et al., 2013) suggests the OFC is simply not required for punishment suppression, and that the caudal BLA’s role can be mediated by other structures. Future research should determine these pathways.

The precise cause of the enduring effect of VTA disinhibition is also unclear; is the long-lasting impulsivity/hyperactivity specifically due to abolishing pauses in reward-coding DA firing during the shock? Removing fine-tuned DA coding throughout punishment learning? Or another aspect of VTA activity? An effective means of determining this would be to specifically manipulate VTA GABA inhibition during the shock or other points of the punishment session. This could be achieved through neural manipulations with greater temporal specificity than those used here. Optogenetics allows for this temporal precision, along with greater anatomical control, in the manipulation of neural activity, and could be used to address this issue with cell type specificity.

Concluding Remarks

Punishment is a fundamental form of learning, involving the suppression of a behaviour that leads to a negative outcome. Despite its importance for an organism’s survival, the neural mechanisms of punishment behaviour are not well understood. This thesis investigated the role of various structures in punishment acquisition, expression and aversive choice. The experiments that were conducted provide, for the first time, direct evidence for a role of BLA, particularly caudal BLA, in encoding the aversive value of a punisher, and a role for VTA inhibition in punishment-learning plasticity. Somewhat surprisingly, a role for the mAcbSh, PFC, LHb and dmStr was not supported, suggesting these structures, and the particular circuits they lie within, are not important for punishment behaviour.

The protocol developed within this thesis reliably produces instrumental suppression, as well as punishment-influenced preference, providing a method for investigating the brain mechanisms of punishment acquisition, expression and choice. As punishment is a potent determinant of behaviour and is implicated in the pathologies of depression (Eshel & Roiser, 2010; Must et al., 2006), psychopathy (Blair, 2008; Blair et al., 2004; Moul et al., 2012), ADHD (Humphreys & Lee, 2011; van Meel et al., 2011) and substance abuse disorders (Bechara & Damasio, 2002; Petry, 2001), continued research into its neural underpinnings is of significant theoretical and practical import.


References

Abraham, A. D., Neve, K. A., & Lattal, K. M. (2014). Dopamine and extinction: A convergence of theory with fear and reward circuitry. Neurobiology of Learning and Memory, 108, 65-77.

Adamantidis, A. R., Tsai, H. C., Boutrel, B., Zhang, F., Stuber, G. D., Budygin, E. A., ... & de Lecea, L. (2011). Optogenetic interrogation of dopaminergic modulation of the multiple phases of reward-seeking behavior. Journal of Neuroscience, 31, 10829-10835.

Adams, C. D., & Dickinson, A. (1981). Instrumental responding following reinforcer devaluation. Quarterly Journal of Experimental Psychology, 33, 109-121.

Alex, K. D., & Pehek, E. A. (2007). Pharmacologic mechanisms of serotonergic regulation of dopamine neurotransmission. Pharmacology & Therapeutics, 113, 296-320.

Alheid, G. F. (2003). Extended amygdala and basal forebrain. Annals of the New York Academy of Sciences, 985, 185-205.

Altman, J., Brunner, R. L., & Bayer, S. A. (1973). The hippocampus and behavioral maturation. Behavioral Biology, 8, 557-596.

Annau, Z., & Kamin, L. J. (1961). The conditioned emotional response as a function of intensity of the US. Journal of Comparative and Physiological Psychology, 54, 428.

Appel, J. B. (1963). Punishment and shock intensity. Science, 141, 528-529.

Appel, J. B. (1968). Fixed-interval punishment. Journal of the Experimental Analysis of Behavior, 11, 803-808.

Arana, F. S., Parkinson, J. A., Hinton, E., Holland, A. J., Owen, A. M., & Roberts, A. C. (2003). Dissociable contributions of the human amygdala and orbitofrontal cortex to incentive motivation and goal selection. Journal of Neuroscience, 23, 9632-9638.

Arnsten, A. F., & Li, B. M. (2005). Neurobiology of executive functions: Catecholamine influences on prefrontal cortical functions. Biological Psychiatry, 57, 1377-1384.

Aston-Jones, G., Rajkowski, J., & Cohen, J. (1999). Role of locus coeruleus in attention and behavioral flexibility. Biological Psychiatry, 46, 1309-1320.

Avila, C., & Torrubia, R. (2006). Personality differences in suppression of behavior as a function of the probability of punishment. Personality and Individual Differences, 41, 249-260.

Azrin, N. H. (1956). Some effects of two intermittent schedules of immediate and non-immediate punishment. Journal of Psychology, 42, 3-21.

Azrin, N. H. (1960). Effects of punishment intensity during variable-interval reinforcement. Journal of the Experimental Analysis of Behavior, 3, 123-142.

Azrin, N. H., & Holz, W. C. (1961). Punishment during fixed-interval reinforcement. Journal of the Experimental Analysis of Behavior, 4, 343-347.

Azrin, N. H., & Holz, W. C. (1966). Punishment. In W. K. Honig (Ed.), Operant behavior: Areas of research and application (pp. 380-447). New York, NY: Appleton-Century-Crofts.

Azrin, N. H., Holz, W. C., & Hake, D. F. (1963). Fixed-ratio punishment. Journal of the Experimental Analysis of Behavior, 6, 141-148.

Baarendse, P. J., Winstanley, C. A., & Vanderschuren, L. J. (2013). Simultaneous blockade of dopamine and noradrenaline reuptake promotes disadvantageous decision making in a rat gambling task. Psychopharmacology, 225, 719-731.

Baetu, I., & Baker, A. G. (2010). Extinction and blocking of conditioned inhibition in human causal learning. Learning and Behavior, 38, 394-407.

Balcita-Pedicino, J. J., Omelchenko, N., Bell, R., & Sesack, S. R. (2011). The inhibitory influence of the lateral habenula on midbrain dopamine cells: Ultrastructural evidence for indirect mediation via the rostromedial mesopontine tegmental nucleus. Journal of Comparative Neurology, 519, 1143-1164.

Balleine, B. W., & Dickinson, A. (1998). Goal-directed instrumental action: Contingency and

incentive learning and their cortical substrates. Neuropharmacology, 37, 407-419.

Balleine, B.W., & Dickinson, A. (2000). The effect of lesions of the insular cortex on

instrumental conditioning: Evidence for a role in incentive memory. Journal of

Neuroscience, 20, 8954-8964.

Balleine, B. W., & Killcross, S. (2006). Parallel incentive processing: An integrated view of

amygdala function. Trends in Neurosciences, 29, 272-279.

Balleine, B. W., & O'Doherty, J. P. (2010). Human and rodent homologies in action control:

Corticostriatal determinants of goal-directed and habitual action.

Neuropsychopharmacology, 35, 48-69.

Balleine, B. W., Killcross, A. S., & Dickinson, A. (2003). The effect of lesions of the

basolateral amygdala on instrumental conditioning. Journal of Neuroscience, 23, 666-

675.

Balleine, B. W., Delgado, M. R., & Hikosaka, O. (2007). The role of the dorsal striatum in

reward and decision-making. Journal of Neuroscience, 27, 8161-8165.

Banks, R. K. (1976). Resistance to punishment as a function of intensity and frequency of

prior punishment experience. Learning and Motivation, 7, 551-558.

Baron, A. (1965). Delayed punishment of a runway response. Journal of Comparative and

Physiological Psychology, 60, 131-134.

Barrett, J.E. (1977). Behavioral history as a determinant of the effects of d-amphetamine on

punished behaviour. Science, 198, 67-69.

Barrett, J. E. (1992). Studies on the effects of 5‐HT1A drugs in the pigeon. Drug

Development Research, 26, 299-317.

Barnes, N. M., & Sharp, T. (1999). A review of central 5-HT receptors and their function.

Neuropharmacology, 38, 1083-1152.

Barrett, J. E., & Vanover, K. E. (1993). 5-HT receptors as targets for the development of

novel anxiolytic drugs: Models, mechanisms and future directions.

Psychopharmacology, 112, 1-12.

Barrett, J. E., & Zhang, L. (1991). Anticonflict and discriminative stimulus effects of the 5‐

HT1A compounds WY‐47,846 and WY‐48,723 and the mixed 5‐HT1A agonist/5‐HT2

antagonist WY‐50,324 in pigeons. Drug Development Research, 24, 179-188.

Barrett, J. E., Brady, L. S., & Witkin, J. M. (1985). Behavioral studies with anxiolytic drugs.

I. Interactions of the benzodiazepine antagonist Ro 15-1788 with chlordiazepoxide,

pentobarbital and ethanol. Journal of Pharmacology and Experimental Therapeutics,

233, 554-559.

Barros-Loscertales, A., Meseguer, V., Sanjuan, A., Belloch, V., Parcet, M. A., Torrubia, R., &

Avila, C. (2006). Behavioral inhibition system activity is associated with increased

amygdala and hippocampal gray matter volume: A voxel-based morphometry study.

Neuroimage, 33, 1011-1015.

Baxter, M. G., Parker, A., Lindner, C. C., Izquierdo, A. D., & Murray, E. A. (2000). Control

of response selection by reinforcer value requires interaction of amygdala and orbital

prefrontal cortex. Journal of Neuroscience, 20, 4311-4319.

Beauchamp, R.D.A. (1966). A comparison of the degree of suppression following either a

discriminative punishment treatment or a conditioned emotional response treatment.

Unpublished MA thesis, Brown University.

Beavers, W. O., & Perkins Jr, C. C. (1977). Punishment of nonspecific responses: Does the

negative half of the law of effect apply? Bulletin of the Psychonomic Society, 9, 14-16.

Bechara, A., & Damasio, H. (2002). Decision-making and addiction (part I): Impaired

activation of somatic states in substance dependent individuals when pondering

decisions with negative future consequences. Neuropsychologia, 40, 1675-1689.

Bechara, A., Damasio, H., Damasio, A. R., & Lee, G. P. (1999). Different contributions of the

human amygdala and ventromedial prefrontal cortex to decision-making. Journal of

Neuroscience, 19, 5473-5481.

Bechara, A., Damasio, H., & Damasio, A. R. (2000). Emotion, decision making and the

orbitofrontal cortex. Cerebral Cortex, 10, 295-307.

Becker, J. B., Rudick, C. N., & Jenkins, W. J. (2001). The role of dopamine in the nucleus

accumbens and striatum during sexual behavior in the female rat. Journal of

Neuroscience, 21, 3236-3241.

Belova, M. A., Paton, J. J., Morrison, S. E., & Salzman, C. D. (2007). Expectation modulates

neural responses to pleasant and aversive stimuli in primate amygdala. Neuron, 55, 970-

984.

Berridge, K. C. (2007). The debate over dopamine’s role in reward: The case for incentive

salience. Psychopharmacology, 191, 391-431.

Birbaumer, N., Veit, R., Lotze, M., Erb, M., Hermann, C., Grodd, W., & Flor, H. (2005).

Deficient fear conditioning in psychopathy: A functional magnetic resonance imaging

study. Archives of General Psychiatry, 62, 799-805.

Black, A. H., Nadel, L., & O'Keefe, J. (1977). Hippocampal function in avoidance learning

and punishment. Psychological Bulletin, 84, 1107-1129.

Blair, R.J.R. (2008). The amygdala and ventromedial prefrontal cortex: functional

contributions and dysfunction in psychopathy. Philosophical Transactions of the Royal

Society of London B: Biological Sciences, 363, 2557-2565.

Blair, R.J.R., Mitchell, D.G.V., Leonard, A., Budhani, S., Peschardt, K.S., & Newman, C.

(2004). Passive avoidance learning in individuals with psychopathy: modulation by

reward but not by punishment. Personality and Individual Differences, 37, 1179-1192.

Blum, S., Hebert, A. E., & Dash, P. K. (2006). A role for the prefrontal cortex in recall of

recent and remote memories. NeuroReport, 17, 341-344.

Blythe, S. N., Wokosin, D., Atherton, J. F., & Bevan, M. D. (2009). Cellular mechanisms

underlying burst firing in substantia nigra dopamine neurons. Journal of Neuroscience,

29, 15531-15541.

Boe, E.E. (1969). Bibliography on punishment. In B.A. Campbell & R.M. Church (Eds.),

Punishment and aversive behavior (pp. 531-589). New York, NY: Appleton-Century-

Crofts.

Bolam, J.P., Hanley, J.J., Booth, P.A.C., & Bevan, M.D. (2000). Synaptic organisation of the

basal ganglia. Journal of Anatomy, 196, 527-542.

Bolles, R.C. (1970). Species-specific defense reactions and avoidance learning. Psychological

Review, 77, 32-48.

Bolles, R. C., Uhl, C. N., Wolfe, M., & Chase, P. B. (1975). Stimulus learning versus

response learning in a discriminated punishment situation. Learning and Motivation, 6,

439-447.

Bolles, R.C., Holtz, R., Dunn, T., & Hill, W. (1980). Comparisons of stimulus learning and

response learning in a punishment situation. Learning and Motivation, 11, 78-96.

Bourdy, R., & Barrot, M. (2012). A new control center for dopaminergic systems: Pulling the

VTA by the tail. Trends in Neurosciences, 35, 681-690.

Bouton, M. E. (1988). Context and ambiguity in the extinction of emotional learning:

Implications for exposure therapy. Behaviour Research and Therapy, 26, 137-149.

Bouton, M.E., & Bolles, R.C. (1980). Conditioned fear assessed by freezing and by the

suppression of three different baselines. Animal Learning & Behavior, 8, 429-434.

Bouton, M. E., & Schepers, S. T. (2015). Renewal after the punishment of free operant

behavior. Journal of Experimental Psychology: Animal Learning and Cognition, 41, 81-

90.

Bowery, N.G., Hudson, A.L., & Price, G.W. (1987). GABAA and GABAB receptor site

distribution in the rat central nervous system. Neuroscience, 20, 365-385.

Bowker, R.M., & Abbott, L.C. (1990). Quantitative re-evaluation of descending serotonergic

and non-serotonergic projections from the medulla of the rodent: evidence for extensive

co-existence of serotonin and peptides in the same spinally projecting neurons, but not

from the nucleus raphe magnus. Brain Research, 512, 15-25.

Brady, J. V., & Hunt, H. F. (1955). An experimental approach to the analysis of emotional

behavior. Journal of Psychology, 40, 313-324.

Brandão, M. L., Fontes, J. C. S., & Graeff, F. G. (1980). Facilitatory effect of ketamine on

punished behavior. Pharmacology Biochemistry and Behavior, 13, 1-4.

Brocco, M. J., Koek, W., Degryse, A. D., & Colpaert, F. C. (1990). Comparative studies on

the anti-punishment effects of chlordiazepoxide, buspirone and ritanserin in the pigeon,

Geller-Seifter and Vogel conflict procedures. Behavioural Pharmacology, 1, 403-418.

Brodie, M. S., & Bunney, E. B. (1996). Serotonin potentiates dopamine inhibition of ventral

tegmental area neurons in vitro. Journal of Neurophysiology, 76, 2077-2082.

Brog, J. S., Salyapongse, A., Deutch, A. Y., & Zahm, D. S. (1993). The patterns of afferent

innervation of the core and shell in the “Accumbens” part of the rat ventral striatum:

Immunohistochemical detection of retrogradely transported fluoro‐gold. Journal of

Comparative Neurology, 338, 255-278.

Bromberg-Martin, E. S., Matsumoto, M., & Hikosaka, O. (2010). Dopamine in motivational

control: rewarding, aversive, and alerting. Neuron, 68, 815-834.

Brown, V. J., & Bowman, E. M. (2002). Rodent models of prefrontal cortical function.

Trends in Neurosciences, 25, 340-343.

Brown, P. L., & Shepard, P. D. (2013). Lesions of the fasciculus retroflexus alter footshock-

induced cFos expression in the mesopontine rostromedial tegmental area of rats. PLoS

One, 8, e60678.

Cai, W., Ryali, S., Chen, T., Li, C. S. R., & Menon, V. (2014). Dissociable roles of right

inferior frontal cortex and anterior insula in inhibitory control: Evidence from intrinsic

and task-related functional parcellation, connectivity, and response profile analyses

across multiple datasets. Journal of Neuroscience, 34, 14652-14667.

Calabresi, P., Picconi, B., Tozzi, A., Ghiglieri, V., & Di Filippo, M. (2014). Direct and

indirect pathways of basal ganglia: A critical reappraisal. Nature Neuroscience, 17,

1022-1030.

Camara, E., Rodriguez-Fornells, A., & Münte, T. F. (2009). Functional connectivity of reward

processing in the brain. Frontiers in Human Neuroscience, 2, 19.

Camp, D. S., Raymond, G. A., & Church, R. M. (1967). Temporal relationship between

response and punishment. Journal of Experimental Psychology, 74, 114-123.

Cannon, C. M., & Palmiter, R. D. (2003). Reward without dopamine. Journal of

Neuroscience, 23, 10827-10831.

Cardinal, R. N., Parkinson, J. A., Hall, J., & Everitt, B. J. (2003). The contribution of the

amygdala, nucleus accumbens, and prefrontal cortex to emotion and motivated

behaviour. International Congress Series, 1250, 347-370.

Carter, C. S., Braver, T. S., Barch, D. M., Botvinick, M. M., Noll, D., & Cohen, J. D. (1998).

Anterior cingulate cortex, error detection, and the online monitoring of performance.

Science, 280, 747-749.

Cervo, L., & Samanin, R. (1995). 5-HT1A receptor full and partial agonists and 5-HT2C

(but not 5-HT3) receptor antagonists increase rates of punished responding in rats.

Pharmacology Biochemistry and Behavior, 52, 671-676.

Choi, D.W. (1992). Excitotoxic cell death. Journal of Neurobiology, 23, 1261-1276.

Christmas, D., Hood, S., & Nutt, D. (2008). Potential novel anxiolytic drugs. Current

Pharmaceutical Design, 14, 3534-3546.

Christoph, G. R., Leonzio, R. J., & Wilcox, K. S. (1986). Stimulation of the lateral habenula

inhibits dopamine-containing neurons in the substantia nigra and ventral tegmental area

of the rat. Journal of Neuroscience, 6, 613-619.

Chung, S. H., & Herrnstein, R. J. (1967). Choice and delay of reinforcement. Journal of the

Experimental Analysis of Behavior, 10, 67-74.

Church, R.M. (1963). The varied effects of punishment on behavior. Psychological Review,

70, 369-402.

Church, R.M. (1964). Systematic effect of random error in the yoked control design.

Psychological Bulletin, 62, 122-131.

Church, R.M. (1969). Response suppression. In B.A. Campbell & R.M. Church (Eds.),

Punishment and aversive behavior (pp. 111-156). New York, NY: Appleton-Century-

Crofts.

Church, R.M., Raymond, G.A., & Beauchamp, R.D. (1967). Response suppression as a

function of intensity and duration of punishment. Journal of Comparative and

Physiological Psychology, 63, 39-44.

Church, R. M., Wooten, C. L., & Matthews, T. J. (1970). Discriminative punishment and the

conditioned emotional response. Learning and Motivation, 1, 1-17.

Cohen, J. Y., Haesler, S., Vong, L., Lowell, B. B., & Uchida, N. (2012). Neuron-type-specific

signals for reward and punishment in the ventral tegmental area. Nature, 482, 85-88.

Concha, M. L., & Wilson, S. W. (2001). Asymmetry in the epithalamus of vertebrates.

Journal of Anatomy, 199, 63-84.

Coghill, R.C., Talbot, J.D., Evans, A. C., Meyer, E., Gjedde, A., Bushnell, M.C., & Duncan,

G. H. (1994). Distributed processing of pain and vibration by the human brain. The

Journal of Neuroscience, 14, 4095-4108.

Collins, D. J., & Shanks, D. R. (2006). Summation in causal learning: Elemental processing

or configural generalization? Quarterly Journal of Experimental Psychology, 59, 1524-

1534.

Colpaert, F. C., Koek, W., Lehmann, J., Rivet, J. M., Lejeune, F., Canton, H., ... & Lavielle,

G. (1992). S 14506: A novel, potent, high‐efficacy 5‐HT1A agonist and potential

anxiolytic agent. Drug Development Research, 26, 21-48.

Colwill, R.M., & Rescorla, R.A. (1985a). Postconditioning devaluation of a reinforcer affects

instrumental responding. Journal of Experimental Psychology: Animal Behaviour

Processes, 11, 120-132.

Colwill, R.M., & Rescorla, R.A. (1985b). Instrumental responding remains sensitive to

reinforcer devaluation after extensive training. Journal of Experimental Psychology:

Animal Behaviour Processes, 11, 520-532.

Colwill, R. M., & Rescorla, R. A. (1986). Associative structures in instrumental learning.

Psychology of Learning and Motivation, 20, 55-104.

Colwill, R. M., & Rescorla, R. A. (1988). The role of response-reinforcer associations

increases throughout extended instrumental training. Animal Learning & Behavior, 16,

105-111.

Commissaris, R. (1993). Conflict behaviors as animal models for the study of anxiety. In F.

van Haaren (Ed.), Research methods in behavioural pharmacology (pp. 443-466).

Amsterdam: Elsevier Science Publishers.

Cook, L., & Davidson, A. B. (1973). Effects of behaviorally active drugs in a conflict-

punishment procedure in rats. In S. Garattini, E. Mussini, & L.O. Randall (Eds.), The

Benzodiazepines (pp. 327-345). New York, NY: Raven Press.

Cools, R., Roberts, A. C., & Robbins, T. W. (2008). Serotoninergic regulation of emotional

and behavioural control processes. Trends in Cognitive Sciences, 12, 31-40.

Corbit, L. H., & Balleine, B. W. (2003). The role of prelimbic cortex in instrumental

conditioning. Behavioural Brain Research, 146, 145-157.

Corbit, L. H., & Balleine, B. W. (2005). Double dissociation of basolateral and central

amygdala lesions on the general and outcome-specific forms of pavlovian-instrumental

transfer. Journal of Neuroscience, 25, 962-970.

Corbit, L. H., & Janak, P. H. (2010). Posterior dorsomedial striatum is critical for both

selective instrumental and Pavlovian reward learning. European Journal of

Neuroscience, 31, 1312-1321.

Corbit, L. H., Leung, B. K., & Balleine, B. W. (2013). The role of the amygdala-striatal

pathway in the acquisition and performance of goal-directed instrumental actions.

Journal of Neuroscience, 33, 17682-17690.

Corcoran, K. A., & Quirk, G. J. (2007). Activity in prelimbic cortex is necessary for the

expression of learned, but not innate, fears. Journal of Neuroscience, 27, 840-844.

Coutureau, E., & Killcross, S. (2003). Inactivation of the infralimbic prefrontal cortex

reinstates goal-directed responding in overtrained rats. Behavioural Brain Research,

146, 167-174.

Coutureau, E., Marchand, A. R., & Di Scala, G. (2009). Goal-directed responding is sensitive

to lesions to the prelimbic cortex or basolateral nucleus of the amygdala but not to their

disconnection. Behavioral Neuroscience, 123, 443.

Danjo, T., Yoshimi, K., Funabiki, K., Yawata, S., & Nakanishi, S. (2014). Aversive behavior

induced by optogenetic inactivation of ventral tegmental area dopamine neurons is

mediated by dopamine D2 receptors in the nucleus accumbens. Proceedings of the

National Academy of Sciences, 111, 6455-6460.

Davidson, A. F., & Cook, L. (1969). Effects of combined treatment with trifluoperazine-HCl

and amobarbital on punished behavior in rats. Psychopharmacologia, 15, 159-168.

Davis, M. (1992). The role of the amygdala in fear and anxiety. Annual Review of

Neuroscience, 15, 353-375.

Daw, N. D., Kakade, S., & Dayan, P. (2002). Opponent interactions between serotonin and

dopamine. Neural Networks, 15, 603-616.

Dayan, P. (2012). Instrumental vigour in punishment and reward. European Journal of

Neuroscience, 35, 1152-1168.

de Wit, S., Watson, P., Harsay, H. A., Cohen, M. X., van de Vijver, I., & Ridderinkhof, K. R.

(2012). Corticostriatal connectivity underlies individual differences in the balance

between habitual and goal-directed action control. Journal of Neuroscience, 32, 12066-

12075.

Dearing, M.F., & Dickinson, A. (1979). Counterconditioning of shock by a water reinforcer in

rabbits. Animal Learning & Behavior, 7, 360-366.

Deisseroth, K. (2011). Optogenetics. Nature Methods, 8, 26-29.

Dews, P.B., & Wenger, G.R. (1977). Rate dependency of the behavioural effects of

amphetamine. In T. Thompson & P.B. Dews (Eds.), Advances in behavioural

pharmacology, Vol. 1 (pp. 167-227). Orlando: Academic Press.

Dias, R., Robbins, T. W., & Roberts, A. C. (1996). Dissociation in prefrontal cortex of

affective and attentional shifts. Nature, 380, 69-72.

Dickinson, A. (1994). Instrumental conditioning. In N. J. Mackintosh (Ed.), Animal Learning

and Cognition (pp. 45-79). Orlando: Academic Press.

Dickinson, A. (2010). Instrumental conditioning. In Encyclopedia of Psychopharmacology

(pp. 645-649). Berlin, Germany: Springer Berlin Heidelberg.

Dickinson, A., & Balleine, B. (2002). The role of learning in the operation of motivational

systems. In C.R. Gallistel (Ed.), Stevens' Handbook of Experimental Psychology (Vol. 3,

3rd Ed, pp. 497-534). New York, NY: John Wiley & Sons.

Dickinson, A., & Dearing, M. F. (1979). Appetitive-aversive interactions and inhibitory

processes. In A. Dickinson & R. A. Boakes (Eds.), Mechanisms of learning and

motivation: A memorial volume to Jerzy Konorski (pp. 203-231). Hillsdale, NJ:

Erlbaum.

Dickinson, A., Balleine, B. W., Watt, A., Gonzales, F., & Boakes, R. A. (1995). Overtraining

and the motivational control of instrumental action. Animal Learning & Behavior, 22,

197-206.

Dickinson, A., Squire, S., Varga, Z., & Smith, J.W. (1998). Omission learning after

instrumental pretraining. Quarterly Journal of Experimental Psychology: Section B, 51,

271-286.

Dinsmoor, J.A. (1954). Punishment: I. The avoidance hypothesis. Psychological Review, 61,

34-46.

Dinsmoor, J. A. (1977). Escape, avoidance, punishment: Where do we stand? Journal of the

Experimental Analysis of Behavior, 28, 83-95.

Dinsmoor, J. A. (1998). Punishment. In W.T. O’Donohue (Ed.), Learning and behavior

therapy (pp. 188-204). Needham Heights, MA: Allyn & Bacon.

Dray, A., Davies, J., Oakley, N. R., Tongroach, P., & Vellucci, S. (1978). The dorsal and

medial raphe projections to the substantia nigra in the rat: electrophysiological,

biochemical and behavioural observations. Brain Research, 151, 431-442.

Dreyer, J. K., Herrik, K. F., Berg, R. W., & Hounsgaard, J. D. (2010). Influence of phasic and

tonic dopamine release on receptor activation. Journal of Neuroscience, 30, 14273-

14283.

Dubrovina, N. I., & Zinov’eva, D. V. (2010). Effects of activation and blockade of dopamine

receptors on the extinction of a passive avoidance reaction in mice with a depressive-

like state. Neuroscience and Behavioral Physiology, 40, 55-59.

Dunham, P.J. (1971). Punishment: Method and theory. Psychological Review, 78, 58-70.

Dunham, P.J. (1972). Some effects of punishment upon unpunished responding. Journal of

the Experimental Analysis of Behavior, 17, 443-450.

Eison, A. S., & Temple, D. L. (1986). Buspirone: review of its pharmacology and current

perspectives on its mechanism of action. American Journal of Medicine, 80, 1-9.

Erofeeva, M. N. (1916). Contributions à l’étude des réflexes conditionnels destructifs. Compte

Rendu de la Société de Biologie Paris, 79, 239-240.

Escorhuela, R.M., Fernández-Teruel, A., Zapata, A., Núñez, J.F., & Tobeña, A. (1993).

Flumazenil prevents the anxiolytic effects of diazepam, alprazolam and adinazolam on

the early acquisition of two-way active avoidance. Pharmacological Research, 28, 53-

58.

Eshel, N., & Roiser, J. P. (2010). Reward and punishment processing in depression.

Biological Psychiatry, 68, 118-124.

Estes, W.K. (1944). An experimental study of punishment. Psychological Monographs, 57 (3,

Whole No. 263).

Estes, W.K. (1969). Outline of a theory of punishment. In B.A. Campbell & R.M. Church

(Eds.), Punishment and aversive behavior (pp. 57-82). New York, NY: Appleton-

Century-Crofts.

Estes, W.K., & Skinner, B.F. (1941). Some quantitative properties of anxiety. Journal of

Experimental Psychology, 29, 390-400.

Everitt, B. J., & Robbins, T. W. (2005). Neural systems of reinforcement for drug addiction:

From actions to habits to compulsion. Nature Neuroscience, 8, 1481-1489.

Farley, J., & Fantino, E. (1978). The symmetrical law of effect and the matching relation in

choice behavior. Journal of the Experimental Analysis of Behavior, 29, 37-60.

Farwell, B. J., & Ayres, J. J. (1979). Stimulus-reinforcer and response-reinforcer relations in

the control of conditioned appetitive headpoking (“goal tracking”) in rats. Learning and

Motivation, 10, 295-312.

Faulkner, P., & Deakin, W.J.F. (2014). The role of serotonin in reward, punishment and

behavioural inhibition in humans: insights from studies with acute tryptophan depletion.

Neuroscience and Biobehavioral Reviews, 46, 365-378.

Fenno, L., Yizhar, O., & Deisseroth, K. (2011). The development and application of

optogenetics. Annual Review of Neuroscience, 34, 389-412.

Ferster, C. B., & Skinner, B. F. (1957). Schedules of reinforcement. New York, NY:

Appleton-Century-Crofts.

Fink, J. S., & Smith, G. P. (1980). Mesolimbic and mesocortical dopaminergic neurons are

necessary for normal exploratory behavior in rats. Neuroscience Letters, 17, 61-65.

Fiorillo, C. D. (2013). Two dimensions of value: Dopamine neurons represent reward but not

aversiveness. Science, 341, 546-549.

Floresco, S. B., Todd, C. L., & Grace, A. A. (2001). Glutamatergic afferents from the

hippocampus to the nucleus accumbens regulate activity of ventral tegmental area

dopamine neurons. Journal of Neuroscience, 21, 4915-4922.

Flynn, F. G. (1999). Anatomy of the insula: Functional and clinical correlates. Aphasiology, 13,

55-78.

Franciotti, R., Ciancetta, L., Della Penna, S., Belardinelli, P., Pizzella, V., & Romani, G.L.

(2009). Modulation of alpha oscillations in insular cortex reflects the threat of painful

stimuli. Neuroimage, 46, 1082-1090.

Frank, M. J. (2005). Dynamic dopamine modulation in the basal ganglia: a

neurocomputational account of cognitive deficits in medicated and nonmedicated

Parkinsonism. Journal of Cognitive Neuroscience, 17, 51-72.

Friedman, A., Lax, E., Dikshtein, Y., Abraham, L., Flaumenhaft, Y., Sudai, E., Ben-Tzion,

M., Ami-Ad, L., Yaka, R., & Yadid, G. (2010). Electrical stimulation of the lateral

habenula produces enduring inhibitory effect on cocaine seeking behavior.

Neuropharmacology, 59, 452-459.

Friedman, A., Lax, E., Dikshtein, Y., Abraham, L., Flaumenhaft, Y., Sudai, E., Ben-Tzion,

M., & Yadid, G. (2011). Electrical stimulation of the lateral habenula produces an

inhibitory effect on sucrose self-administration. Neuropharmacology, 60, 381-387.

Furlong, T. M., Cole, S., Hamlin, A. S., & McNally, G. P. (2010). The role of prefrontal

cortex in predictive fear learning. Behavioral Neuroscience, 124, 574-586.

Gardner, C. R. (1986). Recent developments in 5HT-related pharmacology of animal models

of anxiety. Pharmacology Biochemistry and Behavior, 24, 1479-1485.

Gartside, S. E., Umbers, V., & Sharp, T. (1997). Inhibition of 5-HT cell firing in the DRN by

non-selective 5-HT reuptake inhibitors: Studies on the role of 5-HT1A autoreceptors

and noradrenergic mechanisms. Psychopharmacology, 130, 261-268.

Geller, I., & Hartmann, R. J. (1982). Effects of buspirone on operant behavior of laboratory

rats and cynomolgus monkeys. Journal of Clinical Psychiatry, 43, 25-32.

Geller, I., & Seifter, J. (1960). The effects of meprobamate, barbiturates, d-amphetamine and

promazine on experimentally induced conflict in the rat. Psychopharmacologia, 1, 482-

492.

Geller, I., & Seifter, J. (1962). The effects of mono-urethans, di-urethans and barbiturates on a

punishment discrimination. Journal of Pharmacology and Experimental Therapeutics,

136, 284-288.

Geller, I., Bachman, E., & Seifter, J. (1963). Effects of reserpine and morphine on behavior

suppressed by punishment. Life Sciences, 2, 226-231.

Gervais, J., & Rouillard, C. (2000). Dorsal raphe stimulation differentially modulates

dopaminergic neurons in the ventral tegmental area and substantia nigra. Synapse, 35,

281-291.

Ghahremani, A., Rastogi, A., & Lam, S. (2015). The role of right anterior insula and salience

processing in inhibitory control. Journal of Neuroscience, 35, 3291-3292.

Gifuni, A. J., Jozaghi, S., Gauthier-Lamer, A. C., & Boye, S. M. (2012). Lesions of the lateral

habenula dissociate the reward-enhancing and locomotor-stimulant effects of

amphetamine. Neuropharmacology, 63, 945-957.

Gleeson, S., Ahlers, S. T., Mansbach, R. S., Foust, J. M., & Barrett, J. E. (1989). Behavioral

studies with anxiolytic drugs. VI. Effects on punished responding of drugs interacting

with serotonin receptor subtypes. Journal of Pharmacology and Experimental

Therapeutics, 250, 809-817.

Glowa, J. R., & Barrett, J. E. (1976). Effects of alcohol on punished and unpunished

responding of squirrel monkeys. Pharmacology Biochemistry and Behavior, 4, 169-173.

Glowa, J.R., Crawley, J., Suzdak, P.D., & Paul, S.M. (1988). Ethanol and the GABA receptor

complex: Studies with the partial inverse benzodiazepine receptor agonist Ro 15-4513.

Pharmacology Biochemistry and Behavior, 31, 767-772.

Gluckman, M. I., & Stein, L. (1978). Pharmacology of lorazepam. Journal of Clinical

Psychiatry, 39, 3-10.

Goldberg, H. L., & Finnerty, R. J. (1979). The comparative efficacy of buspirone and

diazepam in the treatment of anxiety. American Journal of Psychiatry, 136, 1184-1187.

Goldberg, H. L., & Finnerty, R. J. (1982). Comparison of buspirone in two separate studies.

Journal of Clinical Psychiatry, 43, 87-91.

Goldberg, M. E., Salama, A. I., Patel, J. B., & Malick, J. B. (1983). Novel non-

benzodiazepine anxiolytics. Neuropharmacology, 22, 1499-1504.

Good, C. H., Wang, H., Chen, Y. H., Mejias-Aponte, C. A., Hoffman, A. F., & Lupica, C. R.

(2013). Dopamine D4 receptor excitation of lateral habenula neurons via multiple

cellular mechanisms. Journal of Neuroscience, 33, 16853-16864.

Goodall, G. (1984). Learning due to the response-shock contingency in signalled punishment.

Quarterly Journal of Experimental Psychology Section B: Comparative and

Physiological Psychology, 36, 259-279.

Goodall, G., & Mackintosh, N. J. (1987). Analysis of the Pavlovian properties of signals for

punishment. Quarterly Journal of Experimental Psychology, 39, 1-21.

Graeff, F. G., & Schoenfeld, R. I. (1970). Tryptaminergic mechanisms in punished and

nonpunished behavior. Journal of Pharmacology and Experimental Therapeutics, 173,

277-283.

Grace, A. A. (1991). Phasic versus tonic dopamine release and the modulation of dopamine

system responsivity: A hypothesis for the etiology of schizophrenia. Neuroscience, 41,

1-24.

Grace, A. A., Floresco, S. B., Goto, Y., & Lodge, D. J. (2007). Regulation of firing of

dopaminergic neurons and control of goal-directed behaviors. Trends in Neurosciences,

30, 220-227.

Gray, J. A. (1977). Drug effects on fear and frustration: Possible limbic site of action of minor

tranquilizers. In L. L. Iversen, S. D. Iversen, & S. H. Snyder (Eds.), Drugs,

Neurotransmitters, and Behavior (pp. 433-529). New York, NY: Springer.

Gray, J. A. (1981). A critique of Eysenck’s theory of personality. In H.J. Eysenck (Ed.), A

model for personality (pp. 246-276). Berlin, Germany: Springer-Verlag.

Gray, J. (1982). The neuropsychology of anxiety: Inquiry into the septohippocampal system.

Oxford: Clarendon Press.

Gray, J. A., & McNaughton, N. (1983). Comparison between the behavioural effects of septal

and hippocampal lesions: A review. Neuroscience & Biobehavioral Reviews, 7, 119-

188.

Gray, J.A., Rawlins, J.N.P., & Feldon, J. (1979). Brain mechanisms in the inhibition of

behaviour. In A. Dickinson & R.A. Boakes (Eds.), Mechanisms of learning and

motivation: A memorial volume to Jerzy Konorski (pp. 295-316). New York, NY:

Psychology Press.

Graybiel, A.M. (2000). The basal ganglia. Current Biology, 10, R509-R511.

Griffiths, K. R., Morris, R. W., & Balleine, B. W. (2014). Translational studies of goal-

directed action as a framework for classifying deficits across psychiatric disorders.

Frontiers in Systems Neuroscience, 8, 101.

Grillon, C., & Ameli, R. (2001). Conditioned inhibition of fear-potentiated startle and skin

conductance in humans. Psychophysiology, 38, 807-815.

Groenewegen, H. J., Wright, C. I., Beijer, A. V., & Voorn, P. (1999). Convergence and

segregation of ventral striatal inputs and outputs. Annals of the New York Academy of

Sciences, 877, 49-63.

Guarraci, F. A., & Kapp, B. S. (1999). An electrophysiological characterization of ventral

tegmental area dopaminergic neurons during differential pavlovian fear conditioning in

the awake rabbit. Behavioural Brain Research, 99, 169-179.

Haber, S.N. (2003). The primate basal ganglia: Parallel and integrative networks. Journal of

Chemical Neuroanatomy, 26, 317-330.

Haber, S.N. (2014). The place of dopamine in the cortico-basal ganglia circuit. Neuroscience,

282, 248-257.

Hahn, T., Dresler, T., Plichta, M. M., Ehlis, A. C., Ernst, L. H., Markulin, F., ... & Fallgatter,

A. J. (2010). Functional amygdala-hippocampus connectivity during anticipation of

aversive events is associated with Gray's trait “sensitivity to punishment”. Biological

Psychiatry, 68, 459-464.

Hajós‐Korcsok, É., & Sharp, T. (1999). Effect of 5‐HT1A receptor ligands on Fos‐like

immunoreactivity in rat brain: Evidence for activation of noradrenergic transmission.

Synapse, 34, 145-153.

Hake, D.F., & Azrin, N.H. (1965). Conditioned punishment. Journal of the Experimental

Analysis of Behavior, 8, 279-293.

Hamlin, A. S., Clemens, K. J., Choi, E. A., & McNally, G. P. (2009). Paraventricular

thalamus mediates context‐induced reinstatement (renewal) of extinguished reward

seeking. European Journal of Neuroscience, 29, 802-812.

Hammond, L. J. (1980). The effect of contingency upon the appetitive conditioning of free-

operant behavior. Journal of the Experimental Analysis of Behavior, 34, 297.

Hart, G., Leung, B. K., & Balleine, B. W. (2014). Dorsal and ventral streams: the distinct role

of striatal subregions in the acquisition and performance of goal-directed actions.

Neurobiology of Learning and Memory, 108, 104-118.

Harvey, J.A., Schlosberg, A.J., & Yunger, L.M. (1975). Behavioral correlates of serotonin

depletion. Federation Proceedings, 34, 1796-1801.

Hayen, A., Meese-Tamuri, S., Gates, A., & Ito, R. (2014). Opposing roles of prelimbic and

infralimbic dopamine in conditioned cue and place preference. Psychopharmacology,

231, 2483-2492.

Hayes, D.J., & Northoff, G. (2011). Identifying a network of brain regions involved in

aversion-related processing: a cross-species translational investigation. Frontiers in

Integrative Neuroscience, 5, 49.

Hearst, E. (1976). Pavlovian conditioning and directed movements. Psychology of Learning

and Motivation, 9, 215-262.

Herkenham, M., & Nauta, W. J. (1977). Afferent connections of the habenular nuclei in the

rat: A horseradish peroxidase study, with a note on the fiber-of-passage problem.

Journal of Comparative Neurology, 173, 123-146.

Herrnstein, R. J. (1961). Relative and absolute strength of response as a function of frequency

of reinforcement. Journal of the Experimental Analysis of Behavior, 4, 267-272.

Herrnstein, R. J. (1970). On the Law of Effect. Journal of the Experimental Analysis of

Behavior, 13, 243-266.

Herzallah, M. M., Moustafa, A. A., Natsheh, J. Y., Abdellatif, S. M., Taha, M. B., Tayem, Y.

I., ... & Gluck, M. A. (2013). Learning from negative feedback in patients with major

depressive disorder is attenuated by SSRI antidepressants. Frontiers in Integrative

Neuroscience, 7, 67.

Hikida, T., Kimura, K., Wada, N., Funabiki, K., & Nakanishi, S. (2010). Distinct roles of

synaptic transmission in direct and indirect striatal pathways to reward and aversive

behavior. Neuron, 66, 896-907.

Hikosaka, O. (2007). Basal ganglia mechanisms of reward‐oriented eye movement. Annals of

the New York Academy of Sciences, 1104, 229-249.

Hikosaka, O. (2010). The habenula: From stress evasion to value-based decision-making.

Nature Reviews Neuroscience, 11, 503-513.

Hikosaka, O., Sesack, S. R., Lecourtier, L., & Shepard, P. D. (2008). Habenula: Crossroad

between the basal ganglia and the limbic system. Journal of Neuroscience, 28, 11825-

11829.

Hillegaart, V., & Ahlenius, S. (1987). Effects of raclopride on exploratory locomotor activity,

treadmill locomotion, conditioned avoidance behaviour and catalepsy in rats:

behavioural profile comparisons between raclopride, haloperidol and preclamol.

Pharmacology & Toxicology, 60, 350-354.

Hitchcott, P. K., Quinn, J. J., & Taylor, J. R. (2007). Bidirectional modulation of goal-

directed actions by prefrontal cortical dopamine. Cerebral Cortex, 17, 2820-2827.

Hodgson, T. L., Mort, D., Chamberlain, M. M., Hutton, S. B., O’Neill, K. S., & Kennard, C.

(2002). Orbitofrontal cortex mediates inhibition of return. Neuropsychologia, 40, 1891-

1901.

Hoffman, H.S., & Fleshler, M. (1959). Aversive control with the pigeon. Journal of the

Experimental Analysis of Behavior, 10, 35-44.

Hoffman, H.S., & Fleshler, M. (1961). Stimulus factors in aversive controls: The

generalization of conditioned suppression. Journal of the Experimental Analysis of

Behavior, 4, 371-378.

Hoffman, H.S., & Fleshler, M. (1965). Stimulus aspects of aversive controls: The effects of

response contingent shock. Journal of the Experimental Analysis of Behavior, 8, 89-96.

Holland, P. C. (1977). Conditioned stimulus as a determinant of the form of the Pavlovian

conditioned response. Journal of Experimental Psychology: Animal Behavior

Processes, 3, 77.

Holland, P. C. (1992). Occasion setting in Pavlovian conditioning. Psychology of Learning

and Motivation, 28, 69-125.

Holland, P. C., & Gallagher, M. (2004). Amygdala–frontal interactions and reward

expectancy. Current Opinion in Neurobiology, 14, 148-155.

Holroyd, C.B., & Coles, M.G.H. (2002). The neural basis of human error processing:

reinforcement learning, dopamine, and the error-related negativity. Psychological

Review, 109, 679-709.

Holz, W. C., & Azrin, N. H. (1962). Interactions between the discriminative and aversive

properties of punishment. Journal of the Experimental Analysis of Behavior, 5, 229-234.

Hong, S., & Hikosaka, O. (2010). The globus pallidus sends reward-related signals to the

lateral habenula. Neuron, 60, 720-729.

Hong, S., Jhou, T. C., Smith, M., Saleem, K. S., & Hikosaka, O. (2011). Negative reward

signals from the lateral habenula to dopamine neurons are mediated by rostromedial

tegmental nucleus in primates. Journal of Neuroscience, 31, 11457-11471.

Hornung, J. P. (2003). The human raphe nuclei and the serotonergic system. Journal of

Chemical Neuroanatomy, 26, 331-343.

Horvitz, J. C. (2000). Mesolimbocortical and nigrostriatal dopamine responses to salient non-

reward events. Neuroscience, 96, 651-656.

Howard, J.L, & Pollard, G.T. (1990). Effects of buspirone in the Geller-Seifter conflict test

with incremental shock. Drug Development Research, 19, 37-49.

Howland, J. G., Taepavarapruk, P., & Phillips, A. G. (2002). Glutamate receptor-dependent

modulation of dopamine efflux in the nucleus accumbens by basolateral, but not central,

nucleus of the amygdala in rats. Journal of Neuroscience, 22, 1137-1145.

Huguenard, J. R., Gutnick, M. J., & Prince, D. A. (1993). Transient Ca2+ currents in neurons

isolated from rat lateral habenula. Journal of Neurophysiology, 70, 158-166.

Humphreys, K. L., & Lee, S. S. (2011). Risk taking and sensitivity to punishment in children

with ADHD, ODD, ADHD+ODD, and controls. Journal of Psychopathology and

Behavioral Assessment, 33, 299-307.

Hunt, H. F., & Brady, J. V. (1951). Some effects of electro-convulsive shock on a conditioned

emotional response ("anxiety"). Journal of Comparative and Physiological Psychology,

44, 88-98.

Hunt, H.F., & Brady, J.V. (1955). Some effects of punishment and intercurrent “anxiety” on a

simple operant. Journal of Comparative and Physiological Psychology, 48, 305-310.

Hwang, E. K., & Chung, J. M. (2014). 5HT1B receptor-mediated pre-synaptic depression of

excitatory inputs to the rat lateral habenula. Neuropharmacology, 81, 153-165.

Ikemoto, S. (2007). Dopamine reward circuitry: Two projection systems from the ventral

midbrain to the nucleus accumbens–olfactory tubercle complex. Brain Research

Reviews, 56, 27-78.

Ikemoto, S. (2010). Brain reward circuitry beyond the mesolimbic dopamine system: A

neurobiological theory. Neuroscience and Biobehavioral Reviews, 35, 129-150.

Ilango, A., Shumake, J., Wetzel, W., Scheich, H., & Ohl, F. W. (2012). The role of dopamine

in the context of aversive stimuli with particular reference to acoustically signaled

avoidance learning. Frontiers in Neuroscience, 6, 132.

Ilango, A., Kesner, A. J., Keller, K. L., Stuber, G. D., Bonci, A., & Ikemoto, S. (2014).

Similar roles of substantia nigra and ventral tegmental dopamine neurons in reward and

aversion. Journal of Neuroscience, 34, 817-822.

Iversen, S.D. (1984). 5-HT and anxiety. Neuropharmacology, 23, 1553-1560.

Jaber, M., Robinson, S. W., Missale, C., & Caron, M. G. (1996). Dopamine receptors and

brain function. Neuropharmacology, 35, 1503-1519.

Jacobs, B.L., & Azmitia, E.C. (1992). Structure and function of the brain serotonin system.

Physiological Reviews, 72, 165-229.

Jasmin, L., Rabkin, S. D., Granato, A., Boudah, A., & Ohara, P. T. (2003). Analgesia and

hyperalgesia from GABA-mediated modulation of the cerebral cortex. Nature, 424,

316-320.

Jhou, T. C., Geisler, S., Marinelli, M., Degarmo, B. A., & Zahm, D. S. (2009a). The

mesopontine rostromedial tegmental nucleus: A structure targeted by the lateral

habenula that projects to the ventral tegmental area of Tsai and the substantia nigra

compacta. Journal of Comparative Neurology, 513, 566-596.

Jhou, T.C., Fields, H.L., Baxter, M.G., Saper, C.B., & Holland, P.C. (2009b). The

rostromedial tegmental nucleus (RMTg), a GABAergic afferent to midbrain dopamine

neurons, encodes aversive stimuli and inhibits motor responses. Neuron, 61, 786-800.

Jhou, T. C., Good, C. H., Rowley, C. S., Xu, S. P., Wang, H., Burnham, N. W., ... & Ikemoto,

S. (2013). Cocaine drives aversive conditioning via delayed activation of dopamine-

responsive habenular and midbrain pathways. Journal of Neuroscience, 33, 7501-7512.

Ji, H., & Shepard, P. D. (2007). Lateral habenula stimulation inhibits rat midbrain dopamine

neurons through a GABA(A) receptor-mediated mechanism. Journal of Neuroscience,

27, 6923-6930.

Johansen, J. P., & Fields, H. L. (2004). Glutamatergic activation of anterior cingulate cortex

produces an aversive teaching signal. Nature Neuroscience, 7, 398-403.

Johansen, J. P., Tarpley, J. W., LeDoux, J. E., & Blair, H. T. (2010). Neural substrates for

expectation-modulated fear learning in the amygdala and periaqueductal gray. Nature

Neuroscience, 13, 979-986.

Johnston, G. A., & Willow, M. (1982). GABA and barbiturate receptors. Trends in

Pharmacological Sciences, 3, 328-330.

Kaada, B.R. (1951). Somato-motor, autonomic and electrocorticographic responses to

electrical stimulation of "rhinencephalic" and other structures in primates, cat and dog.

Acta Physiologica Scandinavica, 24 (Suppl. 83), 1-258.

Kaada, B.R. (1960). Cingulate, posterior orbital, anterior insular and temporal pole cortex. In

J. Field, H.W. Magoun, & V.E. Hall (Eds.), Handbook of physiology. Neurophysiology.

Vol. II (pp. 1345-1372). Washington, D. C.: American Physiological Society.

Kaada, B. R., Jansen, J., & Andersen, P. (1953). Stimulation of the hippocampus and medial

cortical areas in unanesthetized cats. Neurology, 3, 844.

Kaada, B.R., Rasmussen, E.W., & Kveim, O. (1962). Impaired acquisition of passive

avoidance behavior by subcallosal, septal, hypothalamic and insular lesions in rats.

Journal of Comparative and Physiological Psychology, 55, 661-670.

Kamin, L. J. (1968). “Attention-like” processes in classical conditioning. In M. R. Jones

(Ed.), Miami symposium on the prediction of behaviour, 1967: Aversive stimulation (pp.

9-33). Miami, FL: University of Miami Press.

Kamin, L. J. (1969). Predictability, surprise, attention, and conditioning. In B. A. Campbell &

R. M. Church (Eds.), Punishment and aversive behaviour (pp. 279-296). New York,

NY: Appleton-Century-Crofts.

Kantak, K. M., Black, Y., Valencia, E., Green-Jordan, K., & Eichenbaum, H. B. (2002).

Dissociable effects of lidocaine inactivation of the rostral and caudal basolateral

amygdala on the maintenance and reinstatement of cocaine-seeking behavior in rats.

Journal of Neuroscience, 22, 1126-1136.

Karsh, E. B. (1962). Effects of number of rewarded trials and intensity of punishment on

running speed. Journal of Comparative and Physiological Psychology, 55, 44-51.

Kaufling, J., Veinante, P., Pawlowski, S.A., Freund-Mercier, M. J., & Barrot, M. (2009).

Afferents to the GABAergic tail of the ventral tegmental area in the rat. Journal of

Comparative Neurology, 513, 597-621.

Kaye, H., & Pearce, J. M. (1984). The strength of the orienting response during Pavlovian

conditioning. Journal of Experimental Psychology: Animal Behavior Processes, 10, 90-

109.

Keeler, J. F., Pretsell, D. O., & Robbins, T. W. (2014). Functional implications of dopamine

D1 vs. D2 receptors: A ‘prepare and select’ model of the striatal direct vs. indirect

pathways. Neuroscience, 282, 156-175.

Killcross, S., & Coutureau, E. (2003). Coordination of actions and habits in the medial

prefrontal cortex of rats. Cerebral Cortex, 13, 400-408.

Killcross, S., Robbins, T. W., & Everitt, B. J. (1997). Different types of fear-conditioned

behaviour mediated by separate nuclei within amygdala. Nature, 388, 377-380.

Kim, U. (2009). Topographic commissural and descending projections of the habenula in the

rat. Journal of Comparative Neurology, 513, 173-187.

Kim, J. J., & Jung, M. W. (2006). Neural circuits and mechanisms involved in Pavlovian fear

conditioning: a critical review. Neuroscience & Biobehavioral Reviews, 30, 188-202.

Kim, K. M., Baratta, M. V., Yang, A., Lee, D., Boyden, E. S., & Fiorillo, C. D. (2012).

Optogenetic mimicry of the transient activation of dopamine neurons by natural reward

is sufficient for operant reinforcement. PLoS One, 7, e33612.

Kimble, D. P. (1969). Possible inhibitory functions of the hippocampus. Neuropsychologia, 7,

235-244.

Kimmel, H.D., & Terrant, F.R. (1968). Bias due to individual differences in yoked control

designs. Behaviour Research Methods and Instrumentation, 1, 11-14.

Kravitz, A. V., & Kreitzer, A. C. (2012). Striatal mechanisms underlying movement,

reinforcement, and punishment. Physiology, 27, 167-177.

Kravitz, A. V., Tye, L. D., & Kreitzer, A. C. (2012). Distinct roles for direct and indirect

pathway striatal neurons in reinforcement. Nature Neuroscience, 15, 816-818.

Kreitzer, A. C., & Malenka, R. C. (2008). Striatal plasticity and basal ganglia circuit function.

Neuron, 60, 543-554.

Kruse, H., Dunn, R. W., Theurer, K. L., Novick, W. J., & Shearman, G. T. (1981).

Attenuation of conflict‐induced suppression by clonidine: Indication of anxiolytic

activity. Drug Development Research, 1, 137-143.

Kobayashi, S. (2012). Organization of neural systems for aversive information processing:

Pain, error, and punishment. Frontiers in Neuroscience, 6, 136.

Komendantov, A. O., Komendantova, O. G., Johnson, S. W., & Canavier, C. C. (2004). A

modeling study suggests complementary roles for GABAA and NMDA receptors and

the SK channel in regulating the firing pattern in midbrain dopamine neurons. Journal

of Neurophysiology, 91, 346-357.

Konorski, J. (1967). Integrative activity of the brain: An interdisciplinary approach. Chicago,

IL: University of Chicago Press.

Koob, G.F., Mendelson, W.B., Schafer, J., Wall, T.L., Britton, K.T., & Bloom, F.E. (1988).

Picrotoxinin receptor ligand blocks anti-punishment effects of alcohol. Alcohol, 5, 437-

443.

Lammel, S., Hetzel, A., Häckel, O., Jones, I., Liss, B., & Roeper, J. (2008). Unique properties

of mesoprefrontal neurons within a dual mesocorticolimbic dopamine system. Neuron,

57, 760-773.

Lammel, S., Ion, D. I., Roeper, J., & Malenka, R. C. (2011). Projection-specific modulation of

dopamine neuron synapses by aversive and rewarding stimuli. Neuron, 70, 855-862.

Lammel, S., Lim, B. K., Ran, C., Huang, K. W., Betley, M. J., Tye, K. M., ... & Malenka, R.

C. (2012). Input-specific control of reward and aversion in the ventral tegmental area.

Nature, 491, 212-217.

Lammel, S., Lim, B. K., & Malenka, R. C. (2014). Reward and aversion in a heterogeneous

midbrain dopamine system. Neuropharmacology, 76, 351-359.

Lasiter, P.S., Deems, D.A., & Garcia, J. (1985). Involvement of the anterior insular gustatory

neocortex in taste-potentiated odor aversion learning. Physiology & Behavior, 34, 71-

77.

Lattal, K. M., & Nakajima, S. (1998). Overexpectation in appetitive Pavlovian and

instrumental conditioning. Animal Learning & Behavior, 26, 351-360.

Laurent, V., & Westbrook, R. F. (2009). Inactivation of the infralimbic but not the prelimbic

cortex impairs consolidation and retrieval of fear extinction. Learning & Memory, 16,

520-529.

Lavezzi, H. N., Parsley, K. P., & Zahm, D. S. (2015). Modulation of locomotor activation by

the rostromedial tegmental nucleus. Neuropsychopharmacology, 40, 676-687.

Lawson, R. P., Seymour, B., Loh, E., Lutti, A., Dolan, R. J., Dayan, P., ... & Roiser, J. P.

(2014). The habenula encodes negative motivational value associated with primary

punishment in humans. Proceedings of the National Academy of Sciences, 111, 11858-

11863.

Lecourtier, L., & Kelly, P. H. (2007). A conductor hidden in the orchestra? Role of the

habenular complex in monoamine transmission and cognition. Neuroscience and

Biobehavioral Reviews, 31, 658-672.

Lecourtier, L., Neijt, H. C., & Kelly, P. H. (2004). Habenula lesions cause impaired cognitive

performance in rats: Implications for schizophrenia. European Journal of Neuroscience,

19, 2551-2560.

Lecourtier, L., DeFrancesco, A., & Moghaddam, B. (2008). Differential tonic influence of

lateral habenula on prefrontal cortex and nucleus accumbens dopamine release.

European Journal of Neuroscience, 27, 1755-1762.

LeDoux, J. E. (1992). Brain mechanisms of emotion and emotional learning. Current Opinion

in Neurobiology, 2, 191-197.

Lee, T. S., Kaku, T., Takebayashi, S., Uchino, T., Miyamoto, S., Hadama, T., ... & Ono, K.

(2006). Actions of mibefradil, efonidipine and nifedipine block of recombinant T- and

L-type Ca2+ channels with distinct inhibitory mechanisms. Pharmacology, 78, 11-20.

Lester, R. A. J., Quarum, M. L., Parker, J. D., Weber, E., & Jahr, C. E. (1989). Interaction of

6-cyano-7-nitroquinoxaline-2,3-dione with the N-methyl-D-aspartate receptor-associated

glycine binding site. Molecular Pharmacology, 35, 565-570.

Leuranguer, V., Mangoni, M. E., Nargeot, J., & Richard, S. (2001). Inhibition of T‐Type and

L‐Type calcium channels by mibefradil: Physiologic and pharmacologic bases of

cardiovascular effects. Journal of Cardiovascular Pharmacology, 37, 649-661.

Li, D., Sham, P. C., Owen, M. J., & He, L. (2006). Meta-analysis shows significant

association between dopamine system genes and attention deficit hyperactivity disorder

(ADHD). Human Molecular Genetics, 15, 2276-2284.

Lichtenstein, P. E. (1950). Studies of anxiety: I. The production of a feeding inhibition in

dogs. Journal of Comparative and Physiological Psychology, 43, 16-29.

Liu, M., & Glowa, J. R. (2000). Regulation of benzodiazepine receptor binding and GABAA

subunit mRNA expression by punishment and acute alprazolam administration. Brain

Research, 887, 23-33.

Liu, Z.H., Shin, R., & Ikemoto, S. (2008). Dual role of medial A10 dopamine neurons in

affective encoding. Neuropsychopharmacology, 33, 3010-3020.

Ljungberg, T., Apicella, P., & Schultz, W. (1992). Responses of monkey dopamine neurons

during learning of behavioral reactions. Journal of Neurophysiology, 67, 145-163.

Lobb, C. J., Wilson, C. J., & Paladini, C. A. (2010). A dynamic role for GABA receptors on

the firing pattern of midbrain dopaminergic neurons. Journal of Neurophysiology, 104,

403-413.

Lobb, C. J., Wilson, C. J., & Paladini, C. A. (2011). High-frequency, short-latency

disinhibition bursting of midbrain dopaminergic neurons. Journal of Neurophysiology,

105, 2501-2511.

Logan, F.A. (1969). The negative incentive value of punishment. In B.A. Campbell & R.M.

Church (Eds.), Punishment and aversive behavior (pp. 43-54). New York, NY:

Appleton-Century-Crofts.

Löscher, W., & Rogawski, M.A. (2012). How theories evolved concerning the mechanism of

action of barbiturates. Epilepsia, 53, 12-25.

Low, K., Crestani, F., Keist, R., Benke, D., Brunig, I., Benson, A., et al. (2000). Molecular

and neuronal substrate for the selective attenuation of anxiety. Science, 290, 131-134.

Mackintosh, N.J. (1975). A theory of attention: Variations in the associability of stimulus

with reinforcement. Psychological Review, 82, 276-298.

Mackintosh, N.J. (1983). Conditioning and associative learning. Oxford: Clarendon Press.

Macoveanu, J. (2014). Serotonergic modulation of reward and punishment: Evidence from

pharmacological fMRI studies. Brain Research, 1556, 19-27.

Mällo, T., Alttoa, A., Kõiv, K., Tõnissaar, M., Eller, M., & Harro, J. (2007). Rats with

persistently low or high exploratory activity: behaviour in tests of anxiety and

depression, and extracellular levels of dopamine. Behavioural Brain Research, 177,

269-281.

Mantz, J., Thierry, A. M., & Glowinski, J. (1989). Effect of noxious tail pinch on the

discharge rate of mesocortical and mesolimbic dopamine neurons: selective activation

of the mesocortical system. Brain Research, 476, 377-381.

Marchant, N. J., Furlong, T. M., & McNally, G. P. (2010). Medial dorsal hypothalamus

mediates the inhibition of reward seeking after extinction. Journal of Neuroscience, 30,

14102-14115.

Marchant, N. J., Khuc, T. N., Pickens, C. L., Bonci, A., & Shaham, Y. (2013). Context-

induced relapse to alcohol seeking after punishment in a rat model. Biological

Psychiatry, 73, 256-262.

Marek, R., Strobel, C., Bredy, T. W., & Sah, P. (2013). The amygdala and medial prefrontal

cortex: partners in the fear circuit. Journal of Physiology, 591, 2381-2391.

Maren, S. (2001). Neurobiology of Pavlovian fear conditioning. Annual Review of

Neuroscience, 24, 897-931.

Maren, S. (2003). What the amygdala does and doesn't do in aversive learning. Learning &

Memory, 10, 306-308.

Maren, S., & Quirk, G. J. (2004). Neuronal signalling of fear memory. Nature Reviews

Neuroscience, 5, 844-852.

Maren, S., Aharonov, G., Stote, D. L., & Fanselow, M. S. (1996). N-methyl-D-aspartate

receptors in the basolateral amygdala are required for both acquisition and expression of

conditional fear in rats. Behavioral Neuroscience, 110, 1365-1374.

Margolis, E.B., Toy, B., Himmels, P., Morales, M., & Fields, H.L. (2012). Identification of

rat ventral tegmental area GABAergic neurons. PLoS ONE, 7, e42365.

doi:10.1371/journal.pone.0042365

Margules, D. L. (1968). Noradrenergic basis of inhibition between reward and punishment in

amygdala. Journal of Comparative and Physiological Psychology, 66, 329.

Margules, D. L. (1971a). Alpha and beta adrenergic receptors in amygdala: Reciprocal

inhibitors and facilitators of punished operant behavior. European Journal of

Pharmacology, 16, 21-26.

Margules, D. L. (1971b). Localization of anti-punishment actions of norepinephrine and

atropine in amygdala and entopeduncular nucleus of rats. Brain Research, 35, 177-184.

Marquis, J. P., Killcross, S., & Haddon, J. E. (2007). Inactivation of the prelimbic, but not

infralimbic, prefrontal cortex impairs the contextual control of response conflict in rats.

European Journal of Neuroscience, 25, 559-566.

Martin, I., & Levey, A. B. (1991). Blocking observed in human eyelid conditioning.

Quarterly Journal of Experimental Psychology, 43, 233-256.

Masserman, J. H. (1943). Behavior and neurosis: An experimental psychoanalytic approach

to psychobiologic principles. Chicago: University of Chicago Press.

Masserman, J. H., & Pechtel, C. (1953). Neuroses in monkeys: A preliminary report of

experimental observations. Annals of the New York Academy of Sciences, 56, 253-265.

Matsumoto, M., & Hikosaka, O. (2007). Lateral habenula as a source of negative reward

signals in dopamine neurons. Nature, 447, 1111-1115.

Matsumoto, M., & Hikosaka, O. (2009a). Representation of negative motivational value in the

primate lateral habenula. Nature Neuroscience, 12, 77-84.

Matsumoto, M., & Hikosaka, O. (2009b). Two types of dopamine neuron distinctly convey

positive and negative motivational signals. Nature, 459, 837-841.

Matsumoto, M., & Hikosaka, O. (2011). Electrical stimulation of the primate lateral habenula

suppresses saccadic eye movement through a learning mechanism. PLoS One, 6,

e26701.

McCleary, R. A. (1961). Response specificity in the behavioral effects of limbic system

lesions in the cat. Journal of Comparative and Physiological Psychology, 54, 605-613.

McCleary, R. A. (1966). Response-modulating functions of the limbic system: Initiation and

suppression. Progress in Physiological Psychology, 1, 209-272.

McCloskey, T.C., Paul, B.K., & Commissaris, R.L. (1987). Buspirone effects in an animal

conflict procedure: Comparison with diazepam and phenobarbital. Pharmacology

Biochemistry and Behavior, 27, 171-175.

McCutcheon, J. E., Ebner, S. R., Loriaux, A. L., & Roitman, M. F. (2012). Encoding of

aversion by dopamine and the nucleus accumbens. Frontiers in Neuroscience, 6, 137.

McCulloch, J., Savaki, H. E., & Sokoloff, L. (1980). Influence of dopaminergic systems on

the lateral habenular nucleus of the rat. Brain Research, 21, 117-124.

McDonald, A. J., Mascagni, F., & Guo, L. (1996). Projections of the medial and lateral

prefrontal cortices to the amygdala: A Phaseolus vulgaris leucoagglutinin study in the

rat. Neuroscience, 71, 55-75.

McIntire, K. D., & Liddell, B. J. (1984). Gamma-butyrolactone increases the rate of punished

lever pressing by rats. Pharmacology Biochemistry and Behavior, 20, 307-310.

McLaughlin, R. J., & Floresco, S. B. (2007). The role of different subregions of the

basolateral amygdala in cue-induced reinstatement and extinction of food-seeking

behavior. Neuroscience, 146, 1484-1494.

McMillan, D.E., & Leander, J.D. (1975). Drugs and punished responding. V. Effects of drugs

on responding suppressed by response-dependent and response-independent electric

shock. Archives Internationales de Pharmacodynamie et de Therapie, 213, 22-27.

McNally, G. P., Johansen, J., & Blair, H. T. (2011). Placing prediction into the fear circuit.

Trends in Neurosciences, 34, 283-292.

McNaughton, N., & Gray, J. A. (2000). Anxiolytic action on the behavioural inhibition

system implies multiple types of arousal contribute to anxiety. Journal of Affective

Disorders, 61, 161-176.

Melvin, K. B., & Ervey, D. H. (1973). Facilitative and suppressive effects of punishment on

species-typical aggressive display in Betta splendens. Journal of Comparative and

Physiological Psychology, 83, 451.

Menon, V., & Uddin, L.Q. (2010). Saliency, switching, attention and control: A network

model of insula function. Brain Structure and Function, 214, 655-667.

Meye, F. J., Lecca, S., Valentinova, K., & Mameli, M. (2013). Synaptic and cellular profile of

neurons in the lateral habenula. Frontiers in Human Neuroscience, 7, 860.

Millan, E. Z., & McNally, G. P. (2011). Accumbens shell AMPA receptors mediate

expression of extinguished reward seeking through interactions with basolateral

amygdala. Learning & Memory, 18, 414-421.

Millan, M. J., Rivet, J. M., Canton, H., Lejeune, F., Bervoets, K., Brocco, M., ... & Verriele,

L. (1992). S 14671: a naphtylpiperazine 5-hydroxytryptamine1A agonist of exceptional

potency and high efficacy possessing antagonist activity at 5-hydroxytryptamine1C/2

receptors. Journal of Pharmacology and Experimental Therapeutics, 262, 451-463.

Miller, N. E. (1960). Learning resistance to pain and fear: Effects of overlearning, exposure,

and rewarded exposure in context. Journal of Experimental Psychology, 60, 137.

Miller, S., & Konorski, J. (1969). On a particular form of conditioned reflex. Journal of the

Experimental Analysis of Behavior, 12, 187-189. [Translated by B.F. Skinner of: Sur

une forme particuliere des reflex conditionnels. Les Compte Rendus des Seances de la

Societe Polonaise de Biologie, 1928]

Milner, P. M. (1991). Brain-stimulation reward: A review. Canadian Journal of Psychology,

45, 1-36.

Miranda-Morales, R.S., Nizhnikov, M.E., Waters, D.H., & Spear, N.E. (2014). New evidence

of ethanol's anxiolytic properties in the infant rat. Alcohol, 48, 367-374.

Mirenowicz, J., & Schultz, W. (1994). Importance of unpredictability for reward responses in

primate dopamine neurons. Journal of Neurophysiology, 72, 1024-1027.

Mirrione, M. M., Schulz, D., Lapidus, K. A., Zhang, S., Goodman, W., & Henn, F. A. (2014).

Increased metabolic activity in the septum and habenula during stress is linked to

subsequent expression of learned helplessness behavior. Frontiers in Human

Neuroscience, 8, 29.

Mogenson, G. J., Wu, M., & Manchanda, S. K. (1979). Locomotor activity initiated by

microinfusions of picrotoxin into the ventral tegmental area. Brain Research, 161, 311-

319.

Moore, R.Y., Halaris, A.E., & Jones, B.E. (1978). Serotonin neurons of the midbrain raphe:

Ascending projections. Journal of Comparative Neurology, 180, 417-438.

Morrison, S. E., & Salzman, C. D. (2011). Representations of appetitive and aversive

information in the primate orbitofrontal cortex. Annals of the New York Academy of

Sciences, 1239, 59-70.

Moul, C., Killcross, S., & Dadds, M. R. (2012). A model of differential amygdala activation

in psychopathy. Psychological Review, 119, 789-806.

Mowrer, O. (1947). On the dual nature of learning: A reinterpretation of "conditioning" and

"problem-solving". Harvard Educational Review, 17, 102-150.

Mowrer, O. (1960). Learning theory and behavior. Hoboken, NJ: John Wiley & Sons.

Müller, J. L., Sommer, M., Wagner, V., Lange, K., Taschler, H., Röder, C. H., ... & Hajak, G.

(2003). Abnormalities in emotion processing within cortical and subcortical regions in

criminal psychopaths: evidence from a functional magnetic resonance imaging study

using pictures with emotional content. Biological Psychiatry, 54, 152-162.

Must, A., Szabó, Z., Bódi, N., Szász, A., Janka, Z., & Kéri, S. (2006). Sensitivity to reward

and punishment and the prefrontal cortex in major depression. Journal of Affective

Disorders, 90, 209-215.

Nair, S. G., Strand, N. S., & Neumaier, J. F. (2013). DREADDing the lateral habenula: a

review of methodological approaches for studying lateral habenula function. Brain

Research, 1511, 93-101.

Nakamura, K., & Hikosaka, O. (2006). Role of dopamine in the primate caudate nucleus in

reward modulation of saccades. Journal of Neuroscience, 26, 5360-5369.

Nanry, K. P., Howard, J. L., & Pollard, G. T. (1991). Effects of buspirone and other

anxiolytics on punished key‐pecking in the pigeon. Drug Development Research, 24,

269-276.

New, J. S. (1990). The discovery and development of buspirone: a new approach to the

treatment of anxiety. Medicinal Research Reviews, 10, 283-326.

Newman, J. P., & Kosson, D. S. (1986). Passive avoidance learning in psychopathic and

nonpsychopathic offenders. Journal of Abnormal Psychology, 95, 252-256.

Newman, J. P., Patterson, C. M., & Kosson, D. S. (1987). Response perseveration in

psychopaths. Journal of Abnormal Psychology, 96, 145-148.

Newman, J. P., Patterson, C. M., Howland, E. W., & Nichols, S. L. (1990). Passive avoidance

in psychopaths: The effects of reward. Personality and Individual Differences, 11,

1101-1114.

O'Doherty, J., Kringelbach, M. L., Rolls, E. T., Hornak, J., & Andrews, C. (2001). Abstract

reward and punishment representations in the human orbitofrontal cortex. Nature

Neuroscience, 4, 95-102.

O'Doherty, J. P., Dayan, P., Friston, K., Critchley, H., & Dolan, R. J. (2003). Temporal

difference models and reward-related learning in the human brain. Neuron, 38, 329-337.

Olds, J., & Milner, P. M. (1954). Positive reinforcement produced by electrical stimulation of

septal area and other regions of rat brain. Journal of Comparative and Physiological

Psychology, 47, 419-427.

Olds, M. E., & Olds, J. (1963). Approach-avoidance analysis of rat diencephalon. Journal of

Comparative Neurology, 120, 259-295.

Olton, D. S., Becker, J. T., & Handelmann, G. E. (1979). Hippocampus, space, and memory.

Behavioral and Brain Sciences, 2, 313-322.

Omelchenko, N., Bell, R., & Sesack, S. R. (2009). Lateral habenula projections to dopamine

and GABA neurons in the rat ventral tegmental area. European Journal of

Neuroscience, 30, 1239-1250.

Onge, J. R. S., Stopper, C. M., Zahm, D. S., & Floresco, S. B. (2012). Separate prefrontal-

subcortical circuits mediate different components of risk-based decision making.

Journal of Neuroscience, 32, 2886-2899.

Öngür, D., & Price, J.L. (2000). The organization of networks within the orbital and medial

prefrontal cortex of rats, monkeys and humans. Cerebral Cortex, 10, 206-219.

Orme-Johnson, D.W., & Yarczower, M. (1974). Conditioned suppression, punishment, and

aversion. Journal of the Experimental Analysis of Behavior, 21, 57-74.

Ostlund, S.B., & Balleine, B.W. (2005). Lesions of medial prefrontal cortex disrupt the

acquisition but not the expression of goal-directed learning. Journal of Neuroscience,

25, 7763-7770.

Ostlund, S.B., & Balleine, B.W. (2007). Orbitofrontal cortex mediates outcome encoding in

Pavlovian but not instrumental conditioning. Journal of Neuroscience, 27, 4819-4825.

Ostlund, S.B., & Balleine, B.W. (2008a). Differential involvement of the basolateral

amygdala and mediodorsal thalamus in instrumental action selection. Journal of

Neuroscience, 28, 4398-4405.

Ostlund, S.B., & Balleine, B.W. (2008b). On habits and addiction: An associative analysis of

compulsive drug seeking. Drug Discovery Today: Disease Models, 5, 235-245.

Packard, M.G., & Knowlton, B.J. (2002). Learning and memory functions of the basal

ganglia. Annual Review of Neuroscience, 25, 563-593.

Parker, J. G., Zweifel, L. S., Clark, J. J., Evans, S. B., Phillips, P. E., & Palmiter, R. D.

(2010). Absence of NMDA receptors in dopamine neurons attenuates dopamine release

but not conditioned approach during Pavlovian conditioning. Proceedings of the

National Academy of Sciences, 107, 13491-13496.

Parkes, S. L., & Balleine, B. W. (2013). Incentive memory: Evidence the basolateral

amygdala encodes and the insular cortex retrieves outcome values to guide choice

between goal-directed actions. Journal of Neuroscience, 33, 8753-8763.

Patel, J. B., & Migler, B. (1982). A sensitive and selective monkey conflict test.

Pharmacology Biochemistry and Behavior, 17, 645-649.

Pattij, T., & Schoffelmeer, A. N. (2015). Serotonin and inhibitory response control: Focusing

on the role of 5-HT1A receptors. European Journal of Pharmacology, 753, 140-145.

Paulus, M. P., Rogalsky, C., Simmons, A., Feinstein, J. S., & Stein, M. B. (2003). Increased

activation in the right insula during risk-taking decision making is related to harm

avoidance and neuroticism. Neuroimage, 19, 1439-1448.

Pavlov, I. P. (1927). Conditioned Reflexes. London: Oxford University Press.

Paxinos, G., & Watson, C. (2007). The rat brain in stereotaxic coordinates. London:

Academic Press.

Pearce, J.M., & Dickinson, A. (1975). Pavlovian counterconditioning: Changing the

suppressive properties of shock by association with food. Journal of Experimental

Psychology: Animal Behavior Processes, 1, 170-177.

Pearce, J. M., & Hall, G. (1978). Overshadowing the instrumental conditioning of a lever-

press response by a more valid predictor of the reinforcer. Journal of Experimental

Psychology: Animal Behavior Processes, 4, 356.

Pellon, R., Ruíz, A., Lamas, E., & Rodríguez, C. (2007). Pharmacological analysis of the

effects of benzodiazepines on punished schedule-induced polydipsia in rats. Behavioral

Pharmacology, 18, 81-87.

Pelloux, Y., Murray, J. E., & Everitt, B. J. (2013). Differential roles of the prefrontal cortical

subregions and basolateral amygdala in compulsive cocaine seeking and relapse after

voluntary abstinence in rats. European Journal of Neuroscience, 38, 3018-3026.

Peroutka, S.J. (1985). Selective interaction of novel anxiolytics with 5-hydroxytryptamine1A

receptors. Biological Psychiatry, 20, 971-979.

Peters, J., & De Vries, T.J. (2013). D-cycloserine administered directly to infralimbic medial

prefrontal cortex enhances extinction memory in sucrose-seeking animals.

Neuroscience, 230, 24-30.

Peters, J., LaLumiere, R.T., & Kalivas, P.W. (2008). Infralimbic prefrontal cortex is

responsible for inhibiting cocaine seeking in extinguished rats. The Journal of

Neuroscience, 28, 6046-6053.

Peters, J., Kalivas, P.W., & Quirk, G.J. (2009). Extinction circuits for fear and addiction

overlap in prefrontal cortex. Learning & Memory, 16, 279-288.

Petry, N. M. (2001). Substance abuse, pathological gambling, and impulsiveness. Drug and

Alcohol Dependence, 63, 29-38.

Phillips, A. G. (1984). Brain reward circuitry: A case for separate systems. Brain Research

Bulletin, 12, 195-201.

Pickens, C. L., Saddoris, M. P., Setlow, B., Gallagher, M., Holland, P. C., & Schoenbaum, G.

(2003). Different roles for orbitofrontal cortex and basolateral amygdala in a reinforcer

devaluation task. Journal of Neuroscience, 23, 11078-11084.

Pollard, G. T., & Howard, J. L. (1979). The Geller-Seifter conflict paradigm with incremental

shock. Psychopharmacology, 62, 117-121.

Pollard, G. T., & Howard, J. L. (1990). Effects of drugs on punished behavior: Pre-clinical

test for anxiolytics. Pharmacology & Therapeutics, 45, 403-424.

Pollard, G. T., Nanry, K. P., & Howard, J. L. (1992). Effects of tandospirone in three

behavioral tests for anxiolytics. European Journal of Pharmacology, 221, 297-305.

Preuschoff, K., Quartz, S.R., & Bossaerts, P. (2008). Human insula activation reflects risk

prediction errors as well as risk. Journal of Neuroscience, 28, 2745-2752.

Prut, L., & Belzung, C. (2003). The open field as a paradigm to measure the effects of drugs

on anxiety-like behaviors: A review. European Journal of Pharmacology, 463, 3-33.

Quina, L. A., Tempest, L., Ng, L., Harris, J. A., Ferguson, S., Jhou, T. C., & Turner, E. E.

(2015). Efferent pathways of the mouse lateral habenula. Journal of Comparative

Neurology, 523, 32-60.

Quirk, G. J., & Sotres-Bayon, F. (2009). Signaling aversive events in the midbrain: Worse

than expected. Neuron, 61, 655-656.

Rabinak, C. A., & Maren, S. (2008). Associative structure of fear memory after basolateral

amygdala lesions in rats. Behavioral Neuroscience, 122, 1284-1294.

Rachlin, H., & Herrnstein, R.J. (1969). Hedonism revisited: On the negative law of effect. In

B.A. Campbell & R.M. Church (Eds.), Punishment and aversive behavior (pp. 83-109).

New York, NY: Appleton-Century-Crofts.

Ragozzino, M. E. (2007). The contribution of the medial prefrontal cortex, orbitofrontal

cortex, and dorsomedial striatum to behavioral flexibility. Annals of the New York

Academy of Sciences, 1121, 355-375.

Rainville, P., Duncan, G. H., Price, D. D., Carrier, B., & Bushnell, M. C. (1997). Pain affect

encoded in human anterior cingulate but not somatosensory cortex. Science, 277, 968-

971.

Ramirez, F., Moscarello, J. M., LeDoux, J. E., & Sears, R. M. (2015). Active avoidance

requires a serial basal amygdala to nucleus accumbens shell circuit. Journal of

Neuroscience, 35, 3470-3477.

Rang, H.P., Dale, M.M., Ritter, J.M., Flower, R.J., & Henderson, G. (2012). Rang & Dale's

pharmacology. Edinburgh: Churchill Livingstone.

Rasmussen, E. B., & Newland, M. C. (2009). Quantification of ethanol's antipunishment

effect in humans using the generalized matching equation. Journal of the Experimental

Analysis of Behavior, 92, 161-180.

Redgrave, P., Prescott, T. J., & Gurney, K. (1999). The basal ganglia: A vertebrate solution to

the selection problem? Neuroscience, 89, 1009-1023.

Rescorla, R. A. (1969). Pavlovian conditioned inhibition. Psychological Bulletin, 72, 77-94.

Rescorla, R. A. (1973). Evidence for "unique stimulus" account of configural conditioning.

Journal of Comparative and Physiological Psychology, 85, 331-338.

Rescorla, R. A. (1991). Associative relations in instrumental learning: The eighteenth Bartlett

memorial lecture. Quarterly Journal of Experimental Psychology, 43, 1-23.

Rescorla, R. A., & Lolordo, V. M. (1965). Inhibition of avoidance behavior. Journal of

Comparative and Physiological Psychology, 59, 406-412.

Rescorla, R. A., & Solomon, R. L. (1967). Two-process learning theory: Relationships

between Pavlovian conditioning and instrumental learning. Psychological Review, 74,

151-182.

Rickels, K. (1982). Buspirone and diazepam in anxiety: A controlled study. Journal of

Clinical Psychiatry, 43, 81-86.

Rogers, J. L., & See, R. E. (2007). Selective inactivation of the ventral hippocampus

attenuates cue-induced and cocaine-primed reinstatement of drug-seeking in rats.

Neurobiology of Learning and Memory, 87, 688-692.

Roitman, M.F., Wheeler, R.A., & Carelli, R.M. (2005). Nucleus accumbens neurons are

innately tuned for rewarding and aversive taste stimuli, encode their predictors, and are

linked to motor output. Neuron, 45, 587-597.

Roitman, M. F., Wheeler, R. A., Wightman, R. M., & Carelli, R. M. (2008). Real-time

chemical responses in the nucleus accumbens differentiate rewarding and aversive

stimuli. Nature Neuroscience, 11, 1376-1377.

Romo, R., & Schultz, W. (1990). Dopamine neurons of the monkey midbrain: Contingencies

of responses to active touch during self-initiated arm movements. Journal of

Neurophysiology, 63, 592-606.

Root, D. H., Mejias-Aponte, C. A., Qi, J., & Morales, M. (2014). Role of glutamatergic

projections from ventral tegmental area to lateral habenula in aversive conditioning.

Journal of Neuroscience, 34, 13906-13910.

Rudolph, U., & Knoflach, F. (2011). Beyond classical benzodiazepines: novel therapeutic

potential of GABAA receptor subtypes. Nature Reviews Drug Discovery, 10, 685-697.

Rudolph, U., & Mohler, H. (2006). GABA-based therapeutic approaches: GABAA receptor

subtype functions. Current Opinion in Pharmacology, 6, 18-23.

Sah, P., Faber, E. L., De Armentia, M. L., & Power, J. (2003). The amygdaloid complex:

anatomy and physiology. Physiological Reviews, 83, 803-834.

Salamone, J. D. (1992). Complex motor and sensorimotor functions of striatal and accumbens

dopamine: Involvement in instrumental behavior processes. Psychopharmacology, 107,

160-174.

Salas, R., Baldwin, P., de Biasi, M., & Montague, P. R. (2010). BOLD responses to negative

prediction errors in human habenula. Frontiers in Human Neuroscience, 4, 36.

Sanger, D. J. (1990). Effects of buspirone and related compounds on suppressed operant

responding in rats. Journal of Pharmacology and Experimental Therapeutics, 254, 420-

426.

Sanger, D. J. (1992). Increased rates of punished responding produced by buspirone-like

compounds in rats. Journal of Pharmacology and Experimental Therapeutics, 261, 513-

517.

Sara, S. J. (2009). The locus coeruleus and noradrenergic modulation of cognition. Nature

Reviews Neuroscience, 10, 211-223.

Schlund, M.W., Siegle, G.J., Ladouceur, C.D., Silk, J.S., Cataldo, M.F., Forbes, E.E., Dahl,

R.E., & Ryan, N.D. (2010). Nothing to fear? Neural systems supporting avoidance

behavior in healthy youths. Neuroimage, 52, 710-719.

Schneider, F., Habel, U., Kessler, C., Posse, S., Grodd, W., & Müller-Gärtner, H. W. (2000).

Functional imaging of conditioned aversive emotional responses in antisocial

personality disorder. Neuropsychobiology, 42, 192-201.

Schoenbaum, G., & Roesch, M. (2005). Orbitofrontal cortex, associative learning, and

expectancies. Neuron, 47, 633-636.

Schoenbaum, G., Roesch, M. R., Stalnaker, T. A., & Takahashi, Y. K. (2009). A new

perspective on the role of the orbitofrontal cortex in adaptive behaviour. Nature Reviews

Neuroscience, 10, 885-892.

Schultz, W. (1998). Predictive reward signal of dopamine neurons. Journal of

Neurophysiology, 80, 1-27.

Schultz, W. (2007a). Behavioral dopamine signals. Trends in Neurosciences, 30, 203-210.

Schultz, W. (2007b). Multiple dopamine functions at different time courses. Annual Review of

Neuroscience, 30, 259-288.

Schultz, W., Apicella, P., & Ljungberg, T. (1993). Responses of monkey dopamine neurons to

reward and conditioned stimuli during successive steps of learning a delayed response

task. Journal of Neuroscience, 13, 900-913.

Schultz, W., Dayan, P., & Montague, P. R. (1997). A neural substrate of prediction and

reward. Science, 275, 1593-1599.

Schultz, W., & Dickinson, A. (2000). Neuronal coding of prediction errors. Annual Review of

Neuroscience, 23, 473-500.

Schultz, W., & Romo, R. (1990). Dopamine neurons of the monkey midbrain: Contingencies

of responses to stimuli eliciting immediate behavioral reactions. Journal of

Neurophysiology, 63, 607-624.

Schuster, R., & Rachlin, H. (1968). Indifference between punishment and free shock:

Evidence for the negative law of effect. Journal of the Experimental Analysis of

Behavior, 11, 777-786.

Seamans, J. K., Lapish, C. C., & Durstewitz, D. (2008). Comparing the prefrontal cortex of

rats and primates: insights from electrophysiology. Neurotoxicity Research, 14, 249-

262.

Sehlmeyer, C., Schöning, S., Zwitserlood, P., Pfleiderer, B., Kircher, T., Arolt, V., & Konrad,

C. (2009). Human fear conditioning and extinction in neuroimaging: a systematic

review. PLoS One, 4, e5865.

Sesack, S. R., Deutch, A. Y., Roth, R. H., & Bunney, B. S. (1989). Topographical

organization of the efferent projections of the medial prefrontal cortex in the rat: An

anterograde tract‐tracing study with Phaseolus vulgaris leucoagglutinin. Journal of

Comparative Neurology, 290, 213-242.

Seymour, B., O'Doherty, J.P., Dayan, P., Koltzenburg, M., Jones, A.K., Dolan, R.J., Friston,

K.J., & Frackowiak, R.S. (2004). Temporal difference models describe higher-order

learning in humans. Nature, 429, 664-667.

Seymour, B., O'Doherty, J.P., Koltzenburg, M., Wiech, K., Frackowiak, R., Friston, K., &

Dolan, R. (2005). Opponent appetitive-aversive neural processes underlie predictive

learning of pain relief. Nature Neuroscience, 8, 1234-1240.

Shabel, S. J., Proulx, C. D., Trias, A., Murphy, R. T., & Malinow, R. (2012). Input to the

lateral habenula from the basal ganglia is excitatory, aversive, and suppressed by

serotonin. Neuron, 74, 475-481.

Shabel, S. J., Proulx, C. D., Piriz, J., & Malinow, R. (2014). GABA/glutamate co-release

controls habenula output and is modified by antidepressant treatment. Science, 345,

1494-1498.

Shen, W., Flajolet, M., Greengard, P., & Surmeier, D. J. (2008). Dichotomous dopaminergic

control of striatal synaptic plasticity. Science, 321, 848-851.

Shettleworth, S. J. (1978). Reinforcement and the organization of behavior in golden

hamsters: Punishment of three action patterns. Learning and Motivation, 9, 99-123.

Shinonaga, Y., Takada, M., & Mizuno, N. (1994). Topographic organization of collateral

projections from the basolateral amygdaloid nucleus to both the prefrontal cortex and

nucleus accumbens in the rat. Neuroscience, 58, 389-397.

Shippenberg, T.S., Bals-Kubik, R., Huber, A., & Herz, A. (1991). Neuroanatomical substrates

mediating the aversive effects of D-1 dopamine receptor antagonists.

Psychopharmacology, 103, 209-214.

Sieghart, W. (2015). Chapter three – Allosteric modulation of GABAA receptors via multiple

drug-binding sites. Advances in Pharmacology, 72, 53-96.

Siegle, G. J., Steinhauer, S. R., Thase, M. E., Stenger, V. A., & Carter, C. S. (2002). Can’t

shake that feeling: Event-related fMRI assessment of sustained amygdala activity in

response to emotional information in depressed individuals. Biological Psychiatry, 51,

693-707.

Sierra-Mercado, D., Padilla-Coreano, N., & Quirk, G. J. (2011). Dissociable roles of

prelimbic and infralimbic cortices, ventral hippocampus, and basolateral amygdala in

the expression and extinction of conditioned fear. Neuropsychopharmacology, 36, 529-

538.

Sikes, R. W., & Vogt, B. A. (1992). Nociceptive neurons in area 24 of rabbit cingulate cortex.

Journal of Neurophysiology, 68, 1720-1732.

Simmons, A., Matthews, S.C., Stein, M.B., & Paulus, M.P. (2004). Anticipation of

emotionally aversive visual stimuli activates right insula. NeuroReport, 15, 2261–2265.

Simmons, A., Strigo, I., Matthews, S.C., Paulus, M.P., & Stein, M.B. (2006). Anticipation of

aversive visual stimuli is associated with increased insula activation in anxiety-prone

subjects. Biological Psychiatry, 60, 402–409.

Skinner, B. F. (1932). On the rate of formation of a conditioned reflex. Journal of General

Psychology, 7, 274-286.

Skinner, B. F. (1938). The behavior of organisms: An experimental analysis. New York, NY:

Appleton-Century.

Skinner, B.F. (1953). Science and human behavior. New York, NY: Macmillan.

Small, D. M., Jones-Gotman, M., & Dagher, A. (2003). Feeding-induced dopamine release in

dorsal striatum correlates with meal pleasantness ratings in healthy human volunteers.

Neuroimage, 19, 1709-1715.

Solomon, R.L. (1964). Punishment. American Psychologist, 19, 239-253.

Sommer, W., Moller, C., Wiklund, L., Thorsell, A., Rimondini, R., Nissbrandt, H., & Heilig,

M. (2001). Local 5,7-dihydroxytryptamine lesions of rat amygdala: Release of

punished drinking, unaffected plus-maze behavior and ethanol consumption.

Neuropsychopharmacology, 24, 430-440.

Spanagel, R., & Weiss, F. (1999). The dopamine hypothesis of reward: Past and current

status. Trends in Neurosciences, 22, 521-527.

Spealman, R. (1979). Comparison of drug effects on responding punished by pressurized air

or electric shock delivery in squirrel monkeys: Pentobarbital, chlordiazepoxide, d-

amphetamine and cocaine. Journal of Pharmacology and Experimental Therapeutics,

209, 309-315.

Sridharan, D., Levitin, D. J., & Menon, V. (2008). A critical role for the right fronto-insular

cortex in switching between central-executive and default-mode networks. Proceedings

of the National Academy of Sciences, 105, 12569-12574.

St. Claire-Smith, R. (1979a). The overshadowing of instrumental conditioning by a stimulus

that predicts reinforcement better than the response. Animal Learning & Behavior, 7,

224-228.

St. Claire-Smith, R. (1979b). The overshadowing and blocking of punishment. Quarterly

Journal of Experimental Psychology, 31, 51-61.

Stamatakis, A. M., & Stuber, G. D. (2012). Activation of lateral habenula inputs to the ventral

midbrain promotes behavioral avoidance. Nature Neuroscience, 15, 1105-1107.

Stamatakis, A. M., Jennings, J. H., Ung, R. L., Blair, G. A., Weinberg, R. J., Neve, R. L., ... &

Stuber, G. D. (2013). A unique population of ventral tegmental area neurons inhibits the

lateral habenula to promote reward. Neuron, 80, 1039-1053.

Stein, L., Wise, C. D., & Berger, B. D. (1973). Antianxiety action of benzodiazepines:

Decrease in activity of serotonin neurons in the punishment system. In S. Garattini, E.

Mussini, & L.O. Randall (Eds.), The Benzodiazepines (pp. 299-326). New York, NY:

Raven Press.

Stein, L., Wise, C. D., & Belluzzi, J. D. (1975). Effects of benzodiazepines on central

serotonergic mechanisms. Advances in Biochemical Psychopharmacology, 14, 29-44.

Stein, L., Belluzzi, J. D., & Wise, C. D. (1977). Benzodiazepines: Behavioral neurochemical

mechanisms. American Journal of Psychiatry, 134, 665-669.

Stoetzer, C., Kistner, K., Stüber, T., Wirths, M., Schulze, V., Doll, T., ... & Leffler, A. (2014).

Methadone is a local anaesthetic-like inhibitor of neuronal Na+ channels and blocks

excitability of mouse peripheral nerves. British Journal of Anaesthesia, aeu206.

Storms, L. H., Boroczi, G., & Broer, W. E. (1963). Effects of punishment as a function of

strain of rat and duration of shock. Journal of Comparative and Physiological

Psychology, 56, 1022-1026.

Stopper, C. M., & Floresco, S. B. (2014). What's better for me? Fundamental role for lateral

habenula in promoting subjective decision biases. Nature Neuroscience, 17, 33-35.

Stuber, G. D., Hnasko, T. S., Britt, J. P., Edwards, R. H., & Bonci, A. (2010). Dopaminergic

terminals in the nucleus accumbens but not the dorsal striatum corelease glutamate.

Journal of Neuroscience, 30, 8229-8233.

Stuber, G. D., Sparta, D. R., Stamatakis, A. M., van Leeuwen, W. A., Hardjoprajitno, J. E.,

Cho, S., ... Bonci, A. (2011). Excitatory transmission from the amygdala to nucleus

accumbens facilitates reward seeking. Nature, 475, 377-380.

Sullivan, J. W., Keim, K. L., & Sepinwall, J. (1983). A preclinical evaluation of buspirone in

neuropharmacologic, EEG, and anticonflict test procedures [Abstract]. Society for

Neuroscience Abstracts, 9, 434.

Suzdak, P.D., Glowa, J.R., Crawley, J.N., Schwartz, R.D., Skolnick, P., & Paul, S.M.A.

(1986). A selective imidazobenzodiazepine antagonist of ethanol in the rat. Science,

234, 1243-1247.

Swanson, L. W. (1978). The anatomical organization of septo-hippocampal projections. In K.

Elliott & J. Whelan (Eds.), Functions of the septo-hippocampal system (pp. 25-48). New

York, NY: Elsevier.

Szczepanski, S. M., & Knight, R. T. (2014). Insights into human behavior from lesions to the

prefrontal cortex. Neuron, 83, 1002-1018.

Tabakoff, B., & Hoffman, P.L. (1996). Alcohol addiction: An enigma among us. Neuron, 16,

909-912.

Talmi, D., & Pine, A. (2012). How costs influence decision values for mixed outcomes.

Frontiers in Neuroscience, 6, 146.

Tan, K. R., Yvon, C., Turiault, M., Mirzabekov, J. J., Doehner, J., Labouèbe, G., ... &

Lüscher, C. (2012). GABA neurons of the VTA drive conditioned place aversion.

Neuron, 73, 1173-1183.

Tarpy, R. M., & Sawabini, F. L. (1974). Reinforcement delay: A selective review of the last

decade. Psychological Bulletin, 81, 984-997.

Taylor, D.P., Allen, L.E., Becker, J.A., Crane, M., Hyslop, D.K., & Riblet L.A. (1984).

Changing concepts of the biochemical action of the anxioselective drug, buspirone.

Drug Development Research, 4, 95-108.

Thomas, B. L., Larsen, N., & Ayres, J. J. (2003). Role of context similarity in ABA, ABC,

and AAB renewal paradigms: Implications for theories of renewal and for treating

human phobias. Learning and Motivation, 34, 410-436.

Thorndike, E. L. (1911). Animal intelligence: Experimental studies. New York, NY:

Macmillan.

Thorndike, E. L. (1913). Educational psychology (Vol. 2). New York, NY: Teachers

College, Columbia University.

Thorndike, E.L. (1932). The fundamentals of learning. New York, NY: Teachers College,

Columbia University.

Tobler, P. N., Dickinson, A., & Schultz, W. (2003). Coding of predicted reward omission by

dopamine neurons in a conditioned inhibition paradigm. Journal of Neuroscience, 23,

10402-10410.

Tobler, P. N., Fiorillo, C. D., & Schultz, W. (2005). Adaptive coding of reward value by

dopamine neurons. Science, 307, 1642-1645.

Tomaiuolo, M., Gonzalez, C., Medina, J.H., & Piriz, J. (2014). Lateral habenula determines

long-term storage of aversive memories. Frontiers in Behavioral Neuroscience, 8, 170.

Tornatzky, W., & Miczek, K.A. (1995). Alcohol, anxiolytics and social stress in rats.

Psychopharmacology, 121, 135-144.

Torrubia, R., Avila, C., Moltó, J., & Caseras, X. (2001). The Sensitivity to Punishment and

Sensitivity to Reward Questionnaire (SPSRQ) as a measure of Gray's anxiety and

impulsivity dimensions. Personality and Individual Differences, 31, 837-862.

Tran‐Tu‐Yen, D. A., Marchand, A. R., Pape, J. R., Di Scala, G., & Coutureau, E. (2009).

Transient role of the rat prelimbic cortex in goal‐directed behaviour. European Journal

of Neuroscience, 30, 464-471.

Ugedo, L., Grenhoff, J., & Svensson, T. H. (1989). Ritanserin, a 5-HT2 receptor antagonist,

activates midbrain dopamine neurons by blocking serotonergic inhibition.

Psychopharmacology, 98, 45-50.

Ullsperger, M., & von Cramon, D. Y. (2003). Error monitoring using external feedback:

Specific roles of the habenular complex, the reward system, and the cingulate motor

area revealed by functional magnetic resonance imaging. Journal of Neuroscience, 23,

4308-4314.

Ungless, M. A., Magill, P. J., & Bolam, J. P. (2004). Uniform inhibition of dopamine neurons

in the ventral tegmental area by aversive stimuli. Science, 303, 2040-2042.

Ursin, H. (1976). Inhibition and the septal nuclei: Breakdown of the single concept model.

Acta Neurobiologiae Experimentalis, 36, 91-115.

Uylings, H. B., Groenewegen, H. J., & Kolb, B. (2003). Do rats have a prefrontal cortex?

Behavioural Brain Research, 146, 3-17.

van Haaren, F., & Anderson, K. (1997). Effects of chlordiazepoxide, buspirone and cocaine

on behavior suppressed by timeout. Behavioral Pharmacology, 8, 174-182.

van Meel, C. S., Heslenfeld, D. J., Oosterlaan, J., Luman, M., & Sergeant, J. A. (2011). ERPs

associated with monitoring and evaluation of monetary reward and punishment in

children with ADHD. Journal of Child Psychology and Psychiatry, 52, 942-953.

Vertes, R.P. (2004). Differential projections of the infralimbic and prelimbic cortex in the rat.

Synapse, 51, 32-58.

Vidal-Gonzalez, I., Vidal-Gonzalez, B., Rauch, S. L., & Quirk, G. J. (2006). Microstimulation

reveals opposing influences of prelimbic and infralimbic cortex on the expression of

conditioned fear. Learning & Memory, 13, 728-733.

Vogel, J. R., Beer, B., & Clody, D. E. (1971). A simple and reliable conflict procedure for

testing anti-anxiety agents. Psychopharmacologia, 21, 1-7.

Vogel, R. A., Frye, G. D., Wilson, J. H., Kuhn, C. M., Koepke, K. M., Mailman, R. B., ... &

Breese, G. R. (1980). Attenuation of the effects of punishment by ethanol: Comparisons

with chlordiazepoxide. Psychopharmacology, 71, 123-129.

Volkow, N. D., Fowler, J. S., Wang, G. J., Swanson, J. M., & Telang, F. (2007). Dopamine in

drug abuse and addiction: Results of imaging studies and treatment implications.

Archives of Neurology, 64, 1575-1579.

Volkow, N. D., Wang, G. J., Kollins, S. H., Wigal, T. L., Newcorn, J. H., Telang, F., ... &

Swanson, J. M. (2009). Evaluating dopamine reward pathway in ADHD: Clinical

implications. Journal of the American Medical Association, 302, 1084-1091.

Voorn, P., Vanderschuren, L. J., Groenewegen, H. J., Robbins, T. W., & Pennartz, C. M.

(2004). Putting a spin on the dorsal–ventral divide of the striatum. Trends in

Neurosciences, 27, 468-474.

Wächter, T., Lungu, O. V., Liu, T., Willingham, D. T., & Ashe, J. (2009). Differential effect

of reward and punishment on procedural learning. Journal of Neuroscience, 29, 436-

443.

Wall, N.R., De La Parra, M., Callaway, E.M., & Kreitzer, A.C. (2013). Differential

innervation of direct- and indirect-pathway striatal projection neurons. Neuron, 79, 347-

360.

Wallis, J. D. (2012). Cross-species studies of orbitofrontal cortex and value-based decision-

making. Nature Neuroscience, 15, 13-19.

Wallner, M., Hanchar, H.J., & Olsen, R.W. (2014). Alcohol selectivity of β3-containing

GABAA receptors: evidence for a unique extracellular alcohol/imidazobenzodiazepine

Ro15-4513 binding site at the α+β- subunit interface in αβ3δ GABAA receptors.

Neurochemical Research, 39, 1118-1126.

Weber, S., Habel, U., Amunts, K., & Schneider, F. (2008). Structural brain abnormalities in

psychopaths—A review. Behavioral Sciences & the Law, 26, 7-28.

Wettstein, J. G. (1988). Behavioral effects of acute and chronic buspirone. European Journal

of Pharmacology, 151, 341-344.

Whitteridge, D. (1948). The role of acetylcholine in synaptic transmission: A critical review.

Journal of Neurology, Neurosurgery and Psychiatry, 11, 134-140.

Wickens, J. (2008). Toward an anatomy of disappointment: Reward-related signals from the

globus pallidus. Neuron, 60, 530-531.

Wiech, K., & Tracey, I. (2013). Pain, decisions, and actions: A motivational perspective.

Frontiers in Neuroscience, 7, 46.

Wiech, K., Lin, C. S., Brodersen, K. H., Bingel, U., Ploner, M., & Tracey, I. (2010). Anterior

insula integrates information about salience into perceptual decisions about pain.

Journal of Neuroscience, 30, 16324-16331.

Wightman, R.M., & Robinson, D. L. (2002). Transient changes in mesolimbic dopamine and

their association with ‘reward’. Journal of Neurochemistry, 82, 721–735.

Williams, B. A. (1989). The effects of response contingency and reinforcement identity on

response suppression by alternative reinforcement. Learning and Motivation, 20, 204-

224.

Willcocks, A. L., & McNally, G. P. (2013). The role of medial prefrontal cortex in extinction

and reinstatement of alcohol‐seeking in rats. European Journal of Neuroscience, 37,

259-268.

Witkin, J. M. (2002). Some contextual and historical determinants of the effects of

chlordiazepoxide on punished responding of rats. Psychopharmacology, 163, 488-494.

Witkin, J. M., & Barrett, J. E. (1976). Effects of pentobarbital on punished behavior at

different shock intensities. Pharmacology Biochemistry and Behavior, 5, 535-538.

Witkin, J. M., Morrow, D., & Li, X. (2004). A rapid punishment procedure for detection of

anxiolytic compounds in mice. Psychopharmacology, 172, 52-57.

Wise, C. D., Berger, B. D., & Stein, L. (1970). Serotonin: a possible mediator of behavioral

suppression induced by anxiety. Diseases of the Nervous System, 31, 34-37.

Wise, R. A. (1980). The dopamine synapse and the notion of ‘pleasure centers’ in the brain.

Trends in Neurosciences, 3, 91-95.

Wise, R. A. (2004). Dopamine, learning and motivation. Nature Reviews Neuroscience, 5,

483-494.

Wise, R. A., & Rompré, P. P. (1989). Brain dopamine and reward. Annual Review of

Psychology, 40, 191-225.

Wu, Y.H., Rayburn, J.W., Allen, L.E., Ferguson H.C., & Kissel, J.W. (1972). Psychosedative

agents. 2. 8-(4-Substituted 1-piperazinylalkyl)-8-azaspiro(4.5)decane-7,9-

diones. Journal of Medicinal Chemistry, 15, 477–479.

Xia, M., Salata, J. J., Figueroa, D. J., Lawlor, A. M., Liang, H. A., Liu, Y., & Connolly, T. M.

(2004). Functional expression of L- and T-type Ca2+ channels in murine HL-1 cells.

Journal of Molecular and Cellular Cardiology, 36, 111-119.

Yang, Y., Raine, A., Narr, K. L., Colletti, P., & Toga, A. W. (2009). Localization of

deformations within the amygdala in individuals with psychopathy. Archives of General

Psychiatry, 66, 986-994.

Yin, H. H., & Knowlton, B. J. (2006). The role of the basal ganglia in habit formation. Nature

Reviews Neuroscience, 7, 464-476.

Yin, H.H., Ostlund, S.B., Knowlton, B.J., & Balleine, B.W. (2005a). The role of the

dorsomedial striatum in instrumental conditioning. European Journal of Neuroscience,

22, 513–523.

Yin, H.H., Knowlton, B.J., & Balleine, B.W. (2005b). Blockade of NMDA receptors in the

dorsomedial striatum prevents action-outcome learning in instrumental conditioning.

European Journal of Neuroscience, 22, 505–512.

Yin, H. H., Ostlund, S. B., & Balleine, B. W. (2008). Reward‐guided learning beyond

dopamine in the nucleus accumbens: The integrative functions of cortico‐basal ganglia

networks. European Journal of Neuroscience, 28, 1437-1448.

Yizhar, O., Fenno, L. E., Davidson, T. J., Mogri, M., & Deisseroth, K. (2011). Optogenetics

in neural systems. Neuron, 71, 9-34.

Young, R., Urbancic, A., Emrey, T. A., Hall, P. C., & Metcalf, G. (1987). Behavioral effects

of several new anxiolytics and putative anxiolytics. European Journal of

Pharmacology, 143, 361-371.

Zalla, T., Koechlin, E., Pietrini, P., Basso, G., Aquino, P., Sirigu, A., & Grafman, J. (2000).

Differential amygdala responses to winning and losing: A functional magnetic

resonance imaging study in humans. European Journal of Neuroscience, 12, 1764-

1770.

Zhang, W., Schneider, D. M., Belova, M. A., Morrison, S. E., Paton, J. J., & Salzman, C. D.

(2013). Functional circuits and anatomical distribution of response properties in the

primate amygdala. Journal of Neuroscience, 33, 722-733.

Zuo, W., Chen, L., Wang, L., & Ye, J. H. (2013). Cocaine facilitates glutamatergic

transmission and activates lateral habenular neurons. Neuropharmacology, 70, 180-189.

Zweifel, L. S., Parker, J. G., Lobb, C. J., Rainwater, A., Wall, V. Z., Fadok, J. P., ... &

Palmiter, R. D. (2009). Disruption of NMDAR-dependent burst firing by dopamine

neurons provides selective assessment of phasic dopamine-dependent behavior.

Proceedings of the National Academy of Sciences, 106, 7281-7288.
