Review | Cognition and Behavior The Role of the Lateral Habenula in Inhibitory Learning from Reward Omission Experiences https://doi.org/10.1523/ENEURO.0016-21.2021
Cite as: eNeuro 2021; 10.1523/ENEURO.0016-21.2021 Received: 12 January 2021 Revised: 26 March 2021 Accepted: 30 March 2021
This Early Release article has been peer-reviewed and accepted, but has not been through the composition and copyediting processes. The final version may differ slightly in style or formatting and will contain links to any extended data.
Copyright © 2021 Sosa et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license, which permits unrestricted use, distribution and reproduction in any medium provided that the original work is properly attributed.
1. Manuscript Title (50 word maximum): The Role of the Lateral Habenula in Inhibitory Learning from Reward Omission Experiences
2. Abbreviated Title (50 character maximum): Lateral Habenula and Inhibitory Learning
3. List all Author Names and Affiliations in order as they would appear in the published article:

Rodrigo Sosa. Universidad Panamericana (corresponding author).
Jesús Mata-Luévanos. Universidad Iberoamericana.
Mario Buenrostro-Jáuregui. Universidad Iberoamericana (co-corresponding author).
4. Author Contributions: RS conceived the idea and outlined the paper; RS, JM-L, and MB-J wrote the paper to equal extent; RS and MB-J worked out the figures; JM-L formatted the manuscript.
5. Correspondence should be addressed to (include email address): Rodrigo Sosa ([email protected]) or Mario Buenrostro-Jáuregui ([email protected]).
6. Number of Figures: 5
7. Number of Tables: 0
8. Number of Multimedia: 0
9. Number of words for Abstract: 203
10. Number of words for Significance Statement: 119
11. Number of words for Introduction: 523
12. Number of words for Discussion: 925
13. Acknowledgements: We thank División de Investigación y Posgrado of the Universidad Iberoamericana Ciudad de México and the Fondo Fomento a la Investigación of the Universidad Panamericana for providing funding for production expenses.
14. Conflict of Interest: The authors declare no competing financial interests.
15. Funding sources: Sistema Nacional de Investigadores, Consejo Nacional de Ciencia y Tecnología.
The Role of the Lateral Habenula in Inhibitory Learning from Reward Omission Experiences

Abstract
The lateral habenula (LHb) is a phylogenetically primitive brain structure that plays a key role in learning to inhibit distinct responses to specific stimuli. This structure is activated by primary aversive stimuli, cues predicting an imminent aversive event, unexpected reward omissions, and cues associated with the omission of an expected reward. The most widely described effect of LHb activation is acutely suppressing midbrain dopaminergic signaling. However, recent studies have identified multiple means by which the LHb fosters this effect, as well as other mechanisms of action. These findings reveal the complex nature of LHb function. The present paper reviews the role of this structure in learning from reward omission experiences. We approach this topic from the perspective of computational models of behavioral change that account for inhibitory learning, using them to frame key findings. Such findings are drawn from recent behavioral neuroscience studies that use novel brain imaging, stimulation, ablation, and reversible inactivation techniques. Further research and conceptual work are needed to clarify the nature of the processes related to updating motivated behavior in which the LHb is involved. As yet, there is little understanding of whether such mechanisms are parallel or complementary to the well-known modulatory function of the more recently evolved prefrontal cortex.
Keywords: dopamine signaling, conditioned inhibition, inhibitory learning, mesolimbic pathway, mesocortical pathway, negative reward prediction error
Significance Statement

The lateral habenula is a brain structure that has received a great deal of attention in the past decades, and this research field has been extensively reviewed. Here, we review in detail some key recent findings that are pivotal in framing the role of the lateral habenula in a well-described associative learning phenomenon: conditioned inhibition. This specific topic has not been considered deeply enough in previous review articles. We also outline the possible mechanisms by which the lateral habenula updates behavior by means of two identified pathway categories (inhibitory and excitatory). This provides a comprehensive account potentially embracing more issues than previously thought, refines our understanding of multiple reward-related mechanisms, and raises novel research questions.
Introduction

The lateral habenula (LHb) is a phylogenetically preserved brain structure located on the dorsomedial surface of the thalamus that participates in learning from aversive (i.e., undesired) experiences. These include primary aversive stimuli, reward omission, and cues associated with either aversive stimuli or reward omission (Baker et al., 2016). However, recent research suggests that the LHb is not as crucial for learning from primary aversive experiences (see Li et al., 2019b) as it is for learning from reward omission.
The LHb has been characterized as part of a “brake” mechanism that suppresses firing in midbrain dopaminergic neurons (Barrot et al., 2012; Vento & Jhou, 2020). This effect is achieved mainly by exciting GABAergic neurons in the rostromedial tegmental nucleus (RMTg) that reach the ventral tegmental area (VTA) and the substantia nigra pars compacta (SNc) (Jhou, Geisler, Marinelli, Degarmo, & Zahm, 2009). However, direct glutamatergic excitation of GABAergic interneurons that synapse with dopamine neurons in the VTA also takes place (Omelchenko and Sesack, 2009). In both cases, the net effect is to suppress dopamine release in the nucleus accumbens (NAc; by VTA endings) and the dorsal striatum (by SNc endings). This would disrupt reward-related motor activity and reward-related plastic changes associated with midbrain dopamine release (Tsutsui-Kimura et al., 2020). A third, recently discovered pathway (Lammel et al., 2012) involves LHb glutamatergic projections reaching dopamine neurons in the VTA that, in turn, target the medial prefrontal cortex (mPFC; see Figure 1). mPFC activity has been associated with aversive learning (Huang et al., 2019) and the capacity to behave congruently with hierarchically arranged stimulus sets (Roughley and Killcross, 2019).
***INSERT FIGURE 1***
Numerous reviews of the physiology of the LHb have already been published; most of them cover the participation of the LHb in primary aversive learning, stressing its putative implications for psychiatric conditions (e.g., Hu, Cui, & Yang, 2020). Other settings that involve the LHb in non-primary aversive situations, such as tasks that require behavioral flexibility, have also been reviewed (e.g., Baker et al., 2016). In this review, we discuss the experimental findings on a third front: the role of the LHb in reward-omission learning, which is known to involve negative reward prediction errors. In particular, we explore how such findings could be framed in terms of models of behavioral change. This might be informative for a research agenda aiming to unveil the details of the adaptive mechanisms in which the LHb participates. In addition, it may aid in understanding maladaptive behaviors that arise from the disruption of brain regions in the extended network in which the LHb is embedded. We conceive learning from reward omission as being potentially integrated with more widely studied phenomena involving LHb function (i.e., primary aversive conditioning and behavioral flexibility). On one hand, reward-omission experiences imbue preceding cues with properties that are functionally equivalent to those of cues associated with primary aversive stimuli. On the other hand, learning in tasks requiring behavioral flexibility involves committing errors (Baker, Raynor, Francis, & Mizumori, 2017), and this often entails the omission of otherwise expected rewards. Therefore, we hope that understanding one of these artificially delineated fields will help to elucidate the mechanisms involved in the others.
1. Modeling the Dynamics of Acquisition and Extinction of Responding in Reward Learning
Learning can be roughly regarded as a process in which behavior is updated as an organism faces environmental regularities. Although learning is a continuous process (Sutton & Barto, 2018), it is sometimes methodologically and analytically useful to discretize it into trials (Harris, 2019). Trials are fractions of time in which explicit experiences are assumed to promote specific changes in future behavior. Rescorla and Wagner (1972) proposed a model accounting for the associative strength of a target stimulus, given its pairings with an affectively significant event, on a trial-by-trial basis. Such an associative strength or value could represent a theoretical estimate of either the magnitude or the probability of responding to the target stimulus on the next trial. While “responding” was originally intended to reflect a change in overt outcome-anticipating behavior, it may also represent a neural-level state change, such as the firing of dopamine neurons (see Roesch, Esber, Li, Daw, and Schoenbaum, 2012). This model states that a target stimulus gains or loses associative strength according to a learning rule. The change in this value on a given trial depends on the discrepancy between the current value of the target stimulus and that of the outcome that follows (e.g., presentation or absence of a reward). Formally:
ΔVX = α(λ − ΣVN)     (Equation 1)
where ΔVX represents the change in the associative strength (V) of the target stimulus (or action; X) on a given trial, λ represents the magnitude of reward (zero if the reward is omitted), ΣVN represents the sum of the values of all cues that are present during a conditioning trial (usually accounted for by a few of those cues), and α, which is bound between 0 and 1, represents a learning rate parameter. To show how the model operates, let us first consider the simplest possible example. This requires assuming that, during conditioning trials, the only relevant input besides the reward is a single cue (so that VX ≈ ΣVN). If λ (> 0) and α are held constant, VX will increase toward λ throughout conditioning trials (e.g., pairings of the stimulus with reward) in a decelerated fashion, as the discrepancy between these parameters (i.e., λ − ΣVN) progressively declines (see Figure 2, left panel). Hence, the absolute value of the parenthetical term in Equation 1 determines the amount of behavioral change on the next trial of the same type. The discrepancy between expected and experienced outcomes has been termed prediction error and can be regarded as a theoretical teaching signal that alters some aspect of the system to update behavior. Prediction errors can be classified as appetitive (reward-related) or aversive, and as positive or negative (Iordanova, Yau, McDannald, & Corbit, 2021). In addition, prediction errors seem to dictate both Pavlovian (i.e., stimulus-outcome) and instrumental (i.e., action-outcome) learning through a homologous correction mechanism (Bouton, Thrailkill, Trask, & Alfaro, 2020; see also Eder, Rothermund, De Houwer, & Hommel, 2015).
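The trial-by-trial dynamics described above can be sketched with a minimal simulation of Equation 1 for a single cue. The parameter values (α = 0.2, λ = 1, trial counts) are arbitrary choices for illustration, not values taken from any study:

```python
# Minimal simulation of the Rescorla-Wagner rule (Equation 1) for a single
# cue X, assuming VX ~ sum(VN). All parameter values are illustrative.

def rw_update(v, lam, alpha=0.2):
    """One application of Equation 1: dVX = alpha * (lam - sum(VN))."""
    return v + alpha * (lam - v)

def simulate(n_acq=30, n_ext=30, lam=1.0):
    v, history = 0.0, []
    for _ in range(n_acq):          # acquisition: cue paired with reward
        v = rw_update(v, lam)
        history.append(v)
    for _ in range(n_ext):          # extinction: reward omitted (lam = 0)
        v = rw_update(v, 0.0)
        history.append(v)
    return history

curve = simulate()
# VX climbs toward lam in a decelerating fashion and then decays back
# toward zero once the reward is consistently omitted.
```

Note that VX never goes below zero here: with a single cue, Equation 1 can only move VX between 0 and λ, which anticipates the point developed in the section on net inhibitory effects.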
***INSERT FIGURE 2***
At the physiological level, canonical midbrain dopamine neurons fire above their baseline activity upon unexpected rewards (i.e., positive reward prediction error; λ > ΣVN). Such dopamine bursts increase the probability of occurrence of the actions that took place beforehand whenever the organism faces a similar situation in the future (i.e., positive reinforcement). In addition, dopamine bursts backpropagate to reliable signals of reward, making them acquire signaling (i.e., discrimination learning) and rewarding (i.e., conditioned reinforcement) properties in themselves (Sutton & Barto, 2018). Phasic dopamine activity (as in VX > 0) is triggered by inputs received by midbrain neurons from multiple nodes in a brain-wide network (Tian et al., 2016). Deceleration in the increase of associative strength with repeated paired presentations of the stimulus and the reward would require an antagonistic mechanism. Inhibitory NAc-VTA projections have been hypothesized to play some role in this mechanism (see Mollick et al., 2020).
Once a cue or action has acquired some associative strength, the repeated omission of reward after its presentation allows behavior to return to the initial non-responsive state. Reward omissions following the presentation of an already conditioned stimulus (or whenever λ is smaller than ΣVN; i.e., negative reward prediction error) trigger a phasic decrease (or dip) in dopamine activity (e.g., Matsumoto and Hikosaka, 2007; Schultz, Dayan, and Montague, 1997). If repeated consistently, these dopamine dips will decrease dopamine firing upon the target stimulus in subsequent trials. This translates into a decrease in both conditioned responses and reinforcing effects associated with the target stimulus until a minimum value is reached (see Figure 2, right panel). Here, the procedure, process, and outcome [Footnote 1: Note that the minimal response to the target stimulus observed after extinction procedures is quite labile, and responding is often susceptible to bounce back after a slight contextual change (Bouton, Maren, & McNally, 2020). This is not accounted for by the Rescorla–Wagner model.] are each referred to by the term ‘extinction’. Suppression of midbrain dopamine neurons induced by LHb activity plays a key role in this (context-dependent; see Footnote 1) restoration mechanism. For example, a recent study by Donaire et al. (2019) found that LHb lesions impair extinction of an appetitive response in rats. Similarly, Zapata, Hwang, and Lupica (2017) found that pharmacological inhibition of the LHb impaired extinction of responses that were previously rewarded in the presence of a stimulus. However, this effect was selective for cocaine reward and did not impair extinction of responding previously rewarded with a sucrose solution. This result was interpreted as indicative of the greater difficulty of withholding responses rewarded with cocaine compared with those rewarded with sucrose. Both findings reveal the participation of the LHb in decreasing previously acquired behavior when a stimulus or action is no longer followed by a reward (i.e., extinction).
2. Inducing, Testing, and Modeling Net Inhibitory Effects
When conditioning trials are conceptualized as if a single cue were relevant for reward learning (such that VX = ΣVN), obtaining a negative value for VX with Equation 1 is logically impossible. As VX would range from zero to λ, it can only vary in its degree of “rewardingness” (for lack of a better term). That is, one could conceive of a stimulus as more or less rewarding than another, but hardly as more aversive in a general sense. However, a distinctive feature of the Rescorla–Wagner model allows it to capture both net negative associative strength values and the acquisition of opposed affective effects. This feature is important because it captures aspects of key conditioning phenomena.
A neutral stimulus that is consistently paired with the omission of an expected affectively significant event often acquires the opposite affective valence. In this case, a stimulus paired with the omission of an expected reward would acquire aversive properties (Wasserman, Franklin, and Hearst, 1974). This means that the organism will learn to avoid it (either actively or passively) or escape from it. On the other hand, the negative summation effect occurs when a stimulus associated with the omission of reward is presented simultaneously with one that reliably predicts the reward. The usual result is that responding to the compound stimulus is reduced compared with that controlled by the reward-predicting stimulus alone (Acebes et al., 2012; Harris, Kwok, & Andrew, 2014). The model of Rescorla and Wagner (1972) allows for partitioning of the elements that constitute a compound stimulus; concretely, it provides a rule to determine how the associative strength values of different stimuli presented simultaneously interact trial by trial. Recall that ΔVX represents the change in the associative strength of a single element, X, of the assemblage of current stimuli, N. Whenever ΣVN exceeds the value of λ (as in trials involving reward omission), VX can take negative values under appropriate conditions. For X to end up with a net negative associative strength following reward omission, (1) its current associative strength must be sufficiently low and (2) the remaining coextensive stimuli must have some associative value (see Figure 3).
***INSERT FIGURE 3***
In the Rescorla–Wagner model, the magnitude of responses evoked by ΣVN is assumed to be a sort of algebraic sum of the values of the current stimuli (obviating biological limits). Crucially, this model assumes that the context (cues that are continually present) can acquire associative strength and is thus capable of meaningfully impacting learning. For example, presenting X without the reward and the reward without X, with time intervals in between, would imbue that stimulus with net inhibitory properties (i.e., VX < 0). This protocol is known as the explicitly unpaired procedure. In such a condition, the context, C, acquires a positive associative strength (VC > 0) by being present when the reward is delivered. Importantly, the context is also present during the trials involving the X stimulus plus the omission of reward, so that ΣVN equals VC + VX (Wagner & Rescorla, 1972). Then, ΔVX will be negative on every such trial, and VX will progressively accrue a negative value. This would manifest as aversion-related behaviors towards X (see Wasserman et al., 1974) and as a subtraction of the response magnitude evoked by a reward-predicting stimulus whenever X is presented (i.e., X ∈ ΣVN < X ∉ ΣVN). Such outcomes are justified by two assumptions. First, a negative VX implies a subtraction of the response-evoking potential of concurrent stimuli (Wagner & Rescorla, 1972). Second, negative associative values imply that a stimulus has an affective valence that opposes that of the outcome with which it was trained (Daly and Daly, 1982).
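The role of the context in the explicitly unpaired procedure can be sketched with the compound form of Equation 1, in which every stimulus present on a trial is updated with the same shared error term. The learning rate, reward magnitude, and trial counts below are arbitrary illustrative choices:

```python
# Sketch of the explicitly unpaired procedure: reward is delivered in the
# context C alone, and X is presented in C without reward. Every stimulus
# present on a trial shares the common error term alpha * (lam - sum(VN)).

def compound_trial(values, present, lam, alpha=0.2):
    """Update each stimulus in `present` with the shared prediction error."""
    error = lam - sum(values[s] for s in present)
    for s in present:
        values[s] += alpha * error

V = {"C": 0.0, "X": 0.0}   # context C and target stimulus X
for _ in range(100):
    compound_trial(V, ["C"], lam=1.0)        # reward in the context alone
    compound_trial(V, ["C", "X"], lam=0.0)   # X plus context, reward omitted
# The context ends with positive strength (VC > 0), while X accrues a net
# negative value (VX < 0), as described in the text.
```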
A convenient way to conceptually and empirically instantiate these attributes of the Rescorla–Wagner model is the feature-negative discrimination protocol [Footnote 2: This paradigm is more broadly known as Pavlovian conditioned inhibition; however, “conditioned inhibition” is also used to designate the process that the target stimulus undergoes (here termed “inhibitory learning”), as well as the empirical demonstration that such a process has taken place (Savastano, Cole, Barnet, and Miller, 1999). Therefore, here we use the term “feature-negative discrimination” for the procedure and “inhibitory learning” for the process in order to avoid ambiguity.]. This technique consists of presenting two types of conditioning trials, usually in a randomly alternated fashion. One type of trial consists of presenting a single stimulus, A, followed by an affective outcome (e.g., reward). The other type of trial consists of the same stimulus accompanied by another stimulus, X, followed by the omission of that outcome. Such a manipulation, according to the Rescorla–Wagner model, leads stimulus A to acquire a positive associative strength and stimulus X to acquire a deeply negative associative strength (see Figure 4). This hypothetical attribute is known as conditioned inhibition, and Rescorla (1969) stated that it should be demonstrated empirically using two special tests. One of these tests is the above-described negative summation effect, and the other is retardation in the acquisition of a conditioned response by the target stimulus. Also, as stated above, conditioned inhibitors trained with an outcome of a particular affective valence have been documented to acquire the opposite valence (i.e., appetitive to aversive and vice versa).
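Under the same compound update rule, feature-negative discrimination can be sketched as follows. The parameters are again arbitrary illustrative choices, and the separately trained predictor B (assumed to be at asymptote) is a hypothetical stimulus introduced only to illustrate the negative summation test:

```python
# Sketch of feature-negative discrimination under the Rescorla-Wagner
# compound rule: A alone is rewarded, the compound AX is not. Parameter
# values and the separately trained predictor B are illustrative assumptions.

def compound_trial(values, present, lam, alpha=0.2):
    """Update each stimulus in `present` with the shared prediction error."""
    error = lam - sum(values[s] for s in present)
    for s in present:
        values[s] += alpha * error

V = {"A": 0.0, "X": 0.0}
for _ in range(100):
    compound_trial(V, ["A"], lam=1.0)        # trial type 1: A alone -> reward
    compound_trial(V, ["A", "X"], lam=0.0)   # trial type 2: A + X -> omission
# A ends strongly positive and X deeply negative (cf. Figure 4).

# Negative summation test: compound X with a hypothetical, separately
# trained reward predictor B whose value is assumed to be at asymptote.
VB = 1.0
predicted_compound = VB + V["X"]   # near zero: X subtracts responding to B
```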
***INSERT FIGURE 4***

3. Dopamine Dips Propagate from Reward Omission to Events that Predict it
The omission of an expected reward seems to promote a backpropagation process similar to the one that takes place when a cue predicts reward. Interestingly, the LHb is likely to play a crucial role in this process (see also Mollick et al., 2020). As mentioned above, reward omission triggers a phasic depression in dopamine release. In turn, this promotes a decrease in dopamine release (and its behavioral outcomes) towards cues that predict reward omissions. This implies that dopamine dips promote a plastic change in the behaving agent that is opposite to that involved in reward. The clearest illustrations of this claim are provided by two complementary experiments described below.
First, Tobler, Dickinson, and Schultz (2003) exposed macaque monkeys to a feature-negative discrimination paradigm (see section above). In this study, the presentation of a cue alone predicted reward delivery. The presentation of the same cue accompanied by another stimulus, the conditioned inhibitor, predicted reward omission. Through training, the conditioned inhibitor acquired the capacity to counteract the effects of the companion cue on reward omission trials (see Figure 4; gray circles). This tendency was also observed when the conditioned inhibitor was presented along with a different reward-predicting cue in a novel compound (i.e., negative summation). Notably, those effects were reported at both the behavioral (licking response) and neural (dopamine neuron activity) levels of observation. Most importantly, the presentation of the conditioned inhibitor alone, while behaviorally silent, was capable of decreasing baseline dopamine firing levels. This revealed that reward omission promotes a transfer of function from the situation in which the reward is actually omitted to the cue that consistently predicts it.
The second experiment was conducted recently by Chang, Gardner, Conroy, Whitaker, and Schoenbaum (2018), using rats as subjects. In this study, brief dopamine dips were artificially induced in midbrain neurons following the simultaneous presentation of a reward-predicting cue and a novel cue. After training, the novel cue had acquired the properties of a conditioned inhibitor, as per the conventional tests of negative summation and retardation of acquisition. Crucially, in the training trials in which this novel cue was involved, the reward was not omitted. Thus, the inhibitory effects on the target cue must have arisen from the artificial dips in dopamine firing, complementing the findings of Tobler et al. (2003). On one hand, Tobler et al. (2003) demonstrated that dopamine dips can propagate from the omission of reward to the presentation of an arbitrary event; on the other hand, Chang et al. (2018) demonstrated that the dopamine dips themselves are sufficient for this to occur.
As we mentioned above, one of the most robust physiological effects of LHb activation is producing phasic dips in dopamine firing. Therefore, both Tobler et al.'s and Chang et al.'s findings are relevant to elucidating the function of LHb physiology in learning from experiences involving reward omission. In another recent study, Mollick et al. (2021) set out to test this notion with humans in an fMRI study. The aim was to replicate Tobler et al.'s (2003) behavioral protocol while measuring activity in different regions across the whole brain (not only in midbrain dopamine cells). Mollick et al. indeed captured greater LHb activity during the presentation of a conditioned inhibitor compared to control stimuli; however, this difference did not survive correction for multiple comparisons. Therefore, the authors recommend further replication before any inference can be made. While an earlier human fMRI study (Salas et al., 2010) already documented that the habenula is activated when an expected reward is omitted (i.e., negative reward prediction error), evidence for the backpropagation of this effect remains elusive. A possible explanation for the null result reported by Mollick et al. (2021) is that the human LHb is quite small and cannot be differentiated from the medial habenula at standard fMRI resolutions (cf. Salas et al., 2010). Similarly, the study by Mollick et al. (2021) failed to capture decreased activity in midbrain dopamine regions during reward omissions. The authors argued that their fMRI spatial resolution may also have precluded the distinction between these regions and the adjacent GABAergic RMTg (which was presumably active as well during reward omissions).
4. Concurrent Reward Cues Contribute to Dopamine Dips and their Propagation
The above section concluded that dopamine dips may function as “teaching” signals that gradually imbue a stimulus with negative associative strength. The Rescorla–Wagner model states that the negative associative strength accrued by a stimulus that is paired with reward omission depends on concurrent reward cues (see Figure 3). Therefore, if we assume that dopamine dips depend on LHb activation, the latter would rely on current information about an impending reward. That is, if a stimulus is followed by the omission of reward, its associative strength would not change unless it is coextensive with reward-related cues. Therefore, LHb activation might not be induced by the sheer non-occurrence of reward. Rather, the LHb may become active when this non-occurrence coincides with signals from other brain regions indicating probable reward. Furthermore, the greater the signal for reward, the larger the downshift in associative strength of (and the aversion imbued to) the stimulus paired with reward omission (see Figure 3). Accordingly, dopamine dips induced by reward omission would be proportional to the probability (or magnitude) of the reward associated with a conditioned stimulus. Tian and Uchida (2015) found evidence supporting this assumption using mice as subjects and three different probabilities of reward associated with different target stimuli. Crucially, this pattern of results was disrupted in animals with bilateral lesions of the whole habenular complex.
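The proportionality suggested by Tian and Uchida's (2015) results can be illustrated with a back-of-envelope Rescorla–Wagner calculation: under partial reinforcement with reward probability p, the expected update drives VX toward p·λ, so the prediction error on an omission trial scales with p. The probabilities and parameters below are illustrative choices, not values from the original study:

```python
# Expected-value form of the Rescorla-Wagner update under partial
# reinforcement: averaged across trial outcomes, dV = alpha * (p * lam - V),
# so V converges to p * lam. The omission-trial error (0 - V) thus scales
# with reward probability p.

def asymptotic_value(p, lam=1.0, alpha=0.1, n_trials=500):
    v = 0.0
    for _ in range(n_trials):
        v += alpha * (p * lam - v)   # averaged update across trial outcomes
    return v

dips = {p: 0.0 - asymptotic_value(p) for p in (0.25, 0.5, 0.75)}
# dips[0.75] is the most negative: richer reward expectations produce a
# deeper prediction-error 'dip' when the reward is omitted.
```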
A relevant consideration is the origin of the inputs required to invigorate LHb activity upon signals of an impending reward. The entopeduncular nucleus (EPN; the border region of the globus pallidus internal segment in primates) (Hong and Hikosaka, 2013) and the lateral hypothalamus (LH) (Stamatakis et al., 2016) provide excitatory inputs to the LHb. However, up to this point, it remains unclear how impending reward signals mediate the invigoration of the LHb. An elegant study with macaques by Hong and Hikosaka (2013) found that electrical stimulation of the ventral pallidum (VP) consistently inhibited the activity of the LHb. These authors conjectured that an interaction between inputs from the EPN and the VP in the LHb was responsible for the activation of the LHb. Recent evidence indicates that GABAergic neurons in the VP are activated by surprising rewards and reward-predicting cues and inhibited by aversive stimuli and reward omission (Stephenson-Jones et al., 2020; Wheeler and Carelli, 2006). Crucially, inhibition of LHb-projecting VP GABAergic neurons upon reward omission depends on current reward cues. In short, these VP neurons showed a biphasic ascending→descending pattern of activation to the reward-cue→reward-omission sequence. Therefore, this structure is a strong candidate for providing the signals that invigorate the activation of the LHb in reward omission episodes via disinhibition.
The strength with which the EPN triggers LHb activation may depend on the prior tone of VP inhibitory inputs. In fact, Stephenson-Jones et al. (2020) reported that a number of GABAergic VP neurons synapsing with the LHb exhibited a sustained pattern of activation during reward-predicting cues. Thus, sudden inhibition of these neurons upon reward omission may combine with activation of EPN signaling to strongly disinhibit the LHb. Stephenson-Jones et al. (2020) also observed that a small subpopulation of LHb-projecting glutamatergic VP neurons showed the opposite pattern of activation to that of the GABAergic ones. Activation of these neurons may complement the potent disinhibition-excitation process in the LHb during the omission of an expected reward.
Knowing the sources of excitatory and inhibitory inputs to the LHb is an essential matter. However, this leads to the task of identifying the upstream inputs of these sources (Bromberg-Martin, Matsumoto, Hong, and Hikosaka, 2010b), and so on. Hong and Hikosaka (2013) speculated that the EPN may activate the LHb through disinhibitory inputs from striosomal regions of the dorsal striatum. This conjecture has recently been supported by a study conducted by Hong et al. (2019), in which the activation of striatal neurons inside or near striosomes correlated with activation of the LHb. Hong et al. (2019) further advanced the hypothesis that striosomes may convey both excitatory and inhibitory inputs to the LHb. Regarding the VP, Stephenson-Jones et al. (2020) reported that neurons in this structure targeting the LHb became active in a state-dependent fashion; specifically, these neurons fire according to the motivational status of the subject, which is probably mediated by hypothalamic signals. It seems reasonable that activation of the LHb would depend on the individual's energy supply status or sexual drive at a particular moment. In short, the omission of reward should not cause substantial disturbance to a sated individual. This could be another possible explanation for the failure of Mollick et al. (2021) to replicate Tobler et al.'s findings. Both studies used fruit juice as a reward. However, in the latter study macaques were liquid-deprived, while in the former participants were recruited based on self-reported preference for the type of juice employed. The motivational state of the participants in Mollick et al.'s (2021) study was probably insufficient to induce strong physiological responses.

5. The Lateral Habenula Fosters Stimulus-Specific and Response-Specific Inhibitory Effects

Although feature-negative discrimination is a prototypic procedure for inducing conditioned inhibition effects in appetitive settings (which here we are proposing as a behavioral phenotypic model of LHb function), other protocols are also capable of doing so (see Savastano et al., 1999). Likewise, other behavioral manifestations, besides negative summation and retardation of acquisition tests, could be used to certify conditioned inhibition (e.g., Wasserman et al., 1974). For example, in a pivotal study, Laurent, Wong, and Balleine (2017) exposed rats to a backward conditioning preparation to induce conditioned inhibition. Then, the conditioned inhibitory properties of the stimuli were tested through a Pavlovian-to-instrumental transfer (PIT) test. This backward conditioning procedure consisted of presenting one of two different auditory target stimuli following (rather than preceding, as typically done) the presentation of two different rewards; each target stimulus was consistently associated with its corresponding reward in a backward fashion (reward→stimulus). Next, rats pressed two levers to obtain rewards, and each of the rewards used in the previous phase was associated with a different lever. In the PIT test, the levers were present but pressing was no longer followed by reward. In that condition, the target stimuli that were used in the first phase were presented randomly and lever pressing was recorded. Conditioned inhibition was revealed inasmuch as each target stimulus biased lever pressing toward the lever opposite to that associated with its corresponding reward. Importantly, Laurent et al. (2017) found that this outcome was impaired by bilateral lesions of the LHb.
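The logic of the two-phase design just described can be summarized in a short sketch. The cue, reward, and lever labels below are hypothetical placeholders, not the stimuli actually used by Laurent et al. (2017):

```python
# Toy sketch of the backward-conditioning PIT design described above.
# Cue/reward/lever labels are hypothetical placeholders.

phase1_backward = {"pellet": "tone", "sucrose": "noise"}  # reward -> cue (backward)
phase2_levers = {"left": "pellet", "right": "sucrose"}    # lever -> earned reward

def predicted_choice_bias(cue):
    """Outcome-specific inhibition: a cue suppresses responding on the lever
    whose reward it negatively predicts, biasing choice to the other lever."""
    reward = {c: r for r, c in phase1_backward.items()}[cue]
    suppressed = next(l for l, r in phase2_levers.items() if r == reward)
    return "right" if suppressed == "left" else "left"

print(predicted_choice_bias("tone"))  # the tone biases choice away from "left"
```

The point of the sketch is the *outcome-specificity* of the predicted bias: each cue redirects behavior away from the lever tied to its own (negatively predicted) reward, rather than suppressing responding in general.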
The disruption of conditioned inhibition was first observed by performing an electrolytic lesion of the entire LHb. This result was replicated when the authors induced a selective ablation of the neurons in the LHb that project to the RMTg via a viral technique. Therefore, the authors concluded that this LHb-RMTg pathway is crucial for learning these specific negative predictions about rewards. Remarkably, electrolytic lesions in the LHb did not disrupt PIT performance for rats trained with a traditional forward-conditioning (stimulus→reward) procedure. This suggests that the disruptive effect was confined to inhibitory learning (see also Zapata et al., 2017). Laurent et al.'s (2017) findings provide important insights for understanding the role of the LHb-RMTg pathway in the development of conditioned inhibition in appetitive settings. Beyond generally impairing responding when reward is negatively predicted, the LHb seems to promote learning to perform specific action patterns to specific cues. Apparently, this structure participates in enabling alternative behaviors for exploiting other potential resources in the environment when the omission of a particular reward is imminent.
Notably, the findings reported by Laurent et al. (2017) challenge the Rescorla–Wagner model. According to this model, a stimulus that is explicitly unpaired with a reward can acquire negative associative strength (i.e., become a conditioned inhibitor). This would be mediated by the relatively high associative strength of the context that coextends with the target stimulus preceding reward omission. By this rationale, the Rescorla–Wagner model predicts that backward conditioning would likewise lead to conditioned inhibition (see Wagner & Rescorla, 1972), because in backward conditioning the target stimulus also co-occurs with the presumably excitatory context. However, Laurent et al. (2017) observed specific inhibitory effects for each of the target stimuli in their study. It is worth stressing that those target stimuli were trained in the same time frame and in the same context. Thus, the target stimuli presumably acquired conditioned inhibitory properties by being (negatively) associated with each reward. In contrast, the Rescorla–Wagner model predicts that both target stimuli would acquire general inhibitory properties. This follows from the rationale that each target stimulus was paired with the absence of either reward in a context associated with both.
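The context-mediated route to inhibition in the Rescorla–Wagner model can be illustrated with a minimal simulation. The learning rate, asymptotes, and trial structure below are illustrative assumptions, not parameters from any of the cited studies:

```python
# Minimal Rescorla-Wagner sketch: a target cue that is explicitly unpaired
# with reward, but embedded in an excitatory context, acquires negative
# associative strength. All parameter values are illustrative assumptions.

ALPHA = 0.1            # learning rate (assumed)
LAMBDA_REWARD = 1.0    # asymptote supported by the reward
LAMBDA_OMISSION = 0.0  # asymptote when the reward is omitted

def rw_trial(V, cues_present, lam, alpha=ALPHA):
    """One Rescorla-Wagner trial: the shared error updates every present cue."""
    error = lam - sum(V[c] for c in cues_present)
    for c in cues_present:
        V[c] += alpha * error

V = {"context": 0.0, "target": 0.0}
for _ in range(500):
    rw_trial(V, ["context"], LAMBDA_REWARD)              # context alone -> reward
    rw_trial(V, ["context", "target"], LAMBDA_OMISSION)  # context + target -> no reward

# The context becomes excitatory (V -> +1) and the target is driven toward an
# equally negative strength (V -> -1): a conditioned inhibitor.
print(round(V["context"], 2), round(V["target"], 2))
```

Note what the sketch captures and what it misses: the target ends up inhibitory purely through the excitatory context, so its inhibition is general rather than tied to a particular reward, which is exactly the point on which Laurent et al.'s reward-specific result departs from the model.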
At the behavioral level, Laurent et al.'s findings can be readily accounted for by the sometimes-opponent-process (SOP) model (see Vogel, Ponce, and Wagner, 2019). This associative learning model invokes stimulus after-effects that are capable of shaping behavior if they are appropriately paired with other events. According to the SOP model, the delivery of each particular reward is followed by a decaying inhibitory (opponent) trace of its own kind. In Laurent et al.'s (2017) study, each reward's unique trace would have been paired with one of the target stimuli. As a consequence, each target stimulus acquired reward-specific conditioned inhibitory properties. As LHb lesions completely abolished the behavioral effect resulting from this protocol, this structure appears to play a crucial role in this phenomenon. However, the putative physiological means by which the LHb mediates learning in conditions such as those in Laurent et al.'s (2017) study remain to be determined.
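A toy rendering of this trace-based account, under heavy simplifying assumptions (exponential trace decay, a single pairing step per trial, made-up parameter values and labels), could look like:

```python
# Toy sketch inspired by the SOP account described above (not the full SOP
# model): each reward leaves its own decaying opponent (inhibitory) trace,
# and a cue presented during that trace acquires a negative, reward-specific
# association. Decay rate, learning rate, and labels are assumptions.

TRACE_DECAY = 0.7  # per-step decay of the opponent trace (assumed)
ALPHA = 0.2        # learning rate (assumed)

def backward_training(pairs, n_trials=100):
    """pairs: list of (reward, cue); the cue follows its reward on every trial."""
    V = {(cue, reward): 0.0 for reward, cue in pairs}
    for _ in range(n_trials):
        for reward, cue in pairs:
            trace = -1.0 * TRACE_DECAY  # opponent trace one step after reward
            # Pairing the cue with the inhibitory trace drags V toward it.
            V[(cue, reward)] += ALPHA * (trace - V[(cue, reward)])
    return V

V = backward_training([("pellet", "tone"), ("sucrose", "noise")])
# Each cue becomes a reward-specific inhibitor: negative V for its own reward.
print(round(V[("tone", "pellet")], 2), round(V[("noise", "sucrose")], 2))
```

Because each cue is only ever paired with the opponent trace of its own reward, the acquired inhibition is indexed by the (cue, reward) pair, reproducing the reward-specificity that the context-mediated Rescorla–Wagner account cannot.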

6. The Lateral Habenula Participates in Inhibitory Learning even without Temporally Specific Reward Omission

The findings reported by Laurent et al. (2017) merit further examination. Based on this study, we can assume that the LHb could exert its effects without acute episodes involving the omission of an expected reward. Such a process is also instantiated in explicitly unpaired procedures, in which a target stimulus acquires inhibitory properties solely by alternating with reward. A recent study by Choi, Kim, Gallagher, and Han (2020) sought to test the hypothesis that the LHb is involved in learning the association of a cue with the absence of reward. These authors exposed rats to an explicitly unpaired conditioning procedure that used a light as a negative predictor of food delivery. Choi et al. (2020) found increased c-Fos expression in the LHb of rats exposed to this explicitly unpaired conditioning procedure compared to controls. This result supports the idea that the LHb is engaged in the learning process that takes place when a stimulus signals the absence of reward. In a separate experiment, these authors did not find any disruption in performance on this protocol when they induced excitotoxic LHb lesions. However, this should not be taken as negative evidence of the involvement of the LHb in inhibitory learning. This null result may be explained by the need for special tests (e.g., summation, retardation, PIT; see Rescorla, 1969) to certify that an inhibitory stimulus is capable of counteracting reward-related behaviors. Unfortunately, Choi et al. (2020) did not conduct any of the available tests for assessing conditioned inhibition.
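As a rough illustration of how those standard tests are scored, consider the sketch below; the response counts and trial numbers are hypothetical values, not data from any of the studies discussed:

```python
# Hypothetical scoring sketch for the two classic conditioned-inhibition tests
# (see Rescorla, 1969). All numbers below are made-up illustrative values.

def negative_summation_index(resp_excitor_alone, resp_excitor_plus_x):
    """Summation test: X is inhibitory if compounding it with a separately
    trained excitor suppresses responding (index > 0 indicates suppression)."""
    return (resp_excitor_alone - resp_excitor_plus_x) / resp_excitor_alone

def shows_retardation(trials_to_criterion_x, trials_to_criterion_novel_cue):
    """Retardation test: X is inhibitory if it is slower than a novel cue
    to become an excitor when subsequently paired with reward."""
    return trials_to_criterion_x > trials_to_criterion_novel_cue

print(negative_summation_index(40, 12))  # 0.7: substantial suppression
print(shows_retardation(25, 10))         # True: acquisition was retarded
```

Passing both tests is the conventional standard for certifying a conditioned inhibitor, which is why a null lesion result on training performance alone, as in Choi et al. (2020), remains inconclusive.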
In both backward conditioning and explicitly unpaired conditioning preparations, the reward is not expected to occur at any particular moment; specifically, reward delivery is scheduled to occur at random intervals without discrete cues anticipating it, unlike in extinction and feature-negative discrimination procedures. The ability of the LHb to modulate reward seeking, even in paradigms not involving explicit reward omissions, raises the question of what mechanisms are involved. A recent study by Wang et al. (2017) may shed light on this issue. These authors described a rebound excitation in the LHb following the inhibition typically produced by reward delivery. Given appropriate timing, the traces of such rebound excitation could have been paired with the target stimuli for the control (but not for the lesioned) subjects in Laurent et al.'s study. Such pairings would facilitate the propagation of LHb excitation, presumably promoting dopamine suppression, eventually upon the target stimuli alone. However, this mechanism cannot readily account for inhibitory learning in explicitly unpaired conditioning procedures. In these protocols, the target stimulus does not consistently follow the reward; rather, these events are separated by intertrial intervals of varying durations. Therefore, decreases in dopamine mediated by rebound activity of the LHb could hardly explain the inhibitory properties acquired by a target stimulus in explicitly unpaired procedures.
The LHb is known to modulate other monoamines besides dopamine, such as serotonin (Amat et al., 2001). Tonic serotonergic activity depends on accumulated experience with rewards (Cohen et al., 2015). In turn, serotonergic tone determines the effects of phasic activity in serotonergic neurons, most of which is known to occur during aversive experiences (Cools et al., 2011). Thus, the LHb may participate in accumulating information about reward probability in a given environment. This would tune serotonin levels during inter-reward intervals in explicitly unpaired procedures, which might determine the affective impact of salient cues throughout those intervals. A similar effect may also be evoked by the LHb via modulation of tonic dopamine levels (see Lecourtier, DeFrancesco, & Moghaddam, 2008), which would not conflict with the serotonergic hypothesis. However, it is yet to be determined with certainty whether the LHb participates in inhibitory learning in explicitly unpaired protocols. This will require using either in vivo real-time recording or special tests for conditioned inhibition in lesion/inactivation studies. Also, it is not clear how the LHb would be more active in explicitly unpaired conditions than in control conditions involving equivalent reward density, which was the main result reported by Choi et al. (2020). One possibility is that the LHb is relatively inactive when a reward is fully predicted by a cue; in contrast, it may be tonically active in ambiguous conditions, for example, when the reward or the cue can occur at any given time without notice.

7. The Habenulo-Meso-Cortical Excitatory Pathway May Play a Role in Negative Reward Prediction Error

The glutamatergic LHb-VTA-mPFC pathway (see Figure 1) is less well known than the LHb-RMTg pathway but may also play an important role in inhibitory learning from reward omission. As we described earlier, unlike other LHb efferents, this pathway promotes, rather than inhibits, dopaminergic transmission. Perhaps the most crucial evidence supporting this idea stems from the study conducted by Lammel et al. (2012). These authors reported that blocking dopaminergic transmission in the mPFC abolishes the capacity of the LHb to generate aversion to spatial stimuli. Therefore, this pathway may be crucial in complementing dips in meso-striatal dopaminergic release for learning from negative reward prediction errors. Two putative, not mutually exclusive, mechanisms for such an effect could be (1) directly opposing midbrain dopamine activity (see Jo, Lee, & Mizumori, 2013) and (2) selecting which stimuli should be filtered to control behavior (see Vander Weele et al., 2018). The relationship between the mPFC and the LHb is reciprocal (see Mathis et al., 2017), and their projections also converge in certain brain locations (Varga, Kocsis, & Sharp, 2003). The mPFC has long been considered a key brain region for restraining inappropriate actions and adjusting behavior when errors occur (Ragozzino, Detrick, & Kesner, 1999), but only recently has this been considered from an associative learning perspective. Although research on LHb-mPFC interaction is still scant, we outline some ideas on how these regions may jointly contribute to learning from negative reward prediction errors.
We should first consider that the mPFC is divided into functionally dissociable anatomical subregions, at least in some eutherian mammals (see Ongur and Price, 2000). A subregion that might play a role in the LHb's network is the prelimbic cortex of rodents, which presumably corresponds to the pregenual anterior cingulate in primates (see Laubach, Amarante, Swanson, & White, 2018). Several studies have linked this subregion with the ability of mammals to restrain a dominant (previously rewarded) response (Laubach, Caetano, & Narayanan, 2015). An outstanding example is a study conducted by Meyer and Bucci (2014) using rats as subjects. These authors found that prelimbic (but not infralimbic) PFC lesions impaired the inhibitory learning process in an appetitive feature-negative discrimination paradigm (i.e., the differentiation in responding to A and AX in Figure 4). As we described above, this paradigm consists of presenting a target stimulus that signals reward omission. Importantly, however, this target stimulus occurs in the presence of a cue that otherwise consistently predicts reward delivery. To effectively learn how to stop reward-related responses in such a situation, subjects must resolve the conflict between reward and reward-omission cues that are presented simultaneously.
It has been suggested that the opposing effects of the prelimbic division of the mPFC on the reward systems are exerted via an aversion mechanism. This rationale is supported by evidence that this region is involved in fear learning (e.g., Burgos-Robles, Vidal-Gonzalez and Quirk, 2009; Piantadosi, Yeates, & Floresco, 2020) and exhibits robust excitatory connections with the basal amygdala (Sotres-Bayon, Sierra-Mercado, Pardilla-Delgado and Quirk, 2012). Conversely, the infralimbic portion of the mPFC has been related to both the expression of habitual reward-related responses (e.g., Haddon and Killcross, 2011) and fear suppression (Santini, Quirk and Porter, 2008; but see Capuzzo and Floresco, 2020). The involvement of the infralimbic cortex in the latter of these processes could be accounted for in terms of the mediation of a subjective relief state. Such a state would be functionally equivalent to reward and, therefore, opposed to aversive states in a hierarchical fashion under appropriate circumstances.
However, some findings (e.g., Sharpe and Killcross, 2015) seem to contradict this prelimbic-aversive and infralimbic-appetitive notion, prompting deviations from this theoretical scheme. Instead of the aversive-appetitive dichotomy, Killcross and Sharpe (2018) maintain that the infralimbic mPFC promotes attention to stimuli that reliably predict affectively significant events, regardless of their valence (i.e., reflective control). Conversely, the prelimbic mPFC would promote attention to higher-order cues setting hierarchical (as in conditional probabilities) relationships between stimuli and relevant outcomes (i.e., proactive control), again regardless of their affective valence. A similar point has been raised by Hardung et al. (2017), who found evidence for a functional gradient in the mPFC spanning from the prelimbic to the lateral-orbital cortex (passing through the infralimbic, medial-orbital and ventral-orbital regions). These authors reported that regions near the prelimbic cortex tend to participate more in proactive control, while regions near the lateral-orbital cortex participate more in reflective control. However, this study was conducted in an appetitive setting, so the hierarchical control (reflective vs proactive) and opponent affective control (appetitive vs aversive) hypotheses are confounded. While cortical inputs to the LHb are generally modest, the prelimbic cortex makes the largest contribution among the cortical regions innervating this structure (Yetnikoff et al., 2015). This may indicate that this pathway serves as either a positive or a negative feedback mechanism, depending on whether these connections are excitatory or inhibitory. Given that the LHb is a major aversive center, the former possibility would be at odds with the approach of Sharpe and Killcross (2018).
Furthermore, a recent finding has tilted the scale in favor of the aversive-appetitive modularity of the mPFC. Yan, Wang and Zhou (2019) reported that spiking in a subpopulation of neurons in rats' dorsomedial PFC (dmPFC; including the prelimbic and dorsal cingulate cortex) was (1) increased upon a cue signaling an imminent electric shock, and (2) increased (but considerably less) and then decayed below baseline upon the presentation of a cue signaling the omission of the shock. In addition, Yan et al. (2019) showed that the cue signaling shock omission passed the summation and retardation tests for conditioned inhibition, at both the neural and behavioral levels. Interestingly, this pattern of results mirrors that of the study by Tobler et al. (2003) in an appetitive setting with macaques. In brief, suppressing fear responses involves inhibition of dmPFC neurons, suggesting that this region serves primarily to facilitate aversive learning rather than having a general hierarchical control function. This is in line with the idea that LHb input to the prelimbic prefrontal cortex may counteract the effects of reward cues via an opponent affective process. Such a mechanism has been regarded as one of the necessary conditions for the occurrence of inhibitory learning and performance (see Konorski, 1948; Pearce & Hall, 1980). However, this mechanism may be complemented by others, such as threshold increases and attention to informative cues (Laubach et al., 2015).

8. Relevance of the Lateral Habenula for Mental Health

There is an emerging literature suggesting that many psychiatric disorders can be characterized as alterations of the ability to predict rewards and aversive outcomes (Byrom & Murphy, 2018). Therefore, human wellbeing could be understood, at least in part, in terms of the dynamics of associative learning and outcome prediction (Papalini, Beckers & Vervliet, 2020). Thus, disrupting the system underlying outcome predictions, in either an upward or a downward direction, could lead to maladaptive behaviors that threaten people's subjective wellbeing. The disappointment, frustration, and discomfort stemming from worse-than-expected outcomes are useful for directing behavior toward adaptive ways of interacting with our surroundings (see Figure 5, light purple arrows). However, if these subjective sensations are lacking or excessive, problematic behaviors are likely to arise (see Figure 5, dark colored arrows).

***INSERT FIGURE 5***

For instance, some authors have proposed that a pervasive reward learning process (Chow, Smith, Wilson, Zentall and Beckmann, 2017; Sosa and dos Santos, 2019a) can explain maladaptive decision-making in impulsive and risk-taking behaviors. This may arise in part from a deficit in learning from negative reward prediction errors (Laude, Stagner and Zentall, 2014; Sosa and dos Santos, 2019b) (see Figure 5, dark aqua arrow). Impulsive-like behavior is often studied in choice protocols with animal models. A study by Stopper and Floresco (2014) showed that LHb inactivation led to indifference when rats chose between ambiguous alternatives. Specifically, subjects were biased away from the more profitable of two alternative rewards when a delay was imposed between the selecting action and that reward; however, this bias was abolished by LHb inactivation. This suggests that delay aversion might stem from a negative reward prediction error mechanism, rendering the puzzle even more complex. On the other hand, studies involving humans and rodent models of helplessness (Gold and Kadriu, 2019; Jakobs, Pitzer, Sartorius, Unterberg and Kiening, 2019) have suggested that over-reactivity of the LHb can induce symptoms associated with depression (see Figure 5, dark purple arrows). Such symptoms include reluctance to explore new environments and unwillingness to initiate challenging actions, which could be interpreted as a constant state of disappointment or sense of doom (Proulx et al., 2014). In that case, atypically high expectations could lead to a pervasive and maladaptive enhancement of inhibitory learning. Another possibility is that some individuals have a proclivity to overgeneralize learning from reward omissions, which would cause decreased activity overall (a major symptom of depression). These putative mechanisms are, of course, not mutually exclusive.
Synthesizing behavioral and neurophysiological studies could serve to address the etiology and phenomenology of these psychiatric conditions. Such an enterprise would require integrating details regarding behavioral changes in response to environmental constraints with the underlying neurobiological mechanisms involved. This is precisely what we have attempted in the present paper. If ongoing research continues to make advances, this perspective could serve to guide clinical practice; an example would be determining whether psychotherapy is enough to tackle a particular case or whether combining it with pharmacotherapy would be pertinent. Knowledge about neural and behavioral dynamics might serve to design combined intensive interventions that avoid the long-term side effects of medications and dependency (see Hengartner, Davies and Read, 2019). Additionally, understanding the morphogenesis and wiring of the LHb during embryological development could be relevant for inquiring whether disturbances in these processes underlie problematic behavioral traits (Schmidt and Pasterkamp, 2017).
Some recent studies have furthered our understanding of behavioral-physiological interactions involving the LHb, from basic science to application. For example, Zhu et al. (2019) found an association between subclinical depression and atypical connections between the dorsal posterior thalamus and the LHb. This peculiar connectivity of the LHb might predispose people to depression or, alternatively, some life events could promote behavioral patterns leading to that phenotype. Experiences of reward loss vary in the time frame and magnitude of the rewards involved (Papini, Wood, Daniel and Norris, 2006). A person could lose money during a gambling night with friends, which could be a relatively innocuous experience, or lose a lifetime partner, which would be a devastating event. Significant life experiences can cause detectable changes in brain networks (e.g., Matthews et al., 2016); tracking these changes can lead to ways of understanding the underpinnings of behavioral phenotypes associated with psychiatric conditions. An intriguing example is that attenuating LHb activity has been shown to ameliorate depressive-like symptoms induced by maternal separation in mice (Tchenio, Lecca, Valentinova and Mameli, 2017).

9. Directions for Moving the Field Forward

Computational modeling of associative phenomena is just one possible approach to account for the relationship between LHb physiology and inhibitory learning. There are other alternatives for improving our understanding of this topic, even in a more detailed manner, such as neural network models. These are powerful tools for envisaging plausible mechanisms implemented by biological agents to interact adaptively with their environment. A clear advantage of these models is that they make physiological hypotheses more tenable (Frank, 2005). To our knowledge, only a few recently proposed neural-network models (see Mollick et al., 2020; Vitay and Hamker, 2014) have incorporated nodes representing the physiology of the LHb. Even so, those models do not account for the multifaceted circuitry of this structure (e.g., including the LHb-VTA-mPFC pathway), which, we argue, is highly relevant for a full understanding of its role in learning. The lack of inclusion of the LHb in neural network modeling (e.g., Burgos and García-Leal, 2015; Collins and Frank, 2014; O'Reilly, Russin and Herd, 2019; Sutton and Barto, 2018) may be partly due to an overemphasis on the performance, rather than the learning, of inhibitory control. Cortical-thalamic-striatal circuits are often invoked to account for inhibitory performance, often without formally specifying how outcomes "teach" future actions (e.g., van Wijk et al., 2020; Verbruggen et al., 2014). An integration of learning and performance models should be pursued to increase our understanding of the broad network in which the LHb participates.
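As a minimal point of contact between such models and the phenomena reviewed here, a tabular temporal-difference sketch can show how reward omission yields the negative prediction error that an LHb-like node is proposed to broadcast. The states, rewards, and parameters below are illustrative assumptions, not taken from any of the cited models:

```python
# Tabular TD(0) sketch of a negative reward prediction error (the "dopamine
# dip" discussed in this review). The three-state task, rewards, and
# parameters are illustrative assumptions.

ALPHA, GAMMA = 0.2, 0.9
V = {"cue": 0.0, "outcome": 0.0, "end": 0.0}  # state-value table

def td_update(state, next_state, reward):
    """One TD(0) step; returns the prediction error (delta)."""
    delta = reward + GAMMA * V[next_state] - V[state]
    V[state] += ALPHA * delta
    return delta

# Training: the cue reliably precedes the reward.
for _ in range(200):
    td_update("cue", "outcome", 0.0)
    td_update("outcome", "end", 1.0)

# Probe trial: the expected reward is omitted. The resulting negative delta
# is the teaching signal an LHb-like node is proposed to convey.
dip = td_update("outcome", "end", 0.0)
print(round(V["cue"], 2), round(dip, 2))
```

Models of the kind discussed above differ precisely in *how* this negative delta is generated and routed (e.g., through an LHb-RMTg-like channel versus an LHb-VTA-mPFC-like channel), which is the modeling gap we argue should be filled.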
Another matter for future consideration is whether the role of the LHb in learning from reward omission is exclusive to edible rewards or extends to other types of reward, such as sexual stimuli for receptive individuals. In addition, our understanding of the behavioral neuroscience of the LHb comes from a select few model species. It has been hypothesized that the habenular complex evolved in an ancestor of vertebrates to enable circadian-determined modulation of movement (Hikosaka, 2010). In this sense, the LHb may have later evolved its role in suppressing specific actions in response to more dynamic environmental information. This exaptation hypothesis implies that the habenula of basal lineages did not play the same role it does in extant vertebrates. This, in turn, implies that basal lineages either could not learn from reward omission or did so by recruiting other mechanisms.
Even invertebrates are equipped with mechanisms for learning from situations involving negative prediction errors (i.e., conditioned inhibition; Acebes et al., 2012; Britton and Farley, 1999; Couvillon, Bumanglag and Bitterman, 2003; Durrieu, Wystrach, Arrufat, Giurfa, & Isabel, 2020). Although this fact is appealing, for now it is largely limited to the behavioral level of observation. However, an emerging field of research is currently scrutinizing the neural mechanisms involved in negative reward prediction errors in Drosophila flies (Felsenberg, Barnstedt, Cognini, Lin and Waddell, 2017). In these animals, a bilateral structure known as the mushroom body supports associative learning using dopamine as a plasticity factor, in a fashion similar to the vertebrate brain (Aso et al., 2014). The mushroom body possesses different subdivisions, which selectively receive dopamine during rewarding or aversive stimuli. Intriguingly, Felsenberg et al. (2017) reported that inactivating aversion-related dopamine neurons precludes the bias in behavior otherwise produced by pairing a stimulus with reward omission. It has been suggested that aversive dopamine neurons directly oppose reward-related dopamine neurons in the mushroom body (Perisse et al., 2016). This may indicate that invertebrate nervous systems possess the hierarchical architecture necessary for inhibitory learning, such as that found in vertebrates. Remarkably, these diverging nervous systems also exhibit antagonistic dopaminergic subsystems, similar to those that the LHb orchestrates in the vertebrate brain. Future research in this field may uncover the evolutionary origins of negative reward prediction errors and allow a more precise dating of the origins and precursors of the LHb.

10. Summary and Concluding Remarks

Organisms benefit from possessing mechanisms to track resources in their surroundings and adjust their actions to obtain maximum profit. This relies on a delicate balance between responding and withholding specific responses whenever appropriate. Sensorimotor feedforward loops are mediated by long-term potentiation in some locations of the striatum, induced by midbrain dopamine inputs (Yagishita et al., 2014). Conversely, there are several processes that prevent spurious sensorimotor loops, which would otherwise waste energy and put the organism at risk. A convenient paradigm for studying one such process is conditioned inhibition, a subclass of Pavlovian learning phenomena. This paradigm could serve as a principled and physiologically informed behavioral phenotype model of negative prediction error. Negative feedback control mechanisms have recently been proposed to be of primary importance for understanding adaptive behavior (Yin, 2020). Therefore, conditioned inhibition might be a particularly relevant and timely conceptual and methodological tool. Intriguingly, although inhibitory learning has been thought to be multifaceted in nature (Sosa and Ramírez, 2019), several of its manifestations seem to implicate the LHb.
Omission of expected rewards at a precise time promotes dopamine dips in canonical midbrain neurons, with a remarkable contribution from the LHb (see Mollick et al., 2020). These dopamine dips back-propagate to stimuli that consistently predict reward omission in a way that mirrors the plastic changes associated with dopamine release (Tobler et al., 2003). All things being equal, dopamine dips can counteract phasic dopamine effects on behavior (Chang et al., 2018). This accounts for the LHb's contribution to the extinction of a previously acquired response (Zapata et al., 2017), feature-negative discrimination, negative summation, and retardation of acquisition in appetitive settings (Tobler et al., 2003). However, although the activity of the LHb is associated with dopamine dips, it may also participate in learning processes beyond the general suppression of previously rewarded actions (Laurent et al., 2017). This selective effect is remarkable given that other brain areas have been associated with global response inhibition (see Wiecki & Frank, 2013).
Even if the LHb's suppressing effects on midbrain dopamine neurons are robust, the excitatory LHb-VTA-mPFC pathway may also play a role in inhibitory learning from reward omission. Lesions of the prelimbic portion of the mPFC impair inhibitory learning in appetitive settings (e.g., Meyer and Bucci, 2014). This region receives indirect dopaminergic input from the LHb, and its activation has been implicated in aversion learning (Lammel et al., 2012). Some learning theories have proposed that conditioned inhibition is partly determined by antagonizing the effect of an affectively loaded conditioned stimulus (Konorski, 1948; Pearce and Hall, 1980). The activity of this pathway leads to dopamine release in the mPFC, which may allow channeling relevant sensory inputs to key plastic brain regions (Vander Weele et al., 2018). Therefore, interactions of the LHb with aversive centers via the prelimbic cortex may facilitate imbuing cues with the capacity to counteract reward-seeking tendencies. Disrupting either the excitatory or the inhibitory LHb pathways has been shown to hamper inhibitory learning processes to some degree. Whether and how those pathways interact or complement each other to produce inhibitory learning remains a matter for further investigation.
Feature-negative discrimination and extinction procedures promote a decrease (or even net negative values) in associative strength, presumably through the omission of a reward at a precise moment following a cue. However, backward conditioning also induces conditioned inhibition, and this effect is impaired by specific ablation of the LHb-RMTg pathway (Laurent et al., 2017). In this paradigm, the reward is not expected at a specific point in time, so other mechanisms may be recruited by the LHb. A potential candidate for this process is the rebound activity of the LHb following inhibition by reward (Wang et al., 2017). Yet alternating a target stimulus with reward delivery in an explicitly unpaired fashion also induces conditioned inhibition, and there is ex vivo evidence that such conditions promote meaningful activity in LHb neurons (Choi et al., 2020). In that case, learning cannot be clearly linked to either phasic decreases in dopamine or post-reward rebound activity of the LHb; therefore, other mechanisms involving the LHb may be at play.
Converging lines of evidence suggest that the LHb participates in the induction and
expression of inhibitory learning under different conditions involving reward. This supports the
intriguing idea that this structure is composed of several parallel circuits operating in different
situations (Stephenson-Jones et al., 2020). Unfortunately, conditioned inhibition, the
accumulated outcome of inhibitory learning, is an elusive phenomenon that often requires
special tests to be validated. Moreover, some authors have claimed that each test requires up
to several control conditions in order to be deemed conclusive (Papini and Bitterman, 1993).
This complicates the matter further, as it multiplies the number of subjects needed to evaluate
the role of the LHb in this phenomenon, let alone to compare the roles of its different
subcircuits. However, this topic is still worth investigating, as the LHb seems to be implicated in
many situations involving negative predictive relationships between events. Perhaps one
situation in which such processes take place is adaptation to environments that require
behavioral flexibility. If one takes “negative reward prediction errors” in a broader sense, many
tasks requiring shifts in behavior following subtle cues could be considered in this category.
Accordingly, there is an increasing amount of evidence that the LHb plays a role in updating
behavior in such tasks (e.g., Baker et al., 2017). Conditioned inhibition could thus be conceived
as an accumulated outcome of the mechanisms involved in shifting behavior under dynamic
conditions through error correction. Models of associative learning would be useful to frame
and test this and further hypothetical propositions.
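The error-correction logic invoked here can be stated compactly. In the Rescorla–Wagner model (Rescorla and Wagner, 1972), the change in associative strength of each stimulus present on a trial is driven by the discrepancy between the obtained outcome and the summed prediction of all stimuli present:

```latex
\Delta V_X = \alpha_X \, \beta \left( \lambda - \sum_{i} V_i \right)
```

On reward-omission trials λ = 0, so a cue X accompanying a strong predictor experiences a negative error and is pushed toward net negative (inhibitory) associative strength, as illustrated in Figures 3 and 4.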

10. References

Acebes F, Solar P, Moris J, & Loy I. 2012. Associative learning phenomena in the snail (Helix aspersa): Conditioned inhibition. Learn Behav 40:34–41. doi:10.3758/s13420-011-0042-6
Amat J, Sparks PD, Matus-Amat P, Griggs J, Watkins LR, & Maier SF. 2001. The role of the habenular complex in the elevation of dorsal raphe nucleus serotonin and the changes in the behavioral responses produced by uncontrollable stress. Brain Res 917:118–126. doi:10.1016/S0006-8993(01)02934-1
Aso Y, Hattori D, Yu Y, Johnston RM, Iyer NA, Ngo TTB, et al. 2014. The neuronal architecture of the mushroom body provides a logic for associative learning. Elife 3:e04577. doi:10.7554/eLife.04577
Baker PM, Jhou T, Li B, Matsumoto M, Mizumori SJY, Stephenson-Jones M, et al. 2016. The lateral habenula circuitry: Reward processing and cognitive control. J Neurosci 36:11482–11488. doi:10.1523/JNEUROSCI.2350-16.2016
Baker PM, Raynor SA, Francis NT, & Mizumori SJY. 2017. Lateral habenula integration of proactive and retroactive information mediates behavioral flexibility. Neuroscience 345:89–98. doi:10.1016/j.neuroscience.2016.02.010
Barrot M, Sesack SR, Georges F, Pistis M, Hong S, & Jhou TC. 2012. Braking dopamine systems: A new GABA master structure for mesolimbic and nigrostriatal functions. J Neurosci 32:14094–14101. doi:10.1523/JNEUROSCI.3370-12.2012
Bouton ME, Maren S, & McNally GP. 2020. Behavioral and Neurobiological Mechanisms of Pavlovian and Instrumental Extinction Learning. Physiol Rev. doi:10.1152/physrev.00016.2020
Britton G, & Farley J. 1999. Behavioral and neural bases of noncoincidence learning in Hermissenda. J Neurosci 19:9126–9132. doi:10.1523/jneurosci.19-20-09126.1999
Bromberg-Martin ES, Matsumoto M, Hong S, & Hikosaka O. 2010. A pallidus-habenula-dopamine pathway signals inferred stimulus values. J Neurophysiol 104:1068–1076. doi:10.1152/jn.00158.2010
Burgos-Robles A, Vidal-Gonzalez I, & Quirk GJ. 2009. Sustained conditioned responses in prelimbic prefrontal neurons are correlated with fear expression and extinction failure. J Neurosci 29:8474–8482. doi:10.1523/JNEUROSCI.0378-09.2009
Burgos JE, & García-Leal Ó. 2015. Autoshaped choice in artificial neural networks: Implications for behavioral economics and neuroeconomics. Behav Processes 114:63–71. doi:10.1016/j.beproc.2015.01.010
Byrom NC, & Murphy RA. 2018. Individual differences are more than a gene × environment interaction: The role of learning. J Exp Psychol Anim Learn Cogn 44:36–55. doi:10.1037/xan0000157
Capuzzo G, & Floresco SB. 2020. Prelimbic and Infralimbic Prefrontal Regulation of Active and Inhibitory Avoidance and Reward-Seeking. J Neurosci 40:4773–4787. doi:10.1523/JNEUROSCI.0414-20.2020
Chang CY, Gardner MPH, Conroy JC, Whitaker LR, & Schoenbaum G. 2018. Brief, but not prolonged, pauses in the firing of midbrain dopamine neurons are sufficient to produce a conditioned inhibitor. J Neurosci 38:8822–8830. doi:10.1523/JNEUROSCI.0144-18.2018
Choi BR, Kim DH, Gallagher M, & Han JS. 2020. Engagement of the Lateral Habenula in the Association of a Conditioned Stimulus with the Absence of an Unconditioned Stimulus. Neuroscience 444:136–148. doi:10.1016/j.neuroscience.2020.07.031
Chow JJ, Smith AP, Wilson AG, Zentall TR, & Beckmann JS. 2017. Suboptimal choice in rats: Incentive salience attribution promotes maladaptive decision-making. Behav Brain Res 320:244–254. doi:10.1016/j.bbr.2016.12.013
Cohen JY, Amoroso MW, & Uchida N. 2015. Serotonergic neurons signal reward and punishment on multiple timescales. Elife 4. doi:10.7554/eLife.06346
Collins AGE, & Frank MJ. 2014. Opponent actor learning (OpAL): Modeling interactive effects of striatal dopamine on reinforcement learning and choice incentive. Psychol Rev 121:337–366. doi:10.1037/a0037015
Cools R, Nakamura K, & Daw ND. 2011. Serotonin and dopamine: Unifying affective, activational, and decision functions. Neuropsychopharmacology 36:98–113. doi:10.1038/npp.2010.121
Couvillon PA, Bumanglag AV, & Bitterman ME. 2003. Inhibitory conditioning in honeybees. Q J Exp Psychol Sect B Comp Physiol Psychol 56:359–370. doi:10.1080/02724990244000313
Daly HB, & Daly JT. 1982. A mathematical model of reward and aversive nonreward: Its application in over 30 appetitive learning situations. J Exp Psychol Gen 111:441–480. doi:10.1037/0096-3445.111.4.441
Durrieu M, Wystrach A, Arrufat P, Giurfa M, & Isabel G. 2020. Fruit flies can learn non-elemental olfactory discriminations. Proceedings Biol Sci 287:20201234. doi:10.1098/rspb.2020.1234
Eder AB, Rothermund K, De Houwer J, & Hommel B. 2015. Directive and incentive functions of affective action consequences: an ideomotor approach. Psychol Res 79:630–649. doi:10.1007/s00426-014-0590-4
Felsenberg J, Barnstedt O, Cognigni P, Lin S, & Waddell S. 2017. Re-evaluation of learned information in Drosophila. Nature 544:240–244. doi:10.1038/nature21716
Gold PW, & Kadriu B. 2019. A major role for the lateral habenula in depressive illness: Physiologic and molecular mechanisms. Front Psychiatry 10:1–7. doi:10.3389/fpsyt.2019.00320
Haddon JE, & Killcross S. 2011. Inactivation of the infralimbic prefrontal cortex in rats reduces the influence of inappropriate habitual responding in a response-conflict task. Neuroscience 199:205–212. doi:10.1016/j.neuroscience.2011.09.065
Hardung S, Epple R, Jäckel Z, Eriksson D, Uran C, Senn V, et al. 2017. A Functional Gradient in the Rodent Prefrontal Cortex Supports Behavioral Inhibition. Curr Biol 27:549–555. doi:10.1016/j.cub.2016.12.052
Harris JA. 2019. The importance of trials. J Exp Psychol Anim Learn Cogn. doi:10.1037/xan0000223
Harris JA, Kwok DWS, & Andrew BJ. 2014. Conditioned inhibition and reinforcement rate. J Exp Psychol Anim Behav Process 40:335–354. doi:10.1037/xan0000023
Hengartner MP, Davies J, & Read J. 2019. Antidepressant withdrawal - The tide is finally turning. Epidemiol Psychiatr Sci 2018–2020. doi:10.1017/S2045796019000465
Hikosaka O. 2010. The habenula: From stress evasion to value-based decision-making. Nat Rev Neurosci 11:503–513. doi:10.1038/nrn2866
Hong S, Amemori S, Chung E, Gibson DJ, Amemori K, & Graybiel AM. 2019. Predominant Striatal Input to the Lateral Habenula in Macaques Comes from Striosomes. Curr Biol 29:51-61.e5. doi:10.1016/j.cub.2018.11.008
Hong S, & Hikosaka O. 2013. Diverse sources of reward value signals in the basal ganglia nuclei transmitted to the lateral habenula in the monkey. Front Hum Neurosci 7:1–7. doi:10.3389/fnhum.2013.00778
Hong S, & Hikosaka O. 2008. The Globus Pallidus Sends Reward-Related Signals to the Lateral Habenula. Neuron 60:720–729. doi:10.1016/j.neuron.2008.09.035
Hu H, Cui Y, & Yang Y. 2020. Circuits and functions of the lateral habenula in health and in disease. Nat Rev Neurosci 21:277–295. doi:10.1038/s41583-020-0292-4
Huang S, Borgland SL, & Zamponi GW. 2019. Dopaminergic modulation of pain signals in the medial prefrontal cortex: Challenges and perspectives. Neurosci Lett 702:71–76. doi:10.1016/j.neulet.2018.11.043
Iordanova MD, Yau JO-Y, McDannald MA, & Corbit LH. 2021. Neural substrates of appetitive and aversive prediction error. Neurosci Biobehav Rev 123:337–351. doi:10.1016/j.neubiorev.2020.10.029
Jakobs M, Pitzer C, Sartorius A, Unterberg A, & Kiening K. 2019. Acute 5 Hz deep brain stimulation of the lateral habenula is associated with depressive-like behavior in male wild-type Wistar rats. Brain Res 1721:146283. doi:10.1016/j.brainres.2019.06.002
Jhou TC, Geisler S, Marinelli M, Degarmo BA, & Zahm DS. 2009. The mesopontine rostromedial tegmental nucleus: A structure targeted by the lateral habenula that projects to the ventral tegmental area of Tsai and substantia nigra compacta. J Comp Neurol 513:566–596. doi:10.1002/cne.21891
Jo YS, Lee J, & Mizumori SJY. 2013. Effects of prefrontal cortical inactivation on neural activity in the ventral tegmental area. J Neurosci 33:8159–8171. doi:10.1523/JNEUROSCI.0118-13.2013
Juarez J, Barrios De Tomasi E, Muñoz-Villegas P, & Buenrostro M. 2013. Adicción farmacológica y conductual. In: González-Garrido A, & Matute E, editors. Cerebro y Conducta. Guadalajara: Manual Moderno.
Konorski J. 1948. Conditioned reflexes and neuron organization. New York, NY, US: Cambridge University Press.
Lammel S, Lim BK, Ran C, Huang KW, Betley MJ, Tye KM, et al. 2012. Input-specific control of reward and aversion in the ventral tegmental area. Nature 491:212–217. doi:10.1038/nature11527
Laubach M, Amarante LM, Swanson K, & White SR. 2018. What, If Anything, Is Rodent Prefrontal Cortex? eNeuro 5:315–333.
Laubach M, Caetano MS, & Narayanan NS. 2015. Mistakes were made: neural mechanisms for the adaptive control of action initiation by the medial prefrontal cortex. J Physiol Paris 109:104–117. doi:10.1016/j.jphysparis.2014.12.001
Laude JR, Stagner JP, & Zentall TR. 2014. Suboptimal choice by pigeons may result from the diminishing effect of nonreinforcement. J Exp Psychol Anim Behav Process 40:12–21. doi:10.1037/xan0000010
Laurent V, Wong FL, & Balleine BW. 2017. The lateral habenula and its input to the rostromedial tegmental nucleus mediates outcome-specific conditioned inhibition. J Neurosci 37:10932–10942. doi:10.1523/JNEUROSCI.3415-16.2017
Lecourtier L, DeFrancesco A, & Moghaddam B. 2008. Differential tonic influence of lateral habenula on prefrontal cortex and nucleus accumbens dopamine release. Eur J Neurosci 27:1755–1762. doi:10.1111/j.1460-9568.2008.06130.x
Li H, Vento PJ, Parrilla-Carrero J, Pullmann D, Chao YS, Eid M, et al. 2019. Three Rostromedial Tegmental Afferents Drive Triply Dissociable Aspects of Punishment Learning and Aversive Valence Encoding. Neuron 104:987-999.e4. doi:10.1016/j.neuron.2019.08.040
Mathis V, Barbelivien A, Majchrzak M, Mathis C, Cassel JC, & Lecourtier L. 2017. The Lateral Habenula as a Relay of Cortical Information to Process Working Memory. Cereb Cortex 27:5485–5495. doi:10.1093/cercor/bhw316
Matthews GA, Nieh EH, Vander Weele CM, Halbert SA, Pradhan RV, Yosafat AS, et al. 2016. Dorsal Raphe Dopamine Neurons Represent the Experience of Social Isolation. Cell 164:617–631. doi:10.1016/j.cell.2015.12.040
Meyer HC, & Bucci DJ. 2014. The contribution of medial prefrontal cortical regions to conditioned inhibition. Behav Neurosci 128:644–653. doi:10.1037/bne0000023
Mollick JA, Chang LJ, Krishnan A, Hazy TE, Krueger KA, Frank GKW, et al. 2021. The Neural Correlates of Cued Reward Omission. Front Hum Neurosci 15. doi:10.3389/fnhum.2021.615313
Mollick JA, Hazy TE, Krueger KA, Nair A, Mackie P, Herd SA, et al. 2020. A systems-neuroscience model of phasic dopamine. Psychol Rev. doi:10.1037/rev0000199
O’Reilly RC, Russin J, & Herd SA. 2019. Computational models of motivated frontal function. In: Handbook of Clinical Neurology, 1st ed. Elsevier B.V. doi:10.1016/B978-0-12-804281-6.00017-3
Omelchenko N, & Sesack SR. 2009. Ultrastructural analysis of local collaterals of rat ventral tegmental area neurons: GABA phenotype and synapses onto dopamine and GABA cells. Synapse 63:895–906. doi:10.1002/syn.20668
Ongur D, & Price JL. 2000. The Organization of Networks within the Orbital and Medial Prefrontal Cortex of Rats, Monkeys and Humans. Cereb Cortex 10:206–219. doi:10.1093/cercor/10.3.206
Papalini S, Beckers T, & Vervliet B. 2020. Dopamine: From Prediction Error To Psychotherapy. Transl Psychiatry 1–37. doi:10.1038/s41398-020-0814-x
Papini MR, & Bitterman ME. 1993. The Two-Test Strategy in the Study of Inhibitory Conditioning. J Exp Psychol Anim Behav Process 19:342–352. doi:10.1037/0097-7403.19.4.342
Papini MR, Wood M, Daniel AM, & Norris JN. 2006. Reward loss as psychological pain. Int J Psychol Psychol Ther 6:189–213.
Pearce JM, & Hall G. 1980. A model for Pavlovian learning: Variations in the effectiveness of conditioned but not of unconditioned stimuli. Psychol Rev 87:532–552. doi:10.1037/0033-295X.87.6.532
Perisse E, Owald D, Barnstedt O, Talbot CBB, Huetteroth W, & Waddell S. 2016. Aversive Learning and Appetitive Motivation Toggle Feed-Forward Inhibition in the Drosophila Mushroom Body. Neuron 90:1086–1099. doi:10.1016/j.neuron.2016.04.034
Piantadosi PT, Yeates DCM, & Floresco SB. 2020. Prefrontal cortical and nucleus accumbens contributions to discriminative conditioned suppression of reward-seeking. Learn Mem 27:429–440. doi:10.1101/LM.051912.120
Proulx CD, Hikosaka O, & Malinow R. 2014. Reward processing by the lateral habenula in normal and depressive behaviors. Nat Neurosci 17:1146–1152. doi:10.1038/jid.2014.371
Ragozzino ME, Detrick S, & Kesner RP. 1999. Involvement of the prelimbic-infralimbic areas of the rodent prefrontal cortex in behavioral flexibility for place and response learning. J Neurosci 19:4585–4594. doi:10.1523/jneurosci.19-11-04585.1999
Rescorla RA, & Wagner AR. 1972. A theory of classical conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In: Classical Conditioning II: Current Research and Theory. New York: Appleton-Century-Crofts.
Rescorla RA. 1969. Pavlovian conditioned inhibition. Psychol Bull 72:77–94. doi:10.1037/h0027760
Roesch MR, Esber GR, Li J, Daw ND, & Schoenbaum G. 2012. Surprise! Neural correlates of Pearce-Hall and Rescorla-Wagner coexist within the brain. Eur J Neurosci 35:1190–1200. doi:10.1111/j.1460-9568.2011.07986.x
Roughley S, & Killcross S. 2019. Loss of hierarchical control by occasion setters following lesions of the prelimbic and infralimbic medial prefrontal cortex in rats. Brain Sci 9:1–16. doi:10.3390/brainsci9030048
Salas R, Baldwin P, de Biasi M, & Montague PR. 2010. BOLD responses to negative reward prediction errors in human habenula. Front Hum Neurosci 4:1–7. doi:10.3389/fnhum.2010.00036
Santini E, Quirk GJ, & Porter JT. 2008. Fear conditioning and extinction differentially modify the intrinsic excitability of infralimbic neurons. J Neurosci 28:4028–4036. doi:10.1523/JNEUROSCI.2623-07.2008
Savastano HI, Cole RP, Barnet RC, & Miller RR. 1999. Reconsidering Conditioned Inhibition. Learn Motiv 30:101–127. doi:10.1006/lmot.1998.1020
Schmidt ERE, & Pasterkamp RJ. 2017. The molecular mechanisms controlling morphogenesis and wiring of the habenula. Pharmacol Biochem Behav 162:29–37. doi:10.1016/j.pbb.2017.08.008
Schultz W, Dayan P, & Montague PR. 1997. A neural substrate of prediction and reward. Science 275:1593–1599. doi:10.1126/science.275.5306.1593
Sharpe MJ, & Killcross S. 2018. Modulation of attention and action in the medial prefrontal cortex of rats. Psychol Rev 125:822–843. doi:10.1037/rev0000118
Sharpe MJ, & Killcross S. 2015. The prelimbic cortex directs attention toward predictive cues during fear learning. Learn Mem 22:289–293. doi:10.1101/lm.038273.115
Sosa R, & dos Santos CV. 2019a. Toward a Unifying Account of Impulsivity and the Development of Self-Control. Perspect Behav Sci 42:291–322. doi:10.1007/s40614-018-0135-z
Sosa R, & dos Santos CV. 2019b. Conditioned Inhibition and its Relationship to Impulsivity: Empirical and Theoretical Considerations. Psychol Rec 69:315–332. doi:10.1007/s40732-018-0325-9
Sosa R, & Ramírez MN. 2019. Conditioned inhibition: Historical critiques and controversies in the light of recent advances. J Exp Psychol Anim Learn Cogn 45:17–42. doi:10.1037/xan0000193
Sotres-Bayon F, Sierra-Mercado D, Pardilla-Delgado E, & Quirk GJ. 2012. Gating of Fear in Prelimbic Cortex by Hippocampal and Amygdala Inputs. Neuron 76:804–812. doi:10.1016/j.neuron.2012.09.028
Stamatakis AM, Van Swieten M, Basiri ML, Blair GA, Kantak P, & Stuber GD. 2016. Lateral hypothalamic area glutamatergic neurons and their projections to the lateral habenula regulate feeding and reward. J Neurosci 36:302–311. doi:10.1523/JNEUROSCI.1202-15.2016
Stephenson-Jones M, Bravo-Rivera C, Ahrens S, Furlan A, Xiao X, Fernandes-Henriques C, et al. 2020. Opposing Contributions of GABAergic and Glutamatergic Ventral Pallidal Neurons to Motivational Behaviors. Neuron 105:921-933.e5. doi:10.1016/j.neuron.2019.12.006
Stopper CM, & Floresco SB. 2014. What’s better for me? Fundamental role for lateral habenula in promoting subjective decision biases. Nat Neurosci 17:33–35. doi:10.1038/nn.3587
Sutton RS, & Barto AG. 2018. Reinforcement learning: An introduction, 2nd ed. Cambridge, MA: The MIT Press.
Tchenio A, Lecca S, Valentinova K, & Mameli M. 2017. Limiting habenular hyperactivity ameliorates maternal separation-driven depressive-like symptoms. Nat Commun 8. doi:10.1038/s41467-017-01192-1
Tian J, Huang R, Cohen JY, Osakada F, Kobak D, Machens CK, et al. 2016. Distributed and Mixed Information in Monosynaptic Inputs to Dopamine Neurons. Neuron 91:1374–1389. doi:10.1016/j.neuron.2016.08.018
Tian J, & Uchida N. 2015. Habenula Lesions Reveal that Multiple Mechanisms Underlie Dopamine Prediction Errors. Neuron 87:1304–1316. doi:10.1016/j.neuron.2015.08.028
Tobler PN, Dickinson A, & Schultz W. 2003. Coding of Predicted Reward Omission by Dopamine Neurons in a Conditioned Inhibition Paradigm. J Neurosci 23:10402–10410. doi:10.1523/jneurosci.23-32-10402.2003
Tsutsui-Kimura I, Matsumoto H, Uchida N, & Watabe-Uchida M. 2020. Distinct temporal difference error signals in dopamine axons in three regions of the striatum in a decision-making task. bioRxiv 2020.08.22.262972.
van Wijk BCM, Alkemade A, & Forstmann BU. 2020. Functional segregation and integration within the human subthalamic nucleus from a micro- and meso-level perspective. Cortex 131:103–113. doi:10.1016/j.cortex.2020.07.004
Vander Weele CM, Siciliano CA, Matthews GA, Namburi P, Izadmehr EM, Espinel IC, et al. 2018. Dopamine enhances signal-to-noise ratio in cortical-brainstem encoding of aversive stimuli. Nature 563:397–401. doi:10.1038/s41586-018-0682-1
Varga V, Kocsis B, & Sharp T. 2003. Electrophysiological evidence for convergence of inputs from the medial prefrontal cortex and lateral habenula on single neurons in the dorsal raphe nucleus. Eur J Neurosci 17:280–286. doi:10.1046/j.1460-9568.2003.02465.x
Vento PJ, & Jhou TC. 2020. Bidirectional Valence Encoding in the Ventral Pallidum. Neuron 105:766–768. doi:10.1016/j.neuron.2020.02.017
Verbruggen F, Best M, Bowditch WA, Stevens T, & McLaren IPL. 2014. The inhibitory control reflex. Neuropsychologia 65:263–278. doi:10.1016/j.neuropsychologia.2014.08.014
Vitay J, & Hamker FH. 2014. Timing and expectation of reward: A neuro-computational model of the afferents to the ventral tegmental area. Front Neurorobot 8:1–25. doi:10.3389/fnbot.2014.00004
Vogel EH, Ponce FP, & Wagner AR. 2019. The development and present status of the SOP model of associative learning. Q J Exp Psychol 72:346–374. doi:10.1177/1747021818777074
Wagner AR, & Rescorla RA. 1972. Inhibition in Pavlovian Conditioning: Application of a Theory. Inhib Learn 301–334.
Wang D, Li Y, Feng Q, Guo Q, Zhou J, & Luo M. 2017. Learning shapes the aversion and reward responses of lateral habenula neurons. Elife 6:1–20. doi:10.7554/elife.23045
Wasserman EA, Franklin SR, & Hearst E. 1974. Pavlovian appetitive contingencies and approach versus withdrawal to conditioned stimuli in pigeons. J Comp Physiol Psychol 86:616–627. doi:10.1037/h0036171
Wheeler RA, & Carelli RM. 2006. The neuroscience of pleasure. Focus on “ventral pallidum firing codes hedonic reward: When a bad taste turns good.” J Neurophysiol 96:2175–2176. doi:10.1152/jn.00727.2006
Wiecki TV, & Frank MJ. 2013. A computational model of inhibitory control in frontal cortex and basal ganglia. Psychol Rev 120:329–355. doi:10.1037/a0031542
Yagishita S, Hayashi-Takagi A, Ellis-Davies GCR, Urakubo H, Ishii S, & Kasai H. 2014. A critical time window for dopamine actions on the structural plasticity of dendritic spines. Science 345:1616–1620. doi:10.1126/science.1255514
Yan R, Wang T, & Zhou Q. 2019. Elevated dopamine signaling from ventral tegmental area to prefrontal cortical parvalbumin neurons drives conditioned inhibition. Proc Natl Acad Sci U S A 116:13077–13086. doi:10.1073/pnas.1901902116
Yetnikoff L, Cheng AY, Lavezzi HN, Parsley KP, & Zahm DS. 2015. Sources of input to the rostromedial tegmental nucleus, ventral tegmental area, and lateral habenula compared: A study in rat. J Comp Neurol 523:2426–2456. doi:10.1002/cne.23797
Yin H. 2020. The crisis in neuroscience. In: The Interdisciplinary Handbook of Perceptual Control Theory. Elsevier Inc. doi:10.1016/b978-0-12-818948-1.00003-4
Zapata A, Hwang EK, & Lupica CR. 2017. Lateral Habenula Involvement in Impulsive Cocaine Seeking. Neuropsychopharmacology 42:1103–1112. doi:10.1038/npp.2016.286
Zhu Y, Qi S, Zhang B, He D, Teng Y, Hu J, et al. 2019. Connectome-based biomarkers predict subclinical depression and identify abnormal brain connections with the lateral habenula and thalamus. Front Psychiatry 10:1–14. doi:10.3389/fpsyt.2019.00371

Figure 1. Projections of the midbrain dopamine regions and their neurochemical interaction with the LHb. Panel A shows a sagittal view of the rat brain, depicting efferent pathways of midbrain dopamine regions and their input from the LHb. Panel B represents synaptic relationships between neurons in the VTA and inputs (direct and through the RMTg) from the LHb. DA: dopamine (orange), GABA: gamma-aminobutyric acid (green), Glu: glutamate (blue), mPFC: medial prefrontal cortex, dSTR: dorsal striatum, NAc: nucleus accumbens, LHb: lateral habenula, VTA: ventral tegmental area, SNc: substantia nigra pars compacta, RMTg: rostromedial tegmental nucleus. The image of the rat was adapted from https://smart.servier.com/smart_image/rat, under the terms of the CC-BY 3.0 Unported license (https://creativecommons.org/licenses/by/3.0/). Rat brain adapted from Juarez et al. (2013).

Figure 2. Dynamics of associative strength according to the Bush–Mosteller model in 100 consecutive trials of acquisition (λ = 1) followed by 100 extinction trials (λ = 0). Colors represent different learning rates (α), assuming that extinction rates are 3/4 of acquisition rates. Note that the decrease in responding (right panel) is a dynamic process that progresses with each trial, unlike the slow and continuous decay that would be expected if we simply allowed time to pass.
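The trajectories in Figure 2 can be reproduced in a few lines. The following is a minimal sketch (a hypothetical reconstruction, not the authors' simulation code) using the linear-operator update V ← V + α(λ − V) and the parameter values stated in the caption:

```python
# Bush-Mosteller-style acquisition and extinction curves for Figure 2:
# 100 acquisition trials (lambda = 1) followed by 100 extinction trials
# (lambda = 0), with the extinction rate set to 3/4 of the acquisition rate.

def simulate(alpha_acq, n_acq=100, n_ext=100):
    """Return the trial-by-trial associative strength V."""
    alpha_ext = 0.75 * alpha_acq   # extinction rate, per the caption
    v, history = 0.0, []
    for _ in range(n_acq):         # acquisition: V moves toward lambda = 1
        v += alpha_acq * (1.0 - v)
        history.append(v)
    for _ in range(n_ext):         # extinction: V moves toward lambda = 0
        v += alpha_ext * (0.0 - v)
        history.append(v)
    return history

curve = simulate(0.16)             # the fastest learning rate shown in the figure
print(curve[99], curve[-1])        # V at the end of acquisition and of extinction
```

With α = .16, V is essentially at asymptote by trial 100 and back near zero by trial 200, matching the fastest curve in the figure; the extinction limb falls trial by trial rather than decaying with the mere passage of time.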

Figure 3. Possible values of associative strength for stimulus X (VX; color scale) after being presented in compound with an excitatory conditioned stimulus, A, and followed by the omission of reward (λ = 0). According to the Rescorla–Wagner model, the outcome depends on the associative strength of the companion stimulus A (horizontal axis) and that of X (vertical axis) prior to the reward omission episode, and on the learning rate (α = .12 is assumed).

Figure 4. Associative strength, according to the Rescorla–Wagner model, of stimuli involved in a feature-negative discrimination paradigm over 80 trials of each type. Rewarded trials of stimulus A and nonrewarded trials of the compound AX were assumed to alternate non-randomly. Values of α were set at 0.1 and 0.075 for rewarded (λ = 1) and nonrewarded (λ = 0) trials, respectively. The associative strength of stimulus A and of the compound AX can be observed in actual performance. In contrast, the associative strength of X alone (i.e., the conditioned inhibitor) is theoretically inferred from the model and can only be revealed by special tests (see text).
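The Figure 4 simulation can likewise be sketched directly from the Rescorla–Wagner rule: each stimulus present on a trial changes by α(λ − V_total), where V_total sums the strengths of all stimuli present. This is a hypothetical reconstruction (not the authors' code) that assumes strict A+/AX− alternation with the rewarded trial first in each pair:

```python
# Rescorla-Wagner feature-negative discrimination (Figure 4 parameters):
# alpha = 0.1 on rewarded trials (lambda = 1), alpha = 0.075 on
# nonrewarded trials (lambda = 0); 80 trials of each type.

def feature_negative(n_pairs=80, a_rew=0.1, a_nonrew=0.075):
    va, vx = 0.0, 0.0
    for _ in range(n_pairs):
        va += a_rew * (1.0 - va)        # A+ trial: error = 1 - V_A
        error = 0.0 - (va + vx)         # AX- trial: error on the compound
        va += a_nonrew * error          # both elements share the compound error
        vx += a_nonrew * error
    return va, vx

va, vx = feature_negative()
print(va, vx)
```

As in the figure, A retains strong excitatory strength while X is driven to net negative values (toward −1), illustrating why the inhibitor's strength is a theoretical quantity: the compound AX ends up near zero in performance, and X's negative value only shows itself in summation or retardation tests.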

Figure 5. Simplified diagram representing the participation of the LHb in plastic brain processes for adaptively updating behavior, and hypothesized consequences of excessive activity (dark purple) or inactivity (dark aqua) of this brain structure. EPN = entopeduncular nucleus, LHb = lateral habenula, RMTg = rostromedial tegmental nucleus, VP = ventral pallidum, VTA = ventral tegmental area.
[Figure artwork for Figures 1–5 appears here; the panel labels, legends, and axis text extracted from the images are not reproduced as running text. See the captions above.]