Review | Cognition and Behavior The Role of the Lateral Habenula in Inhibitory Learning from Reward Omission Experiences https://doi.org/10.1523/ENEURO.0016-21.2021
Cite as: eNeuro 2021; 10.1523/ENEURO.0016-21.2021 Received: 12 January 2021 Revised: 26 March 2021 Accepted: 30 March 2021
This Early Release article has been peer-reviewed and accepted, but has not been through the composition and copyediting processes. The final version may differ slightly in style or formatting and will contain links to any extended data.
Copyright © 2021 Sosa et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license, which permits unrestricted use, distribution and reproduction in any medium provided that the original work is properly attributed.
1. Manuscript Title (50 word maximum): The Role of the Lateral Habenula in Inhibitory Learning from Reward Omission Experiences
2. Abbreviated Title (50 character maximum): Lateral Habenula and Inhibitory Learning
3. List all Author Names and Affiliations in order as they would appear in the published article:

Rodrigo Sosa. Universidad Panamericana (corresponding author).
Jesús Mata-Luévanos. Universidad Iberoamericana.
Mario Buenrostro-Jáuregui. Universidad Iberoamericana (co-corresponding author).
4. Author Contributions: RS conceived the idea and outlined the paper; RS, JM-L, and MB-J wrote the paper to equal extent; RS and MB-J worked out the figures; JM-L formatted the manuscript.
5. Correspondence should be addressed to (include email address): Rodrigo Sosa ([email protected]) or Mario Buenrostro-Jáuregui ([email protected]).
6. Number of Figures: 5
7. Number of Tables: 0
8. Number of Multimedia: 0
9. Number of words for Abstract: 203
10. Number of words for Significance Statement: 119
11. Number of words for Introduction: 523
12. Number of words for Discussion: 925
13. Acknowledgements: We thank División de Investigación y Posgrado of the Universidad Iberoamericana Ciudad de México and the Fondo Fomento a la Investigación of the Universidad Panamericana for providing funding for production expenses.
14. Conflict of Interest: The authors declare no competing financial interests.
15. Funding sources: Sistema Nacional de Investigadores, Consejo Nacional de Ciencia y Tecnología.
The Role of the Lateral Habenula in Inhibitory Learning from Reward Omission Experiences

Abstract
The lateral habenula (LHb) is a phylogenetically primitive brain structure that plays a key role in learning to inhibit distinct responses to specific stimuli. This structure is activated by primary aversive stimuli, cues predicting an imminent aversive event, unexpected reward omissions, and cues associated with the omission of an expected reward. The most widely described effect of LHb activation is acutely suppressing midbrain dopaminergic signaling. However, recent studies have identified multiple means by which the LHb fosters this effect, as well as other mechanisms of action. These findings reveal the complex nature of LHb function. The present paper reviews the role of this structure in learning from reward omission experiences. We approach this topic from the perspective of computational models of behavioral change that account for inhibitory learning, using them to frame key findings. Such findings are drawn from recent behavioral neuroscience studies that use novel brain imaging, stimulation, ablation, and reversible inactivation techniques. Further research and conceptual work are needed to clarify the nature of the processes related to updating motivated behavior in which the LHb is involved. As yet, there is little understanding of whether such mechanisms are parallel or complementary to the well-known modulatory function of the more recently evolved prefrontal cortex.
Keywords: dopamine signaling, conditioned inhibition, inhibitory learning, mesolimbic pathway, mesocortical pathway, negative reward prediction error
Significance Statement

The lateral habenula is a brain structure that has received a great deal of attention in the past decades, and this research field has been extensively reviewed. Here, we review in detail some key recent findings that are pivotal in framing the role of the lateral habenula in a well-described associative learning phenomenon: conditioned inhibition. This specific topic has not been considered deeply enough in previous review articles. We also outline the possible mechanisms by which the lateral habenula updates behavior by means of two identified pathway categories (inhibitory and excitatory). This provides a comprehensive account potentially embracing more issues than previously thought, refines our understanding of multiple reward-related mechanisms, and raises novel research questions.
Introduction

The lateral habenula (LHb) is a phylogenetically preserved brain structure located on the dorsomedial surface of the thalamus that participates in learning from aversive (i.e., undesired) experiences. These include primary aversive stimuli, reward omission, and cues associated with either aversive stimuli or reward omission (Baker et al., 2016). However, recent research suggests that the LHb is not as crucial for learning from primary aversive experiences (see Li et al., 2019b) as it is for learning from reward omission.
The LHb has been characterized as part of a “brake” mechanism that suppresses firing in midbrain dopaminergic neurons (Barrot et al., 2012; Vento & Jhou, 2020). This effect is achieved mainly by exciting GABAergic neurons in the rostromedial tegmental nucleus (RMTg) that reach the ventral tegmental area (VTA) and the substantia nigra pars compacta (SNc) (Jhou, Geisler, Marinelli, Degarmo, & Zahm, 2009). However, direct glutamatergic excitation of GABAergic interneurons that synapse with dopamine neurons in the VTA also takes place (Omelchenko and Sesack, 2009). In both cases, the net effect is to suppress dopamine release in the nucleus accumbens (NAc; by VTA endings) and the dorsal striatum (by SNc endings). This would disrupt reward-related motor activity and reward-related plastic changes associated with midbrain dopamine release (Tsutsui-Kimura et al., 2020). A third, recently discovered pathway (Lammel et al., 2012) involves LHb glutamatergic projections reaching dopamine neurons in the VTA that, in turn, target the medial prefrontal cortex (mPFC; see Figure 1). mPFC activity has been associated with aversive learning (Huang et al., 2019) and the capacity to behave congruently with hierarchically arranged stimulus sets (Roughley and Killcross, 2019).
***INSERT FIGURE 1***
Numerous reviews of the physiology of the LHb have already been published; most of them cover the participation of the LHb in primary aversive learning, stressing its putative implications for psychiatric conditions (e.g., Hu, Cui, & Yang, 2020). Other settings that involve the LHb in non-primary aversive situations, such as tasks that require behavioral flexibility, have also been reviewed (e.g., Baker et al., 2016). In this review, we discuss the experimental findings on a third front: the role of the LHb in reward-omission learning, which is known to involve negative reward prediction errors. In particular, we explore how such findings could be framed in terms of models of behavioral change. This might be informative for a research agenda aiming to unveil the details of the adaptive mechanisms in which the LHb participates. In addition, it may aid in understanding maladaptive behaviors that arise from the disruption of brain regions in the extended network in which the LHb is embedded. We conceive learning from reward omission as being potentially integrated with more widely studied phenomena involving LHb function (i.e., primary aversive conditioning and behavioral flexibility). On one hand, reward-omission experiences imbue preceding cues with properties that are functionally equivalent to those of cues associated with primary aversive stimuli. On the other hand, learning in tasks requiring behavioral flexibility involves committing errors (Baker, Raynor, Francis, & Mizumori, 2017), and this often entails the omission of otherwise expected rewards. Therefore, we hope that understanding one of these artificially delineated fields will help to elucidate the mechanisms involved in the others.
1. Modeling the Dynamics of Acquisition and Extinction of Responding in Reward Learning
Learning can be roughly regarded as a process in which behavior is updated as an organism faces environmental regularities. Although learning is a continuous process (Sutton & Barto, 2018), it is sometimes methodologically and analytically useful to discretize it into trials (Harris, 2019). Trials are fractions of time in which explicit experiences are assumed to promote specific changes in future behavior. Rescorla and Wagner (1972) proposed a model accounting for the associative strength of a target stimulus, given its pairings with an affectively significant event, on a trial-by-trial basis. Such an associative strength or value could represent a theoretical estimate of either the magnitude or the probability of responding to the target stimulus on the next trial. While “responding” was originally intended to reflect a change in overt outcome-anticipating behavior, it may also represent a neural-level state change, such as the firing of dopamine neurons (see Roesch, Esber, Li, Daw, and Schoenbaum, 2012). This model states that a target stimulus gains or loses associative strength according to a learning rule. The change in this value on a given trial depends on the discrepancy between the current value of the target stimulus and that of the outcome that follows (e.g., presentation or absence of a reward). Formally:
ΔVX = α(λ − ΣVN)     (Equation 1)
where ΔVX represents the change in the associative strength (V) of the target stimulus (or action; X) on a given trial, λ represents the magnitude of reward (zero if the reward is omitted), ΣVN represents the sum of the values of all cues that are present during a conditioning trial (usually accounted for by a few of those cues), and α, which is bound between 0 and 1, represents a learning rate parameter. To show how the model operates, let us first consider the simplest possible example. This requires assuming that, during conditioning trials, the only relevant input besides the reward is a single cue (so that VX ≈ ΣVN). If λ (> 0) and α are held constant, VX will increase toward λ throughout conditioning trials (e.g., pairings of the stimulus with reward) in a decelerated fashion, as the discrepancy between these parameters (i.e., λ − ΣVN) progressively declines (see Figure 2, left panel). Hence, the absolute value of the parenthetical term in Equation 1 determines the amount of behavioral change on the next trial of the same type. The discrepancy between expected and experienced outcomes has been termed prediction error and can be regarded as a theoretical teaching signal that alters some aspect of the system to update behavior. Prediction errors can be classified as appetitive (reward-related) or aversive, and as positive or negative (Iordanova, Yau, McDannald, & Corbit, 2021). In addition, prediction errors seem to dictate both Pavlovian (i.e., stimulus-outcome) and instrumental (i.e., action-outcome) learning through a homologous correction mechanism (Bouton, Thrailkill, Trask, & Alfaro, 2020; see also Eder, Rothermund, De Houwer, & Hommel, 2015).
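The trial-by-trial dynamics described above can be sketched with a minimal simulation of Equation 1 for a single cue. The parameter values (α = 0.2, λ = 1, trial counts) are arbitrary choices for illustration, not values taken from any study:

```python
# Minimal simulation of the Rescorla-Wagner rule (Equation 1) for a single
# cue X, assuming VX ~ sum(VN). All parameter values are illustrative.

def rw_update(v, lam, alpha=0.2):
    """One application of Equation 1: dVX = alpha * (lam - sum(VN))."""
    return v + alpha * (lam - v)

def simulate(n_acq=30, n_ext=30, lam=1.0):
    v, history = 0.0, []
    for _ in range(n_acq):          # acquisition: cue paired with reward
        v = rw_update(v, lam)
        history.append(v)
    for _ in range(n_ext):          # extinction: reward omitted (lam = 0)
        v = rw_update(v, 0.0)
        history.append(v)
    return history

curve = simulate()
# VX climbs toward lam in a decelerating fashion and then decays back
# toward zero once the reward is consistently omitted.
```

Note that VX never goes below zero here: with a single cue, Equation 1 can only move VX between 0 and λ, which anticipates the point developed in the section on net inhibitory effects.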
***INSERT FIGURE 2***
At the physiological level, canonical midbrain dopamine neurons fire above their baseline activity upon unexpected rewards (i.e., positive reward prediction error; λ > ΣVN). Such dopamine bursts increase the probability of occurrence of the actions that took place beforehand whenever the organism faces a similar situation in the future (i.e., positive reinforcement). In addition, dopamine bursts backpropagate to reliable signals of reward, making them acquire signaling (i.e., discrimination learning) and rewarding (i.e., conditioned reinforcement) properties in themselves (Sutton & Barto, 2018). Phasic dopamine activity (as in VX > 0) is triggered by inputs received by midbrain neurons from multiple nodes in a brain-wide network (Tian et al., 2016). Deceleration in the increase of associative strength with repeated paired presentations of the stimulus and the reward would require an antagonistic mechanism. Inhibitory NAc-VTA projections have been hypothesized to play some role in this mechanism (see Mollick et al., 2020).
Once a cue or action has acquired some associative strength, the repeated omission of reward after its presentation allows behavior to return to the initial non-responsive state. Reward omissions following the presentation of an already conditioned stimulus (or whenever λ is smaller than ΣVN; i.e., negative reward prediction error) trigger a phasic decrease (or dip) in dopamine activity (e.g., Matsumoto and Hikosaka, 2007; Schultz, Dayan, and Montague, 1997). If repeated consistently, these dopamine dips will decrease dopamine firing upon the target stimulus in subsequent trials. This translates into a decrease in both conditioned responses and reinforcing effects associated with the target stimulus until a minimum value is reached (see Figure 2, right panel). Here, the procedure, process, and outcome [Footnote 1: Note that the minimal response to the target stimulus observed after extinction procedures is quite labile, and responding is often susceptible to bounce back after a slight contextual change (Bouton, Maren, & McNally, 2020). This is not accounted for by the Rescorla–Wagner model.] are each referred to by the term ‘extinction’. Suppression of midbrain dopamine neurons induced by LHb activity plays a key role in this (context-dependent; see Footnote 1) restoration mechanism. For example, a recent study by Donaire et al. (2019) found that LHb lesions impair extinction of an appetitive response in rats. Similarly, Zapata, Hwang, and Lupica (2017) found that pharmacological inhibition of the LHb impaired extinction of responses that were previously rewarded in the presence of a stimulus. However, this effect was selective for cocaine reward and did not impair extinction of responding previously rewarded with a sucrose solution. This result was interpreted as indicative of the greater difficulty of withholding responses rewarded with cocaine compared with those rewarded with sucrose. Both findings reveal the participation of the LHb in decreasing previously acquired behavior when a stimulus or action is no longer followed by a reward (i.e., extinction).
2. Inducing, Testing, and Modeling Net Inhibitory Effects
When conditioning trials are conceptualized as if a single cue were relevant for reward learning (such that VX = ΣVN), obtaining a negative value for VX with Equation 1 is logically impossible. As VX would range from zero to λ, it can only vary in its degree of “rewardingness” (for lack of a better term). That is, one could conceive of a stimulus as more or less rewarding than another, but hardly as more aversive in a general sense. However, a distinctive feature of the Rescorla–Wagner model allows it to capture both net negative associative strength values and the acquisition of opposed affective effects. This feature is important because it captures aspects of key conditioning phenomena.
A neutral stimulus that is consistently paired with the omission of an expected affectively significant event often acquires the opposite affective valence. In this case, a stimulus paired with the omission of an expected reward would acquire aversive properties (Wasserman, Franklin, and Hearst, 1974). This means that the organism will learn to avoid it (either actively or passively) or escape from it. On the other hand, the negative summation effect occurs when a stimulus associated with the omission of reward is presented simultaneously with one that reliably predicts the reward. The usual result is that responding to the compound stimulus is reduced compared with that controlled by the reward-predicting stimulus alone (Acebes et al., 2012; Harris, Kwok, & Andrew, 2014). The model of Rescorla and Wagner (1972) allows for partitioning of the elements that constitute a compound stimulus; concretely, it provides a rule to determine how the associative strength values of different stimuli presented simultaneously interact trial by trial. Recall that ΔVX represents the change in the associative strength of a single element, X, of the assemblage of current stimuli, N. Whenever ΣVN exceeds the value of λ (as in trials involving reward omission), VX can take negative values under appropriate conditions. For X to end up with a net negative associative strength following reward omission, (1) its current associative strength must be sufficiently low and (2) the remaining coextensive stimuli must have some associative value (see Figure 3).
***INSERT FIGURE 3***
In the Rescorla–Wagner model, the magnitude of responses evoked by ΣVN is assumed to be a sort of algebraic sum of the values of the current stimuli (obviating biological limits). Crucially, this model assumes that the context (cues that are continually present) can acquire associative strength and is thus capable of meaningfully impacting learning. For example, presenting X without the reward and the reward without X, with time intervals in between, would imbue that stimulus with net inhibitory properties (i.e., VX < 0). This protocol is known as the explicitly unpaired procedure. In such a condition, the context, C, acquires a positive associative strength (VC > 0) by being present when the reward is delivered. Importantly, the context is also present during the trials involving the X stimulus plus the omission of reward, so that ΣVN equals VC + VX (Wagner & Rescorla, 1972). Then, ΔVX will be negative on every such trial, and VX will progressively accrue a negative value. This would manifest as aversion-related behaviors towards X (see Wasserman et al., 1974) and as a subtraction of the response magnitude evoked by a reward-predicting stimulus whenever X is presented (i.e., X ∈ ΣVN < X ∉ ΣVN). Such outcomes are justified by two assumptions. First, a negative VX implies a subtraction of the response-evoking potential of concurrent stimuli (Wagner & Rescorla, 1972). Second, negative associative values imply that a stimulus has an affective valence that opposes that of the outcome with which it was trained (Daly and Daly, 1982).
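The role of the context in the explicitly unpaired procedure can be sketched with the compound form of Equation 1, in which every stimulus present on a trial is updated with the same shared error term. The learning rate, reward magnitude, and trial counts below are arbitrary illustrative choices:

```python
# Sketch of the explicitly unpaired procedure: reward is delivered in the
# context C alone, and X is presented in C without reward. Every stimulus
# present on a trial shares the common error term alpha * (lam - sum(VN)).

def compound_trial(values, present, lam, alpha=0.2):
    """Update each stimulus in `present` with the shared prediction error."""
    error = lam - sum(values[s] for s in present)
    for s in present:
        values[s] += alpha * error

V = {"C": 0.0, "X": 0.0}   # context C and target stimulus X
for _ in range(100):
    compound_trial(V, ["C"], lam=1.0)        # reward in the context alone
    compound_trial(V, ["C", "X"], lam=0.0)   # X plus context, reward omitted
# The context ends with positive strength (VC > 0), while X accrues a net
# negative value (VX < 0), as described in the text.
```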
A convenient way to conceptually and empirically instantiate these attributes of the Rescorla–Wagner model is the feature-negative discrimination protocol [Footnote 2: This paradigm is more broadly known as Pavlovian conditioned inhibition; however, “conditioned inhibition” is also used to designate the process that the target stimulus undergoes (here termed “inhibitory learning”), as well as the empirical demonstration that such a process has taken place (Savastano, Cole, Barnet, and Miller, 1999). Therefore, here we use the term “feature-negative discrimination” for the procedure and “inhibitory learning” for the process in order to avoid ambiguity.]. This technique consists of presenting two types of conditioning trials, usually in a randomly alternated fashion. One type of trial consists of presenting a single stimulus, A, followed by an affective outcome (e.g., reward). The other type of trial consists of the same stimulus accompanied by another stimulus, X, followed by the omission of that outcome. Such a manipulation, according to the Rescorla–Wagner model, leads stimulus A to acquire a positive associative strength and stimulus X to acquire a deeply negative associative strength (see Figure 4). This hypothetical attribute is known as conditioned inhibition, and Rescorla (1969) stated that it should be demonstrated empirically using two special tests. One of these tests is the above-described negative summation effect, and the other is retardation in the acquisition of a conditioned response by the target stimulus. Also, as stated above, conditioned inhibitors trained with an outcome of a particular affective valence have been documented to acquire the opposite valence (i.e., appetitive to aversive and vice versa).
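Under the same compound update rule, feature-negative discrimination can be sketched as follows. The parameters are again arbitrary illustrative choices, and the separately trained predictor B (assumed to be at asymptote) is a hypothetical stimulus introduced only to illustrate the negative summation test:

```python
# Sketch of feature-negative discrimination under the Rescorla-Wagner
# compound rule: A alone is rewarded, the compound AX is not. Parameter
# values and the separately trained predictor B are illustrative assumptions.

def compound_trial(values, present, lam, alpha=0.2):
    """Update each stimulus in `present` with the shared prediction error."""
    error = lam - sum(values[s] for s in present)
    for s in present:
        values[s] += alpha * error

V = {"A": 0.0, "X": 0.0}
for _ in range(100):
    compound_trial(V, ["A"], lam=1.0)        # trial type 1: A alone -> reward
    compound_trial(V, ["A", "X"], lam=0.0)   # trial type 2: A + X -> omission
# A ends strongly positive and X deeply negative (cf. Figure 4).

# Negative summation test: compound X with a hypothetical, separately
# trained reward predictor B whose value is assumed to be at asymptote.
VB = 1.0
predicted_compound = VB + V["X"]   # near zero: X subtracts responding to B
```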
***INSERT FIGURE 4***

3. Dopamine Dips Propagate from Reward Omission to Events that Predict it
The omission of an expected reward seems to promote a backpropagation process similar to the one that takes place when a cue predicts reward. Interestingly, the LHb is likely to play a crucial role in this process (see also Mollick et al., 2020). As mentioned above, reward omission triggers a phasic depression in dopamine release. In turn, this promotes a decrease in dopamine release (and its behavioral outcomes) towards cues that predict reward omissions. This implies that dopamine dips promote a plastic change in the behaving agent that is opposite to that involved in reward. The clearest illustrations of this claim are provided by two complementary experiments described below.
First, Tobler, Dickinson, and Schultz (2003) exposed macaque monkeys to a feature-negative discrimination paradigm (see section above). In this study, the presentation of a cue alone predicted reward delivery. The presentation of the same cue accompanied by another stimulus, the conditioned inhibitor, predicted reward omission. Through training, the conditioned inhibitor acquired the capacity to counteract the effects of the companion cue on reward omission trials (see Figure 4; gray circles). This tendency was also observed when the conditioned inhibitor was presented along with a different reward-predicting cue in a novel compound (i.e., negative summation). Notably, those effects were reported at both the behavioral (licking response) and neural (dopamine neuron activity) levels of observation. Most importantly, the presentation of the conditioned inhibitor alone, while behaviorally silent, was capable of decreasing baseline dopamine firing levels. This revealed that reward omission promotes a transfer of function from the situation in which the reward is actually omitted to the cue that consistently predicts it.
The second experiment was conducted recently by Chang, Gardner, Conroy, Whitaker, and Schoenbaum (2018), using rats as subjects. In this study, brief dopamine dips were artificially induced in midbrain neurons following the simultaneous presentation of a reward-predicting cue and a novel cue. After training, the novel cue had acquired the properties of a conditioned inhibitor, as per the conventional tests of negative summation and retardation of acquisition. Crucially, in the training trials in which this novel cue was involved, the reward was not omitted. Thus, the inhibitory effects on the target cue must have arisen from the artificial dips in dopamine firing, complementing the findings of Tobler et al. (2003). On one hand, Tobler et al. (2003) demonstrated that dopamine dips can propagate from the omission of reward to the presentation of an arbitrary event; on the other hand, Chang et al. (2018) demonstrated that the dopamine dips themselves are sufficient for this to occur.
As we mentioned above, one of the most robust physiological effects of LHb activation is producing phasic dips in dopamine firing. Therefore, both Tobler et al.'s and Chang et al.'s findings are relevant to elucidating the function of LHb physiology in learning from experiences involving reward omission. In another recent study, Mollick et al. (2021) set out to test this notion with humans in an fMRI study. The aim was to replicate Tobler et al.'s (2003) behavioral protocol while measuring activity in different regions across the whole brain (not only in midbrain dopamine cells). Mollick et al. indeed captured greater LHb activity during the presentation of a conditioned inhibitor compared to control stimuli; however, this difference did not survive correction for multiple comparisons. Therefore, the authors recommend further replication before any inference can be made. While an earlier human fMRI study (Salas et al., 2010) already documented that the habenula is activated when an expected reward is omitted (i.e., negative reward prediction error), evidence for the backpropagation of this effect remains elusive. A possible explanation for the null result reported by Mollick et al. (2021) is that the human LHb is quite small and cannot be differentiated from the medial habenula at standard fMRI resolutions (cf. Salas et al., 2010). Similarly, the study by Mollick et al. (2021) failed to capture decreased activity in midbrain dopamine regions during reward omissions. The authors argued that their fMRI spatial resolution may also have precluded the distinction between these regions and the adjacent GABAergic RMTg (which was presumably active as well during reward omissions).
4. Concurrent Reward Cues Contribute to Dopamine Dips and their Propagation
The above section concluded that dopamine dips may function as “teaching” signals that gradually imbue a stimulus with negative associative strength. The Rescorla–Wagner model states that the negative associative strength accrued by a stimulus that is paired with reward omission depends on concurrent reward cues (see Figure 3). Therefore, if we assume that dopamine dips depend on LHb activation, the latter would rely on current information about an impending reward. That is, if a stimulus is followed by the omission of reward, its associative strength would not change unless it is coextensive with reward-related cues. Therefore, LHb activation might not be induced by the sheer non-occurrence of reward. Rather, the LHb may become active when this non-occurrence coincides with signals from other brain regions indicating probable reward. Furthermore, the greater the signal for reward, the larger the downshift in associative strength of (and the aversion imbued to) the stimulus paired with reward omission (see Figure 3). Accordingly, dopamine dips induced by reward omission would be proportional to the probability (or magnitude) of the reward associated with a conditioned stimulus. Tian and Uchida (2015) found evidence supporting this assumption using mice as subjects and three different probabilities of reward associated with different target stimuli. Crucially, this pattern of results was disrupted in animals with bilateral lesions of the whole habenular complex.
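The proportionality suggested by Tian and Uchida's (2015) results can be illustrated with a back-of-envelope Rescorla–Wagner calculation: under partial reinforcement with reward probability p, the expected update drives VX toward p·λ, so the prediction error on an omission trial scales with p. The probabilities and parameters below are illustrative choices, not values from the original study:

```python
# Expected-value form of the Rescorla-Wagner update under partial
# reinforcement: averaged across trial outcomes, dV = alpha * (p * lam - V),
# so V converges to p * lam. The omission-trial error (0 - V) thus scales
# with reward probability p.

def asymptotic_value(p, lam=1.0, alpha=0.1, n_trials=500):
    v = 0.0
    for _ in range(n_trials):
        v += alpha * (p * lam - v)   # averaged update across trial outcomes
    return v

dips = {p: 0.0 - asymptotic_value(p) for p in (0.25, 0.5, 0.75)}
# dips[0.75] is the most negative: richer reward expectations produce a
# deeper prediction-error 'dip' when the reward is omitted.
```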
A relevant consideration is the origin of the inputs required to invigorate LHb activity upon signals of an impending reward. The entopeduncular nucleus (EPN; the border region of the globus pallidus internal segment in primates) (Hong and Hikosaka, 2013) and the lateral hypothalamus (LH) (Stamatakis et al., 2016) provide excitatory inputs to the LHb. However, up to this point, it remains unclear how impending reward signals mediate the invigoration of the LHb. An elegant study with macaques by Hong and Hikosaka (2013) found that electrical stimulation of the ventral pallidum (VP) consistently inhibited the activity of the LHb. These authors conjectured that an interaction between inputs from the EPN and the VP in the LHb was responsible for the activation of the LHb. Recent evidence indicates that GABAergic neurons in the VP are activated by surprising rewards and reward-predicting cues and inhibited by aversive stimuli and reward omission (Stephenson-Jones et al., 2020; Wheeler and Carelli, 2006). Crucially, inhibition of LHb-projecting VP GABAergic neurons upon reward omission depends on current reward cues. In short, these VP neurons showed a biphasic ascending→descending pattern of activation to the reward-cue→reward-omission sequence. Therefore, this structure is a strong candidate for providing the signals that invigorate the activation of the LHb in reward omission episodes via disinhibition.
The strength with which the EPN triggers LHb activation may depend on the prior tone of VP inhibitory inputs. In fact, Stephenson-Jones et al. (2020) reported that a number of GABAergic VP neurons synapsing with the LHb exhibited a sustained pattern of activation during reward-predicting cues. Thus, sudden inhibition of these neurons upon reward omission may combine with activation of EPN signaling to strongly disinhibit the LHb. Stephenson-Jones et al. (2020) also observed that a small subpopulation of LHb-projecting glutamatergic VP neurons showed the opposite pattern of activation to that of the GABAergic ones. Activation of these neurons may complement the potent disinhibition-excitation process in the LHb during the omission of an expected reward.
Knowing the sources of excitatory and inhibitory inputs to the LHb is an essential matter. However, this leads to the task of identifying the upstream inputs of these sources (Bromberg-Martin, Matsumoto, Hong, and Hikosaka, 2010b), and so on. Hong and Hikosaka (2013) speculated that the EPN may activate the LHb through disinhibitory inputs from striosomal regions of the dorsal striatum. This conjecture has recently been supported by a study conducted by Hong et al. (2019), in which the activation of striatal neurons inside or near striosomes correlated with activation of the LHb. Hong et al. (2019) further advanced the hypothesis that striosomes may convey both excitatory and inhibitory inputs to the LHb. Regarding the VP, Stephenson-Jones et al. (2020) reported that neurons in this structure targeting the LHb became active in a state-dependent fashion; specifically, these neurons fire according to the motivational status of the subject, which is probably mediated by hypothalamic signals. It seems reasonable that activation of the LHb would depend on the individual's energy supply status or sexual drive at a particular moment. In short, the omission of reward should not cause substantial disturbance to a sated individual. This could be another possible explanation for the failure of Mollick et al. (2021) to replicate Tobler et al.'s findings. Both studies used fruit juice as a reward. However, in the latter study macaques were liquid-deprived, while in the former participants were recruited based on self-reported preference for the type of juice employed. The motivational state of the participants in Mollick et al.'s (2021) study was probably insufficient to induce strong physiological responses.

5. The Lateral Habenula Fosters Stimulus-Specific and Response-Specific Inhibitory Effects

Although feature-negative discrimination is a prototypic procedure for inducing conditioned inhibition effects in appetitive settings (which here we are proposing as a behavioral phenotypic model of LHb function), other protocols are also capable of doing so (see Savastano et al., 1999). Likewise, other behavioral manifestations, besides negative summation and retardation of acquisition tests, could be used to certify conditioned inhibition (e.g., Wasserman et al., 1974). For example, in a pivotal study, Laurent, Wong, and Balleine (2017) exposed rats to a backward conditioning preparation to induce conditioned inhibition. Then, the conditioned inhibitory properties of the stimuli were tested through a Pavlovian-to-instrumental transfer (PIT) test. This backward conditioning procedure consisted of presenting one of two different auditory target stimuli following (rather than preceding, as typically done) the presentation of two different rewards; each target stimulus was consistently associated with its corresponding reward in a backward fashion (reward→stimulus). Next, rats pressed two levers to obtain rewards, and each of the rewards used in the previous phase was associated with a different lever. In the PIT test, the levers were present but pressing was no longer followed by reward. In that condition, the target stimuli that were used in the first phase were presented randomly and lever pressing was recorded. Conditioned inhibition was revealed inasmuch as each target stimulus biased lever pressing toward the lever opposite to that associated with its corresponding reward. Importantly, Laurent et al. (2017) found that this outcome was impaired by bilateral lesions of the LHb.
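The logic of the two-phase design just described can be summarized in a short sketch. The cue, reward, and lever labels below are hypothetical placeholders, not the stimuli actually used by Laurent et al. (2017):

```python
# Toy sketch of the backward-conditioning PIT design described above.
# Cue/reward/lever labels are hypothetical placeholders.

phase1_backward = {"pellet": "tone", "sucrose": "noise"}  # reward -> cue (backward)
phase2_levers = {"left": "pellet", "right": "sucrose"}    # lever -> earned reward

def predicted_choice_bias(cue):
    """Outcome-specific inhibition: a cue suppresses responding on the lever
    whose reward it negatively predicts, biasing choice to the other lever."""
    reward = {c: r for r, c in phase1_backward.items()}[cue]
    suppressed = next(l for l, r in phase2_levers.items() if r == reward)
    return "right" if suppressed == "left" else "left"

print(predicted_choice_bias("tone"))  # the tone biases choice away from "left"
```

The point of the sketch is the *outcome-specificity* of the predicted bias: each cue redirects behavior away from the lever tied to its own (negatively predicted) reward, rather than suppressing responding in general.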
The disruption of conditioned inhibition was first observed by performing an electrolytic lesion of the entire LHb. This result was replicated when the authors induced a selective ablation of the neurons in the LHb that project to the RMTg via a viral technique. Therefore, the authors concluded that this LHb-RMTg pathway is crucial for learning these specific negative predictions about rewards. Remarkably, electrolytic lesions in the LHb did not disrupt PIT performance for rats trained with a traditional forward-conditioning (stimulus→reward) procedure. This suggests that the disruptive effect was confined to inhibitory learning (see also Zapata et al., 2017). Laurent et al.'s (2017) findings provide important insights for understanding the role of the LHb-RMTg pathway in the development of conditioned inhibition in appetitive settings. Beyond generally impairing responding when reward is negatively predicted, the LHb seems to promote learning to perform specific action patterns to specific cues. Apparently, this structure participates in enabling alternative behaviors for exploiting other potential resources in the environment when the omission of a particular reward is imminent.
Notably, the findings reported by Laurent et al. (2017) challenge the Rescorla–Wagner model. According to this model, a stimulus that is explicitly unpaired with a reward can acquire negative associative strength (i.e., become a conditioned inhibitor). This would be mediated by the relatively high associative strength of the context that coextends with the target stimulus preceding reward omission. By this rationale, the Rescorla–Wagner model predicts that backward conditioning would likewise lead to conditioned inhibition (see Wagner & Rescorla, 1972), because in backward conditioning the target stimulus also co-occurs with the presumably excitatory context. However, Laurent et al. (2017) observed specific inhibitory effects for each of the target stimuli in their study. It is worth stressing that those target stimuli were trained in the same time frame and in the same context. Thus, the target stimuli presumably acquired conditioned inhibitory properties by being (negatively) associated with each reward. In contrast, the Rescorla–Wagner model predicts that both target stimuli would acquire general inhibitory properties. This follows from the rationale that each target stimulus was paired with the absence of either reward in a context associated with both.
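The context-mediated route to inhibition in the Rescorla–Wagner model can be illustrated with a minimal simulation. The learning rate, asymptotes, and trial structure below are illustrative assumptions, not parameters from any of the cited studies:

```python
# Minimal Rescorla-Wagner sketch: a target cue that is explicitly unpaired
# with reward, but embedded in an excitatory context, acquires negative
# associative strength. All parameter values are illustrative assumptions.

ALPHA = 0.1            # learning rate (assumed)
LAMBDA_REWARD = 1.0    # asymptote supported by the reward
LAMBDA_OMISSION = 0.0  # asymptote when the reward is omitted

def rw_trial(V, cues_present, lam, alpha=ALPHA):
    """One Rescorla-Wagner trial: the shared error updates every present cue."""
    error = lam - sum(V[c] for c in cues_present)
    for c in cues_present:
        V[c] += alpha * error

V = {"context": 0.0, "target": 0.0}
for _ in range(500):
    rw_trial(V, ["context"], LAMBDA_REWARD)              # context alone -> reward
    rw_trial(V, ["context", "target"], LAMBDA_OMISSION)  # context + target -> no reward

# The context becomes excitatory (V -> +1) and the target is driven toward an
# equally negative strength (V -> -1): a conditioned inhibitor.
print(round(V["context"], 2), round(V["target"], 2))
```

Note what the sketch captures and what it misses: the target ends up inhibitory purely through the excitatory context, so its inhibition is general rather than tied to a particular reward, which is exactly the point on which Laurent et al.'s reward-specific result departs from the model.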
At the behavioral level, Laurent et al.'s findings can be readily accounted for by the sometimes-opponent-process (SOP) model (see Vogel, Ponce, and Wagner, 2019). This associative learning model invokes stimulus after-effects that are capable of shaping behavior if they are appropriately paired with other events. According to the SOP model, the delivery of each particular reward is followed by a decaying inhibitory (opponent) trace of its own kind. In Laurent et al.'s (2017) study, each reward's unique trace would have been paired with one of the target stimuli. As a consequence, each target stimulus acquired reward-specific conditioned inhibitory properties. As LHb lesions completely abolished the behavioral effect resulting from this protocol, this structure appears to play a crucial role in this phenomenon. However, the putative physiological means by which the LHb mediates learning in conditions such as those in Laurent et al.'s (2017) study remain to be determined.
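A toy rendering of this trace-based account, under heavy simplifying assumptions (exponential trace decay, a single pairing step per trial, made-up parameter values and labels), could look like:

```python
# Toy sketch inspired by the SOP account described above (not the full SOP
# model): each reward leaves its own decaying opponent (inhibitory) trace,
# and a cue presented during that trace acquires a negative, reward-specific
# association. Decay rate, learning rate, and labels are assumptions.

TRACE_DECAY = 0.7  # per-step decay of the opponent trace (assumed)
ALPHA = 0.2        # learning rate (assumed)

def backward_training(pairs, n_trials=100):
    """pairs: list of (reward, cue); the cue follows its reward on every trial."""
    V = {(cue, reward): 0.0 for reward, cue in pairs}
    for _ in range(n_trials):
        for reward, cue in pairs:
            trace = -1.0 * TRACE_DECAY  # opponent trace one step after reward
            # Pairing the cue with the inhibitory trace drags V toward it.
            V[(cue, reward)] += ALPHA * (trace - V[(cue, reward)])
    return V

V = backward_training([("pellet", "tone"), ("sucrose", "noise")])
# Each cue becomes a reward-specific inhibitor: negative V for its own reward.
print(round(V[("tone", "pellet")], 2), round(V[("noise", "sucrose")], 2))
```

Because each cue is only ever paired with the opponent trace of its own reward, the acquired inhibition is indexed by the (cue, reward) pair, reproducing the reward-specificity that the context-mediated Rescorla–Wagner account cannot.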

6. The Lateral Habenula Participates in Inhibitory Learning even without Temporally Specific Reward Omission

The findings reported by Laurent et al. (2017) merit further examination. Based on this study, we can assume that the LHb could exert its effects without acute episodes involving the omission of an expected reward. Such a process is also instantiated in explicitly unpaired procedures, in which a target stimulus acquires inhibitory properties solely by alternating with reward. A recent study by Choi, Kim, Gallagher, and Han (2020) sought to test the hypothesis that the LHb is involved in learning the association of a cue with the absence of reward. These authors exposed rats to an explicitly unpaired conditioning procedure that used a light as a negative predictor of food delivery. Choi et al. (2020) found increased c-Fos expression in the LHb of rats exposed to this explicitly unpaired conditioning procedure compared to controls. This result supports the idea that the LHb is engaged in the learning process that takes place when a stimulus signals the absence of reward. In a separate experiment, these authors did not find any disruption in performance on this protocol when they induced excitotoxic LHb lesions. However, this should not be taken as negative evidence of the involvement of the LHb in inhibitory learning. This null result may be explained by the need for special tests (e.g., summation, retardation, PIT; see Rescorla, 1969) to certify that an inhibitory stimulus is capable of counteracting reward-related behaviors. Unfortunately, Choi et al. (2020) did not conduct any of the available tests for assessing conditioned inhibition.
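As a rough illustration of how those standard tests are scored, consider the sketch below; the response counts and trial numbers are hypothetical values, not data from any of the studies discussed:

```python
# Hypothetical scoring sketch for the two classic conditioned-inhibition tests
# (see Rescorla, 1969). All numbers below are made-up illustrative values.

def negative_summation_index(resp_excitor_alone, resp_excitor_plus_x):
    """Summation test: X is inhibitory if compounding it with a separately
    trained excitor suppresses responding (index > 0 indicates suppression)."""
    return (resp_excitor_alone - resp_excitor_plus_x) / resp_excitor_alone

def shows_retardation(trials_to_criterion_x, trials_to_criterion_novel_cue):
    """Retardation test: X is inhibitory if it is slower than a novel cue
    to become an excitor when subsequently paired with reward."""
    return trials_to_criterion_x > trials_to_criterion_novel_cue

print(negative_summation_index(40, 12))  # 0.7: substantial suppression
print(shows_retardation(25, 10))         # True: acquisition was retarded
```

Passing both tests is the conventional standard for certifying a conditioned inhibitor, which is why a null lesion result on training performance alone, as in Choi et al. (2020), remains inconclusive.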
In both backward conditioning and explicitly unpaired conditioning preparations, the reward is not expected to occur at any particular moment; specifically, reward delivery is scheduled to occur at random intervals without discrete cues anticipating it, unlike in extinction and feature-negative discrimination procedures. The ability of the LHb to modulate reward seeking, even in paradigms not involving explicit reward omissions, raises the question of what mechanisms are involved. A recent study by Wang et al. (2017) may shed light on this issue. These authors described a rebound excitation in the LHb following the inhibition typically produced by reward delivery. Given appropriate timing, the traces of such rebound excitation could have been paired with the target stimuli for the control (but not for the lesioned) subjects in Laurent et al.'s study. Such pairings would facilitate the propagation of LHb excitation, presumably promoting dopamine suppression, eventually upon the target stimuli alone. However, this mechanism cannot readily account for inhibitory learning in explicitly unpaired conditioning procedures. In these protocols, the target stimulus does not consistently follow the reward; rather, these events are separated by intertrial intervals of varying durations. Therefore, decreases in dopamine mediated by rebound activity of the LHb could hardly explain the inhibitory properties acquired by a target stimulus in explicitly unpaired procedures.
The LHb is known to modulate other monoamines besides dopamine, such as serotonin (Amat et al., 2001). Tonic serotonergic activity depends on accumulated experience with rewards (Cohen et al., 2015). In turn, serotonergic tone determines the effects of phasic activity in serotonergic neurons, most of which is known to occur during aversive experiences (Cools et al., 2011). Thus, the LHb may participate in accumulating information about reward probability in a given environment. This would tune serotonin levels during inter-reward intervals in explicitly unpaired procedures, which might determine the affective impact of salient cues throughout those intervals. A similar effect may also be evoked by the LHb via modulation of tonic dopamine levels (see Lecourtier, DeFrancesco, & Moghaddam, 2008), which would not conflict with the serotonergic hypothesis. However, it is yet to be determined with certainty whether the LHb participates in inhibitory learning in explicitly unpaired protocols. This will require using either in vivo real-time recording or special tests for conditioned inhibition in lesion/inactivation studies. Also, it is not clear how the LHb would be more active in explicitly unpaired conditions than in control conditions involving equivalent reward density, which was the main result reported by Choi et al. (2020). One possibility is that the LHb is relatively inactive when a reward is fully predicted by a cue; in contrast, it may be tonically active in ambiguous conditions, for example, when the reward or the cue can occur at any given time without notice.

7. The Habenulo-Meso-Cortical Excitatory Pathway May Play a Role in Negative Reward Prediction Error

The glutamatergic LHb-VTA-mPFC pathway (see Figure 1) is less well known than the LHb-RMTg pathway but may also play an important role in inhibitory learning from reward omission. As we described earlier, unlike other LHb efferents, this pathway promotes, rather than inhibits, dopaminergic transmission. Perhaps the most crucial evidence supporting this idea stems from the study conducted by Lammel et al. (2012). These authors reported that blocking dopaminergic transmission in the mPFC abolishes the capacity of the LHb to generate aversion to spatial stimuli. Therefore, this pathway may be crucial in complementing dips in meso-striatal dopaminergic release for learning from negative reward prediction errors. Two putative, not mutually exclusive, mechanisms for such an effect could be (1) directly opposing midbrain dopamine activity (see Jo, Lee, & Mizumori, 2013) and (2) selecting which stimuli should be filtered to control behavior (see Vander Weele et al., 2018). The relationship between the mPFC and the LHb is reciprocal (see Mathis et al., 2017), and their projections also converge in certain brain locations (Varga, Kocsis, & Sharp, 2003). The mPFC has long been considered a key brain region for restraining inappropriate actions and adjusting behavior when errors occur (Ragozzino, Detrick, & Kesner, 1999), but only recently has this been considered from an associative learning perspective. Although research on LHb-mPFC interaction is still scant, we outline some ideas on how these regions may jointly contribute to learning from negative reward prediction errors.
We should first consider that the mPFC is divided into functionally dissociable anatomical subregions, at least in some eutherian mammals (see Ongur and Price, 2000). A subregion that might play a role in the LHb's network is the prelimbic cortex of rodents, which presumably corresponds to the pregenual anterior cingulate in primates (see Laubach, Amarante, Swanson, & White, 2018). Several studies have linked this subregion with the ability of mammals to restrain a dominant (previously rewarded) response (Laubach, Caetano, & Narayanan, 2015). An outstanding example is a study conducted by Meyer and Bucci (2014) using rats as subjects. These authors found that prelimbic (but not infralimbic) PFC lesions impaired the inhibitory learning process in an appetitive feature-negative discrimination paradigm (i.e., the differentiation in responding to A and AX in Figure 4). As we described above, this paradigm consists of presenting a target stimulus that signals reward omission. Importantly, however, this target stimulus occurs in the presence of a cue that otherwise consistently predicts reward delivery. To effectively learn how to stop reward-related responses in such a situation, subjects must resolve the conflict between reward and reward-omission cues that are presented simultaneously.
It has been suggested that the opposing effects of the prelimbic division of the mPFC on the reward systems are exerted via an aversion mechanism. This rationale is supported by evidence that this region is involved in fear learning (e.g., Burgos-Robles, Vidal-Gonzalez and Quirk, 2009; Piantadosi, Yeates, & Floresco, 2020) and exhibits robust excitatory connections with the basal amygdala (Sotres-Bayon, Sierra-Mercado, Pardilla-Delgado and Quirk, 2012). Conversely, the infralimbic portion of the mPFC has been related to both the expression of habitual reward-related responses (e.g., Haddon and Killcross, 2011) and fear suppression (Santini, Quirk and Porter, 2008; but see Capuzzo and Floresco, 2020). The involvement of the infralimbic cortex in the latter of these processes could be accounted for in terms of the mediation of a subjective relief state. Such a state would be functionally equivalent to reward and, therefore, opposed to aversive states in a hierarchical fashion under appropriate circumstances.
However, some findings (e.g., Sharpe and Killcross, 2015) seem to contradict this prelimbic-aversive and infralimbic-appetitive notion, prompting deviations from this theoretical scheme. Instead of the aversive-appetitive dichotomy, Killcross and Sharpe (2018) maintain that the infralimbic mPFC promotes attention to stimuli that reliably predict affectively significant events, regardless of their valence (i.e., reflective control). Conversely, the prelimbic mPFC would promote attention to higher-order cues setting hierarchical (as in conditional probabilities) relationships between stimuli and relevant outcomes (i.e., proactive control), again regardless of their affective valence. A similar point has been raised by Hardung et al. (2017), who found evidence for a functional gradient in the mPFC spanning from the prelimbic to the lateral-orbital cortex (passing through the infralimbic, medial-orbital and ventral-orbital regions). These authors reported that regions near the prelimbic cortex tend to participate more in proactive control, while regions near the lateral-orbital cortex participate more in reflective control. However, this study was conducted in an appetitive setting, so the hierarchical control (reflective vs proactive) and opponent affective control (appetitive vs aversive) hypotheses are confounded. While cortical inputs to the LHb are generally modest, the prelimbic cortex makes the largest contribution among the cortical regions innervating this structure (Yetnikoff et al., 2015). This may indicate that this pathway serves as either a positive or a negative feedback mechanism, depending on whether these connections are excitatory or inhibitory. Given that the LHb is a major aversive center, the former possibility would be at odds with the approach of Sharpe and Killcross (2018).
Furthermore, a recent finding has tilted the scale in favor of the aversive-appetitive modularity of the mPFC. Yan, Wang and Zhou (2019) reported that spiking in a subpopulation of neurons in rats' dorsomedial PFC (dmPFC; including the prelimbic and dorsal cingulate cortex) was (1) increased upon a cue signaling an imminent electric shock, and (2) increased (but considerably less) and then decayed below baseline upon the presentation of a cue signaling the omission of the shock. In addition, Yan et al. (2019) showed that the cue signaling shock omission passed the summation and retardation tests for conditioned inhibition, at both the neural and behavioral levels. Interestingly, this pattern of results mirrors that of the study by Tobler et al. (2003) in an appetitive setting with macaques. In brief, suppressing fear responses involves inhibition of dmPFC neurons, suggesting that this region serves primarily to facilitate aversive learning rather than having a general hierarchical control function. This is in line with the idea that LHb input to the prelimbic prefrontal cortex may counteract the effects of reward cues via an opponent affective process. Such a mechanism has been regarded as one of the necessary conditions for the occurrence of inhibitory learning and performance (see Konorski, 1948; Pearce & Hall, 1980). However, this mechanism may be complemented by others, such as threshold increases and attention to informative cues (Laubach et al., 2015).

8. Relevance of the Lateral Habenula for Mental Health

There is an emerging literature suggesting that many psychiatric disorders can be characterized as alterations of the ability to predict rewards and aversive outcomes (Byrom & Murphy, 2018). Therefore, human wellbeing could be understood, at least in part, in terms of the dynamics of associative learning and outcome prediction (Papalini, Beckers & Vervliet, 2020). Thus, disrupting the system underlying outcome predictions, in either an upward or a downward direction, could lead to maladaptive behaviors that threaten people's subjective wellbeing. The disappointment, frustration, and discomfort stemming from worse-than-expected outcomes are useful for directing behavior toward adaptive ways of interacting with our surroundings (see Figure 5, light purple arrows). However, if these subjective sensations are lacking or excessive, problematic behaviors are likely to arise (see Figure 5, dark colored arrows).

***INSERT FIGURE 5***

For instance, some authors have proposed that a pervasive reward learning process (Chow, Smith, Wilson, Zentall and Beckmann, 2017; Sosa and dos Santos, 2019a) can explain maladaptive decision-making in impulsive and risk-taking behaviors. This may arise in part from a deficit in learning from negative reward prediction errors (Laude, Stagner and Zentall, 2014; Sosa and dos Santos, 2019b) (see Figure 5, dark aqua arrow). Impulsive-like behavior is often studied in choice protocols with animal models. A study by Stopper and Floresco (2014) showed that LHb inactivation led to indifference when rats chose between ambiguous alternatives. Specifically, subjects were biased away from the more profitable of two alternative rewards when a delay was imposed between the selecting action and that reward; however, this bias was abolished by LHb inactivation. This suggests that delay aversion might stem from a negative reward prediction error mechanism, rendering the puzzle even more complex. On the other hand, studies involving humans and rodent models of helplessness (Gold and Kadriu, 2019; Jakobs, Pitzer, Sartorius, Unterberg and Kiening, 2019) have suggested that over-reactivity of the LHb can induce symptoms associated with depression (see Figure 5, dark purple arrows). Such symptoms include reluctance to explore new environments and unwillingness to initiate challenging actions, which could be interpreted as a constant state of disappointment or sense of doom (Proulx et al., 2014). In that case, atypically high expectations could lead to a pervasive and maladaptive enhancement of inhibitory learning. Another possibility is that some individuals have a proclivity to overgeneralize learning from reward omissions, which would cause decreased activity overall (a major symptom of depression). These putative mechanisms are, of course, not mutually exclusive.
Synthesizing behavioral and neurophysiological studies could serve to address the etiology and phenomenology of these psychiatric conditions. Such an enterprise would require integrating details regarding behavioral changes in response to environmental constraints with the underlying neurobiological mechanisms involved. This is precisely what we have attempted in the present paper. If ongoing research continues to make advances, this perspective could serve to guide clinical practice; an example would be determining whether psychotherapy is enough to tackle a particular case or whether combining it with pharmacotherapy would be pertinent. Knowledge about neural and behavioral dynamics might serve to design combined intensive interventions that avoid the long-term side effects of medications and dependency (see Hengartner, Davies and Read, 2019). Additionally, understanding the morphogenesis and wiring of the LHb during embryological development could be relevant for inquiring whether disturbances in these processes underlie problematic behavioral traits (Schmidt and Pasterkamp, 2017).
Some recent studies have furthered our understanding of behavioral-physiological interactions involving the LHb, from basic science to application. For example, Zhu et al. (2019) found an association between subclinical depression and atypical connections between the dorsal posterior thalamus and the LHb. This peculiar connectivity of the LHb might predispose people to depression or, alternatively, some life events could promote behavioral patterns leading to that phenotype. Experiences of reward loss vary in the time frame and magnitude of the rewards involved (Papini, Wood, Daniel and Norris, 2006). A person could lose money during a gambling night with friends, which could be a relatively innocuous experience, or lose a lifetime partner, which would be a devastating event. Significant life experiences can cause detectable changes in brain networks (e.g., Matthews et al., 2016); tracking these changes can lead to ways of understanding the underpinnings of behavioral phenotypes associated with psychiatric conditions. An intriguing example is that attenuating LHb activity has been shown to ameliorate depressive-like symptoms induced by maternal separation in mice (Tchenio, Lecca, Valentinova and Mameli, 2017).

9. Directions for Moving the Field Forward

Computational modeling of associative phenomena is just one possible approach to account for the relationship between LHb physiology and inhibitory learning. There are other alternatives for improving our understanding of this topic, even in a more detailed manner, such as neural network models. These are powerful tools for envisaging plausible mechanisms implemented by biological agents to interact adaptively with their environment. A clear advantage of these models is that they make physiological hypotheses more tenable (Frank, 2005). To our knowledge, only a few recently proposed neural-network models (see Mollick et al., 2020; Vitay and Hamker, 2014) have incorporated nodes representing the physiology of the LHb. Even so, those models do not account for the multifaceted circuitry of this structure (e.g., including the LHb-VTA-mPFC pathway), which, we argue, is highly relevant for a full understanding of its role in learning. The lack of inclusion of the LHb in neural network modeling (e.g., Burgos and García-Leal, 2015; Collins and Frank, 2014; O'Reilly, Russin and Herd, 2019; Sutton and Barto, 2018) may be partly due to an overemphasis on the performance, rather than the learning, of inhibitory control. Cortical-thalamic-striatal circuits are often invoked to account for inhibitory performance, often without formally specifying how outcomes "teach" future actions (e.g., van Wijk et al., 2020; Verbruggen et al., 2014). An integration of learning and performance models should be pursued to increase our understanding of the broad network in which the LHb participates.
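As a minimal point of contact between such models and the phenomena reviewed here, a tabular temporal-difference sketch can show how reward omission yields the negative prediction error that an LHb-like node is proposed to broadcast. The states, rewards, and parameters below are illustrative assumptions, not taken from any of the cited models:

```python
# Tabular TD(0) sketch of a negative reward prediction error (the "dopamine
# dip" discussed in this review). The three-state task, rewards, and
# parameters are illustrative assumptions.

ALPHA, GAMMA = 0.2, 0.9
V = {"cue": 0.0, "outcome": 0.0, "end": 0.0}  # state-value table

def td_update(state, next_state, reward):
    """One TD(0) step; returns the prediction error (delta)."""
    delta = reward + GAMMA * V[next_state] - V[state]
    V[state] += ALPHA * delta
    return delta

# Training: the cue reliably precedes the reward.
for _ in range(200):
    td_update("cue", "outcome", 0.0)
    td_update("outcome", "end", 1.0)

# Probe trial: the expected reward is omitted. The resulting negative delta
# is the teaching signal an LHb-like node is proposed to convey.
dip = td_update("outcome", "end", 0.0)
print(round(V["cue"], 2), round(dip, 2))
```

Models of the kind discussed above differ precisely in *how* this negative delta is generated and routed (e.g., through an LHb-RMTg-like channel versus an LHb-VTA-mPFC-like channel), which is the modeling gap we argue should be filled.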
Another matter for future consideration is whether the role of the LHb in learning from reward omission is exclusive to edible rewards or extends to other types of reward, such as sexual stimuli for receptive individuals. In addition, our understanding of the behavioral neuroscience of the LHb comes from a select few model species. It has been hypothesized that the habenular complex evolved in an ancestor of vertebrates to enable circadian-determined modulation of movement (Hikosaka, 2010). In this sense, the LHb may have later evolved its role in suppressing specific actions in response to more dynamic environmental information. This exaptation hypothesis implies that the habenula of basal lineages did not play the same role it does in extant vertebrates. This, in turn, implies that basal lineages either could not learn from reward omission or did so by recruiting other mechanisms.
Even invertebrates are equipped with mechanisms for learning from situations involving negative prediction errors (i.e., conditioned inhibition; Acebes et al., 2012; Britton and Farley, 1999; Couvillon, Bumanglag and Bitterman, 2003; Durrieu, Wystrach, Arrufat, Giurfa, & Isabel, 2020). Although this fact is appealing, for now it is largely limited to the behavioral level of observation. However, an emerging field of research is currently scrutinizing the neural mechanisms involved in negative reward prediction errors in Drosophila flies (Felsenberg, Barnstedt, Cognini, Lin and Waddell, 2017). In these animals, a bilateral structure known as the mushroom body supports associative learning using dopamine as a plasticity factor, in a fashion similar to the vertebrate brain (Aso et al., 2014). The mushroom body possesses different subdivisions, which selectively receive dopamine during rewarding or aversive stimuli. Intriguingly, Felsenberg et al. (2017) reported that inactivating aversion-related dopamine neurons precludes the bias in behavior otherwise produced by pairing a stimulus with reward omission. It has been suggested that aversive dopamine neurons directly oppose reward-related dopamine neurons in the mushroom body (Perisse et al., 2016). This may indicate that invertebrate nervous systems possess the hierarchical architecture necessary for inhibitory learning, such as that found in vertebrates. Remarkably, these diverging nervous systems also exhibit antagonistic dopaminergic subsystems, similar to those that the LHb orchestrates in the vertebrate brain. Future research in this field may uncover the evolutionary origins of negative reward prediction errors and allow a more precise dating of the origins and precursors of the LHb.

10. Summary and Concluding Remarks

Organisms benefit from possessing mechanisms to track resources in their surroundings and adjust their actions to obtain maximum profit. This relies on a delicate balance between responding and withholding specific responses whenever appropriate. Sensorimotor feedforward loops are mediated by long-term potentiation in some locations of the striatum, induced by midbrain dopamine inputs (Yagishita et al., 2014). Conversely, there are several processes that prevent spurious sensorimotor loops, which would otherwise waste energy and put the organism at risk. A convenient paradigm for studying one such process is conditioned inhibition, a subclass of Pavlovian learning phenomena. This paradigm could serve as a principled and physiologically informed behavioral phenotype model of negative prediction error. Negative feedback control mechanisms have recently been proposed to be of primary importance for understanding adaptive behavior (Yin, 2020). Therefore, conditioned inhibition might be a particularly relevant and timely conceptual and methodological tool. Intriguingly, although inhibitory learning has been thought to be multifaceted in nature (Sosa and Ramírez, 2019), several of its manifestations seem to implicate the LHb.
Omission of expected rewards at a precise time promotes dopamine dips in canonical midbrain neurons, with a remarkable contribution from the LHb (see Mollick et al., 2020). These dopamine dips back-propagate to stimuli that consistently predict reward omission in a way that mirrors the plastic changes associated with dopamine release (Tobler et al., 2003). All things being equal, dopamine dips can counteract phasic dopamine effects on behavior (Chang et al., 2018). This accounts for the LHb's contribution to the extinction of a previously acquired response (Zapata et al., 2017), feature-negative discrimination, negative summation, and retardation of acquisition in appetitive settings (Tobler et al., 2003). However, although the activity of the LHb is associated with dopamine dips, it may also participate in learning processes beyond the general suppression of previously rewarded actions (Laurent et al., 2017). This selective effect is remarkable given that other brain areas have been associated with global response inhibition (see Wiecki & Frank, 2013).
Even if the LHb's suppressing effects on midbrain dopamine neurons are robust, the excitatory LHb-VTA-mPFC pathway may also play a role in inhibitory learning from reward omission. Lesions of the prelimbic portion of the mPFC impair inhibitory learning in appetitive settings (e.g., Meyer and Bucci, 2014). This region receives indirect dopaminergic input from the LHb, and its activation has been implicated in aversion learning (Lammel et al., 2012). Some learning theories have proposed that conditioned inhibition is partly determined by antagonizing the effect of an affectively loaded conditioned stimulus (Konorski, 1948; Pearce and Hall, 1980). The activity of this pathway leads to dopamine release in the mPFC, which may allow channeling relevant sensory inputs to key plastic brain regions (Vander Weele et al., 2018). Therefore, interactions of the LHb with aversive centers via the prelimbic cortex may facilitate imbuing cues with the capacity to counteract reward-seeking tendencies. Disrupting either the excitatory or the inhibitory LHb pathways has been shown to hamper inhibitory learning processes to some degree. Whether and how those pathways interact or complement each other to produce inhibitory learning remains a matter for further investigation.
Feature-negative discrimination and extinction procedures promote a decrease (or even net negative values) in associative strength, presumably through the omission of a reward at a precise moment following a cue. However, backward conditioning also induces conditioned inhibition, and this effect is impaired by specific ablation of the LHb-RMTg pathway (Laurent et al., 2017). In this paradigm, the reward is not expected at a specific point in time, so other mechanisms may be recruited by the LHb. A potential candidate for this process is the rebound activity of the LHb following inhibition by reward (Wang et al., 2017). Yet alternating a target stimulus with reward delivery in an explicitly unpaired fashion also induces conditioned inhibition, and there is ex vivo evidence that such conditions promote meaningful activity in LHb neurons (Choi et al., 2020). In that case, learning cannot be clearly linked to either phasic decreases in dopamine or post-reward rebound activity of the LHb; therefore, other mechanisms involving the LHb may be at play.
Converging lines of evidence suggest that the LHb participates in the induction and
expression of inhibitory learning under different conditions involving reward. This supports the
intriguing idea that this structure is composed of several parallel circuits operating in different
situations (Stephenson-Jones et al., 2020). Unfortunately, conditioned inhibition, the
accumulated outcome of inhibitory learning, is an elusive phenomenon that often requires
special tests to be validated. Moreover, some authors have claimed that each test requires up
to several control conditions in order to be deemed conclusive (Papini and Bitterman, 1993).
This complicates the matter further, as it multiplies the number of subjects needed to evaluate
the role of the LHb in this phenomenon, let alone to compare the roles of its different
subcircuits. However, this topic is still worth investigating, as the LHb seems to be implicated in
many situations involving negative predictive relationships between events. Perhaps one
situation in which such processes take place is adaptation to environments that require
behavioral flexibility. If one takes “negative reward prediction errors” in a broader sense, many
tasks requiring shifts in behavior following subtle cues could be considered in this category.
Accordingly, there is an increasing amount of evidence that the LHb plays a role in updating
behavior in such tasks (e.g., Baker et al., 2017). Conditioned inhibition could thus be conceived
as an accumulated outcome of the mechanisms involved in shifting behavior under dynamic
conditions through error correction. Models of associative learning would be useful to frame
and test this and further hypothetical propositions.
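The error-correction logic invoked here can be stated compactly. In the Rescorla–Wagner model (Rescorla and Wagner, 1972), the change in associative strength of each stimulus present on a trial is driven by the discrepancy between the obtained outcome and the summed prediction of all stimuli present:

```latex
\Delta V_X = \alpha_X \, \beta \left( \lambda - \sum_{i} V_i \right)
```

On reward-omission trials λ = 0, so a cue X accompanying a strong predictor experiences a negative error and is pushed toward net negative (inhibitory) associative strength, as illustrated in Figures 3 and 4.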

10. References

Acebes F, Solar P, Moris J, & Loy I. 2012. Associative learning phenomena in the snail (Helix aspersa): Conditioned inhibition. Learn Behav 40:34–41. doi:10.3758/s13420-011-0042-6
Amat J, Sparks PD, Matus-Amat P, Griggs J, Watkins LR, & Maier SF. 2001. The role of the habenular complex in the elevation of dorsal raphe nucleus serotonin and the changes in the behavioral responses produced by uncontrollable stress. Brain Res 917:118–126. doi:10.1016/S0006-8993(01)02934-1
Aso Y, Hattori D, Yu Y, Johnston RM, Iyer NA, Ngo TTB, et al. 2014. The neuronal architecture of the mushroom body provides a logic for associative learning. Elife 3:e04577. doi:10.7554/eLife.04577
Baker PM, Jhou T, Li B, Matsumoto M, Mizumori SJY, Stephenson-Jones M, et al. 2016. The lateral habenula circuitry: Reward processing and cognitive control. J Neurosci 36:11482–11488. doi:10.1523/JNEUROSCI.2350-16.2016
Baker PM, Raynor SA, Francis NT, & Mizumori SJY. 2017. Lateral habenula integration of proactive and retroactive information mediates behavioral flexibility. Neuroscience 345:89–98. doi:10.1016/j.neuroscience.2016.02.010
Barrot M, Sesack SR, Georges F, Pistis M, Hong S, & Jhou TC. 2012. Braking dopamine systems: A new GABA master structure for mesolimbic and nigrostriatal functions. J Neurosci 32:14094–14101. doi:10.1523/JNEUROSCI.3370-12.2012
Bouton ME, Maren S, & McNally GP. 2020. Behavioral and Neurobiological Mechanisms of Pavlovian and Instrumental Extinction Learning. Physiol Rev. doi:10.1152/physrev.00016.2020
Britton G, & Farley J. 1999. Behavioral and neural bases of noncoincidence learning in Hermissenda. J Neurosci 19:9126–9132. doi:10.1523/jneurosci.19-20-09126.1999
Bromberg-Martin ES, Matsumoto M, Hong S, & Hikosaka O. 2010. A pallidus-habenula-dopamine pathway signals inferred stimulus values. J Neurophysiol 104:1068–1076. doi:10.1152/jn.00158.2010
Burgos-Robles A, Vidal-Gonzalez I, & Quirk GJ. 2009. Sustained conditioned responses in prelimbic prefrontal neurons are correlated with fear expression and extinction failure. J Neurosci 29:8474–8482. doi:10.1523/JNEUROSCI.0378-09.2009
Burgos JE, & García-Leal Ó. 2015. Autoshaped choice in artificial neural networks: Implications for behavioral economics and neuroeconomics. Behav Processes 114:63–71. doi:10.1016/j.beproc.2015.01.010
Byrom NC, & Murphy RA. 2018. Individual differences are more than a gene × environment interaction: The role of learning. J Exp Psychol Anim Learn Cogn 44:36–55. doi:10.1037/xan0000157
Capuzzo G, & Floresco SB. 2020. Prelimbic and Infralimbic Prefrontal Regulation of Active and Inhibitory Avoidance and Reward-Seeking. J Neurosci 40:4773–4787. doi:10.1523/JNEUROSCI.0414-20.2020
Chang CY, Gardner MPH, Conroy JC, Whitaker LR, & Schoenbaum G. 2018. Brief, but not prolonged, pauses in the firing of midbrain dopamine neurons are sufficient to produce a conditioned inhibitor. J Neurosci 38:8822–8830. doi:10.1523/JNEUROSCI.0144-18.2018
Choi BR, Kim DH, Gallagher M, & Han JS. 2020. Engagement of the Lateral Habenula in the Association of a Conditioned Stimulus with the Absence of an Unconditioned Stimulus. Neuroscience 444:136–148. doi:10.1016/j.neuroscience.2020.07.031
Chow JJ, Smith AP, Wilson AG, Zentall TR, & Beckmann JS. 2017. Suboptimal choice in rats: Incentive salience attribution promotes maladaptive decision-making. Behav Brain Res 320:244–254. doi:10.1016/j.bbr.2016.12.013
Cohen JY, Amoroso MW, & Uchida N. 2015. Serotonergic neurons signal reward and punishment on multiple timescales. Elife 4. doi:10.7554/eLife.06346
Collins AGE, & Frank MJ. 2014. Opponent actor learning (OpAL): Modeling interactive effects of striatal dopamine on reinforcement learning and choice incentive. Psychol Rev 121:337–366. doi:10.1037/a0037015
Cools R, Nakamura K, & Daw ND. 2011. Serotonin and dopamine: Unifying affective, activational, and decision functions. Neuropsychopharmacology 36:98–113. doi:10.1038/npp.2010.121
Couvillon PA, Bumanglag AV, & Bitterman ME. 2003. Inhibitory conditioning in honeybees. Q J Exp Psychol Sect B Comp Physiol Psychol 56:359–370. doi:10.1080/02724990244000313
Daly HB, & Daly JT. 1982. A mathematical model of reward and aversive nonreward: Its application in over 30 appetitive learning situations. J Exp Psychol Gen 111:441–480. doi:10.1037/0096-3445.111.4.441
Durrieu M, Wystrach A, Arrufat P, Giurfa M, & Isabel G. 2020. Fruit flies can learn non-elemental olfactory discriminations. Proceedings Biol Sci 287:20201234. doi:10.1098/rspb.2020.1234
Eder AB, Rothermund K, De Houwer J, & Hommel B. 2015. Directive and incentive functions of affective action consequences: an ideomotor approach. Psychol Res 79:630–649. doi:10.1007/s00426-014-0590-4
Felsenberg J, Barnstedt O, Cognigni P, Lin S, & Waddell S. 2017. Re-evaluation of learned information in Drosophila. Nature 544:240–244. doi:10.1038/nature21716
Gold PW, & Kadriu B. 2019. A major role for the lateral habenula in depressive illness: Physiologic and molecular mechanisms. Front Psychiatry 10:1–7. doi:10.3389/fpsyt.2019.00320
Haddon JE, & Killcross S. 2011. Inactivation of the infralimbic prefrontal cortex in rats reduces the influence of inappropriate habitual responding in a response-conflict task. Neuroscience 199:205–212. doi:10.1016/j.neuroscience.2011.09.065
Hardung S, Epple R, Jäckel Z, Eriksson D, Uran C, Senn V, et al. 2017. A Functional Gradient in the Rodent Prefrontal Cortex Supports Behavioral Inhibition. Curr Biol 27:549–555. doi:10.1016/j.cub.2016.12.052
Harris JA. 2019. The importance of trials. J Exp Psychol Anim Learn Cogn. doi:10.1037/xan0000223
Harris JA, Kwok DWS, & Andrew BJ. 2014. Conditioned inhibition and reinforcement rate. J Exp Psychol Anim Behav Process 40:335–354. doi:10.1037/xan0000023
Hengartner MP, Davies J, & Read J. 2019. Antidepressant withdrawal - The tide is finally turning. Epidemiol Psychiatr Sci 2018–2020. doi:10.1017/S2045796019000465
Hikosaka O. 2010. The habenula: From stress evasion to value-based decision-making. Nat Rev Neurosci 11:503–513. doi:10.1038/nrn2866
Hong S, Amemori S, Chung E, Gibson DJ, Amemori K, & Graybiel AM. 2019. Predominant Striatal Input to the Lateral Habenula in Macaques Comes from Striosomes. Curr Biol 29:51-61.e5. doi:10.1016/j.cub.2018.11.008
Hong S, & Hikosaka O. 2013. Diverse sources of reward value signals in the basal ganglia nuclei transmitted to the lateral habenula in the monkey. Front Hum Neurosci 7:1–7. doi:10.3389/fnhum.2013.00778
Hong S, & Hikosaka O. 2008. The Globus Pallidus Sends Reward-Related Signals to the Lateral Habenula. Neuron 60:720–729. doi:10.1016/j.neuron.2008.09.035
Hu H, Cui Y, & Yang Y. 2020. Circuits and functions of the lateral habenula in health and in disease. Nat Rev Neurosci 21:277–295. doi:10.1038/s41583-020-0292-4
Huang S, Borgland SL, & Zamponi GW. 2019. Dopaminergic modulation of pain signals in the medial prefrontal cortex: Challenges and perspectives. Neurosci Lett 702:71–76. doi:10.1016/j.neulet.2018.11.043
Iordanova MD, Yau JO-Y, McDannald MA, & Corbit LH. 2021. Neural substrates of appetitive and aversive prediction error. Neurosci Biobehav Rev 123:337–351. doi:10.1016/j.neubiorev.2020.10.029
Jakobs M, Pitzer C, Sartorius A, Unterberg A, & Kiening K. 2019. Acute 5 Hz deep brain stimulation of the lateral habenula is associated with depressive-like behavior in male wild-type Wistar rats. Brain Res 1721:146283. doi:10.1016/j.brainres.2019.06.002
Jhou TC, Geisler S, Marinelli M, Degarmo BA, & Zahm DS. 2009. The mesopontine rostromedial tegmental nucleus: A structure targeted by the lateral habenula that projects to the ventral tegmental area of Tsai and substantia nigra compacta. J Comp Neurol 513:566–596. doi:10.1002/cne.21891
Jo YS, Lee J, & Mizumori SJY. 2013. Effects of prefrontal cortical inactivation on neural activity in the ventral tegmental area. J Neurosci 33:8159–8171. doi:10.1523/JNEUROSCI.0118-13.2013
Juarez J, Barrios De Tomasi E, Muñoz-Villegas P, & Buenrostro M. 2013. Adicción farmacológica y conductual. In: González-Garrido A, & Matute E, editors. Cerebro y Conducta. Guadalajara: Manual Moderno.
Konorski J. 1948. Conditioned reflexes and neuron organization. New York, NY, US: Cambridge University Press.
Lammel S, Lim BK, Ran C, Huang KW, Betley MJ, Tye KM, et al. 2012. Input-specific control of reward and aversion in the ventral tegmental area. Nature 491:212–217. doi:10.1038/nature11527
Laubach M, Amarante LM, Swanson K, & White SR. 2018. What, If Anything, Is Rodent Prefrontal Cortex? eNeuro 5:315–333.
Laubach M, Caetano MS, & Narayanan NS. 2015. Mistakes were made: neural mechanisms for the adaptive control of action initiation by the medial prefrontal cortex. J Physiol Paris 109:104–117. doi:10.1016/j.jphysparis.2014.12.001
Laude JR, Stagner JP, & Zentall TR. 2014. Suboptimal choice by pigeons may result from the diminishing effect of nonreinforcement. J Exp Psychol Anim Behav Process 40:12–21. doi:10.1037/xan0000010
Laurent V, Wong FL, & Balleine BW. 2017. The lateral habenula and its input to the rostromedial tegmental nucleus mediates outcome-specific conditioned inhibition. J Neurosci 37:10932–10942. doi:10.1523/JNEUROSCI.3415-16.2017
Lecourtier L, DeFrancesco A, & Moghaddam B. 2008. Differential tonic influence of lateral habenula on prefrontal cortex and nucleus accumbens dopamine release. Eur J Neurosci 27:1755–1762. doi:10.1111/j.1460-9568.2008.06130.x
Li H, Vento PJ, Parrilla-Carrero J, Pullmann D, Chao YS, Eid M, et al. 2019. Three Rostromedial Tegmental Afferents Drive Triply Dissociable Aspects of Punishment Learning and Aversive Valence Encoding. Neuron 104:987-999.e4. doi:10.1016/j.neuron.2019.08.040
Mathis V, Barbelivien A, Majchrzak M, Mathis C, Cassel JC, & Lecourtier L. 2017. The Lateral Habenula as a Relay of Cortical Information to Process Working Memory. Cereb Cortex 27:5485–5495. doi:10.1093/cercor/bhw316
Matthews GA, Nieh EH, Vander Weele CM, Halbert SA, Pradhan RV, Yosafat AS, et al. 2016. Dorsal Raphe Dopamine Neurons Represent the Experience of Social Isolation. Cell 164:617–631. doi:10.1016/j.cell.2015.12.040
Meyer HC, & Bucci DJ. 2014. The contribution of medial prefrontal cortical regions to conditioned inhibition. Behav Neurosci 128:644–653. doi:10.1037/bne0000023
Mollick JA, Chang LJ, Krishnan A, Hazy TE, Krueger KA, Frank GKW, et al. 2021. The Neural Correlates of Cued Reward Omission. Front Hum Neurosci 15. doi:10.3389/fnhum.2021.615313
Mollick JA, Hazy TE, Krueger KA, Nair A, Mackie P, Herd SA, et al. 2020. A systems-neuroscience model of phasic dopamine. Psychol Rev. doi:10.1037/rev0000199
O’Reilly RC, Russin J, & Herd SA. 2019. Computational models of motivated frontal function. In: Handbook of Clinical Neurology, 1st ed. Elsevier B.V. doi:10.1016/B978-0-12-804281-6.00017-3
Omelchenko N, & Sesack SR. 2009. Ultrastructural analysis of local collaterals of rat ventral tegmental area neurons: GABA phenotype and synapses onto dopamine and GABA cells. Synapse 63:895–906. doi:10.1002/syn.20668
Ongur D, & Price JL. 2000. The Organization of Networks within the Orbital and Medial Prefrontal Cortex of Rats, Monkeys and Humans. Cereb Cortex 10:206–219. doi:10.1093/cercor/10.3.206
Papalini S, Beckers T, & Vervliet B. 2020. Dopamine: From Prediction Error To Psychotherapy. Transl Psychiatry 1–37. doi:10.1038/s41398-020-0814-x
Papini MR, & Bitterman ME. 1993. The Two-Test Strategy in the Study of Inhibitory Conditioning. J Exp Psychol Anim Behav Process 19:342–352. doi:10.1037/0097-7403.19.4.342
Papini MR, Wood M, Daniel AM, & Norris JN. 2006. Reward loss as psychological pain. Int J Psychol Psychol Ther 6:189–213.
Pearce JM, & Hall G. 1980. A model for Pavlovian learning: Variations in the effectiveness of conditioned but not of unconditioned stimuli. Psychol Rev 87:532–552. doi:10.1037/0033-295X.87.6.532
Perisse E, Owald D, Barnstedt O, Talbot CBB, Huetteroth W, & Waddell S. 2016. Aversive Learning and Appetitive Motivation Toggle Feed-Forward Inhibition in the Drosophila Mushroom Body. Neuron 90:1086–1099. doi:10.1016/j.neuron.2016.04.034
Piantadosi PT, Yeates DCM, & Floresco SB. 2020. Prefrontal cortical and nucleus accumbens contributions to discriminative conditioned suppression of reward-seeking. Learn Mem 27:429–440. doi:10.1101/LM.051912.120
Proulx CD, Hikosaka O, & Malinow R. 2014. Reward processing by the lateral habenula in normal and depressive behaviors. Nat Neurosci 17:1146–1152. doi:10.1038/jid.2014.371
Ragozzino ME, Detrick S, & Kesner RP. 1999. Involvement of the prelimbic-infralimbic areas of the rodent prefrontal cortex in behavioral flexibility for place and response learning. J Neurosci 19:4585–4594. doi:10.1523/jneurosci.19-11-04585.1999
Rescorla RA, & Wagner AR. 1972. A theory of classical conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In: Classical Conditioning II: Current Research and Theory. New York: Appleton-Century-Crofts.
Rescorla RA. 1969. Pavlovian conditioned inhibition. Psychol Bull 72:77–94. doi:10.1037/h0027760
Roesch MR, Esber GR, Li J, Daw ND, & Schoenbaum G. 2012. Surprise! Neural correlates of Pearce-Hall and Rescorla-Wagner coexist within the brain. Eur J Neurosci 35:1190–1200. doi:10.1111/j.1460-9568.2011.07986.x
Roughley S, & Killcross S. 2019. Loss of hierarchical control by occasion setters following lesions of the prelimbic and infralimbic medial prefrontal cortex in rats. Brain Sci 9:1–16. doi:10.3390/brainsci9030048
Salas R, Baldwin P, de Biasi M, & Montague PR. 2010. BOLD responses to negative reward prediction errors in human habenula. Front Hum Neurosci 4:1–7. doi:10.3389/fnhum.2010.00036
Santini E, Quirk GJ, & Porter JT. 2008. Fear conditioning and extinction differentially modify the intrinsic excitability of infralimbic neurons. J Neurosci 28:4028–4036. doi:10.1523/JNEUROSCI.2623-07.2008
Savastano HI, Cole RP, Barnet RC, & Miller RR. 1999. Reconsidering Conditioned Inhibition. Learn Motiv 30:101–127. doi:10.1006/lmot.1998.1020
Schmidt ERE, & Pasterkamp RJ. 2017. The molecular mechanisms controlling morphogenesis and wiring of the habenula. Pharmacol Biochem Behav 162:29–37. doi:10.1016/j.pbb.2017.08.008
Schultz W, Dayan P, & Montague PR. 1997. A neural substrate of prediction and reward. Science 275:1593–1599. doi:10.1126/science.275.5306.1593
Sharpe MJ, & Killcross S. 2018. Modulation of attention and action in the medial prefrontal cortex of rats. Psychol Rev 125:822–843. doi:10.1037/rev0000118
Sharpe MJ, & Killcross S. 2015. The prelimbic cortex directs attention toward predictive cues during fear learning. Learn Mem 22:289–293. doi:10.1101/lm.038273.115
Sosa R, & dos Santos CV. 2019a. Toward a Unifying Account of Impulsivity and the Development of Self-Control. Perspect Behav Sci 42:291–322. doi:10.1007/s40614-018-0135-z
Sosa R, & dos Santos CV. 2019b. Conditioned Inhibition and its Relationship to Impulsivity: Empirical and Theoretical Considerations. Psychol Rec 69:315–332. doi:10.1007/s40732-018-0325-9
Sosa R, & Ramírez MN. 2019. Conditioned inhibition: Historical critiques and controversies in the light of recent advances. J Exp Psychol Anim Learn Cogn 45:17–42. doi:10.1037/xan0000193
Sotres-Bayon F, Sierra-Mercado D, Pardilla-Delgado E, & Quirk GJ. 2012. Gating of Fear in Prelimbic Cortex by Hippocampal and Amygdala Inputs. Neuron 76:804–812. doi:10.1016/j.neuron.2012.09.028
Stamatakis AM, Van Swieten M, Basiri ML, Blair GA, Kantak P, & Stuber GD. 2016. Lateral hypothalamic area glutamatergic neurons and their projections to the lateral habenula regulate feeding and reward. J Neurosci 36:302–311. doi:10.1523/JNEUROSCI.1202-15.2016
Stephenson-Jones M, Bravo-Rivera C, Ahrens S, Furlan A, Xiao X, Fernandes-Henriques C, et al. 2020. Opposing Contributions of GABAergic and Glutamatergic Ventral Pallidal Neurons to Motivational Behaviors. Neuron 105:921-933.e5. doi:10.1016/j.neuron.2019.12.006
Stopper CM, & Floresco SB. 2014. What’s better for me? Fundamental role for lateral habenula in promoting subjective decision biases. Nat Neurosci 17:33–35. doi:10.1038/nn.3587
Sutton RS, & Barto AG. 2018. Reinforcement learning: An introduction, 2nd ed. Cambridge, MA: The MIT Press.
Tchenio A, Lecca S, Valentinova K, & Mameli M. 2017. Limiting habenular hyperactivity ameliorates maternal separation-driven depressive-like symptoms. Nat Commun 8. doi:10.1038/s41467-017-01192-1
Tian J, Huang R, Cohen JY, Osakada F, Kobak D, Machens CK, et al. 2016. Distributed and Mixed Information in Monosynaptic Inputs to Dopamine Neurons. Neuron 91:1374–1389. doi:10.1016/j.neuron.2016.08.018
Tian J, & Uchida N. 2015. Habenula Lesions Reveal that Multiple Mechanisms Underlie Dopamine Prediction Errors. Neuron 87:1304–1316. doi:10.1016/j.neuron.2015.08.028
Tobler PN, Dickinson A, & Schultz W. 2003. Coding of Predicted Reward Omission by Dopamine Neurons in a Conditioned Inhibition Paradigm. J Neurosci 23:10402–10410. doi:10.1523/jneurosci.23-32-10402.2003
Tsutsui-Kimura I, Matsumoto H, Uchida N, & Watabe-Uchida M. 2020. Distinct temporal difference error signals in dopamine axons in three regions of the striatum in a decision-making task. bioRxiv 2020.08.22.262972.
van Wijk BCM, Alkemade A, & Forstmann BU. 2020. Functional segregation and integration within the human subthalamic nucleus from a micro- and meso-level perspective. Cortex 131:103–113. doi:10.1016/j.cortex.2020.07.004
Vander Weele CM, Siciliano CA, Matthews GA, Namburi P, Izadmehr EM, Espinel IC, et al. 2018. Dopamine enhances signal-to-noise ratio in cortical-brainstem encoding of aversive stimuli. Nature 563:397–401. doi:10.1038/s41586-018-0682-1
Varga V, Kocsis B, & Sharp T. 2003. Electrophysiological evidence for convergence of inputs from the medial prefrontal cortex and lateral habenula on single neurons in the dorsal raphe nucleus. Eur J Neurosci 17:280–286. doi:10.1046/j.1460-9568.2003.02465.x
Vento PJ, & Jhou TC. 2020. Bidirectional Valence Encoding in the Ventral Pallidum. Neuron 105:766–768. doi:10.1016/j.neuron.2020.02.017
Verbruggen F, Best M, Bowditch WA, Stevens T, & McLaren IPL. 2014. The inhibitory control reflex. Neuropsychologia 65:263–278. doi:10.1016/j.neuropsychologia.2014.08.014
Vitay J, & Hamker FH. 2014. Timing and expectation of reward: A neuro-computational model of the afferents to the ventral tegmental area. Front Neurorobot 8:1–25. doi:10.3389/fnbot.2014.00004
Vogel EH, Ponce FP, & Wagner AR. 2019. The development and present status of the SOP model of associative learning. Q J Exp Psychol 72:346–374. doi:10.1177/1747021818777074
Wagner AR, & Rescorla RA. 1972. Inhibition in Pavlovian Conditioning: Application of a Theory. Inhib Learn 301–334.
Wang D, Li Y, Feng Q, Guo Q, Zhou J, & Luo M. 2017. Learning shapes the aversion and reward responses of lateral habenula neurons. Elife 6:1–20. doi:10.7554/elife.23045
Wasserman EA, Franklin SR, & Hearst E. 1974. Pavlovian appetitive contingencies and approach versus withdrawal to conditioned stimuli in pigeons. J Comp Physiol Psychol 86:616–627. doi:10.1037/h0036171
Wheeler RA, & Carelli RM. 2006. The neuroscience of pleasure. Focus on “ventral pallidum firing codes hedonic reward: When a bad taste turns good.” J Neurophysiol 96:2175–2176. doi:10.1152/jn.00727.2006
Wiecki TV, & Frank MJ. 2013. A computational model of inhibitory control in frontal cortex and basal ganglia. Psychol Rev 120:329–355. doi:10.1037/a0031542
Yagishita S, Hayashi-Takagi A, Ellis-Davies GCR, Urakubo H, Ishii S, & Kasai H. 2014. A critical time window for dopamine actions on the structural plasticity of dendritic spines. Science 345:1616–1620. doi:10.1126/science.1255514
Yan R, Wang T, & Zhou Q. 2019. Elevated dopamine signaling from ventral tegmental area to prefrontal cortical parvalbumin neurons drives conditioned inhibition. Proc Natl Acad Sci U S A 116:13077–13086. doi:10.1073/pnas.1901902116
Yetnikoff L, Cheng AY, Lavezzi HN, Parsley KP, & Zahm DS. 2015. Sources of input to the rostromedial tegmental nucleus, ventral tegmental area, and lateral habenula compared: A study in rat. J Comp Neurol 523:2426–2456. doi:10.1002/cne.23797
Yin H. 2020. The crisis in neuroscience. In: The Interdisciplinary Handbook of Perceptual Control Theory. Elsevier Inc. doi:10.1016/b978-0-12-818948-1.00003-4
Zapata A, Hwang EK, & Lupica CR. 2017. Lateral Habenula Involvement in Impulsive Cocaine Seeking. Neuropsychopharmacology 42:1103–1112. doi:10.1038/npp.2016.286
Zhu Y, Qi S, Zhang B, He D, Teng Y, Hu J, et al. 2019. Connectome-based biomarkers predict subclinical depression and identify abnormal brain connections with the lateral habenula and thalamus. Front Psychiatry 10:1–14. doi:10.3389/fpsyt.2019.00371

Figure 1. Projections of the midbrain dopamine regions and their neurochemical interaction with the LHb. Panel A shows a sagittal view of the rat brain, depicting efferent pathways of midbrain dopamine regions and their input from the LHb. Panel B represents synaptic relationships between neurons in the VTA and inputs (direct and through the RMTg) from the LHb. DA: dopamine (orange), GABA: gamma-aminobutyric acid (green), Glu: glutamate (blue), mPFC: medial prefrontal cortex, dSTR: dorsal striatum, NAc: nucleus accumbens, LHb: lateral habenula, VTA: ventral tegmental area, SNc: substantia nigra pars compacta, RMTg: rostromedial tegmental nucleus. The image of the rat was adapted from https://smart.servier.com/smart_image/rat, under the terms of the CC-BY 3.0 Unported license (https://creativecommons.org/licenses/by/3.0/). Rat brain adapted from Juarez et al. (2013).

Figure 2. Dynamics of associative strength according to the Bush–Mosteller model in 100 consecutive trials of acquisition (λ = 1) followed by 100 extinction trials (λ = 0). Colors represent different learning rates (α), assuming that extinction rates are 3/4 of acquisition rates. Note that the decrease in responding (right panel) is a dynamic process that progresses with each trial, unlike the slow and continuous decay that would be expected if we simply allowed time to pass.
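The trajectories in Figure 2 can be reproduced in a few lines. The following is a minimal sketch (a hypothetical reconstruction, not the authors' simulation code) using the linear-operator update V ← V + α(λ − V) and the parameter values stated in the caption:

```python
# Bush-Mosteller-style acquisition and extinction curves for Figure 2:
# 100 acquisition trials (lambda = 1) followed by 100 extinction trials
# (lambda = 0), with the extinction rate set to 3/4 of the acquisition rate.

def simulate(alpha_acq, n_acq=100, n_ext=100):
    """Return the trial-by-trial associative strength V."""
    alpha_ext = 0.75 * alpha_acq   # extinction rate, per the caption
    v, history = 0.0, []
    for _ in range(n_acq):         # acquisition: V moves toward lambda = 1
        v += alpha_acq * (1.0 - v)
        history.append(v)
    for _ in range(n_ext):         # extinction: V moves toward lambda = 0
        v += alpha_ext * (0.0 - v)
        history.append(v)
    return history

curve = simulate(0.16)             # the fastest learning rate shown in the figure
print(curve[99], curve[-1])        # V at the end of acquisition and of extinction
```

With α = .16, V is essentially at asymptote by trial 100 and back near zero by trial 200, matching the fastest curve in the figure; the extinction limb falls trial by trial rather than decaying with the mere passage of time.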

Figure 3. Possible values of associative strength for stimulus X (VX; color scale) after being presented in compound with an excitatory conditioned stimulus, A, and followed by the omission of reward (λ = 0). According to the Rescorla–Wagner model, the outcome depends on the associative strength of the companion stimulus A (horizontal axis) and that of X (vertical axis) prior to the reward omission episode, and on the learning rate (α = .12 is assumed).

Figure 4. Associative strength, according to the Rescorla–Wagner model, of stimuli involved in a feature-negative discrimination paradigm over 80 trials of each type. Rewarded trials of stimulus A and nonrewarded trials of the compound AX were assumed to alternate non-randomly. Values of α were set at 0.1 and 0.075 for rewarded (λ = 1) and nonrewarded (λ = 0) trials, respectively. The associative strength of stimulus A and of the compound AX can be observed in actual performance. In contrast, the associative strength of X alone (i.e., the conditioned inhibitor) is theoretically inferred from the model and can only be revealed by special tests (see text).
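The Figure 4 simulation can likewise be sketched directly from the Rescorla–Wagner rule: each stimulus present on a trial changes by α(λ − V_total), where V_total sums the strengths of all stimuli present. This is a hypothetical reconstruction (not the authors' code) that assumes strict A+/AX− alternation with the rewarded trial first in each pair:

```python
# Rescorla-Wagner feature-negative discrimination (Figure 4 parameters):
# alpha = 0.1 on rewarded trials (lambda = 1), alpha = 0.075 on
# nonrewarded trials (lambda = 0); 80 trials of each type.

def feature_negative(n_pairs=80, a_rew=0.1, a_nonrew=0.075):
    va, vx = 0.0, 0.0
    for _ in range(n_pairs):
        va += a_rew * (1.0 - va)        # A+ trial: error = 1 - V_A
        error = 0.0 - (va + vx)         # AX- trial: error on the compound
        va += a_nonrew * error          # both elements share the compound error
        vx += a_nonrew * error
    return va, vx

va, vx = feature_negative()
print(va, vx)
```

As in the figure, A retains strong excitatory strength while X is driven to net negative values (toward −1), illustrating why the inhibitor's strength is a theoretical quantity: the compound AX ends up near zero in performance, and X's negative value only shows itself in summation or retardation tests.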

Figure 5. Simplified diagram representing the participation of the LHb in plastic brain processes for adaptively updating behavior, and hypothesized consequences of excessive activity (dark purple) or inactivity (dark aqua) of this brain structure. EPN = entopeduncular nucleus, LHb = lateral habenula, RMTg = rostromedial tegmental nucleus, VP = ventral pallidum, VTA = ventral tegmental area.
[Figure artwork for Figures 1–5 appears here; the panel labels, legends, and axis text extracted from the images are not reproduced as running text. See the captions above.]