Progress in Neurobiology 96 (2012) 96–135
Contents lists available at SciVerse ScienceDirect
Progress in Neurobiology
journal homepage: www.elsevier.com/locate/pneurobio
Neural systems analysis of decision making during goal-directed navigation
Marsha R. Penner, Sheri J.Y. Mizumori *
Department of Psychology, University of Washington, Seattle, WA 98195-1525, United States
ARTICLE INFO

Article history:
Received 12 April 2011
Received in revised form 6 August 2011
Accepted 29 August 2011
Available online 21 September 2011

Keywords:
Dopamine
Reinforcement learning
Hippocampus
Striatum
Navigation
Decision making

ABSTRACT

The ability to make adaptive decisions during goal-directed navigation is a fundamental and highly evolved behavior that requires continual coordination of perceptions, learning and memory processes, and the planning of behaviors. Here, a neurobiological account for such coordination is provided by integrating current literatures on spatial context analysis and decision-making. This integration includes discussions of our current understanding of the role of the hippocampal system in experience-dependent navigation, how hippocampal information comes to impact midbrain and striatal decision making systems, and finally the role of the striatum in the implementation of behaviors based on recent decisions. These discussions extend across cellular to neural systems levels of analysis. Not only are key findings described, but also fundamental organizing principles within and across neural systems, as well as between neural systems functions and behavior, are emphasized. It is suggested that studying decision making during goal-directed navigation is a powerful model for studying interactive brain systems and their mediation of complex behaviors.

© 2011 Published by Elsevier Ltd.
Contents
1. Introduction ...... 97
2. Navigation and foraging behavior ...... 97
3. Laboratory tasks that are based on foraging behavior ...... 98
4. Reinforcement learning and decision making environments...... 99
4.1. Temporal difference learning ...... 100
4.2. Dopamine and reinforcement learning ...... 101
5. The neurobiology of reinforcement learning and goal-directed navigation: hippocampal contributions ...... 102
5.1. Hippocampal place fields as spatial context representations ...... 102
5.2. The hippocampus distinguishes contexts during navigation ...... 103
5.3. Cellular and network mechanisms underlying hippocampal context processing ...... 104
5.3.1. CA3 and CA1 place field contributions to the evaluation of context ...... 105
5.3.2. Temporal encoding of spatial contextual information ...... 105
5.3.3. Sources of hippocampal spatial and nonspatial information ...... 106
5.3.4. Determining context saliency as a part of learning ...... 107
5.4. Relationship between hippocampal context codes and reinforcement based learning ...... 108
5.4.1. Functional connectivity between reinforcement and hippocampal systems...... 108
5.4.2. A role for dopamine in hippocampal-dependent learning and plasticity ...... 109
5.4.3. Impact of hippocampal context processing on dopamine cell responses to reward ...... 110
6. The neurobiology of reinforcement learning and goal-directed navigation: striatal contributions ...... 112
6.1. Striatal based navigational circuitry ...... 112
6.2. Dopamine signaling and reward prediction error within the striatum ...... 113
Abbreviations: BLA, basolateral amygdala complex; DLS, dorsolateral striatum; DMS, dorsomedial striatum; LDTg, lateral dorsal tegmental nucleus; mPFC, medial prefrontal cortex; OFC, orbitofrontal cortex; PPTg, pedunculopontine nucleus; SI/MI, primary sensory and motor cortices; SNc, substantia nigra pars compacta; vPFC, ventral prefrontal cortex; VTA, ventral tegmental area.
* Corresponding author at: Department of Psychology, Box 351525, University of Washington, Seattle, WA 98195-1525, United States. Tel.: +1 206 685 9660; fax: +1 206 685 3157.
E-mail addresses: [email protected], [email protected] (Sheri J.Y. Mizumori).
0301-0082/$ – see front matter © 2011 Published by Elsevier Ltd. doi:10.1016/j.pneurobio.2011.08.010
6.3. The ventral striatum: Pavlovian learning and cost-based decision making ...... 115
6.3.1. Nucleus accumbens and Pavlovian learning...... 116
6.3.2. The nucleus accumbens and cost-based decision making ...... 116
6.3.3. Spatial learning and navigation: the role of the ventral striatum ...... 118
6.4. Dorsal striatum: contributions to response and associative learning...... 118
6.4.1. Action–outcome learning and habit learning in the dorsal striatum ...... 119
6.4.2. Response learning in the dorsal striatum...... 119
6.4.3. Sequence learning in the dorsal striatum...... 120
6.5. Interactions between the dorsomedial and dorsolateral striatum ...... 120
7. Neural systems coordination: cellular mechanisms ...... 121
7.1. Single cells and local network coordination ...... 121
7.2. Neural systems organization and oscillatory activity ...... 122
7.2.1. Theta rhythms ...... 122
7.2.2. Gamma rhythms ...... 122
7.2.3. Coordination of theta and gamma rhythms ...... 122
8. Neural systems coordination: decisions and common foraging behaviors ...... 123
8.1. Goal directed navigation in a familiar context ...... 123
8.2. Goal directed navigation in a familiar context following a significant change in context...... 123
8.3. Goal directed navigation in a novel context ...... 124
9. The challenges ahead...... 125
Acknowledgements ...... 125
References ...... 125
1. Introduction

Nearly all cognitive processes utilize or include some aspect of spatial information processing. An animal's ability to find its way around its world is critical for survival; it is crucial for obtaining food, avoiding predators and finding mates. Research into spatial information processing over many decades not only continues to define the mechanisms that contribute to spatial information processing, but these efforts have also provided significant insight into the fundamental mechanisms that underlie learning and memory more generally.

Within the laboratory, goal-directed spatial navigation, in particular, is an immensely useful behavior to study because in many ways it reflects ethologically relevant learning challenges, and provides opportunities to examine dynamic features of neural function that are otherwise not afforded by more simple behavioral paradigms and tasks. Goal-directed navigation is a complex behavior, requiring the subject to perceive its environment, learn about the significance of the environment, and then select where to go next based upon what has been learned. Thus, navigation-based tasks can be used to investigate behavioral and neural aspects of external and internal sensory perception, learning and decision making, memory consolidation and updating, and planned movement. Goal-directed navigation, then, is a powerful model by which to study dynamic neural systems interactions during a fundamental and complex natural behavior.

As a whole, efforts to understand the neurobiology of navigational behavior have focused mainly on the nature and mechanisms of spatial representation in limbic brain structures that are known to be important for spatial learning. As a result, there have been important revelations regarding the physiological mechanisms that control limbic spatial representations. Relating such representations, however, to limbic-mediated learning or memory has been indirect and correlational at best (as discussed in Mizumori et al., 2007a). Here, we suggest that careful application of reinforcement learning theory to an understanding of how decisions are made during goal-directed navigation can identify a fundamental and essential process that likely underlies navigation-related perception, learning, memory or response selection. That is, in order to understand how spatial representations are related to learning, it is necessary to understand how decisions are made during navigation from both neural and behavioral perspectives. Without the ability to make adaptive decisions, animals will not acquire the efficient learning strategies necessary for adaptive behaviors. It should be noted that the suggestion to link reinforcement learning ideas with navigation dates back decades, although the terminology may be different (e.g., cost–benefit analysis of foraging behavior vs. value-based decision making). By investigating this link in freely navigating animals, we may be able to uncover the mechanisms that underlie naturalistic motivated behaviors.

2. Navigation and foraging behavior

The natural foraging environments on which laboratory navigational tasks are based are tremendously complex. The forager's challenge is to acquire sufficient food stores to prevent starvation, produce viable offspring, and avoid predators. A natural tendency for many animals, including rodents, is to hoard small amounts of food in a scattered distribution within their home range or nest (Stephens, 1986). The caching of food requires careful route planning to and from the source of food, the cache, and the home nest. Moreover, because animals acquire food during times when it is abundant, and recover it when food sources are scarce, the animal must retain knowledge of where the food has been cached. This behavior, a naturally occurring spatially directed behavior, is evident in many species, including rodents, birds, spiders, honeybees, and humans (e.g., Anderson, 1984; Davies, 1977; Diaz-Fleischer, 2005; Goss-Custard, 1977; Hawkes et al., 1982; Waddington and Holden, 1979).

The development of mathematical models that formally defined naturally occurring foraging behaviors led to optimal foraging theory, which describes the foraging behavior of an animal in relation to the metabolic payoff it receives when using different foraging options. Most animals are adapted structurally and physiologically to feed on a limited range of food and to gather this food in specific ways (e.g., caching of food during times of abundance). Some food may contain more energy but be harder to capture or be further away, while food that is close at hand may not be considered as nutritionally profitable. According to optimal foraging theory, an 'optimal forager' will make decisions that maximize energy gain and minimize energy expenditure (Krebs and McCleery, 1984; Stephens, 1986). Two foraging models are of note: the 'prey model' proposed by MacArthur and Pianka (1966), and the 'patch model' proposed by Charnov (1976). The prey model seeks to define the criteria that determine whether prey items will
be consumed based on the level of energetic investment needed to
acquire the prey and the rate of energetic return (MacArthur and
Pianka, 1966). One prediction of the prey model is that when there
is an abundance of high quality food, an animal’s diet will consist
mainly of these items, and lower quality food is less likely to be
consumed. The patch model, on the other hand (Charnov, 1976), takes into account the energy expended when an animal searches for food that is clumped in space and time; the animal must therefore decide how long to spend foraging within a food patch before abandoning it and moving on to another (i.e., exploration vs. exploitation).
These models have been mapped onto the behavior of several
species (e.g., Anderson, 1984; Cowie, 1977; Davies, 1977; Diaz-
Fleischer, 2005; Goss-Custard, 1977; Lima, 1983), and they
demonstrated decades ago the strength of applying an economic
approach to the study of naturally occurring, complex behaviors.
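Charnov's patch model can be made concrete with a short numerical sketch. The gain function, its parameters, and the time units below are hypothetical illustrations (not taken from the studies cited above); the sketch simply demonstrates the model's core prediction that a forager maximizing its long-run rate of energy intake should stay longer in each patch when travel between patches is more costly.

```python
import math

def gain(t, g_max=10.0, depletion=0.5):
    """Cumulative energy gained after t minutes in a patch; returns
    diminish as the patch depletes (hypothetical units)."""
    return g_max * (1.0 - math.exp(-depletion * t))

def best_leaving_time(travel_time, dt=0.001, t_max=30.0):
    """Find the residence time t that maximizes the long-run intake
    rate gain(t) / (travel_time + t), by simple grid search."""
    best_t, best_rate = dt, 0.0
    for i in range(1, int(t_max / dt) + 1):
        t = i * dt
        rate = gain(t) / (travel_time + t)
        if rate > best_rate:
            best_t, best_rate = t, rate
    return best_t, best_rate

# The patch model's qualitative prediction: longer travel between
# patches -> longer optimal stay within each patch.
for tau in (1.0, 4.0, 8.0):
    t_star, r_star = best_leaving_time(tau)
    print(f"travel {tau:4.1f} min -> leave patch after {t_star:5.2f} min "
          f"(intake rate {r_star:.2f} units/min)")
```

At the optimal leaving time, the instantaneous gain rate within the patch equals the average intake rate across the environment (the marginal value theorem), which is the formal version of the exploration-vs.-exploitation trade-off described above.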
3. Laboratory tasks that are based on foraging behavior
The study of navigational behavior within the laboratory
became central to the study of learning and memory function with
the introduction of the rat as the primary research subject (Munn,
1950). There are a number of reasons why rodent foraging behavior
is an ideal model with which to study complex learning in the
laboratory: (1) rodents are naturally excellent foragers, and
therefore they tend to learn tasks based on this ability
exceptionally well; (2) we can apply our understanding of the
brain’s natural motivational circuitry to gain new clues about the
mechanisms of a highly evolved and adaptive complex learning
system; (3) despite its complexity – which is what most real world
learning is – this model is highly tractable; (4) within the human
literature, navigation-based tasks have been developed that mimic
the tasks used with rodents (e.g., Astur et al., 1998; Burgess et al.,
2002; Fitting et al., 2007; Hamilton et al., 2002).
As early as the late 1890s and early 1900s, Willard S. Small used
one of the first mazes to investigate learning by white rats (Small,
1899, 1900, 1901), and others soon followed (e.g., Carr, 1917;
Honzik, 1933; Tolman, 1930; Watson, 1907). Early mazes consisted of a system of runways or alleys arranged in various configurations. The first investigations into maze learning were aimed primarily at determining which sensory inputs were essential for successfully navigating a maze to the intended goal, and this led to the assumption that navigation through a maze is performed purely on proprioceptive responses (i.e., stimulus–response behavior), although later studies demonstrated that stimulus–response strategies were not sufficient to optimally solve complex mazes (Munn, 1950; O'Keefe and Nadel, 1978a,b). While many different kinds of mazes were developed in the early years of maze use, only a select few are still used, and these are well suited for studying reinforcement learning in the context of navigation. These include the T-maze and similar variations, including the multiple T-maze, the plus maze, and the Y-maze. The radial maze, introduced by David Olton in 1976, is another excellent and well-used example of a so-called 'multiple solutions' laboratory task (Olton and Samuelson, 1976). Unlike many of the mazes used in the early days, the solution to these sorts of maze tasks is sufficiently ambiguous that successful performance is based on more than a single trajectory to a unique goal, and this allows for testing of more than one cognitive strategy (see Fig. 1).

Fig. 1. Laboratory tasks used to assess navigational behaviors. (A) Morris swim task. Photograph of a rat swimming in the cued version of the Morris swim task, in which an escape platform is clearly visible to the rat. In the spatial version of the task, the platform is submerged beneath the opaque water, and the rat uses distal cues around the room to locate the platform. (B) Barnes Circular Platform Task. Photograph of a rat making an 'error' on the Circular Platform Task by looking into a hole that is not over the dark escape chamber. The arrow points to the correct location of the hole over the goal, which the rat must find on the basis of the features of the environment distal to the platform. (C) Radial arm maze. Photograph of a rat on one of the 8 arms of the radial maze, which is designed to mimic natural foraging behaviors. At the end of each of the arms is a food cup where reward is delivered. At the beginning of a trial, subjects are placed in the center of the maze and allowed access to all of the maze arms, but only a subset of the arms will actually contain a reward (usually four). After a retention delay, the subject is returned to the maze. In win-stay conditions, the same four arms are baited after the delay, and the number of correct choices the subject makes in collecting these rewards is recorded. In win-shift conditions, the four arms not baited in the earlier trial are now baited, and the number of correct arm choices is recorded. Each day, a new set of four arms is chosen randomly. (D and E) Schematic of a plus maze. The plus maze represents a 'dual solutions' problem in that it can be solved using a 'response' strategy or a 'place' strategy. In the place/response task, rats are trained to retrieve food from one arm of a T-maze or cross maze. The content of learning can be assessed by moving the starting arm to the other side of the maze on a probe test. The animal may enter the arm corresponding to the location of the reward during training (place strategy) or the arm corresponding to the turning response that was reinforced during training (response strategy). Photograph in panel (A) taken by Dr. J. Lister; photograph in panel (B) taken by Dr. C.A. Barnes; photograph in panel (C) taken by D. Jaramillo. All used with permission.

The plus maze figured prominently in early debates between behaviorists and cognitive learning theorists who pondered what, exactly, an animal learned that enabled it to find the goal on the maze (Hull, 1932, 1943; Packard, 2009; Restle, 1957; Tolman, 1930). Behaviorists argued that all behavior is simply elicited by antecedent stimuli within the environment, and thus a task such as the plus maze can be solved simply via stimulus–response associations (Guthrie, 1935). Cognitive learning theorists, on the other hand, argued that rats could engage in goal-directed behaviors to solve the maze task, meaning that animals were capable of learning the causal relationship between their actions and the resulting outcomes, allowing them control over their own actions based on their desire for that particular outcome (Tolman, 1948). The plus maze is arranged so that a goal location can be
approached from one of two start boxes. In the standard 'dual solution' version of the task, rats are consistently released from the same start arm, and are trained to retrieve reward from another, consistently baited maze arm. Rats can use one of two strategies to solve this task: they can acquire information concerning the spatial location of the goal and use that information to navigate to the rewarded arm (i.e., a place strategy), or the rat can learn to approach the rewarded location by acquiring a specific response, such as a right body turn to reach the reward (i.e., a response strategy). To determine which strategy the rat is using, a probe trial can be given in which the rat starts the task from a different arm of the maze. Rats with knowledge of the spatial location of the food should continue to approach the rewarded arm on the probe trial, whereas rats that have learned a specific body turn should choose the opposite arm. A number of factors can influence which strategy a rat will ultimately use to reach the goal, including the amount of training the animal receives. Rats that are overtrained on this task tend to predominantly use a response strategy, whereas most rats will use a place strategy early on in training. Thus, overtraining results in a shift from goal-directed action–outcome learning and strategy use to less flexible stimulus–response learning and strategy use (Packard, 1999; Packard and McGaugh, 1996). Other goal-directed navigation-based tasks that are widely used include the Morris Swim Task (Morris, 1981) and the Barnes Circular Platform Task (Barnes, 1979). All of the above tasks test goal-directed navigation that requires active decision making and learning about how reinforcers influence choices that are made. These tasks can be contrasted to other 'foraging' tasks in which the animal is not required to implement a decision-based strategy, including random foraging (for bits of food sprinkled randomly around an open platform or box), tasks in which movement is passive (i.e., 'assisted' exploration), or tasks in which animals follow paths provided by the experimenter until rewards are encountered.

Navigational strategies (such as those just described) may range from relatively simple approach and avoidance behavior to the use of complex representations of the environment (e.g., geometrical maps). In the context of natural foraging, the goal is to find food while avoiding predators and minimizing energy expenditures. Similarly, in many maze tasks, the goal of a hungry rodent is to find food, or to avoid unpleasant situations, such as cool water or bright open spaces. In most cases, an animal is faced with more than one option. In a natural foraging context, an animal may face a situation in which it must take into account the energy expended while searching for food, and thus must decide how long to spend foraging within a food patch before abandoning it and moving on to another (i.e., exploration vs. exploitation). On a maze task (e.g., an 8-arm radial maze) the animal may need to decide which arms of the maze to visit first: for example, an arm that always has a small food reward, or an arm that only sometimes has a large food reward. To determine a course of action, the animal will engage in 'value-based decision making', which can be broken down into several steps (Fig. 2; Rangel et al., 2008; Mizumori et al., 2000; Sutton and Barto, 1998). First, the organism needs to determine the goal of the current behavior, a process that may include the assessment of one's internal state, such as level of hunger, and of the external context, such as risk in the environment. Next, a value assignment is made for each available action, taking into consideration the relative cost or benefit associated with each action. Once these values have been assigned, they can be compared, a choice is made about which behavior to select, and the choice is then implemented. The outcome of the behavior can then be evaluated. Did the action result in the desired outcome? Was the outcome better than expected, or worse? Finally, this feedback is used to update learning and memory processes so that future decisions can be impacted by what has just been learned. Learning is said to be 'complete' when the outcome of the chosen course of action is aligned with the expected outcome. If the outcome, on the other hand, is better or worse than expected, learning about which actions will lead to an optimal outcome continues.

These processes are, of course, theoretical in nature and not absolute, but they help to guide our thinking about the neurobiological processes that contribute to successful goal-directed navigation. It may be prudent, at this point, to define 'reward' (for the sake of simplicity, we consider reward to be synonymous with goal). Rewards can be defined as objects or events that elicit approach and consummatory behavior, and they represent positive outcomes of decisions that result in positive emotions and hedonic feelings. Rewards are crucial for survival and support elementary processes such as drinking, eating and reproduction. In other situations, rewards can also be more abstract, such as money, social status, and information (e.g., Bromberg-Martin and Hikosaka, 2009; Corrado et al., 2009).

[Fig. 2 schematic: 'The Goal in a Given Context' → 'Value Assessment' → 'Action Selection' → 'Outcome Evaluation', with 'Learning & Memory' interacting with these stages.]
Fig. 2. A general conceptual framework for evaluating goal-directed decision making behavior. Within a context, an assessment of the internal and external factors of the current situation helps to determine the current goal for behavior. The factors that influence goal assessment include internal states (e.g., hunger or thirst) and external factors (e.g., distance to different goal locations, presence of predators). A value assessment involves considering how rewarding any one goal is (e.g., a far away large cache of food vs. an uncertain but close cache) and assigns value to each of the available options. An action is selected and is then implemented. An evaluation of the outcome is made. Did the behavior result in the expected reward? Was the outcome better (e.g., more food) or worse (no food) than expected? The outcome of the behavior results in learning when the outcome does not match the expectation, and learning might be considered 'complete' when a mismatch between what is expected and what is actually achieved no longer occurs. Memory stores can then be updated to guide subsequent behavior. After Rangel et al. (2008).

4. Reinforcement learning and decision making environments

Reinforcement learning describes the process through which an organism learns to optimize behavior within a decision environment (see Fig. 3). The ultimate goal of reinforcement learning is to implement behaviors or actions that result in a maximization of reward or minimization of punishment. The decision-making environments in which reinforcement learning occurs consist of a set of 'states' (Sutton and Barto, 1998), which in the case of navigation can be represented by locations on a maze (e.g., the center platform would be one 'state', the end of an arm another 'state'); a set of possible actions that the decision-maker can choose from (e.g., turn left or travel south); and a set of rules that the decision-maker will initially be naïve to, and thus must learn via interaction with the environment (e.g., a large reward is always available on the south maze arm). The actions or behaviors that the
decision-maker implements move the agent from one state to another, and produce outcomes which can have positive or negative utilities (e.g., finding a large reward, a small reward or no reward). Finally, the utility of the outcome can change, even within the same state, by factors such as the motivational circumstances of the decision-maker, such as a change from hunger or thirst to satiation (e.g., Aberman and Salamone, 1999; Dayan and Daw, 2008; Dayan and Niv, 2008; Niv, 2009; Sutton and Barto, 1998).

[Fig. 3 schematic: two plus-maze diagrams, (A) 'Model-free trial and error decision making' and (B) 'Model-based action–outcome decision making', each with a start state S1, intermediate states S2 and S3, and outcome states S4 and S5 marked with rewarded (+) and nonrewarded (−) outcomes.]
Fig. 3. Reinforcement learning on a maze task. (A) Schematic of model-free trial and error decision making on a plus maze task. Model-free reinforcement learning involves learning action values directly, by trial and error. The environments in which learning occurs consist of a set of states (i.e., locations on the maze), and each state (S1–S5) is initially independent of other states. Because the decision-maker has not had experience with the states, they will all have similar values assigned to them, and are thus equally likely to be chosen. (B) Schematic of model-based action–outcome decision making. The ultimate goal of reinforcement learning is to select actions that result in a maximization of reward. Model-based reinforcement learning uses experience to construct an internal model, for example, a cognitive map, of the transitions and immediate outcomes in the environment. Through trial and error learning, this representation is constructed, and helps to strengthen the connection between states. In the example shown here, thicker lines represent stronger associative connections, while thinner lines represent connections that are not as strong. Dashed lines indicate that an association has not been strengthened, as in the case when reward is not delivered at one of those states (S5). In this example, the decision-maker has learned that choosing to go from S2 to S4 results in a large reward, whereas moving from S2 to S3 results in acquisition of a small reward. In a dynamic environment, the value of the rewards may change, resulting in either strengthening or weakening of states.

Reinforcement learning models are often divided into model-free and model-based categories (e.g., Daw et al., 2005; Niv et al., 2006). Using model-free reinforcement learning strategies, animals learn the value of each action directly, by trial and error. In contrast, model-based reinforcement learning uses experience to construct an internal model, for example, a cognitive map, of the transitions and immediate outcomes in the environment. Animals can then estimate the value associated with each action on every trial using knowledge about their costs and benefits. Within the framework of navigational behavior, this kind of learning allows action selection to be dynamic, changing as the rules within the environment change, and is thus suited to support goal-directed behaviors. Learning using both model-based and model-free strategies is generally driven by 'prediction errors', which are the differences between actual and expected outcomes, and are used to update expectations in order to make predictions more accurate.

4.1. Temporal difference learning

A critical problem in animal and human decision making is how to choose behaviors that will lead to reward in the long run. A 'classic' approach to this problem was proposed by Rescorla and Wagner (1972), who argued that learning occurs when there is a discrepancy between events that are predicted and those that actually happen. An extension to the Rescorla–Wagner model was proposed by Sutton (1988) and Sutton and Barto (1998) in a model which came to be known as 'temporal difference learning'. This has been widely used in modeling behavioral and neural aspects of reward-related learning (e.g., Bayer and Glimcher, 2005; Kurth-Nelson and Redish, 2009, 2010; Ludvig et al., 2008; Maia, 2009; Montague et al., 1996; Nakahara et al., 2004; O'Doherty et al., 2003; Pan et al., 2005, 2008; Schultz et al., 1997; Seymour et al., 2004) such that reward predictions are constantly improved by comparing them to actual rewards (Sutton and Barto, 1998). According to such models, an expected reward value for a given state is estimated. When external reward is delivered, it is translated into an internal signal that enters into a computation that determines whether the value of the current state is better or worse than predicted. Signals that reflect discrepancies between expected and actual reward values can be used to update future expected values and reward probabilities. The temporal difference model can be used to describe how neural responses to stimuli change during learning; as prediction improves, these responses reflect the linking of stimuli with their expected probability of reinforcement. By extension, then, the temporal difference model predicts that neural activation will gradually shift from the time of reward to the time of the predictors of subsequent reinforcement (reviewed in Suri, 2002; Suri and Schultz, 2001). Indeed, different types of neurons have been shown to exhibit these sorts of changes in firing during learning (Hollerman and Schultz, 1998; Mirenowicz and Schultz, 1994; Schultz et al., 1993).

Although the neural circuitry by which temporal difference computations occur remains to be clarified, a popular idea is that there is one neural network that selects behaviors (the 'actor'), and a second neural network that evaluates the outcomes of the behaviors selected by the actor. That second network is referred to as the 'critic' (e.g., Houk et al., 1995; Sutton and Barto, 1998). The fact that neurons within the reward circuitry represent action, and sometimes action sequences, as well as reward (Graybiel, 1998; Hikosaka et al., 1989, 1999; Lavoie and Mizumori, 1994; Mulder et al., 2004; Schmitzer-Torbert and Redish, 2004; Schultz et al., 1997; van der Meer et al., 2010; Wiener, 1993) was taken as initial evidence to support an actor–critic explanation. Computational models suggest that the critic compares the outcome of the action
of the actor against the expected value based on past experience. If there is a discrepancy between predicted and actual rewards (i.e., a reward prediction error), a temporal difference reinforcement signal is used to update the value signal in memory. Future actions are then selected according to whether they are expected to produce a maximal value reward.

The striatum has received much attention as the locus of the actor–critic function (e.g., Joel et al., 2002). The lateral dorsal striatum is often considered to mediate stimulus–response or habit learning, while the ventral striatum and medial dorsal striatum are thought of as evaluators of the outcomes of actions (see Section 6). Thus, many view the actor–critic networks as corresponding to the lateral dorsal striatum and ventral/medial dorsal striatum, respectively (e.g., van der Meer and Redish, 2011; van der Meer et al., 2010). Since reward prediction error signals are coded by dopamine neurons as well (Khamassi et al., 2008; O'Doherty et al., 2004; Schultz, 1997), dopamine neurons may also contribute to analysis by the critic. Others suggest that there are multiple actor–critic functional modules within striatum, and these correspond to the matrix–patch cellular subdivisions that

The prediction error hypothesis has garnered a great deal of attention since it was first proposed because it is exactly the kind of teaching signal that figures prominently in many models of learning, including the Rescorla–Wagner model and the temporal difference reinforcement learning algorithm (Rescorla and Wagner, 1972; Sutton and Barto, 1998; Sutton, 1988). There is, however, evidence that dopamine may also function in other capacities to facilitate learning. For example, while most conceptualizations focus on reward-related signaling in the positive sense, there is also evidence that a subpopulation of dopamine neurons exhibit phasic responses to aversive stimuli or to cues that predict aversive events (e.g., Brischoux et al., 2009; Joshua et al., 2008; Matsumoto and Hikosaka, 2009; Zweifel et al., 2011). In addition, there are data suggesting that dopamine may provide a reward risk signal (Fiorillo et al., 2003), and also signal non-rewarding salient events, such as surprising or novel stimuli (Redgrave and Gurney, 2006). Thus, a broader conceptualization of the role of dopamine in learning has emerged (e.g., Berridge, 2007; Bromberg-Martin et al., 2010; Redgrave and Gurney, 2006; Redgrave et al., 1999b; Salamone, 2007; Wise, 2006). Based on a
run through both dorsal and ventral striatum, respectively (Houk, growing body of experimental evidence that suggests that
1995). While the issue of localization remains to be resolved, it is different subgroups of neurons within the midbrain respond
becoming clearer that the neurocircuitry underlying critic func- differentially to, reward, aversive stimuli and novelty, Bromberg-
tions extends across, at least, the dopaminergic-striatal circuitry Martin et al. (2010) suggest that some dopamine neurons encode
(see Section 6). reward value, necessary for reward seeking and value learning,
It is worth noting that as appealing as the temporal difference while others encode motivational salience necessary for orienting
model is, it cannot represent the full picture for how reinforcement and general motivation.
outcomes are determined. This is because reward is often delayed, One hypothesis about how dopamine supports reinforcement
and can be separated from the action for which it was rewarded by learning is that it adjusts the strength of synaptic connections
other, irrelevant actions. Such a delay creates an accountability between neurons according to a modified Hebbian rule (‘neurons
problem referred to as the problem of ‘temporal credit assignment’ that fire together wire together’; Hebb, 1949). Conceptually, if cell
(Sutton and Barto, 1998). Studies of goal-directed navigation could A activates cell B, and cell B results in an action that is rewarded,
be particularly useful for determining how the brain naturally dopamine is released and the A/B connection is reinforced
solves the temporal credit assignment: one can imagine a case (Montague et al., 1996; Schultz, 1998a,b). With enough experience,
when an animal will have to make a decision at, for example, a ‘fork this mechanism would allow an organism to learn the optimal
in the road’. After enacting a decision about which way to turn, a choice of actions to gain reward. In fact, dopamine has been shown
number of pathways may become available, the selection of any to facilitate synaptic plasticity in several mnemonic brain
one of which will lead to the goal (see Fig. 3). The next time the structures (Frank, 2005; Goto et al., 2010; Lisman and Grace,
animal encounters the ‘fork in the road’, it will have to remember 2005; Marowsky et al., 2005; Molina-Luna et al., 2009; Surmeier
which of the many subsequent alternatives led to the desired goal. et al., 2010). The precise information transmitted when dopamine
cells fire is not clear. To address this issue, it is necessary to
4.2. Dopamine and reinforcement learning understand the firing patterns of dopamine neurons, and the
factors that regulate these patterns. Dopamine signals occur in two
A critical and unresolved issue is how the brain implements modes, a tonic mode and a phasic mode (Grace, 1991; Grace et al.,
reinforcement learning algorithms. In a series of pioneering studies 2007). Tonic dopaminergic signaling maintains a steady baseline
conducted in non-human primates, Schultz et al. (1997) provided level of dopamine in afferent structures. While a precise functional
evidence that one of the primary neural correlate of reinforcement role for the tonic dopamine signal has not yet been established
learning theory may reside in the signal provided by midbrain (Ostlund et al., 2011), one intriguing hypothesis is that tonic
dopamine neurons. Dopamine neurons respond with phasic bursts dopamine may represent the ‘‘net value’’ of rewards, and underlie
of action potentials when an unexpected reward is delivered, and the vigor with which responding is made (Niv et al., 2007). Phasic
also respond to conditioned cues that predict reward (Ljungberg dopamine, on the other hand, is the dopaminergic signal that is
et al., 1992; Mirenowicz and Schultz, 1994). When, however, an thought to do the heavy lifting, at least in terms of reward
expected event or reward does not occur, the activity of some processing (Schultz, 1997; Schultz et al., 1997; Wise, 2005) and
putative dopamine cells tend is inhibited. Thus, a reward that is incentive salience that promotes reward seeking (Berridge and
better than predicted can generate a positive prediction error, a Robinson, 1998). Dopamine may have unique effects across
fully predicted reward elicits no error, and a reward that is worse different efferent targets, however, since (a) the regulation of
than predicted can elicit a negative prediction error (e.g., Bayer and tonic vs. phasic activation of dopamine cells is controlled by an
Glimcher, 2005; Hollerman and Schultz, 1998; Hollerman et al., array of diverse inputs, and (b) dopamine efferent systems express
1998). In this way, dopamine acts as a teaching signal that enables different levels and types of dopamine receptors. Important for the
the use of flexible behaviors during learning (Schultz and present discussion, both the ventral tegmental area (VTA) and the
Dickinson, 2000), and facilitates motivated behaviors by signaling substantia nigra pars compacta (SNc) project to the hippocampus
the salience of environmental stimuli, such as cues that predict and to the striatum, two brain structures frequently discussed in
food (Berridge and Robinson, 1998; Flagel et al., 2011; Salamone terms of goal-directed navigation and learning. How dopamine
and Correa, 2002). In addition, the prediction error signal appears contributes to information processing within these structures
to take into account the behavioral context in which rewards are during navigation-based learning will be discussed in the
obtained (Nakahara et al., 2004). following sections.
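The reward prediction error at the heart of temporal difference learning, with dopamine cast as the teaching signal that broadcasts it, can be sketched in a few lines. This is an illustrative toy only, not a model taken from the studies reviewed here: the state names, learning rate, and discount factor are arbitrary assumptions. A cue is followed by a reward on every trial; the error δ = r + γV(s′) − V(s) is large at first and shrinks as the cue comes to predict the reward, mirroring the shift of phasic dopamine responses away from fully predicted rewards.

```python
# Toy TD(0) learning on a one-step trial: cue -> reward.
# delta mirrors the phasic dopamine signal: positive when reward is
# unexpected, near zero once the cue fully predicts the reward.

ALPHA = 0.1   # learning rate (arbitrary choice)
GAMMA = 1.0   # no discounting over this short trial

def run_trials(n_trials):
    V = {"cue": 0.0, "reward_state": 0.0}  # learned state values
    deltas = []
    for _ in range(n_trials):
        # Transition cue -> reward_state; reward r = 1 delivered on arrival.
        delta = 1.0 + GAMMA * V["reward_state"] - V["cue"]
        V["cue"] += ALPHA * delta
        deltas.append(delta)
    return V, deltas

V, deltas = run_trials(200)
print(round(deltas[0], 3))    # large positive error on the first trial
print(round(deltas[-1], 3))   # near zero once reward is fully predicted
print(round(V["cue"], 3))     # learned value approaches the reward magnitude
```

On this view, the same δ term could also gate a Hebbian weight change (Δw ∝ δ × pre × post), which is one common formalization of the dopamine-modulated plasticity rule discussed above.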
Fig. 4. Flow of cortical information to hippocampus. Multimodal sensory, motor, and associative information arrive in the hippocampus primarily through the parahippocampal cortex. The anatomically distinct medial entorhinal cortex and lateral entorhinal cortex receive spatial and nonspatial information from distinct adjacent cortical regions: the postrhinal cortex (spatial), which receives input from the parietal and retrosplenial cortices (not shown), and the perirhinal cortex (nonspatial), respectively. Both entorhinal cortical regions, in turn, project to the dentate gyrus, CA3, CA1 and subicular regions of hippocampus proper. Although all intrahippocampal regions receive neocortical input, each is thought to make a distinct contribution to the determination of context saliency as context information passes through from the dentate gyrus to the subiculum. The red arrow refers to the large recurrent excitatory system found amongst CA3 neurons. Presumably this unique pattern allows for information to be held on-line for brief periods.

5. The neurobiology of reinforcement learning and goal-directed navigation: hippocampal contributions
The previous discussion clearly illustrates the central role of
dopamine in decision-making processes that lead to effective
learning. In this section, we first describe the hippocampal neural
circuit whose dynamic and interactive functions form the
substrate on which the dopamine system acts, then discuss how
this circuit guides decision making (and ultimately learning) by
identifying the saliency of a context (i.e., whether a familiar
context has changed or if the current context is novel). Both
instances of context analysis may rely on the same computation.
5.1. Hippocampal place fields as spatial context representations
The hippocampal complex is comprised of hippocampus proper
and the surrounding parahippocampal cortex. Generally speaking,
there are two tracks of information flow into the hippocampus from the neocortex (see Fig. 4). Spatial information arrives from the postrhinal region of posterior cortex to the medial entorhinal area. In contrast, predominantly nonspatial information is passed from the perirhinal cortex to the lateral entorhinal cortex. Both entorhinal cortices in turn project to all of the subregions of hippocampus proper (which includes the dentate gyrus, CA3, CA1 and subicular areas; Amaral and Lavenex, 2006; Burwell, 2000; Burwell and Amaral, 1998a,b; Van Strien et al., 2009).

Fig. 5. (A) Schematic illustration of location-selective firing by a hippocampal CA1 place cell (red), and a hippocampal CA3 place cell (blue). As shown, CA3 place fields tend to be more spatially constricted than CA1 place fields. Also, place fields typically show a Gaussian distribution of firing as an animal traverses the place field. (B) Entorhinal cortex contains cells that show regularly spaced location-selective firing. These are referred to as grid cells, as the firing fields can be viewed as vertices of a grid that covers a particular environment. (C) A third type of spatial representation is one that relays information about the directional heading of an animal. In this example, the arrows indicate the preferred orientation direction of a cell: if the animal orients its head in the northeast direction of the environment (from any location), the cell will preferentially fire. Typically, when the rat orients its head in other directions, a head direction cell will not fire.

Single unit recording studies have generated foundational information for theories of hippocampal function. The most commonly reported behavioral correlate of hippocampal output neurons (pyramidal cells) is location-selective firing, referred to as place fields (see Fig. 5 for an example; O'Keefe and Dostrovsky,
1971). The seminal discovery that hippocampal pyramidal neurons exhibit remarkably distinct and reliable firing when rats visit particular regions of the environment led to a widely held view of hippocampus as a cognitive map (O'Keefe and Nadel, 1978a,b). Decades of research (for reviews see McNaughton et al., 1996; Mizumori et al., 1999; Muller et al., 1996; O'Keefe, 1976; O'Mara, 1995; Wiener, 1996) clearly demonstrate that place fields reflect more than details of the current external sensory surround, since they are observed when external cues are essentially absent (McNaughton et al., 1996; O'Keefe and Conway, 1978; Quirk et al., 1990). Further, in the absence of external sensory cues, temporal or internal sensory cue information has been shown to shape the characteristics of place fields. For instance, the elapsed time since leaving a goal box can often be a better predictor of place fields than the external features of an environment (Gothard et al., 1996; Redish et al., 2000). Also, internally generated sensory and motion information about one's own behavior impacts place fields: the velocity of an animal's movement through a place field, the direction in which rats traverse a place field, and vestibular (or inertial) information have been shown to be correlated with place cell firing rates (e.g., Gavrilov et al., 1998; Hill and Best, 1981; Knierim et al., 1995; Markus et al., 1994; McNaughton et al., 1983; Wiener et al., 1995). Evidence indicates that the location selectivity of place fields is positively related to the degree of sensitivity to internally generated cues: for example, the extent to which place fields are sensitive to internally generated cues systematically declines from the septal pole to the temporal pole of hippocampus (Maurer et al., 2005), and place fields become increasingly larger for place cells recorded along the dorsal-to-ventral axis (e.g., Jung et al., 1994). Also supporting the conclusion that (at least dorsal) hippocampal place fields represent egocentric information are findings that the degree to which animals are free to move about in an environment predicts place field specificity (Foster et al., 1989; Gavrilov et al., 1998; Song et al., 2005). Compared to passive movement conditions, in which rats are made to go through a place field either by being held by the experimenter or by being placed on a moveable robotic device, active and unrestrained movement corresponds to the observation of more selective and reliable place fields (Terrazas et al., 2005). The fact that neural representations in the brain are so dramatically affected by voluntary and active navigation provides a compelling argument for studying not only learning, but also decision making, in animals that navigate spatially extended environments.

One interpretation of the sensitivity of place fields to both egocentric and allocentric information is that it allows rats to rapidly switch between multiple cue sources, thereby ensuring continuously adaptive choices (e.g., Etienne and Jeffery, 2004; Gavrilov et al., 1998; Knierim et al., 1995; Maurer et al., 2005; McNaughton et al., 1996; Mizumori et al., 2000; Mizumori, 2008; Whishaw and Gorny, 1999). Such an ability seems advantageous in a constantly changing environment. The identity of the necessary changes in conditions that lead to a decision to switch strategies, however, remains to be determined.

To identify motivational or mnemonic, rather than sensory or behavioral state, influences on place fields, rats can be trained to solve a maze task under conditions in which the external sensory environment and the behavioral requirements of the task are held constant while the internal state or specific memory used to guide behaviors is manipulated by the experimenter (e.g., Frank et al., 2000; Kelemen and Fenton, 2010; Smith and Mizumori, 2006a,b; Wood et al., 2000; Yeshenko et al., 2004). Under these test conditions, place field representation of sensory and behavioral information can be conditional upon an animal's motivational state (e.g., hungry or thirsty; Kennedy and Shapiro, 2004), as well as recent (retrospective coding) or upcoming (prospective coding) events such as behavioral sequences or response trajectories (Buzsaki, 1989; Fenton and Muller, 1998; Ferbinteanu and Shapiro, 2003; Ferbinteanu et al., 2011; Foster and Wilson, 2006; Frank et al., 2000; Lee and Wilson, 2002; Louie and Wilson, 2001; Olypher et al., 2002; Pennartz et al., 2002; Touretzky and Redish, 1996; Redish, 1999; Wilson and McNaughton, 1994; Wood et al., 2000; Yeshenko et al., 2004). Additional reports provide evidence that place fields reflect expectations based on learned reward information (e.g., Jackson and Redish, 2007). Place fields have been observed to move closer to goal locations as animals gain more experience receiving rewards at the goal (Hollup et al., 2001; Lenck-Santini et al., 2001, 2002). Further, when compared to times of random foraging, a larger proportion of hippocampal neurons exhibit reward responsiveness when rats are explicitly trained to discriminate reward locations (Smith and Mizumori, 2006b). Thus, an animal's motivational state, its expectations, or successful behavioral outcomes contribute to how learning-related brain structures code information that is directly relevant to future decisions and behavioral choices.

Place fields, then, appear to represent a matrix of information that includes location-selective salient features such as external and internal sensory information, an animal's past, present, and future behaviors relative to the target location, as well as the expectations for the consequences of behaviors. This sort of complex representation has been taken as evidence that during active navigation, the hippocampus represents spatially organized contextual information, perhaps for the purpose of determining the salience of the current context. Context saliency refers to not only the significance of currently existing contextual features, but also the extent to which the expected contextual features have changed (e.g., Kubie and Ranck, 1983; Mizumori et al., 1999, 2000; Mizumori, 2008; Nadel and Payne, 2002; Nadel and Wilner, 1980). This conclusion is consistent with a literature documenting the impact of hippocampal lesions on animals' use of contextual information (for reviews see Anagnostaras et al., 2001; Maren, 2001; Myers and Gluck, 1994). For example, subjects with hippocampal damage do not exhibit conditioned fear responses to contextual stimuli even though responses to discrete conditional stimuli remain intact (Kim and Fanselow, 1992; Phillips and LeDoux, 1992). While intact subjects exhibit decrements in conditioned responding when the context is altered, subjects with lesions of the hippocampus (Penick and Solomon, 1991) or the entorhinal cortex (Freeman et al., 1997) do not. These findings converge on a hypothesis that hippocampus is important for determining context saliency.

It is important to note that a context processing interpretation of hippocampal neural representations is entirely consistent with a number of hypotheses that have been put forth to account for hippocampal contributions to learning, including spatial processing (e.g., Long and Kesner, 1996; O'Keefe and Nadel, 1978a,b; Poucet, 1993), working memory (Olton et al., 1979), relational learning (Eichenbaum and Cohen, 2001), episodic memory (e.g., Tulving, 2002), context processing (e.g., Hirsh, 1974), declarative memory (Squire, 1994), and the encoding of experiences in general (Moscovitch et al., 2005). It is consistent with these other theories because context analyses represent a fundamental computation of the hippocampus that underlies relational learning, or episodic, working, or declarative memory (e.g., Mizumori, 2008).

5.2. The hippocampus distinguishes contexts during navigation

The literature shows that place cells are simultaneously responsive to, and thus presumably encode, a combination of different context-defining features such as spatial information (i.e., location and heading direction), consequential information (i.e., reward), current movement-related information (i.e., velocity and acceleration – determinants of response trajectory), external (nonspatial)
sensory information, the currently active memory (defined operationally in terms of task strategy and/or task phase), and the current motivational state. Thus, place fields are considered to be spatial context representations, and it has been suggested that they code the extent to which familiar contexts change (Nadel and Payne, 2002; Nadel and Wilner, 1980), perhaps by performing a match–mismatch comparison of expected and actual context features (e.g., Anderson and Jeffery, 2003; Jeffery et al., 2004; Mizumori et al., 1999, 2000; Vinogradova, 1995). The results of match–mismatch comparisons can serve as a metric for determining the saliency of the current context, and this in turn should be directly related to an animal's ability to distinguish contexts. Such a discrimination function seems necessary for the hippocampus to define significant events or episodes (as defined by Tulving, 2002). Analogous to what has been described by others (e.g., Hasselmo, 2005a,b; Hasselmo and McGaughy, 2004; Lisman, 1999; Mizumori, 2008; Smith and Mizumori, 2006a,b; Treves, 2004; Wang and Morris, 2010), the process of comparing expected and actual contexts should be automatic in nature because a change in a context can happen often or at unexpected times during natural foraging. By continually determining context saliency (i.e., always computing whether a context has changed), the hippocampus can immediately alert other neural systems when a change does occur. In this way, the hippocampus contributes to rapid learning of new information and the optimal implementation of adaptive choices and behaviors.

What is the underlying neural circuitry that discriminates contexts? A Context Discrimination Hypothesis (Mizumori, 2008; Smith and Mizumori, 2006a) emphasizes the importance of representing integrated sensory, motivational, response, and memorial input. Indeed, place fields represent such integrated information. The relative strengths of these four types of inputs may vary depending on task demands, such that a given cell may show, for example, a place correlate during the performance of one task, and a nonspatial correlate during the performance of a different task (e.g., Wiener et al., 1989). Also, movement correlates observed in one task may not be observed when the memory component of the context, and not behavior, changes (e.g., Yeshenko et al., 2004). It should be noted that context discrimination by hippocampal neurons is observed not only during performance of spatial tasks, but also during nonspatial task performance such as olfactory (e.g., Wiener et al., 1989) or auditory discrimination (Freeman et al., 1996; Sakurai, 1994). Thus, context discrimination may be a basic hippocampal operation that can be universally applied to facilitate decision making, enhance learning, and/or strengthen any sort of memory that uses context information. As such, it is important to understand how context discrimination is accomplished at a neural level, since this should help us to understand the types of contextual information that come to impact future decisions. The following summarizes the neural circuitry that may be responsible for determining context saliency by hippocampal neurons.

5.3. Cellular and network mechanisms underlying hippocampal context processing

Determining context saliency likely involves a number of stages of processing within different synaptic regions of hippocampus (Fig. 4). The following discussion describes how these various stages of processing may result in an assessment of context saliency, beginning with context representation by individual neurons.

The relative influence of context-defining input on the discharge rates of place (pyramidal) cells and interneurons may vary not only according to the strength of each type of afferent input, but also the intrinsic (membrane) properties of a cell. Place cells exhibit characteristic short-lasting, high frequency bursts of action potentials when a rat passes through a cell's place field (Ranck, 1973). This type of phasic, burst firing pattern is thought to be associated with increased synaptic plasticity (Martin et al., 2000), as well as the encoding of discrete features of a situation that do not change very rapidly or often (e.g., significant locations, reward expectations, task phase). Interneurons, on the other hand, discharge continuously and at high rates, a pattern that is well suited to encode rapidly and continuously changing features, such as changes in movement and orientation during task performance. The combination of context features and the potential for temporally patterned discharge by both pyramidal cells and interneurons, then, provides the hippocampus with a rich array of rate and temporal neural codes to use in the determination of context saliency (Mizumori et al., 1999; Mizumori, 2008).

It is often reported that place fields rapidly reorganize (i.e., change field location and/or firing rate within the place field) when an environmental context is altered. Notably, however, unless an animal is tested in a completely novel environment, one also finds a group of place fields that are unchanged following a change in the context. Thus, there seem to be two forms of context representation in the hippocampus. The place fields that reorganize after context modification may reflect current contextual features, while the place fields that persist when a context changes may reflect the expected contextual features. In principle, a novel environment would not generate expectations, resulting in 'complete reorganization', where 100% of the cells exhibit new place field properties. However, when an animal experiences a change in a familiar context, one observes what is referred to as 'partial reorganization', in which only a subset of place fields show altered properties (for review, see Colgin et al., 2008). To explain the latter, it is helpful to clarify that any context representation, almost by definition, reflects a unique array of inputs. In theory, then, a change in any one or combination of features could result in the production of an 'error' signal that reflects a mismatch between expected and actual context features (Mizumori et al., 2000). If such a 'context prediction error' occurs, then the output message from hippocampus should reflect this fact. Such a signal may be sent to update cortical memory circuits, which in turn leads to an update of the most recent hippocampal expectation for a context. A hippocampal output that signals a context prediction error may also be sent to the ventral striatum to engage the critic function of the actor–critic system (described in more detail in Section 4.1). Further, a context error message should update the selection of ongoing behaviors by informing basal ganglia circuitry. If it is determined that the context has not changed (i.e., there is no place field reorganization), a consistent hippocampal output will result in the persistence and strengthening of currently active neural activity patterns, which in turn maintains the same expectation information in hippocampus, and the same behavioral expression patterns.

It is intriguing to note that the proposed error analysis by hippocampus is analogous to the error prediction signals that dopamine cells generate when an expected reward is not realized. It is known from studies of dopamine cells that the magnitude of the error prediction signal depends in part on the certainty and saliency of reward (Fiorillo et al., 2003; Mirenowicz and Schultz, 1994; Schultz, 1997; Schultz et al., 1997): the less certain it is that a reward will be found, the smaller the magnitude of an error prediction signal. When this idea is applied to our understanding of place field reorganization, one could argue that whether a place field reorganizes depends on the strength of memory expectations. A strong expectation signal to some cells may result in a high threshold for generating error signals (i.e., place field reorganization). Place fields of these cells would tend to persist when there is a minor context shift. Such a condition may apply to CA1.
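The match–mismatch comparison and the threshold idea just described can be made concrete with a small sketch. This is purely illustrative and not drawn from the reviewed experiments: the context features, the per-cell thresholds, and the simple fraction-of-features-changed mismatch measure are all assumptions. Cells with strong expectations (high thresholds, CA1-like in the proposal above) retain their fields after a minor context change, while cells with weak expectations (low thresholds, CA3-like) reorganize; a novel context exceeds every threshold.

```python
# Illustrative match–mismatch comparison for a population of place cells.
# Each cell compares expected context features to the actual ones; if the
# mismatch exceeds that cell's threshold, its place field "reorganizes".

def mismatch(expected, actual):
    # Fraction of context features whose value has changed.
    changed = sum(1 for k in expected if expected[k] != actual[k])
    return changed / len(expected)

def population_response(cells, expected, actual):
    # cells: {name: threshold}; returns the set of cells that reorganize.
    err = mismatch(expected, actual)
    return {name for name, threshold in cells.items() if err > threshold}

expected = {"cue_color": "white", "reward_site": "east", "odor": "lemon"}
minor_change = dict(expected, cue_color="black")          # 1 of 3 features
novel = {"cue_color": "grey", "reward_site": "north", "odor": "mint"}

# Hypothetical thresholds: strong expectations -> high threshold (CA1-like),
# weak expectations -> low threshold (CA3-like).
cells = {"ca1_a": 0.6, "ca1_b": 0.5, "ca3_a": 0.2, "ca3_b": 0.1}

print(population_response(cells, expected, minor_change))
# partial reorganization: only the low-threshold (CA3-like) cells change
print(population_response(cells, expected, novel))
# complete reorganization: every cell exceeds its threshold
```

On this toy view, 'partial reorganization' in a familiar-but-altered context and 'complete reorganization' in a novel context fall out of a single computation, differing only in how far the mismatch exceeds each cell's expectation-dependent threshold.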
Other cells may not receive such a strong expectation signal, resulting in place field reorganization following even minor changes in context, such as that which is observed for CA3 place fields.

With the introduction of new technologies and clever experimentation by a large number of researchers, a neurobiological model of hippocampal function has emerged that describes mechanisms involved in determining the saliency of a context. The process of context comparison begins by identifying the relevant stimuli and memories (or expectations). The dentate gyrus is thought to engage in pattern separation functions that might serve this purpose by distinguishing between similar, potentially important inputs (Gilbert et al., 2001; Leutgeb et al., 2007; O'Reilly and McClelland, 1994; Rolls, 1996). Specifically, dentate gyrus place fields tend to be smaller (i.e., more spatially localized) than either CA3 or CA1 place fields, and they show the most immediate response to context changes. Also, the fact that there is tremendous convergence of input from the dentate gyrus to the CA3 region (Amaral et al., 1990) further suggests that the dentate gyrus filters, or separates patterns of information, for subsequent hippocampal processing. The transformation of CA3 place fields to downstream CA1 place fields is currently enigmatic since the connections are direct, yet there are clear differences in the properties of CA3 and CA1 place fields.

5.3.1. CA3 and CA1 place field contributions to the evaluation of context

Hippocampal-based context evaluations require representation of both expected and current context information. There is ample evidence that both CA1 and CA3 place fields represent both expected and current contextual information. However, recent data suggest that the contributions made by CA3 and CA1 place cells differ. When rats perform at asymptotic levels on hippocampal-dependent spatial memory tasks, CA3 place fields are smaller than CA1 place fields, and more easily disrupted following cue manipulations (Barnes et al., 1990; Guzowski et al., 2004; Mizumori, 2006; Mizumori et al., 1989b, 1999). CA3 place fields are generally more labile than CA1 place fields in that they are also more easily disrupted following reversible inactivation of the medial septum (Mizumori et al., 1989a). The greater sensitivity of CA3 fields to changed inputs seems to occur regardless of the type of task being used (Lee et al., 2004; Leutgeb et al., 2004). This may indicate that CA3 place fields are more exclusively linked to the currently active spatial coordinate system (i.e., a map; Leutgeb et al., 2007) compared to CA1 place fields. As such, CA3 is better suited than CA1 to distinguish the contextual significance of absolute locations in space, a process that presumably relies on […] themselves. The recurrent networks of the CA3 region may support the short-term buffer that is postulated to be needed to determine whether specific features of the current context match expected contextual features (e.g., Gold and Kesner, 2005; Guzowski et al., 2004; Treves, 2004).

CA1 also seems to represent current and expected contextual information but, relative to CA3, a greater proportion of cells show persistent place fields despite changes in a familiar context (e.g., Lee et al., 2004; Leutgeb et al., 2004; Mizumori et al., 1989b, 1999). CA1 place fields also show more discordant responses to context change than CA3 (Lee et al., 2004), and this may reflect the fact that CA3 is driven in large part by recurrent collaterals while CA1 is not. Further, as noted above, CA3 may be more strongly tied to a spatial coordinate system than CA1, and perhaps this accounts for the common findings that CA3 place fields tend to be smaller in size relative to CA1 place fields, and that more CA1 than CA3 place cells show 'split fields', i.e., more than one location that elicits elevated firing. All of the above differences suggest that CA1 place fields do not convey as precise location or sensory information as CA3 place fields, and consequently they may include more nonspatial information within their neural code (Mizumori et al., 2000; Wiener et al., 1989). Furthermore, Henriksen and colleagues (2010) suggest that the extent to which CA1 conveys spatial and nonspatial information varies depending on the location of the CA1 place cell being recorded: distal (closest to subiculum) CA1 neurons show stronger spatial codes than proximal CA1 place neurons.

A difference in the ratio of spatial to nonspatial information coded by CA3 and CA1 place fields may be accounted for by their different afferent patterns of input. For example, nonspatial context-defining information may arrive directly in CA1 via layer III entorhinal input. By comparison, CA3 receives its direct entorhinal cortex input from layer II (Witter et al., 2000), which seems to contain more neural codes for explicit spatial features than layer III. If some of the nonspatial input to CA1 includes memory-defined expectations, then this may account for a greater proportion of CA1 place fields showing stability across minor shifts in context.

If CA3 is primarily responsible for the comparison of contextual information, then what function does CA1 serve? Many have suggested that CA1 is especially important for temporally organizing or sequencing information (e.g., Gilbert et al., 2001; Hampson et al., 1993; Hoge and Kesner, 2007; Kesner et al., 2004; Olton et al., 1979; Rawlins, 1985; Treves, 2004; Wiener et al., 1995). That is, CA1 place cells may temporally organize, or define, CA3 output such that meaningful epochs of related information are
small differences in input configurations at different locations. This passed on to efferent targets, such as the prefrontal cortex (Jay
function is likely related to the key role that CA3 plays in the rapid et al., 1989) and subiculum, to impact future behavioral choices.
acquisition of new memories (Kesner, 2007; Miyashita et al., 2009), Neocortical-based memory representations may, via direct ento-
a conclusion that is consistent with a vast literature on the rhinal input to CA1 (Witter et al., 2000), predispose CA1 to
importance of hippocampus for new learning (Mizumori et al., temporally organize CA3-based information in experience-depen-
2007b). dent ways (Mizumori et al., 1999). Although the precise nature of
If CA3 is the brain area where context novelty is identified, then this temporal organization remains to be determined, CA1 appears
one would expect CA3 to also represent information that defines to be more tightly coupled than CA3 cells to the rhythmic
the baseline expectations from which novelty (i.e., unexpected oscillations of hippocampal EEG (Buzsaki, 2005; Buzsaki and
information) is determined. In this regard, it is worth noting that Chrobak, 2005).
despite the greater overall sensitivity of CA3 place fields to changes
in contextual information, a subpopulation of CA3 place fields 5.3.2. Temporal encoding of spatial contextual information
continue to persist when faced with contextual changes in familiar It is becoming clearer that important context information is
environments (Mizumori et al., 1999). Novelty detection requires a embedded within the temporal organization of intrahippocampal
mechanism by which baseline and new information can be held networks. Many years ago, it was shown that movement through
briefly on-line so that the expected and current information can be place fields is associated with dynamic changes in spike timing
compared. The intrinsic circuitry of CA3 is one that can hold relative to the ongoing theta oscillations in the EEG (O’Keefe and
information on-line: less than one-third of its inputs come from Recce, 1993). That is, on a single pass through a field, the first spike
outside of CA3 (Amaral and Lavenex, 2006), and the most of successive bursts of spikes occurs at progressively earlier phases
prominent input to CA3 pyramidal cells come from the CA3 cells of the theta cycle. The discovery of this so-called ‘phase precession’
effect is considered significant because it was the first clear evidence that place cells are part of a temporal code that could contribute to the mnemonic processes of the hippocampus. Changes in this sort of temporally organized spiking may be a key mechanism by which place fields provide a link between the temporally extended behaviors of an animal and the comparatively rapid synaptic plasticity mechanisms that are thought to subserve learning (e.g., Skaggs et al., 1996). Theoretical models have been generated to explain in more detail how phase precession could explain the link between predictive and sequence behaviors, and neural plasticity mechanisms (Buzsaki, 2005; Buzsaki and Chrobak, 2005; Jensen and Lisman, 1996; Lisman and Redish, 2009; Zugaro et al., 2005).

Another form of temporal-based neuroplasticity involves a change in the timing of spike discharge by one cell relative to those of other cells. For example, theta recorded from CA1 and CA3 tends to be more cohesive when rats pass through the stem region of a T-maze, presumably reflecting greater synchrony of neural firing during times when decisions are made (Montgomery et al., 2009). Greater synchronization could offer a stronger output signal to efferent structures. Experience-dependent temporal codes may also be found in terms of the temporal relationships between the firing of cells with adjacent place fields. With continued exposure to a new environment, place fields begin to expand asymmetrically in that the peak firing rate is achieved with shorter latency upon entrance into the field (Mehta et al., 1997, 2000). It was postulated that repeated activation of a particular sequence of place cells results in stronger synaptic connections between cells with adjacent fields. Under these conditions, entry into one place field begins to activate the cell with the adjacent place field at shorter and shorter latency. The asymmetric backwards expansion of place fields is thought to provide a neural mechanism for learning directional sequences. Moreover, it has been suggested that the backward expansion phenomenon may contribute to the transformation of a rate code to a temporal code such as that illustrated in phase precession (Mehta et al., 2000). The backward expansion mechanism could also help to explain other place field phenomena, such as the tendency for place cells to fire in anticipation of entering a field within a familiar environment (Muller and Kubie, 1989). While the dynamic changes in place field shape are intriguing, it remains to be determined whether the asymmetric expansion is directly related to spatial learning. Also, there is an intriguing possibility that dopamine may play a key role in coordinating some aspect of the temporal phenomena observed in hippocampus. For example, it has been shown that the temporal coherence of the discharges of place cells is greater in mice with an intact hippocampus compared to mice with deficient NMDA systems (McHugh et al., 1996), and there is evidence that dopamine may exert powerful influences in hippocampus via control of NMDA receptor function (e.g., Bethus et al., 2010; Frey et al., 1990). Therefore, it is possible that even though the relative quantity of dopamine innervation in hippocampus is small (Fields et al., 2007), dopamine may have a critical orchestrating role in a hippocampal determination of context salience.

5.3.3. Sources of hippocampal spatial and nonspatial information

Consideration of the sources of the different types of information that enter into hippocampal context-related computations provides keen insight into the stages of processing required to make efficient, context-relevant choices. The parahippocampal region (which includes perirhinal, postrhinal, and entorhinal cortices; see Fig. 4) is considered to provide the bulk of the spatial and nonspatial sensory information to the hippocampus (Burwell, 2000; Burwell and Amaral, 1998a,b; Eichenbaum and Lipton, 2008; Hunsaker et al., 2007; Knierim et al., 2006; Witter et al., 2000). Generally, spatial information is thought to arrive in the hippocampus via the medial regions of the parahippocampal cortex (i.e., postrhinal cortex and the MEC) since a prominent input to postrhinal cortex is the posterior parietal cortex (Burwell and Amaral, 1998a,b). In contrast, the multimodal temporal cortex of the rat projects nonspatial information to the hippocampus via the lateral parahippocampal regions (i.e., perirhinal cortex and LEC). Both MEC and LEC afferents appear to relay visual, auditory, olfactory and/or tactile sensory information (Burwell and Amaral, 1998a). Thus, the nature of information transmitted within a pathway or brain structure does not reveal how that information is used. [This broad conclusion will be seen to be relevant when the mesoaccumbens system is discussed below.] Also, although the MEC is often considered to be specialized to process spatial information, accurate navigation likely relies on integrated input from both MEC and LEC since one needs to understand the spatial dimensions of behavior (e.g., location and orientation) relative to salient environmental information. Indeed, contralateral, but not ipsilateral, lesions of the perirhinal cortex and the hippocampus result in impaired object–place association learning (Jo and Lee, 2010).

The recent development of more specific theories of parahippocampal cortex function during active navigation is mainly due to the discovery of multiple types of spatial representation in the MEC (Enomoto and Floresco, 2009; Hafting et al., 2005; Sargolini et al., 2006; Taha et al., 2007), including grid cells and head direction cells (see Fig. 5). Like place cells, grid cells fire when animals traverse specific locations within an environment. However, unlike place cells, grid cells fire relative to a number of small regions arranged in a hexagonal grid rather than in a single region of a given environment. Head direction cells, on the other hand, show elevated firing rates that coincide with a particular head orientation of the rat regardless of the rat’s location. A third population of cells shows both grid and head direction properties, and these are therefore called conjunctive cells. Finally, a fourth class of spatial cell, the border cell, is also found in the medial entorhinal cortex. Head direction cells and border cells are known to also exist in related cortical regions, such as the subiculum, postsubiculum, parasubiculum, and postrhinal cortices (Lever et al., 2009; Taube et al., 1990). There are strong anatomical and functional ties between cells associated with these types of spatial representation, and they are thought to form a coordinated network for orienting an animal in allocentric space.

There are a number of excellent reviews that detail grid field properties (Burgess et al., 2007; Derdikman and Moser, 2010; Moser et al., 2008; Savelli and Knierim, 2010). Briefly, MEC layer II has the highest proportion of grid cells (50%); layer III has a more diverse blend of grid cells, head direction cells and conjunctive cells; and head direction cells are the predominant cell type in the deep layers. Nearby grid cells tend to have similar spacing, but their peaks are offset relative to each other. The spacing seems to reflect spatial features of the current environment since, in familiar environments, grid fields will rotate in the direction of cue rotations, and if a familiar environment is widened or narrowed, grid field spacing will resize accordingly (Barry et al., 2007). Across the dorsal–ventral axis, there seems to be a topographically organized increase in the spacing of adjacent grid fields (Enomoto and Floresco, 2009; Hafting et al., 2005). If experimental procedures induce grid field reorganization, different grid fields rotate and translate together. Such cohesion between grid cells, along with the regularity of the grids and their apparently consistent spacing, gives the impression that the grid system is stable across environments and that it might form a blueprint (i.e., a spatial reference frame) onto which the hippocampus can add relevant information. Presumably, these sorts of spatial and nonspatial associations in hippocampus derive from convergent input from the MEC and LEC. This associative process must occur fairly rapidly
since hippocampal place fields are observed upon first exposure to a new environment (e.g., Hill, 1978; Muller and Kubie, 1987; O’Keefe and Burgess, 1996; Wilson and McNaughton, 1993). The apparent regularity of the spatial representations within the hippocampal and entorhinal system has been further strengthened by findings that grid fields, head direction preferences, and place fields show a high degree of coherence (e.g., displacement) in response to changes in simple geometric environments (Hargreaves et al., 2007; Lee and Knierim, 2007; Nicola et al., 1996).

Additional studies, however, suggest that a straightforward description of the relationship between grid and place fields is not likely. Place fields in CA1 continue to reorganize in response to changes in the visuo-spatial environment for periods of time that exceed the period of grid field responses (Van Cauter et al., 2008). Also, place fields have been observed to become more specific after repeated exposure to a familiar environment (Nicola and Malenka, 1998), even after entorhinal cortex lesions. Further, as behavioral tasks have become more complex, so has the nature of grid field responses. Importantly, the hexagonal grid patterns do not appear to persist in more complex environments. When an animal is running along a linear track, the grid patterns reset when rats turn around (Fyhn et al., 2007), and if a maze contains multiple hairpin turns, the resetting occurs periodically (Hikosaka et al., 2008). Finally, when using a linear track that is 18 m long, periodicity is limited to sections of the track (Nicola and Malenka, 1998). These observations imply that the ‘gridness’ of each cell is subject to being organized by ongoing behavior, perhaps separately from place field reorganization. The extent to which other features of a context (e.g., motivation, memory, etc.) similarly impact all spatial representations remains to be determined.

One issue of importance is the assumption that place and grid field reliability and spatial specificity are necessary for optimal decision-making during navigation. For place fields, this issue has been addressed in a number of ways (for review see Mizumori et al., 2007b), including demonstrations that physiological conditions that are associated with normal learning and decisions (e.g., synaptic plasticity mechanisms, sensory and motor processing systems, motivational systems, and so on) are also associated with greater place field stability. Although a systematic and direct test of this relationship has yet to be carried out, it is worth noting that it may be difficult to observe a clear and strong correlation between (at least) CA1 place field stability and choice accuracy since the recorded CA1 population tends to exhibit a heterogeneous collection of neural responses (e.g., within a single recording session, there are individual cell differences in place field responses to context changes). Indeed, laboratories have reported a lack of correlation between CA1 place field reorganization and behavior (e.g., Cooper and Mizumori, 2001; Jeffery et al., 2003). Most of the place field data in the literature are based on recordings from CA1 neurons. Therefore, the relationship between CA3 place field properties and optimal decisions remains to be determined. The same is true for grid cells: the results of direct tests of the relevance of grid fields for accurate decisions are not yet known.

The discussion so far presents the view that the hippocampus functions to detect differences between contexts, or to detect when a context changes. A basic algorithm that compares an animal’s expectations of a familiar contextual environment (i.e., the spatial layout of external sensory cues, the relevant behaviors to obtain rewards, the location of goals, and the consequences of specific choices) with actual experiences can be used to discriminate contexts, detect changes in a familiar context, or identify novel situations. All of these operations have in common the need to determine the saliency of the current context. There is currently only a rudimentary understanding of how the various neural representations of the spatial context by hippocampal neurons (e.g., place and grid fields) may contribute to the determination of context saliency, but there is abundant evidence to support the claim that this is a key function of the hippocampus.

5.3.4. Determining context saliency as a part of learning

As one learns the significance of a new environment, one’s perception of the relationship between environmental stimuli, responses, and consequences is continually updated. Presumably, mismatches between updated expectations and experiences with the new context are frequently detected, resulting in the continual shaping of long-term memory representations (McClelland et al., 1995). As memory representations become more precise, so too will the feedback to hippocampal cells regarding the expected contextual features. Thus, it is predicted that place fields should become more specific and reliable with continued training as one gradually learns about associations relevant to the test environment. In support of this prediction, many studies have shown that place fields become more specific and/or reliable with short-term exposure to novel environments (e.g., Frank et al., 2004; Hetherington and Shapiro, 1997; Kentros et al., 1998; Markus et al., 1995; Muller and Kubie, 1987; O’Keefe and Burgess, 1996; Wilson and McNaughton, 1993). More spatially selective firing (or reduced ‘overdispersion’) has also been reported to reflect goal-directed learning (e.g., Fenton and Muller, 1998; Mizumori et al., 1996; Kentros et al., 1998; O’Keefe and Speakman, 1987; Rosenzweig et al., 2003).

Learning can be considered to be complete when mismatches no longer occur and consistent memory representations are maintained during behavior (Mizumori, 2008). Indeed, after learning, place fields are remarkably stable across repeated exposures to the same, familiar context, and this presumably reflects stable input from memory representations. If more than one context is learned simultaneously, a given population of place cells should show context-specific patterns of place fields, and each pattern should be reliable for that context (Smith and Mizumori, 2006a,b). Presumably, such stable hippocampal patterns are in some way driven by established neocortical networks, or schemas (Tse et al., 2007). To ensure adaptive behavior, however, the hippocampus must constantly engage in context comparisons in the event that the familiar context is altered. Similarly, the hippocampus should process contextual information even for tasks that do not explicitly require contextual knowledge, in case contextual information becomes relevant. Place cell studies indeed show that specific neural codes in the hippocampus remain responsive to changes in context even though contextual learning is not necessary to solve a task (Yeshenko et al., 2004). Thus, processing of contextual information by the hippocampus appears to be automatic and continuous (Morris and Frey, 1997). A different but related theory is that the hippocampus uses context information to recall specific context-relevant memories (Fuhs and Touretzky, 2007; Redish, 1999; Redish et al., 2001).

If the hippocampus continually processes contextual information, then why do hippocampal lesions disrupt only certain forms of learning and not others? If one assumes that lesion effects are observed only when the intrinsic processing by the structure of interest is unique and essential for learning to take place, then no behavioral impairment should be observed if other neural circuits can compensate for the lesion-induced change in function. Indeed, there is abundant evidence that under most conditions, stimulus–response learning is not impaired following hippocampal lesions, since striatal computations are sufficient to support such learning (e.g., McDonald and White, 1993; Packard et al., 1989; Packard and McGaugh, 1996). This does not mean that the hippocampus does not normally play a role in stimulus–response performance, but rather, that the hippocampus may contribute by defining the context for the learning, which in turn may allow the learned information to be more adaptive in new situations in the future.
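The basic comparison algorithm described above — matching current experience against stored expectations for familiar contexts in order to discriminate contexts, detect change, or flag novelty — can be sketched in a few lines. This is an illustrative sketch only, not a model from the paper: the function names, the numerical feature coding, and the mismatch threshold are all hypothetical choices made for the example.

```python
# Illustrative sketch (hypothetical names and values): comparing an
# animal's stored context expectations against current experience.

def mismatch(expected, observed):
    """Mean absolute difference between expected and observed context features."""
    return sum(abs(e - o) for e, o in zip(expected, observed)) / len(expected)

def evaluate_context(memory, observed, threshold=0.2):
    """Return (best_matching_context, 'match' | 'changed' | 'novel').

    `memory` maps context names to expected feature vectors (cue layout,
    goal locations, action-outcome expectancies, etc., coded numerically).
    """
    scored = {name: mismatch(expected, observed)
              for name, expected in memory.items()}
    best = min(scored, key=scored.get)
    if scored[best] == 0.0:
        return best, "match"        # experience equals expectation
    if scored[best] <= threshold:
        return best, "changed"      # familiar context with altered features
    return None, "novel"            # no stored context is close enough

memory = {"maze_A": [1.0, 0.0, 1.0], "maze_B": [0.0, 1.0, 0.0]}
print(evaluate_context(memory, [1.0, 0.0, 1.0]))   # ('maze_A', 'match')
print(evaluate_context(memory, [1.0, 0.3, 1.0]))   # ('maze_A', 'changed')
print(evaluate_context(memory, [0.5, 0.5, 0.9]))   # (None, 'novel')
```

The three outcomes correspond to the three operations named in the text: a perfect match discriminates a known context, a small mismatch detects change within a familiar context, and a large mismatch identifies a novel situation.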
[Fig. 6 diagram omitted: schematic circuit linking the hippocampus (CA1/subiculum), prefrontal cortex, ventral striatum, ventral pallidum, ventral tegmental area, pedunculopontine tegmental nucleus, lateral dorsal tegmentum, lateral habenula, lateral hypothalamus and more, with GLU, GABA, DA and ACh projections.]

Fig. 6. An essential neural circuit that links hippocampal (spatial context) information with reinforcement learning and decision making systems of the brain. Direct hippocampal output arrives in the reinforcement learning system via the CA1 and subicular projections to the ventral striatum (i.e., the nucleus accumbens). The ventral striatum is thought to serve as the ‘critic’ in the actor–critic model of reinforcement learning. As such, the ventral striatum determines whether the outcomes of behavior are as predicted based on an animal’s expectations for a given context. If the outcome is as expected, the ventral striatum continues exerting inhibitory control over VTA neurons. In this situation, encounters with rewards do not result in dopamine cell firing. If the saliency of a context changes (as determined by hippocampal processing), signals to the ventral striatum may preferentially excite VTA neurons via an indirect pathway that includes the ventral pallidum and the pedunculopontine nucleus. The result of this elevated excitation may be a depolarization of VTA neurons such that they are more likely to fire when subsequent reward information arrives in VTA.

5.4. Relationship between hippocampal context codes and reinforcement-based learning

Hippocampal efferent systems can use the result of the hippocampal context analysis to update their neural response profiles such that subsequent behavioral choices are optimized. The midbrain and striatal reinforcement learning systems are a major target of hippocampal output (see Fig. 6). Therefore, it is often assumed that the hippocampus provides the necessary context information that guides dopamine-related reward or behavioral responses. The outcomes of behavioral choices are evaluated by the reinforcement learning system, and the result of such an evaluation is thought to feed back to memory systems and the hippocampus to update future context-based expectations. To begin to discuss how a hippocampal evaluation of context saliency impacts reinforcement learning systems of the brain, the following discusses (1) a neuroanatomical network that supports a functional link between hippocampal place fields and reinforcement learning systems, (2) evidence for a role for dopamine in hippocampal-dependent learning and plasticity, and (3) the possible impact of hippocampal context processing on dopamine cell responses to reward.

5.4.1. Functional connectivity between reinforcement and hippocampal systems

Direct dopaminergic innervation of the hippocampus arises from both the VTA and the substantia nigra pars compacta (SNc), although input from the VTA is more extensive (Gasbarri et al., 1994b). Dopaminergic projections occur across the entirety of the dorsal–ventral axis of the hippocampus, with the ventral axis being more heavily innervated. The innervation is also differentially distributed across the subiculum, CA1, CA3, and the dentate gyrus, with CA1 and the subiculum receiving more innervation relative to CA3 and the dentate gyrus (Gasbarri et al., 1994a,b, 1997). Compared to other efferent structures of the dopaminergic system such as the nucleus accumbens, the hippocampus receives a relatively small proportion of input from the VTA; 10% or less of the cytochemically identified dopamine neurons project to the hippocampus, whereas 80% of that population projects to the nucleus accumbens (Fields et al., 2007).

Although the hippocampus receives modest dopaminergic innervation from the VTA, it is one of the few brain regions that express all five dopamine receptor subtypes. The dentate gyrus and subiculum show high levels of the D1 receptor subtype, and the D1-like D5 receptors are expressed throughout the hippocampus. D2 receptor binding sites are most prominent in dorsal CA1 and the subiculum, while the levels of D3 receptors are low throughout. Finally, D4 receptors are found in the dentate gyrus, CA1, and CA3. The dopaminergic innervation of the structure, along with the expression of all five receptor subtypes, allows dopamine to have a powerful influence on the function of the hippocampus, impacting information processing and plasticity (Frey et al., 1990; Huang and Kandel, 1995; Li et al., 2003; Otmakhova and Lisman, 1998).

The path from hippocampus to the midbrain dopaminergic system is indirect and varied (see Fig. 6). The most direct path from the hippocampus involves transmission from both dorsal and ventral subiculum, and to a lesser extent CA1, via the fimbria-fornix (Boeijinga et al., 1993; Lopes da Silva et al., 1984; Groenewegen et al., 1999a, 1987; McGeorge and Faull, 1989; Mulder et al., 1998; Swanson and Cowan, 1977; Totterdell and Meredith, 1997; van Groen and Wyss, 1990). More specifically, the dorsal subiculum (and CA1) project primarily to the rostro-lateral shell region of the nucleus accumbens, while the ventral subiculum (and CA1) selectively terminate throughout the rostral–caudal extent of the accumbens shell. Entorhinal cortex also provides extensive input to the nucleus accumbens, with the MEC preferentially innervating the rostro-medial shell and core divisions of the accumbens, and the LEC terminating throughout the rostral–caudal extent of the lateral shell and core regions (Totterdell and Meredith, 1997). It should be noted that the limbic input to the ventral striatum (including the nucleus accumbens) is one of a number of convergent inputs to individual ventral striatal neurons (e.g., Floresco et al., 2001; French and Totterdell, 2002; Goto and O’Donnell, 2002; O’Donnell and Grace, 1995). Other sources of afferents include the prelimbic/infralimbic and orbital frontal cortices, as well as the basolateral amygdala. Thus, the ventral striatum has long been considered a central point of integration of information needed for adaptive behaviors (Mogenson et al., 1980).

It is through the ventral striatum that the hippocampus may ultimately impact dopamine cell firing, since the ventral striatum in turn innervates the VTA and SNc. Moreover, both the core and shell components of the nucleus accumbens have some degree of control over the dopamine cells that in turn project to them. The details of the circuitry are complex (for a recent excellent summary, see Humphries and Prescott, 2010), but of direct relevance here is that the lateral and medial shell innervate, either via direct or indirect routes, the lateral or ventral sectors of the VTA, respectively (Ikemoto, 2007; Zhou et al., 2003). This pattern matches the topography of VTA connections back to the shell region. Also of note is the fact that both GABA and dopamine neurons participate in this reciprocal interaction between the VTA and ventral striatum (Carr and Sesack, 2000; Nair-Roberts et al., 2008). This is an important point to note since studies of VTA single unit representations during hippocampal-based memory performance suggest that it is likely that both dopaminergic and GABAergic populations contribute to reward processing (Martig and
Mizumori, 2011; Puryear et al., 2010). Core regions of the accumbens project to a slightly different population of dopaminergic neurons, those in the SNc and in the lateral regions of the VTA (Berendse et al., 1992a,b; Usuda et al., 1998; Zhou et al., 2003). These dopaminergic regions seem to project back to the same core areas that project to them (Joel and Weiner, 1994). For both shell and core regions, their impact on the VTA and SNc is presumed to be inhibitory since the accumbens projection cells are GABAergic. Thus, one possibility is that excitatory (glutamatergic) messages from the hippocampus add to the inhibitory control over dopaminergic neurons. Currently it is not possible to state how much control the hippocampus exerts over dopamine neurons since we do not yet fully understand the significance and mechanism of the convergence in the ventral striatum of hippocampal, frontal and amygdala information. Nevertheless, this is likely an important pathway by which hippocampal systems and the midbrain motivational circuitry interact.

In addition to the hippocampal–accumbens–VTA/SNc pathway, there are a number of sources of excitatory and inhibitory control over dopamine cell firing (see Fig. 6), and details of these connections remain to be worked out. Four of the most studied dopamine afferent systems include the frontal cortex and the amygdala (Lodge and Grace, 2006; Woolf, 1991), as well as the pedunculopontine nucleus (PPTg) and the lateral dorsal tegmental nucleus. As an example of the complex nature of each afferent input, the PPTg provides cholinergic (Woolf, 1991) and glutamatergic input to the VTA and SNc (Beninato and Spencer, 1987; Futami et al., 1995; Sesack et al., 2003), and this input is topographical in nature. The PPTg is characterized by an uneven distribution of distinct populations of cholinergic, glutamatergic, and GABAergic cells (Wang and Morales, 2009), with differential input and output projections of its anterior and posterior subdivisions (Alderson et al., 2008). Cholinergic cells are concentrated in posterior PPTg (Wilson et al., 2009) and project mostly to the VTA, while anterior PPTg contains proportionately more GABAergic cells that project to the SNc (Oakman et al., 1995). It has been argued that the PPTg regulates the transition to burst firing by dopamine cells (Grace et al., 2007), but precisely how this happens remains under investigation. Thus, the ventral striatum may ultimately be in a position to orchestrate the balance between inhibitory and excitatory control over dopamine cell firing depending on the determination of the saliency of the current context by the hippocampus.

5.4.2. A role for dopamine in hippocampal-dependent learning and plasticity

The hippocampus likely plays a role in detecting changes in familiar contexts, and in generating novelty-related signals that initiate relevant investigatory behaviors for both spatial and nonspatial tasks. Interestingly, the dopamine system is also known for its association with novelty detection (Horvitz et al., 1997; Ljungberg et al., 1992; Redish et al., 2007; Seamans and Yang, 2004), a response that is perhaps triggered following hippocampal identification of novelty. Further, exposure to novel environments enhances synaptic plasticity mechanisms in hippocampus, and this enhancement appears related to D1 receptor activation (Li et al., 2003). Thus, it has been postulated that a functional loop between the VTA and the hippocampus allows novelty signals from the hippocampus to be relayed to the VTA to generate responses to novelty by dopaminergic neurons (Lisman and Grace, 2005; Mizumori et al., 2004). The latter responses are then thought to be relayed back to the hippocampus to facilitate plasticity circuits and learning.

Most of the studies investigating possible dopaminergic effects on hippocampal function include the application of drugs directly to, or lesions of, the hippocampus. Recently, Martig et al. (2009) employed a different approach: reversibly inactivating the VTA of rats to temporarily reduce endogenous levels of dopamine within the hippocampus. Attempts were made to selectively silence VTA dopamine neurons by infusing baclofen (Xi and Stein, 1998), rather than more broadly inactivating the VTA with anesthetics such as lidocaine or tetracaine. VTA inactivation significantly impaired choice accuracy on a hippocampal-dependent spatial working memory task. However, the effect was time-dependent: greater impairment was observed after the initial days of infusion, suggesting some form of compensatory change in the neural circuitry connecting the hippocampus and the VTA. Further, VTA inactivation selectively impaired short-term working memory, a form of memory that is hypothesized to be important following a change in context. Importantly, the selective behavioral effects demonstrate that the hippocampal effects were not due to changes in behavioral control or motivation.

In a subsequent experiment, Martig and Mizumori (2011) recorded hippocampal place field responses to baclofen-induced inactivation of the VTA as rats performed a spatial working memory task on a radial arm maze. Based on the findings of Kentros et al. (2004), it was predicted that VTA inactivation would destabilize choice accuracy that is dependent on hippocampal function, as well as the stability of place fields. Also, given the differential distribution of VTA afferents to the hippocampal
plasticity subfields (CA1 > CA3), it was expected that CA1 place fields would
There is abundant evidence that the dopaminergic system plays be impacted more dramatically than CA3 place fields. Finally, given
an important role in hippocampal-dependent behavior and the transient behavioral effect that was observed by Martig et al.
plasticity. The hippocampal dopaminergic system has been (2009), the maze training procedures were modified to increase
manipulated in a number of ways, and the bulk of the evidence the likelihood that VTA was essential for good performance. That is,
shows that dopaminergic agonism and antagonism, respectively, rats learned to expect rewards of different magnitudes at specific
enhance and impair spatial learning. As examples, D1 receptor locations on the maze.
knock-out mice exhibit deficits in spatial learning (El-Ghundi et al., The results showed that VTA inactivation significantly, and
1999) and selective 6-OHDA lesions in hippocampus impaired more consistently, impaired choice accuracy than in Martig et al.
performance in the Morris swim task (Gasbarri et al., 1996). Direct (2009). This behavioral impairment occurred even though rats
hippocampal infusions of agents that disrupt D1–NMDA receptor retained their preference to visit maze locations that were
interactions also produce performance deficits in the working previously associated with large rewards. This result was
memory version of the Morris swim task (Nai et al., 2010). Selective surprising given that VTA neurons are known to preferentially
removal of hippocampal dopamine input via local 6-OHDA respond to larger rewards than small rewards (Puryear et al., 2010;
infusions into the subiculum and adjacent CA1 region of rats also Schultz et al., 1997). The authors interpreted this unexpected
impairs performance in the spatial version of the water maze result to indicate that VTA’s selective coding of large rewards is not
(Gasbarri et al., 1996). Manipulations of endogenous levels of necessary or sufficient to drive behavioral choices toward the large
dopamine in the hippocampus also negatively impact hippocam- rewards. Rather, the VTA neural codes may contribute to an
pal-dependent processing (e.g., Kentros et al., 2004; Martig et al., evaluation of the consequences of behaviors. Recorded hippocam-
2009; Wisman et al., 2008). Finally, dopamine agonist treatment in pal CA1 place cells showed less stable fields after VTA inactivation
the hippocampus can reverse age-related decreases in spatial relative to control conditions and relative to CA3 place cells. The
performance (Bach et al., 1999; Behr et al., 2000). differential response reveals that in a well learned task, CA3 place
fields alone are not sufficient to maintain high choice accuracy during navigation. This supports the view described above that a hippocampal evaluation of the expectations (and hence saliency) of a context requires a coordinated effort between CA1 and CA3.

In summary, there is substantial evidence of an important role for VTA dopamine cells in regulating hippocampal-dependent learning and context representation. The place field data show that hippocampal neurons rely on dopamine input for representing context-relevant information over time. These results are consistent with growing evidence that dopamine increases the stability of neural plasticity mechanisms in the hippocampus. Cellular mechanisms for this stabilization function are revealed in studies of dopamine effects on hippocampal synaptic plasticity. Dopamine appears to importantly regulate a leading model of learning-related synaptic plasticity, long-term potentiation (LTP). LTP is generally described as a persistent increase in synaptic efficiency (Martin et al., 2000), and it has been shown that its induction alters place fields (Dragoi et al., 2003). The duration of LTP varies depending upon the pattern of neural activation used for induction (Morris and Frey, 1997). D1 receptor activation appears critical for the maintenance of late-phase LTP (L-LTP) in CA1 (Frey et al., 1990, 1991; Huang and Kandel, 1995; Williams and Eskandar, 2006). Dopamine application is also capable of inducing LTP, referred to as early-phase LTP (E-LTP), in the dentate gyrus, following stimulation protocols which are normally insufficient to do so (Kusuki et al., 1997). Further, there is some indication that dopamine agonists alone may be sufficient to induce a slowly developing potentiation that is independent of any other external stimulation (Huang and Kandel, 1995; Williams et al., 2006; Williams and Eskandar, 2006). The general pattern, then, seems to be that dopamine elevates and/or maintains the synaptic excitability of hippocampal neurons. Enhancing the duration of strong neural signals may be an important way to increase the associative capacity of temporally discrete events, and this could in turn facilitate accurate determinations of context saliency.

A possible mechanism for dopamine's effects on hippocampal neurons was revealed by findings that dopamine agonist-induced L-LTP can be significantly attenuated by NMDA-receptor antagonism (Stramiello and Wagner, 2008), suggesting an important interaction between these neurotransmitter systems. There is additional evidence that the interaction between glutamatergic and dopaminergic systems modulates heterosynaptic LTP, whereby weak inputs become strongly potentiated (O'Carroll and Morris, 2004). Specifically, it is suggested that NMDA-receptor activation in the hippocampus may 'prime' synaptic markers that synergize with neuromodulatory signals, such as dopamine, to initiate increases in the mRNA and protein synthesis that is thought to be so important for L-LTP (Frey and Morris, 1997).

The electrical stimulation protocols used to induce LTP are unlikely to occur during natural learning scenarios. However, evidence indicates that lasting changes in synaptic plasticity in the hippocampus can result from exposure to different spatial contexts. Dopamine has been implicated in such context-induced changes in hippocampal synaptic plasticity. Pre-treatment with a D1/D5 receptor antagonist interferes with the LTP-inducing effects of spatial exploration (Lemon and Manahan-Vaughan, 2006; Li et al., 2003). The ability of dopamine to gate exploration-induced synaptic plasticity, then, may be reflected in changes in spatially selective neural activity. If dopamine enhances the duration of LTP, then dopamine may act to stabilize place field properties. This hypothesis was supported recently by Martig and Mizumori (2011), who found that temporarily removing dopamine input to place cells reduces place field stability.

Hippocampal output via the subiculum is also modulated by dopamine afferents. In one study, a low dose of dopamine was shown to reduce EPSPs in the subiculum (Behr et al., 2000). This result implies that excitatory inputs to the hippocampus must surpass the inhibitory influence of low levels of dopamine in the subiculum. However, when large quantities of dopamine are applied, there is a facilitation of long-lasting synaptic potentiation in the CA1 region (Huang and Kandel, 1995). Therefore, dopamine acts to dose-dependently gate excitatory drive by reducing the effectiveness of potentially irrelevant inputs. By determining the overall effectiveness of excitatory inputs within a structure, dopamine could be part of a mechanism that determines the likelihood that new or salient information is remembered.

5.4.3. Impact of hippocampal context processing on dopamine cell responses to reward

In contrast to the abundant evidence for a functional link from the dopaminergic system to the hippocampal system, converging evidence for a functional link in the other direction is only recently beginning to emerge. Nevertheless, existing theories argue that the VTA–hippocampal connection is important for several complex behaviors, such as reinforcement learning, spatial/contextual learning, and motivation (Fields et al., 2007; Lisman and Grace, 2005; Schultz, 2002; Wise, 2004). Central to these functions is the idea that dopamine may strengthen stimulus–reward associations (Schultz, 2002). Accordingly, dopamine neurons fire upon presentation of unexpected rewards and conditioned cues that predict reward, and they are inhibited when expected events do not occur (Schultz and Dickinson, 2000). These firing patterns may signal an error in the prediction of reward (Bayer and Glimcher, 2005; Hollerman and Schultz, 1998), and this in turn enables the use of flexible behaviors during learning (Schultz and Dickinson, 2000). The reward prediction error signal appears to take into account the behavioral context in which rewards are obtained (Nakahara et al., 2004; Roesch et al., 2007), context information that may derive from hippocampal input. If this is the case, it should be possible to record similar reward responses in freely behaving rats performing a hippocampal-dependent maze task. A recent study explicitly tested this idea.

Puryear et al. (2010) found that VTA dopamine neurons increased firing when rats encountered rewards in expected locations on a radial maze, and that the response was much larger following encounters with the larger rewards. This is analogous to dopamine responses reported from studies with primates (Schultz et al., 1997). Moreover, it appeared as if these cells fired in response to cues that predict reward, in that they exhibited elevated discharge coincident with an auditory stimulus that signified the beginning of a trial. Also, it was shown that changes in the visual aspects of the test environment resulted in significant alterations in the reward responsiveness of the dopamine neurons. Thus, again as shown in primate studies, the dopamine reward responses appear to be context-dependent. Of particular interest was whether rodent VTA neurons would show evidence for either positive or negative reward prediction signaling during navigation-based goal-directed behaviors. Indeed, it was found that VTA cells increased firing when a larger than expected reward was encountered, and reduced firing when an expected reward was not found. In addition to confirming that rodent dopamine cells code reward when spatial information is used to guide behaviors to locations that signify food, the use of a navigation-based task allowed Puryear et al. (2010) to examine the relationship between voluntary movement and reward codes. This was of interest given a vast clinical and research literature showing a critical role for the dopamine system in the voluntary initiation of behaviors. The firing rates of dopaminergic reward neurons were found to be correlated with velocity and/or acceleration as rats moved between food locations. However, in contrast to the reward responses, the movement correlates were not context-dependent,
suggesting that there are at least two independent sources that regulate dopamine cell firing during navigation.

A rather surprising result of the Puryear study was that dopamine neurons consistently responded to rewards even though the task was well learned. According to the now classic studies by Schultz and his colleagues (e.g., Schultz, 1998b, 2010; Schultz et al., 1997), dopamine cells cease firing to rewards and instead fire in response to the presentation of cues that predict rewards. Firing to cues was in fact observed in the Puryear study, but so was firing to the rewards. One possible explanation for the continued response to reward by dopamine neurons is that our working memory task generated a sufficient degree of uncertainty about choices that dopamine responses to rewards were retained (Fiorillo et al., 2003). Dopamine signals can be thought of as 'uncertainty signals' that reflect the strategy of continually updating action–outcome systems to optimize future behavioral choices. To test this hypothesis, Martig and Mizumori (2011) recorded VTA neurons as rats learned a spatial task that did not involve working memory. Rats learned to visit the same maze arm to obtain food reward. After rats learned the initial goal location over days, the same rats were trained to find food in a novel location; after rats learned the second location, a third novel location was introduced. The number of VTA cells showing reward responses declined as additional locations were learned. For comparison, SNc neurons were also recorded as rats performed the same task. In contrast to the VTA cells, SNc cells did not show a change in the number of reward cells with continued training. This differential response of VTA and SNc cells is potentially highly significant since it (1) suggests that dopamine signaling can have more than one function, and (2) stresses the importance in future studies of identifying the locations of the cells being recorded in any functional analysis of dopamine neurons. Evidently, context-dependent reward responses are more apparent for VTA than for SNc cells. This finding begs the question: what is the source of context information for VTA neurons?

The VTA may receive context-dependent information via an indirect pathway from the hippocampus that includes the ventral striatum, ventral pallidum, and the PPTg (Fig. 6). Recent work tested whether the latter pathway is an essential link that bridges hippocampal context processing and the VTA. It had been known that the PPTg contributes to the burst firing of dopamine cells (Oakman et al., 1995; Pan and Hyland, 2005), yet the significance of this influence is not clear. Consideration of sensory afferents to the PPTg (Redgrave et al., 1987; Reese et al., 1995), along with the established role of dopamine in reinforcement-based operant learning (Schultz, 1998b), suggests that the PPTg may facilitate the processing of (or attention to) learned conditioned stimuli via a sensory-gating mechanism (Kobayashi and Isa, 2002; Winn, 2006). Indeed, PPTg neurons exhibit phasic responses to auditory and visual sensory stimuli that predict reward, with a shorter latency (5–10 ms) than dopamine cells (Pan and Hyland, 2005). The PPTg may, however, serve a more complex function than to relay current sensory information, since context-dependent responses of PPTg neurons have been described in cats performing a motor conditioning task (Dormont et al., 1998). Thus, it was of interest to identify the nature of the information passed from the PPTg to dopamine cells during goal-directed navigation by investigating PPTg neural responses during performance of a task that is (a) known to rely on intact hippocampal processing, and (b) known to generate burst firing by VTA neurons in a context-dependent fashion (Puryear et al., 2010).

When PPTg cells were recorded from rats searching for food in known locations on a radial maze, 45% of recorded PPTg neurons were either excited or inhibited upon reward acquisition, and there was no evidence for prediction error signaling. Thus, the latter component of reward processing may arrive in the VTA via a route that does not involve the PPTg (such as the lateral habenula; Matsumoto and Hikosaka, 2007). A separate population of PPTg neurons exhibited firing rate correlations with the velocity of movement. There were also a small number of cells that encoded reward in conjunction with a specific type of egocentric movement (i.e., turning behavior). The context-dependency of PPTg reward responses was tested by observing the impact of changes in visuospatial and reward information. Visuospatial, but not reward, manipulations significantly altered PPTg reward-related activity. Movement-related responses, however, were not affected by either type of manipulation. These results suggest that PPTg neurons conjunctively encode both reward and behavioral response information, and that the reward information is processed in a context-dependent manner.

Upon closer examination of the PPTg data, it was found that excitatory reward responses predominated for anterior, not posterior, PPTg cells. Considering their different efferent targets (Puryear and Mizumori, 2008), it appears that there is increased synaptic drive to nigral cells from the anterior PPTg coincident with reward consumption in our task, and at the same time reduced synaptic drive to the VTA. This was unexpected since it has been shown that, under identical test conditions, both VTA and nigral cells increase burst firing relative to reward acquisition (Gill and Mizumori, 2007; Martig and Mizumori, 2011; Puryear et al., 2010). To account for this apparent discrepancy, it is suggested that during reward acquisition, the reduction of cholinergic input to the VTA from the posterior PPTg may reduce the excitatory drive to VTA GABA neurons. Since VTA GABA neurons normally provide inhibitory control over dopamine cells (Omelchenko and Sesack, 2009), their reduced activation 'permits' dopamine burst firing. Posterior PPTg responses to rewards tended to persist for the duration of reward consumption, whereas VTA cells show phasic high-frequency burst firing to rewards, and the duration of the VTA response is relatively short compared to the duration of reward consumption. Thus, while the posterior PPTg may initiate VTA dopaminergic reward responses, other intrinsic or extrinsic mechanisms regulate the duration of dopamine burst firing (perhaps the inhibitory input from the accumbens or pallidum; Zahm and Heimer, 1990; Zahm et al., 1996). Fig. 7 provides a schematic illustration of a comparison between VTA and PPTg neural responses to reward.

A salient feature of the dopamine cell response to reward is the brief change in firing rate when rats encounter unexpectedly large or small rewards. Such a prediction error signal was not observed for PPTg neurons, suggesting that it is either computed locally within VTA circuitry or received from an afferent structure. Matsumoto and Hikosaka (2007) provide convincing evidence that the lateral habenula is at least a critical player in generating a prediction error signal for dopamine cells, since its neurons also show altered firing rates in response to a change in the expected amount of reward. The direction of the change, however, is the opposite of that of dopamine cells: they increase firing when animals encounter less reward than expected, and they show reduced firing after encounters with unexpectedly large rewards. This pattern is consistent with the finding that lateral habenula activation normally inhibits the activity of VTA and SNc dopamine neurons (Christoph et al., 1986; Herkenham and Nauta, 1979). Additionally, Puryear and Mizumori (2008) found prediction error codes in cells of the medial reticular nucleus (Swanson, 2003), which is known to provide glutamatergic input to the VTA (Geisler et al., 2007). The reticular formation is thought to be important for modulating the arousal and vigilance levels necessary for attending to and acting upon salient stimuli (Mesulam, 1981; Pragay et al., 1978). Thus, it seems reasonable that multiple areas modulate the activity of VTA dopamine neurons when the outcome of behavior does not meet expectations.
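In reinforcement learning terms, the positive and negative reward responses discussed in this section are commonly summarized as a temporal-difference (TD) prediction error, delta = r + gamma*V(next state) - V(current state). The following is a minimal sketch of this sign logic only; the function name and all numerical values are illustrative assumptions, not data from the studies discussed above:

```python
# Minimal temporal-difference (TD) sketch of a reward prediction error:
# delta = r + gamma * V(s_next) - V(s).
# All values are illustrative, not data from the experiments discussed.

def td_error(reward, v_current, v_next, gamma=0.9):
    """Positive delta ~ burst firing; negative delta ~ pause in firing."""
    return reward + gamma * v_next - v_current

# A maze arm whose learned value predicts a small reward (V = 1.0),
# followed by a terminal state with value 0:
v_arm, v_terminal = 1.0, 0.0

print(td_error(3.0, v_arm, v_terminal))  # unexpectedly large reward -> 2.0
print(td_error(1.0, v_arm, v_terminal))  # reward exactly as predicted -> 0.0
print(td_error(0.0, v_arm, v_terminal))  # expected reward omitted -> -1.0
```

A positive error corresponds to the burst firing seen for unexpectedly large rewards, zero to a fully predicted reward, and a negative error to the firing pause at the time of an omitted reward; the lateral habenula responses described above follow the opposite sign.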
Fig. 7. Reward-related neural discharge has now been shown to exist in multiple brain structures throughout the midbrain and forebrain areas. Left: Responses of a midbrain
(VTA) dopamine cell to rewards of large and small magnitude. The top two rows illustrate responses when a large or small reward is unexpectedly presented to an animal: the
top row shows a schematized response that illustrates a greater dopamine cell response to large rewards. The example of a response by a single dopamine neuron in the
second row confirms the schematic on the top row. The third row illustrates that after a stimulus has been associated with reward, the stimulus itself, and not the reward,
elicits dopamine cell discharge. In this case the subject expects to receive reward following presentation of the stimulus. The bottom row illustrates dopamine cell responses
when a reward is omitted after the associated stimulus is presented. It can be seen that dopamine cells increase firing after stimulus presentation, but the same cell shows
reduced firing at the time when the rat expected to receive reward. This inhibited response is referred to as an inhibitory (or negative) reward prediction error that signals
efferent structures that an expected reward was not found. Right: For comparison with dopamine cell responses, schematized and exemplar responses are shown for cells
recorded in the pedunculopontine nucleus (PPTg), a structure that is thought to regulate burst firing by dopamine cells. Like dopamine cells, PPTg cells not only
respond to encounters with unexpected reward, but they also do so differentially. However, in contrast to dopamine cells, the PPTg responses differentiate reward
magnitudes in terms of the duration of response, and not magnitude of response. This pattern suggests that PPTg cells signal the presence of reward. If stimuli are associated
with subsequent reward encounters, PPTg cells show responses to cues that predict rewards (and not to stimuli that do not predict rewards). Unlike dopamine cells, PPTg cells
continue to respond to reward presentations even after the presentation of a conditioned stimulus. The last row shows that, again unlike the response of dopamine neurons,
PPTg cells show no evidence of prediction error signaling.
To summarize, the hippocampus may provide a fundamental analysis of the current context that allows subsequent decisions to be made based on the most recent determination of context saliency. Via direct projections to the ventral striatal–VTA system, the hippocampus may signal the dopaminergic component of the reinforcement learning system when there are violations of one's expectations for a given context. This 'alerting' signal may lower the threshold for dopamine cell firing to reward so that the 'teaching signal' can be distributed to update memory and behavioral systems. The following section will describe current ideas about the impact of dopamine signals on the ventral and dorsal striatum, focusing on the role of dopamine in decision making and behavioral control during navigation.

6. The neurobiology of reinforcement learning and goal-directed navigation: striatal contributions

Decision making or action selection processes have been attributed to the striatum, which acts as a dynamic controller of behavior, integrating sensory, contextual, and motivational information from a wide network of cortical and subcortical structures. This function can be accomplished through the use of reinforcement learning algorithms that compare the expected success of a learned behavior with the actual success experienced by the organism. In reinforcement learning models, the actor and critic use these predictions to implement successful action–outcome policies (Khamassi et al., 2005). The actor–critic distinction reflects a classic distinction in the psychological literature, that between Pavlovian learning (stimulus–outcome relationships) and instrumental learning (action–outcome learning). While these aspects of learning are often studied under restrictive conditions designed to assess particular features of each type of learning, in fact these forms of learning can be represented on a kind of continuum. Pavlovian learning mechanisms underlie the ability of an organism to learn that neutral stimuli can be predictive of rewards and goals and can eventually facilitate instrumental learning (i.e., Pavlovian-instrumental transfer), and instrumental learning can progress from goal-directed behavior to habitual action–outcome associations once a behavior has been well learned. Within the reinforcement learning literature, these different modes of learning are described by 'model-free' algorithms that attempt to explain stimulus–response behavior, and 'model-based' algorithms that describe how learning about the environment allows an organism to consider impending actions or formulate new actions within the current context. Until very recently, it was thought that the dorsal striatum worked as the actor in a model-free system, and the ventral striatum functioned as the critic in a model-based system (Atallah et al., 2007; Johnson et al., 2007; van der Meer and Redish, 2011). A wealth of recent data, however, suggests a more fine-tuned delineation of function across the dorsal–ventral striatum. Along with a refinement of the functional anatomy of the striatum, it is also clear that reinforcement learning algorithms themselves may need to be reconsidered if they are to successfully model learning in complex environments.

6.1. Striatal based navigational circuitry

Like the hippocampus, the striatum is composed of several functionally and anatomically distinct subregions. All cortical areas project to the striatum (Berendse et al., 1992a,b; McGeorge and Faull, 1987, 1989; Parent, 1990), and the distribution of these projections helps to define three main subdivisions of the striatum: the ventral striatum (often synonymous with the nucleus
Fig. 8. Striatal–cortical information processing loops. (A) The ‘limbic loop’ connects the orbital and ventromedial prefrontal cortex with the nucleus accumbens. Input from
these cortical regions is excitatory. The accumbens sends inhibitory projections to the ventral pallidum, which innervates the mediodorsal and other thalamic divisions. (B)
An ‘associative loop’ connects the prefrontal and parietal association cortices with the dorsomedial striatum. The dorsomedial striatum sends inhibitory projections to the
associative pallidum which innervates the mediodorsal and ventral thalamus. (C) The ‘sensorimotor loop’ connects the primary sensorimotor cortices with the dorsolateral
striatum. Emphasis is placed on the spiraling midbrain–striatum–midbrain projections, which allows information to be propagated forward in a hierarchical manner. Note
that this is only one possible neural implementation; interactions via different thalamo-cortico-thalamic projections are also possible (Haber, 2003). BLA, basolateral
amygdala complex; core, nucleus accumbens core; DLS, dorsolateral striatum; DMS, dorsomedial striatum; mPFC, medial prefrontal cortex; OFC, orbitofrontal cortex; shell,
nucleus accumbens shell; SI/MI, primary sensory and motor cortices; SNc, substantia nigra pars compacta; vPFC, ventral prefrontal cortex; VTA, ventral tegmental area.
accumbens), the dorsomedial striatum, and the dorsolateral striatum (Alexander and Crutcher, 1990a; Alexander et al., 1986; Humphries and Prescott, 2010; Voorn et al., 2004). Each of these subregions participates in one of a series of parallel loops that go from the neocortex to the striatum, pallidum, thalamus, and then back to the neocortex (see Fig. 8; Alexander and Crutcher, 1990a; Groenewegen et al., 1999a; Haber, 2003). These loops include a 'limbic loop' that connects the ventromedial prefrontal cortex with the ventral striatum (Alexander and Crutcher, 1990a; Graybiel, 2008; Graybiel et al., 1994; Pennartz et al., 2009; Voorn et al., 2004; Yin and Knowlton, 2006), an 'associative loop' that connects the medial prefrontal cortex with the dorsomedial striatum, and a 'sensorimotor loop' that connects somatosensory and motor cortical areas with the dorsolateral striatum. Activity within these loops is modulated by dopamine, released from fibers originating in either the VTA or the SNc. Dopamine influences glutamatergic afferents and striatal medium spiny neuron efferents, and through these actions, modulates striatal output from these loops (Horvitz, 2002; Nicola et al., 2004). The particular role that dopamine plays in regulating information processing within each of the cortical–striatal loops is influenced by the origin and destination of the dopaminergic projections. In addition, recent work has demonstrated regional differences in tonic and phasic dopamine signals across the ventral–dorsal axis of the striatum (Zhang et al., 2009). As recently pointed out by Humphries and Prescott (2010), and also noted by others (Bromberg-Martin et al., 2010; Salamone, 2007; Wise, 2009; Yin et al., 2008), a number of issues related to dopamine signaling within the striatum remain topics of intense debate, for example, where and what type of dopamine receptors are found within the striatum, and what effects their activation may have on cell signaling and behavior. Factors likely to contribute to the confusion include unclear boundaries between striatal compartments, unclear boundaries between midbrain dopaminergic regions (the VTA and the SNc), and the different methods used to study the effects of dopamine (e.g., pharmacological manipulations, lesions, genetically engineered mice, microdialysis, and voltammetry) on many different kinds of behaviors (e.g., learning vs. performance, operant vs. maze learning, Pavlovian vs. instrumental learning). A complete discussion of these issues is beyond the scope of the current paper; the interested reader is directed to several excellent reviews that have discussed these details (Bromberg-Martin et al., 2010; Humphries and Prescott, 2010; Nicola et al., 2000; Redgrave and Gurney, 2006; Wise, 2009; Yin et al., 2008).

6.2. Dopamine signaling and reward prediction error within the striatum

The striatum is a major target of midbrain dopaminergic projections from both the VTA and the SNc (Beckstead et al., 1979; Haber et al., 2000; Humphries and Prescott, 2010). The dopaminergic projections from the VTA and SNc play a crucial role in motor control and in emotional and cognitive processes (Wise, 2004). Dopamine neurons in the VTA send projections to the prefrontal cortex, hippocampus, and amygdala, in addition to the projection to the ventral striatum, whereas dopaminergic neurons from the SNc connect primarily to the dorsal striatum (Bjorklund and Dunnett, 2007). The projections that originate in the VTA and connect to the prefrontal cortex are thought to regulate attentional processes and working memory (Dalley et al., 2004), whereas VTA projections to the ventral striatum are assumed to play a key role in reward, motivation, and goal-directed behavior (Ikemoto, 2007; McFarland and Ettenberg, 1995; Smith-Roe and Kelley, 2000; Wolterink et al., 1993). In terms of dopaminergic projections that originate in the SNc, the traditional view has been that this projection influences motor output and stimulus–response learning (Featherstone and McDonald, 2004; Hikosaka et al., 2006; O'Doherty et al., 2004). However, recent evidence indicates that goal-directed behaviors depend on signaling in the dorsomedial striatum and prefrontal cortex (Graybiel, 2008; Yin et al., 2008). In addition, data from rodents with neurotoxic lesions of nigrostriatal dopaminergic neurons suggest that the dorsal striatum strongly contributes to visuospatial function and memory (Baunez and Robbins, 1999; Chudasama and Robbins, 2006; De Leonibus et al., 2007; Da Cunha et al., 2003).
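The loop organization described above can be condensed into a small lookup table. The following Python sketch is only a reading aid distilled from the text (the pairing of the dorsal loops with medial vs. lateral SNc follows the working hypothesis discussed later in this section); it is not an exhaustive anatomy.

```python
# Illustrative summary of the three cortico-striatal loops described in the text.
# The dopamine-source entries are the working-hypothesis pairings, not settled fact.
CORTICO_STRIATAL_LOOPS = {
    "limbic":       {"cortex": "ventromedial prefrontal cortex",
                     "striatum": "ventral striatum",
                     "dopamine_source": "VTA"},
    "associative":  {"cortex": "medial prefrontal cortex",
                     "striatum": "dorsomedial striatum",
                     "dopamine_source": "medial SNc"},
    "sensorimotor": {"cortex": "somatosensory and motor cortices",
                     "striatum": "dorsolateral striatum",
                     "dopamine_source": "lateral SNc"},
}

for name, loop in CORTICO_STRIATAL_LOOPS.items():
    print(f"{name}: {loop['cortex']} -> {loop['striatum']} "
          f"(dopamine from {loop['dopamine_source']})")
```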
The nucleus accumbens is the dopamine terminal field most strongly implicated in reward function. As discussed in Section 4.2, the predominant view of phasic burst firing of dopaminergic neurons within the midbrain is that it provides a reward prediction error signal representing the difference between the expected and the received reward outcome (Ljungberg et al., 1992; Schultz, 1998b). In Pavlovian conditioning tasks, in which a cue signals the availability of reward, these neurons burst fire in response to reward, but with learning, this activity shifts to the cue that predicts reward. When the reward is omitted after learning, the putative dopamine cells show a brief depression in activity at the expected time of its delivery (e.g., Fiorillo et al., 2003; Tobler et al., 2003; Waelti et al., 2001; see Section 4.2). Demonstrating changes in activity within the dopamine-rich VTA, however, does not necessarily equate to dopamine release within its target structure, although one would predict that these events would be correlated if dopamine modulates activity within the nucleus accumbens. Technological advances have provided a tool, fast-scan cyclic voltammetry, for measuring dopamine release in target structures on a subsecond timescale (Clark et al., 2010; Robinson et al., 2003; Wightman and Robinson, 2002). Using this technique, work by Regina Carelli's group tested the hypothesis that dopamine release in the accumbens core is indeed correlated with a prediction error signal in an appetitive Pavlovian conditioning paradigm (Day et al., 2007). As would be predicted based on activity in the VTA, a phasic dopamine signal in the accumbens core was observed immediately after receipt of reward, but over extended training, this signal shifted to the conditioned stimuli. This finding supports the original 'prediction error' hypothesis and is also consistent with earlier work showing impaired performance of a Pavlovian conditioned response after either dopamine receptor antagonism or dopamine depletion in the accumbens core (Di Ciano et al., 2001; Parkinson et al., 2002). Thus, at least within the nucleus accumbens, the generation of a reward prediction error within the VTA does appear to provide a teaching signal to the nucleus accumbens that facilitates learning, but it should be noted that it may not provide a unitary teaching signal across the ventral striatum (Aragona et al., 2009).

Although many remarkable discoveries have been made in terms of how the nucleus accumbens contributes to decision making processes, it has become increasingly clear that the dorsal striatum is also involved. The existence of an error prediction signal, however, is not as well established in the dorsal compared to the ventral striatum. Direct measurement of dopamine within the dorsal striatum has not been undertaken during a task that would produce a prediction error signal from the midbrain. Work by Oyama et al. (2010) has provided the best evidence to date that an error signal is in fact generated in the dorsal striatum. In this study, single unit activity was recorded in the dorsal striatum and the VTA/SNc within the same animals to look for correlated activity between structures during performance of a probabilistic Pavlovian conditioning task. The data indicate that neurons within the dorsal striatum do in fact show activity indicative of an error prediction signal that is similar to the signal generated by putative dopaminergic neurons within the midbrain.

In addition to potentially providing a prediction signal, dopamine within the dorsal striatum promotes learning and memory processes that are necessary for goal-directed behavior. The dopamine projection to the dorsomedial striatum, however, may play a different role in learning than the projection to the dorsolateral striatum, as these two regions may differ significantly in the temporal profile of dopamine release, uptake and degradation (Wickens et al., 2007a,b). One current working hypothesis is that dopamine projections to the dorsomedial striatum from the medial SNc promote action–outcome learning, while dopaminergic projections from the lateral SNc to the dorsolateral striatum promote habit learning (Yin et al., 2008). For example, selective lesions of dopamine cells that project to the dorsolateral striatum impair habit learning (Faure et al., 2005). Local dopamine depletion, then, is similar to excitotoxic lesions of the dorsolateral striatum, in that both manipulations retard habit formation and favor the acquisition of goal-directed actions (Yin et al., 2004).

Further evidence that dopamine signaling within the dorsal striatum may differentially mediate action–outcome and habit/motor learning has been provided by Yin et al. (2009). Medium spiny neurons within the striatum can be segregated into two distinct populations: those projecting directly to neurons of the substantia nigra pars reticulata (SNr) and the internal segment of the globus pallidus (the entopeduncular nucleus in rodents; the 'direct' pathway), and those that project to the external segment of the globus pallidus (the 'indirect' pathway). Neurons of the external globus pallidus then project to the SNr, the internal globus pallidus (entopeduncular nucleus), and the subthalamic nucleus. These two populations exhibit distinct physiological properties and, importantly, express different dopaminergic receptors, with neurons of the direct pathway preferentially expressing D1 receptors and neurons of the indirect pathway preferentially expressing D2 receptors (Albin et al., 1989; Surmeier et al., 2007). Using D2-eGFP mice, Yin et al. (2009) found that D2-expressing neurons located in the dorsolateral striatum exhibit a significant increase in synaptic strength compared to D1-expressing neurons from the same region when mice underwent extended training on a rotarod task. Further, blocking D1 receptors did not affect performance when injected after the task had been well-learned. In contrast, blocking D2 receptors impaired performance at both early and late training phases. This suggests that motor skill learning involves an increase in synaptic activation of D2-expressing medium spiny neurons within the dorsolateral striatum. An intriguing possibility is that these kinds of changes may also underlie habitual behavior as routes become very familiar in an unchanging environment.

Additional methods to assess the distinct role that dopamine plays in learning and decision making mechanisms within the dorsal striatum have been employed by Palmiter and colleagues, using a dopamine-deficient mouse (Palmiter, 2008; Wall et al., 2011). These mice lack tyrosine hydroxylase selectively in dopamine neurons and are therefore unable to synthesize dopamine. In contrast to lesion models, dopamine neurons in dopamine-deficient mice are functionally intact (Robinson et al., 2004), and endogenous dopamine signaling can be selectively restored by the experimenter, making them a powerful tool for studying dopamine signaling. These mice show impairments in instrumental learning and performance, but their performance can be restored either by L-DOPA injection or by anatomically selective viral gene transfer (Robinson et al., 2007; Sotak et al., 2005). Work by Darvas and Palmiter (2010, 2011) has provided evidence that dopamine is necessary for cognitive flexibility, using a water U-maze task in which mice had to shift from an initially acquired escape strategy to a new strategy, or to reverse the initially learned strategy. Restricting dopamine signaling to the ventral striatum did not impair learning of the initial strategy or reversal learning, but strongly disrupted strategy shifting. In contrast, mice with dopamine signaling restricted to the dorsal striatum had intact learning of the initial strategy, reversal learning, and strategy shifting. This suggests that dopamine signaling in both dorsal and ventral striatum is sufficient for reversal learning, whereas only dopamine signaling in the dorsal striatum is sufficient for the more demanding strategy-shifting task. In a follow-up study (Darvas and Palmiter, 2011), dopamine was restored to the ventromedial striatum, and this treatment rescued spatial memory and visuospatial and discriminatory learning. Acquisition of operant behavior was delayed, however, and motivation to obtain food rewards was
blunted. These studies indicate that precise restoration of dopamine signaling within the striatum can selectively affect behavior. It should be noted, however, that whatever functions can be rescued by L-DOPA or adenosine antagonism in DA-deficient mice are likely related to restoration of tonic dopamine signaling, rather than phasic dopamine signaling. In addition, these mice have not been used to directly assess habit formation, or the potential parallel signaling that may take place between the dorsolateral and dorsomedial striatum as learning develops. Nevertheless, the development of this kind of model for selectively investigating dopamine function in the striatum is likely to significantly advance our understanding of the role that dopamine plays in decision making during learning.

Based on these data, one hypothesis about the influence of dopamine on striatal function suggests that the striatum can be organized into four regions that underlie different, but synergistic, association processes, each contributing to the decision processes that are necessary for navigating within complex learning environments (Ikemoto, 2007; Yin et al., 2008). Neuronal signaling moves through a serial cascade, beginning in the ventral striatum and moving into the dorsomedial and finally the dorsolateral striatum as learning progresses. It is thought that this spiraling of information through the ventral–dorsal aspects of the striatum promotes the transition from goal-directed to habit-driven behaviors (Belin and Everitt, 2008; Everitt and Robbins, 2005). Details of this working model of the striatum include the following (also see Fig. 9):

(a) The ventral striatum is important for Pavlovian learning and the interaction between Pavlovian and instrumental learning mechanisms. This kind of stimulus–reward learning underlies conditioned approach behaviors, and is a powerful way in which one can learn that neutral stimuli lead to reward. In some cases, the stimuli that predict reward may acquire some of the motivational properties of the primary reward. An example of this is the value that money has – while money itself has no innate biological importance, it is often paired with items that do have motivational significance, allowing it to serve as a predictor for future rewards, and also as a powerful conditioned reinforcer.

(b) The dorsomedial striatum, on the other hand, appears to support action–outcome associations. This kind of learning is fundamental for adaptive goal-directed behaviors. Many of our behaviors can be considered goal-directed, for example, publishing more papers will lead to a promotion at work, or increasing our level of exercise may lead to better health.

(c) The dorsolateral striatum is involved in incremental stimulus–response kinds of learning that underlie procedural learning, the formation of habits, and the sequencing of behavior. In many cases, habits are thought of in a negative context, such as drug addiction. When habits are discussed here, the term is meant to indicate something more general and adaptive, reflecting a well-learned skill or automatic behavior. One example of this kind of learning may be learning to ride a bicycle; initially, a great deal of effort and conscious thought goes into staying upright and moving the bicycle forward. Over time, however, these actions become considerably easier, and the individual components of the behavior that keep you upright and move the bicycle forward become an implicit, fluid sequence that may be difficult to verbalize when teaching someone else how to ride a bicycle.

While these descriptions of the contributions of the striatal subregions to decision making processes suggest separable functions (i.e., serial processing), it is more likely that these subregions function synergistically within a wide network to direct behavior in complex learning environments (Groenewegen et al., 1999b; Haber, 2003; Haruno and Kawato, 2006; Joel and Weiner, 2000; Yin et al., 2008; Zahm, 2000). These functions will be discussed individually.

6.3. The ventral striatum: Pavlovian learning and cost-based decision making

The ventral striatum receives convergent glutamatergic input from multiple sensory and association areas of the neocortex (prefrontal cortex) and the limbic system, including the amygdala and hippocampus and related structures (subiculum, area CA1, entorhinal cortex) (Boeijinga et al., 1993; Flaherty and Graybiel, 1993; Groenewegen et al., 1999a,b, 1987; Humphries and Prescott,
[Fig. 9 panel labels: dorsolateral striatum (DLS) – model-free, stimulus–response learning (habits, skills, behavioral sequencing); dorsomedial striatum (DMS) – model-based, action–outcome learning (goal-directed action); nucleus accumbens core – stimulus–outcome learning (Pavlovian preparatory CRs and anticipatory approach behaviors); nucleus accumbens shell – stimulus–outcome learning (Pavlovian consummatory CRs and hedonic URs).]
Fig. 9. Major functional domains of the striatum. An illustration of a coronal section of the striatum showing half of the brain (Paxinos and Watson, 2007). The four functional
domains are anatomically continuous, and roughly correspond to what are commonly known as nucleus accumbens shell and core (ventral striatum), the dorsomedial
striatum and the dorsolateral striatum. These striatal subregions are thought to implement different aspects of reinforcement learning, either ‘model-free’ learning (dark
grey) or ‘model-based’ learning (light grey). In addition, these subregions are thought to represent both the actor and the critic. Within the dorsal striatum, the lateral portion
supports a model-free actor function whereas the dorsomedial region represents a model-based actor. The ventral striatum, which is crucial for Pavlovian learning, is thought
to represent the critic; the core represents a model-free critic, whereas the shell represents a model-based critic.
After Bornstein and Daw (2011) and Yin et al. (2008).
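The 'critic' computation that Fig. 9 assigns to the ventral striatum is, in reinforcement-learning terms, the temporal-difference (TD) prediction error discussed in Section 6.2. The following sketch is a minimal, illustrative simulation of one Pavlovian trial (the delay-line stimulus representation and all parameter values are arbitrary modeling assumptions, not fitted to any dataset): early in training the error occurs at reward delivery, and with training it shifts back to the predictive cue, mirroring the dopamine recordings described above.

```python
# Minimal TD(0) critic for a Pavlovian trial: a cue at t = 5 predicts
# reward delivered at t = 15. deltas[t] is the dopamine-like teaching signal.
N, CUE, REWARD = 20, 5, 15
ALPHA, GAMMA = 0.3, 0.98            # learning rate and temporal discount (arbitrary)

w = [0.0] * N                       # value weights over a tapped delay line from the cue

def run_trial(w):
    """One trial: return per-timestep TD errors, updating w in place."""
    deltas = [0.0] * N
    for t in range(N - 1):
        v_t = w[t - CUE] if t >= CUE else 0.0            # V(t): no stimulus before the cue
        v_next = w[t + 1 - CUE] if t + 1 >= CUE else 0.0 # V(t+1)
        r = 1.0 if t + 1 == REWARD else 0.0              # reward on arrival at t = 15
        deltas[t] = r + GAMMA * v_next - v_t             # TD prediction error
        if t >= CUE:
            w[t - CUE] += ALPHA * deltas[t]              # critic weight update
    return deltas

first = run_trial(w)                # naive animal: error peaks at the reward
for _ in range(300):                # extended training
    trained = run_trial(w)          # trained animal: error peaks at cue onset

peak = lambda d: max(range(N), key=lambda t: d[t])
print(peak(first), peak(trained))   # prints: 14 4 (reward time -> cue onset)
```

Because no stimulus is active before the cue, the cue-onset error is never "predicted away", which reproduces the persistent cue response of putative dopamine neurons in well-trained animals.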
2010; Izquierdo et al., 2006; McGeorge and Faull, 1989; Mulder et al., 1998; Totterdell and Meredith, 1997; van Groen and Wyss, 1990; Voorn et al., 2004). The nucleus accumbens, the main portion of the ventral striatum, can be divided into two major subregions: the core, which is continuous with the dorsomedial striatum, and the shell, which occupies the ventral and medial portions of the nucleus accumbens. Although the core and shell regions share common characteristics, they also differ significantly in terms of their cellular morphology, neurochemistry, and patterns of projections, all of which may suggest a different function for the core and shell (Heimer et al., 1991; Jongen-Relo et al., 1994; Meredith, 1999; Meredith et al., 1992, 1996, 2008; Usuda et al., 1998; Zahm and Brog, 1992; Zahm and Heimer, 1993). The core and shell regions of the nucleus accumbens are not likely to function completely independently of each other, however, as direct interconnections between these areas have also been described (Heimer et al., 1991; van Dongen et al., 2005; Zahm, 1999; Zahm and Brog, 1992; Zahm and Heimer, 1993).

Based on its connectivity, a general working model has been that the nucleus accumbens represents a 'limbic–motor interface' that facilitates appropriate responding to reward-predictive stimuli (e.g., Ikemoto and Panksepp, 1999; Mogenson et al., 1980; Nicola, 2007; Pennartz et al., 1994; Wise, 2004; Wright et al., 1996; Zahm, 2000). How this process is achieved, however, is not fully understood. If the accumbens does indeed represent such an interface, then it should, at the very least, process information related to reward and the actions that lead to the acquisition of reward. In fact, there is a fair amount of evidence suggesting that neurons within the nucleus accumbens respond to cues associated with a reward (e.g., Carelli and Ijames, 2001; Cromwell and Schultz, 2003; Hassani et al., 2001; Hollerman and Schultz, 1998; Nicola et al., 2004; Roitman et al., 2005; Setlow et al., 2003; Wilson and Bowman, 2005), as well as the selection of one behavior from among competing alternatives (Hikosaka et al., 2006; Nicola, 2007; Pennartz et al., 1994; Redgrave et al., 1999a; Roesch et al., 2009; Taha et al., 2007).

6.3.1. Nucleus accumbens and Pavlovian learning

Foraging animals encounter situations in which they are required to find food or other necessary resources. In order to learn that certain stimuli may signal the availability of the resource being pursued, organisms must be able to learn relationships between positive outcomes and their reward-predictive cues. This behavior can be investigated within the laboratory using an autoshaping (also known as 'sign tracking') paradigm. In autoshaping experiments, a cue is paired with the availability of reward. Initially, this cue is neutral, meaning that the cue itself is neither biologically significant, nor is it predictive of reward. Because the cue is novel, and rodents have a propensity for investigating novel cues and objects (Bardo et al., 1989, 1996; Bardo and Dwoskin, 2004; Burns et al., 1996; De Leonibus et al., 2006), the animal will approach the cue, and over time will begin to associate the cue with a reward. Thus, the neutral cue gains control over approach responses even though reward delivery is independent of any specific behavior, and with extended training, approach responses are observed nearly every time the reward-predictive cue is presented. A cue that has never been paired with reward does not elicit approach behavior even after repeated presentation (Bussey et al., 1997; Robbins and Everitt, 2002). This approach behavior lacks the flexibility of instrumental learning in that the behavior is not generally altered by the introduction of new contingencies (Bussey et al., 1997; Day and Carelli, 2007; Jenkins and Moore, 1973; Locurto et al., 1976; Williams and Williams, 1969). Autoshaping has important implications for foraging behavior; in a rapidly changing environment, autoshaping behaviors represent a fundamental mechanism through which an organism learns about environmental cues that lead to biologically significant events such as food, mates, and shelter. It is not surprising, then, that autoshaping is demonstrated by a number of species, including birds (Brown and Jenkins, 1968), monkeys (Sidman and Fletcher, 1968), and humans (Wilcove and Miller, 1974).

A number of studies suggest that the nucleus accumbens mediates autoshaping. For example, Cardinal et al. (2001) demonstrated that excitotoxic lesions of the nucleus accumbens core impair the ability to discriminate between a cue that is predictive of reward and an alternate cue with no predictive value. Similarly, depletion of dopamine in the nucleus accumbens results in deficits in the acquisition and expression of approach behaviors (Di Ciano et al., 2001; Parkinson et al., 2002). Further, electrophysiological recordings during autoshaping demonstrate that accumbens neurons exhibit phasic changes in firing rate that are selective for cues predictive of reward; in some cases, an increase in activity is associated with the onset of a reward-predicting cue, while a second subset of neurons is significantly inhibited. These same cells showed little or no change in activity in response to a cue that was not paired with reward. These findings were also core and shell specific; significantly fewer neurons in the shell showed an excitatory response to predictive cues compared to neurons within the core (Day et al., 2006). In addition, lesion and pharmacological data indicate that disrupting activity within the core interferes with approach toward predictive cues, suggesting that the core may help organisms discriminate between biologically relevant and irrelevant cues (Cardinal et al., 2001; Di Ciano et al., 2001). The functional dissociation between the core and shell might be expected given that these regions send separate projections to different output structures (Heimer et al., 1991; Sesack and Grace, 2010).

The accumbens is also involved in Pavlovian–instrumental transfer (PIT), which is the capacity of a Pavlovian stimulus that predicts reward to elicit or increase instrumental responses for the same (or a similar) reward (Estes, 1943, 1948; Kruse et al., 1983; Rescorla and Solomon, 1967). To produce PIT, animals first undergo Pavlovian and then instrumental training, during which they learn to associate a cue with reward and then later learn to make a specific operant response (i.e., press a lever) for the reward. On a probe trial, the predictive cue is presented with the lever, and the change in response rate on the lever is measured. Two forms of PIT can be observed: one that is related to the arousing effect of reward-related cues (non-selective PIT), and another that is more selective for choice performance produced by the predictive status of a cue with respect to one specific reward compared to others (outcome-selective PIT) (Holmes et al., 2010). The shell and core regions of the nucleus accumbens are differentially involved in general and selective PIT; general PIT is disrupted by lesions of the core, but not by lesions of the shell (Hall et al., 2001), whereas selective PIT is disrupted by lesions of the shell, but not by lesions of the core (Corbit et al., 2001). Importantly, because the accumbens is not thought to be integral to instrumental behaviors (Yin et al., 2008), other regions of the striatum that are involved in instrumental learning should also be involved in PIT. In fact, Corbit and Janak (2007, 2010) have shown that the dorsolateral and dorsomedial striatum integrate different aspects of Pavlovian and instrumental information. For example, lesions of the dorsolateral striatum reduce PIT altogether, whereas lesions of the dorsomedial striatum interfere with the selectivity of PIT (Corbit and Janak, 2007).

6.3.2. The nucleus accumbens and cost-based decision making

When animals are pursuing a goal, they are often faced with complex effort- or time-related barriers that separate the actions they make from the goal being pursued. This is the case in natural
foraging environments, and in the laboratory where animals are trained to lever press or navigate a maze for reward. Thus, it is adaptive for animals to cope with delayed reinforcement or increased effort to obtain the desired outcome. Within the laboratory, effort-based decision making can be assessed by providing the organism with a choice between a low-cost/low-value reward vs. a high-cost/high-value reward. Most typically, low-cost options are associated with, for example, few lever press responses or a short time delay, while high-cost options require significantly more lever presses or impose a longer delay between the last response and the delivery of reward. Many factors may influence the choice that any one animal makes, including motivational factors such as how hungry the animal is, or how desirable the reward is (Salamone et al., 2007, 2009). A growing body of work suggests that the nucleus accumbens and its cortical afferents (e.g., the anterior cingulate cortex and medial prefrontal cortex) are involved in exertion of effort and effort-related choice behaviors (e.g., Cardinal et al., 2001; Floresco and Ghods-Sharifi, 2007; Floresco et al., 2008a; Salamone, 2002; Walton et al., 2006). Disrupting activity within the nucleus accumbens can shift behavior toward actions that require less effort or are associated with shorter delays to reward (Aberman and Salamone, 1999; Aberman et al., 1998; Bezzina et al., 2008; Cardinal et al., 2001; Day et al., 2011; Hauber and Sommer, 2009; Walton et al., 2006). In a recent study (Day et al., 2011), the complex role that the nucleus accumbens plays in processing effort-based and delay-based costs was assessed. In this study, a visual cue signaled the relative value of an upcoming reward. Analysis of single unit activity within the accumbens indicates that a subgroup of neurons shows phasic increases in firing in response to the predictive cue, and this activity reflects the cost-discounted value of the upcoming response for effort-related, but not delay-related, costs. In contrast, additional subgroups of neurons respond during response initiation or reward delivery, but this activity does not differ on the basis of reward cost. Finally, another population of neurons within the accumbens showed sustained changes in firing rate (either excitation or inhibition) while rats completed high-effort requirements or waited for delayed rewards. The complexity of the results reported in this study highlights the complexity of the computations required to make decisions when faced with competing options. For the foraging animal, the cost of obtaining rewards is dynamic; for example, the time to explore and the distance that must be travelled to obtain resources are constantly changing (Stephens, 1986). Because individual neurons within the accumbens receive diverse cortical and subcortical inputs, they are likely to carry a heavy information processing load (Kincaid et al., 1998) in complex decision making environments.

Dopamine signaling also contributes to the execution of cost–benefit decisions (Fiorillo et al., 2003, 2008; Gan et al., 2010; Kobayashi and Schultz, 2008; Ostlund et al., 2011; Phillips et al., 2007; Roesch et al., 2007; Roitman et al., 2004; Tobler et al., 2005; Wanat et al., 2010). Some studies have investigated the contribution of putative dopamine neurons to cost-based decisions by measuring activity in the midbrain (Fiorillo et al., 2003, 2005, 2008; Kobayashi and Schultz, 2008; Roesch et al., 2007; Tobler et al., 2005), while other studies have obtained a measure of dopamine activity within the nucleus accumbens, since the latter is a major target of midbrain dopaminergic projections, and is known to be involved in the computations that support cost-based decision making (Day et al., 2011; Gan et al., 2010; Salamone et al., 2009; Wanat et al., 2010). In studies using voltammetry to measure phasic dopamine release, cue-evoked dopamine signals are shown to be relatively insensitive to both effort-based and delay-based costs, but a significant response is observed when the cost to obtain reward changes (Gan et al., 2010; Roesch et al., 2007; Wanat et al., 2010). Further, Wanat et al. (2010) showed that dopamine responses to rewards and their predictive cues are separable and independently modulated when instrumental-response requirements are progressively increased. That is, reward-evoked dopamine release within the accumbens is affected by escalating costs in proportion to the delay imposed prior to reward delivery rather than to increased work requirements, whereas cue-evoked dopamine release is unaffected by either temporal or effort-related costs. Together, these results may be congruent with competing theories of dopamine function: if dopamine provides a prediction error signal, then dopamine neurons in a trained animal respond to rewards only when they are unexpected (Fiorillo et al., 2003; Schultz et al., 1997), as would be the case when the relative cost of a reward changes. In addition, phasic dopamine signals may provide an incentive signal that is used to determine the value of the reward (Berridge, 2007). This would also explain the observation that changes in phasic dopamine occur when the costs to obtain the reward change. Finally, these results may also be consistent with the 'Flexible Approach Hypothesis', which states that dopamine signaling within the accumbens is required for reward seeking behavior only when the specific actions that are necessary to obtain reward are variable across trials (Nicola, 2010).

The role of the nucleus accumbens in mediating cost-based choice behavior has also been tested using maze tasks. For example, a T-maze choice task (Cousins et al., 1996; Salamone, 1994) can be used in which one of the choice arms contains a large food reward, whereas the other arm has a significantly smaller reward. Effort-related decision problems can be introduced by placing a barrier in the arm that contains the larger reward, thus presenting an obstacle that the rat must climb to gain access to the larger reward. Alternatively, the barrier that prevents the rat from accessing the larger reward can be used to impose a delay before access to the large reward is granted. Using an effort-based version of this task, Cousins et al. (1996) demonstrated that excitotoxic lesions of the accumbens significantly decreased selection of the high-effort/high-reward maze arm. When, however, reward was entirely omitted from the low-effort maze arm, these rats chose the high-effort/high-reward arm and were capable of obtaining the reward, despite the high cost.

Recently, Bardgett et al. (2009) used a discounting version of the T-maze task in which the amount of food in the large-reward arm of the maze was reduced each time the rat selected that arm. This 'adjusting-amount' discounting variant of the T-maze task permits assessment of the indifference point for each rat, which is defined as the point at which the rat no longer shows a preference for one reward over the other, and therefore chooses both amounts equally often (Richards et al., 1997). When dopamine signaling was blocked with either a D1 or D2 receptor antagonist, rats were more likely to choose the small-reward arm, but when treated with amphetamine, rats were more likely to choose the large-reward arm. Clearly, carefully designed behavioral studies with mazes can provide a more complete understanding of how the brain processes information necessary for making (optimal) decisions in complex learning environments. In fact, cost-based decision making has been investigated using several maze-based tasks, which have undergone behavioral validation and evaluation (Cousins et al., 1996; Salamone et al., 1991; van den Bos et al., 2006), and which have been used by several laboratories to characterize the effects of brain lesions or drug manipulations on choice behavior (Bardgett et al., 2009; Denk et al., 2005; Salamone et al., 1991; Schweimer and Hauber, 2006; Walton et al., 2002). Although there are very obvious differences between these tasks and the operant tasks after which they have been modeled, both have yielded remarkably similar results (Bardgett et al., 2009; Cousins et al., 1994; Denk et al., 2005; Floresco et al., 2008b; Koch et al., 2000; Salamone et al., 1991, 2002; Sink et al., 2008; Wakabayashi et al., 2004; Walton et al., 2006). Thus, maze tasks appear to be valid models for investigating choice behavior during cost-based decision making.
118 M.R. Penner, S.J.Y. Mizumori / Progress in Neurobiology 96 (2012) 96–135
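The indifference point described above (the point at which a rat values the two rewards equally) is often modeled with a hyperbolic discounting function of the form V = A/(1 + kD), where A is the reward amount, D the delay, and k the discount rate. The following is a minimal illustrative sketch only, assuming this standard hyperbolic form; the reward amounts and discount rate are invented, not taken from the studies cited here.

```python
def discounted_value(amount, delay, k):
    """Hyperbolic subjective value: V = amount / (1 + k * delay)."""
    return amount / (1.0 + k * delay)

def indifference_amount(large_amount, delay, k):
    """Immediate amount whose subjective value matches the delayed large
    reward; at this amount the chooser shows no preference (the
    indifference point)."""
    return discounted_value(large_amount, delay, k)

# Illustrative values only: 4 pellets after a 10 s delay, discount rate k = 0.2.
# Indifference point: 4 / (1 + 0.2 * 10) = 4/3 pellets delivered immediately.
print(indifference_amount(4.0, 10.0, 0.2))
```

On this formalization, any manipulation that increases k lowers the indifference amount, shifting choice toward smaller, more immediate rewards; shifts of this kind are one way discounting models describe changes in cost-based choice behavior.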
6.3.3. Spatial learning and navigation: the role of the ventral striatum

The ability to make optimal cost-based decisions is essential if animals are to make adaptive behavioral choices during goal-directed navigation. The ventral striatum appears strategically positioned to play a key role in cost-based decisions during navigation, given the convergent evidence from a variety of maze studies, including the spatial version of the Morris swim task (Sargolini et al., 2003; Setlow and McGaugh, 1998), the radial maze (Gal et al., 1997; Smith-Roe et al., 1999), a spatial version of the hole board task (Maldonado-Irizarry and Kelley, 1995), as well as a task in which the animals are required to discriminate a spatial displacement of objects (e.g., Annett et al., 1989; Ferretti et al., 2005; Roullet et al., 2001; Sargolini et al., 1999; Seamans and Phillips, 1994; Usiello et al., 1998).

To investigate the idea that the ventral striatum associates specifically spatial context with reward information to facilitate initiation of appropriate navigation-based behaviors (Mogenson et al., 1980), Lavoie and Mizumori (1994) recorded neural activity in the ventral striatum while rats navigated an 8-arm radial maze for food reward. This study demonstrated, for the first time, spatial firing correlates within the ventral striatum (Lavoie and Mizumori, 1994). The mean place specificity for all ventral striatal neurons was significantly lower than that typically observed in the hippocampus (Barnes et al., 1990), indicating that while ventral striatal neurons discharge with spatial selectivity, they are not as selective as hippocampal neurons. The moderate spatial selectivity likely reflects the integration of spatial with other non-spatial information within the ventral striatum, including reward and movement. The fact that single ventral striatal neurons encode multiple types of information supports the view that spatial, reward and movement information may be integrated at the level of individual ventral striatal neurons. Recent evidence suggests that spatial information within the ventral striatum is derived from the hippocampus: Ito et al. (2008) showed that an interruption of information sharing between the hippocampus and the shell of the nucleus accumbens disrupted the acquisition of context-dependent retrieval of cue information, suggesting that the shell, in particular, may provide a site at which spatial and discrete cue information may be integrated.

Work by Redish and his colleagues has sought to describe the unique contributions that the hippocampus and the striatum make to choice behavior and spatial information processing using a multiple T-maze task. With this task, several choice points are presented to the rat as it navigates from a start location to a reward site. The final choice point on the maze represents a point in space where the animal makes a final 'high-cost' choice to gain access to reward. At this critical point, a number of interesting events occur in terms of both observable behavior and neuronal responses. First, early in training, while the animal is learning the correct choice, the animal pauses and engages in what is called 'vicarious trial and error' (Tolman, 1938, 1939). While this behavior is being engaged, ensembles of hippocampal neurons transiently represent locations ahead of the animal, sweeping down the arms of the maze before the animal implements a choice (Schmitzer-Torbert and Redish, 2002; van der Meer et al., 2010). In parallel with these forward sweeps, neurons in the ventral striatum that are responsive to reward (i.e., at the reward site on the maze) also show enhanced neural responses at the final decision point. This activity is thought to reflect an 'expectation-of-reward' signal at decision points (van der Meer et al., 2010; van der Meer and Redish, 2010). This interpretation is congruent with work described above showing that the ventral striatum is involved in mediating the influence that motivationally relevant cues have on behavior (Cardinal et al., 2001; Day and Carelli, 2007; Kelley, 2004). In addition, these results support the idea that the moderately spatially selective neurons described by Lavoie and Mizumori (1994) likely reflect the integration of spatial with non-spatial information (i.e., reward and movement-related information) within the ventral striatum. In addition, the fact that single ventral striatal neurons encode multiple types of information supports the view that spatial, reward and movement information may be integrated at the level of individual ventral striatal neurons. Thus, together with the hippocampus, the ventral striatum plays a key role in evaluating and selecting the behaviors most likely to result in reward, and thus underlies goal-directed behavior (in this particular case, goal-directed navigation).

In addition to characterizing the activity of the hippocampus and the ventral striatum in a maze-based decision making task, characterization of the dorsal striatum was also undertaken. Previous studies have provided evidence that neurons within the dorsal striatum exhibit egocentric movement-related discharge (e.g., Barnes et al., 2005; Jog et al., 1999; Yeshenko et al., 2004) and show spatially selective firing on maze tasks. On the multiple T-maze task, van der Meer et al. (2010) observed a gradual increase in the coding efficiency of dorsal striatal neurons as the animals became better at implementing the correct choice. In addition, these responses within the dorsal striatum were most evident during the turn sequence, at the reward location, and in response to cues that are predictive of reward (van der Meer et al., 2010). This suggests that activity in the dorsal striatum may reflect the events that define the task structure; because the ultimate goal of the task is to reach reward, this is one salient event, and the turn sequence that the rat makes in order to reach that reward might be considered another salient aspect of task structure. This result is in line with work from Graybiel and colleagues (e.g., Barnes et al., 2005; Jog et al., 1999), which is discussed in greater detail below. Overall, these results provide evidence for a functional network that supports choice behavior on a goal-directed, navigation-based task. The role that the dorsal striatum plays in decision and learning processes is discussed below.

6.4. Dorsal striatum: contributions to response and associative learning

Historically, investigations of the particular role that the dorsal striatum plays in mediating goal-directed behaviors treated the dorsal striatum as a single entity, and it has only been fairly recently recognized that the lateral and medial aspects of the dorsal striatum participate in learning in unique ways (Balleine et al., 2007; Balleine and O'Doherty, 2010; Yin et al., 2008). The dorsomedial striatum is innervated by the association cortices, and the anterior portion of the dorsomedial striatum also receives projections from the prefrontal cortex, while the more posterior region receives significant projections from the perirhinal and agranular insular regions, as well as the entorhinal cortex and basolateral amygdala (McGeorge and Faull, 1987, 1989). This region of the dorsal striatum is thought to mediate goal-directed behaviors, as has been shown in instrumental operant tasks and in goal-directed navigational tasks. In contrast, the dorsolateral striatum, which is innervated by the primary motor and somatosensory cortices, underlies motor skill learning and habit learning that allows automaticity of behavior when appropriate (see Balleine et al., 2007; Johnson et al., 2007; Yin and Knowlton, 2006; Yin et al., 2008). Importantly, both modes of learning contribute to flexible navigational behaviors; it is through the interaction of these two modes of learning that animals are able to select the most adaptive behavior necessary to navigate in a complex learning environment. In terms of reinforcement learning theory, the dorsal striatum as a whole is thought to represent the actor in the actor–critic framework, but the dorsomedial striatum is thought to perform this function within a model-based system
whereas the dorsolateral striatum is thought to perform this function within a model-free framework.

6.4.1. Action–outcome learning and habit learning in the dorsal striatum

Given enough time and practice, the learning of a motor skill or habit can move from being effortful to a point where the newly acquired skill can be performed without a great deal of cognitive effort. Under 'normal' learning conditions, some degree of automation of behavior may be beneficial in that well-learned behaviors can take place without a great deal of information processing resources being engaged, thus leaving the organism in a position to direct attentional and cognitive resources to more difficult or urgent matters. The mechanisms that underlie this transition are only just beginning to be understood. Behavioral evidence indicates that motor skill and habit learning takes place over an initial phase of fast improvements, followed by a slower phase of gradual refinement (Costa et al., 2004; Karni et al., 1998; Yin and Knowlton, 2006; Yin et al., 2008). Within an instrumental learning task, this incremental learning is observed during an initial phase of learning that is sensitive to both the action–outcome contingency and the value of the outcome. After prolonged training, however, these actions are transformed, and the behavior becomes automatic and insensitive to both the action–outcome contingency and the outcome value (Balleine and Dickinson, 1998; Balleine et al., 2009; Yin et al., 2008).

A series of elegant studies conducted by Yin and his colleagues has clearly identified functional differences between the dorsolateral and dorsomedial striatum (Yin et al., 2004, 2005, 2006, 2009; Yin and Knowlton, 2004). Animals were trained to lever press for sucrose reward using instrumental contingencies that are known to eventually lead to habit formation. To test whether the behavior had indeed reached habit status, the reward was paired with lithium chloride to induce taste aversion. Control animals given this treatment continued to lever press for sucrose reward, indicating that their behavior was impervious to the reward devaluation procedure. Animals with selective lesions of the dorsolateral striatum, however, significantly reduced their rate of responding, indicating that the dorsolateral striatum plays a key role in habit behavior. Importantly, lesions of the dorsomedial striatum after the acquisition of the habitual behavior did not affect habitual responding; these animals continued to lever press for sucrose reward after lithium chloride treatment, indicating that the dorsomedial striatum is not necessary for the expression of habitual behavior once it has been acquired (Yin et al., 2004). Working on the idea that the dorsomedial striatum may be involved in action–outcome learning rather than habit learning, Yin et al. again trained rats on a task that is normally sensitive to outcome devaluation and contingency degradation, in which the probability of reward delivery is no longer dependent on an appropriate response by the rat (Colwill and Rescorla, 1990; Hammond, 1980). Reversible inactivation of the posterior part of the dorsomedial striatum, as well as pre- and post-training lesions of this region, eliminated sensitivity to outcome devaluation and degradation, and thus led to habit-like responding (Yin et al., 2005). Based on these results, it appears that the posterior dorsomedial striatum is important for the learning and expression of goal-directed behavior, because when this region is functionally blocked, behavior becomes habitual even under training conditions that normally result in goal-directed actions in control rats.

As discussed in relation to the ventral striatum, maze tasks can be used that closely parallel learning contingencies used within instrumental-operant tasks, despite there being obvious differences between the motor programs necessary for pressing a lever and traversing a maze. Using a T-maze task, Yin and Knowlton (2004) evaluated the idea that the posterior dorsomedial striatum is involved in flexible action–outcome/associative learning, whereas the dorsolateral striatum underlies response (motor) learning. Lesions to the dorsomedial or dorsolateral striatum were made prior to the acquisition of the task, and rats were then extensively trained to retrieve reward using a response strategy, specifically a rightward body turn. The strategy that the animal is using can be assessed directly on a probe trial in which the animal begins the trial on a different arm of the maze. If the animal is dependent on a response/motor strategy, then it will persist in making a rightward body turn, but if using a more flexible place strategy, the animal will be able to navigate to the rewarded site by reintegrating the spatial features of the environment with the goal location. Lesions of the posterior dorsomedial striatum resulted in the use of a response strategy; in this case, the animals continued to make rightward body turns, while control animals were able to employ a place strategy to successfully retrieve the reward. This observation, together with the data discussed above, indicates that the dorsomedial striatum underlies flexible choice behavior (Corbit and Janak, 2010; Devan and White, 1999; Ragozzino et al., 2002; Whishaw et al., 1987).

Neurophysiological studies indicate that neurons within the dorsomedial striatum undergo changes in activity early during motor learning, and their firing has been shown to change according to flexible stimulus-value assignments (Kimchi and Laubach, 2009; Yin et al., 2009). Similarly, inactivation or pharmacological manipulation of the prelimbic and infralimbic cortical areas, which form part of the association loop that projects to the medial portion of the dorsal striatum, also impairs behavioral flexibility (Ragozzino et al., 1999a,b). Whereas the hippocampus may be necessary to establish the spatial location of the goal (see Section 5), it would appear that the dorsomedial striatum is important for choosing the correct course of action that leads the animal to this location. One intriguing interpretation of these results is that the hippocampus does not compete with, or function independently of, the striatum, as has been previously claimed (Packard and Knowlton, 2002; Poldrack and Packard, 2003), but rather that these brain regions work synergistically to form a functional circuit (Mizumori et al., 2004, 2009; Yin and Knowlton, 2006). This hypothesis is supported by studies that have examined neural activity in the dorsomedial and dorsolateral striatum during spatial navigation. Some of the neurons within these regions exhibit location-specific firing while a rat traverses a maze, occasionally independent of both movement and reward condition (Mizumori et al., 2000; Ragozzino et al., 2001; Wiener, 1993). While it has been argued that hippocampal place fields contribute to the determination of context saliency (discussed in Section 5.3.1), striatal place fields may be used to provide location-selective and context-dependent control over an animal's movement. On the other hand, neurons that are sensitive to the egocentric movement of the animal are likely to reflect intentional movement/planning of movement toward the goal location, and neurons responsive to the goal location provide information regarding the outcome of the action/movement to the goal location (Mizumori et al., 2004; Yeshenko et al., 2004). Support for this idea has also been shown in non-human primates, in which striatal neurons become engaged in processing information about learned events that have not yet occurred, suggesting that this activity is evoked by the expectation of an upcoming salient event (Schultz et al., 1997). This kind of neural activity signals not only whether an event is going to occur, but also the location of the event (Hikosaka et al., 1989), and in some cases, the direction of impending movement (Alexander and Crutcher, 1990b).

6.4.2. Response learning in the dorsal striatum

For response learning, sensory stimuli direct the behavior or motor response that will ultimately be made, for example, an arm movement or a body turn. The likelihood that any particular
movement is made in response to a stimulus is initially influenced by the presence or absence of reward. Over time, however, reward no longer reliably influences behavior, and thus the behavior is no longer considered flexible, but habitual. The acquisition of a habit involves the gradual development of specific S–R associations (Mishkin et al., 1984; Squire et al., 1993). A habit is distinguished by the tendency to be 'response-like', meaning that it is triggered automatically by a particular stimulus or stimulus complex (Dickinson, 1985). If individual neurons represent stimulus–response associations, then they should exhibit two key characteristics: their activity should be modulated by the presentation of a stimulus that cues the organism to perform an action for reward, and their activity should encode some aspect of the action that the organism performs once the stimulus has been presented. This kind of activity has been well demonstrated in the dorsolateral striatum using tasks that require the subject to make a specific response movement to receive a reward as directed by an instructional cue (e.g., Barnes et al., 2011; Jog et al., 1999; Thorn et al., 2010). These kinds of results have been demonstrated in both primates and rodents, for several different kinds of task-relevant cues, including auditory and visual cues, and for many different body movements, including movements of the hand, arm/forelimb, eyes, head and whole body (Alexander and Crutcher, 1990b; Barnes et al., 2005; Gardiner and Kitai, 1992; Hikosaka et al., 1989; Jaeger et al., 1993; Jog et al., 1999; Kimura et al., 1992; Schultz and Romo, 1988, 1992; White and Rebec, 1993).

Work by Ann Graybiel and her colleagues has identified some of the key neural mechanisms that underlie habit formation/stimulus–response learning (Barnes et al., 2005, 2011; Jog et al., 1999). Using a T-maze task, rats were overtrained to respond to the presentation of an auditory instruction cue that indicated whether the animal should turn left or right to reach the goal (i.e., food reward). Single unit recordings from the dorsolateral striatum were performed throughout the training procedure, which allowed an assessment of potential changes in neural activity as learning progressed. In addition, task-related neural activity was assessed at different areas on the maze, including the start area, the area where the tone was provided, the area where the body turn toward the goal was executed, and the goal location. Initially, neural activity was responsive to several aspects of the task, especially the point at which an animal executed the body turn toward the goal location. Over the course of learning, however, neural activity gradually shifted, so that task-related activity reflected the beginning and the end of the task. This pattern of activity remained stable over the course of several weeks, as did the behavior (Jog et al., 1999). These results suggest that there is a restructuring of neuronal responses within the sensorimotor striatum as habitual behavior develops.

6.4.3. Sequence learning in the dorsal striatum

In addition to learning which behaviors ultimately lead to reward, goal-directed behavior may require that behaviors be performed in a particular order or sequence. There is evidence that the striatum participates in the sequential organization of natural behaviors in monkeys (Van den Bercken and Cools, 1982) and rats (Berridge and Whishaw, 1992; DeCoteau and Kesner, 2000; Pellis et al., 1993). For example, the dorsal striatum has been shown to be critical for grooming sequences in rats (Aldridge and Berridge, 1998; Berridge and Whishaw, 1992). In addition, in the work discussed above, it was demonstrated that neurons within the dorsolateral striatum tend to respond to the beginning and the end of trials as training on a cued T-maze task progresses. This response may indicate that behavioral sequences are parsed into 'chunks' as the task is learned (e.g., Barnes et al., 2005; Boyd et al., 2009; Graybiel, 1998; Kubota et al., 2009; Thorn and Graybiel, 2010; Tremblay et al., 2009, 2010). Recent work by Yin (2010) also suggests that the dorsal striatum participates in self-initiated sequences of behaviors that lead to reward. In this study, rats were trained to press two levers in a particular sequence in order to gain access to reward. Excitotoxic lesions of the dorsolateral striatum significantly impaired the acquisition of the correct sequence, while lesions of the dorsomedial striatum had no significant effect on sequence learning. In terms of reinforcement learning algorithms, the chunking of behaviors into a coherent 'whole' that leads to a desired goal is formalized in hierarchical reinforcement learning models (Botvinick et al., 2009). These models are attractive for describing goal-directed behavior in complex learning situations because they may be able to more accurately describe the multiple 'bits' of behavior that ultimately lead to goal acquisition, blending the model-based and model-free behavioral strategies that are likely to underlie flexible goal-directed behavior. Learning to execute learned actions in a complete sequence is essential for survival and subserves many routine behaviors, including navigation.

Organizing behaviors into sequences requires precise timing and identification of the beginning and the end of a complete sequence of behaviors. Recent work has elegantly demonstrated that the 'stop' and 'start' signals that identify the beginning and end of self-initiated sequential behavior appear to be coded within the dorsal striatum (Jin and Costa, 2010). In this study, rats were trained to press a lever on a fixed ratio schedule that required 8 lever presses to obtain sucrose reward. Over the course of training, rats gradually acquired a sequence of approximately 8 lever presses, rarely pressing more or fewer times while the lever was active. As the rats learned the behavioral sequence necessary for obtaining reward, the activity of neurons within the dorsal striatum and the SNc appeared to reflect the initiation and termination of the self-paced action sequences. Importantly, control experiments provided evidence that these learning-related changes in neuronal activity reflected neither movement speed nor action value (Jin and Costa, 2010). Thus, these results identify a fundamental mechanism that organizes actions into behavioral sequences, and they have important implications for complex adaptive behaviors, including goal-directed navigation.

6.5. Interactions between the dorsomedial and dorsolateral striatum

Although many behaviors that are performed on a regular basis are performed automatically, there are instances when it is necessary to alter a routine if something in the environment changes and the routine behavior is thus rendered inappropriate. The regulation of this behavioral switching can occur either retroactively, as a result of error feedback, or proactively, by detecting a change in context. A salient example that is often given for this kind of behavior is driving to work: anecdotally, many people have experienced suddenly arriving at work in their car without any specific recollection of the journey, despite being the driver of the car. This is due, in part, to a fairly static context in which we traverse the same route, and thus encounter the same traffic lights, execute the same turns, and become accustomed to the background scenery around us (buildings, street lights, trees, etc.). When, however, a significant change is encountered on our drive to work, for example an unexpected accident that is backing up traffic, we can quite quickly interrupt our behavioral routine and evaluate other available options for getting to work. Thus, when confronted with a change in context, an important decision can be made to switch from a routine behavior to an alternative behavior that will allow us to reach our goal location.

In order for habits to develop, learning needs to occur that associates a particular action with a particular outcome. As described above, this kind of association can be mediated by the
dorsomedial striatum. Once a behavior has been well-learned, however, its performance appears to be mediated by the dorsolateral striatum. If these observations are true, then a question that remains is how these different subregions gain or maintain control over behavior. Recent work by Thorn et al. (2010) suggests that the dorsolateral and dorsomedial striatum undergo simultaneous changes in their neuronal activity patterns, but that these changes are unique to each structure as learning progresses. These results accord with a confluence of other data (e.g., Jog et al., 1999; Yin and Knowlton, 2004; Yin et al., 2009) and suggest a current working model in which the dorsomedial striatum regulates the evolution of behavior toward habit formation. This idea has been further tested by Yin et al. (2009), who identified region-specific changes in striatal neural activity that map onto different phases of skill learning. Electrophysiological recordings from the dorsolateral and dorsomedial striatum were performed while mice learned an accelerating rotarod task, a task that requires the gradual acquisition of complex movements to stay on the rotating rod. Performance on this task is characterized by rapid initial improvement on the first day of training, with performance reaching asymptotic levels after three days of training. These behavioral observations were accompanied by distinct changes in the rate of neuronal activity in the dorsomedial striatum early in training, while the dorsolateral striatum showed increased rate modulation during the extended training period. Further, when lesions of the dorsomedial striatum were made prior to training, mice were unable to acquire the skill, but this was not observed when lesions were made after the acquisition of the skill. In contrast, lesions of the dorsolateral striatum affected both early and late phases of training, suggesting that the dorsolateral and dorsomedial striatum both participate in the acquisition of the motor skills, but that once the skills are learned, the dorsomedial striatum is no longer engaged. Recordings from slices taken from the trained animals demonstrated a potential synaptic mechanism for this transition: medium spiny neurons in both the dorsomedial and dorsolateral striatum exhibited training phase-related changes in glutamatergic transmission. The slope of excitatory postsynaptic potentials, a measure of synaptic strength, was selectively higher in the dorsomedial striatum following early training, while synaptic strength increased in the dorsolateral striatum only after extended training. Although this task differs from more traditional learning tasks (instrumental-operant or maze tasks), it is likely to point to a fundamental synaptic mechanism that underlies the transition from action–outcome/associative learning to well-learned habit/motor skills, irrespective of the task used.

In summary, there is emerging evidence that the striatum functions to evaluate the outcomes of behaviors in terms of an organism's learned expectations. Through a series of interactive loops of information flow between the striatum and different cortical and subcortical structures, behavioral responses and their expected consequences become more refined and predictable. Ultimately, a well-learned behavioral response develops as the dorsolateral striatum assumes greater control over behavior. These functions must ultimately be coordinated with the context saliency function of the hippocampus so that the 'best' behaviors can be selected within the correct context or decision making environment. How this coordination among different brain structures occurs is discussed in the following section.

7. Neural systems coordination: cellular mechanisms

Understanding how, and under what conditions, neural

multiple neural systems are involved, but also the adaptive features of this behavioral model depend on conditional and iterative processing loops, as well as on coordination at multiple levels of neural function (from single neurons to specific interactions between brain structures). Also contributing to the difficulty of studying complex behaviors are the dynamic ways in which the nature of the signals transmitted to efferent structures can change, both in terms of information content and in whether such signals serve activating, inhibiting, or permissive roles. Moreover, much of the existing literature on the neurobiology of complex behaviors considers rate codes of neurons and, to a lesser degree, temporal codes, although this is changing in more recent studies. At a higher, more integrative level, the identity of the coordinating mechanism of orchestrated neural activity is not yet known. With regard to the latter issue, a likely possibility is that the primary determinant of the interactive and dynamic patterns that emerge may not be attributed to a single brain structure but rather to a state, such as a motivational or emotional state.

7.1. Single cells and local network coordination

The functional orchestration of neural systems that underlie complex behaviors should be expected to involve integration within and across multiple levels of processing, from cellular to local circuit to neural systems. We are only beginning to understand how such integration can happen, and studies of goal-directed navigation have begun to reveal important clues. Starting at the level of single neurons, it is known that dopamine has effects across different timescales in different brain structures, and this may define the type of coordination that is possible at any given point in time. In the hippocampus, a short-lasting effect of dopamine may be to determine the location of a place field (Martig et al., 2009), while a long-lasting effect could be to enhance the duration of the post-event period of plasticity (Huang and Kandel, 1995; Otmakhova and Lisman, 1996; Rossato et al., 2009). By prolonging periods of plasticity, dopamine activation may allow sufficient time for accurate context analysis, a process that in turn determines which memories are formed or updated. An example of how this might work can be seen when one considers place field responses during learning: place fields become sequentially associated as rats repeatedly traverse a path on their way to reward. This sequential activation of place cells was shown to repeat itself 'off-line' during subsequent periods of relative inactivity (e.g., Lee and Wilson, 2002; Louie and Wilson, 2001; Wilson and McNaughton, 1994). This pattern of neural 'replay' is consistent with many theories of memory, including the idea that optimal memory requires the reactivation of behavioral experiences, typically during periods of sleep or rest (Buzsaki, 1989; Marr, 1971; McClelland et al., 1995; Pennartz et al., 2002). Interestingly, dopamine has been shown to facilitate hippocampal 'replay' of sequences of place fields (Singer and Frank, 2009). Thus, dopamine may direct both cellular (e.g., place field location) and circuit-level (e.g., sequential activation of place fields) neural organization within the hippocampus. In this way, dopamine is necessary for synaptic plasticity within the hippocampus.

The replay of temporally ordered neural activity has been primarily studied in populations of hippocampal pyramidal cells that exhibit place fields (Skaggs et al., 1996; Wilson and McNaughton, 1994), where it is assumed to underlie spatial and contextual information processing. Work by Pennartz et al. (2004), however, indicates that this kind of replay may reflect a common process that enables the binding of many kinds of information. In that study, replay of sequences of neural activity was found to also occur in the ventral striatum during periods of rest that follow
systems interact is no small feat, even with a tractable model periods of activity. Moreover, recent work suggests that reward-
such as goal-directed navigation. This is the case not only because related replay contributes a motivational component to a
122 M.R. Penner, S.J.Y. Mizumori / Progress in Neurobiology 96 (2012) 96–135
reactivated memory trace (Lansink et al., 2008). A follow-up study by the same group (Lansink et al., 2009) further demonstrated that hippocampal–striatal ensembles reactivated together during sleep. This process was especially strong in pairs in which the hippocampal cell processed spatial information and ventral striatal firing correlated with reward, suggesting a mechanism for consolidating place–reward associations.

7.2. Neural systems organization and oscillatory activity

Neural circuits have a natural tendency to oscillate across a wide range of frequencies, and such oscillations likely reflect a fundamental mechanism for coordinating neural activity across multiple brain regions (e.g., Buzsaki, 2006; Fries, 2009). Goal-directed navigation likely requires a high degree of coordination of multiple forms of information so that decisions can be made quickly. Thus, it seems reasonable to assume that a rich array of rhythmic coordination occurs as animals engage in decision processes during navigation. Oscillatory activity reflects alternating periods of synchronous and desynchronous neural firing: synchronous activity is associated with greater synaptic plasticity and stronger coupling among cells of an ensemble, while desynchronous activity is associated with periods of less plasticity and weaker signal strength (Buzsaki, 2006; Hasselmo, 2005b; Hasselmo et al., 2002).

7.2.1. Theta rhythms

Numerous laboratories have now reported that synchronous neural activity (in particular, coherence of the theta rhythm) can be detected across local neural networks both within and between brain structures such as the hippocampus, striatum, or prefrontal cortex (DeCoteau et al., 2007a; Engel et al., 2001; Fell et al., 2001; Siapas et al., 2005; Tabuchi et al., 2000; Varela et al., 2001; Womelsdorf et al., 2007). For example, hippocampal theta activity modulates the probability of neuronal firing, and theta can become synchronized with place cell firing, serving to coordinate the timing of spatial coding (Gengler et al., 2005; O'Keefe and Recce, 1993). A growing number of studies demonstrate coordinated neural activity between the hippocampus and the striatum. Theta oscillations within the striatum can become entrained to the hippocampal theta rhythm (Allers et al., 2002; Berke et al., 2004; DeCoteau et al., 2007a). Stimulating the striatum can induce hippocampal theta activity (Sabatino et al., 1985) and increase high-frequency theta power, which is thought to be important for sensorimotor integration (Hallworth and Bland, 2004). When neural activity is disrupted in the striatum via D2 receptor antagonism, striatal modulation of high-frequency hippocampal theta activity is reduced, motor and spatial/contextual information is not integrated, and task performance is impaired (Gengler et al., 2005). It appears then that during goal-directed navigation, hippocampal and striatal activity becomes increasingly coherent, and this pattern appears to be dopamine dependent.

Particularly intriguing is a finding common to both the hippocampus and striatum: synchronous neural activity occurs in specific task-relevant ways (e.g., Hyman et al., 2005; Jones and Wilson, 2005), and in particular during times when rats are said to be engaged in decision making (e.g., Benchenane et al., 2010). For example, striatal theta is modified over the course of learning an egocentric T-maze task, increasing as the rat chooses and initiates turn behavior (DeCoteau et al., 2007a,b). Rats that learned the task developed an antiphase relationship between hippocampal and striatal theta oscillations, while rats that did not learn the task also did not show this coherent theta relationship. This coherence has also been observed during striatal-dependent classical conditioning (Kropf and Kuschinsky, 1993).

Coherent theta oscillations across distant brain structures can be enhanced with application of dopamine, at least in anesthetized rats (Benchenane et al., 2010). Assuming this is also the case in awake navigating rats, it may be that dopamine plays a crucial role in coordinating ensemble activity across brain areas within a decision-making network during navigation. Functionally, this type of control by dopamine suggests that information about the saliency of reward may determine which brain systems become synchronized (and desynchronized), and this in turn informs decisions about what information is used to update memories and which behaviors are selected.

7.2.2. Gamma rhythms

Neuronal groups are also observed to synchronize their activity at frequencies higher than the theta rhythm. In particular, it is now well established that many brain areas exhibit rhythmic neural activity in the gamma band (30–100 Hz). These include many sensory and motor areas of cortex, hippocampus, parietal cortex, and striatum (e.g., Bauer et al., 2006; Berke et al., 2004; Brosch et al., 2002; Csicsvari et al., 2003; Hoogenboom et al., 2006; Leung and Yim, 1993; Womelsdorf et al., 2006). In all cases, it is thought that the inhibitory interneuron networks within each structure play a major role in generating synchronized gamma oscillations (e.g., Bartos et al., 2007; Vida et al., 2006; Whittington et al., 1995). The functional importance of gamma oscillations remains debated. However, since gamma oscillations tend to occur intermittently (i.e., in the form of a 'gamma burst' of about 150–250 ms followed by periods of desynchronous activity), information carried by the cells that participate in a gamma burst effectively becomes a noticeable punctate signal against a background of disorganized neural activity. For this reason, it has been suggested that gamma bursts represent a fundamental mechanism by which information becomes segmented and/or filtered within a structure, as well as a way to coordinate information across structures (Buzsaki, 2006). Although theta and gamma frequencies vary considerably (perhaps reflecting the type of information that each rhythm coordinates), there are many common physiological and behavioral relationships that suggest they are components of a coordinated, larger-scale oscillatory network. For example, similar to theta rhythms, single unit responses recorded simultaneously with gamma oscillations have been found to have specific phase relationships to the gamma rhythm (e.g., Berke, 2009; Kalenscher et al., 2010; van der Meer and Redish, 2009). Also, it is hypothesized that gamma oscillations may effectively select salient information that can come to impact decisions, learning, and behavioral responses (e.g., Kalenscher et al., 2010; van der Meer and Redish, 2009), since their appearance is often in relation to task-relevant events. Another similarity with the theta system is that the occurrence of gamma oscillations appears to be at least in part regulated by the dopamine system (Berke, 2009).

7.2.3. Coordination of theta and gamma rhythms

It appears that task demands dictate the nature of neural synchrony across distal brain structures, suggesting that coordination of neural activity across brain structures has at least a mnemonic component. A recent study (Fujisawa and Buzsaki, 2010) showed that such an influence may come in the form of a very low frequency (4 Hz) entrainment of local field potentials across brain areas (e.g., of the 7–12 Hz theta oscillation). In that study, a 4 Hz rhythm emerged only during phases of a maze task when rats made decisions (i.e., in the stem of a T-maze). During decision periods, the 4 Hz rhythm was phase locked to the theta oscillations in both the prefrontal cortex and VTA. Some of the individual prefrontal and VTA neurons were also phase locked to the hippocampal theta oscillation at this time. Importantly, the 4 Hz rhythm was present only during a decision making period when theta oscillations were also present. The findings of the Fujisawa
and Buzsaki (2010) study suggest that a 4 Hz rhythm may coordinate activity in distal brain structures specifically as animals make decisions during goal-directed navigation. It remains to be seen whether dopamine selectively activates the 4 Hz rhythm when decisions need to be made.

8. Neural systems coordination: decisions and common foraging behaviors

Successful decisions during goal-directed navigation likely depend on a hierarchy of systems- and cellular-level interactions in the brain. The accompanying video (http://depts.washington.edu/mizlab) demonstrates, on a basic level, the relative involvement of the hippocampus, the dopamine system, and the ventral and dorsal (medial and lateral) striatum during a simple food search task on a laboratory maze. Particular attention is paid to the relative contributions of these brain areas during each of the five 'states' of processing in Fig. 3, and as a function of novel exposure, new learning, and asymptotic performance levels.

To illustrate in more detail the functional interactions of the same brain regions during common foraging scenarios, the following are neural and behavioral explanations for how animals make adaptive choices while navigating familiar environments, how decisions are adjusted when familiar conditions change, and then how this same circuitry mediates rapid and adaptive learning when animals find themselves in novel situations.

8.1. Goal directed navigation in a familiar context

There is clearly a home court advantage when it comes to an animal's survival. If animals are familiar with their environment, they are more likely to make good choices when it comes to deciding when and where to secure food, safe shelter, and mates. This is the case not only because animals have learned the physical characteristics of the environment, but perhaps more importantly because they have learned to identify its salient features. These salient features have taken on predictive value based on the expected probability of reward given certain levels of effort. This information can be used to make choices that are appropriate for different motivational and behavioral states. Under constant conditions, obtaining a predicted outcome should result in the strengthening of the memories that were used to guide decisions and behavioral choices in the first place.

It is postulated that the motivational state of an animal predisposes it to pay attention to specific cues within a familiar environment, cues that have been previously associated with goal acquisition. In this way, the memories of past behavioral outcomes of, for example, a hungry rat define the appropriate behavioral responses needed to obtain maximum amounts of food with minimal effort or temporal delay. Based on the extensive literature summarized previously, it seems reasonable to assume that when a rat enters a familiar environment in search of food, its translational movement generates (movement-sensitive) theta rhythms in hippocampal regions, resulting in the activation of a spatial coordinate system that in turn imposes an experience-determined spatial organization on information used during the current event. The clearest neural instantiation of such an organization (often referred to as a spatial reference frame, map, or chart) is represented by the grid cells of medial entorhinal cortex. While there remain unresolved issues about how such a reference system actually works (e.g., does a given 'map' reset during a single navigational event, and if so, how and under what conditions?), the current view is that learned spatial and nonspatial information arrive in the hippocampus via the medial and lateral entorhinal cortices, respectively. Upon entering a familiar environment, the medial entorhinal spatial reference includes not only a representation of the current spatial structure of an environment, but also an experience-dependent definition of a rat's expectations for the sensory environment that is, itself, influenced by the appropriate behavioral repertoire and by expectations about the consequences of decisions and choices. Lateral entorhinal cortex is presumably also activated by current (but in this case nonspatial) sensory input, as well as by the same set of expectations (i.e., memories) that influence medial entorhinal cortical processing. With the combined input from medial and lateral entorhinal cortex, the hippocampus can determine the extent to which the rat's (spatial and nonspatial) expectations for the current context are met.

When goals are achieved as predicted (e.g., food is found in expected locations), hippocampal output may have the effect of strengthening currently active memory circuits, thereby increasing the likelihood that the same decisions and behaviors will be selected the next time the rat is in the same familiar situation. The signal strength to ventral striatum would be expected to be moderate, resulting in ventral striatal output that maintains a baseline level of inhibitory control over VTA neural responses to reward encounters. That is, when rats encounter rewards in expected locations, there should be no VTA response to the reward encounter itself. If an animal finds itself engaging in rather stereotyped or habitual behaviors in the familiar environment, it is likely that the dorsolateral striatum exerts more control over behavior than ventral striatum, since dorsolateral striatum is particularly involved in the performance of habitual behaviors (e.g., Atallah et al., 2007; Jog et al., 1999; Thorn et al., 2010; Yin and Knowlton, 2004; Yin et al., 2009, as discussed above).

VTA dopamine neurons are known to increase firing when an animal encounters cues that predict reward (in familiar test conditions; e.g., Puryear et al., 2010; Schultz et al., 1997). These cue-elicited responses may arrive from the frontal cortex, as there is little evidence of predictive cue processing in at least two other major VTA afferent structures (e.g., the PPTg and LDTg). Thus, during navigation in a familiar environment, both frontal cortex and hippocampus may determine the timing of dopamine cells' contribution to reward processing. Although the details of the underlying neurocircuitry are presently not clear, this pattern of dopamine cell firing to cues and rewards results in the maintenance of the currently active memory networks.

8.2. Goal directed navigation in a familiar context following a significant change in context

The natural environment is a continuously changing one. Thus, even when a rat navigates a familiar environment, the hippocampus should automatically and continuously evaluate the saliency of the current context. In that way, when a rat encounters a change in the expected matrix of context information, hippocampal output can immediately reflect the detected change to assess the need to change decisions and behaviors. Note that since a given context is comprised of multiple features, a detectable change in any one feature should result in a signal that the context is different. The impact of detecting a context change on subsequent behaviors depends on the processing within efferent target structures.

When an unexpected behavioral outcome or stimulus configuration occurs in a familiar environment, rats increase exploratory activity and attention to potential cues. The latter would be expected to result from the reorganization of spatial representations (e.g., grid and place cells) in hippocampal systems. The hippocampal reorganization would in turn generate an output that reflects the context change. In anticipation of receipt of new information, striatal theories (e.g., Belin and Everitt, 2008; Humphries and Prescott, 2010; Salamone et al., 2009) suggest that when there is a significant change in a familiar environment,
ventral striatum may come to play a greater role in behavioral control than dorsal striatum. According to the circuitry presented in Fig. 6, hippocampal output to the ventral striatum can potentially activate two pathways of information flow to the VTA. According to a scenario described by Humphries and Prescott (2010), the ventral striatum in turn relays information about reward expectations via a direct inhibitory pathway to the VTA, and information about the actual rewards via an indirect excitatory pathway (the ventral pallidum and the PPTg) to the VTA. When the actual rewards occur as expected, there is comparable inhibitory and excitatory control over dopamine cell responses to reward. This balanced pattern of input results in no response to rewards by dopamine neurons. Indeed, dopamine cells do not respond to the acquisition of expected rewards. If, however, the actual reward is greater than expected, the excitatory drive should be greater than the inhibitory one, resulting in increased firing to reward by dopamine cells. Perhaps the increased excitatory ventral striatal input transitions dopamine cell membranes to a relatively depolarized state. On the other hand, if the actual reward is less than expected, the inhibitory drive becomes greater than the excitatory one, and this is manifest as reduced firing at the time of expected rewards. Either of these altered dopamine responses to reward has been interpreted as a 'teaching signal' for other neural systems (Schultz and Dickinson, 2000).

The outcome of a striatal/VTA evaluation of the reinforcement outcomes of context-dependent behaviors is likely used by striatal efferent systems to modify decisions about which behaviors to engage and which memories to modify. As memories become updated, so do the expectations for a given spatial context. Assuming that the expected spatial context input to hippocampus is continuously refreshed, the context discrimination can always proceed with the most recent information from neocortex.

8.3. Goal directed navigation in a novel context

Recent evidence suggests that at least rats have an innate, though initially rudimentary, spatial navigation-related neural network that continues to develop over time (Langston et al., 2010; Wills et al., 2010). While the directional heading circuitry appears adult-like from a very young age, the grid and location systems take more time to develop. As experiences accumulate over a lifetime, then, so might the efficiency of a context-dependent navigation circuit. Learning is faster when the outcomes of behaviors are predictable, and predictability can be enhanced
Fig. 10. Orchestration of neural systems while animals make decisions during goal-directed navigation. Accurate goal-directed navigation requires precise integration of multiple types of information (e.g., context salience, reward salience, expectations (based on memories), and one's behavioral state). Based on the current literature, it is clear that all of these types of information are represented in some way within different neural systems. For illustration purposes, only the hippocampus, dopamine system, and ventral and dorsal striatum are shown. Thus, the nature of the information represented does not clearly reveal the unique contributions of any one of these neural systems to goal-directed navigation. Rather, the specialized contributions of different neural systems must be defined by their computational capacities (i.e., their intrinsic patterns of neural connectivity) and the particular efferent structures that receive their output messages. Converging evidence supports the view that hippocampal output reflects an evaluation of the salience of the current context, dopamine cells signal changes in expected reward values (and in doing so serve as a 'teaching signal' that updates processing in efferent structures), the ventral striatum determines whether the outcomes of behavior were predicted, and dorsal striatum selects the appropriate behavior based on the ventral striatal analysis. Especially during new learning, the dorsomedial striatum plays this 'actor' role for model-based learning. As learning and performance become model free, the dorsolateral striatum serves the 'actor' role. These neural systems do not necessarily function independently. Rather, emerging findings show that, depending on specific task demands, neural activity may become synchronized across combinations of two or three brain structures according to theta and gamma rhythm frequencies. Importantly, the synchronization appears to happen at times when decisions should be made. This suggests that there may be some overarching factor that determines when systems interactions will occur. One possibility is that a very low frequency oscillation (4 Hz) coordinates the theta and gamma coherence that has been observed between neural systems (Fujisawa and Buzsaki, 2010). Since general physiological states are known to alter patterns of neural representations during learning (e.g., Kennedy and Shapiro, 2004), it is suggested here that physiological states, such as hunger, fear, and stress, may determine the kind of neural systems orchestration that needs to take place in order for animals to make optimal decisions relative to the achievement of specific kinds of goals.
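The division of labor summarized in the caption (a ventral striatal 'critic' that compares expected and actual reward, a dopamine-like prediction-error 'teaching signal', and a dorsal striatal 'actor' that selects behavior) corresponds to the actor-critic architecture of reinforcement learning. The following is a minimal illustrative sketch of that architecture, not a model taken from the article; the two-arm choice, learning rates, and reward values are hypothetical, chosen only to show the roles of the three components.

```python
import random

# Toy actor-critic at a single maze junction with two arms.
# All states, parameters, and rewards are hypothetical illustrations.

random.seed(1)

alpha_critic = 0.1   # learning rate of the 'critic' (ventral striatum analog)
alpha_actor = 0.1    # learning rate of the 'actor' (dorsal striatum analog)

V = 0.0                              # critic's reward expectation at the junction
prefs = {"left": 0.0, "right": 0.0}  # actor's preferences for each arm


def choose(prefs, epsilon=0.1):
    """Mostly pick the currently preferred arm; occasionally explore."""
    if random.random() < epsilon:
        return random.choice(list(prefs))
    return max(prefs, key=prefs.get)


for trial in range(200):
    action = choose(prefs)
    reward = 1.0 if action == "left" else 0.0  # food is hidden in the left arm

    # Dopamine-like teaching signal: actual minus expected reward.
    # It is zero for a fully predicted reward, positive when the outcome
    # exceeds expectation, and negative when an expected reward is omitted.
    delta = reward - V

    V += alpha_critic * delta             # critic updates its expectation
    prefs[action] += alpha_actor * delta  # actor reinforces/suppresses the choice

print(f"expectation V = {V:.2f}, prefers left: {prefs['left'] > prefs['right']}")
```

After training, the critic's estimate `V` approaches the obtained reward, so the teaching signal at the rewarded arm shrinks toward zero, paralleling the disappearance of dopamine responses to fully expected rewards described in Section 8.2, while the actor comes to prefer the rewarded arm.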
even in a novel environment, if the significance of at least a subset of contextual features can be inferred from past experiences with similar features.

Novelty coding by the navigational circuit is typically tested with rats that have been trained to forage for food in one environment, then placed in a new testing environment but asked to perform the same behaviors (e.g., search for randomly placed food in a novel open arena). Thus, the task rules and motor instructions for the novel context are previously learned, but there is novelty in terms of the cues that are present to inform goal-directed choices. The familiar features (e.g., the narrow alleys of a maze, an enclosed testing area with cues, and the fact that rewards can be found on such mazes) should immediately activate a 'best match' reference frame that can be used to guide initial exploration and goal-directed decisions. As consequences of choices occur and learning takes place, the difference between expected and actual context will diminish, and the relevant memory and reference frame will be updated accordingly. At the point when the expected contexts and behavioral outcomes match what actually occurs, one can conclude that learning is complete. This new learning process may use similar neural circuitry as that described above, when information about changes in an expected context updates memories. In this way, behaviors that increase cue predictability and reduce unexpected outcomes will become associated with specific cues.

If a rat with no testing experience is placed in an experimental arena for the first time, the rat may still bring to bear a 'best match' option, or some minimal form of spatial reference within which to incorporate new information into the memories that are being created during learning. For example, the rat may have learned the identity of a safe new food, but now needs to learn the rules that lead to the efficient, most cost-effective strategy for securing the food. Compared to a foraging situation in which there are slight changes in a familiar context, it should take more trials or more time to reach the point when the expectations match the actual outcomes (i.e., when learning is complete).

9. The challenges ahead

A big challenge facing the general field of neuroscience is to understand the dynamic neural mechanisms that underlie complex and adaptive natural behaviors. A first step toward addressing this challenge could be to integrate existing literatures on specific components of the adaptive behavior of interest, such as the context processing and decision-making that occur during goal-directed navigation. In addition, new findings indicate that decision making during navigation is a powerful model not only for defining neural and behavioral states that are relevant to this behavior, but also for understanding how these states switch processing modes during natural learning situations. The identification of such 'switching mechanisms' is important for our understanding of what leads to decisions to 'stay the course' or change behaviors. It is proposed that the motivational state of the animal establishes the intended goals, and as such sets the thresholds for, and constraints on, neural activation across multiple brain structures. A summary of key elements of this model is shown in Fig. 10.

An explanation of the neurobiological mechanisms that support decisions during goal-directed navigation will undoubtedly become more complex. This is the case not only because of technological advances in our ability to probe brain function, but also because of the following:

(a) There are other important contributing factors that were not discussed here. Examples include the possible roles of serotonin, acetylcholine, enkephalins, A2A receptors, and GABA in reinforcement learning (e.g., Doya, 2008; Farrar et al., 2007, 2008, 2010; Font et al., 2008; Mingote et al., 2008a,b; Miyazaki et al., 2011; Mott et al., 2009; Ragozzino, 2003; Ragozzino et al., 2009; Worden et al., 2009).

(b) There are many unanswered questions regarding the role of dopamine in decision making and learning. For example, does dopamine have the same impact on synaptic and behavioral functions in all brain regions that receive dopamine inputs? The answer is likely yes and no. Dopamine appears to facilitate excitation in efferent structures, although the details, including their time course, may vary. Even if the degree of excitability were the same in different brain areas, the impact on behavior will likely be different, since different structures (e.g., hippocampus and striatum) engage unique intrinsic computational architectures to process similar information (e.g., spatial, movement, and reward). Another critical issue whose resolution will impact future theoretical explanations of decision making during navigation is the regulation and meaning of tonic release of dopamine. For instance, tonic levels of dopamine may contribute to defining the overall motivation or goals during navigation (e.g., Niv et al., 2007).

(c) When recording in navigating animals, it is clear that against a foreground of interesting task-relevant firing is a background of neural codes for the egocentric movements exhibited by the animal. The meaning of this seemingly universal coding of egocentric information remains elusive. An intriguing possibility is that such codes guide specific task-relevant codes in a manner analogous to the way that intended movements appear to bias sensory responses by cortical neurons (e.g., Colby, 1998; Colby and Goldberg, 1999). Interestingly, the movement-related cells are often interpreted as reflecting the firing patterns of inhibitory interneurons, the specific function of which is only beginning to be appreciated.

The existence of many unresolved issues should not deter continued and intensive investigation of the adaptive navigation-based heuristic for complex learning situations. Rather, because it is evolutionarily highly conserved, this model holds great promise for continuing to reveal fundamental organizing principles within and across neural systems, as well as between neural systems functions and behavior.

Acknowledgements

We thank Yong Sang Jo for helpful comments on earlier versions of this manuscript and for producing all of the figures, Trevor Bortins for producing the video linked to the article, Daniela Jaramillo for help managing references, Drs. Jeremy Clark and Andrea Stocco for insightful discussion regarding the striatum, and Dr. Van Redila for comments on an earlier version. We also thank anonymous reviewers for their comments. This work is funded by NIMH grant MH58755.

References

Aberman, J.E., Salamone, J.D., 1999. Nucleus accumbens dopamine depletions make rats more sensitive to high ratio requirements but do not impair primary food reinforcement. Neuroscience 92, 545–552.
Aberman, J.E., Ward, S.J., Salamone, J.D., 1998. Effects of dopamine antagonists and accumbens dopamine depletions on time-constrained progressive-ratio performance. Pharmacol. Biochem. Behav. 61, 341–348.
Albin, R.L., Young, A.B., Penney, J.B., 1989. The functional anatomy of basal ganglia disorders. Trends Neurosci. 12, 366–375.
Alderson, H.L., Latimer, M.P., Winn, P., 2008. A functional dissociation of the anterior and posterior pedunculopontine tegmental nucleus: excitotoxic lesions have differential effects on locomotion and the response to nicotine. Brain Struct. Funct. 213, 247–253.
Aldridge, J.W., Berridge, K.C., 1998. Coding of serial order by neostriatal neurons: a "natural action" approach to movement sequence. J. Neurosci. 18, 2777–2787.
Alexander, G.E., Crutcher, M.D., 1990a. Functional architecture of basal ganglia circuits: neural substrates of parallel processing. Trends Neurosci. 13, 266–271.
Alexander, G.E., Crutcher, M.D., 1990b. Preparation for movement: neural representations of intended direction in three motor areas of the monkey. J. Neurophysiol. 64, 133–150.
Alexander, G.E., DeLong, M.R., Strick, P.L., 1986. Parallel organization of functionally segregated circuits linking basal ganglia and cortex. Annu. Rev. Neurosci. 9, 357–381.
Allers, K.A., Ruskin, D.N., Bergstrom, D.A., Freeman, L.E., Ghazi, L.J., Tierney, P.L., Walters, J.R., 2002. Multisecond periodicities in basal ganglia firing rates correlate with theta bursts in transcortical and hippocampal EEG. J. Neurophysiol. 87, 1118–1122.
Amaral, D.G., Ishizuka, N., Claiborne, B., 1990. Neurons, numbers and the hippocampal network. Prog. Brain Res. 83, 1–11.
Amaral, D.G., Lavenex, P., 2006. Hippocampal neuroanatomy. In: Anderson, P., Morris, R., Amaral, D., Bliss, T., O'Keefe, J. (Eds.), The Hippocampus. Oxford University Press, Oxford.
Anagnostaras, S.G., Gale, G.D., Fanselow, M.S., 2001. Hippocampus and contextual fear conditioning: recent controversies and advances. Hippocampus 11, 8–17.
Behr, J., Gloveli, T., Schmitz, D., Heinemann, U., 2000. Dopamine depresses excitatory synaptic transmission onto rat subicular neurons via presynaptic D1-like dopamine receptors. J. Neurophysiol. 84, 112–119.
Belin, D., Everitt, B.J., 2008. Cocaine seeking habits depend upon dopamine-dependent serial connectivity linking the ventral with the dorsal striatum. Neuron 57, 432–441.
Benchenane, K., Peyrache, A., Khamassi, M., Tierney, P.L., Gioanni, Y., Battaglia, F.P., Wiener, S.I., 2010. Coherent theta oscillations and reorganization of spike timing in the hippocampal–prefrontal network upon learning. Neuron 66, 921–936.
Beninato, M., Spencer, R.F., 1987. A cholinergic projection to the rat substantia nigra from the pedunculopontine tegmental nucleus. Brain Res. 412, 169–174.
Berendse, H.W., Galis-de Graaf, Y., Groenewegen, H.J., 1992a. Topographical organization and relationship with ventral striatal compartments of prefrontal corticostriatal projections in the rat. J. Comp. Neurol. 316, 314–347.
Berendse, H.W., Groenewegen, H.J., Lohman, A.H., 1992b. Compartmental distribution of ventral striatal neurons projecting to the mesencephalon in the rat. J. Neurosci. 12, 2079–2103.
Berke, J.D., 2009. Fast oscillations in cortical–striatal networks switch frequency
Anderson, M.I., Jeffery, K.J., 2003. Heterogeneous modulation of place cell firing by following rewarding events and stimulant drugs. Eur. J. Neurosci. 30, 848–
changes in context. J. Neurosci. 23, 8827–8835. 859.
Anderson, O., 1984. Optimal foraging by largemouth bass in structured environ- Berke, J.D., Okatan, M., Skurski, J., Eichenbaum, H.B., 2004. Oscillatory entrainment
ments. Ecology 65, 851–861. of striatal neurons in freely moving rats. Neuron 43, 883–896.
Annett, L.E., McGregor, A., Robbins, T.W., 1989. The effects of ibotenic acid lesions of Berridge, K.C., 2007. The debate over dopamine’s role in reward: the case for
the nucleus accumbens on spatial learning and extinction in the rat. Behav. incentive salience. Psychopharmacology (Berl) 191, 391–431.
Brain Res. 31, 231–242. Berridge, K.C., Robinson, T.E., 1998. What is the role of dopamine in reward: hedonic
Aragona, B.J., Day, J.J., Roitman, M.F., Cleaveland, N.A., Wightman, R.M., Carelli, R.M., impact, reward learning, or incentive salience? Brain Res. Brain Res. Rev. 28,
2009. Regional specificity in real-time development pf phasic dopamine trans- 309–369.
mission patterns during acquisition of a cue-cocaine association in rats. Eur. J. Berridge, K.C., Whishaw, I.Q., 1992. Cortex, striatum and cerebellum: control of
Neurosci. 30, 1889–1899. serial order in a grooming sequence. Exp. Brain Res. 90, 275–290.
Astur, R.S., Ortiz, M.L., Sutherland, R.J., 1998. A characterization of performance by Bethus, I., Tse, D., Morris, R.G., 2010. Dopamine and memory: modulation of the
men and women in a virtual Morris water task: a large and reliable sex persistence of memory for novel hippocampal NMDA receptor-dependent
difference. Behav. Brain Res. 93, 185–190. paired associates. J. Neurosci. 30, 1610–1618.
Atallah, H.E., Lopez-Paniagua, D., Rudy, J.W., O’Reilly, R.C., 2007. Separate neural Bezzina, G., Body, S., Cheung, T.H., Hampson, C.L., Deakin, J.F., Anderson, I.M.,
substrates for skill learning and performance in the ventral and dorsal striatum. Szabadi, E., Bradshaw, C.M., 2008. Effect of quinolinic acid-induced lesions of
Nat. Neurosci. 10, 126–131. the nucleus accumbens core on performance on a progressive ratio schedule of
Bach, M.E., Barad, M., Son, H., Zhuo, M., Lu, Y.F., Shih, R., Mansuy, I., Hawkins, R.D., reinforcement: implications for inter-temporal choice. Psychopharmacology
Kandel, E.R., 1999. Age-related defects in spatial memory are correlated with (Berl) 197, 339–350.
defects in the late phase of hippocampal long-term potentiation in vitro and are Bjorklund, A., Dunnett, S.B., 2007. Dopamine neuron systems in the brain: an
attenuated by drugs that enhance the cAMP signaling pathway. Proc. Natl. Acad. update. Trends Neurosci. 30, 194–202.
Sci. U.S.A. 96, 5280–5285. Boeijinga, P.H., Mulder, A.B., Pennartz, C.M., Manshanden, I., Lopes da Silva, F.H.,
Balleine, B.W., Delgado, M.R., Hikosaka, O., 2007. The role of the dorsal striatum in 1993. Responses of the nucleus accumbens following fornix/fimbria stimula-
reward and decision-making. J. Neurosci. 27, 8161–8165. tion in the rat. Identification and long-term potentiation of mono- and poly-
Balleine, B.W., Dickinson, A., 1998. Goal-directed instrumental action: contingency synaptic pathways. Neuroscience 53, 1049–1058.
and incentive learning and their cortical substrates. Neuropharmacology 37, Bornstein, A.M., Daw, N.D., 2011. Multiplicity of control in the basal ganglia:
407–419. computational roles of striatal subregions. Curr. Opin. Neurobiol. 21, 374–
Balleine, B.W., Liljeholm, M., Ostlund, S.B., 2009. The integrative function of the 380.
basal ganglia in instrumental conditioning. Behav. Brain Res. 199, 43–52. Botvinick, M.M., Niv, Y., Barto, A.C., 2009. Hierarchically organized behavior and its
Balleine, B.W., O’Doherty, J.P., 2010. Human and rodent homologies in action neural foundations: a reinforcement learning perspective. Cognition 113, 262–
control: corticostriatal determinants of goal-directed and habitual action. 280.
Neuropsychopharmacology 35, 48–69. Boyd, L.A., Edwards, J.D., Siengsukon, C.S., Vidoni, E.D., Wessel, B.D., Linsdell, M.A.,
Bardgett, M.E., Depenbrock, M., Downs, N., Points, M., Green, L., 2009. Dopamine 2009. Motor sequence chunking is impaired by basal ganglia stroke. Neurobiol.
modulates effort-based decision making in rats. Behav. Neurosci. 123, 242–251. Learn. Mem. 92, 35–44.
Bardo, M.T., Donohew, R.L., Harrington, N.G., 1996. Psychobiology of novelty Brischoux, F., Chakraborty, S., Brierley, D.I., Ungless, M.A., 2009. Phasic excitation of
seeking and drug seeking behavior. Behav. Brain Res. 77, 23–43. dopamine neurons in ventral VTA by noxious stimuli. Proc. Natl. Acad. Sci. U.S.A.
Bardo, M.T., Dwoskin, L.P., 2004. Biological connection between novelty- and drug- 106, 4894–4899.
seeking motivational systems. Nebr. Symp. Motiv. 50, 127–158. Bromberg-Martin, E.S., Hikosaka, O., 2009. Midbrain dopamine neurons signal
Bardo, M.T., Neisewander, J.L., Pierce, R.C., 1989. Novelty-induced place preference preference for advance information about upcoming rewards. Neuron 63,
behavior in rats: effects of opiate and dopaminergic drugs. Pharmacol. Biochem. 119–126.
Behav. 32, 683–689. Bromberg-Martin, E.S., Matsumoto, M., Hikosaka, O., 2010. Dopamine in motiva-
Barnes, C.A., 1979. Memory deficits associated with senescence: a neurophysiologi- tional control: rewarding, aversive, and alerting. Neuron 68, 815–834.
cal and behavioral study in the rat. J. Comp. Physiol. Psychol. 93, 74–104. Brosch, M., Budinger, E., Scheich, H., 2002. Stimulus-related gamma oscillations in
Barnes, C.A., McNaughton, B.L., Mizumori, S.J., Leonard, B.W., Lin, L.H., 1990. primate auditory cortex. J. Neurophysiol. 87, 2715–2725.
Comparison of spatial and temporal characteristics of neuronal activity in Brown, P.L., Jenkins, H.M., 1968. Auto-shaping of the pigeon’s key-peck. J. Exp. Anal.
sequential stages of hippocampal processing. Prog. Brain Res. 83, 287–300. Behav. 11, 1–8.
Barnes, T.D., Kubota, Y., Hu, D., Jin, D.Z., Graybiel, A.M., 2005. Activity of striatal Burgess, N., Barry, C., O’Keefe, J., 2007. An oscillatory interference model of grid cell
neurons reflects dynamic encoding and recoding of procedural memories. firing. Hippocampus 17, 801–812.
Nature 437, 1158–1161. Burgess, N., Maguire, E.A., O’Keefe, J., 2002. The human hippocampus and spatial
Barnes, T.D., Mao, J.B., Hu, D., Kubota, Y., Dreyer, A.A., Stamoulis, C., Brown, E.N., and episodic memory. Neuron 35, 625–641.
Graybiel, A.M., 2011. Advance-cueing produces enhanced action-boundary Burns, L.H., Annett, L., Kelley, A.E., Everitt, B.J., Robbins, T.W., 1996. Effects of lesions
patterns of spike activity in the sensorimotor striatum. J. Neurophysiol. 105, to amygdala, ventral subiculum, medial prefrontal cortex, and nucleus accum-
1861–1878. bens on the reaction to novelty: implication for limbic–striatal interactions.
Barry, C., Hayman, R., Burgess, N., Jeffery, K.J., 2007. Experience-dependent rescaling Behav. Neurosci. 110, 60–73.
of entorhinal grids. Nat. Neurosci. 10, 682–684. Burwell, R.D., 2000. The parahippocampal region: corticocortical connectivity. Ann.
Bartos, M., Vida, I., Jonas, P., 2007. Synaptic mechanisms of synchronized gamma N. Y. Acad. Sci. 911, 25–42.
oscillations in inhibitory interneuron networks. Nat. Rev. Neurosci. 8, 45–56. Burwell, R.D., Amaral, D.G., 1998a. Cortical afferents of the perirhinal, postrhinal,
Bauer, M., Oostenveld, R., Peeters, M., Fries, P., 2006. Tactile spatial attention and entorhinal cortices of the rat. J. Comp. Neurol. 398, 179–205.
enhances gamma-band activity in somatosensory cortex and reduces low- Burwell, R.D., Amaral, D.G., 1998b. Perirhinal and postrhinal cortices of the rat:
frequency activity in parieto-occipital areas. J. Neurosci. 26, 490–501. interconnectivity and connections with the entorhinal cortex. J. Comp. Neurol.
Baunez, C., Robbins, T.W., 1999. Effects of dopamine depletion of the dorsal striatum 391, 293–321.
and further interaction with subthalamic nucleus lesions in an attentional task Bussey, T.J., Everitt, B.J., Robbins, T.W., 1997. Dissociable effects of cingulate and
in the rat. Neuroscience 92, 1343–1356. medial frontal cortex lesions on stimulus–reward learning using a novel
Bayer, H.M., Glimcher, P.W., 2005. Midbrain dopamine neurons encode a quantita- Pavlovian autoshaping procedure for the rat: implications for the neurobiology
tive reward prediction error signal. Neuron 47, 129–141. of emotion. Behav. Neurosci. 111, 908–919.
Beckstead, R.M., Domesick, V.B., Nauta, W.J., 1979. Efferent connections of the Buzsaki, G., 1989. Two-stage model of memory trace formation: a role for ‘‘noisy’’
substantia nigra and ventral tegmental area in the rat. Brain Res. 175, 191–217. brain states. Neuroscience 31, 551–570.
M.R. Penner, S.J.Y. Mizumori / Progress in Neurobiology 96 (2012) 96–135 127
Buzsaki, G., 2005. Theta rhythm of navigation: link between path integration and Day, J.J., Carelli, R.M., 2007. The nucleus accumbens and Pavlovian reward learning.
landmark navigation, episodic and semantic memory. Hippocampus 15, 827– Neuroscientist 13, 148–159.
840. Day, J.J., Jones, J.L., Carelli, R.M., 2011. Nucleus accumbens neurons encode predicted
Buzsaki, G., 2006. Rhythms of the Brain. Oxford Press, NY. and ongoing reward costs in rats. Eur. J. Neurosci. 33, 308–321.
Buzsaki, G., Chrobak, J.J., 2005. Synaptic plasticity and self-organization in the Day, J.J., Roitman, M.F., Wightman, R.M., Carelli, R.M., 2007. Associative learning
hippocampus. Nat. Neurosci. 8, 1418–1420. mediates dynamic shifts in dopamine signaling in the nucleus accumbens. Nat.
Cardinal, R.N., Pennicott, D.R., Sugathapala, C.L., Robbins, T.W., Everitt, B.J., 2001. Neurosci. 10, 1020–1028.
Impulsive choice induced in rats by lesions of the nucleus accumbens core. Day, J.J., Wheeler, R.A., Roitman, M.F., Carelli, R.M., 2006. Nucleus accumbens
Science 292, 2499–2501. neurons encode Pavlovian approach behaviors: evidence from an autoshaping
Carelli, R.M., Ijames, S.G., 2001. Selective activation of accumbens neurons by paradigm. Eur. J. Neurosci. 23, 1341–1351.
cocaine-associated stimuli during a water/cocaine multiple schedule. Brain Dayan, P., Daw, N.D., 2008. Decision theory, reinforcement learning, and the brain.
Res. 907, 156–161. Cogn. Affect. Behav. Neurosci. 8, 429–453.
Carr, D.B., Sesack, S.R., 2000. Projections from the rat prefrontal cortex to the ventral Dayan, P., Niv, Y., 2008. Reinforcement learning: the good, the bad and the ugly.
tegmental area: target specificity in the synaptic associations with mesoac- Curr. Opin. Neurobiol. 18, 185–196.
cumbens and mesocortical neurons. J. Neurosci. 20, 3864–3873. De Leonibus, E., Pascucci, T., Lopez, S., Oliverio, A., Amalric, M., Mele, A., 2007. Spatial
Carr, H.A., 1917. The distribution and elimination of errors in the maze. J. Anim. deficits in a mouse model of Parkinson disease. Psychopharmacology (Berl) 194,
Behav. 7, 145–159. 517–525.
Charnov, E.L., 1976. Optimal foraging, the marginal value theorem. Theor. Popul. De Leonibus, E., Verheij, M.M., Mele, A., Cools, A., 2006. Distinct kinds of novelty
Biol. 9, 129–136. processing differentially increase extracellular dopamine in different brain
Christoph, G.R., Leonzio, R.J., Wilcox, K.S., 1986. Stimulation of the lateral habenula regions. Eur. J. Neurosci. 23, 1332–1340.
inhibits dopamine-containing neurons in the substantia nigra and ventral DeCoteau, W.E., Kesner, R.P., 2000. A double dissociation between the rat hippo-
tegmental area of the rat. J. Neurosci. 6, 613–619. campus and medial caudoputamen in processing two forms of knowledge.
Chudasama, Y., Robbins, T.W., 2006. Functions of frontostriatal systems in cogni- Behav. Neurosci. 114, 1096–1108.
tion: comparative neuropsychopharmacological studies in rats, monkeys and DeCoteau, W.E., Thorn, C., Gibson, D.J., Courtemanche, R., Mitra, P., Kubota, Y.,
humans. Biol. Psychol. 73, 19–38. Graybiel, A.M., 2007a. Learning-related coordination of striatal and hippocam-
Clark, J.J., Sandberg, S.G., Wanat, M.J., Gan, J.O., Horne, E.A., Hart, A.S., Akers, C.A., pal theta rhythms during acquisition of a procedural maze task. Proc. Natl. Acad.
Parker, J.G., Willuhn, I., Martinez, V., Evans, S.B., Stella, N., Phillips, P.E., 2010. Sci. U.S.A. 104, 5644–5649.
Chronic microsensors for longitudinal, subsecond dopamine detection in be- DeCoteau, W.E., Thorn, C., Gibson, D.J., Courtemanche, R., Mitra, P., Kubota, Y.,
having animals. Nat. Methods 7, 126–129. Graybiel, A.M., 2007b. Oscillations of local field potentials in the rat dorsal
Colby, C.L., 1998. Action-oriented spatial reference frames in cortex. Neuron 20, 15– striatum during spontaneous and instructed behaviors. J. Neurophysiol. 97,
24. 3800–3805.
Colby, C.L., Goldberg, M.E., 1999. Space and attention in parietal cortex. Annu. Rev. Denk, F., Walton, M.E., Jennings, K.A., Sharp, T., Rushworth, M.F., Bannerman, D.M.,
Neurosci. 22, 319–349. 2005. Differential involvement of serotonin and dopamine systems in cost–
Colgin, L.L., Moser, E.I., Moser, M., 2008. Understanding memory through hippo- benefit decisions about delay or effort. Psychopharmacology (Berl) 179, 587–
campal remapping. Trends Neurosci. 31, 469–477. 596.
Colwill, R.M., Rescorla, R.A., 1990. Effect of reinforcer devaluation on discriminative Derdikman, D., Moser, E.I., 2010. A manifold of spatial maps in the brain. Trends
control of instrumental behavior. J. Exp. Psychol. Anim. Behav. Process 16, 40– Cogn. Sci. 14, 561–569.
47. Devan, B.D., White, N.M., 1999. Parallel information processing in the dorsal
Cooper, B.G., Mizumori, S.J., 2001. Temporary inactivation of the retrosplenial striatum: relation to hippocampal function. J. Neurosci. 19, 2789–2798.
cortex causes a transient reorganization of spatial coding in the hippocampus. Di Ciano, P., Cardinal, R.N., Cowell, R.A., Little, S.J., Everitt, B.J., 2001. Differential
J. Neurosci. 21, 3986–4001. involvement of NMDA, AMPA/kainate, and dopamine receptors in the nucleus
Corbit, L.H., Janak, P.H., 2007. Inactivation of the lateral but not medial dorsal accumbens core in the acquisition and performance of Pavlovian approach
striatum eliminates the excitatory impact of Pavlovian stimuli on instrumental behavior. J. Neurosci. 21, 9471–9477.
responding. J. Neurosci. 27, 13977–13981. Dickinson, A., 1985. Actions and habits: the development of behavioural autonomy.
Corbit, L.H., Janak, P.H., 2010. Posterior dorsomedial striatum is critical for both Philos. Trans. R. Soc. Lond. B: Biol. Sci. 308, 67–78.
selective instrumental and Pavlovian reward learning. Eur. J. Neurosci. 31, Diaz-Fleischer, F., 2005. Predatory behavior and prey-capture decision-making by
1312–1321. the web-weaving spider Micrathena sagittata. Can. J. Zool. Rev. Can. Zool. 83,
Corbit, L.H., Muir, J.L., Balleine, B.W., 2001. The role of the nucleus accumbens in 268–273.
instrumental conditioning: evidence of a functional dissociation between Dormont, J.F., Conde, H., Farin, D., 1998. The role of the pedunculopontine tegmental
accumbens core and shell. J. Neurosci. 21, 3251–3260. nucleus in relation to conditioned motor performance in the cat. I. Context-
Corrado, G.S., Sugrue, L.P., Brown, J.R., Newsome, W.T., 2009. The trouble with dependent and reinforcement-related single unit activity. Exp. Brain Res. 121,
choice: studying decision variables in the brain. In: Glimcher, P.W., Camerer, 401–410.
C.F., Fehr, E., Poldrack, R.A. (Eds.), Neuroeconomics: Decision Making the Brain. Doya, K., 2008. Modulators of decision making. Nat. Neurosci. 11, 410–416.
Elsevier. Dragoi, G., Harris, K.D., Buzsaki, G., 2003. Place representation within hippocampal
Costa, R.M., Cohen, D., Nicolelis, M.A., 2004. Differential corticostriatal plasticity networks is modified by long-term potentiation. Neuron 39, 843–853.
during fast and slow motor skill learning in mice. Curr. Biol. 14, 1124– Eichenbaum, H., Cohen, N.J., 2001. From Conditioning to Conscious Recollection:
1134. Memory Systems of the Brain. Oxford University Press, New York.
Cousins, M.S., Atherton, A., Turner, L., Salamone, J.D., 1996. Nucleus accumbens Eichenbaum, H., Lipton, P.A., 2008. Towards a functional organization of the medial
dopamine depletions alter relative response allocation in a T-maze cost/benefit temporal lobe memory system: role of the parahippocampal and medial
task. Behav. Brain Res. 74, 189–197. entorhinal cortical areas. Hippocampus 18, 1314–1324.
Cousins, M.S., Wei, W., Salamone, J.D., 1994. Pharmacological characterization of El-Ghundi, M., Fletcher, P.J., Drago, J., Sibley, D.R., O’Dowd, B.F., George, S.R., 1999.
performance on a concurrent lever pressing/feeding choice procedure: effects Spatial learning deficit in dopamine D(1) receptor knockout mice. Eur. J.
of dopamine antagonist, cholinomimetic, sedative and stimulant drugs. Psycho- Pharmacol. 383, 95–106.
pharmacology (Berl) 116, 529–537. Engel, A.K., Fries, P., Singer, W., 2001. Dynamic predictions: oscillations and
Cowie, R.J., 1977. Optimal foraging in great tits (Parus major). Nature 268, 137–139. synchrony in top-down processing. Nat. Rev. Neurosci. 2, 704–716.
Cromwell, H.C., Schultz, W., 2003. Effects of expectations for different reward Enomoto, T., Floresco, S.B., 2009. Disruptions in spatial working memory, but not
magnitudes on neuronal activity in primate striatum. J. Neurophysiol. 89, short-term memory, induced by repeated ketamine exposure. Prog. Neurop-
2823–2838. sychopharmacol. Biol. Psychiatry 33, 668–675.
Csicsvari, J., Jamieson, B., Wise, K.D., Buzsaki, G., 2003. Mechanisms of gamma Estes, W.K., 1943. Discriminative conditioning. I. A discriminative property of
oscillations in the hippocampus of the behaving rat. Neuron 37, 311–322. conditioned anticipation. J. Exp. Psychol. 32, 150–155.
Dalley, J.W., Cardinal, R.N., Robbins, T.W., 2004. Prefrontal executive and cognitive Estes, W.K., 1948. Discriminative conditioning. II. Effects of a Pavlovian conditioned
functions in rodents: neural and neurochemical substrates. Neurosci. Biobehav. stimulus upon a subsequently established operant response. J. Exp. Psychol. 38,
Rev. 28, 771–784. 173–177.
Da Cunha, C., Wietzikoski, S., Wietzikoski, E.C., Miyoshi, E., Ferro, M.M., Anselmo- Etienne, A.S., Jeffery, K.J., 2004. Path integration in mammals. Hippocampus 14,
Franci, J.A., Canteras, N.S., 2003. Evidence for the substantia nigra pars compacta 180–192.
as an essential component of a memory system independent of the hippocam- Everitt, B.J., Robbins, T.W., 2005. Neural systems of reinforcement for drug addic-
pal memory system. Neurobiol. Learn Mem. 79, 236–242. tion: from actions to habits to compulsion. Nat. Neurosci. 8, 1481–1489.
Darvas, M., Palmiter, R.D., 2010. Restricting dopaminergic signaling to either Farrar, A.M., Font, L., Pereira, M., Mingote, S., Bunce, J.G., Chrobak, J.J., Salamone, J.D.,
dorsolateral or medial striatum facilitates cognition. J. Neurosci. 30, 1158–1165. 2008. Forebrain circuitry involved in effort-related choice: injections of the
Darvas, M., Palmiter, R.D., 2011. Contributions of striatal dopamine signaling to the GABAA agonist muscimol into ventral pallidum alter response allocation in
modulation of cognitive flexibility. Biol. Psychiatry 69, 704–707. food-seeking behavior. Neuroscience 152, 321–330.
Davies, N.B., 1977. Prey selection and search strategy of spotted flycatcher (Mus- Farrar, A.M., Pereira, M., Velasco, F., Hockemeyer, J., Muller, C.E., Salamone, J.D.,
ciapa striata)-filed-study on optimal foraging. Anim. Behav. 25, 1016–1033. 2007. Adenosine A(2A) receptor antagonism reverses the effects of dopamine
Daw, N.D., Niv, Y., Dayan, P., 2005. Uncertainty-based competition between pre- receptor antagonism on instrumental output and effort-related choice in the
frontal and dorsolateral striatal systems for behavioral control. Nat. Neurosci. rat: implications for studies of psychomotor slowing. Psychopharmacology
12, 1704–1711. (Berl) 191, 579–586.
128 M.R. Penner, S.J.Y. Mizumori / Progress in Neurobiology 96 (2012) 96–135
Farrar, A.M., Segovia, K.N., Randall, P.A., Nunes, E.J., Collins, L.E., Stopper, C.M., Port, Frey, U., Schroeder, H., Matthies, H., 1990. Dopaminergic antagonists prevent long-
R.G., Hockemeyer, J., Muller, C.E., Correa, M., Salamone, J.D., 2010. Nucleus term maintenance of posttetanic LTP in the CA1 region of rat hippocampal
accumbens and effort-related functions: behavioral and neural markers of the slices. Brain Res. 522, 69–75.
interactions between adenosine A2A and dopamine D2 receptors. Neuroscience Fries, P., 2009. Neuronal gamma-band synchronization as a fundamental process in
166, 1056–1067. cortical computation. Annu. Rev. Neurosci. 32, 209–224.
Faure, A., Haberland, U., Conde, F., El Massioui, N., 2005. Lesion to the nigrostriatal Fuhs, M.C., Touretzky, D.S., 2007. Context learning in the rodent hippocampus.
dopamine system disrupts stimulus–response habit formation. J. Neurosci. 25, Neural Comput. 19, 3173–3215.
2771–2780. Fujisawa, S., Buzsaki, G., 2010. Theta and 4 Hz Oscillations: Region-specific Coupling
Featherstone, R.E., McDonald, R.J., 2004. Dorsal striatum and stimulus–response of PFC, VTA and Hippocampus in a Goal-directed Behavior. Society for Neuro-
learning: lesions of the dorsolateral, but not dorsomedial, striatum impair science, San Diego, CA.
acquisition of a simple discrimination task. Behav. Brain Res. 150, 15–23. Futami, T., Takakusaki, K., Kitai, S.T., 1995. Glutamatergic and cholinergic inputs
Fell, J., Klaver, P., Lehnertz, K., Grunwald, T., Schaller, C., Elger, C.E., Fernandez, G., from the pedunculopontine tegmental nucleus to dopamine neurons in the
2001. Human memory formation is accompanied by rhinal-hippocampal cou- substantia nigra pars compacta. Neurosci. Res. 21, 331–342.
pling and decoupling. Nat. Neurosci. 4, 1259–1264. Fyhn, M., Hafting, T., Treves, A., Moser, M.B., Moser, E.I., 2007. Hippocampal
Fenton, A.A., Muller, R.U., 1998. Place cell discharge is extremely variable during remapping and grid realignment in entorhinal cortex. Nature 446, 190–194.
individual passes of the rat through the firing field. Proc. Natl. Acad. Sci. U.S.A. Gal, G., Joel, D., Gusak, O., Feldon, J., Weiner, I., 1997. The effects of electrolytic lesion
95, 3182–3187. to the shell subterritory of the nucleus accumbens on delayed non-matching-
Ferbinteanu, J., Shirvalkar, P., Shapiro, M.L., 2011. Memory modulates journey- to-sample and four-arm baited eight-arm radial-maze tasks. Behav. Neurosci.
dependent coding in the rat hippocampus. J. Neurosci. 31, 9135–9146. 111, 92–103.
Ferbinteanu, J., Shapiro, M.L., 2003. Prospective and retrospective memory coding in Gan, J.O., Walton, M.E., Phillips, P.E., 2010. Dissociable cost and benefit encoding of
the hippocampus. Neuron 40, 1227–1239. future rewards by mesolimbic dopamine. Nat. Neurosci. 13, 25–27.
Ferretti, V., Florian, C., Costantini, V.J., Roullet, P., Rinaldi, A., De Leonibus, E., Gardiner, T.W., Kitai, S.T., 1992. Single-unit activity in the globus pallidus and
Oliverio, A., Mele, A., 2005. Co-activation of glutamate and dopamine receptors neostriatum of the rat during performance of a trained head movement. Exp.
within the nucleus accumbens is required for spatial memory consolidation in Brain Res. 88, 517–530.
mice. Psychopharmacology (Berl) 179, 108–116. Gasbarri, A., Packard, M.G., Campana, E., Pacitti, C., 1994a. Anterograde and retro-
Fields, H.L., Hjelmstad, G.O., Margolis, E.B., Nicola, S.M., 2007. Ventral tegmental grade tracing of projections from the ventral tegmental area to the hippocampal
area neurons in learned appetitive behavior and positive reinforcement. Annu. formation in the rat. Brain Res. Bull. 33, 445–452.
Rev. Neurosci. 30, 289–316. Gasbarri, A., Sulli, A., Innocenzi, R., Pacitti, C., Brioni, J.D., 1996. Spatial memory
Fiorillo, C.D., Newsome, W.T., Schultz, W., 2008. The temporal precision of reward impairment induced by lesion of the mesohippocampal dopaminergic system
prediction in dopamine neurons. Nat. Neurosci. 11, 966–973. in the rat. Neuroscience 74, 1037–1044.
Fiorillo, C.D., Tobler, P.N., Schultz, W., 2003. Discrete coding of reward probability Gasbarri, A., Sulli, A., Packard, M.G., 1997. The dopaminergic mesencephalic projec-
and uncertainty by dopamine neurons. Science 299, 1898–1902. tions to the hippocampal formation in the rat. Prog. Neuropsychopharmacol.
Fiorillo, C.D., Tobler, P.N., Schultz, W., 2005. Evidence that the delay-period activity Biol. Psychiatry 21, 1–22.
of dopamine neurons corresponds to reward uncertainty rather than back- Gasbarri, A., Verney, C., Innocenzi, R., Campana, E., Pacitti, C., 1994b. Mesolimbic
propagating TD errors. Behav. Brain Funct. 1, 7. dopaminergic neurons innervating the hippocampal formation in the rat: a
Fitting, S., Allen, G.L., Wedell, D.H., 2007. Remembering places in space: a human combined retrograde tracing and immunohistochemical study. Brain Res. 668,
analog study of the Morris water maze. In: Barkowsky, T., Knauff, M., Ligozat, 71–79.
G., Montello, D.R. (Eds.), Spatial Cognition V: Reasoning, Action, Interaction. Gavrilov, V.V., Wiener, S.I., Berthoz, A., 1998. Discharge correlates of hippocampal
Springer-Verlag, Berlin, Heidelberg, pp. 59–75. complex spike neurons in behaving rats passively displaced on a mobile robot.
Flagel, S.B., Clark, J.J., Robinson, T.E., Mayo, L., Czuj, A., Willuhn, I., Akers, C.A., Hippocampus 8, 475–490.
Clinton, S.M., Phillips, P.E., Akil, H., 2011. A selective role for dopamine in Geisler, S., Derst, C., Veh, R.W., Zahm, D.S., 2007. Glutamatergic afferents of the
stimulus–reward learning. Nature 469, 53–57. ventral tegmental area in the rat. J. Neurosci. 27, 5730–5743.
Flaherty, A.W., Graybiel, A.M., 1993. Output architecture of the primate putamen. J. Gengler, S., Mallot, H.A., Holscher, C., 2005. Inactivation of the rat dorsal striatum
Neurosci. 13, 3222–3237. impairs performance in spatial tasks and alters hippocampal theta in the freely
Floresco, S.B., Blaha, C.D., Yang, C.R., Phillips, A.G., 2001. Modulation of hippocampal moving rat. Behav. Brain Res. 164, 73–82.
and amygdalar-evoked activity of nucleus accumbens neurons by dopamine: Gilbert, P.E., Kesner, R.P., Lee, I., 2001. Dissociating hippocampal subregions: double
cellular mechanisms of input selection. J. Neurosci. 21, 2851–2860. dissociation between dentate gyrus and CA1. Hippocampus 11, 626–636.
Floresco, S.B., Ghods-Sharifi, S., 2007. Amygdala-prefrontal cortical circuitry reg- Gill, K.M., Mizumori, S.J., 2007. Inactivation of prefrontal cortex alters reward-
ulates effort-based decision making. Cereb. Cortex 17, 251–260. related neural activity in substantia nigra. In: Society for Neuroscience
Floresco, S.B., St Onge, J.R., Ghods-Sharifi, S., Winstanley, C.A., 2008a. Cortico- Abstracts. Program No. 640.1.
limbic-striatal circuits subserving different forms of cost–benefit decision Gold, A.E., Kesner, R.P., 2005. The role of the CA3 subregion of the dorsal hippo-
making. Cogn. Affect. Behav. Neurosci. 8, 375–389. campus in spatial pattern completion in the rat. Hippocampus 15, 808–814.
Floresco, S.B., Tse, M.T., Ghods-Sharifi, S., 2008b. Dopaminergic and glutamatergic Goss-Custard, J.D., 1977. Response of redshank, Tringa totanus, to absolute and
regulation of effort- and delay-based decision making. Neuropsychopharma- relative densities of 2 prey species. J. Anim. Ecol. 46, 867–874.
cology 33, 1966–1979. Gothard, K.M., Skaggs, W.E., Moore, K.M., McNaughton, B.L., 1996. Binding of
Font, L., Mingote, S., Farrar, A.M., Pereira, M., Worden, L., Stopper, C., Port, R.G., hippocampal CA1 neural activity to multiple reference frames in a land-
Salamone, J.D., 2008. Intra-accumbens injections of the adenosine A2A agonist mark-based navigation task. J. Neurosci. 16, 823–835.
Hippocampus 16, 685–690. Nakahara, H., Itoh, H., Kawagoe, R., Takikawa, Y., Hikosaka, O., 2004. Dopamine
Mizumori, S.J., Barnes, C.A., McNaughton, B.L., 1989a. Reversible inactivation of the neurons can represent context-dependent prediction error. Neuron 41, 269–
medial septum: selective effects on the spontaneous unit activity of different 280.
hippocampal cell types. Brain Res. 500, 99–106. Nicola, S.M., 2007. The nucleus accumbens as part of a basal ganglia action selection
Mizumori, S.J., Cooper, B.G., Leutgeb, S., Pratt, W.E., 2000. A neural systems analysis circuit. Psychopharmacology (Berl) 191, 521–550.
of adaptive navigation. Mol. Neurobiol. 21, 57–82. Nicola, S.M., 2010. The flexible approach hypothesis: unification of effort and cue-
Mizumori, S.J., Lavoie, A.M., Kalyani, A., 1996. Redistribution of spatial representa- responding hypotheses for the role of nucleus accumbens dopamine in the
tion in the hippocampus of aged rats performing a spatial memory task. Behav. activation of reward-seeking behavior. J. Neurosci. 30, 16585–16600.
Neurosci. 110, 1006–1016. Nicola, S.M., Kombian, S.B., Malenka, R.C., 1996. Psychostimulants depress excit-
Mizumori, S.J., McNaughton, B.L., Barnes, C.A., Fox, K.B., 1989b. Preserved spatial atory synaptic transmission in the nucleus accumbens via presynaptic D1-like
coding in hippocampal CA1 pyramidal cells during reversible suppression of dopamine receptors. J. Neurosci. 16, 1591–1604.
132 M.R. Penner, S.J.Y. Mizumori / Progress in Neurobiology 96 (2012) 96–135
Nicola, S.M., Malenka, R.C., 1998. Modulation of synaptic transmission by dopamine Palmiter, R.D., 2008. Dopamine signaling in the dorsal striatum is essential for
and norepinephrine in ventral but not dorsal striatum. J. Neurophysiol. 79, motivated behaviors: lessons from dopamine-deficient mice. Ann. N. Y. Acad.
1768–1776. Sci. 1129, 35–46.
Nicola, S.M., Surmeier, J., Malenka, R.C., 2000. Dopaminergic modulation of neuronal Pan, W.X., Hyland, B.I., 2005. Pedunculopontine tegmental nucleus controls condi-
excitability in the striatum and nucleus accumbens. Annu. Rev. Neurosci. 23, tioned responses of midbrain dopamine neurons in behaving rats. J. Neurosci.
185–215. 25, 4725–4732.
Nicola, S.M., Yun, I.A., Wakabayashi, K.T., Fields, H.L., 2004. Firing of nucleus Pan, W.X., Schmidt, R., Wickens, J.R., Hyland, B.I., 2005. Dopamine cells respond to
accumbens neurons during the consummatory phase of a discriminative stim- predicted events during classical conditioning: evidence for eligibility traces in
ulus task depends on previous reward predictive cues. J. Neurophysiol. 91, the reward-learning network. J. Neurosci. 25, 6235–6242.
1866–1882. Pan, W.X., Schmidt, R., Wickens, J.R., Hyland, B.I., 2008. Tripartite mechanism of
Niv, Y., 2009. Reinforcement learning in the brain. J. Math. Psychol. 53, 139–154. extinction suggested by dopamine neuron activity and temporal difference
Niv, Y., Daw, N.D., Joel, D., Dayan, P., 2007. Tonic dopamine: opportunity costs and model. J. Neurosci. 28, 9619–9631.
the control of response vigor. Psychopharmacology (Berl) 191, 507–520. Parent, A., 1990. Extrinsic connections of the basal ganglia. Trends Neurosci. 13,
Niv, Y., Joel, D., Dayan, P., 2006. A normative perspective on motivation. Trends 254–258.
Cogn. Sci. 10, 375–381. Parkinson, J.A., Dalley, J.W., Cardinal, R.N., Bamford, A., Fehnert, B., Lachenal, G.,
O’Carroll, C.M., Morris, R.G., 2004. Heterosynaptic co-activation of glutamatergic Rudarakanchana, N., Halkerston, K.M., Robbins, T.W., Everitt, B.J., 2002. Nucleus
and dopaminergic afferents is required to induce persistent long-term potenti- accumbens dopamine depletion impairs both acquisition and performance of
ation. Neuropharmacology 47, 324–332. appetitive Pavlovian approach behaviour: implications for mesoaccumbens
O’Doherty, J., Dayan, P., Schultz, J., Deichmann, R., Friston, K., Dolan, R.J., 2004. dopamine function. Behav. Brain Res. 137, 149–163.
Dissociable roles of ventral and dorsal striatum in instrumental conditioning. Paxinos, G., Watson, C., 2007. The Rat Brain in Stereotaxic Coordinates. Elsevier
Science 304, 452–454. Academic Press, San Diego.
O’Doherty, J.P., Dayan, P., Friston, K., Critchley, H., Dolan, R.J., 2003. Temporal Pellis, S.M., Castaneda, E., McKenna, M.M., Tran-Nguyen, L.T., Whishaw, I.Q., 1993.
difference models and reward-related learning in the human brain. Neuron The role of the striatum in organizing sequences of play fighting in neonatally
38, 329–337. dopamine-depleted rats. Neurosci. Lett. 158, 13–15.
O’Donnell, P., Grace, A.A., 1995. Synaptic interactions among excitatory afferents to Penick, S., Solomon, P.R., 1991. Hippocampus, context, and conditioning. Behav.
nucleus accumbens neurons: hippocampal gating of prefrontal cortical input. J. Neurosci. 105, 611–617.
Neurosci. 15, 3622–3639. Pennartz, C.M., Berke, J.D., Graybiel, A.M., Ito, R., Lansink, C.S., van der Meer, M.,
O’Keefe, J., 1976. Place units in the hippocampus of the freely moving rat. Exp. Redish, A.D., Smith, K.S., Voorn, P., 2009. Corticostriatal interactions during
Neurol. 51, 78–109. learning, memory processing, and decision making. J. Neurosci. 29, 12831–
O’Keefe, J., Burgess, N., 1996. Geometric determinants of the place fields of hippo- 12838.
campal neurons. Nature 381, 425–428. Pennartz, C.M., Groenewegen, H.J., Lopes da Silva, F.H., 1994. The nucleus accum-
O’Keefe, J., Conway, D.H., 1978. Hippocampal place units in the freely moving rat: bens as a complex of functionally distinct neuronal ensembles: an integration of
why they fire where they fire. Exp. Brain Res. 31, 573–590. behavioural, electrophysiological and anatomical data. Prog. Neurobiol. 42,
O’Keefe, J., Dostrovsky, J., 1971. The hippocampus as a spatial map. Preliminary 719–761.
evidence from unit activity in the freely-moving rat. Brain Res. 34, 171–175. Pennartz, C.M., Lee, E., Verheul, J., Lipa, P., Barnes, C.A., McNaughton, B.L., 2004. The
O’Keefe, J., Nadel, L., 1978a. The Hippocampus as a Cognitive Map. Oxford University ventral striatum in off-line processing: ensemble reactivation during sleep and
Press. modulation by hippocampal ripples. J. Neurosci. 24, 6446–6456.
O’Keefe, J., Recce, M.L., 1993. Phase relationship between hippocampal place units Pennartz, C.M., Uylings, H.B., Barnes, C.A., McNaughton, B.L., 2002. Memory reacti-
and the EEG theta rhythm. Hippocampus 3, 317–330. vation and consolidation during sleep: from cellular mechanisms to human
O’Mara, S.M., 1995. Spatially selective firing properties of hippocampal formation performance. Prog. Brain Res. 138, 143–166.
neurons in rodents and primates. Prog. Neurobiol. 45, 253–274. Phillips, P.E., Walton, M.E., Jhou, T.C., 2007. Calculating utility: preclinical evidence
O’Reilly, R.C., McClelland, J.L., 1994. Hippocampal conjunctive encoding, storage, for cost–benefit analysis by mesolimbic dopamine. Psychopharmacology (Berl)
and recall: avoiding a trade-off. Hippocampus 4, 661–682. 191, 483–495.
O’Keefe, J., Nadel, L., 1978b. The Hippocampus as a Cognitive Map. Oxford University Phillips, R.G., LeDoux, J.E., 1992. Differential contribution of amygdala and hippo-
Press, Oxford. campus to cued and contextual fear conditioning. Behav. Neurosci. 106, 274–
O’Keefe, J., Speakman, A., 1987. Single unit activity in the rat hippocampus during a 285.
spatial memory task. Exp. Brain Res. 68, 1–27. Poldrack, R.A., Packard, M.G., 2003. Competition among multiple memory systems:
Oakman, S.A., Faris, P.L., Kerr, P.E., Cozzari, C., Hartman, B.K., 1995. Distribution of converging evidence from animal and human brain studies. Neuropsychologia
pontomesencephalic cholinergic neurons projecting to substantia nigra differs 41, 245–251.
significantly from those projecting to ventral tegmental area. J. Neurosci. 15, Poucet, B., 1993. Spatial cognitive maps in animals: new hypotheses on their
5859–5869. structure and neural mechanisms. Psychol. Rev. 100, 163–182.
Olton, D.S., Becker, J.T., Handelmann, G.E., 1979. Hippocampus, space, and memory. Pragay, E.B., Mirsky, A.F., Ray, C.L., Turner, D.F., Mirsky, C.V., 1978. Neuronal activity
Brain Behav. Sci. 2, 313–365. in the brain stem reticular formation during performance of a ‘‘go-no go’’ visual
Olton, D.S., Samuelson, R.J., 1976. Remembrance of places passed: spatial memory attention task in the monkey. Exp. Neurol. 60, 83–95.
in rats. J. Exp. Psychol. Anim. Behav. Process. 2, 97–116. Puryear, C.B., Kim, M.J., Mizumori, S.J., 2010. Conjunctive encoding of movement
Olypher, A.V., Lansky, P., Fenton, A.A., 2002. Properties of the extra-positional signal and reward by ventral tegmental area neurons in the freely navigating rodent.
in hippocampal place cell discharge derived from the overdispersion in loca- Behav. Neurosci. 124, 234–247.
tion-specific firing. Neuroscience 111, 553–566. Puryear, C.B., Mizumori, S.J., 2008. Reward prediction error signals by reticular
Omelchenko, N., Sesack, S.R., 2009. Ultrastructural analysis of local collaterals of rat formation neurons. Learn. Mem. 15, 895–898.
ventral tegmental area neurons: GABA phenotype and synapses onto dopamine Quirk, G.J., Muller, R.U., Kubie, J.L., 1990. The firing of hippocampal place cells in the
and GABA cells. Synapse 63, 895–906. dark depends on the rat’s recent experience. J. Neurosci. 10, 2008–2017.
Ostlund, S.B., Wassum, K.M., Murphy, N.P., Balleine, B.W., Maidment, N.T., 2011. Ragozzino, K.E., Leutgeb, S., Mizumori, S.J., 2001. Dorsal striatal head direction and
Extracellular dopamine levels in striatal subregions track shifts in motivation hippocampal place representations during spatial navigation. Exp. Brain Res.
and response cost during instrumental conditioning. J. Neurosci. 31, 200– 139, 372–376.
207. Ragozzino, M.E., 2003. Acetylcholine actions in the dorsomedial striatum support
Otmakhova, N.A., Lisman, J.E., 1996. D1/D5 dopamine receptor activation increases the flexible shifting of response patterns. Neurobiol. Learn. Mem. 80, 257–267.
the magnitude of early long-term potentiation at CA1 hippocampal synapses. J. Ragozzino, M.E., Detrick, S., Kesner, R.P., 1999a. Involvement of the prelimbic–
Neurosci. 16, 7478–7486. infralimbic areas of the rodent prefrontal cortex in behavioral flexibility for
Otmakhova, N.A., Lisman, J.E., 1998. D1/D5 dopamine receptors inhibit depotentia- place and response learning. J. Neurosci. 19, 4585–4594.
tion at CA1 synapses via cAMP-dependent mechanism. J. Neurosci. 18, 1270– Ragozzino, M.E., Mohler, E.G., Prior, M., Palencia, C.A., Rozman, S., 2009. Acetylcho-
1279. line activity in selective striatal regions supports behavioral flexibility. Neu-
Oyama, K., Hernadi, I., Iijima, T., Tsutsui, K., 2010. Reward prediction error coding in robiol. Learn. Mem. 91, 13–22.
dorsal striatal neurons. J. Neurosci. 30, 11447–11457. Ragozzino, M.E., Ragozzino, K.E., Mizumori, S.J., Kesner, R.P., 2002. Role of the
Packard, M.G., 1999. Glutamate infused posttraining into the hippocampus or dorsomedial striatum in behavioral flexibility for response and visual cue
caudate-putamen differentially strengthens place and response learning. Proc. discrimination learning. Behav. Neurosci. 116, 105–115.
Natl. Acad. Sci. U.S.A. 96, 12881–12886. Ragozzino, M.E., Wilcox, C., Raso, M., Kesner, R.P., 1999b. Involvement of rodent
Packard, M.G., 2009. Exhumed from thought: basal ganglia and response learning in prefrontal cortex subregions in strategy switching. Behav. Neurosci. 113, 32–41.
the plus-maze. Behav. Brain Res. 199, 24–31. Ranck Jr., J.B., 1973. Studies on single neurons in dorsal hippocampal formation and
Packard, M.G., Knowlton, B.J., 2002. Learning and memory functions of the basal septum in unrestrained rats. I. Behavioral correlates and firing repertoires. Exp.
ganglia. Annu. Rev. Neurosci. 25, 563–593. Neurol. 41, 461–531.
Packard, M.G., McGaugh, J.L., 1996. Inactivation of hippocampus or caudate nucleus Rangel, A., Camerer, C., Montague, P.R., 2008. A framework for studying the
with lidocaine differentially affects expression of place and response learning. neurobiology of value-based decision making. Nat. Rev. Neurosci. 9, 545–556.
Neurobiol. Learn. Mem. 65, 65–72. Rawlins, J.N.P., 1985. Associations across time: the hippocampus as a temporary
Packard, M.G., Hirsh, R., White, N.M., 1989. Differential effects of fornix and caudate memory store. Brain Behav. Sci. 8, 479–496.
nucleus lesions on two radial maze tasks: evidence for multiple memory Redgrave, P., Gurney, K., 2006. The short-latency dopamine signal: a role in
systems. J. Neurosci. 9, 1465–1472. discovering novel actions? Nat. Rev. Neurosci. 7, 967–975.
M.R. Penner, S.J.Y. Mizumori / Progress in Neurobiology 96 (2012) 96–135 133
Redgrave, P., Mitchell, I.J., Dean, P., 1987. Further evidence for segregated output Salamone, J.D., Correa, M., Farrar, A., Mingote, S.M., 2007. Effort-related functions of
channels from superior colliculus in rat: ipsilateral tecto-pontine and tecto- nucleus accumbens dopamine and associated forebrain circuits. Psychophar-
cuneiform projections have different cells of origin. Brain Res. 413, 170–174. macology (Berl) 191, 461–482.
Redgrave, P., Prescott, T.J., Gurney, K., 1999a. The basal ganglia: a vertebrate Salamone, J.D., Correa, M., Farrar, A.M., Nunes, E.J., Pardo, M., 2009. Dopamine,
solution to the selection problem? Neuroscience 89, 1009–1023. behavioral economics, and effort. Front. Behav. Neurosci. 3, 13.
Redgrave, P., Prescott, T.J., Gurney, K., 1999b. Is the short-latency dopamine Salamone, J.D., Steinpreis, R.E., McCullough, L.D., Smith, P., Grebel, D., Mahan, K.,
response too short to signal reward error? Trends Neurosci. 22, 146–151. 1991. Haloperidol and nucleus accumbens dopamine depletion suppress lever
Redish, A.D., Jensen, S., Johnson, A., Kurth-Nelson, Z., 2007. Reconciling reinforce- pressing for food but increase free food consumption in a novel food choice
ment learning models with behavioral extinction and renewal: implications for procedure. Psychopharmacology (Berl) 104, 515–521.
addiction, relapse, and problem gambling. Psychol. Rev. 114, 784–805. Sargolini, F., Florian, C., Oliverio, A., Mele, A., Roullet, P., 2003. Differential involve-
Redish, A.D., 1999. Beyond the Cognitive Map: From Place Cells to Episodic Memory. ment of NMDA and AMPA receptors within the nucleus accumbens in consoli-
The MIT Press, Boston. dation of information necessary for place navigation and guidance strategy of
Redish, A.D., Battaglia, F.P., Chawla, M.K., Ekstrom, A.D., Gerrard, J.L., Lipa, P., mice. Learn. Mem. 10, 285–292.
Rosenzweig, E.S., Worley, P.F., Guzowski, J.F., McNaughton, B.L., Barnes, C.A., Sargolini, F., Fyhn, M., Hafting, T., McNaughton, B.L., Witter, M.P., Moser, M.B.,
2001. Independence of firing correlates of anatomically proximate hippocampal Moser, E.I., 2006. Conjunctive representation of position, direction, and velocity
pyramidal cells. J. Neurosci. 21, RC134 (1–6). in entorhinal cortex. Science 312, 758–762.
Redish, A.D., Rosenzweig, E.S., Bohanick, J.D., McNaughton, B.L., Barnes, C.A., 2000. Sargolini, F., Roullet, P., Oliverio, A., Mele, A., 1999. Effects of lesions to the
Dynamics of hippocampal ensemble activity realignment: time versus space. J. glutamatergic afferents to the nucleus accumbens in the modulation of
Neurosci. 20, 9298–9309. reactivity to spatial and non-spatial novelty in mice. Neuroscience 93,
Reese, N.B., Garcia-Rill, E., Skinner, R.D., 1995. The pedunculopontine nucleus— 855–867.
auditory input, arousal and pathophysiology. Prog. Neurobiol. 47, 105–133. Savelli, F., Knierim, J.J., 2010. Hebbian analysis of the transformation of medial
Rescorla, R.A., Solomon, R.L., 1967. Two-process learning theory: relationships entorhinal grid-cell inputs to hippocampal place fields. J. Neurophysiol. 103,
between Pavlovian conditioning and instrumental learning. Psychol. Rev. 74, 3167–3183.
151–182. Schmitzer-Torbert, N., Redish, A.D., 2002. Development of path stereotypy in a
Rescorla, R.A., Wagner, A.R., 1972. A theory of Pavlovian conditioning: variations in single day in rats on a multiple-T maze. Arch. Ital. Biol. 140, 295–301.
the effectiveness of reinforcement and nonreinforcement. In: Black, A.H., Schmitzer-Torbert, N., Redish, A.D., 2004. Neuronal activity in the rodent dorsal
Prokasy, W.F. (Eds.), Classical Conditioning II: Current Research and Theory. striatum in sequential navigation: separation of spatial and reward responses
Appleton Century Crofts, New York, pp. 64–99. on the multiple T task. J. Neurophysiol. 91, 2259–2272.
Restle, F., 1957. Discrimination of cues in mazes: a resolution of the place-vs.- Schultz, W., 1997. Dopamine neurons and their role in reward mechanisms. Curr.
response question. Psychol. Rev. 64, 217–228. Opin. Neurobiol. 7, 191–197.
Richards, J.B., Mitchell, S.H., de Wit, H., Seiden, L.S., 1997. Determination of discount Schultz, W., 1998a. The phasic reward signal of primate dopamine neurons. Adv.
functions in rats with an adjusting-amount procedure. J. Exp. Anal. Behav. 67, Pharmacol. 42, 686–690.
353–366. Schultz, W., 1998b. Predictive reward signal of dopamine neurons. J. Neurophysiol.
Robbins, T.W., Everitt, B.J., 2002. Limbic–striatal memory systems and drug addic- 80, 1–27.
tion. Neurobiol. Learn. Mem. 78, 625–636. Schultz, W., 2002. Getting formal with dopamine and reward. Neuron 36, 241–263.
Robinson, D.L., Venton, B.J., Heien, M.L., Wightman, R.M., 2003. Detecting subsecond Schultz, W., 2010. Dopamine signals for reward value and risk: basic and recent
dopamine release with fast-scan cyclic voltammetry in vivo. Clin. Chem. 49, data. Behav. Brain Funct. 6, 24.
1763–1773. Schultz, W., Apicella, P., Ljungberg, T., 1993. Responses of monkey dopamine
Robinson, S., Rainwater, A.J., Hnasko, T.S., Palmiter, R.D., 2007. Viral restoration of neurons to reward and conditioned stimuli during successive steps of learning
dopamine signaling to the dorsal striatum restores instrumental conditioning a delayed response task. J. Neurosci. 13, 900–913.
to dopamine-deficient mice. Psychopharmacology (Berl) 191, 567–578. Schultz, W., Dayan, P., Montague, P.R., 1997. A neural substrate of prediction and
Robinson, S., Smith, D.M., Mizumori, S.J., Palmiter, R.D., 2004. Firing properties of reward. Science 275, 1593–1599.
dopamine neurons in freely moving dopamine-deficient mice: effects of dopa- Schultz, W., Dickinson, A., 2000. Neuronal coding of prediction errors. Annu. Rev.
mine receptor activation and anesthesia. Proc. Natl. Acad. Sci. U.S.A. 101, Neurosci. 23, 473–500.
13329–13334. Schultz, W., Romo, R., 1988. Neuronal activity in the monkey striatum during the
Roesch, M.R., Calu, D.J., Schoenbaum, G., 2007. Dopamine neurons encode the better initiation of movements. Exp. Brain Res. 71, 431–436.
option in rats deciding between differently delayed or sized rewards. Nat. Schultz, W., Romo, R., 1992. Role of primate basal ganglia and frontal cortex in the
Neurosci. 10, 1615–1624. internal generation of movements. I. Preparatory activity in the anterior stria-
Roesch, M.R., Singh, T., Brown, P.L., Mullins, S.E., Schoenbaum, G., 2009. Ventral tum. Exp. Brain Res. 91, 363–384.
striatal neurons encode the value of the chosen action in rats deciding between Schweimer, J., Hauber, W., 2006. Dopamine D1 receptors in the anterior cingulate
differently delayed or sized rewards. J. Neurosci. 29, 13365–13376. cortex regulate effort-based decision making. Learn. Mem. 13, 777–782.
Roitman, M.F., Stuber, G.D., Phillips, P.E., Wightman, R.M., Carelli, R.M., 2004. Seamans, J.K., Phillips, A.G., 1994. Selective memory impairments produced by
Dopamine operates as a subsecond modulator of food seeking. J. Neurosci. transient lidocaine-induced lesions of the nucleus accumbens in rats. Behav.
24, 1265–1271. Neurosci. 108, 456–468.
Roitman, M.F., Wheeler, R.A., Carelli, R.M., 2005. Nucleus accumbens neurons are Seamans, .J.K., Yang, C.R., 2004. The principal features and mechanisms of dopamine
innately tuned for rewarding and aversive taste stimuli, encode their predictors, modulation in the prefrontal cortex. Prog. Neurobiol. 74, 1–58.
and are linked to motor output. Neuron 45, 587–597. Sesack, S.R., Carr, D.B., Omelchenko, N., Pinto, A., 2003. Anatomical substrates for
Rolls, E.T., 1996. A theory of hippocampal function in memory. Hippocampus 6, glutamate–dopamine interactions: evidence for specificity of connections and
601–620. extrasynaptic actions. Ann. N. Y. Acad. Sci. 1003, 36–52.
Rosenzweig, E.S., Redish, A.D., McNaughton, B.L., Barnes, C.A., 2003. Hippocampal Sesack, S.R., Grace, A.A., 2010. Cortico-basal ganglia reward network: microcircuit-
map realignment and spatial learning. Nat. Neurosci. 6, 609–615. ry. Neuropsychopharmacology 35, 27–47.
Rossato, J.I., Bevilaqua, L.R., Izquierdo, I., Medina, J.H., Cammarota, M., 2009. Setlow, B., McGaugh, J.L., 1998. Sulpiride infused into the nucleus accumbens
Dopamine controls persistence of long-term memory storage. Science 325, posttraining impairs memory of spatial water maze training. Behav. Neurosci.
1017–1020. 112, 603–610.
Roullet, P., Sargolini, F., Oliverio, A., Mele, A., 2001. NMDA and AMPA antagonist Setlow, B., Schoenbaum, G., Gallagher, M., 2003. Neural encoding in ventral striatum
infusions into the ventral striatum impair different steps of spatial information during olfactory discrimination learning. Neuron 38, 625–636.
processing in a nonassociative task in mice. J. Neurosci. 21, 2143–2149. Seymour, B., O’Doherty, J.P., Dayan, P., Koltzenburg, M., Jones, A.K., Dolan, R.J.,
Sabatino, M., Ferraro, G., Liberti, G., Vella, N., La Grutta, V., 1985. Striatal and septal Friston, K.J., Frackowiak, R.S., 2004. Temporal difference models describe
influence on hippocampal theta and spikes in the cat. Neurosci. Lett. 61, 55–59. higher-order learning in humans. Nature 429, 664–667.
Sakurai, Y., 1994. Involvement of auditory cortical and hippocampal neurons in Siapas, A.G., Lubenov, E.V., Wilson, M.A., 2005. Prefrontal phase locking to hippo-
auditory working memory and reference memory in the rat. J. Neurosci. 14, campal theta oscillations. Neuron 46, 141–151.
2606–2623. Sidman, M., Fletcher, F.G., 1968. A demonstration of auto-shaping with monkeys. J.
Salamone, J.D., 1994. The involvement of nucleus accumbens dopamine in appeti- Exp. Anal. Behav. 11, 307–309.
tive and aversive motivation. Behav. Brain Res. 61, 117–133. Singer, A.C., Frank, L.M., 2009. Rewarded outcomes enhance reactivation of experi-
Salamone, J.D., 2002. Functional significance of nucleus accumbens dopamine: ence in the hippocampus. Neuron 64, 910–921.
behavior, pharmacology and neurochemistry. Behav. Brain Res. 137, 1. Sink, K.S., Vemuri, V.K., Olszewska, T., Makriyannis, A., Salamone, J.D., 2008.
Salamone, J.D., 2007. Functions of mesolimbic dopamine: changing concepts and Cannabinoid CB1 antagonists and dopamine antagonists produce different
shifting paradigms. Psychopharmacology (Berl) 191, 389. effects on a task involving response allocation and effort-related choice in
Salamone, J.D., Arizzi, M.N., Sandoval, M.D., Cervone, K.M., Aberman, J.E., 2002. food-seeking behavior. Psychopharmacology (Berl) 196, 565–574.
Dopamine antagonists alter response allocation but do not suppress appetite for Skaggs, W.E., McNaughton, B.L., Wilson, M.A., Barnes, C.A., 1996. Theta phase
food in rats: contrast between the effects of SKF 83566, raclopride, and precession in hippocampal neuronal populations and the compression of
fenfluramine on a concurrent choice task. Psychopharmacology (Berl) 160, temporal sequences. Hippocampus 6, 149–172.
371–380. Small, W.S., 1899. Notes on the psychic development of the young white rat. Am. J.
Salamone, J.D., Correa, M., 2002. Motivational views of reinforcement: implications Psychol. 11, 80–100.
for understanding the behavioral functions of nucleus accumbens dopamine. Small, W.S., 1900. An experimental study of the mental processes of the rat. Am. J.
Behav. Brain Res. 137, 3–25. Psychol. 11, 133–165.
134 M.R. Penner, S.J.Y. Mizumori / Progress in Neurobiology 96 (2012) 96–135
Small, W.S., 1901. Experimental study of the mental processes of the rat. Am. J. Tse, D., Langston, R.F., Kakeyama, M., Bethus, I., Spooner, P.A., Wood, E.R., Witter,
Psychol. 12, 206–239. M.P., Morris, R.G., 2007. Schemas and memory consolidation. Science 316, 76–
Smith-Roe, S.L., Kelley, A.E., 2000. Coincident activation of NMDA and dopamine D1 82.
receptors within the nucleus accumbens core is required for appetitive instru- Tulving, E., 2002. Episodic memory: from mind to brain. Annu. Rev. Psychol. 53, 1–
mental learning. J. Neurosci. 20, 7737–7742. 25.
Smith-Roe, S.L., Sadeghian, K., Kelley, A.E., 1999. Spatial learning and performance Usiello, A., Sargolini, F., Roullet, P., Ammassari-Teule, M., Passino, E., Oliverio, A.,
in the radial arm maze is impaired after N-methyl-D-aspartate (NMDA) receptor Mele, A., 1998. N-methyl-D-aspartate receptors in the nucleus accumbens are
blockade in striatal subregions. Behav. Neurosci. 113, 703–717. involved in detection of spatial novelty in mice. Psychopharmacology (Berl)
Smith, D.M., Mizumori, S.J., 2006a. Hippocampal place cells, context, and episodic 137, 175–183.
memory. Hippocampus 16, 716–729. Usuda, I., Tanaka, K., Chiba, T., 1998. Efferent projections of the nucleus accumbens
Smith, D.M., Mizumori, S.J., 2006b. Learning-related development of context-spe- in the rat with special reference to subdivision of the nucleus: biotinylated
cific neuronal responses to places and events: the hippocampal role in context dextran amine study. Brain Res. 797, 73–93.
processing. J. Neurosci. 26, 3154–3163. Van Cauter, T., Poucet, B., Save, E., 2008. Unstable CA1 place cell representation in
Song, E.Y., Kim, Y.B., Kim, Y.H., Jung, M.W., 2005. Role of active movement in place- rats with entorhinal cortex lesions. Eur. J. Neurosci. 27, 1933–1946.
specific firing of hippocampal neurons. Hippocampus 15, 8–17. Van den Bercken, J.H., Cools, A.R., 1982. Evidence for a role of the caudate nucleus in
Sotak, B.N., Hnasko, T.S., Robinson, S., Kremer, E.J., Palmiter, R.D., 2005. Dysregula- the sequential organization of behavior. Behav. Brain Res. 4, 319–327.
tion of dopamine signaling in the dorsal striatum inhibits feeding. Brain Res. van den Bos, R., Lasthuis, W., den Heijer, E., van der Harst, J., Spruijt, B., 2006. Toward
1061, 88–96. a rodent model of the Iowa gambling task. Behav. Res. Methods 38, 470–478.
Squire, L.R., Knowlton, B., Musen, G., 1993. The structure and organization of van der Meer, M.A., Redish, A.D., 2011. Ventral striatum: a critical look at models of
memory. Annu. Rev. Psychol. 44, 453–495. learning and evaluation. Curr. Opin. Neurobiol. 21, 387–392.
Squire, L.R., 1994. Memory and forgetting: long-term and gradual changes in van der Meer, M.A., Johnson, A., Schmitzer-Torbert, N.C., Redish, A.D., 2010. Triple
memory storage. Int. Rev. Neurobiol. 37, 243–269 discussion 248–285. dissociation of information processing in dorsal striatum, ventral striatum, and
Stephens, B.a.K.J., 1986. Foraging Theory. Princeton University Press, Princeton, NJ. hippocampus on a learned spatial decision task. Neuron 67, 25–32.
Stramiello, M., Wagner, J.J., 2008. D1/5 receptor-mediated enhancement of LTP van der Meer, M.A., Redish, A.D., 2009. Low and high gamma oscillations in rat
requires PKA. Src family kinases, and NR2B-containing NMDARs. Neurophar- ventral striatum have distinct relationships to behavior, reward, and spiking
macology 55, 871–877. activity on a learned spatial decision task. Front. Integr. Neurosci. 3, 9.
Suri, R.E., 2002. TD models of reward predictive responses in dopamine neurons. van der Meer, M.A., Redish, A.D., 2010. Expectancies in decision making, reinforce-
Neural Netw. 15, 523–533. ment learning, and ventral striatum. Front. Neurosci. 4, 6.