Progress in Neurobiology 96 (2012) 96–135

Contents lists available at SciVerse ScienceDirect

Progress in Neurobiology

jo urnal homepage: www.elsevier.com/locate/pneurobio

Neural systems analysis of decision making during goal-directed navigation

Marsha R. Penner, Sheri J.Y. Mizumori *

Department of Psychology, University of Washington, Seattle, WA 98195-1525, United States

A R T I C L E   I N F O

Article history:
Received 12 April 2011
Received in revised form 6 August 2011
Accepted 29 August 2011
Available online 21 September 2011

Keywords:
Dopamine
Reinforcement learning
Hippocampus
Striatum
Navigation
Decision making

A B S T R A C T

The ability to make adaptive decisions during goal-directed navigation is a fundamental and highly evolved behavior that requires continual coordination of perceptions, learning and memory processes, and the planning of behaviors. Here, a neurobiological account for such coordination is provided by integrating current literatures on spatial context analysis and decision-making. This integration includes discussions of our current understanding of the role of the hippocampal system in experience-dependent navigation, how hippocampal information comes to impact midbrain and striatal decision making systems, and finally the role of the striatum in the implementation of behaviors based on recent decisions. These discussions extend across cellular to neural systems levels of analysis. Not only are key findings described, but also fundamental organizing principles within and across neural systems, as well as between neural systems functions and behavior, are emphasized. It is suggested that studying decision making during goal-directed navigation is a powerful model for studying interactive brain systems and their mediation of complex behaviors.

© 2011 Published by Elsevier Ltd.

Contents

1. Introduction ...... 97

2. Navigation and foraging behavior ...... 97

3. Laboratory tasks that are based on foraging behavior ...... 98

4. Reinforcement learning and decision making environments...... 99

4.1. Temporal difference learning ...... 100

4.2. Dopamine and reinforcement learning ...... 101

5. The neurobiology of reinforcement learning and goal-directed navigation: hippocampal contributions ...... 102

5.1. Hippocampal place fields as spatial context representations ...... 102

5.2. The hippocampus distinguishes contexts during navigation ...... 103

5.3. Cellular and network mechanisms underlying hippocampal context processing ...... 104

5.3.1. CA3 and CA1 place fields contributions to the evaluation of context ...... 105

5.3.2. Temporal encoding of spatial contextual information ...... 105

5.3.3. Sources of hippocampal spatial and nonspatial information ...... 106

5.3.4. Determining context saliency as a part of learning ...... 107

5.4. Relationship between hippocampal context codes and reinforcement based learning ...... 108

5.4.1. Functional connectivity between reinforcement and hippocampal systems...... 108

5.4.2. A role for dopamine in hippocampal-dependent learning and plasticity ...... 109

5.4.3. Impact of hippocampal context processing on dopamine cell responses to reward ...... 110

6. The neurobiology of reinforcement learning and goal-directed navigation: striatal contributions ...... 112

6.1. Striatal based navigational circuitry ...... 112

6.2. Dopamine signaling and reward prediction error within the striatum ...... 113

Abbreviations: BLA, basolateral amygdala complex; DLS, dorsolateral striatum; DMS, dorsomedial striatum; LDTg, lateral dorsal tegmental nucleus; mPFC, medial prefrontal cortex; OFC, orbitofrontal cortex; PPTg, pedunculopontine tegmental nucleus; SI/MI, primary sensory and motor cortices; SNc, substantia nigra pars compacta; vPFC, ventral prefrontal cortex; VTA, ventral tegmental area.

* Corresponding author at: Department of Psychology, Box 351525, University of Washington, Seattle, WA 98195-1525, United States. Tel.: +1 206 685 9660;

fax: +1 206 685 3157.

E-mail addresses: [email protected], [email protected] (Sheri J.Y. Mizumori).

0301-0082/$ – see front matter © 2011 Published by Elsevier Ltd. doi:10.1016/j.pneurobio.2011.08.010


6.3. The ventral striatum: Pavlovian learning and cost-based decision making ...... 115

6.3.1. The nucleus accumbens and Pavlovian learning ...... 116

6.3.2. The nucleus accumbens and cost-based decision making ...... 116

6.3.3. Spatial learning and navigation: the role of the ventral striatum ...... 118

6.4. Dorsal striatum: contributions to response and associative learning...... 118

6.4.1. Action–outcome learning and habit learning in the dorsal striatum ...... 119

6.4.2. Response learning in the dorsal striatum...... 119

6.4.3. Sequence learning in the dorsal striatum...... 120

6.5. Interactions between the dorsomedial and dorsolateral striatum ...... 120

7. Neural systems coordination: cellular mechanisms ...... 121

7.1. Single cells and local network coordination ...... 121

7.2. Neural systems organization and oscillatory activity ...... 122

7.2.1. Theta rhythms ...... 122

7.2.2. Gamma rhythms ...... 122

7.2.3. Coordination of theta and gamma rhythms ...... 122

8. Neural systems coordination: decisions and common foraging behaviors ...... 123

8.1. Goal directed navigation in a familiar context ...... 123

8.2. Goal directed navigation in a familiar context following a significant change in context...... 123

8.3. Goal directed navigation in a novel context ...... 124

9. The challenges ahead...... 125

Acknowledgements ...... 125

References ...... 125

1. Introduction

Nearly all cognitive processes utilize or include some aspect of spatial information processing. An animal's ability to find its way around its world is critical for survival; it is crucial for obtaining food, avoiding predators and finding mates. Research into spatial information processing over many decades not only continues to define the mechanisms that contribute to spatial information processing, but these efforts have also provided significant insight into the fundamental mechanisms that underlie learning and memory more generally.

Within the laboratory, goal-directed spatial navigation, in particular, is an immensely useful behavior to study because in many ways it reflects ethologically relevant learning challenges, and provides opportunities to examine dynamic features of neural function that are otherwise not afforded by simpler behavioral paradigms and tasks. Goal-directed navigation is a complex behavior, requiring the subject to perceive its environment, learn about the significance of the environment, and then select where to go next based upon what has been learned. Thus, navigation-based tasks can be used to investigate behavioral and neural aspects of external and internal sensory perception, learning and decision making, memory consolidation and updating, and planned movement. Goal-directed navigation, then, is a powerful model by which to study dynamic neural systems interactions during a fundamental and complex natural behavior.

As a whole, efforts to understand the neurobiology of navigational behavior have focused mainly on the nature and mechanisms of spatial representation in limbic brain structures that are known to be important for spatial learning. As a result, there have been important revelations regarding the physiological mechanisms that control limbic spatial representations. Relating such representations, however, to limbic-mediated learning or memory has been indirect and correlational at best (as discussed in Mizumori et al., 2007a). Here, we suggest that careful application of reinforcement learning theory to an understanding of how decisions are made during goal-directed navigation can identify a fundamental and essential process that likely underlies navigation-related perception, learning, memory or response selection. That is, in order to understand how spatial representations are related to learning, it is necessary to understand how decisions are made during navigation from both neural and behavioral perspectives. Without the ability to make adaptive decisions, animals will not acquire the efficient learning strategies necessary for adaptive behaviors. It should be noted that the suggestion to link reinforcement learning ideas with navigation dates back decades, although the terminology may differ (e.g., cost–benefit analysis of foraging behavior vs. value-based decision making). By investigating this link in freely navigating animals, we may be able to uncover the mechanisms that underlie naturalistic motivated behaviors.

2. Navigation and foraging behavior

The natural foraging environments on which laboratory navigational tasks are based are tremendously complex. The forager's challenge is to acquire sufficient food stores to prevent starvation, produce viable offspring, and avoid predators. A natural tendency for many animals, including rodents, is to hoard small amounts of food in a scattered distribution within their home range or nest (Stephens, 1986). The caching of food requires careful route planning to and from the source of food, the cache, and the home nest. Moreover, because animals acquire food during times when it is abundant, and recover it when food sources are scarce, the animal must retain knowledge of where the food has been cached. This behavior, a naturally occurring spatially directed behavior, is evident in many species, including rodents, birds, spiders, honeybees, and humans (e.g., Anderson, 1984; Davies, 1977; Diaz-Fleischer, 2005; Goss-Custard, 1977; Hawkes et al., 1982; Waddington and Holden, 1979).

The development of mathematical models that formally defined naturally occurring foraging behaviors led to optimal foraging theory, which describes the foraging behavior of an animal in relation to the metabolic payoff it receives when using different foraging options. Most animals are adapted structurally and physiologically to feed on a limited range of food and to gather this food in specific ways (e.g., caching of food during times of abundance). Some food may contain more energy but be harder to capture or be further away, while food that is close at hand may not be as nutritionally profitable. According to optimal foraging theory, an 'optimal forager' will make decisions that maximize energy gain and minimize energy expenditure (Krebs and McCleery, 1984; Stephens, 1986). Two foraging models are of note: the 'prey model' proposed by MacArthur and Pianka (1966), and the 'patch model' proposed by Charnov (1976). The prey model seeks to define the criteria that determine whether prey items will


be consumed based on the level of energetic investment needed to

acquire the prey and the rate of energetic return (MacArthur and

Pianka, 1966). One prediction of the prey model is that when there

is an abundance of high quality food, an animal’s diet will consist

mainly of these items, and lower quality food is less likely to be

consumed. The patch model, on the other hand (Charnov, 1976), takes into account the energy expended when an animal searches for food that is clumped in space and time; the animal must decide how long to spend foraging within a food patch before abandoning it and moving on to another (i.e., exploration vs. exploitation).

These models have been mapped onto the behavior of several

species (e.g., Anderson, 1984; Cowie, 1977; Davies, 1977; Diaz-

Fleischer, 2005; Goss-Custard, 1977; Lima, 1983), and they

demonstrated decades ago the strength of applying an economic

approach to the study of naturally occurring, complex behaviors.
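The patch model's stay-or-leave rule can be made concrete with a small sketch. The functional forms and numbers below are illustrative assumptions, not values from Charnov (1976): cumulative energy gain in a patch is assumed to show diminishing returns, and the forager leaves once its instantaneous intake rate falls below the average rate available in the habitat.

```python
# Sketch of a patch-leaving rule in the spirit of Charnov's patch model.
# Assumptions (hypothetical, for illustration): gain in a patch follows
# diminishing returns g(t) = g_max * (1 - exp(-r * t)), and the forager
# compares its marginal intake rate against the habitat's average rate.

import math

def intake_rate(t, g_max=10.0, r=0.5):
    """Instantaneous gain rate g'(t) for the assumed gain curve."""
    return g_max * r * math.exp(-r * t)

def time_to_leave(avg_habitat_rate, dt=0.01):
    """Forage until the marginal intake rate drops below the habitat average."""
    t = 0.0
    while intake_rate(t) > avg_habitat_rate:
        t += dt
    return t

# A richer habitat (higher average rate elsewhere) predicts earlier patch
# departure, i.e., a shift from exploitation toward exploration.
assert time_to_leave(avg_habitat_rate=2.0) < time_to_leave(avg_habitat_rate=0.5)
```

The qualitative prediction, not the particular curve, is the point: as the rest of the environment gets richer, the optimal forager abandons each patch sooner.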

3. Laboratory tasks that are based on foraging behavior

The study of navigational behavior within the laboratory

became central to the study of learning and memory function with

the introduction of the rat as the primary research subject (Munn,

1950). There are a number of reasons why rodent foraging behavior

is an ideal model with which to study complex learning in the

laboratory: (1) rodents are naturally excellent foragers, and

therefore they tend to learn tasks based on this ability

exceptionally well; (2) we can apply our understanding of the

brain’s natural motivational circuitry to gain new clues about the

mechanisms of a highly evolved and adaptive complex learning

system; (3) despite its complexity – which is what most real world

learning is – this model is highly tractable; (4) within the human

literature, navigation-based tasks have been developed that mimic

the tasks used with rodents (e.g., Astur et al., 1998; Burgess et al.,

2002; Fitting et al., 2007; Hamilton et al., 2002).

As early as the late 1890s and early 1900s, Willard S. Small used

one of the first mazes to investigate learning by white rats (Small,

1899, 1900, 1901), and others soon followed (e.g., Carr, 1917;

Honzik, 1933; Tolman, 1930; Watson, 1907). Early mazes consisted of a system of runways or alleys arranged in various configurations. The first investigations into maze learning were aimed primarily at determining which sensory inputs were essential for successfully navigating a maze to the intended goal, and this led to the assumption that navigation through a maze is performed purely on proprioceptive responses (i.e., stimulus–response behavior), although later studies demonstrated that stimulus–response strategies were not sufficient to optimally solve complex mazes (Munn, 1950; O'Keefe and Nadel, 1978a,b). While many different kinds of mazes were developed in the early years of maze use, only a select few are still used, and these are well suited for studying reinforcement learning in the context of navigation. These include the T-maze and similar variations, including the multiple T-maze, the plus maze, and the Y-maze. The radial maze, introduced by David Olton in 1976, is another excellent and well-used example of a so-called 'multiple solutions' laboratory task (Olton and Samuelson, 1976). Unlike many of the mazes used in the early days, the solution to these sorts of maze tasks is sufficiently ambiguous that successful performance is based on more than a single trajectory to a unique goal, and this allows for testing of more than one cognitive strategy (see Fig. 1).

Fig. 1. Laboratory tasks used to assess navigational behaviors. (A) Morris swim task. Photograph of a rat swimming in the cued version of the Morris swim task, in which an escape platform is clearly visible to the rat. In the spatial version of the task, the platform is submerged beneath the opaque water, and the rat uses distal cues around the room to locate the platform. (B) Barnes Circular Platform Task. Photograph of a rat making an 'error' on the Circular Platform Task by looking into a hole that is not over the dark escape chamber. The arrow points to the correct location of the hole over the goal, which the rat must find on the basis of the features of the environment distal to the platform. (C) Radial arm maze. Photograph of a rat on one of the 8 arms of the radial maze, which is designed to mimic natural foraging behaviors. At the end of each of the arms is a food cup where reward is delivered. At the beginning of a trial, subjects are placed in the center of the maze and allowed access to all of the maze arms, but only a subset of the arms (usually four) will actually contain a reward. After a retention delay, the subject is returned to the maze. In win-stay conditions, the same four arms are baited after the delay, and the number of correct choices the subject makes in collecting these rewards is recorded. In win-shift conditions, the four arms not baited in the earlier trial are now baited, and the number of correct arm choices is recorded. Each day, a new set of four arms is chosen randomly. (D and E) Schematic of a plus maze. The plus maze represents a 'dual solutions' problem in that it can be solved using a 'response' strategy or a 'place' strategy. In the place/response task, rats are trained to retrieve food from one arm of a T-maze or cross maze. The content of learning can be assessed by moving the starting arm to the other side of the maze on a probe test. The animal may enter the arm corresponding to the location of the reward during training (place strategy) or the arm corresponding to the turning response that was reinforced during training (response strategy).
Photograph in panel (A) taken by Dr. J. Lister; photograph in panel (B) taken by Dr. C.A. Barnes; photograph in panel (C) taken by D. Jaramillo. All used with permission.
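The probe logic of the dual-solution plus maze (panels D and E of Fig. 1) can be sketched in a few lines. The compass arm labels and the particular training arrangement below are illustrative assumptions, not part of the original task description:

```python
# Sketch of the dual-solution plus-maze probe (illustrative labels).
# Assumed training: the rat starts in the south arm and food is in the
# east arm, so the reinforced response is a right body turn.

# Arm entered after a RIGHT body turn at the center, given the start arm
# (a rat entering from the south arm runs north, so a right turn leads east).
RIGHT_TURN = {"south": "east", "east": "north", "north": "west", "west": "south"}

def probe_choice(strategy, start_arm, trained_goal_arm="east"):
    """Predicted probe-trial choice for a 'place' vs. 'response' learner."""
    if strategy == "place":
        return trained_goal_arm          # navigates to the learned location
    return RIGHT_TURN[start_arm]         # repeats the reinforced body turn

# Starting the probe from the opposite (north) arm dissociates the strategies:
assert probe_choice("place", "north") == "east"      # same spatial location
assert probe_choice("response", "north") == "west"   # the opposite arm
```

The two strategies are behaviorally identical from the trained start arm; only the shifted start arm on the probe reveals what was actually learned.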

The plus maze figured prominently in early debates between behaviorists and cognitive learning theorists who pondered what, exactly, an animal learned that enabled it to find the goal on the maze (Hull, 1932, 1943; Packard, 2009; Restle, 1957; Tolman, 1930). Behaviorists argued that all behavior is simply elicited by antecedent stimuli within the environment, and thus a task such as the plus maze can be solved simply via stimulus–response associations (Guthrie, 1935). Cognitive learning theorists, on the other hand, argued that rats could engage in goal-directed behaviors to solve the maze task, meaning that animals were capable of learning the causal relationship between their actions and the resulting outcomes, allowing them control over their own actions based on their desire for a particular outcome (Tolman, 1948). The plus maze is arranged so that a goal location can be


approached from one of two start boxes. In the standard 'dual solution' version of the task, rats are consistently released from the same start arm, and are trained to retrieve reward from another, consistently baited maze arm. Rats can use one of two strategies to solve this task: they can acquire information concerning the spatial location of the goal and use that information to navigate to the rewarded arm (i.e., a place strategy), or they can learn to approach the rewarded location by acquiring a specific response, such as a right body turn, to reach the reward (i.e., a response strategy). To determine which strategy the rat is using, a probe trial can be given in which the rat starts the task from a different arm of the maze. Rats with knowledge of the spatial location of the food should continue to approach the rewarded arm on the probe trial, whereas rats that have learned a specific body turn should choose the opposite arm. A number of factors can influence which strategy a rat will ultimately use to reach the goal, including the amount of training the animal receives. Rats that are overtrained on this task tend to predominantly use a response strategy, whereas most rats will use a place strategy early in training. Thus, overtraining results in a shift from goal-directed action–outcome learning and strategy use to less flexible stimulus–response learning and strategy use (Packard, 1999; Packard and McGaugh, 1996). Other goal-directed navigation-based tasks that are widely used include the Morris Swim Task (Morris, 1981) and the Barnes Circular Platform Task (Barnes, 1979). All of the above tasks test goal-directed navigation that requires active decision making and learning about how reinforcers influence the choices that are made. These tasks can be contrasted to other 'foraging' tasks in which the animal is not required to implement a decision-based strategy, including random foraging (for bits of food sprinkled randomly around an open platform or box), tasks in which movement is passive (i.e., 'assisted' exploration), or tasks in which animals follow paths provided by the experimenter until rewards are encountered.

Navigational strategies (such as those just described) may range from relatively simple approach and avoidance behavior to the use of complex representations of the environment (e.g., geometrical maps). In the context of natural foraging, the goal is to find food while avoiding predators and minimizing energy expenditure. Similarly, in many maze tasks, the goal of a hungry rodent is to find food, or to avoid unpleasant situations, such as cool water or bright open spaces. In most cases, an animal is faced with more than one option. In a natural foraging context, an animal may need to take into account the energy expended while searching for food, and thus must decide how long to spend foraging within a food patch before abandoning it and moving on to another (i.e., exploration vs. exploitation). On a maze task (e.g., an 8-arm radial maze), the animal may need to decide which arms of the maze to visit first: for example, an arm that always has a small food reward, or an arm that only sometimes has a large food reward. To determine a course of action, the animal will engage in 'value-based decision making', which can be broken down into several steps (Fig. 2; Rangel et al., 2008; Mizumori et al., 2000; Sutton and Barto, 1998). First, the organism needs to determine the goal of the current behavior, a process that may include the assessment of one's internal state, such as level of hunger, and external context, such as risk in the environment. Next, a value assignment is made for each available action, taking into consideration the relative cost or benefit associated with each action. Once these values have been assigned, they can be compared, and a choice is then made about which behavior to select; the chosen behavior is then implemented. An analysis of the outcome of the behavior can then be made. Did the action result in the desired outcome? Was the outcome better than expected, or worse? Finally, this feedback is used to update learning and memory processes so that future decisions can be informed by what has just been learned. Learning is said to be 'complete' when the outcome of the chosen course of action is aligned to the expected outcome. If the outcome, on the other hand, is better or worse than expected, learning about which actions will lead to an optimal outcome continues.

Fig. 2. A general conceptual framework for evaluating goal-directed decision making behavior. [Flowchart: 'The Goal in a Given Context' feeds into 'Value Assessment', then 'Action Selection', then 'Outcome Evaluation', with a 'Learning & Memory' stage interacting throughout.] Within a context, an assessment of the internal and external factors of the current situation helps to determine the current goal for behavior. The factors that influence goal assessment include internal states (e.g., hunger or thirst) and external factors (e.g., distance to different goal locations, presence of predators). A value assessment involves considering how rewarding any one goal is (e.g., a far away large cache of food vs. an uncertain but close cache) and assigns value to each of the available options. An action is selected and is then implemented. An evaluation of the outcome is made. Did the behavior result in the expected reward? Was the outcome better (e.g., more food) or worse (no food) than expected? The outcome of the behavior results in learning when the outcome does not match the expectation, and might be considered 'complete' when a mismatch between what is expected and what is actually achieved no longer occurs. Memory stores can then be updated to guide subsequent behavior.
After Rangel et al. (2008).

These processes are, of course, theoretical in nature and not absolute, but they help to guide our thinking about the neurobiological processes that contribute to successful goal-directed navigation. It may be prudent, at this point, to define 'reward' (for the sake of simplicity, we consider reward to be synonymous with goal). Rewards can be defined as objects or events that elicit approach and consummatory behavior, and they represent positive outcomes of decisions that result in positive emotions and hedonic feelings. Rewards are crucial for survival and support elementary processes such as drinking, eating and reproduction. In other situations, rewards can also be more abstract, such as money, social status, and information (e.g., Bromberg-Martin and Hikosaka, 2009; Corrado et al., 2009).

4. Reinforcement learning and decision making environments

Reinforcement learning describes the process through which an organism learns to optimize behavior within a decision environment (see Fig. 3). The ultimate goal of reinforcement learning is to implement behaviors or actions that result in a maximization of reward or minimization of punishment. The decision-making environments in which reinforcement learning occurs consist of a set of 'states' (Sutton and Barto, 1998), which in the case of navigation can be represented by locations on a maze (e.g., the center platform would be one 'state', the end of an arm another 'state'); a set of possible actions that the decision-maker can choose from (e.g., turn left or travel south); and a set of rules that the decision-maker will initially be naïve to, and thus must learn via interaction with the environment (e.g., a large reward is always available on the south maze arm).
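The decision environment just described can be written down as a minimal state–action table. The state names, actions, and reward rule below are hypothetical stand-ins for the maze example in the text, not a specification taken from the reinforcement learning literature:

```python
# A minimal maze decision environment: states are locations, actions move
# between them, and a reward rule (initially unknown to the learner) pays
# off at particular states. All labels are illustrative assumptions.

TRANSITIONS = {
    ("center", "go_south"): "south_arm",
    ("center", "go_north"): "north_arm",
    ("south_arm", "return"): "center",
    ("north_arm", "return"): "center",
}

# The "rule" the animal must discover: a large reward on the south arm.
REWARDS = {"south_arm": 1.0, "north_arm": 0.0}

def step(state, action):
    """Apply one action; return the next state and the reward found there."""
    next_state = TRANSITIONS[(state, action)]
    return next_state, REWARDS.get(next_state, 0.0)

state, reward = step("center", "go_south")
assert (state, reward) == ("south_arm", 1.0)
```

Everything a reinforcement learner needs to discover is contained in the transition and reward tables; the learner itself only ever observes the results of its own `step` calls.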


Fig. 3. Reinforcement learning on a maze task. [Panels: (A) model-free trial and error decision making; (B) model-based action–outcome decision making; each panel shows a maze schematic with a start location and states S1–S5.] (A) Schematic of model-free trial and error decision making on a plus maze task. Model-free reinforcement learning involves learning action values directly, by trial and error. The environment in which learning occurs consists of a set of states (i.e., locations on the maze), and each state (S1–S5) is initially independent of other states. Because the decision-maker has not had experience with the states, they will all have similar values assigned to them, and are thus equally likely to be chosen. (B) Schematic of model-based action–outcome decision making. The ultimate goal of reinforcement learning is to select actions that result in a maximization of reward. Model-based reinforcement learning uses experience to construct an internal model, for example, a cognitive map, of the transitions and immediate outcomes in the environment. Through trial and error learning, this representation is constructed, and helps to strengthen the connections between states. In the example shown here, thicker lines represent stronger associative connections, while thinner lines represent connections that are not as strong. Dashed lines indicate that an association has not been strengthened, as in the case when reward is not delivered at one of those states (S5). In this example, the decision-maker has learned that choosing to go from S2 to S4 results in a large reward, whereas moving from S2 to S3 results in acquisition of a small reward. In a dynamic environment, the value of the rewards may change, resulting in either strengthening or weakening of the associations between states.

The actions or behaviors that the decision-maker implements move the agent from one state to another, and produce outcomes which can have positive or negative utilities (e.g., finding a large reward, a small reward or no reward). Finally, the utility of the outcome can change, even within the same state, through factors such as the motivational circumstances of the decision-maker, such as a change from hunger or thirst to satiation (e.g., Aberman and Salamone, 1999; Dayan and Daw, 2008; Dayan and Niv, 2008; Niv, 2009; Sutton and Barto, 1998).

Reinforcement learning models are often divided into model-free and model-based categories (e.g., Daw et al., 2005; Niv et al., 2006). Using model-free reinforcement learning strategies, animals learn the value of each action directly, by trial and error. In contrast, model-based reinforcement learning uses experience to construct an internal model, for example, a cognitive map, of the transitions and immediate outcomes in the environment. Animals can then estimate the value associated with each action on every trial using knowledge about their costs and benefits. Within the framework of navigational behavior, this kind of learning allows action selection to be dynamic, changing as the rules within the environment change, and is thus suited to support goal-directed behaviors. Learning using both model-based and model-free strategies is generally driven by 'prediction errors', which are the differences between actual and expected outcomes, and are used to update expectations in order to make predictions more accurate.

4.1. Temporal difference learning

A critical problem in animal and human decision making is how to choose behaviors that will lead to reward in the long run. A 'classic' approach to this problem was proposed by Rescorla and Wagner (1972), who argued that learning occurs when there is a discrepancy between events that are predicted and those that actually happen. An extension to the Rescorla–Wagner model was proposed by Sutton (1988) and Sutton and Barto (1998) in a model which came to be known as 'temporal difference learning'. This has been widely used in modeling behavioral and neural aspects of reward-related learning (e.g., Bayer and Glimcher, 2005; Kurth-Nelson and Redish, 2009, 2010; Ludvig et al., 2008; Maia, 2009; Montague et al., 1996; Nakahara et al., 2004; O'Doherty et al., 2003; Pan et al., 2005, 2008; Schultz et al., 1997; Seymour et al., 2004), such that reward predictions are constantly improved by comparing them to actual rewards (Sutton and Barto, 1998). According to such models, an expected reward value for a given state is estimated. When external reward is delivered, it is translated into an internal signal that enters into a computation that determines whether the value of the current state is better or worse than predicted. Signals that reflect discrepancies between expected and actual reward values can be used to update future expected values and reward probabilities. The temporal difference model can be used to describe how neural responses to stimuli change during learning; as prediction improves, these responses reflect the linking of stimuli with their expected probability of reinforcement. By extension, then, the temporal difference model predicts that neural activation will gradually shift from the time of reward to the time of the predictors of subsequent reinforcement (reviewed in Suri, 2002; Suri and Schultz, 2001). Indeed, different types of neurons have been shown to exhibit these sorts of changes in firing during learning (Hollerman and Schultz, 1998; Mirenowicz and Schultz, 1994; Schultz et al., 1993).

Although the neural circuitry by which temporal difference computations occur remains to be clarified, a popular idea is that there is one neural network that selects behaviors (the 'actor'), and a second neural network that evaluates the outcomes of the behaviors selected by the actor; that second network is referred to as the 'critic' (e.g., Houk et al., 1995; Sutton and Barto, 1998). The fact that neurons within the reward circuitry represent action, and sometimes action sequences, as well as reward (Graybiel, 1998; Hikosaka et al., 1989, 1999; Lavoie and Mizumori, 1994; Mulder et al., 2004; Schmitzer-Torbert and Redish, 2004; Schultz et al., 1997; van der Meer et al., 2010; Wiener, 1993) was taken as initial evidence to support an actor–critic explanation.
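The temporal difference update described above can be sketched in a few lines. This is a generic tabular TD(0) illustration of the idea, with states, learning rate, and trial count chosen for the example rather than taken from any study cited here: the value of a reward-predicting state grows with training until the prediction error at the time of reward shrinks, mirroring the predicted backward shift of neural responses toward the predictor.

```python
# Tabular TD(0) on a two-step episode: a predictive cue state precedes a
# reward state. Update rule: V(s) <- V(s) + alpha * (r + gamma*V(s') - V(s)).
# The states, alpha, gamma, and trial count are illustrative assumptions.

ALPHA, GAMMA = 0.1, 1.0
V = {"cue": 0.0, "reward_state": 0.0, "end": 0.0}

def run_trial():
    """One cue -> reward episode; returns the prediction error at reward time."""
    # Transition cue -> reward_state (no reward delivered yet).
    delta_cue = 0.0 + GAMMA * V["reward_state"] - V["cue"]
    V["cue"] += ALPHA * delta_cue
    # Transition reward_state -> end, with reward r = 1 delivered.
    delta_reward = 1.0 + GAMMA * V["end"] - V["reward_state"]
    V["reward_state"] += ALPHA * delta_reward
    return delta_reward

first_error = run_trial()
for _ in range(500):
    last_error = run_trial()

# Early in training the surprise occurs at reward delivery; with experience
# the cue comes to carry the prediction and the error at reward time vanishes.
assert first_error == 1.0
assert last_error < 0.01 and V["cue"] > 0.9
```

The same discrepancy term, delta, that drives learning here is the quantity the actor–critic account assigns to the critic, and the quantity compared against dopamine cell firing in the studies discussed below.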


Computational models suggest that the critic compares the outcome of the action of the actor against the expected value based on past experience. If there is a discrepancy between predicted and actual rewards (i.e., a reward prediction error), a temporal difference reinforcement signal is used to update the value signal in memory. Future actions are then selected according to whether they are expected to produce a maximal value reward.

The striatum has received much attention as the locus of the actor–critic function (e.g., Joel et al., 2002). The lateral dorsal striatum is often considered to mediate stimulus–response or habit learning, while the ventral striatum and medial dorsal striatum are thought of as evaluators of the outcomes of actions (see Section 6). Thus, many view the actor–critic networks as corresponding to the lateral dorsal striatum and ventral/medial dorsal striatum, respectively (e.g., van der Meer and Redish, 2011; van der Meer et al., 2010). Since reward prediction error signals are coded by dopamine neurons as well (Khamassi et al., 2008; O'Doherty et al., 2004; Schultz, 1997), dopamine neurons may also contribute to analysis by the critic. Others suggest that there are

The prediction error hypothesis has garnered a great deal of attention since it was first proposed because it is exactly the kind of teaching signal that figures prominently in many models of learning, including the Rescorla–Wagner model and the temporal difference reinforcement learning algorithm (Rescorla and Wagner, 1972; Sutton and Barto, 1998; Sutton, 1988). There is, however, evidence that dopamine may also function in other capacities to facilitate learning. For example, while most conceptualizations focus on reward-related signaling in the positive sense, there is also evidence that a subpopulation of dopamine neurons exhibits phasic responses to aversive stimuli or to cues that predict aversive events (e.g., Brischoux et al., 2009; Joshua et al., 2008; Matsumoto and Hikosaka, 2009; Zweifel et al., 2011). In addition, there are data suggesting that dopamine may provide a reward risk signal (Fiorillo et al., 2003), and also signal non-rewarding salient events, such as surprising or novel stimuli (Redgrave and Gurney, 2006). Thus, a broader conceptualization of the role of dopamine in learning has emerged (e.g., Berridge, 2007;

multiple actor–critic functional modules within striatum, and Bromberg-Martin et al., 2010; Redgrave and Gurney, 2006;

these correspond to the matrix–patch cellular subdivisions that Redgrave et al., 1999b; Salamone, 2007; Wise, 2006). Based on a

run through both dorsal and ventral striatum, respectively (Houk, growing body of experimental evidence that suggests that

1995). While the issue of localization remains to be resolved, it is different subgroups of neurons within the midbrain respond

becoming clearer that the neurocircuitry underlying critic func- differentially to, reward, aversive stimuli and novelty, Bromberg-

tions extends across, at least, the dopaminergic-striatal circuitry Martin et al. (2010) suggest that some dopamine neurons encode

(see Section 6). reward value, necessary for reward seeking and value learning,

It is worth noting that as appealing as the temporal difference while others encode motivational salience necessary for orienting

model is, it cannot represent the full picture for how reinforcement and general motivation.

outcomes are determined. This is because reward is often delayed, One hypothesis about how dopamine supports reinforcement

and can be separated from the action for which it was rewarded by learning is that it adjusts the strength of synaptic connections

other, irrelevant actions. Such a delay creates an accountability between neurons according to a modified Hebbian rule (‘neurons

problem referred to as the problem of ‘temporal credit assignment’ that fire together wire together’; Hebb, 1949). Conceptually, if cell

(Sutton and Barto, 1998). Studies of goal-directed navigation could A activates cell B, and cell B results in an action that is rewarded,

be particularly useful for determining how the brain naturally dopamine is released and the A/B connection is reinforced

solves the temporal credit assignment: one can imagine a case (Montague et al., 1996; Schultz, 1998a,b). With enough experience,

when an animal will have to make a decision at, for example, a ‘fork this mechanism would allow an organism to learn the optimal

in the road’. After enacting a decision about which way to turn, a choice of actions to gain reward. In fact, dopamine has been shown

number of pathways may become available, the selection of any to facilitate synaptic plasticity in several mnemonic brain

one of which will lead to the goal (see Fig. 3). The next time the structures (Frank, 2005; Goto et al., 2010; Lisman and Grace,

animal encounters the ‘fork in the road’, it will have to remember 2005; Marowsky et al., 2005; Molina-Luna et al., 2009; Surmeier

which of the many subsequent alternatives led to the desired goal. et al., 2010). The precise information transmitted when dopamine

cells fire is not clear. To address this issue, it is necessary to

4.2. Dopamine and reinforcement learning understand the firing patterns of dopamine neurons, and the

factors that regulate these patterns. Dopamine signals occur in two

A critical and unresolved issue is how the brain implements modes, a tonic mode and a phasic mode (Grace, 1991; Grace et al.,

reinforcement learning algorithms. In a series of pioneering studies 2007). Tonic dopaminergic signaling maintains a steady baseline

conducted in non-human primates, Schultz et al. (1997) provided level of dopamine in afferent structures. While a precise functional

evidence that one of the primary neural correlate of reinforcement role for the tonic dopamine signal has not yet been established

learning theory may reside in the signal provided by midbrain (Ostlund et al., 2011), one intriguing hypothesis is that tonic

dopamine neurons. Dopamine neurons respond with phasic bursts dopamine may represent the ‘‘net value’’ of rewards, and underlie

of action potentials when an unexpected reward is delivered, and the vigor with which responding is made (Niv et al., 2007). Phasic

also respond to conditioned cues that predict reward (Ljungberg dopamine, on the other hand, is the dopaminergic signal that is

et al., 1992; Mirenowicz and Schultz, 1994). When, however, an thought to do the heavy lifting, at least in terms of reward

expected event or reward does not occur, the activity of some processing (Schultz, 1997; Schultz et al., 1997; Wise, 2005) and

putative dopamine cells tend is inhibited. Thus, a reward that is incentive salience that promotes reward seeking (Berridge and

better than predicted can generate a positive prediction error, a Robinson, 1998). Dopamine may have unique effects across

fully predicted reward elicits no error, and a reward that is worse different efferent targets, however, since (a) the regulation of

than predicted can elicit a negative prediction error (e.g., Bayer and tonic vs. phasic activation of dopamine cells is controlled by an

Glimcher, 2005; Hollerman and Schultz, 1998; Hollerman et al., array of diverse inputs, and (b) dopamine efferent systems express

1998). In this way, dopamine acts as a teaching signal that enables different levels and types of dopamine receptors. Important for the

the use of flexible behaviors during learning (Schultz and present discussion, both the ventral tegmental area (VTA) and the

Dickinson, 2000), and facilitates motivated behaviors by signaling substantia nigra pars compacta (SNc) project to the hippocampus

the salience of environmental stimuli, such as cues that predict and to the striatum, two brain structures frequently discussed in

food (Berridge and Robinson, 1998; Flagel et al., 2011; Salamone terms of goal-directed navigation and learning. How dopamine

and Correa, 2002). In addition, the prediction error signal appears contributes to information processing within these structures

to take into account the behavioral context in which rewards are during navigation-based learning will be discussed in the

obtained (Nakahara et al., 2004). following sections.
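The temporal difference account sketched in Sections 4.1 and 4.2 can be made concrete in a few lines of code. The simulation below is purely illustrative (the trial structure, learning rate, and discount factor are arbitrary assumptions, not values from this review): a TD(0) learner repeatedly experiences a cue followed four time steps later by reward, and with training the prediction error at the time of the now fully predicted reward shrinks toward zero while value accrues at the earlier, reward-predicting states.

```python
# Temporal difference (TD(0)) learning on a simple cue -> delay -> reward episode.
# Illustrative sketch only; states, timing, and parameters are arbitrary assumptions.

GAMMA = 0.9   # discount factor
ALPHA = 0.1   # learning rate
T = 5         # time steps per trial: cue at t=0, reward delivered at t=4

def run_trials(n_trials):
    """Return learned state values V and the per-step TD errors of the last trial."""
    V = [0.0] * (T + 1)          # V[T] is a terminal state with value 0
    reward = [0.0] * T
    reward[T - 1] = 1.0          # reward arrives at the final step of each trial
    errors = []
    for _ in range(n_trials):
        errors = []
        for t in range(T):
            delta = reward[t] + GAMMA * V[t + 1] - V[t]   # reward prediction error
            V[t] += ALPHA * delta                         # value update
            errors.append(delta)
    return V, errors

V, last_errors = run_trials(500)
print("learned values:", [round(v, 2) for v in V[:T]])       # value builds up at early states
print("last-trial errors:", [round(d, 2) for d in last_errors])  # errors near zero once predicted
```

This mirrors the electrophysiological observations cited above: an unexpected reward initially evokes a large positive error, a fully predicted reward evokes essentially none, and omitting the reward after training (setting `reward[T - 1] = 0`) would yield a negative error at the expected reward time.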

Fig. 4. Flow of cortical information to hippocampus. Multimodal sensory, motor, and associative information arrive in the hippocampus primarily through the parahippocampal cortex. The anatomically distinct medial entorhinal cortex and lateral entorhinal cortex receive spatial and nonspatial information from distinct adjacent cortical regions: the postrhinal cortex (spatial), which receives input from the parietal and retrosplenial cortices (not shown), and the perirhinal cortex (nonspatial), respectively. Both entorhinal cortical regions, in turn, project to the dentate gyrus, CA3, CA1 and subicular regions of hippocampus proper. Although all intrahippocampal regions receive neocortical input, each is thought to make a distinct contribution to the determination of context saliency as context information passes through from the dentate gyrus to the subiculum. The red arrow refers to the large recurrent excitatory system found amongst CA3 neurons. Presumably this unique pattern allows for information to be held on-line for brief periods.

5. The neurobiology of reinforcement learning and goal-directed navigation: hippocampal contributions

The previous discussion clearly illustrates the central role of dopamine in decision-making processes that lead to effective learning. In this section, we first describe the hippocampal neural circuit whose dynamic and interactive functions form the substrate on which the dopamine system acts, then discuss how this circuit guides decision making (and ultimately learning) by identifying the saliency of a context (i.e., whether a familiar context has changed or if the current context is novel). Both instances of context analysis may rely on the same computation.

5.1. Hippocampal place fields as spatial context representations

The hippocampal complex is comprised of hippocampus proper and the surrounding parahippocampal cortex. Generally speaking, there are two tracks of information flow into the hippocampus from the neocortex (see Fig. 4). Spatial information arrives from the postrhinal region of posterior cortex to the medial entorhinal area. In contrast, predominantly nonspatial information is passed from the perirhinal cortex to the lateral entorhinal cortex. Both entorhinal cortices in turn project to all of the subregions of hippocampus proper (which includes the dentate gyrus, CA3, CA1 and subicular areas; Amaral and Lavenex, 2006; Burwell, 2000; Burwell and Amaral, 1998a,b; Van Strien et al., 2009).

Single unit recording studies have generated foundational information for theories of hippocampal function. The most commonly reported behavioral correlate of hippocampal output neurons (pyramidal cells) is location-selective firing, referred to as place fields (see Fig. 5 for an example; O'Keefe and Dostrovsky, 1971).

Fig. 5. (A) Schematic illustration of location-selective firing by a hippocampal CA1 place cell (red), and a hippocampal CA3 place cell (blue). As shown, CA3 place fields tend to be more spatially constricted than CA1 place fields. Also, place fields typically show a Gaussian distribution of firing as an animal traverses the place field. (B) Entorhinal cortex contains cells that show regularly spaced location-selective firing. These are referred to as grid cells, as the firing fields can be viewed as vertices of a grid that covers a particular environment. (C) A third type of spatial representation is one that relays information about the directional heading of an animal. In this example, the arrows indicate the preferred orientation direction of a cell: if the animal orients its head in the northeast direction of the environment (from any location), the cell will preferentially fire. Typically, when the rat orients its head in other directions, a head direction cell will not fire.
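The Gaussian distribution of firing noted in the Fig. 5 caption can be sketched as a minimal rate model. The toy model below is purely illustrative (the field center, width, and peak rate are invented values, not measurements from the literature):

```python
import math

# Toy Gaussian place-field rate model: firing rate peaks at the field center
# and falls off smoothly as the animal moves away from it.
# Center, width (sigma), and peak rate are arbitrary illustrative values.

def place_field_rate(position_cm, center_cm=50.0, sigma_cm=10.0, peak_hz=15.0):
    """Mean firing rate (Hz) of a model place cell at a given track position."""
    return peak_hz * math.exp(-((position_cm - center_cm) ** 2) / (2 * sigma_cm ** 2))

# Rate map along a hypothetical 1-m linear track, sampled every 10 cm
rate_map = [round(place_field_rate(x), 1) for x in range(0, 101, 10)]
print(rate_map)  # peaks at the 50-cm bin, near zero at the track ends
```

A narrower `sigma_cm` would mimic the more spatially constricted CA3-like fields, and a wider one the broader CA1-like fields described in the caption.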

The seminal discovery that hippocampal pyramidal neurons exhibit remarkably distinct and reliable firing when rats visit particular regions of the environment led to a widely held view of hippocampus as a cognitive map (O'Keefe and Nadel, 1978a,b). Decades of research (for reviews see McNaughton et al., 1996; Mizumori et al., 1999; Muller et al., 1996; O'Keefe, 1976; O'Mara, 1995; Wiener, 1996) clearly demonstrate that place fields reflect more than details of the current external sensory surround, since they are observed when external cues are essentially absent (McNaughton et al., 1996; O'Keefe and Conway, 1978; Quirk et al., 1990). Further, in the absence of external sensory cues, temporal or internal sensory cue information has been shown to shape the characteristics of place fields. For instance, the elapsed time since leaving a goal box can often be a better predictor of place fields than the external features of an environment (Gothard et al., 1996; Redish et al., 2000). Also, internally generated sensory and motion information about one's own behavior impacts place fields: the velocity of an animal's movement through a place field, the direction in which rats traverse a place field, and vestibular (or inertial) information have been shown to be correlated with place cell firing rates (e.g., Gavrilov et al., 1998; Hill and Best, 1981; Knierim et al., 1995; Markus et al., 1994; McNaughton et al., 1983; Wiener et al., 1995). Evidence indicates that the location selectivity of place fields is positively related to the degree of sensitivity to internally generated cues: for example, the extent to which place fields are sensitive to internally generated cues systematically declines from the septal pole to the temporal pole of hippocampus (Maurer et al., 2005), and place fields become increasingly larger for place cells recorded along the dorsal-to-ventral axis (e.g., Jung et al., 1994). Also supporting the conclusion that (at least dorsal) hippocampal place fields represent egocentric information are findings that the degree to which animals are free to move about in an environment predicts place field specificity (Foster et al., 1989; Gavrilov et al., 1998; Song et al., 2005). Compared to passive movement conditions in which rats are made to go through a place field either by being held by the experimenter or by being placed on a moveable robotic device, active and unrestrained movement corresponds to the observation of more selective and reliable place fields (Terrazas et al., 2005). The fact that neural representations in the brain are so dramatically affected by voluntary and active navigation provides a compelling argument for studying not only learning, but also decision making, in animals that navigate spatially extended environments.

One interpretation of the sensitivity of place fields to both egocentric and allocentric information is that it allows rats to rapidly switch between multiple cue sources, thereby ensuring continuously adaptive choices (e.g., Etienne and Jeffery, 2004; Gavrilov et al., 1998; Knierim et al., 1995; Maurer et al., 2005; McNaughton et al., 1996; Mizumori et al., 2000; Mizumori, 2008; Whishaw and Gorny, 1999). Such an ability seems advantageous in a constantly changing environment. The identity of the necessary changes in conditions that lead to a decision to switch strategies, however, remains to be determined.

To identify motivational or mnemonic, rather than sensory or behavioral state, influences on place fields, rats can be trained to solve a maze task under conditions in which the external sensory environment and the behavioral requirements of the task are held constant while the internal state or specific memory used to guide behaviors is manipulated by the experimenter (e.g., Frank et al., 2000; Kelemen and Fenton, 2010; Smith and Mizumori, 2006a,b; Wood et al., 2000; Yeshenko et al., 2004). Under these test conditions, place field representation of sensory and behavioral information can be conditional upon an animal's motivational state (e.g., hungry or thirsty; Kennedy and Shapiro, 2004), as well as recent (retrospective coding) or upcoming (prospective coding) events such as behavioral sequences, or response trajectories (Buzsaki, 1989; Fenton and Muller, 1998; Ferbinteanu and Shapiro, 2003; Ferbinteanu et al., 2011; Foster and Wilson, 2006; Frank et al., 2000; Lee and Wilson, 2002; Louie and Wilson, 2001; Olypher et al., 2002; Pennartz et al., 2002; Touretzky and Redish, 1996; Redish, 1999; Wilson and McNaughton, 1994; Wood et al., 2000; Yeshenko et al., 2004). Additional reports provide evidence that place fields reflect expectations based on learned reward information (e.g., Jackson and Redish, 2007). Place fields have been observed to move closer to goal locations as animals gain more experience receiving rewards at the goal (Hollup et al., 2001; Lenck-Santini et al., 2001, 2002). Further, when compared to times of random foraging, a larger proportion of hippocampal neurons exhibit reward responsiveness when rats are explicitly trained to discriminate reward locations (Smith and Mizumori, 2006b). Thus, an animal's motivational state or its expectations or successful behavioral outcomes contribute to how learning-related brain structures code information that is directly relevant to future decisions and behavioral choices.

Place fields, then, appear to represent a matrix of information that includes location-selective salient features such as external and internal sensory information, an animal's past, present, and future behaviors relative to the target location, as well as the expectations for the consequences of behaviors. This sort of complex representation has been taken as evidence that during active navigation, the hippocampus represents spatially organized contextual information, perhaps for the purpose of determining the salience of the current context. Context saliency refers to not only the significance of currently existing contextual features, but also the extent to which the expected contextual features have changed (e.g., Kubie and Ranck, 1983; Mizumori et al., 1999, 2000; Mizumori, 2008; Nadel and Payne, 2002; Nadel and Wilner, 1980). This conclusion is consistent with a literature documenting the impact of hippocampal lesions on animals' use of contextual information (for reviews see Anagnostaras et al., 2001; Maren, 2001; Myers and Gluck, 1994). For example, subjects with hippocampal damage do not exhibit conditioned fear responses to contextual stimuli even though responses to discrete conditional stimuli remain intact (Kim and Fanselow, 1992; Phillips and LeDoux, 1992). While intact subjects exhibit decrements in conditioned responding when the context is altered, subjects with lesions of the hippocampus (Penick and Solomon, 1991) or the entorhinal cortex (Freeman et al., 1997) do not. These findings converge on a hypothesis that hippocampus is important for determining context saliency.

It is important to note that a context processing interpretation of hippocampal neural representations is entirely consistent with a number of hypotheses that have been put forth to account for hippocampal contributions to learning, including spatial processing (e.g., Long and Kesner, 1996; O'Keefe and Nadel, 1978a,b; Poucet, 1993), working memory (Olton et al., 1979), relational learning (Eichenbaum and Cohen, 2001), episodic memory (e.g., Tulving, 2002), context processing (e.g., Hirsh, 1974), declarative memory (Squire, 1994), and the encoding of experiences in general (Moscovitch et al., 2005). It is consistent with these other theories because context analyses represent a fundamental computation of the hippocampus that underlies relational learning, or episodic, working, or declarative memory (e.g., Mizumori, 2008).

5.2. The hippocampus distinguishes contexts during navigation

The literature shows that place cells are simultaneously responsive to, and thus presumably encode, a combination of different context-defining features such as spatial information (i.e., location and heading direction), consequential information (i.e., reward), current movement-related (i.e., velocity and acceleration – determinants of response trajectory), external (nonspatial)

sensory information, the currently active memory (defined operationally in terms of task strategy and/or task phase), and the current motivational state. Thus, place fields are considered to be spatial context representations, and it has been suggested that they code the extent to which familiar contexts change (Nadel and Payne, 2002; Nadel and Wilner, 1980), perhaps by performing a match–mismatch comparison of expected and actual context features (e.g., Anderson and Jeffery, 2003; Jeffery et al., 2004; Mizumori et al., 1999, 2000; Vinogradova, 1995). The results of match–mismatch comparisons can serve as a metric for determining the saliency of the current context, and this in turn should be directly related to an animal's ability to distinguish contexts. Such a discrimination function seems necessary for the hippocampus to define significant events or episodes (as defined by Tulving, 2002). Analogous to what has been described by others (e.g., Hasselmo, 2005a,b; Hasselmo and McGaughy, 2004; Lisman, 1999; Mizumori, 2008; Smith and Mizumori, 2006a,b; Treves, 2004; Wang and Morris, 2010), the process of comparing expected and actual contexts should be automatic in nature because a change in a context can happen often or at unexpected times during natural foraging. By continually determining context saliency (i.e., always computing whether a context has changed), the hippocampus can immediately alert other neural systems when a change does occur. In this way, the hippocampus contributes to rapid learning of new information and the optimal implementation of adaptive choices and behaviors.

What is the underlying neural circuitry that discriminates contexts? A Context Discrimination Hypothesis (Mizumori, 2008; Smith and Mizumori, 2006a) emphasizes the importance of representing integrated sensory, motivational, response, and memorial input. Indeed, place fields represent such integrated information. The relative strengths of these four types of inputs may vary depending on task demands, such that a given cell may show, for example, a place correlate during the performance of one task, and a nonspatial correlate during the performance of a different task (e.g., Wiener et al., 1989). Also, movement correlates observed in one task may not be observed when the memory component of the context, and not behavior, changes (e.g., Yeshenko et al., 2004). It should be noted that context discrimination by hippocampal neurons is observed not only during performance of spatial tasks, but also during nonspatial task performance such as olfactory (e.g., Wiener et al., 1989) or auditory discrimination (Freeman et al., 1996; Sakurai, 1994). Thus, context discrimination may be a basic hippocampal operation that can be universally applied to facilitate decision making, enhance learning, and/or strengthen any sort of memory that uses context information. As such, it is important to understand how context discrimination is accomplished at a neural level, since this should help us to understand the types of contextual information that come to impact future decisions. The following summarizes the neural circuitry that may be responsible for determining context saliency by hippocampal neurons.

5.3. Cellular and network mechanisms underlying hippocampal context processing

Determining context saliency likely involves a number of stages of processing within different synaptic regions of hippocampus (Fig. 4). The following discussion describes how these various stages of processing may result in an assessment of context saliency, beginning with context representation by individual neurons.

The relative influence of context-defining input on the discharge rates of place (pyramidal) cells and interneurons may vary not only according to the strength of each type of afferent input, but also the intrinsic (membrane) properties of a cell. Place cells exhibit characteristic short-lasting, high frequency bursts of action potentials when a rat passes through a cell's place field (Ranck, 1973). This type of phasic, burst firing pattern is thought to be associated with increased synaptic plasticity (Martin et al., 2000), as well as the encoding of discrete features of a situation that do not change very rapidly or often (e.g., significant locations, reward expectations, task phase). Interneurons, on the other hand, discharge signals continuously and at high rates, a pattern that is well suited to encode rapidly and continuously changing features, such as changes in movement and orientation during task performance. The combination of context features and the potential for temporally patterned discharge by both pyramidal cells and interneurons, then, provides the hippocampus with a rich array of rate and temporal neural codes to use in the determination of context saliency (Mizumori et al., 1999; Mizumori, 2008).

It is often reported that place fields rapidly reorganize (i.e., change field location and/or firing rate within the place field) when an environmental context is altered. Notably, however, unless an animal is tested in a completely novel environment, one also finds a group of place fields that are unchanged following a change in the context. Thus, there seem to be two forms of context representation in the hippocampus. The place fields that reorganize after context modification may reflect current contextual features, while the place fields that persist when a context changes may reflect the expected contextual features. In principle, a novel environment would not generate expectations, resulting in 'complete reorganization', where 100% of the cells exhibit new place field properties. However, when an animal experiences a change in a familiar context, one observes what is referred to as 'partial reorganization', when only a subset of place fields shows altered properties (for review, see Colgin et al., 2008). To explain the latter, it is helpful to clarify that any context representation, almost by definition, reflects a unique array of inputs. In theory, then, a change in any one or combination of features could result in the production of an 'error' signal that reflects a mismatch between expected and actual context features (Mizumori et al., 2000). If such a 'context prediction error' occurs, then the output message from hippocampus should reflect this fact. Such a signal may be sent to update cortical memory circuits, which in turn leads to an update of the most recent hippocampal expectation for a context. A hippocampal output that signals a context prediction error may also be sent to the ventral striatum to engage the critic function of the actor–critic system (described in more detail in Section 4.1). Further, a context error message should update the selection of ongoing behaviors by informing circuitry. If it is determined that the context has not changed (i.e., there is no place field reorganization), a consistent hippocampal output will result in the persistence and strengthening of currently active neural activity patterns, which in turn maintains the same expectation information in hippocampus, and the same behavioral expression patterns.

It is intriguing to note that the proposed error analysis by hippocampus is analogous to the error prediction signals that dopamine cells generate when an expected reward is not realized. It is known from studies of dopamine cells that the magnitude of the error prediction signal depends in part on the certainty and saliency of reward (Fiorillo et al., 2003; Mirenowicz and Schultz, 1994; Schultz, 1997; Schultz et al., 1997): the less certain it is that a reward will be found, the smaller the magnitude of an error prediction signal. When this idea is applied to our understanding of place field reorganization, one could argue that whether a place field reorganizes depends on the strength of memory expectations. A strong expectation signal to some cells may result in a high threshold for generating error signals, i.e., place field reorganization. These cells would tend to show persistent place fields when there is a minor context shift. Such a condition may apply to CA1. Other cells may not receive such a strong expectation signal, resulting in place field reorganization following even minor changes in context, such as that which is observed for CA3 place fields.

With the introduction of new technologies and clever experimentation by a large number of researchers, a neurobiological model of hippocampal function has emerged that describes mechanisms involved in determining the saliency of a context. The process of context comparison begins by identifying the relevant stimuli and memories (or expectations). The dentate gyrus is thought to engage in pattern separation functions that might serve this purpose by distinguishing between similar, potentially important inputs (Gilbert et al., 2001; Leutgeb et al., 2007; O'Reilly and McClelland, 1994; Rolls, 1996). Specifically, dentate gyrus place fields tend to be smaller (i.e., more spatially localized) than either CA3 or CA1 place fields, and they show the most immediate response to context changes. Also, the fact that there is tremendous convergence of input from the dentate gyrus to the CA3 region (Amaral et al., 1990) further suggests that the dentate gyrus filters, or separates patterns of information, for subsequent hippocampal processing. The transformation of CA3 place fields to downstream CA1 place fields is currently enigmatic since the connections are direct, yet there are clear differences in the properties of CA3 and CA1 place fields.

5.3.1. CA3 and CA1 place field contributions to the evaluation of context

Hippocampal-based context evaluations require representation of both expected and current context information. There is ample evidence that both CA1 and CA3 place fields represent both expected and current contextual information. However, recent data suggest that the contributions made by CA3 and CA1 place cells differ. When rats perform at asymptotic levels on hippocampal-dependent spatial memory tasks, CA3 place fields are smaller than CA1 place fields, and more easily disrupted following cue manipulations (Barnes et al., 1990; Guzowski et al., 2004; Mizumori, 2006; Mizumori et al., 1989b, 1999). CA3 place fields are generally more labile than CA1 place fields in that they are also more easily disrupted following reversible inactivation of the medial septum (Mizumori et al., 1989a). The greater sensitivity of CA3 fields to changed inputs seems to occur regardless of the type of task being used (Lee et al., 2004; Leutgeb et al., 2004). This may indicate that CA3 place fields are more exclusively linked to the

themselves. The recurrent networks of the CA3 region may support the short-term buffer that is postulated to be needed to determine whether specific features of the current context match expected contextual features (e.g., Gold and Kesner, 2005; Guzowski et al., 2004; Treves, 2004).

CA1 also seems to represent current and expected contextual information but, relative to CA3, a greater proportion of cells show persistent place fields despite changes in a familiar context (e.g., Lee et al., 2004; Leutgeb et al., 2004; Mizumori et al., 1989b, 1999). CA1 place fields also show more discordant responses to context change than CA3 (Lee et al., 2004), and this may reflect the fact that CA3 is driven in large part by recurrent collaterals while CA1 is not. Further, as noted above, CA3 may be more strongly tied to a spatial coordinate system than CA1, and perhaps this accounts for the common findings that CA3 place fields tend to be smaller in size relative to CA1 place fields, and that more CA1 than CA3 place cells show 'split fields', i.e., more than one location that elicits elevated firing. All of the above differences suggest that CA1 place fields do not convey as precise location or sensory information as CA3 place fields, and consequently they may include more nonspatial information within their neural code (Mizumori et al., 2000; Wiener et al., 1989). Furthermore, Henriksen and colleagues (2010) suggest that the extent to which CA1 conveys spatial and nonspatial information varies depending on the location of the CA1 place cell being recorded: distal (closest to subiculum) CA1 neurons show stronger spatial codes than proximal CA1 place neurons.

A difference in the ratio of spatial to nonspatial information coded by CA3 and CA1 place fields may be accounted for by their different afferent patterns of input. For example, nonspatial context-defining information may arrive directly in CA1 via layer III entorhinal input. By comparison, CA3 receives its direct entorhinal cortex input from layer II (Witter et al., 2000), which seems to contain more neural codes for explicit spatial features than layer III. If some of the nonspatial input to CA1 includes memory-defined expectations, then this may account for a greater proportion of CA1 place fields showing stability across minor shifts in context.

If CA3 is primarily responsible for the comparison of contextual information, then what function does CA1 serve? Many have suggested that CA1 is especially important for temporally organizing or sequencing information (e.g., Gilbert et al., 2001;

currently active spatial coordinate system (i.e., a map; Leutgeb Hampson et al., 1993; Hoge and Kesner, 2007; Kesner et al., 2004;

et al., 2007) compared to CA1 place fields. As such, CA3 is better Olton et al., 1979; Rawlins, 1985; Treves, 2004; Wiener et al.,

suited than CA1 to distinguish the contextual significance of 1995). That is, CA1 place cells may temporally organize, or define,

absolute locations in space, a process that presumably relies on CA3 output such that meaningful epochs of related information are

small differences in input configurations at different locations. This passed on to efferent targets, such as the prefrontal cortex (Jay

function is likely related to the key role that CA3 plays in the rapid et al., 1989) and subiculum, to impact future behavioral choices.

acquisition of new memories (Kesner, 2007; Miyashita et al., 2009), Neocortical-based memory representations may, via direct ento-

a conclusion that is consistent with a vast literature on the rhinal input to CA1 (Witter et al., 2000), predispose CA1 to

importance of hippocampus for new learning (Mizumori et al., temporally organize CA3-based information in experience-depen-

2007b). dent ways (Mizumori et al., 1999). Although the precise nature of

If CA3 is the brain area where context novelty is identified, then this temporal organization remains to be determined, CA1 appears

one would expect CA3 to also represent information that defines to be more tightly coupled than CA3 cells to the rhythmic

the baseline expectations from which novelty (i.e., unexpected oscillations of hippocampal EEG (Buzsaki, 2005; Buzsaki and

information) is determined. In this regard, it is worth noting that Chrobak, 2005).

despite the greater overall sensitivity of CA3 place fields to changes

in contextual information, a subpopulation of CA3 place fields 5.3.2. Temporal encoding of spatial contextual information

continue to persist when faced with contextual changes in familiar It is becoming clearer that important context information is

environments (Mizumori et al., 1999). Novelty detection requires a embedded within the temporal organization of intrahippocampal

mechanism by which baseline and new information can be held networks. Many years ago, it was shown that movement through

briefly on-line so that the expected and current information can be place fields is associated with dynamic changes in spike timing

compared. The intrinsic circuitry of CA3 is one that can hold relative to the ongoing theta oscillations in the EEG (O’Keefe and

information on-line: less than one-third of its inputs come from Recce, 1993). That is, on a single pass through a field, the first spike

outside of CA3 (Amaral and Lavenex, 2006), and the most of successive bursts of spikes occurs at progressively earlier phases

prominent input to CA3 pyramidal cells come from the CA3 cells of the theta cycle. The discovery of this so-called ‘phase precession’

106 M.R. Penner, S.J.Y. Mizumori / Progress in Neurobiology 96 (2012) 96–135

effect is considered significant because it was the first clear evidence that place cells are part of a temporal code that could contribute to the mnemonic processes of the hippocampus. Changes in this sort of temporally organized spiking may be a key mechanism by which place fields provide a link between the temporally extended behaviors of an animal and the comparatively rapid synaptic plasticity mechanisms that are thought to subserve learning (e.g., Skaggs et al., 1996). Theoretical models have been generated to explain in more detail how phase precession could link predictive and sequential behaviors with neural plasticity mechanisms (Buzsaki, 2005; Buzsaki and Chrobak, 2005; Jensen and Lisman, 1996; Lisman and Redish, 2009; Zugaro et al., 2005).

Another form of temporal-based neuroplasticity involves a change in the timing of spike discharge by one cell relative to that of other cells. For example, theta recorded from CA1 and CA3 tends to be more cohesive when rats pass through the stem region of a T-maze, presumably reflecting greater synchrony of neural firing during times when decisions are made (Montgomery et al., 2009). Greater synchronization could offer a stronger output signal to efferent structures. Experience-dependent temporal codes may also be found in the temporal relationships between the firing of cells with adjacent place fields. With continued exposure to a new environment, place fields begin to expand asymmetrically in that the peak firing rate is achieved with shorter latency upon entrance into the field (Mehta et al., 1997, 2000). It was postulated that repeated activation of a particular sequence of place cells results in stronger synaptic connections between cells with adjacent fields. Under these conditions, entry into one place field begins to activate the cell with the adjacent place field at shorter and shorter latency. The asymmetric backwards expansion of place fields is thought to provide a neural mechanism for learning directional sequences. Moreover, it has been suggested that the backward expansion phenomenon may contribute to the transformation of a rate code into a temporal code such as that illustrated in phase precession (Mehta et al., 2000). The backward expansion mechanism could also help to explain other place field phenomena, such as the tendency for place cells to fire in anticipation of entering a field within a familiar environment (Muller and Kubie, 1989). While the dynamic changes in place field shape are intriguing, it remains to be determined whether the asymmetric expansion is directly related to spatial learning. Also, there is an intriguing possibility that dopamine may play a key role in coordinating some aspect of the temporal phenomena observed in hippocampus. For example, it has been shown that the temporal coherence of the discharges of place cells is greater in mice with an intact hippocampus compared to mice with deficient NMDA systems (McHugh et al., 1996), and there is evidence that dopamine may exert powerful influences in hippocampus via control of NMDA receptor function (e.g., Bethus et al., 2010; Frey et al., 1990). Therefore, it is possible that even though the relative quantity of dopamine innervation in hippocampus is small (Fields et al., 2007), dopamine may have a critical orchestrating role in the hippocampal determination of context salience.

5.3.3. Sources of hippocampal spatial and nonspatial information

Consideration of the sources of the different types of information that enter into hippocampal context-related computations provides keen insight into the stages of processing required to make efficient, context-relevant choices. The parahippocampal region (which includes the perirhinal, postrhinal, and entorhinal cortices; see Fig. 4) is considered to provide the bulk of the spatial and nonspatial sensory information to the hippocampus (Burwell, 2000; Burwell and Amaral, 1998a,b; Eichenbaum and Lipton, 2008; Hunsaker et al., 2007; Knierim et al., 2006; Witter et al., 2000). Generally, spatial information is thought to arrive in the hippocampus via the medial regions of the parahippocampal cortex (i.e., postrhinal cortex and the MEC), since a prominent input to postrhinal cortex is the posterior parietal cortex (Burwell and Amaral, 1998a,b). In contrast, the multimodal temporal cortex of the rat projects nonspatial information to the hippocampus via the lateral parahippocampal regions (i.e., perirhinal cortex and LEC). Both MEC and LEC afferents appear to relay visual, auditory, olfactory and/or tactile sensory information (Burwell and Amaral, 1998a). Thus, the nature of information transmitted within a pathway or brain structure does not reveal how that information is used. [This broad conclusion will be seen to be relevant when the mesoaccumbens system is discussed below.] Also, although the MEC is often considered to be specialized to process spatial information, accurate navigation likely relies on integrated input from both MEC and LEC, since one needs to understand the spatial dimensions of behavior (e.g., location and orientation) relative to salient environmental information. Indeed, contralateral, but not ipsilateral, lesion of the perirhinal cortex and the hippocampus results in impaired object–place association learning (Jo and Lee, 2010).

The recent development of more specific theories about the role of the parahippocampal cortex during active navigation is mainly due to the discovery of multiple types of spatial representation in the MEC (Enomoto and Floresco, 2009; Hafting et al., 2005; Sargolini et al., 2006; Taha et al., 2007), including grid cells and head direction cells (see Fig. 5). Like place cells, grid cells fire when animals traverse specific locations within an environment. However, unlike place cells, grid cells fire relative to a number of small regions arranged in a hexagonal grid rather than in a single region of a given environment. Head direction cells, on the other hand, show elevated firing rates that coincide with a particular head orientation of the rat regardless of the rat's location. A third population of cells shows both grid and head direction properties, and these are therefore called conjunctive cells. Finally, a fourth class of spatial cell is the border cell, found in the medial entorhinal cortex. Head direction cells and border cells are known to also exist in related cortical regions, such as the subiculum, postsubiculum, parasubiculum, and postrhinal cortices (Lever et al., 2009; Taube et al., 1990). There are strong anatomical and functional ties between cells associated with these types of spatial representation, and they are thought to form a coordinated network for orienting an animal in allocentric space.

There are a number of excellent reviews that detail grid field properties (Burgess et al., 2007; Derdikman and Moser, 2010; Moser et al., 2008; Savelli and Knierim, 2010). Briefly, MEC layer II has the highest proportion of grid cells (50%); layer III has a more diverse blend of grid cells, head direction cells and conjunctive cells; and head direction cells are the predominant cell type in the deep layers. Nearby grid cells tend to have similar spacing, but their peaks are offset relative to each other. The spacing seems to reflect spatial features of the current environment since, in familiar environments, grid fields will rotate in the direction of cue rotations, and if a familiar environment is widened or narrowed, grid field spacing will resize accordingly (Barry et al., 2007). Across the dorsal–ventral axis, there seems to be a topographically organized increase in the spacing of adjacent grid fields (Enomoto and Floresco, 2009; Hafting et al., 2005). If experimental procedures induce grid field reorganization, different grid fields rotate and translate together. Such cohesion between grid cells, along with the regularity of the grids and their apparently consistent spacing, gives the impression that the grid system is stable across environments and that it might form a blueprint (i.e., a spatial reference frame) onto which the hippocampus can add relevant information. Presumably, these sorts of spatial and nonspatial associations in hippocampus derive from convergent input from the MEC and LEC. This associative process must occur fairly rapidly


since hippocampal place fields are observed upon first exposure to a new environment (e.g., Hill, 1978; Muller and Kubie, 1987; O'Keefe and Burgess, 1996; Wilson and McNaughton, 1993). The apparent regularity of the spatial representations within the hippocampal and entorhinal system has been further strengthened by findings that grid fields, head direction preferences, and place fields show a high degree of coherence (e.g., displacement) in response to changes in simple geometric environments (Hargreaves et al., 2007; Lee and Knierim, 2007; Nicola et al., 1996).

Additional studies, however, suggest that a straightforward description of the relationship between grid and place fields is not likely. Place fields in CA1 continue to reorganize in response to changes in the visuo-spatial environment for periods of time that exceed the period of grid field responses (Van Cauter et al., 2008). Also, place fields have been observed to become more specific after repeated exposure to a familiar environment (Nicola and Malenka, 1998), even after entorhinal cortex lesions. Further, as the behavioral tasks have become more complex, so has the nature of the responses of grid fields. Importantly, the hexagonal grid patterns do not appear to persist in more complex environments. When an animal is running along a linear track, the grid patterns reset when the rat turns around (Fyhn et al., 2007), and if a maze contains multiple hairpin turns, the resetting occurs periodically (Hikosaka et al., 2008). Finally, when using a linear track that is 18 m long, periodicity is limited to sections of the track (Nicola and Malenka, 1998). These observations imply that the 'gridness' of each cell is subject to being organized by ongoing behavior, perhaps separately from place field reorganization. The extent to which other features of a context (e.g., motivation, memory, etc.) similarly impact all spatial representations remains to be determined.

One issue of importance is the assumption that place and grid field reliability and spatial specificity are necessary for optimal decision-making during navigation. For place fields, this issue has been addressed in a number of ways (for review see Mizumori et al., 2007b), including demonstrations that physiological conditions that are associated with normal learning and decisions (e.g., synaptic plasticity mechanisms, sensory and motor processing systems, motivational systems, and so on) are also associated with greater place field stability. Although a systematic and direct test of this relationship has yet to be carried out, it is worth noting that it may be difficult to observe a clear and strong correlation between (at least) CA1 place field stability and choice accuracy since the recorded CA1 population tends to exhibit a heterogeneous collection of neural responses (e.g., within a single recording session, there are individual cell differences in place field responses to context changes). Indeed, laboratories have reported a lack of correlation between CA1 place field reorganization and behavior (e.g., Cooper and Mizumori, 2001; Jeffery et al., 2003). Most of the place field data in the literature are based on recordings from CA1 neurons. Therefore, the relationship between CA3 place field properties and optimal decisions remains to be determined. The same is true for grid cells: the results of direct tests of the relevance of grid fields for accurate decisions are not yet known.

The discussion so far presents the view that the hippocampus functions to detect differences between contexts, or to detect when a context changes. A basic algorithm that compares an animal's expectations of a familiar contextual environment (i.e., the spatial layout of external sensory cues, the relevant behaviors to obtain rewards, the location of goals, and the consequences of specific choices) with actual experiences can be used to discriminate contexts, detect changes in a familiar context, or identify novel situations. All of these operations have in common the need to determine the saliency of the current context. There is currently only a rudimentary understanding of how the various neural representations of the spatial context by hippocampal neurons (e.g., place and grid fields) may contribute to the determination of context saliency, but there is abundant evidence to support the claim that this is a key function of the hippocampus.

5.3.4. Determining context saliency as a part of learning

As one learns the significance of a new environment, one's perception of the relationship between environmental stimuli, responses, and consequences is continually updated. Presumably, mismatches between updated expectations and experiences with the new context are frequently detected, resulting in the continual shaping of long-term memory representations (McClelland et al., 1995). As memory representations become more precise, so too will the feedback to hippocampal cells regarding the expected contextual features. Thus, it is predicted that place fields should become more specific and reliable with continued training as one gradually learns about associations relevant to the test environment. In support of this prediction, many studies have shown that place fields become more specific and/or reliable with short-term exposure to novel environments (e.g., Frank et al., 2004; Hetherington and Shapiro, 1997; Kentros et al., 1998; Markus et al., 1995; Muller and Kubie, 1987; O'Keefe and Burgess, 1996; Wilson and McNaughton, 1993). More spatially selective firing (or reduced 'overdispersion') has also been reported to reflect goal-directed learning (e.g., Fenton and Muller, 1998; Mizumori et al., 1996; Kentros et al., 1998; O'Keefe and Speakman, 1987; Rosenzweig et al., 2003).

Learning can be considered complete when mismatches no longer occur and consistent memory representations are maintained during behavior (Mizumori, 2008). Indeed, after learning, place fields are remarkably stable across repeated exposures to the same, familiar context, and this presumably reflects stable input from memory representations. If more than one context is learned simultaneously, a given population of place cells should show context-specific patterns of place fields, and each pattern should be reliable for that context (Smith and Mizumori, 2006a,b). Presumably, such stable hippocampal patterns are in some way driven by established neocortical networks, or schemas (Tse et al., 2007). To ensure adaptive behavior, however, the hippocampus must constantly engage in context comparisons in the event that the familiar context is altered. Similarly, the hippocampus should process contextual information even for tasks that do not explicitly require contextual knowledge, in case contextual information becomes relevant. Place cell studies indeed show that specific neural codes in the hippocampus remain responsive to changes in context even though contextual learning is not necessary to solve a task (Yeshenko et al., 2004). Thus, the processing of contextual information by the hippocampus appears to be automatic and continuous (Morris and Frey, 1997). A different but related theory is that the hippocampus uses context information to recall specific context-relevant memories (Fuhs and Touretzky, 2007; Redish, 1999; Redish et al., 2001).

If the hippocampus continually processes contextual information, then why do hippocampal lesions disrupt only certain forms of learning and not others? If one assumes that lesion effects are observed only when the intrinsic processing by the structure of interest is unique and essential for learning to take place, then no behavioral impairment should be observed if other neural circuits can compensate for the lesion-induced change in function. Indeed, there is abundant evidence that under most conditions, stimulus–response learning is not impaired following hippocampal lesions, since striatal computations are sufficient to support such learning (e.g., McDonald and White, 1993; Packard et al., 1989; Packard and McGaugh, 1996). This does not mean that the hippocampus does not normally play a role in stimulus–response performance, but rather that the hippocampus may contribute by defining the context for the learning, which in turn may allow the learned information to be more adaptive in new situations in the future.
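The expectation-versus-experience comparison described in this section can be made concrete with a small computational sketch. The following is purely illustrative and not a model from the literature: contexts are reduced to hypothetical feature vectors, a stored expectation is compared against the current input to yield a mismatch (salience) signal, and the expectation is incrementally updated so that mismatches fade as a context becomes familiar. All names and parameters (`ContextComparator`, `threshold`, `learning_rate`) are our own illustrative choices.

```python
import math

def context_mismatch(expected, observed):
    """Normalized mismatch between an expected and an observed context
    vector: 0.0 means identical direction, 1.0 means maximally different."""
    dot = sum(e * o for e, o in zip(expected, observed))
    norm_e = math.sqrt(sum(e * e for e in expected))
    norm_o = math.sqrt(sum(o * o for o in observed))
    if norm_e == 0.0 or norm_o == 0.0:
        return 1.0  # no stored expectation yet: treat the input as fully novel
    cosine = dot / (norm_e * norm_o)
    return 0.5 * (1.0 - cosine)  # map similarity in [-1, 1] to mismatch in [0, 1]

class ContextComparator:
    """Holds a running expectation of a familiar context and emits a
    salience (novelty) signal whenever the current input deviates from it."""

    def __init__(self, n_features, threshold=0.1, learning_rate=0.2):
        self.expectation = [0.0] * n_features
        self.threshold = threshold
        self.learning_rate = learning_rate

    def evaluate(self, observed):
        mismatch = context_mismatch(self.expectation, observed)
        # Incremental learning: nudge the stored expectation toward the
        # observed features, so repeated exposure makes the context familiar.
        self.expectation = [
            e + self.learning_rate * (o - e)
            for e, o in zip(self.expectation, observed)
        ]
        return mismatch, mismatch > self.threshold
```

Run against a repeated context, the salience flag is raised on first exposure and then falls silent; switching to a different cue configuration raises it again. This loosely mirrors the detect-compare-update cycle attributed to the hippocampus above, in which learning is "complete" when mismatches no longer occur.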


[Fig. 6 diagram: nodes for the hippocampus (CA1/subiculum), prefrontal cortex, ventral striatum, ventral pallidum, ventral tegmental area, pedunculopontine tegmental nucleus, lateral dorsal tegmentum, lateral habenula, lateral hypothalamus, and more; connections labeled GLU, GABA, DA, and ACh.]

Fig. 6. An essential circuit that links hippocampal (spatial context) information with reinforcement learning and decision making systems of the brain. Direct hippocampal input arrives in the reinforcement learning system via the CA1 and subicular projections to the ventral striatum (i.e., the nucleus accumbens). The ventral striatum is thought to serve as the 'critic' in the actor–critic model of reinforcement learning. As such, the ventral striatum determines whether the outcomes of behavior are as predicted based on an animal's expectations for a given context. If the outcome is as expected, the ventral striatum continues exerting inhibitory control over VTA neurons. In this situation, encounters with rewards do not result in dopamine cell firing. If the saliency of a context changes (as determined by hippocampal processing), signals to the ventral striatum may preferentially excite VTA neurons via an indirect pathway that includes the ventral pallidum and the pedunculopontine nucleus. The result of this elevated excitation may be a depolarization of VTA neurons such that they are more likely to fire when subsequent reward information arrives in VTA.

5.4. Relationship between hippocampal context codes and reinforcement-based learning

Hippocampal efferent systems can use the result of the hippocampal context analysis to update their neural response profiles such that subsequent behavioral choices are optimized. The midbrain and striatal reinforcement learning systems are a major target of hippocampal output (see Fig. 6). Therefore, it is often assumed that the hippocampus provides the necessary context information that guides dopamine-related reward or behavioral responses. The outcomes of behavioral choices are evaluated by the reinforcement learning system, and the result of such an evaluation is thought to feed back to memory systems and the hippocampus to update future context-based expectations. To begin to discuss how a hippocampal evaluation of context saliency impacts reinforcement learning systems of the brain, the following discusses (1) a neuroanatomical network that supports a functional link between hippocampal place fields and reinforcement learning systems, (2) evidence for a role for dopamine in hippocampal-dependent learning and plasticity, and (3) the possible impact of hippocampal context processing on dopamine cell responses to reward.

5.4.1. Functional connectivity between reinforcement and hippocampal systems

Direct dopaminergic innervation of the hippocampus arises from both the VTA and the substantia nigra pars compacta (SNc), although input from the VTA is more extensive (Gasbarri et al., 1994b). Dopaminergic projections occur across the entirety of the dorsal–ventral axis of the hippocampus, with the ventral axis being more heavily innervated. The innervation is also differentially distributed across the subiculum, CA1, CA3, and the dentate gyrus, with CA1 and the subiculum receiving more innervation relative to CA3 and the dentate gyrus (Gasbarri et al., 1994a,b, 1997). Compared to other efferent structures of the dopaminergic system, such as the nucleus accumbens, the hippocampus receives a relatively small proportion of input from the VTA; 10% or less of the cytochemically identified dopamine neurons project to the hippocampus, whereas 80% of that population projects to the nucleus accumbens (Fields et al., 2007).

Although the hippocampus receives modest dopaminergic innervation from the VTA, it is one of the few brain regions that express all five dopamine receptor subtypes. The dentate gyrus and subiculum show high levels of the D1 receptor subtype, and the D1-like D5 receptors are expressed throughout the hippocampus. D2 receptor binding sites are most prominent in dorsal CA1 and the subiculum, while the levels of D3 receptors are low throughout. Finally, D4 receptors are found in the dentate gyrus, CA1, and CA3. The dopaminergic innervation of the structure, along with the expression of all five receptor subtypes, allows dopamine to have a powerful influence on the function of the hippocampus, impacting information processing and plasticity (Frey et al., 1990; Huang and Kandel, 1995; Li et al., 2003; Otmakhova and Lisman, 1998).

The path from hippocampus to the midbrain dopaminergic system is indirect and varied (see Fig. 6). The most direct path from the hippocampus involves transmission from both dorsal and ventral subiculum, and to a lesser extent CA1, via the fimbria-fornix (Boeijinga et al., 1993; Lopes da Silva et al., 1984; Groenewegen et al., 1999a, 1987; McGeorge and Faull, 1989; Mulder et al., 1998; Swanson and Cowan, 1977; Totterdell and Meredith, 1997; van Groen and Wyss, 1990). More specifically, the dorsal subiculum (and CA1) projects primarily to the rostro-lateral shell region of the nucleus accumbens, while the ventral subiculum (and CA1) selectively terminates throughout the rostral–caudal extent of the accumbens shell. Entorhinal cortex also provides extensive input to the nucleus accumbens, with the MEC preferentially innervating the rostro-medial shell and core divisions of the accumbens, and the LEC terminating throughout the rostral–caudal extent of the lateral shell and core regions (Totterdell and Meredith, 1997). It should be noted that the limbic input to the ventral striatum (including the nucleus accumbens) is one of a number of convergent inputs to individual ventral striatal neurons (e.g., Floresco et al., 2001; French and Totterdell, 2002; Goto and O'Donnell, 2002; O'Donnell and Grace, 1995). Other sources of afferents include the prelimbic/infralimbic and orbital frontal cortices, as well as the basolateral amygdala. Thus, the ventral striatum has long been considered a central point of integration of information needed for adaptive behaviors (Mogenson et al., 1980).

It is through the ventral striatum that the hippocampus may ultimately impact dopamine cell firing, since the ventral striatum in turn innervates the VTA and SNc. Moreover, both the core and shell components of the nucleus accumbens have some degree of control over the dopamine cells that in turn project to them. The details of the circuitry are complex (for a recent excellent summary, see Humphries and Prescott, 2010), but of direct relevance here is that the lateral and medial shell innervate, via either direct or indirect routes, the lateral or ventral sectors of the VTA, respectively (Ikemoto, 2007; Zhou et al., 2003). This pattern matches the topography of VTA connections back to the shell region. Also of note is the fact that both GABA and dopamine neurons participate in this reciprocal interaction between VTA and ventral striatum (Carr and Sesack, 2000; Nair-Roberts et al., 2008). This is an important point to note since studies of VTA single unit representations during hippocampal-based memory performance suggest that it is likely that both dopaminergic and GABAergic populations contribute to reward processing (Martig and


Mizumori, 2011; Puryear et al., 2010). Core regions of the accumbens project to a slightly different population of dopaminergic neurons, those in the SNc and in the lateral regions of the VTA (Berendse et al., 1992a,b; Usuda et al., 1998; Zhou et al., 2003). These dopaminergic regions seem to project back to the same core areas that project to them (Joel and Weiner, 1994). For both shell and core regions, their impact on the VTA and SNc is presumed to be inhibitory since the accumbens projection cells are GABAergic. Thus, one possibility is that excitatory (glutamatergic) messages from the hippocampus add to the inhibitory control over dopaminergic neurons. Currently, it is not possible to state how much control the hippocampus exerts over dopamine neurons since we do not yet fully understand the significance and mechanism of the convergence in ventral striatum of hippocampal, frontal and amygdala information. Nevertheless, this is likely an important pathway by which hippocampal systems and the midbrain motivational circuitry interact.

In addition to the hippocampal–accumbens–VTA/SNc pathway, there are a number of sources of excitatory and inhibitory control over dopamine cell firing (see Fig. 6), and details of these connections remain to be worked out. Four of the most studied dopamine afferent systems include the frontal cortex and the amygdala (Lodge and Grace, 2006; Woolf, 1991), as well as the pedunculopontine nucleus (PPTg) and the lateral dorsal tegmental nucleus. As an example of the complex nature of each afferent input, the PPTg provides cholinergic (Woolf, 1991) and glutamatergic input to the VTA and SNc (Beninato and Spencer, 1987; Futami et al., 1995; Sesack et al., 2003), and this input is topographical in nature. The PPTg is characterized by an uneven distribution of distinct populations of cholinergic, glutamatergic, and GABAergic cells (Wang and Morales, 2009), with differential input and output projections of its anterior and posterior subdivisions (Alderson et al., 2008). Cholinergic cells are concentrated in posterior PPTg (Wilson et al., 2009) and project mostly to the VTA, while anterior PPTg contains proportionately more GABAergic cells that project to the SNc (Oakman et al., 1995). It has been argued that the PPTg regulates the transition to burst firing by dopamine cells (Grace

The hippocampus likely plays a role in detecting changes in familiar contexts, and in generating novelty-related signals that initiate relevant investigatory behaviors for both spatial and nonspatial tasks. Interestingly, the dopamine system is also known for its association with novelty detection (Horvitz et al., 1997; Ljungberg et al., 1992; Redish et al., 2007; Seamans and Yang, 2004), a response that is perhaps triggered following hippocampal identification of novelty. Further, exposure to novel environments enhances synaptic plasticity mechanisms in hippocampus, and this enhancement appears related to D1 receptor activation (Li et al., 2003). Thus, it has been postulated that a functional loop between the VTA and the hippocampus allows novelty signals from the hippocampus to be relayed to the VTA to generate responses to novelty by dopaminergic neurons (Lisman and Grace, 2005; Mizumori et al., 2004). The latter responses are then thought to be relayed back to the hippocampus to facilitate plasticity circuits and learning.

Most of the studies investigating possible dopaminergic effects on hippocampal function include the application of drugs directly to, or lesions of, the hippocampus. Recently, Martig et al. (2009) employed a different approach, which was to reversibly inactivate the VTA of rats to temporarily reduce endogenous levels of dopamine within the hippocampus. Attempts were made to selectively silence VTA dopamine neurons by infusing baclofen (Xi and Stein, 1998), rather than more broadly inactivating the VTA with anesthetics such as lidocaine or tetracaine. VTA inactivation significantly impaired choice accuracy on a hippocampal-dependent spatial working memory task. However, the effect was time dependent: greater impairment was observed after the initial days of infusion, suggesting some form of compensatory change in the neural circuitry connecting the hippocampus and the VTA. Further, VTA inactivation selectively impaired short-term working memory, a form of memory that is hypothesized to be important following a change in context. Importantly, the selective behavioral effects demonstrate that the hippocampal effects were not due to changes in behavioral control or motivation.

In a subsequent experiment, Martig and Mizumori (2011)

et al., 2007), but precisely how this happens remains under recorded hippocampal place field responses to baclofen-induced

investigation. Thus, the ventral striatum may ultimately be in a inactivation of the VTA as rats performed a spatial working

position to orchestrate the balance between inhibitory and memory task on a radial arm maze. Based on the findings of

excitatory control over dopamine cell firing depending on the Kentros et al. (2004), it was predicted that VTA inactivation would

determination of saliency of the current context by hippocampus. destabilize choice accuracy that is dependent on hippocampal

function, as well as the stability of place fields. Also, given the

5.4.2. A role for dopamine in hippocampal-dependent learning and differential distribution of VTA afferents to the hippocampal

plasticity subfields (CA1 > CA3), it was expected that CA1 place fields would

There is abundant evidence that the dopaminergic system plays be impacted more dramatically than CA3 place fields. Finally, given

an important role in hippocampal-dependent behavior and the transient behavioral effect that was observed by Martig et al.

plasticity. The hippocampal dopaminergic system has been (2009), the maze training procedures were modified to increase

manipulated in a number of ways, and the bulk of the evidence the likelihood that VTA was essential for good performance. That is,

shows that dopaminergic agonism and antagonism, respectively, rats learned to expect rewards of different magnitudes at specific

enhance and impair spatial learning. As examples, D1 receptor locations on the maze.

knock-out mice exhibit deficits in spatial learning (El-Ghundi et al., The results showed that VTA inactivation significantly, and

1999) and selective 6-OHDA lesions in hippocampus impaired more consistently, impaired choice accuracy than in Martig et al.

performance in the Morris swim task (Gasbarri et al., 1996). Direct (2009). This behavioral impairment occurred even though rats

hippocampal infusions of agents that disrupt D1–NMDA receptor retained their preference to visit maze locations that were

interactions also produce performance deficits in the working previously associated with large rewards. This result was

memory version of the Morris swim task (Nai et al., 2010). Selective surprising given that VTA neurons are known to preferentially

removal of hippocampal dopamine input via local 6-OHDA respond to larger rewards than small rewards (Puryear et al., 2010;

infusions into the subiculum and adjacent CA1 region of rats also Schultz et al., 1997). The authors interpreted this unexpected

impairs performance in the spatial version of the water maze result to indicate that VTA’s selective coding of large rewards is not

(Gasbarri et al., 1996). Manipulations of endogenous levels of necessary or sufficient to drive behavioral choices toward the large

dopamine in the hippocampus also negatively impact hippocam- rewards. Rather, the VTA neural codes may contribute to an

pal-dependent processing (e.g., Kentros et al., 2004; Martig et al., evaluation of the consequences of behaviors. Recorded hippocam-

2009; Wisman et al., 2008). Finally, dopamine agonist treatment in pal CA1 place cells showed less stable fields after VTA inactivation

the hippocampus can reverse age-related decreases in spatial relative to control conditions and relative to CA3 place cells. The

performance (Bach et al., 1999; Behr et al., 2000). differential response reveals that in a well learned task, CA3 place


fields alone are not sufficient to maintain high choice accuracy during navigation. This supports the view described above that a hippocampal evaluation of the expectations (and hence saliency) of a context requires coordinated effort between CA1 and CA3.

In summary, there is substantial evidence of an important role for VTA dopamine cells in regulating hippocampal-dependent learning and context representation. The place field data show that hippocampal neurons rely on dopamine input for representing context-relevant information over time. These results are consistent with growing evidence that dopamine increases the stability of neural plasticity mechanisms in hippocampus. Cellular mechanisms for this stabilization function are revealed in studies of dopamine effects on hippocampal synaptic plasticity. Dopamine appears to importantly regulate a leading model of learning-related synaptic plasticity, long-term potentiation (LTP). LTP is generally described as a persistent increase in synaptic efficiency (Martin et al., 2000), and it has been shown that its induction alters place fields (Dragoi et al., 2003). The duration of LTP varies depending upon the pattern of neural activation used for induction (Morris and Frey, 1997). D1 receptor activation appears critical for the maintenance of late-phase LTP in CA1 (L-LTP; Frey et al., 1990, 1991; Huang and Kandel, 1995; Williams and Eskandar, 2006). Dopamine application is also capable of inducing LTP, referred to as early-phase LTP (E-LTP), in the dentate gyrus, following stimulation protocols which are normally insufficient to do so (Kusuki et al., 1997). Further, there is some indication that dopamine agonists alone may be sufficient to induce a slowly developing potentiation that is independent of any other external stimulation (Huang and Kandel, 1995; Williams et al., 2006; Williams and Eskandar, 2006). The general pattern, then, seems to be that dopamine elevates and/or maintains the synaptic excitability of hippocampal neurons. Enhancing the duration of strong neural signals may be an important way to increase the associative capacity of temporally discrete events, and this could in turn facilitate accurate determinations of context saliency.

A possible mechanism for dopamine's effects on hippocampal neurons was revealed by findings that dopamine agonist-induced L-LTP can be significantly attenuated by NMDA-receptor antagonism (Stramiello and Wagner, 2008), suggesting an important interaction between these neurotransmitter systems. There is additional evidence that the interaction between glutamatergic and dopaminergic systems modulates heterosynaptic LTP, whereby weak inputs become strongly potentiated (O'Carroll and Morris, 2004). Specifically, it is suggested that NMDA-receptor activation in hippocampus may 'prime' synaptic markers that synergize with neuromodulatory signals, such as dopamine, to initiate increases in the mRNA and protein synthesis that is thought to be so important for L-LTP (Frey and Morris, 1997).

The electrical stimulation protocols used to induce LTP are unlikely to occur during natural learning scenarios. However, evidence indicates that lasting changes in synaptic plasticity in the hippocampus can result from exposure to different spatial contexts. Dopamine has been implicated in such context-induced changes in hippocampal synaptic plasticity. Pre-treatment with a D1/D5 receptor antagonist interferes with the LTP-inducing effects of spatial exploration (Lemon and Manahan-Vaughan, 2006; Li et al., 2003). The ability of dopamine to gate exploration-induced synaptic plasticity, then, may be reflected in changes in spatially selective neural activity. If dopamine enhances the duration of LTP, then dopamine may act to stabilize place field properties. This hypothesis was supported recently by Martig and Mizumori (2011), who found that temporarily removing dopamine input to place cells reduces place field stability.

Hippocampal output via the subiculum is also modulated by dopamine afferents. In one study, a low dose of dopamine was shown to reduce EPSPs in subiculum (Behr et al., 2000). This result implies that excitatory inputs to hippocampus must surpass the inhibitory influence of low levels of dopamine in subiculum. However, when large quantities of dopamine are applied, there is a facilitation of long-lasting synaptic potentiation in the CA1 region (Huang and Kandel, 1995). Therefore, dopamine acts to dose-dependently gate excitatory drive by reducing the effectiveness of potentially irrelevant inputs. By determining the overall effectiveness of excitatory inputs within a structure, dopamine could be part of a mechanism that determines the likelihood that new or salient information is remembered.

5.4.3. Impact of hippocampal context processing on dopamine cell responses to reward

In contrast to the abundant evidence for a functional link from the dopaminergic system to the hippocampal system, converging evidence for a functional link in the other direction has only recently begun to emerge. Nevertheless, existing theories argue that the VTA–hippocampal connection is important for several complex behaviors, such as reinforcement learning, spatial/contextual learning, and motivation (Fields et al., 2007; Lisman and Grace, 2005; Schultz, 2002; Wise, 2004). Central to these functions is the idea that dopamine may strengthen stimulus–reward associations (Schultz, 2002). Accordingly, dopamine neurons fire upon presentation of unexpected rewards and of conditioned cues that predict reward, and they are inhibited when expected events do not occur (Schultz and Dickinson, 2000). These firing patterns may signal an error in the prediction of reward (Bayer and Glimcher, 2005; Hollerman and Schultz, 1998), and this in turn enables the use of flexible behaviors during learning (Schultz and Dickinson, 2000). The reward prediction error signal appears to take into account the behavioral context in which rewards are obtained (Nakahara et al., 2004; Roesch et al., 2007), context information that may derive from hippocampal input. If this is the case, it should be possible to record similar reward responses in freely behaving rats performing a hippocampal-dependent maze task. A recent study explicitly tested this idea.

Puryear et al. (2010) found that VTA dopamine neurons increased firing when rats encountered rewards in expected locations on a radial maze, and that the response was much larger following encounters with the larger rewards. This is analogous to dopamine responses reported in studies with primates (Schultz et al., 1997). Moreover, it appeared as if these cells fired in response to cues that predict reward, in that they exhibited elevated discharge coincident with an auditory stimulus that signified the beginning of a trial. Also, it was shown that changes in the visual aspects of the test environment resulted in significant alterations in the reward responsiveness of the dopamine neurons. Thus, again as shown in primate studies, the dopamine reward responses appear to be context-dependent. Of particular interest was whether rodent VTA neurons would show evidence for either positive or negative reward prediction signaling during navigation-based goal-directed behaviors. Indeed, it was found that VTA cells increased firing when a larger than expected reward was encountered, and reduced firing when an expected reward was not found. In addition to confirming that rodent dopamine cells code reward when spatial information is used to guide behaviors to locations that signify food, use of a navigation-based task allowed Puryear et al. (2010) to examine the relationship between voluntary movement and reward codes. This was of interest given a vast clinical and research literature showing a critical role for the dopamine system in the voluntary initiation of behaviors. The firing rates of dopaminergic reward neurons were found to be correlated with velocity and/or acceleration as rats moved between food locations. However, in contrast to the reward responses, the movement correlates were not context-dependent,


suggesting that there are at least two independent sources that regulate dopamine cell firing during navigation.

A rather surprising result of the Puryear study was that dopamine neurons consistently responded to rewards even though the task was well learned. According to the now classic studies by Schultz and his colleagues (e.g., Schultz, 1998b, 2010; Schultz et al., 1997), dopamine cells cease firing to rewards and instead fire in response to the presentation of cues that predict rewards. Firing to cues was in fact observed in the Puryear study, but so was firing to the rewards. One possible explanation for the continued response to reward by dopamine neurons is that our working memory task generated a sufficient degree of uncertainty about choices that dopamine responses to rewards were retained (Fiorillo et al., 2003). Dopamine signals can be thought of as 'uncertainty signals' that reflect the strategy of continually updating action–outcome systems to optimize future behavioral choices. To test this hypothesis, Martig and Mizumori (2011) recorded VTA neurons as rats learned a spatial task that did not involve working memory. Rats learned to visit the same maze arm to obtain food reward. After rats learned the initial goal location over days, the same rats were trained to find food in a novel location; after rats learned the second location, a third novel location was introduced. The number of VTA cells showing reward responses declined as additional locations were learned. For comparison, SNc neurons were also recorded as rats performed the same task. In contrast to the VTA cells, SNc cells did not show a change in the number of reward cells with continued training. This differential response of VTA and SNc cells is potentially highly significant since it (1) suggests that dopamine signaling can have more than one function, and (2) stresses the importance in future studies of identifying the locations of the cells being recorded in any functional analysis of dopamine neurons. Evidently, context-dependent reward responses are more apparent for VTA than for SNc cells. This finding begs the question: what is the source of context information for VTA neurons?

The VTA may receive context-dependent information via an indirect pathway from the hippocampus that includes the ventral striatum, ventral pallidum, and the PPTg (Fig. 6). Recent work tested whether the latter pathway is an essential link that bridges hippocampal context processing and the VTA. It had been known that the PPTg contributes to the burst firing of dopamine cells (Oakman et al., 1995; Pan and Hyland, 2005), yet the significance of this influence is not clear. Consideration of sensory afferents to the PPTg (Redgrave et al., 1987; Reese et al., 1995), along with the established role of dopamine in reinforcement-based operant learning (Schultz, 1998b), suggests that the PPTg may facilitate the processing of (or attention to) learned conditioned stimuli via a sensory-gating mechanism (Kobayashi and Isa, 2002; Winn, 2006). Indeed, PPTg neurons exhibit phasic responses to auditory and visual sensory stimuli that predict reward, with a shorter latency (5–10 ms) than dopamine cells (Pan and Hyland, 2005). The PPTg may, however, serve a more complex function than relaying current sensory information, since context-dependent responses of PPTg neurons have been described in cats performing a motor conditioning task (Dormont et al., 1998). Thus, it was of interest to identify the nature of the information passed from the PPTg to dopamine cells during goal-directed navigation by investigating PPTg neural responses during performance of a task that is (a) known to rely on intact hippocampal processing, and (b) known to generate burst firing by VTA neurons in a context-dependent fashion (Puryear et al., 2010).

When PPTg cells were recorded from rats searching for food in known locations on a radial maze, 45% of recorded PPTg neurons were either excited or inhibited upon reward acquisition, and there was no evidence for prediction error signaling. Thus, the latter component of reward processing may arrive in the VTA via a route that does not involve the PPTg (such as the lateral habenula; Matsumoto and Hikosaka, 2007). A separate population of PPTg neurons exhibited firing rate correlations with the velocity of movement. There were also a small number of cells that encoded reward in conjunction with a specific type of egocentric movement (i.e., turning behavior). The context-dependency of PPTg reward responses was tested by observing the impact of changes in visuospatial and reward information. Visuospatial, but not reward, manipulations significantly altered PPTg reward-related activity. Movement-related responses, however, were not affected by either type of manipulation. These results suggest that PPTg neurons conjunctively encode both reward and behavioral response information, and that the reward information is processed in a context-dependent manner.

Upon closer examination of the PPTg data, it was found that excitatory reward responses predominated for anterior, but not posterior, PPTg cells. Considering their different efferent targets (Puryear and Mizumori, 2008), it appears that there is increased synaptic drive to nigral cells from anterior PPTg coincident with reward consumption in our task, and at the same time reduced synaptic drive to the VTA. This was unexpected, since it has been shown that under identical test conditions, both VTA and nigral cells increase burst firing relative to reward acquisition (Gill and Mizumori, 2007; Martig and Mizumori, 2011; Puryear et al., 2010). To account for this apparent discrepancy, it is suggested that during reward acquisition, the reduction of cholinergic input to the VTA from the posterior PPTg may reduce the excitatory drive to VTA GABA neurons. Since VTA GABA neurons normally provide inhibitory control over dopamine cells (Omelchenko and Sesack, 2009), their reduced activation 'permits' dopamine burst firing. Posterior PPTg responses to rewards tended to persist for the duration of reward consumption, whereas VTA cells show phasic high-frequency burst firing to rewards, and the duration of the VTA response is relatively short compared to the duration of reward consumption. Thus, while the posterior PPTg may initiate VTA dopaminergic reward responses, other intrinsic or extrinsic mechanisms regulate the duration of dopamine burst firing (perhaps the inhibitory input from accumbens or pallidum; Zahm and Heimer, 1990; Zahm et al., 1996). Fig. 7 provides a schematic illustration of a comparison between VTA and PPTg neural responses to reward.

A salient feature of the dopamine cell response to reward is the brief change in firing rate when rats encounter unexpectedly large or small rewards. Such a prediction error signal was not observed for PPTg neurons, suggesting that it is either computed locally within VTA circuitry, or received from an afferent structure. Matsumoto and Hikosaka (2007) provide convincing evidence that the lateral habenula is at least a critical player in generating a prediction error signal for dopamine cells, since its neurons also show altered firing rates in response to a change in the expected amount of reward. The direction of the change, however, is the opposite of that of dopamine cells: they increase firing when animals encounter less reward than expected, and they show reduced firing after encounters with unexpectedly large rewards. This pattern is consistent with the finding that lateral habenula activation normally inhibits the activity of VTA and SNc dopamine neurons (Christoph et al., 1986; Herkenham and Nauta, 1979). Additionally, Puryear and Mizumori (2008) found prediction error codes in cells of the medial reticular nucleus (Swanson, 2003), which is known to provide glutamatergic input to the VTA (Geisler et al., 2007). The reticular formation is thought to be important for modulating the arousal and vigilance levels necessary for attending to and acting upon salient stimuli (Mesulam, 1981; Pragay et al., 1978). Thus, it seems reasonable that multiple areas modulate the activity of VTA dopamine neurons when the outcome of behavior does not meet expectations.
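The prediction error responses discussed in this section are typically formalized as a temporal-difference (TD) error, in which the error at time t equals the reward received plus the discounted value of the next state, minus the value of the current state (Schultz et al., 1997). The brief sketch below is an editorial illustration only (the function, state values, and parameter values are our own, not taken from any of the studies reviewed): it shows how the same quantity yields a positive response to an unexpected reward, a response that transfers to a reward-predicting cue after learning, and a negative error when an expected reward is omitted. The lateral habenula pattern described above corresponds to the sign-inverse of this signal.

```python
# Illustrative TD(0) computation of the dopamine reward prediction error.
# A textbook simplification; state values and parameters are arbitrary.

def td_errors(values, rewards, gamma=0.95):
    """Return the TD error at each step of one trial.

    values:  learned state values V(s_t), with a terminal 0.0 appended
    rewards: reward delivered on each transition
    """
    return [rewards[t] + gamma * values[t + 1] - values[t]
            for t in range(len(rewards))]

# Trial structure: step 0 = cue presentation, step 1 = reward delivery.

# Before learning the cue predicts nothing, so an unexpected reward
# produces a positive error at the reward step only.
naive = td_errors(values=[0.0, 0.0, 0.0], rewards=[0.0, 1.0])

# After learning the cue state has acquired value, so the positive error
# appears at cue onset and the now-expected reward elicits almost none.
trained = td_errors(values=[0.0, 0.95, 0.0], rewards=[0.0, 1.0])

# If the expected reward is withheld, a negative error occurs at the
# expected reward time (the habenula signal would be its sign-inverse).
omission = td_errors(values=[0.0, 0.95, 0.0], rewards=[0.0, 0.0])
```

In this formulation the three firing patterns emphasized above (response to unexpected reward, transfer to the predictive cue, and inhibition at omission) all fall out of a single computation, which is why the TD error has been so influential as a model of dopamine cell activity.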



Fig. 7. Reward-related neural discharge has now been shown to exist in multiple brain structures throughout the midbrain and forebrain areas. Left: Responses of a midbrain (VTA) dopamine cell to rewards of large and small magnitude. The top two rows illustrate responses when a large or small reward is unexpectedly presented to an animal: the top row shows a schematized response that illustrates a greater dopamine cell response to large rewards. The example of a response by a single dopamine neuron in the second row confirms the schematic in the top row. The third row illustrates that after a stimulus has been associated with reward, the stimulus itself, and not the reward, elicits dopamine cell discharge. In this case the subject expects to receive reward following presentation of the stimulus. The bottom row illustrates dopamine cell responses when a reward is omitted after the associated stimulus is presented. It can be seen that dopamine cells increase firing after stimulus presentation, but the same cell shows reduced firing at the time when the rat expected to receive reward. This inhibited response is referred to as an inhibitory (or negative) reward prediction error that signals to efferent structures that an expected reward was not found. Right: For comparison with dopamine cell responses, schematized and exemplar responses are shown for cells recorded in the pedunculopontine nucleus (PPTg), a structure that is thought to regulate burst firing by dopamine cells. Like dopamine cells, PPTg cells not only respond to encounters with unexpected reward, but they also do so differentially. However, in contrast to dopamine cells, the PPTg responses differentiate reward magnitudes in terms of the duration, and not the magnitude, of the response. This pattern suggests that PPTg cells signal the presence of reward. If stimuli are associated with subsequent reward encounters, PPTg cells show responses to cues that predict rewards (and not to stimuli that do not predict rewards). Unlike dopamine cells, PPTg cells continue to respond to reward presentations even after the presentation of a conditioned stimulus. The last row shows that, again unlike dopamine neurons, PPTg cells show no evidence of prediction error signaling.

To summarize, the hippocampus may provide a fundamental analysis of the current context that allows subsequent decisions to be made based on the most recent determination of context saliency. Via direct projections to the ventral striatal–VTA system, the hippocampus may signal the dopaminergic component of the reinforcement learning system when there are violations of one's expectations for a given context. This 'alerting' signal may lower the threshold for dopamine cell firing to reward so that the 'teaching signal' can be distributed to update memory and behavioral systems. The following section describes current ideas about the impact of dopamine signals on the ventral and dorsal striatum, focusing on the role of dopamine in decision making and behavioral control during navigation.

6. The neurobiology of reinforcement learning and goal-directed navigation: striatal contributions

Decision making or action selection processes have been attributed to the striatum, which acts as a dynamic controller of behavior, integrating sensory, contextual and motivational information from a wide network of cortical and subcortical structures. This function can be accomplished through the use of reinforcement learning algorithms that compare the expected success of a learned behavior with the actual success experienced by the organism. In reinforcement learning models, the actor and critic use these predictions to implement successful action–outcome policies (Khamassi et al., 2005). The actor–critic distinction represents a classic distinction in the psychological literature, that between Pavlovian learning (stimulus–outcome relationships) and instrumental learning (action–outcome learning). While these aspects of learning are often studied under restrictive conditions designed to assess particular features of each type of learning, in fact, these forms of learning can be represented on a kind of continuum. Pavlovian learning mechanisms underlie the ability of an organism to learn that neutral stimuli can be predictive of rewards and goals and can eventually facilitate instrumental learning (i.e., Pavlovian-instrumental transfer), and instrumental learning can progress from goal-directed behavior to habitual stimulus–response associations once a behavior has been well-learned. Within the reinforcement learning literature, these different modes of learning are described by 'model-free' algorithms that attempt to explain stimulus–response behavior, and 'model-based' algorithms that describe how learning about the environment allows an organism to consider impending actions or formulate new actions within the current context. Until very recently, it was thought that the dorsal striatum worked as the actor in a model-free system, and the ventral striatum functioned as the critic in a model-based system (Atallah et al., 2007; Johnson et al., 2007; van der Meer and Redish, 2011). A wealth of recent data, however, suggests a more fine-tuned delineation of function across the dorsal–ventral striatum. Along with a refinement of the functional anatomy of the striatum, it is also clear that reinforcement learning algorithms themselves may need to be reconsidered if they are to successfully model learning in complex environments.

6.1. Striatal based navigational circuitry

Like the hippocampus, the striatum is composed of several functionally and anatomically distinct subregions. All cortical areas project to the striatum (Berendse et al., 1992a,b; McGeorge and Faull, 1987, 1989; Parent, 1990), and the distribution of these projections helps to define three main subdivisions of the striatum: the ventral striatum (often synonymous with the nucleus



Fig. 8. Striatal–cortical information processing loops. (A) The 'limbic loop' connects the orbital and ventromedial prefrontal cortex with the nucleus accumbens. Input from these cortical regions is excitatory. The accumbens sends inhibitory projections to the ventral pallidum, which innervates the mediodorsal and other thalamic divisions. (B) An 'associative loop' connects the prefrontal and parietal association cortices with the dorsomedial striatum. The dorsomedial striatum sends inhibitory projections to the associative pallidum, which innervates the mediodorsal and ventral thalamus. (C) The 'sensorimotor loop' connects the primary sensorimotor cortices with the dorsolateral striatum. Emphasis is placed on the spiraling midbrain–striatum–midbrain projections, which allow information to be propagated forward in a hierarchical manner. Note that this is only one possible neural implementation; interactions via different thalamo-cortico-thalamic projections are also possible (Haber, 2003). BLA, basolateral amygdala complex; core, nucleus accumbens core; DLS, dorsolateral striatum; DMS, dorsomedial striatum; mPFC, medial prefrontal cortex; OFC, orbitofrontal cortex; shell, nucleus accumbens shell; SI/MI, primary sensory and motor cortices; SNc, substantia nigra pars compacta; vPFC, ventral prefrontal cortex; VTA, ventral tegmental area.
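The actor–critic division of labor introduced in Section 6 (a critic that learns value predictions and an actor whose action preferences are trained by the same prediction error) can be made concrete with a minimal simulation. This is a generic sketch under our own assumptions, a one-state choice between two options resembling two maze arms; all names and parameter values are illustrative, not drawn from the studies cited here.

```python
import math
import random

# Minimal tabular actor-critic for a one-state, two-option choice.
# All names and parameter values are illustrative.
random.seed(0)

N_ARMS = 2                      # e.g., two maze arms; only arm 0 is baited
alpha, beta = 0.1, 0.1          # critic and actor learning rates
value = 0.0                     # critic: expected reward at the choice point
prefs = [0.0] * N_ARMS          # actor: action preferences

def softmax(p):
    e = [math.exp(x) for x in p]
    s = sum(e)
    return [x / s for x in e]

def choose(p):
    r, cum = random.random(), 0.0
    for arm, prob in enumerate(softmax(p)):
        cum += prob
        if r < cum:
            return arm
    return len(p) - 1

for _ in range(1000):
    arm = choose(prefs)
    reward = 1.0 if arm == 0 else 0.0
    delta = reward - value      # prediction error (single-step episode)
    value += alpha * delta      # critic update
    prefs[arm] += beta * delta  # actor update: the same error trains the policy

# After training, the actor strongly prefers the baited arm and the
# critic's value estimate approaches the obtained reward rate.
```

In this toy setting a single prediction error both updates the critic's value estimate and shifts the actor's preferences toward the rewarded arm, which is the computational kernel of the proposed mapping of critic-like and actor-like functions onto ventral and dorsal striatal circuits.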

accumbens), dorsomedial striatum, and dorsolateral striatum (Alexander and Crutcher, 1990a; Alexander et al., 1986; Humphries and Prescott, 2010; Voorn et al., 2004). Each of these subregions participates in one of a series of parallel loops that go from the neocortex to the striatum, pallidum, thalamus, and then back to neocortex (see Fig. 8; Alexander and Crutcher, 1990a; Groenewegen et al., 1999a; Haber, 2003). These loops include a ‘limbic loop’ that connects the ventromedial prefrontal cortex with the ventral striatum (Alexander and Crutcher, 1990a; Graybiel, 2008; Graybiel et al., 1994; Pennartz et al., 2009; Voorn et al., 2004; Yin and Knowlton, 2006), an ‘associative loop’ that connects the medial prefrontal cortex with the dorsomedial striatum, and a ‘sensorimotor loop’ that connects somatosensory and motor cortical areas with the dorsolateral striatum. Activity within these loops is modulated by dopamine, released from fibers originating in either the VTA or the SNc. Dopamine influences glutamatergic afferents and striatal medium spiny neuron efferents, and through these actions, modulates striatal output from these loops (Horvitz, 2002; Nicola et al., 2004). The particular role that dopamine plays in regulating information processing within each of the cortical–striatal loops is influenced by the origin and destination of the dopaminergic projections. In addition, recent work has demonstrated regional differences in tonic and phasic dopamine signals across the ventral–dorsal axis of the striatum (Zhang et al., 2009).

As recently pointed out by Humphries and Prescott (2010), and also noted by others (Bromberg-Martin et al., 2010; Salamone, 2007; Wise, 2009; Yin et al., 2008), a number of issues related to dopamine signaling within the striatum remain topics of intense debate, for example, where and what type of dopamine receptors are found within the striatum, and what effects their activation may have on cell signaling and behavior. The factors that are likely to contribute to the confusion include unclear boundaries between striatal compartments, unclear boundaries between midbrain dopaminergic regions (VTA and the SNc), and the different methods used to study the effects of dopamine (e.g., pharmacological manipulations, lesions, genetically engineered mice, microdialysis, and voltammetry) on many different kinds of behaviors (e.g., learning vs. performance, operant vs. maze learning, Pavlovian vs. instrumental learning). A complete discussion of these issues is beyond the scope of the current paper, thus the interested reader is directed to several excellent reviews that have discussed these details (Bromberg-Martin et al., 2010; Humphries and Prescott, 2010; Nicola et al., 2000; Redgrave and Gurney, 2006; Wise, 2009; Yin et al., 2008).

6.2. Dopamine signaling and reward prediction error within the striatum

The striatum is a major target of midbrain dopaminergic projections from both the VTA and the SNc (Beckstead et al., 1979; Haber et al., 2000; Humphries and Prescott, 2010). The dopaminergic projections from the VTA and SNc play a crucial role in motor control and in emotional and cognitive processes (Wise, 2004). Dopamine neurons in the VTA send projections to the prefrontal cortex, hippocampus, and amygdala, in addition to the projection to the ventral striatum, whereas dopaminergic neurons from the SNc connect primarily to the dorsal striatum (Bjorklund and Dunnett, 2007). The projections that originate in the VTA and connect to the prefrontal cortex are thought to regulate attentional processes and working memory (Dalley et al., 2004), whereas VTA projections to the ventral striatum are assumed to play a key role in reward, motivation, and goal-directed behavior (Ikemoto, 2007; McFarland and Ettenberg, 1995; Smith-Roe and Kelley, 2000; Wolterink et al., 1993). In terms of dopaminergic projections that originate in the SNc, the traditional view has been that this projection influences motor output and stimulus–response learning (Featherstone and McDonald, 2004; Hikosaka et al., 2006; O’Doherty et al., 2004). However, recent evidence indicates that goal-directed behaviors depend on signaling in the dorsomedial striatum and prefrontal cortex (Graybiel, 2008; Yin et al., 2008). In addition, data from rodents with neurotoxic lesions of nigrostriatal dopaminergic neurons suggest that the dorsal striatum strongly contributes to visuospatial function and memory (Baunez and Robbins, 1999; Chudasama and Robbins, 2006; De Leonibus et al., 2007; Da Cunha et al., 2003).


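Before turning to the striatal measurements discussed below, the prediction-error idea named in this section's title can be sketched as a minimal delta-rule update. This is a sketch under assumed values: the learning rate, reward magnitude, and trial count are illustrative choices, not parameters drawn from the studies reviewed here.

```python
# Minimal Rescorla-Wagner-style sketch of a reward prediction error (RPE).
# The cue's learned value V starts at 0; on each trial the RPE is the
# received reward minus the reward the cue predicted, and V is nudged
# toward the reward. alpha and n_trials are illustrative assumptions.

def train(n_trials, reward=1.0, alpha=0.2):
    v = 0.0                      # learned value of the predictive cue
    rpe_history = []
    for _ in range(n_trials):
        rpe = reward - v         # prediction error at reward delivery
        v += alpha * rpe         # update the cue's predicted value
        rpe_history.append(rpe)
    return v, rpe_history

v, rpes = train(50)
# Early trials: large positive RPE at reward (reward is unexpected).
# Late trials: RPE at reward approaches zero (reward is fully predicted).
# Omitting the reward after training yields a negative RPE, mirroring the
# dip in dopamine cell firing at the expected time of delivery.
omission_rpe = 0.0 - v
```

The shift of the dopamine response from reward to cue described in the text corresponds, in this toy scheme, to the reward-time RPE shrinking toward zero as the cue's prediction becomes accurate.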
The nucleus accumbens is the dopamine terminal field most strongly implicated in reward function. As discussed in Section 4.2, the predominant view of phasic burst firing of dopaminergic neurons within the midbrain is that it provides a reward prediction error signal representing the difference between the expected and the received reward outcome (Ljungberg et al., 1992; Schultz, 1998b). In Pavlovian conditioning tasks, in which a cue signals the availability of reward, these neurons burst fire in response to reward but, with learning, this activity shifts to the cue that predicts reward. When the reward is omitted after learning, the putative dopamine cells show a brief depression in activity at the expected time of its delivery (e.g., Fiorillo et al., 2003; Tobler et al., 2003; Waelti et al., 2001; see Section 4.2). Demonstrating changes in activity within the dopamine-rich VTA, however, does not necessarily equate to dopamine release within its target structure, although one would predict that these events would be correlated if dopamine modulates activity within the nucleus accumbens. Technological advances have provided a tool, fast-scan cyclic voltammetry, for measuring dopamine release in target structures on a subsecond timescale (Clark et al., 2010; Robinson et al., 2003; Wightman and Robinson, 2002). Using this technique, work by Regina Carelli’s group tested the hypothesis that dopamine release in the accumbens core is indeed correlated with a prediction error signal in an appetitive Pavlovian conditioning paradigm (Day et al., 2007). As would be predicted based on activity in the VTA, a phasic dopamine signal in the accumbens core was observed immediately after receipt of reward, but over extended training, this signal shifted to the conditioned stimuli. This finding supports the original ‘prediction error’ hypothesis and is also consistent with earlier work showing impaired performance of a Pavlovian conditioned response after either dopamine receptor antagonism or dopamine depletion in the accumbens core (Di Ciano et al., 2001; Parkinson et al., 2002). Thus, at least within the nucleus accumbens, the generation of a reward prediction error within the VTA does appear to provide a teaching signal to the nucleus accumbens that facilitates learning, but it should be noted that it may not provide a unitary teaching signal across the ventral striatum (Aragona et al., 2009).

Although many remarkable discoveries have been made in terms of how the nucleus accumbens contributes to decision making processes, it has become increasingly clear that the dorsal striatum is also involved. The existence of an error prediction signal, however, is not as well established in the dorsal compared to the ventral striatum. Direct measurement of dopamine within the dorsal striatum has not been undertaken during a task that would produce a prediction error signal from the midbrain. Work by Oyama et al. (2010) has provided the best evidence to date that an error signal is in fact generated in the dorsal striatum. In this study, single unit activity was recorded in the dorsal striatum and the VTA/SNc within the same animals to look for correlated activity between structures during performance of a probabilistic Pavlovian conditioning task. The data indicate that neurons within the dorsal striatum do in fact show activity indicative of an error prediction signal that is similar to the signal generated by putative dopaminergic neurons within the midbrain.

In addition to potentially providing a prediction signal, dopamine within the dorsal striatum promotes learning and memory processes that are necessary for goal-directed behavior. The dopamine projection to the dorsomedial striatum, however, may play a different role in learning than the projection to the dorsolateral striatum, as these two regions may differ significantly in the temporal profile of dopamine release, uptake and degradation (Wickens et al., 2007a,b). One current working hypothesis is that dopamine projections to the dorsomedial striatum from the medial SNc promote action–outcome learning, while dopaminergic projections from the lateral SNc to the dorsolateral striatum promote habit learning (Yin et al., 2008). For example, selective lesions of dopamine cells that project to the dorsolateral striatum impair habit learning ( et al., 2005). Local dopamine depletion, then, is similar to excitotoxic lesions of the dorsolateral striatum, in that both manipulations retard habit formation and favor the acquisition of goal-directed actions (Yin et al., 2004). Further evidence that dopamine signaling within the dorsal striatum may differentially mediate action–outcome and habit/motor learning has been provided by Yin et al. (2009). Medium spiny neurons within the striatum can be segregated into two distinct populations, those projecting directly to neurons of the substantia nigra pars reticulata (SNr) and internal segment of the globus pallidus (the ‘direct’ pathway) and those that project to the external segment of the globus pallidus, or entopeduncular nucleus in rodents (the ‘indirect’ pathway). Neurons of the globus pallidus or entopeduncular neurons then project to the SNr, the thalamus, and subthalamic nucleus. These two populations exhibit distinct physiological properties and, importantly, express different dopaminergic receptors, with neurons of the direct pathway preferentially expressing D1 receptors and neurons of the indirect pathway preferentially expressing D2 receptors (Albin et al., 1989; Surmeier et al., 2007). Using D2-eGFP mice, Yin et al. (2009) found that D2-expressing neurons located in dorsolateral striatum exhibit a significant increase in synaptic strength compared to D1-expressing neurons from the same region when mice underwent extended training on a rotarod task. Further, blocking D1 receptors did not affect performance when injected after the task had been well-learned. In contrast, blocking D2 receptors impaired performance at both early and late training phases. This suggests that motor skill learning involves an increase in synaptic activation of D2-expressing medium spiny neurons within the dorsolateral striatum. An intriguing possibility is that these kinds of changes may also underlie habitual behavior as routes become very familiar in an unchanging environment.

Additional methods to assess the distinct role that dopamine plays in learning and decision making mechanisms within the dorsal striatum have been employed by Palmiter and colleagues, using a dopamine deficient mouse (Palmiter, 2008; Wall et al., 2011). These mice lack tyrosine hydroxylase selectively in dopamine neurons and are therefore unable to synthesize dopamine. In contrast to lesion models, dopamine neurons in dopamine deficient mice are functionally intact (Robinson et al., 2004), and endogenous dopamine signaling can be selectively restored by the experimenter, making them a powerful tool for studying dopamine signaling. These mice show impairments in instrumental learning and performance, but their performance can be restored either by L-DOPA injection or by anatomically selective viral gene transfer (Robinson et al., 2007; Sotak et al., 2005). Work by Darvas and Palmiter (2010, 2011) has provided evidence that dopamine is necessary for cognitive flexibility using a water U-maze task in which mice had to shift from an initially acquired escape strategy to a new strategy, or to reverse the initially learned strategy. Restricting dopamine signaling to the ventral striatum did not impair learning of the initial strategy or reversal-learning but strongly disrupted strategy-shifting. In contrast, mice with dopamine signaling restricted to the dorsal striatum had intact learning of the initial strategy, reversal-learning, and strategy-shifting. This suggests that dopamine signaling in both dorsal and ventral striatum is sufficient for reversal-learning, whereas only dopamine signaling in the dorsal striatum is sufficient for the more demanding strategy-shifting task. In a follow-up study (Darvas and Palmiter, 2011) dopamine was restored to the ventromedial striatum, and this treatment rescued spatial memory, visuospatial and discriminatory learning. Acquisition of operant behavior was delayed, however, and motivation to obtain food rewards was


blunted. These studies indicate that precise restoration of dopamine signaling within the striatum can selectively affect behavior. It should be noted, however, that whatever functions can be rescued by L-DOPA or adenosine antagonism in DA-deficient mice are likely related to restoration of tonic dopamine signaling, rather than phasic dopamine signaling. In addition, these mice have not been used to directly assess habit formation, or the potential parallel signaling that may take place between the dorsolateral and dorsomedial striatum as learning develops. Nevertheless, the development of this kind of model for selectively investigating dopamine function in the striatum is likely to significantly advance our understanding of the role that dopamine plays in decision making during learning.

Based on these data, one hypothesis about the influence of dopamine on striatal function suggests that the striatum can be organized into four regions that underlie different, but synergistic association processes, each contributing to the decision processes that are necessary for navigating within complex learning environments (Ikemoto, 2007; Yin et al., 2008). Neuronal signaling moves through a serial cascade, beginning in the ventral striatum and moving into the dorsomedial and finally, the dorsolateral striatum as learning progresses. It is thought that this spiraling of information through the ventral–dorsal aspects of the striatum promotes the transition from goal-directed to habit-driven behaviors (Belin and Everitt, 2008; Everitt and Robbins, 2005). Details of this working model of the striatum include the following (also see Fig. 9):

(a) The ventral striatum is important for Pavlovian learning and the interaction between Pavlovian and instrumental learning mechanisms. This kind of stimulus–reward learning underlies conditioned approach behaviors, and is a powerful way in which one can learn that neutral stimuli lead to reward. In some cases, the stimuli that predict reward may acquire some of the motivational properties of the primary reward. An example of this is the value that money has – while money itself has no innate biological importance, it is often paired with items that do have motivational significance, allowing it to serve as a predictor for future rewards, and also as a powerful conditioned reinforcer.

(b) The dorsomedial striatum, on the other hand, appears to support action–outcome associations. This kind of learning is fundamental for adaptive goal-directed behaviors. Many of our behaviors can be considered goal-directed, for example, publishing more papers will lead to a promotion at work, or increasing our level of exercise may lead to better health.

(c) The dorsolateral striatum is involved in incremental stimulus–response kinds of learning that underlie procedural learning and the formation of habits, and the sequencing of behavior. In many cases, habits are thought of in a negative context such as drug addiction. When habits are discussed here, the term is meant to indicate something more general and adaptive, reflecting a well-learned skill or automatic behavior. One example of this kind of learning may be learning to ride a bicycle; initially, a great deal of effort and conscious thought goes into staying upright and moving the bicycle forward. Over time, however, these actions become considerably easier and the individual components of the behavior that keep you upright and move the bicycle forward become an implicit fluid sequence that may be difficult to verbalize when teaching someone else how to ride a bicycle.

While these descriptions of the contributions of the striatal subregions to decision making processes suggest separable functions (i.e., serial processing), it is more likely that these subregions function synergistically within a wide network to direct behavior in complex learning environments (Groenewegen et al., 1999b; Haber, 2003; Haruno and Kawato, 2006; Joel and Weiner, 2000; Yin et al., 2008; Zahm, 2000). These functions will be discussed individually.

6.3. The ventral striatum: Pavlovian learning and cost-based decision making

The ventral striatum receives convergent glutamatergic input from multiple sensory and association areas of the neocortex (prefrontal cortex) and the limbic system, including the amygdala and hippocampus and related structures (subiculum, area CA1, entorhinal cortex) (Boeijinga et al., 1993; Flaherty and Graybiel, 1993; Groenewegen et al., 1999a,b, 1987; Humphries and Prescott,

[Fig. 9 diagram labels: Dorsolateral striatum (DLS) – model-free; stimulus–response learning; habits, skills, behavioral sequencing. Dorsomedial striatum (DMS) – model-based; action–outcome learning; goal-directed action. Nucleus accumbens core – stimulus–outcome learning; Pavlovian preparatory CRs and anticipatory approach behaviors. Nucleus accumbens shell – stimulus–outcome learning; Pavlovian consummatory CRs and hedonic URs.]

Fig. 9. Major functional domains of the striatum. An illustration of a coronal section of the striatum showing half of the brain (Paxinos and Watson, 2007). The four functional

domains are anatomically continuous, and roughly correspond to what are commonly known as nucleus accumbens shell and core (ventral striatum), the dorsomedial

striatum and the dorsolateral striatum. These striatal subregions are thought to implement different aspects of reinforcement learning, either ‘model-free’ learning (dark

grey) or ‘model-based’ learning (light grey). In addition, these subregions are thought to represent both the actor and the critic. Within the dorsal striatum, the lateral portion

supports a model-free actor function whereas the dorsomedial region represents a model-based actor. The ventral striatum, which is crucial for Pavlovian learning, is thought

to represent the critic; the core represents a model-free critic, whereas the shell represents a model-based critic.

After Bornstein and Daw (2011) and Yin et al. (2008).
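The division of labor in Fig. 9 can be restated as a small lookup structure. The sketch below paraphrases the figure's assignments; the dictionary keys, field names, and the helper function are illustrative choices made here, not terminology from the cited papers.

```python
# Striatal subregions mapped to their proposed reinforcement-learning roles,
# paraphrasing Fig. 9 (after Bornstein and Daw, 2011, and Yin et al., 2008).
# Field names ("rl_role", "rl_class", ...) are illustrative, not standard.

STRIATAL_ROLES = {
    "dorsolateral striatum": {
        "rl_role": "actor", "rl_class": "model-free",
        "learning": "stimulus-response",
        "behavior": "habits, skills, behavioral sequencing",
    },
    "dorsomedial striatum": {
        "rl_role": "actor", "rl_class": "model-based",
        "learning": "action-outcome",
        "behavior": "goal-directed action",
    },
    "nucleus accumbens core": {
        "rl_role": "critic", "rl_class": "model-free",
        "learning": "stimulus-outcome",
        "behavior": "Pavlovian preparatory CRs, anticipatory approach",
    },
    "nucleus accumbens shell": {
        "rl_role": "critic", "rl_class": "model-based",
        "learning": "stimulus-outcome",
        "behavior": "Pavlovian consummatory CRs, hedonic URs",
    },
}

def regions_with(rl_role=None, rl_class=None):
    """Return subregions matching the requested actor/critic and
    model-free/model-based assignments."""
    return [name for name, props in STRIATAL_ROLES.items()
            if (rl_role is None or props["rl_role"] == rl_role)
            and (rl_class is None or props["rl_class"] == rl_class)]
```

For example, `regions_with(rl_role="critic")` returns the two accumbens subregions, matching the caption's statement that the ventral striatum is thought to implement the critic.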


2010; Izquierdo et al., 2006; McGeorge and Faull, 1989; Mulder et al., 1998; Totterdell and Meredith, 1997; van Groen and Wyss, 1990; Voorn et al., 2004). The nucleus accumbens, the main portion of the ventral striatum, can be divided into two major subregions, the core which is continuous with the dorsomedial striatum, and the shell which occupies the ventral and medial portions of the nucleus accumbens. Although the core and shell regions share common characteristics, they also differ significantly in terms of their cellular morphology, neurochemistry, and patterns of projections, all of which may suggest a different function for the core and shell (Heimer et al., 1991; Jongen-Relo et al., 1994; Meredith, 1999; Meredith et al., 1992, 1996, 2008; Usuda et al., 1998; Zahm and Brog, 1992; Zahm and Heimer, 1993). The core and shell regions of the nucleus accumbens are not likely to function completely independently of each other, however, as direct interconnections between these areas have also been described (Heimer et al., 1991; van Dongen et al., 2005; Zahm, 1999; Zahm and Brog, 1992; Zahm and Heimer, 1993).

Based on its connectivity, a general working model has been that the nucleus accumbens represents a ‘limbic–motor interface’ that facilitates appropriate responding to reward-predictive stimuli (e.g., Ikemoto and Panksepp, 1999; Mogenson et al., 1980; Nicola, 2007; Pennartz et al., 1994; Wise, 2004; Wright et al., 1996; Zahm, 2000). How this process is achieved, however, is not fully understood. If the accumbens does indeed represent such an interface, then it should, at the very least, process information related to reward and the actions that lead to the acquisition of reward. In fact, there is a fair amount of evidence suggesting that neurons within the nucleus accumbens respond to cues associated with a reward (e.g., Carelli and Ijames, 2001; Cromwell and Schultz, 2003; Hassani et al., 2001; Hollerman and Schultz, 1998; Nicola et al., 2004; Roitman et al., 2005; Setlow et al., 2003; Wilson and Bowman, 2005), as well as the selection of one behavior from among competing alternatives (Hikosaka et al., 2006; Nicola, 2007; Pennartz et al., 1994; Redgrave et al., 1999a; Roesch et al., 2009; Taha et al., 2007).

6.3.1. Nucleus accumbens and Pavlovian learning

Foraging animals encounter situations in which they are required to find food or other necessary resources. In order to learn that certain stimuli may signal the availability of the resource being pursued, organisms must be able to learn relationships between positive outcomes and their reward predictive cues. This behavior can be investigated within the laboratory using an autoshaping (also known as ‘sign tracking’) paradigm. In autoshaping experiments, a cue is paired with the availability of reward. Initially, this cue is neutral, meaning that the cue itself is neither biologically significant, nor is it predictive of reward. Because the cue is novel, and rodents have a propensity for investigating novel cues and objects (Bardo et al., 1989, 1996; Bardo and Dwoskin, 2004; Burns et al., 1996; De Leonibus et al., 2006), the animal will approach the cue, and over time will begin to associate the cue with a reward. Thus, the neutral cue gains control over approach responses even though reward delivery is independent of any specific behavior, and with extended training, approach responses are observed nearly every time the reward-predictive cue is presented. A cue that has never been paired with reward does not elicit approach behavior even after repeated presentation (Bussey et al., 1997; Robbins and Everitt, 2002). This approach behavior lacks the flexibility of instrumental learning in that the behavior is not generally altered by the introduction of new contingencies (Bussey et al., 1997; Day and Carelli, 2007; Jenkins and Moore, 1973; Locurto et al., 1976; Williams and Williams, 1969). Autoshaping has important implications for foraging behavior; in a rapidly changing environment, autoshaping behaviors represent a fundamental mechanism through which an organism learns about environmental cues that lead to biologically significant events such as food, mates, and shelter. It is not surprising, then, that autoshaping is demonstrated by a number of species, including birds (Brown and Jenkins, 1968), monkeys (Sidman and Fletcher, 1968), and humans (Wilcove and Miller, 1974).

A number of studies suggest that the nucleus accumbens mediates autoshaping. For example, Cardinal et al. (2001) demonstrated that excitotoxic lesions of the nucleus accumbens core impair the ability to discriminate between a cue that is predictive of reward and an alternate cue with no predictive value. Similarly, depletion of dopamine in the nucleus accumbens results in deficits in the acquisition and expression of approach behaviors (Di Ciano et al., 2001; Parkinson et al., 2002). Further, electrophysiological recordings during autoshaping demonstrate that accumbens neurons exhibit phasic changes in firing rate that are selective for cues predictive of reward; in some cases, an increase in activity is associated with the onset of a reward predicting cue, while a second subset of neurons is significantly inhibited. These same cells showed little or no change in activity in response to a cue that was not paired with reward. These findings were also core and shell specific; significantly fewer neurons in the shell showed an excitatory response to predictive cues compared to neurons within the core (Day et al., 2006). In addition, lesion and pharmacological data indicate that disrupting activity within the core interferes with approach toward predictive cues, suggesting that the core may help organisms discriminate between biologically relevant and irrelevant cues (Cardinal et al., 2001; Di Ciano et al., 2001). The functional dissociation between the core and shell might be expected given that these regions send separate projections to different output structures (Heimer et al., 1991; Sesack and Grace, 2010).

The accumbens is also involved in Pavlovian-instrumental transfer (PIT), which is the capacity of a Pavlovian stimulus that predicts reward to elicit or increase instrumental responses for the same (or a similar) reward (Estes, 1943, 1948; Kruse et al., 1983; Rescorla and Solomon, 1967). To produce PIT, animals first undergo Pavlovian and then instrumental training during which they learn to associate a cue with reward and then later, learn to make a specific operant response (i.e., press a lever) for the reward. On a probe trial, the predictive cue is presented with the lever, and the change in response rate on the lever is measured. Two forms of PIT can be observed, one that is related to the arousing effect of reward-related cues (non-selective PIT), and another that is more selective for choice performance produced by the predictive status of a cue with respect to one specific reward compared to others (outcome-selective PIT) (Holmes et al., 2010). The shell and core regions of the nucleus accumbens are differentially involved in general and selective PIT; general PIT is disrupted by lesions of the core, but not by lesions of the shell (Hall et al., 2001), whereas selective PIT is disrupted by lesions of the shell, but not by lesions of the core (Corbit et al., 2001). Importantly, because the accumbens is not thought to be integral to instrumental behaviors (Yin et al., 2008), other regions of the striatum that are involved in instrumental learning should also be involved in PIT. In fact, Corbit and Janak (2007, 2010) have shown that the dorsolateral and dorsomedial striatum integrate different aspects of Pavlovian and instrumental information. For example, lesions of the dorsolateral striatum reduce PIT altogether, whereas lesions of the dorsomedial striatum interfere with the selectivity of PIT (Corbit and Janak, 2007).

6.3.2. The nucleus accumbens and cost-based decision making

When animals are pursuing a goal, they are often faced with complex effort or time-related barriers that separate the actions they make from the goal being pursued. This is the case in natural


foraging environments, and in the laboratory where animals are trained to lever press or navigate a maze for reward. Thus, it is adaptive for animals to cope with delayed reinforcement or increased effort to obtain the desired outcome. Within the laboratory, effort-based decision making can be assessed by providing the organism with a choice between a low-cost/low value reward vs. a high-cost/high value reward. Most typically, low cost options are associated with, for example, a few lever press responses or a short time delay, while high cost options require significantly more lever presses or impose a longer delay between the last response and the delivery of reward. Many factors may influence the choice that any one animal makes, including motivational factors such as how hungry the animal is, or how desirable the reward is (Salamone et al., 2007, 2009). A growing body of work suggests that the nucleus accumbens and its cortical afferents (e.g., the anterior cingulate and medial prefrontal cortex) are involved in exertion of effort and effort-related choice behaviors (e.g., Cardinal et al., 2001; Floresco and Ghods-Sharifi, 2007; Floresco et al., 2008a; Salamone, 2002; Walton et al., 2006). Disrupting activity within the nucleus accumbens can shift behavior toward actions that require less effort or are associated with shorter delays to reward (Aberman and Salamone, 1999; Aberman et al., 1998; Bezzina et al., 2008; Cardinal et al., 2001; Day et al., 2011; Hauber and Sommer, 2009; Walton et al., 2006). In a recent study (Day et al., 2011), the complex role that the nucleus accumbens plays in effort-based and delay-based costs was assessed. In this study, a visual cue signaled the relative value of an upcoming reward. Analysis of single unit activity within the accumbens indicates that a subgroup of neurons shows phasic increases in firing in response to the predictive cue, and this activity reflects the cost-discounted value of the upcoming response for effort-related, but not delay-related costs. In contrast, additional subgroups of neurons respond during response initiation or reward delivery, but this activity does not differ on the basis of reward cost. Finally, another population of neurons within the accumbens showed sustained changes in firing rate (either excitation or inhibition) while rats completed high-effort requirements or waited for delayed rewards. The complexity of the results reported in this study highlights the complexity of the computations required to make decisions when faced with competing options. For the foraging animal, the cost of obtaining rewards is dynamic; for example, the time to explore and the distance that must be travelled to obtain resources are constantly changing (Stephens, 1986). Because individual neurons within the accumbens receive diverse cortical and subcortical inputs, they are likely to carry a heavy information processing load (Kincaid et al., 1998) in complex decision making environments.

Dopamine signaling also contributes to the execution of cost–benefit decisions (Fiorillo et al., 2003, 2008; Gan et al., 2010; Kobayashi and Schultz, 2008; Ostlund et al., 2011; Phillips et al., 2007; Roesch et al., 2007; Roitman et al., 2004; Tobler et al., 2005; Wanat et al., 2010). Some studies have investigated the contribution of putative dopamine neurons to cost-based decisions by measuring activity in the midbrain (Fiorillo et al., 2003, 2005, 2008; Kobayashi and Schultz, 2008; Roesch et al., 2007; Tobler et al., 2005) while other studies have obtained a measure of dopamine activity within the nucleus accumbens, since the latter is a major target of midbrain dopaminergic projections, and is known to be involved in the computations that support cost-based decision making (Day et al., 2011; Gan et al., 2010; Salamone et al., 2009; Wanat et al., 2010). In studies using voltammetry to measure phasic dopamine release, cue-evoked dopamine signals are shown to be relatively insensitive to both effort-based and delay-based costs, but a significant response is observed when the cost to obtain reward changes (Gan et al., 2010; Roesch et al., 2007; Wanat et al., 2010). Further, dopamine responses to rewards and to their predictive cues are separable and independently modulated when instrumental-response requirements are progressively increased. That is, reward-evoked dopamine release within the accumbens is affected by escalating costs in proportion to the delay imposed prior to reward delivery rather than to increased work requirements, whereas cue-evoked dopamine release is unaffected by either temporal or effort-related costs. Together, these results may be congruent with competing theories of dopamine function: if dopamine provides a prediction error signal, then dopamine neurons in a trained animal respond to rewards only when they are unexpected (Fiorillo et al., 2003; Schultz et al., 1997), as would be the case when the relative cost of a reward changes. In addition, phasic dopamine signals may provide an incentive signal that is used to determine the value of the reward (Berridge, 2007). This would also explain the observation that changes in phasic dopamine occur when costs to obtain the reward change. Finally, these results may also be consistent with the ‘Flexible Approach Hypothesis’, which states that dopamine signaling within the accumbens is required for reward seeking behavior only when the specific actions that are necessary to obtain reward are variable across trials (Nicola, 2010).

The role of the nucleus accumbens in mediating cost-based choice behavior has also been tested using maze tasks. For example, a T-maze choice task (Cousins et al., 1996; Salamone, 1994) can be used in which one of the choice arms contains a large food reward, whereas the other arm has a significantly smaller reward. Effort-related decision problems can be introduced by placing a barrier in the arm that contains the larger reward, thus presenting an obstacle that the rat must climb to gain access to the larger reward. Alternatively, the barrier that prevents the rat from accessing the larger reward can be used to impose a delay before access to the large reward is granted. Using an effort-based version of this task, Cousins et al. (1996) demonstrated that excitotoxic lesions of the accumbens significantly decreased selection of the high effort/high reward maze arm. When, however, reward was entirely omitted from the low effort maze arm, these rats chose the high effort/high reward arm and were capable of obtaining the reward, despite the high cost.

Recently, Bardgett et al. (2009) used a discounting version of the T-maze task in which the amount of food in the large reward arm of the maze was reduced each time the rat selected that arm. This ‘adjusting-amount’ discounting variant of the T-maze task permits assessment of the indifference point for each rat, which is defined as the point at which the rat no longer shows a preference for one reward over the other, and therefore chooses both amounts equally often (Richards et al., 1997). When dopamine signaling was blocked with either a D1 or D2 receptor antagonist, rats were more likely to choose the small-reward arm, but when treated with amphetamine, rats were more likely to choose the large-reward arm. Clearly, carefully designed behavioral studies with mazes can provide a more complete understanding of how the brain processes information necessary for making (optimal) decisions in complex learning environments. In fact, cost-based decision making has been investigated with several maze-based tasks, which have undergone behavioral validation and evaluation (Cousins et al., 1996; Salamone et al., 1991; van den Bos et al., 2006) and have been used by several laboratories to characterize the effects of brain lesions or drug manipulations on choice behavior (Bardgett et al., 2009; Denk et al., 2005; Salamone et al., 1991; Schweimer and Hauber, 2006; Walton et al., 2002). Although there are very obvious differences between these tasks and the operant tasks after which they have been modeled, both have yielded remarkably similar results (Bardgett et al., 2009; Cousins et al., 1994; Denk et al., 2005; Floresco et al., 2008b; Koch et al., 2000; Salamone et al., 1991, 2002; Sink et al., 2008; Wakabayashi et al., 2004; Walton et al., 2006). Thus, maze tasks appear to be valid models for

Wanat et al. (2010) showed that dopamine responses to rewards and investigating choice behavior during cost-based decision making.
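The adjusting-amount logic described above lends itself to a compact algorithmic sketch. The toy simulation below is purely illustrative (the starting amount, step size, and choice rule are invented; this is not Bardgett et al.'s protocol): the large-arm amount shrinks after each large-arm choice, recovers after each small-arm choice, and the indifference point is read out as the mean adjusted amount once choices stabilize.

```python
# Hypothetical sketch of an 'adjusting-amount' titration procedure.
# All parameters are illustrative, not those used by Bardgett et al. (2009).
def titrate(choose_large, start=8, step=1, floor=0, trials=40):
    """Run one adjusting-amount session.

    choose_large(large_amount, small_amount) -> bool models the rat's choice.
    The large-arm amount shrinks after each large-arm choice and grows back
    after each small-arm choice; the mean amount over the final trials
    estimates the indifference point (where both arms are chosen equally).
    """
    small, large, history = 1, start, []
    for _ in range(trials):
        if choose_large(large, small):
            large = max(floor, large - step)   # devalue after a large choice
        else:
            large = min(start, large + step)   # restore after a small choice
        history.append(large)
    return sum(history[-10:]) / 10.0           # indifference-point estimate

# A simulated rat that prefers the large arm only while its amount exceeds a
# subjective threshold (delay/effort cost folded into that threshold).
indiff = titrate(lambda lg, sm: lg > 3)
```

With this simulated chooser, the titration oscillates around the subjective threshold, which is exactly what the indifference point is meant to capture; dopamine blockade or amphetamine would correspond to shifting that threshold.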


6.3.3. Spatial learning and navigation: the role of the ventral striatum

The ability to make optimal cost-based decisions is essential if animals are to make adaptive behavioral choices during goal-directed navigation. The ventral striatum appears strategically positioned to play a key role in cost-based decisions during navigation, given convergent evidence from a variety of maze studies, including the spatial version of the Morris swim task (Sargolini et al., 2003; Setlow and McGaugh, 1998), the radial maze (Gal et al., 1997; Smith-Roe et al., 1999), a spatial version of the hole board task (Maldonado-Irizarry and Kelley, 1995), as well as a task in which the animals are required to discriminate a spatial displacement of objects (e.g., Annett et al., 1989; et al., 2005; Roullet et al., 2001; Sargolini et al., 1999; Seamans and Phillips, 1994; Usiello et al., 1998).

To investigate the idea that the ventral striatum associates specifically spatial context with reward information to facilitate initiation of appropriate navigation-based behaviors (Mogenson et al., 1980), Lavoie and Mizumori (1994) recorded neural activity in the ventral striatum while rats navigated an 8-arm radial maze for food reward. This study demonstrated, for the first time, spatial firing correlates within the ventral striatum (Lavoie and Mizumori, 1994). The mean place specificity for all ventral striatal neurons was significantly lower than that typically observed in the hippocampus (Barnes et al., 1990), indicating that while ventral striatal neurons discharge with spatial selectivity, they are not as selective as hippocampal neurons. This moderate spatial selectivity likely reflects the integration of spatial with other, non-spatial information within the ventral striatum, including reward and movement. The fact that single ventral striatal neurons encode multiple types of information supports the view that spatial, reward and movement information may be integrated at the level of individual ventral striatal neurons. Recent evidence suggests that spatial information within the ventral striatum is derived from the hippocampus: Ito et al. (2008) showed that an interruption of information sharing between the hippocampus and the shell of the nucleus accumbens disrupted the acquisition of context-dependent retrieval of cue information, suggesting that the shell, in particular, may provide a site at which spatial and discrete cue information is integrated.

Work by Redish and his colleagues has sought to describe the unique contributions that the hippocampus and the striatum make to choice behavior and spatial information processing using a multiple T-maze task. In this task, several choice points are presented to the rat as it navigates from a start location to a reward site. The final choice point on the maze represents a point in space where the animal makes a final ‘high-cost’ choice to gain access to reward. At this critical point, a number of interesting events occur in terms of both observable behavior and neuronal responses. First, early in training, while the animal is learning the correct choice, the animal pauses and engages in what is called ‘vicarious trial and error’ (Tolman, 1938, 1939). During this behavior, ensembles of hippocampal neurons transiently represent locations ahead of the animal, sweeping down the arms of the maze before the animal implements a choice (Schmitzer-Torbert and Redish, 2002; van der Meer et al., 2010). In parallel with these forward sweeps, neurons in the ventral striatum that are responsive to reward (i.e., at the reward site on the maze) also show enhanced responses at the final decision point. This activity is thought to reflect an ‘expectation-of-reward’ signal at decision points (van der Meer et al., 2010; van der Meer and Redish, 2010). This interpretation is congruent with work described above showing that the ventral striatum is involved in mediating the influence that motivationally relevant cues have on behavior (Cardinal et al., 2001; Day and Carelli, 2007; Kelley, 2004). In addition, these results support the idea that the moderately spatially selective neurons described by Lavoie and Mizumori (1994) likely reflect the integration of spatial with non-spatial (i.e., reward- and movement-related) information within the ventral striatum, and that this integration may occur at the level of individual ventral striatal neurons. Thus, together with the hippocampus, the ventral striatum plays a key role in evaluating and selecting the behaviors most likely to result in reward, and thus underlies goal-directed behavior (in this particular case, goal-directed navigation).

In addition to characterizing the activity of the hippocampus and the ventral striatum in a maze-based decision making task, characterization of the dorsal striatum was also undertaken. Previous studies have provided evidence that neurons within the dorsal striatum exhibit egocentric movement-related discharge (e.g., Barnes et al., 2005; Jog et al., 1999; Yeshenko et al., 2004) and show spatially selective firing on maze tasks. On the multiple T-maze task, van der Meer et al. (2010) observed a gradual increase in the coding efficiency of dorsal striatal neurons as the animals became better at implementing the correct choice. These responses within the dorsal striatum were most evident during the turn sequence, at the reward location, and in response to cues that are predictive of reward (van der Meer et al., 2010). This suggests that activity in the dorsal striatum may reflect the events that define the task structure; because the ultimate goal of the task is to reach reward, this is one salient event, and the turn sequence that the rat makes in order to reach that reward might be considered another salient aspect of task structure. This result is in line with work from Graybiel and colleagues (e.g., Barnes et al., 2005; Jog et al., 1999), which is discussed in greater detail below. Overall, these results provide evidence for a functional network that supports choice behavior on a goal-directed navigation task. The role that the dorsal striatum plays in decision and learning processes will be discussed below.

6.4. Dorsal striatum: contributions to response and associative learning

Historically, investigations of the particular role that the dorsal striatum plays in mediating goal-directed behaviors treated the dorsal striatum as a single entity, and it has only been fairly recently recognized that the lateral and medial aspects of the dorsal striatum participate in learning in unique ways (Balleine et al., 2007; Balleine and O’Doherty, 2010; Yin et al., 2008). The dorsomedial striatum is innervated by the association cortices, and the anterior portion of the dorsomedial striatum also receives projections from the prefrontal cortex, while the more posterior region receives significant projections from the perirhinal and agranular insular regions, as well as the entorhinal cortex and basolateral amygdala (McGeorge and Faull, 1987, 1989). This region of the dorsal striatum is thought to mediate goal-directed behaviors, as has been shown in instrumental operant tasks and in goal-directed navigational tasks. In contrast, the dorsolateral striatum, which is innervated by the primary motor and somatosensory cortices, underlies motor skill learning and habit learning that allows automaticity of behavior when appropriate (see Balleine et al., 2007; Johnson et al., 2007; Yin and Knowlton, 2006; Yin et al., 2008). Importantly, both modes of learning contribute to flexible navigational behaviors – it is through the interaction of these two modes of learning that animals are able to select the most adaptive behavior necessary to navigate in a complex learning environment. In terms of reinforcement learning theory, the dorsal striatum as a whole is thought to represent the actor in the actor–critic framework, but the dorsomedial striatum is thought to perform this function within a model-based system
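The actor–critic division of labor invoked here can be made concrete with a minimal model-free sketch. The following is a didactic toy, not a model of striatal physiology: on a short linear track, a critic learns state values via a temporal-difference (TD) error (the quantity often likened to phasic dopamine), and the actor uses that same error to adjust its action preferences. All parameter values are arbitrary choices for illustration.

```python
import math
import random

random.seed(0)

# Minimal tabular actor-critic on a linear track: states 0..4, reward at
# state 4. The critic's TD error trains both the values and the policy.
N_STATES, GOAL = 5, 4
V = [0.0] * N_STATES                           # critic: state values
pref = [[0.0, 0.0] for _ in range(N_STATES)]   # actor: (left, right) preferences
alpha_v, alpha_p, gamma = 0.1, 0.1, 0.95

def act(s):
    """Softmax choice between left (0) and right (1)."""
    e = [math.exp(p) for p in pref[s]]
    return 0 if random.random() < e[0] / (e[0] + e[1]) else 1

for _ in range(500):                           # training episodes
    s = 0
    while s != GOAL:
        a = act(s)
        s2 = max(0, s - 1) if a == 0 else s + 1
        r = 1.0 if s2 == GOAL else 0.0
        target = r if s2 == GOAL else gamma * V[s2]
        delta = target - V[s]                  # TD error ('dopamine-like')
        V[s] += alpha_v * delta                # critic update
        pref[s][a] += alpha_p * delta          # actor update
        s = s2
```

After training, the learned values ramp up toward the goal (V[0] < V[1] < V[2] < V[3]), a qualitative analogue of the ramping 'expectation-of-reward' signals described above, and the actor comes to prefer the goal-directed action at every state.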


whereas the dorsolateral striatum is thought to perform this function within a model-free framework.

6.4.1. Action–outcome learning and habit learning in the dorsal striatum

Given enough time and practice, the learning of a motor skill or habit can move from being effortful to a point where the newly acquired skill can be performed without a great deal of cognitive effort. Under ‘normal’ learning conditions, some degree of automation of behavior may be beneficial in that well-learned behaviors can take place without a great deal of information processing resources being engaged, leaving the organism in a position to direct attentional and cognitive resources to more difficult or urgent matters. The mechanisms that underlie this transition are only just beginning to be understood. Behavioral evidence indicates that motor skill and habit learning take place over an initial phase of fast improvement, followed by a slower phase of gradual refinement (Costa et al., 2004; Karni et al., 1998; Yin and Knowlton, 2006; Yin et al., 2008). Within an instrumental learning task, this incremental learning is observed during an initial phase of learning that is sensitive to both the action–outcome contingency and the value of the outcome. After prolonged training, however, these actions are transformed, and the behavior becomes automatic and insensitive to both the action–outcome contingency and the outcome value (Balleine and Dickinson, 1998; Balleine et al., 2009; Yin et al., 2008).

A series of elegant studies conducted by Yin and his colleagues has clearly identified functional differences between the dorsolateral and dorsomedial striatum (Yin et al., 2004, 2005, 2006, 2009; Yin and Knowlton, 2004). Animals were trained to lever press for sucrose reward using instrumental contingencies that are known to eventually lead to habit formation. To test whether the behavior had indeed reached habit status, the reward was paired with lithium chloride to induce taste aversion. Control animals given this treatment continued to lever press for sucrose reward, indicating that their behavior was impervious to the reward devaluation procedure. Animals with selective lesions of the dorsolateral striatum, however, significantly reduced their rate of responding, indicating that the dorsolateral striatum plays a key role in habit behavior. Importantly, lesions of the dorsomedial striatum made after the acquisition of the habitual behavior did not affect habitual responding; these animals continued to lever press for sucrose reward after lithium chloride treatment, indicating that the dorsomedial striatum is not necessary for the expression of habitual behavior once it has been acquired (Yin et al., 2004). Working on the idea that the dorsomedial striatum may be involved in action–outcome learning rather than habit learning, Yin et al. again trained rats on a task that is normally sensitive to outcome devaluation and to contingency degradation, in which the probability of reward delivery is no longer dependent on an appropriate response by the rat (Colwill and Rescorla, 1990; Hammond, 1980). Reversible inactivation of the posterior part of the dorsomedial striatum, as well as pre- and post-training lesions of this region, eliminated sensitivity to outcome devaluation and degradation, and thus led to habit-like responding (Yin et al., 2005). Based on these results, it appears that the posterior dorsomedial striatum is important for the learning and expression of goal-directed behavior, because when this region is functionally blocked, behavior becomes habitual even under training conditions that normally result in goal-directed actions in control rats.

As discussed in relation to the ventral striatum, maze tasks can be used that closely parallel the learning contingencies used within instrumental-operant tasks, despite there being obvious differences between the motor programs necessary for pressing a lever and traversing a maze. Using a T-maze task, Yin and Knowlton (2004) evaluated the idea that the posterior dorsomedial striatum is involved in flexible action–outcome/associative learning, whereas the dorsolateral striatum underlies response (motor) learning. Lesions to the dorsomedial or dorsolateral striatum were made prior to the acquisition of the task, and rats were then extensively trained to retrieve reward using a response strategy, specifically a rightward body turn. The strategy that the animal is using can be assessed directly on a probe trial in which the animal begins the trial on a different arm of the maze. If the animal is dependent on a response/motor strategy, then it will persist in making a rightward body turn, but if it is using a more flexible place strategy, the animal will be able to navigate to the rewarded site by reintegrating the spatial features of the environment with the goal location. Lesions of the posterior dorsomedial striatum resulted in the use of a response strategy; in this case, the animals continued to make rightward body turns, while control animals were able to employ a place strategy to successfully retrieve the reward. This observation, together with the data discussed above, indicates that the dorsomedial striatum underlies flexible choice behavior (Corbit and Janak, 2010; Devan and White, 1999; Ragozzino et al., 2002; Whishaw et al., 1987).

Neurophysiological studies indicate that neurons within the dorsomedial striatum undergo changes in activity early during motor learning, and their firing has been shown to change according to flexible stimulus–value assignments (Kimchi and Laubach, 2009; Yin et al., 2009). Similarly, inactivation or pharmacological manipulation of the prelimbic and infralimbic cortical areas, which form part of the association loop that projects to the medial portion of the dorsal striatum, also impairs behavioral flexibility (Ragozzino et al., 1999a,b). Whereas the hippocampus may be necessary to establish the spatial location of the goal (see Section 5), it would appear that the dorsomedial striatum is important for choosing the correct course of action that leads the animal to this location. One intriguing interpretation of these results is that the hippocampus does not compete with, or function independently of, the striatum, as has been previously claimed (Packard and Knowlton, 2002; Poldrack and Packard, 2003), but rather that these brain regions work synergistically to form a functional circuit (Mizumori et al., 2004, 2009; Yin and Knowlton, 2006). This hypothesis is supported by studies that have examined neural activity in the dorsomedial and dorsolateral striatum during spatial navigation. Some of the neurons within these regions exhibit location-specific firing while a rat traverses a maze, occasionally independent of both movement and reward condition (Mizumori et al., 2000; Ragozzino et al., 2001; Wiener, 1993). While it has been argued that hippocampal place fields contribute to the determination of context saliency (discussed in Section 5.3.1), striatal place fields may be used to provide location-selective and context-dependent control over an animal’s movement. On the other hand, neurons that are sensitive to the egocentric movement of the animal are likely to reflect intentional movement/planning of movement toward the goal location, and neurons responsive to the goal location provide information regarding the outcome of the action/movement to the goal location (Mizumori et al., 2004; Yeshenko et al., 2004). Support for this idea has also been found in non-human primates, in which striatal neurons become engaged in processing information about learned events that have not yet occurred, suggesting that this activity is evoked by the expectation of an upcoming salient event (Schultz et al., 1997). This kind of neural activity signals not only whether an event is going to occur, but also the location of the event (Hikosaka et al., 1989), and in some cases, the direction of impending movement (Alexander and Crutcher, 1990b).

6.4.2. Response learning in the dorsal striatum

In response learning, sensory stimuli direct the behavior or motor response that will ultimately be made, for example, an arm movement or a body turn. The likelihood that any particular
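The behavioral distinction drawn above — habitual responding that survives outcome devaluation versus goal-directed responding that does not — can be illustrated with a toy computational contrast. The sketch below is purely didactic (the class, values, and learning rule are invented, not a model from the cited studies): a cached, model-free controller keeps responding after devaluation because its stored action values were learned before the outcome changed, whereas a model-based controller consults the action–outcome model and current outcome value.

```python
# Toy contrast between a cached (model-free/'habit') controller and a
# model-based ('goal-directed') controller facing outcome devaluation.
class Agent:
    def __init__(self):
        self.q_cached = {"press": 0.0, "other": 0.0}           # model-free cache
        self.model = {"press": "sucrose", "other": "nothing"}  # action -> outcome
        self.value = {"sucrose": 1.0, "nothing": 0.0}          # outcome values

    def train(self, n=100, lr=0.2):
        # Experience gradually stamps current outcome values into the cache.
        for _ in range(n):
            for a, o in self.model.items():
                self.q_cached[a] += lr * (self.value[o] - self.q_cached[a])

    def choose(self, system):
        if system == "habit":      # reads the cache laid down during training
            q = self.q_cached
        else:                      # consults the model and *current* values
            q = {a: self.value[self.model[a]] for a in self.model}
        return max(q, key=q.get)

rat = Agent()
rat.train()
rat.value["sucrose"] = -1.0   # devaluation (e.g., lithium chloride pairing)
```

After devaluation, the habit system still selects "press" (its cached value is stale), while the model-based system immediately switches away, mirroring the devaluation-insensitivity of dorsolateral-dependent habits.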


movement is made in response to a stimulus is initially influenced by the presence or absence of reward. Over time, however, reward no longer reliably influences behavior, and thus the behavior is no longer considered flexible, but habitual. The acquisition of a habit involves the gradual development of specific S–R associations (Mishkin et al., 1984; Squire et al., 1993). A habit is distinguished by the tendency to be ‘response-like’, meaning that it is triggered automatically by a particular stimulus or stimulus complex (Dickinson, 1985). If individual neurons represent stimulus–response associations, then they should exhibit two key characteristics: their activity should be modulated by the presentation of a stimulus that cues the organism to perform an action for reward, and their activity should encode some aspect of the action that the organism performs once the stimulus has been presented. This kind of activity has been well demonstrated in the dorsolateral striatum using tasks that require the subject to make a specific response movement to receive a reward as directed by an instructional cue (e.g., Barnes et al., 2011; Jog et al., 1999; Thorn et al., 2010). These kinds of results have been demonstrated in both primates and rodents, for several different kinds of task-relevant cues, including auditory and visual cues, and for many different body movements, including movements of the hand, arm/forelimb, eyes, head, and whole body (Alexander and Crutcher, 1990b; Barnes et al., 2005; Gardiner and Kitai, 1992; Hikosaka et al., 1989; Jaeger et al., 1993; Jog et al., 1999; Kimura et al., 1992; Schultz and Romo, 1988, 1992; White and Rebec, 1993).

Work by Ann Graybiel and her colleagues has identified some of the key neural mechanisms that underlie habit formation/stimulus–response learning (Barnes et al., 2005, 2011; Jog et al., 1999). Using a T-maze task, rats were overtrained to respond to the presentation of an auditory instruction cue that indicated whether the animal should turn left or right to reach the goal (i.e., food reward). Single unit recordings from the dorsolateral striatum were performed throughout the training procedure, which allowed an assessment of potential changes in neural activity as learning progressed. In addition, task-related neural activity was assessed at different areas on the maze, including the start area, the area where the tone was provided, the area where the body turn toward the goal was executed, and the goal location. Initially, neural activity was responsive to several aspects of the task, especially the point at which an animal executed the body turn toward the goal location. Over the course of learning, however, neural activity gradually shifted, so that task-related activity reflected the beginning and the end of the task. This pattern of activity remained stable over the course of several weeks, as did the behavior (Jog et al., 1999). These results suggest that there is a restructuring of neuronal responses within the sensorimotor striatum as habitual behavior develops.

6.4.3. Sequence learning in the dorsal striatum

In addition to learning which behaviors ultimately lead to reward, goal-directed behavior may require that behaviors be performed in a particular order or sequence. There is evidence that the striatum participates in the sequential organization of natural behaviors in monkeys (Van den Bercken and Cools, 1982) and rats (Berridge and Whishaw, 1992; DeCoteau and Kesner, 2000; Pellis et al., 1993). For example, the dorsal striatum has been shown to be critical for grooming sequences in rats (Aldridge and Berridge, 1998; Berridge and Whishaw, 1992). In addition, in the work discussed above, it was demonstrated that neurons within the dorsolateral striatum tend to respond to the beginning and the end of trials as training on a cued T-maze task progresses. This response may indicate that behavioral sequences are parsed into ‘chunks’ as the task is learned (e.g., Barnes et al., 2005; Boyd et al., 2009; Graybiel, 1998; Kubota et al., 2009; Thorn and Graybiel, 2010; Tremblay et al., 2009, 2010). Recent work by Yin (2010) also suggests that the dorsal striatum participates in self-initiated sequences of behaviors that lead to reward. In this study, rats were trained to press two levers in a particular sequence in order to gain access to reward. Excitotoxic lesions of the dorsolateral striatum significantly impaired the acquisition of the correct sequence, while lesions of the dorsomedial striatum had no significant effect on sequence learning. In terms of reinforcement learning algorithms, the chunking of behaviors into a coherent ‘whole’ that leads to a desired goal is formalized in hierarchical reinforcement learning models (Botvinick et al., 2009). These models are attractive for describing goal-directed behavior in complex learning situations because they may be able to more accurately describe the multiple ‘bits’ of behavior that ultimately lead to goal acquisition, blending the model-based and model-free behavioral strategies that are likely to underlie flexible goal-directed behavior. Learning to execute learned actions in a complete sequence is essential for survival and subserves many routine behaviors, including navigation.

Organizing behaviors into sequences requires precise timing and identification of the beginning and the end of a complete sequence of behaviors. Recent work has elegantly demonstrated that the ‘stop’ and ‘start’ signals that identify the beginning and end of self-initiated sequential behavior appear to be coded within the dorsal striatum (Jin and Costa, 2010). In this study, rats were trained to press a lever on a fixed-ratio schedule that required 8 lever presses to obtain sucrose reward. Over the course of training, rats gradually acquired a sequence of approximately 8 lever presses, with few responding any more or any less when the lever was active. As the rats learned the behavioral sequence necessary for obtaining reward, the activity of neurons within the dorsal striatum and the SNc appeared to reflect the initiation and termination of the self-paced action sequences. Importantly, control experiments provided evidence that these learning-related changes in neuronal activity reflected neither movement speed nor action value (Jin and Costa, 2010). Thus, these results have identified a fundamental mechanism that organizes actions into behavioral sequences, and they have important implications for complex adaptive behaviors, including goal-directed navigation.

6.5. Interactions between the dorsomedial and dorsolateral striatum

Although many behaviors that are performed on a regular basis are performed automatically, there are instances when it is necessary to alter a routine because something in the environment changes and the routine behavior is thus rendered inappropriate. The regulation of this behavioral switching can occur either retroactively, as a result of error feedback, or proactively, by detecting a change in context. A salient example that is often given for this kind of behavior is driving to work – anecdotally, many people have experienced suddenly arriving at work in their car without any specific recollection of the journey, despite being the driver of the car. This is due, in part, to a fairly static context in which we traverse the same route, and thus encounter the same traffic lights, execute the same turns, and become accustomed to the background scenery around us (buildings, street lights, trees, etc.). When, however, a significant change is encountered on our drive to work, for example an unexpected accident that is backing up traffic, we can quite quickly interrupt our behavioral routine and evaluate other available options for getting to work. Thus, when confronted with a change in context, an important decision can be made to switch from a routine behavior to an alternative behavior that will allow us to reach our goal location.

In order for habits to develop, learning needs to occur that associates a particular action with a particular outcome. As described above, this kind of association can be mediated by the


dorsomedial striatum. Once a behavior has been well learned, however, its performance appears to be mediated by the dorsolateral striatum. If these observations are true, then a question that remains is how these different subregions gain or maintain control over behavior. Recent work by Thorn et al. (2010) suggests that the dorsolateral and dorsomedial striatum undergo simultaneous changes in their neuronal activity patterns, but that these changes are unique to each structure as learning progresses. These results tie together many other pieces of data (e.g., Jog et al., 1999; Yin and Knowlton, 2004; Yin et al., 2009) and suggest a current working model in which the dorsomedial striatum regulates the evolution of behavior toward habit formation. This idea has been further tested by Yin et al. (2009), who identified region-specific changes in striatal neural activity that map onto different phases of skill learning. Electrophysiological recordings from the dorsolateral and dorsomedial striatum were performed while mice learned an accelerating rotarod task, which requires the gradual acquisition of complex movements to stay on the rotating rod. Performance on this task is characterized by rapid initial improvement on the first day of training, with performance reaching asymptotic levels after three days of training. These behavioral observations were accompanied by distinct changes in the rate of neuronal activity in the dorsomedial striatum early in training, while the dorsolateral striatum showed increased rate modulation during the extended training period. Further, when lesions of the dorsomedial striatum were made prior to training, mice were unable to acquire the skill, but this was not observed when lesions were produced after acquisition of the skill. In contrast, lesions of the dorsolateral striatum affected both early and late phases of training, suggesting that the dorsolateral and dorsomedial striatum both participate in the acquisition of the motor skill, but that once the skill is learned, the dorsomedial striatum is no longer engaged. Recordings from slices taken from the trained animals demonstrated a potential synaptic mechanism for this transition: medium spiny neurons in both the dorsomedial and dorsolateral striatum exhibited training phase-related changes in glutamatergic transmission. The slope of excitatory postsynaptic potentials, a measure of synaptic strength, was selectively elevated in the dorsomedial striatum following early training, whereas synaptic strength was elevated in the dorsolateral striatum only after extended training. Although this task differs from more traditional learning tasks (instrumental-operant or maze tasks), it likely points to a fundamental synaptic mechanism that underlies the transition from action–outcome/associative learning to well-learned habits/motor skills, irrespective of the task used.

In summary, there is emerging evidence that the striatum functions to evaluate the outcomes of behaviors in terms of an organism’s learned expectations. Through a series of interactive loops of information flow between the striatum and different cortical and subcortical structures, behavioral responses and their expected consequences become more refined and predictable.

multiple neural systems are involved, but also the adaptive features of this behavioral model depend on conditional and iterative processing loops, as well as on coordination at multiple levels of neural function (from single neurons to specific interactions between brain structures). Also contributing to the difficulty of studying complex behaviors are the dynamic ways in which the nature of the signals transmitted to efferent structures can change, both in terms of information content and in terms of whether such signals serve activating, inhibiting, or permissive roles. Moreover, much of the existing literature on the neurobiology of complex behaviors considers rate codes of neurons and, to a lesser degree, temporal codes, although this is changing in more recent studies. At a higher, more integrative level, the identity of the coordinating mechanism of orchestrated neural activity is not yet known. With regard to the latter issue, a likely possibility is that the primary determinant of the interactive and dynamic patterns that emerge may not be attributable to a single brain structure, but rather to a state, such as a motivational or emotional state.

7.1. Single cells and local network coordination

The functional orchestration of neural systems that underlie complex behaviors should be expected to involve integration within and across multiple levels of processing, from cells to local circuits to neural systems. We are only beginning to understand how such integration can happen, and studies of goal-directed navigation have begun to reveal important clues. Starting at the level of single neurons, it is known that dopamine has effects across different timescales in different brain structures, and this may define the type of coordination that is possible at any given point in time. In the hippocampus, a short-lasting effect of dopamine may be to determine the location of a place field (Martig et al., 2009), while a long-lasting effect could be to enhance the duration of the post-event period of plasticity (Huang and Kandel, 1995; Otmakhova and Lisman, 1996; Rossato et al., 2009). By prolonging periods of plasticity, dopamine activation may allow sufficient time for accurate context analysis, a process that in turn determines which memories are formed or updated. An example of how this might work can be seen when one considers place field responses during learning: place fields become sequentially associated as rats repeatedly traverse a path on their way to reward. This sequential activation of place cells was shown to repeat itself ‘off-line’ during subsequent periods of relative inactivity (e.g., Lee and Wilson, 2002; Louie and Wilson, 2001; Wilson and McNaughton, 1994). This pattern of neural ‘replay’ is consistent with many theories of memory, including the idea that optimal memory requires the reactivation of behavioral experiences, typically during periods of sleep or rest (Buzsaki, 1989; Marr, 1971; McClelland et al., 1995; Pennartz et al., 2002). Interestingly, dopamine has been shown to facilitate hippocampal ‘replay’ of sequences of place fields (Singer and Frank, 2009). Thus, dopamine may direct both cellular (e.g., place field location) and circuit level (e.g., sequential activation of place fields) neural

Ultimately, a well learned behavioral response will develop as the organization within the hippocampus. In this way, dopamine is

dorsolateral striatum assumes greater control over behavior. These necessary for synaptic plasticity within the hippocampus.

functions must ultimately be coordinated with the context The replay of temporally ordered neural activity has been

saliency function of the hippocampus so that the ‘best’ behaviors primarily studied in populations of hippocampal pyramidal cells

can be selected within the correct context or decision making that exhibit place fields (Skaggs et al., 1996; Wilson and

environment. How this coordination among different brain McNaughton, 1994), where it is assumed to underlie spatial and

structures occurs is discussed in the following section. contextual information processing. Work by Pennartz et al. (2004),

however, indicates that this kind of replay may reflect a common

7. Neural systems coordination: cellular mechanisms process that enables binding of many kinds of information. In that

study, replay of sequences of neural activity was found to also

Understanding how, and under what conditions, neural occur in the ventral striatum during periods of rest that follow

systems interact is no small feat, even with a tractable model periods of activity. Moreover, recent work suggests that reward-

such as goal-directed navigation. This is the case not only because related replay contributes a motivational component to a

122 M.R. Penner, S.J.Y. Mizumori / Progress in Neurobiology 96 (2012) 96–135

reactivated memory trace (Lansink et al., 2008). A follow-up study by the same group (Lansink et al., 2009) further demonstrated that hippocampal–striatal ensembles reactivated together during sleep. This process was especially strong in pairs in which the hippocampal cell processed spatial information and ventral striatal firing correlated with reward, suggesting a mechanism for consolidating place–reward associations.

7.2. Neural systems organization and oscillatory activity

Neural circuits have a natural tendency to oscillate across a wide range of frequencies, and such oscillations likely reflect a fundamental mechanism for coordinating neural activity across multiple brain regions (e.g., Buzsaki, 2006; Fries, 2009). Goal-directed navigation likely requires a high degree of coordination of multiple forms of information so that decisions can be made quickly. Thus, it seems reasonable to assume that a rich array of rhythmic coordination occurs as animals engage in decision processes during navigation. Oscillatory activity reflects alternating periods of synchronous and desynchronous neural firing: synchronous activity is associated with greater synaptic plasticity and stronger coupling among cells of an ensemble, while desynchronous activity is associated with periods of less plasticity and weaker signal strength (Buzsaki, 2006; Hasselmo, 2005b; Hasselmo et al., 2002).

7.2.1. Theta rhythms

Numerous laboratories have now reported that synchronous neural activity (in particular, coherence of the theta rhythm) can be detected across local neural networks both within and between brain structures such as the hippocampus, striatum, or prefrontal cortex (DeCoteau et al., 2007a; Engel et al., 2001; Fell et al., 2001; Siapas et al., 2005; Tabuchi et al., 2000; Varela et al., 2001; Womelsdorf et al., 2007). For example, hippocampal theta activity modulates the probability of neuronal firing, and theta can become synchronized with place cell firing, serving to coordinate the timing of spatial coding (Gengler et al., 2005; O'Keefe and Recce, 1993). A growing number of studies demonstrate coordinated neural activity between the hippocampus and the striatum. Theta oscillations within the striatum can become entrained to the hippocampal theta rhythm (Allers et al., 2002; Berke et al., 2004; DeCoteau et al., 2007a). Stimulating the striatum can induce hippocampal theta activity (Sabatino et al., 1985) and increases high-frequency theta power, which is thought to be important for sensorimotor integration (Hallworth and Bland, 2004). When neural activity is disrupted in the striatum via D2 receptor antagonism, striatal modulation of high-frequency hippocampal theta activity is reduced, motor and spatial/contextual information is not integrated, and task performance is impaired (Gengler et al., 2005). It appears, then, that during goal-directed navigation, hippocampal and striatal activity becomes increasingly coherent, and this pattern appears to be dopamine dependent.

Particularly intriguing is a finding common to both the hippocampus and striatum: synchronous neural activity occurs in specific task-relevant ways (e.g., Hyman et al., 2005; Jones and Wilson, 2005), and in particular during times when rats are said to be engaged in decision making (e.g., Benchenane et al., 2010). For example, striatal theta is modified over the course of learning an egocentric T-maze task, increasing as the rat chooses and initiates turn behavior (DeCoteau et al., 2007a,b). Rats that learned the task developed an antiphase relationship between hippocampal and striatal theta oscillations, while rats that did not learn the task did not show this coherent theta relationship. This coherence has also been observed during striatal-dependent classical conditioning (Kropf and Kuschinsky, 1993).

Coherent theta oscillations across distant brain structures can be enhanced with application of dopamine, at least in anesthetized rats (Benchenane et al., 2010). Assuming this is also the case in awake navigating rats, it may be that dopamine plays a crucial role in coordinating ensemble activity across brain areas within a decision-making network during navigation. Functionally, this type of control by dopamine suggests that information about the saliency of reward may determine which brain systems become synchronized (and desynchronized), and this in turn informs decisions about which information is used to update memories and which behaviors are selected.

7.2.2. Gamma rhythms

Neuronal groups are also observed to synchronize their activity at frequencies higher than the theta rhythm. In particular, it is now well established that many brain areas exhibit rhythmic neural activity in the gamma band (30–100 Hz). These include many sensory and motor areas of cortex, hippocampus, parietal cortex, and striatum (e.g., Bauer et al., 2006; Berke et al., 2004; Brosch et al., 2002; Csicsvari et al., 2003; Hoogenboom et al., 2006; Leung and Yim, 1993; Womelsdorf et al., 2006). In all cases, it is thought that the inhibitory interneuron networks within each structure play a major role in generating synchronized gamma oscillations (e.g., Bartos et al., 2007; Vida et al., 2006; Whittington et al., 1995). The functional importance of gamma oscillations remains debated. However, since gamma oscillations tend to occur intermittently (i.e., in the form of a 'gamma burst' of about 150–250 ms followed by periods of desynchronous activity), information carried by the cells that participate in a gamma burst effectively becomes a noticeable punctate signal against a background of disorganized neural activity. For this reason, it has been suggested that gamma bursts represent a fundamental mechanism by which information becomes segmented and/or filtered within a structure, as well as a way to coordinate information across structures (Buzsaki, 2006). Although theta and gamma frequencies differ considerably (perhaps reflecting the type of information that each rhythm coordinates), there are many common physiological and behavioral relationships that suggest they are components of a coordinated, larger-scale oscillatory network. For example, similar to theta rhythms, single-unit responses recorded simultaneously with gamma oscillations have been found to have specific phase relationships to the gamma rhythm (e.g., Berke, 2009; Kalenscher et al., 2010; van der Meer and Redish, 2009). Also, it is hypothesized that gamma oscillations may effectively select salient information that can come to impact decisions, learning, and behavioral responses (e.g., Kalenscher et al., 2010; van der Meer and Redish, 2009), since their appearance is often related to task-relevant events. Another similarity with the theta system is that the occurrence of gamma oscillations appears to be at least in part regulated by the dopamine system (Berke, 2009).

7.2.3. Coordination of theta and gamma rhythms

It appears that task demands dictate the nature of neural synchrony across distal brain structures, suggesting that coordination of neural activity across brain structures has at least a mnemonic component. A recent study (Fujisawa and Buzsaki, 2010) showed that such an influence may come in the form of a very low frequency (4 Hz) entrainment of local field potentials across brain areas (e.g., of the 7–12 Hz theta oscillation). In that study, a 4 Hz rhythm emerged only during phases of a maze task when rats made decisions (i.e., in the stem of a T-maze). During decision periods, the 4 Hz rhythm was phase locked to the theta oscillations in both the prefrontal cortex and VTA. Some of the individual prefrontal and VTA neurons were also phase locked to the hippocampal theta oscillation at this time. Importantly, the 4 Hz rhythm was present only during a decision-making period when theta oscillations were also present. The findings of the Fujisawa

and Buzsaki (2010) study suggest that a 4 Hz rhythm may coordinate activity in distal brain structures specifically as animals make decisions during goal-directed navigation. It remains to be seen whether dopamine selectively activates the 4 Hz rhythm when decisions need to be made.

8. Neural systems coordination: decisions and common foraging behaviors

Successful decisions during goal-directed navigation likely depend on a hierarchy of systems- and cellular-level interactions in the brain. The accompanying video (http://depts.washington.edu/mizlab) demonstrates, on a basic level, the relative involvement of the hippocampus, the dopamine system, and the ventral and dorsal (medial and lateral) striatum during a simple food search task on a laboratory maze. Particular attention is paid to the relative contributions of these brain areas during each of the five 'states' of processing in Fig. 3, and as a function of novel exposure, new learning, and asymptotic performance levels.

To illustrate in more detail the functional interactions of the same brain regions during common foraging scenarios, the following are neural and behavioral explanations for how animals make adaptive choices while navigating familiar environments, how decisions are adjusted when familiar conditions change, and how this same circuitry mediates rapid and adaptive learning when animals find themselves in novel situations.

8.1. Goal-directed navigation in a familiar context

There is clearly a home court advantage when it comes to an animal's survival. If animals are familiar with their environment, they are more likely to make good choices when deciding when and where to secure food, safe shelter, and mates. This is the case not only because animals have learned the physical characteristics of the environment, but perhaps more importantly because they have learned to identify its salient features. These salient features have taken on predictive value based on the expected probability of reward given certain levels of effort. This information can be used to make choices that are appropriate for different motivational and behavioral states. Under constant conditions, obtaining a predicted outcome should result in the strengthening of the memories that were used to guide decisions and behavioral choices in the first place.

It is postulated that the motivational state of an animal predisposes it to pay attention to specific cues within a familiar environment, cues that have been previously associated with goal acquisition. In this way, the memories of past behavioral outcomes of, for example, a hungry rat define the appropriate behavioral responses needed to obtain maximum amounts of food with minimal effort or temporal delay. Based on the extensive literature summarized previously, it seems reasonable to assume that when a rat enters a familiar environment in search of food, its translational movement generates (movement-sensitive) theta rhythms in hippocampal regions, resulting in the activation of a spatial coordinate system that in turn imposes an experience-determined spatial organization on the information used during the current event. The clearest neural instantiation of such an organization (often referred to as a spatial reference frame, map, or chart) is represented by the grid cells of medial entorhinal cortex. While there remain unresolved issues about how such a reference system actually works (e.g., does a given 'map' reset during a single navigational event, and if so, how and under what conditions?), the current view is that learned spatial and nonspatial information arrive in hippocampus via the medial and lateral entorhinal cortices, respectively. Upon entering a familiar environment, the medial entorhinal spatial reference includes not only a representation of the current spatial structure of an environment, but also an experience-dependent definition of a rat's expectations for the sensory environment that is, itself, influenced by the appropriate behavioral repertoire and by expectations about the consequences of decisions and choices. Lateral entorhinal cortex is presumably also activated by current (but in this case nonspatial) sensory input, as well as by the same set of expectations (i.e., memories) that influence medial entorhinal cortical processing. With the combined input from medial and lateral entorhinal cortex, the hippocampus can determine the extent to which the rat's (spatial and nonspatial) expectations for the current context are met.

When goals are achieved as predicted (e.g., food is found in expected locations), hippocampal output may have the effect of strengthening currently active memory circuits, thereby increasing the likelihood that the same decisions and behaviors will be selected the next time the rat is in the same familiar situation. The signal strength to ventral striatum would be expected to be moderate, resulting in ventral striatal output that maintains a baseline level of inhibitory control over VTA neural responses to reward encounters. That is, when rats encounter rewards in expected locations, there should be no VTA response to the reward encounter itself. If an animal finds itself engaging in rather stereotyped or habitual behaviors in the familiar environment, it is likely that the dorsolateral striatum exerts more control over behavior than ventral striatum, since dorsolateral striatum is particularly involved in the performance of habitual behaviors (e.g., Atallah et al., 2007; Jog et al., 1999; Thorn et al., 2010; Yin and Knowlton, 2004; Yin et al., 2009, as discussed above).

VTA dopamine neurons are known to increase firing when an animal encounters cues that predict reward (in familiar test conditions; e.g., Puryear et al., 2010; Schultz et al., 1997). These cue-elicited responses may arrive from the frontal cortex, as there is little evidence of predictive cue processing in at least two other major VTA afferent structures (e.g., the PPTg and LDTg). Thus, during navigation in a familiar environment, both frontal cortex and hippocampus may determine the timing of dopamine cells' contribution to reward processing. Although the details of the underlying neurocircuitry are presently not clear, this pattern of dopamine cell firing to cues and rewards results in the maintenance of the currently active memory networks.

8.2. Goal-directed navigation in a familiar context following a significant change in context

The natural environment is a continuously changing one. Thus, even when a rat navigates a familiar environment, the hippocampus should automatically and continuously evaluate the saliency of the current context. In that way, when a rat encounters a change in the expected matrix of context information, hippocampal output can immediately reflect the detected change to assess the need to change decisions and behaviors. Note that since a given context is comprised of multiple features, a detectable change in any one feature should result in a signal that the context is different. The impact of detecting a context change on subsequent behaviors depends on the processing within efferent target structures.

When an unexpected behavioral outcome or stimulus configuration occurs in a familiar environment, rats increase exploratory activity and attention to potential cues. The latter would be expected to result from the reorganization of spatial representations (e.g., grid and place cells) in hippocampal systems. The hippocampal reorganization would in turn generate an output that reflects the context change. In anticipation of the receipt of new information, striatal theories (e.g., Belin and Everitt, 2008; Humphries and Prescott, 2010; Salamone et al., 2009) suggest that when there is a significant change in a familiar environment,

ventral striatum may come to play a greater role in behavioral control than dorsal striatum. According to the circuitry presented in Fig. 6, hippocampal output to the ventral striatum can potentially activate two pathways of information flow to the VTA. In a scenario described by Humphries and Prescott (2010), the ventral striatum relays information about reward expectations via a direct inhibitory pathway to the VTA, and information about the actual rewards via an indirect excitatory pathway (through the ventral pallidum and the PPTg) to the VTA. When the actual rewards occur as expected, there is comparable inhibitory and excitatory control over dopamine cell responses to reward. This balanced pattern of input results in no response to rewards by dopamine neurons; indeed, dopamine cells do not respond to the acquisition of expected rewards. If, however, the actual reward is greater than expected, the excitatory drive should exceed the inhibitory one, resulting in increased firing to reward by dopamine cells. Perhaps the increased excitatory ventral striatal input transitions dopamine cell membranes to a relatively depolarized state. On the other hand, if the actual reward is less than expected, the inhibitory drive becomes greater than the excitatory one, and this is manifest as reduced firing at the time of expected rewards. Either of these altered dopamine responses to reward has been interpreted as a 'teaching signal' for other neural systems (Schultz and Dickinson, 2000).

The outcome of a striatal/VTA evaluation of the reinforcement outcomes of context-dependent behaviors is likely used by striatal efferent systems to modify decisions about which behaviors to engage and which memories to modify. As memories become updated, so do the expectations for a given spatial context. Assuming that the expected spatial context input to hippocampus is continuously refreshed, the context discrimination can always proceed with the most recent information from neocortex.

8.3. Goal-directed navigation in a novel context

Recent evidence suggests that rats, at least, have an innate, though initially rudimentary, spatial navigation-related neural network that continues to develop over time (Langston et al., 2010; Wills et al., 2010). While the directional heading circuitry appears adult-like from a very young age, the grid and location systems take more time to develop. As experiences accumulate over a lifetime, then, so might the efficiency of a context-dependent navigation circuit. Learning is faster when the outcomes of behaviors are predictable, and predictability can be enhanced

[Fig. 10 schematic: theta and gamma waves linking the hippocampus, dopamine system (VTA/SN), ventral striatum, and dorsal striatum (medial and lateral); labeled components include context salience, reward salience, expectations, behavioral state, a teaching signal, a critic (model-free), and an actor (model-based), converging on goal-directed navigation.]

Fig. 10. Orchestration of neural systems while animals make decisions during goal-directed navigation. Accurate goal-directed navigation requires precise integration of multiple types of information (e.g., context salience, reward salience, expectations (based on memories), and one's behavioral state). Based on the current literature, it is clear that all of these types of information are represented in some way within different neural systems. For illustration purposes, only the hippocampus, dopamine system, and ventral and dorsal striatum are shown. Thus, the nature of the information represented does not clearly reveal the unique contributions of any one of these neural systems to goal-directed navigation. Rather, the specialized contributions of different neural systems must be defined by their computational capacities (i.e., their intrinsic patterns of neural connectivity) and the particular efferent structures that receive their output messages. Converging evidence supports the view that hippocampal output reflects an evaluation of the salience of the current context; dopamine cells signal changes in expected reward values (and in doing so serve as a 'teaching signal' that updates processing in efferent structures); the ventral striatum determines whether the outcomes of behavior were predicted; and the dorsal striatum selects the appropriate behavior based on the ventral striatal analysis. Especially during new learning, the dorsomedial striatum plays this 'actor' role for model-based learning. As learning and performance become model free, the dorsolateral striatum serves the 'actor' role. These neural systems do not necessarily function independently. Rather, emerging findings show that, depending on specific task demands, neural activity may become synchronized across combinations of two or three brain structures at theta and gamma rhythm frequencies. Importantly, the synchronization appears to happen at times when decisions should be made. This suggests that there may be some overarching factor that determines when systems interactions will occur. One possibility is that a very low frequency oscillation (4 Hz) coordinates the theta and gamma coherence that has been observed between neural systems (Fujisawa and Buzsaki, 2010). Since general physiological states are known to alter patterns of neural representations during learning (e.g., Kennedy and Shapiro, 2004), it is suggested here that physiological states, such as hunger, fear, and stress, may determine the kind of neural systems orchestration that needs to take place in order for animals to make optimal decisions relative to the achievement of specific kinds of goals.
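The critic/actor division summarized in Fig. 10 corresponds closely to actor-critic algorithms in computational reinforcement learning. The following is a minimal sketch, not a model taken from this review, and all state, action, and parameter names are illustrative: the temporal-difference error plays the role ascribed here to the dopamine teaching signal (positive when reward exceeds the prediction, negative when it falls short), the state-value table plays the role of the ventral striatal 'critic' that judges whether outcomes were predicted, and the action-preference table plays the role of the dorsal striatal 'actor' that selects behavior.

```python
import math
import random

# Toy corridor task: states 0..4; reward is delivered only at state 4 (the goal).
# All names and constants below are illustrative choices, not taken from the review.
N_STATES, GOAL = 5, 4
ACTIONS = (-1, +1)                    # step "left" or "right"
ALPHA, BETA, GAMMA = 0.1, 0.1, 0.9    # critic rate, actor rate, discount factor

V = [0.0] * N_STATES                          # critic: predicted value of each state
pref = [[0.0, 0.0] for _ in range(N_STATES)]  # actor: preference for each action

def choose(s):
    """Softmax choice over the two action preferences in state s."""
    w = [math.exp(p) for p in pref[s]]
    return 0 if random.random() < w[0] / (w[0] + w[1]) else 1

random.seed(0)
for _ in range(500):                  # repeated runs down the corridor
    s = 0
    while s != GOAL:
        a = choose(s)
        s2 = min(max(s + ACTIONS[a], 0), N_STATES - 1)
        r = 1.0 if s2 == GOAL else 0.0
        # Temporal-difference error: the dopamine-like teaching signal,
        # positive for better-than-predicted outcomes, negative for worse.
        delta = r + GAMMA * (0.0 if s2 == GOAL else V[s2]) - V[s]
        V[s] += ALPHA * delta         # critic update: refine outcome predictions
        pref[s][a] += BETA * delta    # actor update: bias behavior selection
        s = s2
```

After a few hundred simulated runs, the prediction error at the expected reward shrinks toward zero, paralleling the observation that dopamine cells stop responding to fully predicted rewards, while the actor comes to prefer the rewarded direction in every state.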

even in a novel environment, if the significance of at least a subset of contextual features can be inferred from past experiences with similar features.

Novelty coding by the navigational circuit is typically tested with rats that have been trained to forage for food in one environment, then placed in a new testing environment but asked to perform the same behaviors (e.g., to search for randomly placed food in a novel open arena). Thus, the task rules and motor instructions for the novel context are previously learned, but there is novelty in terms of the cues that are present to inform goal-directed choices. The familiar features (e.g., the narrow alleys of a maze, an enclosed testing area with cues, and the fact that rewards can be found on such mazes) should immediately activate a 'best match' reference frame that can be used to guide initial exploration and goal-directed decisions. As consequences of choices occur and learning takes place, the difference between the expected and actual context will diminish, and the relevant memory and reference frame will be updated accordingly. At the point when the expected contexts and behavioral outcomes match what actually occurs, one can conclude that learning is complete. This new learning process may use neural circuitry similar to that described above, by which information about changes in an expected context updates memories. In this way, behaviors that increase cue predictability and reduce unexpected outcomes will become associated with specific cues.

If a rat with no testing experience is placed in an experimental arena for the first time, the rat may still bring to bear a 'best match' option, or some minimal form of spatial reference, within which to incorporate new information into the memories being created during learning. For example, the rat may have learned the identity of a safe new food, but now needs to learn the rules that lead to the most efficient, cost-effective strategy for securing the food. Compared to a foraging situation in which there are slight changes in a familiar context, it should take more trials or more time to reach the point when the expectations match the actual outcomes (i.e., when learning is complete).

9. The challenges ahead

A big challenge facing the general field of neuroscience is to understand the dynamic neural mechanisms that underlie complex and adaptive natural behaviors. A first step toward addressing this challenge could be to integrate existing literatures on specific components of the adaptive behavior of interest, such as the context processing and decision-making that occur during goal-directed navigation. In addition, new findings indicate that decision making during navigation is a powerful model not only for defining neural and behavioral states that are relevant to this behavior, but also for understanding how these states switch processing modes during natural learning situations. The identification of such 'switching mechanisms' is important for our understanding of what leads to decisions to 'stay the course' or change behaviors. It is proposed that the motivational state of the animal establishes the intended goals, and as such sets the thresholds for, and constraints on, neural activation across multiple brain structures. A summary of key elements of this model is shown in Fig. 10.

An explanation of the neurobiological mechanisms that support decisions during goal-directed navigation will undoubtedly become more complex. This is the case not only because of technological advances in our ability to probe brain function, but also because of the following:

(a) There are other important contributing factors that were not discussed here. Examples include the possible roles of serotonin, acetylcholine, enkephalins, A2A receptors, and GABA in reinforcement learning (e.g., Doya, 2008; Farrar et al., 2007, 2008, 2010; Font et al., 2008; Mingote et al., 2008a,b; Miyazaki et al., 2011; Mott et al., 2009; Ragozzino, 2003; Ragozzino et al., 2009; Worden et al., 2009).

(b) There are many unanswered questions regarding the role of dopamine in decision making and learning. For example, does dopamine have the same impact on synaptic and behavioral functions in all brain regions that receive dopamine inputs? The answer is likely yes and no. Dopamine appears to facilitate excitation in efferent structures, although the details and temporal dynamics may vary. Even if the degree of excitability were the same in different brain areas, the impact on behavior will likely be different, since different structures (e.g., hippocampus and striatum) engage unique intrinsic computational architectures to process similar information (e.g., spatial, movement, and reward). Another critical issue whose resolution will impact future theoretical explanations of decision making during navigation is the regulation and meaning of tonic release of dopamine. For instance, tonic levels of dopamine may contribute to defining the overall motivation or goals during navigation (e.g., Niv et al., 2007).

(c) When recording in navigating animals, it is clear that against a foreground of interesting task-relevant firing is a background of neural codes for the egocentric movements exhibited by the animal. The meaning of this seemingly universal coding of egocentric information remains elusive. An intriguing possibility is that such codes guide specific task-relevant codes in a manner analogous to the way that intended movements appear to bias sensory responses by cortical neurons (e.g., Colby, 1998; Colby and Goldberg, 1999). Interestingly, the movement-related cells are often interpreted as reflecting the firing patterns of inhibitory interneurons, the specific functions of which are only beginning to be appreciated.

The existence of many unresolved issues should not deter continued and intensive investigation of the adaptive navigation-based heuristic for complex learning situations. Rather, because it is evolutionarily highly conserved, this model holds great promise for continuing to reveal fundamental organizing principles within and across neural systems, as well as between neural systems functions and behavior.

Acknowledgements

We thank Yong Sang Jo for helpful comments on earlier versions of this manuscript and for producing all of the figures, Trevor Bortins for producing the video linked to the article, Daniela Jaramillo for help managing references, Drs. Jeremy Clark and Andrea Stocco for insightful discussion regarding the striatum, and Dr. Van Redila for comments on an earlier version. We also thank anonymous reviewers for their comments. This work is funded by NIMH grant MH58755.

References

Aberman, J.E., Salamone, J.D., 1999. Nucleus accumbens dopamine depletions make rats more sensitive to high ratio requirements but do not impair primary food reinforcement. Neuroscience 92, 545–552.
Aberman, J.E., Ward, S.J., Salamone, J.D., 1998. Effects of dopamine antagonists and accumbens dopamine depletions on time-constrained progressive-ratio performance. Pharmacol. Biochem. Behav. 61, 341–348.
Albin, R.L., Young, A.B., Penney, J.B., 1989. The functional anatomy of basal ganglia disorders. Trends Neurosci. 12, 366–375.
Alderson, H.L., Latimer, M.P., Winn, P., 2008. A functional dissociation of the anterior and posterior pedunculopontine tegmental nucleus: excitotoxic lesions have differential effects on locomotion and the response to nicotine. Brain Struct. Funct. 213, 247–253.
Aldridge, J.W., Berridge, K.C., 1998. Coding of serial order by neostriatal neurons: a "natural action" approach to movement sequence. J. Neurosci. 18, 2777–2787.


cortex disrupt behavioral and neuronal responses to context change during Groenewegen, H.J., Vermeulen-Van der Zee, E., te Kortschot, A., Witter, M.P., 1987.

extinction of discriminative avoidance behavior. Exp. Brain Res. 115, 445–457. Organization of the projections from the subiculum to the ventral striatum in

French, S.J., Totterdell, S., 2002. Hippocampal and prefrontal cortical inputs mono- the rat. A study using anterograde transport of Phaseolus vulgaris leucoagglu-

synaptically converge with individual projection neurons of the nucleus accum- tinin. Neuroscience 23, 103–120.

bens. J. Comp. Neurol. 446, 151–165. Groenewegen, H.J., Wright, C.I., Beijer, A.V., Voorn, P., 1999b. Convergence and

Frey, U., Matthies, H., Reymann, K.G., 1991. The effect of dopaminergic D1 receptor segregation of ventral striatal inputs and outputs. Ann. N. Y. Acad. Sci. 877, 49–

blockade during tetanization on the expression of long-term potentiation in the 63.

rat CA1 region in vitro. Neurosci. Lett. 129, 111–114. Guthrie, E.R., 1935. The Psychology of Learning. Harper, New York.

Frey, U., Morris, R.G., 1997. Synaptic tagging and long-term potentiation. Nature Guzowski, J.F., Knierim, J.J., Moser, E.I., 2004. Ensemble dynamics of hippocampal

385, 533–536. regions CA3 and CA1. Neuron 44, 581–584.

M.R. Penner, S.J.Y. Mizumori / Progress in Neurobiology 96 (2012) 96–135 129

Haber, S.N., 2003. The primate basal ganglia: parallel and integrative networks. J. Holmes, N.M., Marchand, A.R., Coutureau, E., 2010. Pavlovian to instrumental

Chem. Neuroanat. 26, 317–330. transfer: a neurobehavioural perspective. Neurosci. Biobehav. Rev. 34, 1277–

Haber, S.N., Fudge, J.L., McFarland, N.R., 2000. Striatonigrostriatal pathways in 1295.

primates form an ascending spiral from the shell to the dorsolateral striatum. Honzik, C.H., 1933. Maze learning in rats on the absence of specific intra- and extra-

J. Neurosci. 20, 2369–2382. maze stimuli. Psychol. Bull. 30, 589–590.

Hafting, T., Fyhn, M., Molden, S., Moser, M.B., Moser, E.I., 2005. Microstructure of a Hoogenboom, N., Schoffelen, J.M., Oostenveld, R., Parkes, L.M., Fries, P., 2006.

spatial map in the entorhinal cortex. Nature 436, 801–806. Localizing human visual gamma-band activity in frequency, time and space.

Hall, J., Parkinson, J.A., Connor, T.M., Dickinson, A., Everitt, B.J., 2001. Involvement of Neuroimage 29, 764–773.

the central nucleus of the amygdala and nucleus accumbens core in mediating Horvitz, J.C., 2002. Dopamine gating of glutamatergic sensorimotor and incentive

Pavlovian influences on instrumental behaviour. Eur. J. Neurosci. 13, 1984– motivational input signals to the striatum. Behav. Brain Res. 137, 65–74.

1992. Horvitz, J.C., Stewart, T., Jacobs, B.L., 1997. Burst activity of ventral tegmental

Hallworth, N.E., Bland, B.H., 2004. Basal ganglia–hippocampal interactions support dopamine neurons is elicited by sensory stimuli in the awake cat. Brain Res.

the role of the hippocampal formation in sensorimotor integration. Exp. Neurol. 759, 251–258.

188, 430–443. Houk, J.C., 1995. Information processing in modular circuits linking basal ganglia

Hamilton, D.A., Driscoll, I., Sutherland, R.J., 2002. Human place learning in a virtual and cerebral cortex. In: Houk, J.C., Davis, J.L., Beiser, D.G. (Eds.), Models of

Morris water task: some important constraints on the flexibility of place Information Processing in the Basal Ganglia. MIT Press, Cambridge.

navigation. Behav. Brain Res. 129, 159–170. Houk, J.C., Davis, J.L., Beiser, D.G., 1995. Models of Information Processing in the

Hammond, L.J., 1980. The effect of contingency upon the appetitive conditioning of Basal Ganglia. MIT Press, Cambridge, MA.

free-operant behavior. J. Exp. Anal. Behav. 34, 297–304. Huang, Y.Y., Kandel, E.R., 1995. D1/D5 receptor agonists induce a protein synthesis-

Hampson, R.E., Heyser, C.J., Deadwyler, S.A., 1993. Hippocampal cell firing correlates dependent late potentiation in the CA1 region of the hippocampus. Proc. Natl.

of delayed-match-to-sample performance in the rat. Behav. Neurosci. 107, 715– Acad. Sci. U.S.A. 92, 2446–2450.

739. Hull, C.L., 1932. The goal gradient hypothesis and maze learning. Psychol. Rev. 39,

Hargreaves, E.L., Yoganarasimha, D., Knierim, J.J., 2007. Cohesiveness of spatial and 25–43.

directional representations recorded from neural ensembles in the anterior Hull, C.L., 1943. Principles of Behavior. Appleton-Century Crofts, New York.

thalamus, parasubiculum, medial entorhinal cortex, and hippocampus. Hippo- Humphries, M.D., Prescott, T.J., 2010. The ventral basal ganglia, a selection mecha-

campus 17, 826–841. nism at the crossroads of space, strategy, and reward. Prog. Neurobiol. 90, 385–

Haruno, M., Kawato, M., 2006. Heterarchical reinforcement-learning model for 417.

integration of multiple cortico-striatal loops: fMRI examination in stimulus– Hunsaker, M.R., Mooy, G.G., Swift, J.S., Kesner, R.P., 2007. Dissociations of the medial

action–reward association learning. Neural Netw. 19, 1242–1254. and lateral projections into dorsal DG, CA3, and CA1 for spatial

Hassani, O.K., Cromwell, H.C., Schultz, W., 2001. Influence of expectation of different and nonspatial (visual object) information processing. Behav. Neurosci. 121,

rewards on behavior-related neuronal activity in the striatum. J. Neurophysiol. 742–750.

85, 2477–2489. Hyman, J.M., Zilli, E.A., Paley, A.M., Hasselmo, M.E., 2005. Medial prefrontal cortex

Hasselmo, M.E., 2005a. The role of hippocampal regions CA3 and CA1 in matching cells show dynamic modulation with the hippocampal theta rhythm dependent

entorhinal input with retrieval of associations between objects and context: on behavior. Hippocampus 15, 739–749.

theoretical comment on Lee et al. (2005). Behav. Neurosci. 119, 342–345. Ikemoto, S., 2007. Dopamine reward circuitry: two projection systems from the

Hasselmo, M.E., 2005b. What is the function of hippocampal theta rhythm? Linking ventral midbrain to the nucleus accumbens- complex. Brain

behavioral data to phasic properties of field potential and unit recording data. Res. Rev. 56, 27–78.

Hippocampus 15, 936–949. Ikemoto, S., Panksepp, J., 1999. The role of nucleus accumbens dopamine in

Hasselmo, M.E., Hay, J., Ilyn, M., Gorchetchnikov, A., 2002. Neuromodulation, theta motivated behavior: a unifying interpretation with special reference to re-

rhythm and rat spatial navigation. Neural Netw. 15, 689–707. ward-seeking. Brain Res. Brain Res. Rev. 31, 6–41.

Hasselmo, M.E., McGaughy, J., 2004. High acetylcholine levels set circuit dynamics Ito, R., Robbins, T.W., Pennartz, C.M., Everitt, B.J., 2008. Functional interaction

for attention and encoding and low acetylcholine levels set dynamics for between the hippocampus and nucleus accumbens shell is necessary for the

consolidation. Prog. Brain Res. 145, 207–231. acquisition of appetitive spatial context conditioning. J. Neurosci. 28, 6950–

Hauber, W., Sommer, S., 2009. Prefrontostriatal circuitry regulates effort-related 6959.

decision making. Cereb. Cortex 19, 2240–2247. Izquierdo, I., Bevilaqua, L.R., Rossato, J.I., Bonini, J.S., Da Silva, W.C., Medina, J.H.,

Hawkes, K., Hill, K., O’Connell, J., 1982. Why hunters gather—optimal foraging and Cammarota, M., 2006. The connection between the hippocampal and the

the ache of eastern Paraguay. Am. Ethnol. 9, 379–398. striatal memory systems of the brain: a review of recent findings. Neurotox.

Hebb, D.O., 1949. The Organization of Behavior: A Neuropsychological Theory. John Res. 10, 113–121.

Wiley and Sons. Jackson, J., Redish, A.D., 2007. Network dynamics of hippocampal cell-assemblies

Heimer, L., Zahm, D.S., Churchill, L., Kalivas, P.W., Wohltmann, C., 1991. Specificity resemble multiple spatial maps within single tasks. Hippocampus 17, 1209–

in the projection patterns of accumbal core and shell in the rat. Neuroscience 41, 1229.

89–125. Jaeger, D., Gilman, S., Aldridge, J.W., 1993. Primate basal ganglia activity in

Henriksen, E.J., Colgin, L.L., Barnes, C.A., Witter, M.P., Moser, M.B., Moser, E.I., 2010. a precued reaching task: preparation for movement. Exp. Brain Res. 95,

Spatial representation along the proximodistal axis of CA1. Neuron 68, 127– 51–64.

137. Jay, T.M., Glowinski, J., Thierry, A.M., 1989. Selectivity of the hippocampal

Herkenham, M., Nauta, W.J., 1979. Efferent connections of the habenular nuclei in projection to the prelimbic area of the prefrontal cortex in the rat. Brain

the rat. J. Comp. Neurol. 187, 19–47. Res. 505, 337–340.

Hetherington, P.A., Shapiro, M.L., 1997. Hippocampal place fields are altered by the Jeffery, K.J., Anderson, M.I., Hayman, R., Chakraborty, S., 2004. A proposed architec-

removal of single visual cues in a distance-dependent manner. Behav. Neurosci. ture for the neural representation of spatial context. Neurosci. Biobehav. Rev.

111, 20–34. 28, 201–218.

Hikosaka, O., Bromberg-Martin, E., Hong, S., Matsumoto, M., 2008. New insights on Jeffery, K.J., Gilbert, A., Burton, S., Strudwick, A., 2003. Preserved performance in a

the subcortical representation of reward. Curr. Opin. Neurobiol. 18, 203–208. hippocampal-dependent spatial task despite complete place cell remapping.

Hikosaka, O., Nakahara, H., Rand, M.K., Sakai, K., Lu, X., Nakamura, K., Miyachi, S., Hippocampus 13, 175–189.

Doya, K., 1999. Parallel neural networks for learning sequential procedures. Jenkins, H.M., Moore, B.R., 1973. The form of the auto-shaped response with food or

Trends Neurosci. 22, 464–471. water reinforcers. J. Exp. Anal. Behav. 20, 163–181.

Hikosaka, O., Nakamura, K., Nakahara, H., 2006. Basal ganglia orient eyes to reward. Jensen, O., Lisman, J.E., 1996. Hippocampal CA3 region predicts memory sequences:

J. Neurophysiol. 95, 567–584. accounting for the phase precession of place cells. Learn. Mem. 3, 279–287.

Hikosaka, O., Sakamoto, M., Usui, S., 1989. Functional properties of monkey caudate Jin, X., Costa, R.M., 2010. Start/stop signals emerge in nigrostriatal circuits during

neurons. III. Activities related to expectation of target and reward. J. Neuro- sequence learning. Nature 466, 457–462.

physiol. 61, 814–832. Jo, Y.S., Lee, I., 2010. Disconnection of the hippocampal–perirhinal cortical circuits

Hill, A.J., 1978. First occurrence of hippocampal spatial firing in a new environment. severely disrupts object–place paired associative memory. J. Neurosci. 30,

Exp. Neurol. 62, 282–297. 9850–9858.

Hill, A.J., Best, P.J., 1981. Effects of deafness and blindness on the spatial correlates of Joel, D., Niv, Y., Ruppin, E., 2002. Actor–critic models of basal ganglia function: new

hippocampal unit activity in the rat. Exp. Neurol. 74, 204–217. anatomical and computational perpectives. Neural Netw. 15, 535–547.

Hirsh, R., 1974. The hippocampus and contextual retrieval of information from Joel, D., Weiner, I., 1994. The organization of the basal ganglia–thalamocortical

memory: a theory. Behav. Biol. 12, 421–444. circuits: open interconnected rather than closed segregated. Neuroscience 63,

Hoge, J., Kesner, R.P., 2007. Role of CA3 and CA1 subregions of the dorsal hippo- 363–379.

campus on temporal processing of objects. Neurobiol. Learn. Mem. 88, 225–231. Joel, D., Weiner, I., 2000. The connections of the dopaminergic system with the

Hollerman, J.R., Schultz, W., 1998. Dopamine neurons report an error in the striatum in rats and primates: an analysis with respect to the functional and

temporal prediction of reward during learning. Nat. Neurosci. 1, 304–309. compartmental organization of the striatum. Neuroscience 96, 451–474.

Hollerman, J.R., Tremblay, L., Schultz, W., 1998. Influence of reward expectation on Jog, M.S., Kubota, Y., Connolly, C.I., Hillegaart, V., Graybiel, A.M., 1999. Building

behavior-related neuronal activity in primate striatum. J. Neurophysiol. 80, neural representations of habits. Science 286, 1745–1749.

947–963. Johnson, A., van der Meer, M.A., Redish, A.D., 2007. Integrating hippocampus and

Hollup, S.A., Kjelstrup, K.G., Hoff, J., Moser, M.B., Moser, E.I., 2001. Impaired striatum in decision-making. Curr. Opin. Neurobiol. 17, 692–697.

recognition of the goal location during spatial navigation in rats with hippo- Jones, M.W., Wilson, M.A., 2005. Theta rhythms coordinate hippocampal–prefrontal

campal lesions. J. Neurosci. 21, 4505–4513. interactions in a spatial memory task. PLoS Biol. 3, e402.

130 M.R. Penner, S.J.Y. Mizumori / Progress in Neurobiology 96 (2012) 96–135

Jongen-Relo, A.L., Voorn, P., Groenewegen, H.J., 1994. Immunohistochemical char- Lansink, C.S., Goltstein, P.M., Lankelma, J.V., Joosten, R.N., McNaughton, B.L., Pen-

acterization of the shell and core territories of the nucleus accumbens in the rat. nartz, C.M., 2008. Preferential reactivation of motivationally relevant informa-

Eur. J. Neurosci. 6, 1255–1264. tion in the ventral striatum. J. Neurosci. 28, 6372–6382.

Joshua, M., Adler, A., Mitelman, R., Vaadia, E., Bergman, H., 2008. Midbrain dopa- Lansink, C.S., Goltstein, P.M., Lankelma, J.V., McNaughton, B.L., Pennartz, C.M., 2009.

minergic neurons and striatal cholinergic interneurons encode the difference Hippocampus leads ventral striatum in replay of place-reward information.

between reward and aversive events at different epochs of probabilistic classi- PLoS Biol. 7, e1000173.

cal conditioning trials. J. Neurosci. 28, 11673–11684. Lavoie, A.M., Mizumori, S.J., 1994. Spatial, movement- and reward-sensitive dis-

Jung, M.W., Wiener, S.I., McNaughton, B.L., 1994. Comparison of spatial firing charge by medial ventral striatum neurons of rats. Brain Res. 638, 157–168.

characteristics of units in dorsal and ventral hippocampus of the rat. J. Neurosci. Lee, A.K., Wilson, M.A., 2002. Memory of sequential experience in the hippocampus

14, 7347–7356. during slow wave sleep. Neuron 36, 1183–1194.

Kalenscher, T., Lansink, C.S., Lankelma, J.V., Pennartz, C.M., 2010. Reward-associated Lee, I., Knierim, J.J., 2007. The relationship between the field-shifting phenomenon

gamma oscillations in ventral striatum are regionally differentiated and mod- and representational coherence of place cells in CA1 and CA3 in a cue-altered

ulate local firing activity. J. Neurophysiol. 103, 1658–1672. environment. Learn. Mem. 14, 807–815.

Karni, A., Meyer, G., Rey-Hipolito, C., Jezzard, P., Adams, M.M., Turner, R., Ungerlei- Lee, I., Yoganarasimha, D., Rao, G., Knierim, J.J., 2004. Comparison of population

der, L.G., 1998. The acquisition of skilled motor performance: fast and slow coherence of place cells in hippocampal subfields CA1 and CA3. Nature 430,

experience-driven changes in primary motor cortex. Proc. Natl. Acad. Sci. U.S.A. 456–459.

95, 861–868. Lemon, N., Manahan-Vaughan, D., 2006. Dopamine D1/D5 receptors gate the

Kelemen, E., Fenton, A.A., 2010. Dynamic grouping of hippocampal neural activity acquisition of novel information through hippocampal long-term potentiation

during cognitive control of two spatial frames. PLoS 8, e1000403. and long-term depression. J. Neurosci. 26, 7723–7729.

Kelley, A.E., 2004. Ventral striatal control of appetitive motivation: role in Lenck-Santini, P.P., Muller, R.U., Save, E., Poucet, B., 2002. Relationships between

ingestive behavior and reward-related learning. Neurosci. Biobehav. Rev. place cell firing fields and navigational decisions by rats. J. Neurosci. 22, 9035–

27, 765–776. 9047.

Kennedy, P.J., Shapiro, M.L., 2004. Retrieving memories via internal context requires Lenck-Santini, P.P., Save, E., Poucet, B., 2001. Evidence for a relationship between

the hippocampus. J. Neurosci. 24, 6979–6985. place-cell spatial firing and spatial memory performance. Hippocampus 11,

Kentros, C.G., Agnihotri, N.T., Streater, S., Hawkins, R.D., Kandel, E.R., 2004. Increased 377–390.

attention to spatial context increases both place field stability and spatial Leung, L.S., Yim, C.Y., 1993. Rhythmic delta-frequency activities in the nucleus

memory. Neuron 42, 283–295. accumbens of anesthetized and freely moving rats. Can. J. Physiol. Pharmacol.

Kentros, C., Hargreaves, E., Hawkins, R.D., Kandel, E.R., Shapiro, M., Muller, R.V., 71, 311–320.

1998. Abolition of long-term stability of new hippocampal place cell maps by Leutgeb, J.K., Leutgeb, S., Moser, M.B., Moser, E.I., 2007. Pattern separation in the

NMDA receptor blockade. Science 280, 2121–2126. dentate gyrus and CA3 of the hippocampus. Science 315, 961–966.

Kesner, R.P., 2007. Behavioral functions of the CA3 subregion of the hippocampus. Leutgeb, S., Leutgeb, J.K., Treves, A., Moser, M.B., Moser, E.I., 2004. Distinct ensemble

Learn. Mem. 14, 771–781. codes in hippocampal areas CA3 and CA1. Science 305, 1295–1298.

Kesner, R.P., Lee, I., Gilbert, P., 2004. A behavioral assessment of hippocampal Lever, C., Burton, S., Jeewajee, A., O’Keefe, J., Burgess, N., 2009. Boundary vector cells

function based on a subregional analysis. Rev. Neurosci. 15, 333–351. in the subiculum of the hippocampal formation. J. Neurosci. 29, 9771–9777.

Khamassi, M., Lacheze, L., Girard, B., Berthoz, A., Guillot, A., 2005. Actor–critic Li, S., Cullen, W.K., Anwyl, R., Rowan, M.J., 2003. Dopamine-dependent facilitation of

models of reinforcement learning in the basal ganglia: from natural to arificial LTP induction in hippocampal CA1 by exposure to spatial novelty. Nat. Neurosci.

rats. Adapt. Behav. 13, 131–148. 6, 526–531.

Khamassi, M., Mulder, A.B., Tabuchi, E., Douchamps, V., Wiener, S.I., 2008. Antici- Lima, S.L., 1983. Downy woodpecker foraging behavior-foraging by expectation and

patory reward signals in ventral striatal neurons of behaving rats. Eur. J. energy intake rate. Oecologia 58, 232–237.

Neurosci. 28, 1849–1866. Lisman, J.E., 1999. Relating hippocampal circuitry to function: recall of memory

Kim, J.J., Fanselow, M.S., 1992. Modality-specific retrograde amnesia of fear. Science sequences by reciprocal dentate–CA3 interactions. Neuron 22, 233–242.

256, 675–677. Lisman, J.E., Grace, A.A., 2005. The hippocampal–VTA loop: controlling the entry of

Kimchi, E.Y., Laubach, M., 2009. Dynamic encoding of action selection by the medial information into long-term memory. Neuron 46, 703–713.

striatum. J. Neurosci. 29, 3148–3159. Lisman, J., Redish, A.D., 2009. Prediction, sequences and the hippocampus. Philos.

Kimura, M., Aosaki, T., Hu, Y., Ishida, A., Watanabe, K., 1992. Activity of primate Trans. R. Soc. Lond. B: Biol. Sci. 364, 1193–1201.

putamen neurons is selective to the mode of voluntary movement: visually Ljungberg, T., Apicella, P., Schultz, W., 1992. Responses of monkey dopamine

guided, self-initiated or memory-guided. Exp. Brain Res. 89, 473–477. neurons during learning of behavioral reactions. J. Neurophysiol. 67, 145–163.

Kincaid, A.E., Zheng, T., Wilson, C.J., 1998. Connectivity and convergence of single Locurto, C., Terrace, H.S., Gibbon, J., 1976. Autoshaping, random control, and

corticostriatal axons. J. Neurosci. 18, 4722–4731. omission training in the rat. J. Exp. Anal. Behav. 26, 451–462.

Knierim, J.J., Kudrimoti, H.S., McNaughton, B.L., 1995. Place cells, head direction Lodge, D.J., Grace, A.A., 2006. The laterodorsal tegmentum is essential for burst

cells, and the learning of landmark stability. J. Neurosci. 15, 1648–1659. firing of ventral tegmental area dopamine neurons. Proc. Natl. Acad. Sci. U.S.A.

Knierim, J.J., Lee, I., Hargreaves, E.L., 2006. Hippocampal place cells: parallel input 103, 5167–5172.

streams, subregional processing, and implications for episodic memory. Hip- Long, J.M., Kesner, R.P., 1996. The effects of dorsal versus ventral hippocampal, total

pocampus 16, 755–764. hippocampal, and parietal cortex lesions on memory for allocentric distance in

Kobayashi, S., Schultz, W., 2008. Influence of reward delays on responses of rats. Behav. Neurosci. 110, 922–932.

dopamine neurons. J. Neurosci. 28, 7837–7846. Lopes da Silva, F.H., Arnolds, D.E., Neijt, H.C., 1984. A functional link between the

Kobayashi, Y., Isa, T., 2002. Sensory-motor gating and cognitive control by the limbic cortex and ventral striatum: physiology of the subiculum accumbens

brainstem cholinergic system. Neural Netw. 15, 731–741. pathway. Exp. Brain Res. 55, 205–214.

Koch, M., , A., Schnitzler, H.U., 2000. Role of muscles accumbens dopamine Louie, K., Wilson, M.A., 2001. Temporally structured replay of awake hippocampal

D1 and D2 receptors in instrumental and Pavlovian paradigms of conditioned ensemble activity during rapid eye movement sleep. Neuron 29, 145–156.

reward. Psychopharmacology 152, 67–73. Ludvig, E.A., Sutton, R.S., Kehoe, E.J., 2008. Stimulus representation and the timing of

Krebs, J.R., McCleery, R.H., 1984. Optimization in behavioural ecology. In: Davies, reward-prediction errors in models of the dopamine system. Neural Comput.

J.R.K.N.B. (Ed.), Behavioural Ecology. Sinauer, Sunderland, MA, pp. 91– 20, 3034–3054.

121. MacArthur, R.H., Pianka, E.R., 1966. On optimal use of patchy environments. Am.

Kropf, W., Kuschinsky, K., 1993. Conditioned effects of apomorphine are manifest in Nat. 100, 603–609.

regional EEG of rats both in hippocampus and in striatum. Naunyn Schmiede- Maia, T.V., 2009. Reinforcement learning, conditioning, and the brain: success and

bergs Arch. Pharmacol. 347, 487–493. challenges. Cogn. Affect. Behav. Neurosci. 9, 343–364.

Kruse, J.M., Overmier, B., Konz, W.A., Rokke, E., 1983. Pavlovian conditioned Maldonado-Irizarry, C.S., Kelley, A.E., 1995. Excitatory amino acid receptors within

stimulus effects upon instrumental choice behavior are reinforcer specific. nucleus accumbens subregions differentially mediate spatial learning in the rat.

Learn. Motiv. 14, 165–181. Behav. Pharmacol. 6, 527–539.

Kubie, J.L., Ranck Jr., J.B., 1983. Sensory-behavioral correlates in individual Maren, S., 2001. Neurobiology of Pavlovian fear conditioning. Annu. Rev. Neurosci.

hippocampus neurons in three situations: space and context. In: Seifert, 24, 897–931.

W. (Ed.), Neurobiology of the Hippocampus. Academic, New York, pp. Markus, E.J., Barnes, C.A., McNaughton, B.L., Gladden, V.L., Skaggs, W.E., 1994.

433–447. Spatial information content and reliability of hippocampal CA1 neurons: effects

Kubota, Y., Liu, J., Hu, D., DeCoteau, W.E., Eden, U.T., Smith, A.C., Graybiel, A.M., 2009. of visual input. Hippocampus 4, 410–421.

Stable encoding of task structure coexists with flexible coding of task events in Markus, E.J., Qin, Y.L., Leonard, B., Skaggs, W.E., McNaughton, B.L., Barnes, C.A., 1995.

sensorimotor striatum. J. Neurophysiol. 102, 2142–2160. Interactions between location and task affect the spatial and directional firing of

Kurth-Nelson, Z., Redish, A.D., 2009. Temporal-difference reinforcement learning hippocampal neurons. J. Neurosci. 15, 7079–7094.

with distributed representations. PLoS One 4, e7362. Marowsky, A., Yanagawa, Y., Obata, K., Vogt, K.E., 2005. A specialized subclass of

Kurth-Nelson, Z., Redish, A.D., 2010. A reinforcement learning model of precom- interneurons mediates dopaminergic facilitation of amygdala function. Neuron

mitment in decision making. Front. Behav. Neurosci. 4, 184. 48, 1025–1037.

Kusuki, T., Imahori, Y., Ueda, S., Inokuchi, K., 1997. Dopaminergic modulation of LTP Marr, D., 1971. Simple memory: a theory for . Philos. Trans. R. Soc. Lond.

induction in the dentate gyrus of intact brain. Neuroreport 8, 2037–2040. B: Biol. Sci. 262, 23–81.

Langston, R.F., Ainge, J.A., Couey, J.J., Canto, C.B., Bjerknes, T.L., Witter, M.P., Moser, Martig, A.K., Jones, G.L., Smith, K.E., Mizumori, S.J., 2009. Context dependent effects

E.I., Moser, M.B., 2010. Development of the spatial representation system in the of ventral tegmental area inactivation on spatial working memory. Behav. Brain

rat. Science 328, 1576–1580. Res. 203, 316–320.

M.R. Penner, S.J.Y. Mizumori / Progress in Neurobiology 96 (2012) 96–135 131

Martig, A.K., Mizumori, S.J., 2011. Ventral tegmental area disruption selectively CA3c output: evidence for pattern completion in hippocampus. J. Neurosci. 9,

affects CA1/CA2 but not CA3 place fields during a differential reward working 3915–3928.

memory task. Hippocampus 21, 172–184. Mizumori, S.J., Puryear, C.B., Martig, A.K., 2009. Basal ganglia contributions to

Martin, S.J., Grimwood, P.D., Morris, R.G., 2000. Synaptic plasticity and memory: an adaptive navigation. Behav. Brain Res. 199, 32–42.

evaluation of the hypothesis. Annu. Rev. Neurosci. 23, 649–711. Mizumori, S.J., Ragozzino, K.E., Cooper, B.G., Leutgeb, S., 1999. Hippocampal repre-

Matsumoto, M., Hikosaka, O., 2007. Lateral habenula as a source of negative reward sentational organization and spatial context. Hippocampus 9, 444–451.

signals in dopamine neurons. Nature 447, 1111–1115. Mizumori, S.J., Smith, D.M., Puryear, C.B., 2007a. Hippocampal and neocortical

Matsumoto, M., Hikosaka, O., 2009. Two types of dopamine neuron distinctly interactions during context discrimination: electrophysiological evidence from

convey positive and negative motivational signals. Nature 459, 837–841. the rat. Hippocampus 17, 851–862.

Maurer, A.P., Vanrhoads, S.R., Sutherland, G.R., Lipa, P., McNaughton, B.L., 2005. Self- Mizumori, S.J., Yeshenko, O., Gill, K.M., Davis, D.M., 2004. Parallel processing across

motion and the origin of differential spatial scaling along the septo-temporal neural systems: implications for a multiple memory system hypothesis. Neu-

axis of the hippocampus. Hippocampus 15, 841–852. robiol. Learn. Mem. 82, 278–298.

McClelland, J.L., McNaughton, B.L., O’Reilly, R.C., 1995. Why there are complemen- Mizumori, S.J.Y., 2008. Hippocampal Place Fields: Relevance to Learning and

tary learning systems in the hippocampus and neocortex: insights from the Memory. Oxford University Press, New York.

successes and failures of connectionist models of learning and memory. Psy- Mizumori, S.J.Y., Smith, D.M., Puryear, C.B., 2007b. Mnemonic contributions of

chol. Rev. 102, 419–457. hippocampal place cells. In: Martinez, J.L., Kesner, R.P. (Eds.), Neurobiology

McDonald, R.J., White, N.M., 1993. A triple dissociation of memory systems: of Learning and Memory. Academic Press.

hippocampus, amygdala, and dorsal striatum. Behav. Neurosci. 107, 3–22. Mogenson, G.J., Jones, D.L., Yim, C.Y., 1980. From motivation to action: functional

McFarland, K., Ettenberg, A., 1995. Haloperidol differentially affects reinforcement interface between the limbic system and the motor system. Prog. Neurobiol. 14,

and motivational processes in rats running an alley for intravenous heroin. 69–97.

Psychopharmacology (Berl) 122, 346–350. Molina-Luna, K., Pekanovic, A., Rohrich, S., Hertler, B., Schubring-Giese, M., Rioult-

McGeorge, A.J., Faull, R.L., 1987. The organization and collateralization of corti- Pedotti, M.S., Luft, A.R., 2009. Dopamine in motor cortex is necessary for skill

costriate neurones in the motor and sensory cortex of the rat brain. Brain Res. learning and synaptic plasticity. PLoS One 4, e7082.

423, 318–324. Montague, P.R., Dayan, P., Sejnowski, T.J., 1996. A framework for mesencephalic

McGeorge, A.J., Faull, R.L., 1989. The organization of the projection from the cerebral dopamine systems based on predictive Hebbian learning. J. Neurosci. 16, 1936–

cortex to the striatum in the rat. Neuroscience 29, 503–537. 1947.

McHugh, T.J., Blum, K.I., Tsien, J.Z., Tonegawa, S., Wilson, M.A., 1996. Impaired Montgomery, S.M., Betancur, M.I., Buzsaki, G., 2009. Behavior-dependent coordi-

hippocampal representation of space in CA1-specific NMDAR1 knockout mice. nation of multiple theta dipoles in the hippocampus. J. Neurosci. 29, 1381–

Cell 87, 1339–1349. 1394.

McNaughton, B.L., Barnes, C.A., Gerrard, J.L., Gothard, K., Jung, M.W., Knierim, J.J., Morris, R.G., Frey, U., 1997. Hippocampal synaptic plasticity: role in spatial learning


synchronization of hippocampal and accumbens neuronal discharges in freely 2004. Putting a spin on the dorsal–ventral divide of the striatum. Trends

moving rats. Hippocampus 10, 717–728. Neurosci. 27, 468–474.

Taha, S.A., Nicola, S.M., Fields, H.L., 2007. Cue-evoked encoding of movement Waddington, K.D., Holden, L.R., 1979. Optimal foraging-flower selection by bees.

planning and execution in the rat nucleus accumbens. J. Physiol. 584, 801–818. Am. Nat. 114, 179–196.

Taube, J.S., Muller, R.U., Ranck Jr., J.B., 1990. Head-direction cells recorded from the Waelti, P., Dickinson, A., Schultz, W., 2001. Dopamine responses comply with basic

postsubiculum in freely moving rats. I. Description and quantitative analysis. J. assumptions of formal learning theory. Nature 412, 43–48.

Neurosci. 10, 420–435. Wakabayashi, K.T., Fields, H.L., Nicola, S.M., 2004. Dissociation of the role of nucleus

Terrazas, A., Krause, M., Lipa, P., Gothard, K.M., Barnes, C.A., McNaughton, B.L., 2005. accumbens dopamine in responding to reward-predictive cues and waiting for

Self-motion and the hippocampal spatial metric. J. Neurosci. 25, 8085–8096. reward. Behav. Brain Res. 154, 19–30.

Thorn, C.A., Atallah, H., Howe, M., Graybiel, A.M., 2010. Differential dynamics of Wall, V.Z., Parker, J.G., Fadok, J.P., Darvas, M., Zweifel, L., Palmiter, R.D., 2011. A

activity changes in dorsolateral and dorsomedial striatal loops during learning. behavioral genetics approach to understanding D1 receptor involvement in

Neuron 66, 781–795. phasic dopamine signaling. Mol. Cell. Neurosci. 46, 21–31.

Thorn, C.A., Graybiel, A.M., 2010. Pausing to regroup: thalamic gating of cortico- Walton, M.E., Bannerman, D.M., Rushworth, M.F., 2002. The role of rat medial

basal ganglia networks. Neuron 67, 175–178. frontal cortex in effort-based decision making. J. Neurosci. 22, 10996–11003.

Tobler, P.N., Dickinson, A., Schultz, W., 2003. Coding of predicted reward omission Walton, M.E., Kennerley, S.W., Bannerman, D.M., Phillips, P.E., Rushworth, M.F.,

by dopamine neurons in a conditioned inhibition paradigm. J. Neurosci. 23, 2006. Weighing up the benefits of work: behavioral and neural analyses of

10402–10410. effort-related decision making. Neural Netw. 19, 1302–1314.

Tobler, P.N., Fiorillo, C.D., Schultz, W., 2005. Adaptive coding of reward value by Wanat, M.J., Kuhnen, C.M., Phillips, P.E., 2010. Delays conferred by escalating costs

dopamine neurons. Science 307, 1642–1645. modulate dopamine release to rewards but not their predictors. J. Neurosci. 30,

Tolman, E.C., 1930. Maze performance a function of motivation and of reward as 12020–12027.

well as knowledge of the maze paths. J. Gen. Psychol. 4, 338–342. Wang, H.L., Morales, M., 2009. Pedunculopontine and laterodorsal tegmental nuclei

Tolman, E.C., 1938. The determiners of behavior at a choice point. Psychol. Rev. 46, contain distinct populations of cholinergic, glutamatergic and GABAergic neu-

318–336. rons in the rat. Eur. J. Neurosci. 29, 340–358.

Tolman, E.C., 1939. Prediction of vicarious trial and error by means of the schematic Wang, S.H., Morris, R.G., 2010. Hippocampal–neocortical interactions in memory

sowbug. Psychol. Rev. 46, 318–336. formation, consolidation, and reconsolidation. Annu. Rev. Psychol. 61 (49-79),

Tolman, E.C., 1948. Cognitive maps in rats and men. Psychol. Rev. 55, 189–208. C41–C44.

Totterdell, S., Meredith, G.E., 1997. Topographical organization of projections from Watson, J.B., 1907. Kinaesthetic and organic sensations: their role in the reactions of

the entorhinal cortex to the striatum of the rat. Neuroscience 78, 715–729. the white rat. Psychol. Rev. Monogr. (Suppl. (8)) number 2.

Touretzky, D.S., Redish, A.D., 1996. Theory of rodent navigation based on interacting Whishaw, I.Q., Gorny, B., 1999. Path integration absent in scent-tracking fimbria-

representations of space. Hippocampus 6, 247–270. fornix rats: evidence for hippocampal involvement in ‘‘sense of direction’’ and

Tremblay, P.L., Bedard, M.A., Langlois, D., Blanchet, P.J., Lemay, M., Parent, M., ‘‘sense of distance’’ using self-movement cues. J. Neurosci. 19, 4662–4673.

2010. Movement chunking during sequence learning is a dopamine-depen- Whishaw, I.Q., Mittleman, G., Bunch, S.T., Dunnett, S.B., 1987. Impairments in the

dant process: a study conducted in Parkinson’s disease. Exp. Brain Res. 205, acquisition, retention and selection of spatial navigation strategies after medial

375–385. caudate-putamen lesions in rats. Behav. Brain Res. 24, 125–138.

Tremblay, P.L., Bedard, M.A., Levesque, M., Chebli, M., Parent, M., Courtemanche, White, I.M., Rebec, G.V., 1993. Responses of rat striatal neurons during performance

R., Blanchet, P.J., 2009. Motor sequence learning in primate: role of the D2 of a lever-release version of the conditioned avoidance response task. Brain Res.

receptor in movement chunking during consolidation. Behav. Brain Res. 198, 616, 71–82.

231–239. Whittington, M.A., Traub, R.D., Jefferys, J.G., 1995. Synchronized oscillations in

Treves, A., 2004. Computational constraints between retrieving the past and pre- interneuron networks driven by metabotropic glutamate receptor activation.

dicting the future, and the CA3–CA1 differentiation. Hippocampus 14, 539–556. Nature 373, 612–615.

M.R. Penner, S.J.Y. Mizumori / Progress in Neurobiology 96 (2012) 96–135 135

Wickens, J.R., Budd, C.S., Hyland, B.I., Arbuthnott, G.W., 2007a. Striatal contributions Wood, E.R., Dudchenko, P.A., Robitsek, R.J., Eichenbaum, H., 2000. Hippocampal

to reward and decision making: making sense of regional variations in a neurons encode information about different types of memory episodes occur-

reiterated processing matrix. Ann. N. Y. Acad. Sci. 1104, 192–212. ring in the same location. Neuron 27, 623–633.

Wickens, J.R., Horvitz, J.C., Costa, R.M., Killcross, S., 2007b. Dopaminergic mecha- Woolf, N.J., 1991. Cholinergic systems in mammalian brain and spinal cord. Prog.

nisms in actions and habits. J. Neurosci. 27, 8181–8183. Neurobiol. 37, 475–524.

Wiener, S.I., 1993. Spatial and behavioral correlates of striatal neurons in rats Worden, L.T., Shahriari, M., Farrar, A.M., Sink, K.S., Hockemeyer, J., Muller, C.E.,

performing a self-initiated navigation task. J. Neurosci. 13, 3802–3817. Salamone, J.D., 2009. The adenosine A2A antagonist MSX-3 reverses the effort-

Wiener, S.I., 1996. Spatial, behavioral and sensory correlates of hippocampal CA1 related effects of dopamine blockade: differential interaction with D1 and D2

complex spike cell activity: implications for information processing functions. family antagonists. Psychopharmacology (Berl) 203, 489–499.

Prog. Neurobiol. 49, 335–361. Wright, C.I., Beijer, A.V., Groenewegen, H.J., 1996. Basal amygdaloid complex

Wiener, S.I., Korshunov, V.A., Garcia, R., Berthoz, A., 1995. Inertial, substratal and afferents to the rat nucleus accumbens are compartmentally organized. J.

landmark cue control of hippocampal CA1 place cell activity. Eur. J. Neurosci. 7, Neurosci. 16, 1877–1893.

2206–2219. Xi, Z.X., Stein, E.A., 1998. Nucleus accumbens dopamine release modulation by

Wiener, S.I., Paul, C.A., Eichenbaum, H., 1989. Spatial and behavioral correlates of mesolimbic GABAA receptors—an in vivo electrochemical study. Brain Res. 798,

hippocampal neuronal activity. J. Neurosci. 9, 2737–2763. 156–165.

Wightman, R.M., Robinson, D.L., 2002. Transient changes in mesolimbic dopamine Yeshenko, O., Guazzelli, A., Mizumori, S.J., 2004. Context-dependent reorganization

and their association with ‘reward’. J. Neurochem. 82, 721–735. of spatial and movement representations by simultaneously recorded hippo-

Wilcove, W.G., Miller, J.C., 1974. CS-USC presentations and a lever: human auto- campal and striatal neurons during performance of allocentric and egocentric

shaping. J. Exp. Psychol. 103, 868–877. tasks. Behav. Neurosci. 118, 751–769.

Williams, D.R., Williams, H., 1969. Auto-maintenance in the pigeon: sustained Yin, H.H., 2010. The sensorimotor striatum is necessary for serial order learning. J.

pecking despite contingent non-reinforcement. J. Exp. Anal. Behav. 12, 511– Neurosci. 30, 14719–14723.

520. Yin, H.H., Knowlton, B.J., 2004. Contributions of striatal subregions to place and

Williams, S., Mmbaga, N., Chirwa, S., 2006. Dopaminergic D1 receptor agonist SKF response learning. Learn. Mem. 11, 459–463.

38393 induces GAP-43 expression and long-term potentiation in hippocampus Yin, H.H., Knowlton, B.J., 2006. The role of the basal ganglia in habit formation. Nat.

in vivo. Neurosci. Lett. 402, 46–50. Rev. Neurosci. 7, 464–476.

Williams, Z.M., Eskandar, E.N., 2006. Selective enhancement of associative learning Yin, H.H., Knowlton, B.J., Balleine, B.W., 2004. Lesions of dorsolateral striatum

by microstimulation of the anterior caudate. Nat. Neurosci. 9, 562–568. preserve outcome expectancy but disrupt habit formation in instrumental

Wills, T.J., Cacucci, F., Burgess, N., O’Keefe, J., 2010. Development of the hippocampal learning. Eur. J. Neurosci. 19, 181–189.

cognitive map in preweanling rats. Science 328, 1573–1576. Yin, H.H., Knowlton, B.J., Balleine, B.W., 2006. Inactivation of dorsolateral striatum

Wilson, D.I., Bowman, E.M., 2005. Rat nucleus accumbens neurons predominantly enhances sensitivity to changes in the action-outcome contingency in instru-

respond to the outcome-related properties of conditioned stimuli rather than mental conditioning. Behav. Brain Res. 166, 189–196.

their behavioral-switching properties. J. Neurophysiol. 94, 49–61. Yin, H.H., Mulcare, S.P., Hilario, M.R., Clouse, E., Holloway, T., Davis, M.I., Hansson,

Wilson, D.I., MacLaren, D.A., Winn, P., 2009. Bar pressing for food: differential A.C., Lovinger, D.M., Costa, R.M., 2009. Dynamic reorganization of striatal

consequences of lesions to the anterior versus posterior pedunculopontine. Eur. circuits during the acquisition and consolidation of a skill. Nat. Neurosci. 12,

J. Neurosci. 30, 504–513. 333–341.

Wilson, M.A., McNaughton, B.L., 1993. Dynamics of the hippocampal ensemble code Yin, H.H., Ostlund, S.B., Balleine, B.W., 2008. Reward-guided learning beyond

for space. Science 261, 1055–1058. dopamine in the nucleus accumbens: the integrative functions of cortico-basal

Wilson, M.A., McNaughton, B.L., 1994. Reactivation of hippocampal ensemble ganglia networks. Eur. J. Neurosci. 28, 1437–1448.

memories during sleep. Science 265, 676–679. Yin, H.H., Ostlund, S.B., Knowlton, B.J., Balleine, B.W., 2005. The role of the dor-

Winn, P., 2006. How best to consider the structure and function of the peduncu- somedial striatum in instrumental conditioning. Eur. J. Neurosci. 22, 513–523.

lopontine tegmental nucleus: evidence from animal studies. J. Neurol. Sci. 248, Zahm, D.S., 1999. Functional–anatomical implications of the nucleus accumbens

234–250. core and shell subterritories. Ann. N. Y. Acad. Sci. 877, 113–128.

Wise, R.A., 2004. Dopamine, learning and motivation. Nat. Rev. Neurosci. 5, 483– Zahm, D.S., 2000. An integrative neuroanatomical perspective on some subcortical

494. substrates of adaptive responding with emphasis on the nucleus accumbens.

Wise, R.A., 2005. Forebrain substrates of reward and motivation. J. Comp. Neurol. Neurosci. Biobehav. Rev. 24, 85–105.

493, 115–121. Zahm, D.S., Brog, J.S., 1992. On the significance of subterritories in the ‘‘accumbens’’

Wise, R.A., 2006. Role of brain dopamine in food reward and reinforcement. Philos. part of the rat ventral striatum. Neuroscience 50, 751–767.

Trans. R. Soc. Lond. B: Biol. Sci. 361, 1149–1158. Zahm, D.S., Heimer, L., 1990. Two transpallidal pathways originating in the rat

Wise, R.A., 2009. Roles for nigrostriatal—not just mesocorticolimbic—dopamine in nucleus accumbens. J. Comp. Neurol. 302, 437–446.

reward and addiction. Trends Neurosci. 32, 517–524. Zahm, D.S., Heimer, L., 1993. Specificity in the efferent projections of the nucleus

Wisman, L.A., Sahin, G., Maingay, M., Leanza, G., Kirik, D., 2008. Functional conver- accumbens in the rat: comparison of the rostral pole projection patterns with

gence of dopaminergic and cholinergic input is critical for hippocampus- those of the core and shell. J. Comp. Neurol. 327, 220–232.

dependent working memory. J. Neurosci. 28, 7797–7807. Zahm, D.S., Williams, E., Wohltmann, C., 1996. Ventral striatopallidothalamic

Witter, M.P., Naber, P.A., van Haeften, T., Machielsen, W.C., Rombouts, S.A., Barkhof, projection: IV. Relative involvements of neurochemically distinct subterritories

F., Scheltens, P., Lopes da Silva, F.H., 2000. Cortico-hippocampal communication in the ventral pallidum and adjacent parts of the rostroventral forebrain. J.

by way of parallel parahippocampal-subicular pathways. Hippocampus 10, Comp. Neurol. 364, 340–362.

398–410. Zhang, L., Doyon, W.M., Clark, J.J., Phillips, P.E., Dani, J.A., 2009. Controls of tonic and

Wolterink, G., Phillips, G., Cador, M., Donselaar-Wolterink, I., Robbins, T.W., Everitt, phasic dopamine transmission in the dorsal and ventral striatum. Mol. Phar-

B.J., 1993. Relative roles of ventral striatal D1 and D2 dopamine receptors in macol. 76, 396–404.

responding with conditioned reinforcement. Psychopharmacology (Berl) 110, Zhou, L., Furuta, T., Kaneko, T., 2003. Chemical organization of projection neurons in

355–364. the rat accumbens nucleus and olfactory tubercle. Neuroscience 120, 783–798.

Womelsdorf, T., Fries, P., Mitra, P.P., Desimone, R., 2006. Gamma-band synchro- Zugaro, M.B., Monconduit, L., Buzsaki, G., 2005. Spike phase precession persists after

nization in visual cortex predicts speed of change detection. Nature 439, 733– transient intrahippocampal perturbation. Nat. Neurosci. 8, 67–71.

736. Zweifel, L., Fadok, J.P., Argilli, E., Garelick, M.G., Jones, G.L., Dickerson, T.M.K., Allens,

Womelsdorf, T., Schoffelen, J.M., Oostenveld, R., Singer, W., Desimone, R., Engel, A.K., J.M., Mizumori, S.J.Y., Bonci, A., Palmiter, R., 2011. Activation of dopamine

Fries, P., 2007. Modulation of neuronal interactions through neuronal synchro- neurons is critical for aversive conditioning and prevention of generalized

nization. Science 316, 1609–1612. anxiety. Nat. Neurosci. 14, 620–626.