Prefrontal Mechanisms Underlying Sequential Tasks

By

Feng-Kuei Chiang

A dissertation submitted in partial satisfaction of the requirements for the degree of

Doctor of Philosophy

in

Psychology

in the

Graduate Division

of the

University of California, Berkeley

Committee in charge:

Professor Joni Wallis, Chair
Professor Richard B. Ivry
Professor Ming Hsu

Summer 2017

Copyright © 2017

by

Feng-Kuei Chiang

Abstract

Prefrontal Mechanisms Underlying Sequential Tasks

by

Feng-Kuei Chiang

Doctor of Philosophy in Psychology

University of California, Berkeley

Professor Joni Wallis, Chair

For decades, mechanisms of cognitive behavior have been studied in the simple form of stimulus-outcome and action-outcome associations. Those seminal studies serve as fundamental frameworks and enable us to explore how neural activity in the brain represents the increasingly complex and temporally extended associations in sequential tasks. The prefrontal cortex (PFC) has long been suspected to play an important role in cognitive control, the ability to orchestrate thought and action in accordance with internal representations. In particular, PFC “top-down” processes serve as an internal signal to guide high-level cognitive functions, such as working memory (WM), abstract rules, or goal-directed decision-making. Several models have described how task information, including supra- and superordinate information, is organized in prefrontal cortex, but it remains unclear precisely how this cognitive information maps onto neurophysiological functions.

To explore these issues, we devised primate versions of two tasks that tax sequential behavior: a spatial self-ordered search task and a hierarchical reinforcement learning (HRL) task. These tasks examine how sequential behavior interacts with WM and reinforcement learning (RL), respectively. We examined how prefrontal neurons encoded task-related information across these two cognitive tasks. Our results show that prefrontal neurons are capable of adaptively regulating the precision with which information is encoded. In the spatial self-ordered search task, lateral prefrontal neurons have spatiotemporal mnemonic fields, in that their firing rates are modulated both by the spatial location of future selection behaviors and by the temporal organization of that behavior. Furthermore, the precision of this tuning can be dynamically modulated by the demands of the task. In the HRL task, prefrontal neurons are involved in integrating abstract subjective values. In particular, we found that the firing rate of a small population of neurons encoded pseudo-reward prediction errors, and these neurons were restricted to anterior cingulate cortex. Taken together, our findings suggest that prefrontal neurons encode not only basic information associated with external stimuli, but also high-level information that is used to organize task-relevant behaviors.

To my parents, family, and friends

All is Well

Contents

1 Introduction
  1.1 Outline of thesis
  1.2 Prefrontal cortex (PFC)
    1.2.1 Lateral prefrontal cortex (LPFC)
    1.2.2 Orbitofrontal cortex (OFC)
    1.2.3 Anterior cingulate cortex (ACC)
    1.2.4 Closing remarks on anatomy
  1.3 The role of PFC in working memory
  1.4 The role of PFC in reinforcement learning
    1.4.1 Historical perspective
    1.4.2 Reinforcement learning and hierarchical behaviors

2 General Methods
  2.1 Overview
  2.2 Behavioral training materials and methods
    2.2.1 Subjects
    2.2.2 Behavioral training
    2.2.3 Materials and methods for training
  2.3 Neurophysiological techniques
    2.3.1 Isolation of recording sites
    2.3.2 Surgery
  2.4 Recordings
    2.4.1 Materials and methods for neurophysiology
  2.5 Statistical analysis

3 Spatiotemporal encoding of search strategies by prefrontal neurons
  3.1 Introduction
  3.2 Methods
  3.3 Results
    3.3.1 Task performance
    3.3.2 Behavioral strategies
    3.3.3 Neurophysiological analysis
    3.3.4 Encoding of spatial information
    3.3.5 Effects of behavioral strategy on spatial encoding
  3.4 Discussion
    3.4.1 Role of prefrontal cortex in the sequential organization of behavior
    3.4.2 Contribution of PFC to working memory
    3.4.3 Role of PFC in cognitive control

4 Neuronal encoding in prefrontal cortex during hierarchical reinforcement learning
  4.1 Introduction
  4.2 Methods
    4.2.1 Subjects and behavioral task
    4.2.2 Neurophysiological procedures
    4.2.3 Statistical methods
  4.3 Results
    4.3.1 Behavioral task performance
    4.3.2 Neural encoding
  4.4 Discussion

5 Conclusion
  5.1 Summary of results
  5.2 Remaining questions
    5.2.1 Errors in sequential tasks
    5.2.2 Adaptation from prediction errors
  5.3 Future directions
  5.4 Closing remarks

6 Bibliography

List of Figures

1.1 Hierarchical representation of a routine sequential task. From Humphreys and Forde, 1998.

1.2 (A) Flow of control in one step of a sequential task, with blue representing the increased involvement of supervisory control and red representing increased involvement of schematic control during a single step. (B) Representation of the multiple, hierarchical levels that can characterize sequences. Each step in more concrete motor sequences or more abstract task sequences may engage supervisory or schematic control and the interaction between them. From Desrochers et al. (2015).

1.3 The medial, lateral, and orbital surfaces of the prefrontal cortex. Monkey outlines taken from Carmichael and Price (1996) and Petrides and Pandya (1999); human outline taken from Ongur and Price (2000).

1.4 Medial and orbital networks of the prefrontal cortex taken from Carmichael and Price (1996). Adapted and reprinted with permission from Wallis (2012).

1.5 Overview of content-specific activity during working memory delays in the macaque (left) and human (right) brain. Icons indicate persistent stimulus-selective activity for each stimulus type indicated by the icon (see legend) at the respective locations. Both left- and right-sided effects are shown on the left hemisphere. A full list of individual studies is reported in the supplemental information of Christophel et al., 2017. Brain areas are identified by abbreviations: AC, auditory cortex; ERC, entorhinal cortex; EVC, early visual cortex; FEF, frontal eye fields; FG, fusiform gyrus; hMT+, human analog to MT/MST; IPS, intraparietal sulcus; IT, inferior temporal cortex; LOC, lateral occipital complex; lPFC, lateral prefrontal cortex; PM, premotor cortex; PPC, posterior parietal cortex.

2.1 Experimental setup used to control behavioral events and record neurophysiological data.

2.2 Magnetic resonance imaging (MRI) scans showing a coronal slice from the middle of our recording locations and potential electrode paths. The example on the top is from subject R in the WM task; the example on the bottom is from subject Q in the HRL task. Possible electrode tracks are highlighted in white. Brain areas recorded from are highlighted in red (LPFC), green (ACC), and blue (OFC).

2.3 Cluster isolation of neuronal waveforms. On the left are 32 channels, many of which have neurons (i.e., detected waveforms) on them. A sample channel is enlarged in the middle panel to reveal three distinct waveforms. Those waveforms are decomposed into components in Plexon's online sorter, and each waveform is plotted as a single dot in the right panel. Clusters of waveforms are then isolated manually and entered into neuronal analyses.

3.1 (A) Spatial self-ordered search task. (B) Each configuration consisted of 6 targets (green filled circles), selected from 36 possible locations (gray filled circles). We ensured that targets were approximately balanced across the display by requiring that the centroid of the configuration (red cross) was located within a ±3° fixation window (black circle). The inter-target distances were at least 6° to avoid overlap of the eye-position detection windows around the targets. (C) Number of unique sequences of target selection per configuration.

3.2 Behavioral performance. (A) Distribution of the number of incorrect saccades per trial. (B) Observed and expected error rates plotted as a function of which saccade in the sequence was performed. The observed error rate was significantly higher than the expected error rate for the last saccade in both subjects (binomial test, ***p < 0.001). (C) Reaction times as a function of saccade. Boxplots indicate the 25th, 50th, and 75th percentiles of the sRT distribution. Red dots indicate the mean sRT.

3.3 (A) For each block of trials, we calculated how many times a particular target was selected by a particular saccade. We then plotted these data according to the most common sequence by which targets were selected. (B) Representative blocks, illustrating how the pseudocolor plots changed as a function of the stereotype index. Blocks above the line are from subject R, while blocks below the line are from subject Q. (C) Increased values of the stereotype index were correlated with better behavioral performance (fewer incorrect saccades) in both subjects.

3.4 Recording locations. (A) MRI of a coronal slice through the brain of subject R. The red region in each hemisphere denotes the area of the LPFC investigated. White lines depict electrode paths. (B) We measured the anterior-posterior position from the interaural line (x-axis), and the lateral-medial position relative to the lip of the ventral bank of the principal sulcus (0 point on y-axis). Gray shading indicates unfolded sulci. The diameter of the circles indicates the number of recordings from a given location. SA = superior arcuate sulcus; IA = inferior arcuate sulcus; P = principal sulcus.

3.5 (A) Percentage of selective neurons that encoded different predictors in each epoch. Black circles indicate the percentage of selective neurons, defined as those that encoded at least one predictor during the epoch. (B) The proportion of beta coefficients with positive (+) or negative (-) values during the hold epoch.

3.6 (A) The color axis indicates the percentage of neurons that encode the spatial position of each of the targets in the sequence (x-axis), during the epochs associated with selecting a specific target in the sequence (y-axis). Entries on the main diagonal represent neurons that encoded the current target's spatial information (a significant beta associated with either the x- or y-coordinate). Entries on the lower left diagonals represent neurons that encoded the spatial position of previously selected targets. Entries on the upper right diagonals represent neurons that encoded the spatial position of upcoming targets. Top and bottom panels are from subjects R and Q, respectively. (B) Mean amount of spatial information encoded by spatially selective neurons as a function of saccade number relative to the current saccade. Saccades related to previously selected targets are negative and upcoming saccades are positive. The dashed line indicates chance levels, determined by randomly shuffling the assignment between neural firing rates and spatial position within the sequence.

3.7 (A) Single-neuron examples showing the effects of the stereotype strategy on spatial tuning. Spike density histograms are plotted for two neurons, one recorded in subject R (left two panels) and one recorded in subject Q (right two panels). The color of the plots refers to the position of the selected target on the screen, as shown in the key. Plots on the top are from the three blocks in the session with the higher SI, whereas the plots on the bottom are from the three blocks with the lower SI. Less spatial tuning is observed when the animal is searching through the targets using a more stereotyped strategy. (B) Spatial tuning, as measured by the R-square from the reduced model, as a function of SI. Stronger spatial tuning occurs in blocks with less stereotyped behavior.

3.8 (Continued) (C) Spatial tuning from the same configuration sorted from low to high SI, and then grouped into three ranks. Spatial tuning decreases for the same configuration display when the subject adopts a more stereotyped search strategy (r = -0.0654; p < 5 × 10⁻⁵). (D) Polar plot showing the distribution of standardized mean firing rates from the neuron in the left panel of (A), corresponding to the selected target location during the hold epoch. Firing rates are sorted according to whether they are from blocks of low or high SI. Gray and black lines indicate the mean response vector for low and high blocks, respectively. The difference between these two response vectors (d') was then calculated. (E) Distribution of d' values for neurons that were spatially tuned in either the low or high SI blocks only or in both types of block.

3.9 Encoding of spatial information for targets immediately preceding and/or following the currently selected target. In subject R, 40.9% of neurons encoded preceding targets compared to 44.2% that encoded upcoming targets. The percentage of neurons that encoded both preceding and upcoming targets was 32.4%, which was significantly greater than we would have expected by chance if these were two independent populations (chance = 18.1%, binomial test, p < 5 × 10⁻⁵). Likewise, in subject Q, 30.1% of neurons encoded preceding targets, 30.9% encoded upcoming targets, and 21.7% encoded both (chance = 9.3%, binomial test, p < 1 × 10⁻¹²).

4.1 (A) Timeline of the behavioral task. On choice trials, subjects chose one of two stimulus configurations and then moved a joystick back and forth to move the green dot forward along a green-white-blue route. The optimal choice was to select the shortest route, since this would lead to reward more quickly and with less physical effort. On jump trials, a single configuration was presented, followed by a second configuration, which sometimes required updating the expectancy of how much work would be required to earn the reward. (B) Sample post-jump configurations. The original configuration is shown in the top left. Numbers above the configuration indicate the number of steps for TS, SG, and SD.

4.2 (A) AIC weights across the 14 tested models. The AIC weight is the relative likelihood of a given model within the set of tested models. The full model was clearly favored in both subjects, although a logarithmic transformation of distance was favored by subject R, whereas subject Q estimated distances linearly. (B) Behavioral performance during the choice trials. The probability of selecting the left configuration as a function of the difference in value of the left and right configurations as determined by Equations 1 and 2. Gray circles indicate actual data, whereas green lines indicate the best-fitting model as determined by a formal model comparison.

4.3 (A) Lever movement times for steps relative to subgoal or goal positions. (B) Movement times relative to the previous steps. The diagram indicates the specific movements referenced by the x-axis. Asterisks indicate that the values were significantly lower than any other values after pairwise comparisons (p < 0.01, Bonferroni corrected).

4.4 (A) Coronal MRI scans illustrating potential electrode paths. Red, green, and blue target areas indicate LPFC, ACC, and OFC, respectively. (B) Flattened reconstructions of the cortex indicating the locations of recorded neurons. The size of the circles indicates the number of neurons recorded at that location. We measured the anterior-posterior position from the interaural line (x-axis), and the dorsoventral position relative to the lip of the ventral bank of the principal sulcus (0 point on y-axis). Gray shading indicates unfolded sulci. LPFC recording locations were located within the principal sulcus. ACC recording locations were located within the cingulate sulcus. OFC recording locations were largely located within and between the lateral and medial orbital sulci. All recording locations are plotted relative to the ventral bank of the principal sulcus, which is a consistent landmark across animals. PS, principal sulcus; CS, cingulate sulcus; LOS, lateral orbital sulcus; MOS, medial orbital sulcus.

4.5 Percentage of neurons in LPFC, OFC, and ACC that encode different predictors during the pre-jump and post-jump configurations. Shading indicates the proportion of neurons that encoded the variable with a given relationship: dark shading = positive, light shading = negative. For the post-jump configuration, gray indicates the proportion encoding both positive and negative predictors, which was possible since we included these as separate regressors. Asterisks indicate that the prevalence of neurons is significantly different between areas (chi-squared test, *p < 0.05, **p < 0.01). The dotted line indicates the percentage of selective neurons expected by chance given our statistical threshold for selectivity.

4.6 Spike density histograms illustrating selective neurons for subject value (SV) from three recording areas during the pre-jump (A) or post-jump (B) configurations. In each plot, the top panel indicates mean firing rate as a function of SV. The bottom panel indicates the coefficient of partial determination (CPD) for SV. This measure indicates the amount of variance in the neuron's firing rate that is accounted for by SV and cannot be explained by any of the other parameters in the regression model (see Materials and Methods). Magenta data points indicate that SV significantly predicts neuronal firing rate. The gray lines indicate the onset and offset of the pre- and post-jump configurations.

4.7 Encoding of the SV of the pre-jump configuration across the population in three prefrontal areas. Each horizontal line on the plot indicates the selectivity of a single neuron as measured using the coefficient of partial determination (see Materials and Methods). Neurons have been sorted according to the latency at which they first show selectivity. The vertical white lines indicate the onset and offset of the pre-jump configuration.

4.8 Encoding of predictors related to the post-jump configuration across the population in three prefrontal areas. Conventions are as in Figure 4.7.

Chapter 1

Introduction

Assembling furniture is one of the most common tasks people face when settling into a new home. Fortunately, modern furniture is designed to be put together without professional skills, yielding functional pieces such as a desk or a bookcase. However, as many people discover after assembly, the legs are often tilted or unbalanced before a final adjustment. This problem is easy to overlook while we attach each leg through the same repeated sequence of actions, but certain strategies help to avoid it. One is to tighten the four corners step by step in a diagonal order, regardless of which leg one starts from. The same strategy applies to assemblies that demand greater precision, such as fastening a cooling fan onto the surface of a computer's central processing unit. Another familiar but somewhat more complicated example is cooking. A culinary beginner may struggle to understand how each cooking step influences the final dish, even when following a recipe, whereas a skilled chef can easily tell which ingredient enhances the flavor when making a salsa.

As these examples illustrate, behaviors organized as a series of actions are common in our daily life. Lashley (1951) rejected reflex-chaining accounts of the sequencing of behavior and argued instead for a more cognitive account in which behavioral sequences are typically controlled by hierarchical plans. To illustrate the organization of complex daily behaviors, an example of making a cup of tea from Humphreys and Forde (1998) introduced the hierarchy of superordinate and subordinate actions (Figure 1.1). In brief, superordinate actions can be broken down into several basic actions, such as putting the teabag in the teapot and pouring in hot water. Each superordinate action is in turn composed of several subordinate actions. One can even imagine these actions being further broken down into motor programs, or sequential muscle flexions and extensions. In addition, a superordinate action may itself serve a higher-order goal, e.g., preparing breakfast.
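The nested structure of such a plan can be sketched as a simple data structure, in which each superordinate action expands into ordered subordinate actions and, ultimately, primitive actions. This is only an illustrative sketch, not Humphreys and Forde's formal model; the action names are simplified for the example.

```python
# Illustrative hierarchical action plan (after the tea example of
# Humphreys & Forde, 1998). A node is either a primitive action (a string),
# a dict mapping a superordinate action to its subordinates, or a list.
tea_plan = {
    "make a cup of tea": [
        {"prepare teapot": ["put teabag in teapot", "pour hot water into teapot"]},
        {"serve tea": ["pour tea into cup", "add milk"]},
    ]
}

def flatten(node):
    """Expand a hierarchical plan into its ordered sequence of primitive actions."""
    if isinstance(node, str):
        return [node]
    if isinstance(node, dict):
        return [a for subs in node.values() for sub in subs for a in flatten(sub)]
    return [a for sub in node for a in flatten(sub)]

print(flatten(tea_plan))
```

Flattening recovers the overt behavioral sequence, while the nesting preserves the superordinate goals that a purely chained account would discard.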

Models of sequential action have been proposed from the viewpoints of attention (Norman and Shallice, 1986), computation (Cooper and Shallice, 2000), and connectionist approaches (Botvinick and Plaut, 2006). Although these accounts disagree in some respects, complex task sequences have generally been categorized as automatic (e.g., the furniture example) or supervised (e.g., the salsa example) sequences. More recently, Desrochers et al. (2015b) elaborated these features and proposed a sequential task control system comprising a supervisory controller and a schematic controller (Figure 1.2A). The supervisory controller keeps track of progress toward a higher-level goal, monitoring performance and handling exceptions. The schematic controller executes a set of proceduralized actions organized as a unitary schema. Further, this control system (Figure 1.2B) provides a framework with not only a temporal order but also a nested structure, enabling the control modules to be mapped onto the neural substrates of hierarchical behavior (Badre, 2008).
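The division of labor between the two controllers can be made concrete with a toy control loop: a schematic controller steps through a proceduralized schema, while a supervisory controller tracks progress and intervenes only when a step produces an exception. This is a hypothetical sketch of the Desrochers et al. (2015) scheme, not their implementation; the step and exception names are invented for illustration.

```python
def run_sequence(schema, execute_step, is_exception, handle_exception):
    """Toy sequential-control loop: the schematic controller executes each
    proceduralized step; the supervisory controller monitors progress and
    takes over only when a step raises an exception."""
    progress = []                      # supervisory controller: progress tracking
    for step in schema:                # schematic controller: run the schema
        outcome = execute_step(step)
        if is_exception(outcome):      # supervisory controller: exception handling
            outcome = handle_exception(step, outcome)
        progress.append((step, outcome))
    return progress

# Hypothetical usage: tightening four legs in diagonal order, with one slip.
schema = ["leg 1", "leg 3", "leg 2", "leg 4"]
log = run_sequence(
    schema,
    execute_step=lambda s: "slipped" if s == "leg 2" else "tightened",
    is_exception=lambda o: o == "slipped",
    handle_exception=lambda s, o: "re-tightened",
)
print(log)
```

In this caricature, a well-practiced (automatic) sequence never triggers the supervisory branch, whereas a supervised sequence engages it whenever monitoring detects a deviation.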

The prefrontal cortex (PFC) has long been suspected to play an important role in cognitive control, the ability to orchestrate thought and action in accordance with internal representations (Miller and Cohen, 2001). In particular, PFC “top-down” processes serve as an internal signal to guide high-level cognitive functions, such as working memory (WM), abstract rules, or goal-directed decision-making. Several models have described how task information, including supra- and superordinate information, is organized in prefrontal cortex (Duncan, 2001; Sigala et al., 2008), but it remains unclear precisely how this cognitive information maps onto neurophysiological functions. To explore these issues, the two projects in my dissertation use sequential behavioral tasks that are based on classic theoretical frameworks in reinforcement learning (RL) and working memory (WM). We then examined how PFC neurons encoded task-related information across these different cognitive tasks. Our findings suggest that PFC neurons encode not only basic information associated with external stimuli, but also high-level information that is used to organize task-relevant behaviors.

1.1 Outline of thesis

The thesis consists of four chapters beyond this one. Chapter 2 describes the procedures and equipment used across experiments. Each of the two experiments is then described in full. The first experiment, in Chapter 3, addresses the effect of mnemonic strategies on the tuning of prefrontal neurons in a visuospatial WM task with self-ordered sequential responses. The second experiment, in Chapter 4, addresses the neural basis of pseudo-reward prediction errors in an RL task with a hierarchical structure. Lastly, Chapter 5 closes with conclusions for each experiment and future directions.
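For readers unfamiliar with the term, a pseudo-reward prediction error can be written in the standard temporal-difference form used in hierarchical RL: attaining a subgoal delivers a pseudo-reward that is treated like a reward within the subtask, and the error is the difference between received and predicted value. This is a generic textbook sketch with assumed numerical values, not the specific model fit in Chapter 4.

```python
def td_error(reward, value_next, value_current, gamma=0.9):
    """Temporal-difference prediction error: delta = r + gamma * V(s') - V(s)."""
    return reward + gamma * value_next - value_current

# Within a subtask, reaching the subgoal yields a pseudo-reward even though
# no primary reward is delivered; the error is computed identically.
pseudo_reward = 1.0   # assumed pseudo-reward for subgoal attainment
delta = td_error(pseudo_reward, value_next=0.0, value_current=0.4)
print(delta)          # positive: the subgoal was worth more than predicted
```

A positive pseudo-reward prediction error would thus signal that subtask progress exceeded expectations, the kind of signal the ACC neurons described above could in principle carry.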

In the remainder of this chapter, we introduce concepts for understanding PFC and its role in reinforcement learning and working memory. We begin with background on the anatomy of the three PFC regions, starting with lateral PFC (LPFC), then orbitofrontal cortex (OFC), and anterior cingulate cortex (ACC). We also introduce specific subsections of these regions, which are the subsequent foci of our experiments. Next, we provide contemporary theoretical frameworks for studying reinforcement learning and working memory. Specifically, we explain the challenges both frameworks confront when their predictions are applied to a series of actions, which constitute the sequential organization of humans' and animals' naturalistic behaviors. We then describe the functionality of these brain areas in relation to sequential behaviors.

1.2 Prefrontal cortex (PFC)

PFC is located at the front of the frontal lobe, rostral to the adjacent premotor cortex, from which it is divided approximately by the arcuate sulcus in the monkey (Barbas and Pandya, 1989). Identifying homologies between rodents and primates is difficult, a problem exacerbated by the lack of sulci in rodents. One method for defining PFC across species is to locate the projection zone of the mediodorsal nucleus of the thalamus (Rose and Woolsey, 1948). Using this classification, PFC comprises approximately thirty percent of the human neocortex, a larger proportion of the cortex than in any other species (Fuster, 1997). PFC is the slowest brain region to develop relative to other cortical areas and is only fully mature after a person enters their twenties (Giedd et al., 1999; Fuster, 2001). Anatomically positioned for diverse information integration, PFC regions connect with cortical and subcortical motor areas, virtually all sensory areas, and midbrain and limbic structures associated with the processing of reward, emotion, and memory (Barbas and Pandya, 1989; Fuster, 1997; Miller and Cohen, 2001). There are three gross subregions in PFC (Figure 1.3): orbitofrontal cortex (OFC), medial PFC, which includes the anterior cingulate cortex (ACC), and lateral areas (LPFC), and these areas are highly interconnected (Barbas and Pandya, 1989). The following sections discuss the medial, lateral, and orbital PFC subregions in more detail, including functionality as well as morphology across species and reciprocal connectivity within the brain.

1.2.1 Lateral prefrontal cortex (LPFC)

Covering the lateral surface of the anterior end of the brain in monkeys and humans, LPFC has several subregions: areas 9 and 46 dorsally and 44, 45, and 47/12 ventrally. These areas are flanked by the frontopolar cortex (area 10) anteriorly and the frontal eye fields (area 8) posteriorly (Petrides and Pandya, 1999). This thesis will focus on areas 9 and 46, abbreviated LPFC, in the macaque monkey. This region is homologous to the human middle frontal gyrus (Petrides and Pandya, 1999). Because rodent PFC is agranular, it is difficult to identify a homologous region in rodents. However, Uylings and colleagues claim that dorsomedial shoulder regions of the rat PFC are similar to the dorsolateral portion of PFC in primates (Uylings et al., 2003).

LPFC receives sensory and motor input and is well positioned to integrate complex stimulus information (Petrides and Pandya, 1999). Polysensory information enters from the superior temporal sulcus. Visual signals related to object recognition and location are received from inferotemporal (IT) cortex and V5/MT, respectively. Abstract visual information, including that related to body position in visual space, is projected to LPFC from posterior parietal cortex. Although LPFC is not directly connected to primary motor areas, it is interconnected with premotor cortex and the frontal and supplementary eye fields, conveying high-level motor planning and preparatory activity rather than low-level muscle control commands. In contrast to its strong sensorimotor connections, LPFC has little direct limbic connectivity, although it can process memory-related information via reciprocal connections with the hippocampus. Limbic information is also received from interconnections with OFC.

Petrides and Pandya (1999, 2002) have shown that there are anatomical subdivisions in lateral prefrontal cortex such that the ventral and dorsal regions of dorsolateral PFC, as defined by the dorsal and ventral lips of the principal sulcus, connect differentially with distinct regions of parietal and premotor cortices and the cerebellum (Hoshi, 2006). Evidence has been accumulating to support functional divisions within LPFC as well (Koechlin et al., 2003; Hoshi, 2006; Badre, 2008; Badre et al., 2010; Hampshire et al., 2011). Using a simple behavioral task in which monkeys were presented with visual cues instructing specific motor responses and then, following a short delay, visual choice cues were shown at various locations so that the animals could choose an action to achieve a positive outcome, Yamagata et al. (2012) characterized the differential contributions of neurons in dorsal and ventral LPFC as well as the dorsal premotor area. Dorsal LPFC seemed to encode higher-level information about the behavioral goal across the delay, whereas neurons in ventral LPFC encoded stimulus features of the cues as well as spatial information specifying the action to be taken, informing the differentiation of information processing along the perception-action hierarchy underlying goal-directed behavior. Another study examined the activity of single cells in LPFC during a cue-target association task. Here, Sigala et al. (2008) showed a hierarchical representation in neural activity whereby the responses of single cells in each phase of a task do not predict their responses during other phases and instead show orthogonal activity, for example cue information in one phase and target information in another.

In addition to physiology studies in non-human primates, functional neuroimaging (fMRI) and lesion studies in humans suggest that there may be a hierarchical functional organization within the frontal cortices, whereby the most posterior regions control direct, concrete motor responses and information becomes progressively more abstract, goal-oriented, and context-dependent anteriorly. Koechlin et al. (2003) conducted an fMRI study confirming that the increasing cognitive demands of sensory, contextual, and episodic information engage premotor to caudal to rostral prefrontal regions accordingly. Badre and D'Esposito (2007) subsequently devised an fMRI experiment manipulating competition at four levels of abstraction, from simple motor responses to contextual cue-to-dimension mappings, showing an increasing reaction-time gradient in addition to hemodynamic activation along the rostro-caudal axis, corresponding with a cognitive representational hierarchy. Furthermore, patients with stroke-related lesions along the rostrocaudal axis show predictable deficits in the same task (Badre et al., 2009).

All PFC areas appear to contain neurons that encode reward-related information (Rangel and Hare, 2010; Wallis and Kennerley, 2010). However, in contrast to ACC and OFC, the reward information encoded by LPFC seems to be more highly processed. For example, LPFC neurons are able to predict reward values by distinguishing stimuli on a categorical basis, independent of visual properties (Pan et al., 2008). Reward information in LPFC also interacts with working memory processes, for example, by increasing the precision with which information is stored in working memory (Kennerley and Wallis, 2009a). LPFC neurons also show properties that would be useful for allowing reward to control hierarchical behaviors. For example, neurons in LPFC seem to maintain a trace of previous choice activity for several trials in the past in order to maintain an average rate of reward, which would be useful for controlling temporally extended behaviors (Seo et al., 2007). In addition, LPFC may play a role in allowing higher-level goals to suppress lower-level reward information. For example, Rangel and colleagues showed in a functional magnetic resonance imaging (fMRI) study that LPFC was activated when dieting humans were presented with unhealthy food options (Hare et al., 2009). The LPFC signal correlated with the strength of the value signal in ventromedial PFC, suggesting that LPFC may have worked as a mechanism to suppress the value information elicited by the unhealthy food.

In summary, LPFC displays many properties that make it ideally suited for the control of hierarchical behavior. Neuronal signals are frequently temporally extended, which may enable the representation of superordinate behaviors. In addition, reward signals in LPFC show the capacity to interact with high-level cognitive representations. However, at this stage, these ideas remain speculative; the current project aims to determine the precise contribution of LPFC.

1.2.2 Orbitofrontal cortex (OFC)

OFC lies on the ventral surface of the PFC, directly above the eye orbits of the skull. OFC includes Brodmann areas 10, 11, 12, 13, and 14. Along the anterior-posterior axis, the cytoarchitecture of the tissue goes from granular (anterior) to dysgranular to agranular (posterior), depending on the prominence of granule cells in layer IV (Morecraft et al., 1992). Though human OFC shares a similar cytoarchitectonic organization (Mackey and Petrides, 2010), rat OFC lacks clear homologous areas and is instead described anatomically as ventrolateral OFC, lateral OFC, and agranular insular cortex (Ongur and Price, 2000).

OFC has few connections with motor areas; indeed, it is the PFC subregion that is most poorly connected to the motor system. However, there are some connections between ventral premotor cortex and areas 12 and 13, and there are interconnections between OFC and LPFC that may indirectly affect motor behavior (Carmichael and Price, 1995b). While these connections suggest a limited role in motor processing and execution, OFC has many more connections with sensory areas. Area 12 receives visual information from inferotemporal and perirhinal cortex, and regions in the superior temporal cortex relay polysensory input (Carmichael and Price, 1995b; Kondo et al., 2005). In addition, areas 12 and 13 receive tactile information about the face, hand, and forelimb from the anterior intraparietal area and secondary somatosensory cortex (Mountcastle et al., 1975; Petrides and Pandya, 1984), taste information from insula and opercular cortex, and olfactory information from the pyriform cortex (Carmichael and Price, 1995b). In addition to the vast amount of sensory information it receives, OFC also receives an array of signals related to emotion and reward. Amygdala, hippocampus, temporal pole, entorhinal, perirhinal, parahippocampal, and cingulate cortices all connect with areas 11, 13 and 14 (Carmichael and Price, 1995a). Overall, though OFC receives relatively little motor information, the area is very rich in limbic and sensory information.

OFC appears to play an important role in value-based decision-making. For example, patients with damage to OFC exhibit choices that are inconsistent with their subjective preferences (Camille et al., 2011b). OFC neurons respond to valuable stimuli in the environment (Rolls, 1996; Schultz et al., 2000; Wallis, 2007, 2012), encode both positive and negative expected outcomes (Morrison and Salzman, 2009), reflect the value of one reward relative to others (Padoa-Schioppa and Assad, 2006, 2008), and dynamically associate with each choice alternative (Rich and Wallis, 2016). Such signals may underlie the role OFC plays in decision-making. Further, the strong limbic input to OFC, as well as its strong connections with all sensory modalities, places it in an ideal location for learning stimulus-outcome associations. OFC lesions impair Pavlovian conditioning in rats but leave instrumental conditioning unaffected (Ostlund and Balleine, 2007). Although intact on a range of neuropsychological exams, patients with OFC damage are unable to cope with stimulus-outcome reversals, such that they continue responding in favor of a once-rewarding stimulus even though it may no longer be rewarding (Rolls et al., 1994). Although much of the recent literature has focused on the role of OFC in encoding reward information, there is also evidence that it plays a role in more cognitive processing. For example, OFC neurons are able to encode high-level, abstract rule information (Wallis et al., 2001). In addition, although working memory processes are more commonly ascribed to LPFC than to OFC (Bechara et al., 1998), recent evidence has shown that OFC neurons can hold information about rewards in working memory (Lara et al., 2009). Wilson et al. (2014) propose that OFC integrates multisensory perceptual input from cortical and subcortical areas, together with previous information such as stimuli, choices, and rewards, to determine the current state. The current state represents a position in a cognitive map, an abstract label summarizing a multitude of task information. This representation is especially critical when task states include unobservable information, for instance, information held in working memory.

In summary, OFC has both the anatomical connections as well as the functional proper- ties to make an important contribution to processing reward information and using reward information to guide behavior, and those contributions could include enabling reward infor- mation to interact with high-level cognitive processes.

1.2.3 Anterior cingulate cortex (ACC)

ACC is located in the cingulate sulcus on the medial wall of PFC and consists largely of area 24. In humans, ACC is comparable to that of monkeys with one difference: human area 24 in the cingulate sulcus has noticeably larger pyramidal neurons in layer V (Nimchinsky et al., 1996). Medial PFC in rats is more rudimentary than that of monkeys and humans, with simpler cell morphology and fewer divisions (Ongur and Price, 2000). Most primate electrophysiology studies of ACC focus on the region surrounding the portion of the cingulate sulcus anterior to the genu of the corpus callosum (Matsumoto et al., 2003). This is, in part, because this region is most accessible for electrophysiological studies: it lies close to the surface of the brain and is sufficiently lateral to avoid any accidental contact with the sagittal sinus. This is also the region that we will focus on in the current thesis. ACC has strong connections with limbic and motor-related areas. Of the limbic areas, the amygdala, a key structure for processing affective value and emotion, connects strongly with all parts of ACC (Carmichael and Price, 1995a). ACC is also the region of frontal cortex with the heaviest dopaminergic input (Williams and Goldman-Rakic, 1993). As for motor connectivity, ACC connects strongly to the area immediately posterior to it, the cingulate motor area (CMA), which has direct projections to the spinal cord controlling movements of the arm and leg (Dum and Strick, 1991, 1996). CMA is also connected with the supplementary motor area (SMA), and together their spinal projections make up 40% of all corticospinal projections in the frontal lobe (Dum and Strick, 1996). In contrast, ACC has few connections with sensory areas (Carmichael and Price, 1995b). Recent findings using diffusion tensor imaging have shown that ACC has the same pattern of connections in the human as in the monkey (Croxson et al., 2005).

Consistent with its connectivity to limbic areas, ACC neurons encode information about many different aspects of rewards, including their size, probability of delivery and how much work was required to earn the reward (Kennerley et al., 2009), as well as information about negative outcomes (Sallet et al., 2007; Seo and Lee, 2009). Although ACC and OFC share similar limbic connections, and similar responsivity to rewards, their pattern of connections suggests that they are part of two very separate networks performing different functions. Areas in the medial wall tend to connect with one another, but only have weak connections with areas in OFC, while areas in OFC tend to connect with one another, but only have weak connections with areas in the medial wall (Figure 1.4). These anatomical findings have led to the suggestion that there are two distinct limbic networks in frontal cortex (Carmichael and Price, 1996): the medial network and the orbital network.

Given the existence of these two networks, there has been speculation as to what the difference is in their function. One possibility, which would be consistent with the anatomical connections of the two networks, is that OFC is important for associating stimuli with the rewarding outcomes they predict while ACC is important for associating actions with rewarding outcomes (Rushworth et al., 2007). Related ideas have been proposed in the decision-making literature. OFC is argued to be responsible for assigning values to sensory

stimuli in the environment, thereby enabling the organism to efficiently make choices between different goods. This decision space is argued to be independent of the action necessary to acquire those goods. In contrast, ACC is argued to calculate the value of actions by integrating information about the action and the object to which the action is directed. There has been considerable debate about whether decision-making occurs solely in the realm of the goods space (Wunderlich et al., 2010; Padoa-Schioppa, 2011), the action space (Kawagoe et al., 1998; Platt and Glimcher, 1999; Roesch and Olson, 2003), or requires the two systems to operate in parallel (Cisek and Kalaska, 2010; Luk and Wallis, 2013).

Although there is neuropsychological evidence to support these distinctions in humans (Camille et al., 2011a), monkeys (Rudebeck et al., 2008) and rats (Balleine and Dickinson, 1998; Pickens et al., 2003; Ostlund and Balleine, 2007), the evidence at the single-neuron level is more mixed. OFC neurons in monkeys typically encode the value of predicted outcomes rather than the motor response necessary to obtain the outcome (Tremblay and Schultz, 1999; Wallis and Miller, 2003; Padoa-Schioppa and Assad, 2006; Ichihara-Takeda and Funahashi, 2008; Abe and Lee, 2011), but there have been some notable exceptions (Tsujimoto et al., 2009). Furthermore, robust encoding of actions has been seen in rat OFC (Feierstein et al., 2006; Furuyashiki et al., 2008; Sul et al., 2010; van Wingerden et al., 2010). With regard to ACC, many studies have emphasized the role it plays in predicting the outcome associated with a given action (Ito et al., 2003; Matsumoto et al., 2003; Williams et al., 2004; Luk and Wallis, 2009; Hayden and Platt, 2010), but there have also been studies showing ACC neurons encoding the rewards predicted by stimuli (Seo and Lee, 2007; Kennerley et al., 2009; Cai and Padoa-Schioppa, 2012).

Further yet, there is substantial and growing evidence implicating ACC in reward prediction error signaling, including data suggesting that separate populations of neurons encode positive and negative errors (Matsumoto et al., 2007; Sallet et al., 2007; Kennerley et al., 2011). In addition to, and in contrast with, standard RPE models in which “good” events trigger positive RPEs and “bad” events trigger negative RPEs, it has been suggested that cellular activity in this region may signal the occurrence of an unexpected outcome positively, and the non-occurrence of an expected outcome negatively, regardless of whether the outcome is affectively positive or negative (Alexander and Brown, 2011). Reward prediction error activity may also depend on task phase or epoch (Sallet et al., 2007; Kennerley et al., 2011) and can occur in response to a stimulus announcing a reward discrepancy before the actual reward has even been dispensed (Sallet et al., 2007). Though the types of information encoded in ACC are diverse and can be somewhat difficult to interpret in light of one another, its heavy dopaminergic input, its connections with limbic and motor areas, and its documented role in action-outcome monitoring leave ACC uniquely poised to convey learning signals in complex behavioral environments, such as the present hierarchical experiment in which reward is manipulated on multiple levels.
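The contrast between these two coding schemes can be made concrete with a toy sketch. The functions and probability values below are illustrative assumptions, not the Alexander and Brown model itself; they only capture the sign conventions described above.

```python
# Toy contrast between a standard signed RPE and a valence-independent
# "occurrence" signal. All values are hypothetical.

def signed_rpe(outcome_value, expected_value):
    """Standard RPE: positive when the outcome is better than expected."""
    return outcome_value - expected_value

def occurrence_signal(occurred, p_expected):
    """Positive for any unexpected occurrence, negative for any unexpected
    non-occurrence, regardless of whether the outcome is good or bad."""
    return (1.0 - p_expected) if occurred else -p_expected

# An unexpected *aversive* outcome: the signed RPE is negative, but the
# occurrence signal is positive because something unexpected happened.
assert signed_rpe(-1.0, 0.0) < 0
assert occurrence_signal(True, 0.1) > 0
# An expected reward that fails to arrive: both signals are negative here,
# but for different reasons (worse than expected vs. unexpected omission).
assert occurrence_signal(False, 0.9) < 0
```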

1.2.4 Closing remarks on anatomy

PFC can be divided into three general regions: lateral, orbital, and medial (Fuster, 1997), each of which has its own specialization. In this thesis, we focus on LPFC, OFC, and ACC. LPFC receives and processes abstract information, suggesting that it plays a higher-order cognitive role in representing, for example, rules and strategies. The first experiment in this thesis will assess the effect of mnemonic strategies on the tuning of lateral prefrontal neurons in a visuospatial WM task with self-ordered sequential responses. Next, neurons in LPFC, OFC, and ACC encode many value-related attributes, such as the potential costs or benefits of a decision. RL processes enable these value signals to be integrated and updated by prediction errors while making choices. The second experiment in this thesis will assess the neural basis of prediction errors in a hierarchical RL task and further identify how error signals operate at the corresponding levels of the task structure.

1.3 The role of PFC in working memory

A recent review, which categorized ninety-seven empirical results, has provided an overview of the storage of working memory contents in multiple brain regions, ranging from sensory to parietal and prefrontal cortex (Christophel et al., 2017). Compared to other sensory and association regions (Figure 1.5), the distributed representations in PFC differ in their level of abstractness and generalizability with respect to task-relevant information in working memory (Miller and Cohen, 2001; Baddeley, 2003; Goldman-Rakic, 2011). This account originates from decades of work showing strong neural activity in PFC during the delay period of working memory tasks (Fuster and Alexander, 1971; Funahashi et al., 1993b; Levy and Goldman-Rakic, 2000; Procyk and Goldman-Rakic, 2006). This delay period activity has two key properties. First, it is specific to the stimulus being remembered, consistent with it containing information about the content of working memory. Second, it only encodes stimuli that are relevant to the task at hand: it is resistant to distractors (Miller et al., 1996; Sakai et al., 2002) and task-irrelevant information is not encoded in working memory (Rainer et al., 1998).

These properties of delay period activity have been observed at the single-neuron level in monkeys as well as on a larger scale in human imaging studies (Courtney et al., 1998; Zarahn et al., 1999; Curtis et al., 2004). In monkeys, single neurons recorded from PFC maintain stimulus information across the delay period, even when distracting stimuli are presented in the middle of the delay (Miller et al., 1996). In humans, multiple studies using various imaging techniques have also shown an increase in delay period activity in PFC. For example, using functional magnetic resonance imaging (fMRI), sustained activation was measured in lateral PFC while subjects kept spatial locations in working memory across delays of several seconds (Courtney et al., 1998). The necessity of PFC delay activity for

working memory is demonstrated by studies showing that lesions to PFC produce strong deficits in working memory tasks both in monkeys (Fuster and Alexander, 1971; Bauer and Fuster, 1976; Funahashi et al., 1993a; Levy and Goldman-Rakic, 2000) and in humans (Muller et al., 2002; Tsuchida and Fellows, 2009; Voytek and Knight, 2010). In addition, disruption of delay period activity with microstimulation increases the rate of errors (Wegener et al., 2008). Furthermore, the longer the delay, the greater the error rate, consistent with a failure of working memory to retain stimulus information. These findings have formed the basis for the prevailing view that PFC is the site where information about the stimulus to be remembered is stored in working memory (D’Esposito and Postle, 2015). However, recently there has been a growing body of work that has cast doubt on this theory (Druzgal and D’Esposito, 2001; Curtis and D’Esposito, 2003; Postle et al., 2003; Ranganath et al., 2004; Sreenivasan et al., 2014a; Sreenivasan et al., 2014b; Postle, 2015).

Although plenty of evidence bears on these ongoing debates, relatively few studies have focused on how PFC encoding interacts with the behavioral strategies used to overcome the limited capacity of working memory (Miller, 1956; Cowan, 2001). Bor et al. (2003) have shown, using functional neuroimaging, that when people employ chunking strategies in a memory task with spatial sequences, behavior improves and LPFC is more active than when no such strategy is used. In this thesis, we will devise a novel sequential working memory task to investigate how strategies and working memory interact at the single-neuron level.

1.4 The role of PFC in reinforcement learning

Reinforcement learning (RL) is a widely used framework that describes how humans and animals take actions to maximize rewards (e.g., payoff or food) in uncertain environments. RL has been extensively studied with both neural evidence and computational models, but the detailed mechanisms that control each learning trial remain unclear. Growing evidence suggests that, when learning to achieve a goal through a complex series of actions, humans often chunk several actions into an option and first evaluate whether the option achieved a specific subgoal. Each goal-directed behavior may consist of several subgoals. Recently, some studies have tried to unravel this puzzle by using hierarchical reinforcement learning (HRL). HRL allows us to understand the neural mechanisms of pursuing subgoals in two ways. First, HRL requires achieving a set of subgoals before completing the goal-directed behavior. Second, it supposes that the subgoals are associated with a special form of reward called pseudo-reward. Pseudo-reward prediction errors (PPEs), which are unique to HRL, can help achieve subgoals, complete planned actions, and acquire final outcomes. However, the neural mechanisms underlying the functions of PPEs remain poorly understood. In the remainder of this section, we introduce the historical perspective of reinforcement learning.

1.4.1 Historical perspective

Reinforcement learning (RL) is one of several approaches that describe how humans and animals act to maximize rewards in uncertain environments (Sutton and Barto, 1998). RL is an extension of the Rescorla-Wagner model, which is arguably the most influential model of animal learning to date. The Rescorla-Wagner model postulates that learning is driven by the discrepancy between what was predicted to happen and what actually happened (Rescorla and Wagner, 1972). However, the Rescorla-Wagner model is constrained in that its unit of learning is the conditioning trial, which is treated as a discrete temporal object; as a result, the model only predicts the immediately forthcoming reward within a single learning trial. To overcome this limitation, RL introduces a learning rule, the temporal difference (TD) model, which accounts for higher-order conditioning so that learning is sensitive to the temporal relationships within a trial (Sutton and Barto, 1998; Niv, 2009). The basic concept of the TD model is to estimate the values of different states or situations in order to predict future rewards or punishments. Based on the discrepancy between predicted and observed values, behavior is guided by positive or negative reward prediction errors (RPEs) that derive from the unexpected appearance or omission of outcomes, respectively. The TD model extends the Rescorla-Wagner model and is supported by findings of neural correlates in the brain’s dopamine system (Schultz et al., 1997).
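The two learning rules can be sketched in a few lines of Python. The three-state trial structure, learning rate, and discount factor below are illustrative assumptions rather than values taken from any study cited here.

```python
import numpy as np

def rescorla_wagner(v, reward, alpha=0.1):
    """One Rescorla-Wagner update: learning is driven by the discrepancy
    between what was predicted and what actually happened on the trial."""
    return v + alpha * (reward - v)

def td_update(values, s, s_next, reward, alpha=0.1, gamma=0.9):
    """One TD(0) update: the prediction target includes the discounted
    value of the next state, so learning is sensitive to temporal
    structure within a trial, not just the final outcome."""
    rpe = reward + gamma * values[s_next] - values[s]  # reward prediction error
    values[s] += alpha * rpe
    return rpe

# Three within-trial states: cue (0), delay (1), reward delivery (2).
# Reward arrives only at the final transition, yet value propagates
# backward, so the cue itself comes to predict the reward.
values = np.zeros(3)
for _ in range(500):
    td_update(values, 0, 1, 0.0)   # cue -> delay, no reward yet
    td_update(values, 1, 2, 1.0)   # delay -> reward delivery
```

Because the TD target includes the value of the next state, value propagates backward from the time of reward delivery to the predictive cue, which is how the model captures higher-order conditioning within a trial.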

The TD model has provided an essential framework for understanding the neural substrates of learning and decision making (Niv, 2009). Previous studies using either single-unit recording in behaving animals (Schultz et al., 1997; Roesch et al., 2007) or functional imaging of human decision-making (McClure et al., 2003; O’Doherty et al., 2003) have revealed the existence of neural signals related to RPEs in the brain. These neural signals were found in the medial prefrontal cortex (Matsumoto et al., 2007; Seo and Lee, 2007; Kennerley et al., 2011), orbitofrontal cortex (Sul et al., 2010), and dorsal striatum (Oyama et al., 2010). To date, however, evidence for the neural correlates of the TD model has come from simple behavioral task settings; it is still unclear whether related principles might apply to more complex behaviors (Dayan and Niv, 2008; Daw and Frank, 2009).

1.4.2 Reinforcement learning and hierarchical behaviors

Recent research has proposed using hierarchical reinforcement learning (HRL) algorithms to fit a set of selected actions and their subgoals, extending the standard application of RL (Botvinick, 2008; Botvinick et al., 2009; Badre and Frank, 2012). Growing evidence suggests that, when learning to achieve a goal through a complex series of actions, humans often chunk several actions into options and evaluate whether the option achieved a specific subgoal (Behrens and Jocham, 2011). The pivotal innovation of HRL consists of two parts. First, HRL requires completion of a set of subgoals before the overall task goal can be achieved; each subgoal represents a stepping stone toward the eventual task goal. Second, it supposes that the subgoals are associated with a special form of reward called pseudo-reward. In the context of HRL, accomplishing a subgoal produces a pseudo-reward, not a primary reward. Whereas the RPE encodes unanticipated changes in the prospects of accomplishing the overall task goal, pseudo-reward prediction errors (PPEs) are unique to HRL and reflect unanticipated changes with respect to the relevant subgoal and its associated pseudo-reward (Botvinick et al., 2009). Although the relevance of HRL to neural function requires further empirical testing, a basic assumption of HRL is that the neural correlates of PPEs should occur in relation to task subgoals. Recently, Ribas-Fernandes et al. (2011) tested this assumption using a video game, the Courier Task, which is a benchmark task from the computational HRL literature (Dietterich, 2000). The overall objective of this task was to complete a “package delivery” as quickly as possible. The participants were instructed that there would be three stimuli (truck, package, and house) on the screen and that they needed to use the joystick to guide the truck first to the package and then to the house. The relationship between these three locations provided a hierarchical structure: the overall goal was to deliver the package to the house, and the subgoal was to guide the truck from the start position to the package. One important manipulation in the Courier Task was the package “jump event”, in which the package moved to a new, unexpected location before the truck reached it. The design of the jump event manipulated three distances: from truck to package (subgoal distance), from package to house (target distance), and the total path (the sum of the subgoal and target distances). According to RL, a jump event should trigger a positive or negative RPE when it decreases or increases the total path, respectively; the same holds for HRL. Furthermore, HRL supposes that a jump event may also trigger positive or negative PPEs. Based on these assumptions, converging data from EEG and fMRI studies suggest that a jump event that increases the subgoal distance without changing the total path elicits a negative PPE, and that this neural signal may be generated in the dorsal anterior cingulate cortex (ACC) (Ribas-Fernandes et al., 2011). These results imply that negative PPEs may be involved in the error-monitoring process, because error monitoring has been linked to the feedback-related negativity (FRN), a brain signal also generated in ACC (Holroyd and Coles, 2002).
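Under these assumptions, the sign of each error signal after a jump event follows directly from how the jump changes the three distances. The helper function and the example distances below are hypothetical, intended only to illustrate the logic of the Courier Task manipulation.

```python
# Sketch of how a "jump event" maps onto RPE and PPE signs, assuming
# the prediction errors scale with changes in path length.

def jump_signals(old_subgoal_d, old_target_d, new_subgoal_d, new_target_d):
    """Return (rpe_sign, ppe_sign) triggered by a package jump.

    RPE tracks the change in total path (overall goal); PPE tracks the
    change in the subgoal distance (truck-to-package) alone.
    """
    rpe = (old_subgoal_d + old_target_d) - (new_subgoal_d + new_target_d)
    ppe = old_subgoal_d - new_subgoal_d   # shorter subgoal leg -> positive PPE
    sign = lambda x: (x > 0) - (x < 0)
    return sign(rpe), sign(ppe)

# The critical condition: the subgoal distance increases while the total
# path is unchanged, so RL predicts no RPE but HRL predicts a negative PPE.
assert jump_signals(3.0, 4.0, 5.0, 2.0) == (0, -1)
```

A jump that shortens both the subgoal leg and the total path would instead yield a positive RPE and a positive PPE, which is why the path-preserving jump is the condition that dissociates the two accounts.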

Previous studies suggest that the FRN is generated by a high-level, generic error-processing system. This system can detect errors in general and use them to improve performance in a given task. In addition, this error-processing system is highly flexible across a wide variety of contexts and is associated with the executive control processes mediated by the frontal areas of the brain (Holroyd and Coles, 2002). Most of these results were obtained using the Eriksen flanker task (Eriksen and Eriksen, 1974). In this classic behavioral task, participants are directed to focus their attention on a letter in the center of each stimulus array. They are then required to respond quickly with one hand by pressing a button whenever they see a congruent stimulus, and with the contralateral hand whenever they see an incongruent stimulus (Gehring et al., 1993). The demands required to generate correct responses in this task are so simple that the task captures optimal error processing in the brain when unexpected errors happen. Therefore, it may be necessary to re-verify this behavioral function of ACC in error processing by manipulating the levels of processing with a hierarchical behavioral task (Behrens and Jocham, 2011).

To test the hypothesis that ACC neurons signal pseudo-reward prediction errors (PPEs) when unexpected events occur during hierarchical behavioral tasks, our study must accomplish three goals. First, we must design an appropriate HRL behavioral task that can be combined with neurophysiological methods. Second, we must determine how neural firing in ACC signals PPEs in response to unexpected events. Third, we must reassess the behavioral functions of ACC in error processing in the context of a hierarchical behavioral task.

Figure 1.1: Hierarchical representation of a routine sequential task. From Humphreys and Forde, 1998.

Figure 1.2: (A) Flow of control in one step of a sequential task, with blue representing the increased involvement of supervisory control and red representing increased involvement of schematic control during a single step. (B) Representation of the multiple, hierarchical levels that can characterize sequences. Each step in more concrete motor sequences or more abstract task sequences may engage supervisory or schematic control and the interaction between them. From Desrochers et al. (2015).

Figure 1.3: The medial, lateral, and orbital surfaces of the prefrontal cortex. Monkey outlines taken from Carmichael and Price (1996) and Petrides and Pandya (1999); human outline taken from Ongur and Price (2000).

Figure 1.4: Medial and orbital networks of the prefrontal cortex taken from Carmichael and Price (1996). Adapted and re-printed with permission from Wallis (2012).

Figure 1.5: Overview of Content-Specific Activity during Working Memory Delays in the Macaque (Left) and Human (Right) Brain. Icons indicate persistent stimulus-selective activity for each stimulus type indicated by the icon (see legend) at the respective locations. Both left- and right-sided effects are shown on the left hemisphere. A full list of individual studies is reported in the supplemental information of Christophel et al., 2017. Brain areas are identified by abbreviations: AC, auditory cortex; ERC, entorhinal cortex; EVC, early visual cortex; FEF, frontal eye fields; FG, fusiform gyrus; hMT+, human analog to MT/MST; IPS, intraparietal sulcus; IT, inferior temporal cortex; LOC, lateral occipital complex; lPFC, lateral prefrontal cortex; PM, premotor cortex; PPC, posterior parietal cortex.

Chapter 2

General Methods

2.1 Overview

This chapter will describe the methods used in performing all the experiments in this thesis. Specific details related to each experiment will be included in the chapter describing that experiment.

2.2 Behavioral training materials and methods

2.2.1 Subjects

We used the same two male rhesus monkeys (Macaca mulatta) for both experiments, which were spaced two years apart. At the time of the first set of neurophysiological recordings, subject R was 6 years old and weighed 9 kg, and subject Q was 5 years old and weighed 7 kg. Subjects were housed in pairs as part of a 6-member colony living within a large room. They were fed twice a day and received daily environmental enrichment. They experienced a 13-hour light cycle that began each day at 7 am. The subjects’ fluid intake was regulated so as to maintain motivation on the tasks. All procedures were in accordance with the National Institutes of Health guidelines and the recommendations of the University of California Berkeley Animal Care and Use Committee.

2.2.2 Behavioral training

We trained the subjects to perform the behavioral tasks using positive reinforcement. Sitting in front of a video monitor, subjects used either saccadic eye movements (for the WM task) or joystick movements (for the HRL task) to choose from the available visual stimuli in order to obtain a liquid reward consisting of 50% water and 50% apple juice. The subject continued to do this until he had received as much reward as he wanted. Each subject’s daily water consumption, which included juice rewards from the tasks and supplemental water given in the colony room, was monitored, and we ensured that subjects received at least 300 ml of fluid per day. Once a subject had learned the task, we started the recording sessions. Training was typically carried out five or six days a week.

2.2.3 Materials and methods for training

Subjects performed the tasks seated in a primate chair facing a 19-inch LCD computer screen, with a viewing distance of 43 cm for the WM task and 50 cm for the HRL task. A system of computers controlled the display of behavioral events (Figure 2.1). These computers utilized MonkeyLogic (http://www.MonkeyLogic.net/), a toolbox running in conjunction with MATLAB (http://www.mathworks.com/products/matlab/), for the design and execution of psychophysical tasks. The central MonkeyLogic control computer sent commands via a COM port to a receiving computer, which then presented the visual stimuli on an LCD monitor. The stimuli were mirrored via a video splitter onto a third screen in the sound-attenuating box in which the monkey sat. By mirroring the display, we could monitor exactly what the subject saw without disturbing him as he worked. The MonkeyLogic control computer ran timing routines and interfaced with various external devices via a PCI-6229 data acquisition (DAQ) card (National Instruments, Austin, TX). Each behavioral event in the trial was marked with a code that was sent as an 8-bit number from the DAQ card to the multichannel acquisition processor (MAP). The MAP system read in this number and recorded a timestamp of when it occurred along with its value. This timestamp was stored along with the neurophysiological data in a single ‘.plx’ data file. The MonkeyLogic control computer ran a single interrupt routine that triggered every 1 ms, updating a software clock and sampling all data lines. Thus, the control of the behavioral contingencies, the presentation of stimuli, and the monitoring of behavioral events all took place with 1 ms resolution.
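The event-marking scheme can be illustrated with a brief sketch. The code names and numeric values below are hypothetical, and the in-memory list stands in for the MAP hardware that actually stored the (timestamp, code) pairs in the ‘.plx’ file.

```python
# Illustrative sketch of 8-bit event codes paired with millisecond
# timestamps, as described above. All code values are hypothetical.

EVENT_CODES = {
    "trial_start": 9,
    "stimulus_on": 23,
    "choice": 40,
    "reward": 48,
}

def mark_event(log, name, t_ms):
    code = EVENT_CODES[name]
    assert 0 <= code <= 255, "code must fit the 8-bit DAQ output port"
    log.append((t_ms, code))  # timestamp resolution is 1 ms

log = []
mark_event(log, "trial_start", 1000)
mark_event(log, "reward", 2500)
```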

Actions of the subjects were registered using a 4-TPS-E1 Touch Sensor Module (Crist Instrument, Damascus, MD) connected to the digital input port of the DAQ card. The touch sensor was a contact sensitive device designed to send a 5 V TTL pulse when a grounded subject touched it. Actions were executed by the subject via custom-made joysticks that connected to the Touch Sensor Module. The joystick was constrained, such that the subject could only make left and right movements.

Eye position was recorded using an infrared eye-monitoring system (ISCAN, Burlington, MA). An infrared camera was focused on the subject’s eye, and the results were visualized using proprietary image-tracking software. The software tracked the center of the subject’s pupil as x and y coordinates, as well as the pupil diameter. These three measures were fed separately to three DAQ analog input channels and recorded for the duration of the session.

Juice rewards were delivered by commands from the DAQ digital output ports to the juice pump and a particular juice’s corresponding solenoid valve. The ISMATEC-IPC-8 peristaltic juice pump (ISMATEC SA, Glattbrugg, Switzerland) delivered fluid at 0.62 ml/s for the duration of a 5 V TTL pulse, through polymer tubing ending in a custom-made mouthpiece positioned near the animal’s mouth.

2.3 Neurophysiological techniques

2.3.1 Isolation of recording sites

To determine the exact location for performing neuronal recordings, we began by placing recording chambers over the brain areas of interest. Magnetic resonance images (MRIs) of the subject’s brain were taken at the U.C. Davis Center for Imaging Sciences with a 1.5 T scanner prior to the animal’s arrival at Berkeley. Those digital images were then imported into commercial graphics software (Adobe Illustrator Creative Cloud, San Jose, CA), where we calculated stereotactic coordinates for the chambers. We verified the correspondence between the MRI scans and electrode placement by mapping the position of sulci and the boundaries of gray and white matter during recording sessions. Upon completing data collection, we plotted the electrode positions and sulci and created an unfolded cortical map of our recording locations and positioning of task-selective neurons.

The WM task required access to LPFC in both hemispheres. Both subjects were implanted with one recording chamber over each hemisphere, through which we could reach both sides of LPFC. In subject Q, the left chamber was positioned over the left hemisphere centered at 34 mm anterior of the interaural line (i.e., AP 34) and 14 mm lateral of the mid-sagittal plane (i.e., LM 14). The chamber was vertical in both the AP and LM axes. The right chamber was over the right hemisphere centered at AP 34 mm and LM 12 mm on the skull and angled at 20◦ tilted laterally from the vertical. In subject R, the left chamber was over the left hemisphere centered at AP 29 mm and LM 11 mm, and was vertical in both the LM and AP axes. The right chamber in subject R was over the right hemisphere centered at AP 30 mm and LM 14 mm on the skull and angled at 24◦ tilted laterally from the vertical (Figure 2.2A).

For the HRL task, we recorded from more brain areas in each subject. In subject R,

the left chamber was positioned to allow recording from ACC and LPFC. This chamber was centered at AP 33 mm and LM 14 mm on the skull and angled at 25◦ tilted laterally from the vertical. The right chamber was positioned to allow recording from OFC. It was centered at AP 31 mm and LM 11 mm and was vertical in the AP and LM axes. In subject Q, the right chamber was positioned to allow recording from ACC and LPFC. This chamber was centered at AP 34 mm and LM 12 mm and angled at 20◦ tilted laterally from the vertical. The left chamber was positioned to allow recording from OFC and was centered at AP 34 mm and LM 14 mm, and was vertical in the LM and AP axes (Figure 2.2B).

2.3.2 Surgery

Each subject underwent a surgery to implant a custom-made titanium head-positioning post secured with titanium orthopedic screws. This post kept the subject’s head immobile so that we could track the subject’s eye position and keep electrodes in the brain during neuronal recording. Subsequently, we performed two operations, one to implant cylindrical titanium recording chambers onto the skull above the brain areas of interest, then another to create the craniotomy through which recording electrodes could enter the brain. The chambers were secured via bone cement and orthopedic titanium screws. Chambers were covered with a cap (made of either polypropylene CILUX from Crist Instrument, Hagerstown, MD, or custom-built polyetherimide Ultem®).

Before surgery, we anesthetized the animals with ketamine (10 mg/kg IM). Xylocaine or lidocaine spray (14%) or ointment was used as a local anesthetic to facilitate intubation. During surgery, the surgical team monitored the animal closely to ensure an appropriate level of analgesia. Anesthesia was maintained with isoflurane (2-4%). The level was monitored by heartbeat (90-150 beats/min), respiration rate (17-23 breaths/min), temperature (36-39◦C) and absence of responses to stimuli such as toe pinch. Blood oxygen saturation was also monitored with a pulse-oximeter. A lactated Ringer’s IV drip (2-4 ml/kg/hr) was used during surgery to ensure that the animal received sufficient hydration, and a heating pad and towels were used to maintain body temperature. Following surgery, gas anesthesia was discontinued, and once the animal showed signs of coming out of anesthesia, the intubation tube was removed and buprenorphine (0.01-0.03 mg/kg SC or IM) or morphine (1-2 mg/kg SC) was injected for relief of post-operative pain. Immediately after surgery the animal was checked at least every half hour by members of the lab and veterinary staff, and the time between checks was gradually lengthened as the animal recovered from anesthesia. After initial recovery, the animal was checked several times per day, at which time appropriate analgesics and antibiotics were administered. Typically, buprenorphine (0.01-0.03 mg/kg SC or IM) or morphine (1-2 mg/kg SC) was administered 2-3 times per day for 3-5 days. In general, all appropriate measures were taken to minimize post-operative pain. To ensure adequate control of post-operative pain, the choice of analgesia and frequency of administration were made in consultation with veterinary staff.

2.4 Recordings

We recorded neuronal activity from three brain areas. In ACC, our recordings focused on the dorsal bank of the cingulate sulcus anterior to the corpus callosum. In LPFC, recordings were largely from the dorsal and ventral banks of the principal sulcus. In OFC, recordings were mostly between the medial and lateral orbital sulci. Activity from these areas was recorded simultaneously by lowering multiple tungsten electrodes (FHC, Bowdoin, ME) attached to custom-designed screw microdrives that independently moved pairs of electrodes. The microdrives were mounted to a custom-made plastic grid containing an array of 24-gauge holes spaced 1 mm apart. Handheld cutting reamers were used to adjust the size of grid holes for holding needles. Stainless steel 23-gauge ultra-thin-wall hypodermic needles (Terumo, Somerset, NJ) were glued to the bottom of the grid, such that the beveled tip of the needles pointed out of the grid. These tips served to puncture the dura and guide the electrodes to the desired recording location. Electrodes were lowered manually by handheld screwdriver to depths predetermined from the MRI scans. When an electrode approached a cell layer, we slowed the lowering and stopped when we isolated single neurons. After neurons were located on most or all of the channels, we waited 1-2 hours for the brain to settle, thereby ensuring stability during the recording session, which typically lasted 1.5 hours. Neurons tended to stay well isolated for the full duration of the recording session without much drift. We randomly sampled neurons to ensure a fair comparison of neuronal properties between the different brain regions. During the recording session, no changes to the channels were made, as that would distract the subject from performing the task.

We cleaned the chambers both before and after the recording session to minimize infection. The cleaning began by removing the cap or grid and sterilizing the outside of the chamber with alcohol wipes. Under sterile conditions, the inside of the chamber was flushed with sterile saline, then with Nolvasan or povidone-iodine antiseptic, followed again by sterile saline. The tissue was then dabbed dry with a sterile cotton swab. If cleaning was prior to recording, the plastic grid with microdrives and electrodes was fit on top of the chamber. If cleaning was after recording, a new sterilized cap was placed on top of the chamber. Grids and electrodes were sterilized overnight in Cidex.

2.4.1 Materials and methods for neurophysiology

Voltage signals were taken from the tip of the electrodes with respect to a ground reference, which was the head-positioning post affixed to the subject’s skull. These neuronal signals were recorded and amplified via hardware and software from Plexon Inc. (Dallas, TX), as shown in Figure 2.1. The signal was first amplified 20 times by an op-amp based circuit in the headstage, which connected directly to the electrodes. The signal was then further amplified 1000 times through a preamplifier and filtered for spikes in the 100 Hz - 8 kHz frequency band and local field potentials (LFPs) in the 1 Hz - 300 Hz frequency

band. Signals were next fed into a Multichannel Acquisition Processor (MAP system) for further amplification and filtering.

Spikes and LFPs were digitized at 40 kHz with 12-bit resolution per channel. For the spikes, thresholds were set to ensure that neuronal signals were a minimum of 4 standard deviations above background noise (calculated over a 10 s period immediately prior to recording). When the spike waveform crossed a manually set threshold, the program recorded the timestamp of the threshold crossing, along with the actual waveform in a 1.4 ms window around the time of crossing. Waveforms that did not cross the threshold were discarded. The digitized waveforms were then sorted offline using Offline Sorter software (Plexon Inc., Dallas, TX). This software constructed 2D or 3D plots of a subset of 12 waveform features, including the first three projections from principal components analysis (PCA), peak-valley differences and widths, and waveform energy. From those 2D or 3D plots, clusters of waveforms were grouped together manually and classified as single units (Figure 2.3). We ensured the separation of neuronal waveforms by rejecting channels where more than 0.2% of the waveforms were separated by intervals of less than 1.5 ms or where neuronal ‘drift’ occurred. Typically, approximately 15% of the channels were discarded.
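The threshold-and-capture step can be sketched in Python. This is a hedged illustration with synthetic data; the function, its window lengths, and the demo signal are ours, not the Plexon implementation — only the 4 SD threshold and the 10 s noise estimate follow the text:

```python
import numpy as np

def detect_spikes(signal, fs, noise_s=10.0, n_sd=4, pre_s=0.0004, post_s=0.001):
    """Threshold-crossing detection: the threshold is n_sd standard
    deviations of background noise (estimated from the first noise_s
    seconds), and a short waveform window is cut around each upward
    crossing. Window lengths here are illustrative, not the recorded values."""
    noise = signal[: int(noise_s * fs)]
    thresh = n_sd * noise.std()
    pre, post = int(pre_s * fs), int(post_s * fs)
    # sample indices where the signal crosses the threshold from below
    idx = np.flatnonzero((signal[1:] >= thresh) & (signal[:-1] < thresh)) + 1
    times, waves = [], []
    for i in idx:
        if i - pre >= 0 and i + post <= len(signal):
            times.append(i / fs)
            waves.append(signal[i - pre : i + post])
    return thresh, np.array(times), np.array(waves)

# synthetic demo: bounded noise plus one injected spike at t = 10.5 s
fs = 40_000
rng = np.random.default_rng(0)
sig = rng.uniform(-0.3, 0.3, int(11 * fs))
sig[int(10.5 * fs) : int(10.5 * fs) + 8] += 1.5
thresh, times, waves = detect_spikes(sig, fs)
print(len(times))
```

Bounded (uniform) noise is used in the demo so that the only threshold crossing is the injected spike; with real Gaussian-like noise, occasional noise crossings are expected and are removed at the offline sorting stage.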

2.5 Statistical analysis

We used MATLAB (MathWorks, Natick, MA) to perform all analyses. Analyses were restricted to correct trials, as there were typically too few error trials to permit a meaningful analysis of activity on such trials. To characterize the selectivity of a neuron, we first calculated its mean firing rate in each trial during a defined time epoch. We compared differences in firing rate between experimental conditions against the null hypothesis that the neuron did not encode a given type of information. Once we had classified the neurons according to the type of information they encoded, we assessed differences between brain areas in the prevalence of these different types of neurons using chi-squared tests. We also characterized the strength and earliest latency of encoding in selective neurons. Specific statistical analyses are described in detail in the experimental chapters.

Figure 2.1: Experimental setup used to control behavioral events and record neurophysiological data.

Figure 2.2: Magnetic resonance imaging (MRI) scans showing a coronal slice from the middle of our recording locations, with potential electrode paths. The example on the top is from subject R in the WM task; the example on the bottom is from subject Q in the HRL task. Possible electrode tracks are highlighted in white. Brain areas recorded from are highlighted in red (LPFC), green (ACC), and blue (OFC).

Figure 2.3: Cluster isolation of neuronal waveforms. On the left are 32 channels, many of which have neurons on them, i.e., detected waveforms. A sample channel is enlarged in the middle panel to reveal three distinct waveforms. Those waveforms are decomposed into components in Plexon’s sorting software, and each waveform is plotted as a single dot in the right panel. Clusters of waveforms are then isolated manually and entered into neuronal analyses.

Chapter 3

Spatiotemporal encoding of search strategies by prefrontal neurons

3.1 Introduction

The capacity of working memory was empirically defined by Miller as 7 ± 2 items (Miller, 1956), although more recently this limit has been revised down to four items (Cowan, 2001). A substantial proportion of undergraduates tested by Vogel and Machizawa were only able to remember one or two items (Vogel and Machizawa, 2004). Whatever the exact limit, this seems hard to reconcile with our everyday experience, where we appear to have little problem temporarily remembering numbers of items at or beyond the empirical limit. This is partly due to the rigorous laboratory conditions under which capacity limits are measured, while in everyday life we use a repertoire of behavioral strategies to overcome these limits. For example, dealing a hand of cards to players in random order would necessarily involve a large working memory load, requiring the dealer to keep track of how many cards each player has been dealt. By instead selecting a behavioral strategy, such as dealing the cards to players in a clockwise order, the dealer reduces the working memory load to only a single item (last player dealt a card), enabling the task of dealing cards to be completed at a lower computational cost.

It remains unclear how these high-level strategies interact with the contents of working memory. A likely source for the neural basis of this interaction is the lateral prefrontal cortex (LPFC). Neurons in this area selectively encode task-specific information across intervening delay periods and distractors (Miller et al., 1996), consistent with maintaining information in working memory. Neurons in this region also encode high-level information about rules (Wallis et al., 2001), categories (Freedman et al., 2001) and strategies (White and Wise, 1999). Neuroimaging studies have revealed that LPFC shows greater activity as working memory load increases (Rypma et al., 1999; Emrich et al., 2013). This could simply reflect the increase in task difficulty as working memory capacity is approached, since increasing task demands reliably increase activity in prefrontal cortex (Crittenden and Duncan, 2014). However, when information is organized into a meaningful sequence, task difficulty decreases and behavioral performance improves, while BOLD activity in LPFC remains elevated (Bor et al., 2003). These seemingly contradictory findings suggest that LPFC plays an important role in organizing information in working memory that is orthogonal to the working memory demands of the task.

To investigate how strategies and working memory interact at the single-neuron level, we trained monkeys to perform a spatial self-ordered search task (Petrides and Milner, 1982). This required them to search through an array of targets collecting rewards, necessitating that they track which targets had previously been visited on a given trial. Across different blocks of trials, the monkeys varied in the extent to which they spontaneously adopted a strategy to search through the targets, which enabled us to also examine how strategies interacted with mnemonic coding in LPFC. Once trained, we recorded the activity of single neurons in LPFC during performance of the task. Adoption of a search strategy led to improved behavioral performance and a reduction in the precision with which spatial information was encoded, consistent with a mechanism by which high-level strategies can ameliorate the limited capacity of working memory.

3.2 Methods

We trained two rhesus monkeys to perform an oculomotor spatial self-ordered search task (Fig. 3.1A). Subjects were presented with a display of six targets and were required to select each target in turn. Subjects were allowed to select the targets in any order, but only received a reward the first time that a target was selected per trial. Revisiting a previously selected target resulted in a 500 ms time-out. Thus, subjects had to maintain in working memory which targets had already been selected. Once subjects received six rewards by selecting each target once, the current trial was terminated and an inter-trial interval (ITI) was initiated. The target-reward contingency reset at the beginning of a new trial. Target configurations consisted of 6 identical targets, each 2◦ in diameter and located at one of 36 positions on the visual display (Fig. 3.1B). The subject had to saccade to each target in turn and fixate it for 500 ms. Fixation was defined as maintaining the eye position within ± 3◦ of the center of the target. Any drift out of this window would reset the 500 ms fixation duration.

After the successful fixation of a target, the outcome was determined according to whether or not the selected target had been previously selected on the current trial. First-time selections earned a juice reward, whereas repeat selections triggered a 500 ms time-out (red screen) period. In between each target selection, the targets disappeared and the fixation cue was presented. The subject had to fixate this cue for 1 s before the targets were presented again. To avoid excessive frustration, if the subject experienced more than 10 time-outs in a single trial then the trial was aborted and defined as incomplete. This happened on < 1% of trials. To ensure that subjects realized that they had found all six possible rewards on a given trial, the color of the targets changed from trial to trial in a green-white-blue sequence, while the target configuration remained stable for a block of 40 trials, of which subjects completed 6 per session. The spatial configuration of targets used for each block was pseudorandomly chosen from a pool of 20 a priori designed spatial configurations. The same configuration was presented repeatedly within a block but used only once per session.

Our methods for neurophysiological recording have been reported in detail previously (Lara et al., 2009). Briefly, we implanted both subjects with a head positioner for restraint and a titanium recording chamber over each hemisphere. We recorded acutely and simultaneously from bilateral LPFC populations using arrays of 16-32 tungsten microelectrodes (FHC Instruments). All procedures were in accord with the National Institutes of Health guidelines and the recommendations of the University of California, Berkeley Animal Care and Use Committee. Further methodological details are in the supplementary information.

3.3 Results

3.3.1 Task performance

Subject R completed 15 sessions and subject Q completed 10 sessions, with each session comprising 6 blocks of 40 trials. One block of trials was excluded from subject Q as there were fewer than 20 complete trials in that block. To quantify task performance, we tabulated the number of revisited targets on each trial. Subjects performed the task at a high level (Fig. 3.2A); about 80% of trials from both subjects were completed with 2 or fewer incorrect saccades. The likelihood of making an incorrect saccade tended to increase for later targets in the trial. To some extent, this is intrinsic to the structure of the task: it is easier to revisit a previously selected target when five of six targets have been previously selected than when only one target has been selected. To quantify this, we first calculated the likelihood of making an incorrect saccade collapsed across all target selections, which we termed the total error rate (T). We then calculated the expected error rate (E) using the following equation:

E = T − T × (S / 6)    (3.1)

where S denotes the number of targets that have yet to be selected. Then, in order to test whether the subjects made more incorrect saccades than expected, we calculated the observed error rates for each target selection and used a binomial test to determine whether these were significantly different from E. Subjects made more incorrect saccades in selecting the final target compared to the expected error rate (Fig. 3.2B, binomial test, p < 0.001). We defined the saccadic reaction time (sRT) as the time from the onset of the target configuration to the beginning of the 500 ms hold target epoch (Fig. 3.2B). Incorrect saccades were excluded from the calculation of sRT. The overall median sRT was 210 ms and 221 ms for subjects R and Q, respectively. Further, we calculated sRT separately for each target selection and performed a linear regression using sRT as the dependent variable and the selected target’s position in the selection sequence as a predictor. For both subjects, we found that sRT significantly increased as the number of selected targets increased (Fig. 3.2B; subject R: F(1, 20596) = 2400, r² = 0.10, p < 1×10⁻¹⁸; subject Q: F(1, 12491) = 81, r² = 0.006, p < 1×10⁻¹⁸). These results, as well as the distribution of errors across the target selections, suggest that the working memory demands of the task increased as the number of selected targets increased, and began to exceed the subjects’ working memory capacity by the sixth target selection.
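Equation 3.1 and the binomial comparison can be sketched with the standard library alone. The trial counts in the demo are invented for illustration, not the subjects' data:

```python
from math import comb

def expected_error_rate(T, k, n_targets=6):
    """Eq. 3.1: E = T - T * (S / n_targets), where S is the number of
    targets yet to be selected when making the k-th selection."""
    S = n_targets - (k - 1)
    return T - T * (S / n_targets)

def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): the one-sided probability of
    observing at least k revisits if the true revisit rate were p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

T = 0.15                             # hypothetical overall error rate
E6 = expected_error_rate(T, k=6)     # five of six targets already taken
p = binom_sf(45, 240, E6)            # e.g., 45 revisits in 240 sixth selections
print(E6, p)
```

Note that E is zero for the first selection (no target can have been revisited yet) and grows linearly with the number of already-selected targets, which is why the observed excess of errors on the final selection is tested against E rather than against T.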

3.3.2 Behavioral strategies

We next analyzed the patterns of target selection in order to gain insight into the behavioral strategies that the subjects employed to solve the task. Each of the 20 configurations was pseudorandomly selected 4-6 times for subject R and 3 times for subject Q across the recording sessions. For each block of 40 trials using a specific spatial configuration, we calculated the total number of unique sequences in which the subjects selected the six targets (Fig. 3.1C). Some configurations were associated with fewer unique sequences than others, suggesting that subjects approached these configurations using a more stereotyped strategy. Although some configurations were consistently associated with more stereotyped selection patterns than others, there was considerable variability, with the same configuration frequently exhibiting either more or less stereotyped selection patterns on subsequent presentations to the same subject. Furthermore, there was no relationship between the number of unique sequences and the number of times the configuration had been used in the task (Fig. 3.1C; subject R: F(1, 88) = 1.47, r² = 0.016, p = 0.23; subject Q: F(1, 57) = 1.71, r² = 0.029, p = 0.20). In other words, there was no evidence that repeatedly experiencing the same configuration led to an increase in the use of stereotyped strategies. To quantify the degree to which the subject’s selection strategy was stereotyped, for each block of trials we calculated how often each target was selected by a specific correct saccade (first saccade, second saccade, etc.). We then used this information to construct the pseudocolor matrices shown in Figure 3.3A. The x-axis shows each of the six saccades that the subject completed on each trial. The y-axis shows each of the six targets that the subject selected on each trial, sorted according to the most common sequence in which the targets were selected.

The color scale shows how frequently a given saccade was directed to a particular target. We used these data to calculate a stereotype index (SI), which quantified the extent to which the subject searched through the targets in the same sequence for all 40 trials within a block:

SI = (a − b) / (a + b)    (3.2)

where a is the sum of the entries on the main diagonal of the matrix and b is the off-diagonal sum of the matrix. Figure 3.3B illustrates how the value of the SI increased as the subjects’ pattern of saccades became more stereotyped. Increased levels of stereotyped behavior significantly improved behavioral performance, evident as a reduction in the number of incorrect saccades for both subjects (Fig. 3.3C; subject R: F(1, 88) = 11, r² = 0.11, p < 0.005; subject Q: F(1, 57) = 31, r² = 0.35, p < 1×10⁻⁷).
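A minimal sketch of the SI computation (Eq. 3.2); the function name and the toy selection sequences are ours:

```python
import numpy as np
from collections import Counter

def stereotypy_index(sequences):
    """Eq. 3.2: SI = (a - b) / (a + b). Rows of the count matrix are the
    targets ordered by the block's most common selection sequence; columns
    are saccade positions; a is the diagonal sum, b the off-diagonal sum."""
    n = len(sequences[0])
    # order rows by the modal (most frequent) sequence in the block
    modal = max(Counter(map(tuple, sequences)).items(), key=lambda kv: kv[1])[0]
    row = {target: i for i, target in enumerate(modal)}
    M = np.zeros((n, n))
    for seq in sequences:
        for saccade, target in enumerate(seq):
            M[row[target], saccade] += 1
    a = np.trace(M)
    b = M.sum() - a
    return (a - b) / (a + b)

# fully stereotyped block: every trial uses the same order -> SI = 1
si_stereo = stereotypy_index([(0, 1, 2, 3, 4, 5)] * 40)
# mixed block: 24 trials in one order, 16 in the reverse -> SI = 0.2
si_mixed = stereotypy_index([(0, 1, 2, 3, 4, 5)] * 24 + [(5, 4, 3, 2, 1, 0)] * 16)
print(si_stereo, si_mixed)
```

When every saccade lands on the modal-sequence diagonal, b = 0 and SI = 1; a block of fully random sequences drives the diagonal share toward chance and SI toward negative values, so SI increases monotonically with stereotypy.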

3.3.3 Neurophysiological analysis

We recorded from 1077 LPFC neurons (R: 709, Q: 368), the majority of which were within the principal sulcus (Fig. 3.4). For each neuron, we calculated its mean firing rate during five trial epochs: early fixation, late fixation, selection, hold target and reward. The early and late fixation epochs were defined as the first and second 500 ms of the fixation period, when subjects were required to fixate a central point. The selection epoch consisted of a 500 ms period, starting 150 ms before the onset of the targets and ending 350 ms after the onset. The hold target epoch comprised the 500 ms period of fixation required to select the target. The reward epoch was the first 500 ms of juice delivery following a correct saccade. We used a multiple linear regression model to determine which aspects of the task each neuron encoded. For each neuron, we quantified the extent to which we could predict its mean firing rate in each epoch based on the following predictors: the x- and y-coordinates of each correct selection, the distance from the fixation cue to the target (radius), the saccade number (one through six), and SI (Equation 3.2). We also included a nuisance parameter (trial number) to capture gradual changes in the neuron’s firing rate across the course of the session, for example due to changes in the neuron’s position relative to the recording electrode (‘drift’). Significance was evaluated at p < 0.01. Firing rates and predictors were standardized to allow comparison of regression coefficients.

FR = b1×x + b2×y + b3×Radius + b4×SaccadeNumber + b5×SI + b6×TrialNumber    (3.3)
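The regression of Equation 3.3 can be sketched as ordinary least squares on z-scored variables, which makes the betas directly comparable. This is an illustration with synthetic trials in which only the x-coordinate drives the firing rate, not the analysis code itself:

```python
import numpy as np

def standardized_betas(firing_rate, predictors):
    """Fit FR = b1*x1 + ... + bk*xk after z-scoring the firing rate and
    every predictor, so coefficients are comparable across predictors."""
    z = lambda v: (v - v.mean()) / v.std()
    X = np.column_stack([z(np.asarray(p, float)) for p in predictors])
    y = z(np.asarray(firing_rate, float))
    X = np.column_stack([X, np.ones(len(y))])  # intercept (~0 after z-scoring)
    betas, *_ = np.linalg.lstsq(X, y, rcond=None)
    return betas[:-1]

# synthetic trials: firing rate depends on x-position only
rng = np.random.default_rng(1)
n = 1000
x, y_pos, radius, sacc_num, si, trial = (rng.normal(size=n) for _ in range(6))
fr = 2.0 * x + 0.1 * rng.normal(size=n)
b = standardized_betas(fr, [x, y_pos, radius, sacc_num, si, trial])
print(np.round(b, 2))
```

With standardized variables, the beta for x approaches the correlation between x and the firing rate, while the betas for the five irrelevant predictors hover near zero.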

Both subjects showed similar patterns of encoding. We observed the maximum amount of encoding during the hold target epoch (Fig. 3.5A). Although most selective neurons encoded the spatial position of the target, the firing rates of many neurons were also affected by SI. However, the degree to which the subject was implementing a stereotyped strategy did not consistently affect LPFC neuronal firing rates; neurons were equally likely to increase their firing rate as SI increased as they were to decrease it (Fig. 3.5B).

3.3.4 Encoding of spatial information

Since the majority of neurons were spatially selective in at least one epoch of the task, we examined how spatial selectivity evolved as the animal selected each target in turn. We examined how the neuron’s firing rate, during a given epoch and for a given saccade, was predicted by the spatial position of each of the targets in the sequence. There were 12 predictors, consisting of six pairs of x- and y-coordinates corresponding to the spatial position of each selected target during a given sequence. We defined a neuron as significantly encoding a target’s position if the full model was significant and the beta associated with either the x- or y-position of a target was significant at p < 0.01. Figure 3.6A illustrates how spatial selectivity for the targets changed as the sequence was completed. The most prevalent encoding, present in 35% of LPFC neurons in subject R and 20% in subject Q during the hold target epoch, was the spatial position of the currently selected target, evident on the main diagonal of the pseudocolor matrices. However, many neurons were also selective for targets that were not currently being selected (colors on the off-diagonals of the pseudocolor matrices), reflecting targets that had either been previously selected (lower left of the matrix) or would be selected in the future (upper right of the matrix). Which of these targets was maximally encoded evolved over the course of trial events. During early fixation, the most recently selected target was encoded most strongly, particularly in subject R, and then spatially selective activity began to reflect the current target, peaking during the hold epoch. To quantify these effects, we summed the absolute beta coefficients for the x- and y-position of each target as a function of its position in the sequence, relative to the current target (Fig. 3.6B). In both subjects, LPFC neurons encoded upcoming targets with stronger selectivity than those that had been previously selected.
We next examined how preceding and upcoming targets were encoded at the level of single neurons. One possibility is that there may be two groups of LPFC neurons, tuned according to whether they represent events retrospectively or prospectively (Funahashi et al., 1993b; Rainer et al., 1999). To determine this, for each of selected targets 2 through 5, we calculated the percentage of neurons that encoded the immediately preceding target, the percentage that encoded the immediately upcoming target, and the percentage that encoded both targets. We focused on the hold epoch, since this was the epoch in which neurons were maximally spatially selective. In both subjects, rather than separate groups of neurons encoding past and future targets, there appeared to be a single population of spatially selective neurons that showed spatial tuning for multiple targets within the sequence, both retrospective and prospective (Fig. 3.8).

3.3.5 Effects of behavioral strategy on spatial encoding

Many LPFC neurons showed a different degree of spatial encoding depending on whether the subject was performing the task using a stereotyped strategy. Figure 3.7A shows two examples, one from each subject. In both cases, the neurons showed stronger encoding of spatial information on blocks of trials in which the animal was not using a stereotyped strategy. To explore this effect at the population level, we first identified spatially selective neurons as those which significantly encoded the target’s x- and/or y-coordinate based on the full regression model. We again focused on the hold target epoch since this was when we observed the maximum amount of spatial selectivity. There were 740 spatially selective neurons (R: 529, Q: 211). For each neuron, we then applied a reduced regression model to each block of trials to quantify spatial tuning:

FR = b1×X + b2×Y (3.4)

We then plotted the percentage of variance in the neuron’s firing rate that could be explained by the spatial position of the targets as a function of SI, calculated from the r² of the reduced model. There was a significant negative correlation between the r² and the magnitude of the SI in both subjects (Fig. 3.7B; subject R: r = −0.047, p < 0.01; subject Q: r = −0.097, p < 0.001). Thus, when subjects performed the task using a more stereotyped strategy, LPFC neurons encoded less spatial information. We also examined whether this effect was evident within the same spatial configuration. We sorted each configuration according to the SI each time the configuration was used for a block of trials. We then grouped them into three ranks: low, medium and high SI. For subject R, there were six repetitions of each configuration, so two blocks contributed to each group. For subject Q, there were three repetitions of each configuration, so one block contributed to each group. This recoding of the data ensured that any differences we observed in spatial tuning as a function of SI could not reflect stimulus differences in the spatial configurations, since we had removed the variance between the configurations with regard to SI. This was a less powerful analysis, since we were converting our continuous measure to a ranked measure, so we pooled the neurons from both subjects into a single analysis. Spatial tuning decreased for the same configuration display when the subject adopted a more stereotyped search strategy (Fig. 3.7C). Thus, even when comparing within the same spatial configuration, spatial tuning was reduced when the subject adopted a more stereotyped strategy.
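The reduced model's r² (Eq. 3.4) for a single block can be computed as below; this is a sketch under our own naming, with one perfectly tuned and one untuned synthetic block:

```python
import numpy as np

def spatial_r2(firing_rate, x, y):
    """Fraction of firing-rate variance explained by the reduced model
    FR = b1*X + b2*Y (Eq. 3.4), fit by least squares with an intercept."""
    X = np.column_stack([x, y, np.ones(len(x))])
    fr = np.asarray(firing_rate, float)
    beta, *_ = np.linalg.lstsq(X, fr, rcond=None)
    resid = fr - X @ beta
    return 1.0 - resid.var() / fr.var()

rng = np.random.default_rng(2)
x, y = rng.normal(size=100), rng.normal(size=100)
tuned = 3.0 * x - 2.0 * y + 5.0      # perfectly spatially tuned block
untuned = rng.normal(size=100)       # no spatial tuning
r2_tuned, r2_untuned = spatial_r2(tuned, x, y), spatial_r2(untuned, x, y)
print(r2_tuned, r2_untuned)
```

Computing this r² once per block and correlating it with the per-block SI (e.g., `np.corrcoef(r2_per_block, si_per_block)[0, 1]`) reproduces the structure of the population analysis described above.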

Although neurons showed less spatial tuning when subjects followed a more stereotyped search strategy, the direction of the tuning remained constant. To quantify this, for each neuron we plotted the distribution of firing rates during the hold epoch, sorted as a function of the spatial position of the selected target (Fig. 3.7D). The firing rates were further grouped according to whether they were from the three blocks associated with a lower SI or the three blocks associated with a higher SI within a given session. We calculated the difference, d’, between the mean response vector for the low and high SI distributions. Some neurons were either not spatially selective at all, or were selective only in the high or only in the low SI blocks, in which case the d’ measure was uniformly distributed. This is because the direction of the response vector would be random for the non-selective blocks, since it would be driven largely by noise in the firing rate. However, for neurons that encoded space in both high and low SI blocks, we observed that d’ clustered around 0, consistent with neurons showing the same direction of tuning in both types of block (Fig. 3.7E).
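As a sketch of this response-vector comparison: assuming each neuron's hold-epoch firing rates have been averaged by target direction, the mean response vector can be computed as a rate-weighted circular mean, and d' as the wrapped angular difference between the low- and high-SI vectors. The population-vector formulation and the function names are illustrative assumptions, not the actual analysis code.

```python
import numpy as np

def mean_response_vector(rates, angles):
    """Firing-rate-weighted mean direction across target angles (radians)."""
    rates = np.asarray(rates, dtype=float)
    x = np.sum(rates * np.cos(angles))
    y = np.sum(rates * np.sin(angles))
    return np.arctan2(y, x)

def tuning_direction_difference(rates_low, rates_high, angles):
    """d': angular difference between low- and high-SI response vectors,
    wrapped to [-pi, pi]; a value near 0 means the preferred direction
    is unchanged even if tuning strength differs."""
    d = mean_response_vector(rates_high, angles) - mean_response_vector(rates_low, angles)
    return (d + np.pi) % (2 * np.pi) - np.pi
```

For a neuron tuned in both block types, d' near zero indicates a stable tuning direction even when the depth of tuning changes between low- and high-SI blocks.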

3.4 Discussion

We recorded the activity of LPFC neurons from two monkeys trained to perform a serial self-ordered search task (Petrides and Milner, 1982; Owen et al., 1990). There were three main findings of interest. First, individual LPFC neurons encoded the spatial location of the current search target, but also encoded other targets, even those several steps away in the search sequence. Second, LPFC neurons were more likely to encode upcoming targets than previously visited targets, although both were encoded well above chance. Finally, behavioral strategies that both monkeys spontaneously employed to help solve the task improved behavioral performance while simultaneously reducing the neuronal load required.

3.4.1 Role of prefrontal cortex in the sequential organization of behavior

Prefrontal cortex is at the apex of the perception-action cycle, responsible for integrating sensory information and structuring behavior over long time-scales (Fuster, 2001). For example, damage to LPFC produces deficits in planning, both in laboratory tests (Shallice, 1982) and everyday behavior (Shallice and Burgess, 1991). Neuroimaging and neuropsychology studies have shown that progressively more anterior regions of the frontal lobe are responsible for integrating behavior across progressively more complex and temporally extended task structures (Badre and D’Esposito, 2007; Badre et al., 2009). Neurophysiological recordings have shown that LPFC neurons encode information about specific actions, but this encoding often depends on how this action is embedded within a sequence of actions (Sugihara et al., 2006). At a more abstract level, population analyses have shown that the main information encoded by prefrontal neurons during the performance of a cognitive task is the sequence of events in the task (Sigala et al., 2008). Prefrontal neurons have also been shown to encode high-level information, such as categories (Freedman et al., 2001) and rules (Wallis et al., 2001).

The above studies are consistent with a role for prefrontal cortex in structuring behavior at a high level. Our current results suggest a mechanism that helps to delineate a more precise role for prefrontal cortex in this process. LPFC neurons were strongly tuned for the spatial locations of targets, but this tuning was not restricted to just the current target; rather, it included previous and upcoming targets, often several steps in the sequence from the current target. Previous studies that have examined the contribution of LPFC neurons to working memory have concentrated on the ability of LPFC neurons to exhibit spatial tuning across intervening delays, so-called mnemonic receptive fields (Fuster and Alexander, 1971; Kubota and Niki, 1971; Funahashi et al., 1989; Constantinidis et al., 2001). Our results extend this concept and suggest that a more complete description of prefrontal tuning is a spatiotemporal receptive field that encompasses both the spatial coordinates of the target behavior as well as its temporal order within the behavioral structure. We also noticed that upcoming actions tended to be represented more strongly than completed responses, which is consistent with previous studies that have noted a bias towards prospective encoding in prefrontal neurons (Funahashi et al., 1989; Rainer et al., 1999).

3.4.2 Contribution of PFC to working memory

The spatial self-ordered search task was developed by Petrides and Milner because patients with prefrontal damage performed remarkably well on short-term memory tests, such as digit span and story recall (Petrides and Milner, 1982; Petrides, 2000). This contrasted with the severe deficits that these patients exhibited on tasks that required planning (Shallice, 1982) or flexible control of behavior (Milner, 1963). Indeed, recent studies have shown that prefrontal damage does not impair classic tasks of spatial working memory, such as the memory-guided saccade task, so long as the frontal eye fields are preserved (Mackey et al., 2016). Petrides has argued that the critical feature of the self-ordered search task that makes it sensitive to prefrontal damage is the requirement to attend to all stimuli within the set and monitor successive choices, in other words, tracking which stimuli have been selected and which have not. The spatiotemporal receptive fields that we observed in LPFC neurons contain precisely the information that would be needed for this process, encoding both the current target of behavior, as well as which stimuli have been and will be selected. Controversy regarding the precise role of prefrontal cortex in working memory has arisen from several lines of convergent evidence (Lara and Wallis, 2015). Neuroimaging studies have shown that delay period activity in PFC does not encode information specific to the stimulus being held in working memory (Curtis and D’Esposito, 2003; Riggall and Postle, 2012), while the converse is true for posterior sensory areas (Ester et al., 2009; Harrison and Tong, 2009; Serences et al., 2009; Emrich et al., 2013). Studies in monkeys have concluded that the role of LPFC in working memory tasks relates more to the control of attention necessary for successful performance of the task than the storage of information per se (Lebedev et al., 2004; Lara and Wallis, 2014; Pasternak et al., 2015).
For example, during a color working memory task (Lara and Wallis, 2012), LPFC neurons encoded spatial signals that related to the locus of covert attention rather than the color of the stimuli that needed to be remembered (Lara and Wallis, 2014). Working memory may be an emergent property along the sensorimotor continuum, consisting of a retrospective attention-based rehearsal mechanism and a prospective motor planning mechanism (Postle, 2006).

What about the deficits of spatial working memory that have been observed in monkeys with prefrontal lesions? These deficits were described as a mnemonic scotoma: a region of the visual field to which memory-guided saccades were impaired, while visually-guided saccades were intact (Funahashi et al., 1993a). However, a reanalysis of these data showed that although initial saccades were frequently to the wrong area of space, they were typically followed by a second, corrective saccade to the correct location (Tsujimoto and Postle, 2012). Furthermore, erroneous saccades were typically to the location that had been relevant on the previous trial. These results are difficult to reconcile with the notion that prefrontal cortex is simply storing spatial information in working memory. However, they are consistent with the notion that prefrontal neurons have spatiotemporal tuning that is responsible for organizing behavior. Thus, the deficit is not in recalling the spatial location of the target, but rather in performing the currently relevant response given the temporal context as defined by the task structure.

3.4.3 Role of PFC in cognitive control

When subjects searched through the targets in a more stereotyped way, their behavioral performance improved, despite a drop in the precision with which spatial information was encoded in LPFC. Our previous studies looked at the effects of reward on LPFC spatial tuning (Kennerley and Wallis, 2009a). If a reward-predictive cue was presented prior to a memorandum, then LPFC neurons showed stronger spatial tuning and behavioral performance improved. However, presenting the reward-predictive cue after the memorandum was distracting. Under these conditions, LPFC neurons showed weaker tuning and behavioral performance declined. Thus, behavioral performance correlated with the strength of LPFC spatial tuning. Taken together with our current results, this shows that the prefrontal representation of space is highly dynamic and the precision of the stored information can change in response to the cognitive demands of the task. However, our current results contrast with this previous study, in that a loss of spatial precision was correlated with an improvement in behavioral performance.

One possibility is that using a stereotyped strategy to search through the targets recruits other brain regions more involved in sequence learning, thereby reducing the need for prefrontal working memory mechanisms (Desrochers et al., 2015a). Neuroimaging studies have shown a shift from a prefrontal-cerebellar network to a premotor-striatal network as a motor sequence becomes more stereotyped and performance more automatic (Doyon et al., 2002; Floyer-Lea and Matthews, 2004). Neural recordings in the supplementary motor area have revealed neurons that encode specific sequences of movements, while inactivation of this area impairs the performance of a motor sequence, but not the execution of individual movements (Shima and Tanji, 1998). One advantage of automation with skill development is that it frees up attentional resources for other tasks, which would be consistent with a drop in prefrontal tuning, if LPFC is indeed responsible for the allocation of these resources.

In sum, our data show that prefrontal coding of information relevant to working memory can be dynamically modulated according to task demands. Consequently, the implementation of behavioral strategies to reduce performance demands can change the information content of prefrontal neurons. This has implications for the leading computational accounts of working memory, which are described as bump attractors, whereby persistent population codes are maintained through a combination of local recurrent excitation and broader feedback inhibition (Compte et al., 2000; Wimmer et al., 2014). If these population codes can be modulated by task demands, this would pose challenges for downstream areas attempting to read out the information. One possible solution to this problem, which was developed to accommodate temporal variability in neural activity (Druckmann and Chklovskii, 2012; Murray et al., 2017), is to use a subspace of the population that maintains the stability of information in working memory, despite variability in the remainder of the population. Future research should examine whether such a mechanism can also account for changes in the working memory representation evoked by variable task demands.

Figure 3.1: (A) Spatial self-ordered search task. (B) Each configuration consisted of 6 targets (green filled circles), which were selected from 36 possible locations (gray filled circles). We ensured that targets were approximately balanced across the display by requiring that the centroid of the configuration (red cross) was located within a ±3◦ fixation window (black circle). Inter-target distances were at least 6◦ to avoid overlap of the eye position detection windows around the targets. (C) Number of unique sequences of target selection per configuration.
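The sampling constraints described in the caption can be sketched with simple rejection sampling. The 6 × 6 grid geometry (6° spacing spanning ±15°) is an assumption made for this illustration; the actual display layout of the 36 candidate locations is not specified here.

```python
import numpy as np
from itertools import combinations

# Hypothetical 6 x 6 grid of 36 candidate locations (6 deg spacing); the real
# display geometry is an assumption of this sketch.
GRID = np.array([(x, y) for x in np.linspace(-15, 15, 6)
                 for y in np.linspace(-15, 15, 6)])

def sample_configuration(rng, n_targets=6, centroid_max=3.0, min_spacing=6.0):
    """Rejection-sample a configuration: the centroid must fall within the
    fixation window and all inter-target distances must be at least
    min_spacing degrees."""
    while True:
        idx = rng.choice(len(GRID), size=n_targets, replace=False)
        targets = GRID[idx]
        if np.linalg.norm(targets.mean(axis=0)) > centroid_max:
            continue  # centroid outside the +/-3 deg window
        if min(np.linalg.norm(a - b)
               for a, b in combinations(targets, 2)) < min_spacing:
            continue  # two targets too close together
        return targets
```

Rejection sampling is a natural fit here because both constraints are cheap to check and most random draws from a balanced grid satisfy them.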

Figure 3.2: Behavioral performance. (A) Distribution of the number of incorrect saccades per trial. (B) Observed and expected error rates plotted as a function of which saccade in the sequence was performed. The observed error rate was significantly higher than the expected error rate for the last saccade in both subjects (binomial test, ∗∗∗p < 0.001). (C) Reaction times as a function of saccade. Boxplots indicate the 25th, 50th, and 75th percentiles of the sRT distribution. Red dots indicate the mean sRT.

Figure 3.3: (A) For each block of trials, we calculated how many times a particular target was selected by a particular saccade. We then plotted this data according to the most common sequence by which targets were selected. (B) Representative blocks, illustrating how the pseudocolor plots changed as a function of the stereotype index. Blocks above the line are from subject R, while blocks below the line are from subject Q. (C) Increased values of the stereotype index were correlated with better behavioral performance (fewer incorrect saccades) in both subjects.

Figure 3.4: Recording locations. (A) MRI of a coronal slice through the frontal lobe of subject R. Red region in each hemisphere denotes the area of the LPFC investigated. White lines depict electrode paths. (B) We measured the anterior-posterior position from the interaural line (x-axis), and the lateral-medial position relative to the lip of the ventral bank of the principal sulcus (0 point on y-axis). Gray shading indicates unfolded sulci. The diameter of the circles indicates the number of recordings from a given location. SA = superior arcuate sulcus; IA = inferior arcuate sulcus; P = principal sulcus.

Figure 3.5: (A) Percentage of selective neurons that encoded different predictors in each epoch. Black circles indicate the percentage of selective neurons, defined as those that encoded at least one predictor during the epoch. (B) The proportion of beta coefficients with positive (+) or negative (−) values during the hold epoch.

Figure 3.6: (A) Color axis indicates the percentage of neurons that encode the spatial position of each of the targets in the sequence (x-axis), during the epochs associated with selecting a specific target in the sequence (y-axis). Entries on the main diagonal represent neurons that encoded the current target’s spatial information (a significant beta associated with either the x- or y-coordinate). Entries on the lower left diagonals represent neurons that encoded the spatial position of previously selected targets. Entries on the upper right diagonals represent neurons that encoded the spatial position of upcoming targets. Top and bottom panels are from subjects R and Q, respectively. (B) Mean amount of spatial information encoded by spatially selective neurons as a function of saccade number relative to the current saccade. Saccades related to previously selected targets are negative and upcoming saccades are positive. Dashed line indicates chance levels, determined by randomly shuffling the assignment between neural firing rates and spatial position within the sequence.

Figure 3.7: (A) Single neuron examples showing the effects of the stereotyped strategy on spatial tuning. Spike density histograms are plotted from two neurons, one recorded in subject R (left two panels) and one recorded from subject Q (right two panels). The color of the plots refers to the position of the selected target on the screen, as shown in the key. Plots on the top are from the three blocks in the session with the higher SI, whereas the plots on the bottom are from the three blocks with the lower SI. Less spatial tuning is observed when the animal is searching through the targets using a more stereotyped strategy. (B) Spatial tuning, as measured by the r2 from the reduced model, as a function of SI. Stronger spatial tuning occurs in blocks with less stereotyped behavior.

Figure 3.7 (Continued): (C) Spatial tuning from the same configuration sorted from low to high SI, and then grouped into three ranks. Spatial tuning decreases for the same configuration display when the subject adopts a more stereotyped search strategy (r = −0.0654; p < 5×10−5). (D) Polar plot showing the distribution of standardized mean firing rates from the neuron in the left panel of (A), corresponding to the selected target location during the hold epoch. Firing rates are sorted according to whether they are from blocks of low or high SI. Gray and black lines indicate the mean response vector for low and high SI blocks, respectively. The difference between these two response vectors (d’) was then calculated. (E) Distribution of d’ values for neurons that were spatially tuned in either the low or high SI blocks only, or in both types of block.

Figure 3.8: Encoding of spatial information for targets immediately preceding and/or following the currently selected target. In subject R, 40.9% of neurons encoded preceding targets compared to 44.2% that encoded upcoming targets. The percentage of neurons that encoded both preceding and upcoming targets was 32.4%, which was significantly greater than we would have expected by chance if these were two independent populations (chance = 18.1%, binomial test, p < 5×10−5). Likewise, in subject Q, 30.1% of neurons encoded preceding targets, 30.9% encoded upcoming targets and 21.7% encoded both (chance = 9.3%, binomial test, p < 1×10−12).

Chapter 4

Neuronal encoding in prefrontal cortex during hierarchical reinforcement learning

4.1 Introduction

Reinforcement learning (RL) is one of the most influential learning models to date, and has had a dramatic impact on both artificial intelligence (Mnih et al., 2015) and our understanding of neural computation (Schultz et al., 1997). RL uses discrepancies between expected and actual reward outcomes to drive learning (Sutton and Barto, 1998). This discrepancy, known as a reward prediction error (RPE), is encoded by midbrain dopamine neurons (Schultz et al., 1997; Hollerman and Schultz, 1998) and is thought to underlie how animals and humans learn behaviors necessary to acquire rewards from the environment (Dayan and Niv, 2008; Lee et al., 2012). RPE-related neural signals are also found in lateral prefrontal cortex (LPFC) (Asaad and Eskandar, 2011) and anterior cingulate cortex (ACC) (Kennerley et al., 2011). However, RL suffers from a problem of scaling (Botvinick et al., 2009). While it performs well in relatively constrained learning environments, as the number of environmental states and actions increases, the amount of sampling required by the agent, and hence the amount of training time needed to acquire a behavior, scales as a positively accelerating function. Thus, an environment can quickly become too complex for RL to be a feasible learning solution.
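The RPE computation described above can be illustrated with a minimal TD(0) value update; the dictionary-based value table and the learning-rate and discount values are illustrative choices, not parameters from the studies cited.

```python
def td_update(V, s, r, s_next, alpha=0.1, gamma=0.9):
    """One temporal-difference step: the reward prediction error (RPE) is
    the discrepancy between received and expected reward, and it drives
    the value update for the current state."""
    rpe = r + gamma * V.get(s_next, 0.0) - V.get(s, 0.0)
    V[s] = V.get(s, 0.0) + alpha * rpe
    return rpe
```

Repeated delivery of an initially surprising reward shrinks the RPE as V(s) converges, mirroring how dopaminergic RPE signals diminish for fully predicted rewards.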

Computational studies have proposed modifications to conventional RL models in order to allow them to accommodate the more complex, hierarchical behavioral structure that is typical of the real world (Sutton et al., 1999). Instead of reinforcing individual actions, hierarchical reinforcement learning (HRL) allows the chunking of actions into more temporally abstract behaviors, referred to as ‘options’. Each option terminates when a particular subgoal is attained, which generates an option-specific prediction error, referred to as a pseudo-reward prediction error (PPE). For example, when making a cup of coffee, one option might be adding milk, but individual actions that contribute to that option (e.g. getting the milk out of the fridge, opening the milk carton) would contribute solely to the PPE rather than the RPE generated by drinking the coffee.
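A toy sketch of the option-level computation: each option maintains its own value function and receives pseudo-reward only when its subgoal terminates the option. The function signature and state labels are illustrative assumptions.

```python
def pseudo_rpe(V_option, s, s_next, subgoal, pseudo_r=1.0, gamma=0.9):
    """Pseudo-reward prediction error (PPE) for an option's internal value
    function: pseudo-reward is delivered only when the option's subgoal is
    reached, which also terminates the option (no further bootstrapping)."""
    terminated = (s_next == subgoal)
    target = pseudo_r if terminated else gamma * V_option.get(s_next, 0.0)
    return target - V_option.get(s, 0.0)
```

In the coffee example, the PPE would update the milk-fetching option's internal values without touching the top-level values driven by the true reward of drinking the coffee.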

The notion that complex behavior is organized hierarchically also has a long history in neuroscience. Hughlings Jackson, for example, emphasized the notion that the frontal lobe represented behaviors in a hierarchical manner (Phillips, 1973). Neuroimaging and neuropsychology studies have shown that progressively more complex behaviors are controlled by progressively more anterior regions of prefrontal cortex (Koechlin et al., 2003; Badre and D’Esposito, 2007; Badre et al., 2009). Recent efforts have focused on determining the neural substrates of the algorithmic processes derived from computational theories of HRL (Ribas-Fernandes et al., 2011; Badre and Frank, 2012; Frank and Badre, 2012; Holroyd and McClure, 2015). However, to date there has been little attempt to study HRL at the level of individual neurons, which could provide insights into the specific computations performed by prefrontal neurons that support HRL. Therefore, we trained two monkeys to perform a primate version of a task that has been used in humans to study HRL (Ribas-Fernandes et al., 2011). The task required performing a sequence of lever movements in order to move a stimulus from a start position to a goal position, by way of an intermediate subgoal position. On a fraction of trials the position of the subgoal changes, thereby generating a PPE. In the human version of the task, the BOLD response in ACC positively correlated with the magnitude of the PPE. To examine whether this information was encoded at the level of single neurons, we recorded the electrical activity of single neurons in LPFC, ACC and orbitofrontal cortex (OFC) while animals performed the HRL task.

4.2 Methods

4.2.1 Subjects and behavioral task

Two male rhesus monkeys (Macaca mulatta) served as subjects (Q and R). Subjects were 5 and 6 years of age, and weighed approximately 7 and 9 kg at the time of recording. We regulated the daily fluid intake of our subjects to maintain motivation on the task. Subjects sat in a primate chair and viewed a computer screen. We used the MonkeyLogic system (Asaad and Eskandar, 2008) to control the presentation of the stimuli and the task contingencies. Eye movements were tracked with an infrared system (ISCAN). All procedures were in accord with the National Institutes of Health guidelines and the recommendations of the University of California at Berkeley Animal Care and Use Committee.

Our behavioral task has previously been used to measure pseudo-reward prediction errors in humans (Ribas-Fernandes et al., 2011). The delivery task requires subjects to take the perspective of a delivery driver who has to choose between two jobs, each involving picking up a package (the subgoal) and delivering it to a customer (the goal). After the subject selects one of the jobs, the position of the package sometimes changes, which generates a pseudo-reward prediction error. We trained two animals to perform a version of this task (Fig. 4.1A). Subjects were required to fixate a central cue to initiate a trial, after which two stimulus configurations appeared on the left and right of the screen. Each configuration consisted of three colored dots, which represented the start position (green), subgoal position (white), and goal position (blue). Subjects selected one of the configurations with a joystick movement. Once one of the two configurations was chosen, the other one disappeared and the subject had to make a series of joystick movements back-and-forth between the center location and the chosen side in order to move the green dot step-by-step from the start position to the goal position via the subgoal position. Each joystick movement moved the green dot 1◦ of visual angle. A juice reward was delivered once the green dot reached the goal position. The optimal choice was to select the shortest route, since this would lead to reward more quickly and with less physical effort. The start and goal positions in each original configuration were placed on the circumference of a circle 8◦ of visual angle in diameter. This circle was not visible to the animal. We manipulated two variables in each configuration: total steps (TS), the number of steps from the start position to the goal via the subgoal, and subgoal steps (SG), the number of steps from the start position to the subgoal.
We also calculated the straight-line distance (SD), defined as the distance in degrees of visual angle in a straight line from the start position to the goal.
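Given (x, y) coordinates in degrees for the start, subgoal, and goal positions, the three configuration variables could be derived as below; treating each 1° joystick step as 1° of path length is an assumption consistent with the task description.

```python
import math

def configuration_variables(start, subgoal, goal):
    """Return (TS, SG, SD) for one configuration. Positions are (x, y) in
    degrees of visual angle; each joystick movement advances the dot 1 deg."""
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])
    sg = dist(start, subgoal)        # SG: steps from start to subgoal
    ts = sg + dist(subgoal, goal)    # TS: start -> subgoal -> goal
    sd = dist(start, goal)           # SD: straight-line start -> goal
    return ts, sg, sd
```

Because TS and SD are computed from the same triangle, configurations with a nearly collinear subgoal have TS close to SD, which is relevant to the correlation between these variables reported in the Results.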

Once the animals had been trained on the choice task, we implanted the neurophysiological recording equipment and recorded neural activity. During recording sessions, only 10% of the trials were choice trials. The other 90% of trials, which we collectively refer to as ‘jump’ trials, began with the presentation of a single stimulus configuration for 500 ms in the center of the screen (pre-jump configuration), followed by a second configuration (post-jump configuration) for 500 ms. On 56% of the jump trials, the post-jump configuration contained no new information, either because it was identical to the pre-jump configuration, or because the subgoal changed position but remained the same distance from the start and goal positions (Fig 4.1B, ‘mirror’ condition). On the other 44% of the jump trials, the post-jump configuration generated a pseudo-reward prediction error (because the distance from the start position to the subgoal changed) and/or a reward prediction error (because the total number of steps to the goal changed). The fixation cue then changed color, indicating to the subject whether they should make rightward or leftward joystick movements in order to move the green dot to the goal position. Table 4.1 describes the different combinations of experimental conditions and Figure 4.1B illustrates example configurations.

4.2.2 Neurophysiological procedures

Our methods for neurophysiological recording have been reported in detail previously (Lara et al., 2009). Briefly, we implanted both subjects with a titanium head positioner for restraint and one recording chamber over each hemisphere, the position of which was determined using a 1.5 T magnetic resonance imaging (MRI) scanner. One recording chamber was positioned at an angle to allow access to LPFC and ACC, and the other was a vertical chamber to allow access to OFC. We recorded simultaneously from LPFC, ACC, and OFC using arrays of 6-14 tungsten microelectrodes (FHC Instruments). We determined the approximate distance to lower the electrodes from the MRI scans and advanced the electrodes using custom-built, manual microdrives until they were located just above the cell layer. We then slowly lowered the electrodes into the cell layer until we obtained neuronal waveforms, which were digitized and analyzed off-line (Plexon Instruments). We randomly sampled neurons; we did not attempt to select neurons based on responsiveness. This procedure aimed to reduce any bias in our estimate of neuronal activity, thereby allowing a fairer comparison of neuronal properties between the different brain regions. We reconstructed our recording locations by measuring the position of the recording chambers using stereotactic methods. We plotted the positions onto the MRI sections using commercial graphics software (Adobe Illustrator). We confirmed the correspondence between the MRI sections and our recording chambers by mapping the position of sulci and gray and white matter boundaries using neurophysiological recordings. We traced and measured the distance of each recording location along the cortical surface from the lip of the ventral bank of the principal sulcus. We also measured the positions of the other sulci in this way, allowing the construction of unfolded cortical maps.

4.2.3 Statistical methods

Behavioral data analysis. We conducted all statistical analyses using MATLAB (Mathworks). All data for behavioral analyses were from the choice trials. To determine how the parameters of the stimulus configurations affected choice behavior, we performed a formal model comparison. We predicted that configurations with fewer total steps should be considered more valuable than configurations with more steps, and consequently should be chosen preferentially by the animals. We expected the position of the subgoal to have a smaller or negligible influence on choice behavior. We also included the straight-line distance between the start and goal position, since this provided a complete description of the triangular arrangement of start, subgoal, and goal positions. We tested logarithmic transformations of the distances, in addition to linear distances, since we have previously observed a better fit between visual stimuli and reward value using logarithmic transformations (Rich and Wallis, 2014). We used these parameters to estimate the subjective value (SV) of the left and right choice options:

SVL = 1 − w1×TSL − w2×SGL − w3×SDL (4.1)

SVR = 1 − w1×TSR − w2×SGR − w3×SDR (4.2)

where TS is the total steps from the start position to the goal position by way of the subgoal, SG is the distance from the start position to the subgoal position, and SD is the straight-line distance between the start and goal position. We then fit a logistic regression model using the discounted values (SVL − SVR) to predict PL, the probability that the subject chose the left configuration. We included a bias term, b, which accounted for any tendency of the subject to select the leftward configuration that was independent of the configurations’ values:

PL = 1 / (1 + e^(w4×(SVL−SVR) − w5×b)) (4.3)

We estimated the weights of each parameter in the model by determining the values that minimized the negative log likelihood of the model. To fit the weights (w1 to w5), we used maximum likelihood fitting (the “fmincon” function in MATLAB) to find the set of parameters that best predicted the experimental data. To obtain fitted weights, we ran the maximum likelihood fitting function 100 times for each of 10 different randomly determined initial weights and then calculated the mean of the fitted weights. We compared models using Akaike’s Information Criterion (AIC) (Akaike, 1974). Our other behavioral measure was the lever movement time, which we defined as the time taken to move the joystick from the center position to the chosen side and then back again following the movement of the green dot.
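The fitting pipeline might be sketched as follows, substituting scipy's `minimize` for MATLAB's `fmincon`. The array layout (trials × [left, right] columns) and treating the bias regressor b as a constant 1 are assumptions of this sketch.

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(w, TS, SG, SD, chose_left):
    """Negative log likelihood of Eqs. 4.1-4.3. TS/SG/SD are (n_trials, 2)
    arrays with the left and right options in columns; w = (w1..w5)."""
    sv = 1.0 - w[0] * TS - w[1] * SG - w[2] * SD      # Eqs. 4.1 and 4.2
    dv = sv[:, 0] - sv[:, 1]                          # SV_L - SV_R
    p_left = 1.0 / (1.0 + np.exp(w[3] * dv - w[4]))   # Eq. 4.3 with b = 1
    p_left = np.clip(p_left, 1e-9, 1.0 - 1e-9)        # guard the log
    return -np.sum(np.where(chose_left, np.log(p_left), np.log(1.0 - p_left)))

def fit_and_aic(TS, SG, SD, chose_left, w0):
    """Fit the weights by maximum likelihood; score with AIC = 2k - 2 ln L."""
    res = minimize(neg_log_likelihood, w0, args=(TS, SG, SD, chose_left))
    return res.x, 2 * len(w0) + 2 * res.fun
```

As in the text, a full fitting procedure would rerun this from multiple random initial weights and average the fitted values; lower AIC then favors the better model after penalizing for the number of free parameters.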

Neural data analysis. All data for the neural analysis were from the jump trials. We visualized single neuron activity by constructing spike density histograms. We calculated the mean firing rate of the neuron across the appropriate experimental conditions using a sliding window of 100 ms. We then analyzed neuronal activity in two predefined epochs of 50-500 ms each, corresponding to the presentation of the pre- and post-jump configurations. For each neuron, we calculated its mean firing rate on each trial during each epoch. To determine whether a neuron encoded an experimental factor, we used linear regressions to quantify how well the experimental manipulation predicted the neuron’s firing rate. Before conducting the regression, we standardized our dependent variable (i.e., firing rate) by subtracting the mean of the dependent variable from each data point and dividing each data point by the SD of the distribution. We evaluated the significance of selectivity at the single neuron level using an alpha level of p < 0.05. We examined how neurons encoded information about the pre-jump configuration by performing a linear regression on the neuron’s mean firing rate (F) during the pre-jump configuration presentation:

F = b0 + b1×SV + b2×LR (4.4)

where SV denotes the subjective value of the pre-jump configuration calculated according to the weights derived from our behavioral model, and LR was a dummy variable that indicated whether the start position was to the left or right of fixation. Selective neurons were defined as those in which Equation 4.4 significantly predicted the neuron’s firing rate (F-test evaluated at p < 0.05) and one or more of the beta coefficients was significant (coefficient t-test evaluated at p < 0.05). We examined how neurons encoded the post-jump configuration by performing a linear regression on the neuron’s mean firing rate (F) during the post-jump event with six predictors:

F = b0 + b1×SV + b2×LR + b3×pRPE + b4×nRPE + b5×pPPE + b6×nPPE (4.5)

RPE = SVpost − SVpre (4.6)

PPE = w2×(SGpost − SGpre) (4.7)

SV and LR are defined as for Equation 4.4. The other four predictors represented positive (p) or negative (n) reward prediction errors (RPE) and pseudo-reward prediction errors (PPE). Selective neurons were then defined in the same way as for Equation 4.4.
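The four error regressors can be sketched from Equations 4.6 and 4.7; splitting each signed error into rectified positive and negative components is an assumption about how the four predictors were constructed.

```python
def error_predictors(sv_pre, sv_post, sg_pre, sg_post, w2):
    """Build the pRPE/nRPE/pPPE/nPPE regressors for Eq. 4.5 from the pre-
    and post-jump subjective values and subgoal distances (Eqs. 4.6, 4.7).
    The rectified positive/negative split is an illustrative assumption."""
    rpe = sv_post - sv_pre          # Eq. 4.6
    ppe = w2 * (sg_post - sg_pre)   # Eq. 4.7
    return {'pRPE': max(rpe, 0.0), 'nRPE': max(-rpe, 0.0),
            'pPPE': max(ppe, 0.0), 'nPPE': max(-ppe, 0.0)}
```

Separating the two signs allows the regression to detect neurons that respond to only one direction of prediction error, which a single signed regressor would average away.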

To quantify the strength of neural encoding, for each neuron, we calculated the coefficient of partial determination (CPD) for each parameter. This is the amount of variance in the neuron’s firing rate that can be explained by one predictor over and above the variance explained by other predictors included in the model. The CPD for predictor i is defined as:

CPDi = (SSEX−i − SSEX) / SSEX−i (4.8)

where SSEX−i is the sum of squared errors in a regression model that includes all of the relevant predictor variables except i, whereas SSEX is the sum of squared errors in a regression model that includes all of the relevant predictor variables.
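Equation 4.8 can be sketched in a few lines, assuming a design matrix with a constant column and one column per predictor (an illustrative implementation, not the authors' code):

```python
import numpy as np

def cpd(y, X, i):
    """Coefficient of partial determination for predictor column i (Eq. 4.8).
    y: z-scored firing rates; X: design matrix including a constant column."""
    def sse(design):
        # sum of squared errors of the least-squares fit
        beta, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ beta
        return resid @ resid
    sse_full = sse(X)                            # SSE_X: all predictors
    sse_reduced = sse(np.delete(X, i, axis=1))   # SSE_{X-i}: refit without i
    return (sse_reduced - sse_full) / sse_reduced
```

Because the reduced model is refit from scratch, the CPD credits predictor i only with variance that no combination of the remaining predictors can absorb.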

To examine the time course of the contribution of each predictor, we performed a “sliding” regression analysis to calculate the CPD at each time point for each neuron. We fit each regression model (Equation 4.4 for the pre-jump configuration and Equation 4.5 for the post-jump configuration) to neuronal firing for overlapping 200 ms windows, beginning with the 200 ms immediately prior to the task epoch and then shifting the window in 10 ms steps until we reached the end of the task epoch. To determine the threshold for significant selectivity, we performed an analogous analysis on neural activity during the fixation epoch. At this stage of the task, the subject has no information about the upcoming configuration, and so any neurons that reached criterion must have done so by chance. We determined the maximum CPD value for each neuron during the fixation period, and used the 95th percentile of this distribution of values as our criterion for selective encoding of the predictors during the pre- or post-jump configurations. We calculated the neuron’s latency to encode selective information as the time at which the CPD values exceeded the criterion for three consecutive time bins (evaluated at p < 0.005).
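The chance threshold and the three-consecutive-bins latency rule can be sketched as follows (hypothetical function names; assumes CPD traces have already been computed for each 10 ms bin):

```python
import numpy as np

def chance_criterion(fixation_cpds):
    """Chance threshold: 95th percentile of each neuron's maximum CPD
    during the fixation epoch (rows = neurons, cols = time bins)."""
    return np.percentile(fixation_cpds.max(axis=1), 95)

def encoding_latency(cpd_trace, criterion, n_consec=3):
    """Index of the first time bin at which the CPD exceeds the chance
    criterion for n_consec consecutive bins (None if never reached)."""
    above = cpd_trace > criterion
    for i in range(len(above) - n_consec + 1):
        if above[i:i + n_consec].all():
            return i
    return None
```

Requiring several consecutive supra-threshold bins guards against single-bin noise spikes being counted as selectivity onsets.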

4.3 Results

4.3.1 Behavioral task performance

To examine the influence of the stimulus configurations, we performed a model comparison, as described in detail in the Methods. The full model included parameters for the total steps, subgoal position, and straight-line distance from start to goal (w1, w2, and w3). Against this model we compared other models in which we tested subsets of these parameters. In addition, we evaluated whether choice behavior relied on linear or logarithmic estimates of distances. In both animals, the full model was clearly favored, although subject R favored logarithmic estimates of distance while subject Q favored linear estimates (Table 4.2 and Fig. 4.2A). For subject R, w1 = 3.2, w2 = 0.5, and w3 = 1.4, while for subject Q, w1 = 0.5, w2 = 0.4, and w3 = 1.1. Thus, for both subjects, the subgoal position had the smallest effect on choice behavior, although subject R based their choices more on the total steps, while subject Q used the straight-line distance. These two variables were positively correlated (correlation coefficient = 0.91), which likely accounts for why either variable could be used to solve the task. Overall our models provided an excellent fit to the choice behavior (Fig. 4.2B), explaining 93% of the variance in subject R’s choice behavior and 90% of the variance in subject Q’s.
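The AIC weights used to compare the candidate models (plotted in Fig. 4.2A as the relative likelihood of each model within the tested set) follow the standard Akaike-weight formula; a minimal sketch:

```python
import numpy as np

def aic_weights(aic_values):
    """Akaike weights: relative likelihood of each model within a set,
    computed from AIC differences relative to the best (lowest-AIC) model."""
    aic = np.asarray(aic_values, dtype=float)
    delta = aic - aic.min()             # AIC difference from the best model
    rel_lik = np.exp(-0.5 * delta)      # relative likelihood of each model
    return rel_lik / rel_lik.sum()      # normalize so the weights sum to 1
```

Because the weights are normalized within the tested set, they express support for each model only relative to its competitors, not in any absolute sense.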

Although the subgoal position had only a small effect on choice behavior, the model that included it clearly outperformed the model that omitted it in both animals. This indicated that the animals were not simply ignoring the subgoal. Further evidence was apparent in the lever movement times. Both animals showed a tendency to slow down as they approached both the subgoal and the goal, and to speed up again once the subgoal had been acquired (Fig. 4.3A). This was evident when we looked at the change in movement time from one step to the next (Fig. 4.3B). We found that subjects slowed down (positive values on the y-axis) on approaching the subgoal, and sped up (negative values on the y-axis) immediately after its attainment (one-way ANOVA, F(5, 227) = 17.62, p < 1×10−13 for subject R; F(5, 179) = 22.85, p < 1×10−16 for subject Q). In other words, subjects did pay attention to the subgoal position in the series of lever movements.

4.3.2 Neural encoding

We recorded the activity of 308 neurons from LPFC (subject R, 132; subject Q, 176), 249 neurons from OFC (R, 130; Q, 119), and 212 neurons from ACC (R, 106; Q, 106). Recording locations are illustrated in Figure 4.4. We collected the data across 38 recording sessions for subject R and 30 sessions for subject Q. In order to obtain sufficient statistical power, the neurons from the two subjects were pooled. For all significant results, there were no qualitative differences between the two subjects (i.e., the effects were in the same direction), unless otherwise noted. During the presentation of the pre-jump configuration, the most prevalent encoding was of the value of the configuration, and this was more prevalent in ACC relative to the other two areas (Figure 4.5A). Figure 4.6A illustrates example neurons that selectively encoded the SV of the pre-jump configurations. Fewer neurons encoded sensory information about the stimulus configuration, i.e., LR, which is whether the start position was to the right or left of fixation. Since sensory encoding is not the focus of this report, we will not discuss LR encoding further. The time course of SV selectivity across the neural population is illustrated in Figure 4.7. There was no difference between the areas with respect to the onset of SV selectivity (median LPFC = 171 ms; median OFC = 166 ms; median ACC = 151 ms; one-way ANOVA, F(2, 122) = 0.45, p > 0.05). During the post-jump configuration, we also found that the most prevalent encoding was of the configuration’s value (Figures 4.5B and 4.6B). Note that some neurons encoded SV before the onset of the post-jump configuration, which indicates that they also encoded the SV of the pre-jump configuration. Some neurons also encoded RPE, and these neurons were most prevalent in ACC. In contrast, very few neurons encoded PPE, although the prevalence of such neurons did exceed chance in ACC (20/212 or 9.4%; binomial test, p < 0.01).
However, the weak encoding of PPE relative to the other variables is evident in the population plots shown in Figure 4.8, where the robust encoding of SV contrasts with the weaker encoding of RPE and the virtually absent encoding of PPE.

4.4 Discussion

We developed a primate version of a task that has previously been used to study hierarchical reinforcement learning in humans (Ribas-Fernandes et al., 2011). Animals had to use a lever to move a dot to a subgoal position and then on towards a goal position. Many prefrontal neurons encoded the value of the presented task configuration, as defined by the parameters that individual animals used to guide their choice behavior. This replicates our previous results (Kennerley et al., 2009; Kennerley and Wallis, 2009b), in which neurons encoded the number of lever presses that animals needed to make in order to earn a reward. Prefrontal neurons also encoded RPE, which is also consistent with our previous work (Kennerley et al., 2011). On the other hand, very few neurons encoded PPE, although those that did appeared to be located in ACC.

Neuroimaging studies in humans that used the same task showed that PPE correlated with increased BOLD activation in ACC (Ribas-Fernandes et al., 2011). Our data therefore appear to provide convergent evidence to support the HRL theoretical framework and a role for ACC in this process. However, an important caveat is that the degree of neural encoding that we observed in ACC was not particularly compelling. Only a handful of neurons showed significant encoding, and none of those neurons were particularly strongly tuned to PPE.

One possible explanation for the weak effects is that the animals were not paying sufficient attention to the task configurations, since the majority of trials did not require a choice. This explanation seems inadequate. In previous tasks where we have interleaved trials requiring a choice with those that did not, we have seen little difference in the response of prefrontal neurons to the two types of trial (Rich and Wallis, 2016). In addition, in the current study, we observed robust encoding of the value of the stimulus configuration. Finally, both animals showed changes in reaction time on attaining the subgoal. Taken together, these results suggest that the animals were appropriately attending to the task and the subgoal. Differences might also have arisen from the way the task was represented across the two species. Humans bring context to the task in a way that monkeys cannot. For example, in humans, the description of the task involved a driver picking up a package and delivering it to a customer. This real-world knowledge might have contributed to humans approaching the task in a more hierarchical fashion compared to the relatively abstract representation that the animals experienced. An additional difference between the two species relates to the value of acquiring the subgoal. In humans, there was no evidence that the subgoal influenced choice behavior, suggesting that acquiring the subgoal was not rewarding. In contrast, in the current study, the subgoal did influence the animals’ choice behavior, albeit to a smaller extent than the other stimulus parameters. Thus, we cannot rule out the possibility that the PPE that was generated in ACC simply reflected an RPE generated by the acquisition of the subgoal.

This raises a broader issue with the HRL framework. The original study examining HRL in humans emphasized that pseudo-rewards are distinct from primary rewards because attaining subgoals is not necessarily rewarding in and of itself (Ribas-Fernandes et al., 2011). An example is adding milk to coffee: the subgoal brings one closer to the first sip of coffee, but the act itself is not rewarding. However, traditional RL models can also account for the influence of non-rewarding subgoals on behavior, because reward values become progressively associated with earlier reward-predictive events, which would include attaining subgoals that are not in themselves necessarily rewarding. Thus, the critical difference between HRL and RL rests not so much in the distinction between pseudo-rewards and primary rewards, but rather in the way in which behavior is organized, in particular, the unit of behavior that is reinforced. In RL, prediction errors are calculated for each individual action, whereas in HRL, individual actions are chunked into a subroutine that generates its own prediction error on completion. It is not clear whether the prediction error generated by the subroutine necessarily needs to involve a distinct neural signal compared to the prediction error generated by the primary reward.
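The contrast between the two credit-assignment schemes can be sketched in toy form (illustrative only; the discount factor is set to 1 and all names are hypothetical):

```python
def flat_rl_errors(values, rewards):
    """Flat RL: one TD error per primitive step, r + V(s') - V(s),
    with gamma = 1. values has one entry per state, rewards one per step."""
    return [rewards[t] + values[t + 1] - values[t]
            for t in range(len(rewards))]

def hrl_option_error(pseudo_reward, expected_pseudo_reward):
    """HRL: the primitive actions inside an option are chunked together,
    and a single pseudo-reward prediction error is emitted when the
    option's subgoal is attained."""
    return pseudo_reward - expected_pseudo_reward
```

The point of the sketch is the granularity: flat RL emits an error signal after every primitive action, whereas the HRL subroutine emits one error at subgoal completion, which is what would distinguish a PPE signal from the stream of per-action RPEs.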

How the brain determines the appropriate behavioral unit for reinforcement learning mechanisms is an area of active investigation. One idea is that the brain tends to group together mutually predictive stimuli and actions into a single event (Schapiro et al., 2013). For example, driving to a restaurant and ordering a meal are both actions that can achieve an ultimate goal of eating a tasty meal, but behaviorally, the agent experiences a continuous stream of stimuli and actions. However, the act of driving involves many mutually predictive stimuli (e.g. steering wheel, traffic lights, seat belt) but only weakly predicts going to a restaurant, since one can drive to many alternate destinations. Likewise, ordering a meal involves many mutually predictive stimuli (e.g. server, menu, water), but may only weakly predict driving, since one could have also walked or caught the subway. Thus, the act of driving is grouped as a separate event from ordering the meal. The responses of prefrontal neurons are consistent with organizing behavior into these high-level events. For example, one of the major determinants of prefrontal firing rates is which part of the task the agent is currently engaged in (Sigala et al., 2008). Prefrontal neurons also encode events at an abstract, high level, incorporating categories (Freedman et al., 2001) and rules (Wallis et al., 2001). It may be that standard RL mechanisms operating on these high-level behavioral events are sufficient to account for hierarchical behavior.

In summary, our results provide partial support for the involvement of ACC in HRL. In a task designed to elicit hierarchical behavior, we observed neurons in ACC whose firing rate correlated with PPE. However, this support comes with caveats, including the weak encoding, particularly in comparison to other signals that have been more firmly associated with ACC, such as predicted value and RPE, and the question of whether HRL even requires a PPE signal distinct from RPE.

Table 4.1: Relationship of jump configurations to parameters in the HRL model.

Table 4.2: Akaike values for both subjects across all tested models.

Figure 4.1: (A) Timeline of the behavioral task. On choice trials, subjects chose one of two stimulus configurations and then moved a joystick back-and-forth to move the green dot forwards on a green-white-blue route. The optimal choice was to select the shortest route, since this would lead to reward more quickly and with less physical effort. On jump trials, a single configuration was presented, followed by a second configuration, which sometimes required updating the expectancy of how much work would be required to earn the reward. (B) Sample post-jump configurations. The original configuration is shown in the top left. Numbers above the configuration indicate the number of steps for TS, SG and SD.

Figure 4.2: (A) AIC weights across the 14 tested models. The AIC weight is the relative likelihood of a given model within the set of tested models. The full model was clearly favored in both subjects, although a logarithmic transformation of distance was favored by subject R, whereas subject Q estimated distances linearly. (B) Behavioral performance during the choice trials. The probability of selecting the left configuration as a function of the difference in value of the left and right configurations, as determined by Equations 4.1 and 4.2. Gray circles indicate actual data, whereas green lines indicate the best-fitting model as determined by a formal model comparison.

Figure 4.3: (A) Lever movement times for steps relative to subgoal or goal positions. (B) Movement times relative to the previous steps. The diagram indicates the specific movements referenced by the x-axis. Asterisks indicate values significantly lower than all other values after pairwise comparisons (p < 0.01, Bonferroni corrected).

Figure 4.4: (A) Coronal MRI scans illustrating potential electrode paths. Red, green, and blue target areas indicate LPFC, ACC, and OFC, respectively. (B) Flattened reconstructions of the cortex indicating the locations of recorded neurons. The size of the circles indicates the number of neurons recorded at that location. We measured the anterior–posterior position from the interaural line (x-axis), and the dorsoventral position relative to the lip of the ventral bank of the principal sulcus (0 point on y-axis). Gray shading indicates unfolded sulci. LPFC recording locations were located within the principal sulcus. ACC recording locations were located within the cingulate sulcus. OFC recording locations were largely located within and between the lateral and medial orbital sulci. All recording locations are plotted relative to the ventral bank of the principal sulcus, which is a consistent landmark across animals. PS, principal sulcus; CS, cingulate sulcus; LOS, lateral orbital sulcus; MOS, medial orbital sulcus.

Figure 4.5: Percentage of neurons in LPFC, OFC, and ACC that encode different predictors during the pre-jump and post-jump configurations. Shading indicates the proportion of neurons that encoded the variable with a given relationship: dark shading = positive, light shading = negative. For the post-jump configuration, gray color indicates the proportion encoding both positive and negative predictors, which was possible since we included these as separate regressors. Asterisks indicate that the prevalence of neurons is significantly different between areas (chi-squared test, ∗ p < 0.05, ∗∗ p < 0.01). Dotted line indicates the percentage of selective neurons expected by chance given our statistical threshold for selectivity.

Figure 4.6: Spike density histograms illustrating neurons selective for subjective value (SV) from the three recording areas during the pre-jump (A) or post-jump (B) configurations. In each plot, the top panel indicates mean firing rate as a function of SV. The bottom panel indicates the coefficient of partial determination (CPD) for SV. This measure indicates the amount of variance in the neuron’s firing rate that is accounted for by SV and cannot be explained by any of the other parameters in the regression model (see Materials and Methods). Magenta data points indicate that SV significantly predicts neuronal firing rate. The gray lines indicate the onset and offset of the pre- and post-jump configurations.

Figure 4.7: Encoding of the SV of the pre-jump configuration across the population in three prefrontal areas. Each horizontal line on the plot indicates the selectivity of a single neuron as measured using the coefficient of partial determination (see Materials and Methods). Neurons have been sorted according to the latency at which they first show selectivity. The vertical white lines indicate the onset and offset of the pre-jump configuration.

Figure 4.8: Encoding of predictors related to the post-jump configuration across the population in three prefrontal areas. Conventions are as in Figure 4.7.

Chapter 5

Conclusion

We began this thesis with the question of how neural activity in prefrontal cortex represents a series of actions of increasing complexity and temporal extension in sequential tasks. In the WM experiment, we used a spatial self-ordered search task to determine how behavioral strategies can help mitigate capacity constraints on WM. In the HRL experiment, we manipulated the parameters of the task configuration to examine how different levels of information were represented and guided choice behavior. Based on their shared anatomical connections, we focused on three prefrontal areas: LPFC, OFC, and ACC. This chapter will summarize findings from our two experiments in relation to the current prefrontal literature. It will then address remaining questions and future directions related to this work.

5.1 Summary of results

We approached the internal representations underlying hierarchical and sequential tasks by contrasting how prefrontal regions encoded and integrated dynamically evolving task information in the WM and HRL experiments.

In Chapter 3 we noted that people rarely notice the inconvenience that results from the limited capacity of WM in daily life, in part because we develop behavioral strategies that help mitigate the effects of WM constraints. How behavioral strategies are mediated at the neural level is unclear, but a likely locus is LPFC. Neurons in LPFC play a prominent role in WM and have been shown to encode behavioral strategies. To examine the role of LPFC in overcoming WM limitations, we recorded the activity of LPFC neurons in two animals trained to perform a serial self-ordered search task. This task measured the animals’ ability to prospectively plan the selection of unchosen spatial search targets while retrospectively tracking which targets were previously visited. We found that individual LPFC neurons encoded the spatial location of the current search target, but also encoded the spatial location of targets up to several steps away in the search sequence. Neurons were more likely to encode prospective than retrospective targets. When subjects used a behavioral strategy of stereotyped target selection, mitigating the WM requirements of the task, not only did the number of selection errors decrease, but there was a significant reduction in the extent of spatial encoding in LPFC. These results show that LPFC neurons have spatiotemporal mnemonic fields, in that their firing rates are modulated both by the spatial location of future selection behaviors and the temporal organization of that behavior. Furthermore, the precision of this tuning can be dynamically modulated by the demands of the task.

In Chapter 4 we noted that conventional RL models have proven highly effective for understanding learning in both artificial and biological systems. However, these models have difficulty scaling up to the complexity of real-life environments. One solution is to incorporate the hierarchical structure of behavior into these models. In HRL, primitive actions are chunked together into more temporally abstract actions, called ‘options’, that are reinforced by attaining a subgoal. These subgoals are capable of generating pseudo-reward prediction errors, which are distinct from the reward prediction errors that are associated with the final goal of the behavior. Previous studies in humans have shown that pseudo-reward prediction errors positively correlate with activation of anterior cingulate cortex. To determine how pseudo-reward prediction errors are encoded at the single-neuron level, we trained two animals to perform a primate version of the task used to generate these errors in humans. We recorded the electrical activity of neurons in ACC during performance of this task, as well as neurons in LPFC and OFC. We found that the firing rate of a small population of neurons encoded pseudo-reward prediction errors, and these neurons were restricted to ACC. Our results provide support for the idea that ACC may play an important role in the encoding of subgoals and pseudo-reward prediction errors in order to support HRL, although with the caveat that neurons encoding pseudo-reward prediction errors were relatively few in number, especially in comparison to neurons that encoded information about the main goal of the task.

5.2 Remaining questions

5.2.1 Errors in sequential tasks

In the WM experiment, our results came mostly from correct saccades in the spatial self-ordered task; we did not have enough error saccades for neural analysis. Error saccades that occur between correct responses may indicate the position of a chunking boundary that separates the information into two organized subgroups. The chunking boundary is analogous to the hyphens we typically insert in phone or credit card numbers. Errors arise from conflicts between item information assigned to inconsistent subgroups. One psychological model of short-term memory, the Start-End Model (SEM), considers the positions of items in a sequence, with items associated relative to the start and end of that sequence (Henson, 1998). Simulations confirm SEM’s ability to capture the main phenomena of serial recall, such as the effects of primacy, recency, grouping, and proactive interference. Moreover, SEM was the first model to capture the complete pattern of errors, including transpositions, repetitions, omissions, intrusions, confusions and, in particular, positional errors between groups and between trials. Further, an alternative recurrent connectionist model proposed that producing actions in the proper sequence requires a continuously updated representation of temporal context (Botvinick and Plaut, 2006). Degrading this representation led to errors resembling those observed in everyday behavior. Taken together, errors in sequential tasks could serve as an anchor marking the transition between two subgroups of information. Exactly which items are chunked into a given subgroup may be difficult to track in a sequential order, but at least the error position, serving as a chunking boundary, is observable.

5.2.2 Adaptation from prediction errors

In the HRL experiment, we found that prefrontal neurons encoded positive or negative RPE, while few neurons encoded PPE, although those that did appeared to be located in ACC. One possible reason is that the contribution of subgoal-induced PPE to the final reward was not obvious. According to the concepts of conventional reinforcement learning models (Sutton and Barto, 1998), humans and animals are able to maximize reward outcomes by optimizing their learning. Although we manipulated the magnitude of subjective value in the HRL task, subjects received the value information passively before and after jump events, and the fixed juice amount in this task did not allow them to actively gain more reward. A study in which a computer agent was trained by a deep learning model to play Atari games may provide another viewpoint (Mnih et al., 2015). The agent learned the optimal strategy in the game Breakout, which is to first dig a tunnel around the side of the wall, allowing the ball to be sent around the back to destroy a large number of blocks. This suggests that performance can be dramatically optimized by exploiting and achieving a subgoal, such as digging the tunnel. In this case, the contribution of the subgoal is linked to the behavioral strategy and allows the agent to maximize the reward outcome across training.

5.3 Future directions

More work is needed to shed light on the precise nature of the neural representations shared between PFC and other brain regions as task-relevant information accumulates over a series of actions in sequential tasks. The use of modern large-scale recording methods (Kipke et al., 2008; Dotson et al., 2015) and analysis techniques (Meyers et al., 2008; Stokes et al., 2013; Cunningham and Yu, 2014) has the potential to allow tracing of the flow of information into PFC and back again during sequential tasks. With the help of these advanced methods, one promising idea is to take advantage of the intrinsic temporal order of the task structure to understand how neuronal activity engages and disengages from the internal state of the brain. This approach forms the basis of the dynamical-systems framework, which has recently been adopted to understand the neural mechanisms underlying sequential tasks (Sigala et al., 2008). Given that executive processes like working memory and goal-directed behavior are, by their very nature, internal, dynamical processes, a dynamical-systems approach has the potential to shed light on how prefrontal neurons code the successive phases of a structured task plan that are required for such a complex repertoire of cognitive control.

5.4 Closing remarks

The experiments in this thesis devised primate versions of sequential tasks and examined the neural substrates of hierarchical behavior in PFC. Specifically, we have considered how single neurons in three distinct frontal brain regions - LPFC, OFC, and ACC - encode levels of subjective value, event epochs, and behavioral strategies, enabling animals to access abstract information about the goal-subgoal status and its temporal order. Although our findings focus only on evidence from working memory and reinforcement learning, studying how neurons code the internal representation of sequential tasks extends the knowledge gained from simpler tasks. As stated by Lashley (1951):

“Serial order is typical of the problems raised by cerebral activity; few, if any, of the problems are simpler or promise easier solution. We can, perhaps, postpone the fatal day when we must face them, by saying that they are too complex for present analysis, but there is danger here of constructing a false picture of those processes that we believe to be simpler.” (p. 134)

To understand the mechanisms underlying sequential tasks, more studies are needed that directly observe and manipulate the relevant neural circuitry. Such findings could help illustrate how the complex behavioral repertoire of the primate is implemented.

Bibliography

1. Abe H, Lee D (2011) Distributed coding of actual and hypothetical outcomes in the orbital and dorsolateral prefrontal cortex. Neuron 70:731-741.

2. Akaike H (1974) A new look at the statistical model identification. IEEE T Automat Contr AC-19:716-723.

3. Alexander WH, Brown JW (2011) Medial prefrontal cortex as an action-outcome predictor. Nat Neurosci 14:1338-1344.

4. Asaad WF, Eskandar EN (2008) A flexible software tool for temporally-precise behavioral control in Matlab. J Neurosci Methods 174:245-258.

5. Asaad WF, Eskandar EN (2011) Encoding of both positive and negative reward prediction errors by neurons of the primate lateral prefrontal cortex and caudate nucleus. J Neurosci 31:17772-17787.

6. Baddeley A (2003) Working memory: looking back and looking forward. Nat Rev Neurosci 4:829-839.

7. Badre D (2008) Cognitive control, hierarchy, and the rostro-caudal organization of the frontal lobes. Trends Cogn Sci 12:193-200.

8. Badre D, D’Esposito M (2007) Functional magnetic resonance imaging evidence for a hierarchical organization of the prefrontal cortex. J Cogn Neurosci 19:2082-2099.

9. Badre D, Frank MJ (2012) Mechanisms of hierarchical reinforcement learning in corticostriatal circuits 2: evidence from fMRI. Cereb Cortex 22:527-536.

10. Badre D, Kayser AS, D’Esposito M (2010) Frontal cortex and the discovery of abstract action rules. Neuron 66:315-326.

11. Badre D, Hoffman J, Cooney JW, D’Esposito M (2009) Hierarchical cognitive control deficits following damage to the human frontal lobe. Nat Neurosci 12:515-522.

12. Balleine BW, Dickinson A (1998) The role of incentive learning in instrumental outcome revaluation by sensory-specific satiety. Anim Learn Behav 26:46-59.

13. Barbas H, Pandya DN (1989) Architecture and intrinsic connections of the prefrontal cortex in the rhesus monkey. J Comp Neurol 286:353-375.

14. Bauer RH, Fuster JM (1976) Delayed-matching and delayed-response deficit from cooling dorsolateral prefrontal cortex in monkeys. J Comp Physiol Psychol 90:293-302.

15. Bechara A, Damasio H, Tranel D, Anderson SW (1998) Dissociation of working memory from decision making within the human prefrontal cortex. J Neurosci 18:428-437.

16. Behrens TE, Jocham G (2011) How to perfect a chocolate soufflé and other important problems. Neuron 71:203-205.

17. Bor D, Duncan J, Wiseman RJ, Owen AM (2003) Encoding strategies dissociate prefrontal activity from working memory demand. Neuron 37:361-367.

18. Botvinick MM (2008) Hierarchical models of behavior and prefrontal function. Trends Cogn Sci 12:201-208.

19. Botvinick MM, Plaut DC (2006) Short-term memory for serial order: a recurrent neural network model. Psychol Rev 113:201-233.

20. Botvinick MM, Niv Y, Barto AC (2009) Hierarchically organized behavior and its neural foundations: a reinforcement learning perspective. Cognition 113:262-280.

21. Cai X, Padoa-Schioppa C (2012) Neuronal encoding of subjective value in dorsal and ventral anterior cingulate cortex. J Neurosci 32:3791-3808.

22. Camille N, Tsuchida A, Fellows LK (2011a) Double dissociation of stimulus-value and action-value learning in humans with orbitofrontal or anterior cingulate cortex damage. J Neurosci 31:15048-15052.

23. Camille N, Griffiths CA, Vo K, Fellows LK, Kable JW (2011b) Ventromedial frontal lobe damage disrupts value maximization in humans. J Neurosci 31:7527-7532.

24. Carmichael ST, Price JL (1995a) Limbic connections of the orbital and medial prefrontal cortex in macaque monkeys. J Comp Neurol 363:615-641.

25. Carmichael ST, Price JL (1995b) Sensory and premotor connections of the orbital and medial prefrontal cortex of macaque monkeys. J Comp Neurol 363:642-664.

26. Carmichael ST, Price JL (1996) Connectional networks within the orbital and medial prefrontal cortex of macaque monkeys. J Comp Neurol 371:179-207.

27. Christophel TB, Klink PC, Spitzer B, Roelfsema PR, Haynes JD (2017) The distributed nature of working memory. Trends Cogn Sci 21:111-124.

28. Cisek P, Kalaska JF (2010) Neural mechanisms for interacting with a world full of action choices. Annu Rev Neurosci 33:269-298.

Compte A, Brunel N, Goldman-Rakic PS, Wang XJ (2000) Synaptic mechanisms and network dynamics underlying spatial working memory in a cortical network model. Cereb Cortex 10:910-923.

29. Constantinidis C, Franowicz MN, Goldman-Rakic PS (2001) The sensory nature of mnemonic representation in the primate prefrontal cortex. Nat Neurosci 4:311-316.

Cooper R, Shallice T (2000) Contention scheduling and the control of routine activities. Cogn Neuropsychol 17:297-338.

30. Courtney SM, Petit L, Maisog JM, Ungerleider LG, Haxby JV (1998) An area specialized for spatial working memory in human frontal cortex. Science 279:1347-1351.

31. Cowan N (2001) The magical number 4 in short-term memory: a reconsideration of mental storage capacity. Behav Brain Sci 24:87-114; discussion 114-185.

32. Crittenden BM, Duncan J (2014) Task difficulty manipulation reveals multiple demand activity but no frontal lobe hierarchy. Cereb Cortex 24:532-540.

33. Croxson PL, Johansen-Berg H, Behrens TE, Robson MD, Pinsk MA, Gross CG, Richter W, Richter MC, Kastner S, Rushworth MF (2005) Quantitative investigation of connections of the prefrontal cortex in the human and macaque using probabilistic diffusion tractography. J Neurosci 25:8854-8866.

34. Cunningham JP, Yu BM (2014) Dimensionality reduction for large-scale neural recordings. Nat Neurosci 17:1500-1509.

35. Curtis CE, D’Esposito M (2003) Persistent activity in the prefrontal cortex during working memory. Trends Cogn Sci 7:415-423.

36. Curtis CE, Rao VY, D’Esposito M (2004) Maintenance of spatial and motor codes during oculomotor delayed response tasks. J Neurosci 24:3944-3952.

37. D’Esposito M, Postle BR (2015) The cognitive neuroscience of working memory. Annu Rev Psychol 66:115-142.

38. Daw ND, Frank MJ (2009) Reinforcement learning and higher level cognition: introduction to special issue. Cognition 113:259-261.

39. Dayan P, Niv Y (2008) Reinforcement learning: the good, the bad and the ugly. Curr Opin Neurobiol 18:185-196.

40. Desrochers TM, Chatham CH, Badre D (2015a) The necessity of rostrolateral prefrontal cortex for higher-level sequential behavior. Neuron 87:1357-1368.

41. Desrochers TM, Burk DC, Badre D, Sheinberg DL (2015b) The monitoring and control of task sequences in human and non-human primates. Front Syst Neurosci 9:185.

42. Dietterich TG (2000) Hierarchical reinforcement learning with the MAXQ value function decomposition. J Artif Intell Res 13:227-303.

43. Dotson NM, Goodell B, Salazar RF, Hoffman SJ, Gray CM (2015) Methods, caveats and the future of large-scale microelectrode recordings in the non-human primate. Front Syst Neurosci 9:149.

44. Doyon J, Song AW, Karni A, Lalonde F, Adams MM, Ungerleider LG (2002) Experience-dependent changes in cerebellar contributions to motor sequence learning. Proc Natl Acad Sci USA 99:1017-1022.

45. Druckmann S, Chklovskii DB (2012) Neuronal circuits underlying persistent representations despite time varying activity. Curr Biol 22:2095-2103.

46. Druzgal TJ, D’Esposito M (2001) A neural network reflecting decisions about human faces. Neuron 32:947-955.

47. Dum RP, Strick PL (1991) The origin of corticospinal projections from the premotor areas in the frontal lobe. J Neurosci 11:667-689.

48. Dum RP, Strick PL (1996) Spinal cord terminations of the medial wall motor areas in macaque monkeys. J Neurosci 16:6513-6525.

49. Duncan J (2001) An adaptive coding model of neural function in prefrontal cortex. Nat Rev Neurosci 2:820-829.

50. Emrich SM, Riggall AC, Larocque JJ, Postle BR (2013) Distributed patterns of activity in sensory cortex reflect the precision of multiple items maintained in visual short-term memory. J Neurosci 33:6516-6523.

51. Eriksen BA, Eriksen CW (1974) Effects of noise letters upon identification of a target letter in a nonsearch task. Percept Psychophys 16:143-149.

52. Ester EF, Serences JT, Awh E (2009) Spatially global representations in human primary visual cortex during working memory maintenance. J Neurosci 29:15258-15265.

53. Feierstein CE, Quirk MC, Uchida N, Sosulski DL, Mainen ZF (2006) Representation of spatial goals in rat orbitofrontal cortex. Neuron 51:495-507.

54. Floyer-Lea A, Matthews PM (2004) Changing brain networks for visuomotor control with increased movement automaticity. J Neurophysiol 92:2405-2412.

55. Frank MJ, Badre D (2012) Mechanisms of hierarchical reinforcement learning in corticostriatal circuits 1: computational analysis. Cereb Cortex 22:509-526.

56. Freedman DJ, Riesenhuber M, Poggio T, Miller EK (2001) Categorical representation of visual stimuli in the primate prefrontal cortex. Science 291:312-316.

57. Funahashi S, Bruce CJ, Goldman-Rakic PS (1989) Mnemonic coding of visual space in the monkey’s dorsolateral prefrontal cortex. J Neurophysiol 61:331-349.

58. Funahashi S, Bruce CJ, Goldman-Rakic PS (1993a) Dorsolateral prefrontal lesions and oculomotor delayed-response performance: evidence for mnemonic scotomas. J Neurosci 13:1479-1497.

59. Funahashi S, Chafee MV, Goldman-Rakic PS (1993b) Prefrontal neuronal activity in rhesus monkeys performing a delayed anti-saccade task. Nature 365:753-756.

60. Furuyashiki T, Holland PC, Gallagher M (2008) Rat orbitofrontal cortex separately encodes response and outcome information during performance of goal-directed behavior. J Neurosci 28:5127-5138.

61. Fuster JM (1997) The prefrontal cortex: anatomy, physiology, and neuropsychology of the frontal lobe. Philadelphia, PA: Lippincott-Raven.

62. Fuster JM (2001) The prefrontal cortex - an update: time is of the essence. Neuron 30:319-333.

63. Fuster JM, Alexander GE (1971) Neuron activity related to short-term memory. Science 173:652-654.

Gehring WJ, Goss B, Coles MGH, Meyer DE, Donchin E (1993) A neural system for error detection and compensation. Psychol Sci 4:385-390.

64. Giedd JN, Blumenthal J, Jeffries NO, Castellanos FX, Liu H, Zijdenbos A, Paus T, Evans AC, Rapoport JL (1999) Brain development during childhood and adolescence: a longitudinal MRI study. Nat Neurosci 2:861-863.

65. Goldman-Rakic PS (2011) Circuitry of primate prefrontal cortex and regulation of behavior by representational memory. In: Handbook of Physiology, The Nervous System, Higher Functions of the Brain (Plum F, ed), pp 373-417. First published in print 1987. doi:10.1002/cphy.cp010509.

66. Hampshire A, Thompson R, Duncan J, Owen AM (2011) Lateral prefrontal cortex subregions make dissociable contributions during fluid reasoning. Cereb Cortex 21:1-10.

67. Hare TA, Camerer CF, Rangel A (2009) Self-control in decision-making involves modulation of the vmPFC valuation system. Science 324:646-648.

68. Harrison SA, Tong F (2009) Decoding reveals the contents of visual working memory in early visual areas. Nature 458:632-635.

69. Hayden BY, Platt ML (2010) Neurons in anterior cingulate cortex multiplex information about reward and action. J Neurosci 30:3339-3346.

70. Henson RN (1998) Short-term memory for serial order: the start-end model. Cogn Psychol 36:73-137.

71. Hollerman JR, Schultz W (1998) Dopamine neurons report an error in the temporal prediction of reward during learning. Nat Neurosci 1:304-309.

72. Holroyd CB, Coles MGH (2002) The neural basis of human error processing: reinforcement learning, dopamine, and the error-related negativity. Psychol Rev 109:679-709.

73. Holroyd CB, McClure SM (2015) Hierarchical control over effortful behavior by rodent medial frontal cortex: a computational model. Psychol Rev 122:54-83.

74. Hoshi E (2006) Functional specialization within the dorsolateral prefrontal cortex: A review of anatomical and physiological studies of non-human primates. Neurosci Res 54:73-84.

75. Humphreys GW, Forde EME (1998) Disordered action schema and action disorganisation syndrome. Cogn Neuropsychol 15:771-811.

76. Ichihara-Takeda S, Funahashi S (2008) Activity of primate orbitofrontal and dorsolateral prefrontal neurons: effect of reward schedule on task-related activity. J Cogn Neurosci 20:563-579.

77. Ito S, Stuphorn V, Brown JW, Schall JD (2003) Performance monitoring by the anterior cingulate cortex during saccade countermanding. Science 302:120-122.

78. Kawagoe R, Takikawa Y, Hikosaka O (1998) Expectation of reward modulates cognitive signals in the basal ganglia. Nat Neurosci 1:411-416.

79. Kennerley SW, Wallis JD (2009a) Reward-dependent modulation of working memory in lateral prefrontal cortex. J Neurosci 29:3259-3270.

80. Kennerley SW, Wallis JD (2009b) Evaluating choices by single neurons in the frontal lobe: outcome value encoded across multiple decision variables. Eur J Neurosci 29:2061-2073.

81. Kennerley SW, Behrens TEJ, Wallis JD (2011) Double dissociation of value computations in orbitofrontal and anterior cingulate neurons. Nat Neurosci 14:1581-1589.

82. Kennerley SW, Dahmubed AF, Lara AH, Wallis JD (2009) Neurons in the frontal lobe encode the value of multiple decision variables. J Cogn Neurosci 21:1162-1178.

83. Kipke DR, Shain W, Buzsaki G, Fetz E, Henderson JM, Hetke JF, Schalk G (2008) Advanced neurotechnologies for chronic neural interfaces: new horizons and clinical opportunities. J Neurosci 28:11830-11838.

84. Koechlin E, Ody C, Kouneiher F (2003) The architecture of cognitive control in the human prefrontal cortex. Science 302:1181-1185.

85. Kondo H, Saleem KS, Price JL (2005) Differential connections of the perirhinal and parahippocampal cortex with the orbital and medial prefrontal networks in macaque monkeys. J Comp Neurol 493:479-509.

86. Kubota K, Niki H (1971) Prefrontal cortical unit activity and delayed alternation performance in monkeys. J Neurophysiol 34:337-347.

87. Lara AH, Wallis JD (2012) Capacity and precision in an animal model of visual short-term memory. J Vis 12:13.

88. Lara AH, Wallis JD (2014) Executive control processes underlying multi-item working memory. Nat Neurosci 17:876-883.

89. Lara AH, Wallis JD (2015) The role of prefrontal cortex in working memory: a mini review. Front Syst Neurosci 9:173.

90. Lara AH, Kennerley SW, Wallis JD (2009) Encoding of gustatory working memory by orbitofrontal neurons. J Neurosci 29:765-774.

91. Lashley KS (1951) The problem of serial order in behavior. In: Cerebral Mechanisms in Behavior (Jeffress LA, ed), pp 112-146. New York, NY: John Wiley and Sons.

92. Lebedev MA, Messinger A, Kralik JD, Wise SP (2004) Representation of attended versus remembered locations in prefrontal cortex. PLoS Biol 2:e365.

93. Lee D, Seo H, Jung MW (2012) Neural basis of reinforcement learning and decision making. Annu Rev Neurosci 35:287-308.

94. Levy R, Goldman-Rakic PS (2000) Segregation of working memory functions within the dorsolateral prefrontal cortex. Exp Brain Res 133:23-32.

95. Luk CH, Wallis JD (2009) Dynamic encoding of responses and outcomes by neurons in medial prefrontal cortex. J Neurosci 29:7526-7539.

96. Luk CH, Wallis JD (2013) Choice coding in frontal cortex during stimulus-guided or action-guided decision-making. J Neurosci 33:1864-1871.

97. Mackey S, Petrides M (2010) Quantitative demonstration of comparable architectonic areas within the ventromedial and lateral orbital frontal cortex in the human and the macaque monkey brains. Eur J Neurosci 32:1940-1950.

98. Mackey WE, Devinsky O, Doyle WK, Meager MR, Curtis CE (2016) Human dorsolateral prefrontal cortex is not necessary for spatial working memory. J Neurosci 36:2847-2856.

99. Matsumoto K, Suzuki W, Tanaka K (2003) Neuronal correlates of goal-based motor selection in the prefrontal cortex. Science 301:229-232.

100. Matsumoto M, Matsumoto K, Abe H, Tanaka K (2007) Medial prefrontal cell activity signaling prediction errors of action values. Nat Neurosci 10:647-656.

101. McClure SM, Berns GS, Montague PR (2003) Temporal prediction errors in a passive learning task activate human striatum. Neuron 38:339-346.

102. Meyers EM, Freedman DJ, Kreiman G, Miller EK, Poggio T (2008) Dynamic population coding of category information in inferior temporal and prefrontal cortex. J Neurophysiol 100:1407-1419.

103. Miller EK, Cohen JD (2001) An integrative theory of prefrontal cortex function. Annu Rev Neurosci 24:167-202.

104. Miller EK, Erickson CA, Desimone R (1996) Neural mechanisms of visual working memory in prefrontal cortex of the macaque. J Neurosci 16:5154-5167.

105. Miller GA (1956) The magical number seven plus or minus two: some limits on our capacity for processing information. Psychol Rev 63:81-97.

106. Milner B (1963) Effects of different brain lesions on card sorting: role of frontal lobes. Arch Neurol 9:90-100.

107. Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, Bellemare MG, Graves A, Riedmiller M, Fidjeland AK, Ostrovski G, Petersen S, Beattie C, Sadik A, Antonoglou I, King H, Kumaran D, Wierstra D, Legg S, Hassabis D (2015) Human-level control through deep reinforcement learning. Nature 518:529-533.

108. Morecraft RJ, Geula C, Mesulam MM (1992) Cytoarchitecture and neural afferents of orbitofrontal cortex in the brain of the monkey. J Comp Neurol 323:341-358.

109. Morrison SE, Salzman CD (2009) The convergence of information about rewarding and aversive stimuli in single neurons. J Neurosci 29:11471-11483.

110. Mountcastle VB, Lynch JC, Georgopoulos A, Sakata H, Acuna C (1975) Posterior parietal association cortex of the monkey: command functions for operations within extrapersonal space. J Neurophysiol 38:871-908.

111. Muller NG, Machado L, Knight RT (2002) Contributions of subregions of the prefrontal cortex to working memory: evidence from brain lesions in humans. J Cogn Neurosci 14:673-686.

112. Murray JD, Bernacchia A, Roy NA, Constantinidis C, Romo R, Wang XJ (2017) Stable population coding for working memory coexists with heterogeneous neural dynamics in prefrontal cortex. Proc Natl Acad Sci USA 114:394-399.

113. Nimchinsky EA, Hof PR, Young WG, Morrison JH (1996) Neurochemical, morphologic, and laminar characterization of cortical projection neurons in the cingulate motor areas of the macaque monkey. J Comp Neurol 374:136-160.

114. Niv Y (2009) Reinforcement learning in the brain. J Math Psychol 53:139-154.

115. Norman DA, Shallice T (1986) Attention to action. In: Consciousness and Self-Regulation: Advances in Research and Theory Volume 4 (Davidson RJ, Schwartz GE, Shapiro D, eds), pp 1-18. Boston, MA: Springer US.

116. O’Doherty J, Critchley H, Deichmann R, Dolan RJ (2003) Dissociating valence of outcome from behavioral control in human orbital and ventral prefrontal cortices. J Neurosci 23:7931-7939.

117. Ongur D, Price JL (2000) The organization of networks within the orbital and medial prefrontal cortex of rats, monkeys and humans. Cereb Cortex 10:206-219.

118. Ostlund SB, Balleine BW (2007) Orbitofrontal cortex mediates outcome encoding in Pavlovian but not instrumental conditioning. J Neurosci 27:4819-4825.

119. Owen AM, Downes JJ, Sahakian BJ, Polkey CE, Robbins TW (1990) Planning and spatial working memory following frontal lobe lesions in man. Neuropsychologia 28:1021-1034.

120. Oyama K, Hernadi I, Iijima T, Tsutsui KI (2010) Reward prediction error coding in dorsal striatal neurons. J Neurosci 30:11447-11457.

121. Padoa-Schioppa C (2011) Neurobiology of economic choice: a good-based model. Annu Rev Neurosci 34:333-359.

122. Padoa-Schioppa C, Assad JA (2006) Neurons in the orbitofrontal cortex encode economic value. Nature 441:223-226.

123. Padoa-Schioppa C, Assad JA (2008) The representation of economic value in the orbitofrontal cortex is invariant for changes of menu. Nat Neurosci 11:95-102.

124. Pan X, Sawa K, Tsuda I, Tsukada M, Sakagami M (2008) Reward prediction based on stimulus categorization in primate lateral prefrontal cortex. Nat Neurosci 11:703-712.

125. Pasternak T, Lui LL, Spinelli PM (2015) Unilateral prefrontal lesions impair memory-guided comparisons of contralateral visual motion. J Neurosci 35:7095-7105.

126. Petrides M (2000) Impairments in working memory after frontal cortical excisions. Adv Neurol 84:111-118.

127. Petrides M, Milner B (1982) Deficits on subject-ordered tasks after frontal- and temporal-lobe lesions in man. Neuropsychologia 20:249-262.

128. Petrides M, Pandya DN (1984) Projections to the frontal cortex from the posterior parietal region in the rhesus monkey. J Comp Neurol 228:105-116.

129. Petrides M, Pandya DN (1999) Dorsolateral prefrontal cortex: comparative cytoarchitectonic analysis in the human and the macaque brain and corticocortical connection patterns. Eur J Neurosci 11:1011-1036.

130. Petrides M, Pandya DN (2002) Comparative cytoarchitectonic analysis of the human and the macaque ventrolateral prefrontal cortex and corticocortical connection patterns in the monkey. Eur J Neurosci 16:291-310.

131. Phillips CG (1973) Proceedings: Hughlings Jackson Lecture. Cortical localization and sensorimotor processes at middle level in primates. Proc R Soc Med 66:987-1002.

132. Pickens CL, Saddoris MP, Setlow B, Gallagher M, Holland PC, Schoenbaum G (2003) Different roles for orbitofrontal cortex and basolateral amygdala in a reinforcer devaluation task. J Neurosci 23:11078-11084.

133. Platt ML, Glimcher PW (1999) Neural correlates of decision variables in parietal cortex. Nature 400:233-238.

134. Postle BR (2006) Working memory as an emergent property of the mind and brain. Neuroscience 139:23-38.

135. Postle BR (2015) The cognitive neuroscience of visual short-term memory. Curr Opin Behav Sci 1:40-46.

136. Postle BR, Druzgal TJ, D’Esposito M (2003) Seeking the neural substrates of visual working memory storage. Cortex 39:927-946.

137. Procyk E, Goldman-Rakic PS (2006) Modulation of dorsolateral prefrontal delay activity during self-organized behavior. J Neurosci 26:11313-11323.

138. Rainer G, Asaad WF, Miller EK (1998) Selective representation of relevant information by neurons in the primate prefrontal cortex. Nature 393:577-579.

139. Rainer G, Rao SC, Miller EK (1999) Prospective coding for objects in primate prefrontal cortex. J Neurosci 19:5493-5505.

140. Ranganath C, DeGutis J, D’Esposito M (2004) Category-specific modulation of inferior temporal activity during working memory encoding and maintenance. Brain Res Cogn Brain Res 20:37-45.

141. Rangel A, Hare T (2010) Neural computations associated with goal-directed choice. Curr Opin Neurobiol 20:262-270.

142. Rescorla RA, Wagner AR (1972) A theory of pavlovian conditioning: variations in the effectiveness of reinforcement and nonreinforcement. In: Classical conditioning II (Black AH, Prokasy WF, eds), pp 64-99. New York: Appleton-Century-Crofts.

143. Ribas-Fernandes JJF, Solway A, Diuk C, McGuire JT, Barto AG, Niv Y, Botvinick MM (2011) A neural signature of hierarchical reinforcement learning. Neuron 71:370-379.

144. Rich EL, Wallis JD (2014) Medial-lateral organization of the orbitofrontal cortex. J Cogn Neurosci 26:1347-1362.

145. Rich EL, Wallis JD (2016) Decoding subjective decisions from orbitofrontal cortex. Nat Neurosci 19:973-980.

146. Riggall AC, Postle BR (2012) The relationship between working memory storage and elevated activity as measured with functional magnetic resonance imaging. J Neurosci 32:12990-12998.

147. Roesch MR, Olson CR (2003) Impact of expected reward on neuronal activity in prefrontal cortex, frontal and supplementary eye fields and premotor cortex. J Neurophysiol 90:1766-1789.

148. Roesch MR, Calu DJ, Schoenbaum G (2007) Dopamine neurons encode the better option in rats deciding between differently delayed or sized rewards. Nat Neurosci 10:1615-1624.

149. Rolls ET (1996) The orbitofrontal cortex. Philos Trans R Soc Lond B Biol Sci 351:1433-1443.

150. Rolls ET, Hornak J, Wade D, McGrath J (1994) Emotion-related learning in patients with social and emotional changes associated with frontal-lobe damage. J Neurol Neurosurg Psychiatry 57:1518-1524.

151. Rose JE, Woolsey CN (1948) The orbitofrontal cortex and its connections with the mediodorsal nucleus in rabbit, sheep and cat. Res Publ Assoc Res Nerv Ment Dis 27:210-232.

152. Rudebeck PH, Behrens TE, Kennerley SW, Baxter MG, Buckley MJ, Walton ME, Rushworth MF (2008) Frontal cortex subregions play distinct roles in choices between actions and stimuli. J Neurosci 28:13775-13785.

153. Rushworth MF, Behrens TE, Rudebeck PH, Walton ME (2007) Contrasting roles for cingulate and orbitofrontal cortex in decisions and social behaviour. Trends Cogn Sci 11:168-176.

154. Rypma B, Prabhakaran V, Desmond JE, Glover GH, Gabrieli JD (1999) Load-dependent roles of frontal brain regions in the maintenance of working memory. Neuroimage 9:216-226.

155. Sakai K, Rowe JB, Passingham RE (2002) Active maintenance in prefrontal area 46 creates distractor-resistant memory. Nat Neurosci 5:479-484.

156. Sallet J, Quilodran R, Rothe M, Vezoli J, Joseph JP, Procyk E (2007) Expectations, gains, and losses in the anterior cingulate cortex. Cogn Affect Behav Neurosci 7:327-336.

157. Schapiro AC, Rogers TT, Cordova NI, Turk-Browne NB, Botvinick MM (2013) Neural representations of events arise from temporal community structure. Nat Neurosci 16:486-492.

158. Schultz W, Dayan P, Montague PR (1997) A neural substrate of prediction and reward. Science 275:1593-1599.

159. Schultz W, Tremblay L, Hollerman JR (2000) Reward processing in primate orbitofrontal cortex and basal ganglia. Cereb Cortex 10:272-284.

160. Seo H, Lee D (2007) Temporal filtering of reward signals in the dorsal anterior cingulate cortex during a mixed-strategy game. J Neurosci 27:8366-8377.

161. Seo H, Lee D (2009) Behavioral and neural changes after gains and losses of conditioned reinforcers. J Neurosci 29:3627-3641.

162. Seo H, Barraclough DJ, Lee D (2007) Dynamic signals related to choices and outcomes in the dorsolateral prefrontal cortex. Cereb Cortex 17 Suppl 1:i110-i117.

163. Serences J, Scolari M, Awh E (2009) Online response-selection and the attentional blink: multiple-processing channels. Vis Cogn 17:531-554.

164. Shallice T (1982) Specific impairments of planning. Philos Trans R Soc Lond B Biol Sci 298:199-209.

165. Shallice T, Burgess PW (1991) Deficits in strategy application following frontal lobe damage in man. Brain 114:727-741.

166. Shima K, Tanji J (1998) Role for cingulate motor area cells in voluntary movement selection based on reward. Science 282:1335-1338.

167. Sigala N, Kusunoki M, Nimmo-Smith I, Gaffan D, Duncan J (2008) Hierarchical coding for sequential task events in the monkey prefrontal cortex. Proc Natl Acad Sci USA 105:11969-11974.

168. Sreenivasan KK, Curtis CE, D’Esposito M (2014a) Revisiting the role of persistent neural activity during working memory. Trends Cogn Sci 18:82-89.

169. Sreenivasan KK, Gratton C, Vytlacil J, D’Esposito M (2014b) Evidence for working memory storage operations in perceptual cortex. Cogn Affect Behav Neurosci 14:117-128.

170. Stokes MG, Kusunoki M, Sigala N, Nili H, Gaffan D, Duncan J (2013) Dynamic coding for cognitive control in prefrontal cortex. Neuron 78:364-375.

171. Sugihara T, Diltz MD, Averbeck BB, Romanski LM (2006) Integration of auditory and visual communication information in the primate ventrolateral prefrontal cortex. J Neurosci 26:11138-11147.

172. Sul JH, Kim H, Huh N, Lee D, Jung MW (2010) Distinct roles of rodent orbitofrontal and medial prefrontal cortex in decision making. Neuron 66:449-460.

173. Sutton RS, Barto AG (1998) Reinforcement learning: an introduction. Cambridge, MA: MIT Press.

174. Sutton RS, Precup D, Singh S (1999) Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning. Artif Intell 112:181-211.

175. Tremblay L, Schultz W (1999) Relative reward preference in primate orbitofrontal cortex. Nature 398:704-708.

176. Tsuchida A, Fellows LK (2009) Lesion evidence that two distinct regions within prefrontal cortex are critical for n-back performance in humans. J Cogn Neurosci 21:2263-2275.

177. Tsujimoto S, Postle BR (2012) The prefrontal cortex and oculomotor delayed response: a reconsideration of the "mnemonic scotoma". J Cogn Neurosci 24:627-635.

178. Tsujimoto S, Genovesio A, Wise SP (2009) Monkey orbitofrontal cortex encodes response choices near feedback time. J Neurosci 29:2569-2574.

179. Uylings HBM, Groenewegen HJ, Kolb B (2003) Do rats have a prefrontal cortex? Behav Brain Res 146:3-17.

180. van Wingerden M, Vinck M, Lankelma JV, Pennartz CMA (2010) Learning-associated gamma-band phase-locking of action-outcome selective neurons in orbitofrontal cortex. J Neurosci 30:10025-10038.

181. Vogel EK, Machizawa MG (2004) Neural activity predicts individual differences in visual working memory capacity. Nature 428:748-751.

182. Voytek B, Knight RT (2010) Prefrontal cortex and basal ganglia contributions to visual working memory. Proc Natl Acad Sci USA 107:18167-18172.

183. Wallis JD (2007) Orbitofrontal cortex and its contribution to decision-making. Annu Rev Neurosci 30:31-56.

184. Wallis JD (2012) Cross-species studies of orbitofrontal cortex and value-based decision-making. Nat Neurosci 15:13-19.

185. Wallis JD, Miller EK (2003) Neuronal activity in primate dorsolateral and orbital prefrontal cortex during performance of a reward preference task. Eur J Neurosci 18:2069-2081.

186. Wallis JD, Kennerley SW (2010) Heterogeneous reward signals in prefrontal cortex. Curr Opin Neurobiol 20:191-198.

187. Wallis JD, Anderson KC, Miller EK (2001) Single neurons in prefrontal cortex encode abstract rules. Nature 411:953-956.

188. Wegener SP, Johnston K, Everling S (2008) Microstimulation of monkey dorsolateral prefrontal cortex impairs antisaccade performance. Exp Brain Res 190:463-473.

189. White IM, Wise SP (1999) Rule-dependent neuronal activity in the prefrontal cortex. Exp Brain Res 126:315-335.

190. Williams SM, Goldman-Rakic PS (1993) Characterization of the dopaminergic innervation of the primate frontal cortex using a dopamine-specific antibody. Cereb Cortex 3:199-222.

191. Williams ZM, Bush G, Rauch SL, Cosgrove GR, Eskandar EN (2004) Human anterior cingulate neurons and the integration of monetary reward with motor responses. Nat Neurosci 7:1370-1375.

192. Wilson RC, Takahashi YK, Schoenbaum G, Niv Y (2014) Orbitofrontal cortex as a cognitive map of task space. Neuron 81:267-279.

193. Wimmer K, Nykamp DQ, Constantinidis C, Compte A (2014) Bump attractor dynamics in prefrontal cortex explains behavioral precision in spatial working memory. Nat Neurosci 17:431-439.

194. Wunderlich K, Rangel A, O’Doherty JP (2010) Economic choices can be made using only stimulus values. Proc Natl Acad Sci USA 107:15005-15010.

195. Yamagata T, Nakayama Y, Tanji J, Hoshi E (2012) Distinct information representation and processing for goal-directed behavior in the dorsolateral and ventrolateral prefrontal cortex and the dorsal premotor cortex. J Neurosci 32:12934-12949.

196. Zarahn E, Aguirre GK, D’Esposito M (1999) Temporal isolation of the neural correlates of spatial mnemonic processing with fMRI. Brain Res Cogn Brain Res 7:255-268.
