bioRxiv preprint doi: https://doi.org/10.1101/604116; this version posted April 23, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Dorsal and ventral hippocampus engage opposing networks in the nucleus accumbens

Marielena Sosa1, Hannah R. Joo1, & Loren M. Frank1,2

1. Graduate Program, Kavli Institute for Fundamental Neuroscience and Department of Physiology, University of California San Francisco, CA 94158, USA.
2. Howard Hughes Medical Institute, San Francisco, CA 94158, USA.

Abstract

Memories of positive experiences require the brain to link places, events, and reward outcomes. Neural processing underlying the association of spatial experiences with reward is thought to depend on interactions between the hippocampus and the nucleus accumbens (NAc)1-9. Hippocampal projections to the NAc arise from both the ventral hippocampus (vH) and the dorsal hippocampus (dH)6-12, and studies using optogenetic interventions have demonstrated that either vH5,6 or dH7 input to the NAc can support behaviors dependent on spatial-reward associations. It remains unclear, however, whether dH, vH, or both coordinate processing of spatial-reward information in the hippocampal-NAc circuit under normal conditions. Times of memory reactivation within and outside the hippocampus are marked by hippocampal sharp-wave ripples (SWRs)13-19, discrete events which facilitate investigation of inter-regional information processing. It is unknown whether dH and vH SWRs act in concert or separately to engage NAc neuronal networks, and whether either dH or vH SWRs are preferentially linked to spatial-reward representations. Here we show that dH and vH SWRs occur asynchronously in the awake state and that NAc spatial-reward representations are selectively activated during dH SWRs. We performed simultaneous extracellular recordings in the dH, vH, and NAc of rats learning and performing an appetitive spatial task and during sleep. We found that individual NAc neurons activated during SWRs from one subdivision of the hippocampus were typically suppressed or unmodulated during SWRs from the other. NAc neurons activated during dH versus vH SWRs showed markedly different task-related firing patterns. Only dH SWR-activated neurons were tuned to similarities across spatial paths and past reward, indicating a specialization for the dH-NAc, but not vH-NAc, network in linking reward to discrete spatial paths. These temporally and anatomically separable hippocampal-NAc interactions suggest that dH and vH coordinate opposing channels of processing in the NAc.

Text

Identifying the nature of coordination between the dH-NAc and vH-NAc pathways requires a simultaneous survey of all three regions. We therefore recorded from dH, vH, and NAc using chronically implanted tetrode arrays in rats (Fig. 1a, Extended Data Fig. 1), in the context of both a dynamic task and interleaved sleep periods. We used a “Multiple-W” task19 in which a rat must first learn which three of six maze arms are reward locations and alternate between them, then must transfer the alternation rule to a new set of reward locations (Fig. 1b,c, Extended Data Fig. 2). This task requires reward-based spatial learning and is known to engage dH SWRs18,19, but when vH SWRs occur with respect to awake behavior is unknown.

We examined awake SWRs detected in dH and vH during periods of immobility on the task, which occurred primarily at the reward wells (example in Fig. 1d). Both dH and vH SWRs (dSWRs and vSWRs) showed the expected spectral properties and increases in multiunit activity16,20, indicating strong, transient activation of local hippocampal networks at the times of SWRs (Extended Data Fig. 3).

Strikingly, despite the existence of dorsoventral connectivity within the hippocampus21 and observations of occasional synchrony between dSWRs and vSWRs during sleep20, dSWRs and vSWRs occurred asynchronously during awake immobility on the maze (Fig. 1d,e). Only ~3.7% of dSWRs occurred within 50 ms of a vSWR, which was no more than expected from a shuffle of SWR times (Fig. 1e, Extended Data Fig. 4a,c). This asynchrony was in stark contrast to the prominent synchrony between pairs of recording sites within dH or within vH (Fig. 1f, Extended Data Fig. 4b), indicating temporally separable dH and vH outputs to downstream brain areas during SWRs.

dSWRs and vSWRs also showed very different patterns of modulation by novelty and reward.
As expected given previous findings15-17,19, dSWRs were more prevalent when the environment was novel and following receipt of reward (Fig. 1g, Extended Data Fig. 4d). Given the strong anatomical projections from the vH to limbic brain areas involved in reward processing5,6,8-11, we expected that vSWRs would show a similar or greater level of reward modulation. Instead, vSWRs maintained a similar rate on rewarded and error trials and were not enhanced during early novelty (Fig. 1h, Extended Data Fig. 4d). The onset time of dSWRs and vSWRs also differed: while dSWRs began only after receipt of reward following initial learning, vSWRs were detected as soon as the animal stopped moving (Extended Data Fig. 4e,f). Thus, dSWRs and vSWRs are differently regulated by novelty and reward.

The temporal separation of awake dSWRs and vSWRs provided the opportunity to determine whether these events differentially engaged the NAc. Previous work reported activation of NAc neurons during dSWRs in sleep3,4. It remains unknown whether NAc neurons are engaged during awake dSWRs or during either awake or sleep vSWRs. Importantly, it is also unknown whether dSWRs and vSWRs engage similar or contrasting NAc populations. To sample the respective target regions of the sparse dH projection and the much more prominent vH projection6-10,12, we recorded from both the NAc core and medial shell (Extended Data Fig. 1d). We then classified NAc single units into putative medium spiny neurons (MSNs) and fast-spiking interneurons (FSIs) (Extended Data Fig. 5a) and examined their activity around the times of awake dSWRs and vSWRs.

We found that 51% of MSNs either significantly increased or decreased their firing rates around the times of dSWRs and/or vSWRs. Strikingly, the observed firing rate changes were often opposite for dSWRs and vSWRs, such that 10.6% of cells were significantly dSWR-activated and vSWR-suppressed (D+V-) or dSWR-suppressed and vSWR-activated (D-V+) (Fig. 2a,b). Crucially, this fraction was significantly larger than would be expected by total chance overlap of the D+/V- and D-/V+ subgroups, while the fraction of co-positively modulated cells (3.2% D+V+) was not greater than chance. In addition, many cells were significantly modulated during only dSWRs or vSWRs but not both (Fig. 2b). Across the full population of MSNs (Fig. 2c, see also Extended Data Fig. 6a), we found a significant anti-correlation in SWR modulation amplitudes (Fig. 2d), demonstrating that dSWRs and vSWRs are consistently associated with opposite activity changes at the level of individual NAc MSNs. We verified that these results held when the small number of coincident dSWRs and vSWRs were excluded (Extended Data Fig. 6e-g) and when we applied a conservative criterion (Extended Data Fig. 5c-j) to ensure that each cell was included only once (Extended Data Fig. 6h-j). We also noted that MSN activity changes predominantly followed dH or vH neuronal activation during SWRs, suggesting hippocampus to NAc information flow (Extended Data Fig. 6b).

The same opposing patterns of modulation were seen when we examined FSIs, applying the same criterion to ensure that each cell was included only once (Extended Data Fig. 5c-j).

The majority of FSIs were SWR-modulated (85%), and we identified FSIs that were D+V-, D-V+, or only D+ or V+, but none that were D+V+ (Fig. 2e,f). The FSI population as a whole showed anti-correlated modulation during dSWRs versus vSWRs (Fig. 2g,h, Extended Data Fig. 6c,d). As individual FSIs innervate large populations of MSNs22, FSIs may play an important role in coordinating NAc responses to hippocampal inputs7. We also found that SWR-modulation of both MSNs and FSIs was anatomically distributed in a pattern consistent with reported dH and vH projections to the NAc6-10,12, with V+ neurons present in both the medial shell and parts of the core and D+ neurons restricted to the core (Extended Data Fig. 7).

The Multiple-W task involves reward associations with particular spatial paths, allowing us to test the prevailing hypothesis that vH would most strongly engage valence-related representations downstream10,11,23-26, including reward associations in the NAc. We speculated that dSWR- and vSWR-activated NAc representations may be distinctly related to the task, since when one is activated the other is typically suppressed.

Consistent with the possibility of distinct representations, we identified multiple differences between dSWR- and vSWR-activated NAc firing patterns outside of SWRs. Surprisingly, these differences highlighted a specificity for spatial-reward information in the dSWR-activated neurons. To examine the dSWR- and vSWR-activated populations independently, we grouped together all cells that were D+ (only D+ or D+V-) and separately, all cells that were V+ (only V+ or D-V+), excluding the D+V+ cells. We then examined the firing patterns of each population on the six rewarded trajectories across the two alternation sequences.
Each trajectory included both time spent at the reward well starting the trajectory (“well,” excluding spikes during SWRs) and time spent during movement between reward wells (“path”; Fig. 3a).

First, we found that D+ MSNs tended to fire very similarly across distinct trajectories as a function of both normalized trial time and linearized position (see Online Methods). We quantified firing similarity as a mean coefficient of determination (r²) across pairs of trajectories. D+ MSNs with high r² values were “tuned” to the same relative point of progression through each trajectory in both time and distance, regardless of actual spatial location or movement direction (examples in Fig. 3b, left). This preferred trajectory stage varied across D+ cells, such that D+ population activity spanned the full extent of each trajectory (Extended Data Fig. 8a). Such patterns have been reported in dorsal and ventral striatum27-29 and are consistent with dH input30, but they have not been previously linked to dSWR activation. Moreover, these D+ MSN firing patterns mirror the subset of medial prefrontal cortical neurons that are activated during dSWRs31, consistent with a role for dSWRs in broadly engaging task-related representations that generalize across different spatial paths.

By contrast, many V+ MSNs showed sparser firing patterns that were largely uncorrelated between distinct trajectories (Fig. 3b, right) and even between traversals on the same trajectory (Extended Data Fig. 9a,b). Across the population, D+ MSNs showed significantly higher firing similarity across trajectories compared to both the V+ MSNs and unmodulated (N) MSNs (Fig. 3c,d). Furthermore, the D+ population had much higher mean firing rates on both the path and well components, suggesting greater task engagement (Fig. 3e,f, Extended Data Fig. 8b-e). Increased correlations across trajectories persisted when we controlled for firing rates (Extended Data Fig. 8f,g). These findings indicate that, in the context of our task, D+ cells (and D+V+ cells, Extended Data Fig. 8h) are much more active and express clear task-related firing properties that are not evident in cells that are V+ and not D+ (see also Extended Data Fig. 9c-g).

We also observed a strong and differential effect of reward history on D+ MSNs, consistent with a role for these neurons in representing the outcome of the previous choice8,9. We computed a reward history preference for each MSN from the difference in its mean firing rate curve on paths following a rewarded well visit versus an unrewarded visit (error), controlling for movement speed and the influence of upcoming choice (Fig. 3g,h; Extended Data Fig. 8i-l). We found that the D+ population fired more on paths following reward than following an error, demonstrating a clear reward history preference that was not seen in the V+ or unmodulated populations (Fig. 3h).
At the same time, we were surprised to find an overall lack of enhanced firing during receipt of reward at the reward sites, given previous work suggesting reward-site-specificity for NAc neurons activated during dSWRs in sleep3,4. While we found individual examples of MSNs that had significantly higher firing rates during rewarded as opposed to unrewarded times at the wells, we also found cells that showed the opposite pattern, and neither the D+ nor V+ populations were enriched for cells showing reward-specific firing at the wells (Extended Data Fig. 9h-k). These results suggest that a learned relationship between spatial paths and reward outcomes is selectively reflected in D+ MSNs.
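As an illustration of the reward history preference discussed in this section, the sketch below computes a signed index from two mean firing-rate curves, one for paths following a rewarded visit and one for paths following an error. The text states only that the measure is based on the difference between these curves; the normalization by their sum, the function name, and the argument names are assumptions made here for illustration, and the Online Methods remain the authoritative definition.

```python
def reward_history_preference(rate_after_reward, rate_after_error):
    """Signed, normalized difference between two mean firing-rate curves (Hz).

    Positive values mean more firing on paths that follow a rewarded visit.
    The (R - E) / (R + E) normalization is an assumption of this sketch.
    """
    if len(rate_after_reward) != len(rate_after_error):
        raise ValueError("curves must be sampled on the same bins")
    diff = sum(r - e for r, e in zip(rate_after_reward, rate_after_error))
    total = sum(r + e for r, e in zip(rate_after_reward, rate_after_error))
    # A silent cell has no defined preference; report 0 in that case.
    return diff / total if total > 0 else 0.0


# Hypothetical two-bin curves: the cell fires more after reward than after error.
index = reward_history_preference([4.0, 6.0], [2.0, 2.0])
```

With this form the index is bounded between -1 (fires only after errors) and 1 (fires only after reward), which matches the axis range shown for Fig. 3h but does not by itself confirm that the authors used this normalization.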

Finally, we asked whether D+ and V+ neurons constitute separate networks across both waking task performance and sleep, as we would expect if these physiological subtypes reflect distinct anatomical networks. During sleep there is greater synchrony between dSWRs and vSWRs20 (~6.7% of dSWRs, Extended Data Fig. 10), so we first excluded all pairs of dSWRs and vSWRs that occurred within 250 ms of each other. In the remaining, temporally isolated SWRs in sleep, we found that the proportion of single MSNs showing opposing modulation was again greater than chance (Fig. 4a). Furthermore, the population-level opposition during dSWRs versus vSWRs remained apparent, particularly for MSNs (Fig. 4b-f).

Patterns of co-firing outside of SWRs provided further evidence for distinct networks. During both awake behavior and sleep, pairs of D+ MSNs showed a stronger tendency to be coactive than D+/V+ pairs. No such difference was seen for V+ MSN pairs as compared to D+/V+ pairs (Fig. 4g,h), perhaps because of the overall low levels of activity of V+ neurons in our task (Fig. 3e,f). These findings indicate that the D+ MSN population constitutes a specific network distinct from the V+ MSN population.

Our findings identify two distinct and opposing networks in the NAc, one activated during dSWRs and another activated during vSWRs. These two networks are activated at different times, and only the dSWR-activated (D+) network showed strong encoding of information related to spatial experience and past reward, consistent with a recent report7. The V+ NAc neurons, in contrast, were not evidently tuned to specific features in our appetitive task. As vH has been shown to encode both broad spatial context23,32,33 and aversive experiences24,26, neither of which was relevant for our task, we suggest that the vH-NAc network could be specialized for associations not engaged in our task such as those between overall context and valence. While the specific role of the vH-NAc network remains unclear, our findings establish an opposition between the dH-NAc and vH-NAc networks that is well suited to support the processing of different aspects of experience at different times.

[Figure 1 image not reproduced. Panels: a, recording schematic (dH, NAc, vH); b, Multiple-W maze with Sequences A and B; c, probability of an accurate response across trials (Acquisition days 1-7, Switch days 1-2); d, raw and ripple-band traces from dCA1 and vCA1 with speed (cm/s) vs. time (s); e, normalized and z-scored cross-correlation vs. time of vSWRs relative to dSWRs (s); f, z-scored cross-correlation vs. time (s) within dCA1 and within vCA1; g, h, dCA1 and vCA1 SWR rate (events/s) vs. probability correct on rewarded and error trials, for the 1st sequence (Acquisition) and 2nd sequence (Switch).]

Figure 1. Awake dorsal and ventral hippocampal SWRs occur asynchronously.
a, Sagittal schematic of tetrodes targeting NAc, dH, and vH in the rat brain.
b, Schematic of the Multiple-W task. Yellow circles indicate visually identical reward wells. The animal was required to alternate visits from the center well of the “W” to the outer two wells to receive a liquid food reward on each correct well visit. Expanded section depicts four consecutive correct trials. Animals acquired one sequence (either A or B, counterbalanced across animals) to ~80% correct (“Acquisition” days) before the second sequence was introduced on the first “Switch” day. Sequences were thereafter switched across maze epochs (behavior sessions separated by sleep epochs in a separate rest box) for the remainder of the experiment (Extended Data Fig. 2).
c, Example behavior (Rat 5) expressed as the probability that the rat is making an accurate choice on each trial according to Sequence A (orange) or B (green) (see Online Methods). Solid line indicates the mode of the probability distribution, shaded region indicates the 90% confidence interval. Colored bars above the plot indicate the rewarded sequence, grey vertical lines mark epoch boundaries, black triangles mark the start of each day, and the horizontal dotted line indicates chance performance (0.167). Only 2 full switch days are shown on the right (see Extended Data Fig. 2 for the complete behavior).
d, Example dSWRs and vSWRs during awake immobility at a reward well (Rat 4). The raw (1-400 Hz) and ripple-filtered (150-250 Hz) local field potential is shown for two tetrodes each in dorsal CA1 (dCA1) and ventral CA1 (vCA1). Shaded regions highlight detected dSWRs (pink) and vSWRs (blue). Bottom plot depicts the speed of the animal.
e, Mean cross-correlation histogram (CCH) between onset times of dSWRs and vSWRs across animals (n=5 rats). Top: CCH normalized by the number of dSWRs. The y-axis signifies the fraction of dSWRs that have a vSWR occurring within each time bin. Bottom: CCH z-scored relative to shuffled vSWR onset times. Z-score 0 reflects mean of shuffles. Error bars indicate s.e.m.
f, Mean CCH showing high synchrony of awake SWRs between pairs of tetrodes within dCA1 (top, n=5 rats) or vCA1 (bottom, n=3 rats with >1 tetrode in vCA1), z-scored relative to shuffled event times. SWRs here were detected on individual tetrodes. Error bars indicate s.e.m.
g, Change in mean dSWR rate on rewarded vs. error trials (well visits), calculated per time spent immobile at reward wells (minimum 1 s). Each point represents the mean SWR rate across animals ± s.e.m. within that learning stage (n=4 rats in Acquisition stages 0.7-0.8, >0.8; n=5 rats all other stages; see Online Methods), defined by each animal’s probability of correct performance on each trial (see Fig. 1c, Extended Data Fig. 2). Learning stage is used here as a proxy for novelty; the task and environment are both novel when Acquisition performance of the first sequence is <0.3. For Switch performance, SWR rate is shown only for the second sequence when it was rewarded. Learning stage means are fit by an exponential decay function for both rewarded and error trials (dotted lines indicate 95% confidence bounds on the fit).
h, Similar to g, but for vSWR rate in each learning stage (n=4 rats in Acquisition stages 0.7-0.8, >0.8; n=5 rats all other stages).
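The cross-correlation histograms described for panels e and f can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function names, the 50 ms bin width, and the jitter-based surrogate are assumptions, and the actual shuffling procedure is specified in the Online Methods.

```python
import random
import statistics


def cross_corr_hist(ref_times, target_times, window=0.5, bin_width=0.05):
    """Mean number of target events per reference event in each lag bin.

    Lags are target minus reference (s) over [-window, window). When at most
    one target falls in a bin, this equals the fraction of reference events
    with a target in that bin.
    """
    n_bins = int(round(2 * window / bin_width))
    counts = [0] * n_bins
    for r in ref_times:
        for t in target_times:
            lag = t - r
            if -window <= lag < window:
                # Clamp guards against floating-point rounding at the upper edge.
                idx = min(int((lag + window) / bin_width), n_bins - 1)
                counts[idx] += 1
    return [c / len(ref_times) for c in counts]


def shuffle_zscore(ref_times, target_times, n_shuffles=200, jitter=5.0,
                   seed=0, window=0.5, bin_width=0.05):
    """Z-score each bin of the observed histogram against surrogate histograms."""
    rng = random.Random(seed)
    observed = cross_corr_hist(ref_times, target_times, window, bin_width)
    surrogates = []
    for _ in range(n_shuffles):
        # Assumed surrogate: shift each target time by an independent random offset.
        jittered = [t + rng.uniform(-jitter, jitter) for t in target_times]
        surrogates.append(cross_corr_hist(ref_times, jittered, window, bin_width))
    z = []
    for b, obs in enumerate(observed):
        vals = [s[b] for s in surrogates]
        mu = statistics.mean(vals)
        sd = statistics.pstdev(vals)
        z.append((obs - mu) / sd if sd > 0 else 0.0)
    return observed, z


# Synthetic example: every reference event is followed 10 ms later by a target.
ref = [10.0 * i for i in range(1, 21)]
tgt = [r + 0.01 for r in ref]
obs, z = shuffle_zscore(ref, tgt)
```

In this synthetic case all of the observed mass falls in the bin that starts at zero lag, which is the signature of synchrony seen within dCA1 or within vCA1; asynchronous event trains would instead give z-scores near 0 in every bin.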

[Figure 2 image not reproduced. Panels: a, e, SWR-aligned spike rasters (SWR #) and firing rates (FR, Hz) vs. time (s) for example cells labeled D+V-, D-V+, D+ only, and V+ only; b, f, fraction of cells by modulation category (D only, V only, Both, Neither; D+, D-, V+, V-, D+V-, D-V+, D+V+, D-V-); c, g, dSWR and vSWR firing (z-scored firing rate) vs. time from SWR onset (s) for sorted NAc MSNs and FSIs; d, h, z-scored dSWR modulation vs. z-scored vSWR modulation.]

Figure 2. Opposing patterns of NAc modulation during awake dH vs. vH SWRs.
a, Examples of single NAc MSNs showing significant modulation during awake SWRs on the Multiple-W task. Spike rasters and peri-event time histograms (PETHs) are aligned to the onset of dSWRs (left within each cell, pink line) or vSWRs (right within each cell, blue line). Horizontal lines separate maze epochs. Categories at the top of each cell indicate directions of significant modulation (positive or negative). All modulations in these examples are significant at p<0.01 (shuffle test).
b, Proportions of significantly SWR-modulated NAc MSNs. Top: fractions modulated during dSWRs only (D only), vSWRs only (V only), Both, or neither dSWRs nor vSWRs (Neither) regardless of the direction of modulation. Cell counts are shown in white. Significantly more cells are modulated during Both than would be expected by chance overlap of dSWR- and vSWR-modulated cells (***p=5.44×10⁻⁴, z-test for proportions). Bottom: directional modulation of MSNs. Cell counts are shown next to each bar. The fractions of D+V- cells alone and total “opposing” cells (gradient bar, D+V- and D-V+) are higher than would be expected by chance (**p=0.0017 and ***p=6.12×10⁻⁴, respectively, z-tests for proportions). All fractions are out of 502 MSNs that fired at least 50 spikes around both dSWRs and vSWRs, from all 5 rats.
c, NAc MSN population shows opposing modulation during dSWRs vs. vSWRs. Left: dSWR-aligned z-scored PETHs for each MSN ordered by its modulation amplitude (mean z-scored firing rate in the 200 ms following SWR onset). Right: vSWR-aligned z-scored PETHs for the same ordered MSNs shown on the left. Z-scores are calculated within cell relative to the pre-SWR period (-500 to 0 ms).
d, Anti-correlation between dSWR and vSWR modulation amplitudes of MSNs. Each point represents a single cell. Dotted line and shaded regions represent a linear fit with 95% confidence intervals. Pearson’s correlation coefficient (r), p-value and slope are shown in upper right.
e, Examples of single NAc FSIs showing significant modulation during SWRs. Format and modulation categories as in a, top row. All modulations are significant at p<0.01 (shuffle test).
f, Proportions of significantly SWR-modulated FSIs after exclusion of any potential duplicates from holding the same cell across days, which could bias this small population of cells (see Online Methods). Fractions are out of 13 FSIs from 5 rats. Similar to b.
g, NAc FSI population shows opposing modulation during dSWRs vs. vSWRs. Similar to c.
h, Anti-correlation between dSWR and vSWR modulation amplitudes of FSIs. Similar to d.
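The modulation amplitude defined in panel c, and the correlation across cells reported in panels d and h, can be sketched as below. The z-scoring window (-500 to 0 ms) and response window (200 ms after SWR onset) follow the legend; the function names, the bin layout, and the use of the population standard deviation are assumptions of this sketch rather than details taken from the paper.

```python
import math
import statistics


def modulation_amplitude(peth, bin_starts_ms, baseline=(-500, 0), response=(0, 200)):
    """Mean z-scored firing rate in the response window.

    peth: mean firing rate per bin; bin_starts_ms: left edge of each bin (ms
    relative to SWR onset). Z-scores use the baseline bins of the same cell.
    """
    base = [r for r, t in zip(peth, bin_starts_ms) if baseline[0] <= t < baseline[1]]
    resp = [r for r, t in zip(peth, bin_starts_ms) if response[0] <= t < response[1]]
    mu = statistics.mean(base)
    sd = statistics.pstdev(base)
    if sd == 0:
        return 0.0  # flat baseline: z-score undefined, reported as 0 here
    return statistics.mean([(r - mu) / sd for r in resp])


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical PETH in 50 ms bins from -500 ms to +500 ms.
bin_starts = list(range(-500, 500, 50))
peth = [1.0, 3.0] * 5 + [5.0] * 4 + [2.0] * 6
amp = modulation_amplitude(peth, bin_starts)

# Hypothetical per-cell amplitudes: cells excited by dSWRs are suppressed by vSWRs.
r = pearson_r([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

Computing one amplitude per cell for dSWRs and one for vSWRs, then correlating the two across cells, gives the kind of population-level comparison summarized by the r values in panels d and h.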

[Figure 3 image not reproduced. Panels: a, trajectory schematic with well and path components; b, example D+ and V+ MSNs, mean FR (Hz) vs. normalized trial time and linearized position, with r² values; c, d, mean r² in trial time and in linear position for N, D+, and V+ cells; e, f, mean path FR (Hz) and mean well FR (Hz) for N, D+, and V+ cells; g, example cells, mean FR (Hz) and speed (cm/s) on paths following reward vs. error; h, reward history preference for N, D+, and V+ cells.]

Figure 3. Selective encoding of task-related information in the dH-NAc network.
a, Schematic of the 6 rewarded trajectories across Sequence A and B, defined spatially by start and end reward well and directionally by left (top) or right (middle) movement between wells. Bottom: An example trajectory split into its well and path components, where well is the time from nosepoke to turnaround (departure) and path is the time from turnaround to next nosepoke.
b, Example NAc MSNs from the D+ and V+ populations showing high (D+, left) vs. low (V+, right) firing similarity (r²) across trajectories. Trajectories are color coded according to the key in a. Cell numbers do not correspond to Fig. 2. Top row: trajectories are plotted as a function of normalized trial time. Grey vertical line marks 100% of the well period and beginning of the path period, and the end of the path period starts a new well visit. Bottom row: trajectories are plotted as a function of linearized position (one-dimensional distance in cm) from each trajectory’s start well to end well. Note that well times fall into the first and last position bins of the path since there is no position change at the well. X-axis for each linearized position plot covers ~220 cm. Pearson’s r² across all possible pairs of trajectories is shown in upper right. Controls for behavioral variability were included (see Online Methods).
c, d, Distributions of mean r² in normalized trial time (c) and linearized position (d) by SWR-modulation category. c, N (n=226 cells) vs. D+ (n=154 cells): ***p=5.45×10⁻¹⁹; D+ vs. V+ (n=42 cells): ***p=6.33×10⁻⁹. d, N (n=222 cells) vs. D+ (n=152 cells): ***p=3.17×10⁻¹²; D+ vs. V+ (n=40 cells): ***p=5.46×10⁻⁷ (Wilcoxon rank-sum tests). Circles are individual cells, boxes show interquartile range, horizontal lines mark the median, triangles mark the 95% confidence interval of the median, whiskers mark non-outlier extremes.
e, f, Distributions of mean firing rate outside of SWRs during maze epochs, during path (e) and well periods (f), by SWR-modulation category. e, N (n=226 cells) vs. D+ (n=154 cells): ***p=1.57×10⁻¹²; D+ vs. V+ (n=42 cells): ***p=5.70×10⁻⁶. f, N vs. D+: ***p=4.94×10⁻²²; D+ vs. V+: ***p=3.69×10⁻⁵; V+ vs. N: p=0.024, n.s. with Bonferroni correction for multiple comparisons (Wilcoxon rank-sum tests). Circles and boxes as in c,d.
g, Example NAc MSNs from the D+ (left) and V+ (right) populations showing path firing patterns as a function of reward history, defined as whether or not the previous well visit was rewarded. Top: firing rate of each cell (mean ± s.e.m.) on all paths following a reward (teal) vs. all paths following an error (brown). Reward history preference (see Online Methods): Cell 5: 0.30, Cell 6: 0.47, Cell 7: 0.025, Cell 8: -0.19. ***p<0.001 indicates cells with significantly higher firing rate on paths following reward compared to error (one-tailed permutation test). Bottom: faded lines indicate speed profiles of individual paths following reward and error, thick lines indicate mean speeds.
h, Distributions of reward history preference by SWR-modulation category. Filled circles indicate significantly reward-preferring (>0) or error-preferring (<0) cells, open circles indicate non-significant cells. The D+ population (n=160 cells) is significantly shifted toward positive values (p=8.18×10⁻¹³, one-tailed signed-rank test) in addition to being significantly more reward-preferring than the N (n=241 cells, ***p=2.63×10⁻⁵) and V+ (n=45 cells, ***p=0.00041) populations.
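The firing similarity measure in panels b-d, the mean of Pearson's r² over all pairs of trajectories, can be sketched as below. The function names and the handling of flat firing-rate curves are assumptions of this sketch; the controls for behavioral variability mentioned in the legend are described in the Online Methods and are not reproduced here.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation between two firing-rate curves on the same bins."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # flat curve: correlation undefined, treated as 0 here
    return sxy / math.sqrt(sxx * syy)


def mean_pairwise_r2(curves):
    """Mean r^2 over all unordered pairs of trajectory firing-rate curves."""
    r2 = [pearson_r(a, b) ** 2 for a, b in combinations(curves, 2)]
    return sum(r2) / len(r2)


# Hypothetical curves: a and c share a shape, b is uncorrelated with both.
a = [1.0, 0.0, -1.0, 0.0]
b = [0.0, 1.0, 0.0, -1.0]
c = [2.0, 0.0, -2.0, 0.0]
similarity = mean_pairwise_r2([a, b, c])

# Six rewarded trajectories give 15 trajectory pairs per cell.
n_pairs = len(list(combinations(range(6), 2)))
```

Because r is squared, the measure is insensitive to differences in firing-rate scale between trajectories and treats strongly anti-correlated curves as similar, so it captures a shared profile rather than matching rates.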

[Figure 4 image. Panels a-c: MSNs during sleep (fractions of modulated cells; dSWR- and vSWR-aligned z-scored firing rate vs. time from SWR onset; z-scored dSWR vs. vSWR modulation, r = -0.20, p = 3.51×10⁻¹², slope = -0.25). Panels d-f: FSIs during sleep (same layout; r = -0.14, p = 0.64, slope = -0.16). Panels g, h: z-scored spike cross-correlations and fraction of positively correlated pairs for D+/D+, D+/V+ and V+/V+ cell pairs, awake and sleep.]

Figure 4. Hippocampal-NAc network patterns are maintained during sleep.
a, Proportions of NAc MSNs showing significant modulation during asynchronous (more than 250 ms apart) dSWRs and vSWRs in sleep, similar to Fig. 2b. Top: fractions of modulated MSNs regardless of the direction of modulation. Cell counts are shown in white. Significantly more cells are modulated during Both than would be expected by chance overlap of dSWR- and vSWR-modulated cells (***p=1.50×10⁻⁴, z-test for proportions). Bottom: directional modulation of NAc MSNs. Cell counts are shown next to each bar. The fraction of D-V+ cells alone and total “opposing” cells (gradient bar, D+V- and D-V+) are higher than would be expected by chance (*p=0.012 and *p=0.037, respectively, z-tests for proportions). All fractions are out of 1241 MSNs that fired at least 50 spikes around both dSWRs and vSWRs in sleep, from all 5 rats.
b, NAc MSN population activity shows opposing modulation during asynchronous dSWRs and vSWRs in sleep, similar to Fig. 2c. Left: dSWR-aligned z-scored PETHs for each MSN ordered by its modulation amplitude. Right: vSWR-aligned z-scored PETHs for the same ordered MSNs shown on the left.
c, Anti-correlation between dSWR and vSWR modulation amplitudes of MSNs, similar to Fig. 2d. Each point represents a single cell. Dotted line and shaded regions represent a linear fit with 95% confidence intervals. Pearson’s correlation coefficient (r), p-value and slope are shown in upper right.
d, Similar to a, but for fractions of FSIs showing significant modulation during asynchronous SWRs in sleep, after removal of potential duplicate cells. Fractions are out of 13 FSIs.
e, Similar to b, but for FSI population in sleep.
f, Similar to c, but for FSI population in sleep.
g, Cumulative distributions of z-scored spike cross-correlations (mean at zero lag ±10 ms) between pairs of MSNs outside SWRs. Left two plots: pairs of D+ MSNs (D+/D+) vs. pairs of D+ and V+ MSNs (D+/V+) during awake movement (far left) and during sleep (second from left). The D+/D+ distribution is significantly shifted to the right of the D+/V+ distribution in both states (awake ***p=5.01×10⁻⁷; sleep ***p=2.86×10⁻⁵; one-tailed Wilcoxon rank-sum tests). Right two plots: pairs of V+ MSNs (V+/V+) vs. D+/V+ pairs, during awake movement (second from right) and during sleep (far right). The V+/V+ distribution is not significantly different from the D+/V+ distribution in either state. D+/V+ distributions are repeated from left to right for clarity. D+ and V+ categories are assigned based on awake SWR modulation, excluding cells that are both D+V+. Sleep cross-correlations were only calculated for awake-SWR-modulated cells also active in sleep (awake n=272 D+/D+ pairs, 146 D+/V+ pairs, 10 V+/V+ pairs; sleep n=161 D+/D+ pairs, 75 D+/V+ pairs, 3 V+/V+ pairs).
h, Fraction of cell pairs exhibiting positive spike cross-correlations (≥2 z-scores, mean at zero lag ±10 ms). Awake: D+/D+ vs. D+/V+: ***p=1.11×10⁻⁹; D+/D+ vs. V+/V+: p=0.094. Sleep: D+/D+ vs. D+/V+: *p=0.025; D+/D+ vs. V+/V+: p=0.32 (z-tests for proportions).
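The comparison in a between the observed fraction of Both-modulated cells and the fraction expected by chance overlap can be sketched as follows. This is only an illustration: the exact form of the z-test for proportions is specified in Online Methods and is not reproduced here, so the one-sample form below is an assumption, as are the function name and the example counts (which are hypothetical, not the counts from this figure).

```python
from math import erfc, sqrt


def overlap_z_test(n_d, n_v, n_both, n_total):
    """One-sample z-test of the observed 'Both' fraction against chance.

    n_d and n_v are the numbers of dSWR- and vSWR-modulated cells, each
    including the cells modulated during both. Chance overlap is taken as
    the product of the two marginal fractions (independence assumption).
    """
    expected = (n_d / n_total) * (n_v / n_total)
    observed = n_both / n_total
    se = sqrt(expected * (1.0 - expected) / n_total)
    z = (observed - expected) / se
    p_one_sided = 0.5 * erfc(z / sqrt(2.0))  # upper-tail normal probability
    return expected, observed, z, p_one_sided


# Hypothetical counts: 1000 cells, 200 dSWR-modulated, 100 vSWR-modulated,
# 40 modulated during both.
expected, observed, z, p = overlap_z_test(n_d=200, n_v=100, n_both=40, n_total=1000)
print(f"expected={expected:.3f} observed={observed:.3f} z={z:.2f} p={p:.2g}")
```

Under independence only 2% of these hypothetical cells would be modulated during both event types, so an observed 4% yields a large positive z.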


Extended Data Figure 1. Histological identification of recording sites.
a, b, Nissl-stained coronal sections showing example tetrode lesions marked by black arrows in (a) dCA1 and dCA3 and (b) NAc, in Rat 5.
c, Nissl-stained coronal sections showing example tetrode lesions in vH. Each section includes the tetrode used for vSWR detection in each rat (red arrows). In a, b, and c, scale bars are 1 mm.
d, Summary of all NAc recording locations across rats, aligned to the nearest representative section adapted from the Paxinos & Watson Wistar rat brain atlas (2007). Dots mark the last recorded depth of each tetrode at the end of the experiment. Dotted lines indicate approximate borders of NAc core and shell. Grey shaded regions represent ventricles. All distances are in AP coordinates (mm) relative to Bregma and correspond to original plate labels from the atlas; note that these distances do not correspond exactly to true recording locations in our Long-Evans rats and are for illustration purposes only. For true implant coordinates, see Online Methods.
e, f, Summary of all recording sites in dH (e) and vH (f). Light blue regions represent stratum pyramidale of CA1/CA3 or granule cell layer of dentate gyrus, darker blue regions represent stratum pyramidale of CA2. Grey shaded regions represent ventricles. Red arrows indicate the 5 sites used for vSWR detection (1 site per animal). Abbreviations: ac, anterior commissure; NAcC, nucleus accumbens core; NAcSh, nucleus accumbens shell; cc, corpus callosum; DG, dentate gyrus; DS, dorsal subiculum; VS, ventral subiculum.

[Extended Data Figure 2 image. One panel per rat (Rats 1-5), each spanning the Acquisition and Switch phases. X-axis: Trials; y-axis: Probability of an Accurate Response (0 to 1).]

Extended Data Figure 2. Multiple-W behavior. Behavior from each rat expressed as the probability that the rat is making an accurate choice on each trial according to Sequence A (orange) or B (green) (see Online Methods). Solid line indicates the mode of the probability distribution, shaded region indicates the 90% confidence interval. Colored bars at the top of each plot indicate the rewarded sequence, grey vertical lines mark epoch boundaries, black triangles mark the start of days, and the horizontal dotted line indicates chance performance: 1/6 (0.167).

[Extended Data Figure 3 image. Panel a: SWR-triggered spectrograms (Frequency (Hz) vs. Time from SWR (s)) for dCA1, vCA1 and vCA3 SWRs, awake and sleep. Panel b: Z-scored power vs. Frequency (Hz). Panels c-e: distributions of SWR amplitude (s.d.), duration of <4 s.d. SWRs (s), and SWR rate per epoch (Hz), awake and sleep. Panel f: MUA FR (Hz) vs. Time from SWR (s) for dCA1, dCA3, vCA1 and vCA3. Panel g: Z-scored MUA vs. Time from SWR (s).]

Extended Data Figure 3. Characterization of dH and vH SWRs.
a, Examples of mean SWR-onset-triggered spectrograms on one tetrode in one rat, from each detection region, during awake immobility (top) and sleep (bottom). Left: all dCA1 SWRs (n=9,642 awake, 26,048 sleep) in Rat 5; Center: all vCA1 SWRs (n=9,672 awake, 31,553 sleep) in Rat 5; Right: all vCA3 SWRs (n=18,798 awake, 24,536 sleep) in Rat 1 (the only rat in which vCA3 SWRs were used). All SWRs >2 s.d. are included. Color bar shows power, z-scored within each epoch and averaged across epochs and days. In a and b, SWRs were detected as described in Online Methods (3 dCA1 tetrodes and 1 vH tetrode per animal).
b, Mean z-scored power (a slice of the SWR-triggered spectrogram) at frequencies between 100 and 250 Hz, at the time of peak ripple power for each animal during awake immobility (top row) and sleep (bottom row). Grey curves indicate individual animals, colored curves indicate mean ± s.e.m. across animals. Arrows indicate mean ± s.e.m. peak ripple frequency. Left: dCA1 SWRs (n=51,333 awake, 115,158 sleep, 5 rats); Center: vCA1 SWRs (n=47,222 awake, 120,271 sleep, 4 rats); Right: vCA3 SWRs (n=18,798 awake, 24,536 sleep, 1 rat). As vCA1 and vCA3 SWRs occurred at nearly the same frequency, we pooled these ventral SWRs (vSWRs) for the remainder of the study.
c, Distributions of SWR amplitudes across animals. In both wake and sleep, dSWRs (pink) are typically larger amplitude than vSWRs (blue; ***p=0, Wilcoxon rank-sum test), consistent with previous work (ref. 20). In c, d, and e, only one dCA1 tetrode per animal was used for SWR detection to allow for direct comparison to vSWRs (in c and e, awake n=54,496 dSWRs, 74,355 vSWRs; sleep n=118,897 dSWRs, 146,995 vSWRs).
d, Distributions of SWR durations for <4 s.d. SWRs, across animals. In both wake and sleep, dSWRs (pink) are significantly longer than vSWRs (blue), although this difference is more pronounced in sleep (***p<10⁻⁴⁰, Wilcoxon rank-sum test; awake n=38,170 dSWRs, 71,029 vSWRs; sleep n=66,382 dSWRs, 135,035 vSWRs).
e, Distributions of dSWR (pink) and vSWR (blue) rate per epoch, expressed as the fraction of epochs with each rate (out of 256 awake epochs and 411 sleep epochs across animals). For awake epochs on the maze, rate was calculated per total time spent at <4 cm/s. For sleep epochs, rate was calculated per time spent in NREM sleep (see Online Methods). In both wake and sleep, the vSWR rate is typically higher than the dSWR rate (***p<10⁻⁴⁰, Wilcoxon rank-sum test).
f, Examples of hippocampal multiunit activity (MUA) in wake and sleep, aligned to the onset of dSWRs (pink lines), vCA1 SWRs (blue lines), or vCA3 SWRs (aqua lines). In the upper right of each panel is the region and rat from which MUA was detected. Firing rate was calculated from the summed spike count from all tetrodes in the region.
g, Mean z-scored multiunit firing rate across animals. Grey shaded region indicates s.e.m. Top two rows: dCA1 and dCA3 MUA aligned to dCA1 SWRs (n=5 rats). Bottom two rows: vCA1 MUA (n=4 rats) and vCA3 MUA (n=3 rats) aligned to vCA1 SWRs. Z-scores were calculated within animal relative to the pre-SWR period (-500 to 0 ms). Strong similarity in activation timing of vCA3 MUA between vCA1 SWRs and vCA3 SWRs also provided support for the use of vCA3 for SWR detection in Rat 1.
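The baseline normalization in g (z-scoring relative to the pre-SWR period, -500 to 0 ms) can be sketched as below. The function name, bin size and example rates are hypothetical; the use of the sample standard deviation is an assumption, since the legend does not state which estimator was used.

```python
from statistics import mean, stdev


def zscore_to_baseline(rates, times, base_start=-0.5, base_end=0.0):
    """Z-score a SWR-aligned rate trace against its pre-SWR baseline.

    rates: firing rate per time bin (Hz); times: bin start times in seconds
    relative to SWR onset. Baseline bins satisfy base_start <= t < base_end.
    """
    baseline = [r for r, t in zip(rates, times) if base_start <= t < base_end]
    mu = mean(baseline)
    sd = stdev(baseline)  # assumption: sample standard deviation
    return [(r - mu) / sd for r in rates]


# Hypothetical MUA trace in 100 ms bins from -500 ms to +500 ms.
times = [i / 10 for i in range(-5, 5)]
rates = [10, 12, 10, 12, 10, 30, 40, 20, 12, 10]
z = zscore_to_baseline(rates, times)
print([round(v, 1) for v in z])
```

Because the mean and standard deviation come only from the pre-SWR bins, the baseline portion of the trace averages to zero and post-onset values are read as deviations from pre-event activity.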

[Extended Data Figure 4 image. Panel a: z-scored cross-correlation vs. time of vSWRs relative to dSWRs (s), Rats 1-5. Panel b: normalized cross-correlation within dCA1 and within vCA1. Panel c: z-scored cross-correlation (relative to shuffle) on rewarded and error trials. Panel d: SWR rate (events/s) vs. Epochs for dSWRs and vSWRs, Rats 1-5 (A reward, A error, B reward, B error). Panels e, f: Speed (cm/s) and dCA1 or vCA1 SWR rate (events/s) vs. time relative to nosepoke (s) for Novel, Late Acquisition and All switch stages, rewarded and error trials.]

Extended Data Figure 4. Awake SWR asynchrony and timing relative to reward.
a, Cross-correlation histograms (CCH) of awake vSWRs relative to dSWRs within each animal, z-scored relative to shuffle. Note a lack of synchrony at less than 50 ms lag across animals. Error bars indicate s.e.m. across days (n=15-19 days) for each animal.
b, Mean CCH showing high synchrony of awake SWRs between tetrodes within dCA1 (left, n=5 rats) or vCA1 (right, n=3 rats with >1 tetrode in vCA1), normalized to the number of SWRs on one tetrode. Complementary to Fig. 1f. Error bars indicate s.e.m. across animals.
c, Asynchrony between dSWRs and vSWRs is present on both rewarded (left) and error (right) trials. A minimum of 3 s immobility following the time of reward delivery (or expected reward delivery on error trials) was required to include a trial’s SWRs (n=5 rats, error bars are s.e.m.).
d, Changes in rewarded vs. error SWR rate over task exposure, as a function of maze epochs per animal. Filled circles indicate mean SWR rate across rewarded trials (well visits) within the epoch (± s.e.m.), open squares indicate mean SWR rate across error trials. Each point is color coded by the rewarded sequence of that epoch (Sequence A: orange; Sequence B: green). Left: dSWRs; black line indicates an exponential fit ± 95% confidence interval on error SWR rate, grey line indicates exponential fit on rewarded SWR rate. Right: vSWRs; note that some error bars extend above the y-axis limit.
e, Timing of dSWRs relative to reward delivery, by behavioral stage. Gold line in e and f indicates reward delivery or its expected time on error trials, 2 s after nosepoke at the well. Top: mean speed in each behavioral stage; note that each rat’s head takes ~1 s to fully decelerate and dip into the reward well. Below: mean SWR rate across animals in 200 ms bins for rewarded and error trials during the first 100 trials of Multiple-W behavior (“Novel”), trials occurring at >0.6 probability correct on the first sequence (“Late Acquisition”), and during all Switch epochs (“All switch”). Shades of red are rewarded trials, shades of grey are error trials.
f, Timing of vSWRs relative to reward delivery, by behavioral stage, similar to e. Speed data at top are repeated from e. Note that since SWR rate is only calculated when at least 2 s of data are present below 4 cm/s in each bin, empty bins indicate too little data to calculate an SWR rate. Shades of blue are rewarded trials, shades of grey are error trials.
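The binning rule described in e and f (SWR rate in 200 ms bins, computed only where at least 2 s of data below 4 cm/s are available) can be sketched as follows. The data layout, function name and example trials are hypothetical, and pooling occupancy across trials before dividing is an assumption about how the minimum-data criterion is applied.

```python
def swr_rate_by_bin(trials, bin_ms=200, n_bins=25, speed_max=4.0, min_ms=2000):
    """SWR rate (events/s) per time bin relative to nosepoke.

    Each trial is a dict with:
      "dt_ms"        sampling interval of the speed trace (ms)
      "speed"        speed samples (cm/s), first sample at the nosepoke
      "swr_times_ms" SWR onset times (ms after nosepoke)
    Bins with less than min_ms of immobility data return None.
    """
    counts = [0] * n_bins
    occupancy_ms = [0] * n_bins
    for trial in trials:
        dt = trial["dt_ms"]
        for i, speed in enumerate(trial["speed"]):
            b = (i * dt) // bin_ms
            if b < n_bins and speed < speed_max:
                occupancy_ms[b] += dt  # time spent below the speed threshold
        for t in trial["swr_times_ms"]:
            b = t // bin_ms
            if 0 <= b < n_bins:
                counts[b] += 1
    return [
        c / (occ / 1000.0) if occ >= min_ms else None
        for c, occ in zip(counts, occupancy_ms)
    ]


# Hypothetical data: 20 identical trials; the animal is still moving for the
# first 200 ms and emits one SWR at 450 ms.
trials = [
    {"dt_ms": 100, "speed": [20, 10, 2, 2, 1, 1], "swr_times_ms": [450]}
    for _ in range(20)
]
rates = swr_rate_by_bin(trials, n_bins=3)
print(rates)
```

Returning `None` rather than zero for under-sampled bins keeps "no data" distinct from "no SWRs", which corresponds to the empty bins noted in f.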

[Extended Data Figure 5 image. Panel a: mean firing rate (Hz) vs. peak width (ms) and trough width (ms) for NAc units (MSN, FSI, TAN, unclassified), shown separately for Rats 1, 2 (filtered 600-6000 Hz) and Rats 3, 4, 5 (filtered 300-6000 Hz). Panel b: same axes for dorsal CA1 units, all rats. Panels c, d: different tetrodes, same day (c: R2-D6-T27-C1 vs. R2-D6-T28-C2, r = 0.8907; d: R1-D20-T17-C1 vs. R1-D20-T20-C2, r = 0.9613). Panels e, f: same tetrode, different days (e: R1-D13-T18-C1 vs. R1-D14-T18-C1, r = 0.9023; f: R3-D9-T24-C4 vs. R3-D10-T24-C4, r = 0.8802). Panels g, h: same tetrode, different days (g: R2-D16-T27-C3 vs. R2-D17-T27-C3, r = 0.9943; h: R3-D9-T24-C4 vs. R3-D10-T24-C3, r = 0.9992). Panels i, j: fraction of cell pairs vs. mean waveform correlation (r) for awake epochs on maze (i) and sleep epochs (j).]

Extended Data Figure 5. NAc cell type classification and waveform correlation control.
a, Classification of putative NAc cell types by waveform properties and mean firing rate. Each point represents a single unit. Number of units (wake and sleep included): MSNs: 1799, FSIs: 30, TANs: 2, unclassified: 12. Data from Rats 1 and 2 (left) were classified separately from Rats 3-5 (right) due to narrower bandpass filtering at time of acquisition, as this changes waveform shape.
b, Classification of putative hippocampal cell types in dCA1 (contributes to Extended Data Fig. 6). Number of units (wake and sleep included): pyramidal: 1218, interneuron: 13, unclassified: 18.
c-h, Illustration of waveform correlation metric to assess cells as “unique” or “duplicate.” Each panel shows an example pair of cells with their mean Pearson’s r value across all 4 channels listed at the bottom. Top row: each cell’s cluster in amplitude space on two channels of the tetrode; each point is the maximum amplitude of a spike. Colored spikes represent the chosen unit; grey spikes represent all other threshold crossings. Cell ID code lists the rat number, day number (since beginning of data acquisition), tetrode number, and cell number (assigned out of the total cells on that tetrode, for that day). Bottom row: mean waveforms of the cell pair on each channel, aligned to the first cell’s peak on its maximum channel. All scale bars are 60 µV on the y-axis, 1 ms on the x-axis. c and d, Cell pairs which were recorded on different tetrodes on the same day, thus by definition are different cells. e and f, Cell pairs recorded on the same tetrode on consecutive days at the same depth, but which have an r below the threshold for a duplicate cell, indicating that they are likely different cells. g and h, Cell pairs recorded on the same tetrode on consecutive days at the same depth with an r above the threshold, and thus are classified as potential duplicates across days. The left (blue) cell in h is the same cell on the left in f, and the clustering projection on the right in h is the same as the right projection in f.
i, Distributions of mean waveform r for all cell pairs, calculated from awake epochs on the maze. Black: pairs on different tetrodes on the same day. Magenta: pairs on the same tetrode at similar depths across days. Inset shows zoomed-out view of the whole distribution. Vertical dotted line shows the threshold used to classify a cell pair as potential duplicates, r=0.979, which is the 95th percentile of the black distribution.
j, Similar to i, but for waveform r calculated from sleep epochs. Sleep threshold: r=0.980. Cells present in both wake and sleep were classified as potential duplicates based on their r in wake.
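The duplicate-cell control in c-j (mean Pearson’s r across the 4 tetrode channels, thresholded at the 95th percentile of the different-tetrode, same-day distribution) can be sketched as below. All names and waveform values are hypothetical, waveform alignment to the first cell’s peak is not modeled, and the linear-interpolation percentile is an assumption about how the threshold was computed.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mean_waveform_r(unit_a, unit_b):
    """Mean r across channels; each unit is a list of per-channel mean waveforms."""
    rs = [pearson_r(a, b) for a, b in zip(unit_a, unit_b)]
    return sum(rs) / len(rs)


def percentile(values, q):
    """Percentile with linear interpolation between order statistics (q in 0-100)."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


def flag_potential_duplicates(null_rs, candidate_rs, q=95):
    """Threshold candidates at the q-th percentile of the null distribution.

    null_rs: r values for pairs on different tetrodes on the same day
    (different cells by definition). candidate_rs: r values for pairs on the
    same tetrode across days.
    """
    threshold = percentile(null_rs, q)
    return threshold, [r > threshold for r in candidate_rs]


# Hypothetical 4-channel mean waveforms; unit_b is unit_a at twice the amplitude.
unit_a = [[0, 1, 3, 1, 0], [0, 2, 5, 2, 0], [0, 1, 2, 1, 0], [0, 3, 6, 2, 0]]
unit_b = [[2 * v for v in ch] for ch in unit_a]
null_rs = [i / 100 for i in range(101)]  # hypothetical null distribution
threshold, flags = flag_potential_duplicates(null_rs, [0.90, 0.99])
print(mean_waveform_r(unit_a, unit_b), threshold, flags)
```

Note that Pearson’s r is insensitive to amplitude scaling, so two units that differ only in spike amplitude would still be flagged; this makes the criterion conservative in the sense of over-identifying potential duplicates.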

[Extended Data Figure 6 image. Panels a, c: vSWR- and dSWR-aligned z-scored firing rate of sorted NAc MSNs (a) and FSIs (c) vs. time from SWR onset (s). Panels b, d: z-scored firing rate of dCA1 pyr, vCA1 MUA and D+, D-, V+, V- MSNs (b) or FSIs (d) vs. time from dSWR or vSWR onset (s). Panels e-g: MSN fractions, PETHs, and z-scored dSWR vs. vSWR modulation (r = -0.36, p = 1.66×10⁻¹⁵, slope = -0.55). Panels h-j: same layout (r = -0.27, p = 1.91×10⁻⁴, slope = -0.49).]

Extended Data Figure 6. NAc modulation during awake SWRs.
a, Opposing modulation is not a result of cell ordering. Left: vSWR-aligned z-scored PETHs for each MSN ordered by its modulation amplitude (mean z-scored firing rate in the 200 ms following SWR onset). Right: dSWR-aligned z-scored PETHs for the same ordered MSNs shown on the left. Same MSNs as shown in Fig. 2c.
b, Timing of significantly modulated MSN ensemble activity during awake dSWRs (left) and vSWRs (right), overlaid on dCA1 pyramidal cell firing (n=332 cells) during dSWRs (left) and vCA1 multiunit activity (n=4 rats) during vSWRs (right). Each curve represents the mean ± s.e.m. z-scored firing rate of the hippocampal (black), D+ (n=49 cells, dark red), V+ (n=16 cells, dark blue), D- (n=13 cells, light red), or V- (n=14 cells, light blue) populations. MSN subpopulations here are restricted to cells that passed the potential duplicate cell control. Arrows indicate the center of mass in the 0-200 ms post-SWR interval of each subpopulation’s activity. The D+ center of mass occurs significantly later than the V+ center of mass (**p<0.01, one-tailed permutation test). The centers of mass of the D- and V- populations did not differ significantly in timing at the p<0.05 level. Also note that the centers of mass of the activated NAc ensembles lag the local activation of dH and vH neurons.
c, Similar to a but for FSIs shown in Fig. 2g.
d, Timing of significantly modulated FSI ensemble activity during awake dSWRs (left) and vSWRs (right), overlaid on the same hippocampal activity as shown in b. Arrows indicate the center of mass in the 0-200 ms post-SWR interval of each population’s activity; no significant differences in timing (permutation tests).
e-g, Opposing MSN modulation cannot be accounted for by occasional co-occurrence of dSWRs and vSWRs (control for Fig. 2b-d). We removed dSWRs and vSWRs occurring within 250 ms of each other to examine SWRs that occurred in isolation. Fraction modulated during Both: ***p=9.13×10⁻⁴, fraction opposing: **p=0.0046, fraction D+V-: **p=0.0089 (z-tests for proportions).
h-j, Opposing MSN modulation cannot be accounted for by inclusion of potential duplicate cells (control for Fig. 2b-d). We excluded potential duplicates using the method illustrated in Extended Data Fig. 5. Fraction modulated during Both: *p=0.016, fraction opposing: *p=0.038, fraction D+V-: *p=0.046 (z-tests for proportions).
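The two summary measures used in a and b, modulation amplitude (mean z-scored firing rate in the 200 ms following SWR onset) and center of mass in the 0-200 ms post-SWR interval, can be sketched as follows. Function names and the example PETH are hypothetical. The legend does not state how negative z-scores enter the center of mass, so the clipping used here (with a `sign` flag for suppressed populations) is an assumption.

```python
def modulation_amplitude(z_rates, times, start=0.0, end=0.2):
    """Mean z-scored firing rate over bins with start <= t < end (seconds)."""
    window = [z for z, t in zip(z_rates, times) if start <= t < end]
    return sum(window) / len(window)


def center_of_mass(z_rates, times, start=0.0, end=0.2, sign=1):
    """Time (s) of the activity-weighted mean within the post-SWR window.

    sign=+1 weights activation, sign=-1 weights suppression. Weights of the
    opposite sign are clipped to zero (an assumption, see lead-in). Raises
    ZeroDivisionError if no bin in the window has a positive weight.
    """
    pairs = [
        (max(sign * z, 0.0), t)
        for z, t in zip(z_rates, times)
        if start <= t < end
    ]
    total = sum(w for w, _ in pairs)
    return sum(w * t for w, t in pairs) / total


# Hypothetical SWR-aligned z-scored PETH; times are bin centers in seconds.
times = [-0.05, 0.025, 0.075, 0.125, 0.175, 0.225]
z_rates = [0.0, 1.0, 3.0, 3.0, 1.0, 0.0]
amp = modulation_amplitude(z_rates, times)
com = center_of_mass(z_rates, times)
print(amp, com)
```

For this symmetric example the center of mass falls at the middle of the window (100 ms); a later center of mass, as reported for D+ relative to V+ cells, would correspond to activity weighted toward the end of the window.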

[Extended Data Figure 7 image. Panels a, b: atlas sections at AP +2.76, +2.52, +2.16, +1.80, +1.56, +1.32 and +1.20 mm, with ac, NAcC and NAcSh labeled. Panel a key: D+, V+, both D+ and V+. Panel b key: D-, V-, both D- and V-.]

Extended Data Figure 7. Anatomical locations of NAc neurons modulated during awake SWRs.
a, Mapping of approximate tetrode locations where at least one D+ neuron (dark red) or one V+ neuron (dark blue) (either MSN or FSI) was detected. All AP coordinates are in mm relative to Bregma and correspond to plate labels from the atlas (see Extended Data Fig. 1). Overlapping colors indicate locations where both D+ and V+ were detected, typically not in the same neuron. Note that borders of core (NAcC) and shell (NAcSh) are approximate, but that D+ modulation is limited largely to the dorsolateral core and its dorsal boundary, whereas V+ modulation is seen in the medial shell and also extends into the core.
b, Mapping of approximate tetrode locations where at least one D- neuron (light red) or one V- neuron (light blue) was detected, again typically not in the same neuron. Note coincidence of D- sites with V+ sites and V- sites with D+ sites.

Extended Data Figure 8

[Figure image not reproduced. Panel a: heat plots of z-scored firing rate for sorted D+ MSNs (top) and sorted V+ MSNs (bottom) across the well and path portions of six trajectories. Panels b-l: comparisons of the N, D+, and V+ populations (and D+V+ MSNs in h, i) for mean r² in trial time, mean r² in linear position, mean path and well firing rate (Hz), and reward history preference (reward vs. error); see caption.]


Extended Data Figure 8. Population task-related firing patterns of D+ and V+ MSNs.
a, Heterogeneity of trajectory stages represented by NAc MSNs, as a function of normalized trial time. Specific trajectories are shown in cartoons above the heat plots. D+ MSNs (top) and V+ MSNs (bottom) are sorted by their peak z-scored firing location on the third trajectory (light green). Z-scores are calculated within cell from the mean firing rate profile on each trajectory. Only cells which were active on enough trials (see Online Methods) of all six trajectories are shown here, such that these cells predominantly correspond to the Switch phase of the task (D+ n=85 cells, V+ n=22 cells).
b-e, Controls for Fig. 3c-f with potential duplicate cells removed using the method illustrated in Extended Data Fig. 5. In b, d, e: N n=93 cells, D+ n=45 cells, V+ n=15 cells; in c: N n=90 cells, D+ n=44 cells, V+ n=13 cells. **p<0.01, ***p<0.001. All tests between populations in b-l are Wilcoxon rank-sum tests with Bonferroni correction for multiple comparisons, setting significance level at p<0.017.
f, D+ MSN firing pattern similarity across trajectories in normalized trial time is maintained when populations are matched for mean firing rate on the path (left: n=42 cells in each subpopulation; N vs. D+: ***p=1.18×10⁻⁴; D+ vs. V+: ***p=1.22×10⁻⁴) or well (right: N vs. D+: p=0.34; D+ vs. V+: ***p=4.17×10⁻⁴).
g, Similar to f but for firing similarity in linearized position, matched by mean firing rate on the path (n=40 cells in each subpopulation; N vs. D+: **p=0.0016; D+ vs. V+: ***p=2.55×10⁻⁵).
h, D+V+ MSNs resemble D+ MSNs (see Fig. 3c-f) in firing pattern similarity across trajectories and in firing rate. All D+V+ MSNs are included. Crosses indicate outliers.
i, Reward history preference of all D+V+ MSNs is not significantly greater than zero (p=0.11, one-tailed signed-rank test).
j, Control for Fig. 3h with potential duplicate cells removed. N n=104 cells, D+ n=49 cells, V+ n=16 cells. D+ population is still significantly shifted greater than zero (p=0.0041, one-tailed signed-rank test).
k, Control for Fig. 3h, examining only paths coming from the outer two wells to the center well of each rewarded sequence. This aims to control both for upcoming choice and reward expectation, as returns to the center well of each sequence are always rewarded. D+ population is significantly shifted greater than zero (p=3.51×10⁻¹¹, one-tailed signed-rank test). N (n=228 cells) vs. D+ (n=159 cells): ***p=4.87×10⁻⁵; D+ vs. V+ (n=43 cells): **p=0.0042.
l, Control for Fig. 3h removing cells that are significantly correlated with trial-by-trial changes in mean speed. This aims to control for the possibility that differing speed on trials following reward vs. error could account for reward history preference. D+ population is significantly shifted greater than zero (p=9.67×10⁻⁹, one-tailed signed-rank test). N (n=172 cells) vs. D+ (n=107 cells): ***p=6.68×10⁻⁴; D+ vs. V+ (n=28 cells): *p=0.013.
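
The threshold of p<0.017 quoted above is consistent with dividing a nominal 0.05 level by three pairwise population comparisons (N vs. D+, D+ vs. V+, N vs. V+). A minimal Python sketch of such a procedure is given here for illustration only. The 0.05 family-wise level, the normal approximation to the rank-sum statistic (no exact small-sample distribution, no tie correction of the variance), the function names, and any data passed in are assumptions, and this is not the authors' Matlab code.

```python
import math
from itertools import combinations


def rank_sum_p(x, y):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation, midranks for ties)."""
    pooled = sorted((v, g) for g, vals in enumerate((x, y)) for v in vals)
    values = [v for v, _ in pooled]
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and values[j + 1] == values[i]:
            j += 1
        midrank = (i + j) / 2 + 1  # average 1-based rank of the tie group
        for k in range(i, j + 1):
            ranks[k] = midrank
        i = j + 1
    n1, n2 = len(x), len(y)
    w = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)  # rank sum of x
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    return math.erfc(abs(z) / math.sqrt(2))


def bonferroni_pairwise(groups, alpha=0.05):
    """Compare every pair of groups; alpha is split evenly across the comparisons."""
    pairs = list(combinations(groups, 2))
    threshold = alpha / len(pairs)  # 0.05 / 3 ~= 0.0167 for three populations
    results = {}
    for a, b in pairs:
        p = rank_sum_p(groups[a], groups[b])
        results[(a, b)] = (p, p < threshold)
    return threshold, results
```

Calling `bonferroni_pairwise({"N": n_vals, "D+": d_vals, "V+": v_vals})` with three hypothetical lists of per-cell values returns the corrected threshold and, for each pair, the p-value and whether it falls below that threshold.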

Extended Data Figure 9

[Figure image not reproduced. Panels a-g: comparisons of the N, D+, and V+ populations for mean trial-by-trial r² per trajectory (normalized time and linear position), spatial information by maze segment (bits/spike), spatial + direction information by vertical segment (bits/spike), path vs. well preference, trajectory direction preference, and toward vs. away from well preference. Panels h, j: example D+ and V+ MSNs showing firing rate (Hz) and speed (cm/s) on rewarded vs. error visits, with reward vs. error index (Cell 1: RI = 0.64***; Cell 2: RI = 0.30***; Cell 3: RI = 0.62*; Cell 4: RI = 0.82*; Cell 5: RI = -0.67***; Cell 6: RI = -0.49*). Panels i, k: distributions of reward vs. error index during normalized well time and during the 2 s post-reward; see caption.]


Extended Data Figure 9. Task-related firing pattern parameters explored for D+ and V+ MSNs.
a, Mean trial-by-trial correlation, calculated across individual traversals of each trajectory in normalized trial time and then averaged across trajectories to get a single r². Note that the V+ and N populations have lower trial-by-trial similarity (are more variable) than the D+ population. N (n=235 cells) vs. D+ (n=159 cells): ***p=2.14×10⁻¹²; D+ vs. V+ (n=45 cells): ***p=1.09×10⁻⁷. All tests between populations in a-g, i, k are Wilcoxon rank-sum tests with Bonferroni correction for multiple comparisons, setting significance level at p<0.017.
b, Mean trial-by-trial correlation in linearized position, based on spike counts in 5 cm bins. Again, note lower similarity of the N (n=225 cells) and V+ (n=41 cells) populations compared to the D+ population (n=159 cells; N vs. D+: ***p=3.63×10⁻⁴).
c, Spatial information of MSN spiking, where each of 11 maze segments (6 vertical, 5 horizontal) is considered a spatial bin. Both the D+ (n=161 cells) and V+ (n=45 cells) populations trend towards being less spatially specific than the N (n=246 cells) population.
d, Spatial and directional information of MSN spiking, similar to c, here examining the 6 vertical segments in both directions for 12 total bins. Note significant or trending lower spatial and directional selectivity of the D+ (n=161 cells) compared to the N (n=246 cells; N vs. D+: ***p=3.50×10⁻⁵) and V+ (n=45 cells) populations, consistent with greater generalization across trajectories.
e, Preference for the path (>0) vs. well (<0) components of all trajectories. While individual cells exhibit strong preferences, the populations do not (D+ n=154 cells, N n=226 cells, V+ n=42 cells).
f, Trajectory direction preference, where 0 indicates bidirectionality and values closer to 1 indicate stronger unidirectionality for either leftward- or rightward-moving trajectories. Note significant or trending lower directionality of the D+ population (n=152 cells) compared to the N (n=221 cells; N vs. D+: ***p=3.16×10⁻⁵) and V+ (n=40 cells) populations.
g, Preference for moving toward or away from a reward well (regardless of reward outcome). Values greater than 0 indicate preference for approaching a well, values less than 0 indicate preference for leaving a well. While individual cells exhibit strong preferences, the populations do not (D+ n=161 cells, N n=241 cells, V+ n=44 cells).
h, Examples of individual MSNs (D+, left; V+, right) showing strong preference for rewarded well visits (teal) compared to error visits (brown) as a function of normalized well time. Top: firing rate (mean ± s.e.m. across trials). Reward vs. error index (RI) is shown in the upper right of each panel (***p<0.001, one-tailed permutation test). Bottom: faded lines indicate speed profiles of individual well visits, thick lines indicate mean speeds. Dotted grey lines flank the time period analyzed for significance, when both rewarded and error mean speeds are <4 cm/s.
i, Distributions of reward vs. error index during normalized well time, by SWR-modulation category. Values greater than 0 signify higher firing rates when rewarded, values less than 0 signify higher firing rates on errors. Filled circles indicate significantly reward- or error-preferring cells, open circles indicate non-significant cells (D+ n=141 cells, N n=196 cells, V+ n=39 cells).
j, Examples of individual MSNs (D+, left; V+, right) firing as a function of time since nosepoke. Top row: higher firing rates (mean ± s.e.m. across trials) on rewarded trials compared to error
trials in the 2 s following reward delivery time, with individual and mean speeds for these trials shown below. Bottom row: higher firing rates on error trials compared to rewarded trials. RI is shown in the upper right of each cell (*p<0.05, ***p<0.001, one-tailed permutation test in the 2-4 s window). Gold vertical line marks reward delivery (or time of expected reward delivery on error trials).
k, Distributions of reward vs. error index during 2 s following reward delivery time, by SWR-modulation category (D+ n=147 cells, N n=195 cells, V+ n=37 cells). The D+ population is significantly shifted negative of zero by this metric (p=5.15×10⁻⁷, one-tailed signed-rank test) and shows significantly less post-reward preference than the N population (***p=9.06×10⁻⁶).
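
The reward vs. error index and its one-tailed permutation test can be illustrated with a short Python sketch. The formula for the index is not given in this caption, so the normalized difference used here, (reward − error)/(reward + error), is an assumption chosen only because it is bounded between -1 and 1 like the plotted values. The label-shuffling scheme, the number of permutations, and all function names are likewise hypothetical.

```python
import random
from statistics import mean


def reward_error_index(reward_rates, error_rates):
    """Assumed index: normalized difference of mean firing rates, in [-1, 1]."""
    r, e = mean(reward_rates), mean(error_rates)
    return (r - e) / (r + e) if (r + e) > 0 else 0.0


def permutation_p(reward_rates, error_rates, n_perm=2000, seed=0):
    """One-tailed test for reward preference by shuffling trial labels.

    Returns the observed index and the fraction of shuffles whose index is
    at least as large as the observed one (with the usual +1 correction).
    """
    rng = random.Random(seed)
    observed = reward_error_index(reward_rates, error_rates)
    pooled = list(reward_rates) + list(error_rates)
    n_reward = len(reward_rates)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if reward_error_index(pooled[:n_reward], pooled[n_reward:]) >= observed:
            count += 1
    return observed, (count + 1) / (n_perm + 1)
```

For an error-preferring cell the same test would be run with the two inputs swapped, so that the tail tested matches the sign of the observed index.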

Extended Data Figure 10

[Figure image not reproduced. Panel a: raw and ripple-band LFP traces from dCA1 and vCA1 with speed (cm/s) over time (s). Panel b: normalized and z-scored (relative to shuffle) cross-correlation of vSWR times relative to dSWRs. Panel c: normalized cross-correlations for D vs. D and V vs. V. Panel d: z-scored cross-correlation for Rats 1-5. Panels e, f: z-scored dSWR and vSWR modulation of NAc MSNs, awake vs. sleep (e: r = 0.44, p = 1.77×10⁻¹⁸, slope = 0.34; f: r = 0.31, p = 9.12×10⁻¹⁰, slope = 0.28); see caption.]


Extended Data Figure 10. Increased synchrony between dH and vH SWRs during sleep.
a, Example of dCA1 and vCA1 SWRs during sleep from the same tetrodes and same rat as Fig. 1d (Rat 4). Shaded regions highlight detected dSWRs (pink) and vSWRs (blue). Note the occasional coincidence of dSWRs and vSWRs.
b, Mean cross-correlation histogram (CCH) for sleep SWRs across animals. Top: normalized by the number of dSWRs. ~6.7% of dSWRs occur within ±50 ms of a vSWR. Bottom: CCH z-scored relative to shuffled vSWR onset times. Note that synchrony between dSWRs and vSWRs in sleep is greater than expected by chance (>2 z-scores), and vSWRs are most often leading (left of time 0). Error bars indicate s.e.m. (n=5 rats).
c, High synchrony of sleep SWRs between tetrodes within dH (top, n=5 rats) or vH (bottom, n=3 rats with >1 tetrode in vCA1). Error bars indicate s.e.m. across included animals.
d, Z-scored CCH (relative to shuffle) of sleep vSWRs and dSWRs in each animal. Error bars indicate s.e.m. across days (n=15-19 days) for each animal.
e, Z-scored dSWR-modulation of individual NAc MSNs is significantly correlated between awake immobility and sleep dSWRs. Each point represents a single cell, and only cells present in both wake and sleep are included (n=368 cells). Dotted line and shaded regions represent a linear fit with 95% confidence intervals. Pearson’s correlation coefficient (r), p-value and slope are shown in upper right.
f, Similar to e, but for vSWR-modulation of NAc MSNs during awake immobility vs. sleep. Only cells present in both wake and sleep are included (n=368 cells).
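
The quantities in b (a CCH normalized by the number of dSWRs, the fraction of dSWRs with a vSWR within ±50 ms, and a z-score relative to shuffled vSWR times) can be sketched in Python as follows. This is an illustration under assumptions: the ±0.5 s window and 50 ms bins are read off the figure axes rather than stated in the text, the shuffle here simply redraws vSWR times uniformly over the recording (the caption does not specify the shuffling scheme), and all function names are hypothetical.

```python
import bisect
import random
from statistics import mean, pstdev


def cch(d_times, v_times, window=0.5, bin_width=0.05):
    """Histogram of vSWR onsets relative to each dSWR onset, per dSWR."""
    n_bins = int(round(2 * window / bin_width))
    counts = [0] * n_bins
    v_sorted = sorted(v_times)
    for d in d_times:
        lo = bisect.bisect_left(v_sorted, d - window)
        hi = bisect.bisect_right(v_sorted, d + window)
        for v in v_sorted[lo:hi]:
            idx = min(int((v - d + window) / bin_width), n_bins - 1)
            counts[idx] += 1
    return [c / len(d_times) for c in counts]


def coincident_fraction(d_times, v_times, tol=0.05):
    """Fraction of dSWRs with at least one vSWR onset within +/- tol seconds."""
    v_sorted = sorted(v_times)
    hits = 0
    for d in d_times:
        lo = bisect.bisect_left(v_sorted, d - tol)
        if lo < len(v_sorted) and v_sorted[lo] <= d + tol:
            hits += 1
    return hits / len(d_times)


def zscore_vs_shuffle(d_times, v_times, duration, n_shuffles=200, seed=0,
                      window=0.5, bin_width=0.05):
    """Z-score each CCH bin against CCHs from uniformly redrawn vSWR times (assumed shuffle)."""
    rng = random.Random(seed)
    observed = cch(d_times, v_times, window, bin_width)
    shuffled = [
        cch(d_times, [rng.uniform(0, duration) for _ in v_times], window, bin_width)
        for _ in range(n_shuffles)
    ]
    z = []
    for b, obs in enumerate(observed):
        vals = [s[b] for s in shuffled]
        sd = pstdev(vals)
        z.append((obs - mean(vals)) / sd if sd > 0 else 0.0)
    return z
```

Bins to the left of the center correspond to vSWRs that precede the dSWR, which is the sense in which vSWRs are described as leading.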


References

1 Ito, R., Robbins, T. W., Pennartz, C. M. & Everitt, B. J. Functional interaction between the hippocampus and nucleus accumbens shell is necessary for the acquisition of appetitive spatial context conditioning. J. Neurosci. 28, 6950-6959 (2008).
2 Sjulson, L., Peyrache, A., Cumpelik, A., Cassataro, D. & Buzsaki, G. Place Conditioning Strengthens Location-Specific Hippocampal Coupling to the Nucleus Accumbens. Neuron 98, 926-934 (2018).
3 Lansink, C. S. et al. Preferential reactivation of motivationally relevant information in the ventral striatum. J. Neurosci. 28, 6372-6382 (2008).
4 Lansink, C. S., Goltstein, P. M., Lankelma, J. V., McNaughton, B. L. & Pennartz, C. M. Hippocampus leads ventral striatum in replay of place-reward information. PLoS Biol. 7, e1000173 (2009).
5 LeGates, T. A. et al. Reward behaviour is regulated by the strength of hippocampus-nucleus accumbens synapses. Nature 564, 258-262 (2018).
6 Britt, J. P. et al. Synaptic and behavioral profile of multiple glutamatergic inputs to the nucleus accumbens. Neuron 76, 790-803 (2012).
7 Trouche, S. et al. A Hippocampus-Accumbens Tripartite Neuronal Motif Guides Appetitive Memory in Space. Cell 176, 1393-1406 (2019).
8 Pennartz, C. M., Ito, R., Verschure, P. F., Battaglia, F. P. & Robbins, T. W. The hippocampal-striatal axis in learning, prediction and goal-directed behavior. Trends Neurosci. 34, 548-559 (2011).
9 Humphries, M. D. & Prescott, T. J. The ventral basal ganglia, a selection mechanism at the crossroads of space, strategy, and reward. Prog. Neurobiol. 90, 385-417 (2010).
10 Strange, B. A., Witter, M. P., Lein, E. S. & Moser, E. I. Functional organization of the hippocampal longitudinal axis. Nat. Rev. Neurosci. 15, 655-669 (2014).
11 Fanselow, M. S. & Dong, H. W. Are the dorsal and ventral hippocampus functionally distinct structures? Neuron 65, 7-19 (2010).
12 Brog, J. S., Salyapongse, A., Deutch, A. Y. & Zahm, D. S. The patterns of afferent innervation of the core and shell in the "accumbens" part of the rat ventral striatum:
immunohistochemical detection of retrogradely transported fluoro-gold. J. Comp. Neurol. 338, 255-278 (1993).
13 Jadhav, S. P., Kemere, C., German, P. W. & Frank, L. M. Awake hippocampal sharp-wave ripples support spatial memory. Science 336, 1454-1458 (2012).
14 Pfeiffer, B. E. & Foster, D. J. Hippocampal place-cell sequences depict future paths to remembered goals. Nature 497, 74-79 (2013).
15 Ambrose, R. E., Pfeiffer, B. E. & Foster, D. J. Reverse Replay of Hippocampal Place Cells Is Uniquely Modulated by Changing Reward. Neuron 91, 1124-1136 (2016).
16 Buzsaki, G. Hippocampal sharp wave-ripple: A cognitive biomarker for episodic memory and planning. Hippocampus 25, 1073-1188 (2015).
17 Joo, H. R. & Frank, L. M. The hippocampal sharp wave-ripple in memory retrieval for immediate use and consolidation. Nat. Rev. Neurosci. 19, 744-757 (2018).
18 Singer, A. C., Carr, M. F., Karlsson, M. P. & Frank, L. M. Hippocampal SWR Activity Predicts Correct Decisions during the Initial Learning of an Alternation Task. Neuron 77, 1163-1173 (2013).
19 Singer, A. C. & Frank, L. M. Rewarded outcomes enhance reactivation of experience in the hippocampus. Neuron 64, 910-921 (2009).
20 Patel, J., Schomburg, E. W., Berenyi, A., Fujisawa, S. & Buzsaki, G. Local generation and propagation of ripples along the septotemporal axis of the hippocampus. J. Neurosci. 33, 17029-17041 (2013).
21 Sosa, M., Gillespie, A. K. & Frank, L. M. Neural Activity Patterns Underlying Spatial Coding in the Hippocampus. Curr. Top. Behav. Neurosci. 37, 43-100 (2016).
22 Tepper, J. M. et al. Heterogeneity and Diversity of Striatal GABAergic Interneurons: Update 2018. Front. Neuroanat. 12, 91 (2018).
23 Riaz, S., Schumacher, A., Sivagurunathan, S., Van Der Meer, M. & Ito, R. Ventral, but not dorsal, hippocampus inactivation impairs reward memory expression and retrieval in contexts defined by proximal cues. Hippocampus 27, 822-836 (2017).
24 Ciocchi, S., Passecker, J., Malagon-Vina, H., Mikus, N. & Klausberger, T. Brain computation. Selective information routing by ventral hippocampal CA1 projection neurons. Science 348, 560-563 (2015).
25 Royer, S., Sirota, A., Patel, J. & Buzsaki, G. Distinct representations and theta dynamics in dorsal and ventral hippocampus. J. Neurosci. 30, 1777-1787 (2010).
26 Jimenez, J. C. et al. Anxiety Cells in a Hippocampal-Hypothalamic Circuit. Neuron 97, 670-683 (2018).
27 Berke, J. D., Breck, J. T. & Eichenbaum, H. Striatal versus hippocampal representations during win-stay maze performance. J. Neurophysiol. 101, 1575-1587 (2009).
28 van der Meer, M. A., Johnson, A., Schmitzer-Torbert, N. C. & Redish, A. D. Triple dissociation of information processing in dorsal striatum, ventral striatum, and hippocampus on a learned spatial decision task. Neuron 67, 25-32 (2010).
29 Lansink, C. S. et al. Reward cues in space: commonalities and differences in neural coding by hippocampal and ventral striatal ensembles. J. Neurosci. 32, 12444-12459 (2012).
30 Singer, A. C., Karlsson, M. P., Nathe, A. R., Carr, M. F. & Frank, L. M. Experience-dependent development of coordinated hippocampal spatial activity representing the similarity of related locations. J. Neurosci. 30, 11586-11604 (2010).
31 Yu, J. Y., Liu, D. F., Loback, A., Grossrubatscher, I. & Frank, L. M. Specific hippocampal representations are linked to generalized cortical representations in memory. Nat. Commun. 9, 2209 (2018).
32 Komorowski, R. W. et al. Ventral hippocampal neurons are shaped by experience to represent behaviorally relevant contexts. J. Neurosci. 33, 8079-8087 (2013).
33 de Hoz, L. & Martin, S. J. Double dissociation between the contributions of the septal and temporal hippocampus to spatial learning: the role of prior experience. Hippocampus 24, 990-1005 (2014).


Author contributions
M.S. and L.M.F. designed the study. M.S. and H.R.J. conducted experiments. M.S. analyzed the data. M.S. and L.M.F. wrote the paper with input from H.R.J.

Acknowledgements
We thank J. Berke, H. Fields, M. Kheirbek, M. Brainard, A. Gillespie, A. Joshi, A. Comrie, M. Coulter, J. Yu, and K. Kay for comments on the manuscript, and members of the Frank laboratory for useful discussions. We additionally thank K. Kay for contributing key analysis code, J. Chung and J. Magland for development of drift tracking in Mountainsort, and I. Grossrubatscher, V. Kharazia, and E. Miller for technical assistance. This work was supported by Howard Hughes Medical Institute (L.M.F.), Simons Foundation Collaboration for the Global Brain Grants 521921 and 542981 (L.M.F.), National Institute of Mental Health Ruth L. Kirschstein National Research Service Awards F31MH111214 (M.S.) and F30MH115582 (H.R.J.), and National Institute of General Medical Sciences Medical Scientist Training Program Grant T32GM007618 (H.R.J.). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Competing interests
The authors declare no competing interests.

Corresponding author
Loren M. Frank, [email protected]


Methods

Animals, implants, and behavior. All procedures were in accordance with guidelines from the University of California San Francisco Institutional Animal Care and Use Committee and US National Institutes of Health. Five male Long-Evans rats (500-650 g, 5-8 months old) were food restricted to 85% of their baseline weight and pre-trained to run back and forth on a 1 m long linear track for liquid reward (evaporated milk plus 5% sucrose), delivered automatically from reward wells at the ends of the track. Animals were incrementally introduced to a delay between well entry (nosepoke) and reward delivery of up to 2 seconds. After animals learned to alternate consistently for at least ~30 well visits per 5 min (4-6 days), they were switched to an ad libitum diet and then surgically implanted with microdrive arrays.

Each microdrive housed a maximum of 28 independently movable tetrodes in a custom 3D-printed drive body (PolyJetHD Blue, Stratasys Ltd.) cemented to 3 stainless steel cannulae at fixed relative positions, targeting NAc vertically (8-12 tetrodes) and dH (6-7 tetrodes) and vH (9-13 tetrodes) at a 12˚ angle from vertical (tilted mediolaterally). NAc and dH tetrodes were made of 12.7 µm-diameter nichrome (Sandvik), while vH tetrodes were made of 12.7 µm nichrome, 12.7 µm tungsten, or 20 µm tungsten (California Fine Wire). Tetrode ends were plated with gold to a final impedance of ~240-330 kOhms. The microdrive was stereotaxically implanted over the right hemisphere such that the center of each cannula was targeted to the following coordinates relative to the animal’s Bregma: dH: AP -3.9 to -4.0 mm, ML +1.7 mm; vH: AP -5.6 to -5.7 mm, ML +4.0 mm; NAc: AP +1.3 to +1.4 mm, ML +1.3 mm (Rat 1 vH: AP -5.75, ML +4.1, oval-shaped cannula). The approximate AP/ML spread of tetrodes in each area was defined by the inner radius of each cannula as follows: dH: ±0.49 mm, vH: ±0.87 mm, NAc: ±0.60 mm. A ground screw was inserted in the skull above the right cerebellum as a global reference.

While animals recovered from surgery, tetrodes were manually adjusted over ~2-3 weeks to their target depths relative to brain surface (dCA1: ~2.2-3.3 mm, 12˚ angle; vCA1: ~7.0-8.3 mm, 12˚ angle; NAc: ~5.4-7.5 mm, 0˚ angle), using electrophysiological landmarks such as unit density and SWR amplitude. Each rat was then food restricted again and re-trained on the linear track for 4-6 days with neural recording (not analyzed in this study). Animals were then introduced to the Multiple-W task (Fig. 1b), a version of which has been described previously¹⁹.
Tetrodes were sometimes advanced a small amount after the conclusion of the day’s recording on a case-by-case basis to improve cell yield.

The Multiple-W track consisted of six 76 cm arms spaced ~36 cm apart at their midpoints, with 3 cm high walls, connected to a “back” which extended past the first and sixth arm by 14 cm on each side (to mimic the availability of a right and left turn from these arms), and elevated 76 cm off the ground. On each day, the animal experienced three 20 min “run” (maze) epochs on the track flanked by four 20-45 min sleep epochs in a separate high-walled rest box; only in rare cases (2 epochs each for Rats 1-2, 1 epoch each for Rats 3-4) were there four run epochs. The track was separated from the experimenter by an opaque black curtain, and the white walls of the room were marked with black distal cues of various shapes. Each arm contained a visually identical reward well connected to milk tubing, and milk was run through each well at the beginning of the day to create similar olfactory cues in all wells.

In each run epoch, the animal was placed at the back of the center arm of the rewarded sequence and was required by trial-and-error to find the 3 rewarded wells and figure out the alternation sequence between them, Sequence A (SA) or Sequence B (SB). Trials are defined as well visits. A visit to the center well of the sequence (well 3 in SA, well 4 in SB) was rewarded if the animal came from any other well. If a center visit was the first of the epoch or followed an error to a non-sequence arm, the animal could initiate an “outer” well visit to either of the center-adjacent wells to get reward. If a center visit followed a visit to a center-adjacent well, the animal had to then visit the opposite center-adjacent well, requiring hippocampal-dependent memory of the previous trial³⁴. For example, a correct series of trials for SA would be 3-2-3-4-3-2; for SB, 2-1-2-3-2-1.
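
The reward contingencies described above can be sketched as a small scoring function in Python. This is an illustration, not the task-control code used in the experiments. Two points are assumptions not stated in the text: the first center visit of an epoch is scored as rewarded, and an outer-well visit that is not entered directly from the center is scored as an error. Function and variable names are hypothetical.

```python
def score_visits(visits, center):
    """Return a list of booleans, True where a well visit would be rewarded.

    visits: sequence of well numbers in the order visited.
    center: center well of the rewarded sequence (3 for SA, 4 for SB in the text).
    """
    outers = {center - 1, center + 1}   # the two center-adjacent wells
    rewarded = []
    prev = None            # well visited on the previous trial
    before_center = None   # well visited just before the most recent center visit
    for well in visits:
        if well == prev:
            ok = False     # consecutive visits to the same well count as errors
        elif well == center:
            ok = True      # center is rewarded when arriving from any other well
        elif well in outers and prev == center:
            # Either outer well is allowed unless the last center visit followed
            # an outer well, in which case the opposite outer well is required.
            ok = before_center not in outers or well != before_center
        else:
            ok = False     # non-sequence arm, or outer well not entered from center (assumption)
        if well == center:
            before_center = prev
        rewarded.append(ok)
        prev = well
    return rewarded
```

With the SA example from the text, `score_visits([3, 2, 3, 4, 3, 2], center=3)` scores every visit as rewarded, whereas returning to the same outer well after a center visit (`[3, 2, 3, 2]`) scores the last visit as an error.
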
Consecutive visits to the same well were counted as errors, such that chance performance was defined as 1 out of 6 (0.167). The nosepoke at each well was detected by an infrared beam break, which automatically triggered liquid reward delivery (105 µL evaporated milk plus 5% sucrose) via a syringe pump (OEM Systems) after a 2 second delay, and the animal’s departure from the well was self-paced.

During the “Acquisition” phase (5-9 days), the same sequence was rewarded on every epoch: 3 animals acquired SA and 2 animals acquired SB. When the animal achieved greater than 80% correct performance on the Acquisition sequence for at least 1 epoch (assessed in real time as an epoch average), the novel sequence was introduced in the second epoch of the first “Switch” day. Only Rat 1 failed to reach 80% correct but was advanced to the Switch phase after achieving
>75% correct and one full week of training as in ref. 19; this rat was thus excluded from the 70-80% and >80% Acquisition performance bins in Fig. 1g,h. In the Switch phase, the rewarded sequence was switched on each run epoch, such that the starting sequence of each Switch day was also alternated (8-10 days). The goal was to continually promote adaptive, spatially-guided reward-seeking behavior, and the overlap between arms of the two sequences allowed us to compare rewarded and error trials on the same spatial paths.

Data collection and processing. Spiking, local field potential (LFP), position video, and reward well digital inputs and outputs were collected using the NSpike data acquisition system (L.M.F. and J. MacArthur, Harvard Instrumentation Design Laboratory). For Rats 1-3, LFP data were collected at 1500 Hz sampling rate and digitally filtered online at 1-400 Hz (2-pole Bessel for high- and low-pass). Spikes were sampled at 30 kHz and saved as snippets of each waveform, filtered at 600-6000 Hz for hippocampus and 600-6000 Hz (2 rats) or 300-6000 Hz (1 rat) for NAc. For Rats 4-5, LFP and spikes were collected continuously at 30 kHz and filtered online at 1-6000 Hz, with post-hoc filtering applied in Matlab to extract LFP and spike waveforms using the same parameters as above (300-6000 Hz for NAc spikes). Subsets of spike data were collected as snippets in these animals to verify our post-hoc filtering. Note that for LFP and spike waveforms, negative voltages are displayed upward. All LFP and spikes were collected relative to local references lacking spiking activity, which were themselves referenced to cerebellar ground: for dH tetrodes, the reference tetrode was located in corpus callosum (4 rats) or deep cortex with no units (1 rat); for vH, in ventral corpus callosum (1 rat) or in at the ventrolateral edge of the midbrain ( or optic tract; 4 rats); for NAc, typically in corpus callosum, the lateral ventricle, or anterior commissure. Overhead video of the track, collected at 30 frames/s, allowed us to track the animal’s position via an array of infrared diodes attached to the top of the headstage, a few cm above the rat’s head.

Spike sorting was performed using a combination of manual clustering in Matclust (Mattias Karlsson; Rats 1-3) and automated sorting with manual curation in Mountainsort (Flatiron Institute; all data for Rats 4-5, individual days for Rats 1-3). Cells were clustered within epochs but tracked across all run and sleep epochs for which they could be isolated; with Mountainsort, this was done with a drift-tracking extension of the core pipeline and manual merging as needed³⁵. In Matclust, clustering was performed in amplitude and principal
1 component space, and only well-separated units with clear refractory periods in their ISI 2 distributions were accepted. In Mountainsort, we generally accepted clusters with isolation score 3 >0.96, noise overlap <0.03 (median isolation score ~0.995, median noise overlap ~0.002), and 4 clear separation from other clusters in amplitude and principal component space. The similarity 5 of cluster quality between Mountainsort and Matclust was verified manually on a subset of the 6 data and has been extensively verified in previous work36. The same pattern of SWR- 7 modulation of NAc cells was observed within each animal (data not shown), indicating that our 8 results were not due to unit clustering in certain animals. 9 10 Histology. At the conclusion of the experiment, animals were anesthetized with isoflurane and 11 small electrolytic lesions were made at the end of each tetrode to mark recording locations (30 12 µA of positive current for 3 seconds, on 2-4 channels of the tetrode). The animal recovered for 13 24 hours to allow gliosis, and was then euthanized with pentobarbitol and perfused transcardially 14 with PBS followed by 4% paraformaldehyde in 0.1M PB. The brain was post-fixed in 4% 15 paraformaldehyde, 0.1M PB in situ for at least 24 hours, followed by removal of the tetrodes and 16 cryoprotection in 30% sucrose in PBS. were embedded in OCT compound and sectioned 17 coronally at 50 µm thickness. Tissue was either Nissl stained using cresyl violet, or for a subset 18 of dH sections, immunostained for RGS14, a marker of CA2, using previously described 19 methods37. 20 To reconstruct recording sites (Extended Data Fig. 1), evenly spaced plates from the 21 Paxinos and Watson Rat Atlas (2007), which is based on Wistar rats, were stretched and 22 modified to align to representative sections from each brain region, using landmarks such as the 23 ventricles, corpus callosum, and hippocampal pyramidal layers as guides. 
These modified plates were then treated as atlases to align the remaining sections and recording sites across animals.

Data analysis. All analyses were performed using custom code written in Matlab.

Behavioral analysis. The animals’ task performance was analyzed using a state space algorithm38 which estimates the probability that the animal is performing accurately according to Sequence A or B on each trial. This algorithm provides 90% confidence intervals which reveal when the animal is performing one sequence significantly better than the other. All trials in the

Acquisition phase were analyzed together and background probability was set at chance (0.167), so that behavior of all animals could be compared from a similar starting point. During the Switch phase, each epoch was estimated independently with an unspecified background probability to get the most accurate representation of the animals’ behavior; this means that occasionally the behavioral state could “jump” at an epoch boundary. The mode of the probability distribution was used to assign trials into performance stages for the SWR rate analysis in Fig. 1g,h, which yielded a different number of trials per stage from each animal.

SWR detection. SWRs were detected in dCA1 and vCA1 in 4 rats (Rats 2-5), and dCA1 and ventral CA3 in 1 rat (Rat 1), using methods described previously37. Briefly, each tetrode’s LFP was filtered to the ripple band at 150-250 Hz, the ripple amplitude was squared, summed across tetrodes (3 per animal in dH, only 1 per animal in vH, as this was the minimum number present in all animals), and smoothed with a 4 ms s.d., 32 ms wide Gaussian kernel. We then took the square root of this trace as the power envelope to detect excursions greater than 2 s.d. of the mean power within an epoch, lasting at least 15 ms. Tetrodes were chosen based on ripple band power and proximity to the center of the pyramidal layer. The SWR start time (when the envelope first crosses the threshold) was used as the event detection time. For spiking, characterization, and cross-correlation analyses, we excluded SWRs that occurred within 0.5 s of a previous SWR (i.e. chained SWRs). SWRs were only included for all analyses if detected at head speeds <4 cm/s.

As a control, we also detected “noise ripples,” events in the 150-250 Hz band that exceeded a 2 s.d. threshold on our reference tetrodes for dH and vH (which were not in the hippocampus).
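The thresholding step of the SWR detection described above can be illustrated with a minimal Python sketch. It is not the authors' Matlab code. It assumes the ripple-band power envelope has already been computed (filtering, squaring, summing, smoothing and square root are omitted), uses the population standard deviation, and measures the 0.5 s "chained SWR" gap from the start of the previous detected event. The function names `detect_events` and `drop_chained` and all parameter names are hypothetical.

```python
import math
import statistics


def detect_events(envelope, fs_hz, n_sd=2.0, min_dur_ms=15):
    """Return start times (s) of excursions above mean + n_sd * s.d.

    `envelope` is a precomputed power envelope sampled at `fs_hz`.
    Only excursions lasting at least `min_dur_ms` are kept.
    """
    mu = statistics.fmean(envelope)
    sd = statistics.pstdev(envelope)  # assumption: population s.d.
    threshold = mu + n_sd * sd
    min_len = math.ceil(min_dur_ms * fs_hz / 1000)
    starts, start = [], None
    # A trailing -inf sentinel closes an event that runs to the end.
    for i, v in enumerate(list(envelope) + [float("-inf")]):
        if v > threshold:
            if start is None:
                start = i  # first threshold crossing = event time
        elif start is not None:
            if i - start >= min_len:
                starts.append(start / fs_hz)
            start = None
    return starts


def drop_chained(starts, min_gap_s=0.5):
    """Drop events starting within `min_gap_s` of the previous event.

    Assumption: the gap is measured start-to-start from the previous
    detected event, whether or not that event was itself kept.
    """
    kept, prev = [], None
    for t in starts:
        if prev is None or t - prev >= min_gap_s:
            kept.append(t)
        prev = t
    return kept
```

The speed criterion (<4 cm/s) and the noise-ripple exclusion would be applied as further filters on the returned start times.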
These events are highly unlikely to be SWRs, but instead may reflect muscle artifacts or other high-frequency noise. For all analyses of SWRs other than NAc spiking modulation (to include the maximum number of NAc spikes), we excluded SWRs with start times occurring within 100 ms of a “noise ripple” on the local reference.

Behavioral state definitions. During run epochs, periods of “immobility” were defined as times with a head movement speed <4 cm/s calculated as the derivative of the smoothed position data from the headstage-mounted diodes. We defined “sleep” as periods of immobility in sleep epochs that occurred >60 s after any movement at >4 cm/s. To calculate overall sleep SWR rate

in Extended Data Fig. 3, NREM sleep periods were defined by exclusion of REM sleep as defined previously37. Specifically, REM periods were detected as times when the ratio of Hilbert amplitudes of theta (5–11 Hz) to delta (1–4 Hz), referenced to cerebellar ground, exceeded a per-animal threshold of 1.4-1.7 for at least 10 s.

Characterization of SWR properties. To characterize the spectral properties of dSWRs and vSWRs, multi-tapered spectrograms of the raw LFP triggered on SWR start times were generated using the Chronux toolbox (mtspecgramtrigc, sliding 100 ms window with 10 ms overlap, bandwidth 2-300 Hz), and z-scored to the mean power in each epoch before averaging across epochs and days. To approximate the peak ripple frequency, a slice of this spectrogram was taken at the time of peak ripple power per animal. For the remaining properties described in Extended Data Fig. 3, we defined SWR amplitude as the minimum threshold in s.d. that would be required to detect the event (see above). SWR duration is the time between first threshold crossing and return of the envelope to the mean. Mean epoch SWR rate was calculated for all immobility periods in run epochs and all NREM sleep periods during sleep epochs.

SWR cross-correlation. Cross-correlations between vSWRs and dSWRs were performed within day, using dSWRs as the reference, in 50 ms bins up to 0.5 s lag, and were normalized to the number of dSWRs in each day before averaging across days and animals. To create a z-scored version of the cross-correlation histogram, vSWR event times were circularly shuffled 1000 times within immobility periods (by a random amount up to ±half the mean immobility period length) to create 1000 shuffled histograms.
The real cross-correlation values were z-scored relative to the distribution of shuffled values within each bin, such that a z-score of 0 indicates that the real data is no different than the mean of the shuffles.

SWR rate relative to reward and novelty. In Fig. 1g,h, we calculated the rate of SWR events per time spent immobile after reward delivery time on individual rewarded or error trials, and then averaged those rates across trials in each learning stage from each animal. Trials with less than 1 s spent immobile were excluded. In Extended Data Fig. 4d, this rate was averaged across trials within each run epoch for each animal. In Extended Data Fig. 4e,f, we calculated SWR rate in 200 ms bins from 0 to 5 s after nosepoke. We subsampled rewarded and error trials based

on speed by excluding any trial where the animal spent more than 5 position samples (150 ms) moving faster than 4 cm/s (allowing for some jitter of head position), from 1.5 s after nosepoke to the end of the 5 s window. As the animal’s retreat from the reward well is self-paced, this greatly reduced the number of included error trials and focused exclusively on error trials when the animal waited at the well beyond the expected reward delivery time. We also excluded any bins that were not below the speed threshold, as SWRs could not be detected in these bins according to our criteria. SWR rate per bin was then calculated per the number of included bins in each animal, and we required at least 2 s total of data per bin (10 accepted bins) to calculate a rate across animals.

Unit inclusion. Only NAc units firing at least 100 spikes in a given epoch were included in the current study (865 total NAc units in run, 1678 units in sleep, from all five rats across days). Additional inclusion criteria were applied per analysis.

Putative cell type classification. NAc single units were classified similarly to methods described previously39, using mean firing rate, mean waveform peak width at half-maximum, mean waveform trough width at half-minimum, and ISI distribution. These values were averaged across epochs when a cell was present in multiple epochs within a day. When plotted, mean firing rate and waveform features generated distinguishable clusters (Extended Data Fig.
5), the boundaries of which were defined as follows: fast-spiking interneurons (FSIs): firing rate >3 Hz, peak width <0.2 ms, and a ratio of trough width to peak width (TPR) <2.7 (TPR was estimated by k-means clustering and was more reliable than exact trough width for FSIs); tonically-active neurons (TANs): <5% of ISIs less than 10 ms, a median ISI >100 ms, and peak width and trough width above the 95th percentile for the remainder of the units; unclassified units had low TPR and/or narrow trough widths (<0.2 or 0.3 ms) but firing rates <2 Hz; all other units were considered putative medium spiny neurons (MSNs). Only MSNs and FSIs are included in the current study.

Hippocampal units were also classified according to mean firing rate and peak and trough width. Putative interneurons were defined as having firing rates >5 Hz, peak width <0.2 ms and trough width <0.3 ms. All other non-unclassified units were considered putative pyramidal cells (used only in Extended Data Fig. 6).
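The NAc classification boundaries above can be written as a small decision function. The Python sketch below is illustrative and is not the authors' Matlab code. Several points are assumptions, since the text does not fully specify them: the order in which the rules are tested, the reading of "low TPR and/or narrow trough widths" as TPR <2.7 or trough width <0.3 ms, and the treatment of the TAN width cutoffs (95th percentile of the remaining units) as precomputed inputs. The names `UnitFeatures` and `classify_nac_unit` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class UnitFeatures:
    rate_hz: float              # mean firing rate
    peak_width_ms: float        # peak width at half-maximum
    trough_width_ms: float      # trough width at half-minimum
    frac_isi_under_10ms: float  # fraction of ISIs shorter than 10 ms
    median_isi_ms: float


def classify_nac_unit(u, tan_peak_cutoff_ms, tan_trough_cutoff_ms):
    """Assign a putative cell type from firing rate, waveform and ISIs.

    The two cutoffs stand in for the 95th-percentile peak and trough
    widths of the remaining units, computed elsewhere.
    """
    tpr = u.trough_width_ms / u.peak_width_ms  # trough-to-peak ratio
    if (u.frac_isi_under_10ms < 0.05 and u.median_isi_ms > 100
            and u.peak_width_ms > tan_peak_cutoff_ms
            and u.trough_width_ms > tan_trough_cutoff_ms):
        return "TAN"
    if u.rate_hz > 3 and u.peak_width_ms < 0.2 and tpr < 2.7:
        return "FSI"
    # Assumed reading of "low TPR and/or narrow trough" with rate <2 Hz.
    if u.rate_hz < 2 and (tpr < 2.7 or u.trough_width_ms < 0.3):
        return "unclassified"
    return "MSN"
```

For example, a unit firing at 20 Hz with a 0.15 ms peak and a 0.30 ms trough (TPR 2.0) would be labeled a putative FSI under these rules.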

SWR-triggered spiking activity. For all analyses of SWR-aligned spiking, we created SWR-onset-triggered rasters (1 ms bins) in a 1 s peri-SWR window. From this raster, the mean firing rate was smoothed with a 10 ms s.d., 80 ms wide Gaussian kernel to generate a peri-event time histogram (PETH). For analyses based on z-scored firing rates (e.g. Fig. 2c,g), the raster was padded with a 100x repetition of its start and end values, smoothed, unpadded, and z-scored to the pre-SWR period -500 to 0 ms.

For multiunit activity (MUA) analysis in dH and vH, we thresholded all spike events at 40 µV on tetrodes with clear multiunit firing in the pyramidal layer. In Rats 4 and 5, MUA was extracted by post-hoc thresholding of the 600-6000 Hz filtered LFP. SWR-triggered MUA spike counts were summed across tetrodes and then divided by the total time per bin to calculate a mean firing rate per animal.

To detect significant SWR-modulation of NAc cells, we followed a procedure described previously40. Briefly, for each cell, we circularly shuffled each SWR-triggered spike train by a random amount up to ±0.5 s to generate 5000 shuffled PETHs. We then calculated the summed squared difference of the real PETH relative to the mean of the shuffles in a 0-200 ms window post SWR-onset, and compared it to the same value for each shuffle relative to the mean of the shuffles. Significance at p<0.05 indicates that the real modulation exceeded 95% of the shuffles. The direction of modulation was defined from a modulation index, calculated as the mean firing rate in the 0-200 ms window minus the mean baseline firing rate from -500 to -100 ms, divided by the mean baseline firing rate. The sign of this index was used to assign cells as significantly positively or negatively SWR-modulated.
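The shuffle test and modulation index described above can be sketched in Python as follows. This is a simplified illustration, not the authors' Matlab code. It assumes a raster given as a list of per-SWR spike-count lists in 1 ms bins spanning -500 to +500 ms, and it omits the Gaussian smoothing of the PETH. The function name `swr_modulation` and all parameter names are hypothetical.

```python
import random


def rotate(row, shift):
    """Circularly shift a list by `shift` bins (positive = later)."""
    shift %= len(row)
    return row[-shift:] + row[:-shift] if shift else list(row)


def column_means(rows):
    """Mean across rows for every column (an unsmoothed PETH)."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def ssd(a, b):
    """Summed squared difference between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def swr_modulation(raster, n_shuffles=5000, pre_bins=500, win=(0, 200),
                   base=(-500, -100), max_shift=500, rng=None):
    """Return (p_value, modulation_index) for one cell.

    Each SWR-aligned row is circularly shifted by an independent random
    amount up to +/- max_shift bins to build the null distribution.
    """
    rng = rng or random.Random()
    w0, w1 = win[0] + pre_bins, win[1] + pre_bins
    b0, b1 = base[0] + pre_bins, base[1] + pre_bins

    real = column_means(raster)
    shuffled = [
        column_means([rotate(r, rng.randint(-max_shift, max_shift))[w0:w1]
                      for r in raster])
        for _ in range(n_shuffles)
    ]
    shuffle_mean = column_means(shuffled)

    # Real deviation from the shuffle mean vs. each shuffle's deviation.
    real_stat = ssd(real[w0:w1], shuffle_mean)
    null_stats = [ssd(s, shuffle_mean) for s in shuffled]
    p = sum(s >= real_stat for s in null_stats) / n_shuffles

    response = sum(real[w0:w1]) / (w1 - w0)
    baseline = sum(real[b0:b1]) / (b1 - b0)
    index = (response - baseline) / baseline if baseline > 0 else float("nan")
    return p, index
```

A cell would then be labeled significantly activated when `p < 0.05` and the index is positive, and significantly suppressed when `p < 0.05` and the index is negative.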
To categorize cells according to both dSWRs and vSWRs, we only included cells that fired at least 50 spikes in the peri-event rasters for both types of SWRs. Cells were subsequently categorized according to their significance and direction as unmodulated (Neither, N), dSWR-significant only (D only), vSWR-significant only (V only), significant during both (Both), dSWR-activated (D+), dSWR-suppressed (D-), vSWR-activated (V+), vSWR-suppressed (V-), or combinations of these: D+V+, D+V-, D-V+, or D-V-. In both wake and sleep, we observed more dSWR- and vSWR-modulated cells than the chance level of 5%. To assess the significance of the “both” modulation categories, we compared each fraction to the chance overlap of our empirical fractions of dSWR- and vSWR-modulated cells using a nonparametric z-test for

proportions. We defined “modulation amplitude” as the mean z-scored firing rate of each cell (relative to the pre-SWR period -500 to 0 ms) in the 0-200 ms window following SWR onset.

Potential duplicate cell control. To control for the possibility that cells stably recorded on the same tetrode across days could have been counted more than once and could influence any of our results, we excluded potential duplicate cells based on waveform similarity41 (illustrated in Extended Data Fig. 5). We first established a waveform correlation threshold based on cells recorded on different tetrodes on the same day, which are different cells by definition. For each pair of cells, we aligned their mean waveforms at the peak (on the maximum channel) of one of the cells and calculated a Pearson’s correlation coefficient on each channel (channel 1 of cell A was compared to channel 1 of cell B, and so on). In cases where the waveform snippets were different lengths (due to different spike extraction in Matclust as compared to Mountainsort), we aligned the snippets at their peaks and padded the edges with zeroes as needed. The resulting r values for each channel were then averaged to establish a mean r for that pair. The 95th percentile of r values in this different-cell distribution, 0.979 for wake and 0.980 for sleep, was taken as the threshold for waveform correlation. Next, if a tetrode was moved ≥78 µm across days, we considered the newly acquired cells to be “unique.” If a tetrode was moved less than 78 µm between days, we computed the mean r for all pairs of cells on that tetrode across all previous days of similar depth. This could exclude cells that disappeared and “came back” across multiple days, even though this scenario would seem to be unlikely. Cell pairs with a mean r greater than the threshold were tagged as potential duplicates.
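The waveform-similarity measure and duplicate tagging described so far can be illustrated with a short Python sketch. It is not the authors' Matlab code. It assumes that the mean waveforms are already peak-aligned and zero-padded to equal length, that no channel is perfectly flat (a flat channel would make the correlation undefined), and that the percentile uses linear interpolation. The function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_waveform_r(wave_a, wave_b):
    """Average the per-channel r between two cells' mean waveforms.

    Each argument is a list of channels (channel 1 of cell A is
    compared to channel 1 of cell B, and so on).
    """
    rs = [pearson(a, b) for a, b in zip(wave_a, wave_b)]
    return sum(rs) / len(rs)


def percentile(values, q):
    """q-th percentile (0-100) with linear interpolation (assumption)."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def potential_duplicates(pair_rs, threshold):
    """Return the cell pairs whose mean r exceeds the threshold.

    `pair_rs` maps (cell_id_a, cell_id_b) to the pair's mean r.
    """
    return [pair for pair, r in pair_rs.items() if r > threshold]
```

In this sketch the threshold would be obtained as `percentile(different_tetrode_rs, 95)`, where `different_tetrode_rs` (a hypothetical name) holds the mean r values of same-day pairs recorded on different tetrodes.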
We first kept cells from the day with the most cells on that tetrode (randomly selected if multiple days tied for maximum cell count). If a given potential duplicate cell had not yet been kept, one instance of that cell was randomly selected to keep. We note that this system will result in some false positive exclusions and false negative inclusions; different MSNs can have highly similar waveforms even though they are different cells (false positive), and waveforms can change dramatically from day to day even for the same cell due to changes in cell health or relative position of the tetrode (false negative). However, applying this conservative control did not change any of our main results.

NAc neuron task firing. We analyzed trajectory firing patterns using two methods: normalized trial time and linearized position. In the normalized trial time method, each trial was split into

normalized progression of time spent at the well (from nosepoke to when the animal turns around; “well”) and time spent moving along the path between wells (from turnaround to next nosepoke; “path”). The turnaround time was detected by a >4 cm movement in the x direction, a change in head direction of >0.25 radians (~14˚), and a speed of >2 cm/s. Additionally, we required that the animal had moved away from the well in the y-direction one second in the future, otherwise the turnaround time was incremented. Path and well time were divided into bins of 0.5% of the total completion time. Firing rate was calculated by dividing the number of spikes in each bin by the bin width in seconds on that particular trial (excluding spikes during either dSWRs or vSWRs), smoothing the rate with a 5 bin (2.5%) s.d., 40 bin (20%) wide Gaussian kernel, and then averaging across trials of the same trajectory type (defined by start and end well). We further attempted to control for variation in the animal’s behavior on individual trials in three ways: by only calculating mean trajectory rates when there were at least 3 trials on that trajectory; by performing a pairwise speed profile correlation across trials and only accepting trials that fell at or above the 25th percentile of speed similarity values; and by only accepting trials with a duration at or below the 75th percentile of the trial length distribution. These methods excluded trials that were long, slow, or had many stops.

In the linearized position method, we projected the animal’s 2D position to a line connecting each junction and endpoint of the maze, generating a linearized position relative to the start of each trajectory defined by start and end well.
Each trajectory thus contained a specific set of maze segments, and we again controlled for behavioral variation by only accepting trials where the animal deviated ≤12 linear cm onto segments not included in the current trajectory (this allowed for small “head swings” onto neighboring segments). From the set of included trials on each trajectory, we calculated a firing rate per linear position bin of 2 cm (divided by total time occupancy in that bin), smoothed it with a 2 cm s.d., 10 cm wide Gaussian kernel, and calculated the mean rate on that trajectory within day. Trajectories missing more than 5 linear position samples (as a result of diode occlusion or due to low time occupancy) were excluded from firing similarity analysis (5 or fewer missing samples were interpolated), and linearized distance was normalized before pairwise correlation across trajectories (below).

To assess the firing similarity of a given cell across trajectories that differ in spatial location and direction, we focused on the 6 rewarded trajectories (across SA and SB) depicted in Fig. 3a. We calculated the coefficient of determination between the mean firing profiles of each

pair of trajectories (as a function of normalized trial time or linearized position), and then took the mean r² across pairs. We controlled for the effect of firing rate by matching cells in the V+ population to D+ and N cells with the closest firing rates, generating subsampled D+ and N populations. Note that the variety of behavioral controls applied to both methods excluded slightly different numbers of cells, depending on whether the cells were active on enough trials that passed our criteria to compute an r².

We explored a variety of additional task-related firing parameters, reported in Extended Data Fig. 9. Trial-by-trial correlation was performed with the same controls for behavioral variability as described above. Specifically, we correlated pairs of individual trials on the same trajectory to get a mean r² for that trajectory, and then took the mean r² across trajectories. In the linearized position version, this was done with spike counts in 5 cm bins as opposed to smoothed firing rate, as time occupancy in any given bin could be very low on a given trial. Spatial information42 was calculated by treating each maze segment (Extended Data Fig. 9c) or each vertical maze segment plus direction (Extended Data Fig. 9d) as a single spatial bin. Path vs. well preference was calculated from each cell’s mean path and well (excluding SWRs) firing rates across trajectories in normalized trial time, as (path – well)/(path + well), such that values greater than zero indicate path preference and values less than zero indicate well preference. Trajectory direction preference was calculated from linearized position, as the absolute area between leftward and rightward trajectory firing rate curves on the same maze segments, divided by their sum (values closer to 1 indicate a stronger preference in one direction, either left or right). Toward vs.
away-from-well preference measures the preference of a cell for path movement toward or away from wells (regardless of reward outcome), and is calculated from the linearized position firing rate curves on the same vertical segment in opposite directions (i.e. the area between the two curves divided by their sum).

NAc neuron reward and reward history firing. To calculate reward vs. error preference at the wells (based on current reward or error), we used two methods. In the first method (Extended Data Fig. 9h,i), we calculated firing rate on rewarded and error well visits as a function of normalized time at the wells (0.5% bins, excluding SWR spikes, smoothed with a 1.5% s.d., 12% wide Gaussian kernel), again applying a pairwise speed profile correlation to only include trials that fell at or above the 25th percentile of speed similarity values. We only included cells for

which at least 3 trials of both types met the above criteria. We then calculated a reward vs. error index (RI) per cell which equaled the area between the mean rewarded firing rate and mean error firing rate curves divided by their sum, exclusively in time bins for which the mean speed on both rewarded and error trials was <4 cm/s. Significance of reward preference (>0) or error preference (<0) was calculated by a one-tailed permutation test from the set of rewarded and error trials.

In the second method for reward preference at the wells (Extended Data Fig. 9j,k), we calculated rewarded and error firing rates as a function of time from nosepoke (0 to 4 s, 100 ms bins). We then computed a reward vs. error index (RI) as the difference in mean rewarded and error firing rates post-reward-delivery time (2 to 4 s, excluding SWR spikes and times), divided by their sum. We again controlled for speed by excluding any trial where the animal spent more than 5 position samples (150 ms) moving faster than 4 cm/s, and required at least 5 included trials of both types to compute an RI. Significance was again assessed with a one-tailed permutation test in the 2-4 s window.

To examine reward history preference, we calculated mean firing rate on normalized path time using the same methods as for trajectory firing (but smoothed with a 1.5% s.d., 12% wide Gaussian kernel), now comparing paths following a rewarded well visit or an error well visit (trials are defined by reward outcome of trial n-1). We only included cells for which at least 3 trials of both types met our speed and trial length criteria. The reward history preference was calculated as area between the curves divided by their sum and tested for significance of reward preference (>0) vs. error preference (<0) with a one-tailed permutation test.

Spike cross-correlations.
Spike cross-correlations between pairs of D+ or V+ NAc MSNs were calculated in 10 ms bins at up to 0.5 s lag. Each CCH was first normalized by the square root of the product of the number of spikes from each cell. To z-score the CCH of each cell pair, one of the spike trains was circularly shuffled 1000 times (by a random amount up to ±half the mean immobility period length) to create 1000 shuffled CCHs. Each real and shuffled CCH was smoothed with a 20 ms s.d., 160 ms wide Gaussian kernel. The real cross-correlation values were then z-scored relative to the distribution of shuffled values within each bin. We averaged the smoothed cross-correlation z-score in a 20 ms bin around 0 (±10 ms) to get an approximate “0

lag” value, and the distribution of z-scores at 0 lag across pairs was converted to a cumulative density function (Fig. 4).

Data reporting. No statistical methods were used to predetermine sample size. The minimum number of required animals was established beforehand as four or more, in line with similar studies in which this number yields data with sufficient statistical power.

Data availability. Data are available from the authors upon reasonable request.

Statistics. All statistical tests were two-sided unless otherwise specified.

Code availability. All custom code is available from the authors upon reasonable request.

Methods References

34 Kim, S. M. & Frank, L. M. Hippocampal lesions impair rapid learning of a continuous spatial alternation task. PLoS ONE 4, e5494 (2009).
35 Chung, J. E. et al. High-Density, Long-Lasting, and Multi-region Electrophysiological Recordings Using Polymer Electrode Arrays. Neuron 101, 21-31.e25 (2019).
36 Chung, J. E. et al. A Fully Automated Approach to Spike Sorting. Neuron 95, 1381-1394.e1386 (2017).
37 Kay, K. et al. A hippocampal network for spatial coding during immobility and sleep. Nature 531, 185-190 (2016).
38 Smith, A. C. et al. Dynamic analysis of learning in behavioral experiments. J. Neurosci. 24, 447-461 (2004).
39 Berke, J. D. Uncoordinated firing rate changes of striatal fast-spiking interneurons during behavioral task performance. J. Neurosci. 28, 10075-10080 (2008).
40 Jadhav, S. P., Rothschild, G., Roumis, D. K. & Frank, L. M. Coordinated Excitation and Inhibition of Prefrontal Ensembles during Awake Hippocampal Sharp-Wave Ripple Events. Neuron 90, 113-127 (2016).
41 Schmitzer-Torbert, N. & Redish, A. D. Neuronal activity in the dorsal striatum in sequential navigation: separation of spatial and reward responses on the multiple T task. J. Neurophysiol. 91, 2259-2272 (2004).
42 Skaggs, W. E., McNaughton, B. L., Gothard, K. & Markus, E. in Advances in Neural Information Processing Systems (eds S. Hanson, J. D. Cowan, & C. L. Giles) 1030-1037 (Morgan Kaufmann Publishers, 1993).
