The Credit Assignment Problem in Cortico‐Basal Ganglia‐Thalamic
Received: 2 January 2020 | Revised: 23 March 2020 | Accepted: 25 March 2020
DOI: 10.1111/ejn.14745

SPECIAL ISSUE REVIEW

The credit assignment problem in cortico-basal ganglia-thalamic networks: A review, a problem and a possible solution

Jonathan E. Rubin1 | Catalina Vich2 | Matthew Clapp3 | Kendra Noneman4 | Timothy Verstynen3,5

1 Department of Mathematics, Center for the Neural Basis of Cognition, University of Pittsburgh, Pittsburgh, PA, USA
2 Department de Matemàtiques i Informàtica, Institute of Applied Computing and Community Code, Universitat de les Illes Balears, Palma, Spain
3 Carnegie Mellon Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, USA
4 Micron School of Materials Science and Engineering, Boise State University, Boise, ID, USA
5 Department of Psychology, Center for the Neural Basis of Cognition, Carnegie Mellon University, Pittsburgh, PA, USA

Correspondence
Jonathan E. Rubin, Department of Mathematics, 301 Thackeray Hall, University of Pittsburgh, Pittsburgh, PA, USA. Email: [email protected]
and
Timothy Verstynen, Department of Psychology, 342c Baker Hall, Carnegie Mellon University, Pittsburgh, PA, USA. Email: [email protected]

Funding information
European Regional Development Funds, Grant/Award Number: MTM2015-71509-C2-2-R, and MTM2017-83568-P; US National Science Foundation, Grant/Award Number: 1351748, 1516288, 1612913 and 1724240; Ministerio de Economía y Competitividad; Agencia Estatal de Investigación; Carnegie Mellon University

Abstract
The question of how cortico-basal ganglia-thalamic (CBGT) pathways use dopaminergic feedback signals to modify future decisions has challenged computational neuroscientists for decades. Reviewing the literature on computational representations of dopaminergic corticostriatal plasticity, we show how the field is converging on a normative, synaptic-level learning algorithm that elegantly captures both neurophysiological properties of CBGT circuits and behavioral dynamics during reinforcement learning. Unfortunately, the computational studies that have led to this normative algorithmic model have all relied on simplified circuits that use abstracted action-selection rules. As a result, the application of this corticostriatal plasticity algorithm to a full model of the CBGT pathways immediately fails because the spatiotemporal distance between integration (corticostriatal circuits), action selection (thalamocortical loops) and learning (nigrostriatal circuits) means that the network does not know which synapses should be reinforced to favor previously rewarding actions. We show how observations from neurophysiology, in particular the sustained activation of selected action representations, can provide a simple means of resolving this credit assignment problem in models of CBGT learning. Using a biologically realistic spiking model of the full CBGT circuit, we demonstrate how this solution can allow a network to learn to select optimal targets and to relearn action-outcome contingencies when the environment changes. This simple illustration highlights how the normative framework for corticostriatal plasticity can be expanded to capture macroscopic network dynamics during learning and decision-making.

Abbreviations: CBGT, cortico-basal ganglia-thalamic; DDM, drift-diffusion model.

Jonathan E. Rubin and Timothy Verstynen contributed equally to this article.

Edited by Dr. Yoland Smith.

The peer review history for this article is available at https://publons.com/publon/10.1111/ejn.14745

© 2020 Federation of European Neuroscience Societies and John Wiley & Sons Ltd

wileyonlinelibrary.com/journal/ejn
Eur J Neurosci. 2021;53:2234–2253.
KEYWORDS
action selection, basal ganglia, credit assignment, dopamine, eligibility trace, reinforcement learning, synaptic plasticity

1 | INTRODUCTION

Survival in dynamic natural environments requires that animals effectively use prior experiences to guide future action selection. For example, remembering which plants previously provided a tasty treat versus which plants induced gastrointestinal distress can have a substantial impact on an animal's survival. In the mammalian brain, this form of reinforcement learning (Sutton, Barto, & Book, 1998) is driven in large part by the neurochemical dopamine (Schultz, 1998, 2016; Schultz & Romo, 1990). Sensory events that happen immediately after an action (e.g., a sweet taste or a painful sensation) can affect the output of midbrain dopamine neurons that target cells in the striatum, the major recipient of cortical inputs to the basal ganglia, tuning the sensitivity of striatal neurons to descending cortical inputs (for review, see Peak, Hart, & Balleine, 2019; Surmeier et al., 2010). In turn, this dopamine-dependent corticostriatal plasticity is thought to impact how the basal ganglia modulate cortical activity during action selection via their influence on thalamocortical connections, closing the so-called cortico-basal ganglia-thalamic (CBGT) computational "loop" (Alexander, DeLong, & Strick, 1986; DeLong & Wichmann, 2007; Parent & Hazrati, 1995).

The mainstream view of reward-related dopamine signals is that they shape corticostriatal synaptic plasticity in a way that promotes reward-inducing behaviors and suppresses non-rewarding actions (Mink, 1996), casting the corticostriatal synapses as a critical site for reward-dependent learning via synaptic plasticity. This elegant idea is consistent with a wealth of data across species (Lee, Tai, Zador, & Wilbrecht, 2015; Smeets, Marin, & Gonzalez, 2000). The challenge that has long stymied neuroscientists, however, is the proverbial devil in the details. The mapping from actions to dopamine release to the actual dopamine signals that are received in the striatum is complicated and depends on factors relating to many aspects of experimental conditions, including species involved, history preceding action performance, subject motivational state and measurement techniques. The transformation from dopamine signals to eventual modifications in behavior is even more opaque and difficult to elucidate because we lack a clear understanding of how the circuit-level properties of CBGT pathways map to behavior (but see Bogacz & Gurney, 2007; Dunovan, Vich, Clapp, Verstynen, & Rubin, 2019; Ratcliff & Frank, 2012). This complexity provides a natural setting for the application of computational modeling, which can be used to instantiate the known components of the system together with other more exploratory ideas, both to test the capacity of these networks for effective learning and action selection, and to generate associated predictions.

Here, we review the computational approaches that have been used to understand how dopamine signals implement reinforcement learning in CBGT pathways. We structure the review part of this work to highlight the progression toward current efforts to connect synaptic-level models of plasticity with circuit-level models of CBGT-dependent decision-making and corresponding cognitive representations. We begin in Section 2 by providing a concise general description of the CBGT pathways, a brief introduction to ideas about evidence accumulation and an overview of the current understanding of corticostriatal plasticity. Next, in Section 3, we summarize previous computational models that describe how reward-related dopamine could influence corticostriatal synapses in a way that impacts future action selection. While these elegant synapse-level models are effective at capturing the nuanced dynamics of reinforcement learning, they rely on simple, abstracted action-selection rules that ignore or oversimplify the credit assignment problem (i.e., knowing how to change only those synapses that lead to the decision despite the spatiotemporal distance between selection and subsequent feedback signals). We go on to show how insights from the electrophysiology literature can provide essential clues as to how to resolve the credit assignment problem in CBGT pathways and present some proof-of-concept simulations to highlight how dopaminergic signals can implement reinforcement learning in a biologically plausible spiking model of CBGT circuits. Finally, we end by highlighting some open issues and future directions that the computational neuroscience literature can consider in order to bridge the gap between circuit properties of CBGT pathways and behavior.

2 | CIRCUIT-LEVEL ARCHITECTURE OF CBGT PATHWAYS

The computational goal of the CBGT loops appears to be to integrate information from competing cortical sources in order to (a) bias downstream selection systems and (b) use feedback signals to promote learning that modifies this biasing process in the future (Mink, 1996). The canonical model of how CBGT circuits (Figure 1) implement these computations relies on three structurally and functionally dissociable control pathways: the direct (facilitation), indirect (suppression) and hyperdirect (braking) pathways. This model assumes that the basal ganglia is organized into multiple action channels (Bogacz, 2007; Bogacz & Gurney, 2007; Mink, 1996), with each channel containing a direct and an indirect pathway. Activation of the direct pathway suppresses the basal

organization of CBGT pathways, have led to more recent theoretical models proposing that, within an action channel, the dMSN and iMSNs work in a competitive manner to regulate the certainty of a given action decision (Bariselli, Fobbs, Creed, & Kravitz, 2018; Dunovan,