Psychological Review, 2017, 124, 472-482.

A Neural Interpretation of Exemplar Theory

F. Gregory Ashby & Luke Rosedahl University of California, Santa Barbara

Exemplar theory assumes that people categorize a novel object by comparing its similarity to the representations of all previous exemplars from each relevant category. Exemplar theory has been the most prominent cognitive theory of categorization for more than 30 years. Despite its considerable success in providing good quantitative fits to a wide variety of accuracy data, it has never had a detailed neurobiological interpretation. This article proposes a neural interpretation of exemplar theory in which category is mediated by synaptic plasticity at cortical-striatal synapses. In this model, categorization training does not create new memory representations, rather it alters connectivity between striatal neurons and neurons in sensory association cortex. The new model makes identical quantitative predictions as exemplar theory, yet it can account for many empirical phenomena that are either incompatible with or outside the scope of the cognitive version of exemplar theory.

Keywords: Categorization; Exemplar Theory; COVIS; Striatum

Introduction One challenge to interpreting exemplar theory at the neu- ral level is to separate the quantitative predictions of the the- Exemplar theory assumes that categorization is a process ory from the cognitive interpretations that are used to justify of learning about the exemplars that belong to the category. those predictions. At the cognitive level, exemplar theory When an unfamiliar stimulus is encountered, its similarity assumes that people access the memory representations of is computed to the memory representation of every previ- all previously seen exemplars. The standard interpretation is ously seen exemplar from each potentially relevant category. that these are detailed replicas of each exemplar (filtered by The probability that the stimulus is assigned to each category attentional processes) that do not typically include contextual increases with the sum of these similarities (Brooks, 1978; information (e.g., details about the experimental room). The Estes, 1986, 1994; Kruschke, 1992; Lamberts, 2000; Medin closest match in the memory literature to such representa- & Schaffer, 1978; Nosofsky, 1986). tions is probably provided by semantic memory. Exemplar theory has been the most prominent cognitive This cognitive version of exemplar theory has been a chal- theory of categorization for more than 30 years. Despite its lenge to interpret at the neural level. First, the memory considerable success in providing good quantitative fits to ac- processes postulated by exemplar theory appear qualitatively curacy data from a wide variety of experiments, it has never different from all of the memory systems that have been iden- had a detailed neurobiological interpretation. This article tified by memory researchers. For example, within the mem- corrects that shortcoming of the literature. We propose a neu- ory systems literature, semantic memory is assumed to be ral version of exemplar theory in which category learning is declarative. In contrast, exemplar theorists are careful to as- mediated by synaptic plasticity at cortical-striatal synapses. sume that people do not have conscious awareness that they The neural version makes identical quantitative predictions are accessing exemplar when making categoriza- to the exemplar model, yet it can account for many empirical tion decisions. Thus, the cognitive version of exemplar the- phenomena that are either incompatible with or outside the ory appears to postulate a unique memory system that has scope of the cognitive version of exemplar theory. not yet been discovered by memory researchers. It is important to note, however, that other instance-based theories postulate more traditional memory systems. For example, RULEX (Nosofsky, Palmeri, & McKinley, 1994) c 2017, American Psychological Association. This paper is not the copy of record and may not exactly replicate the final, authorita- assumes people use explicit rules during categorization but tive version of the article. Please do not copy or cite without authors they memorize exceptions. Presumably, people are aware of permission. The final article will be available, upon publication, via these exceptions, so this form of memory seems identical to its DOI: 10.1037/rev0000064. semantic memory. This research was supported by NIH grant 2R01MH063760. Previous attempts at providing a neural interpretation

1 2 EXEMPLAR THEORY of the cognitive processes postulated by exemplar theory of striatal-mediated procedural learning than hippocampal- have assigned key roles to the hippocampus and surround- mediated declarative learning (Crossley, Madsen, & Ashby, ing medial temporal lobe structures (e.g., Pickering, 1997; 2012). Sakamoto & Love, 2004). The problem with these attempts Third, a number of behavioral dissociations have been re- is that the evidence supporting a major role for medial tem- ported that seem more consistent with a striatal locus for the poral lobe structures in category learning is weak. learning of similarity-based categories than a medial tempo- First, medial temporal lobe interpretations of exemplar ral lobe locus (for a review, see Ashby & Valentin, 2016). theory predict that patients with damage to medial tempo- This work has contrasted rule-based (RB) and information- ral lobe structures should be impaired in category learning. integration (II) category-learning tasks. In RB tasks, the cat- We know of three studies that reported category-learning egories can be learned via some explicit reasoning process deficits in amnesiacs (Hopkins, Myers, Shohamy, Gross- (Ashby, Alfonso-Reese, Turken, & Waldron, 1998). In the man, & Gluck, 2004; Kolodny, 1994; Zaki, Nosofsky, Jes- most common applications, only one stimulus dimension is sup, & Unversagt, 2003), one that reported normal perfor- relevant, and the participant’s task is to discover this relevant mance on the first 50 trials but impaired performance later dimension and then to map the different dimensional values on (Knowlton, Squire, & Gluck, 1994), and one that re- to the relevant categories. A variety of evidence suggests ported normal categorization by amnesiacs when the stim- that success in RB tasks depends on declarative memory uli were faces, but impaired performance when the stimuli and especially on working memory and executive attention were virtual reality scenes (Graham et al., 2006). On the (Ashby et al., 1998; Maddox, Ashby, Ing, & Pickering, 2004; other hand, many more studies have reported intact category- Waldron & Ashby, 2001; Zeithamova & Maddox, 2006). learning performance in patients with amnesia (e.g. Bayley, In II category-learning tasks, accuracy is maximized with a Frascino, & Squire, 2005; Filoteo, Maddox, & Davis, 2001b; similarity-based strategy in which information from two or Janowsky, Shimamura, Kritchevsky, & Squire, 1989; Knowl- more incommensurable stimulus components is integrated at ton & Squire, 1993; Kolodny, 1994; Leng & Parkin, 1988; some predecisional stage (Ashby & Gott, 1988; Ashby et al., Squire & Knowlton, 1995; Zaki et al., 2003). For example, 1998). Evidence suggests that success in II tasks depends on Filoteo et al. (2001b) reported normal performance by amne- procedural memory (Ashby, Ell, & Waldron, 2003; Ashby siacs in a difficult information-integration categorization task & Ennis, 2006; Filoteo, Maddox, Salmon, & Song, 2005; with nonlinearly separable categories that required hundreds Knowlton, Mangels, & Squire, 1996; Maddox, Bohil, & Ing, of training trials. In fact, in the Filoteo et al. (2001b) study, 2004). Exemplar theory assumes a similarity-based catego- one (medial temporal lobe) amnesiac and one control par- rization strategy so it seems especially tenable for II tasks. ticipant completed a second day of testing. Despite lacking Furthermore, many researchers have argued that learning in an explicit memory of the previous session, the patient with RB tasks is mediated by an explicit, rule-learning process amnesia performed slightly better than the control on the first that is incompatible with exemplar theory (e.g., Ashby et al., block of day 2. This result suggests that amnesiacs do not 1998; Erickson & Kruschke, 1998; McDaniel, Cahill, Rob- necessarily rely on working memory to perform normally in bins, & Wiener, 2014; Nosofsky et al., 1994; Rouder & Rat- category-learning tasks (because working memory cannot be cliff, 2006). used to retain category knowledge across days). Currently, at least 25 separate empirical dissociations A second set of problematic results come from neu- between RB and II category learning have been reported roimaging studies of unstructured category-learning tasks. (Ashby & Valentin, 2016). Two of these are especially im- Unstructured categories are those in which the stimuli are portant for the present purposes. First, II but not RB tasks assigned to each contrasting category randomly, and thus are extremely sensitive to feedback timing (Maddox, Ashby, there is no rule- or similarity-based strategy for determin- & Bohil, 2003; Maddox & Ing, 2005; Worthy, Markman, & ing category membership. Introspection seems to suggest Maddox, 2013). In particular, II learning is better when the that the only way arbitrary categories of this type could be feedback is delivered 500 ms after the response than when learned is via explicit memorization, so if medial tempo- the feedback is immediate or delivered after a 1 sec delay ral lobe structures play a critical role in any categorization (Worthy et al., 2013), whereas feedback delays of 2.5 sec or task, then unstructured categorization tasks seem like a good longer completely abolish almost all II learning (Maddox et candidate. Even so, fMRI studies of unstructured-category al., 2003; Maddox & Ing, 2005). In contrast, delays of up learning have found task-related activity in the striatum, but to 10 sec have no effect on RB learning. The II results are typically not in the hippocampus or other medial temporal consistent with the effects on cortical-striatal synaptic plas- lobe structures (Lopez-Paniagua & Seger, 2011; Seger & ticity of delays between dopamine (DA) release and Ca2+ in- Cincotta, 2005; Seger, Peterson, Cincotta, Lopez-Paniagua, flux into the spines of medium spiny neurons (Yagishita et & Anderson, 2010). In addition, unstructured category al., 2014), and inconsistent with the hypothesis that II learn- learning includes a motor component that is more typical ing is mediated by medial temporal lobe structures. Another EXEMPLAR THEORY 3 important dissociation is that switching the locations of the the weighted Minkowski metric: response keys interferes with performance of II tasks but not  m 1/r with performance of one-dimensional RB tasks (Ashby, Ell, X  δ =  w |x − x |r , (2) & Waldron, 2003; Maddox, Bohil, & Ing, 2004; Maddox, ik  j i j k j  Glass, O’Brien, Filoteo, & Ashby, 2010; Spiering & Ashby, j=1 2008). This is relevant because a similar result has been re- where m is the number of perceptual dimensions, w is the ported for the most widely studied procedural-learning task – j proportion of attention allocated to dimension j, x is the namely the serial reaction time task (Willingham, Wells, Far- i j coordinate value of stimulus i on the jth perceptual dimen- rell, & Stemwedel, 2000), and procedural learning is thought sion, and r determines the nature of the distance metric. In to depend on the striatum – not on the hippocampus. particular, r = 1 produces city-block distance and r = 2 pro- This article describes a neural interpretation of exemplar duces Euclidean distance. Similarity is inversely related to theory that easily accounts for all of these problematic re- distance via: sults. First though, we define the exemplar model more ex- ω ηik = exp(−cδ ) (3) plicitly. ik where c and ω are constants. The parameter c is a measure of stimulus sensitivity, which increases with the overall dis- The Exemplar Model criminability of the stimuli. The constant ω defines the na- ture of the similarity function. In virtually all applications ω Many researchers have tested exemplar models of catego- = 1 or 2. A value of ω = 1, which produces the exponential rization (e.g., Estes, 1986, 1994; Medin & Schaffer, 1978; similarity function (Shepard, 1987), is typically combined Nosofsky, 1986, 2011). These various versions are all highly with a city-block distance metric, whereas a value of ω = 2, similar, but to establish mathematical equivalence it is neces- which produces the Gaussian similarity function, is typically sary to focus on one specific version of exemplar theory. The combined with a Euclidean distance metric. model has evolved somewhat from its initial description, so our focus will be on the exemplar model known as the gen- The Neural Model eralized context model (GCM; Nosofsky, 1986, 2011). Although the evidence does not favor medial temporal The equivalence result established below holds for any lobe structures as the most important regions for category number of categories, but to keep the notation simpler we learning, it does favor a critical role for the basal ganglia, will assume a standard categorization experiment with two and especially the striatum – a major input region within categories A and B. Under these conditions, the exemplar the basal ganglia that includes the caudate nucleus and the model assumes that the probability that a participant assigns putamen (for reviews, see e.g., Ashby & Ennis, 2006; Seger, stimulus k to category A equals 2008; Seger & Miller, 2010). A complete review is beyond P the scope of this article, but briefly, in addition to the disso- βA ViAηik i∈CA ciations between RB and II categorization mentioned above, P(A|k) = P P , (1) βA ViAηik + βB ViBηik patients with striatal dysfunction are highly impaired in cat- i∈CA i∈CB egory learning (e.g., Ashby, Noble, Filoteo, Waldron, & Ell, 2003; Filoteo, Maddox, & Davis, 2001a; Filoteo et al., 2005; where CA and CB are sets containing the stimuli in categories Knowlton et al., 1996; Sage et al., 2003; Shohamy, Myers, A and B, respectively, ηik is the similarity between stimuli i Onlaor, & Gluck, 2004; Witt, Nuhsman, & Deuschl, 2002), and k, βA and βB are constants reflecting the participant’s bias and almost all fMRI studies of category learning have re- toward responding A and B, respectively, and ViJ represents ported significant task-related activity in the striatum (e.g., the memory strength of stimulus i with respect to category Nomura et al., 2007; Poldrack et al., 2001; Seger & Cincotta, J. The memory strengths ViJ are not free parameters, but 2002, 2005; Waldschmidt & Ashby, 2011). instead are usually “set equal to the relative frequency with So the evidence is good that the striatum plays a key which each exemplar i is provided with Category J feedback role in category learning – at least in the learning of II during the classification training phase” (Nosofsky, 2011). and unstructured categories (i.e., rather than RB categories). Most categorization experiments present all stimuli equally The problem for exemplar theory is that basal ganglia neu- often, and in this case note that all ViJ are equal and therefore roanatomy does not favor traditional interpretations of ex- can be canceled from Eq. (1). emplar representations. Virtually all of cortex (except V1) Similarity is assumed to be inversely related to the dis- sends excitatory (glutamatergic) projections to the striatum tance between the perceptual representations of the stimuli. (Reiner, 2010). These cortical inputs, which synapse on More specifically, the distance between the perceptual repre- medium spiny neurons (MSNs), are massively convergent, sentations of stimuli i and k, denoted δik, is computed from with estimates that somewhere between 50,000 and 350,000 4 EXEMPLAR THEORY

Sensory association cortex is modeled in the same way as in Ashby, Ennis, and Spiering (2007). Briefly, this means we assume an ordered array of many units in sensory cortex, each tuned to a different stimulus. Neurons in sensory cor- ε ε tex respond not only to a preferred stimulus, but also more weakly to similar stimuli. In perceptual neuroscience this β β phenomenon is described by the neuron’s tuning curve. We model the tuning curve of each sensory cortical unit in the Figure 1 model via radial basis functions – that is, we as- sume that each unit responds maximally when its preferred stimulus is presented and that its response decreases with the distance in perceptual space between the stimulus preferred by that unit and the presented stimulus. The equivalence to exemplar theory holds for any number of these units, so long as every perceptually distinct stimulus maximally excites a Figure 1 . Architecture of the model that is computationally different unit. Given the known cortical-striatal convergence, equivalent to exemplar theory. with the two MSNs shown in Figure 1 we would expect the relevant regions of sensory cortex to include somewhere be- tween 10,000 and 60,000 neurons. cortical neurons converge onto a single striatal MSN (Bolam The equivalence result established below assumes a stan- et al., 2006; Kincaid, Zheng, & Wilson, 1998; C. J. Wil- dard categorization experiment in which the stimulus is pre- son, 1995). In contrast, each of these cortical neurons might sented at constant intensity and without a mask until the par- synapse onto as few as 10 – 100 MSNs (Wickens & Arbuth- ticipant responds. Under these conditions, activation in each nott, 2010). Thus, it appears that the resolution of the striatal sensory unit is 0 during periods when no stimulus is dis- MSNs is not dense enough to allocate one MSN for every ex- played and is either 0 or equal to some positive constant value emplar in any arbitrary category, especially when exemplar during the duration of stimulus presentation. More specifi- similarity is high. cally, consider a trial when stimulus k is presented. Then the As a result, we propose a reconceptualization of ‘exem- response of sensory unit i during the time when the stimulus plar representation’. To our knowledge, there are three ex- is present will equal: isting process-level interpretations of exemplar theory – AL- ω COVE (Kruschke, 1992), the EBRW (Nosofsky & Palmeri, Ii|k = exp(−δik/γ), (4) 1997), and the information-accumulation model (Lamberts, 2000). In each of these, the presentation of a new stimulus where δik is the distance in perceptual space between the rep- adds a new node to the network, and this node serves as the resentations of stimuli i and k as defined by Eq. 2. When exemplar representation of that stimulus on all future trials. ω = 2, Eq. 4 describes a Gaussian radial basis function and We propose instead that the presentation of a stimulus either when ω = 1, Eq. 4 describes a Laplacian radial basis func- changes the synaptic strength at an existing cortical-striatal tion. Either way, Eq. 4 is a popular method for modeling synapse or creates a new synapse. As a result, categoriza- the receptive fields of sensory units, both in models of cate- tion training does not recruit any new nodes, rather it alters gorization (e.g., Ashby & Crossley, 2011; Kruschke, 1992) connectivity between MSNs and units in sensory association and in other tasks (e.g., Er, Wu, Lu, & Toh, 2002; Oglesby cortex. & Mason, 1991; Riesenhuber & Poggio, 1999; Rosenblum, In the remainder of this section, we show that under appro- Yacoob, & Davis, 1996). priate conditions, such a model is mathematically equivalent The key insight that links exemplar theory to cortical- to the exemplar model described by Eqs. (1), (2), and (3). striatal synaptic plasticity comes from Ashby and Alfonso- Reese (1995), who showed that exemplar models are equiv- Model Architecture alent to a classifier that estimates each category distribution via a Parzen (1962) kernel density estimator. Parzen ker- The architecture of the proposed model is shown in Fig- nels are known today as radial basis functions, so the key to ure 1. Essentially this just reproduces the well-known di- the Ashby and Alfonso-Reese (1995) result was to recognize rect pathway through the basal ganglia, and is identical to that the similarity function proposed by exemplar theory (i.e., the simplest version of the procedural-learning system of Eq. 3) could be interpreted as a radial basis function. This the COVIS model of category learning (Ashby et al., 1998; reinterpretation of exemplar similarity provides a natural link Ashby & Waldron, 1999; Ashby, Paul, & Maddox, 2011; between exemplar theory and models such as COVIS that Ashby & Crossley, 2011). use radial basis functions to model the response properties of EXEMPLAR THEORY 5 sensory cortical units. n The second key feature of exemplar theory is that simi- AJ(n) = wJk(n)Ik|k larities are summed to determine category membership (i.e., = w (n), (5) see Eq. 1). Radial basis functions are summed in two differ- Jk ent ways in neural models such as COVIS. First, each MSN where w (n) is the strength of the synapse between sensory responds to a weighted sum of the radial basis functions that Jk unit k and MSN J on trial n. The latter equality holds because model activation in sensory cortex1, and second, the rules no matter how distance is defined, the distance from a point that govern synaptic plasticity (i.e., learning) dictate that the to itself must be zero. As a result δ = 0, and so I = 1 current synaptic strength is a weighted sum of previous acti- kk k|k (i.e., for an ω and any γ). vations. This latter sum is especially important when estab- The second firing-rate equation converts the postsynaptic lishing equivalence to exemplar theory because it depends on activation into postsynaptic firing rate. This equation models all previously seen exemplars, whereas the former sum does nonlinearities in the neural response (e.g., response compres- not. So both exemplar theory and COVIS assign a key role to sion). Following the standard approach, we assume that the weighted sums of radial basis functions of all previously seen postsynaptic firing rate in unit J, denoted by R (n) equals exemplars, and we believe it is this common feature of both J theories that cause such similar behavior in the two concep- RJ(n) = F [AJ(n)] , (6) tually different accounts of categorization. The mathemati- cal equivalence established below only requires the latter of where F is a monotonically increasing function known as the these two neural summing operations. Later however, we activation function. Equivalence to exemplar theory requires show that if both neural summing operations are allowed and that F behaves as a natural log. Thus3, if many of the ancillary assumptions required for exact math- ematical equivalence are relaxed, numerical equivalence is RJ(n) = ln [AJ(n)] barely affected. We believe this is because in both theo- = ln [wJk(n)] . (7) ries, behavior depends fundamentally on sums of radial basis functions that are activated by stimulus presentation. Figure 1 shows the major neuroanatomical projections At the outset of a new experimental session, we assume from the MSNs to regions in premotor cortex that control the that the sensory cortical units are fully interconnected with motor response. However, the equivalence result established the two striatal MSNs. Specifically, every unit in sensory here treats these as simple relays that convey the signals gen- cortex synapses with a dedicated spine on each of the two erated within the striatum to motor output units. Therefore, MSNs. Thus, if there are 40,000 units in sensory cortex there there is no need to model activity in each of these regions are 40,000 spines on each MSN. separately. Instead, we simply assume that the MSN firing rates are unaltered as they pass through these relays. How- Activation in each MSN is modeled using a firing-rate ever, mathematical equivalence requires that two important model (Dayan & Abbott, 2001; Ermentrout & Terman, 2010; processes must occur within premotor cortex. First, indepen- H. R. Wilson & Cowan, 1972, 1973). Thus, the two MSNs dent noise is added to each output unit. We assume that the shown in Figure 1 could be viewed as two identical popula- most active output unit controls the response on each trial, tions of MSNs, or the activations produced could be viewed so noise is needed to account for probabilistic responding. as average firing rates over many repetitions of the stimulus Second, a response bias term is added to each MSN output. conditions. Firing-rate models are described by two equa- For equivalence to the exemplar model the additive bias must tions. The first, typically written as a differential equation, describes how presynaptic input generates postsynaptic acti- 1Nonlinearities are introduced downstream of this summing pro- vation. This equation typically describes within-trial tempo- cess. ral dynamics and it integrates all sensory input2, but in the 2This integration leads to the weighted sum of all sensory corti- present application, only a very simple model is needed. In cal radial basis functions activated by stimulus presentation. 3 particular, there is no need to model the temporal dynamics The natural log has two problematic properties that make it an of within-trial activation or the summing of sensory inputs. unusual choice for an activation function. First, firing rates are com- Specifically, it suffices to simply assume that the postsynap- monly restricted to the range [0,1] and of course, the natural log is unbounded. Second, the natural log can be negative, whereas acti- tic activation in each MSN equals the presynaptic activation vations and firing rates are usually restricted to nonnegative values. at the most active spine weighted by the relevant synaptic Mathematical equivalence to the exemplar model holds whether strength. The most active spine will be the one synapsing RJ (n) is positive or negative, but of course the model is more bi- with the cortical unit that responds most strongly to the stim- ologically realistic if RJ (n) ≥ 0. In practice, this can almost always A n J ω ulus. Let J( ) denote the activation in striatal unit (where be arranged by replacing Eq. (4) with Ii|k = θexp(−δik/γ), for some J = A or B) on trial n. Then if stimulus k is presented on trial sufficiently large value of θ (when the stimulus is present). 6 EXEMPLAR THEORY equal B = ln βJ (see Figure 1). So if we denote the activation postsynaptic activation is above the NMDA threshold, and 3) in output unit J on trial n as YJ(n) then the amount that DA is below baseline. Solving Eq. (9) iteratively produces

YJ(n) = RJ(n) + ln βJ + J. (8) wJk(n + 1) = wJk(0) We assume that response A is made on trial n if Y (n) > A Xn Y (n) and that response B is made if the opposite ordering + + B + αw Ik|S i [RJ(i) − θNMDA] [D(i) − Dbase] occurs. i=1 n X + + Learning − βw Ik|S i [RJ(i) − θNMDA] [Dbase − D(i)] . (10) i=1 DA is known to have pronounced effects on cortical- striatal synaptic plasticity (e.g., Centonze, Picconi, Note that this solution includes a sum of radial basis func- Gubellini, Bernardi, & Calabresi, 2001; Shen, Flajolet, tions activated by all previously seen exemplars. Since the Greengard, & Surmeier, 2008). Much evidence suggests exemplar model includes a similar such sum, this is the key that strengthening of cortical-striatal synapses (and long- feature of the neural model that allows it to mimic the exem- term potentiation) requires strong pre- and postsynaptic plar model. activation and DA levels above baseline. More specifically, the postsynaptic activation must be strong enough to activate Assumptions NMDA receptors (a high-threshold glutamate receptor). In Proving that this model is equivalent to the exemplar contrast, cortical-striatal synapses are weakened (and long- model requires the following extra assumptions. term depression occurs) if pre- and postsynaptic activation A1. Error trials do not change synaptic strengths. are strong and DA is below baseline (e.g., Arbuthnott, This assumption implies that βw = 0 in Eq. (9). Although Ingham, & Wickens, 2000; Ronesi & Lovinger, 2005). DA this assumption is biologically implausible (e.g., Calabresi, levels rise above baseline following unexpected rewards Maj, Pisani, Mercuri, & Bernardi, 1992), it is not necessarily and fall below baseline following the failure to receive an imcompatible with exemplar theory. As noted above, exem- expected reward (Hollerman & Schultz, 1998; Mirenowicz plar theory assumes that the memory strength of exemplar i & Schultz, 1994; Schultz, 1998). As a result, a number in category J is strengthened when a categorization response of researchers have proposed that synaptic plasticity at to exemplar i is provided with category J feedback during cortical-striatal synapses follows reinforcement-learning classification training (Nosofsky, 2011). On trials when the rules (Doya, 2000; Houk, Adams, & Barto, 1995). response is correct this is clear. Positive feedback unambigu- Suppose every sensory-cortical neuron has one synapse ously signals that the stimulus belongs to the category associ- onto each striatal MSN. A biologically motivated form of re- ated with the participant’s response. However, on error trials inforcement learning assumes that if stimulus S k is presented this is not so clear. Negative feedback typically only signals on trial n, then the strength of the synapse between striatal that the stimulus does not belong to the responded category. unit J (for J = A or B) and sensory-cortical unit k on trial In the two-category case an inference can be made about the n + 1 equals (e.g., Ashby & Crossley, 2011): correct category membership of the stimulus, but when there are more than two categories, then no such inference is pos- wJk(n + 1) = wJk(n) sible. As a result, exemplar theory also seems to predict no + + + αwIk|S n [RJ(n) − θNMDA] [D(n) − Dbase] learning on error trials – at least in experiments with more + + than two categories. − βwIk|S n [RJ(n) − θNMDA] [Dbase − D(n)] , (9) A2. Only the spines on the striatal unit associated where αw and βw are the constant learning and unlearning with the most active motor unit are eligible for synap- rates, respectively, [ f (n)]+ = f (n) if f (n) > 0 and 0 if tic plasticity. Assumptions A1 and A2 are clearly over- f (n) ≤ 0, θNMDA is a constant specifying the threshold for simplifications. However, note that they tend to offset each NMDA receptor activation, D(n) is the amount of DA re- other – at least to a certain extent. Together they tend to leased on trial n, and Dbase is the baseline DA level. Note that ensure that active synapses on the MSN associated with the this model assumes that the amount of synaptic strengthen- incorrect response (call this the incorrect MSN) never change ing (i.e., the plus term) is proportional to the product of 1) strength. In the absence of these two assumptions, those presynaptic activation, 2) the amount that the postsynaptic synapses would get strengthened on trials when the correct activation is above the NMDA threshold, and 3) the amount MSN controls the response and weakened on trials when the that DA is above baseline. In contrast, the amount of synap- incorrect MSN controls the response. Another interpretation tic weakening (i.e., the minus term) is proportional to the of Assumptions A1 and A2, therefore, is that this strengthen- product of 1) presynaptic activation, 2) the amount that the ing and weakening cancel each other out. EXEMPLAR THEORY 7

A3. All initial synaptic strengths are negligible – just Equivalence to the Exemplar Model large enough to allow postsynaptic activation to the first stimulus presentation, but small enough so that mathe- In this section, we show that the model sketched in Fig- matically we can assume that w (0) = w (0) = 0, for all Ai Bi ure 1 makes identical quantitative predictions as the exem- i. A justification for this assumption is that the presentation plar model described by Eqs. (1), (2), and (3), given the of a novel stimulus that has no previous reward association reinforcement-learning model described earlier and under causes DA release (e.g., Horvitz, Stewart, & Jacobs, 1997; the assumptions outlined in the previous section. To keep the Wickelgren, 1997), and increased DA levels potentiate the notation simple, we will demonstrate the equivalence for a postsynaptic effects of glutamate (Ashby & Casale, 2003). two-category task, but the equivalence holds for any number Thus, available evidence suggests that the first presentation of categories. of a stimulus during an experimental session is likely to cause an uncharacteristically large striatal response. Consider a two-category task in which category A con- tains the MA exemplars CA = {a1, a2,..., aM } and category It is also important to note that even without this assump- A B contains the MB exemplars CB = {b1, b2,..., bMB }. Note tion the contribution of wAi(0) and wBi(0) to wAi(n) and wBi(n) that there are four kinds of trials: correct A response trials, decreases as n increases. In other words, the effects of the ini- correct B response trials, incorrect A response trials, and in- tial weights on the asymptotic performance of the model are correct B response trials. By assumption 1, neither type of negligible. This is important because the exemplar model is incorrect response trial will change any synaptic weights. a model of asymptotic performance – it was never proposed So first consider correct A response trials – that is, trials as a model of initial learning. So although Assumption A3 is when the stimulus belongs to the set CA and motor unit A is necessary for strict mathematical equivalence, at the practical more active than motor unit B. By assumption 2, no synaptic level this assumption is not critical. strengths on unit B will change. So correct A response trials will only cause changes in the strength of synapses on unit A. A4. In whichever unit controls the response, + + Similarly, correct B response trials will only cause changes [RJ (n) − θNMDA] [D(n) − Dbase] = K for all n, where + in the strength of cortical-striatal synapses on unit B. K is a constant. In general, we expect [RJ(n) − θNMDA] to increase with n because the synaptic strength associated By assumptions A1 – A4, the reinforcement-learning with its sensory input should increase as a result of train- model described by Eq. (10) predicts that the strength of the ing. In contrast, virtually all current models predict that synapse between sensory unit k and striatal unit A reduces to + [D(n) − Dbase] will decrease with n because the rewards be- come more predictable as training progresses. The evidence XnA w (n + 1) = α K I , (12) is good that DA neurons respond to the reward prediction Ak A w k|S i error (Schultz, 2002; Schultz, Dayan, & Montague, 1997) – i=1 defined as the value of the obtained reward minus the value of the predicted reward. Thus, as learning progresses, accuracy where i = 1, ..., nA denotes the first nA correct A response tri- rises and so does the ability to predict the feedback valence. als. Suppose that of these nA trials, stimulus ai was presented PMA As a result, the amount by which DA levels rise following nai times (so i=1 nai = nA, where MA is the number of ex- positive feedback should decrease. Thus, an alternative inter- emplars in category A). Note that Ik|S i equals the response of + sensory unit k on a trial when stimulus S is presented. By pretation of assumption A4 is that [RJ(n)−θNMDA] increases i + Eq. (4) this equation becomes with n at the same rate that [D(n) − Dbase] decreases with n.

n A5. The noise terms J in Eq. (8) are independent XA w (n + 1) = α K exp(−δω /γ) random samples from identical double exponential distri- Ak A w kS i butions. A double exponential distribution is required for i=1 equivalence because probabilities associated with the max- XMA − ω imum of double exponentially distributed random variables = αwK na j exp( δk j/γ) j=1 satisfy the relative goodness rule of Eq. (1) (Yellott, 1977). X In particular, suppose  ,  , ...,  are a set of independent ω 1 2 n = αwK na j exp(−δk j/γ). (13) random variables with identical double exponential distribu- j∈CA tions. Then (Yellott, 1977) Of course, by identical reasoning, a similar equation will u hold for the synapses on striatal unit B.  n  e k P uk + k = max {ui + i} = . (11) i=1 Pn Now consider the very next trial after all this training. eui i=1 Suppose some stimulus k is presented. Then by Eq. (7) the 8 EXEMPLAR THEORY

firing rate in striatal unit A will equal profoundly affect the model’s predictions about asymptotic performance. This section tests that prediction. RA = ln [wAk] To examine the importance of assumptions A1, A2, A4,   and A5 we conducted a number of simulations of the neu-  X  = ln α K n exp(−δω /γ) . (14) ral model in the absence of those assumptions. We call  w a j k j  j∈CA the model that satisfies all assumptions and is mathemati- cally equivalent to exemplar theory the exemplar-equivalent Similarly, model, and the more biologically realistic version that satis-   fies all assumptions of the exemplar-equivalent model except  X  R = ln α K n exp(−δω /γ) . (15) assumptions A1, A2, A4, and A5 the biologically-plausible B  w b j k j  j∈CB model. Specifically, the biologically-plausible model was es- sentially identical to the procedural-learning component of By Eq. (8), activation in the output units will equal the COVIS model described by Ashby et al. (2011). It differs from the exemplar-equivalent model in the following ways: YA = RA + ln βA + A (16)   1. The biologically-plausible model changed synaptic  X   ω  strengths on all trials. So this version of the model = ln αwK na exp(−δ /γ) + ln βA + A  j k j  was not constrained by assumption A1. The amount j∈CA   of synaptic weakening on error trials was set equal to  X  = ln α Kβ n exp(−δω /γ) +  , the amount of synaptic strengthening on correct trials  w A a j k j  A j∈CA (i.e., β was set equal to α in Eq. 9). 2. All cortical-striatal spines were eligible for synap- and    X  tic plasticity on every trial. In particular, synaptic  ω  YB = ln αwKβB nb exp(−δ /γ) + B. (17) strengths on both MSNs were modified on every trial.  j k j  j∈CB So this version was not constrained by assumption A2. Finally, by assumption A5 3. No assumptions were made about the relationship be- P − ω tween the striatal and DA responses. So this version αwKβA na j exp( δk j/γ) j∈C was not constrained by assumption A4. In particular, P(A|k) = A P − ω P − ω DA release was set to the same piecewise linear func- αwKβA na j exp( δk j/γ) + αwKβB nb j exp( δk j/γ) j∈CA j∈CB tion of reward prediction error as in Eqs. 11 – 13 of P − ω βA na j exp( δk j/γ) Ashby et al. (2011). j∈C = A , P − ω P − ω 4. In the exemplar-equivalent model, double- βA na j exp( δk j/γ) + βB nb j exp( δk j/γ) j∈CA j∈CB exponentially distributed noise was added to the (18) activations of the premotor units. In the biologicaly- plausible model, no noise was added to the premotor which is clearly equivalent to the the exemplar model de- units and normally distributed noise was added to the scribed by Eqs. (1), (2), and (3), with n = V , n = V , a j jA b j jB MSN activations. and γ = 1/c. Our general goal was to compare asymptotic performance Effects of Relaxing Assumptions of the exemplar-equivalent and biologically-plausible mod- els, and if these differed, to ask whether the altered structure As is generally always the case, proving exact mathemat- of the biologically-plausible model was consistent or incon- ical equivalence requires strong assumptions. In particular, sistent with the predictions of exemplar theory. To begin, assumptions A1, A2, and A4 seem biologically unreason- it was important to set parameter values in the exemplar- able, and assumption A5 seems arbitrary. A natural ques- equivalent model to values that are typical of previous ap- tion to ask therefore is to what extent equivalence depends on plications of exemplar theory. To satisfy this constraint, we these assumptions. As mentioned earlier, we believe that the did a crude search to find parameter values that allowed the most important condition for equivalence is that both models exemplar-equivalent model to reproduce the 16 × 2 confu- assume that categorization responses are largely determined sion matrices predicted by the GCM in the Dimensional, by weighted sums of radial basis functions of all previously Criss-Cross, Interior-Exterior, and Diagonal transfer condi- seen exemplars. According to this view, relaxing the ancil- tions reported by Nosofsky (1986)4. lary assumptions needed for equivalence might affect the ini- tial learning trajectory of the neural model, but is unlikely to 4The model was run through the same sequence of trials as the EXEMPLAR THEORY 9

The critical question now is how much these predictions exemplar memory, but rather create an urge to respond either will change when we switch to the biologically more plau- A or B, and in this sense the neural version makes different sible version of the model. It turns out that the predictions psychological assumptions than classical accounts of exem- change only a small amount. The r2 between the predic- plar theory5. tions of the exemplar-equivalent and biologically-plausible Mathematical equivalence to the (GCM) exemplar model models was .999, .998, .981, and .955, for the Dimensional, requires a number of strong assumptions that in some cases Criss-Cross, Interior-Exterior, and Diagonal conditions re- are biologically implausible. Even so, we showed via simu- spectively. Thus, the only condition where the predictions lation that these assumptions have little effect on the asymp- changed by a non-negligible amount was the Diagonal condi- totic predictions of the model. Thus, the neural model de- tion. There are two possibilities here. Either the biologically- scribed here mimics the numerical predictions of exemplar plausible model fundamentally changed the structure of the theory when the biologically implausible assumptions are re- data in a way this is incompatible with the exemplar model, placed by assumptions that are more biologically realistic. or it changed the predictions in a way that is consistent We believe that the predictions of the neural model are ro- with exemplar theory but best accounted for by some slight bust with respect to violations of these ancillary assumptions change in one or more parameter values. To answer this because in both models the probability that a stimulus is as- question, we fit the GCM to the confusion matrices predicted signed to a particular category increases with the weighted by the biologically-plausible model. If the biologically- sum of radial basis functions activated by all previously seen plausible model changed the structure of the data in a way exemplars from that category. The ancillary assumptions do that is incompatible with exemplar theory, then the GCM not change this fundamental property of the neural model, will fit poorly. In fact, the GCM provided excellent fits in and since this same property is shared by exemplar theory, all four conditions. The r2 between the predictions of the the two psychologically different models make similar (or biologically-plausible model and the GCM was .999, .999, even identical) quantitative predictions. .981, and .990, for the Dimensional, Criss-Cross, Interior- Because the two models make identical or nearly identi- Exterior, and Diagonal conditions, respectively. cal quantitative predictions, any previous evidence in favor In summary, although exact mathematical equivalence re- of the exemplar model that is based on its success in provid- quires all assumptions described above, assumptions A1, A2, ing good quantitative fits to categorization data is also evi- A4, and A5 have little numerical effect on the asymptotic dence in favor of the neural model. But in addition, the neu- predictions of the model. ral model is able to account for many empirical phenomena that are either incompatible with or outside the scope of the Discussion cognitive version of exemplar theory. This includes all the problematic data reviewed earlier in this article. The neural model described in this article makes very First, because the neural model assumes that the critical different psychological assumptions than are usually associ- site of category learning is at cortical-striatal synapses, it eas- ated with exemplar theory, yet it is mathematically equiv- ily accounts for the many reports that patients with striatal alent to the exemplar model. The cognitive version of ex- dysfunction are impaired in category learning. Second, for emplar theory assumes people retrieve memory representa- the same reason, it accounts for the common fMRI finding tions of previously seen exemplars and that they compare of categorization-related activation in the striatum (Lopez- the presented stimulus to these stored memories. Every new Paniagua & Seger, 2011; Nomura et al., 2007; Poldrack et stimulus creates a new memory representation. In contrast, al., 2001; Seger & Cincotta, 2002, 2005; Seger et al., 2010; the neural version assumes that no memory representations are retrieved. For example, the neural model assumes that Nosofsky (1986) participants – that is, 1200 training and 3500 trans- stimulus presentation always activates two striatal MSNs (in fer trials for each category structure. We fit the model to the average two-category tasks), regardless of the number of exemplars of the Subject 1 and Subject 2 GCM predicted confusion matrices, in each category and regardless of the number of training tri- and we used the coordinates of each stimulus in physical, rather than als. Instead of adding a new exemplar memory represen- perceptual space. This is because our goal was not to reproduce ex- tation, the neural version of exemplar theory assumes that actly what Nosofsky (1986) did, but rather to produce predictions presentation of a new stimulus either modifies the strength of of the exemplar-equivalent model that were representative of a typ- ical exemplar model application. In this sense we were successful an existing cortical-striatal synapse or creates a new synapse because when we fit the exemplar model (i.e., the GCM) to the con- (e.g., by adding a new spine to an existing MSN). Thus, the fusion matrices that resulted from this process, r2 was .996, .996, exemplar representation in the neural version of exemplar .994, and .993 for the Dimensional, Criss-Cross, Interior-Exterior, theory is either the strength of a cortical-striatal synapse or and Diagonal conditions, respectively. perhaps it could be interpreted as a dedicated spine on an 5Note however, that many parameters have the same meaning MSN. The more important point however, is that stimulation in both accounts, including the response biases βJ , the attention of that MSN (or that dendritic spine) would not retrieve an weights w j, and overall discriminability c. 10 EXEMPLAR THEORY

Waldschmidt & Ashby, 2011). Third, it accounts for the trial is repeated an infinite number of times. many empirical dissociations that have been reported be- As one moves down Marr’s hierarchy, it is generally true tween RB and II categorization tasks. For example, it ac- that more than one interpretation is possible. For example, counts for the sensitivity of II category learning to feed- we know of three different algorithmic-level versions of ex- back delays because of the known sensitivity of cortical- emplar theory (Kruschke, 1992; Lamberts, 2000; Nosofsky striatal plasticity to delays between DA release and activ- & Palmeri, 1997), which all make slightly different assump- ity at cortical-striatal synapses (Valentin, Maddox, & Ashby, tions. The neural model proposed here is the first known 2014; Yagishita et al., 2014). It also accounts for the interfer- implementational-level version, and it shows that some qual- ence that occurs in II and unstructured categorization perfor- itatively different psychological assumptions are compatible mance when the location of the response buttons is switched with the quantitative predictions of computational-level ver- – because the terminal nodes in the Figure 1 model are in sions of exemplar theory. One of the main reasons for de- premotor cortex, rather than prefrontal cortex. Fourth, it ac- veloping lower-level versions of any theory is to account for counts for deficits by patients with Parkinson’s disease in II more data. The algorithmic-level versions of exemplar the- and unstructured category learning (Hélie, Paul, & Ashby, ory can account for response time and learning data that are 2012). outside the scope of the computational-level version. And If more anatomical details are added into the model of the the neural version proposed here can account for many phe- striatum (as in Ashby & Crossley, 2011; Cantwell, Crossley, nomena that are outside the scope of the algorithmic-level & Ashby, 2015; Crossley, Ashby, & Maddox, 2013), then the versions. Although many other neural versions may be theo- neural version of exemplar theory can also account for a wide retically possible, the many extra constraints provided by all variety of other phenomena including i) single-unit record- these neuroscience-related phenomena would seem to nar- ings from MSNs and TANs during instrumental conditioning row the set of candidate neural interpretations to a model that (Ashby & Crossley, 2011); ii) many behavioral phenomena assigns a major role to the striatum. If so, then any alternative from instrumental conditioning experiments, including fast neural interpretation that is also consistent with the available reacquisition after extinction, the partial reinforcement ex- neuroscience-related data should look qualitatively similar to tinction effect, spontaneous recovery, and renewal (Crossley, the version proposed here (e.g., since both models would be Horvitz, Balsam, & Ashby, 2016); iii) the result that recov- constrained by the same basal ganglia neuroanatomy). ery from a full reversal is quicker than learning new cate- The neural version of exemplar theory is essentially a simplified version of the procedural-learning component of gories constructed from the same stimuli (Cantwell et al., 6 2015); and iv) unlearning and failures of unlearning (e.g., re- the COVIS model of category learning (Ashby & Crossley, newal) during II categorization (Crossley et al., 2013; Cross- 2011; Ashby et al., 2007; Cantwell et al., 2015). The CO- ley, Ashby, & Maddox, 2014). VIS model includes much more biological detail (e.g., spik- ing MSNs, striatal cholinergic interneurons), other brain ar- David Marr (1982) famously described three levels of eas (e.g., interlaminar thalamic nuclei), a more sophisticated mathematical modeling, which he referred to as computa- reinforcement-learning model (e.g., with rate-limiting terms tional, algorithmic, and implementational. Computational that constrain synaptic strengths to a fixed interval [0, wmax]), models (often called descriptive models in ) make and it does not make any of assumptions A1 – A5. Even so, quantitative predictions, but do not describe the algorithms it is important to note that the equivalence established here that produce those predictions. Algorithmic models (of- greatly benefits both theories. Of course it provides exem- ten called process models in psychology) describe the al- plar theory access to a huge range of neuroscience-related gorithms, but not the architecture that implements those al- phenomena that generally are outside the scope of any purely gorithms. At the lowest level, implementational models de- cognitive theory. But it also greatly benefits COVIS. First, scribe the architecture that implements the algorithms that it means that successes of the exemplar model in provid- produce the quantitative predictions. The original versions of ing good quantitative fits to categorization data imply that exemplar theory – for example, the version described by Eqs. (1), (2), and (3) – were computational-level descriptions of 6COVIS assumes this procedural-learning component domi- asymptotic categorization behavior (i.e., the theory made no nates in unstructured and II category-learning tasks, but that an attempt to account for learning). Cognitive process was used explicit-learning system controls behavior in RB tasks. Exemplar to motivate the quantitative predictions, but no attempt was theory was proposed before there was any evidence or inclination that humans might have multiple category-learning systems. In made to model those processes. To see this, note for exam- the interim, some exemplar theorists have proposed a second rule- ple, that Eq. (1) makes no predictions about what response learning system (e.g., Erickson & Kruschke, 1998; Nosofsky et a participant will make on any single trial of a categorization al., 1994). Of these, perhaps most similar to COVIS is ATRIUM experiment. Rather it simply describes the relative propor- (Erickson & Kruschke, 1998), which includes two sub-models – an tion of times the participant will give every possible response explicit rule-learning model that would dominate in RB tasks and an under the ideal conditions in which the same categorization exemplar model that would dominate in unstructured and II tasks. EXEMPLAR THEORY 11

COVIS should be equally successful at accounting for those Ashby, F. G., & Valentin, V. V. (2016). Multiple systems of percep- data. Second, it provides a method to quickly fit COVIS to tual category learning: Theory and cognitive tests. In H. Cohen asymptotic response proportions collected in the course of & C. Lefebvre (Eds.), Handbook of categorization in cognitive a categorization experiment. Fitting the COVIS model cur- science, second edition (p. in press). New York: Elsevier. rently requires time-consuming Monte Carlo simulations and Ashby, F. G., & Waldron, E. M. (1999). On the nature of implicit therefore, exploring its predictions is a challenging compu- categorization. Psychonomic Bulletin & Review, 6(3), 363-378. tational process. In contrast, the exemplar model is simple Bayley, P. J., Frascino, J. C., & Squire, L. R. (2005). Robust habit learning in the absence of awareness and independent of the me- enough that it can quickly be fit to data in a straightforward dial temporal lobe. Nature, 436(7050), 550–553. manner using standard optimization algorithms. Therefore, Bolam, J., Bergman, H., Graybiel, A., Kimura, M., Plenz, D., Se- because of the equivalence established here, asymptotic be- ung, H., . . . Wickens, J. (2006). Microcircuits, molecules and havioral predictions of COVIS can be quickly tested by fit- motivated behaviour: Microcircuits in the striatum. In S. Grill- ting the exemplar model to the empirical confusion matrices. ner & A. M. Graybiel (Eds.), Microcircuits: The interface be- Finally, this article makes one more equally important tween neurons and global brain function (p. 165-190). Cam- contribution. The equivalence of exemplar theory and the bridge, MA: MIT Press. procedural-learning component of COVIS means that it is Brooks, L. R. (1978). Nonanalytic concept formation and memory probably fruitless to attempt to test between these two mod- for instances. In E. Rosch & B. B. Lloyd (Eds.), and els by comparing goodness-of-fits in any categorization ex- categorization (pp. 169–211). Hillsdale, NJ: Erlbaum. periment. Instead, the two historically disparate approaches Calabresi, P., Maj, R., Pisani, A., Mercuri, N. B., & Bernardi, G. to categorization modeling should be used together to create (1992). Long-term synaptic depression in the striatum: physio- logical and pharmacological characterization. Journal of Neuro- a more powerful armamentarium that can be used to improve science, 12(11), 4224–4233. our overall understanding of categorization behavior. Cantwell, G., Crossley, M. J., & Ashby, F. G. (2015). Multiple stages of learning in perceptual categorization: Evidence and References neurocomputational theory. Psychonomic Bulletin & Review, 22, Arbuthnott, G., Ingham, C., & Wickens, J. (2000). Dopamine 1598–1613. and synaptic plasticity in the neostriatum. Journal of Anatomy, Centonze, D., Picconi, B., Gubellini, P., Bernardi, G., & Calabresi, 196(4), 587–596. P. (2001). Dopaminergic control of synaptic plasticity in the dor- Ashby, F. G., & Alfonso-Reese, L. A. (1995). Categorization as sal striatum. European Journal of Neuroscience, 13(6), 1071– probability density estimation. Journal of Mathematical Psy- 1077. chology, 39(2), 216–233. Crossley, M. J., Ashby, F. G., & Maddox, W. T. (2013). Eras- Ashby, F. G., Alfonso-Reese, L. A., Turken, A. U., & Waldron, ing the engram: The unlearning of procedural skills. Journal of E. M. (1998). A neuropsychological theory of multiple systems Experimental Psychology: General, 142(3), 710-741. in category learning. Psychological Review, 105(3), 442-481. Crossley, M. J., Ashby, F. G., & Maddox, W. T. (2014). Context- Ashby, F. G., & Casale, M. B. (2003). A model of dopamine mod- dependent savings in procedural category learning. Brain and ulated cortical activation. Neural Networks, 16(7), 973–984. Cognition, 92, 1–10. Ashby, F. G., & Crossley, M. J. (2011). A computational model Crossley, M. J., Horvitz, J. C., Balsam, P. D., & Ashby, F. G. of how cholinergic interneurons protect striatal-dependent learn- (2016). Expanding the role of striatal cholinergic interneurons ing. Journal of Cognitive Neuroscience, 23(6), 1549-1566. and the midbrain dopamine system in appetitive instrumental Ashby, F. G., Ell, S. W., & Waldron, E. M. (2003). Procedural learn- conditioning. Journal of Neurophysiology, 115(1), 240–254. ing in perceptual categorization. Memory & Cognition, 31(7), Crossley, M. J., Madsen, N. R., & Ashby, F. G. (2012). Procedu- 1114-1125. ral learning of unstructured categories. Psychonomic Bulletin & Ashby, F. G., & Ennis, J. M. (2006). The role of the basal ganglia Review, 19(6), 1202-1209. in category learning. Psychology of Learning and Motivation, Dayan, P., & Abbott, L. F. (2001). Theoretical neuroscience: Com- 46, 1-36. putational and mathematical modeling of neural systems. Cam- Ashby, F. G., Ennis, J. M., & Spiering, B. J. (2007). A neuro- bridge, MA: MIT Press. biological theory of automaticity in perceptual categorization. Doya, K. (2000). Complementary roles of basal ganglia and cere- Psychological Review, 114(3), 632-656. bellum in learning and motor control. Current Opinion in Neu- Ashby, F. G., & Gott, R. E. (1988). Decision rules in the perception robiology, 10(6), 732–739. and categorization of multidimensional stimuli. Journal of Ex- Er, M. J., Wu, S., Lu, J., & Toh, H. L. (2002). Face recognition with perimental Psychology: Learning, Memory, and Cognition, 14, radial basis function (rbf) neural networks. IEEE Transactions 33-53. on Neural Networks, 13(3), 697–710. Ashby, F. G., Noble, S., Filoteo, J. V., Waldron, E. M., & Ell, S. W. Erickson, M. A., & Kruschke, J. K. (1998). Rules and exemplars in (2003). Category learning deficits in parkinson’s disease. Neu- category learning. Journal of Experimental Psychology: Gen- ropsychology, 17(1), 115–124. eral, 127(2), 107-140. Ashby, F. G., Paul, E. J., & Maddox, W. T. (2011). COVIS. In Ermentrout, G. B., & Terman, D. H. (2010). Mathematical founda- E. M. Pothos & A. Wills (Eds.), Formal approaches in catego- tions of neuroscience. New York: Springer Science & Business rization (pp. 65–87). New York: Cambridge University Press. Media. 12 EXEMPLAR THEORY

Estes, W. K. (1986). Array models for category learning. Cognitive 44. Psychology, 18(4), 500-549. Lamberts, K. (2000). Information-accumulation theory of speeded Estes, W. K. (1994). Classification and cognition. New York: categorization. Psychological Review, 107(2), 227–260. Oxford University Press. Leng, N. R., & Parkin, A. J. (1988). Double dissociation of frontal Filoteo, J. V., Maddox, W. T., & Davis, J. D. (2001a). A possible dysfunction in organic amnesia. British Journal of Clinical Psy- role of the striatum in linear and nonlinear category learning: chology, 27, 359 – 362. Evidence from patients with hungtington’s disease. Behavioral Lopez-Paniagua, D., & Seger, C. A. (2011). Interactions within Neuroscience, 115(4), 786–798. and between corticostriatal loops during component processes Filoteo, J. V., Maddox, W. T., & Davis, J. D. (2001b). Quantitative of category learning. Journal of Cognitive Neuroscience, 23(10), modeling of category learning in amnesic patients. Journal of 3068–3083. the International Neuropsychological Society, 7, 1–19. Maddox, W. T., Ashby, F. G., & Bohil, C. J. (2003). Delayed Filoteo, J. V., Maddox, W. T., Salmon, D. P., & Song, D. D. (2005). feedback effects on rule-based and information-integration cat- Information-integration category learning in patients with stri- egory learning. Journal of Experimental Psychology: Learning, atal dysfunction. Neuropsychology, 19(2), 212-222. Memory, and Cognition, 29, 650-662. Graham, K. S., Scahill, V. L., Hornberger, M., Barense, M. D., Lee, Maddox, W. T., Ashby, F. G., Ing, A. D., & Pickering, A. D. (2004). A. C., Bussey, T. J., & Saksida, L. M. (2006). Abnormal cate- Disrupting feedback processing interferes with rule-based but gorization and perceptual learning in patients with hippocampal not information-integration category learning. Memory & Cog- damage. The Journal of Neuroscience, 26(29), 7547–7554. nition, 32(4), 582-591. Hélie, S., Paul, E. J., & Ashby, F. G. (2012). A neurocomputational Maddox, W. T., Bohil, C. J., & Ing, A. D. (2004). Evidence for a account of cognitive deficits in parkinson’s disease. Neuropsy- procedural-learning-based system in perceptual category learn- chologia, 50(9), 2290–2302. ing. Psychonomic Bulletin & Review, 11(5), 945-952. Hollerman, J. R., & Schultz, W. (1998). Dopamine neurons report Maddox, W. T., Glass, B. D., O’Brien, J. B., Filoteo, J. V., & Ashby, an error in the temporal prediction of reward during learning. F. G. (2010). Category label and response location shifts in Nature Neuroscience, 1(4), 304–309. category learning. Psychological Research, 74(2), 219-236. Hopkins, R. O., Myers, E. C., Shohamy, D., Grossman, S., & Gluck, Maddox, W. T., & Ing, A. D. (2005). Delayed feedback disrupts the M. (2004). Impaired probabilistic category learning in hypoxic procedural-learning system but not the hypothesis testing system subjects with hippocampal damage. Neuropsychologia, 42, 524 in perceptual category learning. Journal of Experimental Psy- – 535. chology: Learning, Memory, and Cognition, 31(1), 100-107. Horvitz, J. C., Stewart, T., & Jacobs, B. L. (1997). Burst activ- Marr, D. (1982). Vision: A computational investigation into the hu- ity of ventral tegmental dopamine neurons is elicited by sensory man representation and processing of visual information. New stimuli in the awake cat. Brain Research, 759(2), 251–258. York: Freeman. Houk, J., Adams, J., & Barto, A. (1995). A model of how the McDaniel, M. A., Cahill, M. J., Robbins, M., & Wiener, C. (2014). basal ganglia generate and use neural signalsthat predict rein- Individual differences in learning and transfer: Stable tenden- forcement. In J. C. Houk, J. L. davis, & D. G. beiser (Eds.), cies for learning exemplars versus abstracting rules. Journal of Models of information processing in the basal ganglia (pp. 249– Experimental Psychology: General, 143(2), 668–693. 270). Cambridge, MA: MIT Press. Janowsky, J. S., Shimamura, A. P., Kritchevsky, M., & Squire, L. R. Medin, D. L., & Schaffer, M. M. (1978). Context theory of classi- (1989). Cognitive impairment following frontal lobe damage fication learning. Psychological Review, 85(3), 207-238. and its relevance to human amnesia. Behavioral Neuroscience, Mirenowicz, J., & Schultz, W. (1994). Importance of unpredictabil- 103, 548 – 560. ity for reward responses in primate dopamine neurons. Journal Kincaid, A. E., Zheng, T., & Wilson, C. J. (1998). Connectivity of Neurophysiology, 72(2), 1024–1027. and convergence of single corticostriatal axons. The Journal of Nomura, E., Maddox, W., Filoteo, J., Ing, A., Gitelman, D., Par- Neuroscience, 18(12), 4722–4731. rish, T., . . . Reber, P. (2007). Neural correlates of rule-based Knowlton, B. J., Mangels, J. A., & Squire, L. R. (1996). A neostri- and information-integration visual category learning. Cerebral atal habit learning system in humans. Science, 273(5280), 1399- Cortex, 17(1), 37-43. 1402. Nosofsky, R. M. (1986). Attention, similarity, and the Knowlton, B. J., & Squire, L. R. (1993). The learning of natu- identification-categorization relationship. Journal of Experi- ral categories: Parallel memory systems for item memory and mental Psychology: General, 115, 39–57. category-level knowledge. Science, 262, 1747 –1749. Nosofsky, R. M. (2011). The generalized context model: An exem- Knowlton, B. J., Squire, L. R., & Gluck, M. A. (1994). Probabilis- plar model of classification. In E. M. Pothos & A. Wills (Eds.), tic classification learning in amnesia. Learning & Memory, 1(2), Formal approaches in categorization (pp. 18–39). New York: 106-120. Cambridge University Press. Kolodny, J. A. (1994). Memory processes in classification learn- Nosofsky, R. M., & Palmeri, T. J. (1997). An exemplar-based ran- ing: An investigation of amnesic performance in categorization dom walk model of speeded classification. Psychological Re- of dot patterns and artistic styles. Psychological Science, 5, 164– view, 104(2), 266-300. 169. Nosofsky, R. M., Palmeri, T. J., & McKinley, S. C. (1994). Rule- Kruschke, J. K. (1992). Alcove: An exemplar-based connectionist plus-exception model of classification learning. Psychological model of category learning. Psychological Review, 99(1), 22– Review, 101(1), 53-79. EXEMPLAR THEORY 13

Oglesby, J., & Mason, J. (1991). Radial basis function networks for Seger, C. A., Peterson, E. J., Cincotta, C. M., Lopez-Paniagua, speaker recognition. In 1991 international conference on acous- D., & Anderson, C. W. (2010). Dissociating the contributions tics, speech, and signal processing, icassp-91 (pp. 393–396). of independent corticostriatal systems to visual categorization Parzen, E. (1962). On estimation of a probability density function learning through the use of reinforcement learning modeling and and mode. The Annals of Mathematical Statistics, 33(3), 1065– granger causality modeling. NeuroImage, 50(2), 644–656. 1076. Shen, W., Flajolet, M., Greengard, P., & Surmeier, D. J. (2008). Di- Pickering, A. D. (1997). New approaches to the study of amnesic chotomous dopaminergic control of striatal synaptic plasticity. patients: What can a neurofunctional philosophy and neural net- Science, 321(5890), 848–851. work methods offer? Memory, 5(1-2), 255–300. Shepard, R. N. (1987). Toward a universal law of generalization for Poldrack, R. A., Clark, J., Pare-Blagoev, E., Shohamy, D., Moy- psychological science. Science, 237(4820), 1317–1323. ano, J. C., Myers, C., & Gluck, M. (2001). Interactive memory Shohamy, D., Myers, C., Onlaor, S., & Gluck, M. (2004). Role systems in the human brain. Nature, 414(6863), 546–550. of the basal ganglia in category learning: How do patients with Reiner, A. (2010). Organization of corticostriatal projection neuron parkinson’s disease learn? Behavioral Neuroscience, 118(4), types. In H. Steiner & K. Y. K. Y. Tseng (Eds.), Handbook of 676–686. basal ganglia structure and function (pp. 323–339). New York: Spiering, B. J., & Ashby, F. G. (2008). Response processes Elsevier. in information–integration category learning. Neurobiology of Riesenhuber, M., & Poggio, T. (1999). Hierarchical models of Learning and Memory, 90(2), 330-338. object recognition in cortex. Nature Neuroscience, 2(11), 1019– Squire, L. R., & Knowlton, B. J. (1995). Learning about categories 1025. in the absence of memory. Proceedings of the National Academy Ronesi, J., & Lovinger, D. M. (2005). Induction of striatal long- of Science USA, 92, 12470 – 12474. term synaptic depression by moderate frequency activation of Valentin, V. V., Maddox, W. T., & Ashby, F. G. (2014). A com- cortical afferents in rat. The Journal of Physiology, 562(1), 245– putational model of the temporal dynamics of plasticity in pro- 256. cedural learning: Sensitivity to feedback timing. Frontiers in Rosenblum, M., Yacoob, Y., & Davis, L. S. (1996). Human ex- Psychology, 5(643). doi: 10.3389/fpsyg.2014.00643 pression recognition from motion using a radial basis function Waldron, E. M., & Ashby, F. G. (2001). The effects of concur- network architecture. IEEE Transactions on Neural Networks, rent task interference on category learning: Evidence for multi- 7(5), 1121–1138. ple category learning systems. Psychonomic Bulletin & Review, Rouder, J. N., & Ratcliff, R. (2006). Comparing exemplar-and rule- 8(1), 168-176. based theories of categorization. Current Directions in Psycho- Waldschmidt, J. G., & Ashby, F. G. (2011). Cortical and striatal logical Science, 15(1), 9–13. contributions to automaticity in information-integration catego- Sage, J. R., Anagnostaras, S. G., Mitchell, S., Bronstein, J. M., rization. Neuroimage, 56(3), 1791-1802. De Salles, A., Masterman, D., & Knowlton, B. J. (2003). Analy- Wickelgren, I. (1997). Getting the brain’s attention. Science, sis of probabilistic classification learning in patients with parkin- 278(5335), 35–37. son’s disease before and after pallidotomy surgery. Learning & Willingham, D. B., Wells, L. A., Farrell, J. M., & Stemwedel, M. E. Memory, 10(3), 226–236. (2000). Implicit motor sequence learning is represented in re- Sakamoto, Y., & Love, B. C. (2004). Schematic influences on cate- sponse locations. Memory & Cognition, 28(3), 366-375. gory learning and recognition memory. Journal of Experimental Wilson, C. J. (1995). The contribution of cortical neurons to the fir- Psychology: General, 133(4), 534–553. ing pattern of striatal spiny neurons. In J. C. Houk, J. L. Davis, Schultz, W. (1998). Predictive reward signal of dopamine neurons. & D. G. Beiser (Eds.), Models of information processing in the Journal of Neurophysiology, 80(1), 1–27. basal ganglia (pp. 29–50). Cambridge, MA: MIT Press. Schultz, W. (2002). Getting formal with dopamine and reward. Wilson, H. R., & Cowan, J. D. (1972). Excitatory and inhibitory Neuron, 26, 241–263. interactions in localized populations of model neurons. Biophys- Schultz, W., Dayan, P., & Montague, P. R. (1997). A neural ical Journal, 12, 1–24. substrate of prediction and reward. Science, 275(5306), 1593– Wilson, H. R., & Cowan, J. D. (1973). A mathematical theory of the 1599. functional dynamics of nervous tissue. Kybernetik, 13, 55–80. Seger, C. A. (2008). How do the basal ganglia contribute to catego- Witt, K., Nuhsman, A., & Deuschl, G. (2002). Dissociation of rization? Their roles in generalization, response selection, and habit-learning in parkinson’s and cerebellar disease. Journal of learning via feedback. Neuroscience & Biobehavioral Reviews, Cognitive Neuroscience, 14(3), 493–499. 32(2), 265–278. Worthy, D. A., Markman, A. B., & Maddox, W. T. (2013). Feed- Seger, C. A., & Cincotta, C. M. (2002). Striatal activity in concept back and stimulus-offset timing effects in perceptual category learning. Cognitive, Affective, & Behavioral Neuroscience, 2(2), learning. Brain and Cognition, 81(2), 283-293. 149–161. Yagishita, S., Hayashi-Takagi, A., Ellis-Davies, G. C., Urakubo, Seger, C. A., & Cincotta, C. M. (2005). The roles of the caudate H., Ishii, S., & Kasai, H. (2014). A critical time window for nucleus in human classification learning. The Journal of Neuro- dopamine actions on the structural plasticity of dendritic spines. science, 25(11), 2941–2951. Science, 345(6204), 1616–1620. Seger, C. A., & Miller, E. K. (2010). Category learning in the brain. Yellott, J. I. (1977). The relationship between Luce’s choice ax- Annual Review of Neuroscience, 33, 203-219. iom, Thurstone’s theory of comparative judgment, and the dou- 14 EXEMPLAR THEORY

ble exponential distribution. Journal of Mathematical Psychol- els. Journal of the International Neuropsychological Society, ogy, 15(2), 109–144. 34, 394 – 406. Zaki, S. R., Nosofsky, R. M., Jessup, N. M., & Unversagt, Zeithamova, D., & Maddox, W. T. (2006). Dual-task interference F. W. (2003). Categorization and recognition performance of in perceptual category learning. Memory & Cognition, 34(2), a memory-impaired group: Evidence for single-system mod- 387-398.