
Biol. Cybern. 87, 459–470 (2002) DOI 10.1007/s00422-002-0356-8 © Springer-Verlag 2002

Hebbian spike-driven synaptic plasticity for patterns of mean firing rates

Stefano Fusi, Institute of Physiology, University of Bern, Bühlplatz 5, 3012 Bern, Switzerland

Received: 18 May 2002 / Accepted: 15 July 2002

Abstract. Synaptic plasticity is believed to underlie the formation of appropriate patterns of connectivity that stabilize stimulus-selective reverberations in the cortex. Here we present a general quantitative framework for studying the process of learning and memorizing patterns of mean spike rates. General considerations based on the limitations of material (biological or electronic) synaptic devices show that most learning networks share the palimpsest property: old stimuli are forgotten to make room for the new ones. In order to prevent too-fast forgetting, one can introduce a stochastic mechanism for selecting only a small fraction of synapses to be changed upon the presentation of a stimulus. Such a mechanism can be easily implemented by exploiting the noisy fluctuations in the pre- and postsynaptic activities to be encoded. The spike-driven synaptic dynamics described here can implement such a selection mechanism to achieve slow learning, which is shown to maximize the performance of the network as an associative memory.

1 Introduction: cortical reverberations

In a growing number of neurophysiological experiments in which a primate performs a memory task, neurons are observed to have elevated spike rates in the delay periods between successive visual stimuli (for recent reviews see, e.g., Miyashita and Toshirio 2000; Wang 2001). The delay-activity distribution across the recorded cells is automatically triggered by specific visual stimuli, and it lasts for long periods after the removal of the sensory stimulus (up to 30 s; Fuster 1995). In the inferotemporal cortex the sustained activity is stimulus specific, that is, each visual stimulus evokes a characteristic pattern of delay activity. When unfamiliar, novel stimuli are presented, the recorded neurons may respond with elevated selective rates to the stimulus, but they do not show any sustained delay activity (Miyashita 1993). This is an indication that the delay-activity phenomenon is established only after many presentations of the same visual stimuli. A typical example of recorded delay activity is shown in Fig. 1. In the absence of any experimental evidence to the contrary, we assume that the spike activities of different neurons are asynchronous, and that all the information about the identity of the stimulus is encoded in the mean spike frequency of every cell.

1.1 The attractor picture

The experimental findings described above have been interpreted as an expression of the cortical interactions between large numbers of neurons, and a comprehensive picture has been suggested in the framework of attractor neural networks (Amit 1995; Amit and Brunel 1997a,b). In this framework the selective sustained activity is not a single-cell property, but rather the result of a feedback mechanism that maintains a reverberating activity in the absence of the sensory stimulus. During the delay period, those neurons that have been driven to high spike rates by the visual stimulus and that are coupled by strong-enough synaptic connections excite one another in such a way that the enhanced activity is stably self-sustained until the arrival of the next visual stimulus. The dynamics of this population of neurons, and in particular of the feedback mechanism, is governed by the set of all the synaptic connections and efficacies. This set stores passively all the possible delay-activity distributions in response to different stimulations: the visual stimulus selects one of the potential responses by determining the initial state of activation of the population. Following the removal of the stimulus, the network dynamics is attracted towards one of the stable patterns of activity (attractors), which represents the response of the network. Since these responses are expressed in terms of spike-rate variations, they can be communicated actively to other areas for further processing.

Correspondence to: (e-mail: [email protected], Tel.: +41-31-6318778, Fax: +41-31-6314611)
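The attractor picture can be illustrated with a toy simulation. The sketch below is not the network model analyzed in this paper: it is a minimal Hopfield-style network (binary ±1 units, covariance-type couplings, with sizes and the corruption level chosen purely for illustration) in which a degraded cue relaxes back onto the stored pattern, i.e., the stimulus sets the initial state and the collective dynamics is attracted to one of the patterns embedded in the synaptic matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
N, P = 200, 5                      # neurons, stored patterns

# Random binary (+1/-1) patterns play the role of the
# stimulus-selective activity distributions.
patterns = rng.choice([-1, 1], size=(P, N))

# Covariance-type Hebbian couplings: simultaneous activity strengthens
# the connection (sum of outer products), no self-coupling.
W = (patterns.T @ patterns) / N
np.fill_diagonal(W, 0.0)

def relax(state, steps=10):
    """Iterate the feedback dynamics until the network settles."""
    for _ in range(steps):
        state = np.sign(W @ state)
        state[state == 0] = 1
    return state

# A degraded version of pattern 0 (15% of the neurons flipped) plays
# the role of the initial state imposed by a brief sensory stimulus.
cue = patterns[0].copy()
flip = rng.choice(N, size=30, replace=False)
cue[flip] *= -1

recalled = relax(cue)
overlap = (recalled @ patterns[0]) / N   # 1.0 means perfect retrieval
print(overlap)
```

With few stored patterns relative to N, the corrupted cue falls inside the basin of attraction and the overlap with the stored pattern returns close to 1.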

The formation of the suitable synaptic structure for stabilizing the delay-activity distributions discussed in Amit (1995) is probably one of the best exemplifications of the Hebbian paradigm (Hebb 1949). The repeated imposition of a pattern of reverberating activity on an assembly of neurons eventually leads to a synaptic structure that makes the reverberating activity stable, even in the absence of the sensory stimulus that triggered it. In other words, when a cell takes part in the collective dynamics that activates another cell (during the imposition of the visual stimulus), its efficacy is increased, since this change goes in the direction of further stabilizing the imposed reverberating activity. This is equivalent to a covariance-based updating rule for the synaptic efficacy: when the two neurons are simultaneously active the synapse should be potentiated (Sejnowski 1977; Hopfield 1982).

2 An analytic, quantitative framework for realistic learning

Attractor formation was our starting point for developing a general framework that would provide a simple and general description of a wide class of biologically plausible prescriptions for updating the synaptic connections. This framework has been introduced in Amit and Fusi (1992, 1994), and relies on basic constraints that are likely to pose limitations on any type of material (biological or artificial) synaptic device. The guiding principle that dictated the assumptions proved to be surprisingly powerful for drawing general conclusions about synaptic dynamics. The assumptions are:

1. Locality in time (online learning) and in space: the only information available to the synapse is the current activity of the two neurons it connects and the previous synaptic state. The synapse is supposed to acquire information from every single presentation, and no temporary storage is available for a later update in which the information about more than one stimulus is available at the same time.
2. All internal variables describing the synaptic state are bounded.
3. Long-term modifications of the synaptic internal variables cannot be arbitrarily small.
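A minimal sketch of a synapse obeying these three constraints (the class name and the specific LTP/LTD rule are illustrative choices, not the model of Sect. 3): the update uses only the current pre- and postsynaptic activities and the current synaptic state, the internal variable lives on a bounded discrete ladder, and no modification is smaller than one full step.

```python
class BoundedSynapse:
    """Toy synapse obeying the three constraints: locality (the update
    sees only the two activities and its own state), boundedness (a
    finite discrete ladder), and no arbitrarily small modifications
    (every change is at least one whole step)."""

    def __init__(self, n_states=4, state=0):
        self.n_states = n_states          # bounded internal variable
        self.state = state

    def update(self, pre_active, post_active):
        # Online learning: one stimulus at a time, no buffering.
        if pre_active and post_active:            # Hebbian LTP
            self.state = min(self.state + 1, self.n_states - 1)
        elif pre_active != post_active:           # LTD (illustrative rule)
            self.state = max(self.state - 1, 0)
        # pre and post both inactive: no change

syn = BoundedSynapse()
for pre, post in [(1, 1), (1, 1), (1, 0), (1, 1)]:
    syn.update(pre, post)
print(syn.state)   # -> 2; the state never leaves {0, ..., 3}
```

Saturation at the bounds is exactly what makes such a device a palimpsest: once the ladder is full, further potentiations must overwrite older traces.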

Fig. 1. Stimulus-selective delay activity in the cortex. The monkey performs a delayed-match-to-sample task in which it has to compare a visual sample stimulus (S) to a visual test stimulus (T) and respond differently depending on whether the two stimuli are the same or different. The plot shows the response of a cell to two different stimuli (STIM 24 and STIM 14). The rasters show the spikes emitted by the cell in different trials, and the peristimulus time histogram shows the mean spike rate across all the repetitions of the same stimulus as a function of time. Note that the trial intervals are sorted and reorganized according to their corresponding stimulus identity. Consequently, the number and order of the rasters for the sample and test stimuli are not aligned: the data shown during and following the test stimulus are combined across match and nonmatch conditions. The visual stimulus S triggers a sustained delay activity in response to stimulus 14, but not to stimulus 24. The information about the last stimulus seen is propagated up to the presentation of the next visual stimulus. The mechanism underlying the selective delay activity seems to be automatic, i.e., not affected by the task. Indeed the sustained activity is triggered also in the intertrial interval, between the test stimulus and the sample of the next trial, where there is no need to hold in memory the identity of the last stimulus seen. See also Amit et al. (1997) for a description of many other features of the delay activity (figure adapted from Yakovlev et al. 1998)

2.1 The palimpsest property

Under the above assumptions, any network of neurons exhibits the "palimpsest" property (Nadal et al. 1986; Parisi 1986; Amit and Fusi 1992, 1994): old stimuli are automatically forgotten to make room for the most recent ones. The memory span is limited to a sliding window containing a certain number of stimuli that induced synaptic modifications. Within this window, recent stimuli are best remembered, while stimuli outside the memory window are completely forgotten, as if they had never been seen by the network. The width of the sliding window depends on how many synapses are changed following each presentation: if this number is small, the network is slow to acquire information from the stimuli to be learnt, but the memory span is large. Otherwise, if the fraction of synapses that are changed upon each stimulus presentation is large, the network learns quickly but the memory span is quite limited. This constraint can be so tight that it might seriously compromise the functioning of the network as an associative memory. In Sect. 2.1.1 we illustrate the palimpsest property when the network learns uncorrelated, random patterns of activity.

2.1.1 The problem of fast forgetting. Learning can be seen as a stochastic process when the stimuli to be stored are random (Heskes and Kappen 1991; Amit and Fusi 1992): each stimulus imposes a specific activity level on the two neurons connected by the synapse. If these stimuli are random and uncorrelated, each synapse will see a random sequence of pairs of activities. We denote the activity of neuron i by the variable ν_i^t, where ν^t denotes a generic stimulus shown at time t. In our case ν_i represents the mean spike rate, but in principle it can be any quantity related to what should be encoded by the synapse (e.g., it might be the degree of correlation between the spike activities of the pre- and postsynaptic neurons). To simplify the analysis we assume that the set of stable internal synaptic states is discrete (but it can be arbitrarily large). As a consequence, the presentation of a sequence of uncorrelated stimuli induces a random walk among the stable synaptic values, which can be described as a Markov process. More formally, the probability M_KJ that a synapse makes a transition from the internal stable state K to the stable state J is given by:

    M = Σ_{ν_pre, ν_post} p(ν_pre, ν_post) Q(ν_pre, ν_post)        (1)

where the sum extends over all the pairs of activities that induce long-term synaptic modifications. For each pair of activities, p(ν_pre, ν_post) is the probability that a stimulus imposes the activities ν_pre and ν_post on the pair of neurons connected by the synapse, and Q(ν_pre, ν_post) is a matrix of binary values that encodes the learning rule, i.e., how the neural activities modify the internal synaptic state: if Q_KJ(ν_pre, ν_post) = 1, then a transition from state K to state J occurs whenever the specific pair of activities ν_pre and ν_post is imposed by the stimulus on the pre- and postsynaptic neurons. Otherwise, when Q_KJ(ν_pre, ν_post) = 0, the transition cannot take place. If a particular pair of activities leaves the synapse unchanged, then Q is equal to the identity matrix (the synapse remains in the initial state and only the diagonal terms are nonzero). If the pair of activities potentiates or depresses the synapse by inducing a transition to one of the two neighboring states, then the matrix has the following form (left: long-term potentiation, LTP; right: long-term depression, LTD):

    Q_LTP = | 0 1 0 ... 0 |        Q_LTD = | 1 0 ... 0 0 |
            | 0 0 1 ... 0 |                | 1 0 ... 0 0 |
            |    ...      |                | 0 1 ... 0 0 |
            | 0 0 0 ... 1 |                |    ...      |
            | 0 0 0 ... 1 |                | 0 0 ... 1 0 |

Low indexes (J, K) correspond to depressed synaptic states. Notice that each row contains only one nonzero term.

We now focus on the memory trace left by a generic stimulus ν¹, followed by the presentation of p − 1 other uncorrelated random patterns. We want to know whether the final, current synaptic matrix still preserves any dependence on the pattern of activity imposed by stimulus ν¹. As shown below, the memory trace of the most recent stimuli ν², ..., ν^p is stronger than the one of ν¹. Hence, if ν¹ can be retrieved from memory, then all the other p − 1 patterns can be recalled a fortiori, and p provides an estimate of the number of patterns that can be stored in the synaptic matrix. The conditional distribution function q_J^p(ν¹_pre, ν¹_post) that a synapse is in state J following the presentation of p patterns, the first of which imposed ν¹_pre, ν¹_post on the synapse, satisfies the equation

    q_J^p(ν¹_pre, ν¹_post) = Σ_{K=1}^{n_s} q_K^1(ν¹_pre, ν¹_post) (M^{p−1})_KJ        (2)

where the index K runs over all the n_s stable synaptic states. The initial distribution q_K^1 induced by the presentation of stimulus ν¹ is modified p − 1 times by the successive random stimuli. We implicitly hypothesize that all the patterns of activity induced by the p stimuli have the same statistics and are not correlated with the first pattern ν¹.

It is reasonable to assume that there is always a sequence of synaptic transitions, on any given synapse, that can bring the synapse from any one of its stable states to any other state. This type of dynamics is ergodic and hence, for a large number of presentations of random patterns, there is an asymptotic distribution:

    q_J^p(ν¹_pre, ν¹_post) → q_J^∞        (3)

which is independent of ν¹_pre, ν¹_post. When the distribution of the synapses becomes too close to the asymptotic distribution, the memory trace of the first pattern ν¹ fades away, and eventually disappears. This is another expression of the palimpsest property. The number of patterns of which there is still some memory trace depends essentially on the forgetting rate, i.e., the convergence rate to the asymptotic distribution.

2.1.2 A tight constraint on storage capacity. The synaptic matrices that would be obtained after the presentation of a very large number of statistically independent random patterns (in principle an infinite number) would be independent of the initial condition determined by the presentation of the first, oldest pattern ν¹. Following each stimulus presentation, the synaptic matrix still changes, but the statistics remains constant and is only determined by the asymptotic distribution of (3). In order to preserve a dependence (and hence some memory) on ν¹, the difference between the synaptic matrix W(p) following the presentation of p patterns and any of the asymptotic synaptic matrices W(∞) should be large enough. This is just a necessary condition, because this memory trace expresses only a generic dependence of the synaptic matrix on the structure of ν¹, and such a dependence could be too small for the network dynamics to retrieve information about stimulus ν¹. In more formal terms we should impose

    ΔW = Σ_{i,j} [W_ij(p) − W_ij(∞)] > d        (4)

where d depends on the details of the network neural dynamics. If the network is composed of N neurons, N is large, and the number of synapses scales as N² (the best case), then ΔW is estimated by

    ΔW ≈ N² Σ_J [q_J^p(ν¹_pre, ν¹_post) − q_J^∞] W^J

where now the sum extends over all the stable synaptic states 1, ..., n_s that correspond to the synaptic efficacies W^1, ..., W^{n_s}. We implicitly hypothesized that the statistics of the stimuli is "translation invariant," i.e., it does not depend on the specific pair of neurons that we are considering. What follows can be easily extended to the case in which the statistics of the stimuli is more complex.

Since we assume that the process is ergodic, there will be an n₀ such that all the elements of M^{n₀} (M raised to the power n₀) are greater than 0. n₀ is the mean number of presentations that are needed to visit all the stable synaptic states. If the synapse is discrete, and only transitions between neighboring synaptic states are permitted, then n₀ corresponds to the number of stable states n_s. In general n₀ ≤ n_s. If the synapse is deeply analog (i.e., the number of stable states is large), n₀ is proportional to the full range of variability of the synaptic efficacy, W^max − W^min, divided by the minimal change in the synaptic efficacy that a single stimulus presentation can induce.

By the ergodic theorem (see, e.g., Shiryaev 1984) we have, for any pair (K, J) of states, that M^p converges to its limit geometrically:

    |(M^p)_KJ − (M^∞)_KJ| < (1 − K_min)^{[p/n₀]−1}        (5)

where K_min is the minimal value of M^{n₀}, and depends in a complicated way on the smallest nonzero components of M and on n₀. The inequality (5) sets an upper bound for the difference between q_K^p and the asymptotic, memoryless distribution q_K^∞. The necessary condition of (4) for retrieving the oldest pattern ν¹ can be rewritten as follows:

    ΔW ≈ N² | Σ_{J,K} [q_K^1(ν¹_pre, ν¹_post)(M^{p−1})_KJ − q_K^1(ν¹_pre, ν¹_post)(M^∞)_KJ] W^J | > d

We used the explicit expression of the conditional distribution q^p given in (2). If we now replace the difference between the two matrices (M^{p−1} − M^∞) with its maximum given by (5), we obtain

    ΔW < N² (1 − K_min)^{[p/n₀]−1} Σ_J W^J Σ_K q_K^1(ν¹_pre, ν¹_post)
       < N² n_s W^{n_s} (1 − K_min)^{[p/n₀]−1}

where W^{n_s} is the maximal efficacy. The final condition for retrieving pattern ν¹ now reads

    ΔW ≈ n_s W^{n_s} N² (1 − K_min)^{[p/n₀]−1} > d

which poses a constraint on the maximum number of patterns p that can be stored in the synaptic matrix:

    p < −n₀ log(n_s N²/d) / log(1 − K_min) ≈ n₀ log(N² n_s) / K_min        (6)

This constraint is extremely tight and very general. Only a number of patterns that scales as log N can be retrieved, which makes the network a very inefficient associative memory. Note that if one of the parameters such as n₀ or K_min becomes N-dependent, then one can extend the memory span. For instance, if the number of synaptic states increases with N (as in the case of the Hopfield (1982) model), n₀ and 1/K_min provide two extra N-dependent factors which in some cases destroy the palimpsest behavior. In those cases the storage capacity is mainly limited by the interference between the memory traces of different stimuli, and not by memory fading. For a wide class of models, as soon as the maximum storage capacity is surpassed, the network suddenly becomes unable to retrieve any of the memorized patterns. This is also known as the blackout catastrophe (see, e.g., Amit 1989). In order to prevent this, one has to stop learning at the right time, or the network must be able to forget. Hence the palimpsest property, if forgetting is not too fast, can be a desirable feature of the learning process.

2.1.3 The solution: the stochastic selection mechanism. The expression for p (Eq. 6) contains the directions to solve the problem of the logarithmic constraint. Indeed, K_min depends on the fraction of synapses that make a transition to a different stable state (contained in the structure of M), which, in turn, depends on the statistical properties of the stimuli (i.e., the fraction of neurons that are driven to activities that induce long-term synaptic changes) and on the inherent synaptic dynamics, which in our case is described by the Q matrix (see Eq. 1). Decreasing the fraction of synapses that are changed upon each presentation can dramatically increase the memory span. What the network needs is a mechanism that selects which synaptic efficacies have to be changed following each stimulation. In the absence of an external supervisor that could perform this task, a possible local, unbiased mechanism is stochastic learning: at parity of conditions (activities of the pre- and postsynaptic neurons), the transitions between stable states occur with some probability. This means that only a fraction of randomly chosen synapses are changed upon each stimulus presentation. This stochastic selection would correspond to transforming the binary Q matrix, which expresses a deterministic learning rule (given a specific pair of pre- and postsynaptic activities the synapse always behaves in the same way), into the matrix of the probabilities of making a transition from one state to another state given a specific pair of pre- and postsynaptic activities. In this way the elements of Q can be arbitrarily small and K_min can tend to zero, hence increasing the memory span. See Sect. 2.2 for an example of this.

2.2 The standard learning scenario

The scenario depicted in Sect. 2.1.3 has been studied analytically (Amit and Fusi 1992, 1994; Battaglia and Fusi 1995; Fusi 1995; Kahn et al. 1995; Lattanzi et al. 1997; Brunel et al. 1998). Learning can be described as a random walk of the internal synaptic variable among the stable states. Every stimulus imposes a pattern of neuronal activity, and a randomly selected fraction of the synapses make a transition to a different stable state. Such a system can be described as a Markov process, and the final distribution of the synaptic efficacies can be computed as an explicit function of the patterns of activity imposed by the stimuli. From the analysis it turns out that in general there are at least three elements that characterize online updating rules for material synapses:

1. The memory trace left by each stimulus presentation. This represents the way the synapse encodes the activity of the pre- and postsynaptic neurons. It depends also on the initial distribution of the synaptic states.
2. The decay of the memory trace due to the palimpsest property. New stimuli use resources that had been previously allocated to other stimuli and, in doing so, erase the memory trace of the oldest stimuli. The forgetting rate depends on the statistics of the stimuli and on the inherent transition probability of the synapse.
3. The interference between memory traces corresponding to different stimuli: the simultaneous presence in the synaptic matrix of memory traces corresponding to different patterns within the memory span generates noise that might prevent the system from recalling the stored patterns correctly.

We now illustrate the meaning of these three ingredients with a simple example. Assume that the synapse has only two stable states (n_s = 2), corresponding to two efficacies: W^1 = 0, W^2 = W. In this case the transition matrix M is 2 × 2 and contains only two independent terms: (i) the probability of making a transition to the potentiated state and (ii) the probability of jumping into the depressed state (see below for the expression of M). Each stimulus selects randomly a mean fraction f of neurons and drives them to a state of elevated activity, the same for all the active neurons. In order to estimate the retrievable memory trace of the first stimulus presented (ν¹), we introduce the classical signal-to-noise ratio (indicated as S/R; see, e.g., Amit 1989): the signal S expresses the distance between the distribution of the total synaptic input across all neurons that should be active, and the corresponding distribution across the neurons that should stay quiescent when the pattern of activity ν¹ is imposed on the network. Quantitatively, S is defined as the difference between the averages of these two distributions. The noise R represents the mean width of the two distributions. A high S/R would allow the network to retrieve from memory the pattern of activity ν¹ that is embedded in the synaptic matrix, and make it a stable point of the collective dynamics.

The synaptic updating rule is as follows: when the two neurons connected by the synapse are both active, the synapse makes a transition to the potentiated state with probability q^P_AA and a transition to the depressed state with probability q^D_AA. The other transitions occur with probabilities denoted with a similar notation (e.g., q^D_IA is the probability that depression occurs when the presynaptic neuron is active and the postsynaptic neuron is inactive; the first subscript refers to the postsynaptic activity, the second to the presynaptic activity). These probabilities correspond to the terms of the Q(ν_pre, ν_post) matrix introduced in (1) and, again, encode the learning rule. For simplicity we dropped one of the state indexes K, J (the one corresponding to the initial state), since the synapse is bistable, and we expressed the dependence on the activities ν_pre and ν_post in two subscript indexes. With this notation, the most general Markov matrix has the following form (see Eq. 1):
P P P P be stored and retrieved from memory without errors can 2 1 À qAA qAA 1 À qAI qAI M ¼ f þ f ð1 À f Þ be as large as N 2=ðlog NÞ2, if the mean fraction f of qD 1 À qD qD 1 À qD AA AA !AI AI active neurons scales as log N/N (Amit and Fusi 1994). 1 À qP qP Slow learning also allows automatic prototype extrac- þ f ð1 À f Þ IA IA tion from a class of stimuli that evoke similar, correlated qD 1 À qD IA IA! patterns of activity (Fusi 1995; Brunel et al. 1998). 1 À qP qP Increasing the number ns of stable synaptic states 1 f 2 II II þð À Þ D D does not improve the performance of the network much, qII 1 À qII provided that in the patterns of activity to be stored large populations of neurons encode the same informa- f 2 where is the probability that the pre- and postsynap- tion (e.g., they have the same mean spike rate). In the f f tic neurons are active (AA), ð1 À Þ is the probability case analyzed here this assumption is definitely true, that the postsynaptic neuron is active and the presy- since there are only two levels of activity imposed by the naptic neuron is inactive (IA), and so on for all the four stimuli: neurons are either active or inactive. All the combinations of activities. active and inactive neurons are read by the postsynaptic p For such a system, after the presentation of pat- neuron through two classes of synaptic efficacies, and terns, the signal corresponding to the oldest pattern is the total synaptic current can be written as (for the derivation of this expression in a simpler case, see Amit and Fusi 1994) 1 X 1 X I ¼ W n þ W n i N ij j N ij j pÀ1 P P D D j2A j2I S ¼ Wf k ½ð1 À c0ÞðqAA À qIAÞþc0ðqIA À qAAފ ð7Þ n X n X where c is the initial fraction of potentiated synapses ¼ A W þ I W 0 N ij N ij and k is the smallest eigenvalue of the Markov transition j2A j2I matrix. 
k is given by If the two groups of synapses are large enough, the two P D 2 P D P D k ¼ 1 ÀðqAA þ qAAÞf ÀðqIA þqIA þqAI þ qAIÞf ð1Àf Þ sums are equivalent to an analog weight that has a P D 2 number of states as large as the number of neurons that ÀðqII þ qIIÞð1 À f Þ encode the same activity (when each synaptic efficacy is which is essentially 1 minus the sum of all the transition binary). For a wide class of network models, if p probabilities, each multiplied by the corresponding patterns can be stored with binary synapses, no more probability of occurrence of a specific pair of activities than ðns À 1Þp patterns can be memorized with a (e.g., the probability of both pre- and postsynaptic multistable synapse (Amit and Fusi 1994). In small 2 networks, the performance improvement that can be neurons being active is f ). Let c1 be the asymptotic fraction of potentiated synapses that one would get after achieved by increasing ns can be more dramatic. an infinite number of presentations of different stimuli. Interestingly, the dependence on the initial fraction c0 of potentiated synapses c0 can change the behavior of The interference noise term depends mostly on c1 and not on the conditional distributions that still preserve the network qualitatively. By changing this parameter memory of the first pattern n1 (Amit and Fusi 1994): one can even implement the phenomenon of primacy, in rffiffiffiffiffiffiffiffiffiffiffi which the first stimulus seen by the network has a f stronger memory trace than the subsequent stimuli R  W c1 (Kahn et al. 1995). N Finally, a few considerations about the coding level f where N is the number of neurons in the network. of the patterns are mentioned. The optimal case is From these formulae it is clear that the most efficient achieved when both f and the transition probabilities q way of storing patterns of activities (i.e., when S/R is tend to 0 with N, as explained above. 
However there are P many intermediate cases that are studied in Amit and maximal)is when the Hebbian term qAA dominates over P Fusi (1994)in which the network can perform well as an qIA, the transition probabilities are low (k  1), and the coding level f of the stimuli tends to zero with N. The associative memory, even if only the condition of transition probabilities should scale in such a way that sparseness of the stimuli is imposed. However, in this all terms in k tend to zero with f at the same rate as case the number p of random patterns with a mean P 2 P P fraction f of active neurons that can be stored and qAAf (e.g., qIA  qAAf )or faster. This would corre- spond to a scenario in which learning is slow – i.e., the successfully retrieved does not surpass 1=f . In the case stimuli have to be repeatedly presented to the network in of stochastic learning p can be as large as 1=f 2. order to be learnt – and the updating rule ensures the balance between the mean number of potentiations and the mean number of depressions. In such a case the 3 Synaptic dynamics network performs extremely well as an associative memory (e.g., it recovers the optimal storage capacity in Stochastic learning provides a good tool for studying terms of information that can be stored in the synaptic quantitatively the process of attractor formation without matrix), even if the synaptic efficacy is binary (two stable specifying the details of the synaptic dynamics. Howev- states only). The number of different patterns that can er, to proceed further, it is important to have a detailed 465 model that provides a link between the transition low(Vpost < VL). The depolarization is indirectly related probabilities and the synaptic dynamics driven by the to the activity of the postsynaptic neuron. A single neural activities. 
This implies the identification of the source of noise that drives the stochastic selection mechanism. The solution we propose is suggested by the analysis of cortical recordings: the spike trains recorded in vivo are quite irregular (see, e.g., Softky and Koch 1993), and the interspike-interval variability is directly accessible to the synapse. It is actually possible to exploit this source of stochasticity to achieve stochastic transitions in a fully deterministic synaptic model.

3.1 The model

The synaptic dynamics is described in terms of an internal variable X(t). X is restricted to the interval [0, 1] and, inside the permitted interval, obeys

dX(t)/dt = −α Θ(−X(t) + θX) + β Θ(X(t) − θX) + H(t) ,   (8)

where Θ is the Heaviside function. The first two terms make the upper and the lower bound the only two stable states of X when H(t), the stimulus-driven Hebbian learning term, is 0. The internal state decays at a rate α to the lower bound X = 0 when X < θX. Otherwise X is attracted towards the upper bound X = 1 with a constant drift β. The Hebbian term depends on variables related to the activities of the two neurons connected by the synapse. Presynaptic spikes trigger temporary modifications in the synaptic variable: each spike induces a jump in X whose value depends on the instantaneous depolarization of the postsynaptic neuron Vpost. It is upwards (X → X + a) if the depolarization is high (Vpost > VH), and downwards (X → X − b) if the depolarization is low (Vpost < VL). A single reading of the instantaneous depolarization does not contain much information about the postsynaptic mean rate. However, the required information is distributed across several neurons that are driven to the same activity by the stimulus. This means that even a single instantaneous reading of the depolarization of a population of cells contains all the information about the mean spike rate and many other statistical properties of the activity of these neurons (see the Appendix).

3.2 Stochastic transitions

To illustrate the stochastic nature of the learning mechanism, we assume that the presynaptic spike train is Poisson distributed, while the afferent current to the postsynaptic neuron is a noisy, Gauss-distributed stochastic process. Such a situation is meant to mimic what happens in vivo during the presentation of a visual stimulus. The heavy bombardment of synaptic inputs that determines the activity of the postsynaptic neuron is here emulated by a Gaussian-distributed current (see, e.g., Tuckwell 1988).

The synaptic dynamics depends on the detailed statistics of the spike trains of the pre- and postsynaptic neurons. During stimulation the synapses move temporarily up and down, driven by the presynaptic spikes. Following the removal of the stimulus, the synaptic efficacy may return to its initial state, or it may make a transition to another state. Then, in the presence of spontaneous activity, the internal state of the synapse continues to change upon the arrival of each spike, but the probability of crossing the threshold θX becomes negligible and the synapse fluctuates around one of the stable states.
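In the absence of stimulation (H(t) = 0), the refresh term of Eq. (8) alone makes the two bounds the only stable states. A minimal sketch in Python (the parameter values are illustrative assumptions of this sketch, not values from the paper):

```python
# Sketch of the refresh term of Eq. (8) with H(t) = 0: below the
# threshold theta_X the internal variable decays towards 0, above it
# it is driven towards 1, so only the two bounds are stable.
alpha = beta = 5.0   # refresh drifts (1/s); decay time of order 100 ms (assumed)
theta_X = 0.5        # synaptic threshold
dt = 1e-3            # integration step (s)

def relax(x, T=1.0):
    """Integrate dX/dt = -alpha*Theta(theta_X - X) + beta*Theta(X - theta_X)."""
    for _ in range(int(T / dt)):
        x += (beta if x > theta_X else -alpha) * dt
        x = min(1.0, max(0.0, x))   # X is confined to [0, 1]
    return x

print(relax(0.4))   # -> 0.0 : attracted to the lower stable state
print(relax(0.6))   # -> 1.0 : attracted to the upper stable state
```

Any trajectory started strictly below θX relaxes to 0, any trajectory started above it relaxes to 1; the stimulus-driven jumps described next are what can move X across the threshold.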

Figure 2 shows two cases, during a typical stimulation, at parity of mean spike rates of the pre- and postsynaptic neurons. In one case (left panels in Fig. 2), a fluctuation drives the synaptic efficacy above threshold and, when the stimulus is removed, X is attracted to the high state: LTP has occurred. In the second case (right panels), when the stimulus is removed, X is below threshold and is attracted by the refresh to the initial state (no transition occurred). In the two cases the statistics of the activity to be encoded – the mean spike frequency – is the same, but the realization of the stochastic process that generated the pre- and postsynaptic activities is different.

Fig. 2. Stochastic long-term potentiation (LTP): pre- and postsynaptic neurons fire at the same mean rate and the synapse starts from the same initial value [X(0) = 0] in both cases illustrated in the left and right panels. In each panel are plotted as a function of time (from top to bottom): the presynaptic spikes, the simulated synaptic internal variable X(t), and the depolarization V(t) of an integrate-and-fire postsynaptic neuron (the threshold for emitting a spike is 1). Left panels: LTP is caused by a burst of presynaptic spikes that drives X(t) above the synaptic threshold. Right panels: at the end of stimulation, X returns to the initial value. Even though the mean firing rates are identical, the final state is different in the two cases (figure reproduced from Fusi et al. 2000a)

Such a stochastic selection mechanism shows that the importance of an internal threshold θX for the synaptic dynamics is at least twofold: (i) it stabilizes memory by ignoring all the fluctuations that do not drive the internal state across the threshold, and (ii) it provides a simple mechanism that allows the neural activity to select which synapses are to be modified. This mechanism is so simple and robust that it can be readily implemented in discrete electronics (Badoni et al. 1995; Del Giudice et al. 1998) or in analog VLSI (Fusi et al. 2000a; Chicca and Fusi 2001).

3.3 Transition probabilities

The transition probability is defined as the fraction of cases – out of a large number of repetitions of the same stimulation conditions – in which, at the end of the stimulation, the synapse made a transition to a state different from its original state. The spike-based synaptic dynamics described here produces the transition probabilities required to store an extensive number of patterns of mean rates. The stochastic process induced by the noisy pre- and postsynaptic activity on the internal synaptic variable X is known as a Takács process and can be studied using a density approach (Fusi et al. 2000a). The solution of the equations for the distribution of X gives the transition probabilities for any pair of pre- and postsynaptic spike rates, and for any stimulation time interval. For each postsynaptic frequency, the parameters μ and σ characterizing the input current are tuned to produce the desired mean firing rate νpost, as explained in the Appendix. The relation between the mean spike frequency and the parameters that characterize the statistics of the input current depends on the details of the neuronal dynamics. In the case of simple integrate-and-fire neurons, this relation can be computed analytically (Ricciardi 1977; Fusi and Mattia 1999). Below we adopt the following model of an integrate-and-fire neuron: the depolarization V is the only dynamical variable. The neuron integrates the synaptic input linearly as long as V is below some threshold θ. When this threshold is crossed, the neuron emits a spike and is then reset to some value H, where it stays for the duration of a refractory period τr; then it starts to integrate the current again. In formal terms, the dynamics is governed by the differential equation

dV/dt = −λ + I(t) ,

where λ is a constant leakage and I(t) is the total synaptic current. Such a neuron behaves similarly to the classical integrate-and-fire neuron with a leak proportional to the depolarization, provided that V is limited from below by a rigid barrier (Fusi and Mattia 1999). Interestingly, the response function (the mean firing frequency as a function of the mean and variance of the input current) of these simple neurons can be fitted to that of real cortical pyramidal cells (A. Rauch, G. La Camera, H.-R. Luescher, W. Senn, S. Fusi, unpublished work, 2002).

The parameters of the current determine not only the mean firing rate, but also the distribution of the depolarization, and hence the probability that a presynaptic spike triggers an upwards jump (probability σa) or a downwards jump (probability σb). Given these two probabilities and the presynaptic rate νpre, the LTP/LTD probabilities [the Q^KJ(νpre, νpost) of Eq. 1] can be computed numerically by calculating the probability that X is above/below the threshold θX at the end of the stimulation, when the initial state is 0/1.

These transition probabilities are plotted in Fig. 3. We assume that every stimulus brings a mean fraction f of cells to an activity level of 50 Hz; all the other neurons remain at a spontaneous level of 4 Hz. The LTP probability when the pre- and postsynaptic neurons are both active corresponds to the q^P_AA introduced in Sect. 2.2 and is the highest transition probability. q^P_IA (for a presynaptic neuron at 50 Hz and a postsynaptic neuron at 5 Hz) is clearly much smaller than q^P_AA, and q^D_IA > q^D_AA.
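The definition above translates directly into a Monte Carlo estimate: repeat the same stimulation many times and count how often X ends up on the other side of θX. A minimal sketch (all parameter values are illustrative assumptions; the postsynaptic depolarization is replaced here by clipped Gaussian samples rather than a full integrate-and-fire simulation):

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative parameters (assumptions of this sketch, not the paper's values)
alpha = beta = 2.0    # refresh drifts of Eq. (8) (1/s)
theta_X = 0.5         # synaptic threshold
a = b = 0.12          # up/down jump sizes
V_H = V_L = 0.6       # depolarization thresholds (spike threshold = 1)
dt, T = 1e-3, 0.5     # time step and 500-ms stimulation

def transition_prob(nu_pre, mu, sigma, x0=0.0, n_trials=500):
    """Fraction of repetitions in which X ends on the other side of theta_X.

    Each presynaptic spike reads a depolarization drawn from a Gaussian
    N(mu, sigma^2) clipped to [0, 1] -- a crude stand-in for the
    stationary distribution p(v) derived in the Appendix.
    """
    n_crossed = 0
    for _ in range(n_trials):
        x = x0
        for _ in range(int(T / dt)):
            x += (beta if x > theta_X else -alpha) * dt   # refresh term
            if rng.random() < nu_pre * dt:                # Poisson presynaptic spike
                v = min(1.0, max(0.0, rng.normal(mu, sigma)))
                if v > V_H:
                    x += a        # upward jump (probability sigma_a per spike)
                elif v < V_L:
                    x -= b        # downward jump (probability sigma_b per spike)
            x = min(1.0, max(0.0, x))                     # X confined to [0, 1]
        n_crossed += (x > theta_X) if x0 < theta_X else (x < theta_X)
    return n_crossed / n_trials

q_LTP_AA = transition_prob(50.0, mu=0.7, sigma=0.2)     # both neurons active
q_LTP_spont = transition_prob(4.0, mu=0.2, sigma=0.15)  # spontaneous rates
```

With high pre- and postsynaptic activity a substantial fraction of repetitions crosses the threshold, while at spontaneous rates crossings become vanishingly rare – the strong nonlinearity displayed in Fig. 3.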

All these relations go in the direction of maximizing the signal S (see Eq. 7) left by a single stimulus presentation. The parameters of the synapse are also tuned to ensure that q^D_IA is a fraction f = 1/30 of q^P_AA, to ensure the balance between potentiations and depressions. This is required to maximize the memory span. The highest probability q^P_AA is anyway small (0.03), which means that learning a pattern of activity would require many presentations (∼30) of the same stimulus (slow learning). As one moves towards low, spontaneous activities, all the transition probabilities (LTP and LTD) drop dramatically. This is necessary to prevent the network from forgetting in the presence of spontaneous neural activity. Interestingly, the mean time one has to wait until the synapse makes a spontaneous transition to another state is of the order of 100 days (or years, if the spontaneous activity is below 2 Hz), a period of time which is several orders of magnitude longer than the longest inherent time constant of the synaptic dynamics (the decay due to α occurs in a time of the order of 100 ms). This is another advantage of transferring the load of generating stochasticity to a system (the network) which is much bigger than a single synapse and hence offers a much larger state space. This permits the generation of rare events, as required for the synaptic transitions, without needing fine tuning of parameters or resorting to biologically implausible long time constants.

Fig. 3. Contour plots of LTP and long-term depression (LTD) probabilities (q) on a log scale vs pre- and postsynaptic neuron rates for a 500-ms stimulation. LTP occurs when the pre- and postsynaptic rates are both high. Around the white plateau, the LTP probability drops sharply and becomes negligible for spontaneous rates. The strong nonlinearity allows easy discrimination between relevant signals and background noise (figure adapted from Fusi et al. 2000a)

4 Discussion

4.1 The problem of noise generation

Stochastic learning turned out to be a simple, unbiased mechanism for selecting the synapses that are to be updated upon a stimulus presentation. The mechanism has the great advantage of being localized in space: each synapse decides whether or not to change without knowing what the other synapses are doing, and nevertheless the average fraction of updated synapses is kept constant. Moreover, slow learning can be easily achieved since the transition probabilities can be so low that the mean number of modified synapses is even smaller than one (Fusi 2001). However, this approach moves the problem to the generation of the proper noise to drive the synaptic dynamics. The fully deterministic spike-driven synaptic model described here exploits the variability in the neural activity to drive the stochastic mechanism: the source of randomness is in the spike-emission process of the neurons. The next issues, currently studied, are: (i) is it possible to generate the proper irregular spike activity with a deterministic network? and (ii) would this irregular activity be good enough to drive the synaptic stochastic mechanism? It is known that deterministic networks of randomly connected integrate-and-fire neurons can generate highly irregular activity by exploiting the quenched, frozen disorder in the pattern of connectivity (van Vreeswijk and Sompolinsky 1996; Fusi et al. 2000b). There is preliminary evidence that this kind of noisy activity is actually good enough to generate small transition probabilities (Chicca and Fusi 2001).

Another advantage of transferring the load of generating stochasticity to the network dynamics is that the learning and forgetting speeds can be readily controlled by the statistics of the network activity. One simple control measure is the rate provoked by the stimuli, as discussed above. However, for the same mean frequencies, the synapse is rather sensitive also to the variability in the interspike intervals and to the degree of synchronization of the spike trains. This means that the network can easily and quickly switch from a single-shot learning modality (for highly correlated inputs) to slow learning (with uncorrelated inputs) and optimal storage capacity, without changing any inherent parameter of the synaptic dynamics (Chicca and Fusi 2001). This would fit nicely into a scenario in which the mean frequency encodes the information about the stimulus to be memorized, and the other statistical properties provide a triggering signal for learning (e.g., due to attention).

4.2 Biological plausibility

We showed that the network can perform well as an associative memory even if the analog depth of the synaptic efficacies is reduced to the extreme (bistable synapses). One might wonder whether biological synapses are discrete on long time scales. Such a discreteness is compatible with experimental data (see, e.g., Bliss and Collingridge 1993). Recently Petersen et al. (1998) provided preliminary experimental evidence that LTP is actually all or none, meaning that for each synapse only two efficacies can be preserved on long time scales.

A second issue concerns the protocols for inducing LTP and LTD. Recent experiments (see, e.g., Markram et al. 1997; Zhang et al. 1998) indicate that the precise timing of presynaptic spikes and postsynaptic action potentials can determine the direction of the synaptic change: LTP occurs when the presynaptic spikes precede the postsynaptic spikes, and LTD when the presynaptic spikes follow the postsynaptic spike within a short time window. The synaptic dynamics described here is compatible with such an experimental result: whenever a presynaptic spike precedes a postsynaptic action potential, the depolarization is likely to be near the emission threshold (above the threshold VH) and LTP is likely to be induced. When a presynaptic spike occurs just after the emission of a postsynaptic spike, the model neuron is likely to be hyperpolarized (due to the spike afterhyperpolarization), and the depolarization will tend to be below VL. This produces LTD.

Of course the depolarization of our postsynaptic neuron should be considered as an effective variable: it is the only internal variable characterizing the neural dynamics of our simple integrate-and-fire neuron. Such a variable may not correspond directly to the depolarization of a complex biological neuron. There may instead be some internal variable which accounts also for other internal states and which is sensitive to particular time intervals before and after the postsynaptic spike. For instance, the effect of the postsynaptic action potential should definitely be incorporated in the model (in our model even the depolarization peak during the emission of the spike is totally ignored). However, our study shows that it is easy to embed patterns of asynchronous activity in the synaptic matrix by reading the depolarization of the postsynaptic neuron, and our mechanism does not require strong nonlinearities as in a synaptic dynamics entirely based on spike timing (Senn 2002). Such a simple solution has probably not been overlooked by biology, and recent experiments have actually shown that there is an important dependence of LTP and LTD on the residual depolarization of the postsynaptic neuron (Sjöström et al. 2001).

4.3 Back to delay activity

A final issue is whether a network of integrate-and-fire neurons connected by the plastic synapses described here can reproduce the delay activity observed in cortical recordings and discussed in Sect. 1. The problem is complicated by the fact that the network activity induces synaptic changes, and the synaptic changes affect the network dynamics (for a review, see Del Giudice et al. 2002). The synaptic modifications can actually compromise the stability of the network dynamics. From recent work (Del Giudice et al. 2001; Amit and Mongillo 2002), it is becoming clear that the synaptic dynamics described here, or a slightly modified version thereof, is good enough to generate stable selective delay activity and to reproduce the phenomenology described in Fig. 1, provided that some extra regulatory mechanism for controlling the stability of the network dynamics is introduced. This mechanism might be short-term synaptic depression (Del Giudice et al. 2001), a progressive reduction of the external input coming from the sensory areas as the patterns of activity are learnt (Amit and Mongillo 2002), or, finally, an inherent property of the synaptic dynamics that automatically normalizes the statistics of the plastic weights (Fusi 2002).

Acknowledgements. Most of the work described in the first part of this manuscript was carried out in the group coordinated by D.J. Amit in Rome, in the context of the LANN initiative of INFN (see the cited works for more details). I would like to thank W. Gerstner, G. La Camera, S. Liu, A. Renart, and W. Senn: with their remarks they contributed to greatly improving a previous version of the manuscript.

Appendix: Mean spike rates vs depolarization distribution

The probabilities of occurrence σa and σb of the temporary modifications a and b control the direction in which the synapse is modified by the activity of the pre- and postsynaptic neurons. These probabilities depend on the statistics of the depolarization of the postsynaptic neuron under stimulation, and can be calculated analytically when the model of the neuron is simple. Here we focus on the simple model of an integrate-and-fire neuron with constant leak and a reflecting barrier at the resting potential (Fusi and Mattia 1999), which is described in Sect. 3.3. If such a neuron is injected with a Gaussian current characterized by its mean μ and variance σ², the stationary distribution of the depolarization p(v) has a simple expression, and is given by (Fusi and Mattia 1999; Fusi et al. 2000a)

p(v) = (ν/μ) [ Θ(v − H) (1 − e^{−2μ(θ−v)/σ²}) + Θ(H − v) (e^{−2μH/σ²} − e^{−2μθ/σ²}) e^{2μv/σ²} ] ,

where ν is the mean firing frequency:

ν = [ τr + (σ²/(2μ²)) (e^{−2μθ/σ²} − e^{−2μH/σ²}) + (θ − H)/μ ]^{−1} ,

where τr is the absolute refractory period.
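The two expressions are mutually consistent: integrating p(v) over [0, θ] gives 1 − ν τr, the fraction of time not spent in the refractory state, and σa follows by integrating p(v) above VH. A minimal numerical check (the parameter values are illustrative assumptions of this sketch, with θ = 1 and reset H = 0 as in Fig. A1, not values from the paper):

```python
import numpy as np

# Numerical check of the closed-form p(v) and nu above for the
# constant-leak integrate-and-fire neuron with a reflecting barrier at 0.
theta, H, tau_r = 1.0, 0.0, 2e-3   # spike threshold, reset, refractory period (s)
mu, sigma = 0.8, 0.6               # mean and standard deviation of the current
V_H = 0.6                          # depolarization threshold for an upward jump

def rate(mu, sigma):
    """Mean firing frequency nu(mu, sigma)."""
    s2 = sigma ** 2
    diff_term = s2 / (2 * mu ** 2) * (np.exp(-2 * mu * theta / s2)
                                      - np.exp(-2 * mu * H / s2))
    return 1.0 / (tau_r + diff_term + (theta - H) / mu)

def p(v, mu, sigma):
    """Stationary distribution of the depolarization on [0, theta]."""
    s2 = sigma ** 2
    nu = rate(mu, sigma)
    above = (v >= H) * (1.0 - np.exp(-2 * mu * (theta - v) / s2))
    below = (v < H) * ((np.exp(-2 * mu * H / s2) - np.exp(-2 * mu * theta / s2))
                       * np.exp(2 * mu * v / s2))
    return nu / mu * (above + below)

dv = theta / 200_000
v = (np.arange(200_000) + 0.5) * dv            # midpoint grid on [0, theta]
nu = rate(mu, sigma)
mass = p(v, mu, sigma).sum() * dv              # should equal 1 - nu * tau_r
sigma_a = p(v[v > V_H], mu, sigma).sum() * dv  # probability of an upward jump
```

The same routines can be used to tune μ and σ so that the postsynaptic neuron fires at the desired rate νpost, as done for Fig. 3.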

Fig. A1. Distributions of the postsynaptic depolarization, pðvÞ (left), hð¼ 1Þ is low. As mpost increases, the distribution becomes convex, and and ra (probability of an upward jump)(right)vs mð¼ mpostÞ for the increasingly uniform. The white line is the depolarization value which stimulation procedure described in the Appendix. In the case shown separates the regions corresponding to upwards (high V )and here the reset potential H has been chosen to be identical to the resting downwards (lowR V )jumps. Notice that in the case shown here, h potential V ¼ 0. For low spike rates, pðvÞ is concave and the VH ¼ VL. ra ¼ pðvÞ (figure reproduced from Fusi et al. 2000a) VH probability of a depolarization near the spike-emission threshold 469 where sr is the absolute refractory period. ra and rb are Chicca E, Fusi S (2001)Stochastic synaptic plasticity in deter- ministic aVLSI networks of spiking neurons. In: Rattay F (ed) given by the integral of pðvÞ over the intervals ½VH; hŠ and ½0; V Š, respectively. It is straightforward to compute Proceedings of the World Congress on Neuroinformatics, L Vienna, Austria, 24–29 September, pp 468–477 these integrals analytically. The analytical expressions Del Giudice P, Mattia M (2001)Long and short-term synaptic for pðvÞ and m for the classical integrate-and-fire neuron plasticity and the formation of working memory: a case study. with a leak proportional to the depolarization are also Neurocomputing 38: 1175–1180 available (see, e.g., Brunel 2000). Del Giudice P, Fusi S, Badoni D, Dante V, Amit DJ (1998) l and r characterize the synaptic input and depend Learning attractors in an asynchronous, stochastic electronic on the network interactions. We assume that m is neural network. 
Network 9: 183–205 post Del Giudice P, Fusi S, Mattia M (2002)Modeling the formation of changed by increasing or decreasing the average spike working memory with networks of integrate-and-fire neurons frequency of a subpopulation of presynaptic neurons connected by plastic synapses. J Physiol (Paris)(in press) (Fusi et al. 2000a). If the recurrent feedback of the Fusi S (1995)Prototype extraction in material attractor neural postsynaptic neurons does not have a great effect on networks with stochastic dynamic learning. In: Rogers SK, the network activity, then the parameters of the input Ruck DW (eds)Proceedings of SPIE 95, Orlando, Fla., current move along a linear trajectory in the ðl; r2Þ pp 1027–1038 Fusi S (2001)Long term memory: and storing strategies space. We chose l as an independent parameter, and 2 of the . Neurocomputing 38: 1223–1228 r ¼ Jl þ K. In a network of excitatory and inhibitory Fusi S (2002)Spike-driven synaptic plasticity for learning corre- neurons, in which in a spontaneous activity state the lated patterns of asynchronous activity. In: Dorronsoro JR (ed) recurrent input is as large as the external input, we Proceedings of ICANN2002, Madrid, Spain, 27–30 August. (Lecture notes in computer science)Springer, Berlin Heidelberg have that J ¼ JE (the average coupling between excit- atory neurons)and K ¼ mI N J ðJ þ J Þ, where m is the New York 0 I I I E 0 Fusi S, Mattia M (1999)Collective behavior of networks with spontaneous activity of the NI inhibitory neurons that linear (VLSI)integrate-and-fire neurons. Neural Comput are projecting to the postsynaptic cell (mean coupling 11: 643–662 JI). ra is plotted in Fig. A1. Since the external stimulus Fusi S, Annunziato M, Badoni D, Salamon A, Amit DJ (2000a) increases mpost, the distribution of the depolarization V Spike-driven synaptic plasticity: theory, simulation, VLSI im- plementation. Neural Comput 12: 2227–2258 changes in such a way that rb decreases and ra increases. 
Figure A1 exhibits the characteristics of the Fusi S, Del Giudice P, Amit DJ (2000b) of a VLSI spiking neural network. In: Amari S-I, Giles L, Gori M, resulting distribution of Vpost. Piuri V (eds)Proceedings of the IEEE-INNS-ENNS Interna- tional Joint Conference on Neural Networks, Como, Italy, 24–27 July, pp 121–126 References Fuster JM 1995 Memory in the cerebral cortex. MIT Press, Cam- bridge, Mass. Amit DJ (1989)Modeling brain function. Cambridge University Hebb DO (1949)The organization of behavior: a neuropsycho- Press, New York logical theory. Wiley, New York Amit DJ (1995)The Hebbian paradigm reintegrated: local rever- Heskes TM, Kappen B (1991)Learning processes in neural net- berations as internal representations. Behav Brain Sci 18: 617– works. Phys Rev A 44: 2718 657 Hopfield JJ (1982)Neural networks and physical systems with Amit DJ, Brunel N (1997a)Model of global spontaneous activity emergent selective computational abilities. Proc Natl Acad Sci and local structured (learned)delay activity in cortex. Cereb USA 79: 2554 Cortex 7: 237–252 Kahn PE, Wong KYM, Sherrington D (1995)A memory model Amit DJ, Brunel N (1997b)Dynamics of a recurrent network of with novel behaviour in sequential learning. Network 6: spiking neurons before and following learning. Network 415–427 8: 373–404 Lattanzi G, Nardulli G, Pasquariello G, Stramaglia S (1997)Sto- Amit DJ, Fusi S (1992)Constraints on learning in dynamic syn- chastic learning in a neural network with adapting synapses. apses. Network 3: 443 Phys Rev E 56: 4567–4573 Amit DJ, Fusi S (1994)Dynamic Learning in neural networks with Markram H, Lubke J, Frotscher M, Sakmann B (1997)Regulation material synapses. Neural Comput 6: 957 of synaptic efficacy by coincidence of postsynaptic APs and Amit DJ, Mongillo G (2002)Spike-driven synaptic dynamics EPSPs. Science 275: 213–215 generating working memory states, neural computation. 
Neu- Miyashita Y (1993)Inferior temporal cortex: where visual per- ral Comput (in press) ception meets memory. Annu Rev Neurosci 16: 245–263 Amit DJ, Fusi S, Yakovlev V (1997)Paradigmatic attractor cell. Miyashita Y, Toshirio H (2000)Neural representation of visual Neural Comput 9: 1071–1093 objects: encoding and top-down activation. Curr Opin Neu- Badoni D, Bertazzoni S, Buglioni S, Salina G, Amit DJ, Fusi S robiol 10: 187–194 (1995)Electronic implementation of a stochastic learning Nadal JP, Toulouse G, Changeux JP, Dehaene S (1986)Networks attractor neural network. Network 6: 125–157 of formal neurons and memory palimpsests. Europhys Lett Battaglia FP, Fusi S (1995)Learning in neural networks with 1: 535 partially structured synaptic transitions. Network 6: 261 Parisi G (1986)A memory which forgets. J Phys A Math Gen Bliss TVP, Collingridge GL (1993)A synaptic model of memory: 19: L617 long term potentiation in the Nature 361: 31–39 Petersen CCH, Malenka RC, Nicoll RA, Hopfield JJ (1998)All-or- Brunel (2000)Dynamics of sparsely connected networks of excit- none potentiation at CA3–CA1 synapses. Proc Natl Acad Sci atory and inhibitory spiking neurons. J Comput Neurosci USA 95: 4732–4737 8: 183–208 Ricciardi LM (1977)Diffusion processes and related topics in Brunel N, Carusi F, Fusi S (1998)Slow stochastic Hebbian learning biology. Springer, Berlin Heidelberg New York of classes of stimuli in a recurrent neural network. Network Senn W (2002)Beyond spike timing: the role of nonlinear plasticity 9: 123–152 and unreliable synapses. Biol Cybern 87: 344–355 470

Sejnowski TJ (1977) Storing covariance with nonlinearly interacting neurons. J Math Biol 4: 303
Shiryaev AN (1984) Probability. Springer, Berlin Heidelberg New York
Sjöström PJ, Turrigiano GG, Nelson SB (2001) Rate, timing and cooperativity jointly determine cortical synaptic plasticity. Neuron 32: 1149–1164
Softky WR, Koch C (1993) The highly irregular firing of cortical cells is inconsistent with temporal integration of random EPSPs. J Neurosci 13: 334–350
Tuckwell HC (1988) Introduction to theoretical neurobiology, vol 2. Cambridge University Press, Cambridge
Vreeswijk CA van, Sompolinsky H (1996) Chaos in neural networks with balanced excitatory and inhibitory activity. Science 274: 1724–1726
Wang XJ (2001) Synaptic reverberation underlying mnemonic persistent activity. Trends Neurosci 24: 455–463
Yakovlev V, Fusi S, Berman E, Zohary E (1998) Inter-trial neuronal activity in infero-temporal cortex: a putative vehicle to generate long-term associations. Nat Neurosci 1: 310–331
Zhang LI, Tao HW, Holt CE, Harris WA, Poo M (1998) A critical window for cooperation and competition among developing retinotectal synapses. Nature 395: 37–44