PHYSICAL REVIEW E VOLUME 59, NUMBER 4 APRIL 1999

Hebbian learning and spiking

Richard Kempter* Physik Department, Technische Universita¨tMu¨nchen, D-85747 Garching bei Mu¨nchen,

Wulfram Gerstner† Swiss Federal Institute of Technology, Center of Neuromimetic Systems, EPFL-DI, CH-1015 Lausanne,

J. Leo van Hemmen‡ Physik Department, Technische Universita¨tMu¨nchen, D-85747 Garching bei Mu¨nchen, Germany ͑Received 6 August 1998; revised manuscript received 23 November 1998͒ A correlation-based ͑‘‘Hebbian’’͒ learning rule at a spike level with millisecond resolution is formulated, mathematically analyzed, and compared with learning in a firing-rate description. The relative timing of presynaptic and postsynaptic spikes influences synaptic weights via an asymmetric ‘‘learning window.’’ A differential equation for the learning dynamics is derived under the assumption that the time scales of learning and neuronal spike dynamics can be separated. The differential equation is solved for a Poissonian model with stochastic spike arrival. It is shown that correlations between input and output spikes tend to stabilize structure formation. With an appropriate choice of parameters, learning leads to an intrinsic normal- ization of the average weight and the output firing rate. Noise generates diffusion-like spreading of synaptic weights. ͓S1063-651X͑99͒02804-4͔

PACS number͑s͒: 87.19.La, 87.19.La, 05.65.ϩb, 87.18.Sn

I. INTRODUCTION In contrast to the standard rate models of Hebbian learn- ing, we introduce and analyze a learning rule where synaptic Correlation-based or ‘‘Hebbian’’ learning ͓1͔ is thought modifications are driven by the temporal correlations be- to be an important mechanism for the tuning of neuronal tween presynaptic and postsynaptic spikes. First steps to- connections during development and thereafter. It has been wards a detailed modeling of temporal relations have been shown by various model studies that a learning rule which is taken for rate models in ͓34͔ and for spike models in ͓22,35– driven by the correlations between presynaptic and postsyn- 43͔. aptic neurons leads to an evolution of neuronal receptive fields ͓2–9͔ and topologically organized maps ͓10–12͔. II. DERIVATION OF THE LEARNING EQUATION In all these models, learning is based on the correlation between neuronal firing rates, that is, a continuous variable A. Specification of the Hebb rule reflecting the mean activity of a neuron. This is a valid de- We consider a neuron that receives input from Nӷ1 syn- scription on a time scale of 100 ms and more. On a time apses with efficacies J ,1рiрN; cf. Fig. 1. We assume scale of 1 ms, however, neuronal activity consists of a se- i that changes of Ji are induced by presynaptic and postsyn- quence of short electrical pulses, the so-called action poten- aptic spikes. The learning rule consists of three parts. ͑i͒ Let tials or spikes. During recent years experimental and theoret- t f be the arrival time of the f th input spike at synapse i. The ical evidence has accumulated which suggests that temporal i coincidences between spikes on a millisecond or even sub- millisecond scale play an important role in neuronal infor- mation processing ͓13–24͔. If so, a rate description may, and often will, neglect important information that is contained in the temporal structure of a neuronal spike train. Neurophysiological experiments also suggest that the change of a synaptic efficacy depends on the precise timing of postsynaptic action potentials with respect to presynaptic input spikes on a time scale of 10 ms. Specifically, a synaptic weight is found to increase, if presynaptic firing precedes a postsynaptic spike, and to decrease otherwise ͓25,26͔; see also ͓27–33͔. Our description of learning at a temporal reso- lution of spikes takes these effects into account.

FIG. 1. Single neuron. We study the development of synaptic *Electronic address: [email protected] weights Ji ͑small filled circles, 1рiрN) of a single neuron ͑large † in Electronic address: [email protected]fl.ch circle͒. The neuron receives input spike trains denoted by Si and ‡ Electronic address: [email protected] produces output spikes denoted by Sout.

1063-651X/99/59͑4͒/4498͑17͒/$15.00PRE 59 4498 ©1999 The American Physical Society PRE 59 HEBBIAN LEARNING AND SPIKING NEURONS 4499 arrival of a spike induces the weight Ji to change by an amount ␩win which can be either positive or negative. The quantity ␩Ͼ0 is a ‘‘small’’ parameter. ͑ii͒ Let tn be the nth output spike of the neuron under consideration. This event triggers the change of all N efficacies by an amount ␩wout which can also be positive or negative. ͑iii͒ Finally, time differences between all pairs of input and output spikes in- fluences the change of the efficacies. Given a time difference f n sϭti Ϫt between input and output spikes, Ji is changed by an amount ␩W(s) where the learning window W is a real- valued function. It is to be specified shortly; cf. also Fig. 6. Starting at time t with an efficacy Ji(t), the total change ⌬J (t)ϭJ (tϩ )ϪJ (t) in a time interval , which may be i i T i T FIG. 2. Hebbian learning and spiking neurons—schematic. In interpreted as the length of a learning trial, is calculated by the bottom graph we plot the time course of the synaptic weight summing the contributions of input and output spikes as well Ji(t) evoked through input and output spikes ͑upper graphs, vertical as all pairs of input and output spikes occurring in the time 1 bars͒. An output spike, e.g., at time t , induces the weight Ji to interval ͓t,tϩ ͔. Denoting the input spike train at synapse i change by an amount wout which is negative here. To consider the T in f by a series of ␦ functions, Si (t)ϭ͚ f ␦(tϪti ), and, simi- effect of correlations between input and output spikes, we plot the out n larly, output spikes by S (t)ϭ͚n␦(tϪt ), we can formu- learning window W(s) ͑center graphs͒ around each output spike, late the rules ͑i͒–͑iii͒ explicitly by setting where sϭ0 matches the output spike times ͑vertical dashed lines͒. 1 2 3 The three input spikes at times ti , ti , and ti ͑vertical dotted lines͒ in increase Ji by an amount w each. There is no influence of corre- tϩ 1 T in in out out lations between these input spikes and the output spike at time t . ⌬Ji͑t͒ϭ␩ dtЈ͓w Si ͑tЈ͒ϩw S ͑tЈ͔͒ This becomes visible with the aid of the learning window W cen- ͵t tered at t1. The input spikes are too far away in time. The next tϩ tϩ output spike at t2, however, is close enough to the previous input T T in ϩ␩ dtЈ dtЉW͑tЉϪtЈ͒S ͑tЉ͒Sout͑tЈ͒ 3 out ͵ ͵ i spike at ti . The weight Ji is changed by w Ͻ0 plus the contribu- t t 3 2 tion W(ti Ϫt )Ͼ0, the sum of which is positive ͑arrowheads͒. 4 in 4 ͑1a͒ Similarly, the input spike at time ti leads to a change w ϩW(ti Ϫt2)Ͻ0. Ј in Ј out Ј f n ϭ␩ ͚ w ϩ͚ w ϩ ͚ W͑ti Ϫt ͒ . f tn f n discrete events that trigger a discontinuous change of the ͫ ti ti ,t ͬ ͑1b͒ synaptic weight; cf. Fig. 2 ͑bottom͒. If we assume a stochas- tic spike arrival or if we assume a stochastic process for generating output spikes, the change ⌬Ji is a random vari- f n In Eq. ͑1b͒ the prime indicates that only firing times ti and t able, which exhibits fluctuations around some mean drift. in the time interval ͓t,tϩ ͔ are to be taken into account; cf. Averaging implies that we focus on the drift and calculate Fig. 2. T the expected rate of change. Fluctuations are treated in Sec. Equation ͑1͒ represents a Hebb-type learning rule since VI. they correlate presynaptic and postsynaptic behavior. More precisely, here our learning scheme depends on the time se- 1. Self-averaging of learning in out quence of input and output spikes. 
The parameters w ,w Effective learning needs repetition over many trials of as well as the amplitude of the learning window W may, and length , each individual trial being independent of the pre- in general will, depend on the value of the efficacy J . Such i vious ones.T Equation ͑1͒ tells us that the results of the indi- a J dependence is useful so as to avoid unbounded growth i vidual trials are to be summed. According to the ͑strong͒ law of synaptic weights. Even though we have not emphasized of large numbers 44 in conjunction with being ‘‘small’’ this in our notation, most of the theory developed below is ͓ ͔ ␩ valid for J -dependent parameters; cf. Sec. V B. ͓45͔ we can average the resulting equation, viz., Eq. ͑1͒, i regardless of the random process. In other words, the learn- ing procedure is self-averaging. Instead of averaging over B. Ensemble average several trials, we may also consider one single long trial Given that input spiking is random but partially correlated during which input and output characteristics remain con- and that the generation of spikes is in general a complicated stant. Again, if ␩ is sufficiently small, time scales are sepa- dynamic process, an analysis of Eq. ͑1͒ is a formidable prob- rated and learning is self-averaging. lem. We therefore simplify it. We have introduced a small The corresponding average over the resulting random pro- parameter ␩Ͼ0 into Eq. ͑1͒ with the idea in mind that the cess is denoted by angular brackets ͗͘ and is called an en- learning process is performed on a much slower time scale semble average, in agreement with physical usage. It is a than the neuronal dynamics. Thus we expect that only aver- probability measure on a probability space, which need not aged quantities enter the learning dynamics. be specified explicitly. We simply refer to the literature ͓44͔. Considering averaged quantities may also be useful in or- Substituting sϭtЉϪtЈ on the right-hand side of Eq. ͑1a͒ and der to disregard the influence of noise. In Eq. ͑1͒ spikes are dividing both sides by , we obtain T 4500 KEMPTER, GERSTNER, AND van HEMMEN PRE 59

occur on average in a time interval of length . Then, using Ϫ1 tϩ T the notation f (t)ϭ ͐t TdtЈf(tЈ), we may introduce the inT in out out mean firing rates ␯i (t)ϭ͗Si ͘(t) and ␯ (t)ϭ͗S ͘(t). We in out call ␯i and ␯ mean firing rates in order to distinguish them in from the previously defined instantaneous rates ͗Si ͘ and ͗Sout͘ which are the result of an ensemble average only. Be- cause of their definition, mean firing rates ␯ always vary slowly as a function of time. That is, they vary on a time scale of the order of . The quantities ␯in and ␯out therefore T i FIG. 3. Inhomogeneous Poisson process. In the upper graph we carry hardly any information that may be present in the tim- in have plotted an example of an instantaneous rate ␭i (t) in units of ing of discrete spikes. Hz. The average rate is 10 Hz ͑dashed line͒. The lower graph shows For the sake of further simplification of Eq. ͑2͒, we define in a spike train Si (t) which is a realization of an inhomogeneous the width of the learning window W(s) and consider the in W Poisson process with rate ␭i (t). The spike times are denoted by case ӷ . Most of the ‘‘mass’’ of the learning window vertical bars. shouldT beW inside the interval ͓Ϫ , ͔. Formally we require Ϫ W Wϱ ͐ ϪW ds͉W(s)͉ӷ͐ϪϱWds͉W(s)͉ϩ͐ ds͉W(s)͉. For ӷ J t tϩ W W T W ͗⌬ i͑͘ ͒ ␩ T in in out out the integration over s in Eq. ͑2͒ can be extended to run from ϭ dtЈ[w ͗Si ͑͘tЈ͒ϩw ͗S ͑͘tЈ͒] ͵t Ϫϱ to ϱ. With the definition of a temporally averaged cor- T T relation function, ␩ tϩ tϩ ϪtЈ T T ϩ dtЈ dsW͑s͒ C s;t͒ϭ Sin tϩs͒ Sout t͒ , ͑3͒ ͵t ͵tϪt i͑ ͗ i ͑ ͑ ͘ T Ј ϫ Sin t ϩs Sout t . 2 the last term on the right in Eq. ͑2͒ reduces to ͗ i ͑ Ј ͒ ͑ Ј͒͘ ͑ ͒ ϱ ͐ϪϱdsW(s)Ci(s;t). Correlations between presynaptic and 2. Example: Inhomogeneous Poisson process postsynaptic spikes, thus, enter spike-based Hebbian learning through Ci convolved with the window W. We note that the Averaging the learning equation before proceeding is jus- correlation Ci(s;t), though being both an ensemble and a tified if both input and output processes will be taken to be temporally averaged quantity, may change as a function of s inhomogeneous Poisson processes, which will be assumed on a much faster time scale than or the width of the throughout Secs. IV–VI. An inhomogeneous Poisson pro- T W learning window. The temporal structure of Ci depends es- cess with time-dependent rate function ␭(t)у0 is character- sentially on the neuron ͑model͒ under consideration. An ex- ized by two facts: ͑i͒ disjoint intervals are independent and ample is given in Sec. IV A. ͑ii͒ the probability of getting a single event at time t in an We require learning to be a slow process; cf. Sec. II B 1. interval of length ⌬t is ␭(t)⌬t, more events having a prob- More specifically, we require that J values do not change ability o(⌬t); see also ͓46͔, Appendix A for a simple expo- much in the time interval . Thus separates the time scale sition of the underlying mathematics. The integrals in Eq. ͑width of the learning windowT WT ) from the time scale of ͑1a͒ or the sums in Eq. ͑1b͒ therefore decompose into many theW learning dynamics, which is proportional to ␩Ϫ1. Under independent events and, thus, the strong law of large num- those conditions we are allowed to approximate the left-hand bers applies to them. The output is a temporally local process side of Eq. ͑2͒ by the rate of change dJ /dt, whereby we as well so that the strong law of large numbers also applies i n have omitted the angular brackets for brevity. 
Absorbing ␩ to the output spikes at times t in Eq. ͑1͒. into the learning parameters win, wout, and W, we obtain If we describe input spikes by inhomogeneous Poisson processes with intensity in(t), then we may identify the d ϱ ␭i in in out out ensemble average over a spike train with the stochastic in- Ji͑t͒ϭw ␯i ͑t͒ϩw ␯ ͑t͒ϩ dsW͑s͒Ci͑s;t͒. dt ͵Ϫ in in in ϱ tensity, ͗Si ͘(t)ϭ␭i (t); cf. Fig. 3. The intensity ␭i (t) can ͑4͒ be interpreted as the instantaneous rate of spike arrival at synapse i. In contrast to temporally averaged mean firing The ensemble-averaged learning equation ͑4͒, which holds rates, the instantaneous rate may vary on a fast time scale in for any neuron model, will be the starting point of the argu- many biological systems; cf. Sec. III C. The stochastic inten- ments below. sity ͗Sout͘(t) is the instantaneous rate of observing an output spike, where ͗͘ is an ensemble average over both the input III. SPIKE-BASED AND RATE-BASED and the output. Finally, the correlation function HEBBIAN LEARNING in out ͗Si (tЉ)S (tЈ)͘ is to be interpreted as the joint probability In this section we indicate the assumptions that are re- density for observing an input spike at synapse i at the time quired to reduce spike-based to rate-based Hebbian learning tЉ and an output spike at time tЈ. and outline the limitations of the latter.

C. Separation of time scales A. Rate-based Hebbian learning We require the length of a learning trial in Eq. ͑2͒ to be In neural network theory, the hypothesis of Hebb ͓1͔ is much larger than typicalT interspike intervals. Both many in- usually formulated as a learning rule where the change of a put spikes at any synapse and many output spikes should synaptic efficacy Ji depends on the correlation between the PRE 59 HEBBIAN LEARNING AND SPIKING NEURONS 4501

in mean firing rate ␯i of the ith presynaptic neuron and the mean firing rate ␯out of a postsynaptic neuron, viz.,

dJi ϵJ˙ ϭa ϩa ␯inϩa ␯out dt i 0 1 i 2

in out in 2 out 2 ϩa3␯i ␯ ϩa4͑␯i ͒ ϩa5͑␯ ͒ , ͑5͒

Ϫ1 Ϫ1 FIG. 4. The postsynaptic potential ⑀ in units of ͓e ␶0 ͔ as a where a0Ͻ0, a1 , a2 , a3 , a4 , and a5 are proportionality function of time s in milliseconds. We have ⑀ϵ0 for sϽ0 so that ⑀ constants. Apart from the decay term a0 and the ‘‘Hebbian’’ is causal. The kernel ⑀ has a single maximum at sϭ␶0 . For s in term ␯ ␯out proportional to the product of input and output ϱ the postsynaptic potential ⑀ decays exponentially with time i → rates, there are also synaptic changes which are driven sepa- constant ␶0 ; cf. also Appendix B 2. rately by the presynaptic and postsynaptic rates. The param- eters a0 ,...,a5 may depend on Ji . Equation ͑5͒ is a general IV. STOCHASTICALLY SPIKING NEURONS formulation up to second order in the rates; see, e.g., ͓3,47,12͔. A crucial step in analyzing Eq. ͑4͒ is determining the correlations Ci between input spikes at synapse i and output B. Spike-based Hebbian learning spikes. The correlations, of course, depend strongly on the neuron model under consideration. To highlight the main To get Eq. ͑5͒ from the spike-based learning rule in Eq. points of learning, we study a simple toy model. Input spikes ͑4͒ two approximations are required. First, if there are no are generated by an inhomogeneous Poisson process and fed correlations between input and output spikes apart from the into a stochastically firing neuron model. For this scenario correlations contained in the instantaneous rates, we can we are able to derive an analytical expression for the corre- write Sin(tЈϩs)Sout(tЈ) Ϸ Sin (tЈϩs) Sout (tЈ). Second, ͗ i ͘ ͗ i ͘ ͗ ͘ lations between input and output spikes. The introduction of if these rates change slowly as compared to , then we have in out T the model and the derivation of the correlation function is the Ci(s;t)Ϸ␯i (tϩs)␯ (t). In addition, ␯ϭ͗S͘ is the time topic of the first subsection. In the second subsection we use evolution on a slow time scale; cf. the discussion after Eq. the correlation function in the learning equation ͑4͒ and ana- ͑3͒. Since we have ӷ , the rates ␯in also change slowly as T W i lyze the learning dynamics. In the final two subsections the compared to the width of the learning window and, thus, relation to the work of Linsker ͓3͔͑a rate formulation of in W in we may replace ␯i (tϩs)by␯i(t) in the correlation term Hebbian learning͒ and some extensions based on spike cod- ϱ ϱ ͐ϪϱdsW(s) Ci(s;t). This yields ͐ϪϱdsW(s)Ci(s;t) ing are considered. ˜ in out ˜ ϱ ϷW(0)␯i (t)␯ (t), where W(0)ª͐ϪϱdsW(s). Under the above assumptions we can identify W˜ (0) with a . By fur- 3 A. Poisson input and stochastic neuron model ther comparison of Eq. ͑4͒ with Eq. ͑5͒ we identify win with out a1 and w with a2 , and we are able to reduce Eq. ͑4͒ to Eq. We consider a single neuron which receives input via N 5 by setting a ϭa ϭa ϭ0. ͑ ͒ 0 4 5 synapses 1рiрN. The input spike trains arriving at the N synapses are statistically independent and generated by an C. Limitations of rate-based Hebbian learning inhomogeneous Poisson process with time-dependent inten- in in The assumptions necessary to derive Eq. ͑5͒ from Eq. ͑4͒, sities ͗Si ͘(t)ϭ␭i (t) with 1рiрN ͓46͔. however, are not generally valid. According to the results of In our simple neuron model we assume that output spikes Markram et al. ͓25͔, the width of the Hebbian learning are generated stochastically with a time-dependent rate window in cortical pyramidal cellsW is in the range of 100 ms. ␭out(t) that depends on the timing of input spikes. 
Each input f At retinotectal synapses is also in the range of 100 ms spike arriving at synapse i at time ti increases ͑or decreases͒ W out f ͓26͔. the instantaneous firing rate ␭ by an amount Ji(ti ) ⑀(t A mean rate formulation thus requires that all changes of f Ϫti ), where ⑀ is a response kernel. The effect of an incom- the activity are slow at a time scale of 100 ms. This is not ing spike is thus a change in probability density proportional necessarily the case. The existence of oscillatory activity in to Ji . Causality is imposed by the requirement ⑀(s)ϭ0 for the cortex in the range of 40 Hz ͑e.g., ͓14,15,20,48͔͒ implies sϽ0. In biological terms, the kernel ⑀ may be identified with activity changes every 25 ms. Retinal ganglion cells fire an excitatory ͑or inhibitory͒ postsynaptic potential. In synchronously at a time scale of about 10 ms ͓49͔; cf. also throughout what follows, we assume excitatory couplings 50 . Much faster activity changes are found in the auditory ͓ ͔ JiϾ0 for all i and ⑀(s)у0 for all s. In addition, the response system. In the auditory pathway of, e.g., the barn owl, spikes kernel ⑀(s) is normalized to ͐ds⑀(s)ϭ1; cf. Fig. 4. can be phase-locked to frequencies of up to 8 kHz ͓51–53͔. The contributions from all N synapses as measured at the Furthermore, beyond the correlations between instantaneous axon hillock are assumed to add up linearly. The result gives rates, additional correlations between spikes may exist. rise to a linear inhomogeneous Poisson model with intensity Because of all the above reasons, the learning rule ͑5͒ in the simple rate formulation is insufficient to provide a gen- N erally valid description. In Secs. IV and V we will therefore out f f ␭ ͑t͒ϭ␯0ϩ͚ ͚ Ji͑ti ͒⑀͑tϪti ͒. ͑6͒ study the full spike-based learning equation ͑4͒. iϭ1 f 4502 KEMPTER, GERSTNER, AND van HEMMEN PRE 59

Here, ␯0 is the spontaneous firing rate and the sums run over all spike arrival times at all synapses. By definition, the spike generation process ͑6͒ is independent of previous output spikes. In particular, this Poisson model does not include refractoriness. In the context of Eq. ͑4͒, we are interested in ensemble averages over both the input and the output. Since Eq. ͑6͒ is a linear equation, the average can be performed directly and yields

N ϱ out in FIG. 5. Spike-spike correlations. To understand the meaning of ͗S ͑͘t͒ϭ␯0ϩ Ji͑t͒ ds⑀͑s͒␭i ͑tϪs͒. ͑7͒ i͚ϭ1 ͵0 in out in Eq. ͑9͒ we have sketched ͗Si (tЈ)S (t)͘/␭i (tЈ) as a function of time t ͑full line͒. The dot-dashed line at the bottom of the graph is f the contribution J (t )⑀(tϪt ) of an input spike occurring at time In deriving Eq. ͑7͒ we have replaced Ji(ti )byJi(t) because i Ј Ј efficacies are assumed to change adiabatically with respect to tЈ. Adding this contribution to the mean rate contribution, ␯0 in the width of ⑀. The ensemble-averaged output rate in Eq. ͑7͒ ϩ͚ jJ j(t)⌳ j (t) ͑dashed line͒, we obtain the rate inside the square depends on the convolution of ⑀ with the input rates. In what brackets of Eq. ͑9͒͑full line͒. At time tЉϾtЈ the input spike at time follows we denote tЈ enhances the output firing rate by an amount Ji(tЈ)⑀(tЉϪtЈ) ͑arrows͒. Note that in the main text we have taken tЉϪtЈϭϪs.

ϱ in in B. Learning equation ⌳i ͑t͒ϭ ds⑀͑s͒␭i ͑tϪs͒. ͑8͒ ͵0 Before inserting the correlation function ͑10͒ into the learning rule ͑4͒ we define the covariance matrix Equation ͑7͒ may suggest that input and output spikes are statistically independent, which is not the case. To show this q ͑s;t͒ ͓␭in͑tϩs͒Ϫ␯in͑tϩs͔͓͒⌳in͑t͒Ϫ␯in͑t͔͒ ͑11͒ explicitly, we determine the ensemble-averaged correlation ij ª i i j j in out in out ͗Si (tϩs)S (t)͘ in Eq. ͑3͒. Since ͗Si (tϩs)S (t)͘ corre- and its convolution with the learning window W, sponds to a joint probability, it equals the probability density in ␭i (tϩs) for an input spike at synapse i at time tϩs times ϱ the conditional probability density of observing an output Qij͑t͒ª͵ dsW͑s͒qij͑s;t͒. ͑12͒ spike at time t given the above input spike at tϩs, Ϫϱ

in out Using Eqs. ͑7͒, ͑10͒, and ͑12͒ in Eq. ͑4͒, we obtain ͗Si ͑tϩs͒S ͑t͒͘ N ˙ in in out ˜ in in in Jiϭw ␯i ϩw ␯0ϩ͚ J j␯ j ϩW͑0͒␯i ␯0 ϭ␭i ͑tϩs͒ ␯0ϩJi͑t͒⑀͑Ϫs͒ϩ ͚ J j͑t͒⌳ j ͑t͒ . ͫ j ͬ ͫ jϭ1 ͬ ϱ ͑9͒ ˜ in in in ϩ͚ J j QijϩW͑0͒␯i ␯ j ϩ␦ij␯i dsW͑s͒⑀͑Ϫs͒ . j ͫ ͵Ϫϱ ͬ The first term inside the square brackets is the spontaneous ͑13͒ output rate and the second term is the specific contribution caused by the input spike at time tϩs, which vanishes for For the sake of brevity, we have omitted the dependence sϾ0. We are allowed to write J (t) instead of the ‘‘correct’’ i upon time. weight J (tϩs); cf. the remark after Eq. 7 . To understand i ͑ ͒ The assumption of identical and constant mean input the meaning of the second term, we recall that an input spike rates, ␯in(t)ϭ␯in for all i, reduces the number of free param- arriving before an output spike ͑i.e., sϽ0) raises the output i eters in Eq. ͑13͒ considerably and eliminates all effects com- firing rate by an amount proportional to ⑀(Ϫs); cf. Fig. 5. ing from rate coding. We define The sum in Eq. ͑9͒ contains the mean contributions of all synapses to an output spike at time t. For the proof of Eq. out ˜ in in in ͑9͒, we refer to Appendix A. k1ϭ͓w ϩW͑0͒␯ ͔␯0ϩw ␯ , Inserting Eq. ͑9͒ into Eq. ͑3͒ we obtain out ˜ in in k2ϭ͓w ϩW͑0͒␯ ͔␯ , ͑14͒ N in in Ci͑s;t͒ϭ J j͑t͒␭i ͑tϩs͒⌳ j ͑t͒ j͚ϭ1 in k3ϭ␯ ͵ dsW͑s͒⑀͑Ϫs͒ in ϩ␭i ͑tϩs͓͒␯0ϩJi͑t͒⑀͑Ϫs͔͒, ͑10͒ in Eq. ͑13͒ and arrive at where we have assumed the weights J j to be constant in the time interval ͓t,tϩ ͔. Temporal averages are denoted by a ˙ T in in Jiϭk1ϩ͚ ͑Qijϩk2ϩk3 ␦ij͒Jj . ͑15͒ bar; cf. Sec. II C. Note that ␭i (t)ϭ␯i (t). j PRE 59 HEBBIAN LEARNING AND SPIKING NEURONS 4503

Equation ͑15͒ describes the ensemble-averaged dynamics of synaptic weights for a spike-based Hebbian learning rule ͑1͒ under the assumption of a linear inhomogeneous Poissonian model neuron.

C. Relation to Linsker’s equation Linsker ͓3͔ has derived a mathematically equivalent equa- tion starting from Eq. ͑5͒ and a linear graded-response neu- ron, a rate-based model. The difference between Linsker’s equation and Eq. ͑15͒ is, apart from a slightly different no- tation, the term k ␦ . 3 ij FIG. 6. The learning window W in units of the learning param- Equation ͑15͒ without the k3 term has been analyzed ex- f n eter ␩ as a function of the delay sϭti Ϫt between presynaptic tensively by MacKay and Miller ͓5͔ in terms of eigenvectors f spike arrival at synapse i at time ti and postsynaptic firing at time and eigenfunctions of the matrix Qijϩk2. In principle, there tn.IfW(s) is positive ͑negative͒ for some s, the synaptic efficacy J is no difficulty in incorporating the k term in their analysis, i 3 is increased ͑decreased͒. The increase of Ji is most efficient if a because Qijϩk2ϩ␦ij k3 contains k3 times the unit matrix presynaptic spike arrives a few milliseconds before the postsynaptic and thus has the same eigenvectors as Qijϩk2. All eigen- neuron starts firing ͑vertical dashed line at sϭs*). For ͉s͉ ϱ we → values are simply shifted by k3 . have W(s) 0. The form of the learning window and parameter → The k3 term can be neglected if the number N of synapses values are as described in Appendix B 1. is large. More specifically, the influence of the k3 term as compared to the k and Q term is negligible if for all i, depending on the sign of k3 . Since firing rates ␯ are always 2 ij in positive and k3ϭ␯ ͐dsW(s) ⑀(Ϫs), the sign of the integral ͐dsW(s)⑀(Ϫs) is crucial. Hebb’s principle suggests that for ͉Qijϩk2͉Jjӷ͉k3͉Ji . ͑16͒ ͚j excitatory synapses, the integral is always positive. To un- derstand why, let us recall that s is defined as the time dif- This holds, for instance, if ͑i͒ we have many synapses, ͑ii͒ ference between input and output spikes. The response ker- ͉k3͉ is smaller than or at most of the same order of magni- nel ⑀ vanishes for negative arguments. Thus the integral tude as ͉k2ϩQij͉ for all i and j, and ͑iii͒ each synapse is effectively runs only over negative s. According to our defi- weak as compared to the total synaptic weight, JiӶ͚ jJ j . nition, sϽ0 implies that presynaptic spikes precede postsyn- The assumptions ͑i͒–͑iii͒ are often reasonable neurobiologi- aptic firing. These are the spikes that may have participated cal conditions, in particular when the pattern of synaptic in firing the postsynaptic neuron. Hebb’s principle ͓1͔ sug- weights is still unstructured. The analysis of Eq. ͑15͒ pre- gests that these synapses are strengthened, hence W(s)Ͼ0 sented in Sec. V and focusing on normalization and structure for sϽ0; cf. Fig. 6. This idea is also in agreement with formation is therefore based on these assumptions. In par- recent neurobiological results ͓25,26,33͔: Only those syn- ticular, we neglect k3 . apses are potentiated where presynaptic spikes arrive a few Nevertheless, our approach even without the k3 term is far milliseconds before a postsynaptic spike occurs so that the more comprehensive than Linsker’s rate-based ansatz ͑5͒ be- former arrive ‘‘in time.’’ We conclude that ͐dsW(s)⑀(Ϫs) cause we have derived Eq. ͑15͒ from a spike-based learning Ͼ0 and, hence, the k3 term is positive. rule ͑1͒. Therefore correlations between spikes on time With k3Ͼ0 every weight and thus every structure in the scales down to milliseconds or below can enter the driving distribution of weights is enhanced. 
This may contribute to term Qij so as to account for structure formation. Correla- the stability of structured weight distributions at the end of tions on time scales of milliseconds or below may be essen- learning, in particular when the synapses are few and strong tial for information processing in neuronal systems; cf. Sec. ͓22,54͔. In this case, Eq. ͑16͒ may be not fulfilled and the k3 III C. In contrast to that, Linsker’s ansatz is based on a term in Eq. ͑15͒ has an important influence. Thus spike- firing-rate description where the term Qij contains correla- based learning is different from simple rate-based learning tions between mean firing rates only. If we use a standard rules. Spike-spike correlations on a millisecond time scale interpretation of rate coding, a mean firing rate corresponds play an important role and tend to stabilize existing strong to a temporally averaged quantity which varies on a time synapses. scale of the order of hundreds of milliseconds. The temporal structure of spike trains is neglected completely. V. LEARNING DYNAMICS Finally, our ansatz ͑1͒ allows the analysis of the influence of noise on learning. Learning results from stepwise weight In order to get a better understanding of the principal changes. Each weight performs a random walk whose expec- features of the learning dynamics, we discuss Eq. ͑15͒ with tation value is described by the ensemble-averaged equation k3ϭ0 for a particularly simple configuration: a model with ͑15͒. Analysis of noise as a deviation from the mean is de- two groups of synapses. Input rates are homogeneous within ferred to Sec. VI. each group but different between one group and the other. Our discussion focuses on intrinsic normalization of output rates and structure formation. We take lower and upper D. Stabilization of learning bounds for the J values into account explicitly and consider We now discuss the influence of the k3 term in Eq. ͑15͒. the limiting case of weak correlations in the input. We will It gives rise to an exponential growth or decay of weights, see that for a realistic scenario we need to require winϾ0 and 4504 KEMPTER, GERSTNER, AND van HEMMEN PRE 59

TABLE I. Values of the parameters used for the numerical in in arbitrary time-dependent input, ␭i (t)ϭ␭ (t), with the same simulations ͑a͒ and derived quantities ͑b͒. in in mean input rate ␯ ϭ␭ (t) as in group 1 . Without going into details about the dependence of ␭inM(t) upon the time t, ͑a͒ Parameters in we assume ␭ (t) to be such that the covariance qij(s;t)in Learning ␩ϭ10Ϫ5 Eq. ͑11͒ is independent of t. In this case it follows from Eq. winϭ␩ ͑12͒ that Q (t)ϵQ for i, j෈ , regardless of t. For the ij M2 woutϭϪ1.0475␩ sake of simplicity we require in addition that QϾ0. In sum-

Aϩϭ1 mary, we suppose in the following AϪϭϪ1 QϾ0 for i, j෈ 2 , ␶ϩϭ1ms Qij͑t͒ϭ M ͑17͒ ␶ϭ20 ms ͭ0 otherwise. Ϫ ␶synϭ5ms We recall that Qij is a measure of the correlations in the input arriving at synapses i and j; cf. Eqs. ͑11͒ and ͑12͒. EPSP ␶0ϭ10 ms Synaptic input Nϭ50 Equation ͑17͒ states that at least some of the synapses re- ceive positively correlated input, a rather natural assumption. M 1ϭ25 M ϭ25 Three different realizations of Eq. ͑17͒ are now discussed in 2 turn. ␯inϭ10 sϪ1 ␦␯inϭ10 sϪ1 1. White-noise input ␻/(2␲)ϭ40 sϪ1 For all synapses in group 2 , let us consider the case of M in Further parameters ␯0ϭ0 stochastic white-noise input with intensity ␭ (t) and mean in in ␽ϭ0.1 firing rate ␭ (t)ϭ␯ (t)у0. The fluctuations are in in in in ͓␭ (tϩs)Ϫ␯ (tϩs)͔͓␭ (t)Ϫ␯ (t)͔ϭ␴0␦(s). Due to the ͑b͒ Derived quantities convolution ͑8͒ with ⑀, Eq. ͑11͒ yields qij(s;t)ϭ␴0⑀(Ϫs), ˜ Ϫ8 independently of t, i, and j. We use Eq. ͑12͒ and find W(0)ª͐dsW(s)ϭ4.75ϫ10 s ͐dsW(s)2ϭ3.68ϫ10Ϫ12 s Qij(t)ϵQϭ␴0͐dsW(s)⑀(Ϫs). We want QϾ0 and there- in ͐dsW(s)⑀(Ϫs)ϭ7.04ϫ10Ϫ6 fore arrive at ͐W(s)⑀(Ϫs)ϭk3 /␯ Ͼ0. We have seen be- Qϭ6.84ϫ10Ϫ7 sϪ1 fore in Sec. IV D that k3Ͼ0 is a natural assumption and in Ϫ4 Ϫ1 agreement with experiments. k1ϭ1ϫ10 s Ϫ4 Ϫ1 k2ϭϪ1ϫ10 s Ϫ5 Ϫ1 2. Colored-noise input k3ϭ7.04ϫ10 s ␶avϭ2ϫ102 s Let us now consider the case of an instantaneous and in ␶strϭ2.93ϫ104 s memoryless excitation, ⑀(s)ϭ␦(s). We assume that ␭ in ␶noiseϭ1.62ϫ105 s Ϫ␯ obeys a stationary Ornstein-Uhlenbeck process ͓62͔ Javϭ2ϫ10Ϫ2 with correlation time ␶c . The fluctuations are therefore * Dϭ2.47ϫ10Ϫ9 sϪ1 qij(s;t)ϰexp(Ϫ͉s͉/␶c), independent of the synaptic indices i DЈϭ1.47ϫ10Ϫ9 sϪ1 and j. QϾ0 implies ͐dsW(s)exp(Ϫ͉s͉/␶c)Ͼ0.

3. Periodic input woutϽ0 and that we can formulate theoretical predictions of Motivated by oscillatory neuronal activity in the auditory the relative magnitude of the learning parameters win,wout system and in the cortex ͑cf. Sec. III C͒, we now consider the and the form of the learning window W. The theoretical con- scenario of periodically modulated rates ͓␭in(t)Ϫ␯in͔ siderations are illustrated by numerical simulations whose ϭ␦␯in cos(␻ t), where ␻ӷ2␲/ . Let us first study the case inT 2 parameters are justified in Appendix B and summarized in ⑀(s)ϭ␦(s). We find Qϭ(␦␯ ) /2͐dsW(s) cos(␻s). Posi- Table I. tive Q hence requires the real part of the Fourier transform ˜ ˜ W(␻)ª͐dsW(s)exp(i␻s) to be positive, i.e., Re͓W(␻)͔ Ͼ A. Models of synaptic input 0. For a general interaction kernel ⑀(s), we find qij(s;t) ϭ(␦␯in)2/2 ͐dsЈ⑀(sЈ)cos͓␻ (sϩsЈ)͔ and hence We divide the N statistically independent synapses, all converging onto the very same neuron, into two groups, 1 in 2 ˜ ˜ M Qϭ͑␦␯ ͒ /2 Re͓W͑␻͒⑀͑␻͔͒, ͑18͒ and 2 . The numbers of synapses are M 1 and M 2 , respec- tively,M where M ϩM ϭN and M ,M ӷ1. Since each 1 2 1 2 independent of t. Then QϾ0 requires Re͓W˜ (␻)˜⑀(␻)͔Ͼ0. group contains many synapses, we may assume that M 1 and M 2 are of the same order of magnitude. The spike input at B. Normalization synapses i in group 1 is generated by a Poisson process M in in with a constant intensity ␭i (t)ϵ␯ , which is independent of Normalization is a very desirable property for any learn- t. We therefore have Qij(t)ϵ0 for i or j෈ 1 ; cf. Eqs. ͑11͒ ing rule. It is a natural requirement that the average weight and ͑12͒. The synapses i in group Mare driven by an and the mean output rate do not blow up during learning but M2 PRE 59 HEBBIAN LEARNING AND SPIKING NEURONS 4505

FIG. 7. Numerical simulation of weight normalization with pa- FIG. 8. Development of the average weight Jav as a function of rameters as given in Appendix B. The four graphs show the tem- time t in units of 103 s. The simulations we started at tϭ0 with poral evolution of synaptic weights J ,1рiр50, before (tϭ0) i five different average weights, Jav෈͕0, 0.01, 0.02ϭJav, 0.05, 0.1 and during learning (tϭ200, 500, and 1000 s͒. Before learning, all * ϭ␽͖. Full lines indicate homogeneous initial weight distributions, weights are initialized at the upper bound ␽ϭ0.1. During learning, where J ϭJav at tϭ0 for all i; cf. also Fig. 7, upper left panel. In all weights decrease towards the fixed point of the average weight, i five cases, Jav decays with the time constant ␶avϭ2.0ϫ102 s de- Javϭ2.0ϫ10Ϫ2; cf. also Fig. 8, topmost full line. The time constant * scribing the rate of normalization to the fixed point Javϭ2.0 of normalization is ␶avϭ2.0ϫ102 s, which is much smaller than the * ϫ10Ϫ2. Our theoretical prediction according to Sec. V B ͑crosses time constant of structure formation; cf. Sec. V C and Fig. 9. For on the uppermost full line͒ is in good agreement with the numerical times tр1000 s we therefore can neglect effects coming from results. The dashed line indicates the development of Jav starting structure formation. from an inhomogeneous initial weight distribution Jiϭ0 for 1рi р25 and J ϭ␽ for 25Ͻiр50ϭN. In the inhomogeneous case, ␶av are stabilized at a reasonable value in an acceptable amount i is enlarged as compared to the homogeneous case by a factor of 2 of time. Standard rate-based Hebbian learning can lead to because only half of the synapses are able to contribute to normal- unlimited growth of the average weight. Several methods ization; cf. Appendix D. The insets ͑signatures as in Fig. 7͒ show have been designed to control this unlimited growth; for in- the inhomogeneous weight distributions ͑arrows͒ at times t stance, subtractive or multiplicative rescaling of the weights ϭ0, 200, 500, and 1000 s; the dotted line indicates the fixed point after each learning step so as to impose either ͚ jJ jϭconst or Javϭ0.2␽. We note that here the distribution remains inhomoge- 2 * else ͚ jJ j ϭconst; cf., e.g., ͓2,7,55͔. It is hard to see, how- neous. ever, where this should come from. Furthermore, a J depen- dence of the parameters a1 ,...,a5 in the learning equation specific input ͑17͒ described in the preceding section yields av 2 ͑5͒ is often assumed. Higher-order terms in the expansion ͑5͒ Q ϭ(M 2 /N) QϾ0. We rewrite Eq. ͑20͒ in the standard may also be used to control unlimited growth. form J˙ avϭ͓JavϪJav͔/␶av, where In this subsection we show that under some mild condi- * tions there is no need whatsoever to invoke the J dependence JavϭϪk /͓N͑k ϩQav͔͒ ͑21͒ of the learning parameters, rescaling of weights, or higher- * 1 2 order correlations to get normalization, which means here that the average weight is the fixed point for the average weight and

N av av av 1 ␶ ϭJ /k1ϭϪ1/͓N͑k2ϩQ ͔͒ ͑22͒ av * J ϭ Ji ͑19͒ N i͚ϭ1 is the time constant of normalization. The fixed point in Eq. av approaches a stable fixed point during learning. Moreover, in ͑21͒ is stable, if and only if ␶ Ͼ0. out in this case the mean output rate ␯ is also stabilized since During learning, weights ͕Ji͖ and rates ͕␭i ͖ may become out av in correlated. In Appendix C we demonstrate that the influence ␯ ϭ␯0ϩNJ ␯ ; cf. Eq. ͑7͒. As long as the learning parameters do not depend on the J of any interdependence between weights and rates on nor- values, the rate of change of the average weight is obtained malization can be neglected in the case of weak correlations in the input, from Eqs. ͑15͒, ͑19͒, and k3ϭ0 ͑Sec. IV C͒,

0ϽQӶϪk . 23 ˙ av av Ϫ1 2 ͑ ͒ J ϭk1ϩNk2J ϩN QijJj . ͑20͒ ͚i, j The fixed point Jav in Eq. ͑21͒ and the time constant ␶av in * In the following we consider the situation at the beginning Eq. ͑22͒ are, then, almost independent of the average corre- av of the learning procedure where the set of weights ͕Ji͖ has lation Q , which is always of the same order as Q. not picked up any correlations with the set of Poisson inten- In Figs. 7 and 8 we show numerical simulations with in av sities ͕␭i ͖ yet and therefore is independent. We may then parameters as given in Appendix B. The average weight J av replace Ji and Qij on the right-hand side of Eq. ͑20͒ by their always approaches J , independent of any initial conditions av av Ϫ2 N * average values J and Q ϭN ͚i, jQij , respectively. The in the distribution of weights. 4506 KEMPTER, GERSTNER, AND van HEMMEN PRE 59

Weight constraints lead to further conditions on learning may be enlarged is of order 1, if we take the upper bound to parameters be ␽ϭ(1ϩd)Jav , where dϾ0 is of order 1, which will be * We have seen that normalization is possible without a J assumed throughout what follows; cf. Appendix D. dependence of the learning parameters. Even if the average weight Jav approaches a fixed point Jav , there is no restric- C. Structure formation * tion for the size of individual weights, apart from Jiу0 for av In our simple model with two groups of input, structure excitatory synapses and J ՇNJ . This means that a single str i * formation can be measured by the difference J between the weight at most comprises the total ͑normalized͒ weight of all average synaptic strength in groups 1 and 2 ; cf. Sec. N synapses. The latter case is, however, unphysiological, V A. We derive conditions under whichM thisM difference in- since almost every neuron holds many synapses with nonva- creases during learning. In the course of the argument we nishing efficacies ͑weights͒ and efficacies of biological syn- also show that structure formation takes place on a time scale apses seem to be limited. We take this into account in our ␶str considerably slower than the time scale ␶av of normaliza- learning rule by introducing a hard upper bound ␽ for each tion. individual weight. As we will demonstrate, a reasonable We start from Eq. ͑15͒ with k3ϭ0 and randomly distrib- value of ␽ does not influence normalization in that Jav re- * uted weights. For the moment we assume that normalization mains unchanged. However, an upper bound ␽Ͼ0, whatever has already taken place. Furthermore, we assume small cor- its value, leads to further constraints on the learning param- relations as in Eq. ͑23͒, which assures that the fixed point eters. JavϷϪk /(Nk ) is almost constant during learning; cf. Eqs. * 1 2 To incorporate the restricted range of individual weights ͑21͒ and ͑C1͒. If the formation of any structure in ͕J ͖ is into our learning rule ͑1͒, we assume that we can treat the i in out slow as compared to normalization, we are allowed to use learning parameters w ,w , and the amplitude of W to be JavϭJav during learning. The consistency of this ansatz is constant in the range 0рJiр␽. For JiϽ0orJiϾ␽, we take * in out checked at the end of this section. w ϭw ϭWϭ0. In other words, we use Eq. ͑15͒ only be- The average weight in each of the two groups and tween the lower bound 0 and the upper bound ␽ and set M1 2 is dJi /dtϭ0ifJiϽ0orJiϾ␽. M Because of lower and upper bounds for each synaptic 1 1 av J͑1͒ϭ J and J͑2͒ϭ J . ͑26͒ weight, 0рJiр␽ for all i,arealizable fixed point J has to ͚ i ͚ i * M 1 i෈ M 2 i෈ be within these limits. Otherwise all weights saturate either M1 M2 at the lower or at the upper bound. To avoid this, we first of If lower and upper bounds do not influence the dynamics of all need JavϾ0. Since ␶avϭJav/k in Eq. ͑22͒ must be posi- * * 1 each weight, the corresponding rates of change are tive for stable fixed points, we also need k1Ͼ0. The meaning becomes transparent from Eq. ͑14͒ in the case of vanishing J˙ ͑1͒ϭk ϩM J͑1͒k ϩM J͑2͒k , spontaneous activity in the output, ␯0ϭ0. Then k1Ͼ0 re- 1 1 2 2 2 duces to ͑27͒ ˙ ͑2͒ ͑2͒ ͑1͒ J ϭk1ϩM 2J ͑k2ϩQ͒ϩM 1J k2 . winϾ0, ͑24͒ One expects the difference JstrϭJ(2)ϪJ(1) between those av- which corresponds to neurobiological reality ͓28,56,31͔. 
erage weights to grow during learning because group A second condition for a realizable fixed point arises M2 av receives a stronger reinforcement than 1 . Differentiating from the upper bound ␽ϾJ . This requirement leads to str M av * J with respect to time, using Eq. ͑27͒ and the constraint k2ϽϪk1 /(N ␽)ϪQ . Exploiting only k2Ͻ0, we find from av av Ϫ1 (1) (2) J ϭJ ϭN (M 1J ϩM2J ), we find the rate of growth Eq. ͑14͒ that woutϩW˜ (0)␯inϽ0, which means that postsyn- * aptic spikes on average reduce the total weight of synapses. M 1 M 2 This is one of our predictions that can be tested experimen- J˙ strϭ QJstrϩM QJav . ͑28͒ N 2 * tally. Assuming W˜ (0)Ͼ0, which seems reasonable — with the benefit of hindsight — in terms of rate-coded learning in the manner of Hebb ͑Sec. III͒, we predict The first term on the right-hand side gives rise to an expo- nential increase (QϾ0) while the second term gives rise to a str woutϽ0, ͑25͒ linear growth of J . Equation ͑28͒ has an unstable fixed point at JstrϭϪN/M Jav . Note that Jstr is always negative * 1 * * which has not been verified by experiments yet. and independent of Q. str Weight constraints do not influence the position of the We associate the time constant ␶ of structure formation str fixed point Jav ͑as long as it remains realizable͒ but may with the time that is necessary for an increase of J from a * str enlarge the value of the time constant ␶av of normalization typical initial value to its maximum. The maximum of J is av of order Jav if M /M is of order 1 ͑Sec. V A͒ and if ␽ ͑see details in Appendix D͒. The time constant ␶ changes * 1 2 ϭ(1ϩd) Jav , where dϾ0 is of order 1 ͑Sec. V B͒.Atthe because weights saturated at the lower ͑upper͒ bound cannot * contribute to a decrease ͑increase͒ of Jav. If fewer than the beginning of learning (tϭ0) we may take Jstr(0)ϭ0. Using total number of weights add to our ͑subtractive͒ normaliza- this initial condition, an integration of Eq. ͑28͒ leads to Jstr(t)ϭ(N/M ) Jav͓exp(tM M Q/N)Ϫ1͔. With tϭ␶str and tion, then the fixed point is approached more slowly; cf. Fig. 1 * 1 2 8, dashed line and insets. The factor, however, by which ␶av Jstr(␶str)ϭJav we obtain ␶strϭN/(M M Q) ln(M /Nϩ1). * 1 2 1 PRE 59 HEBBIAN LEARNING AND SPIKING NEURONS 4507

FIG. 10. The asymptotic distribution of weights ͕Ji͖ at time t ϭ105 s; signatures are as in Fig. 7. This distribution is the final result of the numerical simulation shown in Fig. 9 and remains stable thereafter apart from minor rapid fluctuations. All but one of the synapses are saturated either at the lower or at the upper bound. FIG. 9. Temporal evolution of average weights Jav, J(1), and In stable weight distributions, it is now shown that all J(2) as a function of the learning time t in units of 104 s. The quantity Jav is the average weight of all synapses, J(1) and J(2) are synapses but one are saturated either at the lower or the average weights in groups and , respectively. Synapses i in upper bound. In the scenario of Fig. 10, the k3 term keeps a M1 M2 single weight J in group at the upper bound ␽, even group , where 1рiр25, receive incoherent input, whereas syn- m1 1 M1 M apses i in group , where 26рiр50, are driven by a coherently though there is a nonsaturated one J in . Group 2 m2 2 2 modulated inputM intensity. Parameters are as given in Appendix B. M M ͑in contrast to 1) comprises most of the total weight and is Simulations started at time tϭ0 with a homogeneous weight distri- M driven by positively correlated input. Why does Jm not bution J ϭ␽ϭ0.1 for all i. The normalization of the average 1 i decrease in favor of J ? The answer comes from Eq. weights takes place within a time of order O(100 s); see also the m2 str ͑15͒. Weight J receives a stronger reinforcement than uppermost full line in Fig. 8. On the time scale of ␶ ϭ2.93 m1 4 ϫ10 s a structure in the distribution of weights emerges in that J ,ifJ˙ ϾJ˙ holds. Using Eq. ͑15͒ we find J (2) (1) av m2 m1 m2 m1 J grows at the expense of J . The average weight J remains av av Ϫ2 ϾQ/k ͚ J ϩJ . Approximating ͚ J ϭNJ almost unaffected near J ϭ2ϫ10 ͑dashed line͒. The slight en- 3 j෈ 2 j m2 j෈ 2 j * M M * largement of Jav between tϭ104 s and tϭ7ϫ104 s can be ex- ϭϪk /k we obtain J ϾϪQk /(k k )ϩJ . This con- 1 2 m1 1 2 3 m2 plained by using Eq. ͑C1͒ and taking also the k term into account. 3 dition is fulfilled because Jm Ϸ0.1, Jm Ϸ0.04 ͑cf. Fig. 10͒, The insets signatures as in Figs. 7 and 8 show the weight distri- 1 2 ͑ ͒ and ϪQk /(k k )Ϸ0.01 ͑cf. Table I͒; here k Ͼ0, k Ͻ0, butions at times tϭ103,104, 2.93ϫ104, and 7ϫ104 s ͑arrows͒. 1 2 3 1 2 and k3Ͼ0. Since we only need an estimate of ␶str we drop the logarithm, VI. NOISE which is of order 1. Finally, approximating N/(M1M2)by 1/N we arrive at the estimate In this section we discuss the influence of noise on the

str Ϫ1 evolution of each weight. Noise may be due to jitter of input ␶ ϭ͑NQ͒ . ͑29͒ and output spikes and the fact that we deal with spikes per se ͑Sec. II B͒. This gives rise to a random walk of each weight We could adopt a refined analysis similar to the one we have around the mean trajectory described by Eq. ͑15͒. The vari- used for Jav to discuss the effects of the upper and lower ance var J (t) of this random walk increases linearly with bounds for individual weights. We will not do so, however, i time as it does in free diffusion. From the speed of the vari- since the result ͑29͒ suffices for our purpose: the comparison ance increase we derive a time scale ␶noise. A comparison of time constants. with the time constant ␶str of structure formation leads to A comparison of ␶av in Eq. ͑22͒ with ␶str in Eq. ͑29͒ further constraints on our learning parameters and shows shows that we have a separation of the fast time scale of that, in principle, any correlation in the input, however weak, normalization from the slow time scale of structure forma- can be learned, if there is enough time available for learning. tion, if Eq. ͑23͒ holds. The calculation of var J is based on four approximations. A numerical example confirming the above theoretical i First, we neglect upper and lower bounds of the learning considerations is presented in Fig. 9. Simulation parameters dynamics as we have done for the calculation of the time are as given in Appendix B. constants of normalization ͑Sec. V B͒ and structure forma- tion ͑Sec. V C͒. Second, we neglect spike-spike correlations D. Stabilization of learning between input and output and work directly with ensemble- Up to this point we have neglected the influence of the k3 averaged rates. As we have seen, spike-spike correlations term in Eq. ͑15͒, which may lead to a stabilization of weight show up in the term k3 in Eq. ͑15͒ and have little influence distributions, in particular when synapses are few and strong on learning, given many, weak synapses and an appropriate ͓22,54͔; cf. Sec. IV D. This is the case, for example, in the scenario for our learning parameters; cf. Sec. IV C. Third, we in in scenario of Fig. 10, which is the final result of the simula- assume constant input rates ␭i (t)ϭ␯ for all i. A temporal tions described in Fig. 9. The shown weight distribution is structure in the input rates is expected to play a minor role stable so that learning has terminated apart from minor rapid here. Fourth, as a consequence of constant input rates we fluctuations due to noise. assume a constant output rate ␭out(t)ϭ␯out. 4508 KEMPTER, GERSTNER, AND van HEMMEN PRE 59

Despite such a simplified approach, we can study some interesting effects caused by neuronal spiking. Within the limits of our approximations, input and output spikes are generated by independent Poisson processes with constant intensities. The variance var Ji(t) increases basically because of shot noise at the synapses. We now turn to the details.

A. Calculation of the variance

We start with some weight Ji(t0) at time t0 and calculate 2 2 the variance var Ji(t)ª͗Ji ͘(t)Ϫ͗Ji͘ (t) as a function of t for tϾt0 . Angular brackets ͗͘ again denote an ensemble average; cf. Sec. II B. A detailed analysis is outlined in Ap- FIG. 11. Influence of noise. We compare numerical results for pendix E. The result is the evolution of the variance var͕Ji͖(t) defined in Eq. ͑32͒͑full var J ͑t͒ϭ͑tϪt ͒ D for tϪt ӷ , ͑30͒ lines͒ with our theoretical prediction ͑dashed line͒ based on Eq. i 0 0 W ͑33͒ with DЈϭ1.47ϫ10Ϫ9 sϪ1. Learning starts at time tϭ0 with a av where is the width of the learning window W cf. Sec. homogeneous distribution, JiϭJ ϭ0.02 for all i. The thin line cor- ͑ * av II C͒ andW responds to the simulation of Fig. 8 with initial condition J ϭ0.02, viz., two groups of 25 synapses each. The thick line has in been obtained with incoherent input for all 50 synapses, ␭i (t) Dϭ␯in͑win͒2ϩ␯out͑wout͒2ϩ␯in␯out ͵ ds W͑s͒2 ϭ␯in for all i ͑all other parameters in Appendix B being equal͒. Because the number of synapses is finite (Nϭ50), deviations from ϩ␯in␯outW˜ ͑0͒ ͓2͑winϩwout͒ϩW˜ ͑0͒͑␯inϩ␯out͔͒. the dashed straight line are due to fluctuations. The overall agree- ment between theory and simulation is good. ͑31͒ var ͕Ji͖(t) as long as upper and lower bounds have not been Thus because of Poisson spike arrival and stochastic output reached. Furthermore, all synapses ‘‘see’’ the same spike firing with disjoint intervals being independent, each weight train of the postsynaptic neuron they belong to. In contrast to Ji undergoes a diffusion process with diffusion constant D. that, input spikes at different synapses are independent. To discuss the dependence of D upon the learning param- Again we assume that input and output spikes are indepen- eters, we restrict our attention to the case ␯inϭ␯out in Eq. dent; cf. the second paragraph at the beginning of Sec. VI. ͑31͒. Since mean input and output rates in biological neurons Combining the above arguments, we obtain the diffusion typically are not too different, this makes sense. Moreover, constant DЈ by simply setting woutϭ0 and disregarding the we do not expect that the ratio ␯in/␯out is a critical parameter. term ͓␯inW˜ (0)͔2␯out in Eq. ͑31͒, which leads to out in in in 2 in out 2 in 2 We recall from Sec. V B that ␯ ϭϪk1 /k2 ␯ once the DЈϭ␯ (w ) ϩ␯ ␯ ͓͐ds W(s) ϩ2 w W˜ (0)ϩW˜ (0) ͔. av weights are already normalized and if ␯0ϭQ ϭ0. With The boundaries of validity of Eq. ͑33͒ are illustrated in Fig. in out ␯ ϭ␯ this is equivalent to k1ϭϪk2 . Using the definition 12. in out ˜ in of k1 and k2 in Eq. ͑14͒ we find w ϩw ϭϪW(0) ␯ .If we insert this into Eq. ͑31͒, the final term vanishes. In what B. Time scale of diffusion remains of Eq. ͑31͒ we identify the contributions due to input The effects of shot noise in input and output show up on spikes, ␯in (win)2, and output spikes, ␯out (wout)2. Weight a time scale ␶noise which may be defined as the time interval changes because of correlations between input and output necessary for an increase of the variance ͑30͒ from in out 2 var J (t )ϭ0 to var J (t ϩ␶noise)ϭ(Jav)2. We chose Jav as a spikes enter Eq. ͑31͒ via ␯ ␯ ͐ds W(s) . i 0 i 0 * * Equation ͑30͒ describes the time course of the variance of reference value because it represents the available range for each weight. From Eq. ͑30͒ we obtain ␶noiseϭ(Jav)2/D.We a single weight. Estimating var Ji is numerically expensive * use JavϭϪk /(Nk ) from Eq. ͑21͒ and Qavϭ0. This yields because we have to simulate many independent learning tri- * 1 2 als. 
It is much cheaper to compute the variance of the distri- 2 1 k1 bution ͕Ji͖ of weights in a single learning trial. For the sake ␶noiseϭ . ͑34͒ 2 of a comparison of theory and numerics in Fig. 11, we plot N Dͩ k2 ͪ

N 1 C. Comparison of time scales var J t J t ϪJav t 2, 32 ͕ i͖͑ ͒ª ͚ ͓ i͑ ͒ ͑ ͔͒ ͑ ͒ noise NϪ1iϭ1 We now compare ␶ in Eq. ͑34͒ with the time constant ␶strϭ1/(NQ) of structure formation as it appears in Eq. ͑29͒. which obeys a diffusion process with The ratio

noise 2 var ͕Ji͖͑t͒ϭ͑tϪt0͒ DЈ, ͑33͒ ␶ Q k1 ϭ ͑35͒ str ND k in a way similar to Eq. ͑30͒. The diffusion constant DЈ is, ␶ ͩ 2ͪ however, different from D because weights of single neurons should exceed 1 so as to enable structure formation ͑in the do not develop independently of each other. Each output out sense of Sec. V C͒. Otherwise weight diffusion due to noise spike triggers the change of all N weights by an amount w . spreads the weights between the lower bound 0 and the up- Therefore, output spikes do not contribute to a change of per bound ␽ and, consequently, destroys any structure. PRE 59 HEBBIAN LEARNING AND SPIKING NEURONS 4509

We note that D in Eq. (35) is quadratic in w^{in}, w^{out}, and W, whereas k_1, k_2, and Q are linear; cf. Eqs. (12), (14), (17), and (31). As a consequence, scaling w^{in}, w^{out}, and W(s) (or the learning parameter η) in Eq. (1) by a common factor γ changes the ratio of time constants in Eq. (35) by 1/γ without affecting the (normalized) mean output rate and the fixed points J^{av}_* and J^{str}_* = -J^{av}_* N/M_1; cf. Eq. (21). Hence it is always possible to achieve τ^{noise}/τ^{str} > 1 by tuning γ. This means that any covariance matrix (11) that gives rise to Q > 0, however small, can be learned. More precisely, it can be learned if there is enough time for learning.

A reduction of γ also increases the time constant τ^{str} = 1/(NQ) of structure formation; cf. Eq. (29). If the learning time is limited, which may be the case in biological systems, only input with Q larger than some minimal value can be learned. Considering the learning parameters as fixed, we see that increasing the number of synapses, on the one hand, helps reduce the time τ^{str} necessary for learning but, on the other hand, decreases the ratio τ^{noise}/τ^{str} in Eq. (35), possibly below 1.

With parameters as given in Appendix B, the ratio (35) is 5.5. Therefore, the desired structure in Fig. 9 can emerge before noise spreads the weights at random.

FIG. 12. The variance var{J_i}(t) as in Fig. 11 but on a longer time scale. Thick line: all 50 synapses receive incoherent input. Thin line: two groups of synapses that are treated differently, as in Figs. 8, 9, and 10. The four insets (signatures as in Fig. 7) correspond to the thick-line scenario and show the evolution of the distribution of synaptic weights. As in Fig. 11, full lines are numerical results and the dashed line is our theoretical prediction. Both differ significantly for times t > 10^4 s. The reason is that Eq. (33) does not include the influence of correlations between input and output. Spike-spike correlations due to the k_3 term increase weights with a velocity proportional to their weight; cf. Eq. (15). Large weights, which are already present at times t ≳ 2×10^4 s (see inset), therefore grow at the expense of the smaller ones. This gives rise to an enlarged variance (thick full line). In the thin-line scenario, we also have the Q_ij term in Eq. (15), which contributes to an additional increase of var{J_i}. Finally, at t ≈ 10^5 s, var{J_i} saturates because most of the weights are either at the lower or at the upper bound.

VII. DISCUSSION

Changes of synaptic efficacies are triggered by the relative timing of presynaptic and postsynaptic spikes [25,26]. The learning rule (1) discussed in this paper is a first step towards a description and analysis of the effects of synaptic changes with single-spike resolution. Our learning rule can be motivated by elementary dynamic processes at the level of the synapse [54,57] and can also be implemented in hardware; cf. [40]. A phenomenological model of the experimental effects which is close to the model studied in the present paper has been introduced [42]. A compartmental model of the biophysics and ion dynamics underlying spike-based learning along the lines of [58] has not been attempted yet. As an alternative to changing synaptic weights, spike-based learning rules which act directly on the delays may also be considered [59–61].

The learning rule (1) discussed in the present paper is rather simple and contains only terms that are linear and quadratic in the presynaptic and postsynaptic spikes ("Hebbian" learning). This simple mathematical structure, which is based on experiment [25,26,30], has allowed us to derive analytical results and identify some key quantities.

First of all, if the input signal contains no correlations with the output at the spike level, and if we use a linear Poissonian neuron model, the spike-based learning rule reduces to Eq. (15), which is closely reminiscent of Linsker's linear learning equation for rate coding [3]. The only difference is an additional term k_3, which is not accounted for by pure rate models. It is caused by precise temporal correlations between an output spike and an input spike that has triggered the pulse. This additional term reinforces synapses that are already strong and hence helps to stabilize existing synapse configurations.

In the limit of rate coding, the form of the learning window W is not important; only the integral ∫ds W(s) counts: ∫ds W(s) > 0 would be called "Hebbian," while ∫ds W(s) < 0 is sometimes called "anti-Hebbian" learning. In general, however, input rates may be modulated on a fast time scale or contain correlations at the spike level. In this case, the shape of the learning window does matter. A learning window with a maximum at s* < 0 (thus maximal increase of the synaptic strength for a presynaptic spike preceding a postsynaptic spike; cf. Fig. 6) picks up the correlations in the input. In this case a structured distribution of synaptic weights may evolve [22].

The mathematical approach developed in this paper leads to a clear distinction between different time scales. First, the fastest time scale is set by the time course of the postsynaptic potential ε and the learning window W. Correlations in the input may occur on the same fast time scale, but they can also be slower or faster; there is no restriction. Second, learning occurs on a much slower time scale and in two phases: (i) an intrinsic normalization of the total synaptic weight and the output firing rate, followed by (ii) structure formation. Third, if the learning rate is small enough, then diffusion of the weights due to noise is slow as compared to structure formation. In this limit, the learning process is described by the differential equation (4) for the expected weights.

Normalization is possible if at least w^{in} > 0 and w^{out} < 0 for ∫ds W(s) > 0 in Eq. (1) ("Hebbian" learning). In this case, the average weight may decay exponentially to a fixed point, though there is no decay term for individual weights. In other words, normalization is an intrinsic property, since we do not invoke multiplicative or subtractive rescaling of weights after each learning step [2,7,55].
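The scaling argument of Sec. VI C can be restated compactly. The lines below rely only on the statement above that k_1, k_2, and Q are linear, and D quadratic, in w^{in}, w^{out}, and W:

(w^{\rm in},\,w^{\rm out},\,W) \;\to\; \gamma\,(w^{\rm in},\,w^{\rm out},\,W)
\quad\Longrightarrow\quad
k_{1,2} \to \gamma\,k_{1,2},\qquad Q \to \gamma\,Q,\qquad D \to \gamma^{2} D,

so that

\tau^{\rm str} = \frac{1}{NQ} \;\to\; \frac{\tau^{\rm str}}{\gamma},
\qquad
\frac{\tau^{\rm noise}}{\tau^{\rm str}} = \frac{Q}{ND}\left(\frac{k_1}{k_2}\right)^{2} \;\to\; \frac{1}{\gamma}\,\frac{\tau^{\rm noise}}{\tau^{\rm str}},

while the fixed point J^{av}_* is left unchanged, since it depends only on ratios of k_1, k_2, and Q^{av} (cf. Eq. (21) and Appendix C). A smaller γ therefore slows learning down as a whole but shifts the balance between structure formation and diffusive spreading in favor of structure formation.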

The fluctuations due to noise have been treated rather crudely in the present paper. In principle, it should be possible to include the effects of noise directly at the level of the differential equation, as is standard in statistics [62]. Such an approach would then lead to a Fokker-Planck equation for the evolution of weights as discussed in [63]. All this is in principle straightforward but in practice very cumbersome.

Finally, we emphasize that we have used a crudely oversimplified neuron model, viz., a linear stochastic unit. In particular, there is neither a spike emission threshold nor a reset or spike afterpotential. Poisson firing is not as unrealistic as it may at first seem, though. Large networks of integrate-and-fire neurons with stochastic connectivity exhibit Poisson-like firing [64]. Experimental spike interval distributions are also consistent with Poisson firing [65]. In the present paper, the simple Poisson model has been chosen so as to grasp the mathematics and get an explicit expression for the correlation between input and output spikes. The formulation of the learning rule and the derivation of the learning equation (4) are general and hold for any neuron model. The calculation of the correlations which enter the definition of the parameter Q_ij in Eq. (15) is, however, much more difficult if a nonlinear neuron model is used.

Spike-based Hebbian learning has important implications for the question of neural coding since it allows us to pick up and stabilize fast temporal correlations [38,22,41]. A better understanding of spike-triggered learning may thus also contribute to a resolution of the problem of neural coding [17,19,65–67].

ACKNOWLEDGMENTS

The authors thank Christian Leibold for helpful comments and a careful reading of the manuscript. R.K. has been supported by the Deutsche Forschungsgemeinschaft (DFG) under Grant Nos. He 1729/8-1 and Kl 608/10-1 (FG Hörobjekte). The authors also thank the DFG for travel support (Grant No. He 1729/8-1).

APPENDIX A: PROOF OF EQ. (9)

In proving Eq. (9) there is no harm in setting ν_0 = 0. We then have to compute the average

\langle S_i^{\rm in}(t+s)\,S^{\rm out}(t)\rangle = \Big\langle S_i^{\rm in}(t+s)\Big[f_i(t)+\sum_{j(\neq i)} f_j(t)\Big]\Big\rangle ,   (A1)

where f_i(t) = J_i(t)\sum_f \epsilon(t-t_i^f), with the upper index f ranging over the firing times t_i^f < t of neuron i, which has an axonal connection to synapse i; here 1 ≤ i ≤ N. Since ε is causal, i.e., ε(s) ≡ 0 for s < 0, we can drop the restriction t_i^f < t. The synapses being independent of each other, the sum over j (≠ i) is independent of S_i^{in} and thus we obtain

\Big\langle S_i^{\rm in}(t+s)\Big[\sum_{j(\neq i)} f_j(t)\Big]\Big\rangle
= \langle S_i^{\rm in}\rangle(t+s)\Big[\sum_{j(\neq i)} \langle f_j\rangle(t)\Big]
= \lambda_i^{\rm in}(t+s)\Big[\sum_{j(\neq i)} J_j(t)\int_0^{\infty} dt'\,\epsilon(t')\,\lambda_j^{\rm in}(t-t')\Big] .   (A2)

The term ⟨S_i^{in}(t+s) f_i(t)⟩ in Eq. (A1) has to be handled with more care as it describes the influence of synapse i on the firing behavior of the postsynaptic neuron,

\langle S_i^{\rm in}(t+s)\,f_i(t)\rangle = \Big\langle\Big[\sum_{f'}\delta(t+s-t_i^{f'})\Big]\Big[J_i(t)\sum_f \epsilon(t-t_i^f)\Big]\Big\rangle .   (A3)

The first term on the right in Eq. (A3) samples spike events at time t+s. To be mathematically precise, we sample all spikes in a small interval of size Δt around t+s, average, and divide by Δt. We replace the first sum in Eq. (A3) by the (approximate) identity (Δt)^{-1} 1_{\{spike in [t+s, t+s+Δt)\}}, where 1_{\{\}} is the indicator function of the set {}; i.e., it equals 1 when its argument is in the set and 0 elsewhere. Because the postsynaptic potential ε is a continuous function, we approximate the second sum by \sum_k 1_{\{spike in [t_k, t_k+Δt)\}}\,\epsilon(t-t_k), where {[t_k, t_k+Δt), k ∈ Z} is a decomposition of the real axis. Since it is understood that Δt → 0, all events with two or more spikes in an interval [t_k, t_k+Δt) have a probability o(Δt) and, hence, can be neglected. It is exactly this property that is typical of a Poisson process, and of any biological neuron.

What we are going to compute is the correlation between S_i^{in}, the input at synapse i, and the output S^{out}, which is governed by all synapses, including synapse i. Here the simplicity of the linear Poissonian neuron model pays off, as S^{out} is linear in the sum of the synaptic inputs and, hence, in each of them. Furthermore, whatever the model, the synaptic efficacies J_i(t) are changing adiabatically with respect to the neuronal dynamics so that they can be taken to be constant and, thus, out of the average. In the limit Δt → 0 we can therefore rewrite the right-hand side of Eq. (A3) so as to find

J_i(t)\,(\Delta t)^{-1}\sum_k \epsilon(t-t_k)\,\big\langle 1_{\{spike in [t+s, t+s+\Delta t)\}}\,1_{\{spike in [t_k, t_k+\Delta t)\}}\big\rangle .   (A4)

Without restriction of generality we can choose our partition so that t_k = s + t for some k, say k = l. Singling out k = l, the rest (k ≠ l) can be averaged directly, since events in disjoint intervals are independent. Because ⟨1_{\{\}}⟩ = prob{spike in [t+s, t+s+Δt)} = λ_i^{in}(t+s) Δt, the result is J_i(t) λ_i^{in}(t+s) Λ_i^{in}(t), where we have used Eq. (8). As for the term k = l, we plainly have 1_{\{\}}^2 = 1_{\{\}}, as an indicator function assumes only two distinct values, 0 and 1. We obtain J_i(t) λ_i^{in}(t+s) ε(-s).

Collecting terms and incorporating ν_0 ≠ 0, we find Eq. (9).
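To make the result of this appendix concrete, the following Python sketch (added here; not part of the paper) simulates a linear Poissonian neuron with N = 50 equal weights J_i = 2×10^{-2} driven at λ^{in} = 10 Hz and with the EPSP of Eq. (B4), values taken from Appendix B, and compares the estimated input-output correlation with a flat baseline λ^{in} ν^{out} plus the spike-spike term J_i λ^{in} ε(-s) obtained above; the time step, duration, and lag range are arbitrary simulation choices.

import numpy as np

rng = np.random.default_rng(1)

N, J = 50, 2e-2                  # number of synapses, common weight J_i = J (Appendix B)
lam_in = 10.0                    # input rate per synapse (Hz)
tau0 = 10e-3                     # EPSP time constant (s), cf. Eq. (B4)
dt, T = 1e-3, 2000.0             # time step and duration (s), arbitrary choices
nbins = int(T / dt)

def eps(s):                      # EPSP of Eq. (B4): (s/tau0^2) exp(-s/tau0) for s > 0
    return np.where(s > 0.0, s / tau0**2 * np.exp(-s / tau0), 0.0)

spikes_in = np.empty((N, nbins), dtype=bool)
for i in range(N):               # Bernoulli approximation of Poisson spike trains
    spikes_in[i] = rng.random(nbins) < lam_in * dt

kernel = eps(np.arange(1, 200) * dt) * dt            # discretized EPSP
drive = np.convolve(spikes_in.sum(axis=0), kernel)[:nbins]
lam_out = J * drive                                  # linear Poisson intensity
spikes_out = rng.random(nbins) < lam_out * dt

lags = np.arange(-50, 51)                            # lag s = l*dt
out_idx = np.nonzero(spikes_out)[0]
out_idx = out_idx[(out_idx >= lags.max()) & (out_idx < nbins - lags.max())]
counts = np.array([spikes_in[:, out_idx + l].sum() for l in lags])
corr = counts / (N * T * dt)                         # estimate of <S_i^in(t+s) S^out(t)>

nu_out = spikes_out.mean() / dt                      # close to N*J*lam_in = 10 Hz here
theory = lam_in * nu_out + J * lam_in * eps(-lags * dt)   # baseline plus spike-spike term
for l in range(0, len(lags), 10):
    print("s = %6.1f ms   simulated %8.1f   predicted %8.1f"
          % (lags[l] * dt * 1e3, corr[l], theory[l]))

For s > 0 (input after output) the estimate stays at the baseline, whereas for s < 0 it shows a bump with the shape of ε(-s); this is the kind of spike-spike correlation that the k_3 term of Eq. (15) picks up (cf. the Discussion).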

APPENDIX B: PARAMETERS FOR NUMERICAL SIMULATIONS

We discuss the parameter regime of the simulations as shown in Secs. V and VI. Numerical values and important derived quantities are summarized in Table I.

1. Learning window

We use the learning window

W(s) = \eta \begin{cases} \exp(s/\tau^{\rm syn})\,\big[A_+\,(1-s/\tilde\tau_+) + A_-\,(1-s/\tilde\tau_-)\big] & {\rm for}\ s \le 0, \\ A_+\exp(-s/\tau_+) + A_-\exp(-s/\tau_-) & {\rm for}\ s > 0. \end{cases}   (B1)

Here s is the delay between presynaptic spike arrival and postsynaptic firing, η is a "small" learning parameter, and τ^{syn}, τ_+, τ_-, \tilde\tau_+ := τ^{syn}τ_+/(τ^{syn}+τ_+), and \tilde\tau_- := τ^{syn}τ_-/(τ^{syn}+τ_-) are time constants. The dimensionless constants A_+ and A_- determine the strength of synaptic potentiation and depression, respectively. Numerical values are η = 10^{-5}, τ^{syn} = 5 ms, τ_+ = 1 ms, τ_- = 20 ms, and A_+ = 1, A_- = -1; cf. Table I. The learning window (cf. Fig. 6) is in accordance with experimental results [25,26,28,29,33]. A detailed explanation of our choice of the learning window on a microscopical basis of Hebbian learning can be found elsewhere [54,57].

For the analysis of the learning process we need the integrals \tilde W(0) := ∫ds W(s) and ∫ds W(s)^2. The numerical result is listed in Table I. Using c_+ := τ^{syn}/τ_+ and c_- := τ^{syn}/τ_- we obtain

\int ds\,W(s) = \eta\,\tau^{\rm syn}\,\big[A_-\,(2+c_-+c_-^{-1}) + A_+\,(2+c_++c_+^{-1})\big]   (B2)

and

\int ds\,W(s)^2 = \frac{\eta^2}{4}\Big\{A_-^2\,\tau_-\,[c_-^3+4c_-^2+5c_-+2] + A_+^2\,\tau_+\,[c_+^3+4c_+^2+5c_++2] + 2A_+A_-\,\tau^{\rm syn}\,[c_+c_-+2(c_++c_-)+5+4/(c_++c_-)]\Big\} .   (B3)

2. Postsynaptic potential

We use the excitatory postsynaptic potential (EPSP)

\epsilon(s) = \frac{s}{\tau_0^2}\,\exp(-s/\tau_0)\,\mathcal{H}(s) ,   (B4)

where \mathcal{H}(\cdot) denotes the Heaviside step function, and ∫ds ε(s) = 1. For the membrane time constant we use τ_0 = 10 ms, which is reasonable for cortical neurons [68,69]. The EPSP has been plotted in Fig. 4. Using Eqs. (B1) and (B4) we obtain

\int ds\,W(s)\,\epsilon(-s) = \eta\,\frac{(\tau^{\rm syn})^2}{(\tau^{\rm syn}+\tau_0)^3}\,\big[A_-\,(2\tau^{\rm syn}\tau_0/\tau_- + \tau^{\rm syn} + 3\tau_0) + A_+\,(2\tau^{\rm syn}\tau_0/\tau_+ + \tau^{\rm syn} + 3\tau_0)\big] .   (B5)

3. Synaptic input

The total number of synapses is N = 50. For the 1 ≤ i ≤ M_1 = 25 synapses in group M_1 we use a constant input intensity λ_i^{in}(t) = ν^{in}. The remaining M_2 = 25 synapses receive a periodic intensity, λ_i^{in}(t) = ν^{in} + δν^{in} cos(ωt) for i ∈ M_2; cf. also Sec. V A 3. Numerical parameters are ν^{in} = 10 Hz, δν^{in} = 10 Hz, and ω/(2π) = 40 Hz. For the comparison of theory and simulation we need the value of Q in Eq. (18). We numerically took the Fourier transforms of ε and W at the frequency ω. The time constant τ^{str} is calculated via Eq. (29); cf. Table I.
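The closed forms (B2), (B3), and (B5) are easy to cross-check numerically. The short Python sketch below (added here; not part of the paper) evaluates the window (B1) and the EPSP (B4) on a fine grid with the parameter values quoted above and compares simple quadrature with the analytical expressions; all times are in milliseconds and variable names are arbitrary.

import numpy as np

eta = 1e-5
tau_syn, tau_p, tau_m = 5.0, 1.0, 20.0          # tau^syn, tau_+, tau_- (ms)
A_p, A_m = 1.0, -1.0                            # A_+, A_-
tau0 = 10.0                                     # membrane time constant of Eq. (B4), ms
tt_p = tau_syn * tau_p / (tau_syn + tau_p)      # tilde tau_+
tt_m = tau_syn * tau_m / (tau_syn + tau_m)      # tilde tau_-

def W(s):                                        # learning window, Eq. (B1)
    s = np.asarray(s, dtype=float)
    out = np.empty_like(s)
    neg = s <= 0
    out[neg] = np.exp(s[neg] / tau_syn) * (A_p * (1 - s[neg] / tt_p) + A_m * (1 - s[neg] / tt_m))
    out[~neg] = A_p * np.exp(-s[~neg] / tau_p) + A_m * np.exp(-s[~neg] / tau_m)
    return eta * out

def eps(s):                                      # EPSP, Eq. (B4)
    return np.where(s > 0, s / tau0**2 * np.exp(-s / tau0), 0.0)

ds = 1e-3
s = np.arange(-400.0, 400.0, ds)                 # wide enough for all exponential tails

c_p, c_m = tau_syn / tau_p, tau_syn / tau_m
B2 = eta * tau_syn * (A_m * (2 + c_m + 1 / c_m) + A_p * (2 + c_p + 1 / c_p))
B3 = eta**2 / 4 * (A_m**2 * tau_m * (c_m**3 + 4 * c_m**2 + 5 * c_m + 2)
                   + A_p**2 * tau_p * (c_p**3 + 4 * c_p**2 + 5 * c_p + 2)
                   + 2 * A_p * A_m * tau_syn * (c_p * c_m + 2 * (c_p + c_m)
                                                + 5 + 4 / (c_p + c_m)))
B5 = (eta * tau_syn**2 / (tau_syn + tau0)**3
      * (A_m * (2 * tau_syn * tau0 / tau_m + tau_syn + 3 * tau0)
         + A_p * (2 * tau_syn * tau0 / tau_p + tau_syn + 3 * tau0)))

print("int W ds       quadrature %.4e   Eq. (B2) %.4e" % (np.sum(W(s)) * ds, B2))
print("int W^2 ds     quadrature %.4e   Eq. (B3) %.4e" % (np.sum(W(s)**2) * ds, B3))
print("int W eps(-s)  quadrature %.4e   Eq. (B5) %.4e" % (np.sum(W(s) * eps(-s)) * ds, B5))

With the values above, ∫ds W(s) comes out at 0.95 η τ^{syn} ≈ 4.75×10^{-5} ms, i.e., the window is "Hebbian" in the sense of the Discussion (∫ds W(s) > 0).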
4. Parameters w^{in}, w^{out}, ν_0, and ϑ

We use the learning parameters w^{in} = η and w^{out} = -1.0475 η, where η = 10^{-5}. The spontaneous output rate is ν_0 = 0 and the upper bound for synaptic weights is ϑ = 0.1. These values have been chosen in order to fulfill the following five conditions for learning. First, the absolute values of w^{in} and w^{out} are of the same order as the amplitude of the learning window W; cf. Fig. 6. Furthermore, these absolute values are small as compared to the normalized average weight (see below). Second, the constraints on k_1 and k_2 for a stable and realizable fixed point are satisfied; cf. Sec. V B and Eq. (14). Third, the correlations in the input are weak so that 0 < Q ≪ -k_2; cf. Eq. (23). This implies that the time scale τ^{av} of normalization in Eq. (22) is orders of magnitude smaller than the time scale τ^{str} of structure formation in Eq. (29); cf. also Table I. Fourth, the k_3 term in Eq. (14) can be neglected in the sense of Sec. IV C. To prove this, we note that the fixed point for the average weight is J^{av}_* = 2×10^{-2} [cf. Eq. (21)] and k_3 = 7.04×10^{-5} s^{-1}. We now focus on Eq. (16). Since Q_ij (≪ |k_2| for all i, j) can be neglected and J_i ≤ ϑ for all i, we find from Eq. (16) the even more restrictive condition N |k_2| J^{av}_*/ϑ ≫ |k_3|, which is fulfilled in our parameter regime. Fifth, input and output rates are identical for normalized weights, ν^{in} = ν^{out} for ν_0 = 0; see Sec. V B.

APPENDIX C: NORMALIZATION AND CORRELATIONS BETWEEN WEIGHTS AND INPUT RATES

The assumption of independence of the weights {J_i} and the rates {λ_i^{in}} used in Sec. V B for the derivation of a normalization property of Eq. (15) is not valid in general. During learning we expect weights to change according to their input. For the configuration of the input as introduced in Sec. V A this depends on whether synapses belong to group M_1 or M_2. To show that even under the condition of interdependence of {J_i} and {λ_i^{in}} there is a normalization property of Eq. (15) similar to that derived in Sec. V B, we investigate the most extreme case, in which the total mass of synaptic weight is, e.g., in M_2. Taking J_i = 0 for i ∈ M_1 into account, we replace N^{-1}\sum_{ij} J_j Q_{ij} in Eq. (20) by M_2^{-1} N^2 J^{av} Q^{av}. The fixed point J^{av}_* is similar to that in Eq. (21) except for a multiplicative prefactor N/M_2 of order 1 preceding Q^{av} in Eq. (21),

J^{\rm av}_{*} = -\,\frac{k_1}{N\,(k_2 + Q^{\rm av}\,N/M_2)} .   (C1)

Since N/M_2 > 1, k_1 > 0, and k_2 + Q^{av} N/M_2 < 0, J^{av}_* in Eq. (C1) is larger than J^{av}_* in Eq. (21), where we assumed independence of {J_i} and {λ_i^{in}}. Correlations between {J_i} and {λ_i^{in}} can be neglected, however, if we assume 0 < Q ≈ Q^{av} ≪ -k_2; cf. Eq. (23). In this case, J^{av}_* in Eqs. (21) and (C1) are almost identical and independent of Q^{av}.

APPENDIX D: NORMALIZATION AND WEIGHT CONSTRAINTS

Let us consider the influence of weight constraints (Sec. V B) on the position of the fixed point J^{av}_* in Eq. (21) and the time constant τ^{av} of normalization in Eq. (22). We call N_↓ and N_↑ the number of weights at the lower bound 0 and the upper bound ϑ > 0, respectively. By construction we have N_↓ + N_↑ ≤ N, where N is the number of synapses.

For example, if the average weight J^{av} approaches J^{av}_* from below, then only N - N_↑ weights can contribute to an increase of J^{av}. For the remaining N_↑ saturated synapses we have \dot J_i = 0. Deriving from Eq. (15) an equation equivalent to Eq. (20), we obtain \dot J^{av} = (1 - N_↑/N)(k_1 + N k_2 J^{av} + J^{av} Q^{av}/N). The fixed point J^{av}_* remains unchanged as compared to Eq. (21), but the time constant τ^{av} for an approach of J^{av}_* from below is increased by a factor (1 - N_↑/N)^{-1} ≥ 1 as compared to Eq. (22). Similarly, τ^{av} for an approach of J^{av}_* from above is increased by a factor (1 - N_↓/N)^{-1} ≥ 1.

The factor by which τ^{av} is increased is of order 1 if we use the upper bound ϑ = (1+d) J^{av}_*, where d > 0 is of order 1. If J^{av} = J^{av}_*, at most N_↑ = N/(1+d) synapses can saturate at the upper bound while comprising the total weight. The remaining N_↓ = N - N/(1+d) synapses are at the lower bound 0. The time constant τ^{av} is enhanced by at most 1 + 1/d and 1 + d for an approach of the fixed point from below and above, respectively.

APPENDIX E: RANDOM WALK OF SYNAPTIC WEIGHTS

We consider the random walk of a synaptic weight J_i(t) for t > t_0, where J_i(t_0) is some starting value. The time course of J_i(t) follows from Eq. (1),

J_i(t) = J_i(t_0) + \int_{t_0}^{t} dt'\,\big[w^{\rm in} S_i^{\rm in}(t') + w^{\rm out} S^{\rm out}(t')\big] + \int_{t_0}^{t} dt'\int_{t_0}^{t} dt''\,W(t''-t')\,S_i^{\rm in}(t'')\,S^{\rm out}(t') .   (E1)

For a specific i, the spike trains S_i^{in}(t') and S^{out}(t') are now assumed to be statistically independent and generated by Poisson processes with constant rates, ν^{in} for all i and ν^{out}, respectively; cf. Secs. II A and IV A. Here ν^{in} can be prescribed, whereas ν^{out} then follows; cf., for instance, Eq. (7). For large N the independence is an excellent approximation. The learning parameters w^{in} and w^{out} can be positive or negative. The learning window W is some quadratically integrable function with a width 𝒲 as defined in Sec. II C. Finally, it may be beneficial to realize that spikes are described by δ functions.

The weight J_i(t) is a stepwise constant function of time; see Fig. 2 (bottom). According to Eq. (E1), an input spike arriving at synapse i at time t changes J_i at that time by a constant amount w^{in} and a variable amount ∫_{t_0}^{t} dt' W(t-t') S^{out}(t'), which depends on the sequence of output spikes in the interval [t_0, t]. Similarly, an output spike at time t results in a constant weight change w^{out} and a variable one that equals ∫_{t_0}^{t} dt'' W(t''-t) S_i^{in}(t''). We obtain a random walk with independent steps but randomly variable step size. Suitable rescaling of this random walk leads to Brownian motion.

As in Sec. II B, we substitute s = t'' - t' in the second line of Eq. (E1) and extend the integration over the new variable s so as to run from -∞ to ∞. This does not introduce a big error for t - t_0 ≫ 𝒲. The second line of Eq. (E1) then reduces to ∫ds W(s) ∫_{t_0}^{t} dt' S_i^{in}(t'+s) S^{out}(t').

We denote ensemble averages by angular brackets ⟨ ⟩. The variance then reads

{\rm var}\,J_i(t) = \langle J_i^2\rangle(t) - \langle J_i\rangle^2(t) .   (E2)

To simplify the ensuing argument, upper and lower bounds for each weight are not taken into account.

For the calculation of the variance in Eq. (E2), first of all we consider the term ⟨J_i⟩(t). We use the notation ⟨S_i^{in}⟩(t) = ν^{in} and ⟨S^{out}⟩(t) = ν^{out} because of constant input and output intensities. Stochastic independence of input and output leads to ⟨S_i^{in}(t'+s) S^{out}(t')⟩ = ν^{in} ν^{out}. Using Eq. (E1) and ∫ds W(s) = \tilde W(0) we then obtain

\langle J_i\rangle(t) = J_i(t_0) + (t-t_0)\,\big[w^{\rm in}\nu^{\rm in} + w^{\rm out}\nu^{\rm out} + \nu^{\rm in}\nu^{\rm out}\,\tilde W(0)\big] .   (E3)
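As a plausibility check (not part of the original appendix), Eq. (E3) can be evaluated with the numbers of Appendix B: at the normalized operating point ν^{out} = ν^{in} = 10 Hz (fifth condition of Appendix B), with w^{in} = η, w^{out} = -1.0475 η, and \tilde W(0) = 0.95 η τ^{syn} = 4.75×10^{-3} η s from Eq. (B2),

w^{\rm in}\nu^{\rm in} + w^{\rm out}\nu^{\rm out} + \nu^{\rm in}\nu^{\rm out}\,\tilde W(0)
= (10 - 10.475 + 0.475)\,\eta\ {\rm s}^{-1} = 0 .

The linear drift of ⟨J_i⟩ thus vanishes at the normalized output rate, consistent with the intrinsic normalization of the average weight discussed in Sec. V B; away from this operating point the drift term in Eq. (E3) is nonzero.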

Next, we consider the term ⟨J_i^2⟩(t) in Eq. (E2). Using Eq. (E1) once again, we obtain for t - t_0 ≫ 𝒲

\langle J_i^2\rangle(t) = -J_i(t_0)^2 + 2 J_i(t_0)\,\langle J_i\rangle(t) + \int_{t_0}^{t} dt'\int_{t_0}^{t} du'\,\Big\{\langle S_i^{\rm in}(t')\,S_i^{\rm in}(u')\rangle\,(w^{\rm in})^2 + \langle S^{\rm out}(t')\,S^{\rm out}(u')\rangle\,(w^{\rm out})^2 + \langle S_i^{\rm in}(t')\,S^{\rm out}(u')\rangle\,2\,w^{\rm in} w^{\rm out} + 2\int ds\,W(s)\,\big[\langle S_i^{\rm in}(t')\,S_i^{\rm in}(u'+s)\,S^{\rm out}(u')\rangle\,w^{\rm in} + \langle S^{\rm out}(t')\,S_i^{\rm in}(u'+s)\,S^{\rm out}(u')\rangle\,w^{\rm out}\big] + \int ds\int dv\,W(s)\,W(v)\,\langle S_i^{\rm in}(t'+s)\,S_i^{\rm in}(u'+v)\,S^{\rm out}(t')\,S^{\rm out}(u')\rangle\Big\} .   (E4)

Since input and output were assumed to be independent, we get

\langle S_i^{\rm in} S_i^{\rm in} S^{\rm out} S^{\rm out}\rangle = \langle S_i^{\rm in} S_i^{\rm in}\rangle\,\langle S^{\rm out} S^{\rm out}\rangle ,
\langle S_i^{\rm in} S_i^{\rm in} S^{\rm out}\rangle = \langle S_i^{\rm in} S_i^{\rm in}\rangle\,\langle S^{\rm out}\rangle ,
\langle S^{\rm out} S^{\rm out} S_i^{\rm in}\rangle = \langle S^{\rm out} S^{\rm out}\rangle\,\langle S_i^{\rm in}\rangle .   (E5)

We note that ⟨S_i^{in}⟩ = ν^{in} for all i and ⟨S^{out}⟩ = ν^{out}. Input spikes at times t' and u' are independent as long as t' ≠ u'. In this case we therefore have ⟨S_i^{in}(t') S_i^{in}(u')⟩ = ν^{in} ν^{in}. For arbitrary times t' and u' we find (cf. Appendix A)

\langle S_i^{\rm in}(t')\,S_i^{\rm in}(u')\rangle = \nu^{\rm in}\,[\nu^{\rm in} + \delta(t'-u')] .   (E6)

Similarly, for the correlation between output spike trains we obtain

\langle S^{\rm out}(t')\,S^{\rm out}(u')\rangle = \nu^{\rm out}\,[\nu^{\rm out} + \delta(t'-u')] .   (E7)

Using Eqs. (E5), (E6), and (E7) in Eq. (E4), performing the integrations, and inserting the outcome together with Eq. (E3) into Eq. (E2), we arrive at

{\rm var}\,J_i(t) = (t-t_0)\,D ,   (E8)

where

D = \nu^{\rm in}(w^{\rm in})^2 + \nu^{\rm out}(w^{\rm out})^2 + \nu^{\rm in}\nu^{\rm out}\int ds\,W(s)^2 + \nu^{\rm in}\nu^{\rm out}\,\tilde W(0)\,\big[2\,(w^{\rm in}+w^{\rm out}) + \tilde W(0)\,(\nu^{\rm in}+\nu^{\rm out})\big] ,   (E9)

as announced.
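The drift (E3) and the diffusion constant (E9) can be checked by direct simulation of Eq. (E1). The Python sketch below (added here; not part of the paper) drives a single weight with two independent homogeneous Poisson trains at ν^{in} = ν^{out} = 10 Hz and uses the per-spike amounts w^{in} = η and w^{out} = -1.0475 η of Appendix B; purely for transparency of the integrals, a rectangular learning window of height η and width 10 ms replaces Eq. (B1), and the step size, duration, and trial count are arbitrary choices.

import numpy as np

rng = np.random.default_rng(2)

nu_in, nu_out = 10.0, 10.0            # rates (Hz)
w_in, w_out = 1e-5, -1.0475e-5        # per-spike weight changes (Appendix B)
eta, width = 1e-5, 10e-3              # window height and width (s)
Wtilde0 = eta * width                 # int W(s) ds   for the rectangular window
intW2 = eta**2 * width                # int W(s)^2 ds for the rectangular window

dt, T, trials = 1e-3, 100.0, 1000
nbins, nwin = int(T / dt), int(width / dt)

J_end = np.empty(trials)
for k in range(trials):
    S_in = rng.random(nbins) < nu_in * dt       # Bernoulli approximation of Poisson
    S_out = rng.random(nbins) < nu_out * dt
    J = w_in * S_in.sum() + w_out * S_out.sum()
    # spike-spike term: each output spike picks up eta for every input spike in the
    # preceding 10 ms, i.e. W(t_in - t_out) with t_in - t_out in (-width, 0)
    c0 = np.concatenate(([0], np.cumsum(S_in)))  # c0[i] = number of input spikes in bins < i
    out_idx = np.nonzero(S_out)[0]
    J += eta * (c0[out_idx] - c0[np.maximum(out_idx - nwin, 0)]).sum()
    J_end[k] = J

drift = w_in * nu_in + w_out * nu_out + nu_in * nu_out * Wtilde0              # Eq. (E3)
D = (nu_in * w_in**2 + nu_out * w_out**2 + nu_in * nu_out * intW2
     + nu_in * nu_out * Wtilde0 * (2 * (w_in + w_out) + Wtilde0 * (nu_in + nu_out)))  # Eq. (E9)
print("drift per unit time:  simulation %.2e   Eq. (E3) %.2e" % (J_end.mean() / T, drift))
print("diffusion constant:   simulation %.2e   Eq. (E9) %.2e" % (J_end.var() / T, D))

Because the window here differs from Eq. (B1), the numbers are not those of Table I; the point is only that the empirical drift and variance growth agree with Eqs. (E3) and (E8)/(E9) for whichever quadratically integrable window is inserted.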

[1] D.O. Hebb, The Organization of Behavior (Wiley, New York, 1949).
[2] C. von der Malsburg, Kybernetik 14, 85 (1973).
[3] R. Linsker, Proc. Natl. Acad. Sci. USA 83, 7508 (1986).
[4] R. Linsker, Proc. Natl. Acad. Sci. USA 83, 8390 (1986).
[5] D.J.C. MacKay and K.D. Miller, Network 1, 257 (1990).
[6] K.D. Miller, Neural Comput. 2, 321 (1990).
[7] K.D. Miller and D.J.C. MacKay, Neural Comput. 6, 100 (1994).
[8] S. Wimbauer, O.G. Wenisch, K.D. Miller, and J.L. van Hemmen, Biol. Cybern. 77, 453 (1997).
[9] S. Wimbauer, O.G. Wenisch, J.L. van Hemmen, and K.D. Miller, Biol. Cybern. 77, 463 (1997).
[10] D.J. Willshaw and C. von der Malsburg, Proc. R. Soc. London, Ser. B 194, 431 (1976).
[11] K. Obermayer, G.G. Blasdel, and K. Schulten, Phys. Rev. A 45, 7568 (1992).
[12] T. Kohonen, Self-Organization and Associative Memory (Springer, Berlin, 1984).
[13] L.M. Optican and B.J. Richmond, J. Neurophysiol. 57, 162 (1987).
[14] R. Eckhorn et al., Biol. Cybern. 60, 121 (1988).
[15] C.M. Gray and W. Singer, Proc. Natl. Acad. Sci. USA 86, 1698 (1989).
[16] C.E. Carr and M. Konishi, J. Neurosci. 10, 3227 (1990).
[17] W. Bialek, F. Rieke, R.R. de Ruyter van Steveninck, and D. Warland, Science 252, 1855 (1991).
[18] A.K. Kreiter and W. Singer, Eur. J. Neurosci. 4, 369 (1992).
[19] M. Abeles, in Models of Neural Networks II, edited by E. Domany, J.L. van Hemmen, and K. Schulten (Springer, New York, 1994), pp. 121–140.
[20] W. Singer, in Models of Neural Networks II, edited by E. Domany, J.L. van Hemmen, and K. Schulten (Springer, New York, 1994), pp. 141–173.
[21] J.J. Hopfield, Nature (London) 376, 33 (1995).
[22] W. Gerstner, R. Kempter, J.L. van Hemmen, and H. Wagner, Nature (London) 383, 76 (1996).
[23] R.C. deCharms and M.M. Merzenich, Nature (London) 381, 610 (1996).
[24] M. Meister, Proc. Natl. Acad. Sci. USA 93, 609 (1996).
[25] H. Markram, J. Lübke, M. Frotscher, and B. Sakmann, Science 275, 213 (1997).
[26] L.I. Zhang et al., Nature (London) 395, 37 (1998).
[27] B. Gustafsson et al., J. Neurosci. 7, 774 (1987).
[28] T.V.P. Bliss and G.L. Collingridge, Nature (London) 361, 31 (1993).
[29] D. Debanne, B.H. Gähwiler, and S.M. Thompson, Proc. Natl. Acad. Sci. USA 91, 1148 (1994).
[30] T.H. Brown and S. Chattarji, in Models of Neural Networks II, edited by E. Domany, J.L. van Hemmen, and K. Schulten (Springer, New York, 1994), pp. 287–314.
[31] C.C. Bell, V.Z. Han, Y. Sugawara, and K. Grant, Nature (London) 387, 278 (1997).
[32] V. Lev-Ram et al., Neuron 18, 1025 (1997).

[33] H.J. Koester and B. Sakmann, Proc. Natl. Acad. Sci. USA 95, 9596 (1998).
[34] L.F. Abbott and K.I. Blum, Cereb. Cortex 6, 406 (1996).
[35] A.V.M. Herz, B. Sulzer, R. Kühn, and J.L. van Hemmen, Europhys. Lett. 7, 663 (1988).
[36] A.V.M. Herz, B. Sulzer, R. Kühn, and J.L. van Hemmen, Biol. Cybern. 60, 457 (1989).
[37] J.L. van Hemmen et al., in Konnektionismus in Artificial Intelligence und Kognitionsforschung, edited by G. Dorffner (Springer, Berlin, 1990), pp. 153–162.
[38] W. Gerstner, R. Ritz, and J.L. van Hemmen, Biol. Cybern. 69, 503 (1993).
[39] R. Kempter, W. Gerstner, J.L. van Hemmen, and H. Wagner, in Advances in Neural Information Processing Systems 8, edited by D.S. Touretzky, M.C. Mozer, and M.E. Hasselmo (MIT Press, Cambridge, MA, 1996), pp. 124–130.
[40] P. Häfliger, M. Mahowald, and L. Watts, in Advances in Neural Information Processing Systems 9, edited by M.C. Mozer, M.I. Jordan, and T. Petsche (MIT Press, Cambridge, MA, 1997), pp. 692–698.
[41] B. Ruf and M. Schmitt, in Proceedings of the 7th International Conference on Artificial Neural Networks (ICANN'97), edited by W. Gerstner, A. Germond, M. Hasler, and J.-D. Nicoud (Springer, Heidelberg, 1997), pp. 361–366.
[42] W. Senn, M. Tsodyks, and H. Markram, in Proceedings of the 7th International Conference on Artificial Neural Networks (ICANN'97), edited by W. Gerstner, A. Germond, M. Hasler, and J.-D. Nicoud (Springer, Heidelberg, 1997), pp. 121–126.
[43] R. Kempter, W. Gerstner, and J.L. van Hemmen, in Advances in Neural Information Processing Systems 11, edited by M. Kearns, S.A. Solla, and D. Cohn (MIT Press, Cambridge, MA, in press).
[44] J.W. Lamperti, Probability, 2nd ed. (Wiley, New York, 1996).
[45] J.A. Sanders and F. Verhulst, Averaging Methods in Nonlinear Dynamical Systems (Springer, New York, 1985).
[46] R. Kempter, W. Gerstner, J.L. van Hemmen, and H. Wagner, Neural Comput. 10, 1987 (1998).
[47] T.J. Sejnowski and G. Tesauro, in Neural Models of Plasticity: Experimental and Theoretical Approaches, edited by J.H. Byrne and W.O. Berry (Academic Press, San Diego, 1989), Chap. 6, pp. 94–103.
[48] R. Ritz and T.J. Sejnowski, Curr. Opin. Neurobiol. 7, 536 (1997).
[49] M. Meister, L. Lagnado, and D.A. Baylor, Science 270, 1207 (1995).
[50] M.J. Berry, D.K. Warland, and M. Meister, Proc. Natl. Acad. Sci. USA 94, 5411 (1997).
[51] W.E. Sullivan and M. Konishi, J. Neurosci. 4, 1787 (1984).
[52] C.E. Carr, Annu. Rev. Neurosci. 16, 223 (1993).
[53] C. Köppl, J. Neurosci. 17, 3312 (1997).
[54] R. Kempter, Hebbsches Lernen zeitlicher Codierung: Theorie der Schallortung im Hörsystem der Schleiereule, Naturwissenschaftliche Reihe, Vol. 17 (DDD, Darmstadt, 1997).
[55] L. Wiskott and T. Sejnowski, Neural Comput. 10, 671 (1998).
[56] P.A. Salin, R.C. Malenka, and R.A. Nicoll, Neuron 16, 797 (1996).
[57] W. Gerstner, R. Kempter, J.L. van Hemmen, and H. Wagner, in Pulsed Neural Nets, edited by W. Maass and C.M. Bishop (MIT Press, Cambridge, MA, 1998), Chap. 14, pp. 353–377.
[58] A. Zador, C. Koch, and T.H. Brown, Proc. Natl. Acad. Sci. USA 87, 6718 (1990).
[59] C.W. Eurich, J.D. Cowan, and J.G. Milton, in Proceedings of the 7th International Conference on Artificial Neural Networks (ICANN'97), edited by W. Gerstner, A. Germond, M. Hasler, and J.-D. Nicoud (Springer, Heidelberg, 1997), pp. 157–162.
[60] C.W. Eurich, U. Ernst, and K. Pawelzik, in Proceedings of the 8th International Conference on Artificial Neural Networks (ICANN'98), edited by L. Niklasson, M. Boden, and T. Ziemke (Springer, Berlin, 1998), pp. 355–360.
[61] H. Hüning, H. Glünder, and G. Palm, Neural Comput. 10, 555 (1998).
[62] N.G. van Kampen, Stochastic Processes in Physics and Chemistry (North-Holland, Amsterdam, 1992).
[63] H. Ritter, T. Martinez, and K. Schulten, Neural Computation and Self-Organizing Maps: An Introduction (Addison-Wesley, Reading, 1992).
[64] C. van Vreeswijk and H. Sompolinsky, Science 274, 1724 (1996).
[65] W. Softky and C. Koch, J. Neurosci. 13, 334 (1993).
[66] F. Rieke, D. Warland, R. de Ruyter van Steveninck, and W. Bialek, Spikes: Exploring the Neural Code (MIT Press, Cambridge, MA, 1997).
[67] M.N. Shadlen and W.T. Newsome, Curr. Opin. Neurobiol. 4, 569 (1994).
[68] Ö. Bernander, R.J. Douglas, K.A.C. Martin, and C. Koch, Proc. Natl. Acad. Sci. USA 88, 11569 (1991).
[69] M. Rapp, Y. Yarom, and I. Segev, Neural Comput. 4, 518 (1992).