Research Collection
Doctoral Thesis
On randomness as a principle of structure and computation in neural networks
Author(s): Weissenberger, Felix
Publication Date: 2018
Permanent Link: https://doi.org/10.3929/ethz-b-000312548
Rights / License: In Copyright - Non-Commercial Use Permitted
Diss. ETH No. 25298
On randomness as a principle of structure and computation in neural networks
A thesis submitted to attain the degree of
DOCTOR OF SCIENCES of ETH ZURICH (Dr. sc. ETH Zurich)
presented by
FELIX WEISSENBERGER
MSc ETH in Theoretical Computer Science
born on 04.08.1989
citizen of Germany
accepted on the recommendation of
Prof. Dr. Angelika Steger
Prof. Dr. Jean-Pascal Pfister
Dr. Johannes Lengler
2018
Contents
Abstract
Zusammenfassung
Thanks
1 Introduction
2 Emergence of synfire chains
3 Rate based learning with short stimuli
4 Mutual inhibition with few inhibitory cells
5 Lognormal synchrony in CA1
Bibliography
Abstract
This work examines the role of randomness in the structure and information processing of biological neural networks and how it may improve our understanding of the nervous system.

Our approach is motivated by the pragmatic observation that many components and processes in the brain are intrinsically stochastic. Therefore, probability theory and its methods are particularly well suited for its analysis and modeling. More profoundly, our approach is based on the hypothesis that the stochasticity of the nervous system is much more than just an artifact of a biological system. This hope stems from the experience in probability theory that random structures often have highly desirable properties, and from the theory of randomized algorithms, which impressively demonstrates that chance is extremely useful for the efficient computation of solutions to many problems. It is therefore not surprising that randomness has also been given a fundamental role in the structure and information processing of the nervous system.

In this tradition, we study simple, mostly stochastic mathematical models of neurons, synapses and their interaction in neural networks and investigate emergent properties that can be proven mathematically, often with the help of discrete probability theory. The mathematical analysis allows the extraction of essential concepts that can ultimately be fully understood. Furthermore, we simulate more complex models to check whether the knowledge gained in this way generalizes. In this way, we can quickly examine, test and usually reject many hypotheses in purely theoretical considerations. In the case of useful ideas, these
can inspire concrete biological experiments and predict their outcome or help to understand and interpret experiments already carried out. In this process, we often draw inspiration from the field of discrete probability theory, especially random graph theory and the theory of randomized algorithms.

Concretely, we first show that the structure of biological neural networks favors the formation of so-called synfire chains since it resembles locally the structure of directed random graphs. Synfire chains are an established model of multi-stage signal transmission in neural networks. Second, we demonstrate how the efficiency of rate based synaptic plasticity can benefit from a dependence on the local membrane potential as the fluctuations of this potential contain more relevant information than individual action potentials. Third, we prove that random synaptic connectivity in combination with the nonlinear interaction of inhibitory synapses allows mutual inhibitory communication between excitatory neurons, even if the number of inhibitory neurons is much smaller than the number of excitatory neurons. Fourth, we provide a possible explanation for the experimental observation that the number of neurons firing during certain stereotypical network activity in the hippocampus corresponds to a lognormal distribution: the synaptic transfer of normally distributed network activity from one area to the next leads to lognormally distributed activity there.
Zusammenfassung
This thesis considers, by way of examples, the role of randomness in the structure and information processing of biological neural networks and how we can exploit it to better understand the central nervous system.

Our approach is motivated, first of all, by the pragmatic observation that many components and processes of the brain are intrinsically stochastic. Probability theory and its methods are therefore particularly well suited for analysis and modeling. More profoundly, our approach rests on the hypothesis that the stochasticity of the nervous system is far more than a mere artifact of a biological system. This hope stems from the experience in probability theory that random structures often have highly desirable properties, and from the theory of randomized algorithms, which demonstrates impressively that randomness is extremely useful for efficiently computing solutions to many problems. It is therefore not surprising that randomness has also been assigned a fundamental role in the structure and information processing of the nervous system.

In this tradition, we consider simple, mostly stochastic mathematical models of neurons, synapses and their interconnection in neural networks, and study emergent properties that can be proven mathematically, often with the help of discrete probability theory. Such an approach permits a reduction to essential concepts that can ultimately be understood completely. Furthermore, we simulate more complex models to check whether the insights gained in this way generalize. In purely theoretical considerations we can thus quickly examine, test and usually discard many hypotheses. Useful ideas, in turn, can motivate concrete biological experiments and predict their outcome, or help to understand and interpret experiments that have already been carried out. In doing so, we frequently draw inspiration from the field of discrete probability theory, above all random graph theory and the theory of randomized algorithms.

Concretely, we first show that the structure of biological neural networks favors the formation of so-called synfire chains, since it locally resembles the structure of directed random graphs. Synfire chains are an established model of multi-stage signal transmission in neural networks. Second, we demonstrate how the efficiency of synaptic plasticity can benefit from incorporating the local membrane potential, since the fluctuations of this potential contain more relevant information than individual action potentials. Third, we prove that random synaptic connections, in combination with nonlinear interaction of inhibitory synapses, allow mutual inhibitory communication between excitatory neurons, even if the number of inhibitory neurons is much smaller than the number of excitatory neurons. Fourth, we provide a possible explanation for the experimental observation that the number of neurons firing during certain stereotypical network activity in the hippocampus follows a lognormal distribution: the synaptic transmission of normally distributed network activity from one area to the next leads to lognormally distributed activity there.
Thanks
Thank you to everybody who made my time at ETH so much fun!
First off, to my supervisor, Angelika Steger. I am sincerely grateful for the opportunity to work in your group. The environment at the intersection of combinatorics, neuroscience and machine learning that you created is unique. Your trust, support and advice mean a lot to me. I could not imagine a better boss and more inspiring mentor. Thank you!

Thank you to Johannes Lengler, for your help, patience and uplifting spirit. You have been incredibly supportive.

To Jean-Pascal Pfister, for invaluable feedback, for letting me participate in your group meetings, and for sacrificing your time to referee this thesis.

I also want to thank my other collaborators who contributed to this thesis; much of what is written here must be largely attributed to you.

I am further especially thankful to all past and current members of our group and the institute who shared the time of my PhD with me. I will miss having you around.
Finally, I thank my family and friends for their love and support. I do not take this for granted. Thank you so very much.
Zurich, June 2018
1 Introduction
The human brain is a fantastic computer. All our actions and thoughts, from simple movements to brilliant ideas, emanate from computations in our brains. This reductionist view allows a profound insight: the brain serves as a proof of concept for what human-designed computers should be capable of. Yet, it also shows us how poorly we understand information processing in the central nervous system right now.
1.1 The brain as an inherently probabilistic computer
If we want to understand computation in the brain, it may be instructive to compare the central nervous system to digital computers, which we actually understand.

First, let us start this comparison at the level of elementary components: our digital computers, following the architecture proposed by John von Neumann in the middle of the 20th century, are built from transistors joined by conductors in integrated circuits. Analogously, as discovered in the seminal physiology work by Santiago F. Ramón y Cajal in the early 20th century, the elementary components of the nervous system are discrete individual nerve cells interconnected by synapses in neural networks. However, although these fundamental building blocks seem to some extent comparable, there is a remarkable difference: while transistors are homogeneous, reliable and unmodifiable, there is an abundance of different nerve cell and synapse types, which are stochastic in nature and constantly change their behavior.
For instance, individual synapses transmit signals only with a certain (surprisingly low) probability (Branco & Staras, 2009), and the quality of signal transmission continually adapts in response to synaptic activity (Bliss & Lømo, 1973).

If we now zoom out to the network level, we see a similar picture. The structure of the circuits in our digital computers, such as the central processing unit, is highly organized, static, and identical for all units of the same type. In contrast, neural networks often seem to lack any structure, continually evolve and differ from one brain to the other. For example, locally the synaptic connectivity in the neocortex or certain areas of the hippocampus, such as area CA3, is considered to be random and independent of the spatial arrangement of neurons (Buzsáki, 2006).

This apparent discrepancy, illustrated by the reliability and unreliability of individual components along with order and disorder of their interaction, is not limited to the physical ‘hardware’, but also appears in the operating mode of computers and brains. Whereas the registers of digital computers typically contain the same bit strings whenever the same computation is carried out, the response of single neurons to identical stimuli displays a high degree of variability: the points in time when a neuron emits action potentials relative to the onset of a specific visual cue are highly variable from one trial to another, and appear to be completely irreproducible (Softky & Koch, 1993).

At first glance, this unreliability, lack of structure and irreproducibility may intuitively seem like a huge problem – if the brain is intrinsically random, how could we expect any predictability whatsoever in its computations? More specifically, how should we use our knowledge of computation, which is tied to our understanding of deterministic digital computers, to understand how the brain processes information?
1.2 Determinism from probabilistic components
Fortunately, numerous results in probability theory show that predictability and randomness are not incompatible. In fact, even completely random structures and processes have very predictable large-scale properties.

Let us illustrate this with the example of unreliable synaptic transmission. As mentioned above, the probability that a particular synapse transmits a signal may be low. Nevertheless, pairs of neurons are connected not only by a single synapse but form several, up to hundreds of synaptic connections. Further, the probability of transmission appears to be independent among distinct synapses (Branco & Staras, 2009). Hence, even if one synapse transmits a signal only with probability 20 %, the probability that no signal is transmitted at all between two neurons that are connected via 20 synapses is only about 1 %. Hence, one can be quite certain that a signal is indeed transmitted.

Similarly, the random connectivity of neural networks implies order: the probability that two neurons in the hippocampal CA3 region, which contains roughly 200,000 neurons, are connected is roughly 5 % (Buzsáki, 2006). Hence, the expected number of connections a neuron forms is 10,000. Now, the probability that the actual number of connections deviates from this by more than 250 is only about 1 %. So we can almost certainly say that the actual number of connections a neuron forms is in this range. Analogously, the variability of single neurons is not problematic if the information is encoded in redundant populations of neurons or if one may take an average over many trials (Gerstner, Kistler, Naud, & Paninski, 2014).

The previous considerations all rely on the fact that a large number
of independent random events collectively exhibits almost deterministic behavior. This insight goes back to the origin of probability theory, when Jacob Bernoulli proved the law of large numbers in the early 18th century, and has culminated in the numerous concentration bounds that comprise the toolbox of any modern theoretician (Dubhashi & Panconesi, 2009).

Concluding our comparison of digital computers and brains, there appears to be a remarkably consistent dichotomy between determinism and randomness. What is more, although the brain appears intrinsically probabilistic, this does not mean that it cannot compute deterministically, and the tools provided by probability theory seem to be adequate to study neural structure and computation. However, while this is comforting, it is not too exciting. In the sequel, we will argue in favor of a more fundamental role of randomness in the brain by taking a short (and biased) detour through the history of randomness in computation.
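To make the two estimates above concrete, here is a small Python sketch; it is purely illustrative and only repeats the numbers quoted in the text (20 synapses with transmission probability 20 %, and a Binomial in-degree with n = 200,000 and p = 5 %, evaluated with a normal approximation).

import math
from statistics import NormalDist

# Redundant synaptic contacts: each transmits independently with probability 0.2;
# the probability that none of 20 contacts transmits a signal is
p_fail = 0.8 ** 20
print(f"P(no transmission over 20 contacts) ≈ {p_fail:.3f}")          # about 0.01

# Random connectivity in CA3: about 200,000 potential partners, each connected
# independently with probability 0.05, so the in-degree is Binomial(200000, 0.05).
n, p = 200_000, 0.05
mean, std = n * p, math.sqrt(n * p * (1 - p))                          # 10,000 and roughly 97
p_dev = 2 * (1 - NormalDist().cdf(250 / std))                          # normal approximation
print(f"P(in-degree deviates from {mean:.0f} by more than 250) ≈ {p_dev:.3f}")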
1.3 Chance as a principle of structure and computation
Around the beginning of the 20th century, mathematicians discovered that probability theory may be used to prove deep results in other areas of mathematics in a surprisingly elegant way. This led to the development of the so-called probabilistic method, initiated and popularized by Paul Erdős. In essence, the probabilistic method can be summarized as follows: in order to prove the existence of an object with certain properties, we construct an appropriate probability space and show that a randomly chosen element in this space has the desired properties with positive probability (Alon & Spencer, 2008). The success of the probabilistic method demonstrates the usefulness of
chance to prove the existence of desirable combinatorial structures in an impressive manner, and we will see an application to neuroscience in Chapter 4.

At the same time, the field of random graph theory emerged. A graph is a finite structure that consists of a set of vertices, some of which may be joined by edges. The original random graph model is called G_{n,p}, or the Erdős–Rényi model, after Paul Erdős and Alfréd Rényi who studied it in detail. For p ∈ [0, 1], the random graph G_{n,p} is the graph on n vertices in which every possible edge is included independently with probability p. As a central part of random graph theory, so-called threshold phenomena have been studied extensively. Here, one is interested in threshold values such that if p is slightly larger or slightly smaller than the threshold, then G_{n,p} does or does not possess a certain property with high probability. Numerous properties have been considered with respect to their threshold, including the graph being connected or containing a certain subgraph (Bollobás, 2001). Further, also properties that are interesting to neuroscience may be considered, see Chapter 2. Meanwhile, random graph theory has developed a broad class of random graph models, which are suitable to study all kinds of networks, including neural networks.

The fruitful application of probability theory in many areas of mathematics combined with the rise of theoretical computer science eventually sparked the systematic study of randomness in computation. This set the foundation for the field of randomized algorithms. Randomized algorithms use simulated randomness during their execution to efficiently compute solutions (Mitzenmacher & Upfal, 2005). The resulting algorithms are often considerably simpler and, crucially, more efficient than their deterministic counterparts. In consequence, many algorithms running on our computers are in fact randomized.
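A small simulation makes the threshold phenomenon just described tangible before we turn to concrete randomized algorithms. The sketch below samples undirected G_{n,p} graphs around the connectivity threshold ln(n)/n; the network size and the number of trials are arbitrary illustrative choices.

import math, random

def is_connected(n, p, rng):
    # Sample an undirected G_{n,p} and test connectivity with union-find.
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                ru, rv = find(u), find(v)
                if ru != rv:
                    parent[ru] = rv
    return len({find(v) for v in range(n)}) == 1

rng = random.Random(0)
n, trials = 200, 200
for c in (0.7, 1.3):        # slightly below / above the connectivity threshold ln(n)/n
    p = c * math.log(n) / n
    freq = sum(is_connected(n, p, rng) for _ in range(trials)) / trials
    print(f"p = {c} * ln(n)/n: fraction of connected samples ≈ {freq:.2f}")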
As a concrete example of a randomized algorithm, consider testing whether a number is prime, an essential operation in the cryptographic protocols we use in everyday life. It is done by a randomized algorithm, the Miller-Rabin primality test, although deterministic polynomial-time algorithms are also known. Similarly, the elementary operation of sorting numbers is typically implemented by the Quicksort algorithm, which deserves its name only because it is randomized. The list of examples, demonstrating how useful randomness is for computation, could be extended indefinitely, including algorithms for hashing, sampling, and optimization.

Thus, we may wonder whether randomness can also be useful in neural computation. Specifically, we may ask whether the inherent randomness of the nervous system is not just an unavoidable by-product of a biological system but rather a principle of its structure and computation.

In light of this question we return to the three examples we encountered in our earlier comparison of the brain with digital computers. The stochasticity of synapses has been hypothesized, for example, to enable exploration of network configurations while at the same time maintaining the network's functionality (Kappel, Habenschuss, Legenstein, & Maass, 2015). Moreover, the benefit of random connectivity in neural networks immediately follows from the numerous results in random graph theory, and is further detailed in Chapters 2 and 4. Finally, the trial-to-trial variability of neuronal activity has been proposed to realize complex probability distributions in neural networks in such a way that computations necessary to perform inference are feasible (Habenschuss, Jonke, & Maass, 2013). The latter is interesting because various studies in cognitive science and neuroscience conclude that the brain in fact performs inference (Ernst & Banks, 2002).
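Returning to the primality example above, here is a minimal sketch of the randomized Miller-Rabin test; the number of rounds and the test values are illustrative, and production code should rely on a vetted library implementation.

import random

def is_probable_prime(n, rounds=20, rng=random.Random(7)):
    # Randomized Miller-Rabin test: a composite n passes all rounds
    # with probability at most 4 ** (-rounds).
    if n < 2:
        return False
    for q in (2, 3, 5, 7, 11, 13):
        if n % q == 0:
            return n == q
    d, s = n - 1, 0
    while d % 2 == 0:               # write n - 1 = d * 2**s with d odd
        d //= 2
        s += 1
    for _ in range(rounds):
        a = rng.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False            # a is a witness: n is composite
    return True

print(is_probable_prime(2**61 - 1))     # 2305843009213693951 is prime: True
print(is_probable_prime(2**61 + 1))     # divisible by 3: False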
These examples are by no means exhaustive, yet they illustrate the approach we follow in this thesis. We study the role randomness plays in the structure and computation of neural networks, in the hope of better understanding the brain. Thereby, we rely on inspiration from the probabilistic method, random graph theory, and randomized algorithms.

Concretely, we analyze simple mathematical models of neurons, synapses and their interaction in neural networks and examine emerging properties that can be mathematically proven. By that, we may identify all necessary model assumptions and finally reduce the models to contain only essential components, which can be fully understood. In addition, we simulate more complex models, which depict their biological archetype in greater detail, to test whether our insights generalize. This allows us to quickly examine, test and usually reject many ideas through purely theoretical considerations. In the case of useful ideas, these may inspire or guide concrete biological experiments and predict their outcome, or help to understand and interpret experiments already carried out.

This thesis is a compilation of four independent papers. Their common theme is the motivation introduced above. Hence, in the remainder of this introduction, we present our contributions from this perspective. Further, each of the following sections summarizes one paper and starts with a brief repetition of the relevant neurobiological background. By that, we largely refrain from putting our work into the context of current neuroscience research. Thus, to get the full picture, it is strongly recommended to also read the introductions of the individual Chapters 2, 3, 4 and 5.
1.4 Emergence of synfire chains
Anatomically, a neuron consists of three parts: the dendrites, which often look like a heavily branched tree, the cell body or soma, and the axon, which typically has the form of a long thin cable. Further, a neuron is confined by a thin cell membrane from the extracellular space, and the difference in electrical potential between the inside and the outside of the neuron is called the membrane potential.

Having defined the membrane potential, we can describe elementary neuronal computation as postulated in the law of dynamic polarization by Santiago F. Ramón y Cajal: the dendrites serve as an input device, where most synapses are located and synaptic input changes the membrane potential. The soma is the central processing unit, and if the somatic membrane potential exceeds a certain threshold, the membrane potential rapidly rises and falls within 2 ms. This stereotypical rise and fall in potential is called an action potential or spike. The underlying mechanism of spike generation was discovered by Alan L. Hodgkin and Andrew F. Huxley in an early and beautiful symbiosis of mathematical modeling with physiological experiment (Hodgkin & Huxley, 1952). The axon is the output device of the neuron. If a spike is generated at the initial segment of the axon, the spike travels along the axon and finally arrives via synapses as input to target neurons.

The stereotypical form of spikes led to the conclusion that only the presence or absence of spikes may carry information. Hence, neurons send discrete binary signals. Transmitting a spike from one neuron to the next takes only a few milliseconds. In contrast, the processing time of many tasks in the nervous system is much longer. For example, even in a simple reaction time test, where a subject is asked to press a button
in response to a sound, the delay exceeds a hundred milliseconds. Therefore, in such a computation, a chain of spike transmission steps is involved.

Consider the signal propagating along a chain of neurons, serially connected by single synapses. We immediately see that such an arrangement is flawed if its components are inherently unreliable: as mentioned earlier, synapses transmit spikes only with a certain probability. Hence, the probability of successful signal transmission along the chain is exponentially small in its length, assuming independence of synaptic transmission. Further, temporal jitters of spike-timing accumulate along the chain and the reproduction of exact spike timing is hopeless.

These limitations are overcome if the neurons are connected in a certain scheme, which was established by Moshe Abeles as synfire chain (Abeles, 1982). In a synfire chain, groups of neurons are serially connected such that neurons in one group form many synaptic connections to neurons in the subsequent group and few to other neurons. This leads to the signal being propagated in synchronous volleys of spikes along the chain; thus both reliable transmission and exact spike timing are ensured even in the presence of unreliability and noise. Therefore, synfire chains provide a candidate solution for stable and precisely timed multi-stage signal transmission in neural networks.

How the specific connectivity scheme of synfire chains may emerge in initially unstructured neural networks is unclear. Notably, neural networks with random connectivity already contain an abundance of connectivity schemes resembling synfire chains – if the connection probability exceeds some threshold value (Abeles, 1991). This does not come as a surprise to us, knowing about classic work in random graph theory, including the threshold of subgraph containment (Bollobás,
2001). Unfortunately, in such completely unstructured networks, the activity does not propagate along one synfire chain in a stable manner, but rather dies out or explodes quickly.

In Chapter 2 we build on this observation. We propose a synaptic learning rule that removes a few synapses from an initially random network, thereby stabilizing one synfire chain through ongoing neuronal activity. This eventually leads to the emergence of long synfire chains.

One of our main insights is that random networks with a certain connection probability are the perfect substrate for the emergence of synfire chains. In particular, they allow overcoming two main obstacles encountered in previous work. Firstly, it is sufficient that the learning rule makes only minor changes to the initial connectivity. More explicitly, it is not necessary that new synapses with specific targets are formed if the connection probability is above a certain threshold. Secondly, the absence of such structural plasticity and, crucially, a connection probability close to the threshold avoids the formation of short and cyclic synfire chains. These insights have largely benefited from intuition derived in random graph theory.

Further, we analyze the proposed learning rule with respect to the capacity of the network. To this end, modeling the process of chain formation as a random graph process and applying tools from random graph theory permits not only to compute the length of the emerging synfire chains but also to show that the proposed learning rule is optimal in this regard.
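The reliability argument at the beginning of this section can also be checked numerically. The following Monte Carlo sketch compares a chain in which each stage consists of a single neuron reached through a single synapse with a synfire-chain-like arrangement of neuron groups; the transmission probability, group size, threshold and chain length below are illustrative choices and not parameters taken from Chapter 2.

import random

rng = random.Random(1)
q = 0.5          # per-synapse transmission probability (illustrative)
stages = 20      # number of transmission steps
m, k = 20, 5     # group size and spike threshold for the synfire chain

def single_neuron_chain():
    # One neuron per stage, one synapse per step: every step must succeed.
    return all(rng.random() < q for _ in range(stages))

def synfire_chain():
    # m neurons per group, all-to-all connectivity between consecutive groups;
    # a neuron fires if at least k of the previously active neurons transmit to it.
    active = m
    for _ in range(stages):
        active = sum(
            sum(rng.random() < q for _ in range(active)) >= k
            for _ in range(m)
        )
        if active == 0:
            return False
    return True

trials = 2000
print("single-neuron chain:", sum(single_neuron_chain() for _ in range(trials)) / trials)
print("synfire chain      :", sum(synfire_chain() for _ in range(trials)) / trials)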
1.5 Rate based learning with short stimuli
The immediate response of a neuron to an incoming action potential is a change in its membrane potential. The amplitude of this so-called postsynaptic potential depends on the efficacy of the transmitting synapse. This efficacy is usually abstracted as the synaptic weight. Crucially, synaptic weights are not fixed but are modifiable, in particular in response to synaptic activity. This is known as synaptic plasticity, discovered by Tim V. P. Bliss and Terje Lømo (Bliss & Lømo, 1973).

Synaptic plasticity is largely regarded as the basis of learning new skills and making memories. Yet, little is known about how synaptic plasticity implements learning and memory in detail. A major challenge is that synaptic plasticity mechanisms (i.e. the underlying biophysical machinery) have limited access to relevant information, simply because of physical restrictions. Hence, synaptic plasticity can only depend on local quantities including the activity (e.g. spiking or not) of the presynaptic and postsynaptic neurons, the state (e.g. membrane potential or local calcium concentration) of the postsynaptic cell or signals provided by neuromodulators (e.g. dopamine). Relating to computer science, a synapse may be considered an agent in a distributed computing network.

How synaptic weights change as a function of local quantities has been studied in numerous experiments. In essence, these experiments measure the synaptic weight change in response to manipulation of certain local quantities according to a specific protocol. Notably, such experiments led to the discovery of learning rules that describe synaptic weight change as a function of the time difference of presynaptic and postsynaptic spikes (Bi & Poo, 1998) or the firing rate of the presynaptic and postsynaptic neurons (Brown, Chapman, Kairiss, &
Keenan, 1988). Here, the firing rate of a neuron is simply the number of spikes it emits per unit of time. Such rate based learning rules relate signal to weight change under the assumption that the signal is encoded in the firing rate.

Historically, the local quantity that has been considered to determine the rate is spikes: to compute the rate from spikes one can simply count the number of spikes per unit of time. However, this computation is only meaningful if there are ‘enough’ spikes in the relevant time interval. Consider the following simple example. On the one hand, biologically relevant signals are typically short, say in the order of 50 ms. On the other hand, a typical neuronal firing rate is 40 Hz (Rieke, Warland, de Ruyter van Steveninck, & Bialek, 1999). Hence, in such a time interval one expects as few as 2 spikes. Considering realistic noise levels, the variance in the sketched rate computation is immense. Thus, it is impossible to compute the rate with high accuracy in such a short time. Consequently, if rate based learning rules are implemented via spikes, then learning is restricted to long and stationary signals. This is the problem that we study in Chapter 3.

The reason why it takes so long to estimate the rate from spikes is that there are only a few spikes in a short time interval. However, from discrete probability theory, we know that a larger number of samples may give a higher accuracy. In fact, this is exactly what the law of large numbers tells us. The firing rate of a neuron is a function of the synaptic input it receives. Further, the number of input spikes is much larger than the number of output spikes per unit of time, in particular, if excitatory and inhibitory inputs balance each other. These input spikes, nevertheless, are reflected in fluctuations of the membrane potential, which may be considered a local quantity as well.
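A quick numerical illustration of how poorly a short spike count determines the rate, assuming Poisson spiking at 40 Hz as in the example above; the window lengths are illustrative. The relative error shrinks like one over the square root of the expected spike count, as the law of large numbers suggests.

import math, random, statistics

rng = random.Random(2)
rate = 40.0                        # firing rate in Hz

def poisson_sample(lam):
    # Knuth's algorithm; adequate for the small means used here.
    L, count, prod = math.exp(-lam), 0, 1.0
    while prod > L:
        count += 1
        prod *= rng.random()
    return count - 1

for T in (0.05, 0.5, 5.0):         # observation window in seconds
    estimates = [poisson_sample(rate * T) / T for _ in range(5000)]
    rel_err = statistics.pstdev(estimates) / rate
    print(f"T = {T * 1000:4.0f} ms: relative error of the rate estimate ≈ {rel_err:.2f}")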
Thus, our idea is to compute the rate not from spikes but from the membrane potential. We formalize this idea in the classical framework of Richard B. Stein's neuron model (Stein, 1965) and its diffusion approximation. In this model, under the assumption of balanced excitation and inhibition below threshold (i.e. the mean input is constant and not sufficient to trigger spikes), the stochastic process of spiking is a Poisson process, whereas the stochastic process governing the membrane potential dynamics is an Ornstein-Uhlenbeck process. Thus, computing the rate from spikes is equivalent to estimating the rate of a Poisson process, and computing it from the membrane potential boils down to estimating the fluctuations of an Ornstein-Uhlenbeck process. We find that the latter requires much less time because samples can be taken at a higher rate than the actual rate of the neuron. This confirms the intuition we obtained from discrete probability theory: increasing the number of samples increases the accuracy. Hence, if a plasticity mechanism uses the fluctuations of the membrane potential to realize a rate dependence, then rate based learning can deal with much shorter signals than if it were based only on spikes.
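To make the second estimator concrete, the sketch below simulates an Ornstein-Uhlenbeck process with the Euler-Maruyama scheme and estimates its fluctuation size from a short, densely sampled trace. The time constant, noise amplitude and sampling step are illustrative assumptions and are not the values analyzed in Chapter 3; the point is only to show what estimating the fluctuations from a 50 ms window looks like.

import math, random, statistics

rng = random.Random(6)
tau, sigma = 0.010, 2.0            # time constant 10 ms and noise amplitude (illustrative)
dt, T = 0.0005, 0.050              # sample every 0.5 ms over a 50 ms window
stat_std = sigma * math.sqrt(tau / 2)    # stationary fluctuation size of the OU process

def fluctuation_estimate():
    # Euler-Maruyama simulation of dV = -(V / tau) dt + sigma dW, started in the
    # stationary distribution; the estimate is the root mean square of the trace.
    v, sq = rng.gauss(0.0, stat_std), 0.0
    steps = int(T / dt)
    for _ in range(steps):
        v += (-v / tau) * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        sq += v * v
    return math.sqrt(sq / steps)

estimates = [fluctuation_estimate() for _ in range(2000)]
print(f"true fluctuation size {stat_std:.3f}, "
      f"mean estimate {statistics.mean(estimates):.3f}, "
      f"relative error ≈ {statistics.pstdev(estimates) / stat_std:.2f}")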
1.6 Mutual inhibition with few inhibitory cells
Signal transmission at chemical synapses works as follows: after the presynaptic neuron spikes, neurotransmitters are released at its axon terminals into the synaptic cleft (i.e. the small gap between the presynaptic and postsynaptic neuron). The released transmitters diffuse to the membrane of the postsynaptic cell, where they bind to transmitter-receptors. These receptors then activate ion channels, through which current flows, ultimately resulting in a change of
the postsynaptic membrane potential. The effect of a presynaptic spike on the postsynaptic membrane potential depends on the type of transmitters that are released. There are transmitters that increase the membrane potential (e.g. glutamate) and others that decrease it (e.g. GABA). However, neurons release the same transmitters at all their axon terminals regardless of the identity of the target cell (with very few known exceptions). This phenomenon is known as Dale's principle, attributed to Henry H. Dale (Dale, 1935). His principle allows us to categorize neurons into excitatory and inhibitory neurons, depending on whether their spikes increase or decrease the membrane potential of target cells.

Many computational neural network models in neuroscience and most artificial neural networks used in machine learning violate Dale's law. In such cases, neurons may excite some and inhibit other targets, depending on the weight of their synaptic connection. The reasoning why such models are not in direct conflict with biological constraints is that it is easy to transform a neural network that does not respect Dale's law into one that does. Each abstract neuron can be replaced by an excitatory and an accompanying inhibitory neuron, through which inhibitory signals are mediated. However, this construction has a fundamental flaw: it requires an equal number of excitatory and inhibitory neurons, whereas in real neural networks the number of inhibitory neurons is much smaller than the number of excitatory neurons.

It is unlikely that networks built up of simple model neurons, which sum up their synaptic input and emit spikes if the summed input exceeds a threshold, can overcome this limitation, because storing the required number of synaptic weights in the network demands an equal number of excitatory and inhibitory neurons. However, such models
are oversimplifying, as they neglect potential computation performed on the dendritic tree: synaptic inputs travel from potentially distant dendritic compartments towards the soma of the neuron, where spikes are generated; along the way, there is an abundance of biophysical mechanisms that implement nonlinear interactions with other synaptic inputs. Such mechanisms may allow intricate computation, much more powerful than the simple summation of inputs. Exploiting these mechanisms to perform computation is known as dendritic computation (London & Häusser, 2005). For example, an inhibitory input on the path of an excitatory input towards the soma can implement a logical NAND function between the two inputs (shunting inhibition). Further, spatially close excitatory synapses can cause a supra-linear response if they are co-active, implementing a logical AND function (coincidence detection via dendritic spikes). Moreover, dendritic computation can also regulate synaptic plasticity; for example, inhibitory input on the dendrite can prevent spiking information, which is necessary for plasticity, from traveling back to specific synapses (Wilmes, Sprekeler, & Schreiber, 2016).

In Chapter 4, we study whether dendritic computation may allow all excitatory neurons, in networks of excitatory and inhibitory neurons, to send specific inhibitory signals to all other excitatory neurons, a property called mutual inhibition, under the constraint that there are much fewer inhibitory than excitatory cells. Mutual inhibition has many desirable computational properties which are useful, for example, in the decorrelation of signals (Barlow & Földiák, 1989). Whereas the traditional model of mutual inhibition assigns one inhibitory neuron to each excitatory neuron, our idea is to assign a subset of inhibitory neurons to each excitatory neuron. To decode the subset and associate it with a specific weight by which the membrane potential of the
target cell is modified, we propose that on the dendrites of excitatory neurons the logical AND function of inhibitory synaptic activity is computed and multiplied by the weight. We speculate that the AND function is realized by nonlinear interaction of inhibitory synapses resembling dendritic spikes and that the weight corresponds to the distance between the synapses and the soma. Since the number of (possibly overlapping) subsets of inhibitory neurons is much larger than the number of excitatory neurons, even for a few inhibitory neurons, this approach has the potential to solve the problem and reduce the number of required inhibitory neurons substantially.

However, the choice of inhibitory subsets that are associated with the excitatory neurons is crucial. If, for example, two excitatory neurons get the same inhibitory subset, both receive the respective inhibitory input even if only one of them is active. Thus, at the heart of our model is the choice of these subsets, which ensures that mutual inhibition is implemented correctly.

A family of subsets that satisfies the required properties (i.e. no union of not too many subsets contains another subset from the family) is known as a cover-free set family (Füredi, 1996) in combinatorics. Interestingly, it can be shown using the probabilistic method that a suitable cover-free set family exists with high probability. In particular, assigning random subsets of inhibitory neurons to excitatory ones reliably implements mutual inhibition. Besides admitting an elegant proof, the random construction is interesting from a biological point of view because it shows that no specific connectivity scheme between excitatory and inhibitory neurons is necessary, yet random connectivity provides the desired structure. This makes the proposed model more biologically plausible and may serve as an instructive example of the use of randomness in the structure of the nervous system.
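A small sketch of the random-subset idea just described; the population sizes, subset size and number of simultaneously active cells are illustrative assumptions, not the values analyzed in Chapter 4. A 'violation' below is an inactive excitatory neuron whose inhibitory subset is entirely covered by the subsets of the active ones, i.e. a failure of the cover-free property that would cause spurious inhibition.

import random

rng = random.Random(3)
n_exc, n_inh = 2000, 200        # many excitatory cells, few inhibitory cells
s, r = 10, 5                    # subset size per excitatory cell, simultaneously active cells

# Assign each excitatory neuron a random subset of the inhibitory population.
subsets = [frozenset(rng.sample(range(n_inh), s)) for _ in range(n_exc)]

violations, trials = 0, 200
for _ in range(trials):
    active_cells = rng.sample(range(n_exc), r)
    covered = set().union(*(subsets[e] for e in active_cells))
    violations += sum(1 for e in range(n_exc)
                      if e not in active_cells and subsets[e] <= covered)
print(f"spurious inhibition events in {trials} random activations: {violations}")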
1.7 Lognormal network synchrony in CA1
The hippocampus is a part of the brain that is believed to be involved in the transfer of short-term memories to long-term memory. In particular, during sleep, the neuronal activity that may represent memories acquired during the day is replayed in order to be written into the cortex in a consolidation process. These replay events are associated with certain stereotypical neural network activity, termed sharp-wave ripples (SPW-Rs), which occur in the CA3 and CA1 region of the hippocampus (Buzsáki, 2015). First, an ensemble of neurons in CA3 spikes in a network burst. Second, the spikes are transmitted to the CA1 region where another ensemble of neurons spikes synchronously in response. The name SPW-Rs originates from their discovery in local field potential recordings: a sharp wave in the local field potential reflects the strong synaptic input towards CA1 and the ripple is a fast oscillation, caused by the interplay of synchronous activity of excitatory and inhibitory neurons in CA1. SPW-Rs have been studied extensively because they provide a relatively easy to detect and recurring event that is not directly caused by external input but reflects internal information processing.

The computation performed in the transmission step from CA3 to CA1 is still unclear. Experiments revealed that the number of CA1 neurons that spike during SPW-Rs follows a lognormal distribution. A random variable follows a lognormal distribution if its logarithm follows a normal distribution. Simplifying, this means that most of the time, the number of CA1 neurons participating in a SPW-R is small, whereas sometimes it is atypically large. The origin of this phenomenon is unknown.

It is often instructive to study the origin of the distribution of a
quantity, as it may help to understand the underlying processes. For example, the lognormal distribution arises naturally if a quantity is the product of many independent positive components: taking the logarithm transforms the product into a sum, and the sum of many independent random variables follows a normal distribution according to the central limit theorem. This offered, for example, an explanation of the lognormal distribution of synaptic weights via multiplicative synaptic plasticity (Loewenstein, Kuras, & Rumpel, 2011).

In Chapter 5 we study a simple model of the CA3-CA1 circuit with respect to the distribution of CA1 neurons participating in SPW-Rs. We find that if the size of the CA3 network bursts is normally distributed, then the size of the CA1 activity in response follows a lognormal distribution. We derive this result by showing that synchronous transmission over one synaptic layer transforms a normal distribution into a lognormal distribution. Thereby, we predict that the activity in CA3 is normally distributed and that the computation performed in the transmission is a certain signal transformation.

In contrast to the previous chapters, where we showed how randomness may be exploited for structure and computation, here we simply aim to postdict an experimental observation. Doing so, we rely on the intuition gained in the study of a graph process called bootstrap percolation on random graphs (Janson, Łuczak, Turova, & Vallier, 2012). In particular, the synchronous transmission of spikes from CA3 to CA1 during SPW-Rs may be modeled as the first round of this process.
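A minimal numerical illustration of the classical mechanism mentioned above (a product of many independent positive factors is approximately lognormal); the factor distribution and the number of factors are arbitrary illustrative choices, and the sketch does not implement the CA3-CA1 model of Chapter 5.

import math, random, statistics

rng = random.Random(4)

def product_of_factors(n_factors=50):
    # A quantity that is the product of many independent positive factors.
    x = 1.0
    for _ in range(n_factors):
        x *= rng.uniform(0.8, 1.25)
    return x

samples = [product_of_factors() for _ in range(10000)]
logs = [math.log(x) for x in samples]
# Right-skewed on the linear scale (the mean lies well above the median) ...
print(f"mean / median of the samples: {statistics.mean(samples) / statistics.median(samples):.2f}")
# ... while the logarithms are roughly symmetric around their mean, as the CLT suggests.
print(f"log-samples: mean {statistics.mean(logs):.2f}, median {statistics.median(logs):.2f}")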
2 Emergence of synfire chains
The results in this chapter were obtained in joint work with Florian Meier, Johannes Lengler, Hafsteinn Einarsson and Angelika Steger, see (Weissenberger, Meier, Lengler, Einarsson, & Steger, 2017).
2.1 Introduction
A synfire chain is a connectivity scheme that connects a sequence of neuron groups of roughly the same size, called patterns, in a neural network, see Figure 2.1; the connectivity is such that synchronous activity in one pattern elicits synchronous activity only in the following pattern after one synaptic delay (Abeles, 1982). Several theoretical results indicate that activity propagates in synchronous volleys of spikes along synfire chains (Abeles, 1982; Gewaltig, Diesmann, & Aertsen, 2001; Goedeke & Diesmann, 2008) and that this works robustly even in a noisy environment (Hertz, 1997; Diesmann, Gewaltig, & Aertsen, 1999; Aviel, Mehring, Abeles, & Horn, 2003). Synfire chains have become an important model for multi-stage signal transmission in the brain (Diesmann et al., 1999; Vogels, Rajan, & Abbott, 2005). Traces of synfire chains have been found in various brain areas across species (Abeles, Bergman, Margalit, & Vaadia, 1993; Prut et al., 1998; Nádasdy, Hirase, Czurkó, Csicsvari, & Buzsáki, 1999; Hahnloser, Kozhevnikov, & Fee, 2002; Reyes, 2003; Ikegaya et al., 2004; Segev, Baruchi, Hulata, & Ben-Jacob, 2004; Luczak, Barthó, Marguet, Buzsáki, & Harris, 2007; Tang et al., 2008; Long, Jin, & Fee, 2010), although,
with current recording techniques it is still difficult to unambiguously verify their existence (Gerstein, Williams, Diesmann, Grün, & Trengove, 2012). Aside from their role as a model of signal transmission, synfire chains have been successfully applied to many computational tasks (Jacquemin, 1994; Aertsen & Braitenberg, 1996; Arnoldi, Englmeier, & Brauer, 1999; Abeles, Hayon, & Lehmann, 2004; Hayon, Abeles, & Lehmann, 2005; Izhikevich, 2006).

From a theoretical point of view, synfire chains have been intensively studied over the last decades (Abeles, 2009). In particular, it has been shown that synfire chains can be embedded into recurrent neural networks (Bienenstock, 1995; Herrmann, Hertz, & Prügel-Bennett, 1995; Mehring, Hehl, Kubo, Diesmann, & Aertsen, 2003; Aviel et al., 2003; Leibold & Kempter, 2006; Kumar, Rotter, & Aertsen, 2008; Trengove, van Leeuwen, & Diesmann, 2013). While this embedding is well understood, the question of how synfire chains emerge in initially unstructured networks is still far from solved; this is the question that we address here.

Already in 1991, Moshe Abeles made the observation that sparse random networks (as observed locally throughout cortex) contain an abundance of connectivity schemes similar to synfire chains (Abeles, 1991). However, in such networks the activity does not propagate along a single chain but rather diverges quickly, resulting in ‘chaotic’ network behaviour (van Vreeswijk & Sompolinsky, 1998). In this paper, we study whether there exist learning rules which ‘stabilize’ these connectivity schemes and yield emerging synfire chains in such networks.

We find that spike-timing dependent plasticity (STDP) modulated by the global activity in the population gives a positive answer: a long chain grows in an unsupervised way from a set of neurons (stimulus)
which are synchronously stimulated with low frequency (multiple stimuli yield multiple chains). The resulting learning rule is a three-factor learning rule, that is, in addition to the pre- and postsynaptic spike time, it depends on a third factor (for a review of such rules, see (Frémaux & Gerstner, 2015; Pawlak, Wickens, Kirkwood, & Kerr, 2010)). The third factor is the global activity in the population and it determines the polarity of STDP. Since the global activity is a feedback signal from within the network, it has been termed internal feedback in similar learning rules (Urbanczik & Senn, 2009; Friedrich, Urbanczik, & Senn, 2011; Brea, Senn, & Pfister, 2013). The internal feedback encourages neurons to participate multiple times in the chain or in multiple chains. Neurons are reused within a single chain and across multiple chains, which increases the network capacity and is in agreement with experimental observations (Abeles et al., 1993; Segev et al., 2004; Luczak et al., 2007). Interestingly, the restriction to sparse connectivity prevents the chains from becoming short and cyclic and shows that the formation of specific new synapses is not essential in the process of chain development, as opposed to previous speculations (Jun & Jin, 2007).

We analyze the rule mathematically in a simple network of binary threshold neurons and show that it is optimal: no learning mechanism which starts with a sparse random network and does not add additional synapses can form longer chains asymptotically without introducing strong correlations between the patterns. Subsequently, we investigate and simulate the rule in a network of conductance-based leaky integrate-and-fire (LIF) neurons and find that the emerging connectivity scheme resembles the one of the simple network.

As an application, we show that the emerged chains can be used to learn sequences of precisely timed neuronal activity in a ‘one-
shot’ fashion: once the synfire chain is established in some neuronal population, a sequence of neuronal activity in a different population is learned by modifying the synapses between the two populations with a Hebbian rule. The model solves similar tasks as proposed in (Lazar, Pipa, & Triesch, 2009; Brea et al., 2013); however, our learning procedure needs to be exposed only once to the sequence to be learned (one-shot learning).
Figure 2.1: Illustration of a synfire chain in a network of n = 8 neurons over the time course of 4 time steps (each time step shows the entire network). The chain has pattern size m = 3 (i.e. in every time step groups of 3 neurons are active, indicated by blue color), spike threshold k = 2 (i.e. neurons turn active if they receive signals from at least two neurons which have been active in the previous time step, indicated by pink color), and length 4. The patterns do not need to be disjoint, for example the second neuron from the top participates in the first and the last pattern. In real neural networks m and k are assumed to be much larger.
2.2 Materials and Methods
In this section, we first introduce our learning rule in a simple network of binary threshold neurons and later transfer it to a network of conductance-based LIF neurons. Second, we propose a network model for one-shot learning of sequences. We start with a simplified model in which time is divided into time steps (of length roughly one axonal plus synaptic delay), neurons are binary threshold neurons, and inhibition and the feedback signal are precise. These restrictions allow a precise mathematical treatment and we relax them below (imprecise inhibition and feedback mechanism) and further in Section 2.2.2 in a network of spiking neurons.
2.2.1 Simple model

We consider a population of n excitatory binary threshold neurons (McCulloch & Pitts, 1943) over the course of discrete time steps. The activity of each neuron v at time t is a binary variable x_t(v) ∈ {0, 1}. The initial network structure is given by a directed graph G = (V, E), with vertex set V and edge set E abstracting the neurons and synapses. We consider a sparse random network with connection probability p. A random network corresponds to a directed version of G_{n,p}, the Erdős–Rényi random graph (Erdős & Rényi, 1959): between each pair of neurons a synapse is present, independently with probability p. The network is sparse if p ≪ 1.

The synapses are multistate synapses (Ben Dayan Rubin & Fusi, 2007), and their state y_t depends on internal metaplasticity parameters (introduced below): if y_t(uv) = 1, then the synapse from neuron u onto neuron v is active in the sense that it transmits signals, and if
y_t(uv) = 0, then it is silent, meaning that it does not transmit signals at time step t.

The input of neuron v in time step t is comprised of three sources: (1) excitatory input from neurons within the network which spiked in the previous time step, (2) external input representing spontaneous activity and denoted by S_t(v), and (3) the inhibitory input from a source of global inhibition, denoted by I_t. The state of neuron v in time step t is thus given by

x_t(v) = H( −k + S_t(v) + I_t + Σ_{uv ∈ E} x_{t−1}(u) · y_{t−1}(uv) ),

where H is the Heaviside step function, with H(x) = 1 if x ≥ 0 and H(x) = 0 otherwise. This determines the spike threshold k of the neurons. The external input triggers a spike (in the absence of inhibition, regardless of excitation from within the network) with probability p_spon independently for all neurons (i.e. S_t(v) = k with probability p_spon and S_t(v) = 0 otherwise). Inhibition prevents any spike if the activity in the previous time step was too large (i.e. if Σ_{v ∈ V} x_{t−1}(v) > m, then I_t = −∞, and I_t = 0 otherwise). Therefore, the parameter m of the inhibition determines the pattern size, as will become clear later. Note that here the inhibitory pathway is at least twice as fast as the excitatory one. However, this assumption is not crucial as indicated below.
Learning rule. The state of each synapse uv at time t is determined by the consolidation value c_t(uv) ∈ {0, ..., c∗} of the synapse and the irresolution value r_t(uv) ∈ {0, ..., r∗} of the synapse:

y_t(uv) = 1 if c_t(uv) > 0 and r_t(uv) < r∗, and y_t(uv) = 0 otherwise.
These two metaplasticity parameters have the following purpose. The consolidation value can only be increased or decreased by 1, so if c_t(uv) is small (large), then the synapse can easily (hardly) turn silent (initially the consolidation value of all synapses is small). The irresolution value counts how often a synapse turned from active to silent, and the synapse is removed if it did so too often (initially, all synapses have irresolution value 0).

The learning rule is an STDP rule modulated by a feedback signal. This feedback signal F_t is triggered if the activity in the network is too large (i.e. F_t = −1 if Σ_{v ∈ V} x_t(v) > m and F_t = 1 otherwise). As the feedback is determined from within the network, the feedback signal is internal. The learning rule modifies the consolidation value of a synapse as follows:
c_{t+1}(uv) := c_t(uv) + x_t(u) · x_{t+1}(v) · F_{t+1},

which is clipped to stay between 0 and c∗. On the one hand, if the feedback signal is not present, then a presynaptic spike in the time step before the postsynaptic spike causes LTP (i.e. long-term synaptic potentiation: the synapse increases its consolidation value by 1, in case the consolidation value is not at its maximum c∗). If the consolidation value was 0, this means that the synapse turns from silent to active. On the other hand, if the feedback signal indicates that the activity is too large, then it causes LTD (i.e. long-term synaptic depression: the synapse decreases its consolidation value by 1, if the consolidation
value is positive) if the presynaptic neuron spikes in the time step before the postsynaptic neuron. If the consolidation value was 1, this means that the synapse turns from active to silent. Further, the learning rule modifies the irresolution value of a synapse as follows:
r_{t+1}(uv) := r_t(uv) + y_t(uv) · (1 − y_{t+1}(uv)),

which is clipped to stay between 0 and r∗. Thus, if a synapse turns from active to silent, then it increases its irresolution value by 1. Note that the irresolution value can never decrease. Hence, as soon as it reaches its maximum r∗ the synapse cannot get active ever again, and we say that the synapse is removed from the network.

We distinguish three basic states of a synapse. First, we say that a synapse is present in the network if its irresolution value is smaller than r∗ (i.e. the synapse has not been removed). Second, we call a synapse active if it is present and its consolidation value is at least 1 (i.e. the synapse transmits signals). Third, a synapse is consolidated if it is present and its consolidation value attains the maximum c∗.
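The following sketch transcribes one time step of the simple model above into Python: the threshold dynamics, the feedback-modulated consolidation update and the irresolution update. It is illustrative only; spontaneous activity, repeated rounds of the learning procedure and the relaxed mechanism are omitted, and the parameter values simply follow the Parameters paragraph below.

import random

rng = random.Random(5)
n, m, k = 500, 30, 3                 # network size, pattern size, spike threshold
p = 15 / n                           # connection probability (order of the critical value)
c_max, r_max = 100, 50               # c* and r*

# Sparse directed random network; each present synapse carries
# [consolidation value, irresolution value] and is active with probability 1/2.
syn = {}
for u in range(n):
    for v in range(n):
        if u != v and rng.random() < p:
            syn[(u, v)] = [1 if rng.random() < 0.5 else 0, 0]

def active(uv):
    c, r = syn[uv]
    return c > 0 and r < r_max

def step(x_prev):
    # Inhibition: if too many neurons were active in the previous step, nobody spikes.
    if len(x_prev) > m:
        x_next = set()
    else:
        drive = [0] * n
        for (u, v) in syn:
            if u in x_prev and active((u, v)):
                drive[v] += 1
        x_next = {v for v in range(n) if drive[v] >= k}
    feedback = -1 if len(x_next) > m else 1          # the internal feedback signal F
    for (u, v), state in syn.items():
        if u in x_prev and v in x_next:              # pre spike at t, post spike at t + 1
            was_active = active((u, v))
            state[0] = min(max(state[0] + feedback, 0), c_max)
            if was_active and not active((u, v)):    # turned from active to silent
                state[1] = min(state[1] + 1, r_max)
    return x_next

stimulus = set(range(m))                             # the stimulus A_1
print("neurons active one step after the stimulus:", len(step(stimulus)))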
Learning procedure. A subset of m neurons, called stimulus and denoted by A_1, is repeatedly activated synchronously by external input and the chain will grow starting from this stimulus. The time between two reactivations of the stimulus is one round of the learning procedure. In each round, the activity spreads – time step by time step – through the chain developed so far until it dies out (if the activity becomes too large it is stopped by inhibition). If at the beginning of each round one of several stimuli is activated synchronously, then from each of those stimuli a chain grows.
Relation of connectivity and dynamics. In this network model, the connectivity of the network and the spread of activity in the network (i.e. its dynamics) are closely related, particularly in the absence of spontaneous activity and inhibition. We now introduce useful notation and highlight this property.

By underlining (overlining) the notions introduced below, we indicate that they concern present (consolidated) synapses. Otherwise, they concern active synapses. Consider a network G = (V, E). For a neuron v ∈ V and a set of neurons A ⊆ V, we denote by deg_A(v) the indegree (convergence) of v with respect to A. This is the number of neurons in A projecting to v via active synapses. As mentioned above, the number of neurons in A projecting to v via present (consolidated) synapses is denoted by the underlined (overlined) variant of deg_A(v). We abbreviate deg(v) := deg_V(v). For k ∈ N, we denote by Γ^k(A) := { v ∈ V | deg_A(v) ≥ k } the k-neighborhood of A. This is the set of neurons with at least k in-neighbors in A. If A is the set of neurons which spike at time t, then Γ^k(A) is the set of neurons which spike at time t + 1 (in the absence of spontaneous activity and inhibition). For two sets A, B ⊆ V, we write E(A, B) for the set of synapses with presynaptic neuron in A and postsynaptic neuron in B.

The state of synapses changes over time due to learning. We indicate this by introducing time to these notions. We denote by E_t the synapses, by deg_t(v) the indegree of v, by Γ_t^k(A) the k-neighborhood of A, and by E_t(A, B) the synapses from A to B in the network at time step t. The corresponding density of synapses is then defined as p_t := |E_t| / n².

In the absence of spontaneous activity and inhibition, we denote the set of neurons spiking at time t by A_t. Thus, the spread of
activity starting from a stimulus A_1 can be recursively defined as A_{t+1} = Γ_t^k(A_t). In this case, the sub-network of active synapses defines the spread of activity. In the presence of spontaneous activity, we denote the set of neurons spiking at time t by A_t^+ and the neurons which would spike even in the absence of spontaneous activity by A_t.
Synfire chains. A synfire chain is a structure in a neural network connecting neuron groups of roughly the same size (patterns) in a sequence such that synchronous activity in one pattern elicits synchronous activity in only the following pattern after one synaptic delay, see Figure 21. Griffith proposed the underlying connectivity scheme (Griffith, 1963) and Moshe Abeles established it together with its dynamics as synfire chain (Abeles, 1991). If in our model the activation of the stimulus $A_1$ results in the activation of exactly $m$ neurons in the subsequent $l - 1$ time steps (i.e. if $|A_1| = \ldots = |A_l| = m$ holds in the absence of spontaneous activity), then the activity propagates along a synfire chain of pattern size $m$, spike threshold $k$, and length $l$ with patterns $A_1, \ldots, A_l$. We define the length of the synfire chain starting from the stimulus $A_1$ to be the smallest $l$ such that $|A_{l+1}| \ne m$ or $A_{l+1} = A_t$ for some $1 \le t \le l$ (the second condition defines the length for cyclic chains).
Relaxations. The simple model introduced above works with unrealistically precise constraints. For example, inhibition is triggered if more than m neurons spike in one time step. While this simplifying approach helps tremendously to understand how the model works, it is important to observe that none of these constraints is essential for the process of chain formation. For this reason, we sketch a relaxed model in which the hard constraints are mitigated (below we also discuss a continuous-time model with spiking neurons, see Section 2.2.2). As in the simple model, we assume that the same mechanism triggers inhibition and the feedback signal. Recall that there, the mechanism can detect exactly whether or not more than $m$ neurons spike in one time step. In the relaxed model we use a probabilistic mechanism with accuracy probability $p_{acc}$: if $|A_t^+| > m$, then the mechanism detects with probability $\min(1, (|A_t^+| - m) \cdot p_{acc})$ that too many neurons spike in step $t$. Hence, if the number of active neurons exceeds the pattern size $m$ by 1, then the mechanism detects this with probability $p_{acc}$, but if the number of active neurons is at least $m + 1/p_{acc}$, then it does so with probability 1. Note that this does not only relax the accuracy, but also the timing of the mechanism since a too large activity may not be detected right away but only several steps later. Moreover, we make the learning rule probabilistic through the parameters $p_{inc}$ and $p_{dec}$. A synapse responds to an LTP signal with probability $p_{inc}$ and to an LTD signal with probability $p_{dec}$. Hence, an active set of size larger than $m$ yields an LTP signal (with probability at most $p_{inc} \cdot (1 - p_{acc})$) or an LTD signal (with probability at least $p_{acc} \cdot p_{dec}$). We thus require $p_{inc} \cdot (1 - p_{acc}) < p_{dec} \cdot p_{acc}$ to achieve stability.
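A minimal sketch of the relaxed, probabilistic mechanism might look as follows; the function names are ours, and the stability condition in the comment is the one stated above, not derived here.

```python
import random

def too_many_detected(active_size, m, p_acc, rng=random):
    """Detect |A_t^+| > m with probability min(1, (|A_t^+| - m) * p_acc)."""
    if active_size <= m:
        return False
    return rng.random() < min(1.0, (active_size - m) * p_acc)

def synapse_responds(ltp, p_inc, p_dec, rng=random):
    """A synapse responds to an LTP signal with prob. p_inc, to LTD with p_dec."""
    return rng.random() < (p_inc if ltp else p_dec)

# Stability requires that erroneous potentiation is rarer than correct depression:
#   p_inc * (1 - p_acc) < p_dec * p_acc
```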
Parameters. The following parameters were used in Figures 22–25 if not explicitly specified otherwise in the respective caption. We simulate a network of n = 500 neurons where the pattern size is m = 30 and the spike threshold is k = 3. This determines the connection probability $(1 + \delta) \cdot p$, where the equilibrium probability $p$ is roughly $15/n$ according to Equation (2.3) from Section 2.3.1 below and $\delta = 0.2$. We simulate both the basic ($p_{acc} = 1$) and the relaxed version ($p_{acc} = 0.2$). We set the remaining parameters to $p_{dec} = 1$, $p_{inc} = 0.1$, $c^* = 100$, and $r^* = 50$.
We take $p_{inc} = 0.1$ for both the simple and the relaxed model to make them comparable. The default initial distribution of active synapses is uniform (i.e. $p_{act} = 0.5$). Active synapses are initialized with consolidation value 1 and all synapses are initialized with irresolution value 0. To speed up the simulations we use an artificial mechanism of spontaneous activity: if the activity is too small in one round, we additionally activate a random neuron (this is not crucial for chain formation, see Section 2.2.2).
Technical assumptions and notation. We carry out an asymptotic analysis with the number of neurons going to infinity (i.e. $n \to \infty$). In asymptotic statements, we write $a \sim b$ to indicate that $a$ and $b$ agree up to smaller order terms (i.e. $a = (1 \pm x) \cdot b$, with $x \ll 1$) and $a \approx b$ to indicate that $a$ is of the order of $b$ (i.e. $a = c \cdot b$, with constant $c > 0$). We require $m \ll n$ and $m \gg \log n$ (e.g. $m = \sqrt{n}$) and choose the connection probability $(1 + \delta) \cdot p$, where $p$ satisfies Equation (2.3) from Section 2.3.1 below and $\delta \ll 1$. The spike threshold $k \ge 2$ is a constant integer. In our analysis we assume that $\deg^{A_t}_t(v)$ is $\mathrm{Bin}(|A_t|, p_t)$-distributed, independently for different $v$. Analogously, for present and consolidated synapses. This condition is not fully satisfied: if $u$ and $v$ are neurons such that $\deg(u) > \deg(v)$ then $u$ has a larger probability to appear in a pattern than $v$. This means that for fixed $t$, neuron $u$ is more likely to appear in $A_t$ than $v$. Therefore, we also consider random networks with fixed indegree $d$, denoted $G_{n,d}$. To generate such a network, every neuron $v$ chooses an input set $I_v \subseteq V$ of $d$ vertices uniformly at random, and we insert all synapses $uv$ for $u \in I_v$.
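For concreteness, a random network with fixed indegree $d$ ($G_{n,d}$) can be generated as sketched below; representing synapses as directed pairs (u, v) is our choice.

```python
import random

def generate_gnd(n, d, rng=random):
    """G_{n,d}: every neuron v draws an input set I_v of d distinct neurons
    uniformly at random; the synapse (u, v) is inserted for every u in I_v."""
    synapses = set()
    for v in range(n):
        for u in rng.sample(range(n), d):
            synapses.add((u, v))
    return synapses
```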
2.2.2 Network of spiking neurons

In this section we transfer the learning rule from Section 2.2.1 to a network of conductance-based leaky integrate-and-fire neurons (LIF neurons) with continuous-time dynamics. We simulated the network using the NEST neural simulation tool (Gewaltig & Diesmann, 2007). The excitatory population consists of n = 200 conductance-based LIF neurons (NEST iaf_cond_exp) with membrane capacitance $C_m = 1\,\mu\mathrm{F/cm^2}$, leak reversal potential $V_l = -60$ mV, excitatory reversal potential $V_E = 0$ mV, inhibitory reversal potential $V_I = -70$ mV, constant leak conductance $g_L = 0.4\,\mathrm{mS/cm^2}$, threshold potential $V_\theta = -50$ mV, and synaptic time constants $\tau_s = 4$ ms, in accordance with (Fiete, Senn, Wang, & Hahnloser, 2010). The refractory period is $t_{ref} = 25$ ms and the reset potential is $V_{reset} = -60$ mV, as implemented by individual inhibition in (Fiete et al., 2010). Each excitatory neuron spikes spontaneously according to a Poisson process with rate $\lambda_{spon} \approx 0.03$ Hz.

The excitatory population is randomly interconnected by plastic synapses (adaptation of the synapse introduced in Section 2.2.1, discussed below) such that every excitatory neuron has indegree 35 (i.e. $\delta \approx 0.4$). The delay of the plastic synapses is $d_{EE} = 5$ ms, making up for the burst time in (Fiete et al., 2010). The weight of an active synapse is such that 5 EPSPs occurring in a short period trigger a spike (i.e. k = 5), whereas the weight of a silent synapse is 0. Initially, all synapses are silent.

Inhibition is implemented by a single neuron (NEST iaf_cond_exp) with membrane capacitance $C_m = 1\,\mu\mathrm{F/cm^2}$, leak reversal potential $V_l = -60$ mV, excitatory reversal potential $V_E = 0$ mV, inhibitory reversal potential $V_I = -70$ mV, constant leak conductance $g_L = 0.4\,\mathrm{mS/cm^2}$, threshold potential $V_\theta = -50$ mV, and synaptic time constants $\tau_s = 2$ ms. The refractory period is $t_{ref} = 5.0$ ms and the reset potential is $V_{reset} = -60$ mV. Every excitatory neuron is connected to the inhibition via a static synapse (NEST static_connection) with delay $d_{EI} = 1$ ms. The weight of such a synapse is such that 21 EPSPs occurring in a short period of time trigger a spike (i.e. m = 20). Moreover, the inhibition is connected to all excitatory neurons via a static synapse (NEST static_connection) with delay $d_{IE} = 1$ ms where the weight is chosen such that hyperpolarized neurons can essentially not spike in the time interval when the next excitatory input is expected.
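The following sketch indicates how such an excitatory population could be set up with NEST's iaf_cond_exp model (assuming a recent NEST version). NEST expects the capacitance in pF and the conductance in nS, so converting the per-area values above requires assuming a membrane area; the conversion below (for an assumed area of 2e-4 cm^2) is therefore purely illustrative, as is the way spontaneous input is wired.

```python
import nest  # NEST neural simulation tool (Gewaltig & Diesmann, 2007)

nest.ResetKernel()
nest.SetKernelStatus({"resolution": 0.2})  # ms, as used in the simulations

# Illustrative conversion assuming a membrane area of 2e-4 cm^2:
#   1 uF/cm^2 -> 200 pF,   0.4 mS/cm^2 -> 80 nS
exc_params = {
    "C_m": 200.0,       # pF, membrane capacitance
    "g_L": 80.0,        # nS, constant leak conductance
    "E_L": -60.0,       # mV, leak reversal potential
    "E_ex": 0.0,        # mV, excitatory reversal potential
    "E_in": -70.0,      # mV, inhibitory reversal potential
    "V_th": -50.0,      # mV, threshold potential
    "V_reset": -60.0,   # mV, reset potential
    "t_ref": 25.0,      # ms, (long) refractory period
    "tau_syn_ex": 4.0,  # ms, excitatory synaptic time constant
    "tau_syn_in": 4.0,  # ms, inhibitory synaptic time constant
}
excitatory = nest.Create("iaf_cond_exp", 200, params=exc_params)

# Sketch of spontaneous activity: a Poisson source with the stated rate; driving
# actual spontaneous spikes would additionally require a suprathreshold weight.
noise = nest.Create("poisson_generator", params={"rate": 0.03})  # Hz
nest.Connect(noise, excitatory)
```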
Learning rule. The plastic synapses have the same properties as introduced in Section 2.2.1 and obey the same learning rule, where we introduce an STDP window to make up for the absence of time steps. They are an adaptation of NEST stdp_dopa_connection, which implements the feedback signal as a neuromodulator (Potjans, Morrison, & Diesmann, 2010). Let $t_{pre}$ ($t_{post}$) be the time of a presynaptic (postsynaptic) spike. The learning rule is as follows:
• if there is no feedback signal in the time interval $[t_{pre}, t_{pre} + \Delta^-]$ and $\varepsilon \le t_{post} - t_{pre} \le \Delta^+$, then LTP is triggered;

• if there is a feedback signal in the time interval $[t_{pre}, t_{pre} + \Delta^-]$ and $\varepsilon \le t_{post} - t_{pre} \le \Delta^-$, then LTD is triggered (if two such intervals overlap, then LTD is triggered only once).
For each presynaptic spike, only the closest postsynaptic spike is considered. If the consolidation value of a synapse reaches $c^*$, then it cannot be decreased. The feedback signal is coupled to the inhibition: if the inhibition spikes, then 0.1 ms later the feedback signal is present.
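Stated procedurally, the decision whether a given pre/post spike pair triggers LTP or LTD can be sketched as follows; the function name and the representation of feedback times are ours, and the default window values are those listed below.

```python
def plasticity_event(t_pre, t_post, feedback_times,
                     delta_plus=10.0, delta_minus=60.0, eps=3.0):
    """Return 'LTP', 'LTD', or None for one pre/post spike pair (times in ms),
    given the times at which the feedback signal was present."""
    feedback = any(t_pre <= t_f <= t_pre + delta_minus for t_f in feedback_times)
    dt = t_post - t_pre
    if not feedback and eps <= dt <= delta_plus:
        return "LTP"
    if feedback and eps <= dt <= delta_minus:
        return "LTD"
    return None
```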
The synaptic parameters are $\Delta^+ = 10$ ms, $\Delta^- = 60$ ms, $\varepsilon = 3$ ms, $c^* = 100$, and $r^* = 50$. Moreover, plastic synapses are subject to a small decay which triggers LTD with a rate of $\approx 0.6$ Hz (Miller & Jin, 2013).
Choice of parameters. The parameters must satisfy some conditions, which we list here. The positive learning window $\Delta^+$ must be such that only synapses connecting neurons in subsequent patterns get consolidated by LTP. Therefore, it is of the order of the synaptic delay or the burst time. The negative learning window $\Delta^-$, however, is chosen such that LTD can remove any conflicting synapses. Thus, $\Delta^-$ is on the same time scale as the membrane time constant. The decay must be strong enough to prevent synapses which do not connect neurons in subsequent patterns from becoming essential parts of the chain, as this would create instability since they are not consolidated by LTP. Furthermore, the (long) refractory period is a simplified model of a mechanism preventing neurons from bursting due to slowly arriving synaptic current. Such mechanisms include specific inhibition for each neuron as in (Fiete et al., 2010), neuron models with adaptation, or a strong relative refractory period. The feedback signal is coupled to the inhibition almost instantaneously. However, a feedback signal affecting a pre-post pair can arrive anywhere in the interval $[t_{pre}, t_{pre} + \Delta^-]$ (and the right bound of the interval is arbitrary and can be much larger). Thus, a longer delay between inhibition and feedback signal, as expected for neuromodulatory signals, is feasible.
Simulation. The stimulus is a random subset of the excitatory population containing m = 20 neurons. The stimulus is simultaneously activated every 300 ms. We perform 150,000 reactivations of the stimulus in a continuous segment. These reactivations correspond to roughly 12.5 h of simulated time. The resolution of the simulation is 0.2 ms.
2.2.3 Network model for one-shot learning of sequences

Here, we sketch a neural network model of a short-term memory for sequences. Already in the 1950s, Lashley suggested that memory items cannot be directly linked together to form a sequence (Lashley, 1951). This suggestion led Conrad to his positional theory of sequence learning in short-term memory (Conrad, 1965), which is now known as Conrad's boxes: he suggested that each item is linked to a box and that sequence recall corresponds to stepping through the boxes in sequential order. This idea inspired our model: we represent the boxes by the patterns of a synfire chain and the linking to memory items is done by Hebbian learning in a one-shot fashion such that the sequence to be learned needs to be presented only once.
Network architecture. The network consists of a hidden layer and a visible layer. The hidden layer is a sparse network as described in Section 2.2.1 and it will contain a synfire chain representing the sequential ordering. For simplicity, the visible layer is comprised of entities which represent the symbols in $\mathcal{S}$, the underlying alphabet of the sequences to be learned. Moreover, the visible layer is a 1-winner-takes-all (WTA) network. That is, at each time step during recall the entity with the largest input is active (in a slightly more complex setup, if the visible layer is a population of neurons without WTA dynamics, then sequences of precisely timed neuronal activity in the visible layer can be learned). The two layers are connected
via afferent synapses from the hidden layer to the visible layer (each possible afferent synapse is independently present with probability $p_{aff}$). The learning rule of these afferent synapses is Hebbian as discussed below. For an illustration of the network architecture, see Figure 27 (a). Note that the described architecture with a hidden network layer and visible read-out units resembles SORNs (Lazar et al., 2009) and reservoir computing (Maass, Natschläger, & Markram, 2002), and can also be found in a recent model of hippocampal replay (Gauy et al., 2018).
Learning and recall. Learning takes place in two phases. In the first phase, a synfire chain $A_1, \ldots, A_l$ emerges in the hidden layer, as shown below in Section 2.3.2. This first phase may take a long time, but it is completely self-organized in the sense that it is unsupervised and does not require any external input; in particular it does not require any information about the sequences to be learned. The network automatically converges to a state with a long chain, and it will stay in this state after convergence. Hence, there is no need for a supervisor who decides when the first phase should end, since it can just go on indefinitely. For an illustration of the network after the first phase of learning, see Figure 27 (b). In the second phase, the network learns an input sequence from a single presentation (i.e. in a one-shot fashion). More precisely, we assume that all afferent synapses are initially silent. When an input sequence $s_1, \ldots, s_l$ with $s_t \in \mathcal{S}$ is to be learned, the entity $s_t$ is activated in the $t$-th time step (by the teacher). Moreover, the stimulus $A_1$ is activated in the first step. The learning rule is a simple Hebbian learning rule: in each step, all afferent synapses between the active pattern and the active entity turn strong (Erickson, Maramara, &
Lisman, 2010). Note that by the construction of the chain, in the $t$-th step the active pattern in the hidden layer is $A_t$, and the active entity in the visible layer is $s_t$. For an illustration of the network after the second phase of learning, see Figure 27 (c). For recall, the stimulus in the hidden layer is activated, and activity propagates through the network. The output of the network at time $t$ is the active entity in the visible layer at time $t$. The length of the recalled sequence is the number of time steps until the first mistake.
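Under the simplifying assumption that the chain $A_1, \ldots, A_l$ is given and activity propagates reliably along it, one-shot learning and recall can be sketched as follows; the data structures and names are ours.

```python
def one_shot_learn(chain, sequence, afferent_exists):
    """Hebbian one-shot learning: in step t, every existing afferent synapse
    between the active pattern chain[t] and the presented symbol turns strong.
    afferent_exists[(neuron, symbol)] is True if that afferent synapse is present."""
    strong = set()
    for pattern, symbol in zip(chain, sequence):
        for neuron in pattern:
            if afferent_exists.get((neuron, symbol), False):
                strong.add((neuron, symbol))
    return strong

def recall(chain, alphabet, strong):
    """1-winner-takes-all readout: in step t, the symbol receiving the largest
    input from the active pattern chain[t] is the output."""
    output = []
    for pattern in chain:
        scores = {s: sum((v, s) in strong for v in pattern) for s in alphabet}
        output.append(max(scores, key=scores.get))
    return output
```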
2.3 Results
We start with an outline of our principal findings, which are then explained in greater detail below.

In Section 2.3.1 we substantiate the observation made by Moshe Abeles that random networks without learning contain the connectivity scheme underlying synfire chains. However, we find that – even if parameters are fine-tuned – in the absence of learning the activity diverges after $\approx \log m$ steps, and thus the length of a functional synfire chain with pattern size $m$ in a random network of $n$ neurons is only in that order of magnitude.

In Section 2.3.2, we describe how long chains emerge due to our learning rule: the stimulus is repeatedly activated and the activity propagates along the chain developed so far until at the end of the chain either new neurons are recruited by spontaneous activity and LTP to grow a pattern, or LTD carves out a pattern of the correct size. The chain is stabilized by the metaplasticity parameters of the synapses so that the growth stops as soon as the capacity of the network is reached. After convergence, the network remains in a stable state. Simulation results are summarized in Figures 22–25. The simulation reveals that the rule works well even in the relaxed model with imprecise mechanisms (Figure 22). We show that there is a trade-off in the size of the connection probability. If it is too small, then no synfire chain can develop in the network, and if it is too large, then the chains get short and cyclic (i.e. patterns are highly correlated; confirmed by simulation in Figure 25). Since a large connection probability corresponds to the possibility of forming (almost) all synaptic connections, this observation shows that formation of synapses is not only not required but actually obstructive for the process of chain formation. Moreover, Figure 23 shows the maximum overlap between patterns and implies efficient reuse of neurons. Finally, we show by simulation that indeed multiple chains emerge if multiple stimuli are activated (Figure 24).

In Section 2.3.3, we determine the length of the chains asymptotically to be in the order of $(n/m)^{2 - 1/k} \log m$, where $n$ is the number of neurons, $m$ is the pattern size of the chain, and $k$ is the spike threshold, see Equation (2.11). Thus learning improves the length by a factor of $\approx (n/m)^{2 - 1/k}$ and neurons get reused $\approx (n/m)^{1 - 1/k} \log m$ times on average in the chain. Note that for $m = \sqrt{n}$ the improvement is essentially by a factor proportional to the number of neurons in the network. Simulations confirm the asymptotic results for finite $n$, see Figure 22, if parameters admit the assumptions made in the analysis.

In Section 2.3.4, we show that one cannot hope for longer chains (unless the patterns in the chain may be highly correlated) by presenting an asymptotically matching upper bound, which holds for all non-structural learning rules: by an information theoretic argument, every learning procedure where each pattern contributes a γ-fraction of its maximal information, and no additional synapses are formed, can only produce a chain of length at most $\approx (n/m)^{2 - 1/k}/\gamma$.
In Section 2.3.5, we investigate the learning rule in a network of conductance-based LIF neurons. Here, we present simulation results (Figure 26) and argue that in both models essentially the same connectivity scheme emerges in the network. Finally, in Section 2.3.6, we demonstrate that the developed synfire chains can be used to learn sequences in a one-shot fashion, see Figure 28.
2.3.1 Sparse random networks contain many synfire chains

Already in 1991, Moshe Abeles observed that sparse random networks (with parameters as locally observed in the cortex) contain an abundance of connectivity schemes resembling synfire chains (Abeles, 1991). Here we show that although random networks with large enough connection probability contain many synfire chains, the activity does not propagate along a single chain, but rather explodes (even if parameters are fine-tuned). Our learning rule avoids this by removing few redundant synapses. Similarly, it has recently been shown that modifying only a small fraction of synapses in a random network is sufficient to match recorded sequences of neural activity closely (Rajan, Harvey, & Tank, 2016).

For now, consider the simple model without spontaneous activity, inhibition, and learning. Under these assumptions, all synapses may be considered active (since silent ones cannot get active in case no learning is involved). Let $A$ be a set of $m$ neurons. Since each synapse is independently present with the connection probability $p$, the indegree $\deg^A(v)$ of each neuron $v$ is $\mathrm{Bin}(m, p)$-distributed, independently of all other
neurons' indegrees, and we get

$$E[|\Gamma^k(A)|] = n \cdot \sum_{i \ge k} \binom{m}{i} p^i (1 - p)^{m - i} \qquad (2.1)$$

$$\sim n \cdot \frac{(mp)^k}{k!}. \qquad (2.2)$$

We define the equilibrium connection probability $p$ as the connection probability satisfying the equilibrium condition $E[|\Gamma^k(A)|] = m$. Solving Equation (2.1) for $p$ yields
$$p \sim \left(\frac{k!}{n m^{k-1}}\right)^{1/k}. \qquad (2.3)$$

If the connection probability is significantly smaller or larger than $p$, then a few time steps (more precisely, $\approx \log \log n$ steps (Janson et al., 2012)) after activating $m$ neurons, either zero or all neurons spike in one time step. In the first case, the network typically contains no synfire chain with pattern size $m$ and spike threshold $k$, whereas in the second case an abundance of them are present since one can pick the next pattern of size $m$ from the k-neighborhood of the current pattern and proceed recursively. However, even if the connection probability is exactly equal to $p$, then after $\approx \log m$ time steps the activity is either larger than $2m$ or smaller than $m/2$ and thus the length of a functional synfire chain in the network is only $\approx \log m$. This can be seen as follows.

From Equation (2.1) we see that if we have $|A| = (1 + \delta)m$, where $\delta \ll 1$ may be positive or negative, then

$$E[|\Gamma^k(A)|] \sim n \cdot \frac{((1 + \delta)mp)^k}{k!} \sim (1 + k\delta)m \qquad (2.4)$$
holds. Thus, the error $\delta$ is multiplied by a factor of $k$. In the equilibrium condition, we have that if $|A| \approx m$, then $|\Gamma^k(A)|$ is $\mathrm{Bin}(n, \approx m/n)$-distributed, which has variance $\approx m$. Therefore, the relative error is $\approx 1 + \sqrt{m}/m = 1 + 1/\sqrt{m}$. If we start by activating $A_1$ of size $m$ and denote the error in each time step by $\delta_t$, then under the assumption that the indegree $\deg^{A_t}(v)$ of each neuron $v$ is $\mathrm{Bin}(|A_t|, p)$-distributed, independently of all other neurons' indegrees, the error grows like
$$\delta_{t+1} \sim k \delta_t = k^t \delta_1 \approx \frac{k^t}{\sqrt{m}}. \qquad (2.5)$$
In particular, the relative error will be a constant factor for $t \sim \log(m)/(2 \log k)$.
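The following sketch (ours, for illustration only) determines the equilibrium connection probability numerically from the condition E[|Γ^k(A)|] = m and then iterates the expectation map starting from a pattern with a 1% size error; the number of steps until the activity leaves the window [m/2, 2m] is of the order log m, as claimed.

```python
from math import comb, log

def expected_next(a, n, p, k):
    """E[|Gamma^k(A)|] for |A| = a, cf. Equation (2.1)."""
    tail = sum(comb(a, i) * p**i * (1 - p)**(a - i) for i in range(k, a + 1))
    return n * tail

def equilibrium_p(n, m, k, iters=60):
    """Numerically solve the equilibrium condition E[|Gamma^k(A)|] = m for p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if expected_next(m, n, mid, k) < m else (lo, mid)
    return (lo + hi) / 2

n, m, k = 10_000, 100, 3
p = equilibrium_p(n, m, k)
size, steps = 1.01 * m, 0          # start with a 1% error in the pattern size
while m / 2 < size < 2 * m:
    size = expected_next(round(size), n, p, k)
    steps += 1
print(p, steps, log(m))            # the error explodes after roughly log m steps
```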
2.3.2 Learning rule stabilizes synfire chain

In this section we demonstrate how a synfire chain grows during the learning procedure, for simulation results see Figures 22–25. Consider the simple model with spontaneous activity, inhibition, and learning. We start by describing the dynamics of the network in a single round of learning: activation of the stimulus $A_1$ results in the activation of $A_2^+$ in the second time step, which activates $A_3^+$ in the third time step and so forth. Consider the $t$-th time step. We distinguish three scenarios. First, if $|A_t^+|$ is zero, the activity dies out and learning continues by reactivating the stimulus in the next round. Second, if $|A_t^+| \le m$ (but larger than zero), then by the learning rule all synapses in $E_{t-1}(A_{t-1}, A_t^+)$ (i.e. all synapses in the network that connect a presynaptic neuron in $A_{t-1}$ and a postsynaptic neuron in $A_t^+$) increase their consolidation value due to LTP; in this way
silent synapses can become active. Third, if $|A_t^+| > m$, then synaptic depression is triggered: all synapses in $E_{t-1}(A_{t-1}, A_t^+)$ decrease their consolidation value and active synapses might get silent. Additionally, in this case, inhibition is triggered and stops the spread of activity, which results in the procedure being continued through reactivation of the stimulus in the next round.
Formation of patterns. Consider the $t$-th time step and assume that the patterns $A_1$ to $A_{t-1}$ have the correct size $m$ and that additionally $A_{t'}^+ = A_{t'}$ holds for all $1 \le t' < t$, so that the chain was not interrupted by spontaneous activity.

If $|A_t| < m$, then the only way $A_t$ could grow is if a neuron $v$ which is not already in $A_t$ is activated by spontaneous activity in the $t$-th round and if $v$ additionally has at least $k$ in-neighbors in $A_{t-1}$ (i.e. $\deg^{A_{t-1}}_{t-1}(v) \ge k$ holds). By LTP, future activation of $A_{t-1}$ (in later rounds) results in the activation of at least $A_t \cup \{v\}$. Thus, neurons are recruited into patterns by spontaneous activity.

If $|A_t^+| > m$, then synapses in $E_{t-1}(A_{t-1}, A_t^+)$ decrease their consolidation value due to LTD and may turn silent (or be removed, which we discuss later). Hence, in this case the future activation of $A_{t-1}$ results in the activation of a smaller pattern and a pattern of correct size is carved out eventually.
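The following schematic (ours) summarizes the control flow of one learning round in the simple model on explicit data structures; it omits the irresolution bookkeeping and uses a per-neuron spontaneous-spike probability, so it is a simplification rather than a faithful reimplementation.

```python
import random

def run_round(consol, present, neurons, stimulus, m, k, c_max, p_spont, rng=random):
    """One round of the simple model (schematic). present is a set of synapses
    (u, v); consol maps each of them to its consolidation value; a synapse is
    active if its consolidation value is at least 1."""
    prev = set(stimulus)
    while prev:
        indeg = {}
        for (u, v) in present:
            if u in prev and consol[(u, v)] >= 1:
                indeg[v] = indeg.get(v, 0) + 1
        active = {v for v, d in indeg.items() if d >= k}           # driven by prev
        active |= {v for v in neurons if rng.random() < p_spont}   # spontaneous spikes
        if not active:
            return                                   # activity died out
        delta = 1 if len(active) <= m else -1        # LTP if healthy, LTD if too large
        for (u, v) in present:
            if u in prev and v in active:
                if delta < 0 and consol[(u, v)] == c_max:
                    continue                         # consolidated synapses resist LTD
                consol[(u, v)] = max(0, min(c_max, consol[(u, v)] + delta))
        if delta < 0:
            return                                   # inhibition stops the spread
        prev = active
```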
Stable growth of the chain. Turning synapses active to recruit neurons for a pattern may also result in an increase of the size of previous patterns in the chain. However, if previous patterns become too large in this way, only these recently activated synapses are turned silent by LTD in the next round since their consolidation values are small compared to the synapses that have been part of the chain for longer
and thus have higher consolidation values. Note that not all, but only synapses which cause this issue repeatedly need to be removed. The removal is controlled by the irresolution value. Thus, the interplay of the consolidation and the irresolution value ensures that the chain does not break and rather grows in a stable manner.
Convergence of chain development. The previous considerations also explain why the length of the chain is limited and how the growth eventually converges. Recall that $l$ is the smallest index such that $|A_{l+1}| \ne m$ or $A_{l+1} = A_t$ for some $1 \le t \le l$. If $|\Gamma^k(A_l)| < m$ and recruiting new neurons by spontaneous activity is impossible (as many synapses have been removed already), then the chain cannot grow further. If $|\Gamma^k(A_l)| > m$ and all synapses from $E_l(A_l, A_{l+1})$ are consolidated because they connect previous patterns, then the chain cannot grow since it is not possible to reduce $A_{l+1}$ to a pattern of correct size. Similarly, if $A_{l+1} = A_t$ for any $1 \le t < l + 1$, then the chain becomes cyclic and does not change. Hence, the chain only grows until one of these three cases occurs.
Connection probability trade-off. The connection probability must be at least as large as the equilibrium connection probability $p$ according to Equation (2.3). Otherwise, it is even unlikely that the second pattern can have size $m$. However, choosing the connection probability exactly equal to $p$ does not yield a long synfire chain, as discussed above. Perhaps counterintuitively, it is also not useful to start with a much larger connection probability since this increases the correlation of patterns, which results in short cyclic chains, see Figure 25. Such correlations have been a problem in previous models (Hertz & Prügel-Bennett, 1996; Levy, Horn, Meilijson, & Ruppin, 2001; Kitano, Câteau, & Fukai, 2002;
Zheng & Triesch, 2014). To understand this, observe that if the connection probability is much larger than $p$, then $\Gamma^k(A)$ contains many neurons with large (i.e. $\gg k$) indegree into the pattern $A$. Consequently the learning rule leads to many neurons with a large active indegree. However, neurons with large active indegree are likely to be in many patterns, which results in highly correlated patterns. A connection probability close to $p$ avoids this problem, see Figure 23 (d).
[Figure 22: chain length plotted against $p_{active}$ for the simple and the relaxed model; panels (a) population size n, (b) pattern size m, (c) spike threshold k, (d) density parameter δ.]
Figure 22: Length of the chain. The intensity of the color corresponds to the variation given at the top of each panel. We performed 200 trials and the error bars show the standard error of the mean. The dashed lines show the analytic chain length obtained from Equation (2.10); in (d) the value for δ = 0.5 is 111 (not shown). If not altered in the plot, the number of neurons is n = 500, the pattern size is m = 30, the spike threshold is k = 3, and the density parameter is δ = 0.2, see Section 2.2.1 for all parameter values.
[Figure 23: maximum overlap between two patterns of the emerged chain compared to random control patterns, for $G_{n,p}$ and $G_{n,d}$; panels (a) population size n, (b) pattern size m, (c) spike threshold k, (d) density parameter δ.]
Figure 23: Maximum overlap between two patterns of the emerged chain ($p_{inc} = 1$). We compare to a sequence of random patterns of the same length (control), as indicated by the color intensity. We performed 50 trials and the error bars show the standard error of the mean. If not altered in the plot, the number of neurons is n = 500, the pattern size is m = 30, the spike threshold is k = 3, and the density parameter is δ = 0.2, see Section 2.2.1 for all parameter values.
[Figure 24: summed lengths plotted against the number of chains for the simple and the relaxed model with $p_{active} \in \{0, 0.5, 1\}$.]
Figure 24: Multiple chains. We performed 50 trials and the error bars show the standard error of the mean. The number of neurons is n = 500, the pattern size is m = 30, the spike threshold is k = 5, and the density parameter is δ = 0.2, see Section 2.2.1 for all parameter values.
[Figure 25: chain length plotted against the density parameter δ for the simple and the relaxed model with $p_{active} \in \{0, 0.5, 1\}$.]

Figure 25: Effect of increasing δ (i.e. the connection probability) on the length of the chain. For δ ≥ 0.5 the chains tend to become cyclic. We performed 50 trials and the error bars show the standard error of the mean. The number of neurons is n = 500, the pattern size is m = 30, and the spike threshold is k = 3, see Section 2.2.1 for all parameter values.
2.3.3 Estimation of the chain length

In this section, we determine the length of the chain. We outline an auxiliary procedure which performs slightly worse than our learning procedure but is mathematically tractable. Analyzing the auxiliary procedure gives a lower bound.
Auxiliary procedure. The auxiliary procedure starts with the stimulus $A_1$ (in the first time step) in a network of active synapses and consolidates or removes synapses in order to form a chain. Consider the $t$-th step of the procedure. So far, the chain consists of patterns $A_1, \ldots, A_{t-1}$. The pattern $A_t$ is formed as follows: first the procedure removes (unconsolidated) synapses such that $A_{t-1}$ activates exactly $m$ neurons at time $t$, and second the procedure consolidates synapses to prevent the removal of synapses which are part of the chain (it does so randomly as indicated below). The procedure stops either if too many synapses have been removed or if too many synapses are consolidated. In contrast to our learning procedure, synapses are removed immediately and thus non-optimally. This results in shorter sequences and yields a lower bound on the chain length (a detailed definition of the auxiliary procedure and its relation to the learning procedure can be found in the supplementary material).
Evolution of the densities. To determine when the auxiliary procedure stops we track how the density of active synapses and the density of consolidated synapses evolve (detailed calculations can be found in the supplementary material). Let $p_t$ be the density of active synapses after formation of the $t$-th pattern. Since the connection probability is $(1 + \delta)p$, where $p$ is
given in Equation (2.3) and $\delta \ll 1$, almost all neurons in $\Gamma^k_{t-1}(A_{t-1})$ have exactly $k$ in-neighbors in $A_{t-1}$ and the procedure removes $\sim |\Gamma^k_{t-1}(A_{t-1})| - m$ synapses in step $t$ (typically one from each neuron which will not be in $A_t$). These considerations allow us to determine the density of active synapses as
$$p_t = p + \delta p \left(1 - \frac{mk}{n^2 p}\right)^{t-1}. \qquad (2.6)$$
To determine the evolution $\overline{p}_t$ of the density of consolidated synapses, we assume that in every step $mk$ random synapses are selected to become consolidated. Hence, for a non-consolidated synapse that has survived until step $t$, the probability that it is consolidated in round $t$ is $mk/(n^2 p_{t-1})$. From this, we can conclude that the density of consolidated synapses is

$$\overline{p}_t = p \left(1 - (1 - \delta)\left(1 - \frac{mk}{n^2 p}\right)^{t-1}\right). \qquad (2.7)$$
Length of the chain. From the evolution of the densities we can now determine the expected length of the chain. There are two reasons why the auxiliary procedure could stop at step $t$ (terminating with a chain of length $t - 1$). First, if too many synapses have been removed and therefore $|\Gamma^k_{t-1}(A_{t-1})| < m$ holds, and second, if too many synapses are consolidated and thus $|\overline{\Gamma}^k_{t-1}(A_{t-1})| > m$ holds. We assume that $|\Gamma^k_{t-1}(A_{t-1})|$ and $|\overline{\Gamma}^k_{t-1}(A_{t-1})|$ are binomially distributed. Let $q_t$ be the probability that a neuron has at least $k$ in-neighbors in $A_{t-1}$. As $|A_{t-1}| = m$, we have

$$q_t = \Pr[\mathrm{Bin}(m, p_t) \ge k]. \qquad (2.8)$$
Let Pt be the probability that the procedure stops in the t-th step because of the first reason. We get
$$P_t = \Pr[\mathrm{Bin}(n, q_t) < m]. \qquad (2.9)$$
Analogously, one obtains $\overline{q}_t$, the probability that a neuron has at least $k$ in-neighbors via consolidated edges into $A_{t-1}$, and $\overline{P}_t$, the probability of stopping in the $t$-th step for the second reason. Let $L$ be the random variable for the length of the chain. Since $L$ is geometrically distributed, the expected length is
$$E[L] = \sum_{t=1}^{\infty} t \cdot \prod_{t'=1}^{t-1} (1 - P_{t'}) \cdot P_t. \qquad (2.10)$$

From this, we determine the asymptotics of this expectation as
$$E[L] \approx \frac{n^2 p}{m} \cdot \log m \approx \left(\frac{n}{m}\right)^{2 - \frac{1}{k}} \cdot \log m, \qquad (2.11)$$

substituting $p$ according to Equation (2.3) in the last step. The full calculations can be found in the supplementary material. Note that this (together with having a small overlap of patterns, see Figure 23) implies heavy reuse of neurons in the chain: a neuron will be in $\approx (n/m)^{1 - 1/k} \log m$ patterns on average.

For the small network sizes considered in Figures 22–25, the asymptotic expression of the chain length in Equation (2.11) is not adequate (e.g. for $n = 10^4$, $m = \sqrt{n}$, $\delta = m^{-1/4}$, and $k = 3$ the relative error compared to the simulated auxiliary process is roughly 2). Hence, we compare Equation (2.10) to the simulated learning procedure and obtain good agreement, see Figure 22, in particular if the assumptions we made in our analysis are met: if δ is small as a function of n,
see Figure 22 (a) and (d), and if m is large enough but significantly smaller than n, see Figure 22 (b). Furthermore, since k is assumed to be constant and independent of n, there is good agreement for all k, see Figure 22 (c).
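The expected length in Equation (2.10) can be evaluated numerically from Equations (2.6), (2.8) and (2.9); the sketch below (ours) does this for the first stopping reason only, consistent with the displayed formula, and solves the equilibrium condition for p numerically rather than using the asymptotic expression (2.3).

```python
from math import comb

def binom_tail(n, p, k):
    """Pr[Bin(n, p) >= k]."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def equilibrium_p(n, m, k, iters=60):
    """p with n * Pr[Bin(m, p) >= k] = m, the condition behind Equation (2.3)."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if n * binom_tail(m, mid, k) < m else (lo, mid)
    return (lo + hi) / 2

def expected_length(n, m, k, delta, t_max=5000):
    """E[L] via Equations (2.6), (2.8), (2.9) and (2.10), first stopping reason only."""
    p_eq = equilibrium_p(n, m, k)
    e_len, survive = 0.0, 1.0
    for t in range(1, t_max + 1):
        p_t = p_eq + delta * p_eq * (1 - m * k / (n**2 * p_eq)) ** (t - 1)      # (2.6)
        q_t = binom_tail(m, p_t, k)                                             # (2.8)
        P_t = sum(comb(n, i) * q_t**i * (1 - q_t)**(n - i) for i in range(m))   # (2.9)
        e_len += t * survive * P_t                                              # (2.10)
        survive *= 1 - P_t
        if survive < 1e-9:
            break
    return e_len

print(expected_length(n=500, m=30, k=3, delta=0.2))
```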
2.3.4 An upper bound for non-structural learning rules

In this section, we give an upper bound on the length of a synfire chain in a sparse random network using a short information theoretic argument. This upper bound applies to all learning procedures satisfying the following two conditions: (i) no new synapses can be added to the network and (ii) all patterns in the chain must be reasonably uncorrelated. More precisely, we assume that each pattern contributes entropy $\gamma \log_2 \binom{n}{m}$, for some $0 < \gamma < 1$. Note that this is a γ-fraction of the maximal information that a pattern of size $m$ can contribute. Hence, γ is a measure of how correlated the patterns are: if γ is large, the patterns are essentially random; small γ, however, corresponds to large overlaps among patterns.

On the one hand, by the second assumption the total binary entropy of a chain of length $l$ is at least $l \cdot \gamma \log_2 \binom{n}{m} \sim l \cdot \gamma m \log_2 n$. On the other hand, by the first assumption, we can encode the chain by encoding the network, and for each synapse encoding whether it is active or not. The entropy of $G_{n,p}$ is $H_2(p)n^2$ (where $H_2(x) = -x \log_2 x - (1 - x) \log_2 (1 - x)$ is the binary entropy function), and we can encode for each synapse with one bit whether or not it has been removed. Therefore, the entropy of the chain cannot exceed $H_2(p)n^2 + pn^2 \sim H_2(p)n^2$. Together, we obtain

$$l \cdot \gamma m \log_2 n \le H_2(p)n^2, \qquad (2.12)$$
or equivalently
$$l \le \frac{H_2(p)n^2}{\gamma m \log_2 n} \sim \frac{n^2 p}{m} \cdot \frac{1}{\gamma} \approx \left(\frac{n}{m}\right)^{2 - \frac{1}{k}} \cdot \frac{1}{\gamma}, \qquad (2.13)$$

where the last step holds if $p$ is chosen according to Equation (2.3). Note that this matches the performance computed in Equation (2.11) for $\gamma \approx 1/\log m$. Since we require γ to be not too small (the patterns should be sufficiently uncorrelated), this shows that one cannot hope for a substantially longer chain.
2.3.5 Simulation results of spiking network

In the network of spiking neurons (Section 2.2.2), the neurons integrate input from more than just the previous pattern because the membrane time constant is longer than the synaptic delay. As observed in previous models this poses difficulties for synchrony and chain development. However, our learning rule (in combination with the decay) ensures that only synapses which connect neurons in succeeding patterns are consolidated. Thus, essentially the same connectivity scheme as in the simple model is carved out and a long chain with synchronous transmission emerges. Therefore the analysis of the simple network carries over since the statements made there concern primarily the connectivity scheme of the network.

We simulated 150,000 reactivations of the stimulus and obtained a chain of length $\approx 22$ with pattern size 20, a maximal overlap of $\approx 7$ among two patterns, and using $\approx 189$ out of the 200 neurons, see Figure 26. Hence, on average neurons are used at least twice (see Figure 26 (c) for the histogram of occurrences). Although a comparison in absolute numbers is not too meaningful since it strongly depends on
network parameters, this exceeds previous spiking models qualitatively by many measures such as absolute length, reuse of neurons in the chain, and the ratio of neurons used over population size, see (Levy et al., 2001; Fiete et al., 2010; Waddington, Appleby, De Kamps, & Cohen, 2012). The only model achieving long chains is (Jun & Jin, 2007), which we discuss below in Section 2.4.2.
[Figure 26: (a) chain length over learning rounds (thousands); (b) raster plot of neuron number versus time (ms); (c) histogram of the number of occurrences of neurons in the chain; (d) histogram of the overlap among patterns, each compared to random control patterns.]
Figure 26: (a) Simulation of the network of conductance-based LIF neurons. We performed 32 trials and the error bars show the standard error of the mean; (b) The rasterplot shows the spread of activity along the chain (stimulus onset at time 0) after 150,000 reactivations of the stimulus. The neurons are sorted according to their first occurrence in a pattern; (c) Histogram of the number of occurrences of neurons in the chain (reuse). We compare the emerged chain to a sequence of random patterns of the same length (control). We performed 32 trials and the error bars show the standard error of the mean; (d) Histogram of overlap among patterns in the chain. We compare the emerged chain to a sequence of random patterns of the same length (control). We performed 32 trials and the error bars show the standard error of the mean.
2.3.6 Sequences can be learned in one shot

Learning and recall of sequences in our model from Section 2.2.3 can be analyzed by using the well-established theory of auto- and hetero-associative memory (Gauy, Meier, & Steger, 2017; Einarsson, Lengler, & Steger, 2014; Amit & Fusi, 1994; Knoblauch, Palm, & Sommer, 2010; Willshaw, Buneman, & Longuet-Higgins, 1969) and thus the underlying principles are already understood. If the patterns in the chain are mutually disjoint and all afferent synapses are present, then the length of the sequences which can be learned and fully recalled is the same as the length of the synfire chain: activating the stimulus $A_1$ results in the propagation of activity along the chain, so $A_t$ will be active in round $t$ of recall and $A_t$ thereafter activates $s_t$ (only $s_t$ gets input, if the patterns in the chain are disjoint). Hence, the entire sequence can be recalled. If the patterns of the chain are not disjoint (as in our case), then learning sequences in which letters are repeated is harder and reduces the length of a sequence that can be learned, see Figure 28. Thus, there is a trade-off between the alphabet size of the sequence to be learned and the reusability of neurons in the synfire chain.

As a side note, we remark that if the afferent synapses are transient (they turn weak after some time), then the chain can be used an unlimited number of times to learn and recall sequences. Thus the first learning phase is only needed once, and afterwards provides a network for learning an arbitrary number of sequences, one sequence at a time. Learning a new sequence requires that the weights of the synapses between the hidden layer and the visible layer are reset. This can, for example, be realized by decreasing all weights over time. It is also possible to have multiple smaller chains, see Figure 24.
Assuming that the stimulus of each chain can be activated individually, the network can store many short sequences simultaneously. Note that this network allows storing rhythms where the time grid is given by the synaptic delay. The rate of the sequence to be learned is independent of the time a pattern in the chain is active: if one element of the sequence is presented longer, then it will be bound to several subsequent patterns in the chain and subsequently also presented longer during recall. Furthermore, one can relax the WTA dynamics in the visible layer and replace the abstract entities by neurons, to obtain a network which learns sequences of precisely timed neuronal activity (Brea et al., 2013) in a one-shot fashion.
[Figure 27: (a) hidden layer with stimulus $A_1$ and visible layer with entities a, b, c, d before learning; (b) the same network after emergence of a synfire chain $A_1, \ldots, A_6$; (c) the network after one-shot learning of a sequence.]
Figure 27: Illustration of our network for one-shot learning of sequences. (a) The hidden layer is a network as described in Section 2.2.1 with stimulus $A_1$. The visible layer consists of entities representing the letters a, b, c, and d. Light grey color indicates presence of silent synapses; (b) The network after the first phase of learning. A synfire chain of length 6 developed in the hidden layer. Pink color indicates presence of active synapses; (c) The network after learning the sequence babddc. Note that the sequence contains multiple occurrences of single characters. Pink color indicates presence of active synapses.
[Figure 28: length of the recalled sequence plotted against the alphabet size of the hidden layer, for the emerged chains in $G_{n,p}$ and $G_{n,d}$, random control patterns, and afferent densities $p_{aff} \in \{0.3, 0.5, 0.8\}$.]
Figure 28: Length of recalled sequence for different alphabet sizes. The sequences are random strings over the alphabet. The intensity of the color corresponds to the variation given at the right of the plot, we compare to a sequence of random patterns of the same length as the emerged chain (control), and the color indicates the connection probability $p_{aff}$ between layers. The parameters are n = 1,000, m = 30, k = 3, p satisfies Equation (2.3), similarly d, δ = 0.2, $c^* = 100$, $r^* = 50$, $p_{acc} = 1$, $p_{dec} = 1$, $p_{inc} = 1$, and $p_{act} = 0.5$. Note that if the patterns in the synfire chain were disjoint, the maximum length of a sequence would be $\lfloor n/m \rfloor = 33$. We performed 50 trials and the error bars show the standard error of the mean.
2.4 Discussion
In this section, we first discuss the model assumptions against their biological background. After that, we put our results into the context of related work.
2.4.1 Model assumptions

In our model synapses have binary efficacy: they are either silent, with zero or small efficacy, or active, with large efficacy. The consolidation value determines the efficacy and is increased or decreased by LTP or LTD events, respectively. This model was termed multistate synapse (Ben Dayan Rubin & Fusi, 2007) and has successfully been applied in learning and memory (Baldassi, Braunstein, Brunel, & Zecchina, 2007; Ben Dayan Rubin & Fusi, 2007; Leibold & Kempter, 2008). It is based on experiments reporting that synaptic efficacy has discrete stable states (for review, see (Montgomery & Madison, 2004)) and in particular the discovery of silent synapses (Isaac, Nicoll, & Malenka, 1995; Montgomery, Pavlidis, & Madison, 2001). Moreover, it has been observed that LTP and LTD can switch between large and small efficacy states in an all-or-none fashion (Petersen, Malenka, Nicoll, & Hopfield, 1998; O'Connor, Wittenberg, & Wang, 2005). The plasticity of synapses can depend on previous activity, a phenomenon known as metaplasticity (for review, see (Abraham, 2008)), and this dependency creates discrete plasticity states (Montgomery & Madison, 2004). Furthermore, our synapses can switch from being silent to active only a limited number of times, as implemented by the irresolution value. It is known that silent synapses lack AMPA receptors in their postsynaptic membrane. However, LTP can turn them active by integrating AMPA receptors into their
postsynaptic membrane and LTD correlates with the removal of these receptors (Montgomery & Madison, 2004). The irresolution value simply assumes that this process can only occur a limited number of times.

Our learning rule is a three-factor STDP rule (for a review of such rules, see (Frémaux & Gerstner, 2015; Pawlak et al., 2010)). In addition to pre- and postsynaptic spike events, the learning rule depends on the global activity in the network: if the activity is in a healthy regime, then a postsynaptic spike immediately following a presynaptic spike triggers LTP; however, if the activity is too large, then LTD is triggered. STDP modulated by such an internal feedback signal has been proposed in (Urbanczik & Senn, 2009; Friedrich et al., 2011; Brea et al., 2013). There are multiple proposals on how such a learning rule could be implemented biologically. Neuromodulators can affect STDP (for reviews, see (Pawlak et al., 2010; Frémaux & Gerstner, 2015)) and experimental support where a neuromodulator turns LTP into LTD is given in (Couey et al., 2007; Seol et al., 2007; Kwon, Longart, Vullhorst, Hoffman, & Buonanno, 2005; Cassenaer & Laurent, 2012). Furthermore, it is conceivable that the feedback is determined or implemented by inhibition (Steele & Mauk, 1999; Wilmes et al., 2016). Another possibility is the implementation through astrocytes, as discussed in a similar setting in (Brea et al., 2013). However, an experiment linking it directly to population activity as predicted by our model has so far not been conducted (to the best of our knowledge). Our learning rule does not include LTD if a postsynaptic spike precedes a presynaptic spike (whereas standard STDP predicts LTD), since this LTD component is not necessary for chain formation. However, it is clear that it also does not hinder chain formation: such an LTD component would effectively depress synapses between a pattern and preceding patterns.
This shortens the sequences, since more synapses are removed, but decreases correlations between patterns.

The connectivity of our network is sparse, as indicated by electrophysiological experiments (Mason, Nicoll, & Stratford, 1991; Markram, Lübke, Frotscher, Roth, & Sakmann, 1997), and uniform, justified by the small network size. Hence, not all combinatorially possible synapses can be formed. While synaptogenesis occurs throughout a lifetime, our model does not rely on the possibility to form a synapse between any particular pair of neurons. Moreover, during chain development, few silent synapses are removed permanently from the network, as it has been speculated that silenced synapses are preferred candidates for elimination (Montgomery & Madison, 2004).

The inhibitory mechanism in our model is global and highly simplified. However, the relaxed model indicates that neither timing nor precision need to be fine-tuned, which makes it conceivable that it can be implemented by a population of fast inhibitory interneurons. By designing a more sophisticated inhibitory mechanism that favours spontaneous activity during pattern formation, but reduces its level during replay, the speed of chain formation can be improved. Similarly, the relaxed model shows that timing and accuracy of the feedback signal do not need to be fine-tuned. Moreover, note that as shown in the network of spiking neurons, the time course of learning in the presence of the modulating feedback signal is much larger than in its absence, allowing for a slower mechanism in the first case, which is consistent with the modulator mechanisms discussed above.
2.4.2 Related work

Previously suggested models use Hebbian learning or STDP in combination with limiting the total synaptic input or output of each neuron to indirectly restrict the pattern size in the chain (Bienenstock, 1991; Hertz & Prügel-Bennett, 1996; Jun & Jin, 2007; Fiete et al., 2010; Okubo, Mackevicius, Payne, Lynch, & Fee, 2015). Such restrictions were encouraged by (Abeles, 2009), where it is speculated that this is the optimal approach. However, limiting the total synaptic input or output of a neuron impedes the reuse of neurons in multiple uncorrelated patterns, because a neuron gets input from a limited number of neurons and sends output to a small number of other neurons. To illustrate this phenomenon, consider the (slightly simplified) mechanism of (Jun & Jin, 2007; Miller & Jin, 2013): here the outdegree (divergence) of each neuron is bounded by $d$. Hence, the number of spikes sent by a pattern of size $m$ is $d \cdot m$. By choosing the spike threshold $k$ of the neurons to be (roughly) equal to the divergence $d$, it is ensured that the next pattern has size at most $(m \cdot d)/k = m$ as well. However, a neuron can now only excite $d$ other neurons, and if it occurs multiple times the patterns get highly correlated, which results in cyclic chains. Similarly, neurons can also hardly participate in multiple chains in the network, disagreeing with experimental data (Abeles et al., 1993; Segev et al., 2004; Luczak et al., 2007). Such restrictions are in contrast to our model, which limits the pattern size directly via the feedback signal on global activity. Here, the next pattern is not determined by single neurons in the current pattern (as above), but rather by their combination. This mechanism allows reusing neurons efficiently in many patterns of the chain or multiple chains.

Moreover, previous models suffer from short and cyclic chains
(Bienenstock, 1991; Hertz & Prügel-Bennett, 1996; Levy et al., 2001; Kitano et al., 2002; Zheng & Triesch, 2014). They usually consider networks in which every synapse is present (silent or active) or can be added. For some learning paradigms, dense connectivity is even crucial (Fiete et al., 2010). In our model, restricting the connectivity to a cortical-like sparse random network is actually beneficial because it prevents cyclic, yet still allows long, chains. The only previous work with long emerging chains is (Jun & Jin, 2007); their approach relies on the formation of new synapses and uses a special mechanism of axon remodeling to limit the number of outgoing synapses of a neuron to enforce a certain pattern size (as sketched above). This approach, in turn, hinders the reuse of neurons: their chain has length 67 with a pattern size of 6 to 7, using only 443 neurons of a population of size 1,000. Our approach shows that using a modulated STDP rule can overcome this limitation and that the remodeling/formation of synapses (although speculated in (Jun & Jin, 2007) to be essential) is not necessary for the process of chain formation.
2.5 Supplementary Material
This section contains the full analysis to determine the length of the chain constructed by our learning procedure. We first modify the learning procedure to obtain an auxiliary procedure which performs slightly worse but is mathematically tractable. Analyzing the auxiliary procedure then gives a lower bound.
2.5.1 Auxiliary procedure

For ease of notation we assume that all synapses are active initially (this implies that the notions of present and active synapses are equivalent throughout the procedure, since synapses are either removed or consolidated, but never turned silent). The procedure creates a chain in one 'round', forming a pattern in each time step. First, the stimulus $A_1$ is picked as a random subset of size $m$. Then, the chain is inductively constructed as follows. Assume patterns $A_i$ are already constructed for $1 \le i < t$. All vertices of $A_t$ must belong to $\Gamma^k_{t-1}(A_{t-1})$ since no synapses can be added. If $|\Gamma^k_{t-1}(A_{t-1})| < m$, the procedure thus stops with a chain of length $t - 1$. Otherwise all neurons of $\overline{\Gamma}^k_{t-1}(A_{t-1})$ will be in $A_t$ since consolidated synapses cannot be removed. Thus, if $|\overline{\Gamma}^k_{t-1}(A_{t-1})| > m$, then the procedure stops with a chain of length $t - 1$ (we will see in the analysis that typically the procedure stops because of the first reason). Otherwise, a random subset $X$ of size $m - |\overline{\Gamma}^k_{t-1}(A_{t-1})|$ is chosen from $\Gamma^k_{t-1}(A_{t-1}) \setminus \overline{\Gamma}^k_{t-1}(A_{t-1})$ and synapses are consolidated resp. removed such that in the resulting network $\Gamma^k_t(A_{t-1}) = \overline{\Gamma}^k_{t-1}(A_{t-1}) \cup X$ holds. This means, all synapses from $E_{t-1}(A_{t-1}, A_t)$ are consolidated and unconsolidated edges from $A_{t-1}$ towards $v \notin A_t$ are deleted randomly (by setting their irresolution values to $r^*$) such that $\deg^{A_{t-1}}_t(v) < k$ holds.
2.5.2 Relation to learning rule

We point out why the auxiliary procedure yields a shorter chain than the learning rule. Since both procedures grow patterns by adding random neurons, it is clear that they only differ in how they remove synapses (assuming $c^*$ and $r^*$ are large enough). The following example
demonstrates why the learning procedure makes 'better' choices at removing synapses than the auxiliary procedure. Consider a pattern $A_{t-1}$ and a neuron $v$ with present degree $k$ and consolidated degree 0 (i.e. $\underline{\deg}^{A_{t-1}}_{t-1}(v) = k$ and $\overline{\deg}^{A_{t-1}}_{t-1}(v) = 0$). Assume that both procedures decide that $v$ should not be in $A_t$ (both do so randomly). The auxiliary procedure removes a random unconsolidated synapse from $A_{t-1}$ to $v$ immediately. The learning rule, however, only removes a synapse from $A_{t-1}$ to $v$ if $\deg^{A_{t-1}}_{t'}(v) = k - 1$ for some $t' > t - 1$. In particular it removes the optimal synapse, since all other synapses from $A_{t-1}$ to $v$ are part of the chain.
2.5.3 Evolution of the densities

To determine the time step in which the auxiliary procedure stops, we track how the density of active synapses and the density of consolidated synapses evolve (in expectation). We choose the connection probability as $(1 + \delta)p$, where $p$ is chosen according to Equation (2.3) and $\delta \ll 1$, as Figure 25 indicates that this is best. Moreover, we assume that $\deg^{A_t}_t(v)$ is independently $\mathrm{Bin}(|A_t|, p_t)$-distributed for different $v$ (and analogously for consolidated synapses) for $t \ge 1$. Note that this condition is not fully satisfied: if $u, v$ are neurons such that $\deg(u) > \deg(v)$, then $u$ has a larger probability to appear in a pattern than $v$. This means that for fixed $t$, neuron $u$ is more likely to appear in $A_t$ than $v$, increasing correlation among patterns, see Figure 23. The effect vanishes as the average degree $pn^2 \to \infty$, but since we assume sparse graphs (and thus $pn^2$ to be a slowly growing function), the effect prevails for a long time. To estimate this effect, we consider also random networks with fixed indegree $d$, denoted $G_{n,d}$. To generate such a network, every neuron $v$ chooses uniformly
at random an input set $I_v \subseteq V$ of $d$ vertices, and we insert all edges $(u, v)$ for $u \in I_v$.

We first calculate the evolution of $p_t$, the density of active synapses after formation of the $t$-th pattern. Since $\delta \ll 1$, almost all neurons in $\Gamma^k_{t-1}(A_{t-1})$ have exactly $k$ in-neighbors in $A_{t-1}$ and the procedure removes $|\Gamma^k_{t-1}(A_{t-1})| - m$ synapses in step $t$ (one from each neuron which will not be in $A_t$). This yields
$$p_t = p_{t-1} - \frac{|\Gamma^k_{t-1}(A_{t-1})| - m}{n^2} \qquad (2.14)$$

$$\sim p_{t-1} - \frac{n m^k p_{t-1}^k / k! - m}{n^2} \qquad (2.15)$$

$$= p_{t-1} + \frac{m}{n^2}\left(1 - \left(\frac{p + (p_{t-1} - p)}{p}\right)^k\right) \qquad (2.16)$$

$$\sim p_{t-1} + \frac{m}{n^2}\left(1 - \left(1 + k\,\frac{p_{t-1} - p}{p}\right)\right) \qquad (2.17)$$

$$= p_{t-1}\left(1 - \frac{mk}{n^2 p}\right) + \frac{mk}{n^2}, \qquad (2.18)$$

where (2.15) follows from the definition of $p$ in Equation (2.3) and (2.17) holds since $(p_{t-1} - p)/p \le \delta$ (note that $p_t \in [p, (1 + \delta)p]$). Solving the recursion gives

$$p_t = p + \delta p \left(1 - \frac{mk}{n^2 p}\right)^{t-1}. \qquad (2.19)$$
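As a quick numerical check (ours, not part of the thesis), the closed form (2.19) can be compared against iterating the recursion (2.18); both agree up to floating-point error because the recursion is linear with fixed point p.

```python
def density_recursion(n, m, k, delta, p_eq, steps):
    """Iterate Equation (2.18) from p_1 = (1 + delta) * p_eq and compare the
    result with the closed form of Equation (2.19)."""
    a = m * k / (n**2 * p_eq)
    p_rec = (1 + delta) * p_eq
    for _ in range(2, steps + 1):
        p_rec = p_rec * (1 - a) + m * k / n**2               # Equation (2.18)
    p_closed = p_eq + delta * p_eq * (1 - a) ** (steps - 1)  # Equation (2.19)
    return p_rec, p_closed

# parameters of Section 2.2.1 with equilibrium probability roughly 15/n
print(density_recursion(n=500, m=30, k=3, delta=0.2, p_eq=0.03, steps=50))
```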
In order to determine the evolution $\overline{p}_t$ of the density of consolidated synapses, we assume that in every step $mk$ random synapses are selected to become consolidated. Hence, for a non-consolidated synapse that has survived until step $t$, the probability that it is consolidated in round $t$ is
$mk/(n^2 p_{t-1})$. Thus,

$$\overline{p}_t = \overline{p}_{t-1} + (p_{t-1} - \overline{p}_{t-1})\,\frac{mk}{n^2 p_{t-1}} \qquad (2.20)$$

$$= \overline{p}_{t-1}\left(1 - \frac{mk}{n^2 p_{t-1}}\right) + \frac{mk}{n^2} \qquad (2.21)$$

holds. Solving the recursion and using $\overline{p}_1 = 0$, we get
$$\overline{p}_t = \frac{mk}{n^2} \sum_{r=0}^{t-1} \prod_{s=0}^{r-1} \left(1 - \frac{mk}{n^2 p_{t-s}}\right) \qquad (2.22)$$

$$\sim \frac{mk}{n^2} \sum_{r=1}^{t} \exp\left(-\frac{mk}{n^2} \sum_{s=1}^{r-1} \frac{1}{p_{t-s}}\right), \qquad (2.23)$$

where the approximation is valid since $1 - x = \exp(-x + \mathcal{O}(x^2))$ for $x \le 1$. Using (2.19), we bound the inner sum as

$$\sum_{s=1}^{r-1} \frac{1}{p_{t-s}} = \frac{1}{p} \sum_{s=1}^{r-1} \frac{1}{1 + \delta \left(1 - \frac{mk}{n^2 p}\right)^{t-s}} \qquad (2.24)$$

$$\sim \frac{1}{p} \sum_{s=1}^{r-1} \left(1 - \delta \left(1 - \frac{mk}{n^2 p}\right)^{t-s}\right) \qquad (2.25)$$

$$\ge \frac{r-1}{p} \left(1 - \delta \left(1 - \frac{mk}{n^2 p}\right)^{t-r+1}\right). \qquad (2.26)$$