
Research Collection

Doctoral Thesis

On randomness as a principle of structure and computation in neural networks

Author(s): Weissenberger, Felix

Publication Date: 2018

Permanent Link: https://doi.org/10.3929/ethz-b-000312548

Rights / License: In Copyright - Non-Commercial Use Permitted



On randomness as a principle of structure and computation in neural networks

Diss. ETH No. 25298 2018

ON RANDOMNESS AS A PRINCIPLE OF STRUCTURE AND COMPUTATION IN NEURAL NETWORKS

Diss. ETH No. 25298

On randomness as a principle of structure and computation in neural networks

A thesis submitted to attain the degree of

DOCTOR OF SCIENCES of ETH ZURICH (Dr. sc. ETH Zurich)

presented by

FELIX WEISSENBERGER

MSc ETH in Theoretical Computer Science

born on 04.08.1989

citizen of Germany

accepted on the recommendation of

Prof. Dr. Angelika Steger
Prof. Dr. Jean-Pascal Pfister
Dr. Johannes Lengler

2018

Contents

Abstract

Zusammenfassung

Thanks

1 Introduction

2 Emergence of synfire chains

3 Rate based learning with short stimuli

4 Mutual inhibition with few inhibitory cells

5 Lognormal synchrony in CA1

Bibliography

Abstract

This work examines the role of randomness in the structure and information processing of biological neural networks and how it may improve our understanding of the nervous system.

Our approach is motivated by the pragmatic observation that many components and processes in the brain are intrinsically stochastic. Therefore, probability theory and its methods are particularly well suited for its analysis and modeling. More profoundly, our approach is based on the hypothesis that the stochasticity of the nervous system is much more than just an artifact of a biological system. This hope stems from the experience in probability theory that random structures often have highly desirable properties, and from the theory of randomized algorithms, which impressively demonstrates that chance is extremely useful for the efficient computation of solutions to many problems. It is therefore not surprising that randomness has also been given a fundamental role in the structure and information processing of the nervous system.

In this tradition, we study simple, mostly stochastic mathematical models of neurons, synapses and their interaction in neural networks and investigate emergent properties that can be proven mathematically, often with the help of discrete probability theory. The mathematical analysis allows the extraction of essential concepts that can ultimately be fully understood. Furthermore, we simulate more complex models to check whether the knowledge gained in this way generalizes. In this way, we can quickly examine, test and usually reject many hypotheses in purely theoretical considerations. Useful ideas can inspire concrete biological experiments and predict their outcome, or help to understand and interpret experiments already carried out. In this process, we often draw inspiration from the field of discrete probability theory, especially random graph theory and the theory of randomized algorithms.

Concretely, we first show that the structure of biological neural networks favors the formation of so-called synfire chains, since it locally resembles the structure of directed random graphs. Synfire chains are an established model of multi-stage signal transmission in neural networks. Second, we demonstrate how the efficiency of rate based synaptic plasticity can benefit from a dependence on the local membrane potential, as the fluctuations of this potential contain more relevant information than individual action potentials. Third, we prove that random synaptic connectivity in combination with the nonlinear interaction of inhibitory synapses allows mutual inhibitory communication between excitatory neurons, even if the number of inhibitory neurons is much smaller than the number of excitatory neurons. Fourth, we provide a possible explanation for the experimental observation that the number of neurons firing during certain stereotypical network activity in the hippocampus follows a lognormal distribution: the synaptic transfer of normally distributed network activity from one area to the next leads to lognormally distributed activity there.

Zusammenfassung

Diese Arbeit betrachtet exemplarisch die Rolle des Zufalls in der Struktur und Informationsverarbeitung biologischer neuronaler Netze und wie wir diese ausnutzen können, um das zentrale Nervensystem besser zu verstehen.

Motiviert ist unser Ansatz zunächst durch die pragmatische Beobachtung, dass viele Komponenten und Prozesse des Gehirns intrinsisch stochastisch sind. Daher eignet sich die Wahrscheinlichkeitstheorie und ihre Methoden zur Analyse und Modellierung in besonderem Masse. Tiefgreifender beruht unser Ansatz auf der Hypothese, dass die Stochastizität des Nervensystems weit mehr ist als nur ein Artefakt eines biologischen Systems. Diese Hoffnung rührt aus der Erfahrung in der Wahrscheinlichkeitstheorie, dass zufällige Strukturen oft sehr wünschenswerte Eigenschaften haben, und der Theorie randomisierter Algorithmen, die eindrucksvoll belegt, dass Zufall zur effizienten Berechnung von Lösungen vieler Probleme äusserst nützlich ist. Daher erstaunt es nicht, dass dem Zufall auch eine grundlegende Rolle in der Struktur und Informationsverarbeitung des Nervensystems eingeräumt wurde.

In dieser Tradition betrachten wir einfache, meist stochastische mathematische Modelle von Neuronen, Synapsen und deren Verbund in neuronalen Netzen und untersuchen emergente Eigenschaften, die sich mathematisch, oft mithilfe diskreter Wahrscheinlichkeitstheorie, beweisen lassen. Ein solcher Ansatz erlaubt die Reduktion auf wesentliche Konzepte, die schlussendlich vollständig verstanden werden können. Des Weiteren simulieren wir komplexere Modelle, um zu prüfen, ob sich die so gewonnenen Erkenntnisse generalisieren lassen. So können wir in rein theoretischen Betrachtungen schnell viele Thesen prüfen, testen und meist verwerfen. Im Fall brauchbarer Ideen können diese konkrete biologische Experimente motivieren und deren Ausgang vorhersagen oder bereits vorgenommene Experimente verstehen und deuten. Inspiration schöpfen wir dabei häufig aus dem Gebiet der diskreten Wahrscheinlichkeitstheorie, vor allem der Zufallsgraphentheorie und der Theorie randomisierter Algorithmen.

Konkret zeigen wir erstens, dass die Struktur biologischer neuronaler Netze die Formation sogenannter Synfire-Ketten begünstigt, da sie lokal der Struktur gerichteter Zufallsgraphen ähnelt. Synfire-Ketten sind ein etabliertes Modell mehrstufiger Signalübertragung in neuronalen Netzen. Zweitens demonstrieren wir, wie die Effizienz synaptischer Plastizität von einer Einbeziehung des lokalen Membranpotentials profitieren kann, da die Fluktuationen dieses Potentials mehr relevante Information enthalten als einzelne Aktionspotentiale. Drittens beweisen wir, dass zufällige synaptische Verbindungen in Kombination mit nichtlinearer Interaktion inhibitorischer Synapsen eine wechselseitige inhibitorische Kommunikation zwischen exzitatorischen Neuronen erlauben, selbst wenn die Anzahl inhibitorischer Neuronen viel kleiner ist als die Anzahl exzitatorischer Neuronen. Viertens liefern wir eine mögliche Erklärung für die experimentelle Beobachtung, dass die Anzahl der Neuronen, die während bestimmter stereotyper Netzwerkaktivität im Hippocampus feuern, einer logarithmischen Normalverteilung entspricht: Die synaptische Übertragung normalverteilter Netzwerkaktivität von einem Bereich in den nächsten führt dort zu lognormal verteilter Aktivität.

Thanks

Thank you to everybody who made my time at ETH so much fun!

First off, to my supervisor, Angelika Steger. I am sincerely grateful for the opportunity to work in your group. The environment at the intersection of combinatorics, neuroscience and machine learning that you created is unique. Your trust, support and advice mean a lot to me. I could not imagine a better boss and more inspiring mentor. Thank you!

Thank you to Johannes Lengler, for your help, patience and uplifting spirit. You have been incredibly supportive.

To Jean-Pascal Pfister, for invaluable feedback, for letting me participate in his group meetings, and for sacrificing his time to referee this thesis.

I also want to thank my other collaborators who contributed to this thesis; much of what is written here must be largely attributed to you.

I am further especially thankful to all past and current members of our group and the institute who shared the time of my PhD with me. I will miss having you around.

Finally, I thank my family and friends for their love and support. I do not take this for granted. Thank you so very much.

Zurich, June 2018


1 Introduction

The human brain is a fantastic computer. All our actions and thoughts, from simple movements to brilliant ideas, emanate from computations in our brains. This reductionist view allows a profound insight: the brain serves as a proof of concept for what human-designed computers should be capable of. Yet, it also shows us how poorly we understand information processing in the central nervous system right now.

1.1 The brain as an inherently probabilistic computer

If we want to understand computation in the brain, it may be instructive to compare the central nervous system to digital computers, which we actually understand.

First, let us start this comparison at the level of elementary components: our digital computers, following the architecture proposed by John von Neumann in the middle of the 20th century, are built from transistors joined by conductors in integrated circuits. Analogously, as discovered in the seminal physiology work by Santiago F. Ramón y Cajal in the early 20th century, the elementary components of the nervous system are discrete individual nerve cells interconnected by synapses in neural networks. However, although these fundamental building blocks seem to some extent comparable, there is a remarkable difference: while transistors are homogeneous, reliable and unmodifiable, there is an abundance of different nerve cell and synapse types, which are stochastic in nature and constantly change their behavior.


For instance, individual synapses transmit signals only with a certain (surprisingly low) probability (Branco & Staras, 2009), and the quality of signal transmission continually adapts in response to synaptic activity (Bliss & Lømo, 1973).

If we now zoom out to the network level, we see a similar picture. The structure of the circuits in our digital computers, such as the central processing unit, is highly organized, static, and identical for all units of the same type. In contrast, neural networks often seem to lack any structure, evolve continually and differ from one brain to the other. For example, locally the synaptic connectivity in the neocortex or certain areas of the hippocampus, such as area CA3, is considered to be random and independent of the spatial arrangement of neurons (Buzsáki, 2006).

This apparent discrepancy, illustrated by the reliability and unreliability of individual components along with the order and disorder of their interaction, is not limited to the physical 'hardware', but also appears in the operating mode of computers and brains. Whereas the registers of digital computers typically contain the same bit strings whenever the same computation is carried out, the response of single neurons to identical stimuli displays a high degree of variability: the points in time when a neuron emits action potentials relative to the onset of a specific visual cue are highly variable from one trial to another, and appear to be completely irreproducible (Softky & Koch, 1993).

At first glance, this unreliability, lack of structure and irreproducibility may intuitively seem like a huge problem: if the brain is intrinsically random, how could we expect any predictability whatsoever in its computations? More specifically, how should we use our knowledge of computation, which is tied to our understanding of deterministic digital computers, to understand how the brain processes information?


1.2 Determinism from probabilistic components

Fortunately, numerous results in probability theory show that predictability and randomness are not incompatible. In fact, even completely random structures and processes have very predictable large-scale properties.

Let us illustrate this with the example of unreliable synaptic transmission. As mentioned above, the probability that a particular synapse transmits a signal may be low. Nevertheless, pairs of neurons are connected not only by a single synapse but form several, up to hundreds of, synaptic connections. Further, the probability of transmission appears to be independent among distinct synapses (Branco & Staras, 2009). Hence, even if one synapse transmits a signal only with probability 20 %, the probability that no signal is transmitted at all between two neurons that are connected via 20 synapses is only about 1 %. Hence, one can be quite certain that a signal is indeed transmitted.

Similarly, the random connectivity of neural networks implies order: the probability that two neurons in the hippocampal CA3 region, which contains roughly 200,000 neurons, are connected is roughly 5 % (Buzsáki, 2006). Hence, the expected number of connections a neuron forms is 10,000. Now, the probability that the actual number of connections deviates from this by more than 250 is only about 1 %. So we can almost certainly say that the actual number of connections a neuron forms is in this range.

Analogously, the variability of single neurons is not problematic if the information is encoded in redundant populations of neurons or if one may take an average over many trials (Gerstner, Kistler, Naud, & Paninski, 2014).

The previous considerations all rely on the fact that a large number of independent random events collectively exhibits almost deterministic behavior. This insight goes back to the origin of probability theory, when Jacob Bernoulli proved the law of large numbers in the early 18th century, and has culminated in the numerous concentration bounds that comprise the toolbox of any modern theoretician (Dubhashi & Panconesi, 2009).

Concluding our comparison of digital computers and brains, there appears to be a remarkably consistent dichotomy between determinism and randomness. What is more, although the brain appears intrinsically probabilistic, this does not mean that it cannot compute deterministically, and the tools provided by probability theory seem adequate to study neural structure and computation. However, while this is comforting, it is not too exciting. In the sequel, we will argue in favor of a more fundamental role of randomness in the brain by taking a short (and biased) detour through the history of randomness in computation.
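A minimal numerical sketch of the two concentration arguments above. The numbers are the illustrative ones from the text; the normal approximation of the binomial tail is an assumption of the sketch, not part of the original argument:

```python
import math
from statistics import NormalDist

# Unreliable synapses: 20 independent synapses, each transmitting
# with probability 0.2.  Probability that no synapse transmits:
p_silent = (1 - 0.2) ** 20
print(f"P(no transmission at all) = {p_silent:.4f}")   # ~ 0.0115

# Random CA3 connectivity: ~200,000 potential partner neurons,
# each connected independently with probability 0.05.
n, p = 200_000, 0.05
mean = n * p                          # expected connections: 10,000
sd = math.sqrt(n * p * (1 - p))       # standard deviation ~ 97.5

# Normal approximation of the binomial: probability that the actual
# number of connections deviates from the mean by more than 250.
tail = 2 * (1 - NormalDist().cdf(250 / sd))
print(f"P(|connections - {mean:.0f}| > 250) ~ {tail:.4f}")   # ~ 0.01
```

Both tails come out around one percent, matching the back-of-the-envelope claims in the text.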

1.3 Chance as a principle of structure and computation

Around the beginning of the 20th century, mathematicians discovered that probability theory may be used to prove deep results in other areas of mathematics in a surprisingly elegant way. This led to the development of the so-called probabilistic method, initiated and popularized by Paul Erdős. In essence, the probabilistic method can be summarized as follows: in order to prove the existence of an object with certain properties, we construct an appropriate probability space and show that a randomly chosen element in this space has the desired properties with positive probability (Alon & Spencer, 2008). The success of the probabilistic method demonstrates the usefulness of

chance to prove the existence of desirable combinatorial structures in an impressive manner, and we will see an application to neuroscience in Chapter 4.

At the same time, the field of random graph theory emerged. A graph is a finite structure that consists of a set of vertices, some of which may be joined by edges. The original random graph model is called Gn,p, or the Erdős-Rényi model, after Paul Erdős and Alfréd Rényi, who studied it in detail. For p ∈ [0, 1], the random graph Gn,p is the graph on n vertices in which every possible edge is included independently with probability p. As a central part of random graph theory, so-called threshold phenomena have been studied extensively. Here, one is interested in threshold values such that if p is slightly larger or slightly smaller than the threshold, then Gn,p does or does not possess a certain property with high probability. Numerous properties have been considered with respect to their threshold, including the graph being connected or containing a certain subgraph (Bollobás, 2001). Further, properties that are interesting to neuroscience may also be considered, see Chapter 2. Meanwhile, random graph theory has developed a broad class of random graph models, which are suitable to study all kinds of networks, including neural networks.

The fruitful application of probability theory in many areas of mathematics, combined with the rise of theoretical computer science, eventually sparked the systematic study of randomness in computation. This set the foundation for the field of randomized algorithms. Randomized algorithms use simulated randomness during their execution to efficiently compute solutions (Mitzenmacher & Upfal, 2005). The resulting algorithms are often considerably simpler and, crucially, more efficient than their deterministic counterparts. In consequence, many algorithms running on our computers are in fact randomized.
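The connectivity threshold mentioned above, which sits at p = ln n / n, can be observed directly by sampling Gn,p on both sides of it. A small simulation sketch; n = 200 and the constants 0.5 and 2 are arbitrary illustrative choices:

```python
import math
import random

def gnp(n, p, rng):
    """Sample an Erdos-Renyi random graph G(n, p) as an adjacency list:
    each of the n*(n-1)/2 possible edges is included independently."""
    adj = [[] for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].append(v)
                adj[v].append(u)
    return adj

def is_connected(adj):
    """Check connectivity by depth-first search from vertex 0."""
    seen, stack = {0}, [0]
    while stack:
        for v in adj[stack.pop()]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == len(adj)

rng = random.Random(0)
n = 200
threshold = math.log(n) / n   # sharp threshold for connectivity

for c in (0.5, 2.0):          # below and above the threshold
    hits = sum(is_connected(gnp(n, c * threshold, rng)) for _ in range(20))
    print(f"p = {c:.1f} * ln(n)/n -> connected in {hits}/20 samples")
```

Below the threshold the samples are essentially never connected (isolated vertices appear); above it they almost always are.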


For instance, testing whether a number is prime, an essential operation in the cryptographic protocols we use in everyday life, is done by a randomized algorithm, the Miller-Rabin primality test, although asymptotically more efficient deterministic algorithms are known. Similarly, the elementary operation of sorting numbers is typically implemented by the Quicksort algorithm, which deserves its name only because it is randomized. The list of examples demonstrating how useful randomness is for computation could be extended indefinitely, including algorithms for hashing, sampling, and optimization.

Thus, we may wonder whether randomness can also be useful in neural computation. Specifically, we may ask whether the inherent randomness of the nervous system is not just an unpreventable by-product of biological systems, but rather a principle of its structure and computation.

In light of this question we return to the three examples we encountered in our earlier comparison of the brain with digital computers. The stochasticity of synapses has been hypothesized, for example, to enable exploration of network configurations while at the same time maintaining the network's functionality (Kappel, Habenschuss, Legenstein, & Maass, 2015). Moreover, the benefit of random connectivity in neural networks immediately follows from the numerous results in random graph theory, and is further detailed in Chapters 2 and 4. Finally, the trial-to-trial variability of neuronal activity has been proposed to realize complex probability distributions in neural networks in such a way that computations necessary to perform inference are feasible (Habenschuss, Jonke, & Maass, 2013). The latter is interesting because various studies in cognitive science and neuroscience conclude that the brain in fact performs inference (Ernst & Banks, 2002).
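To make the randomized-algorithms detour earlier in this section concrete, here is the random-pivot Quicksort it mentions. Choosing the pivot uniformly at random makes the expected running time O(n log n) on every input; this is a sketch, not the in-place textbook version:

```python
import random

def quicksort(xs, rng=None):
    """Quicksort with a uniformly random pivot.  The random choice
    means no fixed input can reliably trigger the quadratic worst
    case: the expected number of comparisons is O(n log n)."""
    if rng is None:
        rng = random.Random()
    if len(xs) <= 1:
        return list(xs)
    pivot = rng.choice(xs)
    smaller = [x for x in xs if x < pivot]
    equal = [x for x in xs if x == pivot]
    larger = [x for x in xs if x > pivot]
    return quicksort(smaller, rng) + equal + quicksort(larger, rng)

print(quicksort([5, 3, 8, 1, 9, 2, 7]))   # [1, 2, 3, 5, 7, 8, 9]
```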


These examples are by no means exhaustive, yet they illustrate the approach we follow in this thesis. We study the role randomness plays in the structure and computation of neural networks, in the hope of better understanding the brain. Thereby, we rely on inspiration from the probabilistic method, random graph theory, and randomized algorithms.

Concretely, we analyze simple mathematical models of neurons, synapses and their interaction in neural networks and examine emerging properties that can be mathematically proven. By that, we may identify all necessary model assumptions and finally reduce the models to contain only essential components, which can be fully understood. In addition, we simulate more complex models, which depict their biological archetype in greater detail, to test whether our insights generalize. This allows us to quickly examine, test and usually reject many ideas through purely theoretical considerations. In the case of useful ideas, these may inspire or guide concrete biological experiments and predict their outcome, or help to understand and interpret experiments already carried out.

This thesis is a compilation of four independent papers. Their common theme is the motivation introduced above. Hence, in the remainder of this introduction, we present our contributions from this perspective. Each of the following sections summarizes one paper and starts with a brief repetition of the relevant neurobiological background. In doing so, we largely neglect to put our work into the context of current neuroscience research. Thus, to get the full picture, it is strongly recommended to also read the introductions of the individual Chapters 2, 3, 4 and 5.


1.4 Emergence of synfire chains

Anatomically, a neuron consists of three parts: the dendrites, which often look like a heavily branched tree; the cell body, or soma; and the axon, which typically has the form of a long thin cable. Further, a neuron is separated from the extracellular space by a thin cell membrane, and the difference in electrical potential between the inside and the outside of the neuron is called the membrane potential. Having defined the membrane potential, we can describe elementary neuronal computation as postulated in the law of dynamic polarization by Santiago F. Ramón y Cajal: the dendrites serve as an input device, where most synapses are located and synaptic input changes the membrane potential. The soma is the central processing unit, and if the somatic membrane potential exceeds a certain threshold, the membrane potential rapidly rises and falls within 2 ms. This stereotypical rise and fall in potential is called an action potential or spike. The underlying mechanism of spike generation was discovered by Alan L. Hodgkin and Andrew F. Huxley in an early and beautiful symbiosis of mathematical modeling and physiological experiment (Hodgkin & Huxley, 1952). The axon is the output device of the neuron. If a spike is generated at the initial segment of the axon, it travels along the axon and is finally transmitted via synapses as input to target neurons.

The stereotypical form of spikes led to the conclusion that only the presence or absence of spikes may carry information. Hence, neurons send discrete binary signals.

The transmission of a spike from one neuron to the next takes only a few milliseconds. In contrast, the processing time of many tasks in the nervous system is much longer. For example, even in a simple reaction time test, where a subject is asked to press a button

in response to a sound, the delay exceeds a hundred milliseconds. Therefore, such a computation involves a chain of spike transmission steps.

Consider a signal propagating along a chain of neurons, serially connected by single synapses. We immediately see that such an arrangement is flawed if its components are inherently unreliable: as mentioned earlier, synapses transmit spikes only with a certain probability. Hence, assuming independence of synaptic transmission, the probability of successful signal transmission along the chain is exponentially small in its length. Further, temporal jitter of spike timing accumulates along the chain, and the reproduction of exact spike timing is hopeless.

These limitations are overcome if the neurons are connected in a certain scheme, which was established by Moshe Abeles as the synfire chain (Abeles, 1982). In a synfire chain, groups of neurons are serially connected such that neurons in one group form many synaptic connections to neurons in the subsequent group and few to other neurons. This leads to the signal being propagated in synchronous volleys of spikes along the chain; thus, both reliable transmission and exact spike timing are ensured even in the presence of unreliability and noise. Therefore, synfire chains provide a candidate solution for stable and precisely timed multi-stage signal transmission in neural networks.

How the specific connectivity scheme of synfire chains may emerge in initially unstructured neural networks is unclear. Notably, neural networks with random connectivity already contain an abundance of connectivity schemes resembling synfire chains, provided the connection probability exceeds some threshold value (Abeles, 1991). This does not come as a surprise to us, knowing about classic work in random graph theory, including the threshold of subgraph containment (Bollobás,


2001). Unfortunately, in such completely unstructured networks, the activity does not propagate along one synfire chain in a stable manner, but rather dies out or explodes quickly.

In Chapter 2 we build on this observation. We propose a synaptic learning rule that removes a few synapses from an initially random network, thereby stabilizing one synfire chain through ongoing neuronal activity. This eventually leads to the emergence of long synfire chains.

One of our main insights is that random networks with a certain connection probability are the perfect substrate for the emergence of synfire chains. In particular, they allow us to overcome two main obstructions encountered in previous work. First, it suffices that the learning rule makes only minor changes to the initial connectivity; more explicitly, if the connection probability is above a certain threshold, it is not necessary that new synapses with specific targets are formed. Second, the absence of such structural plasticity and, crucially, a connection probability close to the threshold avoid the formation of short and cyclic synfire chains. These insights have largely benefited from intuition derived from random graph theory.

Further, we analyze the proposed learning rule with respect to the capacity of the network. To this end, modeling the process of chain formation as a random graph process and applying tools from random graph theory permits us not only to compute the length of the emerging synfire chains but also to show that the proposed learning rule is optimal in this regard.
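The contrast discussed earlier in this section, exponentially unreliable transmission along a single-synapse chain versus robust propagation of synchronous volleys, can be illustrated with a small Monte Carlo sketch. All parameters (group width, firing threshold, transmission probability) are illustrative assumptions, not values from the thesis:

```python
import random

rng = random.Random(0)
q = 0.5          # per-synapse transmission probability (assumed)
L = 20           # number of transmission stages

# (a) Neurons joined serially by single synapses: the signal arrives
# only if every synapse transmits, i.e. with probability q**L.
print(f"single chain succeeds with probability {q ** L:.1e}")

# (b) Synfire chain: groups of w neurons with all-to-all connections
# between consecutive groups; a neuron fires once it receives at
# least `theta` transmitted spikes from the previous group.
def synfire_survival(w=50, theta=15, trials=50):
    ok = 0
    for _ in range(trials):
        active = w                      # all of group 0 fires
        for _ in range(L):
            active = sum(
                sum(rng.random() < q for _ in range(active)) >= theta
                for _ in range(w)
            )
            if active == 0:
                break
        ok += active > 0
    return ok / trials

print(f"synfire volley survives all {L} stages in "
      f"{synfire_survival():.0%} of runs")
```

With these numbers, the single chain succeeds about once in a million attempts, while the volley essentially always reaches the end of the chain: redundancy within each group absorbs the synaptic failures.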


1.5 Rate based learning with short stimuli

The immediate response of a neuron to an incoming action potential is a change in its membrane potential. The amplitude of this so-called postsynaptic potential depends on the efficacy of the transmitting synapse. This efficacy is usually abstracted as the synaptic weight. Crucially, synaptic weights are not fixed but modifiable, in particular in response to synaptic activity. This is known as synaptic plasticity, discovered by Tim V. P. Bliss and Terje Lømo (Bliss & Lømo, 1973).

Synaptic plasticity is largely regarded as the basis of learning new skills and making memories. Yet, little is known about how synaptic plasticity implements learning and memory in detail. A major challenge is that synaptic plasticity mechanisms (i.e. the underlying biophysical machinery) have limited access to relevant information, simply because of physical restrictions. Hence, synaptic plasticity can only depend on local quantities, including the activity (e.g. spiking or not) of the presynaptic and postsynaptic neurons, the state (e.g. membrane potential or local calcium concentration) of the postsynaptic cell, or signals provided by neuromodulators (e.g. dopamine). In computer-science terms, a synapse may be considered an agent in a distributed computing network.

How synaptic weights change as a function of local quantities has been studied in numerous experiments. In essence, these experiments measure the synaptic weight change in response to the manipulation of certain local quantities according to a specific protocol. Notably, such experiments led to the discovery of learning rules that describe synaptic weight change as a function of the time difference of presynaptic and postsynaptic spikes (Bi & Poo, 1998) or the firing rate of the presynaptic and postsynaptic neurons (Brown, Chapman, Kairiss, &


Keenan, 1988). Here, the firing rate of a neuron is simply the number of spikes it emits per unit of time. Such rate based learning rules relate signal to weight change under the assumption that the signal is encoded in the firing rate.

Historically, the local quantity that has been considered to determine the rate is spikes: to compute the rate from spikes, one can simply count the number of spikes per unit of time. However, this computation is only meaningful if there are 'enough' spikes in the relevant time interval. Consider the following simple example. On the one hand, biologically relevant signals are typically short, say on the order of 50 ms. On the other hand, a typical neuronal firing rate is 40 Hz (Rieke, Warland, de Ruyter van Steveninck, & Bialek, 1999). Hence, in such a time interval one expects as few as 2 spikes. Considering realistic noise levels, the variance in the sketched rate computation is immense. Thus, it is impossible to compute the rate with high accuracy in such a short time. Consequently, if rate based learning rules are implemented via spikes, then learning is restricted to long and stationary signals. This is the problem that we study in Chapter 3.

The reason why it takes so long to estimate the rate from spikes is that there are only a few spikes in a short time interval. However, from discrete probability theory, we know that a larger number of samples may give a higher accuracy. In fact, this is exactly what the law of large numbers tells us. The firing rate of a neuron is a function of the synaptic input it receives. Further, the number of input spikes per unit of time is much larger than the number of output spikes, in particular if excitatory and inhibitory inputs balance each other. These input spikes, nevertheless, are reflected in fluctuations of the membrane potential, which may be considered a local quantity as well.
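The accuracy problem above is a one-liner: for a Poisson spike count with mean m, the rate estimate count/T has relative standard deviation 1/sqrt(m). With the numbers from the text:

```python
import math

rate, T = 40.0, 0.05          # 40 Hz firing rate, 50 ms stimulus
expected_spikes = rate * T    # = 2 spikes on average

# For a Poisson count N with mean m, the estimate N/T has standard
# deviation sqrt(m)/T, i.e. a relative error of 1/sqrt(m).
rel_error = 1 / math.sqrt(expected_spikes)
print(f"{expected_spikes:.0f} expected spikes -> "
      f"relative error ~ {rel_error:.0%}")   # ~ 71 %
```

A 71 % relative error makes the spike-count estimate useless for such short signals, which is what motivates the membrane-potential read-out discussed next.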


Thus, our idea is to compute the rate not from spikes but from the membrane potential. We formalize this idea in the classical framework of Richard B. Stein's neuron model (Stein, 1965) and its diffusion approximation. In this model, under the assumption of balanced excitation and inhibition below threshold (i.e. the mean input is constant and not sufficient to trigger spikes), the spiking process is a Poisson process, whereas the stochastic process governing the membrane potential dynamics is an Ornstein-Uhlenbeck process. Computing the rate from spikes is then equivalent to estimating the rate of a Poisson process, and computing it from the membrane potential boils down to estimating the fluctuations of an Ornstein-Uhlenbeck process. We find that the latter requires much less time because samples can be taken at a rate much higher than the firing rate of the neuron. This confirms the intuition we obtained from discrete probability theory: increasing the number of samples increases the accuracy. Hence, if a plasticity mechanism uses the fluctuations of the membrane potential to realize a rate dependence, then rate based learning can deal with much shorter signals than if it were based on spikes alone.
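A simulation sketch of this comparison. All parameter values below are illustrative assumptions; the membrane is modeled as an Euler-discretized Ornstein-Uhlenbeck process as in the diffusion approximation, and the potential-based read-out is the empirical variance of the fluctuations (on which the rate depends in that model):

```python
import math
import random

rng = random.Random(0)
T, dt = 0.05, 0.001        # 50 ms window, sampled at 1 kHz (assumed)
rate = 40.0                # true firing rate in Hz
tau, sigma = 0.005, 1.0    # OU time constant and noise amplitude (assumed)
steps = int(T / dt)

def spike_based():
    """Rate estimate from counting (approximately Poisson) spikes."""
    count = sum(rng.random() < rate * dt for _ in range(steps))
    return count / T

def potential_based():
    """Empirical variance of an Ornstein-Uhlenbeck membrane path,
    sampled far more often than the neuron spikes."""
    v, xs = 0.0, []
    for _ in range(steps):
        v += -v / tau * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        xs.append(v)
    m = sum(xs) / steps
    return sum((x - m) ** 2 for x in xs) / steps

def relative_error(estimator, truth, trials=1000):
    """Monte Carlo relative standard deviation of an estimator."""
    vals = [estimator() for _ in range(trials)]
    mean = sum(vals) / trials
    var = sum((x - mean) ** 2 for x in vals) / trials
    return math.sqrt(var) / truth

stationary_var = sigma ** 2 * tau / 2      # true OU variance
print(f"spike-based estimate:     rel. error ~ "
      f"{relative_error(spike_based, rate):.0%}")
print(f"potential-based estimate: rel. error ~ "
      f"{relative_error(potential_based, stationary_var):.0%}")
```

With these (assumed) parameters the potential-based estimate is noticeably more accurate in the same 50 ms window, because the 50 membrane samples carry more independent information than the roughly 2 spikes.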

1.6 Mutual inhibition with few inhibitory cells

Signal transmission at chemical synapses works as follows: after the presynaptic neuron spikes, neurotransmitters are released at its axon terminals into the synaptic cleft (i.e. the small gap between the presynaptic and postsynaptic neuron). The released transmitters diffuse to the membrane of the postsynaptic cell, where they bind to transmitter-receptors. These receptors then activate ion channels, through which current flows, ultimately resulting in a change of

the postsynaptic membrane potential. The effect of a presynaptic spike on the postsynaptic membrane potential depends on the type of transmitters that are released. There are transmitters that increase the membrane potential (e.g. glutamate) and others that decrease it (e.g. GABA). However, neurons release the same transmitters at all their axon terminals regardless of the identity of the target cell (with very few known exceptions). This phenomenon is known as Dale’s principle, attributed to Henry H. Dale (Dale, 1935). His principle allows one to categorize neurons into excitatory and inhibitory neurons, depending on whether their spikes increase or decrease the membrane potential of target cells.

Many computational neural network models in neuroscience and most artificial neural networks used in machine learning violate Dale’s law. In such cases, neurons may excite some and inhibit other targets, depending on the weight of their synaptic connection. The reasoning why such models are not in direct conflict with biological constraints is that it is easy to transform a neural network that does not respect Dale’s law into one that does: each abstract neuron can be replaced by an excitatory and an accompanying inhibitory neuron, through which inhibitory signals are mediated. However, this construction has a fundamental flaw: it requires an equal number of excitatory and inhibitory neurons, whereas in real neural networks the number of inhibitory neurons is much smaller than the number of excitatory neurons. It is unlikely that networks built up of simple model neurons, which sum up their synaptic input and emit spikes if the summed input exceeds a threshold, can overcome this limitation, because storing the required number of synaptic weights in the network demands an equal number of excitatory and inhibitory neurons. However, such models

are oversimplified, as they neglect potential computation performed on the dendritic tree: synaptic inputs travel from potentially distant dendritic compartments towards the soma of the neuron, where spikes are generated; along the way, there is an abundance of biophysical mechanisms that implement nonlinear interactions with other synaptic inputs. Such mechanisms may allow intricate computation, much more powerful than the simple summation of inputs. Exploiting these mechanisms to perform computation is known as dendritic computation (London & Häusser, 2005). For example, an inhibitory input on the path of an excitatory input towards the soma can implement a logical NAND function between the two inputs (shunting inhibition). Further, spatially close excitatory synapses can cause a supra-linear response if they are co-active, implementing a logical AND function (coincidence detection via dendritic spikes). Moreover, dendritic computation can also regulate synaptic plasticity; for example, inhibitory input on the dendrite can prevent spiking information, which is necessary for plasticity, from traveling back to specific synapses (Wilmes, Sprekeler, & Schreiber, 2016).

In Chapter 4, we study whether dendritic computation may allow, in networks of excitatory and inhibitory neurons, all excitatory neurons to send specific inhibitory signals to all other excitatory neurons, a property called mutual inhibition, under the constraint that there are much fewer inhibitory than excitatory cells. Mutual inhibition has many desirable computational properties, which are useful for example in the decorrelation of signals (Barlow & Földiák, 1989). Whereas the traditional model of mutual inhibition assigns one inhibitory neuron to each excitatory neuron, our idea is to assign a subset of inhibitory neurons to each excitatory neuron. To decode the subset and associate it with a specific weight by which the membrane potential of the

target cell is modified, we propose that on the dendrites of excitatory neurons, the logical AND function of inhibitory synaptic activity is computed and multiplied by the weight. We speculate that the AND function is realized by nonlinear interaction of inhibitory synapses resembling dendritic spikes and that the weight corresponds to the distance between the synapses and the soma. Since the number of (possibly overlapping) subsets of inhibitory neurons is much larger than the number of excitatory neurons, even for few inhibitory neurons, this approach has the potential to solve the problem and reduce the number of required inhibitory neurons substantially. However, the choice of the inhibitory subsets that are associated with the excitatory neurons is crucial. If, for example, two excitatory neurons get the same inhibitory subset, then the target neurons receive the inhibitory input associated with both even if only one of the two is active. Thus, at the heart of our model is the choice of these subsets, which ensures that mutual inhibition is implemented correctly.

A family of subsets that satisfies the required properties (i.e. no union of not too many subsets contains another subset from the family) is known as a cover-free set family (Füredi, 1996) in combinatorics. Interestingly, it can be shown using the probabilistic method that a suitable cover-free set family exists with high probability. In particular, assigning random subsets of inhibitory neurons to excitatory ones reliably implements mutual inhibition. Besides admitting an elegant proof, the random construction is interesting from a biological point of view because it shows that no specific connectivity scheme between excitatory and inhibitory neurons is necessary; rather, random connectivity provides the desired structure.
This makes the proposed model more biologically plausible and may serve as an instructive example of the use of randomness in the structure of the nervous system.
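The random-subset idea can be demonstrated in a few lines. The sketch below uses assumed toy sizes (1000 excitatory neurons encoded by 10-subsets of 100 inhibitory neurons) and checks, both combinatorially and empirically, that the union of two active subsets essentially never contains a third neuron's subset:

```python
import random
from math import comb

random.seed(7)

# Assumed toy sizes: many excitatory neurons, few inhibitory neurons.
n_exc, n_inh, s = 1000, 100, 10
subsets = [frozenset(random.sample(range(n_inh), s)) for _ in range(n_exc)]

# Chance that one fixed s-subset lies inside the union of d = 2 other
# subsets (the event that would corrupt mutual inhibition): tiny.
d = 2
p_contain = comb(d * s, s) / comb(n_inh, s)

# Empirical spot check on random pairs: the union of two active subsets
# should contain no third neuron's subset.
violations = 0
for _ in range(300):
    u, w = random.sample(range(n_exc), 2)
    union = subsets[u] | subsets[w]
    violations += sum(1 for v, S in enumerate(subsets)
                      if v not in (u, w) and S <= union)
print(p_contain, violations)
```

Note how the count of available codewords, comb(100, 10), dwarfs the number of excitatory neurons, which is the source of the savings discussed above.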


1.7 Lognormal network synchrony in CA1

The hippocampus is a part of the brain that is believed to be involved in the transfer of short-term memories to long-term memory. In particular, during sleep, the neuronal activity that may represent memories acquired during the day is replayed in order to be written into the cortex in a consolidation process. These replay events are associated with certain stereotypical neural network activity, termed sharp-wave ripples (SPW-Rs), which occur in the CA3 and CA1 regions of the hippocampus (Buzsáki, 2015). First, an ensemble of neurons in CA3 spikes in a network burst. Second, the spikes are transmitted to the CA1 region, where another ensemble of neurons spikes synchronously in response. The name SPW-Rs originates from their discovery in local field potential recordings: a sharp wave in the local field potential reflects the strong synaptic input towards CA1, and the ripple is a fast oscillation, caused by the interplay of synchronous activity of excitatory and inhibitory neurons in CA1. SPW-Rs have been studied extensively because they provide a relatively easy to detect and recurring event that is not directly caused by external input but reflects internal information processing. The computation performed in the transmission step from CA3 to CA1 is still unclear.

Experiments revealed that the number of CA1 neurons that spike during SPW-Rs follows a lognormal distribution. A random variable follows a lognormal distribution if its logarithm follows a normal distribution. Simplifying, this means that most of the time, the number of CA1 neurons participating in a SPW-R is small, whereas sometimes it is atypically large. The origin of this phenomenon is unknown. It is often instructive to study the origin of the distribution of a

quantity, as it may allow one to understand the underlying processes. For example, the lognormal distribution arises naturally if a quantity is the product of many independent positive components: taking the logarithm transforms the product into a sum, and the sum of many independent random variables follows a normal distribution according to the central limit theorem. This offered, for example, an explanation of the lognormal distribution of synaptic weights via multiplicative synaptic plasticity (Loewenstein, Kuras, & Rumpel, 2011).

In Chapter 5 we study a simple model of the CA3-CA1 circuit with respect to the distribution of the number of CA1 neurons participating in SPW-Rs. We find that if the size of the CA3 network bursts is normally distributed, then the size of the CA1 activity in response follows a lognormal distribution. We derive this result by showing that synchronous transmission over one synaptic layer transforms a normal distribution into a lognormal distribution. Thereby, we predict that the activity in CA3 is normally distributed and that the computation performed in the transmission is a certain signal transformation.

In contrast to the previous chapters, where we showed how randomness may be exploited for structure and computation, here we simply aim to postdict an experimental observation. In doing so, we rely on the intuition gained in the study of a graph process called bootstrap percolation on random graphs (Janson, Łuczak, Turova, & Vallier, 2012). In particular, the synchronous transmission of spikes from CA3 to CA1 during SPW-Rs may be modeled as the first round of this process.
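The multiplicative route to a lognormal distribution mentioned above can be demonstrated numerically; the following sketch (factor distribution and sample sizes chosen arbitrarily for illustration) checks that the logarithm of a product of many independent positive factors is approximately normal:

```python
import numpy as np

rng = np.random.default_rng(1)

# A quantity that is a product of many independent positive factors:
# its logarithm is a sum, which the central limit theorem makes normal,
# so the quantity itself is approximately lognormal.
n_factors, n_samples = 200, 20_000
factors = rng.uniform(0.8, 1.25, size=(n_samples, n_factors))
products = factors.prod(axis=1)

logs = np.log(products)
# sample skewness of the log: near 0 for an approximately normal shape
skew = ((logs - logs.mean()) ** 3).mean() / logs.std() ** 3
print(round(skew, 3))
```

The same picture holds for other positive factor distributions, which is what makes the multiplicative explanation so generic.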

2 Emergence of synfire chains

The results in this chapter were obtained in joint work with Florian Meier, Johannes Lengler, Hafsteinn Einarsson and Angelika Steger, see (Weissenberger, Meier, Lengler, Einarsson, & Steger, 2017).

2.1 Introduction

A synfire chain is a connectivity scheme that connects a sequence of neuron groups of roughly the same size, called patterns, in a neural network, see Figure 2.1; the connectivity is such that synchronous activity in one pattern elicits synchronous activity only in the following pattern after one synaptic delay (Abeles, 1982). Several theoretical results indicate that activity propagates in synchronous volleys of spikes along synfire chains (Abeles, 1982; Gewaltig, Diesmann, & Aertsen, 2001; Goedeke & Diesmann, 2008) and that this works robustly even in a noisy environment (Hertz, 1997; Diesmann, Gewaltig, & Aertsen, 1999; Aviel, Mehring, Abeles, & Horn, 2003). Synfire chains have become an important model for multi-stage signal transmission in the brain (Diesmann et al., 1999; Vogels, Rajan, & Abbott, 2005). Traces of synfire chains have been found in various brain areas across species (Abeles, Bergman, Margalit, & Vaadia, 1993; Prut et al., 1998; Nádasdy, Hirase, Czurkó, Csicsvari, & Buzsáki, 1999; Hahnloser, Kozhevnikov, & Fee, 2002; Reyes, 2003; Ikegaya et al., 2004; Segev, Baruchi, Hulata, & Ben-Jacob, 2004; Luczak, Barthó, Marguet, Buzsáki, & Harris, 2007; Tang et al., 2008; Long, Jin, & Fee, 2010), although,

with current recording techniques it is still difficult to unambiguously verify their existence (Gerstein, Williams, Diesmann, Grün, & Trengove, 2012). Aside from their role as a model of signal transmission, synfire chains have been successfully applied to many computational tasks (Jacquemin, 1994; Aertsen & Braitenberg, 1996; Arnoldi, Englmeier, & Brauer, 1999; Abeles, Hayon, & Lehmann, 2004; Hayon, Abeles, & Lehmann, 2005; Izhikevich, 2006). From a theoretical point of view, synfire chains have been intensively studied over the last decades (Abeles, 2009). In particular, it has been shown that synfire chains can be embedded into recurrent neural networks (Bienenstock, 1995; Herrmann, Hertz, & Prügel-Bennett, 1995; Mehring, Hehl, Kubo, Diesmann, & Aertsen, 2003; Aviel et al., 2003; Leibold & Kempter, 2006; Kumar, Rotter, & Aertsen, 2008; Trengove, van Leeuwen, & Diesmann, 2013). While this embedding is well understood, the question of how synfire chains emerge in initially unstructured networks is still far from solved; this is the question that we address here.

Already in 1991, Moshe Abeles made the observation that sparse random networks (as observed locally throughout cortex) contain an abundance of connectivity schemes similar to synfire chains (Abeles, 1991). However, in such networks the activity does not propagate along a single chain but rather diverges quickly, resulting in ‘chaotic’ network behaviour (van Vreeswijk & Sompolinsky, 1998). In this chapter, we study whether there exist learning rules which ‘stabilize’ these connectivity schemes and yield emerging synfire chains in such networks. We find that spike-timing dependent plasticity (STDP) modulated by the global activity in the population gives a positive answer: a long chain grows in an unsupervised way from a set of neurons (stimulus)

which are synchronously stimulated with low frequency (multiple stimuli yield multiple chains). The resulting learning rule is a three-factor learning rule, that is, in addition to the pre- and postsynaptic spike times, it depends on a third factor (for a review of such rules, see (Frémaux & Gerstner, 2015; Pawlak, Wickens, Kirkwood, & Kerr, 2010)). The third factor is the global activity in the population, and it determines the polarity of STDP. Since the global activity is a feedback signal from within the network, it has been termed internal feedback in similar learning rules (Urbanczik & Senn, 2009; Friedrich, Urbanczik, & Senn, 2011; Brea, Senn, & Pfister, 2013). The internal feedback fosters neurons to participate multiple times in the chain or in multiple chains. Neurons are reused within a single chain and across multiple chains, which increases the network capacity and is in agreement with experimental observations (Abeles et al., 1993; Segev et al., 2004; Luczak et al., 2007). Interestingly, the restriction to sparse connectivity prevents the chains from becoming short and cyclic and shows that the formation of specific new synapses is not essential in the process of chain development, as opposed to previous speculations (Jun & Jin, 2007).

We analyze the rule mathematically in a simple network of binary threshold neurons and show that it is optimal: no learning mechanism which starts with a sparse random network and does not add additional synapses can form asymptotically longer chains without introducing strong correlations between the patterns. Subsequently, we investigate and simulate the rule in a network of conductance-based leaky integrate-and-fire (LIF) neurons and find that the emerging connectivity scheme resembles the one of the simple network.

As an application, we show that the emerged chains can be used to learn sequences of precisely timed neuronal activity in a ‘one-shot’ fashion: once the synfire chain is established in some neuronal population, a sequence of neuronal activity in a different population is learned by modifying the synapses between the two populations with a Hebbian rule. The model solves similar tasks as proposed in (Lazar, Pipa, & Triesch, 2009; Brea et al., 2013); however, our learning procedure needs to be exposed only once to the sequence to be learned (one-shot learning).



Figure 2.1: Illustration of a synfire chain in a network of n = 8 neurons over the time course of 4 time steps (each time step shows the entire network). The chain has pattern size m = 3 (i.e. in every time step groups of 3 neurons are active, indicated by blue color), spike threshold k = 2 (i.e. neurons turn active if they receive signals from at least two neurons which have been active in the previous time step, indicated by pink color), and length 4. The patterns do not need to be disjoint; for example, the second neuron from the top participates in the first and the last pattern. In real neural networks m and k are assumed to be much larger.


2.2 Materials and Methods

In this section, we first introduce our learning rule in a simple network of binary threshold neurons and later transfer it to a network of conductance-based LIF neurons. Second, we propose a network model for one-shot learning of sequences. We start with a simplified model in which time is divided into time steps (of length roughly one axonal plus synaptic delay), neurons are binary threshold neurons, and inhibition and the feedback signal are precise. These restrictions allow a precise mathematical treatment and we relax them below (imprecise inhibition and feedback mechanism) and further in Section 2.2.2 in a network of spiking neurons.

2.2.1 Simple model

We consider a population of n excitatory binary threshold neurons (McCulloch & Pitts, 1943) over the course of discrete time steps. The activity of each neuron v at time t is a binary variable x_t(v) ∈ {0, 1}. The initial network structure is given by a directed graph G = (V, E), with vertex set V and edge set E abstracting the neurons and synapses. We consider a sparse random network with connection probability p. A random network corresponds to a directed version of G_{n,p}, the Erdős–Rényi random graph (Erdős & Rényi, 1959): between each pair of neurons a synapse is present, independently with probability p. The network is sparse if p ≪ 1.

The synapses are multistate synapses (Ben Dayan Rubin & Fusi, 2007), and their state y_t depends on internal metaplasticity parameters (introduced below): if y_t(uv) = 1, then the synapse from neuron u onto neuron v is active in the sense that it transmits signals, and if

24 materials and methods

y_t(uv) = 0, then it is silent, meaning that it does not transmit signals at time step t.

The input of neuron v in time step t is comprised of three sources: (1) excitatory input from neurons within the network which spiked in the previous time step, (2) external input representing spontaneous activity, denoted by S_t(v), and (3) inhibitory input from a source of global inhibition, denoted by I_t. The state of neuron v in time step t is thus given by

    x_t(v) = H( −k + S_t(v) + I_t + ∑_{uv ∈ E} x_{t−1}(u) · y_{t−1}(uv) ),

where H is the Heaviside step function, with H(x) = 1 if x ≥ 0 and H(x) = 0 otherwise. This determines the spike threshold k of the neurons. The external input triggers a spike (in the absence of inhibition, regardless of excitation from within the network) with probability p_spon, independently for all neurons (i.e. S_t(v) = k with probability p_spon and S_t(v) = 0 otherwise). Inhibition prevents any spike if the activity in the previous time step was too large (i.e. if ∑_{v ∈ V} x_{t−1}(v) > m, then I_t = −∞, and I_t = 0 otherwise). Therefore, the parameter m of the inhibition determines the pattern size, as will become clear later. Note that here the inhibitory pathway is at least twice as fast as the excitatory one. However, this assumption is not crucial, as indicated below.
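One update step of these dynamics (threshold, spontaneous input, global inhibition) can be sketched in a few lines; parameter values here are illustrative, not the ones used later in this chapter:

```python
import numpy as np

rng = np.random.default_rng(3)

# One step of the simple model: a neuron spikes if at least k active
# inputs fired in the previous step, spontaneous input fires it
# regardless, and global inhibition silences everything if the previous
# activity exceeded m. Illustrative parameter values.
n, p, k, m, p_spon = 400, 0.05, 3, 30, 0.001

synapse = rng.random((n, n)) < p     # synapse[u, v]: active synapse u -> v

def step(x_prev, y, rng):
    if x_prev.sum() > m:                        # inhibition blocks all spikes
        return np.zeros_like(x_prev)
    drive = (x_prev[:, None] & y).sum(axis=0)   # sum of x_{t-1}(u) y_{t-1}(uv)
    spont = rng.random(len(x_prev)) < p_spon    # S_t(v) = k with prob p_spon
    return (drive >= k) | spont                 # threshold at k

x0 = np.zeros(n, dtype=bool)
x0[:m] = True                        # stimulus: a pattern of m neurons
x1 = step(x0, synapse, rng)
print(int(x1.sum()))
```

With m · p well above zero but the network untrained, the next active set typically differs in size from m, which is exactly what the learning rule below is designed to correct.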

Learning rule. The state of each synapse uv at time t is determined by the consolidation value c_t(uv) ∈ {0, . . . , c∗} of the synapse and the irresolution value r_t(uv) ∈ {0, . . . , r∗} of the synapse:

    y_t(uv) = 1 if c_t(uv) > 0 and r_t(uv) < r∗, and y_t(uv) = 0 otherwise.

These two metaplasticity parameters have the following purpose. The consolidation value can only be increased or decreased by 1, so if c_t(uv) is small (large), then the synapse can easily (hardly) turn silent (initially the consolidation value of all synapses is small). The irresolution value counts how often a synapse turned from active to silent, and the synapse is removed if it did so too often (initially, all synapses have irresolution value 0).

The learning rule is an STDP rule modulated by a feedback signal. This feedback signal F_t is triggered if the activity in the network is too large (i.e. F_t = −1 if ∑_{v ∈ V} x_t(v) > m and F_t = 1 otherwise). As the feedback is determined from within the network, the feedback signal is internal. The learning rule modifies the consolidation value of a synapse as follows:

    c_{t+1}(uv) := c_t(uv) + x_t(u) · x_{t+1}(v) · F_{t+1},

which is clipped to stay between 0 and c∗. On the one hand, if the feedback signal is not present, then a presynaptic spike in the time step before the postsynaptic spike causes LTP (i.e. long-term synaptic potentiation: the synapse increases its consolidation value by 1, in case the consolidation value is not at its maximum c∗). If the consolidation value was 0, this means that the synapse turns from silent to active. On the other hand, if the feedback signal indicates that the activity is too large, then a presynaptic spike in the time step before the postsynaptic spike causes LTD (i.e. long-term synaptic depression: the synapse decreases its consolidation value by 1, if the consolidation value is positive). If the consolidation value was 1, this means that the synapse turns from active to silent. Further, the learning rule modifies the irresolution value of a synapse as follows:

    r_{t+1}(uv) := r_t(uv) + y_t(uv) · (1 − y_{t+1}(uv)),

which is clipped to stay between 0 and r∗. Thus, if a synapse turns from active to silent, then it increases its irresolution value by 1. Note that the irresolution value can never decrease. Hence, as soon as it reaches its maximum r∗, the synapse cannot become active ever again, and we say that the synapse is removed from the network.

We distinguish three basic states of a synapse. First, we say that a synapse is present in the network if its irresolution value is smaller than r∗ (i.e. the synapse has not been removed). Second, we call a synapse active if it is present and its consolidation value is at least 1 (i.e. the synapse transmits signals). Third, a synapse is consolidated if it is present and its consolidation value attains the maximum c∗.
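The per-synapse state machine can be transcribed directly; the sketch below tracks a single synapse through a few learning steps (function and variable names are ours, not the thesis notation):

```python
# State machine of one synapse: consolidation value c (clipped to
# [0, c_star]) and irresolution value r (clipped to [0, r_star]);
# the synapse transmits (is active) iff c > 0 and r < r_star.
c_star, r_star = 100, 50

def is_active(c, r):
    return c > 0 and r < r_star

def learn(c, r, x_pre, x_post, F):
    """One step: F = +1 (no feedback, LTP) or F = -1 (feedback, LTD)."""
    was_active = is_active(c, r)
    c = min(max(c + x_pre * x_post * F, 0), c_star)
    if was_active and not is_active(c, r):  # active -> silent transition
        r = min(r + 1, r_star)              # irresolution never decreases
    return c, r

c, r = 1, 0                     # freshly activated synapse
c, r = learn(c, r, 1, 1, +1)    # causal pairing, no feedback: LTP, c = 2
c, r = learn(c, r, 1, 1, -1)    # feedback present: LTD, c = 1
c, r = learn(c, r, 1, 1, -1)    # LTD again: c = 0, synapse turns silent
print(c, r)                     # 0 1
```

Once r reaches r_star, no sequence of LTP events can reactivate the synapse, which is the removal mechanism described above.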

Learning procedure. A subset of m neurons, called stimulus and denoted by A_1, is repeatedly activated synchronously by external input, and the chain will grow starting from this stimulus. The time between two reactivations of the stimulus is one round of the learning procedure. In each round, the activity spreads – time step by time step – through the chain developed so far until it dies out (if the activity becomes too large, it is stopped by inhibition). If at the beginning of each round one of several stimuli is activated synchronously, then from each of those stimuli a chain grows.


Relation of connectivity and dynamics. In this network model, the connectivity of the network and the spread of activity in the network (i.e. its dynamics) are closely related, particularly in the absence of spontaneous activity and inhibition. We now introduce useful notation and highlight this property.

By underlining (overlining) the notions introduced below, we indicate that they concern present (consolidated) synapses; otherwise, they concern active synapses. Consider a network G = (V, E). For a neuron v ∈ V and a set of neurons A ⊆ V, we denote by deg_A(v) the indegree (convergence) of v with respect to A. This is the number of neurons in A projecting to v via active synapses. As mentioned above, the number of neurons in A projecting to v via present (consolidated) synapses is denoted by the underlined (overlined) version of deg_A(v). We abbreviate deg(v) := deg_V(v). For k ∈ N, we denote by Γ^k(A) := {v ∈ V | deg_A(v) ≥ k} the k-neighborhood of A. This is the set of neurons with at least k in-neighbors in A. If A is the set of neurons which spike at time t, then Γ^k(A) is the set of neurons which spike at time t + 1 (in the absence of spontaneous activity and inhibition). For two sets A, B ⊆ V, we write E(A, B) for the set of synapses with presynaptic neuron in A and postsynaptic neuron in B.

The state of synapses changes over time due to learning. We indicate this by introducing time to these notions. We denote by E_t the synapses, by deg_t(v) the indegree of v, by Γ_t^k(A) the k-neighborhood of A, and by E_t(A, B) the synapses from A to B in the network at time step t. The corresponding density of synapses is then defined as p_t := |E_t| / n². In the absence of spontaneous activity and inhibition, we denote the set of neurons spiking at time t by A_t. Thus, the spread of


activity starting from a stimulus A_1 can be recursively defined as A_{t+1} = Γ_t^k(A_t). In this case, the sub-network of active synapses defines the spread of activity. In the presence of spontaneous activity, we denote the set of neurons spiking at time t by A_t^+ and the neurons which would spike even in the absence of spontaneous activity by A_t.
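The k-neighborhood and the resulting recursion are straightforward to compute on a graph given as in-neighbor sets; a minimal sketch on a hand-built five-neuron example:

```python
# Activity spread A_{t+1} = Gamma^k(A_t) on a directed graph given as
# in-neighbour sets: in_nbrs[v] = neurons with an active synapse onto v.
def gamma_k(A, in_nbrs, k):
    """Neurons with at least k in-neighbours in A (the k-neighborhood)."""
    return {v for v, nbrs in in_nbrs.items() if len(nbrs & A) >= k}

# tiny hand-built example with spike threshold k = 2
in_nbrs = {0: set(), 1: set(), 2: {0, 1}, 3: {0, 1, 2}, 4: {3}}
A1 = {0, 1}
A2 = gamma_k(A1, in_nbrs, 2)    # {2, 3}: each receives two inputs from A1
A3 = gamma_k(A2, in_nbrs, 2)    # empty: no neuron has two inputs in {2, 3}
print(A2, A3)
```

Iterating gamma_k until the active set has the wrong size or repeats is exactly how the chain length is measured in the next paragraph.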

Synfire chains. A synfire chain is a structure in a neural network connecting neuron groups of roughly the same size (patterns) in a sequence such that synchronous activity in one pattern elicits synchronous activity in only the following pattern after one synaptic delay, see Figure 2.1. Griffith proposed the underlying connectivity scheme (Griffith, 1963), and Moshe Abeles established it together with its dynamics as the synfire chain (Abeles, 1991). If in our model the activation of the stimulus A_1 results in the activation of exactly m neurons in each of the subsequent l − 1 time steps (i.e. if |A_1| = . . . = |A_l| = m holds in the absence of spontaneous activity), then the activity propagates along a synfire chain of pattern size m, spike threshold k, and length l with patterns A_1, . . . , A_l. We define the length of the synfire chain starting from the stimulus A_1 to be the smallest l such that |A_{l+1}| ≠ m or A_{l+1} = A_t for some 1 ≤ t ≤ l (the second condition defines the length for cyclic chains).

Relaxations. The simple model introduced above works with unrealistically precise constraints. For example, inhibition is triggered if more than m neurons spike in one time step. While this simplifying approach helps tremendously to understand how the model works, it is important to observe that none of these constraints is essential for the process of chain formation. For this reason, we sketch a relaxed model in which the hard constraints are mitigated (below we also discuss a continuous-time model with spiking neurons, see Section 2.2.2).

As in the simple model, we assume that the same mechanism triggers inhibition and the feedback signal. Recall that there, the mechanism can detect exactly whether or not more than m neurons spike in one time step. In the relaxed model we use a probabilistic mechanism with accuracy probability p_acc: if |A_t^+| > m, then the mechanism detects with probability min(1, (|A_t^+| − m) · p_acc) that too many neurons spike in step t. Hence, if the number of active neurons exceeds the pattern size m by 1, then the mechanism detects this with probability p_acc, but if the number of active neurons is at least m + 1/p_acc, then it does so with probability 1. Note that this relaxes not only the accuracy but also the timing of the mechanism, since a too large activity may not be detected right away but only several steps later.

Moreover, we make the learning rule probabilistic through the parameters p_inc and p_dec: a synapse responds to an LTP signal with probability p_inc and to an LTD signal with probability p_dec. Hence, an active set of size larger than m yields an LTP signal (with probability at most p_inc · (1 − p_acc)) or an LTD signal (with probability at least p_acc · p_dec). We thus require p_inc · (1 − p_acc) < p_dec · p_acc to achieve stability.
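The relaxed detection mechanism is compact enough to sketch directly; the following uses the m = 30, p_acc = 0.2 values that also appear in the parameter list below:

```python
import random

random.seed(0)

# Relaxed detection: an overshoot |A| > m is noticed with probability
# min(1, (|A| - m) * p_acc), so small overshoots may escape detection
# for a few steps.
def detected(active_size, m, p_acc, rng=random):
    excess = active_size - m
    if excess <= 0:
        return False
    return rng.random() < min(1.0, excess * p_acc)

m, p_acc = 30, 0.2
# an overshoot reaching m + 1/p_acc = 35 is always detected, while an
# overshoot of 1 is detected with probability p_acc
hits = sum(detected(31, m, p_acc) for _ in range(10_000))
print(hits / 10_000)
```

Running this, the empirical detection frequency for an overshoot of 1 sits close to p_acc, matching the description above.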

Parameters. The following parameters were used in Figures 2.2–2.5 if not explicitly specified otherwise in the respective caption. We simulate a network of n = 500 neurons, where the pattern size is m = 30 and the spike threshold is k = 3. This determines the connection probability p = (1 + δ) · p̂, where p̂ is roughly 15/n according to Equation (2.3) from Section 2.3.1 below and δ = 0.2. We simulate both the basic (p_acc = 1) and the relaxed version (p_acc = 0.2). We set the remaining parameters to p_dec = 1, p_inc = 0.1, c∗ = 100, and r∗ = 50.


We take p_inc = 0.1 for both the simple and the relaxed model to make them comparable. The default initial distribution of active synapses is uniform (i.e. p_act = 0.5). Active synapses are initialized with consolidation value 1, and all synapses are initialized with irresolution value 0. To speed up the simulations we use an artificial mechanism of spontaneous activity: if the activity is too small in one round, we additionally activate a random neuron (this is not crucial for chain formation, see Section 2.2.2).

Technical assumptions and notation. We carry out an asymptotic analysis with the number of neurons going to infinity (i.e. n → ∞). In asymptotic statements, we write a ∼ b to indicate that a and b agree up to smaller order terms (i.e. a = (1 ± x) · b, with x ≪ 1) and a ≈ b to indicate that a is of the order of b (i.e. a = c · b, with constant c > 0). We require m ≪ n and m ≫ log n (e.g. m = √n) and choose the connection probability p = (1 + δ) · p̂, where p̂ satisfies Equation (2.3) from Section 2.3.1 below and δ ≪ 1. The spike threshold k ≥ 2 is a constant integer.

In our analysis we assume that deg_{A_t}(v) is Bin(|A_t|, p_t)-distributed, independently for different v; analogously for present and consolidated synapses. This condition is not fully satisfied: if u and v are neurons such that deg(u) > deg(v), then u has a larger probability to appear in a pattern than v. This means that for fixed t, neuron u is more likely to appear in A_t than v. Therefore, we also consider random networks with fixed indegree d, denoted G_{n,d}. To generate such a network, every neuron v chooses an input set I_v ⊆ V of d vertices uniformly at random, and we insert all synapses uv for u ∈ I_v.
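Both random network models can be generated in a few lines; a minimal sketch (we exclude self-loops here, an assumption the text does not spell out):

```python
import random

random.seed(11)

# Two random network models, represented as maps: neuron -> in-neighbour set.
def gnp_directed(n, p, rng=random):
    """Directed G_{n,p}: each potential synapse u -> v present w.p. p."""
    return {v: {u for u in range(n) if u != v and rng.random() < p}
            for v in range(n)}

def gnd(n, d, rng=random):
    """G_{n,d}: every neuron draws an input set of exactly d neurons
    uniformly at random, so all indegrees are fixed to d."""
    return {v: set(rng.sample([u for u in range(n) if u != v], d))
            for v in range(n)}

g = gnd(50, 7)
print(all(len(ins) == 7 for ins in g.values()))   # True: fixed indegree
```

In G_{n,p} the indegrees are Bin(n − 1, p)-distributed and fluctuate, which is exactly the heterogeneity that motivates the fixed-indegree variant G_{n,d}.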


2.2.2 Network of spiking neurons

In this section we transfer the learning rule from Section 2.2.1 to a network of conductance-based leaky integrate-and-fire (LIF) neurons with continuous-time dynamics. We simulated the network using the NEST neural simulation tool (Gewaltig & Diesmann, 2007).

The excitatory population consists of n = 200 conductance-based LIF neurons (NEST iaf_cond_exp) with membrane capacitance C_m = 1 µF/cm², leak reversal potential V_l = −60 mV, excitatory reversal potential V_E = 0 mV, inhibitory reversal potential V_I = −70 mV, constant leak conductance g_L = 0.4 mS/cm², threshold potential V_θ = −50 mV, and synaptic time constants τ_s = 4 ms, in accordance with (Fiete, Senn, Wang, & Hahnloser, 2010). The refractory period is t_ref = 25 ms and the reset potential is V_reset = −60 mV, as implemented by individual inhibition in (Fiete et al., 2010). Each excitatory neuron spikes spontaneously according to a Poisson process with rate λ_spon ≈ 0.03 Hz.

The excitatory population is randomly interconnected by plastic synapses (an adaptation of the synapse introduced in Section 2.2.1, discussed below) such that every excitatory neuron has indegree 35 (i.e. δ ≈ 0.4). The delay of the plastic synapses is d_EE = 5 ms, making up for the burst time in (Fiete et al., 2010). The weight of an active synapse is such that 5 EPSPs occurring in a short period trigger a spike (i.e. k = 5), whereas the weight of a silent synapse is 0. Initially, all synapses are silent.

Inhibition is implemented by a single neuron (NEST iaf_cond_exp) with membrane capacitance C_m = 1 µF/cm², leak reversal potential V_l = −60 mV, excitatory reversal potential V_E = 0 mV, inhibitory reversal potential V_I = −70 mV, constant leak conductance g_L = 0.4 mS/cm², threshold potential V_θ = −50 mV, and synaptic time constants τ_s = 2 ms. The refractory period is t_ref = 5.0 ms and the reset potential is V_reset = −60 mV. Every excitatory neuron is connected to the inhibition via a static synapse (NEST static_connection) with delay d_EI = 1 ms. The weight of such a synapse is such that 21 EPSPs occurring in a short period of time trigger a spike (i.e. m = 20). Moreover, the inhibition is connected to all excitatory neurons via a static synapse (NEST static_connection) with delay d_IE = 1 ms, where the weight is chosen such that hyperpolarized neurons can essentially not spike in the time interval when the next excitatory input is expected.

Learning rule. The plastic synapses have the same properties as introduced in Section 2.2.1 and obey the same learning rule, where we introduce an STDP window to make up for the absence of time steps. They are an adaptation of NEST stdp_dopa_connection, which implements the feedback signal as a neuromodulator (Potjans, Morrison, & Diesmann, 2010). Let t_pre (t_post) be the time of a presynaptic (postsynaptic) spike. The learning rule is as follows:

• if there is no feedback signal in the time interval [t_pre, t_pre + Δ⁻] and ε ≤ t_post − t_pre ≤ Δ⁺, then LTP is triggered;

• if there is a feedback signal in the time interval [t_pre, t_pre + Δ⁻] and ε ≤ t_post − t_pre ≤ Δ⁻, then LTD is triggered (if two such intervals overlap, then LTD is triggered only once).

For each presynaptic spike, only the closest postsynaptic spike is considered. If the consolidation value of a synapse reaches c*, then it cannot be decreased. The feedback signal is coupled to the inhibition: if the inhibition spikes, then 0.1 ms later the feedback signal is present.


The synaptic parameters are Δ⁺ = 10 ms, Δ⁻ = 60 ms, ε = 3 ms, c* = 100, and r* = 50. Moreover, plastic synapses are subject to a small decay which triggers LTD at a rate of ≈ 0.6 Hz (Miller & Jin, 2013).
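With the windows and parameters above, the decision for a single pre/post spike pair can be sketched as follows (times in ms; the function name and data layout are our own illustration, not the NEST implementation):

```python
# Plasticity decision for one pre/post pair under the two rules above:
# LTP if the pair falls in the positive window with no feedback signal,
# LTD if the pair falls in the negative window with a feedback signal.
D_PLUS = 10.0   # positive learning window Delta^+, ms
D_MINUS = 60.0  # negative learning window Delta^-, ms
EPS = 3.0       # minimal pre-post lag epsilon, ms

def plasticity_event(t_pre, t_post, feedback_times):
    """Return 'LTP', 'LTD', or None for the given pre/post spike pair."""
    lag = t_post - t_pre
    feedback = any(t_pre <= f <= t_pre + D_MINUS for f in feedback_times)
    if not feedback and EPS <= lag <= D_PLUS:
        return "LTP"
    if feedback and EPS <= lag <= D_MINUS:
        return "LTD"
    return None
```

For example, a pre-post lag of 5 ms yields LTP without feedback but LTD if a feedback signal arrives within the negative window.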

Choice of parameters. The parameters must satisfy some conditions, which we list here. The positive learning window Δ⁺ must be such that only synapses connecting neurons in subsequent patterns get consolidated by LTP. Therefore, it is of the order of the synaptic delay or the burst time. The negative learning window Δ⁻, however, is chosen such that LTD can remove any conflicting synapses. Thus, Δ⁻ is on the same time scale as the membrane time constant. The decay must be strong enough to prevent synapses which do not connect neurons in subsequent patterns from becoming essential parts of the chain, as this would create instability since they are not consolidated by LTP. Furthermore, the (long) refractory period is a simplified model of a mechanism preventing neurons from bursting due to slowly arriving synaptic current. Such mechanisms include specific inhibition for each neuron as in (Fiete et al., 2010), neuron models with adaptation, or a strong relative refractory period. The feedback signal is coupled to the inhibition almost instantaneously. However, a feedback signal affecting a pre-post pair can arrive anywhere in the interval [t_pre, t_pre + Δ⁻] (and the right bound of the interval is arbitrary and can be much larger). Thus, a longer delay between inhibition and feedback signal, as expected for neuromodulatory signals, is feasible.

Simulation. The stimulus is a random subset of the excitatory population containing m = 20 neurons. The stimulus is simultaneously activated every 300 ms. We perform 150,000 reactivations of the stimulus in a continuous segment. These reactivations correspond to roughly 12.5 h of simulated time. The resolution of the simulation is 0.2 ms.

2.2.3 Network model for one-shot learning of sequences

Here, we sketch a neural network model of a short-term memory for sequences. Already in the 1950s, Lashley suggested that memory items cannot be directly linked together to form a sequence (Lashley, 1951). This suggestion led Conrad to his positional theory of sequence learning in short-term memory (Conrad, 1965), which is now known as Conrad's boxes: he suggested that each item is linked to a box and that sequence recall corresponds to stepping through the boxes in sequential order. This idea inspired our model: we represent the boxes by the patterns of a synfire chain and the linking to memory items is done by Hebbian learning in a one-shot fashion, such that the sequence to be learned needs to be presented only once.

Network architecture. The network consists of a hidden layer and a visible layer. The hidden layer is a sparse network as described in Section 2.2.1, and it will contain a synfire chain representing the sequential ordering. For simplicity, the visible layer is comprised of entities which represent the symbols in S, the underlying alphabet of the sequences to be learned. Moreover, the visible layer is a 1-winner-takes-all (WTA) network. That is, at each time step during recall the entity with the largest input is active (in a slightly more complex setup, if the visible layer is a population of neurons without WTA dynamics, then sequences of precisely timed neuronal activity in the visible layer can be learned). The two layers are connected via afferent synapses from the hidden layer to the visible layer (each possible afferent synapse is independently present with probability p_aff). The learning rule of these afferent synapses is Hebbian, as discussed below. For an illustration of the network architecture, see Figure 27 (a). Note that the described architecture with a hidden network layer and visible read-out units resembles SORNs (Lazar et al., 2009) and reservoir computing (Maass, Natschläger, & Markram, 2002), and can also be found in a recent model of hippocampal replays (Gauy et al., 2018).

Learning and recall. Learning takes place in two phases. In the first phase, a synfire chain A_1, …, A_l emerges in the hidden layer, as shown below in Section 2.3.2. This first phase may take a long time, but it is completely self-organized in the sense that it is unsupervised and does not require any external input; in particular, it does not require any information about the sequences to be learned. The network automatically converges to a state with a long chain, and it will stay in this state after convergence. Hence, there is no need for a supervisor who decides when the first phase should end, since it can just go on indefinitely. For an illustration of the network after the first phase of learning, see Figure 27 (b). In the second phase, the network learns an input sequence from a single presentation (i.e. in a one-shot fashion). More precisely, we assume that all afferent synapses are initially silent. When an input sequence s_1, …, s_l with s_t ∈ S is to be learned, the entity s_t is activated in the t-th time step (by the teacher). Moreover, the stimulus A_1 is activated in the first step. The learning rule is a simple Hebbian learning rule: in each step, all afferent synapses between the active pattern and the active entity turn strong (Erickson, Maramara, &


Lisman, 2010). Note that by the construction of the chain, in the t-th step the active pattern in the hidden layer is A_t, and the active entity in the visible layer is s_t. For an illustration of the network after the second phase of learning, see Figure 27 (c). For recall, the stimulus in the hidden layer is activated, and activity propagates through the network. The output of the network at time t is the active entity in the visible layer at time t. The length of the recalled sequence is the number of time steps until the first mistake.
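The two-phase scheme can be made concrete in a small abstract sketch. The sketch below assumes, for simplicity, a given hidden chain of mutually disjoint patterns and full afferent connectivity (p_aff = 1); the function names and the example sequence are illustrative:

```python
import random

# Abstract sketch of the two-layer model: a fixed synfire chain
# A_1, ..., A_l in the hidden layer drives a 1-WTA visible layer.
def make_chain(n, m, length, seed=0):
    """A chain of 'length' mutually disjoint patterns of size m
    over neurons 0..n-1 (the easiest, disjoint case)."""
    rng = random.Random(seed)
    neurons = list(range(n))
    rng.shuffle(neurons)
    return [set(neurons[i * m:(i + 1) * m]) for i in range(length)]

def one_shot_learn(chain, sequence, alphabet):
    """Hebbian one-shot learning: the afferent synapses between the
    pattern active at step t and entity s_t turn strong."""
    strong = {s: set() for s in alphabet}
    for pattern, symbol in zip(chain, sequence):
        strong[symbol].update(pattern)
    return strong

def recall(chain, strong, alphabet):
    """Activity propagates along the chain; at each step the entity
    with the largest afferent input wins (1-WTA)."""
    out = []
    for pattern in chain:
        winner = max(alphabet, key=lambda s: len(strong[s] & pattern))
        out.append(winner)
    return "".join(out)

chain = make_chain(n=200, m=20, length=6)
strong = one_shot_learn(chain, "babcdd", "abcd")
```

With disjoint patterns, each pattern projects only to its learned entity, so the whole sequence is recalled; overlapping patterns (as in the emerged chains) would introduce the trade-off discussed in Section 2.3.6.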

2.3 Results

We start with an outline of our principal findings, which are then explained in greater detail below.

In Section 2.3.1 we substantiate the observation made by Moshe Abeles that random networks without learning contain the connectivity scheme underlying synfire chains. However, we find that even if parameters are fine-tuned – in the absence of learning – the activity diverges after ≈ log m steps, and thus the length of a functional synfire chain with pattern size m in a random network of n neurons is only in that order of magnitude.

In Section 2.3.2, we describe how long chains emerge due to our learning rule: the stimulus is repeatedly activated and the activity propagates along the chain developed so far until at the end of the chain either new neurons are recruited by spontaneous activity and LTP to grow a pattern, or LTD carves out a pattern of the correct size. The chain is stabilized by the metaplasticity parameters of the synapses so that the growth stops as soon as the capacity of the network is reached. After convergence, the network remains in a stable state. Simulation results are summarized in Figures 22–25. The simulation reveals that the rule works well even in the relaxed model with imprecise mechanisms (Figure 22). We show that there is a trade-off in the size of the connection probability. If it is too small, then no synfire chain can develop in the network, and if it is too large, then the chains get short and cyclic (i.e. patterns are highly correlated; confirmed by simulation in Figure 25). Since a large connection probability corresponds to the possibility of forming (almost) all synaptic connections, this observation shows that formation of synapses is not only not required but actually obstructive for the process of chain formation. Moreover, Figure 23 shows the maximum overlap between patterns and implies efficient reuse of neurons. Finally, we show by simulation that indeed multiple chains emerge if multiple stimuli are activated (Figure 24).

In Section 2.3.3, we determine the length of the chains asymptotically to be of the order (n/m)^{2−1/k} log m, where n is the number of neurons, m is the pattern size of the chain, and k is the spike threshold, see Equation (2.11). Thus learning improves the length by a factor of ≈ (n/m)^{2−1/k}, and neurons get reused ≈ (n/m)^{1−1/k} log m times on average in the chain. Note that for m = √n the improvement is essentially by a factor proportional to the number of neurons in the network. Simulations confirm the asymptotic results for finite n, see Figure 22, if parameters admit the assumptions made in the analysis.

In Section 2.3.4, we show that one cannot hope for longer chains (unless the patterns in the chain may be highly correlated) by presenting an asymptotically matching upper bound, which holds for all non-structural learning rules: by an information-theoretic argument, every learning procedure where each pattern contributes a γ-fraction of its maximal information, and no additional synapses are formed, can only produce a chain of length at most ≈ (n/m)^{2−1/k}/γ.


In Section 2.3.5, we investigate the learning rule in a network of conductance-based LIF neurons. Here, we present simulation results (Figure 26) and argue that in both models essentially the same connectivity scheme emerges in the network. Finally, in Section 2.3.6, we demonstrate that the developed synfire chains can be used to learn sequences in a one-shot fashion, see Figure 28.

2.3.1 Sparse random networks contain many synfire chains

Already in 1991, Moshe Abeles observed that sparse random networks (with parameters as locally observed in the cortex) contain an abundance of connectivity schemes resembling synfire chains (Abeles, 1991). Here we show that although random networks with large enough connection probability contain many synfire chains, the activity does not propagate along a single chain, but rather explodes (even if parameters are fine-tuned). Our learning rule avoids this by removing few redundant synapses. Similarly, it has recently been shown that modifying only a small fraction of synapses in a random network is sufficient to match recorded sequences of neural activity closely (Rajan, Harvey, & Tank, 2016). For now, consider the simple model without spontaneous activity, inhibition, and learning. Under these assumptions, all synapses may be considered active (since silent ones cannot become active if no learning is involved). Let A be a set of m neurons. Since each synapse is independently present with the connection probability p, the indegree deg_A(v) of each neuron v is Bin(m, p)-distributed, independently of all other

neurons' indegrees, and we get

\[ E[|\Gamma^k(A)|] = n \cdot \sum_{i \geq k} \binom{m}{i} p^i (1-p)^{m-i} \tag{2.1} \]
\[ \sim n \cdot \frac{(mp)^k}{k!}. \tag{2.2} \]

We define the equilibrium connection probability p̄ as the connection probability satisfying the equilibrium condition E[|Γ^k(A)|] = m. Solving Equation (2.1) for p yields

\[ \bar{p} \sim \left( \frac{k!}{n m^{k-1}} \right)^{1/k}. \tag{2.3} \]

If the connection probability is significantly smaller or larger than p̄, then a few time steps (more precisely, ≈ log log n steps (Janson et al., 2012)) after activating m neurons, either zero or all neurons spike in one time step. In the first case, the network typically contains no synfire chain with pattern size m and spike threshold k, whereas in the second case an abundance of them is present, since one can pick the next pattern of size m from the k-neighborhood of the current pattern and proceed recursively. However, even if the connection probability is exactly equal to p̄, then after ≈ log m time steps the activity is either larger than 2m or smaller than m/2, and thus the length of a functional synfire chain in the network is only ≈ log m. This can be seen as follows. From Equation (2.1) we see that if we have |A| = (1 + δ)m, where δ with |δ| ≪ 1 may be positive or negative, then

\[ E[|\Gamma^k(A)|] \sim n \cdot \frac{((1+\delta)m\bar{p})^k}{k!} \sim (1 + k\delta)m \tag{2.4} \]

holds. Thus, the error δ is multiplied by a factor of k. In the equilibrium condition, we have that if |A| ≈ m, then |Γ^k(A)| is Bin(n, ≈ m/n)-distributed, which has variance ≈ m. Therefore, the relative error is ≈ (m + √m)/m = 1 + 1/√m. If we start by activating A_1 of size m and denote the error in each time step by δ_t, then under the assumption that the indegree deg_{A_t}(v) of each neuron v is Bin(|A_t|, p)-distributed, independently of all other neurons' indegrees, the error grows like

\[ \delta_{t+1} \sim k\delta_t = k^t \delta_1 \approx \frac{k^t}{\sqrt{m}}. \tag{2.5} \]

In particular, the relative error will be a constant factor for t ∼ log(m)/(2 log k).
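Equations (2.1)–(2.4) can be checked numerically. The following sketch (with illustrative parameter values) evaluates the binomial sum of (2.1) at the equilibrium probability of (2.3) and measures the error amplification of (2.4):

```python
from math import comb, factorial

# At the equilibrium connection probability of Equation (2.3) the expected
# k-neighborhood size of a pattern is close to m (for large n), and a
# relative error delta in the pattern size is amplified by roughly k.
def expected_gamma(n, m_active, p, k):
    """E[|Gamma^k(A)|] for |A| = m_active, Equation (2.1)."""
    tail = sum(comb(m_active, i) * p**i * (1 - p)**(m_active - i)
               for i in range(k, m_active + 1))
    return n * tail

n, m, k = 10**7, 100, 3
p_bar = (factorial(k) / (n * m**(k - 1))) ** (1 / k)   # Equation (2.3)

e_m = expected_gamma(n, m, p_bar, k)     # close to m (equilibrium condition)
delta = 0.05
e_big = expected_gamma(n, int((1 + delta) * m), p_bar, k)
amplification = e_big / e_m - 1          # close to k * delta, Equation (2.4)
```

For these parameters the relative excess of 5% in the pattern size grows to roughly k · δ = 15% in the next step, illustrating the instability exploited in the argument above.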

2.3.2 Learning rule stabilizes synfire chain

In this section we demonstrate how a synfire chain grows during the learning procedure; for simulation results see Figures 22–25. Consider the simple model with spontaneous activity, inhibition, and learning. We start by describing the dynamics of the network in a single round of learning: activation of the stimulus A_1 results in the activation of A_2^+ in the second time step, which activates A_3^+ in the third time step, and so forth. Consider the t-th time step. We distinguish three scenarios. First, if |A_t^+| is zero, the activity dies out and learning continues by reactivating the stimulus in the next round. Second, if |A_t^+| ≤ m (but larger than zero), then by the learning rule all synapses in E_{t−1}(A_{t−1}^+, A_t^+) (i.e. all synapses in the network that connect a presynaptic neuron in A_{t−1}^+ and a postsynaptic neuron in A_t^+) increase their consolidation value due to LTP; in this way


silent synapses can become active. Third, if |A_t^+| > m, then synaptic depression is triggered: all synapses in E_{t−1}(A_{t−1}^+, A_t^+) decrease their consolidation value, and active synapses might become silent. Additionally, in this case, inhibition is triggered and stops the spread of activity, which results in the procedure being continued through reactivation of the stimulus in the next round.
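The three scenarios can be sketched as a single update function on an abstract network. This is a simplified illustration of our own making: synapses are a map from each neuron to its presynaptic partners, silent vs. active synapses are abstracted into integer consolidation values, and the ±1 updates stand in for the full consolidation dynamics:

```python
# One time step of the simple model: compute A_t^+ from A_{t-1} and apply
# LTP or LTD according to the three scenarios described above.
def propagation_step(synapses, consolidation, prev_pattern, spontaneous, k=3, m=4):
    """synapses: dict post-neuron -> set of presynaptic neurons.
    Returns the scenario ('died', 'LTP', or 'LTD') and A_t^+."""
    fired = {v for v, pres in synapses.items()
             if len(pres & prev_pattern) >= k}
    a_plus = fired | set(spontaneous)
    if not a_plus:                       # scenario 1: activity dies out
        return "died", a_plus
    pairs = [(u, v) for v in a_plus
             for u in synapses.get(v, set()) & prev_pattern]
    if len(a_plus) <= m:                 # scenario 2: LTP consolidates
        for s in pairs:
            consolidation[s] = consolidation.get(s, 0) + 1
        return "LTP", a_plus
    for s in pairs:                      # scenario 3: LTD de-consolidates
        consolidation[s] = consolidation.get(s, 0) - 1
    return "LTD", a_plus
```

Running the step on a pattern that activates at most m neurons consolidates the connecting synapses; an oversized next pattern triggers depression instead.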

Formation of patterns. Consider the t-th time step and assume that the patterns A_1 to A_{t−1} have the correct size m and that additionally A_{t'}^+ = A_{t'} holds for all 1 ≤ t' < t, so that the chain was not interrupted by spontaneous activity. If |A_t| < m, then the only way A_t could grow is if a neuron v which is not already in A_t is activated by spontaneous activity in the t-th round and if v additionally has at least k in-neighbors in A_{t−1} (i.e. deg_{A_{t−1}}(v) ≥ k holds). By LTP, future activation of A_{t−1} (in later rounds) results in the activation of at least A_t ∪ {v}. Thus, neurons are recruited into patterns by spontaneous activity. If |A_t^+| > m, then synapses in E_{t−1}(A_{t−1}, A_t^+) decrease their consolidation value due to LTD and may turn silent (or be removed, which we discuss later). Hence, in this case the future activation of A_{t−1} results in the activation of a smaller pattern, and a pattern of the correct size is carved out eventually.

Stable growth of the chain. Turning synapses active to recruit neurons for a pattern may also result in an increase of the size of previous patterns in the chain. However, if previous patterns become too large in this way, only these recently activated synapses are turned silent by LTD in the next round, since their consolidation values are small compared to those of the synapses that have been part of the chain for longer

and thus have higher consolidation values. Note that not all such synapses need to be removed, only those which cause this issue repeatedly. The removal is controlled by the irresolution value. Thus, the interplay of the consolidation and the irresolution value ensures that the chain does not break and rather grows in a stable manner.

Convergence of chain development. The previous considerations also explain why the length of the chain is limited and how the growth eventually converges. Recall that l is the smallest index such that |A_{l+1}| ≠ m or A_{l+1} = A_t for some 1 ≤ t ≤ l. If |Γ^k(A_l)| < m and recruiting new neurons by spontaneous activity is impossible (as many synapses have been removed already), then the chain cannot grow further. If |Γ^k(A_l)| > m and all synapses from E_l(A_l, A_{l+1}) are consolidated because they connect previous patterns, then the chain cannot grow, since it is not possible to reduce A_{l+1} to a pattern of the correct size. Similarly, if A_{l+1} = A_t for any 1 ≤ t < l + 1, then the chain becomes cyclic and does not change. Hence, the chain only grows until one of these three cases occurs.

Connection probability trade-off. The connection probability p must be at least as large as the equilibrium connection probability p̄ according to Equation (2.3). Otherwise, it is unlikely that even the second pattern can have size m. However, choosing p exactly equal to p̄ does not yield a long synfire chain, as discussed above. Perhaps counterintuitively, it is also not useful to start with a much larger connection probability, since this increases the correlation of patterns, which results in short cyclic chains, see Figure 25. Such correlations have been a problem in previous models (Hertz & Prügel-Bennett, 1996; Levy, Horn, Meilijson, & Ruppin, 2001; Kitano, Câteau, & Fukai, 2002;


Zheng & Triesch, 2014). To understand this, observe that if p ≫ p̄, then Γ^k(A) contains many neurons with large (i.e. ≫ k) indegree into the pattern A. Consequently, the learning rule leads to many neurons with a large active indegree. However, neurons with large active indegree are likely to be in many patterns, which results in highly correlated patterns. A connection probability close to p̄ avoids this problem, see Figure 23 (d).


[Figure 22: chain length as a function of p_active for the simple and relaxed models; panels (a) population size n, (b) pattern size m, (c) spike threshold k, (d) density parameter δ.]

Figure 22: Length of the chain. The intensity of the color corresponds to the variation given at the top of each panel. We performed 200 trials and the error bars show the standard error of the mean. The dashed lines show the analytic chain length obtained from Equation (2.10); in (d) the value for δ = 0.5 is 111 (not shown). If not altered in the plot, the number of neurons is n = 500, the pattern size is m = 30, the spike threshold is k = 3, and the density parameter is δ = 0.2; see Section 2.2.1 for all parameter values.


[Figure 23: maximum pattern overlap for the emerged chain vs. a random control (G_{n,p} and G_{n,d}); panels (a) population size n, (b) pattern size m, (c) spike threshold k, (d) density parameter δ.]

Figure 23: Maximum overlap between two patterns of the emerged chain (p_inc = 1). We compare to a sequence of random patterns of the same length (control), as indicated by the color intensity. We performed 50 trials and the error bars show the standard error of the mean. If not altered in the plot, the number of neurons is n = 500, the pattern size is m = 30, the spike threshold is k = 3, and the density parameter is δ = 0.2; see Section 2.2.1 for all parameter values.


[Figure 24: summed chain lengths as a function of the number of chains for the simple and relaxed models with p_active ∈ {0, 0.5, 1}.]

Figure 24: Multiple chains. We performed 50 trials and the error bars show the standard error of the mean. The number of neurons is n = 500, the pattern size is m = 30, the spike threshold is k = 5, and the density parameter is δ = 0.2; see Section 2.2.1 for all parameter values.


[Figure 25: chain length as a function of the density parameter δ for the simple and relaxed models with p_active ∈ {0, 0.5, 1}.]

Figure 25: Effect of increasing δ (i.e. the connection probability) on the length of the chain. For δ ≥ 0.5 the chains tend to become cyclic. We performed 50 trials and the error bars show the standard error of the mean. The number of neurons is n = 500, the pattern size is m = 30, and the spike threshold is k = 3; see Section 2.2.1 for all parameter values.


2.3.3 Estimation of the chain length

In this section, we determine the length of the chain. We outline an auxiliary procedure which performs slightly worse than our learning procedure but is mathematically tractable. Analyzing the auxiliary procedure gives a lower bound.

Auxiliary procedure. The auxiliary procedure starts with the stimulus A_1 (in the first time step) in a network of active synapses and consolidates or removes synapses in order to form a chain. Consider the t-th step of the procedure. So far, the chain consists of patterns A_1, …, A_{t−1}. The pattern A_t is formed as follows: first, the procedure removes (unconsolidated) synapses such that A_{t−1} activates exactly m neurons at time t; second, the procedure consolidates synapses to prevent the removal of synapses which are part of the chain (it does so randomly, as indicated below). The procedure stops if either too many synapses have been removed or too many synapses are consolidated. In contrast to our learning procedure, synapses are removed immediately and thus non-optimally. This results in shorter sequences and yields a lower bound on the chain length (a detailed definition of the auxiliary procedure and its relation to the learning procedure can be found in the supplementary material).

Evolution of the densities. To determine when the auxiliary procedure stops, we track how the density of active synapses and the density of consolidated synapses evolve (detailed calculations can be found in the supplementary material). Let p_t be the density of active synapses after formation of the t-th pattern. Since the connection probability is p = (1 + δ)p̄, where p̄ is


given in Equation (2.3) and δ ≪ 1, almost all neurons in Γ^k_{t−1}(A_{t−1}) have exactly k in-neighbors in A_{t−1}, and the procedure removes ∼ |Γ^k_{t−1}(A_{t−1})| − m synapses in step t (typically one from each neuron which will not be in A_t). These considerations allow us to determine the density of active synapses as

\[ p_t = \bar{p} + \delta\bar{p} \left( 1 - \frac{mk}{n^2\bar{p}} \right)^{t-1}. \tag{2.6} \]

To determine the evolution p̂_t of the density of consolidated synapses, we assume that in every step mk random synapses are selected to become consolidated. Hence, for a non-consolidated synapse that has survived until step t, the probability that it is consolidated in round t is mk/(n²p_{t−1}). From this, we can conclude that the density of consolidated synapses is

\[ \hat{p}_t = p \left( 1 - (1-\delta) \left( 1 - \frac{mk}{n^2\bar{p}} \right)^{t-1} \right). \tag{2.7} \]

Length of the chain. From the evolution of the densities we can now determine the expected length of the chain. There are two reasons why the auxiliary procedure could stop at step t (terminating with a chain of length t − 1): first, too many synapses have been removed and therefore |Γ^k_{t−1}(A_{t−1})| < m holds; second, too many synapses are consolidated and thus |Γ̂^k_{t−1}(A_{t−1})| > m holds. We assume that |Γ^k_{t−1}(A_{t−1})| and |Γ̂^k_{t−1}(A_{t−1})| are binomially distributed. Let q_t be the probability that a neuron has at least k in-neighbors in A_{t−1}. As |A_{t−1}| = m, we have

\[ q_t = \Pr[\mathrm{Bin}(m, p_t) \geq k]. \tag{2.8} \]


Let P_t be the probability that the procedure stops in the t-th step because of the first reason. We get

\[ P_t = \Pr[\mathrm{Bin}(n, q_t) < m]. \tag{2.9} \]

Analogously, one obtains q̂_t, the probability that a neuron has at least k in-neighbors via consolidated edges in A_{t−1}, and P̂_t, the probability of stopping in the t-th step for the second reason. Let L be the random variable for the length of the chain. Since L is geometrically distributed, the expected length is

\[ E[L] = \sum_{t=1}^{\infty} t \cdot \prod_{t'=1}^{t-1} (1 - P_{t'}) \cdot P_t. \tag{2.10} \]

From this, we determine the asymptotics of this expectation as

\[ E[L] \approx \frac{n^2\bar{p}}{m} \cdot \log m \approx \left( \frac{n}{m} \right)^{2-1/k} \cdot \log m, \tag{2.11} \]

substituting p̄ according to Equation (2.3) in the last step. The full calculations can be found in the supplementary material. Note that this (together with the small overlap of patterns, see Figure 23) implies heavy reuse of neurons in the chain: a neuron will be in roughly (n/m)^{1−1/k} log m patterns on average. For the small network sizes considered in Figures 22–25, the asymptotic expression of the chain length in Equation (2.11) is not adequate (e.g. for n = 10⁴, m = √n, δ = m^{−1/4}, and k = 3 the relative error compared to the simulated auxiliary process is roughly 2). Hence, we compare Equation (2.10) to the simulated learning procedure and obtain good agreement, see Figure 22, in particular if the assumptions we made in our analysis are met: if δ is small as a function of n,

see Figure 22 (a) and (d), and if m is large enough but significantly smaller than n, see Figure 22 (b). Furthermore, since k is assumed to be constant and independent of n, there is good agreement for all k, see Figure 22 (c).
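Equations (2.6) and (2.8)–(2.10) can be evaluated numerically. The sketch below makes two simplifications of our own: only the first stopping reason (too many synapses removed) is included, and the equilibrium probability is obtained by numerically solving the exact condition n · Pr[Bin(m, p) ≥ k] = m, since the asymptotic formula (2.3) is inaccurate for small n:

```python
from math import comb

# Expected chain length of the auxiliary procedure, first stopping reason
# only (illustrative simplification of the full analysis).
def binom_cdf_lt(n, p, m):
    """Pr[Bin(n, p) < m]."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(m))

def binom_tail_ge(m, p, k):
    """Pr[Bin(m, p) >= k]."""
    return 1.0 - binom_cdf_lt(m, p, k)

def equilibrium_probability(n, m, k):
    """Solve n * Pr[Bin(m, p) >= k] = m for p by bisection."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if n * binom_tail_ge(m, mid, k) < m:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def expected_length(n, m, k, delta, t_max=2000):
    p_bar = equilibrium_probability(n, m, k)
    shrink = 1.0 - m * k / (n * n * p_bar)
    expectation, survive = 0.0, 1.0
    for t in range(1, t_max + 1):
        p_t = p_bar + delta * p_bar * shrink ** (t - 1)   # Equation (2.6)
        q_t = binom_tail_ge(m, p_t, k)                    # Equation (2.8)
        stop = binom_cdf_lt(n, q_t, m)                    # Equation (2.9)
        expectation += t * survive * stop                 # Equation (2.10)
        survive *= 1.0 - stop
        if survive < 1e-12:
            break
    return expectation
```

Doubling the population size lengthens the expected chain, in line with the growth in n predicted by Equation (2.11).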

2.3.4 An upper bound for non-structural learning rules

In this section, we give an upper bound on the length of a synfire chain in a sparse random network using a short information-theoretic argument. This upper bound applies to all learning procedures satisfying the following two conditions: (i) no new synapses can be added to the network, and (ii) all patterns in the chain must be reasonably uncorrelated. More precisely, we assume that each pattern contributes γ log₂ \binom{n}{m} bits of information, for some 0 < γ < 1. Note that this is a γ-fraction of the maximal information that a pattern of size m can contribute. Hence, γ is a measure of how correlated the patterns are: if γ is large, the patterns are essentially random; small γ, however, corresponds to large overlaps among patterns. On the one hand, by the second assumption the total binary entropy of a chain of length l is at least l · γ log₂ \binom{n}{m} ∼ l · γm log₂ n. On the other hand, by the first assumption, we can encode the chain by encoding the network and, for each synapse, encoding whether it is active or not. The entropy of G_{n,p} is H₂(p)n² (where H₂(x) = −x log₂ x − (1 − x) log₂(1 − x) is the binary entropy function), and we can encode for each synapse with one bit whether or not it has been removed. Therefore, the entropy of the chain cannot exceed H₂(p)n² + pn² ∼ H₂(p)n². Together, we obtain

\[ l \cdot \gamma m \log_2 n \leq H_2(p) n^2, \tag{2.12} \]

or equivalently

\[ l \leq \frac{H_2(p) n^2}{\gamma m \log_2 n} \sim \frac{n^2 p}{m} \cdot \frac{1}{\gamma} \approx \left( \frac{n}{m} \right)^{2-1/k} \cdot \frac{1}{\gamma}, \tag{2.13} \]

where the last step holds if p is chosen according to Equation (2.3). Note that this matches the performance computed in Equation (2.11) for γ ≈ 1/log m. Since we require γ to be not too small (the patterns should be sufficiently uncorrelated), this shows that one cannot hope for a substantially longer chain.
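The bound can be evaluated numerically. The sketch below uses illustrative parameters and sets γ to 1/log m, the regime in which, as noted above, the bound and the construction match up to constants:

```python
from math import log, log2, factorial

# Compare the information-theoretic bound (2.13) with the achieved chain
# length of Equation (2.11) for m = sqrt(n) and gamma = 1/log m.
def h2(x):
    """Binary entropy function H_2(x)."""
    return -x * log2(x) - (1 - x) * log2(1 - x)

n, k = 10**6, 3
m = 10**3                                           # m = sqrt(n)
p = (factorial(k) / (n * m**(k - 1))) ** (1 / k)    # Equation (2.3)

achieved = (n / m) ** (2 - 1 / k) * log(m)          # Equation (2.11)
gamma = 1 / log(m)
bound = h2(p) * n**2 / (gamma * m * log2(n))        # Equation (2.13)
```

For these values the bound and the achieved length agree to within a small constant factor, illustrating that the construction is essentially optimal among non-structural learning rules.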

2.3.5 Simulation results of spiking network

In the network of spiking neurons (Section 2.2.2), the neurons integrate input from more than just the previous pattern, because the membrane time constant is longer than the synaptic delay. As observed in previous models, this poses difficulties for synchrony and chain development. However, our learning rule (in combination with the decay) ensures that only synapses which connect neurons in succeeding patterns are consolidated. Thus, essentially the same connectivity scheme as in the simple model is carved out, and a long chain with synchronous transmission emerges. Therefore, the analysis of the simple network carries over, since the statements made there concern primarily the connectivity scheme of the network. We simulated 150,000 reactivations of the stimulus and obtained a chain of length ≈ 22 with pattern size 20, a maximal overlap of ≈ 7 among two patterns, and using ≈ 189 out of the 200 neurons, see Figure 26. Hence, on average neurons are used at least twice (see Figure 26 (c) for the histogram of occurrences). Although a comparison in absolute numbers is not too meaningful since it strongly depends on

network parameters, this exceeds previous spiking models qualitatively by many measures, such as absolute length, reuse of neurons in the chain, and the ratio of neurons used to population size; see (Levy et al., 2001; Fiete et al., 2010; Waddington, Appleby, De Kamps, & Cohen, 2012). The only model achieving long chains is (Jun & Jin, 2007), which we discuss below in Section 2.4.2.


[Figure 26: panels (a) chain length over learning rounds, (b) raster plot of activity propagation, (c) histogram of neuron occurrences in the chain, (d) histogram of pattern overlaps.]

Figure 26: (a) Simulation of the network of conductance-based LIF neurons. We performed 32 trials and the error bars show the standard error of the mean; (b) The raster plot shows the spread of activity along the chain (stimulus onset at time 0) after 150,000 reactivations of the stimulus. The neurons are sorted according to their first occurrence in a pattern; (c) Histogram of the number of occurrences of neurons in the chain (reuse). We compare the emerged chain to a sequence of random patterns of the same length (control). We performed 32 trials and the error bars show the standard error of the mean; (d) Histogram of overlap among patterns in the chain. We compare the emerged chain to a sequence of random patterns of the same length (control). We performed 32 trials and the error bars show the standard error of the mean.


2.3.6 Sequences can be learned in one shot

Learning and recall of sequences in our model from Section 2.2.3 can be analyzed using the well-established theory of auto- and hetero-associative memory (Gauy, Meier, & Steger, 2017; Einarsson, Lengler, & Steger, 2014; Amit & Fusi, 1994; Knoblauch, Palm, & Sommer, 2010; Willshaw, Buneman, & Longuet-Higgins, 1969), and thus the underlying principles are already understood. If the patterns in the chain are mutually disjoint and all afferent synapses are present, then the length of the sequences which can be learned and fully recalled is the same as the length of the synfire chain: activating the stimulus A_1 results in the propagation of activity along the chain, so A_t will be active in round t of recall, and A_t thereafter activates s_t (only s_t gets input if the patterns in the chain are disjoint). Hence, the entire sequence can be recalled. If the patterns of the chain are not disjoint (as in our case), then learning sequences in which letters are repeated is harder, which reduces the length of a sequence that can be learned, see Figure 28. Thus, there is a trade-off between the alphabet size of the sequence to be learned and the reusability of neurons in the synfire chain. As a side note, we remark that if the afferent synapses are transient (they turn weak after some time), then the chain can be used an unlimited number of times to learn and recall sequences. Thus, the first learning phase is only needed once, and afterwards provides a network for learning an arbitrary number of sequences, one sequence at a time. Learning a new sequence requires that the weights of the synapses between the hidden layer and the visible layer are reset. This can, for example, be realized by decreasing all weights over time. It is also possible to have multiple smaller chains, see Figure 24.


Assuming that the stimulus of each chain can be activated individually, the network can store many short sequences simultaneously. Note that this network allows storing rhythms, where the time grid is given by the synaptic delay. The rate of the sequence to be learned is independent of the time a pattern in the chain is active: if one element of the sequence is presented longer, then it will be bound to several subsequent patterns in the chain and consequently also presented longer during recall. Furthermore, one can relax the WTA dynamics in the visible layer and replace the abstract entities by neurons, to obtain a network which learns sequences of precisely timed neuronal activity (Brea et al., 2013) in a one-shot fashion.
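For the disjoint-pattern case, the binding-and-recall mechanism described above can be reduced to a few lines of code. The sketch below is our own toy abstraction (patterns are represented by their index in the chain, propagation is assumed perfect, and the WTA dynamics collapses to a dictionary lookup); it is not the thesis model itself.

```python
def learn_one_shot(sequence, chain_length):
    """Bind the t-th pattern of the chain to the t-th letter (one shot)."""
    assert len(sequence) <= chain_length
    return {t: letter for t, letter in enumerate(sequence)}

def recall(afferents, chain_length):
    """Activity propagates along the chain; each bound pattern emits its letter."""
    return "".join(afferents[t] for t in range(chain_length) if t in afferents)

weights = learn_one_shot("babcdd", 10)
print(recall(weights, 10))   # babcdd
```

Because each pattern is bound to exactly one letter, repeated letters (here b and d) pose no problem in the disjoint case, which is the point made above.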


[Figure 27, panels (a)-(c): a hidden layer with stimulus A_1 above a visible layer with the letters a, b, c, d; (a) before learning, (b) after emergence of the synfire chain A_1, ..., A_6, (c) after one-shot learning of the sequence "babcdd".]

Figure 27: Illustration of our network for one-shot learning of sequences. (a) The hidden layer is a network as described in Section 2.2.1 with stimulus A_1. The visible layer consists of entities representing the letters a, b, c, and d. Light grey color indicates presence of silent synapses; (b) The network after the first phase of learning. A synfire chain of length 6 developed in the hidden layer. Pink color indicates presence of active synapses; (c) The network after learning the sequence babcdd. Note that the sequence contains multiple occurrences of single characters. Pink color indicates presence of active synapses.


[Figure 28: plot of the length of the recalled sequence (y-axis) against the alphabet size of the hidden layer (x-axis, 2-20), for the emerged synfire chain, random patterns, emerged G_{n,d}, and emerged G_{n,p}, at afferent densities p_aff = 0.8, 0.5, 0.3.]

Figure 28: Length of the recalled sequence for different alphabet sizes. The sequences are random strings over the alphabet. The color intensity corresponds to the variation given at the right of the plot, we compare to a sequence of random patterns of the same length as the emerged chain (control), and the color indicates the connection probability p_aff between layers. The parameters are n = 1,000, m = 30, k = 3, p satisfies Equation (2.3), similarly d, δ = 0.2, c* = 100, r* = 50, p_acc = 1, p_dec = 1, p_inc = 1, and p_act = 0.5. Note that if the patterns in the synfire chain were disjoint, the maximum length of a sequence would be ⌊n/m⌋ = 33. We performed 50 trials and the error bars show the standard error of the mean.


2.4 Discussion

In this section, we first discuss the model assumptions in their biological context. After that, we put our results into the context of related work.

2.4.1 Model assumptions

In our model synapses have binary efficacy: they are either silent, with zero or small efficacy, or active, with large efficacy. The consolidation value determines the efficacy and is increased or decreased by LTP or LTD events, respectively. This model was termed multistate synapse (Ben Dayan Rubin & Fusi, 2007) and has successfully been applied in learning and memory (Baldassi, Braunstein, Brunel, & Zecchina, 2007; Ben Dayan Rubin & Fusi, 2007; Leibold & Kempter, 2008). It is based on experiments reporting that synaptic efficacy has discrete stable states (for review, see (Montgomery & Madison, 2004)) and in particular on the discovery of silent synapses (Isaac, Nicoll, & Malenka, 1995; Montgomery, Pavlidis, & Madison, 2001). Moreover, it has been observed that LTP and LTD can switch between large and small efficacy states in an all-or-none fashion (Petersen, Malenka, Nicoll, & Hopfield, 1998; O'Connor, Wittenberg, & Wang, 2005). The plasticity of synapses can depend on previous activity, a phenomenon known as metaplasticity (for review, see (Abraham, 2008)), and this dependency creates discrete plasticity states (Montgomery & Madison, 2004). Furthermore, our synapses can switch from being silent to active only a limited number of times, as implemented by the irresolution value. It is known that silent synapses lack AMPA receptors in their postsynaptic membrane. However, LTP can turn them active by integrating AMPA receptors into their postsynaptic membrane, and LTD correlates with the removal of these receptors (Montgomery & Madison, 2004). The irresolution value simply assumes that this process can only occur a limited number of times.

Our learning rule is a three-factor STDP rule (for a review of such rules, see (Frémaux & Gerstner, 2015; Pawlak et al., 2010)). In addition to pre- and postsynaptic spike events, the learning rule depends on the global activity in the network: if the activity is in a healthy regime, then a postsynaptic spike immediately following a presynaptic spike triggers LTP; however, if the activity is too large, then LTD is triggered. STDP modulated by such an internal feedback signal has been proposed in (Urbanczik & Senn, 2009; Friedrich et al., 2011; Brea et al., 2013). There are multiple proposals on how such a learning rule could be implemented biologically. Neuromodulators can affect STDP (for reviews, see (Pawlak et al., 2010; Frémaux & Gerstner, 2015)) and experimental support where a neuromodulator turns LTP into LTD is given in (Couey et al., 2007; Seol et al., 2007; Kwon, Longart, Vullhorst, Hoffman, & Buonanno, 2005; Cassenaer & Laurent, 2012). Furthermore, it is conceivable that the feedback is determined or implemented by inhibition (Steele & Mauk, 1999; Wilmes et al., 2016). Another possibility is an implementation through astrocytes, as discussed in a similar setting in (Brea et al., 2013). However, an experiment linking it directly to population activity, as predicted by our model, has so far not been conducted (to the best of our knowledge). Our learning rule does not include LTD if a postsynaptic spike precedes a presynaptic spike (whereas standard STDP predicts LTD), since this LTD component is not necessary for chain formation. However, it is clear that it also does not hinder chain formation: such an LTD component would effectively depress synapses between a pattern and preceding patterns.


This shortens the sequences, since more synapses are removed, but decreases correlations between patterns.

The connectivity of our network is sparse, as indicated by electrophysiological experiments (Mason, Nicoll, & Stratford, 1991; Markram, Lübke, Frotscher, Roth, & Sakmann, 1997), and uniform, justified by the small network size. Hence, not all combinatorially possible synapses can be formed. While synaptogenesis occurs throughout a lifetime, our model does not rely on the possibility to form a synapse between any particular pair of neurons. Moreover, during chain development, few silent synapses are removed permanently from the network, as it has been speculated that silenced synapses are preferred candidates for elimination (Montgomery & Madison, 2004).

The inhibitory mechanism in our model is global and highly simplified. However, the relaxed model indicates that neither timing nor precision need to be fine-tuned, which makes it conceivable that it can be implemented by a population of fast inhibitory interneurons. By designing a more sophisticated inhibitory mechanism that favours spontaneous activity during pattern formation, but reduces its level during replay, the speed of chain formation can be improved. Similarly, the relaxed model shows that timing and accuracy of the feedback signal do not need to be fine-tuned. Moreover, note that, as shown in the network of spiking neurons, the time course of learning in the presence of the modulating feedback signal is much larger than in its absence, allowing for a slower mechanism in the first case, which is consistent with the modulator mechanisms discussed above.
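The binary-efficacy synapse with consolidation and irresolution values described in this subsection can be sketched as follows. The class below is our own simplified reading of the model (the names c_star and r_star mirror c* and r*; the exact update logic in the thesis may differ in its details).

```python
class MultistateSynapse:
    SILENT, ACTIVE, REMOVED = "silent", "active", "removed"

    def __init__(self, c_star=3, r_star=2):
        self.c_star = c_star        # consolidation threshold c*
        self.r_star = r_star        # max number of silent -> active switches r*
        self.consolidation = 0
        self.irresolution = 0
        self.state = self.SILENT

    def ltp(self):
        """An LTP event pushes the synapse toward the active state."""
        if self.state == self.REMOVED:
            return
        if self.state == self.SILENT:
            self.irresolution += 1
            if self.irresolution > self.r_star:
                self.state = self.REMOVED    # eliminated permanently
                return
            self.state = self.ACTIVE
        self.consolidation = min(self.consolidation + 1, self.c_star)

    def ltd(self):
        """An LTD event silences a not-yet-consolidated active synapse."""
        if self.state == self.ACTIVE and self.consolidation < self.c_star:
            self.consolidation = max(self.consolidation - 1, 0)
            self.state = self.SILENT

s = MultistateSynapse()
s.ltp()
print(s.state)   # active
s.ltd()
print(s.state)   # silent again (not yet consolidated)
```

Once the consolidation value reaches c*, LTD no longer silences the synapse; once the irresolution value exceeds r*, the synapse is removed for good, matching the limited number of silent-to-active switches assumed above.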


2.4.2 Related work

Previously suggested models use Hebbian learning or STDP in combination with limiting the total synaptic input or output of each neuron to indirectly restrict the pattern size in the chain (Bienenstock, 1991; Hertz & Prügel-Bennett, 1996; Jun & Jin, 2007; Fiete et al., 2010; Okubo, Mackevicius, Payne, Lynch, & Fee, 2015). Such restrictions were encouraged by (Abeles, 2009), where it is speculated that this is the optimal approach. However, limiting the total synaptic input or output of a neuron impedes the reuse of neurons in multiple uncorrelated patterns, because a neuron gets input from a limited number of neurons and sends output to a small number of other neurons. To illustrate this phenomenon, consider the (slightly simplified) mechanism of (Jun & Jin, 2007; Miller & Jin, 2013): here the outdegree (divergence) of each neuron is bounded by d. Hence, the number of spikes sent by a pattern of size m is d·m. By choosing the spike threshold k of the neurons to be (roughly) equal to the divergence d, it is ensured that the next pattern has size at most (m·d)/k = m as well. However, a neuron can now only excite d other neurons, and if it occurs multiple times the patterns get highly correlated, which results in cyclic chains. Similarly, neurons can also hardly participate in multiple chains in the network, disagreeing with experimental data (Abeles et al., 1993; Segev et al., 2004; Luczak et al., 2007). Such restrictions are in contrast to our model, which limits the pattern size directly via the feedback signal on global activity. Here, the next pattern is not determined by single neurons in the current pattern (as above), but rather by their combination. This mechanism allows reusing neurons efficiently in many patterns of the chain or in multiple chains. Moreover, previous models suffer from short and cyclic chains


(Bienenstock, 1991; Hertz & Prügel-Bennett, 1996; Levy et al., 2001; Kitano et al., 2002; Zheng & Triesch, 2014). They usually consider networks in which every synapse is present (silent or active) or can be added. For some learning paradigms, dense connectivity is even crucial (Fiete et al., 2010). In our model, restricting the connectivity to a cortical-like sparse random network is actually beneficial, because it prevents cyclic chains yet still allows long ones. The only previous work with long emerging chains is (Jun & Jin, 2007); their approach relies on the formation of new synapses and uses a special mechanism of axon remodeling to limit the number of outgoing synapses of a neuron to enforce a certain pattern size (as sketched above). This approach, in turn, hinders the reuse of neurons: their chain has length 67 with a pattern size of 6 to 7, using only 443 neurons of a population of size 1,000. Our approach shows that a modulated STDP rule can overcome this limitation and that the remodeling/formation of synapses (although speculated in (Jun & Jin, 2007) to be essential) is not necessary for the process of chain formation.
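The counting argument behind the pattern-size bound can be checked with a small toy simulation. This is our own simplification of the mechanism (targets are drawn at random rather than learned), so it only illustrates the bound (d·m)/k = m on the size of the next pattern.

```python
# Each of the m neurons in a pattern sends spikes to exactly d targets;
# with spike threshold k = d, the total of d*m spikes can activate at most
# (d*m)/k = m neurons in the next pattern.
import random
from collections import Counter

rng = random.Random(0)
n, m, d = 200, 10, 4
k = d                                  # threshold chosen equal to divergence
pattern = rng.sample(range(n), m)
hits = Counter()
for neuron in pattern:
    for target in rng.sample(range(n), d):   # d outgoing synapses per neuron
        hits[target] += 1
next_pattern = [v for v, c in hits.items() if c >= k]
print("next pattern size:", len(next_pattern), "<=", (d * m) // k)
```

Of course, in the actual mechanism the d targets of a neuron are fixed learned synapses, which is exactly what forces reused neurons to produce correlated patterns.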

2.5 Supplementary Material

This section contains the full analysis to determine the length of the chain constructed by our learning procedure. We first modify the learning procedure to obtain an auxiliary procedure which performs slightly worse but is mathematically tractable. Analyzing the auxiliary procedure then gives a lower bound.


2.5.1 Auxiliary procedure

For ease of notation we assume that all synapses are active initially (this implies that the notions of present and active synapses are equivalent throughout the procedure, since synapses are either removed or consolidated, but never turned silent). The procedure creates a chain in one 'round', forming a pattern in each time step. First, the stimulus A_1 is picked as a random subset of size m. Then, the chain is inductively constructed as follows. Assume patterns A_i are already constructed for 1 ≤ i < t. All vertices of A_t must belong to Γ^k_{t−1}(A_{t−1}) since no synapses can be added. If |Γ^k_{t−1}(A_{t−1})| < m, the procedure thus stops with a chain of length t − 1. Otherwise, all neurons of the consolidated neighborhood Γ̄^k_{t−1}(A_{t−1}) will be in A_t since consolidated synapses cannot be removed. Thus, if |Γ̄^k_{t−1}(A_{t−1})| > m, then the procedure stops with a chain of length t − 1 (we will see in the analysis that typically the procedure stops because of the first reason). Otherwise, a random subset X of size m − |Γ̄^k_{t−1}(A_{t−1})| is chosen from Γ^k_{t−1}(A_{t−1}) \ Γ̄^k_{t−1}(A_{t−1}) and synapses are consolidated resp. removed such that in the resulting network Γ̄^k_t(A_{t−1}) = Γ̄^k_{t−1}(A_{t−1}) ∪ X holds. This means that all synapses from E_{t−1}(A_{t−1}, A_t) are consolidated, and unconsolidated edges from A_{t−1} towards each v ∉ A_t are deleted randomly (by setting their irresolution values to r*) such that deg^{A_{t−1}}_t(v) < k holds.
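A Monte-Carlo sketch of the auxiliary procedure is given below. It is our own simplification: only the first stopping reason (too few candidate neurons) is modeled, consolidation is ignored, and all parameter values are illustrative rather than taken from the thesis.

```python
import random

def chain_length(n=400, m=20, k=3, p=0.08, seed=1, max_steps=2000):
    rng = random.Random(seed)
    # G_{n,p} connectivity: in_nbrs[v] = set of presynaptic neurons of v
    in_nbrs = [{u for u in range(n) if u != v and rng.random() < p}
               for v in range(n)]
    pattern = set(rng.sample(range(n), m))
    for length in range(1, max_steps):
        # neurons receiving at least k inputs from the current pattern
        candidates = [v for v in range(n) if len(in_nbrs[v] & pattern) >= k]
        if len(candidates) < m:
            return length          # stops: |Gamma^k(A_{t-1})| < m
        nxt = set(rng.sample(candidates, m))
        # prune synapses so each rejected candidate falls below threshold k
        for v in set(candidates) - nxt:
            surplus = list(in_nbrs[v] & pattern)
            rng.shuffle(surplus)
            for u in surplus[:len(surplus) - (k - 1)]:
                in_nbrs[v].discard(u)
        pattern = nxt
    return max_steps

L = chain_length()
print("chain length:", L)
```

Because edges into rejected candidates are deleted in every step, the density of synapses decays until fewer than m candidates remain, which is the stopping event analyzed below.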

2.5.2 Relation to learning rule

We point out why the auxiliary procedure yields a shorter chain than the learning rule. Since both procedures grow patterns by adding random neurons, it is clear that they only differ in how they remove synapses (assuming c* and r* are large enough). The following example demonstrates why the learning procedure makes 'better' choices at removing synapses than the auxiliary procedure. Consider a pattern A_{t−1} and a neuron v with present degree k and consolidated degree 0 (i.e. deg^{A_{t−1}}_{t−1}(v) = k and the consolidated degree of v is 0). Assume that both procedures decide that v should not be in A_t (both do so randomly). The auxiliary procedure removes a random unconsolidated synapse from A_{t−1} to v immediately. The learning rule, however, only removes a synapse from A_{t−1} to v if deg^{A_{t−1}}_{t'}(v) = k − 1 for some t' > t − 1. In particular, it removes the optimal synapse, since all other synapses from A_{t−1} to v are part of the chain.

2.5.3 Evolution of the densities

To determine the time step in which the auxiliary procedure stops, we track how the density of active synapses and the density of consolidated synapses evolve (in expectation). We choose the connection probability as p_1 = (1 + δ)p, where p is chosen according to Equation (2.3) and δ ≪ 1, as Figure 25 indicates that this is best. Moreover, we assume that deg^{A_t}_t(v) is independently Bin(|A_t|, p_t)-distributed for different v (and analogously for consolidated synapses) for t ≥ 1. Note that this condition is not fully satisfied: if u, v are neurons such that deg(u) > deg(v), then u has a larger probability to appear in a pattern than v. This means that for fixed t, neuron u is more likely to appear in A_t than v, increasing correlation among patterns, see Figure 23. The effect vanishes as the average degree pn² → ∞, but since we assume sparse graphs (and thus pn² to be a slowly growing function), the effect prevails for a long time. To estimate this effect, we also consider random networks with fixed indegree d, denoted G_{n,d}. To generate such a network, every neuron v chooses uniformly at random an input set I_v ⊆ V of d vertices, and we insert all edges (u, v) for u ∈ I_v.

We first calculate the evolution of p_t, the density of active synapses after formation of the t-th pattern. Since δ ≪ 1, almost all neurons in Γ^k_{t−1}(A_{t−1}) have exactly k in-neighbors in A_{t−1}, and the procedure removes |Γ^k_{t−1}(A_{t−1})| − m synapses in step t (one from each neuron which will not be in A_t). This yields

  p_t = p_{t−1} − (|Γ^k_{t−1}(A_{t−1})| − m) / n²    (2.14)
      ∼ p_{t−1} − (n m^k p_{t−1}^k / k! − m) / n²    (2.15)
      = p_{t−1} + (m/n²) (1 − ((p + (p_{t−1} − p)) / p)^k)    (2.16)
      ∼ p_{t−1} + (m/n²) (1 − (1 + k (p_{t−1} − p)/p))    (2.17)
      = p_{t−1} (1 − mk/(n²p)) + mk/n²,    (2.18)

where (2.15) follows from the definition of p in Equation (2.3) and (2.17) holds since (p_{t−1} − p)/p ≤ δ (note that p_t ∈ [p, (1 + δ)p]). Solving the recursion gives

  p_t = p + δp (1 − mk/(n²p))^{t−1}.    (2.19)
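The recursion and its solution can be checked numerically. The snippet below verifies that the closed form (2.19) solves the linearized recursion (2.18) exactly; the parameter values are arbitrary and for illustration only.

```python
n, m, k = 1000, 30, 3
p, delta = 2e-4, 0.2
a = m * k / (n**2 * p)            # mk / (n^2 p), the contraction factor

p_t = (1 + delta) * p             # p_1 = (1 + delta) p
for t in range(2, 51):
    p_t = p_t * (1 - a) + m * k / n**2           # linearized recursion (2.18)
    closed = p + delta * p * (1 - a) ** (t - 1)  # closed form (2.19)
    assert abs(p_t - closed) < 1e-12
print("closed form (2.19) solves recursion (2.18); fixed point p =", p)
```

The fixed point of (2.18) is p itself, and the excess density δp contracts by the factor 1 − mk/(n²p) in every step, which is the decay that later determines the chain length.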

In order to determine the evolution of the density p̄_t of consolidated synapses, we assume that in every step mk random synapses are selected to become consolidated. Hence, for a non-consolidated synapse that has survived until step t, the probability that it is consolidated in round t is mk/(n²p_{t−1}). Thus,

  p̄_t = p̄_{t−1} + (p_{t−1} − p̄_{t−1}) · mk/(n²p_{t−1})    (2.20)
      = p̄_{t−1} (1 − mk/(n²p_{t−1})) + mk/n²    (2.21)

holds. Solving the recursion and using p̄_1 = 0, we get

  p̄_t = (mk/n²) ∑_{r=1}^{t} ∏_{s=1}^{r−1} (1 − mk/(n²p_{t−s}))    (2.22)
      ∼ (mk/n²) ∑_{r=1}^{t} exp(−(mk/n²) ∑_{s=1}^{r−1} 1/p_{t−s}),    (2.23)

where the approximation is valid since 1 − x = exp(−x + O(x²)) for x ≤ 1. Using (2.19), we bound the inner sum as

  ∑_{s=1}^{r−1} 1/p_{t−s} = (1/p) ∑_{s=1}^{r−1} 1/(1 + δ(1 − mk/(n²p))^{t−s−1})    (2.24)
      ∼ (1/p) ∑_{s=1}^{r−1} (1 − δ(1 − mk/(n²p))^{t−s})    (2.25)
      ≥ ((r−1)/p) (1 − δ(1 − mk/(n²p))^{t−r+1}).    (2.26)

Observe that for r = Ω(t), the factor (1 − mk/(n²p))^{t−r+1} may differ substantially from (1 − mk/(n²p))^t; however, the sum in (2.22) is dominated by the terms for small r, for which the two agree up to a factor 1 + o(1). Therefore, and using ∑_{r=1}^{t} exp(−(r−1)x) = (1 − exp(−tx))/(1 − exp(−x)) ∼ (1 − (1−x)^t)/x, we may simplify

  p̄_t ∼ (mk/n²) ∑_{r=1}^{t} exp(−(r−1) (mk/(n²p)) (1 − δ(1 − mk/(n²p))^t))    (2.27)
      ∼ p (1 − (1 − (mk/(n²p))(1 − δ(1 − mk/(n²p))^t))^t) / (1 − δ(1 − mk/(n²p))^t)    (2.28)
      ∼ p (1 − (1 − mk/(n²p))^t) (1 + δ(1 − mk/(n²p))^t)    (2.29)
      ∼ p (1 − (1 − δ) (1 − mk/(n²p))^t),    (2.30)

where in (2.28) the term δ(1 − mk/(n²p))^t inside the inner power is negligible.
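Analogously, the closed form for the consolidated density can be compared against the recursion it approximates. The snippet below iterates (2.20), with p_t taken from (2.19), and checks agreement with (2.30) at a fixed time step; the parameters are illustrative, and the agreement is only approximate since (2.30) is an asymptotic formula.

```python
n, m, k = 1000, 30, 3
p, delta = 3e-3, 0.05
a = m * k / (n**2 * p)            # mk / (n^2 p)

pt, bt = (1 + delta) * p, 0.0     # active density p_1, consolidated density 0
for t in range(2, 201):
    bt = bt + (pt - bt) * a * p / pt          # recursion (2.20), rate mk/(n^2 p_{t-1})
    pt = p + delta * p * (1 - a) ** (t - 1)   # closed form (2.19)

approx = p * (1 - (1 - delta) * (1 - a) ** 200)   # closed form (2.30) at t = 200
assert abs(bt - approx) / p < 0.05
print(f"recursion: {bt:.4e}  closed form: {approx:.4e}")
```

Both expressions approach the full density p as t grows, since eventually almost every surviving synapse gets consolidated.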

2.5.4 Length of the chain

From the evolution of the densities we can now determine the expected length of the chain. Recall that there are two reasons why the procedure could stop at step t (terminating with a chain of length t − 1): first, if too many synapses have been removed and therefore |Γ^k_{t−1}(A_{t−1})| < m holds, and second, if too many synapses are consolidated and thus |Γ̄^k_{t−1}(A_{t−1})| > m holds. By assumption (as discussed above) we have that |Γ^k_{t−1}(A_{t−1})| and |Γ̄^k_{t−1}(A_{t−1})| are binomially distributed. Let q_t be the probability that a neuron has at least k in-neighbors in A_{t−1}. By construction |A_{t−1}| = m holds, implying

  q_t = Pr[Bin(m, p_t) ≥ k]    (2.31)
      ∼ m^k p_t^k / k!    (2.32)
      = (m/n) (1 + δ(1 − mk/(n²p))^{t−1})^k,    (2.33)

where we use (2.19). Let P_t be the probability that the procedure stops in the t-th step because of the first reason. We get

  P_t = Pr[Bin(n, q_t) < m].    (2.34)

Similarly, let q̄_t be the probability that a neuron has at least k in-neighbors via consolidated edges in A_{t−1}. We have

  q̄_t = Pr[Bin(m, p̄_t) ≥ k]    (2.35)
      ∼ m^k p̄_t^k / k!    (2.36)
      = (m/n) (1 − (1 − δ)(1 − mk/(n²p))^{t−1})^k,    (2.37)

using (2.27). Let P̄_t be the probability of stopping in the t-th step for the second reason. We get

  P̄_t = Pr[Bin(n, q̄_t) > m].    (2.38)

From this we immediately see that P_t dominates P̄_t, because the mean of Bin(n, q_t) is m(1 + o((1 − mk/(n²p))^{t−1})) for all t, whereas the mean of Bin(n, q̄_t) is m(1 − Ω((1 − mk/(n²p))^{t−1})). Thus, the first mean is much closer to m than the second one. Hence, in the following we may neglect the probability to stop for the second reason.

Let L be the random variable for the length of the chain (i.e. the number of steps until the procedure stops). We can compute the expected length as

  E[L] = ∑_{t=1}^{∞} t · ∏_{t'=1}^{t−1} (1 − P_{t'}) · P_t.    (2.39)

In the following, we determine the asymptotic value of E[L] and show

  E[L] = (n²p/(mk)) · (½ log m + log(δk) − ½ log(2 log(n²p/(m²k²√π))) + o(1)).    (2.40)

It will turn out to be necessary that the factor in the outermost brackets grows with n and thus we assume that the functions m(n) and δ(n) are such that 1/2 log m is the dominant term in this factor. Hence, we can simplify to

  E[L] ∼ (n²p/(2mk)) log m ∼ ((k!)^{1/k} / (2k)) (n/m)^{2−1/k} log m,    (2.41)

where the last step follows from the definition of p in Equation (2.3).
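As a complementary check, the expected length (2.39) can also be evaluated numerically with exact binomial tails instead of the asymptotic analysis. The snippet below is our own illustration: it includes only the dominant (first) stopping reason, and the parameter values are arbitrary, so the resulting number is not comparable to the asymptotics above.

```python
from math import comb

def binom_cdf(x, n, p):
    """Pr[Bin(n, p) <= x]."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(x + 1))

n, m, k = 300, 12, 3
p0, delta = 0.06, 0.3
a = m * k / (n**2 * p0)

expected_len, survive = 0.0, 1.0
for t in range(1, 5000):
    pt = p0 + delta * p0 * (1 - a) ** (t - 1)   # density of active synapses (2.19)
    qt = 1 - binom_cdf(k - 1, m, pt)            # q_t = Pr[Bin(m, p_t) >= k]
    Pt = binom_cdf(m - 1, n, qt)                # P_t = Pr[Bin(n, q_t) < m]
    expected_len += t * survive * Pt            # t-th term of (2.39)
    survive *= 1 - Pt
    if survive < 1e-12:
        break
print(f"E[L] ~ {expected_len:.1f}")
```

The truncation of the infinite sum is harmless here because the survival probability ∏(1 − P_{t'}) decays geometrically once the density has relaxed toward p.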


Since L ∈ ℕ, we can rewrite the expected length as

  E[L] = ∑_{t=1}^{∞} Pr[L ≥ t]    (2.42)
       = ∫_0^∞ Pr[L ≥ z] dz    (2.43)
       = ∫_0^∞ (n²p/(mk)) Pr[L ≥ z · n²p/(mk)] dz.    (2.44)

Moreover, we rewrite

  Pr[L ≥ i] = ∏_{j=1}^{i} (1 − P_j)    (2.45)
            = ∏_{j=1}^{i} e^{−P_j + O(P_j²)}    (2.46)
            = exp(−∑_{j=1}^{i} P_j + O(∑_{j=1}^{i} P_j²)).    (2.47)

The error term O(∑_{j=1}^{i} P_j²) is negligibly small for all relevant values of i (i.e. for i such that the error term does not vanish, Pr[L ≥ i] is exponentially small), so we suppress it in further calculations. Then, using the same trick as above to approximate the sum by an integral, we obtain

  Pr[L ≥ i] = exp(−(n²p/(mk)) ∫_0^{i·mk/(n²p)} P_{y·n²p/(mk)} dy).    (2.48)

This indicates that it is convenient to normalize with n²p/(mk). Hence, we define Q(y) := P_{y·n²p/(mk)} and approximate Q(y) by a normal distribution to get

  Q(y) ∼ (1/(σ²(y) 2√π)) ∫_{(µ(y)−m)/(σ(y)√2)}^{∞} e^{−x²} dx    (2.49)

with

  µ(y) = n · q_{y·n²p/(mk)}    (2.50)
       = n · (m/n) (1 + δ(1 − mk/(n²p))^{y·n²p/(mk)})^k    (2.51)

       ∼ m (1 + δe^{−y})^k    (2.52)

and

  σ²(y) = n · q_{y·n²p/(mk)} (1 − q_{y·n²p/(mk)}) ∼ µ(y),    (2.53)

where we hide the dependence on y if it is clear from the context. Again, the error terms in (2.49), (2.50), and (2.53) are small enough to be neglected. Thus, using (2.44), (2.48), and (2.49) we can rewrite the expected value of L as

  E[L] = (n²p/(mk)) · ∫_0^∞ exp(−∫_0^z (n²p/(mk σ² 2√π)) ∫_{(µ−m)/(σ√2)}^{∞} e^{−x²} dx dy) dz.    (2.54)

For convenience, we split the term and define

  A(a, b) := ∫_a^b exp(−∫_0^z (n²p/(mk σ² 2√π)) ∫_{(µ−m)/(σ√2)}^{∞} e^{−x²} dx dy) dz,    (2.55)


  B(a, b) := ∫_a^b (n²p/(mk σ² 2√π)) ∫_{(µ−m)/(σ√2)}^{∞} e^{−x²} dx dy,    (2.56)

and

  C(a, b) := ∫_a^b e^{−x²} dx.    (2.57)

Observe that if a = ω(1), we can apply Taylor approximation to get

  C(a, ∞) = exp(−a²) · a^{±O(1)}.    (2.58)

Define

  Z := ½ log m + log(δk) − ½ log(2 log(n²p/(m²k²√π)))    (2.59)

and

  Z' := ½ log m + log(δk) − ½ log 2.    (2.60)

Let f(n) be a slowly decreasing function whose order of magnitude we will determine later. We will now split up

  A(0, ∞) = A(0, Z − f) + A(Z + f, Z' − f) + A(Z' + f, ∞) + o(1)    (2.61)

and consider the single terms individually, where the error term comes from the fact that A(Z − f, Z + f) + A(Z' − f, Z' + f) ≤ 4f(n).

Claim 2.1. We claim A(0, Z − f) = Z − f − o(1).

We show that B(0, Z − t) = e^{−tω(1)} for all t ≥ f(n), and therefore A(0, x) = ∫_0^x e^{−B(0,z)} dz = x − o(1) for x ≤ Z − f. Let g(n) be a slowly growing function whose order of magnitude we also determine later. We split

  B(0, Z − f) = B(0, g) + B(g, Z − f),    (2.62)

and bound both summands separately. By (2.58),

  B(0, g) ≤ ∫_0^g (n²p/(mk σ² 2√π)) exp(−((µ − m)/(σ√2))²) m^{O(1)} dy    (2.63)
          ≤ g · (m^{O(1)} n²p/(mk σ² 2√π)) exp(−((1 + δe^{−g})^k m − m)² / ((1 + δe^{−g})^k 2m))    (2.64)
          ≤ n^{O(1)} exp(−Θ(δ² e^{−2g} m))    (2.65)
          = exp(−m^{Ω(1)}),    (2.66)

for an appropriate choice of g(n). So the first summand is negligible. The second summand is

  B(g, Z − t) = ∫_g^{Z−t} (n²p/(mk σ² 2√π)) exp(−((µ − m)/(σ√2))² + O(log((µ − m)/(σ√2)))) dy.    (2.67)

We can simplify the term (µ − m)/(σ√2), which depends on y through µ and σ, to

  (µ − m)/(σ√2) = ((1 + δe^{−y})^k m − m)/√(2(1 + δe^{−y})^k m) = (kδe^{−y}√m/√2)(1 + o(1)).    (2.68)


Therefore, (2.67) becomes

  B(g, Z − t)    (2.69)
      = (n²p/(m²k²√π)) ∫_g^{Z−t} exp(−(kδe^{−y}√m/√2)² (1 + o(1))) dy    (2.70)
      = (n²p/(m²k²√π)) · O(exp(−(kδe^{−(Z−t)}√m/√2)² (1 + o(1))))    (2.71)
      = (n²p/(m²k²√π))^{1 − e^{2t}(1+o(1))}    (2.72)
      ≤ (n²p/(m²k²√π))^{−t}    (2.73)
      = e^{−t log(n²p/(m²k²√π))} = e^{−tω(1)},    (2.74)

where the fourth step holds for all t ≥ f(n), if f(n) is suitably chosen. Claim 2.1 follows.

Claim 2.2. We claim A(Z + f, Z' − f) = o(1).

We show that B(0, z) = ω(1) + ω(z − (Z + f)) for all z ≥ Z + f and therefore

  A(Z + f, t) = ∫_{Z+f}^{t} e^{−ω(1) − ω(z − (Z+f))} dz = o(1).    (2.75)

Let z ≥ Z + f. We have


  B(0, z) ≥ B(Z + f/2, z)    (2.76)
          = ∫_{Z+f/2}^{z} (n²p/(mk σ² 2√π)) ∫_{(µ−m)/(σ√2)}^{∞} e^{−x²} dx dy    (2.77)
          ≥ ∫_{Z+f/2}^{z} (n²p/(m²k²√π)) exp(−((µ(Z + f/2) − m)/(σ(Z + f/2)√2))² (1 + o(1))) dy    (2.78)
          = (f/2 + (z − (Z + f))) · (n²p/(m²k²√π)) · exp(−(δk e^{−Z−f/2} √m/√2)² (1 + o(1)))    (2.79)
          = (f/2 + (z − (Z + f))) · (n²p/(m²k²√π))^{1 − e^{−f}(1+o(1))}    (2.80)
          ≥ (f/2 + (z − (Z + f))) · (n²p/(m²k²√π))^{f/2}    (2.81)
          = ω(1) + ω(z − (Z + f)),    (2.82)

for f = ω(log^{−1}(n²p/m²)), proving the claim.

Claim 2.3. We claim A(Z' + f, ∞) = o(1).

Analogously to the previous claim, we show

  B(0, z) = ω(1) + ω(z − (Z' + f))    (2.83)

for all z ≥ Z' + f and therefore

  A(Z' + f, t) = ∫_{Z'+f}^{t} e^{−ω(1) − ω(z − (Z'+f))} dz = o(1).    (2.84)

Observe that for y ≥ Z' we have

  (µ(y) − m)/(σ(y)√2) = Θ(1)    (2.85)

and therefore C((µ − m)/(σ√2), ∞) = Θ(1). This implies

  B(0, z) ≥ B(Z', z) = ∫_{Z'}^{z} (n²p/(mk σ² 2√π)) ∫_{(µ−m)/(σ√2)}^{∞} e^{−x²} dx dy    (2.86)
                     = (f + z − (Z' + f)) · Θ(n²p/m²)    (2.87)
                     = ω(1) + ω(z − (Z' + f)),    (2.88)

for f(n) = ω(m²/(n²p)), concluding the proof.

3 Rate based learning with short stimuli

The results in this chapter were obtained in joint work with Marcelo Matheus Gauy, Johannes Lengler, Florian Meier and Angelika Steger, see (Weissenberger, Gauy, Lengler, Meier, & Steger, 2018).

3.1 Introduction

The ultimate goal of computational neuroscience is to understand the capabilities of the nervous system to represent and process information (Sejnowski, Koch, & Churchland, 1988). It is generally agreed that plastic synapses play a key role in the biophysical foundation of complex information processing. How plastic synapses change their efficacy as a function of the activity and state of presynaptic and postsynaptic neurons has been studied in numerous experiments. Based on these results, computational neuroscience aims to derive models of synaptic plasticity that make it possible to study what kind of computations may emerge in neural networks with plastic synapses.

Over the last decades there has been tremendous success in this endeavor, largely facilitated by the use of a firing rate abstraction of neuronal activity (Dayan & Abbott, 2001). The accessibility of such rate models can be largely attributed to the fact that they permit an analysis for which one can resort to a large body of established mathematical tools (Dayan & Abbott, 2001; Gerstner et al., 2014). A classic example is the Bienenstock Cooper Munro (BCM) theory, which reproduces the development of receptive fields in visual cortex (Bienenstock,


Cooper, & Munro, 1982). More recent work focused on spiking models and demonstrated that plasticity rules formulated in terms of spike timing (STDP rules, e.g. (Gerstner, Kempter, van Hemmen, & Wagner, 1996; Kempter, Gerstner, & van Hemmen, 1999; Pfister & Gerstner, 2005)) and additionally in terms of the postsynaptic voltage (VDP rules, e.g. (Toyoizumi, Pfister, Aihara, & Gerstner, 2005; Mayr & Partzsch, 2010; Clopath & Gerstner, 2010; Clopath, Büsing, Vasilaki, & Gerstner, 2010)) can be reduced to plasticity rules formulated in terms of firing rates (rate based plasticity rules, e.g. (Bienenstock et al., 1982; Oja, 1982)) under the assumption that firing rates are a meaningful abstraction of neuronal activity (Kempter et al., 1999; Izhikevich & Desai, 2003; Pfister & Gerstner, 2005). As a consequence, current spiking network models, which are capable of remarkable computation, are often implementations of rate models with spiking neurons (Litwin-Kumar & Doiron, 2014; Zenke, Agnes, & Gerstner, 2015). Whether rate or spiking models are suitable to describe neural computation in general and synaptic plasticity in particular is still highly debated (Softky & Koch, 1993; Rieke et al., 1999; London, Roth, Beeren, Häusser, & Latham, 2010; Graupner, Wallisch, & Ostojic, 2016), see (Brette, 2015) for review.

A critical limitation of all these models is that they rely on the assumption that firing rates encode the information that is relevant to perform the desired computation. However, a firing rate is a temporal average of spikes. For cortical neurons, which spike in a dynamic range of 0-200 Hz, this average must be taken over milliseconds to seconds, as otherwise no spikes are observed and the concept of a firing rate is hollow (Stein, 1967). This implies that rate based computation is restricted to computational tasks where information is encoded in slowly changing neuronal activity (Rieke et al., 1999;


Gerstner et al., 2014).

This is in sharp contrast to the activity of cortical neurons in response to natural stimuli, which is typically characterized by the instantaneous rate (or firing probability) of the neuron. The instantaneous rate is reported in a peri-stimulus-time histogram (PSTH), which averages neuronal spiking over several repetitions of the same stimulus (Gerstner et al., 2014). In vivo recordings of the instantaneous rate of cortical neurons in response to natural stimuli reveal that the activity of such neurons changes quickly, on the order of a few milliseconds (Shadlen & Newsome, 1998; Rieke et al., 1999). This suggests that for many computational tasks the relevant information is encoded in rapidly changing neuronal activity, and thus a firing rate abstraction neglects a large amount of information. It is currently unknown if and how the information encoded in the instantaneous rate is available to local synaptic plasticity mechanisms. The reason is that the instantaneous rate is an abstract concept whose computation requires several repetitions of identical stimuli, which in a natural environment are sparse, irregular and distant in time. In contrast, information encoded in the firing rate is directly accessible to local synaptic plasticity mechanisms via spikes, and the dependence of plasticity on firing rate (Brown et al., 1988; Dudek & Bear, 1992; Bliss, Collingridge, & Morris, 1993; Sjöström, Turrigiano, & Nelson, 2001) and spike timing (Markram, Lübke, Frotscher, & Sakmann, 1997; Bi & Poo, 1998; Sjöström et al., 2001; Froemke & Dan, 2002; Wang, Gerkin, Nauen, & Bi, 2005) is well established.

In this work we resolve this discrepancy: we show that the instantaneous rate can be precisely estimated from the fluctuations of the membrane potential in balanced networks. Hence, the instantaneous rate is directly accessible to voltage-dependent synaptic plasticity

mechanisms (Artola, Bröcher, & Singer, 1990; Ngezahayo, Schachner, & Artola, 2000; Sjöström, Turrigiano, & Nelson, 2004). In balanced networks, excitatory inputs are canceled by inhibitory inputs on average (van Vreeswijk & Sompolinsky, 1996; Brunel, 2000; Vogels, Sprekeler, Zenke, Clopath, & Gerstner, 2011), and it is likely that cortical circuits operate in this balanced regime (Shadlen & Newsome, 1994; Renart et al., 2010).

Our result immediately implies that rate based plasticity rules that are separable in the presynaptic and postsynaptic rate can be understood in terms of the instantaneous rate. Therefore, known insights on rate based plasticity transfer naturally to scenarios where relevant information is encoded in rapidly changing neuronal activity. So far, learning in such scenarios was only known to be feasible with STDP rules under the assumption of information being encoded in precise spike timing, in contrast to the rate based setup we study here. Concretely, we analytically quantify how long neuronal activity, which encodes a certain stimulus in firing rate or instantaneous rate, must be stationary such that a plasticity rule can apply a desired weight change, which is given by an arbitrary function of the presynaptic and postsynaptic rate, with a given accuracy. Here we compare plasticity mechanisms that either solely depend on spiking of presynaptic and postsynaptic neurons (spike-dependent plasticity (SDP) rule, equivalent to STDP in a rate based setting) or additionally on the postsynaptic membrane potential (voltage-dependent plasticity (VDP) rule). We find that for fixed accuracy the neuronal activity may change at least one order of magnitude faster in the case of VDP compared to SDP, since VDP can utilize the instantaneous rate. We illustrate this on the example of the BCM rule to perform input selectivity for stimuli presented for a very short period of time (10 ms).


3.2 Materials and methods

In this section we introduce and analyze our model; this analysis is a prerequisite for understanding the results in Section 3.3.

3.2.1 Neuron model

We use the classical model of Stein (Stein, 1965) for cortical in vivo neuronal dynamics and its diffusion approximation (Lánský, 1984) (see (Tuckwell, 1988; Gerstner et al., 2014) for excellent introductions and (Burkitt, 2006) for a review). In Stein's model a leaky integrate-and-fire (LIF) neuron is driven by stochastic spike arrival. The membrane potential u(t) evolves according to

$$\tau \frac{\mathrm{d}}{\mathrm{d}t} u(t) = -u(t) + \tau \sum_{k=1}^{N} \sum_{t_k^f} w_k\, \delta\big(t - t_k^f\big), \qquad (3.1)$$

where $\tau$ is the membrane time constant, $k$ indexes the $N$ synapses, $t_k^f$ are the spike arrival times, $w_k$ is the weight of the $k$-th synapse, and $\delta$ is the Dirac $\delta$-function. If the membrane potential reaches the threshold $\vartheta$, the neuron spikes and the membrane potential is set to the reset potential $u_r$ immediately afterwards. Hence, the action potential is not explicitly modeled. The spikes arriving at the $k$-th synapse are generated by a Poisson process with rate $\nu_k$. The weights $w_k$ can be positive or negative, corresponding to excitatory or inhibitory synapses respectively. We assume loosely balanced excitation and inhibition (see (Denève & Machens, 2016) for a review and Section 3.4.3): the mean $\sum_{k=1}^{N} w_k \nu_k$ of the synaptic input is zero (van Vreeswijk & Sompolinsky, 1996). Hence, the rate $r$ of the neuron is determined by the variance of the


synaptic input $\sum_{k=1}^{N} w_k^2 \nu_k$. This dependence is indicated by denoting the rate as $r(\sigma)$ for $\sigma := \sqrt{\tau \sum_{k=1}^{N} w_k^2 \nu_k}$. For analytic tractability we consider the diffusion approximation of Stein's model (where $N$ is large and the $w_k$ small, also known as the synaptic bombardment assumption or high input regime (Shadlen & Newsome, 1998)), which describes the membrane potential $u(t)$ as an Ornstein-Uhlenbeck process (OUP)

$$\mathrm{d}u(t) = -\frac{u(t)}{\tau}\,\mathrm{d}t + \frac{\sigma}{\sqrt{\tau}}\,\mathrm{d}W_t, \qquad (3.2)$$

where $\mathrm{d}W_t$ are the increments of a Wiener process in time $\mathrm{d}t$ (Lánský, 1984). The spike generation at threshold $\vartheta$ followed by a reset to $u_r$ is analogous to Stein's model. The diffusion approximation allows us to determine the rate as the inverse of the expected first passage time of the OUP, given by Siegert's formula (Tuckwell, 1988) as

$$r(\sigma) = \left( \tau \sqrt{\pi} \int_{u_r/\sigma}^{\vartheta/\sigma} \exp\big(x^2\big)\, \big(1 + \operatorname{erf}(x)\big)\, \mathrm{d}x \right)^{-1}, \qquad (3.3)$$

where $\operatorname{erf}$ denotes the error function. Furthermore, as a consequence of balanced excitation and inhibition the neuron operates in the fluctuation-driven regime and its interspike interval (ISI) distribution is exponential (in the limit of large $\vartheta/\sigma$). Hence, the neuron spikes according to a Poisson process with rate $r(\sigma)$ (Nobile, Ricciardi, & Sacerdote, 1985; Shadlen & Newsome, 1998). As a consequence, in a time interval during which $\sigma$ is constant, $r(\sigma)$ describes both the firing rate and the instantaneous rate of the neuron. Therefore, in the sequel we will simply continue referring to it as rate, and the length of the considered time interval indicates whether it makes sense to

think of it as the firing rate (time intervals on the order of seconds) or the instantaneous rate (time intervals on the order of milliseconds).
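Siegert's formula (3.3) can be evaluated numerically. The following sketch is illustrative, not the thesis code: it measures voltages relative to the balanced mean, so the reset $u_r = 0$ and threshold $\vartheta = 15$ mV are hypothetical values (a 15 mV distance from rest to threshold), and it uses standard SciPy quadrature.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import erf

def siegert_rate(sigma, theta=0.015, u_r=0.0, tau=0.02):
    """Rate of the balanced LIF neuron via Siegert's formula, Equation (3.3):
    r(sigma) = ( tau * sqrt(pi) * int_{u_r/sigma}^{theta/sigma}
                 exp(x^2) * (1 + erf(x)) dx )^{-1}.
    Voltages are measured relative to the balanced mean (an assumption here);
    theta is the threshold, u_r the reset, tau the membrane time constant (s)."""
    integrand = lambda x: np.exp(x**2) * (1.0 + erf(x))
    integral, _ = quad(integrand, u_r / sigma, theta / sigma)
    return 1.0 / (tau * np.sqrt(np.pi) * integral)

# Larger input fluctuations sigma drive the fluctuation-driven neuron faster.
rates = [siegert_rate(s) for s in (0.005, 0.0075, 0.010)]
```

Since the neuron is fluctuation-driven, the rate increases monotonically with $\sigma$, which the three sample values above illustrate.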

3.2.2 Information about the stimulus

Learning with local plasticity rules is limited by the amount of information about the stimulus, encoded in the neuronal activity of presynaptic and postsynaptic neurons, that is available per unit time. This amount of information, termed Fisher information, is quantified as the inverse variance of an optimal estimator (an estimator with minimal variance among all estimators) of the stimulus (see (Wasserman, 2013) for an introduction). In this section we analytically compare two local neuronal observables, namely spike count and membrane potential, with respect to how much information they convey about the stimulus, which is encoded in a rate. Concretely, we compute the variance of optimal rate estimators based on either spike times or voltage samples.

Information from spiking. As a consequence of balanced excitation and inhibition, the neuron spikes according to a Poisson process with rate $r$. Let $t_1, \dots, t_n$ be the spike times observed in a time interval of length $T$. The maximum likelihood estimator of the rate of a Poisson process, which is an optimal estimator of the rate, is given by

$$\hat r^{\text{spike}} = \frac{n}{T}, \qquad (3.4)$$

and has variance

$$\operatorname{Var}_r^{\text{spike}} = \frac{r}{T}, \qquad (3.5)$$

see (Wasserman, 2013). Hence, the Fisher information of the rate in a Poisson spike train is proportional to the length of the observed

time interval. Further, as a consequence of the Poisson model the actual spike times are irrelevant. Therefore, if the rate estimate is based solely on spiking, then one can only increase the amount of information about the rate by observing the neuron for a longer time interval.
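The estimator (3.4) and its variance (3.5) are easily checked by simulation. The sketch below is illustrative (the rate, window length, and trial count are arbitrary choices): it draws Poisson spike counts and compares the empirical variance of $n/T$ to $r/T$.

```python
import numpy as np

rng = np.random.default_rng(0)
r, T, trials = 10.0, 0.5, 200_000     # rate (Hz), observation window (s), repetitions

# Spike-count MLE of Equation (3.4): r_hat = n / T, with n ~ Poisson(r * T).
n = rng.poisson(r * T, size=trials)
r_hat = n / T

# Equation (3.5) predicts Var[r_hat] = r / T; compare to the empirical variance.
theory = r / T                        # = 20 Hz^2 for these parameters
empirical = r_hat.var()
```

With these parameters the empirical variance agrees with $r/T$ to within a few percent, while the mean of the estimates recovers $r$.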

Information from membrane potential. The membrane potential evolves according to an OUP as in Equation (3.2) and the neuron spikes with rate $r(\sigma)$, see Equation (3.3). Observing the membrane potential to extract information is modeled by taking samples $\mathbf u := u_0, \dots, u_n$ of the membrane potential in a time interval of length $T$. Possible postsynaptic action potentials are not contained in the membrane potential trajectory, as they are not explicit in the LIF neuron model. We assume equidistant sampling times with distance $\varepsilon := T/n$ and refer to $1/\varepsilon$ as the sampling rate. The transition probability density of the OUP is

$$p(u, t \mid u_0, t_0) = \frac{\exp\left( -\dfrac{\big(u - u_0 \exp(-\frac{t - t_0}{\tau})\big)^2}{\sigma^2 \big(1 - \exp(-\frac{2(t - t_0)}{\tau})\big)} \right)}{\sqrt{\pi \sigma^2 \big(1 - \exp(-\frac{2(t - t_0)}{\tau})\big)}}, \qquad (3.6)$$

see for example (Tuckwell, 1988). This is the probability density of $u(t)$ being equal to $u$ given that $u(t_0)$ was equal to $u_0$. Therefore, the likelihood of the samples is

$$L(\sigma; \mathbf u) = \prod_{i=0}^{n-1} \frac{\exp\left( -\dfrac{\big(u_{i+1} - u_i \exp(-\varepsilon/\tau)\big)^2}{\sigma^2 \big(1 - \exp(-2\varepsilon/\tau)\big)} \right)}{\sqrt{\pi \sigma^2 \big(1 - \exp(-2\varepsilon/\tau)\big)}}, \qquad (3.7)$$

the log-likelihood (only terms depending on $\sigma$ are shown) is

$$l(\sigma; \mathbf u) = \dots - n \log \sigma - \frac{1}{\sigma^2} \sum_{i=0}^{n-1} \frac{\big(u_{i+1} - u_i \exp(-\varepsilon/\tau)\big)^2}{1 - \exp(-2\varepsilon/\tau)}, \qquad (3.8)$$

and the first derivative of the log-likelihood with respect to $\sigma$ is

$$\frac{\mathrm{d}}{\mathrm{d}\sigma} l(\sigma; \mathbf u) = -\frac{n}{\sigma} + \frac{2}{\sigma^3} \sum_{i=0}^{n-1} \frac{\big(u_{i+1} - u_i \exp(-\varepsilon/\tau)\big)^2}{1 - \exp(-2\varepsilon/\tau)}. \qquad (3.9)$$

Thus, by the invariance principle (Wasserman, 2013), an optimal estimator of the rate is then given by

$$\hat r^{\text{voltage}} = r(\hat\sigma), \qquad (3.10)$$

with

$$\hat\sigma = \sqrt{ \frac{2 \sum_{i=0}^{n-1} \big(u_{i+1} - u_i \exp(-\varepsilon/\tau)\big)^2}{n \big(1 - \exp(-2\varepsilon/\tau)\big)} }. \qquad (3.11)$$

The expectation of the second derivative of the log-likelihood with respect to $\sigma$ is

$$\mathbb{E}\left[ \frac{\mathrm{d}^2}{\mathrm{d}\sigma^2} l(\sigma; \mathbf u) \right] = -\frac{2T}{\sigma^2 \varepsilon}. \qquad (3.12)$$

Therefore, the variance of $\hat\sigma$ is $\sigma^2 \varepsilon / (2T)$. Using the invariance principle and the delta method (Wasserman, 2013) we conclude that the variance of the rate estimator is

$$\operatorname{Var}_r^{\text{voltage}} = \frac{\sigma^2 \varepsilon}{2T} \cdot \left( \frac{\mathrm{d}}{\mathrm{d}\sigma} r(\sigma) \right)^2. \qquad (3.13)$$

Note that the amount of information about the rate extractable from the membrane potential is not only proportional to the duration of

observation, but crucially also to the sampling rate. Therefore, if the rate estimate is based on the membrane potential, then the amount of information about the rate can be increased by a higher sampling rate. However, the sampling rate must be smaller than the spike arrival rate, which led to the approximation of the membrane potential by an OUP, as otherwise this approximation is not valid, see Section 3.4.1.
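The estimator $\hat\sigma$ of Equation (3.11) can also be checked by simulation: sample an OUP exactly via the Gaussian transition density of Equation (3.6) and apply the estimator to each trajectory. This is an illustrative sketch; the values ($\sigma = 5$ mV, a 10 ms window at 1 kHz sampling) are chosen to mirror a short stimulus and are not taken from the thesis simulations.

```python
import numpy as np

rng = np.random.default_rng(1)
tau, sigma, eps, T = 0.02, 0.005, 0.001, 0.010   # s, V, sampling step (s), window (s)
n = int(T / eps)                                  # number of sampling intervals

# Exact OUP transitions (Equation (3.6)): u_{i+1} given u_i is Gaussian with
# mean u_i * exp(-eps/tau) and variance (sigma^2 / 2) * (1 - exp(-2*eps/tau)).
decay = np.exp(-eps / tau)
trans_sd = sigma * np.sqrt((1.0 - decay**2) / 2.0)

def sigma_hat(u):
    """Maximum likelihood estimator of sigma from voltage samples, Eq. (3.11)."""
    incr = u[1:] - u[:-1] * decay
    return np.sqrt(2.0 * np.sum(incr**2) / (len(incr) * (1.0 - decay**2)))

estimates = []
for _ in range(2000):                             # repeated short observations
    u = np.empty(n + 1)
    u[0] = 0.0                                    # start at the balanced mean
    for i in range(n):
        u[i + 1] = u[i] * decay + trans_sd * rng.standard_normal()
    estimates.append(sigma_hat(u))
estimates = np.array(estimates)
```

The spread of the estimates matches the asymptotic prediction $\operatorname{Var}[\hat\sigma] = \sigma^2 \varepsilon / (2T)$, i.e. a standard deviation of about $\sigma / \sqrt{20}$ for these parameters, even for this short window.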

Time improvement. Let $T^{\text{spike}}$ and $T^{\text{voltage}}$ be the durations of a stimulus that are required to extract a certain amount of information about the stimulus either from the spike train or from the membrane potential evolution of a neuron encoding it. The factor of time improvement is given by

$$\frac{T^{\text{spike}}}{T^{\text{voltage}}} = \frac{2 r_{\text{post}}}{\sigma^2 \varepsilon \left( \frac{\mathrm{d}}{\mathrm{d}\sigma} r(\sigma) \right)^2}, \qquad (3.14)$$

combining Equations (3.5) and (3.13).
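Equation (3.14) can be evaluated numerically once $r(\sigma)$ is computable. The sketch below is illustrative and reuses the same hypothetical voltage convention as before (voltages relative to the balanced mean, threshold 15 mV, reset 0); the derivative $\mathrm{d}r/\mathrm{d}\sigma$ is approximated by a central finite difference.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import erf

def siegert_rate(sigma, theta=0.015, u_r=0.0, tau=0.02):
    # Equation (3.3); voltages relative to the balanced mean (an assumption).
    integral, _ = quad(lambda x: np.exp(x**2) * (1.0 + erf(x)),
                       u_r / sigma, theta / sigma)
    return 1.0 / (tau * np.sqrt(np.pi) * integral)

def time_improvement(sigma, eps=0.001, h=1e-5):
    """Equation (3.14): T_spike / T_voltage = 2*r / (sigma^2 * eps * (dr/dsigma)^2),
    with dr/dsigma approximated by a central finite difference of width 2h."""
    r = siegert_rate(sigma)
    drds = (siegert_rate(sigma + h) - siegert_rate(sigma - h)) / (2.0 * h)
    return 2.0 * r / (sigma**2 * eps * drds**2)
```

Note that the improvement factor scales inversely with the sampling interval $\varepsilon$: halving $\varepsilon$ (doubling the sampling rate) doubles the advantage of the voltage-based estimate, in line with Figure 31 (c).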

3.2.3 Rate based plasticity with spiking neurons

A rate based plasticity rule describes the synaptic weight change $\Delta w := f(r_{\text{pre}}, r_{\text{post}})$ as a function of the presynaptic and postsynaptic rates $r_{\text{pre}}$ and $r_{\text{post}}$. A general plasticity rule realizes a particular rate based rule $f$ if the expected change of the synaptic weight is equal to $\Delta w$ after the presynaptic and postsynaptic neurons have spiked with rates $r_{\text{pre}}$ and $r_{\text{post}}$ for time $T$ (the expectation is over the randomness of the Poisson spike trains) (Kempter et al., 1999; Izhikevich & Desai, 2003; Pfister & Gerstner, 2005). Crucially for learning, the actual weight change should be close to its expectation. Hence, an optimal plasticity rule minimizes the variance of the weight change among all rules applying the same expected weight change.


Optimal SDP rule. We now derive a lower bound for the variance $\operatorname{Var}_w^{\text{spike}}$ of the weight change, induced by the postsynaptic variability, of any SDP rule realizing $f$. Applying an SDP rule for time $T$ can be seen as a protocol to estimate the weight change $\Delta w$. Hence, by the invariance principle we can obtain an optimal estimator for $\Delta w$ as

$$\hat{\Delta w}^{\text{spike}} = f\big(\hat r^{\text{spike}}_{\text{pre}}, \hat r^{\text{spike}}_{\text{post}}\big), \qquad (3.15)$$

where $\hat r^{\text{spike}}_{\text{pre}}$ and $\hat r^{\text{spike}}_{\text{post}}$ are optimal estimators of the presynaptic and postsynaptic rates, given in Equation (3.4). This immediately defines an optimal SDP realization of $f$: first, estimate the rates according to Equation (3.4) and thereafter apply $f$ to the estimates (the optimal voltage based rule is analogous, using Equation (3.10) respectively). By the delta method and Equation (3.5) we derive that the variance of the optimal estimator $\hat{\Delta w}^{\text{spike}}$ is

$$\operatorname{Var}_w^{\text{spike}} = \frac{r_{\text{post}}}{T} \cdot \left( \frac{\partial}{\partial r_{\text{post}}} f(r_{\text{pre}}, r_{\text{post}}) \right)^2. \qquad (3.16)$$

Thus, we can conclude that each SDP rule applies a weight change with variance at least as large as the variance computed above.

Optimal VDP rule. Let $\operatorname{Var}_w^{\text{voltage}}$ be the variance of the weight change induced by the postsynaptic variability of an optimal VDP rule. Analogously to the derivation of Equation (3.15), the optimal VDP realization is given by

$$\hat{\Delta w}^{\text{voltage}} = f\big(\hat r^{\text{spike}}_{\text{pre}}, \hat r^{\text{voltage}}_{\text{post}}\big), \qquad (3.17)$$

according to the invariance principle, and using the delta method together with Equation (3.13) we conclude

$$\operatorname{Var}_w^{\text{voltage}} = \frac{\sigma^2 \varepsilon}{2T} \cdot \left( \frac{\mathrm{d}}{\mathrm{d}\sigma} r(\sigma) \right)^2 \cdot \left( \frac{\partial}{\partial r_{\text{post}}} f(r_{\text{pre}}, r_{\text{post}}) \right)^2. \qquad (3.18)$$

Time scale of stimuli. Combining Equations (3.16) and (3.18) immediately shows that, for fixed variance, the relative improvement factor of the required stimulus duration for learning is given by Equation (3.14). This factor determines how much longer a stimulus needs to be stationary in the case of SDP compared to VDP to achieve the same accuracy in the desired weight change. In particular, this relative improvement factor is independent of the plasticity rule $f$ and thus allows us to conclude a general advantage of VDP over SDP regarding the time scale on which stimuli must be stationary.

3.2.4 Selectivity with the BCM rule

The BCM theory (Bienenstock et al., 1982) is one of the most influential rate based learning theories (see (Cooper & Bear, 2012) for a review). The BCM rule maximizes selectivity and can reproduce the formation of receptive fields in the visual cortex (Cooper & Bear, 2012). In this section we derive optimal SDP and VDP realizations of the BCM rule and define the computational task of selectivity. This task will later serve as an example of how to transform a rate based computational task into a fast spiking model.

BCM rule. The BCM rule defines the change in synaptic weight as $f(r_{\text{pre}}, r_{\text{post}}) = r_{\text{pre}} \cdot \varphi(r_{\text{post}}, \bar r_{\text{post}})$, with nonlinear function $\varphi$ and postsynaptic reference rate $\bar r_{\text{post}}$. The function $\varphi$ displays long-term

depression (LTD) for low postsynaptic rate and long-term potentiation (LTP) for high postsynaptic rate, see Figure 32 (a). Further, $\bar r_{\text{post}}$ determines a sliding threshold between LTD and LTP, which depends nonlinearly on $r_{\text{post}}$ on a slower time scale, and increases (decreases) if $r_{\text{post}}$ has been large (small) for some time.

Optimal SDP and VDP realizations of the BCM rule. Let us formally define the BCM rule, with a particular choice of $\varphi$ and sliding threshold, following (Intrator & Cooper, 1992), and its optimal SDP and VDP realizations. The weight change in a short time interval of length $T$ during which the rates are assumed to be constant is

$$\Delta w = \eta \cdot r_{\text{pre}} \cdot \big(r_{\text{post}}^2 - \bar r_{\text{post}} \cdot r_{\text{post}}\big), \qquad (3.19)$$

where $\eta > 0$ is the step size. The change of the sliding threshold $\bar r_{\text{post}}$ is defined by

$$\Delta \bar r_{\text{post}} = \frac{r_{\text{post}}^2 - \bar r_{\text{post}}}{\tau_{\text{BCM}}}, \qquad (3.20)$$

with time constant $\tau_{\text{BCM}}$. We now introduce optimal realizations, which achieve minimum variance among SDP (Izhikevich & Desai, 2003; Pfister & Gerstner, 2005) and VDP (Toyoizumi et al., 2005; Mayr & Partzsch, 2010; Clopath & Gerstner, 2010; Clopath et al., 2010) realizations of the BCM rule. Since the BCM rule is linear in the presynaptic rate, both SDP and VDP realizations simply perform a weight update for each presynaptic spike. Assume that in a short time interval of length $T$, the presynaptic and postsynaptic cells spike with constant rate. According to Equation (3.15) and Equation (3.4), the optimal weight update of a


SDP rule is then given by

$$\eta \cdot \left( \left( \frac{n_{\text{post}}}{T} \right)^2 - \bar r_{\text{post}} \cdot \frac{n_{\text{post}}}{T} \right), \qquad (3.21)$$

where $n_{\text{post}}$ is the number of postsynaptic spikes in the interval. According to Equation (3.17) and Equation (3.10), the optimal weight update of a VDP rule is given by

$$\eta \cdot \big( r(\hat\sigma)^2 - \bar r_{\text{post}} \cdot r(\hat\sigma) \big), \qquad (3.22)$$

with $\hat\sigma$ as in Equation (3.11). The implementation of the sliding threshold in Equation (3.20) is analogous. The resulting VDP realization of the BCM rule relates fluctuations in the membrane potential to a synaptic weight change. This is in contrast to previous VDP rules, which have been shown to realize BCM under the assumption of Poisson spike trains, since they rely on low-pass filtered versions of the membrane potential and thus cannot exploit the information in the fluctuations (Toyoizumi et al., 2005; Mayr & Partzsch, 2010; Clopath & Gerstner, 2010; Clopath et al., 2010).
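The two BCM updates (3.21) and (3.22) can be written out directly. The sketch below is illustrative: the rate function $r(\cdot)$ is passed in as a callable (e.g. Siegert's formula, or a hypothetical stand-in), and `decay` is the precomputed factor $\exp(-\varepsilon/\tau)$ from the sampling setup of Section 3.2.2.

```python
import numpy as np

def sdp_bcm_update(n_post, T, r_bar, eta):
    """Optimal spike-dependent BCM update, Equation (3.21):
    eta * ((n_post / T)^2 - r_bar * (n_post / T))."""
    r_hat = n_post / T                 # spike-count rate estimate, Eq. (3.4)
    return eta * (r_hat**2 - r_bar * r_hat)

def vdp_bcm_update(u, decay, r_of_sigma, r_bar, eta):
    """Optimal voltage-dependent BCM update, Equation (3.22):
    eta * (r(sigma_hat)^2 - r_bar * r(sigma_hat)), with sigma_hat computed
    from the voltage samples u via Equation (3.11). `r_of_sigma` maps sigma
    to a rate and is supplied by the caller (an assumption of this sketch)."""
    incr = u[1:] - u[:-1] * decay
    sigma_hat = np.sqrt(2.0 * np.sum(incr**2) / (len(incr) * (1.0 - decay**2)))
    r_hat = r_of_sigma(sigma_hat)      # voltage-based rate estimate, Eq. (3.10)
    return eta * (r_hat**2 - r_bar * r_hat)
```

Both updates share the BCM sign structure: an estimated postsynaptic rate above the reference rate produces LTP, a rate below it produces LTD, and a silent neuron produces no change.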

Selectivity. The task of selectivity is that a neuron becomes selective to one particular stimulus out of a set of stimuli. Here we formulate this task (based on the simulation paradigm of (Clothiaux, Bear, & Cooper, 1991)) in a spiking model. We consider a feed forward network with $N$ excitatory input neurons and one output neuron, see Figure 32 (b) and (f). A stimulus is described by $\nu = (\nu_1, \dots, \nu_N)^\top$, where the $k$-th component corresponds to the rate of the $k$-th input neuron. Moreover, we denote the vector of synaptic weights by $w = (w_1, \dots, w_N)^\top$, hence the weight of the

synapse connecting the $k$-th excitatory neuron with the output neuron is $w_k$. To model balanced excitation and inhibition, each excitatory input neuron is accompanied by an inhibitory neuron and the respective weights are mirrored. It has been shown in (Vogels et al., 2011) how this mirroring is achieved by inhibitory plasticity in an experience dependent manner. Thus, the expected input to the output neuron is zero and the variance of the input is $2 \sum_{k=1}^{N} w_k^2 \nu_k$. Consequently, the rate of the output neuron is $r(\sigma)$ with $\sigma := \sqrt{2 \tau \sum_{k=1}^{N} w_k^2 \nu_k}$, see Section 3.2.1. We stress that $\sigma$ is a function of the stimulus and the weights, but do not indicate this in the notation for simplicity. We denote by $\{\nu^{(j)}\}_{j=1}^{m}$ a set of $m$ stimuli and define $\sigma^{(j)} := \sqrt{2 \tau \sum_{k=1}^{N} w_k^2 \nu_k^{(j)}}$. Moreover, let $p_j$ be the probability of the $j$-th stimulus (the $p_j$'s form a probability distribution $P$ over stimuli). The selectivity of the output neuron for a given weight vector $w$ is defined as

$$\operatorname{Sel}(w) = 1 - \frac{\mathbb{E}_P[r(\sigma)]}{\max_j r(\sigma^{(j)})}. \qquad (3.23)$$

Further, we round down responses below 1 Hz to 0 Hz and apply the convention that $0/0 = 1$ to avoid trivial selectivity. Note that if all stimuli result in the same postsynaptic response, then the selectivity is 0; however, if the response is nonzero for exactly one stimulus and zero for all others, the selectivity is at its maximum $1 - 1/m$.
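The selectivity measure (3.23), including the rounding and the $0/0 = 1$ convention, translates directly into code. A minimal sketch:

```python
import numpy as np

def selectivity(rates, probs):
    """Selectivity of Equation (3.23): 1 - E_P[r] / max_j r^{(j)},
    with responses below 1 Hz rounded down to 0 Hz and the convention
    0/0 = 1 (so all-zero responses yield selectivity 0)."""
    r = np.where(np.asarray(rates, dtype=float) < 1.0, 0.0, rates)
    p = np.asarray(probs, dtype=float)
    m = r.max()
    if m == 0.0:                     # all responses zero: 0/0 := 1, Sel = 0
        return 0.0
    return 1.0 - float(p @ r) / m
```

For two equiprobable stimuli, identical responses give selectivity 0, while a response to exactly one stimulus (the other below 1 Hz) gives the maximum $1 - 1/m = 0.5$.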

Simulation protocol. The BCM rule makes the weights converge to a maximally selective fixed point. The simulation protocol is as follows: first, fix the duration of stimulus presentation T. Thereafter, in each round pick a stimulus ν(j) according to the probability distribution P and simulate the output neuron for time T with the corresponding


input. From this derive $n_{\text{post}}$, the number of spikes of the neuron, and $u_1, \dots, u_n$, the samples of the membrane potential. Then update the weight of the $k$-th synapse according to Equation (3.21), for the SDP rule, or Equation (3.22), for the VDP rule, multiplied by $\nu_k^{(j)}$. After each round, the sliding threshold is updated according to Equation (3.20), using the respective estimators of the postsynaptic rate.
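The structure of this protocol can be sketched as a loop. The sketch below is a toy stand-in, not the thesis simulation: it substitutes a hypothetical linear rate function for Siegert's formula, draws postsynaptic spike counts directly from the resulting Poisson rate, uses only the SDP update (3.21), and clips the weights to a bounded range for numerical safety; all parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

def rate(w, nu, tau=0.02):
    # sigma as defined in Section 3.2.4; the linear map to a rate is a
    # hypothetical stand-in for r(sigma) from Siegert's formula.
    sigma = np.sqrt(2.0 * tau * np.sum(w**2 * nu))
    return 1000.0 * sigma

stimuli = np.array([[10.0, 0.0], [0.0, 10.0]])   # two orthogonal stimuli (Hz)
probs = np.array([0.5, 0.5])                     # distribution P over stimuli
w = np.array([0.8e-3, 0.8e-3])                   # initial weights (V)
r_bar, eta, tau_bcm, T = 1.0, 1e-9, 1000.0, 0.010

for _ in range(500):                             # rounds of the protocol
    j = rng.choice(len(stimuli), p=probs)        # pick a stimulus from P
    nu = stimuli[j]
    n_post = rng.poisson(rate(w, nu) * T)        # spikes during the window
    r_hat = n_post / T                           # spike-count estimate, Eq. (3.4)
    # SDP realization of BCM (Equation (3.21)), scaled by the input rates:
    w += eta * (r_hat**2 - r_bar * r_hat) * nu
    w = np.clip(w, 0.0, 10e-3)                   # keep weights in a sane range
    r_bar += (r_hat**2 - r_bar) / tau_bcm        # sliding threshold, Eq. (3.20)
```

The loop mirrors the protocol: stimulus draw, simulation for time $T$, weight update per synapse, then the sliding-threshold update at the end of the round.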

3.3 Results

Our first result shows that the information about the stimulus extractable per unit time from the membrane potential is much higher than the information content of the spike train, see Figure 31 (a) and Section 3.2.2. This is a consequence of excitation and inhibition being balanced: in the balanced setting, the membrane potential changes due to a large number of input spikes, while the neuron produces only few output spikes (Shadlen & Newsome, 1998), see also Section 3.4.3. The Fisher information about the rate, which encodes stimuli, obtained from neuronal spiking is proportional to the duration of observation, see Equation (3.5). Thus, if the stimulus is present only for a short time, then only a limited amount of information about it is available to a synaptic plasticity mechanism depending solely on spiking. The Fisher information obtained from the membrane potential is not only proportional to the duration of observation, but also to the sampling rate, see Equation (3.13). Hence, if the stimulus is present only for a short time, the amount of information extractable from the membrane potential can exceed the limit of the spike based case. We illustrate this with a concrete example: consider a neuron firing at 10 Hz. How long does it take to obtain an estimate of

the rate that is within 5 Hz accuracy with 70 % confidence? Based on neuronal spiking this takes at least 500 ms; however, sampling the membrane potential at a sampling rate of 1 kHz requires only 10 ms, see Figure 31 (b), grey line. Therefore, in the latter case it is fine if the stimulus changes every 10 ms, which is the order on which the instantaneous rate changes in vivo (Shadlen & Newsome, 1998). Thus, it is possible to extract information about the instantaneous rate from voltage traces, in contrast to spike trains of the same duration. The relative time improvement of voltage based estimation over spike based estimation, given in Equation (3.14), is at least one order of magnitude for typical neuronal parameters, see Figure 31 (c). Our second result directly relates the previous observation to synaptic plasticity. A rate based plasticity rule defines the synaptic weight change as a function of the presynaptic and postsynaptic firing rates. We derive optimal SDP and VDP realizations of any rate based rule in Section 3.2.3. For optimal SDP rules, the variance of the applied weight change scales as the inverse of the Fisher information about the rate obtained from neuronal spiking, see Equation (3.16), whereas for optimal VDP rules the variance scales as the inverse of the Fisher information obtained from the membrane potential, see Equation (3.18). In particular, if a stimulus is stationary only on a short time scale, the SDP rule applies a weight change that can be far from the desired weight change. In contrast, a VDP rule can still be highly accurate, see Figure 31 (d). This "speed" improvement of VDP rules over SDP rules is determined in Section 3.2.2 and is equal to the improvement of information retrieval, see Equation (3.14). Hence, the improvement factor is independent of the specific learning rule at hand. Thus, this constitutes a general improvement in learning speed for VDP over SDP and highlights that VDP can operate on the timescale on which

the instantaneous rate changes in vivo. Finally, we illustrate the previous considerations on the classic learning task of selectivity, see Section 3.2.4. Given a collection of stimuli in the form of rate profiles of input neurons (e.g. representing the activity of the lateral geniculate nucleus (LGN) induced by an angular bar sweep), the task is to make the output neuron selective for one particular stimulus: the output neuron should respond strongly to one stimulus and remain quiet for any other stimulus. The network and the stimuli are depicted schematically in Figure 32 (b) and (f). This task is solved by the BCM learning rule acting on the synapses (Bienenstock et al., 1982). First, we study two orthogonal stimuli presented to the network. In each stimulus one input neuron spikes with a certain rate while the other input neuron is quiet, see Figure 32 (b). For orthogonal stimuli the BCM rule guarantees that the weight vector converges to a maximally selective fixed point, if the stimuli are presented randomly, round by round, for a certain duration. We choose the same stimulus duration for both the VDP and the SDP rule (10 ms), which reflects the time scale on which the instantaneous rate changes in vivo (Shadlen & Newsome, 1998). For the optimal VDP realization of the BCM rule, given in Equation (3.22), the weights converge to a maximally selective fixed point, see Figure 32 (c). For the same stimulus duration, the variance of the weight changes induced by the optimal SDP realization of BCM, given in Equation (3.21), is much larger, see Figure 32 (d). This variability results in bad performance because the weights leave the maximally selective fixed point, causing instability, see Figure 32 (e). Therefore, to bound the variance of the weight change and thus guarantee stability, the stimuli must be available significantly longer for the SDP rule than for the VDP rule.


We next investigate the performance for more realistic stimuli (Clothiaux et al., 1991). Here, each stimulus has a Gaussian profile with certain peak and base rates and standard deviation, see Figure 32 (f). With such stimuli, convergence is not guaranteed, and the maximal selectivity decreases with increasing base/peak rate ratio and standard deviation of the Gaussian profile. Since the weight vector does not converge, but BCM only increases the selectivity, the variance of the weight changes induced by learning determines how selective the neuron can be. We now choose different stimulus durations for the VDP rule (10 ms) and the SDP rule (500 ms). With the significantly longer stimulus duration for the SDP rule, both realizations of BCM yield similar performance, shown as a function of the number of stimulus presentations in Figure 32 (g). This implies that the total exposure time of the neuron to stimuli is at least an order of magnitude smaller for the VDP rule compared to the SDP rule, see Figure 32 (h), where the selectivity is shown as a function of total exposure time on a log scale.


Figure 31: Required stimulus duration of SDP and VDP rules. (a) Obtaining information about the rate from spikes (blue) and voltage (green). The amount of information is quantified as the inverse variance of the optimal rate estimate (Fisher information). (b) Standard deviation (SD) of the rate estimate based on spikes (blue) and voltage (green) as a function of stimulus duration. The horizontal grey line indicates that for a fixed information level the required duration differs by an order of magnitude. Dashed lines correspond to Equations (3.5) and (3.13), solid lines are respective simulations (empirical SD of estimates according to Equations (3.4) and (3.10) of a simulated neuron). (c) Factor of time improvement for information extraction as a function of sampling rate according to Equation (3.14), for different firing rates: 10 Hz (solid), 20 Hz (dotted), 40 Hz (dashed). (d) Weight change as a function of stimulus duration. The grey horizontal line indicates the desired weight change, shaded areas show one SD of the weight change applied by the optimal SDP rule (blue) and VDP rule (green) according to Equations (3.16) and (3.18). (Continued on the following page.)


Figure 31 (continued): Parameters (if not varied in the respective plot) are $r = 10$ Hz, $\vartheta = -55$ mV, $u_r = -70$ mV, $\tau = 0.02$ s, $1/\varepsilon = 1$ kHz, 100 trials.


Figure 32: Fast selectivity with BCM and natural stimuli. (a) The BCM learning rule; weight change as a function of the postsynaptic rate. (b) Task with orthogonal stimuli (dashed gray and gray) and two input neurons; for orthogonal stimuli the weights converge to a maximally selective fixed point (rate based analysis). (c) Evolution of the two weights (light and dark green) from (b) over time for the optimal VDP realization of BCM. (d) Evolution of the weights (light and dark blue) from (b) over time for the optimal SDP realization of BCM. (e) Respective selectivity of the weights in (c) and (d) over time; while the VDP rule (green) converges, the SDP rule (blue) jumps out of the maximally selective fixed point. Parameters in (c), (d) and (e) are $N = 2$, $m = 2$, peak rate 10 Hz, initial weights 0.8 mV, $T = 10$ ms, $\vartheta = -55$ mV, $u_r = -70$ mV, $\tau = 0.02$ s, $1/\varepsilon = 1$ kHz, $\eta = 10^{-6}$, $\tau_{\text{BCM}} = 1{,}000$, $\theta = 1$ Hz. (f) Task with non-orthogonal stimuli (Gaussian rate profiles). For such stimuli, BCM still increases the selectivity (rate based simulation). (g) Selectivity as a function of the number of stimulus presentations; duration of individual stimuli is 500 ms for the SDP rule (blue) and 10 ms for the VDP rule. (Continued on the following page.)


Figure 32 (continued): (h) Selectivity as a function of time (log scale). Duration of stimuli is chosen such that the variances of the weight change for the SDP rule (blue) and VDP rule (green) match and are small, to allow close to optimal selectivity, see (g). Parameters in (g) and (h) are $N = 100$, $m = 10$, peak rate 10 Hz, base rate 2 Hz, standard deviation of Gaussian rate profile 10, initial weights 0.1 mV, $\vartheta = -55$ mV, $u_r = -70$ mV, $\tau = 0.02$ s, $1/\varepsilon = 1$ kHz, $\eta = 10^{-6}$, $\tau_{\text{BCM}} = 1{,}000$, $\theta = 1$ Hz.


3.4 Discussion

In this section, we discuss the assumptions of our model and speculate on the connection between voltage fluctuations and plasticity. Further, we summarize related work on the balance of excitation and inhibition and show why it is crucial to our model. Finally, we conclude with some predictions that our model makes.

3.4.1 Neuron model

We use Stein's model and its diffusion approximation as an abstraction for cortical in vivo neuronal dynamics, following the influential paper by Michael N. Shadlen and William T. Newsome (Shadlen & Newsome, 1998). Their work, which contains a detailed discussion of the biological justifications (La Camera, Giugliano, Senn, & Fusi, 2008) and limitations of the model, points out a crucial property implied by excitation and inhibition being balanced: the neuron produces a highly variable postsynaptic spike train, which is essentially independent of the spike timing of the presynaptic neurons. This is consistent with experimental observations (Softky & Koch, 1993; Cohen & Maunsell, 2009). We approximate Stein's model by its diffusion approximation. The diffusion approximation is justified by the large number of postsynaptic potentials (PSPs) arriving at cortical neurons, a phenomenon known as the high input or synaptic bombardment regime. Rough estimates (100–1,000 neurons out of 1,000–10,000 input neurons spike with a rate of 10 Hz) yield a spike arrival rate on the order of 1 kHz–10 kHz (Shadlen & Newsome, 1998). In our approach we estimate the variance of the diffusion approximation by sampling the membrane potential. It is clear that the sampling rate cannot be higher than the arrival rate because otherwise the approximation would be invalid (the

difference between two samples is assumed to be Gaussian, but if the sampling rate is too high, then this assumption does not hold). Hence, the arrival rate determines a natural upper bound for a reasonable sampling rate. Notably, taking the diffusion approximation is not necessary to recover our results qualitatively: in Stein's model (allowing for fewer and stronger synaptic inputs) the information about the rate contained in the membrane potential trajectory typically exceeds the information in spike trains as long as the arrival rate is significantly higher than the output rate.
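The high input regime behind the diffusion approximation can be illustrated with a direct Euler simulation of Stein's model, Equation (3.1), under balanced Poisson input. This is an illustrative sketch with assumed parameters (1,000 synapses, ±0.1 mV PSPs, 10 Hz input each, i.e. a 10 kHz total arrival rate); threshold and reset are omitted to show the free membrane trajectory.

```python
import numpy as np

rng = np.random.default_rng(3)
tau, dt, T = 0.02, 1e-4, 0.2                   # membrane time constant, step, duration (s)
N = 1000                                       # synapses: half excitatory, half inhibitory
w = np.concatenate([np.full(N // 2, 0.1e-3),   # +0.1 mV EPSPs
                    np.full(N // 2, -0.1e-3)]) # -0.1 mV IPSPs: zero-mean input (balance)
nu = np.full(N, 10.0)                          # 10 Hz Poisson input per synapse

steps = int(T / dt)
u = np.zeros(steps)                            # membrane potential relative to rest
for i in range(1, steps):
    # Poisson spike arrivals per synapse in this time step (Stein's model, Eq. (3.1));
    # each arrival deflects the membrane potential by the synaptic weight.
    spikes = rng.poisson(nu * dt)
    u[i] = u[i - 1] * (1.0 - dt / tau) + np.sum(w * spikes)
```

In the diffusion limit this trajectory approaches the OUP of Equation (3.2) with $\sigma = \sqrt{\tau \sum_k w_k^2 \nu_k} \approx 1.4$ mV here, so the empirical mean stays near zero while the fluctuations carry the input statistics.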

3.4.2 Biophysical connection of voltage fluctuations and plasticity

Experiments revealed that synaptic plasticity depends on the presynaptic and postsynaptic rates (Brown et al., 1988; Dudek & Bear, 1992; Bliss et al., 1993; Sjöström et al., 2001), the exact time difference of presynaptic and postsynaptic spikes (Bi & Poo, 1998; Markram, Lübke, Frotscher, & Sakmann, 1997; Froemke & Dan, 2002; Wang et al., 2005), the postsynaptic membrane potential (Artola et al., 1990; Ngezahayo et al., 2000; Sjöström et al., 2004), and ultimately the calcium concentration in the postsynaptic dendritic spine as a consequence of voltage-dependent calcium and NMDA receptor channel activation (Mulkey & Malenka, 1992; Cummings, Mulkey, Nicoll, & Malenka, 1996). Modest calcium levels cause LTD whereas high levels result in LTP (Shouval, Castellani, Blais, Yeung, & Cooper, 2002; Graupner & Brunel, 2012). Hence, via voltage-gated calcium and NMDAR channels, plasticity is inherently voltage dependent. Thus, the magnitude of voltage fluctuations may translate to different levels of calcium concentration. In particular, as the calcium influx depends nonlinearly on the voltage due to channel activation thresholds, larger

voltage fluctuations might lead to a higher calcium concentration even if the mean voltage stays unchanged. This establishes a possible link between voltage fluctuations and plasticity. However, it is not clear if there exists a mechanism that implements the estimator in Equation (3.11) and thereby exploits a high sampling rate to estimate the voltage fluctuations precisely. In our neuron model the somatic membrane potential is a local observable at the postsynaptic part of the synapse. However, this is a strong assumption, which is only legitimate for synapses close to the soma. For synapses on distant dendritic spines, the somatic membrane potential can be replaced by a local potential in the dendritic compartment, which potentially still contains more information about the postsynaptic instantaneous rate than single back propagating action potentials (BAPs) (Markram, Helm, & Sakmann, 1995). Notably, our approach requires a biophysical pathway that transports information about the fluctuations of the somatic membrane potential of the postsynaptic neuron (and thus its instantaneous rate) back along the dendrites to the postsynaptic site of a synapse, see also Section 3.4.4. There is experimental evidence that this is possible for the mean of the membrane potential (Artola et al., 1990; Ngezahayo et al., 2000; Sjöström et al., 2004); see (Lisman & Spruston, 2005) for a review of the voltage dependence of LTP and LTD. This pathway does not need to be fast, and the signalling mechanism does not necessarily need to be a voltage signal.

3.4.3 Balanced excitation and inhibition

It has been observed that excitatory and inhibitory synaptic inputs to cortical neurons exhibit strong temporal and quantitative relations, a

phenomenon termed balanced excitation and inhibition, see (Okun & Lampl, 2009) for a review. One distinguishes two types of balance: (1) loose balance, where a large number of uncorrelated small excitatory and inhibitory synaptic inputs cancel each other out on average, and (2) tight balance, where inhibition closely tracks excitation with a very short time lag, see (Denève & Machens, 2016). Loose balance was postulated to explain the high degree of variability in neuronal responses to natural stimuli (Softky & Koch, 1993; Shadlen & Newsome, 1994, 1998). This led to a widely accepted class of network models (balanced networks) that display asynchronous irregular spiking dynamics (van Vreeswijk & Sompolinsky, 1996; Brunel, 2000; Renart et al., 2010), resembling the activity in many cortical areas. Tight balance has been suggested to be a signature of highly efficient coding, see (Denève & Machens, 2016) and the references therein; however, it is not consistent with trial-to-trial variability of neuronal responses and asynchronous irregular firing (Okun & Lampl, 2008).

We model loose balance that is maintained over time and stimuli (termed detailed balance in (Vogels et al., 2011)). Hence, the mean of the membrane potential µ is constant over time and stimuli. Therefore, stimuli are encoded in the fluctuations of the membrane potential, rather than its mean, see Equation (3.3). This implies that the instantaneous rate can be decoded from the membrane potential quickly, depending on the sampling rate, see Equation (3.13). Without balance, we would have to write the rate as r(µ, σ) and the variance of an optimal rate estimator based on the membrane potential becomes

$$\operatorname{Var}^{\text{voltage}}[\hat r] = \frac{\sigma^2 \varepsilon}{2T} \left( \frac{\partial}{\partial \sigma} r(\mu, \sigma) \right)^2 + \frac{\sigma^2 \tau}{T} \left( \frac{\partial}{\partial \mu} r(\mu, \sigma) \right)^2, \qquad (3.24)$$
derived along the lines of Equation (3.13). In this case the information

about the rate in the membrane potential cannot simply be increased by a higher sampling rate, since the variance of the mean potential only decreases with the observation time T and the membrane time constant τ, as indicated by the second term of Equation (3.24). It turns out that the time scale of information extraction is thus of the same order as in the spike based case. This reveals that one functional advantage of the loosely balanced state is efficient encoding of the instantaneous rate in the membrane potential.

3.4.4 Predictions of our model

The main prediction of our model is how synaptic plasticity depends on the variance of the postsynaptic membrane potential, assuming a specific rate based learning rule, for example the BCM rule. So far, voltage dependence has only been studied with fixed postsynaptic (super-threshold) depolarization, inconsistent with in vivo conditions, without controlling the variance of the depolarization (Artola et al., 1990; Ngezahayo et al., 2000; Sjöström et al., 2004). This revealed the existence of a voltage threshold for LTD induction and a higher voltage threshold for LTP induction (Artola et al., 1990). It would be interesting to study high variance depolarization because in this way both thresholds are reached and it is not clear how, or if, the LTP and LTD components are combined. Concretely, the rate of the postsynaptic neuron can be controlled in two ways by current injection: (1) by injecting a current with large mean and zero variance, or (2) by injecting a current with small mean and large variance (La Camera et al., 2008). Our hypothesis is that the effect on the synaptic efficacy only depends on the rate of the neuron, not on how it is induced. If this does not hold true, it would give an argument against rate based plasticity models.


Furthermore, our model predicts that, as a consequence of loosely balanced excitation and inhibition, the instantaneous rate can be well estimated from voltage recordings. To test this hypothesis one can compute the instantaneous rate of a neuron in vivo using two protocols. The classic protocol is via construction of the PSTH from many spike train recordings. Our proposed protocol is to estimate it via Equation (3.10) from a single or few voltage recordings. We hypothesize that, to reach a given accuracy, the number of required voltage recordings is much smaller than the number of spike train recordings. As a consequence, the number of required repetitions of the experiment to compute the instantaneous rate can be reduced. Interestingly, the instantaneous rate could then also be computed in scenarios where the experiment cannot be repeated at all because the stimulus is unknown.
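The voltage-based protocol can be sketched as follows. This is a minimal illustration, not the estimator of Equation (3.10): we assume a toy setting in which the membrane potential fluctuates around a fixed mean (loose balance) and the rate is a known monotone function of the fluctuation size σ; all names and parameter values below are hypothetical.

```python
import random
import statistics

random.seed(0)

MU = -60.0        # mean membrane potential (mV), constant under loose balance
SIGMA_TRUE = 4.0  # stimulus-encoded fluctuation size (mV)

def rate_from_sigma(sigma):
    """Assumed monotone rate function r(sigma); a placeholder, not Eq. (3.3)."""
    return 2.0 * sigma  # Hz, illustrative only

def sample_voltage(n_samples, sigma):
    """Draw voltage samples around the fixed mean (toy fluctuation model)."""
    return [random.gauss(MU, sigma) for _ in range(n_samples)]

def estimate_rate_from_voltage(trace):
    """Plug the sample standard deviation into the assumed rate function."""
    return rate_from_sigma(statistics.stdev(trace))

# A single densely sampled voltage trace already yields a precise estimate,
# whereas a PSTH would require many repeated spike train recordings.
trace = sample_voltage(10_000, SIGMA_TRUE)
r_hat = estimate_rate_from_voltage(trace)
r_true = rate_from_sigma(SIGMA_TRUE)
print(f"true rate {r_true:.2f} Hz, voltage-based estimate {r_hat:.2f} Hz")
```

The point of the sketch is only that the precision of the fluctuation-based estimate improves with the number of voltage samples per trace, not per trial.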


4 Mutual inhibition with few inhibitory cells

The results in this chapter were obtained in joint work with Marcelo Matheus Gauy, Xun Zou and Angelika Steger.

4.1 Introduction

Dale’s law states that neurons perform the same chemical action at all their synaptic connections to target cells, regardless of the identity of the target (Dale, 1935). In particular, neurons can be subdivided into two broad classes: neurons that excite other neurons and neurons that inhibit other neurons.

Many computational models of neural networks violate Dale’s law: in such networks, neurons can simultaneously excite and inhibit other neurons depending on the weight of their synaptic connections, see for example (Rosenblatt, 1958; Hopfield, 1982; Földiák, 1990). The traditional way to transform such networks into networks obeying Dale’s law is to introduce a population of excitatory neurons and accompany each excitatory neuron with an inhibitory one. Then, positive connections in the old network are simply connections between excitatory neurons in the new network, and negative connections in the old network are implemented via the respective inhibitory neuron (Barlow & Földiák, 1989). However, this construction has a fundamental flaw: the number of inhibitory neurons is as large as the number of excitatory neurons, whereas in the brain the number of inhibitory neurons is much smaller than the number of excitatory neurons (Isaacson &


Scanziani, 2011).

In this chapter we propose a novel way to embed networks not obeying Dale’s law in networks that do, and which have much fewer inhibitory neurons than excitatory ones. Our networks exploit the nonlinear interaction of inhibitory synapses on dendritic compartments of excitatory neurons, reminiscent of local dendritic spikes implementing a logical AND function by coincidence detection (London & Häusser, 2005; Stuart & Spruston, 2015). In this way it is possible to have all-to-all inhibitory connections between $n$ excitatory neurons with only $\Theta(k^2 \cdot \log n)$ inhibitory neurons, where $k$ is an upper bound on the number of excitatory neurons that are active simultaneously. Traditional networks require $\Theta(n)$ inhibitory neurons, as $\Theta(n^2)$ inhibitory synaptic weights have to be represented in the network. Interestingly, we show that no specific hard-coded wiring is necessary; instead, random connectivity between excitatory and inhibitory neurons achieves the desired properties. This is shown using the combinatorial concept of cover-free set families (Füredi, 1996; Kautz & Singleton, 1964). For concreteness, we test the embedding on a neural network classifier for MNIST and discuss the construction in light of a network model for sparse coding in Dentate Gyrus introduced in (Földiák, 1990) and related models of dendritic computation (Poirazi & Mel, 2001).

4.2 Model

We first describe a generic network model which does not respect Dale’s law, and refer to it as non-Dale network in the sequel. Consider a network of $N$ binary threshold neurons (McCulloch & Pitts, 1943). The weight of the synapse projecting from the $i$-th neuron to the $j$-th neuron is denoted $w_{ij} \in \mathbb{R}$. Thus, the state $x_j \in \{0, 1\}$ of the $j$-th

neuron is defined as

$$x_j = H\left(-\theta + \sum_{i=1}^{N} w_{ij} \cdot x_i\right), \qquad (4.1)$$
where $H$ denotes the Heaviside step function and $\theta \in \mathbb{R}^+$ is the threshold. We call the sum in (4.1) the activation of the $j$-th neuron and say that the $j$-th neuron is active if $x_j = 1$ and inactive otherwise.

We now propose a network model that respects Dale’s law and includes nonlinear interaction of inhibitory synapses on the dendritic tree of excitatory neurons. We refer to it as Dale network. Consider a network of $N^E$ excitatory and $N^I$ inhibitory binary threshold neurons. The state of the $i$-th excitatory (inhibitory) neuron is $x^E_i \in \{0, 1\}$ ($x^I_i \in \{0, 1\}$). The weight of the synapse projecting from the $i$-th excitatory neuron to the $j$-th excitatory (inhibitory) neuron is denoted $w^{EE}_{ij} \in \mathbb{R}^+_0$ ($w^{EI}_{ij} \in \mathbb{R}^+_0$). We model the nonlinear interaction of inhibitory synapses on dendritic compartments of excitatory neurons as follows. Each excitatory neuron has several dendritic compartments on which a set of inhibitory synapses is located spatially close. We index these compartments with the set of inhibitory neurons that project to it and denote the possible sets of inhibitory synapses that may interact with each other by $\mathcal{C} \subseteq \mathcal{P}([N^I])$. Thus, the state of the dendritic compartment indexed by the set $C \in \mathcal{C}$ on the $j$-th excitatory neuron is denoted $y_{Cj} \in \{0, 1\}$. Further, a dendritic compartment is active if all inhibitory neurons projecting to it are active, this is

$$y_{Cj} = \prod_{i \in C} x^I_i. \qquad (4.2)$$
Hence, the dendritic compartment computes the logical AND function among the inhibitory synapses projecting to it, reminiscent of

coincidence detection via dendritic spikes (London & Häusser, 2005). The impact of the $C$-th dendritic compartment of the $j$-th excitatory neuron on the activation of the $j$-th excitatory neuron is $w^{IE}_{Cj} \in \mathbb{R}^-_0$, which may correspond to the distance of the clusters to the soma or the intensity of the dendritic spike (Poirazi, Brannon, & Mel, 2003). Therefore, the state of the $j$-th excitatory neuron is

$$x^E_j = H\left(-\theta + \sum_{i=1}^{N^E} w^{EE}_{ij} \cdot x^E_i + \sum_{C \in \mathcal{C}} w^{IE}_{Cj} \cdot y_{Cj}\right). \qquad (4.3)$$
Finally, the state $x^I_j$ of the $j$-th inhibitory neuron is defined as

$$x^I_j = H\left(-\theta + \sum_{i=1}^{N^E} w^{EI}_{ij} \cdot x^E_i\right). \qquad (4.4)$$

Concluding, we remark that our proposed network model is a generalization of traditional networks with single compartment neurons, which can be seen by choosing $\mathcal{C} = \{\{i\} \mid i \in [N^I]\}$. A schematic illustration of an excitatory neuron in our network model is shown in Figure 4.1.
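The two model definitions, Equations (4.1)-(4.4), can be sketched in code. This is a minimal illustration with hypothetical sizes and weights; inhibitory activity is resolved within one synchronous update, and a compartment is represented as a frozenset of inhibitory neuron indices:

```python
def heaviside(a):
    """H(a) = 1 for a >= 0, else 0."""
    return 1 if a >= 0 else 0

def non_dale_step(x, w, theta):
    """Eq. (4.1): x_j = H(-theta + sum_i w[i][j] * x[i])."""
    n = len(x)
    return [heaviside(-theta + sum(w[i][j] * x[i] for i in range(n)))
            for j in range(n)]

def dale_step(xE, wEE, wEI, wIE, theta):
    """One synchronous update of the Dale network, Eqs. (4.2)-(4.4).

    wIE maps (C, j) -> weight <= 0, where C is a frozenset of inhibitory
    neuron indices forming a dendritic compartment on excitatory neuron j.
    """
    nE, nI = len(xE), len(wEI[0])
    # Eq. (4.4): inhibitory neurons are driven by the excitatory population.
    xI = [heaviside(-theta + sum(wEI[i][j] * xE[i] for i in range(nE)))
          for j in range(nI)]
    # Eq. (4.2): a compartment is active iff ALL inhibitory neurons
    # projecting to it are active (logical AND / coincidence detection).
    y = {(C, j): int(all(xI[i] for i in C)) for (C, j) in wIE}
    # Eq. (4.3): linear excitatory summation plus compartment contributions.
    return [heaviside(-theta
                      + sum(wEE[i][j] * xE[i] for i in range(nE))
                      + sum(v * y[(C, jj)]
                            for (C, jj), v in wIE.items() if jj == j))
            for j in range(nE)]

# Non-Dale toy network: neuron 0 excites neuron 1 and inhibits neuron 2,
# violating Dale's law within a single neuron.
w = [[0.0, 1.0, -1.0], [0.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
print(non_dale_step([1, 0, 1], w, theta=0.5))  # → [0, 1, 0]

# Dale toy network: excitatory neuron 0 excites neuron 1 but also drives
# both inhibitory neurons, whose joint compartment on neuron 1 vetoes it.
theta = 0.5
xE, wEE = [1, 0], [[0.0, 1.0], [0.0, 0.0]]
wEI = [[0.5, 0.5], [0.0, 0.0]]  # a single active input reaches threshold
wIE = {(frozenset({0, 1}), 1): -1.0}
print(dale_step(xE, wEE, wEI, wIE, theta))  # → [0, 0]
```

Note that with $\mathcal{C} = \{\{i\}\}$ every compartment holds a single inhibitory synapse and `dale_step` reduces to a traditional single-compartment network, matching the remark above.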


Figure 4.1: Schematic illustration of an excitatory neuron with its dendritic tree. Excitatory synapses are black, compartments with nonlinear multiplicative interaction of inhibitory synapses are purple, and the somatic compartment with linear summation is blue.


4.3 Results

Here, we show how any non-Dale network can be embedded in a Dale network. By embedding we mean that we construct a Dale network such that if in the non-Dale network the $i$-th neuron sends weight $w$ to the $j$-th neuron, then in the Dale network the $i$-th excitatory neuron sends weight $w$ to the $j$-th excitatory neuron. More precisely, sending weight $w$ means that if $x_i = 1$ then $w$ is added to the activation of the $j$-th neuron.

4.3.1 Embedding a non-Dale network in a Dale network

Consider a non-Dale network on $N$ neurons. We construct a Dale network on $N^E = N$ excitatory and $N^I$ inhibitory neurons ($N^I$ is chosen below). To embed positive weights, we simply set $w^{EE}_{ij} = w_{ij}$ for all $w_{ij} > 0$. To embed negative weights we have to make use of the inhibitory neurons. Firstly, we associate with the $i$-th excitatory neuron a subset $B_i$ of inhibitory neurons. We denote the family of all $B_i$'s by $\mathcal{B}$ (particular set families are chosen below). Now we set the weights from excitatory neurons to inhibitory neurons such that if the $i$-th excitatory neuron is active then the set $B_i$ of inhibitory neurons is active. More precisely, we set $w^{EI}_{ij} = \theta$ for $j \in B_i$. Secondly, we set $\mathcal{C} = \mathcal{B}$. Lastly, we set $w^{IE}_{B_i j} = w_{ij}$ for all $w_{ij} < 0$. The resulting network structure is depicted in Figure 4.2.
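The construction above can be written out directly. A minimal sketch, assuming a weight matrix `w` of a non-Dale network and a set family `B` of inhibitory neuron subsets (the choices of `B` are discussed in Section 4.3.4); the function name and interface are illustrative, not from the thesis:

```python
def embed(w, B, n_inhibitory, theta):
    """Return (wEE, wEI, wIE) of the Dale network embedding w.

    w: N x N weight matrix of the non-Dale network.
    B: list of sets; B[i] is the inhibitory subset of excitatory neuron i.
    """
    n = len(w)
    # Positive weights stay direct excitatory-to-excitatory connections.
    wEE = [[w[i][j] if w[i][j] > 0 else 0.0 for j in range(n)]
           for i in range(n)]
    # Excitatory neuron i drives every inhibitory neuron in B_i to threshold.
    wEI = [[theta if j in B[i] else 0.0 for j in range(n_inhibitory)]
           for i in range(n)]
    # A negative weight w_ij is carried by the compartment indexed by B_i
    # on the dendritic tree of excitatory neuron j.
    wIE = {(frozenset(B[i]), j): w[i][j]
           for i in range(n) for j in range(n) if w[i][j] < 0}
    return wEE, wEI, wIE

# Traditional choice B_i = {i}: one inhibitory neuron per excitatory one.
wEE, wEI, wIE = embed([[0.0, -1.0], [0.5, 0.0]], [{0}, {1}], 2, theta=0.5)
print(wIE)  # the negative weight moved to a compartment
```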

4.3.2 A property for $\mathcal{B}$

In order that the embedding works (i.e. all weights are sent properly), we need that the set family $\mathcal{B}$ satisfies a certain property. This property is defined as follows.


Figure 4.2: Schematic illustration of the embedding. The excitatory neuron on the left (blue) connects to its subset of inhibitory neurons (red), which are in turn connected to clusters on dendrites (purple) of multiple different excitatory target neurons (blue, right).

Definition 4.1 (r-cover-free). A family of sets $\mathcal{F}$ is $r$-cover-free if $S_0 \not\subseteq S_1 \cup \ldots \cup S_r$ holds for all distinct $S_0, S_1, \ldots, S_r \in \mathcal{F}$.

Cover-free families were introduced in the context of coding theory as binary superimposed codes in 1964 by Kautz and Singleton (Kautz & Singleton, 1964). They are studied in combinatorics (Füredi, 1996) and are also known as r-disjunct matrices in the context of group testing (Bush, Federer, Pesotan, & Raghavarao, 1984).
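Definition 4.1 can be checked by brute force on small families. An illustrative helper, not part of the thesis, and feasible only for small instances since it enumerates all size-$r$ subfamilies:

```python
from itertools import combinations

def is_cover_free(family, r):
    """True iff no set in `family` is covered by the union of r others.

    If fewer than r other sets exist, the condition holds vacuously.
    """
    sets = [set(s) for s in family]
    for i, s0 in enumerate(sets):
        others = sets[:i] + sets[i + 1:]
        for combo in combinations(others, r):
            if s0 <= set().union(*combo):
                return False
    return True

# Disjoint singletons are r-cover-free for every r (traditional embedding).
print(is_cover_free([{0}, {1}, {2}], 2))       # → True
# {0, 1} is covered by {0} ∪ {1}, so this family is not 2-cover-free.
print(is_cover_free([{0, 1}, {0}, {1}], 2))    # → False
```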

4.3.3 Embedding works if $\mathcal{B}$ is cover-free

Now we show that if $\mathcal{B}$ is $k$-cover-free and the number of coactive excitatory neurons is always at most $k$, then the embedding described in Section 4.3.1 works.


It is obvious by the construction that the positive weights are sent. Thus, it remains to show the same for negative weights. Firstly, if the $i$-th excitatory neuron is active then the corresponding set $B_i$ of inhibitory neurons is active, the dendritic compartment $y_{B_i j}$ is active and thus the weight $w^{IE}_{B_i j} = w_{ij}$ is sent to the $j$-th excitatory neuron. Secondly, if the $i$-th neuron is inactive then $w_{ij}$ is not sent to the $j$-th excitatory neuron; this follows immediately from the fact that $\mathcal{C} = \mathcal{B}$ is $k$-cover-free: assume that $w^{IE}_{B_i j} = w_{ij}$ is sent because $y_{B_i j}$ is active as the set $B_i$ of inhibitory neurons is active. Let $A$ be the set of excitatory neurons that are active. Note that $|A| \le k$. Since the $i$-th excitatory neuron is inactive it follows that
$$B_i \subseteq \bigcup_{j \in A} B_j, \qquad (4.5)$$
which is a contradiction to $\mathcal{B}$ being $k$-cover-free.

4.3.4 Explicit embeddings

So far, we described an abstract (in terms of $\mathcal{B}$) embedding of a non-Dale network in a Dale network and showed that the embedding works if the set family $\mathcal{B}$ is cover-free. Hence, it remains to consider explicit embeddings, with explicit choices of $\mathcal{B}$. In the sequel we introduce two specific choices and discuss the resulting embeddings.

Traditional embedding. For the first embedding, we choose $\mathcal{B} = \{\{i\} \mid i \in [N^E]\}$ (it is easy to see that $\mathcal{B}$ is $N^E$-cover-free as the sets are disjoint). This is the traditional embedding, where each excitatory neuron is accompanied by an inhibitory neuron that mediates the inhibitory signals emitted by the excitatory neuron. Hence, in this

embedding $N^I = N^E$ and the embedding does not make use of the nonlinear interaction of inhibitory synapses (each compartment contains a single inhibitory synapse). Note that as $\mathcal{B}$ is $N^E$-cover-free, there is no restriction on how many excitatory neurons may be active simultaneously.

Random embedding. For the second embedding we choose $\mathcal{B}$ randomly. Let $k$ be an upper bound on the number of excitatory neurons that may be active simultaneously. Further, let $N^I = \Theta(k^2 \log N^E)$. Now, pick the sets $B_i$ by randomly including each inhibitory neuron with probability $p = \Theta(1/k)$. The fact that the resulting family $\mathcal{B}$ is with high probability $k$-cover-free follows from the following lemma, a folklore result, which can be found in (Wikipedia, 2017).

Lemma 4.2 (Random construction). Let $\Omega$ be a set of size $m$. Let $\mathcal{F} = \{S_i \mid i \in [n]\}$, where each $S_i$ includes each element of $\Omega$ independently with probability $1/(4r)$. There exists a constant $c \ge 0$ such that if $m = c r^2 \log n$, then $\mathcal{F}$ is $r$-cover-free with probability at least $1 - 2n^{-1}$.

Hence, the random embedding works with high probability. Further, if $k \in o(\sqrt{N^E / \log N^E})$ holds, then the number of inhibitory neurons is much smaller than the number of excitatory neurons. Therefore, the embedding utilizes the nonlinear interaction of inhibitory synapses to save on the number of inhibitory neurons. This improvement comes at a cost. Firstly, the number of simultaneously active excitatory neurons is restricted. Secondly, the number of individual inhibitory synapses is increased: in the traditional embedding the number of inhibitory synapses is $N^2$, whereas the random embedding requires $\Theta(N^2 \cdot k \log N)$ inhibitory synapses.
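The random construction of Lemma 4.2 is easy to try out empirically. The lemma is asymptotic, so the parameters below are hypothetical toy values chosen only to make the brute-force check feasible; the helper functions are illustrative, not part of the thesis:

```python
import random
from itertools import combinations

random.seed(1)

def random_family(n, m, r):
    """n random subsets of {0,...,m-1}; each element included w.p. 1/(4r)."""
    p = 1 / (4 * r)
    return [{e for e in range(m) if random.random() < p} for _ in range(n)]

def is_cover_free(family, r):
    """Brute-force check of Definition 4.1 (small instances only)."""
    for i, s0 in enumerate(family):
        others = family[:i] + family[i + 1:]
        for combo in combinations(others, r):
            if s0 <= set().union(*combo):
                return False
    return True

# Toy scale: n = 10 sets, r = 2, ground set size m chosen comfortably
# large relative to c * r^2 * log(n) for this small instance.
trials, n, r, m = 20, 10, 2, 100
successes = sum(is_cover_free(random_family(n, m, r), r)
                for _ in range(trials))
print(f"{successes}/{trials} random families were {r}-cover-free")
```

At this scale the vast majority of sampled families already come out 2-cover-free, consistent with the high-probability statement of the lemma.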


The error of the random embedding for networks of the size of a cortical column is shown in Figure 4.3. We measure the error as the expected number of excitatory neurons that wrongly send inhibitory signals given that a random set of $k$ excitatory neurons is active. If all $B_i$'s have size $d := p \cdot N^I$, the expected number of active inhibitory neurons is $M := N^I \cdot (1 - (1 - p)^k)$. Further, if the number of active inhibitory neurons is $M$, the error is $E := (N^E - k) \cdot \binom{M}{d} / \binom{N^I}{d}$. The analytically derived error and the simulated error are shown in Figure 4.3. For an activity bound consistent with a lognormal firing rate distribution in cortex (Buzsáki & Mizuseki, 2014) we observe that a realistic proportion of 80 % excitation to 20 % inhibition (Isaacson & Scanziani, 2011) is feasible.

Finally, to study the error of the random embedding on an artificial neural network solving a particular task, we train a multiclass logistic regression network (Bishop, 1996) on MNIST (LeCun, 1998). To obtain binary and sparse neuronal activity the MNIST images are thresholded; pixels with at most 0.8 intensity are rounded to zero and pixels with more than 0.8 intensity are rounded up to one. This yields sparse activity with an average of roughly 10 % of pixels (i.e. neurons) being active. The network is trained with batch size 100 for 1,000 steps with stochastic gradient descent and learning rate 0.5 using TensorFlow (Abadi et al., 2015). The trained network achieves an accuracy of roughly 0.9 on the MNIST test set. We then apply the random embedding to the trained network and plot the resulting accuracy in Figure 4.4. The ratio of inhibition to excitation required to obtain good accuracy is larger than in Figure 4.3, because the network size is by a factor 10 smaller and the quality of the embedding improves with growing network size (the analysis in Lemma 4.2 is asymptotic for n → ∞).
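The analytic error formula above is a one-liner to evaluate. A sketch using the parameter values stated in the text for Figure 4.3 ($N^E = 8{,}000$, $N^I = 2{,}000$, $k = 100$, $d = 20$); the function name is illustrative, and $M$ is rounded to an integer so the binomial coefficient is defined:

```python
import math

def embedding_error(nE, nI, k, d):
    """Analytic error E of the random embedding.

    Expected number of inactive excitatory neurons that wrongly send
    inhibitory signals when a random set of k excitatory neurons is active.
    """
    p = d / nI
    # Expected number of active inhibitory neurons: M = N^I * (1 - (1-p)^k).
    M = round(nI * (1 - (1 - p) ** k))
    # E = (N^E - k) * C(M, d) / C(N^I, d): the ratio is the probability that
    # a fixed inactive neuron's whole set B_i (size d) happens to lie inside
    # the M active inhibitory neurons, so its compartments fire spuriously.
    return (nE - k) * math.comb(M, d) / math.comb(nI, d)

print(f"error ≈ {embedding_error(8000, 2000, 100, 20):.2f}")
```

With these parameters the expected number of wrongly inhibiting neurons comes out below one, in line with the small errors visible at the default operating point of Figure 4.3.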


Figure 4.3: Error of the random embedding. The error is measured as the expected number of excitatory neurons that wrongly send inhibitory signals (simulation in blue, error bars show standard deviation for 100 trials, analytic formula in red). The number of excitatory neurons is $N^E = 8{,}000$. If not altered in the plot, the number of inhibitory neurons is $N^I = 2{,}000$, the bound on the maximum number of coactive excitatory neurons is $k = 100$, and the size of the $B_i$'s is $p \cdot N^I = 20$ (we sample random $B_i$'s of fixed size, to prevent small number effects). (a) Error as a function of the number of inhibitory neurons. (b) Error as a function of the number of coactive excitatory neurons. (c) Error as a function of the expected inhibitory interaction size.


Figure 4.4: Accuracy of the random embedding applied to a multiclass logistic regression network classifying MNIST digits (blue, error bars show standard deviation for 10 trials) versus accuracy of the original network (red). The accuracy is the ratio of correctly classified images over all images in the test set. The number of excitatory neurons is $N^E = 784$, the bound on the maximum number of coactive excitatory neurons is roughly $k = 78$ (obtained by thresholding the images). We sample random $B_i$'s of fixed size, to prevent small number effects. (a) Accuracy of the embedding as a function of the number of inhibitory neurons; the size of the $B_i$'s is $p \cdot N^I = 4$ (blue) and 7 (light blue). (b) Accuracy of the embedding as a function of the expected inhibitory interaction size; the number of inhibitory neurons is $N^I = 300$ (blue) and 200 (light blue).


4.4 Discussion

In this chapter we proposed a network model that allows excitatory neurons to mediate inhibitory signals to all other excitatory neurons. Crucially, under the assumption of sparse activity, the number of required inhibitory neurons in the network is much smaller than the number of excitatory neurons. We thereby show how networks which do not obey Dale’s law can be transformed into networks which do. However, our approach is more general and not limited to conventional neural networks in which excitation and inhibition play equal but opposite roles. In particular, it is also applicable to more biological network models in which inhibition serves a specific role which is different from the mere inverse of excitation, see for example (Oster, Douglas, & Liu, 2009; Boerlin, Machens, & Denève, 2013; Denève, Alemi, & Bourdoukan, 2017) and also the following section.

4.4.1 Decorrelation in Dentate Gyrus

The influential modeling work of David Marr proposed that the functional role of the Dentate Gyrus (DG) is to decorrelate inputs from the Entorhinal Cortex (EC) (Marr, Willshaw, & McNaughton, 1991). It has been shown that for models achieving decorrelation, mutual inhibition between excitatory neurons is crucial (Barlow & Földiák, 1989; Földiák, 1990; Rolls, Stringer, & Elliot, 2006; Wiechert, Judkewitz, Riecke, & Friedrich, 2010). Simplifying, mutual inhibition between excitatory neurons increases competition among them and thus facilitates decorrelation. In particular, specific inhibitory all-to-all interaction is necessary, as the inhibitory connection strength between any pair of excitatory neurons must reflect their degree of correlation

121 4. mutual inhibition with few inhibitory cells to ensure decorrelation. The authors of (Barlow & Földiák, 1989) were aware of the problematic assumption of mutual inhibition between ex- citatory neurons, as mediating the inhibitory signal through inhibitory neurons would require an inhibitory neuron for each excitatory neu- ron (i.e. the traditional embedding of a non-Dale network in a Dale network). The proposed random embedding overcomes this limita- tion by making use of nonlinear interactions of inhibitory synapses on dendritic compartments of excitatory neurons. Their model of sparse coding also inherently limits the number of excitatory neurons that may be active simultaneously in DG, which allows to reduce the number of required inhibitory neurons by the random embedding.

4.4.2 Local nonlinear synaptic interaction

The proposed nonlinear interaction of inhibitory synapses is motivated by the established nonlinear interaction of excitatory synapses, namely local dendritic spikes in active dendrites, see (London & Häusser, 2005; Stuart & Spruston, 2015) for review. Local dendritic spikes refer to the phenomenon that synaptic input that is sufficiently close in time and space may trigger the activation of voltage-dependent channels, which creates additional current in a positive feedback loop. Hence, this mechanism provides a substrate for the neuron to detect coincidences of synaptic input on a very fast timescale. Dendritic spikes can be initiated by different mechanisms, mediated by a variety of different channel types (sodium (Ariav, Polsky, & Schiller, 2003), calcium (Schiller, Schiller, Stuart, & Sakmann, 1997) and NMDA (Schiller, Major, Koester, & Schiller, 2000) spikes), all of which have a depolarizing effect. Experimental evidence for inhibitory nonlinear synaptic interaction, as required in our model, is elusive and remains to be

explored.

4.4.3 Excitability of inhibition and structural inhibitory plasticity

The specific connectivity in the random embedding has some properties worth mentioning. Firstly, consider the assignment of sets of inhibitory neurons to excitatory neurons. We show that a random assignment satisfies the required properties for the embedding to work. This shows that no specific hard-wiring or plasticity mechanism is required at this step. Further, we propose strong synaptic weights from excitatory to inhibitory neurons: input from a single active excitatory neuron is enough to activate an inhibitory neuron. Relating to the example of DG above, it has been observed in DG that selective activation of excitatory neurons (DG granule cells) yields strong activation of inhibitory neurons (local GABAergic interneurons), which in turn mediate strong inhibitory input to other excitatory neurons (Drew et al., 2016). Secondly, we propose that sets of inhibitory neurons are clustered on dendritic compartments to allow nonlinear interaction. While it is known that different types of inhibitory neurons specifically target certain dendritic areas of excitatory neurons (i.e. basket cells target the somatic and perisomatic compartment, chandelier cells target the axon initial segment, and Martinotti cells target the apical dendrites, see (Isaacson & Scanziani, 2011)), experimental evidence of close inhibitory synaptic clustering is lacking. In our model, specific sets of inhibitory synapses are spatially close on dendritic compartments (in particular $\mathcal{C} = \mathcal{B}$). This requires a specific structure, which may be achieved through activity dependent structural plasticity on the dendritic spine: after the random assignment of inhibitory sets to

excitatory neurons, these sets are persistently coactive and thus an activity dependent plasticity mechanism could move synapses in these sets closer together. There is experimental evidence for structural inhibitory plasticity, see (Flores & Méndez, 2014) for review. In particular, it has been shown that inhibitory synapse remodeling is spatially clustered and that this process is activity dependent (Chen et al., 2012).

4.4.4 Dendritic computation

The study of the computational power of nonlinear interactions in the dendritic tree goes back to (Koch, Poggio, & Torre, 1983) and is reviewed in (London & Häusser, 2005; Stuart & Spruston, 2015). Networks that implicitly incorporate nonlinear dendritic processing have been introduced as morphological neural networks (Ritter, Sussner, & Diaz-de-Leon, 1998) and their use as associative memories has been investigated. Panayiota Poirazi and Bartlett W. Mel introduced a model of neurons whose dendritic compartments act as separate neuron-like units and studied the memory capacity of networks comprised of such neurons (Poirazi & Mel, 2001). Our model of synaptic nonlinear interaction can be considered as a special case of their units, restricted to inhibitory interaction and negative thresholds. Viewing dendritic compartments as individual processing units made it possible to show that a neuron with its dendritic tree is computationally as powerful as a traditional two layer network (Poirazi et al., 2003). Recently, a learning rule depending on presynaptic, dendritic, and postsynaptic spikes has been derived, which acts as a biological version of the classical error-backpropagation algorithm (Rumelhart, Hinton, & Williams, 1986) applied to the two layer network embodied in a pyramidal neuron and its dendritic tree (Schiess, Urbanczik, & Senn, 2016).

5 Lognormal synchrony in CA1

The results in this chapter were obtained in joint work with Hafsteinn Einarsson, Marcelo Matheus Gauy, Florian Meier, Asier Mujika, Johannes Lengler, Angelika Steger, see (Weissenberger, Einarsson, et al., 2018).

5.1 Introduction

Over the last decades it became clear that the hippocampus plays a key role in memory consolidation: during immobility and sleep, the fast sequential replay of cell assemblies, representing memory items, is believed to transfer the respective information from hippocampus to cortex via the trisynaptic loop (Eschenko, Ramadan, Mölle, Born, & Sara, 2008; Girardeau, Benchenane, Wiener, Buzsáki, & Zugaro, 2009; Dupret, O'Neill, Pleydell-Bouverie, & Csicsvari, 2010). These replay events are associated with sharp wave ripples (SPW-Rs, see (Buzsáki, Vanderwolf, et al., 1983; Buzsáki, Horvath, Urioste, Hetke, & Wise, 1992; Csicsvari, Hirase, Mamiya, & Buzsáki, 2000) and (Buzsáki, 2015) for extensive review). SPW-Rs in the hippocampus consist of two events: (1) the sharp wave, a negative deflection of the local field potential (LFP) in the dendritic layer that reflects strong depolarization of the apical dendrites of CA1 pyramidal cells (PCs) caused by a network burst (NB) in the CA3 region, and (2) the ripple, a fast oscillation of the LFP in the somatic layer comprising the interaction of the CA1 PCs with CA1 basket cells (BCs) in response to the strong synchronous excitatory drive from CA3 (Schlingloff, Káli,


Freund, Hájos, & Gulyás, 2014; Stark et al., 2014). How memories are consolidated in this process and the particular role of the CA3-CA1 circuit is yet to be discovered. One particular unexplained phenomenon is that the number of CA1 PCs that spike during the ripple, that is, the size of the CA1 synchronous events (SEs), follows a heavy-tailed and skewed lognormal distribution (Mizuseki & Buzsáki, 2013; Buzsáki & Mizuseki, 2014; Malvache, Reichinnek, Villette, Haimerl, & Cossart, 2016).

Accumulating evidence in the advent of large-scale neuronal recording techniques shows that many neural parameters follow such skewed and heavy-tailed distributions (Buzsáki & Mizuseki, 2014). For example, this has been observed for neuronal firing rates (Mizuseki & Buzsáki, 2013), synaptic efficacy (Sayer, Friedlander, & Redman, 1990; Song, Sjöström, Reigl, Nelson, & Chklovskii, 2005; Ikegaya et al., 2012), and network synchrony (Mizuseki & Buzsáki, 2013; Malvache et al., 2016). In consequence, it can be insufficient to describe parameters of biological systems by their mean value, as in fact the distribution of parameters may be important. In addition to observing such distributions, it is of great interest to study the underlying mechanisms that generate them. For example, the lognormal distribution arises naturally if a parameter is the product of many positive independent variables. This effect, known as Gibrat's law (Sutton, 1997), suggests a simple explanation of the lognormal distribution of synaptic efficacy under the assumption that weight updates are independent and multiplicative (Loewenstein et al., 2011). Further, a skewed and heavy-tailed distribution appears by applying a suitable nonlinearity to a symmetric and light-tailed distribution. This has been proposed as the mechanism responsible for the highly skewed and heavy-tailed distribution of average firing rates observed in vivo (Roxin, Brunel, Hansel,


Mongillo, & van Vreeswijk, 2011). Yet, the origin of the lognormal SE size distribution is not known (Buzsáki & Mizuseki, 2014).

Here, we study a computational model of the CA3-CA1 circuit with respect to the origin of the lognormal SE size distribution. We show analytically that if the NB size follows a normal distribution, then the SE size follows a lognormal distribution, as the activation function of CA1 PCs amplifies the exponential right tail of the normal distribution into a heavy tail. This observation is not restricted to the CA3-CA1 circuit, but reveals a general principle of how synchronous synaptic transmission may generate lognormal network synchrony, which is reminiscent of the proposed explanation for the skewed and heavy-tailed distribution of average firing rates of individual neurons in (Roxin et al., 2011). Our model suggests that the NB size follows a normal distribution in general. Further, fitting the parameters of a normal NB size distribution such that the resulting SE size distribution fits experimental data of (Buzsáki & Mizuseki, 2014) yields a prediction of the NB size in particular. In addition, we study how a postulated lognormal NB size distribution (Omura, Carvalho, Inokuchi, & Fukai, 2015) affects the distribution of the SE size. Here, we find that the heavy tail of the NB size distribution yields an unrealistically heavy tail in the SE size distribution, in contradiction with experimental evidence (Buzsáki & Mizuseki, 2014).

5.2 Results

5.2.1 SE size is a function of NB size

A NB in CA3 is characterized by a subset of CA3 PCs that spike in a short (30 ms-120 ms) time interval (Buzsáki, 2015). We define the size of a NB as the number of CA3 PCs that emit a spike during the NB and we denote it by the random variable X. A SE is defined as a subset of CA1 PCs that spike in a short time interval due to the fast and strong depolarization of their apical dendrites via the Schaffer collateral synapses in response to a NB in CA3. This depolarization is reflected in the LFP SPWs in the stratum radiatum of the CA1 network (Sullivan et al., 2011). The size of a SE is the number of CA1 PCs that participate in the SE, and we denote it by the random variable Y. The distribution of Y has experimentally been found to follow a lognormal distribution (Mizuseki & Buzsáki, 2013; Buzsáki & Mizuseki, 2014; Malvache et al., 2016). The neural circuitry underlying the SE generation from NBs consists of the Schaffer collateral synapses (Buzsáki et al., 1983), which project uniformly from CA3 PCs to CA1 PCs (Buzsáki, 2015; Muller, Stead, & Pach, 1996) and CA1 BCs (Klausberger & Somogyi, 2008; Bezaire & Soltesz, 2013). Thus, the excitatory drive of a NB is transmitted to the CA1 PCs and CA1 BCs, where the latter in turn provide inhibitory input to the CA1 PCs (Klausberger & Somogyi, 2008). Our model assumes that the size of NBs determines the size of SEs, as it determines both the excitatory and inhibitory input to CA1 PCs during the SPW-R complex. Whereas the relation between the NB size and the excitatory drive to CA1 PCs is clear, we model the inhibitory input to CA1 PCs via CA1 BCs as a function of the NB size according to (Donoso, Schmitz, Maier, & Kempter, 2018), see Figure 5.4. Thus, the distribution of SE size, Y, is a function of the NB size, X, regulated by the direct synchronous transmission and the indirect inhibitory interaction.


5.2.2 Normal NB size distribution yields lognormal SE size distribution

Here, we show analytically that if the size X of NBs in CA3 follows a normal distribution, then the size Y of SEs in CA1 follows a lognormal distribution. In the sequel we provide an intuitive explanation; a formal proof can be found in the Appendix, see Section 5.5. Assume the size X of the NBs follows a normal distribution with mean µX and standard deviation σX. For analytical tractability we consider the limit case of short NBs and large network size. Further, we assume constant inhibition (i.e., BC activity is independent of NB size), no background activity, homogeneous neurons, and a homogeneous network. These assumptions are relaxed later. Let NCA1 be the number of CA1 PCs, let p be the probability that a CA3 PC projects to a CA1 PC via a Schaffer collateral synapse, and let k be the normalized threshold of CA1 PCs (i.e., the mean number of excitatory synaptic inputs in a short time interval that triggers a spike despite constant inhibitory input, see Section 5.4). Then, a CA1 PC spikes in a SE if at least k CA3 PCs that project to it spike in the causal NB. If a particular NB has size x, then the resulting SE has expected size NCA1 · Pr[Bin(x, p) ≥ k], as the number of synaptic inputs from the NB to a CA1 PC follows a binomial distribution with x trials and success probability p, and the CA1 PC spikes if it receives at least k excitatory inputs. The SE size can be approximated by its expectation, as it is concentrated in the limit of large network size, see Section 5.5. Thus, the size y of the resulting SE given the size x of the causal NB can be written as

y = NCA1 · Pr[Bin(x, p) ≥ k].    (5.1)

To see that in consequence Y follows a skewed heavy-tailed distribution, inspect Figure 5.1: the exponential right tail of the normal distribution is transformed into the heavy tail of the lognormal distribution by applying the activation function of the CA1 PC (i.e., the size of Y given X). In fact, the resulting distribution of Y is lognormal, more precisely log Y ∼ N(µlog Y, σlog Y), and the underlying normal distribution has mean

µlog Y = log NCA1 + k log(µX p) − log(k!) − µX p    (5.2)

and standard deviation

σlog Y = (k/µX − p) · σX,    (5.3)

see Section 5.5 for the complete derivation. The results of comparing our analysis with the simulated model under the assumptions of the analysis are shown in Figures 5.2 and 5.3. The derived formula fits well even for finite network size (we resort to the formulation in Equations (5.12) and (5.14)), in particular if the ratio σX/µX is small, as is assumed in the analysis. The parameters of the distribution of Y are quite susceptible to variations in the parameters of the distribution of X. This strong susceptibility to parameter variations is in principle detrimental to our model as it, for example, forces X to be very concentrated, see Figure 5.3. However, as we will see in the following section, this is just an artefact of our idealized assumptions. Once we introduce realistic parameter variability, noise, NB duration and, crucially, allow the effect of inhibition to depend on the NB size (according to (Donoso et al., 2018)), the effect on Y of varying parameters of X becomes much more reasonable.
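The mechanism behind Equations (5.2) and (5.3) can be checked numerically. The sketch below uses the Poisson point-mass form NCA1 · e^(−xp)(xp)^k/k! for the activation, which is what Equation (5.2) corresponds to; the numerical values of k, µX, and σX are illustrative only (not the thesis's fitted parameters). It samples a normal NB size X and verifies that log Y is approximately normal with the predicted mean and standard deviation.

```python
import math
import random

# Illustrative parameters (k, mu_x, sigma_x are hypothetical, not fitted values).
N_CA1 = 320_000      # number of CA1 PCs
p = 0.1              # CA3 -> CA1 connection probability
k = 1500             # normalized CA1 PC threshold (hypothetical)
mu_x, sigma_x = 14_000.0, 100.0   # normal NB size distribution

def log_y(x):
    """log SE size for NB size x, via the Poisson point-mass form
    underlying Equation (5.2)."""
    lam = x * p
    return math.log(N_CA1) + k * math.log(lam) - math.lgamma(k + 1) - lam

random.seed(0)
samples = [log_y(random.gauss(mu_x, sigma_x)) for _ in range(200_000)]
m = sum(samples) / len(samples)
s = math.sqrt(sum((v - m) ** 2 for v in samples) / len(samples))

# Predictions from Equations (5.2) and (5.3).
mu_pred = math.log(N_CA1) + k * math.log(mu_x * p) - math.lgamma(k + 1) - mu_x * p
sigma_pred = (k / mu_x - p) * sigma_x

print(f"mean of log Y: simulated {m:.3f}, predicted {mu_pred:.3f}")
print(f"std  of log Y: simulated {s:.3f}, predicted {sigma_pred:.3f}")
```

The agreement is close because log Y is nearly linear in X when σX/µX is small, exactly the regime assumed in the analysis.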


Figure 5.1: Transformation of normal NB size X into a lognormal SE size Y (schematic). (A) shows the normally distributed NB size X (blue, left axis) and the size of the resulting SE Y (orange, right axis) as a function of X. (B) shows the resulting lognormal distribution of Y (inset shows the distribution of Y on log scale). The distribution of Y is generated as follows: the probability of a particular value of Y, for example the dark green square for the light green square in (B), can be derived in (A) by finding the value of X that gives rise to the value of Y (i.e., the intersection of the dashed black line starting at the light green square with the orange line) and reading off its probability (i.e., following the dashed black line to the intersection with the blue line and to the dark green square). The green squares show how the most probable value of Y is generated; the red squares provide an example of the origin of the heavy tail of Y's distribution.


Figure 5.2: Comparing the simulated limit model (solid lines) to the analysis for varying mean µX of the normally distributed NB size X (µX = 13800, 14000, 14500; σX = 100). Dotted lines show theoretical predictions according to Equations (5.12) and (5.14). Dashed lines show the best-fit lognormal. (A) shows the distribution of the SE size Y. (B) shows the distribution of the logarithm of the SE size log Y. (C) shows the cumulative distribution of the SE size Y. Network parameters not restricted by the limit case are as in Section 5.4.

Figure 5.3: Analogous to Figure 5.2 for varying standard deviation σX (σX = 50, 100, 200; µX = 14000).


5.2.3 Realistic inhibition, parameter variability and noise increase the robustness of SEs with respect to NB size distribution parameters

We analyzed the model mathematically in the limit of zero temporal, neuronal, and synaptic variability and noise. Moreover, we assumed constant inhibition, independent of the NB size. In this section we study the effect of noise, nonuniform parameters and inhibition as a function of NB size on the activation function of CA1 PCs by means of simulation. It turns out that in this more realistic setting the activation function is widened, which makes the emergence of lognormally distributed SE size less susceptible to the parameters of the NB size distribution.

Inhibitory input to CA1 PCs during SPW-Rs mainly stems from CA1 BCs (Klausberger & Somogyi, 2008). The activity of BCs is responsible for the fast oscillation in the LFP during the ripple event and the phase-locked firing of CA1 PCs (English et al., 2014; Stark et al., 2014). Further, the activity of BCs and thus the inhibitory input to CA1 PCs can be expressed as a function of the NB size, see Figure 5.4, following the model of (Donoso et al., 2018). In accordance with (Donoso et al., 2018), the BC activity is characterized by the frequency of the ripple oscillation, the average firing rate of BCs during the ripple, and the synchrony of the BC network. Higher frequency and rate have a negative effect on the probability of CA1 PC firing during ripples, whereas increased synchrony of the BC network has a positive effect, as the window of opportunity for PC firing is widened. In particular, the BC rate and BC synchrony balance each other to some extent.

The effect of inhibition on the CA1 PC activation function is depicted in Figure 5.5. We compare constant inhibition (independent of NB size) and inhibition as a function of NB size. For the latter, we consider different synaptic weights of the GABAergic synapses from BCs to CA1 PCs. This controls the ratio of excitatory and inhibitory input to CA1 PCs. It is clear that constant inhibition yields a monotone activation function of CA1 PCs (as in the analysis above), because the net excitatory input increases monotonically with NB size. In contrast, the effect of realistic inhibition is nontrivial. We observe that inhibition that varies with NB size as in Figure 5.4 results in a significant widening of the activation function, see Figure 5.5. The reason is that inhibition increases with NB size and thus balances the excitatory drive. In consequence, larger variations in NB size (i.e., a larger standard deviation of the distribution of X) are necessary compared to the constant inhibition setting to obtain comparable SE distributions. The balancing effect further increases with the strength of inhibition, determined by the synaptic weight of GABAergic synapses on CA1 PCs.

Temporal noise arises from the duration of the NB, as the spikes are distributed in time. We denote the duration of a NB by the parameter ∆NB. In the previous analysis we considered the limit of short NBs (∆NB = 0; i.e., all input spikes occur synchronously). However, the duration of a NB is between 30 ms and 120 ms (Buzsáki et al., 1983). Increased NB duration reduces the size of SEs, due to the leakage in CA1 PCs. Therefore, the activation function is shifted to the right as shown in Figure 5.6. Furthermore, the slope of the activation function decreases with increased NB duration, because for long NBs the spikes are distributed further in time and therefore the excitatory drive per unit of time is smaller compared to short NBs.

Neuronal variability comprises fluctuations in the membrane potential and varying spike thresholds. Fluctuations in the membrane potential are caused by persistent background activity. Further, we assume Gaussian noise on the spike threshold to capture the inhomogeneity of CA1 PCs. The noise on the threshold decreases the slope of the activation function slightly, see Figure 5.7. The fluctuations of the membrane potential shift the activation function to the left, see Figure 5.7, as the chosen background activity is mainly excitatory and the probability of spiking generally increases with increasing voltage fluctuations (La Camera et al., 2008).

Synaptic weights of Schaffer collateral synapses follow a lognormal distribution (Sayer et al., 1990). If the mean synaptic weight is the same as in the uniform case, this only marginally changes the activation function. The reason is that the input to a neuron is the sum of many independent lognormal weights, and as the central limit theorem implies that this sum is concentrated, the effect vanishes, see Figure 5.7.

In conclusion, the different sources of stochasticity all widen the activation function slightly, but do not influence it qualitatively. The non-constant inhibition drastically widens the activation function, because non-constant inhibition has a strong balancing effect. In consequence, the resulting distribution of Y is much less susceptible to variations in the parameters of the distribution of X; compare Figures 5.2 and 5.3 to Figures 5.8 and 5.9. Further, the distribution of Y still resembles a lognormal distribution, as the shape of the nonlinearity of the activation function of CA1 PCs is qualitatively similar to the limit case, despite non-constant inhibition and noise, see Figures 5.8 and 5.9.
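The central limit theorem argument for the lognormal weights can be illustrated directly. The sketch below (illustrative numbers only, not the thesis's synaptic parameters) compares the coefficient of variation of a single lognormal weight to that of the summed input over many synapses.

```python
import math
import random

random.seed(1)
n_syn = 1000        # hypothetical number of active synaptic inputs
n_trials = 1000

def cv(values):
    """Coefficient of variation: std / mean."""
    m = sum(values) / len(values)
    var = sum((v - m) ** 2 for v in values) / len(values)
    return math.sqrt(var) / m

# Single lognormal weight (log mean 0, log std 1) vs. sum of n_syn such weights.
single = [random.lognormvariate(0.0, 1.0) for _ in range(n_trials)]
summed = [sum(random.lognormvariate(0.0, 1.0) for _ in range(n_syn))
          for _ in range(n_trials)]

# Population CV of a single weight is sqrt(e - 1) ~ 1.31; the CV of the sum
# shrinks roughly as 1/sqrt(n_syn), so the total input is concentrated.
print(f"CV of a single weight: {cv(single):.3f}")
print(f"CV of the summed input: {cv(summed):.3f}")
```

This is why replacing uniform weights by lognormal weights of the same mean barely changes the activation function: the summed drive is almost deterministic.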


Figure 5.4: Inhibition as a function of NB size according to (Donoso et al., 2018, Fig. 1 D) and a constant baseline. (A) shows the frequency of the BC network oscillation as a function of NB size. (B) depicts the ratio of network frequency and the average rate of BCs as a function of NB size. (C) shows the coherence of BC firing, a measure of synchrony (see Section 5.4), as a function of NB size.

Figure 5.5: Effect of inhibition on the probability that a CA1 PC spikes as a function of the NB size X (constant inhibition with wGABA = 8 pF; CA3-dependent inhibition with wGABA = 7, 8, 9 pF). Constant inhibition corresponds to the baseline in Figure 5.4; CA3-dependent corresponds to inhibition as characterized in (Donoso et al., 2018, Fig. 1 D), see Figure 5.4.


Figure 5.6: Effect of synchrony (i.e., NB duration ∆NB) on the probability that a CA1 PC spikes as a function of the NB size X (∆NB = 80, 100, 120).

Figure 5.7: Effect of parameter variability and noise on the probability that a CA1 PC spikes as a function of the NB size X. Network parameters which are not restricted are as in Section 5.4. Restricted parameters are λe = λi = 0 in the absence of voltage fluctuations, σT = 0.001 in the absence of variable thresholds, and σs < 10^−4 for uniform weights. For the combined model no parameters are restricted; for voltage, threshold, and weight, the respective other two parameters are restricted. For temporal, all three parameters are restricted.


Figure 5.8: Comparing the simulated model (solid lines) to the best-fit lognormal (dashed lines) for varying mean µX of the normally distributed NB size X (µX = 13000, 14000, 16000; σX = 1500). (A) shows the distribution of the SE size Y. (B) shows the distribution of the logarithm of the SE size log Y. (C) shows the cumulative distribution of the SE size Y.

Figure 5.9: Analogous to Figure 5.8 for varying standard deviation σX (σX = 500, 1000, 2000; µX = 14000).


5.2.4 Prediction of NB size distribution with data on SEs

Assuming a normally distributed NB size X, the only free parameters in our model are the mean µX and the standard deviation σX of X (network parameters are chosen to be consonant with experimental data, see Section 5.4, yet our prediction heavily depends on the particular choice, see for instance Figure 5.5). We fitted these parameters such that the resulting distribution of the SE size Y fits the experimentally derived distribution from (Buzsáki & Mizuseki, 2014, Fig. 2 B) (minimizing the mean squared error of the histograms), see Figure 5.10. The predicted values are µX ≈ 14,200 and σX ≈ 1,700. The predicted mean is roughly 7 % of CA3, which is in the right order of magnitude, see (Csicsvari et al., 2000, Fig. 3 C), where 10 % of CA3 spiking in a short 100 ms time window has been observed during SPW-Rs.
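As a rough sketch of such a fit, one can invert the limit-case formulas (5.2) and (5.3) instead of minimizing the histogram MSE of the full simulated model: Equation (5.2) is monotone in µX for µX < k/p, so a target log mean of Y pins down µX by bisection, and Equation (5.3) then gives σX. All parameter values below are illustrative, not the thesis's fitted ones.

```python
import math

N_CA1 = 320_000
p = 0.1
k = 1500          # hypothetical normalized threshold

def mu_log_y(mu_x):
    """Equation (5.2): mean of log Y for a normal NB size with mean mu_x."""
    lam = mu_x * p
    return math.log(N_CA1) + k * math.log(lam) - math.lgamma(k + 1) - lam

def fit(target_mean, target_std):
    """Recover (mu_x, sigma_x) from target log-moments of Y by bisection."""
    lo, hi = 1_000.0, k / p - 1.0   # mu_log_y is increasing on (0, k/p)
    for _ in range(100):
        mid = (lo + hi) / 2
        if mu_log_y(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    mu_x = (lo + hi) / 2
    sigma_x = target_std / (k / mu_x - p)   # invert Equation (5.3)
    return mu_x, sigma_x

# Round-trip check: generate targets from known parameters, then recover them.
true_mu, true_sigma = 14_000.0, 100.0
target_mean = mu_log_y(true_mu)
target_std = (k / true_mu - p) * true_sigma
mu_hat, sigma_hat = fit(target_mean, target_std)
print(f"recovered mu_x = {mu_hat:.1f}, sigma_x = {sigma_hat:.1f}")
```

This method-of-moments inversion is only a stand-in for the histogram fit used in the thesis, which runs the full noisy model; it is exact in the limit case by construction.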

Figure 5.10: Fitting the mean µX and the standard deviation σX of the normally distributed NB size X such that the mean squared error between the distribution of the resulting SE size Y and the data depicted in (Buzsáki & Mizuseki, 2014, Fig. 2 B) is minimized (best fit: µX = 14212, σX = 1714).


5.2.5 Participation of CA1 PCs in SEs follows a skewed heavy-tailed distribution

It has been experimentally observed that the number of SEs in which a CA1 PC participates follows a skewed heavy-tailed distribution as well, see (Mizuseki & Buzsáki, 2013, Fig. S5 B). In Figure 5.11 we plot this quantity for our model with the NB size distribution as determined in Section 5.2.4 and realistic network parameters as described in Section 5.4, and see that it matches qualitatively, with a shift to the left compared to the data (i.e., on average, neurons participate in fewer SEs than experimentally observed). The heavy-tailed distribution of participation follows from the fact that neurons are inhomogeneous (i.e., their thresholds vary); for homogeneous neurons the distribution would be a normal distribution as a consequence of the central limit theorem.
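The role of threshold inhomogeneity can be sketched as follows (toy parameters throughout; the Poisson input model and the normal tail approximation are simplifications, not the thesis's full simulation): each neuron's long-run participation rate is its expected firing probability over NBs, and Gaussian threshold variability across neurons produces a strongly right-skewed distribution of these rates.

```python
import math
import random

random.seed(2)
p = 0.01                       # toy connection probability
n_neurons, n_events = 2000, 200

def fire_prob(lam, threshold):
    """P[Pois(lam) >= threshold], normal approximation via erfc."""
    return 0.5 * math.erfc((threshold - lam) / math.sqrt(2.0 * lam))

# NB sizes for a set of events and per-neuron spike thresholds (both toy).
nb_sizes = [random.gauss(14_000.0, 500.0) for _ in range(n_events)]
thresholds = [random.gauss(170.0, 8.0) for _ in range(n_neurons)]

# Participation probability of each neuron, averaged over events.
participation = [
    sum(fire_prob(x * p, t) for x in nb_sizes) / n_events
    for t in thresholds
]

m = sum(participation) / len(participation)
s = math.sqrt(sum((v - m) ** 2 for v in participation) / len(participation))
skew = sum((v - m) ** 3 for v in participation) / (len(participation) * s ** 3)
print(f"mean participation {m:.4f}, skewness {skew:.2f}")
```

Because the Gaussian thresholds sit in the tail of the input distribution, small threshold differences translate into order-of-magnitude differences in participation probability, which is the source of the skew.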

Figure 5.11: Comparing the proportion of SEs in which a CA1 PC participates in our model to data of (Mizuseki & Buzsáki, 2013, Fig. S5 B).


5.2.6 Heavy-tailed CA3 NB size distributions yield unrealistic CA1 SE size distributions

A lognormal NB size distribution has been suggested as the origin of the experimentally observed lognormal SE size distribution in the modeling work of (Omura et al., 2015). In the sequel we point out that in our model, heavy-tailed NB size distributions (of which the lognormal is a special case) yield an unrealistic SE size distribution, which in particular predicts global CA1 activation with non-negligible probability. The underlying mechanism is illustrated in Figure 5.12: the heavy right tail of the lognormal NB size distribution is transformed into the even heavier tail of the resulting SE size distribution. The effect of lognormally distributed NB sizes in our model is depicted in Figures 5.13 and 5.14.
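This amplification can be sketched numerically (toy parameters throughout; the Poisson tail stands in for the binomial input count, and the hard ceiling models the finite CA1 population): with a lognormal NB size, a non-negligible fraction of events drives essentially the whole CA1 population, whereas a normal NB size with the same mean practically never does.

```python
import math
import random

random.seed(3)
N_CA1 = 10_000      # toy CA1 population
p = 0.01            # toy connection probability
k = 170             # toy threshold
n_events = 5000

def pois_tail(lam, thr):
    """P[Pois(lam) >= thr], summed in the tail; saturates for large lam."""
    if lam >= 4 * thr:
        return 1.0
    pmf = math.exp(thr * math.log(lam) - lam - math.lgamma(thr + 1))
    total = pmf
    for i in range(thr + 1, thr + 800):
        pmf *= lam / i
        total += pmf
    return min(total, 1.0)

def se_size(x):
    return N_CA1 * pois_tail(x * p, k)

mean_x, sd_log = 14_000.0, 0.3
mu_log = math.log(mean_x) - sd_log ** 2 / 2   # lognormal X with mean ~14,000

full = lambda sizes: sum(y >= 0.99 * N_CA1 for y in sizes) / n_events
y_lognorm = [se_size(random.lognormvariate(mu_log, sd_log)) for _ in range(n_events)]
y_normal = [se_size(random.gauss(mean_x, 500.0)) for _ in range(n_events)]

print(f"fraction of near-global SEs, lognormal X: {full(y_lognorm):.3f}")
print(f"fraction of near-global SEs, normal X:    {full(y_normal):.3f}")
```

The heavy right tail of the lognormal X regularly reaches the saturating part of the activation function; the normal X with matched mean never gets close to it.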


Figure 5.12: Transformation of heavy-tailed NB size X into even heavier-tailed SE size Y (schematic). (A) shows the distribution of the NB size X (blue, left axis) and the size of the resulting SE Y (orange, right axis) as a function of X. (B) shows the resulting distribution of Y. The distribution of Y is generated as follows: the probability of a particular value of Y, e.g., the dark green square for the light green square in (B), can be derived in (A) by finding the value of X that gives rise to the value of Y (i.e., the intersection of the dashed black line starting at the light green square with the orange line) and reading off its probability (i.e., following the dashed black line to the intersection with the blue line and to the dark green square). The green squares show how the most probable value of Y is generated; the red squares provide an example of the origin of the heavy tail of Y's distribution.


Figure 5.13: Comparing the simulated model (solid lines) to the best-fit lognormal (dashed lines) for varying log mean µlog X of the lognormally distributed NB size X (µlog X = 9.4, 9.55, 9.7; σlog X = 0.3). The best lognormal fit is much worse compared to normal NB size, cf. Figure 5.8. Further, there is a non-negligible probability that all 320,000 CA1 neurons become active. (A) shows the distribution of the SE size Y. (B) shows the distribution of the logarithm of the SE size log Y. (C) shows the cumulative distribution of the SE size Y.

Figure 5.14: As Figure 5.13 for varying log standard deviation σlog X (σlog X = 0.15, 0.3, 0.45; µlog X = 9.55).


5.3 Discussion

In this work we study a simple CA3-CA1 circuit model and show that it is capable of reproducing experimentally observed network synchrony characteristics (Mizuseki & Buzsáki, 2013; Buzsáki & Mizuseki, 2014; Malvache et al., 2016). In particular, we show that small variations in the size of NBs are transformed into larger variations and increased size of SEs, implying that normally distributed NB sizes give rise to the observed heavy-tailed and skewed SE distribution. We find that this is a result of the interplay between the NB size distribution and the activation function of the CA1 PCs. Furthermore, this amplification of CA3 activity provides evidence against a lognormal NB size distribution, as the heavy tail of the NB size distribution would result in an unrealistically heavy tail of the SE size distribution, which in particular would result in the activation of the entire CA1 circuit. The observation that normally distributed activity is transformed into lognormal activity due to synchronous transmission over one synaptic layer constitutes a general principle for the origin of lognormal neuronal synchrony, which is not restricted to the CA3-CA1 circuit.

5.3.1 Related computational models

Previous computational models of the CA3-CA1 circuit could replicate the characteristic LFP ripple event in CA1 as a response to a NB. The ripple event occurred as a consequence of the interaction within and across the excitatory and inhibitory subcircuits in CA1 (inhibition-first models), see (Brunel & Wang, 2003; Taxidis, Coombes, Mason, & Owen, 2012; Malerba, Krishnan, Fellous, & Bazhenov, 2016; Donoso et al., 2018) and models studying multiple inhibitory subcircuits (Cutsuridis & Hasselmo, 2010; Cutsuridis & Taxidis, 2013). They differ in whether the inhibitory CA1 neurons are primarily driven by CA3 (Schlingloff et al., 2014) or by CA1 PCs (Stark et al., 2014). Other models propose that the ripple is caused by the interaction of PCs. Here, gap junctions among PCs (Draguhn, Traub, Schmitz, & Jefferys, 1998; Traub & Bibbig, 2000) and the sparse recurrent connectivity in CA1 together with dendritic spikes have been shown to support synchronized activity propagation resulting in high-frequency oscillation (Memmesheimer, 2010). Moreover, the role of additional cortical input has been studied in (Taxidis, Mizuseki, Mason, & Owen, 2013), extending (Taxidis et al., 2012). Yet, these models do not display the in vivo observed distribution of SPW-R magnitude.

A model for spontaneous lognormal NB generation has been proposed in (Omura et al., 2015). There, it was shown that a recurrent network of bursting neurons interconnected by synapses with a lognormal weight distribution spontaneously emits NBs whose size distribution follows a lognormal distribution. The authors speculate that this lognormal NB size distribution may explain the observed lognormal SE distribution, which presumes a linear activation function of CA1 PCs. In contrast, our model requires that the size of NBs is normally distributed, as the activation function is nonlinear. This difference is analogous to (Koulakov, Hromádka, & Zador, 2009) and (Roxin et al., 2011) on the single neuron level. With respect to the emergence of normal NBs, it has been shown in a model of asynchronous spike transmission in random networks that normally distributed NBs with small standard deviation occur if inhibition is strong enough to stop the spread of activity (Einarsson, Lengler, Panagiotou, Mousset, & Steger, 2014), consistent with the observation that inhibition dominates during NBs in vitro (Hájos et al., 2013). Further, supporting normally distributed NB size, the distribution of SPW amplitude has been observed to follow a truncated normal distribution (where the truncation stems from the SPW detection threshold), see (Sullivan et al., 2011); however, the relation between SPW amplitude and NB size is not clear.

Our model investigates synchronous spike transmission propagated over one synaptic layer. Synchronous spike transmission over multiple layers has been studied in the context of synfire chains (Abeles, 2009), where it has been shown that in the absence of homeostatic plasticity the activity diverges quickly (Weissenberger et al., 2017) and the distribution of activity is very broad (Einarsson, Lengler, Panagiotou, et al., 2014). Such divergence is consistent with our work: the tail of the input distribution is amplified over one synaptic layer, and iterative amplification over multiple layers results in explosions of activity. Heavy-tailed skewed distributions of neuronal synchrony have also been studied in the context of neuronal avalanches (Beggs & Plenz, 2003), where they have been shown to follow a power-law distribution. Here, the NBs occur spontaneously because the network is close to criticality (Herz & Hopfield, 1995; Eurich, Herrmann, & Ernst, 2002), whereas SEs in CA1 are believed to be caused by the NBs (Buzsáki et al., 1983, 1992; Csicsvari et al., 2000).

5.3.2 SE size as a function of excitatory drive from CA3

In our model the size of a SE is determined solely by the size of the inducing NB (the role of NBs in inducing SEs has been established, e.g., by various lesion experiments (Buzsáki et al., 1983; Suzuki & Smith, 1988)). Direct experimental evidence supporting our assumption is absent, as simultaneous large-scale unit recordings in CA3 and CA1 during SPW-Rs have not been performed. Further, a direct correlation between the amplitude of the SPW and SE size has not been investigated. However, it is known that the amplitude of the SPW is strongly correlated with the peak amplitude of the ripple (Sullivan et al., 2011, Fig. 2 B) and the amplitude of the ripple is correlated with a gain in the firing rate of pyramidal CA1 neurons during the ripple (Csicsvari, Hirase, Czurkó, Mamiya, & Buzsáki, 1999a, Fig. 3B and C). Hence, the assumed dependence of SE size on NB size is indeed plausible.

In addition to the magnitude of the excitatory drive from CA3 to CA1 via Schaffer collaterals, the size of SEs is further affected by the inhibitory circuit in CA1. There are different types of interneurons in CA1 that show various activity patterns during SPW-Rs (Csicsvari, Hirase, Czurkó, Mamiya, & Buzsáki, 1999b; Klausberger & Somogyi, 2008). Most prominent are parvalbumin-immunoreactive BCs, which receive excitatory input both from CA3 and CA1 PCs and recurrent inhibitory input (Klausberger & Somogyi, 2008). They fire phase-locked at ripple frequency (Ylinen et al., 1995; Csicsvari et al., 1999b). This observation motivated the hypothesis that inhibition primarily serves as a pacemaker of the ripple and thus for precise spike-timing of CA1 PCs during ripples, see (Stark et al., 2014; English et al., 2014) for experimental evidence and the inhibition-first models (Cutsuridis & Hasselmo, 2010; Taxidis et al., 2012; Cutsuridis & Taxidis, 2013; Malerba et al., 2016; Donoso et al., 2018): the CA3 PCs excite both CA1 PCs and CA1 BCs, which start to fire at ripple frequency synchronized by their recurrent interaction, thereby creating windows of opportunity for the CA1 PCs to spike. Experimental data documenting the effect of inhibition as a function of NB size on the size of SEs is lacking. Thus, we rely on a recent model that studies the inhibitory CA1 network during SPW-Rs (Donoso et al., 2018).
Their model predicts the ripple frequency, the firing probability of BCs during the ripple, and their synchrony as a function of NB size. This allows us to determine the inhibitory input to CA1 PCs during ripples as a function of NB size and thus permits us to study the SE size as a function of NB size. Note that the interaction of CA1 PCs on BCs and the recurrent BC interaction are incorporated implicitly.

It remains to discuss the influence of recurrent interactions among CA1 PCs on the SE size. In vitro slice experiments revealed that CA1 PCs are locally connected with a probability of approximately 1 % (Deuchars & Thomson, 1996; Thomson & Radpour, 1991; Yang et al., 2014). These recurrent interactions have been largely ignored in models, as the relatively short axons and the long time delay of signal propagation hinder rapid signal propagation to large populations of target cells (Yang et al., 2014). Our model does not include recurrent connections. Note, however, that a qualitative influence of recurrent excitation is not expected: in our model the CA1 PCs that get more CA3 PC input than others (i.e., at least k) fire a spike and participate in the SE, as observed in vivo (Hulse, Moreaux, Lubenov, & Siapas, 2016). Including recurrent CA1 PC interactions may cause additional CA1 PCs to participate in the SE. Yet, there is a strong bias towards the CA1 PCs that already receive a lot of input from CA3 PCs (say, at least k − 1). Hence, recurrent excitation in CA1 may cause CA1 PCs with at least k − 1 inputs to participate in the SE rather than only CA1 PCs with at least k inputs. Thus, including recurrent interaction reduces to a shift of k, which does not change the behaviour of the model qualitatively. This argument holds true if the recurrent excitation is weak and does not cause rapid signal propagation to large target populations, as indicated in (Yang et al., 2014). However, this is challenged in (Memmesheimer, 2010), where including supralinear dendritic interactions has been shown to support such propagation in CA1.

5.3.3 Conclusion

The transient synchrony of distributed groups of neurons (cell assemblies) is believed to be an integral part of neural information processing (Hebb, 1949; Buzsáki, 2010). For example, some CA1 PCs are receptive to the spatial location of the subject (O'Keefe & Dostrovsky, 1971). Such place cells have been found to be reactivated during SPW-Rs, in the order in which the corresponding locations were traversed in past awake navigation episodes, see for example (Skaggs & McNaughton, 1996). Studies demonstrating impairment of memory consolidation as a consequence of selective suppression of SPW-Rs (Girardeau et al., 2009) and the fact that cell participation in SPW-Rs predicts future performance in spatial memory tasks (Dupret et al., 2010) indicate that SPW-Rs are involved in memory consolidation. Yet, the functional role of CA1 and in particular the CA3-CA1 circuit is still under debate (Csicsvari et al., 2000). Our model predicts different distributions of cell assembly size in CA3 and CA1: a normal distribution in CA3 versus the lognormal distribution that has been observed in CA1. Notably, our model can be falsified by providing evidence for a heavy-tailed and skewed distribution of cell assembly size in CA3. In this case, it may be instructive to investigate the activation function of CA1 PCs and interesting to determine the underlying homeostatic mechanism that prevents amplification of the heavy tail. Hence, our model proposes that the functional role of the CA3-CA1 circuit is to transform the distribution of cell assembly size. This may be a necessary signal transformation in the process of memory consolidation. In the view of the trisynaptic loop, CA1 provides input to cortex. In many sensory systems the relation between input and perception is logarithmic, a phenomenon known as the Weber-Fechner law (Weber, 1834; Fechner, 1860). Hence, in perception the logarithm of the input is the important quantity. A lognormal distribution is a hallmark of this since it maximizes the entropy among all distributions with specified log mean and log standard deviation (Kvalseth, 1982).
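The entropy claim can be made precise. The following display is a standard fact added here for illustration (it is not part of the original text); m and s denote the prescribed log mean and log standard deviation:

```latex
% Among all positive random variables Y with E[\log Y] = m and
% Var[\log Y] = s^2, the differential entropy of \log Y is maximized
% when \log Y \sim \mathcal{N}(m, s^2), i.e. when Y is lognormal with density
f_Y(y) = \frac{1}{y\, s \sqrt{2\pi}}
         \exp\!\left(-\frac{(\log y - m)^2}{2 s^2}\right), \qquad y > 0,
% because every random variable Z with \operatorname{Var}[Z] = s^2 satisfies
h(Z) \le \tfrac{1}{2}\log\!\left(2\pi e\, s^2\right),
% with equality exactly for the normal distribution.
```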

5.4 Methods

5.4.1 Network

We model the CA3-CA1 circuit as follows. We consider a population of 210,000 CA3 pyramidal cells (PCs), a population of 320,000 CA1 PCs and a population of 5,530 CA1 parvalbumin-immunoreactive basket cells (BCs). The CA3 PCs project randomly to the CA1 PCs with connection density 0.1 (Buzsáki, 2015; Muller et al., 1996) and to the CA1 BCs with probability 0.05 (Bezaire & Soltesz, 2013). BCs project to CA1 PCs such that each CA1 PC receives 180 GABAergic synapses (the 5,530 BCs each form 943 connections to CA1 PCs, each connection comprising 11 synapses (Bezaire & Soltesz, 2013)). The connectivity from CA3 PCs to BCs, from CA1 PCs to BCs and from BCs to BCs is implicit in the way we model inhibition, see below. Recurrent connectivity between CA1 PCs is neglected. The network parameters are summarized in Table 5.1. Unless explicitly mentioned, all parameters and the neuron and synapse models follow (Donoso et al., 2018).


Table 5.1: Parameters of the network.

Symbol     Description                                  Value
N_CA3      Number of CA3 PCs                            210,000
N_CA1      Number of CA1 PCs                            320,000
N_BC,PC    Number of BC synapses to CA1 PC              180
p          Connection prob. from CA3 PCs to CA1 PCs     0.1
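As a concrete illustration of the random CA3→CA1 projection (not code from the thesis; the NB size used here is the mean value quoted later in the Methods), the following sketch draws the number of CA3 inputs a single CA1 PC receives from an NB, which is binomial by construction:

```python
import random

# Each CA3 PC in an NB contacts a given CA1 PC independently with
# probability p = 0.1, so the CA1 PC's input count follows Bin(nb_size, p).
p = 0.1

def ca3_inputs(rng: random.Random, nb_size: int) -> int:
    """Number of CA3 PCs from an NB of size nb_size that contact one CA1 PC."""
    return sum(rng.random() < p for _ in range(nb_size))

rng = random.Random(1)
samples = [ca3_inputs(rng, 14_212) for _ in range(200)]
mean_in = sum(samples) / len(samples)  # expectation is 14_212 * 0.1 = 1_421.2
```

The empirical mean hovers near 1,421 inputs, matching the expected in-degree N_CA3-subset · p.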

5.4.2 Neurons

The CA3 PC dynamics are not explicitly modeled but reduced to the synaptic input to CA1 PCs. Similarly, the BCs are not explicitly modeled but reduced to oscillatory inhibitory synaptic input to CA1 PCs, described below. The CA1 PCs are modeled by leaky integrate-and-fire (LIF) neurons. The membrane potential dynamics are given by

C (d/dt) V(t) = g_l (E_rest − V(t)) + g_e(t) (E_e − V(t)) + g_i(t) (E_i − V(t)),  (5.4)

with parameter values summarized in Table 5.2. The neuron fires a spike if the voltage exceeds the threshold E_T. After a spike the voltage is reset to E_reset and clamped for the absolute refractory time τ_abs. The neurons are inhomogeneous as E_T is drawn from a normal distribution with mean µ_T and standard deviation σ_T.


Table 5.2: Parameters of the membrane dynamics.

Symbol     Description                        Value
E_rest     Resting potential                  −67 mV
C          Capacitance                        275 pF
g_l        Leak conductance                   25 nS
E_e        Excitatory reversal potential      0 mV
E_i        Inhibitory reversal potential      −68 mV
µ_T        Threshold mean                     −50 mV
σ_T        Threshold standard deviation       0.5 mV
E_reset    Reset potential                    −60 mV
τ_abs      Absolute refractory period         2 ms
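A minimal forward-Euler integration of Equation (5.4) with the Table 5.2 values can make the spiking mechanism concrete. This is an illustrative sketch, not the thesis implementation: the conductances g_e and g_i are held constant instead of being driven by the full synaptic model, and the threshold is fixed at µ_T rather than drawn from a normal distribution.

```python
# Forward-Euler integration of the conductance-based LIF dynamics (5.4).
C = 275e-12          # membrane capacitance (F)
g_l = 25e-9          # leak conductance (S)
E_rest, E_e, E_i = -67e-3, 0.0, -68e-3   # reversal potentials (V)
E_T, E_reset = -50e-3, -60e-3            # threshold (mu_T) and reset (V)
tau_abs = 2e-3       # absolute refractory period (s)

def simulate_lif(g_e: float, g_i: float, t_max: float = 0.1,
                 dt: float = 1e-5) -> list[float]:
    """Integrate the membrane equation with constant g_e, g_i; return spike times (s)."""
    V, t, spikes = E_rest, 0.0, []
    refractory_until = -1.0
    while t < t_max:
        if t >= refractory_until:  # voltage is clamped during refractoriness
            dV = (g_l * (E_rest - V) + g_e * (E_e - V) + g_i * (E_i - V)) / C
            V += dt * dV
            if V >= E_T:
                spikes.append(t)
                V = E_reset
                refractory_until = t + tau_abs
        t += dt
    return spikes

# Strong constant excitation drives tonic firing; no input leaves V at rest.
spikes = simulate_lif(g_e=10e-9, g_i=0.0)
```

With g_e = 10 nS the steady-state voltage lies above threshold, so the sketch fires repeatedly; with zero input it never spikes.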

5.4.3 Synapses

The time course of a postsynaptic conductance due to a presynaptic spike is given by

g(t) = F(t) ∗ ( ∑_{j ≥ 1} W s_j(t) ),  (5.5)

where F(t) is the synaptic kernel, ∗ denotes convolution, W is the synaptic weight and s_i^X(t) = ∑_{l ≥ 1} δ(t − t_{i,l}^X) is the spike train of the i-th neuron of population X, where t_{i,l}^X is the time of the l-th spike and δ is the Dirac delta function. Further, the synaptic kernel is given by

F(t) = (1/(τ_d − τ_r)) ( exp(−t/τ_d) − exp(−t/τ_r) ),  (5.6)

where τ_r and τ_d are the time constants of the rise and decay of postsynaptic potentials. We model AMPA-type synapses providing

excitatory input from CA3 PCs and GABAergic synapses providing inhibitory input from BCs. For AMPA-type synapses, the synaptic efficacy is drawn from a lognormal distribution (Sayer et al., 1990) such that the mean corresponds to a peak conductance of 0.9 nS. The weight of GABAergic synapses is chosen such that the peak conductance is 2.82 nS, which yields an IPSP peak amplitude of 0.23 mV at −57 mV; this is smaller than the reported mean but within reported values (Buhl, Cobb, Halasy, & Somogyi, 1995) (a peak conductance of 9.0 nS as in (Donoso et al., 2018) prevents any CA1 PC activity). The synaptic parameters are summarized in Table 5.3.

Table 5.3: Parameters of the synapses.

Symbol     Description                          Value
µ_AMPA     AMPA weight mean                     2.52 pF
σ_AMPA     AMPA weight standard deviation       1.34 pF
W_GABA     GABA weight                          8 pF
τ_e^r      AMPA synaptic rise time              0.5 ms
τ_e^d      AMPA synaptic decay time             1.8 ms
τ_i^r      GABA synaptic rise time              0.4 ms
τ_i^d      GABA synaptic decay time             2 ms
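To sanity-check the kernel in Equation (5.6), one can evaluate it numerically with the AMPA time constants of Table 5.3. This small sketch (not from the thesis) also locates the analytic peak time τ_r τ_d ln(τ_d/τ_r)/(τ_d − τ_r), obtained by setting F'(t) = 0:

```python
import math

# Double-exponential synaptic kernel of Equation (5.6), evaluated with the
# AMPA rise/decay time constants from Table 5.3 (0.5 ms and 1.8 ms).
tau_r, tau_d = 0.5e-3, 1.8e-3

def F(t: float) -> float:
    """Kernel value at time t (s); zero for t < 0 to keep the kernel causal."""
    if t < 0:
        return 0.0
    return (math.exp(-t / tau_d) - math.exp(-t / tau_r)) / (tau_d - tau_r)

# Setting F'(t) = 0 gives the peak time in closed form (about 0.89 ms here).
t_peak = tau_r * tau_d / (tau_d - tau_r) * math.log(tau_d / tau_r)
```

The kernel rises on the 0.5 ms timescale, peaks just before 1 ms, and decays on the 1.8 ms timescale.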

5.4.4 Synaptic input

The synaptic input to CA1 PCs consists of two components: excitatory input from CA3 PCs and inhibitory input from BCs. The excitatory input comprises the NB and a weak Poissonian background activity proportional to the number of AMPA-type synapses. A NB in CA3 is modeled as a set of CA3 PCs that fire a

spike. The spikes are then distributed over time according to a Gaussian distribution with mean 0 (centered at the SPW peak) and standard deviation 0.25 · ∆_NB, where ∆_NB is the NB duration. The NB sets are random subsets of CA3 neurons of size X, where X is a random variable distributed according to some distribution (we consider a normal or a lognormal distribution). This describes the excitatory input to CA1 PCs during the SPW-R, and corresponds to the SPW. The inhibitory input is modeled according to (Donoso et al., 2018). In particular, the network frequency of BCs (ripple frequency), the mean firing rate of individual BCs and the coherence (i.e. the square-root of the ratio of the power of the ripple frequency and the power of the zero frequency) as a function of the NB size is taken from (Donoso et al., 2018, Fig. 1D) (to interpolate we use a polynomial fit of degree 3), see Figure 5.4. The relation between NB size and the excitatory input to BCs is as follows. We map a NB of size x to an average input rate of x/(0.25 · ∆_NB · √(2πe)) (this is the instantaneous rate one standard deviation away from the SPW peak; hence, the Gaussian is approximated by a rectangle of this height). We chose time-invariant input to BCs for simplicity as the relative timing of the direct excitatory path and the indirect inhibitory path are unclear. Additionally, BCs receive a constant background activity of 1,200 Hz (Donoso et al., 2018). Hence, the total input rate to a BC for a NB of size x is x/(0.25 · ∆_NB · √(2πe)) + 1,200 Hz. To sample an inhibitory input spike train satisfying the assumptions (frequency, rate and coherence), we sample spike times from Gaussian distributions shifted by the inverse of the ripple frequency. Hence, if the ripple frequency is 200 Hz, the means of the Gaussians are at 0 ms, 5 ms, 10 ms, etc. Their common standard deviation is chosen as a function of the ripple frequency to obtain the given coherence (a larger standard deviation yields smaller

coherence, as spikes will be more evenly distributed in time). Finally, the number of sampled spike times is determined to obtain the given mean firing rate of BCs. Additionally, the CA1 PCs receive very weak Poissonian inhibitory background activity. Parameters of the synaptic input are summarized in Table 5.4.

Table 5.4: Parameters of the synaptic input.

Symbol    Description                              Value
∆_NB      NB duration                              100 ms
λ_e       Excitatory noise input to CA1 PCs        2,000 Hz
λ_i       Inhibitory noise input to CA1 PCs        200 Hz
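The cycle-locked sampling of inhibitory spike times described above can be sketched as follows. This is illustrative Python, not the thesis code: the per-cycle spike count and jitter are placeholder values rather than the fitted frequency/rate/coherence mapping from (Donoso et al., 2018).

```python
import random

# Spike times are drawn from Gaussians whose means are spaced by the
# ripple period; sigma_cycle controls the coherence (more jitter spreads
# spikes evenly over the cycle) and spikes_per_cycle sets the rate.
def sample_inhibitory_train(rng: random.Random, ripple_freq_hz: float = 200.0,
                            n_cycles: int = 20, spikes_per_cycle: int = 30,
                            sigma_cycle: float = 0.5e-3) -> list[float]:
    period = 1.0 / ripple_freq_hz
    spikes = []
    for c in range(n_cycles):
        mu = c * period  # means at 0 ms, 5 ms, 10 ms, ... for 200 Hz
        spikes.extend(rng.gauss(mu, sigma_cycle) for _ in range(spikes_per_cycle))
    return sorted(spikes)

rng = random.Random(0)
train = sample_inhibitory_train(rng)   # 600 spike times over roughly 100 ms
```

Shrinking sigma_cycle sharpens the locking to the ripple cycle and hence raises the coherence; growing it flattens the train toward a Poisson-like process.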

5.4.5 Simulation

Here, we describe the simulation procedure to empirically derive the distribution of Y. We first compute the probabilities p_x that a CA1 PC emits a spike in response to the activity of a random CA3 PC subset of fixed size x. To this end, we repeat the following sampling procedure 500 times and compute the average. First, we draw the number of CA3 PCs that project to the CA1 PC from a CA3 set of size x (this follows a binomial distribution with parameters x and p). Second, we draw the respective synaptic weights. Third, we draw the threshold of the CA1 PC. Fourth, we draw the spike times of the CA3 PCs in the NB, the BCs and the background activity. Finally, we simulate the dynamics of the CA1 PC at a time resolution of 0.01 ms and output whether or not the CA1 PC emitted a spike. This is done at a resolution in x of less than 100, smoothed by a Gaussian filter and interpolated by cubic splines, see Figure 5.7. The resulting


function p_x is shown in Figures 5.5, 5.6 and 5.7. To finally derive the distribution of Y, we draw 10^7 x's according to the distribution of X and draw Y from a binomial distribution with x trials and success probability p_x. This is then plotted in a histogram with bin size 10, see Figures 5.8, 5.9, 5.10, 5.13, and 5.14. For the limit model in Figures 5.2 and 5.3 (in which all CA3 PC spikes arrive simultaneously at a CA1 PC), p_x is simply the probability that a binomially distributed (with x trials and success probability p) random variable is at least k. Here k denotes the normalized threshold, which is the average number of synchronous synaptic inputs from CA3 PCs at mean weight required to make a neuron spike despite constant inhibitory input. Loosely fitting the CA1 PC activation function of the limit model to the CA1 PC activation function of the general model with constant inhibition justifies k = 1450. Furthermore, we are interested in the distribution of the proportion Z of SEs in which a neuron fired. Since the PCs are inhomogeneous, this distribution deviates from a normal distribution. We first compute the probabilities p_{E_T} that a CA1 PC with threshold E_T participates in a SE. To this end, we repeat the following sampling procedure 500 times and compute the average. We first draw the size x of the NB according to a normal distribution with mean 14,212 and standard deviation 1,712. Thereafter, we draw the number of CA3 PCs that project to a CA1 PC from a CA3 set of size x and the corresponding synaptic weights. Then, we draw the spike times and simulate the dynamics of the CA1 neuron.

Given the probabilities p_{E_T} for all possible values of E_T (resolution of 0.01), we get the distribution of Z by drawing 10^7 E_T's according to a normal distribution with mean µ_T and standard deviation σ_T and plotting the obtained p_{E_T}'s in a histogram (bin size as in (Mizuseki & Buzsáki, 2013, Fig. S5B)), see Figure 5.11.
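For the limit model, p_x = Pr[Bin(x, p) ≥ k] can be evaluated directly. The sketch below (not the thesis code) uses a normal approximation to the binomial tail, which is reasonable here since xp is on the order of a thousand, and samples Y = N_CA1 · p_X for the NB-size distribution used above (mean 14,212, standard deviation 1,712):

```python
import math, random, statistics

p, k, N_CA1 = 0.1, 1450, 320_000   # connection prob., normalized threshold, CA1 size

def p_x(x: float) -> float:
    """Normal approximation of Pr[Bin(x, p) >= k] (limit-model activation)."""
    mu, sd = x * p, math.sqrt(x * p * (1 - p))
    return 0.5 * math.erfc((k - mu) / (sd * math.sqrt(2.0)))

rng = random.Random(2)
ys = [N_CA1 * p_x(rng.gauss(14_212, 1_712)) for _ in range(10_000)]
# The steep activation function skews Y to the right: its mean exceeds its median.
skewed_right = statistics.mean(ys) > statistics.median(ys)
```

Because the mean NB size times p (about 1,421) sits just below k = 1,450, small fluctuations in the NB size are amplified strongly by the binomial tail, which is the mechanism behind the skewed SE size distribution.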


5.5 Proof that normal NBs yield lognormal SEs

Here we provide the formal derivation that a normal NB size distribution yields a lognormal SE size distribution. To this end let X ∼ N(µ_X, σ_X). For analytical tractability we consider the limit of very short NBs (i.e. ∆_NB → 0) and large network size (i.e. N_CA3 → ∞ and N_CA1 → ∞). Further, we assume a noiseless membrane potential (i.e. σ = 0), homogeneous neurons (i.e. σ_T = 0), homogeneous synaptic weights (i.e. σ_w = 0) and constant inhibition. Hence, because of homogeneity, constant inhibition and in the limit of ∆_NB → 0, a CA1 PC spikes in a SE if at least k CA3 PCs that project to it spike in the corresponding NB. Furthermore, to avoid technical details, we ignore the dependencies among different SEs that come from NBs.¹ Formally, the assumption we make is equivalent to setting the synapses at random with a certain probability every time we observe a NB. While this is of course unreasonable, due to the sparsity of the NBs the chance that CA1 PCs share their input is so small that the dependencies are negligible. Assuming the NB has size X = x, this assumption implies that the probability q_X that a particular CA1 PC spikes in the resulting SE is Pr[Bin(x, p) ≥ k], as the number of synaptic inputs from the NB to a CA1 neuron follows a binomial distribution with x trials and (expected) success probability p (recall that each CA1 PC receives in expectation N_CA3 · p synaptic contacts from CA3 PCs) and the CA1 PC spikes if it receives at least k inputs. Observe that q_X is a random variable since it depends on X. Furthermore, denote by Y* the random variable corresponding to the size of the resulting SEs.

¹One can easily observe that the dependencies are not negligible if, say, the size of the NBs were almost the whole of CA3. However, in the setting we consider here, and in the typical biological scenario, the NBs correspond to only a small fraction of the whole of CA3 and this assumption is thus justified.


Because each neuron in CA1 can appear in the SE with probability q_X, we have Y* ∼ Bin(N_CA1, q_X). Due to the assumption of large network size (N_CA1 → ∞), the distribution of Y*, given X = x, is largely concentrated around the value E_x[Y*] = E[Y* | X = x]. Again, to avoid technical details, we will make use of this fact and replace Y* by the random variable Y ∼ E[Y* | X], which can be written as:

Y ∼ N_CA1 · Pr[Bin(X, p) ≥ k].  (5.7)

To show that Y follows a lognormal distribution it suffices to show that log Y follows a normal distribution. Note that from Equation (5.7) it follows that log Y can be written as f(X), with

f(x) := log(N_CA1) + log(Pr[Bin(x, p) ≥ k]).  (5.8)

We start by approximating f(X). We assume µ_X · p = c · k for some small constant c with 0 < c < 1, as otherwise almost all (c ≥ 1 means typically more than 1/2 of CA1) or almost no (a vanishing c means that the probability that a neuron participates is very small due to concentration of the binomial distribution) CA1 neurons participate in any SE. Further, we assume that k and p are functions of the network size. We get

f(X) = log N_CA1 + log( ∑_{i ≥ k} \binom{X}{i} p^i (1 − p)^{X−i} )  (5.9)
     = log N_CA1 + log( \binom{X}{k} p^k (1 − p)^{X−k} · (1 + O(Xp/k)) ),  (5.10)

where we repeatedly use the fact that, for i ≥ k, \binom{X}{i} p^i (1 − p)^{X−i} = O(Xp/k) · \binom{X}{i−1} p^{i−1} (1 − p)^{X−i+1} to simplify the sum. This expression can be further simplified to obtain:

f(X) = log N_CA1 + k log(Xp) − log(k!) − Xp + O(Xp/k + Xp²),  (5.11)

where we just take the log and Taylor expand log(1 − p) up to second order terms (hence the term Xp²). In the sequel we neglect the last term in (5.11) as its expectation is bounded by O(c · k), which is a lower order term. We continue approximating the expectation and the variance of f(X). Using the Taylor approximation of f(X) at µ_X we get

E[f(X)] ≈ f(µ_X) + (f''(µ_X)/2) · σ_X²  (5.12)
        ≈ log N_CA1 + k log(µ_X p) − log(k!) − µ_X p =: µ_f  (5.13)

and

Var[f(X)] ≈ f'(µ_X)² σ_X² + f''(µ_X)² σ_X⁴ / 2  (5.14)
          ≈ (k/µ_X − p)² · σ_X² =: σ_f²,  (5.15)

where we discarded lower order terms. To finally show that log Y ∼ N(µ_f, σ_f) holds we apply the following lemma with µ_n = µ_X · p, σ_n = σ_X · p and k_n = k.

Lemma 5.1. Let X ∼ N(0, 1) and define sequences of random variables

Yn := f (g(X)) and Zn := h(X), (5.16)

where for µ_n, σ_n, k_n ∈ Ω(1), the functions f, g and h are defined as

f(x) := k_n log x − x  (5.17)
g(x) := µ_n + σ_n x  (5.18)
h(x) := k_n log µ_n − µ_n + σ_n (k_n/µ_n − 1) x.  (5.19)

If µ_n = c · k_n = ω(1) holds for some constant c with 0 < c < 1 and σ_n/µ_n = o(1) holds, then

Y_n →_D Z_n  (5.20)

as n → ∞.

Proof. To begin the proof we first define auxiliary random variables

Ỹ_n := Y_n | X ∈ Ω and  (5.21)
Z̃_n := Z_n | X ∈ Ω,  (5.22)

with Ω := [(k_n − µ_n)/σ_n, ∞[. To prove the lemma it suffices to show that

Ỹ_n →_D Z̃_n  (5.23)

holds, as X ∉ Ω occurs with probability o(1). For such convergence in distribution it suffices to show that for any ε > 0 and all t ∈ R, if n is sufficiently large, then

|Pr[Ỹ_n ≤ t] − Pr[Z̃_n ≤ t]| ≤ ε.  (5.24)

Denote by F the cumulative distribution function of X. Since Ỹ_n and Z̃_n condition on the event X ∈ Ω we need to scale their densities by the term

1 / (1 − F((k_n − µ_n)/σ_n)) = 1 + o(1),  (5.25)

as (k_n − µ_n)/σ_n → −∞. Therefore,

Pr[Ỹ_n ≤ t] = (1 + o(1)) · ∫_{S_ỹ} (1/√(2π)) e^{−x²/2} dx and  (5.26)
Pr[Z̃_n ≤ t] = (1 + o(1)) · ∫_{S_z̃} (1/√(2π)) e^{−x²/2} dx  (5.27)

hold, where

S_ỹ = Ω ∩ (f ∘ g)⁻¹(]−∞, t]) and  (5.28)
S_z̃ = Ω ∩ h⁻¹(]−∞, t]).  (5.29)

It suffices to show that both

S_ỹ \ S_z̃ and S_z̃ \ S_ỹ  (5.30)

are vanishing in measure. Note that both h and f ∘ g are monotonously decreasing on the interval Ω (the latter has a maximum at (k_n − µ_n)/σ_n). Hence,

Ω ∩ (f ∘ g)⁻¹(]−∞, t]) = Ω ∩ ]−∞, (f ∘ g)⁻¹(t)] and  (5.31)
Ω ∩ h⁻¹(]−∞, t]) = Ω ∩ ]−∞, h⁻¹(t)].  (5.32)

Therefore, it remains to prove

|(f ∘ g)⁻¹(t) − h⁻¹(t)| = o(1).  (5.33)


First, we have for all x ∈ Ω that

(f ∘ g)(x) = k_n log(µ_n + σ_n x) − µ_n − σ_n x  (5.34)
           = k_n log(µ_n (1 + σ_n x/µ_n)) − µ_n − σ_n x  (5.35)
           = k_n log µ_n + k_n log(1 + σ_n x/µ_n) − µ_n − σ_n x  (5.36)
           = k_n log µ_n + k_n σ_n x/µ_n − µ_n − σ_n x + o(1)  (5.37)
           = k_n log µ_n − µ_n + σ_n (k_n/µ_n − 1) x + o(1)  (5.38)
           = h(x) + o(1),  (5.39)

where we applied a Taylor approximation of log(1 + x) at x = 0. Next, we observe that h(x) is a line with slope δ := σ_n (k_n/µ_n − 1) ∈ Ω(1). Thus, (f ∘ g)(x) = h(x) + o(1) = h(x + o(1)/δ) and therefore (f ∘ g)⁻¹(t) = h⁻¹(t + o(1)/δ). Finally, note that h⁻¹(t) is a line with slope 1/δ and therefore we conclude (f ∘ g)⁻¹(t) = h⁻¹(t + o(1)/δ) = h⁻¹(t) + o(1)/δ² = h⁻¹(t) + o(1).

Bibliography

Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., . . . Zheng, X. (2015). TensorFlow: Large-scale machine learning on heterogeneous systems. (Software available from tensorflow.org) Abeles, M. (1982). Local Cortical Circuits: An Electrophysiological Study. Berlin Heidelberg: Springer. Abeles, M. (1991). Corticonics: Neural Circuits of the Cerebral Cortex. Cambridge: Cambridge University Press. Abeles, M. (2009). Synfire Chains. Scholarpedia, 4(7), 1441. Abeles, M., Bergman, H., Margalit, E., & Vaadia, E. (1993). Spatiotemporal Firing Patterns in the Frontal Cortex of Behaving Monkeys. Journal of Neurophysiology, 70(4), 1629–1638. Abeles, M., Hayon, G., & Lehmann, D. (2004). Modeling Compositionality by Dynamic Binding of Synfire Chains. Journal of Computational Neuroscience, 17(2), 179–201. Abraham, W. C. (2008). Metaplasticity: tuning synapses and networks for plasticity. Nature Reviews Neuroscience, 9(5), 387–387. Aertsen, A., & Braitenberg, V. (1996). Brain Theory: Biological Basis and Computational Principles. Amsterdam: Elsevier. Alon, N., & Spencer, J. H. (2008). The Probabilistic Method. New Jersey: John Wiley & Sons. Amit, D. J., & Fusi, S. (1994). Learning in Neural Networks with Material Synapses. Neural Computation, 6(5), 957–982. Ariav, G., Polsky, A., & Schiller, J. (2003). Submillisecond Precision of the Input-Output Transformation Function Mediated by Fast

Sodium Dendritic Spikes in Basal Dendrites of CA1 Pyramidal Neurons. Journal of Neuroscience, 23(21), 7750–7758. Arnoldi, H.-M. R., Englmeier, K.-H., & Brauer, W. (1999). Translation-Invariant Pattern Recognition Based on Synfire Chains. Biological Cybernetics, 80(6), 433–447. Artola, A., Bröcher, S., & Singer, W. (1990). Different voltage-dependent thresholds for inducing long-term depression and long-term potentiation in slices of rat visual cortex. Nature, 347(6288), 69. Aviel, Y., Mehring, C., Abeles, M., & Horn, D. (2003). On Embedding Synfire Chains in a Balanced Network. Neural Computation, 15(6), 1321–1340. Baldassi, C., Braunstein, A., Brunel, N., & Zecchina, R. (2007). Efficient supervised learning in networks with binary synapses. Proceedings of the National Academy of Sciences, 104(26), 11079–11084. Barlow, H., & Földiák, P. (1989). Adaptation and Decorrelation in the Cortex. In The Computing Neuron (pp. 54–72). Wokingham: Addison-Wesley. Beggs, J. M., & Plenz, D. (2003). Neuronal Avalanches in Neocortical Circuits. Journal of Neuroscience, 23(35), 11167–11177. Ben Dayan Rubin, D. D., & Fusi, S. (2007). Long Memory Lifetimes Require Complex Synapses and Limited Sparseness. Frontiers in Computational Neuroscience, 1, 7. Bezaire, M. J., & Soltesz, I. (2013). Quantitative Assessment of CA1 Local Circuits: Knowledge Base for Interneuron-Pyramidal Cell Connectivity. Hippocampus, 23(9), 751–785. Bi, G.-q., & Poo, M.-m. (1998). Synaptic Modifications in Cultured Hippocampal Neurons: Dependence on Spike Timing, Synaptic Strength, and Postsynaptic Cell Type. Journal of Neuroscience,

18(24), 10464–10472. Bienenstock, E. L. (1991). Notes on the Growth of a Composition Machine. In D. Andler, E. Bienenstock, & B. Laks (Eds.), Proceedings of the First Interdisciplinary Workshop on Compositionality in Cognition and Neural Networks (pp. 25–43). Abbaye de Royaumont. Bienenstock, E. L. (1995). A Model of Neocortex. Network: Computation in Neural Systems, 6(2), 179–224. Bienenstock, E. L., Cooper, L. N., & Munro, P. W. (1982). Theory for the Development of Neuron Selectivity: Orientation Specificity and Binocular Interaction in Visual Cortex. Journal of Neuroscience, 2(1), 32–48. Bishop, C. M. (1996). Neural Networks for Pattern Recognition. USA: Oxford University Press. Bliss, T. V., Collingridge, G. L., & Morris, R. G. (1993). A synaptic model of memory: long-term potentiation in the hippocampus. Nature, 361(6407), 31–39. Bliss, T. V., & Lømo, T. (1973). Long-lasting potentiation of synaptic transmission in the dentate area of the anaesthetized rabbit following stimulation of the perforant path. Journal of Physiology, 232(2), 331–356. Boerlin, M., Machens, C. K., & Denève, S. (2013). Predictive Coding of Dynamical Variables in Balanced Spiking Networks. PLoS Computational Biology, 9(11), e1003258. Bollobás, B. (2001). Random Graphs. Cambridge: Springer. Branco, T., & Staras, K. (2009). The probability of neurotransmitter release: variability and feedback control at single synapses. Nature Reviews Neuroscience, 10(5), 373. Brea, J., Senn, W., & Pfister, J.-P. (2013). Matching Recall and Storage in Sequence Learning with Spiking Neural Networks. Journal of

Neuroscience, 33(23), 9565–9575. Brette, R. (2015). Philosophy of the Spike: Rate-Based vs. Spike-Based Theories of the Brain. Frontiers in Systems Neuroscience, 9, 151. Brown, T. H., Chapman, P. F., Kairiss, E. W., & Keenan, C. L. (1988). Long-Term Synaptic Potentiation. Science, 242(4879), 724–728. Brunel, N. (2000). Dynamics of Sparsely Connected Networks of Excitatory and Inhibitory Spiking Neurons. Journal of Computational Neuroscience, 8(3), 183–208. Brunel, N., & Wang, X.-J. (2003). What Determines the Frequency of Fast Network Oscillations with Irregular Neural Discharges? I. Synaptic Dynamics and Excitation-Inhibition Balance. Journal of Neurophysiology, 90(1), 415–430. Buhl, E., Cobb, S., Halasy, K., & Somogyi, P. (1995). Properties of unitary IPSPs evoked by anatomically identified basket cells in the rat hippocampus. European Journal of Neuroscience, 7(9), 1989–2004. Burkitt, A. N. (2006). A Review of the Integrate-and-Fire Neuron Model: I. Homogeneous Synaptic Input. Biological Cybernetics, 95(1), 1–19. Bush, K., Federer, W., Pesotan, H., & Raghavarao, D. (1984). New combinatorial designs and their applications to group testing. Journal of Statistical Planning and Inference, 10(3), 335–343. Buzsáki, G. (2006). Rhythms of the Brain. USA: Oxford University Press. Buzsáki, G. (2010). Neural Syntax: Cell Assemblies, Synapsembles, and Readers. Neuron, 68(3), 362–385. Buzsáki, G. (2015). Hippocampal Sharp Wave-Ripple: A Cognitive Biomarker for Episodic Memory and Planning. Hippocampus, 25(10), 1073–1188.

Buzsáki, G., Horvath, Z., Urioste, R., Hetke, J., & Wise, K. (1992). High-frequency network oscillation in the hippocampus. Science, 256(5059), 1025–1027. Buzsáki, G., & Mizuseki, K. (2014). The log-dynamic brain: how skewed distributions affect network operations. Nature Reviews Neuroscience, 15(4), 264–278. Buzsáki, G., Vanderwolf, C. H., et al. (1983). Cellular Bases of Hippocampal EEG in the Behaving Rat. Brain Research Reviews, 6(2), 139–171. Cassenaer, S., & Laurent, G. (2012). Conditional Modulation of Spike-Timing-Dependent Plasticity for Olfactory Learning. Nature, 482(7383), 47–52. Chen, J. L., Villa, K. L., Cha, J. W., So, P. T., Kubota, Y., & Nedivi, E. (2012). Clustered Dynamics of Inhibitory Synapses and Dendritic Spines in the Adult Neocortex. Neuron, 74(2), 361–373. Clopath, C., Büsing, L., Vasilaki, E., & Gerstner, W. (2010). Connectivity Reflects Coding: a Model of Voltage-based STDP with Homeostasis. Nature Neuroscience, 13(3), 344–352. Clopath, C., & Gerstner, W. (2010). Voltage and Spike Timing Interact in STDP – a Unified Model. Frontiers in Synaptic Neuroscience, 2, 25. Clothiaux, E. E., Bear, M. F., & Cooper, L. N. (1991). Synaptic Plasticity in Visual Cortex: Comparison of Theory with Experiment. Journal of Neurophysiology, 66(5), 1785–1804. Cohen, M. R., & Maunsell, J. H. (2009). Attention Improves Performance Primarily by Reducing Interneuronal Correlations. Nature Neuroscience, 12(12), 1594–1600. Conrad, R. (1965). Order Error in Immediate Recall of Sequences. Journal of Verbal Learning and Verbal Behavior, 4(3), 161–169.

Cooper, L. N., & Bear, M. F. (2012). The BCM Theory of Synapse Modification at 30: Interaction of Theory with Experiment. Nature Reviews Neuroscience, 13(11), 798–810. Couey, J. J., Meredith, R. M., Spijker, S., Poorthuis, R. B., Smit, A. B., Brussaard, A. B., & Mansvelder, H. D. (2007). Distributed Network Actions by Nicotine Increase the Threshold for Spike-Timing-Dependent Plasticity in Prefrontal Cortex. Neuron, 54(1), 73–87. Csicsvari, J., Hirase, H., Czurkó, A., Mamiya, A., & Buzsáki, G. (1999a). Fast Network Oscillations in the Hippocampal CA1 Region of the Behaving Rat. Journal of Neuroscience, 19(RC20), 1–4. Csicsvari, J., Hirase, H., Czurkó, A., Mamiya, A., & Buzsáki, G. (1999b). Oscillatory Coupling of Hippocampal Pyramidal Cells and Interneurons in the Behaving Rat. Journal of Neuroscience, 19(1), 274–287. Csicsvari, J., Hirase, H., Mamiya, A., & Buzsáki, G. (2000). Ensemble Patterns of Hippocampal CA3-CA1 Neurons during Sharp Wave–Associated Population Events. Neuron, 28(2), 585–594. Cummings, J. A., Mulkey, R. M., Nicoll, R. A., & Malenka, R. C. (1996). Ca2+ Signaling Requirements for Long-Term Depression in the Hippocampus. Neuron, 16(4), 825–833. Cutsuridis, V., & Hasselmo, M. (2010). Dynamics and Function of a CA1 Model of the Hippocampus during Theta and Ripples. In International Conference on Artificial Neural Networks (pp. 230–240). Cutsuridis, V., & Taxidis, J. (2013). Deciphering the role of CA1 inhibitory circuits in sharp wave-ripple complexes. Frontiers in Systems Neuroscience, 7, 13. Dale, H. H. (1935). Pharmacology and nerve-endings. Proceedings of

the Royal Society of Medicine, 28(3), 319–332. Dayan, P., & Abbott, L. F. (2001). Theoretical Neuroscience: Computational and Mathematical Modeling of Neural Systems. Cambridge MA: MIT Press. Denève, S., Alemi, A., & Bourdoukan, R. (2017). The Brain as an Efficient and Robust Adaptive Learner. Neuron, 94(5), 969–977. Denève, S., & Machens, C. K. (2016). Efficient codes and balanced networks. Nature Neuroscience, 19(3), 375–382. Deuchars, J., & Thomson, A. (1996). Ca1 pyramid-pyramid connections in rat hippocampus in vitro: dual intracellular recordings with biocytin filling. Neuroscience, 74(4), 1009–1018. Diesmann, M., Gewaltig, M.-O., & Aertsen, A. (1999). Stable propagation of synchronous spiking in cortical neural networks. Nature, 402(6761), 529–533. Donoso, J. R., Schmitz, D., Maier, N., & Kempter, R. (2018). Hippocampal Ripple Oscillations and Inhibition-First Network Models: Frequency Dynamics and Response to GABA Modulators. Journal of Neuroscience, 38(12), 3124–3146. Draguhn, A., Traub, R., Schmitz, D., & Jefferys, J. (1998). Electrical coupling underlies high-frequency oscillations in the hippocampus in vitro. Nature, 394(6689), 189. Drew, L. J., Kheirbek, M. A., Luna, V. M., Denny, C. A., Cloidt, M. A., Wu, M. V., . . . Hen, R. (2016). Activation of Local Inhibitory Circuits in the Dentate Gyrus by Adult-Born Neurons. Hippocampus, 26(6), 763–778. Dubhashi, D. P., & Panconesi, A. (2009). Concentration of Measure for the Analysis of Randomized Algorithms. Cambridge: Cambridge University Press. Dudek, S. M., & Bear, M. F. (1992). Homosynaptic long-term depression in area CA1 of hippocampus and effects of N-methyl-D-aspartate receptor blockade. Proceedings of the National Academy of Sciences, 89(10), 4363–4367. Dupret, D., O'Neill, J., Pleydell-Bouverie, B., & Csicsvari, J. (2010). The reorganization and reactivation of hippocampal maps predict spatial memory performance. Nature Neuroscience, 13(8), 995–1002. Einarsson, H., Lengler, J., Panagiotou, K., Mousset, F., & Steger, A. (2014). Bootstrap percolation with inhibition. arXiv preprint arXiv:1410.3291. Einarsson, H., Lengler, J., & Steger, A. (2014). A High-Capacity Model for One Shot Association Learning in the Brain. Frontiers in Computational Neuroscience, 8, 140. English, D. F., Peyrache, A., Stark, E., Roux, L., Vallentin, D., Long, M. A., & Buzsáki, G. (2014). Excitation and Inhibition Compete to Control Spiking during Hippocampal Ripples: Intracellular Study in Behaving Mice. Journal of Neuroscience, 34(49), 16509–16517. Erdős, P., & Rényi, A. (1959). On Random Graphs. Publicationes Mathematicae Debrecen, 6, 290–297. Erickson, M. A., Maramara, L. A., & Lisman, J. (2010). A Single Brief Burst Induces GluR1-Dependent Associative Short-Term Potentiation: a Potential Mechanism for Short-term Memory. Journal of Cognitive Neuroscience, 22(11), 2530–2540. Ernst, M. O., & Banks, M. S. (2002). Humans integrate visual and haptic information in a statistically optimal fashion. Nature, 415(6870), 429. Eschenko, O., Ramadan, W., Mölle, M., Born, J., & Sara, S. J. (2008). Sustained increase in hippocampal sharp-wave ripple activity

during slow-wave sleep after learning. Learning & Memory, 15(4), 222–228. Eurich, C. W., Herrmann, J. M., & Ernst, U. A. (2002). Finite-size effects of avalanche dynamics. Physical Review E, 66(6), 066137. Fechner, G. T. (1860). Elemente der Psychophysik. Leipzig: Breitkopf und Härtl. Fiete, I. R., Senn, W., Wang, C. Z., & Hahnloser, R. H. (2010). Spike-Time-Dependent Plasticity and Heterosynaptic Competition Organize Networks to Produce Long Scale-Free Sequences of Neural Activity. Neuron, 65(4), 563–576. Flores, C. E., & Méndez, P. (2014). Shaping inhibition: activity dependent structural plasticity of GABAergic synapses. Frontiers in Cellular Neuroscience, 8, 327. Földiák, P. (1990). Forming sparse representations by local anti-Hebbian learning. Biological Cybernetics, 64(2), 165–170. Frémaux, N., & Gerstner, W. (2015). Neuromodulated Spike-Timing-Dependent Plasticity, and Theory of Three-Factor Learning Rules. Frontiers in Neural Circuits, 9, 85. Friedrich, J., Urbanczik, R., & Senn, W. (2011). Spatio-Temporal Credit Assignment in Neuronal Population Learning. PLoS Computational Biology, 7(6), e1002092. Froemke, R. C., & Dan, Y. (2002). Spike-timing-dependent synaptic modification induced by natural spike trains. Nature, 416(6879), 433–438. Füredi, Z. (1996). On r-Cover-free Families. Journal of Combinatorial Theory, Series A, 73(1), 172–173. Gauy, M. M., Lengler, J., Einarsson, H., Meier, F., Weissenberger, F., Yanik, M. F., & Steger, A. (2018). A hippocampal model for behavioral time acquisition and fast bidirectional replay of

spatio-temporal memory sequences. bioRxiv, 343988. Gauy, M. M., Meier, F., & Steger, A. (2017). Multiassociative Memory: Recurrent Synapses Increase Storage Capacity. Neural Computation, 29(5), 1375–1405. Gerstein, G. L., Williams, E. R., Diesmann, M., Grün, S., & Trengove, C. (2012). Detecting Synfire Chains in Parallel Spike Data. Journal of Neuroscience Methods, 206(1), 54–64. Gerstner, W., Kempter, R., van Hemmen, J. L., & Wagner, H. (1996). A neuronal learning rule for sub-millisecond temporal coding. Nature, 383(6595), 76–78. Gerstner, W., Kistler, W. M., Naud, R., & Paninski, L. (2014). Neuronal Dynamics: From Single Neurons to Networks and Models of Cognition. Cambridge: Cambridge University Press. Gewaltig, M.-O., & Diesmann, M. (2007). NEST (NEural Simulation Tool). Scholarpedia, 2(4), 1430. Gewaltig, M.-O., Diesmann, M., & Aertsen, A. (2001). Propagation of Cortical Synfire Activity: Survival Probability in Single Trials and Stability in the Mean. Neural Networks, 14(6), 657–673. Girardeau, G., Benchenane, K., Wiener, S. I., Buzsáki, G., & Zugaro, M. B. (2009). Selective suppression of hippocampal ripples impairs spatial memory. Nature Neuroscience, 12(10), 1222–1223. Goedeke, S., & Diesmann, M. (2008). The Mechanism of Synchronization in Feed-Forward Neuronal Networks. New Journal of Physics, 10(1), 015007. Graupner, M., & Brunel, N. (2012). Calcium-based plasticity model explains sensitivity of synaptic changes to spike pattern, rate, and dendritic location. Proceedings of the National Academy of Sciences, 109(10), 3991–3996. Graupner, M., Wallisch, P., & Ostojic, S. (2016). Natural Firing Patterns

172 Imply Low Sensitivity of Synaptic Plasticity to Spike Timing Compared with Firing Rate. Journal of Neuroscience, 36(44), 11238– 11258. Griffith, J. (1963). On the Stability of Brain-Like Structures. Biophysical Journal, 3(4), 299–308. Habenschuss, S., Jonke, Z., & Maass, W. (2013). Stochastic Computa- tions in Cortical Microcircuit Models. PLoS Computational Biology, 9(11), e1003311. Hahnloser, R. H., Kozhevnikov, A. A., & Fee, M. S. (2002). An ultra- sparse code underlies the generation of neural sequences in a songbird. Nature, 419(6902), 65–70. Hájos, N., Karlócai, M. R., Németh, B., Ulbert, I., Monyer, H., Szabó, G., . . . Gulyás, A. I. (2013). Input-Output Features of Anatomically Identified CA3 Neurons during Hippocampal Sharp Wave/Rip- ple Oscillation In Vitro. Journal of Neuroscience, 33(28), 11677– 11691. Hayon, G., Abeles, M., & Lehmann, D. (2005). A Model for Repre- senting the Dynamics of a System of Synfire Chains. Journal of Computational Neuroscience, 18(1), 41–53. Hebb, D. O. (1949). The Organization of Behavior: A Neuropsychological Theory. New York: John Wiley & Sons. Herrmann, M., Hertz, J., & Prügel-Bennett, A. (1995). Analysis of Synfire Chains. Network: Computation in Neural Systems, 6(3), 403–414. Hertz, J. (1997). Modelling Synfire Processing. In K.-Y. M. Wong, I. King, & D.-Y. Yeung (Eds.), Theoretical Aspects of Neural Compu- tation (pp. 135—144). Springer. Hertz, J., & Prügel-Bennett, A. (1996). Learning Short Synfire Chains by Self-Organization. Network: Computation in Neural Systems,

173 7(2), 357–363. Herz, A. V., & Hopfield, J. J. (1995). Earthquake Cycles and Neural Reverberations: Collective Oscillations in Systems with Pulse- Coupled Threshold Elements. Physical Review Letters, 75(6), 1222– 1225. Hodgkin, A. L., & Huxley, A. F. (1952). A quantitative description of membrane current and its application to conduction and excitation in nerve. Journal of Physiology, 117(4), 500–544. Hopfield, J. J. (1982). Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences, 79(8), 2554–2558. Hulse, B. K., Moreaux, L. C., Lubenov, E. V., & Siapas, A. G. (2016). Membrane Potential Dynamics of CA1 Pyramidal Neurons Dur- ing Hippocampal Ripples in Awake Mice. Neuron, 89(4), 800– 813. Ikegaya, Y., Aaron, G., Cossart, R., Aronov, D., Lampl, I., Ferster, D., & Yuste, R. (2004). Synfire Chains and Cortical Songs: Temporal Modules of Cortical Activity. Science, 304(5670), 559–564. Ikegaya, Y., Sasaki, T., Ishikawa, D., Honma, N., Tao, K., Takahashi, N., . . . Matsuki, N. (2012). Interpyramid Spike Transmission Stabilizes the Sparseness of Recurrent Network Activity. Cerebral Cortex, 23(2), 293–304. Intrator, N., & Cooper, L. N. (1992). Objective Function Formula- tion of the BCM Theory of Visual Cortical Plasticity: Statistical Connections, Stability Conditions. Neural Networks, 5(1), 3–17. Isaac, J. T., Nicoll, R. A., & Malenka, R. C. (1995). Evidence for Silent Synapses: Implications for the Expression of LTP. Neuron, 15(2), 427–434. Isaacson, J. S., & Scanziani, M. (2011). How Inhibition Shapes Cortical

174 Activity. Neuron, 72(2), 231–243. Izhikevich, E. M. (2006). Polychronization: Computation with Spikes. Neural Computation, 18(2), 245–282. Izhikevich, E. M., & Desai, N. S. (2003). Relating STDP to BCM. Neural Computation, 15(7), 1511–1523. Jacquemin, C. (1994). A Temporal Connectionist Approach to Natural Language. ACM SIGART Bulletin, 5(3), 12–22. Janson, S., Łuczak, T., Turova, T., & Vallier, T. (2012). Bootstrap percolation on the random graph Gn,p. The Annals of Applied Probability, 22(5), 1989–2047. Jun, J. K., & Jin, D. Z. (2007). Development of Neural Circuitry for Precise Temporal Sequences through Spontaneous Activity, Axon Remodeling, and Synaptic Plasticity. PLoS One, 2(8), e723. Kappel, D., Habenschuss, S., Legenstein, R., & Maass, W. (2015). Network Plasticity as Bayesian Inference. PLoS Computational Biology, 11(11), e1004485. Kautz, W., & Singleton, R. (1964). Nonrandom Binary Superimposed Codes. IEEE Transactions on , 10(4), 363–377. Kempter, R., Gerstner, W., & van Hemmen, J. L. (1999). Hebbian Learning and Spiking Neurons. Physical Review E, 59(4), 4498– 4514. Kitano, K., Câteau, H., & Fukai, T. (2002). Self-Organization of Memory Activity through Spike-Timing-Dependent Plasticity. Neuroreport, 13(6), 795–798. Klausberger, T., & Somogyi, P. (2008). Neuronal Diversity and Tem- poral Dynamics: The Unity of Hippocampal Circuit Operations. Science, 321(5885), 53–57. Knoblauch, A., Palm, G., & Sommer, F. T. (2010). Memory Capacities for Synaptic and Structural Plasticity. Neural Computation, 22(2),

175 289–341. Koch, C., Poggio, T., & Torre, V. (1983). Nonlinear interactions in a dendritic tree: localization, timing, and role in information processing. Proceedings of the National Academy of Sciences, 80(9), 2799–2802. Koulakov, A. A., Hromádka, T., & Zador, A. M. (2009). Correlated Con- nectivity and the Distribution of Firing Rates in the Neocortex. Journal of Neuroscience, 29(12), 3685–3694. Kumar, A., Rotter, S., & Aertsen, A. (2008). Conditions for Prop- agating Synchronous Spiking and Asynchronous Firing Rates in a Cortical Network Model. Journal of Neuroscience, 28(20), 5268–5280. Kvalseth, T. (1982). Some Informational Properties of the Lognormal Distribution. IEEE Transactions on Information Theory, 28(6), 963– 966. Kwon, O.-B., Longart, M., Vullhorst, D., Hoffman, D. A., & Buonanno, A. (2005). Neuregulin-1 Reverses Long-Term Potentiation at CA1 Hippocampal Synapses. Journal of Neuroscience, 25(41), 9378–9383. La Camera, G., Giugliano, M., Senn, W., & Fusi, S. (2008). The response of cortical neurons to in vivo-like input current: theory and experiment. Biological Cybernetics, 99(4-5), 279–301. Lánský, P. (1984). On Approximations of Stein’s Neuronal Model. Journal of Theoretical Biology, 107(4), 631–647. Lashley, K. S. (1951). The Problem of Serial Order in Behavior. In L. A. Jeffress (Ed.), Cerebral Mechanisms in Behavior (pp. 112–131). Wiley. Lazar, A., Pipa, G., & Triesch, J. (2009). SORN: a Self-Organizing Re- current Neural Network. Frontiers in Computational Neuroscience,

176 3, 23. LeCun, Y. (1998). The MNIST database of handwritten digits. Retrieved 2018, from http://yann.lecun.com/exdb/mnist/ Leibold, C., & Kempter, R. (2006). Memory Capacity for Sequences in a Recurrent Network with Biological Constraints. Neural Computation, 18(4), 904–941. Leibold, C., & Kempter, R. (2008). Sparseness Constrains the Prolon- gation of Memory Lifetime via Synaptic Metaplasticity. Cerebral Cortex, 18(1), 67–77. Levy, N., Horn, D., Meilijson, I., & Ruppin, E. (2001). Distributed Synchrony in a Cell Assembly of Spiking Neurons. Neural Networks, 14(6), 815–824. Lisman, J., & Spruston, N. (2005). Postsynaptic depolarization require- ments for LTP and LTD: a critique of spike timing-dependent plasticity. Nature Neuroscience, 8(7), 839–841. Litwin-Kumar, A., & Doiron, B. (2014). Formation and maintenance of neuronal assemblies through synaptic plasticity. Nature Com- munications, 5, 5319. Loewenstein, Y., Kuras, A., & Rumpel, S. (2011). Multiplicative Dy- namics Underlie the Emergence of the Log-Normal Distribution of Spine Sizes in the Neocortex In Vivo. Journal of Neuroscience, 31(26), 9481–9488. London, M., & Häusser, M. (2005). Dendritic Computation. Annual Reviews Neuroscience, 28, 503–532. London, M., Roth, A., Beeren, L., Häusser, M., & Latham, P. E. (2010). Sensitivity to perturbations in vivo implies high noise and sug- gests rate coding in cortex. Nature, 466(7302), 123. Long, M. A., Jin, D. Z., & Fee, M. S. (2010). Support for a synaptic chain model of neuronal sequence generation. Nature, 468(7322),

177 394–399. Luczak, A., Barthó, P., Marguet, S. L., Buzsáki, G., & Harris, K. D. (2007). Sequential structure of neocortical spontaneous activity in vivo. Proceedings of the National Academy of Sciences, 104(1), 347–352. Maass, W., Natschläger, T., & Markram, H. (2002). Real-Time Com- puting without Stable States: A New Framework for Neural Computation Based on Perturbations. Neural Computation, 14(11), 2531–2560. Malerba, P., Krishnan, G. P., Fellous, J.-M., & Bazhenov, M. (2016). Hippocampal CA1 Ripples as Inhibitory Transients. PLoS Com- putational Biology, 12(4), e1004880. Malvache, A., Reichinnek, S., Villette, V., Haimerl, C., & Cossart, R. (2016). Awake hippocampal reactivations project onto orthogo- nal neuronal assemblies. Science, 353(6305), 1280–1283. Markram, H., Helm, P. J., & Sakmann, B. (1995). Dendritic calcium transients evoked by single back-propagating action potentials in rat neocortical pyramidal neurons. Journal of Physiology, 485(1), 1–20. Markram, H., Lübke, J., Frotscher, M., Roth, A., & Sakmann, B. (1997). Physiology and anatomy of synaptic connections between thick tufted pyramidal neurones in the developing rat neocortex. The Journal of Physiology, 500(2), 409–440. Markram, H., Lübke, J., Frotscher, M., & Sakmann, B. (1997). Regu- lation of Synaptic Efficacy by Coincidence of Postsynaptic APs and EPSPs. Science, 275(5297), 213–215. Marr, D., Willshaw, D., & McNaughton, B. (1991). Simple Memory: A Theory for Archicortex. In From the Retina to the Neocortex (pp. 59–128). Birkhäuser.

178 Mason, A., Nicoll, A., & Stratford, K. (1991). Synaptic Transmission Between Individual Pyramidal Neurons of the Rat Visual Cortex In Vitro. Journal of Neuroscience, 11(1), 72–84. Mayr, C. G., & Partzsch, J. (2010). Rate and Pulse Based Plasticity Governed by Local Synaptic State Variables. Frontiers in Synaptic Neuroscience, 2, 33. McCulloch, W., & Pitts, W. (1943). A Logical of the Ideas Immanent in Nervous Activity. The Bulletin of Mathematical Biophysics, 5(4), 115–133. Mehring, C., Hehl, U., Kubo, M., Diesmann, M., & Aertsen, A. (2003). Activity Dynamics and Propagation of Synchronous Spiking in Locally Connected Random Networks. Biological Cybernetics, 88(5), 395–408. Memmesheimer, R.-M. (2010). Quantitative prediction of intermittent high-frequency oscillations in neural networks with supralinear dendritic interactions. Proceedings of the National Academy of Sciences, 107(24), 11092–11097. Miller, A., & Jin, D. Z. (2013). Potentiation Decay of Synapses and Length Distributions of Synfire Chains Self-Organized in Recur- rent Neural Networks. Physical Review E, 88(6), 062716. Mitzenmacher, M., & Upfal, E. (2005). Probability and computing: Randomized algorithms and probabilistic analysis. Cambridge: Cam- bridge University Press. Mizuseki, K., & Buzsáki, G. (2013). Preconfigured, Skewed Distribu- tion of Firing Rates in the Hippocampus and Entorhinal Cortex. Cell Reports, 4(5), 1010–1021. Montgomery, J. M., & Madison, D. V. (2004). Discrete Synaptic States Define a Major Mechanism of Synapse Plasticity. Trends in Neurosciences, 27(12), 744–750.

179 Montgomery, J. M., Pavlidis, P., & Madison, D. V. (2001). Pair Record- ings Reveal All-Silent Synaptic Connections and the Postsynaptic Expression of Long-Term Potentiation. Neuron, 29(3), 691–701. Mulkey, R. M., & Malenka, R. C. (1992). Mechanisms Underlying Induction of Homosynaptic Long-Term Depression in Area CA1 of the Hippocampus. Neuron, 9(5), 967–975. Muller, R. U., Stead, M., & Pach, J. (1996). The Hippocampus as a Cognitive Graph. Journal of General Physiology, 107(6), 663–694. Nádasdy, Z., Hirase, H., Czurkó, A., Csicsvari, J., & Buzsáki, G. (1999). Replay and Time Compression of Recurring Spike Sequences in the Hippocampus. Journal of Neuroscience, 19(21), 9497–9507. Ngezahayo, A., Schachner, M., & Artola, A. (2000). Synaptic Activity Modulates the Induction of Bidirectional Synaptic Changes in Adult Mouse Hippocampus. Journal of Neuroscience, 20(7), 2451– 2458. Nobile, A., Ricciardi, L., & Sacerdote, L. (1985). Exponential Trends of Ornstein–Uhlenbeck First-Passage-Time Densities. Journal of Applied Probability, 22(02), 360–369. O’Connor, D. H., Wittenberg, G. M., & Wang, S. S.-H. (2005). Graded bidirectional synaptic plasticity is composed of switch-like uni- tary events. Proceedings of the National Academy of Sciences, 102(27), 9679–9684. Oja, E. (1982). Simplified Neuron Model as a Principal Component Analyzer. Journal of Mathematical Biology, 15(3), 267–273. O’Keefe, J., & Dostrovsky, J. (1971). The hippocampus as a spatial map. preliminary evidence from unit activity in the freely-moving rat. Brain Research, 34(1), 171–175. Okubo, T. S., Mackevicius, E. L., Payne, H. L., Lynch, G. F., & Fee, M. S. (2015). Growth and splitting of neural sequences in songbird

180 vocal development. Nature, 528(7582), 352–357. Okun, M., & Lampl, I. (2008). Instantaneous correlation of excitation and inhibition during ongoing and sensory-evoked activities. Nature Neuroscience, 11(5), 535–537. Okun, M., & Lampl, I. (2009). Balance of Excitation and Inhibition. Scholarpedia, 4(8), 7467. Omura, Y., Carvalho, M. M., Inokuchi, K., & Fukai, T. (2015). A Lognormal Recurrent Network Model for Burst Generation dur- ing Hippocampal Sharp Waves. Journal of Neuroscience, 35(43), 14585–14601. Oster, M., Douglas, R., & Liu, S.-C. (2009). Computation with Spikes in a Winner-Take-All Network. Neural Computation, 21(9), 2437– 2465. Pawlak, V., Wickens, J. R., Kirkwood, A., & Kerr, J. N. (2010). Timing is Not Everything: Neuromodulation Opens the STDP Gate. Frontiers in Synaptic Neuroscience, 2, 146. Petersen, C. C., Malenka, R. C., Nicoll, R. A., & Hopfield, J. J. (1998). All-or-none potentiation at CA3-CA1 synapses. Proceedings of the National Academy of Sciences, 95(8), 4732–4737. Pfister, J.-P., & Gerstner, W. (2005). Beyond Pair-Based STDP: A Phenomenological Rule for Spike Triplet and Frequency Effects. In Advances in Neural Information Processing Systems (pp. 1081– 1088). Poirazi, P., Brannon, T., & Mel, B. W. (2003). Pyramidal Neuron as Two-Layer Neural Network. Neuron, 37(6), 989–999. Poirazi, P., & Mel, B. W. (2001). Impact of Active Dendrites and Structural Plasticity on the Memory Capacity of Neural Tissue. Neuron, 29(3), 779–796. Potjans, W., Morrison, A., & Diesmann, M. (2010). Enabling Functional

181 Neural Circuit Simulations with Distributed Computing of Neu- romodulated Plasticity. Frontiers in Computational Neuroscience, 4, 141. Prut, Y., Vaadia, E., Bergman, H., Haalman, I., Slovin, H., & Abeles, M. (1998). Spatiotemporal Structure of Cortical Activity: Properties and Behavioral Relevance. Journal of Neurophysiology, 79(6), 2857– 2874. Rajan, K., Harvey, C. D., & Tank, D. W. (2016). Recurrent Network Models of Sequence Generation and Memory. Neuron, 90(1), 128–142. Renart, A., De La Rocha, J., Bartho, P., Hollender, L., Parga, N., Reyes, A., & Harris, K. D. (2010). The Asynchronous State in Cortical Circuits. Science, 327(5965), 587–590. Reyes, A. D. (2003). Synchrony-dependent propagation of firing rate in iteratively constructed networks in vitro. Nature Neuroscience, 6(6), 593–599. Rieke, F., Warland, D., de Ruyter van Steveninck, R., & Bialek, W. (1999). Spikes: Exploring the Neural Code. Cambridge MA: MIT Press. Ritter, G. X., Sussner, P., & Diza-de Leon, J. (1998). Morphological Associative Memories. IEEE Transactions on Neural Networks, 9(2), 281–293. Rolls, E. T., Stringer, S. M., & Elliot, T. (2006). Entorhinal cortex grid cells can map to hippocampal place cells by competitive learning. Network: Computation in Neural Systems, 17(4), 447–465. Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65(6), 386–408. Roxin, A., Brunel, N., Hansel, D., Mongillo, G., & van Vreeswijk,

182 C. (2011). On the Distribution of Firing Rates in Networks of Cortical Neurons. Journal of Neuroscience, 31(45), 16217–16226. Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323(6088), 533–536. Sayer, R., Friedlander, M., & Redman, S. (1990). The Time Course and Amplitude of EPSPs Evoked at Synapses Between Pairs of CA3/CA1 Neurons in the Hippocampal Slice. Journal of Neuroscience, 10(3), 826–836. Schiess, M., Urbanczik, R., & Senn, W. (2016). Somato-Dendritic Synaptic Plasticity and Error-Backpropagation in Active Den- drites. PLoS Computational Biology, 12(2), e1004638. Schiller, J., Major, G., Koester, H. J., & Schiller, Y. (2000). NMDA spikes in basal dendrites of cortical pyramidal neurons. Nature, 404(6775), 285–289. Schiller, J., Schiller, Y., Stuart, G., & Sakmann, B. (1997). Calcium action potentials restricted to distal apical dendrites of rat neocortical pyramidal neurons. Journal of Physiology, 505(3), 605–616. Schlingloff, D., Káli, S., Freund, T. F., Hájos, N., & Gulyás, A. I. (2014). Mechanisms of Sharp Wave Initiation and Ripple Generation. Journal of Neuroscience, 34(34), 11385–11398. Segev, R., Baruchi, I., Hulata, E., & Ben-Jacob, E. (2004). Hidden Neuronal Correlations in Cultured Networks. Physical Review Letters, 92(11), 118102. Sejnowski, T. J., Koch, C., & Churchland, P. S. (1988). Computational Neuroscience. Science, 241(4871), 1299–1306. Seol, G. H., Ziburkus, J., Huang, S., Song, L., Kim, I. T., Takamiya, K., . . . Kirkwood, A. (2007). Neuromodulators Control the Polarity of Spike-Timing-Dependent Synaptic Plasticity. Neuron, 55(6),

183 919–929. Shadlen, M. N., & Newsome, W. T. (1994). Noise, Neural Codes and Cortical Organization. Current Opinion in Neurobiology, 4(4), 569–579. Shadlen, M. N., & Newsome, W. T. (1998). The Variable Discharge of Cortical Neurons: Implications for Connectivity, Computation, and Information Coding. Journal of Neuroscience, 18(10), 3870– 3896. Shouval, H. Z., Castellani, G. C., Blais, B. S., Yeung, L. C., & Cooper, L. N. (2002). Converging Evidence for a Simplified Biophysical Model of Synaptic Plasticity. Biological Cybernetics, 87(5), 383– 391. Sjöström, P. J., Turrigiano, G. G., & Nelson, S. B. (2001). Rate, Timing, and Cooperativity Jointly Determine Cortical Synaptic Plasticity. Neuron, 32(6), 1149–1164. Sjöström, P. J., Turrigiano, G. G., & Nelson, S. B. (2004). Endocannabinoid-Dependent Neocortical Layer-5 LTD in the Absence of Postsynaptic Spiking. Journal of Neurophysiology, 92(6), 3338–3343. Skaggs, W. E., & McNaughton, B. L. (1996). Replay of Neuronal Firing Sequences in Rat Hippocampus During Sleep Following Spatial Experience. Science, 271(5257), 1870–1873. Softky, W. R., & Koch, C. (1993). The Highly Irregular Firing of Corti- cal Cells is Inconsistent with Temporal Integration of Random EPSPs. Journal of Neuroscience, 13(1), 334–350. Song, S., Sjöström, P., Reigl, M., Nelson, S., & Chklovskii, D. (2005). Highly Nonrandom Features of Synaptic Connectivity in Local Cortical Circuits. PLoS Biology, 3(3), e68. Stark, E., Roux, L., Eichler, R., Senzai, Y., Royer, S., & Buzsáki, G. (2014).

184 Pyramidal Cell-Interneuron Interactions Underlie Hippocampal Ripple Oscillations. Neuron, 83(2), 467–480. Steele, P. M., & Mauk, M. D. (1999). Inhibitory Control of LTP and LTD: Stability of Synapse Strength. Journal of Neurophysiology, 81(4), 1559–1566. Stein, R. B. (1965). A Theoretical Analysis of Neuronal Variability. Biophysical Journal, 5(2), 173–194. Stein, R. B. (1967). The Information Capacity of Nerve Cells Using a Frequency Code. Biophysical Journal, 7(6), 797–826. Stuart, G. J., & Spruston, N. (2015). Dendritic integration: 60 years of progress. Nature Neuroscience, 18(12), 1713–1721. Sullivan, D., Csicsvari, J., Mizuseki, K., Montgomery, S., Diba, K., & Buzsáki, G. (2011). Relationships between Hippocampal Sharp Waves, Ripples, and Fast Gamma Oscillation: Influence of Dentate and Entorhinal Cortical Activity. Journal of Neuroscience, 31(23), 8605–8616. Sutton, J. (1997). Gibrat’s Legacy. Journal of Economic Literature, 35(1), 40–59. Suzuki, S. S., & Smith, G. K. (1988). Spontaneous EEG spikes in the nor- mal hippocampus. IV. Effects of medial septum and entorhinal cortex lesions. Electroencephalography and Clinical Neurophysiology, 70(1), 73–83. Tang, A., Jackson, D., Hobbs, J., Chen, W., Smith, J. L., Patel, H., . . . Beggs, J. M. (2008). A Maximum Entropy Model Applied to Spatial and Temporal Correlations from Cortical Networks In Vitro. Journal of Neuroscience, 28(2), 505–518. Taxidis, J., Coombes, S., Mason, R., & Owen, M. R. (2012). Modeling Sharp Wave-Ripple Complexes Through a CA3-CA1 Network Model with Chemical Synapses. Hippocampus, 22(5), 995–1017.

185 Taxidis, J., Mizuseki, K., Mason, R., & Owen, M. R. (2013). Influ- ence of slow oscillation on hippocampal activity and ripples through cortico-hippocampal synaptic interactions, analyzed by a cortical-CA3-CA1 network model. Frontiers in Computational Neuroscience, 7, 3. Thomson, A. M., & Radpour, S. (1991). Excitatory connections between CA1 pyramidal cells revealed by spike triggered averaging in slices of rat hippocampus are partially NMDA receptor medi- ated. European Journal of Neuroscience, 3(6), 587–601. Toyoizumi, T., Pfister, J.-P., Aihara, K., & Gerstner, W. (2005). General- ized Bienenstock-Cooper-Munro rule for spiking neurons that maximizes information transmission. Proceedings of the National Academy of Sciences, 102(14), 5239–5244. Traub, R. D., & Bibbig, A. (2000). A Model of High-Frequency Ripples in the Hippocampus Based on Synaptic Coupling Plus Axon– Axon Gap Junctions between Pyramidal Neurons. Journal of Neuroscience, 20(6), 2086–2093. Trengove, C., van Leeuwen, C., & Diesmann, M. (2013). High-Capacity Embedding of Synfire Chains in a Cortical Network Model. Journal of Computational Neuroscience, 34(2), 185–209. Tuckwell, H. C. (1988). Introduction to Theoretical Neurobiology: Nonlinear and Stochastic Theories. Cambridge: Cambridge University Press. Urbanczik, R., & Senn, W. (2009). Reinforcement learning in popula- tions of spiking neurons. Nature Neuroscience, 12(3), 250–252. van Vreeswijk, C., & Sompolinsky, H. (1996). Chaos in Neuronal Net- works with Balanced Excitatory and Inhibitory Activity. Science, 274(5293), 1724–1726. van Vreeswijk, C., & Sompolinsky, H. (1998). Chaotic Balanced State in a Model of Cortical Circuits. Neural Computation, 10(6), 1321–

186 1371. Vogels, T. P., Rajan, K., & Abbott, L. (2005). Neural Network Dynamics. Annual Reviews Neuroscience, 28, 357–376. Vogels, T. P., Sprekeler, H., Zenke, F., Clopath, C., & Gerstner, W. (2011). Inhibitory Plasticity Balances Excitation and Inhibition in Sensory Pathways and Memory Networks. Science, 334(6062), 1569–1573. Waddington, A., Appleby, P. A., De Kamps, M., & Cohen, N. (2012). Triphasic Spike-Timing-Dependent Plasticity Organizes Net- works to Produce Robust Sequences of Neural Activity. Frontiers in Computational Neuroscience, 6, 88. Wang, H.-X., Gerkin, R. C., Nauen, D. W., & Bi, G.-Q. (2005). Coactiva- tion and timing-dependent integration of synaptic potentiation and depression. Nature Neuroscience, 8(2), 187–193. Wasserman, L. (2013). All of Statistics: A Concise Course in Statistical Inference. New York: Springer. Weber, E. H. (1834). De pulsu, resorptione, auditu et tactu: Annotationes anatomicae et physiologicae. Leipzig: CF Koehler. Weissenberger, F., Einarsson, H., Gauy, M. M., Meier, F., Mujika, A., Lengler, J., & Steger, A. (2018). On the Origin of Lognormal Network Synchrony in CA1. Hippocampus. Weissenberger, F., Gauy, M. M., Lengler, J., Meier, F., & Steger, A. (2018). Voltage dependence of synaptic plasticity is essential for rate based learning with short stimuli. Scientific Reports, 8(1), 4609. Weissenberger, F., Meier, F., Lengler, J., Einarsson, H., & Steger, A. (2017). Long Synfire Chains Emerge by Spike-Timing Dependent Plasticity Modulated by Population Activity. International Journal of Neural Systems, 27(08), 1750044.

187 Wiechert, M. T., Judkewitz, B., Riecke, H., & Friedrich, R. W. (2010). Mechanisms of pattern decorrelation by recurrent neuronal cir- cuits. Nature Neuroscience, 13(8), 1003–1010. Wikipedia. (2017). Disjunct matrix. Retrieved 2018, from https://en.wikipedia.org/w/index.php?title=Disjunct _matrix&oldid=790379334 Willshaw, D. J., Buneman, O. P., & Longuet-Higgins, H. C. (1969). Non-holographic associative memory. Nature, 222, 960–962. Wilmes, K. A., Sprekeler, H., & Schreiber, S. (2016). Inhibition as a Binary Switch for Excitatory Plasticity in Pyramidal Neurons. PLoS Computational Biology, 12(3), e1004768. Yang, S., Yang, S., Moreira, T., Hoffman, G., Carlson, G. C., Bender, K. J., . . . Tang, C.-M. (2014). Interlamellar CA1 network in the hippocampus. Proceedings of the National Academy of Sciences, 111(35), 12919–12924. Ylinen, A., Bragin, A., Nádasdy, Z., Jando, G., Szabo, I., Sik, A., & Buzsáki, G. (1995). Sharp Wave-Associated High-Frequency Oscillation (200 Hz) in the Intact Hippocampus: Network and Intracellular Mechanisms. Journal of Neuroscience, 15(1), 30–46. Zenke, F., Agnes, E. J., & Gerstner, W. (2015). Diverse synaptic plastic- ity mechanisms orchestrated to form and retrieve memories in spiking neural networks. Nature Communications, 6, 6922. Zheng, P., & Triesch, J. (2014). Robust Development of Synfire Chains from Multiple Plasticity Mechanisms. Frontiers in Computational Neuroscience, 8, 66.

188