UNIVERSITY OF CALGARY
The Universal Critical Dynamics of Noisy Neurons
by
Daniel James Korchinski
A THESIS SUBMITTED TO THE FACULTY OF GRADUATE STUDIES IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE
GRADUATE PROGRAM IN PHYSICS AND ASTRONOMY
CALGARY, ALBERTA
May, 2019
© Daniel James Korchinski 2019

Abstract
The criticality hypothesis posits that the brain operates near a critical point. Typically, critical neurons are assumed to spread activity like a simple branching process and thus fall into the universality class of directed percolation. The branching process describes activity spreading from a single initiation site, an assumption that can be violated in real neurons, where external drivers and noise can initiate multiple concurrent and independent cascades. In this thesis, I use the network structure of neurons to disentangle independent cascades of activity. Using a combination of numerical simulations and mathematical modelling, I show that criticality can exist in noisy neurons, but that the presence of noise changes the underlying universality class from directed to undirected percolation. Directed percolation describes only small-scale distributions of activity; on larger scales, cascades can merge together and undirected percolation becomes the appropriate description.
Preface
This thesis is an original work by the author. No part of this thesis has been previously published.
Acknowledgements
This work would not be possible without the gracious financial support of the Natural Sciences and Engineering Research Council, Alberta Innovates, the University of Calgary's Faculty of Graduate Studies, Student Aid Alberta, and the Nathoo family. I would like to express my gratitude to Professor Jörn Davidsen, who captured my imagination by introducing criticality in the brain to me. He started me on this journey, and with his patience, support, and drive, has seen me through to completion. I'd also like to thank Dr. Seung-Woo for the many fruitful conversations and suggestions made over coffee and while reviewing percolation theory. Javier Orlandi was also tremendously helpful with numerous technical details related to modelling biological neurons. Without these three, this thesis would be a shadow of its present state. I would also like to thank my parents for their support and gently prodding questions, and Raelyn for her humour and cheer on days that mine lapsed.
Table of Contents

Abstract
Preface
Acknowledgements
Table of Contents
List of Figures and Illustrations
List of Tables
List of Symbols, Abbreviations and Nomenclature

1 Introduction
   1.1 Complex systems
   1.2 Complex networks
   1.3 The brain as a complex system

2 Criticality in Neural Systems
   2.1 A brief review of criticality
   2.2 Experimental evidence of neural criticality
   2.3 Modelling criticality in the brain
      2.3.1 Hodgkin-Huxley and other "biological" dynamical neuron models
      2.3.2 Branching processes
      2.3.3 Contact processes
   2.4 Noise in the brain
      2.4.1 The effect of noise on observables
      2.4.2 Modelling noise in the brain
   2.5 Summary

3 Mathematical Background
   3.1 Random graphs and network theory
      3.1.1 k-ary trees
      3.1.2 k-regular graphs
      3.1.3 Erdős-Rényi graphs
      3.1.4 Small-world graphs
      3.1.5 Power-law graphs
      3.1.6 Hierarchical modular graphs
   3.2 Percolation
      3.2.1 Percolation in 1-dimension
      3.2.2 Percolation on the Bethe lattice
      3.2.3 Percolation on other graphs
   3.3 Directed percolation
      3.3.1 Spreading processes
   3.4 Summary

4 The Branching Process with Noise
   4.1 Results for the branching process with noise on infinite k-regular graphs
      4.1.1 Active fraction
      4.1.2 Mean cluster size
      4.1.3 Phase diagram
      4.1.4 Mergeless cluster distribution
      4.1.5 Cluster size distribution
      4.1.6 Avalanche duration and scaling relations
      4.1.7 Correlation length
      4.1.8 Size of the giant component
   4.2 Numerical results for the branching process with noise on finite k-regular graphs
      4.2.1 Avalanche distributions
      4.2.2 The giant component in finite graphs
      4.2.3 Mean cluster size
   4.3 Simulations on other finite networks
      4.3.1 Small-world graphs
      4.3.2 Power-law networks
      4.3.3 Hierarchical modular networks
   4.4 Thresholded avalanches
   4.5 Summary

5 Quadratic Integrate-and-Fire neurons
   5.1 The model
   5.2 Simulations on Erdős-Rényi and hierarchical modular networks
   5.3 Summary

6 Conclusions
   6.1 Summary of results
   6.2 Outlook and future work

Bibliography

A Supplementary Figures

B Numerical Methods
   B.1 Simulation of infinite k-regular branching processes with spontaneous activity
List of Figures and Illustrations

2.1 Neuronal avalanches, reproduced from [Beggs and Plenz, 2003]. Top: each point indicates the detection of an action potential at that electrode. Bottom: detail showing the evolution of a single avalanche.
2.2 The basic anatomy of a pair of ideal neurons. The neuron outlines are replicated from [Mel, 1994].
2.3 An example of a branching process on a simple linear bidirectional network (shown at the top). The dynamics consists of a single cascade initiated at node 1 at time t = 1. As connections here are recurrent, nodes can be reactivated, as occurs at node 1 at time t = 3 and at node 3 at time t = 5.
2.4 The results of overlapping avalanches, when avalanches are initiated as a Poisson process of various rates. Avalanche sizes are drawn from a pure power-law, P(S) ∼ S^{-3/2}, and avalanche durations are assumed to scale as T ∼ √S, with time rescaled so that the duration of a size-1 avalanche is T = 1. If another avalanche is triggered in the timespan of the first, their sizes are added and the length of the avalanche is potentially increased, possibly including another independent cascade.
2.5 Causal webs can be used to distinguish spatially distinct events, as well as the progenitor events in avalanches. On the left are the spike trains observed in the neurons on the right. There are two causal webs of size three, as well as a causal web of size four. Under the traditional model of avalanches, with avalanches delineated by periods of silence, there would be two avalanches: one of size six and one of size four.
3.1 A demonstration of the Watts-Strogatz model. (a) A circulant graph connecting the nearest two neighbours (giving each node a degree of four) is shown for N = 10. Of the 20 bonds, 4 are selected for rearrangement. (b) The bonds for rearrangement retain one end-point, while the other is swapped for another at random.
3.2 Degree distribution for power-law networks with uncorrelated degree distributions generated via the configuration model, with λ = 3.5 and k_min = 5, averaged across 500 networks of size N = 10^5.
3.3 Degree distribution for power-law networks with uncorrelated degree distributions generated via the Goh model, with λ = 3.5 and ⟨k⟩ = 10, averaged across 500 networks of size N = 10^5.
3.4 The in/out-degree correlations averaged over an ensemble of 500 networks, both with an asymptotic degree distribution of p(k) ∼ k^{-3.5}. (a) In/out-degree correlations for power-law networks generated by the configuration algorithm, as in Figure-3.2. (b) In/out-degree correlations for power-law networks generated by the Goh algorithm, as in Figure-3.3.
3.5 Base modules are represented by filled squares. Each base module might contain a dense network of neurons. Modules are wired into pairs; these pairs constitute a super-module. Super-module pairings are indicated by a lighter shade of blue. Super-super-modules are constructed from pairs of super-modules and are indicated by the lightest shade of blue. During the formation of the super-super-modules, a base module from each of the super-modules is selected; these two base modules are then wired together, as indicated with the lightest-blue edge. A single super^3-module is constructed from the two super^2-modules and is indicated in green. Two base modules, one from each super^2-module, are wired together; this connection is indicated in green.
3.6 A simple example of the vertices and edges populating two base modules (coloured blue) connected together to form a super-module (coloured purple). Here the number of intra-vertices per module, NPN, is 5. Each intra-vertex is coloured navy blue. The number of inter-vertices, NPC, is 2, and each inter-vertex is coloured red. Here, the out-degree of each vertex is 2. The inter-vertices only connect to the intra-vertices of the other module; their edges are in purple. Intra-vertices can connect to other intra-vertices or to the inter-vertices of the same module; their edges are shown here in green. The populations of intra-vertices and inter-vertices are circled in light blue.
3.7 An example of three infinite lattice structures. (a) The 1-dimensional lattice. (b) The Bethe lattice of degree 4. (c) The 2-dimensional triangular lattice.
3.8 Percolation on a 1D lattice of size N = 14. Occupied sites are coloured black. Sampling the seven active nodes, the cluster size distribution is P_n(S=1) = 2/7, P_n(S=2) = 2/7, P_n(S=3) = 3/7. Sampling the four clusters, the size distribution is P_c(S=1) = 2/4, P_c(S=2) = 1/4, P_c(S=3) = 1/4. Hence, the mean cluster sizes are ⟨S⟩_n = 15/7 and ⟨S⟩_c = 7/4.
3.9 An example of (1+1)-dimensional directed bond percolation on a tilted square lattice. Surviving bonds after dilution are marked in black. A cluster of size 8 is marked in blue, beginning at the site marked in red and proceeding down the lattice following the directed links.
3.10 A contact process beginning at node 1 spreads to node 2, which in turn spreads the process to nodes 1 and 3, after which the process terminates.
4.1 An example of a branching process with multiple spontaneous activations/infections on a simple linear bidirectional network (shown at the top). The dynamics consists of two independent cascades, one with two roots (node 1 at time t = 1 and node 4 at time t = 2), and one with a single root (node 0 at time t = 3).
4.2 a. The distribution of avalanche sizes on a 10-regular graph, with N = 10^4 nodes simulated for T = 10^3 (empty circles) or 10^4 time steps (filled circles), averaged over five network configurations, for various p_1 and p_0 = 10^{-5}. Solid lines are exponentially truncated power-law fits, p(s) ∼ s^{-3/2} exp[−s/s_ξ]. The p_1 values for each curve are marked in panel b. b. The average number of nodes active in the largest cluster at each time step.
4.3 The active fraction Φ(p_0, p_1) for 10-regular graphs, for various p_0 as a function of p_1.
4.4 The dynamical susceptibility χ_0 as a function of p_0 and p_1. Maxima of the dynamical susceptibility are marked with blue squares. The susceptibility along the Widom line is plotted in black.
4.5 A CWEB of size four is shown. Physical connections between nodes are shown in grey. Node B has nodes A and C as parents, while node C has D as a parent. Directed edges in black correspond to how the cluster is built, beginning from A. Associated with each node added to the cluster is a probability of inclusion that depends only on information available along that path. Here, node A triggers B concurrently with C, while D triggered C. Evaluated from A, however, the probability that B triggered (without knowledge of C's firing) is pd, while the probability that C fired, conditioned on both A and B having fired, is pp1. Lastly, the probability that D contributes to C, conditioned only on the fact that C was activated, is pp. In this figure, non-firing sites (e.g. the parents of A) are hidden to reduce clutter.
4.6 (a) The phase diagram for a 10-regular graph, with the Widom line, the unity branching ratio line, and the phase transition line, on which χ_n = ⟨S⟩_n diverges. The limits of the diverging χ_n fall at the points expected for a directed and an undirected percolation process on a Bethe lattice of coordination number k + 1 and 2k, respectively. (b) As in (a), but in the limit of low noise, and in log-log scale. All three lines follow p_0 ∝ (1/k − p_1)^η with different η. From top to bottom, the η are 1, 2, and 3.
4.7 The causal-web distribution P_c(s) of an infinite 10-regular graph, simulated for different p_0 on the critical line. Power-laws s^{-3/2} and s^{-5/2} are present to guide the eye.
4.8 The causal-web distribution P_c(s) of an infinite 10-regular graph, simulated for different p_0 on the critical line. Power-laws s^{-3/2} and s^{-5/2} are present to guide the eye.
4.9 a. Avalanche statistics for p_0 = 10^{-5} simulated on an infinite 10-regular network at the theoretically determined critical point. Simulated avalanches with one root are shown with symbols, while the analytical prediction is shown with a line of the same colour. b. The average number of roots R for avalanches of a given size, shown for simulations of various p_0 on an infinite 10-regular network. Inset shows curve-collapse across various p_0, with rescaled x-axis s·p_0^{2/3}.
4.10 Symbols are simulation results on infinite 10-regular lattices for 2 × 10^7 clusters, while solid lines are the analytical predictions of Equation-4.11.
4.11 a. Mean avalanche durations for avalanches of various sizes simulated on the infinite 10-regular network, with varying levels of spontaneous activity. p_1 is set to a slightly sub-critical value, p_1 = p_{1c} − 10^{-11/2}, so that no infinite avalanches occur. b. As in a., the mean avalanche duration exhibits reasonable curve-collapse, with collapse quality increasing as p_0 → 0.
4.12 Avalanche durations simulated on the infinite 10-regular network, with varying levels of spontaneous activity. p_1 is set to a slightly sub-critical value, p_1 = p_{1c} − 10^{-11/2}, so that no infinite avalanches occur.
4.13 The perpendicular correlation length function for simulations of infinite 10-regular graphs near the critical line, with p_0 = 10^{-4}. Thick solid lines are analytical predictions, while the lighter hues denote numerical averages from 2 × 10^6 simulations.
4.14 a. Cluster size distributions obtained for finite (N = 10^7 for T = 10^4) and infinite 10-regular networks at the theoretical critical point given by the divergence of Equation-4.11. Finite simulations are given with transparent symbols, with the corresponding infinite graph result as a line of the same colour. b. As in a., but with a curve-collapse effected by rescaling according to the rate of spontaneous activation.
4.15 A comparison of the analytical results of Equation-4.19 to the size distribution of mergeless avalanches on finite graphs, for a variety of noise levels, at the theoretical critical point given by the divergence of Equation-4.11.
4.16 a. The fraction of the graph occupied by the largest cluster for various graph sizes, with p_0 = 10^{-3} denoted by circles and p_0 = 10^{-4} denoted by triangles. Solid lines are theoretical predictions for the giant component size, as developed in §4.1.8. Simulations are for T = 10^4 time steps. b. As in a., but with a curve-collapse effected by finite size scaling.
4.17 a. The active fraction Φ and giant component G analytically (solid lines) and for simulations of varying sizes (symbols). Crosses are N = 10^{12/3}, circles are N = 10^{14/3}, and squares are N = 10^{16/3}. Finite simulations are for T = 10^4 time steps, averaged over five network realizations. b. As in a., but for the fraction of active nodes that are part of the giant component.
4.18 a. The mean (finite) cluster size ⟨S²⟩_n, with p_0 = 10^{-3} denoted by circles and p_0 = 10^{-4} denoted by triangles. Simulations are for T = 10^4 time steps. b. As in a., but with a curve-collapse effected by finite size scaling.
4.19 a. Cluster size distributions obtained for finite (N = 10^5 for T = 10^5) small-world networks with rewiring probability 0.01. b. As in a., but with a curve-collapse effected by rescaling according to the rate of spontaneous activation.
4.20 Simulations on small-world networks for N = 10^{13/3} and T = 10^5, averaged across three network realizations. Panels a-d correspond to various re-wire probabilities.
4.21 a. Simulations of the giant component on power-law networks with p(k) ∼ k^{-3.5} of varying sizes, for T = 10^{7/3} time steps and with 10 network realizations, with estimated p_{1c} = 0.1110 and giant emergence exponent β = 2, with 1/(dν) ≈ 0.25. b. As in a., but with a curve-collapse effected by finite size scaling.
4.22 Simulations of the giant component on power-law networks with p(k) ∼ k^{-3.5} of varying sizes, for T = 10^{7/3} time steps and with 10 network realizations. Finite size scaling is performed with estimated p_{1c} = 0.1110 and finite size scaling exponent 1/(dν) ≈ 0.25.
4.23 a. Cluster size distributions obtained for finite (N = 10^7 for T = 10^4) power-law networks (p(k) ∼ k^{-3.5}). b. As in a., but with a curve-collapse effected by rescaling according to the rate of spontaneous activation.
4.24 a. Cluster size distributions obtained on hierarchical modular networks (N = 2^{15} modules, each consisting of M = 10^2 nodes, for T = 10^4). b. As in a., but with a curve-collapse effected by rescaling according to the rate of spontaneous activation.
4.25 Simulations on 10-regular graphs with N = 10^5 nodes, with a threshold set to the lowest 2.5th percentile.
4.26 Simulations on 10-regular graphs with N = 10^5 nodes, with a threshold set to the 50th percentile.
5.1 The response of a single QIFAD neuron to a periodically-applied current, increasing in strength with each application. The top panel shows the membrane voltage, while the bottom panel shows the applied current.
5.2 Simulations near the critical point for the QIFAD model on Erdős-Rényi networks, with N = 10^{11/2} neurons and λ = 4 × 10^{-3} kHz, or 4 Hz, for three different values of connection strength g.
5.3 Simulations conducted on hierarchical modular networks, with 1000 neurons per base node, 100 neurons per inter-modular connection, and 7 hierarchical layers (for a total of 2^7 nodes). The average in/out-degree of each neuron was ≈ 70. Neuron parameters are as given in Table-5.1, except that the capacitance is given by C = 174 pF, while k = 0.4 and b = 3.5. The excitatory connection strength is given by g = 100 pA. In this simulation, λ = 650 Hz, while g_shot = 70.3 pA. Ten percent of intra-neurons were inhibitory (GABAergic), with g_GABA = −15 pA and τ_k = 20 ms. Values are the result of an average across five network realizations, each simulated for three minutes. a. The probability distribution function for mean causal web size, with a power-law fit generated by maximum likelihood estimation. b. The probability distribution function for mean causal web duration. c. The mean size for avalanches of a given duration. Fit is to avalanches smaller than the bursts.
A.1 Power-law scaling of 1/k − p_{1c} ∝ p_0^{1/3}, shown here for k = 10.
A.2 Power-law scaling of 1/k − p_{1c} ∝ p_0^{1/2} for the σ = 1 line, shown here for k = 10.
A.3 Scaling of the first- (Equation-4.13) and second-order (Equation-4.17) approximations to the active fraction (Equation-4.3) along the Widom line.
A.4 The Widom line in the neighbourhood of p_0 ≪ 1 is asymptotically approximated by Equation-4.18.
A.5 The scaling of the size cutoff for mergeless avalanches. The exact s_ξ is given by Equation-4.21 and is plotted in purple. Equation-4.22 captures the correct scaling form for s_ξ, but has a poor prefactor for small p_0, as can be seen in green in the figure. Equation-4.25 shows an improved prefactor and is plotted in blue.
A.6 Giant component for simulations with N = 10^4 nodes on 10-regular graphs, of varying durations. Above the critical point, variation in simulation duration has no effect. Below the critical point, the largest cluster doesn't scale extensively, and hence its occupation fraction for the whole simulation decreases as the simulation duration increases.
A.7 Simulations on 10-regular graphs with N = 10^5 nodes, with a threshold set to the 1st percentile.
A.8 Simulations on 10-regular graphs with N = 10^5 nodes, with a threshold set to the 34th percentile.
A.9 Simulations on 10-regular graphs with N = 10^5 nodes, with a threshold set to the 76th percentile.
A.10 Simulations on 10-regular graphs with N = 10^5 nodes, with a threshold set to the 97.5th percentile.
A.11 Simulations on 10-regular graphs with N = 10^5 nodes, with a threshold set to the 99th percentile.
B.1 A cluster is developed from root node A. This particular network structure is an illustrative conceit; no specific structure is specified in memory. (a) Consider a cluster developed from a single root node, occurring on an infinite, random, 2-regular graph for clarity. There are initially two type-I connections. (b) Each type-I node potentially has k − 1 other parents, each independently active with probability Φ. In this specific example, let's suppose both of the type-I connections of node A have another parent. Then, each of the two type-I connections will be included in the cluster with probability 1 − p̄_0·p̄_1². (c) Suppose the left type-I fails to activate, while the right (now labelled B) succeeds. The other parent of B is now a type-II connection, while the hitherto unconsidered daughters of B are two new type-Is. (d) The type-II connection (now labelled C) is always included. It introduces a new type-I connection and (after sampling from Equation-B.1) adds one new type-II connection. The other possible (but inactive) parent is shown in light grey.
B.2 A possible cluster realization of size 5, following the remaining steps outlined in Table-B.1, continuing from Figure-B.1.
List of Tables

3.1 A summary of percolation exponents in different network configurations and dimensions. PL here denotes "power-law" and refers to a random graph with degree distribution p(k) ∼ k^{-λ}. SW here denotes "small-world". Here s_ξ denotes the characteristic size beyond which an exponential cut-off appears to truncate the power-law of P_n(S). Results for d = 1 and d = 2 are as given in [Christensen and Moloney, 2005]. Small-world values are as given in [Moore and Newman, 2000]. The PL network values hold for λ ∈ (2, 4). Those with λ < 3 have a percolation transition at p = 1, while the transition is at p < 1 for λ > 3; hence many quantities have singularities at λ = 3. Additionally, the cluster distribution has logarithmic corrections to P_n(S) for λ = 3. γ takes the value +1 for λ ∈ (3, 4) and −1 for λ ∈ (2, 3) [Cohen et al., 2002]. Mean-field values, d ≥ 6, are as given in [Christensen and Moloney, 2005].
3.2 A summary of the directed percolation critical exponents for mean-field networks (d ≥ 4) and for both uncorrelated and correlated power-law networks. "Unc. PL" refers to directed percolation on directed power-law graphs with P_in(j) ∼ j^{-λ_in} and P_out(k) ∼ k^{-λ_out} uncorrelated at each vertex. "Cor. PL" refers to the same, but with the existence of a fraction A_B of nodes that are fully correlated, with the in-degree j = k^{(λ_out − 1)/(λ_in − 1)} related to the out-degree. λ* = λ_out + (λ_in − λ_out)/(λ_in − 1) in the GSCC of the power-law network. For the uncorrelated PL networks, the first value of the exponent holds for λ_out ∈ (2, 3) and the second for λ_out ≥ 3. For correlated PL networks, the first value holds for λ* ∈ (2, 4) (excluding 3 in the case of β) and the second when λ* ≥ 4. Power-law values are from [Schwartz et al., 2002]. Mean-field values (d ≥ 4) are as given in [Hinrichsen, 2000].
5.1 Values used for QIFAD simulations. From [Izhikevich, 2007] and [Orlandi et al., 2013].
B.1 An example of developing a single cluster of size 5. Nodes and edges in the cluster are in black, while the nodes and edges constituting the boundary of the cluster are in light grey. The first few operations are illustrated in Figure-B.1.
List of Symbols, Abbreviations and Nomenclature

Symbol or abbreviation    Definition
2F1                       The Gauss hypergeometric function
p̄                         The complementary probability 1 − p
BP                        Branching process
GP                        Griffiths phase
SIS                       Susceptible-infected-susceptible
SIR                       Susceptible-infected-removed
fMRI                      Functional Magnetic Resonance Imaging
RSN                       Resting State Network
EEG                       Electroencephalogram
GSCC                      Giant strongly-connected component
GWCC                      Giant weakly-connected component
Billion                   10^9. The short scale is used here.
Chapter 1
Introduction
1.1 Complex systems
The field of complex systems is characterized not by the relative intractability of its equations but rather by the adage that "more is more". A system is complex if the interactions between its elements display emergent behaviour [Hastings et al., 2017]. There are many archetypal systems in which simple dynamics lead to a richer gestalt; however, the motivation for this thesis lies in the application of statistical physics to the brain. Although a cellular neuroscientist might object to this characterization, the brain is a system composed of a multitude of simple parts, neurons, whose aggregate behaviour is comparatively richer than that of the individual elements. Individual neurons do not compose poetry or evince any hint of consciousness, yet in bulk, they produce language, art, and mathematics. In statistical physics, we are commonly faced with a similar problem of explaining or predicting bulk behaviour from the dynamics of individual particles. The typical approach is to observe some unusual effect in a bulk material. Then, a Hamiltonian function for the individual units of the system is conjectured, beginning with as few interactions between elements as possible. From this parsimonious Hamiltonian is built the Hamiltonian for the ensemble. Should the ensemble Hamiltonian fail to capture the dynamics of the bulk
material, the Hamiltonian of the individual units is enriched slightly, and the process is repeated until the observed bulk effect is explained. By beginning with a maximally simple model, exactly those elements necessary to produce the observed bulk behaviour are present, and the emergent phenomenon may be explained.
1.2 Complex networks
Systems are complex if rich collective behaviour emerges from relatively simple dynamics. However, it is sometimes the case that the structure of a system strongly influences its collective dynamics. In the statistical physics picture, the description of how elements interact (i.e. their interaction potentials) is distinct from the description of which elements interact. For instance, the bulk electronic behaviour of graphene is quite distinct from that of diamond, even though the constituent nodes are in both cases carbon atoms [Sarma et al., 2011]. The difference between the two materials is that of their structures, which differ in both underlying symmetry and dimension. A generic way to encode the structure of a complex network is with the language of graph theory [Newman, 2010]. Individual elements of the system are called nodes. Nodes that can interact with each other are connected via edges (links) between them. Edges can also encode a directionality. For instance, the world-wide web can be captured as a graph by letting the nodes of the graph denote web-pages and the edges between them denote hyperlinks. These edges would be directed because a web-page does not need to link back to the page that linked to it. One area in which complex networks have been employed is modelling the size of disease outbreaks [Newman, 2002]. Individual humans might be nodes, while the links between them denote a possible route of disease transmission. Human interaction networks are often said to be "small-world", reflecting the observation that there are relatively few intervening steps connecting any two randomly selected people [Travers and Milgram, 1977]. The structure
of disease networks has a significant impact on the transmission of disease and the size of epidemics, should they occur at all. In "small-world" networks, if a disease is sufficiently infectious, an epidemic is always possible [Moore and Newman, 2000]. However, some diseases spread via a different network than just generic human interactions. For instance, the spreading of sexually-transmitted disease in a population of heterosexual individuals can be modelled on a bipartite graph. In such graphs, it is possible to totally eliminate the possibility of disease outbreaks, regardless of the transmissibility of the disease, by adjusting the structural properties of the graph [Newman, 2002].
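For concreteness, the directed-graph encoding just described can be written down as a plain adjacency list. The sketch below (Python) is illustrative only; the page names are invented:

# A directed graph as an adjacency list: nodes are web-pages (made-up
# names), and an edge u -> v means page u hyperlinks to page v.
web_graph = {
    "index.html": ["about.html", "news.html"],
    "about.html": ["index.html"],    # links back to the front page
    "news.html": ["archive.html"],   # the archive does not link back
    "archive.html": [],
}

# Out-degree: number of outgoing hyperlinks from each page.
print({page: len(links) for page, links in web_graph.items()})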
1.3 The brain as a complex system
The brain may also be modelled as a complex system. It has an obvious fundamental unit: the neuron, which communicates with its neighbours via axons and dendrites. There are many models of neuron behaviour, with varying degrees of sophistication and biological relevance [Izhikevich, 2007]. A statistical physicist may discuss the behaviour of ensembles of such neurons and aim to explain the emergent properties of the brain in such a manner. This approach has been used to explain the apparent presence of scale-free cascades of brain activity. The evidence for this behaviour will be elaborated on in §2.2. One challenge in modelling the brain as a complex system is its scale. There are over a hundred billion neurons in the human brain, making full-scale simulations of the brain currently infeasible [Herculano-Houzel, 2009]. Additionally, the experimental tools that neuroscientists use operate at very different resolutions and scales. Mapping out brain networks, called connectomes, in living animals is typically done using various types of magnetic resonance imaging (MRI) [Basser et al., 2000, Oh et al., 2014, Sporns et al., 2005]. MRI typically has a resolution no finer than 1 mm³, meaning that it can only identify connections between brain regions. At the mesoscale, a map of axonal connections for neurons of specific types can be accomplished with genetic labelling of cells; however, since this can only be accomplished
by labelling specific cell types in different animals, it is necessary to average across multiple animals to obtain a representative network [Oh et al., 2014]. Obtaining a connectome at the finest scale, capturing the physical connections (synapses) between individual neurons, can be accomplished in a single animal using finely-sliced preserved neural tissue and an electron microscope. However, this process is slow, and whole-brain connectomes at the synaptic level have thus far been accomplished only for larval zebrafish [Hildebrand et al., 2017] and fruit flies [Zheng et al., 2018]. Thus, it is most typical to model neural systems at one of two scales. Fine-scaled models of neural systems take neurons to be the basic unit of the system. Such models might include equations describing the ionic currents flowing through the cell membrane, internal dynamics such as protein coupling cascades, and delays in signal propagation [Izhikevich, 2007]. Coarsely-grained models of neural systems instead choose entire brain regions to be the basic unit of the system. These brain regions are typically assumed to be anywhere from a few hundred to a few million neurons in size, and rely on the fact that neighbouring neurons often have correlated activity. These models might describe the average depolarization or firing rate of different populations of cells within the region, as in neural mass models, or might be as coarse as a binary variable representing an "active" or "inactive" region [Breakspear, 2017]. Simple models at both scales can successfully replicate many experimental observations at their respective scales. Finely-grained models of neurons can reproduce experimentally
4 cascades mentioned earlier [Orlandi et al., 2013]. Usually, however, it is assumed that after spontaneous activity initiating the cascade, no other spontaneous activity occurs during the cascade. Systems that satisfy this assumption are said to exhibit a “separation of timescales”, which is popular starting point for statistical physics models. The aim of this thesis is to address this separation of timescales violation, by enriching neuron models with spontaneous activity as a fundamental ingredient, and examining how the system’s emergent behaviours are altered in the presence of this noise. This will be accomplished through a combination of extensive computer simulation and analytical techniques.
Chapter 2
Criticality in Neural Systems
2.1 A brief review of criticality
Before we describe the evidence for criticality in the brain, it is necessary that we are able to recognize criticality in generic systems. Of principal interest in this thesis will be critical points dividing two distinct phases. "Criticality" therefore describes a system that displays properties consistent with operating close to a continuous (i.e. second-order) phase transition [Kardar, 2007]. Critical points are characterized by the presence of power-laws. For a generic observable X, it is common to observe, for some control parameter T close to its critical value T_c, the scaling relation

X ∝ (T − T_c)^γ = 𝒯^γ , (2.1)

for some scaling exponent γ, where 𝒯 here denotes T − T_c. We say that power-laws are scale-free because, if we rescale the control parameter 𝒯 by some constant, say C, the scaling relation remains unchanged:

X ∝ C^γ (𝒯/C)^γ ∝ (𝒯/C)^γ . (2.2)
Since both Equation-2.1 and Equation-2.2 are valid, both the reduced temperature 𝒯 and the rescaled reduced temperature 𝒯/C are in some sense equivalent. This is atypical for a physical phenomenon, where there is usually some characteristic scale to the problem. For instance, in radioactive decay the number of nuclei surviving at time t is given by n(t) ∝ 2^{−t/τ_{1/2}}, which has a characteristic scale of τ_{1/2}, the half-life. Rescaling the time in the decay process by some arbitrary constant, t → t/c, simply results in a rescaled version of n(t). Indeed, if a function f(𝒯) rescales as f(a𝒯) = g(a)f(𝒯) for some function g of the rescaling constant, then by setting 𝒯 = 1 we know f(a) = g(a)f(1), so we can write

f(a𝒯) = f(a)f(𝒯)/f(1) . (2.3)

Taking a derivative with respect to a, we find

𝒯 f′(a𝒯) = f′(a)f(𝒯)/f(1) , (2.4)

and setting a = 1, we have a separable first-order differential equation in 𝒯 that can be solved to yield

log f(𝒯) = log(𝒯^{f′(1)/f(1)}) + C , (2.5)

with C = log(f(1)), which yields

f(𝒯) = f(1) 𝒯^{f′(1)/f(1)} , (2.6)

which is of course a power-law, with two free parameters that choose the normalization and the exponent. This means that scale-free behaviour necessarily implies a power-law.
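This scale-free property is easy to verify numerically: the functional equation f(a𝒯) = f(a)f(𝒯)/f(1) holds for a pure power-law but fails for a function with a characteristic scale, such as the half-life decay above. A minimal sketch in Python, with arbitrary illustrative constants:

import numpy as np

# Check the scale-invariance property f(a*T) = f(a) * f(T) / f(1):
# it holds for a pure power-law, but fails for the half-life decay,
# which carries a characteristic scale tau.
rng = np.random.default_rng(1)

def power_law(T, c=2.5, gamma=0.75):
    return c * T**gamma

def half_life_decay(T, tau=3.0):
    return 2.0 ** (-T / tau)

a, T = rng.uniform(0.1, 10.0, size=2)
for f in (power_law, half_life_decay):
    print(f.__name__, np.isclose(f(a * T), f(a) * f(T) / f(1.0)))
# power_law True, half_life_decay False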
Of course, the use of the symbol T for a control parameter is no accident. The canonical example that statistical physicists draw upon is that of the ferromagnetic Ising model, for which the control parameter is the temperature, T [Christensen and Moloney, 2005, Kardar, 2007]. The Ising model describes the behaviour of a lattice of spins that interact with their nearest neighbours. For lattices with dimension larger than one (the lower critical dimension of the Ising model), the Ising model predicts a phase transition between the ordered phase, where most spins align and produce a net magnetization (the ferromagnetic phase), and a disordered phase with randomly aligned spins and no net magnetization (the paramagnetic phase). This describes well the experimental observation that raising a ferromagnetic material above its Curie temperature destroys the bulk magnetization. Near that critical point, several quantities vary as power-laws. For instance, the net magnetization varies as M ∼ (−𝒯)^β for T < T_c. Another relevant quantity is the correlation length. This quantity measures the scale over which correlations between neighbouring spins decay. In the Ising model, the state of spin σ_i is ±1. The correlation between two spins σ_i and σ_j might be measured as ⟨σ_i; σ_j⟩ = ⟨σ_iσ_j⟩ − ⟨σ_i⟩⟨σ_j⟩ ∼ e^{−|i−j|/ξ}, where ξ denotes the correlation length. This correlation length between spins is asymptotically ξ ∼ 𝒯^{−ν} for temperatures near the critical point. As ν > 0, the correlation length goes to infinity as the temperature approaches the critical point. Consequently, near the critical point fluctuations grow arbitrarily large. Although the value of the Curie temperature depends on the properties of the material under consideration, the critical exponents are shared across different materials and depend only on the dimensionality of the system [Christensen and Moloney, 2005]. A similar phenomenon occurs at the critical point of the liquid-gas phase transition. Near the liquid-gas phase transition, particles can coordinate on long scales. The correlation length here also displays a power-law divergence, ξ ∼ 𝒯^{−ν}, near the critical temperature. These long-distance correlations can become large enough to scatter light, resulting in what is known as critical opalescence, where the fluid becomes milky and pale. Intriguingly, the correlation length exponent ν ≈ 0.63 in the liquid-gas transition agrees with that of the correlation length in the Ising system in three dimensions. Indeed, perhaps surprisingly, all critical exponents are shared between these two systems [Yang and Yang, 1964]. In both systems, the correlation length diverges near the critical point. The diverging correlation length justifies a coarse-graining process known as renormalization, in which the microscopic details of the model wash away [Kardar, 2007]. In both the 3-dimensional Ising ferromagnet and the liquid-gas transition, the coarse-grained system rescales in the same way owing to their shared symmetries, which leads to the same critical exponents and places them in the same universality class. Generally speaking, if two systems can be shown to fall into the same universality class, regardless of their microscopic dynamics, they will share the same bulk behaviour near their critical points.
2.2 Experimental evidence of neural criticality
The idea that neural systems might naturally operate near a critical point was suggested in the seminal 2003 paper by John Beggs and Dietmar Plenz, where it was observed that ensembles of neurons exhibit scale-free cascades of activity, dubbed "neuronal avalanches" [Beggs and Plenz, 2003]. In this work, slices of rat cortex were cultivated on an eight-by-eight array of electrodes, which were sensitive enough to detect the electrical activity of the neurons. The activity exhibited intense cascades, each presumably initiated by a single neuron, as in Figure-2.1. Each avalanche was therefore defined to be a period of activity bounded on each side by a period of quiescence. The avalanche's size was defined to be the number of action potentials detected during the cascade. It was observed that the avalanche sizes were approximately power-law distributed, with the probability of an avalanche of size S being P(S) ∝ S^{−τ} with τ ≈ 3/2, and of an avalanche of duration T being P(T) ∝ T^{−α} with α ≈ 2. Subsequent studies, with larger multi-electrode arrays and higher time resolutions, have confirmed this basic result [Friedman et al., 2012]. To explain this apparently scale-free behaviour, John Beggs proposed that the brain operates near the critical point of a continuous phase transition [Beggs and Plenz, 2003]. In addition to reporting the distribution of avalanches, Beggs also observed that this distribution matches the predicted exponent for a mean-field branching process near criticality. In a branching process, each node activates connected daughter
nodes with some probability p (see §3.3.1 for details). The branching process falls into the universality class of directed percolation. To explain why criticality might appear in these neural networks, he noted that a branching process has two phases: a sub-critical phase in which activity dies away, and a super-critical phase where activity explodes to take over the system. He likened the super-critical phase to epilepsy, an undesirable neurological disorder in which rampant neural activity induces seizures, and the sub-critical phase to coma, where neural activity dies away. He observed from simulations of the branching process on a feed-forward (i.e. loopless) network that criticality maximizes information transmission.

Figure 2.1: Neuronal avalanches, reproduced from [Beggs and Plenz, 2003]. Top: each point indicates the detection of an action potential at that electrode. Bottom: detail showing the evolution of a single avalanche.

Although measurements of neural cultures matched many predictions of a critical branching process, including the relationships between critical exponents [Friedman et al., 2012], if neural avalanches are truly scale-free then similar behaviour should be observed at the scale of the whole brain. To this end, Haimovici et al. studied cascades of activity at the scale of the whole brain using functional magnetic resonance imaging (fMRI) [Haimovici et al., 2013].
fMRI measures the "blood-oxygen level dependent" (BOLD) signal from 3-dimensional regions known as voxels. BOLD is thought to reflect the increased regional metabolic load that corresponds to neural activity, making it a way to noninvasively measure brain activity [Shmuel et al., 2006]. In the work of Haimovici et al., the BOLD signal was converted from a continuously-varying signal into a point process by labelling each point at which the BOLD signal passed a certain threshold (typically 2 standard deviations from the mean). In this way, a spike train like that of Figure-2.1 could be produced, except with each "spike" corresponding to the activation of a brain region instead of a single neuron. However, no periods of silence existed to delineate the boundaries between avalanches. This reflects the fact that spontaneous activity, even if rare in a neuronal culture of a few thousand neurons, will be pervasive in a sample of a hundred billion neurons. To identify avalanches, Haimovici et al. needed to separate causally disconnected activations. To do this, they instead studied the evolution of clusters of activity. A new cluster would form whenever a spike was registered with no neighbouring regions spiking before it. Any subsequent neighbouring activations were then added to that cluster until activity died away. At any given moment, the number of neuronal cascades varied but was typically less than 100. The resulting distribution of cluster sizes was also found to be P(S) ∝ S^{−τ} with τ ≈ 3/2 over approximately four orders of magnitude. For a more thorough survey of the experimental evidence of criticality in the brain, I recommend the following reviews: [Breakspear, 2017, Chialvo, 2010, Cocchi et al., 2017].
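The silence-delimited avalanche definition used in these experiments is straightforward to state in code. The sketch below (Python) assumes spikes have already been binned into counts per time step; the toy counts at the bottom are invented for illustration:

import numpy as np

def extract_avalanches(counts):
    """Return (size, duration) pairs from an array of per-bin spike counts."""
    avalanches, size, duration = [], 0, 0
    for c in counts:
        if c > 0:
            size += int(c)
            duration += 1
        elif duration > 0:   # a quiescent bin ends the current avalanche
            avalanches.append((size, duration))
            size, duration = 0, 0
    if duration > 0:         # close an avalanche that runs to the end
        avalanches.append((size, duration))
    return avalanches

counts = np.array([0, 3, 1, 0, 0, 2, 5, 1, 0, 1, 0])
print(extract_avalanches(counts))  # [(4, 2), (8, 3), (1, 1)]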
2.3 Modelling criticality in the brain
Muddying the picture of criticality in the brain are observations of power-law exponents that do not always match the mean-field exponents of directed percolation. Indeed, neuronal avalanches in neural cultures raised in adverse conditions also exhibit apparently
scale-free behaviour, but with a different collection of critical exponents than directed percolation [Yaghoubi et al., 2018]. Additionally, power-laws alone are not enough to signify a critical point: there are many statistical processes that generate power-laws [Newman, 2005]. The observation that neural avalanche statistics obey universal scaling [Friedman et al., 2012] has also been argued to be inadequate, as non-critical processes can also generate power-laws and universal scaling [Touboul and Destexhe, 2017]. Although it is suggestive that the power-law exponents that neural systems exhibit obey the hyper-scaling relations predicted by theories of critical phenomena, to argue that the brain is critical we need more than just statistical evidence. It is also necessary to have accurate and realistic models of neural dynamics that exhibit a phase transition. It also remains to explain why evolution should produce brains that operate close to criticality in the first place. To that end, models of neural dynamics must be related to their capacity for information processing and storage. In this section, we will review a few models used for these purposes.

Figure 2.2: The basic anatomy of a pair of ideal neurons. The neuron outlines are replicated from [Mel, 1994].
2.3.1 Hodgkin-Huxley and other "biological" dynamical neuron models
The most important aspect of neurons, at least for information propagation and processing, is the voltage across their cell membranes. By means of ion pumps, the concentrations of positively charged sodium, potassium, and calcium ions, as well as negatively charged chlorine ions, can be maintained inside the cell at concentrations different from the extracellular environment. This chemical gradient results in a voltage across the cell membrane, which in homeostasis is typically maintained at approximately −70 mV. For an extended discussion of the basic cellular biology and electrical properties of neurons, see chapter 4 of [Kolb and Whishaw, 2009]. Communication between neurons is typically done by way of chemicals known as neurotransmitters. Neurons are cells with four parts: the dendrites, cell body, axon, and synapses (see Figure-2.2). When neurotransmitters reach the dendrites of a neuron, they induce voltage changes in the cell membrane. These voltage perturbations are effectively summed in the cell body. Should the summed voltage exceed a certain threshold, ion channels in the axon will open and initiate a voltage cascade down the axon known as an action potential. When the action potential reaches the axon terminal, the neuron releases neurotransmitters to its daughter neurons, and thereby propagates information. If that voltage threshold is not reached, no action potential occurs, and no information is propagated. For an extended discussion of the different classes of neurotransmitter, their various effects on neurons, and communication between neurons, see chapter 5 of [Kolb and Whishaw, 2009]. The first mathematical model for neurons was the Hodgkin-Huxley model [Hodgkin and Huxley, 1952], for which its namesakes received the 1963 Nobel prize in physiology. The Hodgkin-Huxley model is a set of four coupled differential equations, which represent the opening of ion channels, the flow of ions, and the evolution of the membrane potential [Izhikevich, 2007]. Inputs from other neurons can be modelled as the injection of currents. These currents, along with the Hodgkin-Huxley differential equations, can be integrated to reproduce the firing of action potentials observed in real neurons. For this reason, the Hodgkin-Huxley model is said to belong to an extensive class of neuron models known as "integrate-and-fire" neurons [Izhikevich, 2007]. In practice, these models are never solved analytically; instead, ensembles of neurons obeying these dynamics are simulated. Simulating such models typically involves integrating a collection of differential equations that represent various physiologically-motivated
quantities. In the framework of criticality, simulations of physiologically-motivated neurons can reproduce power-laws; however, this can require extensive fine-tuning of different model parameters. This fine-tuning might be expected if a critical point underlies these power-laws. A typical parameter that must be fine-tuned is the coupling strength between neurons. Below a certain threshold, activity tends to die away, while above a certain level, activity tends to occupy much of the system, which again fits the rough picture of a phase transition. Simulations of ensembles of Hodgkin-Huxley neurons have shown that their dynamical range is maximized at criticality, suggesting that criticality is optimal for encoding and responding to stimuli [Copelli et al., 2005, Kinouchi and Copelli, 2006]. Other biologically-motivated neuron models attempt to answer the question of how neural systems tune themselves to a critical point. In statistical physics it is not unheard of for systems to self-tune to their critical points; such systems are said to exhibit self-organized criticality (SOC) [Bak et al., 1988]. One canonical example is the appearance of power-law distributions in the extent and frequency of wildfires, where growth pushes forests into a critical state, making a large fire possible, which then lowers the system to a sub-critical state [Malamud et al., 1998, Ricotta et al., 1999]. In the context of neural systems, this can be accomplished by enriching the neuron model with self-tuning properties that are typically likened to the homeostatic mechanisms present in real neurons [Hesse and Gross, 2014]. Negative feedback mechanisms, such as synaptic depletion [Levina and Herrmann, 2006] and synaptic adaptation, serve to regulate excess activity. Positive feedback mechanisms, such as the spike-timing-dependent plasticity (STDP) associated with learning, serve to increase activity propagation [Kolb and Whishaw, 2009]. Models that include both positive and negative feedback mechanisms and that appear to tune to a critical point are thought to be examples of SOC. SOC with neuron models has been demonstrated in deep-learning models [Del Papa et al., 2017], as well as simpler automaton models [de Andrade Costa et al., 2015]. In addition to synthetic models, biologically-motivated models of self-tuning with feedback have exhibited SOC [Hesse and Gross, 2014, Kossio et al., 2018, Millman et al., 2010, Orlandi et al., 2013]. Some models also include network dynamics by incorporating the rewiring of neuronal connections [Stepp et al., 2015, van Kessenich et al., 2016, 2018]. Although we can produce richly-detailed models and show that they are sufficient to reproduce phenomena in the brain, it is also important to know what is necessary to reproduce phenomena in the brain. For instance, suppose we were interested in studying the emergence of the RSNs observable in fMRI. Is it the network architecture of the brain that produces and determines the RSNs? Are regulatory mechanisms like inhibitory connections necessary to produce the RSNs? We can answer these types of questions by beginning with simple models, with as few assumptions as possible, and enriching them until we observe the behaviour we are interested in. For this reason, we will also introduce two simple neuron models that have been widely applied: the branching process and the contact process.
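To make the integrate-and-fire idea concrete, the sketch below implements the simplest member of the family, a leaky integrate-and-fire neuron, rather than the full four-equation Hodgkin-Huxley system; all parameter values are generic textbook-style choices, not those used elsewhere in this thesis:

def simulate_lif(i_input, dt=0.1, tau=10.0, v_rest=-70.0, v_thresh=-54.0,
                 v_reset=-80.0, r=10.0):
    """Integrate tau * dV/dt = -(V - v_rest) + R*I; spike and reset at threshold."""
    v, spike_times = v_rest, []
    for step, i_ext in enumerate(i_input):
        v += dt * (-(v - v_rest) + r * i_ext) / tau
        if v >= v_thresh:              # threshold crossed: fire and reset
            spike_times.append(step * dt)
            v = v_reset
    return spike_times

# A constant suprathreshold drive (arbitrary units) gives a regular spike train.
print(simulate_lif([2.0] * 2000))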
2.3.2 Branching processes
As information-processing units, neurons have an important characteristic: they propagate information via an all-or-nothing signal. They either undergo an action potential in response to their parents' stimulus, or they do not. The branching process is the result of throwing away all the internal information about the neuron (e.g. ion flow, membrane potential, etc.) and instead treating its firing stochastically. In the branching process, when a neuron fires at time t, it induces each of its daughters to fire with probability p at time t + 1. An example of this process is given in Figure-2.3.
[Figure 2.3 near here. Top: the network topology, a linear chain of five nodes labelled 0 to 4. Bottom: example dynamics for time steps t = 0 through t = 6.]

Figure 2.3: An example of a branching process on a simple linear bidirectional network (shown at the top). The dynamics consists of a single cascade initiated at node 1 at time t = 1. As connections here are recurrent, nodes can be reactivated, as occurs at node 1 at time t = 3 and at node 3 at time t = 5.
This discretization of time is typically justified by appealing to the fact that the action potential has a characteristic timescale of 1 to 2 ms [Beggs and Plenz, 2003]. Although extremely reductionist, this model is useful for a few reasons. It is analytically tractable and is exactly solved in mean-field. Owing to this, we know that it exhibits a phase transition when the mean branching ratio σ, the average number of immediate descendants of a firing neuron, is one. Below this, activity inevitably dies away; above this, there is a finite probability that any activation leads to an infinitely large and long cascade. Its critical exponents near this point fall into the universality class of directed percolation (see §3.3.1), which, as was noted in 2003 [Beggs and Plenz, 2003], agrees with neural cultures. The branching process is the basis for a broader family of stochastic models, in which neurons may have multiple parents, so that the probability of a neuron firing in a given time-step is generically P(m), where m is the number of active parents. Such models are more likely to have analytical solutions. One example of P(m) is the quorum percolation model, in which P(m) = H(m − m_0), where H is the Heaviside step function, so that a neuron fires only if it has m ≥ m_0 active parents. This reflects the fact that real neurons will almost invariably fire if a sufficient input is applied. One of the findings of the quorum percolation model, when applied to living neural networks, is the existence of a non-zero steady state, in which some fraction of the nodes are always active [Cohen et al., 2010]. In general, if P(m) is monotonic in m, one may predict a steady state. The prediction of a steady state in
neural tissue is unsurprising; healthy human brains are characterized by continuous neural activity, while the absence of activity defines coma.
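To make the branching process concrete, the sketch below simulates single cascades in the mean-field (tree-like) limit, in which each firing neuron attempts to activate k fresh daughters and reactivation through recurrent connections is ignored. Taking k = 10 and p = 1/k places the process at its critical point, σ = kp = 1, where avalanche sizes follow P(S) ∼ S^{−3/2}:

import random

def avalanche_size(k=10, p=0.1, max_size=10**6):
    """Total number of firings in one cascade seeded by a single neuron."""
    size, active = 1, 1
    while active and size < max_size:
        # Each active neuron tries to trigger each of its k daughters
        # independently with probability p (branching ratio sigma = k * p).
        active = sum(1 for _ in range(active * k) if random.random() < p)
        size += active
    return size

sizes = [avalanche_size() for _ in range(10_000)]
# At criticality the size distribution is heavy-tailed: the largest avalanche
# is typically orders of magnitude bigger than the median one.
print(max(sizes), sorted(sizes)[len(sizes) // 2])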
2.3.3 Contact processes
Branching processes are stochastic processes aimed at modelling the behaviour of individual neurons. However, simulating or modelling every neuron in the brain is presently intractable. If the aim is to reproduce observations made at the whole-brain scale, it is necessary to redefine the fundamental unit of the system in question. One way to do this is to consider mesoscopic functional units, such as a micro-cortical column or other brain regions, which are assumed to consist of a few hundred to a few million cells. This is a form of temporal and spatial coarse-graining, and there exist many models for collective neuron dynamics [Breakspear, 2017]. As we are dealing with population dynamics, for which the 1 to 2 ms timescale of individual neurons is much smaller than the 100 to 200 ms timescales of brain regions [van Den Heuvel and Pol, 2010], we often work with continuous-time models [Breakspear, 2017]. We will introduce one family of contact processes, known as compartmental epidemiological models, in which each node is in a particular state. Such processes were originally developed for modelling epidemiological processes, where individuals are either susceptible (S), infected (I), or recovered (or removed) (R) [Boguñá et al., 2003]. In neuroscience parlance, these might correspond to inactive, active, and refractory states. Contact processes are defined by the transition rates between states, and are typically named by which states are accessible, and in which order. For instance, the susceptible-infected (SI) process models a population of susceptible individuals that may become sick without any hope of recovery. This might be appropriate for modelling a chronic illness, such as the human immunodeficiency virus. Diseases where recovery confers immunity might be modelled with a susceptible-infected-recovered (SIR) model, and a disease in which there is no immunity after being cured might be modelled with a
infected-susceptible (SIS) model. Initially, such models considered so-called "well-mixed" populations, where every individual interacts with every other individual. This leads to very simple dynamics that allow us to model the evolution of the population with a set of coupled differential equations. For instance, the two-compartment SIS model is governed by the following differential equations [Brauer, 2008]:

\dot{S} = -\lambda SI + \mu I \,, \qquad (2.7)

and

\dot{I} = -\mu I + \lambda SI \,, \qquad (2.8)

where S and I are the fractions of the population that are susceptible and infected respectively (with S + I = 1), λ denotes the rate at which the infected spread their disease to the susceptible, and µ is the rate of recovery from the disease. For structured populations, we again begin with rates of infection but assume that nodes can only affect the states of their neighbours. As all updates are controlled by rate equations, we have asynchronous updates characterized by state changes at intervals drawn from an exponential distribution. Such systems are typically simulated using the Gillespie algorithm or its optimized variations [Cota and Ferreira, 2017]. On both undirected and directed networks, these models have a phase transition. For large λ these models have an epidemic phase, where the population of infected or recovered is a non-zero fraction of the total network size. For small λ, in the infinite system size limit, there is a non-epidemic phase [Ferreira et al., 2012]. The critical λ_c will depend on the network topology; however, regardless of the topology, near this phase transition the process falls into the universality class of directed percolation [Ferreira et al., 2012, Kwon and Kim, 2013, Lee et al., 2013, Parshani et al., 2010]. Of course, when the structure of the network simulated is all-to-all, we reproduce the well-mixed populations and the original population differential equations reappear.
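As an illustration, here is a minimal (unoptimized) Gillespie simulation of the SIS process on a network, assuming an adjacency-list representation. The function name and data layout are illustrative, and the event-list rebuild at every step is exactly the cost that the optimized variants of [Cota and Ferreira, 2017] avoid.

```python
import random

def gillespie_sis(adj, lam, mu, t_max, seed_nodes):
    """Exact stochastic simulation of SIS dynamics on a network.
    adj: dict mapping node -> list of out-neighbours; lam: per-edge
    infection rate; mu: recovery rate. Returns (time, #infected) samples."""
    infected = set(seed_nodes)
    t, history = 0.0, [(0.0, len(infected))]
    while infected and t < t_max:
        # Possible transitions: recovery of each infected node, and
        # infection along each infected -> susceptible edge.
        si_edges = [(i, j) for i in infected for j in adj[i]
                    if j not in infected]
        total_rate = mu * len(infected) + lam * len(si_edges)
        t += random.expovariate(total_rate)  # exponential waiting time
        if random.random() < mu * len(infected) / total_rate:
            infected.remove(random.choice(tuple(infected)))  # recovery
        else:
            infected.add(random.choice(si_edges)[1])          # infection
        history.append((t, len(infected)))
    return history
```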
Although there is extensive literature dealing with the use of coarse-grained contact processes in neuroscience [Breakspear, 2017], there are two examples of the SIS model that I would like to highlight. The first is the work of Haimovici et al., who showed that running the SIS model on the human connectome (the network structure of functional regions of the brain) results in the appearance of the resting state network, but only when the SIS model is tuned to its critical point [Haimovici et al., 2013]. This is a significant result, because it highlights the necessity of both critical dynamics and brain structure to reproduce resting state networks, without requiring any detailed description of the dynamics at the neural level. The second result I would like to highlight also relates to the effects of network architecture on large-scale neural dynamics. One challenge to the criticality hypothesis is that the critical point requires fine-tuning. The typical answer to this is that neural systems exhibit self-organized criticality [Beggs, 2008, Beggs and Timme, 2012]. Moretti and Muñoz offered a complementary solution to the tuning problem, by suggesting that the heterogeneity of the network structure of the brain results in an extended critical regime, dubbed a "Griffiths phase" [Moretti and Muñoz, 2013]. An extended critical regime would considerably relax the requirement for fine-tuning to a critical point. One notable aspect of the Griffiths phase is the presence of continuously-varying critical exponents, both in the avalanche distribution and in the exponents related to activity decay. Moretti and Muñoz demonstrated this result on hierarchical modular networks (see §3.1.6 for details), which reflect the tendency for the brain to be both modular, with functions associated to a certain region, and hierarchical, with subsequent refinement of function within sub-regions (see [Meunier et al., 2010] for a review of hierarchical modularity in brain networks). It should be noted, however, that Griffiths phases have also been observed on more generic modular networks [Cota et al., 2018]. Both results show that some observables are informed more by the underlying network structure than by the underlying dynamics.
2.4 Noise in the brain
In most models of neuronal avalanches, initiation of activity occurs on a timescale that is distinct from the propagation of the avalanche, meaning that the two phenomena can be separated. However, this is not always a biologically accurate assumption to make. For instance, although typically neurons only release neurotransmitters when undergoing an action potential, occasionally synaptic vesicles full of a neurotransmitter will be spontaneously ejected from the synaptic bulb. This leads to a small depolarization of the membrane of the daughter neuron, in a process known as a "mini". These minis play a role in evoking spontaneous action potentials, even when none of the parent neurons have undergone an action potential [Kavalali, 2015, Sara et al., 2005]. Further, even if the minis alone do not cause a neuron to fire, they can serve to change the relative propensity to fire: a neuron that has recently been exposed to a mini may be partially depolarized, making it more likely to fire due to other inputs from a parent neuron. Minis are not the only source of noise. More generally speaking, unless one is simulating the entire brain, there will be neurons outside of the simulation whose activity may occasionally impinge upon the simulation and drive activity therein. This is not a problem unique to the constraints of simulation. In experiments studying neuronal cultures, there are often neurons outside of the field of view whose activity evokes activity within the field of view [Wilting and Priesemann, 2018]. Connections to neurons outside of the region of interest can be viewed as another source of noise. Lastly, there also exist sensory neurons that respond to stimuli and forces outside of the neural network. These outside stimuli also comprise a source of noise.
2.4.1 The effect of noise on observables
The presence of noise significantly complicates the notion of an avalanche as a cascade of causal activity.
[Figure: log-log plot titled "Overlapping avalanches distort avalanche statistics", showing the probability P(S) versus avalanche size S (10^0 to 10^8) for avalanche triggering rates 0, 10^−3, 5 × 10^−3, 10^−2, and 2 × 10^−2.]
Figure 2.4: The results of overlapping avalanches, when avalanches are initiated as a Poisson process of various rates. Avalanche sizes are drawn from a pure power-law, P(S) ∼ S^{−3/2}, and avalanche durations are assumed to scale as T ∼ √S, with time rescaled so that the duration of a size-1 avalanche is T = 1. If another avalanche is triggered in the timespan of the first, their sizes are added and the length of the avalanche is potentially increased, possibly including another independent cascade.
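A minimal sketch of the numerical experiment behind Figure 2.4: the sampler and merging rule follow the caption, while the truncation point `s_max` and the sample count are illustrative choices.

```python
import numpy as np

def sample_sizes(n, alpha=1.5, s_min=1.0, s_max=1e8, rng=None):
    """Inverse-transform sampling of a truncated power law P(S) ~ S^-alpha."""
    rng = rng or np.random.default_rng()
    u, a = rng.random(n), 1.0 - alpha
    return (s_min**a + u * (s_max**a - s_min**a))**(1.0 / a)

def observed_avalanches(rate, n=10**6, rng=None):
    """Initiate avalanches as a Poisson process of the given rate and merge
    any that overlap in time, mimicking an observer who delineates
    avalanches by periods of quiescence. Durations scale as T ~ sqrt(S)."""
    rng = rng or np.random.default_rng()
    sizes = sample_sizes(n, rng=rng)
    starts = np.cumsum(rng.exponential(1.0 / rate, n))  # Poisson initiations
    ends = starts + np.sqrt(sizes)                      # T ~ sqrt(S), T(1) = 1
    merged, cur_size, cur_end = [], sizes[0], ends[0]
    for start, end, size in zip(starts[1:], ends[1:], sizes[1:]):
        if start <= cur_end:          # overlap: sizes add, duration may extend
            cur_size += size
            cur_end = max(cur_end, end)
        else:
            merged.append(cur_size)
            cur_size, cur_end = size, end
    merged.append(cur_size)
    return merged  # a histogram of this approximates the distorted P(S)
```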
To illustrate the problem, imagine that the criticality hypothesis holds on the scale of a single neuronal culture and that neuronal avalanches are distributed as P(s) ∝ s^{−1.5}, with the duration of a neuronal avalanche related to its size by ⟨T⟩(s) ∝ √s, as is the case in the branching process. If we consider a large ensemble of decoupled patches of neurons, the initiation of neuronal avalanches in each patch will be independent and presumably uncorrelated. Therefore, the overall initiation of avalanches will be Poisson, and it will be possible for spatially distinct avalanches to overlap in time. If we simulate this process and require that a "single" avalanche be separated by a period of quiescence, then for quite moderate rates of avalanche initiation we find significant deviations from a single pure power-law, as in Figure 2.4. Over the first several decades of the probability distribution, the associated power-law exponent depends on the level of spontaneous activity. Due to experimental limitations, it is typical to offer far less than four orders of magnitude as evidence for scale-free behaviour [Friedman et al., 2012, Tagliazucchi et al., 2012]. What this shows is that were an experimentalist to observe a large neural system and use the definition of avalanche due to Beggs et al., their observations of avalanches might be biased by non-causal temporally-overlapping cascades. The experimentalist might therefore report different "scale-free" exponents than are actually present in the underlying avalanche process. To overcome the challenge presented by noise, there have been several efforts to generalize avalanches in such a way as to retain their causal aspect. One effort that we already highlighted in section 2.2 was that of Tagliazucchi et al., who were attempting to observe avalanche phenomena in fMRI data. Because at any given moment several independent cascades of activity were travelling across the cortex, they separated activity spatially, by assuming that neighbouring brain regions had the strongest influence on each other [Tagliazucchi et al., 2012]. Essentially the same approach was employed at the mesoscale in mouse brains using optical imaging: at any given moment there were numerous patches of active cortex, and these clusters were labelled independently and merged upon contacting each other [Scott et al., 2014]. The use of structure to help disentangle independent causal activity has led to an interest
in generalizing avalanches to a new structure known as a "causal web" or cweb [Williams-Garcia et al., 2017], which makes use of the network structure in identifying causal activity (see Figure 2.5). In the limit of low rates of spontaneous activity, where every cascade has only a single progenitor and there is a separation of timescales between the initiation and propagation of activity, cwebs are exactly the traditional avalanches. Thus, in the low-noise limit, cwebs can be scale-free. In more active systems, cwebs allow independent coincident cascades to be separated. Demonstrating that systems with high levels of spontaneous activity can also produce scale-free distributions of cwebs will be one of the principal aims of this thesis.

Figure 2.5: Causal webs can be used to distinguish spatially distinct events, as well as the progenitor events in avalanches. On the left are the spike trains observed in the neurons on the right. There are two causal webs of size three, as well as one causal web of size four. Under the traditional model of avalanches, with avalanches delineated by periods of silence, there would be two avalanches: one of size six and one of size four.

Determination of cwebs requires knowledge of the network structure. This is a challenge for experimental use, as obtaining the complete network topology of living neural networks is an open problem. Hence, it is also of interest to study how other, more experimentally-accessible indicators of criticality change in the presence of noise. These indicators might be measurements of the local branching ratio [Beggs, 2008], the susceptibility [Moretti and Muñoz, 2013], or the size of fluctuations in the active fraction of neurons [Williams-García et al., 2014]. It has already been shown that in the presence of noise, some measures of the
correlation length no longer diverge, but instead reach a maximum known as the Widom line [Williams-García et al., 2014]. Measurements of the branching ratio will also be affected by the presence of spontaneous activity, as noise and other active parents together drive a daughter neuron to activate. This disrupts another classical signature of a "critical" state, namely that the branching ratio is 1 [Beggs, 2008, Poil et al., 2008, Shew and Plenz, 2013].
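Given known connectivity, the clustering of spikes into cwebs can be made concrete. Below is a minimal discrete-time sketch; the one-time-step causal rule is a simplification of the scheme in [Williams-Garcia et al., 2017], and the data layout is an illustrative assumption.

```python
import networkx as nx

def causal_webs(spikes, adj, dt=1):
    """Group spikes into causal webs. `spikes` is a set of (neuron, time)
    pairs; `adj` maps each neuron to its postsynaptic neighbours. A spike
    (i, t) is causally linked to (j, t + dt) when j is a daughter of i;
    connected components of the resulting graph are the cwebs."""
    g = nx.Graph()
    g.add_nodes_from(spikes)
    for (i, t) in spikes:
        for j in adj.get(i, ()):
            if (j, t + dt) in spikes:
                g.add_edge((i, t), (j, t + dt))
    return [len(c) for c in nx.connected_components(g)]  # cweb sizes
```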
2.4.2 Modelling noise in the brain
Numerous studies simulating neuronal avalanches begin with the assumption of a separation of time scales between the resolution and initiation of avalanches, by initiating a new cascade whenever one finishes [de Andrade Costa et al., 2015, Girardi-Schappo et al., 2016, Moretti and Muñoz, 2013, Odor, 2016, Plenz, 2012, Williams-García et al., 2014]. However, there are several studies [Orlandi et al., 2013, Poil et al., 2012] which include noise in their dynamics to drive the production of avalanches, though none has made a systematic study of the appropriate level of noise. Typically, these models assume a very low level of homogeneous spontaneous activity, so that they effectively have only one avalanche at a time. In dynamical models that describe the membrane potential, noise plays an important role in depolarizing the membrane and making it easier for a neuron to trigger other neurons [Orlandi et al., 2013]. It has been observed that noise can be focused into coherent activity and that this focusing behaviour is determined by the network structure [Orlandi and Casademunt, 2017, Orlandi et al., 2013]. A variant of the branching process known as the cortical branching model has been used to illustrate the distortion of power-laws in the presence of noise [Williams-Garcia et al., 2017]. There, noise led to extended avalanches and non-scale-free avalanche distributions. A recent preprint studying the effect of ongoing noise during avalanches, but under the assumption of a separation of timescales between avalanches, has shown that noise can change the underlying exponent characterizing the resulting avalanche distribution [Das and Levina, 2018]. However, this work did not consider the impact of possibly distinct, but
concurrent, cascades on the avalanche distribution. Finally, no equivalent work has studied which, if any, critical exponents describe the causal web distribution in the presence of noise.
2.5 Summary
In this chapter, we have introduced neuroscience's criticality hypothesis: the notion that neural systems operate at a critical point. This hypothesis is motivated by experimental observations of power-laws in neuronal cultures and fMRI studies. Theoretical models also predict the same power-laws at a phase transition and, by use of homeostatic mechanisms, explain how the brain might self-tune to this critical point. Although there exist sophisticated models of neuronal behaviour, it turns out that much of the biological detail is unnecessary when it comes to reproducing the aggregate observations of neurons. The essential ingredient is that networks have excitable nodes that can spread their activity. Simple branching and contact processes are sufficient to reproduce many results related to neuronal avalanches. However, few of these theoretical models include noise. Noise is a significant complication for the traditional observables related to criticality, such as scale-free avalanches and the branching ratio. One possible generalization of neuronal avalanches, the causal web, will be an object of study for the remainder of this thesis.
Chapter 3
Mathematical Background
In this chapter, I will introduce some necessary background material on the topology of networks, using the language of graph theory, and on the dynamics of systems on complex networks, using the language of percolation.
3.1 Random graphs and network theory
Here I will introduce the terminology of network theory and methods for constructing graphs. A graph is a structure G consisting of two sets: a set N of N labelled vertices (also known as nodes) and a set L of L links (also known as edges). Each vertex is labelled i ∈ {1, . . . , N}.
Each link can be represented as an ordered pair (i, j) ∈ Z_N × Z_N, and represents a connection between two vertices. In the networks we study, we do not allow self-links. Graphs can be considered directed or undirected. In the directed case, the link l_{ij} denotes a connection from i to j. In the undirected case, l_{ij} denotes that the connection from i to j is reflexive, i.e. that there is also a connection from j to i. An undirected graph with L links can always be represented as a directed graph with 2L links, by replacing each of the L links l_{ij} with two directed links: l′_{ij} and l′_{ji}. A subgraph H is a graph whose vertices N′ and links L′ are themselves subsets of the vertices and links of another graph G, so that N′ ⊆ N and L′ ⊆ L. The underlying graph of a directed graph is the undirected graph obtained by replacing all directed links in the directed graph with undirected links. A complete graph is an undirected graph in which every distinct pair of vertices is connected by a single link.
Two vertices i and i_n in a directed graph are considered strongly-connected if there exist links l_{i i_1}, l_{i_1 i_2}, . . . , l_{i_{n−1} i_n} forming what is known as a directed path between the two vertices, and likewise a directed path from i_n back to i. A graph is considered strongly-connected if every pair of vertices i, j with i ≠ j is strongly-connected. Two vertices are considered weakly-connected if there exists a path between them in the corresponding underlying graph. A directed graph is weakly-connected if every pair of vertices i, j is weakly-connected. The giant strongly-connected component (GSCC) is the largest subgraph of a directed graph that is strongly-connected. The in component of the GSCC is the set of vertices reachable from the GSCC by traversing links backwards, or equivalently, the set of all vertices that can reach the GSCC by following directed outgoing links. The out component of the GSCC is the set of vertices reachable from the GSCC by following outgoing links from the GSCC. If a graph is strongly-connected, then the GSCC, in component, and out component are identical and constitute the entire graph. Similarly, the giant weakly-connected component (GWCC) is the largest subgraph of a directed graph that is weakly-connected. The distance between two vertices is the length of the shortest path between them, should such a path exist. Should it not, the distance is considered to be infinite. The diameter of a graph is the greatest distance between any two vertices in the graph. The out-degree of a vertex is the number of links that originate at the vertex. The in-degree of a vertex is the number of links that terminate at the vertex. The degree of a vertex is the sum of the in- and out-degrees of the vertex. In undirected graphs, the degree is just the number of links involving the vertex.
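In practice these components are easy to extract with standard tools; a small sketch using networkx (the graph parameters here are arbitrary):

```python
import networkx as nx

# A sparse random directed graph (parameters chosen arbitrarily)
g = nx.gnp_random_graph(200, 0.02, seed=1, directed=True)

# Largest strongly- and weakly-connected components, as node sets
gscc = max(nx.strongly_connected_components(g), key=len)
gwcc = max(nx.weakly_connected_components(g), key=len)
print(len(gscc), len(gwcc))  # the GWCC always contains the GSCC
```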
3.1.1 k-ary trees
A rooted k-ary tree is a type of weakly-connected directed graph with no loops. It has a single vertex of in-degree 0, called the 'root', from which all other vertices can be reached, and all other vertices have in-degree 1 [Graham et al., 1989]. The complete k-ary tree has an out-degree of k at each vertex and is therefore an infinite graph. An incomplete k-ary tree has an out-degree of at most k at each vertex. The number of rooted, incomplete k-ary trees with s vertices is given by the Fuss-Catalan numbers (see page 347 of [Graham et al., 1989])

C_s^{(k)} = \frac{1}{(k-1)s + 1} \binom{ks}{s} \,. \qquad (3.1)
Each such tree has a perimeter t, the total number of additional outgoing edges that could be attached to its vertices while keeping the tree k-ary, which is linearly related to the size of the tree by
t = (k − 1)s + 1 . (3.2)
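Equations (3.1) and (3.2) are easy to check numerically; a small sketch:

```python
from math import comb

def fuss_catalan(k, s):
    """Number of rooted, incomplete k-ary trees with s vertices, Eq. (3.1).
    The division is always exact, so integer division is safe."""
    return comb(k * s, s) // ((k - 1) * s + 1)

def perimeter(k, s):
    """Perimeter of a k-ary tree with s vertices, Eq. (3.2)."""
    return (k - 1) * s + 1

# For k = 2 the sequence reduces to the Catalan numbers: 1, 1, 2, 5, 14, ...
print([fuss_catalan(2, s) for s in range(5)])
print(perimeter(2, 4))  # a binary tree with 4 vertices has perimeter 5
```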
3.1.2 k-regular graphs
An undirected k-regular graph is a graph where the degree of each vertex is exactly k. A directed k-regular graph is a graph where the in- and out-degrees of each vertex are exactly k. An example of an undirected 2-regular graph is a ring graph, where each node can be embedded on a ring and is connected to its nearest neighbour in either direction. The strongly-connected 1-regular graph is a ring of vertices, all connected to their nearest neighbour in the (without loss of generality) clockwise direction. Although k-regular graphs may be highly structured, as in the previous examples, for most of this thesis we will be interested in random graphs. A random k-regular graph with n vertices is a graph selected from the set of all k-regular graphs with n vertices, with equal weight on each graph in the set, so that P(G) = 1/Ω, where Ω denotes the number of k-regular graphs with n vertices [Bollobás, 2001, Newman, 2010]. As the undirected k-regular graph is fairly homogeneous, it
28 shouldn’t be surprising that the diameter d of the random k-regular graph has the relatively tight bound
log(2kn log n) 1 + blog nc + blog (log n) − log (6k/(k − 2))c ≤ d ≤ 1 + (3.3) k−1 k−1 k−1 log(k − 1) per [Bollob´as,2001]. Generating finite k-regular graphs is not particularly difficult. To accomplish this, we will introduce the configuration model of graph generation. In general, the configuration model allows for the generation of a random graph with a presupplied (in/out) degree distri- bution, although in the case of a random k-regular graph, the degree distribution is simply in/out p (m) = δmk.
The configuration model
We will consider the configuration model for graphs of N vertices, each with uncorrelated in- and out-degree, so that p(j, m) = p^{in}(j) p^{out}(m), where p(j, m) denotes the probability of a random vertex having in-degree j and out-degree m. Each vertex i is assigned j_i incoming 'stubs' and m_i outgoing 'stubs', drawn from p^{in}(j) and p^{out}(m) such that \sum_{i=1}^{N} m_i = \sum_{i=1}^{N} j_i. The incoming and outgoing stubs are randomly paired off. In principle, for degree distributions p^{in}(j) and p^{out}(m) with a well-defined variance, this procedure will succeed in generating a simple graph (i.e. with no self-loops or duplicate edges) a non-zero proportion of the time [Chen and Olvera-Cravioto, 2013]. However, this proportion may be very small for even moderate N, and the procedure may need to be repeated many times before successfully generating a random graph. There are a few ways to deal with the presence of duplicate edges or self-connections. One approach is to simply delete the offending edges. This has two problems, however: firstly, it distorts the degree distribution, and secondly, it can bias the graph selection process. This makes analytical results difficult or impossible to obtain [Newman, 2010]. However,
in the limit of large N, the fraction of duplicate edges or self-connections vanishes, and asymptotically we recover the same degree distributions [Chen and Olvera-Cravioto, 2013]. Another possible approach, which maintains the degree distribution, is to introduce restrictions during the graph generation process. We subject the pairing of stubs to two restrictions: (i) an incoming stub of vertex i may not be paired off with an outgoing stub of vertex j if i = j, and (ii) an incoming stub of vertex i may not be paired off with an outgoing stub of vertex j if there already exists an incoming stub of vertex i paired with an outgoing stub of vertex j. These two restrictions prevent the formation of self-connections and duplicate connections between vertices. Should it be impossible to pair off a given stub (say, if doing so would require a self-connection), existing pairs are randomly reassigned in accordance with the two previous rules, until it is possible to pair off the remaining stubs. Once all stubs are paired off, the graph may be constructed, with an outgoing stub from i paired to an incoming stub of j constituting a single directed link l_{ij}. Introducing these restrictions means that we no longer weight all possible graphs equally; however, this discrepancy becomes small for large N [Chen and Olvera-Cravioto, 2013]. We have also glossed over one of the other challenges in implementing the configuration model: it is not always trivial to sample two distributions p^{out} and p^{in} subject to the constraint that the sums of the m_i and j_i are equal, although there exist algorithms that will produce graphs with the same degree distribution for large N [Chen and Olvera-Cravioto, 2013]. For this reason we will only use the configuration model for trivial degree distributions, such as the k-regular graphs.
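A minimal sketch of the configuration model for the directed k-regular case. For simplicity, this version rejects and resamples whole matchings containing self-loops or duplicates, rather than the local reassignment described above; for small k and large N the rejection rate is modest.

```python
import random

def directed_k_regular(n, k, max_tries=1000):
    """Random directed k-regular graph via stub matching: every vertex
    gets k incoming and k outgoing stubs, paired uniformly at random;
    matchings with self-loops or duplicate edges are rejected."""
    out_stubs = [v for v in range(n) for _ in range(k)]
    for _ in range(max_tries):
        in_stubs = [v for v in range(n) for _ in range(k)]
        random.shuffle(in_stubs)
        edges = set(zip(out_stubs, in_stubs))
        if len(edges) == n * k and all(i != j for i, j in edges):
            return edges  # a simple directed k-regular graph
    raise RuntimeError("no simple matching found; increase max_tries")
```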
3.1.3 Erdős-Rényi graphs
An Erdős-Rényi graph may refer to a graph belonging to one of two families of random graphs. In the G(N, L) model, a graph with N nodes and L links is chosen randomly from the set of all graphs with N nodes and L links, with equal weight on all graphs in the set.
In the G(N, p) model, a graph is constructed by connecting nodes at random, with each possible link included with probability p [Newman, 2010]. We will focus on the second family of random graphs, as the construction of random graphs in this class can be easily implemented algorithmically. The (in/out) degree distribution for the G(N, p) model is given by the binomial distribution

p(m) = \binom{N-1}{m} p^m (1-p)^{N-1-m} \,, \qquad (3.4)
which in the limit N ≫ m with Np held constant takes on the form
p(m) = \frac{(pN)^m e^{-pN}}{m!} \,, \qquad (3.5)
which we recognize as the Poisson distribution with mean pN. Hence, the mean degree in an Erdős-Rényi graph is approximately pN, in agreement (for large N) with the mean p(N − 1) of the binomial distribution. When implementing the generation of Erdős-Rényi graphs with fixed pN, the naive algorithm evaluating all N(N − 1) possible links (or N(N − 1)/2 links in the undirected case) runs in O(N²) time. However, if we instead sample from the Poisson distribution (which runs in constant time using a lookup table) to determine the out-degree of each vertex, and choose destination vertices at random, we obtain linear O(N) time. Owing to the original paper in which the G(N, p) model was introduced [Erdős and Rényi, 1960], we know that the graph is almost certainly connected if
pN > log N. (3.6)
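A sketch of the linear-time sampler described above (directed case; following the text, rare duplicate links and self-loops are ignored, which is justified only for large N):

```python
import numpy as np

def fast_gnp(n, mean_degree, rng=None):
    """O(N)-time approximation to directed G(N, p) with pN = mean_degree:
    draw each vertex's out-degree from Poisson(pN), then choose that many
    destination vertices uniformly at random."""
    rng = rng or np.random.default_rng()
    out_degrees = rng.poisson(mean_degree, n)
    return [(i, int(t)) for i in range(n)
            for t in rng.integers(0, n, out_degrees[i])]

edges = fast_gnp(10**5, 3.0)  # ~3e5 edges, generated in linear time
```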
3.1.4 Small-world graphs
The random graphs that we have discussed thus far exhibit low clustering. Clustering refers to the propensity of a graph to have nodes that are tightly knit into groups. There are
several ways to evaluate the clustering of a graph, including global and local measures. The transitivity index (also sometimes known as the global clustering coefficient) is given by [Wasserman and Faust, 1994]

C(G) = \frac{\#\text{ of closed triads}}{\#\text{ of possible triads}} = \frac{3 \times \#\text{ of triangles}}{\#\text{ of possible triads}} \,, \qquad (3.7)
where a triad is an ordered set of any three vertices. A triad is closed if edges exist (directed or otherwise) between each pair of nodes in the triad. The transitivity index is a global measure; however, there exists a local measure, known as the local-clustering coefficient. The local-clustering coefficient of a vertex V in an undirected graph was introduced by Watts and Strogatz to be
CC(V) = \frac{2 N_V}{K_V (K_V - 1)} \,, \qquad (3.8)
where N_V is the number of connections between neighbours of V and K_V is the degree of V [Watts and Strogatz, 1998]. The clustering coefficient varies between 0, where V is the centre of what is locally a star graph, and 1, where V is the centre of a clique (a subgraph where all nodes are connected to all others). From this local-clustering coefficient, a global measure C(G) may be defined on a graph as the mean local-clustering coefficient [Watts and Strogatz, 1998]. We will refer to this as the WS-clustering coefficient (WS denoting Watts-Strogatz).
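Both clustering measures are available in standard libraries; a quick numerical check that the Erdős-Rényi graph has low clustering (parameters arbitrary):

```python
import networkx as nx

g = nx.erdos_renyi_graph(1000, 0.01, seed=42)

print(nx.transitivity(g))        # global measure, Eq. (3.7)
print(nx.average_clustering(g))  # WS-clustering coefficient, Eq. (3.8)
# Both should be close to p = 0.01 for an Erdos-Renyi graph
```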
For the Erdős-Rényi graph, for instance, the expected number of neighbour interconnections N_i for a node with K_i edges is just pK_i(K_i − 1)/2, as each connection occurs with probability p and