UNIVERSITY OF CALGARY

The Universal Critical Dynamics of Noisy Neurons

by

Daniel James Korchinski

A THESIS SUBMITTED TO THE FACULTY OF GRADUATE STUDIES IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE

GRADUATE PROGRAM IN PHYSICS AND ASTRONOMY

CALGARY, ALBERTA May, 2019

© Daniel James Korchinski 2019

Abstract

The criticality hypothesis posits that the brain operates near a critical point. Typically, critical neurons are assumed to spread activity like a simple branching process and thus fall into the universality class of directed percolation. The branching process describes activity spreading from a single initiation site, an assumption that can be violated in real neurons, where external drivers and noise can initiate multiple concurrent and independent cascades. In this thesis, I use the network structure of neurons to disentangle independent cascades of activity. Using a combination of numerical simulations and mathematical modelling, I show that criticality can exist in noisy neurons, but that the presence of noise changes the underlying universality class from directed to undirected percolation. Directed percolation describes only the small-scale distributions of activity; on larger scales, cascades can merge together and undirected percolation is the appropriate description.

Preface

This thesis is an original work by the author. No part of this thesis has been previously published.

Acknowledgements

This work would not have been possible without the gracious financial support of the Natural Sciences and Engineering Research Council, Alberta Innovates, the University of Calgary's Faculty of Graduate Studies, Student Aid Alberta, and the Nathoo family. I would like to express my gratitude to Professor Jörn Davidsen, who captured my imagination by introducing criticality in the brain to me. He started me on this journey, and with his patience, support, and drive, has seen me through to completion. I'd also like to thank Dr. Seung-Woo, for the many fruitful conversations and suggestions made over coffee and while reviewing. Javier Orlandi was also tremendously helpful with numerous technical details related to modelling biological neurons. Without these three, this thesis would be a shadow of its present state. I would also like to thank my parents for their support and gently prodding questions, and Raelyn for her humour and cheer on days that mine lapsed.

Table of Contents

Abstract

Preface

Acknowledgements

Table of Contents

List of Figures and Illustrations

List of Tables

List of Symbols, Abbreviations and Nomenclature

1 Introduction
  1.1 Complex systems
  1.2 Complex networks
  1.3 The brain as a complex system

2 Criticality in Neural Systems
  2.1 A brief review of criticality
  2.2 Experimental evidence of neural criticality
  2.3 Modelling criticality in the brain
    2.3.1 Hodgkin-Huxley and other "biological" dynamical neuron models
    2.3.2 Branching processes
    2.3.3 Contact processes
  2.4 Noise in the brain
    2.4.1 The effect of noise on observables
    2.4.2 Modelling noise in the brain
  2.5 Summary

3 Mathematical Background
  3.1 Random graphs and network theory
    3.1.1 k-ary trees
    3.1.2 k-regular graphs
    3.1.3 Erdős-Rényi graphs
    3.1.4 Small-world graphs
    3.1.5 Power-law graphs
    3.1.6 Hierarchical modular graphs
  3.2 Percolation
    3.2.1 Percolation in 1-dimension
    3.2.2 Percolation on the Bethe lattice
    3.2.3 Percolation on other graphs
  3.3 Directed percolation
    3.3.1 Spreading processes
  3.4 Summary

4 The Branching Process with Noise
  4.1 Results for the branching process with noise on infinite k-regular graphs
    4.1.1 Active fraction
    4.1.2 Mean cluster size
    4.1.3 Phase diagram
    4.1.4 Mergeless cluster distribution
    4.1.5 Cluster size distribution
    4.1.6 Avalanche duration and scaling relations
    4.1.7 Correlation length
    4.1.8 Size of the giant component
  4.2 Numerical results for the branching process with noise on finite k-regular graphs
    4.2.1 Avalanche distributions
    4.2.2 The giant component in finite graphs
    4.2.3 Mean cluster size
  4.3 Simulations on other finite networks
    4.3.1 Small-world graphs
    4.3.2 Power-law networks
    4.3.3 Hierarchical modular networks
  4.4 Thresholded avalanches
  4.5 Summary

5 Quadratic Integrate-and-Fire neurons
  5.1 The model
  5.2 Simulations on Erdős-Rényi and hierarchical modular networks
  5.3 Summary

6 Conclusions
  6.1 Summary of results
  6.2 Outlook and future work

Bibliography

A Supplementary Figures

B Numerical Methods
  B.1 Simulation of infinite k-regular branching processes with spontaneous activity

List of Figures and Illustrations

2.1 Neuronal avalanches, reproduced from Beggs [Beggs and Plenz, 2003]. Top: each point indicates the detection of an action potential at that electrode label. Bottom: Detail showing the evolution of a single avalanche.
2.2 The basic anatomy of a pair of ideal neurons. The neuron outlines are replicated from [Mel, 1994].
2.3 An example of a branching process on a simple linear bidirectional network (shown at the top). The dynamics consists of a single cascade initiated at node 1 at time t = 1. As connections here are recurrent, nodes can be reactivated, as occurs at node 1 at time t = 3 and node 3 at time t = 5.
2.4 The results of overlapping avalanches, when avalanches are initiated as a Poisson process of various rates. Avalanche sizes are drawn from a pure power-law, P(S) ∼ S^{−3/2}, and avalanche durations are assumed to scale as T ∼ √S, with time rescaled so that the duration of a size-1 avalanche is T = 1. If another avalanche is triggered in the timespan of the first, their sizes are added and the length of the avalanche is potentially increased, possibly including another independent cascade.
2.5 Causal webs can be used to distinguish spatially distinct events, as well as the progenitor events in avalanches. On the left are the spike trains observed in the neurons on the right. There are two causal webs of size three, as well as a causal web of size four. Under the traditional model of avalanches, with avalanches delineated by periods of silence, there would be two avalanches: one of size six and one of size four.

3.1 A demonstration of the Watts-Strogatz model. (a) A circulant graph, connecting the nearest two neighbours (giving each node a degree of four), is shown for N = 10. Of the 20 bonds, 4 are selected for rearrangement. (b) The bonds for rearrangement retain one end-point while the other is swapped for another at random.
3.2 Degree distribution for power-law networks with uncorrelated degree distributions generated via the configuration model, with λ = 3.5 and k_min = 5, averaged across 500 networks of size N = 10^5.
3.3 Degree distribution for power-law networks with uncorrelated degree distributions generated via the Goh model, with λ = 3.5 and ⟨k⟩ = 10, averaged across 500 networks of size N = 10^5.

3.4 The in/out-degree correlations resulting from an ensemble of 500 networks, both with an asymptotic degree distribution of p(k) ∼ k^{−3.5}. (a) In/out-degree correlations for power-law networks generated by the configuration algorithm, as in Figure-3.2. (b) In/out-degree correlations for power-law networks generated by the Goh algorithm, as in Figure-3.3.
3.5 Base modules are represented by filled squares. Each base module might contain a dense network of neurons. Modules are wired into pairs – these pairs constitute a super-module. Super-module pairings are indicated by a lighter shade of blue. Super-super-modules are constructed from pairs of super-modules, and are indicated by the lightest shade of blue. During the formation of the super-super-modules, a base module from each of the super-modules is selected; these two base modules are then wired together, as indicated with the lightest-blue edge. A single super^3-module is constructed from the two super^2-modules, and is indicated in green. Two base modules, one from each super^2-module, are wired together; this connection is indicated in green.
3.6 A simple example of the vertices and edges populating two base modules (coloured blue) connected together to form a super-module (coloured purple). Here the number of intra-vertices per module, NPN, is 5. Each intra-vertex is coloured navy blue. The number of inter-vertices, NPC, is 2, and each inter-vertex is coloured red. Here, the out-degree of each vertex is 2. The inter-vertices only connect to the intra-vertices of the other module; their edges are in purple. Intra-vertices can connect to other intra-vertices or to the inter-vertices of the same module; their edges are shown here in green. The populations of intra-vertices and inter-vertices are circled in light blue.
3.7 An example of three infinite lattice structures. (a) The 1-dimensional lattice. (b) The Bethe lattice of degree 4. (c) The 2-dimensional triangular lattice.
3.8 Percolation on a 1D lattice of size N = 14. Occupied sites are coloured black. Sampling the seven active nodes, the cluster size distribution is P_n(S = 1) = 2/7, P_n(S = 2) = 2/7, P_n(S = 3) = 3/7. Sampling the four clusters, the size distribution is P_c(S = 1) = 2/4, P_c(S = 2) = 1/4, P_c(S = 3) = 1/4. Hence, the mean cluster sizes are ⟨S⟩_n = 15/7 and ⟨S⟩_c = 7/4.
3.9 An example of (1+1)-dimensional directed bond percolation on a tilted square lattice. Surviving bonds after dilution are marked in black. A cluster of size 8 is marked in blue, beginning at the site marked in red and proceeding down the lattice following the directed links.
3.10 Here, a contact process beginning at node 1 spreads to node 2, which in turn spreads the process to nodes 1 and 3, after which the process terminates.

4.1 An example of a branching process with multiple spontaneous activations/infections on a simple linear bidirectional network (shown at the top). The dynamics consists of two independent cascades, one with two roots (node 1 at time t = 1 and node 4 at time t = 2), and one with a single root (node 0 at time t = 3).

4.2 a. The distribution of avalanche sizes on a 10-regular graph, with N = 10^4 nodes simulated for T = 10^3 (empty circles) or 10^4 timesteps (filled circles), averaged over five network configurations, for various p1 and p0 = 10^{−5}. Solid lines are exponentially truncated power-law fits, p(s) ∼ s^{−3/2} exp[−s/s_ξ]. The p1 values for each curve are marked in panel b. b. The average number of nodes active in the largest cluster each time step.
4.3 The active fraction Φ(p0, p1) for 10-regular graphs, for various p0 as a function of p1.
4.4 The dynamical susceptibility χ0 as a function of p0 and p1. Maxima of the dynamical susceptibility are marked with blue squares. The susceptibility along the Widom line is plotted in black.
4.5 A CWEB of size four is shown. Physical connections between nodes are shown in grey. Node B has nodes A and C as parents, while node C has D as a parent. Directed edges in black correspond to how the cluster is built, beginning from A. Associated with each node added to the cluster is a probability of inclusion that depends only on information available along that path. Here, node A triggers B concurrently with C, while D triggered C. Evaluated from A, however, the probability that B triggered (without knowledge of C's firing) is pd, while the probability that C fired, conditioned on both A and B having fired, is pp1. Lastly, the probability that D contributes to C, conditioned only on the fact that C was activated, is pp. In this figure, non-firing sites (e.g. the parents of A) are hidden to reduce clutter.
4.6 (a) The phase-diagram for a 10-regular graph, with the Widom line, the unity branching ratio line, and the critical line, on which χ_n = ⟨S⟩_n diverges. The limits of the diverging χ_n fall at the points expected for a directed and undirected percolation process on a Bethe lattice of coordination number k + 1 and 2k, respectively. (b) As in (a), but in the limit of low noise, and in log-log scale. All three lines follow p0 ∝ (1/k − p1)^η with different η. From top to bottom, the η are 1, 2, and 3.
4.7 The causal-web distribution P_c(s) of an infinite 10-regular graph, simulated for different p0 on the critical line. Power-laws, s^{−3/2} and s^{−5/2}, are present to guide the eye.
4.8 The causal-web distribution P_c(s) of an infinite 10-regular graph, simulated for different p0 on the critical line. Power-laws, s^{−3/2} and s^{−5/2}, are present to guide the eye.
4.9 a. Avalanche statistics for p0 = 10^{−5} simulated on an infinite 10-regular network at the theoretically determined critical point. Simulated avalanches with one root are shown with symbols, while the analytical prediction is shown with a line of the same colour. b. Average number of roots R for avalanches of a given size, shown for simulations of various p0 on an infinite 10-regular network. Inset shows curve-collapse across various p0, with the x-axis rescaled as s·p0^{2/3}.
4.10 Symbols are simulation results on infinite 10-regular lattices for 2 × 10^7 clusters, while solid lines are the analytical predictions of Equation-4.11.

4.11 a. Mean avalanche durations for avalanches of various sizes simulated on the infinite 10-regular network, with varying levels of spontaneous activity. p1 is set to a slightly sub-critical value, p1 = p1c − 10^{−11/2}, so that no infinite avalanches occur. b. As in a., the mean avalanche duration exhibits reasonable curve-collapse, with collapse quality increasing as p0 → 0.
4.12 Avalanche durations simulated on the infinite 10-regular network, with varying levels of spontaneous activity. p1 is set to a slightly sub-critical value, p1 = p1c − 10^{−11/2}, so that no infinite avalanches occur.
4.13 The perpendicular correlation length function for simulations of infinite 10-regular graphs near the critical line, with p0 = 10^{−4}. Thick solid lines are analytical predictions, while the lighter hues denote numerical averages from 2 × 10^6 simulations.
4.14 a. Cluster size distributions obtained for finite (N = 10^7 for T = 10^4) and infinite 10-regular networks at the theoretical critical point given by the divergence of Equation-4.11. Finite simulations are given with transparent symbols, with the corresponding infinite graph result as a line of the same colour. b. As in a., but with a curve-collapse effected by rescaling according to the rate of spontaneous activation.
4.15 Here we compare the analytical results of Equation-4.19 to the size distribution of mergeless avalanches on finite graphs for a variety of noise levels at the theoretical critical point given by the divergence of Equation-4.11.
4.16 a. The fraction of the graph occupied by the largest cluster for various graph sizes, with p0 = 10^{−3} denoted by circles and p0 = 10^{−4} denoted by triangles. Solid lines are theoretical predictions for the giant component size, as developed in §4.1.8. Simulations are for T = 10^4 time steps. b. As in a., but with a curve-collapse effected by finite size scaling.
4.17 a. The active fraction Φ and giant components G analytically (solid lines) and for simulations of varying sizes (symbols). Crosses are N = 10^{12/3}, circles are N = 10^{14/3}, and squares are N = 10^{16/3}. Finite simulations are for T = 10^4 time steps, averaged over five network realizations. b. As in a., but for the fraction of active nodes that are part of the giant component.
4.18 a. The mean (finite) cluster size ⟨S⟩_n with p0 = 10^{−3} denoted by circles and p0 = 10^{−4} denoted by triangles. Simulations are for T = 10^4 time steps. b. As in a., but with a curve-collapse effected by finite size scaling.
4.19 a. Cluster size distributions obtained for finite (N = 10^5 for T = 10^5) small-world networks with rewiring probability 0.01. b. As in a., but with a curve-collapse effected by rescaling according to the rate of spontaneous activation.
4.20 Simulations on small-world networks for N = 10^{13/3} and T = 10^5, averaged across three network realizations. Panels a-d correspond to various re-wire probabilities.

4.21 a. Simulations of the giant component on power-law networks with p(k) ∼ k^{−3.5} of varying sizes, for T = 10^{7/3} time steps and with 10 network realizations, with estimated p1c = 0.1110, giant emergence exponent β = 2, and 1/(dν) ≈ 0.25. b. As in a., but with a curve-collapse effected by finite size scaling.
4.22 Simulations of the giant component on power-law networks with p(k) ∼ k^{−3.5} of varying sizes for T = 10^{7/3} time steps and with 10 network realizations. Finite size scaling is performed with estimated p1c = 0.1110 and finite size scaling exponent 1/(dν) ≈ 0.25.
4.23 a. Cluster size distributions obtained for finite (N = 10^7 for T = 10^4) power-law networks (p(k) ∼ k^{−3.5}). b. As in a., but with a curve-collapse effected by rescaling according to the rate of spontaneous activation.
4.24 a. Cluster size distributions obtained for finite (N = 2^15 modules, each consisting of M = 10^2 nodes, for T = 10^4) hierarchical modular networks. b. As in a., but with a curve-collapse effected by rescaling according to the rate of spontaneous activation.
4.25 Simulations on 10-regular graphs with N = 10^5 nodes, with a threshold set to the lowest 2.5th percentile.
4.26 Simulations on 10-regular graphs with N = 10^5 nodes, with a threshold set to the 50th percentile.

5.1 The response of a single QIFAD neuron to a periodically-applied current, increasing in strength with each application. The top panel shows the membrane voltage, while the bottom panel shows the applied current.
5.2 Simulations near the critical point for the QIFAD model on Erdős-Rényi networks, with N = 10^{11/2} neurons and λ = 4 × 10^{−3} kHz (i.e. 4 Hz), for three different values of connection strength g.
5.3 Simulations conducted on hierarchical modular networks, with 1000 neurons per base node, 100 neurons per inter-modular connection, and 7 hierarchical layers (for a total of 2^7 nodes). The average in/out degree of each neuron was ≈70. Neuron parameters are as given in Table-5.1, except that the capacitance is given by C = 174 pF, while k = 0.4 and b = 3.5. The excitatory connection strength is given by g = 100 pA. In this simulation, λ = 650 Hz, while g_shot = 70.3 pA. Ten percent of intra-neurons were inhibitory (GABAergic), with g_GABA = −15 pA and τ_k = 20 ms. Values are the result of an average across five network realizations, each simulated for three minutes. a. The probability distribution function for mean causal web size, with a power-law fit generated by a maximum likelihood estimator. b. The probability distribution function for mean causal web duration. c. The mean size for avalanches of a given duration. Fit is to avalanches smaller than the bursts.

A.1 Power-law scaling of 1/k − p1c ∝ p0^{1/3}, shown here for k = 10.
A.2 Power-law scaling of 1/k − p1c ∝ p0^{1/2} for the σ = 1 line, shown here for k = 10.
A.3 Scaling of the first- (Equation-4.13) and second-order (Equation-4.17) approximations to the active fraction (Equation-4.3) along the Widom line.

A.4 The Widom line in the neighbourhood of p0 ≪ 1 is asymptotically approximated by Equation-4.18.
A.5 The scaling of the size cutoff for mergeless avalanches. The exact s_ξ is given by Equation-4.21 and is plotted in purple. Equation-4.22 captures the correct scaling form for s_ξ, but has a poor prefactor for small p0, as can be seen in green in the figure. Equation-4.25 shows an improved prefactor, and is plotted in blue.
A.6 Giant component for simulations with N = 10^4 nodes on 10-regular graphs, of varying durations. Above the critical point, variation in simulation duration has no effect. Below the critical point, the largest cluster doesn't scale extensively, and hence its occupation fraction for the whole simulation decreases as the simulation duration increases.
A.7 Simulations on 10-regular graphs with N = 10^5 nodes, with a threshold set to the 1st percentile.
A.8 Simulations on 10-regular graphs with N = 10^5 nodes, with a threshold set to the 34th percentile.
A.9 Simulations on 10-regular graphs with N = 10^5 nodes, with a threshold set to the 76th percentile.
A.10 Simulations on 10-regular graphs with N = 10^5 nodes, with a threshold set to the 97.5th percentile.
A.11 Simulations on 10-regular graphs with N = 10^5 nodes, with a threshold set to the 99th percentile.

B.1 A cluster is developed from root-node A. This particular network structure is an illustrative conceit – no specific structure is specified in memory. (a) Consider a cluster developed from a single root node, occurring on an infinite, random, 2-regular graph for clarity. There are initially two type-I connections. (b) Each type-I node potentially has k − 1 other parents, each independently active with probability Φ. In this specific example, let's suppose both of the type-I connections of node A have another parent. Then, each of the two type-I connections will be included in the cluster with probability 1 − p̄0p̄1^2. (c) Suppose the left type-I fails to activate, while the right (now labelled B) succeeds. The other parent of B is now a type-II connection, while the hitherto unconsidered daughters of B are two new type-Is. (d) The type-II connection (now labelled C) is always included. It introduces a new type-I connection, and (after sampling from Equation-B.1) adds 1 new type-II connection. The other possible (but inactive) parent is shown in light grey.
B.2 A possible cluster realization of size 5, following the remaining steps outlined in Table-B.1, continuing from Figure-B.1.

List of Tables

3.1 A summary of percolation exponents in different network configurations and dimensions. PL here denotes "power-law" and refers to a random graph with degree distribution p(k) ∼ k^{−λ}. SW here denotes "small-world". Here s_ξ denotes the characteristic size beyond which an exponential cut-off appears to truncate the power-law of P_n(S). Results for d = 1 and d = 2 are as given in [Christensen and Moloney, 2005]. Small-world values are as given in [Moore and Newman, 2000]. The PL network values hold for λ ∈ (2, 4). Those with λ < 3 have a percolation transition at p = 1, while the transition is at p < 1 for λ > 3; hence many quantities have singularities at λ = 3. Additionally, the cluster distribution has logarithmic corrections to P_n(S) for λ = 3. γ takes the value +1 for λ ∈ (3, 4) and −1 for λ ∈ (2, 3) [Cohen et al., 2002]. Mean field values, d ≥ 6, are as given in [Christensen and Moloney, 2005].
3.2 This table summarizes the directed percolation critical exponents for mean-field networks (d ≥ 4) and for both uncorrelated and correlated power-law networks. "Unc. PL" refers to directed percolation on directed power-law graphs with P^in(j) ∼ j^{−λ_in} and P^out(k) ∼ k^{−λ_out} uncorrelated at each vertex. "Cor. PL" refers to the same, but with the existence of a fraction AB of nodes that are fully correlated, with the in-degree j = k^{(λ_out−1)/(λ_in−1)} related to the out-degree. λ* = λ_out + (λ_in − λ_out)/(λ_in − 1) in the GSCC of the power-law network. For the uncorrelated PL networks, the first value of the exponent holds for λ_out ∈ (2, 3) and the second for λ_out ≥ 3. For correlated PL networks, the first value holds for λ* ∈ (2, 4) (excluding 3 in the case of β) and the second when λ* ≥ 4. Power-law values are from [Schwartz et al., 2002]. Mean field values (d ≥ 4) are as given in [Hinrichsen, 2000].

5.1 Values used for QIFAD simulations. From [Izhikevich, 2007] and [Orlandi et al., 2013].

B.1 An example of developing a single cluster of size 5. Nodes and edges in the cluster are in black, while the nodes and edges constituting the boundary of the cluster are in light grey. The first few operations are illustrated in Figure-B.1.

List of Symbols, Abbreviations and Nomenclature

Symbol or abbreviation   Definition
2F1                      The Gauss hypergeometric function
p̄                        The complementary probability 1 − p
BP                       Branching process
GP                       Griffiths phase
SIS                      Susceptible-infected-susceptible
SIR                      Susceptible-infected-removed
fMRI                     Functional Magnetic Resonance Imaging
RSN                      Resting State Network
EEG                      Electroencephalogram
GSCC                     Giant strongly-connected component
GWCC                     Giant weakly-connected component
Billion                  10^9. I use short-scale here.

Chapter 1

Introduction

1.1 Complex systems

The field of complex systems is characterized not by the relative intractability of its equations, but rather by the adage that "more is more". A system is complex if the interactions between its elements display emergent behaviour [Hastings et al., 2017]. There are many archetypal systems in which simple dynamics lead to a richer gestalt; however, the motivation for this thesis lies in the application of complex systems theory to the brain. Although a cellular neuroscientist might object to this characterization, the brain is a system comprised of a multitude of simple parts, neurons, whose aggregate behaviour is comparatively richer than that of the individual elements. Individual neurons do not compose poetry or evince any hint of consciousness, yet in bulk, they produce language, art, and mathematics. In statistical physics, we are commonly faced with a similar problem of explaining or predicting bulk behaviour from the dynamics of individual particles. The typical approach is to observe some unusual effect in a bulk material. Then, a Hamiltonian function for the individual units of the system is conjectured, beginning with as few interactions between elements as possible. From this parsimonious Hamiltonian is built the Hamiltonian for the ensemble. Should the ensemble Hamiltonian fail to capture the dynamics of the bulk material, the Hamiltonian of the individual units is enriched slightly, and the process is repeated until the observed bulk effect is explained. By beginning with a maximally simple model, exactly those elements necessary to produce the observed bulk behaviour are present, and the emergent phenomenon may be explained.

1.2 Complex networks

Systems are complex if rich collective behaviour emerges from relatively simple dynamics. However, the structure of a system can also strongly influence its collective dynamics. In the statistical physics picture, the description of how elements interact (i.e. their interaction potentials) is distinct from the description of which elements interact. For instance, the bulk electronic behaviour of graphene is quite distinct from that of diamond, even though the constituent nodes are in both cases carbon atoms [Sarma et al., 2011]. The difference between the two materials is that of their structures, which differ in both underlying symmetry and dimension. A generic way to encode the structure of a complex network is with the language of graph theory [Newman, 2010]. Individual elements of the system are called nodes. Nodes that can interact with each other are connected via edges (links) between them. Edges can also encode a directionality. For instance, the world-wide-web can be captured as a graph by letting the nodes of the graph denote web-pages and the edges between them be hyperlinks. These edges would be directed, because a web-page does not need to link back to the page that linked to it. One area in which complex networks have been employed is modelling the size of disease outbreaks [Newman, 2002]. Individual humans might be nodes, while the links between them denote a possible route of disease transmission. Human interaction networks are often said to be "small-world", reflecting the observation that there are relatively few intervening steps connecting any two randomly selected people [Travers and Milgram, 1977]. The structure of disease networks has a significant impact on the transmission of disease and the size of epidemics, should they occur at all. In "small-world" networks, if a disease is sufficiently infectious, an epidemic is always possible [Moore and Newman, 2000]. However, some diseases spread via a different network than just generic human interactions. For instance, the spreading of sexually-transmitted disease in a population of heterosexual individuals can be modelled on a bipartite graph. In such graphs, it is possible to totally eliminate the possibility of disease outbreaks, regardless of the transmissibility of the disease, by adjusting the structural properties of the graph [Newman, 2002].
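To make this encoding concrete, a directed graph can be stored as an adjacency list mapping each node to the nodes it points at. The snippet below is a minimal sketch (my own illustration, with made-up page names, not an example from the thesis) using the web-page picture above:

```python
# A directed graph as an adjacency list: node -> list of nodes it links to.
# Page names here are hypothetical placeholders.
from collections import defaultdict

hyperlinks = [("pageA", "pageB"), ("pageA", "pageC"), ("pageB", "pageC")]

graph = defaultdict(list)
for source, target in hyperlinks:
    graph[source].append(target)  # directed: no reciprocal edge is added

print(graph["pageA"])  # ['pageB', 'pageC']
print(graph["pageC"])  # [] -- pageC does not link back to anything
```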

1.3 The brain as a complex system

The brain may also be modelled as a complex system. It has an obvious fundamental unit, the neuron, which communicates with its neighbours via axons and dendrites. There are many models of neuron behaviour, with varying degrees of sophistication and biological relevance [Izhikevich, 2007]. A statistical physicist may discuss the behaviour of ensembles of such neurons and aim to explain the emergent properties of the brain in such a manner. This approach has been used to explain the apparent presence of scale-free cascades of brain activity. The evidence for this behaviour will be elaborated on in §2.2. One challenge in modelling the brain as a complex system is its scale. There are over a hundred billion neurons in the human brain, making full-scale simulations of the brain currently infeasible [Herculano-Houzel, 2009]. Additionally, the experimental tools that neuroscientists use operate at very different resolutions and scales. Mapping out brain networks, called connectomes, in living animals is typically done using various types of magnetic resonance imaging (MRI) [Basser et al., 2000, Oh et al., 2014, Sporns et al., 2005]. MRI typically has a resolution no finer than 1 mm^3, meaning that it can only identify connections between brain regions. At the mesoscale, a map of axonal connections for neurons of specific types can be produced with genetic labelling of cells; however, since this can only be accomplished by labelling specific cell types in different animals, it is necessary to average across multiple animals to obtain a representative network [Oh et al., 2014]. At the finest scale, a connectome capturing the physical connections (synapses) between individual neurons can be obtained in a single animal using finely-sliced preserved neural tissue and an electron microscope. However, this process is slow, and thus far whole-brain connectomes at the synaptic level have only recently been accomplished for larval zebrafish [Hildebrand et al., 2017] and fruit flies [Zheng et al., 2018]. Thus, it is most typical to model neural systems at one of two scales. Fine-scaled models of neural systems take neurons to be the basic unit of the system. Such models might include equations describing the ionic currents flowing through the cell membrane, internal dynamics such as protein coupling cascades, and delays in signal propagation [Izhikevich, 2007]. Coarsely-grained models of neural systems instead choose entire brain regions to be the basic unit of the system. These brain regions are typically assumed to be anywhere from a few hundred to a few million neurons in size, and rely on the fact that neighbouring neurons often have correlated activity. These models might describe the average depolarization or firing rate of different populations of cells within the region, as in neural mass models, or might be as coarse as a binary variable representing an "active" or "inactive" region [Breakspear, 2017]. Simple models at both scales can successfully replicate many experimental observations at their respective scales. Finely-grained models of neurons can reproduce experimentally observed cascades of neural activity [Pasquale et al., 2008, Poil et al., 2012]. Meanwhile, coarse models of neural behaviour, when run on experimentally observed human brain networks, reproduce characteristic patterns of activity called "resting state networks" (RSNs) [Haimovici et al., 2013, Hansen et al., 2015]. Typically, both finely-grained and coarsely-grained models of neural dynamics ignore spontaneous activity. Isolated neurons, even in the absence of other brain activity, will sometimes spontaneously fire. This spontaneous noise is a source of the scale-free neural cascades mentioned earlier [Orlandi et al., 2013]. Usually, however, it is assumed that after the spontaneous activity initiating a cascade, no other spontaneous activity occurs during the cascade. Systems that satisfy this assumption are said to exhibit a "separation of timescales", which is a popular starting point for statistical physics models. The aim of this thesis is to address violations of this separation of timescales, by enriching neuron models with spontaneous activity as a fundamental ingredient and examining how the system's emergent behaviours are altered in the presence of this noise. This will be accomplished through a combination of extensive computer simulation and analytical techniques.

Chapter 2

Criticality in Neural Systems

2.1 A brief review of criticality

Before we describe the evidence for criticality in the brain, it is necessary that we be able to recognize criticality in generic systems. Of principal interest in this thesis will be critical points dividing two distinct phases. "Criticality" therefore describes a system that displays properties consistent with operating close to a continuous (i.e. second-order) phase transition [Kardar, 2007]. Critical points are characterized by the presence of power-laws. For a generic observable X, it is common to observe, for some control parameter T close to a critical point, that X obeys the scaling relation

X ∝ (T − T_c)^γ = 𝒯^γ ,   (2.1)

for some scaling exponent γ, and where 𝒯 here denotes T − T_c. We say that power-laws are scale-free, because if we rescale the control parameter 𝒯 by some constant, say C, then the scaling relation remains unchanged, as

X ∝ C^γ (𝒯/C)^γ ∝ (𝒯/C)^γ .   (2.2)

Since both Equation-2.1 and Equation-2.2 are valid, both the reduced temperature 𝒯 and the rescaled reduced temperature 𝒯/C are in some sense equivalent. This is atypical for a physical phenomenon, where there is usually some typical scale for the problem. For instance, in radioactive decay the number of nuclei surviving at time t is given by n(t) ∝ 2^{−t/τ_{1/2}}, which has a characteristic scale of τ_{1/2}, the half-life. Rescaling the time in the decay process by some arbitrary constant t → t/c simply results in a rescaled version of n(t). Indeed, if a function f(𝒯) rescales as f(a𝒯) = g(a)f(𝒯) for some function g of the rescaling constant, then by setting 𝒯 = 1 we know f(a) = g(a)f(1), so we can write

f(a𝒯) = f(a)f(𝒯)/f(1) .   (2.3)

Taking a derivative with respect to a, we find

𝒯 f′(a𝒯) = f′(a)f(𝒯)/f(1) ,   (2.4)

and setting a = 1, we have a separable first-order differential equation in 𝒯 that can be solved to yield

log(f(𝒯)) = (f′(1)/f(1)) log(𝒯) + C ,   (2.5)

for C = log(f(1)), which yields

f(𝒯) = f(1) 𝒯^{f′(1)/f(1)} ,   (2.6)

which is of course a power-law, with two free parameters that choose the normalization and the exponent. This means that scale-free behaviour necessarily implies a power-law.
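As a quick numerical illustration of this argument (my own check, not part of the thesis), one can verify that a power-law satisfies the rescaling condition f(a𝒯) = f(a)f(𝒯)/f(1) to machine precision, while a function with a characteristic scale does not:

```python
# Check the scale-free condition f(a*T) = f(a)*f(T)/f(1) numerically.
import numpy as np

T = np.linspace(0.1, 10.0, 50)
a = 3.7  # an arbitrary rescaling constant

def deviation(f):
    """Largest violation of f(a*T) = f(a)*f(T)/f(1) over the sampled T."""
    return np.max(np.abs(f(a * T) - f(a) * f(T) / f(1.0)))

power_law = lambda t: 2.0 * t ** -1.5     # f(1) = 2, exponent -3/2
exponential = lambda t: 2.0 * np.exp(-t)  # has a characteristic scale

print(deviation(power_law))    # ~1e-15: scale-free to machine precision
print(deviation(exponential))  # order 1: the condition fails
```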

Of course, the use of the symbol T for a control parameter is no accident. The canonical example that statistical physicists draw upon is that of the ferromagnetic Ising model, for which the control parameter is the temperature, T [Christensen and Moloney, 2005, Kardar, 2007]. The Ising model describes the behaviour of a lattice of spins that interact with their nearest neighbours. For lattices with dimension larger than one, the lower critical dimension for the Ising model, it predicts a phase transition between the ordered phase where most spins align and produce a net magnetization (the ferromagnetic phase), and a disordered phase with randomly aligned spins and no net magnetization (the paramagnetic phase). This describes well the experimental observation that raising a ferromagnetic material above its Curie temperature destroys the bulk magnetization. Near to that critical point, several quantities vary as a power-law. For instance, the net magnetization varies as M ∼ (−𝒯)^β for T < T_c. Another relevant quantity is the correlation length, which measures the scale over which correlations between neighbouring spins decay. In the Ising model, the state of spin σ_i is ±1. The correlation between two spins σ_i, σ_j might be measured as ⟨σ_i; σ_j⟩ = ⟨σ_iσ_j⟩ − ⟨σ_i⟩⟨σ_j⟩ ∼ e^{−|i−j|/ξ}, where ξ denotes the correlation length. This correlation length between spins is asymptotically ξ ∼ 𝒯^{−ν} for temperatures near to the critical point. As ν > 0, the correlation length goes to infinity as the temperature approaches the critical point. Consequently, near to the critical point fluctuations grow arbitrarily large. Although the value of the Curie temperature depends on the properties of the material under consideration, the critical exponents are shared across different materials, and depend only on the dimensionality of the system [Christensen and Moloney, 2005]. A similar phenomenon occurs at the critical point of the liquid-gas phase transition, near which particles can coordinate on long scales. The correlation length here also displays a power-law divergence, ξ ∼ 𝒯^{−ν}, near to the critical temperature. These long-distance correlations can become large enough to scatter light, and result in what is known as critical opalescence, where the fluid becomes milky and pale. Intriguingly, the correlation length exponent ν ≈ 0.63 in the liquid-gas transition agrees with that of the correlation length in the Ising system in three dimensions. Indeed, perhaps surprisingly, all critical exponents are shared between these two systems [Yang and Yang, 1964]. In both systems, the correlation length diverges near to the critical point. The diverging correlation length justifies a coarse-graining process known as renormalization, in which the microscopic details of the model wash away [Kardar, 2007]. In both the 3-dimensional Ising ferromagnet and the liquid-gas transition, the coarse-grained system rescales in the same way owing to their shared symmetries, which leads to the same critical exponents, and places them in the same universality class. Generally speaking, if two systems can be shown to fall into the same universality class, regardless of their microscopic dynamics, they will share the same bulk behaviour near to their critical points.

2.2 Experimental evidence of neural criticality

The idea that neural systems might naturally operate near to a critical point was suggested in the seminal 2003 paper by John Beggs and Dietmar Plenz, where it was observed that ensembles of neurons exhibit scale-free cascades of activity, dubbed "neuronal avalanches" [Beggs and Plenz, 2003]. In this work, slices of rat cortex were cultivated on an eight-by-eight array of electrodes, which were sensitive enough to detect the electrical activity of the neurons. The activity appeared to exhibit intense cascades, each presumably initiated by a single neuron, as in Figure-2.1. Each avalanche was therefore defined to be a period of activity bounded on each side by a period of quiescence. The avalanche's size was defined to be the number of action potentials detected during the cascade. It was observed that the avalanche sizes were approximately power-law distributed, with the probability of an avalanche of size S being P(S) ∝ S^{−τ} with τ ≈ 3/2, and that of an avalanche of duration T being P(T) ∝ T^{−α} with α ≈ 2. Subsequent studies, with larger multi-electrode arrays and higher time resolutions, have confirmed this basic result [Friedman et al., 2012]. To explain this apparently scale-free behaviour, John Beggs proposed that the brain operates near to a critical point of a continuous phase transition [Beggs and Plenz, 2003]. In addition to reporting the distribution of avalanches, he also observed that it matches the predicted exponent for a mean-field branching process near criticality. In a branching process, each node activates connected daughter
nodes with some probability p (see §3.3.1 for details). The branching process falls into the universality class of directed percolation.

Figure 2.1: Neuronal avalanches, reproduced from Beggs [Beggs and Plenz, 2003]. Top: each point indicates the detection of an action potential at that electrode label. Bottom: Detail showing the evolution of a single avalanche.

To explain why criticality might appear in these neural networks, he noted that a branching process has two phases: a sub-critical phase in which activity dies away and a super-critical phase where activity explodes to take over the system. He likened the super-critical phase to epilepsy, an undesirable neurological disorder in which rampant neural activity induces seizures, and the sub-critical phase to coma, where neural activity dies away. He observed from simulations of the branching process on a feed-forward (i.e. loopless) network that criticality maximizes information transmission. Although measurements of neural cultures matched many predictions of a critical branching process, including the relationships between critical exponents [Friedman et al., 2012], if neural avalanches are truly scale-free then similar behaviour should be observed at the scale of the whole brain. To this end, Haimovici et al. studied cascades of activity at the scale of the whole brain using functional magnetic resonance imaging (fMRI) [Haimovici et al., 2013].

fMRI measures the "blood-oxygen level dependent" (BOLD) signal from 3-dimensional regions known as voxels. BOLD is thought to reflect the increased regional metabolic load that corresponds to neural activity, making it a way to noninvasively measure brain activity [Shmuel et al., 2006]. In the work of Haimovici et al., the BOLD signal was converted from a continuously-varying signal into a point-process by labelling each point at which the BOLD signal passed a certain threshold (typically 2 standard deviations from the mean). In this way, a spike train like that of Figure-2.1 could be produced, except with each "spike" corresponding to the activation of a brain region instead of a single neuron. However, no periods of silence existed to delineate the boundaries between avalanches. This reflects the fact that spontaneous activity, even if rare in a neuronal culture of a few thousand neurons, will be pervasive in a sample of a hundred billion neurons. To identify avalanches, Haimovici et al. needed to separate causally disconnected activations. To do this, they instead studied the evolution of clusters of activity. A new cluster would form whenever a spike was registered with no neighbouring regions spiking before it. Any subsequent neighbouring activations were then added to that cluster until activity died away. At any given moment, the number of neuronal cascades varied but was typically less than 100. The resulting distribution of cluster sizes was also found to be P(S) ∝ S^{−τ} with τ ≈ 3/2 over approximately four orders of magnitude. For a more thorough survey of the experimental evidence of criticality in the brain, I recommend the following reviews: [Breakspear, 2017, Chialvo, 2010, Cocchi et al., 2017].
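The quiescence-based avalanche definition used in the culture experiments is easy to state operationally. The sketch below is a minimal illustration (my own; the 4 ms bin width and spike times are arbitrary placeholders, not the values used in the experiments) that extracts avalanche sizes from a spike train by binning time and splitting runs of activity on empty bins:

```python
# Extract avalanche sizes from spike times: bin the raster in time, and
# call each run of non-empty bins bounded by empty bins one avalanche.
# Bin width and spike times below are illustrative placeholders.
import numpy as np

def avalanche_sizes(spike_times, bin_width=4e-3):
    n_bins = int(np.ceil(spike_times.max() / bin_width)) + 1
    counts, _ = np.histogram(spike_times, bins=n_bins,
                             range=(0.0, n_bins * bin_width))
    sizes, current = [], 0
    for c in counts:
        if c > 0:
            current += c           # still inside an avalanche
        elif current > 0:
            sizes.append(current)  # a quiescent bin ends the avalanche
            current = 0
    if current > 0:
        sizes.append(current)
    return sizes

spikes = np.array([0.001, 0.002, 0.006, 0.030, 0.031, 0.090])
print(avalanche_sizes(spikes))     # [3, 2, 1] with 4 ms bins
```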

2.3 Modelling criticality in the brain

Muddying the picture of criticality in the brain are observations of power-law exponents that do not always match the mean-field exponents of directed percolation. Indeed, neuronal avalanches in neural cultures raised in adverse conditions also exhibit apparently
scale-free behaviour, but with a different collection of critical exponents than directed percolation [Yaghoubi et al., 2018]. Additionally, power-laws alone are not enough to signify a critical point: there are many statistical processes that generate power-laws [Newman, 2005]. The observation that neural avalanche statistics obey universal scaling [Friedman et al., 2012] has also been argued to be inadequate, as non-critical processes can also generate power-laws and universal scaling [Touboul and Destexhe, 2017]. Although it is suggestive that the power-law exponents that neural systems exhibit do obey the hyper-scaling relations predicted by theories of critical phenomena, to argue that the brain is critical we need more than just statistical evidence. It is also necessary to have accurate and realistic models of neural dynamics that exhibit a phase transition. It also remains to explain why evolution should produce brains that operate close to criticality in the first place. To that end, models of neural dynamics must be related to their capacity for information processing and storage. In this section, we will review a few models used for these purposes.

Figure 2.2: The basic anatomy of a pair of ideal neurons. The neuron outlines are replicated from [Mel, 1994].

2.3.1 Hodgkin-Huxley and other "biological" dynamical neuron models

The most critical aspect of neurons, at least for information propagation and processing, is the voltage across their cell membranes. By means of ion pumps, the concentrations of positively charged sodium, potassium, and calcium ions, as well as negatively charged
chlorine ions can be maintained inside the cell at concentrations different from the extracellular environment. This chemical gradient results in a voltage across the cell membrane, which in homeostasis is typically maintained at approximately -70 mV. For an extended discussion of the basic cellular biology and electrical properties of neurons, see chapter 4 of [Kolb and Whishaw, 2009]. Communication between neurons is typically done by way of chemicals known as neurotransmitters. Neurons are cells with four parts: the dendrites, cell body, axon, and synapses (see Figure-2.2). When neurotransmitters reach the dendrites of a neuron, they induce voltage changes in the cell membrane. These voltage perturbations are effectively summed in the cell body. Should the summed voltage exceed a certain threshold, ion channels in the axon will open and initiate a voltage cascade down the axon known as an action potential. When the action potential reaches the axon terminal, the neuron releases neurotransmitters to its daughter neurons, and thereby propagates information. If that voltage threshold is not reached, no action potential occurs, and no information is propagated. For an extended discussion of the different classes of neurotransmitter, their various effects on neurons, and communication between neurons, see chapter 5 of [Kolb and Whishaw, 2009]. The first mathematical model for neurons was the Hodgkin-Huxley model [Hodgkin and Huxley, 1952], for which its namesakes received the 1963 Nobel prize in physiology. The Hodgkin-Huxley model is a set of four coupled differential equations, which represent the opening of ion channels, the flow of ions, and the evolution of the membrane potential [Izhikevich, 2007]. Inputs from other neurons can be modelled as the injection of currents. These currents, along with the Hodgkin-Huxley differential equations, can be integrated to reproduce the firing of action potentials observed in real neurons. For this reason, the Hodgkin-Huxley model is said to belong to an extensive class of neuron models known as "integrate-and-fire" neurons [Izhikevich, 2007]. In practice, these models are never solved analytically; instead, ensembles of neurons obeying these dynamics are simulated. Simulating such models typically involves
integrating a collection of differential equations that represent various physiologically-motivated quantities. In the framework of criticality, simulations of physiologically-motivated neurons can reproduce power-laws; however, this can require extensive fine-tuning of different model parameters. This fine-tuning might be expected if a critical point underlies these power-laws. A typical parameter that must be fine-tuned is the coupling strength between neurons. Below a certain threshold, activity tends to die away, while above a certain level, activity tends to occupy much of the system, which again fits the rough picture of a phase transition. Simulations of ensembles of Hodgkin-Huxley neurons have shown that their dynamical range is maximized at criticality, suggesting that criticality is optimal for encoding and responding to stimuli [Copelli et al., 2005, Kinouchi and Copelli, 2006]. Other biologically-motivated neuron models attempt to answer the question of how neural systems tune themselves to a critical point. In statistical physics it is not unheard of for systems to self-tune to their critical points; such systems are said to exhibit self-organized criticality (SOC) [Bak et al., 1988]. One canonical example is the appearance of power-law distributions in the extent and frequency of wildfires, where growth pushes forests into a critical state, making a large fire possible, which then lowers the system to a sub-critical state [Malamud et al., 1998, Ricotta et al., 1999]. In the context of neural systems, this can be accomplished by enriching the neuron model with self-tuning properties that are typically likened to the homeostatic mechanisms present in real neurons [Hesse and Gross, 2014]. Negative feedback mechanisms, such as synaptic depletion [Levina and Herrmann, 2006] and synaptic adaptation, serve to regulate excess activity. Positive feedback mechanisms, such as the spike-timing-dependent plasticity (STDP) associated with learning, serve to increase activity propagation [Kolb and Whishaw, 2009]. Models that include both positive and negative feedback mechanisms and that appear to tune to a critical point are thought to be examples of SOC. SOC with neuron models has been demonstrated in deep-learning models [Del Papa et al., 2017], as well as in simpler automaton models [de Andrade Costa et al., 2015]. In addition to synthetic models, biologically-motivated models of self-tuning with feedback have exhibited SOC [Hesse and Gross, 2014, Kossio et al., 2018, Millman et al., 2010, Orlandi et al., 2013]. Some models also include network dynamics, with the rewiring of neuronal connections as a component of the model [Stepp et al., 2015, van Kessenich et al., 2016, 2018]. Although we can produce richly-detailed models and show that they are sufficient to reproduce phenomena in the brain, it is also important to know what is necessary to reproduce phenomena in the brain. For instance, suppose we were interested in studying the emergence of the RSNs observable in fMRI. Is it the network architecture of the brain that produces and determines the RSNs? Are regulatory mechanisms like inhibitory connections necessary to produce the RSNs? We can answer these types of questions by beginning with simple models, with as few assumptions as possible, and enriching them until we observe the behaviour we are interested in. For this reason, we will also introduce two simple neuron models that have been widely applied: the branching process and the contact process.
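Before moving on, the "integrate-and-fire" idea can be made concrete with a small simulation. The following is a minimal sketch (mine, not the thesis code): a leaky integrate-and-fire neuron, far simpler than Hodgkin-Huxley, with illustrative parameter values, Euler-integrated in the same way that richer models are.

```python
# A minimal leaky integrate-and-fire neuron (an illustrative sketch; all
# parameter values here are placeholders, not those used in this thesis).
import numpy as np

def simulate_lif(I_input, dt=1e-4, tau=20e-3, v_rest=-70e-3,
                 v_thresh=-50e-3, v_reset=-70e-3, R=1e8):
    """Euler-integrate dv/dt = (v_rest - v + R*I)/tau and emit spike times."""
    v = v_rest
    spike_times = []
    for step, I in enumerate(I_input):
        v += dt * (v_rest - v + R * I) / tau  # integrate the injected current
        if v >= v_thresh:                     # ...and fire
            spike_times.append(step * dt)
            v = v_reset                       # reset after the action potential
    return spike_times

current = np.full(5000, 3e-10)      # 0.5 s of constant 0.3 nA drive
print(len(simulate_lif(current)))   # ~20 regularly-spaced spikes
```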

2.3.2 Branching processes

As information-processing units, neurons have an important characteristic: they propagate information via an all-or-nothing signal. They either undergo an action potential in response to their parents' stimulus, or they do not. The branching process is the result of throwing away all the internal information about the neuron (e.g. ion flow, membrane potential, etc.) and simply treating its firing stochastically. In the branching process, when a neuron fires at time t, it induces each of its daughters to fire with probability p at time t + 1. An example of this process is given in Figure-2.3.

Figure 2.3: An example of a branching process on a simple linear bidirectional network (shown at the top). The dynamics consists of a single cascade initiated at node 1 at time t = 1. As connections here are recurrent, nodes can be reactivated, as occurs at node 1 at time t = 3 and node 3 at time t = 5.

This discretization of time is typically justified by appealing to the fact that the action potential has a characteristic scale of 1 to 2 ms [Beggs and Plenz, 2003]. Although extremely reductionist, this model is useful for a few reasons. It is analytically tractable and is exactly solved in mean-field. Owing to this, we know that it exhibits a phase transition when the mean branching ratio σ, the average number of immediate descendants of a firing neuron, is one. Below this, activity inevitably dies away, and above this, there
is a finite probability that any activation leads to an infinitely large and long cascade. Its critical exponents near this point fall into the universality class of directed percolation (see §3.3.1), which, as was noted in 2003 [Beggs and Plenz, 2003], agrees with neural cultures. The branching process is the basis for a broader family of stochastic models in which neurons may have multiple parents, so that the probability of a neuron firing in a given time-step is generically P(m), where m is the number of active parents. Such models are more likely to have analytical solutions. One example of P(m) is the quorum percolation model, in which P(m) = H(m − m_0), where H is the Heaviside step function, so that a neuron fires only if it has m ≥ m_0 active parents. This reflects the fact that real neurons will almost invariably fire if a sufficient input is applied. One of the findings of the quorum percolation model, when applied to living neural networks, is the existence of a non-zero steady state, in which some fraction of the nodes are always active [Cohen et al., 2010]. In general, if P(m) is monotonic in m, one may predict a steady state. The prediction of a steady state in neural tissue is unsurprising: healthy human brains are characterized by continuous neural activity, and the absence of activity defines coma.

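The phase transition at σ = 1 is straightforward to see in simulation. The following is a minimal sketch (my own, not the code used in this thesis) of a mean-field branching process in which each active node independently activates each of its k daughters with probability p, so that σ = kp; at p = 1/k, cascade sizes become broadly (power-law) distributed:

```python
# A mean-field branching process at its critical point sigma = k*p = 1.
# Sizes are capped so that near-critical cascades terminate quickly.
import random

def avalanche_size(k=10, p=0.1, max_size=10**4):
    """Total number of activations in a cascade started by one node."""
    size, active = 1, 1
    while active > 0 and size < max_size:
        # Each active node triggers a Binomial(k, p) number of daughters.
        daughters = sum(1 for _ in range(active * k) if random.random() < p)
        size += daughters
        active = daughters
    return size

sizes = [avalanche_size() for _ in range(5000)]
print(max(sizes), sum(s == 1 for s in sizes) / len(sizes))
# At criticality, sizes span several decades, while a fraction
# (1 - p)**k ~ 0.35 of cascades die immediately after the first node.
```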
2.3.3 Contact processes

Branching processes are aimed at modelling the behaviour of individual neurons. However, simulating or modelling every neuron in the brain is presently intractable. If the aim is to reproduce observations made at the whole-brain scale, it is necessary to redefine the fundamental unit of the system in question. One way to do this is to consider mesoscopic functional units, such as a micro-cortical column or other brain regions, which are assumed to consist of a few hundred to a few million cells. This is a form of temporal and spatial coarse-graining, and there exist many models for collective neuron dynamics [Breakspear, 2017]. As we are dealing with population dynamics, for which the 1 to 2 ms timescale of individual neurons is much smaller than the 100 to 200 ms timescales of brain regions [van Den Heuvel and Pol, 2010], we often work with continuous-time models [Breakspear, 2017]. We will introduce one family of contact processes, where each node is in a particular state, known as compartmental epidemiological models. Such processes were originally developed for modelling epidemics, where individuals are either susceptible (S), infected (I), or recovered (or removed) (R) [Boguñá et al., 2003]. In neuroscience parlance, these might correspond to inactive, active, and refractory states. A contact process is defined by the transition rates between states, and such models are typically named by which states are accessible, and in which order. For instance, the susceptible-infected (SI) process models a population of susceptible individuals that may become sick without any hope of recovery. This might be appropriate for modelling a chronic illness, such as the human immunodeficiency virus. Diseases where recovery confers immunity might be modelled with a susceptible-infected-recovered (SIR) model, and a disease in which there is no immunity after being cured might be modelled with a susceptible-infected-susceptible (SIS) model.

Initially, such models considered so-called "well-mixed" populations, where every individual interacts with every other individual. This leads to very simple dynamics, which allow us to model the evolution of the population with a set of coupled differential equations. For instance, the two-compartment SIS model is governed by the following differential equations [Brauer, 2008]:

$\dot{S} = -\lambda S I + \mu I$ , (2.7)

and

$\dot{I} = -\mu I + \lambda S I$ , (2.8)

where S and I are the fractions of the population that are susceptible and infected, respectively (with S + I = 1), λ denotes the rate at which the infected spread their disease to the susceptible, and µ is the rate of recovery from the disease. For structured populations, we again begin with rates of infection, but assume that nodes can only affect the states of their neighbours. As all updates are controlled by rate equations, updates are asynchronous, characterized by state changes at intervals drawn from an exponential distribution. Such systems are typically simulated using the Gillespie algorithm or its optimized variations [Cota and Ferreira, 2017]. On both undirected and directed networks, these models have a phase transition. For large λ these models have an epidemic phase, where the population of infected or recovered is a non-zero fraction of the total network size. For small λ, in the infinite-system-size limit, there is a non-epidemic phase [Ferreira et al., 2012]. The critical λ_c depends on the network topology; however, regardless of the topology, near this phase transition the process falls into the universality class of directed percolation [Ferreira et al., 2012, Kwon and Kim, 2013, Lee et al., 2013, Parshani et al., 2010]. Of course, when the structure of the network simulated is all-to-all, we reproduce the well-mixed populations and the original population differential equations reappear.
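As an illustration of these dynamics, here is a minimal direct-method Gillespie simulation of the SIS model on an arbitrary network. It rebuilds the full event list at every step, which the optimized variants of [Cota and Ferreira, 2017] avoid; all function and variable names are my own.

```python
import random

def gillespie_sis(neighbors, infected0, lam, mu=1.0, t_max=100.0):
    """Direct-method Gillespie simulation of the SIS model on a network.
    neighbors[i] lists node i's neighbours; infected0 seeds the epidemic.
    Each infected node recovers (I -> S) at rate mu, and transmits along
    each edge to a susceptible neighbour (S -> I) at rate lam."""
    infected = set(infected0)
    t, times, n_inf = 0.0, [0.0], [len(infected)]
    while infected and t < t_max:
        # Enumerate every possible event with its rate (naive O(edges) step).
        events = [('recover', i, mu) for i in infected]
        events += [('infect', j, lam) for i in infected
                   for j in neighbors[i] if j not in infected]
        total_rate = sum(rate for _, _, rate in events)
        t += random.expovariate(total_rate)   # exponential waiting time
        r = random.uniform(0.0, total_rate)   # pick event proportional to rate
        for kind, node, rate in events:
            r -= rate
            if r <= 0.0:
                if kind == 'recover':
                    infected.discard(node)
                else:
                    infected.add(node)
                break
        times.append(t)
        n_inf.append(len(infected))
    return times, n_inf

# Example: SIS on a ring of 100 nodes, seeded with a single infected node.
N = 100
ring = [[(i - 1) % N, (i + 1) % N] for i in range(N)]
times, n_inf = gillespie_sis(ring, {0}, lam=2.0)
print(f"stopped at t = {times[-1]:.2f} with {n_inf[-1]} infected")
```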

Although there is extensive literature dealing with the use of coarse-grained contact processes in neuroscience [Breakspear, 2017], there are two examples of the SIS model that I would like to highlight. The first is the work of Haimovici et al., who showed that running the SIS model on the human connectome (the network structure of functional regions of the brain) results in the appearance of the resting state network, but only when the SIS model is tuned to its critical point [Haimovici et al., 2013]. This is a significant result, because it highlights the necessity of both critical dynamics and brain structure to reproduce resting state networks, without requiring any detailed description of the dynamics at the neural level. The second result I would like to highlight also relates to the effects of network architecture on large-scale neural dynamics. One challenge to the criticality hypothesis is that the critical point requires fine-tuning. The typical answer to this is that neural systems exhibit self-organized criticality [Beggs, 2008, Beggs and Timme, 2012]. Moretti et al. offered a complementary solution to the tuning problem, by suggesting that the heterogeneity of the network structure of the brain results in an extended critical regime, dubbed a "Griffiths Phase" [Moretti and Muñoz, 2013]. An extended critical regime would considerably relax the requirement for fine-tuning to a critical point. One notable aspect of the Griffiths phase is the presence of continuously-varying critical exponents, both in the avalanche distribution and in the exponents related to activity decay. Moretti demonstrated this result on hierarchical modular networks (see §3.1.6 for details), which reflect the tendency for the brain to be both modular, with functions associated with certain regions, and hierarchical, with subsequent refinement of function within sub-regions (see [Meunier et al., 2010] for a review of hierarchical modularity in brain networks). It should be noted, however, that Griffiths phases have also been observed on more generic modular networks [Cota et al., 2018]. Both results show that some observables are informed more by the underlying network structure than by the underlying dynamics.

2.4 Noise in the brain

In most models of neuronal avalanches, initiation of activity occurs on a timescale that is distinct from the propagation of the avalanche, meaning that the two phenomena can be separated. However, this is not always a biologically accurate assumption to make. For instance, although typically neurons only release neurotransmitters when undergoing an action potential, occasionally synaptic vesicles full of a neurotransmitter will be spontaneously ejected from the synaptic bulb. This leads to a small depolarization of the membrane of the daughter neuron, in an event known as a "mini". These minis play a role in evoking spontaneous action potentials, even when none of the parent neurons have undergone an action potential [Kavalali, 2015, Sara et al., 2005]. Further, even if the minis alone do not cause a neuron to fire, they can serve to change the relative propensity to fire; a neuron that has recently been exposed to a mini may be partially depolarized, making it more likely to fire due to other inputs from a parent neuron. Minis are not the only source of noise. More generally speaking, unless one is simulating the entire brain, there will be neurons outside of the simulation whose activity may occasionally impinge upon the simulation and drive activity therein. This is not a problem unique to the constraints of simulation. In experiments studying neuronal cultures, there are often neurons outside of the field of view whose activity evokes activity within the field of view [Wilting and Priesemann, 2018]. Connections to neurons outside of the region of interest can be viewed as another source of noise. Lastly, there also exist sensory neurons that respond to stimuli and forces outside of the neural network. These outside stimuli also comprise a source of noise.

2.4.1 The effect of noise on observables

The presence of noise significantly complicates the notion of an avalanche as a cascade of causal activity. To illustrate the problem, imagine that the criticality hypothesis holds on the

Figure 2.4: The results of overlapping avalanches, when avalanches are initiated as a Poisson process of various rates (0, 10⁻³, 5 × 10⁻³, 10⁻², and 2 × 10⁻²). Avalanche sizes are drawn from a pure power-law, $P(S) \sim S^{-3/2}$, and avalanche durations are assumed to scale as $T \sim \sqrt{S}$, with time rescaled so that the duration of a size-1 avalanche is T = 1. If another avalanche is triggered in the timespan of the first, their sizes are added and the length of the avalanche is potentially increased, possibly including another independent cascade.

scale of a single neuronal culture, and that neuronal avalanches are distributed as $P(s) \propto s^{-1.5}$, with the duration of a neuronal avalanche related to its size by $\langle T \rangle(s) \propto \sqrt{s}$, as is the case in the branching process. If we consider a large ensemble of decoupled patches of neurons, the initiation of neuronal avalanches in each patch will be independent and presumably uncorrelated. Therefore, the overall initiation of avalanches will be Poisson, and it will be possible for spatially distinct avalanches to overlap in time. If we simulate this process and require that a "single" avalanche be separated by a period of quiescence, then for quite moderate rates of avalanche initiation we find significant deviations from a single pure power-law, as in Figure-2.4. Over the first several decades of the probability distribution, the associated power-law exponent depends on the level of spontaneous activity. Due to experimental limitations, it is typical to offer far less than four orders of magnitude as evidence for scale-free behaviour [Friedman et al., 2012, Tagliazucchi et al., 2012]. What this shows is that were an experimentalist to observe a large neural system and use the definition of avalanche due to Beggs et al., their observations of avalanches might be biased by non-causal, temporally-overlapping cascades. The experimentalist might therefore report different "scale-free" exponents than are actually present in the underlying avalanche process.

To overcome the challenge presented by noise, there have been several efforts to generalize avalanches in such a way as to retain their causal aspect. One effort that we already highlighted in section 2.2 was that of Tagliazucchi et al., who were trying to observe avalanche phenomena in fMRI data. Because at any given moment several independent cascades of activity were travelling across the cortex, they separated activity spatially, by assuming that neighbouring brain regions had the strongest influence on each other [Tagliazucchi et al., 2012]. Essentially the same approach was employed at the mesoscale in mice brains using optical imaging: at any given moment there were numerous patches of active cortex; these clusters were labelled independently and merged upon contacting each other [Scott et al., 2014].
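The distortion in Figure-2.4 is easy to reproduce. Below is a minimal sketch of the merging procedure described in that figure's caption, assuming $P(S) \propto S^{-3/2}$ sizes drawn by inverse-transform sampling, durations $T = \sqrt{S}$, and Poisson-distributed initiation times; all implementation choices and names are mine.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_sizes(n, s_max=1e8):
    """Draw sizes from P(S) ~ S^(-3/2), S >= 1, by inverse-transform sampling."""
    return np.minimum(1.0 / rng.random(n) ** 2, s_max)

def observed_avalanches(rate, n=100_000):
    """Merge independent avalanches that overlap in time, as an observer
    delineating avalanches by periods of quiescence would."""
    starts = np.cumsum(rng.exponential(1.0 / rate, n))  # Poisson initiations
    sizes = sample_sizes(n)
    ends = starts + np.sqrt(sizes)                      # duration T = sqrt(S)
    merged, cur_size, cur_end = [], sizes[0], ends[0]
    for start, size, end in zip(starts[1:], sizes[1:], ends[1:]):
        if start <= cur_end:        # overlaps the ongoing avalanche: merge
            cur_size += size
            cur_end = max(cur_end, end)
        else:
            merged.append(cur_size)
            cur_size, cur_end = size, end
    merged.append(cur_size)
    return np.array(merged)

for rate in (1e-3, 1e-2):
    sizes = observed_avalanches(rate)
    print(rate, len(sizes), sizes.max())
```

Histogramming the merged sizes for increasing rates reproduces the drift of the apparent power-law exponent seen in Figure-2.4.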

Figure 2.5: Causal webs can be used to distinguish spatially distinct events, as well as the progenitor events in avalanches. On the left are the spike trains observed in the neurons on the right. There are two causal webs of size three, as well as a causal web of size four. Under the traditional model of avalanches, with avalanches delineated by periods of silence, there would be two avalanches: one of size six and one of size four.

The use of structure to help disentangle independent causal activity has led to an interest in generalizing avalanches to a new structure known as a "causal web" or cweb [Williams-Garcia et al., 2017], which makes use of the network structure in identifying causal activity (see Figure-2.5). In the limit of low rates of spontaneous activity, where every cascade has only a single progenitor and there is a separation of timescales between the initiation and propagation of activity, cwebs are exactly the traditional avalanches. Thus, in the low-noise limit, cwebs can be scale-free. In more active systems, cwebs allow independent coincident cascades to be separated. Demonstrating that systems with high levels of spontaneous activity can also produce scale-free distributions of cwebs will be one of the principal aims of this thesis. Determining cwebs requires knowledge of the network structure. This is a challenge for experimental use, as obtaining the complete network topology of living neural networks is an open problem. Hence, it is also of interest to study how other, more experimentally-accessible indicators of criticality change in the presence of noise. These indicators might be measurements of the local branching ratio [Beggs, 2008], the susceptibility [Moretti and Muñoz, 2013], or the size of fluctuations in the active fraction of neurons [Williams-García et al., 2014]. It has already been shown that in the presence of noise, some measures of the

correlation length no longer diverge, but instead reach a maximum known as the Widom line [Williams-García et al., 2014]. Measurements of the branching ratio will also be affected by the presence of spontaneous activity, as noise or other active parents help drive a daughter neuron to activate. This disrupts another of the classical measures of a "critical" state – that the branching ratio is 1 [Beggs, 2008, Poil et al., 2008, Shew and Plenz, 2013].

2.4.2 Modelling noise in the brain

Numerous studies simulating neuronal avalanches begin with the assumption of a separation of time scales between the resolution and initiation of avalanches, by initiating a new cascade whenever one finishes [de Andrade Costa et al., 2015, Girardi-Schappo et al., 2016, Moretti and Muñoz, 2013, Odor, 2016, Plenz, 2012, Williams-García et al., 2014]. However, there are several studies [Orlandi et al., 2013, Poil et al., 2012] which include noise in their dynamics to drive the production of avalanches, though none have made a systematic study of the appropriate level of noise. Typically, these models assume a very low level of homogeneous spontaneous activity, so that they effectively only have one avalanche at a time. In dynamical models that describe the membrane potential, noise plays an important role in depolarizing the membrane and making it easier for a neuron to trigger other neurons [Orlandi et al., 2013]. It has been observed that noise can be focused into coherent activity and that this focusing behaviour is determined by the network structure [Orlandi and Casademunt, 2017, Orlandi et al., 2013]. A variant of the branching process known as the cortical branching model has been used to illustrate the distortion of power-laws in the presence of noise [Williams-Garcia et al., 2017]. There, noise led to extended avalanches and non-scale-free avalanche distributions. A recent preprint studying the effect of ongoing noise during avalanches, but under the assumption of a separation of timescales between avalanches, has shown that noise can change the underlying exponent characterizing the resulting avalanche distribution [Das and Levina, 2018]. However, this work did not consider the impact of possibly distinct, but concurrent, cascades on the avalanche distribution. Finally, no equivalent work has studied which, if any, critical exponents describe the causal web distribution in the presence of noise.

2.5 Summary

In this chapter, we have introduced neuroscience's criticality hypothesis: the notion that neural systems operate at a critical point. This hypothesis is motivated by experimental observations of power-laws in neuronal cultures and in fMRI studies. Theoretical models predict the same power-laws at a phase transition and, by use of homeostatic mechanisms, explain how the brain might self-tune to this critical point. Although there exist sophisticated models of neuronal behaviour, much of the biological detail turns out to be unnecessary for reproducing the aggregate observations of neurons. The essential ingredient is a network of excitable nodes that can spread their activity. Simple branching and contact processes are sufficient to reproduce many results related to neuronal avalanches. However, few of these theoretical models include noise. Noise is a significant complication for the traditional observables related to criticality, such as scale-free avalanches and the branching ratio. One possible generalization of neuronal avalanches, the causal web, will be an object of study for the remainder of this thesis.

Chapter 3

Mathematical Background

In this chapter, I will introduce some necessary background material on the topology of networks using the language of graph theory, as well as introduce the dynamics of systems on complex networks in the language of percolation.

3.1 Random graphs and network theory

Here I will introduce the terminology of network theory and methods for constructing graphs. A graph is a structure G consisting of two sets: a set $\mathcal{N}$ of N labelled vertices (also known as nodes) and a set $\mathcal{L}$ of L links (also known as edges). Each vertex is labelled $i \in \{1, \ldots, N\}$. Each link can be represented as $(i, j) \in \mathbb{Z}_N \times \mathbb{Z}_N$, and represents a connection between two vertices. In the networks we study, we do not allow self-links. Graphs can be considered directed or undirected. In the directed case, the link $l_{ij}$ denotes a connection from i to j. In the undirected case, $l_{ij}$ denotes that the connection from i to j is symmetric, i.e. that there is also a connection from j to i. An undirected graph with L links can always be represented as a directed graph with 2L links, by replacing each of the L links $l_{ij}$ with two links: $l_{ij}$ and $l_{ji}$. A subgraph H is a graph whose vertices $\mathcal{N}'$ and links $\mathcal{L}'$ are themselves subsets of the vertices and links of another graph G, so that $\mathcal{N}' \subseteq \mathcal{N}$ and $\mathcal{L}' \subseteq \mathcal{L}$. The underlying graph of a directed graph is the undirected graph obtained by replacing all directed links in the directed graph with undirected links. A complete graph is an undirected graph for which every distinct pair of vertices is connected by a single link.

Two vertices i and $i_n$ in a directed graph are considered strongly-connected if there exists a subset of links of the graph $l_{i i_1}, l_{i_1 i_2}, \ldots, l_{i_{n-1} i_n}$ that form what is known as a path between the two vertices. A graph is considered strongly-connected if every pair of vertices i, j with $i \neq j$ is strongly-connected. Two vertices are considered weakly-connected if there exists a path between the two vertices in the corresponding underlying graph. A directed graph is weakly-connected if every pair of vertices i, j is weakly-connected. The giant strongly-connected component (GSCC) is the largest subgraph of a directed graph that is strongly-connected. The in component of the GSCC is the set of vertices reachable from the GSCC by following incoming links backwards, or equivalently, the set of all vertices that can reach the GSCC by following directed outgoing links. The out component of a GSCC is the set of vertices reachable from the GSCC by following outgoing links from the GSCC. If a graph is strongly-connected, then the GSCC, in component, and out component are identical and constitute the entire graph. Similarly, the giant weakly-connected component (GWCC) is the largest subgraph of a directed graph that is weakly-connected. The distance between two vertices is the length of the shortest path between them, should such a path exist. Should it not, the distance is considered to be infinite. The diameter of a graph is the greatest distance between any two vertices in the graph. The out-degree of a vertex is the number of links that originate at the vertex. The in-degree of a vertex is the number of links that terminate at the vertex. The degree of a vertex is the sum of the in- and out-degrees of the vertex. In undirected graphs, the degree is just the number of links involving the vertex.

3.1.1 k-ary trees

A rooted k-ary tree is a type of weakly-connected directed graph with no loops. It has a single vertex of in-degree 0, called the 'root', from which all other vertices can be reached, and all other vertices have in-degree 1 [Graham et al., 1989]. The complete k-ary tree has an out-degree of k at each vertex and is therefore an infinite graph. An incomplete k-ary tree has an out-degree of at most k at each vertex. The number of rooted, incomplete k-ary trees with s vertices is given by the Fuss-Catalan numbers (per page 347 of [Graham et al., 1989])

$C_s^{(k)} = \frac{1}{(k-1)s + 1} \binom{ks}{s}$ . (3.1)

Each such tree has a perimeter t, the number of additional edges that each vertex could add while keeping the tree k-ary, which is linearly related to the size of the tree by

t = (k − 1)s + 1 . (3.2)
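Equation-3.1 is straightforward to evaluate numerically; a quick sanity check (function name mine) is that k = 2 recovers the ordinary Catalan numbers:

```python
from math import comb

def fuss_catalan(k, s):
    """Number of rooted, incomplete k-ary trees with s vertices (Eq. 3.1)."""
    return comb(k * s, s) // ((k - 1) * s + 1)

# k = 2 recovers the ordinary Catalan numbers: 1, 1, 2, 5, 14, 42.
print([fuss_catalan(2, s) for s in range(6)])
# Perimeter of a k-ary tree of size s (Eq. 3.2): t = (k - 1)s + 1.
k, s = 3, 5
print("perimeter:", (k - 1) * s + 1)
```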

3.1.2 k-regular graphs

An undirected k-regular graph is a graph where the degree of each vertex is exactly k. A directed k-regular graph is a graph where the in- and out-degrees of each vertex are exactly k. An example of an undirected 2-regular graph is a ring graph, where each node can be embedded on a ring and is connected to its nearest neighbour in either direction. The strongly-connected 1-regular graph is a ring of vertices, all connected to their nearest neighbour in the (without loss of generality) clockwise direction. Although k-regular graphs may be highly structured, as in the previous examples, for most of this thesis we will be interested in random graphs. A random k-regular graph with n vertices is a graph selected from the set of all k-regular graphs with n vertices, with equal weight on each graph in the set, so that $P(G) = 1/\Omega$, where Ω denotes the number of k-regular graphs with n vertices [Bollobás, 2001, Newman, 2010]. As the undirected k-regular graph is fairly homogeneous, it shouldn't be surprising that the diameter d of the random k-regular graph has the relatively tight bound

$1 + \lfloor \log_{k-1} n \rfloor + \lfloor \log_{k-1}(\log n) - \log_{k-1}(6k/(k-2)) \rfloor \leq d \leq 1 + \frac{\log(2kn \log n)}{\log(k-1)}$ (3.3)

per [Bollobás, 2001]. Generating finite k-regular graphs is not particularly difficult. To accomplish this, we will introduce the configuration model of graph generation. In general, the configuration model allows for the generation of a random graph with a presupplied (in/out) degree distribution, although in the case of a random k-regular graph the degree distribution is simply $p^{\mathrm{in/out}}(m) = \delta_{mk}$.

The configuration model

We will consider the configuration model for graphs of N vertices, each with uncorrelated in- and out-degree, so that $p(j, m) = p^{\mathrm{in}}(j)\,p^{\mathrm{out}}(m)$, where p(j, m) denotes the probability of a random vertex having in-degree j and out-degree m. Each vertex i is assigned $j_i$ incoming 'stubs' and $m_i$ outgoing 'stubs', drawn from $p^{\mathrm{in}}(j)$ and $p^{\mathrm{out}}(m)$ such that $\sum_i^N m_i = \sum_i^N j_i$. The incoming and outgoing stubs are randomly paired off. In principle, for degree distributions $p^{\mathrm{in}}(j)$ and $p^{\mathrm{out}}(m)$ with a well-defined variance, this procedure will succeed in generating a simple graph (i.e. with no loops or duplicate edges) a non-zero proportion of the time [Chen and Olvera-Cravioto, 2013]. However, this proportion may be very small for even moderate N, and the procedure may need to be repeated many times before successfully generating a random graph. There are a few ways to deal with the presence of duplicate edges or self-connections. One approach is to simply delete the offending edges. This has two problems, however: firstly, it distorts the degree distribution, and secondly, it can bias the graph selection process. This makes analytical results difficult or impossible to obtain [Newman, 2010]. However, in the limit of large N, the fraction of duplicate edges or self-connections vanishes, and asymptotically we recover the same degree distributions [Chen and Olvera-Cravioto, 2013]. Another possible approach, which maintains the degree distribution, is to introduce restrictions during the graph generation process. We subject the pairing of stubs to two restrictions: (i) an incoming stub of vertex i may not be paired with an outgoing stub of vertex j if i = j, and (ii) an incoming stub of vertex i may not be paired with an outgoing stub of vertex j if there already exists an incoming stub of vertex i paired with an outgoing stub of vertex j. These two restrictions prevent the formation of self-connections and duplicate connections between vertices. Should it be impossible to pair off a given stub (say, if pairing would require a self-connection), existing pairs are randomly reassigned in accordance with the two previous rules until it is possible to pair off the remaining stubs. Once all stubs are paired off, the graph may be constructed, with an outgoing stub from i paired to an incoming stub of j constituting a single directed link $l_{ij}$. Introducing these restrictions means that we no longer weight all possible graphs equally; however, this discrepancy becomes small for large N [Chen and Olvera-Cravioto, 2013]. We have also glossed over another challenge in implementing the configuration model: it is not always trivial to sample two distributions $p^{\mathrm{out}}$ and $p^{\mathrm{in}}$ subject to the constraint that the sums of the $m_i$ and $j_i$ are equal, although there exist algorithms that will produce graphs with the same degree distribution for large N [Chen and Olvera-Cravioto, 2013]. For this reason we will only use the configuration model for trivial degree distributions, such as the k-regular graphs.
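For the k-regular case the stub-pairing procedure is short enough to sketch in full. The version below simply rejects and restarts on any self-link or duplicate rather than locally reassigning pairs as described above, so it is the simplest (and least efficient) variant; all names are mine.

```python
import random

def directed_k_regular(n, k, max_tries=10_000):
    """Configuration-model sample of a directed k-regular graph: pair k
    outgoing stubs per node with k incoming stubs uniformly at random,
    rejecting any pairing that contains self-links or duplicate links."""
    out_stubs = [i for i in range(n) for _ in range(k)]
    in_stubs = [i for i in range(n) for _ in range(k)]
    for _ in range(max_tries):
        random.shuffle(in_stubs)
        links = set(zip(out_stubs, in_stubs))
        if len(links) == n * k and all(i != j for i, j in links):
            return links          # simple graph: no duplicates, no self-links
    raise RuntimeError("no simple graph found; increase max_tries")

links = directed_k_regular(1000, 2)
print(len(links))   # n * k = 2000 directed links
```

Note that the expected number of restarts grows quickly with k, echoing the point above that the proportion of simple graphs can be very small.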

3.1.3 Erdős-Rényi graphs

An Erdős-Rényi graph may refer to a graph belonging to one of two families of random graphs. In the G(N, L) model, a graph with N nodes and L links is chosen randomly from the set of all graphs with N nodes and L links, with equal weight on all graphs in the set.

In the G(N, p) model, a graph is constructed by connecting nodes at random: each possible link is included with probability p [Newman, 2010]. We will focus on this second family of random graphs, as the construction of random graphs in this class can be easily implemented algorithmically. The (in/out) degree distribution for the G(N, p) model is given by the binomial distribution

$p(m) = \binom{N-1}{m} p^m (1-p)^{N-1-m}$ , (3.4)

which in the limit $N \gg m$ with Np constant takes on the form

$p(m) = \frac{(pN)^m e^{-pN}}{m!}$ , (3.5)

which we recognize as the Poisson distribution with mean pN. Hence, the mean degree in an Erdős-Rényi graph is approximately pN, consistent with the mean p(N − 1) of the binomial distribution. When generating Erdős-Rényi graphs with fixed pN, the naive algorithm evaluating all N(N − 1) possible links (or N(N − 1)/2 links in the undirected case) runs in O(N²) time. However, if we instead sample from the Poisson distribution (which runs in constant time using a lookup table) to determine the out-degree of each vertex, and choose destination vertices at random, we obtain linear O(N) time. Owing to the original paper in which the G(N, p) model was introduced [Erdős and Rényi, 1960], we know that the graph is almost certainly connected if

pN > log N. (3.6)
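A sketch of the O(N) recipe just described (names mine); note that, unlike true G(N, p), this variant can occasionally produce duplicate links, whose proportion vanishes for large N.

```python
import numpy as np

def fast_er_digraph(n, mean_degree, seed=None):
    """Directed Erdos-Renyi-style graph in O(N + L) time: draw each node's
    out-degree from Poisson(pN), then pick destinations uniformly at random
    (self-links are redrawn; rare duplicate links are left in place)."""
    rng = np.random.default_rng(seed)
    links = []
    for i, k_out in enumerate(rng.poisson(mean_degree, n)):
        for _ in range(int(k_out)):
            j = int(rng.integers(n))
            while j == i:                      # forbid self-links
                j = int(rng.integers(n))
            links.append((i, j))
    return links

links = fast_er_digraph(10_000, mean_degree=10.0)
print(len(links) / 10_000)   # empirical mean out-degree, close to 10
```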

3.1.4 Small-world graphs

The random graphs that we have discussed thus far exhibit low clustering. Clustering refers to the propensity of a graph to have nodes that are tightly knit into groups. There are

several ways to evaluate the clustering of a graph, including global and local measures. The transitivity index (also sometimes known as the global clustering coefficient) is given by [Wasserman and Faust, 1994]

$C(G) = \frac{\#\text{ of closed triads}}{\#\text{ of possible triads}} = \frac{3 \times \#\text{ of triangles}}{\#\text{ of possible triads}}$ , (3.7)

where a triad is an ordered set of any three vertices. A triad is closed if edges exist (directed or otherwise) between each pair of nodes in the triad. The transitivity index is a global measure; however, there also exists a local measure, known as the local-clustering coefficient. The local-clustering coefficient of a vertex V in an undirected graph was introduced by Watts and Strogatz as

$CC(V) = \frac{2 N_V}{K_V (K_V - 1)}$ , (3.8)

where $N_V$ is the number of connections between neighbours of V and $K_V$ is the degree of V [Watts and Strogatz, 1998]. The clustering coefficient varies between 0, where V is the centre of what is locally a star graph, and 1, where V is the centre of a clique (a subgraph where all nodes are connected to all others). From this local-clustering coefficient, a global measure C(G) may be defined on a graph as the mean local-clustering coefficient [Watts and Strogatz, 1998]. We will refer to this as the WS-clustering coefficient (WS denoting Watts-Strogatz).

For the Erdős-Rényi graph, for instance, the number of neighbour interconnections $N_i$ for a node with $K_i$ edges is just $pK_i(K_i - 1)/2$, as each connection occurs with probability p and there are $K_i(K_i - 1)/2 = \binom{K_i}{2}$ possible connections. Hence the WS-clustering coefficient C(G) for an Erdős-Rényi graph G(N, p) is just p, or, if pN is constant, the clustering coefficient is asymptotically $\sim N^{-1}$. The transitivity index and the WS-clustering coefficient typically correlate in real-world networks; however, it is not uncommon for the WS-clustering coefficient to be higher than the transitivity index [Estrada, 2016]. It is easy to construct pathological graphs, such as


Figure 3.1: A demonstration of the Watts-Strogatz model. (a) A circulant graph, connecting the nearest two neighbours (giving each node a degree of four) is shown for N = 10. Of the 20 bonds, 4 are selected for rearrangement. (b) The bonds for rearrangement retain one end-point while the other is swapped for another at random.

the windmill graphs, in which the transitivity index tends to 0 as N increases while the WS-clustering coefficient tends to 1. That real-world networks often exhibit significant local clustering suggests that there are mechanisms beyond random assortment at play in producing real networks. The canonical example is that of a social network: your friends are more likely to be friends with each other than two random strangers are, and hence the WS-clustering coefficient of the world's social network should be high. This observation prompted Watts and Strogatz to develop what are called "small-world" networks [Watts and Strogatz, 1998]. They are small in that they exhibit a short mean path length: the number of degrees of separation between two randomly-selected nodes is low, yet the local clustering is still high. This reflects the observation that there are, on average, fewer than six degrees of separation between two randomly-selected individuals on the planet [Christakis and Fowler, 2016]. The original Watts-Strogatz model begins with a circulant graph, where each node i is connected to its i − k through i + k nearest neighbours. Subsequently, some fraction p of edges are randomly rewired (see Figure-3.1). In the limit that p is small, the graph is not small-world, but does have high clustering. In the limit that p is close to 1, the graph is small-world, but the local clustering goes to 0. However, there is a broad intermediate regime

in which the WS-clustering coefficient is high and the characteristic path length is low. In their original paper, they considered graphs with k = 10 and N = 1000, and found that the characteristic path length decreased by a factor of ten as p rose from 10⁻⁴ to 10⁻² (whereafter it was approximately constant), even as the WS-clustering coefficient remained approximately constant (changing by less than 4%). Such graphs, with a high clustering coefficient and a low mean path length, are said to exhibit the "small-world" property. In this thesis, we will generate small-world graphs using a modified version of the Watts-Strogatz model that does not require rewiring [Song and Wang, 2014]. In this model, vertices $i, j \in \{1, \ldots, N\}$ are connected by links with a probability that depends on the distance $\mathrm{dist}(i, j) = \min(N - |i - j|, |i - j|)$ between the nodes. This distance measure corresponds to the network distance between the nodes of a ring graph (the first circulant graph) with N vertices. Links are established between nodes i and j with probability $\frac{\beta K}{N-1}$ if $\mathrm{dist}(i, j) > K$, while they are connected with probability $1 - \beta\left(1 - \frac{K}{N-1}\right)$ otherwise. Here, β corresponds to the rewiring probability. The mean degree of each node is K. In the limit that β → 1, we produce Erdős-Rényi graphs belonging to G(N, p = K/(N − 1)). In the limit that β → 0 we have the K-regular circulant graph. Although we will be studying small-world graphs for a variety of β values, we will particularly highlight β = 10⁻², as this corresponds to graphs that match the p = 10⁻² results of Watts-Strogatz's original model [Watts and Strogatz, 1998].
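The connection rule above translates directly into code. In the sketch below (names mine) I take the 'near' neighbourhood to be K/2 nodes on each side of the ring, so that the β → 0 limit recovers the K-regular circulant graph and β → 1 recovers G(N, K/(N − 1)); the precise distance convention should be checked against [Song and Wang, 2014].

```python
import numpy as np

def song_wang_graph(n, K, beta, seed=None):
    """Small-world graph without rewiring, after [Song and Wang, 2014].
    Pairs within ring distance K/2 connect with probability
    1 - beta*(1 - K/(n-1)); all other pairs with probability beta*K/(n-1)."""
    rng = np.random.default_rng(seed)
    p_near = 1.0 - beta * (1.0 - K / (n - 1))
    p_far = beta * K / (n - 1)
    links = []
    for i in range(n):
        for j in range(i + 1, n):
            dist = min(j - i, n - (j - i))          # ring distance
            if rng.random() < (p_near if dist <= K // 2 else p_far):
                links.append((i, j))
    return links

links = song_wang_graph(1000, K=10, beta=1e-2)
print(2 * len(links) / 1000)   # empirical mean degree, close to K = 10
```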

3.1.5 Power-law graphs

Real-world networks also often exhibit hubs. A common example of this can be seen in air traffic. There are a number of major airports through which a large number of flights are routed (e.g. Frankfurt, Heathrow, etc.), with passengers often transferring between flights there. These hubs are typically rare, and have an atypically large in- or out-degree. Upon inspecting the probability distribution of the in- and out-degrees of these networks, a fat-tailed distribution is therefore observed, with more high-degree nodes than might be predicted by an exponential or normal distribution. For many such fat-tailed distributions, a power-law, i.e.

$p^{\mathrm{in/out}}(k) \sim k^{-\lambda}$ , (3.9)

is a plausible fit. For example, the link distribution of websites has been observed to be power-law with λ ≈ 2 [Adamic and Huberman, 2000]. That being said, no finite network can be truly scale-free, as the degree distribution must always be constrained by the number of nodes in the graph. Further, some networks reported as power-laws are later found to be better fit by Weibull or log-normal distributions. This has led to significant debate in the field of network science as to whether networks can be considered scale-free [Broido and Clauset, 2019]. Nonetheless, these apparent power-law degree distributions often lead to such graphs being referred to as "scale-free" networks, although we will here refer to them as power-law networks. There are two schemes for producing power-law networks that we will use in this thesis. When there is no correlation between in- and out-degree, we will employ the configuration model discussed earlier. To produce graphs with a correlation between the degree distributions, we will use Goh's algorithm [Goh et al., 2001]. So as to avoid disconnected components, we will often impose a truncated power-law for the configuration model, with p(k) = 0 if $k < k_{\min}$ and $p(k) \sim k^{-\lambda}$ otherwise (cf. Figure-3.2).

Goh’s algorithm

All N nodes are labelled $\{1, \cdots, N\}$, and to each node i we assign the weight $p_i \propto i^{-\alpha}$, normalized so that $\sum_i p_i = 1$, where α is a control parameter that sets the degree distribution exponent λ. The mean degree of the graph, $\langle k \rangle$, is fixed, and $\langle k \rangle N$ directed links will be assigned. For each link, we choose the source node i and destination node j independently with probabilities $p_i$ and $p_j$, redrawing self-connections i = j. This procedure produces a degree distribution with a power-law tail governed by $\lambda = \frac{1+\alpha}{\alpha}$, which enables exponents λ ∈ (2, ∞) by varying α ∈ [0, 1). The limiting value α = 0 produces Erdős-Rényi graphs, which have a binomial degree distribution and no power-law tail.
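A sketch of Goh's algorithm as described above (names mine); duplicate links are left in place, and only self-connections are redrawn.

```python
import numpy as np

def goh_power_law(n, mean_degree, alpha, seed=None):
    """Goh et al. static model: node i carries weight p_i ~ i^(-alpha);
    each of the <k>N directed links draws its source and destination
    independently from these weights, redrawing self-connections.
    The resulting power-law tail has exponent lambda = (1 + alpha)/alpha."""
    rng = np.random.default_rng(seed)
    w = np.arange(1, n + 1, dtype=float) ** -alpha
    w /= w.sum()
    m = int(mean_degree * n)
    src = rng.choice(n, size=m, p=w)
    dst = rng.choice(n, size=m, p=w)
    bad = src == dst
    while bad.any():                       # redraw self-connections
        dst[bad] = rng.choice(n, size=int(bad.sum()), p=w)
        bad = src == dst
    return list(zip(src.tolist(), dst.tolist()))

# alpha = 1/(lambda - 1), so lambda = 3.5 corresponds to alpha = 0.4.
links = goh_power_law(100_000, mean_degree=10, alpha=0.4)
print(len(links))
```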


Figure 3.2: Degree distribution for power-law networks with uncorrelated degree distributions generated via the configuration model, with λ = 3.5 and $k_{\min} = 5$, averaged across 500 networks of size $N = 10^5$.


Figure 3.3: Degree distribution for power-law networks generated via the Goh model, with λ = 3.5 and ⟨k⟩ = 10, averaged across 500 networks of size $N = 10^5$.

As can be seen in Figure-3.3, nodes of degree 1 are not infrequent, and the overall shape is almost as though one had convolved a binomial distribution with a power-law tail. Just as in the configuration model (Figure-3.2), both the in- and out-degree share a power-law exponent (Figure-3.3); however, the in- and out-degree of each node is now correlated, as can be seen in Figure-3.4.

3.1.6 Hierarchical modular graphs

A graph is considered modular when it can be approximately decomposed into weakly-connected modules, which are densely intra-connected substructures of the graph [Meunier et al., 2010].


Figure 3.4: The in-/out-degree correlations averaged over an ensemble of 500 networks, both with an asymptotic degree distribution of $p(k) \sim k^{-3.5}$. (a) In-/out-degree correlations for power-law networks generated by the configuration algorithm, as in Figure-3.2. (b) In-/out-degree correlations for power-law networks generated by the Goh algorithm, as in Figure-3.3.

Figure 3.5: Base modules are represented by filled squares. Each base module might contain a dense network of neurons. Modules are wired into pairs – these pairs constitute a super-module. Super-module pairings are indicated by a lighter shade of blue. Super²-modules are constructed from pairs of super-modules, and are indicated by the lightest shade of blue. During the formation of the super²-modules, a base module from each of the super-modules is selected; these two base modules are then wired together, as indicated with the lightest-blue edge. A single super³-module is constructed from the two super²-modules, and is indicated in green. Two base modules, one from each super²-module, are wired together; this connection is indicated in green.

A hierarchical modular graph is a graph whose modules are organized hierarchically [Meunier et al., 2010]. We generate hierarchical modular graphs in three stages. We first generate the hierarchical structure of the modules, then add the vertices within the modules, and finally add edges between vertices so as to fulfil the previously-generated network structure of the modules. We generate the hierarchical structure of the modules following the method of Moretti [Moretti and Muñoz, 2013]. Modules are paired into super-modules, super-modules are paired into super²-modules, and so on. When each superⁿ-module is formed, a base module from each of the superⁿ⁻¹-modules is selected to form an edge between them. This process produces the hierarchical structure of the modules, and is illustrated in Figure-3.5. The base modules themselves are then populated with NPN intra-modular vertices (intra-vertices) and NPC × (the out-degree of the module) inter-modular vertices (inter-vertices). NPN denotes the number of vertices per module, in the absence of intermodular connections. NPC denotes the number of vertices in each module that will be used to connect modules. In our simulations, each of the NPN intra-modular vertices connects onto 10 ± 0.5 other vertices (the precise number for each vertex is drawn from a Gaussian

Figure 3.6: A simple example of the vertices and edges populating two base modules (coloured blue) connected together to form a super-module (coloured purple). Here the number of intra-vertices per module, NPN, is 5. Each intra-vertex is coloured navy blue. The number of inter-vertices, NPC, is 2, and each inter-vertex is coloured red. Here, the out-degree of each vertex is 2. The inter-vertices only connect to the intra-vertices of the other module; their edges are in purple. Intra-vertices can connect to other intra-vertices or to the inter-vertices of the same module; their edges are shown here in green. The populations of intra-vertices and inter-vertices are circled in light blue.

distribution, with standard deviation 0.5) within the module. These connections can be to other intra-vertices or to any of the NPC × (out-degree of module) inter-vertices. The inter-vertices draw from the same Gaussian to determine their out-degree; however, they connect to the NPN intra-vertices of another module. Which module that is, is determined by the backbone hierarchical module structure. A simplified example of this scheme, for the trivial modular network with two modules, is shown in Figure-3.6, with a reduced out-degree of 2 for clarity.
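The backbone-generation stage is compact enough to sketch. The code below (conventions and names mine) pairs modules recursively and records which two base modules carry each new inter-module edge, leaving the population of vertices and edges within modules to the later stages.

```python
import random

def hierarchical_backbone(levels):
    """Pair modules into super-modules recursively, following the scheme
    described above: at each level, one base module from each half is
    chosen to carry the new inter-module edge. Returns (n_modules, edges),
    where each edge connects two base-module indices."""
    modules = [[m] for m in range(2 ** levels)]  # leaves: base-module ids
    edges = []
    for _ in range(levels):
        paired = []
        for a, b in zip(modules[0::2], modules[1::2]):
            # One base module from each super-module carries the new edge.
            edges.append((random.choice(a), random.choice(b)))
            paired.append(a + b)
        modules = paired
    return 2 ** levels, edges

n, edges = hierarchical_backbone(3)   # 8 base modules, 7 backbone edges
print(n, edges)
```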

3.2 Percolation

The two most studied areas of classical percolation theory are site percolation and bond percolation [Christensen and Moloney, 2005]. Both site and bond percolation begin with a graph G. In site percolation, vertices of the graph are retained with probability p, while in bond percolation, edges of the graph are retained with probability p. The process of retaining vertices or edges is known as dilution. This makes the generation of Erdős-Rényi graphs G(N, p) a type of bond percolation, if one considers the initial graph to be the complete graph of N vertices. The clusters are the maximal connected subgraphs that remain after dilution. The distribution of cluster sizes is one of the principal quantities of study. Although we will be principally interested in percolation performed on random graphs, percolation theory was first explored on highly-ordered graphs known as lattices, and there are several aspects of lattice percolation theory that will be useful to us. For percolation on lattices, there are several salient features: (i) the existence of a phase transition as p is increased from 0 to 1; (ii) the emergence of a giant component at that phase transition; (iii) power-law scaling of several quantities near the phase transition. In finite lattice percolation theory, a cluster is considered spanning if it extends between two boundaries of the lattice. In the limit of an infinite lattice, clusters that are also infinite are considered percolating clusters [Christensen and Moloney, 2005].


Figure 3.7: An example of three infinite lattice structures. (a) The 1-dimensional lattice. (b) The Bethe lattice of degree 4. (c) The 2-dimensional triangular lattice.

Figure 3.8: Percolation on a 1D lattice of size N = 14. Occupied sites are coloured black. Sampling the seven active nodes, the cluster size distribution is $P_n(S = 1) = 2/7$, $P_n(S = 2) = 2/7$, $P_n(S = 3) = 3/7$. Sampling the four clusters, the size distribution is $P_c(S = 1) = 2/4$, $P_c(S = 2) = 1/4$, $P_c(S = 3) = 1/4$. Hence, the mean cluster sizes are $\langle S \rangle_n = 15/7$ and $\langle S \rangle_c = 7/4$.

3.2.1 Percolation in 1-dimension

To illustrate some of the key concepts of percolation – the critical point, power-law divergences, and the correlation length – we will approximately follow the approach to 1-dimensional percolation laid out in Kim Christensen's book [Christensen and Moloney, 2005]. We can exactly solve the cluster distribution in 1 dimension (see Figure-3.7a). For a finite 1-dimensional lattice consisting of N sites, the probability of a spanning cluster appearing is $p^N$, so it is clear that in the limit N → ∞ a percolating cluster exists only for p = 1, and this is the only cluster. For p < 1 we can study the distribution of clusters. There are two ways to sample the cluster distribution: we can randomly sample active nodes and ask what the size of the cluster they belong to is, or we can sample clusters and ask what size these are. The difference between the two is illustrated in the caption of Figure-3.8. The different mean cluster sizes of Figure-3.8 reflect what is known as the "Inspection Paradox", which arises whenever the quantity being sampled is related to the method of sampling. When sampling by active node, large clusters are sampled more frequently than small clusters. For this reason $P_n(S) \propto S P_c(S)$, and normalization yields

$P_n(S) = \frac{S P_c(S)}{\langle S \rangle_c}$ . (3.10)

In the case of 1-dimensional percolation we have $P_c(S) \propto (1-p)^2 p^S$ and so $P_n(S) \propto S (1-p)^2 p^S$, and thus by normalization

$P_n(S) = S (1-p)^2 p^{S-1}$ . (3.11)

Now we see that this equation is dominated by an exponential decay in the tail, so we can identify a characteristic size by defining $p^S = \exp[-S/s_\xi]$, where $s_\xi = -1/\log(p)$. As the logarithm diverges as $p \to 1^-$, we can Taylor expand the logarithm to find $s_\xi \approx (1-p)^{-1}$. Hence, the characteristic scale diverges with a power-law exponent of −1 near the critical point $p_c = 1$. Using the cluster size distribution we can also find the mean cluster size (commonly denoted χ(p)), sampled by active node, to be

$\chi(p) = \langle S \rangle_n = \sum_{S=1}^{\infty} S P_n(S) = \frac{1+p}{1-p}$ . (3.12)

This diverges as $p \to 1^-$, as we would expect, since this is when a percolating cluster appears. Expanding around p = 1, we find that $\chi(p) \sim 2(1-p)^{-1}$ also diverges with a power-law exponent of −1. The last quantity that is worth introducing is the site-site correlation function, $g(\vec{r}_i, \vec{r}_j)$. The site-site correlation function denotes the probability that a site at $\vec{r}_j$ is part of the same cluster as $\vec{r}_i$, given that the site $\vec{r}_i$ is active. In this thesis we will use a slightly more succinct notation by redefining $g(\vec{r}_i, \vec{r}_j) \to \frac{1}{2} g(r)$, where $r = |\vec{r}_i - \vec{r}_j|$, which will be less cumbersome to work with when we later also include a time dimension. We will use g(r) to denote the mean number of nodes at distance r within the same cluster as a randomly-selected node. Clearly $g(r) = 2p^r$, so we can identify $g(r) = 2\exp[-r/s_\xi]$. Hence, $s_\xi$ corresponds to both the characteristic cluster scale and the correlation length.
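These 1-dimensional results are easy to verify numerically. The sketch below (names mine) measures the cluster-sampled mean $\langle S \rangle_c = 1/(1-p)$ and the node-sampled mean $\langle S \rangle_n$, which should approach Equation-3.12, illustrating the inspection paradox.

```python
import numpy as np

rng = np.random.default_rng(0)

def cluster_sizes_1d(n, p):
    """Sizes of occupied clusters (runs of occupied sites) on a 1D lattice
    where each site is independently occupied with probability p."""
    occupied = rng.random(n) < p
    sizes, run = [], 0
    for site in occupied:
        if site:
            run += 1
        elif run > 0:
            sizes.append(run)
            run = 0
    if run > 0:
        sizes.append(run)
    return np.array(sizes)

p = 0.8
sizes = cluster_sizes_1d(1_000_000, p)
print("cluster-sampled <S>_c:", sizes.mean(), "vs", 1 / (1 - p))
print("node-sampled    <S>_n:", (sizes**2).sum() / sizes.sum(),
      "vs", (1 + p) / (1 - p))   # Equation-3.12
```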

3.2.2 Percolation on the Bethe lattice

When analysing percolation in one dimension we made tacit use of the fact that there is only one possible configuration when two nodes separated by distance s are connected, or when a cluster is of size s: namely, a line of active nodes. However, when considering percolation in higher dimensions, there are numerous possible paths that may connect two nodes, making it challenging to calculate the correlation length. Additionally, enumerating all of the possible configurations of a cluster of size s also becomes challenging as soon as the dimension increases. This technical difficulty is typically blamed on the presence of loops [Christensen and Moloney, 2005] and has only been overcome analytically in d = 2, through the use of conformal field theory. There is, however, another lattice in which the complication of loops is avoided: the Bethe lattice (see Figure-3.7b). The Bethe lattice is in some sense the infinite-N limit of the undirected k-regular graphs. In the limit N → ∞, the k-regular graph has a clustering coefficient of zero, as the probability of two neighbours also being linked tends to zero. The probability of higher-order cycles (loops of greater length) similarly goes to zero as N → ∞, albeit slowly, for the k-regular graph. Hence, as N → ∞, the loop-less property of the Bethe lattice is increasingly well respected. For this reason, we might expect results derived on the Bethe lattice to apply well to the k-regular graph. We will outline some of the analytical results for the Bethe lattice, approximately following [Christensen and Moloney, 2005], as the techniques used to derive these results will mirror those used later in §4.1. On the Bethe lattice it is possible to analytically determine the critical percolation probability $p_c$. A percolating cluster requires that, on average, every node that we are connected to has at least one other connection. Hence, if k denotes the degree of the Bethe lattice, there are on average p(k − 1) available edges leaving a vertex, not including the edge we took to arrive. Hence the condition for percolation is $p(k-1) \geq 1 \implies p_c = \frac{1}{k-1}$. Unsurprisingly, this agrees with the critical percolation probability for percolation in one dimension when k = 2, as the k = 2 Bethe lattice is the 1-dimensional lattice.

It is also possible to analytically calculate the mean cluster size χ(p), even without an exact expression for $P_n(S)$. We will proceed by beginning at a randomly-selected active node and summing over the contribution to the mean size from each attached edge, B. Hence

$\chi(p) = 1 + kB$ . (3.13)

We can calculate B by considering a node with k − 1 attached edges. With probability p, that node will contribute 1 + (k − 1)B. Hence B = p(1 + (k − 1)B). Rearranging, we find that $B = \frac{p}{1-(k-1)p}$, and so

$\chi(p) = \frac{1+p}{1-p(k-1)} = \frac{p_c (1+p)}{p_c - p}$ . (3.14)

We again observe that $\chi(p) \propto (p_c - p)^{-1}$ in the limit $p \to p_c^-$. This exponent of −1 does not depend on the mean degree, and is an example of a universal exponent. We will see this again in §3.2.3, where we will see that many other such exponents depend only on dimensionality and not on other details of the configuration.

Lastly, it is possible to calculate the scaling form of $P_n(s)$. Beginning at a random active node, a cluster of size s has k neighbours, each of which contributes what may be considered a rooted (at the initial active node), ordered (to avoid over-counting), incomplete (the tree does not necessarily have all its leaves), unlabelled (k − 1)-ary tree with $s_i$ vertices, with the restriction that $\sum_{i=1}^{k} s_i = s$. The number of such trees is given by the Fuss-Catalan numbers $C_s^{(k-1)}$, per Equation-3.1. Hence, the probability for a cluster of size s is

$P_n(s) = C_s^{(k)}\, p^{s-1} (1-p)^{(k-1)s+1}$ . (3.15)

Now, applying Stirling's approximation ($n! \approx \sqrt{2\pi n}\,(n/e)^n$) to the Fuss-Catalan numbers, we find

$C_s^{(k)} = \frac{1}{(k-1)s+1} \sqrt{\frac{k}{2\pi(k-1)}} \frac{1}{\sqrt{s}} \left(\frac{k^k}{(k-1)^{k-1}}\right)^s \sim s^{-3/2} \left(\frac{k^k}{(k-1)^{k-1}}\right)^s$ . (3.16)

Hence, we find that the probability distribution for a (k − 1)-ary tree of size $s_i$ is given by

$P(s_i) \sim s_i^{-3/2} \exp[-s_i/s_\xi]$ ,

where $s_\xi = -\left(\log\left[\frac{p}{p_c}\left(\frac{1-p}{1-p_c}\right)^{k-2}\right]\right)^{-1}$. Now the probability for a cluster of size s is given by

$P_n(s) = \sum_{\{s_i\},\,\sum s_i = s}\ \prod_{i=1}^{k} P(s_i) \sim \exp[-s/s_\xi] \sum_{\{s_i\},\,\sum s_i = s}\ \prod_{i=1}^{k} s_i^{-3/2} \sim s^{-3/2} \exp[-s/s_\xi]$ .

So we find that $P_c(s) \sim s^{-5/2} \exp[-s/s_\xi]$ and that, as $p \to p_c^-$, $s_\xi \sim (p_c - p)^{-2}$. So just as in

the 1-dimensional case, we have an exponential cut-off for all $p < p_c$. We can also identify a suitable correlation length on the Bethe lattice. Although the lattice has no specific spatial embedding, we can again investigate the number of nodes at distance d that are part of the same cluster. Starting at a random active node, we initially have k branches to choose from, but each subsequent daughter has only k − 1 branches to choose from. Hence $g(d) = k(k-1)^{d-1} p^d$, which reproduces the 1-dimensional result when k = 2. Recasting g as an exponential decay, we have $g(d) = \frac{k}{k-1} \exp[-d/l_\xi]$, which invites the definition of a correlation length $l_\xi = \frac{-1}{\log(p/p_c)}$. This quantity also diverges at the critical point, with $l_\xi \sim (p_c - p)^{-1}$ for $p \to p_c^-$.

Lastly, we can identify the probability $P_\infty$ that a given vertex belongs to a percolating cluster of infinite size. We will proceed in a manner similar to the branch-based approach we used when computing χ(p). Let $P_\infty$ denote the probability that a randomly-selected (not necessarily active) vertex is part of a percolating cluster. We will denote by $Q_\infty$ the probability that a particular edge leads to a finite cluster. Then $P_\infty$ is given by

$P_\infty = p\,(1 - Q_\infty^k)$ . (3.17)

We can calculate $Q_\infty$ by imposing the self-consistency relation

$Q_\infty = 1 - p + p\,Q_\infty^{k-1}$ , (3.18)

which can be rearranged as

$0 = (1 - Q)\left(1 - p - p\sum_{m=1}^{k-2} Q^m\right)$ .

This has the trivial solution Q = 1. The term $p\sum_{m=1}^{k-2} Q^m$ ranges monotonically in value from 0 when Q = 0 to (k − 2)p when Q = 1. Hence there is a unique second solution when $p \geq \frac{1}{k-1} = p_c$, i.e. in the percolating regime. By taking the Taylor series $Q^{k-1} \approx 1 - (k-1)(1-Q) + (k-1)(k-2)(1-Q)^2/2$ and solving Equation-3.18 as a quadratic equation, we find the solutions Q = 1 and $Q = 1 - \frac{2(p - p_c)(k-1)}{k-2}$ when $p > p_c$. Inserting the second solution into Equation-3.17, we find

$P_\infty \approx \frac{2pk(k-1)(p - p_c)}{k-2} \propto (p - p_c)^1$ .

So the size of the percolating cluster grows with another power-law.
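Equations 3.17 and 3.18 are easy to check numerically. The sketch below (names mine) solves the self-consistency relation by fixed-point iteration, which converges to Q = 1 below $p_c$ and to the second root above it, and compares $P_\infty$ with the linearized power-law.

```python
def p_inf_bethe(p, k, iters=200_000):
    """P_infinity on the degree-k Bethe lattice: iterate Eq. 3.18,
    Q -> 1 - p + p*Q**(k-1), from Q = 0, then apply Eq. 3.17."""
    Q = 0.0
    for _ in range(iters):
        Q = 1.0 - p + p * Q ** (k - 1)
    return p * (1.0 - Q ** k)

k, pc = 3, 0.5                # p_c = 1/(k - 1)
for p in (0.51, 0.52, 0.54):
    linearized = 2 * p * k * (k - 1) * (p - pc) / (k - 2)
    print(p, p_inf_bethe(p, k), linearized)
```

As expected, the agreement with the linearized expression improves as p approaches $p_c$ from above.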

3.2.3 Percolation on other graphs

Percolation on two-dimensional and three-dimensional lattices has also been extensively studied. As we will not be studying dynamical processes on lattices, we will only summarize a few salient results, and ignore many of the results related to the fractal geometry of clusters. The critical percolation probability depends on the structure of the lattice in question (e.g. whether the lattice is a square lattice or a triangular lattice, as in Figure-3.7c) and on whether it is site or bond percolation being considered. However, the critical exponents depend only on the dimensionality of the system. The one-dimensional exponents are trivial. By way of conformal field theory, the percolation exponents in two dimensions have been exactly solved. The third- through fifth-dimensional exponents have only been approximated numerically. Above dimension six, all critical exponents take the same "mean field" values that we arrived at using the Bethe lattice [Christensen and Moloney, 2005]. Percolation can also occur on random graphs. Percolation on k-regular or Erdős-Rényi graphs with mean degree 3 or greater has the mean-field critical exponents. Percolation exponents have also been solved for a number of other random graph structures, including power-law [Cohen et al., 2002], small-world [Moore and Newman, 2000], and bipartite graphs [Newman, 2002], by way of generating functions. See Table-3.1 for a summary.

Exponent | Diverging quantity                        | d = 1 | d = 2  | SW  | PL                              | d ≥ 6
τ        | $P_n(S) \sim S^{-\tau}\exp[-S/s_\xi]$     | -     | 187/91 | 3/2 | $2 + \frac{1}{\lambda-2}$       | 3/2
σ        | $s_\xi \sim (p_c - p)^{-1/\sigma}$        | 1     | 36/91  | 1/2 | $\frac{|\lambda-3|}{\lambda-2}$ | 1/2
γ        | $\chi(p) \sim (p_c - p)^{-\gamma}$        | 1     | 43/18  | 1   | ±1                              | 1
β        | $P_\infty \sim (p - p_c)^{\beta}$         | -     | 5/36   | 1   | $\frac{1}{|3-\lambda|}$         | 1
ν        | $\xi \sim (p - p_c)^{-\nu}$               | 1     | 4/3    | -   | -                               | 1/2

Table 3.1: A summary of percolation exponents in different network configurations and dimensions. PL here denotes "power-law" and refers to a random graph with degree distribution $p(k) \sim k^{-\lambda}$. SW here denotes "small-world". Here $s_\xi$ denotes the characteristic size beyond which an exponential cut-off appears to truncate the power-law of $P_n(S)$. Results for d = 1 and d = 2 are as given in [Christensen and Moloney, 2005]. Small-world values are as given in [Moore and Newman, 2000]. The PL network values hold for λ ∈ (2, 4). Those with λ < 3 have a percolation transition at p = 1, while the transition is at p < 1 for λ > 3; hence many quantities have singularities at λ = 3. Additionally, the cluster distribution has logarithmic corrections to $P_n(S)$ for λ = 3. γ takes the value +1 for λ ∈ (3, 4) and −1 for λ ∈ (2, 3) [Cohen et al., 2002]. Mean field values, d ≥ 6, are as given in [Christensen and Moloney, 2005].


Figure 3.9: An example of (1+1)-dimensional directed bond percolation on a tilted square lattice. Surviving bonds after dilution are marked in black. A cluster of size 8 is marked in blue, beginning at the site marked in red and proceeding down the lattice following the directed links.

3.3 Directed percolation

Here we will introduce the directed percolation model and several quantities of interest, as well as their associated critical exponents. Directed percolation is percolation on directed networks. A cluster is the out component beginning at a single initiation point after dilution. Directed percolation on lattices is typically labelled (n+1)-dimensional directed percolation, where the +1 refers to the dimension along which the bonds are directed, as in Figure-3.9. Unlike undirected percolation, directed percolation has not been solved analytically except in the mean-field case, above the critical dimension of 4. Fortunately, this is the case that we are most interested in. The branching process on rooted k-ary trees is a form of percolation. The direction of percolation is away from the root, and the lack of loops puts this process into the universality class of mean-field (d ≥ 4) directed percolation [Livi and Politi, 2017]. Hence, we can use the branching process on k-ary trees to derive the avalanche exponent τ. The probability of a cascade of size S on a k-ary tree with branching probability p is, as given in Equation-3.15, $P_c(S) = C_S^{(k)}\, p^{S-1} (1-p)^{(k-1)S+1}$. As we showed earlier, this is asymptotically $P_c(S) \sim S^{-3/2} \exp[-S/s_\xi]$, with $s_\xi = -\left(\log\left[\frac{p}{p_c}\left(\frac{1-p}{1-p_c}\right)^{k-1}\right]\right)^{-1}$ for $p_c = 1/k$, and where $s_\xi \sim (p_c - p)^{-2}$.

Exponent | Diverging quantity                                  | Unc. PL                                     | Cor. PL                                 | d ≥ 4
β        | $P_\infty \propto (p - p_c)^{\beta}$                | $\frac{1}{\lambda_{out}-2}$ or 1            | $\frac{1}{|3-\lambda^*|}$ or 1          | 1
τ        | $P_c(s) \sim s^{-\tau}\exp[-s/s_\xi]$               | $1 + \frac{1}{\lambda_{out}-1}$ or 3/2      | $1 + \frac{1}{\lambda^*-2}$ or 3/2      | 3/2
ν⊥       | $\xi_\perp \sim (p - p_c)^{-\nu_\perp}$             | -                                           | -                                       | 1/2
ν∥       | $\xi_\parallel \sim (p - p_c)^{-\nu_\parallel}$     | -                                           | -                                       | 1

Table 3.2: This table summarizes the directed percolation critical exponents for mean-field networks (d ≥ 4) and for both uncorrelated and correlated power-law networks. "Unc. PL" refers to directed percolation on directed power-law graphs with $P^{in}(j) \sim j^{-\lambda_{in}}$ and $P^{out}(k) \sim k^{-\lambda_{out}}$ uncorrelated at each vertex. "Cor. PL" refers to the same, but with the existence of a fraction $A_B$ of nodes that are fully correlated, with the in-degree $j = k^{(\lambda_{out}-1)/(\lambda_{in}-1)}$ related to the out-degree. $\lambda^* = \lambda_{out} + \frac{\lambda_{in}-\lambda_{out}}{\lambda_{in}-1}$ in the GSCC of the power-law network. For the uncorrelated PL networks, the first value of the exponent holds for $\lambda_{out} \in (2, 3)$ and the second for $\lambda_{out} \geq 3$. For correlated PL networks, the first value holds for $\lambda^* \in (2, 4)$ (excluding 3 in the case of β) and the second when $\lambda^* \geq 4$. Power-law values are from [Schwartz et al., 2002]. Mean field values (d ≥ 4) are as given in [Hinrichsen, 2000].

One distinct aspect of directed percolation is the presence of two distinct correlation lengths. These are denoted $\xi_\perp$ and $\xi_\parallel$ for the directions perpendicular and parallel to the time direction, respectively. Both quantities diverge near the critical point as $\xi_\perp \sim (p - p_c)^{-\nu_\perp}$ and $\xi_\parallel \sim (p - p_c)^{-\nu_\parallel}$, where $\nu_\perp$ and $\nu_\parallel$ are the corresponding critical exponents. An exact definition of $\xi_\perp$ and $\xi_\parallel$ is given for the case of (1+1)-dimensional percolation in the next subsection. By way of generating functions, it is also possible to analyse directed percolation on random graphs. While k-regular graphs and directed Erdős-Rényi graphs fall into the d ≥ 4 mean-field universality class [Bollobás, 2001, Newman et al., 2001], scale-free networks have also been solved by use of generating functions, and have critical exponents that vary depending on the degree distribution [Schwartz et al., 2002]. One key assumption for these random graphs is that the clustering is zero, so that the effect of loops is zero. These critical exponents are summarized in Table-3.2.

50 Directed percolation on the 1+1 square lattice

1 2 3 4 t = 1 Underlying network:

t = 2 1 2 3 4

t = 3 Linear 1D network, N=4.

t = 4

Figure 3.10: Here, a contact process beginning a node 1 spreads to node 2, which in turn spreads the process to nodes 1 and 3, after which the process terminates.

3.3.1 Spreading processes

Here we will discuss how the spreading process on an undirected n-dimensional lattice can be viewed as a (n+1)-dimensional directed percolation problem. A spreading process on a complex network sees “active” sites spread their activity to neighbouring “inactive” sites. There are many classes of spreading process, many of which occur in discrete time or in continuous-time. An example of the continuous time process is the Susceptible-Infected- Susceptible (SIS) model first introduced in §2.3.3. The SIS model is parametrized by an infection rate λ, which captures how quickly nodes that are adjacent to an infected node are in turn infected, and a recovery rate µ, which captures how quickly an infected node heals. Typically, time is rescaled so that µ = 1, and the model can be expressed as a one-parameter model [Parshani et al., 2010]. A related model can be constructed for discrete time. Each active node expires after one time step, and in that time step has probability p of activating a neighbouring node. This is the branching process introduced in §2.3.2. A realization of this process is illustrated on a 1-dimensional lattice in Figure-3.10, where the connection to site 1 + 1-dimensional percolation is made apparent. This relationship can be used to explain why directed percolation is typically thought of as a non-equilibrium process. Suppose N(t) denotes the number of infected nodes in the discrete time process at time t. Even if hN(t)i = ρ > 0 is constant, there is always a non-zero

51 probability ((1 − p)2N in the case of a 1-dimensional spreading process) of a transition into the state, N(t + 1) = 0. Since there is no transition probability out of this “absorbing” state this breaks what is called detailed balance, a necessary prerequisite to equilibrium [Livi and Politi, 2017]. Considering the (1+1)-dimensional bond percolation of Figure-3.10, we can define two correlation lengths. Following [Livi and Politi, 2017] we can introduce the space correlation function * t + 1 X c|i−j| = lim (si(τ) − hsi)(sj(τ) − hsi) , (3.19) t→∞ t τ=0

1 Pt where si(t) is 1 when site i is active at time t and 0 otherwise, and hsi = limt→∞ t τ=0hsl(τ)i

where the site index l is arbitrary. Asymptotically, c|i−j| ∼ exp[−|i − j|/ξ⊥] defines ξ⊥ [Livi and Politi, 2017]. Similarly the time autocorrelation can be defined as

* L + 1 X c(t) = lim (si(0) − hsi)(si(t) − hsi) , (3.20) L→∞ L i=1

which has the asymptotic behaviour c(t) ∼ exp[−t/ξk] [Livi and Politi, 2017].

3.4 Summary

In this chapter, we have introduced the terminology and mathematical framework for net- work theory, as well as several methods by which complex networks may be generated. Additionally, we have discussed percolation and directed percolation, two processes that can play out on complex networks. Although percolation and directed percolation fall into two distinct universality classes, they share many similarities such as the emergence of percolat- ing clusters and power-law scaling of cluster sizes and of the correlation length near their critical point. One particular distinction is that in mean-field, they have different critical

−5/2 exponents governing the distributions of cluster sizes, with Pc(s) ∼ s in percolation, and

−3/2 Pc(s) ∼ s in directed percolation. Critical exponents are most often determined by the

52 dimensionality of the system in question. Typically, (directed) percolation on random graphs obeys the mean-field exponents. However, certain random graphs, such as power-law graphs, instead have critical exponents that vary depending on their network properties. In addition to summarizing a number of the quantities that may be studied in percolating systems, we also developed a few analytical results for the Bethe lattice. The approaches used in the Bethe lattice we will employ in the next chapter to develop a theory of branching processes with noise.

53 Chapter 4

The Branching Process with Noise

This chapter will be concerned with a model that generalizes the branching process by way of the inclusion of spontaneous activations. In the traditional branching process, an active

node activates each daughter with probability p1. Hence, if a daughter had m active parents

m m at time t, it has a probability (1 − p1) = p1 of being inactive at time t + 1, where we have

introduced the complementary probability p1 = 1 − p1. If nodes also have an independent

probability p0 of being activated spontaneously in each time step, then the probability of

m remaining inactive at time t + 1 becomes p0(p1) . Hence, the probability of activation for a node with m parents is:

m P (daughter active at time t + 1 | m parents active at time t) = 1−(1−p0)(1−p1) , (4.1)

or more concisely,

m P (active |m parents) = p0 p1 . (4.2)

An example of this process playing out on a very simple bi-directional linear network can be seen in Figure-4.1. The first part of this chapter will be concerned with developing analytical results for this model on the k-regular graphs discussed in the previous section. To provide a qualitative feel for how clusters in the system behave, Figure-4.2 shows how the model

54 Network topology:

0 1 2 3 4

Example dynamics: t = 0

t = 1

t = 2

t = 3

t = 4

t = 5

t = 6

Activated Spontaneous

Figure 4.1: An example of a branching process with multiple spontaneous activa- tions/infections on a simple linear bidirectional network (shown at the top). The dynamics consists of two independent cascades, one with two roots (node 1 at time t = 1 and node 4 at time t = 2), and one with a single root (node 0 at time t = 3).

responds to increasing p1 in terms of the distribution of avalanches.

4.1 Results for the branching process with noise on

infinite k-regular graphs

In this section we will derive steady-state mean-field results for the branching process with noise running on infinite k-regular graphs. The assumption of steady-state means that we will consider the system after it has been running infinitely long, so that any transients associated with the initial conditions have been erased. The assumption of mean-field means that we will make a locally tree-like approximation, meaning that for a randomly-selected node, the activity of its parents are uncorrelated. The mean-field approximation can be justified, because random graphs should be well above the upper-critical dimension of directed (and

55 Transistion to percolation a. Avalanche sizes b. Giant Size 100 Supercritical 0.4 Giant size 10−2 Critical Subcritical 0.3 10−4 G

P(s) 0.2 Giants 10−6 0.1 10−8 0 100 102 104 106 108 0.08 0.09 0.1 0.11 0.12

Size, s p1

Figure 4.2: a. The distribution of avalanche sizes on a 10-regular graph, with N = 104 nodes simulated for T = 103 (empty circles) or 104 timesteps (filled circles), averaged over −5 five network configurations, for various p1 and p0 = 10 . Solid lines are exponentially −3/2 truncated power-law fits, p(s) ∼ s exp[−s/sξ]. The p1 values for each curve are marked in panel b. b. The average number of nodes active in the largest cluster each time step. undirected) percolation.

4.1.1 Active fraction

The active fraction is the probability that a randomly-selected node fires in a given time step. As we have assumed the steady-state approximation, we can assume its parents have fired with that same probability, and thereby derive a self-consistency relationship of

Φ := P (activation)

k X = P (activation | m parents)P (m parents) . m=0

56 Since each parent is independent in the mean field, their activation is given by a binomial

k  m k−m distribution P (m parents) = m Φ Φ , and by substitution we get

k   X k k−m Φ = ΦmΦ [1 − p p m] m 0 1 m=0

k = 1 − p0 (p1Φ) (4.3) which defines an implicit relationship

k Φ = p0 (p1Φ) (4.4)

for Φ(p0, p1) that can easily be solved numerically (or exactly in the case that k ≤ 4), as is done for 10-regular graphs in Figure-4.3. Note that the pure branching process, in the

1 absence of noise, has a critical point at p1 = k on a k-regular graph. As the noise is decreased 1 in Figure-4.3, a singularity develops at p1 = k = 0.1. One quantity that can be defined by the active fraction is the dynamical susceptibility, χ0, which denotes the response of the system to an external drive. In an infinitely large system, an external stimulus can be modelled as an infinitesimal increase in the noise of the system, inviting the definition

∂Φ χ0 ≡ ∂p0 k k−1 = Φp1 + p1kp0Φp1 χ0 , and making use of Equation-4.4,

Φ Φ χ0 = + p1k χ0 p0 Φp1

Φp1 − p1kΦ Φ =⇒ χ0 = Φp1 p0

Φ Φp1 =⇒ χ0 =  , (4.5) p0 Φp1 − p1kΦ

57 Active fraction varying spontaneous activity 0.5 −2 p0 = 10 −3 p0 = 10 −4 p0 = 10 −5 0.4 p0 = 10

0.3

0.2 Active fraction

0.1

0 0.0750 .080 .0850 .090 .0950 .10 .1050 .110 .1150 .120 .125

p1

Figure 4.3: The active fraction Φ(p0, p1) for 10-regular graphs, for various p0 as a function of p1.

58 Susceptibility varying spontaneous activity 103 −2 p0 = 10 −3 p0 = 10 −4 p0 = 10 −12 p0 = 10 Widom line 102 Susceptibility 101

100 0.0750 .0850 .0950 .1050 .1150 .125

p1

Figure 4.4: The dynamical susceptibility χ0 as a function of p0 and p1. Maxima of the dynamical susceptibility are marked with blue squares. The susceptibility along the Widom line is plotted in black.

which can again be solved as a function of p0 and p1, as is done on the 10-regular graph in

Figure-4.4. We see that for each p0 there is a corresponding p1 that maximizes the dynamic susceptibility. These pairs of points define a Widom line [Williams-Garc´ıaet al., 2014]. In

1 the low-noise limit, the susceptibility diverges at k , just as in the pure branching process.

4.1.2 Mean cluster size

One of the easiest observables to measure, in both percolating systems and neural systems, is the cluster size distribution. At criticality, the distribution of cluster sizes follows a power-

−τ law Pc(s) ∼ s . As we approach the critical point in a percolating system, the moments of the cluster size distribution may also diverge as a power-law. As we saw in §3.2.1, the

mean cluster size χ (also often identified as the susceptibility) diverges as χn = hSin ∼

−γ |pc − p| . To distinguish between sampling active nodes and clusters (the necessity of

59 which was illustrated in Figure-3.8), we use the subscript n on h · in and χn to denote an average obtained by sampling with respect to active nodes. This divergence in the mean cluster size occurs when the infinite percolating cluster appears at p = pc. In this section, we will analytically calculate the mean cluster size, and find that for each p0 there is a corresponding p1c(p0) at which the mean cluster size diverges. To determine the average cluster size, we begin by exploiting the fact that in the mean field the activity along each parent and daughter branch should be independent. Hence, for a random node (not necessarily a root), the expected size of the cluster to which it belongs is

χn = hSin = 1 + kBd + kBp , (4.6)

where Bd is the expected contribution to the mean cluster size of a daughter branch, and Bp is the expected contribution of a parent branch (given that the parent branch has an active daughter node) and where the subscript ‘n’ denotes that this average is taken with respect to randomly-selected active nodes. A similar average may be constructed over the roots of the cluster, by only considering the daughters

χr = hSir = 1 + kBd .

The expected contribution of each branch can be computed by summing the expected contributions from the second-generation branches of the node, multiplied by the probability of that branch connecting to an active node. For instance, starting from the root node, we have k daughter branches, so χr = 1 + kBd. Each daughter branch contributes size 0 with probability 1 − pd, and contributes +1 (for the daughter) + kBd (for the k daughters of that

daughter) + (k − 1)Bp1 (for the k − 1 other parents of that daughter) with probability pd.

Hence, Bd = (1 − pd) × 0 + pd × (1 + kBd + (k − 1)Bp1). Naively, we might expect that the parents of a daughter branch would contribute Bp. However, Bp is the contribution of a

60 D

pp

A C

pd pp1

B

Figure 4.5: A CWEB of size four is shown. Physical connections between nodes are shown in grey. Node B has nodes A and C as parents, while node C has D as a parent. Directed edges in black correspond to how the cluster is built, beginning from from A. Associated with each node added to the cluster is a probability of inclusion that depends only on information available along that path. Here, node A triggers B concurrently with C, while D triggered C. Evaluated from A however, the probability that B triggered (without knowledge of C’s firing) is pd, while the probability that C fired, conditioned on both A and B having fired, is pp1. Lastly, the probability that D contributes to C, conditioned only on the fact that C was activated, is pp. In this figure non-firing sites (e.g. the parents of A) are hidden to reduce clutter.

61 parent branch from a randomly-selected active node – we have more information about the parents of a daughter branch, because we arrived at the daughter branch by means of an

active parent. Consider a system with p0  1, and hence Φ  1. Any randomly-selected active site was quite likely to have been activated by another site. That site was, by the same logic, also likely to have been activated by a parent, and so we see that Bp might be comparatively large. By comparison, a daughter branch asks us to consider a node that we arrived at by means of another active node – it’s quite likely that node activated the daughter and since Φ is low, it’s unlikely that daughter node has any other active parents.

Hence Bp1 is probably quite small.

The self-consistency relationships for Bp and Bp1 are developed in the same manner as

Bd, and together with the self-consistency relationship for Bd, produce the following set of coupled linear equations:

Bd = pd (1 + kBd + (k − 1)Bp1) ,

Bp = pp (1 + (k − 1)Bd + kBp) ,

Bp1 = pp1 (1 + (k − 1)Bd + kBp) ,

where pd denotes the probability that a daughter fires, given that at least one of its parents are known to fire, pp denotes the probability that a parent fires given that its daughter has

fired, and pp1 denotes the probability that a parent of a firing node with at least one other known active parent fires. These probabilities are illustrated in Figure-4.5. This system of equations can be easily expressed in the following matrix form:

        pd 1 − kpd 0 −(k − 1)pp1 Bd Bd                  p  =  −(k − 1)p 1 − kp 0   B  = M  B  .  p   p p   p   p          pp1 −(k − 1)pp1 −kpp1 1 Bp1 Bp1

Hence, should the probabilities pd, pp, and pp1 be computable, the expected branch contri-

62 butions can be easily found. Therefore, we can write

  pd     −1   χn = 1 + k k 0 M  p  , (4.7)  p    pp1 and similarly, the average size from a root (a node with no active parents) is:

  pd     −1   χr = 1 + k 0 0 M  p  .  p    pp1

So, turning first to pd we have

pd := P (daughter fires | at least one parent is known to fire) , (4.8)

Pk and clearly P (daughter | m ≥ 1 parents) = m=1 P (daughter | m parents)P (m parents) where P (m parents) = P (m-1 parents other than the known parent fire)

k X  k − 1  p = Φm−1(1 − Φ)(k−1)−(m−1) [1 − (1 − p )(1 − p )m] d m − 1 0 1 m=1

k−1 = 1 − (1 − p1)(1 − p0)(1 − p1Φ) , and using Equation-4.3 to simplify:

p1 Φ pd = 1 − . (4.9) p1Φ

As pd is the probability of a randomly-selected daughter firing, we can also use it to find the

63 branching ratio

σ := hMean number of daughtersi ,

and since each daughter is independent, the number m of activated daughters follows a binomial distribution

k X  k  σ = m pmp k−m , m d d m

for which the mean is just

σ = kpd . (4.10)

From Equation-4.10, we see that the unity branching ratio line defined by σ = 1 is (in the

1 limit of low noise, where pd → p1) simply given by p1 = k , precisely as is predicted in classical branching theory.

Turning next to pp, we have

pp := P (parent fires | daughter fires) , and by Bayes’ theorem it follows that

= P (daughter fires | parent fires)P (parent fires)/P (daughter fires) .

Obviously the unconditioned probability that the daughter fires in a given time step is exactly the active fraction, Φ, and the probability that the parent fires is just Φ. Hence

Φ = P (daughter fires | parent fires) Φ

= pd .

64 Lastly, pp1 is the probability that a particular parent (say labelled ‘j’) fired before its daughter did, given that a different parent (say ‘i’) of that same daughter also fired, and hence may

have been the cause of the firing daughter. Writing pp1 as a conditional probability

pp1 := P (parent j | parent i & daughter)

and with application of Bayes’ theorem

P (parent j & parent i & daughter) = P (parent i & daughter) a second application of Bayes’ theorem yields

P (daughter | parents i & j)P (parents i & j) = . P (daughter | parent i)P (parent i)

Clearly, the probability of parents i & j both firing, unconditioned on anything else is just

2 Φ , while P (daughter | parent i) is pd (by Equation-4.8) and P (parent i) is just Φ, so

Φ2 = P (daughter | parents i & j) , pdΦ

Pk and clearly P (daughter | parents i & j) = m=2 P (daughter | m parents)P (m-2 parents) where P (m-2 parents) is understood to refer to the probability that m − 2 parents other than i & j fire. Hence

k Φ X  k − 2  = Φm−2(1 − Φ)(k−2)−(m−2) [1 − (1 − p )(1 − p )m] p m − 2 0 1 d m=2

Φ  2 k−2 = 1 − (1 − p1) (1 − p0)(1 − p1Φ) , pd

65 and simplifying using Equation-4.3

  Φ 2 1 − Φ = 1 − (1 − p1) 2 . pd (1 − p1Φ)

Therefore, we can compute each of the branching probabilities strictly in terms of p1 and

Φ(p0, p1). If Equation-4.7 is solved symbolically, we find that

2 2 1 − kp1(Φ − 1) + Φ + p1Φ(p1 + p1Φ − 4) χn = 2 2 2 , (4.11) 1 + k p1(Φ − 1) + Φ + p1Φ(p1 + p1Φ − 4) − 2k(p1 + Φ + p1Φ(p1Φ − 3))

which diverges when the denominator vanishes. χr can also be represented symbolically,

where it has the same denominator, and hence diverges at the same point. For fixed p0,

there is a corresponding p1c(p0), such that the denominator

2 2 2 f(p0, p1) = 1 + k p1(Φ − 1) + Φ + p1Φ(p1 + p1Φ − 4) − 2k(p1 + Φ + p1Φ(p1Φ − 3))

vanishes, i.e. f(p0, p1c(p0)) = 0 ∀ p0 ∈ [0, 1]. For p0, p1 near to fixed p0c, p1c(p0c), f(p0, p1) u ∂f ∂f f(p0, p1c(p0)) − (p1c − p1) − (p0c − p0) and so ∂p1 ∂p0

1 χn ∝ ∂f ∂f , − (p1c − p1) − (p0c − p0) ∂p1 ∂p0

−1 −1 implying that χn diverges as (p1c − p1) when p0 = p0c and as (p0c − p0) when p1 = p1c.

4.1.3 Phase diagram

The pairs (p0, p1c(p0)) define a phase line, on which χn diverges. χn diverges when the

denominator of Equation-4.11 vanishes. This serves to divide the p0, p1 phase space into a sub-critical regime and a super-critical regime. We can compare this division to the division imposed by the unity branching ratio σ = 1 obtained by solving Equation-4.10. We can also compare it to the Widom line derived in Equation-4.5, which maximizes the dynamic

66 Phase diagram Phase diagram in low-noise limit

10−1 10−2 0.1 10−3 10−4 1/(2k − 1) 10−5 0.05 10−6 10−7 −8 Susceptibility 1/k Susceptibility 10 0 00 .0250 .050 .0750 .10 .125 10−4 10−3 10−2 10−1

p1 1/k − p1 hSin Diverges hSin Diverges Widom line Widom line Unity branching ratio σ = 1 Unity branching ratio σ = 1

(a) (b)

Figure 4.6: (a) The phase-diagram for a 10-regular graph, with the Widom line, unity branching ratio line, and the phase transition line, on which χn = hSin diverges. The limits of the diverging χn fall at the points expected for a directed and undirected percolation process on a Bethe lattice of coordination number k + 1 and 2k, respectively. (b) As in (a), 1 η but in the limit of low noise, and in log-log scale. All three lines follow p0 ∝ k − p1 with different η. From top to bottom, the η are 1, 2, and 3.

susceptibility. All three of these lines are plotted in the phase-diagram of Figure-4.6a. In past work on neural systems, it has been proposed that criticality occurs when σ = 1 [Chialvo, 2010, Haldeman and Beggs, 2005]. However, more recent in-vivo recordings suggest a driven sub-critical (σ < 1) branching process [Priesemann et al., 2014]. Although it has also been suggested that the brain operates on the Widom line, it’s evident that the Widom

line has a branching ratio σ > 1. Meanwhile, the critical line defined by the divergence of χn falls into the regime of σ < 1, which is consistent with observations of a sub-unity branching ratio in real neural networks [Fagerholm et al., 2015, Priesemann et al., 2014].

In the limit of p0 → 0, we recover the traditional branching process. In the traditional

1 branching process, as p1 → k , we see the susceptibility and mean cluster size diverge, as well

as the branching ratio σ → 1. We can see in Figure-4.6b that all three lines approach p0 → 0

1 as a power-law as p1 → k . The exponent of these power-laws differ, with χn approaching

p0 = 0 fastest.

67 Scaling relation of the critical line

Considering that the phase line occurs where the denominator of Equation-4.11 vanishes, we have

2 2 2 0 = 1 + k p1(Φ − 1) + Φ + p1Φ(p1 + p1Φ − 4) − 2k(p1 + Φ + p1Φ(p1Φ − 3) ,

and since the highest power of p1 that appears is of order 2, we can solve for p1 as

k + 2Φ − 3kΦ ± (k − 1)2(1 − Φ)p(2k − 1)Φ =⇒ p = 1 k2 + Φ − 2k2Φ + (k − 1)2Φ2

but since the positive root doesn’t give p1 ∈ [0, 1], we have

k + 2Φ − 3kΦ − (k − 1)2(1 − Φ)p(2k − 1)Φ p = . (4.12) 1 k2 + Φ − 2k2Φ + (k − 1)2Φ2

Of principle interest will be the low-noise regime p0  1. In this regime, Φ  1. We can therefore consider a first-order expansion of Equation-4.4 in Φ, and obtain

k Φ = 1 − (1 − p0)(1 − p1Φ)

2 = 1 − (1 − p0)(1 − kp1Φ + O(Φ ))

p0 =⇒ Φ u . (4.13) 1 − (1 − p0)kp1

In the low-noise regime with both p0 and Φ  1, Equation-4.12 simplifies to

1 k − 1 p = − p(2k − 1)Φ . 1 k k2

Inserting Equation-4.13 and re-arranging (and keeping only first-order terms in p0), we find

1 3 (k − 1)2(2k − 1) =⇒ − p = p , (4.14) k 1 k5 0

68 1 3 which yields the power-law relationship ( k − p1) ∝ p0, which we see has excellent agree- ment with the exact analytical prediction in Figure-A.1, and explains the slope observed in Figure-4.6b.

Scaling relation of the σ = 1 line

Beginning from σ = 1, and using Equation-4.9 and Equation-4.10, we find

p Φ 1 = σ = 1 − 1 p1Φ 1 − kΦ =⇒ p = , 1 k + Φ − 2kΦ

and since Φ  k we have to first-order

1  2 p ≈ − Φ 1 − . 1 k k

p0 Now, using Equation-4.13 and the condition that p0  1 so that Φ ≈ , yields 1−kp1

1 p0 k − 2 p1 ≈ k 1 − kp1 k k2 1 2 =⇒ p = − p . 0 k − 2 k 1

This has excellent agreement with the exact form of the scaling, as can be seen in Figure-A.2.

1 2 This equation yields the scaling relationship p0 ∼ k − p1 explaining the slope seen in Figure-4.6b.

Scaling relation of the Widom line

Lastly, we can obtain a scaling relation for the Widom line close to p0 → 0. The Widom

line is the set of p0 and p1 which maximize the dynamic susceptibility χ0. Hence, for fixed

∂χ0 p0 we can find the corresponding maximizing p1 by solving = 0. Equation-4.5 expresses ∂p1

69 ∂Φ χ0(p0, p1, Φ), however since Φ is a function of p0 and p1, we need to find ≡ χ1. Taking a ∂p1 derivative of Equation-4.3 we have

∂Φ χ1 ≡ ∂p1 k−1 = −p0p1Φ k (−Φ − p1χ1) , and applying Equation-4.3 to simplify and rearranging we find

kΦΦ =⇒ χ1 = . (4.15) p1Φ − kp1Φ

Hence, we can find the maxima of χ0 by setting

∂χ ∂ Φ Φp 0 = 0 = 1 ∂p1 ∂p1 p0(Φp1 − p1kΦ)  2 2 2 ∂Φ 2 kp1Φ − p1Φ ∂p + kΦ = 1 , 2 p0(kp1Φ − p1Φ + 1) and by inserting Equation-4.15 for ∂Φ and simplifying, we find ∂p1

 2 2 kΦ p1Φ 1 − kp1Φ − 2Φ + p1Φ = , 3 p0 p1Φ − kp1Φ

which can only be zero if the numerator is zero, allowing us to find a condition between p1 and Φ that must be met for the Widom line

2 2 =⇒ 0 = 1 − kp1Φ − 2Φ + p1Φ . (4.16)

At this point, we would like to substitute an approximation for Φ in the variables p0 and p1 so that we can obtain a relation between p0 and p1. We could use a first-order approximation (Equation-4.13), however in the neighbourhood of the Widom line, the second derivative of

70 Φ is maximized and hence non-negligible. Accordingly, we will try a second-order approxi- mation for Φ. Beginning from Equation-4.3, and taking a second-order Taylor expansion in Φ we find

 k(k − 1)  Φ ≈ p 1 − kp Φ + (p Φ)2 + O(Φ3) . 0 1 2 1

Truncating the series at second-order, and solving the resulting quadratic equation, we find

p 2 2 2 kp0p1 − 1 ± 1 − k p0 p1 − 2kp0p1(1 + p0p1) =⇒ Φ ≈ 2 , (4.17) (k − 1)kp0p1

with only the positive root of Equation-4.17 corresponding to Φ ∈ [0, 1]. This second-order approximation has excellent agreement with the true active fraction in the neighbourhood of the Widom line (cf. Figure-A.3). Inserting Equation-4.17 into Equation-4.16 and taking

a first-order Taylor expansion around p0 = 0 and p1 = 1/k we find

  4k2 2k   1  0 ≈ 2p + −k + − p − p , 0 k − 1 k − 1 0 k 1

and since p0  1 we have, upon rearrangement

k 1  p ≈ − p . (4.18) 0 2 k 1

Equation-4.18 yields excellent agreement with the true Widom line in the limit of p0  0

1 1 (c.f. Figure-A.4). And we also have that p0 ∼ k − p1 , as in Figure-4.6b. It has been suggested that in the presence of noise, the dynamic susceptibility and fluc- tuation sizes for a system should be proportional [Williams-Garc´ıaet al., 2014]. Hence we would expect that dynamic susceptibility and Var(Φ(t)) should be maximized at the same

point. In the purely noise-driven limit, with p1 → 0 and Φ = p0, the number of active

sites Na(t) = NΦ(t) is given by the binomial distribution, with probability Φ on N sites.

71 Φ Then we find the variance of the active fraction to be Var(Φ(t)) = N (1 − Φ), which is 1 maximized for Φ = 2 . Reassuringly, this is exactly the limiting value of the Widom line

for p1 → 0. However, in the case that p1 6= 0, the active fraction at different points in time Φ(t) are no longer independently distributed via a Binomial distribution, and the line

1 1 Φ = 2 no longer agrees with the Widom line. In particular, Φ = 2 for p0 → 0 implies that  1  1  p1 → 2 1 − 1/k → 2 1 − 1/k , in disagreement with the Widom line, which has [2(1−p0)] 2

p1 → 1/k in the limit that p0 → 0.

4.1.4 Mergeless cluster distribution

Another quantity of interest is that of single-rooted cluster distribution. Single-rooted clus- ters are those that did not result from the merging of initially independent cascades. We will show that this distribution has a power-law exponent of −1.5, which is that of the di- rected percolation universality class, with an exponential cut-off. This exponential cut-off will impose a characteristic size of avalanche. The singly-rooted clusters are the incomplete k-ary trees introduced in §3.1.1. Thus,

(k) 1 ks there are Cs = (k−1)s+1 s configurations of clusters of size s on a k-regular graph. The probability of a mergeless avalanche of size s, up to a factor for normalization, is therefore

(k) s−1 t P (s) = Cs pd1 pd , (4.19)

where t = (k − 1)s + 1 denotes the perimeter of the k-ary tree with s vertices, pd1 denotes the probability that a daughter node fires with no other firing parents, and pd denotes the complement of pd (i.e. the probability that a daughter node does not fire). Clearly the k−1 probability of having no other firing parents is Φ and the probability of firing due to one

firing parent is (from Equation-4.2) just p0 p1. So,

k−1 pd1 = Φ p0 p1 .

72 Now, making use of Stirling’s approximation for the Fuss-Catalan numbers (Equation-3.16) and inserting it into Equation-4.19 we find

 p kk s −3/2 d1 −3/2 −s/sξ P (s) ∝ s k−1 = s e , (4.20) ((k − 1)/pd)

where −1 −1 s := = . (4.21) ξ  k−1  k−1 k pd    log pd1k pd k−1 log kpd1 1−1/k

In the limit of low noise, with p0  1 we can take Φ u 1 and pd u pd1 u p1, so that

−1 s . ξ u   k−1 p1 log kp1 1−1/k

1 If we expand the denominator around p1 = k , we find

!  p k−1 k2(2 − k) log kp 1 = (1/k − p )2 + O (1/k − p )3 . 1 1 − 1/k k − 1 1 1

So, k − 1 s (1/k − p )−2 , (4.22) ξ u k2(2 − k) 1 which predicts the scaling relation

−2 sξ ∼ (1/k − p1) . (4.23)

1 −2 3 Additionally, on the critical line we know that k − p1 ∼ p0 from Equation-4.14, so that on the critical line we have

−2/3 sξ ∼ p0 . (4.24)

Finally, as an aside, the constant prefactor in Equation-4.22 is off by more than an order of magnitude, as can be seen in Figure-A.5. It can be improved if we instead use the

73 approximations

pd u (1 − p1)(1 − Φ) ,

pd1 u p1(1 − (k − 1)Φ) , and

p0 p0 Φ u u , 1 − kp1p0 1 − kp1 whereupon we find 2(k − 1)(2k − 1) s (1/k − p )2 , (4.25) ξ u k3(6k − 1) 1 which gives much better agreement with the exact result in Equation-4.21, as can be seen in Figure-A.5.

4.1.5 Cluster size distribution

In the previous section, we derived a power-law distribution for the size of mergeless avalanches.

−τ The size distribution p(S) ∼ s exp[−s/sξ] with τ = 1.5 would seem to imply that the mergeless avalanches fall into the universality class of directed percolation (see Table-3.2). However, we also found that there is an exponential cut-off in the size of the mergeless avalanche distribution. Production of causal webs above this size, therefore, must be dom- inated by the merging of activity originating at multiple sites. In the noise-driven (perco-

1 lation) limit, where p0 → 2k−1 and p1 → 0 we find that the cluster size cut-off sξ → tends to 1 for mergeless avalanches. We expect that the cluster distribution for sizes above 1 is produced solely by activity originating at multiple sites and that cluster sizes should be distributed with an exponent of −2.5 as in pure-percolation (see Table-3.1). This suggests that above the characteristic size sξ we should expect an avalanche exponent of τ = 2.5. Although we won’t derive an explicit analytical form for the general P (s), we will conduct extensive simulations of the infinite lattice system, and show that it obeys a universal scaling form consistent with our predictions. For a complete description of the algorithm used to simulate the branching process with

74 Figure 4.7: The causal-web distribution Pc(s) of an infinite 10-regular graph, simulated for −3/2 −5/2 different p0 on the critical line. Power-laws, s and s are present to guide the eye. noise on infinite k-regular graphs, see §B.1. In brief, we activate a single node, and then sample a probability distribution to determine how many parents contributed to that node’s firing and another to find how many daughters of that node subsequently trigger. For each parent, we check how many of its own parents fired, as well as how many other daughters (siblings to the initial node) were triggered. Each daughter checks for grand-daughters and for other parents. This cascade can take some time to resolve; however, as long as p1 < p1c(p0) it is always finite. Despite simulating an infinite k-regular graph, the algorithm runs in O(1) space, and is many orders of magnitude faster than simulating a finite k-regular graph where each node must be kept in memory. As we are sampling a random active node and then checking the size of the resulting causal web, we are sampling Pn(s). Now, we are interested in sampling the causal web distribution Pc(s), which is related to Pn(s) by

Pn(s) Pc(s) ∝ s . Unless otherwise specified, we will be using the Pc(s) distribution. For each p0, p1c(p0) combination on the critical line, we find two power-laws are present in Figure-4.7.

75 Figure 4.8: The causal web distribution Pc(s) of an infinite 10-regular graph, simulated for −3/2 −5/2 different p0 on the critical line. Power-laws, s and s are present to guide the eye.

−3/2 −5/2 For small avalanches Pc(s) ∼ s , while for large avalanches Pc(s) ∼ s . The position

1 of this transition depends on the value of p0, or equivalently, k − p1. We know that the high-noise limit of the critical line belongs to the percolation uni- versality class, while the low-noise branching-process limit belongs to that of the directed percolation universality class. In the low-noise system, clusters have a single root and typ- ically do not merge. So we should anticipate that the characteristic size of the mergeless avalanches should explain the transition between directed- and undirected-percolation. In

−2/3 the previous section, we found that sξ ∼ p0 , so if we rescale the causal web distributions 2/3 of Figure-4.7 by p0 we should expect to find that the transitions line up. This is precisely what we observe in Figure-4.8. This kind of curve-collapse is characteristic of universality, and shows that the branching process with noise falls into the same universality class along the entire curve, except at its p0 = 0 limit of directed percolation, where sξ → ∞. We’ve predicted that the smallest avalanches should be governed by the directed percola-

76 Power-law transition is governed by merging a. Avalanches distribution b. Size and root distribution 0 9 10 10 108 10−3 6

−6 i 10 0 ) 10 ) 10 S S 10−7 100 107 ( ( −9 2/3 R P Sp 10 3 0 h 10 10−12 10−15 100 100 102 104 106 100 102 104 106 108 S S −3/2 −3 −7 1 Single root x p0 = 10 p0 = 10 x −5/2 −5 −9 Multiple roots x p0 = 10 p0 = 10

−5 Figure 4.9: a. Avalanche statistics for p0 = 10 simulated on an infinite 10-regular network at the theoretically determined critical point. Simulated avalanches with one root are shown with symbols, while the analytical prediction is shown with a line of the same colour. b. Average number of roots R for avalanches of a given size are shown for simulations of various p0 on an infinite 10-regular network. Inset shows curve-collapse across various p0, 2/3 with rescaled x-axis of sp0 . tion process of spreading, while the largest avalanches should be dominated by the merging of activity. We can decompose the set of avalanches into those with one root, and those that are multiply-rooted (Figure-4.9a), where we find that almost all small avalanches are singly rooted and are described by a power-law of τ = 1.5 with exponential cut-off (matching Equation-4.20), while all large avalanches are multiply rooted and described asymptotically by a power-law of τ = 2.5. We see that for large avalanches, the avalanche size is exactly proportional to the number of roots, as can be seen in Figure-4.9b. The appearance of the multiply rooted avalanches is governed by the level of noise, and the transition point can be

−2/3 seen to scale with p0 . We can also measure the summary statistics of the cluster size distribution, such as the average cluster size. As can be seen in Figure-4.10, the mean cluster size matches the simulation results almost exactly.

77 Figure 4.10: Symbols are simulation results on infinite 10-regular lattices for 2×107 clusters, while solid lines are the analytical predictions of Equation-4.11.

4.1.6 Avalanche duration and scaling relations

Thus far, we have mostly concerned ourselves with the size of neuronal avalanches. However, one of the other early indicators of criticality was the presence of power-law distributions for the duration of neuronal avalanches [Friedman et al., 2012]. It was found that avalanche durations, defined to be the interval of time between the first firing neuron in an avalanche and the last, were distributed with a power-law P (T ) ∼ T −α (typically, α = 2 in mean- field), consistent with predictions from directed percolation. Additionally, for an avalanche of duration T , there is a typical avalanche size given by hSi(T ) ∼ T γ [Friedman et al., 2012]. Similarly, hT i(S) ∼ S1/γ. However, to satisfy P (T ) ∼ T −α, P (S) ∼ S−τ and hSi(T ) ∼ T γ simultaneously there must be a relationship between the scaling exponents γ and τ and α. This can be constructed explicitly by computing the average avalanche size in two different ways, using P (S) and comparing it with the average size computed using hSi(T ) in conjunction with P (T ). Beginning with the obvious, P (S), we find the average

78 size to be Z ∞ Z ∞ hSi = SP (S)dS ∼ S1−τ dS . (4.26) 1 1

Meanwhile, using hSi(T ) we find

Z ∞ hSi = hSi(T )p(T )dT 1

and inserting p(T ) ∼ T −α and hSi(T ) ∼ T γ which also implies that dS = T −1+γdT so we have

Z ∞ 1 −α 1−γ = T σνz T T dS 1

and using T ∼ S1/γ

Z ∞ = S(1−α)/γdS . (4.27) 1

For Equation-4.26 and Equation-4.27 to agree, the integrands must be proportional for each S, which implies α − 1 = γ . (4.28) τ − 1

Scaling relations of this form are a hallmark of criticality. That this relationship holds in neural cultures is highly suggestive of a critical state [Friedman et al., 2012]. So, we should also check to see whether this scaling relationship holds once we introduce noise. In Figure-4.11, we simulate causal webs on the infinite 10-regular network, and consider the effect of spontaneous activity on the duration of causal webs. We see that, once again, there is a characteristic roll-over effect. For small avalanches, we see that hT i(S) ∼ S1/2, which is consistent with the branching process [Friedman et al., 2012], while for large avalanches

1/4 −2/3 hT i(S) ∼ S . The point at which this transition occurs scales with p0 , as can be seen −1/6 by rescaling (cf. Figure-4.11b). The rescaling of the duration by p0 can be understood

79 a. Size duration relation b. Rescaled durations 105 101

104 6

/ 0 3 1 10 0 10 − i p 4 T / h 102 1 S 10i −1 T 101 h

100 100 102 104 106 108 1010 10−2 10−4 100 104 108 S −2/3 −3 −7 p0 = 10 p0 = 10 Sp0 −4 −8 −3 −6 p0 = 10 p0 = 10 p0 = 10 p0 = 10 −5 1/4 −4 −7 p0 = 10 S p0 = 10 p0 = 10 −6 1/2 −5 −8 p0 = 10 S p0 = 10 p0 = 10

Figure 4.11: a. Mean avalanche durations for avalanches of various sizes simulated on the infinite 10-regular network, with varying levels of spontaneous activity. p1 is set to a slightly −11/2 sub-critical value, p1 = p1c − 10 , so that no infinite avalanches occur. b. As in a., the mean avalanche duration exhibits reasonable curve-collapse, with collapse quality increasing as p0 → 0.

by trying to align the curves on the y-axis at the transition point. Up until the transition,

−2/3 1/2 −1/3 at S ∼ p0 , T ∼ S so at the transition point, T ∼ p0 . As we are also rescaling T by

−2/3 1/4 −1/3+2/12 −1/6 (S/p0 ) , we wind up rescaling T by p0 = p0 .

1 1 1 So, we’ve identified that γ appears to be 2 for small avalanches, and 4 for large avalanches. From Equation-4.28, we expect that α = 1 + γ(τ − 1), so as τ is 3/2 for small avalanches and 5/2 for large avalanches, we would expect that α = 2 for short avalanches (consistent with the branching process) and α = 7 for long avalanches. This is exactly what we see in Figure-4.12, where there is once again a roll-over that occurs between two exponents. This is highly suggestive of a critical state. This agreement between τ, α, and γ could also be checked by looking at the mean avalanche profile, which should contain more information than just the agreement of the avalanche exponents, because temporal profiles have infinitely more degrees of freedom than a simple power-law [Friedman et al., 2012]. The introduction of a relationship between the duration and size of avalanches also invites

80 Avalanche duration distribution 100 p = 10−3 p0 = 10−4 10−2 0 −5 p0 = 10−6 p0 = 10−7 −4 p = 10 10 p0 = 10−8 0 T −2 10−6 T −7 10−8 P(T) 10−10 10−12 10−14 100 101 102 103 104 105 T Figure 4.12: Avalanche durations simulated on the infinite 10-regular network, with varying −11/2 levels of spontaneous activity. p1 is set to a slightly sub-critical value, p1 = p1c − 10 , so that no infinite avalanches occur.

−2/3 a very heuristic argument for why the exponent roll-over size scales as p0 . This roll-over occurs when merging from other spontaneously activated streams becomes relevant. If there is some characteristic scale Sc at which spontaneous activity becomes likely to appear, there

1/γ is also a corresponding duration, Tc ∼ Sc . One way of interpreting the spontaneous activation rate p0 is that it is the mean number of spontaneous activations per unit time per site. From a dimensional analysis perspective, since Sc has units of sites, while Ts has units of time, we might then expect that p0ScTc = const. Proceeding as such, we obtain

γ − 1+γ Sc ∼ p0 . (4.29)

Now, before spontaneous activity takes hold, we expect the mean-field exponent of γ = 2,

−2/3 which yields Sc ∼ p0 , which is the scaling we predicted analytically, based on the mergeless cluster size distribution and the scaling relations of the phase-diagram on the critical line. However, this heuristic argument also makes an additional (untested) prediction: if a network structure changes γ, then we should see an optimal avalanche curve-collapse (e.g. as in

81 Figure-4.8) for different levels of noise with a different power of p0. One starting place to check for this behaviour would be on sufficiently fat-tailed power-law graphs (cf. Table-3.2), where the directed percolation exponents for τ and possibly α might change.

4.1.7 Correlation length

In directed percolation, there are two correlations lengths that can be identified: the parallel

correlation length ξk and the perpendicular correlation length ξ⊥, where the ‘parallel’ and ‘perpendicular’ are with respect to the direction of time (see §3.3.1). The parallel compo- nent corresponds to the autocorrelation length of a site, while the perpendicular component corresponds to the correlation between neighbouring sites in a particular instance in time. Given that the cluster size diverges, we would expect a corresponding divergence in (at least one of) the correlation lengths. In pure directed percolation, both correlation lengths diverge at the critical point, while in undirected percolation there is only one diverging correlation length. Here we find analytically and numerically that along the phase curve in the infinite system size limit it is only the perpendicular component that diverges. Let us first introduce the pair-connectedness function, g(d, t). Beginning from a randomly- selected active node, g(d, t) denotes the mean number of other active nodes d network hops and t time steps later that are part of the same causal web. The d network hops are on the underlying graph of the network. So, for example, g(1, 1) denotes the average number of immediate direct descendants of a randomly-selected active node, while g(1, −1) denotes the average number of immediate parents for a randomly-selected active node. Meanwhile, g(2, 0) denotes the mean number of simultaneously-active siblings (i.e. those sharing an active parent), plus the mean number of simultaneously-active spouses (i.e. other nodes sharing an active daughter), for a randomly-selected active node. In general, g(d, d) is the expected number of living direct descendants of a particular node after d time-steps, g(0, t) is the probability that a particular node is reactivated by activity it is connected to after t time-steps, and g(2d, 0) is the probability that a node 2d

82 hops away from an active node is simultaneously active and part of the same cluster as the original randomly-selected node. We use 2d hops, instead of d hops, because if a node is concurrently active with another node, then those two nodes can only be connected by an equal number (say, d) of hops along daughter links and parent links, resulting in 2d total hops. These three specific projections of g(d, t) onto a single-dimensional function of d invite the definition of our correlation lengths:

g(2d, 0) ∼ exp(−d/ξ⊥) , (4.30)

g(0, t) ∼ exp(−|t|/ξk) , and (4.31)

g(d, d) ∼ exp(−d/ξd) . (4.32)

The parallel correlation length ξk

In the limit of a large system size there are no loops, and hence the probability that a randomly-selected node is part of the same cluster at two points in time is zero, and hence

  1 t = 0 g(0, t) = ,  0 t 6= 0

which is satisfied only for ξk = 0. For this reason, we say that the parallel correlation length is uniformly zero.

The perpendicular correlation length ξ⊥

To measure the correlation length ξ⊥ numerically requires extremely large graphs, as the diameter of a random graph typically grows as ≈ log(log(n)). So numerically speaking we will measure the correlation length using infinite graphs, assuming a loopless configuration. In this loopless correlation, we can actually calculate the correlation length analytically.

83 To determine the probability g(2d, 0) we will consider weighted combinations of d of each of the symbols P and D. These combinations will correspond to possible paths between points separated by 2d hops, with the placement of each ‘P’ corresponding to following a parental connection, while ‘D’ corresponds to following a daughter connection. A parent or daughter connection contributes probability pd, except when a parent follows a daughter connection (DP), in which case it contributes weight pp1. Additionally, each “change of direction” (either DP or PD) contributes k − 1 paths, instead of the more typical k, because we will exit the node by the same type of link as we entered. An arbitrary sequence of P and D can be written

l0 j0 l1 j1 lm jm P D (DP )1P D ... (DP )mP D ,

where m is the number of merges of independent activity, and where li and ji are integers ≥ 0. The “number of changes of direction” is equal to the number of merges m (each contributes a DP) and number of gaps between merges m − 1 (as each evinces a PD), and finally +1 if

d−1  l0 6= 0 and +1 if jm 6= 0. So for fixed m, the number of paths with l0 = 0 is m−1 and so the total number of paths with m merges is:

  d   d − 1  k − 12m−1 (k − 1) + k2(d−1) . m m − 1 k

So, since the maximum number of merges for fixed d is d, we have

d      2  2m−1  m X d d − 1 k − 1 d − 1 k − 1 pp1 g(2d, 0) = − + (p k)2d , m m − 1 k m − 1 d k p m=0 d which has the closed form expansion

2d (kpd) (−2(k(pd−pp1)+pp1)2F1(1−d,−d;1;β)+k(pd+kpd+pp1−kpp1)2F1(−d,−d;1;β)) = 2 2 , k pd−(k−1) pp1 (4.33)

84 Figure 4.13: The perpendicular correlation length function for simulations of infinite 10- −4 regular graphs near the critical line, with p0 = 10 . Thick solid lines are analytical predic- tions, while the lighter hue denote numerical averages from 2 × 106 simulations.

2 (k−1) pp1 where β = 2 and 2F1 denotes the Gauss hypergeometric function. This analyti- k pd cal result agrees well with simulation (cf. Figure-4.13). We will show that asymptoti- √ cally 2F1(1−(d+1),1−(d+1);1;β) = 2F1(1−(d+1),−(d+1);1;β) ≈ (1 + β)2, and hence that g(2(d+1),0) ≈ 2F1(1−d,1−d;1;β) 2F1(1−d,−d;1;β) g(2d,0) √ 2 kpd(1 + β) . Treating first 2F1(1 − d, −d; 1; β), we note that Watson [Watson, 1918] showed that for λ  1 and λ  −a and λ  b

 1 − α 2a+b−1Γ(1 − b + λ)Γ(c)(1 + e−υ)c−a−b−1/2e(λ−b)υ 2F1 a + λ, b − λ; c; ≈ √ (4.34) 2 λπΓ(c − b + λ)(1 − e−υ)c−1/2

√ ±υ 2 where α = cosh(υ) so that e = α ± α − 1. To transform 2F1(1 − d, −d; 1; β) into the form present in Equation-4.34 we will make use of the following identity, owing to Kummer’s 24 solutions to the hypergeometric differential equation (see page-67 [Luke, 1969], equations

85 1-3):  z  (1 − z)−λ F (λ, −λ; 1; z) = F 1 − λ, −λ; 1; . (4.35) 2 1 2 1 z − 1

β Now using z = β−1 and setting λ = d we have

 β  F (1 − d, −d; 1; β) = (1 − β)d F d, −d; 1; , 2 1 2 1 1 − β

1+β and now applying Equation-4.34 with a = 0, b = 0, c = 1, and α = 1−β

(1 − β)d2−1 r1 + e−υ ≈ √ eυd , dπ 1 − e−υ

and considering the ratios of two successive terms in the limit of large d, we have (expressing e±v in terms of β)

 s   2 2F1(1 − (d + 1), −(d + 1); 1; β) 1 + β 1 + β ≈ (1 − β)  + − 1 2F1(1 − d, −d; 1; β) 1 − β 1 − β   = 1 + β + 2pβ  2 = 1 + pβ .

Similarly, for 2F1(−d, −d; 1; β) we again have from Kummer’s solutions the identity

 β  F (−d, −d; 1; β) = (1 − β)d F −d, 1 + d; 1; , 2 1 2 1 β − 1

1−α and from the DLMF [DLMF, Equation-15.9.7] we have the relation 2F1(−d, 1 + d; 1; 2 ) =

Pn(α) where Pn denotes the n-th Legendre polynomial, so

d = (1 − β) Pd(α) .

86 l+1   (1+a) 2 √ √ 1 1 −2 Asymptotically, for large l we know that Pl 2 ∼ 2πla l , so taking a = 1 − α = 1−a (1−a) 2 √ (1− β)2 1+β we have

(1 − β)2 = √ (1 − β)2  2 = 1 + pβ .

So, we have g(2(d + 1), 0)   2 ≈ kp 1 + pβ , (4.36) g(2d, 0) d

which implies

g(2d, 0) ∼ e−d/ξ⊥ (4.37) for the correlation length

−1 −1 ξ⊥ = √ = . (4.38)    q 2  2 log kpd 1 + β (k−1) pp1 2 log kpd 1 + 2 k pd

So we attain a divergence in the correlation length with exponent −1 when

s 2 ! (k − 1) pp1 kpd 1 + 2 = 1 . (4.39) k pd

Rearranging this condition, and expanding in terms of p0, p1, and φ we find that the condition for a divergence in the correlation length is

0 = 1 + k2p12(−1 + φ)2 + φ + p1φ(p1 + p1φ − 4) − 2k(p1 + φ + p1φ(p1φ − 3)) . (4.40)

We can recognize that the right-hand side is the denominator of Equation-4.11, which indi- cates that the average cluster size diverges precisely when the correlation length also diverges.

87 The daughter correlation length ξd

We needn’t calculate g(d, d) explicitly to find its scaling form. Generically speaking, we have

kd X g(d, d) := mP (m, d) , m=0

where m is the number of direct descendent d timesteps after a randomly-selected active node, and P (m, d) = P (m descendents | an active parent d time steps ago ). We can then calculate g(d + 1, d + 1) from g(d, d) by considering that the number of daughters for each of the m nodes are independently drawn. Hence,

kd m k ! X X X g(d + 1, d + 1) = P (m, d) tP (t daughters of node i) . m=0 i=1 t=0

Taking advantage of independence of each of the m active nodes

kd m k ! X X X = P (m, d) 1 · tP (t daughters of an active node) m=0 i=1 t=0

and since whether each daughter of i’s activation is indepdent of the activation of her sister

nodes, we have hti = kpd where pd is the probability of each daughter activation, as in Equation-4.9, so

kd X = P (m, d)mkpd m=0

= kpdg(d, d) .

Now as kpd = σ (per Equation-4.10), and we can rearrange to find

g(d + 1, d + 1) = σ , g(d, d)

88 d which, given that g(0, 0) = 1, yields that g(d, d) = σ . So, g(d, d) = exp(−d/ξd) where

1 ξ = − , (4.41) d log (σ)

−1 which is finite and positive for σ < 1 as expected. As σ → 1 , ξd diverges in the expected way 1 1 lim ξd = − ∼ → +∞ , (4.42) σ→1− log σ 1 − σ

−1 with an exponent ξd ∼ (1 − σ) of −1.

4.1.8 Size of the giant component

Here we will analytically calculate the size of the giant component. We will let Qd denote the probability that a randomly-selected daughter branch does not connect to the infinite cluster. We will let Qp denote the probability that a randomly-selected parental branch does not connect to the infinite cluster. We will determine Qd and Qp by establishing self-

consistency relationships in a manner similar to our approach with Bd and Bp.

For a randomly-selected active node, the probability P∞ that this node belongs to the infinite cluster is

k ! k ! X m X m P∞ = 1 − Qd P (m daughters) Qp P (m parents | we activated) , (4.43) m=0 m=0

where P (m parents | we activated) denotes the probability that a randomly-selected node has m active parents in the preceding time step, conditioned on the fact that it is active in the current time step. Considering a randomly-selected active node that we reached by means of one of its parents, the probability that this daughter connection does not lead to

89 the giant component is:

k ! k−1 ! X m X m Qd = Qd P (m daughters) Qp P (m other parents | we activated via a parent) , m=0 m=0 (4.44) where P (m other parents | we activated via a parent) denotes the probability that a randomly- selected site has m parents selected from the k − 1 other parents in the preceding time-step, conditioned on the fact that the kth parent was active in the preceding time-step and that the node activated. Similarly, a randomly-selected active node that we reached by means of one of its daughters has the probability

k−1 ! k ! X m X m Qp = Qd P (m other daughters) Qp P (m parents | we activated) (4.45) m=0 m=0 that it does not connect to the giant component. Treating the first multiplicative term in Equation-4.44 we have

k k X X  k  QmP (m daughters) = Qmpmp k−m d m d d d m=0 m=0

k = (pd + pdQd) . (4.46)

Similarly the first term in Equation-4.45 yields

k−1 X m k−1 Qd P (m other daughters) = (pd + pdQd) . (4.47) m=0

Now the second multiplicative term in Equation-4.44 has the probability

P (m other parents | we activated via a parent) .

90 We can rewrite this probability by Bayes’ theorem as

P (m other parents | we activated via a parent) P (m other parents & we activated | had an active parent) = P (activated | an active parent) k−1−m k−1ΦmΦ [1 − p p m+1] = m 0 1 . pd

So, the second multiplicative term in Equation-4.44 can be written

k−1 X m Qp P (m other parents | we activated via a parent) m=0 k−1 k−1 m k−1−m m+1 X Φ Φ [1 − p0 p1 ] Qm m p p m=0 d 1 h k−1 k−1i = Φ + ΦQp − p0 p1 Φ + ΦQpp1 . pd

Similarly, the second multiplicative term in Equation-4.45 can be written

k k X X P (m parents & activated) QmP (m parents | we activated) = Qm p p P (activated) m=0 m=0 1  k k = ΦQ + Φ − p ΦQ p + Φ . (4.48) Φ p 0 p 1

Hence, Equation-4.44 becomes

k 1 h k−1 k−1i Qd = (pd + pdQd) Φ + ΦQp − p0 p1 Φ + ΦQpp1 , (4.49) pd

and Equation-4.45 becomes

1  k k Q = (p + p Q )k−1 Φ + ΦQ  − p Φ + p ΦQ  . (4.50) p d d d Φ p 0 1 p

Equations-4.49,4.50 can be solved numerically for Qd and Qp. It should be noted that Qd =

91 Qp = 1 and Qd = Qp = 0 are always solutions, although these solutions become unstable depending on p0 and p1. The solution Qd = Qp = 1 becomes unstable for supercritical combinations of p0 and p1. For smaller p0 and p1, this is the unique solution, and reflects the fact that there is no percolating cluster in the sub-critical regime. Using Equation-4.46 and Equation-4.48, we can also rewrite Equation-4.43 in the simpler form

1  k k P = 1 − (p + p Q )k ΦQ + Φ − p ΦQ p + Φ , ∞ Φ d d d p 0 p 1 and we can further simplify using Equation-4.50

= 1 − Qp (pd + pdQd) . (4.51)

We can also express the giant component G as the fraction of randomly-sampled nodes (active or otherwise) that are part of the giant component, as

G = ΦP∞ = Φ (1 − Qp (pd + pdQd)) (4.52)

We can identify the scaling exponent of P∞ near the critical point p1 ≈ p1c(p0) by Taylor- expanding Equation-4.51 as

 ∂Q ∂Q ∂p  P ≈ 0 + (p − p ) − p (1 − p (1 − Q )) − Q p d − Q d (1 − Q ) ∞ 1 1c ∂p d d p d ∂p p ∂p d 1 1 1 p1=p1c 2 + O (p1 − p1c)

however, at the critical point we know that Qp = Qd = 1 and P∞ = 0, and hence

 ∂Q ∂Q  ≈ (p − p ) − p − p d , 1 1c ∂p d ∂p 1 1 p1=p1c

∂Qd ∂Qp and since Qd,Qp ∈ [0, 1] and are decreasing from a value of 1 at p1 = p1c, both and ∂p1 ∂p1

92 are negative, so

1 =⇒ P∞ ∼ (p1 − p1c) . (4.53)

4.2 Numerical results for the branching process with

noise on finite k-regular graphs

Although we’ve developed a theory for infinite k-regular graphs, the question of applicabil- ity to finite networks remains. We may have a modest expectation that these results will generalize, because of the so-called “unreasonable” effectiveness of tree-like (loopless) ap- proximations [Melnik et al., 2011]. In this section, we will verify that our results on infinite k-regular graphs apply to finite k-regular graphs. We will check that the predicted active fraction is realized for finite k-regular graphs. We will confirm our predictions related to the size of mergeless avalanches, examine the curve-collapse of Figure-4.8 on finite graphs, and

1 conduct finite size scaling to extract the critical exponents β, γ and dν .

4.2.1 Avalanche distributions

In the limit of large graph size, the effect of clustering and loops tends to zero. As can be seen in Figure-4.14a, simulations conducted at the theoretical critical point for both infinite networks and large (but finite) networks yield very similar avalanche distributions. Because of the reduced computational requirements to simulate clusters on the infinite network, much more robust statistics can be gathered, with power-laws extending for several decades beyond what can be observed with the finite networks. Although there is likely a finite size effect that eventually truncates the avalanches on the finite size lattices, this effect isn’t evident in Figure-4.14b, where it might be expected to exponentially truncate the plateau. This is likely due to an inadequate number of statistics being gathered, as probing the largest avalanche sizes robustly enough to identify an exponential cut-off requires very long simulation times. As such, a finite size scaling analysis of the avalanche distribution is not possible.

[Plot for Figure 4.14, "Critical Size Distributions on 10-Regular Networks": a. avalanche size distribution $P(S)$; b. rescaled size distribution, for $p_0 = 10^{-3}$ through $10^{-9}$, with guide lines $x^{-3/2}$ and $x^{-5/2}$ and rescaled axis $S p_0^{2/3}$.]

Figure 4.14: a. Cluster size distributions obtained for finite ($N = 10^7$ for $T = 10^4$) and infinite 10-regular networks at the theoretical critical point given by the divergence of Equation-4.11. Finite simulations are given with transparent symbols, with the corresponding infinite graph result as a line of the same colour. b. As in a., but with a curve-collapse effected by rescaling according to the rate of spontaneous activation.

Figure 4.15: Here we compare the analytical results of Equation-4.19 to the size distribution of mergeless avalanches on finite graphs for a variety of noise levels at the theoretical critical point given by the divergence of Equation-4.11.

Mergeless avalanche distributions

One of the critical measures that predicted the avalanche collapse along the critical line was the distribution of the mergeless avalanches. We can test the predictions of Equation-4.21, the mergeless avalanche size distribution, on finite networks. It’s clear in Figure-4.15 that the analytical formula is an accurate predictor of the mergeless avalanche size distribution. This explains the considerable agreement in the power-law “hinge” or transition present in the avalanche distribution of Figure-4.14b.

4.2.2 The giant component in finite graphs

Finite size effects shift the location of the critical point. Although finite graphs do have an effective critical point, above which a single giant component emerges, this effective critical point is higher than in the infinite lattice limit of mean-field (see Figure-4.16). Below this effective critical point, the avalanche distribution has an exponential cut-off, making the maximum cluster size an intensive quantity that doesn't scale with the simulation duration. As such, longer simulations solely have the effect of making the transition sharper (Figure-A.6). Above this effective critical point, the size of the giant component is very nearly exactly realized (Figure-4.16a). This is because the estimate for $\Phi$ is typically quite accurate even for finite size graphs (see Figure-4.17a), and above the critical point, the giant component very quickly comes to represent the vast majority of active sites (Figure-4.17b). Only rarely do clusters isolated from the background giant component occur. In traditional percolation, the rate at which the effective critical point approaches the infinite size critical point can be used to extract the exponent $1/(d\nu)$ as $p_{1c,\text{effective}} - p_{1c,\text{infinite}} \sim N^{-\frac{1}{d\nu}}$, where $d$ denotes the physical dimension [Christensen and Moloney, 2005]. In our case, we have no physical embedding, so although we could take $d = 6$ (the upper critical dimension for undirected percolation) and thereby extract a kind of $\nu$, we have no a priori reason to know that we are operating in undirected percolation. Hence, we will content ourselves with extracting the combined exponent, $1/(d\nu) \approx 0.36$, in Figure-4.16b. For undirected percolation in mean-field, the expected value for $\frac{1}{d\nu}$ is 1/3, which is within the margin of error of our visual assessment of the quality of curve-collapse.
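The collapse itself is a one-line rescaling of both axes; a sketch (with inv_dnu the trial value of $1/(d\nu)$, and assuming the mean-field value $\beta = 1$ so that both axes carry the same power of $N$):

    import numpy as np

    def fss_collapse(p1, g, N, p1c, inv_dnu=0.36):
        # Rescale a giant-component curve g(p1) measured at system size N:
        # plotting y against x should superimpose curves for all N when
        # p1c and inv_dnu are chosen correctly (judged here by eye).
        x = (np.asarray(p1) - p1c) * N ** inv_dnu
        y = np.asarray(g) * N ** inv_dnu
        return x, y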

4.2.3 Mean cluster size

An independent way to confirm our assessment of $1/(d\nu)$ is to study the mean (finite) cluster size. We analytically calculated $\chi_n$ in the preceding section. Typically, however, we have access to the overall cluster distribution, not the cluster distribution conditioned upon sampling active nodes; the two differ by a factor of the cluster size (since $P(s)_n \propto s P(s)_c$). Hence, $\chi_n \propto \langle S^2\rangle_c$, where the $c$ subscript denotes an average sampled over the set of clusters instead of over active nodes. We consider the finite clusters to be all but the largest cluster, which we take to be the giant component. The mean cluster size also exhibits a strong finite size effect, and good curve-collapse upon suitable finite size scaling (Figure-4.18); we find the same finite size scaling exponent of $1/(d\nu) = 0.36$.
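Per run, this estimator reduces to one line of bookkeeping; a sketch (our own helper):

    import numpy as np

    def chi_finite(cluster_sizes):
        # <S^2> averaged over finite clusters: discard the single largest
        # cluster (identified with the giant component) and use the rest.
        s = np.sort(np.asarray(cluster_sizes, dtype=float))[:-1]
        return float(np.mean(s ** 2)) if s.size else 0.0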

[Plot for Figure 4.16, "Giant components of finite 10-regular networks": a. giant component $g$ near criticality, vs. $p_1 - p_{1c}$; b. finite size scaling, $gN^{0.36}$ vs. $(p_1 - p_{1c})N^{0.36}$, for $N = 10^{12/3}$ through $10^{20/3}$, with guide line $(p_1 - p_{1c})^1$.]

Figure 4.16: a. The fraction of the graph occupied by the largest cluster for various graph sizes, with $p_0 = 10^{-3}$ denoted by circles, and $p_0 = 10^{-4}$ denoted by triangles. Solid lines are theoretical predictions for the giant component size, as developed in §4.1.8. Simulations are for $T = 10^4$ time steps. b. As in a., but with a curve-collapse effected by finite size scaling.

[Plot for Figure 4.17, "Active fraction and giant component for 10-regular graphs": a. $\Phi$ and $G$ vs. $p_1$; b. $P_\infty$ vs. $p_1$, for $p_0 = 10^{-3}$, $10^{-4}$, $10^{-5}$.]

Figure 4.17: a. The active fraction $\Phi$ and giant components $G$ analytically (solid lines) and for simulations of varying sizes (symbols). Crosses are $N = 10^{12/3}$, circles are $N = 10^{14/3}$, and squares are $N = 10^{16/3}$. Finite simulations are for $T = 10^4$ time steps, averaged over five network realizations. b. As in a., but for the fraction of active nodes that are part of the giant component.

[Plot for Figure 4.18, "Finite size effects on susceptibility of 10-regular graphs": a. susceptibility $\langle S^2\rangle_c$ vs. $p_1$; b. rescaled susceptibility $\langle S^2\rangle_c N^{-0.36}$ vs. $(p_1 - p_{1c})N^{0.36}$, for $N = 10^{12/3}$ through $10^{20/3}$.]

Figure 4.18: a. The mean (finite) cluster size $\langle S^2\rangle_n$ with $p_0 = 10^{-3}$ denoted by circles, and $p_0 = 10^{-4}$ denoted by triangles. Simulations are for $T = 10^4$ time steps. b. As in a., but with a curve-collapse effected by finite size scaling.

4.3 Simulations on other finite networks

Although we have developed a robust set of theoretical and numerical results for k-regular graphs, it remains to be shown that these results survive on more general classes of graph. Additionally, it is known that percolation and directed percolation will have exponents that change depending on the graph architecture (cf. Tables-3.1,3.2). We can exploit this to identify whether the phase transition we observe belongs to the percolation or directed percolation universality class. One challenge when working with arbitrary networks is the problem of identifying the critical point $p_{1c}(p_0)$. On the k-regular networks, we have theoretical predictions to help us identify the critical point; on other networks we must instead identify critical points by the presence of power-laws. The typical procedure is exemplified in Figure-4.2a: simulations are conducted at a variety of $p_1$ values, with the $p_1$ value successively refined so as to produce a power-law.
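A crude automation of that refinement is sketched below; run_sim(p0, p1) stands for one simulation returning its avalanche sizes (a hypothetical stand-in for the simulation code), and the curvature test is our own shorthand for what is in practice a visual judgement:

    import numpy as np

    def find_p1c(run_sim, p0, p1_lo, p1_hi, n_steps=8):
        # Bisect on p1: a subcritical run shows an exponential cut-off (the
        # log-log tail bends down); a supercritical run shows a bump from
        # pre-giant avalanches (the tail bends up).
        for _ in range(n_steps):
            p1 = 0.5 * (p1_lo + p1_hi)
            x, y = log_binned_pdf(run_sim(p0, p1))   # helper sketched earlier
            tail = slice(len(x) // 2, None)
            curv = np.polyfit(np.log(x[tail]), np.log(y[tail]), 2)[0]
            if curv < 0:
                p1_lo = p1     # cut-off: subcritical, raise p1
            else:
                p1_hi = p1     # bump: supercritical, lower p1
        return 0.5 * (p1_lo + p1_hi)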

4.3.1 Small-world graphs

Due to the presence of extensive local connections in the brain, it has a high clustering coefficient. However, it is also known that the brain contains long-ranged fibre tracts connecting different regions [Bassett and Bullmore, 2006]. For this reason, it has been argued that neural networks tend to exhibit the so-called "small-world" property [Sanz-Arigita et al., 2010]. This clustering should, at first blush, also invalidate the tree-like approximation used to develop our analytical approach. We might expect that, since most connections will be recurrent in our small-world networks, correlations will play a significant role and lead to strong deviations from mean-field. For these reasons, it is especially important to study whether our analysis is valid on small-world networks. Here, we consider small-world networks with a rewire probability of 0.01 and mean degree 10, as was chosen in the original Watts-Strogatz paper [Watts and Strogatz, 1998]. We generate graphs as described in §3.1.4. It's evident that a transition between power-laws also occurs in the small-world network (Figure-4.19a). The initial directed-percolation avalanche exponent is close to, but not exactly equal to, the exponent expected for mean-field directed percolation. So far, no numerical or analytical results have been reported for the avalanche exponent $\tau$ for directed percolation on the small-world network. The larger exponent, which we expect to correspond to pure percolation, is nearly indistinguishable from $\tau = 2.5$, the predicted mean-field exponent [Moore and Newman, 2000].
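A sketch of how such graphs can be generated, using networkx's built-in generator as a stand-in for the construction of §3.1.4:

    import networkx as nx

    # Watts-Strogatz small-world graph with the parameters used here:
    # mean degree 10 and rewiring probability 0.01.
    G = nx.watts_strogatz_graph(n=100_000, k=10, p=0.01, seed=1)
    print(nx.average_clustering(G))   # clustering stays high despite the short-cut edges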

[Plot for Figure 4.19, "Exponent Transition on Small-World Networks": a. unscaled avalanche sizes $P(S)$; b. rescaled avalanche sizes, for $p_0 = 10^{-2}$ through $10^{-5}$ (with corresponding $p_1 = 0.083$, $0.100$, $0.107$, $0.110$), with guide lines $x^{-1.35}$ and $x^{-5/2}$.]

Figure 4.19: a. Cluster size distributions obtained for finite ($N = 10^5$ for $T = 10^5$) small-world networks with rewiring probability 0.01. b. As in a., but with a curve-collapse effected by rescaling according to the rate of spontaneous activation.

Results are qualitatively quite similar for a variety of rewiring probabilities, as can be seen in Figure-4.20, although for very low rewire probabilities and spontaneous initiation rates, a deviation from power-law behaviour is observed for small avalanches.

4.3.2 Power-law networks

Although there is evidence for a small-world description of the brain, this description has been complemented with a variety of other network properties. It has been observed that the brain is characterized by a significant number of hubs, nodes with very high degree [Bullmore and Sporns, 2009]. A natural class of random graph that supports the generation of hubs is the power-law network. That brain networks are power-law has been reported in fMRI studies [Eguiluz et al., 2005].

[Plot for Figure 4.20, "Critical avalanches on small-world networks": panels a-d correspond to rewire probabilities $10^{-1.5}$, $10^{-2.0}$, $10^{-2.5}$, $10^{-3.0}$, each showing $P(S)$ for $p_0 = 10^{-2}$ through $10^{-5}$ with guide lines $x^{-1.35}$ and $x^{-2.5}$.]

Figure 4.20: Simulations on small-world networks for $N = 10^{13/3}$ and $T = 10^5$ averaged across three network realizations. Panels a-d correspond to various rewire probabilities.

Further, we haven't yet addressed which universality class the phase transition across the critical line actually represents. One might suspect, because the pure percolation exponents hold asymptotically in the size distribution, that the transition is ultimately a percolating transition. We can test this by studying the emergence of the giant component on power-law networks. Both directed and undirected percolation exhibit a giant component which appears with a power-law $G \sim (p_1 - p_{1c})^{\beta}$. In mean-field, $\beta = 1$ for both directed and undirected percolation; however, in power-law networks $\beta$ varies depending on the power-law exponent $\lambda$ defining the degree distribution, as was summarized in Tables-3.1,3.2. Importantly, $\beta$ for directed percolation and percolation attain their mean-field values at different values of $\lambda$. Here, we will consider power-law networks with uncorrelated in/out-degree distribution $p_{\text{in/out}}(k) \sim k^{-7/2}$, constructed as described in §3.1.5. For such networks, we expect that the directed percolation $\beta$ should still be 1, while the undirected percolation $\beta$ should be 2. Studying the emergence of the giant component on these finite power-law networks, we find a power-law of $\beta = 2$ (cf. Figure-4.21). The scaling exponent $1/(d\nu)$ is confirmed with Figure-4.22.

Additionally, we've made the heuristic argument that the power-law tail should be governed by the percolation exponent, saying that the percolation-exponent tail appears for $p_1 \to 0$ and should be all that survives above the mergeless avalanche size cutoff. However, we can test this hypothesis by studying the avalanche exponent $\tau$ on power-law networks. If $p_{\text{in/out}}(k) \sim k^{-7/2}$, we expect that $\tau = 8/3$ for undirected percolation, while $\tau = 3/2$ for directed percolation. This is exactly what we see in Figure-4.23, where we also have a good universal curve-collapse for various $p_0$.
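A sketch of one way to build such graphs; the inverse-transform sampling and the degree-sum padding are our own shortcuts, not necessarily the procedure of §3.1.5:

    import numpy as np
    import networkx as nx

    rng = np.random.default_rng(1)

    def powerlaw_digraph(n, lam=3.5, kmin=1):
        # Directed configuration model with uncorrelated in/out-degrees drawn
        # from p(k) ~ k**(-lam), truncated at sqrt(n); self-loops and parallel
        # edges may appear and are usually discarded.
        k = np.arange(kmin, int(n ** 0.5))
        p = k.astype(float) ** (-lam)
        p /= p.sum()
        din = rng.choice(k, size=n, p=p)
        dout = rng.choice(k, size=n, p=p)
        diff = int(din.sum() - dout.sum())          # make the degree sums match
        (dout if diff > 0 else din)[rng.integers(n)] += abs(diff)
        return nx.directed_configuration_model(din.tolist(), dout.tolist())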

4.3.3 Hierarchical modular networks

In addition to local clustering, the brain also exhibits a striking degree of modularity [Bullmore and Sporns, 2009]. This has been modelled using hierarchical modular networks [Moretti and Muñoz, 2013].

[Plot for Figure 4.21, "Size of Giant Component on Power-Law Networks": a. unscaled giant component $G$ vs. $p_1$; b. rescaled giant component $GN^{0.250}$ vs. $(p_1 - p_{1c})N^{0.250}$, for $N = 10^{14/3}$ through $10^{21/3}$, with guide line $(p_1 - p_{1c})^2$.]

Figure 4.21: a. Simulations of the giant component on power-law networks with $p(k) \sim k^{-3.5}$ of varying sizes for $T = 10^{7/3}$ time steps and with 10 network realizations, with estimated $p_{1c} = 0.1110$, and giant emergence exponent $\beta = 2$ with $1/(d\nu) \approx 0.25$. b. As in a., but with a curve-collapse effected by finite size scaling.

[Plot for Figure 4.22, "Susceptibility of the Power-Law Network": $\langle S^2\rangle_c N^{0.250}$ vs. $(p_1 - p_{1c})N^{-0.250}$, for $N = 10^{14/3}$ through $10^{21/3}$.]

Figure 4.22: Simulations of the giant component on power-law networks with $p(k) \sim k^{-3.5}$ of varying sizes for $T = 10^{7/3}$ time steps and with 10 network realizations. Finite size scaling is performed with estimated $p_{1c} = 0.1110$ and finite size scaling exponent $1/(d\nu) \approx 0.25$.

In the previous section, we exploited the fact that the power-law network can change the pure-percolation exponents to show that the transition is a percolation transition, and that the second power-law in the avalanche size distribution really is the underlying pure-percolation exponent. However, we can go the other direction and seek out a network structure that can affect the directed-percolation exponents. As mentioned earlier, it has been demonstrated in a variety of articles [Cota et al., 2018, Girardi-Schappo et al., 2016, Moretti and Muñoz, 2013] that on modular networks the SIS model can exhibit Griffiths effects, tying the spreading parameter $p_1$ to the avalanche exponent $\tau$. The SIS model, running on either directed or undirected networks, falls into the directed percolation universality class (see the illustrative example of Figure-3.10). Hence, if the first exponent that appears in our avalanche distributions is truly the directed-percolation exponent, we would expect that it should vary as $p_1$ is varied. Now, since walking along the critical line will necessarily change $p_1$ (and $p_0$), we should be able to maintain the same power-law asymptotic tail while having different critical exponents on the smallest scales. One challenge to this approach is that there are effects due to the module size.

[Plot for Figure 4.23, "Exponent Transition on Power-Law Networks": a. unscaled avalanche sizes $P(S)$; b. rescaled avalanche sizes, for $p_0 = 10^{-3}$ through $10^{-7}$ (with corresponding $p_1 = 0.088$ through $0.129$), with guide lines $x^{-3/2}$ and $x^{-8/3}$.]

Figure 4.23: a. Cluster size distributions obtained for finite ($N = 10^7$ for $T = 10^4$) power-law networks ($p(k) \sim k^{-3.5}$). b. As in a., but with a curve-collapse effected by rescaling according to the rate of spontaneous activation.

[Plot for Figure 4.24, "Griffiths Effect in Hierarchical Modular Networks": a. normalized avalanches $P(S/M)/P(M)$ vs. $S/M$; b. normalized and rescaled avalanches, for $p_0 = 0.0863$ (with $p_1 = 0$) and $p_0 = 10^{-3}$ through $10^{-6}$.]

Figure 4.24: a. Cluster size distributions obtained for finite hierarchical modular networks ($N = 2^{15}$ modules, each consisting of $M = 10^2$ nodes, for $T = 10^4$). b. As in a., but with a curve-collapse effected by rescaling according to the rate of spontaneous activation.

However, if we rescale our avalanche sizes by the size of a single module, we can identify that the mesoscopic avalanches have continuously-varying exponents (Figure-4.24a). Upon rescaling by the amount of noise (Figure-4.24b), we only obtain a rough curve-collapse in the tail, where the same power-law is evident at different levels of noise. For intermediate-size avalanches, greater than the module size $M$ but less than the spreading-merging transition (i.e. $M < S < p_0^{-2/3} M$), different power-law exponents are visible, while for $S < M$, very strong finite size effects are visible. The curve-collapse does not occur for $p_1 = 0$, as the condition for curve-collapse was derived in the case that $p_0 \ll 1$, which is violated in this case (with $p_0 = 0.0863$).

4.4 Thresholded avalanches

In addition to causal webs, there have been other approaches to generalize the notion of neuronal avalanches to accommodate a noisy background. In this section, we consider the recovery of avalanches in the presence of noise by way of thresholding. In [Beggs, 2008, Beggs and Plenz, 2003], neuronal avalanches are delimited by periods of silence. However, for large systems there are no periods of quiet. Nonetheless, there can be large fluctuations in the active fraction. If each large fluctuation corresponds to a single neuronal avalanche, they could be separated by using a threshold. An avalanche is considered to have begun once the active fraction exceeds a given threshold, and to have concluded once the activity falls below that threshold. The traditional avalanche definition is recovered in the limit of a threshold of zero. The hope is that the threshold only affects the smallest avalanches, while leaving the largest avalanches, and therefore the asymptotic power-law and its associated critical exponent, unaffected.

The use of thresholds is particularly common in experimental settings, where instruments naturally coarse-grain spatially or temporally. For instance, in fMRI, the BOLD signal is an average over dozens of milliseconds in a brain region with potentially millions of neurons. To obtain critical avalanches there, it is necessary to use an avalanche threshold [Tagliazucchi et al., 2012]. Although the temporal resolution of EEG or MEG is typically much better than fMRI, the spatial localization is much worse, meaning the activity of tens to hundreds of millions of neurons might be averaged. In studies of avalanche distributions in EEG [Benayoun et al., 2010] or MEG [Shriki et al., 2013], thresholds based on the background activity are used. Finally, in optical experiments with temporal and spatial resolutions similar to fMRI, a threshold of 1 standard deviation of the noise was used to define an active site [Scott et al., 2014]. For experimental systems, it's typical to use some deflection in a continuous signal, usually set by some number of standard deviations from the mean, to define active sites and the basis for an avalanche.

There is no systematic agreement in the literature on the selection of an appropriate threshold, even in simulations where the ground-truth for connectivity and activation of neighbours is known [Del Papa et al., 2017, di Santo et al., 2018]. For instance, in [Del Papa et al., 2017], the threshold was chosen to be half of the mean active fraction. In the same paper, avalanche exponents were also reported when the threshold was varied, by setting the threshold to various percentiles of the active fraction distribution. However, when varying the threshold percentile from 5% to 25%, $\tau$ was reported to vary from approximately 1.25 to 1.55, while $\alpha$ varied from 1.5 to 2.0. This calls into question whether the collection of neurons in the paper was truly exhibiting critical behaviour, or whether thresholded avalanches are an appropriate tool for extracting critical exponents. It should be noted, however, that the reported power-laws were typically only fit over a single order of magnitude, meaning that the quality of the power-law estimate is quite poor. The work only considered small ensembles of neurons, and finite size effects also limited the purported power-laws.

To test more systematically whether thresholded avalanches yield scale-free activity in our model, we tested thresholded avalanches for a variety of model parameters. We tested $p_0$ varying from $10^{-9}$ to $10^{-3}$ on four curves in parameter space (in descending order of $p_1$): i) the branching process critical point, with $p_1 = 1/k$; ii) the Widom line; iii) the unity branching ratio line; and iv) the diverging $\langle S\rangle$ line. Lines (ii-iv) are plotted in the phase-diagram of Figure-4.6. We tested a threshold set to the 2.5th activity percentile (Figure-4.25) and to the 50th percentile (Figure-4.26). In the limit of low noise (cf. Figures-4.25a,4.26a), there is a complete separation of time-scales between the initiation and resolution of avalanches, and the threshold is zero. A power-law exponent of −1.5 is present, consistent with directed percolation. As the spontaneous activity is increased further, the avalanche exponent increases to approximately −1.25 (cf. Figure-4.25b-d). This is consistent with a recent report that, assuming avalanches do not overlap, the introduction of spontaneous activity during an avalanche will change the avalanche exponent to −1.25 [Das and Levina, 2018]. In that report, a bump also appeared at the tail end of the avalanche distribution as the level of spontaneous activity increased, consistent with our results.
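For reference, the thresholding procedure tested here amounts to the following sketch (conventions differ across the literature; summing the supra-threshold activity is one common choice, not necessarily the exact convention of this thesis):

    import numpy as np

    def thresholded_avalanches(activity, percentile=50.0):
        # An avalanche begins when the population activity rises above the
        # threshold and ends when it falls back below; its size is taken as
        # the summed supra-threshold activity in between.
        activity = np.asarray(activity, dtype=float)
        thr = np.percentile(activity, percentile)
        sizes, current = [], 0.0
        for a in activity:
            if a > thr:
                current += a - thr
            elif current > 0.0:
                sizes.append(current)
                current = 0.0
        if current > 0.0:
            sizes.append(current)
        return np.array(sizes)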

The choice of threshold has a strong effect on the avalanche distribution's tail and head. The effect on the head is particularly pronounced for higher noise, where the threshold rises to kill off the smallest avalanches. For the tail, the approximate location of the cut-off does not change as the threshold is increased; however, the largest avalanches tend to die off as the noise is increased. This can be understood as the size of fluctuations decreasing as the noise increases. In the appendix, several other values of threshold are considered: thresholds below the 50th percentile of activity seem to introduce a "bump" at the end of the avalanche distribution, while thresholds above the 50th percentile introduce some kind of exponential cut-off (cf. Figure-A.7 through Figure-A.10). Comparing Figures-4.25,4.26, the avalanche exponent before that cut-off doesn't appear to be altered by the choice of threshold. The curves with the greatest $p_1$ have the largest cut-off, but as $p_0$ increases no value of $p_1$ seems to be asymptotically scale-free.

Thresholded avalanches may have some utility for very low levels of noise, where they yield similar power-laws for a variety of thresholds. However, they do not accurately capture relevant neural activity for large-scale systems. They fallaciously inflate avalanches by lumping together causally unrelated activity. This is a consequence of using a global measure, the mean active fraction, to estimate a process that occurs locally. Additionally, it's not clear what these thresholded avalanches even measure for large systems, as the fluctuations in the active fraction will decrease even as the mean number of avalanches increases. This means that threshold crossings will almost never correspond to any single avalanche for the largest-scale systems. These failings are more readily apparent as the level of noise increases – whereas causal webs can identify scale-free behaviour even for high levels of noise (Figure-4.14), there is a very sharp cut-off after a few orders of magnitude in Figures-4.25,4.26. Lastly, the transition between the directed-percolation and undirected-percolation exponents exposed by the use of causal webs is nowhere to be seen when using thresholded avalanches. That an exponent of −1.5 is seen for $S > 10^2$ for $p_0 = 10^{-3}$ at the diverging $\chi_n$ critical point, given that Figure-4.14 suggests an exponent of −2.5 should be visible for $S > 10^2$, is yet to be understood. Further, that power-law-like behaviour (albeit truncated) is visible above the theoretical critical point when accounting for noise is yet to be explained.

[Plots for Figure 4.25, "Spontaneous activity and thresholded avalanches, percentile: 2.5%": panels a-f correspond to $p_0 = 10^{-8}$ through $10^{-3}$, each showing $P(S)$ along the diverging-cluster-size, unity-branching, Widom, and classical $p_1 = 1/k$ lines, with guide lines $x^{-3/2}$ and $x^{-5/4}$.]

Figure 4.25: Simulations on 10-regular graphs with $N = 10^5$ nodes, with a threshold set to the lowest 2.5th percentile.

[Plots for Figure 4.26, "Spontaneous activity and thresholded avalanches, percentile: 50.0%": panels a-f correspond to $p_0 = 10^{-8}$ through $10^{-3}$, each showing $P(S)$ along the same four parameter-space lines as Figure 4.25, with guide lines $x^{-3/2}$ and $x^{-5/4}$.]

Figure 4.26: Simulations on 10-regular graphs with $N = 10^5$ nodes, with a threshold set to the 50th percentile.


4.5 Summary

In this chapter, we've developed a mean-field theory describing the branching process with noise, a two-parameter model that describes the spontaneous initiation and spreading of activity. We derived the active fraction as well as the dynamical susceptibility, which defines the Widom line. Using this mean-field value for $\Phi$, we were able to determine a branching ratio for a randomly-selected active node, another observable of interest to computational neuroscientists. We studied thresholded avalanches as a possible extension of avalanches to domains without a separation of timescales between initiation and propagation, and found that scale-free behaviour is not present. However, by taking into account network structure, we can disentangle independent events using causal webs.

When we studied the distribution of causal webs, we found a phase transition. For sufficiently large $p_0$ or $p_1$, a giant component appears, the analogue of a percolating cluster. This phase transition occurs for lower $p_0$ and $p_1$ than the $\sigma = 1$ line, which agrees with reports that true neural networks have a sub-critical branching ratio [Priesemann et al., 2014]. The phase transition itself belongs to the universality class of undirected percolation. This is a somewhat surprising result, as typically at the critical point the microscopic details of a given model don't matter very much. It may have been assumed that the inclusion of noise wouldn't affect the underlying universality class. However, for causal webs, the introduction of spontaneous activity restores a time symmetry that is absent in directed percolation. There have been studies of related models, such as the susceptible-infected-recovered model (and associated variants) with multiple initial spreaders [Choi et al., 2017, Hasegawa and Nemoto, 2016]. The SIR model is known to exhibit a percolation transition. They found that including multiple initiation sites for activity retained the mean-field percolation transition. We have expanded on the literature by identifying a model that tunes a transition between directed percolation and undirected percolation, and identified a scaling relation describing the power-law divergence of the transition point as the directed percolation limit is approached. This transition between directed and undirected percolation has been confirmed by studying the size distributions of avalanches, the duration distribution of avalanches, and the durations of avalanches of a given size. All three measures show mean-field exponents of both directed and undirected percolation, with a transition point that scales as $s_c \sim p_0^{-2/3}$.

We confirmed our analytical predictions with extensive numerical simulations. The smallest avalanches are distributed according to the exponent of directed percolation, while the largest avalanches are distributed according to the exponent of undirected percolation. We were able to modify both of these exponents independently by changing the underlying network structure. Power-law networks couple to the critical exponents for both directed and undirected percolation; however, we studied uncorrelated power-law networks with $p(k) \sim k^{-3.5}$, which only couple to the undirected-percolation exponents. Additionally, we found evidence for a Griffiths phase with mesoscopic avalanches on a hierarchical modular network. The avalanche exponent describing avalanches larger than the module size, but smaller than the transition to undirected percolation, varied continuously as we changed $p_1$. This matches reports of Griffiths effects in the SIS model [Cota et al., 2018, Moretti and Muñoz, 2013]. Most strikingly, we found that the causal web distributions of systems tuned to criticality all obeyed the same scaling form, and that the transition point to undirected percolation seemed to scale as $p_0^{-2/3}$ regardless of other factors.

We also studied other measures, such as the appearance of the giant component and the scaling of correlation lengths. We found that on the power-law networks, the giant component appeared with an exponent of $\beta = 2$, exactly as predicted for undirected percolation on power-law networks [Cohen et al., 2002], supporting our belief that the phase transition is ultimately a percolating one. With regards to the correlation length, it was only possible to meaningfully test the correlation length on the infinite k-regular networks, as the diameter of most random graphs is typically quite small. We found that only the perpendicular correlation length diverged at the phase transition, which supports the notion that the phase transition is a percolation phase transition. The autocorrelation length is always zero, since with infinite random graphs there are no loops of finite size, and hence the odds of an avalanche returning to a site are zero. In finite graphs, however, there certainly are loops of finite size, and these loops contribute to the finite size effects we correct for in finite size scaling. Further, for lattices below the critical dimension, loops will play a role regardless of the size of the lattice. It would be of interest, though beyond the scope of this work, to study directed percolation with spontaneous activity on such lattices, and to study the autocorrelation length numerically. In the limit of zero noise, the autocorrelation length should diverge near the critical point. Whether it would still diverge at the phase transition once there was a non-zero amount of noise hasn't yet been studied in the literature, although undirected percolation with multiple seeds has been studied, where it was found to support typical undirected percolation [Roy and Santra, 2013].

Instead of considering this model in another setting and asking questions about its statistical mechanical properties, we could instead aim at enriching this model with features typical of neurons. For instance, we could replace the spontaneous activation of sites with the spontaneous activation of bonds, so as to more accurately represent minis, wherein a synapse spontaneously releases neurotransmitters. In such a case, we would find that the network topology would inform the initiation of activity. This might change the $p_0^{-2/3}$ scaling to vary depending on the network under consideration. A richer model of neural behaviour is the topic of the next chapter.

Chapter 5

Quadratic Integrate-and-Fire neurons

Thus far, we have considered a much reduced model of neuron behaviour: the branching process with noise operates in discrete time increments and considers neurons that have no hysteresis effects. Indeed, those neurons have no internal variables to speak of, much less the continuously-varying ones that characterise real neurons (e.g. the membrane potential, reserves of neurotransmitter, dendritic density, etc.). It remains to be shown that scale-free behaviour can be obtained in biologically-motivated neurons driven by noise, and whether those noisy neurons operate in the same universality class as the branching process with noise.

5.1 The model

A simple model for quadratic integrate-and-fire with adaptation and depression (QIFAD) neurons is presented by Izhikevich, and consists of a pair of coupled differential equations (Equations-5.1,5.2) [Izhikevich, 2007]. In this work, these equations have been modified to include a generic noise term $\eta$. The QIFAD model is governed by the following two equations:

$$C\dot{v} = k(v - v_r)(v - v_t) - u + i_e + \eta \quad \text{and} \tag{5.1}$$

$$\tau_a \dot{u} = b(v - v_r) - u. \tag{5.2}$$

Here, $v$ represents the voltage across the membrane, $C$ corresponds to the capacitance of the cell membrane, $k$ is a parameter that sets the rate of depolarization after the membrane activation threshold $v_t$ is surpassed, and $v_r$ is the resting membrane potential; $i_e$ and $\eta$ denote currents from parent neurons and a generic noise current, respectively. $u$ denotes the slow ionic currents, with $\tau_a$ representing the time constant characterising the ionic channels, and $b$ corresponds to the strength of the leak channels in response to depolarization. Real neurons reach a peak voltage $v_p$ before quickly returning to their resting potential. We introduce the firing condition: if $v > v_p$, then

$$v \to v_c, \qquad u \to u + D, \qquad D_p \to \beta D_p, \qquad t_j \to t, \tag{5.3}$$

where $D > 0$ is large and produces the hyper-polarization and soft refractory period characteristic of an action potential. $D_p$ denotes the neuron's store of neurotransmitter and $t_j$ denotes the last time neuron $j$ fired. The external currents $i_e$ decay exponentially, with the current of neuron $j$ being

$$i_{e_j} = g \sum_{l}{}' D_p(t_l)\, \exp[-(t - t_l)/\tau_k]\, \Theta(t - t_l), \tag{5.4}$$

where the primed sum $\sum_l'$ is restricted to the parents of node $j$, $\tau_k$ is the decay time for currents of the neurotransmitter associated with parent $l$, $g$ is the coupling strength between neurons, and $\Theta$ is the Heaviside step function. In practice, Equation-5.4 is implemented as a differential equation subject to occasional jumps, with $\tau_k \frac{\mathrm{d}i_{e_j}}{\mathrm{d}t} = -i_{e_j}$. Each daughter $l$ of a firing parent $j$ has its current incremented as $i_{e_l} \to i_{e_l} + g D_{p_j}$. Although it's simple enough to extend the current to include multiple species of current to reflect different types of neurotransmitter (e.g. GABA vs. AMPA vs. glutamate, etc.), including inhibitory neurotransmitters, for the purposes of this thesis we will consider strictly excitatory neurons, relying solely on synaptic depression, the refractory period, and the coupling strength $g$ to regulate activity.

Figure 5.1: The response of a single QIFAD neuron to a periodically-applied current, increasing in strength with each application. The top panel shows the membrane voltage, while the bottom panel shows the applied current.

This would therefore most accurately capture neurons with a short-lived excitatory neurotransmitter and only short-term plasticity (e.g. glutamatergic neurons), insofar as we are ignoring long-lifetime currents and long-term plasticity. This set of equations can qualitatively reproduce the response of neurons to applied currents, with applied currents creating a transient membrane depolarization and, in the case that it exceeds the membrane threshold, an action potential (Figure-5.1). It satisfies a number of desirable properties from a neuroscience perspective, by having a characteristic all-or-nothing response, a well-defined threshold, and a refractory period. This model has a large number of parameters. For simplicity, we will take those given in Izhikevich [Izhikevich, 2007] for the regular spiking neuron, which is taken as typical for cortical neurons. The coupling strength, $g$, we will adjust akin to $p_1$ in the previous chapter. All that remains is the description of the noise term, $\eta$. As we are describing a continuous-time system, the natural generalization for $p_0$ is Poisson shot noise, with frequency $\lambda$. This also has a clear biophysical interpretation: neurotransmitter is applied in discrete quanta representing single vesicles. The release of minis will always be some discrete number of quanta.

The noise term will decay exponentially, and will be assumed to be of the same type of neurotransmitter as the neurons themselves, and hence will have the same time decay, so that $\tau_k\dot{\eta} = -\eta$. We will integrate Equations-5.1,5.2,5.4 using the second-order Runge-Kutta method. During each time step, for each neuron, we draw from the Poisson distribution with frequency $\lambda$ to determine the number of quanta $n$ released during that time step (typically zero). We increment $\eta$ by $\eta \to \eta + n g_{\text{noise}}$, with $g_{\text{noise}}$ a free parameter. Both $\lambda$ and $g_{\text{noise}}$ will be considered free parameters, their combined effect mapping roughly to $p_0$. This mapping, however, is not quite as clean as one might hope; as can be seen in Figure-5.1, after application of a current, the membrane of a neuron remains depolarized for some time. This means that noise can serve to potentiate firing, thereby making propagation of activity easier. Therefore, noise can also affect $p_1$! Indeed, without the noise term, the differential equations governing the voltage and ionic currents are entirely deterministic, so all neurons would either fire or not fire together without some added stochasticity. As before, our principal object of study will be causal webs. The activation of neurons is easy enough to spot and register: every time our program applies the rules of Equation-5.3, we count a neuron as having fired. We initiate new clusters when no parents of a given neuron have fired within some span of time, here typically twelve milliseconds (the time it takes for a small membrane fluctuation to decay by a factor of $e$).
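A minimal single-neuron sketch of this integration scheme is given below; it omits the synaptic current $i_e$ and the neurotransmitter store $D_p$ (irrelevant for an isolated neuron), holds the noise fixed within each step, and the values of dt and g_noise are illustrative choices rather than simulation parameters taken from this thesis:

    import numpy as np

    rng = np.random.default_rng(0)

    # Parameters from Table-5.1 (D is the table's d); time in ms, rates in kHz.
    C, k, vr, vt = 1.0, 0.04, -82.0, -42.0
    b, tau_a, tau_k = 0.2, 50.0, 2.5
    vc, vp, D = -65.0, 35.0, 8.0
    dt, lam, g_noise = 0.1, 4e-3, 5.0

    def deriv(v, u, eta):
        dv = (k * (v - vr) * (v - vt) - u + eta) / C   # Equation-5.1, no i_e
        du = (b * (v - vr) - u) / tau_a                # Equation-5.2
        return dv, du

    v, u, eta = vr, 0.0, 0.0
    for step in range(100_000):
        eta += g_noise * rng.poisson(lam * dt)   # Poisson shot-noise quanta
        eta -= dt * eta / tau_k                  # exponential decay of eta
        dv1, du1 = deriv(v, u, eta)              # second-order Runge-Kutta
        dv2, du2 = deriv(v + dt * dv1, u + dt * du1, eta)
        v += 0.5 * dt * (dv1 + dv2)
        u += 0.5 * dt * (du1 + du2)
        if v > vp:                               # firing rules, Equation-5.3
            v, u = vc, u + D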

5.2 Simulations on Erdős-Rényi and hierarchical modular networks

In this section, we present some preliminary numerical simulations of this system. Here we take $g_{\text{noise}} = g$ to reduce the number of parameters we vary. Other constants are as given in Table-5.1. Depending on the value of $g$, it is typical to find firing rates ranging from $10^{-1}$ Hz to $10^{1}$ Hz.

Parameter   Value
C           1
β           0.2
τd          200 ms
τa          50 ms
τk          2.5 ms
b           0.2
vc          -65 mV
d           8
vp          35 mV
vr          -82 mV
vt          -42 mV
k           0.04

Table 5.1: Values used for QIFAD simulations. From [Izhikevich, 2007] and [Orlandi et al., 2013].

For Figure-5.2, the mean firing rate ranges from 0.1 to 0.4 Hz per neuron, meaning that in any given time bin, approximately 474 sites are active. If avalanches were delineated by periods of quiet, there would only be one avalanche, spanning the entire simulation, making it impossible to define useful avalanches here. As can be seen in Figure-5.2, the power-law exponent is approximately 2.5 and any roll-over from an exponent of 1.5 is hidden, meaning that the simulation dynamics are dominated by noise. This makes sense, as $g_{\text{shot}} = g$, with $\lambda_{\text{shot}} \approx 10 \times \lambda_{\text{neurons}}$, so most of the input driving these neurons is random noise rather than the firing of their neighbours. Both sub-critical and super-critical behaviour is evident in the avalanche distribution, which includes an exponential cut-off in the sub-critical case, and a number of pre-giant avalanches in the super-critical case.

Similarly, in the hierarchical modular networks, we have found a power-law of approximately 2.2 (Figure-5.3) for the size distribution, comparable to the asymptotic results presented in Figure-4.24. This exponent, along with $\alpha \approx 3.4$, is consistent with $\gamma \approx 2.0$, which is very nearly captured in Figure-5.3c. There are some deviations from a pure power-law in these simulations, most likely reflecting a slightly super-critical set of parameters.

[Plot for Figure 5.2, "Avalanche sizes on Erdős-Rényi graphs, $\lambda = 4 \times 10^{-3}$": $P(S)$ for $g = 7.000$, $7.143$, $7.571$, with guide line $x^{-5/2}$.]

Figure 5.2: Simulations near the critical point for the QIFAD model on Erdős-Rényi networks, with $N = 10^{11/2}$ neurons and $\lambda = 4 \times 10^{-3}$ kHz (or 4 Hz), for three different values of connection strength $g$.

5.3 Summary

In a biologically-reasonable model of neurons, it's possible to identify scale-free causal webs. These causal webs have the same scaling exponents as we found in the branching process with noise, suggesting that they fall into the same universality class. However, they do not show the transition between the exponents of directed and undirected percolation. This is likely because the simulation parameters were chosen to leave the neurons only weakly coupled, meaning that the simulations were driven mostly by noise. QIFAD neurons have a large parameter space, with different choices of parameters leading to cyclic firing behaviour, "chattering", and intrinsic bursting at the neuronal level. At the population level, it's possible to accurately reproduce bursts present in neuronal cultures, which are giant-component causal webs cut short by homeostatic mechanisms such as neurotransmitter depletion and cellular adaptation [Orlandi et al., 2013].

[Plots for Figure 5.3, "Critical Avalanche Distributions on Hierarchical Modular Networks": A. size distribution $P(S)$ (fit $0.67\,S^{-2.19}$); B. avalanche duration distribution $P(T)$ (fit with exponent $-3.40$); C. mean size for a given duration (fit with exponent $1.95$).]

Figure 5.3: Simulations conducted on hierarchical modular networks, with 1000 neurons per base node, 100 neurons per inter-modular connection, and 7 hierarchical layers (for a total of $2^7$ nodes). The average in/out degree of each neuron was ≈70. Neuron parameters are as given in Table-5.1, except that the capacitance is given by $C = 174$ pF, while $k = 0.4$ and $b = 3.5$. The excitatory connection strength is given by $g = 100$ pA. In this simulation, $\lambda = 650$ Hz, while $g_{\text{shot}} = 70.3$ pA. Ten percent of intra-neurons were inhibitory (GABAergic), with $g_{\text{GABA}} = -15$ pA and $\tau_k = 20$ ms. Values are the result of an average across five network realizations, each simulated for three minutes. a. The probability distribution function for mean causal web size, with a power-law fit generated by maximum likelihood estimation. b. The probability distribution function for mean causal web duration. c. The mean size for avalanches of a given duration. Fit is to avalanches smaller than the bursts.

One interesting future research direction might be to construct an explicit projection of the QIFAD model onto the simpler branching process with noise, and to use this projection to steer the QIFAD model parameters towards a less noisy regime, or towards other behaviour. It might also be interesting to consider a model with intrinsic heterogeneity, such as might occur if the driving noise consists of minis and is thus proportional to a neuron's parent count. This would make the network architecture promote some neurons as the source of most of the system's activity, and could strongly affect the dynamics.

Chapter 6

Conclusions

The brain has the richest emergent phenomena we know of. Surprisingly, however, the activity of neural systems shows many of the hallmarks of the critical point of the humble branching process. There are numerous arguments for the optimality of this critical point from an information-processing standpoint, and it is known that at a critical point the basic details of the model often "wash away". For this reason, many reductionist models of brain activity map the behaviour of individual neurons onto the branching process or other related models. However, the branching process is an example of a system with a nonequilibrium phase transition into the absorbing state of no activity. The model ignores the initiation of new activity, either as input from the outside world or from spontaneous activations intrinsic to neurons. This is made most obvious in large neural networks, where there are no periods of inactivity by which to delineate neuronal avalanches, the original observable that sparked the criticality hypothesis.

6.1 Summary of results

In this thesis, I explicitly studied the effect of including spontaneous activity, or equivalently noise, in two models for the neuron. I did this by examining the resulting distributions of causal webs, a generalization of neuronal avalanches that uses network structure to disentangle independent cascades. Unsurprisingly, in the limit of zero noise we recover the typical mean-field exponents of directed percolation reported in the literature for neuronal avalanches. In that limit, causal webs are identical to the typical neuronal avalanches. By increasing the connection strength between neurons, a phase transition occurs, above which a single avalanche that never terminates appears. It is at this liminal point, just before the giant component appears, that scale-free behaviour is exhibited and critical exponents appear.

We found that the picture is more complicated in the presence of noise. With noise, independent cascades of activity still exhibit a phase transition as the neuron coupling is increased, with a giant component being the result. However, at the critical point, the avalanche exponents are no longer the mean-field exponents expected of a typical branching process. Instead, they take on the exponents of pure percolation. Studying the simple branching process with noise, we derived a bevy of analytical results. We analytically determined the phase-transition line, and found that it occurs for lower neuron coupling strengths than the unity branching ratio ($\sigma = 1$) line sometimes studied in the literature, and below the Widom line that has been proposed for noisy neural networks. This is consistent with reports that the brain tends to occupy a slightly sub-critical state, as the measure of criticality in this study was the branching ratio [Priesemann et al., 2014]. We found that along this phase line, exponents from both directed and undirected percolation appear in the avalanche size distribution, with the undirected-percolation exponents dominating asymptotically. Taking this as a cue, we numerically studied a variety of more generic networks. Simulations on power-law networks showed us unambiguously that in the presence of noise the phase transition is a percolation transition, because the critical exponents governing power-law networks differ from mean-field directed percolation. This is because activity only spreads so far from a single site before merging with other cascades of activity. The merging of activity belongs to the universality class of undirected percolation, because it enjoys a time symmetry that directed percolation does not: you can merge with activity that started from a root before or after you. Additionally, the presence of spontaneous activity destroys the absorbing-state transition characteristic of directed percolation. We calculated analytically on k-regular graphs how far activity spreads before merging takes over, and calculated how it scales with noise on the critical line. Using this we showed that avalanche distributions along the critical curve belong to the same universality class. This result was surprisingly robust, as it worked even for systems with recurrent connections, hubs, and modules.

We've shown that traditional measures of criticality, such as the branching ratio and the number of active nodes, fail for any level of noise once systems grow large enough. However, it is possible to identify criticality even in large, noisy systems, assuming one makes use of network structure to disentangle independent events. We've developed a theory of branching processes with spontaneous activity, which should find application in neuroscience as a model for noise-driven neurons. If a neuroscientist has sufficiently precise information about connections between neurons, it should be possible for them to use causal-web distributions to look for the hallmarks of criticality: power-laws and scaling relations.

6.2 Outlook and future work

Although we addressed many questions related to the phase transition and associated critical behaviour of spreading processes driven by noise, there remain many possible avenues of inquiry. On the theoretical side, we analytically determined many properties of the branching process with noise on the simplest random graph – the k-regular graph. These theoretical results could be extended to more generic classes of graph by application of generating functions, which have been used to describe the spreading of epidemics (e.g. the SIS or SIR models) on complex networks [Newman, 2002]. Although we found numerical evidence suggesting that our results carry over to various other networks, finding analytical results might reveal subtle variations in the scaling laws that weren't obvious in numerical analysis. For instance, although the curve-collapse of Figure-4.23 was quite good, it's possible that power-law networks require rescaling by a different exponent of $p_0$. A generic scaling argument suggested that if $\langle S\rangle(T) \sim T^{\gamma}$, then $s_c \sim p_0^{-\gamma/(1+\gamma)}$.
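One heuristic route to this relation (a sketch; the identification of the merging condition with a unit count of spontaneous activations is our own shorthand) is a spacetime-volume argument:

    \begin{align*}
      \langle S\rangle(T) &\sim T^{\gamma}
        && \text{size of an avalanche of duration } T,\\
      n_{\mathrm{spont}} &\sim p_0\, S\, T \sim p_0\, T^{1+\gamma}
        && \text{spontaneous activations within its spacetime volume},\\
      n_{\mathrm{spont}} \sim 1 &\;\Longrightarrow\; T_c \sim p_0^{-1/(1+\gamma)}
        && \text{scale at which merging becomes likely},\\
      s_c &\sim T_c^{\gamma} \sim p_0^{-\gamma/(1+\gamma)}
        && \text{so } \gamma = 2 \text{ in mean-field recovers } s_c \sim p_0^{-2/3}.
    \end{align*}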

If $\gamma$ differs in other networks, with power-law networks seeming a likely candidate, then the exponent of $p_0$ necessary to effect a curve-collapse for the size distribution will change. In the vein of theoretical analysis, we made no calculations related to finite size effects. Our analytical approach was tree-like and ignored correlations. We might be able to obtain a first-order estimate of the finite size effects by including first-order loop corrections to account for these correlations.

More generally speaking, the branching process with noise represents an interesting intermediate model between percolation and directed percolation. If we considered the same model on a spatially-embedded network, such as a lattice or metrical network (where nodes are preferentially connected to their neighbours), then we could perform spatial renormalization. The resulting renormalization flow might help illuminate the length scale above which percolation exponents appear. A spatial embedding would also be amenable to studying both correlation lengths present in directed percolation. Whereas in the mean-field of networks we concluded that the two-point connectedness function's autocorrelation was zero and so found $\nu_\parallel = 0$, in a spatially-embedded lattice the correlation certainly isn't zero. There should be some kind of scaling relationship for $\nu_\parallel$.

The above problems lean more in the direction of pure statistical mechanics. The branching process with noise could also have application in modelling the spread of viruses, rumours, and epidemics. In particular, any system with hidden couplings that pervade the network, or in which agents are exposed to a continual source of stimulation that propagates activity outside of the network, can be treated with this model. In the context of disease processes, this model could capture the outbreak behaviour of zoonotic diseases, which are diseases that spread from animals to humans [Wolfe et al., 2005]. For instance, the Zika virus can be transmitted sexually or by mosquito [Mayer et al., 2017]. The appropriate network might be the human sexual network, with $p_1$ corresponding to the probability that an infected person transmits the disease to their partner, and $p_0$ representing a background chance that an individual randomly contracts the disease from a mosquito.

Diseases like the Ebola or Marburg viruses – which can be spread to humans from bats or bush meat, as well as between humans – might be treated in a similar manner. In the domain of social science, simulations of rumour-spreading could exploit the explosion of social networks made available in the information age. This could find application in predicting the susceptibility of such networks to phenomena like "fake news".

Of course, the original motivation for this work was rooted in neuroscience. Explicitly constructing a projection of complex biological neuron models, such as the QIFAD and Hodgkin-Huxley models, onto the branching process with noise could be used to create indicators of criticality in these richer models. Another challenge is connecting neural avalanches in the form of causal webs to actual neural processing. Although neuronal avalanches have been connected to learning in artificial neural networks [Del Papa et al., 2017, van Kessenich et al., 2019], these studies only focus on a single coherent input at a time, and ignore the fact that real brains operate in a highly-parallel manner. For sufficiently large neural networks, it seems likely that causal webs will be necessary to identify neural avalanches. Although it will be easiest to apply causal webs and study the role of noise in artificial neural networks, as microscopy technology improves it should eventually become possible for in vivo studies of the brain to reveal causal behaviour. Then it can be determined if neural systems tune to the scale-free causal-web critical point, the unity branching ratio critical point, or the Widom line.

Bibliography

L. A. Adamic and B. A. Huberman. Power-law distribution of the World Wide Web. Science, 287(5461):2115, 2000.

P. Bak, C. Tang, and K. Wiesenfeld. Self-organized criticality. Physical Review A, 38(1): 364, 1988.

P. J. Basser, S. Pajevic, C. Pierpaoli, J. Duda, and A. Aldroubi. In vivo fiber tractography using DT-MRI data. Magnetic Resonance in Medicine, 44(4):625–632, 2000.

D. S. Bassett and E. Bullmore. Small-world brain networks. The Neuroscientist, 12(6): 512–523, 2006.

J. M. Beggs. The criticality hypothesis: how local cortical networks might optimize information processing. Philosophical Transactions of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, 366(1864):329–343, 2008.

J. M. Beggs and D. Plenz. Neuronal avalanches in neocortical circuits. Journal of Neuroscience, 23(35):11167–11177, 2003.

J. M. Beggs and N. Timme. Being critical of criticality in the brain. Frontiers in Physiology, 3:163, 2012.

M. Benayoun, M. Kohrman, J. Cowan, and W. van Drongelen. EEG, temporal correlations, and avalanches. Journal of Clinical Neurophysiology, 27(6):458–464, 2010.

M. Boguñá, R. Pastor-Satorras, and A. Vespignani. Epidemic spreading in complex networks with degree correlations. In Statistical Mechanics of Complex Networks, pages 127–147. Springer, 2003.

B. Bollobás. Random Graphs. Cambridge Studies in Advanced Mathematics. Cambridge University Press, 2nd edition, 2001. doi: 10.1017/CBO9780511814068.

F. Brauer. Compartmental models in epidemiology. In Mathematical Epidemiology, pages 19–79. Springer, 2008.

M. Breakspear. Dynamic models of large-scale brain activity. Nature Neuroscience, 20(3): 340, 2017.

A. D. Broido and A. Clauset. Scale-free networks are rare. Nature Communications, 10 (1017), 2019.

E. Bullmore and O. Sporns. Complex brain networks: graph theoretical analysis of structural and functional systems. Nature Reviews Neuroscience, 10(3):186, 2009.

N. Chen and M. Olvera-Cravioto. Directed random graphs with given degree distributions. Stochastic Systems, 3(1):147–186, 2013.

D. R. Chialvo. Emergent complex neural dynamics. Nature Physics, 6(10):744, 2010.

W. Choi, D. Lee, and B. Kahng. Critical behavior of a two-step contagion model with multiple seeds. Physical Review E, 95(6):062115, 2017.

N. A. Christakis and J. H. Fowler. Six degrees of separation. Distances, pages 3–57, 2016.

K. Christensen and N. R. Moloney. Complexity and criticality, volume 1. World Scientific Publishing Company, 2005.

L. Cocchi, L. L. Gollo, A. Zalesky, and M. Breakspear. Criticality in the brain: A synthesis of neurobiology, models and cognition. Progress in Neurobiology, 2017.

O. Cohen, A. Keselman, E. Moses, M. R. Martínez, J. Soriano, and T. Tlusty. Quorum percolation in living neural networks. EPL (Europhysics Letters), 89(1):18008, 2010.

R. Cohen, D. Ben-Avraham, and S. Havlin. Percolation critical exponents in scale-free networks. Physical Review E, 66(3):036113, 2002.

M. Copelli, R. F. Oliveira, A. C. Roque, and O. Kinouchi. Signal compression in the sensory periphery. Neurocomputing, 65:691–696, 2005.

W. Cota and S. C. Ferreira. Optimized Gillespie algorithms for the simulation of Markovian epidemic processes on large and heterogeneous networks. Computer Physics Communications, 219:303–312, 2017.

W. Cota, G. Ódor, and S. C. Ferreira. Griffiths phases in infinite-dimensional, non-hierarchical modular networks. Scientific Reports, 8(1):9144, 2018.

A. Das and A. Levina. Critical neuronal models with relaxed timescales separation. arXiv preprint arXiv:1808.04196, 2018.

A. de Andrade Costa, M. Copelli, and O. Kinouchi. Can dynamical synapses produce true self-organized criticality? Journal of Statistical Mechanics: Theory and Experiment, 2015 (6):P06004, 2015.

B. Del Papa, V. Priesemann, and J. Triesch. Criticality meets learning: Criticality signatures in a self-organizing recurrent neural network. PloS One, 12(5):e0178683, 2017.

S. di Santo, P. Villegas, R. Burioni, and M. A. Muñoz. Landau–Ginzburg theory of cortex dynamics: Scale-free avalanches emerge at the edge of synchronization. Proceedings of the National Academy of Sciences, 115(7):E1356–E1365, 2018.

DLMF. NIST Digital Library of Mathematical Functions. Release 1.0.20 of 2018-09-15. URL http://dlmf.nist.gov/. F. W. J. Olver, A. B. Olde Daalhuis, D. W. Lozier, B. I. Schneider, R. F. Boisvert, C. W. Clark, B. R. Miller, and B. V. Saunders, eds.

V. M. Eguiluz, D. R. Chialvo, G. A. Cecchi, M. Baliki, and A. V. Apkarian. Scale-free brain functional networks. Physical Review Letters, 94(1):018102, 2005.

P. Erdős and A. Rényi. On the evolution of random graphs. Publ. Math. Inst. Hung. Acad. Sci., 5(1):17–60, 1960.

E. Estrada. When local and global clustering of networks diverge. Linear Algebra and its Applications, 488:249–263, 2016.

E. D. Fagerholm, R. Lorenz, G. Scott, M. Dinov, P. J. Hellyer, N. Mirzaei, C. Leeson, D. W. Carmichael, D. J. Sharp, W. L. Shew, et al. Cascades and cognitive state: focused attention incurs subcritical dynamics. Journal of Neuroscience, 35(11):4626–4634, 2015.

S. C. Ferreira, C. Castellano, and R. Pastor-Satorras. Epidemic thresholds of the susceptible-infected-susceptible model on networks: A comparison of numerical and theoretical results. Physical Review E, 86(4):041125, 2012.

N. Friedman, S. Ito, B. A. Brinkman, M. Shimono, R. L. DeVille, K. A. Dahmen, J. M. Beggs, and T. C. Butler. Universal critical dynamics in high resolution neuronal avalanche data. Physical Review Letters, 108(20):208102, 2012.

M. Girardi-Schappo, G. S. Bortolotto, J. J. Gonsalves, L. T. Pinto, and M. H. Tragtenberg. Griffiths phase and long-range correlations in a biologically motivated visual cortex model. Scientific Reports, 6:29561, 2016.

K.-I. Goh, B. Kahng, and D. Kim. Universal behavior of load distribution in scale-free networks. Physical Review Letters, 87(27):278701, 2001.

R. L. Graham, D. E. Knuth, and O. Patashnik. Concrete Mathematics: A Foundation for Computer Science. Addison-Wesley: New York, 1989.

A. Haimovici, E. Tagliazucchi, P. Balenzuela, and D. R. Chialvo. Brain organization into resting state networks emerges at criticality on a model of the human connectome. Physical Review Letters, 110(17):178101, 2013.

C. Haldeman and J. M. Beggs. Critical branching captures activity in living neural networks and maximizes the number of metastable states. Physical Review Letters, 94(5):058101, 2005.

E. C. Hansen, D. Battaglia, A. Spiegler, G. Deco, and V. K. Jirsa. Functional connectivity dynamics: modeling the switching behavior of the resting state. Neuroimage, 105:525–535, 2015.

T. Hasegawa and K. Nemoto. Outbreaks in susceptible-infected-removed epidemics with multiple seeds. Physical Review E, 93(3):032324, 2016.

H. M. Hastings, J. Davidsen, and H. Leung. Challenges in the analysis of complex systems: introduction and overview. The European Physical Journal Special Topics, 226(15):3185–3197, Dec 2017. ISSN 1951-6401. doi: 10.1140/epjst/e2017-70094-x. URL https://doi.org/10.1140/epjst/e2017-70094-x.

S. Herculano-Houzel. The human brain in numbers: a linearly scaled-up primate brain. Frontiers in Human Neuroscience, 3:31, 2009.

J. Hesse and T. Gross. Self-organized criticality as a fundamental property of neural systems. Frontiers in Systems Neuroscience, 8:166, 2014.

D. G. C. Hildebrand, M. Cicconet, R. M. Torres, W. Choi, T. M. Quan, J. Moon, A. W. Wetzel, A. S. Champion, B. J. Graham, O. Randlett, et al. Whole-brain serial-section electron microscopy in larval zebrafish. Nature, 545(7654):345, 2017.

H. Hinrichsen. Non-equilibrium critical phenomena and phase transitions into absorbing states. Advances in Physics, 49(7):815–958, 2000.

A. L. Hodgkin and A. F. Huxley. A quantitative description of membrane current and its application to conduction and excitation in nerve. The Journal of Physiology, 117(4):500–544, 1952.

E. M. Izhikevich. Dynamical Systems in Neuroscience. MIT Press, 2007.

M. Kardar. Statistical physics of fields. Cambridge University Press, 2007.

E. T. Kavalali. The mechanisms and functions of spontaneous neurotransmitter release. Nature Reviews Neuroscience, 16(1):5, 2015.

O. Kinouchi and M. Copelli. Optimal dynamical range of excitable networks at criticality. Nature Physics, 2(5):348, 2006.

B. Kolb and I. Q. Whishaw. Fundamentals of human neuropsychology. Macmillan, 2009.

F. Y. K. Kossio, S. Goedeke, B. van den Akker, B. Ibarz, and R.-M. Memmesheimer. Growing critical: self-organized criticality in a developing neural system. Physical Review Letters, 121(5):058301, 2018.

S. Kwon and Y. Kim. Epidemic spreading in annealed directed networks: Susceptible- infected-susceptible model and contact process. Physical Review E, 87(1):012813, 2013.

H. K. Lee, P.-S. Shim, and J. D. Noh. Epidemic threshold of the susceptible-infected-susceptible model on complex networks. Physical Review E, 87(6):062812, 2013.

A. Levina and M. Herrmann. Dynamical synapses give rise to a power-law distribution of neuronal avalanches. In Advances in Neural Information Processing Systems, pages 771–778, 2006.

R. Livi and P. Politi. Nonequilibrium Statistical Physics: a Modern Perspective. Cambridge University Press, 2017.

Y. L. Luke. Special Functions and Their Approximations, volume 1. Academic Press, 1969.

B. D. Malamud, G. Morein, and D. L. Turcotte. Forest fires: an example of self-organized critical behavior. Science, 281(5384):1840–1842, 1998.

S. V. Mayer, R. B. Tesh, and N. Vasilakis. The emergence of arthropod-borne viral diseases: A global prospective on dengue, chikungunya and zika fevers. Acta Tropica, 166:155–163, 2017.

B. W. Mel. Information processing in dendritic trees. Neural Computation, 6(6):1031–1085, 1994.

S. Melnik, A. Hackett, M. A. Porter, P. J. Mucha, and J. P. Gleeson. The unreasonable effectiveness of tree-based theory for networks with clustering. Physical Review E, 83(3): 036112, 2011.

D. Meunier, R. Lambiotte, and E. T. Bullmore. Modular and hierarchically modular organization of brain networks. Frontiers in Neuroscience, 4:200, 2010.

D. Millman, S. Mihalas, A. Kirkwood, and E. Niebur. Self-organized criticality occurs in non-conservative neuronal networks during ‘up’ states. Nature Physics, 6(10):801, 2010.

C. Moore and M. E. J. Newman. Exact solution of site and bond percolation on small-world networks. Physical Review E, 62(5):7059, 2000.

P. Moretti and M. A. Muñoz. Griffiths phases and the stretching of criticality in brain networks. Nature Communications, 4:2521, 2013.

M. E. J. Newman. Spread of epidemic disease on networks. Physical Review E, 66:016128, Jul 2002. doi: 10.1103/PhysRevE.66.016128. URL https://link.aps.org/doi/10.1103/PhysRevE.66.016128.

M. E. J. Newman. Power laws, Pareto distributions and Zipf's law. Contemporary Physics, 46(5):323–351, 2005.

M. E. J. Newman. Networks: An Introduction. Oxford University Press, 2010.

M. E. J. Newman, S. H. Strogatz, and D. J. Watts. Random graphs with arbitrary degree distributions and their applications. Physical Review E, 64:026118, Jul 2001. doi: 10.1103/PhysRevE.64.026118. URL https://link.aps.org/doi/10.1103/PhysRevE.64.026118.

G. Ódor. Critical dynamics on a large human open connectome network. Physical Review E, 94(6):062411, 2016.

S. W. Oh, J. A. Harris, L. Ng, B. Winslow, N. Cain, S. Mihalas, Q. Wang, C. Lau, L. Kuan, A. M. Henry, et al. A mesoscale connectome of the mouse brain. Nature, 508(7495):207, 2014.

J. G. Orlandi and J. Casademunt. Noise focusing in neuronal tissues: Symmetry breaking and localization in excitable networks with quenched disorder. Physical Review E, 95(5): 052304, 2017.

J. G. Orlandi, J. Soriano, E. Alvarez-Lacalle, S. Teller, and J. Casademunt. Noise focusing and the emergence of coherent activity in neuronal cultures. Nature Physics, 9(9):582, 2013.

R. Parshani, S. Carmi, and S. Havlin. Epidemic threshold for the susceptible-infectious-susceptible model on random networks. Physical Review Letters, 104(25):258701, 2010.

V. Pasquale, P. Massobrio, L. Bologna, M. Chiappalone, and S. Martinoia. Self-organization and neuronal avalanches in networks of dissociated cortical neurons. Neuroscience, 153 (4):1354–1369, 2008.

D. Plenz. Neuronal avalanches and coherence potentials. The European Physical Journal Special Topics, 205(1):259–301, 2012.

S.-S. Poil, A. van Ooyen, and K. Linkenkaer-Hansen. Avalanche dynamics of human brain oscillations: Relation to critical branching processes and temporal correlations. Human Brain Mapping, 29(7):770–777, 2008. doi: 10.1002/hbm.20590. URL https://onlinelibrary.wiley.com/doi/abs/10.1002/hbm.20590.

S.-S. Poil, R. Hardstone, H. D. Mansvelder, and K. Linkenkaer-Hansen. Critical-state dynamics of avalanches and oscillations jointly emerge from balanced excitation/inhibition in neuronal networks. Journal of Neuroscience, 32(29):9817–9823, 2012.

V. Priesemann, M. Wibral, M. Valderrama, R. Pröpper, M. Le van Quyen, T. Geisel, J. Triesch, D. Nikolić, and M. H. Munk. Spike avalanches in vivo suggest a driven, slightly subcritical brain state. Frontiers in Systems Neuroscience, 8:108, 2014.

C. Ricotta, G. Avena, and M. Marchetti. The flaming sandpile: self-organized criticality and wildfires. Ecological Modelling, 119(1):73–77, 1999.

B. Roy and S. B. Santra. Continuous percolation transition in random cluster growth model. Croatica Chemica Acta, 86(4):495–501, 2013.

E. J. Sanz-Arigita, M. M. Schoonheim, J. S. Damoiseaux, S. A. Rombouts, E. Maris, F. Barkhof, P. Scheltens, and C. J. Stam. Loss of ‘small-world’ networks in Alzheimer's disease: graph analysis of fMRI resting-state functional connectivity. PloS One, 5(11):e13788, 2010.

Y. Sara, T. Virmani, F. Deák, X. Liu, and E. T. Kavalali. An isolated pool of vesicles recycles at rest and drives spontaneous neurotransmission. Neuron, 45(4):563–573, 2005.

S. D. Sarma, S. Adam, E. Hwang, and E. Rossi. Electronic transport in two-dimensional graphene. Reviews of Modern Physics, 83(2):407, 2011.

N. Schwartz, R. Cohen, D. Ben-Avraham, A.-L. Barabási, and S. Havlin. Percolation in directed scale-free networks. Physical Review E, 66(1):015104, 2002.

G. Scott, E. D. Fagerholm, H. Mutoh, R. Leech, D. J. Sharp, W. L. Shew, and T. Knöpfel. Voltage imaging of waking mouse cortex reveals emergence of critical neuronal dynamics. Journal of Neuroscience, 34(50):16611–16620, 2014.

W. L. Shew and D. Plenz. The functional benefits of criticality in the cortex. The Neuroscientist, 19(1):88–100, 2013.

A. Shmuel, M. Augath, A. Oeltermann, and N. K. Logothetis. Negative functional mri response correlates with decreases in neuronal activity in monkey visual area v1. Nature Neuroscience, 9(4):569, 2006.

O. Shriki, J. Alstott, F. Carver, T. Holroyd, R. N. Henson, M. L. Smith, R. Coppola, E. Bullmore, and D. Plenz. Neuronal avalanches in the resting MEG of the human brain. Journal of Neuroscience, 33(16):7079–7090, 2013.

H. F. Song and X.-J. Wang. Simple, distance-dependent formulation of the Watts-Strogatz model for directed and undirected small-world networks. Physical Review E, 90(6):062801, 2014.

O. Sporns, G. Tononi, and R. Kötter. The human connectome: a structural description of the human brain. PLoS Computational Biology, 1(4):e42, 2005.

N. Stepp, D. Plenz, and N. Srinivasa. Synaptic plasticity enables adaptive self-tuning critical networks. PLoS Computational Biology, 11(1):e1004043, 2015.

E. Tagliazucchi, P. Balenzuela, D. Fraiman, and D. R. Chialvo. Criticality in large-scale brain fMRI dynamics unveiled by a novel point process analysis. Frontiers in Physiology, 3:15, 2012.

J. Touboul and A. Destexhe. Power-law statistics and universal scaling in the absence of criticality. Physical Review E, 95(1):012413, 2017.

J. Travers and S. Milgram. An experimental study of the small world problem. In S. Leinhardt, editor, Social Networks, pages 179–197. Academic Press, 1977. ISBN 978-0-12-442450-0. doi: 10.1016/B978-0-12-442450-0.50018-3. URL http://www.sciencedirect.com/science/article/pii/B9780124424500500183.

M. P. van den Heuvel and H. E. H. Pol. Exploring the brain network: a review on resting-state fMRI functional connectivity. European Neuropsychopharmacology, 20(8):519–534, 2010.

L. M. van Kessenich, L. de Arcangelis, and H. Herrmann. Synaptic plasticity and neuronal refractory time cause scaling behaviour of neuronal avalanches. Scientific Reports, 6:32071, 2016.

L. M. van Kessenich, M. Luković, L. De Arcangelis, and H. J. Herrmann. Critical neural networks with short- and long-term plasticity. Physical Review E, 97(3):032312, 2018.

L. M. van Kessenich, D. Berger, L. de Arcangelis, and H. Herrmann. Pattern recognition with neuronal avalanche dynamics. Physical Review E, 99(1):010302, 2019.

S. Wasserman and K. Faust. Social network analysis: Methods and applications, volume 8. Cambridge University Press, 1994.

G. N. Watson. Asymptotic expansions of hypergeometric functions. Transactions of the Cambridge Philosophical Society, 22:277–308, 1918.

D. J. Watts and S. H. Strogatz. Collective dynamics of ‘small-world’ networks. Nature, 393(6684):440, 1998.

R. V. Williams-García, M. Moore, J. M. Beggs, and G. Ortiz. Quasicritical brain dynamics on a nonequilibrium Widom line. Physical Review E, 90(6):062714, 2014.

R. V. Williams-Garcia, J. M. Beggs, and G. Ortiz. Unveiling causal activity of complex networks. EPL (Europhysics Letters), 119(1):18003, 2017.

J. Wilting and V. Priesemann. Inferring collective dynamical states from widely unobserved systems. Nature Communications, 9(1):2325, 2018.

N. D. Wolfe, P. Daszak, A. M. Kilpatrick, and D. S. Burke. Bushmeat hunting, deforestation, and prediction of zoonotic disease. Emerging Infectious Diseases, 11(12):1822, 2005.

M. Yaghoubi, T. de Graaf, J. G. Orlandi, F. Girotto, M. A. Colicos, and J. Davidsen. Neuronal avalanche dynamics indicates different universality classes in neuronal cultures. Scientific Reports, 8(1):3417, 2018.

C. N. Yang and C. P. Yang. Critical point in liquid-gas transitions. Physical Review Letters, 13(9):303, 1964.

Z. Zheng, J. S. Lauritzen, E. Perlman, C. G. Robinson, M. Nichols, D. Milkie, O. Torrens, J. Price, C. B. Fisher, N. Sharifi, et al. A complete electron microscopy volume of the brain of adult Drosophila melanogaster. Cell, 174(3):730–743, 2018.

Appendix A

Supplementary Figures

Figure A.1: Power-law scaling of 1/k − p1c ∝ p0^{1/3}, shown here for k = 10.

Figure A.2: Power-law scaling of 1/k − p1c ∝ p0^{1/2} for the σ = 1 line, shown here for k = 10.

Figure A.3: Scaling of the first- (Equation-4.13) and second-order (Equation-4.17) approximations to the active fraction (Equation-4.3) along the Widom line.

Figure A.4: The Widom line in the neighbourhood of p0 ≪ 1 is asymptotically approximated by Equation-4.18.

Figure A.5: The scaling of the size cutoff for mergeless avalanches. The exact sξ is given by Equation-4.21 and is plotted in purple. Equation-4.22 captures the correct scaling form for sξ but has a poor prefactor for small p0, as can be seen in green. Equation-4.25 has an improved prefactor and is plotted in blue.

Figure A.6: Giant component for simulations with N = 10^4 nodes on 10-regular graphs, of varying durations. Above the critical point, variation in simulation duration has no effect. Below the critical point, the largest cluster doesn't scale extensively, and hence its occupation fraction for the whole simulation decreases as the simulation duration increases.

Figure A.7: Simulations on 10-regular graphs with N = 10^5 nodes, with a threshold set to the 1st percentile. Panels a–f show the avalanche size distribution P(S) for p0 = 10^{−8}, 10^{−7}, 10^{−6}, 10^{−5}, 10^{−4}, and 10^{−3}; each panel compares the diverging cluster size, unity branching, Widom line, and classical p1 = 1/k conditions against reference power laws x^{−3/2} and x^{−5/4}.

Figure A.8: Simulations on 10-regular graphs with N = 10^5 nodes, with a threshold set to the 34th percentile. Panels a–f show the avalanche size distribution P(S) for p0 = 10^{−8} through 10^{−3}, with the same conditions and reference power laws as Figure A.7.

Figure A.9: Simulations on 10-regular graphs with N = 10^5 nodes, with a threshold set to the 76th percentile. Panels a–f show the avalanche size distribution P(S) for p0 = 10^{−8} through 10^{−3}, with the same conditions and reference power laws as Figure A.7.

Figure A.10: Simulations on 10-regular graphs with N = 10^5 nodes, with a threshold set to the 97.5th percentile. Panels a–f show the avalanche size distribution P(S) for p0 = 10^{−8} through 10^{−3}, with the same conditions and reference power laws as Figure A.7.

Figure A.11: Simulations on 10-regular graphs with N = 10^5 nodes, with a threshold set to the 99th percentile. Panels a–f show the avalanche size distribution P(S) for p0 = 10^{−8} through 10^{−3}, with the same conditions and reference power laws as Figure A.7.

Appendix B

Numerical Methods

B.1 Simulation of infinite k-regular branching processes with spontaneous activity

Simulating an infinite lattice might initially seem Sisyphean. Indeed, it is patently obvious that the entire infinite lattice cannot be simultaneously simulated in finite space. However, it is not necessary to simulate the entire infinite lattice to gather statistics for clusters occurring on such a lattice. The key is to consider only finite clusters, and to build clusters one at a time. These clusters can be built by starting from an active site and adding connected nodes one at a time. This is in fact only possible on an infinite lattice: the procedure would fail to exactly capture all possible clusters on a finite lattice, because loops would appear. However, if we assume that there are no loops (i.e. that the lattice is infinite), we can add nodes to a cluster one at a time, safely assured that we haven't added the same node twice.

We develop each cluster by starting with a single active node. We can choose to start at a root (i.e. with no active parents), or at a random active node (potentially with active parents). We then check the connections to the active node to see if additional sites are added. There are two types of connection that can add to our cluster: (type-I) a potentially active node descending from a node already in the cluster, or (type-II) an active parent node to a node in the cluster. If we're beginning from a root node, there are initially k type-I connections. For a randomly-selected active node, there may initially be mp ≠ 0 type-II connections. The probability that there are mp active parents, given that we are beginning at an active site, is given by

P(mp active parents | active site) = P(active site | mp parents) P(mp parents) / P(active site),

where P(active site | mp parents) denotes the probability that a site is active when it has mp active parents in the preceding time step, P(mp parents) denotes the probability that a randomly-selected site has mp active parents in the preceding time step, and P(active site) denotes the probability that a randomly-selected site is active. Given that P(active site | mp parents) = 1 − (1 − p0)(1 − p1)^{mp} (from Equation-4.2), that P(mp parents) follows the binomial distribution (k choose mp) Φ^{mp} (1 − Φ)^{k−mp}, and that P(active site) = Φ by definition, we know

P(mp active parents | active site) = (1/Φ) (k choose mp) Φ^{mp} (1 − Φ)^{k−mp} [1 − (1 − p0)(1 − p1)^{mp}].   (B.1)

If we begin at a randomly-selected active node, then there are mp type-II connections, where mp is drawn from the distribution given in Equation-B.1. Additionally, this probability distribution also gives the number of parents for a type-II node, since we know a type-II node is active. The algorithm for developing a cluster proceeds to check each unevaluated connection (of type-I or -II), possibly adding more as it goes, until either no unevaluated connections remain or the cluster exceeds a given size (typically 10^10). Each type of connection is evaluated as follows:

• (Type-I): We check each type-I connection by assuming the candidate daughter has md other active parents (drawn from a binomial distribution over its k − 1 other parents, each active with probability Φ, the active fraction), and include it with probability 1 − (1 − p0)(1 − p1)^{md+1}, the exponent counting the md other active parents together with the in-cluster parent itself (cf. Figure-B.1b). If it is included, then we add k type-I connections from this daughter, md type-II connections, and increase the size of the cluster by one.

• (Type-II): We include each type-II with probability 1, increasing the cluster size by one, adding an additional k − 1 type-I connections (the −1 reflects the fact that one of its daughters was already in the cluster), and mp type-II connections, where mp is drawn from the distribution in Equation-B.1.

This algorithm only requires that one count the number of unevaluated connections and the total size of the cluster, and so runs in O(1) space. For the purposes of measuring the two-point connectedness function, the above algorithm can easily be extended to also track the number of time steps, by simply recording how many times each active front has followed a daughter branch or a parent branch. The number of roots is also easily extracted, by simply considering each type-II connection added with mp = 0 additional type-II parents to be a root. An example realization of this algorithm, developing a single cluster of size 5, is given in Table-B.1 and visualized in Figures-B.1 and B.2. A minimal code sketch of this procedure follows.
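The following Python sketch makes the procedure concrete. It is illustrative only: the function names are invented here, the rejection sampler for Equation-B.1 is one simple choice among several, and the active fraction Φ (phi) must be supplied externally as the self-consistent solution of Equation-4.3.

import random

def sample_mp(k, phi, p0, p1, rng):
    """Sample the number of active parents of an active site (Equation-B.1)
    by rejection: propose mp ~ Binomial(k, phi) and accept with probability
    1 - (1 - p0)(1 - p1)**mp, the activation likelihood."""
    while True:
        m_p = sum(rng.random() < phi for _ in range(k))
        if rng.random() < 1.0 - (1.0 - p0) * (1.0 - p1) ** m_p:
            return m_p

def grow_cluster(k, phi, p0, p1, max_size=10**10, seed=None):
    """Grow a single cluster on the infinite random k-regular graph,
    starting from a root node (no active parents). Returns the cluster
    size, or max_size if the cutoff is exceeded."""
    rng = random.Random(seed)
    size = 1
    type1 = k  # unevaluated daughter (type-I) connections
    type2 = 0  # unevaluated active-parent (type-II) connections
    while type1 > 0 or type2 > 0:
        if size >= max_size:
            return max_size
        if type2 > 0:
            # Type-II: an active parent is included with probability 1; it
            # brings k - 1 fresh daughters and its own active parents,
            # sampled from Equation-B.1.
            type2 -= 1
            size += 1
            type1 += k - 1
            type2 += sample_mp(k, phi, p0, p1, rng)
        else:
            # Type-I: draw the daughter's md other active parents from
            # Binomial(k - 1, phi); it activates with probability
            # 1 - (1 - p0)(1 - p1)**(md + 1), the +1 counting the
            # in-cluster parent itself (cf. Figure-B.1b).
            type1 -= 1
            m_d = sum(rng.random() < phi for _ in range(k - 1))
            if rng.random() < 1.0 - (1.0 - p0) * (1.0 - p1) ** (m_d + 1):
                size += 1
                type1 += k
                type2 += m_d
    return size

Repeatedly calling grow_cluster and histogramming the returned sizes yields the cluster-size distribution P(S) without ever storing the lattice, in keeping with the O(1)-space claim above; extending it to track time steps and root counts amounts to carrying two extra counters through the same loop.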


Figure B.1: A cluster is developed from root-node A. This particular network structure is an illustrative conceit – no specific structure is specified in memory. (a) Consider a cluster developed from a single root node, occurring on an infinite, random, 2-regular graph for clarity. There are initially two type-I connections. (b) Each type-I node potentially has k − 1 other parents, each independently active with probability Φ. In this specific example, let's suppose both of the type-I connections of node A have another active parent. Then, each of the type-I connections will be included in the cluster with probability 1 − (1 − p0)(1 − p1)^2. (c) Suppose the left type-I fails to activate, while the right (now labelled B) succeeds. The other parent of B is now a type-II connection, while the hitherto unconsidered daughters of B are two new type-Is. (d) The type-II connection (now labelled C) is always included. It introduces a new type-I connection, and (after sampling from Equation-B.1) adds 1 new type-II connection. The other possible (but inactive) parent is shown in light-grey.

Figure B.2: A possible cluster realization of size 5, following the remaining steps outlined in Table-B.1, continuing from Figure-B.1.

Step   Algorithm action            Size   Type-I   Type-II
0      (B.1a) Initialized          1      2        0
1      Evaluated type-I            1      1        0
2      (B.1c) Evaluated type-I     2      2        1
3      (B.1d) Evaluated type-II    3      3        1
4      Evaluated type-II           3      3        0
5      Evaluated type-I            4      4        0
6      Evaluated type-I            4      3        0
7      Evaluated type-I            4      2        0
8      Evaluated type-I            5      3        1
9      Evaluated type-II           5      3        0
10     Evaluated type-I            5      2        0
11     Evaluated type-I            5      1        0
12     (B.2) Evaluated type-I      5      0        0

Table B.1: An example of developing a single cluster of size 5. Nodes and edges in the cluster are drawn in black, while the nodes and edges constituting the boundary of the cluster are in light grey. The first few operations are illustrated in Figure-B.1.
