bioRxiv preprint doi: https://doi.org/10.1101/2020.06.16.154377; this version posted June 16, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
1 Estimating Transfer Entropy in Continuous Time Between Neural
2 Spike Trains or Other Event-Based Data
1 1, 2 1 3 David P. Shorten, Richard E. Spinney, and Joseph T. Lizier
1 4 Complex Systems Research Group and Centre for Complex Systems,
5 Faculty of Engineering, The University of Sydney, NSW 2006, Australia
2 6 School of Physics and EMBL Australia Node Single Molecule Science,
7 School of Medicine, The University of New South Wales, Sydney, Australia
1 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.16.154377; this version posted June 16, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
Abstract Transfer entropy (TE) is a widely used measure of directed information flows in a number of domains including neuroscience. Many real-world time series in which we are interested in information flows come in the form of (near) instantaneous events occurring over time, including the spiking of biological neurons, trades on stock markets and posts to social media. However, there exist severe limitations to the current approach to TE estimation on such event-based data via discretising the time series into time bins: it is not consistent, has high bias, converges slowly and cannot simultaneously capture relationships that occur with very fine time precision as well as those that occur over long time intervals. Building on recent work which derived a theoretical framework for TE in continuous time, we present an estimation framework for TE on event- based data and develop a k-nearest-neighbours estimator within this framework. This estimator is provably consistent, has favourable bias properties and converges orders of magnitude more quickly than the discrete-time estimator on synthetic examples. We also develop a local permutation scheme for generating null surrogate time series to test for the statistical significance of the TE and, as such, test for the conditional independence between the history of one point process and the updates of another — signifying the lack of a causal connection under certain weak assumptions. Our approach is capable of detecting conditional independence or otherwise even in the presence of strong pairwise time-directed correlations. The power of this approach is further demonstrated on the inference of the connectivity of biophysical models of a spiking neural circuit inspired by the pyloric circuit of the crustacean stomatogastric ganglion, succeeding where previous related estimators have failed.
8 AUTHOR SUMMARY
9 Transfer Entropy (TE) is an information-theoretic measure commonly used in neuro-
10 science to measure the directed statistical dependence between a source and a target time
11 series, possibly also conditioned on other processes. Along with measuring information flows,
12 it is used for the inference of directed functional and effective networks from time series data.
13 The currently-used technique for estimating TE on neural spike trains first time-discretises
14 the data and then applies a straightforward or ”plug-in” information-theoretic estimation
15 procedure. This approach has numerous drawbacks: it is very biased, it cannot capture re-
2 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.16.154377; this version posted June 16, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
Y Y
X X
Y Y X X
Y y3 y2 y1 (right) performs no time binning. History embeddings x y 16 lationships occurring on both fine and large timescales simultaneously, converges very slowly 17 as more data is obtained, and indeed does not even converge to the correct value. We present 18 a new estimator for TE which operates in continuous time, demonstrating via application to 19 synthetic examples that it addresses these problems, and can reliably differentiate statisti- 20 cally significant flows from (conditionally) independent spike trains. Further, we also apply 21 it to more biologically-realistic spike trains obtained from a biophysical model of the pyloric 22 circuit of the crustacean stomatogastric ganglion; our correct inference of the underlying 23 connection structure here provides an important validation for our approach where similar 24 methods have previously failed 25 I. INTRODUCTION 26 In analysing time series data from complex dynamical systems, such as in neuroscience, 27 it is often useful to have a notion of information flow. We intuitively describe the activities 3 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.16.154377; this version posted June 16, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. 28 of brains in terms of such information flows: for instance, information from the visual world 29 must flow to the visual cortex where it will be encoded [1]. Conversely, information coded 30 in the motor cortex must flow to muscles where it will be enacted [2]. 31 Transfer entropy (TE) [3, 4] has become a widely accepted measure of such flows. It is 32 defined as the mutual information between the past of a source time-series process and the 33 present state of a target process, conditioned on the past of the target. More specifically (in 34 discrete time), the transfer entropy rate [5] is: NT 1 1 X p (xt x 35 Here the information flow is being measured from a source process Y to a target X, I( ; ) · · | · 36 is the conditional mutual information [6], xt is the current state of the target, x 37 history of the target, y 38 samples (in units of time), τ is the length of the time series and NT = τ/∆t is the number 39 of time samples. The histories x 40 x 42 in order to ensure convergence in the limit of small time bin size. 43 It is also possible to condition the TE on additional processes [4]. Given additional 44 processes Z = Z ,Z ,...,Z with histories Z = Z , Z ,..., Z , we can { 1 2 nZ } 1 T˙ Y X Z = I (Xt ; Y 46 When combined with a suitable statistical significance test, the conditional TE can be used 47 to show that the present state of X is conditionally independent of the past of Y – conditional 48 on the past of X and of the conditional processes Z. Such conditional independence implies 49 the absence of an active causal connection between X and Y under certain weak assumptions 50 [7, 8], and TE is widely used for inferring directed functional and effective network models 51 [9–12] (and see [4, Sec. 7.2]). 52 TE has received widespread application in neuroscience [13, 14]. Uses have included the 53 network inference as above, as well as measuring the direction and magnitude of information 4 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.16.154377; this version posted June 16, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. 54 flows [15, 16] and determining transmission delays [17]. Such applications have been per- 55 formed using data from diverse sources such as MEG [18, 19], EEG [20], fMRI [21], electrode 56 arrays [22], calcium imaging [11] and simulations [23]. 57 Previous applications of TE to spike trains [22, 24–34] and other types of event-based 58 data [35], including for the purpose of network inference [11, 36, 37], have relied on time 59 discretisation. As shown in Fig. 1, the time series is divided into small bins of width ∆t. 60 The value of a sample for each bin could then be assigned as either binary - denoting 61 the presence or absence of events (spikes) in the bin - or a natural number denoting the 62 number of events (spikes) that fell within the bin. A choice is made as to the number of 63 time bins, l and m, to include in the source and target history embeddings y 64 This results in a finite number of possible history embeddings. For a given combination 65 x 66 histories, p (x x , y ), can be directly estimated using the so-called plugin (histogram) t | 68 target history, p (x x ), can be estimated in the same fashion. From these estimates the t | 70 the application of the discrete time TE estimator to synthetic examples including spiking 71 events from simulations of model neurons.) 72 There are two large disadvantages to this approach [5]. If the process is genuinely oc- 73 curring in discrete time, then the estimation procedure just described is consistent. That 74 is, it is guaranteed to converge to the true value of the TE in the limit of infinite data. 75 However, if we are considering a fundamentally continuous-time process, such as a neuron’s 76 action potential, then the lossy transformation of time discretisation will result in an inac- 77 curate estimate of the TE. Thus, in these cases, any estimator based on time discretisation 78 is not consistent. Secondly, whilst the loss of resolution of the discretization will decrease 79 with decreasing bin size ∆t, this requires larger dimensionality of the history embeddings to 80 capture correlations over similar time intervals. This increase in dimension will result in an 81 exponential increase in the state space size being sampled and therefore the data require- 82 ments. This renders the estimation problem intractable for the typical dataset sizes present 83 in neuroscience. 84 In practice then, the application of transfer entropy to event-based data such as spike 85 trains has required a trade-off between fully resolving interactions that occur with fine 5 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.16.154377; this version posted June 16, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. 86 time precision and capturing correlations that occur across long time intervals. There is 87 substantial evidence that spike correlations at the millisecond and sub-millisecond scale 88 play a role in encoding visual stimuli [38, 39], motor control [40] and speech [41]. On the 89 other hand, correlations in spike trains exist over lengths of hundreds of milliseconds [42]. 90 A discrete-time TE estimator cannot capture both of these classes of effects simultaneously. 91 Recent work by Spinney et al. [5] derived a continuous-time formalism for TE. It was 92 demonstrated that, for stationary point processes such as spike trains, the pairwise TE rate 93 is given by: NX 1 X λx x 94 It is worth emphasizing that, in this context, the processes X and Y are series of the time 95 points xi of events i. This is contrasted with (1), where X and Y are series of values at the 96 sampled time points t . To avoid confusion we use the notation that the y Y are the raw i i ∈ 97 time points and y 98 xi (see Methods). Here, λx x 99 conditioned on the histories of the target x 100 events in the target process. λx x 101 conditioned on its history alone, ignoring the history of the source. Note that λx x 102 λx x NX 1 X λx x 104 Here, λx x 105 on the histories of the target x 106 z 108 processes, ignoring the history of the source. 109 Crucially, it was demonstrated by Spinney et al., and later shown more rigorously by 110 Cooper and Edgar [43], that if the discrete-time formalism of the TE (in (1)) could be 111 properly estimated as limδt 0, then it would converge to the same value as the continuous- → 112 time formalism. This is due to the contributions to the TE from the times between target 6 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.16.154377; this version posted June 16, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. 113 events cancelling. Yet there are two important distinctions in the continuous-time formalism 114 which hold promise to address the consistency issues of the discrete-time formalism. First, 115 the basis in continuous time allows us to represent the history embeddings by inter-event 116 intervals, suggesting the possibility of jointly capturing subtleties in both short and long 117 time-scale effects that has evaded discrete-time approaches. (See Fig. 1 for a diagrammatic 118 representation of these history embeddings, contrasted with the traditional way of construct- 119 ing histories for the discrete-time estimator). Second, note the important distinction that 120 the sums in (3) and (4) are taken over the NX (spiking) events in the target during the 121 time-series over interval τ; this contrasts to a sum over all time-steps in the discrete-time 122 formalism. An estimation strategy based on (3) and (4) would only be required to calcu- 123 late quantities at events, ignoring the ‘empty’ space in between. This implies a potential 124 computational advantage, as well as eliminating one source of estimation variability. 125 These factors all point to the advantages of estimating TE for event-based data using 126 the continuous-time formalism in (3) and (4). This paper presents an empirical approach 127 to doing so. The estimator (presented in Methods) operates by considering the probability 128 densities of the history embeddings observed at events and contrasts these with the prob- 129 ability densities of those embeddings being observed at other (randomly sampled) points. 130 This approach is distinct in utilising a novel Bayesian inversion on (4) in order to operate on 131 these probability densities of the history embeddings, rather than making a more difficult 132 direct estimation of spike rates. This further allows us to utilise k-Nearest-Neighbour (kNN) 133 estimators for entropy terms based these probability densities, bringing known advantages 134 of consistency, data efficiency, low sensitivity to parameters and known bias corrections. 135 By combining these entropy estimators, and adding a further bias-reduction step, we ar- 136 rive at our proposed estimator. The resulting estimator is consistent (see Methods section 137 IV A 5) and is demonstrated on synthetic examples in Results to be substantially superior 138 to estimators based on time discretisation across a number of metrics. 139 These surrogate samples have historically been created by either shuffling the original 140 source samples or by shifting the source process in time. However, this results in surro- 141 gates which conform to an incorrect null hypothesis that the transitions in the target are 142 completely independent of the source histories. That is, they conform to the factorisation 143 p (X , Y X , Z ) = p (X X , Z ) p (Y ). This leads to the possibility that a pro- t 7 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.16.154377; this version posted June 16, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. 145 TE. 146 As shown in Results, this can lead to incredibly high false positive rates in certain settings 147 such as the presence of strong common driver effects. Therefore, in order to have a suitable 148 significance test for use in conjunction with the proposed estimator, we also present (in 149 Methods section IV B) an adaptation of a recently proposed local permutation method [8] 150 to our specific case. This adapted scheme produces surrogates which conform to the correct 151 null hypothesis of conditional independence of the present of the target and the source 152 history, given the histories of the target and further conditioning processes. This is the 153 condition that p (X , Y X , Z ) = p (X X , Z ) p (Y X , Z ). All deviations t 155 the appropriate null. 156 We show in Results that the combination of the proposed estimator and surrogate gener- 157 ation method is capable of correctly distinguishing between zero and non-zero information 158 flow in cases where the history of the source might have a strong pairwise correlation with 159 the occurrence of events in the target, but is nevertheless conditionally independent. The 160 combination of the discrete-time estimator and a traditional method of surrogate genera- 161 tion is shown to be incapable of making this distinction. Further, we demonstrate that 162 the proposed combination is capable of inferring the connectivity of a simple circuit of bio- 163 physical model neurons inspired by the crustacean stomatogastric ganglion, a task which 164 the discrete-time estimator was shown to be incapable of performing. This provides strong 165 impetus for the application of the proposed techniques to investigate information flows in 166 spike-train data recorded from biological neurons. 167 II. RESULTS 168 The first two subsections here present the results of the continuous-time estimator applied 169 to two different synthetic examples for which the ground truth value of the TE is known. The 170 first example considers independent processes where T˙ Y X = 0, whilst the second examines → 171 coupled Poisson processes with a known, non-zero T˙ Y X . The continuous-time estimator’s → 172 performance is also contrasted with that of the discrete-time estimator. The emphasis of 173 these sections is on properties of the estimators: their bias, variance and consistency (see 174 Methods). 8 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.16.154377; this version posted June 16, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. 175 The third and fourth subsections present the results of the combination of the continuous- 176 time estimator and the local permutation surrogate generation scheme applied to two ex- 177 amples: one synthetic and the other a biologically plausible model of neural activity. The 178 comparison of the estimates to a population of surrogates produces p-values for the statistical 179 significance of the TE values being distinct from zero and implying a directed relationship. 180 The results are compared to the known connectivity of the studied systems. These p-values 181 could be translated into other metrics such as ROC curves and false-positive rates, but we 182 choose to instead visualise the distributions of the p-values themselves. The combination of 183 the discrete-time estimator along with a traditional method for surrogate generation (time 184 shifts) is also applied to these examples for comparison. 185 A. No TE Between Independent Homogeneous Poisson Processes (a) k = 1 (b) k = 5 FIG. 2: Evaluation of the continuous-time estimator on independent homogeneous Poisson processes. The blue line shows the average TE rate across multiple runs and the shaded area shows the standard deviation. Plots are shown for two different values of k nearest neighbours, and four different values of the ratio of the number of sample points to the number of events NU /NX (See Methods, sections IV A 4 and IV A 5). 186 The simplest example on which we can attempt to validate the estimator is independent 187 homogeneous Poisson processes, where the true value of the TE between such processes is 188 zero. 189 Pairs of independent homogeneous Poisson processes were generated, each with rate λ¯ = 9 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.16.154377; this version posted June 16, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. (a) l = 2, m = 2 (b) l = 5, m = 5 FIG. 3: Result of the discrete time estimator applied to independent homogeneous Poisson processes. The solid line shows the average TE rate across multiple runs and the shaded area shows the standard deviation. Plots are shown for four different values of the bin width ∆t as well as different source and target embedding lengths, l and m. 2 3 4 5 190 1, and contiguous sequences of N 1 10 , 1 10 , 1 10 , 1 10 target events were X ∈ { × × × × } 191 selected. For the continuous-time estimator, the parameter NU for the number of placed 192 sample points was varied (see Methods sections IV A 4 and IV A 5) to check the sensitivity 193 of estimates to this parameter. At each of these numbers of target events NX , the averages 194 are taken across 1000, 100, 20 and 20 tested process pairs respectively. 195 Fig. 2 shows the results of these runs for the continuous-time estimator, using various 196 parameter settings. In all cases, the Manhattan (`1) norm is used as the distance metric 197 and the embedding lengths are set to lX = lY = 1 spike (see Methods sections IV A 4 and 198 IV A 5). For this example, the set of conditioning processes Z is empty. 199 The plots show that the continuous-time estimator converges to the true value of the TE 200 (0). This is a numerical confirmation of its consistency for independent processes. Moreover, 201 it exhibits very low bias (as compared to the discrete-time estimator, Fig. 3) for all values of 202 the k nearest neighbours and NU /NX parameters. The variance is relatively large for k = 1, 203 although it is dramatically better for k = 5 — this reflects known results for variance of this 204 class of estimators as a function of k [44]. 205 Fig. 3 shows the result of the discrete-time estimator applied to the same independent 10 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.16.154377; this version posted June 16, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. 206 homogeneous Poisson processes for two different combinations of the source and target his- 207 tory embedding lengths, l and m time bins, and four different bin sizes ∆t. At each of the 208 numbers of target events NX , the averages are taken across 1000, 100, 100 and 100 tested 209 process pairs respectively. The variance of this estimator on this process is low and compa- 210 rable to the continuous-time estimator, however the bias is very large and positive for short 211 processes. It would appear that, in the case of these independent processes, it is consistent. 212 B. Consistent TE Between Unidirectionally Coupled Processes lX = 1 1.5 5 1.0 0.5 4 0.0 102 103 104 105 lX = 2 y | x 3 1.5 1.0 2 0.5 TE (nats/second) 0.0 102 103 104 105 lX = 3 1 1.5 1.0 0.0 0.2 0.4 0.6 0.8 1.0 1 ty 0.5 0.0 102 103 104 105 (a) Firing rate of the target Number of Target Events process as a function of the time since the most recent (b) Continuous-time 1 estimator source spike (ty) (c) Discrete-time estimator FIG. 4: The discrete-time and continuous-time estimators were run on coupled Poisson processes for which the ground-truth value of the TE is known. (a) shows the firing rate of the target process as a function of the history of the source. (b) and (c) show the estimates of the TE provided by the two estimators. The blue line shows the average TE rate across multiple runs and the shaded area shows the standard deviation. The black line shows the true value of the TE. For the continuous-time estimator the parameter values of NU /NX = 1 and k = 4 were used along with the `1 (Manhattan) norm. Plots are shown for three different values of the length of the target history component lX . For the discrete-time estimator, plots are shown for four different values of the bin width ∆t. The source and target history embedding lengths are chosen such that they extend back one time unit. 213 The estimators were also tested on an example of unidirectionally coupled spiking pro- 11 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.16.154377; this version posted June 16, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. 214 cesses with a known value of TE (previously presented as example B in [5]). Here, the 215 source process Y is an homogoneous Poisson process. The target process X is produced as 216 a conditional point process where the instantaneous rate is a function of the time since the 217 most recent source event. More specifically: ¯ λy x