<<

bioRxiv preprint doi: https://doi.org/10.1101/2020.06.16.154377; this version posted June 16, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

Estimating Transfer Entropy in Continuous Time Between Neural Spike Trains or Other Event-Based Data

David P. Shorten,^1 Richard E. Spinney,^{1,2} and Joseph T. Lizier^1

^1 Complex Systems Research Group and Centre for Complex Systems, Faculty of Engineering, The University of Sydney, NSW 2006, Australia

^2 School of Physics and EMBL Australia Node Single Molecule Science, School of Medicine, The University of New South Wales, Sydney, Australia


Abstract: Transfer entropy (TE) is a widely used measure of directed information flows in a number of domains including neuroscience. Many real-world time series for which we are interested in information flows come in the form of (near) instantaneous events occurring over time, including the spiking of biological neurons, trades on stock markets and posts to social media. However, there exist severe limitations to the current approach to TE estimation on such event-based data via discretising the time series into time bins: it is not consistent, has high bias, converges slowly and cannot simultaneously capture relationships that occur with very fine time precision as well as those that occur over long time intervals. Building on recent work which derived a theoretical framework for TE in continuous time, we present an estimation framework for TE on event-based data and develop a k-nearest-neighbours estimator within this framework. This estimator is provably consistent, has favourable bias properties and converges orders of magnitude more quickly than the discrete-time estimator on synthetic examples. We also develop a local permutation scheme for generating null surrogate time series to test for the statistical significance of the TE and, as such, test for the conditional independence between the history of one point process and the updates of another, signifying the lack of a causal connection under certain weak assumptions. Our approach is capable of detecting conditional independence or otherwise even in the presence of strong pairwise time-directed correlations. The power of this approach is further demonstrated on the inference of the connectivity of biophysical models of a spiking neural circuit inspired by the pyloric circuit of the crustacean stomatogastric ganglion, succeeding where previous related estimators have failed.

AUTHOR SUMMARY

Transfer Entropy (TE) is an information-theoretic measure commonly used in neuroscience to measure the directed statistical dependence between a source and a target time series, possibly also conditioned on other processes. Along with measuring information flows, it is used for the inference of directed functional and effective networks from time series data. The currently-used technique for estimating TE on neural spike trains first time-discretises the data and then applies a straightforward or "plug-in" information-theoretic estimation procedure. This approach has numerous drawbacks: it is very biased, it cannot capture relationships occurring on both fine and large timescales simultaneously, converges very slowly as more data is obtained, and indeed does not even converge to the correct value. We present a new estimator for TE which operates in continuous time, demonstrating via application to synthetic examples that it addresses these problems, and can reliably differentiate statistically significant flows from (conditionally) independent spike trains. Further, we also apply it to more biologically-realistic spike trains obtained from a biophysical model of the pyloric circuit of the crustacean stomatogastric ganglion; our correct inference of the underlying connection structure here provides an important validation for our approach where similar methods have previously failed.

FIG. 1: Constructing history embeddings for a source Y and target X: the discrete-time approach (left) bins the processes in time, whereas the continuous-time approach (right) performs no time binning, with history embeddings constructed from the raw event times.

I. INTRODUCTION

In analysing time series data from complex dynamical systems, such as in neuroscience, it is often useful to have a notion of information flow. We intuitively describe the activities of brains in terms of such information flows: for instance, information from the visual world must flow to the visual cortex where it will be encoded [1]. Conversely, information coded in the motor cortex must flow to muscles where it will be enacted [2].

Transfer entropy (TE) [3, 4] has become a widely accepted measure of such flows. It is defined as the conditional mutual information between the past of a source time-series process and the present state of a target process, conditioned on the past of the target. More specifically (in discrete time), the transfer entropy rate [5] is:

$$\dot{T}_{Y \to X} = \lim_{\Delta t \to 0} \frac{1}{N_T \Delta t} \sum_{t=1}^{N_T} \ln \frac{p\left(x_t \mid \mathbf{x}_{<t}, \mathbf{y}_{<t}\right)}{p\left(x_t \mid \mathbf{x}_{<t}\right)}. \tag{1}$$

Here the information flow is being measured from a source process Y to a target X, I(· ; · | ·) is the conditional mutual information [6], x_t is the current state of the target, x_{<t} is the history of the target, y_{<t} is the history of the source, ∆t is the spacing between samples (in units of time), τ is the length of the time series and N_T = τ/∆t is the number of time samples. The histories x_{<t} and y_{<t} are embedding vectors of the m and l most recent samples of the target and source, respectively. The limit ∆t → 0 is taken in order to ensure convergence in the limit of small time bin size.

It is also possible to condition the TE on additional processes [4]. Given additional processes Z = {Z_1, Z_2, ..., Z_{n_Z}} with histories Z_{<t} = {Z_{1,<t}, Z_{2,<t}, ..., Z_{n_Z,<t}}, we can write the conditional TE rate as:

$$\dot{T}_{Y \to X \mid Z} = \lim_{\Delta t \to 0} \frac{1}{\Delta t} I\left(X_t \,;\, \mathbf{Y}_{<t} \mid \mathbf{X}_{<t}, \mathbf{Z}_{<t}\right). \tag{2}$$

When combined with a suitable statistical significance test, the conditional TE can be used to show that the present state of X is conditionally independent of the past of Y, conditional on the past of X and of the conditional processes Z. Such conditional independence implies the absence of an active causal connection between X and Y under certain weak assumptions [7, 8], and TE is widely used for inferring directed functional and effective network models [9–12] (and see [4, Sec. 7.2]).

TE has received widespread application in neuroscience [13, 14]. Uses have included network inference as above, as well as measuring the direction and magnitude of information flows [15, 16] and determining transmission delays [17]. Such applications have been performed using data from diverse sources such as MEG [18, 19], EEG [20], fMRI [21], electrode arrays [22], calcium imaging [11] and simulations [23].

Previous applications of TE to spike trains [22, 24–34] and other types of event-based data [35], including for the purpose of network inference [11, 36, 37], have relied on time discretisation. As shown in Fig. 1, the time series is divided into small bins of width ∆t. The value of a sample for each bin could then be assigned as either binary, denoting the presence or absence of events (spikes) in the bin, or a natural number denoting the number of events (spikes) that fell within the bin. A choice is made as to the number of time bins, l and m, to include in the source and target history embeddings y_{<t} and x_{<t}. This results in a finite number of possible history embeddings. For a given combination x_t, x_{<t} and y_{<t}, the probability of the target value conditioned on the histories, p(x_t | x_{<t}, y_{<t}), can be directly estimated using the so-called plug-in (histogram) estimator. The probability of the target value conditioned on the target history alone, p(x_t | x_{<t}), can be estimated in the same fashion. From these estimates the TE can then be calculated as per (1). (See Results for examples of the application of the discrete-time TE estimator to synthetic examples including spiking events from simulations of model neurons.)
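The plug-in procedure just described can be sketched as follows. This is our own minimal illustration, not the authors' implementation: the function name, the binarised bin values and the counting scheme are our choices. Histories are read off as tuples of past bin values, and the conditional probabilities are estimated by direct counting.

```python
import numpy as np
from collections import Counter

def plugin_te_rate(source, target, dt, l, m, t_max):
    """Plug-in (histogram) discrete-time TE rate estimate, in nats per unit time.

    source, target: arrays of event (spike) times; dt: bin width;
    l, m: number of source/target history bins; t_max: recording length.
    """
    bins = np.arange(0.0, t_max + dt, dt)
    # binarise: 1 if the bin contains at least one spike, else 0
    x = np.histogram(target, bins)[0].clip(max=1)
    y = np.histogram(source, bins)[0].clip(max=1)

    joint = Counter()  # counts of (x_t, target history, source history)
    cond = Counter()   # counts of (x_t, target history)
    start = max(l, m)
    for t in range(start, len(x)):
        hx = tuple(x[t - m:t])
        hy = tuple(y[t - l:t])
        joint[(x[t], hx, hy)] += 1
        cond[(x[t], hx)] += 1
    n = len(x) - start

    # marginal counts of the history configurations
    hist_xy = Counter()
    hist_x = Counter()
    for (xt, hx, hy), c in joint.items():
        hist_xy[(hx, hy)] += c
    for (xt, hx), c in cond.items():
        hist_x[hx] += c

    # TE = (1/(n*dt)) * sum over samples of ln[ p(x_t|hx,hy) / p(x_t|hx) ]
    te = 0.0
    for (xt, hx, hy), c in joint.items():
        p_joint = c / hist_xy[(hx, hy)]       # estimate of p(x_t | hx, hy)
        p_cond = cond[(xt, hx)] / hist_x[hx]  # estimate of p(x_t | hx)
        te += c * np.log(p_joint / p_cond)
    return te / (n * dt)
```

Because the plug-in estimate is an empirical conditional mutual information, it is always non-negative; on independent processes the (large, positive) bias discussed below is visible directly.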

There are two large disadvantages to this approach [5]. If the process is genuinely occurring in discrete time, then the estimation procedure just described is consistent. That is, it is guaranteed to converge to the true value of the TE in the limit of infinite data. However, if we are considering a fundamentally continuous-time process, such as a neuron's action potential, then the lossy transformation of time discretisation will result in an inaccurate estimate of the TE. Thus, in these cases, any estimator based on time discretisation is not consistent. Secondly, whilst the loss of resolution of the discretisation will decrease with decreasing bin size ∆t, this requires larger dimensionality of the history embeddings to capture correlations over similar time intervals. This increase in dimension will result in an exponential increase in the state space size being sampled and therefore the data requirements. This renders the estimation problem intractable for the typical dataset sizes present in neuroscience.

In practice then, the application of transfer entropy to event-based data such as spike trains has required a trade-off between fully resolving interactions that occur with fine time precision and capturing correlations that occur across long time intervals. There is substantial evidence that spike correlations at the millisecond and sub-millisecond scale play a role in encoding visual stimuli [38, 39], motor control [40] and speech [41]. On the other hand, correlations in spike trains exist over lengths of hundreds of milliseconds [42]. A discrete-time TE estimator cannot capture both of these classes of effects simultaneously.

Recent work by Spinney et al. [5] derived a continuous-time formalism for TE. It was demonstrated that, for stationary point processes such as spike trains, the pairwise TE rate is given by:

$$\dot{T}_{Y \to X} = \frac{1}{\tau} \sum_{i=1}^{N_X} \ln \frac{\lambda_{x \mid \mathbf{x}_{<x}, \mathbf{y}_{<x}}\left[\mathbf{x}_{<x_i}, \mathbf{y}_{<x_i}\right]}{\lambda_{x \mid \mathbf{x}_{<x}}\left[\mathbf{x}_{<x_i}\right]}. \tag{3}$$

It is worth emphasizing that, in this context, the processes X and Y are series of the time points x_i of events i. This is contrasted with (1), where X and Y are series of values at the sampled time points t_i. To avoid confusion we use the notation that the y_i ∈ Y are the raw time points and y_{<x_i} is the history embedding of the source observed at the target event time x_i (see Methods). Here, λ_{x|x<x,y<x} is the instantaneous rate of the target process conditioned on the histories of the target x_{<x_i} and the source y_{<x_i}, evaluated at the time points x_i of the events in the target process. λ_{x|x<x} is the instantaneous rate of the target conditioned on its history alone, ignoring the history of the source; it is the marginalisation of λ_{x|x<x,y<x} over the source histories. τ is the length of the time series. The conditional TE rate is given analogously by:

$$\dot{T}_{Y \to X \mid Z} = \frac{1}{\tau} \sum_{i=1}^{N_X} \ln \frac{\lambda_{x \mid \mathbf{x}_{<x}, \mathbf{y}_{<x}, \mathbf{z}_{<x}}\left[\mathbf{x}_{<x_i}, \mathbf{y}_{<x_i}, \mathbf{z}_{<x_i}\right]}{\lambda_{x \mid \mathbf{x}_{<x}, \mathbf{z}_{<x}}\left[\mathbf{x}_{<x_i}, \mathbf{z}_{<x_i}\right]}. \tag{4}$$

Here, λ_{x|x<x,y<x,z<x} is the instantaneous rate of the target process conditioned on the histories of the target x_{<x_i}, the source y_{<x_i} and the conditioning processes z_{<x_i}, evaluated at the target events, and λ_{x|x<x,z<x} is the corresponding rate conditioned on the histories of the target and the conditioning processes, ignoring the history of the source.
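When the conditional rates are known analytically, the sum in (3) can be evaluated directly as a sum of log rate ratios over the target events. The following sketch is a hypothetical illustration of that evaluation only; the paper's actual estimator must estimate these rate ratios from data, and the function and its rate-function arguments are our own:

```python
import numpy as np

def continuous_time_te_rate(target_events, rate_joint, rate_cond, tau):
    """TE rate via (3): (1/tau) * sum_i ln( lambda_joint(x_i) / lambda_cond(x_i) ).

    rate_joint(t): target rate given both target and source histories at time t.
    rate_cond(t): target rate given the target history alone.
    tau: length of the observed time series.
    """
    log_ratios = [np.log(rate_joint(t) / rate_cond(t)) for t in target_events]
    return float(sum(log_ratios)) / tau
```

Note that the sum runs only over target events, consistent with the observation above that nothing needs to be computed in the 'empty' space between them.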

Crucially, it was demonstrated by Spinney et al., and later shown more rigorously by Cooper and Edgar [43], that if the discrete-time formalism of the TE (in (1)) could be properly estimated in the limit ∆t → 0, then it would converge to the same value as the continuous-time formalism. This is due to the contributions to the TE from the times between target events cancelling. Yet there are two important distinctions in the continuous-time formalism which hold promise to address the consistency issues of the discrete-time formalism. First, the basis in continuous time allows us to represent the history embeddings by inter-event intervals, suggesting the possibility of jointly capturing subtleties in both short and long time-scale effects that has evaded discrete-time approaches. (See Fig. 1 for a diagrammatic representation of these history embeddings, contrasted with the traditional way of constructing histories for the discrete-time estimator.) Second, note the important distinction that the sums in (3) and (4) are taken over the N_X (spiking) events in the target during the time series over interval τ; this contrasts to a sum over all time-steps in the discrete-time formalism. An estimation strategy based on (3) and (4) would only be required to calculate quantities at events, ignoring the 'empty' space in between. This implies a potential computational advantage, as well as eliminating one source of estimation variability.

These factors all point to the advantages of estimating TE for event-based data using the continuous-time formalism in (3) and (4). This paper presents an empirical approach to doing so. The estimator (presented in Methods) operates by considering the probability densities of the history embeddings observed at events and contrasts these with the probability densities of those embeddings being observed at other (randomly sampled) points. This approach is distinct in utilising a novel Bayesian inversion on (4) in order to operate on these probability densities of the history embeddings, rather than making a more difficult direct estimation of spike rates. This further allows us to utilise k-Nearest-Neighbour (kNN) estimators for entropy terms based on these probability densities, bringing known advantages of consistency, data efficiency, low sensitivity to parameters and known bias corrections. By combining these entropy estimators, and adding a further bias-reduction step, we arrive at our proposed estimator. The resulting estimator is consistent (see Methods section IV A 5) and is demonstrated on synthetic examples in Results to be substantially superior to estimators based on time discretisation across a number of metrics.
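The kNN entropy estimators referred to here are of the Kozachenko-Leonenko type, in which the density around each sample is inferred from the distance to its k-th nearest neighbour. Below is a minimal sketch of that general technique under the ℓ1 (Manhattan) norm used in this paper; it is our own illustration, not the proposed TE estimator itself, and the integer-argument digamma helper is ours:

```python
import numpy as np
from math import factorial, log

def _digamma_int(n):
    # digamma at a positive integer n: -gamma + sum_{k=1}^{n-1} 1/k
    return -0.5772156649015329 + sum(1.0 / k for k in range(1, n))

def kl_entropy(samples, k=4):
    """Kozachenko-Leonenko kNN entropy estimate (in nats), l1 norm.

    samples: (N, d) array of points drawn from the unknown density.
    """
    x = np.asarray(samples, dtype=float)
    n, d = x.shape
    # pairwise l1 distances; eps[i] = distance from point i to its k-th neighbour
    dists = np.abs(x[:, None, :] - x[None, :, :]).sum(axis=2)
    np.fill_diagonal(dists, np.inf)
    eps = np.sort(dists, axis=1)[:, k - 1]
    # log-volume of the unit-radius l1 ball in d dimensions: 2^d / d!
    log_cd = d * log(2.0) - log(factorial(d))
    return _digamma_int(n) - _digamma_int(k) + log_cd + d * float(np.mean(np.log(eps)))
```

For example, scaling all samples by a factor c shifts the estimate by exactly d ln c, mirroring the behaviour of differential entropy; this kind of structural correctness is one reason these estimators have such favourable bias properties.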

Surrogate samples, used to test whether a TE estimate is statistically different from zero, have historically been created by either shuffling the original source samples or by shifting the source process in time. However, this results in surrogates which conform to an incorrect null hypothesis that the transitions in the target are completely independent of the source histories. That is, they conform to the factorisation p(X_t, Y_{<t} | X_{<t}, Z_{<t}) = p(X_t | X_{<t}, Z_{<t}) p(Y_{<t}). This leads to the possibility that a process pair which conforms to the correct null of conditional independence, but in which the source histories are not completely independent of the target, will incorrectly be flagged as exhibiting statistically significant TE.

As shown in Results, this can lead to incredibly high false positive rates in certain settings such as the presence of strong common driver effects. Therefore, in order to have a suitable significance test for use in conjunction with the proposed estimator, we also present (in Methods section IV B) an adaptation of a recently proposed local permutation method [8] to our specific case. This adapted scheme produces surrogates which conform to the correct null hypothesis of conditional independence of the present of the target and the source history, given the histories of the target and further conditioning processes. This is the condition that p(X_t, Y_{<t} | X_{<t}, Z_{<t}) = p(X_t | X_{<t}, Z_{<t}) p(Y_{<t} | X_{<t}, Z_{<t}). All deviations of the surrogates from the original process are thus consistent with the appropriate null.
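The core idea of the local permutation approach can be sketched as follows: source histories are permuted only among samples whose target (and conditioning) histories are similar, which preserves p(X_t | X_{<t}, Z_{<t}) while breaking any source-target dependence. This is a simplified illustration of the approach of [8], not the exact procedure of Methods section IV B; in particular, it samples neighbours with replacement:

```python
import numpy as np

def local_permutation_surrogate(source_hist, target_hist, k=5, rng=None):
    """Swap source history embeddings among k-nearest neighbours in
    target-history space (l1 distances), approximating the null
    p(x_t, y_hist | x_hist) = p(x_t | x_hist) p(y_hist | x_hist).

    source_hist: (N, d_y) source history embeddings.
    target_hist: (N, d_x) target (and conditioning) history embeddings.
    """
    rng = np.random.default_rng(rng)
    th = np.asarray(target_hist, dtype=float)
    sh = np.asarray(source_hist, dtype=float)
    n = len(th)
    dists = np.abs(th[:, None, :] - th[None, :, :]).sum(axis=2)
    surrogate = np.empty_like(sh)
    for i in range(n):
        # k nearest target histories (includes sample i itself)
        neighbours = np.argsort(dists[i])[:k]
        j = rng.choice(neighbours)
        surrogate[i] = sh[j]  # a source history from a similar context
    return surrogate
```

Because every surrogate source history is drawn from a sample with a nearly identical target history, any remaining target-history dependence (e.g. a common driver imprinted on both processes) survives in the surrogates, which is exactly what the naive shuffle and time-shift methods destroy.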

We show in Results that the combination of the proposed estimator and surrogate generation method is capable of correctly distinguishing between zero and non-zero information flow in cases where the history of the source might have a strong pairwise correlation with the occurrence of events in the target, but is nevertheless conditionally independent. The combination of the discrete-time estimator and a traditional method of surrogate generation is shown to be incapable of making this distinction. Further, we demonstrate that the proposed combination is capable of inferring the connectivity of a simple circuit of biophysical model neurons inspired by the crustacean stomatogastric ganglion, a task which the discrete-time estimator was shown to be incapable of performing. This provides strong impetus for the application of the proposed techniques to investigate information flows in spike-train data recorded from biological neurons.

II. RESULTS

The first two subsections here present the results of the continuous-time estimator applied to two different synthetic examples for which the ground truth value of the TE is known. The first example considers independent processes where Ṫ_{Y→X} = 0, whilst the second examines coupled Poisson processes with a known, non-zero Ṫ_{Y→X}. The continuous-time estimator's performance is also contrasted with that of the discrete-time estimator. The emphasis of these sections is on properties of the estimators: their bias, variance and consistency (see Methods).


The third and fourth subsections present the results of the combination of the continuous-time estimator and the local permutation surrogate generation scheme applied to two examples: one synthetic and the other a biologically plausible model of neural activity. The comparison of the estimates to a population of surrogates produces p-values for the statistical significance of the TE values being distinct from zero and implying a directed relationship. The results are compared to the known connectivity of the studied systems. These p-values could be translated into other metrics such as ROC curves and false-positive rates, but we choose to instead visualise the distributions of the p-values themselves. The combination of the discrete-time estimator along with a traditional method for surrogate generation (time shifts) is also applied to these examples for comparison.

A. No TE Between Independent Homogeneous Poisson Processes

(a) k = 1 (b) k = 5

FIG. 2: Evaluation of the continuous-time estimator on independent homogeneous Poisson processes. The blue line shows the average TE rate across multiple runs and the shaded area shows the standard deviation. Plots are shown for two different values of the k nearest neighbours, and four different values of the ratio of the number of sample points to the number of events N_U/N_X (see Methods, sections IV A 4 and IV A 5).

The simplest example on which we can attempt to validate the estimator is independent homogeneous Poisson processes, where the true value of the TE between such processes is zero.

Pairs of independent homogeneous Poisson processes were generated, each with rate λ̄ = 1, and contiguous sequences of N_X ∈ {1 × 10^2, 1 × 10^3, 1 × 10^4, 1 × 10^5} target events were selected. For the continuous-time estimator, the parameter N_U for the number of placed sample points was varied (see Methods sections IV A 4 and IV A 5) to check the sensitivity of estimates to this parameter. At each of these numbers of target events N_X, the averages are taken across 1000, 100, 20 and 20 tested process pairs respectively.

(a) l = 2, m = 2 (b) l = 5, m = 5

FIG. 3: Result of the discrete-time estimator applied to independent homogeneous Poisson processes. The solid line shows the average TE rate across multiple runs and the shaded area shows the standard deviation. Plots are shown for four different values of the bin width ∆t as well as different source and target embedding lengths, l and m.
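Such pairs of independent unit-rate processes can be generated by cumulatively summing exponentially distributed inter-event intervals, a standard construction (the function and variable names here are ours):

```python
import numpy as np

def homogeneous_poisson(rate, n_events, rng):
    """Event times of a homogeneous Poisson process with the given rate:
    inter-event intervals are i.i.d. Exponential(rate)."""
    return np.cumsum(rng.exponential(1.0 / rate, size=n_events))

rng = np.random.default_rng(0)
source = homogeneous_poisson(1.0, 10_000, rng)
target = homogeneous_poisson(1.0, 10_000, rng)  # independent of source
```

Because the two trains are generated from independent interval sequences, any non-zero TE estimate between them is pure estimator bias and variance, which is what this example isolates.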

Fig. 2 shows the results of these runs for the continuous-time estimator, using various parameter settings. In all cases, the Manhattan (ℓ1) norm is used as the distance metric and the embedding lengths are set to l_X = l_Y = 1 spike (see Methods sections IV A 4 and IV A 5). For this example, the set of conditioning processes Z is empty.

The plots show that the continuous-time estimator converges to the true value of the TE (0). This is a numerical confirmation of its consistency for independent processes. Moreover, it exhibits very low bias (as compared to the discrete-time estimator, Fig. 3) for all values of the k nearest neighbours and N_U/N_X parameters. The variance is relatively large for k = 1, although it is dramatically better for k = 5; this reflects known results for the variance of this class of estimators as a function of k [44].

Fig. 3 shows the result of the discrete-time estimator applied to the same independent homogeneous Poisson processes for two different combinations of the source and target history embedding lengths, l and m time bins, and four different bin sizes ∆t. At each of the numbers of target events N_X, the averages are taken across 1000, 100, 100 and 100 tested process pairs respectively. The variance of this estimator on this process is low and comparable to the continuous-time estimator; however, the bias is very large and positive for short processes. It would appear that, in the case of these independent processes, it is consistent.

B. Consistent TE Between Unidirectionally Coupled Processes

(a) Firing rate of the target process as a function of the time since the most recent source spike (t_y^1) (b) Continuous-time estimator (c) Discrete-time estimator

FIG. 4: The discrete-time and continuous-time estimators were run on coupled Poisson processes for which the ground-truth value of the TE is known. (a) shows the firing rate of the target process as a function of the history of the source. (b) and (c) show the estimates of the TE provided by the two estimators. The blue line shows the average TE rate across multiple runs and the shaded area shows the standard deviation. The black line shows the true value of the TE. For the continuous-time estimator the parameter values of N_U/N_X = 1 and k = 4 were used along with the ℓ1 (Manhattan) norm. Plots are shown for three different values of the length of the target history component l_X. For the discrete-time estimator, plots are shown for four different values of the bin width ∆t. The source and target history embedding lengths are chosen such that they extend back one time unit.

The estimators were also tested on an example of unidirectionally coupled spiking processes with a known value of TE (previously presented as example B in [5]). Here, the source process Y is a homogeneous Poisson process. The target process X is produced as a conditional point process where the instantaneous rate is a function of the time since the most recent source event. More specifically:

$$\lambda_{x \mid \mathbf{x}_{<x}, \mathbf{y}_{<x}}\left[\mathbf{x}_{<x}, \mathbf{y}_{<x}\right] = \lambda_{x \mid y}\left(t_y^1\right). \tag{5}$$

Here, t_y^1 is the time since the most recent source event. As a function of t_y^1, the target spike rate λ_{x|y} deviates from a base rate, with this dependence cut off beyond t_cut (the resulting rate function is plotted in Fig. 4a). The simulation was performed using a thinning algorithm [45]. Specifically, we first generated the source process at rate λ̄_y. We then generate the target as a homogeneous Poisson process with rate λ_h such that λ_h > λ_{x|y}(t_y^1) for all values of t_y^1. We then went back through all the events in this process and removed each event with probability 1 − λ_{x|y}(t_y^1)/λ_h. As with the previous example, once a pair of processes had been generated, a contiguous sequence of N_X target events was selected. Tests were conducted for the values of N_X ∈ {1 × 10^2, 1 × 10^3, 1 × 10^4, 1 × 10^5}. For the continuous-time estimator, the number of placed sample points N_U was set equal to N_X (see Methods sections IV A 4 and IV A 5). At each N_X, the averages are taken over 1000, 100, 20 and 20 tested process pairs respectively. Due to its slow convergence, the discrete-time estimator was also evaluated at a value of N_X = 1 × 10^6 target spikes, averaged over 20 process pairs.
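The thinning (rejection) procedure described above can be sketched generically as follows. The rate function passed in is a stand-in argument, not the specific λ_{x|y} used in this example, and the oversampling factor for candidate events is our own choice:

```python
import numpy as np

def thin_target(source, rate_fn, rate_max, t_max, rng):
    """Generate target events whose instantaneous rate is rate_fn(t_y),
    with t_y the time since the most recent source event, via thinning.

    Requires rate_max >= rate_fn(t_y) for all t_y."""
    # candidate events from a homogeneous Poisson process at rate_max
    # (oversample by 3x so the cumulative sum comfortably exceeds t_max)
    n_cand = int(3 * rate_max * t_max)
    candidates = np.cumsum(rng.exponential(1.0 / rate_max, size=n_cand))
    candidates = candidates[candidates < t_max]
    kept = []
    for t in candidates:
        prev = source[source < t]
        if len(prev) == 0:
            continue  # no source history yet, rate undefined
        t_y = t - prev[-1]  # time since most recent source event
        # keep the candidate with probability rate_fn(t_y) / rate_max
        if rng.uniform() < rate_fn(t_y) / rate_max:
            kept.append(t)
    return np.array(kept)
```

When rate_fn is constant and equal to the bound rate_max, every candidate is kept and the output reduces to a homogeneous Poisson process, which is a convenient sanity check on the implementation.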

Spinney et al. [5] present a numerical method for calculating the TE for this process. For the parameter values used here the true value of the TE is 0.5076 ± 0.001.

Given that we know that the dependence of the target on the source is fully determined by the distance to the most recent event in the source, we used a source embedding length of l_Y = 1. The estimators were run with three different values of the target embedding length l_X ∈ {1, 2, 3} (see Methods sections IV A 4 and IV A 5). For this example, the set of conditioning processes Z is empty.

Fig. 4b shows the results of the continuous-time estimator applied to the simulated data. We used the value of k = 4 and the Manhattan (ℓ1) norm. The results displayed are as expected in that for a short target history embedding length of l_X = 1 spike, the estimator converges to a slight over-estimate of the TE. The overestimate at shorter target history embedding lengths l_X can be explained in that perfect estimates of the $\sum_{i=1}^{N_X} \ln \lambda_{x \mid \mathbf{x}_{<x}}$ term require target history embeddings which cover the full extent of the target's history dependence; shorter values of l_X don't cover this period in many cases, leaving this rate underestimated and therefore the TE overestimated. For longer values of l_X ∈ {2, 3} we see that they converge closely to the true value of the TE. This is a further numerical confirmation of the consistency of the continuous-time estimator.

Fig. 4c shows the results of the discrete-time estimator applied to the same process, run for four different values of the bin width ∆t ∈ {1, 0.5, 0.2, 0.1} time units. The number of bins included in the history embeddings was chosen such that they extended one time unit back (the known length of the history dependence). Smaller bin sizes could not be used as this caused the machine on which the experiments were running to run out of memory. The plots are a clear demonstration that the discrete-time estimator is very biased and not consistent. At a bin size of ∆t = 0.2 it converges to a value less than half the true TE. Moreover, its convergence is incredibly slow. At the bin size of ∆t = 0.1 it would appear to not have converged even after 1 million target events, and indeed it is not even converging to the true value of the TE. The significance of the performance improvement by our estimator is explored further in Discussion.

C. Identifying Conditional Independence Despite Strong Pairwise Correlations

262 The existence of a set of conditioning processes under which the present of the target

263 component is conditionally independent of the past of the source implies that, under certain

264 weak assumptions, there is no causal connection from the source to the target [46–48]. More

265 importantly, TE can be used to test for such conditional independence (see Methods), thus

266 motivating its use in network inference. A large challenge faced in testing for conditional

267 independence is correctly identifying “spurious” correlations, whereby conditionally inde-

13 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.16.154377; this version posted June 16, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

T + ξM M

aD1 + ξD1 D1

aD2 + ξD2 D2

FIG. 5: Diagram of the noisy copy process. Events in the mother process M occur periodically with intervals T + ξM (ξ is used for noise terms). Events in the daughter

processes D1 and D2 occur after each event in the mother process, at a distance of aDi + ξDi

(a) Raw TE and surrogate values (b) Bias corrected TE values

˙ FIG. 6: Results of the continuous-time estimator run on a noisy copy process TD1 D2 M , where conditioning on a strong common driver M should lead to zero information→ flow| being inferred. The shift σ of the source, relative to the target and common driver, controls the strength of the correlation between the source and target (maximal at zero shift). For each shift, the estimator is run on both the original process as well as embeddings generated via two surrogate generation methods: our proposed local permutation method and a traditional source time-shift method. The shaded regions represent the standard deviation. The bias of the estimator changes with the shift, and we expect the estimates to be consistent with appropriately generated surrogates reflecting the same strong common driver effect. This is the case for our local permutation surrogates, as shown in (a). This leads to the correct bias-corrected TE value of 0, as shown in (b).

268 pendent components might have a strong pairwise correlation. This problem is particularly

pronounced when investigating the spiking activity of biological neurons, populations of which often exhibit highly correlated behaviour through various forms of synchrony [49–51], or common drivers [52, 53]. In this subsection we demonstrate that the combination of the presented estimator and surrogate generation scheme is particularly adept at identifying conditional independence in the face of strong pairwise correlations on a synthetic example. Moreover, the combination of the traditional discrete-time estimator and surrogate generation techniques is demonstrated to be ineffective on this task.

FIG. 7: The p values obtained when using the continuous-time and discrete-time estimators to infer non-zero information flow in the noisy copy process. The estimators are applied to both Ṫ_{D1→D2|M} (expected to have zero flow) and Ṫ_{M→D2|D1} (expected to have non-zero flow and therefore be indicated as statistically significant). (a) Continuous time. (b) Discrete time. Each panel shows results for both daughter noise levels, σ_D = 5e-2 and σ_D = 7.5e-2, for the zero-flow and non-zero-flow cases. Only the results from the continuous-time estimator match these expectations. Ticks represent the particular combination of estimator and surrogate generation scheme making the correct inference in the majority of cases when a cutoff value of p = 0.05 is used. Diamonds show outliers.

The chosen synthetic example models a common driver effect, where an apparent directed coupling between a source and target is only due to a common parent. In such cases, despite a strong induced correlation between the source history and the occurrence of an event in the target, we expect to measure zero information flow when conditioning on the common driver. Our system here consists of a quasi-periodic 'mother' process M (the common driver) and two 'daughter' processes, D1 and D2 (see Fig. 5 for a diagram of the process). The mother process contains events occurring at intervals of T + ξ_M, with the daughter processes being noisy copies with each event shifted by an amount a_{Di} + ξ_{Di}. We also choose that a_{D1} < a_{D2}; so long as the difference between these a_{Di} values is large compared to the size of the noise terms, this will ensure that the events in D1 precede those in D2. When conditioning on the mother process, the TE from the first daughter to the second, Ṫ_{D1→D2|M}, should be 0. However, the history of the source daughter process is strongly correlated with the occurrence of events in the second daughter process — the events in D1 will precede those in D2 by the constant amount a_{D2} − a_{D1} plus a small noise term ξ_{D2} − ξ_{D1}.

Due to the noise in the system, this level of correlation will gradually break down if we

shift the source daughter process relative to the others. This allows us to do two things. Firstly, we can get an idea of the bias of the estimator on conditionally independent processes for different levels of pairwise correlation between the history of the source and events in the target. Secondly, we can evaluate different schemes of generating surrogate TE distributions as a function of this correlation. We would expect that, for well-generated surrogates which reflect the relationships to the conditional process, the TE estimates on these conditionally independent processes will closely match the surrogate distribution.

We simulated this process using the parameter values of T = 1.0, a_{D1} = 0.25, a_{D2} = 0.5, ξ_M ∼ N(0, σ_M²), ξ_{D1} ∼ N(0, σ_D²) and ξ_{D2} ∼ N(0, σ_D²), with σ_M = 0.05 and σ_D = 0.05. Once the process had been simulated, the source process D1 was shifted by an amount σ. We used values of σ between −10T and 10T, at intervals of 0.13T. For each such σ, the process was simulated 200 times. For each simulation, the TE was estimated on the original process with the shift in the first daughter as well as on a surrogate generated according to our proposed local permutation scheme (see Sec. IV B for a detailed description). The parameter values of k_perm = 10 and N_{U,surrogate} = N_X were used. For comparison, we also generated surrogates according to the traditional source time-shift method, where this shift was distributed uniformly at random between 200 and 300. A contiguous region of 50000 target events was extracted and the estimation was performed on this data. The continuous-time estimator used the parameter values of l_X = l_Y = l_{Z1} = 1, k = 10, N_U = N_X and the Manhattan (ℓ1) norm.
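The noisy copy process just described is simple to reproduce. The following is a minimal sketch, assuming only the parameter values stated above; the function and variable names are ours, not from the paper's implementation:

```python
import numpy as np

def simulate_noisy_copy(n_events, T=1.0, a_d1=0.25, a_d2=0.5,
                        sigma_m=0.05, sigma_d=0.05, seed=0):
    """Simulate the mother process M and its two noisy daughter copies."""
    rng = np.random.default_rng(seed)
    # Mother process: quasi-periodic events at intervals of T + xi_M.
    mother = np.cumsum(T + rng.normal(0.0, sigma_m, n_events))
    # Each daughter copies every mother event with lag a_Di + xi_Di.
    d1 = mother + a_d1 + rng.normal(0.0, sigma_d, n_events)
    d2 = mother + a_d2 + rng.normal(0.0, sigma_d, n_events)
    return mother, d1, d2

mother, d1, d2 = simulate_noisy_copy(50000)
# D1 events precede D2 events by a_D2 - a_D1 = 0.25 on average,
# plus the small noise term xi_D2 - xi_D1.
```

With the noise scales used here, shifting D1 relative to the other processes (the shift σ of Fig. 6) then simply amounts to adding a constant to the D1 event times.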

The results in Fig. 6a demonstrate that the null distribution of TE values produced by the local permutation surrogate generation scheme closely matches the distribution of TE values produced by the continuous-time estimator applied to the original data. Whilst the raw TE estimates retain a slight negative bias (explored further in the Discussion), we can generate a bias-corrected TE with the surrogate mean subtracted from the original estimate (giving an "effective transfer entropy" [54]). This bias-corrected TE, as displayed in Fig. 6b, is consistent with zero because of the close match between our estimated value and surrogates, which is the desired result in this scenario. By contrast, the TE values estimated on the surrogates generated by the traditional time-shift method are substantially lower than those estimated on the original process (Fig. 6a); comparison to these would produce very high false positive rates for significant directed relationships. This is most pronounced for high levels of pairwise source-target correlation (with shifts σ near zero). The reason behind this difference in the two approaches is easy to intuit. The traditional time-shift method destroys all relationship between the history of the source and the occurrence of events in the target. This means that we are comparing estimates of the TE on processes where there is a strong pairwise correlation between the history of the source and the occurrence of target events, with estimates of the TE on surrogate processes where these are fully independent.

Specifically, in discrete time, the joint distribution of the present of the target and the source history, conditioned on the other histories, decomposes as p(X_t, Y_{<t} | X_{<t}, Z_{<t}) = p(X_t | X_{<t}, Z_{<t}) p(Y_{<t}). By contrast, in our proposed local permutation scheme, whilst the history of the source and the occurrence of events in the target are conditionally independent, this scheme maintains the relationship between the history of the source and the mediating variable, which in this case is the history of the mother process. That is, the scheme produces surrogates where (working in discrete time for now) the joint distribution of the present of the target and the source history, conditioned on the other histories, decomposes as p(X_t, Y_{<t} | X_{<t}, Z_{<t}) = p(X_t | X_{<t}, Z_{<t}) p(Y_{<t} | X_{<t}, Z_{<t}). See Methods for how this decomposition carries over to the continuous-time framework.
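In code, one plausible form of such a local permutation (a simplified sketch of the idea, not the exact algorithm of Methods Sec. IV B; all names are illustrative) redistributes source-history embeddings only among samples whose target and conditioning histories are similar:

```python
import numpy as np

def local_permutation_surrogate(y, xz, k_perm=10, seed=0):
    """Swap each sample's source history y[i] with that of one of its
    k_perm nearest neighbours in (target, conditioning)-history space xz,
    approximately preserving p(y | x, z) while destroying any direct
    dependence of the target's present on y."""
    rng = np.random.default_rng(seed)
    n = len(y)
    # Brute-force l1 (Manhattan) distances between history embeddings.
    dists = np.abs(xz[:, None, :] - xz[None, :, :]).sum(axis=-1)
    order = np.argsort(dists, axis=1)  # column 0 is the point itself
    y_surr = np.empty_like(y)
    used = np.zeros(n, dtype=bool)
    for i in rng.permutation(n):
        neighbours = order[i, 1:k_perm + 1]
        free = neighbours[~used[neighbours]]
        # Prefer a neighbour whose source history has not been used yet.
        j = free[0] if len(free) else rng.choice(neighbours)
        y_surr[i] = y[j]
        used[j] = True
    return y_surr
```

Because a sample's replacement source history comes from a neighbour with a similar (X, Z) history, the surrogate retains the source-conditional relationship, which is exactly what the time-shift method destroys.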

We then confirm that the proposed scheme is able to identify cases where an information flow does exist. To do so, we applied it to measure Ṫ_{M→D2|D1} in the above system, where we would expect to see non-zero information flow from the common driver or mother to one daughter process, conditioned on the other. The setup used was identical to the above, however focussing on a shift of σ = 0, and, for completeness, two different levels of noise in the daughter processes were used: σ_D = 0.05 and σ_D = 0.075. We recorded the p values produced by the combination of the proposed continuous-time estimator and the local permutation surrogate generation scheme when testing for conditional information flow Ṫ_{M→D2|D1}, as well as contrasting to the expected zero flow in Ṫ_{D1→D2|M}. These flows were measured in 10 runs each and the distributions of the resulting p values are shown in Fig. 7. We observe that our proposed combination assigns a p value of zero in every instance of Ṫ_{M→D2|D1} as expected; whilst for Ṫ_{D1→D2|M} it assigns p values in a broad distribution above zero, meaning the estimates are consistent with the null distribution as expected.

We also applied the combination of the discrete-time estimator and the traditional time-shift method of surrogate generation to this same task of distinguishing between zero and non-zero conditional information flows. We used time bins of width ∆t = 0.05 and history lengths of 7 samples for the target, source and conditional histories. We also applied a procedure to determine optimal lags to the target for both the conditional and source histories, following [17]. That is, before calculating the conditional TE from the source to the target, we determined the optimal lag between the conditional history and the target by calculating the pairwise TE between the conditioning process and the target for all lags between 0 and 10. The lag which produced the maximum such TE was used. We then calculated the conditional TE between the source and the target, using this determined lag for the conditioning process, for all lags to the source process between 0 and 10. The TE was then determined to be the maximum TE estimated over all these different lags applied to the source process. This procedure was applied when estimating the TE on the original process as well as on each separate surrogate. The results of this procedure are also displayed in Fig. 7. Here we see that the combination of the discrete-time estimator and the traditional time-shift method of surrogate generation assigns a p value of zero to every single example of both Ṫ_{M→D2|D1} and Ṫ_{D1→D2|M}. This result – contradicting the expectation that Ṫ_{D1→D2|M} is consistent with zero – suggests that this benchmark approach has an extremely high false positive rate here.
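As an illustration of this lag-search procedure, the sketch below wraps a toy plug-in discrete-time TE estimator (our own simplified stand-in for exposition, not the estimator used in the paper) and maximises over candidate source lags:

```python
import numpy as np
from collections import Counter

def _cond_entropy_bits(pairs):
    """Plug-in H(value | condition) in bits from (value, condition) tuples."""
    joint = Counter(pairs)
    cond = Counter(c for _, c in pairs)
    n = len(pairs)
    return -sum((k / n) * np.log2(k / cond[c]) for (v, c), k in joint.items())

def plugin_te(source, target, hist_len=2, lag=1):
    """Discrete-time TE (bits) = H(X_t | X_hist) - H(X_t | X_hist, Y_hist)."""
    triples = []
    for t in range(hist_len + lag, len(target)):
        x_hist = tuple(target[t - hist_len:t])
        y_hist = tuple(source[t - lag - hist_len + 1:t - lag + 1])
        triples.append((target[t], x_hist, y_hist))
    h_x = _cond_entropy_bits([(v, xh) for v, xh, _ in triples])
    h_xy = _cond_entropy_bits([(v, (xh, yh)) for v, xh, yh in triples])
    return h_x - h_xy

def te_with_lag_search(source, target, hist_len=2, max_lag=10):
    """Take the maximum TE over candidate source lags, as described above."""
    return max(plugin_te(source, target, hist_len, lag)
               for lag in range(1, max_lag + 1))
```

Note that maximising an estimate over lags introduces its own positive bias, which is one reason the identical search must also be applied to every surrogate.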

D. Connectivity Inference of the Pyloric Circuit of the Crustacean Stomatogastric Ganglion

The pyloric circuit of the crustacean stomatogastric ganglion has been proposed as a benchmark circuit on which to test spike-based connectivity inference techniques [55, 56]. It has been shown that Granger causality (which is equivalent to TE under the assumption of Gaussian variables [57]) is unable to infer the connectivity of this network [55]. In contrast, we demonstrate here the efficacy of the proposed estimators for TE on models of this system.

The crustacean stomatogastric ganglion [58–60] has received substantial research attention as a simple model circuit. Of great importance to network inference is the fact that its full connectivity is known. The pyloric circuit is a partially independent component of the greater circuit and consists of an Anterior Burster (AB) neuron, two Pyloric Driver (PD) neurons, a Lateral Pyloric (LP) neuron and multiple Pyloric (PY) neurons. As the AB neuron is electrically coupled to the PD neurons and the PY neurons are identical, for the purposes of modelling and network inference, the circuit is usually represented by a single AB/PD complex, and single LP and PY neurons [55, 56, 61, 62].

The AB/PD complex undergoes self-sustained rhythmic bursting. It inhibits the LP and PY neurons through slow cholinergic and fast glutamatergic synapses. These neurons then burst on rebound from this inhibition. The PY and LP neurons also inhibit one another through fast glutamatergic synapses and the LP neuron similarly inhibits the AB/PD complex.

FIG. 8: Results of both estimator and surrogate generation combinations being applied to data from simulations of a biophysical model of a neural circuit inspired by the pyloric circuit of the crustacean stomatogastric ganglion. (a) Circuit connectivity diagram. (b) Example membrane potential traces produced by the circuit. (c) Distribution of p values from the continuous-time estimator and the local permutation surrogate generation method, for each of the six possible directed connections (AB/PD→LP, AB/PD→PY, LP→AB/PD, LP→PY, PY→AB/PD and PY→LP). (d) The corresponding distributions from the discrete-time estimator and the source time-shift surrogate generation method. The circuit, shown in (a), is fully connected apart from the missing connection between the PY neuron and the AB/PD complex, and generates membrane potential traces which are bursty and highly periodic with cross-correlated activity. The distribution of p values in (c) demonstrates that the continuous-time combination is capable of correctly inferring this circuit in most runs. By contrast, the discrete-time combination, whose p values are shown in (d), mis-specified the connectivity in every run. Ticks represent the particular combination of estimator and surrogate generation scheme making the correct inference in the majority of cases when a cutoff value of p = 0.05 is used.

Fig. 8 shows sample membrane potential traces from simulations of this circuit as well as a connectivity diagram. Despite its small size, inference of its connectivity is challenging [55, 56] due to the fact that it is highly periodic. Although there is no structural connection from the PY to the AB/PD neuron, there is a strong, time-directed correlation between their activity — the PY neuron always bursts shortly before the AB/PD. Recognising that this is a spurious correlation requires fully resolving the influence of the AB/PD's history on itself as well as that of the LP on the AB/PD. To further complicate matters, the true connection between the LP and AB/PD neurons is very challenging to detect. The AB/PD complex will continue bursting regardless of any input from the LP. Correctly inferring this connection requires detecting the subtle changes in the timing of AB/PD bursts that result from the activity of the LP.

Previous work on the inference of the pyloric circuit has used both in vitro and in silico data [55, 56]. We ran simulations of biophysical models inspired by this network, similar to those used in [55] (see Appendix A). Attempts were then made to infer this network by detecting non-zero conditional information flow from the spiking event times produced by the simulations. This was done by, for every source-target pair, estimating the TE from the source to the target, conditioned on the activity of the third remaining neuron. Both the combination of the proposed continuous-time estimator and local permutation surrogate generation scheme and the combination of the discrete-time estimator and source time-shift surrogate generation scheme were applied to this task.

Both combinations were applied to ten independent simulations of the network and the number of target events N_X = 1 × 10⁴ was used. For the continuous-time estimator the parameter values of l_X = l_Y = l_{Z1} = 1, k = 10, N_U = N_X, N_{U,surrogate} = N_X and k_perm = 5 were used, along with the Manhattan (ℓ1) norm (see Methods Sec. IV A 4 and IV A 5). The discrete-time estimator made use of a bin size of ∆t = 0.05 and history embedding lengths of seven bins for each of the source, target and conditioning processes. Searches were performed to determine the optimum embedding lag for both the source and conditioning histories (as per Sec. II C) with a maximum search value of 20 bins being used. We designed the search procedure to include times up to the inter-burst interval (around 1 time unit), which placed an effective lower bound on the width of the time bins (as bin sizes below ∆t = 0.05 resulted in impractically large search spaces). For both estimators, p values were inferred from 100 independently generated surrogates (see Methods Sec. IV B). The source time-shift surrogate generation scheme used time shifts distributed uniformly randomly between 200 and 400 time units.
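The surrogate-based quantities used here can be summarised in a few lines. A sketch, assuming the common convention of a one-sided empirical p value (the paper's exact tie-handling may differ):

```python
import numpy as np

def surrogate_p_value(te_original, te_surrogates):
    """Fraction of surrogate TE values at least as large as the original."""
    return float(np.mean(np.asarray(te_surrogates) >= te_original))

def effective_te(te_original, te_surrogates):
    """Bias-corrected ("effective") TE [54]: subtract the surrogate mean."""
    return float(te_original - np.mean(te_surrogates))

surr = [0.10, 0.20, 0.60, 0.05]
print(surrogate_p_value(0.5, surr))  # 0.25
print(effective_te(0.5, surr))       # 0.2625
```

A small p value indicates that the estimate on the original data is inconsistent with the null distribution of zero conditional information flow.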

Figures 8c and 8d show the distributions of p values resulting from the application of both estimator and surrogate generation scheme combinations. The continuous-time estimator and local permutation surrogate generation scheme were able to correctly infer the connectivity of the network in the majority of cases (indicated by p values approaching 0 for the true positives, and spread throughout [0, 1] for the true negatives). On the other hand, the discrete-time estimator and source time-shift surrogate generation scheme produced the same two erroneous predictions on every run: a connection between the PY neuron and the AB/PD complex along with a missing connection between the LP and PY neurons.

III. DISCUSSION

Despite transfer entropy being a popular tool within neuroscience and other domains of enquiry [9–11, 13–19], it has received more limited application to event-based data such as spike trains. This is at least partially due to current estimation techniques requiring the process to be recast as a discrete-time phenomenon. The resulting discrete-time estimation task has been beset by difficulties including a lack of consistency, high bias, slow convergence and an inability to capture effects which occur over fine and large time scales simultaneously.

This paper has built on recent work presenting a continuous-time formalism for TE [5] in order to derive an estimation framework for TE on event-based data in continuous time. This framework has the unique advantage of only estimating quantities at events in the target process, providing a significant computational advantage. Instead of comparing spike rates conditioned on specific histories at each target spiking event, though, we use a Bayesian inversion to instead make the empirically easier comparison of probabilities of histories at target events versus anywhere else along the target process. This comparison, using KL divergences, is made using k-NN techniques, which brings desirable properties such as efficiency for the estimator. This estimator is provably consistent. Moreover, as it operates on inter-event intervals, it is capable of capturing relationships which occur with fine time precision along with those that occur over longer time distances.

The estimator was first evaluated on two simple examples for which the ground truth is known: pairs of independent Poisson processes (Sec. II A) as well as pairs of processes unidirectionally coupled through a simple functional relationship (Sec. II B). The discrete-time estimator was also applied to these processes. It was found that the continuous-time estimator had substantially lower bias than the discrete-time estimator, converged orders of magnitude faster (in terms of the number of sample spikes required), and was relatively insensitive to parameter selections. Moreover, these examples provided numerical confirmation of the consistency of the continuous-time estimator, and further demonstration that the discrete-time estimator is not consistent. The latter simple example highlighted the magnitude of the shortcomings of the discrete-time estimator. In the authors' experience, spike-train datasets which contain 1 million spiking events for a single neuron are vanishingly rare. However, even in the unlikely circumstance that the discrete-time estimator is presented with a dataset of this size, as in Sec. II B, it could not accurately estimate the TE for a simple one-way relationship between only two neurons. Moreover, this example neatly demonstrates a notable problem with the use of the discrete-time estimator, which is that it provides wildly different estimates for different values of ∆t. In real-world applications, where the ground truth is unknown, there is no principled method for choosing which resulting TE value from the various bin sizes to use.
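The binning step at the root of this problem is easy to state in code. A minimal sketch (our own illustration, not the paper's implementation): the same event times yield a different binary sequence, and even a different number of distinguishable events, for every choice of ∆t:

```python
import numpy as np

def bin_spike_train(spike_times, dt, t_end):
    """Convert event times into the binary sequence used by the
    discrete-time estimator: 1 if a bin contains at least one event."""
    binary = np.zeros(int(np.ceil(t_end / dt)), dtype=int)
    binary[(np.asarray(spike_times) / dt).astype(int)] = 1
    return binary

spikes = [0.113, 0.125, 0.503, 2.301]
# Coarse bins merge the first two events into a single occupied bin:
print(bin_spike_train(spikes, 0.5, 3.0).sum())   # 3
# Fine bins keep them distinct, but the sequence is now 300 samples long:
print(bin_spike_train(spikes, 0.01, 3.0).sum())  # 4
```

The continuous-time estimator avoids this trade-off entirely by operating on the raw inter-event intervals.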

One of the principal use-cases of TE is the inference of non-zero information flow. As the TE is estimated from finite data, we require a manner of determining the statistical significance of the estimated values. Traditional methods of surrogate generation for TE either shift the source in time, or shuffle the source embeddings. However, whilst this retains the relationship of the target to its past and other conditionals, it completely destroys the relationship between the source and any conditioning processes, which can lead to very high false positive rates, as detailed in Results Sec. II C and Sec. IV B. We developed a local permutation scheme for use in conjunction with this estimator which is able to maintain the relationship of the source history embeddings with the history embeddings of the target and conditioning processes. The combination of the proposed estimator and this surrogate generation scheme was applied to an example where the history of the source and the occurrence of events in the target are highly correlated, but conditionally independent given their common driver (Sec. II C). The established time-shift method for surrogate generation produced a null distribution of TE values substantially below that estimated on the original data, incorrectly implying non-zero information flow. Conversely, the proposed local permutation method produced a null distribution which closely tracked the estimates on the original data. The proposed combination was also shown to be able to correctly distinguish between cases of zero and non-zero information flow. When applied to the same example, the combination of the discrete-time estimator and the traditional method of time-shifted surrogates inferred the existence of information flow in all cases, even when no such flow was present.

Finally, our proposed approach was applied to the inference of the connectivity of the pyloric network of the crustacean stomatogastric ganglion. The inference of this network is challenging due to its highly periodic dynamics. Granger causality, using a more established estimator, has historically been demonstrated to be incapable of inferring its connectivity [55]; furthermore, we showed that the discrete-time binary-valued TE estimator (with time-shifted surrogates) also could not successfully infer the connectivity here. Despite these challenges, our combination of continuous-time estimator and surrogate generation scheme was able to correctly infer the connectivity of this network. This provides an important validation of the efficacy of our presented approaches on a challenging example of representative biological spiking data.

This work represents a substantial step forward in the estimation of information flows from event-based data. To the best of the authors' knowledge it is the first consistent estimator of TE for event-based data. That is, it is the first estimator which is known to converge to the true value of the TE in the limit of infinite data, let alone to provide efficient estimates with finite data. As demonstrated in Results Sec. II A and Sec. II B, it has substantially more favourable bias and convergence properties as compared to the discrete-time estimator. The fact that this estimator uses raw inter-event intervals as its history representation allows it to efficiently capture relevant information from the past of the source, target and conditional processes. This allows it to simultaneously measure relationships that occur both on very fine time scales as well as those that occur over long intervals. This was highlighted in Results Sec. II D, where it was shown that our proposed approach is able to correctly infer the connectivity of the pyloric circuit of the crustacean stomatogastric ganglion. The inference of this circuit requires capturing subtle changes in spike timing. However, its bursty nature means that there are long intervals of no spiking activity. This is contrasted with the poor performance of the discrete-time estimator on this same task, as above. The use of the discrete-time estimator requires a hard trade-off in the choice of bin size: small bins will be able to capture relationships that occur over finer timescales but will result in an estimator that is blind to history effects existing over large intervals. Conversely, whilst larger bins might be capable of capturing these relationships occurring over larger intervals, the estimator will be blind to effects occurring with fine temporal precision. Moreover, to the best of our knowledge, this work showcases the first use of a surrogate generation scheme for statistical significance estimates which correctly handles strong source-conditional relationships for event-based data. This has a crucial practical benefit in that it greatly reduces the occurrence of false positives in cases where the history of a source is strongly correlated with the present of the target, but conditionally independent of it.

Inspection of some plots, notably Fig. 6 of Sec. II C, shows that, in some cases, the estimator can exhibit a small though non-negligible bias. Indeed, similar biases can readily be demonstrated with the standard KSG estimator for transfer entropy on continuous variables in discrete time, in similar circumstances where a strong source-target relationship is fully explained by a conditional. The reason for the small remaining bias is that, while the underlying assumption of the nearest-neighbour estimators is of a fixed probability density within the range of the k nearest neighbours, strong conditional relationships tend to result in correlations remaining between the variables within this range. For the common use-case of inferring non-zero information flows this small remaining bias will not be an issue, as the proposed method for surrogate generation is capable of producing null distributions with very similar bias properties. Furthermore, such bias can be removed from an estimate by subtracting the mean of the surrogate distribution (as shown via the effective transfer entropy [54] in Results). However, it is foreseeable that certain scenarios might benefit from an estimator with lower bias, without having to resort to generating surrogates. In such cases it will likely prove beneficial to explore the combination of various existing bias reduction techniques for k-NN estimators with the approach proposed here. These include performing a whitening transformation on the data [63], transforming each marginal distribution to uniform, or exploring alternative approaches to sharing radii across entropy terms (see Sec. IV A 5). The authors believe that the most probable cause of the observed bias in the case of strong pairwise correlations is that they cause the assumption of local uniformity (see Methods) to be violated. Gao et al. [64] have proposed a method for reducing the bias of k-NN information-theoretic estimators which specifically addresses cases where local uniformity does not apply. The application of this technique to our estimator holds promise for addressing this bias.

IV. METHODS

There are a variety of approaches available for estimating information theoretic quantities from continuous-valued data [65]; here we focus on methods for generating estimates $\hat{\dot{T}}_{Y \to X | \mathcal{Z}}$ of a true underlying (conditional) transfer entropy $\dot{T}_{Y \to X | \mathcal{Z}}$.

The nature of estimation means that our estimates $\hat{\dot{T}}_{Y \to X | \mathcal{Z}}$ may have a bias with respect to the true value $\dot{T}_{Y \to X | \mathcal{Z}}$, and a variance, as a function of some metric $n$ of the size of the data being provided to the estimator (we use the number of spikes, or events, in the target process). The bias is a measure of the degree to which the estimator systematically deviates from the true value of the quantity being estimated, for finite data size. It is expressed as $\mathrm{bias}(\hat{\dot{T}}_{Y \to X | \mathcal{Z}}) = \mathbb{E}[\hat{\dot{T}}_{Y \to X | \mathcal{Z}}] - \dot{T}_{Y \to X | \mathcal{Z}}$. The variance of an estimator is a measure of the degree to which it provides different estimates for distinct, finite samples from the same process. It is expressed as $\mathrm{variance}(\hat{\dot{T}}_{Y \to X | \mathcal{Z}}) = \mathbb{E}[\hat{\dot{T}}_{Y \to X | \mathcal{Z}}^2] - \mathbb{E}[\hat{\dot{T}}_{Y \to X | \mathcal{Z}}]^2$. Another important property is consistency, which refers to whether, in the limit of infinite data, the estimator converges to the true value. That is, an estimator is consistent if and only if $\lim_{n \to \infty} \hat{\dot{T}}_{Y \to X | \mathcal{Z}} = \dot{T}_{Y \to X | \mathcal{Z}}$.

The first half of this methods section is concerned with the derivation of a consistent estimator of TE which operates in continuous time. In order to be able to test for non-zero information flow given finite data, we require a surrogate generation scheme to use in conjunction with the estimator. Such a surrogate generation scheme should produce surrogate history samples that conform to the null hypothesis of zero information flow. The second half of this section will focus on a scheme for generating these surrogates.

The presented estimator and surrogate generation scheme have been implemented in a software package which will be made available on publication (see the Implementation subsection).

A. Continuous-time estimator for transfer entropy between spike trains

In the following subsections, we describe the algorithm for our estimator $\hat{\dot{T}}_{Y \to X | \mathcal{Z}}$ of the transfer entropy between spike trains. We first outline our choice of a k-NN type estimator, due to the desirable consistency and bias properties of this class of estimator. In order to use such an estimator type, we then describe a Bayesian inversion we apply to the definition of


transfer entropy for spiking processes, which allows us to operate on probability densities of histories of the processes, rather than directly on spike rates. This results in a sum of differential entropies to which k-NN estimator techniques can be applied. The evaluation of these entropy terms using k-NN estimators requires a method for sampling history embeddings, which is presented before attention is turned to a technique for combining the separate k-NN estimators in a manner that will reduce the bias of the final estimate.

1. Consideration of estimator type

Although there has been much recent progress on parametric information-theoretic estimators [66], such estimators will always inject modelling assumptions into the estimation process. Even in the case that large, general, parametric models are used (as in [67]), there are no known methods of determining whether such a model is capturing all dependencies present within the data.

In comparison, nonparametric estimators make fewer explicit assumptions regarding the probability distributions. Early approaches included the use of kernels for the estimation of the probability densities [68]; however, this has the disadvantage of operating at a fixed kernel 'resolution'. An improvement was achieved by the successful, widely applied class of nonparametric estimators making use of k-nearest-neighbour statistics [44, 69-71], which dynamically adjust their resolution given the local density of points. Crucially, there are consistency proofs [70, 72] for k-NN estimators, meaning that these methods are known to converge to the true values in the limit of infinite data size. These estimators operate by decomposing the information quantity of interest into a sum of differential entropy terms $H_*$. Each entropy term is subsequently estimated by estimating the probability densities $p(x_i)$ at all the points in the sample by finding the distances to the $k$th nearest neighbours of the points $x_i$. The average of the logarithms of these densities is found and is adjusted by bias correction terms. In some instances, most notably the KSG estimator for mutual information [44], many of the terms in each entropy estimate cancel and so each entropy is only implicitly estimated.

Such bias and consistency properties are highly desirable; given the efficacy of k-NN estimators, it would be advantageous to be able to make use of such techniques in order to estimate the transfer entropy of point processes in continuous time. However, the continuous-time formulations in (3) and (4) contain no entropy terms, being written in terms of rates as opposed to probability densities. Moreover, the estimators for each differential entropy term $H_*$ in a standard k-NN approach operate on sets of points in $\mathbb{R}^d$, and it is unclear how these may be applied in our case.

The following subsection is concerned with deriving an expression for continuous-time transfer entropy on spike trains as a sum of $H_*$ terms, in order to define a k-NN type estimator.

2. Formulating continuous-time TE as a sum of differential entropies

Consider two point processes $X$ and $Y$ represented by sets of real numbers, where each element represents the time of an event. That is, $X \in \mathbb{R}^{N_X}$ and $Y \in \mathbb{R}^{N_Y}$. Further, consider the set of extra conditioning point processes $\mathcal{Z} = \{Z_1, Z_2, \dots, Z_{n_Z}\}$, $Z_i \in \mathbb{R}^{N_{Z_i}}$. We can define a counting process $N_X(t)$ on $X$. $N_X(t)$ is a natural number representing the 'state' of the process. This state is incremented by one at the occurrence of an event. The instantaneous firing rate of the target is then $\lambda_X(t) = \lim_{\Delta t \to 0} p\left(N_X(t + \Delta t) - N_X(t) = 1\right)/\Delta t$. Using this expression, (4) can then be rewritten as

$$\dot{T}_{Y \to X | \mathcal{Z}} = \bar{\lambda}_X \, \mathbb{E}_{P_X}\!\left[\ln \lim_{\Delta t \to 0} \frac{p_U\!\left(N_X(x + \Delta t) - N_X(x) = 1 \mid \mathbf{x}_{<x}, \mathbf{y}_{<x}, \mathbf{z}_{<x}\right)}{p_U\!\left(N_X(x + \Delta t) - N_X(x) = 1 \mid \mathbf{x}_{<x}, \mathbf{z}_{<x}\right)}\right] \quad (5)$$

Here, $\bar{\lambda}_X$ is the average, unconditional firing rate of the target process ($\bar{\lambda}_X = N_X/\tau$), whilst $\mathbf{x}_{<x} \in X_{<x}$, $\mathbf{y}_{<x} \in Y_{<x}$ and $\mathbf{z}_{<x} = \{\mathbf{z}_{1,<x}, \mathbf{z}_{2,<x}, \dots, \mathbf{z}_{n_Z,<x}\} \in \mathcal{Z}_{<x}$ are the histories of the target, source and conditioning processes. The probability density $p_U$ is taken to represent the probability density at any arbitrary point in the target process, unconditional of events in any of the processes. By contrast, $p_X$ is taken to represent the probability density of observing a quantity at target events. The expectation $\mathbb{E}_{P_X}$ is taken over this distribution. That is, $\mathbb{E}_{P_X}[f(Y)] = \int_Y f(\mathbf{y})\, p_X(\mathbf{y})\, d\mathbf{y}$.

By applying Bayes' rule we can make a Bayesian inversion to arrive at:

$$\dot{T}_{Y \to X | \mathcal{Z}} = \bar{\lambda}_X \, \mathbb{E}_{P_X}\!\left[\ln \lim_{\Delta t \to 0} \frac{p_U\!\left(\mathbf{x}_{<x}, \mathbf{y}_{<x}, \mathbf{z}_{<x} \mid N_X(x + \Delta t) - N_X(x) = 1\right)\, p_U\!\left(\mathbf{x}_{<x}, \mathbf{z}_{<x}\right)}{p_U\!\left(\mathbf{x}_{<x}, \mathbf{z}_{<x} \mid N_X(x + \Delta t) - N_X(x) = 1\right)\, p_U\!\left(\mathbf{x}_{<x}, \mathbf{y}_{<x}, \mathbf{z}_{<x}\right)}\right] \quad (6)$$


Equation (6) can be written as

$$\dot{T}_{Y \to X | \mathcal{Z}} = \bar{\lambda}_X \, \mathbb{E}_{P_X}\!\left[\ln \frac{p_X\!\left(\mathbf{x}_{<x}, \mathbf{y}_{<x}, \mathbf{z}_{<x}\right)\, p_U\!\left(\mathbf{x}_{<x}, \mathbf{z}_{<x}\right)}{p_X\!\left(\mathbf{x}_{<x}, \mathbf{z}_{<x}\right)\, p_U\!\left(\mathbf{x}_{<x}, \mathbf{y}_{<x}, \mathbf{z}_{<x}\right)}\right] \quad (7)$$

Equation (7) can be written as a sum of differential entropy and cross entropy terms:

$$\dot{T}_{Y \to X | \mathcal{Z}} = \bar{\lambda}_X \left[ -H\!\left(X_{<x}, Y_{<x}, \mathcal{Z}_{<x}\right) + H\!\left(X_{<x}, \mathcal{Z}_{<x}\right) + H_{P_U}\!\left(X_{<x}, Y_{<x}, \mathcal{Z}_{<x}\right) - H_{P_U}\!\left(X_{<x}, \mathcal{Z}_{<x}\right) \right] \quad (8)$$

Here, $H$ refers to an entropy term and $H_{P_U}$ refers to a cross entropy term. More specifically,

$$H\!\left(X_{<x}, \mathcal{Z}_{<x}\right) = -\int p_X\!\left(\mathbf{x}_{<x}, \mathbf{z}_{<x}\right) \ln p_X\!\left(\mathbf{x}_{<x}, \mathbf{z}_{<x}\right)\, d\mathbf{x}_{<x}\, d\mathbf{z}_{<x}$$

and

$$H_{P_U}\!\left(X_{<x}, \mathcal{Z}_{<x}\right) = -\int p_X\!\left(\mathbf{x}_{<x}, \mathbf{z}_{<x}\right) \ln p_U\!\left(\mathbf{x}_{<x}, \mathbf{z}_{<x}\right)\, d\mathbf{x}_{<x}\, d\mathbf{z}_{<x}$$

It is worth noting in passing that (7) can also be written as a difference of Kullback-Leibler divergences:

$$\dot{T}_{Y \to X | \mathcal{Z}} = \bar{\lambda}_X \left[ D_{\mathrm{KL}}\!\left(P_X(X_{<x}, Y_{<x}, \mathcal{Z}_{<x}) \,\|\, P_U(X_{<x}, Y_{<x}, \mathcal{Z}_{<x})\right) - D_{\mathrm{KL}}\!\left(P_X(X_{<x}, \mathcal{Z}_{<x}) \,\|\, P_U(X_{<x}, \mathcal{Z}_{<x})\right) \right] \quad (9)$$

The expressions in equations (8) and (9) represent a general framework for estimating the TE between point processes in continuous time. Any estimator of differential entropy $\hat{H}$ which can be adapted to the estimation of cross entropies can be plugged into (8) in order to estimate the TE. Similarly, any estimator of the KL divergence can be plugged into (9).

3. Constructing k-NN estimators for differential entropies and cross entropies

Following similar steps to the derivations in [44, 63, 72], assume that we have an (unknown) probability distribution $\mu(x)$ for $x \in \mathbb{R}^d$. Note that here $X$ is a general random variable (not necessarily a point process). We also have a set $X$ of $N_X$ points drawn from $\mu$.


In order to estimate the differential entropy $H$ we need to construct estimates of the form

$$\hat{H}(X) = -\frac{1}{N_X} \sum_{i=1}^{N_X} \widehat{\ln \mu}(x_i) \quad (10)$$

where $\widehat{\ln \mu}(x_i)$ is an estimate of the logarithm of the true density. Denote by $\epsilon(k, x_i, X)$ the distance to the $k$th nearest neighbour of $x_i$ in the set $X$ under some norm $L$. Further, let $p_i^{\mu}$ be the probability mass of the $\epsilon$-ball surrounding $x_i$. If we make the assumption that $\mu(x_i)$ is constant within the $\epsilon$-ball, we have $p_i^{\mu} = \frac{k}{N_X - 1} = c_{d,L}\, \epsilon(k, x_i, X)^d\, \mu(x_i)$, where $c_{d,L}$ is the volume of the $d$-dimensional unit ball under the norm $L$. Using this relationship, we can construct a simple estimator of the differential entropy:

$$\hat{H}(X) = -\frac{1}{N_X} \sum_{i=1}^{N_X} \ln \frac{k}{(N_X - 1)\, c_{d,L}\, \epsilon(k, x_i, X)^d} \quad (11)$$

We then add the bias-correction term $\ln k - \psi(k)$, where $\psi(x) = \Gamma^{-1}(x)\, d\Gamma(x)/dx$ is the digamma function and $\Gamma(x)$ the gamma function. This yields $\hat{H}_{\mathrm{KL}}$, the Kozachenko-Leonenko [69] estimator of differential entropy:

$$\hat{H}_{\mathrm{KL}}(X) = -\psi(k) + \ln(N_X - 1) + \ln c_{d,L} + \frac{d}{N_X} \sum_{i=1}^{N_X} \ln \epsilon(k, x_i, X) \quad (12)$$

This estimator has been shown to be consistent [69, 73].
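As an illustration of (12), the following sketch (our own Python; the paper's implementation is in Julia) realises the Kozachenko-Leonenko estimator with a brute-force neighbour search under the maximum norm, for which the unit-ball volume is $c_{d,L_\infty} = 2^d$:

```python
import math
import random

def digamma(k):
    # psi(k) for a positive integer k: -gamma + sum_{m=1}^{k-1} 1/m.
    return -0.5772156649015329 + sum(1.0 / m for m in range(1, k))

def kl_entropy(points, k=4):
    # Kozachenko-Leonenko differential entropy estimator, eq. (12), under the
    # max norm. eps is the distance from each point to its kth nearest
    # neighbour within the same set (the point itself is excluded).
    n, d = len(points), len(points[0])
    log_eps = 0.0
    for i, x in enumerate(points):
        dists = sorted(
            max(abs(a - b) for a, b in zip(x, y))
            for j, y in enumerate(points) if j != i
        )
        log_eps += math.log(dists[k - 1])
    return -digamma(k) + math.log(n - 1) + d * math.log(2) + (d / n) * log_eps

rng = random.Random(0)
sample = [(rng.random(),) for _ in range(1000)]
# The differential entropy of U(0, 1) is 0 nats; the estimate should be close.
print(kl_entropy(sample))
```

A practical implementation would replace the $O(N^2)$ brute-force search with a k-d tree, but the estimator itself is exactly the expression in (12).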

Assume that we now have two (unknown) probability distributions $\mu(x)$ and $\beta(x)$. We have a set $X$ of $N_X$ points drawn from $\mu$ and a set $Y$ of $N_Y$ points drawn from $\beta$. Using similar arguments to above, we denote by $\epsilon(k, x_i, Y)$ the distance from the $i$th element of $X$ to its $k$th nearest neighbour in $Y$. We then make the assumption that $\beta(x_i)$ is constant within the $\epsilon$-ball, and we have $p_i^{\beta} = \frac{k}{N_Y} = c_{d,L}\, \epsilon(k, x_i, Y)^d\, \beta(x_i)$. We can then construct a naive estimator of the cross entropy

$$\hat{H}_{\beta}(X) = -\frac{1}{N_X} \sum_{i=1}^{N_X} \ln \frac{k}{N_Y\, c_{d,L}\, \epsilon(k, x_i, Y)^d} \quad (13)$$

Again, we add the bias-correction term $\ln k - \psi(k)$ to arrive at an estimator of the cross entropy:

$$\hat{H}_{\beta,\mathrm{KL}}(X) = -\psi(k) + \ln N_Y + \ln c_{d,L} + \frac{d}{N_X} \sum_{i=1}^{N_X} \ln \epsilon(k, x_i, Y) \quad (14)$$

This estimator has been shown to be consistent [73].

Attention should be brought to the fundamental difference between estimating entropies and cross entropies using k-NN estimators. An entropy estimator takes a set $X$ and, for each $x_i \in X$, performs a nearest neighbour search in the same set $X$. An estimator of cross entropy takes two sets, $X$ and $Y$, and, for each $x_i \in X$, performs a nearest neighbour search in the other set $Y$.

We will be interested in applying these estimators to the entropy and cross entropy terms in (8). For instance, we could use $\hat{H}_{\beta,\mathrm{KL}}(X)$ to estimate $H_{P_U}(X_{<x}, \mathcal{Z}_{<x})$, noting that $\mu = p_X(\mathbf{x}_{<x}, \mathbf{z}_{<x})$ and $\beta = p_U(\mathbf{x}_{<x}, \mathbf{z}_{<x})$. This will be discussed in Sec. IV A 5, after we first consider how to represent the history embeddings $\mathbf{x}_{<x}$, $\mathbf{y}_{<x}$ and $\mathbf{z}_{<x}$, as well as sample them from their distributions.

4. Selection and Representation of Sample Histories for Entropy Estimation

FIG. 9: Examples of history embeddings. (a) shows an example of a joint embedding constructed at a target event ($t_i = x_i$); (b) shows an embedding constructed at a non-event point.

Inspection of (7) and (8) informs us that we will need to be able to estimate four distinct differential entropy terms and, implicitly, the associated probability densities:

1. The probability density of the target, source and conditioning histories at target events, $p_X(\mathbf{x}_{<x}, \mathbf{y}_{<x}, \mathbf{z}_{<x})$.


Algorithm 1: the CT TE estimator.
Input: /* The joint history embeddings at the target events and at the sampled points */ $J_{<x} = \{\mathbf{j}_{<x_i}\}^{N_X}$, $J_{<U} = \{\mathbf{j}_{<U_i}\}^{N_U}$ […]
[…]
15: $k_{C,i} \leftarrow \mathrm{findNumberOfNeighboursInRadius}(r_{\mathrm{conditioning},i}, \mathbf{c}_{\dots})$ […]
[…]
19: $\hat{\dot{T}}_{Y \to X | \mathcal{Z}, \mathrm{CT}} \leftarrow \frac{1}{\tau}\, \hat{\dot{T}}_{Y \to X | \mathcal{Z}, \mathrm{CT}}$


2. The probability density of the target and conditioning histories at target events, $p_X(\mathbf{x}_{<x}, \mathbf{z}_{<x})$.

3. The probability density of the target, source and conditioning histories independent of target activity, $p_U(\mathbf{x}_{<x}, \mathbf{y}_{<x}, \mathbf{z}_{<x})$.

4. The probability density of the target and conditioning histories independent of target activity, $p_U(\mathbf{x}_{<x}, \mathbf{z}_{<x})$.

Estimation of these probability densities will require an associated set of samples for a k-NN estimator to operate on. These samples for $\mathbf{x}_{<x}$ […] The lengths of the embeddings (in terms of how many previous spikes are referred to) must be restricted in order to avoid the difficulties associated with the estimation of probability densities in high dimensions. The lengths of the history embeddings along each process are specified by the parameters $l_X, l_Y, l_{Z_1}, \dots, l_{Z_{n_Z}}$.

We label the sets of samples as $J_{<x} = \{\mathbf{j}_{<x_i}\}^{N_X}$, $C_{<x} = \{\mathbf{c}_{<x_i}\}^{N_X}$, $J_{<U} = \{\mathbf{j}_{<U_i}\}^{N_U}$, […] (i.e. without the source). For the two sets of joint embeddings $J_{<*}$ (where $* \in \{x, U\}$), each $\mathbf{j}_{<*_i} \in J_{<*}$ is made up of target, source and conditioning components. That is, $\mathbf{j}_{<*_i} = \{\mathbf{x}_{<*_i}, \mathbf{y}_{<*_i}, \mathbf{z}_{<*_i}\}$, where $\mathbf{z}_{<*_i} = \{\mathbf{z}_{1,<*_i}, \mathbf{z}_{2,<*_i}, \dots, \mathbf{z}_{n_Z,<*_i}\}$. Similarly, for the two sets of conditioning embeddings $C_{<*}$ (where $* \in \{x, U\}$), each $\mathbf{c}_{<*_i} \in C_{<*}$ is made up of target and conditioning components. That is, $\mathbf{c}_{<*_i} = \{\mathbf{x}_{<*_i}, \mathbf{z}_{<*_i}\}$.

Each set of embeddings $J_{<*}$ is constructed from a set of observation points $T \in \mathbb{R}^{N_T}$. Each individual embedding $\mathbf{j}_{<*_i}$ is constructed at one such observation $t_i$. We denote by $\mathrm{pred}(t_i, P)$ the index of the most recent event in the process $P$ to occur before the observation point $t_i$.


The values of $\mathbf{x}_{<*_i} = \{x^1_{<*_i}, x^2_{<*_i}, \dots, x^{l_X}_{<*_i}\} \in X_{<*}$ are set as follows:

$$x^k_{<*_i} := \begin{cases} t_i - x_{\mathrm{pred}(t_i, X)} & k = 1 \\ x_{\mathrm{pred}(t_i, X) - k + 2} - x_{\mathrm{pred}(t_i, X) - k + 1} & k \neq 1 \end{cases} \quad (15)$$

Here, the $t_i \in T$ are the raw observation points and the $x_j \in X$ are the raw event times in the process $X$. The first element of $\mathbf{x}_{<*_i}$ is then the interval between the observation time and the most recent target event time $x_{\mathrm{pred}(t_i, X)}$. The second element of $\mathbf{x}_{<*_i}$ is the inter-event interval between this most recent event time and the next most recent event time, and so forth.

The values of $\mathbf{y}_{<*_i} = \{y^1_{<*_i}, y^2_{<*_i}, \dots, y^{l_Y}_{<*_i}\} \in Y_{<*}$ and $\mathbf{z}_{j,<*_i} = \{z^1_{j,<*_i}, z^2_{j,<*_i}, \dots, z^{l_{Z_j}}_{j,<*_i}\} \in Z_{j,<*}$ are set in the same manner. The set of samples $J_{<x} \in \mathbb{R}^{N_X \times (l_X + l_Y + \sum_j l_{Z_j})}$ is constructed […] at the event times $x$ of the target process $X$. As such, $J_{<x}$ is composed of $X_{<x}$, $Y_{<x}$ and $\mathcal{Z}_{<x}$. […]

The set of samples $J_{<U}$ is constructed at $N_U$ observation points placed independently of events in the target process. These $N_U$ sample points between the first and last events of the target process $X$ can either be placed randomly or at fixed intervals. In the experiments presented in this paper they were placed randomly. Importantly, note that $N_U$ is not necessarily equal to $N_X$, with their ratio $N_U/N_X$ a parameter for the estimator which is investigated in our Results. We also have that $J_{<U}$ is composed of $X_{<U}$, $Y_{<U}$ and $\mathcal{Z}_{<U}$. Figure 9 shows diagrammatic examples of these embeddings.

For $J_{<x}$, the first target interval measures the time from the spike at $t_i = x_i$ back to the previous spike, which is not the case for $J_{<U}$. The conditioning sets $C_{<x}$ and $C_{<U}$ are constructed in the same manner as $J_{<x}$ and $J_{<U}$; however, the source embeddings $\mathbf{y}_{<*}$ are discarded. We will also have that $C_{<x}$ is composed of $X_{<x}$ and $\mathcal{Z}_{<x}$ […]
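A minimal sketch of the embedding construction in (15), in illustrative Python (function name ours); bisection over the sorted event times plays the role of $\mathrm{pred}(t_i, X)$:

```python
import bisect

def history_embedding(t, events, l):
    # Eq. (15): the first element is the interval from the observation time t
    # back to the most recent event before t; elements 2..l are the preceding
    # inter-event intervals. `events` must be sorted in ascending order.
    pred = bisect.bisect_left(events, t) - 1  # index of most recent event before t
    if pred < l - 1:
        raise ValueError("need at least l events before t")
    embedding = [t - events[pred]]
    for k in range(2, l + 1):
        embedding.append(events[pred - k + 2] - events[pred - k + 1])
    return embedding

spikes = [0.0, 0.7, 1.1, 2.0, 3.5]
print(history_embedding(2.2, spikes, 3))  # approximately [0.2, 0.9, 0.4]
print(history_embedding(2.0, spikes, 2))  # at an event: reaches back to the spike at 1.1
```

Note that when the observation point coincides with an event time, as for the samples in $J_{<x}$, the first interval reaches back to the previous spike, consistent with the construction described above.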


5. Combining $\hat{H}_*$ estimators for $\hat{\dot{T}}_{Y \to X | \mathcal{Z}}$

With sets of samples and their embedded representation determined as per the previous subsection, we are now ready to estimate each of the four $\hat{H}_*$ terms in (8). Here we consider how to combine the entropy and cross entropy estimators of these terms ((12) and (14)) into a single estimator.

We could simply estimate each $H_*$ term in (8) using $\hat{H}_{\mathrm{KL}}$ as specified in (12) and $\hat{H}_{p_U,\mathrm{KL}}$ as specified in (14), with the same number $k$ of nearest neighbours in each of the four estimators and at each sample in the set for each estimator. Following the convention introduced in [72], we shall refer to this as a 4KL estimator of transfer entropy (the '4' refers to the use of 4 k-NN searches and the 'KL' to Kozachenko-Leonenko):

$$\hat{\dot{T}}_{Y \to X | \mathcal{Z}, 4\mathrm{KL}} = \frac{\bar{\lambda}_X}{N_X} \sum_{i=1}^{N_X} \left[ l_J \ln \frac{\epsilon(k, \mathbf{j}_{<x_i}, J_{<U})}{\epsilon(k, \mathbf{j}_{<x_i}, J_{<x})} + l_C \ln \frac{\epsilon(k, \mathbf{c}_{<x_i}, C_{<x})}{\epsilon(k, \mathbf{c}_{<x_i}, C_{<U})} \right] \quad (16)$$

Here, $l_J = l_X + l_Y + \sum_{j=1}^{n_Z} l_{Z_j}$ is the dimension of the joint samples and $l_C = l_X + \sum_{j=1}^{n_Z} l_{Z_j}$

is the dimension of the conditional-only samples. Note that the $\ln(N_X - 1) - \psi(k)$ terms cancel between the $J_{<x}$ and $C_{<x}$ terms (as do the $\ln(N_U) - \psi(k)$ terms between the $J_{<U}$ and $C_{<U}$ terms). All searches are centred on the samples constructed at target events (the cross-entropies which evaluate probability densities using $J_{<U}$ and $C_{<U}$ still evaluate those densities on the samples $\mathbf{j}_{<x_i} \in J_{<x}$ and $\mathbf{c}_{<x_i} \in C_{<x}$, following the definitions in (8)).
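The structure of (16) can be sketched as follows (illustrative Python with brute-force max-norm searches, names ours; the paper's package is in Julia). With the source component independent of the target both at events and at sampled points, the two KL divergence terms should approximately cancel and the estimate should sit near zero:

```python
import math
import random

def kth_nn_dist(point, pts, k, exclude_self=False):
    # Distance from `point` to its kth nearest neighbour in `pts` (max norm).
    # When searching a point's own set, the zero self-distance is skipped.
    dists = sorted(max(abs(a - b) for a, b in zip(point, p)) for p in pts)
    return dists[k] if exclude_self else dists[k - 1]

def te_4kl(J_x, J_U, C_x, C_U, mean_rate, k=4):
    # Eq. (16): the 4KL estimator. The psi(k), ln N and ball-volume terms all
    # cancel, leaving only log-ratios of kth-nearest-neighbour distances,
    # weighted by the joint (l_J) and conditional-only (l_C) dimensions.
    l_J, l_C = len(J_x[0]), len(C_x[0])
    total = 0.0
    for j, c in zip(J_x, C_x):
        total += l_J * (math.log(kth_nn_dist(j, J_U, k))
                        - math.log(kth_nn_dist(j, J_x, k, exclude_self=True)))
        total += l_C * (math.log(kth_nn_dist(c, C_x, k, exclude_self=True))
                        - math.log(kth_nn_dist(c, C_U, k)))
    return mean_rate * total / len(J_x)

rng = random.Random(0)
J_x = [(rng.random(), rng.random()) for _ in range(400)]  # (target, source) histories at events
J_U = [(rng.random(), rng.random()) for _ in range(400)]  # ... at sampled points
C_x = [(x,) for x, _ in J_x]                              # source component discarded
C_U = [(x,) for x, _ in J_U]
print(te_4kl(J_x, J_U, C_x, C_U, mean_rate=1.0))  # near zero: source independent of target
```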

It is, however, not only possible to use a different $k$ at every sample, but desirable when the $k$ are chosen judiciously (as detailed below). We shall refer to this as the generalised k-NN estimator:


$$\hat{\dot{T}}_{Y \to X | \mathcal{Z}, \mathrm{generalised}} = \frac{\bar{\lambda}_X}{N_X} \sum_{i=1}^{N_X} \left[ \psi\!\left(k_{J_{<x},i}\right) - \psi\!\left(k_{J_{<U},i}\right) - \psi\!\left(k_{C_{<x},i}\right) + \psi\!\left(k_{C_{<U},i}\right) + l_J \ln \frac{\epsilon_{J_{<U},i}}{\epsilon_{J_{<x},i}} + l_C \ln \frac{\epsilon_{C_{<x},i}}{\epsilon_{C_{<U},i}} \right] \quad (17)$$

Here $k_{A,i}$ is the number of neighbours used for the $i$th sample in set $A$ for the corresponding entropy estimator for that set of samples. By theorems 3 and 4 of [63], this estimator (and, by implication, the 4KL estimator) is asymptotically consistent. Application of the generalised estimator requires a scheme for choosing the $k_{A,i}$ at each sample. Work on constructing k-NN estimators for mutual information [44] and KL divergence [63] has found advantages in having certain $H_*$ terms share the same or similar radii, e.g. resulting in lower overall bias due to components of biases of individual $H_*$ terms cancelling. Given that we have four $H_*$ terms, there are a number of approaches we could take to sharing radii.

Our algorithm, which we refer to as the CT estimator of TE, $\hat{\dot{T}}_{Y \to X | \mathcal{Z}, \mathrm{CT}}$, is specified in detail in Algorithm 1. It applies the approach of Wang et al. (referred to as the 'bias improved' estimator in [63]) to each of the Kullback-Leibler divergence terms separately. In broad strokes, whereas (16) uses the same $k$ for each nearest-neighbour search, this estimator uses the same radius for each of the two nearest-neighbour searches relating to a given KL divergence term. In practice, this requires first performing searches with a fixed $k$ in order to determine the radius to use. As such, we start with a fixed parameter $k_{\mathrm{global}}$, which will be the minimum number of nearest neighbours in any search space. For each joint sample at a target event, that is, each $\mathbf{j}_{<x_i} \in J_{<x}$, we find the distance to its $k_{\mathrm{global}}$th nearest neighbour within the same set $J_{<x}$ (line 3 of Algorithm 1). We perform a similar $k_{\mathrm{global}}$-NN search for $\mathbf{j}_{<x_i}$ in the set of samples constructed independent of target activity, $J_{<U}$ (line 4). We define a search radius as the maximum of these two distances (line 5). We then find the number of points in $J_{<x}$ falling within this search radius, and denote this number $k_{J_{<x},i}$ (line 6). We also find twice the distance to the $k_{J_{<x},i}$th nearest neighbour of $\mathbf{j}_{<x_i}$ in $J_{<x}$, $\epsilon_{J_{<x},i}$ (line 7), along with the corresponding count $k_{J_{<U},i}$ and distance $\epsilon_{J_{<U},i}$ for the set $J_{<U}$ (lines 8 and 9).

In the majority of cases, only one of these two $\epsilon$ terms will be exactly twice the search radius, and its associated $k_{A,i}$ will equal $k_{\mathrm{global}}$. In such cases, the other $\epsilon$ will be less than twice the search radius and its associated $k_{A,i}$ will be greater than or equal to $k_{\mathrm{global}}$.

The same set of steps is followed for each conditioning history embedding that was constructed at an event in the target process, that is, each $\mathbf{c}_{<x_i} \in C_{<x}$, with the searches performed in the sets $C_{<x}$ and $C_{<U}$ (lines 10 to 16). The values that we have found for $k_{J_{<x},i}$, $k_{J_{<U},i}$, $k_{C_{<x},i}$ and $k_{C_{<U},i}$ ($\epsilon_{J_{<x},i}$, $\epsilon_{J_{<U},i}$, $\epsilon_{C_{<x},i}$ and $\epsilon_{C_{<U},i}$) are then combined as per (17) to produce the final estimate (lines 17 and 19 of Algorithm 1).

6. Handling Dynamic Correlations

The derivation of the k-NN estimators for entropy and cross entropy given in section IV A 3 assumes that the points are independent [44]. However, nearby interspike intervals might be autocorrelated (e.g. during bursts), and indeed our method for constructing history embeddings (see section IV A 4) will incorporate the same interspike intervals at different positions in consecutive samples. This contradicts the assumption of independence. In order to satisfy the assumption of independence when counting neighbours, conventional neighbour counting estimators can be made to ignore matches within a dynamic or serial correlation exclusion window (a.k.a. Theiler windows [74, 75]).

For our estimator, we maintain a record of the start and end times of each history embedding, providing us with an exclusion window. The start time is recorded as the time of the first event that formed part of an interval which was included in the sample. This event could come from the embedding of any of the processes from which the sample was constructed. The end of the window is the observation point from which the sample is constructed. When performing nearest neighbour and radius searches (lines 3, 4, 6, 7, 8, 9, 10, 11, 13, 14, 15 and 16 of Algorithm 1 and line 6 of Algorithm 2), any sample whose exclusion window overlaps with the exclusion window of the sample around which the search is taking place is ignored. Subtleties concerning dynamic correlation exclusion for surrogate calculations are considered in the next section.



B. Local Permutation Method for Surrogate Generation

A common use of this estimator would be to ascertain whether there is a non-zero conditional information flow between two components of a system. When using TE for directed functional network inference, this is the criterion we use to determine the presence or absence of a connection. Given that we are estimating the TE from finite samples, we require a statistical test in order to determine the significance of the measured TE value. Unfortunately, analytic results do not exist for the sampling distribution of k-NN estimators of information theoretic quantities [8]. This necessitates a scheme for generating surrogate samples from which the null distribution can be empirically constructed.

It is instructive to first consider the more general case of testing for non-zero mutual information. As the mutual information between $X$ and $Y$ is zero if and only if $X$ and $Y$ are independent, testing for non-zero mutual information is a test for statistical dependence. As such, we are testing against the null hypothesis that $X$ and $Y$ are independent ($X \perp\!\!\!\perp Y$) or, equivalently, that the joint probability distribution of $X$ and $Y$ factorises as $p(X, Y) = p(X)p(Y)$. It is straightforward to construct surrogate pairs $(\check{x}, \check{y})$ that conform to this null hypothesis. We start with the original pairs $(x, y)$ and resample the $y$ values across pairs, commonly by shuffling (in conjunction with handling dynamic correlations, as per the later subsection). This shuffling process will maintain the marginal distributions $p(X)$


Algorithm 2: Algorithm for the local permutation method for surrogate generation.
Input: /* The joint history embeddings at the target events */ $J_{<x} = \{\mathbf{j}_{<x_i}\}^{N_X} = \{\{\mathbf{x}_{<x_i}, \mathbf{y}_{<x_i}, \mathbf{z}_{<x_i}\}\}^{N_X}$ […]
Output: $\check{J}_{<x}$

/* Set to keep a record of the used indices in the independently sampled embeddings. */
1: $U \leftarrow \phi$
2: $\check{J}_{<x} \leftarrow \phi$
[…]
4: $I \leftarrow \mathrm{shuffle}(I)$
5: for $i \in I$ do
  /* Search for the nearest neighbours in the set of embeddings at sampled points, ignoring the source components. */
6:   $N \leftarrow \mathrm{findIndicesOfNearestNeighbours}\!\left(k_{\mathrm{perm}}, \{\mathbf{x}_{<x_i}, \mathbf{z}_{<x_i}\}, \{\mathbf{x}_{<U}, \mathbf{z}_{<U}\}\right)$
7:   $E \leftarrow N \setminus (N \cap U)$
8:   if $\|E\| > 0$ then
9:     $h \leftarrow \mathrm{chooseRandomElement}(E)$
10:  end
11:  else
12:    $h \leftarrow \mathrm{chooseRandomElement}(N)$
13:  end
  /* Append the set of surrogate samples with an embedding composed of the original target and conditioning components (at index i), but with the source component swapped for that at index h of the independently sampled embeddings. */
14:  $\check{J}_{<x} \leftarrow \check{J}_{<x} \cup \{\{\mathbf{x}_{<x_i}, \mathbf{y}_{<U_h}, \mathbf{z}_{<x_i}\}\}$
[…]


and $p(Y)$, and the same number of samples, but will destroy any relationship between $X$ and $Y$, yielding the required factorisation for the null hypothesis. One shuffling process produces one set of surrogate samples; estimates of mutual information on populations of such surrogate sample sets yield a null distribution for the mutual information.

As transfer entropy is a conditional mutual information ($I(X_t ; Y_{<t} \mid X_{<t}, \mathcal{Z}_{<t})$), we are testing against the null hypothesis that the target is conditionally independent of the history of the source ($X_t \perp\!\!\!\perp Y_{<t} \mid X_{<t}, \mathcal{Z}_{<t}$). […] Surrogates have traditionally been generated by shuffling the source history embeddings or by shifting the source time series (see discussions in e.g. [4, 76]).

These approaches lead to various problems. These problems stem from the fact that they destroy any relationship between the source history ($Y_{<t}$) and the histories of the target and conditioning processes ($X_{<t}$ and $\mathcal{Z}_{<t}$), such that the surrogate joint distribution factorises as $p(X_t, Y_{<t} \mid X_{<t}, \mathcal{Z}_{<t}) = p(X_t \mid X_{<t}, \mathcal{Z}_{<t})\, p(Y_{<t})$ [8]. The issue can be illustrated by considering a system whereby the conditioning processes $\mathcal{Z}_{<t}$ drive the activity of the target $X_t$ as well as the history of the source $Y_{<t}$, leaving $Y_{<t}$ correlated with $X_t$, but conditionally independent. This is the classic case of a "spurious correlation" between $Y_{<t}$ and $X_t$. If, in such a case, we use time shifted or shuffled source surrogates to test for the significance of the TE, we will be comparing the TE measured when $X_t$ and $Y_{<t}$ are dependent (albeit potentially conditionally independent) with surrogates where they are independent. This subtle difference in the formulation of the null may result in a high false positive rate in a test for conditional independence. An analysis of such a system is presented in section II C. Alternately, if we can generate surrogates where the joint probability distribution factorises correctly and the relationship between $Y_{<t}$ and the histories $X_{<t}$ and $\mathcal{Z}_{<t}$ is maintained, then $Y_{<t}$ and $X_t$ remain correlated but conditionally independent. We would anticipate conditional independence tests using surrogates generated under this properly formed null to have a false positive rate closer to what we expect.

Generating surrogates for testing for conditional dependence is relatively straightforward in the case of discrete-valued conditioning variables. If we are testing for dependence between


$X$ and $Y$ given $Z$, then, for each unique value of $Z$, we can shuffle the associated values of $Y$. This maintains the distributions $p(X \mid Z)$ and $p(Y \mid Z)$ whilst, for any given value of $Z$, the relationship between the associated $X$ and $Y$ values is destroyed.

The problem is more challenging when $Z$ can take on continuous values. However, recent work by Runge [8] demonstrated the efficacy of a local permutation technique. In this approach, to generate one surrogate sample set, we separately generate a surrogate sample $(x, \check{y}, z)$ for each sample $(x, y, z)$ in the original set. We find the $k_{\mathrm{perm}}$ nearest neighbours of $z$ in $Z$: one of these neighbours, $z'$, is chosen at random, and $y$ is swapped with the associated $y'$ to produce the surrogate sample $(x, y', z)$. In order to reduce the occurrence of duplicate $y$ values, a set $U$ of used indices is maintained. After finding the $k_{\mathrm{perm}}$ nearest neighbours, those that have already been used are removed from the candidate set. If this results in an empty candidate set, one of the original $k_{\mathrm{perm}}$ candidates is chosen at random. Otherwise, this choice is made from the reduced set. As before, a surrogate conditional mutual information is estimated for every surrogate sample set, and a population of such surrogate estimates provides the null distribution.

This approach needs to be adapted slightly in order to be applied to our particular case, because we have implicitly removed the target variable (whether or not the target is spiking) from our samples via the novel Bayesian inversion. We can rewrite (7) as:

$$\dot{T}_{Y \to X | \mathcal{Z}} = \bar{\lambda}_X \, \mathbb{E}_{P_X}\!\left[\ln \frac{p_X\!\left(\mathbf{x}_{<x}, \mathbf{y}_{<x}, \mathbf{z}_{<x}\right)}{p_X\!\left(\mathbf{x}_{<x}, \mathbf{z}_{<x}\right)\, p_U\!\left(\mathbf{y}_{<x} \mid \mathbf{x}_{<x}, \mathbf{z}_{<x}\right)}\right] \quad (18)$$

This makes it clear that we are testing whether the following factorisation holds: $p_X(\mathbf{x}_{<x}, \mathbf{y}_{<x}, \mathbf{z}_{<x}) = p_X(\mathbf{x}_{<x}, \mathbf{z}_{<x})\, p_U(\mathbf{y}_{<x} \mid \mathbf{x}_{<x}, \mathbf{z}_{<x})$ (recall the difference between probability densities at target events, $p_X$, and at arbitrary points, $p_U$). We thus require surrogate samples $\check{J}_{<x}$ constructed in a way that maintains the relationship between the source histories and the histories of

884 the target and conditioning processes, but decouples (only) the source histories from target

885 events. (As above, simply shuffling the source histories across J

886 events does not properly maintain the relationship of the source to the target and condition-

887 ing histories). The procedure to achieve this is detailed in algorithm 2. We start with the

NX 888 samples at target events J = x , y , z and resample the source components

890 Usurr of NU,surr points sampled independently of events in the target. This set is constructed

891 in the same manner as J

892 points (N = N ) at which the embeddings are constructed. For each original sample U,surr 6 U kperm 893 j from J then, we then find the nearest neighbours x , z of x , z

895 neighbours (line 9 or 12), and add a sample x , y , z to J (line 14). The {

897 keep a record of used indices in order to reduce the incidence of duplicate y

898 For each redrawn surrogate sample set J

899 is estimated (utilising the same J

900 for the original TE estimate) following the algorithm outlined earlier; the population of such

901 surrogate estimates provides the null distribution as before.
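The resampling step of this surrogate generation might be sketched as follows (illustrative Python; we assume history embeddings stored as flat tuples, use a brute-force Euclidean search in place of a KD-tree, and omit the dynamic-exclusion bookkeeping; names are ours):

```python
import math
import random

def resample_source_histories(J_X, U_surr, k_perm, seed=0):
    """Build one surrogate sample set: for each target-event sample (x, y, z)
    in J_X, keep the target and conditioning histories (x, z) but swap in the
    source history y of one of the k_perm nearest (x, z) neighbours among
    points sampled independently of target events (U_surr)."""
    rng = random.Random(seed)

    def dist(a, b):
        # Euclidean distance between the concatenated (x, z) embeddings
        return math.dist(a[0] + a[2], b[0] + b[2])

    used = set()  # indices already swapped in, to reduce duplicates
    J_X_surr = []
    for x, y, z in J_X:
        order = sorted(range(len(U_surr)), key=lambda i: dist((x, y, z), U_surr[i]))
        candidates = order[:k_perm]
        unused = [i for i in candidates if i not in used]
        i = rng.choice(unused if unused else candidates)
        used.add(i)
        J_X_surr.append((x, U_surr[i][1], z))
    return J_X_surr
```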

Finally, note an additional subtlety for dynamic correlation exclusion in the surrogate calculations. Samples in the surrogate calculations will have had their history components originating from two different time windows: one from the construction of the original sample, the other from the sample with which the source component was swapped. A record is kept of both these exclusion windows and, during neighbour searches, points are excluded if their exclusion windows intersect either of the exclusion windows of the surrogate history embedding.
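The window-intersection test itself is simple; a sketch with hypothetical helper names:

```python
def windows_overlap(w1, w2):
    """True if the closed intervals w1 = (start, end) and w2 intersect."""
    return w1[0] <= w2[1] and w2[0] <= w1[1]

def excluded(query_windows, candidate_windows):
    """Exclude a candidate neighbour if any of its exclusion windows
    intersects any of the query's (surrogate samples carry two windows)."""
    return any(windows_overlap(q, c) for q in query_windows for c in candidate_windows)
```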

C. Implementation

Algorithms 1 and 2, as well as all experiments, were implemented in the Julia language. All code will be made available upon publication. Implementations of k-NN information-theoretic estimators have commonly made use of KD-trees to speed up the nearest-neighbour searches [76]. A popular Julia nearest neighbours library (NearestNeighbors.jl [77]) was modified such that checks for dynamic exclusion windows (see section IV A 6) were performed during the KD-tree searches when considering adding neighbours to the candidate set. The implementation of the algorithms is available at the following repository: github.com/dpshorten/CoTETE.jl. Scripts to run the experiments in the paper can be found at github.com/dpshorten/CoTETE_experiments.


V. ACKNOWLEDGEMENTS

We would like to thank Mike Li for performing preliminary benchmarking of the performance of the discrete-time estimator on point processes.

VI. AUTHOR CONTRIBUTIONS

Conceptualization: David P. Shorten, Richard E. Spinney and Joseph T. Lizier
Funding acquisition: Joseph T. Lizier
Methodology: David P. Shorten, Richard E. Spinney and Joseph T. Lizier
Software: David P. Shorten
Supervision: Richard E. Spinney and Joseph T. Lizier
Writing - original draft: David P. Shorten
Writing - review & editing: David P. Shorten, Richard E. Spinney and Joseph T. Lizier

VII. FUNDING STATEMENT

JL was supported through the Australian Research Council DECRA Fellowship grant DE160100630 and through The University of Sydney Research Accelerator (SOAR) prize program. High performance computing facilities provided by The University of Sydney (artemis) have contributed to the research results reported within this paper.

[1] B. A. Olshausen and D. J. Field, Emergence of simple-cell receptive field properties by learning a sparse code for natural images, Nature 381, 607 (1996).
[2] A. P. Georgopoulos, J. Ashe, N. Smyrnis, and M. Taira, The motor cortex and the coding of force, Science 256, 1692 (1992).
[3] T. Schreiber, Measuring information transfer, Physical Review Letters 85, 461 (2000).
[4] T. Bossomaier, L. Barnett, M. Harré, and J. T. Lizier, An Introduction to Transfer Entropy (Cham: Springer International Publishing, 2016).
[5] R. E. Spinney, M. Prokopenko, and J. T. Lizier, Transfer entropy in continuous time, with applications to jump and neural spiking processes, Physical Review E 95, 032319 (2017).
[6] D. J. C. MacKay, Information Theory, Inference and Learning Algorithms (Cambridge University Press, 2003).
[7] J. Runge, J. Heitzig, V. Petoukhov, and J. Kurths, Escaping the curse of dimensionality in estimating multivariate transfer entropy, Physical Review Letters 108, 258701 (2012).
[8] J. Runge, Conditional independence testing based on a nearest-neighbor estimator of conditional mutual information, arXiv preprint arXiv:1709.01447 (2017).
[9] L. Novelli, P. Wollstadt, P. Mediano, M. Wibral, and J. T. Lizier, Large-scale directed network inference with multivariate transfer entropy and hierarchical statistical testing, Network Neuroscience, 1 (2019).
[10] C. J. Honey, R. Kötter, M. Breakspear, and O. Sporns, Network structure of cerebral cortex shapes functional connectivity on multiple time scales, Proceedings of the National Academy of Sciences 104, 10240 (2007).
[11] O. Stetter, D. Battaglia, J. Soriano, and T. Geisel, Model-free reconstruction of excitatory neuronal connectivity from calcium imaging signals, PLoS Computational Biology 8, e1002653 (2012).
[12] J. Sun and E. M. Bollt, Causation entropy identifies indirect influences, dominance of neighbors and anticipatory couplings, Physica D: Nonlinear Phenomena 267, 49 (2014).
[13] M. Wibral, R. Vicente, and J. T. Lizier, Directed Information Measures in Neuroscience (Springer, 2014).
[14] N. M. Timme and C. Lapish, A tutorial for information theory in neuroscience, eNeuro 5 (2018).
[15] A. Palmigiano, T. Geisel, F. Wolf, and D. Battaglia, Flexible information routing by transient synchrony, Nature Neuroscience 20, 1014 (2017).
[16] M. Lungarella and O. Sporns, Mapping information flow in sensorimotor networks, PLoS Computational Biology 2, e144 (2006).
[17] M. Wibral, N. Pampu, V. Priesemann, F. Siebenhühner, H. Seiwert, M. Lindner, J. T. Lizier, and R. Vicente, Measuring information-transfer delays, PLoS ONE 8, e55809 (2013).
[18] M. Wibral, B. Rahm, M. Rieder, M. Lindner, R. Vicente, and J. Kaiser, Transfer entropy in magnetoencephalographic data: quantifying information flow in cortical and cerebellar networks, Progress in Biophysics and Molecular Biology 105, 80 (2011).
[19] A. Brodski-Guerniero, M. J. Naumer, V. Moliadze, J. Chan, H. Althen, F. Ferreira-Santos, J. T. Lizier, S. Schlitt, J. Kitzerow, M. Schütz, et al., Predictable information in neural signals during resting state is reduced in autism spectrum disorder, Human Brain Mapping 39, 3227 (2018).
[20] V. A. Vakorin, N. Kovacevic, and A. R. McIntosh, Exploring transient transfer entropy based on a group-wise ICA decomposition of EEG data, NeuroImage 49, 1593 (2010).
[21] J. T. Lizier, J. Heinzle, A. Horstmann, J.-D. Haynes, and M. Prokopenko, Multivariate information-theoretic measures reveal directed information structure and task relevant changes in fMRI connectivity, Journal of Computational Neuroscience 30, 85 (2011).
[22] M. Wibral, C. Finn, P. Wollstadt, J. Lizier, and V. Priesemann, Quantifying information modification in developing neural networks via partial information decomposition, Entropy 19, 494 (2017).
[23] M. Li, Y. Han, M. J. Aburn, M. Breakspear, R. A. Poldrack, J. M. Shine, and J. T. Lizier, Transitions in brain-network level information processing dynamics are driven by alterations in neural gain, bioRxiv, 581538 (2019).
[24] J.-P. Thivierge, Scale-free and economical features of functional connectivity in neuronal networks, Physical Review E 90, 022721 (2014).
[25] J. J. Harris, E. Engl, D. Attwell, and R. B. Jolivet, Energy-efficient information transfer at thalamocortical synapses, PLoS Computational Biology 15, e1007226 (2019).
[26] N. M. Timme, S. Ito, M. Myroshnychenko, S. Nigam, M. Shimono, F.-C. Yeh, P. Hottowy, A. M. Litke, and J. M. Beggs, High-degree neurons feed cortical computations, PLoS Computational Biology 12, e1004858 (2016).
[27] N. Timme, S. Ito, M. Myroshnychenko, F.-C. Yeh, E. Hiolski, A. M. Litke, and J. M. Beggs, Multiplex networks of cortical and hippocampal neurons revealed at different timescales, BMC Neuroscience 15, P212 (2014).
[28] K. E. Schroeder, Z. T. Irwin, M. Gaidica, J. N. Bentley, P. G. Patil, G. A. Mashour, and C. A. Chestek, Disruption of corticocortical information transfer during ketamine anesthesia in the primate brain, NeuroImage 134, 459 (2016).
[29] R. Kobayashi and K. Kitano, Impact of network topology on inference of synaptic connectivity from multi-neuronal spike data simulated by a large-scale cortical network model, Journal of Computational Neuroscience 35, 109 (2013).
[30] S. Nigam, M. Shimono, S. Ito, F.-C. Yeh, N. Timme, M. Myroshnychenko, C. C. Lapish, Z. Tosi, P. Hottowy, W. C. Smith, et al., Rich-club organization in effective connectivity among cortical neurons, Journal of Neuroscience 36, 670 (2016).
[31] S. Ito, M. E. Hansen, R. Heiland, A. Lumsdaine, A. M. Litke, and J. M. Beggs, Extending transfer entropy improves identification of effective connectivity in a spiking cortical network model, PLoS ONE 6, e27431 (2011).
[32] S. A. Neymotin, K. M. Jacobs, A. A. Fenton, and W. W. Lytton, Synaptic information transfer in computer models of neocortical columns, Journal of Computational Neuroscience 30, 69 (2011).
[33] B. Gourévitch and J. J. Eggermont, Evaluating information transfer between auditory cortical neurons, Journal of Neurophysiology 97, 2533 (2007).
[34] A. Buehlmann and G. Deco, Optimal information transfer in the cortex through synchronization, PLoS Computational Biology 6, e1000934 (2010).
[35] G. Ver Steeg and A. Galstyan, Information transfer in social media, in Proceedings of the 21st International Conference on World Wide Web (2012) pp. 509-518.
[36] J. G. Orlandi, O. Stetter, J. Soriano, T. Geisel, and D. Battaglia, Transfer entropy reconstruction and labeling of neuronal connections from simulated calcium imaging, PLoS ONE 9, e98842 (2014).
[37] I. M. de Abril, J. Yoshimoto, and K. Doya, Connectivity inference from neural recording data: Challenges, mathematical bases and research directions, Neural Networks 102, 120 (2018).
[38] I. Nemenman, G. D. Lewen, W. Bialek, and R. R. de Ruyter van Steveninck, Neural coding of natural stimuli: information at sub-millisecond resolution, PLoS Computational Biology 4, e1000025 (2008).
[39] C. Kayser, N. K. Logothetis, and S. Panzeri, Millisecond encoding precision of auditory cortex neurons, Proceedings of the National Academy of Sciences 107, 16976 (2010).
[40] S. J. Sober, S. Sponberg, I. Nemenman, and L. H. Ting, Millisecond spike timing codes for motor control, Trends in Neurosciences 41, 644 (2018).
[41] J. A. Garcia-Lazaro, L. A. Belliveau, and N. A. Lesica, Independent population coding of speech with sub-millisecond precision, Journal of Neuroscience 33, 19362 (2013).
[42] J. W. Aldridge and S. Gilman, The temporal structure of spike trains in the primate basal ganglia: afferent regulation of bursting demonstrated with precentral cerebral cortical ablation, Brain Research 543, 123 (1991).
[43] J. N. Cooper and C. D. Edgar, Transfer entropy in continuous time, arXiv preprint arXiv:1905.06406 (2019).
[44] A. Kraskov, H. Stögbauer, and P. Grassberger, Estimating mutual information, Physical Review E 69, 066138 (2004).
[45] P. W. Lewis and G. S. Shedler, Simulation of nonhomogeneous Poisson processes by thinning, Naval Research Logistics Quarterly 26, 403 (1979).
[46] P. Spirtes, C. N. Glymour, R. Scheines, and D. Heckerman, Causation, Prediction, and Search (MIT Press, 2000).
[47] J. Peters, D. Janzing, and B. Schölkopf, Elements of Causal Inference: Foundations and Learning Algorithms (MIT Press, 2017).
[48] J. Runge, Causal network reconstruction from time series: From theoretical assumptions to practical estimation, Chaos: An Interdisciplinary Journal of Nonlinear Science 28, 075310 (2018).
[49] G. Buzsaki, Rhythms of the Brain (Oxford University Press, 2006).
[50] A. Riehle, S. Grün, M. Diesmann, and A. Aertsen, Spike synchronization and rate modulation differentially involved in motor cortical function, Science 278, 1950 (1997).
[51] E. Maeda, H. Robinson, and A. Kawana, The mechanisms of generation and propagation of synchronized bursting in developing networks of cortical neurons, Journal of Neuroscience 15, 6834 (1995).
[52] A. Litwin-Kumar, A.-M. M. Oswald, N. N. Urban, and B. Doiron, Balanced synaptic input shapes the correlation between neural spike trains, PLoS Computational Biology 7 (2011).
[53] P. K. Trong and F. Rieke, Origin of correlated activity between parasol retinal ganglion cells, Nature Neuroscience 11, 1343 (2008).
[54] R. Marschinski and H. Kantz, Analysing the information flow between financial time series, The European Physical Journal B 30, 275 (2002).
[55] T. Kispersky, G. J. Gutierrez, and E. Marder, Functional connectivity in a rhythmic inhibitory circuit using Granger causality, Neural Systems & Circuits 1, 9 (2011).
[56] F. Gerhard, T. Kispersky, G. J. Gutierrez, E. Marder, M. Kramer, and U. Eden, Successful reconstruction of a physiological circuit with known connectivity from spiking activity alone, PLoS Computational Biology 9, e1003138 (2013).
[57] L. Barnett, A. B. Barrett, and A. K. Seth, Granger causality and transfer entropy are equivalent for Gaussian variables, Physical Review Letters 103, 238701 (2009).
[58] A. I. Selverston, Dynamic Biological Networks: The Stomatogastric Nervous System (MIT Press, 1992).
[59] E. Marder and D. Bucher, Understanding circuit dynamics using the stomatogastric nervous system of lobsters and crabs, Annu. Rev. Physiol. 69, 291 (2007).
[60] E. Marder and D. Bucher, Central pattern generators and the control of rhythmic movements, Current Biology 11, R986 (2001).
[61] A. A. Prinz, D. Bucher, and E. Marder, Similar network activity from disparate circuit parameters, Nature Neuroscience 7, 1345 (2004).
[62] T. O'Leary, A. H. Williams, A. Franci, and E. Marder, Cell types, network homeostasis, and pathological compensation from a biologically plausible ion channel expression model, Neuron 82, 809 (2014).
[63] Q. Wang, S. R. Kulkarni, and S. Verdú, Divergence estimation for multidimensional densities via k-nearest-neighbor distances, IEEE Transactions on Information Theory 55, 2392 (2009).
[64] S. Gao, G. Ver Steeg, and A. Galstyan, Efficient estimation of mutual information for strongly dependent variables, in Artificial Intelligence and Statistics (2015) pp. 277-286.
[65] S. Khan, S. Bandyopadhyay, A. R. Ganguly, S. Saigal, D. J. Erickson III, V. Protopopescu, and G. Ostrouchov, Relative performance of mutual information estimation methods for quantifying the dependence among short and noisy data, Physical Review E 76, 026209 (2007).
[66] B. Poole, S. Ozair, A. van den Oord, A. A. Alemi, and G. Tucker, On variational lower bounds of mutual information, in NeurIPS Workshop on Bayesian Deep Learning (2018).
[67] M. I. Belghazi, A. Baratin, S. Rajeswar, S. Ozair, Y. Bengio, A. Courville, and R. D. Hjelm, MINE: mutual information neural estimation, arXiv preprint arXiv:1801.04062 (2018).
[68] J. Beirlant, E. J. Dudewicz, L. Györfi, and E. C. Van der Meulen, Nonparametric entropy estimation: An overview, International Journal of Mathematical and Statistical Sciences 6, 17 (1997).
[69] L. Kozachenko and N. N. Leonenko, Sample estimate of the entropy of a random vector, Problemy Peredachi Informatsii 23, 9 (1987).
[70] T. B. Berrett, R. J. Samworth, M. Yuan, et al., Efficient multivariate entropy estimation via k-nearest neighbour distances, The Annals of Statistics 47, 288 (2019).
[71] K. Moon, K. Sricharan, K. Greenewald, and A. Hero, Ensemble estimation of information divergence, Entropy 20, 560 (2018).
[72] W. Gao, S. Oh, and P. Viswanath, Demystifying fixed k-nearest neighbor information estimators, IEEE Transactions on Information Theory 64, 5629 (2018).
[73] N. Leonenko, L. Pronzato, and V. Savani, A class of Rényi information estimators for multidimensional densities, The Annals of Statistics 36, 2153 (2008).
[74] J. Theiler, Estimating fractal dimension, JOSA A 7, 1055 (1990).
[75] H. Kantz and T. Schreiber, Nonlinear Time Series Analysis, Vol. 7 (Cambridge University Press, 2004).
[76] J. T. Lizier, JIDT: an information-theoretic toolkit for studying the dynamics of complex systems, Frontiers in Robotics and AI 1, 11 (2014).
[77] github.com/KristofferC/NearestNeighbors.jl.
[78] A. A. Prinz, C. P. Billimoria, and E. Marder, Alternative to hand-tuning conductance-based models: construction and analysis of databases of model neurons, Journal of Neurophysiology 90, 3998 (2003).
[79] J. H. Goldwyn and E. Shea-Brown, The what and where of adding channel noise to the Hodgkin-Huxley equations, PLoS Computational Biology 7 (2011).
[80] P. Rowat, Interspike interval statistics in the stochastic Hodgkin-Huxley model: Coexistence of gamma frequency bursts and highly irregular firing, Neural Computation 19, 1215 (2007).

Appendix A: Simulation of the Pyloric Circuit of the STG

The simulation approach used follows very closely that presented in [61, 62, 78]. The only significant deviation is the addition of a noise term.

We modelled the neurons of the pyloric circuit using a conductance-based model.


Current   E (mV)   q   p   m∞                                                   h∞
I_Na      50       3   1   1/(1 + exp((V + 25.5)/−5.29))                        1/(1 + exp((V + 48.9)/5.18))
I_CaT     —        3   1   1/(1 + exp((V + 27.1)/−7.2))                         1/(1 + exp((V + 32.1)/5.5))
I_CaS     —        3   1   1/(1 + exp((V + 33)/−8.1))                           1/(1 + exp((V + 60)/6.2))
I_A       −80      3   1   1/(1 + exp((V + 27.2)/−8.7))                         1/(1 + exp((V + 56.9)/4.9))
I_KCa     −80      4   0   ([Ca]/([Ca] + 3)) · 1/(1 + exp((V + 28.3)/−12.6))    —
I_Kd      −80      4   0   1/(1 + exp((V + 12.3)/−11.8))                        —
I_H       −20      1   0   1/(1 + exp((V + 75)/5.5))                            —

TABLE I: Parameters and functions used in the conductance-based model.

The membrane potential V evolves according to

C dV/dt = − Σ_i I_i − Σ_s I_s + ξ    (A1)

C = 0.628 nF is the membrane capacitance and ξ is a noise term. Each current is specified by

I_i = ḡ_i m_i^{q_i} h_i^{p_i} (V − E_i)

E_i is the reversal potential and its values are listed in Table I. The reversal potentials associated with the calcium channels are not listed as these are calculated according to the Nernst equation. Specifically, E_Ca = (RT/2F) log₁₀([Ca²⁺]_ext/[Ca²⁺]), where R = 8.314 J K⁻¹ mol⁻¹ is the universal gas constant, T = 293.3 K is the temperature and F = 96 485.332 12 C mol⁻¹ is Faraday's constant. [Ca²⁺]_ext = 3 mM is the extracellular Ca²⁺ concentration and [Ca²⁺] is the intracellular Ca²⁺ concentration. The intracellular Ca²⁺ concentration evolves according


Current   τ_m                                                  τ_h
I_Na      2.64 − 2.52/(1 + exp((V + 120)/−25))                 (1.34/(1 + exp((V + 62.9)/−10))) (1.5 + 1/(1 + exp((V + 34.9)/3.6)))
I_CaT     43.4 − 42.6/(1 + exp((V + 68.1)/−20.5))              210 − 179.6/(1 + exp((V + 55)/−16.9))
I_CaS     2.8 + 14/(exp((V + 27)/10) + exp((V + 70)/−13))      120 + 300/(exp((V + 55)/9) + exp((V + 65)/−16))
I_A       23.2 − 20.8/(1 + exp((V + 32.9)/−15.2))              77.2 − 58.4/(1 + exp((V + 38.9)/−26.5))
I_KCa     180.6 − 150.2/(1 + exp((V + 46)/−22.7))              —
I_Kd      14.4 − 12.8/(1 + exp((V + 28.3)/−19.2))              —
I_H       2/(exp((V + 169.7)/−11.6) + exp((V − 26.7)/14.3))    —

TABLE II: Parameters and functions used in the conductance-based model. Time constants are in ms.

to

τ_Ca d[Ca²⁺]/dt = −f (I_CaT + I_CaS) − [Ca²⁺] + [Ca²⁺]₀

[Ca²⁺]₀ = 0.05 µM is the steady-state Ca²⁺ concentration, f = 14.96 µM nA⁻¹ and τ_Ca = 200 ms.
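Plugging the stated constants into the Nernst expression above (evaluated exactly as written, with a base-10 logarithm; an implementation using the natural logarithm would scale the result by ln 10) gives, for illustration, the calcium reversal potential at the steady-state intracellular concentration:

```python
import math

# Constants as stated in the text
R = 8.314        # J K^-1 mol^-1, universal gas constant
T = 293.3        # K, temperature
F = 96485.33212  # C mol^-1, Faraday's constant

def E_Ca(ca_ext_mM, ca_in_mM):
    """Calcium reversal potential in volts, evaluating the expression
    in the text as written (base-10 logarithm)."""
    return (R * T / (2.0 * F)) * math.log10(ca_ext_mM / ca_in_mM)
```

With [Ca²⁺]_ext = 3 mM and [Ca²⁺] = 0.05 µM (5 × 10⁻⁵ mM), this evaluates to roughly 60 mV; as the intracellular concentration rises during activity, E_Ca falls.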

The values of q_i and p_i are listed in Table I. The activation variables m_i evolve according to

τ_{m_i} dm_i/dt = m_{∞,i} − m_i

The inactivation variables h_i evolve according to

τ_{h_i} dh_i/dt = h_{∞,i} − h_i

m_{∞,i} and h_{∞,i} are given in Table I and τ_{m_i} and τ_{h_i} are given in Table II.
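As an illustration of how the entries of Tables I and II drive these dynamics, a forward-Euler update of the I_Na activation variable can be sketched as (illustrative Python; the sigmoid and time-constant expressions are those tabulated above, and the function names are ours):

```python
import math

def m_inf_Na(V):
    """Steady-state activation m_inf of I_Na (Table I); V in mV."""
    return 1.0 / (1.0 + math.exp((V + 25.5) / -5.29))

def tau_m_Na(V):
    """Activation time constant of I_Na in ms (Table II)."""
    return 2.64 - 2.52 / (1.0 + math.exp((V + 120.0) / -25.0))

def euler_step(m, V, dt):
    """One forward-Euler step of tau_m dm/dt = m_inf(V) - m (dt in ms)."""
    return m + dt * (m_inf_Na(V) - m) / tau_m_Na(V)
```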

The synaptic currents are specified by

I_s = ḡ_s a_s (V − E_s)

E_s is the synaptic reversal potential. It was set to −70 mV for glutamatergic synapses and −80 mV for cholinergic synapses. The activation variable a_s evolves according to

τ_{a_s} da_s/dt = a_{∞,s} − a_s

where

a_{∞,s} = 1 / (1 + exp((V_th − V_pre)/∆))

and

τ_{a_s} = (1 − a_{∞,s}) / k₋

∆ = 5 mV provides the slope of the activation curve. V_th = −35 mV is the half-activation potential of the synapse. V_pre is the membrane potential of the presynaptic neuron. k₋ is the rate constant for transmitter-receptor dissociation. For the glutamatergic synapses we used k₋ = 0.025 ms⁻¹ and for the cholinergic synapses we used k₋ = 0.01 ms⁻¹.
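The synaptic activation curve and its time constant under the stated parameters can be evaluated as follows (illustrative Python; the defaults encode ∆ = 5 mV and V_th = −35 mV):

```python
import math

def a_inf(V_pre, V_th=-35.0, delta=5.0):
    """Steady-state synaptic activation; V_pre, V_th in mV."""
    return 1.0 / (1.0 + math.exp((V_th - V_pre) / delta))

def tau_a(V_pre, k_minus, V_th=-35.0, delta=5.0):
    """Synaptic activation time constant in ms: (1 - a_inf) / k_minus."""
    return (1.0 - a_inf(V_pre, V_th, delta)) / k_minus
```

For a strongly hyperpolarised presynaptic neuron the activation is near zero and the time constant approaches 1/k₋ (40 ms for the glutamatergic synapses).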

Simulations of the pyloric rhythm can be run with fixed maximum conductance values ḡ_i and ḡ_s, as in [61]. However, it was found that these models were less robust to the addition of a noise term. It was, therefore, decided to use adaptive conductances as described in [62]. Each conductance evolved according to

τ_g dḡ_j/dt = m_j − ḡ_j

where

τ_{m_j} dm_j/dt = [Ca²⁺] − [Ca²⁺]_tgt

τ_g = 100 ms was common across all channels. The time constants τ_{m_j} are listed in Table III. The time constants provided in [62] were not used, as these produce a pyloric rhythm with an unrealistically short period. Instead, the approach presented in [62] for arriving at conductance time constants from desired conductance values was used. The conductance values in [61] were used as these desired values. Specifically, we used the values presented


Conductance   AB/PD    LP      PY
g_Na          2.5      10      10
g_CaT         400      1e13    400
g_CaS         166.7    250     400
g_A           20       50      20
g_KCa         100      200     1e13
g_Kd          10       20      8
g_H           1×10⁵    2×10⁴   2×10⁴

TABLE III: The conductance time constants τ_{m_j}. All values are in seconds.

LP → AB/PD, glutamatergic     3×10³
AB/PD → LP, cholinergic       250
AB/PD → LP, glutamatergic     5×10³
PY → LP, glutamatergic        8×10³
AB/PD → PY, cholinergic       1×10⁴
AB/PD → PY, glutamatergic     400
LP → PY, glutamatergic        1×10⁵

TABLE IV: The conductance time constants τ_{m_j} for the synapses. All values are in seconds.

in table 2 for AB/PD 1, LP 2 and PY 1. The time constant τ_{m_j} associated with ḡ_j was set as τ_{m_j} = c/ḡ_j, where c is a constant with units of seconds that was adjusted by hand so that the activity converged in a reasonable amount of time and the time constants were of the same order of magnitude as those provided in [62]. As in [62], the leak conductances were fixed: they were set at 0 for the AB/PD neuron, 0.0628 µS for the PY neuron and 0.1256 µS for the LP neuron.

The TE approach to network inference will not work in a fully deterministic system. As such, noise was added to the system. There are a number of techniques for adding noise to conductance-based models [79]. The simplest such technique is to add noise to the currents in (A1). Although this is not a biophysically realistic method, it has been shown to produce behaviours which closely match those produced by more realistic techniques [79, 80]. As such, we decided to make use of this procedure in our simulations. The associated noise term is shown in (A1). The noise was generated using an AR(1) process

ξ_t = θ ξ_{t−1} + ε_t    (A2)

We used the parameter value θ = 0.01. ε_t was distributed normally with mean 0 and standard deviation 8 × 10⁻¹³. A simulation timestep of ∆t = 0.01 ms was used.
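The noise process (A2) is straightforward to generate; a sketch (illustrative Python, any Gaussian random number generator will do):

```python
import random

def ar1_noise(n, theta=0.01, sigma=8e-13, seed=0):
    """Generate n samples of the AR(1) noise process (A2):
    xi_t = theta * xi_{t-1} + eps_t, with eps_t ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    xi, samples = 0.0, []
    for _ in range(n):
        xi = theta * xi + rng.gauss(0.0, sigma)
        samples.append(xi)
    return samples
```

One such sample is drawn per simulation timestep of ∆t = 0.01 ms and added to the membrane currents in (A1).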
