<<

week ending PRL 108, 258701 (2012) PHYSICAL REVIEW LETTERS 22 JUNE 2012

Escaping the Curse of Dimensionality in Estimating Multivariate Transfer Entropy

Jakob Runge,1,2 Jobst Heitzig,1 Vladimir Petoukhov,1 and Ju¨rgen Kurths1,2,3 1Potsdam Institute for Climate Impact Research (PIK), 14473 Potsdam, Germany 2Department of Physics, Humboldt University, 12489 Berlin, Germany 3Institute for Complex Systems and Mathematical Biology, University of Aberdeen, Aberdeen AB24 3UE, United Kingdom (Received 18 February 2012; published 21 June 2012) Multivariate transfer entropy (TE) is a model-free approach to detect in multivariate time series. It is able to distinguish direct from indirect and common drivers without assuming any underlying model. But despite these advantages it has mostly been applied in a bivariate setting as it is hard to estimate reliably in high dimensions since its definition involves infinite vectors. To overcome this limitation, we propose to embed TE into the framework of graphical models and present a formula that decomposes TE into a sum of finite-dimensional contributions that we call decomposed transfer entropy. Graphical models further provide a richer picture because they also yield the causal coupling delays. To estimate the graphical model we suggest an iterative algorithm, a modified version of the PC-algorithm with a very low estimation dimension. We present an appropriate significance test and demonstrate the method’s performance using examples of nonlinear stochastic delay-differential equations and observa- tional climate data (sea level pressure).

DOI: 10.1103/PhysRevLett.108.258701 PACS numbers: 89.70.Cf, 02.50.r, 05.45.Tp, 89.70.a

Inferring causal relations between processes when only methods are also able to exclude spurious causalities due to some time series of measurements are given is of very other processes. In classical statistics this problem can be general interest in many fields of science, especially when addressed by assuming some class of linear or nonlinear the underlying mechanisms are poorly understood. parametric models and estimating the model that best Consider the climate system, where new observational explains the time series. A similar model-based test is technologies lead to an abundance of observational data [7] and the framework of methods using and yet even the interaction mechanism between such phase coherences [8]. There also exist methods that are less important systems as the El Nin˜o southern oscillation restrictive in that they do not require to specify a certain and the Indian monsoon system is not fully understood functional form, but still require a priori knowledge of the [1,2]. Also the inference of functional brain connectivity distributions or general structure of the model [3]. in neuroscience and the discovery of causal relationships Therefore, the absence of a link in a graph inferred with in economic data are of great importance [3,4]. One can these model-based methods does not imply that the pro- view the processes as nodes of a graph where the links cesses are not interacting since only a certain class of denote interactions. Now the problem is whether these causal mechanisms has been tested. links can be interpreted as ‘‘causal’’ interactions. Towards In the framework of theory [9], no such a causal interpretation, statistical methods need to be able model-based restrictions exist because interactions are to (1) measure associations, (2) measure also time delays, viewed as transfers of information. The most widely ap- and (3) exclude other influences [5]. Usually it is impos- plied information-theoretic functional is (conditional) sible to exclude all other influences, especially if the (CMI) [4], especially in the form of system cannot be experimentally manipulated, and there- (multivariate) transfer entropy (TE) [10]. As formally de- fore the term ‘‘causal’’ can only be understood to be fined below, the common definition of TE leads to the meant relative to the system under study, i.e., in our problem of estimating infinite-dimensional densities, framework the processes that comprise the nodes of the which is commonly called the ‘‘curse of dimensionality’’ graph. The main caveats against a causal interpretation and strongly affects the reliability of as are the possibility of spurious causalities due to indirect demonstrated in Fig. 1. influences or common drivers. For example, if a causal In this Letter we show how this severe limitation can be interaction is given by X ! Y ! Z, a bivariate analysis overcome by embedding TE into the framework of graph- would give a significant link between X and Z that is ical models. In this framework we derive a formula that detected as being only indirect in a multivariate analysis decomposes TE into a sum of finite-dimensional contribu- including Y. tions that we call decomposed transfer entropy (DTE). This Several existing measures fulfill requirements (1) and can drastically reduce the estimation dimension and ena- (2), e.g., classical product-moment correlation analysis or bles a much more reliable estimation of TE even from real- synchronization [6], if delays are taken into account. Some world data. Furthermore, graphical models provide a richer

0031-9007=12=108(25)=258701(5) 258701-1 Ó 2012 American Physical Society week ending PRL 108, 258701 (2012) PHYSICAL REVIEW LETTERS 22 JUNE 2012

FIG. 1 (color online). Estimated TE between all subprocesses X, Y, Z, W for an example process [a discrete-time sampling of Eq. (5) that will be analyzed later] with true coupling structure given by the black arrows. If we truncate the infinite past vectors used in the common definition of TE at 15 lags, the estimation FIG. 2 (color online). Causal interactions in a multivariate dimension is 61. The true TE of all gray and red links is zero, process X. (a) Time series graph (see definition in text). The since they do not represent direct couplings. So of the four boxes mark different sets of conditions for the CMI between estimates of comparable size marked in red boxes, only the Xt at ¼ 2 and Yt (marked by the black dots): the terms in Y ! W link is a correctly identified coupling. Eq. (1) use the infinite set Xt n Xt [ Xt (gray dotted line). S But only the finite set Yt;Xt (red dashed line) is needed to S satisfy Eq. (2). Yt;Xt must be chosen so that it separates the picture of causal interactions because they also yield the skipped infinite conditions ðXt n Xt [ Xt ÞnSY ;X from Yt causal coupling delays. We demonstrate the advantages of t t in the graph (for a formal definition of paths and separation, see this approach on a model and on observational climate data P [13]). The even smaller set of parents Yt (blue box, gray nodes) of mean sea level pressure in Eastern Europe. separates Yt from the past of the whole process Xt n P Y , which To derive DTE, we introduce the following notation: t is used in the algorithm to estimate the graph. (b) Process graph, Given a stationary multivariate discrete-time stochastic X which aggregates the information in the time series graph for process , we denote its uni- or multivariate subprocesses better visualization (labels denote the lags). X; Y; Z; W; ...and the values at time t as Xt;Xt; .... Their X X ; X ; ... X pasts are defined as t ¼ð t1 t2 Þ and t ¼ ? X ðXt1;Xt2; ...Þ and with a slight abuse of notation Xt TE DTE IX Y IX Y ¼ IðXt ; YtjSY ;X Þ; (3) X ! ! t t can be formally treated as a subset of t . Now the TE ¼1 TE I ¼ IðXt ; YtjXt n Xt Þ is the reduction in uncertainty X!Y with ? chosen as the smallest for which the estimated about Yt when learning the past of Xt, if the rest of the past remainder is smaller than some given absolute tolerance of Xt, given by Xt n Xt , is already known. There are two (see the Supplemental Material for details [11]). This can infinite-dimensional parts in TE: Xt and Xt n Xt .We improve the estimation of TE considerably as compared to address the first by decomposing TE into contributions of the direct estimation, although the sets SY ;X and the individual lags of X via the chain rule (for detailed deri- t t vations, see the Supplemental Material [11]), resulting dimensions can still be large. But how can the required time series graph be esti- X1 mated? As depicted in Fig. 2(a), each node in that graph I X; Y X X I X ; Y X X;X : ð t tj t n t Þ¼ ð t tj t n t tÞ represents a subprocess X at a certain time t. Nodes Xt ¼1 and Yt are connected by a directed link Xt ! Yt pointing (1) forward in time if and only if >0 and IðXt ; YtjXt nfXt gÞ > 0; (4) Now the decisive step to escape the still infinite dimension of the condition in each term is done by utilizing the theory and nodes Xt and Yt are connected by an undirected con- of graphical models [12,13]. Looking at Fig. 2(a), the main temporaneous link (visualized by a line) if and only if I X ; Y X X ;Y > 0 Y Þ X idea stems from a Markov property that relates the sepa- ð t tj tþ1 nf t tgÞ [13]. If , we say that ration of nodes in the graph to the conditional independ- the link Xt ! Yt represents a coupling at lag , while for ences in the process ([13], theorem 4.1). It implies that Y ¼ X it represents an autodependency at lag , so the time series graph gives a richer picture of causal interrelations IðX ; Y jX n X;X Þ¼IðX ; Y jS Þ; t t t t t t t Yt;Xt (2) than TE, including the causal coupling delays and contem- poraneous links as well, while the total influence of X on Y S X n X [ X for a certain finite subset Yt;Xt t t t of the can be measured by TE. Figure 2(a) shows the time series conditions [see Fig. 2(a)]. Once the time series graph graph for a small example of a multivariate process with (defined below) of the process is known, suitable sets couplings and auto-dependencies, which can be summa- S Yt;Xt can be determined from it and the TE can be rized in a process graph [see Fig. 2(b)]. estimated using only low-dimensional densities. The re- As shown in Fig. 2(a), there is an important difference S maining infinite sum can be truncated at some finite between the conditions Yt;Xt and the (direct) parents of Y P X X X;X Y since the terms typically decay exponentially with , t, denoted Yt ¼f t: 2 t ! tg. This means

258701-2 week ending PRL 108, 258701 (2012) PHYSICAL REVIEW LETTERS 22 JUNE 2012 in particular that the terms in Eq. (3) can be nonzero also statistical associations and the Markov property is fulfilled for lags for which there is no link in the graph and by a large class of model systems ([13], condition (S)). therefore they cannot be used directly as a measure of CMI has been applied to time-continuous chaotic models lag-specific coupling strength, a topic we explore further in [15] and in [17] it has been shown that dynamical noise P X in a separate paper. Since Yt (or any subset of t that actually helps in the coupling analysis. The latter work contains P Y ) will separate Yt from Xt n P Y in the graph, demonstrated that for a precise inference of coupling lags t t X IðXt ; YtjP Y nfXt gÞ > 0 defines links equivalent to one needs enough dynamical noise in that can be mea- t Y Eq. (4), which is used in the following algorithm to esti- sured in . Therefore, we demonstrate the method with a mate the time series graph. system of four stochastic delay-differential equations and The estimation of graphical models is very similar to the couple them linearly and nonlinearly, also in the stochastic problem of inferring the directed acyclic graph [14] of a set terms. This system of Ornstein-Uhlenbeck processes can of random variables. To this end, the idea of the PC- be interpreted as nonlinearly coupled particles, each fluc- algorithm (named after its inventors [14]) is to iteratively tuating in its harmonic potential: _ unveil the links by testing for conditional independence X ¼0:5XðtÞþ0:6Wðt 4ÞXðtÞ; between all possible pairs of nodes conditioned on itera- Y_ 0:9Y t 1:0X t 2 0:6Z t 5 t ; tively more conditions and testing all combinations among ¼ ð Þ ð Þþ ð Þþ Yð Þ _ them. Thereby, the dimension stays as low as possible in Z ¼0:7ZðtÞ0:5Yðt 6ÞþZðtÞ; every iteration step. We adapt this algorithm for time series W_ 0:8W t 0:4Y t 3 2 0:05Y t 3 t ; and propose some modifications to speed up the perform- ¼ ð Þ ð Þ þ ð Þþ Wð Þ ance. The algorithm starts with no a priori knowledge (5) about the links and iteratively learns the set of parents for : t each Y. The union of parents together with the contempo- with independent unit variance white noise processes ð Þ. Y Z raneous links then comprises the graph. Thus, we have a bidirectional feedback Ð and a feed- back loop X ! Y ! W ! X in which Y ! W is nonlinear For every Y, first we estimate the MIs IðXt; YtÞ and ~ and a stochastic coupling W ! X. This system would be initialize the preliminary parents P Y ¼fXt :X 2 X; 0 < t hard to analyze using model-based approaches, especially max;IðXt ; YtÞ > 0g. This set also contains indirect given short sample lengths as used here (T ¼ 1000). links that are now iteratively removed by testing whether ~ Throughout the analysis we have used a fixed significance the CMI between Yt and each Xt 2 P Y conditioned on t threshold I ¼ 0:015 and a maximum lag max ¼ 15. P~ n;i P~ the incrementally increased set of conditions Yt Yt Figure 3 shows the iteration steps. Step (0.0) gives the vanishes. In the outer loop, iterate n over increasing num- result of an analysis using only MI, the first step of the ber of conditions, starting with some n0 > 0. In the inner algorithm. We would wrongly infer that Y drives X, X loop, iterate i through all combinations of picking n nodes drives W, X and Z are coupled and conclude on a P~ P~ n;i Y Z 12 from Yt to define the conditions Yt in this step, and long-range memory process within and at . I X ; Y P~ n;i X P~ Also the precise coupling delays are buried under a broad estimate the CMI ð t tj Yt Þ for all t 2 Yt . After X I X ; Y P~ n;i 0 range of significant lags. The CMI values for each lag can each step, the nodes t with ð t tj Yt Þ¼ are P~ i removed from Yt and the -iteration stops if all possible P~ combinations have been tested. If the cardinality j Yt j n, the algorithm converges, else, increase n by one and iterate again. Once the parents of each process are known, the same algorithm for ¼ 0 can be used to infer the contempora- N X X X ;X Y neous neighbors Yt ¼f t: 2 t t tg, where now IðX ; Y P ; N~ n;i; undirected links are removed if t tj Yt Yt P N~ n;i Þ 0 FIG. 3 (color online). Iterative steps in the analysis of model ð Yt Þ ¼ . The free parameters of this method are Eq. (5), time series length T ¼ 1000, integration time step dt ¼ the maximum lag max, the initial number of conditions 0:01, sampling interval s ¼ 100, MI and CMI estimated using n0, the significance threshold I to determine whether ~ n;i k ¼ 100. The label (n:i) indicates the iteration step. (0.0) shows IðXt ; YtjP Þ > 0, and any parameters of the estimator Yt the MI process graph. With an initial n0 ¼ 3 the next step (3.0) of CMI. A suitable significance test and details of the (see the Supplemental Material [11]) with three conditions is algorithm are discussed in the Supplemental Material already almost identical to the converged graph in step (3.1), d [11]. For the estimation of CMI we use a k-nearest neigh- where the values in the boxes denote the estimate IDTE of TE via bor estimator [15,16]. Eq. (3) with chosen such that IðXt 1; YtjSY ;X Þ has t t1 Now, we demonstrate the method first on a model sys- declined below significance. Incorrect links and lag labels are tem and second on real data. In theory, CMI measures any in red.

258701-3 week ending PRL 108, 258701 (2012) PHYSICAL REVIEW LETTERS 22 JUNE 2012 be represented via lag functions as shown in the Supplemental Material [11]. The algorithm proceeds as follows. Considering the estimation of the parents of X, we choose an initial n0 ¼ 3 and start with the links with the weakest MI, Z at ¼ 1; 2, conditioned on the P~ 3;0 three preliminary parents with the largest MI, Xt ¼ fXt1;Xt2;Wt4g. As these links are due to Y which drives Z and with one step delay also X via W (see process FIG. 4 (color online). Analysis of time series of mean sea level ~ 3;0 pressure with T ¼ 1268 using the same threshold as before and graphs in Fig. 3), IðZt; XtjP X Þ for ¼ 1; 2 vanishes t max ¼ 10. (0.0) depicts the process graph for MI. The iteration P~ and we remove these links from Xt . The second weakest converges in step (2.1); step (2.0) and the lag functions are shown link, Yt8 ! Xt, is indirectly mediated via W and thus in the Supplemental Material [11]. Labels are as in Fig. 3. I Y ; X P~ 3;0 ð t8 tj Xt Þ vanishes and we remove this link as well. Next, we check the coupling from W and the auto- and W Y are possibly due to spatial proximity. This X dependency from t2 conditioned on the same nodes, causal picture of a south-eastward ‘‘flow of entropy’’ is W X whereupon the links from t5 and t2 vanish. Now consistent with the dynamical processes governing the P~ n 3 the set Xt is smaller than ¼ , i.e., all possible con- lower and middle atmosphere circulation in the considered ditions have been tested, and the algorithm for X converges area. One usually observes a superposition of westerly already in the second step. The analysis for the remaining winds with traveling extratropical cyclones that traverse subprocesses is discussed in the Supplemental Material the area and whose trajectories are regulated by the afore- [11]. Apart from some inaccuracy in the coupling lags, mentioned westerlies [19]. Consistent with the causal lags which is due to the continuous nature of the system, this of 1 or 2 days these processes act on short daily time scales. yields the correct graph. While the dimension for the direct We note that this causal structure might change in the high estimation of TE is D ¼ max 4 þ 1 ¼ 61, the dimension troposphere where the influence of quasistationary plane- using the graph and Eq. (3) is between 5 and 24 (depending tary waves and the Ferrel cell might noticeably modify an S above-mentioned causality. This analysis underlines the on Yt;Xt ). It is interesting to compare TE with the model d importance of knowing coupling delays for physical inter- parameters in Eq. (5): Z ! Y has a higher IDTE than pretations and serves as a first step to study more complex X ! Y , while the corresponding parameters are 0.6 and systems like the Indian monsoon and El Nin˜o. 1.0, respectively. TE as a measure of the total influence To conclude, we have addressed the problem of inferring between two processes, therefore, cannot be simply related causal relations among multiple processes within the to the parameters of the underlying model. model-free information-theoretic framework. For the com- To study the performance of the algorithm using a monly used TE, we have derived an exact decomposition significance test from shuffle surrogates, we ran 1000 formula that enables an estimation using finite vectors. numerical experiments with the class of nonlinear discrete Although the estimation dimension is drastically reduced, stochastic models. The results are shown in the it can still be quite high. The graphical model, on the other Supplemental Material [11] and indicate that, compared hand, can be very efficiently estimated by a modification of to MI, the rate of false positives is strongly reduced, while the PC-algorithm with much lower dimension. It gives a correct linear and nonlinear links are well detected. complementary picture of causal relations among multiple n Further, we found that a larger initial 0 significantly processes with precise coupling delays and contempora- speeds up the performance. neous links. Thus, time resolved causal relations are much We now analyze a climatological data set of daily mean easier to estimate than TE as a measure of the total influ- sea level pressure anomalies in the winter months of 1997– ence between two processes. Now the limiting factor in the 2003 [18] at four locations in Eastern Europe indicated on construction of ‘‘causal’’ networks from brain or climate the map in Fig. 4. First, we give the statistical analysis and data [2] is no longer the network size, but only the maxi- then provide a climatological interpretation. From MI in mum degree, i.e., the number of parents. step (0.0), one would infer an almost fully connected graph We acknowledge the support by the German with a broad range of lags (see lag functions in the Environmental Foundation (DBU), the DFG Grant Supplemental Material [11]). For example, we found a No. KU34-1, the DFG research group 1380 ‘‘HIMPAC,’’ strong Y ! Z ‘‘link’’ and a ‘‘link’’ W ! X with a delay and the BMBF project ‘‘PROGRESS’’. We thank J. of about 2 days. The iteration using an initial n0 ¼ 2 con- Zscheischler for helpful comments. verges in the third step (2.1). The link Y ! Z is now much weaker (even below our significance threshold), because a lot of the shared entropy is due to the common driver W. Even more apparent, the W ! X link vanishes due to the [1] G. Pant and K. Kumar, Climates of South Asia (John Wiley condition on Y. Note that the contemporaneous links X Z & Sons, New York, 1997).

258701-4 week ending PRL 108, 258701 (2012) PHYSICAL REVIEW LETTERS 22 JUNE 2012

[2] J. Donges, Y. Zou, N. Marwan, and J. Kurths, Europhys. [9] T. Cover and J. Thomas, Elements of Information Theory Lett. 87, 48007 (2009). (John Wiley & Sons, New York, 2006). [3] C. Hiemstra and J. D. Jones, J. Finance 49, 1639 (1994); [10] T. Schreiber, Phys. Rev. Lett. 85, 461 (2000); M. Palusˇ,V. D. Marinazzo, M. Pellicoro, and S. Stramaglia, Phys. Rev. Koma´rek, Z. Hrncˇ´ırˇ, and K. Sˇteˇrbova´, Phys. Rev. E 63, Lett. 100, 144103 (2008); J. Prusseit and K. Lehnertz, 046211 (2001). Phys. Rev. E 77, 041914 (2008); W.-X. Wang, R. Yang, [11] See Supplemental Material at http://link.aps.org/ Y.C. Lai, V. Kovanis, and C. Grebogi, Phys. Rev. Lett. supplemental/10.1103/PhysRevLett.108.258701 for for- 106, 154101 (2011). mal derivations, remarks on the algorithm, numerical [4] K. Hlavackova-Schindler, M. Palusˇ, M. Vejmelka, and J. experiments, and a comprehensive discussion of the model Bhattacharya, Phys. Rep. 441, 1 (2007). and application analyses. [5] J. Pearl, Causality: Models, Reasoning, and Inference [12] S. L. Lauritzen, Graphical Models, Oxford Statistical (Cambridge University Press, Cambridge, England, Science Series, Vol. 16 (Clarendon, Oxford, 1996); R. 2000). Dahlhaus, Metrika 51, 157 (2000). [6] A. Pikovsky, M. Rosenblum, and J. Kurths, Synchronization: [13] M. Eichler, Probab. Theory Relat. Fields 1, 233 (2012). A Universal Concept in Nonlinear Sciences (Cambridge [14] P. Spirtes, C. Glymour, and R. Scheines, Causation, University Press, Cambridge, England, 2003). Prediction, and Search (MIT, Cambridge, MA, 2000). [7] C. Granger, Econometrica 37, 424 (1969); J. Geweke, in [15] S. Frenzel and B. Pompe, Phys. Rev. Lett. 99, 204101 Handbook of Econometrics II, edited by Z. Griliches and (2007). M. D. Intriligator (Elsevier, Amsterdam, 1984), Chap. 19. [16] M. Vejmelka and M. Palusˇ, Phys. Rev. E 77, 026214 [8] B. Schelter, M. Winterhalder, R. Dahlhaus, J. Kurths, and (2008). J. Timmer, Phys. Rev. Lett. 96, 208103 (2006);L. [17] B. Pompe and J. Runge, Phys. Rev. E 83, 051122 Sommerlade, M. Eichler, M. Jachan, K. Henschel, J. (2011). Timmer, and B. Schelter, Phys. Rev. E 80, 051128 [18] T. Ansell et al., J. Climate 19, 2717 (2006). (2009); J. Nawrath, M. Carmen Romano, M. Thiel, I. Z. [19] E. Palmen and C. W. Newton, Atmospheric Circulation Kiss, M. Wickramasinghe, J. Timmer, J. Kurths, and B. Systems: Their Structure and Physical Interpretation Schelter, Phys. Rev. Lett. 104, 038701 (2010). (Academic, New York, 1969).

258701-5