Pyif: a Fast and Light Weight Implementation to Estimate Bivariate Transfer Entropy for Big Data

PyIF: A Fast and Light Weight Implementation to Estimate Bivariate Transfer Entropy for Big Data Kelechi M. Ikegwu Jacob Trauger Jeff McMullin IIlinois Informatics Institute Department of Computer Science Department of Accounting University of Illinois at Urbana-Champaign University of Illinois at Urbana-Champaign Indiana University Urbana, IL USA Urbana, IL, USA Bloomington, IN, USA [email protected] [email protected] [email protected] Robert J. Brunner Department of Accountancy University of Illinois at Urbana-Champaign Urbana, IL, USA [email protected] Abstract—Transfer entropy is an information measure that perform index-to-index analysis with the Dow Jones Share quantifies information flow between processes evolving in time. Market (DJIA) and the Frankfurt Stock index (DAX) [4] and Transfer entropy has a plethora of potential applications in finan- document the extent to which one index drives the behavior of cial markets, canonical systems, neuroscience, and social media. We offer a fast open source Python implementation called PyIF the other. Following Marschinski and Kantz, other researchers that estimates Transfer Entropy with Kraskov’s method. PyIF apply TE to examine related research questions about financial utilizes KD-Trees, multiple processes by parallelizing queries on markets. These include measuring TE from market indexes, said KD-Trees, and can be used with CUDA compatible GPUs to such as S&P 500 or the DJIA, to individual equities as well significantly reduce the wall time for estimating transfer entropy. as between individual equities [5]. We find from our analyses that PyIF’s GPU implementation is up to 1072 times faster (and it’s CPU implementation is up 181 times Network Inference is another application area of TE. An faster) than existing implementations to estimate transfer entropy objective of network inference is to infer a network by inden- on large data and scales better than existing implementations. tifying relationships between individual processes in the data. Index Terms—Transfer Entropy, Parallel Processing Computational neuroscience, financial market analysis, gene regulatory networks, social media, and multi-agent systems I. INTRODUCTION are areas where TE has been used to model networks. Early Information theory provides a framework for studying the approaches that used TE for network inference either measure quantification, storage, and communication of information [1]. pairwise TE between all pairs of variables in a network or This theory defines entropy as the amount of uncertainty or threshold the TE values to select connections between nodes disorder in a random process. Mutual Information is another in a network [12], [13], and [14]. Recent approaches have measure in this theory which quantifies the amount of informa- used statistical significance tests of pairwise TE to determine tion shared across random variables. While similar to mutual whether links exist [15] and [16]. [5] offers more examples of information, transfer entropy (TE) also considers the dynamics TE applications. of information and how these dynamics evolves in time [2]. Put simply, TE quantifies the reduction in uncertainty in one B. Outline random process from knowing past realizations of another In the next section we formally define TE. The following random process. This is a particularly useful property of TE section discusses TE estimation methods. We then discuss our as many real-world phenomena, from stock market prices to proposed implementation called PyIF to estimate bivariate TE. neural signals, are dynamic processes evolving in time. TE is Next, we describe a comparative analysis between PyIF and also an asymmetric measure of information transfer. Ergo, TE existing implementations that estimate bivariate TE. Lastly, we computed from process A to process B may yield a different conclude the paper with a discussion and future work. result than TE computed from B to A. The information theoretic framework and these measures have led to a variety II. DEFINITION OF TRANSFER ENTROPY of applications in different research areas [1], [5]. In 2000, Schreiber [2] discovered TE and coined the name A. Applications of Transfer Entropy “transfer entropy,” although Milian Palus [3] also indepen- dently discovered the concept as well. Let the function I TE is particularly useful for detecting information transfer represent mutual information between two probability distri- in financial markets [5]. Marschinski and Kantz used TE to butions. Lagged mutual information I(Xt : Yt−k) can be used 978-1-7281-6861-6/20/$31.00 ©2020 IEEE as a time-asymmetric measure of information transfer from Y to X where X and Y are both random processes, k is a B. Kraskov Estimator lag period, and t is the current time period. However, lagged Transfer Entropy can be estimated using k-nearest neighbors mutual information is unsatisfactory as it does not account for [8]. Note that entropy can be estimated with: a shared history between the processes X and Y [6]. n TE considers the shared history between two processes via 1 X H^ (X) = − lnp^(x ) (5) conditional mutual information. Specifically, TE conditions on n i i=1 the past of Xt to remove any redundant or shared information between Xt and its past. This also removes any information Kraskov et al. expanded this definition to estimate entropy to: in the process Y about X at time t that is in the past of X [7]. n Transfer entropy T (where the transfer of information occurs 1 X 1 H^ (X) = − (n (i)) − + (n) + ln(c ) + from Y to X) can be defined as: n x k dx i=1 n TY !X (t) ≡ I(Xt : Yt−kjXt−k) (1) dx X ln((i)) (6) n Kraskov [8] shows that transfer entropy can be expressed as i=1 the difference between two conditional mutual information computations: where n are the number of data points, k are the nearest neighbors, dx is the dimension of x, and cdx is the volume of TY !X (t) = I(XtjXt−k;Yt−k) − I(XtjXt−k) (2) the dx-dimensional unit ball. For two random variables X and (i) th Y, let 2 be the distance between (xi; yi) and it’s k neighbor . x(i) y (i) be denoted by (kxi; kyi). Let and be defined as The intuition of this definition is that TE measures the 2 2 jjxi − kxijj and jjyi − yijj respectively. nx(i) is the number of amount of information in Yt−k about Xt after considering the points xj such that jjxi −xjjj ≤ x(i)=2, (x) is the digamma information in Xt−k about Xt. Put differently, TE quantifies function where the reduction in uncertainty about Xt from knowing Yt−k after considering the reduction in uncertainty about Xt from (x) = Γ(x)−1dΓ(x)=dx (7) knowing Xt−k. and Γ(x) is the ordinary gamma function. Lastly (1) = −C III. ESTIMATING TRANSFER ENTROPY where C = 0:5772156649 and is the Euler-Mascheroni con- There are many techniques for estimating mutual informa- stant. To estimate the entropy for the random variable Y, Y ^ tion. Khan et al. explored the utility of different methods for can be substituted into H(X). mutual information estimation [10] and many of the methods Joint entropy between X and Y can then be estimated as: they considered are applicable to estimate TE. 1 H^ (X; Y ) = − (k) − + (n) + ln(c c ) + k dx dy A. Kernel Density Estimator n dx + dy X ln((i) (8) Kernel Density Estimators can be used to estimate TE [11]. n i For a bivariate dataset of size n with variables X and Y, Mutual Information can be estimated as: where dy is the dimension of y, and cdy is the column of the ^ ^ ^ n dy-dimensional unit ball. Using H(X); H(Y ); and H(X; Y ) 1 X p^XY (xi; yj) I^(X; Y ) = ln (3) mutual information can be estimated as: n p^ (x )^p (y ) i=1 X i Y i n 1 1 X where p^X (xi) and p^Y (yi) are the estimated marginal prob- I^(X; Y ) = (k) − − [ (n (i)) + (n (i))] + (n) k n x y ability density functions and p^XY (xi; yj) is the joint esti- i=1 mated probability density function. For a multivariate dataset (9) containing: x1; x2; :::; xn where each x is in a d-dimensional where ny(i) is the number of points yj such that jjyi − y (i) space, the multivariate kernel density estimator with kernel K yjjj ≤ 2 . This method has been referred to as the Kraskov is defined by: estimator in literature. 1 X x − xi C. Additional Estimators p^(x) = = 1nK( ) (4) nhd h Khan et al. also explored the utility of Edgeworth approxi- i mation of differential entropy to calculate Mutual Information where h is the smoothing parameter, and in this case, K is and adaptive partitioning of the XY plane to estimate the joint a standard multivariate normal kernel defined by K(x) = T probability density, which can be used to estimate mutual −d=2 x x (2π) e 2 . Moon et al. outlined a procedure to estimate information. Ultimately Khan et al. found that a KDE esti- Mutual Information using marginal and joint probabilities with mator and Kraskov estimator outperform other methods with Kernel Density Estimators [11]. respect to their ability to capture the dependence structure of random processes. Currently our software supports estimating data. We make the assumption that there is relatively little bivariate TE using the Kraskov estimator with plans to add to no information transfer between the random processes. We other estimators in the future. run each of the implementations (excluding Transfer Entropy Toolbox) on nano, a cluster of eight SuperMicro servers with IV. PYIF Intel Haswell/Broadwell CPUs and NVIDIA Tesla P100/V100 Our proposed software implementation PyIF 1 is an open GPUs hosted by the National Center of Super Computing Ap- source implementation.

Load more