<<

Vol. 127 (2015) ACTA PHYSICA POLONICA A No. 3-A

Proceedings of the 7th Symposium FENS, Lublin, May 14–17, 2014 Granger and Transfer Entropy for Financial Returns E.M. Syczewskaa,∗ and Z.R. Struzikb,c,d aWarsaw School of Economics, Department of Economic Analyses, Institute of Econometrics, Madalińskiego 6/8, 02-513 Warsaw, Poland bRIKEN Brain Science Institute, 2-1 Hirosawa, Wako-shi 351-0198, Japan cGraduate School of Education, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-0033, Japan dInstitute of Theoretical Physics and Astrophysics, The University of Gdańsk, Wita Stwosza 57, 80-952 Gdańsk, Poland in its linear form has been shown by Barnett, Barrett and Seth [Phys. Rev. Lett. 103, 238701 (2009)] to be equivalent to transfer entropy in case of Gaussian distribution. Generalizations by Hlaváčková- Schindler [Appl. Math. Sci. 5, 3637 (2011)] are applied to distributions typical for biomedical applications. The financial returns, which are of great importance in financial econometrics, typically do not have Gaussian distribution. Generalizations leading to the concept of nonlinear Granger causality (e.g. causality in variance, causality in risk), known and applied in econometric literature, seem to be less known outside this field. In the paper an overview of some of the definitions and applications is given. In particular, we indicate some recent econometric results concerning application of the tests in linear multivariate framework. We emphasize importance of other variants of Granger causality, and need of development of methods reflecting features of financial variables. DOI: 10.12693/APhysPolA.127.A-129 PACS: 89.20.–a, 89.70.Cf, 02.50.–r, 05.45.Tp, 02.70.Rr 1. Introduction important generalizations. Second, the question of sta- Granger causality (GC) and transfer entropy (TE) are tionarity and nonstationarity, and of proper choice of two approaches to mutual causation. The transfer en- distribution has been learned by econometricians the tropy, TE, introduced by Schreiber [1], builds on the hard way. This is especially important for financial time concept of (Shannon) entropy, but aims at detecting series — underestimating or overestimating particular dynamic causation links between a pair of variables. values due to a wrong choice of probability distribu- The Granger causality, GC, aside of TE, tends to be tion for the model can lead to losses or to inefficiency. used by researchers in climatology, physiology, neuro- The proper choice of the distribution is especially crucial physiology, multimode laser dynamics, analysis of causal- for risk assessment and volatility forecasting by financial ity in cardio-respiratory interactions, etc. (see [2], p. 5), institutions. partially due to equivalence of the GC and TE con- Hlaváčková-Schindler et al. [2] in addition to entropy cepts, proved by Barnett et al. in [3], and by Hlaváčková- and mutual measures, describe multivariate Schindler et al. [2]. Let us emphasize that the equiva- GC in vector autoregression model framework, nonlinear lence was shown only for a linear GC test, and originally GC tests, and nonparametric GC measures based on cor- only under assumption of Gaussianity. Later Hlaváčková- relation integral, but still do not cover the GC-TE equiv- Schindler [4] extended the analysis for some probabil- alence in the general sense. On the other hand, econo- ity distributions typical for biomedical phenomena (log- metricians use both linear and nonlinear GC tests, and normal distribution, Gaussian mixtures, generalized nor- tools based on , entropy etc. It seems mal distribution, Weinman exponential distribution). that absorption by econometricians of methods aimed at Barnett and Bossomaier [5] show TE and GC equivalence detecting causality, developed in the field of neurophysi- for the vector autoregressive model also under assump- ology etc. is stronger than that of methods developed in tion of Gaussianity. This does not cover other concepts of econometrics (especially of nonlinear Granger causality) Granger causality, and does not necessarily cover all dis- in the other direction. tributions used in applied financial research and practice. 2. Typical features of financial variables The concentration on the linear form of the GC (and the Gaussianity assumption) in biomedical appli- Financial variables have specific features, which led to cations seems to be somewhat misleading. First of all, development of particular modeling tools from the field of the Granger causality concept as described in papers financial econometrics. Let Pt denote price of a financial by C.W.J. Granger is richer than that (covers linear instrument at time t. Most financial variables of interest and nonlinear causality, causality in spectral domain, (stock indices, exchange rates, stock and options etc.) causality based on information concepts), and has several are nonstationary. Their returns, defined as difference of natural logarithms of prices, rt = ln(Pt) − ln(Pt−1), are stationary in mean, but typically show changes of volatility in time (termed volatility clustering). Changing ∗corresponding author; e-mail: [email protected] volatility is modeled e.g. with ARCH or GARCH-models, the first equation of which describes (and forecasts) mean

(A-129) A-130 E.M. Syczewska, Z.R. Struzik of the variable, and the second equation describes de- The GED probability density function is (see Tsay [15], pendence between changing volatilities with use of condi- p. 122): tional variance of the first equation. The ARCH (“autore- ν exp(−0.5|x/λ|ν ) f (x) = , (2) gressive with conditional heteroskedasticity”) model was GED λ2(1+1/ν)Γ(1/ν) introduced by Robert F. Engle [6] in 1982; GARCH (gen- where Γ(·) is the gamma function, and λ = eralized ARCH) — by Timothy Bollerslev [7] in 1986, and [2(−2/ν)Γ(1/ν)/Γ(3/ν)]1/2. This distribution has heavy many more detailed GARCH-type specifications have tails when ν < 2 and reduces to a Gaussian distribution been introduced by various authors to cover specific fea- when ν = 2. tures of financial instruments and particular markets. To illustrate discrepancies, especially in the tails, be- The GARCH family models originally estimated by the tween Gaussian and the empirical distribution of the maximum likelihood method with assumption of Gaus- daily returns of the USDPLN exchange rate and the sianity of error terms, prove to give more accurate results WIG20 returns, see Figure for daily quotations from if non-Gaussian, often skewed, probability distributions stooq.pl (Jan. 4, 2000–Oct. 31, 2014)‡. The results of the are applied, especially for daily and higher frequency data Granger causality test (based on the VAR model) for the (see Alexander [8]). two variables [16] show that during crisis, Oct. 01, 2007– Returns of financial instruments typically have skewed Feb. 27, 2009, the null hypothesis of no causality from and leptokurtic distributions. According to Mandelbrot, USDPLN to WIG20 is weakly rejected, the same for who first noticed volatility clustering in financial returns causality from WIG20 towards USDPLN. The results time series [9], returns should be modelled with use of † change somewhat after the crisis. Neither the returns nor Pareto-Lévy processes . In finance and in econometrics the error terms of the VAR models are Gaussian. returns are most often described with use of the follow- ing distributions: Gaussian mixtures, t-Student, or more complicated: general error distribution (GED in short) or USDPLN generalizations of the last two (see Osińska [12], pp. 170– ● ● ● ●●●●● ●●●● ●●●●●● 0.04 ●●●●●●● 172). The GED (General Error Distribution) has been ●●●●●●●●● ●●●●●●●●● ●●●●●●●●●●●● ●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●● introduced in [13] by Subbotin, and later generalized by ●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●● 0.00 ●●●●●●●●●●●●●●●● ●●●●●●●●●●●●● ●●●●●●●●●● Theodossiou [14] to a Skewed GED. The reason is that ●●●●●●●●● ●●●● ●●●●●●●●● ●●●●● some stable distributions are not suitable for statistical ● ● Sample Quantiles testing of an econometric model, and that more complex ● −0.06 distributions are intended to better reflect actual features −2 0 2 of financial variables. The skewed generalized t distribu- tion can be expressed as (see Osińska [12]): Theoretical Quantiles   k  fSGDT(x) = C 1 + WIG20 ν − 2 ν+1 k) k ● ● x + µ ●●● −k ●●●●●● ●●●●●●●●●● × [θ(1 + sgn(x − µ))λ] , (1) ●●●●●●●●● ●●●●●●●●●●● ●●●●●●●●●● ●●●●●●●●●●●●● σ ●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●● 0.00 ●●● ●●●●●●●●●●●●●●●● ●●●●●●●●●●●● where B(·, ·) — the beta function, k, ν, λ, µ, σ — param- ●●●●●●●●●●● ●●●●●●●●●● ●●●●●● ●●●●●●● ●●●●●● eters of the distribution (k > 0; ν > 2; −1 < λ < 1; µ — ●●●● ●●●● z Sample Quantiles ● ● the mean; σ — the standard deviation); sgn(z) = |z| ; −0.10 the coefficients are defined as −2 0 2  1 ν −1.5  3 ν − 20.5 C = 0.5B , B , S(λ)σ−1, Theoretical Quantiles k k k k Fig. 1. Comparison of the quantiles of daily returns  k 1/k 1 ν 3 ν − 2 θ = B( , )0.5B( , )0.5S(λ)−1, with theoretical Gaussian distribution. ν − 2 k k k k " In order to be aplicable in financial econometrics, ques-  2 ν − 12 S(λ) = 1 + 3λ2 − 4λ2B , tion of TE and GC equivalence should include general- k k ization to the types of distributions typically applied in #0.5 financial econometric research.  3 ν − 2−1  1 ν −1 × B , B , . In this paper, examples of causation inference in the k k k k case of realistic data models will be reviewed, and the possible directions for improvement will be discussed.

†Description of the Pareto-Lévy, or stable distributions can be found in Feller [10], Vol. II, Chapter VI. S.T. Rachev, among others, ‡QQ-plot is a graph of percentiles of the empirical versus the- is a strong proponent of Lévy distribution (see e.g., [11]). oretical distribution. Granger Causality and Transfer Entropy for Financial Returns A-131

3. Granger causality and transfer entropy denote a distribution of Y conditional on X, and let Jt — two approaches to mutual causation define a set of all information avaiable at time t; Jt \ Xt Causality is usually posed using two alternative scenar- — all information at time t aside from the information ios: the autoregressive, spectral based, causal Granger contained in Xt. If for all forecast horizons k predictive modelling (Granger [17]), and the information F(Yt+k|Jt) = Ft(Yk+1|Jt \ Xt) (3) theoretic oriented, Kullback–Leibler (K–L) divergence then Xt is not a G-cause for Y . based, Transfer Entropy formulation (Schreiber [1]). To make this definition operational, the set of avail- Both measures aspire to infer and quantify mutual di- able information has to be described more precisely. Usu- rectional causation, between coupled variables, or among ally, for weakly stationary time series Xt, Yt, past ob- multiple variables. servations of the variables are taken into account. Let − There is a need to precisely analyse possible equiv- Xt = {xt−1, xt−2, ..., xt−k, ...} (past values of the X vari- alence of the two methods especially for complex phe- able) and similarly for Y . Let Jt be a set of information nomena where the observations and models are usually (σ-algebra) based on the past values of X and Y . If a distributed following non-Gaussian statistics and pos- variance of prediction error for a predictor based on all sess intra-and inter-observable non-linear correlations. the information differs from that for the predictor based As empirical distributions of financial time series differ on all information but the past of X, significally from the Gaussian distribution, in order to σ2(Y |J ) < σ2(Y |J \ X−) (4) apply TE methods to such data we need to address the t t t t t then the X is said to Granger-cause the Y . question of equivalency of the GC and TE approaches The instantaneous G-causality means that Y can be for such skewed and leptokurtic distributions. First we predicted better with use of the past and current value try to clarify concepts of the Granger causality, as de- of X than without them: fined and tested in econometric literature, next we show 2 2 − that concepts based on information theory, such as mu- σ (Yt|Jt ∪ Xt) < σ (Yt|Jt \ (Xt ∪ Xt)). (5) tual information, entropy and transfer entropy are known This definition can be implemented in framework of and used by econometricians to address causality in risk a linear regression model or multivariate vector autore- and in volatility, among others. Next we shall analyse gresion model. In the papers of Barnett et al. [3] and reasoning shown in the papers by Hlaváčková-Schindler Hlaváčková-Schindler [4] (who follows their notation) et al. [2, 4, 18] and Barnett, Barrett, Seth [3, 19], to namely this linear version of GC is compared to the trans- check intuitions behind their proofs of the linear GC and fer entropy approach. TE equivalence. 4.1. Test of significance of (lagged) predictor 4. Diversity of Granger causality concepts The test procedure is the following: run a regression of Y on its lags and lags of X, and check whether lags First the notion of Granger causality needs to be clari- of X are significant (i.e. their parameters are different fied. It can be understood as improvement, by use of past from zero). values of one variable, of forecasts of another, but has also k k other interpretations. The Granger causality ([17, 20]) X X was intended by Granger as a more general notion based yt = αjyt−j + βjxt−j + εt, (6) on two assumptions: “that for a time series, the cause j=1 j=1 preceded the effect and a causal series had information where: yt — observations of a variable of interest; xt−j about the effect that was not contained in any other se- — observations of causal variable; αj, βj — parameters ries according to the conditional distribution”§. Granger of the regression; εt — error term. The restricted version himself devoted several articles and book chapters to is- of the (6) model with the null restrictions imposed is k sues of causality and its role in economic analysis and X forecasts. Aside from the linear time-series approach yt = αjyt−j + ηt. (7) to causality, Granger [17] discussed spectral approach j=1 ¶ as early as in 1969 . In financial analysis, nonlinear The null hypothesis H0 : βj = 0 for all j = 1, ..., k corre- causality concepts and tests (e.g., risk causality, causal- sponds to lack of G-causality from X to Y , the alterna- ity in variance) are becoming more and more important. tive H1 : βj 6= 0 for some j ∈ {1, ..., k} means that the X The concept of the Granger causality is more complicated Granger-causes Y . The joint significance test is applied, than the simplest linear version. either in the form of χ2 or Fisher F distribution∗∗. This The general definition of the Granger causality in mean is called the Granger test of the Granger causality. is the following (Osińska [12], pp. 40–41). Let F(Y |X) Let σˆ2(ε) and σˆ2(η) denote estimators of error term variance for (6) and (7). The Wald test is computed as

§See [21] pp. 69–70. ¶See e.g., Osińska and Stawicki [22] for an example of empirical ∗∗See e.g. Feller [10], Vol. II, chapter II, sect. 3, for description application of causality across spectral frequency bands. of the both statistics. A-132 E.M. Syczewska, Z.R. Struzik

W 2 2 2 T = T [ˆσ (ε) − σˆ (η)]/σˆ (η), (8) Yt = A0 + A1Yt−1 + ... + AkYt−k + Et, where Y — the likelihood-ratio test as vector of variables of the model, Ai — matrices of pa- LR 2 2 rameters, k — number of lags, E — (column) vector of T = T ln[ˆσ (ε)/σˆ (η)] (9) T and the Lagrange Multiplier test as error terms, Et = [ε1t, ε2t, . . . , εmt] . The model can be LM 2 2 2 estimated with maximum likelihood method. T = T [ˆσ (ε) − σˆ (η)]/σˆ (ε), (10) For stationary Y , the test of non-causality can be per- W where T — sample size; the T is preferable; for small formed in the context of bivariate VAR as the Wald sample, the F-form of the tests (right-hand side expres- test of joint insignificance of all lags of one variable in sions are multiplied by (T − k)/q, instead of T , where k the equation explaining another. It can be generalized to — number of restriction, q — number of parameters of VAR with three variables, in which we test GC from X the model) has better properties (see [12], pp. 77-78, 84). to Y and vice versa, with Z as exogeneous regressor. In this framework, instantaneous Granger causality For nonstationary variables, the accepted practice is can be tested: the current X is included on the right- first to check whether there exists a linear stationary hand side with a parameter β0, the null hypothesis cor- combination of the Y1,Y2,...,Ym variables, so-called responds to βj = 0 for j = 0, 1, 2, ..., k. For example, cointegrating relationship which corresponds to a stable Malliaris and Urrutia [23] analyzed daily closing data dynamic long-run equilibrium relationship (see the fa- for index returns of the main stock exchanges (Sydney, mous paper by Engle and Granger [28], and Johansen [29] Tokyo, Hong Kong, Singapore, London and New York), for the appropriate tests)∗∗. The VAR regression should to study potential dependencies between the markets be- then include (stationary) terms corresponding to such a fore, during and after the crash of October 1987. They relationship. For a bivariate case, suppose that uˆt cor- test G-causality and instantaneous version. The null hy- responds to a cointegration relationship between the two pothesis of no causality cannot be rejected for the pre- variables, then the model appropriate for GC-test has the crash period; for the month of the market crash, the χ2 form: test statistics increased, indicating dependence. They de- k k X X tect bidirectional causality between New York and Lon- yt = α1jyt−j + β1jxt−j + γ1uˆt + ε1t, (11) don, New York and Hong Kong, London and Singapore, j=1 j=1 London and Tokyo, London and Sydney, Hong Kong and k k Sydney, and Tokyo and Sydney. X X x = α y + β x + γ uˆ + ε t, (12) Barnett et al. [3] start with a linear regression of the t 2j t−j 2j t−j 2 t 2 “predictee” variable on its lags and lags of “predictor”. j=1 j=1 They use the ratio of the residual variance of errors in where α1j, β1j, γ1 — parameters of the first equa- the restricted equation to the residual variance of unre- tion, α2j, β2j, γ2 — parameters of the second equation. stricted equation. In this they follow Geweke [24], the The null hypothesis β1j = 0 for j = 1, 2, . . . , k corre- same formula as in (9). sponds to lack of causality from X to Y , and the null hy- pothesis α1j = 0 for j = 1, 2, . . . , k to lack of G-causality 4.2. The Sims test of the Granger causality from Y to X. Let us briefly mention another kind of linear test: Recently it was shown by Toda and Yamamoto [31] Sims [25] defines X to be strict exogenuous relative to Y if and Bauer and Maynard [32] that in case of possibly non- the linear predictor of Yt based on past and future values stationary variables it is advisable to apply overparame- of X : ..., xt−1, xt, xt+1, ... is identical to the linear pre- terised VAR model to the pair of variables in question, dictor based only on current and past values of x, and has namely: to choose number of lags, k, in the VAR model shown those two definitions to be equivalent. The Sims according to the Akaike (or Bayesian Schwarz) Informa- test of G-causality from X towards Y is performed as a tion Criteria, and estimate the VAR model using k + r joint significance test of leads xt+1, xt+2, ..., xt+k in a re- lags for both variables, where r denotes maximum inte- †† gression of X on current value and lags of the Y variables gration order for the variables . The Wald test statistics and leads and lags of the X variable. Chamberlain [26] is then applied only to the first k parameters of the lagged extends the Granger and Sims causality definitions using causal variable. conditional independence instead of linear prediction and 5. Entropy and information-based measures shows that Granger and Sims definitions of causality are of dependence equivalent. In discussing potential transfer entropy and linear Granger causality equivalence, we should bear in mind 4.3. Testing G-causality in VAR and VECM framework — stationary vs. nonstationary series Next approach is to check causality for multivariate variables. Let Y denote column vector of variables ∗∗Bossomaier, Barnett and Harré [30] describe briefly Granger of interest: Y = [Y ,Y ,...,Y ]T . The vector autore- causality test in the context of the VAR model (they suggest to 1 2 m apply local linear approximation to a nonstationary series). gressive model (VAR in short), a generalization of the ††A variable is integrated of order r, if it is nonstationary, but its ARMA models of Box and Jenkins [27], is build as r-th differences are stationary. Order of integration is the minimum an autoregression of a (column) vector Y on its lags: number of differences required to achieve stationarity, see [28]. Granger Causality and Transfer Entropy for Financial Returns A-133 that there are several measures and definitions of entropy, Mutual information is the mutual reduction of uncer- starting with the Shannon definition, Rényi entropy and tainty of one variable by knowing the other ([2]). It is others. All can be used to construct measures of mu- nonnegative, equal to zero only for independent variables. tual information, slightly differing in properties; however It can be normalized to give a mutual information co- the Shannon entropy, H, seems to be most widely used. efficient (MIC in short), defined as: Hlaváčková et al. [2] write: “Besides Shannon and Rényi R(X,Y ) = 1 − exp (−2I(X,Y ))0.5 (18) entropy, other entropy definitions (Tsallis, etc.) are stud- for which (see Granger and Teräsvirta [36], Granger and ied in the mathematical literature, but Shannon entropy Lin [37]): is the only one possessing all the desired properties of an 1) 0 ≤ R(X,Y ) ≤ 1; information measure.” (p. 6). Jizba et al. [33] note that 2) R(X,Y ) = 0 ⇐⇒ X and Y are independent; the Rényi transfer entropy is not generally positive, and 3) R(X,Y ) = 1 ⇐⇒ Y = f(X), where f is an invertible the Rényi entropy emphasizes only parts of the probabil- function; ity density function. 4) R(X,Y ) is invariant to data transformation, i.e., The measures based on information theory are known R(X,Y ) = R(h (X), h (Y )), where functions h , h are to and applied by econometricians. Bruzda [34] de- 1 2 1 2 strictly monotonous; scribes conditions formulated by Granger, Maasoumi and 5) for bivariate Gaussian process (X,Y ) with correlation Racine [35] for the ideal measure of functional depen- coefficient ρ(X,Y ), R(X,Y ) simplifies to |ρ(X,Y )|. dence of two stochastic variables: Both I(X,Y ) and R(X,Y ) can be applied to X = X 1) Is well defined for discrete and for continuous vari- t and Y = X , j = 1, 2, ..., to detect autodependencies ables; t−j between a variable and its lags, not necessarily linear. 2) Is normalized to [0, 1] or [−1, 1] interval, and for in- dependent variables is equal to 0; 5.1. Example: R(X,Y ) applied to detecting nonlinearity 3) If there is a measurable function f : Y = f(X), then Bruzda [38] presents simulation experiment for an absolute value of this measure = 1; the nonlinear MA model, ARCH(2), GARCH(1,1), 4) For Gaussian bivariate distribution of (X,Y ), the mea- GARCH-M, three variants of bilinear processes and two sure is either equal to correlation coefficient ρ or is a sim- variants of linear processes: Gaussian white noise and ple function of ρ; Gaussian stationary AR(1) process. Her results are the 5) Fulfills conditions of a distance; following: (1) The mutual information coefficient was able 6) Is invariant to continuous and strictly monotonous to detect all cases of nonlinearity; (2) The maximum cor- transformations of variables. relation coefficient did almost as well, and was able to Let S be a discrete random variable, taking val- detect nonlinearity in variance; (3) The entropy measure ues s1, s2, . . . , sm with corresponding probability pi, i = could distinguish between GARCH-type nonlinearity and 1, 2, . . . , m. The (Shannon) entropy is an “average amount bilinear nonlinearity. (4) For a nonlinear moving-average of information gained from a measurement of one partic- process MA(1), both the mutual information coefficient ular value” (see [2]): and entropy measure cut off after 1 lag, hence perhaps m can be used as a tool for detecting number of lags in such X H(S) = − pi log pi, (13) a model. i=1 5.2. Example: mutual information measures applied next the joint entropy H(X,Y ) of two discrete random to rates of return variables is defined as m m Orzeszko [39] with use of the mutual information and XX XY H(X,Y ) = − p(x , y ) log p(x , y ), (14) the mutual information coefficient for rates of return of i j i j stock exchange indices, proves existence of some autode- i=1 j=1 pendencies in rates of return for the BUX, DAX, CAC20, where x , i = 1, . . . , m — values of the (discrete) vari- i X DJIA, FTSE, HangSeng, Nasdaq, Nikkei, SP500, and able X, y , j = 1, . . . , m — values of the (discrete) vari- j Y WIG20, MWIG40 and SWIG80 for a period 2001/01/02– able Y , p(x , y ) — the joint probability that X is in state i j 2011/06/30. The log returns were first filtered with an x and Y is in state y . The joint entropy is expressed in i j ARMA-GARCH model with t-Student distribution, and terms of conditional entropy H(Y |X): mutual corelation coefficients were applied (and checked H(Y |X) = H(X,Y ) − H(Y ). (15) for significance) for both a series and model errors. In the discrete case the conditional entropy is equal to In some cases this indicated dependencies not explained m m XX XY by the ARMA-GARCH model. H(Y |X) = p(xi, yj)p(yj|xi), (16) 6. G-causality and transfer entropy i=1 j=1 — equivalence only under Gaussianity? where p(yj|xi) denotes conditional probability. Let us address the question of GC and TE equivalence, The mutual information (MI in short) between two starting with description of line of reasoning of Barnett, variables is defined as (see [2, 36–38]): Barrett and Seth [3]. First, they formulate the measure of I(X,Y ) = H(X) + H(Y ) − H(X,Y ). (17) GC as the variance ratio of the restricted and unrestricted linear model error terms, in our notation — equation (7) A-134 E.M. Syczewska, Z.R. Struzik and (6), and the fraction similar to that in (9). Next they Granger causality, and problems of estimation of entropy extend this proportion to multivariate variables. More measures — is performed in [18]. The starting point is precisely, for jointly distributed random variables X, Y also the GC definition based on forecasting interpreta- let Σ(X) denote a covariance matrix of X, Σ(Y ) for Y , tion, and its linear test. Hlaváčková-Schindler next ad- and Σ(X, Y ) = [cov (Xi,Yi)] — cross-covariance matrix dresses question of several methods of entropy estima- for the two variables. They introduce partial covariance tion. In this chapter, she does not investigate question matrix, Σ(Y |X) and show that under certain conditions of other multivariate distributions and their influence on this equals the covariance matrix of the error term in the TE estimation, and on possible equivalence. linear regression of Y on X. Hlaváčková-Schindler et al. [2] describe several other As a measure of Granger causality they use the ratio measures of dependence, and variants of Granger causal- of determinants of the two conditional covariance matri- ity measures (both linear and nonlinear). They note ces: in the numerator, the partial covariance matrix of Y among others, the notion of Granger causality based on conditioned only on its own past (this corresponds to a generalized correlation integrals, used to construct the restricted multivariate model), in the denominator, the Hiemstra–Jones test [40] and its generalizations. determinant of the partial covariance matrix of Y con- Barnett and Seth [19] build the GC measure in the ditioned on its past and the past of X (this corresponds time domain as the likelihood ratio, i.e. again as ratio to the unrestricted model). of appropriate covariance matrices, but provide at the  det(Σ(Y |Y −))  same time tools for GC measure in the frequency domain. FX→Y = ln − − . Their MATLAB toolbox seems to be promising, however det(Σ(Y |Y ∪ X )) does not include tools for Granger causality testing in the If an additional variable, Z, is included on the right-hand nonstationary case of cointegrated variables. In Barnett side of the multivariate model, the measure of GC from and Bossomaier [5] paper, equivalence of GC and TE is X to Y conditional on Z becomes in this notation: shown. In addition, they mention that the VAR linear  det(Σ(Y |Y − ∪ Z−))  model does not necessarily provide the best model for F = ln , X→Y |Z det(Σ(Y |Y − ∪ X− ∪ Z−)) a variable in question, sometimes diagnostic tools indi- cate e.g., heteroskedasticity, thus suggesting the GARCH − − where det(·) denotes determinant of a matrix, Y ∪ Z models; they say: “Now it may be far from clear how one — past values of the Y and Z variable, etc. should define a Granger-like predictive statistic for such Following Schreiber [1], Barnett et al. [3] define trans- models”. In our opinion, the answer is to apply proce- fer entropy of Y to X as “the difference between the dures well known and often applied in financial economet- entropy of Y conditioned on its own past and the past rics — to estimate an ARMA-GARCH or similar model, of Z, and its entropy conditioned, in addition, on the and then apply the GC measure to the residuals (as was past of X”, hence the notation: mentioned in subsection 5.2). − −1 − − − TX→Y |Z =H(Y |Y ∪ Z )−H(Y |Y ∪ X ∪ Z ), 7. Concluding remarks where H(·) denotes entropy and H(·|·) — conditional en- In the context of biomedical data analysis, the assump- tropy. They show that for multivariate Gaussian random tion of Gaussianity and even stationarity may in some variable, its entropy is a function of natural logarithm of cases be fulfilled, but it is not quite realistic in the case the covariance matrix determinant, and that conditional of financial variables. If we take into account such fea- entropy H(Y |X) is proportional to the determinant of tures and note that in general H(f(X)) ≤ H(X), with the corresponding partial covariance matrix. Hence equality only for Gaussian distribution; that estimating F = 2T , of covariance matrix can be more numerically demand- X→Y |Z X→Y |Z ing for data showing heteroskedasticity and long-term de- where the first expression corresponds to the measure of pendence; that distribution of (error terms in models) of Granger causality, the second — to the transfer entropy. financial returns are better approximated by asymmet- Note that Barnett, Barret and Seth in this 2009 paper ric distributions such as (1) or (2) — then we note that do not cover other variants of entropy nor the questions the TE and GC in the context of linear models, does not of entropy estimation. exhaust the topic of the Granger causality. Hlaváčková-Schindler [4] turns to the question of an- In our opinion, other variants of the Granger causal- alytical expressions for entropy of several multivariate ity methods (nonlinear, nonparametric) developed in the distributions — aside from the Gaussian distribution, she field of econometrics, deserve more attention outside of investigates generalize-normal distribution, the Weinman this area. It seems that they are starting to appear re- exponential distribution and the multivariate log-normal cently in neuroscience and similar journals. distribution, showing that for all three the equivalence of For financial applications, TE and GC should be com- FX→Y |Z and TX→Y |Z holds. The starting point is still pared also for (skewed and leptokurtic) distributions used the linear version of the Granger causality. The proof in financial returns modeling, such as skewed t distri- goes along comparison of the corresponding variances bution (1), or the GED (2) of Subbotin [13] and its and TE measure for a Gaussian distribution. More de- skewed generalization of Theodossiou [14]. It is known tailed analysis — covering also nonlinear versions of the that with increase of observations frequency, the financial Granger Causality and Transfer Entropy for Financial Returns A-135 variables distribution similarity to the Gaussian distri- [19] L. Barnett, A.K. Seth, J. Neurosci. Meth. 223, 50 bution decreases with increase of observations frequency (2014). (monthly data can be described with a Gaussian distri- [20] C.W.J. Granger, J. Econ. Dyn. Control 2 329 bution, for daily data it would not work well especially (1980). in modelling and forecasting such features as measures of [21] C.W.J. Granger, J. Econometrics 112, 69 (2003). risk and volatility, needed to make investment decisions). [22] M. Osińska, J. Stawicki, Testing for causality across The more accurate choice of a distribution is crucial for spectral frequency bands, in: Some aspects of the several financial decisions and forecasts. dynamic econometric modelling, Ed. Z. Zieliński, Studies concerning the Shannon entropy representa- Wydawnictwo Uniwersytetu Mikołaja Kopernika, tion for skewed distributions, as the one by Arellano-Valle Toruń 1993, p. 135. et al. [41], seem to be promising base for further research. [23] A.G. Malliaris, J.L. Urrutia, J. Financ. Quant. Also availability of software such as MATLAB software Anal. 27, 353 (1992). by Seth [42] or [19] (and several packages in R) will be [24] J.F. Geweke, J. Am. Stat. Assoc. 77, 304 (1982). helpful. [25] C.A. Sims, Am. Econ. Rev. 62, 540 (1972). In addition, question of precision and robustness of [26] G. Chamberlain, Econometrica 50, 569 (1982). measures (depending e.g. on version of entropy esti- [27] G.E.P. Box, G.M. Jenkins, Time Series Analysis. mates) should be addressed in future research. Forecasting and control, Holden-Day, San Francisco 1976. References [28] C.W.J. Granger, R.F. Engle, Econometrica 55, 251 (1987). [1] Th. Schreiber, Phys. Rev. Lett. 85, 461 (2000). [29] S. Johansen, Likelihood-based inference in cointe- [2] K. Hlávačková-Schindler, M. Paluš, M. Vejmelka, grated vector autoregressive models, Oxford University J. Bhattacharya, Phys. Rep. 441, 1 (2007). Press, Oxford 1995. [3] L. Barnett, A. Barrett, A. Seth, Phys. Rev. Lett. [30] T. Bossomaier, L. Barnett, M. Harré, Complex Adap- 103 238701 (2009). tive Systems Modeling 1, 9 (2013). [4] K. Hlaváčková-Schindler, Appl. Math. Sci. 5, 3637 [31] H.Y. Toda, T. Yamamoto, J. Econometrics 66, 225 (2011). (1995). [5] L. Barnett, T. Bossomaier, Phys. Rev. Lett. 109, [32] D. Bauer, A. Maynard, J. Econometrics 169, 293 138105 (2012). (2012). [6] R.F. Engle, Econometrics 50, 987 (1982). [33] P. Jizba, H. Kleinert, M. Shefaat, Physica A 391, [7] T. Bollerslev, J. Econometrics 31, 307 (1986). 2971 (2012). [8] C. Alexander, Market Risk Analysis. Volume II. [34] J. Bruzda, Procesy nieliniowe i zależności dłu- Practical Financial Econometrics, John Wiley, Chich- gookresowe w ekonomii. Analiza kointegracji nielin- ester 2008. iowej, Wydawnictwo Naukowe Uniwersytetu Mikołaja [9] B. Mandelbrot, J. Bus. 36, 394 (1963). Kopernika, Toruń 2007. [10] W. Feller, An Introduction to Probability Theory and [35] C.W.J. Granger, E. Maasoumi, J. Racine, J. Time Its Applications, Vol. II, John Wiley, New York 1957. Series Analysis 25, 649 (2004). [11] S.T. Rachev, Y.S. Kim, M.L. Bianchi, Frank J. [36] C.W.J. Granger, T. Teräsvirta, Modelling Nonlin- Fabozzi Series: Financial Models with Lévy Processes ear Economic Relationships, Oxford University Press, and Volatility Clustering, John Wiley, Hoboken, NJ Oxford 1993. 2011. [37] C.W.J. Granger, J.-L. Lin, J. Time Series Analysis [12] M. Osińska, Ekonometryczna analiza zależności przy- 15, 371 (1994). czynowych, Wydawnictwo Naukowe Uniwersytetu [38] J. Bruzda, AUNC 34, 183 (2004). Mikołaja Kopernika, Toruń, 2008. [39] W. Orzeszko, Przegląd Statystyczny 59, 369 (2012). [13] M.T. Subbotin, Matematicheskii Sbornik 31, 296 [ ] C. Hiemstra, J.D. Jones, J. Financ. 49, 1639 (1994). (1923). 40 [41] R. Arellano-Valle, J.E. Contreras-Reyes, M.G. Gen- [14] P. Theodossiou, SSRN 11, 1 (2000). ton, Scand. J. Stat. 40, 42 (2012). [15] R.S. Tsay, Analysis of Financial Time Series, 3rd ed., [ ] A.K. Seth, J. Neurosci. Meth. 186, 262 (2010). John Wiley, Hoboken 2007. 42 [16] E.M. Syczewska, Metody Ilościowe w Badaniach Eko- nomicznych XV/4, 169 (2014). [17] C.W.J. Granger, Econometrica 37, 424 (1969). [18] K. Hlaváčková-Schindler, Causality in Time Series: Its Detection and Quantification by Means of Infor- mation Theory, in: Information Theory and Statis- tical Learning, Ed. F. Emmert-Streib, M. Dehmer, Springer-Verlag, New York 2009, p. 183.