Poisson Graphical Granger Causality by Minimum Message Length

Kateřina Hlaváčková-Schindler 1,2 and Claudia Plant 1,3

1 Faculty of Computer Science, University of Vienna, Vienna, Austria
2 Institute of Computer Science of the Czech Academy of Sciences, Prague, Czech Republic, [email protected]
3 ds:UniVie, University of Vienna, Vienna, Austria, [email protected]

Abstract. Graphical Granger models are popular models for causal inference among time series. In this paper we focus on the Poisson graphical Granger model, where the time series follow a Poisson distribution. We use the minimum message length principle for the determination of causal connections in the model. Based on the dispersion coefficient of each time series and on the initial maximum likelihood estimates of the regression coefficients, we propose a minimum message length criterion to select the subset of time series causally connected with each target time series, and a genetic-type algorithm to find this set. To the best of our knowledge, this is the first work applying the minimum message length principle to the Poisson graphical Granger model. Common graphical Granger models are usually applied in scenarios where the number of time observations is much greater than the number of time series, normally by several orders of magnitude. In the opposite case of "short" time series, these methods often suffer from overestimation. We demonstrate in experiments with synthetic Poisson and point process time series that for short time series our method is superior in precision to the compared causal inference methods, i.e. the heterogeneous Granger causality method, the Bayesian causal inference method using the structural equation model LINGAM, and the point process Granger causality.

Keywords: Granger causality · Poisson graphical Granger model · minimum message length · ridge regression for GLM

1 Introduction

Granger causality is a popular method for causality analysis in time series due to its computational simplicity. Its application to time series with a non-Gaussian distribution can, however, be misleading. Recently, Behzadi et al. [2] proposed the heterogeneous graphical Granger model (HGGM) for detecting causal relations among time series having a distribution from the exponential family, which includes a wide class of common distributions. HGGM employs regression in generalized linear models (GLM) with adaptive Lasso as a variable selection method and applies it to time series with a given lag. The approach makes it possible to apply causal inference to time series with discrete values. The Poisson graphical Granger model (PGGM) is a special case of HGGM for detecting Granger-causal relationships among p ≥ 3 Poisson processes. Each process in the model, represented by a time series, is a count process. A count process can be e.g. the number of events, such as arrivals of telephone calls at a call centre, in a time interval. Poisson processes can serve as models of point process data, including neural spike trains [4]. The Poisson graphical Granger model may be appropriate when investigating temporal interactions among processes such as the number of transit passengers of an airport within a time period, or in criminology, when temporal relationships among various crimes in some time interval are investigated. In this paper we approach the inference in the Poisson graphical Granger model by the principle of minimum message length (MML). Our contributions are the following:

– We use the minimum message length principle for the determination of causal connections in the Poisson graphical Granger model.
– For the highly collinear design matrix of the model we define a corrected form of the matrix using the ridge penalty.
– Based on the dispersion coefficient of each time series and on the initial maximum likelihood estimates of the regression coefficients, we propose a minimum message length criterion to select the subset of time series causally connected with each target time series.
– We propose a genetic-type algorithm to find this set.
– To the best of our knowledge, this is the first work applying the minimum message length principle to the Poisson graphical Granger model.
– We demonstrate experimentally that our method is superior in precision to the compared causal inference methods, i.e. the heterogeneous Granger causality method, the Bayesian causal inference method using the structural equation model LINGAM [23] and the point process Granger causality, in the case of short data, i.e. when the number of time observations is approximately of the same order as the number of time series.

The paper is organized as follows. Section 2 presents preliminaries, namely Granger causality and the Poisson graphical Granger model. Section 3 presents the PGGM as an instance of multiple Poisson regression. The MML code for PGGM is computed and the main theorem is stated in Section 4. The algorithm to compute the MML code for PGGM and the genetic algorithm for variable selection in PGGM are explained in Section 5. Related work is discussed in Section 6. Our experiments are presented in Section 7. Section 8 is devoted to conclusions, and the derivation of the criterion can be found in the Appendix.

2 Preliminaries

Relevance of Granger Causality Since its introduction, Granger causality has been criticized, e.g. for not taking counterfactuals into account [12], [17]. As its name implies, Granger causality is not necessarily true causality. In defense of his method, Granger wrote in [6]: "Possible causation is not considered for any arbitrarily selected group of variables, but only for variables for which the researcher has some prior belief that causation is, in some sense, likely." In other words, drawing conclusions about the existence of a causal relation between time series and about its direction is possible only if theoretical knowledge of the mechanisms connecting the time series is accessible. Nevertheless, as confirmed by a recent Nature publication [16], if the theoretical background of the investigated processes is insufficient, methods that infer causal relations from data rather than from knowledge of mechanisms (Granger causality included) are helpful. These methods can also make it possible to perform credible analyses with large amounts of observational time data, e.g. in social networks [11], since they are less costly than common epidemiological or marketing research approaches.

Graphical Granger model The (Gaussian) graphical Granger model extends the autoregressive concept of Granger causality to $p \ge 2$ time series and time lag $d \ge 1$ [1]. Let $x_1^t, \dots, x_p^t$, $t = 1, \dots, n$, be $p$ time series. Consider the vector autoregressive (VAR) models with lag $d$ for $i = 1, \dots, p$

$$x_i^t = X_{t,d}^{Lag} \beta_i' + \varepsilon_i^t \qquad (1)$$

where $X_{t,d}^{Lag} = (x_1^{t-d}, \dots, x_1^{t-1}, \dots, x_p^{t-d}, \dots, x_p^{t-1})$, $\beta_i$ is a matrix of the regression coefficients and $\varepsilon_i^t$ is white noise. One can easily show that $X_{t,d}^{Lag}\beta_i' = \sum_{j=1}^{p}\sum_{l=1}^{d} x_j^{t-l}\beta_j^l$. One says that the time series $x_j$ Granger-causes the time series $x_i$ for the given lag $d$, denoted $x_j \to x_i$, for $i,j = 1, \dots, p$, if and only if at least one of the $d$ coefficients in the $j$-th row of $\beta_i$ in (1) is non-zero.
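For illustration, for $p = 2$ time series and lag $d = 2$, model (1) for the target $x_1$ reads

$$x_1^t = \beta_1^1 x_1^{t-1} + \beta_1^2 x_1^{t-2} + \beta_2^1 x_2^{t-1} + \beta_2^2 x_2^{t-2} + \varepsilon_1^t,$$

so that $x_2 \to x_1$ holds if and only if $\beta_2^1 \ne 0$ or $\beta_2^2 \ne 0$.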

Poisson graphical Granger model The Poisson graphical Granger model has the form

$$x_i^t \approx \lambda_i^t = \exp(X_{t,d}^{Lag}\beta_i') = \exp\Big(\sum_{j=1}^{p}\sum_{l=1}^{d} x_j^{t-l}\beta_j^l\Big) \qquad (2)$$

for $x_i^t$, $i = 1, \dots, p$, $t = d+1, \dots, n$, having a Poisson distribution. Applying the HGGM approach to the case when the link function for each process $x_i$ is the function $\exp$, problem (2) can be solved as

$$\hat\beta_i = \arg\min_{\beta_i} \sum_{t=d+1}^{n}\big(x_i^t - \exp(X_{t,d}^{Lag}\beta_i')\big)^2 + \rho_i R(\beta_i) \qquad (3)$$

for a given lag $d > 0$ and all $t = d+1, \dots, n$, with $R(\beta_i)$ the adaptive Lasso penalty function. (The sign $'$ denotes the transpose of a matrix.) One says that the time series $x_j$ Granger-causes the time series $x_i$ for the given lag $d$, denoted $x_j \to x_i$, for $i,j = 1, \dots, p$, if and only if at least one of the $d$ coefficients in the $j$-th row of $\hat\beta_i$ of the solution of (3) is non-zero [2].

3 Poisson Graphical Granger Model as Multiple Poisson Regression

In this section we derive the Poisson Granger model (2) with a fixed lag $d$ as an instance of a multiple Poisson regression with a fixed design matrix. Consider the full model for $p$ Poisson variables $x_i^t$ and (integer) lag $d \ge 1$ corresponding to the optimization problem (2). To be able to use maximum likelihood (ML) estimation over the regression parameters, we reformulate the matrix of lagged time series $X_{t,d}^{Lag}$ from (1) into a fixed design matrix form. Assume $n - d > pd$ and denote $x_i = (x_i^{d+1}, x_i^{d+2}, \dots, x_i^n)$. We construct the $(n-d) \times (d \times p)$ design matrix

$$X = \begin{pmatrix} x_1^d & \dots & x_1^1 & \dots & x_p^d & \dots & x_p^1 \\ x_1^{d+1} & \dots & x_1^2 & \dots & x_p^{d+1} & \dots & x_p^2 \\ \vdots & & \vdots & & \vdots & & \vdots \\ x_1^{n-1} & \dots & x_1^{n-d} & \dots & x_p^{n-1} & \dots & x_p^{n-d} \end{pmatrix} \qquad (4)$$

and a $1 \times (d \times p)$ vector $\beta_i = (\beta_1^1, \dots, \beta_1^d, \dots, \beta_p^1, \dots, \beta_p^d)$. We can see that problem

$$x_i' \approx \lambda_i' = \exp(X\beta_i') \qquad (5)$$

is equivalent to problem (2) in matrix form, where we denote by $\exp$ a function operating on each coordinate $t = d+1, \dots, n$, and $\lambda_i = (\lambda_i^{d+1}, \dots, \lambda_i^n)$. Denote now by $\gamma_i \subset \Gamma = \{1, \dots, p\}$ the subset of indices of the regressor variables and $k_i := |\gamma_i|$ its cardinality. Let $\beta_i := \beta_i(\gamma_i) \in \mathbb{R}^{1 \times (d \times k_i)}$ be the vector of unknown regression coefficients with a fixed ordering within the subset $\gamma_i$. For illustration purposes and without loss of generality we can assume that the first $k_i$ indices out of the $p$ vectors belong to $\gamma_i$. Considering only the columns of matrix $X$ in (4) corresponding to $\gamma_i$, we define the $(n-d) \times (d \times k_i)$ matrix of lagged vectors with indices from $\gamma_i$ as

$$X_i := X(\gamma_i) = \begin{pmatrix} x_1^d & \dots & x_1^1 & \dots & x_{k_i}^d & \dots & x_{k_i}^1 \\ x_1^{d+1} & \dots & x_1^2 & \dots & x_{k_i}^{d+1} & \dots & x_{k_i}^2 \\ x_1^{d+2} & \dots & x_1^3 & \dots & x_{k_i}^{d+2} & \dots & x_{k_i}^3 \\ \vdots & & \vdots & & \vdots & & \vdots \\ x_1^{n-1} & \dots & x_1^{n-d} & \dots & x_{k_i}^{n-1} & \dots & x_{k_i}^{n-d} \end{pmatrix} \qquad (6)$$
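The construction of (4) and (6) is mechanical; the following minimal numpy sketch (our own illustration, with 0-based array indices and hypothetical function names) builds the full design matrix and its restriction to $\gamma_i$:

```python
import numpy as np

def lagged_design(series: np.ndarray, d: int) -> np.ndarray:
    """Build the (n-d) x (d*p) design matrix X of eq. (4).
    series has shape (p, n); row t of X holds, for each series j,
    the lagged values x_j^{t-1}, ..., x_j^{t-d} (1-based paper notation)."""
    p, n = series.shape
    X = np.empty((n - d, d * p))
    for row, t in enumerate(range(d, n)):      # series index t is the target x^{t+1}
        for j in range(p):
            X[row, j * d:(j + 1) * d] = series[j, t - 1::-1][:d]
    return X

def restrict(X: np.ndarray, gamma, d: int) -> np.ndarray:
    """Keep only the d-column blocks of the series indexed by gamma (eq. (6))."""
    cols = [j * d + l for j in gamma for l in range(d)]
    return X[:, cols]

series = np.random.default_rng(0).poisson(2.0, size=(3, 50)).astype(float)
X = lagged_design(series, d=3)                 # shape (47, 9)
X_i = restrict(X, gamma=[0, 2], d=3)           # columns of the 1st and 3rd series
```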

The problem (5) for explanatory variables with indices from $\gamma_i$ is expressed as

$$x_i' \approx \lambda_i' = E(x_i'|X_i) = \exp(X_i\beta_i') \qquad (7)$$

or alternatively

$$\log(x_i') \approx \log(\lambda_i') = \log(E(x_i'|X_i)) = X_i\beta_i' \qquad (8)$$

with $\beta_i := \beta_i(\gamma_i)$ a $1 \times (dk_i)$ matrix of unknown coefficients, where $\log$ operates on each coordinate. Wherever it is clear from the context, we will simplify the notation to $\beta_i$ instead of $\beta_i(\gamma_i)$ and $X_i$ instead of $X(\gamma_i)$.

4 Minimum Message Length for Poisson Granger Model

Denote by $\Gamma$ the set of all subsets of the covariates $x_i$, $i = 1, \dots, p$. Assume now a fixed set $\gamma_i \in \Gamma$ of covariates with size $k_i \le p$ and the corresponding design matrix $X_i$ from (6). It is well known that the Poisson regression model can still be used in over- or underdispersed settings. (However, the variance of the Poisson model would not be correct in the overdispersed situation.) In the Poisson graphical Granger model, this is the case when the dispersion of at least one time series satisfies $\phi_i \ne 1$. So, using the Poisson regression model, we assume that the mean of $x_i$ in PGGM does not depend on $\phi_i$. It is usual to assume that the targets $x_i$ are independent random variables, conditioned on the features given by $X_i$, so that the likelihood function can be factorized into the product $p(x_i|\beta_i, X_i, \gamma_i) = \prod_{t=1}^{n-d} p(x_i^t|\beta_i, X_i, \gamma_i)$. The log-likelihood function then has the form

$$L_i := \log p(x_i|\beta_i, X_i, \gamma_i) = \sum_{t=1}^{n-d} \log p(x_i^t|\beta_i, X_i, \gamma_i). \qquad (9)$$

Since $X_i$ is highly collinear, to make the ill-posed problem for the coefficients $\beta_i$ a well-posed one, one can use regularization by ridge regression for GLM (see e.g. [22]). Ridge regression requires an initial estimate of $\beta_i$, which can be taken as the maximum likelihood estimator of (7) obtained by the iteratively reweighted least squares (IRLS) algorithm. For a fixed $\rho_i > 0$, the ridge estimates $\hat\beta_{i,\rho_i}$ of the coefficients satisfy

$$\hat\beta_{i,\rho_i} = \arg\min_{\beta_i} \{-L_i + \rho_i \beta_i \Sigma_i \beta_i'\}. \qquad (10)$$

In our paper, however, we will not use the GLM ridge regression in form (10). Instead, we will apply the principle of minimum description length. Ridge regression in the minimum description length framework is equivalent to allowing the prior distribution to depend on a hyperparameter (the ridge regularization parameter). To compute the message length using the MML87 approximation [25], we need the negative log-likelihood function, a prior distribution over the parameters and an appropriate Fisher information matrix. Schmidt and Makalic [21] proposed a corrected form of the Fisher information matrix for a GLM regression with ridge penalty. In our work, we will use this form of ridge regression and apply it to the Poisson graphical Granger model. In the following, we construct the MML code for every subset of covariates in PGGM. The derivation of the criterion can be found in the Appendix.
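As a sketch of how the initial ML estimate can be obtained (the authors' implementation uses Matlab's glmfit; here we assume Python's statsmodels, whose GLM fitting also uses IRLS, and synthetic stand-in data):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X_i = rng.poisson(2.0, size=(200, 6)).astype(float)  # stand-in for matrix (6), d*k_i = 6
x_i = rng.poisson(np.exp(0.05 * X_i.sum(axis=1)))    # synthetic Poisson targets

# IRLS fit of the Poisson GLM (7); no intercept is added, matching the model
beta_init = sm.GLM(x_i, X_i, family=sm.families.Poisson()).fit().params
```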

The MML criterion for PGGM For each $d \ge 1$, assume $x_i^t$, $i = 1, \dots, p$, $t = 1, \dots, n$, and let the estimate $\hat\phi_i$ of the dispersion parameter be given. Let $\hat\beta_i$ be an initial solution of (7) obtained as the maximum likelihood estimate.

(i) The causal graph of the Poisson Granger problem (7) can be inferred from the solutions of $p$ variable selection problems, where for each $i = 1, \dots, p$ the set $\hat\gamma_i$ of variables Granger-causal to $x_i$ is found.

(ii) For the estimated set $\hat\gamma_i$ it holds that

$$\hat\gamma_i = \arg\min_{\gamma_i \in \Gamma} \{I(x_i, \hat\beta_i, \hat\phi_i, \hat\rho_i, X_i, \gamma_i) + I(\gamma_i)\} \qquad (11)$$

where $I(x_i, \hat\beta_i, \hat\phi_i, \hat\rho_i, X_i, \gamma_i) = \min_{\rho_i \in \mathbb{R}^+} \{MML(x_i, \hat\beta_i, \hat\phi_i, \rho_i, X_i, \gamma_i)\}$ and $MML(x_i, \hat\beta_i, \hat\phi_i, \rho_i, X_i, \gamma_i)$ is the minimum message length code of the set $\gamma_i$, which can be expressed as

$$MML(x_i, \hat\beta_i, \hat\phi_i, \rho_i, X_i, \gamma_i) = -L_i + \frac{1}{2}\log|X_i'W_iX_i + \rho_i\Sigma_i| - \frac{1}{2}\log|\Sigma_i| \qquad (12)$$
$$+\ \frac{k_i}{2}\log\Big(\frac{2\pi}{\rho_i}\Big) + \frac{\rho_i}{2\hat\phi_i}\hat\beta_i\Sigma_i\hat\beta_i' + \frac{1}{2}\log(n-d) - \frac{k_i+1}{2}\log(2\pi) + \frac{1}{2}\log((k_i+1)\pi)$$

where $|\hat\gamma_i| = k_i$, $\Sigma_i$ is the unity matrix of size $dk_i \times dk_i$, $W_i$ is a diagonal matrix with entries $W_i(t) = \lambda_i^t = \exp(X_i\hat\beta_i')^t$, $t = 1, \dots, n-d$, for Poisson $x_i$ and $W_i(t) = \lambda_i^t = [x_i^{d+t} - \exp(X_i\hat\beta_i')^t]^2$ for over- or underdispersed Poisson $x_i$, $L_i = \log p(x_i|\hat\beta_i, X_i, \gamma_i) = \sum_{t=d+1}^{n} x_i^t[X_i\hat\beta_i']^t - \exp([X_i\hat\beta_i']^t) - \log(x_i^t!)$, and $I(\gamma_i) = \log\binom{p}{k_i} + \log(p+1)$.
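A direct transcription of (12) into code may read as follows; this is a sketch under the assumptions that $\Sigma_i$ is the identity (so the $-\frac{1}{2}\log|\Sigma_i|$ term vanishes) and that $x_i$, $X_i$, $\hat\beta_i$, $\hat\phi_i$ are given as numpy arrays; the function name is ours:

```python
import numpy as np
from scipy.special import gammaln

def mml_score(x_i, X_i, beta_hat, phi_hat, rho, d, overdispersed=False):
    """Sketch of the MML code (12) for one candidate set gamma_i.
    x_i: targets x_i^{d+1..n}, shape (n-d,); X_i: matrix (6), shape (n-d, d*k_i)."""
    eta = X_i @ beta_hat                       # [X_i beta_i']^t
    lam = np.exp(eta)
    k_i = X_i.shape[1] // d
    # log-likelihood L_i; gammaln(x+1) = log(x!)
    L = np.sum(x_i * eta - lam - gammaln(x_i + 1))
    # diagonal of W_i: Poisson case or sandwich (over-/underdispersed) case
    w = (x_i - lam) ** 2 if overdispersed else lam
    F = X_i.T @ (w[:, None] * X_i) + rho * np.eye(X_i.shape[1])
    _, logdet = np.linalg.slogdet(F)
    return (-L + 0.5 * logdet
            + 0.5 * k_i * np.log(2 * np.pi / rho)
            + rho / (2 * phi_hat) * (beta_hat @ beta_hat)
            + 0.5 * np.log(len(x_i))           # log(n - d)
            - 0.5 * (k_i + 1) * np.log(2 * np.pi)
            + 0.5 * np.log((k_i + 1) * np.pi))
```

For $\hat\phi_i$, a common moment estimator (an assumption on our part; the paper takes the estimate from glmfit) is the Pearson statistic divided by the residual degrees of freedom.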

Remark: Schmidt and Makalic [21] compared the AICc criterion with the MML code for generalized linear models. We constructed the AICc criterion also for PGGM. However, this criterion requires the pseudoinverse of a matrix product involving the matrices $X_i$. Since the $X_i$ are highly collinear, these matrix products had very high condition numbers in our experiments. This consequently led the AICc criterion for PGGM to spurious results, and therefore we do not report them in our paper.

5 Variable Selection in Poisson Graphical Granger Model

For both the Poisson and the overdispersed Poisson case we consider the family of models $M(\gamma_i) := \{p(x_i|\beta_i, X_i, \gamma_i), \gamma_i \in \Gamma\}$ defined by the Poisson densities $p(x_i|\beta_i, X_i, \gamma_i)$. First, we present the procedure in Algorithm 1, which for each $x_i$ computes the MML code for a set $\gamma_i \subset \Gamma$. Then we present Algorithm 2 for the computation of $\hat\gamma_i$. In general, the selection of the best structure $\gamma_i$ amounts to evaluating the values of $MML(\gamma_i)$ for all $\gamma_i \subset \Gamma$, i.e. for all $2^p$ possible subsets, and then picking the subset for which the minimum of the function is achieved. To avoid the exhaustive search approach, we find the $\gamma_i$ with minimum MML by the proposed genetic-algorithm-type procedure called MMLGA. The idea of MMLGA is as follows. Consider an arbitrary $\gamma_i \subset \Gamma$ of size $k_i$ for a fixed $i$ and $d \ge 1$. Define a Boolean vector $Q_i$ of length $p$ corresponding to a given $\gamma_i$ so that it has ones in the positions of the indices of the covariates from $\gamma_i$ and zeros otherwise. Define $I(Q_i) := I(\gamma_i)$, where $I(\gamma_i)$ is from (11). The genetic algorithm MMLGA executes genetic operations on populations of vectors $Q_i$. In the first step, a population of size $m$ ($m$ an even integer) is generated randomly from the set of all $2^p$ binary strings (individuals) of length $p$. Then we select the $m/2$ individuals in the current population with the lowest value of (11) as the elite subpopulation of parents of the next population. For a predefined number $n_g$ of generated populations, the crossover operation on pairs of parents and the mutation operation on a single parent are executed on the elite to create the rest of the new population. A mutation corresponds to a random change in $Q_i$ and a crossover combines the vector entries of a pair of parents. After each run of these two operations on the current population, the current population is replaced with the children with the lowest values of (11) to form the next generation. The algorithm stops after the number $n_g$ of population generations is reached. MMLGA is summarized in Algorithm 2. Our code in Matlab is publicly available at: https://t1p.de/b3gf.

Algorithm 1 MML code for $\gamma_i$

Input: $\gamma_i \in \Gamma$, $d \ge 1$, series = the matrix of $x_i^t$, dispersion parameters $\hat\phi_i$, $i = 1, \dots, p$, $t = 1, \dots, n-d$, $\Sigma_i$, $H$ a set of positive numbers;
Output: for each $i$, the minimum $I(x_i, \hat\beta_i, \hat\rho_i, X_i, \gamma_i)$ over $H$;
for all $x_i$ do
    Construct the $d$-lagged matrix $X_i$ from the time series with indices in $\gamma_i$;
    Find the initial estimates $\hat\beta_i$;
    Compute the matrix $W_i$;
    for all $\rho_i \in H$ do
        Compute $L_i$ from (9);
        Compute $MML(x_i, \hat\beta_i, \rho_i, X_i, \gamma_i)$ from (12);
    end for // $\rho_i$
    Compute $I(x_i, \hat\beta_i, \hat\rho_i, X_i, \gamma_i) = \min_{\rho_i \in H} MML(x_i, \hat\beta_i, \rho_i, X_i, \gamma_i)$;
end for // $x_i$
return $I(x_i, \hat\beta_i, \hat\rho_i, X_i, \gamma_i)$ for each $i$
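The inner minimization over $\rho_i$ (done with fminsearch in our Matlab code) can be sketched in Python with scipy, reusing mml_score from Section 4; the bounds are illustrative, with the lower bound 0.1 matching the set $H$ described in Section 7.1:

```python
from scipy.optimize import minimize_scalar

# x_i, X_i, beta_init, phi_hat and d are assumed to be prepared as in the
# earlier sketches; the upper bound 100.0 is an arbitrary illustrative choice.
res = minimize_scalar(lambda r: mml_score(x_i, X_i, beta_init, phi_hat, r, d),
                      bounds=(0.1, 100.0), method="bounded")
rho_hat, I_value = res.x, res.fun   # I(x_i, beta_i, phi_i, rho_i, X_i, gamma_i)
```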

5.1 Computational Complexity of MMLGA

For the computation of $I(x_i, \hat\beta_i, \hat\rho_i, X_i, \gamma_i)$ we used the Matlab function fminsearch. It is well known that an upper bound on the computational complexity of a genetic algorithm is of the order of the product of the size of an individual, the size of each population, the number of generated populations and the complexity of the function to be minimized. Therefore an upper bound on the computational complexity of MMLGA for $p$ time series, individual size $p$, population size $m$ and $n_g$ population generations is $O(p\,m\,n_g) \times O(\text{fminsearch}) \times p$, where $O(\text{fminsearch})$ can also be estimated. The most complex operations in fminsearch are the computation of the Hessian matrix, which is the same as for the Fisher information matrix (our matrix $W_i$), and the computation of the determinant. The computational complexity of the Hessian for fixed $i$ and an $(n-d) \times (n-d)$ matrix is $O\big(\frac{(n-d)(n-d+1)}{2}\big)$. An upper bound on the complexity of the determinant in (12) is $O((pd)^3)$. As before, we assume $n - d \ge pd$. Denote $M = \max\{pd, n-d+1\}$. Then $M^3 \ge \frac{(M-1)M}{2}$ also holds. Since we have $p$ optimization functions, our upper bound on the computational complexity of MMLGA is then $O(p^2 m n_g M^3)$.

Algorithm 2 MMLGA

Input: $\Gamma$, $d \ge 1$, $p$, $n_g$, $m$ an even integer, $z \le p$ position for offspring; series = the matrix of $x_i^t$, $i = 1, \dots, p$, $t = 1, \dots, n-d$;
Output: Adj := adjacency matrix of the output causal graph;
// For every $x_i$ the $Q_i$ with the minimum of (11) is found
for all $x_i$ do
    Create an initial population $\{Q_i^j, j = 1, \dots, m\}$ at random;
    Compute $I(Q_i^j) := I(x_i, \hat\beta_i, \hat\rho_i, X_i, Q_i^j) + \log\binom{p}{k_i^j} + \log(p+1)$ for each $j = 1, \dots, m$, where $k_i^j$ is the number of ones in $Q_i^j$;
    $v := 1$;
    while $v \le n_g$ do
        $u := 1$;
        while $u \le m$ do
            Sort $I(Q_i^j)$ ascendingly and create the elite population;
            By crossover of $Q_i^j$ and $Q_i^r$, $r \ne j$, create children and add them to the elite; compute $I(Q_i^j)$ for each $j$;
            Mutate a single parent $Q_i^j$ at a random position; compute $I(Q_i^j)$ for each $j$;
            Add the children with minimum $I(Q_i^j)$ until the new population is filled;
            $u := u + 1$;
        end while // $u$
        $v := v + 1$;
    end while // $v$
end for // $x_i$
The $i$-th row of Adj: $Adj_i := Q_i$ with the minimum of (11)
return Adj
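The following compact Python sketch mirrors the structure of Algorithm 2 for a single target series; the elite size, mutation rate and the scoring callback are our illustrative choices, not the exact settings of the Matlab implementation:

```python
import numpy as np

def mmlga_single(score, p, ng=10, m=50, seed=0):
    """Genetic search over Boolean vectors Q_i of length p (Algorithm 2 sketch).
    score(Q) must return I(x_i, ...) + I(gamma_i) from (11), handling empty Q."""
    rng = np.random.default_rng(seed)
    pop = rng.integers(0, 2, size=(m, p), dtype=bool)
    for _ in range(ng):
        fitness = np.array([score(q) for q in pop])
        elite = pop[np.argsort(fitness)][: m // 2]          # best half as parents
        children = []
        while len(children) < m - len(elite):
            a = elite[rng.integers(len(elite))]
            b = elite[rng.integers(len(elite))]
            z = rng.integers(1, p)                          # crossover position
            child = np.concatenate([a[:z], b[z:]])
            if rng.random() < 0.1:                          # occasional mutation
                child[rng.integers(p)] ^= True
            children.append(child)
        pop = np.vstack([elite, *children])
    fitness = np.array([score(q) for q in pop])
    return pop[np.argmin(fitness)]                          # row of Adj for x_i
```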

6 Related Work

The minimum message length (MML) principle is an information-theoretic principle based on statistical inference and data compression. The key idea is that if a statistical model compresses the data, then the model has (with high probability) captured regularities in the data. The MML principle selects as the most descriptive for the data the model which most compresses the data (i.e. the one with the "shortest message length"). To be able to decompress this representation of the data, the details of the statistical model used to encode the data must also be part of the compressed data string. The calculation of the exact message length is an NP-hard problem; the most widely used, less computationally intensive approximation is the Wallace-Freeman approximation called MML87 [25]. Compression schemes for Poisson regression have already been studied in the framework of generalized linear models (GLM). Hansen and Yu [7] derived objective functions for one-dimensional GLM regression by the minimum description length principle. Schmidt and Makalic [21] used MML87 to derive the MML code of a multivariate GLM ridge regression. The mentioned codes cannot, however, be directly used for a Granger model, due to the lag in the time series and the highly collinear matrix of covariates. To the best of our knowledge, compression criteria for the Poisson graphical Granger model have not been published yet. Other papers inferring Granger causality by MDL are [13], [3], [14]. The inference in these papers is, however, done for the bivariate Granger causality, and the extension to graphical Granger methods is not straightforward. Kim et al. [10] proposed the statistical framework for Granger causality (SFGC) that can operate on point processes, including neural spike trains. The framework uses multiple statistical hypothesis testing for each pair of involved neurons: a pair-wise hypothesis test is applied to each pair of possible connections among all time series and the false discovery rate (FDR) is controlled. For a fair comparison with our method we selected causal inference methods which are designed for $p \ge 3$ non-Gaussian processes. In our experiments, we used SFGC as a comparison method together with the publicly available point process time series provided by the authors. As another comparison method we selected LINGAM of Shimizu et al. [23], which estimates a causal structure in Bayesian networks among non-Gaussian time series using structural equation models and independent component analysis. The experiments reported in the papers introducing the comparison methods were done only in scenarios where the number of time observations is several orders of magnitude greater than the number of time series.

7 Experiments

We performed experiments with MMLGA on synthetically generated Poisson processes and on neural spike train data from [10]. We used the method HGGM [2], the method LINGAM [23] and the point process Granger causality SFGC [10] for comparison. To assess the similarity between the target and output causal graphs for all methods, we used the commonly applied F-measure, which takes both precision and recall into account.

7.1 Implementation and Parameter Setting

The comparison method HGGM uses the Matlab package penalized from [18] with the adaptive Lasso penalty. The algorithm in this package employs the Fisher scoring algorithm to estimate the regression coefficients. As recommended by the author of penalized in [18] and employed in [2], we used adaptive Lasso with $\lambda_{max} = 5$, applying cross validation and taking the best result with respect to the F-measure from the interval $(0, \lambda_{max}]$. We also followed the recommendation of the authors of LINGAM in [23] and used threshold = 0.05 and $n/2$ bootstrap resamplings (boots), where $n$ is the length of the time series. In the method SFGC we used the setting recommended by the authors, i.e. the significance level 0.05 of the FDR. The method SFGC is designed for binomial time series, so, as expected, using the generated Poisson time series as input to this method gave very low or zero F-measure, and we do not report these values in our results. For a fair comparison to MMLGA, HGGM and LINGAM, we examine the performance of SFGC with input binomial time series in Section 7.3.

In MMLGA, the initial estimates of $\beta_i$ were obtained by the iteratively reweighted least squares procedure implemented in the Matlab function glmfit; from the same function we also obtained the estimates of the dispersion parameters of the time series. (Computing the initial estimates of $\beta_i$ by the IRLS procedure of the function penalized with the ridge penalty gave poor results in the experiments.) The minimization over $\rho_i$ was done by the function fminsearch, with the set $H$ from Algorithm 1 taken as the positive numbers greater than or equal to 0.1.

7.2 Synthetically Generated Poisson Processes

To be able to evaluate the performance of MMLGA and to compare it to other methods, the ground truth, i.e. the target causal graph, should be known in the experiments. In this series of experiments we examined randomly generated Poisson processes together with correspondingly generated target causal graphs. The performance of MMLGA, HGGM and LINGAM depends on various parameters, including the number of time series (features), the number of causal relations in the Granger causal graph (dependencies), the length of the time series and finally the lag parameter. We examined causal graphs with $p = 5$ and with $p = 9$ time series. The length of the generated time series was "short", varying from 50 to 1000. We generated the Poisson time series randomly; a sketch of such a generator is given below. Concerning the calculation of an appropriate lag for each time series, theoretically this can be done by AIC or BIC. However, the calculation of AIC and BIC assumes that the degrees of freedom are equal to the number of nonzero parameters, which is only known to be true for the Lasso penalty [29], but is not known for adaptive Lasso. In our experiments we followed the recommendation of [2] on how to select the lag of the time series: the authors observed that varying the lag parameter from 3 to 50 influenced neither the performance of HGGM nor of SFGC significantly. Based on that, we considered lags 3 and 4 in our experiments. For the causal graphs with $p = 5$ and with $p = 9$ we tested the performance of the algorithms for numbers of dependencies from 6 to 9. The results of our experiments on causal graphs with 5 features ($p = 5$) are presented in Table 1. Each value in Table 1 represents the mean value of all F-measures over 20 random generations of causal graphs for length $n$ and lag $d$.
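Before the results, here is the data-generation sketch referenced above. It is a hypothetical variant (the exact generator is not specified here): one shared coefficient per edge across all lags, and a clipped linear predictor to keep the counts bounded.

```python
import numpy as np

def generate_pggm(p=5, n=200, d=3, n_edges=7, seed=0):
    """Hypothetical generator of a random Poisson Granger network.
    Returns series of shape (p, n) and adjacency A with A[j, i] = 1 for x_j -> x_i."""
    rng = np.random.default_rng(seed)
    A = np.zeros((p, p), dtype=int)
    idx = rng.choice(p * p, size=n_edges, replace=False)
    A[np.unravel_index(idx, (p, p))] = 1
    B = 0.1 * A                                  # same coefficient for all d lags
    series = np.empty((p, n))
    series[:, :d] = rng.poisson(1.0, size=(p, d))
    for t in range(d, n):
        lagsum = series[:, t - d:t].sum(axis=1)  # sum of the d lags of each series
        lam = np.exp(np.clip(B.T @ lagsum, None, 5.0))   # clip keeps counts bounded
        series[:, t] = rng.poisson(lam)
    return series, A
```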

Table 1: p = 5, average F-measure for each method; MMLGA with ng = 10, m = 50; HGGM with λmax = 5; LINGAM with n/2 boots.

d = 3, n =   50   100   200   300   500  1000
MMLGA      0.80  0.82  0.83  0.77  0.77  0.73
HGGM       0.67  0.73  0.73  0.73  0.71  0.80
LINGAM     0.71  0.71  0.70  0.69  0.65  0.65

d = 4, n =   50   100   200   300   500  1000
MMLGA      0.75  0.77  0.77  0.80  0.80  0.67
HGGM       0.66  0.73  0.71  0.73  0.73  0.80
LINGAM     0.64  0.65  0.64  0.63  0.65  0.64

One can see from Table 1 that MMLGA gave significantly higher precision in terms of the F-measure than both comparison methods for $n$ up to 500. On the other hand, HGGM gave the highest F-measure for $n = 1000$, which for $p = 5$ can be considered a large-data scenario. The results of our experiments with causal graphs with $p = 9$ are presented in Table 2. Each value in Table 2 represents the mean value of all F-measures over 20 random generations of causal graphs for length $n$ and lag $d$.

Table 2: p = 9, average F-measure for each method; MMLGA with ng = 10, m = 50; HGGM with λmax = 5; LINGAM with n/2 boots.

d = 3, n =   50   100   200   300   500  1000
MMLGA      0.67  0.62  0.69  0.69  0.69  0.67
HGGM       0.60  0.48  0.50  0.62  0.65  0.82
LINGAM     0.40  0.28  0.29  0.33  0.35  0.36

d = 4, n =   50   100   200   300   500  1000
MMLGA      0.62  0.61  0.67  0.64  0.64  0.63
HGGM       0.50  0.51  0.54  0.54  0.55  0.80
LINGAM     0.40  0.39  0.39  0.40  0.37  0.38

Similarly as in the experiments with $p = 5$, one can see in Table 2 that for $p = 9$ MMLGA gave a significantly higher F-measure than both comparison methods for $n$ up to 500. HGGM gave a higher F-measure than MMLGA for $n = 1000$, which for $p = 9$ can be considered a large-data scenario. In both networks, with $p = 5$ and $p = 9$ time series, the method LINGAM had the lowest F-measures for all investigated $n$.

7.3 Neural Spike Train Data

In this section we examine the performance of MMLGA, SFGC, HGGM and LINGAM on time series representing spike train data. We used the nine-neuron network with the spike train data from [10] and the corresponding target network in Figure 1-B of that paper. Based on the experimental settings described in [10], the authors generated 100,000 samples for each neuron, and the total number of spikes for each neuron ranged from 2176 through 2911, i.e. three orders of magnitude more observations than time series. The target network thus corresponds to long time series. For a fair comparison of all methods, we compared their precision on short time series, in terms of the F-measure, to the target network from Figure 1-B of that paper.

Spike train data is a special case of a temporal point process, a stochastic time series of binary events that occur in continuous time. It can take on only two values at each point in time, indicating whether or not an event has actually occurred. When considering a data set of size $n$, the point process model of a spike train for neuron $i$ can be defined as a counting process, denoted $\{N(t), 1 \le t \le n\}$, which represents the total number of occurrences or events that have happened up to and including time $t$; it is a Poisson process. We used the point process time series from [10] as input to SFGC and their Poisson representation, as described above, as input to the methods MMLGA, HGGM and LINGAM. We experimented with short time series with $n$ from 100 up to 1000. Table 3 gives the F-measures of the methods. Rephrasing the F-measures in percent, one can see that for $n = 500$ the method MMLGA is able to reconstruct 59 % of the target causal network, while the best reconstruction results are 12 % for SFGC (achieved for $n = 900$ and $n = 1000$), 45 % for HGGM (achieved for $n = 300$) and 19 % for LINGAM (achieved for $n = 900$). One can see that MMLGA significantly outperformed the other three methods in precision, measured by the F-measure, for almost all investigated $n$.

Table 3: p = 9, F-measure for each method; MMLGA with d = 3, ng = 30, m = 50; HGGM with λmax = 5; LINGAM with n/2 boots.

d = 3, n =  100   200   300   400   500   600   700   800   900  1000
MMLGA      0.27  0.43  0.45  0.51  0.59  0.55  0.47  0.46  0.47  0.46
SFGC       0     0.06  0     0     0     0     0.06  0     0.12  0.12
HGGM       0.34  0.44  0.45  0.41  0.44  0.38  0.39  0.38  0.41  0.39
LINGAM     0.39  0.39  0.39  0.39  0.39  0.39  0.39  0.33  0.42  0.39

7.4 Analysis of the Chicago Violent Crime Dataset

Chicago's violent crime rate is substantially higher than the US average. Although national crime rates in the US have stayed near historic lows, Chicago accounted for nearly half of the 2016 increase in crimes in the US [15]. Thus any research on possible causes of the increased number of crimes is valuable for law enforcement agencies. We used the data set of the 5 most frequent crimes in Chicago from [15], i.e. battery, narcotics consumption, criminal damage (violation of property rights), theft and other offenses (e.g. harassment by telephone, a weapon violation). These are yearly measurements from 2001 to 2017. Due to the small data size, this is rather a toy example. We investigated the temporal interactions of these time series. No target graph was given; our goal was to find out whether the resulting causal graphs for each tested method support empirical evidence. A statistical distribution fitting test confirmed the Poisson distribution of all time series. We investigated causal graphs for lags 1 to 3 (to keep the condition $n - d \ge pd$ discussed above). The method HGGM gave a different causal graph for each lag; for $d = 1$ it gave the complete graph, so we excluded it from further analysis. The methods LINGAM and MMLGA each gave a single causal graph as output for all considered lags; these causal graphs can be found (for $n_g = 10$ and $m = 30$) in Figure 1. Focusing on the crime of narcotics consumption, MMLGA outputs other offenses as causal to narcotics and narcotics as causal to battery. Both claims support the empirical evidence; in the second case it is known that drug consumption increases the effects generating violence, as stated in the reports of the Bureau of Justice Statistics, US Department of Justice, e.g. [24]. On the other hand, LINGAM outputs battery as a cause of narcotics, which seems unrealistic. So MMLGA gave a more realistic causal graph than LINGAM.

Fig. 1: Results of causal interactions among 5 most frequent crimes in Chicago for LINGAM and MMLGA.

8 Conclusions

Common graphical Granger models are usually applied in scenarios where the number of time observations is several orders of magnitude greater than the number of time series. In the opposite case of short time series, these methods often suffer from overestimation. In this paper we used the minimum message length principle for the determination of causal connections in the Poisson graphical Granger model. Based on the dispersion coefficient of each time series and on the initial maximum likelihood estimates of the regression coefficients, we proposed a minimum message length criterion to select the subset of time series causal to each target time series. We used a genetic-type algorithm, MMLGA, to find this set. We demonstrated in experiments on synthetic Poisson time series and on point process time series that for short time series our method is superior in precision to the compared causal inference methods, i.e. the heterogeneous Granger causality method, the Bayesian causal inference method using the structural equation model LINGAM and the point process Granger causality. Both MMLGA and HGGM use penalization; MMLGA uses ridge, HGGM adaptive Lasso. The superiority of MMLGA over HGGM for short time series can be explained by MMLGA's use of the dispersion of the time series in the criterion as additional information with respect to HGGM. To the best of our knowledge, this is the first work applying the minimum message length principle to the Poisson graphical Granger model. In future work we would like to investigate other utilizations of the minimum message length principle for PGGM, for example for the dispersion parameter of the involved time series.

Acknowledgement: This work was supported by the Czech Science Foundation, project GA19-16066S.

References

1. Arnold, A., Liu, Y., Abe, N.: Temporal causal modeling with graphical Granger methods. ACM SIGKDD, 66-75 (2007).
2. Behzadi, S., Hlaváčková-Schindler, K., Plant, C.: Granger causality for heterogeneous processes. PAKDD 2019.
3. Budhathoki, K., Vreeken, J.: Origo: causal inference by compression. Knowledge and Information Systems, 56(2), 285-307 (2018).
4. Brown, E.N.: Theory of point processes for neural systems. In: Chow, C., et al.: Methods and Models in Neurophysics. Paris: Elsevier, 691-726 (2005).
5. Granger, C.: Investigating causal relations by econometric models and cross-spectral methods. Econometrica, 424-438 (1969).
6. Granger, C.W.: Some recent development in a concept of causality. Journal of Econometrics, 39(1-2), 199-211 (1988).
7. Hansen, M.H., Yu, B.: Minimum description length criteria for generalized linear models. Lecture Notes-Monograph Series, 145-163 (2003).
8. Hansen, P.C.: Truncated singular value decomposition solutions to discrete ill-posed problems with ill-determined numerical rank. SIAM Journal on Scientific and Statistical Computing, 11(3), 503-518 (1990).
9. Huber, P.J.: The behavior of maximum likelihood estimates under nonstandard conditions. In: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, volume 1, 221-233. University of California Press (1967).
10. Kim, S., Putrino, D., Ghosh, S., Brown, E.N.: A Granger causality measure for point process models of ensemble neural spiking activity. PLOS Computational Biology, 1-13 (2011).
11. Kwak, H., Lee, C., Park, H., Moon, S.: What is Twitter, a social network or a news media? In: Proceedings of the 19th International Conference on World Wide Web, 591-600. ACM (2010).
12. Mannino, M., Bressler, S.L.: Foundational perspectives on causality in large-scale brain networks. Physics of Life Reviews, 15, 107-123 (2015).
13. Marx, A., Vreeken, J.: Telling cause from effect using MDL-based local and global regression. IEEE ICDM, 307-316 (2017).
14. Marx, A., Vreeken, J.: Causal inference on multivariate and mixed-type data. ECML PKDD 2018, 655-671 (2018).
15. Mangipudi, V.: Analysis of crimes in Chicago 2001-2017. https://rstudio-pubs-static.s3.amazonaws.com/294927b602318d06b74e4cb2e6be336522e94e.html Accessed Feb 21, 2020.
16. Marinescu, I.E., Lawlor, P.N., Kording, K.P.: Quasi-experimental causality in neuroscience and behavioural research. Nature Human Behaviour, 1 (2018).
17. Maziarz, M.: A review of the Granger-causality fallacy. The Journal of Philosophical Economics: Reflections on Economic and Social Issues, 8(2), 86-105 (2015).

18. McIlhagga, W.H.: penalized: A MATLAB toolbox for fitting generalized linear models with penalties. Journal of Statistical Software, 72(6) (2016).
19. Peterson, L.E.: PIRLS: Poisson iteratively reweighted least squares computer program for additive, multiplicative, power, and non-linear models. Journal of Statistical Software, 2, 1-28 (1997).
20. Schmidt, D.F., Makalic, E.: MML invariant linear regression. In: Advances in Artificial Intelligence, 312-321 (2009).
21. Schmidt, D.F., Makalic, E.: Minimum message length ridge regression for generalized linear models. In: Australasian Joint Conference on Artificial Intelligence, 408-420. Springer, Cham (2013).
22. Segerstedt, B.: On ordinary ridge regression in generalized linear models. Communications in Statistics - Theory and Methods, 21(8), 2227-2246 (1992).
23. Shimizu, S., Inazumi, T., Sogawa, Y., Hyvärinen, A., Kawahara, Y., Washio, T., Hoyer, P.O., Bollen, K.: DirectLiNGAM: A direct method for learning a linear non-Gaussian structural equation model. Journal of Machine Learning Research, 12, 1225-1248 (2011).
24. U.S. Department of Justice, Office of Justice Programs, Bureau of Justice Statistics. https://www.bjs.gov/content/pub/pdf/DRRC.PDF
25. Wallace, C.S., Freeman, P.R.: Estimation and inference by compact coding. Journal of the Royal Statistical Society: Series B, 49(3), 240-252 (1987).
26. Wong, C.K., Makalic, E., Schmidt, D.F.: Minimum message length inference of the Poisson and geometric models using heavy-tailed prior distributions. Journal of Mathematical Psychology, 83, 1-11 (2018).
27. Zhou, D., Xiao, Y., Zhang, Y., Xu, Z., Cai, D.: Granger causality network reconstruction of conductance-based integrate-and-fire neuronal systems. PLoS ONE, 9(2) (2014).
28. Zou, H.: The adaptive lasso and its oracle properties. Journal of the American Statistical Association, 1418-1429 (2006).
29. Zou, H., Hastie, T., Tibshirani, R.: On the "degrees of freedom" of the lasso. The Annals of Statistics, 35(5), 2173-2192 (2007).

9 Appendix

Derivation of the MML Criterion for PGGM

Assume $p$ independent Poisson random variables expressed by time series $x_i^t$, $t = d+1, \dots, n$, with lag $d > 0$, and the problem (7). Assume the estimate $\hat\phi_i$ is given. We consider $\gamma_i$ fixed, so for simplicity of writing we omit it from the list of variables of the functions. First we need to express the log-likelihood function in terms of the parameters $\beta_i$. Since we use the Poisson model for $x_i$ having the Poisson or overdispersed Poisson distribution, we omit $\phi_i$ from the list of parameters which condition the function $p$. For a given set of parameters $\beta_i$, the probability of attaining $x_i^{d+1}, \dots, x_i^n$ is given by

$$p(x_i^{d+1}, \dots, x_i^n | X_i, \beta_i) = \prod_{t=d+1}^{n} \frac{(\lambda_i^t)^{x_i^t} \exp(-\lambda_i^t)}{(x_i^t)!} = \prod_{t=d+1}^{n} \frac{\exp([X_i\beta_i']^t)^{x_i^t} \exp(-\exp([X_i\beta_i']^t))}{x_i^t!}$$

where $[X_i\beta_i']^t$ denotes the $t$-th coordinate of the vector $X_i\beta_i'$. The log-likelihood in terms of $\beta_i$ is

$$L_i = ll(\beta_i|x_i, X_i) = \log p(\beta_i|x_i, X_i) = \sum_{t=d+1}^{n} x_i^t[X_i\beta_i']^t - \exp([X_i\beta_i']^t) - \log(x_i^t!).$$

Having the function $L_i$, we can now compute an initial estimate $\hat\beta_i$ from (7), which is the solution to the system of score equations. Since $-ll(\beta_i|x_i, X_i)$ is a convex function, one can use standard convex optimization techniques (e.g. the Newton-Raphson method) to solve these equations numerically. (In our code, we use the Matlab implementation of an iteratively reweighted least squares (IRLS) algorithm of the Newton-Raphson method.) Assume now we have an initial solution $\hat\beta_i$ from (7).

1. Derivation of the matrix $W_i$ for $x_i$ with Poisson distribution: The Fisher information matrix $J_i = J(\beta_i) = -E_{\beta_i}(\nabla^2 ll(\beta_i|x_i, X_i))$ may be obtained by computing the second order partial derivatives of $ll$ for $r, s = 1, \dots, k_i$. This gives

$$\frac{\delta^2 ll(\beta_i|x_i, X_i)}{\delta\beta_i^r\,\delta\beta_i^s} = \frac{\delta}{\delta\beta_i^s}\Big[\sum_{t=d+1}^{n} x_i^t \sum_{l=1}^{d} x_r^{t-l} - \exp\Big(\sum_{j=1}^{k_i}\sum_{l=1}^{d} x_j^{t-l}\beta_j^l\Big)\sum_{l=1}^{d} x_r^{t-l}\Big] = -\sum_{t=d+1}^{n} \exp\Big(\sum_{j=1}^{k_i}\sum_{l=1}^{d} x_j^{t-l}\beta_j^l\Big)\Big(\sum_{l=1}^{d} x_s^{t-l}\Big)\Big(\sum_{l=1}^{d} x_r^{t-l}\Big).$$

If we denote $W_i := \mathrm{diag}\big(\exp\big(\sum_{j=1}^{k_i}\sum_{l=1}^{d} x_j^{d+1-l}\beta_j^l\big), \dots, \exp\big(\sum_{j=1}^{k_i}\sum_{l=1}^{d} x_j^{n-l}\beta_j^l\big)\big)$, then we have the Fisher information matrix $J(\beta_i) = X_i' W_i X_i$.

2. Derivation of the matrix $W_i$ for $x_i$ with overdispersed Poisson distribution: Assume now the dispersion parameter $\phi_i > 0$, $\phi_i \ne 1$. The variance of the overdispersed Poisson distribution is $\phi_i\lambda_i$. We know that the Poisson regression model can still be used in overdispersed settings and the function $ll$ is the same as $ll(\beta_i)$ derived above. We use the robust sandwich estimate of the covariance of $\hat\beta_i$, proposed in [9] for a general Poisson regression. The Fisher information matrix of the overdispersed problem is $J_i = J(\beta_i) = X_i' W_i X_i$, where $W_i$ is constructed for PGGM based on [9] and has the form

$$W_i = \mathrm{diag}\Big(\big[x_i^{d+1} - \exp\big(\textstyle\sum_{j=1}^{k_i}\sum_{l=1}^{d} x_j^{d+1-l}\beta_j^l\big)\big]^2, \dots, \big[x_i^{n} - \exp\big(\textstyle\sum_{j=1}^{k_i}\sum_{l=1}^{d} x_j^{n-l}\beta_j^l\big)\big]^2\Big).$$

Having the parameters $\hat\beta_i$, $\hat\phi_i$, $\Sigma_i$, $W_i$ and $\rho_i$, we still need to construct the function $MML(\gamma_i)$.

Construction of the function $MML(\gamma_i)$: Having these parameters, we use for each $i = 1, \dots, p$ and regression (7) formula (18) from [21], i.e. for the case when therein $\alpha := 0$, $\beta := \beta_i$, $X := X_i$, $y := x_i$, $n := n - d$, $k := k_i$, $\theta := \hat\beta_i$, $\lambda := \hat\rho_i$, $\phi := \hat\phi_i$, and $S = \Sigma_i$ is the unity matrix of dimension $dk_i$. The corrected Fisher information matrix for the parameters $\beta_i$ is then $J(\beta_i|\phi_i, \rho_i) = \big(\frac{1}{\phi_i}\big)X_i'W_iX_i + \rho_i\Sigma_i$, where $\lambda_i = \exp(X_i\beta_i')$. The function $c(m)$ for $m := k_i + 1$ is then $c(k_i+1) = -\frac{k_i+1}{2}\log(2\pi) + \frac{1}{2}\log((k_i+1)\pi) - 0.5772$; the constants independent of $k_i$ were omitted from the MML code, since the optimization over $\gamma_i$ is independent of them. Among all subsets $\gamma_i \in \Gamma$, there are $\binom{p}{k_i}$ subsets of size $k_i$. If nothing is known a priori about the likelihood of any covariate $x_i$ being included in the final model, a prior that treats all subset sizes as equally likely, $\pi(|\gamma_i|) = 1/(p+1)$, is appropriate [21]. This gives the code length $I(\gamma_i) = \log\binom{p}{k_i} + \log(p+1)$, as in (11).