Fitting Graphical Interaction Models to Multivariate Time Series
Michael Eichler
Department of Quantitative Economics, University of Maastricht, 6200 MD Maastricht, Netherlands
[email protected]
Abstract

Graphical interaction models have become an important tool for analysing multivariate time series. In these models, the interrelationships among the components of a time series are described by undirected graphs in which the vertices depict the components while the edges indicate possible dependencies between the components. Current methods for the identification of the graphical structure are based on nonparametric spectral estimation, which prevents the application of common model selection strategies. In this paper, we present a parametric approach for graphical interaction modelling of multivariate stationary time series. The proposed models generalize covariance selection models to the time series setting and are formulated in terms of inverse covariances. We show that these models correspond to vector autoregressive models under conditional independence constraints encoded by undirected graphs. Furthermore, we discuss maximum likelihood estimation based on Whittle's approximation to the log-likelihood function and propose an iterative method for solving the resulting likelihood equations. The concepts are illustrated by an example.

1 INTRODUCTION

Graphical models have become an important tool for analysing multivariate data. While the theory was originally developed for variables that are sampled with independent replications, graphical models have recently been applied also to stationary multivariate time series (e.g., Brillinger 1996, Dahlhaus 2000, Eichler 1999, 2001, 2006, Dahlhaus and Eichler 2003).

A particularly simple graphical representation is provided by graphical interaction models, which visualize dependencies by undirected graphs. For time series, this approach has been discussed by Dahlhaus (2000), who introduced so-called partial correlation graphs. These are undirected graphs in which each component of the time series is represented by one vertex. The conditional independences encoded by such graphs can be formulated in terms of the inverse of the spectral matrix. This has led to nonparametric tests for the presence of an edge in the partial correlation graph based on the maximum of the partial spectral coherence (Dahlhaus et al. 1997) or on the integrated partial spectral coherence (Eichler 2004). The concept of partial correlation graphs has been used in many applications from various scientific fields (e.g., Eichler et al. 2003, Timmer et al. 2000, Gather et al. 2002, Fried and Didelez 2003).

The main disadvantage of the current nonparametric approach to graphical modelling based on partial correlation graphs is the lack of a rigorous theory for identifying the best fitting graph. An alternative to the nonparametric approach is the fitting of parametric graphical models in which the parameters are constrained with respect to undirected graphs. The problem of estimating the dependence structure of the process then becomes a problem of model selection, where the best approximating model minimizes some chosen model distance such as the Kullback-Leibler information divergence.

In this paper, we propose graphical interaction models for stationary Gaussian time series that are defined in terms of inverse covariances. In these models, the conditional independences encoded by an undirected graph correspond to zero constraints on the parameters. In Section 2, we review the concept of conditional independence graphs for stationary time series. For Gaussian processes, these graphs are identical to partial correlation graphs, which are restricted to linear dependencies. In Section 3, we introduce parametric graphical interaction models for Gaussian processes and discuss their relation to vector autoregressive models. In Section 4, we discuss estimation of the model parameters based on Whittle's approximation to the log-likelihood function; an iterative algorithm for computing the resulting maximum likelihood estimates is presented in Section 5. In Section 6, the fitting of graphical interaction models for time series is illustrated by an example with air pollution data, and Section 7 concludes.

2 UNDIRECTED GRAPHS FOR TIME SERIES

Let X_V = (X_V(t))_{t ∈ ℤ} with X_V(t) = (X_v(t))_{v ∈ V} be a stationary Gaussian process with mean zero and covariances c_{VV}(u) = E[X_V(t) X_V(t − u)′]. Throughout the paper, we will make the following assumption.

Assumption 2.1. The spectral density matrix

    f_{VV}(λ) = (1/2π) Σ_{u = −∞}^{∞} c_{VV}(u) e^{−iλu}

exists, and its eigenvalues are bounded and bounded away from zero uniformly for all λ ∈ [−π, π].
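Assumption 2.1 can be illustrated numerically. The following sketch (a hypothetical bivariate VAR(1) with unit noise covariance, not an example from the paper) evaluates the spectral matrix f(λ) = (1/2π) A(λ)⁻¹ A(λ)⁻* with A(λ) = I − a e^{−iλ} on a frequency grid and checks that its eigenvalues stay bounded and bounded away from zero:

```python
import numpy as np

# Hypothetical stable bivariate VAR(1): X(t) = a X(t-1) + e(t), var(e(t)) = I.
# Its spectral density matrix is f(lam) = (1/2pi) A(lam)^{-1} A(lam)^{-*}
# with A(lam) = I - a exp(-i lam).
a = np.array([[0.5, 0.2],
              [0.0, 0.4]])

def spectral_matrix(lam):
    A = np.eye(2) - a * np.exp(-1j * lam)
    Ainv = np.linalg.inv(A)
    return Ainv @ Ainv.conj().T / (2 * np.pi)

# Assumption 2.1: eigenvalues bounded and bounded away from zero on [-pi, pi]
grid = np.linspace(-np.pi, np.pi, 201)
eigs = np.array([np.linalg.eigvalsh(spectral_matrix(lam)) for lam in grid])
print(eigs.min(), eigs.max())
```

Because the autoregressive matrix is stable, A(λ) stays invertible for all λ, so both eigenvalue bounds hold on the whole interval.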
The structure of the interdependencies among the components of a multivariate time series X_V can be described in terms of conditional independencies between complete components of X_V. More precisely, let A, B, and S be disjoint subsets of V. Then the subprocesses X_A and X_B are said to be conditionally independent given X_S if

    E[g(X_A) h(X_B) | X_S] = E[g(X_A) | X_S] E[h(X_B) | X_S]

for all real-valued measurable functions g(·) and h(·) on the sample spaces of X_A and X_B, respectively. In this case, we write X_A ⊥⊥ X_B | X_S.

For Gaussian processes X_V, inference about conditional independence graphs is commonly carried out in the frequency domain (e.g., Dahlhaus 2000, Dahlhaus et al. 1997, Fried and Didelez 2003). Here, the conditional dependence between the components X_A and X_B given X_S can be described by the partial cross-spectrum of X_A and X_B given X_S,

    f_{AB|S}(λ) = f_{AB}(λ) − f_{AS}(λ) f_{SS}(λ)^{−1} f_{SB}(λ),

which is the cross-spectrum of the residual processes ε_{A|S} and ε_{B|S} obtained by removing the linear effects of the series X_S from the processes X_A and X_B, respectively (cf. Dahlhaus 2000). Equivalently, we can use the partial spectral coherency R_{AB|S}(λ) of X_A and X_B given X_S, which is given by

    R_{ab|S}(λ) = f_{ab|S}(λ) / (f_{aa|S}(λ) f_{bb|S}(λ))^{1/2}

for a ∈ A and b ∈ B. For Gaussian processes, we have

    X_A ⊥⊥ X_B | X_S  ⇔  R_{AB|S}(λ) = 0 for all λ ∈ [−π, π].    (1)
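As a small numerical illustration of the partial cross-spectrum and the partial spectral coherency (using a hypothetical 3 × 3 spectral matrix at a single frequency; the helper functions are ours, not from the paper):

```python
import numpy as np

# Hypothetical spectral matrix f(lam) at one frequency for V = {0, 1, 2}
# (Hermitian and positive definite, as required by Assumption 2.1).
f = np.array([[2.0,        0.5 + 0.2j, 0.8 - 0.1j],
              [0.5 - 0.2j, 1.5,        0.3 + 0.4j],
              [0.8 + 0.1j, 0.3 - 0.4j, 1.8       ]])

def partial_cross_spectrum(f, A, B, S):
    # f_{AB|S} = f_AB - f_AS f_SS^{-1} f_SB: the cross-spectrum that
    # remains after removing the linear effects of X_S
    return (f[np.ix_(A, B)]
            - f[np.ix_(A, S)] @ np.linalg.inv(f[np.ix_(S, S)]) @ f[np.ix_(S, B)])

def partial_coherency(f, a, b, S):
    # R_{ab|S} = f_{ab|S} / sqrt(f_{aa|S} f_{bb|S})
    fab = partial_cross_spectrum(f, [a], [b], S)[0, 0]
    faa = partial_cross_spectrum(f, [a], [a], S)[0, 0].real
    fbb = partial_cross_spectrum(f, [b], [b], S)[0, 0].real
    return fab / np.sqrt(faa * fbb)

R01 = partial_coherency(f, 0, 1, S=[2])
print(abs(R01))  # modulus always lies in [0, 1) for a positive definite f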
As in the case of ordinary random variables, conditional independence relations of this form can be encoded by undirected graphs. Recall that an undirected graph G is defined as a pair (V, E), where V is a set of vertices or nodes and E is a collection of undirected edges a − b for distinct nodes a, b ∈ V. Each component X_v of a process X_V may then be represented by a single vertex v in a graph G = (V, E), while the edges in E indicate possible dependencies among the components of X_V. This leads to the following definition of conditional independence graphs.

Definition 2.2. Let X_V be a stationary process. The conditional independence graph associated with X_V is the graph G = (V, E) with vertex set V and edge set E such that

    a − b ∉ E  ⇔  X_a ⊥⊥ X_b | X_{V\{a,b}}

for all distinct a, b ∈ V.

The conditional independence graph encodes the pairwise conditional independence relations that hold for the process X_V. Under additional assumptions, more general conditional independence relations can be derived from it. Properties that allow us to associate certain statements about the graph G with corresponding conditional independence statements about the variables in X_V are called graphical Markov properties. For instance, if X_V is a Gaussian process such that Assumption 2.1 holds, then X_V obeys the so-called global Markov property with respect to its conditional independence graph G: for disjoint sets A, B, S ⊆ V, we say that S separates A and B if every path between A and B intersects S; this will be denoted by A ⋈ B | S. Then X_V satisfies the global Markov property with respect to G if

    A ⋈ B | S in G  ⇒  X_A ⊥⊥ X_B | X_S

for all disjoint subsets A, B, S ⊆ V. We also say that X_V is Markov for the graph G. For details, we refer to Dahlhaus (2000).

For random vectors, the partial correlations can be obtained from the inverse of the covariance matrix. Dahlhaus (2000) showed that a similar relationship holds between the partial spectral coherences and the inverse of the spectral matrix. More precisely, let g_{VV}(λ) = f_{VV}(λ)^{−1} denote the inverse spectral matrix. Then, under Assumption 2.1, we have

    R_{ab|V\{a,b}}(λ) = −g_{ab}(λ) / (g_{aa}(λ) g_{bb}(λ))^{1/2}.    (2)

This relation provides an efficient method for computing estimates of the partial spectral coherences R_{ab|V\{a,b}}(λ) of a process X_V: let f̂_{VV}(λ) be an estimate of the spectral matrix, such as a nonparametric kernel spectral estimate, and let ĝ_{VV}(λ) = f̂_{VV}(λ)^{−1} be its inverse; then the partial spectral coherences can be estimated by substituting the entries of ĝ_{VV}(λ) for the corresponding entries of g_{VV}(λ) in (2). We note that for non-Gaussian processes, this nonparametric frequency domain approach can still be used for inference on the partial correlation graph, which describes only the linear dependencies among the variables.
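This recipe can be sketched in a few lines; the following is a minimal, illustrative simulation (an assumed trivariate VAR(1), with a plain moving-average smoother standing in for the kernel spectral estimate), not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy trivariate VAR(1) (an assumed example, not the paper's air pollution
# data): components 0 and 2 are both driven by component 1 but do not
# interact directly, so the edge 0 -- 2 is absent from the graph.
T, d, h = 4096, 3, 32
X = np.zeros((T, d))
e = rng.standard_normal((T, d))
for t in range(1, T):
    X[t, 0] = 0.5 * X[t-1, 0] + 0.4 * X[t-1, 1] + e[t, 0]
    X[t, 1] = 0.5 * X[t-1, 1] + e[t, 1]
    X[t, 2] = 0.4 * X[t-1, 1] + 0.5 * X[t-1, 2] + e[t, 2]

# Periodogram matrices at the Fourier frequencies
D = np.fft.fft(X, axis=0)
I = np.einsum('ja,jb->jab', D, D.conj()) / (2 * np.pi * T)

# Moving-average smoothing over frequencies as a simple kernel estimate
f_hat = np.empty_like(I)
kernel = np.ones(2 * h + 1) / (2 * h + 1)
for a in range(d):
    for b in range(d):
        padded = np.concatenate([I[-h:, a, b], I[:, a, b], I[:h, a, b]])
        f_hat[:, a, b] = np.convolve(padded, kernel, mode='valid')

# Relation (2): rescale the inverse spectral matrix entrywise
g_hat = np.linalg.inv(f_hat)
scale = np.sqrt(np.einsum('jaa->ja', g_hat).real)
R_hat = -g_hat / (scale[:, :, None] * scale[:, None, :])

# Maximum partial coherence per pair; the absent edge 0 -- 2 should give
# the smallest value
stat = np.abs(R_hat).max(axis=0)
print(stat[0, 1], stat[1, 2], stat[0, 2])
```

Since the simulated process has no direct interaction between components 0 and 2, the maximum partial coherence for this pair should stay near the noise level, while the two true edges give clearly larger values, mirroring the nonparametric edge tests mentioned in the introduction.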
3 PARAMETRIC MODELLING

The nonparametric frequency domain approach can easily be used for an exploratory analysis of the dependence structure of stationary time series, but it provides only insufficient tools for the identification of conditional independence graphs. Moreover, it does not include methods for building new, parsimonious models that can be used, e.g., for forecasting. As an alternative, we consider parametric models that are Markov for a given graph G and thus satisfy the conditional independencies encoded by G. Here, the main obstacle will be the constraints on the model parameters that are imposed by the graph: for classical time series models such as vector autoregressions, the constraints on the parameters are typically nonlinear and prevent efficient algorithms for parameter estimation. Therefore we introduce graphical interaction models for stationary Gaussian processes, in which conditional independence restrictions correspond to zero constraints on the parameters.

3.1 VECTOR AUTOREGRESSIONS

To illustrate the problems encountered by imposing conditional independence restrictions of the form X_a ⊥⊥ X_b | X_{V\{a,b}} on a time series model, we briefly consider the case of vector autoregressive models. Suppose that X_V is a stationary process given by a vector autoregression of order p,

    X_V(t) = a(1) X_V(t − 1) + … + a(p) X_V(t − p) + ε(t),

where the a(u) are d × d coefficient matrices and ε(t) is Gaussian white noise with nonsingular covariance matrix Σ; we write K = Σ^{−1}. Then X_a and X_b are conditionally independent given X_{V\{a,b}} if and only if the corresponding entry in the inverse spectral matrix f(λ)^{−1} vanishes for all frequencies, that is,

    Σ_{k,l=1}^{d} Σ_{u,v=0}^{p} K_{kl} a_{ka}(u) a_{lb}(v) e^{iλ(v−u)} = 0

for all λ ∈ [−π, π], where a(0) = −I and a(u) = 0 whenever u < 0 or u > p. Thus, the relation X_a ⊥⊥ X_b | X_{V\{a,b}} imposes the following 2p + 1 restrictions on the parameters:

    Σ_{k,l=1}^{d} Σ_{u=0}^{p} K_{kl} a_{ka}(u) a_{lb}(u + h) = 0,   h = −p, …, p.

It is clear from these expressions that estimation of the parameters, for example by maximization of the likelihood function under such constraints, would be difficult if not infeasible.

3.2 GRAPHICAL INTERACTION MODELS

Alternatively to the frequency domain approach, we can view the distribution of a Gaussian process X_V as a Gaussian Markov random field and describe the conditional independencies among the variables in X_V in terms of the inverse of the covariance matrix of X_V. To make this idea explicit, assume that X_V satisfies Assumption 2.1. Then the inverse of the spectral matrix f(λ) exists and has an absolutely convergent Fourier expansion given by

    f(λ)^{−1} = 2π Σ_{u=−∞}^{∞} c^{(i)}(u) e^{−iλu}

(e.g., Bhansali 1980). Simple calculations show that the Fourier coefficients

    c^{(i)}(u) = (1/4π²) ∫_Π f(λ)^{−1} e^{iλu} dλ,   u ∈ ℤ,    (4)

where Π = [−π, π], satisfy