
Fitting Graphical Interaction Models to Multivariate Time Series

Michael Eichler Department of Quantitative Economics University of Maastricht 6200 MD Maastricht, Netherlands [email protected]

Abstract

Graphical interaction models have become an important tool for analysing multivariate time series. In these models, the interrelationships among the components of a time series are described by undirected graphs in which the vertices depict the components while the edges indicate possible dependencies between the components. Current methods for the identification of the graphical structure are based on nonparametric spectral estimation, which prevents the application of common model selection strategies. In this paper, we present a parametric approach for graphical interaction modelling of multivariate stationary time series. The proposed models generalize covariance selection models to the time series setting and are formulated in terms of inverse covariances. We show that these models correspond to vector autoregressive models under conditional independence constraints encoded by undirected graphs. Furthermore, we discuss maximum likelihood estimation based on Whittle's approximation to the log-likelihood function and propose an iterative method for solving the resulting likelihood equations. The concepts are illustrated by an example.

1 INTRODUCTION

Graphical models have become an important tool for analysing multivariate data. While the theory originally has been developed for variables that are sampled with independent replications, graphical models recently have been applied also to stationary multivariate time series (e.g., Brillinger 1996, Dahlhaus 2000, Eichler 1999, 2001, 2006, Dahlhaus and Eichler 2003).

A particularly simple graphical representation is provided by graphical interaction models, which visualize dependencies by undirected graphs. For time series, this approach has been discussed by Dahlhaus (2000), who introduced so-called partial correlation graphs. These are undirected graphs in which each component of the time series is represented by one vertex. The conditional independences encoded by such graphs can be formulated in terms of the inverse of the spectral matrix. This has led to nonparametric tests for the presence of an edge in the partial correlation graph based on the maximum of the partial spectral coherence (Dahlhaus et al. 1997) or on the integrated partial spectral coherence (Eichler 2004). The concept of partial correlation graphs has been used in many applications from various scientific fields (e.g., Eichler et al. 2003, Timmer et al. 2000, Gather et al. 2002, Fried and Didelez 2003).

The main disadvantage of the current nonparametric approach to graphical modelling based on partial correlation graphs is the lack of a rigorous theory for identifying the best fitting graph. An alternative to the nonparametric approach is the fitting of parametric graphical models in which the parameters are constrained with respect to undirected graphs. The problem of estimating the dependence structure of the process then becomes a problem of model selection in which the best approximating model minimizes some chosen model distance such as the Kullback-Leibler information.

In this paper, we propose graphical interaction models for stationary Gaussian time series that are defined in terms of inverse covariances. In these models, the conditional independences encoded by an undirected graph correspond to zero constraints on the parameters. In Section 2, we review the concept of conditional independence graphs for stationary time series; for Gaussian processes, these graphs are identical to partial correlation graphs, which are restricted to linear dependencies. In Section 3, we introduce parametric graphical interaction models for Gaussian processes and discuss their relation to vector autoregressive models. In Section 4, we discuss estimation of the model parameters based on Whittle's approximation to the log-likelihood function; an iterative algorithm for computing the resulting maximum likelihood estimates is presented in Section 5. In Section 6, the fitting of graphical interaction models for time series is illustrated by an example with air pollution data, and Section 7 concludes.

2 UNDIRECTED GRAPHS FOR TIME SERIES

Let X_V = (X_V(t))_{t ∈ Z} with X_V(t) = (X_v(t))_{v ∈ V} be a stationary Gaussian process with mean zero and covariance function c_{VV}(u) = cov(X_V(t), X_V(t−u)). Throughout the paper, we will make the following assumption.

Assumption 2.1. The spectral matrix

    f(λ) = (1/2π) Σ_{u=−∞}^{∞} c_{VV}(u) e^{−iλu}

exists, and its eigenvalues are bounded and bounded away from zero uniformly for all λ ∈ [−π, π].

The structure of the interdependencies among the components of a multivariate time series X_V can be described in terms of conditional independencies between complete components of X_V. More precisely, let A, B, and S be disjoint subsets of V. Then the subprocesses X_A and X_B are said to be conditionally independent given X_S if

    cov( g(X_A), h(X_B) | X_S ) = 0

for all real-valued measurable functions g(·) and h(·) on the ranges of X_A and X_B, respectively. In this case, we write X_A ⊥⊥ X_B | X_S.

As in the case of ordinary random variables, conditional independence relations of this form can be encoded by undirected graphs. Recall that an undirected graph G is defined as a pair (V, E), where V is a set of vertices or nodes and E is a collection of undirected edges a -- b for distinct nodes a, b ∈ V. Each component X_v of a process X_V may then be represented by a single vertex v in a graph G = (V, E), while the edges in E indicate possible dependencies among the components of X_V. This leads to the following definition of conditional independence graphs.

Definition 2.2. Let X_V be a stationary Gaussian process. The conditional independence graph associated with X_V is the graph G = (V, E) with vertex set V and edge set E such that

    a -- b ∉ E  ⇔  X_a ⊥⊥ X_b | X_{V\{a,b}}

for all distinct a, b ∈ V.

The conditional independence graph encodes the pairwise conditional independence relations that hold for the process X_V. Under additional assumptions, more general conditional independence relations can be derived from it. Properties that allow one to associate certain statements about the graph G with corresponding conditional independence statements about the variables in X_V are called graphical Markov properties. For instance, if X_V is a stationary Gaussian process such that Assumption 2.1 holds, then X_V obeys the so-called global Markov property with respect to its conditional independence graph G: for disjoint sets A, B, S ⊆ V, we say that S separates A and B if every path between A and B intersects S; this will be denoted by A ⋈ B | S. Then X_V satisfies the global Markov property with respect to G if

    A ⋈ B | S in G  ⇒  X_A ⊥⊥ X_B | X_S

for all disjoint subsets A, B, S ⊆ V. We also say that X_V is Markov for the graph G. For details, we refer to Dahlhaus (2000).

For Gaussian processes X_V, inference about conditional independence graphs is commonly based in the frequency domain (e.g., Dahlhaus 2000, Dahlhaus et al. 1997, Fried and Didelez 2003). Here, the conditional dependence between components X_A and X_B given X_S can be described by the partial cross-spectrum of X_A and X_B given X_S,

    f_{AB|S}(λ) = f_{AB}(λ) − f_{AS}(λ) f_{SS}(λ)^{−1} f_{SB}(λ),

which is the cross-spectrum of the residual processes ε_{A|S} and ε_{B|S} obtained by removing the linear effects of the series X_S from the processes X_A and X_B, respectively (cf Dahlhaus 2000). Equivalently, we can use the partial spectral coherency R_{AB|S}(λ) of X_A and X_B given X_S, which is given by

    R_{ab|S}(λ) = f_{ab|S}(λ) / ( f_{aa|S}(λ) f_{bb|S}(λ) )^{1/2}

for a ∈ A and b ∈ B. For Gaussian processes, we have

    X_A ⊥⊥ X_B | X_S  ⇔  R_{AB|S}(λ) = 0  for all λ ∈ [−π, π].   (1)

For random vectors, the partial correlations can be obtained from the inverse of the covariance matrix. Dahlhaus (2000) showed that a similar relationship holds between the partial spectral coherences and the inverse of the spectral matrix. More precisely, let g_{VV}(λ) = f_{VV}(λ)^{−1} denote the inverse spectral matrix. Then, under Assumption 2.1, we have

    R_{ab|V\{a,b}}(λ) = − g_{ab}(λ) / ( g_{aa}(λ) g_{bb}(λ) )^{1/2}.   (2)

This relation provides an efficient method for computing estimates of the partial spectral coherences R_{ab|V\{a,b}}(λ) of a process X_V: let f̂_{VV}(λ) be an estimate of the spectral matrix, such as a nonparametric kernel spectral estimate, and let ĝ_{VV}(λ) = f̂_{VV}(λ)^{−1} be its inverse; then the partial spectral coherences can be estimated by substituting the entries of ĝ_{VV}(λ) for the corresponding entries of g_{VV}(λ) in (2). We note that for non-Gaussian processes, this nonparametric approach can still be used for inference on the partial correlation graph, which describes only the linear dependencies among the variables.
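The estimation recipe based on the inverse of a nonparametric spectral estimate can be sketched in a few lines. The following numpy illustration uses a plain rectangular smoothing window of half-width m; the window, its width, and the input series are placeholder choices for illustration, not taken from the paper:

```python
import numpy as np

def partial_coherences(x, m=20):
    """Estimate partial spectral coherences via the inverse spectral matrix.

    x : (T, d) array, one column per component of the series.
    m : half-width of the rectangular smoothing window (placeholder choice).
    Returns frequencies and an array R with R[k, a, b] = estimate of
    R_{ab|V\{a,b}}(lambda_k), computed from relation (2).
    """
    T, d = x.shape
    x = x - x.mean(axis=0)
    dft = np.fft.rfft(x, axis=0)                      # discrete Fourier transforms
    # raw periodogram matrices I(lambda_k) = d(lambda_k) d(lambda_k)* / (2 pi T)
    I = np.einsum('fa,fb->fab', dft, dft.conj()) / (2 * np.pi * T)
    F = I.shape[0]
    # smooth over neighbouring frequencies (simple moving average)
    fhat = np.empty_like(I)
    for k in range(F):
        lo, hi = max(0, k - m), min(F, k + m + 1)
        fhat[k] = I[lo:hi].mean(axis=0)
    g = np.linalg.inv(fhat)                           # inverse spectral matrix
    diag = np.einsum('faa->fa', g).real               # g_aa(lambda) > 0
    R = -g / np.sqrt(diag[:, :, None] * diag[:, None, :])
    return np.linspace(0, np.pi, F), R
```

Any consistent, nonsingular spectral estimator can be substituted for the moving-average smoother; tapering the data before the DFT, as recommended later in the paper, would improve the small-sample behaviour.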

3 PARAMETRIC MODELLING

The nonparametric frequency domain approach can easily be used for an exploratory analysis of the dependence structure of stationary time series, but it provides only insufficient tools for the identification of conditional independence graphs. Moreover, it does not include methods for building new, parsimonious models that can be used, e.g., for forecasting. As an alternative, we consider parametric models that are Markov for a given graph G and thus satisfy the conditional independencies encoded by G. Here, the main obstacle will be the constraints on the model parameters that are imposed by the graph: for classical time series models such as vector autoregressions, the constraints on the parameters are typically non-linear and prevent efficient algorithms for parameter estimation. Therefore we introduce graphical interaction models for stationary Gaussian processes, in which conditional independence restrictions correspond to zero constraints on the parameters.

3.1 VECTOR AUTOREGRESSIONS

To illustrate the problems encountered by imposing conditional independence restrictions of the form X_a ⊥⊥ X_b | X_{V\{a,b}} on a time series model, we briefly consider the case of vector autoregressive models. Suppose that X_V is a stationary process given by

    X_V(t) = Σ_{u=1}^{p} a(u) X_V(t−u) + ε_V(t),

where ε_V is a Gaussian white noise process with mean zero and non-singular covariance matrix Σ. Let A(z) = I − a(1)z − ... − a(p)z^p be the characteristic polynomial of the process (e.g., Dahlhaus 2000), where I is the V × V identity matrix. Then, if det A(z) ≠ 0 for all z ∈ C with |z| ≤ 1, the inverse spectral matrix f(λ)^{−1} exists and is given by

    f(λ)^{−1} = 2π A(e^{iλ})' K A(e^{−iλ}),   (3)

where K = Σ^{−1} is the inverse of the covariance matrix Σ. From (1) and (2), it follows that X_a and X_b are conditionally independent given X_{V\{a,b}} if and only if the corresponding entry in the inverse spectral matrix f(λ)^{−1} vanishes for all frequencies, that is,

    Σ_{k,l=1}^{d} Σ_{u,v=0}^{p} K_{kl} a_{ka}(u) a_{lb}(v) e^{−iλ(v−u)} = 0

for all λ ∈ [−π, π], where a(0) = −I and a(u) = 0 whenever u < 0 or u > p. Thus, the relation X_a ⊥⊥ X_b | X_{V\{a,b}} imposes the following 2p + 1 restrictions on the parameters:

    Σ_{k,l=1}^{d} Σ_{u=0}^{p} K_{kl} a_{ka}(u) a_{lb}(u+h) = 0,   h = −p, ..., p.

It is clear from these expressions that estimation of the parameters, for example, by maximization of the likelihood function under such constraints, would be difficult if not infeasible.

3.2 GRAPHICAL INTERACTION MODELS

As an alternative to the frequency domain approach, we can view the distribution of a Gaussian process X_V as a Gaussian Markov random field and describe the conditional independencies among the variables in X_V in terms of the inverse of the covariance matrix of X_V. To make this idea explicit, assume that X_V satisfies Assumption 2.1. Then the inverse of the spectral matrix f(λ) exists and has an absolutely convergent Fourier expansion given by

    f(λ)^{−1} = 2π Σ_{u=−∞}^{∞} c^{(i)}(u) e^{−iλu}

(e.g., Bhansali 1980). Simple calculations show that the Fourier coefficients

    c^{(i)}(u) = (1/4π²) ∫_Π f(λ)^{−1} e^{iλu} dλ,   u ∈ Z,   (4)

where Π = [−π, π], satisfy

    Σ_{u=−∞}^{∞} c(v−u) c^{(i)}(u) = δ_{v0} I

for all v ∈ Z (e.g., Shaman 1975, 1976). Thus, the matrix C^{(i)} = ( c^{(i)}(u−v) )_{u,v ∈ Z} is the inverse of the covariance matrix of the process X_V, and we call c^{(i)}(u) the inverse covariances of X_V.

Similarly as for ordinary Gaussian Markov random fields, the pairwise conditional independencies that hold for the process X_V can be expressed by zero entries in the inverse covariances c^{(i)}. In particular, (4) together with (1) and (2) leads to the following time domain characterization of conditional independencies between the components of X_V.

Proposition 3.1. Let X_V be a stationary Gaussian process satisfying Assumption 2.1. Then, for all distinct a, b ∈ V,

    X_a ⊥⊥ X_b | X_{V\{a,b}}  ⇔  c^{(i)}_{ab}(u) = 0  for all u ∈ Z.

The proposition suggests defining graphical interaction models directly in terms of inverse covariances and thus making use of the zero constraints that are imposed on the parameters by the conditional independencies encoded by the absence of edges in the associated graph.

Definition 3.2. Let X_V be a stationary Gaussian process such that Assumption 2.1 holds. Furthermore, let c^{(i)}(u), u ∈ Z, be the inverse covariances of X_V. Then we say that X_V belongs to the graphical interaction model of order p associated with the undirected graph G = (V, E) if

    c^{(i)}(u) = 0  for all |u| > p   (5)

and, for all distinct a, b ∈ V,

    a -- b ∉ E  ⇒  c^{(i)}_{ab}(u) = c^{(i)}_{ba}(u) = 0  for all u ∈ Z.   (6)

We will also say that the process X_V belongs to the GI(p,G) model.

The graphical interaction model of order p is parametrized by the vector of inverse covariances

    θ = ( vech c^{(i)}(0)', vec c^{(i)}(1)', ..., vec c^{(i)}(p)' )',

where, as usual, the vec operator stacks the columns of a matrix and the vech operator stacks only the elements contained in the lower triangular submatrix. In the following, we denote the spectral matrices, covariances, and inverse covariances specified by the parameter θ by f_θ(λ), c_θ(u), and c^{(i)}_θ(u), respectively. Then the parameter space Θ(p, G) of the graphical interaction model of order p associated with graph G is the set of all θ ∈ R^r with r = d²p + d(d+1)/2 such that f_θ(λ) satisfies Assumption 2.1 and the c^{(i)}_θ(u) satisfy conditions (5) and (6). We note that the latter condition imposes zero constraints on the elements of the parameter vector θ.

3.3 RELATION TO VECTOR AUTOREGRESSIONS

We now return to our discussion of vector autoregressive models. For this, suppose again that X_V is a stationary Gaussian VAR(p) process that satisfies Assumption 2.1. Then equations (4) and (3) imply that the inverse covariance function c^{(i)}(u) of the process X_V is given by

    c^{(i)}(u) = Σ_{v=0}^{p−u} a(v)' K a(v+u)   (7)

with K = Σ^{−1}. These equations show that the constraints on the autoregressive parameters a(u), u = 1, ..., p, and Σ can be reformulated in terms of the inverse covariances. More importantly, the equations also imply that the inverse covariances c^{(i)}(u) of a VAR(p) process vanish for all |u| > p (e.g., Bhansali 1980, Battaglia 1984). In other words, the equations show that every VAR(p) process that is Markov for an undirected graph G also belongs to the GI(p,G) model. Conversely, it can be shown that the equation system (7) has a unique solution for the autoregressive parameters a(u), u = 1, ..., p, and Σ in terms of the inverse covariances (Tunnicliffe Wilson 1972). Thus, a process X_V belongs to the GI(p,G) model if and only if it belongs to the class of all VAR(p) processes that are Markov for the graph G. Therefore the graphical vector autoregressive model of order p associated with a graph G, denoted by VAR(p,G), is identical to the GI(p,G) model. Thus, graphical interaction models can be seen as graphical VAR models with an alternative parameterization that is better suited for representing the constraints on the parameters imposed by the graph.
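The correspondence (7) between the autoregressive parameters and the inverse covariances is easy to check numerically. The sketch below builds a small VAR(1) with made-up coefficients, computes c^(i)(h) once directly via (7) and once as a Fourier coefficient (4) of the inverse spectral matrix (3), and verifies that the inverse covariances vanish beyond the model order:

```python
import numpy as np

# Hypothetical stable VAR(1): a(1) and Sigma are made-up illustrative values.
p, d = 1, 2
a = {0: -np.eye(d),                                   # convention a(0) = -I
     1: np.array([[0.4, 0.2], [0.0, 0.3]])}
Sigma = np.array([[1.0, 0.3], [0.3, 1.0]])
K = np.linalg.inv(Sigma)                              # K = Sigma^{-1}

def cinv_direct(h):
    """Inverse covariance c^(i)(h) from the VAR parameters via equation (7)."""
    return sum(a[v].T @ K @ a[v + h] for v in range(p - h + 1))

# Fourier coefficients (4) of the inverse spectral matrix (3), on a fine grid.
lams = np.linspace(-np.pi, np.pi, 4096, endpoint=False)
A = lambda z: -sum(a[u] * z**u for u in range(p + 1))  # A(z) = I - a(1)z - ...
finv = np.array([2 * np.pi * A(np.exp(1j * l)).T @ K @ A(np.exp(-1j * l))
                 for l in lams])

def cinv_fourier(h):
    """c^(i)(h) via (4), with the integral replaced by a grid average."""
    return (finv * np.exp(1j * lams * h)[:, None, None]).mean(axis=0).real / (2 * np.pi)
```

Both routes give the same matrices for |h| <= p, and the Fourier route returns (numerically) zero for |h| > p, which is the banded structure that the GI(p,G) model exploits.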
4 LIKELIHOOD INFERENCE

In this section, we discuss estimation of the parameter θ based on maximization of an appropriate version of the Gaussian log-likelihood function.

4.1 WHITTLE'S LIKELIHOOD

Suppose that X_V(1), ..., X_V(T) are observations from a stationary Gaussian process X_V. In the following, we assume that X_V belongs to a GI(p,G) model with unknown parameter θ_0 ∈ Θ(p, G). For estimation of the parameter θ_0, we consider the likelihood function

    L_T(θ) = (2π)^{−T/2} det(Γ_{θ,T})^{−1/2} exp( −(1/2) X'_{V,T} Γ_{θ,T}^{−1} X_{V,T} ),

where X_{V,T} = vec( X_V(1), ..., X_V(T) ) and Γ_{θ,T} is the |V|T × |V|T covariance matrix with entries c_θ(u−v) for u, v = 1, ..., T (e.g., Brockwell and Davis 1991, §8.6). In maximum likelihood estimation, the parameter θ_0 is estimated by the vector in Θ(p, G) that maximizes the likelihood function or, equivalently, that minimizes the −1/T log-likelihood function

    ℓ_T(θ) = (1/2T) log det Γ_{θ,T} + (1/2T) X'_{V,T} Γ_{θ,T}^{−1} X_{V,T},   (8)

where we have omitted an additive constant. The main problem in the application of the log-likelihood function ℓ_T(θ) to graphical interaction models is the constraints on the parameters. Since ℓ_T(θ) depends on the parameters only through the covariance matrix Γ_{θ,T} and its inverse Γ_{θ,T}^{−1}, derivation of the likelihood equations under the constraints of a graphical interaction model would be infeasible.

A more favourable choice for fitting graphical interaction models is Whittle's approximation to the exact Gaussian log-likelihood function, which has been originally proposed by Whittle (1953, 1954). Whittle's likelihood is formulated in terms of the inverse spectral matrix and thus allows a direct treatment of the constraints in graphical interaction models. More precisely, the approximation is based on the fact that the matrix Γ_{θ,T}^{−1} in (8) can be approximated by the corresponding submatrix

    ( c^{(i)}_θ(0)    ...  c^{(i)}_θ(1−T) )
    (    ...          ...       ...       )
    ( c^{(i)}_θ(T−1)  ...  c^{(i)}_θ(0)   )

of the infinite-dimensional inverse covariance matrix (cf Shaman 1975, 1976). Together with the Szegő identity (cf Grenander and Szegő 1958), this leads to Whittle's likelihood function

    ℓ^w_T(θ) = (1/4π) ∫_Π ( log det f_θ(λ) + tr[ I^{(T)}(λ) f_θ(λ)^{−1} ] ) dλ.

In this expression, I^{(T)}(λ) is the periodogram matrix with entries

    I^{(T)}_{ab}(λ) = (2π H_{2,T})^{−1} d^{(T)}_a(λ) d^{(T)}_b(−λ),

where

    d^{(T)}_a(λ) = Σ_{t=1}^{T} X_a(t) exp(−iλt)

is the discrete Fourier transform of the data X_a(1), ..., X_a(T). Minimization of Whittle's likelihood function leads to the Whittle estimator

    θ̂_T = argmin_{θ ∈ Θ(p,G)} ℓ^w_T(θ).

We note that in practice a tapered version of the periodogram should be used, as this improves the small sample properties of the resulting Whittle estimate considerably (e.g., Dahlhaus 1988).

4.2 LIKELIHOOD EQUATIONS

The likelihood equations are obtained by setting the first derivatives of the log-likelihood function ℓ^w_T(θ) with respect to θ to zero. Using matrix calculus (e.g., Harville 1997), we obtain the following first derivatives of ℓ^w_T(θ):

    ∂ℓ^w_T(θ)/∂θ_i = (1/4π) ∫_Π tr[ ( I^{(T)}(λ) − f_θ(λ) ) ∂f_θ(λ)^{−1}/∂θ_i ] dλ.   (9)

Since the inverse spectral matrix is linear in the parameters, we get an explicit formula for its derivatives. Let θ_k correspond to c^{(i)}_{ab}(u). Then, for i, j ∈ V,

    ∂f^{−1}_{ij,θ}(λ)/∂θ_k = 2π δ_{ia} δ_{ja}                                     if a = b and u = 0,
    ∂f^{−1}_{ij,θ}(λ)/∂θ_k = 2π ( δ_{ia} δ_{jb} e^{−iλu} + δ_{ib} δ_{ja} e^{iλu} )  otherwise.

Substituted into (9), we therefore get

    ∂ℓ^w_T(θ)/∂θ_k = ∫_Π ( I^{(T)}_{ab}(λ) − f_{ab,θ}(λ) ) e^{iλu} dλ   (10)

for all u ∈ {−p, ..., p}. These equations can be rewritten in terms of time domain quantities. For this, let ĉ(u) be the empirical covariance function, which is related to the periodogram by

    ĉ(u) = ∫_Π I^{(T)}(λ) e^{iλu} dλ.   (11)

Noting that similarly c_θ(u) = ∫_Π f_θ(λ) e^{iλu} dλ, we obtain the following likelihood equations.

Theorem 4.1. Let θ̂_T be the Whittle estimator of the parameter θ_0 in the graphical interaction model GI(p,G). Then θ̂_T is a solution of the following equation system:

(i) for a, b ∈ V with a = b or a -- b ∈ E,

    c_{ab,θ̂_T}(u) = ĉ_{ab}(u),   u = −p, ..., p;

(ii) for a, b ∈ V with a ≠ b and a -- b ∉ E,

    c^{(i)}_{ab,θ̂_T}(u) = 0,   u = −p, ..., p.

These equations are similar to the likelihood equations in ordinary Gaussian graphical models (cf Lauritzen 1996). In fact, together with the condition that c^{(i)}_{ab,θ̂_T}(u) = 0 for |u| > p, these equations can be seen as the likelihood equations obtained in a Gaussian graphical model for the random variables X_v(t) with v ∈ V and t ∈ Z, where the graph is specified by the zero constraints on θ and Γ̂ = ( ĉ(u−v) )_{u,v ∈ Z} is the empirical covariance matrix of the variables. This is not surprising given the way the Whittle likelihood approximates the log-likelihood function in (8): Whittle's approximation basically neglects the edge effects due to observing only a finite horizon by substituting asymptotic approximations for the finite sample quantities det Γ_{θ,T} and Γ_{θ,T}^{−1}.

The asymptotic properties of the Whittle estimator are well known (e.g., Dzhaparidze and Yaglom 1983). For the formulation of the asymptotic distribution, let Γ(θ) = ( Γ_{ij}(θ) ) be the matrix with entries

    Γ_{ij}(θ) = (1/4π) ∫_Π tr[ f_θ(λ) ∂f_θ(λ)^{−1}/∂θ_i f_θ(λ) ∂f_θ(λ)^{−1}/∂θ_j ] dλ.

Furthermore, let P_G be a projector matrix such that P_G θ is the vector of unconstrained parameters in GI(p,G). Then, if X_V belongs to the GI(p,G) model with parameter θ_0, the Whittle estimator θ̂_T is asymptotically normally distributed,

    √T ( θ̂_T − θ_0 ) → N( 0, Σ(θ_0) )  in distribution,

where Σ(θ_0) = P'_G ( P_G Γ(θ_0) P'_G )^{−1} P_G. We note that, although the Whittle estimator θ̂_T is consistent for θ_0, the likelihood equations in general may have multiple solutions. Thus, a solution of the likelihood equations may be only a local minimum of the Whittle log-likelihood ℓ^w_T(θ).
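Whittle's likelihood is straightforward to evaluate from the periodogram. The following sketch computes ℓ^w_T for a VAR model spectrum, approximating the integral by an average over the Fourier frequencies; the data and coefficient matrices are illustrative assumptions, not the paper's example:

```python
import numpy as np

def whittle_loglik(x, a_list, Sigma):
    """Whittle's (negative) log-likelihood l^w_T for a VAR model spectrum.

    x      : (T, d) data matrix
    a_list : [a(1), ..., a(p)] autoregressive coefficient matrices
    Sigma  : innovation covariance matrix
    """
    T, d = x.shape
    x = x - x.mean(axis=0)
    dft = np.fft.fft(x, axis=0)                        # d^(T)_a at Fourier freqs
    I = np.einsum('fa,fb->fab', dft, dft.conj()) / (2 * np.pi * T)
    lams = 2 * np.pi * np.arange(T) / T
    K = np.linalg.inv(Sigma)
    val = 0.0
    for lam, Ik in zip(lams, I):
        A = np.eye(d) - sum(ak * np.exp(-1j * lam * (u + 1))
                            for u, ak in enumerate(a_list))
        finv = 2 * np.pi * A.conj().T @ K @ A          # equation (3)
        logdet = -np.log(np.linalg.det(finv).real)     # log det f = -log det f^{-1}
        val += logdet + np.trace(Ik @ finv).real
    # (1/4 pi) integral over [-pi, pi] approximated by (1/2T) sum over T freqs
    return val / (2 * T)
```

Minimizing this criterion over the free parameters of Θ(p, G), e.g. with a generic numerical optimizer, would give the Whittle estimator; the iterative scheme of Section 5 avoids generic optimization altogether.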
5 ITERATIVE ESTIMATION

For Gaussian graphical models, there exist two iterative algorithms for solving the likelihood equations: iterative proportional scaling (e.g., Lauritzen 1996) and the algorithm by Wermuth and Scheidt (1977). The former requires that all constraints are preserved throughout the iterations and is hard to realize in the present situation. We therefore discuss an adapted version of the second algorithm.

Let C_{p,G} be the set of all (a, b, u) ∈ V × V × Z such that a and b are distinct and not adjacent in G, or |u| > p. Then C_{p,G} represents the entries in the infinite-dimensional inverse covariance matrix that are constrained to zero in the GI(p,G) model; that is, in the GI(p,G) model we have

    (a, b, u) ∈ C_{p,G}  ⇒  c^{(i)}_{ab}(u) = 0.

For the application of the algorithm by Wermuth and Scheidt (1977), which preserves only part of the constraints in each iteration, we define subsets C_i specifying the constraints to be preserved alternately. To this end, let {a_i, b_i}, i = 1, ..., m, be an enumeration of all distinct vertices a_i, b_i that are not adjacent in G. Then we set

    C_0 = { (a, b, u) ∈ V × V × Z : |u| > p }

and, for i = 1, ..., m,

    C_i = { (a, b, u) ∈ V × V × Z : {a, b} = {a_i, b_i} }.

Here, the set C_0 represents the constraints imposed by the order p of the GI(p,G) model while the sets C_i indicate the constraints due to the absence of the edges in G. Furthermore, we have C_{p,G} = ∪_{i=0}^{m} C_i.

Starting with ĉ_0(u) = ĉ(u) for all u ∈ Z, where ĉ(u) is the empirical covariance function given by (11), the algorithm proceeds by solving in the n-th step the equation system

    ĉ_{ab,n}(u) = ĉ_{ab,n−1}(u)    for all (a, b, u) ∉ C_{n mod (m+1)},
    ĉ^{(i)}_{ab,n}(u) = 0          for all (a, b, u) ∈ C_{n mod (m+1)}.

In order to avoid working with infinite-dimensional matrices, the computations are carried out in the frequency domain.

For n mod (m+1) = 0, the constraints in C_0 define a VAR(p) model without further constraints. Thus, the solution of the above equation system satisfies the Yule-Walker equations

    ĉ_{n−1}(u) = Σ_{v=1}^{p} ĉ_{n−1}(u−v) a(v)' + δ_{u0} Σ,

which are solved by the Yule-Walker estimates â_n(1), ..., â_n(p) and Σ̂_n (e.g., Brockwell and Davis 1991). From these, the spectral density can be obtained by

    f̂_n(λ) = (1/2π) Â_n(e^{−iλ})^{−1} Σ̂_n ( Â_n(e^{iλ})' )^{−1},

where Â_n(z) = I − â_n(1)z − ... − â_n(p)z^p.

For n mod (m+1) = i > 0, the constraints in C_i are equivalent to f^{−1}_{a_i b_i}(λ) = 0. Consequently, the iteration step can be formulated as

    f̂_{ab,n}(λ) = f̂_{ab,n−1}(λ)   for all λ ∈ [−π, π]  if {a, b} ≠ {a_i, b_i},
    f̂^{−1}_{ab,n}(λ) = 0          for all λ ∈ [−π, π]  if {a, b} = {a_i, b_i}.

This can be accomplished by setting

    f̂_{a_i b_i,n}(λ) = f̂_{a_i S_i,n−1}(λ) f̂_{S_i S_i,n−1}(λ)^{−1} f̂_{S_i b_i,n−1}(λ)

for λ ∈ [−π, π], where S_i = V \ {a_i, b_i} (cf Wermuth and Scheidt 1977).

Speed and Kiiveri (1986) have shown the convergence of the algorithm to a solution of the likelihood equations in the case of Gaussian graphical models; the proof for the present situation is similar. After the n-th step, an estimate θ̂_T for θ can be obtained by inverting the spectral matrix f̂_n(λ) and computing the inverse covariances ĉ^{(i)}_n(u) by (4). Alternatively, if the algorithm is stopped at n mod (m+1) = 0, it also provides the estimates in the parameterization of the VAR(p,G) model.
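A single edge step of the adapted algorithm amounts to the partial-regression update applied frequency by frequency. A minimal sketch, where the frequency grid and the spectral matrices are assumed inputs:

```python
import numpy as np

def delete_edge_step(f, i, j):
    """One edge step of the adapted Wermuth-Scheidt iteration.

    f : (F, d, d) array of Hermitian spectral matrices on a frequency grid.
    Replaces f[:, i, j] (and its conjugate) by the partial-regression value
    f_{iS} f_{SS}^{-1} f_{Sj} with S = V \ {i, j}, so that the (i, j) entry
    of f(lambda)^{-1} vanishes at every frequency on the grid.
    """
    F, d, _ = f.shape
    S = [k for k in range(d) if k not in (i, j)]
    out = f.copy()
    for k in range(F):
        fiS = f[k, i, :][S]
        fSS = f[k][np.ix_(S, S)]
        fSj = f[k, :, j][S]
        val = fiS @ np.linalg.solve(fSS, fSj)
        out[k, i, j] = val
        out[k, j, i] = np.conj(val)
    return out
```

The update leaves all other entries (in particular the diagonal) untouched, so positive definiteness is preserved, and cycling through the missing edges and the C_0 step yields the full iteration.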


Figure 2: Estimated spectral densities, coherences, and partial coherences for the air pollution data: nonparametric estimates (solid lines) and parametric fit (dashed lines) obtained by minimizing the BIC criterion.

Figure 1: Undirected graphs G1, ..., G6 with lowest BIC value for the air pollution data (vertices: CO, NO, NO2, O3, I).

Table 1: Results of model selection for the air pollution data: graphs and orders p with the lowest BIC values.

    Graph   BIC (p = 3)   BIC (p = 4)
    G1        686.0         705.6
    G2        687.7         687.0
    G3        708.8         722.1
    G4        714.9         733.1
    G5        716.1         733.5
    G6        718.7         748.1

6 EXAMPLE

We illustrate the proposed graphical modelling approach by application to a five-dimensional time series that has been analysed previously by the nonparametric frequency domain approach in Dahlhaus (2000). The time series consists of T = 4386 measurements of the concentrations of four air pollutants, namely carbon monoxide (CO), nitrogen monoxide (NO), nitrogen dioxide (NO2), and ozone (O3), and of the global radiation intensity (I), recorded from January 1991 to December 1992 in Heidelberg (6 equidistant recordings per day). For more details, we refer to Dahlhaus (2000).

In order to learn the conditional independence graph of X_V from the data, we fitted graphical interaction models of orders ranging from p = 1 to p = 6 for all possible undirected graphs. For each fitted model, the parameters were computed in their autoregressive form, that is, as â_{p,G} and Σ̂_{p,G}. From these, we obtained the BIC score (Schwarz 1978)

    BIC(p, G) = T log det Σ̂_{p,G} + log(T) q_{p,G},

where q_{p,G} is the number of parameters in the GI(p,G) model. The BIC criterion was minimized for order p = 3 and the graph G1 in Figure 1. Figure 2 shows the parametric estimates of the spectral densities and the spectral and partial spectral coherences for the selected model (dashed lines); these fit the nonparametric estimates (solid lines) reasonably well. Notice that in the four lower left plots the parametric estimates for the partial spectral coherence are identical to zero, as required by (1).

Table 1 shows the BIC scores for the six best models of orders 3 and 4 (models with different orders led to higher BIC scores).
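In the same spirit, the model-selection loop can be sketched for unconstrained VAR(p) fits; here a least-squares fit stands in for the Whittle estimation of the full GI(p,G) family, and the series is synthetic rather than the paper's data:

```python
import numpy as np

def fit_var(x, p):
    """Least-squares VAR(p) fit; returns residual covariance and parameter count."""
    T, d = x.shape
    Y = x[p:]
    Z = np.hstack([x[p - u:T - u] for u in range(1, p + 1)])  # lagged regressors
    coef, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    resid = Y - Z @ coef
    Sigma = resid.T @ resid / len(Y)
    q = d * d * p + d * (d + 1) // 2        # r = d^2 p + d(d+1)/2 free parameters
    return Sigma, q

def bic(x, p):
    """BIC(p) = T log det Sigma_hat + log(T) q, as in the model selection above."""
    T = len(x)
    Sigma, q = fit_var(x, p)
    sign, logdet = np.linalg.slogdet(Sigma)
    return T * logdet + np.log(T) * q
```

For a graphical model search, the same score would be computed for each candidate pair (p, G), with q reduced by the number of zero-constrained parameters, and the pair with minimal BIC selected.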
Here, the scores for the best model GI(3,G1) and the models GI(p,G2) with p = 3, 4 differ by less than 2. This indicates that there is not enough evidence in the data to discriminate between the best model and the competing models GI(p,G2) with p = 3, 4 (e.g., Raftery 1995). We note that the graph G2 was also selected by the nonparametric approach in Dahlhaus (2000).

7 DISCUSSION

In this paper, we have presented a parametric approach for graphical interaction modelling of stationary Gaussian processes. Graphical interaction models provide a simple description of the dependence structure of stationary processes by undirected graphs. We have proposed a new parametrization of vector autoregressive models in terms of inverse covariances. With this parametrization, the conditional independence restrictions encoded by an undirected graph lead to simple zero constraints on the parameters.

We have discussed parameter estimation based on Whittle's approximation to the log-likelihood function and proposed to compute the resulting estimates by an adapted version of the iterative algorithm by Wermuth and Scheidt (1977). This algorithm not only has clear convergence properties, but also provides estimates alternatively in the parameterization of the graphical interaction model GI(p,G) and of the graphical vector autoregressive model VAR(p,G).

Acknowledgements

This work has been carried out at the Institute of Applied Mathematics, University of Heidelberg, Germany.

References

Battaglia, F. (1984). Inverse covariances of a multivariate time series. Metron 2(3-4), 117-129.

Bhansali, R. J. (1980). Autoregressive and window estimates of the inverse correlation function. Biometrika 67, 551-561.

Brillinger, D. R. (1996). Remarks concerning graphical models for time series and point processes. Revista de Econometria 16, 1-23.

Brockwell, P. J. and Davis, R. A. (1991). Time Series: Theory and Methods. 2nd edn, Springer, New York.

Dahlhaus, R. (1988). Small sample effects in time series analysis: a new asymptotic theory and a new estimate. Annals of Statistics 16, 808-841.

Dahlhaus, R. (2000). Graphical interaction models for multivariate time series. Metrika 51, 157-172.

Dahlhaus, R. and Eichler, M. (2003). Causality and graphical models in time series analysis. In P. Green, N. Hjort and S. Richardson (eds), Highly Structured Stochastic Systems, Oxford University Press, Oxford.

Dahlhaus, R., Eichler, M. and Sandkühler, J. (1997). Identification of synaptic connections in neural ensembles by graphical models. Journal of Neuroscience Methods 77, 93-107.

Dzhaparidze, K. O. and Yaglom, A. M. (1983). Spectrum parameter estimation in time series analysis. In P. R. Krishnaiah (ed.), Developments in Statistics, Vol. 4, Academic Press, New York.

Eichler, M. (1999). Graphical Models in Time Series Analysis. Doctoral thesis, Universität Heidelberg.

Eichler, M. (2001). Graphical modelling of multivariate time series. Technical report, Universität Heidelberg.

Eichler, M. (2004). Testing nonparametric and semiparametric hypotheses in vector stationary processes. Technical report, Universität Heidelberg.

Eichler, M. (2006). Granger causality and path diagrams for multivariate time series. Journal of Econometrics (in press).

Eichler, M., Dahlhaus, R. and Sandkühler, J. (2003). Partial correlation analysis for the identification of synaptic connections. Biological Cybernetics 89, 289-302.

Fried, R. and Didelez, V. (2003). Decomposability and selection of graphical models for time series. Biometrika 90, 251-267.

Gather, U., Imhoff, M. and Fried, R. (2002). Graphical models for multivariate time series from intensive care monitoring. Statistics in Medicine 21, 2685-2701.

Grenander, U. and Szegő, G. (1958). Toeplitz Forms and Their Applications. Univ. California Press, Berkeley.

Harville, D. A. (1997). Matrix Algebra from a Statistician's Perspective. Springer, New York.

Lauritzen, S. L. (1996). Graphical Models. Oxford University Press, Oxford.

Raftery, A. E. (1995). Bayesian model selection in social research (with discussion). Sociological Methodology 25, 111-196.

Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics 6, 461-464.

Shaman, P. (1975). An approximate inverse for the covariance matrix of moving average and autoregressive processes. Annals of Statistics 3, 532-538.

Shaman, P. (1976). Approximations for stationary covariance matrices and their inverses with application to ARMA models. Annals of Statistics 4, 292-301.

Speed, T. P. and Kiiveri, H. T. (1986). Gaussian Markov distributions over finite graphs. Annals of Statistics 14, 138-150.

Timmer, J., Lauk, M., Köster, B., Hellwig, B., Häußler, S., Guschlbauer, B., Radt, V., Eichler, M., Deuschl, G. and Lücking, C. H. (2000). Cross-spectral analysis of tremor time series. International Journal of Bifurcation and Chaos 10, 2595-2610.

Tunnicliffe Wilson, G. (1972). The factorization of matricial spectral densities. SIAM J. Appl. Math. 23, 420-426.

Wermuth, N. and Scheidt, E. (1977). Fitting a covariance selection model to a matrix. Algorithm AS 105. Applied Statistics 26, 88-92.

Whittle, P. (1953). Estimation and information in stationary time series. Ark. Mat. 2, 423-434. (Reprinted in: Breakthroughs in Statistics, Vol. III, S. Kotz and N. L. Johnson (eds.), Springer, New York, 1997.)

Whittle, P. (1954). Some recent contributions to the theory of stationary processes. In H. Wold (ed.), A Study in the Analysis of Stationary Time Series, 2nd edn, Almquist and Wiksell, Uppsala, pp. 196-228.