Theory of Big Data 2 Conference
Big Data Institute, University College London

Causal Inference from Multivariate Time Series: Principles and Problems

Michael Eichler
Department of Quantitative Economics, Maastricht University
http://researchers-sbe.unimaas.nl/michaeleichler

6 January 2016

Outline

• Causality concepts
• Graphical representation
  • Definition
  • Markov properties
  • Extension: systems with latent variables
• Causal learning
  • Basic principles
  • Identification from empirical relationships
• Non-Markovian constraints
  • Trek-separation in graphs
  • Tetrad representation theorem
  • Testing for tetrad constraints
• Open problems and conclusions

Concepts of causality for time series

We consider two variables X and Y measured at discrete times t ∈ ℤ:

  X = (X_t)_{t∈ℤ},  Y = (Y_t)_{t∈ℤ}.

Question: When is it justified to say that X causes Y?

Various approaches:
• Intervention causality (Pearl 1993; Eichler & Didelez 2007, 2010)
• Structural causality (White & Lu 2010)
• Granger causality (Granger 1969, 1980, 1988)
• Sims causality (Sims 1972)

Granger causality

Two fundamental principles:
• The cause precedes its effect in time.
• The causal series contains special information about the series being caused that is not available otherwise.

This leads us to consider two information sets:
• F*(t): all information in the universe up to time t
• F*_{−X}(t): this information except the values of X

Granger's definition of causality (Granger 1969, 1980): We say that X causes Y if the probability distributions of
• Y_{t+1} given F*(t) and
• Y_{t+1} given F*_{−X}(t)
are different.

Granger causality

Problem: The definition cannot be used with actual data. Suppose the data consist of a multivariate time series V = (X, Y, Z) and let
• X^t denote the information given by X up to time t,
• and similarly Y^t and Z^t.

Definition: Granger non-causality
• X is Granger-noncausal for Y with respect to V if

    Y_{t+1} ⊥⊥ X^t | Y^t, Z^t.

• Otherwise we say that X Granger-causes Y with respect to V.

Additionally:
• X and Y are said to be contemporaneously independent with respect to V if

    X_{t+1} ⊥⊥ Y_{t+1} | V^t.
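In practice this conditional independence is tested within a model class. The sketch below is a minimal illustration, assuming a finite-order VAR approximation with Gaussian noise; the simulated data-generating coefficients are invented for the example, and the test is statsmodels' F-test for Granger non-causality.

```python
# A minimal sketch of a Granger non-causality test for V = (X, Y, Z),
# assuming a finite-order VAR approximation; all coefficient values
# in the simulation are illustrative, not from the talk.
import numpy as np
import pandas as pd
from statsmodels.tsa.api import VAR

rng = np.random.default_rng(0)
T = 500
eps = rng.standard_normal((T, 3))
data = np.zeros((T, 3))
for t in range(1, T):
    data[t, 0] = 0.5 * data[t - 1, 0] + eps[t, 0]                         # X
    data[t, 1] = 0.4 * data[t - 1, 1] + 0.3 * data[t - 1, 0] + eps[t, 1]  # Y <- lagged X
    data[t, 2] = 0.3 * data[t - 1, 2] + eps[t, 2]                         # Z

df = pd.DataFrame(data, columns=["X", "Y", "Z"])
res = VAR(df).fit(maxlags=5, ic="aic")

# H0: Y_{t+1} independent of X^t given Y^t, Z^t, i.e. X is
# Granger-noncausal for Y with respect to V.
print(res.test_causality(caused="Y", causing="X", kind="f").summary())
```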

Sims causality

Definition: Sims non-causality
X does not Sims-cause Y with respect to V = (X, Y, Z) if

  {Y_{t′} | t′ > t} ⊥⊥ X_t | X^{t−1}, Y^t, Z^t.

Note:
• Granger causality is a concept of direct causality.
• Sims causality is a concept of total causality (direct and indirect pathways).

The following statistics are measures for Sims causality:
• impulse response function (time and frequency domain)
• directed transfer function (DTF)

Vector autoregressive processes

Let X be a multivariate stationary Gaussian time series with vector autoregressive and moving-average representations

  X_t = ∑_{k=1}^∞ A_k X_{t−k} + ε_t = ∑_{k=0}^∞ B_k ε_{t−k}.

Granger non-causality in VAR models: The following are equivalent:
• X_b does not Granger-cause X_a with respect to X;
• A_{ab,k} = 0 for all k ∈ ℕ.

Sims non-causality in VAR models: The following are equivalent:
• X_b does not Sims-cause X_a with respect to X;
• B_{ab,k} = 0 for all k ∈ ℕ.
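Continuing the sketch from the Granger-causality section (reusing df and res from there), these equivalences can be inspected directly: res.coefs stacks the estimated A_k, and the impulse responses produced by res.irf() are the estimated moving-average coefficients B_k.

```python
# Continuing the sketch above: inspect the estimated A_k and B_k.
y, x = df.columns.get_loc("Y"), df.columns.get_loc("X")

A = res.coefs                  # shape (p, 3, 3): A_1, ..., A_p
B = res.irf(periods=10).irfs   # shape (11, 3, 3): B_0 = I, B_1, ...
print("A_{YX,k}:", A[:, y, x])  # all ~0 iff X not Granger-causal for Y
print("B_{YX,k}:", B[:, y, x])  # all ~0 (k >= 1) iff X not Sims-causal for Y
```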

Outline

• Causality concepts
• Graphical representation
  • Definition
  • Markov properties
  • Extension: systems with latent variables
• Causal learning
  • Basic principles
  • Identification from empirical relationships
• Non-Markovian constraints
  • Trek-separation in graphs
  • Tetrad representation theorem
  • Testing for tetrad constraints
• Open problems and conclusions

Graphical models for time series

Basic idea: use graphs to encode conditional independences among variables:
• nodes/vertices represent variables;
• a missing edge between two nodes implies conditional independence of the two variables.

Application to time series:
• treat each variable at each time point separately (→ time series chain graphs), or
• treat each series as one variable (→ one node per series in the graph).

Granger causality graphs (Eichler 2007)

Idea: represent Granger-causal relations in X by a mixed graph G:
• vertices v ∈ V represent the variables (time series) X_v;
• directed edges between the vertices indicate Granger-causal relationships;
• additionally, undirected (dashed) edges indicate contemporaneous associations.

Example: consider the five-dimensional autoregressive process X_t = f(X_{t−1}) + ε_t with

• X_{1,t} = f_1(X_{3,t−1}) + ε_{1,t}
• X_{2,t} = f_2(X_{4,t−1}) + ε_{2,t}
• X_{3,t} = f_3(X_{1,t−1}, X_{2,t−1}) + ε_{3,t}
• X_{4,t} = f_4(X_{3,t−1}, X_{5,t−1}) + ε_{4,t}
• X_{5,t} = f_5(X_{3,t−1}) + ε_{5,t}
• (ε_{1,t}, ε_{2,t}, ε_{3,t}) ⊥⊥ (ε_{4,t}, ε_{5,t}) and ε_{4,t} ⊥⊥ ε_{5,t}

[Figure: the associated mixed graph over vertices 1–5, drawn with 2 and 4 on top and 1, 3, 5 below.]
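For concreteness, here is one way the example could be simulated, assuming linear functions f_i with arbitrarily chosen stable coefficients and Gaussian noise with the stated block-independence structure; none of these numbers come from the talk.

```python
# Simulating the five-dimensional example with linear f_i (an
# illustrative choice; any coefficients giving a stable VAR(1) work).
import numpy as np

rng = np.random.default_rng(42)
T, burn = 1000, 100

# (eps1, eps2, eps3) mutually correlated but independent of
# (eps4, eps5); eps4 and eps5 independent of each other.
C = np.eye(5)
C[:3, :3] = 0.4
np.fill_diagonal(C, 1.0)
eps = rng.multivariate_normal(np.zeros(5), C, size=T + burn)

X = np.zeros((T + burn, 5))
for t in range(1, T + burn):
    X[t, 0] = 0.5 * X[t - 1, 2] + eps[t, 0]                      # X1 <- X3
    X[t, 1] = 0.5 * X[t - 1, 3] + eps[t, 1]                      # X2 <- X4
    X[t, 2] = 0.4 * X[t - 1, 0] + 0.4 * X[t - 1, 1] + eps[t, 2]  # X3 <- X1, X2
    X[t, 3] = 0.4 * X[t - 1, 2] + 0.3 * X[t - 1, 4] + eps[t, 3]  # X4 <- X3, X5
    X[t, 4] = 0.5 * X[t - 1, 2] + eps[t, 4]                      # X5 <- X3
X = X[burn:]
```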

Markov properties

Objective: derive Granger-causal relationships for X_S, S ⊆ V.
Idea: characterize pathways that induce associations.
Tool: concepts of separation in graphs
• DAGs: d-separation (Pearl 1988)
• mixed graphs: d-separation (Spirtes et al. 1998; Koster 1999) or m-separation (Richardson 2003)

Markov properties

Chain 1 → 2 → 3:
  p(x) = p(x3|x2) p(x2|x1) p(x1)  ⇒  X3 ⊥⊥ X1 | X2

Fork 1 ← 2 → 3:
  p(x) = p(x1|x2) p(x3|x2) p(x2)  ⇒  X3 ⊥⊥ X1 | X2

Collider 1 → 2 ← 3:
  p(x) = p(x2|x1, x3) p(x3) p(x1)  ⇏  X3 ⊥⊥ X1 | X2
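A toy numeric check of these three cases (all numbers illustrative): the partial correlation of X1 and X3 given X2 vanishes for the chain and the fork, but not for the collider.

```python
# Toy check: partial correlation of X1, X3 given X2 in the three graphs.
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

def pcorr_13_given_2(x1, x2, x3):
    # correlate the residuals of X1 and X3 after regressing out X2
    r1 = x1 - np.polyval(np.polyfit(x2, x1, 1), x2)
    r3 = x3 - np.polyval(np.polyfit(x2, x3, 1), x2)
    return np.corrcoef(r1, r3)[0, 1]

x1 = rng.standard_normal(n)               # chain 1 -> 2 -> 3
x2 = x1 + rng.standard_normal(n)
x3 = x2 + rng.standard_normal(n)
print(pcorr_13_given_2(x1, x2, x3))       # ~ 0

x1 = rng.standard_normal(n)               # collider 1 -> 2 <- 3
x3 = rng.standard_normal(n)
x2 = x1 + x3 + rng.standard_normal(n)
print(pcorr_13_given_2(x1, x2, x3))       # clearly nonzero (~ -0.5)
```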

Global Granger-causal Markov property: separation in mixed graphs

Question: What type of path induces Granger-causal relations between variables?
Note: Granger (non-)causality is not symmetric.
Idea: consider only paths ending with a directed edge →.

Example: the graph 1 → 2 ← 3 → 4 entails
• X1 does not Granger-cause X4 with respect to X1, X4;
• X1 does not Granger-cause X4 with respect to X1, X3, X4;
• X1 does not Granger-cause X4 with respect to X1, X2, X3, X4;
but not
• X1 does not Granger-cause X4 with respect to X1, X2, X4.

Outline

• Causality concepts
• Graphical representation
  • Definition
  • Markov properties
  • Extension: systems with latent variables
• Causal learning
  • Basic principles
  • Identification from empirical relationships
• Non-Markovian constraints
  • Trek-separation in graphs
  • Tetrad representation theorem
  • Testing for tetrad constraints
• Open problems and conclusions

Principles of causal inference

Objective: identify the causal structure of the process X.
Question: What to use in practice?
• Granger causality or Sims causality?
• bivariate or fully multivariate analysis?

Answer: For causal inference . . . all and more.

Principles of identification

An example of indirect causality: the trivariate graph 1 → 2 → 3 implies, for the bivariate submodel over (X1, X3), the edge 1 → 3.

[Figure: trivariate graph with vertex 2 on top and vertices 1, 3 below; induced bivariate graph over 1 and 3.]

Principles of identification

An example of spurious causality: a graph containing a latent variable L

[Figure: full graph with latent L and vertex 2 on top, observed vertices 1 and 3 below]

implies for the trivariate and bivariate submodels:

[Figure: induced trivariate graph over vertices 1, 2, 3 and induced bivariate graph over vertices 1, 3.]

Principles of identification

Inverse problem: What can we say about the full system based on Granger-noncausal relations observed for the observed (sub)process? Suppose
• X_a → X_c [X_S] for all {a, c} ⊆ S ⊆ V,
• X_c → X_b [X_S] for all {c, b} ⊆ S ⊆ V.

Rules of causal inference:
• Indirect causality rule: X_a truly causes X_b if X_a ↛ X_b [X_S] for some S ⊆ V with c ∈ S.
• Spurious causality rule: X_a is a spurious cause of X_b if X_a ↛ X_b [X_S] for some S ⊆ V with c ∉ S.

Principles of causal inference

[Figure: graph over variables U, Z, Y, X; panels show estimates of bivariate Granger causality A_YX(h), trivariate Granger causality A_YX(h), and trivariate Sims causality B_YX(h) against lag h.]

Principles of causal inference

[Figure: graph over variables U, Z, V, Y, X; panels show estimates of bivariate Granger causality, trivariate Granger causality, and trivariate Sims causality against lag h.]

Identification of causal structure

Algorithm: identification of adjacencies
• Insert an undirected (dashed) edge a – b whenever X_a and X_b are not contemporaneously independent.
• Insert a directed edge a → b whenever
  • X_a → X_b [X_S] for all S ⊆ V with a, b ∈ S;
  • X_a(t−k) is not independent of X_b(t+1) given F_{S1}(t) ∨ F_{S2}(t−k) ∨ F_a(t−k−1), for all k ∈ ℕ, t ∈ ℤ, and all disjoint S1, S2 ⊆ V with b ∈ S1 and a ∉ S1 ∪ S2.
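Schematically, the adjacency step can be written as below, assuming two oracles supplied by the user: granger_causes(a, b, S) for "X_a Granger-causes X_b with respect to X_S" (in practice a VAR-based test as sketched earlier) and contemp_dependent(a, b). The second, latent-variable condition of the algorithm is omitted for brevity.

```python
# Schematic sketch of the adjacency step (first condition only);
# granger_causes and contemp_dependent are user-supplied test oracles.
from itertools import combinations

def adjacencies(V, granger_causes, contemp_dependent):
    undirected = {frozenset((a, b)) for a, b in combinations(V, 2)
                  if contemp_dependent(a, b)}
    directed = set()
    for a in V:
        for b in V:
            if a == b:
                continue
            rest = [v for v in V if v not in (a, b)]
            supersets = ({a, b} | set(extra)
                         for r in range(len(rest) + 1)
                         for extra in combinations(rest, r))
            # keep a -> b only if X_a Granger-causes X_b with
            # respect to X_S for every S containing a and b
            if all(granger_causes(a, b, S) for S in supersets):
                directed.add((a, b))
    return directed, undirected
```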

Identification of causal structure

Algorithm: identification of tails
• Colliders: if a ∗–∗ c ∗–∗ b is in G and X_a ↛ X_b [X_S] for some S with c ∉ S, then the edge marks at c are oriented as arrowheads.
• Non-colliders: if a ∗–∗ c ∗–∗ b is in G and X_a ↛ X_b [X_S] for some S with c ∈ S, then the mark at c on the edge between c and b is oriented as a tail (c → b).
• Ancestors: if a → · · · → b in G, then an edge between a and b is oriented as a → b.
• Discriminating paths: see, e.g., Ali et al. (2004).

Identification of causal structure

Example: application to neural spike train data

[Figure: spike trains of neurons 1–10 over 8 seconds; panels show estimated partial directed coherence pdc(i → j) against lag for the pairs 1→2, 1→3, 1→4, 2→3, 2→4, 3→4.]

Identification of causal structure

Example:

[Figure: panels (a)–(k) showing successive steps of the identification algorithm on a four-node example, with vertices 2 and 3 on top and 1 and 4 below.]

Result:

[Figure: the identified mixed graph over vertices 1–4.]

Outline

• Causality concepts
• Graphical representation
  • Definition
  • Markov properties
  • Extension: systems with latent variables
• Causal learning
  • Basic principles
  • Identification from empirical relationships
• Non-Markovian constraints
  • Trek-separation in graphs
  • Tetrad representation theorem
  • Testing for tetrad constraints
• Open problems and conclusions

Problem

Example:

[Figure: latent variable L with directed edges to the observed vertices 1, 2, 3, 4.]

• X1, X2, X3, X4 are conditionally independent given L;
• there are no conditional independences among X1, . . . , X4 alone.

Trek separation

Problem:
• conditional independences are not sufficient to describe processes that involve latent variables;
• identification of such structures relies on sparsity that is often not given.

Approach: Sullivant et al. (2011) for multivariate Gaussian distributions
• new concept of separation in graphs
• encodes rank constraints on minors of the covariance matrix
• generalizes other concepts of separation
• special case: conditional independences

Trek separation

A trek between nodes i and j is a path π = (π_L, π_M, π_R) such that
• π_L is a directed path from some node k_L to i;
• π_R is a directed path from some node k_R to j;
• π_M is an undirected edge k_L – k_R or a path of length zero (k_L = k_R).

Examples: i ← k_L – k_R → j;  i ← v ← k → j;  i ← v → j;  i ← j.

Definition (trek separation)
(C_L, C_R) t-separates sets A and B if for every trek (π_L, π_M, π_R) between A and B,
• π_L contains a vertex in C_L, or
• π_R contains a vertex in C_R.

Trek separation

Let X be a stationary Gaussian process with spectral matrix Σ(ω) satisfying

  Σ(ω) = (1/2π) ∑_{u=−∞}^{∞} cov(X_t, X_{t−u}) e^{−iuω}.

Theorem: Let X be G-Markov. Then the following are equivalent:
• rank(Σ_{AB}(ω)) ≤ r for all ω ∈ [−π, π];
• A and B are t-separated by some (C_L, C_R) with |C_L| + |C_R| ≤ r.
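The rank condition can be examined empirically from a spectral estimate. The sketch below (Welch-type cross-spectra via scipy; the estimator choice and index sets are illustrative) computes the singular values of a block Σ_AB(ω) across frequencies; small trailing singular values at all frequencies point to a low-rank block and hence to t-separation by a small set.

```python
# Sketch: estimate the spectral matrix and check the rank of a block.
import numpy as np
from scipy.signal import csd

def spectral_matrix(X, nperseg=256):
    """Estimated spectral matrix S(omega); X has shape (T, d)."""
    d = X.shape[1]
    freqs, _ = csd(X[:, 0], X[:, 0], nperseg=nperseg)
    S = np.empty((len(freqs), d, d), dtype=complex)
    for i in range(d):
        for j in range(d):
            _, S[:, i, j] = csd(X[:, i], X[:, j], nperseg=nperseg)
    return freqs, S

# e.g. with the five-dimensional simulation from above:
# freqs, S = spectral_matrix(X)
# svals = np.linalg.svd(S[:, [0, 1], :][:, :, [3, 4]], compute_uv=False)
# svals[:, -1] small at all frequencies suggests rank(Sigma_AB) <= 1
```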

Trek separation

Corollaries: Let X be a stationary Gaussian process. Then

  X_A ⊥⊥ X_B | X_C  ⇔  rank(Σ_{A∪C, B∪C}) = |C|.

Furthermore, the following are equivalent:
• X_A ⊥⊥ X_B | X_C for all G-Markov processes X;
• (C_A, C_B) t-separates A ∪ C and B ∪ C for some partition C = C_A ∪ C_B.

Tetrad representation theorem

Consider the class M(G) of all G-Markov stationary Gaussian processes.

Proposition: The following are equivalent:
• The spectral matrices Σ(·) of processes in M(G) satisfy

    Σ_{ik}(ω) Σ_{jl}(ω) − Σ_{il}(ω) Σ_{jk}(ω) = 0;

• {i, j} and {k, l} are t-separated by (c, ∅) or (∅, c) for some node c in G.

Tetrad representation theorem

If the spectral matrix Σ(ω) satisfies the tetrad constraints

  Σ_{ik}(ω) Σ_{jl}(ω) − Σ_{il}(ω) Σ_{jk}(ω) = 0
  Σ_{ij}(ω) Σ_{kl}(ω) − Σ_{il}(ω) Σ_{kj}(ω) = 0
  Σ_{ik}(ω) Σ_{lj}(ω) − Σ_{ij}(ω) Σ_{lk}(ω) = 0

then there exists a node P such that X_i, X_j, X_k, and X_l are mutually conditionally independent given X_P.

[Figure: node P with directed edges to vertices 1, 2, 3, 4.]

Note: If no such X_P is among the observed variables, X_P must be a latent factor.
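A quick numeric illustration (a toy static analogue, not the talk's method): for a one-factor covariance Σ = λλ′ + diagonal noise, with arbitrarily chosen loadings, all three tetrad differences vanish.

```python
# Toy check: tetrad constraints hold for a one-factor model
# Sigma = lambda lambda' + diagonal noise (loadings arbitrary).
import numpy as np

lam = np.array([0.9, 0.7, 0.5, 0.3])
Sigma = np.outer(lam, lam) + np.diag([0.4, 0.3, 0.5, 0.2])

def tetrad(S, i, j, k, l):
    return S[i, k] * S[j, l] - S[i, l] * S[j, k]

print(tetrad(Sigma, 0, 1, 2, 3))  # S02*S13 - S03*S12, ~0
print(tetrad(Sigma, 0, 2, 1, 3))  # S01*S23 - S03*S21, ~0
print(tetrad(Sigma, 0, 3, 2, 1))  # S02*S31 - S01*S32, ~0
```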

Testing tetrad constraints

Approach: nonparametric test (Eichler 2008)
Null hypothesis: ψ(Σ(ω)) ≡ 0, where ψ(Z) = z_{ik} z_{jl} − z_{il} z_{jk}.
Test statistic:

  S_T = ∫ |ψ(Σ̂(ω))|² dω,

where Σ̂(ω) is a kernel spectral estimator with bandwidth b_T.

Theorem: Under the null hypothesis,

  b_T^{1/2} T S_T − b_T^{−1/2} µ →_D N(0, σ²),

where

  µ = C_h C_{w,2} ∫ tr[∇ψ(Σ(ω))′ Σ(ω) ∇ψ(Σ(−ω)) Σ(ω)] dω,
  σ² = 4π C_h² C_{w,4} ∫ |tr[∇ψ(Σ(ω))′ Σ_{AA}(ω) ∇ψ(Σ(−ω)) Σ_{BB}(ω)]|² dω.
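Putting the pieces together, the raw statistic can be approximated as below, reusing spectral_matrix() from the trek-separation sketch; the centering and scaling by µ and σ² from the theorem, and hence the critical values, are omitted.

```python
# Sketch of the raw statistic S_T = integral of |psi(Sigma_hat(omega))|^2;
# reuses spectral_matrix() from the earlier sketch.
import numpy as np

def tetrad_statistic(S, freqs, i, j, k, l):
    psi = S[:, i, k] * S[:, j, l] - S[:, i, l] * S[:, j, k]
    return np.trapz(np.abs(psi) ** 2, freqs)

# freqs, S = spectral_matrix(X)
# print(tetrad_statistic(S, freqs, 0, 1, 2, 3))
```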

Latent variable models

Common identifiability constraint for factor models: factors are uncorrelated/independent.

But: in many applications (e.g. in neuroscience), we think of latent variables that are causally connected:
• EEG recordings measure neural activity in nearby cortical regions;
• fMRI recordings measure hemodynamic responses, which depend on the underlying neural activity.

Objective: recover the latent processes and the interrelations among them.

Latent variable models

Suppose that Y(t) can be partitioned into Y_{I_1}(t), . . . , Y_{I_r}(t) such that

  Y_{I_j}(t) = Λ_j X_j(t) + ε_{I_j}(t)

and X(t) is a VAR(p) process.

Then the model can be fitted by the following steps (see the sketch below):
• identify clusters of variables depending on one latent variable (based on the tetrad rules);
• use PCA to determine the latent variable processes X_j(t);
• fit a VAR model to all latent variable processes jointly.
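A schematic sketch of the three steps, under strong simplifying assumptions: the clusters are taken as given (in practice they come from the tetrad tests above), one factor per cluster is extracted by PCA, and a VAR is fitted jointly to the factor series. All names are illustrative.

```python
# Schematic three-step fit: given clusters -> PCA factors -> joint VAR.
import numpy as np
from sklearn.decomposition import PCA
from statsmodels.tsa.api import VAR

def fit_latent_var(Y, clusters, p=2):
    """Y: (T, d) observations; clusters: lists of column indices,
    one list per latent variable X_j(t)."""
    factors = [PCA(n_components=1).fit_transform(Y[:, idx])[:, 0]
               for idx in clusters]
    return VAR(np.column_stack(factors)).fit(p)

# e.g. res = fit_latent_var(Y, clusters=[[0, 1, 2], [3, 4]], p=2)
```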

Latent variable models: Example

[Figure: simulated series X(1)–X(5) over 1000 time points.]

Latent variable models: Example

Tetrad test statistics S for all pairs of index sets:

Set {1,2}:  {3,4}: S = −0.98   {3,5}: S = −0.31   {4,5}: S = −1.41
Set {1,3}:  {2,4}: S = −1.37   {2,5}: S = 0.76    {4,5}: S = −0.44
Set {1,4}:  {2,3}: S = −1.19   {2,5}: S = 6.54    {3,5}: S = 6.55
Set {1,5}:  {2,3}: S = −1.22   {2,4}: S = 5.43    {3,4}: S = 5.77
Set {2,3}:  {4,5}: S = −1.58
Set {2,4}:  {3,5}: S = 5.66
Set {2,5}:  {3,4}: S = 5.73

The constraints are clearly violated (large |S|) exactly for the tetrads in which both pairs mix {1, 2, 3} with {4, 5}; this points to one latent factor behind X(1), X(2), X(3) and another behind X(4), X(5).

Latent variable models

Example:

[Figure: identified structure with latent factors P and Q over the observed vertices 1, 2, 3, 4, 5.]

Latent variable models

Example:

[Figure: latent structure with factors L1, L2, L3 over the observed vertices 1, 2, 3, 4, 5, 6.]

Conclusion

Causal inference is a complex task:
• requires modelling at all levels (bivariate to fully multivariate);
• requires Granger causality as well as other measures (e.g. Sims causality);
• definite results may be sparse without further assumptions;
• latent variables induce further (non-Markovian) constraints on the distribution.

Open problems:
• merging of information about latent variables; development of algorithms for latent variables
• uncertainty in the identification of Granger-causal relationships
• instantaneous causality
• aggregation over time (distortion of identification); identification only possible up to Markov equivalence
• non-stationarity and non-linearity

References

• Eichler, M. (2007). Granger causality and path diagrams for multivariate time series. Journal of Econometrics 137, 334–353.
• Eichler, M. (2008). Testing nonparametric and semiparametric hypotheses in vector stationary processes. Journal of Multivariate Analysis 99, 968–1009.
• Eichler, M. (2009). Causal inference from time series: what can be learned from Granger causality? In: C. Glymour, W. Wang, D. Westerståhl (eds), Proceedings of the 13th International Congress of Logic, Methodology and Philosophy of Science. College Publications, London.
• Eichler, M. (2010). Graphical modelling of multivariate time series with latent variables. Journal of Machine Learning Research W&CP 9.
• Eichler, M. (2012). Graphical modelling of multivariate time series. Probability Theory and Related Fields 153, 233–268.
• Eichler, M. (2012). Causal inference in time series analysis. In: C. Berzuini, A.P. Dawid, L. Bernardinelli (eds), Causality: Statistical Perspectives and Applications. Wiley, Chichester.
• Eichler, M. (2013). Causal inference with multiple time series: principles and problems. Philosophical Transactions of the Royal Society A 371, 20110613.
