Theory of Big Data 2 Conference
Big Data Institute, University College London
Causal Inference from Multivariate Time Series: Principles and Problems
Michael Eichler
Department of Quantitative Economics, Maastricht University
http://researchers-sbe.unimaas.nl/michaeleichler
6 January 2016
Outline
• Causality concepts
• Graphical representation
  • Definition
  • Markov properties
  • Extension: systems with latent variables
• Causal learning
  • Basic principles
  • Identification from empirical relationships
• Non-Markovian constraints
  • Trek-separation in graphs
  • Tetrad representation theorem
  • Testing for tetrad constraints
• Open problems and conclusions
Concepts of causality for time series
We consider two variables X and Y measured at discrete times $t \in \mathbb{Z}$:
$X = (X_t)_{t \in \mathbb{Z}}$, $Y = (Y_t)_{t \in \mathbb{Z}}$.
Question: When is it justified to say that X causes Y?
Various approaches:
• Intervention causality (Pearl, 1993; Eichler & Didelez 2007, 2010)
• Structural causality (White and Lu, 2010)
• Granger causality (Granger, 1969, 1980, 1988)
• Sims causality (Sims, 1972)
Granger causality
Two fundamental principles:
• The cause precedes its effect in time.
• The causal series contains special information about the series being caused that is not available otherwise.
This leads us to consider two information sets:
• $\mathcal{F}^*(t)$: all information in the universe up to time t
• $\mathcal{F}^*_{-X}(t)$: this information except the values of X
Granger's definition of causality (Granger 1969, 1980)
We say that X causes Y if the probability distributions of
• $Y_{t+1}$ given $\mathcal{F}^*(t)$ and
• $Y_{t+1}$ given $\mathcal{F}^*_{-X}(t)$
are different.
Granger causality
Problem: The definition cannot be used with actual data.
Suppose data consist of multivariate time series V = (X, Y, Z) and let
• $X^t$: information given by X up to time t
• similarly for Y and Z
Definition: Granger non-causality
• X is Granger-noncausal for Y with respect to V if
$Y_{t+1} \perp\!\!\!\perp X^t \mid Y^t, Z^t$.
• Otherwise we say that X Granger-causes Y with respect to V.
Additionally:
• X and Y are said to be contemporaneously independent w.r.t. V if
$X_{t+1} \perp\!\!\!\perp Y_{t+1} \mid V^t$.
Sims causality
Definition: Sims non-causality
X does not Sims-cause Y with respect to V = (X, Y, Z) if
$\{Y_{t'} \mid t' > t\} \perp\!\!\!\perp X_t \mid X^{t-1}, Y^t, Z^t$.
Note:
• Granger causality is a concept of direct causality
• Sims causality is a concept of total causality (direct and indirect pathways)
The following statistics are measures for Sims causality:
• impulse response function (time and frequency domain)
• directed transfer function (DTF)
Vector autoregressive processes
Let X be a multivariate stationary Gaussian time series with vector autoregressive representation
$X_t = \sum_{k=1}^{\infty} A_k X_{t-k} + \epsilon_t = \sum_{k=0}^{\infty} B_k \epsilon_{t-k}$
Granger non-causality in VAR models: The following are equivalent:
• $X_b$ does not Granger-cause $X_a$ with respect to X;
• $A_{ab,k} = 0$ for all $k \in \mathbb{N}$.
Sims non-causality in VAR models: The following are equivalent:
• $X_b$ does not Sims-cause $X_a$ with respect to X;
• $B_{ab,k} = 0$ for all $k \in \mathbb{N}$.
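The two equivalences can be illustrated numerically. The following is only a minimal sketch under stated assumptions: a bivariate VAR(1) with a hypothetical coefficient matrix `A_true` (not taken from the talk), estimation by ordinary least squares, and the VAR(1) identity $B_k = A^k$ for the moving-average coefficients. The helper names `simulate_var1`, `fit_var1` and `ma_coefficients` are illustrative.

```python
import numpy as np

def simulate_var1(A, n, rng):
    """Simulate X_t = A X_{t-1} + eps_t with standard normal innovations."""
    d = A.shape[0]
    x = np.zeros((n, d))
    for t in range(1, n):
        x[t] = A @ x[t - 1] + rng.standard_normal(d)
    return x

def fit_var1(x):
    """Least-squares estimate of A in X_t = A X_{t-1} + eps_t."""
    past, present = x[:-1], x[1:]
    coef, *_ = np.linalg.lstsq(past, present, rcond=None)
    return coef.T  # lstsq solves past @ coef = present, so coef = A'

def ma_coefficients(A, n_lags):
    """Moving-average coefficients B_k = A^k of a VAR(1), k = 0..n_lags."""
    return [np.linalg.matrix_power(A, k) for k in range(n_lags + 1)]

rng = np.random.default_rng(0)
# Hypothetical example: A[0, 1] = 0, so component 2 does not Granger-cause
# component 1, while A[1, 0] != 0, so component 1 Granger-causes component 2.
A_true = np.array([[0.5, 0.0],
                   [0.4, 0.3]])
x = simulate_var1(A_true, 5000, rng)
A_hat = fit_var1(x)
B = ma_coefficients(A_true, 5)

print(np.round(A_hat, 2))
print([float(Bk[0, 1]) for Bk in B])  # the (1, 2) entries of B_0, ..., B_5
```

In this example the matrix is lower triangular, so every power of it is too: the zero in position (1, 2) of the autoregressive coefficients carries over to all moving-average coefficients, and Granger and Sims non-causality agree here. In higher-dimensional systems with indirect pathways the two zero patterns can differ.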
Graphical models for time series
Basic idea: use graphs to encode conditional independences among variables
• nodes/vertices represent variables
• missing edge between two nodes implies conditional independence of the two variables
Application to time series:
• treat each variable at each time separately (time series chain graphs)
• treat each series as one variable (only one node in the graph)
Graphical models for time series
Granger causality graphs (Eichler 2007)
Idea: represent Granger-causal relations in X by mixed graph G:
• vertices $v \in V$ represent the variables (time series) $X_v$;
• directed edges between the vertices indicate Granger-causal relationships;
• additionally undirected (dashed) edges indicate contemporaneous associations.
Graphical models for time series
Granger causality graphs (Eichler 2007)
Example: consider five-dimensional autoregressive process $X_V$
$X_t = f(X_{t-1}) + \epsilon_t$
[Figure: Granger causality graph on the vertices 1, 2, 3, 4, 5]
with
• $X_{1,t} = f_1(X_{3,t-1}) + \epsilon_{1,t}$
• $X_{2,t} = f_2(X_{4,t-1}) + \epsilon_{2,t}$
• $X_{3,t} = f_3(X_{1,t-1}, X_{2,t-1}) + \epsilon_{3,t}$
• $X_{4,t} = f_4(X_{3,t-1}, X_{5,t-1}) + \epsilon_{4,t}$
• $X_{5,t} = f_5(X_{3,t-1}) + \epsilon_{5,t}$
• $(\epsilon_{1,t}, \epsilon_{2,t}, \epsilon_{3,t}) \perp\!\!\!\perp (\epsilon_{4,t}, \epsilon_{5,t})$, $\epsilon_{4,t} \perp\!\!\!\perp \epsilon_{5,t}$
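The directed part of the graph can be read off the equations above. As a small sketch, the dictionary `parents` below (an illustrative name) lists the lagged arguments of each $f_v$, and an edge a → b is recorded whenever $X_{a,t-1}$ appears in the equation for $X_{b,t}$. This assumes that every listed argument actually enters its function non-trivially.

```python
# parents[v] lists the lagged components entering the equation for X_{v,t}
# in the five-dimensional example.
parents = {1: [3], 2: [4], 3: [1, 2], 4: [3, 5], 5: [3]}

def directed_edges(parents):
    """Return the directed edges (a, b), meaning a -> b, sorted by source."""
    return sorted((a, b) for b, pa in parents.items() for a in pa)

def has_directed_edge(parents, a, b):
    """True if X_a enters the equation for X_b, i.e. a -> b is in the graph."""
    return a in parents[b]

edges = directed_edges(parents)
print(edges)
# [(1, 3), (2, 3), (3, 1), (3, 4), (3, 5), (4, 2), (5, 4)]
print(has_directed_edge(parents, 1, 4))  # False: no edge 1 -> 4
```

The absence of an edge 1 → 4 corresponds to $X_1$ being Granger-noncausal for $X_4$ with respect to the full process; statements about subprocesses require the separation concepts discussed next.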
Markov properties
Objective: derive Granger-causal relationships for $X_S$, $S \subseteq V$
Idea: characterize pathways that induce associations
Tool: concepts of separation in graphs
• DAGs: d-separation (Pearl 1988)
• mixed graphs: d-separation (Spirtes et al. 1998, Koster 1999) or m-separation (Richardson 2003)
Markov properties
• 1 → 2 → 3: $p(x) = p(x_3|x_2)\,p(x_2|x_1)\,p(x_1)$ $\Rightarrow$ $X_3 \perp\!\!\!\perp X_1 \mid X_2$
• 1 ← 2 → 3: $p(x) = p(x_1|x_2)\,p(x_3|x_2)\,p(x_2)$ $\Rightarrow$ $X_3 \perp\!\!\!\perp X_1 \mid X_2$
• 1 → 2 ← 3: $p(x) = p(x_2|x_1, x_3)\,p(x_3)\,p(x_1)$ $\not\Rightarrow$ $X_3 \perp\!\!\!\perp X_1 \mid X_2$
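The difference between the first and the third factorization can be checked by simulation. The sketch below assumes linear Gaussian relations with unit coefficients (an illustrative choice, not from the talk) and uses the partial correlation of $X_1$ and $X_3$ given $X_2$ as a stand-in for conditional dependence, which is appropriate in the Gaussian case.

```python
import numpy as np

def partial_corr(x, y, z):
    """Correlation of x and y after linearly regressing both on z."""
    design = np.column_stack([np.ones_like(z), z])
    rx = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    ry = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return float(np.corrcoef(rx, ry)[0, 1])

rng = np.random.default_rng(1)
n = 20000

# Chain 1 -> 2 -> 3: X1 and X3 are independent given X2.
c1 = rng.standard_normal(n)
c2 = c1 + rng.standard_normal(n)
c3 = c2 + rng.standard_normal(n)
chain = partial_corr(c1, c3, c2)

# Collider 1 -> 2 <- 3: X1 and X3 are marginally independent,
# but conditioning on X2 induces dependence.
k1 = rng.standard_normal(n)
k3 = rng.standard_normal(n)
k2 = k1 + k3 + rng.standard_normal(n)
collider = partial_corr(k1, k3, k2)

print(round(chain, 3), round(collider, 3))
```

With these unit coefficients the population partial correlation is 0 for the chain and −0.5 for the collider, so the two printed values should be close to those numbers.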
Global Granger-causal Markov property
Separation in mixed graphs
Question: What type of paths induce Granger-causal relations between variables?
Note: Granger (non)causality is not symmetric
Idea: consider only paths ending with a directed edge →
Examples: 1 → 2 ← 3 → 4 entails
• $X_1$ does not Granger-cause $X_4$ with respect to $X_1, X_4$
• $X_1$ does not Granger-cause $X_4$ with respect to $X_1, X_3, X_4$
• $X_1$ does not Granger-cause $X_4$ with respect to $X_1, X_2, X_3, X_4$
but not
• $X_1$ does not Granger-cause $X_4$ with respect to $X_1, X_2, X_4$
Principles of causal inference
Objective: identify causal structure of process X
Question: What to use in practice?
• Granger causality or Sims causality
• bivariate or fully multivariate analysis
Answer: For causal inference ... all and more.
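Principles of identification
An example of indirect causality:
[Figure: graph on the vertices 1, 2, 3]
implies for the bivariate submodel
[Figure: graph on the vertices 1, 3]
Principles of identification
An example of spurious causality:
[Figure: graph on the vertices 1, 2, 3 with an additional vertex L]
implies for the trivariate and bivariate submodels
[Figure: graph on the vertices 1, 2, 3]
[Figure: graph on the vertices 1, 3]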
Principles of identification
Inverse problem: What can we say about the full system based on observed Granger-noncausal relations for the observed (sub)process?
Suppose
• $X_a \to X_c\,[X_S]$ for all $\{a, c\} \subseteq S \subseteq V$
• $X_c \to X_b\,[X_S]$ for all $\{c, b\} \subseteq S \subseteq V$
Rules of causal inference
• Indirect causality rule: $X_a$ truly causes $X_b$ if $X_a \nrightarrow X_b\,[X_S]$ for some $S \subseteq V$ with $c \in S$
• Spurious causality rule: $X_a$ is a spurious cause of $X_b$ if $X_a \nrightarrow X_b\,[X_S]$ for some $S \subseteq V$ with $c \notin S$
Principles of causal inference
[Figure: graph on the vertices U, Z, Y, X; three panels (bivariate Granger, trivariate Granger, trivariate Sims) showing $A_{YX}(h)$, $A_{YX}(h)$ and $B_{YX}(h)$ against lag h]
Principles of causal inference
[Figure: graph on the vertices U, Z, V, Y, X; three panels (bivariate Granger, trivariate Granger, trivariate Sims) showing $A_{YX}(h)$, $A_{YX}(h)$ and $B_{YX}(h)$ against lag h]
Identification of causal structure
Algorithm: identification of adjacencies
• insert a - - - b whenever $X_a$ and $X_b$ are not contemporaneously independent
• insert an edge between a and b whenever
  • $X_b \to X_a\,[X_S]$ for all $S \subseteq V$ with $a, b \in S$;
  • $X_a(t-k) \not\perp\!\!\!\perp X_b(t+1) \mid \mathcal{F}_{S_1}(t) \vee \mathcal{F}_{S_2}(t-k) \vee \mathcal{F}_a(t-k-1)$ for all $k \in \mathbb{N}$, $t \in \mathbb{Z}$, and for all disjoint $S_1, S_2 \subseteq V$ with $b \in S_1$ and $a \notin S_1 \cup S_2$.
Identification of causal structure
Algorithm: identification of tails • colliders: 9 acb ∈ G and Xa Xb [XS] for some S such that c ∈/ S
⇒ c b c ¤ b • non-colliders: 9 acb ∈ G and Xa Xb [XS] for some S such that c ∈ S
⇒ c b c £ b
• ancestors:
£ £ a £ . . . b in G ⇒ a b a b • discriminating paths: e.g. Ali et al. (2004)
Identification of causal structure
Example: application to neural spike train data
[Figure: spike trains of Neurons 1 to 10 against time (0 to 8 sec)]
[Figure: six panels showing pdc(1 → 2), pdc(1 → 3), pdc(1 → 4), pdc(2 → 3), pdc(2 → 4) and pdc(3 → 4) against lag (−60 to 60)]
Identification of causal structure
Example:
[Figure: panels (a) to (k) showing graphs on the vertices 1, 2, 3, 4]
Result:
[Figure: resulting graph on the vertices 1, 2, 3, 4]
Problem
Example:
[Figure: graph with vertex L and vertices 1, 2, 3, 4]
• $X_1, X_2, X_3, X_4$ are conditionally independent given L
• no conditional independences among $X_1, \ldots, X_4$.
Trek separation
Problem:
• conditional independences are not sufficient to describe processes that involve latent variables
• identification of such structures relies on sparsity that is often not given
Approach: Sullivant et al. (2011) for multivariate Gaussian distributions
• new concept of separation in graphs
• encodes rank constraints on minors of covariance matrix
• generalizes other concepts of separation
• special case: conditional independences
Trek separation
A trek between nodes i and j is a path $\pi = (\pi_L, \pi_M, \pi_R)$ such that
• $\pi_L$ is a directed path from some node $k_L$ to i;
• $\pi_R$ is a directed path from some node $k_R$ to j;
• $\pi_M$ is an undirected edge $k_L$ - - - $k_R$ or a path of length zero ($k_L = k_R$).
Examples: i ← $k_L$ - - - $k_R$ → j, i ← v ← k → j, i → v → j, i - - - j
Definition (trek separation)
$(C_L, C_R)$ t-separates sets A and B if for every trek $(\pi_L, \pi_M, \pi_R)$ between a node in A and a node in B
• $\pi_L$ contains a vertex in $C_L$ or
• $\pi_R$ contains a vertex in $C_R$.
Trek separation
Let X be a stationary Gaussian process with spectral matrix $\Sigma(\omega)$ satisfying
$\Sigma(\omega) = \frac{1}{2\pi} \sum_{u=-\infty}^{\infty} \mathrm{cov}(X_t, X_{t-u})\, e^{-i u \omega}$.
Theorem
Let X be G-Markov. Then the following are equivalent:
• $\mathrm{rank}(\Sigma_{AB}(\omega)) \le r$ for all $\omega \in [-\pi, \pi]$
• A and B are t-separated by some $(C_L, C_R)$ with $|C_L| + |C_R| \le r$.
Trek separation
Corollaries: Let X be a Gaussian stationary process. Then
$X_A \perp\!\!\!\perp X_B \mid X_C \Leftrightarrow \mathrm{rank}(\Sigma_{A \cup C, B \cup C}) = |C|$.
Furthermore the following are equivalent:
• $X_A \perp\!\!\!\perp X_B \mid X_C$ for all G-Markov processes X;
• $(C_A, C_B)$ t-separates $A \cup C$ and $B \cup C$ for some partition $C = C_A \cup C_B$.
Tetrad representation theorem
Consider the class $\mathcal{M}(G)$ of all G-Markov stationary Gaussian processes.
Proposition
The following are equivalent:
• The spectral matrices $\Sigma(\cdot)$ of processes in $\mathcal{M}(G)$ satisfy
$\Sigma_{ik}(\omega)\,\Sigma_{jl}(\omega) - \Sigma_{il}(\omega)\,\Sigma_{jk}(\omega) = 0$;
• $\{i, j\}$ and $\{k, l\}$ are t-separated by $(c, \emptyset)$ or $(\emptyset, c)$ for some node c in G.
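Tetrad representation theorem
If the spectral matrix $\Sigma(\omega)$ satisfies the tetrad constraints
$\Sigma_{ik}(\omega)\,\Sigma_{jl}(\omega) - \Sigma_{il}(\omega)\,\Sigma_{jk}(\omega) = 0$
$\Sigma_{ij}(\omega)\,\Sigma_{kl}(\omega) - \Sigma_{il}(\omega)\,\Sigma_{kj}(\omega) = 0$
$\Sigma_{ik}(\omega)\,\Sigma_{lj}(\omega) - \Sigma_{ij}(\omega)\,\Sigma_{lk}(\omega) = 0$
then there exists a node P such that $X_i$, $X_j$, $X_k$, and $X_l$ are mutually conditionally independent given $X_P$.
[Figure: graph with vertex P and vertices 1, 2, 3, 4]
Note: If no such $X_P$ is among the observed variables, $X_P$ must be a latent factor.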
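The three tetrad constraints can be checked on a toy example. The sketch below works with a single covariance matrix rather than a spectral matrix at each frequency, which is a simplification made here for illustration. The loadings, the noise variances and the function name `tetrads` are all hypothetical.

```python
import numpy as np

def tetrads(S, i, j, k, l):
    """The three tetrad differences for the index quadruple (i, j, k, l)."""
    return (S[i, k] * S[j, l] - S[i, l] * S[j, k],
            S[i, j] * S[k, l] - S[i, l] * S[k, j],
            S[i, k] * S[l, j] - S[i, j] * S[l, k])

# One common factor P: Sigma = lam lam' + diagonal noise (hypothetical values).
lam = np.array([0.9, -0.7, 0.5, 1.2])
sigma_one = np.outer(lam, lam) + np.diag([1.0, 0.5, 2.0, 1.5])
one_factor = tetrads(sigma_one, 0, 1, 2, 3)

# Two uncorrelated factors, one for variables {0, 1} and one for {2, 3}.
load = np.array([[1.0, 0.0],
                 [1.0, 0.0],
                 [0.0, 1.0],
                 [0.0, 1.0]])
sigma_two = load @ load.T + np.eye(4)
two_factor = tetrads(sigma_two, 0, 1, 2, 3)

print(one_factor)   # all three differences vanish up to rounding
print(two_factor)   # (0.0, 1.0, -1.0): not all tetrads vanish
```

With one common factor every off-diagonal entry has the product form $\lambda_a \lambda_b$, so all three differences cancel. With two separate factors only the first tetrad vanishes, which is the pattern a clustering step can exploit.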
Testing tetrad constraints
Approach: nonparametric test (Eichler 2008)
Null hypothesis: $\psi(\Sigma(\omega)) \equiv 0$ where $\psi(Z) = z_{ik} z_{jl} - z_{il} z_{jk}$
Test statistic:
$S_T = \int |\psi(\hat\Sigma(\omega))|^2\, d\omega$,
where $\hat\Sigma(\omega)$ is a kernel spectral estimator with bandwidth $b_T$
Theorem
Under the null hypothesis
$b_T^{1/2}\, T\, S_T - b_T^{-1/2}\, \mu \xrightarrow{D} \mathcal{N}(0, \sigma^2)$,
where
$\mu = C_h\, C_{w,2} \int \mathrm{tr}\big[\nabla\psi(\Sigma(\omega))'\, \Sigma(\omega)\, \nabla\psi(\Sigma(-\omega))\, \Sigma(\omega)\big]\, d\omega$
$\sigma^2 = 4\pi\, C_h^2\, C_{w,4} \int \big|\mathrm{tr}\big[\nabla\psi(\Sigma(\omega))'\, \Sigma_{AA}(\omega)\, \nabla\psi(\Sigma(-\omega))\, \Sigma_{BB}(\omega)\big]\big|^2\, d\omega$
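The raw statistic $S_T$ can be approximated numerically. The following is a rough sketch only: it assumes a uniform smoothing kernel over neighbouring Fourier frequencies as the spectral estimator and a Riemann sum for the integral, and it does not compute the centring $\mu$ or the variance $\sigma^2$ from the theorem. The function names and the simulated white-noise factor data are illustrative, not the procedure of Eichler (2008).

```python
import numpy as np

def smoothed_periodogram(x, m):
    """Estimate the spectral matrix at the Fourier frequencies.

    x has shape (T, d). The raw periodogram is averaged over the 2*m + 1
    neighbouring Fourier frequencies (uniform kernel, applied circularly).
    """
    T = x.shape[0]
    dft = np.fft.fft(x - x.mean(axis=0), axis=0)
    pgram = dft[:, :, None] * dft[:, None, :].conj() / (2 * np.pi * T)
    smooth = np.zeros_like(pgram)
    for shift in range(-m, m + 1):
        smooth += np.roll(pgram, shift, axis=0)
    return smooth / (2 * m + 1)

def tetrad_statistic(x, i, j, k, l, m):
    """Riemann-sum approximation of the integral of |psi(Sigma_hat)|^2."""
    S = smoothed_periodogram(x, m)
    psi = S[:, i, k] * S[:, j, l] - S[:, i, l] * S[:, j, k]
    return float(np.sum(np.abs(psi) ** 2) * 2 * np.pi / x.shape[0])

rng = np.random.default_rng(2)
T = 4096
# One common factor drives all four series: the tetrad should vanish.
f = rng.standard_normal(T)
x_one = f[:, None] + rng.standard_normal((T, 4))
# Two independent factors, one per pair {0, 1} and {2, 3}.
g = rng.standard_normal((T, 2))
x_two = np.repeat(g, 2, axis=1) + rng.standard_normal((T, 4))

# psi = S_01 * S_23 - S_03 * S_21 for the index choice (0, 2, 1, 3)
s_one = tetrad_statistic(x_one, 0, 2, 1, 3, m=100)
s_two = tetrad_statistic(x_two, 0, 2, 1, 3, m=100)
print(s_one, s_two)
```

The statistic for the one-factor data should be markedly smaller than for the two-factor data. Turning it into a test would additionally require the normalisation stated in the theorem.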
Latent variable models
Common identifiability constraint for factor models:
factors are uncorrelated/independent
But: in many applications (e.g. in neuroscience), we think of latent variables that are causally connected.
• EEG recordings measure neural activity in close cortical regions
• fMRI recordings measure hemodynamic responses which depend on underlying neural activity
Objective: recover latent processes and interrelations among them
Latent variable models
Suppose that Y(t) can be partitioned into $Y_{I_1}(t), \ldots, Y_{I_r}(t)$ such that
$Y_{I_j}(t) = \Lambda_j X_j(t) + \epsilon_{I_j}(t)$
and X(t) is a VAR(p) process.
Then the model can be fitted by the following steps:
• identify clusters of variables depending on one latent variable (based on tetrad rules)
• use PCA to determine latent variable processes $X_j(t)$
• fit VAR model to all latent variable processes jointly
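The last two steps can be sketched on simulated data. This is a minimal illustration under several assumptions: the clusters are taken as known (the tetrad-based first step is skipped), the latent process is a VAR(1) with a hypothetical matrix `A`, the loadings and noise level are invented, and the first principal component of each cluster serves as the estimated latent process.

```python
import numpy as np

rng = np.random.default_rng(3)
T = 3000

# Hypothetical latent VAR(1): latent 1 drives latent 2, but not vice versa.
A = np.array([[0.6, 0.0],
              [0.5, 0.4]])
latent = np.zeros((T, 2))
for t in range(1, T):
    latent[t] = A @ latent[t - 1] + rng.standard_normal(2)

# Observed series: three indicators per latent variable (clusters assumed known).
clusters = [[0, 1, 2], [3, 4, 5]]
loadings = [np.array([1.0, 0.8, 1.2]), np.array([0.9, 1.1, 0.7])]
y = np.zeros((T, 6))
for j, idx in enumerate(clusters):
    noise = 0.3 * rng.standard_normal((T, len(idx)))
    y[:, idx] = np.outer(latent[:, j], loadings[j]) + noise

def first_principal_component(block):
    """Scores of the first principal component of a (T, d) data block."""
    centred = block - block.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[0]

def fit_var1(x):
    """Least-squares estimate of A in X_t = A X_{t-1} + eps_t."""
    coef, *_ = np.linalg.lstsq(x[:-1], x[1:], rcond=None)
    return coef.T

# PCA per cluster, then a joint VAR(1) fit to the estimated latent processes.
scores = np.column_stack([first_principal_component(y[:, idx]) for idx in clusters])
A_hat = fit_var1(scores)

corr = [abs(np.corrcoef(scores[:, j], latent[:, j])[0, 1]) for j in range(2)]
print(np.round(corr, 3))
print(np.round(A_hat, 2))
```

Principal component scores are only determined up to sign and scale, so `A_hat` matches `A` only up to a diagonal rescaling. The zero pattern, and hence the Granger-causal structure among the latent processes, is what this sketch aims to recover.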
Latent variable models
Example
[Figure: time series plots of X(1), X(2), X(3), X(4) and X(5) against time (0 to 1000)]
Latent variable models
Example
Set {1,2} with:
• {3,4}: S = −0.98
• {3,5}: S = −0.31
• {4,5}: S = −1.4
[Figure: three panels showing abs(Res[m, ]) against Index (0 to 1000)]
Latent variable models
Example
Set {1,3} with:
• {2,4}: S = −1.37
• {2,5}: S = 0.76
• {4,5}: S = −0.44
[Figure: three panels showing abs(Res[m, ]) against Index (0 to 1000)]
Latent variable models
Example
Set {1,4} with:
• {2,3}: S = −1.19
• {2,5}: S = 6.54
• {3,5}: S = 6.55
[Figure: three panels showing abs(Res[m, ]) against Index (0 to 1000)]
Latent variable models
Example
Set {1,5} with:
• {2,3}: S = −1.22
• {2,4}: S = 5.43
• {3,4}: S = 5.77
[Figure: three panels showing abs(Res[m, ]) against Index (0 to 1000)]
Latent variable models
Example
Set {2,3} with:
• {1,4}: S = −1.18
• {1,5}: S = −1.21
• {4,5}: S = −1.58
[Figure: three panels showing abs(Res[m, ]) against Index (0 to 1000)]
Latent variable models
Example
Set {2,4} with:
• {1,3}: S = −1.36
• {1,5}: S = 5.43
• {3,5}: S = 5.66
[Figure: three panels showing abs(Res[m, ]) against Index (0 to 1000)]
Latent variable models
Example
Set {2,5} with:
• {1,3}: S = 0.76
• {1,4}: S = 6.55
• {3,4}: S = 5.73
[Figure: three panels showing abs(Res[m, ]) against Index (0 to 1000)]
Latent variable models
Example
Set {3,4} with:
• {1,2}: S = −0.98
• {1,5}: S = 5.77
• {2,5}: S = 5.73
[Figure: three panels showing abs(Res[m, ]) against Index (0 to 1000)]
Latent variable models
Example
Set {3,5} with:
• {1,2}: S = −0.31
• {1,4}: S = 6.54
• {2,4}: S = 5.66
[Figure: three panels showing abs(Res[m, ]) against Index (0 to 1000)]
Latent variable models
Example
Set {4,5} with:
• {1,2}: S = −1.41
• {1,3}: S = −0.44
• {2,3}: S = −1.58
[Figure: three panels showing abs(Res[m, ]) against Index (0 to 1000)]
Latent variable models
Example:
[Figure: graph with vertices P, Q and vertices 1, 2, 3, 4, 5]
Latent variable models
Example:
[Figure: graph with vertices L1, L2, L3 and vertices 1, 2, 3, 4, 5, 6]
Conclusion
Causal inference is a complex task
• requires modelling at all levels (bivariate to fully multivariate)
• requires Granger causality as well as other measures (e.g. Sims causality)
• definite results may be sparse without further assumptions
• latent variables induce further (non-Markovian) constraints on the distribution
Open problems:
• merging of information about latent variables; development of algorithms for latent variables
• uncertainty in identification of Granger-causal relationships
• instantaneous causality
• aggregation over time (distortion)
• identification only possible up to Markov equivalence
• non-stationarity and non-linearity
References
• Eichler, M. (2007). Granger-causality and path diagrams for multivariate time series. Journal of Econometrics 137, 334-353.
• Eichler, M. (2008). Testing nonparametric and semiparametric hypotheses in vector stationary processes. Journal of Multivariate Analysis 99, 968-1009.
• Eichler, M. (2009). Causal inference from time series: what can be learned from Granger causality? In: C. Glymour, W. Wang, D. Westerståhl (eds), Proceedings of the 13th International Congress of Logic, Methodology and Philosophy of Science, College Publications, London.
• Eichler, M. (2010). Graphical modelling of multivariate time series with latent variables. Journal of Machine Learning Research W&CP 9.
• Eichler, M. (2012). Graphical modelling of multivariate time series. Probability Theory and Related Fields 153, 233-268.
• Eichler, M. (2012). Causal inference in time series analysis. In: C. Berzuini, A.P. Dawid, L. Bernardinelli (eds), Causality: Statistical Perspectives and Applications, Wiley, Chichester.
• Eichler, M. (2013). Causal inference with multiple time series: principles and problems. Philosophical Transactions of the Royal Society A 371, 20110613.