Exploratory Causal Analysis in Bivariate Time Series Data

Abstract

Many scientific disciplines rely on observational data of systems for which it is difficult (or impossible) to implement controlled experiments, and data analysis techniques are required for identifying causal information and relationships directly from observational data. This need has led to the development of many different time series causality approaches and tools, including transfer entropy, convergent cross-mapping (CCM), and Granger causality statistics. A practicing analyst can explore the literature to find many proposals for identifying drivers and causal connections in time series data sets, but little research exists on how these tools compare to each other in practice. This work introduces and defines exploratory causal analysis (ECA) to address this issue. The motivation is to provide a framework for exploring potential causal structures in time series data sets.
J. M. McCracken Defense talk for PhD in Physics, Department of Physics and Astronomy 10:00 AM November 20, 2015; Exploratory Hall, 3301 Advisor: Dr. Robert Weigel; Committee: Dr. Paul So, Dr. Tim Sauer
J. M. McCracken (GMU) ECA w/ time series causality November 20, 2015 1 / 50
J. M. McCracken Department of Physics and Astronomy George Mason University, Fairfax, VA
November 20, 2015
Outline
1. Motivation
2. Causality studies
3. Data causality
4. Exploratory causal analysis
5. Making an ECA summary
   - Transfer entropy difference
   - Granger causality statistic
   - Pairwise asymmetric inference
   - Weighted mean observed leaning
   - Lagged cross-correlation difference
6. Computational tools for the ECA summary
7. Empirical examples
   - Cooling/Heating System Data
   - Snowfall Data
8. Time series causality as data analysis
Motivation

Data: Consider two sets of time series measurements, X and Y.

Question: Is there evidence that X "drives" Y?

We were looking for a data analysis approach, i.e., we were looking for analysis tools that
- worked with time series data,
- had straightforward, preferably well-established, interpretations,
- were reliable,
- and did not require studying the (vast) philosophical causality literature.

Essentially, we were looking for a "plug-and-play" analysis tool. This work stems from our search for such a tool.
Causality studies
The study of causality is as old as science itself.
- Modern historians credit Aristotle with both the first theory of causality ("four causes") and an early version of the scientific method.
- The modern study of causality is broadly interdisciplinary; far too broad to review in a short talk.

Illari and Russo's textbook¹ provides an overview of causality studies.
¹ Illari, P., & Russo, F. (2014). Causality: Philosophical theory meets scientific practice. Oxford University Press.

Towards a taxonomy of causal studies

Paul Holland identified four types of causal questions²:
- the ultimate meaningfulness of the notion of causality
- the details of causal mechanisms
- the causes of a given effect
- the effects of a given cause

Foundational causality: "Is a cause required to precede an effect?" or "How are causes and effects related in space-time?"

Data causality: "Does smoking cause lung cancer?" or "Are traffic accidents caused by rain storms?"

² Holland, P. W. (1986). Statistics and causal inference. Journal of the American Statistical Association, 81(396), 945-960.
Data causality

Data causality is data analysis to draw causal inferences.

Approaches to data causality studies include
- design of experiments (e.g., Fisher randomization)
- potential outcomes (Rubin's counterfactuals)
- directed acyclic graphs (DAGs) with structural equation models (SEMs); popularized by Pearl as "structural causal models" (SCMs)
- time series causality

There is no consensus on the best approach to data causality; many authors consider their favored approach to be the exclusive correct approach.
Time series causality

Time series causality is data causality with time series data.

Approaches to time series causality can be roughly divided into five categories:
- Granger (model-based approaches)
- Information-theoretic
- State space reconstruction (SSR)
- Correlation
- Penchant
Exploratory causal analysis: Language

Exploring causal structures in data sets is distinct from confirming causal structures in data sets.

Causal language used in ECA should not be conflated with other typical uses; i.e., "cause", "effect", "drive", etc. are used as technical terms with definitions unrelated to their common, everyday definitions.

→ and ← will be used as shorthand for causal statements; e.g., "A drives B" will be written as A → B.
Exploratory causal analysis: Assumptions
A cause always precedes an effect. This assumption is required for the operational definitions of causality.
A driver may be present in the data being analyzed. This assumption may lead to issues of confounding.
Exploratory causal analysis: ECA summary vector approach

We will not favor a specific operational definition of causality ⇒ we do not favor any particular tool.

Consider a time series pair (X, Y).

ECA summary vector: Define a vector g where each element g_i is either 0 if X → Y, 1 if X ← Y, or 2 if no causal inference can be made. The value of each g_i comes from a specific time series causality tool.

ECA summary: The ECA summary is either X → Y, Y → X, or undefined, with g_i = 0 ∀ g_i ∈ g ⇒ X → Y and g_i = 1 ∀ g_i ∈ g ⇒ Y → X.
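The summary-vector logic above is simple enough to sketch directly. The following minimal Python sketch is illustrative only (the function name and the example inference values are not part of any ECA tooling): a summary is reported only when every tool agrees.

```python
def eca_summary(g):
    """Combine per-tool causal inferences (0: X -> Y, 1: Y -> X,
    2: no inference) into an ECA summary, which is defined only
    when all tools agree."""
    if all(gi == 0 for gi in g):
        return "X -> Y"
    if all(gi == 1 for gi in g):
        return "Y -> X"
    return "undefined"

# Hypothetical inference values from five time series causality tools
print(eca_summary([0, 0, 0, 0, 0]))  # X -> Y
print(eca_summary([0, 1, 2, 0, 0]))  # undefined (tools disagree)
```

Note that the summary is deliberately conservative: a single disagreeing tool makes the ECA summary undefined.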
Making an ECA summary

Our focus is on time series, so each causal inference g_i ∈ g will be drawn from one tool in each of the five time series causality categories:
- g_1: transfer entropy difference (information-theoretic)
- g_2: Granger log-likelihood statistic (Granger)
- g_3: pairwise asymmetric inference (PAI) (SSR)
- g_4: average weighted mean observed leaning (penchant)
- g_5: lagged cross-correlation difference (correlation)
Transfer entropy (g_1): Shannon entropy

The uncertainty that a random variable X takes some specific value X_n is given by the Shannon (or information) entropy,

H_X = − Σ_{n=1}^{N_X} P(X = X_n) log_2 P(X = X_n)

- P(X = X_n) is the probability that X takes the specific value X_n.
- The sum is over all possible values of X_n; n = 1, 2, ..., N_X.
- The base of the logarithm sets the entropy units, which is "bits" here.

Transfer entropy (g_1): Shannon entropy example

Binary example (to help with intuition): Consider a coin C that takes the value H with probability p_H and T with probability p_T. The Shannon entropy is

H_C = − (p_H log_2 p_H + p_T log_2 p_T)

- Completely uncertain of outcome: fair coin ⇒ p_H = p_T = 0.5 ⇒ H_C = 1.
- Completely certain of outcome: always heads (or tails) ⇒ one of p_H, p_T is 1 and the other is 0 ⇒ H_C = 0.

(Entropy calculations almost always assume 0 log_2 0 := 0.)
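The coin example can be checked numerically. A minimal sketch of the Shannon entropy in Python (the function name is illustrative), using the 0 log_2 0 := 0 convention by skipping zero-probability outcomes:

```python
import math

def shannon_entropy(probs):
    """H = sum over outcomes of -p * log2(p), in bits.
    Zero-probability outcomes are skipped (0 log2 0 := 0)."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

print(shannon_entropy([0.5, 0.5]))  # fair coin: 1.0 bit
print(shannon_entropy([1.0, 0.0]))  # always heads: 0.0 bits
```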
Transfer entropy (g_1): Mutual information

A pair of random variables (X, Y) has some mutual information given by

I_{X;Y} = H_X + H_Y − H_{X,Y}
        = Σ_{n=1}^{N_X} Σ_{m=1}^{N_Y} P(X = X_n, Y = Y_m) log_2 [ P(X = X_n, Y = Y_m) / ( P(X = X_n) P(Y = Y_m) ) ]

- P(X = X_n, Y = Y_m) is the probability that X takes the specific value X_n and Y takes the specific value Y_m.
- If X and Y are independent, then P(X = X_n, Y = Y_m) = P(X = X_n)P(Y = Y_m) ⇒ I_{X;Y} = 0.
- The mutual information is symmetric; i.e., I_{X;Y} = I_{Y;X}.

Schreiber proposed an extension of the mutual information to measure "information flow" by making it conditional and including assumptions about the temporal behavior of X and Y.
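The independence and symmetry properties above are easy to verify numerically. A minimal sketch of the mutual information computed from a joint probability table (function name and example tables are illustrative):

```python
import math

def mutual_information(joint):
    """I(X;Y) in bits from a joint probability table,
    joint[n][m] = P(X = X_n, Y = Y_m)."""
    px = [sum(row) for row in joint]            # marginal P(X = X_n)
    py = [sum(col) for col in zip(*joint)]      # marginal P(Y = Y_m)
    mi = 0.0
    for n, row in enumerate(joint):
        for m, pxy in enumerate(row):
            if pxy > 0:                         # 0 log2(...) := 0
                mi += pxy * math.log2(pxy / (px[n] * py[m]))
    return mi

# Independent binary variables: joint factorizes, so I = 0
print(mutual_information([[0.25, 0.25], [0.25, 0.25]]))  # 0.0
# Perfectly dependent binary variables: I = 1 bit
print(mutual_information([[0.5, 0.0], [0.0, 0.5]]))      # 1.0
```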
Transfer entropy (g_1): Information flow

Suppose X and Y are both Markov processes. The directed flow of information from Y to X is given by the transfer entropy,

T_{Y→X} = Σ_{n=1}^{N_X} Σ_{m=1}^{N_Y} p_{n+1,n,m} log_2 ( p_{n+1|n,m} / p_{n+1|n} )

with
- p_{n+1,n,m} = P(X(t+1) = X_{n+1}, X(t) = X_n, Y(τ) = Y_m)
- p_{n+1|n,m} = P(X(t+1) = X_{n+1} | X(t) = X_n, Y(τ) = Y_m)
- p_{n+1|n} = P(X(t+1) = X_{n+1} | X(t) = X_n)

There is no directed information flow from Y to X if X is conditionally independent of Y; i.e.,

p_{n+1|n,m} = p_{n+1|n} ⇒ T_{Y→X} = 0

Operational causality (information-theoretic): X causes Y if the directed information flow from X to Y is higher than the directed information flow from Y to X; i.e.,

T_{X→Y} − T_{Y→X} > 0 ⇒ X → Y
T_{X→Y} − T_{Y→X} < 0 ⇒ Y → X
T_{X→Y} − T_{Y→X} = 0 ⇒ no causal inference
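The operational definition above can be sketched for discrete series with plug-in (counted) probability estimates. This is a minimal sketch assuming one-step histories and τ = t; the function name and the synthetic example are illustrative, not the estimator used in the dissertation tooling:

```python
import math
import random
from collections import Counter

def transfer_entropy(x, y):
    """Plug-in estimate of T_{Y->X} (information flow from y into x)
    for discrete series, with one-step histories and tau = t."""
    triples = Counter()  # (x_{t+1}, x_t, y_t)
    pairs = Counter()    # (x_{t+1}, x_t)
    hist = Counter()     # x_t
    joint = Counter()    # (x_t, y_t)
    N = len(x) - 1
    for t in range(N):
        triples[(x[t + 1], x[t], y[t])] += 1
        pairs[(x[t + 1], x[t])] += 1
        hist[x[t]] += 1
        joint[(x[t], y[t])] += 1
    te = 0.0
    for (xn1, xn, ym), c in triples.items():
        p_triple = c / N                          # p(x_{t+1}, x_t, y_t)
        p_cond_xy = c / joint[(xn, ym)]           # p(x_{t+1} | x_t, y_t)
        p_cond_x = pairs[(xn1, xn)] / hist[xn]    # p(x_{t+1} | x_t)
        te += p_triple * math.log2(p_cond_xy / p_cond_x)
    return te

# Synthetic pair: y copies x with a one-step lag, so information flows X -> Y
random.seed(0)
x = [random.randint(0, 1) for _ in range(5000)]
y = [0] + x[:-1]
print(transfer_entropy(y, x) - transfer_entropy(x, y) > 0)  # True: net flow X -> Y
```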
Granger causality (g_2): Granger's axioms

Consider a discrete universe with two time series X = {X_t | t = 1, ..., n} and Y = {Y_t | t = 1, ..., n}, where t = n is considered the present time. All knowledge available in the universe at all times t ≤ n is denoted as Ω_n.

Axiom 1: The past and present may cause the future, but the future cannot cause the past.

Axiom 2: Ω_n contains no redundant information, so that if some variable Z is functionally related to one or more other variables in a deterministic fashion, then Z should be excluded from Ω_n.

Granger's definition of causality: Given some set A, Y causes X if

P(X_{n+1} ∈ A | Ω_n) ≠ P(X_{n+1} ∈ A | Ω_n − Y)

Granger's original goal was to make this notion of causality "operational".
Granger causality (g_2): VAR models

Consider a time series pair (X, Y). Suppose there is a vector autoregressive (VAR) model that describes the pair,

[ X_t ; Y_t ] = Σ_{i=1}^{n} [ A^i_{11}, A^i_{12} ; A^i_{21}, A^i_{22} ] [ X_{t−i} ; Y_{t−i} ] + [ ε_{1,t} ; ε_{2,t} ]

(";" separates matrix rows). The current time step t of X and Y is modeled as a sum of n past steps of X and Y, plus uncorrelated noise terms.
Granger causality (g2) Comparison of VAR models
Consider two different VAR models for the pair (X, Y),

\begin{pmatrix} X_t \\ Y_t \end{pmatrix} = \sum_{i=1}^{n} \begin{pmatrix} A_{xx,i} & A_{xy,i} \\ A_{yx,i} & A_{yy,i} \end{pmatrix} \begin{pmatrix} X_{t-i} \\ Y_{t-i} \end{pmatrix} + \begin{pmatrix} \varepsilon_{x,t} \\ \varepsilon_{y,t} \end{pmatrix}

\begin{pmatrix} X_t \\ Y_t \end{pmatrix} = \sum_{i=1}^{n} \begin{pmatrix} A'_{xx,i} & 0 \\ 0 & A'_{yy,i} \end{pmatrix} \begin{pmatrix} X_{t-i} \\ Y_{t-i} \end{pmatrix} + \begin{pmatrix} \varepsilon'_{x,t} \\ \varepsilon'_{y,t} \end{pmatrix}

The G-causality log-likelihood statistic is defined as

F_{Y \to X} = \ln \frac{|\Sigma'_{xx}|}{|\Sigma_{xx}|}

where \Sigma'_{xx} is the covariance of the X model residuals given no dependence on Y, and \Sigma_{xx} is the covariance of the X model residuals given a possible dependence on Y.
Granger causality (g2) G-causality log-likelihood statistic
If both VAR models fit (or “forecast”) the data equally well, then there is no G-causality; i.e.,

|\Sigma'_{xx}| = |\Sigma_{xx}| \Rightarrow F_{Y \to X} = 0

Operational causality (Granger)
X causes Y if the X-dependent forecast of Y decreases the Y model residual covariance (as compared to the X-independent forecast) more than the Y-dependent forecast of X decreases the X model residual covariance (as compared to the Y-independent forecast); i.e.,

F_{X→Y} − F_{Y→X} > 0 ⇒ X → Y
F_{X→Y} − F_{Y→X} < 0 ⇒ Y → X
F_{X→Y} − F_{Y→X} = 0 ⇒ no causal inference
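The G-causality comparison can be sketched directly with ordinary least squares: fit the full and restricted models, take residual variances (for scalar X and Y the determinant |Σ_xx| reduces to the residual variance), and compare the two log-likelihood statistics. The data-generating step below is hypothetical (Y drives X by construction).

```python
# A minimal sketch of the G-causality log-likelihood comparison above,
# using OLS residual variances for the full and restricted VAR fits.
import numpy as np

rng = np.random.default_rng(1)
T, n = 2000, 2  # series length and VAR order

# Hypothetical example where Y drives X but not vice versa
Y = rng.normal(size=T)
X = np.zeros(T)
for t in range(1, T):
    X[t] = 0.4 * X[t - 1] + 0.8 * Y[t - 1] + 0.1 * rng.normal()

def resid_var(target, predictors, n):
    """Residual variance of an OLS fit of target_t on n lags of each predictor."""
    rows = [np.concatenate([p[t - n:t][::-1] for p in predictors])
            for t in range(n, len(target))]
    Z = np.asarray(rows)
    b, *_ = np.linalg.lstsq(Z, target[n:], rcond=None)
    return (target[n:] - Z @ b).var()

# F_{Y->X}: does adding lags of Y improve the forecast of X (and vice versa)?
F_yx = np.log(resid_var(X, [X], n) / resid_var(X, [X, Y], n))
F_xy = np.log(resid_var(Y, [Y], n) / resid_var(Y, [Y, X], n))

# Decision rule from the slide: sign of F_{X->Y} - F_{Y->X}
print("Y -> X" if F_xy - F_yx < 0 else "X -> Y")
```

With these coefficients the restricted forecast of X is much worse than the full one, so F_{Y→X} is large while F_{X→Y} stays near zero, and the rule infers Y → X.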
Pairwise asymmetric inference (g3) State space reconstruction
Consider an embedding of the time series X = {x_t | t = 0, 1, ..., L} constructed from delayed time steps as

\tilde{X} = \{\tilde{x}_t \mid t = 1 + (E-1)\tau, \ldots, L\}

with \tilde{x}_t = (x_t, x_{t-\tau}, x_{t-2\tau}, \ldots, x_{t-(E-1)\tau})

τ is the delay time step
E is the embedding dimension
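The delay embedding above can be sketched in a few lines; the example series is hypothetical, used only to show the shape of the construction.

```python
# A minimal sketch of the delay embedding above: each embedded point collects
# E coordinates of X separated by tau time steps.
import numpy as np

def delay_embed(x, E, tau):
    """Return the E-dimensional, delay-tau embedding of a 1-D series x."""
    x = np.asarray(x)
    first = (E - 1) * tau          # first index with a full delayed history
    return np.array([x[t - np.arange(E) * tau] for t in range(first, len(x))])

x = np.sin(0.1 * np.arange(100))   # hypothetical example series
X_tilde = delay_embed(x, E=3, tau=2)
# each row of X_tilde is (x_t, x_{t-tau}, x_{t-2*tau})
```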
Pairwise asymmetric inference (g3) Cross-mapping
Consider a time series pair (X, Y). The shadow manifold of X (labeled X̃) is constructed from the points

\tilde{x}_t = (x_t, x_{t-\tau}, x_{t-2\tau}, \ldots, x_{t-(E-1)\tau}, y_t)

1. Find the n nearest neighbors to x̃_t (in X̃), where “nearest” means smallest Euclidean distance, d; i.e., d_1 < d_2 < ... < d_n
2. Create weights, w, from the nearest neighbors as

w_i = \frac{e^{-d_i/d_1}}{\sum_{j=1}^{n} e^{-d_j/d_1}}

3. Construct the cross-mapped estimate of Y using the weights as

Y|\tilde{X} = \left\{ Y_t|\tilde{X} = \sum_{i=1}^{n} w_i Y_{\hat{t}_i} \,\middle|\, t = 1 + (E-1)\tau, \ldots, L \right\}

Each cross-mapped point in the estimate of Y, i.e.,

Y_t|\tilde{X} = \sum_{i=1}^{n} \frac{e^{-d_i/d_1}}{\sum_{j=1}^{n} e^{-d_j/d_1}} Y_{\hat{t}_i}

depends on comparisons of the pasts of X and the presents of X and Y.
Pairwise asymmetric inference (g3) Cross-mapped correlation
A good cross-mapped estimate is defined as one that is strongly correlated with the original time series. The cross-mapped correlation is

C_{YX} = \left[\rho(Y, Y|\tilde{X})\right]^2

where ρ(·) is Pearson’s correlation coefficient.

Cross-mapping interpretation
If similar histories of X (i.e., nearest neighbors in the shadow manifold) capably estimate Y (i.e., lead to C_YX ≈ 1, or at least C_YX ≠ 0), then the presence (or action) of Y in the system has been recorded in X.
Pairwise asymmetric inference (g3) Cross-mapping interpretation of causality
A time series pair (X, Y) will have two cross-mapped correlations, C_YX and C_XY.

Operational causality (SSR)
X causes Y if similar histories of Y estimate X better than similar histories of X estimate Y, where the “similar histories” of one time series are used to estimate another time series through shadow manifold nearest neighbor weighting (cross-mapping); i.e.,

C_YX − C_XY < 0 ⇒ X → Y
C_YX − C_XY > 0 ⇒ Y → X
C_YX − C_XY = 0 ⇒ no causal inference
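The SSR decision rule above can be sketched end to end: compute both cross-mapped correlations and compare their difference. The coupled pair below is hypothetical (X is driven by Y by construction) and the embedding parameters are illustrative.

```python
# A minimal sketch of the SSR decision rule: build each shadow manifold,
# cross-map, square the Pearson correlation, and compare C_YX with C_XY.
import numpy as np

def cross_map_corr(x, y, E=3, tau=1, k=4):
    """C_{YX}: squared correlation of Y with its estimate from X's shadow manifold."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    first = (E - 1) * tau
    M = np.array([np.append(x[t - np.arange(E) * tau], y[t])
                  for t in range(first, len(x))])
    est = np.empty(len(M))
    for i, pt in enumerate(M):
        d = np.linalg.norm(M - pt, axis=1)
        d[i] = np.inf                      # exclude the point itself
        nbr = np.argsort(d)[:k]
        w = np.exp(-d[nbr] / d[nbr[0]])
        est[i] = (w / w.sum()) @ y[first + nbr]
    return np.corrcoef(y[first:], est)[0, 1] ** 2

rng = np.random.default_rng(3)
T = 400
Y = np.sin(0.15 * np.arange(T)) + 0.05 * rng.normal(size=T)
X = np.zeros(T)
for t in range(1, T):
    X[t] = 0.3 * X[t - 1] + 0.8 * Y[t - 1] + 0.05 * rng.normal()

C_yx, C_xy = cross_map_corr(X, Y), cross_map_corr(Y, X)
direction = "X -> Y" if C_yx - C_xy < 0 else "Y -> X"
```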
Weighted mean observed leaning (g4) Causal penchant
The causal penchant ρ_EC ∈ [−1, 1] is

ρ_EC = P(E|C) − P(E|C̄)

where P(E|C) is the probability of some effect E given some cause C.
The causal penchant ρ_EC ∈ [−1, 1] is