
Exploratory Causal Analysis in Bivariate Time Series Data

Abstract: Many scientific disciplines rely on observational data of systems for which it is difficult (or impossible) to implement controlled experiments, and data analysis techniques are required for identifying causal relationships directly from observational data. This need has led to the development of many different time series causality approaches and tools, including transfer entropy, convergent cross-mapping (CCM), and Granger causality statistics.

A practicing analyst can explore the literature to find many proposals for identifying drivers and causal connections in time series data sets, but little research exists on how these tools compare to each other in practice. This work introduces and defines exploratory causal analysis (ECA) to address this issue. The motivation is to provide a framework for exploring potential causal structures in time series data sets.

J. M. McCracken. Defense talk for the PhD in Physics, Department of Physics and Astronomy. 10:00 AM, November 20, 2015; Exploratory Hall, 3301. Advisor: Dr. Robert Weigel; Committee: Dr. Paul So, Dr. Tim Sauer.

Exploratory Causal Analysis in Bivariate Time Series Data

J. M. McCracken, Department of Physics and Astronomy, George Mason University, Fairfax, VA

November 20, 2015

Outline
1. Motivation
2. Causality studies
3. Data causality
4. Exploratory causal analysis
5. Making an ECA summary: transfer entropy difference, Granger causality statistic, pairwise asymmetric inference, weighted mean observed leaning, lagged cross-correlation difference
6. Computational tools for the ECA summary
7. Empirical examples: Cooling/Heating System Data, Snowfall Data
8. Time series causality as data analysis

Motivation

Data: Consider two sets of time series measurements, X and Y.

Question: Is there evidence that X “drives” Y?

We were looking for a data analysis approach, i.e., we were looking for analysis tools that
- worked with time series data,
- had straightforward, preferably well-established, interpretations,
- were reliable,
- and did not require studying the (vast) philosophical causality literature.
Essentially, we were looking for a “plug-and-play” analysis tool.

This work stems from our search for such a tool.

Causality studies

The study of causality is as old as science itself

- Modern historians credit Aristotle with both the first theory of causality (“four causes”) and an early version of the scientific method

- The modern study of causality is broadly interdisciplinary; far too broad to review in a short talk.

Illari and Russo’s textbook [1] provides an overview of causality studies.

[1] Illari, P., & Russo, F. (2014). Causality: Philosophical theory meets scientific practice. Oxford University Press.

Towards a taxonomy of causal studies

Paul Holland identified four types of causal questions [2]:

- the ultimate meaningfulness of the notion of causality
- the details of causal mechanisms
- the causes of a given effect
- the effects of a given cause

Foundational causality: “Is a cause required to precede an effect?” or “How are causes and effects related in space-time?”

Data causality: “Does smoking cause lung cancer?” or “Are traffic accidents caused by rain storms?”

[2] Holland, P. W. (1986). Statistics and causal inference. Journal of the American Statistical Association, 81(396), 945-960.
Data causality

Data causality is data analysis to draw causal inferences

Approaches to data causality studies include

- design of experiments (e.g., Fisher randomization)
- potential outcomes (Rubin’s counterfactuals)
- directed acyclic graphs (DAGs) with structural equation models (SEMs); popularized by Pearl as “structural causal models (SCMs)”
- time series causality

There is no consensus on the best approach to data causality. Many authors consider their favored approach to be the exclusive correct approach.

Time series causality

Time series causality is data causality with time series data

Approaches to time series causality can be roughly divided into five categories:
- Granger (model-based approaches)
- Information-theoretic
- State space reconstruction (SSR)
- Correlation
- Penchant

Exploratory causal analysis: Language

Exploring causal structures in data sets is distinct from confirming causal structures in data sets.

Causal language used in ECA should not be conflated with other typical uses; i.e., “cause”, “effect”, “drive”, etc. are used as technical terms with definitions unrelated to their common, everyday definitions.

→ and ← will be used as shorthand for causal statements, e.g., A drives B will be written as A → B.

Exploratory causal analysis: Assumptions

A cause always precedes an effect. This assumption is required for the operational definitions of causality.

A driver may be present in the data being analyzed. This assumption may lead to issues of confounding.

Exploratory causal analysis: ECA summary vector approach

We will not favor a specific operational definition of causality ⇒ we do not favor any particular tool

Consider a time series pair (X, Y).

ECA summary vector: Define a vector g where each element g_i is defined as either 0 if X → Y, 1 if X ← Y, or 2 if no causal inference can be made. The value of each g_i comes from a specific time series causality tool.

ECA summary: The ECA summary is either X → Y, Y → X, or undefined, with g_i = 0 ∀ g_i ∈ g ⇒ X → Y and g_i = 1 ∀ g_i ∈ g ⇒ Y → X.

Making an ECA summary

Our focus is on time series, so each causal inference g_i ∈ g will be drawn from a tool in each of the five time series causality categories.

- g1: transfer entropy difference (information-theoretic)
- g2: Granger log-likelihood statistic (Granger)
- g3: pairwise asymmetric inference (PAI) (SSR)
- g4: average weighted mean observed leaning (penchant)
- g5: lagged cross-correlation difference (correlation)
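For illustration, here is a minimal Python sketch (not from the original talk or toolchain) of how a summary vector is reduced to the ECA summary, assuming the g_i coding defined above (0 for X → Y, 1 for Y → X, 2 for no inference); the function name and example vectors are mine.

```python
def eca_summary(g):
    """Reduce an ECA summary vector to a single inference.

    g -- iterable of per-tool inferences: 0 means X -> Y,
         1 means Y -> X, 2 means no causal inference.
    The summary is defined only when every tool agrees.
    """
    g = list(g)
    if g and all(gi == 0 for gi in g):
        return "X -> Y"
    if g and all(gi == 1 for gi in g):
        return "Y -> X"
    return "undefined"

print(eca_summary([0, 0, 0, 0, 0]))  # all five tools agree: X -> Y
print(eca_summary([0, 1, 0, 2, 0]))  # mixed inferences: undefined
```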

Transfer entropy (g1): Shannon entropy

The uncertainty that a random variable X takes some specific value X_n is given by the Shannon (or information) entropy,

H_X = -\sum_{n=1}^{N_X} P(X = X_n) \log_2 P(X = X_n)

- P(X = X_n) is the probability that X takes the specific value X_n
- The sum is over all possible values of X_n; n = 1, 2, ..., N_X
- The base of the logarithm sets the entropy units, which is “bits” here

J. M. McCracken (GMU) ECA w/ time series causality November 20, 2015 13 / 50 completely uncertain of outcome Fair coin ⇒ pH = pT = 0.5 ⇒ HC = 1

completely certain of outcome Always heads (or tails) ⇒ pH(T ) = 0, pT (H) = 1 ⇒ HC = 0

(Entropy calculations almost always assume 0 log2 0 := 0.)

Transfer entropy (g1) Shannon entropy example

Binary example (to help with intuition)

Consider a coin C that take the value H with probability pH and T with probability pT . The Shannon entropy is

HC = − (pH log2 pH + pT log2 pT )

J. M. McCracken (GMU) ECA w/ time series causality November 20, 2015 14 / 50 completely certain of outcome Always heads (or tails) ⇒ pH(T ) = 0, pT (H) = 1 ⇒ HC = 0

(Entropy calculations almost always assume 0 log2 0 := 0.)

Transfer entropy (g1) Shannon entropy example

Binary example (to help with intuition)

Consider a coin C that take the value H with probability pH and T with probability pT . The Shannon entropy is

HC = − (pH log2 pH + pT log2 pT )

completely uncertain of outcome Fair coin ⇒ pH = pT = 0.5 ⇒ HC = 1

J. M. McCracken (GMU) ECA w/ time series causality November 20, 2015 14 / 50 (Entropy calculations almost always assume 0 log2 0 := 0.)

Transfer entropy (g1) Shannon entropy example

Binary example (to help with intuition)

Consider a coin C that take the value H with probability pH and T with probability pT . The Shannon entropy is

HC = − (pH log2 pH + pT log2 pT )

completely uncertain of outcome Fair coin ⇒ pH = pT = 0.5 ⇒ HC = 1

completely certain of outcome Always heads (or tails) ⇒ pH(T ) = 0, pT (H) = 1 ⇒ HC = 0

J. M. McCracken (GMU) ECA w/ time series causality November 20, 2015 14 / 50 Transfer entropy (g1) Shannon entropy example

Binary example (to help with intuition)

Consider a coin C that take the value H with probability pH and T with probability pT . The Shannon entropy is

HC = − (pH log2 pH + pT log2 pT )

completely uncertain of outcome Fair coin ⇒ pH = pT = 0.5 ⇒ HC = 1

completely certain of outcome Always heads (or tails) ⇒ pH(T ) = 0, pT (H) = 1 ⇒ HC = 0

(Entropy calculations almost always assume 0 log2 0 := 0.)
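A quick numerical check of the coin example, as a hedged sketch of my own (using the 0 log₂ 0 := 0 convention):

```python
import math

def shannon_entropy(probs):
    """Shannon entropy in bits, with the convention 0 * log2(0) := 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(shannon_entropy([0.5, 0.5]))  # fair coin: 1.0 bit (completely uncertain)
print(shannon_entropy([1.0, 0.0]))  # always heads: 0.0 bits (completely certain)
```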

Transfer entropy (g1): Mutual information

A pair of random variables (X, Y) have some mutual information given by

I_{X;Y} = H_X + H_Y - H_{X,Y} = \sum_{n=1}^{N_X} \sum_{m=1}^{N_Y} P(X = X_n, Y = Y_m) \log_2 \frac{P(X = X_n, Y = Y_m)}{P(X = X_n) P(Y = Y_m)}

- P(X = X_n, Y = Y_m) is the probability that X takes the specific value X_n and Y takes the specific value Y_m
- If X and Y are independent, then P(X = X_n, Y = Y_m) = P(X = X_n)P(Y = Y_m) ⇒ I_{X;Y} = 0
- The mutual information is symmetric; i.e., I_{X;Y} = I_{Y;X}

Schreiber proposed an extension of the mutual information to measure “information flow” by making it conditional and including assumptions about the temporal behavior of X and Y.
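For intuition, a simple plug-in estimate of the mutual information from frequency counts (an illustrative sketch of my own, not the estimator used later in the toolchain; the toy sequences are assumptions of the sketch):

```python
import math
from collections import Counter

def mutual_information(x, y):
    """Plug-in estimate of I(X;Y) in bits for two equal-length discrete sequences."""
    n = len(x)
    p_xy = Counter(zip(x, y))
    p_x, p_y = Counter(x), Counter(y)
    return sum((c / n) * math.log2((c / n) / ((p_x[a] / n) * (p_y[b] / n)))
               for (a, b), c in p_xy.items())

x = [0, 0, 1, 1, 0, 1, 0, 1]
y = [0, 0, 1, 1, 0, 1, 1, 0]  # mostly tracks x, so the estimate is positive
print(mutual_information(x, y))
```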

Transfer entropy (g1): Information flow

Suppose X and Y are both Markov processes. The directed flow of information from Y to X is given by the transfer entropy,

T_{Y \to X} = \sum_{n=1}^{N_X} \sum_{m=1}^{N_Y} p_{n+1,n,m} \log_2 \frac{p_{n+1|n,m}}{p_{n+1|n}}

with
- p_{n+1,n,m} = P(X(t+1) = X_{n+1}, X(t) = X_n, Y(τ) = Y_m)
- p_{n+1|n,m} = P(X(t+1) = X_{n+1} | X(t) = X_n, Y(τ) = Y_m)
- p_{n+1|n} = P(X(t+1) = X_{n+1} | X(t) = X_n)

Transfer entropy (g1): Information flow

There is no directed information flow from Y to X if X is conditionally independent of Y; i.e.,

p_{n+1|n,m} = p_{n+1|n} ⇒ T_{Y→X} = 0

Operational causality (information-theoretic): X causes Y if the directed information flow from X to Y is higher than the directed information flow from Y to X; i.e.,

T_{X→Y} − T_{Y→X} > 0 ⇒ X → Y
T_{X→Y} − T_{Y→X} < 0 ⇒ Y → X
T_{X→Y} − T_{Y→X} = 0 ⇒ no causal inference
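The sketch below is a rough, single-step plug-in estimate of the transfer entropy difference for discrete-valued series (an illustration of my own, assuming first-order histories and Y(τ) taken at the same time step as X(t); it is not the JIDT estimator used later, and the toy delayed-copy system is an assumption of the sketch):

```python
import math
import random
from collections import Counter

def transfer_entropy(src, dst):
    """Plug-in estimate of T_{src -> dst} in bits using single-step histories."""
    triples = list(zip(dst[1:], dst[:-1], src[:-1]))  # (dst_{t+1}, dst_t, src_t)
    n = len(triples)
    p_xyz = Counter(triples)
    p_yz = Counter((y, z) for _, y, z in triples)
    p_xy = Counter((x, y) for x, y, _ in triples)
    p_y = Counter(y for _, y, _ in triples)
    te = 0.0
    for (x, y, z), c in p_xyz.items():
        p_joint = c / n
        p_full = c / p_yz[(y, z)]          # p(dst_{t+1} | dst_t, src_t)
        p_self = p_xy[(x, y)] / p_y[y]     # p(dst_{t+1} | dst_t)
        te += p_joint * math.log2(p_full / p_self)
    return te

random.seed(0)
x = [random.randint(0, 1) for _ in range(2000)]
y = [0] + x[:-1]  # Y is a one-step delayed copy of X, so information flows X -> Y
print(transfer_entropy(x, y) - transfer_entropy(y, x))  # expected to be positive
```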

Granger causality (g2): Granger's axioms

Consider a discrete universe with two time series X = {X_t | t = 1, ..., n} and Y = {Y_t | t = 1, ..., n}, where t = n is considered the present time. All knowledge available in the universe at all times t ≤ n is denoted as Ω_n.

Axiom 1: The past and present may cause the future, but the future cannot cause the past.

Axiom 2: Ω_n contains no redundant information, so that if some variable Z is functionally related to one or more other variables, in a deterministic fashion, then Z should be excluded from Ω_n.

Granger's definition of causality: Given some set A, Y causes X if

P(X_{n+1} ∈ A | Ω_n) ≠ P(X_{n+1} ∈ A | Ω_n − Y)

Granger's original goal was to make this notion of causality “operational”.

Granger causality (g2): VAR models

Consider a time series pair (X, Y). Suppose there is a vector autoregressive (VAR) model that describes the pair,

\begin{pmatrix} X_t \\ Y_t \end{pmatrix} = \sum_{i=1}^{n} \begin{pmatrix} A_{11}^{i} & A_{12}^{i} \\ A_{21}^{i} & A_{22}^{i} \end{pmatrix} \begin{pmatrix} X_{t-i} \\ Y_{t-i} \end{pmatrix} + \begin{pmatrix} \varepsilon_{1,t} \\ \varepsilon_{2,t} \end{pmatrix}

The current time step t of X and Y is modeled as a sum of n past steps of X and Y, plus uncorrelated noise terms.

Granger causality (g2): Comparison of VAR models

Consider two different VAR models for the pair (X, Y),

\begin{pmatrix} X_t \\ Y_t \end{pmatrix} = \sum_{i=1}^{n} \begin{pmatrix} A_{xx,i} & A_{xy,i} \\ A_{yx,i} & A_{yy,i} \end{pmatrix} \begin{pmatrix} X_{t-i} \\ Y_{t-i} \end{pmatrix} + \begin{pmatrix} \varepsilon_{x,t} \\ \varepsilon_{y,t} \end{pmatrix}

\begin{pmatrix} X_t \\ Y_t \end{pmatrix} = \sum_{i=1}^{n} \begin{pmatrix} A'_{xx,i} & 0 \\ 0 & A'_{yy,i} \end{pmatrix} \begin{pmatrix} X_{t-i} \\ Y_{t-i} \end{pmatrix} + \begin{pmatrix} \varepsilon'_{x,t} \\ \varepsilon'_{y,t} \end{pmatrix}

The G-causality log-likelihood statistic is defined as

F_{Y \to X} = \ln \frac{|\Sigma'_{xx}|}{|\Sigma_{xx}|}

- |\Sigma'_{xx}| is the covariance of the X model residuals given no dependence on Y
- |\Sigma_{xx}| is the covariance of the X model residuals given a possible dependence on Y

Granger causality (g2): G-causality log-likelihood statistic

If both VAR models fit (or “forecast”) the data equally well, then there is no G-causality; i.e.,

|\Sigma'_{xx}| = |\Sigma_{xx}| ⇒ F_{Y→X} = 0

Operational causality (Granger): X causes Y if the X-dependent forecast of Y decreases the Y model residual covariance (as compared to the X-independent forecast) more than the Y-dependent forecast of X decreases the X model residual covariance (as compared to the Y-independent forecast); i.e.,

F_{X→Y} − F_{Y→X} > 0 ⇒ X → Y
F_{X→Y} − F_{Y→X} < 0 ⇒ Y → X
F_{X→Y} − F_{Y→X} = 0 ⇒ no causal inference
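A minimal NumPy sketch of this statistic (an illustration of my own, not the MVGC toolbox used later): both VAR models are fit by ordinary least squares at a fixed order, and F is the log ratio of the scalar residual variances. The function names, model order, and toy system are assumptions of the sketch.

```python
import numpy as np

def residual_variance(target, predictors, order):
    """OLS residual variance of `target` regressed on `order` lags of each predictor."""
    T = len(target)
    rows = [[series[t - k] for series in predictors for k in range(1, order + 1)]
            for t in range(order, T)]
    A = np.column_stack([np.ones(T - order), np.asarray(rows, float)])
    b = np.asarray(target[order:], float)
    coeffs, *_ = np.linalg.lstsq(A, b, rcond=None)
    return np.var(b - A @ coeffs)

def granger_loglik(x, y, order=2):
    """F_{Y -> X} = ln(restricted residual variance / full residual variance)."""
    restricted = residual_variance(x, [x], order)  # X regressed on its own past only
    full = residual_variance(x, [x, y], order)     # X regressed on the past of X and Y
    return float(np.log(restricted / full))

rng = np.random.default_rng(1)
y = rng.standard_normal(600)
x = 0.8 * np.roll(y, 1) + 0.1 * rng.standard_normal(600)  # X is driven by the past of Y
diff = granger_loglik(y, x) - granger_loglik(x, y)        # F_{X->Y} - F_{Y->X}
print(diff)  # expected to be negative here, i.e. the inference is Y -> X
```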

Pairwise asymmetric inference (g3): State space reconstruction

Consider an embedding of the time series X = {x_t | t = 0, 1, ..., L} constructed from delayed time steps as

X̃ = {x̃_t | t = 1 + (E − 1)τ, ..., L}

with

x̃_t = ( x_t, x_{t−τ}, x_{t−2τ}, ..., x_{t−(E−1)τ} )

- τ is the delay time step
- E is the embedding dimension

Pairwise asymmetric inference (g3): Cross-mapping

Consider a time series pair (X, Y). The shadow manifold of X (labeled X̃) is constructed from the points

x̃_t = ( x_t, x_{t−τ}, x_{t−2τ}, ..., x_{t−(E−1)τ}, y_t )

1. Find the n nearest neighbors to x̃_t (in X̃), where “nearest” means smallest Euclidean distance, d; i.e., d_1 < d_2 < ... < d_n
2. Create weights, w_i, from the nearest neighbors as

w_i = \frac{e^{-d_i/d_1}}{\sum_{j=1}^{n} e^{-d_j/d_1}}

3. Construct the cross-mapped estimate of Y using the weights as

Y|X̃ = \left\{ Y_t|X̃ = \sum_{i=1}^{n} w_i Y_{\hat{t}_i} \;\middle|\; t = 1 + (E − 1)τ, ..., L \right\}

Each cross-mapped point in the estimate of Y, i.e.,

Y_t|X̃ = \sum_{i=1}^{n} \frac{e^{-d_i/d_1}}{\sum_{j=1}^{n} e^{-d_j/d_1}} Y_{\hat{t}_i}

depends on comparisons of the pasts of X and the presents of X and Y.

J. M. McCracken (GMU) ECA w/ time series causality November 20, 2015 23 / 50 Cross-mapping interpretation If similar histories of X (i.e., nearest neighbors in the shadow manifold) capably estimate Y (i.e., lead to CYX ≈ 1, or at least CYX 6= 0), then the presence (or action) of Y in the system has been recorded in X.

Pairwise asymmetric inference (g3) Cross-mapped correlation

A good cross-mapped estimate is defined as one that is strongly correlated with the original times series. The cross-mapped correlation is

h i2 CYX = ρ(Y, Y|X˜)

where ρ (·) is Pearson’s correlation coefficient.

Pairwise asymmetric inference (g3): Cross-mapped correlation

A good cross-mapped estimate is defined as one that is strongly correlated with the original time series. The cross-mapped correlation is

C_{YX} = \left[ ρ(Y, Y|X̃) \right]^2

where ρ(·) is Pearson's correlation coefficient.

Cross-mapping interpretation: If similar histories of X (i.e., nearest neighbors in the shadow manifold) capably estimate Y (i.e., lead to C_{YX} ≈ 1, or at least C_{YX} ≠ 0), then the presence (or action) of Y in the system has been recorded in X.

Pairwise asymmetric inference (g3): Cross-mapping interpretation of causality

A time series pair (X, Y) will have two cross-mapped correlations, C_{YX} and C_{XY}.

Operational causality (SSR): X causes Y if similar histories of Y estimate X better than similar histories of X estimate Y, where the “similar histories” of one time series are used to estimate another time series through shadow manifold nearest neighbor weighting (cross-mapping); i.e.,

C_{YX} − C_{XY} < 0 ⇒ X → Y
C_{YX} − C_{XY} > 0 ⇒ Y → X
C_{YX} − C_{XY} = 0 ⇒ no causal inference
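The following NumPy sketch illustrates the cross-mapping recipe above in a simplified form (an illustration of my own, not the thesis implementation): it builds the PAI shadow manifold of X with y_t appended, weights E + 2 nearest neighbors exponentially, and compares the two cross-mapped correlations. The function names, parameter choices, and coupled logistic-map toy system are assumptions of the sketch.

```python
import numpy as np

def pai_shadow_manifold(x, y, E, tau):
    """PAI-style shadow manifold of X: delay vectors of x with y_t appended."""
    start = (E - 1) * tau
    times = np.arange(start, len(x))
    points = np.array([np.concatenate([x[t - np.arange(E) * tau], [y[t]]]) for t in times])
    return points, times

def cross_map_correlation(x, y, E=3, tau=1, n_neighbors=5):
    """C_{YX}: squared Pearson correlation between Y and its cross-mapped estimate Y|X~."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    M, times = pai_shadow_manifold(x, y, E, tau)
    estimates = []
    for i, p in enumerate(M):
        d = np.linalg.norm(M - p, axis=1)
        d[i] = np.inf                            # do not use the point as its own neighbor
        nn = np.argsort(d)[:n_neighbors]
        w = np.exp(-d[nn] / max(d[nn][0], 1e-12))  # e^{-d_i/d_1}
        w /= w.sum()
        estimates.append(np.dot(w, y[times[nn]]))
    return np.corrcoef(y[times], estimates)[0, 1] ** 2

# Toy system (an assumption of this sketch): a chaotic logistic map X weakly driving Y.
x = np.empty(400); y = np.empty(400)
x[0], y[0] = 0.4, 0.2
for t in range(399):
    x[t + 1] = 3.8 * x[t] * (1.0 - x[t])
    y[t + 1] = 3.5 * y[t] * (1.0 - y[t]) + 0.1 * x[t] * y[t]
diff = cross_map_correlation(x, y) - cross_map_correlation(y, x)  # C_YX - C_XY
print(diff)  # under the rule above, a negative value points to X -> Y, positive to Y -> X
```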

Weighted mean observed leaning (g4): Causal penchant

The causal penchant ρ_EC ∈ [−1, 1] is

ρ_EC = P(E|C) − P(E|C̄)

- P(E|C) is the probability of some effect E given some cause C
- P(E|C̄) is the probability of some effect E given no cause C
- So, the penchant is the probability of an effect E given a cause C minus the probability of that effect without the cause
- In the psychology/medical literature, the causal penchant is known as the Eells measure of causal strength or probability contrast
- If C drives E, then it is expected that ρ_EC > 0

The second term, P(E|C̄), can be eliminated from the penchant formula using Bayes' theorem,

ρ_{EC} = P(E|C)\left[1 + \frac{P(C)}{1 - P(C)}\right] - \frac{P(E)}{1 - P(C)}

If E and C are independent, then P(E|C) = P(E), which implies

ρ_{EC} = P(E) + \frac{P(E)P(C) - P(E)}{1 - P(C)} = P(E) - P(E) = 0

Example (to help with intuition): Consider C and E to be two fair coins, c_1 and c_2, being “heads”; i.e., P(c_1 = “heads”) = 0.5 and P(c_2 = “heads”) = 0.5. If the coins are independent, then

P(c_2 = “heads” | c_1 = “heads”) = P(c_2 = “heads”) = 0.5 ⇒ ρ_EC = 0

If they are completely dependent, then

P(c_2 = “heads” | c_1 = “heads”) = 1 or 0 ⇒ ρ_EC = 1 or −1

This formula has the additional benefit of only needing to estimate one conditional probability from the data.
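A small numerical check of the Bayes form, as a sketch of my own illustrating the two-coin example (the function name is mine):

```python
def penchant(p_e_given_c, p_c, p_e):
    """Causal penchant via the Bayes form: P(E|C)[1 + P(C)/(1-P(C))] - P(E)/(1-P(C))."""
    return p_e_given_c * (1 + p_c / (1 - p_c)) - p_e / (1 - p_c)

# Independent fair coins: P(E|C) = P(E) = 0.5  ->  penchant 0
print(penchant(0.5, 0.5, 0.5))   # 0.0
# Completely dependent coins: P(E|C) = 1 (or 0)  ->  penchant +1 (or -1)
print(penchant(1.0, 0.5, 0.5))   # 1.0
print(penchant(0.0, 0.5, 0.5))   # -1.0
```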

Weighted mean observed leaning (g4): Causal leaning

A difference of penchants can be used to compare different cause-effect assignments (i.e., different assumptions of what should be considered a cause and what should be considered an effect). The leaning is

λ_EC = ρ_EC − ρ_CE

Leaning interpretation: If λ_EC > 0, then C drives E more than E drives C.

J. M. McCracken (GMU) ECA w/ time series causality November 20, 2015 27 / 50 1. Operational definitions of C and E (called the cause-effect assignment) 2. Estimations of P(C), P(E), P(C|E), and P(E|C) from the data

The primary cause-effect assignment will be the l-standard assignment, l-standard assignment Consider a time series pair (X, Y). The l-standard assignment initially assumes the cause is the l lagged time step of X and the effect is the current time step of Y; i.e., {C, E} = {xt−l , yt }.

Probabilities will estimated using data frequency counts.

Weighted mean observed leaning (g4) Usefulness of the leaning

The usefulness of the leaning depends on two things,

J. M. McCracken (GMU) ECA w/ time series causality November 20, 2015 28 / 50 2. Estimations of P(C), P(E), P(C|E), and P(E|C) from the data

The primary cause-effect assignment will be the l-standard assignment, l-standard assignment Consider a time series pair (X, Y). The l-standard assignment initially assumes the cause is the l lagged time step of X and the effect is the current time step of Y; i.e., {C, E} = {xt−l , yt }.

Probabilities will estimated using data frequency counts.

Weighted mean observed leaning (g4) Usefulness of the leaning

The usefulness of the leaning depends on two things, 1. Operational definitions of C and E (called the cause-effect assignment)

J. M. McCracken (GMU) ECA w/ time series causality November 20, 2015 28 / 50 The primary cause-effect assignment will be the l-standard assignment, l-standard assignment Consider a time series pair (X, Y). The l-standard assignment initially assumes the cause is the l lagged time step of X and the effect is the current time step of Y; i.e., {C, E} = {xt−l , yt }.

Probabilities will estimated using data frequency counts.

Weighted mean observed leaning (g4) Usefulness of the leaning

The usefulness of the leaning depends on two things, 1. Operational definitions of C and E (called the cause-effect assignment) 2. Estimations of P(C), P(E), P(C|E), and P(E|C) from the data

J. M. McCracken (GMU) ECA w/ time series causality November 20, 2015 28 / 50 Probabilities will estimated using data frequency counts.

Weighted mean observed leaning (g4) Usefulness of the leaning

The usefulness of the leaning depends on two things, 1. Operational definitions of C and E (called the cause-effect assignment) 2. Estimations of P(C), P(E), P(C|E), and P(E|C) from the data

The primary cause-effect assignment will be the l-standard assignment, l-standard assignment Consider a time series pair (X, Y). The l-standard assignment initially assumes the cause is the l lagged time step of X and the effect is the current time step of Y; i.e., {C, E} = {xt−l , yt }.

J. M. McCracken (GMU) ECA w/ time series causality November 20, 2015 28 / 50 Weighted mean observed leaning (g4) Usefulness of the leaning

The usefulness of the leaning depends on two things, 1. Operational definitions of C and E (called the cause-effect assignment) 2. Estimations of P(C), P(E), P(C|E), and P(E|C) from the data

The primary cause-effect assignment will be the l-standard assignment, l-standard assignment Consider a time series pair (X, Y). The l-standard assignment initially assumes the cause is the l lagged time step of X and the effect is the current time step of Y; i.e., {C, E} = {xt−l , yt }.

Probabilities will estimated using data frequency counts.

Weighted mean observed leaning (g4): Leaning from the data

The cause-effect assignment must be specific if the probabilities are to be estimated with frequency counts, and it needs to include tolerance domains to account for noise in the measurements.

Consider the time series pair (X, Y). The penchant calculation depends on the conditional P(y_t = a | x_{t−l} = b), where a ∈ Y and b ∈ X. This conditional will be estimated as

P(y_t ∈ [a − δ_y^L, a + δ_y^R] \,|\, x_{t−l} ∈ [b − δ_x^L, b + δ_x^R]) = \frac{n_{a∩b}}{n_b}

- n_{a∩b} is the number of times y_t ∈ [a − δ_y^L, a + δ_y^R] and x_{t−l} ∈ [b − δ_x^L, b + δ_x^R] in (X, Y)
- n_b is the number of times x_{t−l} ∈ [b − δ_x^L, b + δ_x^R] in X
- The tolerance domains are usually considered symmetric; i.e., δ_x^L = δ_x^R and δ_y^L = δ_y^R

The causal inference implied by the leaning calculations is dependent on both the cause-effect assignment and the tolerance domains.
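A literal frequency-count version of this conditional estimate, as a sketch of my own (symmetric tolerance domains and a hypothetical delayed-copy pair are assumptions of the example):

```python
def conditional_estimate(x, y, a, b, lag=1, delta_x=0.5, delta_y=0.5):
    """Estimate P(y_t in a +/- delta_y | x_{t-lag} in b +/- delta_x) as n_{a&b} / n_b."""
    n_b = 0
    n_ab = 0
    for t in range(lag, len(y)):
        if abs(x[t - lag] - b) <= delta_x:
            n_b += 1
            if abs(y[t] - a) <= delta_y:
                n_ab += 1
    return n_ab / n_b if n_b else float("nan")

# Toy example: y copies x with a one-step delay.
x = [0, 1, 2, 1, 0, 2, 2, 1, 0, 2]
y = [0] + x[:-1]
print(conditional_estimate(x, y, a=2, b=2))  # 1.0
print(conditional_estimate(x, y, a=0, b=2))  # 0.0
```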

Weighted mean observed leaning (g4): Weighted mean

Any time series pair (X, Y) will have many leanings; e.g., an l-standard assignment of {C, E} = {x_{t−l} = b ± δ_x, y_t = a ± δ_y} will have a different leaning calculation for each x_{t−l} ∈ [b − δ_x, b + δ_x] and y_t ∈ [a − δ_y, a + δ_y].

Consider a time series pair (X, Y) and some cause-effect assignment {C, E} for which reasonable tolerance domains have been defined. Any penchant calculation for which the (estimated) conditional P(E|C) ≠ 0 (or P(C|E) ≠ 0) is called an observed penchant.

The weighted mean observed penchant, ⟨ρ_EC⟩_w, is the weighted algebraic mean of the observed penchants.

The weighted mean observed leaning, ⟨λ_EC⟩_w, is the difference of the weighted mean observed penchants; i.e., ⟨λ_EC⟩_w = ⟨ρ_EC⟩_w − ⟨ρ_CE⟩_w.

Weighted mean observed leaning (g4): Causal inference

Operational causality (penchant): X causes Y if the weighted mean observed leaning is positive given a cause-effect assignment (and reasonable tolerance domains) in which the assumed cause X precedes the assumed effect Y; i.e.,

⟨λ_EC⟩_w > 0 ⇒ X → Y
⟨λ_EC⟩_w < 0 ⇒ Y → X
⟨λ_EC⟩_w = 0 ⇒ no causal inference

given C ∈ X, E ∈ Y, and C precedes E.

Lagged cross-correlation difference (g5): Cross-correlation

The cross-correlation between two time series X and Y is

ρ^{xy} = \frac{E[(x_t - \mu_X)(y_t - \mu_Y)]}{\sqrt{\sigma_X^2 \sigma_Y^2}}

- Every point in X is compared to the mean of X
- Every point in Y is compared to the mean of Y
- The product of the individual variances of X and Y is used as a normalization

Example (to help with intuition):

X = Y ⇒ ρ^{xy} = \frac{E[(x_t - \mu_X)(y_t - \mu_Y)]}{\sqrt{\sigma_X^2 \sigma_Y^2}} = \frac{E[(x_t - \mu_X)^2]}{\sigma_X^2} = \frac{\sigma_X^2}{\sigma_X^2} = 1

J. M. McCracken (GMU) ECA w/ time series causality November 20, 2015 33 / 50 Lagged cross-correlation difference (g5) Lagged cross-correlation Consider a time series pair (X, Y). The past of Y may be compared to the present of X by introducing a lag l into the cross-correlation calculation,

xy E [(xt − µX )(yt−l − µY )] ρl = q 2 2 σX σY

Operational causality (correlation) X causes Y (at lag l) if the past of X (i.e., X lagged by l time steps) is more strongly correlated with the present of Y than the past of Y (i.e., Y lagged by l time steps) is with the present of X; i.e.,

xy yx |ρl | − |ρl | < 0 ⇒ X → Y xy yx |ρl | − |ρl | > 0 ⇒ Y → X xy yx |ρl | − |ρl | = 0 ⇒ no causal inference
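A minimal sketch of the lagged statistic and the g5 inference rule (assuming full-series means and variances in the normalization, and averaging the per-lag differences over the tested lags, as done later in the talk; illustrative only):

```python
import numpy as np

def lagged_cross_correlation(x, y, l):
    """rho_l^xy: correlation of the present of X with the past of Y (Y lagged by l)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xt, ytl = x[l:], y[:len(y) - l]          # pair x_t with y_{t-l}
    return np.mean((xt - x.mean()) * (ytl - y.mean())) / np.sqrt(x.var() * y.var())

def g5_inference(x, y, lags):
    """Lagged cross-correlation difference, averaged over the tested lags."""
    diffs = [abs(lagged_cross_correlation(x, y, l)) -
             abs(lagged_cross_correlation(y, x, l)) for l in lags]
    d = np.mean(diffs)
    if d < 0:
        return d, "X -> Y"
    if d > 0:
        return d, "Y -> X"
    return d, "no causal inference"
```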

J. M. McCracken (GMU) ECA w/ time series causality November 20, 2015 33 / 50 Computational tools for the ECA summary

Open source packages are available for some of the mentioned time series causality tools, and others required code to be developed from scratch.

g1 Java Information Dynamics Toolkit (JIDT)
g2 Multivariate Granger Causality (MVGC) MATLAB toolbox
g3 (C++)
g4 (MATLAB)
g5 (MATLAB)

All the code is available at https://github.com/jmmccracken

J. M. McCracken (GMU) ECA w/ time series causality November 20, 2015 34 / 50 Cooling/Heating System Data Time series data

Consider a time series pair (X, Y) where X are indoor temperature measurements (in degrees Celsius) in a house with “experimental” environmental controls and Y is the temperature outside of that house, measured at the same time intervals (168 measurements in each series)³.

[Figure: time series plots of X (indoor temperature) and Y (outdoor temperature) over the 168 measurement times t.]

The intuitive causal inference is Y → X.

³ This data was originally presented at a time series conference. The abstract is available here, http://www.osti.gov/scitech/biblio/5231321 . The data is also available as part of the UCI Machine Learning Repository.

J. M. McCracken (GMU) ECA w/ time series causality November 20, 2015 35 / 50


Cooling/Heating System Data ECA summary preliminaries

An ECA summary requires that several parameters be set from the data, including

I embedding dimension and time delay for g3 (PAI)

I cause-effect assignment and tolerance domains for g4 (leaning)

I lags for g5 (cross-correlation)

The embedding dimension will be set (somewhat arbitrarily) to E = 10 and the time delay will be τ = 1. The tolerance domains will be the f-width tolerance domains; i.e., ±δ_x = f(max(X) − min(X)) and ±δ_y = f(max(Y) − min(Y)). For this example, f = 1/4. The cause-effect assignment will be the l-standard assignment, but there is still the problem of determining relevant lags l.
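A minimal sketch of the f-width tolerance domain (assuming a plain numeric array; illustrative only):

```python
import numpy as np

def f_width_tolerance(series, f=0.25):
    """f-width tolerance domain: +/- delta = f * (max(series) - min(series))."""
    series = np.asarray(series, dtype=float)
    return f * (series.max() - series.min())

# For this example, f = 1/4 would give delta_x = f_width_tolerance(X, 0.25)
# and delta_y = f_width_tolerance(Y, 0.25), where X and Y are the two series.
```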

J. M. McCracken (GMU) ECA w/ time series causality November 20, 2015 36 / 50 Cooling/Heating System Data Autocorrelations

There are autocorrelations in both time series (only 50 lags are shown).

[Figure: autocorrelation magnitudes |r(x_{t−l}, x_t)|² (for X) and |r(y_{t−l}, y_t)|² (for Y) plotted against lag l.]

The autocorrelations appear cyclic and initially drop to zero around l = 7 for both time series. This observation will be used to justify using lags of l = 1, 2, ..., 7 for both g4 (leaning) and g5 (cross-correlation).
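A minimal sketch of this lag-selection heuristic (the sample autocorrelation estimator and the near-zero threshold below are assumptions, not the talk's exact procedure):

```python
import numpy as np

def autocorrelation(x, l):
    """Sample autocorrelation of x at lag l."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return np.dot(x[l:], x[:len(x) - l]) / np.dot(x, x)

def first_near_zero_lag(x, max_lag=50, threshold=0.05):
    """Smallest lag at which |autocorrelation| first drops near zero."""
    for l in range(1, max_lag + 1):
        if abs(autocorrelation(x, l)) < threshold:
            return l
    return max_lag

# Lags l = 1, ..., min(first_near_zero_lag(X), first_near_zero_lag(Y)) would
# then be tested for g4 and g5, mirroring the l = 1, ..., 7 choice above.
```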

J. M. McCracken (GMU) ECA w/ time series causality November 20, 2015 37 / 50 Cooling/Heating System Data Lagged cross-correlations and leanings

The lagged cross-correlations and leaning (using the l-standard assignment) can be plotted for each tested lag.

[Figure: ⟨λ_l⟩ and ∆_l plotted for the tested lags l = 1, ..., 7.]

There are 7 different causal inferences in this plot, all of which agree except l = 7. A single causal inference (for each tool) will be found with the algebraic mean across all the tested lags.

There are 7 different causal inferences in this plot, all of which agree except l = 7. A single causal inference (for each tool) will be found with the algebraic mean across all the tested lags. J. M. McCracken (GMU) ECA w/ time series causality November 20, 2015 38 / 50 TX →Y − TY →X = −0.14 ⇒ Y → X ⇒ FX →Y − FY →X = −0.35 ⇒ Y → X ⇒ −4 CYX − CXY = 3.1 × 10 ⇒ Y → X ⇒ hhλEC iw i = −0.20 ⇒ Y → X ⇒ xy yx h|ρl | − |ρl |i = 0.40 ⇒ Y → X ⇒

Cooling/Heating System Data Making an ECA summary

Each of the five time series tools leads to a causal inference in the ECA summary vector,

J. M. McCracken (GMU) ECA w/ time series causality November 20, 2015 39 / 50 FX →Y − FY →X = −0.35 ⇒ Y → X ⇒ −4 CYX − CXY = 3.1 × 10 ⇒ Y → X ⇒ hhλEC iw i = −0.20 ⇒ Y → X ⇒ xy yx h|ρl | − |ρl |i = 0.40 ⇒ Y → X ⇒

Cooling/Heating System Data Making an ECA summary

Each of the five time series tools leads to a causal inference in the ECA summary vector,

TX →Y − TY →X = −0.14 ⇒ Y → X ⇒ g1 = 1

J. M. McCracken (GMU) ECA w/ time series causality November 20, 2015 39 / 50 −4 CYX − CXY = 3.1 × 10 ⇒ Y → X ⇒ hhλEC iw i = −0.20 ⇒ Y → X ⇒ xy yx h|ρl | − |ρl |i = 0.40 ⇒ Y → X ⇒

Cooling/Heating System Data Making an ECA summary

Each of the five time series tools leads to a causal inference in the ECA summary vector,

TX →Y − TY →X = −0.14 ⇒ Y → X ⇒ g1 = 1 FX →Y − FY →X = −0.35 ⇒ Y → X ⇒ g2 = 1

J. M. McCracken (GMU) ECA w/ time series causality November 20, 2015 39 / 50 hhλEC iw i = −0.20 ⇒ Y → X ⇒ xy yx h|ρl | − |ρl |i = 0.40 ⇒ Y → X ⇒

Cooling/Heating System Data Making an ECA summary

Each of the five time series tools leads to a causal inference in the ECA summary vector,

TX →Y − TY →X = −0.14 ⇒ Y → X ⇒ g1 = 1 FX →Y − FY →X = −0.35 ⇒ Y → X ⇒ g2 = 1 −4 CYX − CXY = 3.1 × 10 ⇒ Y → X ⇒ g3 = 1

J. M. McCracken (GMU) ECA w/ time series causality November 20, 2015 39 / 50 xy yx h|ρl | − |ρl |i = 0.40 ⇒ Y → X ⇒

Cooling/Heating System Data Making an ECA summary

Each of the five time series tools leads to a causal inference in the ECA summary vector,

TX →Y − TY →X = −0.14 ⇒ Y → X ⇒ g1 = 1 FX →Y − FY →X = −0.35 ⇒ Y → X ⇒ g2 = 1 −4 CYX − CXY = 3.1 × 10 ⇒ Y → X ⇒ g3 = 1 hhλEC iw i = −0.20 ⇒ Y → X ⇒ g4 = 1

J. M. McCracken (GMU) ECA w/ time series causality November 20, 2015 39 / 50 Cooling/Heating System Data Making an ECA summary

Each of the five time series tools leads to a causal inference in the ECA summary vector,

TX →Y − TY →X = −0.14 ⇒ Y → X ⇒ g1 = 1 FX →Y − FY →X = −0.35 ⇒ Y → X ⇒ g2 = 1 −4 CYX − CXY = 3.1 × 10 ⇒ Y → X ⇒ g3 = 1 hhλEC iw i = −0.20 ⇒ Y → X ⇒ g4 = 1 xy yx h|ρl | − |ρl |i = 0.40 ⇒ Y → X ⇒ g5 = 1

J. M. McCracken (GMU) ECA w/ time series causality November 20, 2015 39 / 50 Cooling/Heating System Data Making an ECA summary

Each of the five time series tools leads to a causal inference in the ECA summary vector,

TX →Y − TY →X = −0.14 ⇒ Y → X ⇒ g1 = 1

FX →Y − FY →X = −0.35 ⇒ Y → X ⇒ g2 = 1 −4 CYX − CXY = 3.1 × 10 ⇒ Y → X ⇒ g3 = 1

hhλEC iw i = −0.20 ⇒ Y → X ⇒ g4 = 1 xy yx h|ρl | − |ρl |i = 0.40 ⇒ Y → X ⇒ g5 = 1

∴ the ECA summary is Y → X, which agrees with intuition
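A minimal sketch of how the five statistics might be collapsed into the summary vector (the sign conventions are read off the inferences shown in this talk; exact zeros, i.e., "no causal inference", are not handled; the function and argument names are illustrative, not the released code):

```python
def eca_summary(te_diff, granger_diff, pai_diff, leaning, xcorr_diff):
    """Map the five statistics to the ECA summary vector (0: X->Y, 1: Y->X).

    Sign conventions, as read off the examples in this talk:
      te_diff      = T_X->Y - T_Y->X            (> 0 suggests X->Y)
      granger_diff = F_X->Y - F_Y->X            (> 0 suggests X->Y)
      pai_diff     = C_YX - C_XY                (> 0 suggests Y->X)
      leaning      = <<lambda_EC>_w>            (> 0 suggests X->Y)
      xcorr_diff   = <|rho_l^xy| - |rho_l^yx|>  (> 0 suggests Y->X)
    """
    g = [
        0 if te_diff > 0 else 1,
        0 if granger_diff > 0 else 1,
        0 if pai_diff < 0 else 1,
        0 if leaning > 0 else 1,
        0 if xcorr_diff < 0 else 1,
    ]
    if all(gi == 0 for gi in g):
        summary = "X -> Y"
    elif all(gi == 1 for gi in g):
        summary = "Y -> X"
    else:
        summary = "undefined"
    return g, summary

# Cooling/heating example: eca_summary(-0.14, -0.35, 3.1e-4, -0.20, 0.40)
# returns ([1, 1, 1, 1, 1], "Y -> X").
```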

J. M. McCracken (GMU) ECA w/ time series causality November 20, 2015 39 / 50 Snowfall Data Time series data

Consider a time series pair (X, Y) where X is the mean daily temperature (in degrees Celsius) at Whistler, BC, Canada, and Y is the total snowfall (in centimeters) (7,753 measurements in each series)⁴.

[Figure: time series plots of X (mean daily temperature) and Y (total snowfall) over the 7,753 measurement times t.]

The intuitive causal inference is X → Y.

⁴ This data is available as part of the UCI Machine Learning Repository. The data was recorded from July 1, 1972 to December 31, 2009.

J. M. McCracken (GMU) ECA w/ time series causality November 20, 2015 40 / 50 Snowfall Data ECA summary preliminaries

The ECA summary can be made with parameters similar to those used in the previous example,

I The embedding dimension will be E = 100 with a time delay of τ = 1

I The cause-effect assignment will be the l-standard assignment

I The tolerance domains will be the 1/4-width domains

I The tested lags will be l = 1, 2,..., 20

J. M. McCracken (GMU) ECA w/ time series causality November 20, 2015 41 / 50 Snowfall Data Making an ECA summary

Each of the five time series tools leads to a causal inference in the ECA summary vector,

    T_X→Y − T_Y→X = 2.1 × 10⁻² ⇒ X → Y ⇒ g1 = 0
    F_X→Y − F_Y→X = −2.6 × 10⁻³ ⇒ Y → X ⇒ g2 = 1
    C_YX − C_XY = −3.4 × 10⁻² ⇒ X → Y ⇒ g3 = 0
    ⟨⟨λ_EC⟩_w⟩ = 3.7 × 10⁻² ⇒ X → Y ⇒ g4 = 0
    ⟨|ρ_l^xy| − |ρ_l^yx|⟩ = 2.3 × 10⁻² ⇒ Y → X ⇒ g5 = 1

∴ the ECA summary is undefined. The majority of the causal inferences agree with intuition.

J. M. McCracken (GMU) ECA w/ time series causality November 20, 2015 42 / 50 Time series causality as data analysis Objections to causal studies

Data analysis often ignores causality.

Two primary objections to time series causality
1. Correlation is not causation
2. Confounding cannot be controlled

Many different tools have been developed that go beyond correlation, and ignoring such tools means ignoring potentially useful inferences that can be drawn from the data.

True, but this is an issue of defining “causality”. Exploring potential causal relationships within data sets can be done with operational definitions of causality. These different causalities may provide deeper insight into the system dynamics.

J. M. McCracken (GMU) ECA w/ time series causality November 20, 2015 43 / 50 ECA summaries as practical tools Exploratory causal inference as a practical part of data analysis

Consider a recent result presented in Unraveling the cause-effect relation between time series [Phys. Rev. E 90, 052150]: Liang; Section V, Ibid. “... El Niño and IOD [Indian Ocean Dipole] are mutually causal, and the causality is asymmetric, with the one from the latter to the former larger than its counterpart ...” (In the language of ECA: Given the time series pair (E, I), the dominant potential driver is I; i.e., I → E)

This conclusion is drawn from a derivation of the “Liang information flow” from the transfer entropy and then applying this new formula to E and I, but this same conclusion can be drawn from the ECA summary vectors of these time series pairs, using the code presented previously with naive algorithm parameters (specifically, the parameters used in the snowfall example).

J. M. McCracken (GMU) ECA w/ time series causality November 20, 2015 44 / 50 J. M. McCracken (GMU) ECA w/ time series causality November 20, 2015 45 / 50 BACK-UP

J. M. McCracken (GMU) ECA w/ time series causality November 20, 2015 46 / 50 Impulse with linear response

Consider {X, Y} = {{x_t}, {y_t}} where t = 0, 1, ..., L,

    x_t = 2      for t = 1,
    x_t = A η_t  for t ∈ {t | t ≠ 1 and t mod 5 ≠ 0},
    x_t = 2      for t ∈ {t | t mod 5 = 0},

and y_t = x_{t−1} + B η_t with y_0 = 0, A, B ∈ ℝ≥0, and η_t ∼ N(0, 1). Specifically, consider L = 500, A = 0.1, and B = 0.4.

T_X→Y − T_Y→X = 5.3 × 10⁻¹ ⇒ X → Y ⇒ g1 = 0
F_X→Y − F_Y→X = 4.5 × 10⁻¹ ⇒ X → Y ⇒ g2 = 0
C_YX − C_XY = −8.3 × 10⁻³ ⇒ X → Y ⇒ g3 = 0
⟨⟨λ_EC⟩_w⟩ = 6.6 × 10⁻³ ⇒ X → Y ⇒ g4 = 0
⟨|ρ_l^xy| − |ρ_l^yx|⟩ = −2.8 × 10⁻³ ⇒ X → Y ⇒ g5 = 0

ECA summary is X → Y, which agrees with intuition.
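A minimal sketch of generating this impulse/response pair (assuming NumPy; the random seed and the use of independent noise draws for x and y are assumptions):

```python
import numpy as np

def impulse_linear_response(L=500, A=0.1, B=0.4, seed=0):
    """x_t is noise with impulses of 2 (at t = 1 and at every t with t mod 5 == 0);
    y_t = x_{t-1} + B*eta_t is the linearly responding series, with y_0 = 0."""
    rng = np.random.default_rng(seed)
    x = A * rng.standard_normal(L + 1)
    x[1] = 2.0
    x[np.arange(0, L + 1, 5)] = 2.0          # t mod 5 == 0
    y = np.zeros(L + 1)
    y[1:] = x[:-1] + B * rng.standard_normal(L)
    return x, y
```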

J. M. McCracken (GMU) ECA w/ time series causality November 20, 2015 47 / 50 Cyclic driving with linear response

Consider {X, Y} = {{x_t}, {y_t}} where t = 0, 1, ..., L,

    x_t = a sin(bt + c) + A η_t

and

    y_t = x_{t−1} + B η_t

with y_0 = 0, A ∈ [0, 1], B ∈ [0, 1], η_t ∼ N(0, 1), and with the amplitude a, the frequency b, and the phase c all in the appropriate units. Specifically, consider L = 500, A = 0.1, B = 0.4, a = b = 1, and c = 0.

T_X→Y − T_Y→X = 1.9 × 10⁻¹ ⇒ X → Y ⇒ g1 = 0
F_X→Y − F_Y→X = 2.1 × 10⁻¹ ⇒ X → Y ⇒ g2 = 0
C_YX − C_XY = −9.8 × 10⁻³ ⇒ X → Y ⇒ g3 = 0
⟨⟨λ_EC⟩_w⟩ = 3.9 × 10⁻³ ⇒ X → Y ⇒ g4 = 0
⟨|ρ_l^xy| − |ρ_l^yx|⟩ = −2.9 × 10⁻² ⇒ X → Y ⇒ g5 = 0

ECA summary is X → Y, which agrees with intuition.
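A similar minimal sketch for this cyclically driven pair (same assumptions as the previous sketch):

```python
import numpy as np

def cyclic_linear_response(L=500, A=0.1, B=0.4, a=1.0, b=1.0, c=0.0, seed=0):
    """x_t = a*sin(b*t + c) + A*eta_t drives y_t = x_{t-1} + B*eta_t, with y_0 = 0."""
    rng = np.random.default_rng(seed)
    t = np.arange(L + 1)
    x = a * np.sin(b * t + c) + A * rng.standard_normal(L + 1)
    y = np.zeros(L + 1)
    y[1:] = x[:-1] + B * rng.standard_normal(L)
    return x, y
```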

J. M. McCracken (GMU) ECA w/ time series causality November 20, 2015 48 / 50 Cyclic driving with non-linear response

Consider {X, Y} = {{x_t}, {y_t}} where t = 0, 1, ..., L,

    x_t = a sin(bt + c) + A η_t    and    y_t = B x_{t−1}(1 − C x_{t−1}) + D η_t,

with y_0 = 0, with A, B, C, D ∈ [0, 1], η_t ∼ N(0, 1), and with the amplitude a, the frequency b, and the phase c all in the appropriate units, given t = 0, fπ, 2fπ, 3fπ, ..., 6π with f = 1/30, which implies L = 181. Specifically, consider A = 0.1, B = 0.3, C = 0.4, D = 0.5, a = b = 1, and c = 0.

T_X→Y − T_Y→X = 2.7 × 10⁻¹ ⇒ X → Y ⇒ g1 = 0
F_X→Y − F_Y→X = 2.6 × 10⁻¹ ⇒ X → Y ⇒ g2 = 0
C_YX − C_XY = −1.8 × 10⁻³ ⇒ X → Y ⇒ g3 = 0
⟨⟨λ_EC⟩_w⟩ = 8.4 × 10⁻³ ⇒ X → Y ⇒ g4 = 0
⟨|ρ_l^xy| − |ρ_l^yx|⟩ = −6.8 × 10⁻² ⇒ X → Y ⇒ g5 = 0

ECA summary is X → Y, which agrees with intuition.

J. M. McCracken (GMU) ECA w/ time series causality November 20, 2015 49 / 50 Coupled logistic map

Consider {X, Y} = {{x_t}, {y_t}} where t = 0, 1, ..., L,

    x_t = x_{t−1} (r_x − r_x x_{t−1} − β_xy y_{t−1})

and

    y_t = y_{t−1} (r_y − r_y y_{t−1} − β_yx x_{t−1}),

where the parameters r_x, r_y, β_xy, β_yx ∈ ℝ≥0. Specifically, consider L = 500, β_xy = 0.5, β_yx = 1.5, r_x = 3.8, and r_y = 3.2 with initial conditions x_0 = y_0 = 0.4.

T_X→Y − T_Y→X = 4.9 × 10⁻¹ ⇒ X → Y ⇒ g1 = 0
F_X→Y − F_Y→X = 5.4 × 10⁻¹ ⇒ X → Y ⇒ g2 = 0
C_YX − C_XY = −3.9 × 10⁻³ ⇒ X → Y ⇒ g3 = 0
⟨⟨λ_EC⟩_w⟩ = 2.7 × 10⁻¹ ⇒ X → Y ⇒ g4 = 0
⟨|ρ_l^xy| − |ρ_l^yx|⟩ = −2.6 × 10⁻¹ ⇒ X → Y ⇒ g5 = 0

ECA summary is X → Y, which agrees with intuition.
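A minimal sketch of iterating this coupled logistic map (illustrative; with the couplings above, β_yx > β_xy, so the intuitive dominant driving is X → Y):

```python
import numpy as np

def coupled_logistic_map(L=500, r_x=3.8, r_y=3.2, beta_xy=0.5, beta_yx=1.5,
                         x0=0.4, y0=0.4):
    """Coupled logistic maps: beta_xy couples y into the x update and
    beta_yx couples x into the y update."""
    x = np.empty(L + 1)
    y = np.empty(L + 1)
    x[0], y[0] = x0, y0
    for t in range(1, L + 1):
        x[t] = x[t - 1] * (r_x - r_x * x[t - 1] - beta_xy * y[t - 1])
        y[t] = y[t - 1] * (r_y - r_y * y[t - 1] - beta_yx * x[t - 1])
    return x, y
```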

J. M. McCracken (GMU) ECA w/ time series causality November 20, 2015 50 / 50