
2/10/19

Building Causal Models

Overview

1. What is Causality?

2. Benefits of thinking in causal models over multiple regression

3. Causal Identification

4. Choosing how to design a model

5. Starting with Meta-Models

6. Realizing Your Model


Correlation does not equal causation… but where there’s smoke, there’s fire.

- Jim Grace

[Comic: xkcd]

Pearl’s Ladder of Causality

3. Counterfactual: Can imagine what would happen under unobserved conditions
   - Requires model of a system
   - Requires identification of causality

2. Intervention: Understand what happens when you do something
   - Experiments
   - Provides evidence of causal link

1. Observation: Cause is associated with effect
   - Correlation
   - Can only predict within the range of data

Pearl and Mackenzie 2018

1. Observation: What can we learn?

[Figure from http://tylervigen.com/]


What We Want to Evaluate

[Diagram: Cause → Effect]

The World

[Diagram: Cause → Effect, together with Exogenous Cause, Mediator, and Cause 2]

Intervention: What can we learn?

Experiments!

Experiments: Manipulate Cause of Interest

[Diagram: Cause = 0, 1, 2, … → Effect, with Exogenous Cause = 0, Mediator = 0, and Cause 2 = 0 and their paths crossed out]


Reality Check: Lots of Things Happen in an Experiment

Build The World to Make Counterfactual Predictions

[Diagram: Exogenous Cause, Cause, Effect, Mediator, Cause 2]

How much of this do we need to meaningfully understand if cause → effect?

Why move beyond multiple regression to causal models?

• We estimate the effect of exogenous variables controlling for all others
• Covariances implied
• Not controlling for the right variables = bad
• Controlling for the wrong variables = bad inference

[Diagram: X1, X2, and X3 as predictors of Y]


The Chain

Causal Model: Cause → Mediator → Effect

What it can do to Multiple Regression
[Diagram: Cause and Mediator as predictors of Effect]
• Mediator blocks cause
• Looks like no or opposite link between cause and effect

The Collider

Causal Model: Cause 1 → Effect ← Cause 2

Be wary of this with post-treatment effects!

What it would do to Multiple Regression
[Diagram: Cause 1, Cause 2, Effect]
• Conditions on effect, opening path between causes
• Creates spurious correlations
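A small simulation can make the chain and collider slides concrete. This is only a sketch in base R: the variable names, sample size, and path coefficients (0.5, 0.8, 1) are assumptions chosen for illustration, not values from the slides.

```r
# Sketch: what conditioning does in a chain and in a collider (assumed coefficients)
set.seed(1)
n <- 10000

# The chain: cause -> mediator -> effect, true total effect = 0.5 * 0.8 = 0.4
cause    <- rnorm(n)
mediator <- 0.5 * cause + rnorm(n)
effect   <- 0.8 * mediator + rnorm(n)

chain_total   <- unname(coef(lm(effect ~ cause))["cause"])             # near 0.4
chain_blocked <- unname(coef(lm(effect ~ cause + mediator))["cause"])  # near 0: mediator blocks cause

# The collider: cause1 -> effect2 <- cause2, with independent causes
cause1  <- rnorm(n)
cause2  <- rnorm(n)
effect2 <- cause1 + cause2 + rnorm(n)

collider_closed <- unname(coef(lm(cause1 ~ cause2))["cause2"])            # near 0: no link
collider_opened <- unname(coef(lm(cause1 ~ cause2 + effect2))["cause2"])  # near -0.5: spurious link

round(c(chain_total, chain_blocked, collider_closed, collider_opened), 2)
```

Under these assumptions, adding the mediator erases a real effect, while adding the collider invents a negative association between two unrelated causes.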

Example of Collider Bias

• My cat jumps on my shoulders when she is lonely, on a shelf next to me, or both.
• It’s more likely she is satisfying one of those conditions at any one time, rather than both
• Thus, if she jumped from a shelf to my shoulders, I know she is likely not lonely
• I’d falsely conclude there’s a relationship between being on a shelf and loneliness

[Diagram: On a Shelf? → cat on shoulders ← Lonely?]

The Descendant

Causal Model: Cause → Mediator → Effect, with Mediator → Descendant

What it would do to Multiple Regression
[Diagram: Cause and Descendant as predictors of Effect]
• Descendant still partially blocks cause
• Masks effect of cause – many possibilities
• Same with descendant of collider
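The descendant case can be sketched the same way. The simulation below assumes a noisy descendant of the mediator and made-up coefficients; it is meant to show partial blocking, not to reproduce any analysis from the course.

```r
# Sketch: conditioning on a descendant of the mediator (assumed coefficients)
set.seed(2)
n <- 10000

cause      <- rnorm(n)
mediator   <- 0.5 * cause + rnorm(n)
descendant <- mediator + rnorm(n)        # noisy copy of the mediator
effect     <- 0.8 * mediator + rnorm(n)  # true total effect of cause = 0.4

total_effect    <- unname(coef(lm(effect ~ cause))["cause"])               # near 0.4
with_descendant <- unname(coef(lm(effect ~ cause + descendant))["cause"])  # near 0.2: partially blocked
with_mediator   <- unname(coef(lm(effect ~ cause + mediator))["cause"])    # near 0: fully blocked

round(c(total_effect, with_descendant, with_mediator), 2)
```

How much of the effect is masked depends on how tightly the descendant tracks the mediator, which is one reading of the "many possibilities" bullet.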


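The Fork

Causal Model: Exogenous Cause affects both Cause and Effect
[Diagram: Exogenous Cause, Cause, Effect]

What it would do to Multiple Regression
[Diagram: Cause and Exogenous Cause as predictors of Effect]
• Nothing – this is great!

The Bigger Problem…

Causal Model: Exogenous Cause affects both Cause and Effect
[Diagram: Exogenous Cause, Cause, Effect]

Univariate Regression
[Diagram: Cause → Effect]
• The relationship is contaminated – results unreliable!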

Omitted Variable Bias

• We assume that sampling means that omitted variables average to 0
  – Omission produces downward bias in SE of coefficients
• But, if omitted variables are correlated causally with a predictor, they likely are not averaged out
• This will bias your estimates
  – You will not know in what direction

[Diagram: Exogenous Cause, Cause, Effect]
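To see the fork and omitted variable bias in numbers, here is a minimal base R sketch. The true effect of 1 and the other coefficients are assumed for illustration only.

```r
# Sketch: omitted variable bias in a fork (assumed coefficients)
set.seed(3)
n <- 10000

exogenous <- rnorm(n)
cause     <- 0.7 * exogenous + rnorm(n)
effect    <- 1.0 * cause + 1.5 * exogenous + rnorm(n)  # true effect of cause = 1

univariate <- unname(coef(lm(effect ~ cause))["cause"])              # contaminated, well above 1
multiple   <- unname(coef(lm(effect ~ cause + exogenous))["cause"])  # near 1

round(c(univariate = univariate, multiple = multiple), 2)
```

Here the bias is upward only because both assumed exogenous paths are positive; flipping the sign of either one would flip the direction of the bias.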


Bigger Problem

[Diagram: Exogenous Cause, Cause, Effect. Label: This path is not causally identified]

Causal Identification

All of your model need not be causally identified - some may describe associations

But, you can only make counterfactual statements about parts that are causally identified

[Diagram: Exogenous Cause, Cause, Effect]

Causal Identification

Causal identification does not require knowing ULTIMATE cause

Nor does it require knowing exact mechanisms within a causal pathway

[Diagram: Exogenous Cause, Cause, Effect]

How do we solve this problem?

[Diagram: Exogenous Cause, Cause, Effect. Label: This path is not causally identified]


Solution 1: The Backdoor Criterion

• If we want to know the link between cause and effect given variables that affect both, we must include all variables with a path into the cause
• So, variables must block all backdoor paths from treatment to outcome
• AND variables must not be descendants of the cause (i.e., no mediators – see the pipe!)

[Diagram: Exogenous Cause, Cause, Effect]

Proximate Backdoors

[Diagram: Exogenous Cause, Proximate Cause, Cause, Effect]

Often we only have proximate variables in a backdoor path. Controlling for just them is sufficient.
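The proximate-backdoor claim can be checked with a quick simulation. The sketch assumes the backdoor path runs Cause ← Proximate Cause ← Exogenous Cause → Effect, and that the exogenous cause is unmeasured; the structure and coefficients are illustrative assumptions.

```r
# Sketch: closing a backdoor with only the proximate variable (assumed structure)
set.seed(4)
n <- 10000

exogenous <- rnorm(n)                        # treated as unmeasured
proximate <- 0.8 * exogenous + rnorm(n)
cause     <- 0.6 * proximate + rnorm(n)
effect    <- 0.5 * cause + 1.0 * exogenous + rnorm(n)  # true effect of cause = 0.5

naive    <- unname(coef(lm(effect ~ cause))["cause"])              # backdoor open, biased upward
adjusted <- unname(coef(lm(effect ~ cause + proximate))["cause"])  # near 0.5

round(c(naive = naive, adjusted = adjusted), 2)
```

Conditioning on the proximate cause is enough here because, once it is held fixed, the remaining variation in the cause is unrelated to the exogenous cause.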

What Variables Block the Back Door?

[Diagram: X1, Y1, Y2, Y3, Y4]

There are two ways to build a multiple regression with closed back doors to determine if Y1 -> Y4. What are they?

Space and Time Live in the Backdoor

[Diagram: Site, Spatial and Temporal Covariates and Lags; Cause; Effect]

We will talk more of this with respect to random effects and autocorrelation on day 4!


Finding Backdoors with dagitty

• Great package for graph prototyping
• Many ways to analyze graphs as well!

To build a DAG

    library(dagitty)

    g <- dagitty("dag{
      ...
    }")

Building a DAG with dagitty

[Diagram: X1, Y1, Y2, Y3, Y4]

    # x1 drives y1, y2, and y3; all three drive y4
    g <- dagitty("dag{
      x1 -> y1
      x1 -> y2
      x1 -> y3
      y3 -> y4
      y2 -> y4
      y1 -> y4
    }")

Plot your DAG!

[Diagram: X1, Y1, Y2, Y3, Y4]

    plot(graphLayout(g))

How to Shut the Back Door

    > adjustmentSets(g, exposure = "y1", outcome = "y4")
    { y2, y3 }
    { x1 }
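Both adjustment sets reported above can be tried on simulated data. This base R sketch follows the x1/y1–y4 DAG from the slides, but the coefficients (including the true y1 -> y4 effect of 0.5) are assumptions.

```r
# Sketch: two ways to shut the back door between y1 and y4 (assumed coefficients)
set.seed(5)
n <- 10000

x1 <- rnorm(n)
y1 <- 0.7 * x1 + rnorm(n)
y2 <- 0.7 * x1 + rnorm(n)
y3 <- 0.7 * x1 + rnorm(n)
y4 <- 0.5 * y1 + 0.6 * y2 + 0.6 * y3 + rnorm(n)  # true y1 -> y4 effect = 0.5

unadjusted <- unname(coef(lm(y4 ~ y1))["y1"])            # back doors open, biased
via_x1     <- unname(coef(lm(y4 ~ y1 + x1))["y1"])       # adjustment set { x1 }
via_y2_y3  <- unname(coef(lm(y4 ~ y1 + y2 + y3))["y1"])  # adjustment set { y2, y3 }

round(c(unadjusted, via_x1, via_y2_y3), 2)
```

Either set recovers roughly the same estimate; adjusting for y2 and y3 usually gives the tighter one because it also soaks up residual variance in y4.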


Sometimes We Cannot Shut the Backdoor

[Diagram: Billion Dollar Environmental Covariates; Cause; Effect]

Exercise: dagitty

• Sketch a model of 4-5 variables in your system
  – Don’t think too hard (that’s for later!)
• See if you can figure out how to close any backdoors
• Use dagitty to find the back doors between a chosen pair

n.b. can represent chains as: a -> b -> c -> d or colliders as: a -> b <- c

Or, we suspect, but don’t know, of backdoors

[Diagram: Who Knows (??), Cause, Effect]

Solution 2: The Front-Door Criterion

• A variable satisfies the front-door criterion when it blocks all directed paths from X to Y.
• In practice, you need a causally identified mediating variable unaffected by anything else.
• Thus, the influence of the cause is felt by the effect solely through its mediator.

[Diagram: Who Knows (??), Cause, Independent Mediator, Effect]
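A front-door estimate can be sketched with two regressions: cause to mediator, then mediator to effect holding the cause fixed. The simulation assumes an unmeasured "who knows" variable that affects cause and effect but not the mediator; all names and coefficients are illustrative assumptions.

```r
# Sketch: front-door estimate with an unmeasured confounder (assumed coefficients)
set.seed(6)
n <- 10000

who_knows <- rnorm(n)                        # unmeasured
cause     <- 0.8 * who_knows + rnorm(n)
mediator  <- 0.6 * cause + rnorm(n)          # unaffected by who_knows
effect    <- 0.5 * mediator + 1.0 * who_knows + rnorm(n)  # true effect of cause = 0.6 * 0.5 = 0.3

naive <- unname(coef(lm(effect ~ cause))["cause"])   # confounded, far above 0.3

cause_to_mediator  <- unname(coef(lm(mediator ~ cause))["cause"])
mediator_to_effect <- unname(coef(lm(effect ~ mediator + cause))["mediator"])
front_door <- cause_to_mediator * mediator_to_effect  # near 0.3

round(c(naive = naive, front_door = front_door), 2)
```

Including the cause in the second regression matters: it blocks the path from mediator back through the cause to the unmeasured variable.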


Example: Herbivory Through the Front Door

[Diagram: Tide Height, Fungal Prevalence, Salinity, Temperature, etc… (Who Knows, ??); Leafhopper Abundance; Leafhopper Damage; Cordgrass Abundance]

Front Doors and Instruments

• Instrumental variables are those that **only** affect the cause, and have no relation to anything else
• By examining their relationship to both the cause and effect, we can derive an estimate of the causal effect
• Also useful when cause and effect are involved in a feedback

[Diagram: Who Knows (??), Instrument, Cause, Effect]

Two Approaches in Instruments

1. Fit two models, leveraging the fact that IE = IC * CE, so CE = IE/IC
   [Diagram: Instrument → Cause (ic); Instrument → Effect (ie)]

2. Fit two models, but only use fitted values of the cause for the second
   [Diagram: Instrument → Cause; Fitted Cause → Effect]

See ivreg

Paths Represent Causal Relationships – but how solid is our inference?

• We state that a direct link between two variables implies a causal link via a dependence relationship
• We estimate the strength of that relationship
• This is a soft causal claim

[Diagram: Exogenous Cause, Cause, Mediator, Effect]
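The two instrument approaches can be compared directly in base R. The sketch below assumes one instrument, one unmeasured confounder, and made-up coefficients; it uses plain lm() rather than ivreg, so only the point estimates should be trusted.

```r
# Sketch: ratio estimate vs. fitted-values estimate for an instrument (assumed coefficients)
set.seed(7)
n <- 10000

who_knows  <- rnorm(n)                       # unmeasured
instrument <- rnorm(n)                       # affects only the cause
cause      <- 0.7 * instrument + 0.8 * who_knows + rnorm(n)
effect     <- 0.5 * cause + 1.0 * who_knows + rnorm(n)  # true CE = 0.5

# Approach 1: CE = IE / IC
ic <- unname(coef(lm(cause ~ instrument))["instrument"])
ie <- unname(coef(lm(effect ~ instrument))["instrument"])
ce_ratio <- ie / ic

# Approach 2: regress the effect on fitted values of the cause
fitted_cause <- fitted(lm(cause ~ instrument))
ce_fitted <- unname(coef(lm(effect ~ fitted_cause))["fitted_cause"])

round(c(ce_ratio = ce_ratio, ce_fitted = ce_fitted), 3)
```

With a single instrument the two estimates coincide. The standard errors from the hand-rolled second stage are not valid, which is a good reason to use a dedicated function such as the ivreg the slides point to.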


Conditional Independence and Hard Causal Claims

• We assume that two variables not connected are independent, conditioned on their parent influences
    Mediator ⊥ Exogenous Cause | Cause
• This is a HARD causal claim, setting a path to 0
• Testable

[Diagram: Exogenous Cause, Cause, Mediator, Effect]

Check yourself with dagitty

[Diagram: Waves, Kelp, Algae, Invertebrates]

    forest_mod <- dagitty("dag{
      waves -> kelp -> algae
      algae -> inverts
      waves -> algae
    }")

    > impliedConditionalIndependencies(forest_mod)
    inverts _||_ kelp | algae
    inverts _||_ waves | algae

Exercise: What are your independence relationships?

[Diagram: X1, Y1, Y2, Y3, Y4]

    > impliedConditionalIndependencies(g)
    x1 _||_ y4 | y1, y2, y3
    y1 _||_ y2 | x1
    y1 _||_ y3 | x1
    y2 _||_ y3 | x1
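One simple way to test an implied independence such as inverts _||_ kelp | algae is to add the "missing" variable to a regression and check that its coefficient is near zero. The sketch simulates data from the forest_mod structure with assumed coefficients; a linear check like this is only one of several possible tests.

```r
# Sketch: testing inverts _||_ kelp | algae on simulated data (assumed coefficients)
set.seed(8)
n <- 10000

waves   <- rnorm(n)
kelp    <- -0.6 * waves + rnorm(n)
algae   <- -0.5 * kelp + 0.4 * waves + rnorm(n)
inverts <- 0.7 * algae + rnorm(n)

# Marginally, kelp and inverts are associated through algae
kelp_marginal <- unname(coef(lm(inverts ~ kelp))["kelp"])

# Conditioned on algae, the kelp coefficient should be near 0
claim_fit <- lm(inverts ~ algae + kelp)
kelp_given_algae <- unname(coef(claim_fit)["kelp"])

round(c(kelp_marginal = kelp_marginal, kelp_given_algae = kelp_given_algae), 2)
```

With real data, a clearly nonzero conditional coefficient would be evidence against the hard claim that the kelp -> inverts path is 0.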


Making Sure Pieces of your Model are Causal

• Are there omitted variables?
• If so, are they collinear with included variables?
• Can you shut the back door?
• Can you shut the front door?
• Can I support all causal independence statements?
• Be bold yet honest about causal interpretations!
  – Science advances by others noticing what you left out

The Continuum of SEM

[Diagram: structural equation modeling as a continuum from exploratory/model-building applications to confirmatory/hypothesis-testing/model-comparison applications]

Your goals will inform how you build your model

What is the purpose of your modeling?

• Discovery?
• Hypothesis testing?
• Making predictions?

[Diagram: Sun Star, Lobster, Kelp Bass; White Urchins, Purple Urchins, Red Urchins; Giant Kelp, Other Algae]


What is the purpose of your modeling?

• Discovery?
• Hypothesis testing?
• Making predictions?

[Diagrams: "Top-Down?" and "Bottom-Up?" models, each with Predators, Herbivores, Giant Kelp, Algal Cover, Understory Kelp, Upwelling, Nutrients, and a Herbivore-Upwelling Interaction]

[Diagram: Species Richness, Summer Kelp, Spring Kelp, Winter Wave Disturbance, Wave*Kelp Interaction, Last Year's Kelp]

             High Kelp   Low Kelp
High Waves   ?           ?
Low Waves    ?           ?

What is the focus of your modeling?

• Driver
• Response
• Mediation
• Theory Testing

[Diagrams: Lobster, Sun Star, Kelp Bass; White Urchins, Purple Urchins, Red Urchins; Giant Kelp, Other Algae; Upwelling]


What is the focus of your modeling?

• Driver
• Response
• Mediation
• Theory Testing

[Diagram: Wolf Spider, Tythus Beetle, Planthopper, Marsh Cordgrass]

Cardinale et al. 2008

What is the span of your inference?

• Local estimation
• Learning about processes

[Diagram: Wolf Spider, Tythus Beetle, Planthopper, Marsh Cordgrass, with paths labeled a, b, d, e]

What are a, b, c, d, and e in *THIS* marsh?

What is the span of your inference?

• Local estimation
• Learning about processes

[Diagram: Wolf Spider, Tythus Beetle, Planthopper, Marsh Cordgrass, with paths labeled a, b, d, e, plus Proximity to Spider Farm, Site, and Salinity]

Across marshes, what is the relative importance (e.g., for biocontrol) of a versus b*d versus site-influences?


What are you doing this week?

Purpose of modeling effort:
- discovery?
- testing hypotheses?
- making predictions?

Focus of modeling effort:
- driver focused?
- response focused?
- mediation focused?
- theory testing focused?

Span of inference:
- doing inferential estimation?
- learning about processes?

SEM as a Unifying Process

[Diagram: Theory, Model Specification, Measuring & Sampling, Estimation, Assessment of Fit, Model Modification, Interpretation]

After Grace & Bollen 2005

Model Building

1. What is Causality?
2. Causal Building Blocks
3. Research Goals and Model Structure
4. Starting with Meta-Models
5. Realizing Your Model


The Structural Equation Meta-Model (SEMM)

[Diagram: Concept A, Concept B, and Concept C linked by Causal Process 1, Causal Process 2, and Causal Process 3]

My Research Program as a Meta-Model

[Diagram: Food Web Structure, Ecosystem Function, Environmental Change]

Targeting Your Question

Focused on driver & Mediator

[Diagrams: Food Web Structure, Ecosystem Function, and Environmental Change; Food Web Structure, Foundation Species Abundance, and Climate Change]


Identify Relevant Processes

[Diagram: Food Web Structure; Habitat, Food, Light; Foundation Species Abundance; Waves; Climate Change]

Do you need to shut a conceptual backdoor?

[Diagram: Food Web Structure, Foundation Species Abundance, Wave Disturbance, Local Environment]

What did I want to do?

Purpose of modeling effort:
- testing hypotheses
- making predictions

Focus of modeling effort:
- driver focused
- mediation focused

Span of inference:
- learning about processes

[Diagram: Food Web Structure, Foundation Species Abundance, Wave Disturbance, Local Environment]


Meta-Model Your Research

Complex Systems are Complex

Sampling of Rocky Reefs 2000-2009


Matching Data to Concepts

[Diagram: concepts (Food Web Structure, Foundation Species Abundance, Wave Disturbance, Local Environment) matched to measures (Food Web Network Metrics, Summer Kelp Counts, % Rocky Reef, Kelp in Prev Year, Winter Max Wave Heights)]

Adding Biological Realism

Problem 1: Kelp moderates disturbance
– More Kelp = Smaller Disturbance?
– BUT no effect on kelp that isn't present…

Solution 1:

Moderation is an interaction effect

[Diagram: Food Web Structure (Food Web Network Metrics), Foundation Species Abundance (Summer Kelp Counts), Wave Disturbance (Winter Max Wave Heights), Local Biotic Environment (Kelp in Prev Year)]

But: Maintain Backdoor Blockage!

[Diagram: Food Web Structure, Foundation Species Abundance, Wave Disturbance, Local Biotic Environment, Local Abiotic Environment (Rock???)]
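"Moderation is an interaction effect" translates into a product term in the model formula. The sketch below uses hypothetical variables (waves, prev_kelp, kelp_loss) and an assumed data-generating process in which waves remove kelp only in proportion to how much kelp was there; it is not the model fit in the course.

```r
# Sketch: moderation as an interaction term (hypothetical variables and coefficients)
set.seed(9)
n <- 5000

waves     <- runif(n, 0, 3)    # hypothetical winter max wave height
prev_kelp <- runif(n, 0, 10)   # hypothetical previous-year kelp density
kelp_loss <- 0.3 * waves * prev_kelp + rnorm(n)  # no kelp present, no kelp lost

# waves * prev_kelp expands to both main effects plus waves:prev_kelp
moderation_fit  <- lm(kelp_loss ~ waves * prev_kelp)
interaction_est <- unname(coef(moderation_fit)["waves:prev_kelp"])  # near 0.3

round(coef(moderation_fit), 2)
```

In a path diagram, the product term is usually drawn as its own node (as in the Wave*Kelp Interaction box earlier in the deck) with a path to the response.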


Measuring Realized Disturbance via Satellite Measurements

Cavanaugh et al 2011 MEPS

Natural History Creates Problem

Problem 2: Kelp regrows quickly
– It’s a jungle by summer if nutrients are present
– Need to see if kelp was actually removed in winter!

Incorporate Natural History of Disturbance

[Diagram: Food Web Structure, Recovered Foundation Species Abundance, Post-Disturbance Foundation Species, Wave Disturbance, Local Biotic Environment, Local Abiotic Environment]

Model with Observed Variables

[Diagram: Food Web Network Metrics; Summer Kelp Abundance (#/sq m); Rock % Cover; Satellite Derived Kelp Cover (# Pixels/250 m²); Maximum Winter Wave Force (orbital velocity); Previous Year Summer Kelp Abundance (#/sq m)]


Goal: Hypothesis Evaluation

[Diagram: Food Web Network Metrics; Summer Kelp Abundance (#/sq m); Satellite Derived Kelp Cover; Previous Year Summer Kelp Abundance; Maximum Winter Wave Force; paths into Food Web labeled 2&4 and 3&4]

1. Full Model
2. No waves -> Food Web effect
3. No effect of previous year's kelp -> Food Web
4. No effect of either waves or previous kelp -> Food Web

The Process of Model Building

1. Make a conceptual meta-model
2. Ensure meta-model’s causal structure meets your research goals
3. Reify your model based on system natural history (a bigger model!) and available data
4. Ensure causal structure is still intact
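The four candidate models listed under Goal: Hypothesis Evaluation differ only in which paths enter the food web variable, so they can be compared as nested models. The sketch below is a single-equation stand-in using lm() and AIC on simulated data, not a full SEM fit; the variable names and coefficients are hypothetical, and the simulation assumes waves have no direct effect on the food web.

```r
# Sketch: comparing the four hypotheses as nested regressions (hypothetical data)
set.seed(10)
n <- 2000

waves       <- rnorm(n)   # maximum winter wave force
prev_kelp   <- rnorm(n)   # previous year summer kelp abundance
kelp_cover  <- -0.6 * waves + 0.5 * prev_kelp + rnorm(n)  # satellite derived kelp cover
summer_kelp <- 0.7 * kelp_cover + rnorm(n)
food_web    <- 0.5 * summer_kelp + 0.4 * prev_kelp + rnorm(n)  # no direct wave path

full     <- lm(food_web ~ summer_kelp + waves + prev_kelp)  # 1. Full Model
no_waves <- lm(food_web ~ summer_kelp + prev_kelp)          # 2. No waves -> Food Web
no_prev  <- lm(food_web ~ summer_kelp + waves)              # 3. No previous kelp -> Food Web
neither  <- lm(food_web ~ summer_kelp)                      # 4. Neither direct path

aic_table <- AIC(full, no_waves, no_prev, neither)
aic_table
```

Lower AIC is better. Under these assumptions, dropping the previous-kelp path should cost a lot, while dropping the wave path should cost little or nothing.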

Make your model based on data!
