2/10/19
Building Causal Models
Overview
1. What is Causality?
2. Benefits of thinking in causal models over multiple regression
3. Causal Identification
4. Choosing how to design a model
5. Starting with Meta-Models
6. Realizing Your Model
1 2/10/19
Correlation does not equal causation… but where there’s smoke, there’s fire.
-Jim Grace xkcd
Pearl’s Ladder of Causality 1. Observation: What can we learn?
3. Counterfactual – Can imagine what would happen under unobserved conditions - Requires model of a system - Requires identification of causality
2. Intervention – Understand what happens you do something - Experiments - Provides evidence of causal link
1. Observation – Cause is associated with effect - Correlation - Can only predict within the range of data
Pearl and Mackenzie 2018 http://tylervigen.com/
2 2/10/19
What We Want to Evaluate The World
Exogenous Cause
Cause Effect Cause Effect
Mediator
Cause 2
Intervention: Experiments! What can we Experiments: Manipulate Cause of learn? Interest Exogenous Cause=0 Exogenous Cause=0 X X X Cause Effect Cause = 0,1,2,… Effect X X Mediator=0 X Mediator=0 X Cause 2=0 Cause 2=0
3 2/10/19
Reality Check: Lots of Things Happen in Build The World to Make Counterfactual an Experiment Predictions Exogenous Cause
HowCause much of this do we Effectneed to meaningfully understand if cause → effect? Mediator
Cause 2
Overview Why move beyond multiple regression to causal models? 1. What is Causality? • We estimate the effect of exogenous variables 2. Benefits of thinking in causal models over controlling for all others multiple regression X1 • Covariances implied 3. Causal Identification X2 Y • Not controlling for the 4. Choosing how to design a model right variables = bad inference X3 5. Starting with Meta-Models • Controlling for the wrong 6. Realizing Your Model variables = bad inference
4 2/10/19
The Chain The Collider
Causal Model Causal Model Be wary of this with post-treatment effects!
Cause Mediator Effect Cause 1 Effect Cause 2
What it can do to Multiple Regression What it would do to Multiple Regression
Cause • Mediator blocks cause Cause 1 • Conditions on effect, opening path between Effect • Looks like no or Cause 2 causes Mediator opposite link between Effect cause and effect • Creates spurious correlations
Example of Collider Bias The Descendant
• My cat jumps on my shoulders when Causal Model she is lonely, on a shelf next to me, or both. Cause Mediator Effect • It’s more likely she is satisfying one of those conditions at any one time, rather than both Descendant cat on shoulders What it would do to Multiple Regression • Thus, if she jumped from a shelf to my • Descendant still partially shoulders, I know she is likely not lonely Cause blocks cause On a Shelf? • I’d falsely conclude there’s a relationship Effect • Masks effect of cause – many possibilities between being on a shelf and loneliness Descendant • Same with descendent of Lonely? collider
5 2/10/19
The Fork The Bigger Problem…
Causal Model Exogenous Cause Causal Model Exogenous Cause
Cause Effect Cause Effect
What it would do to Multiple Regression Univariate Regression
Cause • • Effect Nothing – this is great! Cause Effect The relationship is contaminated – results Exogenous unreliable! Cause
Omitted Variable Bias Overview 1. What is Causality? • We assume that sampling means that omitted variables Exogenous average to 0 2. Benefits of thinking in causal models over Cause – Omission produces downward bias in SE of coefficients multiple regression
• But, if omitted variables are Cause Effect correlated causally with a 3. Causal Identification predictor, they likely are not averaged out 4. Choosing how to design a model
• This will bias your estimates – You will not know in what 5. Starting with Meta-Models direction 6. Realizing Your Model
6 2/10/19
Bigger Problem Causal Identification
All of your model Exogenous This Exogenous need not be causally Cause Cause identified - some may path describe associations
Cause Effect is Cause Effect not But, you can only causally make counterfactual statements about identified parts that are
Causal Identification How do we solve this problem?
Causal identification Exogenous does not require Exogenous This Cause knowing ULTIMATE Cause cause path
Cause Effect Cause Effect is Nor does it require not knowing exact causally mechanisms within a causal pathway identified
7 2/10/19
Solution 1: The Backdoor Criteria Proximate Backdoors
• If we want to know the link between cause and effect Exogenous Cause Exogenous given variables that affect Cause both, we must include all variables with a path into the cause Proximate Cause Cause Effect • So, variables must block all backdoor paths from treatment to outcome
• AND variables must not be Cause Effect descendants of the cause (i.e., no mediators – see the pipe!) Often we only have proximate variables in a backdoor path. Controlling for just them is sufficient.
What Variables Block the Back Door? Space and Time Live in the Backdoor
Site, Spatial and X1 Temporal Covariates and Lags
Y2 Y3
Y1 Y4 Cause Effect
There are two ways to build a multiple regression with closed back doors to determine if Y1 -> Y4. What are they? We will talk more of this with respect to random effects, and autocorrelation on day 4!
8 2/10/19
Finding Backdoors with dagitty Building a DAG with dagitty
X1 • Great package for graph prototyping
• Many ways to analyze graphs as well! Y2 Y3
To build a DAG Y1 Y4 g <- dagitty("dag{ g <- dagitty("dag{ x1 -> y1 ... x1 -> y2 }") x1 -> y3 y3 -> y4 y2 -> y4 y1 -> y4 }")
Plot your DAG! How to Shut the Back Door
X1
Y2 Y3
Y1 Y4
> adjustmentSets(g, exposure = "y1", outcome = "y4")
{ y2, y3 } plot(graphLayout(g)) { x1 }
9 2/10/19
Sometimes We Cannot Shut the Exercise: daggity Backdoor
• Sketch a model of 4-5 variables in your system – Don’t think to hard (that’s for later!) Billion Dollar Environmental Covariates • See if you can figure out how to close any backdoors
• Use daggity to find the back doors between a chosen pair Cause Effect
n.b. can represent chains as: a -> b -> c ->d or colliders as: a -> b <- c
Or, we suspect, but don’t know, of Solution 2: The Front-Door Criterion backdoors
• A variable satisfies the front- Who Knows door criteria when it blocks Who all paths from X to Y. Knows • In practice, you need a ?? ?? ?? ?? causally identified mediating variable unaffected by anything else.
Independent Independent Cause Mediator Effect • Thus, the influence of the Cause Mediator Effect cause is felt by the effect solely through its mediator.
10 2/10/19
Example: Herbivory Through the Front Front Doors and Instruments Door
Tide Height, Fungal • Instrumental variables are Prevalence, Salinity, those that **only** affect Temperature, etc… Who the cause, and have no Knows relation to anything else
?? ?? • By examining their ?? ?? relationship to both the cause and effect, we can derive an estimate of the causal effect Leafhooper Leafhooper Cordgrass Instrument Cause Effect Abundance Damage Abundance • Also useful when cause and effect involved in a feedback
Two Approaches in Instruments Path Represent Causal Relationships – but how solid is our inference?
ic 1. Fit two models, leveraging • We state that a direct link the fact that Instrument Cause Exogenous IE = IC * CE between two variables Cause implies a causal link via a ie Instrument Effect so, CE = IE/IC dependence relationship
Cause Effect • We estimate the strength of that relationship 2. Fit two models, but only Instrument Cause use fitted values of the Mediator • This is a soft causal claim Fitted cause for the second Effect Cause See ivreg
11 2/10/19
Conditional Independence and Hard Check yourself with daggity Causal Claims
• We assume that two Exogenous variables not connected are Waves Cause independent, conditioned on their parent influences forest_mod <- dagitty("dag{ waves -> kelp -> algae algae -> inverts Cause Effect Mediator⊥Exogenous Cause | Cause Kelp Algae waves -> algae }") • This is a HARD causal claim, Mediator setting a path to 0 Invertebrates • Testable
Check yourself with daggity Exercise: What are your independence relationships? X1 > impliedConditionalIndependencies(forest_mod)
Waves Y2 Y3 inverts _||_ kelp | algae inverts _||_ waves | algae Y1 Y4 Kelp Algae
> impliedConditionalIndependencies(g) x1 _||_ y4 | y1, y2, y3 Invertebrates y1 _||_ y2 | x1 y1 _||_ y3 | x1 y2 _||_ y3 | x1
12 2/10/19
Making Sure Pieces of your Model are Overview Causal 1. What is Causality? • Are there omitted variables? 2. Benefits of thinking in causal models over • If so, are they collinear with included variables? multiple regression • Can you shut the back door? • Can you shut the front door? 3. Causal Identification • Can I support all causal independence statements? 4. Choosing how to design a model • Be bold yet honest about causal interpretations! – Science advances by others noticing what you left out 5. Starting with Meta-Models
6. Realizing Your Model
The Continuum of SEM What are the purpose of your modeling?
• Discovery? structural equation modeling Sun Star Lobster Kelp Bass
exploratory/ confirmatory/ model-building hypothesis-testing/ • Hypothesis applications model-comparison White Purple Red applications testing? Urchins Urchins Urchins
Your goals will inform how you build your model • Making Giant Kelp Other Algae predictions?
13 2/10/19
What are the purpose of your modeling? What are the purpose of your modeling?
Predators Top-Down? • Discovery? • Discovery? Species Herbivores Richness
Giant Kelp Algal Cover Understory Kelp • Hypothesis • Hypothesis Summer Kelp Herbivore Upwelling Upwelling Nutrients Spring testing? Interaction testing? Kelp
Predators Bottom-Up? Winter Wave Wave*Kelp Last Year s Disturbance Interaction Kelp • Making Herbivores • Making High Kelp Low kelp predictions? Giant Kelp Algal Cover Understory Kelp predictions? High waves ? ? Herbivore Upwelling Low Waves ? ? Upwelling Nutrients Interaction
What is the focus of your modeling? What is the focus of your modeling?
• Driver • Driver Lobster Sun Star Lobster Kelp Bass
• Response • Response
White Purple Red White Purple Red Urchins Urchins Urchins Urchins Urchins Urchins • Mediation • Mediation
Giant Kelp Other Algae Upwelling
• Theory Testing • Theory Testing Giant Kelp
14 2/10/19
What is the focus of your modeling? What is the focus of your modeling?
• Driver Wolf Spider • Driver
Tythus • Response Beetle • Response
• Mediation Planthopper • Mediation
Cardinale et al. 2008 • Theory Testing • Theory Testing Marsh Cordgrass
What is the span of your inference? What is the span of your inference?
Proximity to • Local Wolf Spider • Local Wolf Spider b Spider Farm b estimation estimation a Tythus a Tythus Beetle Beetle
d d • Learning about Planthopper • Learning about Planthopper
processes e processes e
Marsh Marsh Site Salinity Cordgrass Cordgrass
What are a,b,c,d, and e in *THIS* marsh? Across marshes, what is the relative importance (e.g., for biocontrol) of a versus b*d versus site-influences?
15 2/10/19
What are you doing this week? Overview 1. What is Causality? Purpose of modeling effort: - discovery? - testing hypotheses? 2. Benefits of thinking in causal models over - making predictions? multiple regression Focus of modeling effort: - driver focused? 3. Causal Identification - response focused? - mediation focused? 4. Choosing how to design a model - theory testing focused?
Span of inference: 5. Starting with Meta-Models - doing inferential estimation? - learning about processes? 6. Realizing Your Model
SEM as a Unifying Process Model Building
Theory 1. What is Causality? Model Specification 2. Causal Building Blocks 3. Research Goals and Model Structure Measuring & Sampling 4. Starting with Meta-Models Estimation 5. Realizing Your Model Assessment Model of Fit Modification
Interpretation After Grace & Bollen 2005
16 2/10/19
The Structural Equation Meta-Model My Research Program as a Meta-Model (SEMM)
Food Web Ecosystem Structure Function Concept A Causal Process 1 Concept C
Causal Causal Process 2 Process 3
Concept B Environmental Change
Targeting Your Question Targeting Your Question
Focused on driver & Food Web Food Web Ecosystem Mediator Structure Structure Function
Foundation Species Abundance
Environmental EnvironmentalClimate Change Change
17 2/10/19
Identify Relevant Processes
Food Web Structure
Habitat, Food, Light
Foundation Waves Species Abundance
Waves
Climate Change
Do you need to shut a conceptual backdoor? What did I want to do?
Food Web Structure Purpose of modeling effort: Food Web - testing hypotheses Structure - making predictions Foundation Species Focus of modeling effort: Foundation Abundance - driver focused Species Abundance - mediation focused
Span of inference: Wave Local - learning about processes Wave Local Disturbance Environment Disturbance Environment
18 2/10/19
Overview
1. What is Causality?
2. Benefits of thinking in causal models over Meta-Model Your Research multiple regression 3. Causal Identification
4. Choosing how to design a model
5. Starting with Meta-Models
6. Realizing Your Model
Complex Systems are Complex Sampling of Rocky Reefs 2000-2009
19 2/10/19
Matching Data to Concepts Adding Biological Realism
Food Web Food Web Network Metrics Structure
Problem 1: Kelp moderates disturbance Foundation Summer – More Kelp = Smaller Disturbance? Species % Rocky Reef Kelp Counts – Abundance Kelp in BUT no effect on kelp that isn't present… Prev Year
Wave Local Disturbance Environment Winter Max Wave Heights
Solution 1: Moderation But: Maintain Backdoor Blockage!
Food Web Food Web Network Metrics Food Web Structure Structure
Rock??? Local Abiotic Environment Summer Foundation Moderation is an Foundation Kelp Counts Species interaction effect Species Abundance Abundance Winter Max Kelp in Wave Heights Prev Year
Wave Local Biotic Wave Local Biotic Disturbance Environment Disturbance Environment
20 2/10/19
Measuring Realized Disturbance via Satellite Natural History Creates Problem Measurements
Problem 2: Kelp regrows quickly – It’s a jungle by summer if nutrients are present – Need to see if kelp was actually removed in winter!
Cavanaugh et al 2011 MEPS
Incorporate Natural History of Model with Observed Variables Disturbance Food Web Food Web Network Structure Metrics
Recovered Foundation Local Abiotic Summer Kelp Rock Species Abundance Environment Abundance (#/sq m) % Cover
Post-Disturbance Satellite Derived Foundation Species Kelp Cover (# Pixels/250m2)
Wave Local Biotic Maximum Winter Previous Year Wave Force (orbital Summer Kelp Disturbance Environment velocity) Abundance (#/sq m)
21 2/10/19
Goal: Hypothesis Evaluation The Process of Model Building
Food Web Network Metrics 1. Make a conceptual meta-model
Summer Kelp Abundance (#/sq m) 3&4
2&4 2. Ensure meta-model’s causal structure meets Satellite Derived your research goals Kelp Cover
Previous Year Maximum Winter Summer Kelp 3. Reify your model based on system natural Wave Force Abundance history (a bigger model!) and available data 1. Full Model 2. No waves -> Food Web effect 3. No effect of previous year's kelp -> Food Web 4. No effect of either waves or previous kelp -> Food Web 4. Ensure causal structure is still intact
Make your model based on data!
22