Building Causal Models
Total Page:16
File Type:pdf, Size:1020Kb
2/10/19 Building Causal Models Overview 1. What is Causality? 2. Benefits of thinking in causal models over multiple regression 3. Causal Identification 4. Choosing how to design a model 5. Starting with Meta-Models 6. Realizing Your Model 1 2/10/19 Correlation does not equal causation… but where there’s smoke, there’s fire. -Jim Grace xkcd Pearl’s Ladder of Causality 1. Observation: What can we learn? 3. Counterfactual – Can imagine what would happen under unobserved conditions - Requires model of a system - Requires identification of causality 2. Intervention – Understand what happens you do something - Experiments - Provides evidence of causal link 1. Observation – Cause is associated with effect - Correlation - Can only predict within the range of data Pearl and Mackenzie 2018 http://tylervigen.com/ 2 2/10/19 What We Want to Evaluate The World Exogenous Cause Cause Effect Cause Effect Mediator Cause 2 Intervention: Experiments! What can we Experiments: Manipulate Cause of learn? Interest Exogenous Cause=0 Exogenous Cause=0 X X X Cause Effect Cause = 0,1,2,… Effect X X Mediator=0 X Mediator=0 X Cause 2=0 Cause 2=0 3 2/10/19 Reality Check: Lots of Things Happen in Build The World to Make Counterfactual an Experiment Predictions Exogenous Cause HowCause much of this do we Effectneed to meaningfully understand if cause → effect? Mediator Cause 2 Overview Why move beyond multiple regression to causal models? 1. What is Causality? • We estimate the effect of exogenous variables 2. Benefits of thinking in causal models over controlling for all others multiple regression X1 • Covariances implied 3. Causal Identification X2 Y • Not controlling for the 4. Choosing how to design a model right variables = bad inference X3 5. Starting with Meta-Models • Controlling for the wrong 6. Realizing Your Model variables = bad inference 4 2/10/19 The Chain The Collider Causal Model Causal Model Be wary of this with post-treatment effects! Cause Mediator Effect Cause 1 Effect Cause 2 What it can do to Multiple Regression What it would do to Multiple Regression Cause • Mediator blocks cause Cause 1 • Conditions on effect, opening path between Effect • Looks like no or Cause 2 causes Mediator opposite link between Effect cause and effect • Creates spurious correlations Example of Collider Bias The Descendant • My cat jumps on my shoulders when Causal Model she is lonely, on a shelf next to me, or both. Cause Mediator Effect • It’s more likely she is satisfying one of those conditions at any one time, rather than both Descendant cat on shoulders What it would do to Multiple Regression • Thus, if she jumped from a shelf to my • Descendant still partially shoulders, I know she is likely not lonely Cause blocks cause On a Shelf? • I’d falsely conclude there’s a relationship Effect • Masks effect of cause – many possibilities between being on a shelf and loneliness Descendant • Same with descendent of Lonely? collider 5 2/10/19 The Fork The Bigger Problem… Causal Model Exogenous Cause Causal Model Exogenous Cause Cause Effect Cause Effect What it would do to Multiple Regression Univariate Regression Cause • • Effect Nothing – this is great! Cause Effect The relationship is contaminated – results Exogenous unreliable! Cause Omitted Variable Bias Overview 1. What is Causality? • We assume that sampling means that omitted variables Exogenous average to 0 2. Benefits of thinking in causal models over Cause – Omission produces downward bias in SE of coefficients multiple regression • But, if omitted variables are Cause Effect correlated causally with a 3. Causal Identification predictor, they likely are not averaged out 4. Choosing how to design a model • This will bias your estimates – You will not know in what 5. Starting with Meta-Models direction 6. Realizing Your Model 6 2/10/19 Bigger Problem Causal Identification All of your model Exogenous This Exogenous need not be causally Cause Cause identified - some may path describe associations Cause Effect is Cause Effect not But, you can only causally make counterfactual statements about identified parts that are Causal Identification How do we solve this problem? Causal identification Exogenous does not require Exogenous This Cause knowing ULTIMATE Cause cause path Cause Effect Cause Effect is Nor does it require not knowing exact causally mechanisms within a causal pathway identified 7 2/10/19 Solution 1: The Backdoor Criteria Proximate Backdoors • If we want to know the link between cause and effect Exogenous Cause Exogenous given variables that affect Cause both, we must include all variables with a path into the cause Proximate Cause Cause Effect • So, variables must block all backdoor paths from treatment to outcome • AND variables must not be Cause Effect descendants of the cause (i.e., no mediators – see the pipe!) Often we only have proximate variables in a backdoor path. Controlling for just them is sufficient. What Variables Block the Back Door? Space and Time Live in the Backdoor Site, Spatial and X1 Temporal Covariates and Lags Y2 Y3 Y1 Y4 Cause Effect There are two ways to build a multiple regression with closed back doors to determine if Y1 -> Y4. What are they? We will talk more of this with respect to random effects, and autocorrelation on day 4! 8 2/10/19 Finding Backdoors with dagitty Building a DAG with dagitty X1 • Great package for graph prototyping • Many ways to analyze graphs as well! Y2 Y3 To build a DAG Y1 Y4 g <- dagitty("dag{ g <- dagitty("dag{ x1 -> y1 ... x1 -> y2 }") x1 -> y3 y3 -> y4 y2 -> y4 y1 -> y4 }") Plot your DAG! How to Shut the Back Door X1 Y2 Y3 Y1 Y4 > adjustmentSets(g, exposure = "y1", outcome = "y4") { y2, y3 } plot(graphLayout(g)) { x1 } 9 2/10/19 Sometimes We Cannot Shut the Exercise: daggity Backdoor • Sketch a model of 4-5 variables in your system – Don’t think to hard (that’s for later!) Billion Dollar Environmental Covariates • See if you can figure out how to close any backdoors • Use daggity to find the back doors between a chosen pair Cause Effect n.b. can represent chains as: a -> b -> c ->d or colliders as: a -> b <- c Or, we suspect, but don’t know, of Solution 2: The Front-Door Criterion backdoors • A variable satisfies the front- Who Knows door criteria when it blocks Who all paths from X to Y. Knows • In practice, you need a ?? ?? ?? ?? causally identified mediating variable unaffected by anything else. Independent Independent Cause Mediator Effect • Thus, the influence of the Cause Mediator Effect cause is felt by the effect solely through its mediator. 10 2/10/19 Example: Herbivory Through the Front Front Doors and Instruments Door Tide Height, Fungal • Instrumental variables are Prevalence, Salinity, those that **only** affect Temperature, etc… Who the cause, and have no Knows relation to anything else ?? ?? • By examining their ?? ?? relationship to both the cause and effect, we can derive an estimate of the causal effect Leafhooper Leafhooper Cordgrass Instrument Cause Effect Abundance Damage Abundance • Also useful when cause and effect involved in a feedback Two Approaches in Instruments Path Represent Causal Relationships – but how solid is our inference? ic 1. Fit two models, leveraging • We state that a direct link the fact that Instrument Cause Exogenous IE = IC * CE between two variables Cause implies a causal link via a ie Instrument Effect so, CE = IE/IC dependence relationship Cause Effect • We estimate the strength of that relationship 2. Fit two models, but only Instrument Cause use fitted values of the Mediator • This is a soft causal claim Fitted cause for the second Effect Cause See ivreg 11 2/10/19 Conditional Independence and Hard Check yourself with daggity Causal Claims • We assume that two Exogenous variables not connected are Waves Cause independent, conditioned on their parent influences forest_mod <- dagitty("dag{ waves -> kelp -> algae algae -> inverts Cause Effect Mediator⊥Exogenous Cause | Cause Kelp Algae waves -> algae }") • This is a HARD causal claim, Mediator setting a path to 0 Invertebrates • Testable Check yourself with daggity Exercise: What are your independence relationships? X1 > impliedConditionalIndependencies(forest_mod) Waves Y2 Y3 inverts _||_ kelp | algae inverts _||_ waves | algae Y1 Y4 Kelp Algae > impliedConditionalIndependencies(g) x1 _||_ y4 | y1, y2, y3 Invertebrates y1 _||_ y2 | x1 y1 _||_ y3 | x1 y2 _||_ y3 | x1 12 2/10/19 Making Sure Pieces of your Model are Overview Causal 1. What is Causality? • Are there omitted variables? 2. Benefits of thinking in causal models over • If so, are they collinear with included variables? multiple regression • Can you shut the back door? • Can you shut the front door? 3. Causal Identification • Can I support all causal independence statements? 4. Choosing how to design a model • Be bold yet honest about causal interpretations! – Science advances by others noticing what you left out 5. Starting with Meta-Models 6. Realizing Your Model The Continuum of SEM What are the purpose of your modeling? • Discovery? structural equation modeling Sun Star Lobster Kelp Bass exploratory/ confirmatory/ model-building hypothesis-testing/ • Hypothesis applications model-comparison White Purple Red applications testing? Urchins Urchins Urchins Your goals will inform how you build your model • Making Giant Kelp Other Algae predictions? 13 2/10/19 What are the purpose of your modeling? What are the purpose of your modeling? Predators Top-Down? • Discovery? • Discovery? Species Herbivores Richness Giant Kelp Algal Cover Understory Kelp • Hypothesis • Hypothesis Summer