Causation, Prediction, and Search
Total Page:16
File Type:pdf, Size:1020Kb
Causation, Prediction, and Search Peter Spirtes, Clark Glymour, and Richard Scheines Department of Philosophy Carnegie Mellon University Copyright © by Peter Spirtes, Clark Glymour, and Richard Scheines All rights reserved. No portion of this book may be reproduced for commercial use, by any process or tchnique, without the express consent of the authors. To my parents, Morris and Cecile Spirtes - P.S. In memory of Lucille Lynch Schwartz Watkins Speede Tindall Preston - C. G. To Martha, for her support and love - R.S. It is with data affected by numerous causes that Statistics is mainly concerned. Experiment seeks to disentangle a complex of causes by removing all but one of them, or rather by concentrating on the study of one and reducing the others, as far as circumstances permit, to comparatively small residium. Statistics, denied this resource, must accept for analysis data subject to the influence of a host of causes, and must try to discover from the data themselves which causes are the important ones and how much of the observed effect is due to the operation of each. --G. U. Yule and M. G. Kendall 1950 The Theory of Estimation discusses the principles upon which observational data may be used to estimate, or to throw light upon the values of theoretical quantities, not known numerically, which enter into our specification of the causal system operating. -- Sir Ronald Fisher, 1956 George Box has [almost] said "The only way to find out what will happen when a complex system is disturbed is to disturb the system, not merely to observe it passively." These words of caution about "natural experiments" are uncomfortably strong. Yet in today's world we see no alternative to accepting them as, if anything, too weak. --G. Mosteller and J. Tukey, 1977 Causal inference is one of the most important, most subtle, and most neglected of all the problems of Statistics. -- P. Dawid, 1979 Table of Contents Preface ................................................................................................................................vii Acknowledgments ..............................................................................................................xiii Notational Conventions .....................................................................................................xxi 1. Introduction and Advertisement ..................................................................................1 1.1 The Issue............................................................................................................1 1.2 Advertisements ..................................................................................................10 1.2.1 Bayes Networks from the Data...........................................................11 1.2.2 Structural Equation Models from the Data.........................................13 1.2.3 Selection of Regressors.......................................................................14 1.2.4 Causal Inference without Experiment ................................................17 1.2.5 The Structure of the Unobserved........................................................19 1.3 Themes...............................................................................................................21 2. Formal Preliminaries.....................................................................................................25 2.1 Graphs................................................................................................................25 2.2 Probability..........................................................................................................31 2.3 Graphs and Probability Distributions ................................................................32 2.3.1 Directed Acyclic Graphs.....................................................................32 2.3.2 Directed Independence Graphs...........................................................34 2.3.3 Faithfulness.........................................................................................35 2.3.4 d-separation.........................................................................................36 2.3.5 Linear Structures.................................................................................36 2.4 Undirected Independence Graphs......................................................................37 2.5 Deterministic and Pseudo-Indeterministic Systems ..........................................38 2.6 Background Notes .............................................................................................39 3. Causation and Prediction: Axioms and Explications .................................................41 3.1 Conditionals.......................................................................................................41 3.2 Causation ...........................................................................................................42 3.2.1 Direct vs. Indirect Causation ..............................................................42 3.2.2 Events and Variables ..........................................................................43 3.2.3 Examples.............................................................................................45 3.2.4 Representing Causal Relations with Directed Graphs........................47 3.3 Causality and Probability...................................................................................49 3.3.1 Deterministic Causal Structures .........................................................49 3.3.2 Pseudo-Indeterministic and Indeterministic Causal Structures ..........51 3.4 The Axioms .......................................................................................................53 3.4.1 The Causal Markov Condition............................................................53 3.4.2 The Causal Minimality Condition ......................................................55 3.4.3 The Faithfulness Condition.................................................................56 3.5 Discussion of the Conditions .............................................................................57 3.5.1 The Causal Markov and Minimality Conditions ................................57 3.5.2 Faithfulness and Simpson's Paradox...................................................64 3.6 Bayesian Interpretations ....................................................................................70 3.7 Consequences of The Axioms ...........................................................................71 3.7.1 d-Separation........................................................................................71 3.7.2 The Manipulation Theorem ................................................................75 3.8 Determinism ......................................................................................................81 3.9 Background Notes .............................................................................................86 4. Statistical Indistinguishability ......................................................................................87 4.1 Strong Statistical Indistinguishability................................................................88 4.2 Faithful Indistinguishability...............................................................................89 4.3 Weak Statistical Indistinguishability .................................................................90 4.4 Rigid Indistinguishability ..................................................................................93 4.5 The Linear Case.................................................................................................94 4.6 Redefining Variables .........................................................................................99 4.7 Background Notes .............................................................................................101 5. Discovery Algorithms for Causally Sufficient Structures..........................................103 5.1 Discovery Problems...........................................................................................103 5.2 Search Strategies in Statistics ............................................................................104 5.2.1 The Wrong Hypothesis Space ............................................................105 5.2.2 Computational and Statistical Limitations..........................................107 5.2.3 Generating a Single Hypothesis..........................................................108 5.2.4 Other Approaches ...............................................................................109 5.2.5 Bayesian Methods...............................................................................109 5.3 The Wermuth-Lauritzen Algorithm...................................................................111 5.4 New Algorithms.................................................................................................112 5.4.1 The SGS Algorithm ............................................................................114 5.4.2 The PC Algorithm...............................................................................116 5.4.3 The IG (Independence Graph) Algorithm ..........................................124 5.4.4 Variable Selection...............................................................................125 5.4.5 Incorporating Background Knowledge...............................................127 5.5 Statistical Decisions...........................................................................................128