BOOK REVIEW

The Book of Why
A review by Lisa R. Goldberg

The Book of Why: The New Science of Cause and Effect
Judea Pearl and Dana Mackenzie
Basic Books, 2018
432 pages
ISBN-13: 978-0465097609

Lisa Goldberg is a co-director of the Consortium for Data Analytics in Risk and an adjunct professor of Economics and Statistics at University of California, Berkeley. She is a director of research at Aperio Group, LLC. Her email address is [email protected].
Communicated by Notices Book Review Editor Stephan Ramon Garcia.
For permission to reprint this article, please contact: [email protected].
DOI: https://doi.org/10.1090/noti1912
Notices of the American Mathematical Society, Volume 66, Number 7, August 2019, pages 1093-1094.

Judea Pearl is on a mission to change the way we interpret data. An eminent professor of computer science, Pearl has documented his research and opinions in scholarly books and papers. Now, he has made his ideas accessible to a broad audience in The Book of Why: The New Science of Cause and Effect, co-authored with science writer Dana Mackenzie. With the release of this historically grounded and thought-provoking book, Pearl leaps from the ivory tower into the real world.

The Book of Why takes aim at perceived limitations of observational studies, whose underlying data are found in nature and not controlled by researchers. Many believe that an observational study can elucidate association but not cause and effect. It cannot tell you why.

Perhaps the most famous example concerns the impact of smoking on health. By the mid-1950s, researchers had established a strong association between smoking and lung cancer. Only in 1984, however, did the US government mandate the phrase “smoking causes lung cancer.”

The holdup was the specter of a latent factor, perhaps something genetic, that might cause both lung cancer and a craving for tobacco. If the latent factor were responsible for lung cancer, limiting cigarette smoking would not prevent the disease. Naturally, tobacco companies were fond of this explanation, but it was also advocated by the prominent statistician Ronald A. Fisher, co-inventor of the so-called gold standard of experimentation, the Randomized Controlled Trial (RCT).

Subjects in an RCT on smoking and lung cancer would have been assigned to smoke or not on the flip of a coin. The study had the potential to disqualify a latent factor as the primary cause of lung cancer and elevate cigarettes to the leading suspect. Since a smoking RCT would have been unethical, however, researchers made do with observational studies showing association, and demurred on the question of cause and effect for decades.

Was the problem simply that the tools available in the 1950s and 1960s were too limited in scope? Pearl addresses that question in his three-step Ladder of Causation, which organizes inferential methods in terms of the problems they can solve. The bottom rung is for model-free statistical methods that rely strictly on association or correlation. The middle rung is for interventions that allow for the measurement of cause and effect. The top rung is for counterfactual analysis, the exploration of alternative realities.

Early scientific inquiries about the relationship between smoking and lung cancer relied on the bottom rung, model-free statistical methods whose modern analogs dominate the analysis of observational studies today. In one of The Book of Why’s many wonderful historical anecdotes, the predominance of these methods is traced to the work of Francis Galton, who discovered the principle of regression to the mean in an attempt to understand the process that drives heredity of human characteristics. Regression to the mean involves association, and this led Galton and his disciple, Karl Pearson, to conclude that association was more central to science than causation.

Pearl places deep learning and other modern data mining tools on the bottom rung of the Ladder of Causation. Bottom rung methods include AlphaGo, the deep learning program that defeated the world’s best human Go players in 2015 and 2016 [1]. For the benefit of those who remember the ancient times before data mining changed everything, he explains,

    The successes of deep learning have been truly remarkable and have caught many of us by surprise. Nevertheless, deep learning has succeeded primarily by showing that certain questions or tasks we thought were difficult are in fact not.

The issue is that algorithms, unlike three-year-olds, do as they are told, but in order to create an algorithm capable of causal reasoning,

    ...we have to teach the computer how to selectively break the rules of logic. Computers are not good at breaking rules, a skill at which children excel.

Methods for extracting causal conclusions from observational studies are on the middle rung of Pearl’s Ladder of Causation, and they can be expressed in a mathematical language that extends classical statistics and emphasizes graphical models.

    Various options exist for causal models: causal diagrams, structural equations, logical statements, and so forth. I am strongly sold on causal diagrams for nearly all applications, primarily due to their transparency but also due to the explicit answers they provide to many of the questions we wish to ask.

The use of graphical models to determine cause and effect in observational studies was pioneered by Sewall Wright, whose work on the effects of birth weight, litter size, length of gestation period, and other variables on the weight of a 33-day-old guinea pig is in [2]. Pearl relates Wright’s persistence in response to the cold reception his work received from the scientific community.

    My admiration for Wright’s precision is second only to my admiration for his courage and determination. Imagine the situation in 1921. A self-taught mathematician faces the hegemony of the statistical establishment alone. They tell him “Your method is based on a complete misapprehension of the nature of causality in the scientific sense.” And he retorts, “Not so! My method is important and goes beyond anything you can generate.”

Pearl defines a causal model to be a directed acyclic graph that can be paired with data to produce quantitative causal estimates. The graph embodies the structural relationships that a researcher assumes are driving empirical results. The structure of the graphical model, including the identification of vertices as mediators, confounders, or colliders, can guide experimental design through the identification of minimal sets of control variables. Modern expositions on graphical cause and effect models are [3] and [4].

Within this framework, Pearl defines the do operator, which isolates the impact of a single variable from other effects. The probability of $Y$ do $X$, $P[Y \mid \mathrm{do}(X)]$, is not the same thing as the conditional probability of $Y$ given $X$. Rather, $P[Y \mid \mathrm{do}(X)]$ is estimated in a mutated causal model, from which arrows pointing into the assumed cause are removed. Confounding is the difference between $P[Y \mid \mathrm{do}(X)]$ and $P[Y \mid X]$. In the 1950s, researchers were after the former but could estimate only the latter in observational studies. That was Ronald A. Fisher’s point.

Figure 1 depicts a simplified relationship between smoking and lung cancer. Directed edges represent assumed causal relationships, and the smoking gene is represented by an empty circle, indicating that the variable was not observable when the connection between smoking and cancer was in question. Filled circles represent quantities that could be measured, like rates of smoking and lung cancer in a population. Figure 2 shows the mutated causal model that isolates the impact of smoking on lung cancer.

Figure 1. Causal model of assumed relationships among smoking, lung cancer, and a smoking gene.

Figure 2. Mutated causal model facilitating the calculation of the effect of smoking on lung cancer. The arrow from the confounding smoking gene to the act of smoking is deleted.

The conclusion that smoking causes lung cancer was eventually reached without appealing to a causal model. A crush of evidence, including the powerful sensitivity analysis developed in [5], ultimately swayed opinion. Pearl argues that his methods, had they been available, might have resolved the issue sooner. Pearl illustrates his point in a hypothetical setting where smoking causes cancer only by depositing tar in lungs. The corresponding causal diagram is shown in Figure 3. His front door formula corrects for the confounding of the unobservable smoking gene without ever mentioning it. The bias-corrected impact of smoking, $X$, on lung cancer, $Y$, can be expressed

$$P[Y \mid \mathrm{do}(X)] = \sum_{Z} P[Z \mid X] \sum_{X'} P[Y \mid X', Z] \, P[X'].$$

... fundamentally changed our understanding of how we make decisions. Pearl draws on the work of Kahneman and Tversky in The Book of Why, and Pearl’s approach to analyzing counterfactuals might be best explained in terms of a question that Kahneman and Tversky posed in their study [10] of how we explore alternative realities.

    How close did Hitler’s scientists come to developing the atom bomb in World War II? If they had developed it in February 1945, would the outcome of the war have been different?
    -- The Simulation Heuristic

Pearl’s response to this question includes the probability of necessity for Germany and its allies to have won World War II had they developed the atom bomb in 1945, given our historical knowledge that they did not have an atomic bomb in February 1945 and lost the war.
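
The front door formula for the smoking example can be checked numerically on a toy model. The sketch below is an illustration only and does not come from the book or from this review. It assumes four binary variables (an unobserved gene `U`, smoking `X`, tar `Z`, cancer `Y`) wired as in the hypothetical setting where smoking acts only through tar, and every probability in it is made up. It computes the observed joint distribution of `X`, `Z`, `Y` with the gene summed out, then compares three quantities: the plain conditional probability, the front door estimate built only from observed quantities, and the interventional probability computed directly from the full model.

```python
from itertools import product

# Hypothetical parameters, chosen only for illustration (not from the book).
P_U = {0: 0.7, 1: 0.3}             # P[U=u], the unobserved "smoking gene"
P_X1_GIVEN_U = {0: 0.2, 1: 0.8}    # P[X=1 | U=u], smoking depends on the gene
P_Z1_GIVEN_X = {0: 0.05, 1: 0.9}   # P[Z=1 | X=x], tar depends only on smoking
P_Y1_GIVEN_ZU = {                  # P[Y=1 | Z=z, U=u], cancer depends on tar and gene
    (0, 0): 0.05, (0, 1): 0.30,
    (1, 0): 0.25, (1, 1): 0.60,
}


def bern(p_one, value):
    """Probability that a binary variable with P[1]=p_one takes `value`."""
    return p_one if value == 1 else 1.0 - p_one


def observed_joint():
    """Joint P[X, Z, Y] with the unobserved U summed out."""
    joint = {}
    for x, z, y in product((0, 1), repeat=3):
        joint[(x, z, y)] = sum(
            P_U[u]
            * bern(P_X1_GIVEN_U[u], x)
            * bern(P_Z1_GIVEN_X[x], z)
            * bern(P_Y1_GIVEN_ZU[(z, u)], y)
            for u in (0, 1)
        )
    return joint


def prob(joint, **fixed):
    """Probability of the event that fixes some of x, z, y to given values."""
    names = ("x", "z", "y")
    return sum(
        p for key, p in joint.items()
        if all(key[names.index(n)] == v for n, v in fixed.items())
    )


def conditional(joint, x):
    """Bottom-rung quantity P[Y=1 | X=x]."""
    return prob(joint, x=x, y=1) / prob(joint, x=x)


def front_door(joint, x):
    """Front door estimate of P[Y=1 | do(X=x)] from observed quantities only."""
    total = 0.0
    for z in (0, 1):
        p_z_given_x = prob(joint, x=x, z=z) / prob(joint, x=x)
        inner = sum(
            prob(joint, x=xp, z=z, y=1) / prob(joint, x=xp, z=z) * prob(joint, x=xp)
            for xp in (0, 1)
        )
        total += p_z_given_x * inner
    return total


def truth_do(x):
    """P[Y=1 | do(X=x)] from the full model: the arrow U -> X is cut."""
    return sum(
        P_U[u] * bern(P_Z1_GIVEN_X[x], z) * P_Y1_GIVEN_ZU[(z, u)]
        for u in (0, 1)
        for z in (0, 1)
    )


if __name__ == "__main__":
    joint = observed_joint()
    for x in (0, 1):
        print(x, conditional(joint, x), front_door(joint, x), truth_do(x))
```

In this toy model the front door estimate agrees with the interventional probability even though it never touches `U`, while the plain conditional probability does not; the gap between the two is the confounding described above. The agreement depends on the assumed structure (tar fully mediates the effect of smoking and is not itself influenced by the gene), which is a modeling assumption of the sketch.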