
Forthcoming, Communications of the Association for Computing Machinery. TECHNICAL REPORT R-481, November 2018

The Seven Tools of Causal Inference with Reflections on Machine Learning

JUDEA PEARL, UCLA Computer Science Department, USA

ACM Reference Format: Judea Pearl. 2018. The Seven Tools of Causal Inference with Reflections on Machine Learning. 1, 1 (November 2018), 6 pages. https://doi.org/10.1145/nnnnnnn.nnnnnnn

The dramatic success in machine learning has led to an explosion of AI applications and increasing expectations for autonomous systems that exhibit human-level intelligence. These expectations, however, have met with fundamental obstacles that cut across many application areas. One such obstacle is adaptability or robustness. Machine learning researchers have noted that current systems lack the capability of recognizing or reacting to new circumstances they have not been specifically programmed or trained for. Intensive theoretical and experimental efforts toward "transfer learning," "domain adaptation," and "lifelong learning" [Chen and Liu 2016] are reflective of this obstacle.

Another obstacle is explainability, that is, "machine learning models remain mostly black boxes" [Ribeiro et al. 2016], unable to explain the reasoning behind their predictions or recommendations, thus eroding users' trust and impeding diagnosis and repair. See [Marcus 2018] and ⟨http://www.sciencemag.org/news/2018/05/ai-researchers-allege-machine-learning-alchemy⟩.

A third obstacle concerns the understanding of cause-effect connections. This hallmark of human cognition [Lake et al. 2015; Pearl and Mackenzie 2018] is, in this author's opinion, a necessary (though not sufficient) ingredient for achieving human-level intelligence. This ingredient should allow computer systems to choreograph a parsimonious and modular representation of their environment, interrogate that representation, distort it by acts of imagination and finally answer "What if?" kind of questions. Examples are interventional questions: "What if I make it happen?" and retrospective or explanatory questions: "What if I had acted differently?" or "What if my flight had not been late?" Such questions cannot be articulated, let alone answered, by systems that operate in purely statistical mode, as do most learning machines today.

I postulate that all three obstacles mentioned above require equipping machines with causal modeling tools, in particular, causal diagrams and their associated logic. Advances in graphical and structural models have made counterfactuals computationally manageable and thus rendered causal reasoning a viable component in support of strong AI.

In the next section, I will describe a three-level hierarchy that restricts and governs inferences in causal reasoning. The final section summarizes how traditional impediments are circumvented using modern tools of causal inference. In particular, I will present seven tasks which are beyond the reach of associational learning systems and which have been accomplished using the tools of causal modeling.

THE THREE-LEVEL CAUSAL HIERARCHY

A useful insight unveiled by the theory of causal models is the classification of causal information in terms of the kind of questions that each class is capable of answering. The classification forms a 3-level hierarchy in the sense that questions at level i (i = 1, 2, 3) can only be answered if information from level j (j ≥ i) is available.

Figure 1 shows the 3-level hierarchy, together with the characteristic questions that can be answered at each level. The levels are titled 1. Association, 2. Intervention, and 3. Counterfactual. The names of these layers were chosen to emphasize their usage. We call the first level Association, because it invokes purely statistical relationships, defined by the naked data.¹ For instance, observing a customer who buys toothpaste makes it more likely that he/she buys floss; such an association can be inferred directly from the observed data using conditional expectation. Questions at this layer, because they require no causal information, are placed at the bottom level of the hierarchy. Answering these questions is the hallmark of current machine learning methods. The second level, Intervention, ranks higher than Association because it involves not just seeing what is, but changing what we see. A typical question at this level would be: What will happen if we double the price? Such questions cannot be answered from sales data alone, because they involve a change in customers' choices, in reaction to the new pricing. These choices may differ substantially from those taken in previous price-raising situations (unless we replicate precisely the market conditions that existed when the price reached double its current value). Finally, the top level is called Counterfactuals, a mode of reasoning that goes back to the philosophers David Hume and John Stewart Mill, and which has been given computer-friendly semantics in the past two decades. A typical question in the counterfactual category is "What if I had acted differently?", thus necessitating retrospective reasoning.

Counterfactuals are placed at the top of the hierarchy because they subsume interventional and associational questions. If we have a model that can answer counterfactual queries, we can also answer questions about interventions and observations. For example, the interventional question "What will happen if we double the price?" can be answered by asking the counterfactual question "What would happen had the price been twice its current value?" Likewise, associational questions can be answered once we can answer interventional questions; we simply ignore the action part and let observations take over.

Author's address: Judea Pearl, UCLA Computer Science Department, 4532 Boelter Hall, Los Angeles, California, 90095-1596, USA, [email protected].

© 2018 Association for Computing Machinery. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in Communications of the Association for Computing Machinery, https://doi.org/10.1145/nnnnnnn.nnnnnnn.

¹ Some other terms used in connection to this layer are: "model-free," "model-blind," "black-box," or "data-centric." Darwiche [2017] used "function-fitting," for it amounts to fitting data by a complex function defined by the neural network architecture.


Level 1. Association (Symbol: P(y|x))
   Typical activity: Seeing
   Typical questions: What is? How would seeing X change my belief in Y?
   Examples: What does a symptom tell me about a disease? What does a survey tell us about the election results?

Level 2. Intervention (Symbol: P(y|do(x), z))
   Typical activity: Doing, Intervening
   Typical questions: What if? What if I do X?
   Examples: What if I take aspirin, will my headache be cured? What if we ban cigarettes?

Level 3. Counterfactuals (Symbol: P(y_x | x′, y′))
   Typical activity: Imagining, Retrospection
   Typical questions: Why? Was it X that caused Y? What if I had acted differently?
   Examples: Was it the aspirin that stopped my headache? Would Kennedy be alive had Oswald not shot him? What if I had not been smoking the past 2 years?

Fig. 1. The Causal Hierarchy. Questions at level i can only be answered if information from level i or higher is available.

The translation does not work in the opposite direction. Interventional questions cannot be answered from purely observational information (i.e., from statistical data alone). No counterfactual question involving retrospection can be answered from purely interventional information, such as that acquired from controlled experiments; we cannot re-run an experiment on subjects who were treated with a drug and see how they would behave had they not been given the drug. The hierarchy is therefore directional, with the top level being the most powerful one.

Counterfactuals are the building blocks of scientific thinking as well as legal and moral reasoning. In civil court, for example, the defendant is considered to be the culprit of an injury if, but for the defendant's action, it is more likely than not that the injury would not have occurred. The computational meaning of but for calls for comparing the real world to an alternative world in which the defendant's action did not take place.

Each layer in the hierarchy has a syntactic signature that characterizes the sentences admitted into that layer. For example, the association layer is characterized by conditional probability sentences, e.g., P(y|x) = p, stating that the probability of event Y = y given that we observed event X = x is equal to p. In large systems, such evidential sentences can be computed efficiently using Bayesian Networks, or any number of machine learning techniques.

At the interventional layer we find sentences of the type P(y|do(x), z), which denotes "the probability of event Y = y given that we intervene and set the value of X to x and subsequently observe event Z = z." Such expressions can be estimated experimentally from randomized trials or analytically using Causal Bayesian Networks [Pearl 2000, Chapter 3]. A child learns the effects of interventions through playful manipulation of the environment (usually in a deterministic playground), and AI planners obtain interventional knowledge by exercising their designated sets of actions. Interventional expressions cannot be inferred from passive observations alone, regardless of how big the data.

Finally, at the counterfactual level, we have expressions of the type P(y_x | x′, y′), which stand for "the probability that event Y = y would be observed had X been x, given that we actually observed X to be x′ and Y to be y′." For example, the probability that Joe's salary would be y had he finished college, given that his actual salary is y′ and that he had only two years of college. Such sentences can be computed only when we possess functional or Structural Equation models, or properties of such models [Pearl 2000, Chapter 7].

This hierarchy, and the formal restrictions it entails, explains why machine learning systems, based only on associations, are prevented from reasoning about (novel) actions, experiments and causal explanations.²

² Readers may be tempted to argue that deep learning is not merely "curve fitting," because it attempts to minimize "over fit" (say, through sample-splitting cross-validation) as opposed to maximize "fit." Unfortunately, the theoretical barriers that separate the three layers in the hierarchy tell us that the nature of our objective function does not matter. As long as our system optimizes some property of the observed data, however noble or sophisticated, while making no reference to the world outside the data, we are back to level 1 of the hierarchy, with all the limitations that this level entails.
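To make the force of these restrictions concrete, here is a minimal simulation sketch in Python (added for illustration; it is not part of the original article, and the model, variable names and parameters are invented). It draws data from a confounded model Z → X, Z → Y, X → Y and contrasts the associational quantity P(Y = 1 | X = 1), read directly off the observed data, with the interventional quantity P(Y = 1 | do(X = 1)), obtained by re-running the same mechanism with X set to 1 by fiat:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 200_000

    # Hypothetical model: Z -> X, Z -> Y, X -> Y (all binary), Z a confounder.
    z = rng.binomial(1, 0.5, n)                        # e.g., a lifestyle factor
    x = rng.binomial(1, np.where(z == 1, 0.8, 0.2))    # "exposure" tracks Z
    y = rng.binomial(1, 0.2 + 0.2 * x + 0.5 * z)       # outcome driven by X and Z

    # Level 1 (seeing): condition on X = 1 in the observed data.
    p_seeing = y[x == 1].mean()                        # about 0.80, inflated by Z

    # Level 2 (doing): rerun the same mechanism but *set* X = 1 for everyone,
    # which is what do(X = 1) means; Z keeps its natural distribution.
    y_do = rng.binomial(1, 0.2 + 0.2 * 1 + 0.5 * z)
    p_doing = y_do.mean()                              # about 0.65

    print(f"P(Y=1 | X=1)     = {p_seeing:.3f}")
    print(f"P(Y=1 | do(X=1)) = {p_doing:.3f}")

No amount of additional level-1 data closes this gap; what closes it is knowledge of the mechanism that generated the data, which is precisely what the causal diagram supplies.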
THE SEVEN TOOLS OF CAUSAL INFERENCE (OR WHAT YOU CAN DO WITH A CAUSAL MODEL THAT YOU COULD NOT DO WITHOUT?)

Consider the following five questions:

• How effective is a given treatment in preventing a disease?
• Was it the new tax break that caused our sales to go up?
• What are the annual health-care costs attributed to obesity?
• Can hiring records prove an employer guilty of sex discrimination?
• I am about to quit my job, but should I?

The common feature of these questions is that they are concerned with cause-and-effect relationships. We can recognize them through words such as "preventing," "cause," "attributed to," "discrimination," and "should I." Such words are common in everyday language, and our society constantly demands answers to such questions. Yet, until very recently science gave us no means even to articulate them, let alone answer them. Unlike the rules of geometry, mechanics, optics or probabilities, the rules of cause and effect have been denied the benefits of mathematical analysis.

To appreciate the extent of this denial, readers would be stunned to know that only a few decades ago scientists were unable to write down a mathematical equation for the obvious fact that "mud does not cause rain." Even today, only the top echelon of the scientific community can write such an equation and formally distinguish "mud causes rain" from "rain causes mud."

Things have changed dramatically in the past three decades. A mathematical language has been developed for managing causes and effects, accompanied by a set of tools that turn causal analysis into a mathematical game, like solving algebraic equations, or finding proofs in high-school geometry. These tools permit us to express causal questions formally, codify our existing knowledge in both diagrammatic and algebraic forms, and then leverage our data to estimate the answers. Moreover, the theory warns us when the state of existing knowledge or the available data are insufficient to answer our questions, and then suggests additional sources of knowledge or data to make the questions answerable.

This development has had a transformative impact on all data-intensive sciences, especially social science and epidemiology, in which causal diagrams have become a second language [Morgan and Winship 2015; VanderWeele 2015]. In these disciplines, causal diagrams have helped scientists extract causal relations from associations, and deconstruct paradoxes that have baffled researchers for decades [Pearl and Mackenzie 2018; Porta 2014].

I call the mathematical framework that led to this transformation "Structural Causal Models (SCM)."

The SCM deploys three parts:

(1) Graphical models,
(2) Structural equations, and
(3) Counterfactual and interventional logic.

Graphical models serve as a language for representing what we know about the world, counterfactuals help us to articulate what we want to know, while structural equations serve to tie the two together in a solid semantics.

[Figure 2 schematic. Inputs: Query, Assumptions (graphical model), Data. Outputs: Estimand E_S (recipe for answering the query), Estimate Ê_S (answer to the query), Fit Indices F.]

Fig. 2. How the SCM "inference engine" combines data with a causal model (or assumptions) to produce answers to queries of interest.

Figure 2 illustrates the operation of SCM in the form of an inference engine. The engine accepts three inputs: Assumptions, Queries, and Data, and produces three outputs: Estimand, Estimate and Fit indices. The Estimand (E_S) is a mathematical formula that, based on the Assumptions, provides a recipe for answering the Query from any hypothetical data, whenever they are available. After receiving the Data, the engine uses the Estimand to produce an actual Estimate (Ê_S) for the answer, along with statistical estimates of the confidence in that answer (to reflect the limited size of the data set, as well as possible measurement errors or missing data). Finally, the engine produces a list of "fit indices" which measure how compatible the data are with the Assumptions conveyed by the model.

To exemplify these operations, let us assume that our Query stands for the causal effect of X (taking a drug) on Y (recovery), written Q = P(Y | do(X)). Let the modeling assumptions be encoded in the graph below, where Z is a third variable (say Gender) affecting both X and Y.

[Graph: Z → X, Z → Y, X → Y]

Finally, let the data be sampled at random from a joint distribution P(X, Y, Z). The Estimand (E_S) derived by the engine (automatically, using Tool 2, below) will be the formula E_S = Σ_z P(Y | X, Z) P(Z), which defines a procedure of estimation. It calls for estimating the gender-specific conditional distributions P(Y | X, Z) for males and females, weighing them by the probability P(Z) of membership in each gender, then taking the average. Note that the estimand E_S defines a property of P(X, Y, Z) that, if properly estimated, would provide a correct answer to our Query. The answer itself, the Estimate Ê_S, can be produced by any number of techniques that produce a consistent estimate of E_S from finite samples of P(X, Y, Z). For example, the sample average (of Y) over all cases satisfying the specified X and Z conditions would be a consistent estimate. But more efficient estimation techniques can be devised to overcome data sparsity [Rosenbaum and Rubin 1983]. This is where deep learning techniques excel, and where they are often employed [van der Laan and Rose 2011].
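The estimand E_S = Σ_z P(Y | X, Z) P(Z) translates directly into an estimator. The sketch below is mine, not the article's: it simulates observational data for the drug example (the encoding of X, Y, Z and all parameters are illustrative assumptions) and then applies the recipe literally, computing the gender-specific conditional means and averaging them with the gender proportions:

    import numpy as np

    rng = np.random.default_rng(1)
    n = 100_000

    # Simulated observational data: Z (gender) affects both X (drug) and Y (recovery).
    z = rng.binomial(1, 0.5, n)
    x = rng.binomial(1, np.where(z == 1, 0.7, 0.3))
    y = rng.binomial(1, 0.3 + 0.25 * x + 0.3 * z)

    def backdoor_estimate(x_val):
        """Estimate P(Y=1 | do(X=x_val)) via E_S = sum_z P(Y=1 | X=x_val, Z=z) P(Z=z)."""
        return sum(
            y[(x == x_val) & (z == v)].mean() * (z == v).mean()
            for v in np.unique(z)
        )

    p1, p0 = backdoor_estimate(1), backdoor_estimate(0)
    print(f"P(Y=1 | do(X=1)) = {p1:.3f}")             # about 0.70 in this simulation
    print(f"P(Y=1 | do(X=0)) = {p0:.3f}")             # about 0.45
    print(f"average causal effect = {p1 - p0:.3f}")   # about 0.25

Any consistent estimator of the same functional would do; the estimand only fixes what is to be estimated, not how.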

Finally, the Fit Index in our example will be NULL. In other words, after examining the structure of the graph, the engine should conclude (using Tool 1, below) that the assumptions encoded do not have any testable implications. Therefore, the veracity of the resultant estimate must lean entirely on the assumptions encoded in the graph; no refutation nor corroboration can be obtained from the data.³

³ The assumptions encoded in the graph are conveyed by its missing arrows. For example, Y does not influence X or Z, X does not influence Z and, most importantly, Z is the only variable affecting both X and Y. That these assumptions lack testable implications can be concluded directly from the fact that the graph is complete, i.e., there exists an edge connecting every pair of nodes.

The same procedure applies to more sophisticated queries, for example, the counterfactual query Q = P(y_x | x′, y′) discussed before. We may also permit some of the data to arrive from controlled experiments, which would take the form P(V | do(W)), in case W is the controlled variable. The role of the Estimand would remain that of converting the Query into the syntactic form involving the available data and, then, guiding the choice of the estimation technique to ensure unbiased estimates. Needless to state, the conversion task is not always feasible, in which case the Query will be declared "non-identifiable" and the engine should exit with FAILURE. Fortunately, efficient and complete algorithms have been developed to decide identifiability and to produce estimands for a variety of counterfactual queries and a variety of data types [Bareinboim and Pearl 2016; Shpitser and Pearl 2008; Tian and Pearl 2002].
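For readers who wish to see what "possessing a functional model" buys at level 3, here is a toy sketch of the standard three-step counterfactual computation (abduction, action, prediction) on a linear structural equation model of the Joe's-salary example; the equations, coefficients and numbers are invented for illustration and are not from the article:

    # Toy structural equation model (assumed):  salary Y = A + B * X + U_Y,
    # where X is years of college and U_Y captures Joe's unobserved traits.
    A, B = 20_000, 5_000

    def counterfactual_salary(x_obs, y_obs, x_new):
        """What would Y have been had X been x_new, given the observed (x_obs, y_obs)?"""
        u_y = y_obs - (A + B * x_obs)   # 1. Abduction: infer Joe's exogenous term.
        # 2. Action: replace the equation for X by the constant x_new.
        # 3. Prediction: propagate x_new and the abduced u_y through the model.
        return A + B * x_new + u_y

    # Joe: two years of college, observed salary 65,000.
    print(counterfactual_salary(x_obs=2, y_obs=65_000, x_new=4))   # 75,000 had he finished

The same three steps, applied to probabilistic structural models, yield distributions such as P(y_x | x′, y′) whenever they are identifiable.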


Next we provide a bird's eye view of seven tasks accomplished by the SCM framework, the tools used in each task, and discuss the unique contribution that each tool brings to the art of automated reasoning.

Tool 1: Encoding Causal Assumptions – Transparency and Testability

The task of encoding assumptions in a compact and usable form is not a trivial matter once we take seriously the requirement of transparency and testability.⁴ Transparency enables analysts to discern whether the assumptions encoded are plausible (on scientific grounds), or whether additional assumptions are warranted. Testability permits us (be it an analyst or a machine) to determine whether the assumptions encoded are compatible with the available data and, if not, identify those that need repair.

Advances in graphical models have made compact encoding feasible. Their transparency stems naturally from the fact that all assumptions are encoded qualitatively, in graphical form, mirroring the way researchers perceive cause-effect relationships in the domain; judgments of counterfactual or statistical dependencies are not required, since these can be read off the structure of the graph. Testability is facilitated through a graphical criterion called d-separation, which provides the fundamental connection between causes and probabilities. It tells us, for any given pattern of paths in the model, what pattern of dependencies we should expect to find in the data [Pearl 1988].

⁴ Economists, for example, having chosen algebraic over graphical representations, are deprived of elementary testability-detecting features [Pearl 2015b].
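As a concrete illustration of how such testable implications can be read off a graph mechanically, here is a short, self-contained sketch (mine, using the textbook ancestral-moral-graph test rather than any particular package's built-in routine) that decides d-separation in a DAG; the example diagram, with a hypothetical extra variable W, is an assumption made for the demonstration:

    import networkx as nx

    def d_separated(G, xs, ys, zs):
        """Decide whether node sets xs and ys are d-separated by zs in the DAG G,
        using the ancestral moral graph criterion."""
        xs, ys, zs = set(xs), set(ys), set(zs)
        ancestral = xs | ys | zs
        for v in list(ancestral):
            ancestral |= nx.ancestors(G, v)
        H = G.subgraph(ancestral)
        M = nx.Graph(H.to_undirected())
        for child in H.nodes:                     # moralize: marry co-parents
            parents = list(H.predecessors(child))
            M.add_edges_from(
                (parents[i], parents[j])
                for i in range(len(parents)) for j in range(i + 1, len(parents))
            )
        M.remove_nodes_from(zs)                   # block the conditioning set
        return not any(nx.has_path(M, a, b) for a in xs for b in ys)

    # Hypothetical diagram: W -> X <- Z -> Y and X -> Y.
    G = nx.DiGraph([("W", "X"), ("Z", "X"), ("Z", "Y"), ("X", "Y")])
    print(d_separated(G, {"W"}, {"Y"}, {"X", "Z"}))   # True: a testable implication
    print(d_separated(G, {"W"}, {"Y"}, {"X"}))        # False: conditioning on the
                                                      # collider X opens W -> X <- Z -> Y

Enumerating such separations over all variable pairs yields the full list of conditional independencies the model imposes on the data, which is what the Fit Indices of Figure 2 check.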
Tool 2: Do-calculus and the Control of Confounding

Confounding, or the presence of unobserved causes of two or more variables, has long been considered the major obstacle to drawing causal inference from data. This obstacle has been demystified and "deconfounded" through a graphical criterion called "back-door." In particular, the task of selecting an appropriate set of covariates to control for confounding has been reduced to a simple "roadblocks" puzzle manageable by a simple algorithm [Pearl 1993].

For models where the "back-door" criterion does not hold, a symbolic engine is available, called do-calculus, which predicts the effect of policy interventions whenever feasible, and exits with failure whenever predictions cannot be ascertained with the specified assumptions [Bareinboim and Pearl 2012; Pearl 1995; Shpitser and Pearl 2008; Tian and Pearl 2002].

Tool 3: The Algorithmization of Counterfactuals

Counterfactual analysis deals with behavior of specific individuals, identified by a distinct set of characteristics. For example, given that Joe's salary is Y = y, and that he went X = x years to college, what would Joe's salary be had he had one more year of education?

One of the crowning achievements of modern work on causality has been to formalize counterfactual reasoning within the graphical representation, the very representation researchers use to encode scientific knowledge. Every structural equation model determines the truth value of every counterfactual sentence. Therefore, we can determine analytically if the probability of the sentence is estimable from experimental or observational studies, or a combination thereof [Balke and Pearl 1994; Pearl 2000, Chapter 7].

Of special interest in causal discourse are counterfactual questions concerning "causes of effects," as opposed to "effects of causes." For example, how likely is it that Joe's swimming exercise was a necessary (or sufficient) cause of Joe's death [Halpern and Pearl 2005; Pearl 2015a].

Tool 4: Mediation Analysis and the Assessment of Direct and Indirect Effects

Mediation analysis concerns the mechanisms that transmit changes from a cause to its effects. The identification of such intermediate mechanisms is essential for generating explanations, and counterfactual analysis must be invoked to facilitate this identification. The graphical representation of counterfactuals enables us to define direct and indirect effects and to decide when these effects are estimable from data or experiments [Pearl 2001; Robins and Greenland 1992; VanderWeele 2015]. Typical queries answerable by this analysis are: What fraction of the effect of X on Y is mediated by variable Z?
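As a small numerical companion to Tool 4 (mine, not the article's; the linear model and its coefficients are assumptions chosen for illustration), the sketch below simulates a linear SCM X → Z → Y with a direct edge X → Y and recovers the decomposition of the total effect into a direct effect and an indirect effect transmitted through the mediator Z:

    import numpy as np

    rng = np.random.default_rng(2)
    n = 50_000

    # Assumed linear SCM (no confounding):  X -> Z -> Y  and  X -> Y.
    a, b, c = 0.8, 0.5, 0.3                 # X->Z, Z->Y, X->Y (direct)
    x = rng.normal(size=n)
    z = a * x + rng.normal(size=n)
    y = c * x + b * z + rng.normal(size=n)

    # Least squares recovers the structural coefficients here because every
    # parent set is fully observed and there is no hidden confounding.
    a_hat = np.linalg.lstsq(np.c_[x, np.ones(n)], z, rcond=None)[0][0]
    c_hat, b_hat = np.linalg.lstsq(np.c_[x, z, np.ones(n)], y, rcond=None)[0][:2]

    direct = c_hat                # effect of X on Y with the mediator Z held fixed
    indirect = a_hat * b_hat      # effect transmitted through Z
    total = direct + indirect
    print(f"direct = {direct:.2f}, indirect = {indirect:.2f}, total = {total:.2f}")
    print(f"fraction mediated = {indirect / total:.2f}")   # about 0.57 here

In nonlinear or confounded settings this simple product-of-coefficients reasoning breaks down, and the counterfactual definitions of natural direct and indirect effects cited above take its place.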

Tool 5: Adaptability, External Validity and Sample Selection Bias

The validity of every experimental study is challenged by disparities between the experimental and implementational setups. A machine trained in one environment cannot be expected to perform well when environmental conditions change, unless the changes are localized and identified. This problem, and its various manifestations, are well recognized by AI researchers, and enterprises such as "domain adaptation," "transfer learning," "life-long learning," and "explainable AI" [Chen and Liu 2016] are just some of the subtasks identified by researchers and funding agencies in an attempt to alleviate the general problem of robustness. Unfortunately, the problem of robustness, in its broadest form, requires a causal model of the environment, and cannot be handled at the level of Association. Associations are not sufficient for identifying the mechanisms affected by changes that occurred [Pearl and Bareinboim 2014], the reason being that surface changes in observed associations do not uniquely identify the underlying mechanism responsible for the change. The do-calculus discussed above now offers a complete methodology for overcoming bias due to environmental changes. It can be used both for re-adjusting learned policies to circumvent environmental changes and for controlling disparities between non-representative samples and a target population [Bareinboim and Pearl 2016].

Tool 6: Recovering from Missing Data

Problems of missing data plague every branch of experimental science. Respondents do not answer every item on a questionnaire, sensors malfunction as weather conditions worsen, and patients often drop from a clinical study for unknown reasons. The rich literature on this problem is wedded to a model-free paradigm of associational analysis and, accordingly, it is severely limited to situations where missingness occurs at random, that is, independent of values taken by other variables in the model. Using causal models of the missingness process we can now formalize the conditions under which causal and probabilistic relationships can be recovered from incomplete data and, whenever the conditions are satisfied, produce a consistent estimate of the desired relationship [Mohan and Pearl 2018; Mohan et al. 2013].
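The following toy sketch (mine; the missingness mechanism and all parameters are assumptions) illustrates the kind of recovery such causal models license. When the missingness graph says that whether Y is recorded (R_Y) depends only on a fully observed X, the mean of Y is recoverable by re-weighting the complete cases, while the naive complete-case average is biased:

    import numpy as np

    rng = np.random.default_rng(3)
    n = 200_000

    # Assumed missingness model: X -> Y and X -> R_Y (R_Y = 1 means "Y recorded").
    x = rng.binomial(1, 0.5, n)                       # e.g., severity of illness
    y = rng.normal(loc=1.0 + 2.0 * x, scale=1.0)      # outcome depends on X
    r = rng.binomial(1, np.where(x == 1, 0.3, 0.9))   # sicker patients drop out more

    naive = y[r == 1].mean()     # complete-case average, biased toward X = 0
    # Recovery licensed by the graph (Y independent of R_Y given X):
    #   E[Y] = sum_x E[Y | X = x, R_Y = 1] P(X = x)
    recovered = sum(y[(r == 1) & (x == v)].mean() * (x == v).mean() for v in (0, 1))

    print(f"true E[Y] (known here only because we simulated) = {y.mean():.3f}")  # about 2.0
    print(f"complete-case estimate = {naive:.3f}")                               # about 1.5
    print(f"recovered estimate     = {recovered:.3f}")                           # about 2.0

When the graph shows R_Y depending on Y itself, or on other partially observed variables, this simple re-weighting is no longer valid, and the conditions in [Mohan and Pearl 2018; Mohan et al. 2013] determine whether any consistent estimator exists.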
Tool 7: Causal Discovery

The d-separation criterion described above enables us to detect and enumerate the testable implications of a given causal model. This opens the possibility of inferring, with mild assumptions, the set of models that are compatible with the data, and to represent this set compactly. Systematic searches have been developed which, in certain circumstances, can prune the set of compatible models significantly to the point where causal queries can be estimated directly from that set [Jaber et al. 2018; Pearl 2000; Peters et al. 2017; Spirtes et al. 2000].

Alternatively, Shimizu et al. [2006] proposed a method of discovering causal directionality based on functional decomposition [Peters et al. 2017]. The idea is that in a linear model X → Y with non-Gaussian noise, P(y) is a convolution of two non-Gaussian distributions and would be, figuratively speaking, "more Gaussian" than P(x). The relation of "more Gaussian than" can be given a precise numerical measure and used to infer the directionality of certain arrows.

Tian and Pearl [2002] developed yet another method of causal discovery based on the detection of "shocks," or spontaneous local changes in the environment, which act like "Nature's interventions" and unveil causal directionality toward the consequences of those shocks.
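A crude numerical illustration of the "more Gaussian than" idea (mine, not the article's, and far short of the full procedure of Shimizu et al. [2006]): generate a linear model X → Y with uniform, hence non-Gaussian, disturbances, and compare a simple non-Gaussianity score of the two marginals, here the absolute excess kurtosis of the standardized variable:

    import numpy as np

    rng = np.random.default_rng(4)
    n = 100_000

    # Assumed linear model X -> Y with non-Gaussian (uniform) disturbances.
    x = rng.uniform(-1, 1, n)
    y = 0.8 * x + 0.6 * rng.uniform(-1, 1, n)

    def non_gaussianity(v):
        """Absolute excess kurtosis of the standardized variable:
        0 for a Gaussian, larger means 'less Gaussian'."""
        s = (v - v.mean()) / v.std()
        return abs((s ** 4).mean() - 3.0)

    # The effect is a sum of two independent non-Gaussian terms, so by
    # central-limit intuition it looks "more Gaussian" than the cause.
    print(f"non-Gaussianity of X = {non_gaussianity(x):.2f}")   # about 1.2 (uniform)
    print(f"non-Gaussianity of Y = {non_gaussianity(y):.2f}")   # noticeably smaller
    print("inferred direction:",
          "X -> Y" if non_gaussianity(x) > non_gaussianity(y) else "Y -> X")

Full-fledged LiNGAM procedures exploit independence between regressors and residuals (or ICA) rather than comparing marginals, but the driving intuition is the one stated in the text.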
CONCLUSIONS

I have argued that causal reasoning is an indispensable component of human thought that should be formalized and algorithmitized toward achieving human-level machine intelligence. I have explicated some of the impediments toward that goal in the form of a three-level hierarchy, and showed that inferences to levels 2 and 3 require a causal model of one's environment. I have described seven cognitive tasks that require tools from those two levels of inference and demonstrated how they can be accomplished in the SCM framework.

It is important to note that the models used in accomplishing these tasks are structural (or conceptual), and require no commitment to a particular form of the distributions involved. On the other hand, the validity of all inferences depends critically on the veracity of the assumed structure. If the true structure differs from the one assumed, and the data fits both equally well, substantial errors may result, which can sometimes be assessed through a sensitivity analysis.

It is also important to keep in mind that the theoretical limitations of model-free machine learning do not apply to tasks of prediction, diagnosis and recognition, where interventions and counterfactuals assume a secondary role.

However, the model-assisted methods by which these limitations are circumvented can nevertheless be applicable to problems of opacity, robustness, explainability and missing data, which are generic to machine learning tasks. Moreover, given the transformative impact that causal modeling has had on the social and medical sciences, it is only natural to expect a similar transformation to sweep through the machine learning field, once it is enriched with the guidance of a model of the data-generating process. I expect this symbiosis to yield systems that communicate with users in their native language of cause and effect and, leveraging this capability, to become the dominant paradigm of next generation AI.

ACKNOWLEDGMENTS

This research was supported in parts by grants from Defense Advanced Research Projects Agency [#W911NF-16-057], National Science Foundation [#IIS-1302448, #IIS-1527490, and #IIS-1704932], and Office of Naval Research [#N00014-17-S-B001]. This paper benefited substantially from comments by anonymous reviewers and conversations with Adnan Darwiche.

REFERENCES

A. Balke and J. Pearl. 1994. Probabilistic evaluation of counterfactual queries. In Proceedings of the Twelfth National Conference on Artificial Intelligence, Vol. I. MIT Press, Menlo Park, CA, 230–237.
E. Bareinboim and J. Pearl. 2012. Causal inference by surrogate experiments: z-identifiability. In Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence, Nando de Freitas and Kevin Murphy (Eds.). AUAI Press, Corvallis, OR, 113–120.
E. Bareinboim and J. Pearl. 2016. Causal inference and the data-fusion problem. Proceedings of the National Academy of Sciences 113 (2016), 7345–7352. Issue 27.
Z. Chen and B. Liu. 2016. Lifelong Machine Learning. Morgan and Claypool Publishers, San Rafael, CA.
A. Darwiche. 2017. Human-Level Intelligence or Animal-Like Abilities? Technical Report. Department of Computer Science, University of California, Los Angeles, CA. arXiv:1707.04327.
J.Y. Halpern and J. Pearl. 2005. Causes and Explanations: A Structural-Model Approach. Part I: Causes. British Journal of Philosophy of Science 56 (2005), 843–887.
A. Jaber, J.J. Zhang, and E. Bareinboim. 2018. Causal Identification under Markov Equivalence. In Proceedings of the Thirty-Fourth Conference on Uncertainty in Artificial Intelligence, Amir Globerson and Ricardo Silva (Eds.). AUAI Press, Corvallis, OR, 978–987.
B.M. Lake, R. Salakhutdinov, and J.B. Tenenbaum. 2015. Human-level concept learning through probabilistic program induction. Science 350 (2015), 1332–1338. Issue 6266.
G. Marcus. 2018. Deep Learning: A Critical Appraisal. Technical Report. Departments of Psychology and Neural Science, New York University, New York, NY. ⟨https://arxiv.org/pdf/1801.00631.pdf⟩.
K. Mohan and J. Pearl. 2018. Graphical Models for Processing Missing Data. Technical Report R-473, ⟨http://ftp.cs.ucla.edu/pub/stat_ser/r473.pdf⟩. Department of Computer Science, University of California, Los Angeles, CA. Forthcoming, Journal of the American Statistical Association (JASA).
K. Mohan, J. Pearl, and J. Tian. 2013. Graphical Models for Inference with Missing Data. In Advances in Neural Information Processing Systems 26, C.J.C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K.Q. Weinberger (Eds.). Curran Associates, Inc., 1277–1285. ⟨http://papers.nips.cc/paper/4899-graphical-models-for-inference-with-missing-data.pdf⟩.
S.L. Morgan and C. Winship. 2015. Counterfactuals and Causal Inference: Methods and Principles for Social Research (Analytical Methods for Social Research) (2nd ed.). Cambridge University Press, New York, NY.
J. Pearl. 1988. Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann, San Mateo, CA.
J. Pearl. 1993. Comment: Graphical Models, Causality, and Intervention. Statist. Sci. 8, 3 (1993), 266–269.
J. Pearl. 1995. Causal diagrams for empirical research. Biometrika 82, 4 (1995), 669–710.
J. Pearl. 2000. Causality: Models, Reasoning, and Inference. Cambridge University Press, New York. 2nd edition, 2009.
J. Pearl. 2001. Direct and indirect effects. In Uncertainty in Artificial Intelligence, Proceedings of the Seventeenth Conference. Morgan Kaufmann, San Francisco, CA, 411–420.
J. Pearl. 2015a. Causes of Effects and Effects of Causes. Journal of Sociological Methods and Research 44 (2015), 149–164. Issue 1.
J. Pearl. 2015b. Trygve Haavelmo and the emergence of causal calculus. Econometric Theory 31 (2015), 152–179. Issue 1. Special issue on Haavelmo Centennial.
J. Pearl and E. Bareinboim. 2014. External validity: From do-calculus to transportability across populations. Statist. Sci. 29 (2014), 579–595. Issue 4.


J. Pearl and D. Mackenzie. 2018. The Book of Why: The New Science of Cause and Effect. Basic Books, New York.
J. Peters, D. Janzing, and B. Schölkopf. 2017. Elements of Causal Inference – Foundations and Learning Algorithms. The MIT Press, Cambridge, MA.
M. Porta. 2014. The deconstruction of paradoxes in epidemiology. OUPblog, ⟨https://blog.oup.com/2014/10/deconstruction-paradoxes-sociology-epidemiology/⟩ (May 14).
M.T. Ribeiro, S. Singh, and C. Guestrin. 2016. Why should I trust you?: Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, New York, NY, 1135–1144.
J.M. Robins and S. Greenland. 1992. Identifiability and Exchangeability for Direct and Indirect Effects. Epidemiology 3, 2 (1992), 143–155.
P. Rosenbaum and D. Rubin. 1983. The central role of the propensity score in observational studies for causal effects. Biometrika 70 (1983), 41–55.
S. Shimizu, P.O. Hoyer, A. Hyvärinen, and A.J. Kerminen. 2006. A linear non-Gaussian acyclic model for causal discovery. Journal of Machine Learning Research 7 (2006), 2003–2030.
I. Shpitser and J. Pearl. 2008. Complete Identification Methods for the Causal Hierarchy. Journal of Machine Learning Research 9 (2008), 1941–1979.
P. Spirtes, C.N. Glymour, and R. Scheines. 2000. Causation, Prediction, and Search (2nd ed.). MIT Press, Cambridge, MA.
J. Tian and J. Pearl. 2002. A general identification condition for causal effects. In Proceedings of the Eighteenth National Conference on Artificial Intelligence. AAAI Press/The MIT Press, Menlo Park, CA, 567–573.
M.J. van der Laan and S. Rose. 2011. Targeted Learning: Causal Inference for Observational and Experimental Data. Springer, New York.
T.J. VanderWeele. 2015. Explanation in Causal Inference: Methods for Mediation and Interaction. Oxford University Press, New York.
