
Meder, B., Hagmayer, Y., & Waldmann, M. R. (2006). Understanding the Causal Logic of Confounds. In R. Sun & N. Miyake (Eds.), Proceedings of the Twenty-Eighth Annual Conference of the Cognitive Science Society (pp. 579-584). Mahwah, NJ: Erlbaum.

Understanding the Causal Logic of Confounds

Björn Meder ([email protected]), York Hagmayer ([email protected]), Michael R. Waldmann ([email protected])
Department of Psychology, University of Göttingen, Gosslerstr. 14, 37073 Göttingen, Germany

Abstract

The detection of causal relations is often complicated by confounding variables. Handbooks on methodology therefore suggest experimental manipulations of the independent variable combined with randomization as the principal method of dealing with this problem. Recently, progress has been made within the literature on causal Bayes nets on the proper analysis of confounds with non-experimental data (Pearl, 2000). The present paper summarizes the causal analysis of two basic types of confounding: common-cause and causal-chain confounding. Two experiments are reported showing that participants understand the causal logic of these two types of confounding.

Introduction

Scientific studies and everyday causal learning aim to reveal the structure and strength of causal relations among events: Does event C cause event E? Will a manipulation of C generate E? In order to answer these questions, data have to be gathered. But even with data it is often hard to answer these questions, because the statistical relation observed between C and E may reflect not only a direct causal relation but also a spurious relation due to other, confounding variables.

For example, in the 1950s a series of studies with non-experimental data was published showing that lung cancer was more frequent in smokers than in non-smokers (e.g., Doll & Hill, 1956). These data were interpreted as evidence that smoking is a cause of lung cancer. However, some prominent statisticians (e.g., Fisher, 1958) argued that such a conclusion was not justified on the basis of the available data. Fisher (1958) offered an alternative explanation in which the observed covariation was not interpreted as a direct causal relation but as a spurious correlation generated by a common cause, a specific genotype causing both a craving for nicotine and the development of lung cancer.

Confounding variables are statistically related to both the potential cause C (independent variable) and the presumed effect E (dependent variable). It is the relation between the confounding variable and the cause that creates serious problems. In the most extreme case the cause and the other variable are perfectly confounded, that is, they are either both present or both absent all the time. In this case it is impossible to tell whether the effect is generated by the cause or by the confounding variable. Note that the problem of confounding does not originate in the relation between the extraneous variable and the effect. Even if the extraneous variable has a very strong influence, the impact of the cause variable can be detected as long as the extraneous variable is not permanently present and the cause variable and the extraneous variable are independent of each other. Under these circumstances the impact of the cause variable can be seen as an increase (generative influence) or decrease (inhibitory influence) of the probability of the effect given the presence of the cause.

Methodology textbooks (e.g., Keppel & Wickens, 2000) strongly recommend controlled experiments to eliminate the relations between the cause and potentially confounding variables. Experiments involve the random assignment of participants to experimental and control groups (i.e., randomization) and a manipulation of the putative cause variable by an outside intervention. This procedure ensures independence of the cause variable from all other potentially confounding variables. However, in some sciences (e.g., astronomy) and in many everyday contexts controlled experimentation is impossible. Thus, people have to deal with the problem of confounding variables quite often. This paper intends to show (i) under which conditions valid causal inferences are possible on the basis of observations even in the presence of confounding variables, and (ii) that people are capable of reasoning correctly with causal models that contain confounds.

Throughout this paper we focus on the most basic type of causal induction, the detection and evaluation of a single causal relation. In addition, we assume that there is a known confounding variable which is related to both the cause and the effect. First, we will provide a causal analysis of confounding. Then we will show how causal Bayes net theory models confounding and causal inferences. Finally, we will report two experiments investigating whether participants are able to take confounds into account.

The Causal Basis of Confounding

Two basic causal structures may underlie confounding. One possibility is that the confounding variable X is a cause of both the candidate cause C and the effect E (common-cause confound, Fig. 1a). Another possibility is a causal-chain model in which the cause variable C not only directly influences the effect E but also generates the confounding variable X, which, in turn, influences E (causal-chain confound, Fig. 1b). The crucial point is that both models imply a correlation between cause C and effect E, even when there is no direct causal relation between them (i.e., without the causal arrow C→E); the simulation sketch below illustrates this. If the confounding variable is present, both the cause and the effect should tend to be present; if X is absent, both C and E should tend to be absent. In addition to the causal relations connecting the confounding variable to C and E, there is a direct causal relation between C and E whose existence and strength has to be identified.
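The following Python sketch (not part of the original study; all parameter values are made up for illustration) samples from a pure common-cause structure X→C, X→E, deliberately omitting the arrow C→E, and estimates the observational probabilities:

```python
import random

random.seed(1)

# Pure common-cause structure X -> C, X -> E with NO direct link C -> E.
# All parameter values are made up for illustration.
def sample_case():
    x = random.random() < 0.5                    # P(x) = .5
    c = random.random() < (0.8 if x else 0.1)    # P(c|x) = .8, P(c|~x) = .1
    e = random.random() < (0.8 if x else 0.1)    # P(e|x) = .8, P(e|~x) = .1
    return c, e

data = [sample_case() for _ in range(10_000)]
p_e_c = sum(e for c, e in data if c) / sum(c for c, e in data)
p_e_nc = sum(e for c, e in data if not c) / sum(1 for c, e in data if not c)
print(f"P(e|c) = {p_e_c:.2f}, P(e|~c) = {p_e_nc:.2f}")  # clearly unequal
```

Although C has no causal influence on E in this structure, P(e|c) clearly exceeds P(e|¬c), because observing C provides evidence about the state of X.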



[Figure 1: Two Types of Confounding. (a) Common-cause confound: X → C, X → E, plus the candidate link C → E. (b) Causal-chain confound: C → X, X → E, plus the direct link C → E.]

The two models shown in Fig. 1 represent two different kinds of confounding. The common-cause confound model represents the situation that some extraneous variable is causally affecting both the cause and the effect. The hypothesis that smoking and lung cancer are both caused by a specific genotype exemplifies this type of confounding. There are several possibilities to eliminate the causal relation between the common cause X and the candidate cause C. For example, X might be eliminated or held constant (e.g., only people without the carcinogenic genotype are studied). In addition, C might be manipulated independently of X (which would not be possible in the case of smoking for ethical reasons). Such an independent manipulation is equivalent to a randomized experiment (Fisher, 1951).

However, simple randomization combined with manipulations of the candidate cause C cannot eliminate causal-chain confounding. This type of confounding calls for other controls, because a manipulation of C would directly affect X. Thus, other ways have to be found to block the causal relation connecting the cause C to the confound X. For example, aspirin (C) might not only have a direct influence on headache but also make your blood thinner (X), which, in turn, might also have an impact on your headache (E). One way to get rid of confounding in this case is to administer aspirin to people who all have thin blood or who are resistant to the side effect, which is equivalent to holding the confound constant. Another possibility is to manipulate the confounding variable in addition to the cause variable and thereby eliminate their causal relation.

In summary, there are two fundamental types of confounding which call for different measures of control. While external manipulations of the candidate cause eliminate common-cause confounding, this is not true for causal-chain confounding. However, controlled experiments are not the only way to avoid confounding. Observational studies in combination with other control techniques (e.g., holding hypothesized confounds constant) may also allow us to draw valid causal inferences. Causal Bayes net theories provide a general framework to model confounds.

Causal Bayes Net Theories

Causal Bayes net theories (Pearl, 2000; Spirtes, Glymour, & Scheines, 1993) allow us to model causal structures with confounding variables. Provided the confounding variable is observable, causal Bayes nets also enable valid inferences about the existence and strength of confounded causal relations. Moreover, they allow us to derive predictions for the consequences of interventions in causal models.

Causal Bayes net theory integrates graph theory and probability calculus. A causal Bayes net consists of a structural causal model representing events and their directed causal relations (see Fig. 1). Associated with the model are parameters (e.g., conditional probabilities) encoding the strength of the causal relations and the events' base rates. At the heart of the causal Bayes nets framework lies the causal Markov condition (Spirtes et al., 1993; Pearl, 2000), which states that the value of any variable X in a causal model is independent of all other variables (except for its causal descendants) conditional on the set of its direct causes. By applying the Markov condition, the joint probability distribution of a causal model can be factorized into components representing only direct causal relations. For example, the joint probability distribution of the common-cause confound model can be factorized into

(1) P(X,C,E) = P(X) · P(C|X) · P(E|C,X)

Similarly, the causal-chain confound model can be formalized by

(2) P(X,C,E) = P(C) · P(X|C) · P(E|C,X)

The conditional probabilities of the decomposed models can be directly estimated from the conditional frequencies in the available data, provided the confounding variable is observed along with the cause and effect variables.

However, the causal consequences of possible interventions cannot always be read off from conditional frequency information alone. Consider the common-cause confound model depicted in Figure 1a. The conditional probability of the effect in the presence of the cause (i.e., P(e|c)) reflects both the direct causal influence of C on E and the spurious relation arising from the confounding common cause X. However, intervening in C renders the cause independent of the confounding variable, because the intervention fixes the variable's state. Therefore the probability of E given an intervention in C reflects the causal influence of C on E and the causal influence of X on E, but is not distorted by a spurious relation.

To formalize the notion of an external intervention, Pearl (2000) introduced the so-called Do-operator, written as Do(·). Formally, the Do-operator renders a variable independent of all its causes, which is graphically represented by deleting all causal links pointing towards the variable fixed by the intervention ('graph surgery'). Based on the modified causal model and the factorized joint probability distribution, the probabilities of the other events can be computed. For example, in the common-cause confound model the probability of E given an intervention in C is formalized by

(3) P(e|Do c) = P(x) · P(e|x,c) + P(¬x) · P(e|¬x,c)

Contrary to the original factorization (cf. equation (1)), in this formula C is no longer conditionalized on X. This formalizes the idea that intervening in C eliminates the dependence on its cause X. Based on the assumption that the confounding variable X and the cause variable C independently influence the effect variable (modularity assumption), the direct causal impact of C on E can be estimated. We can only provide a rough description of the inference's logic (for more details and formal derivations, see Pearl, 2000). If C is generated by an intervention, the probability of the effect is a summary effect of the causal impact of the cause variable C and the base rate and causal influence of the confounding variable X. If C is prevented by an intervention, the probability of the effect depends solely on the base rate and the causal impact of the confounding variable. According to the modularity assumption, the causal influences of the cause and the confounding variable are independent of each other. Therefore, the difference of the two interventional probabilities corresponds to the direct causal influence of cause C.
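The following Python sketch makes the contrast between observing and intervening concrete. It encodes a common-cause confound model via the factorization in equation (1) and computes P(e|c) by conditioning on the joint distribution, and P(e|Do c) by the adjustment formula in equation (3). The parameter values are illustrative assumptions, not those used in the experiments reported below:

```python
# Common-cause confound model (Fig. 1a): X -> C, X -> E, and C -> E.
# All parameter values are illustrative assumptions.
p_x = 0.5                                   # base rate P(x)
p_c_given_x = {True: 0.9, False: 0.1}       # P(c|x) and P(c|~x)
p_e_given_cx = {                            # P(e|c,x)
    (True, True): 0.9, (True, False): 0.6,
    (False, True): 0.6, (False, False): 0.1,
}

def joint(x, c, e):
    """Equation (1): P(x,c,e) = P(x) * P(c|x) * P(e|c,x)."""
    px = p_x if x else 1 - p_x
    pc = p_c_given_x[x] if c else 1 - p_c_given_x[x]
    pe = p_e_given_cx[(c, x)] if e else 1 - p_e_given_cx[(c, x)]
    return px * pc * pe

# Observational probability P(e|c) = P(c,e) / P(c).
p_ce = sum(joint(x, True, True) for x in (True, False))
p_c = sum(joint(x, True, e) for x in (True, False) for e in (True, False))
p_e_obs = p_ce / p_c

# Interventional probability, equation (3):
# P(e|Do c) = P(x)*P(e|x,c) + P(~x)*P(e|~x,c).
p_e_do = p_x * p_e_given_cx[(True, True)] + (1 - p_x) * p_e_given_cx[(True, False)]

print(f"P(e|c)    = {p_e_obs:.2f}")          # 0.87
print(f"P(e|Do c) = {p_e_do:.2f}")           # 0.75
```

With these parameters the observational probability (.87) exceeds the interventional probability (.75), because observing c additionally raises the probability of the confound x, whereas intervening in C does not.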


The predictions for the causal-chain confound model differ. Again, the conditional probabilities of the effect given an observation of the cause variable can be directly estimated. What about an intervention in C? As C is the cause of both E and X, the dependence of X on C is not eliminated by an intervention in C. Therefore, the interventional and observational probabilities are equal:

P(e|Do c) = P(e|c) and P(e|Do ¬c) = P(e|¬c)

Unfortunately, this implies that the direct causal influence of the cause variable on the effect cannot be estimated on the basis of the two conditional interventional probabilities. Given a causal-chain confound model, the difference of the interventional probabilities represents the sum of the cause variable's direct causal impact on E and its indirect causal influence on E via X. In order to assess only the cause's direct causal influence, the causal relation between the cause and the confounding variable has to be eliminated by a second intervention. This aim could be achieved by eliminating the confounding variable or by blocking the causal pathway connecting the cause and confound. If this is done, the difference of the resulting interventional probabilities again reflects the cause's direct causal impact upon the effect (the sketch at the end of this section illustrates this).

In sum, causal Bayes nets allow us to model various inferences within causal models that contain confounding variables. Necessary prerequisites for these inferences are assumptions about the underlying causal structure and observational data from which the model's parameters can be estimated. No experiments are required.
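A parallel Python sketch for the causal-chain confound model (again with made-up parameters) illustrates both points: a simple intervention in C reproduces the observational probability, whereas a combined intervention that also fixes X isolates the direct causal impact of C on E:

```python
# Causal-chain confound model (Fig. 1b): C -> X, C -> E, X -> E.
# All parameter values are illustrative assumptions.
p_x_given_c = {True: 0.9, False: 0.1}       # P(x|c) and P(x|~c)
p_e_given_cx = {                            # P(e|c,x)
    (True, True): 0.9, (True, False): 0.6,
    (False, True): 0.6, (False, False): 0.1,
}

def p_e_obs(c):
    """Observational P(e|c): average P(e|c,x) over P(x|c)."""
    px = p_x_given_c[c]
    return px * p_e_given_cx[(c, True)] + (1 - px) * p_e_given_cx[(c, False)]

def p_e_do(c):
    """Simple intervention Do(c): C has no causes here, so graph surgery
    deletes no links and the interventional probability equals P(e|c)."""
    return p_e_obs(c)

def p_e_do_both(c, x):
    """Combined intervention Do(c), Do(x): fixing X blocks the indirect
    pathway, so only the direct influence of C on E remains."""
    return p_e_given_cx[(c, x)]

print(p_e_obs(True), p_e_do(True))                            # identical: 0.87 0.87
# Direct causal impact of C on E, with the confound fixed at 'absent':
print(p_e_do_both(True, False) - p_e_do_both(False, False))   # 0.5
```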



Causal Learning with Confounds

A number of studies have investigated whether people distinguish between observations and interventions. For example, Gopnik and colleagues (e.g., Gopnik, Glymour, Sobel, Schulz, Kushnir, & Danks, 2004) showed that even preschoolers grasp the difference between observing and intervening, and learn better when they are allowed to intervene in a causal system. Sloman and Lagnado (2005) provided evidence that participants understand that events targeted by interventions are rendered independent of their actual causes ('undoing'). Thus, people seem to understand the causal logic of intervention (i.e., graph surgery). In our own work (Meder, Hagmayer, & Waldmann, 2005; Waldmann & Hagmayer, 2005) we were able to show that participants base their inferences about hypothetical interventions not only on the structure of the causal model but also take into account the model's parameters, which could be estimated from passive observations. Some of these studies also provided first evidence that participants are able to take into account alternative causal pathways in complex models. However, other findings cast doubt on the generality of this finding. For example, Luhmann (2005) found that participants tend to overestimate how informative confounded data are.

The two experiments reported in this paper focus more closely on different types of confounding to further explore lay people's understanding. Whereas previous research has focused on whether learners consider the influence of alternative causes (e.g., Spellman, 1996; Waldmann & Hagmayer, 2001) or known alternative causal pathways (Meder et al., 2005), the present experiments go one step further by combining the task of model identification with different kinds of causal inferences. Learners are provided with competing causal model hypotheses and then passively observe trial-by-trial learning data. In the test phase participants are asked about hypothetical observations and interventions in causal situations that contain different kinds of confounding. Thus, our main interest is to investigate whether learners are capable of making correct inferences about hypothetical situations in scenarios that contain confounds.

While Experiment 1 confronted participants with a common-cause confound, Experiment 2 focused on a causal-chain confound. In both experiments we manipulated the strength of the target causal relation, which was distorted by a superimposed spurious relation. The target causal relation was present in one condition and absent in the other. In order to tap participants' understanding we gave them several tasks. First, we asked them explicitly to indicate whether there is a direct causal relation between the cause and effect variables. In order to answer this question correctly, participants have to separate the causal from the spurious relation (in the case of common-cause confounding) or to disentangle the direct from the indirect causal influence (in the case of causal-chain confounding). Second, we asked participants about the probability of the effect given an observation of the absence or presence of the candidate cause. These questions refer to the summary effect of the direct and indirect causal pathways. Third, we asked participants about the probability of the effect given that the cause was generated or prevented by an intervention. In the case of common-cause confounding, the estimated interventional probabilities should differ from the observational probabilities, whereas in the case of causal-chain confounding simple interventions are not able to eliminate confounding. Thus, in this situation the interventional probabilities should include the confounded causal relation and therefore equal the observational probabilities. To test whether participants are able to extract the direct causal relation in this case, we added questions about combinations of interventions. We asked participants what would happen if the cause was set by an intervention while simultaneously the causal relation to the confounding variable was blocked by a second intervention. If participants understand the causal logic of confounding, the estimated probabilities should reflect the direct impact of the cause upon its effect.

Experiment 1

Participants and Design. Participants were 36 psychology undergraduates at the University of Göttingen. The factor 'learning data' was varied between conditions.

Task. Experiment 1 investigated participants' understanding of common-cause confounding. Participants were told that ornithologists recently had discovered a new species of bird. The biologists hypothesized that in this species singing (C) is causally related to reproduction (E). It was pointed out that it was difficult to assess this direct causal relation because of a gene (X), which is known to influence both the birds' capacity to sing and their fertility. Participants were then presented with two candidate causal models representing either the hypothesis that there is an additional direct causal relation between singing and breeding (common-cause confound model) or the hypothesis that there is none (common-cause model). Learners were also shown a graphical representation of the two causal models and were requested to find out which of the two models was correct. This phase was identical for all participants.

Learning Phase. To assess whether there is a direct causal relation between birdsong and breeding, learners received 50 index cards depicting observational data from individual birds. The two data sets implemented either a common-cause model without a direct causal relation between C and E or a common-cause confound model. The two parameterized models and the resulting patterns of data are shown in Figure 2. Participants received one of the two data sets and were free to explore the data at will and take notes.

[Figure 2: Parameterized causal models (common-cause model vs. common-cause confound model) and data sets of 50 cases, each generated from these graphs. Arrows indicate causal relations between variables; conditional probabilities encode the strength of these relations. The frequencies of the event patterns were:]

Data pattern           Frequencies
X    C    E      Common cause   Confound
yes  yes  yes         18            18
yes  yes  no           1             1
yes  no   yes          1             1
yes  no   no           0             0
no   yes  yes          0             8
no   yes  no          12             4
no   no   yes          0             0
no   no   no          18            18

Test Phase. After the learning phase participants were given two blocks of questions referring to hypothetical observations and hypothetical interventions. The order of blocks was counterbalanced. Participants were allowed to refer back to the index cards and instructions while answering the questions. The observational questions stated that the researchers had captured a new bird and observed that this bird sings [does not sing]. Based on this observation, learners were asked to estimate the probability that this bird would breed (i.e., P(e|c) and P(e|¬c)). The generative interventional question stated that the ornithologists had attached a miniature speaker to a bird which imitates birdsong (i.e., Do c). The inhibitory interventional question stated that the biologists had surgically modified the bird's vocal cords, thereby preventing the bird from singing (i.e., Do ¬c). Again, participants had to estimate the probability that these birds would breed (i.e., the interventional probabilities P(e|Do c) and P(e|Do ¬c)). The correct answers to these questions derived from a causal Bayes net analysis are shown in Table 1 in brackets. Finally, participants were given a graphical representation of the two alternative causal models and requested to select the correct one.

Table 1: Mean probability judgments for hypothetical observations and hypothetical interventions.

                   Observations             Interventions
                P(e|c)    P(e|¬c)     P(e|Do c)   P(e|Do ¬c)
Common cause    63.89      22.78        50.00       41.39
                [58]       [05]         [38]        [38]
Confound        58.33      14.44        63.06       20.56
                [84]       [05]         [78]        [40]

Note. Probabilities derived from causal Bayes nets are shown in brackets.

Results. 27 out of 36 participants (75%) chose the correct model. Thus, a majority of participants was able to disentangle a causal relation from a spurious relation. Table 1 shows learners' probability judgments for hypothetical observations and hypothetical interventions. Participants' judgments for observations reflected the statistical relation between the cause and the effect. In both conditions they rated P(e|c) higher than P(e|¬c). An analysis of variance with 'data sets' and 'presence versus absence of C' as variables yielded only a significant main effect for the presence of C, F(1,34)=63.70, p<.001, MSE=510.38. In contrast, learners' estimates for the outcomes of interventions differed. As expected, an analysis of variance resulted in a significant interaction, F(1,34)=27.95, p<.01, MSE=420.63, which we further analyzed by planned comparisons. In the common-cause condition, only a small, non-significant difference was obtained between the interventional probabilities, F(1,17)=1.09, p=.31, MSE=614.42. This result indicates that learners understood that the observed correlation is spurious and that intervening in C will not make E more or less likely to occur. In the common-cause confound condition the probability of E being present was judged higher when C was generated, P(e|Do c), than when it was prevented, P(e|Do ¬c), F(1,17)=71.67, p<.001, MSE=226.84. Thus, participants had detected the direct causal link connecting the cause to its effect.
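For readers who want to check the bracketed predictions, the following Python sketch reproduces the Bayes-net values for the confound condition directly from the Figure 2 frequencies: it estimates the conditional frequencies from the 50 cases and applies the adjustment formula in equation (3). This is a worked check, not code used in the study:

```python
from collections import Counter

# Frequencies from the common-cause confound data set in Figure 2,
# coded as (x, c, e) -> number of birds out of 50.
freq = Counter({
    (1, 1, 1): 18, (1, 1, 0): 1, (1, 0, 1): 1, (1, 0, 0): 0,
    (0, 1, 1): 8, (0, 1, 0): 4, (0, 0, 1): 0, (0, 0, 0): 18,
})
n = sum(freq.values())

def p(pred):
    """Relative frequency of the event picked out by pred."""
    return sum(k for (x, c, e), k in freq.items() if pred(x, c, e)) / n

# Observational probabilities: plain conditional frequencies.
p_e_c = p(lambda x, c, e: c and e) / p(lambda x, c, e: c)              # .84
p_e_notc = p(lambda x, c, e: not c and e) / p(lambda x, c, e: not c)   # .05

# Interventional probability via equation (3), adjusting for P(x).
p_x = p(lambda x, c, e: x)
p_e_xc = p(lambda x, c, e: x and c and e) / p(lambda x, c, e: x and c)
p_e_notxc = p(lambda x, c, e: not x and c and e) / p(lambda x, c, e: not x and c)
p_e_do_c = p_x * p_e_xc + (1 - p_x) * p_e_notxc                        # .78

print(f"P(e|c)={p_e_c:.2f}  P(e|~c)={p_e_notc:.2f}  P(e|Do c)={p_e_do_c:.2f}")
```

The resulting estimates (.84, .05, .78) match the bracketed values [84], [05], and [78] for the confound condition in Table 1.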



Experiment 2

Participants and Design. Participants were 36 psychology undergraduates at the University of Göttingen. The factor 'learning data' was varied between conditions.

Task. While Experiment 1 investigated participants' understanding of common-cause confounding, Experiment 2 focused on causal-chain confounding (see Fig. 1). We used the same scenario as in Experiment 1. However, now participants were told that ornithologists were investigating whether a specific gene (C) has a direct causal impact upon the birds' reproduction (E). As in Experiment 1, participants were informed about the presence of a confounding variable. They were told that the gene was known to affect the birds' ability to sing (X) via a (non-observable) hormone mechanism (H). Moreover, singing (X) has, according to the instructions, a causal influence upon reproduction (E). Participants were then presented with two competing causal hypotheses, a causal-chain model and a causal-chain confound model (Fig. 3). The causal-chain model represents the hypothesis that the gene affects reproduction only via singing, whereas the causal-chain confound model represents the assumption that the gene has both an immediate and an indirect causal impact upon reproduction.

[Figure 3: Parameterized causal models (causal-chain model vs. causal-chain confound model) and data sets of 50 cases generated from these graphs. The frequencies of the event patterns were:]

Data pattern           Frequencies
X    C    E      Causal chain   Confound
yes  yes  yes         29            23
yes  yes  no           0             0
yes  no   yes          1             1
yes  no   no           0             0
no   yes  yes          0             6
no   yes  no           4             4
no   no   yes          0             0
no   no   no          16            16

Learning Phase. As in the first experiment, learners received 50 index cards displaying observational data from individual birds. The models used to generate the two sets of data and the resulting distributions of event patterns are shown in Figure 3. Note that participants were never informed about the state of H, the mechanism connecting C to X. The causal-chain data indicated that the observable relation between C and E was merely indirect, while the data corresponding to the causal-chain confound model pointed to a fairly strong direct relation between the gene and reproduction. The unconditional relation between C and E was identical in both data sets. As before, participants were free to explore the data at will.

Test Phase. In this phase participants were given three blocks of questions, with the order of blocks being counterbalanced. The observational questions asked participants to estimate the probability that a new bird possessing the gene [not possessing the gene] would breed (i.e., P(e|c) and P(e|¬c)). The generative interventional question stated that the researchers had activated the gene of a new bird by means of an intervention (i.e., Do c). The inhibitory interventional question mentioned that the gene was deactivated by an outside intervention (i.e., Do ¬c). Again, participants were requested to estimate the probability that these new birds would breed (i.e., P(e|Do c) and P(e|Do ¬c)). Then, a question referring to a combination of interventions was given, which requested participants to assume that researchers had activated the gene of a newly caught bird while simultaneously blocking the generation of the hormone affecting singing (i.e., Do c, Do ¬h). The second combination question stated that both the gene and the hormone production had been deactivated by inhibitory interventions (i.e., Do ¬c, Do ¬h). For both questions participants were asked about the probability of procreation (i.e., P(e|Do c, Do ¬h) and P(e|Do ¬c, Do ¬h)). In both cases, participants received no information about whether the individual birds had the capacity to sing or not. Finally, participants had to select the correct model from a graphical representation of the two alternative causal models.

Results. 31 of the 36 participants (86%) picked the correct causal model. Thus, as in Experiment 1, a majority of participants was able to separate the causal relation between C and E from the spurious relation. Table 2 shows the mean probability estimates for the six questions along with the values derived from causal Bayes nets.

Table 2: Mean probability judgments for observations, simple interventions, and combinations of interventions.

               Observations       Interventions           Combinations of interventions
              P(e|c)  P(e|¬c)  P(e|Do c)  P(e|Do ¬c)  P(e|Do c, Do ¬h)  P(e|Do ¬c, Do ¬h)
Causal chain   76.89   16.72     77.08      17.28          29.03              7.83
               [88]    [06]      [88]       [06]           [06]               [06]
Confound       76.11    8.61     81.11      14.72          62.22             15.28
               [88]    [06]      [88]       [06]           [62]               [06]

Note. Probabilities derived from causal Bayes nets are shown in brackets.

Again, participants on average gave the same ratings to the observational questions in both conditions and judged the effect to be more likely in the presence than in the absence of the observed cause, F(1,34)=317.25, p<.01, MSE=231.19. Contrary to Experiment 1 and consistent with the Bayesian causal analysis, participants' estimates for the simple interventional questions did not differ between conditions. An analysis of variance resulted in a significant main effect of 'presence versus absence of C', F(1,34)=250.20, p<.01, MSE=286.42, but no interaction with the given data set (F<1). Participants apparently understood that intervening in C would generate E no matter whether the underlying causal model was a causal-chain or a causal-chain confound model. Participants' answers to the combination questions showed that they differentiated between the two models. An analysis of variance yielded the expected interaction, F(1,34)=7.80, p<.01, MSE=382.55, but also main effects of the variables 'presence versus absence of C', F(1,34)=54.62, p<.01, and 'data sets', F(1,34)=9.39, p<.01. A closer look at individual ratings revealed that 10 out of the 18 participants in the causal-chain condition judged E to be equally likely when C was generated or prevented by an intervention while the causal mechanism linking C to X was blocked. In contrast, all participants in the causal-chain confound condition assumed that an intervention in C would increase the probability of E despite the blocked link. Thus, a majority of participants seemed to have grasped the causal logic of causal-chain confounding.
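As for Experiment 1, the model-derived predictions can be approximated from the Figure 3 frequencies. The following Python sketch (a worked check, not study code) computes the observational probability and the two combined-intervention probabilities for the confound condition; the sample-based estimates come close to the bracketed values, which were derived from the generating parameters rather than from the 50-case sample:

```python
from collections import Counter

# Frequencies from the causal-chain confound data set in Figure 3,
# coded as (x, c, e) -> number of birds out of 50 (c = gene, x = singing).
freq = Counter({
    (1, 1, 1): 23, (1, 1, 0): 0, (1, 0, 1): 1, (1, 0, 0): 0,
    (0, 1, 1): 6, (0, 1, 0): 4, (0, 0, 1): 0, (0, 0, 0): 16,
})

def cond(event, given):
    """Conditional frequency P(event | given) over the 50 cases."""
    total = sum(k for key, k in freq.items() if given(*key))
    hits = sum(k for key, k in freq.items() if given(*key) and event(*key))
    return hits / total

# Observational probability; a simple intervention Do(c) yields the same
# value because C has no causes whose links graph surgery could remove.
p_e_c = cond(lambda x, c, e: e, lambda x, c, e: c)                       # .88

# Combined intervention Do(c), Do(~h): without the hormone the bird does
# not sing, so X is fixed at 'absent' and only the direct link C -> E remains.
p_e_c_nox = cond(lambda x, c, e: e, lambda x, c, e: c and not x)         # .60
p_e_notc_nox = cond(lambda x, c, e: e, lambda x, c, e: not c and not x)  # .00

print(f"P(e|c) = P(e|Do c) = {p_e_c:.2f}")
print(f"P(e|Do c, Do ~h) = {p_e_c_nox:.2f}, P(e|Do ~c, Do ~h) = {p_e_notc_nox:.2f}")
```

The estimates (.88, .60, .00) approximate the bracketed Table 2 values [88], [62], and [06] for the confound condition; the small discrepancies reflect sampling noise in the 50 generated cases.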


Discussion

Causal Bayes nets allow us to analyze non-experimental data that reflect the impact of confounding variables. As long as the confounding variable is observed and not perfectly confounded with the target cause, inferences about the causal impact and the consequences of hypothetical interventions can be derived from observational data. Two basic causal structures containing confounds can be distinguished: a common-cause confound model, in which a cause and an effect are both directly and spuriously related, and a causal-chain confound model, in which a cause affects its effect both directly and indirectly. Manipulating the cause by an external intervention eliminates common-cause confounding but not causal-chain confounding, which requires that the second causal pathway is blocked by other means (Pearl, 2000).

The results of the two experiments show that participants understand the causal logic of these two types of confounding. A majority of participants in both experiments was able to disentangle the direct causal relation from the additional spurious relation. How did people achieve this? Previous research on causal judgments has shown that participants tend to control for extraneous causal variables when estimating the causal impact of a target variable (e.g., Spellman, 1996; Waldmann & Hagmayer, 2001). Confounding variables are such extraneous variables. Controlling for these variables, for example by only considering cases in which they are absent, enables us to derive correct inferences. Observations of participants' behavior during the experiment and their written comments indicate that participants used the strategy of focusing on the events in which the confound was absent when assessing whether a direct causal relation was present or not.

How does our research relate to findings showing that people occasionally fail to understand confounding? One critical factor might be the data that are shown to participants. In some studies (e.g., Luhmann, 2005) participants did not receive data that allowed them to focus on the absence of the confounding variable (i.e., holding it constant), which may explain participants' failure. But even when participants receive this information, they may not succeed when the critical cases are rare and highly separated from each other (see Waldmann & Hagmayer, 2001). Common-cause confounding may serve as an example: if the common-cause confound has a high base rate and strongly affects the target cause, only very few cases will occur in which the target cause is present in the absence of the confounding variable. In the two experiments reported here we provided participants with data that contained a relatively large number (>10) of such critical cases. In a pilot study (not reported here) we had presented fewer of these critical observations, and participants consequently failed to arrive at correct conclusions.

In summary, people are able to understand the causal basics of confounding. Whereas previous studies have shown that people sometimes can separate direct from spurious relations (e.g., by controlling for co-factors), our results additionally show that learners have the capacity to reason with causal models containing confounds. Based on trial-by-trial learning data they were surprisingly good at deriving correct predictions for hypothetical observations and hypothetical interventions, and were capable of separating causal from spurious relations in these predictions. This remarkable capacity may fail, however, with more complex models or less salient data.

Acknowledgments

We thank Momme von Sydow for helpful comments and Julia Iwen and Irene Warnecke for collecting the data.

References

Doll, R., & Hill, A. B. (1956). Lung cancer and other causes of death in relation to smoking. A second report on the mortality of British doctors. British Medical Journal, 233, 1071-1076.

Fisher, R. A. (1951). The design of experiments. Edinburgh: Oliver and Boyd.

Fisher, R. A. (1958). Cigarettes, cancer, and statistics. Centennial Review, 2, 151-166.

Gopnik, A., Glymour, C., Sobel, D. M., Schulz, L. E., Kushnir, T., & Danks, D. (2004). A theory of causal learning in children: Causal maps and Bayes nets. Psychological Review, 111, 3-32.

Keppel, G., & Wickens, T. D. (2000). Design and analysis. Upper Saddle River, NJ: Pearson.

Luhmann, C. C. (2005). Confounded: Causal inference and the requirement of independence. In B. G. Bara, L. Barsalou, & M. Bucciarelli (Eds.), Proceedings of the Twenty-Seventh Annual Conference of the Cognitive Science Society (pp. 1355-1360). Mahwah, NJ: Erlbaum.

Meder, B., Hagmayer, Y., & Waldmann, M. R. (2005). Doing after seeing. In B. G. Bara, L. Barsalou, & M. Bucciarelli (Eds.), Proceedings of the Twenty-Seventh Annual Conference of the Cognitive Science Society (pp. 1461-1466). Mahwah, NJ: Erlbaum.

Pearl, J. (2000). Causality: Models, reasoning, and inference. Cambridge: Cambridge University Press.

Sloman, S. A., & Lagnado, D. A. (2005). Do we "do"? Cognitive Science, 29, 5-39.

Spellman, B. A. (1996). Acting as intuitive scientists: Contingency judgments are made while controlling for alternative potential causes. Psychological Science, 7, 337-342.

Spirtes, P., Glymour, C., & Scheines, R. (1993). Causation, prediction, and search. New York: Springer-Verlag.

Waldmann, M. R., & Hagmayer, Y. (2001). Estimating causal strength: The role of structural knowledge and processing effort. Cognition, 82, 27-58.

Waldmann, M. R., & Hagmayer, Y. (2005). Seeing versus doing: Two modes of accessing causal knowledge. Journal of Experimental Psychology: Learning, Memory, and Cognition, 31, 216-227.
