Some Statistical Aspects of Causality
Total Page:16
File Type:pdf, Size:1020Kb
European Sociological Review,Vol. 17 No. 1, 65^74 65 Some Statistical Aspects of Causality D.R. Coxand NannyWermuth A general review of approaches to causality is given from a statistical perspective. Three broad notions are distinguished. In the ¢nal part of the paper the challenges of reaching potentially causal representations are outlined for a study of some German political and social attitudes. Generalities There is a long history in the philosophical literature Symmetricand Directed Relations. Association is a sym- of discussions of causality.It typically regards a cause metric relation between two or possibly more as necessary and su¤cient for an e¡ect: all children features, but causality is asymmetric.That is, if C is of divorced parents have behavioural problems and associated with R then R is associated with C,butif all children with behavioural problems have C is a cause of R then R is not a cause of C. Thus, divorced parents. In virtually all sociological con- given any two features C and R, we need to distin- texts, however, the concern is with multiple causes, guish the possibilities where even if one is predominant.Thus explicit or implicit 1 C and R are to be treated on an equal footing and statistical considerations are inescapable to evaluate dealt with symmetrically in any interpretation; empirical evidence. Even within the statistical view 2 One of the variables, say C, is to be regarded as of causality there are a number of di¡erent formula- explanatory to the other variable, R, regarded as tions. Here we review three. In the ¢nal section of a response. Then, if there is a relation, it is the paper a speci¢c illustration is sketched. Refer- regarded asymmetrically. ences to work quoted by authors only are in The distinction here is not about statistical signi¢- Holland %1986). cance but rather is concerned with substantive interpretation. Notions of Causality GraphicalRepresentation. Auseful graphical represen- tation shows two variables X1 and X2, regarded on Causality as Stable Association an equal footing, if associated, as connected by an undirected edge, whereas two variables such that C Suppose that there is clear evidence that two features is explanatory of R, if connected, are done so by a of the individuals %people, communities, house- directed edge: see Figures 1a and 1b. holds, etc.) under investigation are associated, a There are two possible bases for the distinction possible cause C and a possible response R.For between explanatory and response variables and instance, individuals with high values of C may thus for using a directed edge. One is that features tend to have high values of R and vice versa. Thus referring to an earlier time-point are explanatory to C and R might be test scores for individuals at a features referring to a later time-point.Thus aspects given age in arithmetic and language or they may of previous education may be possible explanatory be the unemployment rate and level of crime in a variables for subsequent career performance. In community.What might it mean to conclude that C such situations the relevant time is not the time is a cause of a response R? when the observation is made but the time to & Oxford University Press 2001 66 D.R.COX AND NANNY WERMUTH Figure 1. "a)Undirected edge between two variables X1, X2 on an equal footing "b)Directed edge between explanatory variable Cand response variable R "c)General dependence ofresponse Ron B, C "d) Specialsituation with R ?? C j B "e)Specialsituation with B ?? C corresponding inparticular to randomization of C which the features refer although, of course, obser- are typicallyassessed empiricallyby some form of re- vations recorded retrospectively are especially gression analysis. In such a situation one would not subject to recall biases. The second is a subject- regard C as a cause of R even though in an analysis matter working hypothesis based for example on without the background variable B there is a statisti- theory or on empirical data from other kinds of cal dependence between C and R. investigation. Suppose that cross-sectional data are This discussion leads to one de¢nition used in the collected on current salary and on attitudes to var- literature of C being a cause of R, namely that there ious social and political issues. It might then be is a dependence between C and R and that the sign reasonable provisionally to regard income as expla- of that dependence is unaltered whatever variables natory of attitudes, of course only as part of a more B1,B2, ... themselves explanatory to C are consid- complex network of relationships. ered simultaneously with C as possible sources of In summary, the ¢rst step towards causality is to dependence. This de¢nition has a long history but require good reasons for regarding C as explanatory is best articulated by I. J. Good and P. Suppes. A of R as a response and that any notion of causal con- corresponding notion for time series is due to N. nection between C and R is that C is a cause of R and Wiener and C.W.Granger.This de¢nition underlies not the other way round. much empirical statistical analysis in so far as it aims to achieve causal explanation. CommonExplanatoryVariables. Next consider the pos- The de¢nition entertains all possible alternative sibility of one or more common explanatory explanatory variables. In any particular study one variables. For this, suppose that a background vari- can atbest checkthatthe measured background vari- able B is potentially explanatory of C and hence also ablesB do notaccountfor the dependencebetweenC of R.There are a number of possibilities of which the and R.The possibility that the dependence could be most general is shown in Figure 1c with directed explained by variables explanatory to C that have not edges from B to C, from C to R and also directly been measured, i.e. by so-called unobserved con- from B to R. On the other hand, if the relation were founders, is less likely the larger the apparent e¡ect that represented schematically in Figure 1d,theonly and can be discounted only by general plausibility dependencebetweenC andR isthatinducedbytheir arguments about the ¢eld in question. Sensitivity both depending on B.ThenC and R are said to be analysis may be helpful as it involves calculating conditionally independent given B,sometimescon- what the properties of an unobserved confounder veniently written R ?? C j B.There is no direct path would have to be to explain away the dependence in from C to R that does not pass via B. Such relations question.When the empirical dependence found is SOME STATISTICAL ASPECTS OF CAUSALITY 67 Figure 2. "a)Intermediate variable Iaccounting foroverall e¡ectof Cafter ignoring I; R ?? C j I "b)Associated variables C,C* on an equalfooting and both explanatory to R very strong it may be possible to argue that it is C would not be enough in assessing whether such a implausible that there is an unmeasured variable path exists. If R is independent of C given an inter- itself with a strong enough dependence to explain mediate variable I, but dependent on I,thenC may away completely the observed e¡ect. For further still have caused I and I may be a cause of R. details see Rosenbaum %1995). Forinstance suppose thatC representsan aspectof primary school education and I some feature of sec- RoleofRandomization. Althoughphysical randomiza- ondary education, which in turn a¡ects some aspect tion by the investigator is rarely possible in ofemployment; see Figure 2a.Doestheaspectof pri- sociological investigations, it is conceptually impor- mary education cause a change in R?IfR is tant to consider brie£y its consequences for conditionally independent of C given I it would be interpretation. In this case in the scheme sketched reasonable tosay that primaryeducation does cause a in Figure 1e there can be no edge between the Bsand change in R and that this change appears to be C, since such dependence would be contrary to ran- explained via what happens in secondary education. domization,i.e.to each individualunder studybeing equallylikely toreceive each treatment possibility.In Explanatory Variables on an Equal Footing. An even this situation an apparent dependence between C more delicate situation arises with variables CÃ on and R cannot be explained by a background variable an equal footing with the variable C whose causal as in Figure 1d. It is in this sense that causality can be status is under consideration; see Figure 2b.Ifthe inferred from randomized experiments and not role of C is essentially the same whether or not CÃ from observational studies as sometimes stated, is conditioned, i.e. whether or not CÃ is included in especially in the statistical literature. While, other the regression equation, there is no problem, at least things being equal, randomized experiments are at a qualitative level. On the other hand it is relatively greatly tobe preferred to observationalstudies, di¤- common to ¢nd clear dependence on C; CÃ as a culties of interpretation, sometimes serious ones, pair, but that either variable on its own is su¤cient remain.The most important are possible interactive to explain the dependence. There are then broadly e¡ects, especially with unobserved explanatory vari- three routes to interpretation: ables, and unanticipated future interventions in the 1 To regard C; CÃ collectively as the possibly system under study that remain unnoticed when the causal variables; ¢nal response is recorded. 2 To present at least two possibilities for interpre- tation, one based on C and one on CÃ; IntermediateVariables.