Estimating Causal Effects
International Journal of Epidemiology 2002;31:422–429
© International Epidemiological Association 2002. Printed in Great Britain.

THEORY AND METHODS

George Maldonado (a) and Sander Greenland (b)

(a) University of Minnesota School of Public Health, Mayo Mail Code 807, 420 Delaware St. SE, Minneapolis, MN 55455–0392, USA. E-mail: GMPhD@umn.edu
(b) Department of Epidemiology, UCLA School of Public Health, Los Angeles, CA 90095–1772, USA.

Although one goal of aetiologic epidemiology is to estimate ‘the true effect’ of an exposure on disease occurrence, epidemiologists usually do not precisely specify what ‘true effect’ they want to estimate. We describe how the counterfactual theory of causation, originally developed in philosophy and statistics, can be adapted to epidemiological studies to provide precise answers to the questions ‘What is a cause?’, ‘How should we measure effects?’ and ‘What effect measure should epidemiologists estimate in aetiologic studies?’ We also show that the theory of counterfactuals (1) provides a general framework for designing and analysing aetiologic studies; (2) shows that we must always depend on a substitution step when estimating effects, and therefore the validity of our estimate will always depend on the validity of the substitution; (3) leads to precise definitions of effect measure, confounding, confounder, and effect-measure modification; and (4) shows why effect measures should be expected to vary across populations whenever the distribution of causal factors varies across the populations.

Introduction

Imagine that the creator of the universe appears to you in a dream and grants you the answer to one public-health question. The conversation might go as follows:

You: What is the true effect of (your exposure here, denoted by E) on the occurrence of (your disease here, denoted by D)?

Creator: What do you mean by ‘the true effect’? The true value of what parameter?

You: The true relative risk.

Creator: Epidemiologists use the term relative risk for several different parameters. Which do you mean?

You: The ratio of average risk with and without exposure, what some call the risk ratio [1] and others call the incidence proportion ratio [2].

Creator: Which incidence proportion ratio?

You: Pardon?

Creator: Do you want a ratio of average disease risk in two different groups of people with different exposure levels?

You: Yes.

Creator: So you want a descriptive incidence proportion ratio?

You: No, not descriptive. Causal. An incidence proportion ratio that isolates the effect of E on D from all other causal factors.

Creator: By ‘isolate’, you mean a measure that applies to a single population under different possible exposure scenarios?

You: Yes, that’s what I mean.

Creator: OK. Which causal incidence proportion ratio?

You: Pardon?

Creator: For what population, and for what time period? The true value of a causal incidence proportion ratio can be different for different groups of people and for different time periods. It’s not necessarily a biological constant, you know.

You: Yes, of course. For population (your population here, denoted by P) between the years (your study time period here, denoted by t0 to t1).

Creator: By population P, do you mean: (1) everyone in population P, or (2) the people in population P who have a specific set of characteristics?

You: Pardon?

Creator: As I just said, the true value of a causal incidence proportion ratio is not necessarily a biological constant. It can be different for subgroups of a population.

You: Of course. Everyone in population P.

Creator: OK. Comparing what two exposure levels?

You: Exposed and unexposed.

Creator: What do you mean by exposed and unexposed? Exposed for how long, to how much, and during what time period? There are many different ways you could define exposed and unexposed, and each of the corresponding possible ratios can have a different true value, you know.

You: Of course. Ever exposed to any amount of E versus never exposed to E.

Creator: The incidence proportion ratio for the causal effect on D of ever E compared to never E in population P during the study time period t0 to t1 is (your causal incidence-proportion-ratio parameter value here).

The point of the above is that, while one goal of etiologic epidemiology is to estimate ‘the true effect’ of an exposure on disease frequency, we usually do not precisely specify what ‘true effect’ we want to estimate. We may not be able to do so. For example, before reading this paper would you have required less prompting than in the dialog above? How many published papers explicitly state what the authors mean by ‘true’ relative risk or odds ratio, or whether the estimated measure of association is intended to have a descriptive or causal interpretation? How many papers explicitly define the population or time period of interest? How many etiologic papers overemphasize results that cannot be given a causal interpretation, such as significance tests, P-values, correlation coefficients, or proportion of variance ‘explained’?

In this paper we discuss the questions ‘What is a cause?’, ‘How should we measure effects?’ and ‘What effect measure should epidemiologists estimate in etiologic studies?’ We begin by adapting the counterfactual approach to causation, originally developed in philosophy and in statistics [3,4], to epidemiological studies. In the process, we give precise answers to these questions, and we describe how these answers have important implications for etiologic research: (1) Under the counterfactual approach, the measure we term a ‘causal contrast’ is the only meaningful effect measure for etiologic studies. (2) The counterfactual approach provides a general framework for designing and analysing epidemiological studies. (3) The counterfactual definition of causal effect shows why direct measurement of an effect size is impossible: We must always depend on a substitution step when estimating effects, and the validity of our estimate will thus always depend on the validity of the substitution [3,5–7]. (4) The counterfactual approach makes clear that a critical step in study interpretation is the formal quantification of bias in study results. (5) The counterfactual approach leads to precise definitions of effect measure, confounding, confounder, and to precise criteria for effect-measure modification.

In the discussion that follows, we assume that the study outcome is a disease (e.g. lung cancer); this discussion can be readily extended to any outcome (e.g. a health behaviour such as cigarette smoking). We also assume for simplicity that disease occurrence is deterministic; under a stochastic model, the quantities we discuss are probabilities or expected values [6,7].

The counterfactual approach

History

In 1748, the renowned Scottish philosopher David Hume wrote ‘we may define a cause to be an object followed by another … where, if the first object had not been, the second never had existed.’ [3,8] A key innovation of this definition was that it pivoted on a clause of the form ‘if C had not occurred, D would not have either’, where C and D are what actually occurred. Such a clause, which hypothesizes what would have happened under conditions contrary to actual conditions, is called a counterfactual conditional. Despite its early appearance, this counterfactual concept of causation received no formal basis until 1923 when the statistician Jerzy Neyman presented a quantitative conceptual model for causal analysis [9]. This model […]

[…] of causality theory is provided by Pearl [15], who shows how structural-equation models and graphical causal models (causal diagrams) translate directly to counterfactual models, shedding light on all three approaches. A brief review of these connections is given by Greenland [21], and Greenland et al. [22] provide a more extensive review of graphical causal modelling for epidemiological research.

Target

We will use the term target population for the group of people about which our scientific or public-health question asks, and therefore for which we want to estimate the causal effect of an exposure. The target population could be composed of one group of people (as in most epidemiological studies), several groups of people (as in an intervention study in several communities), or one person. For simplicity, in the rest of this discussion we assume that the target population is one group of people.

Let the etiologic time period be the time period about which our scientific or public-health question asks. The beginning and end of this time period are specified by the study question. For example, in a study of the effectiveness of a back-injury prevention programme in a workplace, the etiologic time period could be any time period after the implementation of the programme. Note that this period may vary among individuals (e.g. the etiologic time period for a study of weight gain during pregnancy and pre-eclampsia spans only pregnancy), and not all of the period need be time at risk. For example, the etiologic time period for a study of intrauterine diethylstilbestrol exposure and subsequent fertility problems could include childhood, a time at no risk of such problems but during which etiologically relevant events (e.g. puberty) occur.

Let A be the number of new cases of the study disease in the target population during the etiologic time period. Let B be the denominator for computing disease frequency in the target population during the etiologic time period. If B is the number of people at risk at the beginning of the period and all individuals are followed throughout the etiologic time period, the disease-frequency parameter

$$R = \frac{A}{B}$$

is the proportion getting disease over the period (incidence proportion, average risk). If B is the amount of person-time at risk during the period, R is the person-time incidence rate.
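The two readings of the disease-frequency parameter R = A/B can be illustrated with a small numerical sketch. The `Person` record, its field names, and the four-person cohort below are hypothetical and are not taken from the paper; the sketch assumes everyone listed is at risk at the start of the etiologic time period and is followed throughout it.

```python
from dataclasses import dataclass


@dataclass
class Person:
    # Hypothetical record for one member of the target population.
    person_time: float  # time at risk contributed during the period (e.g. years)
    new_case: bool      # True if the person became a new case during the period


def incidence_proportion(cohort):
    """R = A / B with B = number of people at risk at the start of the period."""
    a = sum(p.new_case for p in cohort)  # A: new cases during the period
    b = len(cohort)                      # B: people at risk at the start
    return a / b


def incidence_rate(cohort):
    """R = A / B with B = total person-time at risk during the period."""
    a = sum(p.new_case for p in cohort)     # A: new cases during the period
    b = sum(p.person_time for p in cohort)  # B: person-time at risk
    return a / b


# Illustrative 5-year period: two people become cases (after 2 and 3 years
# at risk) and two remain disease-free for the full 5 years.
cohort = [
    Person(person_time=2.0, new_case=True),
    Person(person_time=5.0, new_case=False),
    Person(person_time=5.0, new_case=False),
    Person(person_time=3.0, new_case=True),
]

print(incidence_proportion(cohort))  # average risk over the period
print(incidence_rate(cohort))        # cases per person-year at risk
```

With these made-up numbers, A = 2 in both cases, but B is 4 people for the incidence proportion and 15 person-years for the incidence rate, so the same numerator yields two different parameters depending on the denominator chosen.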