
Hill’s Criteria for Causality

Despite philosophic criticisms of inductive inference, inductively oriented causal criteria have commonly been used to make such inferences. If a set of necessary and sufficient causal criteria could be used to distinguish causal from noncausal associations in observational studies, the job of the scientist would be eased considerably. With such criteria, all the concerns about the logic or lack thereof in causal inference could be forgotten: it would only be necessary to consult the checklist of criteria to see if a relation were causal. We know from philosophy that a set of sufficient criteria does not exist [3, 6]. Nevertheless, lists of causal criteria have become popular, possibly because they seem to provide a road map through complicated territory.

A commonly used set of criteria was proposed by Sir Austin Bradford Hill [1]; it was an expansion of a set of criteria offered previously in the landmark Surgeon General’s report on Smoking and Health [11], which in turn were anticipated by the inductive canons of John Stuart Mill [5] and the rules of causal inference given by Hume [3]. Hill suggested that the following aspects of an association be considered in attempting to distinguish causal from noncausal associations: strength, consistency, specificity, temporality, biologic gradient, plausibility, coherence, experimental evidence, and analogy. The popular view that these criteria should be used for causal inference makes it necessary to examine them in detail:

Strength

Hill’s argument is essentially that strong associations are more likely to be causal than weak associations because, if they could be explained by some other factor, the effect of that factor would have to be even stronger than the observed association and therefore would have become evident (see Cornfield’s Inequality). Weak associations, on the other hand, are more easily explained by undetected biases. To some extent this is a reasonable argument, but, as Hill himself acknowledged, the fact that an association is weak does not rule out a causal connection. A commonly cited counterexample is the relation between cigarette smoking and cardiovascular disease.

Counterexamples of strong but noncausal associations are also not hard to find; any study with strong confounding illustrates the phenomenon. For example, consider the strong but noncausal relation between Down syndrome and birth rank, which is confounded by the relation between Down syndrome and maternal age. Of course, once the confounding factor is identified, the association is diminished by adjustment for the factor.

These examples remind us that a strong association is neither necessary nor sufficient for causality, and that weakness is neither necessary nor sufficient for absence of causality. In addition to these counterexamples, we have to remember that neither relative risk nor any other measure of association is a biologically consistent feature of an association; as described by many authors [4, 7], it is a characteristic of a study population that depends on the relative prevalence of other causes. A strong association serves only to rule out hypotheses that the association is entirely due to one weak unmeasured confounder or other source of modest bias.

Consistency

Consistency refers to the repeated observation of an association in different populations under different circumstances. Lack of consistency, however, does not rule out a causal association, because some effects are produced by their causes only under unusual circumstances. More precisely, the effect of a causal agent cannot occur unless the complementary component causes act, or have already acted, to complete a sufficient cause. These conditions will not always be met. Thus, transfusions can cause HIV infection but they do not always do so: the virus must also be present. Tampon use can cause toxic shock syndrome, but only when other conditions are met, such as presence of certain bacteria. Consistency is apparent only after all the relevant details of a causal mechanism are understood, which is to say very seldom. Even studies of exactly the same phenomena can be expected to yield different results simply because they differ in their methods and random errors. Consistency serves only to rule out hypotheses that the association is attributable to some factor that varies across studies.
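
The confounding example given under Strength (birth rank, maternal age, and Down syndrome) can be made concrete with a small calculation. The sketch below is only an illustration: the counts are hypothetical, invented so that the arithmetic is easy to follow, and the names `strata`, `risk_ratio`, and `pooled` are assumptions of this example, not anything taken from the article or from real data. It compares the crude risk ratio for high versus low birth rank with the risk ratios computed separately within two maternal-age strata.

```python
from fractions import Fraction

# HYPOTHETICAL counts, invented purely for illustration (not real data).
# Each stratum of the confounder (maternal age) holds, for high and low
# birth rank, a pair (number of cases, number of births).
strata = {
    "younger mothers": {"high_rank": (1, 1000), "low_rank": (9, 9000)},
    "older mothers": {"high_rank": (80, 8000), "low_rank": (20, 2000)},
}


def risk_ratio(exposed, unexposed):
    """Risk among the exposed divided by risk among the unexposed."""
    cases_e, total_e = exposed
    cases_u, total_u = unexposed
    return Fraction(cases_e, total_e) / Fraction(cases_u, total_u)


def pooled(group):
    """Sum cases and births for one exposure group across all strata."""
    cases = sum(s[group][0] for s in strata.values())
    total = sum(s[group][1] for s in strata.values())
    return cases, total


# Crude (unadjusted) association: ignores maternal age entirely.
crude_rr = risk_ratio(pooled("high_rank"), pooled("low_rank"))

# Stratum-specific associations: compare births within the same age group.
stratum_rr = {
    name: risk_ratio(s["high_rank"], s["low_rank"]) for name, s in strata.items()
}

print(f"crude risk ratio: {float(crude_rr):.2f}")
for name, rr in stratum_rr.items():
    print(f"risk ratio among {name}: {float(rr):.2f}")
```

With these made-up counts the crude risk ratio is about 3.41, yet the risk ratio is exactly 1 within each maternal-age stratum. The apparently strong association with birth rank arises only because high birth rank is more common among older mothers, whose risk is higher; this is the sense in which adjustment for the confounder diminishes the association.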

Encyclopedia of Biostatistics, Online © 2005 John Wiley & Sons, Ltd. This article is © 2005 John Wiley & Sons, Ltd. This article was published in the Encyclopedia of Biostatistics in 2005 by John Wiley & Sons, Ltd. DOI: 10.1002/0470011815.b2a03072

Specificity

The criterion of specificity requires that a cause leads to a single effect, not multiple effects. This argument has often been advanced to refute causal interpretations of exposures that appear to relate to myriad effects, especially by those seeking to exonerate smoking as a cause of lung cancer. The criterion is wholly invalid, however. Causes of a given effect cannot be expected to lack other effects on any logical grounds. In fact, everyday experience teaches us repeatedly that single events or conditions may have many effects. Smoking is an excellent example: it leads to many effects in the smoker. The existence of one effect does not detract from the possibility that another effect exists. Thus, specificity does not confer greater validity to any causal inference regarding the exposure effect. Hill’s discussion of this criterion for inference is replete with reservations, and many authors regard this criterion as useless and misleading [8, 9].

Temporality

Temporality refers to the necessity that the cause precede the effect in time. This criterion is unarguable, insofar as any claimed observation of causation must involve the putative cause C preceding the putative effect D. It does not, however, follow that a reverse time order is evidence against the hypothesis that C can cause D. Rather, observations in which C followed D merely show that C could not have caused D in these instances; they provide no evidence for or against the hypothesis that C can cause D in those instances in which it precedes D.

Biologic Gradient

Biologic gradient refers to the presence of a monotone (unidirectional) dose–response curve. We often expect such a monotonic relation to exist. For example, more smoking means more carcinogen exposure and more tissue damage, hence more carcinogenesis. Such an expectation is not always present, however. The somewhat controversial topic of alcohol consumption and mortality is an example. Death rates are higher among nondrinkers than among moderate drinkers, but ascend to the highest levels for heavy drinkers. Because modest alcohol consumption can have beneficial effects on serum lipid profiles, such a J-shaped dose–response curve is at least biologically plausible.

Conversely, associations that do show a monotonic trend in disease frequency with increasing levels of exposure are not necessarily causal; confounding can result in a monotonic relation between a noncausal risk factor and disease if the confounding factor itself demonstrates a biologic gradient in its relation with disease. The noncausal relation between birth rank and Down syndrome mentioned above shows a biologic gradient that merely reflects the progressive relation between maternal age and the occurrence of Down syndrome.

Thus the existence of a monotonic association is neither necessary nor sufficient for a causal relation. A nonmonotonic relation only conflicts with those causal hypotheses specific enough to predict a monotonic dose–response curve.

Plausibility

Plausibility refers to the biologic plausibility of the hypothesis, an important concern but one that is far from objective or absolute. Sartwell [9], emphasizing this point, cited the remarks of Cheever, in 1861, who was commenting on the etiology of typhus before its mode of transmission (via body lice) was known:

“It could be no more ridiculous for the stranger who passed the night in the steerage of an emigrant ship to ascribe the typhus, which he there contracted, to the vermin with which bodies of the sick might be infested. An adequate cause, one reasonable in itself, must correct the coincidences of simple experience.”

What was to Cheever an implausible explanation turned out to be the correct explanation, since it was indeed the vermin that caused the typhus infection. Such is the problem with plausibility: it is too often not based on logic or data, but only on prior beliefs. This is not to say that biological knowledge should be discounted when evaluating a new hypothesis, but only to point out the difficulty in applying that knowledge.

The Bayesian approach to inference attempts to deal with this problem by requiring that one quantify, on a probability (0 to 1) scale, the certainty that one has in prior beliefs, as well as in new hypotheses. This quantification displays the dogmatism or open-mindedness of the analyst in a public fashion, with certainty values near 1 or 0 betraying a strong commitment of the analyst for or against a hypothesis. It can also provide a means of testing those quantified beliefs against new evidence [2]. Nevertheless, the Bayesian approach cannot transform plausibility into an objective causal criterion.

Coherence

Taken from the Surgeon General’s report on Smoking and Health [11], the term coherence implies that a cause and effect interpretation for an association does not conflict with what is known of the natural history and biology of the disease. The examples Hill gave for coherence, such as the histopathologic effect of smoking on bronchial epithelium (in reference to the association between smoking and lung cancer) or the difference in lung cancer incidence by sex, could reasonably be considered examples of plausibility as well as coherence; the distinction appears to be a fine one. Hill emphasized that the absence of coherent information, as distinguished, apparently, from the presence of conflicting information, should not be taken as evidence against an association being considered causal. On the other hand, presence of conflicting information may indeed undermine a hypothesis, but one must always remember that the conflicting information may be mistaken or misinterpreted [12].

Experimental Evidence

It is not clear what Hill meant by experimental evidence. It might have referred to evidence from laboratory experiments on animals, or to evidence from human experiments. Evidence from human experiments, however, is seldom available for most epidemiologic research questions, and animal evidence relates to different species and usually to levels of exposure very different from those that humans experience. From Hill’s examples, it seems that what he had in mind for experimental evidence was the result of removal of some harmful exposure in an intervention or prevention program, rather than the results of laboratory experiments [10]. The lack of availability of such evidence would at least be a pragmatic difficulty in making this a criterion for inference. Logically, however, experimental evidence is not a criterion but a test of the causal hypothesis, a test that is simply unavailable in most epidemiologic circumstances.

Although experimental tests can be much stronger than other tests, they are not as decisive as often thought, because of difficulties in interpretation. For example, one can attempt to test the hypothesis that malaria is caused by swamp gas by draining swamps in some areas and not in others to see if the malaria rates among residents are affected by the draining. As predicted by the hypothesis, the rates will drop in the areas where the swamps are drained. As Popper emphasized, however, there are always many alternative explanations for the outcome of every experiment. In this example, one alternative, which happens to be correct, is that mosquitoes are responsible for malaria transmission.

Analogy

Whatever insight might be derived from analogy is handicapped by the inventive imagination of scientists who can find analogies everywhere. At best, analogy provides a source of more elaborate hypotheses about the associations under study; absence of such analogies only reflects lack of imagination or experience, not falsity of the hypothesis.

Conclusion

As is evident, the standards of epidemiologic evidence offered by Hill are saddled with reservations and exceptions. Hill himself was ambivalent about the utility of these “standards” (he did not use the word criteria in the paper). On the one hand he asked “in what circumstances can we pass from this observed association to a verdict of causation?” (original emphasis). Yet, despite speaking of verdicts on causation, he disagreed that any “hard-and-fast rules of evidence” existed by which to judge causation:

“None of my nine viewpoints [criteria] can bring indisputable evidence for or against the cause-and-effect hypothesis and none can be required as a sine qua non.”

Actually, the fourth criterion, temporality, is a sine qua non for causality: If the putative cause did not precede the effect, that indeed is indisputable evidence that the observed association is not causal (although this evidence does not rule out causality in other situations, for in other situations the putative cause may precede the effect). Other than this one condition, however, which may be viewed as part of the definition of causation, there is no necessary or sufficient criterion for determining whether an observed association is causal.

Acknowledgment

This article is adapted from Chapter 2 of Modern Epidemiology 2nd Ed. [8], with permission from the publisher.

References

[1] Hill, A.B. (1965). The environment and disease: association or causation?, Proceedings of the Royal Society of Medicine 58, 295–300.
[2] Howson, C. & Urbach, P. (1993). Scientific Reasoning. The Bayesian Approach, 2nd Ed. Open Court, LaSalle.
[3] Hume, D. (1978). A Treatise of Human Nature (originally published in 1739). Oxford University Press edition, with an Analytical Index by L.A. Selby-Bigge, published 1888. 2nd Ed. with text revised and notes by P.H. Nidditch, published 1978.
[4] MacMahon, B. & Pugh, T.F. (1967). Causes and entities of disease, in Preventive Medicine, D.W. Clark & B. MacMahon, eds. Little, Brown & Company, Boston.
[5] Mill, J.S. (1862). A System of Logic, Ratiocinative and Inductive, 5th Ed. Parker, Son and Bowin, London.
[6] Popper, K.R. (1968). The Logic of Scientific Discovery. Harper & Row, New York.
[7] Rothman, K.J. (1976). Causes, American Journal of Epidemiology 104, 587–592.
[8] Rothman, K.J. & Greenland, S. (1997). Modern Epidemiology, 2nd Ed. Lippincott, Philadelphia, Chapter 8.
[9] Sartwell, P. (1960). On the methodology of investigations of etiologic factors in chronic diseases – further comments, Journal of Chronic Diseases 11, 61–63.
[10] Susser, M. (1988). Falsification, verification and causal inference in epidemiology: reconsiderations in the light of Sir Karl Popper’s philosophy, in Causal Inference, K.J. Rothman, ed. Epidemiology Resources, Inc., Boston.
[11] US Department of Health, Education and Welfare (1964). Smoking and Health: Report of the Advisory Committee to the Surgeon General of the Public Health Service, Public Health Service Publication No. 1103. Government Printing Office, Washington.
[12] Wald, N.A. (1985). Smoking, in Cancer Risks and Prevention, M.P. Vessey & M. Gray, eds. Oxford University Press, New York, Chapter 3.

(See also Causation)

KENNETH J. ROTHMAN & SANDER GREENLAND
