Vol. 5. 303-31 1, April 1996 Cancer Epidemiology, Biomarkers & Prevention 303
Review
The Practice of Causal Inference in Cancer Epidemiology
Douglas L. Weedt and Lester S. Gorelic causes lung cancer (3) and to guide causal inference for occu-
Preventive Oncology Branch ID. L. W.l and Comprehensive Minority pational and environmental diseases (4). From 1965 to 1995, Biomedical Program IL. S. 0.1. National Cancer Institute, Bethesda, Maryland many associations have been examined in terms of the central 20892 questions of causal inference. Causal inference is often practiced in review articles and editorials. There, epidemiologists (and others) summarize evi- Abstract dence and consider the issues of causality and public health Causal inference is an important link between the recommendations for specific exposure-cancer associations. practice of cancer epidemiology and effective cancer The purpose of this paper is to take a first step toward system- prevention. Although many papers and epidemiology atically reviewing the practice of causal inference in cancer textbooks have vigorously debated theoretical issues in epidemiology. Techniques used to assess causation and to make causal inference, almost no attention has been paid to the public health recommendations are summarized for two asso- issue of how causal inference is practiced. In this paper, ciations: alcohol and breast cancer, and vasectomy and prostate we review two series of review papers published between cancer. The alcohol and breast cancer association is timely, 1985 and 1994 to find answers to the following questions: controversial, and in the public eye (5). It involves a common which studies and prior review papers were cited, which exposure and a common cancer and has a large body of em- causal criteria were used, and what causal conclusions pirical evidence; over 50 studies and over a dozen reviews have and public health recommendations ensued. Fourteen appeared. The association between vasectomy and prostate published reviews on alcohol and breast cancer and 6 cancer also involves a common exposure and a common cancer. published reviews on vasectomy and prostate cancer were It is a relatively recent controversy with a smaller empirical examined. For both series of reviews, nearly all available literature and relatively few reviews. published studies were cited except for ecological studies The intent of this “review of reviews” is not to criticize and prior reviews. Sources of causal criteria were often individual reviewers nor to make judgments regarding associ- not provided. When they appeared, all citations were ations. The limitations of making summary statements from either the 1964 Surgeon General’s report or works of small selected samples are recognized. Recommendations for Austin Bradford Hill. Reviews often excluded and possible improvements to the practice of causal inference are sometimes altered criteria without giving reasons for offered. These are based upon the summary findings and upon these changes. The criteria of consistency and strength of our understanding of two areas of related research: the methods association were almost always used accompanied by and theory of causal inference and the scientific quality of dose-response and biological plausibility in a majority of review papers. reviews. The criterion of temporality, considered by many methodologists to be a necessary causal condition, was infrequently used. Confounding and bias were often Materials and Methods added considerations. Public health recommendations were not discussed in nearly one-half of the reviews. The MEDLINE and CANCERLIT data bases were searched from 1977 through June 1994, using the keywords “alcohol and breast cancer” and “vasectomy and prostate cancer.” Reference Introduction lists of primary research studies and reviews identified were What causes cancer? How much and what kinds of evidence perused for mention of relevant articles. Tables of contents lead to causal conclusions? When should public health recom- from major medical, cancer, public health, and epidemiology mendations be made? These are central questions of causal journals available at the libraries of the NIH were also inference in cancer epidemiology. Causal inference has been an examined. important methodological focus for at least half a century (1, 2). For both associations, three types of review papers were A classic point of reference is the mid- I 960s, when two papers, found: editorials in which a review of existing literature was one by a committee appointed by the Surgeon General (3) and included and either cause or public health recommendations the other by Austin Bradford Hill (4), listed what are usually were considered, reviews devoted to the specific association, called causal criteria. These criteria are shown in Table I . At and “mini-reviews” subsumed within reviews of more general the time, they were used to support the claim that smoking issues. Typically, these mini-reviews were relatively brief sec- tions in papers on alcohol and cancer, diet and cancer, breast (and prostate) cancer epidemiology, and male sterilization.
Received 5/23/95; revised I 2/4/95; accepted I 2/5/95. Only editorials and reviews devoted solely to the specific The costs of publication of this article were defrayed in part by the payment of associations are included in this paper, with one exception, an page charges. This article must therefore be hereby marked advertisement in IARC monograph on alcohol and cancer with a review of the accordance with 18 U.S.C. Section 1734 solely to indicate this fact. alcohol and breast cancer literature (6). To whom requests for reprints should be addressed, at National Cancer Institute, EPS T-4I, 9000 Rockville Pike, Bethesda, MD 20892. Phone: (301) 496-8640; Editorials and reviews are subsequently referred to as Fax: (3))l ) 402-0863. “reviews.” For each, the following questions were addressed:
Downloaded from cebp.aacrjournals.org on September 27, 2021. © 1996 American Association for Cancer Research. .304 Review: Causal Inference in Cancer Epidemiology
Table I Causal criteria as described by A. B. Hill (4) and by the Surgeon (3); in both, the original five criteria were used. The remaining General’s committee (3) four reviews cited works of Austin Bradford Hill (4, 97). In only one instance was a list of nine criteria used in the assess- Consistency” ment (80), although “epidemiological sense” replaced the cri- Strength” Dose-response tenon ofcoherence ofHill. Chu (76) cited Hill (4) but used only Biological plausibility four of his nine criteria: consistency, strength, biological cred- Specificity” ibility (i.e., plausibility), and experimentation. Longnecker (63) Temporality” also cited Hill (4) and used a subset comprising consistency, Experimentation strength, dose-response, and biological plausibility. Lowenfels Coherence’ et a!. (75) cited Hill (97) and used five criteria. In the eight Analogy reviews in which no source for criteria is cited, the number of
“ Criteria described by the Surgeon General’s committee (3). criteria used varied from one (consistency; Ref. 72) to six of the original nine of Hill (4, 77). Overall Use of Causal Criteria and Other Considerations. Because the 1964 Surgeon General’s list ofcausal criteria form K Which studies are included? (Asked as a check for a subset of the 1965 criteria of Hill, and because reviewers systematic exclusion.) citing no source nevertheless used many of the same criteria, ( Which causal criteria are used and what sources for the overall use of criteria can be examined. Table 3 shows that these criteria are cited? the criterion of consistency was always used and strength of ( Are other considerations (e.g. , study design, bias, and association was nearly always used in this series of reviews. confounding) featured? Dose-response and biological plausibility were also used rela- ( What causal conclusions and public health recommen- tively frequently. In only 4 of 14 reviews was the criterion of dations are made? temporality discussed. The need for a randomized trial and its K What is the relationship of a conclusion or recommen- ethical and practical feasibility was considered in 3 of the 14 dation to the results of original research studies published by reviews. Coherence and analogy were rarely used. Table 2 the same author? reveals that bias (7 of 14) and confounding (I I of 14) were the K Do reviewers cite the methods, conclusions, or recom- most common additional considerations in this series. mendations of earlier reviews? Causal Conclusions. The question of cause was addressed in 10 of 14 reviews. None claimed cause. Conclusions included Results reasonably clear rejections of the idea (78, 80, 81) and those Akohol and Breast Cancer that left the issue undecided (70, 73, 74, 77, 82). Two reviewers The consumption of alcoholic beverages has been linked with appeared to lean in the direction of concluding cause. Long- breast cancer since 1974 (7). In the ensuing two decades, ecolog- necker (63) for example, argued that the “evidence appears to ical studies (8-1 1), case-control studies (12-50), and cohort stud- be growing stronger. “ Hiatt (79) argued that the data were “not ies (5 1-60) have been published, representing a total ofjust over yet sufficient to conclude .. . causation.” 50 studies. Metaanalyses have also appeared (61-63). Reports Public Health Recommendations. Public health recommen- from a workshop (64-67) and papers discussing aspects of pub- dations were addressed in 8 of 14 reviews. Three of the eight lished cohort studies (68, 69) have contributed to the controversy. reviews included recommendations for reductions in alcohol Twenty-two reviews (63, 70-90) were published between 1985 consumption, with an emphasis on women at high risk of breast and 1994, appearing every year except 1986. Fourteen satisfied cancer and without cardiovascular disease risk factors (7 1, 72, inclusion criteria (63, 70-82) and are summarized in Table 2; 79). Recommendations were considered premature in two of three were published as editorials (70-72). the eight reviews (76, 80). Another reviewer argued that total Studies Included. The vagaries of the publication process make abstinence would be required and therefore could not be rec- precise determination of studies available to authors difficult, ommended (75), whereas others would not advocate alcohol beyond the obvious condition that studies published less than 1 beverage consumption at any level (78). One reviewer noted year before publication of the review were probably not available. that “before any recommendations are made ... a risk benefit There is little evidence of systematic exclusion of studies, although analysis is in order” (63). Perhaps the strongest recommenda- ecological studies were irregularly cited, as were abstracts (9 1-96). tion to reduce alcohol beverage intake (in high-risk women) Three reviews published since 1991 used methodological criteria was included in a 1987 editorial (7 1) accompanying the pub- to exclude studies from consideration (63, 80, 82). Occasionally, lication of two cohort studies in the same issue of the New reports were not included in reviews for several years after pub- Eng!and Journa! of Medicine (53, 54). The author wrote that Iication. An example is a case-control study on diet and breast “whatever can be done. . should be done.” cancer conducted in Greece (21). Published in 1986, this study was Four of the eight reviews in which public health recom- cited in reviews published in 1988 (73), 1989 (77), 1991 (80), and mendations were considered did not discuss cause (71, 72, 75, 1994 (63) but not in seven other reviews published between 1988 76). Finally, two papers published within 1 year of each other and 1993. A possible explanation is that authors of reviews may used identical criteria (3) on almost the same data, and yet one not have been aware that in this study alcohol consumption was reviewer recommended reductions of alcohol intake for high- measured as one component of the diet. A review published in risk women (79) and the other did not (78). 1994 (63) identified seven case-control studies (43-49) published Authorship of Reviews and Studies. Half of the reviews (7 of between I 978 and 1992 that were rarely cited in previous reviews. 14) were written by individuals who were also authors of Six of these seven did not mention “alcohol” in their titles. studies on the same topic. The author of a positive cohort study Use of Causal Criteria and Sources Cited. Sources of causal argued in a later review that women at high risk should limit criteria were cited in six of fourteen reviews (63, 75, 76, their use of alcohol (79) and the authors of two negative 78-80). Two ofthe six cited the 1964 Surgeon General’s report case-control studies argued against causation (78) or that rec-
Downloaded from cebp.aacrjournals.org on September 27, 2021. © 1996 American Association for Cancer Research. Cancer Epidemiology, Biomarkers & Prevention 305
ommendations were premature (76). On the other hand, authors Discussion of a positive cohort study (74, 77) and of a positive case-control To what extent do these findings accurately represent the prac- study (82) made no public health recommendations nor claimed tice of causal inference in epidemiology? Has a methodological cause. A clear relationship between causal conclusions or pub- paradigm been described? We turn to the language popularized lic health recommendations and the results of studies conducted by the work of the historian and philosopher of science Thomas by the reviewers was not apparent. Kuhn (134) because it is useful in this context, although we are Reviewers Citing Prior Reviews. In only 3 of the 13 reviews mindful of its many meanings and possible linguistic devalu- was the conclusion of a prior reviewer about causation or public ation in scientific writing ( I 35, 136). The core idea is that a health recommendations mentioned (79-81). Citations of prior paradigm is shared by members of a scientific discipline and reviews were patchy. For example, a review published in 1991 represents the normal practice of that discipline (137). (80) mentioned the opposing interpretations of two reviews The practice of causal inference in only two exposure- published in 1989 (78) and 1990 (79) but did not cite the seven cancer associations may not be representative of the practice earlier reviews. A review in 1990 (79) mentioned four earlier within cancer epidemiology. Other reviews, summarizing ap- reviews but conclusions from only two (70, 78). All reviews proaches to causal inference for other associations, will help published in 1989 or earlier made no mention of the conclu- decide the issue. The fact that the findings for alcohol and sions of earlier reviewers about cause or public health. The breast cancer reviews were supported by the vasectomy and review by Plant (8 1) mentioned nearly all prior reviews but prostate cancer series lends support to the idea that a good included conclusions from only three (71, 77, 78). In some approximation to a current methodological paradigm has been instances, reviewers mentioned biases (82) or possible biolog- described. Also important is the fact that no reviewer cited as ical mechanisms (77) discussed in earlier reviews. a source of his or her approach to causal inference any paper after the mid-l960s (the only exception a 1971 version of a Summary of Findings. 1965 classic by the same author). This situation is consistent K Reviewers appear to cite nearly all available published with the definition of a paradigm as a classic work [i.e., as an studies except for ecological designs. Reasons for specific “exemplar” (134, 136)1. Indeed, many papers and textbooks use exclusions are infrequently given. the original criteria of either the Surgeon General (3) or Hill (4) ( Reviewers irregularly cite sources of causal criteria. All as the foundation for their treatment of causal inference meth- citations are either the 1964 Surgeon General’s criteria (3) or odology (138-148). Assuming that a good approximation to the works of Austin Bradford Hill, primarily his classic 1965 paper current practice paradigm has been described, we discuss two (4). categories of findings: the use of causal criteria and the “re- K In citing Hill, reviewers often exclude and sometimes sults” that emerge from their use, causal conclusions and public alter criteria. Reasons for these changes are not provided. health recommendations. ( Independent of citation status, subsets of the criteria of Regarding causal criteria, the most frequently used were Hill (4) are most frequently used. consistency, strength, biological plausibility, and (for alcohol ( Consistency and strength are almost always invoked, and breast cancer) dose-response. Temporality and experimen- accompanied by dose-response and biological plausibility in a tal evidence were much less frequently used, whereas coher- majority of reviews. ence, analogy, and epidemiological sense were almost never ( Confounding and bias are often added considerations. used. We take no issue at this time with the emphasis on K Public health recommendations (both for and against consistency, strength, biological plausibility, and dose-response changes in current policy or practice) appear without claiming (when appropriate). Nor do we question the relative infre- or disclaiming cause. quency with which the criterion of experimental evidence was Reviewers rarely address the methods or conclusions of ( used, given that no such evidence exists for either association. prior reviews. We are intrigued, however, by the lack of emphasis placed on temporality and coherence. In the alcohol and breast cancer series, temporality was Vasectomy and Prostate Cancer considered in only 4 of 14 reviews. In the vasectomy and prostate The health effects of vasectomy have been studied since the late cancer series, temporality was never used. This is a situation in I970s (98). Several prospective studies of morbidity rates in which practice and theory appear to diverge. Both Hill (4) and the vasectomized men (99-101), case-control studies (102-107), Surgeon General’s committee (3) recognized the importance of and cohort studies (108-1 14) have examined the relationship demonstrating the temporal priority of exposures over diseases, between vasectomy and prostate cancer. Seventeen studies have especially for slowly developing diseases such as cancer. With rare been published: accompanying them have been 19 publications exceptions (141), recent authors consider temporality to be indis- reviewing or commenting upon the studies (1 15-133). Six pensable (149), crucial (146), a directly deducible (150) sine qua reviews satisfied inclusion criteria (1 15-120), two of which non of causality (142), not a sufficient condition for causality but originally appeared as editorials (I 15, 116). a necessary one (143, 144). In a 1981 list of ranked criteria, The findings from the reviews on vasectomy and prostate temporality is among the top four, after experimentation, strength, cancer matched those from alcohol and breast cancer, with and consistency (151). A 1993 textbook puts temporality first in a these exceptions: ranked list ( 147). Interestingly, Hill (4) argued that “the temporal ( Dose-response was not used as a causal criterion be- problem may not arise often.” This is a reasonable assertion if cause of the all-or-none nature of the exposure. temporality refers only to the situation in which a purported cause K Because five of the six reviews were published in the is really an effect of the disease, so-called “reverse causation” same year, only one earlier review was available for citation. (146). Such a narrow definition of the criterion makes it easily One of the five ( 1 18) cited it. dispensable in some circumstances. In the alcohol and breast K No reviewers were also authors of original research on cancer series, for example, two reviews claimed that temporality the topic. could be considered moot, given cohort evidence demonstrating
Downloaded from cebp.aacrjournals.org on September 27, 2021. © 1996 American Association for Cancer Research. 306 Review: Causal Inference in Cancer Epidemiology