
Comment on “The Environment and Disease: Association or Causation?”

Christopher J. Phillips, Joel Greenhouse

Observational Studies, Volume 6, Issue 2, 2020, pp. 24-29 (Article)

Published by University of Pennsylvania Press DOI: https://doi.org/10.1353/obs.2020.0005

For additional information about this article https://muse.jhu.edu/article/793346/summary

Observational Studies 6 (2020) 24-29 Submitted 8/19; Published 1/20


Christopher J. Phillips, Department of History
Joel Greenhouse, Department of Statistics
Carnegie Mellon University, Pittsburgh, PA 15289

A. B. Hill’s 1965 discussion of the relationship between statistical association and causality has become so well known among epidemiologists that it is easy to treat his list of relevant “aspects” as timeless and context-less, as a set of logical postulates. In this comment, we instead want to place Hill’s article in its historical context, both that of the author, and even more so that of ’s understanding of statistical association and causation.

The most obvious, but often neglected, context is that the text was first presented as the Presidential Address to the Royal Society of Medicine’s Section of Occupational Medicine. Presidential addresses, particularly when given by senior colleagues, are opportunities for reflection beyond a typical research article. There’s nothing necessary about this reflectivity, however, and the following year’s presidential address by R.S.F. Schilling was instead an impassioned account of the dangers of trawler fishing (Schilling, 1966). This was also not the first such address Hill himself had given – his May 1954 address to the Epidemiology and Preventive Medicine section eschewed broad generalization in favor of a more traditional account of the expected versus observed cases of interwar polio in England and Wales, broken down by county and locale (Hill, 1954). The important contextual difference in October 1964 was that the Occupational Medicine section had just been formed, and Terence Cawthorne’s opening address noted that the section was intended not just for physicians and surgeons but also for scientists from a range of disciplines (Cawthorne, 1965). As subsequent speakers at the first meeting made clear, the traditional focus on industrial medicine was still present, but the renaming as “occupational medicine” signaled a broader audience, from sports physicians to education workers.
In this context, Hill set out to address a question that by its very nature was interdisciplinary and of great interest to this larger group: what is the relationship between an environmental agent and disease, and when is it appropriate to identify such a relationship as causal? This was a pressing issue for the new section, Hill wrote, because the characterization of the relationship between occupational conditions and sickness is “fundamental” and yet problematic. When is a respiratory illness among workers, he asks, simply associated with dust in the environment, and when is it caused by it? As Hill well knew, the question of causation cannot be answered solely by any one field; instead, by drawing on physiology, statistics, labor relations, , and pulmonology, Hill sought to portray the question as essentially interdisciplinary and therefore of great interest to the new section.

© 2020 Christopher Phillips and Joel Greenhouse.

Hill himself had long been concerned with causal questions. Though he modestly doesn’t cite his own work, he had written a long report on the prevalence and origins of respiratory illnesses among Lancashire’s cotton cardroom operators in the 1920s for the Medical Research Council. The cleaning of raw cotton prior to spinning was known to cast off dust and fibers, and in 1927 the UK’s home secretary established an investigation into “whether, and if so to what extent, dust in cardrooms in the cotton industry is a cause of ill-health or disease among cardroom operatives” (Hill, 1930). It was already known that cleaning the carding machines themselves had health effects. The owners of the mills contended that the mechanical methods installed to remove dust had remedied the problem, while operatives contended that cleaning cotton carried health risks distinct from those of cleaning the machines. Hill was tasked with finding out what kind of ill effects, if any, there were from the cleaning of cotton, whether these were distinct from other respiratory diseases known to occur at different stages of the process, and what, if anything, could be done about them. Hill’s choice of this same example in 1964 is a clear indication that the question of “environment and disease” was one that he had been contemplating for a long time.

Hill’s role within industrial health efforts of the 1920s and 1930s put him in contact with colleagues who themselves saw an essential role for statistics in making causal claims. In contrast to then-contemporary biomedicine, with its focus on bacteria and other microscopic agents of infection, industrial and environmental health were areas in which “causal factors” and “causal relationships” were known to be multifactored, complicated, and often hidden.
Work in these areas using statistical rates to make claims about causality, from the relationship of housing and health to that of infant mortality, goes back at least to William Farr and Florence Nightingale. Later, G. Udny Yule used statistical methods, specifically a regression equation, to try to pinpoint causes of pauperism in England at the turn of the century.¹ Hill’s work at the Medical Research Council along these lines was initially overseen by Major Greenwood, an influential statistician whose own training, combining statistics (he studied under Karl Pearson) with physiology and preventive medicine, exemplified the growing importance of data for measuring associations between health and environmental agents (Higgs, 2000). Indeed, there’s a good argument that Hill inherited the mantle of statistics in medicine from this earlier generation. Their approaches – emphasizing careful study design and data collection techniques, relying on relatively conservative uses of statistics, avoiding formal inference tests and elaborate models – would later characterize Hill’s own approach throughout his career.

¹ Yule (1899) was discussed thoughtfully alongside other examples in Freedman (1999).

With remarkable clarity in his 1964 address, Hill lays out the question of interest:

Our observations reveal an association between two variables, perfectly clear-cut and beyond what we would care to attribute to the play of chance. What aspects of that association should we especially consider before deciding that the most likely interpretation of it is causation? (p. 295)

Contrary to Rothman and Greenland’s claim (1998) that Hill’s criteria were essentially “an expansion of criteria offered previously in the landmark Surgeon General’s report on Smoking and Health,” Hill’s approach for distinguishing causal from non-causal associations was developed based on his longstanding experience in epidemiologic field studies. As an illustration we consider his influential paper with Richard Doll, “Smoking and Carcinoma of the Lung” (1950). This was an early case-control study, conducted in 20 hospitals in the London region, of patients presenting with cancer of the lung, stomach and large bowel. The patients with carcinoma of the stomach and large bowel served as one comparison group; a second comparison group consisted of non-cancer general hospital patients, “chosen so as to be of the same sex and age as the lung-carcinoma patients.” In the Discussion section, Doll and Hill synthesized the results, in language that would both implicitly and explicitly feature the relevant aspects of association: specificity, biological gradient, consistency, coherence, and plausibility.

• “...the comparison of the smoking habits of patients in different groups...revealed no association between smoking and cancer of the other sites (mainly stomach and large bowel). The association therefore seems to be specific to carcinoma of the lung.”
• “The effect of smoking varies, as would be expected, with the amount smoked.”
• “How do these results fit in with other known facts about smoking and carcinoma of the lung? Both the consumption of tobacco and the number of deaths attributed to cancer of the lung are known to have increased, and to have increased largely, in many countries this century.”
• “As to the nature of the carcinogen we have no evidence. The only carcinogenic substance which has been found in tobacco smoke is arsenic, but the evidence that arsenic can produce carcinoma of the lung is suggestive rather than conclusive. Should arsenic prove to be the carcinogen, the possibility arises that it is not the tobacco itself which is dangerous.
Insecticides containing arsenic have been used for the protection of the growing crop since the end of the last century and might conceivably be the source of the responsible factor.”

Clearly, Doll and Hill are systematically and logically assessing the body of evidence from their study and the existing epidemiologic literature to make a case for (or against) a causal association.

In the last quotation, Doll and Hill consider an alternative explanation for the observed association between tobacco smoke and carcinoma of the lung: arsenic in tobacco. Curiously, in his 1964 address Hill does not explicitly include elimination of alternative explanations as a relevant aspect. However, Hill clearly considered the elimination of alternative explanations central to the establishment of a causal association, as he indicated in his Watson Memorial Lecture (1962), itself repeatedly cited in the 1964 address. In the section entitled “The Assessment of Evidence” he writes:

We are continuously brought back to the fundamental question - what alternative explanation will fit a set of observations, what other differences between our contrasted groups could equally, or better, account for the observed incidences. That is the crux of the matter - and no χ² test or other application of the Greek alphabet will answer it. It demands an experience, and acumen, in what to collect, how to seek in the data for essentials, how to interpret. (p. 189)


This focus on alternative explanations predated Hill’s work with Doll on smoking and health. Indeed, his experience with the Industrial Health Research Board had already established this pattern in the 1920s. When looking at whether artificial humidification produced diseases among workers in the cotton industry’s sheds, for example, Hill showed how simply by collecting the right data one could carefully rule out possible alternative explanations for the sickness of workers (Hill, 1927). He included towns with only one kind of shed, to ensure sicker workers were not selecting the humid sheds; he also included towns with both humid and dry sheds to ensure there was not a bias at that level of selection; he tracked the sickness of every weaver, even those who left before the end of the study, to ensure that the departure of the sickest workers was not itself an explanation of the data; and he consulted over 20,000 records to be sure that any measured differences were unlikely to be explained by chance. Ruling out alternative explanations didn’t involve formal tests for Hill so much as it did careful data collection and logical thinking.

Counterfactual history is a dangerous game to play, but it is likely uncontroversial to say that even if Hill hadn’t published his list of criteria, some set of criteria for moving from statistical association to causation would have become standard by the 1980s. We might instead be talking about the U.S. Surgeon General’s “Criteria of the Epidemiologic Method,” since their publication in 1964’s Smoking and Health featured a similar emphasis on consistency, strength, specificity, temporal relationship, and coherence. Throughout the 1950s, a number of eminent epidemiologists and statisticians had tried to specify how exactly statistical methods might be useful for making causal claims. In 1959, Jerome Cornfield and colleagues published a long review article systematically evaluating the question of causation in smoking and health.
The conclusion – that smoking plays a “causal role” in the acquisition of lung cancer – was based on the strength, specificity, temporal relationship, consistency, and coherence of the existing data, together with a systematic attempt to eliminate alternative explanations (Greenhouse, 2009). These were, of course, exactly the criteria later used in the Surgeon General’s report. Two years earlier, Abraham M. Lilienfeld’s “Epidemiological Methods and Inferences in Studies of Noninfectious Diseases” (1957) also used smoking as the model for thinking about how to elucidate possible etiological factors, settling on a pragmatic approach to analyzing statistical associations which emphasized, e.g., that when the exposure to a factor is diminished, the incidence should diminish as well. And two years before that, E. Cuyler Hammond’s chapter on “Cause and Effect” in Ernest Wynder’s The Biologic Effects of Tobacco made it plain that “The eventual aim of epidemiologic research is to discover means by which conditions may be altered in such a way as to lower disease incidence and mortality rates or at least to limit their rise. Thus, it becomes a search for causative factors.” Moreover, for Hammond, causative factors were readily ascertained statistically because they typically acted “quantitively,” namely they “increas[ed] the probability that a specific event will occur” (Hammond, 1955: 173-174). Though their emphases and subtleties differed, many epidemiologists and biostatisticians in the years after World War II shared a consistent approach: thinking systematically and rigorously about how to make causal claims. Hill’s lecture, while more expansive in its list of criteria than anything that came before, did little to change the overall trajectory of how practitioners were using statistical measures of association.

As a coda, it is worthwhile to remember that Hill’s lecture wasn’t hailed as a masterpiece, or even as particularly important, when it was first published. Though now widely cited, it was hardly cited at all through the 1970s (only about twice per year on average in the fifteen years after publication). In fact, Mervyn Susser’s 1973 textbook Causal Thinking in the Health Sciences has its “criteria of judgment” explicitly modeled on the Surgeon General’s report. As Susser was to admit years later, he had not even read Hill’s article when preparing his textbook, surely a sign, if there is any, that its importance wasn’t immediately apparent (Susser, 1973, 1991). We think it plausible, in fact, that the association of causal criteria with Hill’s speech was ultimately more of an honorary gesture towards Hill’s own importance in the 1980s and 1990s. By that point, there was little doubt of his lasting role in the development of clinical trials and observational studies, in the use of statistics by governmental agencies, and, indeed, in the ways we can make rigorous causal claims.

References

Cawthorne, T. (1965). Opening Address. Proceedings of the Royal Society of Medicine. 58(5):289-94.
Cornfield, J. et al. (1959). Smoking and Lung Cancer: Recent Evidence and a Discussion of Some Questions. Journal of the National Cancer Institute. 22(1):173-203.
Doll, R. and Hill, A.B. (1950). Smoking and Carcinoma of the Lung. British Medical Journal. 2(4682):739-748.
Freedman, D. (1999). From Association to Causation: Some Remarks on the History of Statistics. Statistical Science. 14(3):243-258.
Greenhouse, J.B. (2009). Commentary: Cornfield, Epidemiology, and Causality. International Journal of Epidemiology. 38(5):1199-1201.
Hammond, E.C. (1955). Cause and Effect. The Biologic Effects of Tobacco. E.L. Wynder, ed. Little, Brown, Boston. 171-196.
Higgs, E. (2000). Medical Statistics, Patronage, and the State: The Development of the MRC Statistical Unit, 1911-1948. Medical History. 44:323-340.
Hill, A.B. (1927). Artificial Humidification in the Cotton Weaving Industry. Medical Research Council, Industrial Fatigue Research Board, Report No. 48. London: Her Majesty’s Stationery Office.
Hill, A.B. (1930). Sickness Among Operatives in Lancashire Cotton Spinning Mills. Medical Research Council, Industrial Health Research Board, Report No. 59. London: Her Majesty’s Stationery Office.
Hill, A.B. (1954). Poliomyelitis in England and Wales Between the Wars. Proceedings of the Royal Society of Medicine. 47(9):795-805.
Hill, A.B. (1962). Alfred Watson Memorial Lecture: The Statistician in Medicine. Journal of the Institute of Actuaries. 88(2):178-191.
Lilienfeld, A.M. (1957). Epidemiological Methods and Inferences in Studies of Noninfectious Diseases. Public Health Reports. 72(1):51-60.
Rothman, K.J. and Greenland, S. (1998). Causation and Causal Inference. In Modern Epidemiology, 2nd edition. Lippincott Williams & Wilkins, Philadelphia.
Schilling, R.S. (1966). Trawler Fishing: An Extreme Occupation. Proceedings of the Royal Society of Medicine. 59(5):405-410.
Susser, M. (1973). Causal Thinking in the Health Sciences. Oxford University Press, New York.
Susser, M. (1991). What is a Cause and How Do We Know One? A Grammar for Pragmatic Epidemiology. American Journal of Epidemiology. 133(7):635-648.
Yule, G.U. (1899). An Investigation into the Causes of Changes in Pauperism in England, Chiefly During the Last Two Intercensal Decades. Journal of the Royal Statistical Society. 62(2):249-295.
