Biometrics 59, 1–8
March 2003

PRESIDENTIAL ADDRESS: XXI International Biometric Conference, Freiburg, Germany, July 2002

Are Statistical Contributions to Medicine Undervalued?

Norman E. Breslow
Department of Biostatistics, Box 357232, University of Washington, Seattle, Washington 98195-7232, U.S.A.
email: [email protected]

Summary. Econometricians Daniel McFadden and James Heckman won the 2000 Nobel Prize in economics for their work on discrete choice models and selection bias. Statisticians and epidemiologists have made similar contributions to medicine with their work on case-control studies, analysis of incomplete data, and causal inference. In spite of repeated nominations of such eminent figures as Bradford Hill and Richard Doll, however, the Nobel Prize in physiology and medicine has never been awarded for work in biostatistics or epidemiology. (The "exception who proves the rule" is Ronald Ross, who, in 1902, won the second medical Nobel for his discovery that the mosquito was the vector for malaria. Ross then went on to develop the mathematics of epidemic theory—which he considered his most important scientific contribution—and applied his insights to malaria control programs.) The low esteem accorded epidemiology and biostatistics in some medical circles, and increasingly among the public, correlates highly with the contradictory results from observational studies that are displayed so prominently in the lay press. In spite of its demonstrated efficacy in saving lives, the "black box" approach of risk factor epidemiology is not well respected. To correct these unfortunate perceptions, statisticians would do well to follow more closely their own teachings: conduct larger, fewer studies designed to test specific hypotheses, follow strict protocols for study design and analysis, better integrate statistical findings with those from the laboratory, and exercise greater caution in promoting apparently positive results.
Key words: Black box; Epidemiology; Nobel Prize; Observational studies; Ronald Ross.

The international community of statisticians was elated by news that the 2000 Nobel Prize in economics had been awarded to econometricians James Heckman and Daniel McFadden. Heckman was cited for the "development of theory and methods for analyzing selective samples," McFadden "for his development of theory and methods for analyzing discrete choice." The presentation speech by Swedish economist Professor Karl Gustav Jöreskog made clear that the selection of the 2000 Nobelists was based squarely on their contributions to the development of statistical methodology (http://www.nobel.se/economics/laureates/2000/presentation-speech.html):

    Their methods have become standard tools of microeconomic research in economics as well as in other social sciences, and have been applied to solving many important problems of society. ...[They have] made it possible to study individual economic behavior in a statistically correct way.

The Bank of Sweden Prize in Economic Sciences in Memory of Alfred Nobel was established in 1968. Ever since then, the selection committee has broadly interpreted the domain of economic science, and has granted awards for methodology and interdisciplinary work. Previous economics prizes, for example, had gone to Frisch and Tinbergen in 1969 for development of "dynamic models for analysis of economic processes," to Arrow and Hicks in 1972 and Debreu in 1983 for "general economic equilibrium theory," to Solow in 1987 for mathematical models of macroeconomic growth, and to Haavelmo in 1988 for the "probability theory foundations of econometrics." The 2000 prize was a dramatic confirmation of the value and importance of statistical thinking in the social sciences.

McFadden's method for discrete choice analysis, developed to assist a graduate student studying freeway routing choices made by the California Department of Transportation, had its basis in formal theories of economic behavior. Under Marschak's random utility model, observed economic choices—such as whether to ride the bus or drive the car to work—result from multinomial sampling (Becker, DeGroot, and Marschak, 1963). Psychologist Duncan Luce (1959) added the rather strong condition that the ratio of odds for any two choices was independent of the set of other available choices. These assumptions led McFadden (1973) to his "conditional logit model"

\[ \Pr(x \mid s, B) = \frac{e^{\nu(s,x)}}{\sum_{y \in B} e^{\nu(s,y)}}, \]

where x is the choice selected, B is the set of available choices, and s measures attributes of the individual decision maker. Corollary developments included methods for outcome-dependent, "choice-based" sampling. Manski and Lerman (1977) proposed a weighted likelihood method with origins in sampling theory. Manski and McFadden (1981) developed a pseudolikelihood procedure to which they referred as conditional maximum likelihood, or CML. Cosslett (1981) studied semiparametric efficient "maximum likelihood" estimation. McFadden and Reid (1975) realized the need to correct for attenuation when trying to forecast aggregate demand from disaggregated models, and proposed methods for the combination of data from individuals and aggregates.

As I pointed out in my 1995 Fisher Lecture to the Joint Statistical Meetings (Breslow, 1996), much of this work in econometrics was paralleled by biostatistical contributions to the analysis of data from case-control studies in epidemiology. These were stimulated by the results of three case-control studies of lung cancer and smoking published in 1950, the most influential of which was that of Doll and Hill (1950). They included Cornfield's (1951) demonstration of the preservation of the odds ratio under case-control sampling, Mantel and Haenszel's (1959) summary relative risk estimator, and Cox's (1958b, 1966) semiparametric approach to logistic regression. Anderson (1972) and Prentice and Pyke (1979) sanctioned applications of "prospective" logistic regression to case-control samples, while Breslow and Day (1980) developed extensions for matched samples of cases and controls.

Heckman's work also has parallels in the (bio)statistical literature. His selection problem was based on the model

\[ g(z) = \frac{\pi(z) f(z)}{\int \pi(u) f(u)\, du}, \]

where g is the probability distribution observed for the selected (biased) sample, \pi is the selection probability, and f is the target (population) distribution. Solution of the problem stems from the realization that knowledge or estimation of the selection probability \pi allows recovery of f from g. The selection may be due to biased sampling—such as occurs with censored survival data, length-biased sampling in screening studies, or choice-based (case-control) sampling. Or it may be due to self-selection by study subjects, whether into the study in the first place or into the treatment group in nonrandomized studies. The press release announcing the prize noted (http://www.nobel.se/economics/laureates/2000/press.html, italics added): "[Heckman] ... has also proposed tools for solving closely related problems with individual differences unobserved by the researcher."

His views on counterfactual causality were explicitly spelled out in a very readable review in the Journal of Educational Statistics (Heckman, 1989). Distilling the problem to its essence, he defined Y_1 to be the outcome if the subject was treated, Y_0 the outcome if it was not, and R to be a binary indicator of whether Y_1 (R = 1) or Y_0 (R = 0) was observed. The population of interest was the causal population specified by the joint probability distribution F(y_1, y_0). What could be observed was the distribution G(y, r) of the observed outcome and the treatment indicator. Citing Fisher (1951), Roy (1951), and Cox (1958a) as the source of these ideas, he noted that the fundamental problem of causal inference was the fact that

\[ \beta = \mathrm{E}(Y_1 - Y_0) \neq \mathrm{E}(Y \mid R = 1) - \mathrm{E}(Y \mid R = 0). \]

One possible way of making progress was to identify covariates Z such that treatment assignment R was independent of the treatment outcomes (Y_1, Y_0) conditional on Z. And he was very clear that this was always an assumption!

If this all sounds very familiar—it is! Precisely the same ideas involving counterfactual causality have been explored at length in the statistical literature. Key sources are Rubin (1974) on causal models, Little and Rubin (1987) on the analysis of incomplete data, Rosenbaum and Rubin (1984) on the propensity score, Robins (1986) on the G-computation algorithm, Robins, Hernan, and Brumback (2000) on the marginal structural model for longitudinal analysis, and Pearl (2000) on graphical causal models. I was most disappointed to learn that an invited paper session at this conference devoted to this important topic had to be canceled.

The main point of all of this, of course, is that statisticians, biostatisticians, and epidemiologists have been just as active and successful as their counterparts in econometrics in developing statistical methods for causal inference, discrete outcomes, outcome-dependent sampling, and many other statistical problems. However, because they have chosen to apply their work in medicine and public health, rather than economics and social science, none is likely ever to win a Nobel Prize. I first became aware of this issue in 1972, when I went to work for the International Agency for Research on Cancer (IARC). I learned of the impact of the pioneering work of Doll and Hill—not only on reduction of deaths from lung cancer by identification of its major preventable cause, but also on the development of chronic disease epidemiology as a discipline. Richard Doll left
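The two models discussed above lend themselves to a quick numerical illustration. The Python sketch below uses invented numbers (the utilities v and the distributions f and pi are hypothetical, not taken from the address): it evaluates conditional-logit choice probabilities, checks Luce's condition that the odds of any two choices do not depend on which other alternatives are available, and shows how knowing the selection probability pi recovers the target distribution f from the biased observed distribution g.

```python
import math

# --- McFadden's conditional logit: Pr(x | B) = exp(v[x]) / sum_y exp(v[y]) ---
def conditional_logit(v, choice_set):
    """Choice probabilities over the available set, given utilities v."""
    denom = sum(math.exp(v[y]) for y in choice_set)
    return {x: math.exp(v[x]) / denom for x in choice_set}

# Hypothetical utilities for three commute modes.
v = {"car": 1.0, "bus": 0.2, "bike": -0.5}
p_full = conditional_logit(v, ["car", "bus", "bike"])
p_pair = conditional_logit(v, ["car", "bus"])

# Luce's condition: the car-vs-bus odds are the same with or without "bike".
iia_holds = abs(p_full["car"] / p_full["bus"]
                - p_pair["car"] / p_pair["bus"]) < 1e-12

# --- Selection model on a discrete z: g(z) = pi(z) f(z) / sum_u pi(u) f(u) ---
f = {0: 0.5, 1: 0.3, 2: 0.2}    # hypothetical target (population) distribution
pi = {0: 0.9, 1: 0.5, 2: 0.1}   # hypothetical selection probabilities
norm = sum(pi[z] * f[z] for z in f)
g = {z: pi[z] * f[z] / norm for z in f}   # biased distribution we observe

# Knowing pi lets us undo the bias: f(z) is proportional to g(z) / pi(z).
inv = sum(g[z] / pi[z] for z in g)
f_recovered = {z: (g[z] / pi[z]) / inv for z in g}
```

The second half is the discrete analogue of the integral in Heckman's model: inverse weighting by the selection probability, then renormalizing, returns the population distribution exactly.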