The Bio in Biostatistics
Total Page:16
File Type:pdf, Size:1020Kb
The Bio in Biostatistics James A. Hanley Dept. of Epidemiology, Biostatistics & Occupational Health Faculty of Medicine, McGill University SSC Impact Award Address Annual Meeting of Statistical Society of Canada Montréal, Québec 2018-06-05 Things They Don’t Teach You in Graduate School1 James A Hanley McGill University Abstract Much of what statisticians teach and use in practice is learnt ‘on the job.’ I recount here some of my early statistical experiences, and the lessons we might learn from them. They are aimed at those of you starting out in the profession today, and at the teachers who train you. I stress the importance of communication. Key Words communication; communication; communication. James A. Hanley ( http://www.biostat.mcgill.ca/hanley )isProfessor, Department of Epidemiology, Biostatistics and Occupational Health, Mon- treal, Quebec, H3A 1A2, Canada. (email: [email protected]). This work was partially supported by the Natural Sciences and Engineering Re- search Council of Canada, and Le Fonds Qu´eb´ecois de la recherche sur la nature et les technologies. The author thanks Marvin Zelen for helpful com- ments. 1An expansion on some after-dinner remarks made at the Conference of Applied Statisticians of Ireland, held in Killarney, May 17-19, 2006. The article is dedicated to two former colleagues – and superb communicators – Fred Mosteller and Steve Lagakos, who are no longer with us. 2012.07.19 JOURNAL OF MATHEMATICAL PSYCHOLOGY: 6, 487-496 (1969) Maximum-Likelihood Estimation of Parameters of Signal-Detection Theory and Determination of Confidence Intervals-Rating-Method Data1 DONALD D. DORFMAN~ San Diego State College, San Diego, California 92115 AND EDWARD ALF, JR. U.S. Naval Personnel Research Activity, San Diego, California 92133 Procedures have been developed for obtaining maximum-likelihood estimates of the parameters of the Thurstonian model for the method of successive intervals. The signal-detection model for rating-method data is a special case of the Thurstonian model with fixed boundaries, in that there are two stimuli rather than an unspecified set. The present paper presents the solution to the two-stimulus case, and in addition, provides procedures for obtaining the variance-covariance matrix and confidence intervals. The expected values of the second partial derivatives are presented in analytic form to ensure accurate computation of the variance-covariance matrix. An applica- tion of these methods was employed on some data collected by others. Dorfman and Alf (1968) recently developed procedures for obtaining maximum- likelihood estimates of the parameters of signal-detection theory from data of yes-no ROC curves. Ogilvie and Creelman (1968) recently developed maximum-likelihood estimates and confidence intervals for the parameters of signal-detection theory from rating-method data, by using the logistic distribution rather than the normal distribu- tion to make the mathematics more tractable. They estimated d’ by means of an empirical relation which they obtained between d’ and an analogous parameter in the logistic model. This relation was found through numerical experiments on a high- speed computer. Unfortunately, a stable empirical relation could not be found between the sigma ratio of signal-detection theory and the analogous parameter of the logistic model. Consequently, a procedure assuming underlying normal distributions would be preferred. Schijnemann and Tucker (1967) developed maximum likelihood estimates of the parameters of the Thurstonian model for the method of successive intervals. Specifically, they assumed a set of normally distributed stimuli with unequal dispersions, and a set of fixed boundaries or cutoffs. They estimated the stimulus i Research supported in part by a grant from the National Science Foundation, GS-1466. a Now at the University of Iowa. 487 480/6/3-11 from Encyclopedia of Biostatistics, 2005 2 Receiver Operating Characteristic (ROC) Curves 1 commonly used. Getting a reader to use all of the TPF (2+2+11+33) /51 ratingFigure categories 1 Example provided of empirical yields ROC a more points stable and smoothROC (2+11+33) /51 (6 (11+33) /51 + curve estimate, but is not always easy to accomplish (6 6 curve fitted to them. The empirical points are calculated (11 + + 11 0.8 11 withoutfrom successively causing other more problemsliberal definitions [23]. Use of test of positivityratings + + 2) /58 + 2) /58 2) /58 fromapplied the to continuous the 2 5table(inset)ofdiseasestatus(D 0–100% confidence scale [31, × + 49]or D has)andratingcategory( several advantages: it moreto closely). The resem- smooth 33/51 − −− ++ 2/58 ROC curve is derived from the fitted binormal model (inset, 0.6 bles reader’s clinical thinking and reporting; its use lower right, with parameters a 1.657 and b 0.713 on of a finer scale leads to somewhat= smaller= stan- Rating category acontinuouslatentscale)byusingallpossiblescalevalues -- - +/− + ++ total dardfor test errors positivity.of estimated The fitted indicesparameters ofa accuracy;and b,together and 0.4 D− 33 6 6 11 2 58 it increases the possibility that the data will allow D+ 3 2 2 11 33 51 with the four estimated cutpoints, produce fitted frequencies parametricof 32.9, 6.4 curve, 5.9, 10 fitting..7, 2.1 and 3.2, 1.5, 2.1, 11.2, 32.9 for{ the D and D rows} of the{ 2 5table.Notethat} D− − + × 0.2 a monotonic transformation of the latent axis may produce Obtainingoverlapping distributions an ROC with Curve nonbinormal and Summary shapes, but will Latent scale Indicesyield the same Derived multinomial from distributions it and the same fitted D+ FPF ROC curve 0 0.2 0.4 0.6 0.8 1 For rating scale data, the 2 (D states) k (rating categories) frequency table of the ratings yields× k 1 Figure 1 Example of empirical ROC points and smooth empirical (TPF, FPF) ROC points. As shown− in curve fitted to them. The empirical points are calculated Figure 1, these are obtained from the k 1possible from successively more liberal definitions of test positivity − applied to the 2 5table(inset)ofdiseasestatus(D two-by-two tables formed by different re-expressions × + of the 2 k data table. After TPF 0atFPF 0, or D )andratingcategory( to ). The smooth × = = ROC− curve is derived from the fitted−− binormal++ model (inset, the lowest leftmost ROC point is derived using the lower right, with parameters a 1.657 and b 0.713 on strictest cutpoint, where only the most positive cate- acontinuouslatentscale)byusingallpossiblescalevalues= = gory would be regarded as positive; each subsequent for test positivity. The fitted parameters a and b,together point towards the top right ROC corner (TPF with the four estimated cutpoints, produce fitted frequencies 1, FPF 1) is obtained by employing successively= of 32.9, 6.4, 5.9, 10.7, 2.1 and 3.2, 1.5, 2.1, 11.2, 32.9 laxer criteria= for test positivity. For objective tests for{ the D and D rows} of the{ 2 5table.Notethat} a monotonic− transformation+ of the latent× axis may produce that yield numerical data,thesameprocedure–with overlapping distributions with nonbinormal shapes, but will each distinct observed numerical test value as a cate- yield the same multinomial distributions and the same fitted gory boundary and with k no longer fixed a priori but ROC curve rather determined by the numbers of ‘runs’ of D and D in the aggregated data – is used to calculate+ the series− of empirical ROC data points. The sequence of The complexity of the test material can have an points can then be joined to form the empirical ROC important bearing on the ability of a study to compare curve or a smooth curve can be fitted. tests. Cases resulting in an ROC curve that is midway As a summary measure of accuracy, one can between the diagonal (subtle or completely obscure use: (i) TPF[FPF], the TPF corresponding to a single ones) and the upper left corner (all ‘obvious’) allow selected FPF; (ii) the area under the ROC curve; or for sizable differences in performance; however, the (iii) the area under a selected portion of the curve, closer the curve is to the upper left corner, the often called the partial area. Summary (i) is readily narrower is the sampling distribution of the various understood and most clinically pertinent. However, indices derived from the curve [39]. reported TPFs are often in reference to different FPF For clinical imaging studies involving inter- values, and it may be unclear whether a reference pretations, the most economical method of col- FPF was chosen in advance or after inspection of the lecting a reader’s impression of each case is curve. Moreover, the statistical reliability tends to be through the use of a rating scale, i.e. graded lev- lower than that of other summary indices. els of confidence that the case is D . A discrete Summary (ii) has been recommended as an alter- five-point scale – 1 “definitely not diseased”,+ 2 native [52]. It has an interpretation in signal detection “probably not diseased”,= 3 “possibly diseased”, 4= theory as the proportion of correct choices in a two- “probably diseased”, 5 “definitely= diseased” –= is alternative forced choice experiment [21], i.e. an = No.s of students in intro. course ’81-’93 (n=7 in ’80) and Distribution of Birthdays JOURNAL OF MATHEMATICAL PSYCHOLOGY 12, 387-415 (1975) The Area above the Ordinal Dominance Graph and the Area below the Receiver Operating Characteristic Graph DONALD BAMBER Psychology Service, Veterans Administration Hospital, St. Cloud, Minnesota 56301 Receiver operating characteristic graphs are shown to be a variant form of ordinal dominance graphs. The area above the latter graph and the area below the former graph are useful measures of both the size or importance of a difference between two populations and/or the accuracy of discrimination performance. The usual estimator for this area is closely related to the Mann-Whitney U statistic.