Comparison of Receiver Operating Characteristic Curves on the Basis of Optimal Operating Points

Statistics for Radiologists
Harold L. Kundel, MD, Editor

Comparison of Receiver Operating Characteristic Curves on the Basis of Optimal Operating Points

Ethan J. Halpern, MD(1), Michael Albert, PhD(1), Abba M. Krieger, PhD(2), Charles E. Metz, PhD(3), Andrew D. Maidment, PhD(1)

Rationale and Objectives. We developed a method of comparing receiver operating characteristic (ROC) curves on the basis of the utilities associated with their optimal operating points (OOPs).

Methods. OOPs were computed for paired ROC curves on the basis of isocost lines in ROC space with slopes ranging from 0.1 to 3.0. For each pair of OOPs corresponding to a single isocost slope, the difference in costs and the variance of this difference were computed. A sensitivity analysis was thus obtained for the difference between the two curves over a range of isocost slopes. Three published data sets were evaluated using this technique, as well as by comparisons of areas under the curves and of true-positive fractions at fixed false-positive fractions.

Results. The OOPs of paired ROC curves often occur at different false-positive fractions. Comparisons of ROC curves on the basis of OOPs may provide results that differ from comparisons of curves at a fixed false-positive fraction.

Conclusion. ROC curves may be compared on the basis of utilities associated with their OOPs. This comparison of the optimal performance of two diagnostic tests may differ from conventional statistical comparisons.

Key Words. Receiver operating characteristic curve; area under the curve; optimal operating point; statistical comparison.

From the (1)Department of Radiology, Jefferson Medical College of Thomas Jefferson University Hospital, Philadelphia, PA; (2)Department of Statistics, Wharton School, University of Pennsylvania, Philadelphia, PA; and (3)Department of Radiology, University of Chicago Medical Center, Chicago, IL. This work was partly supported by a grant from General Electric. Ethan J. Halpern, MD, is a member of the General Electric-Association of University Radiologists' Radiology Research Academic Fellowship Program. Address reprint requests to E. J. Halpern, MD, Department of Radiology, Thomas Jefferson University, 132 S. 10th St., Philadelphia, PA 19107-5244. Received August 11, 1995, and accepted for publication after revision November 20, 1995.

Acad Radiol 1996;3:245-253. © 1996, Association of University Radiologists.

Diagnostic tests often provide a continuous value that may be interpreted as a dichotomous result (normal or abnormal). The true-positive and false-positive rates of the dichotomous interpretation depend on the underlying distributions of test results (Fig. 1A) in the normal and abnormal populations, as well as on the cutoff value used to discriminate between normal and abnormal populations. As the cutoff value is varied, a receiver operating characteristic (ROC) curve is generated (Fig. 1B). For any given clinical scenario, there is an optimal operating point (OOP) on the ROC curve that defines the most appropriate cutoff value to discriminate a positive from a negative test result.

FIGURE 1. A, Distribution of signal and noise from which the classic receiver operating characteristic (ROC) curve is derived. The dichotomous interpretation of the diagnostic study is positive when the value obtained is greater than c. B, In the resulting ROC curve, each value of c defines an expected true-positive fraction (TPF) and an expected false-positive fraction (FPF). N = normal, S = abnormal.

In terms of cost-benefit analysis, the OOP on a ROC curve maximizes the expected utility of a diagnostic test. The utility of a diagnostic test depends on the prior expectation of disease (or disease prevalence) and the relative costs incurred by a false-positive or a false-negative result. The slope of a line of "isoutility" in ROC space is given by

    slope = {[1 − (prevalence of disease)] × (cost of false-positive result)} / {(prevalence of disease) × (cost of false-negative result)}.   (1)

This slope defines a family of parallel lines. The OOP on a ROC curve is the point at which the curve is tangent to the highest line of isoutility that intersects the ROC curve; the slope of the ROC curve at its OOP is equal to the slope of the isoutility line [1].

By analogy with laboratory tests that provide a continuous numeric result, imaging studies provide a result (supporting or refuting a particular diagnosis) with a variable confidence level. The ROC curve for a diagnostic imaging study plots the true- and false-positive rates of the study at these various confidence levels. The OOP for a diagnostic imaging study defines the confidence level that will provide the best test performance from the cost-benefit analysis perspective.

ROC curves are used both to evaluate individual diagnostic tests and to compare the relative accuracy of competing diagnostic tests. The area under the ROC curve (AUC) provides a summary index to evaluate a diagnostic test [2] and may be used to provide a statistical comparison of competing diagnostic tests [2, 3]. The AUC evaluates the accuracy of a diagnostic test over the full range of possible discriminating cutoff values. In practice, however, most tests are (or at least should be) applied with a discriminating value close to the OOP. Thus, the AUC may not be truly representative of the diagnostic accuracy of a test as it is used in clinical practice.

To provide a comparison of the more clinically relevant portions of ROC curves, a method has been described to compare true-positive fractions (TPFs) at preselected false-positive fractions (FPFs) [4]. In this technique, it is assumed that two diagnostic tests should be compared with their respective discriminating values adjusted to achieve identical false-positive rates. Often, however, the OOP of one test is at a different false-positive level than the OOP of a competing diagnostic study. Under such circumstances, a comparison of TPFs at a preselected FPF does not properly compare the optimal utilization of both tests.

It has been suggested recently that a method should be developed for comparing ROC curves on the basis of the cost (or utility) associated with the OOP for each diagnostic study [5]. We developed such a technique and applied it to three data sets obtained from the radiology literature [6-8]. These data sets were analyzed previously by comparing AUCs or TPFs at set FPFs. In this article, we analyze these data sets on the basis of the costs associated with the OOPs and compare our results with the conventional techniques used in the original publications.

MATERIALS AND METHODS

Determination of the OOP on a ROC Curve

Assume that we have a diagnostic study that must distinguish between noise and signal. The magnitudes of both the noise and the true signal are normally distributed with means μN and μS, where N denotes normal and S abnormal. As detailed in the Appendix, a ROC curve for this scenario is defined by two parameters, a and b. Parameter a represents the normalized difference between the means: a = (μS − μN)/σS. Parameter b represents the ratio of the standard deviations: b = σN/σS. Equation 1 provides the slope of a line of isoutility in ROC space. This line will be tangent to a ROC curve at its OOP. We call this slope β. Assuming that we know the value of β for a particular diagnostic situation, we need to find the OOP on the ROC curve for a diagnostic test. The FPF and TPF at the OOP are given by

1) for b = 1,

    FPFoop(a, β) = Φ[−a/2 − ln(β)/a]   (2)
    TPFoop(a, β) = Φ[+a/2 − ln(β)/a]

2) for b ≠ 1,

    FPFoop(a, b, β) = Φ{[ab − √(a² + 2(1 − b²) ln(β/b))] / (1 − b²)}
    TPFoop(a, b, β) = Φ{[a − b√(a² + 2(1 − b²) ln(β/b))] / (1 − b²)}
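Equations 1 and 2 can be sketched in Python. This is a reader's illustration, not code from the paper; the function names are our own, and Φ is implemented with the standard library's `math.erf`:

```python
from math import erf, log, sqrt

def phi(x):
    # Standard normal CDF: area under the normal curve to the left of x.
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def isoutility_slope(prevalence, cost_fp, cost_fn):
    # Equation 1: slope of an isoutility line in ROC space.
    return ((1.0 - prevalence) / prevalence) * (cost_fp / cost_fn)

def oop(a, b, beta):
    # Equation 2: (FPF, TPF) at the optimal operating point of a
    # binormal ROC curve with parameters a and b, for isoutility slope beta.
    if b == 1.0:
        fpf = phi(-a / 2.0 - log(beta) / a)
        tpf = phi(+a / 2.0 - log(beta) / a)
    else:
        root = sqrt(a * a + 2.0 * (1.0 - b * b) * log(beta / b))
        fpf = phi((a * b - root) / (1.0 - b * b))
        tpf = phi((a - b * root) / (1.0 - b * b))
    return fpf, tpf
```

As a sanity check, the slope of the binormal curve TPF = Φ(a + b·Φ⁻¹(FPF)) at the point returned by `oop` equals β, which is the tangency condition the equations encode.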
where Φ is the area under the standard normal curve to the left of the value within the parentheses.

Computation of the Cost of a Diagnostic Test

The slope of the isoutility line in ROC space, β, defines the relative utility of true- and false-positive results for a given prevalence of disease. At any point on a ROC curve, the expected cost of a diagnostic test is determined by the FPF and TPF and is given by

    K = λ[β(FPF) − (TPF)] + Cstudy,   (3)

where K is the expected cost (negative utility), λ represents a constant that translates K into the appropriate units of cost, and Cstudy represents the intrinsic costs of the diagnostic study, including actual monetary costs as well as potential morbidity-mortality that may result from the test.

The difference in costs between two diagnostic tests, K1 − K2, may be calculated from equation 3. The variance of this difference is approximated from the variances and covariance of the a and b parameters of each ROC curve (Appendix). A Z statistic may be calculated as the ratio of (K1 − K2) to the standard deviation of (K1 − K2). The constant λ is eliminated because it appears in both the cost and its standard deviation. As shown in the Appendix, the resulting Z statistic depends only on a, b, and β.

When a ROC curve is determined by the maximum-likelihood technique, the values of the curve parameters a and b, and the variances and covariance of these parameters, are estimated from ordinal image-reading or test-result data. In general, the value of β for any diagnostic test is not known with certainty because it depends on disease prevalence and the perceived cost of underdiagnosis and overdiagnosis (equation 1). Given an estimate of β, we may calculate the Z statistic.
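The cost comparison in equation 3 and the sensitivity analysis over a range of isocost slopes can be sketched as follows. This is again our own illustration (restricted to b = 1 curves for brevity); the variance term needed for the paper's full Z statistic comes from the Appendix and is omitted here:

```python
from math import erf, log, sqrt

def phi(x):
    # Standard normal CDF.
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def oop_equal_variance(a, beta):
    # Equation 2 for b = 1: (FPF, TPF) at the optimal operating point.
    return phi(-a / 2.0 - log(beta) / a), phi(a / 2.0 - log(beta) / a)

def expected_cost(fpf, tpf, beta, lam=1.0, c_study=0.0):
    # Equation 3: K = lambda * [beta * FPF - TPF] + Cstudy.
    return lam * (beta * fpf - tpf) + c_study

# Sensitivity analysis in the spirit of the Methods section: compare two
# hypothetical b = 1 curves (a = 1.0 vs a = 1.5) over a range of isocost
# slopes; K1 - K2 > 0 means the second test is cheaper at its OOP.
for beta in (0.1, 0.5, 1.0, 2.0, 3.0):
    k1 = expected_cost(*oop_equal_variance(1.0, beta), beta)
    k2 = expected_cost(*oop_equal_variance(1.5, beta), beta)
    diff = k1 - k2
```

Dividing each `diff` by its standard deviation, derived from the variances and covariance of the maximum-likelihood estimates of a and b, would yield the Z statistic described above.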
