Fast Implementation of Delong's Algorithm for Comparing the Areas Under Correlated Receiver Operating Characteristic Curves

MANUSCRIPT ACCEPTED BY IEEE SIGNAL PROCESSING LETTERS 1 Fast Implementation of DeLong’s Algorithm for Comparing the Areas Under Correlated Receiver Operating Characteristic Curves Xu Sun; Weichao Xu∗, Member, IEEE Abstract—Among algorithms for comparing the areas under directly from the samples without making any parametric two or more correlated receiver operating characteristic (ROC) model assumptions on the forms of the parent populations. curves, DeLong’s algorithm is perhaps the most widely used For samples drawn from continuous distributions, algorithms one due to its simplicity of implementation in practice. Unfor- tunately, however, the time complexity of DeLong’s algorithm is of linearithmic time complexity have been proposed by Xu et of quadratic order (the product of sample sizes), thus making it al. for estimating the mean and variance of AUC [17]. Unfortu- time-consuming and impractical when the sample sizes are large. nately, however, sometimes in practice, the samples obey non- Based on an equivalent relationship between the Heaviside func- continuous distributions, that is, the probability of ties between tion and mid-ranks of samples, we improve DeLong’s algorithm samples are not zero. Under this circumstance, sub-quadratic by reducing the order of time complexity from quadratic down to linearithmic (the product of sample size and its logarithm). estimators for the variance of and covariance between AUCs Monte Carlo simulations verify the computational efficiency of are still unavailable to the best of our knowledge. This lack of our algorithmic findings in this work. efficient algorithms makes AUC comparison computationally Index Terms—Area under the curve (AUC), DeLong’s method, very expensive in scenarios (e.g. bioinformatics) involving mid-rank, receiver operating characteristic (ROC). massive data analysis. Motivated by this unsatisfactory situation, in this work we I. INTRODUCTION improve the popular DeLong’s algorithm [14] by reducing the time complexity from quadratic down to linearithmic order. RIGINATED from detection theory developed during This is accomplished through a relationship we find between World War II [1]–[4], receiver operating characteris- the Heaviside function and the mid-ranks of samples. ticO (ROC) analysis has found a wide use in a number of The rest part of this paper is organized as follows. Section II fields, including medicine, psychology, bioinformatics, signal gives the basic definition of AUC as well as some general processing, and machine learning, just to name a few [5]– notations employed throughout this work. Section III depicts [9]. Geometrically, ROC curve is a two-dimensional curve our linearithmic algorithm after establishing a close relation- traced out by pairs of false-positive rate and true-positive ship between the Heaviside function and mid-ranks associated rate according to various decision threshold settings. Given with samples. In Section IV, we demonstrate the efficiency of the ROC curve, the area under the curve (AUC) can then our improved algorithm in terms of time-complexity by Monte be computed, either analytically or empirically, as a figure Carlo experiments. Finally, we summarize our main finding of merit to summarize a diagnostic system’s performance [5], and draw our conclusion in Section V. a binary classifier’s overall accuracy [6], or a detector’s power of detecting the presence of an unknown signal [7]–[9]. II. DEFINITIONS As revealed by Bamber [10], AUC can be estimated by the Mann-Whitney U statistic (MWUS) [11]. Based on such For completeness and ease of later development, this section relationship, a plenty of nonparametric methods have been pro- describes the definition of nonparametric estimator of AUC as posed in the literature [12]–[16], which formulate algorithms well as DeLong’s formulas for estimating the variance of and covariance between correlated AUCs. Copyright (c) 2012 IEEE. Personal use of this material is permitted. Let X1;:::;Xm and Y1;:::;Yn be two independent and However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to [email protected]. identically distributed (i.i.d) samples drawn from two popula- This work was supported in part by National Natural Science Foundation of tions (whose distributions can be either continuous or discrete). China (Project 61271380), in part by Guangdong Natural Science Foundation Then, based on the relationship between the MWUS and AUC, (Project S2012010009870), in part by 100-Talents Scheme Funding from Guangdong University of Technology (Grant 112418006), in part by the Talent the sample version of AUC can be defined as Introduction Special Funds from Guangdong Province (Grant 2050205), in 1 m n part by a team project from Guangdong University of Technology (Grant ^ θ , (Xi Yj) (1) GDUT2011-07), and in part by Project Program of Key Laboratory of mn H − i=1 j=1 Guangdong Higher Education Institutes of China (Grant 2013CXZDA015). X X X, Sun and W. Xu are with the Department of Automatic Control, School where of Automation, Guangdong University of Technology, Guangzhou, 510006, P. R. China (e-mail:[email protected]; [email protected]). 1 t > 0 ∗Corresponding Author. (t) = 1 t = 0 (2) Tel:+86-020-39322552 Fax:+86-020-39322469 H 8 2 EDICS:MLSAS-PATT; SAS-MALN; SAS-STAT < 0 t < 0 : 2 MANUSCRIPT ACCEPTED BY IEEE SIGNAL PROCESSING LETTERS is the familiar Heaviside function. A. Relationship between mid-ranks and ( ) H · Let E( ), V( ) and C( ; ) denote the mean, variance and Let ;:::; be a sequence of real numbers. Sorting the · · · · Z1 ZM covariance of (between) random variables, respectively. It is sequence in ascending order yields a new sequence, termed ^ easily seen from (1) that θ is an unbiased estimator of the the order statistics [18]–[23], as corresponding population version θ, since = = < < = = (= ) Z(1) ··· Z(1) ··· Z(J) ··· Z(J) Zi 1 ^ Block1 BlockJ E(θ) = θ , Pr(X > Y ) + Pr(X = Y ): (3) (10) 2 < < = = : | {z } ··· Z| (K) ··· {z Z(K) } Let BlockK ^ ^(1) ^(k) θ , θ ;:::; θ (4) Suppose that BlockJ , whose elements are all equal to i, starts f g | {z } Z at position a and ends at position b in the sorted sequence (10). be a vector of statistics representing the areas under the ROC Then the mid-ranks of i’s in the original sequence is defined (r) (r) Z curves derived from different readings X1 ;:::;Xm and as [24] (r) (r) b Y1 ;:::;Yn (1 r k) of k different experiments. 1 a + b ≤ ≤ T ( i) = k = : (11) For the rth element of the vector, define the “structural Z Z b a + 1 2 k=a components” − X Given (11), it then follows that mid-ranks are closed related to 1 n the Heaviside function ( ), as shown in the following lemma. V (X(r)) = (X(r) Y (r)); i = 1; : : : ; m (5) H · 10 i n H i − j Lemma 1: The mid-ranks of i’s in 1;:::; M can be j=1 Z Z Z X computed by and M 1 T ( i) = ( i j) + : (12) m Z Z H Z − Z 2 1 j=1 V (Y (r)) = (X(r) Y (r)); j = 1; : : : ; n: (6) X 01 j m H i − j i=1 Proof: It suffices to show that the right side of (12) equals X the rightmost term in (11), as (r;s) Also define two matrices S 10 , s10 and S 01 , M a 1 b k k 1 − 1 1 a + b × ( i j) + = 1 + 1 + = : (13) s(r;s) such that h i H Z − Z 2 2 2 2 01 j=1 k k X kX=1 kX=a h i × 1 m (r;s) (r) ^(r) (s) ^(s) Note that from (10) and (11), T ( i), i = 1;:::;M, can be s10 = V10(Xi ) θ V10(Xi ) θ Z Z m 1 − − obtained in linearithmic time, i.e., (M log M), by using the − i=1 O X h i h (7)i popular quick sort algorithm [25] (See Fig. 1). The relationship and of (12), while useful for developing efficient algorithms later on, is not employed to calculate T ( i), i = 1;:::;M, due 1 n Z Z s(r;s) = V (Y (r)) θ^(r) V (Y (s)) θ^(s) : to its quadratic time complexity. 01 n 1 01 j − 01 j − j=1 − X h i h i (8) B. Fast algorithm for computing θ^ Then, DeLong et al. [14] proposed a variance-covariance Lemma 2: Denote by Z1;:::;ZN , N = m + n, the matrix estimator for the vector θ^ in (4), as concatenated sequence of X1;:::;Xm and Y1;:::;Yn. Then 1 1 n S = S 10 + S 01: (9) (Xi Yj) = TZ (Xi) TX (Xi) (14) m n H − − j=1 X When the vector θ^ contains only one element, that is, r = s = m 1 in (7) and (8), the covariance estimator in (9) reduces to a (Yj Xi) = TZ (Yj) TY (Yj) (15) H − − variance estimator (θ^). i=1 V X Proof: It follows from (12) along with the definition of III. LINEARITHMIC ALGORITHMS the Z-sequence that Possessing the time complexities of orders (mn) and N 2 O ^ 1 [kmn + k (m + n)], respectively, the algorithm of θ in TZ (Xi) = (Xi Zj) + O H − 2 (1) and the algorithm of S in (9) are both computationally j=1 X inefficient, especially when the sample sizes m and n are large. m 1 n However, by the relationship of the Heaviside function and = (Xi Xj) + + (Xi Yj) H − 2 H − j=1 j=1 mid-ranks shown in Lemma 1 below, linearithmic algorithms X X can be formulated based on DeLong’s formulas (5)–(9). TX (Xi) | {z } SUNSUNetet al. al.: FAST: FAST IMPLEMENTATION IMPLEMENTATION OF OF DELONG’S DELONG’S ALGORITHM ALGORITHM FOR COMPARING COMPARING THE THE AREAS AREAS UNDER UNDER CORRELATED CORRELATED ROC ROC CURVES CURVES 3 3 C.

Load more