<<

MANUSCRIPT ACCEPTED BY IEEE SIGNAL PROCESSING LETTERS 1 Fast Implementation of DeLong’s Algorithm for Comparing the Areas Under Correlated Receiver Operating Characteristic Curves

Xu Sun, Weichao Xu∗, Member, IEEE

Abstract—Among algorithms for comparing the areas under directly from the samples without making any parametric two or more correlated receiver operating characteristic (ROC) model assumptions on the forms of the parent populations. curves, DeLong’s algorithm is perhaps the most widely used For samples drawn from continuous distributions, algorithms one due to its simplicity of implementation in practice. Unfor- tunately, however, the time complexity of DeLong’s algorithm is of linearithmic time complexity have been proposed by Xu et of quadratic order (the product of sample sizes), thus making it al. for estimating the and of AUC [17]. Unfortu- time-consuming and impractical when the sample sizes are large. nately, however, sometimes in practice, the samples obey non- Based on an equivalent relationship between the Heaviside func- continuous distributions, that is, the probability of ties between tion and mid-ranks of samples, we improve DeLong’s algorithm samples are not zero. Under this circumstance, sub-quadratic by reducing the order of time complexity from quadratic down to linearithmic (the product of sample size and its logarithm). estimators for the variance of and between AUCs Monte Carlo simulations verify the computational efficiency of are still unavailable to the best of our knowledge. This lack of our algorithmic findings in this work. efficient algorithms makes AUC comparison computationally Index Terms—Area under the curve (AUC), DeLong’s method, very expensive in scenarios (e.g. ) involving mid-rank, receiver operating characteristic (ROC). massive analysis. Motivated by this unsatisfactory situation, in this work we I.INTRODUCTION improve the popular DeLong’s algorithm [14] by reducing the time complexity from quadratic down to linearithmic order. RIGINATED from detection theory developed during This is accomplished through a relationship we find between World War II [1]–[4], receiver operating characteris- the Heaviside function and the mid-ranks of samples. ticO (ROC) analysis has found a wide use in a number of The rest part of this paper is organized as follows. Section II fields, including medicine, psychology, bioinformatics, signal gives the basic definition of AUC as well as some general processing, and machine learning, just to name a few [5]– notations employed throughout this work. Section III depicts [9]. Geometrically, ROC curve is a two-dimensional curve our linearithmic algorithm after establishing a close relation- traced out by pairs of false-positive rate and true-positive ship between the Heaviside function and mid-ranks associated rate according to various decision threshold settings. Given with samples. In Section IV, we demonstrate the efficiency of the ROC curve, the area under the curve (AUC) can then our improved algorithm in terms of time-complexity by Monte be computed, either analytically or empirically, as a figure Carlo . Finally, we summarize our main finding of merit to summarize a diagnostic system’s performance [5], and draw our conclusion in Section V. a binary classifier’s overall accuracy [6], or a detector’s power of detecting the presence of an unknown signal [7]–[9]. II.DEFINITIONS As revealed by Bamber [10], AUC can be estimated by the Mann-Whitney U (MWUS) [11]. Based on such For completeness and ease of later development, this section relationship, a plenty of nonparametric methods have been pro- describes the definition of nonparametric estimator of AUC as posed in the literature [12]–[16], which formulate algorithms well as DeLong’s formulas for estimating the variance of and covariance between correlated AUCs. Copyright (c) 2012 IEEE. Personal use of this material is permitted. Let X1,...,Xm and Y1,...,Yn be two independent and However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to [email protected]. identically distributed (i.i.d) samples drawn from two popula- This work was supported in part by National Natural Science Foundation of tions (whose distributions can be either continuous or discrete). China (Project 61271380), in part by Guangdong Natural Science Foundation Then, based on the relationship between the MWUS and AUC, (Project S2012010009870), in part by 100-Talents Scheme Funding from Guangdong University of Technology (Grant 112418006), in part by the Talent the sample version of AUC can be defined as Introduction Special Funds from Guangdong Province (Grant 2050205), in 1 m n part by a team project from Guangdong University of Technology (Grant ˆ θ , (Xi Yj) (1) GDUT2011-07), and in part by Project Program of Key Laboratory of mn H − i=1 j=1 Guangdong Higher Education Institutes of China (Grant 2013CXZDA015). X X X, Sun and W. Xu are with the Department of Automatic Control, School where of Automation, Guangdong University of Technology, Guangzhou, 510006, P. R. China (e-mail:[email protected]; [email protected]). 1 t > 0 ∗Corresponding Author. (t) = 1 t = 0 (2) Tel:+86-020-39322552 Fax:+86-020-39322469 H  2 EDICS:MLSAS-PATT; SAS-MALN; SAS-STAT  0 t < 0  2 MANUSCRIPT ACCEPTED BY IEEE SIGNAL PROCESSING LETTERS is the familiar Heaviside function. A. Relationship between mid-ranks and ( ) H · Let E( ), V( ) and C( , ) denote the mean, variance and Let ,..., be a sequence of real numbers. Sorting the · · · · Z1 ZM covariance of (between) random variables, respectively. It is sequence in ascending order yields a new sequence, termed ˆ easily seen from (1) that θ is an unbiased estimator of the the order [18]–[23], as corresponding population version θ, since = = < < = = (= ) Z(1) ··· Z(1) ··· Z(J) ··· Z(J) Zi 1 ˆ Block1 BlockJ E(θ) = θ , Pr(X > Y ) + Pr(X = Y ). (3) (10) 2 < < = = . | {z } ··· Z| (K) ··· {z Z(K) } Let BlockK ˆ ˆ(1) ˆ(k) θ , θ ,..., θ (4) Suppose that BlockJ , whose elements are all equal to i, starts { } | {z } Z at position a and ends at position b in the sorted sequence (10). be a vector of statistics representing the areas under the ROC Then the mid-ranks of i’s in the original sequence is defined (r) (r) Z curves derived from different readings X1 ,...,Xm and as [24] (r) (r) b Y1 ,...,Yn (1 r k) of k different experiments. 1 a + b ≤ ≤ T ( i) = k = . (11) For the rth element of the vector, define the “structural Z Z b a + 1 2 k=a components” − X Given (11), it then follows that mid-ranks are closed related to 1 n the Heaviside function ( ), as shown in the following lemma. V (X(r)) = (X(r) Y (r)), i = 1, . . . , m (5) H · 10 i n H i − j Lemma 1: The mid-ranks of i’s in 1,..., M can be j=1 Z Z Z X computed by and M 1 T ( i) = ( i j) + . (12) m Z Z H Z − Z 2 1 j=1 V (Y (r)) = (X(r) Y (r)), j = 1, . . . , n. (6) X 01 j m H i − j i=1 Proof: It suffices to show that the right side of (12) equals X the rightmost term in (11), as (r,s) Also define two matrices S 10 , s10 and S 01 , M a 1 b k k 1 − 1 1 a + b × ( i j) + = 1 + 1 + = . (13) s(r,s) such that h i H Z − Z 2 2 2 2 01 j=1 k k X kX=1 kX=a h i × 1 m (r,s) (r) ˆ(r) (s) ˆ(s) Note that from (10) and (11), T ( i), i = 1,...,M, can be s10 = V10(Xi ) θ V10(Xi ) θ Z Z m 1 − − obtained in linearithmic time, i.e., (M log M), by using the − i=1 O X h i h (7)i popular quick sort algorithm [25] (See Fig. 1). The relationship and of (12), while useful for developing efficient algorithms later on, is not employed to calculate T ( i), i = 1,...,M, due 1 n Z Z s(r,s) = V (Y (r)) θˆ(r) V (Y (s)) θˆ(s) . to its quadratic time complexity. 01 n 1 01 j − 01 j − j=1 − X h i h i (8) B. Fast algorithm for computing θˆ Then, DeLong et al. [14] proposed a variance-covariance Lemma 2: Denote by Z1,...,ZN , N = m + n, the matrix estimator for the vector θˆ in (4), as concatenated sequence of X1,...,Xm and Y1,...,Yn. Then 1 1 n S = S 10 + S 01. (9) (Xi Yj) = TZ (Xi) TX (Xi) (14) m n H − − j=1 X When the vector θˆ contains only one element, that is, r = s = m 1 in (7) and (8), the covariance estimator in (9) reduces to a (Yj Xi) = TZ (Yj) TY (Yj) (15) H − − variance estimator (θˆ). i=1 V X Proof: It follows from (12) along with the definition of III.LINEARITHMICALGORITHMS the Z-sequence that

Possessing the time complexities of orders (mn) and N 2 O ˆ 1 [kmn + k (m + n)], respectively, the algorithm of θ in TZ (Xi) = (Xi Zj) + O H − 2 (1) and the algorithm of S in (9) are both computationally j=1 X inefficient, especially when the sample sizes m and n are large. m 1 n However, by the relationship of the Heaviside function and = (Xi Xj) + + (Xi Yj) H − 2 H − j=1 j=1 mid-ranks shown in Lemma 1 below, linearithmic algorithms X X can be formulated based on DeLong’s formulas (5)–(9). TX (Xi) | {z } SUNSUNetet al. al.: FAST: FAST IMPLEMENTATION IMPLEMENTATION OF OF DELONG’S DELONG’S ALGORITHM ALGORITHM FOR COMPARING COMPARING THE THE AREAS AREAS UNDER UNDER CORRELATED CORRELATED ROC ROC CURVES CURVES 3 3

C. Fast Implementation of DeLong’s Algorithm Algorithm 1: Procedure of Calculating Mid-ranks C. Fast Implementation of DeLong’s Algorithm Data: a sequence 1,..., M TheoremTheorem 1: 1: The two terms (5) (??) and and (6) (?? in) DeLong’s in DeLong’s algo- Z Z Result: the associated mid-ranks T ( 1),..., T ( M ) algorithmrithm (9) ( are??) numerically are numerically equivalent equivalent with with Z Z Z Z 1 begin (r()r) (r)(r) (r) (X ) (r) (X ) 2 M length of (r(r)) TTZZ(r) (Xi i ) TXTX(r) (Xi i ) ←− Z VV1010((XXi )) = = −− , i,= i = 1, 1. ., . . , . m . , m 3 sorted version of in ascending order i nn W ←− Z (17) 4 position indices of in (17) I ←− Z W and 5 , M + 1 and W ←− {W W } (r) (r) 6 a list of M zeros (r) (r) T (r) TZ(r) (Yj ) TY (r) (Yj ) ←− (r) TZ(r) (Yj −) TY (r) (Yj ) 7 i 1 V01(Yj ) = 1 − , j = 1, . . . , n ←− V01(Yj ) = 1− m , j = 1, . . . , n 8 while i M do − m (18) ≤ (18) 9 a i respectively. ←− respectively. 10 j a Proof: Using the relationship (Xi Yj) = 1 (Yj ←− Proof: Using the relationshipH (X−i Yj) =− 1 H (Y−j 11 while j = a do Xi) and (??), we have H − − H − W W Xi) and (15), we have 12 j j + 1 ←− m 13 end m (Xi Yj) = m [TZ (Yj) TY (Yj)] . (19) 14 b j 1 H (Xi− Yj) = m− [TZ (Yj−) TY (Yj)] . (19) ←− − i=1 H − − − 15 for k = a to b do Xi=1 X 16 Tk = (a + b)/2 Then the results follow directly by substituting (??) and (??) Then the results follow directly by substituting (14) and (19) 17 end into (??) and (??). into (5) and (6). 18 i b + 1 Remark 2: Although Theorem ?? is established in the ←− Remark 2: 19 end context of AUCAlthough comparison, Theorem other 1 is methodsestablished using in the similar context ˆ 20 for i = 1 to M do statisticsof AUC comparison, as θ can also other gain methods benefit using from similar this algorithmic statistics as ˆ 21 k finding,θ can also e.g., gain the benefitlocalization from ROC this algorithmic analysis [?]. finding, e.g., the ←− Ii 22 T ( k) Ti localizationFig. ?? summarizes ROC analysis the fast [27]. implementation of DeLong’s Z Z ←− 23 end algorithmFig. 2 based summarizes on Theorem the fast??.implementation Specifically, Lines of?? DeLong’sto ?? 24 end purportalgorithm to obtain basedθ on, V Theorem10- and V01 1.-terms Specifically, based on Lines (??), 4 (?? to) 20 andpurport (??), to respectively; obtain θ, V whereas10- and LinesV01-terms?? to based?? are on to (16), obtain (17) Fig. 1. Fast algorithm for calculating the mid-ranks of a sequence. Note that in theand matrix (18), respectively;S based on (whereas??)–(??). Lines It is 21 easily to 33 seen are that to obtain the Fig.Line 1.?? Fast, algorithmis the ordered for calculating-sequence the in (mid-ranks??); whereas of ain sequence. Line ??, Notecontains that W Z I improved algorithm is dominated by the mid-ranks algorithm inthe Line position 3, is indices the ordered of i, i-sequence= 1,...,M in (10);, in whereas, i.e., ini = Linek if 4, kcontains= i. the matrix S based on (7)–(9). It is easily seen that the the positionW indices of Z, Zi = 1,...,M, in W, i.e., I = k if WI = Z . Line ?? is to append ani extra element M + 1 (= max(i ) + 1) tok ini inimproved Fig. ??, algorithm which can is be dominated accomplished by the by mid-ranks the popular algorithm quick Line 5 is to append anZ extra element W +W 1 (= max(I Z) + 1W) to WZin order to prevent overflow in Line ??. AfterWM Lines ?? to ??,Z the list T containsW sort algorithm. Therefore, the overall time complexity of our orderthe mid-ranksto prevent of overflowalready. in Line At last, 12. the After rest Lines lines complete8 to 19, the the list rearrangementT contains in Fig. 1, which can be accomplished by the popular quick W [k(m + n) log(m + n) + k2(m + n)] theof mid-ranks the mid-ranks of withalready. respect At tolast, the the orignial rest lines sequence complete1,..., the rearrangementM . algorithmsort algorithm. is of order Therefore, the overall time complexity of our, W Z Z O 2 of the mid-ranks with respect to the orignial sequence 1,..., M . muchalgorithm lower is than of order the original[k(m order+ n) oflog(m[kmn+ n+) +k k(2m(m++n)]n)], Z Z O O whenmuchm lowerand n thanare the large. original order of [kmn + k2(m + n)] O when m and n are large. N IV. NUMERICAL RESULTS 1 TZ (Yj) = N (Yj Zi) + H − 12 This section demonstratesIV. NUMERICAL the computational RESULTS efficiency of TZ (Yj) = i=1 (Yj Zi) + Xn H − 2 m the mid-ranks-based algorithm established in Theorems ??, by i=1 1 This section demonstrates the computational efficiency of =Xn (Y Y ) + + m (Y X ) a Monte Carlo for k = 2, a scenario which is not H j − i 12 H j − i the mid-ranks-based algorithm established in Theorems 1, by i=1 i=1 only the most fundamental, but also the motivating example for = X (Yj Yi) + + X (Yj Xi) a Monte Carlo experiment for k = 2, a scenario which is not H − 2 H − the research topic of this letter [?]. All samples are generated i=1 TY (Yj ) i=1 only the most fundamental, but also the motivating example for X X by functions in MATLAB Statistics ToolboxTM. (Y ) the research topic of this letter [13]. All samples are generated which leads to ( | ) andTY ({zj), respectively.} Consider the following hypothetical scenario. We have a ?? ?? by functions in MATLAB Statistics ToolboxTM. With Lemma| , it follows{z readily} that θˆ can be computed two-class p-dimensional data set X 1,...,XX m,YY 1,...,YY n which leads to (14)?? and (15), respectively. Consider the following hypothetical{ scenario. We have} a by a linerithmic algorithm, as ˆ based on which two linear classifiers are designed, that is, we With Lemma 2, it follows readily that θ can be computed havetwo-class trainedp-dimensional two 1 p row data vectors set XW1,...,X(byX Fisher’sm,YY 1,...,Y LDA,Y n × { 1 } by a linerithmic algorithm,m as say)based and onW which(by twoSVM, linear say) classifiers which will are project designed, the original that is,p- we ˆ 1 2 θ = [TZ (Xi) TX (Xi)] dimensionalhave trained dataset two 1 ontop tworow one-dimensional vectors W 1 (by spaces, Fisher’s respec- LDA, mn m − (1) × (1) (2) 1 i=1 say) and W 2 (by SVM, say) which will project the original p- ˆ X (16) tively. Write Xi , W 1X i, Yj , W 1Y j, Xi , W 2X i, θ = m[TZ (Xi) TX (Xi)] mn1 − m + 1 dimensional(2) dataset onto two one-dimensional spaces, respec- = i=1 TZ (Xi) . Yj , W 2Y j,(1)i = 1 . . . , m and(1) j = 1 . . . , n.(2) Based on mnXm − 2n (16) tively. Write Xi , W 1X i, Yj , W 1Y j, Xi , W 2X i, 1 i=1 m + 1 these two one-dimensional data set, two estimators of the X (2) ˆ(1) ˆ(2) = TZ (Xi) . correspondingYj , W 2Y AUCs,j, i =θ 1 . .and . , mθ and, canj = be 1 computed. . . , n. Based from on mn − 2n Remark 1: Note thati=1 (??) has been mentioned in passing in (these). The two question one-dimensional is, which classifier data set, is two better, estimators or, in terms of the X ?? Lehmann’s book [?]. We present it here just for completeness ofcorresponding AUC, which AUCs, one ofθˆθˆ(1)(1) andand θˆθˆ(2)(2),is can greater. be computed To answer from asRemark well as 1: easeNote of that reference (16) has in thebeen later mentioned part of this in passing paper. in this(16). question, The question it is natural is, which to investigate classifier the is magnitudebetter, or, inof theterms Lehmann’s book [26]. We present it here just for completeness of AUC, which one of θˆ(1) and θˆ(2) is greater. To answer as well as ease of reference in the later part of this paper. this question, it is natural to investigate the magnitude of the 4 4 MANUSCRIPTMANUSCRIPT SUBMITTED ACCEPTED TO BY IEEE IEEE SIGNAL SIGNAL PROCESSING PROCESSING LETTERS LETTERS

Algorithm 2: Improved DeLong’s Algorithm 11 (r) m (r) n 1010−− SlowSlow Data: X(r) X , Y (r) Y , 1 r k , i i=1 , j j=1 ≤ ≤ FastFast Result: S k k in (9) and θ1 k in (4) ×  ×  1 begin (1) (1) 2 2 m length of X ,...,X 10−2 1 m 10− ←− (1) (1) 3 n length of Y ,...,Yn ←− 1 4 for r = 1 to k do (r) 5 3 θ 0 10− (r)←− (r) (r) 3 6 Z concatenation of X and Y 10− ←− (r) (r) 7 TZ mid-ranks of Z by Algorithm 1 CPU Time (Logarithmic Scale) ←− (r) 8 (r) X TX mid-ranks of by Algorithm 1 CPU Time (Logarithmic Scale) ←− (r) 0 50 100 150 200 9 TY (r) mid-ranks of Y by Algorithm 1 ←− 0 50Sample Sizes100 (m = n150) 200 10 for i = 1 to m do (r) (r) Sample Sizes (m = n) 11 TZ(r) (Xi ) TZ(r) (Zi ) Fig. 3. Contrast of computational speeds between the conventional covariance (r) ←− (r) (r) estimator proposed by DeLong et al. using (1), (5) and (6) and our efficient 12 V (X ) [ (r) (X ) (r) (X )]/n 10 i TZ i TX i versionFig. 3. proposed Contrast ofin computationalTheorem 1 using speeds (16), between (17) and the (18). conventional For simplicity, covariance the (r) ←−(r) (r) − 13 θ θ + TZ(r) (Xi ) sampleestimator sizes proposed of X- andby DeLongY -class et for al. each using experiments (1), (5) and are (6) set and to beour equal, efficient ←− versionm proposed= n = in10(10)200 Theorem 1 using (16), (17) and (18). For simplicity, the 14 end namely, . All samples are drawn from bivariate normal (r) (r) distributions.sample sizes A of logarithmicX- and Y scale-class is for used each for a experiments better visual are effect. set to be equal, 15 θ θ /(m n) (m + 1)/(2 n) ←− × − × namely, m = n = 10(10)200. All samples are drawn from bivariate normal 16 for j = 1 to n do distributions. A logarithmic scale is used for a better visual effect. (r) (r) n are large, and/or one employs the technique to 17 TZ(r) (Yi ) TZ(r) (Zm+j) (r) ←− test the statistical significance of the z-statistic, it is desirable 18 V01 (Yj) test the statistical significance of the z-statistic, it is desirable ←−(r) (r) to use a faster algorithm, such as the one proposed in this 1 [TZ(r) (Y ) TY (r) (Y )]/m − i − j work,to use other a faster than algorithm,the original such version asof the DeLong’s one proposed algorithm. in this 19 end work,To simulate other than the scenariothe original mentioned version above, of DeLong’s we generate algorithm. two 20 end one-dimensionalTo simulate the normal scenario samples, mentioned representing above, those we generate projected two 21 r = 1 to k (1) m for do fromone-dimensional the high-dimensional normal samples, space. representing Specifically, thoseX projected i (1)i=1m 22 for s = 1 to k do from the(2) high-dimensionalm space. Specifically, X (r,s) and Xi are i.i.d. samples follow a bivariatei nor-i=1 23 s 0 (2) i=1m  10 and X are i.i.d. samples follow a(1) bivariaten nor- (r,s) ←− mal distributioni i=1 1, 2, 1, 1, 0.8 ; whereas Yj  and 24 s 0  (1)j=1n 01 (2) n N ←− mal distribution 1, 2, 1, 1, 0.8 ; whereas Yj and 25 i = 1 to m Yj  are i.i.d. follow a bivariate normal distributionj=1 for do (2) jn=1 N   (r,s) (r,s) (r) (r) Y2, 3, 1, 1, 0are.8 . i.i.d. The follow notation a bivariateµ , µ , normal σ2, σ2, ρ distributionstands 26 s10 s10 + [V10(Xi ) θ ]  j j=1  1 2 1 2 ←− − × N N 2 2 (s) (s) for a2, bivariate3, 1, 1, 0 norm.8 . The distribution notation with meansµ1, µ2µ, σ1 and, σ ,µ ρ2, vari-stands [V10(Xi ) θ ]   1 2  − ancesN σ2 and σ2, and correlation ρN. Note that the parameters 27 end for a bivariate1 2 norm distribution with µ1 and µ2, vari- are chosen2 rather2 arbitrarily, since they have little if not no 28 for j = 1 to n do ances σ1 and σ2, and correlation ρ. Note that the parameters (r,s) (r,s) (r) (r) effectare chosen on the rather computational arbitrarily, speed since comparison. they have little if not no 29 s01 s10 + [V01(Yj ) θ ] ←−(s) − × effectFig. on 3 contrasts the computational the computational speed comparison. speeds over m = n = [V (Y ) θ(s)] 01 j − 10(10)200Fig. 3 contrastsbetween the the computational original quadratic speeds versions over m and= then = 30 end newly developed linearithmc versions for computing S, where (r,s) (r,s) 10(10)200 between the original quadratic versions and the 31 S s /(m 1) + s /(n 1) m and n denote the sample sizes of X- and Y -class, respec- r,s ←− 10 − 01 − newly developed linearithmc versions for computing S, where 32 end tively.m and Then denote notation the10(10)200 sample sizesstands of forX- a and listY starting-class, fromrespec- 33 10 end 10tively. to 200 The with notation an increment10(10)200 of stands. Each for of a list the starting algorithms from 34 is run for 100 times for stability. As shown in Fig. 3, the end 10 to 200 with an increment of 10. Each of the algorithms linearithmic algorithm proposed in this work does reduce the is run for 100 times for stability. As shown in Fig. 3, the Fig. 2. Fast implementation of DeLong’s algorithm based on Theorem 1. time complexity significantly. Fig. 2. Fast implementation of DeLong’s algorithm based on Theorem 1. linearithmic algorithm proposed in this work does reduce the time complexity significantly.V. SUMMARY statistic [13] statistic [13] In this paper we investigated the problems of comparing ˆ(1) ˆ(2) ˆ(1) ˆ(2) V. SUMMARY θ θ θ θ the areas under two or more correlated receiver operating z , θˆ(1) −θˆ(2) = θˆ(1) −θˆ(2) characteristicIn this paper curves we from investigated nonparametric the problems viewpoints. of Our comparing con- z , V[θˆ−(1) θˆ(2)]= V[θˆ(1)] + V[θˆ(2)−] 2C[θˆ(1), θˆ(2)] ˆ(1) −ˆ(2) ˆ(1) ˆ(2) − ˆ(1) ˆ(2) tributionthe areas is an under improvement two or more of the popularcorrelated DeLong’s receiver algorithm operating qV[θ θ ] qV[θ ] + V[θ ] 2C[θ , θ ] where the denominator− involves all terms in− the matrix S incharacteristic the sense of curves reducing from the nonparametric time complexity viewpoints. from the original Our con- q q wheredefined the by denominator (9). Under involves the null allhypothesis, terms inz thecan matrix be wellS quadratictribution isorder an improvement down to thepresent of the popular linearithmic DeLong’s order, algorithm based definedapproximated by (9). by Under the standard the null normal hypothesis, distributionz can [13]. be There- well onin thean equivalent sense of reducing relationship the time we find complexity between from the Heaviside the original approximatedfore, if the byvalue the of standardz deviates normal too distribution much from [13]. zero, There- e.g., functionquadratic and order mid-ranks down ofto samples.the present These linearithmic algorithmic order, findings based fore,z > if1.96 the, it value is thus of reasonablez deviates to consider too much that fromθˆ(1) > zero,θˆ(2) with e.g., mighton an shed equivalent new light relationship on the topic we of find ROC between analysis the involving Heaviside zthe > 1 significance.96, it is thus level reasonablep < 0.05 to. Whenconsider the that sampleθˆ(1) sizes> θˆ(2)mwithand manyfunction scientific and mid-ranks areas including of samples. signal These processing. algorithmic findings the significance level p < 0.05. When the sample sizes m and might shed new light on the topic of ROC analysis involving n are large, and/or one employs the resampling technique to many scientific areas including signal processing. SUN et al.: FAST IMPLEMENTATION OF DELONG’S ALGORITHM FOR COMPARING THE AREAS UNDER CORRELATED ROC CURVES 5

REFERENCES [27] A. Wunderlich and F. Noo, “A nonparametric procedure for comparing the areas under correlated lroc curves,” IEEE Transactions on Medical [1] W. Peterson, T. Birdsall, and W. Fox, “The theory of signal detectability,” Imaging, vol. 31, no. 11, pp. 2050–2061, 2012. IRE Professional Group on Information Theory, vol. 4, no. 4, pp. 171– 212, 1954. [2] D. Van Meter and D. Middleton, “Modern statistical approaches to reception in communication theory,” IRE Professional Group on Infor- mation Theory, vol. 4, no. 4, pp. 119–145, 1954. [3] A. Wald, “Statistical decision functions,” The Annals of , pp. 165–205, 1949. [4] J. A. Hanley, “Receiver operating characteristic (ROC) methodology: the state of the art.” Critical Reviews in Diagnostic Imaging, vol. 29, no. 3, p. 307, 1989. [5] D. Mossman, “Three-way ROCs,” Medical Decision Making, vol. 19, no. 1, pp. 78–89, 1999. [6] W. Yousef, R. Wagner, and M. Loew, “Assessing classifiers from two independent data sets using ROC analysis: A nonparametric approach,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 11, pp. 1809 –1817, nov. 2006. [7] A. Oukaci, J.-P. Delmas, and P. Chevalier, “Performance analysis of lrt/glrt-based array receivers for the detection of a known real-valued sig- nal corrupted by noncircular interferences,” Signal Processing, vol. 91, no. 10, pp. 2323–2331, 2011. [8] G. Guo, M. Mandal, and Y. Jing, “A robust detector of known signal in non-gaussian noise using threshold systems,” Signal Processing, vol. 92, no. 11, pp. 2676–2688, 2012. [9] V. Hari, G. Anand, A. Premkumar, and A. Madhukumar, “Design and performance analysis of a signal detector based on suprathreshold stochastic resonance,” Signal Processing, vol. 92, no. 7, pp. 1745–1757, 2012. [10] D. Bamber, “The area above the ordinal dominance graph and the area below the receiver operating characteristic graph,” Journal of Mathematical Psychology, vol. 12, no. 4, pp. 387–415, 1975. [11] H. Mann and D. Whitney, “On a test of whether one of two random variables is stochastically larger than the other,” The Annals of Mathe- matical Statistics, vol. 18, no. 1, pp. 50–60, 1947. [12] J. Hanley, “The meaning and use of the area under a receiver operating characteristic (ROC) curve,” Radiology, vol. 743, pp. 29–36, 1982. [13] J. A. Hanley and B. J. McNeil, “A method of comparing the areas under receiver operating characteristic curves derived from the same cases.” Radiology, vol. 148, no. 3, pp. 839–843, 1983. [14] E. DeLong, D. DeLong, and D. Clarke-Pearson, “Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach,” Biometrics, pp. 837–845, 1988. [15] L. Popescu, “Nonparametric ROC and LROC analysis,” Medical Physics, vol. 34, p. 1556, 2007. [16] K. Zou, W. Hall, and D. Shapiro, “Smooth non-parametric receiver operating characteristic (ROC) curves for continuous diagnostic tests,” Statistics in Medicine, vol. 16, no. 19, pp. 2143–2156, 1998. [17] W. Xu, J. Dai, Y. Hung, and Q. Wang, “Estimating the area under a re- ceiver operating characteristic (roc) curve: Parametric and nonparametric ways,” Signal Processing, vol. 93, no. 11, pp. 3111–3123, 2013. [18] W. Xu, C. Chang, Y. S. Hung, S. K. Kwan, and P. C. W. Fung, “ correlation coefficient and its application to association measurement of biosignals,” Proc. Int. Conf. Acoustics, Speech, Signal Process. (ICASSP) 2006, vol. 2, pp. II–1068–II–1071, May 2006. [19] ——, “Order statistics correlation coefficient as a novel association measurement with applications to biosignal analysis,” IEEE Trans. Signal Process., vol. 55, no. 12, pp. 5552–5563, dec. 2007. [20] W. Xu, C. Chang, Y. S. Hung, and P. C. W. Fung, “Asymptotic properties of order statistics correlation coefficient in the normal cases,” IEEE Trans. Signal Process., vol. 56, no. 6, pp. 2239–2248, Jun. 2008. [21] W. Xu, Y. S. Hung, M. Niranjan, and M. Shen, “Asymptotic mean and variance of Gini correlation for bivariate normal samples,” IEEE Trans. Signal Process., vol. 58, no. 2, pp. 522–534, Feb. 2010. [22] W. Xu, Y. Hou, Y. Hung, and Y. Zou, “A comparative analysis of Spearman’s rho and Kendall’s tau in normal and contaminated normal models,” Signal Processing, vol. 93, pp. 261–276, Jan. 2013. [23] R. Ma, W. Xu, Q. Wang, and W. Chen, “Robustness analysis of three classical correlation coefficients under contaminated gaussian model,” Signal Processing, vol. 104, pp. 51–58, 2014. [24] M. Kendall and J. D. Gibbons, Rank Correlation Methods, 5th ed. New York: Oxford University Press, 1990. [25] T. H. Cormen, C. E. Leiserson, R. L. Rivest, C. Stein et al., Introduction to algorithms. MIT press Cambridge, 2001, vol. 2. [26] E. L. Lehmann and H. J. D’Abrera, Nonparametrics: statistical methods based on ranks. Springer New York, 2006.

View publication stats