Kendall's and Spearman's Correlation Coefficients in the Presence of A

BIOMETRICS 43, 409-416
June 1987

Kendall's and Spearman's Correlation Coefficients in the Presence of a Blocking Variable

Jeremy M. G. Taylor

Division of Biostatistics, School of Public Health and Jonsson Comprehensive Cancer Center, University of California at Los Angeles, Los Angeles, California, 90024, U.S.A.

Published by the International Biometric Society. Stable URL: http://www.jstor.org/stable/2531822
SUMMARY

The use of a weighted sum of Kendall's taus or a weighted sum of Spearman's rhos for testing association in the presence of a blocking variable is discussed. In a Monte Carlo study the two are shown to have essentially the same power with the optimal choice of weights. In the presence of ties, the weighted sum of Spearman's rhos is preferred because its variance has a much simpler form. An example is given from the field of radiation therapy.

1. Introduction

Kendall's tau (Kendall, 1938) and Spearman's rho (Spearman, 1904) are two commonly used nonparametric methods of detecting associations between two variables. Their use is usually restricted to a single block. However, there are circumstances where the joint distribution of the two variables of interest (X and Y) is affected by the value of a third variable, called a blocking variable. In this paper, we restrict attention to the situation where there are multiple (X, Y) pairs for each value of the third variable. There are many ways to address this problem; for example, one could use a linear model approach and test for associations by examining the estimated regression coefficients. However, it may be desirable to use a less model-dependent approach. An example of where the underlying assumptions of the linear model are probably inappropriate is when the X and Y variables are ordered categorical, rather than continuous variables.

Elfving and Whitlock (1950), Ury (1968), Quade (1974), Reynolds (1974), and Korn (1984) have suggested using a weighted sum of Kendall's tau, across the blocks, to test for associations. Different choices in the weighting scheme and their merits are discussed by Korn.

Spearman's rho is an alternative to Kendall's tau which can be used for testing for association. Kendall (1970, §1.2.4) claims that for many practical and most theoretical points of view, tau is preferable to rho. One reason for preferring tau, when estimating a correlation, is that the population parameter being estimated has a simpler interpretation.
For hypothesis testing the interpretation of the test statistic is less important; the power properties of the respective tests are more relevant. For the case when one of the variables has two possible values, Lehmann (1975, p. 135) considers a weighted sum of Wilcoxon statistics.

Tau and rho are closely related; they are both functions of the ranks. Hettmansperger (1984, p. 203) gives insight into the relationship by showing that, assuming no ties,

$$\text{tau} = 1 - \frac{4}{n(n-1)} \sum_{i<j} s(R_i - R_j)$$

and

$$\text{rho} = 1 - \frac{12}{n(n^2-1)} \sum_{i<j} (j-i)\, s(R_i - R_j),$$

where $n$ is the number of pairs, $s(x) = 1$ if $x > 0$ and $s(x) = 0$ otherwise, and $R_i$ is the rank of $Y_i$ after the X's have been arranged in the order $X_1 < \cdots < X_n$.

Kendall and Stuart (1973, p. 481) show that, if X and Y are independent, the correlation between tau and rho is at least .98 and tends to 1 as $n \to \infty$. Van der Waerden (1969, p. 338) and Woodworth (1970) compare tau and rho for a bivariate normal distribution with correlation equal to $c$. By approximating the power, van der Waerden shows that rho is slightly more powerful than tau, for testing the hypothesis that $c$ equals 0. Woodworth shows that the Bahadur efficiency of tau to rho appears to be between 1.0 and 1.05 for all values of $c$. For ordered contingency tables, Simon (1978) shows that tau and rho have the same efficacy; see also Proctor (1973) and Brown and Benedetti (1977).

We would expect the very similar power properties of tau and rho to extend to weighted averages of tau and rho, across many blocks. In this article we discuss what weighting scheme to use with tau and rho, both in the situation where there are no ties and when there are many ties. Based on a small Monte Carlo study, we conclude that optimally weighted sums of tau and rho are essentially equivalent in terms of power. However, the calculation of the significance level of the test is appreciably simpler for rho, if there are ties.

Key words: Conditional independence; Kendall's tau; Rank correlation; Spearman's rho.
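Hettmansperger's rank expressions for tau and rho, quoted in the Introduction, are easy to evaluate directly. The Python sketch below is an illustration only, not code from the paper: the function name `tau_rho_from_ranks` is hypothetical, and it assumes no ties and that the input is the sequence R_1, ..., R_n of Y ranks after the pairs have been sorted by X.

```python
def tau_rho_from_ranks(ranks):
    """Return (tau, rho) from the Y ranks listed in increasing order of X.

    Assumes no ties, so `ranks` is a permutation of 1..n (hypothetical helper).
    """
    n = len(ranks)
    inversions = 0   # sum over i < j of s(R_i - R_j)
    weighted = 0     # sum over i < j of (j - i) * s(R_i - R_j)
    for i in range(n):
        for j in range(i + 1, n):
            if ranks[i] > ranks[j]:      # s(x) = 1 only when x > 0
                inversions += 1
                weighted += j - i
    tau = 1 - 4 * inversions / (n * (n - 1))
    rho = 1 - 12 * weighted / (n * (n * n - 1))
    return tau, rho


if __name__ == "__main__":
    # Two adjacent swaps in a sample of four pairs.
    print(tau_rho_from_ranks([2, 1, 4, 3]))
```

Both statistics penalise the same inverted pairs; rho additionally weights each inversion by how far apart the two observations are in the X ordering, which is one way to see why the two are so highly correlated.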
Section 4 gives an example of the use of the weighted sum of rhos for some data from radiation treatment of cancer.

2. Testing for Association

2.1 Kendall's Tau

Using the notation of Korn, suppose we observe $(X_{ik}, Y_{ik})$, $i = 1, \ldots, n_k$; $k = 1, \ldots, K$; where $n_k$ is the number of pairs in the $k$th block. Let $T_k$ be Kendall's tau for the $k$th block,

$$T_k = \frac{2}{n_k(n_k-1)} \sum_{i<j} \operatorname{sign}[(X_{ik} - X_{jk})(Y_{ik} - Y_{jk})],$$

where sign(a) equals 1, 0, or -1 if a is positive, zero, or negative, respectively. $T_k$ is an estimator of the probability of concordance minus the probability of discordance in block $k$; denote this by $\tau_{XY \cdot k}$. Under the hypothesis $H_0$ of conditional independence between X and Y given the block, the mean and variance of $T_k$ are given by $E(T_k) = 0$ and

$$\sigma_0^2(T_k) = \frac{2(2n_k + 5)}{9 n_k (n_k - 1)}. \tag{2.1}$$

Let $T = \sum_k W_k T_k$ be the weighted sum of the $T_k$'s; it is a suitable statistic for testing conditional independence. Different choices of weights can be used. A naive approach would be to use $W^{(1)} = 1$. Elfving and Whitlock (1950), Ury (1968), Quade (1974), and Reynolds (1974) suggest using $W^{(2)} = \tfrac{1}{2} n_k(n_k - 1)$. Korn suggests using $W^{(3)} = 1/\sigma_0^2(T_k)$. He notes that weighting by the inverse of the variance is optimal under alternatives in which $\tau_{XY \cdot k}$ is constant across blocks. In practice, one may not know that $\tau_{XY \cdot k}$ is constant across blocks, and even if it is not the case one may still be interested in testing for conditional independence.

If ties occur in the observations, then equation (2.1) is incorrect. Under the independence hypothesis the variance of $T_k$, conditional on the set of observed ranks, is (Noether, 1967, p. 77)

$$
\begin{aligned}
\tilde{\sigma}_0^2(T_k) ={}& \frac{2}{9 n_k^2 (n_k - 1)^2} \Big[ n_k(n_k - 1)(2n_k + 5) - \sum f(f - 1)(2f + 5) - \sum g(g - 1)(2g + 5) \Big] \\
&+ \frac{4}{9 n_k^3 (n_k - 1)^3 (n_k - 2)} \Big[ \sum f(f - 1)(f - 2) \Big] \Big[ \sum g(g - 1)(g - 2) \Big] \\
&+ \frac{2}{n_k^3 (n_k - 1)^3} \Big[ \sum f(f - 1) \Big] \Big[ \sum g(g - 1) \Big],
\end{aligned}
\tag{2.2}
$$

where $f$ and $g$ denote the multiplicity of ties of the X and Y ranks in block $k$, and the summations are over all sets of ties.
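The two variance expressions (2.1) and (2.2) can be compared numerically. The following Python sketch is illustrative only: the function names `var_tau_no_ties` and `var_tau_ties` are hypothetical, and the coefficients in the tie-corrected version are an assumption, taken from the standard tie-corrected variance of Kendall's score rescaled by 4/[n^2 (n-1)^2], since T_k is 2/[n_k(n_k-1)] times that score. It needs at least three pairs per block.

```python
from collections import Counter


def var_tau_no_ties(n):
    """Null variance of Kendall's tau for n untied pairs, as in (2.1)."""
    return 2 * (2 * n + 5) / (9 * n * (n - 1))


def var_tau_ties(x, y):
    """Null variance of tau conditional on the tie pattern, as in (2.2).

    Assumed form: standard tie-corrected variance of Kendall's score,
    multiplied by 4 / (n^2 (n-1)^2). Requires len(x) >= 3.
    """
    n = len(x)
    f = list(Counter(x).values())   # multiplicities of tied X values
    g = list(Counter(y).values())   # multiplicities of tied Y values
    m = n * (n - 1)

    a = (m * (2 * n + 5)
         - sum(t * (t - 1) * (2 * t + 5) for t in f)
         - sum(t * (t - 1) * (2 * t + 5) for t in g))
    b = (sum(t * (t - 1) * (t - 2) for t in f)
         * sum(t * (t - 1) * (t - 2) for t in g))
    c = sum(t * (t - 1) for t in f) * sum(t * (t - 1) for t in g)

    return 2 * a / (9 * m ** 2) + 4 * b / (9 * m ** 3 * (n - 2)) + 2 * c / m ** 3


if __name__ == "__main__":
    x = [1, 1, 2, 2, 3, 3]
    y = [1, 2, 2, 3, 3, 3]
    print(var_tau_no_ties(len(x)), var_tau_ties(x, y))
```

With untied data every multiplicity equals 1, all the tie sums vanish, and the second function returns the same value as the first.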
Note that (2.2) reduces to (2.1) when there are no ties, and that the tie-corrected variance $\tilde{\sigma}_0^2(T_k)$ of (2.2) is approximately equal to $\sigma_0^2(T_k)$ unless there is a vast number of ties. In particular, only if the observations are heavily grouped, with a substantial proportion of the observations tied at a few values, will $\tilde{\sigma}_0^2(T_k)$ and $\sigma_0^2(T_k)$ be appreciably different. A fourth possible weighting scheme is $W^{(4)} = 1/\tilde{\sigma}_0^2(T_k)$. Not only is $W^{(4)}$ practically unattractive, but it also has the disadvantage of depending on the data.

To test $H_0$, the null hypothesis of conditional independence, one compares

$$Z_1 = T \Big[ \sum_k W_k^2 \, \tilde{\sigma}_0^2(T_k) \Big]^{-1/2}$$

with the standard normal distribution. Alternatively, one could use

$$Z_2 = T \Big[ \sum_k W_k^2 \, \sigma_0^2(T_k) \Big]^{-1/2},$$

and hope that the effect of the ties was negligible.

The relative merits of using $W^{(1)}$, $W^{(2)}$, $W^{(3)}$, and $W^{(4)}$ for small sample sizes are addressed in Section 3, in the two cases where there are no ties and many ties.

2.2 Spearman's Rho

Let

$$r_k = \frac{\sum_i (u_i - \bar{u})(v_i - \bar{v})}{\big[ \sum_i (u_i - \bar{u})^2 \sum_i (v_i - \bar{v})^2 \big]^{1/2}},$$

where $u_i$ ($v_i$) is the rank of $X_i$ ($Y_i$) within block $k$; average ranks are used if there are ties. $r_k$ is Spearman's rho in block $k$, the rank analogue of the product-moment correlation coefficient. Under $H_0$, the hypothesis of conditional independence between X and Y, $E(r_k) = 0$ and $\operatorname{var}(r_k) = (n_k - 1)^{-1}$. $r_k$ has the nice property that the variance conditional on the set of ranks is $(n_k - 1)^{-1}$, even when there are ties in the data; see Kendall and Stuart (1973, p. 477).

Let $R = \sum_k W_k r_k$ be the weighted sum of the rhos. We consider two weighting schemes: (i) the naive choice $W^{(1)} = 1$, and (ii) $W^{(2)} = n_k - 1$. If the underlying population values of $r_k$ are constant across blocks, then $W^{(2)}$ maximises the efficacy of $R$ for testing $H_0$.

To test $H_0$, one compares

$$P = R \Big[ \sum_k W_k^2 (n_k - 1)^{-1} \Big]^{-1/2}$$

with the standard normal distribution. Other approximations to the distribution of $R$ could be used; however, we will use this simple approach.
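The weighted Spearman test of Section 2.2 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's code: the helper names `average_ranks`, `spearman_rho`, and `weighted_rho_statistic` are hypothetical, each block is assumed to be a pair of equal-length lists, and no variable is assumed to be constant within a block (otherwise the denominator of the block rho is zero).

```python
import math


def average_ranks(values):
    """Ranks 1..n, with tied values given the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Product-moment correlation of the (average) ranks of x and y."""
    u, v = average_ranks(x), average_ranks(y)
    n = len(u)
    ubar, vbar = sum(u) / n, sum(v) / n
    suv = sum((a - ubar) * (b - vbar) for a, b in zip(u, v))
    suu = sum((a - ubar) ** 2 for a in u)
    svv = sum((b - vbar) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


def weighted_rho_statistic(blocks, weight=lambda n: n - 1):
    """Standardised weighted sum of block rhos.

    Under conditional independence, compare the result with the standard
    normal distribution. The default weight is n_k - 1; pass
    `weight=lambda n: 1` for the naive scheme.
    """
    total, variance = 0.0, 0.0
    for x, y in blocks:
        n = len(x)
        w = weight(n)
        total += w * spearman_rho(x, y)
        variance += w * w / (n - 1)    # var(r_k) = 1 / (n_k - 1), ties or not
    return total / math.sqrt(variance)


if __name__ == "__main__":
    blocks = [([1, 2, 3, 4], [1, 3, 2, 4]), ([5, 6, 7], [2, 2, 9])]
    print(weighted_rho_statistic(blocks))
```

Because the null variance of each block rho is 1/(n_k - 1) whether or not ties are present, no tie correction appears anywhere in the standardisation.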
