Measuring and Testing Dependence by Correlation of Distances
The Annals of Statistics 2007, Vol. 35, No. 6, 2769–2794. DOI: 10.1214/009053607000000505. © Institute of Mathematical Statistics, 2007. arXiv:0803.4101v1 [math.ST] 28 Mar 2008.

MEASURING AND TESTING DEPENDENCE BY CORRELATION OF DISTANCES

By Gábor J. Székely, Maria L. Rizzo and Nail K. Bakirov
Bowling Green State University, Bowling Green State University and USC Russian Academy of Sciences

Received November 2006; revised March 2007. Supported by the National Science Foundation while working at the Foundation. AMS 2000 subject classifications: primary 62G10; secondary 62H20. Key words and phrases: distance correlation, distance covariance, multivariate independence. This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in The Annals of Statistics, 2007, Vol. 35, No. 6, 2769–2794; the reprint differs from the original in pagination and typographic detail.

Abstract. Distance correlation is a new measure of dependence between random vectors. Distance covariance and distance correlation are analogous to product-moment covariance and correlation, but unlike the classical definition of correlation, distance correlation is zero only if the random vectors are independent. The empirical distance dependence measures are based on certain Euclidean distances between sample elements rather than sample moments, yet have a compact representation analogous to the classical covariance and correlation. Asymptotic properties and applications in testing independence are discussed. Implementation of the test and Monte Carlo results are also presented.

1. Introduction. Distance correlation provides a new approach to the problem of testing the joint independence of random vectors. For all distributions with finite first moments, distance correlation $\mathcal{R}$ generalizes the idea of correlation in two fundamental ways:

(i) $\mathcal{R}(X,Y)$ is defined for $X$ and $Y$ in arbitrary dimensions;
(ii) $\mathcal{R}(X,Y) = 0$ characterizes independence of $X$ and $Y$.

Distance correlation has properties of a true dependence measure, analogous to product-moment correlation $\rho$. Distance correlation satisfies $0 \le \mathcal{R} \le 1$, and $\mathcal{R} = 0$ only if $X$ and $Y$ are independent. In the bivariate normal case, $\mathcal{R}$ is a function of $\rho$, and $\mathcal{R}(X,Y) \le |\rho(X,Y)|$ with equality when $\rho = \pm 1$.

Throughout this paper $X$ in $\mathbb{R}^p$ and $Y$ in $\mathbb{R}^q$ are random vectors, where $p$ and $q$ are positive integers. The characteristic functions of $X$ and $Y$ are denoted $f_X$ and $f_Y$, respectively, and the joint characteristic function of $X$ and $Y$ is denoted $f_{X,Y}$. Distance covariance $\mathcal{V}$ can be applied to measure the distance $\|f_{X,Y}(t,s) - f_X(t)f_Y(s)\|$ between the joint characteristic function and the product of the marginal characteristic functions (the norm $\|\cdot\|$ is defined in Section 2), and to test the hypothesis of independence

$H_0\colon\ f_{X,Y} = f_X f_Y \quad \text{vs.} \quad H_1\colon\ f_{X,Y} \ne f_X f_Y.$

The importance of the independence assumption for inference arises, for example, in clinical studies with the case-only design, which uses only diseased subjects assumed to be independent in the study population. In this design, inferences on multiplicative gene interactions (see [1]) can be highly distorted when there is a departure from independence. Classical methods such as the Wilks Lambda [14] or Puri–Sen [8] likelihood ratio tests are not applicable if the dimension exceeds the sample size, or when distributional assumptions do not hold (see, e.g., [7] regarding the prevalence of nonnormality in biology and ecology).
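The sample versions of $\mathcal{V}$ and $\mathcal{R}$ are defined later in the paper (Section 2.2) in terms of double-centered Euclidean distance matrices. The following Python sketch is a preview of that standard formulation; the function names and the NumPy/SciPy dependencies are our own choices, not the paper's.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def _double_centered_distances(Z):
    """Pairwise Euclidean distance matrix with row, column and grand means removed:
    A_kl = a_kl - abar_k. - abar_.l + abar_.. , where a_kl = |Z_k - Z_l|."""
    Z = np.asarray(Z, dtype=float)
    if Z.ndim == 1:                      # treat a vector as n observations in R^1
        Z = Z[:, None]
    a = squareform(pdist(Z))             # n x n matrix of Euclidean distances
    return a - a.mean(axis=0, keepdims=True) - a.mean(axis=1, keepdims=True) + a.mean()

def distance_covariance(X, Y):
    """Sample distance covariance V_n(X, Y) = sqrt(mean(A * B))."""
    A = _double_centered_distances(X)
    B = _double_centered_distances(Y)
    return np.sqrt(max(np.mean(A * B), 0.0))

def distance_correlation(X, Y):
    """Sample distance correlation R_n(X, Y) = V_n(X, Y) / sqrt(V_n(X) V_n(Y)),
    defined as 0 when the denominator vanishes."""
    A = _double_centered_distances(X)
    B = _double_centered_distances(Y)
    dcov2_xy = max(np.mean(A * B), 0.0)
    denom = np.sqrt(np.mean(A * A) * np.mean(B * B))
    return np.sqrt(dcov2_xy / denom) if denom > 0 else 0.0

rng = np.random.default_rng(0)
x = rng.normal(size=500)
print(distance_correlation(x, 2.0 * x))               # = 1: deterministic linear dependence
print(distance_correlation(x, x ** 2))                # clearly positive: nonlinear dependence
print(distance_correlation(x, rng.normal(size=500)))  # small: independent samples
```

The squareform(pdist(...)) step makes the sketch O(n²) in time and memory, the natural cost of the distance-matrix representation; the second example illustrates that the statistic responds to dependence that product-moment correlation would miss.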
A further limitation of multivariate extensions of methods based on ranks is that they are ineffective for testing nonmonotone types of dependence. We propose an omnibus test of independence that is easily implemented in arbitrary dimension. In our Monte Carlo results the distance covariance test exhibits superior power against nonmonotone types of dependence while maintaining good power performance in the multivariate normal case relative to the parametric likelihood ratio test. Distance correlation can also be applied as an index of dependence; for example, in meta-analysis [12] distance correlation would be a more generally applicable index than product-moment correlation, without requiring normality for valid inferences.

Theoretical properties of distance covariance and correlation are covered in Section 2, extensions in Section 3, and results for the bivariate normal case in Section 4. Empirical results are presented in Section 5, followed by a summary in Section 6.

2. Theoretical properties of distance dependence measures.

Notation. The scalar product of vectors $t$ and $s$ is denoted by $\langle t, s \rangle$. For complex-valued functions $f(\cdot)$, the complex conjugate of $f$ is denoted by $\bar{f}$ and $|f|^2 = f\bar{f}$. The Euclidean norm of $x$ in $\mathbb{R}^p$ is $|x|_p$. A sample from the distribution of $X$ in $\mathbb{R}^p$ is denoted by the $n \times p$ matrix $\mathbf{X}$, and the sample vectors (rows) are labeled $X_1, \ldots, X_n$. A primed variable $X'$ is an independent copy of $X$; that is, $X$ and $X'$ are independent and identically distributed.

Definition 1. For complex functions $\gamma$ defined on $\mathbb{R}^p \times \mathbb{R}^q$, the $\|\cdot\|_w$-norm in the weighted $L_2$ space of functions on $\mathbb{R}^{p+q}$ is defined by

(2.1)  $\|\gamma(t,s)\|_w^2 = \int_{\mathbb{R}^{p+q}} |\gamma(t,s)|^2\, w(t,s)\, dt\, ds,$

where $w(t,s)$ is an arbitrary positive weight function for which the integral above exists.

2.1. Choice of weight function. Using the $\|\cdot\|_w$-norm (2.1) with a suitable choice of weight function $w(t,s)$, we define a measure of dependence

(2.2)  $\mathcal{V}^2(X,Y;w) = \|f_{X,Y}(t,s) - f_X(t)f_Y(s)\|_w^2 = \int_{\mathbb{R}^{p+q}} |f_{X,Y}(t,s) - f_X(t)f_Y(s)|^2\, w(t,s)\, dt\, ds,$

such that $\mathcal{V}^2(X,Y;w) = 0$ if and only if $X$ and $Y$ are independent. In this paper $\mathcal{V}$ will be analogous to the absolute value of the classical product-moment covariance. If we divide $\mathcal{V}(X,Y;w)$ by $\sqrt{\mathcal{V}(X;w)\,\mathcal{V}(Y;w)}$, where

$\mathcal{V}^2(X;w) = \int_{\mathbb{R}^{2p}} |f_{X,X}(t,s) - f_X(t)f_X(s)|^2\, w(t,s)\, dt\, ds,$

we have a type of unsigned correlation $\mathcal{R}_w$.

Not every weight function leads to an "interesting" $\mathcal{R}_w$, however. The coefficient $\mathcal{R}_w$ should be scale invariant, that is, invariant with respect to transformations $(X,Y) \mapsto (\epsilon X, \epsilon Y)$ for $\epsilon > 0$. We also require that $\mathcal{R}_w$ is positive for dependent variables. It is easy to check that if the weight function $w(t,s)$ is integrable, and both $X$ and $Y$ have finite variance, then the Taylor expansions of the underlying characteristic functions show that

$\lim_{\epsilon \to 0} \frac{\mathcal{V}^2(\epsilon X, \epsilon Y; w)}{\sqrt{\mathcal{V}^2(\epsilon X; w)\, \mathcal{V}^2(\epsilon Y; w)}} = \rho^2(X,Y).$

Thus for integrable $w$, if $\rho = 0$, then $\mathcal{R}_w$ can be arbitrarily close to zero even if $X$ and $Y$ are dependent. However, by applying a nonintegrable weight function, we obtain an $\mathcal{R}_w$ that is scale invariant and cannot be zero for dependent $X$ and $Y$. We do not claim that our choice for $w$ is the only reasonable one, but it will become clear in the following sections that our choice (2.4) results in very simple and applicable empirical formulas. (A more complicated weight function is applied in [2], which leads to a more computationally difficult statistic and does not have the interesting correlation form.)
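As an aside on implementation: the omnibus test of independence proposed in the Introduction is easy to run in arbitrary dimension once a sample version of $\mathcal{V}$ is available. The sketch below calibrates the statistic with a generic permutation (resampling) recipe of our own; the paper's own test statistic and critical values are developed later (Sections 2 and 5), so this is an illustration rather than the paper's implementation.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def _dcov(X, Y):
    """Sample distance covariance V_n(X, Y) via double-centered distance matrices."""
    def centered(Z):
        Z = np.asarray(Z, dtype=float)
        if Z.ndim == 1:
            Z = Z[:, None]
        a = squareform(pdist(Z))
        return a - a.mean(axis=0, keepdims=True) - a.mean(axis=1, keepdims=True) + a.mean()
    A, B = centered(X), centered(Y)
    return np.sqrt(max(np.mean(A * B), 0.0))

def independence_test(X, Y, n_perm=499, seed=None):
    """Permutation test of H0: f_{X,Y} = f_X f_Y, using V_n as the test statistic.

    Under H0 the pairing of the rows of X and Y is exchangeable, so re-pairing
    Y by a uniformly random permutation draws from the null distribution of V_n."""
    rng = np.random.default_rng(seed)
    Y = np.asarray(Y)
    observed = _dcov(X, Y)
    null = np.array([_dcov(X, Y[rng.permutation(len(Y))]) for _ in range(n_perm)])
    p_value = (1 + np.sum(null >= observed)) / (n_perm + 1)   # add-one Monte Carlo p-value
    return observed, p_value

# Nonmonotone dependence that product-moment correlation largely misses:
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = np.abs(x) + 0.3 * rng.normal(size=200)        # dependent, but nearly uncorrelated with x
print(independence_test(x, y, seed=1))            # small p-value: H0 rejected
```

Because the rows are exchangeable under $H_0$, the permutation p-value is valid for any dimensions $p$ and $q$, at the cost of recomputing the statistic for each permutation.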
The crucial observation is the following lemma.

Lemma 1. If $0 < \alpha < 2$, then for all $x$ in $\mathbb{R}^d$,

$\int_{\mathbb{R}^d} \frac{1 - \cos\langle t, x \rangle}{|t|_d^{d+\alpha}}\, dt = C(d,\alpha)\, |x|^\alpha,$

where

$C(d,\alpha) = \frac{2\pi^{d/2}\,\Gamma(1 - \alpha/2)}{\alpha\, 2^\alpha\, \Gamma((d+\alpha)/2)}$

and $\Gamma(\cdot)$ is the complete gamma function. The integrals at $0$ and $\infty$ are meant in the principal value sense: $\lim_{\varepsilon \to 0} \int_{\mathbb{R}^d \setminus \{\varepsilon B + \varepsilon^{-1} B^c\}}$, where $B$ is the unit ball (centered at $0$) in $\mathbb{R}^d$ and $B^c$ is the complement of $B$.

See [11] for the proof of Lemma 1. In the simplest case, $\alpha = 1$, the constant in Lemma 1 is

(2.3)  $c_d = C(d,1) = \frac{\pi^{(1+d)/2}}{\Gamma((1+d)/2)}.$

In view of Lemma 1, it is natural to choose the weight function

(2.4)  $w(t,s) = (c_p c_q\, |t|_p^{1+p}\, |s|_q^{1+q})^{-1},$

corresponding to $\alpha = 1$. We apply the weight function (2.4) and the corresponding weighted $L_2$ norm $\|\cdot\|$, omitting the index $w$, and write the dependence measure (2.2) as $\mathcal{V}^2(X,Y)$. In integrals we also use the symbol $d\omega$, which is defined by

$d\omega = (c_p c_q\, |t|_p^{1+p}\, |s|_q^{1+q})^{-1}\, dt\, ds.$

For finiteness of $\|f_{X,Y}(t,s) - f_X(t)f_Y(s)\|^2$ it is sufficient that $E|X|_p < \infty$ and $E|Y|_q < \infty$. By the Cauchy–Bunyakovsky inequality,

$|f_{X,Y}(t,s) - f_X(t)f_Y(s)|^2 = |E(e^{i\langle t,X \rangle} - f_X(t))(e^{i\langle s,Y \rangle} - f_Y(s))|^2 \le E|e^{i\langle t,X \rangle} - f_X(t)|^2\, E|e^{i\langle s,Y \rangle} - f_Y(s)|^2 = (1 - |f_X(t)|^2)(1 - |f_Y(s)|^2).$

If $E(|X|_p + |Y|_q) < \infty$, then by Lemma 1 and by Fubini's theorem it follows that

(2.5)  $\int_{\mathbb{R}^{p+q}} |f_{X,Y}(t,s) - f_X(t)f_Y(s)|^2\, d\omega \le \int_{\mathbb{R}^p} \frac{1 - |f_X(t)|^2}{c_p |t|_p^{1+p}}\, dt \int_{\mathbb{R}^q} \frac{1 - |f_Y(s)|^2}{c_q |s|_q^{1+q}}\, ds = E\int_{\mathbb{R}^p} \frac{1 - \cos\langle t, X - X' \rangle}{c_p |t|_p^{1+p}}\, dt \cdot E\int_{\mathbb{R}^q} \frac{1 - \cos\langle s, Y - Y' \rangle}{c_q |s|_q^{1+q}}\, ds = E|X - X'|_p\, E|Y - Y'|_q < \infty.$
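The finiteness argument above rests on Lemma 1 with $\alpha = 1$. As a quick numerical sanity check (an illustration of ours, not part of the paper), the sketch below verifies the $d = 1$, $\alpha = 1$ case, where the lemma asserts $\int_{\mathbb{R}} (1 - \cos(tx))/t^2\, dt = c_1 |x|$ with $c_1 = C(1,1) = \pi$. The splitting of the integral at $t = 1$ and the Fourier-weighted quadrature for the oscillatory tail are implementation choices, not from the source.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma

def C(d, alpha):
    """Constant from Lemma 1: 2 pi^(d/2) Gamma(1 - alpha/2) / (alpha 2^alpha Gamma((d + alpha)/2))."""
    return 2.0 * np.pi ** (d / 2.0) * gamma(1.0 - alpha / 2.0) / (alpha * 2.0 ** alpha * gamma((d + alpha) / 2.0))

def lemma1_lhs_1d(x, alpha=1.0):
    """Numerically evaluate int_R (1 - cos(t x)) / |t|^(1 + alpha) dt for d = 1.

    By symmetry this equals 2 * int_0^inf.  Near t = 0 the integrand behaves like
    x^2 t^(1 - alpha) / 2, so [0, 1] poses no problem; on [1, inf) we write the
    integrand as t^(-1 - alpha) minus an oscillatory part, the latter handled by
    QUADPACK's Fourier-weighted rule (weight='cos')."""
    head, _ = quad(lambda t: (1.0 - np.cos(t * x)) / t ** (1.0 + alpha), 0.0, 1.0)
    tail_const = 1.0 / alpha                                  # int_1^inf t^(-1 - alpha) dt
    tail_osc, _ = quad(lambda t: t ** (-1.0 - alpha), 1.0, np.inf, weight="cos", wvar=x)
    return 2.0 * (head + tail_const - tail_osc)

if __name__ == "__main__":
    print("C(1, 1) =", C(1, 1), "  (pi =", np.pi, ")")
    for x in (0.5, 1.0, 2.5):
        print(x, lemma1_lhs_1d(x), C(1, 1) * abs(x))          # the two columns should agree
```

For $x = 1$ the left-hand side evaluates to approximately 3.1416, matching $C(1,1)\,|x| = \pi$ to quadrature accuracy.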