
TRANSACTIONS OF THE AMERICAN MATHEMATICAL SOCIETY, Volume 359, Number 11, November 2007, Pages 5345–5363. S 0002-9947(07)04192-X. Article electronically published on May 11, 2007.

ASYMPTOTIC DISTRIBUTION OF THE LARGEST OFF-DIAGONAL ENTRY OF CORRELATION MATRICES

WANG ZHOU

Abstract. Suppose that we have n observations from a p-dimensional population. We are interested in testing that the p variates of the population are independent under the situation where p goes to infinity as $n\to\infty$. A test statistic is chosen to be $L_n=\max_{1\le i<j\le p}|\rho_{ij}|$, the largest off-diagonal entry of the sample correlation matrix.

1. Introduction

Suppose $(X,Y),(X_i,Y_i),\ i=1,\dots,n$, are independent and identically distributed (i.i.d.) random vectors from a bivariate distribution $F(x,y)$. For measuring the closeness of two random variables $X$ and $Y$, we use Pearson's sample correlation coefficient, which is defined by

(1.1)    $\rho_{XY}=\dfrac{\sum_{i=1}^n(X_i-\bar X)(Y_i-\bar Y)}{\sqrt{\sum_{i=1}^n(X_i-\bar X)^2\sum_{i=1}^n(Y_i-\bar Y)^2}},$

where $\bar X=\sum_{i=1}^nX_i/n$ and $\bar Y=\sum_{i=1}^nY_i/n$. Obviously, $\rho_{XY}$ is a natural estimator of the population correlation coefficient

$\rho=E(X-EX)(Y-EY)\big/\sqrt{E(X-EX)^2\,E(Y-EY)^2}.$
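For readers who want to compute (1.1) directly, here is a minimal sketch in Python; the function name and the simulated data are my own illustration (not from the paper), with numpy assumed available.

```python
import numpy as np

def pearson_rho(x, y):
    """Illustrative sketch: sample correlation coefficient (1.1) for paired data x, y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()          # X_i - X bar
    yc = y - y.mean()          # Y_i - Y bar
    return np.dot(xc, yc) / np.sqrt(np.dot(xc, xc) * np.dot(yc, yc))

# Example: two independent samples, so the value should be near 0.
rng = np.random.default_rng(0)
x, y = rng.normal(size=200), rng.normal(size=200)
print(pearson_rho(x, y))
```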

In the case of a bivariate normal distribution, $\rho_{XY}$ is the maximum likelihood estimator of $\rho$. In addition, it is a commonly used statistic for testing the null hypothesis that the two variables $X$ and $Y$ are independent, i.e.,

$H_0:\ \rho=0,\qquad H_1:\ \rho\neq0.$

Received by the editors April 25, 2005 and, in revised form, September 5, 2005. 2000 Mathematics Subject Classification. Primary 60F05, 62G20, 62H10. Key words and phrases. Sample correlation matrices, Spearman’s rank correlation matrices, Chen-Stein method, moderate deviations. The author was supported in part by grants R-155-000-035-112 and R-155-050-055-133/101 at the National University of Singapore.



Although many authors have studied the distribution of $\rho_{XY}$ (see Hotelling [6] and the references therein), the best present-day usage in dealing with sample correlation coefficients is based on R. A. Fisher's work published in 1915; see [3].

Now suppose we have a $p$-dimensional distribution with mean $\mu$ and covariance matrix $\Sigma$. In the last three or four decades, due to the fast development of modern technology, in many research areas, including signal processing, network security, image processing, genetics, stock marketing and other economic problems, we are faced with the task of analyzing data with ever increasing dimension $p$. Naturally, one may ask how to test the independence among the $p$ components of the population. For example, air pollutant levels are influenced by several meteorological conditions, such as wind speed, temperature, humidity, insolation, etc. The relationship of air pollutant levels to meteorological conditions can be described by regression models. But an essential part of constructing regression models is to check whether all the meteorological conditions of interest are independent or uncorrelated.

Under the additional normality assumption, Johnstone [9] derived the asymptotic distribution of the largest eigenvalue of the sample covariance matrix to study the test of the hypothesis $H_0:\Sigma=I$, assuming $\mu=0$. On the other hand, Jiang [7] proposed to use a more intuitive independence test statistic, the largest off-diagonal entry of the sample correlation matrix, to test $H_0$, assuming at least a 30th moment of the population.

Before proceeding further, let us introduce some notation. Let $(X_{k1},\dots,X_{kp})$, $k=1,\dots,n$, be a sample of $n$ independent observations taken from a nondegenerate $p$-variate population. Let $X_i=(X_{1i},\dots,X_{ni})^T$, $i=1,\dots,p$, be the columns of the $n\times p$ matrix $X_n=(X_{ki})$, $k=1,\dots,n$, $i=1,\dots,p$. Hence $X_n=(X_1,X_2,\dots,X_p)$. Introduce the average $\bar X_i=\sum_{k=1}^nX_{ki}/n$. We write $X_i-\bar X_i$ for $X_i-\bar X_ie$, where $e=(1,1,\dots,1)^T\in\mathbb R^n$. Then $\rho_{ij}$, the sample correlation coefficient between $X_i$ and $X_j$, is defined by

(1.2)    $\rho_{ij}=\dfrac{(X_i-\bar X_i)^T(X_j-\bar X_j)}{\|X_i-\bar X_i\|\,\|X_j-\bar X_j\|},\qquad 1\le i<j\le p,$

where $\|\cdot\|$ is the Euclidean norm. If the columns of $X_n$ are independent, all the $\rho_{ij}$ should be close to $0$. In other words, the largest off-diagonal entry of the sample correlation matrix $\Gamma_n=(\rho_{ij})$,

$L_n=\max_{1\le i<j\le p}|\rho_{ij}|,$

has to be small, so $L_n$ can be used as an independence test statistic.
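As a concrete illustration (not taken from the paper), $\Gamma_n$ and $L_n$ can be computed from an $n\times p$ data matrix along the following lines; numpy's corrcoef is assumed, and the sizes are arbitrary.

```python
import numpy as np

def largest_offdiag_correlation(X):
    """Illustrative sketch: L_n = max_{1<=i<j<=p} |rho_ij| for an n x p data matrix X."""
    corr = np.corrcoef(X, rowvar=False)      # p x p matrix Gamma_n = (rho_ij)
    p = corr.shape[0]
    iu = np.triu_indices(p, k=1)             # strictly upper-triangular entries
    return np.max(np.abs(corr[iu]))

# Example: independent columns, so L_n should be small.
rng = np.random.default_rng(1)
X = rng.standard_normal((500, 50))           # n = 500 observations, p = 50 variates
print(largest_offdiag_correlation(X))
```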

Under the i.i.d. assumption, Jiang [7] found the asymptotic distribution of $L_n$.

Theorem A (Jiang [7]). Suppose that $E|X|^{30+\varepsilon}<\infty$ for some $\varepsilon>0$. If $n/p\to\gamma$, then as $n\to\infty$, for any $y\in\mathbb R$,

(1.3)    $P(nL_n^2-a_n\le y)\to\exp\bigl(-(\gamma^2\sqrt{8\pi})^{-1}e^{-y/2}\bigr),$

where $a_n=4\log n-\log\log n$.

Jiang's result makes it possible to test independence of the components of vectors in large dimensional data problems when strong structural assumptions on the population, such as normality, are not present. But the conditions of his theorem are too restrictive. For example, a $(30+\varepsilon)$th moment is needed to get the asymptotic

License or copyright restrictions may apply to redistribution; see https://www.ams.org/journal-terms-of-use ASYMPTOTIC DISTRIBUTION OF LARGEST OFF-DIAGONAL ENTRY 5347

distribution of $L_n$. So is it possible to reduce the high-order moment requirement? On the other hand, we have no guarantee that the ratio of the sample size to the dimension will converge in practice; sometimes the dimension may increase much more slowly than the sample size. One may ask what can be done in those situations. In this paper, we try to answer these questions through the following theorem.

Theorem 1.1. Assume that the entries of the matrix $X_n=(X_{ki})$ are i.i.d. Let $p=p_n$ satisfy

(1.4)    $\lim_{n\to\infty}p=\infty,\qquad \limsup_{n\to\infty}p/n<\infty.$

Write $b_n=4\log p-\log\log p$. Then

(1.5)    $\lim_{n\to\infty}P(nL_n^2-b_n\le y)=\exp\bigl(-Ke^{-y/2}\bigr),\qquad K=(8\pi)^{-1/2},$

provided that

(1.6)    $\lim_{x\to\infty}x^6P\bigl(|XX'|>x\bigr)=0,$

where $X$ and $X'$ are independent copies of $X_{11}$. In particular, (1.6) holds if

(1.7)    $EX^6<\infty.$

We shall make the following remarks:

(1) Using the Chebyshev inequality, we can see that the condition $EX^6<\infty$ is stronger than the weak moment assumption (1.6) on the product $XX'$. Through Example 1 in Section 4, we explain that the weak moment condition (1.6) seems to be optimal.

(2) The limit distributions in (1.5) and (1.3) are different. This is due to the different choices of centering constants. Our choice of centering constants has the advantage that the limit distribution does not depend on unknown parameters, which is really useful in statistical applications. Of course, under Jiang's conditions, relations (1.5) and (1.3) are mathematically equivalent, since in this case $\lim_{n\to\infty}(a_n-b_n)=4\log\gamma$.

(3) To prove Theorem 1.1 we use the Poisson approximation technique suggested by Barbour and Eagleson [2], and the self-normalized moderate deviations studied by Shao [10], Wang and Jing [14], Shao [11], and Jing, Shao and Wang [8].
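Remark (2) is what makes (1.5) directly usable: the limit $\exp(-Ke^{-y/2})$ is free of unknown parameters, so an approximate level-$\alpha$ test rejects independence when $nL_n^2-b_n$ exceeds $y_\alpha=-2\log\bigl(-\log(1-\alpha)/K\bigr)$, the $(1-\alpha)$ quantile of the limit law. A hedged sketch of such a procedure (the function names and sample sizes are mine, not the author's):

```python
import numpy as np

K = (8 * np.pi) ** -0.5

def critical_value(alpha):
    """(1 - alpha) quantile y_alpha of the limit distribution exp(-K e^{-y/2})."""
    return -2.0 * np.log(-np.log(1.0 - alpha) / K)

def independence_test(X, alpha=0.05):
    """Illustrative sketch of an approximate level-alpha test of H0 based on (1.5)."""
    n, p = X.shape
    corr = np.corrcoef(X, rowvar=False)
    iu = np.triu_indices(p, k=1)
    Ln = np.max(np.abs(corr[iu]))
    bn = 4 * np.log(p) - np.log(np.log(p))
    stat = n * Ln**2 - bn
    return stat, stat > critical_value(alpha)

rng = np.random.default_rng(2)
print(independence_test(rng.standard_normal((400, 100))))
```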

License or copyright restrictions may apply to redistribution; see https://www.ams.org/journal-terms-of-use 5348 WANG ZHOU

Now we can see that the sample correlation coefficient is still useful in large dimensional statistical inference even if high-order moment assumptions are violated. But when strong structural assumptions on the population, such as normality, are not present, the sample correlation coefficient loses much of its interest, because $\rho$ is not invariant under rearrangements of the values of the independent random variables. More precisely, if $Z=\psi(X)$, where $\psi$ is one-to-one and measurable, the correlation coefficient between $X$ and $Y$ may be very different from that between $Z$ and $Y$. Another disadvantage of the sample correlation coefficient is that $\rho=0$ does not necessarily imply independence. Therefore other measures of dependence can be of interest. For independence testing problems, it was Spearman [13] who introduced the rank correlation coefficient into statistics.

For a column $X_i=(X_{1i},\dots,X_{ni})^T$, we order the elements $X_{1i},\dots,X_{ni}$ of $X_i$ from least to greatest, and let $Q_{ki}$ denote the rank of $X_{ki}$. Replacing the matrix $X_n=(X_{ki})$ by the matrix $(Q_{ki})$ and proceeding as in (1.2), we can define the so-called Spearman rank correlation coefficient between $X_i$ and $X_j$ as

(1.8)    $r_{ij}=\dfrac{\sum_{k=1}^nQ_{ki}Q_{kj}-\frac1n\bigl(\sum_{k=1}^nQ_{ki}\bigr)\bigl(\sum_{k=1}^nQ_{kj}\bigr)}{\sqrt{\sum_{k=1}^n(Q_{ki}-\bar Q_i)^2\sum_{k=1}^n(Q_{kj}-\bar Q_j)^2}}=\dfrac{12}{n(n^2-1)}\sum_{k=1}^nQ_{ki}Q_{kj}-\dfrac{3(n+1)}{n-1},$

where $\bar Q_i=\bar Q_j=(n+1)/2$. All the $r_{ij}$ constitute Spearman's rank correlation matrix

$R_n=(r_{ij})_{1\le i,j\le p}.$

The most important point is that the distribution of $r_{ij}$ does not depend on nuisance parameters. Much information on $r_{ij}$ can be found in Hájek, Šidák and Sen [4]. As a statistically more relevant result than Theorem 1.1, we give the following Theorem 1.2 for the largest off-diagonal entry of $R_n$,

$l_n=\max_{1\le i<j\le p}|r_{ij}|.$

Although $l_n$ is distribution-free, it is hard to get the exact distribution of $l_n$ when $n$ is large. In order to make statistical inference based on $l_n$, it is essential to obtain the asymptotic distribution of $l_n$.

Theorem 1.2. Assume that $X_1,\dots,X_p$ are independent and all the entries of $X_n$ are continuous random variables. If $p=p_n$ satisfies

$\lim_{n\to\infty}p=\infty,\qquad \limsup_{n\to\infty}p/n<\infty,$

then

(1.9)    $\lim_{n\to\infty}P(nl_n^2-b_n\le y)=\exp\bigl(-Ke^{-y/2}\bigr),\qquad K=(8\pi)^{-1/2},$

with $b_n=4\log p-\log\log p$.
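A corresponding sketch for the rank-based test: replace each column by its ranks, form $R_n$ (for continuous data with no ties this is just the Pearson correlation matrix of the ranks, which agrees with (1.8)), take $l_n$, and compare $nl_n^2-b_n$ with the same critical value as before. scipy's rankdata is assumed; everything else is an illustrative choice of mine.

```python
import numpy as np
from scipy.stats import rankdata

def spearman_max_test(X, alpha=0.05):
    """Illustrative sketch of an approximate level-alpha independence test based on (1.9)."""
    n, p = X.shape
    Q = np.apply_along_axis(rankdata, 0, X)   # Q_{ki}: rank of X_{ki} within column i
    R = np.corrcoef(Q, rowvar=False)          # equals (1.8) when there are no ties
    iu = np.triu_indices(p, k=1)
    ln = np.max(np.abs(R[iu]))
    bn = 4 * np.log(p) - np.log(np.log(p))
    K = (8 * np.pi) ** -0.5
    y_alpha = -2.0 * np.log(-np.log(1.0 - alpha) / K)
    stat = n * ln**2 - bn
    return stat, stat > y_alpha

rng = np.random.default_rng(3)
print(spearman_max_test(rng.standard_normal((400, 80))))
```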

In Theorem 1.2, we assume that all the entries of $X_n$ are continuous random variables in order to avoid ties among the observations. Otherwise, the number of ties must be taken into account, and as a consequence the expectation and the variance under the null (independence) hypothesis need adjustment; we refer to Hollander and Wolfe [5] for more details. For the adjusted $r_{ij}$, Theorem 1.2 still holds.

The paper is organized as follows. We prove the main theorems in Section 2. Some technical lemmas are deferred to Section 3. A counterexample is presented in Section 4 to explain that the weak moment condition (1.6) seems to be optimal.

2. Proofs of main theorems

We first present a lemma about Poisson approximation, which is a special case of Theorem 1 in Arratia, Goldstein and Gordon [1]. In the proofs of the main theorems, we rely heavily on it.


Lemma 2.1. Let $B$ be an index set and $\{B_\alpha,\ \alpha\in B\}$ be a set of subsets of $B$, i.e., $B_\alpha\subset B$ for each $\alpha\in B$. Also let $\{\eta_\alpha,\ \alpha\in B\}$ be random variables. For a given $t\in\mathbb R$, set $\lambda=\sum_{\alpha\in B}P(\eta_\alpha>t)$. Then

$\bigl|P\bigl(\max_{\alpha\in B}\eta_\alpha\le t\bigr)-e^{-\lambda}\bigr|\le(1\wedge\lambda^{-1})(b_1+b_2+b_3),$

where

$b_1=\sum_{\alpha\in B}\sum_{\beta\in B_\alpha}P(\eta_\alpha>t)P(\eta_\beta>t),\qquad b_2=\sum_{\alpha\in B}\sum_{\alpha\ne\beta\in B_\alpha}P(\eta_\alpha>t,\ \eta_\beta>t),$

$b_3=\sum_{\alpha\in B}E\bigl|P\bigl(\eta_\alpha>t\mid\sigma(\eta_\beta,\ \beta\notin B_\alpha)\bigr)-P(\eta_\alpha>t)\bigr|,$

where $\sigma(\eta_\beta,\ \beta\notin B_\alpha)$ is the $\sigma$-algebra generated by $\{\eta_\beta,\ \beta\notin B_\alpha\}$. In particular, if $\eta_\alpha$ is independent of $\{\eta_\beta,\ \beta\notin B_\alpha\}$ for each $\alpha$, then $b_3=0$.

Before we prove Theorem 1.1, let us illustrate the main steps of the proof. We first show that it suffices to consider $L_n$ based on truncated random variables; see (2.1). Then we replace the sample variances, the denominators of the correlation coefficients, by population variances; see (2.4). The next step is to truncate the products $\hat X_{ki}\hat X_{kj}$; see (2.5). Finally, we apply Lemma 2.1 to get the asymptotic distribution.

For notational convenience, we write

$t_p=\sqrt{4\log p-\log\log p+y}\qquad\text{and}\qquad t_{pn}=\sqrt{4\log p-\log\log p+y+\varepsilon_n},$

where $\varepsilon_n\to0$ as $n\to\infty$. Throughout the paper, for an arbitrary set $\Omega$, $\Omega^c$ denotes the complement of $\Omega$.
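Although it plays no role in the proof, the limit (1.5) can be probed numerically before reading further: simulate $nL_n^2-b_n$ repeatedly and compare its empirical distribution function with $\exp(-Ke^{-y/2})$. The sketch below uses standard normal entries and modest values of $n$, $p$ and the number of replications, all of which are arbitrary choices of mine.

```python
import numpy as np

def simulate_statistic(n, p, reps, rng):
    """Illustrative sketch: Monte Carlo draws of n L_n^2 - b_n under independent N(0,1) entries."""
    bn = 4 * np.log(p) - np.log(np.log(p))
    iu = np.triu_indices(p, k=1)
    out = np.empty(reps)
    for r in range(reps):
        X = rng.standard_normal((n, p))
        Ln = np.max(np.abs(np.corrcoef(X, rowvar=False)[iu]))
        out[r] = n * Ln**2 - bn
    return out

rng = np.random.default_rng(4)
draws = simulate_statistic(n=200, p=100, reps=300, rng=rng)
K = (8 * np.pi) ** -0.5
for y in (0.0, 2.0, 4.0):
    print(y, np.mean(draws <= y), np.exp(-K * np.exp(-y / 2)))  # empirical vs. limiting cdf
```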

Proof of Theorem 1.1. Since the sample correlation coefficient is scale and shift invariant, we can assume that $EX=0$, $EX^2=1$. Define

$\hat X_{ki}=X_{ki}I\bigl(|X_{ki}|\le\sqrt n/(\log n)^2\bigr),\qquad \hat X_i=(\hat X_{1i},\dots,\hat X_{ni})^T,\qquad \bar{\hat X}_i=\sum_{k=1}^n\hat X_{ki}/n,$

and

$\hat\rho_{ij}=\dfrac{(\hat X_i-\bar{\hat X}_i)^T(\hat X_j-\bar{\hat X}_j)}{\|\hat X_i-\bar{\hat X}_i\|\,\|\hat X_j-\bar{\hat X}_j\|},\qquad \hat L_n=\max_{1\le i<j\le p}|\hat\rho_{ij}|.$

Then

(2.1)    $P(L_n\ne\hat L_n)\le P\bigl(\max_{1\le k\le n,\,1\le i\le p}|X_{ki}|>\sqrt n/(\log n)^2\bigr)\le np\,P\bigl(|X_{12}|>\sqrt n/(\log n)^2\bigr)\le np(\log n)^{12-2\kappa}E|X|^{6-\kappa}/n^{3-\kappa/2}\to0$


as $p\to\infty$, where in the last step we used Lemma 3.1. Let us deal with $\hat L_n$. Write

$\hat L_{1n}=\max_{1\le i<j\le p}\dfrac{\bigl|(\hat X_i-\bar{\hat X}_i)^T(\hat X_j-\bar{\hat X}_j)\bigr|}{n-n/(\log n)^2},\qquad \hat L_{2n}=\max_{1\le i<j\le p}\dfrac{\bigl|(\hat X_i-\bar{\hat X}_i)^T(\hat X_j-\bar{\hat X}_j)\bigr|}{n+n/(\log n)^2}.$

Define the events

$\Theta_1=\bigl(n\hat L_n^2\le t_p^2\bigr),\qquad \Theta_2=\Bigl(\max_{1\le i\le p}\bigl|\,\|\hat X_i-\bar{\hat X}_i\|^2-n\,\bigr|\le n/(\log n)^2\Bigr).$

Then

$P(n\hat L_n^2\le t_p^2)=P(\Theta_1\cap\Theta_2)+P\Bigl(\Theta_1\cap\Bigl\{\max_{1\le i\le p}\bigl|\,\|\hat X_i-\bar{\hat X}_i\|^2-n\,\bigr|>n/(\log n)^2\Bigr\}\Bigr).$

On the other hand, since $\hat L_{1n}^2\ge\hat L_n^2$ if $\Theta_2$ holds, we have

$P(\Theta_1)\ge P(\Theta_1\cap\Theta_2)\ge P(\Theta_3\cap\Theta_2)\ge P(\Theta_3)-P(\Theta_2^c),$

where $\Theta_3=(n\hat L_{1n}^2\le t_p^2)$. Combining the above inequalities gives

(2.2)    $P(n\hat L_{1n}^2\le t_p^2)-P\Bigl(\max_{1\le i\le p}\bigl|\,\|\hat X_i-\bar{\hat X}_i\|^2-n\,\bigr|>n/(\log n)^2\Bigr)\le P(n\hat L_n^2\le t_p^2)\le P(n\hat L_{2n}^2\le t_p^2)+P\Bigl(\max_{1\le i\le p}\bigl|\,\|\hat X_i-\bar{\hat X}_i\|^2-n\,\bigr|>n/(\log n)^2\Bigr).$

By (3.20) of Lemma 3.5,

(2.3)    $P\Bigl(\max_{1\le i\le p}\bigl|\,\|\hat X_i-\bar{\hat X}_i\|^2-n\,\bigr|>n/(\log n)^2\Bigr)\le p\,P\bigl(\bigl|\,\|\hat X_1-\bar{\hat X}_1\|^2-n\,\bigr|>n/(\log n)^2\bigr)\le pn^{-M}\to0$

as $n\to\infty$. Also note that $t_p^2\bigl(1\pm1/(\log n)^2\bigr)=t_p^2+o(1)$ as $n\to\infty$. It follows from (2.2)–(2.3) that, in order to prove the theorem, it suffices to prove

(2.4)    $P\bigl(nL_n^{*2}\le t_p^2\bigr)\to\exp\bigl(-Ke^{-y/2}\bigr)$

as $p\to\infty$, where

$L_n^{*}=\max_{1\le i<j\le p}\bigl|(\hat X_i-\bar{\hat X}_i)^T(\hat X_j-\bar{\hat X}_j)\bigr|/n.$


Now for $\varepsilon$ satisfying (3.33), define

(2.5)    $\xi_{kij}=\hat X_{ki}\hat X_{kj}I\bigl(|\hat X_{ki}\hat X_{kj}|\le\varepsilon\sqrt n\bigr),\quad 1\le k\le n,\qquad y_{ij}=\dfrac{\sum_{k=1}^n\xi_{kij}-n\bar{\hat X}_i\bar{\hat X}_j}{\sqrt n},\quad 1\le i<j\le p,$

and

(2.6)    $W_n=\max_{1\le i<j\le p}|y_{ij}|.$

Then

(2.7)    $P\bigl(\sqrt n\,L_n^{*}\ne W_n\bigr)\le P\Bigl(\max_{1\le k\le n,\,1\le i<j\le p}|\hat X_{ki}\hat X_{kj}|>\varepsilon\sqrt n\Bigr)\le np^2P\bigl(|\hat X_{11}\hat X_{12}|>\varepsilon\sqrt n\bigr)\to0$

as $p\to\infty$, where in the last inequality we used conditions (1.6) and (1.4). So we may turn to consider $W_n$. We will apply Lemma 2.1 to prove

(2.8)    $P\Bigl(\max_{1\le i<j\le p}|y_{ij}|\le t_p\Bigr)\to\exp\bigl(-Ke^{-y/2}\bigr)$

as $p\to\infty$. Take $B=\{(i,j):1\le i<j\le p\}$ and, for $\alpha=(i,j)\in B$, set $B_\alpha=\{(k,l)\in B:\{k,l\}\cap\{i,j\}\ne\emptyset,\ (k,l)\ne(i,j)\}$, $\eta_\alpha=|y_\alpha|$, $t=t_p$. Then

$\lambda=\sum_{\alpha\in B}P(|y_\alpha|>t_p)=\dfrac{p(p-1)}2P(|y_{12}|>t_p)=\dfrac{e^{-y/2}}{\sqrt{8\pi}}\bigl(1+o(1)\bigr)$

as $p\to\infty$, where the last equality follows from Lemma 3.2. Notice that $y_\alpha$ is independent of $\{y_\beta:\beta\in B\setminus B_\alpha\}$ for any $\alpha\in B$, so $b_3=0$. To complete the proof of (2.8), by Lemma 2.1 we need to show that $b_1\to0$ and $b_2\to0$. But

$b_1\le\tfrac12(p^2-p)\cdot2p\cdot P(|y_{12}|>t_p)^2=O(1/n)$

by Lemma 3.2. Also Lemma 3.4 gives

$b_2\le p(p^2-p)P\bigl(|y_{12}|>t_p,\ |y_{13}|>t_p\bigr)\le p(p^2-p)\bigl[P(|y_{12}+y_{13}|>2t_p)+P(|y_{12}-y_{13}|>2t_p)\bigr]\to0.$

Hence (2.8) is verified, and the proof is complete. $\square$

Proof of Theorem 1.2. We apply Lemma 2.1 directly to prove Theorem 1.2. Take $B=\{(i,j):1\le i<j\le p\}$ and, for $\alpha=(i,j)\in B$, set $B_\alpha=\{(k,l)\in B:\{k,l\}\cap\{i,j\}\ne\emptyset,\ (k,l)\ne(i,j)\}$, $\eta_\alpha=\sqrt n|r_\alpha|$, $t=t_p$. Then

$\lambda_n=\sum_{1\le i<j\le p}P\bigl(\sqrt n|r_{ij}|>t_p\bigr)=\dfrac{p(p-1)}2P\bigl(\sqrt n|r_{12}|>t_p\bigr).$


By (1.8),

$r_{12}=\dfrac{12}{n(n^2-1)}\sum_{k=1}^nQ_{k1}Q_{k2}-\dfrac{3(n+1)}{n-1},$

which can be re-expressed as

(2.9)    $r_{12}=\dfrac{12}{n(n^2-1)}\sum_{i=1}^niR_i-\dfrac{3(n+1)}{n-1},$

where $R_i$ is the $Q_{k2}$ for which $Q_{k1}=i$. Now applying Lemma 3.6 to (2.9), we have

$\lambda_n=\dfrac{e^{-y/2}}{\sqrt{8\pi}}\bigl(1+o(1)\bigr)$

as $n\to\infty$. Notice that $r_\alpha$ is independent of $\{r_\beta:\beta\in B\setminus B_\alpha\}$ for any $\alpha\in B$, so $b_3=0$. To complete the proof, by Lemma 2.1 we need to show that $b_1\to0$ and $b_2\to0$. But

$b_1\le\tfrac12(p^2-p)\cdot2p\cdot P^2\bigl(\sqrt n|r_{12}|>t_p\bigr)=O(1/n)$

by Lemma 3.6, and

$b_2\le p(p^2-p)P\bigl(\sqrt n|r_{12}|>t_p,\ \sqrt n|r_{13}|>t_p\bigr)=p(p^2-p)EP_1^2\bigl(\sqrt n|r_{12}|>t_p\bigr),$

where $P_1$ denotes the conditional probability given $X_1$. For Spearman's correlation coefficient, $P_1\bigl(\sqrt n|r_{12}|>t_p\bigr)=P\bigl(\sqrt n|r_{12}|>t_p\bigr)$ a.s. Applying Lemma 3.6 again gives $b_2\to0$. This completes the proof. $\square$

3. Technical lemmas

Let $\{\xi,\eta,\eta',\xi_k,\eta_k,\eta_k',\ k=1,2,\dots,n\}$ be i.i.d. random variables with mean $0$ and variance $1$. Also suppose that

(3.1)    $\sup_{x>0}x^6P(|\xi\eta|>x)<\infty.$

Lemma 3.1. Under condition (3.1), we have $E|\xi|^{6-\kappa}<\infty$ and $E|\xi\eta|^{6-\kappa}<\infty$ for all $0<\kappa<1$.

The above Lemma 3.1 is well known for a single random variable. For the product, just note that $E|\xi\eta|^{6-\kappa}=E|\xi|^{6-\kappa}E|\eta|^{6-\kappa}$, due to the i.i.d. assumption. So we omit its proof.

Now we introduce the following truncated random variables:

$\hat\xi=\xi I(|\xi|\le d_n),\qquad \hat\eta=\eta I(|\eta|\le d_n),\qquad \hat\eta'=\eta'I(|\eta'|\le d_n),$

with

$d_n=\sqrt n/(\log n)^2\quad\text{or}\quad d_n=n^{1/3},$

and

$\hat\xi_i=\xi_iI(|\xi_i|\le d_n),\qquad \hat\eta_i=\eta_iI(|\eta_i|\le d_n),\qquad \hat\eta_i'=\eta_i'I(|\eta_i'|\le d_n),\qquad i=1,2,\dots,n.$

So we truncate all the variables at two levels. One is $\sqrt n/(\log n)^2$, which is used in the proof of Theorem 1.1. The other one is $n^{1/3}$, which we use in Example 1 of Section 4. The means of $\hat\xi_i$, $\hat\eta_i$ and $\hat\eta_i'$ are denoted by

$\bar{\hat\xi}=\sum_{i=1}^n\hat\xi_i/n,\qquad \bar{\hat\eta}=\sum_{i=1}^n\hat\eta_i/n,\qquad \bar{\hat\eta}'=\sum_{i=1}^n\hat\eta_i'/n.$

Define, for $\varepsilon$ satisfying (3.33),

$z=\hat\xi\hat\eta I\bigl(|\hat\xi\hat\eta|\le\varepsilon\sqrt n\bigr),\qquad z'=\hat\xi\hat\eta'I\bigl(|\hat\xi\hat\eta'|\le\varepsilon\sqrt n\bigr),$

$z_i=\hat\xi_i\hat\eta_iI\bigl(|\hat\xi_i\hat\eta_i|\le\varepsilon\sqrt n\bigr),\qquad z_i'=\hat\xi_i\hat\eta_i'I\bigl(|\hat\xi_i\hat\eta_i'|\le\varepsilon\sqrt n\bigr),\qquad i=1,2,\dots,n.$

Truncating $z$ and the $z_i$ again, we set

$\hat z=zI\bigl(|z|\le\sqrt n/(\log n)^2\bigr),\qquad \hat z_i=z_iI\bigl(|z_i|\le\sqrt n/(\log n)^2\bigr),\qquad i=1,2,\dots,n.$

The sums of the $z_i$, $z_i'$, $\hat z_i$ are denoted by

$S=\sum_{i=1}^nz_i,\qquad S'=\sum_{i=1}^nz_i',\qquad \hat S=\sum_{i=1}^n\hat z_i.$

The following Lemma 3.2 is used to calculate $\lambda$ and $b_1$ in Lemma 2.1. The main idea of its proof is to replace the population variance by the sample variance so that we can use self-normalized moderate deviations.

Lemma 3.2. For the above truncated random variables,

(3.2)    $P\bigl(\bigl|(S-n\bar{\hat\xi}\bar{\hat\eta})/\sqrt n\bigr|>t_p\bigr)\,\dfrac{\sqrt{2\pi}}{e^{-y/2}}\,p^{2}\to1,$

and for any $A>1$,

(3.3)    $P\bigl(|S/\sqrt n|>A\sqrt{\log p}\bigr)=o\bigl(p^{-A^2/4}\bigr)$

as $n\to\infty$.

Proof. The Bernstein inequality implies, for any $0<\delta_0<1/2$, that

$P\bigl(|\bar{\hat\xi}-E\hat\xi|>n^{\delta_0-1/2}\bigr)\le2\exp\Bigl(-\dfrac{n^{2\delta_0}}{2\operatorname{Var}\hat\xi+\frac23\sqrt n\,n^{\delta_0-1/2}}\Bigr)\le n^{-c}$

with $c>2$ for $n$ sufficiently large, which combined with $|E\hat\xi|\le n^{-(5-\kappa)/3}E|\xi|^{6-\kappa}$ implies that

(3.4)    $P\bigl(|S/\sqrt n|-4n^{2\delta_0-1/2}>t_p\bigr)-n^{-c}\le P\bigl(\bigl|(S-n\bar{\hat\xi}\bar{\hat\eta})/\sqrt n\bigr|>t_p\bigr)\le P\bigl(|S/\sqrt n|+4n^{2\delta_0-1/2}>t_p\bigr)+n^{-c}$

for $n$ sufficiently large and $\kappa<1$. Let $\delta_0\in(0,1/4)$. Then $2\delta_0-1/2<0$, which implies that

(3.5)    $\bigl(t_p\pm4n^{2\delta_0-1/2}\bigr)^2-t_p^2\to0$

as $n\to\infty$. Hence, in order to prove (3.2), by (3.4) and (3.5) it suffices to prove

(3.6)    $P\bigl(|S/\sqrt n|>t_{pn}\bigr)\,\dfrac{\sqrt{2\pi}}{e^{-y/2}}\,p^{2}\to1$


as $n\to\infty$. Since

$P\bigl(|S/\sqrt n|>t_{pn}\bigr)-P\bigl(|\hat S/\sqrt n|>t_{pn}\bigr)=P\Bigl(|S/\sqrt n|>t_{pn},\ \max_{1\le i\le n}|z_i|>\sqrt n/(\log n)^2\Bigr)-P\Bigl(|\hat S/\sqrt n|>t_{pn},\ \max_{1\le i\le n}|z_i|>\sqrt n/(\log n)^2\Bigr),$

by Lemma 3.3 it remains to prove

(3.7)    $P\Bigl(|\tilde S/\sqrt n|>t_{pn},\ \max_{1\le i\le n}|z_i|>\sqrt n/(\log n)^2\Bigr)=o(p^{-2})\qquad\text{with }\tilde S=S\text{ or }\hat S,$

as $n\to\infty$. Let us write $Z_1=\bigl(|z_1|>\sqrt n/(\log n)^2\bigr)$, $Z_2=\bigl(|z_2|>\sqrt n/(\log n)^2\bigr)$, $Z_{2n}=\bigl(\max_{2\le i\le n}|z_i|>\sqrt n/(\log n)^2\bigr)$. Then, noting that $z_i=\hat z_i$ for $2\le i\le n$ if $Z_{2n}^c$ holds, we have

$(3.7)\le nP(Z_1\cap Z_{2n})+nP\bigl((|\tilde S/\sqrt n|>t_{pn})\cap Z_1\cap Z_{2n}^c\bigr)$

(3.8)    $\le n^2P(Z_1\cap Z_2)+nP\bigl(|z_1/\sqrt n|>\delta t_{pn}\bigr)+nP\Bigl(\Bigl\{\Bigl|\sum_{i=2}^nz_i/\sqrt n\Bigr|>(1-\delta)t_{pn}\Bigr\}\cap Z_1\cap Z_{2n}^c\Bigr)$

$\le n^2P(Z_1\cap Z_2)+nP\bigl(|z_1/\sqrt n|>\delta t_{pn}\bigr)+nP(Z_1)P\Bigl(\Bigl|\sum_{i=2}^n\hat z_i/\sqrt n\Bigr|>(1-\delta)t_{pn}\Bigr),$

where $\delta$ can be arbitrarily small. In the above inequality, applying condition (3.1) and the Chebyshev inequality, we have

$nP\bigl(|z_1/\sqrt n|>\delta t_{pn}\bigr)=o(n^{-2})\qquad\text{and}\qquad n^2P(Z_1\cap Z_2)=o(n^{-2}).$

For the last probability in (3.8), by (3.13) of Lemma 3.3, we have

$P\Bigl(\Bigl|\sum_{i=2}^n\hat z_i/\sqrt n\Bigr|>(1-\delta)t_{pn}\Bigr)=o\bigl(p^{-2+6\delta}\bigr)$

for $n$ sufficiently large, which combined with

$P(Z_1)\le E|z_1|^{6-\kappa}(\log n)^{12-2\kappa}/n^{3-\kappa/2}$

shows that, for sufficiently small $\kappa$ and $\delta$,

$nP(Z_1)P\Bigl(\Bigl|\sum_{i=2}^n\hat z_i/\sqrt n\Bigr|>(1-\delta)t_{pn}\Bigr)=o(p^{-2})$

as $n\to\infty$. Hence (3.7) is proved, and we have obtained (3.2).

Now let us prove (3.3). Note that

(3.9)    $\sqrt n|Ez|=\sqrt n\bigl|E\hat\xi\hat\eta-E\hat\xi\hat\eta I(|\hat\xi\hat\eta|>\varepsilon\sqrt n)\bigr|=o(n^{-3/2}).$

Hence for $n$ sufficiently large,

(3.10)    $P\bigl(|S/\sqrt n|>A\sqrt{\log p}\bigr)\le P\bigl(|(S-nEz)/\sqrt n|>0.9A\sqrt{\log p}\bigr).$


So by (3.24) of Lemma 3.5, we have

(3.11)    $P\bigl(|(S-nEz)/\sqrt n|>0.9A\sqrt{\log p}\bigr)\le P\Bigl(|T_z|>0.9A\sqrt{\log p}/\sqrt{1+\varepsilon_1}\Bigr)+n^{-M},$

where $T_z=(S-nEz)\big/\sqrt{\sum_{i=1}^n(z_i-Ez_i)^2}$ and $M>A^2/4$. Now applying Theorem 2.3 of Jing, Shao and Wang [8],

$P\Bigl(|T_z|>0.9A\sqrt{\log p}/\sqrt{1+\varepsilon_1}\Bigr)\Big/\Bigl[2\Bigl(1-\Phi\bigl(0.9A\sqrt{\log p}/\sqrt{1+\varepsilon_1}\bigr)\Bigr)\Bigr]\to1$

as $n\to\infty$, which combined with (3.10), (3.11) and the fact that $1-\Phi(s)\sim\frac1{\sqrt{2\pi}\,s}e^{-s^2/2}$ as $s\to\infty$ implies (3.3). $\square$

Lemma 3.3. For the $\hat z_i$, $i=1,2,\dots,n$, we have

(3.12)    $P\bigl(|\hat S/\sqrt n|>t_p\bigr)\,\dfrac{\sqrt{2\pi}}{e^{-y/2}}\,p^{2}\to1,$

and for sufficiently small but fixed $\delta>0$,

(3.13)    $P\Bigl(\Bigl|\sum_{i=2}^n\hat z_i/\sqrt n\Bigr|>(1-\delta)t_{pn}\Bigr)=o\bigl(p^{-2+6\delta}\bigr)$

as $n\to\infty$.

Proof. The proofs of (3.12) and (3.13) are the same. We only prove (3.12). Note that

(3.14)    $\sqrt n|E\hat z|=\sqrt n\bigl|E\hat\xi\hat\eta-E\hat\xi\hat\eta I(|\hat\xi\hat\eta|>\varepsilon\sqrt n)-EzI\bigl(|z|>\sqrt n/(\log n)^2\bigr)\bigr|=o(n^{-3/2})$

as $n\to\infty$, by the inequality $E|z|I(|z|>x)\le x^{-5+\kappa}E|z|^{6-\kappa}$ and Lemma 3.1. Hence in order to show (3.12), it suffices to prove

(3.15)    $P\bigl(|(\hat S-nE\hat z)/\sqrt n|>t_{pn}\bigr)\,\dfrac{\sqrt{2\pi}}{e^{-y/2}}\,p^{2}\to1$

as $n\to\infty$. Applying (3.21) of Lemma 3.5, we have

$P\Bigl(|T_{\hat z}|>\sqrt{1+(\log n)^{-2}}\,t_{pn}\Bigr)-n^{-M}\le P\bigl(|(\hat S-nE\hat z)/\sqrt n|>t_{pn}\bigr)\le P\Bigl(|T_{\hat z}|>\sqrt{1-(\log n)^{-2}}\,t_{pn}\Bigr)+n^{-M},$

where $T_{\hat z}=(\hat S-nE\hat z)\big/\sqrt{\sum_{i=1}^n(\hat z_i-E\hat z)^2}$ and $M>3$. Note that

$\bigl(1\pm(\log n)^{-2}\bigr)t_{pn}^2=t_{pn}^2+o(1)$

as $n\to\infty$. So in order to get (3.15), we need to prove

(3.16)    $P\bigl(|T_{\hat z}|>t_{pn}\bigr)\,\dfrac{\sqrt{2\pi}}{e^{-y/2}}\,p^{2}\to1$

as $n\to\infty$. Now we can invoke Theorem 2.3 of Jing, Shao and Wang [8] to obtain (3.16). Hence, we have completed the proof of (3.12). $\square$


The following Lemma 3.4 is useful when we estimate $b_2$ in Lemma 2.1.

Lemma 3.4. For the above truncated random variables,

(3.17)    $P\Bigl(\bigl|\bigl(S+S'-n\bar{\hat\xi}(\bar{\hat\eta}+\bar{\hat\eta}')\bigr)/\sqrt n\bigr|>2t_p\Bigr)=o(p^{-3})$

and

(3.18)    $P\Bigl(\bigl|\bigl(S-S'-n\bar{\hat\xi}(\bar{\hat\eta}-\bar{\hat\eta}')\bigr)/\sqrt n\bigr|>2t_p\Bigr)=o(p^{-3})$

as $n\to\infty$.

Proof. Arguing as in the derivation of (3.4), and noting (3.9), in order to prove (3.17) it suffices to prove

(3.19)    $P\Bigl(\bigl|\bigl(S+S'-n(Ez+Ez')\bigr)/\sqrt n\bigr|>2t_{pn}\Bigr)=o(p^{-3})$

as $n\to\infty$. By (3.22) of Lemma 3.5, we have

$P\Bigl(\bigl|\bigl(S+S'-n(Ez+Ez')\bigr)/\sqrt n\bigr|>2t_{pn}\Bigr)\le P\Bigl(|T_{zz'}|>(1-\varepsilon_1/2)^{1/2}\sqrt2\,t_{pn}\Bigr)+n^{-M},$

where $T_{zz'}=\bigl(S+S'-n(Ez+Ez')\bigr)\big/\sqrt{\sum_{i=1}^n(z_i+z_i'-Ez_i-Ez_i')^2}$ and $M>4$. Now applying Theorem 2.3 of Jing, Shao and Wang [8] to

$P\Bigl(|T_{zz'}|>(1-\varepsilon_1/2)^{1/2}\sqrt2\,t_{pn}\Bigr)$

gives (3.19). So the proof of (3.17) is complete, and we can get (3.18) similarly. $\square$

Lemma 3.5 guarantees that, in the proof of Theorem 1.1, the sample variance and the population variance are equivalent.

Lemma 3.5. For the above truncated random variables $\hat\xi,z,z',\hat\xi_i,z_i,z_i'$, $i=1,2,\dots,n$, we have

(3.20)    $P\Bigl(\Bigl|\sum_{i=1}^n\hat\xi_i^2/n-\bar{\hat\xi}^2-1\Bigr|>1/(\log n)^2\Bigr)\le n^{-M},$

(3.21)    $P\Bigl(\Bigl|\sum_{i=1}^n(\hat z_i-E\hat z)^2/n-1\Bigr|>1/(\log n)^2\Bigr)\le n^{-M},$

(3.22)    $P\Bigl(\Bigl|\dfrac1n\sum_{i=1}^n(z_i+z_i'-Ez_i-Ez_i')^2-2\Bigr|>\varepsilon_1\Bigr)\le n^{-M},$

(3.23)    $P\Bigl(\Bigl|\dfrac1n\sum_{i=1}^n(z_i-z_i'-Ez_i+Ez_i')^2-2\Bigr|>\varepsilon_1\Bigr)\le n^{-M},$

(3.24)    $P\Bigl(\Bigl|\dfrac1n\sum_{i=1}^n(z_i-Ez_i)^2-1\Bigr|>\varepsilon_1\Bigr)\le n^{-M},$

with arbitrarily large $M$ (say, greater than 100) and arbitrarily small but fixed $\varepsilon_1$.

Proof. The Bernstein inequality implies that

$P\bigl(|\bar{\hat\xi}-E\hat\xi|>n^{-1/6}\bigr)\le n^{-M}.$


Noting that

$1\ge E\hat\xi^2=1-E\xi^2I(|\xi|>d_n)\ge1-n^{-(4-\kappa)/3}E|\xi|^{6-\kappa}$

and $|E\hat\xi|\le n^{-(5-\kappa)/3}E|\xi|^{6-\kappa}$, we have

(3.25)    $P\Bigl(\Bigl|\sum_{i=1}^n\hat\xi_i^2/n-\bar{\hat\xi}^2-1\Bigr|>1/(\log n)^2\Bigr)\le P\Bigl(\Bigl|\sum_{i=1}^n\hat\xi_i^2/n-E\hat\xi^2\Bigr|>1/\bigl(2(\log n)^2\bigr)\Bigr)+n^{-M}$

for $n$ sufficiently large. For the probability term in (3.25), we have

$P\Bigl(\Bigl|\sum_{i=1}^n\hat\xi_i^2/n-E\hat\xi^2\Bigr|>1/\bigl(2(\log n)^2\bigr)\Bigr)\le\exp\Bigl(-\dfrac t{2(\log n)^2}\Bigr)\Bigl[E\exp\bigl(tn^{-1}(\hat\xi^2-E\hat\xi^2)\bigr)\Bigr]^n+\exp\Bigl(-\dfrac t{2(\log n)^2}\Bigr)\Bigl[E\exp\bigl(tn^{-1}(E\hat\xi^2-\hat\xi^2)\bigr)\Bigr]^n$

$\le2\exp\Bigl(-\dfrac t{2(\log n)^2}\Bigr)\Bigl[1+\tfrac12t^2n^{-2}E\hat\xi^4\exp\bigl(2t/(\log n)^4\bigr)\Bigr]^n\le n^{-M}\qquad(\text{let }t=2M\log^3n),$

where the second inequality follows from the Taylor expansion of the exponential function and the fact that $\hat\xi$ is bounded by $d_n$ ($\le\sqrt n/(\log n)^2$), and the last one follows from the fact that $1+x\le e^x$ for $x>0$. Hence we can complete the proof of (3.20).

Now let us prove (3.21). Recalling that $z=\hat\xi\hat\eta I(|\hat\xi\hat\eta|\le\varepsilon\sqrt n)$, we have

(3.26)    $z+\xi I(|\xi|>d_n)\,\hat\eta\,I\bigl(|\xi\hat\eta|\le\varepsilon\sqrt n\bigr)+\hat\xi\,\eta I(|\eta|>d_n)\,I\bigl(|\hat\xi\eta|\le\varepsilon\sqrt n\bigr)$

(3.27)    $=\xi\eta-\xi\eta I\bigl(|\xi\eta|>\varepsilon\sqrt n\bigr)-\xi\eta I_2-\xi\eta I_3-\xi\eta I_4,$

where

$I_2=I\bigl(|\xi\eta|\le\varepsilon\sqrt n,\ |\xi|>d_n,\ |\eta|\le d_n\bigr),$

$I_3=I\bigl(|\xi\eta|\le\varepsilon\sqrt n,\ |\xi|\le d_n,\ |\eta|>d_n\bigr)$

and

$I_4=I\bigl(|\xi\eta|\le\varepsilon\sqrt n,\ |\xi|>d_n,\ |\eta|>d_n\bigr).$

For the last two terms in (3.26), noting $E\xi^2=E\eta^2=1$, we have

(3.28)    $E\bigl[\xi I(|\xi|>d_n)\,\hat\eta\,I\bigl(|\xi\hat\eta|\le\varepsilon\sqrt n\bigr)\bigr]^2\le E\xi^2I(|\xi|>d_n)\le n^{-(4-\kappa)/3}E|\xi|^{6-\kappa},$


and

(3.29)    $E\bigl[\hat\xi\,\eta I(|\eta|>d_n)\,I\bigl(|\hat\xi\eta|\le\varepsilon\sqrt n\bigr)\bigr]^2\le n^{-(4-\kappa)/3}E|\eta|^{6-\kappa}.$

For the last three terms in (3.27), we have

(3.30)    $E\xi^2\eta^2I_2\le E\xi^2I(|\xi|>d_n)\,E\eta^2I(|\eta|\le d_n)\le n^{-(4-\kappa)/3}E|\xi|^{6-\kappa},$

and

(3.31)    $E\xi^2\eta^2I_3\le n^{-(4-\kappa)/3}E|\eta|^{6-\kappa},\qquad E\xi^2\eta^2I_4=0$

for $n$ sufficiently large. Combining (3.26)–(3.31) and making use of the elementary fact

$Ea^2+Eb^2-2\sqrt{Ea^2Eb^2}\le E(a+b)^2\le Ea^2+Eb^2+2\sqrt{Ea^2Eb^2},$

we have

(3.32)    $Ez^2=1+O(1/\sqrt n).$

But

$E(\hat z-E\hat z)^2=Ez^2-Ez^2I\bigl(|z|>\sqrt n/(\log n)^2\bigr)-(E\hat z)^2=1+O(1/\sqrt n)$

as $n\to\infty$, where the last equality comes from (3.14) and (3.32). Hence we can repeat the proof of (3.20) to get (3.21).

Finally we prove (3.22); the proofs of (3.23) and (3.24) are the same as that of (3.22). Noting (3.32) and the basic inequality $(Ea)^2\le Ea^2$, we have

$E(z+z'-Ez-Ez')^2=2+2Ezz'+O(1/\sqrt n).$

Let us turn to the calculation of $Ezz'$. Recalling the definitions of $z$ and $z'$, we have

$zz'+\hat\xi^2\hat\eta\hat\eta'I\bigl(|\hat\xi\hat\eta|>\varepsilon\sqrt n\bigr)I\bigl(|\hat\xi\hat\eta'|\le\varepsilon\sqrt n\bigr)+\hat\xi^2\hat\eta\hat\eta'I\bigl(|\hat\xi\hat\eta|\le\varepsilon\sqrt n\bigr)I\bigl(|\hat\xi\hat\eta'|>\varepsilon\sqrt n\bigr)+\hat\xi^2\hat\eta\hat\eta'I\bigl(|\hat\xi\hat\eta|>\varepsilon\sqrt n\bigr)I\bigl(|\hat\xi\hat\eta'|>\varepsilon\sqrt n\bigr)=\hat\xi^2\hat\eta\hat\eta',$

which implies $Ezz'=O(n^{-1/2})$ as $n\to\infty$. Hence

$E(z+z'-Ez-Ez')^2=2+O(1/\sqrt n).$

Write $\omega_i=(z_i+z_i'-Ez_i-Ez_i')^2$, $i=1,\dots,n$. We have $\omega_i\le16\varepsilon^2n$ and $E\omega_i=2+O(1/\sqrt n)$. Hence for $n$ large enough,

$P\Bigl(\Bigl|\sum_{i=1}^n\omega_i/n-2\Bigr|>\varepsilon_1\Bigr)\le P\Bigl(\Bigl|\sum_{i=1}^n(\omega_i-E\omega_i)/n\Bigr|>\varepsilon_1/2\Bigr)$

$\le\exp(-t\varepsilon_1/2)\bigl[E\exp\bigl(t(\omega_1-E\omega_1)/n\bigr)\bigr]^n+\exp(-t\varepsilon_1/2)\bigl[E\exp\bigl(t(E\omega_1-\omega_1)/n\bigr)\bigr]^n$

$\le2\exp(-t\varepsilon_1/2)\exp\bigl(\tfrac12t^2n^{-1}E\omega_1^2\exp(32\varepsilon^2t)\bigr)\le n^{-M}\qquad(\text{let }t=3M\log n/\varepsilon_1),$

where $\varepsilon$ satisfies

(3.33)    $96M\varepsilon^2/\varepsilon_1<1.\ \square$

The following lemma is a special case of Theorem 2.1 in Seoh, Ralescu and Puri [12].

Lemma 3.6. For Spearman's correlation coefficient (1.8), $1\le i<j\le p$,

$P\bigl(|r_{ij}|/\sqrt{\operatorname{Var}r_{ij}}>x\bigr)=2\bigl(1-\Phi(x)\bigr)\bigl(1+o(1)\bigr)$

as $n\to\infty$, where $\operatorname{Var}r_{ij}=1/(n-1)$.
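Lemma 3.6 can likewise be checked by simulation: for continuous data, $|r_{12}|/\sqrt{\operatorname{Var}r_{12}}=\sqrt{n-1}\,|r_{12}|$ should have approximately the standard normal tail $2(1-\Phi(x))$ at moderate $x$. A small hedged check (scipy's spearmanr assumed; the sample size, threshold and replication count are arbitrary choices of mine):

```python
import numpy as np
from scipy.stats import spearmanr, norm

# Illustrative sketch, not from the paper.
rng = np.random.default_rng(5)
n, reps, x = 100, 20000, 2.5
hits = 0
for _ in range(reps):
    r, _ = spearmanr(rng.random(n), rng.random(n))   # two independent continuous columns
    if abs(r) * np.sqrt(n - 1) > x:                  # |r_12| / sqrt(Var r_12), Var r_12 = 1/(n-1)
        hits += 1
print(hits / reps, 2 * (1 - norm.cdf(x)))            # empirical tail vs. 2(1 - Phi(x))
```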

4. Example

Now let us give an example explaining that if

(4.1)    $\lim_{x\to\infty}x^6P\bigl(|XX'|>x\bigr)>0,$

then (1.5) may not hold, or the limit could be different. So our weak moment condition (1.6) seems to be optimal for (1.5). In this section, the results are not rigorous.

Example 1. Let $X$ be a symmetric random variable such that

(4.2)    $P(X=0)=1-2c_0a^{-6}(\log a)^{-1/2},\qquad P(X>x)=c_0x^{-6}(\log x)^{-1/2},\quad x>a>1,$

where $c_0$ is determined by $EX^2=1$. The density function of the absolutely continuous part of $X$ is

$f(x)=6c_0|x|^{-7}(\log|x|)^{-1/2}+2^{-1}c_0|x|^{-7}(\log|x|)^{-3/2},\qquad |x|>a.$
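For what it is worth, the law (4.2) can be simulated by inverse transform on the tail: with probability $1-P(X=0)$ draw $|X|$ as the solution of $2c_0t^{-6}(\log t)^{-1/2}=u$ and attach a random sign. The sketch below is purely illustrative: the choice $a=e$, the numerical determination of $c_0$ from $EX^2=1$, and the bracketing constants in the root-finder are all my own assumptions.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq

# Illustrative sketch of sampling from (4.2); a = e is an ad hoc choice with P(X = 0) >= 0.
a = np.e

# c0 from the normalization E X^2 = 1, using E X^2 = 2 * int_a^inf x^2 f(x) dx.
integral, _ = quad(lambda x: 6 * x**-5 * np.log(x)**-0.5
                             + 0.5 * x**-5 * np.log(x)**-1.5, a, np.inf)
c0 = 1.0 / (2.0 * integral)
p_atom = 1.0 - 2.0 * c0 * a**-6 * np.log(a)**-0.5    # P(X = 0)

def tail(t):                                         # P(|X| > t) = 2 c0 t^-6 (log t)^-1/2, t > a
    return 2.0 * c0 * t**-6 * np.log(t)**-0.5

def sample(rng, size):
    u = np.maximum(rng.random(size), 1e-12)
    x = np.zeros(size)
    heavy = u < 1.0 - p_atom                         # with prob. 1 - P(X=0), draw the continuous part
    for i in np.flatnonzero(heavy):
        x[i] = brentq(lambda t: tail(t) - u[i], a, 1e12)   # invert the tail numerically
    return x * rng.choice([-1.0, 1.0], size=size)    # symmetric sign

rng = np.random.default_rng(6)
xs = sample(rng, 20000)
print(xs.var(), np.mean(xs == 0.0))                  # rough checks: E X^2 = 1 and the atom at 0
```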


Let $X',X_{ki}$, $1\le k\le n$, $1\le i\le p$, be i.i.d. random variables with the same distribution as $X$. Then for $x>a^2$,

$P(|XX'|>x)=4\int_a^{x/a}P(X>x/x_1)f(x_1)\,dx_1+4P(X>a)P(X'>x/a)$

$=24c_0^2x^{-6}\int_{(\log a)/\log x}^{1-(\log a)/\log x}y^{-1/2}(1-y)^{-1/2}\,dy+2c_0^2x^{-6}(\log x)^{-1}\int_{(\log a)/\log x}^{1-(\log a)/\log x}y^{-3/2}(1-y)^{-1/2}\,dy+4P(X>a)P(X'>x/a),$

which implies that

(4.3)    $\lim_{x\to\infty}x^6P(|XX'|>x)=24c_0^2\int_0^1y^{-1/2}(1-y)^{-1/2}\,dy.$

Note that the tail behavior in (4.3) is only slightly weaker than the moment condition (1.6). Choose $p=n$. We will use an indirect argument, not entirely rigorous, to show that for the random variables $X_{ki}$, $1\le k\le n$, $1\le i\le p$,

$P(nL_n^2\le t_p^2)\not\to\exp\bigl(-Ke^{-y/2}\bigr)$

as $n\to\infty$. Now we assume, for the random variables $X_{ki}$, $1\le k\le n$, $1\le i\le p$, that

(4.4)    $P(nL_n^2\le t_p^2)\to\exp\bigl(-Ke^{-y/2}\bigr)$

as $n\to\infty$. Based on the assumption (4.4), we may get a contradiction. Define

$\tilde X_{ki}=X_{ki}I\bigl(|X_{ki}|\le n^{1/3}\bigr),\qquad \tilde X_i=(\tilde X_{1i},\dots,\tilde X_{ni})^T,\qquad \bar{\tilde X}_i=\sum_{k=1}^n\tilde X_{ki}/n,$

and

$\tilde\rho_{ij}=\dfrac{(\tilde X_i-\bar{\tilde X}_i)^T(\tilde X_j-\bar{\tilde X}_j)}{\|\tilde X_i-\bar{\tilde X}_i\|\,\|\tilde X_j-\bar{\tilde X}_j\|},\qquad \tilde L_n=\max_{1\le i<j\le p}|\tilde\rho_{ij}|.$

Then

$P(L_n\ne\tilde L_n)\le P\bigl(\max_{1\le k\le n,\,1\le i\le p}|X_{ki}|>n^{1/3}\bigr)$

(4.5)    $\le np\,P\bigl(|X_{12}|>n^{1/3}\bigr)\to0$

as $n\to\infty$. Write

$\tilde L_{1n}=\max_{1\le i<j\le p}\dfrac{\bigl|(\tilde X_i-\bar{\tilde X}_i)^T(\tilde X_j-\bar{\tilde X}_j)\bigr|}{n-n/(\log n)^2}\qquad\text{and}\qquad \tilde L_{2n}=\max_{1\le i<j\le p}\dfrac{\bigl|(\tilde X_i-\bar{\tilde X}_i)^T(\tilde X_j-\bar{\tilde X}_j)\bigr|}{n+n/(\log n)^2}.$


By arguments similar to those leading to (2.2), we have

$P(n\tilde L_{1n}^2\le t_p^2)-P\Bigl(\max_{1\le i\le p}\bigl|\,\|\tilde X_i-\bar{\tilde X}_i\|^2-n\,\bigr|>n/(\log n)^2\Bigr)$

(4.6)    $\le P(n\tilde L_n^2\le t_p^2)\le P(n\tilde L_{2n}^2\le t_p^2)+P\Bigl(\max_{1\le i\le p}\bigl|\,\|\tilde X_i-\bar{\tilde X}_i\|^2-n\,\bigr|>n/(\log n)^2\Bigr).$

By (3.20) of Lemma 3.5,

(4.7)    $P\Bigl(\max_{1\le i\le p}\bigl|\,\|\tilde X_i-\bar{\tilde X}_i\|^2-n\,\bigr|>n/(\log n)^2\Bigr)\le pn^{-M}\to0$

as $n\to\infty$. Also note that $t_p^2\bigl(1\pm1/(\log n)^2\bigr)=t_p^2+o(1)$ as $n\to\infty$. Combining (4.5)–(4.7) and the assumption (4.4) gives

(4.8)    $P\bigl(n\tilde L_n^{*2}\le t_p^2\bigr)\to\exp\bigl(-Ke^{-y/2}\bigr)$

as $n\to\infty$, where

$\tilde L_n^{*}=\max_{1\le i<j\le p}\bigl|(\tilde X_i-\bar{\tilde X}_i)^T(\tilde X_j-\bar{\tilde X}_j)\bigr|/n.$

In analogy with (2.5), define $\tilde\xi_{kij}=\tilde X_{ki}\tilde X_{kj}I\bigl(|\tilde X_{ki}\tilde X_{kj}|\le\varepsilon\sqrt n\bigr)$ and

(4.9)    $\tilde y_{ij}=\dfrac{\sum_{k=1}^n\tilde\xi_{kij}-n\bar{\tilde X}_i\bar{\tilde X}_j}{\sqrt n},\qquad 1\le i<j\le p,$

and set

(4.10)    $\tilde W_n=\max_{1\le i<j\le p}|\tilde y_{ij}|.$

Applying Lemma 2.1 together with Lemmas 3.2–3.5 (with $d_n=n^{1/3}$) exactly as in the proof of (2.8), we have

(4.11)    $P\bigl(n\tilde W_n^2\le t_p^2\bigr)\to\exp\bigl(-Ke^{-y/2}\bigr)$

as $n\to\infty$. Noting (4.8), we have

(4.12)    $P\bigl(n\tilde L_n^{*2}\le t_p^2\bigr)-P\bigl(n\tilde W_n^2\le t_p^2\bigr)\to0$

as $n\to\infty$. Using the inequality $P\bigl(\bigcup_i\Theta_i\bigr)\ge\sum_iP(\Theta_i)-\sum_{i\ne j}P(\Theta_i\cap\Theta_j)$, we conjecture that

$\bigl|P\bigl(n\tilde L_n^{*2}\le t_p^2\bigr)-P\bigl(n\tilde W_n^2\le t_p^2\bigr)\bigr|$

(4.13)    $\ge2^{-1}np(p-1)P(L_{112})$

(4.14)    $\quad-np^3P(L_{112}\cap L_{113})-\bigl(np(p-1)\bigr)^2P(L_{112}\cap L_{212})$

(4.15)    $\quad-P\bigl(n\tilde L_n^{*2}>t_p^2\bigr),$

where $L_{kij}=\bigl(|\tilde X_{ki}\tilde X_{kj}|>A_0\sqrt n\bigr)$, and $A_0$ is a constant specified in (4.20). For (4.13),

$np(p-1)P(L_{112})=np(p-1)P\bigl(|X_{11}X_{12}|>A_0\sqrt n,\ |X_{11}|\le n^{1/3},\ |X_{12}|\le n^{1/3}\bigr)=4np(p-1)\int_{A_0n^{1/6}}^{n^{1/3}}P\bigl(A_0\sqrt n/x<X_{12}\le n^{1/3}\bigr)f(x)\,dx$

(4.16)    $\to24c_0^2A_0^{-6}c$

with $c=\int_{1/3}^{2/3}y^{-1/2}(1-y)^{-1/2}\,dy$, as $n\to\infty$. For the first term in (4.14),

$np^3P(L_{112}\cap L_{113})=np^3P\bigl(|X_{11}X_{12}|>A_0\sqrt n,\ |X_{11}X_{13}|>A_0\sqrt n,\ |X_{11}|\le n^{1/3},\ |X_{12}|\le n^{1/3},\ |X_{13}|\le n^{1/3}\bigr)=8np^3\int_{A_0n^{1/6}}^{n^{1/3}}P^2\bigl(A_0\sqrt n/x<X_{12}\le n^{1/3}\bigr)f(x)\,dx.$

Since

$P^2\bigl(A_0\sqrt n/x<X_{12}\le n^{1/3}\bigr)\le2c_0^2(A_0\sqrt n/x)^{-12}\bigl(\log(A_0\sqrt n/x)\bigr)^{-1}+2c_0^2n^{-4}\bigl(\log n^{1/3}\bigr)^{-1},$

we may bound $8np^3\int_{A_0n^{1/6}}^{n^{1/3}}P^2(\cdot)f(x)\,dx$ by $I_1+I_2$, the contributions of the two terms on the right-hand side, respectively.

For $I_1$,

$I_1=16np^3c_0^3(A_0\sqrt n)^{-12}\int_{A_0n^{1/6}}^{n^{1/3}}\bigl(\log(A_0\sqrt n/x)\bigr)^{-1}x^5\bigl(6(\log x)^{-1/2}+2^{-1}(\log x)^{-3/2}\bigr)\,dx$

$=16np^3c_0^3(A_0\sqrt n)^{-12}\int_{\log(A_0n^{1/6})/\log(A_0\sqrt n)}^{\log(n^{1/3})/\log(A_0\sqrt n)}(1-y)^{-1}\exp\bigl(6y\log(A_0\sqrt n)\bigr)\bigl(6y^{-1/2}\log^{-1/2}(A_0\sqrt n)+2^{-1}y^{-3/2}\log^{-3/2}(A_0\sqrt n)\bigr)\,dy=o(1)$

as $n\to\infty$. For $I_2$, we also have $I_2=o(1)$ as $n\to\infty$. Hence

(4.17)    $np^3P(L_{112}\cap L_{113})=o(1)$

as $n\to\infty$. For the second term in (4.14),

(4.18)    $\bigl(np(p-1)\bigr)^2P(L_{112}\cap L_{212})\to\bigl(24c_0^2A_0^{-6}c\bigr)^2$

as $n\to\infty$.


Hence, combining (4.12)–(4.18) and our assumption (4.4), we have

(4.19)    $0=\lim_{n\to\infty}\bigl[P\bigl(n\tilde L_n^{*2}\le t_p^2\bigr)-P\bigl(n\tilde W_n^2\le t_p^2\bigr)\bigr]\ge12c_0^2A_0^{-6}c-\bigl(24c_0^2A_0^{-6}c\bigr)^2-\bigl(1-\exp(-Ke^{-y/2})\bigr).$

Let

(4.20)    $A_0=\max\bigl\{(48c_0^2c)^{1/6},\ \varepsilon\bigr\}.$

Select $y$ so that

$12c_0^2A_0^{-6}c-\bigl(24c_0^2A_0^{-6}c\bigr)^2>1-\exp(-Ke^{-y/2}).$

Then (4.19) yields $0>0$, a contradiction if we have (4.13)–(4.15). Hence for the random variable in (4.2) we may not have (4.4).

Acknowledgments

The author would like to thank the referee for his/her valuable suggestions and remarks.

References

[1] Arratia, R., Goldstein, L. and Gordon, L. (1989). Two moments suffice for Poisson approximation: the Chen-Stein method. Ann. Probab. 17, 9-25. MR972770
[2] Barbour, A. and Eagleson, G. (1984). Poisson convergence for dissociated statistics. J. R. Statist. Soc. B 46, 397-402. MR790624
[3] Fisher, R. A. (1915). Frequency distribution of the values of the correlation coefficient in samples from an indefinitely large population. Biometrika 10, 507-521.
[4] Hájek, J., Šidák, Z. and Sen, P. K. (1999). Theory of Rank Tests. Academic Press, New York.
[5] Hollander, M. and Wolfe, D. A. (1999). Nonparametric Statistical Methods. Wiley, New York. MR1666064
[6] Hotelling, H. (1953). New light on the correlation coefficient and its transforms (with discussion). J. Roy. Statist. Soc. Ser. B 15, 193-232. MR0060794
[7] Jiang, T. (2004). The asymptotic distributions of the largest entries of sample correlation matrices. Ann. Appl. Probab. 14, 865-880. MR2052906
[8] Jing, B.-Y., Shao, Q.-M. and Wang, Q. (2003). Self-normalized Cramér-type large deviations for independent random variables. Ann. Probab. 31, 2167-2215. MR2016616
[9] Johnstone, I. (2001). On the distribution of the largest eigenvalue in principal component analysis. Ann. Statist. 29, 295-327. MR1863961
[10] Shao, Q.-M. (1997). Self-normalized large deviations. Ann. Probab. 25, 285-328. MR1428510
[11] Shao, Q.-M. (1999). A Cramér type large deviation result for Student's t-statistic. J. Theoret. Probab. 12, 385-398. MR1684750
[12] Seoh, M., Ralescu, S. and Puri, M. L. (1985). Cramér type large deviations for generalized rank statistics. Ann. Probab. 13, 115-125. MR770632
[13] Spearman, C. (1904). The proof and measurement of association between two things. Amer. J. Psychol. 15, 72-101.
[14] Wang, Q. and Jing, B.-Y. (1999). An exponential non-uniform Berry-Esseen bound for self-normalized sums. Ann. Probab. 27, 2068-2088. MR1742902

Department of Statistics and Applied Probability, National University of Singapore, Singapore 117546
E-mail address: [email protected]
