Determinant of Sample Correlation with Application

Tiefeng Jiang1 University of Minnesota

Abstract

Let x1, ··· , xn be independent random vectors of a common p-dimensional normal distribu-

tion with population correlation matrix Rn. The sample correlation matrix Rˆ n = (ˆrij )p×p

is generated from x1, ··· , xn such thatr ˆij is the Pearson correlation coefficient between the 0 i-th column and the j-th column of the data matrix (x1, ··· , xn) . The matrix Rˆ n is a popular object in the Multivariate Analysis and it has many connections to other problems.

We derive a central limit theorem (CLT) for the logarithm of the of Rˆ n for a

big class of Rn. The expressions of mean and the variance in the CLT are not obvious, and they are not known before. In particular, the CLT holds if p/n has a non-zero limit and

the smallest eigenvalue of Rn is larger than 1/2. Besides, a formula of the moments of |Rˆ n| and a new method of showing weak convergence are introduced. We apply the CLT to a high-dimensional statistical test.

Keywords: Central limit theorem, sample correlation matrix, smallest eigenvalue, multivariate normal distribution, moment generating . AMS 2000 Subject Classification: 60B20 and 60F05.

1School of Statistics, University of Minnesota, 224 Church Street, S. E., MN55455, USA, [email protected]. The research was supported in part by NSF Grant DMS-1209166 and DMS-1406279.

1 1 Introduction

We first give a background of the sample correlation matrix, then state our main result. Two new tools are introduced and the method of the proof is elaborated by using them. At last an application is presented.

1.1 Main Results

Let x1, ··· , xn be a sequence of independent random vectors from a common distribution Np(µ, Σ), that is, a p-dimensional normal distribution with mean vector µ and covariance matrix Σ = (σij)p×p. We always assume σii > 0 for each i to avoid trivial cases. In order to take limit in n, the dimension p is assumed to depend on n and is written by pn. Sometimes we write p for pn and Σ for Σn to ease notation. The corresponding correlation matrix Rn = (rij)p×p is defined by

σij rii = 1 and rij = √ (1.1) σiiσjj

0 for all 1 ≤ i 6= j ≤ p. Write X = (xij)n×p = (x1, ··· , xn) . Letr ˆij denote the Pearson correlation 0 0 coefficient between (x1i, ··· , xni) and (x1j, ··· , xnj) , given by Pn k=1(xki − x¯i)(xkj − x¯j) rˆij = (1.2) pPn 2 Pn 2 k=1(xki − x¯i) · k=1(xkj − x¯j) 1 Pn 1 Pn wherex ¯i = n k=1 xki andx ¯j = n k=1 xkj. Then the sample correlation matrix obtained from the random sample x1, ··· , xn is defined by

Rˆ n := (ˆrij)p×p. (1.3)

A natural requirement for non-singularity of Rˆ n is n > p. In this paper we will prove that log |Rˆ n|, the logarithm of the determinant of Rˆ n, satisfies the central limit theorem (CLT) under suitable assumption on Σ. Our theorem holds for a big class of Rn containing I, in particular, for Rn generated by x1 as in (1.1) such that the p entries of x1 are far from independent. In the next, we will review the literature on Rˆ n and narrate our motivation to investigate log |Rˆ n|. The sample correlation matrix Rˆ n is a popular target in Multivariate Analysis, a subject in Statistics; see, e.g., the classical books by Anderson (1958), Muirhead (1982) and Eaton (1983).

Particularly, |Rˆ n| is the likelihood ratio test statistic for testing that the p entries of X1 are independent, but not necessarily identically distributed. If Rn = I, some are understood about Rˆ n. For example, the density of |Rˆ n| is given by

(n−p−2)/2 Constant · |Rn| dRn; see, e.g., Theorem 5.1.3 in Muirhead (1982). Unfortunately, the density of the eigenvalues of

Rˆ n is not known because Rˆ n does not have the orthogonal invariance that standard random matrices (e.g., the Hermite ensemble, the Laguerre ensemble and the Jacobi ensemble) usually have. As a consequence, unlike typical random matrices, the research on Rˆ n can not rely on the density of its eigenvalues. This makes any efforts more involved.

On the other hand, the largest entries of Rˆ n is related to other statistical testing problems, random packing on spheres in Rp and the seventh most challenging problem in the 21st century

2 by Smale [Jiang (2004a) and Cai et al. (2013)]. For example, Smale (2000) asks a packing of n points on the unit sphere such that the product of all pairwise distances among the n points almost attains its largest value. There are also investigations on the least moment condition to guarantee the limit law of the largest element of Rˆ n; see, for example, Li and Rosalsky (2006), Zhou (2007) and Li el al. (2010).

Again, under Rn = I, some of the behaviors of the eigenvalues of Rˆ n are known. The empirical distribution of the eigenvalues of Rˆ n satisfies the Mar˘cenko-Pastur law (Jiang, 2004b); the largest eigenvalue of Rˆ n asymptotically satisfies the Tracy-Widom law [Bao et al. (2012)]; the quantity log |Rˆ n| satisfies the CLT [Jiang and Yang (2013) and Jiang and Qi (2015)]. To our knowledge, little is understood on Rˆ n as Rn 6= I. In particular, there is no CLT for log |Rˆ n| as Rn 6= I. Our motivations in this paper to study the CLT of log |Rˆ n| for Σ 6= I are as follows. First, we plan to understand its behavior in general. Second, there is some recent interest to study the of various random matrices. For instance, Tao and Vu (2012) and Nguyen and Vu (2014) derive the CLT for the determinants of Wigner matrices; Cai et al. (2015) derive the

CLT for sample covariance matrices. So it is natural to look into |Rˆ n|. Lastly, when we study the power for the hypothesis test H0: the entries of X1 are completely independent, we need to derive the asymptotic distribution of log |Rˆ n| under Rn 6= I. This problem occurs in the High-dimensional Statistics, which together with Machine Learning forms the most important tools to study Big Data. We will further elaborate the test in Section 1.3. Before introducing our main results, we need some notation. For a square matrix M = P 2 1/2 (mij)p×p, set kMk∞ = max1≤i,j≤p |mij| and kMk2 = ( i,j |mij| ) . We denote by kMk the spectral norm of M, equivalently, the largest singular value of M. The notation |M| stands for the determinant of M. If M is symmetric, its smallest eigenvalue is denoted by λmin(M). In particular, if Rn is positive definite, then λmin(Rn) ∈ (0, 1] due to the fact rii = 1 (see also Lemma 2.3). Keep in mind that Rn is non-random and Rˆ n is a random matrix constructed 0 from data. We adopt the convention 0 = 1. In this paper, we derive the following CLT for the sample correlation matrix.

THEOREM 1 Assume p := pn satisfy that n > p + 4 and p → ∞. Let x1, ··· , xn be i.i.d. from ˆ 1 Np(µ, Σ) and Rn be as in (1.3). Assume infn≥6 λmin(Rn) > 2 . Set 3  p  n − 2 µ = p − n +  log 1 − − p + log |R | ; n 2 n − 1 n − 1 n h p  p i 2 σ2 = −2 + log 1 − + tr [(R − I)2]. n n − 1 n − 1 n − 1 n

Then, (log |Rˆ n| − µn)/σn converges weakly to N(0, 1) as n → ∞ provided one of the following pn 1 2 pnkRn−Ik∞ holds: (i) infn≥6 > 0; (ii) infn≥6 tr[(Rn − I) ] > 0; (iii) sup < ∞. n n n≥6 kRn−Ik2

2 It is not intuitive to see or guess why µn and σn have the expressions in Theorem 1. They come purely from computation. pn 2 Condition (i) is particularly valid if n → c ∈ (0, 1]. Reading σn, conditions (ii) says the entries of Rn and/or pn are large enough. Naturally, kRn − Ik2 ≤ pkRn − Ik∞, so condition (iii) is equivalent to that kRn − Ik2 ≤ pkRn − Ik∞ ≤ KkRn − Ik2 for all n ≥ 6, where K is a constant not depending on n. This condition says that almost all of the entries of Rn − I are at the same magnitude.

3 1 1 The condition infn≥6 λmin(Rn) > 2 in Theorem 1 is used in Lemma 2.16. The number “ 2 ” 1 2 − 2 x 1 comes essentially from the Gaussian density const · e . Literally, it is related to “ 2 m” in (1.5). It will be interesting to see if the limiting distribution is still the normal distribution when 1 λmin(Rn) ≤ 2 . Take Rn = I in Theorem 1, then pnkRn − Ik∞ = kRn − Ik2 = 0. So (iii) above holds and 2 log |Rn| = tr [(Rn − I) ] = 0, the conclusion becomes Corollary 3 from Jiang and Qi (2015). Now let us look at a case where Rn is of “compound symmetry structure”, that is, all of the off- diagonal entries of Rn are equal to a ∈ [0, 1/2), then Rn has eigenvalues (1−a)+ap, 1−a, ··· , 1− 1 p a. Hence λmin(Rn) = 1 − a > 2 . Obviously, kRn − Ik∞ = a and kRn − Ik2 = a p(p − 1) for all n ≥ 6. So condition (iii) from Theorem 1 holds. We then have the following conclusion.

COROLLARY 1 Assume p := pn satisfy that n > p + 4 and p → ∞. Let x1, ··· , xn be i.i.d. from Np(µ, Σ) and Rˆ n be as in (1.3). Assume the off-diagonal entries of Rn are all equal to ˆ an ≥ 0 for each n and satisfy supn≥4 an < 1/2. Then, (log |Rn| − µn)/σn goes to N(0, 1) weakly as n → ∞, where µn and σn are as in Theorem 1 with

p−1 2 2 |Rn| = (1 + an(p − 1))(1 − an) and tr [(Rn − I) ] = p(p − 1)an.

Now we apply Theorem 1 to the correlation matrix from AR(1) and a banded correlation ma- trix. The notation AR(1) stands for the autoregressive process of order one [see, e.g., Brockwell and Davis (2002)]. Both are very popular models in Statistics.

pn COROLLARY 2 Assuming the set-up in Theorem 1 with infn≥6 n > 0. The following is true. 1 |i−j| (i) Let ρ be a constant with |ρ| < 5 . If Rn = (rij)p×p with rij = ρ for all 1 ≤ i, j ≤ p, then the CLT in Theorem 1 holds.

(ii) Let k ≥ 1 be a constant integer. Suppose Rn = (rij)p×p satisfies rij = 0 for |j − i| > k and 1 supn≥6 maxi6=j |rij| < 4k . Then, the CLT in Theorem 1 holds.

Corollary 2 will be checked at the end of Section 2.6. Besides the three special matrices of Rn studied in the above, there are a lot of other patterned matrices including the Toeplitz matrices, the Hankel matrices and the symmetric circulant matrices; see, e.g., Brockwell and Davis (2002).

One needs to check if the smallest eigenvalue of Rn is larger than 1/2. If so, the CLT holds. The proof of Theorem 1 is relatively lengthy. It needs some new tools. The tools and the method of the proof are introduced Section 1.2. In addition, we obtain some interesting matrix inequalities as byproducts. They are stated in Section 2.2.

1.2 New Tools and Strategy of Proofs

Under Σ = Ip, Jiang and Yang (2013) and Jiang and Qi (2015) prove that Ln := (log |Rˆ n| − µn)/σn converges weakly to N(0, 1); see Theorem 1 by taking Σ = Ip. The argument is showing 2 tLn t /2 limn→∞ Ee = e for t ∈ (−t0, t0). The proof may fail if the limit holds only for t in an half t interval [0, t0] or [−t0, 0]. Of course, the exact expression of E[|Rˆ n| ] is crucial. So we prefer to s get a closed form of E[|Rˆ n| ] as Σ 6= Ip. To see what we are able to obtain, define

p Y  1  Γ (z) := πp(p−1)/4 Γ z − (i − 1) (1.4) p 2 i=1

4 1 for z with Re(z) > 2 (p − 1); see, e.g., p. 62 from Muirhead (1982). Through- out the paper, the notation Beta(a, b) stands for the beta distribution with probability density Γ(a+b) a−1 b−1 Γ(a)Γ(b) x (1 − x) for 0 ≤ x ≤ 1, where a > 0 and b > 0 are two parameters.

PROPOSITION 1.1 Let x1, ··· , xn be i.i.d. with distribution Np(µ, Σ) and n = m + 1 > p. Recall Rn and Rˆ n are as in (1.1) and (1.3), respectively. Set ∆n = Rn − I. Then

 Γ( m ) p Γ ( m + t) ˆ t 2 p 2 E[|Rn| ] = m · m Γ( 2 + t) Γp( 2 ) t  −(m/2)−t ·|Rn| · E |I + ∆n · diag(V1, ··· ,Vp)| (1.5)

m for each t > 0, where V1, ··· ,Vp are i.i.d. with Beta(t, 2 )-distribution.

If ∆n = 0, or equivalently, Rn = I, Proposition 1.1 implies

 Γ( m ) p Γ ( m + t) ˆ t 2 p 2 E[|Rn| ] = m · m (1.6) Γ( 2 + t) Γp( 2 ) for all t > 0; see p. 150 from Muirhead (1982) or p. 492 from Wilks (1932). Jiang and Yang (2013) and Jiang and Qi (2015) further generalize this identity. Their results show that the formula is also true for a big range of negative value of t by using the Carlson uniqueness theorem of complex functions. We feel that Proposition 1.1 deserves a further understanding for the case t < 0.

In the special case that the off-diagonal entries of Rn are all equal to a as in Corollary 1, it is seen from Lemma 2.4 later that p p h Y i  X aVi  |I + ∆ · diag(V , ··· ,V )| = (1 − aV ) · 1 + (1.7) n 1 p i 1 − aV i=1 i=1 i for all a ∈ [0, 1). For a general form of ∆n, it seems there is not a better expression. Even worse, m the right hand of (1.5) involves the probability distribution Beta(t, 2 ), which forces t > 0. We may consider a complex continuation by representing the expectation in terms of integrals in a similar way to the Riemann’s zeta function or the Gamma function. However, we do see an advantage on the right hand side of (1.5): the expectation is taken over a function of i.i.d. random variables. To make use of this fact while considering the restriction “t > 0”, we develop a new tool. We say the distribution of a random ξ is uniquely determined by its moments {E(ξp); p = 1, 2, ···} if the following is true: for any random variable η with E(ξp) = E(ηp) for all p = 1, 2, ··· , the probability distributions of ξ and η are identical.

PROPOSITION 1.2 Let {Xn; n = 0, 1, ···} be random variables and δ > 0 be a constant such tXn p that Ee < ∞ for all n ≥ 0 and t ∈ [0, δ]. Assume supn≥0 E(|Xn| ) < ∞ for each p ≥ 1. tXn tX0 Assume limn→∞ Ee = Ee for all t ∈ [0, δ]. If the distribution of X0 can be determined p uniquely by moments {E(X0 ); p = 1, 2, ···}, then Xn converges weakly to X0 as n → ∞.

A classical result says that the above lemma holds if “t ∈ [0, δ]” is replaced by a stronger as- sumption “|t| ≤ δ”; see, e.g., Billingsley (1986). It is interesting to see that the weak convergence in Proposition 1.2 is still valid under a “one-sided” condition on moment generating functions if some extra requirements are fulfilled.

5 With the above two conclusions, we now are ready to state the method of the proof of

Theorem 1. By Proposition 1.2, it suffices to show, there exists s0 > 0 such that

ˆ 2 log |Rn| − µn  t s log E exp s = log E[|Rˆ n| ] − µnt → (1.8) σn 2

s for all s ∈ (0, s0), where t = , and σn

hlog |Rˆ | − µ 2ki sup E n n < ∞ (1.9) n≥6 σn for each integer k ≥ 1. To get (1.8), by Proposition 1.1, we need to work on

m p m  Γ( 2 )  Γp( 2 + t)  −(m/2)−t m · m and In := E |I + ∆n · diag(V1, ··· ,Vp)| . (1.10) Γ( 2 + t) Γp( 2 ) The first one is understood well enough by Jiang and Qi (2015), so it suffices to derive an asymp- totic formula for In However, it seems hard to evaluate In directly. This can be convinced easily − m −t by the explicit case from (1.7). To compute In, by setting Qn := |I+∆n ·diag(V1, ··· ,Vp)| 2 , we will show

−hn e Qn converges in probability to 1; (1.11)

−hn {e Qn; n ≥ 6} is uniformly integrable, (1.12)

hn where hn is an explicit constant depending on t. The two assertions imply EQn ∼ e . Then (1.8) follows. The statements in (1.9), (1.11) and (1.12) are proved in Sections 2.3, 2.4 and 2.5, respectively. A detailed account of the above, which forms the proof of Theorem 1, is presented in Section 2.6.

1.3 An Application to a High-dimensional Likelihood Ratio Test

Recall x1,..., xn are i.i.d. random observations with a common p-variate normal distribution Np(µ, Σ). The according correlation matrix is given in (1.1). In application, there is only one correlation matrix associated with Σ. So we will drop the subscript to write it as R instead of

Rn in this subsection. We test that the p components of x1 are independent, which is equivalent to

H0 : R = Ip vs Ha : R 6= Ip. (1.13)

When p is fixed, the chi-square approximation holds under H0:  2p + 5 − n − 1 − log |Rˆ | converges to χ2 (1.14) 6 n p(p−1)/2 in distribution as n → ∞; see, e.g., Bartlett (1954) or p. 40 from Morrison (2004). According to the latter literature, the rejection region of the likelihood ratio test for (1.13) is

|Rˆ n| ≤ cα where cα is determined so that the test has significance level of α. For many modern data, the population dimension p is very large relative to the sample size n. The approximation (1.14) is

6 far from accurate. To correct this, Jiang and Yang (2013) and Jiang and Qi (2015) prove the

CLT as in Theorem 1 for the case Σ = Ip. For a hypothesis testing problem, we not only need to know the asymptotic distribution of the test statistic under H0 to make a decision, we also need to carry out another procedure, that is, to minimize (type II) error, or equivalently, make the so-called power as large as possible. So we have to develop the CLT for the case Ha : R 6= Ip. Theorem 1 provides us with the exact result. Define 3 p n − 2 µ = (p − n + ) log(1 − ) − p ; n,0 2 n − 1 n − 1 h p  p i σ2 = −2 + log 1 − . n,0 n − 1 n − 1

According to Theorem 1 at R = Ip, the asymptotic size-α test is given by R = {log |Rˆ n| ≤ cα} −1 −1/2 R x −t2/2 with cα = µn,0 + σn,0Φ (α), where Φ(x) = (2π) −∞ e dt. By Theorem 1 again, the power function for the test is

cα − µn  β(R) = P (log |Rˆ n| ≤ cα|R) ∼ Φ σn for all correlation matrix R satisfying the conditions in Theorem 1.

2 Proofs

We will prove the main conclusions, that is, Theorem 1, Propositions 1.1 and 1.2 in this section. The two propositions are almost self-contained results and they will serve as tools to prove Theorem 1. We will work on them first. The proof of Theorem 1 will follow the scheme described in Section 1.2. Since its proof is relatively lengthy, we make a list of sections next to explain what will be done in each part. Section 2.1: Proofs of Major Tools: Propositions 1.1 and 1.2. Section 2.2: Auxiliary Results on Matrices and Gamma Functions. Section 2.3: Moments on Logarithms of Determinants of Sample Correlation Matrices. The inequality (1.9) will be proved. Section 2.4: Convergence of Random Determinants. The assertion (1.11) will be verified. Section 2.5: Uniform Integrability of Random Determinants. The statement (1.12) will be justified. Section 2.6: Proofs of Theorem 1 and Corollary 2.

2.1 Proofs of Major Tools: Propositions 1.1 and 1.2

p Suppose ξ1, ··· , ξm are i.i.d. R -valued random vectors with distribution Np(0, Σ), where 0 ∈ Rp and Σ is a p × p positive definite matrix. We then say the p × p matrix W = 0 (ξ1, ··· , ξm)(ξ1, ··· , ξm) follows the Wishart distribution and denote it by W ∼ Wp(m, Σ). d For two r × s random matrices M1 and M2, we use notation M1 = M2 to represent that the two sets of rs random variables in order have the same joint distribution.

7 LEMMA 2.1 Let x1, ··· , xn be i.i.d. with distribution Np(µ, Σ) and n = m + 1 > p. Let W = (Wij) follow the Wishart distribution Wp(m, Σ). Review Rˆ n in (1.3). Then

ˆ d  Wij  Rn = √ p . (2.1) Wii · Wjj p×p

In particular, Rˆ n is a positive definite matrix with 0 < |Rˆ n| ≤ 1.

0 Proof of Lemma 2.1. Write X = (x1, ··· , xn) = (xij)n×p = (y1, ··· , yp). Define e = (1, ··· , 1)0 ∈ Rn. Review the definition of the Pearson correlation coefficient in (1.2), it is easily 0 0 seen that the correlation between yi = (x1i, ··· , xni) and yj = (x1j, ··· , xnj) is the same as that between yi + ae and yj + be for any a ∈ R and b ∈ R. Thus, without loss of generality, we can assume µ = 0. 1 0 ˆ Set M = In − n ee . Recall Rn = (ˆrij)p×p. From (1.2) and (1.3), it is trivial to check that

0 (Myi) (Myj) rˆij = (2.2) kMyik · kMyjk for all 1 ≤ i, j ≤ p. Let x˜1, ··· , x˜n be i.i.d. random vectors with distribution Np(0, Ip). Then,

0 (My1, ··· , Myp) = M(x1, ··· , xn) d 1/2 1/2 0 = M(Σ x˜1, ··· , Σ x˜n) 0 1/2 = M(x˜1, ··· , x˜n) Σ , (2.3) where Σ1/2 is a positive definite matrix satisfying Σ1/2 · Σ1/2 = Σ. Since M2 = M and tr(M) = 0 n − 1 = m, there is an n × n Γ such that M = Γ diag(Im, 0)Γ . Hence, by the invariance property of Np(0, Ip),

0 d 0 M(x˜1, ··· , x˜n) = Γ diag(Im, 0)(x˜1, ··· , x˜n) ! U = Γ = Γ U 0 1

0 p where 0 = (0, ··· , 0) ∈ R , all of the entries of U = Um×p are i.i.d. standard normals and Γ1 is the n × m matrix by deleting the last column of Γ. This together with (2.3) implies that

d 1/2 (My1, ··· , Myp) = Γ1UΣ .

0 Use the fact Γ1Γ1 = Im to have

0 (Wij)p×p : = (My1, ··· , Myp) (My1, ··· , Myp) =d (Σ1/2U0)(Σ1/2U0)0.

0 2 In particular, Wij = (Myi) (Myj) and Wii = kMyik for all i, j. Evidently, the m columns of 1/2 0 the p × m matrix Σ U are i.i.d. random variables with distribution Np(0, Σ). The conclusion (2.1) then follows from (2.2) and the definition of a Wishart matrix. ˆ −1/2 −1/2 Finally, write Rn = QWQ where Q := diag (W11 , ··· ,Wpp ). For m ≥ p, the Wishart matrix W is positive definite. Thus, Rˆ n is positive definite. Since all of the diagonal entries of ˆ ˆ Qp ˆ Rn are equal to 1, we see |Rn| ≤ i=1(Rn)ii = 1 by the Hadamard inequality. 

8 We now use Lemma 2.1 to show Proposition 1.1. Proof of Proposition 1.1. Let us continue to use the notation in the proof of Lemma 2.1. By Theorem 3.2.1 from Muirhead (1982), the density function of W is given by

− 1 tr(Σ−1W ) (m−p−1)/2 Cm,p · e 2 · |W | (2.4)

1 for all positive definite matrix W , where Γp( 2 m) is the multivariate Gamma function defined at (1.4) and 1 C = . m,p mp/2 1 m/2 2 Γp( 2 m)|Σ| ˆ Qp −1 By Lemma 2.1, |Rn| has the same distribution as that of |W| · ( i=1 Wii) . Thus, t Z |W | 1 −1 ˆ t − 2 tr(Σ W ) (m−p−1)/2 E[|Rn| ] = Cm,p Qp t e · |W | dW W >0 ( i=1 Wii) t+(m−p−1)/2 Z |W | 1 −1 − 2 tr(Σ W ) = Cm,p Qp t e dW. W >0 ( i=1 Wii) Write ∞ 1 Z 1 −t − 2 Wiiyi t−1 Wii = t e yi dyi 2 Γ(t) 0 for all i (this is the step we have to assume t > 0). It follows that

t E[|Rˆ n| ] Z p Z Y 1 −1 1 Pp 0 t−1 t+(m−p−1)/2 − 2 tr(Σ W ) − 2 i=1 Wiiyi = Cm,p yi dy1 ··· dyp |W | e · e dW p (R+) i=1 W >0 where R+ = [0, ∞) and 1 C0 = C · . m,p m,p 2ptΓ(t)p Pp Write Y = diag(y1, ··· , yp) and i=1 Wiiyi = tr(YW ). Then

t E[|Rˆ n| ] Z p Z Y 1 −1 0 t−1 t+(m−p−1)/2 − 2 tr((Σ +Y )W ) = Cm,p yi dy1 ··· dyp |W | e dW p (R+) i=1 W >0 Z p 0 h m pt+(mp/2)i −1 −((m/2)+t) Y t−1 = Cm,p Γp(t + )2 |Σ + Y | yi dy1 ··· dyp (2.5) 2 p (R+) i=1

m 1 by Theorem 2.1.11 from Muirhead (1982) (taking a = t + 2 ) provided t > − 2 (m + 1 − p). From (2.4) the constant in front of the integral from (2.5) becomes 1 1 m 1 Γ ( m + t) · · Γ (t + )2pt+(mp/2) = · p 2 . mp/2 1 m/2 pt p p m/2 p 1 2 Γp( 2 m)|Σ| 2 Γ(t) 2 |Σ| Γ(t) Γp( 2 m) Write |Σ−1 + Y| = |Σ|−1|I + ΣY|. Then

t Γ ( m + t) Z p ˆ t |Σ| p 2 −(m/2)−t Y t−1 E[|Rn| ] = p · m |I + ΣY | yi dy1 ··· dyp (2.6) Γ(t) Γ ( ) p p 2 [0,∞) i=1

9 for any t > 0.

We next will transfer (2.6) to the right hand side of (1.5). Recall (1.1) and write Σ = (σij). 1/2 1/2 Then Σ = LRnL where L := diag (σ11 , ··· , σpp ). Noticing L and Y are both diagonal matrices. Plugging this into (2.6), we see that t |R |t Qp σ  Γ ( m + t) Z p ˆ t n i=1 ii p 2 2 −(m/2)−t Y t−1 E[|Rn| ] = p · m |I + RnL Y | yi dy1 ··· dyp. Γ(t) Γ ( ) p p 2 [0,∞) i=1 2 Obviously, L Y = diag (σ11y1, ··· , σppyp). Set si = σiiyi for all i and S = diag (s1, ··· , sp). We obtain t Γ ( m + t) Z p ˆ t |Rn| p 2 −(m/2)−t Y t−1 E[|Rn| ] = p · m |I + RnS| si ds1 ··· dsp Γ(t) Γ ( ) p p 2 [0,∞) i=1 m p m  Γ( 2 )  Γp( 2 + t) = m · m Γ( 2 + t) Γp( 2 ) p m p Z t  Γ( 2 + t)  −(m/2)−t Y t−1 ·|Rn| · m |I + RnS| si ds1 ··· dsp. Γ(t)Γ( ) p 2 [0,∞) i=1 −1 Write ∆n = Rn − I. Then |I + RnS| = |I + S + ∆nS| = |I + S| · |I + ∆nS(I + S) |. Thus, the last integral equals Z p −1 −(m/2)−t Y −(m/2)−t t−1 |I + ∆nS(I + S) | (1 + si) si ds1 ··· dsp. p [0,∞) i=1

1 xi Set xi = si/(1 + si) for 1 ≤ i ≤ p. Then 1 + si = and si = . The integral above is 1−xi 1−xi equal to Z p Y m −(m/2)−t t−1 2 −1 |I + ∆n · diag(x1, ··· , xp)| xi (1 − xi) dx1 ··· dxp. p [0,1] i=1 This says that  Γ( m ) p Γ ( m + t) ˆ t 2 p 2 E[|Rn| ] = m · m Γ( 2 + t) Γp( 2 ) t −(m/2)−t ·|Rn| · E|I + ∆n · diag(V1, ··· ,Vp)| m where V1, ··· ,Vp are i.i.d. with Beta(t, 2 )-distribution.  Proof of Proposition 1.2. We first claim that

p tXn sup E{|Xn| e } < ∞ (2.7) n≥0 for every integer p ≥ 1 and every t ∈ [0, δ), where δ is as in the statement of the proposition.

p tXn 2p tXn tXn p Obviously, |Xn| e ≤ |Xn| e + e . So, by the given condition supn≥0 E(|Xn| ) < ∞ for each p ≥ 1, we only need to prove (2.7) for even integer p ≥ 1 and t ∈ (0, δ). In fact, for 00 00 p 0 00 0 δ Xn (δ ) p δ ∈ (0, δ), set δ = (δ −δ )/2. By the Taylor expansion, e ≥ p! (Xn) for all even integers 0 p ≥ 1 if Xn ≥ 0. Consequently, for all t ∈ (0, δ ],

p tXn p tXn p tXn (Xn) e = (Xn) e I(Xn ≥ 0) + (Xn) e I(Xn < 0) p! 00 1 (t+δ )Xn p x ≤ 00p · e I(Xn ≥ 0) + p · sup{x e } δ t x≤0 p! 1 δXn p x ≤ 00p · e + p · sup{x e } δ t x≤0

10 for all even integer p ≥ 1. We then get (2.7) by taking expectations. 2 The condition supn≥1 E(Xn) < ∞ implies that {Xn; n ≥ 1} is tight. Thus, for any subse- quence {Xnk ; k ≥ 1} such that Xnk converges weakly to Y as k → ∞, to prove the lemma, it is enough to show that Y and X0 have the same distribution.

tXnk Now we assume that Xnk converging weakly to Y as k → ∞. Since e converges weakly to etY ≥ 0, by assumption and the Fatou lemma, EetY ≤ EetX0 for all t ∈ [0, δ]. This implies tY tX that {e , e nk ; k ≥ 1} is uniformly integrable for each t ∈ [0, δ) by H¨older’sinequality. Thus,

tY tX tX Ee = lim Ee nk = Ee 0 (2.8) k→∞

p tXnk p tY for all t ∈ [0, δ). Furthermore, |Xnk | e → |Y | e weakly for every t ∈ R. By (2.7) and the Fatou lemma again, we know that E{|Y |petY } < ∞ for every p ≥ 1 and every t ∈ [0, δ). This and (2.7) imply

p tY p tX0 E{|Y | e } < ∞ and E{|X0| e } < ∞ (2.9)

p tY p δ1Y p for every p ≥ 1 and every t ∈ [0, δ). Fix δ1 ∈ (0, δ). Observe |Y | e ≤ |Y | e + |Y | for all t ∈ [0, δ1]. Then (2.9) can be strengthened to

p tY  p tX0  E sup |Y | e < ∞ and E sup |X0| e < ∞ (2.10) t∈[0,δ1] t∈[0,δ1] for each p ≥ 1. Now, by the mean-value theorem from calculus, for any t 6= t0 ∈ [0, δ1] and each Y j etY −Y j et0Y j+1 ξY integer j ≥ 0, there exists ξ between t and t0 such that = Y e . Therefore, t−t0

Y jetY − Y jet0Y Y jetY − Y jet0Y lim = Y j+1et0Y and sup ≤ sup |Y |j+1etY t→t0 t − t0 t6=t0∈[0,δ1] t − t0 t∈[0,δ1] for every integer j ≥ 0. By (2.8), (2.10), the dominated convergence theorem and induction,

dp(EetY ) dp(EetX0 ) E(Y petY ) = = = E(XpetX0 ) dtp dtp 0 for each integer p ≥ 1 and all t ∈ [0, δ). At t = 0, the derivatives above are understood as right p p derivatives. Therefore, E(Y ) = E(X0 ) for all integer p ≥ 1 by taking t = 0. By assumption, the moments of X0 uniquely determine the distribution of X0, we conclude that Y and X0 have the same distribution. 

2.2 Auxiliary Results on Matrices and Gamma Functions In this section, we prove some facts on matrices and Gamma functions. We will also verify (1.7) in Lemma 2.4.

We say R = (rij)p×p is a correlation matrix if R is a non-negative definite matrix with rii = 1 for all 1 ≤ i ≤ p. The following auxiliary results on matrices seem interesting in their own way.

P 2 LEMMA 2.2 Let R = (rij)p×p be a correlation matrix. Then max1≤i≤p j6=i rij ≤ kRk.

Proof of Lemma 2.2. For a square matrix M, let λmax(M) and λmin(M) be the largest and smallest eigenvalues of M, respectively. By definition, kMk = λmax(M) if M is non-negative definite.

11 First, assume R is invertible. Let R1 be the (p − 1) × (p − 1) upper-left corner of R and y = (rp1, ··· , rp p−1). Then ! R y0 R = 1 . y 1

By the formula for the determinant of a partitioned matrix [see, e.g., p. 470 from Bai and −1 0 Silverstein (2009)], |R| = |R1| · (1 − yR1 y ). Since R is positive definite, |R| > 0 and |R1| > 0. It follows that

−1 0 −1 2 1 ≥ yR1 y ≥ λmin(R1 ) · |y| |y|2 |y|2 = ≥ λmax(R1) λmax(R) where in the last step we apply the interlacing theorem to the situation that R1 is a submatrix of R; see, e.g., p. 185 from Horn and Johnson (1985). It leads to

X 2 rpj ≤ kRk. (2.11) j6=p For any 1 ≤ i ≤ p − 1, we simply exchange the ith row and pth row and then exchange the ith column and pth column. Since the new and old matrices are conjugate to each other, they have the same eigenvalues. We then apply (2.11) to get the desired conclusion. 1 If R is not invertible. Consider R(x) := 1+x (R + xI) for x > 0. Then R(x) is a correlation matrix and is positive definite for any x > 0. By the proved conclusion,

1 X 2 1 max rij ≤ · kR + xIk. (1 + x)2 1≤i≤p 1 + x j6=i

Recall kMk is continuous in the entries of M. The proof is complete by letting x ↓ 0. 

LEMMA 2.3 Let R = (rij)p×p be a correlation matrix. Then λmin(R) ∈ [0, 1] and λmin(R) = 1 if and only if R = I.

0 Proof of Lemma 2.3. Let λ1 ≥ · · · ≥ λp ≥ 0 be the eigenvalues of R. Take x0 = (1, 0, ··· , 0) ∈ Rp. By the Rayleigh-Ritz formula,

0 0 λmin(R) = min x Rx ≤ x0Rx0 = 1. kxk=1

If λmin(R) = 1, then λ1 ≥ · · · ≥ λp ≥ 1. At the same time, the Hadamard inequality says 0 ≤ λ1 ··· λp = |R| ≤ r11 ··· rpp = 1. Hence, λ1 = ··· = λp = 1. This implies R = I.  We now verify (1.7). The notation here is slightly different from (1.7) for a general purpose.

LEMMA 2.4 Let A = (aij)k×k with aij = 1 for all 1 ≤ i, j ≤ k. Let b1, ··· , bk ∈ R and B = diag(b1, ··· , bk). Then

k k h Y i  X bix  |I + x(A − I)B| = (1 − b x) · 1 + i 1 − b x i=1 i=1 i for all x ∈ R satisfying bix 6= 1 for any 1 ≤ i ≤ k.

12 Qk 0 Let f(x) = i=1(1 − bix). Then the right hand side above is equal to f(x) − xf (x). Proof of Lemma 2.4. Since A is of rank one, the only non-zero eigenvalue is equal to tr(A) = k. √ It is easy to see that the corresponding eigenvector is h = (1, ··· , 1)0/ k ∈ Rk. Then, by the spectral theorem of symmetric matrices, there exists an orthogonal matrix O such that A = O diag(k, 0, ··· , 0) O0, where the first column of O is h. Observe

|I + x(A − I)B|

= |diag(1 − b1x, ··· , 1 − bkx) + xAB| k h Y i −1 −1 = (1 − bix) · I + xA · diag(b1(1 − b1x) , ··· , bk(1 − bkx) ) (2.12) i=1

k for all x with bix 6= 1, i = 1, ··· , k. Now, for any c1, ··· , ck ∈ R ,

I + xA · diag(c1, ··· , ck) 0 = I + xO diag (k, 0, ··· , 0)O · diag(c1, ··· , ck) 0 = I + diag (kx, 0, ··· , 0) · O · diag(c1, ··· , ck)O . (2.13)

0 Write C = O · diag(c1, ··· , ck)O = (cij). Then, recall the first column of O is h. Therefore,

k k X 1 X c = (O0) c O = c . 11 1 i i i 1 k i i=1 i=1 Notice   kxc11 kxc12 ··· kxc1k  0 0 ··· 0  0   diag (kx, 0, ··· , 0) · O · diag(c1, ··· , ck)O =  . . .  .  . . .   . . .  0 0 ··· 0

Pk From (2.13), we get |I + xA · diag(c1, ··· , ck)| = 1 + kxc11 = 1 + x i=1 ci. This together with (2.12) yields the conclusion. 

LEMMA 2.5 Let b = b(x) be a function defined on [0, ∞) with b(x) = o(x) as x → ∞. Then Γ(x+b) (|b|+1)2  log xbΓ(x) = O x as x → ∞. Proof of Lemma 2.5. By Lemma 5.1 from Jiang and Qi (2015), as x → +∞, Γ(x + b) b b2 + 1 log = (x + b) log(x + b) − x log x − b − + O  Γ(x) 2x x2 holds uniformly on b ∈ [−δx, δx] for any given δ ∈ (0, 1). By assumption, b(x) = o(x), then Γ(x + b) b2 + bx + 1 log = (x + b) log(x + b) − x log x − b + O  Γ(x) x2 b b b2 + bx + 1 = x log 1 +  + b log x + b log 1 +  − b + O  x x x2 b b2 b2 b2 + bx + 1 = x + O  + b log x + O  − b + O  x x2 x x2 b2x + bx + b2 + 1 = b log x + O  x2

13 where the formula log(1 + t) = t + O(t2) as t → 0 is used in the third identity. Now

|b2x + bx + b2 + 1| ≤ 2b2x + |b|x + 1 ≤ 2(|b| + 1)2x as x ≥ 1. The conclusion then follows. 

We next provide upper and lower bounds for the standard deviation σn from Theorem 1.

LEMMA 2.6 Let p := pn satisfy that m := n−1 > p+3, Rn and σn be as in Theorem 1. Then 2 p 2 2 2p  + tr [(R − I)2] ≤ σ2 ≤ 2 log m + (2.14) m m n n m p for all n ≥ 6. Furthermore, limn→∞ = ∞. σn

P∞ xi Proof of Lemma 2.6. By the Taylor expansion, log(1 − x) = − i=1 i for all |x| < 1. Hence, x2 −[x + log(1 − x)] ≥ 2 for all x ∈ [0, 1) and

p 2 2 σ2 ≥  + tr [(R − I)2]. n m m n p n−1−p 1 2 P 2 2 Second, log(1 − n−1 ) = log( n−1 ) ≥ log n−1 . Also, tr [(Rn − I) ] = 1≤i6=j≤p rij ≤ p since 2 |rij| ≤ 1. By the definition of σn, the second inequality follows. 1 2 Now we prove the remaining conclusion. It is enough to show p2 σn → 0. First,  p  2p2 σ2 ≤ −2 log 1 − + . n m m Therefore, 1 2  p  2 σ2 ≤ − log 1 − + . p2 n p2 m m

m p 1 p 1 2 1 2 If p ≤ 2 , then 1 − m ≥ 2 and hence − log(1 − m ) ≤ log 2. This implies p2 σn ≤ p2 log 4 + m . m Moreover, if p > 2 , then 1 8 m − p 2 8 4 2 σ2 ≤ − log + ≤ − log + . p2 n m2 m m m2 m m

1 2 1 2 8 Overall, p2 σn ≤ p2 log 4 + m + m2 log m → 0. 

2.3 Moments on Logarithms of Determinants of Sample Correlation Matrices In this section, we will prove (1.9) that is one of the crucial steps in proving Theorem 1. In the following we write Σ for Σn for short notation. Review Wp(m, Σ) as defined at the beginning of Section 2.1.

LEMMA 2.7 Let x1, ··· , xn be i.i.d. from Np(µ, Σ) and Rˆ n be as in (1.3). Assume n > p −1 and Σ exists. Let Rn be the correlation matrix generated by Σ = (σij) as in (1.1). Suppose 1/2 1/2 W¯ = (W¯ ij) follows Wp(m, Ip) with m = n − 1. Set W = (Wij) = Σ WΣ¯ . Then p p d  Y  Y σii |Rˆ | = |Rˆ | · |R | · W¯ · n n,0 n ii W i=1 i=1 ii where Rˆ n,0 is the Rˆ n corresponding to Σ = I.

14 Proof of Lemma 2.7. By definition, W follows the distribution of Wp(m, Σ). By Lemma 2.1,

p p d Y 1 Y 1 |Rˆ | = |W| · = |Σ| · |W¯ | · . n W W i=1 ii i=1 ii Define

p Y 1 |Rˆ | = |W¯ | · , n,0 W¯ i=1 ii which has the same distribution as that of Rˆ n as Σ = I. Then p ¯ d  Y Wii  |Rˆ | = |Rˆ | · · |Σ|. (2.15) n n,0 W i=1 ii

By the definition of Rn,

p Y 1 |R | = |Σ| · . n σ i=1 ii

Replacing |Σ| in (2.15) with the one from the above leads to the desired conclusion.  Qp ¯  LEMMA 2.8 Assume the notation and conditions in Lemma 2.7 hold. Define Tn = i=1 Wii · p Q σii . Then, for any integer k ≥ 1, i=1 Wii

2 −k nh p 2 2 i  2ko sup 2 + tr [(Rn − I) ] · E (log Tn) < ∞. n≥3 m m

Proof of Lemma 2.8. The proof is divided into several steps.

Step 1: a workable form of Tn. Write p ¯ p X Wii X Wii log T = log − log . n m mσ i=1 i=1 ii

Define i and i,0 through

2 Wii Wii  Wii  log = − 1 + − 1 i; (2.16) mσii mσii mσii W¯ W¯ W¯ 2 log ii = ii − 1 + ii − 1  . m m m i,0

The estimates of i and i,0 will be given after (2.21). It follows that

p p p 2 X Wii X  Wii  X  Wii  log = − 1 + − 1  mσ mσ mσ i i=1 ii i=1 ii i=1 ii and

p p p ¯ ¯ ¯ 2 X Wii X Wii  X Wii  log = − 1 + − 1  . m m m i,0 i=1 i=1 i=1

15 Consequently,

p ¯ p X Wii X Wii log − log m mσ i=1 i=1 ii p p 2 ¯ 2 1 X  Wii  X Wii  =  tr (W¯ ) − tr (TW) − − 1  + − 1  m mσ i m i,0 i=1 ii i=1

1/2 1/2 where T = diag (1/σii, 1 ≤ i ≤ p). From Lemma 2.7, tr (TW) = tr (Σ TΣ W¯ ). It gives us

tr (W¯ ) − tr (TW) = tr (MW¯ )

1/2 1/2 with M = I − Σ TΣ . By the spectral theorem, there is an orthogonal matrix O = Op×p 0 such that M = O diag (λ1, ··· , λp)O, where {λi; 1 ≤ i ≤ p} are the eigenvalues of M. Since the Wishart matrix W¯ = (W¯ ij) is orthogonally invariant,

 0 tr (MW¯ ) = tr diag (λ1, ··· , λp)OWO¯ d   = tr diag (λ1, ··· , λp)W¯ p X = λiW¯ ii, i=1

2 where {W¯ ii; 1 ≤ i ≤ p} are i.i.d. random variables with distribution χ (m). Also, observe Pp 1/2 1/2 1/2 1/2 i=1 λi = tr (M) = p − tr (Rn) = 0 since tr (Σ TΣ ) = tr (ΣT) = tr (T ΣT ). We conclude

p p p 2 ¯ 2 d 1 X X  Wii  X Wii  log T = λ (W¯ − m) − − 1  + − 1  n m i ii mσ i m i,0 i=1 i=1 ii i=1 = Z1 − Z2 + Z3. (2.17)

1/2 1/2 By the definition of correlation matrix from (1.1), we have T ΣT = Rn. Since M1M2 and M2M1 have the same eigenvalues for any p × p matrices M1 and M2, it is easy to check that λ1, ··· , λp are also the eigenvalues of I − Rn. Step 2: the moment of Z1. Review the definition of Wp(m, Ip) at the beginning of Section 2.1. ¯ Pm 2 We are able to write Wii −m = j=1(ξij −1) for all 1 ≤ i ≤ p, where {ξij; 1 ≤ i ≤ p, 1 ≤ j ≤ m} are i.i.d. N(0, 1)-distributed random variables. By the Marcinkiewicz-Zygmund inequality [e.g., Theorem 2 on p. 367 from Chow and Teicher (1988)], for any integer k ≥ 1, there exists a constant Ck > 0 such that

p m k 2k −2k h X X 2 2 2 i E(Z1 ) ≤ Ckm E λi (ξij − 1) i=1 j=1 m k −2k 2 2 k h N X 2 2 i = Ckm (λ1 + ··· + λp) E E (ξNj − 1) , (2.18) j=1 where N is a random variable independent of ξij’s and

2 λi P (N = i) = 2 2 , 1 ≤ i ≤ p. λ1 + ··· + λp

16 In the sequel, the notation Ck, which denotes constants depending on k but not on n, may be different from line to line. By H¨older’sinequality and the convex inequality on g(x) := xk on [0, ∞),

m k m k  N X 2 2 N h X 2 2 i E (ξNj − 1) ≤ E (ξNj − 1) j=1 j=1 m k−1 N X 2 2k ≤ m E (ξNj − 1) . j=1

From Fubini’s theorem,

m k m h N X 2 2 i k−1 N X  2 2k E E (ξNj − 1) ≤ m E E (ξNj − 1) j=1 j=1 k  2 2k = m E (ξ11 − 1) .

This and (2.18) imply

 1 k E(Z2k) ≤ C m−k(λ2 + ··· + λ2)k = C · tr [(R − I)2] . (2.19) 1 k 1 p k m n

1/2 Step 3: the moments of Z2 and Z3. Recall Lemma 2.7. Let Σ = (τij) = (τij)p×p. Then 0 Pp 2 ¯ 0 (σij) = Σ = (τij)(τij) . In particular, σii = j=1 τij. By definition W = Y Y, where Y is a m × p matrix in which all entries are i.i.d. N(0, 1)-distributed random variables. Since 1/2 1/2 1/2 0 1/2 (Wij) = Σ WΣ¯ = Σ (Y Y)Σ , we see

0 0 Wii = (τi1, ··· , τip)Y Y(τi1, ··· , τip) 0 2 = kY(τi1, ··· , τip) k 2 ∼ σii · χ (m) by the invariance of i.i.d. normal random variables. We then get a simple formula W ii ∼ χ2(m) (2.20) σii for each 1 ≤ i ≤ p. Let us calculate the moment of Z2. Keep in mind that we will only use the marginal distribution as in (2.20). Then, by H¨older’sinequality,

p 4k X h Wii  i E(Z2k) ≤ p2k−1 E − 1 2k 2 mσ i i=1 ii p 4k 2k−1 X Wii  4k 1/2 ≤ p − 1 · E(i ) mσ 8k i=1 ii

r 1/r where kξkr := [E(|X| )] for any random variable ξ. From (2.20) and Corollary 2 on p. 368 from Chow and Teicher, 1988), which is an application of the Marcinkiewicz-Zygmund inequality used in Step 2,

8k m 8k h Wii  i 1 h X  i Ck E − 1 = E (η2 − 1) ≤ , mσ m8k i m4k ii i=1

17 where η1, ··· , ηm are i.i.d. N(0, 1)-distributed random variables. Therefore,

2k−1 p p X 1/2 E(Z2k) ≤ C · E(4k) . (2.21) 2 k m2k i i=1 log x−x+1 Define ψ(x) = (x−1)2 for x > 0. It is easy to verify, there exists a constant C > 0 such that C Wii Wii −1 |ψ(x)| ≤ for all x > 0. Take x = in (2.16) to see |i| ≤ C( ) . By (2.20), x mσii mσii 4k 4k 2 −4k E(i ) ≤ (Cm) E[χ (m) ].

2 1 (m/2)−1 −x/2 The density of χ (m) is f(x) := m m/2 x e for x > 0. Consequently, Γ( 2 )2 1 Z ∞ E[χ2(m)−4k] = x(m/2)−4k−1e−x/2 dx m m/2 Γ( 2 )2 0 2(m/2)−4k Z ∞ = v(m/2)−4k−1e−v dv m m/2 Γ( 2 )2 0 m Γ( 2 − 4k) −4k = m 4k ∼ m (2.22) Γ( 2 )2 Γ(x+a) by the transform v = x/2 and then Lemma 2.4 from Dong et al. (2012) that limx→∞ Γ(x)xa = 1 for any number a ∈ R, which can also be obtained from (2.5). The assertion (2.22) is consistent 2 4k with the law of large numbers χ (m) ∼ m. We then have E(i ) ≤ Ck. By (2.21), we eventually arrive at  p 2k E(Z2k) ≤ C . (2.23) 2 k m

Inspecting the above proof, we get (2.23) from (2.20) without using any information on σii’s. 2k p 2k Applying (2.23) to the case Σ = I, we get E(Z3 ) ≤ Ck m . 2k 2k Finally, the two inequalities on E(Z2 ) and E(Z3 ) together with (2.17) and (2.19) entail  2k  2k 2k 2k  E (log Tn) ≤ Ck E(|Z1| ) + E(|Z2| ) + E(|Z3| ) h p2 2 ik ≤ C + tr [(R − I)2] . k m2 m n The proof is complete.  We will need the following notation later.  3  p  n − 2 µn,0 = p − n + log 1 − − p; 2 n − 1 n − 1 (2.24) h p  p i σ2 = −2 + log 1 − . n,0 n − 1 n − 1

LEMMA 2.9 Assume p := pn satisfy that n > p + 4 and p → ∞. Let x1, ··· , xn be i.i.d. from Np(µ, I) and Rˆ n,0 be as in Lemma 2.7. Then, there exists s0 > 0 such that the following holds. pnj (i) For any subsequence {nj; j ≥ 1} of positive integers with limj→∞ = y ∈ [0, 1], nj ˆ log |Rnj ,0| − µnj ,0  s2/2 lim E exp s = e , |s| ≤ s0. j→∞ σnj ,0

(ii) For all |s| ≤ s0, log |Rˆ | − µ  sup E exp n,0 n,0 s < ∞. n≥6 σn,0

18 Proof of Lemma 2.9. Recall Rˆ n,0 is the Rˆ n corresponding to Σ = I.  log |Rˆ n,0|−µn,0  By (1.6), E exp s < ∞ for all n ≥ 6 and |s| ≤ s0; σn,0 (i). The assertion (5.68) from Jiang and Yang (2013) confirms the case for y ∈ (0, 1]; the limit right above (5.6) from Jiang and Qi (2015) confirms the case for y = 0. So (ii) is true for any y ∈ [0, 1]. 0 (ii). If not, there exists s 6= 0 and a subsequence {nj; j ≥ 1} of positive integers such that 0 |s | ≤ s0 and log |Rˆ | − µ  lim E exp nj ,0 nj ,0 s0 = ∞. j→∞ σnj ,0

pn Since 0 < n ≤ 1 for all n ≥ 6, there exists y ∈ [0, 1] and a further subsequence {njk ; k ≥ 1} pn jk such that limk→∞ = y and njk ˆ log |Rn ,0| − µn ,0  lim E exp jk jk s0 = ∞, k→∞ σ njk ,0 which contradicts (i). The proof is completed.  We now are ready to prove the main conclusion in this section.

LEMMA 2.10 Assume p := pn satisfy that n > p+4 and p → ∞. Let x1, ··· , xn be i.i.d. from Np(µ, Σ) and Rˆ n be as in (1.3). Let µn and σn be as in Theorem 1. Let Rn be as in (1.1). −1 Assume Rn exists. Then, for any k ≥ 1, hlog |Rˆ | − µ 2ki sup E n n < ∞. n≥6 σn

2 Proof of Lemma 2.10. Review (2.24) for µn,0 and σn,0. Define m = n − 1. Then 2 µ = µ + log |R | and σ2 = σ2 + tr [(R − I)2]. n n,0 n n n,0 m n By Lemma 2.7,

log |Rˆ n| − µn log |Rˆ n,0| − µn,0 σn,0 1 = · + log Tn σn σn,0 σn σn

 Qp  Qp σii  where Tn = W¯ ii · is as in Lemma 2.8. Therefore, i=1 i=1 Wii hlog |Rˆ | − µ 2ki 21−2kE n n σn ˆ 2k hlog |Rn,0| − µn,0  i 1  2k ≤ E + 2k · E (log Tn) (2.25) σn,0 σn for any n ≥ 6. By (iii) of Lemma 2.9,

log |Rˆ | − µ  sup E exp n,0 n,0 s < ∞ n≥6 σn,0

x2k |x| x −x for all |s| ≤ s0. Use the inequality (2k)! ≤ e ≤ e + e for any x ∈ R to get

hlog |Rˆ | − µ 2ki sup E n,0 n,0 < ∞ (2.26) n≥6 σn,0

19 for any integer k ≥ 1. On the other hand, by Lemma 2.6,

p 2 2 σ2 ≥  + tr [(R − I)2]. n m m n Thus, from Lemma 2.8 we see

n 1  2ko sup 2k · E (log Tn) < ∞ n≥6 σn for any integer k ≥ 1. This, (2.25) and (2.26) entail the desired result. 

2.4 Convergence of Random Determinants

We now prove (1.11), the convergence in probability, a key step to compute In in (1.10). The major conclusion is Lemma 2.12. Throughout the rest of the paper, we always assume the following: s Let σn be as in Theorem 1. Given s > 0, set t = tn = and m = n − 1. Let V1, ··· ,Vp σn m be i.i.d. random variables with the distribution Beta(t, ). (2.27) 2 √ p Recall Rn and ∆n from Proposition 1.1. Let Dn = diag( V1, ··· , Vp) and λ1, ··· , λp be the eigenvalues of Dn∆nDn. Observing (∆n)ii = 0, then (Dn∆nDn)ii = 0 and (Dn∆nDn)ij = p rij ViVj for all i 6= j. These say

p p X X 2  2 X 2 λi = tr (Dn∆nDn) = 0 and λi = tr (Dn∆nDn) = rijViVj. (2.28) i=1 i=1 i6=j

LEMMA 2.11 Let p := pn satisfy that n > p and p → ∞ and σn be as in Theorem 1. Given s Pp 2 4t2 2 s > 0, set t = tn = and m = n − 1. Then m λ = · tr [(Rn − I) ] + n with n → 0 σn i=1 i m in probability as n → ∞.

Proof of Lemma 2.11. First, if ξ ∼ Beta(α, β), then α αβ Eξ = and Var(ξ) = ; α + β (α + β)2(α + β + 1) k−1 Y α + i E(ξk) = , k = 1, 2, ··· (2.29) α + β + i i=0 m t In our case, V1 ∼ Beta(t, 2 ). From (2.14), m → 0, hence

t 2t 2 t(t + 1) 4t(t + 1) EV1 = m ∼ ; E(V1 ) = m m ∼ 2 ; (2.30) 2 + t m ( 2 + t)( 2 + t + 1) m mt/2 4t Var (V1) = m 2 m ∼ 2 (2.31) ( 2 + t) ( 2 + t + 1) m P 2 as n → ∞. Let Un = i6=j rijViVj. Then

2 X 2 2 2 EUn = (EV1) rij = (EV1) · tr[(Rn − I) ] i6=j 4t2 = (1 + o(1)) · · tr[(R − I)2]. (2.32) m2 n

20 Write V¯i = Vi − EVi for i = 1, ··· , p. Then,

X 2 ¯ ¯ X 2 ¯ Un − EUn = rijViVj + 2 rijVi · EVj i6=j i6=j p X 2 ¯ ¯ X X 2  ¯ = rijViVj + 2(EV1) rij Vi. (2.33) i6=j i=1 j6=i

By independence, the two terms in (2.33) are uncorrelated and the summands from each sum are also uncorrelated respectively. Therefore

2 Var (Un) = E[(Un − EUn) ] p 2 X 4 2 X X 2 2 = Var (V1) · rij + 4(EV1) · Var (V1) · rij i6=j i=1 j6=i 2 2 2 2 ≤ Var (V1) · tr[(Rn − I) ] + 4kRnk · (EV1) · Var (V1) · tr[(Rn − I) ] (2.34)

P 4 P 2 2 since i6=j rij ≤ i6=j rij = tr[(Rn − I) ] and

p X X 2 2 X 2 rij ≤ kRnk · rij i=1 j6=i j6=i

s by Lemma 2.2. Then, from (2.14), (2.31) and the notation t = tn = , σn

2 2 2 2 16s 2 m Var (V1) · tr[(Rn − I) ] ∼ 2 2 · tr[(Rn − I) ] m σn 2 2 2 tr[(Rn − I) ] 16s ≤ (16s ) · 2 2 ≤ → 0 (2.35) p + m tr [(Rn − I) ] m as n → ∞. Now,

2 2 2 m · kRnk · (EV1) · Var (V1) · tr[(Rn − I) ] 4t2 4t ≤ 2 · m2 · · · kR k · tr[(R − I)2] m2 m2 n n kR kt  1  = 32 · n · t2 · tr[(R − I)2] m m n 2 1 2 2 as n is sufficiently large. By (2.14), t · m tr[(Rn − I) ] ≤ s . Recall the notation kRnk = 2 2 λmax(Rn). Trivially, tr [(Rn − I) ] ≥ (kRnk − 1) . Then, from (2.14) again,

2 −1/2 kRnkt kRnk − 1 + 1  p 1  ≤ s · · + (kR k − 1)2 m m m2 m n

kRnk − 1 + 1 = s · . p 2 2 p + m · (kRnk − 1)

√1 1 Evidently, the fraction is controlled by m + p . Therefore,

2 2 2 m · kRnk · (EV1) · Var (V1) · tr[(Rn − I) ] → 0 as n → ∞. Connecting this to (2.34) and (2.35), we obtain Var (mUn) → 0. Therefore, mUn − 4t2 2 mEUn → 0 in probability. The inequality in (2.14) implies m · tr[(Rn −I) ] is bounded. Hence,

21 Pp 2 4t2 2 we see from (2.28) and (2.32) that m i=1 λi = mUn = m · tr[(Rn − I) ] + n with n → 0 in probability as n → ∞.  Recall the notation in (2.27). Here comes our main result for convergence in probability.

LEMMA 2.12 Let p := pn satisfy that n > p and p → ∞ and σn be as in Theorem 1. Review Rn defined in (1.1). Set ∆n = Rn − I. Then m t2 − + t log |I + ∆ · diag(V , ··· ,V )| − · tr (∆2 ) → 0 2 n 1 p m n in probability as n → ∞. √ p Proof of Lemma 2.12. Let Dn = diag( V1, ··· , Vp) and λ1, ··· , λp be the eigenvalues of Dn∆nDn. Then

log |I + ∆n · diag(V1, ··· ,Vp)| = log |I + Dn∆nDn| p X = log(1 + λi). (2.36) i=1 We first need a few of estimates. It is easy to check from (2.14) and notation t = s that σn 3 3 3 |t| 2 |s| 1 2 |s| 2 · tr (∆n) = · 2 tr (∆n) ≤ → 0 (2.37) m mσn mσn p 2 2 2 by using mσn ≥ p and σn ≥ m tr (∆n). Similarly, 2 2 2 4t 2 4s 1 2 4s 3 · tr (∆n) = 2 · 2 tr (∆n) ≤ 2 . (2.38) m m mσn m

From Lemma 2.11 and (2.37), for any sequence {δn; n ≥ 4} with δn = o(1), p m 1 X − + t · − + δ  · λ2 2 2 n i i=1 1 h4t2 i 1 − 2δ t h4t2 i = (1 − 2δ ) · · · tr (∆2 ) +  + n · · · tr (∆2 ) +  n 4 m n n 2 m m n n t2 = · tr (∆2 ) + 0 (2.39) m n n 0 t where n → 0 and n → 0 in probability as n → ∞, and the assertion m n → 0 in probability is t 4t2 2 due to the fact m → 0; the fact m · tr[(Rn − I) ] is bounded comes from (2.14). Pp 2 4t2 2 By Lemma 2.11 again, m i=1 λi = m · tr (∆n) + n. Thus, we see from (2.38) that p 2 p 2 P 2 5s P 2 5s √3s P ( i=1 λi ≤ m ) → 1 as n → ∞. Over the set { i=1 λi ≤ m }, we know max1≤i≤p |λi| ≤ m . 1 In particular, max1≤i≤p |λi| ≤ 2 as n is large enough. 1 2 3 P∞ i 1 i Write log(1+x) = x− 2 x +x ·(x) for |x| < 1. By Taylor’s expansion, (x) = i=0(−1) i+3 x . 1 P∞ i 1 Thus, |(x)| ≤ 3 i=0 |x| = 3(1−|x|) for all |x| < 1. So sup|x|≤1/2 |(x)| ≤ 1. Define Ωn = Pp 2 5s2 { i=1 λi ≤ m }, which is a subset of the whole sample space Ω. From now on, we assume n is sufficiently large. On Ωn, it is seen from the first identity in (2.28) that p p p p X X 1 X X log(1 + λ ) = λ − λ2 + λ3 ·  (λ ) i i 2 i i i i i=1 i=1 i=1 i=1 p p 1 X X = − λ2 + λ3 ·  (λ ) 2 i i i i i=1 i=1

22 by (2.28) where i(x), 1 ≤ i ≤ p, are functions satisfying sup1≤i≤p |(λi)| ≤ 1. Trivially,

p p p X 3 X 2 3s X 2 λi · i(λi) ≤ max |λi| · λi ≤ √ · λi 1≤i≤p m i=1 i=1 i=1 on Ωn. Hence p p m X m h 1 1 i X − + t log(1 + λ ) = − + t · − + O√  · λ2 2 i 2 2 m i i=1 i=1 on Ωn. The lemma then follows from (2.36), (2.39) and the fact limn→∞ P (Ωn) = 1. 

2.5 Uniform Integrability of Random Determinants In this section, we will show (1.12) en route to prove Theorem 1. Recall the notation in (2.27). First, we need some concentration inequalities.

LEMMA 2.13 Let p := pn satisfy that m := n − 1 > p → ∞ and σn be as in Theorem 1. Then, 1 for any ρ ∈ (0, 2 ), there exist constants M = M(ρ) > 0 and n0 = n0(ρ, s) ≥ 1 such that p n ρmy  X o sup e · P Vi > y ≤ 1 pt y≥M m i=1 as n ≥ n0.

Lemma 2.13 is not a standard Chernoff bound since the mean of V1 goes to zero. An extra −C1m effort has to be paid to get the subtle rate. In fact, the rate e for some C1 > 0 in the lemma −C2p is different from e for some constants C2 > 0, which is usually seen in a Chernoff bound. Pp Proof of Lemma 2.13. Notice i=1 Vi ≤ p. So without loss of generality, we assume y ≤ p. m Set r = 2 . By the Chernoff bound [see, e.g., Dembo and Zeitouni (2009)], p 1 X P V ≥ x ≤ e−pI(x) (2.40) p i i=1

2t for all x > EV1 ∼ m from (2.30), where

I(x) = sup{θx − log EeθV1 }. θ>0

Step 1. Estimate of the moment generating function EeθV1 . Notice

1 Z 1 EeθV1 = xt−1(1 − x)r−1eθx dx. B(t, r) 0

r−1 −wx m Define w = r − 1. Then (1 − x) ≤ e since r − 1 = 2 − 1 ≥ 0 as n ≥ 3. Hence, 1 Z 1 EeθV1 ≤ xt−1e−(w−θ)x dx B(t, r) 0 (w − θ)−t Z ∞ ≤ yt−1e−y dy B(t, r) 0 Γ(r + t) = (w − θ)−t Γ(r)

23 for θ < w, where the transform y = (w − θ)x is used in the second step and the formula B(t, r) = Γ(r)Γ(t) is applied in the last identity. By using t = s = O( m ) and Lemma 2.5, we Γ(r+t) σn p Γ(r+t) t 2 −1 see Γ(r) = r u where u = un = exp[O((t + 1)m )] not depending on θ. Then,

EeθV1 ≤ rtu · (w − θ)−t (2.41) uniformly for all θ < w as n is sufficiently large. Step 2. Evaluation of the rate function I(x). From (2.41),

I(x) ≥ − log(rtu) + sup {θx + t log(w − θ)}. 0<θ

0 t 00 −2 Set ϕ(θ) = θx+ t log(w − θ) for 0 < θ < w. Then ϕ (θ) = x − w−θ and ϕ (θ) = −t(w − θ) < 0. t t t So the maximizer θ0 satisfies w − θ0 = x or θ0 = w − x ∈ (0, w) if x > w . Thus, under the restriction t 2t x > m ∼ , (2.42) 2 − 1 m t t we obtain I(x) ≥ − log(r u) + wx − t + t log x . By (2.40), p 1 X P ( V ≥ x) ≤ (rtu)pe−pJ(x) p i i=1 t for all x satisfying (2.42), where J(x) = wx − t + t log x . Therefore, p p  X  1 X  P V > y = P V > x ≤ (rtu)pe−pJ(x) i p i i=1 i=1

y 3pt with x = p if (2.42) holds, which is ensured if y ≥ m . Now wy pt J(x) = − t + t log p y Consequently, p  X  n pto P V > y ≤ (rtu)p · exp − wy + pt − pt log i y i=1 n pt o = exp − wy + pt − pt log + pt log r + p log u y n ery o = exp − wy + pt log + p log u (2.43) pt

3pt for all y ≥ m . Step 3. In this part we will show that the second and third terms in {·} from (2.43) are small relative to the first term wy. Review u = exp[O((t2 + 1)m−1)]. It is easy to see, as n is sufficiently large, p log u p(t2 + 1) t2 + 1 = O = O wy m2y mt

pt uniformly for all y ≥ 3 m . From Lemma 2.6,

2 2 1/2 t + 1 1 1 s σn s 12 log m 2p  = t + = + ≤ + 2 + 3 → 0 (2.44) mt m t mσn ms p s m m

24 as n → ∞ since p < m by assumption. This implies p log u τn := sup → 0 (2.45) y≥3pt/m wy

r m as n → ∞. Moreover, noting r−1 = m−2 ≤ 2 for all m = n − 1 ≥ 4, we have  ery  1 pt ery pt log · = log pt wy wy pt er log A 2e log A = ≤ r − 1 A A

ery m pt for all n ≥ 5, where A := pt . Evidently, given M ≥ 3, we see A ≥ pt y ≥ M if y ≥ M m . log x Realizing h(x) = x is decreasing on [e, ∞), we see n ery  1 o 6 log M sup pt log · ≤ y>Mpt/m pt wy M for all M ≥ 3 and n ≥ 5. This, (2.43) and (2.45) imply

p  X  n  6 log M o P V > y ≤ exp − wy 1 − − τ (2.46) i M n i=1

pt my uniformly for all y > M m , M ≥ 3 and large n not depending on M. Trivially, wy ∼ 2 as log M n → ∞. From the fact limM→∞ M = 0 and limn→∞ τn = 0 we get the desired conclusion by picking M = M(ρ) through (2.46) and n0 = n0(ρ, s) ≥ 1 through (2.44) and (2.45).  Lemma 2.13 is a type of large deviations. We also need to estimate a similar probability when the “y” in the lemma is small. This belongs to the zone of moderate deviations. We will apply the following inequality by Yurinskii (1976) to achieve this purpose.

LEMMA 2.14 Let ξ1, ··· , ξp be independent random variables taking values in a separable Ba- nach space (B, k · k) satisfying m! E(kξ km) ≤ b2Hm−2, m = 2, 3, ··· . j 2 j

2 2 2 βp βp Let βp ≥ Ekξ1 + ··· + ξpk and B = b + ··· + b . For any x > , set x¯ = x − . Then p 1 p Bp Bp

 x¯2 1  P (kξ1 + ··· + ξpk ≥ xBp) ≤ exp − . 8 1 + (¯xH)/(2Bp) Specializing the above lemma to our set-up, we get the following result.

LEMMA 2.15 Let p := pn satisfy that m := n−1 > p+3 and p → ∞ and σn be as in Theorem pn 1 2 1. Assume either infn≥6 n > 0 or infn≥6 n tr[(Rn − I) ] > 0. Then, for some δ > 0,

 X   1 m2y  P r2 V V ≥ y ≤ exp − · √ ij i j 256 pt + m y i6=j

1 for all y > m , s ∈ (0, δ] and n ≥ 6.

25 The Hanson-Wright inequality (Hanson and Wright, 1971) can also provide an upper bound for the probability in Lemma 2.15. A most recent exposition of the inequality is given by Rudelson and Vershynin (2013). However, since all Vi’s take small values, the Hanson-Wright inequality seems not able to catch the precision we want. The Yurinskii inequality supplies an ideal upper bound for us.

Proof of Lemma 2.15. By Lemma 2.6, there exists a constant s0 > 0 such that

p 2 2 σ2 ≥  + tr [(R − I)2] ≥ s2 > 0 n m m n 0

pn 1 2 for all n ≥ 6 based on the assumption that either infn≥6 n > 0 or infn≥6 n tr[(Rn − I) ] > 0. This says s t = tn = ≤ 1 (2.47) σn for all n ≥ 6 and 0 < s ≤ s0. In the application of Lemma 2.14 next, we take B = Rp and k · k is the Euclidean norm. 2 Now, for Rn = (rij) is non-negative definite, the Hadamard product Rn ◦ Rn = (rij) is also non-negative definite by the Schur product theorem; see, e.g., p. 458 from Horn and Johnson 2 2 0 (1985). So we can write (rij) = A = A A, where A = (a1, ··· , ap) is a non-negative definite 2 0 P 2 matrix. In particular, rij = aiaj for i 6= j and kaik = 1. Set Wn = i6=j rijViVj. Then

X 2 X 0 2 Wn ≤ rijViVj = aiajViVj = ka1V1 + ··· + apVpk . (2.48) 1≤i,j≤p 1≤i,j≤p

m 2 1/2 Define r = 2 as before. Note Eka1V1 + ··· + apVpk ≤ (Eka1V1 + ··· + apVpk ) and

2 X 2 E(ka1V1 + ··· + apVpk ) = E rijViVj 1≤i,j≤p t2 t(t + 1) = tr [(R − I)2] · + p n (r + t)2 (r + t)(r + t + 1) 2s2 8pt ≤ + , m m2 2 2 2 where (2.30) is used in the second identity and the fact m tr [(Rn − I) ] ≤ σn and (2.47) are p employed in the last step. From Lemma 2.6, σn ≥ m , it follows 2s2 8s1/2 Eka V + ··· + a V k ≤ + 1 1 p p m m 1 ≤ √ := β (2.49) 2 m p for all 0 < s ≤ 1/40. Now let us bound the probability P (ka1V1 + ··· + apVpk ≥ x). First, by (2.29),

m m E[ka1V1k ] = E(V1 ) t(t + 1) ··· (t + m − 1) = (r + t)(r + t + 1) ··· (r + t + m − 1) t(t + 1) (t + 2)(t + 3) ··· (t + m − 1) ≤ · r2 rm−2 2t 1 m! ≤ · 2 . r2 rm−2

26 According to the notation from Lemma 2.14, 2t 2pt 1 b2 = ,B2 = ,H = . (2.50) j r2 p r2 r

If x ≥ 2 βp , then x ≤ x¯ := x − βp ≤ x. Therefore, we have from Lemma 2.14 and (2.49) that Bp 2 Bp

 x¯2 1  P (ka1V1 + ··· + apVpk ≥ xBp) ≤ exp − 8 1 + (¯xH)/(2Bp)  x2 1  ≤ exp − 32 1 + (xH)/(2Bp)

βp 1 for all x ≥ 2 and 0 < s ≤ min{s0, }. Now, from (2.48), Bp 40

P (Wn ≥ y) ≤ P (ka1V1 + ··· + apVpk ≥ xBp) √ √ with x = y . Hence, if x = y ≥ 2 βp , that is, y ≥ 4β2 = 1 , we have Bp Bp Bp p m  y 1  P (Wn ≥ y) ≤ exp − 2 √ 2 . 32Bp 1 + (H y)/(2Bp) Finally, from (2.50),

y 1 1 r2y 1 m2y 2 √ 2 = · √ ≥ · √ . 32Bp 1 + (H y)/(2Bp) 32 2pt + r y/2 256 pt + m y In summary,

 1 m2y  P (W ≥ y) ≤ exp − · √ n 256 pt + m y

1 1 for all y ≥ m , 0 < s ≤ min{s0, 40 } and n ≥ 6. 

Let {Xt; t ∈ T } be a collection of random variables. We say they are uniformly integrable if supt∈T E[|Xt|I(|Xt| ≥ β)] → 0 as β → ∞. Now we prove the key result in this section. Review ∆n = Rn − I and (2.27).

LEMMA 2.16 Let p := pn satisfy that m := n − 1 > p and p → ∞. Set Qn = |I + ∆n · m − 2 −t 1 diag(V1, ··· ,Vp)| . Assume infn≥6 λmin(Rn) > 2 . Then, there exists s0 > 0 such that {Qn; n ≥ 6} is uniformly integrable for any s ∈ (0, s0] provided one of the following holds: (i) pn 1 2 pnk∆nk∞ infn≥6 > 0; (ii) infn≥6 tr(∆ ) > 0; (iii) sup < ∞. n n n n≥6 k∆nk2

Proof of Lemma 2.16. Write an = −λmin(∆n) for short notation. Lemma 2.3 and the 1 condition infn≥6 λmin(Rn) > 2 imply that an > 0 for each n ≥ 6 and 1 sup an < . (2.51) n≥6 2 √ p Let Dn = diag( V1, ··· , Vp). Easily, Mn := Dn(∆n + anIp)Dn is nonnegative definite. 2 This and the fact that 0 ≤ Vi ≤ 1 for each i imply that I + anDn is positive definite. Hence,

2 |I + ∆n · diag(V1, ··· ,Vp)| = |(I − anDn) + Mn| 2 ≥ |I − anDn|.

27 m Set v = 2 + t. Notice

2 −v 1 Qn ≤ Rn := |I − anDn| ≤ vp . (2.52) (1 − an)

By the definition of uniform integrability, it suffices to show that there exists n0 ≥ 6 such that   sup E QnI(Qn ≥ β) → 0 n≥n0 Pp as β → ∞. Set Un = i=1 Vi. Then, for any number β ≥ 1 and b > 0,       E QnI(Qn ≥ β) ≤ E RnI(Un ≥ b) + E QnI(Qn ≥ β, Un ≤ b)

:= Fn + Gn.

In the following two steps, we will show Fn → 0 and Gn → 0, respectively. x Step 1: the proof that Fn → 0. It is known log(1 − x) ≥ − 1−x for x ∈ [0, 1). Thus,  Qp  an Pp log (1 − anVi) ≥ − Vi. Hence, from (2.52), i=1 1−an i=1

 anv  Rn ≤ exp Un . (2.53) 1 − an Set h = an v. Then 0 ≤ h ≤ v by (2.51). It is trivial to check 1−an Z c hUn hb hy e I(b ≤ Un ≤ c) ≤ e I(Un ≥ b) + h e I(Un ≥ y) dy b for any b < c and any realization of Un by considering if b ≤ Un ≤ c is true or not. Take expectations on both sides to have Z c  hUn  hb hy E e I(b ≤ Un ≤ c) ≤ e P (Un ≥ b) + h e P (Un ≥ y) dy. (2.54) b

1 an τ 1 Since sup an < , it is apparent that τ := sup < 1 from (2.51). Take ρ ∈ ( , ). By n≥6 2 n≥6 1−an 2 2 Lemma 2.13, there exist constants n1 = n1 ≥ 6 and M = M(ρ) > 0 such that

−ρmy P (Un ≥ y) ≤ e

pt for all y ≥ b := bn = M m and n ≥ n1. In what follows, ni denotes constants not depending on n. Thus, we have from (2.54) with c = ∞ that h a i Z ∞ a  hUn  n  n  E e I(Un ≥ b) ≤ exp vb − ρmb + h exp vy − ρmy dy 1 − an b 1 − an Z ∞ = e−βnmb + h e−βnmy dy, b where

an v βn = ρ − . 1 − an m v 1 1 1 For m → 2 and 0 ≤ h ≤ m for large n, from the choice of ρ, it is readily seen βn ≥ 2 (ρ − 2 τ) = 1 4 (2ρ − τ) > 0 as n ≥ n2 ≥ n1. This gives  4h   1  EehUn I(U ≥ b) ≤ 1 + exp − (2ρ − τ)mb n (2ρ − τ)m 4  4  n 1 o ≤ 1 + · exp − (2ρ − τ)Mpt 2ρ − τ 4

28 an for all n ≥ n3 ≥ n2. Review (2.53) and the notation h = v, we get 1−an  4  n 1 o ER I(U ≥ b) ≤ 1 + · exp − (2ρ − τ)Mpt n n 2ρ − τ 4 for all s > 0 and n ≥ n3. From Lemma 2.6, pt → ∞. Therefore,   lim E RnI(Un ≥ b) = 0. n→∞

Step 2: the proof that $G_n \to 0$. It suffices to show that there exist $n_0' \ge 6$ and $s_1 > 0$ such that
$$\lim_{\beta\to\infty}\sup_{n\ge n_0'} E\big[Q_nI(Q_n \ge \beta,\, U_n \le b)\big] = 0 \qquad (2.55)$$

for any $s \in (0, s_1)$, where $b = M\frac{pt}{m}$ is defined as before. Let $\lambda_1,\cdots,\lambda_p$ be the eigenvalues of $D_n\Delta_nD_n$. From (2.28),

$$\sum_{i=1}^{p}\lambda_i^2 = \sum_{i\ne j} r_{ij}^2V_iV_j \le \sum_{i\ne j} V_iV_j \le U_n^2. \qquad (2.56)$$

Then, under the condition $U_n \le b = M\frac{pt}{m}$, we see $\sum_{i=1}^{p}\lambda_i^2 \le \frac{M^2p^2t^2}{m^2}$. Recall that $M$ does not depend on $s$. Set $s_0 = \frac{1}{2M}$. Thus, from Lemma 2.6, we see $t^2 = \frac{s^2}{\sigma_n^2} \le \frac{m^2s^2}{p^2}$ and hence

$$\sum_{i=1}^{p}\lambda_i^2 \le \frac{M^2p^2t^2}{m^2} \le \frac{1}{4}$$

for any $s \in (0, s_0)$. In particular, $\max_{1\le i\le p}|\lambda_i| < \frac{1}{2}$. From now on, we assume $s \in (0, s_0)$. Write $\log(1+x) = x - x^2\epsilon(x)$ for $|x| < 1$. Then, by Taylor's expansion, $\epsilon(x) = \sum_{i=0}^{\infty}(-1)^i\frac{1}{i+2}x^i$. Thus, $|\epsilon(x)| \le \frac{1}{2}\sum_{i=0}^{\infty}|x|^i = \frac{1}{2(1-|x|)}$ for all $|x| < 1$. So $\sup_{|x|\le 1/2}|\epsilon(x)| \le 1$. This indicates that, under $U_n \le b$,

$$\log Q_n = -\Big(\frac{m}{2}+t\Big)\cdot\log|I + D_n\Delta_nD_n| = -\Big(\frac{m}{2}+t\Big)\sum_{i=1}^{p}\log(1+\lambda_i) = v\sum_{i=1}^{p}\lambda_i^2\,\epsilon(\lambda_i)$$
by the first identity from (2.28), where $\max_{1\le i\le p}|\epsilon(\lambda_i)| \le 1$. This and (2.28) say that

$$Q_nI(U_n \le b) \le e^{v\Psi_n} \le e^{m\Psi_n} \qquad (2.57)$$

for all $n \ge n_4 \ge n_3$, where $\Psi_n := \sum_{i\ne j} r_{ij}^2V_iV_j$. Now we show (2.55) by distinguishing three cases.

Cases (i) and (ii). Recall that Case (i) says $\inf_{n\ge 6}\frac{p_n}{n} > 0$ and Case (ii) says $\inf_{n\ge 6}\frac{1}{n}\mathrm{tr}(\Delta_n^2) > 0$. The inequalities from (2.56) and (2.57) imply that

$$Q_nI(Q_n \ge \beta,\, U_n \le b) \le e^{m\Psi_n}\,I\Big(\frac{\log\beta}{m} \le \Psi_n \le \frac{M^2p^2t^2}{m^2}\Big) \qquad (2.58)$$
for all $\beta > 0$. By Lemma 2.15, there exists some $0 < \delta < s_0$ for which
$$P\Big(\sum_{i\ne j} r_{ij}^2V_iV_j \ge y\Big) \le \exp\Big(-\frac{1}{256}\cdot\frac{m^2y}{pt + m\sqrt{y}}\Big)$$

for all $y > \frac{1}{m}$, $s \in (0, \delta]$ and $n \ge n_4 + 6$. Since $pt = \frac{ps}{\sigma_n} \le ms$ by Lemma 2.6, we have
$$\frac{m}{pt + m\sqrt{y}} \ge \frac{m}{pt + Mpt} \ge \frac{1}{(M+1)s} \ge 512$$

for all $\frac{1}{m} < y \le \frac{M^2p^2t^2}{m^2}$ and $s \in (0, s_1]$, where $s_1 := \min\{\delta, (512(M+1))^{-1}\}$. Thus,

$$P(\Psi_n \ge y) \le e^{-2my}$$

for all $\frac{1}{m} < y \le \frac{M^2p^2t^2}{m^2}$, $s \in (0, s_1]$ and $n \ge n_4 + 6$. In (2.54), taking $b = b_1 := \frac{\log\beta}{m}$ with $\beta \ge e^2$, $c = \frac{M^2p^2t^2}{m^2}$, and then replacing $U_n$ by $\Psi_n$ and $h$ by $m$, we obtain
$$E\big[e^{m\Psi_n}I(b_1 \le \Psi_n \le c)\big] \le e^{mb_1}P(\Psi_n \ge b_1) + m\int_{b_1}^{c} e^{my}P(\Psi_n \ge y)\,dy \le e^{-mb_1} + m\int_{b_1}^{\infty} e^{-my}\,dy = 2e^{-mb_1} = \frac{2}{\beta}.$$

This and (2.58) imply

$$\sup_{n\ge n_4+6} E\big[Q_nI(Q_n \ge \beta,\, U_n \le b)\big] \le \frac{2}{\beta}$$

for every $s \in (0, s_1]$ and every $\beta \ge e^2$. This concludes (2.55).

Case (iii): $\sup_{n\ge 6}\frac{p_n\|\Delta_n\|_\infty}{\|\Delta_n\|_2} < \infty$. Let us continue from (2.57). Under $U_n \le b$,

$$\Psi_n \le \|\Delta_n\|_\infty^2\sum_{i\ne j} V_iV_j \le \|\Delta_n\|_\infty^2\cdot U_n^2 \le M^2\|\Delta_n\|_\infty^2\cdot\frac{p^2t^2}{m^2}.$$

Set $K = \sup_{n\ge 6}\frac{p_n\|\Delta_n\|_\infty}{\|\Delta_n\|_2}$. Then $\|\Delta_n\|_\infty \le \frac{K}{p_n}\|\Delta_n\|_2$, and thus

$$\Psi_n \le \frac{(MKs)^2}{m}\cdot\mathrm{tr}\big[(R_n - I)^2\big]\cdot\frac{1}{m\sigma_n^2} \le \frac{(MKs)^2}{m},$$
where the last assertion comes from Lemma 2.6. Therefore, by (2.57), we finally see

$$Q_nI(U_n \le b) \le e^{(MKs)^2}$$
for all $n \ge n_4$. Therefore,

$$Q_nI(Q_n \ge \beta,\, U_n \le b) = 0$$

as long as $\beta > e^{(MKs)^2}$. This proves (2.55). $\square$
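As a quick numerical sanity check (not part of the proof), the determinant lower bound $|I + \Delta_n\cdot\mathrm{diag}(V_1,\cdots,V_p)| \ge |I - a_nD_n^2| = \prod_{i=1}^{p}(1 - a_nV_i)$ used at the start of the argument can be verified on simulated data; the correlation matrix and the Beta parameters below are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(1)
    n, p, rho, t = 60, 20, 0.2, 0.3                     # illustrative values
    R = rho ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
    Delta = R - np.eye(p)
    a_n = -np.linalg.eigvalsh(Delta).min()              # a_n = -lambda_min(Delta_n)

    for _ in range(5):
        V = rng.beta(t, (n - 1) / 2.0, size=p)          # V_i i.i.d. Beta(t, (n-1)/2)
        lhs = np.linalg.det(np.eye(p) + Delta @ np.diag(V))
        rhs = np.prod(1.0 - a_n * V)                    # |I - a_n D_n^2|
        print(lhs >= rhs)                               # expected: True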

2.6 Proofs of Theorem 1 and Corollary 2

With the understanding from Sections 2.1-2.5, we are now fully prepared to prove Theorem 1.

Proof of Theorem 1. Since $N(0,1)$ is uniquely determined by its moments $E[N(0,1)^k]$ for $k = 1, 2, \cdots$, to prove the theorem, by Proposition 1.2 and Lemma 2.10, it is enough to show

$$\lim_{n\to\infty} E\exp\Big(\frac{\log|\hat{R}_n| - \mu_n}{\sigma_n}\,s\Big) = e^{s^2/2} \qquad (2.59)$$
for all $s \in (0, s_0)$, where $s_0$ is a constant not depending on $n$. From the second conclusion in Lemma 2.1, we have $\log|\hat{R}_n| \le 0$, and hence the expectation above is finite for any $s \ge 0$ and $n \ge 6$. Define

$$\mu_{n,0} = \Big(p - n + \frac{3}{2}\Big)\log\Big(1 - \frac{p}{n-1}\Big) - \frac{n-2}{n-1}\,p;$$
$$\sigma_{n,0}^2 = -2\Big[\frac{p}{n-1} + \log\Big(1 - \frac{p}{n-1}\Big)\Big] \quad\text{and}\quad \sigma_{n,1}^2 = \frac{2}{n-1}\,\mathrm{tr}\big[(R_n - I)^2\big].$$
We then have

$$\mu_n = \mu_{n,0} + \log|R_n| \quad\text{and}\quad \sigma_n^2 = \sigma_{n,0}^2 + \sigma_{n,1}^2. \qquad (2.60)$$
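For concreteness, the centering $\mu_n$ and scaling $\sigma_n^2$ in (2.60) are explicit functions of $(n, p, R_n)$ and can be computed directly; a minimal sketch (the AR(1) matrix is only an illustrative choice of $R_n$):

    import numpy as np

    def clt_params(n, R):
        """mu_n and sigma_n^2 from (2.60); requires m = n - 1 > p."""
        p = R.shape[0]
        ratio = p / (n - 1.0)
        mu_n0 = (p - n + 1.5) * np.log(1.0 - ratio) - (n - 2.0) / (n - 1.0) * p
        sigma2_n0 = -2.0 * (ratio + np.log(1.0 - ratio))
        Delta = R - np.eye(p)
        sigma2_n1 = 2.0 / (n - 1.0) * np.trace(Delta @ Delta)
        return mu_n0 + np.linalg.slogdet(R)[1], sigma2_n0 + sigma2_n1

    n, p, rho = 200, 100, 0.1                   # illustrative values
    R = rho ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
    print(clt_params(n, R))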

Set $t = \frac{s}{\sigma_n}$ for $s > 0$ fixed. By Proposition 1.1,

$$E[|\hat{R}_n|^t] = \Big(\frac{\Gamma(\frac{n-1}{2})}{\Gamma(\frac{n-1}{2}+t)}\Big)^{p}\cdot\frac{\Gamma_p(\frac{n-1}{2}+t)}{\Gamma_p(\frac{n-1}{2})}\cdot|R_n|^t\cdot E\big[|I + \Delta_n\cdot\mathrm{diag}(V_1,\cdots,V_p)|^{-\frac{n-1}{2}-t}\big] = E[|\hat{R}_{n,0}|^t]\cdot|R_n|^t\cdot E\big[|I + \Delta_n\cdot\mathrm{diag}(V_1,\cdots,V_p)|^{-\frac{n-1}{2}-t}\big]$$

where $V_1,\cdots,V_p$ are i.i.d. with the Beta$(t,\frac{n-1}{2})$ distribution and

$$E[|\hat{R}_{n,0}|^t] = E[|\hat{R}_n|^t]\Big|_{\Delta_n=0} = \Big(\frac{\Gamma(\frac{n-1}{2})}{\Gamma(\frac{n-1}{2}+t)}\Big)^{p}\cdot\frac{\Gamma_p(\frac{n-1}{2}+t)}{\Gamma_p(\frac{n-1}{2})}.$$
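The Gamma-ratio in the last display is best evaluated on the log scale (the constant $\pi^{p(p-1)/4}$ in $\Gamma_p$ cancels in the ratio). A minimal sketch using SciPy's log-gamma and multivariate log-gamma functions; the parameter values are illustrative:

    import numpy as np
    from scipy.special import gammaln, multigammaln

    def log_moment_null(n, p, t):
        """log E[|R_hat_{n,0}|^t] via the Gamma-ratio formula above."""
        a = (n - 1.0) / 2.0
        return p * (gammaln(a) - gammaln(a + t)) + multigammaln(a + t, p) - multigammaln(a, p)

    print(log_moment_null(n=200, p=100, t=0.3))  # illustrative values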

Rewrite $E\exp\big(\frac{\log|\hat{R}_n|-\mu_n}{\sigma_n}\,s\big) = e^{-\mu_nt}\cdot E[|\hat{R}_n|^t]$. Then, to show (2.59), it suffices to check

$$\big(\log E[|\hat{R}_{n,0}|^t] - \mu_{n,0}t\big) + \log E\big[|I + \Delta_n\cdot\mathrm{diag}(V_1,\cdots,V_p)|^{-\frac{n-1}{2}-t}\big] := A_n + B_n \to \frac{s^2}{2} \qquad (2.61)$$
for each $s \in (0, s_0)$. The number "$s_0$" will be specified later. We will analyze $A_n$ and $B_n$ separately next.

Step 1: Analysis of $A_n$. By Lemma 2.9, there exists $s_0 > 0$ such that, for any subsequence $\{n_j;\, j \ge 1\}$ of positive integers with $\lim_{j\to\infty}\frac{p_{n_j}}{n_j} = y \in [0, 1]$,
$$\lim_{j\to\infty} E\exp\Big(\frac{\log|\hat{R}_{n_j,0}| - \mu_{n_j,0}}{\sigma_{n_j,0}}\,s\Big) = e^{s^2/2}, \qquad |s| \le s_0. \qquad (2.62)$$

Based on the fact that $\frac{p_n}{n} \in [0, 1]$ for all $n \ge 6$, by a subsequence argument, it is easy to see that

$$\lim_{n\to\infty} E\exp\Big(\frac{\log|\hat{R}_{n,0}| - \mu_{n,0}}{\sigma_{n,0}}\,s\Big) = e^{s^2/2}$$

for all $|s| \le 2s_1 := s_0$. This immediately implies

$$\frac{\log|\hat{R}_{n,0}| - \mu_{n,0}}{\sigma_{n,0}} \to N(0, 1) \qquad (2.63)$$
weakly as $n \to \infty$. From (2.60), $\frac{\sigma_{n,i}^2}{\sigma_n^2} \in [0, 1]$ for $i = 0, 1$ and all $n \ge 6$. Then, for any subsequence $\{l_k\}$, there exists a further subsequence $\{l_{k_j}\}$ such that

$$\lim_{j\to\infty}\frac{\sigma_{n,i}^2}{\sigma_n^2} = \alpha_i \in [0, 1] \quad\text{and}\quad \alpha_0 + \alpha_1 = 1 \qquad (2.64)$$

with $n = l_{k_j}$ for $i = 0, 1$. Define $Z_n = \exp\big(\frac{\log|\hat{R}_{n,0}| - \mu_{n,0}}{\sigma_n}\,s\big)$ for $n \ge 6$ and $s \ge 0$. Then $Z_n \to \exp(s\sqrt{\alpha_0}\,N(0,1))$ weakly along the subsequence $\{l_{k_j}\}$ by (2.63). Furthermore, since $0 < |\hat{R}_{n,0}| \le 1$, $Z_n$ is a bounded random variable for each $n \ge 6$. Thus $\sup_{n\ge 6} E(Z_n^2) < \infty$ for all $|s| \le s_1$ by (2.62) and the fact $\sigma_n^2 \ge \sigma_{n,0}^2$. It follows that $\{Z_n;\, n \ge 6\}$ is uniformly integrable. So

$$\lim_{j\to\infty} E\exp\Big(\frac{\log|\hat{R}_{n,0}| - \mu_{n,0}}{\sigma_n}\,s\Big) = E\exp\big(s\sqrt{\alpha_0}\,N(0,1)\big) = e^{\alpha_0s^2/2}$$
for all $|s| \le s_1$ if "$n$" is replaced by "$l_{k_j}$". So
$$A_n \to \frac{\alpha_0s^2}{2} \qquad (2.65)$$
along the subsequence $\{l_{k_j}\}$ for all $|s| \le s_1$.

Step 2: Analysis of $B_n$. Recall $\sigma_{n,1}^2 = \frac{2}{n-1}\mathrm{tr}\big[(R_n - I)^2\big]$. Then,

$$\frac{t^2}{n-1}\,\mathrm{tr}(\Delta_n^2) = \frac{s^2}{2}\cdot\frac{\sigma_{n,1}^2}{\sigma_n^2} \to \frac{\alpha_1s^2}{2}$$
along the subsequence $\{l_{k_j}\}$ by (2.64). We then have from Lemma 2.12 that

$$|I + \Delta_n\cdot\mathrm{diag}(V_1,\cdots,V_p)|^{-\frac{n-1}{2}-t} \to e^{\alpha_1s^2/2}$$

in probability along the subsequence $\{l_{k_j}\}$ for any $s \ge 0$. Lemma 2.16 confirms that the left hand side above is uniformly integrable for each $s \in [0, s_2]$, where $s_2$ is a constant. We thus conclude

$$B_n = \log E\big[|I + \Delta_n\cdot\mathrm{diag}(V_1,\cdots,V_p)|^{-\frac{n-1}{2}-t}\big] \to \frac{\alpha_1s^2}{2} \qquad (2.66)$$
along the subsequence $\{l_{k_j}\}$ for each $s \in [0, s_2]$.

We now make a summary. It is shown that, for any subsequence $\{l_k;\, k \ge 1\}$ of $\{n;\, n \ge 1\}$, we find a further subsequence $\{l_{k_j};\, j \ge 1\}$ such that (2.65) and (2.66) hold along the subsequence $\{l_{k_j};\, j \ge 1\}$ for all $s \in [0, s_3]$, where $s_3 := s_1 \wedge s_2$. This ensures (2.61) by the second equality in (2.64) and the subsequence argument again. $\square$
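As an empirical sanity check of the theorem (not part of the proof), one can simulate $(\log|\hat{R}_n| - \mu_n)/\sigma_n$ and compare it with $N(0,1)$; the sample size, dimension and AR(1) correlation matrix below are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    n, p, rho = 300, 150, 0.1                                   # illustrative values
    R = rho ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
    Delta = R - np.eye(p)

    # centering and scaling from (2.60)
    ratio = p / (n - 1.0)
    mu_n = (p - n + 1.5) * np.log(1 - ratio) - (n - 2.0) / (n - 1.0) * p + np.linalg.slogdet(R)[1]
    sigma_n = np.sqrt(-2.0 * (ratio + np.log(1 - ratio)) + 2.0 / (n - 1.0) * np.trace(Delta @ Delta))

    L = np.linalg.cholesky(R)
    stats = []
    for _ in range(500):
        X = rng.standard_normal((n, p)) @ L.T                   # rows i.i.d. N_p(0, R)
        Rhat = np.corrcoef(X, rowvar=False)                     # sample correlation matrix
        stats.append((np.linalg.slogdet(Rhat)[1] - mu_n) / sigma_n)

    print(np.mean(stats), np.std(stats))                        # should be roughly 0 and 1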

Proof of Corollary 2. By Lemma 2.3, $\lambda_{\min}(R_n) \in [0, 1]$. Let $\lambda_1,\cdots,\lambda_p$ be the eigenvalues of $R_n$. Then, from the Geršgorin disc theorem [see, e.g., p. 344 of Horn and Johnson (1990)], we have
$$|1 - \lambda_k| \le \max_{1\le i\le p}\sum_{j\ne i}|r_{ij}|$$

for any $1 \le k \le p$. In particular, this entails $1 - \lambda_{\min}(R_n) \le \max_{1\le i\le p}\sum_{j\ne i}|r_{ij}|$. Now we use this fact to prove the corollary.

(i) Notice $\max_{1\le i\le p}\sum_{j\ne i}|r_{ij}| \le 2\big(|\rho| + \cdots + |\rho|^{p-1}\big) \le \frac{2|\rho|}{1-|\rho|}$. Hence
$$\lambda_{\min}(R_n) \ge 1 - \frac{2|\rho|}{1-|\rho|} > \frac{1}{2}$$

if $|\rho| < \frac{1}{5}$. Then, the CLT in Theorem 1 holds under the assumption $\inf_{n\ge 6}\frac{p_n}{n} > 0$.

(ii) First, fix $n \ge 6$. Then
$$1 - \lambda_{\min}(R_n) \le \max_{1\le i\le p}\sum_{j\ne i}|r_{ij}| \le \max_{1\le i\le p}\#\{j \ne i;\, |i-j| \le k\}\cdot\max_{i\ne j}|r_{ij}| \le (2k)\cdot\sup_{n\ge 6}\max_{i\ne j}|r_{ij}|.$$
If $\sup_{n\ge 6}\max_{i\ne j}|r_{ij}| < \frac{1}{4k}$, then $\inf_{n\ge 6}\lambda_{\min}(R_n) > \frac{1}{2}$. Hence, the CLT in Theorem 1 is valid provided $\inf_{n\ge 6}\frac{p_n}{n} > 0$. $\square$
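The Geršgorin bound in part (i) is easy to check numerically. Assuming the AR(1) form $r_{ij} = \rho^{|i-j|}$ (one matrix consistent with the geometric bound used above) and the illustrative value $\rho = 0.15 < \frac{1}{5}$:

    import numpy as np

    p, rho = 400, 0.15                                   # |rho| < 1/5, illustrative
    R = rho ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
    lam_min = np.linalg.eigvalsh(R).min()
    bound = 1 - 2 * abs(rho) / (1 - abs(rho))            # Gersgorin lower bound, > 1/2
    print(lam_min, bound)                                # lam_min should exceed the bound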

Acknowledgement. We thank a referee for his constructive suggestions on each section.

References

[1] Anderson, T. (1958). An Introduction to Multivariate Statistical Analysis. John Wiley & Sons, 2nd Ed.
[2] Bartlett, M. S. (1954). A note on multiplying factors for various chi-squared approximations. J. Royal Stat. Soc., Ser. B 16, 296-298.
[3] Billingsley, P. (1986). Probability and Measure. Wiley Series in Probability and Mathematical Statistics, 2nd Ed.
[4] Bai, Z. and Silverstein, J. (2009). Spectral Analysis of Large Dimensional Random Matrices. Springer, 2nd Ed.
[5] Bao, Z., Pan, G. and Zhou, W. (2012). Tracy-Widom law for the extreme eigenvalues of sample correlation matrices. Electron. J. Probab. 17, 1-32; DOI: 10.1214/EJP.v17-1962.
[6] Brockwell, P. J. and Davis, R. A. (2002). Introduction to Time Series and Forecasting. Springer, New York.
[7] Cai, T., Fan, J. and Jiang, T. (2013). Distributions of angles in random packing on spheres. J. Mach. Learn. Res. 14, 1837-1864.
[8] Cai, T., Liang, T. and Zhou, H. (2015). Law of log determinant of sample covariance matrix and optimal estimation of differential entropy for high-dimensional Gaussian distributions. J. Multivariate Anal. 137, 161-172.
[9] Chow, Y. S. and Teicher, H. (1988). Probability Theory: Independence, Interchangeability, Martingales. Springer, 2nd Ed.
[10] Dembo, A. and Zeitouni, O. (2009). Large Deviations Techniques and Applications. Springer, 2nd Ed.
[11] Dong, Z., Jiang, T. and Li, D. (2012). Circular law and arc law for truncation of random unitary matrix. J. Math. Phys. 53, 013301-14.

[12] Eaton, M. (1983). Multivariate Statistics: A Vector Space Approach (Wiley Series in Probability and Statistics). John Wiley & Sons Inc.
[13] Hanson, D. L. and Wright, F. T. (1971). A bound on tail probabilities for quadratic forms in independent random variables. Ann. Math. Statist. 42, 1079-1083.
[14] Horn, R. A. and Johnson, C. R. (1985). Matrix Analysis. Cambridge University Press.
[15] Jiang, T. (2004a). The asymptotic distributions of the largest entries of sample correlation matrices. Ann. Appl. Probab. 14, 865-880.

[16] Jiang, T. (2004b). The limiting distributions of eigenvalues of sample correlation matrices. Sankhya 66, 35-48.
[17] Jiang, T. and Qi, Y. (2015). Likelihood ratio tests for high-dimensional normal distributions. Scand. J. of Stat. 42(4), 988-1009.

[18] Jiang, T. and Yang, F. (2013). Central limit theorems for classical likelihood ratio tests for high-dimensional normal distributions. Ann. Stat. 41(4), 2029-2074.
[19] Li, D., Liu, W. and Rosalsky, A. (2010). Necessary and sufficient conditions for the asymptotic distribution of the largest entry of a sample correlation matrix. Probab. Theory and Related Fields 148(1), 5-35.

[20] Li, D. and Rosalsky, A. (2006). Some strong limit theorems for the largest entries of sample correlation matrices. Ann. Appl. Probab. 16(1), 423-447.
[21] Morrison, D. F. (2004). Multivariate Statistical Methods. Duxbury Press, 4th Ed.
[22] Muirhead, R. J. (1982). Aspects of Multivariate Statistical Theory. Wiley, New York.

[23] Nguyen, H. H. and Vu, V. (2014). Random matrices: Law of the determinant. Ann. Probab. 42(1), 146-167.
[24] Rudelson, M. and Vershynin, R. (2013). Hanson-Wright inequality and sub-gaussian concentration. Electron. Commun. Probab. 18(82), 1-9.

[25] Smale, S. (2000). Mathematical problems for the next century. Mathematics: Frontiers and Perspectives (Ed. V. Arnold, M. Atiyah, P. Lax, and B. Mazur), 271-294.
[26] Tao, T. and Vu, V. (2012). A central limit theorem for the determinant of a Wigner matrix. Adv. Math. 231(1), 74-101.
[27] Wilks, S. S. (1932). Certain generalizations in the analysis of variance. Biometrika 24, 471-494.
[28] Yurinskiĭ, V. V. (1976). Exponential inequalities for sums of random vectors. J. Multivar. Anal. 6, 473-499.
[29] Zhou, W. (2007). Asymptotic distribution of the largest off-diagonal entry of correlation matrices. Trans. Amer. Math. Soc. 359(11), 5345-5363.
