Determinant of Sample Correlation Matrix with Application
Tiefeng Jiang^1, University of Minnesota
Abstract
Let $x_1, \cdots, x_n$ be independent random vectors from a common $p$-dimensional normal distribution with population correlation matrix $R_n$. The sample correlation matrix $\hat{R}_n = (\hat{r}_{ij})_{p\times p}$ is generated from $x_1, \cdots, x_n$ such that $\hat{r}_{ij}$ is the Pearson correlation coefficient between the $i$-th column and the $j$-th column of the data matrix $(x_1, \cdots, x_n)'$. The matrix $\hat{R}_n$ is a popular object in multivariate analysis and has many connections to other problems.
We derive a central limit theorem (CLT) for the logarithm of the determinant of $\hat{R}_n$ for a large class of $R_n$. The expressions of the mean and the variance in the CLT are not obvious and were not known before. In particular, the CLT holds if $p/n$ has a non-zero limit and the smallest eigenvalue of $R_n$ is larger than $1/2$. In addition, a formula for the moments of $|\hat{R}_n|$ and a new method of showing weak convergence are introduced. We apply the CLT to a high-dimensional statistical test.
Keywords: Central limit theorem, sample correlation matrix, smallest eigenvalue, multivariate normal distribution, moment generating function.
AMS 2000 Subject Classification: 60B20 and 60F05.
^1 School of Statistics, University of Minnesota, 224 Church Street S.E., MN 55455, USA, [email protected]. The research was supported in part by NSF Grants DMS-1209166 and DMS-1406279.
1 Introduction
We first give background on the sample correlation matrix and then state our main result. Two new tools are introduced, and the method of proof is elaborated by using them. Finally, an application is presented.
1.1 Main Results
Let $x_1, \cdots, x_n$ be a sequence of independent random vectors from a common distribution $N_p(\mu, \Sigma)$, that is, a $p$-dimensional normal distribution with mean vector $\mu$ and covariance matrix $\Sigma = (\sigma_{ij})_{p\times p}$. We always assume $\sigma_{ii} > 0$ for each $i$ to avoid trivial cases. In order to take limits in $n$, the dimension $p$ is assumed to depend on $n$ and is written as $p_n$. Sometimes we write $p$ for $p_n$ and $\Sigma$ for $\Sigma_n$ to ease notation. The corresponding correlation matrix $R_n = (r_{ij})_{p\times p}$ is defined by
$$r_{ii} = 1 \quad \text{and} \quad r_{ij} = \frac{\sigma_{ij}}{\sqrt{\sigma_{ii}\sigma_{jj}}} \tag{1.1}$$
for all $1 \le i \ne j \le p$. Write $X = (x_{ij})_{n\times p} = (x_1, \cdots, x_n)'$. Let $\hat{r}_{ij}$ denote the Pearson correlation coefficient between $(x_{1i}, \cdots, x_{ni})'$ and $(x_{1j}, \cdots, x_{nj})'$, given by
$$\hat{r}_{ij} = \frac{\sum_{k=1}^n (x_{ki} - \bar{x}_i)(x_{kj} - \bar{x}_j)}{\sqrt{\sum_{k=1}^n (x_{ki} - \bar{x}_i)^2 \cdot \sum_{k=1}^n (x_{kj} - \bar{x}_j)^2}} \tag{1.2}$$
where $\bar{x}_i = \frac{1}{n}\sum_{k=1}^n x_{ki}$ and $\bar{x}_j = \frac{1}{n}\sum_{k=1}^n x_{kj}$. Then the sample correlation matrix obtained from the random sample $x_1, \cdots, x_n$ is defined by
$$\hat{R}_n := (\hat{r}_{ij})_{p\times p}. \tag{1.3}$$
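As a numerical companion to (1.2)-(1.3), the sample correlation matrix can be assembled directly from a data matrix. The sketch below (Python with numpy; the data are simulated purely for illustration) checks the construction against numpy's built-in `corrcoef`:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 5
X = rng.standard_normal((n, p))  # toy data matrix; rows play the role of x_1, ..., x_n

# Pearson coefficients as in (1.2): center each column, then normalize the
# cross-products by the products of the column deviation norms.
Xc = X - X.mean(axis=0)
norms = np.sqrt((Xc ** 2).sum(axis=0))
R_hat = (Xc.T @ Xc) / np.outer(norms, norms)  # the matrix (1.3)

assert np.allclose(R_hat, np.corrcoef(X, rowvar=False))  # matches numpy's corrcoef
assert np.allclose(np.diag(R_hat), 1.0)                  # diagonal entries are 1
```

Since $n > p$ here, the resulting matrix is positive definite, in line with the non-singularity requirement discussed next.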
A natural requirement for non-singularity of $\hat{R}_n$ is $n > p$. In this paper we will prove that $\log|\hat{R}_n|$, the logarithm of the determinant of $\hat{R}_n$, satisfies the central limit theorem (CLT) under suitable assumptions on $\Sigma$. Our theorem holds for a large class of $R_n$ containing $I$; in particular, it holds for $R_n$ generated from $x_1$ as in (1.1) such that the $p$ entries of $x_1$ are far from independent. Next, we review the literature on $\hat{R}_n$ and explain our motivation to investigate $\log|\hat{R}_n|$. The sample correlation matrix $\hat{R}_n$ is a popular object in multivariate analysis, a subject in statistics; see, e.g., the classical books by Anderson (1958), Muirhead (1982) and Eaton (1983).
In particular, $|\hat{R}_n|$ is the likelihood ratio test statistic for testing that the $p$ entries of $x_1$ are independent, but not necessarily identically distributed. If $R_n = I$, some properties of $\hat{R}_n$ are understood. For example, the density of $\hat{R}_n$ is given by
$$\text{Constant} \cdot |\hat{R}_n|^{(n-p-2)/2}\, d\hat{R}_n;$$
see, e.g., Theorem 5.1.3 in Muirhead (1982). Unfortunately, the density of the eigenvalues of
$\hat{R}_n$ is not known, because $\hat{R}_n$ does not have the orthogonal invariance that standard random matrices (e.g., the Hermite ensemble, the Laguerre ensemble and the Jacobi ensemble) usually have. As a consequence, unlike for typical random matrices, research on $\hat{R}_n$ cannot rely on the density of its eigenvalues. This makes the analysis more involved.
On the other hand, the largest entries of $\hat{R}_n$ are related to other statistical testing problems, to random packing on spheres in $\mathbb{R}^p$, and to the seventh of Smale's most challenging problems for the 21st century [Jiang (2004a) and Cai et al. (2013)]. For example, Smale (2000) asks for a packing of $n$ points on the unit sphere such that the product of all pairwise distances among the $n$ points almost attains its largest value. There are also investigations of the least moment condition that guarantees the limit law of the largest element of $\hat{R}_n$; see, for example, Li and Rosalsky (2006), Zhou (2007) and Li et al. (2010).
Again, under $R_n = I$, some of the behaviors of the eigenvalues of $\hat{R}_n$ are known. The empirical distribution of the eigenvalues of $\hat{R}_n$ satisfies the Marčenko–Pastur law (Jiang, 2004b); the largest eigenvalue of $\hat{R}_n$ asymptotically satisfies the Tracy–Widom law [Bao et al. (2012)]; the quantity $\log|\hat{R}_n|$ satisfies the CLT [Jiang and Yang (2013) and Jiang and Qi (2015)]. To our knowledge, little is understood about $\hat{R}_n$ when $R_n \ne I$. In particular, there is no CLT for $\log|\hat{R}_n|$ when $R_n \ne I$. Our motivations in this paper for studying the CLT of $\log|\hat{R}_n|$ for $\Sigma \ne I$ are as follows. First, we plan to understand its behavior in general. Second, there is some recent interest in the determinants of various random matrices. For instance, Tao and Vu (2012) and Nguyen and Vu (2014) derive the CLT for the determinants of Wigner matrices; Cai et al. (2015) derive the
CLT for sample covariance matrices. So it is natural to look into $|\hat{R}_n|$. Lastly, when we study the power of the hypothesis test $H_0$: the entries of $x_1$ are completely independent, we need to derive the asymptotic distribution of $\log|\hat{R}_n|$ under $R_n \ne I$. This problem arises in high-dimensional statistics, which together with machine learning provides some of the most important tools for the study of big data. We will further elaborate on the test in Section 1.3. Before introducing our main results, we need some notation. For a square matrix $M = (m_{ij})_{p\times p}$, set $\|M\|_\infty = \max_{1\le i,j\le p}|m_{ij}|$ and $\|M\|_2 = (\sum_{i,j}|m_{ij}|^2)^{1/2}$. We denote by $\|M\|$ the spectral norm of $M$, equivalently, the largest singular value of $M$. The notation $|M|$ stands for the determinant of $M$. If $M$ is symmetric, its smallest eigenvalue is denoted by $\lambda_{\min}(M)$. In particular, if $R_n$ is positive definite, then $\lambda_{\min}(R_n) \in (0,1]$ due to the fact $r_{ii} = 1$ (see also Lemma 2.3). Keep in mind that $R_n$ is non-random while $\hat{R}_n$ is a random matrix constructed from data. We adopt the convention $\frac{0}{0} = 1$. In this paper, we derive the following CLT for the sample correlation matrix.
THEOREM 1 Assume $p := p_n$ satisfies $n > p + 4$ and $p \to \infty$. Let $x_1, \cdots, x_n$ be i.i.d. from $N_p(\mu, \Sigma)$ and $\hat{R}_n$ be as in (1.3). Assume $\inf_{n\ge 6} \lambda_{\min}(R_n) > \frac{1}{2}$. Set
$$\mu_n = \Big(p - n + \frac{3}{2}\Big)\log\Big(1 - \frac{p}{n-1}\Big) - \frac{n-2}{n-1}\,p + \log|R_n|;$$
$$\sigma_n^2 = -2\Big[\frac{p}{n-1} + \log\Big(1 - \frac{p}{n-1}\Big)\Big] + \frac{2}{n-1}\,\mathrm{tr}\big[(R_n - I)^2\big].$$
Then $(\log|\hat{R}_n| - \mu_n)/\sigma_n$ converges weakly to $N(0,1)$ as $n \to \infty$ provided one of the following holds: (i) $\inf_{n\ge 6} \frac{p_n}{n} > 0$; (ii) $\inf_{n\ge 6} \frac{1}{n}\,\mathrm{tr}[(R_n - I)^2] > 0$; (iii) $\sup_{n\ge 6} \frac{p_n\|R_n - I\|_\infty}{\|R_n - I\|_2} < \infty$.
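The quantities in Theorem 1 are easy to evaluate numerically. The following sketch (Python/numpy; the sizes $n = 300$, $p = 100$ and the replication count are arbitrary illustrative choices) computes $\mu_n$ and $\sigma_n^2$ in the simplest covered case $R_n = I$ and standardizes simulated values of $\log|\hat{R}_n|$; by the theorem, the results should be approximately standard normal:

```python
import numpy as np

def mu_sigma2(n, p, log_det_R=0.0, tr_sq=0.0):
    """mu_n and sigma_n^2 from Theorem 1 (both extra terms vanish when R_n = I)."""
    r = p / (n - 1)
    mu = (p - n + 1.5) * np.log1p(-r) - (n - 2) / (n - 1) * p + log_det_R
    sigma2 = -2 * (r + np.log1p(-r)) + 2 / (n - 1) * tr_sq
    return mu, sigma2

# Simulation under R_n = I; condition (i) holds since p/n = 1/3 here.
rng = np.random.default_rng(1)
n, p, reps = 300, 100, 300
mu, sigma2 = mu_sigma2(n, p)
z = np.empty(reps)
for k in range(reps):
    X = rng.standard_normal((n, p))
    z[k] = (np.linalg.slogdet(np.corrcoef(X, rowvar=False))[1] - mu) / np.sqrt(sigma2)
print(z.mean(), z.std())  # should be roughly 0 and 1
```

The helper `mu_sigma2` is our own naming, not notation from the paper; for $R_n \ne I$ one would pass the extra arguments $\log|R_n|$ and $\mathrm{tr}[(R_n-I)^2]$.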
It is not intuitive to see or guess why $\mu_n$ and $\sigma_n^2$ have the expressions in Theorem 1; they come purely from computation. Condition (i) is valid in particular if $\frac{p_n}{n} \to c \in (0,1]$. Reading $\sigma_n^2$, condition (ii) says that the entries of $R_n$ and/or $p_n$ are large enough. Naturally, $\|R_n - I\|_2 \le p\|R_n - I\|_\infty$, so condition (iii) is equivalent to $\|R_n - I\|_2 \le p\|R_n - I\|_\infty \le K\|R_n - I\|_2$ for all $n \ge 6$, where $K$ is a constant not depending on $n$. This condition says that almost all of the entries of $R_n - I$ are of the same magnitude.
The condition $\inf_{n\ge 6}\lambda_{\min}(R_n) > \frac{1}{2}$ in Theorem 1 is used in Lemma 2.16. The number "$\frac{1}{2}$" comes essentially from the Gaussian density $\text{const}\cdot e^{-\frac{1}{2}x^2}$; literally, it is related to "$\frac{m}{2}$" in (1.5). It will be interesting to see if the limiting distribution is still the normal distribution when $\lambda_{\min}(R_n) \le \frac{1}{2}$. Take $R_n = I$ in Theorem 1; then $p_n\|R_n - I\|_\infty = \|R_n - I\|_2 = 0$, so (iii) above holds, and since $\log|R_n| = \mathrm{tr}[(R_n - I)^2] = 0$, the conclusion becomes Corollary 3 from Jiang and Qi (2015). Now let us look at the case where $R_n$ has a "compound symmetry structure", that is, all of the off-diagonal entries of $R_n$ are equal to some $a \in [0, 1/2)$. Then $R_n$ has eigenvalues $(1-a) + ap,\, 1-a, \cdots, 1-a$. Hence $\lambda_{\min}(R_n) = 1 - a > \frac{1}{2}$. Obviously, $\|R_n - I\|_\infty = a$ and $\|R_n - I\|_2 = a\sqrt{p(p-1)}$ for all $n \ge 6$, so condition (iii) from Theorem 1 holds. We then have the following conclusion.
COROLLARY 1 Assume $p := p_n$ satisfies $n > p + 4$ and $p \to \infty$. Let $x_1, \cdots, x_n$ be i.i.d. from $N_p(\mu, \Sigma)$ and $\hat{R}_n$ be as in (1.3). Assume the off-diagonal entries of $R_n$ are all equal to $a_n \ge 0$ for each $n$ and satisfy $\sup_{n\ge 4} a_n < 1/2$. Then $(\log|\hat{R}_n| - \mu_n)/\sigma_n$ converges weakly to $N(0,1)$ as $n \to \infty$, where $\mu_n$ and $\sigma_n$ are as in Theorem 1 with
$$|R_n| = (1 + a_n(p-1))(1 - a_n)^{p-1} \quad \text{and} \quad \mathrm{tr}\big[(R_n - I)^2\big] = p(p-1)a_n^2.$$
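The two closed forms can be confirmed numerically; a minimal check (Python/numpy, with arbitrary illustrative values of $p$ and $a$):

```python
import numpy as np

p, a = 40, 0.3  # off-diagonal value a < 1/2, so lambda_min(R_n) = 1 - a > 1/2
R = np.full((p, p), a) + (1 - a) * np.eye(p)  # compound-symmetry correlation matrix
D = R - np.eye(p)

# Closed forms from Corollary 1.
det_closed = (1 + a * (p - 1)) * (1 - a) ** (p - 1)
tr_closed = p * (p - 1) * a ** 2

assert np.isclose(np.linalg.det(R), det_closed)
assert np.isclose(np.trace(D @ D), tr_closed)
assert np.isclose(np.linalg.eigvalsh(R).min(), 1 - a)  # eigenvalue 1-a, multiplicity p-1
print(det_closed, tr_closed)
```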
Now we apply Theorem 1 to the correlation matrix of an AR(1) process and to a banded correlation matrix. The notation AR(1) stands for the autoregressive process of order one [see, e.g., Brockwell and Davis (2002)]. Both are very popular models in statistics.
COROLLARY 2 Assume the set-up in Theorem 1 with $\inf_{n\ge 6}\frac{p_n}{n} > 0$. The following is true.
(i) Let $\rho$ be a constant with $|\rho| < \frac{1}{5}$. If $R_n = (r_{ij})_{p\times p}$ with $r_{ij} = \rho^{|i-j|}$ for all $1 \le i, j \le p$, then the CLT in Theorem 1 holds.
(ii) Let $k \ge 1$ be a constant integer. Suppose $R_n = (r_{ij})_{p\times p}$ satisfies $r_{ij} = 0$ for $|j - i| > k$ and $\sup_{n\ge 6}\max_{i\ne j}|r_{ij}| < \frac{1}{4k}$. Then the CLT in Theorem 1 holds.
Corollary 2 will be checked at the end of Section 2.6. Besides the three special forms of $R_n$ studied above, there are many other patterned matrices, including Toeplitz matrices, Hankel matrices and symmetric circulant matrices; see, e.g., Brockwell and Davis (2002).
One needs to check whether the smallest eigenvalue of $R_n$ is larger than $1/2$; if so, the CLT holds. The proof of Theorem 1 is relatively lengthy and needs some new tools. The tools and the method of the proof are introduced in Section 1.2. In addition, we obtain some interesting matrix inequalities as byproducts; they are stated in Section 2.2.
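For a concrete patterned matrix, the eigenvalue condition can simply be checked numerically. The sketch below (Python/numpy; the values $p = 60$, $\rho = 0.19$ and the band entry $0.2$ are illustrative choices within the ranges of Corollary 2) verifies $\lambda_{\min}(R_n) > 1/2$ for the AR(1) and banded examples:

```python
import numpy as np

p = 60
idx = np.arange(p)

# AR(1) correlation matrix r_ij = rho^{|i-j|} with |rho| < 1/5, as in Corollary 2(i).
rho = 0.19
R_ar = rho ** np.abs(idx[:, None] - idx[None, :])
lam_ar = np.linalg.eigvalsh(R_ar).min()

# Banded matrix with k = 1 and off-band entries below 1/(4k) = 1/4, as in Corollary 2(ii).
R_band = np.eye(p) + 0.2 * (np.abs(idx[:, None] - idx[None, :]) == 1)
lam_band = np.linalg.eigvalsh(R_band).min()

print(lam_ar, lam_band)  # both exceed 1/2
```

For the AR(1) case the eigenvalues of the Toeplitz matrix stay above $(1-|\rho|)/(1+|\rho|)$, and for the tridiagonal case above $1 - 2\cdot 0.2$, so both minima are safely larger than $1/2$.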
1.2 New Tools and Strategy of Proofs
Under $\Sigma = I_p$, Jiang and Yang (2013) and Jiang and Qi (2015) prove that $L_n := (\log|\hat{R}_n| - \mu_n)/\sigma_n$ converges weakly to $N(0,1)$; see Theorem 1 with $\Sigma = I_p$. The argument consists of showing $\lim_{n\to\infty} Ee^{tL_n} = e^{t^2/2}$ for $t \in (-t_0, t_0)$. The proof may fail if the limit holds only for $t$ in a half interval $[0, t_0]$ or $[-t_0, 0]$. Of course, the exact expression of $E[|\hat{R}_n|^t]$ is crucial. So we would like to obtain a closed form of $E[|\hat{R}_n|^t]$ when $\Sigma \ne I_p$. To see what we are able to obtain, define
$$\Gamma_p(z) := \pi^{p(p-1)/4} \prod_{i=1}^p \Gamma\Big(z - \frac{1}{2}(i-1)\Big) \tag{1.4}$$
for complex numbers $z$ with $\mathrm{Re}(z) > \frac{1}{2}(p-1)$; see, e.g., p. 62 of Muirhead (1982). Throughout the paper, the notation $\mathrm{Beta}(a,b)$ stands for the beta distribution with probability density $\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}x^{a-1}(1-x)^{b-1}$ for $0 \le x \le 1$, where $a > 0$ and $b > 0$ are two parameters.
PROPOSITION 1.1 Let $x_1, \cdots, x_n$ be i.i.d. with distribution $N_p(\mu, \Sigma)$ and $n = m + 1 > p$. Recall $R_n$ and $\hat{R}_n$ are as in (1.1) and (1.3), respectively. Set $\Delta_n = R_n - I$. Then
$$E[|\hat{R}_n|^t] = \Big(\frac{\Gamma(\frac{m}{2})}{\Gamma(\frac{m}{2}+t)}\Big)^p \cdot \frac{\Gamma_p(\frac{m}{2}+t)}{\Gamma_p(\frac{m}{2})} \cdot |R_n|^t \cdot E\big[|I + \Delta_n \cdot \mathrm{diag}(V_1, \cdots, V_p)|^{-(m/2)-t}\big] \tag{1.5}$$
for each $t > 0$, where $V_1, \cdots, V_p$ are i.i.d. with $\mathrm{Beta}(t, \frac{m}{2})$-distribution.
If $\Delta_n = 0$, or equivalently $R_n = I$, Proposition 1.1 implies
$$E[|\hat{R}_n|^t] = \Big(\frac{\Gamma(\frac{m}{2})}{\Gamma(\frac{m}{2}+t)}\Big)^p \cdot \frac{\Gamma_p(\frac{m}{2}+t)}{\Gamma_p(\frac{m}{2})} \tag{1.6}$$
for all $t > 0$; see p. 150 of Muirhead (1982) or p. 492 of Wilks (1932). Jiang and Yang (2013) and Jiang and Qi (2015) further generalize this identity. Their results show that the formula is also true for a large range of negative values of $t$, by using the Carlson uniqueness theorem for complex functions. We feel that Proposition 1.1 deserves further study in the case $t < 0$.
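Identity (1.5), and its special case (1.6), can be checked by Monte Carlo for small $n$ and $p$. In the sketch below (Python with numpy and scipy; all sizes and the value $a = 0.3$ are illustrative, and $a = 0$ would recover (1.6)), both sides are estimated by simulation for a compound-symmetry $R_n$, with the $\mathrm{Beta}(t, m/2)$ expectation on the right also computed by simulation:

```python
import numpy as np
from scipy.special import gammaln, multigammaln

rng = np.random.default_rng(2)
n, p, t = 12, 3, 1.0
m = n - 1
a = 0.3
R = np.full((p, p), a) + (1 - a) * np.eye(p)   # compound-symmetry correlation matrix
Delta = R - np.eye(p)
reps = 20000

# Gamma-ratio constant shared by (1.5) and (1.6), computed via log-gamma functions.
const = np.exp(p * (gammaln(m / 2) - gammaln(m / 2 + t))
               + multigammaln(m / 2 + t, p) - multigammaln(m / 2, p))

# Left-hand side: E|R_hat_n|^t, simulating normal data with correlation R.
L = np.linalg.cholesky(R)
lhs = np.mean([np.linalg.det(np.corrcoef(rng.standard_normal((n, p)) @ L.T,
                                         rowvar=False)) ** t
               for _ in range(reps)])

# Right-hand side of (1.5): const * |R|^t * E|I + Delta diag(V)|^{-m/2 - t},
# with V_1, ..., V_p i.i.d. Beta(t, m/2).
V = rng.beta(t, m / 2, size=(reps, p))
dets = np.linalg.det(np.eye(p) + Delta[None, :, :] * V[:, None, :])
rhs = const * np.linalg.det(R) ** t * np.mean(dets ** (-(m / 2) - t))
print(lhs, rhs)  # the two estimates agree up to Monte Carlo error
```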
In the special case where the off-diagonal entries of $R_n$ are all equal to $a$ as in Corollary 1, it is seen from Lemma 2.4 later that
$$|I + \Delta_n \cdot \mathrm{diag}(V_1, \cdots, V_p)| = \Big[\prod_{i=1}^p (1 - aV_i)\Big] \cdot \Big(1 + \sum_{i=1}^p \frac{aV_i}{1 - aV_i}\Big) \tag{1.7}$$
for all $a \in [0,1)$. For a general $\Delta_n$, it seems there is no better expression. Even worse, the right-hand side of (1.5) involves the probability distribution $\mathrm{Beta}(t, \frac{m}{2})$, which forces $t > 0$. We may consider a complex continuation by representing the expectation in terms of integrals, in a way similar to the Riemann zeta function or the Gamma function. However, we do see an advantage on the right-hand side of (1.5): the expectation is taken over a function of i.i.d. random variables. To make use of this fact while respecting the restriction "$t > 0$", we develop a new tool. We say the distribution of a random variable $\xi$ is uniquely determined by its moments $\{E(\xi^p);\, p = 1, 2, \cdots\}$ if the following is true: for any random variable $\eta$ with $E(\xi^p) = E(\eta^p)$ for all $p = 1, 2, \cdots$, the probability distributions of $\xi$ and $\eta$ are identical.
PROPOSITION 1.2 Let $\{X_n;\, n = 0, 1, \cdots\}$ be random variables and $\delta > 0$ be a constant such that $Ee^{tX_n} < \infty$ for all $n \ge 0$ and $t \in [0, \delta]$. Assume $\sup_{n\ge 0} E(|X_n|^p) < \infty$ for each $p \ge 1$. Assume $\lim_{n\to\infty} Ee^{tX_n} = Ee^{tX_0}$ for all $t \in [0, \delta]$. If the distribution of $X_0$ can be determined uniquely by the moments $\{E(X_0^p);\, p = 1, 2, \cdots\}$, then $X_n$ converges weakly to $X_0$ as $n \to \infty$.
A classical result says that the above conclusion holds if "$t \in [0, \delta]$" is replaced by the stronger assumption "$|t| \le \delta$"; see, e.g., Billingsley (1986). It is interesting that the weak convergence in Proposition 1.2 is still valid under a "one-sided" condition on the moment generating functions, provided some extra requirements are fulfilled.
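A concrete instance of the setting of Proposition 1.2: take $X_n = (S_n - n)/\sqrt{n}$ with $S_n$ a sum of $n$ i.i.d. standard exponentials, so $Ee^{tX_n} = e^{-t\sqrt{n}}(1 - t/\sqrt{n})^{-n}$ converges to the $N(0,1)$ moment generating function $e^{t^2/2}$ on any interval $[0, \delta]$, the moments of $X_n$ are uniformly bounded, and the normal limit is moment-determinate. A numerical sketch (Python/numpy; this example is ours, not from the paper):

```python
import numpy as np

# MGF comparison on [0, delta]: E e^{t X_n} for X_n = (S_n - n)/sqrt(n)
# versus the standard normal MGF e^{t^2/2}.
n = 400
for t in [0.1, 0.3, 0.5]:
    mgf_n = np.exp(-t * np.sqrt(n)) * (1 - t / np.sqrt(n)) ** (-n)
    print(t, mgf_n, np.exp(t ** 2 / 2))  # the two values nearly coincide

# Empirical check of the weak convergence guaranteed by Proposition 1.2.
rng = np.random.default_rng(5)
X = (rng.standard_exponential((20000, n)).sum(axis=1) - n) / np.sqrt(n)
print(X.mean(), X.std())  # approximately 0 and 1
```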
With the above two conclusions, we are now ready to state the method of the proof of
Theorem 1. By Proposition 1.2, it suffices to show that there exists $s_0 > 0$ such that
$$\log E \exp\Big(s\,\frac{\log|\hat{R}_n| - \mu_n}{\sigma_n}\Big) = \log E[|\hat{R}_n|^t] - \mu_n t \to \frac{s^2}{2} \tag{1.8}$$
for all $s \in (0, s_0)$, where $t = \frac{s}{\sigma_n}$, and
$$\sup_{n\ge 6} E\Big[\Big(\frac{\log|\hat{R}_n| - \mu_n}{\sigma_n}\Big)^{2k}\Big] < \infty \tag{1.9}$$
for each integer $k \ge 1$. To get (1.8), by Proposition 1.1, we need to work on
$$\Big(\frac{\Gamma(\frac{m}{2})}{\Gamma(\frac{m}{2}+t)}\Big)^p \cdot \frac{\Gamma_p(\frac{m}{2}+t)}{\Gamma_p(\frac{m}{2})} \quad \text{and} \quad I_n := E\big[|I + \Delta_n \cdot \mathrm{diag}(V_1, \cdots, V_p)|^{-(m/2)-t}\big]. \tag{1.10}$$
The first quantity is understood well enough through Jiang and Qi (2015), so it suffices to derive an asymptotic formula for $I_n$. However, it seems hard to evaluate $I_n$ directly; this can be seen easily from the explicit case (1.7). To compute $I_n$, setting $Q_n := |I + \Delta_n \cdot \mathrm{diag}(V_1, \cdots, V_p)|^{-\frac{m}{2}-t}$, we will show
$$e^{-h_n} Q_n \text{ converges in probability to } 1; \tag{1.11}$$
$$\{e^{-h_n} Q_n;\, n \ge 6\} \text{ is uniformly integrable}, \tag{1.12}$$
where $h_n$ is an explicit constant depending on $t$. The two assertions imply $EQ_n \sim e^{h_n}$. Then (1.8) follows. The statements in (1.9), (1.11) and (1.12) are proved in Sections 2.3, 2.4 and 2.5, respectively. A detailed account of the above, which forms the proof of Theorem 1, is presented in Section 2.6.
1.3 An Application to a High-dimensional Likelihood Ratio Test
Recall $x_1, \cdots, x_n$ are i.i.d. random observations from a common $p$-variate normal distribution $N_p(\mu, \Sigma)$. The corresponding correlation matrix is given in (1.1). In applications, there is only one correlation matrix associated with $\Sigma$, so we drop the subscript and write it as $R$ instead of $R_n$ in this subsection. We test whether the $p$ components of $x_1$ are independent, which is equivalent to testing
$$H_0: R = I_p \quad \text{vs} \quad H_a: R \ne I_p. \tag{1.13}$$
When $p$ is fixed, the chi-square approximation holds under $H_0$:
$$-\Big(n - 1 - \frac{2p+5}{6}\Big)\log|\hat{R}_n| \text{ converges to } \chi^2_{p(p-1)/2} \tag{1.14}$$
in distribution as $n \to \infty$; see, e.g., Bartlett (1954) or p. 40 of Morrison (2004). According to the latter reference, the rejection region of the likelihood ratio test for (1.13) is
$|\hat{R}_n| \le c_\alpha$, where $c_\alpha$ is determined so that the test has significance level $\alpha$. For many modern data sets, the population dimension $p$ is very large relative to the sample size $n$, and the approximation (1.14) is
far from accurate. To correct this, Jiang and Yang (2013) and Jiang and Qi (2015) prove the CLT as in Theorem 1 for the case $\Sigma = I_p$. For a hypothesis testing problem, we not only need to know the asymptotic distribution of the test statistic under $H_0$ to make a decision; we also need to carry out another step, namely, to minimize the type II error, or equivalently, to make the so-called power as large as possible. So we have to develop the CLT for the case $H_a: R \ne I_p$. Theorem 1 provides us with the exact result. Define
$$\mu_{n,0} = \Big(p - n + \frac{3}{2}\Big)\log\Big(1 - \frac{p}{n-1}\Big) - \frac{n-2}{n-1}\,p;$$
$$\sigma_{n,0}^2 = -2\Big[\frac{p}{n-1} + \log\Big(1 - \frac{p}{n-1}\Big)\Big].$$
According to Theorem 1 at $R = I_p$, the asymptotic size-$\alpha$ test is given by the rejection region $\{\log|\hat{R}_n| \le c_\alpha\}$ with $c_\alpha = \mu_{n,0} + \sigma_{n,0}\Phi^{-1}(\alpha)$, where $\Phi(x) = (2\pi)^{-1/2}\int_{-\infty}^x e^{-t^2/2}\,dt$. By Theorem 1 again, the power function of the test is
$$\beta(R) = P\big(\log|\hat{R}_n| \le c_\alpha \,\big|\, R\big) \sim \Phi\Big(\frac{c_\alpha - \mu_n}{\sigma_n}\Big)$$
for all correlation matrices $R$ satisfying the conditions in Theorem 1.
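As a numerical illustration of the test (Python with numpy and scipy; $n$, $p$, $\alpha$ and the alternative value $a$ are arbitrary illustrative choices), one can compute the critical value $c_\alpha$ and the asymptotic power against a compound-symmetry alternative, using the closed forms for $\log|R|$ and $\mathrm{tr}[(R-I)^2]$ from Corollary 1:

```python
import numpy as np
from scipy.stats import norm

n, p, alpha = 200, 50, 0.05
r = p / (n - 1)

# Null-hypothesis centering and scaling (mu_{n,0}, sigma_{n,0}) and critical value.
mu0 = (p - n + 1.5) * np.log1p(-r) - (n - 2) / (n - 1) * p
sigma0 = np.sqrt(-2 * (r + np.log1p(-r)))
c_alpha = mu0 + sigma0 * norm.ppf(alpha)

# Asymptotic power against a compound-symmetry alternative with off-diagonal a.
a = 0.1
log_det_R = np.log1p(a * (p - 1)) + (p - 1) * np.log1p(-a)
mu_a = mu0 + log_det_R
sigma_a = np.sqrt(sigma0 ** 2 + 2 / (n - 1) * p * (p - 1) * a ** 2)
power = norm.cdf((c_alpha - mu_a) / sigma_a)
print(power)  # essentially 1 at these sample sizes
```

Even a modest common correlation $a$ makes $\log|R|$ strongly negative when $p$ is large, so the power approaches 1 quickly.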
2 Proofs
We will prove the main conclusions, that is, Theorem 1 and Propositions 1.1 and 1.2, in this section. The two propositions are almost self-contained results, and they will serve as tools to prove Theorem 1. We will work on them first. The proof of Theorem 1 will follow the scheme described in Section 1.2. Since the proof is relatively lengthy, we list next what will be done in each part.
Section 2.1: Proofs of Major Tools: Propositions 1.1 and 1.2.
Section 2.2: Auxiliary Results on Matrices and Gamma Functions.
Section 2.3: Moments of Logarithms of Determinants of Sample Correlation Matrices. The inequality (1.9) will be proved.
Section 2.4: Convergence of Random Determinants. The assertion (1.11) will be verified.
Section 2.5: Uniform Integrability of Random Determinants. The statement (1.12) will be justified.
Section 2.6: Proofs of Theorem 1 and Corollary 2.
2.1 Proofs of Major Tools: Propositions 1.1 and 1.2
Suppose $\xi_1, \cdots, \xi_m$ are i.i.d. $\mathbb{R}^p$-valued random vectors with distribution $N_p(0, \Sigma)$, where $0 \in \mathbb{R}^p$ and $\Sigma$ is a $p\times p$ positive definite matrix. We then say the $p\times p$ matrix $W = (\xi_1, \cdots, \xi_m)(\xi_1, \cdots, \xi_m)'$ follows the Wishart distribution and denote this by $W \sim W_p(m, \Sigma)$. For two $r\times s$ random matrices $M_1$ and $M_2$, we use the notation $M_1 \overset{d}{=} M_2$ to represent that the two sets of $rs$ random variables, in order, have the same joint distribution.
LEMMA 2.1 Let $x_1, \cdots, x_n$ be i.i.d. with distribution $N_p(\mu, \Sigma)$ and $n = m + 1 > p$. Let $W = (W_{ij})$ follow the Wishart distribution $W_p(m, \Sigma)$. Recall $\hat{R}_n$ in (1.3). Then
$$\hat{R}_n \overset{d}{=} \Big(\frac{W_{ij}}{\sqrt{W_{ii} \cdot W_{jj}}}\Big)_{p\times p}. \tag{2.1}$$
In particular, $\hat{R}_n$ is a positive definite matrix with $0 < |\hat{R}_n| \le 1$.
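The distributional identity (2.1) lends itself to a quick Monte Carlo sanity check: the law of $\log|\hat{R}_n|$ computed from data should match that of the normalized Wishart matrix. A sketch (Python/numpy; sizes and $\Sigma$ are illustrative, and the Wishart matrix is generated directly from its definition):

```python
import numpy as np

rng = np.random.default_rng(3)
p, n = 4, 20
m = n - 1
Sigma = 0.5 * np.eye(p) + 0.5 * np.ones((p, p))  # unit variances, correlation 0.5
reps = 4000

# Route 1: sample correlation matrix computed from n normal observations.
def logdet_from_data():
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    return np.linalg.slogdet(np.corrcoef(X, rowvar=False))[1]

# Route 2: W ~ W_p(m, Sigma) built from m normal vectors, normalized as in (2.1).
def logdet_from_wishart():
    Z = rng.multivariate_normal(np.zeros(p), Sigma, size=m)
    W = Z.T @ Z
    d = 1.0 / np.sqrt(np.diag(W))
    return np.linalg.slogdet(W * np.outer(d, d))[1]

a_mean = np.mean([logdet_from_data() for _ in range(reps)])
b_mean = np.mean([logdet_from_wishart() for _ in range(reps)])
print(a_mean, b_mean)  # the two averages agree up to Monte Carlo error
```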
Proof of Lemma 2.1. Write $X = (x_1, \cdots, x_n)' = (x_{ij})_{n\times p} = (y_1, \cdots, y_p)$. Define $e = (1, \cdots, 1)' \in \mathbb{R}^n$. Recalling the definition of the Pearson correlation coefficient in (1.2), it is easily seen that the correlation between $y_i = (x_{1i}, \cdots, x_{ni})'$ and $y_j = (x_{1j}, \cdots, x_{nj})'$ is the same as that between $y_i + ae$ and $y_j + be$ for any $a \in \mathbb{R}$ and $b \in \mathbb{R}$. Thus, without loss of generality, we can assume $\mu = 0$. Set $M = I_n - \frac{1}{n}ee'$. Recall $\hat{R}_n = (\hat{r}_{ij})_{p\times p}$. From (1.2) and (1.3), it is straightforward to check that
$$\hat{r}_{ij} = \frac{(My_i)'(My_j)}{\|My_i\| \cdot \|My_j\|} \tag{2.2}$$
for all $1 \le i, j \le p$. Let $\tilde{x}_1, \cdots, \tilde{x}_n$ be i.i.d. random vectors with distribution $N_p(0, I_p)$. Then,
$$(My_1, \cdots, My_p) = M(x_1, \cdots, x_n)' \overset{d}{=} M(\Sigma^{1/2}\tilde{x}_1, \cdots, \Sigma^{1/2}\tilde{x}_n)' = M(\tilde{x}_1, \cdots, \tilde{x}_n)'\,\Sigma^{1/2}, \tag{2.3}$$
where $\Sigma^{1/2}$ is a positive definite matrix satisfying $\Sigma^{1/2}\cdot\Sigma^{1/2} = \Sigma$. Since $M^2 = M$ and $\mathrm{tr}(M) = n - 1 = m$, there is an $n\times n$ orthogonal matrix $\Gamma$ such that $M = \Gamma\,\mathrm{diag}(I_m, 0)\,\Gamma'$. Hence, by the invariance property of $N_p(0, I_p)$,
$$M(\tilde{x}_1, \cdots, \tilde{x}_n)' \overset{d}{=} \Gamma\,\mathrm{diag}(I_m, 0)(\tilde{x}_1, \cdots, \tilde{x}_n)' = \Gamma\begin{pmatrix} U \\ 0' \end{pmatrix} = \Gamma_1 U,$$
where $0 = (0, \cdots, 0)' \in \mathbb{R}^p$, all of the entries of $U = U_{m\times p}$ are i.i.d. standard normals, and $\Gamma_1$ is the $n\times m$ matrix obtained by deleting the last column of $\Gamma$. This together with (2.3) implies that
$$(My_1, \cdots, My_p) \overset{d}{=} \Gamma_1 U \Sigma^{1/2}.$$
Use the fact $\Gamma_1'\Gamma_1 = I_m$ to obtain
$$(W_{ij})_{p\times p} := (My_1, \cdots, My_p)'(My_1, \cdots, My_p) \overset{d}{=} (\Sigma^{1/2}U')(\Sigma^{1/2}U')'.$$
In particular, $W_{ij} = (My_i)'(My_j)$ and $W_{ii} = \|My_i\|^2$ for all $i, j$. Evidently, the $m$ columns of the $p\times m$ matrix $\Sigma^{1/2}U'$ are i.i.d. random vectors with distribution $N_p(0, \Sigma)$. The conclusion (2.1) then follows from (2.2) and the definition of a Wishart matrix. Finally, write $\hat{R}_n = QWQ$ where $Q := \mathrm{diag}(W_{11}^{-1/2}, \cdots, W_{pp}^{-1/2})$. For $m \ge p$, the Wishart matrix $W$ is positive definite. Thus, $\hat{R}_n$ is positive definite. Since all of the diagonal entries of $\hat{R}_n$ are equal to 1, we see $|\hat{R}_n| \le \prod_{i=1}^p (\hat{R}_n)_{ii} = 1$ by the Hadamard inequality.
We now use Lemma 2.1 to prove Proposition 1.1.
Proof of Proposition 1.1. Let us continue to use the notation in the proof of Lemma 2.1. By Theorem 3.2.1 of Muirhead (1982), the density function of $W$ is given by
$$C_{m,p}\cdot e^{-\frac{1}{2}\mathrm{tr}(\Sigma^{-1}W)}\cdot |W|^{(m-p-1)/2} \tag{2.4}$$
for all positive definite matrices $W$, where $\Gamma_p(\frac{1}{2}m)$ is the multivariate Gamma function defined in (1.4) and
$$C_{m,p} = \frac{1}{2^{mp/2}\,\Gamma_p(\frac{1}{2}m)\,|\Sigma|^{m/2}}.$$
By Lemma 2.1, $|\hat{R}_n|$ has the same distribution as $|W|\cdot(\prod_{i=1}^p W_{ii})^{-1}$. Thus,
$$E[|\hat{R}_n|^t] = C_{m,p}\int_{W>0} \frac{|W|^t}{(\prod_{i=1}^p W_{ii})^t}\, e^{-\frac{1}{2}\mathrm{tr}(\Sigma^{-1}W)}\cdot |W|^{(m-p-1)/2}\, dW = C_{m,p}\int_{W>0} \frac{|W|^{t+(m-p-1)/2}}{(\prod_{i=1}^p W_{ii})^t}\, e^{-\frac{1}{2}\mathrm{tr}(\Sigma^{-1}W)}\, dW.$$
Write
$$W_{ii}^{-t} = \frac{1}{2^t\,\Gamma(t)}\int_0^\infty e^{-\frac{1}{2}W_{ii}y_i}\, y_i^{t-1}\, dy_i$$
for each $i$ (this is the step at which we have to assume $t > 0$). It follows that
$$E[|\hat{R}_n|^t] = C'_{m,p}\int_{(\mathbb{R}_+)^p} \prod_{i=1}^p y_i^{t-1}\, dy_1\cdots dy_p \int_{W>0} |W|^{t+(m-p-1)/2}\, e^{-\frac{1}{2}\mathrm{tr}(\Sigma^{-1}W)}\cdot e^{-\frac{1}{2}\sum_{i=1}^p W_{ii}y_i}\, dW,$$
where $\mathbb{R}_+ = [0, \infty)$ and
$$C'_{m,p} = C_{m,p}\cdot \frac{1}{2^{pt}\,\Gamma(t)^p}.$$
Write $Y = \mathrm{diag}(y_1, \cdots, y_p)$ and $\sum_{i=1}^p W_{ii}y_i = \mathrm{tr}(YW)$. Then
$$E[|\hat{R}_n|^t] = C'_{m,p}\int_{(\mathbb{R}_+)^p} \prod_{i=1}^p y_i^{t-1}\, dy_1\cdots dy_p \int_{W>0} |W|^{t+(m-p-1)/2}\, e^{-\frac{1}{2}\mathrm{tr}((\Sigma^{-1}+Y)W)}\, dW$$
$$= C'_{m,p}\,\Big[\Gamma_p\Big(t + \frac{m}{2}\Big)2^{pt+(mp/2)}\Big]\int_{(\mathbb{R}_+)^p} |\Sigma^{-1}+Y|^{-((m/2)+t)} \prod_{i=1}^p y_i^{t-1}\, dy_1\cdots dy_p \tag{2.5}$$
by Theorem 2.1.11 of Muirhead (1982) (taking $a = t + \frac{m}{2}$), provided $t > -\frac{1}{2}(m+1-p)$. From (2.4), the constant in front of the integral in (2.5) becomes
$$\frac{1}{2^{mp/2}\,\Gamma_p(\frac{1}{2}m)\,|\Sigma|^{m/2}}\cdot\frac{1}{2^{pt}\,\Gamma(t)^p}\cdot \Gamma_p\Big(t+\frac{m}{2}\Big)2^{pt+(mp/2)} = \frac{1}{|\Sigma|^{m/2}\,\Gamma(t)^p}\cdot\frac{\Gamma_p(\frac{m}{2}+t)}{\Gamma_p(\frac{1}{2}m)}.$$
Write $|\Sigma^{-1}+Y| = |\Sigma|^{-1}|I+\Sigma Y|$. Then
$$E[|\hat{R}_n|^t] = \frac{|\Sigma|^t}{\Gamma(t)^p}\cdot\frac{\Gamma_p(\frac{m}{2}+t)}{\Gamma_p(\frac{m}{2})}\int_{[0,\infty)^p} |I+\Sigma Y|^{-(m/2)-t} \prod_{i=1}^p y_i^{t-1}\, dy_1\cdots dy_p \tag{2.6}$$
for any $t > 0$.
We next transform (2.6) into the right-hand side of (1.5). Recall (1.1) and write $\Sigma = (\sigma_{ij})$. Then $\Sigma = LR_nL$ where $L := \mathrm{diag}(\sigma_{11}^{1/2}, \cdots, \sigma_{pp}^{1/2})$. Notice that $L$ and $Y$ are both diagonal matrices. Plugging this into (2.6), we see that
$$E[|\hat{R}_n|^t] = \frac{|R_n|^t\prod_{i=1}^p\sigma_{ii}^t}{\Gamma(t)^p}\cdot\frac{\Gamma_p(\frac{m}{2}+t)}{\Gamma_p(\frac{m}{2})}\int_{[0,\infty)^p} |I+R_nL^2Y|^{-(m/2)-t} \prod_{i=1}^p y_i^{t-1}\, dy_1\cdots dy_p.$$
Obviously, $L^2Y = \mathrm{diag}(\sigma_{11}y_1, \cdots, \sigma_{pp}y_p)$. Set $s_i = \sigma_{ii}y_i$ for all $i$ and $S = \mathrm{diag}(s_1, \cdots, s_p)$. We obtain
$$E[|\hat{R}_n|^t] = \frac{|R_n|^t}{\Gamma(t)^p}\cdot\frac{\Gamma_p(\frac{m}{2}+t)}{\Gamma_p(\frac{m}{2})}\int_{[0,\infty)^p} |I+R_nS|^{-(m/2)-t} \prod_{i=1}^p s_i^{t-1}\, ds_1\cdots ds_p$$
$$= \Big(\frac{\Gamma(\frac{m}{2})}{\Gamma(\frac{m}{2}+t)}\Big)^p\cdot\frac{\Gamma_p(\frac{m}{2}+t)}{\Gamma_p(\frac{m}{2})}\cdot|R_n|^t\cdot\Big(\frac{\Gamma(\frac{m}{2}+t)}{\Gamma(t)\Gamma(\frac{m}{2})}\Big)^p\int_{[0,\infty)^p} |I+R_nS|^{-(m/2)-t} \prod_{i=1}^p s_i^{t-1}\, ds_1\cdots ds_p.$$
Write $\Delta_n = R_n - I$. Then $|I+R_nS| = |I+S+\Delta_nS| = |I+S|\cdot|I+\Delta_nS(I+S)^{-1}|$. Thus, the last integral equals
$$\int_{[0,\infty)^p} |I+\Delta_nS(I+S)^{-1}|^{-(m/2)-t} \prod_{i=1}^p (1+s_i)^{-(m/2)-t}\, s_i^{t-1}\, ds_1\cdots ds_p.$$
Set $x_i = s_i/(1+s_i)$ for $1 \le i \le p$. Then $1 + s_i = \frac{1}{1-x_i}$ and $s_i = \frac{x_i}{1-x_i}$. The integral above is equal to
$$\int_{[0,1]^p} |I+\Delta_n\cdot\mathrm{diag}(x_1, \cdots, x_p)|^{-(m/2)-t} \prod_{i=1}^p x_i^{t-1}(1-x_i)^{\frac{m}{2}-1}\, dx_1\cdots dx_p.$$
This says that
$$E[|\hat{R}_n|^t] = \Big(\frac{\Gamma(\frac{m}{2})}{\Gamma(\frac{m}{2}+t)}\Big)^p\cdot\frac{\Gamma_p(\frac{m}{2}+t)}{\Gamma_p(\frac{m}{2})}\cdot|R_n|^t\cdot E\big[|I+\Delta_n\cdot\mathrm{diag}(V_1, \cdots, V_p)|^{-(m/2)-t}\big],$$
where $V_1, \cdots, V_p$ are i.i.d. with $\mathrm{Beta}(t, \frac{m}{2})$-distribution.
Proof of Proposition 1.2. We first claim that
$$\sup_{n\ge 0} E\{|X_n|^p e^{tX_n}\} < \infty \tag{2.7}$$
for every integer $p \ge 1$ and every $t \in [0, \delta)$, where $\delta$ is as in the statement of the proposition.
Obviously, $|X_n|^p e^{tX_n} \le |X_n|^{2p} e^{tX_n} + e^{tX_n}$. So, by the given condition $\sup_{n\ge 0} E(|X_n|^p) < \infty$ for each $p \ge 1$ (which covers the case $t = 0$), we only need to prove (2.7) for even integers $p \ge 1$ and $t \in (0, \delta)$. In fact, for $\delta' \in (0, \delta)$, set $\delta'' = (\delta - \delta')/2$. By the Taylor expansion, $e^{\delta'' X_n} \ge \frac{(\delta'')^p}{p!}(X_n)^p$ for all even integers $p \ge 1$ if $X_n \ge 0$. Consequently, for all $t \in (0, \delta']$,
$$(X_n)^p e^{tX_n} = (X_n)^p e^{tX_n} I(X_n \ge 0) + (X_n)^p e^{tX_n} I(X_n < 0)$$
$$\le \frac{p!}{(\delta'')^p}\cdot e^{(t+\delta'')X_n} I(X_n \ge 0) + \frac{1}{t^p}\cdot \sup_{x\le 0}\{x^p e^x\}$$
$$\le \frac{p!}{(\delta'')^p}\cdot e^{\delta X_n} + \frac{1}{t^p}\cdot \sup_{x\le 0}\{x^p e^x\}$$
for all even integers $p \ge 1$. We then get (2.7) by taking expectations. The condition $\sup_{n\ge 1} E(X_n^2) < \infty$ implies that $\{X_n;\, n \ge 1\}$ is tight. Thus, for any subsequence $\{X_{n_k};\, k \ge 1\}$ such that $X_{n_k}$ converges weakly to some $Y$ as $k \to \infty$, to prove the proposition it is enough to show that $Y$ and $X_0$ have the same distribution.
Now we assume that $X_{n_k}$ converges weakly to $Y$ as $k \to \infty$. Since $e^{tX_{n_k}}$ converges weakly to $e^{tY} \ge 0$, by assumption and the Fatou lemma, $Ee^{tY} \le Ee^{tX_0}$ for all $t \in [0, \delta]$. This implies that $\{e^{tY}, e^{tX_{n_k}};\, k \ge 1\}$ is uniformly integrable for each $t \in [0, \delta)$ by Hölder's inequality. Thus,
$$Ee^{tY} = \lim_{k\to\infty} Ee^{tX_{n_k}} = Ee^{tX_0} \tag{2.8}$$
for all $t \in [0, \delta)$. Furthermore, $|X_{n_k}|^p e^{tX_{n_k}} \to |Y|^p e^{tY}$ weakly for every $t \in \mathbb{R}$. By (2.7) and the Fatou lemma again, we know that $E\{|Y|^p e^{tY}\} < \infty$ for every $p \ge 1$ and every $t \in [0, \delta)$. This and (2.7) imply
$$E\{|Y|^p e^{tY}\} < \infty \quad \text{and} \quad E\{|X_0|^p e^{tX_0}\} < \infty \tag{2.9}$$
for every $p \ge 1$ and every $t \in [0, \delta)$. Fix $\delta_1 \in (0, \delta)$. Observe $|Y|^p e^{tY} \le |Y|^p e^{\delta_1 Y} + |Y|^p$ for all $t \in [0, \delta_1]$. Then (2.9) can be strengthened to