arXiv:1901.05166v2 [math.ST] 2 Jun 2021

Tracy-Widom limit for the largest eigenvalue of high-dimensional covariance matrices in elliptical distributions

Jun Wen, Jiahui Xie, Long Yu and Wang Zhou

Department of Statistics and Applied Probability, National University of Singapore, e-mail: [email protected]

Abstract: Let $X$ be an $M \times N$ random matrix consisting of $M$-variate, identically distributed elliptical column vectors $x_1, \dots, x_N$ with general population covariance matrix $\Sigma$. In this article, we prove that the limiting behavior of the scaled largest eigenvalue of $XX^*$ is universal for a wide class of elliptical distributions, namely, the scaled largest eigenvalue converges weakly to the same limit regardless of the specific distribution, provided the weak fourth moment of the radius of $x_1$ exists. In particular, via comparing the Green function with that of the sample covariance matrix of multivariate normally distributed data, we conclude that the limiting distribution of the scaled largest eigenvalue is the celebrated Tracy-Widom law.

Keywords and phrases: Sample covariance matrices, Elliptical distributions, Edge universality, Tracy-Widom distribution, Tail probability.

1. Introduction

Suppose one observed independent and identically distributed (i.i.d.) data $x_1, \dots, x_N$ from an $M$-variate normal distribution with mean 0 and population covariance matrix $\Sigma$. Let $X = (x_1, \dots, x_N)$ be the $M \times N$ data matrix, while $\Sigma = \mathbb{E} x_1 x_1^*$ is defined as the population covariance matrix; here $X^*$ denotes the conjugate transpose of $X$, and the same notation is used for all matrices throughout this article. Define
$$ W = \frac{1}{N}\sum_{i=1}^{N} x_i x_i^* = \frac{1}{N} XX^*, $$
which is referred to as the sample covariance matrix of $x_1, \dots, x_N$; in the literature, the quantity $XX^*$ is also referred to as the sample covariance matrix after scaling. In recent decades, fruitful results exploring the asymptotic properties of $W$ have been established by random matrix theory under the high-dimensional asymptotic regime. In contrast to the traditional low-dimensional asymptotic regime, where the dimension $M$ is usually fixed or small and the sample size $N$ is large, the high-dimensional asymptotic regime refers to the setting where both $M$ and $N$ are large and even of comparable magnitude. For a list of introductory materials on random matrix theory, see e.g. [3, 5, 10, 13, 19, 39]. A fundamental research question in statistics is to analyze the behavior of the eigenvalues of $W$ as $M, N \to \infty$ with $M/N \to \phi_0 > 0$.

It is worth noting that most works on the inference of high-dimensional covariance matrices using random matrix theory assume that $X = \Sigma^{1/2} Y$, where $Y$ is an $M \times N$ random matrix consisting of i.i.d. entries with mean 0 and variance 1. If $x_1, \dots, x_N$ are not normal, the entries in each column of $X$ are only guaranteed to be uncorrelated instead of independent, the latter being a much stronger notion than the former. This assumption therefore excludes many practically useful statistical models, for instance, almost all members in the family of elliptical distributions.

In this article, we consider the case where $x_1, \dots, x_N$ follow an elliptical distribution, which is a family of probability distributions widely used in statistical modeling. See e.g. the technical report [1] for a comprehensive introduction to elliptical distributions. Generally, we say a random vector $y$ follows an elliptical distribution if $y$ can be written as

y = ξAu, (1.1)

where $A \in \mathbb{R}^{M \times M}$ is a nonrandom matrix with $\operatorname{rank}(A) = M$, $\xi \geq 0$ is a scalar representing the radius of $y$, and $u \in \mathbb{R}^M$ is the random direction, which is independent of $\xi$ and uniformly distributed on the $(M-1)$-dimensional unit sphere $S^{M-1}$ in $\mathbb{R}^M$, denoted as $u \sim U(S^{M-1})$. See e.g. [15, 16, 24, 43] for some recent advances on statistical inference for elliptically distributed data. The current paper focuses on the problem involving the largest eigenvalue of the sample covariance matrix with elliptically distributed data. Briefly speaking, we show the following result.

Claim 1. If $\Sigma$ satisfies some mild assumptions, then the rescaled largest eigenvalue $N^{2/3}(\lambda_1(\mathcal{W}) - \lambda_+)$ converges weakly to the celebrated Tracy-Widom law ([17, 27, 35, 42]) if the radius satisfies the following tail probability condition:

$$ \lim_{s\to\infty} \limsup_{N\to\infty} s^2\,\mathbb{P}\big( |N\xi^2 - M| \geq \sqrt{M}\, s \big) = 0. \tag{1.2} $$
Our arguments are built upon the pioneering works [9, 14, 22, 30, 37, 32]. The eigenvalues of sample covariance matrices widely appear in statistical applications such as principal component analysis (PCA) and hypothesis testing. For instance, in PCA, the eigenvalues of the covariance matrix represent the variance of each component of the rotated vector. In many practical situations, such as financial asset pricing and signal processing (see, e.g. [2, 12]), the observed data are usually high-dimensional but are actually sparse in nature. A common practice is to keep only the small portion of components with large variance, as suggested by the eigenvalues of the covariance matrix, and to discard the others. This way of data manipulation often acts as an effective dimension reduction technique in practice. However, most of the time, the eigenvalues of the population covariance matrix are unknown. In this case, the largest sample eigenvalue serves as a good candidate for the inference of properties of the population eigenvalues. See e.g. [8, 28, 29, 36].

We summarize the contributions of this paper here. We first prove the local law and Tracy-Widom limit for the sample covariance matrices of elliptical high-dimensional random vectors. The weak correlations across the coordinates differentiate the model from the existing studies, and hence facilitate applications in more general scenarios. The correlations also bring new challenges to the technical proofs, such as calculating the large deviation bounds and fluctuation averaging errors. The corresponding lemmas in this paper can be of independent interest. Moreover, we relax the typical moment conditions in existing results, e.g. [24, 25], to the sharper tail probability assumption (1.2). We prove that the tail probability assumption is sufficient for the Tracy-Widom limit under elliptical distributions. This result will undoubtedly push forward the research on random matrix theory with elliptically distributed data.

This article is organized as follows. In Section 2, we introduce our notation and list the basic conditions. In Section 3, we present our main results and a sketch of the proof. We first prove the deformed local law, which is a bound on the difference between the Stieltjes transform of the empirical distribution of the sample eigenvalues and that of its limiting counterpart under some bounded restrictions. This result will be our starting point to derive the limiting distribution of the rescaled largest eigenvalue. Meanwhile, it may be of interest in its own right, since a number of other useful consequences regarding the sample covariance matrix can be obtained from it, such as eigenvector delocalization and eigenvalue spacing (see e.g. [10]). Taking the deformed local law as an input, the next step is to prove a strong average local law with some restrictions on the bounded support and the four leading moments of $\xi$. Finally, we show that the limiting distribution of the rescaled largest eigenvalue does not depend on the specific distribution of the matrix entries under a tail probability condition. Comparison with normally distributed data then indicates that the limiting distribution of the rescaled largest eigenvalue is the Tracy-Widom (TW) law. In Sections 4 to 8, we give the detailed proofs of our main theorems, while several auxiliary lemmas and results are put in the Appendices.
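Before turning to the precise setup, here is a minimal Monte Carlo sketch (ours, purely illustrative and not part of the paper's arguments) of the universality stated in Claim 1: taking $\Sigma = I_M$, it compares the rescaled largest eigenvalue under the Gaussian radius $N\xi^2 \sim \chi^2_M$ and under a non-Gaussian radius built from i.i.d. $\mathrm{Exp}(1)$ variables, which satisfies the tail condition (1.2) by the moment criterion recalled in Remark 2.9 below. All names and parameter choices are illustrative.

```python
# Minimal Monte Carlo sketch of Claim 1 (assumptions: Sigma = I_M; two radii
# that both satisfy the tail condition (1.2)).  Illustrative only.
import numpy as np

rng = np.random.default_rng(0)

def largest_eig(M, N, radius):
    g = rng.standard_normal((M, N))
    u = g / np.linalg.norm(g, axis=0)        # u_i uniform on the sphere S^{M-1}
    xi = np.sqrt(radius(M, N) / N)           # xi_i = hat{xi}_i / sqrt(N)
    X = xi * u                               # Sigma = I, so X = U D
    return np.linalg.eigvalsh(X.T @ X)[-1]   # lambda_1 of W = X^* X

M, N, reps = 100, 200, 300
phi = M / N
lam_plus = (1 + np.sqrt(phi)) ** 2           # Marchenko-Pastur edge for Sigma = I

chi2 = lambda M, N: rng.chisquare(M, size=N)                        # Gaussian columns
expo = lambda M, N: rng.exponential(1.0, size=(N, M)).sum(axis=1)   # non-Gaussian elliptical

for name, radius in [("gaussian radius", chi2), ("exponential radius", expo)]:
    stats = np.array([N ** (2 / 3) * (largest_eig(M, N, radius) - lam_plus)
                      for _ in range(reps)])
    print(name, np.quantile(stats, [0.1, 0.5, 0.9]))
# Claim 1 predicts the two sets of quantiles to be close for large M, N.
```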

2. Notation and basic conditions

Throughout this article, we set $C > 0$ to be a constant whose value may be different from line to line. $\mathbb{Z}$, $\mathbb{Z}_+$, $\mathbb{R}$, $\mathbb{R}_+$, $\mathbb{C}$, $\mathbb{C}^+$ denote the sets of integers, positive integers, real numbers, positive real numbers, complex numbers and the upper half complex plane respectively. For $a, b \in \mathbb{R}$, $a \wedge b = \min(a,b)$ and $a \vee b = \max(a,b)$. $\imath = \sqrt{-1}$. For a complex number $z$, $\operatorname{Re} z$ and $\operatorname{Im} z$ denote the real and imaginary parts of $z$ respectively. For a matrix $A = (A_{ij})$, $\operatorname{Tr} A$ denotes the trace of $A$, $\|A\|$ denotes the spectral norm of $A$, equal to the largest singular value of $A$ (usually we use $\|\cdot\|$ as well), and $\|A\|_F$ denotes the Frobenius norm of $A$, equal to $(\sum_{ij}|A_{ij}|^2)^{1/2}$. For $M \in \mathbb{Z}_+$, $\operatorname{diag}(a_1, \dots, a_M)$ denotes the diagonal matrix with $a_1, \dots, a_M$ as its diagonal elements. For two sequences of numbers $\{a_N\}_{N=1}^{\infty}$, $\{b_N\}_{N=1}^{\infty}$, $a_N \asymp b_N$ if there exist constants $C_1, C_2 > 0$ such that $C_1|b_N| \leq |a_N| \leq C_2|b_N|$, and $O(a_N)$ and $o(a_N)$ denote sequences such that $|O(a_N)/a_N| \leq C$ with some constant $C > 0$ for all large $N$ and $\lim_{N\to\infty} o(a_N)/a_N = 0$. $I$ denotes the identity matrix of appropriate size. For a set $A$, $A^c$ denotes its complement (with respect to some whole set which is clear in the context). For an integer $M \in \mathbb{Z}_+$, $\chi^2_M$ denotes the chi-square distribution with $M$ degrees of freedom. For a measure $\varrho$, $\operatorname{supp}(\varrho)$ denotes its support. For any finite set $T$, we let $|T|$ denote the cardinality of $T$. For any event $\Xi$, $\mathbf{1}(\Xi)$ denotes the indicator of the event $\Xi$, equal to 1 if $\Xi$ occurs and 0 if $\Xi$ does not occur. For any $a, b \in \mathbb{R}$ with $a \leq b$, $\mathbf{1}_{[a,b]}(x)$ is equal to 1 if $x \in [a,b]$ and 0 if $x \notin [a,b]$.

We consider the $M \times N$ data matrix $X$ as follows. Let $U = (u_1, \dots, u_N)$ and $D = \operatorname{diag}(\xi_1, \dots, \xi_N)$. Then the corresponding column vectors and data matrices are

$$ X := (x_1, \dots, x_N), \qquad \Sigma^{1/2} U =: (r_1, \dots, r_N), $$
$$ X = \Sigma^{1/2} U D = \Sigma^{1/2}(\xi_1 u_1, \dots, \xi_N u_N) = (\xi_1 r_1, \dots, \xi_N r_N), $$
where the $u_i$'s are i.i.d. from $U(S^{M-1})$ and the $\xi_i$'s are i.i.d. nonnegative random variables independent of all $u_i$'s. Our sample covariance matrix is $XX^*$, so $\xi_i$ has absorbed the usual normalising factor $1/\sqrt{N}$ for all $i$. For convenience we write $\hat{\xi}_i := \sqrt{N}\xi_i$.

The $\Sigma^{1/2}$ in the above equation can also be replaced by some general $M \times M$ matrix $A$ with $AA^* = \Sigma$. We claim that the technical proof is totally the same, using the singular value decomposition $A = U_A D_A V_A$ and the observation that the distribution of $u_1, \dots, u_N$ is orthogonally invariant. For the same reason, without loss of generality we assume that $\Sigma = \operatorname{diag}(\sigma_1, \dots, \sigma_M)$, where $\sigma_1, \dots, \sigma_M$ denote the eigenvalues in descending order.


Denote the empirical spectral density of Σ as

$$ \pi := \frac{1}{M}\sum_{i=1}^{M} \delta_{\sigma_i}. $$
Following the general assumptions on $\Sigma$ in the literature, we suppose that for a small enough constant $\tau > 0$,
$$ \sigma_1 \leq \tau^{-1}, \qquad \pi([0, \tau]) \leq 1 - \tau. \tag{2.1} $$
Fix $0 < \tau < 1$, and define

$$ \mathbf{D} \equiv \mathbf{D}(\tau, N) := \big\{ z = E + \imath\eta \in \mathbb{C}^+ : |z| \geq \tau,\ |E| \leq \tau^{-1},\ N^{-1+\tau} \leq \eta \leq \tau^{-1} \big\}. $$
For $z := E + \imath\eta \in \mathbb{C}^+$, further define the following quantities

$$ W = X^*X, \qquad \mathcal{W} = XX^*, $$
while the respective Green functions of $W$ and $\mathcal{W}$ are
$$ G(z) = (W - zI)^{-1}, \qquad \mathcal{G}(z) = (\mathcal{W} - zI)^{-1}. $$
We denote the respective empirical spectral densities of $W$ and $\mathcal{W}$ as
$$ \rho_W := \frac{1}{N}\sum_{i=1}^{N} \delta_{\lambda_i(W)}, \qquad \rho_{\mathcal{W}} := \frac{1}{M}\sum_{i=1}^{M} \delta_{\lambda_i(\mathcal{W})}. $$
The Stieltjes transforms of $\rho_W$ and $\rho_{\mathcal{W}}$ are given by

$$ m_N^{W} := \int \frac{\rho_W(dx)}{x - z} = \frac{1}{N}\operatorname{Tr} G(z), \qquad m_N^{\mathcal{W}} := \int \frac{\rho_{\mathcal{W}}(dx)}{x - z} = \frac{1}{M}\operatorname{Tr} \mathcal{G}(z). $$
Throughout the rest of the paper, we denote $m_N(z) := m_N^{W}(z)$ for simplification. It is easy to see that the eigenvalues of $W$ and $\mathcal{W}$ are the same up to $|M - N|$ zeros. We denote the descending eigenvalues of $W$ and $\mathcal{W}$ in a unified manner as $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_{M \vee N}$, where $\lambda_1, \dots, \lambda_M$ and $\lambda_1, \dots, \lambda_N$ are understood to be the eigenvalues of $\mathcal{W}$ and $W$ respectively. In particular, $\lambda_{M \wedge N + 1}, \dots, \lambda_{M \vee N}$ are all 0. Consequently,

$$ \phi_N\, \rho_{\mathcal{W}} = \rho_W - (1 - \phi_N)\,\delta_0 \tag{2.2} $$
and
$$ \phi_N\, m_N^{\mathcal{W}}(z) = \frac{1 - \phi_N}{z} + m_N(z), \tag{2.3} $$
where $\phi_N := M/N$. We may suppress the subscript $N$ and use $\phi$ directly hereafter.

Denote the index set $\mathcal{I} = \{1, \dots, N\}$. For $T \subset \mathcal{I}$, we introduce the notation $X^{(T)}$ to denote the $M \times (N - |T|)$ minor of $X$ obtained by removing all the $i$-th columns of $X$ for $i \in T$. In particular, $X^{(\emptyset)} = X$. For convenience, we briefly write $(\{i\})$, $(\{i,j\})$ and $(\{i,j\} \cup T)$ as $(i)$, $(i,j)$ and $(ijT)$ respectively. Correspondingly,

$$ W^{(T)} = (X^{(T)})^* X^{(T)}, \qquad \mathcal{W}^{(T)} = X^{(T)} (X^{(T)})^*, $$
and
$$ G^{(T)}(z) = (W^{(T)} - zI)^{-1}, \qquad \mathcal{G}^{(T)}(z) = (\mathcal{W}^{(T)} - zI)^{-1}, \qquad m_N^{(T)}(z) = \frac{1}{N}\operatorname{Tr} G^{(T)}(z). $$

Throughout this article, we denote by $X_{ij}$ the $(i,j)$-th entry of a matrix $X$. In particular, in the minor $X^{(T)}$ with $i, j \notin T$, the entry $X^{(T)}_{ij}$ keeps the original indices of $X$.

In the following, we present a notion introduced in [20]. It provides a simple way of systematizing and making precise statements of the form "$A$ is bounded with high probability by $B$ up to small powers of $N$" for two families of random variables $A$, $B$.

Definition 2.1 (Stochastic domination). (a). For two families of nonnegative random variables

$$ A = \{ A_N(t) : N \in \mathbb{Z}_+,\ t \in T_N \}, \qquad B = \{ B_N(t) : N \in \mathbb{Z}_+,\ t \in T_N \}, $$
where $T_N$ is a possibly $N$-dependent parameter set, we say that $A$ is stochastically dominated by $B$, uniformly in $t$, if for all (small) $\varepsilon > 0$ and (large) $D > 0$ there exists $N_0(\varepsilon, D) \in \mathbb{Z}_+$ such that for $N \geq N_0(\varepsilon, D)$,
$$ \sup_{t \in T_N} \mathbb{P}\big( A_N(t) > N^{\varepsilon} B_N(t) \big) \leq N^{-D}. $$
If $A$ is stochastically dominated by $B$, uniformly in $t$, we use the notation $A \prec B$ or $A = O_\prec(B)$. Moreover, for a complex family $A$, if $|A| \prec B$ we also write $A = O_\prec(B)$.
(b). Let $A$ be a family of random matrices and $\zeta$ be a family of nonnegative random variables. Then we denote $A = O_\prec(\zeta)$ if $A$ is dominated in the weak operator norm sense, i.e. $|\langle v, Aw \rangle| \prec \zeta \|v\|_2 \|w\|_2$ for any deterministic vectors $v$ and $w$.
(c). For two sequences of numbers $\{a_N\}_{N=1}^{\infty}$, $\{b_N\}_{N=1}^{\infty}$, $a_N \prec b_N$ if for all $\epsilon > 0$, $a_N \leq N^{\epsilon} b_N$.

Remark 2.2. The stochastic domination throughout this article holds uniformly for the matrix indices and $z \in \mathbf{D}$ (or the set $\mathbf{D}^e$ defined later). For simplicity, in the proof of each result, we omit the explicit indication of this uniformity.

The discussion in this paper highly relies on the following global definitions.

Definition 2.3 (High probability event). We say that an $N$-dependent event $\Omega$ holds with overwhelmingly high probability if there exists a constant $c > 0$, independent of $N$, such that

$$ \mathbb{P}(\Omega) > 1 - \exp(-N^c), \tag{2.4} $$
for all sufficiently large $N$.

Definition 2.4 (Bounded support condition). We say that an $N$-dependent random variable $x := x(N)$ satisfies the bounded support condition with $q \equiv q(N)$ if
$$ \mathbb{P}( |x| \leq q ) > 1 - \exp(-N^c), \tag{2.5} $$
for some $c > 0$.

Remark 2.5. Note that if $x$ satisfies condition (2.5), this is equivalent to the event $\{|x| \leq q\}$ holding with high probability. Consequently, we can neglect the bad event $\{|x| > q\}$. In other words, in our proofs we work on a high probability whole set $\Omega$. For instance, under $\Omega$, the entries of a data matrix satisfy the bounded support condition.
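As a rough numerical illustration of Definition 2.4 (a sketch of ours, not part of the paper's argument, and not a verification of the exact $\exp(-N^c)$ rate), the following evaluates the exceedance probability exactly for one concrete radius, the Gaussian one with $\hat{\xi}^2 = N\xi^2 \sim \chi^2_M$ used in Section 3.4, at the support level $q = N^{-1/2}\log N$ that appears in Condition 2.8 below; the aspect ratio $\phi = 1/2$ is an arbitrary choice.

```python
# Exceedance probability P(|xi^2 - phi| > q) for the Gaussian radius
# N*xi^2 ~ chi^2_M, with q = N^{-1/2} log N.  Illustrative sketch only.
import numpy as np
from scipy.stats import chi2

for N in (200, 1000, 5000):
    M = N // 2                      # phi = M/N = 1/2, illustrative choice
    q = np.log(N) / np.sqrt(N)
    t = N * q                       # |xi^2 - phi| > q  iff  |chi^2_M - M| > N*q
    p = chi2.sf(M + t, df=M) + chi2.cdf(M - t, df=M)
    print(N, q, p)                  # the probabilities decay rapidly in N
```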


Throughout this article, we assume the following conditions.

Condition 2.6. $N \to \infty$ with $M \equiv M(N)$ such that $\phi := M/N \to \phi_0 \in [a, b]$ for all large $N$, where $a$ and $b$ are two constants with $0 < a \leq b < \infty$.

Condition 2.7. The radius variables $\xi_1, \dots, \xi_N$ satisfy
$$ \lim_{s\to\infty} \limsup_{N\to\infty} s^2\,\mathbb{P}\big( |\hat{\xi}_i^2 - M| \geq \sqrt{M}\, s \big) = 0, \tag{2.6} $$
for all $i \in \{1, \dots, N\}$. Recall that $\xi_i = \hat{\xi}_i/\sqrt{N}$ and $\hat{\xi}_i^2$ has left tight support, i.e. $\hat{\xi}_i^2 \geq 0$.

We remark that in Condition 2.7, the general choice of $s$ diverges with $N$, e.g. $s = N^{1/2 - \epsilon}$ in the proof of Theorem 3.6 below. We put an alternative restriction on $\xi_1, \dots, \xi_N$.

Condition 2.8. $\xi_1, \dots, \xi_N$ are independent nonnegative random variables such that $\mathbb{E}\xi_i^2 = \phi$ and $\xi_i^2 - \phi$ has bounded support $q$ in the sense of Definition 2.4 with
$$ N^{-1/2}\log N \leq q \leq N^{-c} $$
for some $c < 1/2$ and all $i \in \{1, \dots, N\}$.

Remark 2.9. Condition 2.7 excludes some elliptical distributions, such as multivariate Student-$t$ distributions and normal scale mixtures. The limiting spectral distributions of sample covariance matrices from these distributions do not follow the Marčenko-Pastur equation (2.7), see [18, 33], and hence they are out of the scope of this article. Nevertheless, there is still a wide class of distributions satisfying Condition 2.7, including the multivariate Pearson type II distribution and the family of Kotz-type distributions; see the examples and Table 1 in [24]. In particular, if $\xi^2$ can be written as $\xi^2 = N^{-1}(y_1 + \cdots + y_M)$ with $y_1, \dots, y_M$ being a positive i.i.d. sequence such that $\mathbb{E}y_1 = 1$ and $\mathbb{E}y_1^4 < \infty$, then $\xi^2$ satisfies Condition 2.7.

Remark 2.10. (2.6) is weaker than

$$ \limsup_{N\to\infty} \frac{1}{M}\,\mathbb{E}\big| \hat{\xi}_i^2 - M \big|^2 < \infty, $$
but stronger than
$$ \limsup_{N\to\infty} \frac{1}{M}\,\mathbb{E}\big| \hat{\xi}_i^2 - M \big|^{2-\delta} < \infty, $$
for arbitrary $\delta > 0$.

One can check that Conditions 2.6 and 2.8 are sufficient for Theorem 1.1 of [6]. Hence we have the following result.

Lemma 2.11. Suppose, given Conditions 2.6 and 2.8, $\pi$ converges weakly to a probability distribution $\pi_0$ and $\phi \to \phi_0 \in (0, \infty)$. Then, almost surely, $\varrho_W$ converges weakly to a deterministic limiting distribution $\varrho_0$, and for any $z \in \mathbb{C}^+$, almost surely, $m_N(z)$ converges to the Stieltjes transform of $\varrho_0$, which we denote as $m_0(z)$. Moreover, for all $z \in \mathbb{C}^+$, $m_0(z)$ is the unique value in $\mathbb{C}^+$ satisfying the equation
$$ z = -\frac{1}{m_0(z)} + \phi_0 \int \frac{x}{1 + x\, m_0(z)}\, \pi_0(dx). \tag{2.7} $$
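As an aside, equation (2.7) can be solved numerically by the standard fixed-point iteration; the sketch below (ours, with an arbitrary discrete spectrum standing in for $\pi_0$) is purely illustrative and is not an argument used in the paper.

```python
# Fixed-point solver for the Marchenko-Pastur-type equation (2.7), assuming a
# discrete measure pi_0 with equal weights on given sigma_i's.  Illustrative only.
import numpy as np

def m0(z, sigmas, phi, iters=1000):
    """Solve z = -1/m + phi * mean(sigma / (1 + sigma*m)) for the root m in C^+."""
    sigmas = np.asarray(sigmas, dtype=float)
    m = -1.0 / z                    # starting point with Im m > 0
    for _ in range(iters):          # the iteration converges for Im z > 0
        m = 1.0 / (-z + phi * np.mean(sigmas / (1.0 + sigmas * m)))
    return m

sigmas = np.linspace(0.5, 2.0, 200)   # an illustrative spectrum for Sigma
phi = 0.5
z = 3.0 + 0.1j
m = m0(z, sigmas, phi)
# residual check of (2.7): should be ~ 0
print(m, abs(z + 1.0 / m - phi * np.mean(sigmas / (1.0 + sigmas * m))))
```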


Remark 2.12. If we replace $\pi_0$ and $\phi_0$ by their finite sample counterparts $\pi$ and $\phi$ in (2.7) and solve for $m$ for each $z \in \mathbb{C}^+$, we obtain the Stieltjes transform of a deterministic probability distribution. Throughout this article, we denote this deterministic probability distribution and its Stieltjes transform as $\varrho$ and $m(z)$ respectively. By Lemma 2.11, when $N$ is large, $\varrho_W$ and $m_N(z)$ are close to $\varrho$ and $m(z)$. The aim of the next section is to evaluate the bound on $|m_N(z) - m(z)|$.

We define the function $f: \mathbb{C} \to \mathbb{C}$,
$$ f(w) = -\frac{1}{w} + \phi \int \frac{x\,\pi(dx)}{1 + wx}, \tag{2.8} $$
and assume that

$$ f'(-c) = 0, \qquad 0 < \liminf_{N\to\infty} \sigma_M \leq \limsup_{N\to\infty} \sigma_1 < \infty, \qquad \limsup_{N\to\infty} \sigma_1 c < 1, \tag{2.9} $$
for $c \in (0, \sigma_1^{-1})$. Let $\lambda_+ := f(-c)$; it can be shown that $\lambda_+$ is the rightmost endpoint of $\operatorname{supp}(\varrho)$ (see the discussion on page 4 of [9] or Lemma 2.4 of [30]), i.e., the edge of $\varrho$. For $\tau, \tau' \in (0, \infty)$, $N \in \mathbb{Z}_+$, define
$$ \mathbf{D}^e \equiv \mathbf{D}^e(\tau, \tau', N) := \big\{ z \in \mathbf{D}(\tau, N) : E \in [\lambda_+ - \tau', \lambda_+ + \tau'] \big\} \tag{2.10} $$
as the subset of $\mathbf{D}$ with the real part of $z$ restricted to a small closed interval around $\lambda_+$. Also we define the distance to the rightmost edge as
$$ \kappa \equiv \kappa_E := |E - \lambda_+| \quad \text{for } z = E + \imath\eta. \tag{2.11} $$
Now we introduce some definitions before presenting our main results.

Definition 2.13 (Linearizing block matrix). For $z \in \mathbb{C}^+$, we define the $(N+M) \times (N+M)$ block matrix
$$ H := \begin{pmatrix} 0 & X \\ X^* & 0 \end{pmatrix}, \tag{2.12} $$
and
$$ G_H := \begin{pmatrix} -I_{M\times M} & X \\ X^* & -zI_{N\times N} \end{pmatrix}^{-1} \tag{2.13} $$
$$ \phantom{G_H} = \begin{pmatrix} z\mathcal{G} & \mathcal{G}X \\ (\mathcal{G}X)^* & G \end{pmatrix} = \begin{pmatrix} z\mathcal{G} & XG \\ (XG)^* & G \end{pmatrix}. \tag{2.14} $$

Definition 2.14 (Deterministic limit of $G_H$). We define the deterministic limit $\Pi$ of $G_H$ as
$$ \Pi(z) := \begin{pmatrix} -(1 + m(z)\Sigma)^{-1} & 0 \\ 0 & m(z) I_{N\times N} \end{pmatrix}. \tag{2.15} $$
Define the control parameters by
$$ \Lambda \equiv \Lambda(z) := \max_{i,j\in\mathcal{I}} |G_{ij}(z) - \delta_{ij} m(z)|, \qquad \Lambda_o \equiv \Lambda_o(z) := \max_{i,j\in\mathcal{I},\, i\neq j} |G_{ij}(z)|, $$
$$ \Theta \equiv \Theta(z) := |m_N(z) - m(z)|, \qquad \Psi_\Theta := \sqrt{\frac{\operatorname{Im} m(z) + \Theta}{N\eta}}, \qquad \Xi := \{ \Lambda \leq (\log N)^{-1} \}, $$

where $\delta_{ij}$ denotes the Kronecker delta, i.e. $\delta_{ij} = 1$ if $i = j$ and $\delta_{ij} = 0$ if $i \neq j$, and $\Xi$ is a $z$-dependent event. For simplicity of notation, we occasionally omit the variable $z$ for those $z$-dependent quantities provided no ambiguity occurs.

3. Main results

3.1. Deformed local law

Theorem 3.1 (Deformed strong local law). Given Conditions 2.6, 2.8 as well as (2.1) and (2.9), there exists a constant τ ′ depending only on τ such that

$$ \Lambda(z) \prec \sqrt{\frac{\operatorname{Im} m(z)}{N\eta}} + \frac{1}{N\eta} + q, \tag{3.1} $$
$$ |m_N(z) - m(z)| \prec \min\Big\{ q,\ \frac{q^2}{\sqrt{\kappa + \eta}} \Big\} + \frac{1}{N\eta}, \tag{3.2} $$
uniformly for $z \in \mathbf{D}^e(\tau, \tau', N)$.

Remark 3.1. Theorem 3.1 can be strengthened in a simultaneous sense for $z \in \mathbf{D}^e(\tau, \tau', N)$, using the Lipschitz continuity of $G_{ij}(z)$, $m(z)$, $\Phi_\Lambda(z)$, $\Phi_m(z)$ and the fact that $\Phi_\Lambda(z), \Phi_m(z) \geq \frac{1}{N}$ on $\mathbf{D}^e(\tau, \tau', N)$, where

$$ \Phi_\Lambda(z) := \sqrt{\frac{\operatorname{Im} m(z)}{N\eta}} + \frac{1}{N\eta} + q, \qquad \Phi_m(z) := \min\Big\{ q,\ \frac{q^2}{\sqrt{\kappa + \eta}} \Big\} + \frac{1}{N\eta}. $$
The proof is essentially the same as that of (III.5) in Appendix III. We just put down the conclusions as follows:
$$ \sup_{z \in \mathbf{D}^e} \max_{i,j} \frac{\Lambda(z)}{\Phi_\Lambda(z)} \prec 1, \qquad \sup_{z \in \mathbf{D}^e} \frac{|m_N(z) - m(z)|}{\Phi_m(z)} \prec 1, \tag{3.3} $$
under the assumptions in Theorem 3.1. A direct consequence is the following theorem.

Theorem 3.2. Under the assumptions in Theorem 3.1, we have

$$ \|H\|^2 \leq \lambda_+ + N^{\epsilon}\big( q^2 + N^{-2/3} \big). \tag{3.4} $$
Furthermore, for any real numbers $a, b$ such that $a \leq b$, define $n_N(a,b) = \int_a^b \varrho_N(dx)$ and $n(a,b) = \int_a^b \varrho(dx)$. Then there exists a constant $\tau'$ depending only on $\tau$ such that for any $E_1, E_2 \in \{ \operatorname{Re} z : z \in \mathbf{D}^e(\tau, \tau', N) \}$,
$$ |n_N(E_1, E_2) - n(E_1, E_2)| \prec N^{-1} + q^3 + q^2\big( \sqrt{\kappa_{E_1}} - \sqrt{\kappa_{E_2}} \big). \tag{3.5} $$
Consequently, we have for $q \leq N^{-1/3}$,
$$ |\lambda_i - \gamma_i| \prec i^{-1/3} N^{-2/3} + q^2, \tag{3.6} $$
uniformly in $i$ such that $\gamma_i \in [\lambda_+ - c, \lambda_+]$ for some $c > 0$, where
$$ \gamma_i := \sup_x \Big\{ \int_x^{\infty} \varrho(x)\,dx > \frac{i-1}{N} \Big\}. $$

The proof of this theorem is the same as the one in [14], and we summarize the main arguments in Appendix III.
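Although not used in the proofs, a quick numerical sanity check of the averaged local law (3.2) can be run in the Gaussian-radius case with $\Sigma = I_M$, where $m(z)$ has a closed form; the sketch below is ours and all parameter choices are illustrative.

```python
# Crude check of (3.2) near the edge, assuming Sigma = I_M and a chi^2_M radius.
import numpy as np

rng = np.random.default_rng(1)
M, N = 400, 800
phi = M / N

def m_det(z):
    # closed form of (2.7) for Sigma = I:  z*m^2 + (z + 1 - phi)*m + 1 = 0,
    # picking the unique root in the upper half plane
    b = z + 1 - phi
    disc = np.sqrt(b * b - 4 * z)
    r1, r2 = (-b + disc) / (2 * z), (-b - disc) / (2 * z)
    return r1 if r1.imag > 0 else r2

g = rng.standard_normal((M, N))
u = g / np.linalg.norm(g, axis=0)
xi = np.sqrt(rng.chisquare(M, size=N) / N)
X = xi * u
W = X.T @ X                                   # W = X^* X

lam_plus = (1 + np.sqrt(phi)) ** 2            # edge for Sigma = I
eta = N ** (-0.6)                             # well inside the domain D
z = lam_plus + 1j * eta
mN = np.mean(1.0 / (np.linalg.eigvalsh(W) - z))
print(abs(mN - m_det(z)), 1.0 / (N * eta))    # (3.2): LHS ~ 1/(N*eta) up to N^eps
```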

3.2. Edge universality with small support

Theorem 3.3 (Edge universality with small support). Suppose $X^W$ and $X^V$ are two random matrices satisfying Conditions 2.6 and 2.8 with $q \leq N^{-5/12+\zeta}$ for some small $\zeta > 0$. Then there exist some positive constants $\epsilon, \delta > 0$ such that for any $s \in \mathbb{R}$,
$$ \mathbb{P}^V\big( N^{2/3}(\lambda_1 - \lambda_+) \leq s - N^{-\epsilon} \big) - N^{-\delta} \leq \mathbb{P}^W\big( N^{2/3}(\lambda_1 - \lambda_+) \leq s \big) \leq \mathbb{P}^V\big( N^{2/3}(\lambda_1 - \lambda_+) \leq s + N^{-\epsilon} \big) + N^{-\delta}, \tag{3.7} $$
where $\mathbb{P}^V$ and $\mathbb{P}^W$ denote the laws of $X^V$ and $X^W$ respectively.

Remark 3.2. Theorem 3.3 can be extended to the joint distribution of the largest $k$ eigenvalues for any fixed positive integer $k$; that is, for any real numbers $s_1, \dots, s_k$ which may depend on $N$, there exist some positive constants $\varepsilon, \delta > 0$ such that for all large $N$,

$$ \mathbb{P}^V\big( N^{2/3}(\lambda_1 - \lambda_+) \leq s_1 - N^{-\varepsilon}, \dots, N^{2/3}(\lambda_k - \lambda_+) \leq s_k - N^{-\varepsilon} \big) - N^{-\delta} \leq \mathbb{P}^W\big( N^{2/3}(\lambda_1 - \lambda_+) \leq s_1, \dots, N^{2/3}(\lambda_k - \lambda_+) \leq s_k \big) \leq \mathbb{P}^V\big( N^{2/3}(\lambda_1 - \lambda_+) \leq s_1 + N^{-\varepsilon}, \dots, N^{2/3}(\lambda_k - \lambda_+) \leq s_k + N^{-\varepsilon} \big) + N^{-\delta}. \tag{3.8} $$

3.3. Edge universality with large support

Theorem 3.4 (Rigidity of eigenvalues with large support). Suppose the random matrix $X$ satisfies Conditions 2.6 and 2.8 with $q \leq N^{-c}$ for some constant $c > 0$, and suppose moreover that
$$ \mathbb{E}\big| \xi_i^2 - \phi \big|^2 \leq B N^{-1}\log N, \tag{3.9} $$
for some constant $B > 0$. Then there exist constants $c_1, \tau, \tau'$ such that

$$ \sup_{z \in \mathbf{D}^e} \frac{|m_N(z) - m(z)|}{(N\eta)^{-1}} \prec 1, \tag{3.10} $$
for sufficiently large $N$. Moreover, (3.10) implies that with high probability,
$$ |\lambda_i - \gamma_i| \prec i^{-1/3} N^{-2/3}, \tag{3.11} $$
uniformly in $i$ such that $\gamma_i \in [\lambda_+ - c_1, \lambda_+]$, and
$$ \sup_{E \geq \lambda_+ - c_1} |n_N(E) - n(E)| \prec \frac{1}{N}. \tag{3.12} $$

Theorem 3.5 (Edge universality with large support). Suppose $X^W$ and $X^V$ are two random matrices satisfying the assumptions in Theorem 3.4. Then there exist some positive constants $\epsilon, \delta > 0$ such that for any $s \in \mathbb{R}$,
$$ \mathbb{P}^V\big( N^{2/3}(\lambda_1 - \lambda_+) \leq s - N^{-\epsilon} \big) - N^{-\delta} \leq \mathbb{P}^W\big( N^{2/3}(\lambda_1 - \lambda_+) \leq s \big) \leq \mathbb{P}^V\big( N^{2/3}(\lambda_1 - \lambda_+) \leq s + N^{-\epsilon} \big) + N^{-\delta}, \tag{3.13} $$
where $\mathbb{P}^V$ and $\mathbb{P}^W$ denote the laws of $X^V$ and $X^W$ respectively.


3.4. Edge universality

Let $u_1, \dots, u_N$ be i.i.d. from $U(S^{M-1})$, and $\tilde{\xi}_1, \dots, \tilde{\xi}_N$ be i.i.d. nonnegative random variables such that $\tilde{\xi}_1^2$ follows the $\chi^2_M/N$ distribution. Assume the independence of $\{u_1, \dots, u_N\}$ and $\{\tilde{\xi}_1, \dots, \tilde{\xi}_N\}$. Let $\tilde{X} := \Sigma^{1/2}(\tilde{\xi}_1 u_1, \dots, \tilde{\xi}_N u_N)$, so $\tilde{X}$ is a matrix whose columns are i.i.d. Gaussian random vectors. We claim in the following theorem that for elliptically distributed data $X$ and $\Sigma$ satisfying Condition 2.6, (2.1) and (2.9), the largest eigenvalue of its sample covariance matrix follows the same limiting distribution as the one with $\tilde{X}$, provided Condition 2.7 holds.

Theorem 3.6 (Edge universality). Let $W = X^*X$ be an $(N \times N)$ sample covariance matrix with $X$ satisfying Condition 2.6, (2.1) and (2.9). If Condition 2.7 holds, then we have for all $s \in \mathbb{R}$,
$$ \lim_{N\to\infty} \mathbb{P}\big( N^{2/3}(\lambda_1 - \lambda_+) \leq s \big) = \lim_{N\to\infty} \mathbb{P}\big( N^{2/3}(\tilde{\lambda}_1 - \lambda_+) \leq s \big), \tag{3.14} $$
where $\tilde{\lambda}_1$ is the largest eigenvalue of $\tilde{X}^*\tilde{X}$.

Corollary 3.3 (Tracy-Widom law). Under the assumptions in Theorem 3.6, we have

$$ \lim_{N\to\infty} \mathbb{P}\big( \gamma N^{2/3}(\lambda_1 - \lambda_+) \leq s \big) = F_1(s), $$
where $\gamma$ is defined by
$$ \frac{1}{\gamma^3} = \frac{1}{c^3}\Big( 1 + \phi \int \Big( \frac{\lambda c}{1 - \lambda c} \Big)^3 \pi(d\lambda) \Big), $$
and $F_1(s)$ is the type-1 Tracy-Widom distribution [42].

Remark 3.4. Theorem 3.6 can be extended to the joint distribution of the largest $k$ eigenvalues for any fixed positive integer $k$; namely, for any real numbers $s_1, \dots, s_k$ which may depend on $N$, there exist some positive constants $\varepsilon, \delta > 0$ such that for all large $N$,

$$ \mathbb{P}\big( N^{2/3}(\tilde{\lambda}_1 - \lambda_+) \leq s_1 - N^{-\varepsilon}, \dots, N^{2/3}(\tilde{\lambda}_k - \lambda_+) \leq s_k - N^{-\varepsilon} \big) - N^{-\delta} \leq \mathbb{P}\big( N^{2/3}(\lambda_1 - \lambda_+) \leq s_1, \dots, N^{2/3}(\lambda_k - \lambda_+) \leq s_k \big) \leq \mathbb{P}\big( N^{2/3}(\tilde{\lambda}_1 - \lambda_+) \leq s_1 + N^{-\varepsilon}, \dots, N^{2/3}(\tilde{\lambda}_k - \lambda_+) \leq s_k + N^{-\varepsilon} \big) + N^{-\delta}. \tag{3.15} $$
Accordingly, Corollary 3.3 can be extended to the joint distribution as follows:

$$ \big( \gamma N^{2/3}(\lambda_1 - \lambda_+), \dots, \gamma N^{2/3}(\lambda_k - \lambda_+) \big) $$
converges to the $k$-dimensional joint Tracy-Widom distribution. Here we use the term "joint Tracy-Widom distribution" as in Theorem 1 of [41]. The extension (3.15) can be proved by a similar argument to the one in [37]. Hence we do not reproduce the details.
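To make the constants in Corollary 3.3 concrete, the following sketch (ours, not part of the proofs) computes $c$ from $f'(-c) = 0$ by bisection and then evaluates $\lambda_+ = f(-c)$ and $\gamma$ for a diagonal $\Sigma$; the closed form $c = 1/(1+\sqrt{\phi})$, $\lambda_+ = (1+\sqrt{\phi})^2$ for $\Sigma = I$ serves as a sanity check. All spectra below are arbitrary illustrative choices.

```python
# Edge lambda_+ and Tracy-Widom scaling gamma (Corollary 3.3) for diagonal Sigma.
import numpy as np

def edge_and_gamma(sigmas, phi):
    sigmas = np.asarray(sigmas, dtype=float)
    # h(c) = f'(-c); it decreases from +inf to -inf on (0, 1/sigma_max)
    h = lambda c: 1.0 / c**2 - phi * np.mean(sigmas**2 / (1.0 - c * sigmas)**2)
    lo, hi = 1e-12, 1.0 / sigmas.max() - 1e-12
    for _ in range(200):                       # bisection for f'(-c) = 0
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if h(mid) > 0 else (lo, mid)
    c = 0.5 * (lo + hi)
    lam_plus = 1.0 / c + phi * np.mean(sigmas / (1.0 - c * sigmas))
    inv_gamma3 = (1.0 / c**3) * (1.0 + phi * np.mean((sigmas * c / (1.0 - c * sigmas))**3))
    return lam_plus, inv_gamma3 ** (-1.0 / 3.0), c

phi = 0.5
print(edge_and_gamma(np.ones(200), phi), (1 + np.sqrt(phi))**2)   # sanity check, Sigma = I
print(edge_and_gamma(np.linspace(0.5, 2.0, 200), phi))            # a non-trivial spectrum
```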

3.5. Sketch of the proof

First, we show Theorem 3.1, which will serve as a crucial input for the proofs of Theorem 3.2, Theorem 3.3, Theorem 3.4, Theorem 3.5 and Theorem 3.6. The proof strategy essentially dates

back to [22, 30, 37]. We start by studying each entry of the Green function $G(z)$. The general target is to show that each diagonal element of $G(z)$ is close to $m(z)$ and the off-diagonal elements of $G(z)$ are close to 0 under the bounded support $q$. Before attaining the final goal, our first step is to obtain a weaker but still nontrivial version of the local law, i.e. $\Lambda(z) \prec (N\eta)^{-1/4} + q$. Compared to previous papers, e.g. [8, 9, 30, 37], assuming i.i.d. entries in the data matrix, the main difficulty of our work is to deal with the dependence among entries in each column $x_i$, $i = 1, \dots, N$. Due to the dependence, the usual large deviation bounds for i.i.d. vectors in [8, 9, 30, 37] are no longer applicable. In Section 4, we present the large deviation inequalities (Lemma 4.4) for uniformly spherically distributed random vectors and give their proofs in Appendix I. Moreover, the radius variable $\xi_i$ causes extra fluctuation, which is the reason for introducing Condition 2.8, so as to reduce the variation. Also due to the dependence, the strategy in [30] of expanding the matrix $X$ along both rows and columns cannot be applied. We tackle this issue by expanding $X$ only along columns and bounding the errors emerging from the finite sample approximation of the Marčenko-Pastur equation. Then the weak deformed local law can be achieved by a bootstrapping procedure. Next, the weaker bound is strengthened to

$$ \Lambda(z) \prec \sqrt{\frac{\operatorname{Im} m(z)}{N\eta}} + \frac{1}{N\eta} + q $$
via self-improving steps utilizing a so-called fluctuation averaging argument. This procedure involves estimating the conditional expectation of $\frac{1}{N}\sum_{i\in\mathcal{I}} x_i^* \mathcal{G}^{(i)} \Sigma (m_N^{(i)}\Sigma + I)^{-1} x_i$. The difficulty lies not only in the dependence within each column but also in the randomness of $(m_N^{(i)}\Sigma + I)^{-1}$. In order to handle these difficulties, we expand $x_i^* \mathcal{G}^{(i)}$ and $(m_N^{(i)}\Sigma + I)^{-1}$ respectively. This produces several weakly correlated monomials of quadratic forms with entries of $\mathcal{G}^{(i)}$ as coefficients. For these Green function entries, we further expand them by resolvent identities. One can refer to Appendix II for the details.

With (3.2) at hand, Theorem 3.2 follows from a standard argument similar to Proposition 9.1 of [10] and the Helffer-Sjöstrand argument; see e.g. Theorem 2.8 and Appendix C of [10] or (8.6) of [37]. For the readers' convenience, we write down the details of the proof of Theorem 3.2 in Appendix III. For Theorem 3.3, we use the Green function comparison method. The strategy follows [37] with a Lindeberg-type column-by-column replacement due to the dependence within each column of $X$. The details are provided in Section 6. The establishment of Theorem 3.4 and Theorem 3.5 is the key step to prove Theorem 3.6. Roughly speaking, we find that the strong average local law holds with larger support and some mild restrictions on the four leading moments of $X$. Such moment restrictions can be further relaxed to the tail probability Condition 2.7 using a truncation technique, which concludes Theorem 3.6. The main tool is still the Green function comparison method; the details are put in Sections 7 and 8.

4. Preliminary results

In this section, we present some preliminary results that will be used in the derivation of our main theorems in Sections 5 and 6. Lemma 4.1 is by Schur's complement formula, whose proof can be found in Lemma 4.2 of [21]. The proofs of Lemmas 4.2 and 4.4 are given in Appendix I. Lemma 4.3 is by elementary linear algebra, whose proof is omitted.


Lemma 4.1. Under the above notation, for any $T \subset \mathcal{I}$,
$$ G^{(T)}_{ii}(z) = \frac{1}{-z - z\, x_i^*\mathcal{G}^{(iT)}(z) x_i}, \quad \forall\, i \in \mathcal{I}\setminus T, $$
$$ G^{(T)}_{ij}(z) = z\, G^{(T)}_{ii}(z)\, G^{(iT)}_{jj}(z)\, x_i^*\mathcal{G}^{(ijT)}(z) x_j, \quad \forall\, i, j \in \mathcal{I}\setminus T,\ i \neq j, $$
$$ G^{(T)}_{ij}(z) = G^{(kT)}_{ij}(z) + \frac{G^{(T)}_{ik}(z)\, G^{(T)}_{kj}(z)}{G^{(T)}_{kk}(z)}, \quad \forall\, i, j, k \in \mathcal{I}\setminus T,\ i, j \neq k. $$

Lemma 4.2. Let $\{X_N\}_{N=1}^{\infty}$ be a sequence of random variables and $\Phi_N$ be deterministic. Suppose $\Phi_N \geq N^{-C}$ holds for large $N$ with some $C > 0$, and that for all $p$ there exists a constant $C_p$ such that $\mathbb{E}|X_N|^p \leq N^{C_p}$. Then we have the equivalence
$$ X_N \prec \Phi_N \iff \mathbb{E}X_N^p \prec \Phi_N^p \ \text{ for any fixed } p \in \mathbb{N}. $$

Lemma 4.3. Let $A$, $B$ be two matrices such that $AB$ is well-defined. Then

$$ |\operatorname{Tr}(AB)| \leq \|A\|_F \|B\|_F, \qquad \|AB\| \leq \|A\|\,\|B\|, $$
$$ \|AB\|_F \leq \min\{ \|A\|_F\|B\|,\ \|A\|\,\|B\|_F \} \leq \|A\|_F\|B\|_F, $$
$$ \|A + B\|_F \leq \|A\|_F + \|B\|_F, \qquad |\operatorname{Tr}(AB)| \leq \|A\|\operatorname{Tr}|B|. $$

Lemma 4.4. Let $u = (u_1, \dots, u_M)^*$, $\tilde{u} = (\tilde{u}_1, \dots, \tilde{u}_M)^*$ be $U(S^{M-1})$ random vectors, $A = (a_{ij})$ an $M \times M$ matrix and $b = (b_1, \dots, b_M)^*$ an $M$-dimensional vector, where $A$ and $b$ may be complex-valued and $u$, $\tilde{u}$, $A$, $b$ are independent. Then as $M \to \infty$,
$$ |b^* u| \prec \sqrt{\frac{\|b\|^2}{M}}, \tag{4.1} $$
$$ \Big| u^* A u - \frac{1}{M}\operatorname{Tr} A \Big| \prec \frac{1}{M}\|A\|_F, \tag{4.2} $$
$$ |u^* A \tilde{u}| \prec \frac{1}{M}\|A\|_F. \tag{4.3} $$

Moreover, if $u$, $\tilde{u}$, $A$, $b$ depend on an index $t \in T$ for some set $T$, then the above domination bounds hold uniformly for $t \in T$.

Recalling the definition of $\kappa$, we then introduce the following two results, whose proofs can be found in Lemmas A.4 and A.5 of [30]. In particular, the edge regularity condition required in [30] is encompassed in (2.9).

Lemma 4.5. Fix $\tau > 0$. Given assumption (2.9), there exists $\tau' > 0$ such that for any $z \in \mathbf{D}^e(\tau, \tau', N)$ we have

$$ \operatorname{Im} m(z) \asymp \begin{cases} \sqrt{\kappa + \eta}, & \text{if } E \in \operatorname{supp}(\varrho), \\ \dfrac{\eta}{\sqrt{\kappa + \eta}}, & \text{if } E \notin \operatorname{supp}(\varrho), \end{cases} \qquad |1 + m(z)\sigma_i| \geq \tau, \quad \forall\, i \in \{1, \dots, M\}. \tag{4.4} $$


Proposition 4.6. Fix $\tau > 0$. There exists a constant $\tau' > 0$ such that $z = f(m)$ is stable at the edge on $\mathbf{D}^e(\tau, \tau', N)$ in the following sense. Suppose $\delta: \mathbf{D}^e \to (0, \infty)$ satisfies $N^{-2} \leq \delta(z) \leq \log^{-1} N$ for $z \in \mathbf{D}^e$ and that $\delta$ is Lipschitz continuous with Lipschitz constant $N^2$. Suppose moreover that for each fixed $E$, the function $\eta \mapsto \delta(E + \imath\eta)$ is nonincreasing for $\eta > 0$. Suppose that $u: \mathbf{D}^e \to \mathbb{C}$ is the Stieltjes transform of a probability measure supported on $[0, C]$. Let $z \in \mathbf{D}^e$ and suppose that
$$ |f(u(z)) - z| \leq \delta(z). $$
If $\operatorname{Im} z < 1$, suppose also that
$$ |u - m| \leq \frac{C\delta}{\sqrt{\kappa + \eta} + \sqrt{\delta}} \tag{4.5} $$
holds at $z + \imath N^{-5}$. Then (4.5) holds at $z$.
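As a side illustration (ours, not used in the proofs), the orders of magnitude in (4.1) and (4.2) of Lemma 4.4 can be checked by a small Monte Carlo experiment; the matrix $A$ and vector $b$ below are arbitrary test choices.

```python
# Monte Carlo check of the large deviation bounds (4.1)-(4.2) for u ~ U(S^{M-1}).
import numpy as np

rng = np.random.default_rng(2)
M, reps = 500, 2000
A = rng.standard_normal((M, M))
b = rng.standard_normal(M)

g = rng.standard_normal((M, reps))
U = g / np.linalg.norm(g, axis=0)                  # columns uniform on S^{M-1}

quad = np.sum(U * (A @ U), axis=0) - np.trace(A) / M   # u^* A u - Tr A / M
lin = b @ U                                            # b^* u

print(np.quantile(np.abs(quad), 0.99), np.linalg.norm(A, 'fro') / M)   # (4.2) scale
print(np.quantile(np.abs(lin), 0.99), np.linalg.norm(b) / np.sqrt(M))  # (4.1) scale
```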

5. Proof of the local law

In this section, we prove Theorem 3.1. Theorem 3.2 follows from Theorem 3.1 directly by standard arguments, whose details are put in Appendix III. Firstly, we prove a weaker result.

Proposition 5.1 (Deformed weak local law). Suppose Conditions 2.6 and 2.8 as well as (2.1) and (2.9) hold. Then there exists a constant $\tau' > 0$ depending only on $\tau$ such that $\Lambda \prec (N\eta)^{-1/4} + q$ uniformly for $z \in \mathbf{D}^e(\tau, \tau', N)$ with high probability.

For $i \in \mathcal{I}$, define $P_i$ as the operator of expectation conditioning on all of $(u_1, \dots, u_N)$ and $(\xi_1, \dots, \xi_N)$ except $u_i$. Denote $Q_i = 1 - P_i$. Define
$$ Z_i := Q_i\big( x_i^*\mathcal{G}^{(i)} x_i \big) = x_i^*\mathcal{G}^{(i)} x_i - \frac{\xi_i^2}{M}\operatorname{Tr}(\mathcal{G}^{(i)}\Sigma). $$
We observe from Lemma 4.1 that

$$ \frac{1}{G_{ii}} = -z - z\, x_i^*\mathcal{G}^{(i)} x_i = -z - z\frac{\xi_i^2}{M}\operatorname{Tr}(\mathcal{G}^{(i)}\Sigma) - zZ_i. \tag{5.1} $$
In the following, we denote
$$ \mathcal{U}_i = \frac{1}{M}\big\{ \operatorname{Tr}(\mathcal{G}\Sigma) - \operatorname{Tr}(\mathcal{G}^{(i)}\Sigma) \big\}, \quad i \in \mathcal{I}, $$
$$ \mathcal{V} = \frac{1}{M}\big\{ \operatorname{Tr}\{ (-zm_N\Sigma - zI)^{-1}\Sigma \} - \operatorname{Tr}(\mathcal{G}\Sigma) \big\}. $$
Note that from (5.1) and the definitions of $\mathcal{U}_i$ and $\mathcal{V}$, we have
$$ \frac{1}{G_{ii}} = -z + z\xi_i^2\,\mathcal{U}_i + z\xi_i^2\,\mathcal{V} - z\frac{\xi_i^2}{M}\operatorname{Tr}\{ (-zm_N\Sigma - zI)^{-1}\Sigma \} - zZ_i. \tag{5.2} $$
Before proceeding to prove Proposition 5.1, we provide the following useful Lemmas and Propositions 5.2 to 5.6, whose proofs are in Appendix II. Recall that $\Xi$ is the event $\{\Lambda \leq (\log N)^{-1}\}$.


Lemma 5.2. (m Σ+ I) 1 1 1 N − x x (i) ( zmN Σ zI)− = (i) ( i i∗ Σ ). G− − − z(1 + x∗ x ) G − N G i i i X∈I G (T ) 2 1 (T ) Lemma 5.3 (Ward identity). Let T such that 0 T C. Then = η− Im Tr . ⊂ I ≤ | |≤ kG kF G Lemma 5.4. For any i ∈ I (i) 1 Tr(G G) η− , | (i) − | ≤ 1 1 Tr( ) z − + η− , | G(i) − G | ≤ | | 2 1 Im Tr( ) η z − + η− . | G − G | ≤ | | Proposition 5.5 (General properties of m). Fix τ > 0. Given (2.1) and (2.9), there exists a constant C > 0 such that 1 m(z) 1, Im m(z) C− η, (5.3) | |≍ ≥ + 1 for all z C satisfying τ z τ − . ∈ ≤ | |≤ Lemma 5.6. Let T be an index set such that 0 T C1 for some constant C1 0 ( T may be empty set). Then ≤ | |≤ ≥ (T ) 1 1(Ξ) + 1(η 1) Gij + 1(Ξ) C, { ≥ }| | G(T ) ≤ ii for some constant C > 0 uniformly for i, j and z D . ∈ I ∈ Now we proceed to prove the weak local law. We start with the next lemma which provides a good control for the error when η 1 or Ξ holds. ≥ Lemma 5.7. Suppose Conditions 2.6, 2.8, (2.1) and (2.9) hold. Then 1(η 1)+ 1(Ξ) ( Z +Λ ) Ψ , (5.4) { ≥ } | i| o ≺ Θ 1(η 1) + 1(Ξ) ( + ) Ψ , (5.5) { ≥ } |V| |Ui| ≺ Θ uniformly for i and z D. ∈ I ∈ Proof. We firstly show (5.4). Applying Lemmas 4.1, 4.3 and (4.3), we obtain that uniformly for z D and i, j with i = j, ∈ ∈ I 6 (i) (ij) (i) 1 (ij) 1(Ξ) G 1(Ξ) z G G x∗ x 1(Ξ) G G ξ ξ Σ . (5.6) | ij |≤ | || ii jj || i G j |≺ | ii jj | i j M k kkG kF Using Lemma 4.1, we obtain that for any k i, j , ∈ I\{ } (i) (i) GkiGij GjiGik G G G G (Gkj )(Gjk ) (ij) (i) kj jk ki ik − Gii − Gii Gkk = Gkk (i) = Gkk (i) − − Gii − Gjj Gjj

GkiGij Gjk Gkj GjiGik GkiGij GjiGik G G + 2 G G kj jk Gii Gii G ki ik − − ii = Gkk (i) − Gii − Gjj

GkiGik Gkj Gjk GkiGij Gjk Gkj GjiGik GkiGij GjiGik = Gkk (i) (i) (i) + (i) . − Gii − − − 2  Gjj Gjj Gii Gjj Gii Gjj Gii 


Then we have from Lemma 5.6 that

1(Ξ) G(ij) G | kk − kk| G G G G G G G G G G G G G G 1 ki ik kj jk ki ij jk kj ji ik ki ij ji ik (Ξ) | | + | (i) | + | (i) | + | (i) | + | (i) | ≤ Gii G G G G G G G2  | | | jj | | jj ii| | jj ii| | jj ii|  1(Ξ)C(Λ2 +Λ3 +Λ4) 1(Ξ)CΛ2, (5.7) ≤ o o o ≤ o 3 4 2 where the last inequality holds because Λo +Λo Λo for large N given Ξ. Then it follows from (5.7) and Lemma 5.6 that ≤

1(Ξ) Im TrG(ij) Im TrG = 1(Ξ) Im G(ij) Im G | − | kk − kk k i,j k ∈I\{X } X∈I 1(Ξ) (G(ij) G ) + 1(Ξ) Im G + Im G ≤ kk − kk ii jj k i,j ∈I\{X } 2 1(Ξ) CNΛ2 + 1(Ξ)2 Im m(z)+ . (5.8) ≤ o log N We note that (N 2 M) Tr (ij) = − − + TrG(ij). (5.9) G z Applying Lemma 5.3 and (5.9), we have

(ij) 2 Im Tr (ij) Im TrG(ij) (N 2 M) 1(Ξ)kG kF = 1(Ξ) G = 1(Ξ) − − . (5.10) M 2 M 2η M 2η − M 2 z 2 n | | o 1 It then follows from (5.3), (5.8), (5.10) and MN − 1 that ≍ 1 Im m(z)+Θ+Λ2 1(Ξ) (ij) 2 1(Ξ)C o . (5.11) M 2 kG kF ≤ Nη

Using Lemma 5.6, (5.6), (5.11) and the fact ξ 1 uniformly for all i , we have i ≺ ∈ I Im m +Θ+Λ2 1/2 1(Ξ) G 1(Ξ) o . | ij |≺ Nη   Therefore, by the definition of stochastic domination,

Im m +Θ Λo 1(Ξ) Λo 1(Ξ) + 1(Ξ) 1(Ξ)Λo 1(Ξ)ΨΘ. | |≺ s Nη (Nη)1/2 ⇒ ≺

Now we evaluate the bound for Zi. Similarly to (5.11), we can easily derive that uniformly for any i , ∈ I 1 Im m(z)+Θ+Λ2 1(Ξ) (i) 2 1(Ξ) o . (5.12) M 2 kG kF ≺ Nη


2 It follows from Lemmas 4.4, 5.3, (5.12) and ξi 1 by bounded support assumption for i that ≺ ∈ I 2 (i) ξi (i) 2 1 (i) 1(Ξ)Z = 1(Ξ) zx∗ x z Tr( Σ) 1(Ξ) z ξ Σ i { i G i − M G }≺ | | i M kG kF 2 2 2 σ1 (i) Im m +Θ+Λo 1(Ξ) z ξi F 1(Ξ) . ≤ | | M kG k ≺ s Nη Using the bound 1(Ξ)Λ 1(Ξ)Ψ , we obtain that o ≺ Θ Im m +Θ √Im m +Θ Im m +Θ 1(Ξ)Z 1(Ξ) + 1(Ξ) . i ≺ Nη Nη ≺ Nη s  s Now we show the result when η 1. Let i, j such that i = j. It follows from Lemma 5.6, (5.10) and ξ 1 for i that ≥ ∈ I 6 i ≺ ∈ I (i) (ij) 1(η 1) G 1(η 1) G G x∗ x ≥ | ij |≤ ≥ | ii jj | i G j 1 1(η 1) Σ (ij) ≺ ≥ M k kkG kF Im Tr (ij) 1/2 1(η 1) Σ G ≤ ≥ k k M 2η   Im TrG(ij) N 2 M 1/2 = 1(η 1) Σ − − . ≥ k k M 2η − M 2 z 2  | |  Let T be a subset of such that T C for all large N. From Lemma 5.4, we know that I | |≤ (T ) 1 TrG TrG Cη− . (5.13) | − |≤ It then follows from Proposition 5.5 and (5.13) that

Im Tr (T ) Im TrG(T ) N T M 1(η 1) G = 1(η 1) − | |− ≥ M 2η ≥ M 2η − M 2 z 2   1 | | Im TrG Cη− N T M Im m +Θ 1(η 1) + − | |− =Ψ2 . (5.14) ≤ ≥ M 2η M 2η − M 2 z 2 ≺ Nη Θ  | |  Consequently

Im m +Θ 1(η 1)Λo = 1(η 1) max Gij 1(η 1) . ≥ ≥ i,j ,i=j | |≺ ≥ Nη ∈I 6 s For Z , using Lemma 4.4, (5.14) and ξ2 1 for i , we have i i ≺ ∈ I 2 (i) ξi (i) 2 1 (i) 1(η 1)Z = 1(η 1) zx∗ x z Tr( Σ) 1(η 1) z ξ Σ ≥ i ≥ { i G i − M G }≺ ≥ | | i M kG kF

2 1 (i) Im m +Θ 1(η 1) z ξi Σ F 1(η 1) . ≤ ≥ | | k kM kG k ≺ ≥ s Nη


Hence (5.4) follows. Next, we will show (5.5). Under Ξ, applying Lemmas 4.1, 4.3, 4.4, 5.3, (5.12) and ξ2 1, we get, for any i , i ≺ ∈ I (i) (i) 1 (i) 1 xi∗ Σ xi i = Tr[( )Σ] = G G(i) |U | M G − G M 1+ x∗ xi i G 1 (i) (i) 1 1 (i) (i) 1 (i) (i) = zG x∗ Σ x zG Tr( Σ Σ) + Σ Σ M ii i G G i ≺ M | ii| M G G M kG G kF 2 2   zG (i)Σ 2 zG (i) 2 Σ 2 ≤ M 2 | ii|kG kF ≤ M 2 | ii|kG kF k k Ψ2 . ≺ Θ Similarly, under Ξ,

1 1 1(Ξ) = 1(Ξ) Tr( zm Σ zI)− Σ Tr Σ |V| M − N − − G 1  1 (mN Σ+ I)− (i) 1 1 (i) 1 (i) =1(Ξ) Tr (x x∗ Σ Σ Σ+ Σ Σ Σ Σ) M z(1 + x (i)x ) i i G − N G N G − N G i i∗ i ! X∈I G 2 (5.15) 1 ξi 1 (i) q 2 1(Ξ) Σ (m Σ+ I)− Σ + +Ψ ≺ M M k kk N G kF √ Θ i N X∈I 1 1 1 (i) q 2 61(Ξ) Σ (m Σ+ I)− Σ + +Ψ . M M k kk N kk G kF √ Θ i N X∈I Since by assumption (2.9) mσ +1 > τ, | i | we have ′ 1(Ξ) 1+ m σ > 1(Ξ)( 1+ mσ m m σ ) > τ > 0. | N | | i| − | − N | i Combining (5.15) we have 1 1 q 1(Ξ) Σ (i) + +Ψ2 Ψ . |V| ≺ M M k G kF √ Θ ≺ Θ (5.16) i N X∈I For η > 1, the procedure is similar and we omit the details. Then the lemma holds. Remark 5.8. In the following proof,we will use two relations several times, 1 1 1(η 1)+ 1(Ξ) (i) Ψ , 1(η 1) + 1(Ξ) Ψ , (5.17) { ≥ }M kG kF ≺ Θ { ≥ }M kGkF ≺ Θ so we summarize them here. With the above results, we can further prove the next lemma. Lemma 5.9. Under the assumptions in Lemma 5.7, one has

$$ \{\mathbf{1}(\eta \geq 1) + \mathbf{1}(\Xi)\}\,|G_{ii} - G_{jj}| \prec \Psi_\Theta + q, \tag{5.18} $$
uniformly for $i, j \in \mathcal{I}$ and $z \in \mathbf{D}$.


Proof. We observe from (5.1) that 1 1 Gii Gjj = GiiGjj | − | Gjj − Gii   ξ2 ξ2 6 G G Z Z + G G z i Tr( (i)Σ) j Tr( (j)Σ) | ii jj || i − j | ii jj M G − M G   ξ2 ξ2 (5.19) 6 G G Z Z + G G z i Tr( (i)Σ) + j Tr( (j)Σ) | ii jj || i − j | | ii jj || | M | G | M | G | ξ2  ξ2 ξ2  Z Z + i Tr( (i)Σ (j)Σ )+ i − j Tr( (j)Σ ) ≺ | i − j| M |G − G | M |G | Ψ + q +Ψ2 , ≺ Θ Θ where we used Condition 2.8 and the fact that under Ξ, (i) σ . |Gkk |≍ k Now we can complete the proof of Proposition 5.1. Proof of Proposition 5.1. We observe from (5.18) that 1 1 1 1(Ξ) + 1(η 1) { ≥ } N Gii − mN i n X∈I o 2 1 Gii mN (Gii mN ) = 1(Ξ) + 1(η 1) −2 + − 2 { ≥ }N − m Giim i N N X∈I   2 1 (Gii mN ) = 1(Ξ) + 1(η 1) − 2 { ≥ }N Giim i N X∈I Ψ2 + q2. (5.20) ≺ Θ It then follows from (5.2), Condition 2.8 and (5.20) that

1 1 1 2 2 1(Ξ) + 1(η 1) = 1(Ξ) + 1(η 1) + O (ΨΘ)+ O (q ) { ≥ }mN { ≥ }N Gii ≺ ≺ i X∈I 1 1 = 1(Ξ) + 1(η 1) z 1+ ξ2 + ξ2 { ≥ } − N i Ui N V i  i i  X∈I X∈I  1 2 1 1 z 2 2 (5.21) + ξi Tr (mN Σ+ I)− Σ Zi + O (ΨΘ)+ O (q ) M N { }− N ≺ ≺ i i  X∈I X∈I 1 1 2 = 1(Ξ) + 1(η 1) z + Tr (mN Σ+ I)− Σ + O (ΨΘ)+ O (q ). { ≥ } − N { } ≺ ≺   Since 1 σi Tr (mN Σ+ I)− Σ = , { } mN σi +1 i X∈I it follows from the definition of f(x) in (2.8) that

1(Ξ) + 1(η 1) f(m ) z Ψ + q2. (5.22) { ≥ }{ N − }≺ Θ


Applying Proposition 4.6, for any ε> 0 we have 2 ΨΘ + q 2 1(η 1) mN m ΨΘ + q . (5.23) ε 2 ≥ | − |≺ √κ + η + N (ΨΘ + q ) ≺ p Therefore, it follows from (5.18), (5.23) and Lemmap 5.7 that 1/2 1(η 1)Λ(z) 1(η 1) max Gii mN + mN m +Λo N − + q. (5.24) ≥ ≤ ≥ { i | − | | − | }≺ The rest proof of Proposition 5.1 follows from a standard bootstrapping step which we summarize into Appendix II-vi and omit further details here. Now we can prove Theorem 3.1. Note that for i , ∈ I 2 1 ξi (i) Qi = Qi z z Tr( Σ) zZi = zZi, (5.25) Gii {− − M G − } − and we write 1 1 (mN Σ+ I)− (i) 1 (i) 1 (i) 1 = Tr (x x∗ Σ Σ Σ+ Σ Σ Σ Σ) V M z(1 + x (i)x ) i i G − N G N G − N G i i∗ i  X∈I G  (5.26) 1 1 1 1 (i) = G Tr( )+ G Tr((m Σ+ I)− Σ( )Σ), M ii Vi M ii N N G − G i i X∈I X∈I where 1 (i) 1 (i) := (m Σ+ I)− (x x∗ Σ Σ Σ). (5.27) Vi N i i G − N G For the second term in (5.26),

1 1 1 (i) G Tr((m Σ+ I)− Σ( )Σ) M ii N N G − G i X∈I (i) (i) 1 1 1 xixi∗ = Gii Tr((mN Σ+ I)− Σ G (Gi) Σ) (5.28) M N 1+ xi∗ xi i G X∈I 1 2 1 (i) 1 (i) 6 G x∗ (m Σ+ I)− x , M | ii| N | i G N G i| i X∈I 2 which can be bounded by ΨΘ by Lemma 4.3, Lemma 4.4, Lemma 5.6 and (5.17). Furthermore, using the same methods one can easily verify that 1 1 1 G Tr( )= G Tr( (i))+ G Tr( (i)) M ii Vi M ii Vi M ii Vi −Vi i i i X∈I X∈I X∈I (5.29) 1 G Tr( (i))+Ψ2 , ≺ M ii Vi Θ i X∈I (i) (i) 1 (i) 1 (i) where i := (mN Σ+ I)− (xixi∗ Σ N Σ Σ). FromV Proposition 5.1, we knowG that− Ξ is trueG with high probability, i.e. 1 1(Ξ). So from now on, we can drop the factor 1(Ξ) in all Ξ dependent results without affecting≺ their validity. To improve the deformed weak local law to the strong local law, a key input is Proposition 5.10 below whose proof we postpone to Appendix II-vii.


Proposition 5.10 (Fluctuation averaging). Let ν [1/4, 1] and τ ′ be defined in Proposition − ∈ Im m+(Nη) ν +q ν 5.1. Denote Φν = Nη . Suppose moreover that Θ (Nη)− + q uniformly for e ≺ z D (τ, τ ′,N). Thenq we have ∈ 1 1 2 Qi Φν , (5.30) N Gii ≺ i X∈I and 1 Q V Φ2 , (5.31) N i i ≺ ν i X∈I e uniformly for z D (τ, τ ′,N), where V is defined as ∈ i (i) (i) 1 V := x∗ Σ(m Σ+ I)− x . (5.32) i i G N i ε 1/2 ν Proof of Theorem 3.1. Let ε> 0 be an arbitrary small number. Suppose Θ N (q +(Nη)− ) holds with high probability for some ν [1/4, 1] uniformly for z De. The≤ idea is to update ν by applying Proposition 5.10 iteratively.∈ ∈ ε 1/2 ν Let Φν be defined in Proposition 5.10. Given that Θ N (q + (Nη)− ) holds with high probability, it follows from (5.25), (5.28), Proposition 5.10≤ and (5.21) that 1 Im m f(m ) z N ε Φ2 + q2 6 N ε q2 + + , | N − |≤ { ν } { (Nη)ν+1 Nη } holds with high probability uniformly for z De. Then we observe from Lemma 4.5 and Proposition∈ 4.6 that 2 1 Im m q + ν+1 + Θ 6 N ǫ { (Nη) Nη } 2 1 Im m √κ + η + q + ν+1 + { (Nη) Nη } q Im m 1 Im m 6 CN ǫ + q2 + + (5.33) Nη√κ + η s (Nη)ν+1 Nη   Im m 1 6 CN ǫ + q + s Nη (Nη)(ν+1)/2   holds with high probability uniformly for z De. Then using Lemma 5.7 and Lemma 5.9, it is easy to check ∈

$$ \Lambda \leq CN^{\epsilon}(\Psi_\Theta + q) + \Theta \leq CN^{\epsilon}\Big( \sqrt{\frac{\operatorname{Im} m}{N\eta}} + q + \frac{1}{(N\eta)^{(\nu+1)/2}} \Big). $$
One can see that after the self-improving arguments, the error bound on $\Lambda$ improves from $1/(N\eta)^{\nu}$ to $1/(N\eta)^{(\nu+1)/2}$. Hence, implementing the arguments a finite number (depending only on $\varepsilon$) of times, we obtain that

$$ \Lambda \leq CN^{\epsilon}\Big( q + \frac{1}{N\eta} + \sqrt{\frac{\operatorname{Im} m}{N\eta}} \Big) \tag{5.34} $$
holds with high probability uniformly for $z \in \mathbf{D}^e$. Applying (5.34) in Lemma 4.5, Proposition 5.10 and Proposition 4.6, we conclude Theorem 3.1.
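As a small illustration of the self-improving mechanism (ours, not from the paper): each application of Proposition 5.10 updates the exponent $\nu$ in the bound $(N\eta)^{-\nu}$ to $(\nu+1)/2$, and iterating this map approaches 1 geometrically, which is why a finite, $\varepsilon$-dependent number of steps suffices.

```python
# Exponent bookkeeping for the bootstrap: nu -> (nu + 1) / 2, starting from 1/4.
nu = 0.25
for k in range(12):
    print(k, nu)          # 1 - nu = (3/4) * 2^{-k}, so nu -> 1 geometrically
    nu = (nu + 1) / 2
```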


6. Proof of the edge universality with small support

Once the following Green function comparison Theorem 6.1 holds, Theorem 3.3 will follow from a standard procedure. We only prove Theorem 6.1 in this section while the complete proof of Theorem 3.3 is put in Appendix III. Theorem 6.1 (Green function comparison on the edge). Let XV and XW be defined in Theorem 3.3. Let F : R R be a function whose derivatives satisfy → (k) C1 sup F (x) (1 + x )− C1, k =1, 2, 3, 4, (6.1) x R | | | | ≤ ∈ with some constants C1 > 0. Then there exist ε0 > 0, N0 Z+ depending on C1 such that for any ε<ε and N N and for any real numbers E, E and∈ E satisfying 0 ≥ 0 1 2 2/3+ε E λ , E λ , E λ N − | − +| | 1 − +| | 2 − +|≤ 2/3 ε and η = N − − , we have

V W 1/6+Cǫ EF (Nη Im m (z)) EF (Nη Im m (z)) CN − , z = E + ıη, (6.2) | N − N |≤ and

E2 E2 E V E W 1/6+Cǫ F N Im mN (y + ıη)dy F N Im mN (y + ıη)dy CN − , (6.3) E1 − E1 ≤  Z   Z  V 1 V V 1 where m (z)= N − Tr((X )∗X zI)− , Cǫ is a constant which tends to 0 as ǫ 0. N − → Proof. Let γ 1,...,N +1 and set Xγ to be the matrix whose first γ 1 columns are the same as those∈ of {XW and the} remaining N γ + 1 columns are the same as− those of XV . Then − we note that since Xγ and Xγ+1 only differ in the γ-th column,

(γ) (γ) Xγ = Xγ+1.

We define mN,γ(z) and mN,γ+1(z) to be the analogs of mN (z) with the matrix X replaced by X and X respectively. Similarly, for i , define m(i) (z) and m(i) (z) to be the analogs γ γ+1 ∈ I N,γ N,γ+1 (i) (i) (i) (i) of mN (z) with the matrix X replaced by Xγ and Xγ+1 respectively. Then we have EV F (Nη Im mV (z)) EW F (Nη Im mW (z)) N − N N = EF (Nη Im m (z)) EF (Nη Im m (z)) . N,γ − N,γ+1 γ=1 X n o So (6.2) follows from Lemma 6.1 below. (6.3) follows from an analogous argument. Hence we omit its proof. 2/3+ε Lemma 6.1. Let F be a function satisfying (6.1) and z = E + ıη. If E λ+ N − and 2/3 ε 2/3 | − |≤ N − − η N − for some ε > 0, there exists some positive constant C independent of ε such that ≤ ≤ 7/6+Cε EF (Nη Im m (z)) EF (Nη Im m (z)) N − , (6.4) | N,γ − N,γ+1 |≺ uniformly for γ 1,...,N +1 . ∈{ }


Proof. Recall the relationship between the eigenvalues of and G, G 1 1 1 φ m = TrG = Tr − , N N N G− z with 1 1 (γ) zGγγ (γ) 2 Tr Tr = x∗ ( ) x . |N G− N G | N γ G γ Here we can assume 1 φ to be 1 after introducing a multiplicative constant. Then | − |

(γ) 1 zGγγ (γ) 2 Ef(Nη Im m (z)) = EF (Nη Im(m (z) + x∗ ( ) x )). N,γ N,γ − Nz N γ G γ

Denoting ∗ yV = ηzG xV ( (γ))2xV , (6.5) γγ γ G γ by the Taylor expansion we obtain that η EF (Nη Im m (z)) = EF (Nη Im m(γ) (z) Im + Im yV ) N,γ N,γ − z η2 =E F (Nη Im m(γ) (z) ) N,γ − z 2 (6.6) h | | 3 2 1 (k) (γ) η V k 4/3+Cǫ + F (Nη Im m (z) )(Im y ) + O (N − ) , k! N,γ − z 2 ≺ Xk=1 | | i

V 1/3+Cǫ where we used the estimation y N − . Then the left-hand side of (|6.4)|≺ reads

EF (Nη Im m (z)) EF (Nη Im m (z)) | N,γ − N,γ+1 | 3 1 η2 = F (k)(Nη Im m(γ) (z) )(Im yV )k | k! N,γ − z 2 Xk=1 | | 3 2 (6.7) 1 (k) (γ) η W k 4/3+Cǫ F (Nη Im m (z) )(Im y ) + O (N − ) − k! N,γ+1 − z 2 ≺ | kX=1 | | 3 2 1 (k) (γ) η V k W k 4/3+Cǫ = F (Nη Im m (z) ) (Im y ) (Im y ) + O (N − ) . | k! N,γ − z 2 − ≺ | k=1 | | X  One can check that by Theorem 3.1 and Lemma 4.5 as well as the choice of η,

η2 Nη Im m(γ) (z) = Nη Im m(z)+ O (1) 1. N,γ − z 2 ≺ ≺ | | Then for k =1, 2, 3, 4, there exists c> 0 such that with high probability

η2 F (k)(Nη Im m(γ) (z) ) N c. (6.8) N,γ − z 2 ≤ | |


Now the proof of (6.4) is reduced to showing

V k W k 7/6+Cǫ (y ) (y ) N − , (6.9) | − |≺ for k =1, 2, 3. Let 2 (m Gγγ) 1 Gγγ 2m B := −2 = + −2 , (6.10) m Gγγ Gγγ m so by Lemma 5.6 and Theorem 3.1 we have

1 2/3+ǫ B N − . (6.11) | |≺ (Nη)2 ≤ On the other hand, we may write 2 2 2 m /(2m Gγγ ) m m k Gγγ = 2 − = ( B) . (6.12) m /(2m Gγγ )B +1 2m Gγγ −zm Gγγ k 0 − − X≥ − Consequently, we obtain 2 2 m m k (1) 2 y = ηz ( B) xγ∗ ( ) xγ = yk, (6.13) 2m Gγγ −2m Gγγ G k 0 k 0 X≥ − − X≥ where 2 2 m m k (1) 2 y := ηz ( B) x∗ ( ) x . k 2m G −2m G γ G γ − γγ − γγ Then one can check that

2/3 2k/3+ǫ 1/3+2ǫ 1/3 2k/3+Cǫ y N − N − N N − − . (6.14) | k|≺ ≤ Therefore it suffices to prove

3 V k W k 7/6+Cǫ ( y ) ( y ) N − . (6.15)  j − j  ≺ k=1 j 0 j 0 X X≥ X≥   We note that for j 2, yj is sufficiently small, hence it suffices to consider y0,y1 for the ≥ | E| V W following three cases. Now let γ be the conditional expectation with respect to ξγ and ξγ . a) k = 1. It suffices to bound Im(yV + yV ) Im(yW + yW ) . | 0 1 − 0 1 | We observe that E (yV yW ) = 0; (6.16) γ 0 − 0 1 1 ∗ 1 ∗ E V W E xV (γ) 2xV xW (γ) 2xW γ (y1 y1 ) = ηz 2 γ ( + C) γ ( ) γ ( + C) γ ( ) γ − | C Gγγ G − Gγγ G | 2   ηz V 4 W 4 1/2 (γ) (γ) 2 1/2 = E (ξ ) (ξ ) (Σ u∗ u Σu∗ ( ) u Σ ) | C2 γ γ − γ γ G γ γ G γ | 2   ηz 10/12+Cǫ 1/2 (γ) (γ) 2 1/2 N − Σ u∗ u Σu∗ ( ) u Σ ≺ | C2 γG γ γ G γ | 7/6+Cǫ N − , ≺ (6.17)


where we have used Condition 2.8 and the fact that second moments of ξV , ξW match . b) k = 2. In this case we only need to consider

(Im yV )2 (Im yW )2 . | 0 − 0 | Similarly, we observe that

E (yV )2 (yW )2 γ 0 − 0 2 2  η z V 4 W 4 1/2 (γ) 2 (γ) 2 1/2 (6.18) = E (ξ ) (ξ ) (Σ u∗ ( ) u Σu∗ ( ) u Σ ) | C γ γ − γ γ G γ γ G γ | 3/2+Cǫ  N − . ≺ c) k = 3. In this case we need to bound

(Im yV )3 (Im yW )3 . | 0 − 0 | We observe that

3 3 V 3 W 3 η z V 6 W 6 1/2 (γ) 2 1/2 3 E (y ) (y ) = E (ξ ) (ξ ) (Σ u∗ ( ) u Σ ) | γ 0 − 0 | | C γ γ − γ γ G γ | (6.19)   11/6+Cǫ  N − . ≺ Finally, combining all the results, we see that (6.4) holds. Thus we complete the proof of Lemma 6.1.

7. Proof of Theorem 3.4 and Theorem 3.5

7.1. Proof of Theorem 3.4

We need the next lemma to prove Theorem 3.4.

Lemma 7.1. Suppose the $\xi_i$'s satisfy the assumptions in Theorem 3.4. Then there exists a matrix $\tilde{X} = (\tilde{x}_{ij})$ such that the corresponding $\tilde{\xi}_i$'s satisfy Condition 2.8 with $q = O(N^{-1/2}\log N)$, and the first four moments of $\xi_i$ and $\tilde{\xi}_i$ match for all $i$, that is,
$$ \mathbb{E}\xi_i^k = \mathbb{E}\tilde{\xi}_i^k, \quad k = 1, 2, 3, 4. \tag{7.1} $$

The proof of this lemma can be found in [32]. We note that $\tilde{X}$ satisfies the conditions of Theorem 3.4. Now we proceed to prove Theorem 3.4.

Proof. Note that from Theorem 3.1, $\tilde{X}$ satisfies (3.10). We use the Green function comparison idea to show that (3.10) also holds for $X$. Since we have the trivial bound

1 max Gij Cη− N, ij ≤ ≤


c for any X with q N − . Then by Lemma 4.2, it suffices to show that ≤ p p E m m (Nη)− , (7.2) | N − | ≺ c for X with q N − . ≤ 1 Firstly, recall the relationship (2.3). For simplification, we denote mM := N − Tr and m := 1 G m + (1 φ)z− , so − 1 m = m + (1 φ)z− . M N − Then it is equivalent to showing that

p p E m m (Nη)− . (7.3) | M − | ≺ For γ = 0, ,N, let X be the matrix whose first γ columns are the same as those of X ··· γ and the remaining N γ columns are the same as those of X˜ with entries σ ξ˜ u , where ξ˜ ’s − i j ij j satisfy the assumptions in Lemma 7.1. Then X0 = X˜ and XN = X. Denote Gγ , γ as the Green 1 (γ) G(γ) functions of Xγ∗Xγ and Xγ Xγ∗ respectively, and mM,γ = N − Tr γ. γ and mM,γ are defined (γ) G G similarly with Xγ . The resolvent expansion gives

(γ) (γ) = x x∗ . Gγ Gγ − Gγ γ γ Gγ Consequently, by m = 1 Tr , we may write M,γ N Gγ

(γ) 1 2 (γ) m m = ξ r∗ r . (7.4) M,γ − M,γ −N γ γ Gγ Gγ γ Similarly, (γ) 1 ˜2 (γ) mM,γ 1 mM,γ 1 = ξγ rγ∗ γ 1 γ 1rγ . − − − −N G − G − p p/2 p/2 We note that m m = (m m) (m∗ m∗) for any even integer p > 0. In the | M − | M − M − following of this proof, we slightly abuse the notation by ignoring the conjugate in mM and m for simplicity. We shall see that this will not affect the validity of our result. ∗ When γ = 0, from Theorem 3.1 and the assumptions on X˜, it is clear that

1 p p m m (Nη)− , E m m (Nη)− . (7.5) | M,0 − |≺ | M,0 − | ≺ p p The target is to show that E(mM,N m) (Nη) . Actually, in the proof below, we use the | − |≺ p deterministic form of the bound in (7.3), that is, we choose ǫ> 0 such that E(mM,N m) (Nη)pN ǫ. | − |≤ (γ) (γ) Note that γ = γ 1, and G G − (γ) (γ) (γ) rγ∗ γ γ rγ rγ∗ γ γ rγ = G G . 2 (γ) G G 1+ ξ r γ r γ γ∗ G γ

imsart-generic ver. 2020/08/06 file: draft-1218.tex date: June 3, 2021 /Tracy-Widom limit in elliptical distributions 26

It’s not hard to see that the local law also holds for Gγ , then by large deviations bounds

(γ) 2 (γ) 1 (γ) 1 (γ) 2 r∗ r σ u∗ u Tr( ) + Tr( ) C, | γ Gγ γ |≤ 1 | γGγ γ | ≺ |M Gγ | |M Gγ | ≤ 2 1 (γ) (γ) σ1 (γ) (γ) 1 1 (γ) (γ) 1 (γ) (γ) 2 r∗ r u∗ u Tr( ) + Tr( ) |N γ Gγ Gγ γ |≤ N | γGγ Gγ γ |≺ N |M Gγ Gγ | |M Gγ Gγ |   1 1 1 (γ) 2 , ≤N M kGγ kF ≺ Nη where we use (γ) 1 1 1 Im Tr γ 1 N Im m + NΘ N M q + √η 1 (γ) 2 = M( G )= M( − ) . N M kGγ kF N M 2η N M 2η − M 2 z 2 ≺ Nη ≤ Nη | | Using Taylor’s expansion

k+1 2 (γ) k 1 1 (ξ φ)r∗ γ rγ = − γ − γ G . 2 (γ) (γ) k! 1+ ξγ rγ∗ γ rγ k 0 1+ φrγ∗ γ rγ   G X≥ G (γ) 1 2 2 and the fact that 1+ φr∗ γ r − C, ξ = ξ φ + φ, the RHS of (7.4) can be written as | γ G γ | ≤ γ γ − (ξ2 φ)kA , γ − k,γ k 0 X≥ where A is independent of ξ and for any k 0, k,γ γ ≥ p p ǫ E(A ) (Nη)− N . | k,γ |≤ Similarly, we can write (γ) ˜2 k mM,γ 1 mM,γ 1 = (ξγ φ) Ak,γ 1, while Ak,γ 1 = Ak,γ . (7.6) − − − − − − k 0 X≥ Hence, we can write E(m m)p M,γ − p k (γ) p p (γ) p k 2 k1 (7.7) =E(m m) + E (m m) − (ξ φ) A , M,γ − k M,γ − γ − k1,γ k=1    k1 0  X X≥ and similarly, p E(mM,γ 1 m) − − p p k (7.8) E (γ) p E (γ) p k ˜2 k1 = (mM,γ 1 m) + (mM,γ 1 m) − (ξγ φ) Ak1,γ . − − k − − − k=1    k1 0  X X≥ (γ) (γ) (γ) ˜ E 2 k We claim the fact that mM,γ 1 = mM,γ, mM,γ is independent of ξγ and ξγ , (ξγ φ) = − − E(ξ˜2 φ)k for k 2, and for any k 3, γ − ≤ ≥ 2 k 1 k 2 2 k 1 k 2 E(ξ φ) N − log Nq − , E(ξ˜ φ) N − log Nq − . γ − ≤ γ − ≤

imsart-generic ver. 2020/08/06 file: draft-1218.tex date: June 3, 2021 /Tracy-Widom limit in elliptical distributions 27

Then, comparing (7.7) and (7.8), we infer from the Cauchy-Schwartz inequality that

p k E p E p p (Nη)− E (γ) 2(p k) 1/2 (mM,γ m) (mM,γ 1 m) + 1+c ǫ (mM,γ 1 m) − . (7.9) | − | ≤ | − − | k N − | − − | kX=1   Moreover, we know that E (γ) 2(p k) E (γ) 2(p k) (mM,γ 1 m) − = (mM,γ 1 mM,γ 1 + mM,γ 1 m) − | − − | | − − − − − | 2(p k) − 2(p k) E (γ) l 2(p k) l = − ((mM,γ 1 mM,γ 1) (mM,γ 1 m)) − − l | − − − − − | Xl=0   2(p k) 1/2 1/2 − (7.10) 2(p k) E (γ) 2l E 4(p k) 2l − (mM,γ 1 mM,γ 1) (mM,γ 1 m) − − ≤ l | − − − | | − − | Xl=0      2(p k) 1/2 − cpǫ l 4(p k) 2l CpN (Nη)− E(mM,γ 1 m) − − ≤ | − − | Xl=0   for some constants Cp and cp, where the last inequality is by (7.6). We then use (7.9) and (7.10) to complete the induction. For γ = 0, we already know that

p p ǫ (1) p p ǫ E(m m) (Nη)− N , E(m m) (Nη)− N . | M,0 − |≤ | M,0 − |≤ Then by (7.9) it’s easy to see that for γ = 1,

p 1 p ǫ E(m m) 1+ (Nη)− N . | M,1 − |≤ N 1+c/2   p Now assume that for some γ 1, there exists constant a > 0 such that E(mM,γ 1 m) 1 c/2 a p ǫ ≥ | − − | ≤ (1 + N − − ) (Nη)− N for any fixed p. By (7.10), a (γ) 2(p k) 1 2(p k) cpǫ E(m m) − C 1+ (Nη)− − N (7.11) | M,γ 1 − |≤ p N 1+c/2 −   for some constants Cp and cp. Note that ǫ is arbitrary small, so plug (7.11) into (7.9) to obtain a+1 p 1 p ǫ E(m m) 1+ (Nη)− N . | M,γ − |≤ N 1+c/2   Then by induction, N p 1 p ǫ p 2ǫ E(m m) 1+ (Nη)− N (Nη)− N , | M,N − |≤ N 1+c/2 ≤   for arbitrary small ǫ> 0. A similar but more complicated procedure can lead to p p ǫ E m m (Nη)− N , | M,N − | ≤ and the theorem follows from Chebyshev’s inequality. The other conclusions in Theorem 3.4 can be obtained by the standard procedure used in the proof of Theorem 3.2. So we omit details.


7.2. Proof of Theorem 3.5

We note that $\widetilde X$ in Lemma 7.1 satisfies the desired edge universality according to Theorem 3.3. Thus, if we can prove the following lemma, then Theorem 3.5 follows immediately.

Lemma 7.2. Let $X$ and $\widetilde X$ be two matrices in Lemma 7.1. Then there exist constants $\epsilon,\delta>0$ such that, for any $s\in\mathbb{R}$,
\[
P^{\widetilde X}\big(N^{2/3}(\lambda_{1}-\lambda_{+})\le s-N^{-\epsilon}\big)-N^{-\delta}\le P^{X}\big(N^{2/3}(\lambda_{1}-\lambda_{+})\le s\big)\le P^{\widetilde X}\big(N^{2/3}(\lambda_{1}-\lambda_{+})\le s+N^{-\epsilon}\big)+N^{-\delta}, \tag{7.12}
\]
where $P^{X}$ and $P^{\widetilde X}$ are the laws of $X$ and $\widetilde X$, respectively.

Most of the proof of Lemma 7.2 is the same as that of Theorem 3.3. We only write down the Green function comparison part, which is slightly different from before but simpler, since we have the first four moments at this time.

Theorem 7.1. Let $X$ and $\widetilde X$ be two matrices in Lemma 7.1. Suppose $F:\mathbb{R}\to\mathbb{R}$ is a function whose derivatives satisfy
\[
\sup_{x\in\mathbb{R}}|F^{(l)}(x)|(1+|x|)^{-C_{2}}\le C_{2},\qquad l=1,2,3,
\]
with some constant $C_{2}>0$. Then for any sufficiently small constant $\epsilon>0$ and for any real numbers $E$, $E_{1}$ and $E_{2}$ satisfying
\[
|E-\lambda_{+}|,\ |E_{1}-\lambda_{+}|,\ |E_{2}-\lambda_{+}|\le N^{-2/3+\epsilon}
\]
and $\eta=N^{-2/3-\epsilon}$, we have
\[
\big|\mathbb{E}F(N\eta\,\mathrm{Im}\,m_{N}(z))-\mathbb{E}F(N\eta\,\mathrm{Im}\,\widetilde m_{N}(z))\big|\le N^{-c_{1}+C_{\epsilon}},\qquad z=E+\mathrm{i}\eta, \tag{7.13}
\]
and
\[
\Big|\mathbb{E}F\Big(N\int_{E_{1}}^{E_{2}}\mathrm{Im}\,m_{N}(y+\mathrm{i}\eta)\,dy\Big)-\mathbb{E}F\Big(N\int_{E_{1}}^{E_{2}}\mathrm{Im}\,\widetilde m_{N}(y+\mathrm{i}\eta)\,dy\Big)\Big|\le N^{-c_{1}+C_{\epsilon}}, \tag{7.14}
\]
where $c_{1}$ is a positive constant and $C_{\epsilon}\to0$ as $\epsilon\to0$.

Proof. We only prove the first inequality; the second one follows from similar arguments. The beginning part is the same as before. We split
\[
\mathbb{E}F(N\eta\,\mathrm{Im}\,m_{N}(z))-\mathbb{E}F(N\eta\,\mathrm{Im}\,\widetilde m_{N}(z))=\sum_{\gamma=1}^{N}\Big\{\mathbb{E}F(N\eta\,\mathrm{Im}\,m_{N,\gamma}(z))-\mathbb{E}F(N\eta\,\mathrm{Im}\,m_{N,\gamma-1}(z))\Big\}.
\]
We will prove that
\[
\big|\mathbb{E}F(N\eta\,\mathrm{Im}\,m_{N,\gamma}(z))-\mathbb{E}F(N\eta\,\mathrm{Im}\,m_{N,\gamma-1}(z))\big|\prec N^{-1-c+C_{\epsilon}}. \tag{7.15}
\]


(γ) (γ) Use the resolvent expansion that γ = γ γ xγ xγ∗ γ and the relationship mM,γ = 1 G G − G G m + (1 φ)z− , we have N − (γ) (γ) ηx∗ γ γ x (γ) 1/3 ǫ γ G G γ Nη Im mN,γ(z)= Nη Im mM,γ(z)+(1 φ)N − − Im (γ) . − − 1+ x γ x γ∗ G γ We further expand the last term (ignoring Im) of the last identity to

k (γ) k+1 2 (ξγ φ)rγ∗ γ rγ 2 (γ) (γ) 1 η(ξ φ + φ)r∗ r − − G γ − γ Gγ Gγ γ (γ)  k!  k 0  1+ φrγ∗ γ rγ    X≥ G (γ) (γ) (γ) (γ) 2 (γ) 2 (γ) (γ) ηφrγ∗ γ γ rγ ηφrγ∗ γ γ rγ (ξγ φ)rγ∗ γ rγ η(ξγ φ)rγ∗ γ γ rγ = G G G G − G + − G G + Rγ (γ) (γ) 2 (γ) 1+ φr γ r − (1 + φr γ r ) 1+ φr γ r γ∗ G γ γ∗ G γ γ∗ G γ :=Aγ + Bγ + Cγ + Rγ , where Aγ is independent of ξγ , EBγ = ECγ = 0. We have already known that

(γ) (γ) (γ) (γ) 1 ηr∗ r q, r∗ r C, 1+ φr∗ r − C, γ Gγ Gγ γ ≺ γ Gγ γ ≤ | γ Gγ γ | ≤ 2 2k 1 E(ξ φ) N − log N, k 1. γ − ≤ ≥ i 1 ǫ 1 c+2ǫ Hence, E R q N − log N N N − − , i =1, 2. Then | γ | ≤ × × ≤ (γ) 1/3 ǫ F (Nη Im m (z)) F Nη Im m (z)+(1 φ)N − − Im A N,γ − M,γ − − γ (1) (γ)  1/3 ǫ  = F Nη Im m (z)+(1 φ)N − − Im Aγ Im(Bγ + Cγ + Rγ ) (7.16) − M,γ − − × 1   + F (2)(ψ) Im2(B + C + R ), 2 γ γ γ

where $\psi$ is some number between $N\eta\,\mathrm{Im}\,m_{N,\gamma}(z)$ and $N\eta\,\mathrm{Im}\,m_{M,\gamma}(z)+(1-\phi)N^{1/3-\epsilon}-\mathrm{Im}\,A_{\gamma}$. By the local law and the large deviation bounds, $A_{\gamma}\prec q+\sqrt{\eta}\to0$. Furthermore, we observe from Theorem 3.4 that
\[
N\eta\,\mathrm{Im}\,m_{N,\gamma}(z)\prec N\eta\big(\mathrm{Im}\,m(z)+(N\eta)^{-1}\big)\le C,
\]
which implies

\[
\big|F^{(1)}\big(N\eta\,\mathrm{Im}\,m_{M,\gamma}(z)+(1-\phi)N^{1/3-\epsilon}-\mathrm{Im}\,A_{\gamma}\big)\big|\prec1,\qquad |F^{(2)}(\psi)|\prec1. \tag{7.17}
\]

Therefore,

(γ) 1/3 ǫ E F (Nη Im m (z)) F (Nη Im m (z)+(1 φ)N − − Im A ) N,γ − M,γ − − γ   C(E R + E R 2 + E B 2 + E C 2) ≤ | γ | | γ | | γ | | γ | 1 ǫ 1 c+2ǫ q N − log N N N − − . ≤ × × ≤


Similarly, we can prove that

\[
\mathbb{E}\Big|F(N\eta\,\mathrm{Im}\,m_{N,\gamma-1}(z))-F\big(N\eta\,\mathrm{Im}\,m_{M,\gamma-1}(z)+(1-\phi)N^{1/3-\epsilon}-\mathrm{Im}\,A_{\gamma-1}^{(\gamma)}\big)\Big|\le \frac{1}{N^{1+c-2\epsilon}}.
\]
Note that $m_{M,\gamma-1}^{(\gamma)}(z)=m_{M,\gamma}^{(\gamma)}(z)$ and $A_{\gamma-1}^{(\gamma)}=A_{\gamma}$, which concludes (7.15), and the theorem holds.
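The following toy Python sketch (illustrative only; the matrix sizes, the laws and the functional $f$ are arbitrary choices, not the paper's) shows the column-by-column swapping behind the telescoping decomposition used above: the interpolating matrix uses the first $\gamma$ columns of $\widetilde X$ and the remaining columns of $X$, and the telescoping identity over the $N$ swaps is exact.

import numpy as np

# Toy sketch of the column-by-column swapping behind the telescoping sum:
# X_gamma uses the first gamma columns of X_tilde and the remaining ones of
# X, so f(X_N) - f(X_0) telescopes exactly over the N swaps.
rng = np.random.default_rng(0)
M, N = 20, 40
X = rng.standard_normal((M, N)) / np.sqrt(N)
X_tilde = rng.standard_normal((M, N)) / np.sqrt(N)

def f(Y):
    # any scalar functional of the sample covariance; here the largest eigenvalue
    return np.linalg.eigvalsh(Y @ Y.T)[-1]

def interpolate(gamma):
    Z = X.copy()
    Z[:, :gamma] = X_tilde[:, :gamma]
    return Z

telescoped = sum(f(interpolate(g)) - f(interpolate(g - 1)) for g in range(1, N + 1))
print(np.isclose(telescoped, f(X_tilde) - f(X)))       # True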

8. Proof of Theorem 3.6

Suppose the matrix $X$ satisfies Condition 2.6 and Condition 2.7. We can write the sample covariance matrix as
\[
W=XX^{*}=\sum_{i=1}^{N}\xi_{i}^{2}r_{i}r_{i}^{*}, \tag{8.1}
\]
where $r_{i}=\Sigma^{1/2}u_{i}$. For any fixed $\epsilon>0$, define
\[
\alpha_{N}:=P\big(|\hat\xi_{i}^{2}-M|>N^{1-\epsilon}\big). \tag{8.2}
\]
Using Condition 2.7, we can see that for any $\delta>0$ and large enough $N$,
\[
\alpha_{N}\le\delta N^{-1+2\epsilon}. \tag{8.3}
\]
Let $\rho(x)$ be the distribution of $\xi_{i}^{2}$. Then we define independent random variables $\zeta_{i}^{s}$, $\zeta_{i}^{l}$ and $c_{i}$, $1\le i\le N$, in the following ways:
1. $\zeta_{i}^{s}$ has distribution density $\rho_{s}(x)$, where
\[
\rho_{s}(x):=\frac{\rho(x)}{1-\alpha_{N}}\,\mathbf{1}\big(|x-\phi|\le N^{-\epsilon}\big); \tag{8.4}
\]
2. $\zeta_{i}^{l}$ has distribution density $\rho_{l}(x)$, where
\[
\rho_{l}(x):=\frac{\rho(x)}{\alpha_{N}}\,\mathbf{1}\big(|x-\phi|>N^{-\epsilon}\big); \tag{8.5}
\]
3. $c_{i}$ is a Bernoulli $0$-$1$ random variable with $P(c_{i}=1)=\alpha_{N}$ and $P(c_{i}=0)=1-\alpha_{N}$.
It is easy to check that
\[
\xi_{i}^{2}\overset{d}{=}\zeta_{i}^{s}(1-c_{i})+\zeta_{i}^{l}c_{i}, \tag{8.6}
\]
therefore we may write
\[
W=\sum_{i=1}^{N}\xi_{i}^{2}r_{i}r_{i}^{*}=\sum_{i=1}^{N}\big(\zeta_{i}^{s}(1-c_{i})+\zeta_{i}^{l}c_{i}\big)r_{i}r_{i}^{*}. \tag{8.7}
\]
We observe that
\[
\mathbb{E}\big|\zeta_{i}^{s}-\phi\big|^{2}=O(N^{-1}\log N), \tag{8.8}
\]


so $\zeta_{i}^{s}$ satisfies the assumptions in Theorem 3.5. We conclude that for the matrix
\[
\widetilde W:=\sum_{i=1}^{N}\zeta_{i}^{s}r_{i}r_{i}^{*},
\]
there exist constants $\epsilon,\delta>0$ such that for any $s\in\mathbb{R}$,
\[
P^{G}\big(N^{2/3}(\lambda_{1}-\lambda_{+})\le s-N^{-\epsilon}\big)-N^{-\delta}\le P^{\widetilde W}\big(N^{2/3}(\lambda_{1}-\lambda_{+})\le s\big)\le P^{G}\big(N^{2/3}(\lambda_{1}-\lambda_{+})\le s+N^{-\epsilon}\big)+N^{-\delta}, \tag{8.9}
\]
where $P^{G}$ denotes the law for a Gaussian covariance matrix and $P^{\widetilde W}$ denotes the law for $\widetilde W$. Now we write the right-hand side of (8.7) as
\[
W=\sum_{i=1}^{N}\big(\zeta_{i}^{s}+(\zeta_{i}^{l}-\zeta_{i}^{s})c_{i}\big)r_{i}r_{i}^{*}:=\sum_{i=1}^{N}\big(\zeta_{i}^{s}+R_{i}c_{i}\big)r_{i}r_{i}^{*},
\]
where $R_{i}:=\zeta_{i}^{l}-\zeta_{i}^{s}$. We aim to show that the $R_{i}c_{i}$ terms have negligible effects on $\lambda_{1}$. Define the corresponding matrix as
\[
R:=\sum_{i=1}^{N}R_{i}c_{i}r_{i}r_{i}^{*}.
\]
Note that $c_{i}$ is independent of $\zeta_{i}^{s}$ and $\zeta_{i}^{l}$. In order to understand the spectral behavior of this matrix, we first introduce the following event

\[
A:=\big\{\,\sharp\{i:c_{i}=1\}\le N^{5\epsilon}\,\big\}.
\]

Since the $c_{i}$'s are independent and identically distributed Bernoulli random variables, by Bernstein's inequality it is easy to check that

\[
P(A)\ge1-\exp(-N^{\epsilon}). \tag{8.10}
\]
Without loss of generality, we will assume that $c_{i}=0$ for $i>N^{5\epsilon}$ and $c_{i}=1$ for $i\le N^{5\epsilon}$. On the other hand, by Condition 2.7, we have
\[
P(|R_{i}|\ge\omega)\le P\Big(|\zeta_{i}^{l}|\ge\frac{\omega}{2}\Big)=P\big(|\hat\xi_{i}^{2}-M|\ge\omega N\big)=o(N^{-2}), \tag{8.11}
\]
for any fixed constant $\omega>0$. Hence, the event
\[
A\cap\Big\{\max_{i}|R_{i}|\le\omega\Big\}
\]
happens with probability approaching $1$. Hereafter, we will focus on this event. Define
\[
W_{t}(\lambda)=\lambda I-\Big(\widetilde W+t\sum_{i=1}^{N^{5\epsilon}}R_{i}r_{i}r_{i}^{*}\Big),\qquad t\in[0,1].
\]


In fact, by taking $\omega$ sufficiently small, the eigenvalues of $\widetilde W+t\sum_{i=1}^{N^{5\epsilon}}R_{i}r_{i}r_{i}^{*}$ are continuous in $t$. Next, we aim to prove that for $\lambda=\mu:=\lambda_{1}(\widetilde W)\pm N^{-3/4}$,
\[
P\big(\det(W_{t}(\mu))\neq0,\ \forall\,t\in[0,1]\big)=1-o(1). \tag{8.12}
\]
If (8.12) holds, by continuity we know that the largest eigenvalue of $\widetilde W+t\sum_{i=1}^{N^{5\epsilon}}R_{i}r_{i}r_{i}^{*}$ will not cross the boundary $\lambda_{1}(\widetilde W)\pm N^{-3/4}$. Hence $\lambda_{1}(W)$ sticks to $\lambda_{1}(\widetilde W)$ at a rate smaller than $N^{-3/4}$, which concludes the theorem.
Now we prove (8.12). We know that the eigenvalues of the GOE are separated at the scale $N^{-2/3}$, so by (8.9),
\[
P\big(|\lambda_{k}(\widetilde W)-\mu|\ge N^{-3/4}\big)=1-o(1).
\]
Therefore, $\mu$ is not an eigenvalue of $\widetilde W$, and
\[
\det(W_{t}(\mu))=\det(\mu-\widetilde W)\det\Big(1-t\sum_{i=1}^{N^{5\epsilon}}R_{i}r_{i}r_{i}^{*}\mathcal{G}^{s}(\mu)\Big),
\]
where $\mathcal{G}^{s}(z)=(\widetilde W-z)^{-1}$ is the Green function. Hereafter, we ignore the superscript $s$ in $\mathcal{G}^{s}(z)$ for simplicity. Let $z=\lambda_{+}+\mathrm{i}N^{-1+\delta}$ for some $\delta>0$. Then
\[
1-t\sum_{i=1}^{N^{5\epsilon}}R_{i}r_{i}r_{i}^{*}\mathcal{G}(\mu)=1-t\sum_{i=1}^{N^{5\epsilon}}R_{i}r_{i}r_{i}^{*}\big(\mathcal{G}(\mu)-\mathcal{G}(z)\big)-t\sum_{i=1}^{N^{5\epsilon}}R_{i}r_{i}r_{i}^{*}\mathcal{G}(z). \tag{8.13}
\]
Note that for each $i$,
\[
r_{i}r_{i}^{*}\mathcal{G}(z)=r_{i}r_{i}^{*}\mathcal{G}^{(i)}(z)+\frac{\zeta_{i}^{s}\,r_{i}r_{i}^{*}\mathcal{G}^{(i)}(z)r_{i}r_{i}^{*}\mathcal{G}(z)}{1+\zeta_{i}^{s}r_{i}^{*}\mathcal{G}^{(i)}(z)r_{i}},
\]
while
\[
\Big|r_{i}^{*}\mathcal{G}^{(i)}(z)r_{i}-\frac{1}{M}m\operatorname{Tr}\Sigma\Big|\le\Big|r_{i}^{*}\mathcal{G}^{(i)}(z)r_{i}-\frac{1}{M}\operatorname{Tr}\big(\mathcal{G}^{(i)}(z)\Sigma\big)\Big|+\frac{1}{M}\Big|\operatorname{Tr}\big((\mathcal{G}^{(i)}(z)-m)\Sigma\big)\Big|\prec N^{-1/6+\epsilon}.
\]
Therefore, we can replace $r_{i}^{*}\mathcal{G}^{(i)}(z)r_{i}$ by $M^{-1}m\operatorname{Tr}\Sigma$ and write
\[
\Big\|t\sum_{i=1}^{N^{5\epsilon}}R_{i}r_{i}r_{i}^{*}\mathcal{G}(z)\Big\|\le\max_{i}|R_{i}|\sum_{i=1}^{N^{5\epsilon}}\big\|r_{i}r_{i}^{*}\mathcal{G}(z)\big\|\le C\omega\sum_{i}\|r_{i}r_{i}^{*}\|\le\frac{1}{10}, \tag{8.14}
\]
where we have used the fact that $\omega$ can be taken sufficiently small and $\|\mathcal{G}(z)\|\le C$ by Theorem 3.2. Then it remains to consider the second sum in (8.13).
Let $\beta_{\alpha}$ be the eigenvector of $\widetilde W$ corresponding to the $\alpha$-th eigenvalue $\lambda_{\alpha}$. Note that for any $\lambda_{\alpha}>\tau$ and $z^{*}=\lambda_{\alpha}+\mathrm{i}N^{-1+\delta}$, we have $|r_{i}^{*}\mathcal{G}(z^{*})r_{i}|\le C$ with high probability. Moreover,
\[
\big|\mathrm{Im}\,r_{i}^{*}\mathcal{G}(z^{*})r_{i}\big|=\mathrm{Im}\,z^{*}\sum_{j}\frac{\langle r_{i},\beta_{j}\rangle^{2}}{|\lambda_{j}-z^{*}|^{2}}\ge\frac{\mathrm{Im}\,z^{*}}{|\lambda_{\alpha}-z^{*}|^{2}}\langle r_{i},\beta_{\alpha}\rangle^{2}=(\mathrm{Im}\,z^{*})^{-1}\langle r_{i},\beta_{\alpha}\rangle^{2}.
\]


Since $\delta$ can be arbitrarily small, we have $\langle r_{i},\beta_{\alpha}\rangle^{2}\prec N^{-1}$ for any $\alpha$ satisfying $\lambda_{\alpha}>\tau$. Let $\alpha^{*}$ be the largest $\alpha$ satisfying this condition, so by eigenvalue rigidity we have $\alpha^{*}\asymp kN$ for some constant $k$. Eigenvalue rigidity also implies that $\lambda_{\alpha}-\lambda_{+}\asymp(\alpha/N)^{2/3}$ for any $\alpha\ge N^{\epsilon}$.
Recall $z=\lambda_{+}+\mathrm{i}N^{-2/3}$, so for each $i$,
\[
\big|r_{i}^{*}\big(\mathcal{G}(\mu)-\mathcal{G}(z)\big)r_{i}\big|=\Big|\sum_{\alpha}\langle r_{i},\beta_{\alpha}\rangle^{2}\Big(\frac{1}{\mu-\lambda_{\alpha}}-\frac{1}{z-\lambda_{\alpha}}\Big)\Big|\le\sum_{\alpha}\langle r_{i},\beta_{\alpha}\rangle^{2}\Big(\frac{\eta}{(\lambda_{+}-\lambda_{\alpha})^{2}+\eta^{2}}+\frac{(1+o(1))\eta^{2}}{|\lambda_{\alpha}-\mu|\big((\lambda_{+}-\lambda_{\alpha})^{2}+\eta^{2}\big)}\Big).
\]
Firstly,
\[
\sum_{\alpha\le N^{\epsilon}}\langle r_{i},\beta_{\alpha}\rangle^{2}\Big(\frac{\eta}{(\lambda_{+}-\lambda_{\alpha})^{2}+\eta^{2}}+\frac{(1+o(1))\eta^{2}}{|\lambda_{\alpha}-\mu|\big((\lambda_{+}-\lambda_{\alpha})^{2}+\eta^{2}\big)}\Big)\prec N^{\epsilon}N^{-1}\big(N^{2/3+\epsilon}+N^{3/4+\epsilon}\big)\le N^{-1/4+2\epsilon}.
\]
Secondly,
\[
\sum_{N^{\epsilon}\le\alpha\le\alpha^{*}}\langle r_{i},\beta_{\alpha}\rangle^{2}\Big(\frac{\eta}{(\lambda_{+}-\lambda_{\alpha})^{2}+\eta^{2}}+\frac{(1+o(1))\eta^{2}}{|\lambda_{\alpha}-\mu|\big((\lambda_{+}-\lambda_{\alpha})^{2}+\eta^{2}\big)}\Big)\prec\sum_{N^{\epsilon}\le\alpha\le\alpha^{*}}N^{-5/3+\epsilon}\Big(\frac{\alpha}{N}\Big)^{-4/3}\le N^{-1/3+\epsilon}\sum_{1\le\alpha\le kN}\alpha^{-4/3}\le N^{-1/3+2\epsilon}.
\]
Lastly,
\[
\sum_{\alpha>\alpha^{*}}\langle r_{i},\beta_{\alpha}\rangle^{2}\Big(\frac{\eta}{(\lambda_{+}-\lambda_{\alpha})^{2}+\eta^{2}}+\frac{(1+o(1))\eta^{2}}{|\lambda_{\alpha}-\mu|\big((\lambda_{+}-\lambda_{\alpha})^{2}+\eta^{2}\big)}\Big)\le\frac{\eta}{(\lambda_{+}-\tau)^{2}}\sum_{\alpha}\langle r_{i},\beta_{\alpha}\rangle^{2}\le\frac{\eta}{(\lambda_{+}-\tau)^{2}}\|r_{i}\|^{2}\le N^{-2/3+\epsilon}.
\]
Therefore, we have
\[
\Big|t\sum_{i=1}^{N^{5\epsilon}}R_{i}r_{i}r_{i}^{*}\big(\mathcal{G}(\mu)-\mathcal{G}(z)\big)\Big|\prec N^{-1/4+7\epsilon}. \tag{8.15}
\]
Combining (8.13), (8.14) and (8.15), with probability approaching $1$ we have
\[
\det\Big(1-t\sum_{i=1}^{N^{5\epsilon}}R_{i}r_{i}r_{i}^{*}\mathcal{G}(\mu)\Big)\neq0,\qquad\forall\,t\in[0,1],
\]
which concludes the theorem.
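As a hedged numerical illustration of the truncation idea of this section (not the paper's construction; the law of $\xi_i^2$, the cutoff and all sizes below are arbitrary stand-ins for Condition 2.7), one can check that only a few indices are flagged by the Bernoulli indicators and that modifying those few rank-one terms barely moves the largest eigenvalue.

import numpy as np

# Toy illustration of the cutoff decomposition in (8.6)-(8.7): xi2_i is
# split into a "small" part near phi and a rare "large" part flagged by a
# Bernoulli indicator c_i; modifying the few flagged rank-one terms barely
# moves the largest eigenvalue.  The chi-square model, the cutoff and all
# sizes are arbitrary illustrative choices.
rng = np.random.default_rng(1)
M, N = 200, 400
phi = M / N
xi2 = rng.chisquare(df=M, size=N) / N                  # concentrates around phi
cutoff = 0.1                                           # plays the role of N^{-eps}
c = (np.abs(xi2 - phi) > cutoff).astype(float)         # flags the rare indices

r = rng.standard_normal((M, N)) / np.sqrt(M)           # stand-ins for the r_i
W_full = (r * xi2) @ r.T                               # sum_i xi2_i r_i r_i^*
W_trunc = (r * (xi2 * (1 - c) + phi * c)) @ r.T        # flagged weights replaced by phi

lam_full = np.linalg.eigvalsh(W_full)[-1]
lam_trunc = np.linalg.eigvalsh(W_trunc)[-1]
print(int(c.sum()), lam_full, abs(lam_full - lam_trunc))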


In the following appendices, we provide the proofs of some lemmas and results which are omitted in the main text.

Appendix I: Proof of results in Section 4. i. Proof of Lemma 4.2

Proof. If $\mathbb{E}X_{N}^{p}\prec\Phi_{N}^{p}$, then for any $\varepsilon>0$ we get from Markov's inequality that
\[
P\big(|X_{N}|>N^{\varepsilon}\Phi_{N}\big)\le\frac{\mathbb{E}|X_{N}|^{p}}{N^{\varepsilon p}\Phi_{N}^{p}}\le N^{-\varepsilon(p-1)}.
\]
Choosing $p$ large enough (depending on $\varepsilon$) proves the "$\Leftarrow$" part. Conversely, if $X_{N}\prec\Phi_{N}$, then for any $D>0$ we get
\[
|\mathbb{E}X_{N}|\le\mathbb{E}|X_{N}|=\mathbb{E}|X_{N}|\mathbf{1}\big(|X_{N}|\le N^{\varepsilon}\Phi_{N}\big)+\mathbb{E}|X_{N}|\mathbf{1}\big(|X_{N}|>N^{\varepsilon}\Phi_{N}\big)\le N^{\varepsilon}\Phi_{N}+\sqrt{\mathbb{E}|X_{N}|^{2}}\sqrt{P\big(|X_{N}|>N^{\varepsilon}\Phi_{N}\big)}\le N^{\varepsilon}\Phi_{N}+N^{C_{2}/2-D/2}.
\]
Using $\Phi_{N}\ge N^{-C}$ and choosing $D$ large enough, we obtain the "$\Rightarrow$" part for $p=1$. The same implication for arbitrary $p$ follows from the fact that $X_{N}\prec\Phi_{N}$ implies $X_{N}^{p}\prec\Phi_{N}^{p}$ for any fixed $p$.

ii. Proof of large deviation bounds in Lemma 4.4

Proof. Let k = σ u1,...,uk be the σ-algebra generated by u1,...,uk. In particular, 0 is the trivial σ-algebra,F i.e.,{ E( )} is the unconditional expectation. For k = 1,...,M 1,F we see ·|F0 − by symmetry that conditioned on , (u ,...,u )′ follows the uniform distribution on the Fk k+1 M (M k)-dimensional sphere with radius 1 k u2, namely, − − i=1 i q P k 2 1/2 M k (u ,...,u )′ U 1 u S − . (I.1) k+1 M |Fk ∼ − i i=1  X   Define the martingale difference sequence

M sk = bi E(ui k) E(ui k 1) . { |F − |F − } i=1 X

A direct observation from (I.1) is that sk = bkuk. Then it follows from Theorem V.1 in Appendix V and the Burkholder inequality [11] that for any positive integer q, there exists a constant Cq > 0

such that

M 2q M q M q 2q 2 2 2 E b∗u = E s C E s C E b u | | k ≤ q | k| ≤ q | k| k k=1  k=1   k=1  X X X = C b 2 b 2E(u2 u2 ) q | k1 | · · · | kq | k1 ··· kq 1 k1 kq M n ≤ X··· ≤ o M C C q b 2 q q b 2 b 2 = q b 2 = C k k . ≤ M q | k1 | · · · | kq | M q | k| q M 1 k1 kq M k=1 ≤ X··· ≤  X    Then (4.1) follows from that for any q Z , ∈ + 2 E 2q ε b b∗u Cq P( b∗u >M k k ) | | . b 2 q 2εq | | M ≤ 2εq ≤ M r M kMk   To show (4.2), we first show that

M 1 1 M a (u2 ) a 2. (I.2) kk k − M ≺ M v | kk| k=1 uk=1 X uX t We construct the martingale difference sequence as

M E 2 E 2 k := aii (ui k) (ui k 1) . N |F − |F − i=1 X   Note that M = M 1, and F F − k 1 E(u2 )= (1 u2), i k +1. i |Fk M k − l ∀ ≥ − Xl=1 Therefore,

M M 2 E 2 E 2 k =akkuk + aii (ui k) aii (ui k 1) N |F − |F − i=Xk+1 Xi=k M k 1 a 1 − = a ii u2 (1 u2) kk − M k k − M k +1 − l  i=Xk+1 −  − Xl=1  M M aii 2 1 1 2 1 = a u M − (u M − ) kk − M k k − − M k +1 l −  i=Xk+1 −  − Xl=k  M k 1 aii 2 1 1 − 2 1 = a u M − + (u M − ) kk − M k k − M k +1 l −  i=Xk+1 −  − Xl=1  :=Akνk.


Use the Burkholder inequality to obtain that

M M 1 q E a (u2 ) 2q = E 2q C E 2 | kk k − M | | Nk| ≤ q Nk kX=1 Xk=1  Xk  C A2 A2 E ν2 ν2 ≤ q k1 ··· kq k1 ··· kq 1 k1,...,kq M   ≤ X ≤ C q C log M q q A2 q a2 , ≤M 2q k ≤ M 2q kk  Xk   Xk  where the third line is by Theorem V.1. So (I.2) holds. In order to complete the proof of (4.2), we will prove that

M 1 a u u a 2. (I.3) jk j k ≺ M | jk| j

We begin with the martingale difference sequence

l := ajk E(uj uk l) E(uj uk l 1) . Y |F − |F − Xj

E(u u )= E(u u )+ E(u u )+ E(u u ) j k|Fl j k|Fl j k|Fl j k|Fl j

M q q E a u u 2q C E 2 C E u2( a u )2 | jk j k| ≤ q Yl ≤ q l jl j Xj

which concludes (I.3). Therefore, (4.2) follows from

M 1 2 1 u∗Au trA = a (u )+ a u u | − M | | kk k − M jk j k| k=1 j=k X X6 1 M 1 a 2 + a 2 ≺ √ v | kk| √ | jk| 2M uk=1 2M j=k uX sX6 t 1 M a 2 + a 2 ≤ M v | kk| | jk| uk=1 j=k uX X6 1 t = A . M k kF

Denote the ith column of A as A i. Finally (4.3) follows from (4.1) conditioned on u, ·

2 u∗A u∗Au˜ k k , | |≺ r M and M M 2 2 2 A i 1 2 1 2 u∗A = u∗A i k · k = aij = A F , k k | · | ≺ M M M k k i=1 i=1 i,j X X X which concludes the lemma.

Appendix II: Proof of the results in Section 5. i. Proof of Lemma 5.2

Proof. We recall that
\[
\mathcal{G}=\Big(\sum_{i\in\mathcal{I}}x_{i}x_{i}^{*}-zI\Big)^{-1}.
\]
It follows from the resolvent identity that

\[
\mathcal{G}-(-zm_{N}\Sigma-zI)^{-1}=(-zm_{N}\Sigma-zI)^{-1}\Big(-zm_{N}\Sigma-\sum_{i\in\mathcal{I}}x_{i}x_{i}^{*}\Big)\mathcal{G}. \tag{II.1}
\]
Using the Sherman–Morrison formula (see, e.g., (2.2) of [40] or Lemma V.1 in Appendix V), we have
\[
x_{i}x_{i}^{*}\mathcal{G}=\frac{x_{i}x_{i}^{*}\mathcal{G}^{(i)}}{1+x_{i}^{*}\mathcal{G}^{(i)}x_{i}}. \tag{II.2}
\]


Using (II.1), (II.2) and (5.1), we have

1 1 1 ( zmN Σ zI)− Σ ( zmN Σ zI)− = − − (i) G G− − − N 1+ x∗ x i i i X∈I G 1 (i) ( zmN Σ zI)− xixi∗ − − (i) G − 1+ x∗ x i i i X∈I G (m Σ+ I) 1 1 N − x x (i) = (i) i i∗ Σ , z(1 + x∗ x ) G − N G i i i X∈I G  which concludes the lemma. ii. Proof of Lemma 5.3

Proof. Let λ˜ and v˜ be the k-th largest eigenvalue of (T ) and the eigenvector corresponding k k W to λ˜k respectively for k =1,...,M. Denote v˜k(i) as the i-th entry of v˜k. We observe that

(T ) 2 (T ) (T ) = ( )∗ |Gij | G G ii i,j 1,...,M i 1,...,M ∈{X } ∈{X }  M M M v˜k (i)v˜∗ v˜k v˜∗ (i) = 1 k1 2 k2 ˜ ˜ i=1 λk1 z λk2 z∗ X n kX1=1 −  kX1=1 − o M M M v˜k(i)v˜k∗(i) 1 1 1 (T ) = = η− Im = η− Im Tr , ˜ 2 ˜ G i=1 λk z λk z X Xk=1 | − | kX=1  −  which concludes the lemma. iii. Proof of Lemma 5.4

Proof. The first inequality follows from Theorem A.6 of [5]. To show the second and the third inequalities, we observe that N 1 M N M 1 Tr( (i) )= − − − + Tr(G(i) G)= + Tr(G(i) G), G − G z − z − −z − so the results follow. iv. Proof of Proposition 5.5

1 Proof. Following Lemma 1 of [26], taking imaginary part and multiplying (Im m)− on both sides of z = f(m), we get 1 xπ(dx) Im z φ = > 0. m 2 − 1+ xm 2 Im m | | Z | |


Hence 1 xπ(dx) > φ . (II.3) m 2 1+ xm 2 | | Z | | By (II.3) and the Cauchy-Schwartz inequality, we obtain from z = f(m) that 1 φ φ π(dx) m = + | | −z z − z 1+ xm Z 2 1/2 1/2 1 φ φ x π(dx ) 2 < | − | + x− π(dx) z z 1+ xm 2 | | | | Z | |  Z  1/2 1 φ √φ 2 < | − | + x− π(dx) . (II.4) z z m | | | || | Z  This implies that 1/2 2 1 φ m √φ 2 m | − || | x− π(dx) < 0. | | − z − z | | | | Z  Some basic calculations yield

2 1/2 1 φ + z 1 φ +4 z √φ x− π(dx) m | − | | || − | | | . | |≤ q 2 z | | R  2 Since supp(π) is uniformly bounded away from 0 for all N, x− π(dx) is uniformly bounded. Then from (II.4), we have sup sup m C, R z [τ,τ −1] N | |≤ | |∈ for some constant C > 0. − Suppose inf z [τ,τ 1] infN m = 0. Then we can choose a sequence zN N∞=1 τ z 1 | |∈ | | { } ⊂ { ≤ | | ≤ τ − such that m(z ) 0 as N . From z = f(m), we have for all N } N → → ∞ xm(z )π(dx) z m(z )+1= φ N , N N 1+ xm(z ) Z N which implies that with probability 1, as N , → ∞ xm(z )π(dx) φ N 1. 1+ m(z ) → Z N However one can see xm(z )π(dx) xπ(dx) φ N m(z ) φ 0, 1+ xm(z ) ≤ | N | 1+ xm(z ) → Z N Z N | | which is a contradiction. Therefore we have

inf inf m > 0. z [τ,τ −1] N | | | |∈ Finally,

Im z 2 1 Im m = m Im z C− η, 1 xπ(dx) ≥ | | ≥ m 2 φ 1+xm 2 | | − | | which concludes the lemma. R
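As a rough numerical sketch of the two conclusions of Proposition 5.5 in the special case $\Sigma=I$ (so that $\pi$ is a point mass at $1$), one can use the empirical Stieltjes transform of a single large sample covariance matrix as a stand-in for $m$ and observe that $|m|$ stays bounded while $\operatorname{Im}m$ stays above a constant multiple of $\eta$ on a compact energy window; this is only an illustration, not the deterministic statement proved above.

import numpy as np

# Numerical sketch, special case Sigma = I: the empirical Stieltjes transform
# of one large sample covariance matrix is used as a stand-in for m, and we
# observe that |m| stays bounded while Im m stays above a multiple of eta.
rng = np.random.default_rng(3)
M, N = 1000, 2000
X = rng.standard_normal((M, N)) / np.sqrt(N)
evals = np.linalg.eigvalsh(X @ X.T)

eta = 1e-2
for E in np.linspace(0.1, 3.5, 8):
    z = E + 1j * eta
    m = np.mean(1.0 / (evals - z))
    print(f"E={E:.2f}  |m|={abs(m):.3f}  Im(m)/eta={m.imag / eta:.1f}")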

v. Proof of Lemma 5.6

1 Proof. Given the event Ξ, Gii is within log− N distance to m uniformly for i . Since m 1, it then follows that ∈ I | |≍ 1(Ξ)G 1. ii ≍ Next, it follows from Lemma 4.1 and the definition of Ξ that

2 (k) GikGkj 1 log− N 1(Ξ) Gij Gij + δij m + log− N + 1 , | | ≤ | | | Gkk |≤ | | m log− N | |− 2 (k) GikGkj 1 log− N 1(Ξ) Gij Gij δij m log− N 1 , | | ≥ | | − | Gkk |≥ | |− − m log− N | |− (k) 1 which implies that given Ξ, Gij is within 2 log− N distance to m uniformly for all i, j, k such that i, j = k. ∈ I Applying this6 argument inductively, we conclude that for any index set T such that T C , | |≤ 1 given Ξ, there exists a constant C2 > 0 such that

(T ) 1 1(Ξ) G δ m C log− N, (II.5) | ij − ij |≤ 2 uniformly for i, j T . Consequently, it follows from Proposition 5.5 that there exists some constant C > 0 such∈ I\ that (T ) 1 1(Ξ) Gij + 1(Ξ) C. (II.6) | | G(T ) ≤ ii (T ) (T ) 1 (T ) Let G = V (L zI)− V ∗ be the eigen-decomposition of G where V = (v ,..., v ) − 1 N with orthonormal columns v ,..., v is an orthogonal matrix and L(T ) = diag(λ(T ), , λ(T ) ). 1 N 1 ··· N T Then we see that for any i, j , −| | ∈ I N T N T −| | −| | (T ) (T ) w∗vkvk∗w w∗vkvk∗w 1 Gij G = sup (T ) sup = η− . (II.7) | |≤k k w =1 λ z ≤ w =1 η k k k=1 k − k k k=1 X X Therefore the desired result follows from (II.6) and (II.7). vi. Complement of the proof of Proposition 5.1

Proof. Let ω ,ω C+. Some basic calculations yield that 1 2 ∈ 1 1 G (w ) G (w ) (Im w )− (Im w )− w w , i,j . (II.8) | ij 1 − ij 2 |≤ 1 2 | 1 − 2| ∈ I e Let z E + ıη D . We construct a lattice as follows. Let z0 = E + ı. Fix ε (0,τ/8). For k =0, 1, 2≡,...,N 5 ∈ N 4+τ , define ∈ − 5 η = 1 kN − , z = E + ıη , k − k k 1/2 2 ε δ = (Nη )− + q , Ξ = Λ(z ) N δ . k k k { k ≤ k} p imsart-generic ver. 2020/08/06 file: draft-1218.tex date: June 3, 2021 /Tracy-Widom limit in elliptical distributions 41

Let C > 0 be a fixed constant. We show by induction for k = 1,...,N 5 N 4+τ that if the two events − ε CN δk 1 Θ(zk 1) − , 1(Ξk 1)=1, (II.9) ε − ≤ √κ + ηk 1 + N δk 1 − − − hold with high probability, then p N εδ Θ(z ) k , 1(Ξ )=1, k ε k ≤ √κ + ηk + √N δk hold with high probability. It is clear that (II.9) for k = 1 follows from (5.23) and (5.24). We verify that if 1(Ξk 1) = 1, 1 5 4+τ − then Λ(z ) log− N, k =1,...,N N . Using the Lipschitz condition (II.8), we have k ≤ − 1(Ξk 1)Λ(zk) 1(Ξk 1) Λ(zk) Λ(zk 1) + 1(Ξk 1)Λ(zk 1) − ≤ − | − − | − − ε max Gij (zk) Gij (zk 1) + N δk 1 ≤ i,j | − − | − 1 1 ε p 1/4 zk zk 1 ηk− ηk− 1 + N [(Nηk 1)− + q] ≤ | − − | − − 3 2τ ǫ τ/4 N − − + N [N − + q] ≤ 1 log− N. ≤ Let D > 0 be an arbitrarily large number. Therefore, by (5.18), (5.4) and (5.22), we can choose N Z such that as N N , 0 ∈ + ≥ 0

1 ε sup P 1(Ξk 1)(Λo(zk)+max Gii(zk) mN (zk) ) > N δk k 1,...,N 5 N 4+τ − i | − | 2 ∈{ − }  ∈I  D N − , (II.10) ≤ and D sup P 1(Ξk 1) f mN (zk) zk >δk N − . (II.11) k 1,...,N 5 N 4+τ − | − | ≤ ∈{ − }    Then, applying Proposition 4.6, we obtain from the induction hypothesis (II.9) and (II.11) that

ε/2 P 1(Ξk 1)Θ(zk) > CN δk − ε  p  CN δk D P 1(Ξ )Θ(z ) > N − . (II.12) k 1 k ε ≤ − √κ + ηk + √N δk ≤   5 4+τ Using (II.10), (II.12) and the fact that δk < √δk for all k =1,...,N N , we get that as N N , − ≥ 0 P c (Ξk 1 Ξk) − ∩ ε P 1(Ξk 1) max Gii(zk) mN (zk) + Θ(zk)+Λo(zk) >N δk ≤ − i | − |  ∈I  1 ε p P 1(Ξk 1) max Gii(zk) mN (zk) +Λo(zk) > N δk ≤ − i | − | 2  ∈I  1p ε D + P 1(Ξk 1)Θ(zk) > N δk 2N − . − 2 ≤  p  imsart-generic ver. 2020/08/06 file: draft-1218.tex date: June 3, 2021 /Tracy-Widom limit in elliptical distributions 42

Then we see that for any k 1,...,N 5 N 4+τ , as N N , ∈{ − } ≥ 0 k P c P P c P c 5 D (Ξk)=1 (Ξk)= (Ξi 1 Ξi )+ (Ξ0) 2N − . − − ∩ ≤ i=1 X 5 4+τ This shows that 1 1(Ξk) or equivalently Λ(zk) √δk uniformly for all k 0,...,N N . ≺ 5 4+τ ≺ ∈{5 − } Finally, by choosing kˆ 1,...,N N such that ı(z z ) N − , we have ∈{ − } − − kˆ ≤

Λ(z) Λ(z) Λ(zˆ) + Λ(zˆ) max Gij (z) Gij (zˆ) + Λ(zˆ) ≤ | − k | k ≤ i,j | − k | k 3 2τ 1/4 N − − + Λ(z ) (Nη)− + q. ≤ kˆ ≺ The proof of Proposition 5.1 is now complete. vii. Proof of Proposition 5.10.

Proof. We omit the proof of (5.30), since it is similar to that of (5.31) (actually it is also simpler than (5.31) since we only need to expand the G terms using the third identity of Lemma 4.1). In the following, we give the proof of (5.31). (i) 1 V (i) For simplicity of notation, denote Σ0 = Σ(mN Σ+ I)− and write i = xi∗ Σ0xi. In the following, we bound the quantity G N 1 Q V |N i i| i=1 X V V Let p be an even integer. Denote Vis := Qis is for s p/2 and Vis := Qis i∗s for s > p/2. We p ≤ bound E 1 N V . N is=1 i We see that P N p p p 1 p 1 1 E Q V = E V = E (P + Q )V . N i i N p is N p ir ir is i=1 i1,...,ip s=1 i1,...,ip s=1  r=1  X X Y X Y Y

Introducing the notation i = (i1,...,ip), [i]= i1,...,ip , PA = i A Pi and QA = i A Qi for some index set A, we get { } ∈ ∈ Q Q N p 1 p 1 E Q V = E P c Q V . (II.13) i i p As As is N N i i i=1 A1,...,Ap [ ] s=1   X X X ⊂ Y

c By definition of Vi, we have that Vis = Qis Vis and Pis Vis = 0, which imply that PAs Vis =0 if i / A . Hence we may restrict the summation to A satisfying s ∈ s s i A (II.14) s ∈ s


c c for all s. Moreover, we see that if is q=sAq for some s, say s = 1, then PA QAq Viq is ∈ ∩ 6 q X(s)-measurable for each q =2,...,p. Thus, we have

p p

E c E c c (PAs QAs Vis )= (PA1 QA1 Qi1 Vi1 ) (PAs QAs Vis ) s=1 s=2 Y Y p

E c c = Qi1 (PA1 QA1 Vi1 ) (PAs QAs Vis ) =0. (II.15) s=2 n Y o (II.14) and (II.15) show that each index is must belong to at least two different sets: As and Aq for some q = s. Hence 6 p A 2 [i] . (II.16) | s|≥ | | s=1 X In the following, a crucial step is to show that for i A ∈ A Q V Φ| |. (II.17) | A i|≺ ν When A = 1 (corresponding to the case A = i ), it follows straightforward from Lemma 4.4. Suppose| | A 2. For ease of presentation, we{ assume} without loss of generality that i =1 and A = 1, 2|,...,ν| ≥ for some ν 2. Before we proceed, we note the following equality that for any i = j{and T }with i,j / T≥, 6 ⊂ I ∈ (ijT ) (ijT ) (iT ) (ijT ) xj xj∗ x∗ = x∗( G G ) i G i G − 1+ x (ijT )x j∗G j (ijT ) (iT ) (ijT ) (ijT ) = x∗ + zG x x x∗ i G jj iG j j G (T ) (ijT ) Gij (ijT ) = x∗ + x∗ , (II.18) i G (T ) j G Gii and (T ) (T ) 1 G G Σ(T ) =Σ(iT ) + ji ij Σ(iT )ΣΣ(T ). (II.19) 0 0 N (T ) 0 0 j ( i T ) Gii ∈I\X{ }∪ We show an example of the expansion. It follows from Lemma V.1 and Lemma 4.1 that

(1) (1) Q V = Q (x∗ Σ x ) 2 1 2 1G 0 1 (1) (1) (1) (12) 1 Gj2 G2j (1) (12) (1) = Q (x∗ Σ x + x∗ Σ ΣΣ x ) 2 1G 0 1 N (1) 1G 0 0 1 j / 1,2 G22  ∈{X }  (12) (12) G12 (12) (12) = Q2(x1∗ Σ0 x1)+ Q2( x2∗ Σ0 x1) G G11 G (1) (1) 1 Gj2 G2j (1) (12) (1) +Q ( x∗ Σ ΣΣ x ) 2 N (1) 1G 0 0 1 j / 1,2 G22  ∈{X }  (1) (1) G 1 G G 12 x (12) (12)x j2 2j x (1) (12) (1)x = Q2( 2∗ Σ0 1)+ Q2( (1) 1∗ Σ0 ΣΣ0 1). G11 G N G j / 1,2 G22  ∈{X } 


We note that

G12 (12) (12) Φν (12) (12) Φν (12) (12) 2 x2∗ Σ0 x1 Σ0 F F Σ0 Φν , G11 G ≺ M kG k ≤ M kG k k k≺

and

(1) (1) 1 Gj2 G2j (1) (12) (1) Q ( x∗ Σ ΣΣ x ) 1 N (1) 1G 0 0 1 j / 1,2 G22  ∈{X }  (1) (1) 1 Gj2 G2j (1) (12) (1) = Q (x∗ Σ ΣΣ x ) N (1) 1 1G 0 0 1 j / 1,2 G22  ∈{X }  Φ2 ν (1)Σ(12)ΣΣ(1) ≺ M kG 0 0 kF Φ3 . ≺ ν We see that


Q3Q2V1 (1) (1) = Q Q (x∗ Σ x ) 3 2 1G 0 1 (1) (1) (1) (12) 1 Gj2 G2j (1) (12) (1) = Q Q (x∗ Σ x + x∗ Σ ΣΣ x ) 3 2 1G 0 1 N (1) 1G 0 0 1 j / 1,2 G22  ∈{X }  (12) (12) (1) (123) 1 Gj3 G3j (1) (123) (12) = Q Q (x∗ Σ x + x∗ Σ ΣΣ x 3 2 1G 0 1 N (12) 1G 0 0 1 j / 1,2,3 G33  ∈{X }  (1) (1) 1 Gj2 G2j (1) (12) (1) + x∗ Σ ΣΣ x ) N (1) 1G 0 0 1 j / 1,2 G22  ∈{X }  (12) (123) G12 (12) (123) = Q3Q2(x1∗ Σ0 x1 + x2∗ Σ0 x1 G G11 G (12) (12) 1 Gj3 G3j (1) (123) (12) + x∗ Σ ΣΣ x N (12) 1G 0 0 1 j / 1,2,3 G33  ∈{X }  (1) (1) 1 Gj2 G2j (1) (12) (1) + x∗ Σ ΣΣ x ) N (1) 1G 0 0 1 j / 1,2 G22  ∈{X }  G G G(1) 12 x (123) (123)x 12 23 x (123) (123)x = Q3Q2( 2∗ Σ0 1 + (1) 3∗ Σ0 1 G11 G G G11G22 (12) (12) 1 Gj3 G3j (1) (123) (12) + x∗ Σ ΣΣ x N (12) 1G 0 0 1 j / 1,2,3 G33  ∈{X }  (1) (1) 1 Gj2 G2j (1) (12) (1) + x∗ Σ ΣΣ x ) N (1) 1G 0 0 1 j / 1,2 G22  ∈{X }  (3) (3) 2 G12 G13G32 G12 G13G31 G13G32G31 (123) (123) = Q3Q2( + x2∗ Σ0 x1 G(3) G G(3) − G G(3)G − G G(3)G2 G  11 33 11 11 11 33 11 11 33  (1) G12G23 (123) (123) + x∗ Σ x (1) 3G 0 1 G11G22 (12) (12) 1 Gj3 G3j (1) (123) (12) + x∗ Σ ΣΣ x N (12) 1G 0 0 1 j / 1,2,3 G33  ∈{X }  (1) (1) 1 Gj2 G2j (1) (12) (1) + x∗ Σ ΣΣ x ) N (1) 1G 0 0 1 j / 1,2 G22  ∈{X }  (3) 2 G13G32 G12 G13G31 G13G32G31 (123) (123) = Q3Q2( x2∗ Σ0 x1 G G(3) − G G(3)G − G G(3)G2 G  33 11 11 11 33 11 11 33  (1) G12G23 (123) (123) + x∗ Σ x (1) 3G 0 1 G11G22 imsart-generic ver. 2020/08/06 file: draft-1218.tex date: June 3, 2021 (12) (12) 1 Gj3 G3j (1) (123) (12) + x∗ Σ ΣΣ x N (12) 1G 0 0 1 j / 1,2,3 G33  ∈{X }  (1) (1) 1 Gj2 G2j (1) (12) (1) + x∗ Σ ΣΣ x ). (II.20) N (1) 1G 0 0 1 j / 1,2 G22  ∈{X }  /Tracy-Widom limit in elliptical distributions 46

3 We observe that the first two terms in (II.20) are Φν , and we continue the expansion (T ) (T ) ≺ procedures for and Σ0 for which (T ) is not maximally expanded. Thus, the third term can be writtenG as

(12) (12) 1 G G G j3 3j x (12) 12 x (12) (123) (12)x Q3Q2( (12) ( 1∗ + 2∗ )Σ0 ΣΣ0 1) N G G11 G j / 1,2,3 G33  ∈{X }  (12) (12) 1 G G G j3 3j 12 x (12) (123) (12)x 3 = Q3Q2( (12) 2∗ )Σ0 ΣΣ0 1) Φν . N G11 G ≺ j / 1,2,3 G33  ∈{X }  Similar results can be obtained for the fourth term. Actually, one can observe that the first term of QAVi is the leading term, so we can only clarify the bounds for the first term. We see that

Q4Q3Q2Q1V1 (12) (12) (1) (123) 1 Gj3 G3j (1) (123) (12) = Q Q Q Q (x∗ Σ x + x∗ Σ ΣΣ x 1 4 3 2 1G 0 1 N (12) 1G 0 0 1 j / 1,2,3 G33  ∈{X }  (1) (1) 1 Gj2 G2j (1) (12) (1) + x∗ Σ ΣΣ x ) N (1) 1G 0 0 1 j / 1,2 G22  ∈{X }  (1) (1234) = Q Q Q Q (x∗ Σ x 1 4 3 2 1G 0 1 (123) (123) 1 Gj4 G4j (1) (1234) (123) + x∗ Σ ΣΣ x N (123) 1G 0 0 1 j / 1,2,3,4 G44  ∈{X }  (12) (12) 1 Gj3 G3j (1) (123) (12) + x∗ Σ ΣΣ x N (12) 1G 0 0 1 j / 1,2,3 G33  ∈{X }  (1) (1) 1 Gj2 G2j (1) (12) (1) + x∗ Σ ΣΣ x ) N (1) 1G 0 0 1 j / 1,2 G22  ∈{X }  (12) (1234) G12 (12) (1234) = Q1Q4Q3Q2(x1∗ Σ0 x1 + x2∗ Σ0 x1), G G11 G

whose first term can be expanded as

(1) (1234) Q Q Q Q (x∗ Σ x ) 1 4 3 2 1G 0 1 (12) (1234) G12 (12) (1234) = Q1Q4Q3Q2(x1∗ Σ0 x1 + x2∗ Σ0 x1) G G11 G G G G(1) 12 x (123) (1234)x 12 23 x (123) (1234)x = Q1Q4Q3Q2( 2∗ Σ0 1 + (1) 3∗ Σ0 1) G11 G G G11G22 (3) (3) 2 G12 G13G32 G12 G13G31 G13G32G31 (123) (1234) = Q1Q4Q3Q2( + x2∗ Σ0 x1 G(3) G G(3) − G G(3)G − G G(3)G2 G  11 33 11 11 11 33 11 11 33  (1) (1) (12) G12G23 (1234) (1234) G12G23 G34 (1234) (1234) + x∗ Σ x + x∗ Σ x ) (1) 3G 0 1 (1) (12) 4G 0 1 G11G22 G11G22 G33 (3) 2 G13G32 G12 G13G31 G13G32G31 (123) (1234) = Q1Q4Q3Q2( x2∗ Σ0 x1 G G(3) − G G(3)G − G G(3)G2 G  33 11 11 11 33 11 11 33  (1) (1) (12) G12G23 (1234) (1234) G12G23 G34 (1234) (1234) + x∗ Σ x + x∗ Σ x ) (1) 3G 0 1 (1) (12) 4G 0 1 G11G22 G11G22 G33 (3) 2 G13G32 G12 G13G31 G13G32G31 (1234) (1234) = Q1Q4Q3Q2( x2∗ Σ0 x1 G G(3) − G G(3)G − G G(3)G2 G  33 11 11 11 33 11 11 33  (3) 2 (13) G13G32 G12 G13G31 G13G32G31 G24 (1234) (1234) + x4∗ Σ0 x1 G G(3) − G G(3)G − G G(3)G2 G(13) G  33 11 11 11 33 11 11 33  44 (1) (1) (12) G12G23 (1234) (1234) G12G23 G34 (1234) (1234) + x∗ Σ x + x∗ Σ x ). (1) 3G 0 1 (1) (12) 4G 0 1 G11G22 G11G22 G33 We see that the term

(3) 2 G13G32 G12 G13G31 G13G32G31 (1234) (1234) Q1Q4Q3Q2( x2∗ Σ0 x1) G G(3) − G G(3)G − G G(3)G2 G  33 11 11 11 33 11 11 33  can be bounded by carrying out the following expansion

(3) (3) G13G32 (4) G14G43 (4) G34G42 1 G34G43 1 G14 G41 (3) = G13 + G32 + (4) (4) (34) (3) (34) (3) . G G G44 G44 G − G G G G − G G G 33 11    33 33 33 44  11 11 11 44  Thus the first term resulting from the expansion yields that

(4) (4) G13 G32 (1234) (1234) Q Q Q Q ( x∗ Σ x )=0, 1 4 3 2 (4) (34) 2G 0 1 G33 G11 and the remaining terms all contain three off-diagonal G terms in the numerators. Consequently,

(3) 2 G13G32 G12 G13G31 G13G32G31 (1234) (1234) 4 Q1Q4Q3Q2( x2∗ Σ0 x1) Φν . G G(3) − G G(3)G − G G(3)G2 G ≺  33 11 11 11 33 11 11 33 


Similarly, the term (1) G12G23 (1234) (1234) Q Q Q Q ( x∗ Σ x ) 1 4 3 2 (1) 3G 0 1 G11G22 4 can be bounded by Φν via expanding the G terms into those with (4) added to the superscripts. So the remaining term is

(1) (1) 1 Gj2 G2j (12) (12) (12) Q Q Q Q ( x∗ Σ ΣΣ x ). 1 4 3 2 N (1) 1G 0 0 1 j / 1,2 G22  ∈{X }  We note that (1) (1) 1 Gj2 G2j (12) (12) (12) Q Q Q Q ( x∗ Σ ΣΣ x ) 1 4 3 2 N (1) 1G 0 0 1 j / 1,2 G22  ∈{X }  (1) (1) 1 Gj2 G2j (123) (123) (123) 4 = Q1Q4Q3Q2( x1∗ Σ0 ΣΣ0 x1)+ O (Φν ) N (1) G ≺ j / 1,2 G22  ∈{X }  (13) (13) 1 Gj2 G2j (123) (123) (123) 4 = Q1Q4Q3Q2( + x1∗ Σ0 ΣΣ0 x1)+ O (Φν ) N (13) T G ≺ j / 1,2,3 G22  ∈{X }  Φ4 , ≺ ν where is the term with at least three off-diagonal G terms in the numerator which can be T 4 bounded by Φν via expanding the G terms. Now we summarise the expansion steps as follows. Consider

(1) (1) Q V := Q Q Q V = Q Q Q x∗ Σ x A 1 ν ··· 2 1 1 ν ··· 2 1 1G 0 1 (1) (A) a) We first expand the term x1∗ to xi∗ from the smallest index to the largest index. G G (A) We will get a sequence of monomials with the maximally expanded i := xi∗ where i A. The coefficients of ’s are of the pattern A G ∈ Ai (Tab) a,b Pi Gab { }∈ , G(Tab) Q a,b Pi aa { }∈ where Pi is an ordered paired patternQ subset of 1, ,i A, for example, Pi = 1, 3 , 3, 4 , 4,i and T = 1, ,b a,b or {T ···= .} And ⊂ these coefficients can be {{ } { } { }} ab { ··· }\{ } ab ∅ handled by the resolvent extension just like the same procedures in (1/N) i=1 Qi1/Gii. One can observe that the only remaining terms after taking QA (we imprecisely ignore P the term Σ0 here) are those monomials whose lower indexes in the numerator contain all elements of A. Since there are only off-diagonal entries in the numerator, by the paired pattern Pi we remark that for those monomials, the number of off-diagonal entries is A . (T ) | | b) We expand Σ0 to match the upper index with which ensures that several undesired terms vanish after taking conditional expectation.G Actually, we use the same strategy as in step a), adding the upper index of Σ0 from the smallest value to the largest value of A. We remark that except the leading term with coefficient 1, other terms give us more off-diagonal entries as coefficients.


c) We expand the Green function of coefficients after steps a) and b) to a maximal extent, or have at least A off-diagonal entries. | | Then it follows that for ν > 2

Q V = Q Q Q V Φν+1. A i ν ··· 2 1 i ≺ ν This completes the proof of (II.17). By (II.16) and (II.17), it follows from (II.13) that there exists some constant Cp > 0 depending on p only such that

N p 1 p 1 E Q V = E P c Q V i i p As As is N N i i i=1 A1,...,Ap [ ] s=1   X X X ⊂ Y p Cp 2 [i] Cp 2s Φ | | = Φ 1( i = s) ≺ N p ν N p ν | | i s=1 i X p X X 2s s p 1/2 2p 2p C Φ N − C (Φ + N − ) C Φ , (II.21) ≤ p ν ≤ p ν ≤ p ν s=1 X where the second last step follows from the elementary inequality anbm (a+b)n+m for positive 1/2 ≤ 1 N V a,b and the last step follows from the fact that CN − Φν . (II.21) shows that N i=1 Qi i Φ2 , which concludes (5.31). ≤ ≺ ν P

Appendix III: Proof of Theorem 3.2.

Firstly we show (3.4). Proof of (3.4). Using Proposition 5.10, we get from (5.22) that 1 Im m f(m ) z N ε q2 + + , | N − |≺ { (Nη)2 Nη }

e 1/3 uniformly for z D . Note here we assume q 0, as N is sufficiently large,

ε P N Im m 1 2 sup mN m > ( + 2 + q ) z De | − | √κ + η Nη (Nη) ∈   ε Im m 1 2 N ( + 2 + q ) Nη (Nη) D sup P mN m > N − , ≤ z De | − | ε/2 Im m 1 2 ≤ √κ + η + N ( + 2 + q ) ∈  Nη (Nη)  q so uniformly for z De, ∈ 1 Im m 1 m m ( + + q2). (III.1) | N − |≺ √κ + η Nη (Nη)2 Denote 1 Im m 1 Ψ(˜ z)= ( + + q2). √κ + η Nη (Nη)2


Next, we show that 2/3 2 λ1 = λ+ + O (N − + q ). (III.2) ≺ We know from Lemma V.2 there exists some constant C > 0 such that λ1 C with high probability. Therefore, it remains to show that for any fixed ε> 0, there is no eigenvalue≤ of W in the interval 2/3+4ε 4ǫ 2 I := [λ+ + N − + N q , C] (III.3) with high probability. The idea of the proof is to choose, for each E I, a scale η(E) such −ε ∈ that Im m (E + ıη(E)) N with high probability. First, we need a simultaneous version of N ≤ Nη(E) (III.1), i.e. m (z) m(z) Ψ(˜ z) holds with high probability. (III.4) | N − |≤ z De ∈\ n o It suffices to show that m (z) m(z) sup | N − | 1. (III.5) z De Ψ(˜ z) ≺ ∈ Let for i = 1, 2, z E + ıη De and κ = E λ . Elementary calculation yields that i ≡ i i ∈ i | i − +| there exists some constant C1 > 0 such that

2 mN (z1) mN (z2) C1N z1 z2 , | − | ≤ 2| − | m(z1) m(z2) C1N z1 z2 , | − | ≤ 5/|2 − | Ψ(˜ z1) Ψ(˜ z2) C1N z1 z2 , | − | ≤ |1 − | inf Ψ(˜ z) (C1N)− . (III.6) z De ∈ ≥ 4 e 4 2 e e 8 e We define the N − -net Dˆ = (N − Z ) D . Hence, Dˆ CN and for any z D , there e 4 ∩ | |≤ ∈ exists a w Dˆ such that z w 2N − . Then using a simple union bound and (III.6), we can deduce (III.5∈ ). | − |≤ Let ε be as in (III.3). Then we observe from (III.4) and Lemma 4.5 that

ε η 1 1 1 2 mN (z) m(z) N + ( 2 + q ) (III.7) e | − |≤ κ Nη √κ (Nη) z D ,E λ+ ∈ \≥ n  o holds with high probability. For each E I, we define ∈ 1 1 1/2 η(E)= q− N − κ(E) , z(E)= E + ıη(E). (III.8)

Using Lemma 4.5, we find that there exists a constant C > 0 such that for all E I 2 ∈ C η(E) C N ε Im m(z(E)) 2 2 − . (III.9) ≤ κ(E) ≤ Nη(E)

With the choice η(E) in (III.8), we obtain fromp (III.7) that

2N ε m (z) m(z) − holds with high probability. (III.10) | N − |≤ Nη(E) E I \∈ n o


From (III.9) and (III.10) we conclude that

(2 + C )N ε Im m (z) 2 − holds with high probability. (III.11) N ≤ Nη(E) E I \∈ n o

Now suppose that there is an eigenvalue, say λi of W in I. Then we find that 1 η(λ ) 1 Im m (z(λ )) = i , N i N (λ λ )2 + η(λ )2 ≥ Nη(λ ) j j i i i X − which contradicts with the inequality in (III.11). Therefore, we conclude that with high proba- bility, there is no eigenvalue in I. Since ε> 0 in (III.3) is arbitrary, (III.2) follows. Now we show (3.5). First we show the following lemma.

Lemma III.1. Let a1,a2 be two numbers with a1 a2 and a1 + a2 = O(1). For any E1, E2 1 ≤ 2 R| | | | ∈ [a1,a2] and η = N − , let ψ(λ) := ψE1,E2,η(λ) be a C ( ) function such that ψ(x)=1 for x [E1 + η, E2 η], ψ(x)=0 for x R [E1, E2] and the first two derivatives of ψ satisfy (1)∈ 1− (2) 2 ∈ \ ∆ ψ (x) Cη− , ψ (x) Cη− for all x R. Let ̺ be a signed measure on the real line | ∆| ≤ | | ≤ ∆ ∈ and m be the Stieltjes transform of ̺ . Suppose, for some positive number cN depending on N, we have 1 q2 m∆(x + ıy) Cc ( + ) y < 1, x [a ,a ], (III.12) | |≤ N Ny √κ + y ∀ ∈ 1 2 then

2 ∆ 1 q ψ(λ)̺ (dλ) 6 cN ( + ) N κ +1/2 Z 1 6 c ( + qp2 E E + η) (III.13) N N 2 − 1 1 p 6 c ( + q3 + q2√κ ). N N E1

ǫ 2 2/3 Proof. By (3.4), it suffices to show the case where E2 = λ+ + N (q + N − ). Let χ(y) be a smooth cutoff function with support [ 1, 1] such that χ(y) = 1 for y 1/2 and χ(y) has bounded derivatives otherwise. Define: − | | ≤

̺∆(x) := ρ (x) ̺(x), m∆(z) := m (z) m(z). W − N − Using the Helffer-Sj¨ostrand formula (setting χ(x + ıy)= χ(y) in Proposition C.1 of [10]), we get that ı yψ(2)(x)χ(y)+ ψ(x)+ ıyψ(1)(x) χ(1)(y) ψ(λ)= { } dxdy. 2π R2 λ x ıy Z − −


Integrating with respect to ̺∆ and using the fact that ψ and χ are real, we obtain that ı ψ(λ)̺∆(dλ) = [yψ(2)(x)χ(y)+ ψ(x)+ ıyψ(1)(x) χ(1)(y)]m∆(x + ıy)dxdy 2π R2 { } Z Z

C ψ(x) + y ψ(1)(x) χ(1)(y) m∆(x + ıy) dxdy ≤ R2 {| | | || |}| || | Z +C yψ(2)(x)χ(y) Im m∆(x + ıy)dxdy y η R Z| |≤ Z

+C yψ(2)(x)χ(y) Im m∆(x + ıy)dxdy . (III.14) y >η R Z| | Z

With (III.12), the first term in (III.14) can be estimated as

C ψ(x) + y ψ(1)(x) χ(1)(y) m∆(x + ıy) dxdy R2 {| | | || |}| || | Z E2 = C ψ(x) χ(1)(y) m∆(x + ıy) dxdy [ 1,1] [ 1/2,1/2] E1 | || || | Z − \ − Z +C y ψ(1)(x) χ(1)(y) m∆(x + ıy) dxdy [ 1,1] [ 1/2,1/2] [E1,E2] [E1+η,E2 η] | || || || | Z − \ − Z \ − 1 q2 CcN ( + ). ≤ N κ +1/2

The second term in (III.14p ) can be estimated as

C yψ(2)(x)χ(y) Im m∆(x + ıy)dxdy y η R Z| |≤ Z

= C yψ(2)(x)χ(y) Im m∆(x + ıy)dxdy y η [E1,E2] [E1+η,E2 η] Z| |≤ Z \ − Cc N . ≤ N For the third term in (III.14), we note that

∂ ∂ ∂ ∂ Im m∆(x + ıy)=Im m∆(x + ıy) = Im ı m∆(x + ıy) = Re m∆(x + ıy). ∂x ∂x − ∂y −∂y n o n o


Then it follows from integration by parts first with respect to x then y that

C yψ(2)(x)χ(y) Im m∆(x + ıy)dxdy y >η R Z| | Z ∂ = C yψ(1)(x)χ(y) Im m∆(x + ıy)dx dy y >η R ∂x Z| | Z ∂ = C yψ(1)(x)χ(y) Re m∆(x + ıy)dyd x − R y >η ∂y Z Z| |

= C yψ(1)(x)χ(y) Re m∆(x + ıy) ∞dx − R η Z h i η yψ(1)(x)χ(y) Re m∆(x + ıy) − dx − R Z h i−∞ +C ψ(1)(x) χ(y)+ yχ(1)(y) Re m∆(x + ıy)dydx R y η { } Z Z| |≥

= C 2 ηψ(1)(x) Re m∆(x + ıη)dx R Z

+ C ψ(1)(x) χ(y)+ yχ(1)(y) Re m∆(x + ıy)dydx R y >η { } Z Z| |

C η ψ(1)(x) Re m∆(x + ıη) dx ≤ R | || | Z C + Re m∆(x + ıy) dxdy η η< y 1 [E1,E2] [E1+η,E2 η] | | Z | |≤ Z \ − +C yψ(1)(x)χ(1)(y) Re m∆(x + ıy) dxdy η< y 1 [E1,E2] [E1+η,E2 η] | | Z | |≤ Z \ − 2 cN 1 q C dx + C cN ( + )dy ≤ [E1,E2] [E1+η,E2 η] Nη η< y 1 N y κ + y Z \ − Z | |≤ | | | | c 1 q2 +C N ( + )dpxdy η< y 1 [E1,E2] [E1+η,E2 η] η N κ +1/2 Z | |≤ Z \ − 2 2 CcN q CcN log N p q + log η 6 CcN ( + ). ≤ κ +1/2 N | | N κ +1/2 Then we summarisep that p 2 ∆ 1 q ψ(λ)̺ (dλ) 6 cN ( + ) N κ +1/2 Z 1 6 c ( + qp2 E E + η) (III.15) N N 2 − 1 1 p 6 c ( + q3 + q2√κ ) N N E1

∆ 1+τ Next, let ̺ be the signed measure ̺N ̺. If y y0 = N − , the condition (III.12) in Lemma III.1 holds for the difference m∆ = m− m and≥ c = N ε for any small ε> 0 with high N − N

probability due to Theorem 3.1. For $y\le y_{0}$, set $z=x+\mathrm{i}y$, $z_{0}=x+\mathrm{i}y_{0}$, and estimate

Note that ∂ ∂ 1 m (x + iη) = ̺ (dλ) ∂η N ∂η λ x ıη N Z − − 1 1 ̺N (dλ)= η− Im mN (x + ıη). ≤ λ x ıη 2 Z | − − | The same bound applies to ∂ m(x + ıη) with m replaced by m. | ∂η | N Then using Theorem 3.1 and the fact that the functions y y Im mN (x + ıy) and y y Im m(x + ıy) are both monotone increasing for any y > 0 since→ both are Stieltjes transforms→ of a positive measure, we obtain that y0 ∂ y0 1 mN (x + ıη) m(x + ıη) dη Im mN (x + ıη)+Im m(x + ıη) dη y ∂η { − } ≤ y η { } Z Z y0 1 y Im m (z )+Im m(z ) dη ≤ 0{ N 0 0 } η2 Zy 1 1 = y0 Im mN (z0)+Im m(z0) ( ) { } y − y0 y y = Im m (z )+Im m(z ) 0 − { N 0 0 } y 1 2 Im m(z ) + (Ny )− . ≺ 0 0 Hence we have from (III.16) that

τ 1 CNy +1 CN m (z) m(z) 2 Im m(z ) + (Ny )− + q + q + q. (III.17) | N − |≺ 0 0 ≤ Ny ≤ Ny τ Let ψE1,E2,η be the function in Lemma III.1. Applying Lemma III.1 below with cN = N , we 1 obtain that for any η = N −

1+τ ψE1,E2,η(λ)̺N (dλ) ψE1,E2,η(λ)̺(dλ) N − . R − R ≺ Z Z

Integrating with respect to ̺N (dλ) and ̺(dλ) on both sides of the following elementary in- equality 2η2 1[x η,x+η](λ) x, λ R, − ≤ (λ x)2 + η2 ∀ ∈ − and using (III.17), Lemma 4.5 and the definitions of y0 and q, we get that for some constant C > 0 1+τ n (x η, x + η) Cη Im m (x + ıη) Cy Im m (x + ıy ) N − , N − ≤ N ≤ 0 N 0 ≺ and 1+τ n(x η, x + η) Cη Im m(x + ıη) Cy Im m(x + ıy ) N − , − ≤ ≤ 0 0 ≺

uniformly for $x$ in a small neighborhood of $\lambda_{+}$. We note that

E2 E2 n (E , E ) n(E , E ) = ̺ (dλ) ̺(dλ) | N 1 2 − 1 2 | | N − | ZE1 ZE1 E2 η E2 η − − ψ (λ)̺ (dλ) ψ (λ)̺(dλ) ≤ | E1,E2,η N − E1,E2,η | ZE1+η ZE1+η E1+η E1+η + ̺ (dλ) + ̺(dλ) | N | | | ZE1 ZE1 E2 E2 + ̺N (dλ) + ̺(dλ) | E2 η | | E2 η | Z − Z − 1 3 2 ( 1+τ + q + q (√κE1 √κE2 ), ≺ N − − Since τ is arbitrary, the desired result follows. Finally, we are ready to show (3.6). 1/3 Proof of (3.6). With the choice of q λ+ c 2/3 − N N − for some c> 0, then, with high probability

ǫ 2/3 λ γ 6 N − N − , (III.18) | i − i| 3/2 for some ǫ> 0. By the square root behavior of ̺, we have n(x) (λ1 x) when x is near the edge. That is ∼ − j n(γ )= (λ γ )3/2. j N ∼ 1 − j Thus we have proved the case where j 6 N c for small c. Together with (III.18), we conclude (3.6). For the rest of j’s, one can refer to [14], so we omit the details since we only need the result near the right edge.
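A hedged Monte Carlo sketch of the rigidity estimate behind (3.6), in the null case $\Sigma=I$: compare the ordered eigenvalues with the classical Marchenko–Pastur locations near the right edge. The edge and density formulas below are the standard ones for this special case and are not taken from the paper.

import numpy as np

# Monte Carlo sketch of eigenvalue rigidity near the right edge in the
# null case Sigma = I, using the standard Marchenko-Pastur density for the
# classical locations gamma_i (illustrative sizes).
rng = np.random.default_rng(4)
M, N = 400, 800
phi = M / N
lam_minus, lam_plus = (1 - np.sqrt(phi)) ** 2, (1 + np.sqrt(phi)) ** 2

X = rng.standard_normal((M, N)) / np.sqrt(N)
evals = np.linalg.eigvalsh(X @ X.T)[::-1]               # decreasing order

x = np.linspace(lam_minus, lam_plus, 200001)[::-1]      # from lam_plus down
rho = np.sqrt((lam_plus - x) * (x - lam_minus)) / (2 * np.pi * phi * x)
mass_from_edge = np.cumsum(rho) * (x[0] - x[1])         # int_x^{lam_plus} rho

def gamma(i):                                           # classical location of lambda_i
    return x[np.searchsorted(mass_from_edge, i / M)]

for i in (1, 2, 5, 20):
    print(i, abs(evals[i - 1] - gamma(i)), N ** (-2.0 / 3.0))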

Appendix IV: Complete the proof of Theorem 3.3.

We introduce the notation of functional calculus. Specifically, for a function f( ) and a matrix H, f(H) denotes the matrix whose eigenvectors are those of H and eigenvalues are· the values of f applied to each eigenvalue of H. First, we present a lemma for the approximation of the eigenvalue counting function. For any η> 0, define η 1 1 ϑ (x)= = Im . η π(x2 + η2) π x ıη − We notice that for any a,b R with a b, the convolution of 1 and ϑ applied to the ∈ ≤ [a,b] η eigenvalues λi, i =1,...,N yields that

N N b 1 ϑ (λ )= Im m (x + ıη)dx. [a,b] ∗ η i π N i=1 a X Z


In terms of the functional calculus notation, we have

N N 1 (λ ) = Tr1 (W ), 1 ϑ (λ ) = Tr1 ϑ (W ). [a,b] i [a,b] [a,b] ∗ η i [a,b] ∗ η i=1 i=1 X X R b For a,b , , define (a,b) = N a ̺N (dx) as the number of eigenvalues of W in [a,b]. ∈ ∪ {−∞ ∞} N R The following lemma shows that Tr1[a,b](W ) can be well approximated by its smoothed version Tr1[a,b] ϑη(W ) for a,b around the edge λ+ so that the problem can be converted to comparison of the Stieltjes∗ transform. 2/3+ε 2/3 3ε Lemma IV.1. Let ε> 0 be an arbitrarily small number. Set Eε = λ+ +N − , ℓ1 = N − − 2/3 9ε 3 2/3+ε and η1 = N − − . Then for any E satisfying E λ+ 2 N − , it holds with high probability that | − |≤ 2ε Tr1 (W ) Tr1 ϑ (W ) C(N − + (E ℓ , E + ℓ )). | [E,Eε] − [E,Eε] ∗ η |≤ N − 1 1 Proof. See Lemma 4.1 of [37] or Lemma 6.1 of [22]. Let q : R R be a smooth cutoff function such that → + 1 if x 1/9, q(x)= | |≤ 0 if x 2/9, ( | |≥ so q(x) is decreasing for x 0. Then we have the following corollary. ≥ 2ε 2/3 ε Corollary IV.2. Let ε,ℓ1, η1, Eε be defined in Lemma IV.1. Set ℓ = ℓ1N /2 = N − − /2. Then for all E such that 2/3+ε E λ N − , (IV.1) | − +|≤ the inequality

ε ε Tr1[E+ℓ,Eε] ϑη1 (W ) N − (E, ) Tr1[E ℓ,Eε] ϑη1 (W )+ N − (IV.2) ∗ − ≤ N ∞ ≤ − ∗ holds with high probability. Furthermore, for any D > 0, there exists N0 N independent of E such that for all N N , ∈ ≥ 0 E q Tr1[E ℓ,Eε] ϑη1 (W ) − ∗ n o D P( (E, )=0) Eq Tr1 ϑ (W ) + N − . (IV.3) ≤ N ∞ ≤ [E+ℓ,Eε] ∗ η1 n o 2/3+ε Proof. Notice that for E satisfying E λ+ N − , we have E ℓ λ+ E λ+ + ℓ 3 2/3+ε | − |≤ | − − | ≤ | − | ≤ N − . Therefore, Lemma IV.1 holds with E replaced by any x [E ℓ, E]. By the mean 2 ∈ −

value theorem, we obtain that with high probability,

E 1 Tr1[E,Eε](W ) ℓ− Tr1[x,Eε](W )dx ≤ E ℓ Z − E 1 ℓ− Tr1[x,Eε] ϑη1 (W )dx ≤ E ℓ ∗ Z − E 1 2ε +Cℓ− N − + (x ℓ1, x + ℓ1) dx E ℓ{ N − } Z − Tr1[E ℓ,Eε] ϑη1 (W )dx ≤ − ∗ 2ε ℓ1 +CN − + C (E 2ℓ, E + ℓ), ℓ N − where the last inequality follows from the fact that each eigenvalue in [E 2ℓ, E+ℓ] will contribute E − to the integral E ℓ (x ℓ1, x + ℓ1)dx at most 2ℓ1 mass, since the length of the interval − N − 2ε 2/3 [x ℓ1, x+ℓ1] isR 2ℓ1 for each x [E ℓ, E]. From Theorem 3.2, (IV.1), ℓ1/ℓ =2N − , ℓ N − and− the square root behavior of∈ ̺ (see− e.g. Lemma 2.1 of [9]), we get that ≤

ℓ1 1 2ε 1 (E 2ℓ, E + ℓ) =2N − n(E 2ℓ, E + ℓ)+ O (N − ) ℓ N − { − ≺ } E+ℓ 1 2ε 1 =2N − ̺(dx)+ O (N − ) { E 2ℓ ≺ } Z − (E+ℓ) λ+ 1 2ε ∧ 1 2N − O(1) λ+ x̺(dx)+ O (N − ) ≤ { E 2ℓ − ≺ } Z − 5ε/2 2ε p CN − + O (N − ). ≤ ≺ We have thus proved that

ε (E, Eε) = Tr1[E,Eε](W ) Tr1[E ℓ,Eε] ϑη1 (W )+ N − N ≤ − ∗ holds with high probability. Using Theorem 3.2, it follows that we can replace (E, Eε) by (E, ) with a loss of D N N ∞ probability of at most N − for any large D > 0. This proves the upper bound of (IV.2). The lower bound of (IV.2) can be shown analogously.

When (IV.2) holds, the event (E, ) = 0 implies that Tr1[E+ℓ,Eε] ϑη1 (W ) 1/9. Thus we have N ∞ ∗ ≤ D P( (E, )=0) P(Tr1 ϑ (W ) 1/9)+ N − , N ∞ ≤ [E+ℓ,Eε] ∗ η1 ≤ which together with Markov’s inequality proves the upper bound of (IV.3). For the lower bound, by using the upper bound of (IV.2) and the fact that (E, ) is an integer, we see that N ∞ E P q Tr1[E ℓ,Eε] ϑη1 (W ) Tr1[E ℓ,Eε] ϑη1 (W ) 2/9 − ∗ ≤ − ∗ ≤ ε P (E, ) 2/9+ N − = P (E, )=0 .  ≤ N ∞ ≤  N ∞ This completes the proof of Corollary IV.2.  
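A small numerical sketch (toy sizes) of the smoothing identity used above: the smoothed count $\operatorname{Tr}\mathbf{1}_{[a,b]}*\vartheta_{\eta}(W)$ equals $\pi^{-1}\int_{a}^{b}\operatorname{Im}\operatorname{Tr}(W-x-\mathrm{i}\eta)^{-1}\,dx$, and when $\eta$ is much smaller than the local eigenvalue spacing it is close to the exact counting function.

import numpy as np

# Toy check of the smoothed counting function: the convolution of the
# eigenvalue counting function with theta_eta equals
# (1/pi) * int_a^b Im Tr (W - x - i*eta)^{-1} dx, which for small eta is
# close to the exact number of eigenvalues in [a, b].
rng = np.random.default_rng(5)
M, N = 100, 200
X = rng.standard_normal((M, N)) / np.sqrt(N)
evals = np.linalg.eigvalsh(X @ X.T)

a, b, eta = 1.0, 2.0, 1e-3
exact_count = int(np.sum((evals >= a) & (evals <= b)))

xs = np.linspace(a, b, 40001)
im_tr = np.sum(eta / ((evals[:, None] - xs[None, :]) ** 2 + eta ** 2), axis=0)
smoothed_count = np.sum(im_tr) * (xs[1] - xs[0]) / np.pi
print(exact_count, smoothed_count)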


2/3 Proof of Theorem 3.3. Let ε> 0 be an arbitrary small number. Let E = λ+ + sN − for some ε 2/3+ε 2/3 ε 2/3 9ε s N . Define Eε = λ+ + N − , ℓ = N − − /2 and η1 = N − − . Define W˜ , ˜ to be the| | ≤ analogs of W , but with X˜ in place of X. N Using CorollaryNIV.2, we have E ˜ P ˜ q Tr1[E ℓ,Eε] ϑη1 (W ) (E, )=0 . (IV.4) − ∗ ≤ N ∞ Recall that by definition  

N Eε Tr1[E ℓ,Eε] ϑη1 (W )= Im mN (x + ıη1)dx. − ∗ π E ℓ Z −

Theorem 6.1 applied to the case where E1 = E ℓ and E2 = Eε shows that there exists δ > 0 such that − E E ˜ δ q Tr1[E ℓ,Eε] ϑη1 (W ) q Tr1[E ℓ,Eε] ϑη1 (W ) + N − . (IV.5) − ∗ ≤ − ∗ Then applying Corollary IV.2 to the left-hand side of (IV.5), we have for arbitrarily large D> 0 P E D (E 2ℓ, )=0 q Tr1[E ℓ,Eε] ϑη1 (W ) + N − (IV.6) N − ∞ ≤ − ∗ as N is sufficiently large.   Using the bounds (IV.4), (IV.5) and (IV.6), we get that

δ P (E 2ℓ, )=0 P ˜ (E, )=0 +2N − N − ∞ ≤ N ∞   2/3 for sufficiently small ε > 0 and sufficiently large N. Recall that E = λ+ + sN − . The proof of the first inequality of Theorem 3.3 is thus complete. By switching the roles of X and X˜, the second inequality follows. The proof is done.

Appendix V: Some useful results.

Lemma V.1 (Sherman–Morrison formula). Let $A$ be an invertible matrix and $x$ be a column vector such that $xx^{*}$ has the same size as $A$. Then
\[
(A+xx^{*})^{-1}=A^{-1}-\frac{A^{-1}xx^{*}A^{-1}}{1+x^{*}A^{-1}x}. \tag{V.1}
\]

Proof. Multiplying A + xx∗ on both sides of (V.1), we get the identity I = I.
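A one-line numerical confirmation of (V.1) on randomly generated data (the sizes and the data are arbitrary):

import numpy as np

# Numerical confirmation of the Sherman-Morrison formula (V.1).
rng = np.random.default_rng(6)
n = 6
A = rng.standard_normal((n, n)) + n * np.eye(n)    # well conditioned, invertible
x = rng.standard_normal((n, 1))

Ainv = np.linalg.inv(A)
lhs = np.linalg.inv(A + x @ x.T)
rhs = Ainv - (Ainv @ x @ x.T @ Ainv) / (1.0 + (x.T @ Ainv @ x).item())
print(np.allclose(lhs, rhs))                       # True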

Theorem V.1 (Moments of the uniform spherical distribution). Let $u=(u_{1},\ldots,u_{M})'$ be an $M$-dimensional random vector with the uniform distribution on the unit sphere. Let $n\in\{1,\ldots,M\}$, $i_{1},\ldots,i_{n}\in\{1,\ldots,M\}$ and $k_{1},\ldots,k_{n}$ be positive integers. Defining $k_{0}=k_{1}+\cdots+k_{n}$, we have
\[
\mathbb{E}\big|u_{i_{1}}^{k_{1}}\cdots u_{i_{n}}^{k_{n}}\big|=\frac{\Gamma(\tfrac{M}{2})\prod_{i=1}^{n}\Gamma(\tfrac{k_{i}+1}{2})}{\pi^{n/2}\,\Gamma(\tfrac{k_{0}+M}{2})}\le C_{k_{0},n}M^{-k_{0}/2},
\]
where $C_{k_{0},n}>0$ is a constant depending on $k_{0}$ and $n$ only. Moreover, if $k_{j}$ is odd for some $j\in\{1,\ldots,n\}$, then $\mathbb{E}(u_{i_{1}}^{k_{1}}\cdots u_{i_{n}}^{k_{n}})=0$.


Proof of Theorem V.1. Write u = z/ z for some NM (0, I) random vector z. Then z follows a half with scale parameterk k M and u is independent with z (seek e.g.k Page 37 of [34]). So k k

2k0/2 n Γ( ki+1 ) E( z k0 uk1 ukn )= E zk1 zkn = i=1 2 . k k | i1 ··· in | | i1 ··· in | πn/2 Q We see that 2k0/2Γ( k0+M ) E k0 2 ( z ) = M k k Γ( 2 )

k0/2 k0/2 k0+M 2 i=1 ( 2 i) if k0 is even, = − M+1 k /2 (k0 1)/2 k0+M Γ( 2 ) 0 − (2 Qi=1 ( 2 i) M if k0 is odd, − Γ( 2 ) k0/2 M Q if k0 is even,

1 k0/2 ≥ ( 2 M if k0 is odd, q where for odd k0, we have used Wendel’s inequality (see e.g. [38]) that Γ(x + s) x 1 s xs − x> 0, 0

k1 kj kn k1 kj kn k1 kj kn E(u u u )= E u ( ui ) u = E u ( u ) u =0. i1 ··· ij ··· in { i1 ··· − j ··· in } { i1 ··· − ij ··· in }
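A Monte Carlo check of the moment formula in Theorem V.1 (the dimension $M$ and the mixed moment $\mathbb{E}u_{1}^{2}u_{2}^{4}$ below are arbitrary illustrative choices):

import math
import numpy as np

# Monte Carlo check of the moment formula in Theorem V.1 for u uniform on
# the unit sphere S^{M-1}.
rng = np.random.default_rng(7)
M, reps = 10, 200000
z = rng.standard_normal((reps, M))
u = z / np.linalg.norm(z, axis=1, keepdims=True)

k = (2, 4)                                         # k_1 = 2, k_2 = 4, k_0 = 6
empirical = np.mean(u[:, 0] ** k[0] * u[:, 1] ** k[1])
k0, n = sum(k), len(k)
exact = (math.gamma(M / 2) * math.prod(math.gamma((ki + 1) / 2) for ki in k)
         / (math.pi ** (n / 2) * math.gamma((k0 + M) / 2)))
print(empirical, exact)                            # both approx 3 / (M(M+2)(M+4))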

Lemma V.2. Under Conditions 2.6 and 2.8, there exists a constant C > 0 such that XX∗ C with high probability. k k≤ 1/2 2 1/2 Proof. Recall that XX∗ =Σ UD U ∗Σ . We observe that

2 2 1 1 1 1/2 max ξi = max(ξi MN − )+ MN − = MN − + O (N − ). (V.2) i i − ≺

From (V.2), the assumption that Σ is bounded and the inequality XX∗ Σ UU ∗ 2 k k k k≤k kk k D , we see that it suffices to show that there exists a constant C > 0 such that UU ∗ C with k k k k≤ 2 high probability. Write the i, j-th entry of U as Uij . We see that E√MUij = 0, E(√MUij ) =1 k and E(√MUij ) < for all k 3. The desired result then follows from the arguments in [44] ∞ 1 ≥ 2 applied to the matrix MN − UU ∗ which shows that for any k 1, 2, and c> (1+ M/N) , ETr(MN 1UU )k ck for all large N. Indeed, the matrix U ∈{violates···} the assumption in [44] that − ∗ p all entries of U are≤ mutually independent. However, this will not invalidate the proof because 1 k k the strategy of [44] is to bound the probability P(Tr(MN − UU ∗) > c ), for which the only inputs needed are the bounds on

EU U U U . | i1j1 i2j1 ··· ik jk i1jk | We note that the only consequence that the violation of independence leads to is that the expectation of products of the U entires from the same column do not factor into products of

expectation. However, this is not a problem, since the expectation of a product of dependent $U$ entries can be bounded by the product of the individual expectations. To be specific, we consider, without loss of generality, the $U$ entries from the first column. Let $m\le M$ be a positive integer, $i_{1},\ldots,i_{m}\in\{1,\ldots,M\}$ be $m$ distinct integers and $a_{1},\ldots,a_{m}$ be $m$ positive integers. Denote $a_{0}=a_{1}+\cdots+a_{m}$. Then we claim that
\[
\mathbb{E}\big(U_{i_{1}1}^{a_{1}}\cdots U_{i_{m}1}^{a_{m}}\big)\le\mathbb{E}\big(U_{i_{1}1}^{a_{1}}\big)\cdots\mathbb{E}\big(U_{i_{m}1}^{a_{m}}\big). \tag{V.3}
\]
If one of $a_{1},\ldots,a_{m}$ is odd, (V.3) is true because, by symmetry of the $U$ entries, both $\mathbb{E}(U_{i_{1}1}^{a_{1}}\cdots U_{i_{m}1}^{a_{m}})$ and $\mathbb{E}(U_{i_{1}1}^{a_{1}})\cdots\mathbb{E}(U_{i_{m}1}^{a_{m}})$ are $0$. Hence from now on, we assume that all of $a_{1},\ldots,a_{m}$ are even. Let $z\equiv(z_{1},\ldots,z_{M})^{*}$ be a real-valued $M$-dimensional standard normal random vector. Then $z/\|z\|\sim U(S^{M-1})$ and $z/\|z\|$ is independent of $\|z\|$. It thus follows that
\[
\mathbb{E}\big(U_{i_{1}1}^{a_{1}}\cdots U_{i_{m}1}^{a_{m}}\big)\,\mathbb{E}\|z\|^{a_{0}}=\mathbb{E}\big(z_{i_{1}}^{a_{1}}\big)\cdots\mathbb{E}\big(z_{i_{m}}^{a_{m}}\big).
\]
Therefore,

a a E 1 m E a1 E am E a1 E am (Ui11 Uim1) ( z ) ( z ) ( z ) ( z ) a ··· a = k k ··· k k k k ··· k k E(U 1 ) E(U m ) E( z a0 ) ≤ E( z a1 )E( z a0 a1 ) i11 ··· im1 k k k k k k − E( z a1 ) E( z am ) E( z a1 ) E( z am ) k k ··· k k k k ··· k k =1, ≤ E( z a1 )(E z a2 )E( z a0 a1 a2 ) ≤···≤ E( z a1 ) E( z am ) k k k k k k − − k k ··· k k k a a0 P aj where the first to the last inequalities follow from the fact that z k and z − j=1 are positive correlated for all k = 1,...,m 1. Now the proof of thek claimk is completek k and this lemma is shown. −

References

[1] Anderson, T. and Fang, K. (1990). Statistical inference in elliptically contoured and related distributions. Technical report, Stanford University, Stanford, California. https://apps.dtic.mil/dtic/tr/fulltext/u2/a230672.pdf.
[2] Bai, J. and Ng, S. (2002). Determining the number of factors in approximate factor models. Econometrica, 70(1):191–221.
[3] Bai, Z. D. (1999). Methodologies in spectral analysis of large dimensional random matrices, a review. Statist. Sinica, 9:611–677.
[4] Bai, Z. D. and Silverstein, J. W. (1998). No eigenvalues outside the support of the limiting spectral distribution of large dimensional sample covariance matrices. Ann. Probab., 26:316–345.
[5] Bai, Z. D. and Silverstein, J. W. (2010). Spectral Analysis of Large Dimensional Random Matrices. Springer, 2nd edition.
[6] Bai, Z. D. and Zhou, W. (2008). Large sample covariance matrices without independence structures in columns. Statistica Sinica, 18(2):425–442.
[7] Baik, J., Ben Arous, G., and Péché, S. (2005). Phase transition of the largest eigenvalue for nonnull complex sample covariance matrices. Ann. Probab., 33(5):1643–1697.


[8] Bao, Z., Pan, G., and Zhou, W. (2015). Universality for the largest eigenvalue of sample covariance matrices with general population. Ann. Statist., 43(1):382–421.
[9] Bao, Z. G., Pan, G. M., and Zhou, W. (2013). Local density of the spectrum on the edge for sample covariance matrices with general population. Ann. Statist., 2015, 43(1):382–421.
[10] Benaych-Georges, F. and Knowles, A. (2016). Lectures on the local semicircle law for Wigner matrices. arXiv e-prints, arXiv:1601.04055.
[11] Burkholder, D. L. (1973). Distribution function inequalities for martingales. Ann. Probab., 1(1):19–42.
[12] Cai, T., Ma, Z., and Wu, Y. (2015). Optimal estimation and rank detection for sparse spiked covariance matrices. Probab. Theory Related Fields, 161(3):781–815.
[13] Ding, X. (2017). Asymptotics of empirical eigen-structure for high dimensional sample covariance matrices of general form. arXiv e-prints, arXiv:1708.06296.
[14] Ding, X. and Yang, F. (2018). A necessary and sufficient condition for edge universality at the largest singular values of covariance matrices. Ann. Appl. Probab., 28(3):1679–1738.
[15] Dobriban, E. and Liu, S. (2018). A new theory for sketching in linear regression. arXiv e-prints, arXiv:1810.06089.
[16] Dobriban, E. and Sheng, Y. (2018). Distributed linear regression by averaging. arXiv e-prints, arXiv:1810.00412.
[17] El Karoui, N. (2007). Tracy–Widom limit for the largest eigenvalue of a large class of complex sample covariance matrices. Ann. Probab., 35(2):663–714.
[18] El Karoui, N. (2009). Concentration of measure and spectra of random matrices: Applications to correlation matrices, elliptical distributions and beyond. Ann. Appl. Probab., 19(6):2362–2405.
[19] Erdős, L. (2011). Universality of Wigner random matrices: a survey of recent results. Russian Math. Surveys, 66(3):507.
[20] Erdős, L., Knowles, A., and Yau, H.-T. (2013). Averaging fluctuations in resolvents of random band matrices. Ann. Henri Poincaré, 14(8):1837–1926.
[21] Erdős, L., Yau, H.-T., and Yin, J. (2012). Bulk universality for generalized Wigner matrices. Probab. Theory Related Fields, 154(1):341–407.
[22] Erdős, L., Yau, H.-T., and Yin, J. (2012). Rigidity of eigenvalues of generalized Wigner matrices. Advances in Mathematics, 229(3):1435–1515.
[23] Féral, D. and Péché, S. (2009). The largest eigenvalues of sample covariance matrices for a spiked population: Diagonal case. J. Math. Phys., 50(7):073302.
[24] Hu, J., Li, W., Liu, Z., and Zhou, W. (2019). High-dimensional covariance matrices in elliptical distributions with application to spherical test. Ann. Statist., 47(1):527–555.
[25] Hu, J., Li, W., and Zhou, W. (2019). Central limit theorem for mutual information of large MIMO systems with elliptically correlated channels. IEEE Transactions on Information Theory, 65(11):7168–7180.
[26] Jing, B., Pan, G., Shao, Q., and Zhou, W. (2010). Nonparametric estimation of spectral density functions of sample covariance matrices: a first step. Ann. Statist., 38:3724–3750.
[27] Johnstone, I. M. (2001). On the distribution of the largest eigenvalue in principal components analysis. Ann. Statist., 29:295–327.
[28] Johnstone, I. M. (2008). Multivariate analysis and Jacobi ensembles: largest eigenvalue, Tracy–Widom limits and rates of convergence. Ann. Statist., 36(6):2638–2716.
[29] Johnstone, I. M. and Nadler, B. (2017). Roy's largest root test under rank-one alternatives. Biometrika, 104(1):181–193.


[30] Knowles, A. and Yin, J. (2017). Anisotropic local laws for random matrices. Probab. Theory Related Fields, 169(1):257–352.
[31] Lee, J. O. and Schnelli, K. (2016). Tracy–Widom distribution for the largest eigenvalue of real sample covariance matrices with general population. Ann. Appl. Probab., 26(6):3786–3839.
[32] Lee, J. O. and Yin, J. (2014). A necessary and sufficient condition for edge universality of Wigner matrices. Duke Math. J., 163:117–173. MR3161313.
[33] Li, W. and Yao, J. (2018). On structure testing for component covariance matrices of a high dimensional mixture. J. R. Statist. Soc. B, 80(2):293–318.
[34] Muirhead, R. J. (2005). Aspects of Multivariate Statistical Theory. Wiley.
[35] Onatski, A. (2008). The Tracy–Widom limit for the largest eigenvalues of singular complex Wishart matrices. Ann. Appl. Probab., 18(2):470–490.
[36] Onatski, A. (2009). Testing hypotheses about the number of factors in large factor models. Econometrica, 77(5):1447–1479.
[37] Pillai, N. S. and Yin, J. (2014). Universality of covariance matrices. Ann. Appl. Probab., 24(3):935–1001.
[38] Qi, F. and Luo, Q.-M. (2013). Bounds for the ratio of two gamma functions: from Wendel's asymptotic relation to Elezović–Giordano–Pečarić's theorem. J. Inequal. Appl., 2013(1):542.
[39] Silverstein, J. (2009). The Stieltjes transform and its role in eigenvalue behavior of large dimensional random matrices. In Random Matrix Theory and Its Applications (Z. D. Bai, Y. Chen and Y.-C. Yang, eds.), 1–25. Lecture Notes Series, Institute for Mathematical Sciences, National University of Singapore.
[40] Silverstein, J. W. and Bai, Z. D. (1995). On the empirical distribution of eigenvalues of a class of large dimensional random matrices. J. Multivariate Anal., 54:175–192.
[41] Soshnikov, A. (2002). A note on universality of the distribution of the largest eigenvalues in certain sample covariance matrices. J. Statist. Phys., 108(5):1033–1056.
[42] Tracy, C. A. and Widom, H. (1994). Level-spacing distributions and the Airy kernel. Comm. Math. Phys., 159(1):151–174.
[43] Yang, X., Zheng, X., Chen, J., and Li, H. (2020). Testing high-dimensional covariance matrices under the elliptical distribution and beyond. Journal of Econometrics.
[44] Yin, Y. Q., Bai, Z. D., and Krishnaiah, P. R. (1988). On the limit of the largest eigenvalue of the large dimensional sample covariance matrix. Probab. Theory Related Fields, 78(4):509–521.
