Topics IN LINEAR SPECTRAL STATISTICS OF RANDOM MATRICES

by

Asad Lodhia

B. S. Mathematics; B. S. Physics University of California, Davis, 2012

SUBMITTED TO THE DEPARTMENT OF MATHEMATICS IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY IN MATHEMATICS AT THE MASSACHUSETTS INSTITUTE OF TECHNOLOGY

JUNE 2017

2017 Asad I. Lodhia. All rights reserved.

The author hereby grants to MIT permission to reproduce and to distribute publicly paper and electronic copies of this thesis document in whole or in part in any medium now known or hereafter created.

Signature of the author:...... Signature redacted Department of Mathematics Signature reda cted April 28, 2017 Certified by: ...... Alice Guionnet Professor of Mathematics Thesis Supervisor Accepted by:. Signature red a cte d ...... e!:5 John A. Kelner Professor of Applied Mathematics Graduate Chair MASSACHUSMTS1NgTiffOTE OF TEQHNOLOPY

AUG 0 1-2017 LIBRARIES 'ARCHIVES Topics IN LINEAR SPECTRAL STATISTICS OF RANDOM MATRICES

by

Asad Lodhia

Submitted to the Department of Mathematics on May 3, 2017 in partial fulfillment of the requirements for the Degree of Doctor of Philosophy in Mathematics

Abstract:

The behavior of the spectrum of a large random matrix is a topic of great interest in probability theory and statistics. At a global level, the limiting spectra of certain random matrix models have been known for some time. For example, the limiting spectral measure of a Wigner matrix is a semicircle law and the limiting spectral measure of a sample covariance matrix under certain conditions is a Marienko-Pastur law.

The local behavior of eigenvalues for specific random matrix ensembles (GUE and GOE) have been known for some time as well and until recently, were conjectured to be universal. There have been many recents breakthroughs in the universality of this local behavior of eigenvalues for Wigner Matrices. Furthermore, these universality results laws have been proven for other probabilistic models of particle systems, such as Beta Ensembles. In this thesis we investigate the fluctuations of linear statistics of eigenvalues of Wigner Matrices and Beta Ensembles in regimes intermediate to the global regime and the mi- crosocopic regime (called the mesoscopic regime). We verify that these fluctuations are Gaussian and derive the covariance for a range of test functions and scales. On a separate line of investigation, we study the global spectral behavior of a random matrix arising in statistics, called Kendall's Tau and verify that it satisfies an analogue of the Mareenko-Pastur Law.

Thesis Supervisor: Alice Guionnet Title: Professor of Mathematics

2 INTRODUCTION

0.1 Limit Theorems in Probability

In classical probability theory, the Law of Large Numbers (LLN) predicts that if {X}N is a sequence of R-valued independent identically distributed random variables with dis- tribution P, then for any f E Cb(R) we have

N lim f(Xi) = E[f (X)] where X ~ P, (LLN) n-+ooN

where the convergence holds almost surely. Further, the Central Limit Theorem (CLT) implies

2 lim Z{ff(Xi) - E[f(Xi)]} = K(O, a'(f)) where a (f) := Var (f (X)), (CLT)

where the convergence holds in distribution. A key requirement in the LLN and the CLT is the independence of the sequence Xi along with the observation that each f(Xi) is bounded (and therefore has a mean and variance). If we conceive of the sequence {Xi}_ 1 as a collection of particles randomly scattered on the line, then the LLN can be interpreted as describing the limiting shape of the configuration they form. Further, the CLT tells us the typical deviation from this shape. If we were to go further and ask what the local behavior of these particle positions are, say, in the simpler setting where the Xi have a continuous distribution p(x) with compact support [a; b], then we obtain another interesting limit theorem from probability theory. Specifically, let E E [a; b] be a point such that p > 0 in a neighborhoud of E (this is called a point in the bulk of p) and let s > 0. We have that

# Xi : Xi E EE+ N } P (s), (LRE) IN p(E )_

that is, the number of points landing in an interval of the form [E, E + Np(E) converges to a Poisson random variable with parameter s. We refer to this as the Law of Rare Events (LRE). We may also ask what the behavior of the maximum (or minimum) of the sequence Xi is. The Borel-Cantelli lemma easily shows that almost surely, maxi 0, y > -1, # 0 and, s > 0 then

lim P maxXi - b < - S = e~_8'+, (LEV) N-+oo i

3 for some scaling constant B. The distribution on the right hand side above is called the Gumbel distribution; we refer to the above as the Law of Extreme Values (LEV). In the study of random matrix eigenvalues, there are analogues to the above Theorems that differ greatly from the classical setting. For the (LLN), there is the famous Wigner Semicircle Law which states [AGZ09] that if A, < < AN are the N real eigenvalues of a Wigner matrix (see Defition 0.1 below) then

I N 2 0.)2 -Y f(A2) -+ jf(x) dx 01 N f-A) 2 fW 27r almost surely for all bounded continuous functions f. The fluctuations of the above are also known N f (Ai) - E[f (Ai)] -=> A(o, .2(y)), and scale differently from the classic (CLT) which has a normalization of /1. Here o.2(f) has been computed explicitly [Joh98, BY05]. Furthermore, the local (bulk) behavior of the eigenvalues does not converge to a classical Poisson point process as in the (LRE). Rather for the case of GUE matrices, it converges to a determinantalpoint process [AGZ09, Lemma 3.5.6] with two-point correlation function given by the sine-kernel:

K(x, y) = I sin(x - y) 7r x - y this indicates that the dependency structure of the eigenvalues create a novel interaction. Finally, the maximal eigenvalue follows a behavior that is distinct from the usual (LEV). As it turns out the limiting behavior of the largest eigenvalue of the GUE follows the Tracy-Widom distribution [AGZ09, Theorem 3.1.4]

lim P(I N2 (AN - 2) < t}) = F2(t), N-+oo\ which is distinct from the Gumbel distribution (although the N3 scaling is predicted if we use the 'y = 21 value of the semicircle density) and is related to the solution to the Painleve equation (see [AGZ09, Theorem 3.1.5] for the formula for F2 (t)). One question that has been heavily investigated in recent years is the universality of these results for a broad class of random matrix models, see [Erdli] for a survey. In this line of recent investigation it was found that in the intermediate scale (between the local sine-kernel behavior and the global semicircle behavior) the semicircle law continues to hold for a broad class of Wigner matrices. Our first line of work in this thesis is to investigate the fluctuations about the semicircle law at this intermediate scale. These limit Theorems for random matrix eigenvalues have been observed for random processes arising in other contexts. For instance, the'length of the longest increasing subsequence of a random permutation follows the Tracy-Widom law, also, the limiting behaviour of certain statistics of random tilings follow the Tracy-Widom Law (see [BDS16] for a recent survey).

4 0.2 Summary of Part 1 In Part 1, we study linear spectral statistics of N x N Wigner random matrices W on mesoscopic scales. This particular section is based on joint work with N. J. Simm. The definition of our matrix model is:

Definition 0.1. A Wigner matrix is an N x N Hermitianrandom matrix W whose entries W.7 = Wji are centered, independent identically distributed complex random variables satisfying EWij 12 = 1 and EW4 = 0 for all i and j. We assume that the common 2 distribution [ of Wij satises the sub-Gaussian decay fc ecIz1 d p(z) < oc for some c > 0. This implies that the higher moments are finite, in fact we have EWij q < (Cq)q for some 11 2 0> 0. We denote by f - N W the normalized Wigner matrix. Finally, in case all the entries Wij are Gaussian distributed, the ensemble is known as the Gaussian Unitary Ensemble (GUE).

As stated above, a mesoscopic linear statistic is a functional of the form:

N XNeso(f) := f(dN(E - Aj)) j=1 and we define the centered version of the above as

XkeSf - Xmeso(f) - E[Xgeso(f)], (0.2) where f E Cla,/3(R) and Cl,',&(R) denotes the space of all functions with a-H6lder continuous first derivative such that f(x) and f'(x) decay faster than O(xI-' ) as Ixl -+ 00. Suppose that dN = NT where -y satisfies the condition 0 < y < 1/3 and consider test functions fi, . . . , fM E C1,',3(R) for some ac > 0 and 0 > 0. Then for a fixed E C (-2, 2) in (0.2) we prove the convergence in distribution

(myeso ... , XNmeso(fM)) => (X(f1 ),..., X(fM)) where (X(fi),. .. , X(fM)) is an M-dimensional Gaussian vector with zero mean and co- variance matrix

E(X(fp)X(fq)) j dk IkI fp(k)jq(k), 1 < p, q < M (0.3) and f(k) := (27r) 1 / 2 fR f(x) e-ikx dx.

5 0.3 Summary of Part 2. The results in this section are from a joint work with Florent Bekerman. In a similar vein to Part 1, we studied linear statistics of particles A, < A 2 < -- AN distributed according a 3-ensemble with potential V whose density is defined by:

exp ( - N 1 V(A2 )) H1

When V is continuous and satisfies

lim inf V(x)> 1 X_+0 # logJx >1 the empirical measure of the Ai, defined LN := - Z=1 J, converges almost surely towards a unique probability measure pv. For the sake of this paper, in addition to the above condition on V we made the following additional assumptions on V: " V C6 (R). " The support of pv is a connected interval [a, b] and

dp =: pv(x) = S(x) v(b - x)(x - a) with S > 0 on [a, b].

" The function V(.) - 3 f logI - -yJ dpv(y) achieves its minimum on the support only. With the above assumptions (the second and third are referred to as one-cut assump- tion and off-criticality respectively) we analyze the statistic MN(f) given in (1) where now f C (R) (compactly supported test functions with 5 derivatives) and 0 < a < 1. We prove that the limiting distribution of MN (f) is the same as in the Wigner setting above, that is MN (f) tends towards a gaussian with mean 0 and variance:

I If (f(x)-f(Y)) 2 dd 2w72 y ( ' which up to a scale factor is the same as the variance (0.3).

6 0.4 Summary of Part 3. The results in this section are from a joint work with Afonso Bandeira and Philippe Rigollet. This section differs from the above two in that it studies the limiting distribution of eigenvalues of a random matrix model rather than it's fluctuations about a limit, further the results were done at a macroscopic scale rather than the mesoscopic results in the above two cases. The model in the case consisted of a random matrix generated by a sequence of i.i.d random vectors Xi E RP with 1 < i < n. The distribution of each component of RP is unspecified, we only assume each Xi(j) is independent of any other Xi(k) and each Xi(j) has a distribution that is absolutely continuous with respect to the Lebesgue Measure. From this, we construct the matrix

T ( 1 E sign{Xi - Xj} 0 sign{Xi - Xj}, 2Isi

P LN(r )O i1 where A, are the ordered eigenvalues of T, we show under these assumptions that in the limit as n -÷ oc and lim := (0, 0), n-*oo n we have that LN converges weakly in probability to the distribution of the random variable

2-Y2 + 1- 3 3' where Y is distributed according to the standard Marenko-Pastur law with parameter y, whose density is

xfb - x)(x - a) if a < x < b, 0, otherwise, and has a point mass 1 - 1 at the origin if y > 1. Here a :=(1- 7) 2 and b:= (1+ VY) 2 . We prove that Kendall's Rank correlation matrix converges to the Mareenko Pastur law, under the assumption that observations are i.i.d random vectors X 1 , ... ,Xn with components that are independent and absolutely continuous with respect to the Lebesgue measure. This is the first result on the empirical spectral distribution of a multivariate U-statistic.

7 PART 1

MESOscoPic LINEAR SPECTRAL STATISTICS OF WIGNER MATRICES

Based on joint work with N. J. Simm

Acknowledgements: The authors express thanks to Alice Guionnet for suggesting the main techniques used in the paper. The author (A. L.) in particular expresses his grati- tude to Alice Guionnet, who provided helpful advice and support through the NSF grant 6927980 "Random Matrices, Free Probability and the enumeration of maps". The au- thor (N. J. Simm) expresses his gratitude to Yan Fyodorov, Anna Maltsev and J6r6mie Unterberger for stimulating discussions. N. J. Simm was supported on EPSRC grant EP/J002763/1 "Insights into DisorderedLandscapes via Random Matrix Theory and Sta- tistical Mechanics".

8 1.1 Overview As stated in the Introduction, the goal of this chapter is to study the limiting fluctuations as N -+ oo of the linear spectral statistic

N Xeso (f) Z f(dN(E - A)) (1.1) j=1 where A,.. ., AN are the eigenvalues of an N x N Wigner random matrix N as defined in Definition 0.1. Recall that the mesoscopic or intermediate scale is defined by the assump- tion that dN -+ oc as N -+ oo, but dN/N -+ 0 as N -+ oc. Therefore, if f is decaying suitably at oc, only a fraction N/dN of the total number of eigenvalues will contribute in the sum (1.1). In recent years, there has been growing interest in understanding the limiting distribu- tion of (1.1) on such mesoscopic scales. This interest has stemmed from, e.g., the appear- ance of novel stochastic processes in probability theory [FKS13], conductance fluctuations in disordered systems [EK14a, EK14b and linear statistics of the zeros of Riemann's zeta function [BK14], among others [BD14, BEYY14, DJ13, JL15]. Previously, the majority of studies concentrated exclusively on the macroscopic scale where dN = 1 and E = 0 in (1.1), denoted Xmacro(f). In this case it was proved for many different types of random matrix ensembles that, provided f has at least one derivative, the centered random variable

Xacro.(f) X macro(f) - EXmacro(f) (1.2)

converges in distribution to the normal law AF(0, a2 ) as N -+ oc. Furthermore, an explicit formula for the limiting variance o 2 was obtained, see e.g. [Joh98, BY05]. In analogy with classical probability, we refer to such results as central limit theorems (CLTs). Going to finer scales, the mesoscopic fluctuations of (1.1) are known to be highly sensitive when compared to the macroscopic scale; in fact the CLT must break down if dN grows too quickly [Pas06]. In particular if dN = N, only a finite number of terms contribute in the sum (1.1) and we cannot expect a Gaussian limit. The latter case dN = N is known as microscopic and will not be considered in this article, though see [ErdlI] for an extensive review. This ensemble was introduced by Wigner who proved that in the limit N -+ oc, the mean eigenvalue distribution of the normalized Wigner matrix N converges to the semi- circle law. A modern version of this result (e.g. Theorem 2.9 in [AGZ09]) states that this convergence holds weakly almost surely, i.e.

N 2 2 A f(x) /4 - x dx, N -+oo, almost surely. f( * -

Nj=127 2

for all bounded and continuous functions f. In order to state our main Theorem, we need a condition on the regularity and decay of the test functions f entering in (1.1). For a, # > 0, let C',ck, 3(R) denote the space of all functions with a-H6lder continuous first derivative such that f(x) and f'(x) decay faster than O(|x| 1'-0) as |x| -+ oc. Finally, recall the notation XesO(f) := Xmeso(f) - EXmeso(f).

9 Theorem 1.1. Let 'W be a normalized Wigner matrix as in Definition 0.1. Suppose that dN = NY where -y satises the condition 0 < -y < 1/3 and consider test functions fl, . . ., fm E C 1 '3 (R) for some a' > 0 and 3 > 0. Then for a fixed E E (-2,2) in (1.1) we have the convergence in distribution

(Zmeso(fl). X meso (fM)) === (X (fi),... , X(fM)) (1.3)

where (X(f1 ), . .. , X(fm)) is an M-dimensional Gaussian vector with zero mean and co- variance matrix

E(X (fp)X(fq)) = j dk IkI fp(k)f q(k), 1 < p, q < M (1.4) 2 7r _-00

and f(k) := (27r)- 1 / 2 fR f(x e-ikx dx.

This result improves and extends earlier work of Boutet de Monvel and Khorunzhy [BdMK99a] who proved Theorem 1.1 when 0 < y < 1/8, M = 1 and f(x) = (x - z)- (see also Theorem 1.4 below). Using very different methods, Erd6s and Knowles [EK14a, EK14b] proved an analogue of Theorem 1.1 for random band matrices at the same scale 1 < dN < N1 / 3 , which includes the Wigner matrices as a special case, but includes a heavier assumptions on the allowed test functions f. One year after the present work originally appeared on arXiv, He and Knowles [HK16] extended our Theorem 1.1 to all scales 1 < dN < N. Due to exploiting a similar Helffer-Sj6strand functional calculus approach to ours (crucially, in the present work this part of the calculation actually works on all scales 1 < dN < N), the regularity conditions in [HK16] remain identical to our Theorem 1.1. Apart from these works, CLTs for (1.1) were also obtained in several other ensembles [BdMK99b, SosO0, FKS13, DJ13, BD14, BEYY14]. Although these works extend to scales 0 < -y < 1, the proofs rely on exact formulas for the distribution of the eigenvalues, which are unavailable in the Wigner setting. Let us now make some general remarks about Theorem 1.1. On macroscopic scales E = 0 the results are different for Wigner matrices since the limiting covariance depends on the fourth moment of the matrix entries [BS04, Shc11]. On mesoscopic scales with fixed E E (-2, 2), we show that this difference vanishes, indicating a particularly strong form of universality for formula (1.4) (see also [KKP96]). As with the local regime, the limiting distribution of (1.1) is universal in the choice E E (-2,2) around which ones samples the eigenvalues. Apparently unique to the mesoscopic regime, however, is the scale invariance of the limiting Gaussian process: formula (1.4) is unchanged after rescaling the arguments of the test functions by any parameter (see also Section 1.3 ). Optimal conditions on the test functions given in Theorem 1.1 remains a significant issue ever since the seminal work of Johansson [Joh98]. The latter article suggests that in the macroscopic regime, only finiteness of the limiting variance should suffice to conclude asymptotic Gaussianity, see [SW13] for recent progress in this direction. In the mesoscopic regime we believe analogously that optimal conditions for asymptotic Gaussianity of (1.1) should be that fR Ikflf(k)1 2 dk < oo. It is historically interesting to remark that (1.4) already appeared in a famous 1963 paper of Dyson and Mehta [DM63].

10 From a probabilistic viewpoint, the semi-circle law (0.1) from the Introduction may be interpreted as the law of large numbers for the eigenvalues of a Wigner matrix; this may be considered the first natural step for the probabilist. The second natural step is to prove the CLT for the macroscopic fluctuations around (0.1). Going now to mesoscopic scales, only the first step has been investigated in detail for Wigner matrices, with the corresponding results known as local semi-circle laws, so called because they track the convergence to the semi-circle closer to the scale of individual eigenvalues. The local semi-circle laws turned out to be a very important tool in proving the long-standing universality conjectures for eigenvalue statistics at the microscopic scale [Meh04, Erd11, TV11]. Consequently, a number of refinements of Wigner's semi-circle law of increasing optimality were obtained in recent years [ESY09a, EYY12b, EYY12a, EKYY13]. Such results will play a crucial role in our proof of the CLT at mesoscopic scales. In order to state the local semi-circle law, it is convenient to work with the resolvent G(z) = (7 - z)-, Q (z) > 0. Then according to (0.1), the Stieltjes transform of R

1 SN(Z) := JTrG(z) N should be close to the Stieltjes transform of the semi-circle:

) := (x--z)-/4 -x 2 dx

The local semi-circle law shows that this convergence remains valid at mesoscopic scales !(z) = O(dN7) for 1 < dN < N. The following is a version (in our notation) of this result that we will use. Theorem 1.2 (Cacciapuoti, Maltsev, Schlein, 2014). [CMS14, Theorem 1 (i)] Fix i > 0 and let z = t+i with t c R and r > 0 fixed. Then there are constants Mo, No, C, c, co > 0 such that P sN (E + z/dN) - s(E + z/dN) I> KdN) (Cq)cq 2K- ( N77 for all j7 < i, IE +t/dNI 2 +rjdN, N > No such that ;> MO, and q < co () 1/8

To prove Theorem 1.1, we will start by proving it for the special case of the resol- vent f(x) = (x - z)- 1 . Indeed, one can interpret sN(E + z/dN) as a random process on the upper-half plane H and ask whether, after appropriate centering and normalization, a universal limiting process exists. We will show that the function (N/dN)(sN(E + z/dN) - EsN (E + z/dN)) converges to the I'+ -processes. These are certain analytic-pathed Gaus- sian processes defined on H. In order to define these processes, we first review a well-known Gaussian process-the Fractional Brownian motion. Fractional Brownian motion is a continuous time Gaussian process BH(t) indexed by a number H E (0, 1) and having covariance

2 E(BH(t)BH(s)) =cH (ItI H + _I2H 2H) (1.5)

11 where CH is a normalization constant. A generalization of the usual Brownian motion (H = 1/2), these processes are characterized by their fundamental properties of stationary increments, scale invariance (i.e. BH(at) = aHBH(t)) and Gaussianity. The parameter H is known as the Hurst index and describes the raggedness of the resulting stochastic motion, with the limit of vanishingly small H to be considered the most irregular (see e.g. Proposition 2.5 in [DOT03]). Although the fBm processes were invented by Kol- mogorov, they were very widely popularized due to a famous work of Mandelbrot and van Ness [MvN68] and since have appeared prominently across mathematics, engineering and finance, among other fields, see [LSSW14] for a survey of fractional Gaussian fields. Until recently however, no relation between the fBm processes and random matrix theory was known. The latter relation (discovered in [FKS13]) goes via the limit H -+ 0 and the following regularization

B77)(t) := s1/2+H ([eits - ]Bc(ds) + [e'ts - 1]Bc(ds) (1.6)

where Bc(s) := Bi(s) + iB2 (s) and B1 , B2 are independent copies of standard Brownian motion. One can verify that in the limit 7 -+ 0, one recovers precisely the fractional Brownian motion, i.e. B(0)(t) = BH(t). On the other hand, taking instead the limit H -+ 0 in (1.6), one obtains a process B 7)(t), about which the following was proved: Theorem 1.3 ([FKS13] Fyodorov, Khoruzhenko and Simm). For a GUE random matrix 7 GUE, consider the sequence of stochastic processes

W 7)((t) :log det NGUE - E -T -log det (GUE - E (1.7)

and Wjj'(t) := W()(t) - E(W77i7(t)). On any mesoscopic scales of the form dN -+ 00

with dN= o(N/ log(N)) and with fixed r E R, 7 > 0 and E E (-2, 2), the process WN7

2 converges weakly in L [a, b] to B(0 ) as N -+ oo. In particular, this gives a functional (in L 2 ) version of the CLT of Theorem 1.1 for GUE random matrices with fk(x) = log Ix - Tk - iqI - log Ix - i 7 . Either by computing the resulting H1/ 2 norm (1.4) or by computing the covariance of B (' as defined in (1.6) one finds the logarithmic correlations

E((B 7)(t) - B )(s))2 'log (t s)2 + 1(1.8)

Thus B "O inherits many of the fundamental properties of fBm, including Gaussianity, stationary increments, although now one has the 'regularized self-similarity' B(a,) (at) B 4)(t) (the latter following from the scale invariance of the inner product (1.4). More generally, Gaussian fields with logarithmic correlations have received a great deal of recent attention across mathematics and physics, see [FLDR12, FK14] and references therein. The

12 most famous example of such a field is undoubtedly the 2D Gaussian Free Field (GFF) [She07], which has important applications in areas such as quantum gravity [DRSV14], Gaussian multiplicative chaos and Stochastic Loewner Evolution [SS13]. The GFF is also believed to play a central role in random matrix theory. For example, similarly to (1.7), it has appeared in relation to the characteristic polynomial, either explicitly [RV07, AHM11] or in what appear to be its various one-dimensional slices [HKOO1, FKS13, Webl4]. More recently it has appeared as the height function for the minor processes of random matrices [BG, Bor14]. Going now to the Stieltjes transform SN(Z) of Theorem 1.2, the trivial relation

N (9 ( N R{SN(T + i1)} = aW( dN 0 N

suggests that the appropriate limiting object should be related to the derivative of B (). Although such a derivative could obviously be represented by differentiating inside the Fourier integral in (1.6), it can also be conveniently represented by a random series. More generally, for z E H and Hurst index H < 1, define the following 'Cayley' series

I (Z+i 2H-2 oo (2-_2H k) k- 1 }'l)~z) :=(z )((+U) (1.9) H V/2 2 r(2 - 2H)k! Z + Zi k=0

2) are a family whereof real i.i.d.{ {1), standard } Gaussians. A quick computation with the series (1.9) shows that it has zero mean and covariance structure

E(F'(zi)F'(z 2 )) 1 (1.10) (i(z1 - z -2))2-2H

with E(F'+(zi)F'(z 2 )) = 0. It follows that the processes F'+ are stationary on horizontal line segments of the complex plane (c.f. the stationary increments (1.8) for the integrated version). The F'+-processes were originally introduced by Unterberger [Unt09] in the context of geometric rough path theory and stochastic partial differential equations, but since then the relation to random matrix theory has apparently gone unnoticed. We will show that F'+ (z) is directly related to a fundamental object of random matrix theory: the normalized trace of the resolvent.

Theorem 1.4. Consider the resolvent G(z) = (' - z) 1 . Under the same assumptions as Theorem 1.1, the centered and normalized trace

VN (z) := dN (Tr G(E + z/dN) - ETr G(E + z/dN)) , (z) > 0

converges in the sense of finite dimensional distributions to FI (z) as N -+ oc. That is, for any finite set of points z1 ,. . .,zM in the upper half-plane H, we have

(VN (z1), - -,VN(zM)) = 0 ('+(z1), .. , F(zM)), N -+ oo. (1.11)

13 Furthermore, the process VN is tight in the space U(D) of continuous functions defined on a bounded N-independent rectangle D C H and VN converges weakly to r'+ in U(D).

Proof. For the finite dimensional convergence in (1.11), see Section 1.2. The tightness condition in U(D) follows from Corollary 1.30. U Intuitively, the underlying reason for the covariance structure (1.10) (with H = 0) appearing in random matrix theory can be traced back to the fundamental relation with the sine-kernel

lim lim E(VN(tl + im1)VN(t2 + 2)) -sin(w(t -t 2 )) 2 71,72-+0 N-+oo dN=N 7 (1- t 2 ) where (heuristically) going to slightly larger scales dN = NT with 0 < y < 1 has the effect of a large time separation Iti - t2I smoothening out the oscillations in the numerator, thus reproducing (1.10) with H = 0 (see e.g. [BZ93, Pas06] for additional heuristics). Theorem 1.3 can now be easily extended to Wigner matrices, starting with the identity

Wj )() = j (VN (t + in)) dt. (1.12)

Next, by the rigidity of Theorem 1.2, we have El VN (t+i) 12 bounded uniformly on compact subsets of t and 77 c [6, oo) for fixed 6 > 0 (see Proposition B.4). Then a standard tightness argument (see e.g. [Gri76]) combined with (1.11) allows us to conclude the convergence in distribution as N -+ oo,

R (VN(t + Z-7)) dt -+ (F/+(t + i7)) dt ~ B (7).

This implies that W77) - B") , though now in the Wigner case, subject to a more re- stricted growth of the parameter dN than in Theorem 1.3. In all cases considered here, optimal conditions on the growth of dN should be anything asymptotically slower than the microscopic scale, i.e. we expect our main results to hold provided only that dN = o(N). Our proof of Theorem 1.4 will follow closely the approach popularized by Bai and Silverstein [BS10a, BSo4]. The technique begins by exploiting the independence of the matrix entries of 'W to write Tr G(z) as a sum of martingale differences. Then a classical version of the martingale CLT implies that only 2 estimates are required in order to conclude asymptotic Gaussianity. For the macroscopic regime, this technique was applied successfully to conclude CLTs for many random matrix ensembles, though not without significant computations [BSO4, RS06, BWZ09, TV12, BGGM13, OR14]. The mesoscopic regime is characterized by the situation that Q (z) = O(dN') as N -+ oc, which is further problematic in that the majority of bounds for resolvents involve powers of !a(z)-1 . To overcome this we use the rigid control provided by Theorem 1.2 many times, but for technical reasons we were not able to avoid obtaining estimates of order N-1 (z)- 3 . Such estimates are the source of the restriction on dN in Theorem 1.1.

14 To pass Theorem 1.4 onto the general linear statistic (1.1) of Theorem 1.1, we use an exact formula (see Lemma C.1):

N eso(f) = -f J VN (T + i'q)OIf (T, 7) dT dr (1-13) S 0 -oo

where + and T' is 2a certain 2-dimensional extension of f, known as an almost-analytic extension [Dav95]. Since o9I' is deterministic, we can use our CLT for VN(T + irj) to conclude a CLT for Xeso(f). The main problem there is to interchange the distributional convergence for VN with the integrals appearing in (1.13). To perform such an interchange it will suffice to prove a certain tightness condition which will boil 2 down to having sharp control on EIVN(T + i?7)1 in the various regimes of T and 77. In the bulk of the Wigner semi-circle with q/dN >> N- 1 , the optimal bound of Theorem 1.2 plays a key role, since earlier estimates involving log(N) and N' factors would lead to a divergent estimate in the mesoscopic regime. In the regions outside the bulk, or with with very small imaginary part 77/dN< N-', we employ the recent variance estimates of [SW13] (see Proposition 1.23) which have the advantage of holding uniformly in 27 > 0, but the disadvantage of an additional factor d' appearing in the bound. In this way we are able to remove the assumption of very rapid decay, which appears in most studies on the mesoscopic regime [SosOO, BD14, BEYY14]. In contrast, there is no decay requirement in the macroscopic regime and the main important characteristic is the regularity of f [SW13], while here the decay adds an additional complexity to the problem. The structure of this chapter is as follows. In Section 1.2 we prove the finite dimen- sional convergence in Theorem 1.4 on scales 1 < dN< N1 /3 . In Section 1.3 we extend the obtained results to compactly supported functions f C Ccl& (R) and show how to replace the assumption of compact support with a suitable decay condition on f.

15 1.2 Convergence in law of the Stieltjes transform The goal of this section is to prove the following:

Proposition 1.5. Let z1 , ... , zM be M fixed numbers in the upper half of the complex plane H. Under the same assumptions as Theorem 1.4, the function VN converges in the sense of finite dimensional distributions to F+, i.e. we have the convergence in law

(vN (z1), - N -z) --- (irN'zM ), r,- - zM)), N - oc,

where T (z) is a Gaussian process on H with covariance C(zi, z 2 ) defined by 1 E (F '+ ( z ) '+ z ) =1 - 0 (Z)F'+Z2)) (i (ZI - z_ 2))2 and E(rl+(ZI)rf+(Z2))0 = 0.0. To prove Proposition 1.5, it is enough to fix a linear combination

M M ZM : cpVN(zp) => cp-(Tr (G(E + zp/dN)) - ETr (G(E + zp/dN))) P=1 P=1 dN and to prove that ZM converges in distribution to a Gaussian random variable with the appropriate variance. Our starting point is that ZM can be expressed as a sum of mar- tingale differences, to which a classical version of the martingale CLT can be applied, see Theorem A.1 in the appendix. To satisfy the conditions of the martingale CLT we shall follow the technique outlined in Chapter 9 of the book [BS10a] of Bai and Silverstein, which is equivalent to the work [BY05]. Our approach is also valid at the macroscopic scales considered in [BY05] and we feel gives a somewhat more accessible proof in this case. We first outline the martingale method and provide the notation used in the remainder of this Section. Let Ek denote the conditional expectation with respect to the --algebra generated by the upper-left k x k corner of the Wigner matrix W. Then we have the martingale decomposition N ZM = Xk,N k=1 where M Xk,N := (Ek - Ek1)Z cpd Tr (G(E + zp/dN)) P=1 dN Therefore, to prove Proposition 1.5, it will suffice to check the following two conditions: 1. The Lindeberg condition: for all c > 0, we have

N

ZE(Xk,N 2 lXk,Nl>E) - 0, N - oo. (1.14) k=1

16 2. Conditional variance: we have the convergence in probability

N M

ZEk_1[Xk,NI -2 cEc~mjC(zj,Tzm), N -+ 00, (1.15) k=1 l,m=1 N M Ek _[X2,N - cl CM O(Z, ZM), N -+ 00, (1.16) k=1 l,m=1

2 where C(zi, zm) = (i(Tl - Tm + i(,ql + q)))- denotes the covariance in Proposition 1.5. Before we proceed with the proof of these conditions, we provide some of the relevant notation. Important notation: Until now the complex numbers zp were independent of N. For notational convenience and in the remainder of this Section only, we will now allow the implicit N-dependence

zP := E + -+ n dN

As before, the sequence dN - oc as N -* oc with dN/N - 0 and Tp, rp are fixed real numbers with q, , 0. We fix E E (-2 + 6, 2 - 6) strictly inside the support of the limiting semi-circle for some small 6 > 0. Let 7 k be the N - 1 x N - 1 Wigner matrix obtained by erasing the k-th row and column from N. We denote by Gk(z) = (?k - z)- 1 the corresponding resolvent. The following formula, a consequence of the Schur complement formula from Linear Algebra, will play an important role:

I1+ h Gk (Z)2hk Tr (G(z)) - Tr (Gk (z)) k

- kk - z - h Gk(z)hk where hk is the k-th column of 71 with the k-th entry removed [BS10a, Theorem A.5]. Recall the following standard notation for convergence of random variables in LP. For a sequence of random variables {XNI=,, we write XN = OLP(u(N)) to mean there exists a constant c such that EIXN IP < cu(N) for all N large enough. We will repeatedly use the standard fact that if XN converges to X in probability and YN converges to zero in LP, p > 1, then XN + YN converges to X in probability. We start with the proof of the Lindeberg condition (1.14) which follows from the following stronger result (due to the trivial inequality IXk,N 2 lXk,Nj>f < E Xk,N16 with 6 > 2):

Lemma 1.6 (Lyapunov condition). For all mesoscopic scales 1 < dN < NE with c > 0, there is an integer 6 > 2 such that

N M Z E(Ek - Ek-1) cp Tr G(z) -+0, N -+ *o. k=1 P=1

17 Proof. By the triangle inequality it suffices to verify the claim when M = 1, c1 = 1. By definition of Ek we have (Ek - Ek_1)d-N TrG(zi) = (Ek - Ek_1)Zk,N where Zk,N dy (Tr G(zi) - Tr Gk (zi)). Then Schur's complement formula implies

1 1+htGk(z)2hk ZkN = dN Wkk - - h4Gk(zl)hk 1 - 11 k 2(z ) + N- 1 Tr (Gk(z1) 2 )Gkk(z1) dN Nk where we made use of the identity for the diagonal elements of the resolvent

1 Gkk(z1) =1 Nkk - Z1 - h G and defined 6k,,(z) := ht Gk (zI)nhk - N-Tr (Gk (zl)n) (1.17)

By the conditional Jensen inequality, we have lEkZk l < EklZk l6. Hence it is sufficient to prove N E E|Zk,N 1 -+ 0, N -+oo (1.18) k=1 The limit (1.18) follows from standard concentration inequalities applied to the variables d- 1 6k 2 (z), Gkk(z) and d- 1N-TrG(z) 2 . In particular, Lemmas B.1, B.2 and 1.11 show that for any fixed q > 0, we have the estimates

1 d- 6k, (z) = OLq((dNIN)ql2) Gkk(z) = OLq (1), 1 1 dN N- Tr G(z) 2 < OL, (max{dN , (dN/N)q}) ,

Then applying Cauchy-Schwarz and choosing 6 > 0 large enough, we obtain (1.18). U

Remark 1.7. In the macroscopic regime dN = 1, one can argue similarly that Zk,N = (1 + s'(z))(-z - s(z))- 1 + O(N- 1 / 2 ) with high probability. The leading term in this asymptotic is deterministic and does not contribute to (Ek - Ek l)Zk,N, while the error term is small enough to imply (1.18). We now proceed to the remaining and most challenging part of the proof of Proposition 1.5, which is to verify condition (1.5). Before we proceed, it's worth noting that both X2 and IX12 are finite linear combinations of terms of the form

1 1 (Ek - Ek1)I TrG(zi) x (Ek - Ek-1) Tr G(z 2 ) dN dN and so it suffices to prove the convergence for a single mixed term in the linear combination. Setting Yk (z) (Ek - Ek-1)d- 1 Tr G(z),

18 our essential goal in the remainder of this section will be to prove that we have the con- vergence in probability

N 1

CN(Z1, z2) := E 1[Y(z1)Yk(Z)1 - (i(a 1 -T2 1 +72)))2- 00 (1.19) k=1

In what follows, the proof of (1.19) is divided into 3 main parts: first we rewrite CN(Z1, z2 ) in a form suitable for the computation of asymptotics, then the main asymptotic results are obtained and finally they are used to prove (1.19). Simplifying the covariance

kernel: Our first Proposition shows that CN (z1, z2 ) can be approximated in the following way

Proposition 1.8. In terms of the variables (1.17), define the covariance kernel

-1 [2 N j~ kl(.0

CN(Z1,Z2)2 aZ az2 s(z1)s(z2) ZEk1[E6 N(z)Ek 1(z2)] (1.20)

Then we have

CN (Z1, 7Z2) =N (ZI, Z2) -+OL I( 2 /N) (1.21)

Proof As in the proof of the Lindeberg condition, we start with Schur's complement formula which implies that

I1+ ht G(Z)2 hk Yk(z) = (Ek - E-10 k dN Ekk - z - htGk(z)hk

Rewriting Yk(z) via the small terms (1.17) and expanding, we obtain the exact identity

6 I 7Nikk- 3 Nu(z) Yk(z) = (Ek - Ek_1) +Ek,N W az dN z+ N- 1 Tr(Gk(z))

where

C?-Lkk -:- 5Ej-z2G N(z) 1 ))2 N1- N 1 Ck,N (Z) := (E - Ekl_) dN (z + N- Tr (Gk(z))) 2 dN (z + N 1 Tr (Gk (z)))J

This identity is implicit in the work [BY05] (see Section 4.1 in [BY05]), but we provide the derivation in the Appendix, Lemma B.3. Then as in the proof of (1.18), Lemma B.1 implies the bound Ek,N(Z) = Otq((dN/N)q) uniformly in k. Now we replace (z+N-1 Tr (Gk(z)))- 1 with -s(z), causing an error to Yk(z) of the form wNT(W) ) 1 k (Ek - Ek_1)- dcj7k N rG + s(W) 27ridN fsz (W - Z)2 (w+ N-1rG

19 where we employed analyticity to write the z-derivative as a contour integral over a small circle S, centered at z with radius of order dN 1 (avoiding intersection with R). By Lemma B.1, 6lj(w) is )q11 a 1 + S) is OLq((dN/N)q- ) for some small c', so the error caused by this replacement is at most Otq((dN/N)q). Therefore, we have Yk(z) = Yk(z) + OLq((dN/N)q) where 0 1 Yk(Z) = -(Ek - Ek_ )--s(z)(kk - az dN Using properties of the conditional expectation, we compute that

N I a2 Ek1 (z1)(z2) = (Z) S(Z2) + N (Zl, Z2) k=1 d2 1z Z2

Then the covariance CN (z1 ,z 2 ) can be estimated as N CN(Z1, Z2) N(Z1, Z2) Z Ek1(Yk(z1) - k(Z1)) Yk(Z2) k=1 N

+ E Ek_1 (Yk(z2 ) --Yk(z2 ))Yk(z1) k=1 N

+ ZEk_1(Yk(z) - Yk(Zl))(Yk(Z 2 ) - Yk(z 2 )) k=1 + OL1(/d2) where we used that s'(zi)s'(z 2 ) are uniformly bounded for any fixed E E (-2, 2). By our estimates for Y - Yk, the second last term above is OL1 (d 2/N). For the middle terms, we apply Cauchy-Schwarz (twice) to obtain

N N N

E ZE Yk(z)(Yk(z 2 )-Yk(z 2 )) < E|Yk(z1) -- k(zl)|2 E Ek (z2)|2 k= 1 k=1

- E|EIYk (z1) - Yk (ZI1|2 ENz, +d2iz

The first term in the product above is O(V/d2N/N). In the remainder of this section, it 1 will become clear that ON (zI, z 2 ) is bounded in L , see Proposition 1.22. This completes the proof of Proposition 1.8. M Remark 1.9. Before we proceed further, note that the approximate covariance kernel

CN (z1 , z2 ) in (1.20) is naturally expressed in terms of the auxiliary kernel

N KN (Z1, Z2) = Ek_1[Ekjkj (z1)Ek kl (Z2)]. k=1

20 Then by Cauchy's integral formula and analyticity, we can write the derivatives in (1.20) as

d2 s(wi)s(w 2 ) KNww) CN(Z1,Z2):= 2 dw1 j z2 2 KN(W1, W2) dN (27 i2 Szi (Z I2 (Z2 - W2)2

2 2 where S, is a small circle with center z and radius 1/ (2dN) /(T1 - T2 ) + (771 - q2 ) . This ensures that for fxed -1,T2,M1,72 and N large enough, Sz 1 and Sz2 are disjoint sets. In the degenerate case that z 1 Z2 , it is enough to use the Cauchy integral formula with a single circle S1. If we obtain uniform estimates on KN of the form KN = KN + OL1(u(N)), then the error in approximating CN (Z1, Z2) is of the same order in N:

s(wi)s(w ) 11 2 kNI E I dw1 51' dw2 2) |W)(WIKN - d2N (27ri)2 S i 1Z -1W)2 (Z2 - W2T2

K dw1d 2 s(wi)I|s(w2)I < IIJ2 2 dwi dW2 S(J1SW2 2 _Ju(N) I d 47 SZ SZ 2 I -W 1 22 1 1 < 2 IJu(N) I - (T1 - T2) + (771 - 772)2

The conclusion of this remark is that it will be sufficient just to understand the convergence in Ll of the kernel KN(zl, Z2).

Our first Lemma in this direction rewrites KN (Z1 , Z2) in terms of the matrix elements of the resolvent Gk(Z) := (k - Z)-'. We will frequently make use of the shorthand notation G := Gk(zp), p =1, 2 to emphasize the dependence on the variables z, and z2 .

Lemma 1.10. The covariance kernel KN (Z1, Z 2 ) satishes the exact identity

N (1.22) KN (Z, Z2 ) =N-2 E k_1 SEkj (G ) Ek(G (k k=1 i

Oik = E(IWik|2 _ 1)2

Proof. By definition we have

1 1 E(jk, 1(Z1)) = N-1 I Ek(G( ) 2ijWikWjk - N Ek(G );j i

21 Multiplying out the resulting terms, we obtain

Ek_1[Ek (jlf(zi))Ek (jhk (Z2))] N- 2 Ek EEkE(G"G)GEk(G )pqVikWjkWpkWqk (1.24)

i

2 + N- Ek_ 1 > Ek(G(W)) ik 1 Ek(G2 )jj(WjkI -- 1) (1.25) +~ ~ Nkk %3k(V)~WkWjk >EEk(Gkl)jl~k2-1 (1.26) i

2 2 2 N~ Ek- 1 >3 Ek(G iiEk(G ))(Wjkl2 _ 1)(|Wik| - 1) (1.27) i

It is clear that the sums (1.25) and (1.26) are identically zero, since the vector {Wik}ik consists of centered independent random variables satisfying ElWik|2 = 1. Similarly, the first summation (1.24) will be zero unless i = q and p = j, this gives the first sum on the right-hand side of (1.22). The last term (1.27) will be zero unless i = j, which gives the second sum in (1.23). 0 We now proceed with the estimation of the two sums in (1.22) and (1.23). To do this we need precise estimates on the resolvent matrix elements appearing in the sums.

Lemma 1.11 (Bound on the resolvent). Let z = E + '+IdN as above and consider an off-diagonal resolvent matrix element (Gk (z))pq with p # q. Then for any positive integer s we have positive constants c, C such that

E|(Gk(Z))pql| < (CS) s (d )s/2 (1.28) ,qN for all k,p, q, Er c R, c > 0 and dN > 0. For the diagonal matrix elements, the same result holds but with (Gk(z))pp - s(z) in place of (Gk(z))pq. Proof. This follows from Lemma 5.3 in [CMS14], see also previous works on the local semi-circle laws [ESY09b, ESY09a, EYY12a, EYY12b]. E The second sum (1.23) over diagonal elements of the resolvent can now be dispensed with immediately:

Lemma 1.12. For all points z1 and z 2 with non-zero imaginarypart and on all mesoscopic scales 1 < dN < N, we have the convergence in Ll:

N 07 -2 Ek(G('))iiEk(G (2) N -+oo s(zI)s(z2) E Y k k )ii/3ik (1.29) dN z1 z 2 k=1 i

N N N2 >> 2 Ek(G jiiE(G = N- s(z1)s(z 2 ) >3> /ik + OL1( /dN/N ) (1.30) k=1 i

22 where we assumed that the fourth moment of each matrix entry Wik is finite. Since the left-hand side of (1.29) is analytic, we can use the strategy outlined in Remark 1.9. Hence after inserting (1.30) into the left-hand side of (1.29), the OL ( dN/N) bound remains of the same order while the leading term is of order dN2 since s and s' are analytic and uniformly bounded provided E C (-2+5,2 - 6). This completes the proof of the Lemma. E

Remark 1.13. This shows that in the mesoscopic regime, the process VN (z) is insensitive to the value of the fourth cumulants of the matrix elements, provided they are finite. This is in contrast to the regime of global fluctuations where the fourth cumulant is known to appear explicitly in the limiting covariance formula, see e.g. [LPO9a, BY05, Shc1]. This suggests that the Gaussian fluctuations obtained in the mesoscopic regime are more universal than in the global regime.

The remaining challenge now is to compute the first sum in (1.22). For that we shall show that to leading order on the scales 1 < dN < N 1/3 , the sum

S1 := I Y3 Ek(Gk(Z1))pqEk(Gk(Z2))qp (1.31) p

Proposition 1.14. Consider the quantity S1 in (1.31). We have the estimate

1 1 2 zISI = -s(zi)Si - S(Z2) - k IS(Z2)S1 + OL 1((d3N- ) / ) (1.32) N NN where the OL bound is uniform in k = 1, .. ., N.

In what follows, the proof of this Proposition will be divided into a number of Lemmas which eventually culminate in Lemma 1.21. We start with the Green function perturbation identity z1(Gk(Z1))pq = -Jpq + N-1/ 2 ((Wk)pj(Gk)jq (1.33)

It turns out to be helpful to separate out the correlations between Wij and Gk by intro- ducing the matrix Gkij defined as the resolvent of the matrix 'Wk with entries (i, j) and (j, i) replaced with 0. Simple algebra shows this is a perturbation of the original resolvent:

G(1) -Gk kU3 =-N-1/2G(l)cij((Wk)iEijkij + (Wk)jiEji)G' i k , (1.34) where Eij is the elementary matrix which is entirely zero, except entry (i, J) which is equal to 1, while the coefficients ci z= 1 if i 4 j and cii = 1/2. We now insert (1.34) directly into (1.33) leading to the following expansion.

23 Lemma 1.15. The matrix elements of the resolvent (G(' )pq satisfy

1 2 1 zi(Gl))pq = -6pq + N-1 (Wk)pj (G( )jq - s(zi)(1 - 3/(2N))(G( ))pq

- N~-1 c I(Wk)pj12 ((G )jj - s(zi))(G(1),q- N-1 cpj(I(Wk)j12 - 1)s(zi)(G(1)),q

- N- 1 cp (Wk)gj(G)) p (G( 1))jq (1.35)

Now inserting (1.35) into the definition (1.31), we obtain the decomposition

z1S1 = S1,1 + S1,2 + S1,3 + S1,4 + S1,5 + S1,6

where

S1,1 = -N-1 Ek (G ()),, (1.36) p

S1,2 = N-3 / 2 ( (Wk)pj Ek(G(% )jq Ek(G(2 ) )qp (1.37)

p

S1,3 = -s(zi)(1 - 3/(2N))S 1 (1.38)

2 2 S1,4 = -N- ( cvj EkI(Wk) kj2)((GJ)3 - s(z1))(G(1))pqE(G ))qp (1.39)

p

2 S1,5= -N- 2 cpjEk(I (Wk)Pj1 - 1)s(z1)(G (1 )pqEk(G 2 ) )qp (1.40)

p

S1,6 = -N - cpjEk (Wk) -(G (1)p (G (1 )jqE (G 2 )q (1.41)

p

To estimate these sums we will apply Lemma 1.11 repeatedly, together with the follow- ing Lemma which allows us to bound the matrix elements of the perturbation (Gki,(z))pq. Lemma 1.16 (Bound on the perturbation). The same bound (1.28) of Lemma 1.11 holds with (Gkij(Z))pq in place of (Gk(z))pq. Proof. We will use that higher moments of the entries Wij are finite. Using the perturbation identity (1.34) we get

(Gkij)pq =(Gk)pq + cij (Wk)i jN-11 2 (Gkij)pi(Gk) jq + cij(W k)iN-1 2 (Gkij)pj (Gk)iq (1.42) Iterating (1.42) once more and applying Minkowski's inequality shows that

I(Gisj)pq|S cs|(Gk)pqI + csI(Wk)ijIN-s/ 2 (|(Gk)pi(Gk)jqI| + I(Gk)pj(Gk)ql8 ) + cS I(Wk)j 1|2'N-s (I (Gkij)pi(Gk)ji( Gk)jq + (Gkij)pj (Gk) ii(Gk)jqI\ +|(Gkij)pi(Gk)jj(Gk)iqI + (Gkij)pj(Gk)ij (Gk)jiq|I)

24 for some constant c, depending only on s. From the deterministic bound I(Gkij),ql IIGkijII < dN/r and Lemma 1.11, we find by Cauchy-Schwarz that

E|(Gkii)pq1| < csEI(Gk)pq|8 + O(N- s2) + O((dN/(N))s)

Note that the obtained error term is smaller than (dN/(rN))s/ 2 found in (1.28). This completes the proof of the Lemma. E As is suggested by the structure of the terms (1.36)-(1.41), in what follows we will encounter many sums of the following generic form

S O(W) 17 Ek(Gm))Cm,Cm+1 (1.43) C17C2,...,Cr mCI for some index set I and a complex valued function q. By simply counting the number of occurrences of (G "))CmCm we obtain

Lemma 1.17 (Trivial Bound). Suppose that Ekb(W)1 2 is uniformly bounded. Then

EIQ| < cNr(dN/N)111/ 2 (1.44) with the same result holding if any of the factors in (1.43) are replaced with the perturba- tion G(m)k(a(b for any a, b E I. Remark 1.18. In practice we will apply a slightly more general result, including cases where # or Gk may or may not appear inside a conditional expectation. By conditional Jensen and Cauchy-Schwarz inequalities, the bound (1.44) continues to hold. Proof. This follows by repeatedly applying the Cauchy-Schwarz inequality in conjunction with Lemmas 1.11 and 1.16. U We start by giving some first estimates on the terms S1,1 to 81,6 defined in (1.36)- (1.41).

Lemma 1.19 (Estimates of S1,1, 81,3, S1,4 and S1 ,6 ). We have the following bounds, holding uniformly in k = 1, ... , N:

81, =k s(z ) + OL ((dNN') 1/ 2 ) (1.45) N1 2 S1,3 = -s(zi)S1 + O (dN/N) (1.46)

|1, 41

E18, 6 1 < c d3 N-1 (1.48)

Proof. These estimates are straightforward consequences of the trivial bound of Lemma 1.17. U With S1,5 we have to take a bit more care because the trivial bound actually produces a bound of order dN which is divergent. To fix this we have to exploit independence and the fact that EI(Wk)pj1 2 1.

25 Lemma 1.20 (Estimate for S 1 ,5 ). We have the following bound, holding uniformly in k = ,.,N:

EIS1,5f c d2/N

Proof. First denote R - := (G(1)pq E (G (2 )qp q

1/2

EIS1,1 < N-2( 2 2 N EIRp12 E E cPj cp,2 (1(Wk)P3 1 _ 1)((Wk)Pj 2 1 _ p

= N- 2 ( EIRp| 2 Zc2 E(|(Wk)j| 2 _ 2

Lemma 1.21. We have the following bound, holding uniformly in k 1, ... , N:

1 1 1 2 S1,2 = -(k - 1)N- s(z 2 )S1 + OLl ((dN- ) / ) (1.49)

Proof. We start by replacing (G( 2 ))qp with (G(2))qp in the definition (1.37) of S1,2 and denote the modified sum by 81,2. We will show that S1,2 converges to 0 in L2 . We have

E1,2| = ~E (Wk)Piii (Wk )P2j2 E plG, x(GQ)q kp j 31 32 P1 P2,ql 'q2

x Ek(G2,) )jq2 Ek(G 2 j2 ) 2P2 (1.50) where for the proof of this Lemma and unless otherwise stated, the summation indices run from 1 to k - 1. To make (Wk)p13 1 and GkP 2 j 2 independent we use a similar perturbation 2 formula to remove the matrix element (Wk)p 1lj from Gk 2j 2 . Therefore, we define G as the resolvent of the matrix W with the k-th column and row erased and with entries have (Wk)Plj , (Wk)jlP1 , (Wk)P 2j 2 and (Wk)j 2P2 replaced with 0. It is easy to show that we a similar identity

G- Cp))j=(G jp )a - c(,G(Wk)pjJN-1/2(G) 2j2l (1.12)) -cpljl (Wk)jlpl N- 1/2 (G (2,ll)aj (G(2), (1.51)

26 Then if we replace the last two factors in (1.50) with (1.51) the main term has identically zero expectation unless both p, = ji and P2 = j2, which we assume for the moment does not hold. The higher order terms in (1.51) give rise to an error of the form

1 2 A N- E E (Wk)Piljl(Wk)P 2j 2Ek(G2 )i Ev(G 1 qip jl1,j2,Pl P2,ql ,q 2

and four further error terms which do not differ in any important way from A above.

To estimate A, we now force independence in (Wk)P 2 ,j2 by replacing G with G k2, Again the main term obtained by this replacement has expectation 0 because E(Wk)P 2j 2 0. The error terms in (1.51) give rise to sums of the form

4 N- E (Wk)pliJ (Wk)p2j2Ek(G '2 (G j)qp2(G >E )jqiEk((Wk)p2j2 1 )j1)

P2 ql ,q2

xEk(Gl)lj )hq2Ek ((Wk)pljl (G 11)qaj (Gd 2)PlP2)

We call such a term maximally expanded because we can no longer exploit independence of the different factors to reduce the size of the sum (none of the factors have zero expectation at this point). See also [EK13] for related methods. By the prescription (1.44) , we have

A' < cN- 4 N 6 (dNN- 1 )3 < cd3 N 1

It is clear that all error terms resulting from the replacement (1.51) give the same bounds after employing this procedure. Now consider the diagonal terms pi = P2 and Ji = j2 contributing to &1,2:

2 N-3E I(Wk).I|2Ek(G" )j Ek(Gk(0))q, , Ek(G(2 )jq2 Ek(G ) j,p,q ,q2

2 We conclude that EIW 1 ,21 < cd3N/N. Now again using (1.34) we have

S1,2 = -3/2 (Wk)p E(G() ) ((Gpj) qqp -(G kpj )+ OLL d3 N1 j,p,q

= -N 2 1 cpj(Wk) PjEk(GJ )jq Ek((G() )q(G(k)ip) (1.52) jpq

2 2 - cPjI(wk)PjI2 Ek(Gl)J )jqEk((G (2))qj( ( ))P)+OL1 (~d ~N-1)1.53) j,p,q

27 By (1.44), we see that (1.52) is OLI( d3N- 1 ). The final term (1.53) (let us denote it S1,2) can be written

2 S1,2 = - N s(Z 2 ) E cpjEk(G )j Eqk(G () )qj (1.54) j,p,q 2 2 2 - N- s(Z 2 ) ((Wk)p1 - 1)c E k(G2) )jq Ek (G( ) )qj (1.55) jP q 2 - N- E cPjJ(Wk)pj12Ek (G(l )Ejqk (G(2))qj ((Gk )pp - s(z 2 )) (1.56) j,p,q

To estimate the term (1.55), first replace G(2) with G() and note that by (1.44) the error in making this replacement is OL1 ( dNN- 1/ 2 ). We denote the remaining sum 71,2. We have

2 E Il, 2 2 2 2 N-4s( 2) (W(Wk) 1311 - 1)(1(WV)p 2i 21 - 1)Ek(Gl )jiq Ek(G2) 1 1

32 'P2 ,q2

x Ek 0) E (2) x Ek(GP2 2 )j2q2 Ek(GCkp 2 j2 )q2j2

This sum has a similar structure to that seen in the computation of EIS1, 2 2. Apply- 2 2 ing exactly the same procedure shows that EIFi, 21 < cd3/N . The term (1.56) is OL1(V dN- 1 ) as follows directly from the generic bound (1.44). In (1.54) all terms in the sum where p = j will only contribute OL1 (dNN- 1 ) so can be neglected, which justifies us setting cpj = 1 in (1.54). Finally, note that (1.54) has a similar form to Si except with the perturbed resolvent elements (G) )jq. By the trivial bound (1.44) they can be exchanged with the original ones (G()jq at a cost OL1((dNN- 1)). Then the summation over p just gives a prefactor (k - 1), leading to (1.49). Proof of Proposition 1.5. With the most significant challenges dealt with above, our aim now is to complete the proof of Proposition 1.5 by solving relation (1.32) and computing the limit N -+ 00. In other words, we finally prove the conditional variance formula (1.15) which is enough to verify Proposition 1.5.

Proposition 1.22. Consider the approximate covariancekernel CN(Z1, Z2 ) given by equa- tion (1.20), where Zk = E + j, we have the estimate

CN (Z1, 2) 2 + OL1 (1/dN) + OL1 log(dN) d/N) (1.57) (i (TI - T2 + Z*(n1 + n2 )))2 and

CN(Z1,Z2) OL1(1/dN) + OL1 log(dN) d /N) (1.58)

28 Proof. Recall that ON (z 1 , z 2 ) can be expressed in terms of the auxiliary kernel KN (z 1 ,-2) in (1.22): & 2 N (Z1,Z2) = d2 &z1&z2 s(zl)s (z2 )KN(z1, z2 )

In turn, KN(z1, z 2 ) was expressed in terms of the sum Si defined in (1.31) and computed in Proposition 1.14. Let us denote the error term in that Proposition by EN,k. Then we have EN,k k - 1 s(zi)s(z 2 ) N 1 - -s(zi)s(z 2) 1 - k1s(z1 )s(z 2 )

where EN,k = OL1( dN/N) uniformly in k. Then by definition (1.31) the kernel KN can be written as

KN(z,z2 1 )s(Z2 ) + I k Ek OL1(dN/N) 1 - kls(z1)s(z2) N k1 - S(z1)s(z2) k=1k= N (1.59) These sums are close to Riemann integrals, but before we calculate them we must take

into account a subtle feature of the mesoscopic regime: when the points z, and z 2 have opposing signs in their imaginary parts, there is a singularity in the denominators of (1.59), due to the asymptotic formula

2 1 - s(zI)s(4j) = 11 + r2 + (T2 - 1)i + O(d- ) (1.60) d N \/4 -E 2N

If the signs of the imaginary part are the same, there is no singularity and one finds that the limiting covariance is identically zero, as in (1.58). Now let us control the errors in (1.59), assuming there is a singularity. For 0 < u < 1, let 4'(u) := 1 - us(zi)s(T2)j- 1. Then standard results about the Riemann integral show that the error term in (1.59) is bounded in L 1 by

dN/IN _ < d3NN fo () du + NT k=1 N Z' s(z1 )1

where 1 -IITV is the total variational norm of the function b. Since us(zi)s(z 2 ) has non-zero imaginary part, 0'(u) exists and is Riemann integrable. Then the total variational norm can be written IIQIITv = f1 0'(u)l du and a simple calculation taking into account the

asymptotics (1.60) shows that || ||Tv = O(dN) as N -+ oc, while f t 0(u) du = O(log(dN)) as N -+ oo. The error from approximating the first term in (1.59) can be estimated in the same way, and we get

KN(z1 ) s(z)S( duO1 log(dN) d/N) -- 1 - log(z- s(z1)S( 2) du +rN) 1 -= s(Z)ss(_Z_))- + OLi log(dN)V3N/N (1.61)

29 Inserting (1.61) into the definition of ON (zi, Tj) and bearing in mind Remark 1.9, we have

- s- Z2)N (Z1, - 2) ON (Z1, T2 2 s

(s'(1)s'() 2 (1 s'zs(~2- 2s'(Z2)s'(Z2) + OL' log(dN) d/N") N1 N

0 +O(1/dN) + L1 log(dN) 3/3 ((71 - T2 + i(71 + 72)))2

where in the last line we applied formula (1.60) and used that

s'(Z1)s'(,2) = 4 1 E2 + 0(dN )

This completes the proof of Proposition 1.22. Therefore we have finally verified the con- ditions (1.15) and (1.16), hence completing the proof of Proposition 1.5. U

1.3 Tightness and linear statistics The goal of this section is to extend the CLT obtained in the previous section to a wider range of test functions, focusing on the distributional convergence of (1.1). Our approach will be to combine the Helffer-Sj6strand formula (1.13) with the main result of the last Section, telling us that the finite dimensional distributions of VN converge to I . If we can interchange the integrals in (1.13) with this distributional convergence then the CLT for mesoscopic linear statistics (1.1) would be proved. In practice we will need to prove a certain tightness criteria in order to justify this interchange, involving uniform estimates on the second moment E VN(T + 7) 12. We will handle those estimates mainly with Theorem 1.2, but this Theorem only applies in the bulk region with 71 > dN/N. For smaller q we need the following variance bound, due to Sosoe and Wong (in our notation). Proposition 1.23. Let c > 0. Then for 0 < 1 < 1, and |E + T/dNI < 5 we have that there is a universal constant C > 0 such that:

2 EIVN(T i77)1 < Cd'q-2-, (1.62)

Proof. See Proposition 4.1 in [SW13]. The reader should compare this result with that obtained from Theorem 1.2 which gives an improved bound EIVN(t+izn) 2 < C71 2 (Proposition B.4) but in a more restricted region. To prove the CLT for (1.1), we consider two situations. Firstly we consider the case

that f E C, '0 (R), the H6lder space of compactly supported functions f such that f' is H6lder continuous with exponent a > 0. The hypothesis of compact support will allow us to avoid complications coming from the edges of the spectrum and will serve as a warm up, illustrating our general approach.

30 In the last subsection we consider the more challenging case where the hypothesis of compact support is replaced with a more general decay condition f(x) and f'(x). In particular we will prove our main result, Theorem 1.1. Compactly supported functions with H6lder continuous first derivative: To obtain the CLT we will apply the Helffer-Sj6strand formula (1.13) with

'If(t, 7) = (f(t) + i(f (t + rj) - f (t)))J(77 )

where J(rj) is a smooth function of compact support, equal to 1 in a neighborhood of 77 = 0 and equal to 0 if r1> 1, see Lemma C.1. It follows that

&'I'j(t, r) = (f'(t) - f'(t + r/) + i(f'(t + r1) - f'(t)))J(R) + (if(t) - (f(t + rI) - f ()) ()

and if we assume that f' is H6lder continuous with exponent 0 < a < 1 then we also have the bound I&'f (t, 7)1 < c7o for some constant c > 0 for r7 small. Also If (t, r7) satisfies TI'(t, 0) = f(t), and that &If (t, r) is compactly supported whenever f is. Theorem 1.24. Suppose that dN satisfes the condition 1 < dN < N 1/ 3 and consider compactly supported test functions f1, . .. , fm whose first derivatives are H6lder continuous for some exponent a > 0. Then for any fixed E E (-2, 2) in (1.1) we have the convergence in distribution (Imeso l)X..., meso(fM)) -+ (X,, XM)

where (X 1 , . . . , Xm) is an M-dimensional Gaussian vector with zero mean and covariance matrix I f 0 0 E(XpXq) = Ik Ifp(k) j (k) dk, 1 < p, q < M (1.63)

Proof. Since f has compact support, we may write

-~ ~meo( T+ XNSo(f) - VN(T + i7)DT f (T, 77) d- d77 7r O fr-

for some fixed T_ and -+. First note that the region 0 < 7 < dN/N can be neglected, since

T+ jd/ jr- E(IVN(T + ir/)I)19'If (T, 77) dT, dr cdN jr -1- dT d77(1.64) f dNIN T+dNIN = c(T+ - T-)N'(dN/N)"

The latter goes to zero after choosing 0 < c < (1 - -y)a, where dN = NY with 0

X(VN, f) = f1VN(T + ir7)XN (r)DXIf(T, 77) dTd7 7T 0 -_ where XN (r,) is an indicator function, equal to 1 on the region dN/N < 77 < 1 and 0 otherwise. We wish to show that X(VN, f) converges in distribution to X(F''+, f). Since

31 we proved in the previous section that the finite dimensional distributions of VN (T + in) converge pointwise to F' +(T + ir), we can appeal to Theorem A.2. Let 4D denote the set of functions #(z, w) : H x C -+ C of the form O(z, w) = 5T. (z)w where g is a function in the class stated by Theorem 1.24 and M is as in the Theorem A.2. Let D be the domain [0, 1] x [-,, r+]. Then Theorem A.2 guarantees the convergence in distribution (X(VN, f1), . - X(VN, fM)) => (X(FOf), . . .,X(1+, fm)) provided we check the following tightness conditions:

inf limsup P f iVN( - irl)XN(rI)O'f(T, l)jdrdd 2' =0 (1.65) BCH,A(B)oo r D\B

lim limsupP -I (IVN (r 9i)XN(1)O9ff(T,r7) - K)+ ddr/ E =0, (1.66) K -0O N -*00 D First we prove (1.65). By Markov's inequality it suffices to check that

inf lim supf E(IVN(r + ir)I)XN(')1f&'f(T,r7) ddr7 = 0 (1-67) BCD,A(B)<0o N-*o JD\B

Now since T is fixed, the real part appearing in the denominator of the resolvent is bounded away from the edges (due to the compact support of f). Also the imaginary part appearing in the denominator is no less than 1/N. Hence we can apply Proposition B.4 and obtain 1 1 EIVN(T + iY)I < cq- . By the assumptions on f, i7- fO4'f(T, i)j is integrable on D \ B, and so (1.67) is bounded by

inf Crq-1 JD9f (,,q rI) dr dr1 = 0 BCD,A(B) 0, we have

(IVN(T + i/)XN (T)DT f (T, 77)1 - K)N(T - i/)XN (T If (T, K6 V(ir)N7)If(r)I 6 Then (1.66) is bounded by

lim K- lim sup f E(IVN (T + i'q) )X+N()XIDT f (7, 77)11+6 dT ch K-+oo N - oo JD

< lim K"f ci77-1l-6 1 pf(T, r)J+ 6 drdr =0 (1.68) K-*oo ID -(.8 where in (1.68), we choose 6 so that 0 < 6 < a/(1 - a) so that the above integral is finite by the behavior IDI'f(-, 77)1 < c ~0 for small rq. This completes the proof that X(VN, f) converges in distribution to the random variable

X(F/+, f) = J F"+(T + irj)DTf (T + irI) d- drI,

Since the integral of a Gaussian process is Gaussian, it just remains to compute the co- variance and verify it is well-defined, see the next Lemma. M

32 Lemma 1.25. Consider the random functional defined by:

X(Ff+, f) (T + iy)&Pf (T + i?7) dTd77 (1.69) 7r H

Then the covarianceEX(T'+, fl)X(F'+, f2) is given by (1.4), provided that f1 and f2 are in C1 'o(R) for some a > 0, and Ifgi, IfjI are O(Ixl-K+1 )) for some 3 > 0.

Proof Recall that the process rO+(T + ig) for q > 0 appearing in (1.69) has the covariance structure 1 E(% +(Ti + i n,1)Fr+(T2 + i 2 )) = 1 2 (1-70) (i(71 - 72 + i(1 + 72)))

while E(F'+(Ti + iq1)F'0+(T2 + iT12 )) = 0. We compute

E(X(F'+, fi), X (F'+,f 2 )) =

2Ej dzi dZ2 R ['o+ (zi)'o+ (Z2)DAWf 1z)5Tf2 (Z2)] (1-71) 27r2 . H H

+ R [r'+(z)51Ff1 (zl)F'+(z22)D f2 (z2 ) , (1.72)

we proceed to check that the expectation is finite and that we can exchange expectation with integrals by utilizing Fubini's theorem. It suffices to verify that

I dzi dz2 EIF'+(zl)F'+(z )I&'If (zi)|I&4f (z )I < oc, 2 1 2 2 0 fH x H 0 0(1JD~~Z)

by using Cauchy-Schwarz, the above integral is bounded by

1 00, D'Ff (z)1IO'f 2 (z 2 )j < JHxH dzi dz 2 rq 1 j2 1 since by our assumptions on fi, we have

f4(t + 7) - fj(t)| < min(C1r", C2tI-(1+a')) < C7O'jtj-0+3)(1-O), for any o- E (0, 1) and c > 0 a constant independent of t and n, giving us the bound

|5a 7, (t, l _)

EFO+(T + i7)I'+(T + in) = dkIkIr7, 1 (k)rQ2 (k)

33 1 where r, 1 '1(x) = (x - 71 - i1)~ . We now interchange the integration over k with the integration over H x H. To justify this, note that rr7 1 (k) = -2wie-ikrie-JkI? and due to the conditions on fi, f2, we have the bound:

f2 (z )1|ke-I k)1I e-\k172 R dkIDI 5(zl)I l 2 IHxH dz1 dz 2

=d77 llg1* hIli (1.73) dIJ dx dr1 J 0 -- OO f-oo X 1 -Z71 0

1 where g,1 (TI) = IT, +iZ1|- and h, 1(Ti) = &I'f(T1,711)1. To bound the L' norm of the convolution, we apply Young's inequality IIg,7 * hq, 11 < |hgn,lip 11h.., llq with q= 1 - 6 and p = 1 6/(1 - 26) with 6 > 0. A simple computation shows that IgI1, < c 5/(1-6) while for sufficiently small 6, I1h l11qis bounded uniformly in ql due to the integrability 1 assumptions on f and its derivatives. This shows that lg.., * hn|11 < c ' /( -() so that (1.73) is finite. After performing all such interchanges of integration, we finally obtain

E(X(F'o+, f)X(F'+, f2))=

Ikdk dxe-ikxJ dz1 -5x(z1) 8,7r _-CO f-0_0 7r x - z1

x dx e-ikxJ dz2 1 5 f 2 (z2 ) +c.c., (1.74) -o, 7 H X-Z2 where c.c. refers to the complex conjugate of the term preceding it. Now the inner integrals over H can be evaluated by Lemma C.1. There is a caveat however, Lemma C.1 requires the function f to be compactly supported. We remedy this by taking our function f and multiplying it by a cutoff function 0" = 4(x/n) where O(x) is 1 on [-1, 1] and vanishes outside [-2, 2], we let fn = #nf. By Lemma C.1 we have the identity

-f dr/ dfr - . &'f(Ti, q)= fn (x) + iH[fn](x). (1.75)

It is well known that H is a bounded operator from L2 (R) to itself, therefore if we take the limit as n -* oo on both sides of (1.75) (and note that fn -+ f pointwise everywhere and in L 2 (R)) we have

lim - dr/i dT . -I'fn(T1,T/1) = f(x) +iH[f](x), n--+ oo 1 7 f - i- Z7 1

34 we check that the limit interchanges with integral by noting that

|I pf" (t, 77)1 <21 f' (t) -- f' (t +TJ I)I J (7)|1+ 2 (1f (t) I+ If (t +T1 ))J'(77)|1, If'(t) - f' (t + 7)1 <5'n(t + 7)1If (t + r7) - f (t)I + IO'n (t + TI) - O'n (t)IIf(t)I + 10n(t + 7)J|f'(t + rj) - f'(t)l + If'(t)110n(t + 7q) - On(t)I,

and that #n(t) = #(t/n) is infinitely differentiable and therefore in Ckc(R) for any k E N, a E (0,1). So we may bound for large t, &4'fn(t,q)I by C uItI-l+/)(o) for some constant C and any o- E (0, 1), it follows by the dominated convergence theorem that we may interchange limit with integral to obtain

- dlj ddi . a'If(Tri, 1) =f(x) +iH[fl(x). (1.76) 7r fo -_OO X 1 - Z'1

The Fourier transform of the Hilbert transform is given by

H[f](k) = -isgn(k)f(k). (1.77)

Hence, inserting (1.76) into (1.74) and applying (1.77) yields the limiting covariance struc- ture 2 E(X(f1 )X( f2)) = dk |kIf(k)f2 (k)I1 - i sgn(k)1 dk + c.c.

I j IdkIk|f1(k)f 2 (k)

as required. Functions supported on the real line The main goal of this subsection is to remove the assumption of compact support from the functions f in Theorem 1.24 subject to the following decay condition on f and f': for some 3 > 0 and IxI large enough, f(x) and f'(x) are O(|x-- 3 ). This will complete the proof of the corresponding statement in our main Theorem 1.1. We begin by approximating f by a compactly supported function whose support grows at a rate O(dN) as N -+ oc. The support of the test function can now extend over the edges of the spectrum, so that the resolvent bounds of Proposition B.4 cannot be applied. The goal of this subsection is to prove the following

Theorem 1.26. Let dN = N7 with 0 < -Y< 1/3. If for some a > 0 and 3 > 0, we have f E Cl*'(R) where f (x) and f'(x) decay faster than x|- '-1 for large lxi, then the random variable X""SO(f) converges in distribution to a Gaussian random variable with variance given by (1.63) of Theorem 1.24. Moreover, the multidimensional version stated in Theorem 1.24 continues to hold for functions of this class.

We will prove this Theorem by means of the following Lemma and two Propositions.

Lemma 1.27. Let ON (x) denote a smooth cutoff function equal to 1 in when IE+x/dN I 2 and equal to 0 when IE + x/dNI > 4. Let fN(x) := f(x)N(x). Then Xmeso(f) X meso fN) + OL2(1).

35 Proof. We follow the same technique here as in Section 4 of [SW13]. Note that Xmeso((1 -

#N)f) is only non-zero when A1 < -4 or AN > 4, so that

oN)f) (dNdb) > 0 -b [ (A, > ) + (AN < -T)hu P (( -

these quantities are bounded by e- Nc by [EYY12a, Lemma 7.2]. Thus

EIXeso ((1 - ON)f) 2 2 jN1f I XP (I -ON)f(dNAi) x) dx,

2 0>NI2Loo(R)P(-N)f(dNAi) L

NC fL (R) - 'M

Now we apply the Helffer-Sj6strand formula (1.13) to the function fN, obtaining

kyesoyfN) = -j VN(T + i7j)N(T) T fN( ) d-d77 0 R

where N(T) is the indicator function of the region F + T/dNI < 4. Repeating the derivation of (1.64), we see that

E(IVN (T + ij)I1N (T)) I&4fN(Tl)I dTdq --- 0 dN /N I 0 JR

on all scales of the form dN = N'Y with 0 < 'y < 1. To apply Theorem A.2, we replace the N-dependent fN with f, noting that &'fN - ONf &IfN-f, where fN(T) - f(7) = f(T)(ON(T) - 1) is supported on the region IT/dN - El > 4.

Proposition 1.28. We have dTdTOL1(1)-=f(T,,)fN-+(TVN5T2)

k (1.78) S 0R

We postpone the proof of this Proposition until the end of this subsection, where it will follow from a more general argument. Thus X""SO(f) = I(f, VN) + OL' (1) where

1 / 1o f I(f,VN ) = -kJ I VN (T + iM)XN (77)N (T) 5T f (T, 1) dTd7- ir J 0 J-

To show that I(f, VN) converges to I(f, r'+), thus completing the proof of Theorem 1.26, it remains to check the following tightness result.

36 Proposition 1.29. Consider the domain D = [0, 1] x R. We have the following estimates:

inf limsup E(IVN(t+i7)IXN(7)iN(r))I 'f(t,7)Id27dt= 0 (1.79) BCH,A(B)

lim K-" lim E(IVN(t+ i27)1+ 6XN(2)X>N(T))I&'If(t,r7) 1' 6 drdt=0. (1.80) K-+oo N-+oo JD

Proof. First we give some bounds on I|x(t, 77). By our assumptions on f, we have

1 3 If'(t + 7) - f'(t) min(C14", C2It-( + ) < C7 OIt-(+IO)(la)

for any a E (0, 1) and c > 0 a constant independent of t and T1. Hence by construction we have the bound Ialpf(t, t) - (1.81)

for large Itl and small r. We proceed by splitting the integration in (1.79) and (1.80) into the regions Dbulk = [0, 1] x {r: IE+T/dNI < 2} and Dout = [0, 1] x {T: 2 < IE+T/dNI < 4. Starting with region Dbuik, the variance bound of Proposition B.4 is applicable and we see 6 1 that EIVN(t + i?7)I1+ < c1 -( '+) uniformly in t. Combined with (1.81) we see that the integrands of (1.79) and (1.80) restricted to Dbu1k are dominated by an integrable function and the proof proceeds as in the compactly supported case of Theorem 1.24. It remains to bound the contribution to the integrals on the domain Dout. Here we can exploit the decay of the test function f and show that the inner lim sup over N will already be zero in (1.79) and (1.80). Therefore it suffices to take B 0 and K = 1. Then the variance bound of Proposition 1.23 applies, yielding

(t,1+" dtd7 JD/ E (|VN z1+('jDP Scd1 +65) t(--)(1-a)(+) t(1-)(1+6)+d 7 cd(1.82) N

where we choose c and 6 so small that the integral over 77 is finite while the integral over t goes to zero as N -* oc. Indeed, if 6 < a-a/3 and 6 < aa/3 then (--1 - E)(1 + 6) +o-a > -1 and the 7 integral is finite. In the integral over t, we choose a < 6/(1+6), c < /3/(1+6) and deduce that E(1+6)-(1+,)(1-a)(1+6)+1 < 0. To make the bounds work simultaneously we take E < min{-a/3, #/(1 + 6)}. We conclude that the limit of (1.82) is zero. To prove Proposition 1.28 notice that the integrand is supported on Dout with the same regularity conditions on f. Hence an identical calculation to that given in (1.82) shows that (1.78) converges to zero in L' as N -+ oc. This completes the proof of Propositions 1.29 and 1.28. Consequently, by means of Theorem A.1, this also completes the proof of Theorem 1.26. M

37 Corollary 1.30. The sequence of stochastic processes VN(z) with z G H is tight in the space of continuous functions on any N-independent rectangle in the upper half-plane H.

Proof. It suffices to verify the Arzela-Ascoli criterion:

EIVN(zl) - VN(z2)1 2 z21 -2

for zi = ui + iv, and z 2 = U 2 + iv 2 in some N-independent bounded rectangle in R c H and C a constant depending only on the vertices of the rectangle (i.e. not on N). To prove this, note that

1 1 (ul - u2 ) + i(v1 - v 2 ) - X U1 - iv 1 X - U2 - iv 2 (x -U1 - iv1)(x - U2 - iv 2 ) implies 2 2 EIVN (zl) - VN(z2) 1 =|1- z2 12 EIeso (h)|

1 where h(x) = ((x - U1 - ivI)(x - U2 - iv2 ))> is a smooth function with decay h(x) = O(IxK-2) as lxi -+ oc. Hence the techniques of the present subsection are applicable with a = # = 1. Indeed, after replacing h with a smooth cut-off hN as in Lemma 1.27, an application of formula (1.13) followed by Cauchy-Schwarz leads to ElXeso(v)12 2 where

InV1110 E|NI= (T +irl) F 15ThN

Following the proof of Proposition 1.29 we easily deduce that Iv is uniformly bounded in

N and z1 , z 2 (E R.

38 PART 2

MESOSCOPIC CENTRAL LIMIT THEOREM FOR GENERAL BETA ENSEMBLES

Based on joint work with F. Bekerman

Acknowledgements: The authors thank Alice Guionnet for the fruitful discussions and the very helpful comments and her support through the NSF grant MS-1307704. The authors also thank the anonymous referees who helped us improve the manuscript.

39 2.1 Introduction We consider a system of N particles on the real line distributed according to a density proportional to V(A)yldAi J N - A 1Ife-N E 1 0. This system is called the /-log gas, or general /-ensemble and for classical values of / E {1, 2, 4}, this distribution corresponds to the joint law of the eigenvalues of symmetric, hermitian or quaternionic random matrices with density proportional to e-NTr V(M)dM where N is the size of the random matrix M. Recently, great progress has been made to understand the behaviour of /-log gases. At the microscopic scale, the eigenvalues exhibit a universal behaviour (see [BFG13], [BEY14b], [BEY14a], [Bek15], [Shc14]) and the local statistics of the eigenvalues are de- scribed by the Sine, process in the bulk and the Stochastic Airy Operator at the edge (see [VV09] and [RRV11] for definitions). In this article, we study the linear fluctuations of the eigenvalues of general 3-ensembles at the mesoscopic scale; we prove that for a E (0; 1) fixed, f a smooth function (whose regularity and decay at infinity will be specified later), and E a fixed point in the bulk of the spectrum

N f (N(Ai - E)) - N f(N'(x - E))dpv(x) i=1 converges towards a Gaussian random variable. At the macroscopic level (i.e when a = 0), it is known that the eigenvalues satisfy a central limit theorem and the re-centered linear statistics of the eigenvalues converge towards a Gaussian random variable. This was first proved in [Joh98] for polynomial potentials satisfying the one-cut assumption. In [BG13a], the authors derived a full expansion of the free energy in the one-cut regime from which they deduce the central limit theorem for analytic potentials. The multi-cut regime is more complicated and in this setting, the central limit theorem does not hold anymore for all test functions (see [BG13b], [Shc13]). Similar results have also been obtained for the eigenvalues of Random Matrices from different ensembles (see [AZ06], [LP+09b], [Shc11]). Interest in mesoscopic linear statistics has surged in recent years. Results in this field of study were obtained in a variety of settings, for Gaussian random matrices [BdMK99b, FKS13], and for invariant ensembles [BD14, Lam16]. In many cases the results were shown at all scales a E (0; 1), often with the use of distribution specific properties. In more general settings, the absence of such properties necessitates other approaches to obtain the limiting behaviour at the mesoscopic regime. For example, an early paper studying mesoscopic statistics for Wigner Matrices was [BdMK99a], here the regime studied was a E (0; -). This was pushed to a E (0; 1) [Lod15] using improved local law results and recent work has pushed this to all scales [HK16]. The central limit theorem at the mesoscopic scale has also been obtained recently for the two dimensional /-log gas (or Coulomb gas) in [LS16, BBNY16]. Extending these results to one dimensional /-ensembles is a natural step. We also prove convergence at all mesoscopic scales. The proof of the main Theorem relies on the analysis of the loop equations (see Section 2.2) from which we can deduce a recurrence

40 relationship between moments, and the rigidity results from [BEY14b], [BEY14a] to control the linear statistics. Similar results have been obtained before in [BEYY15, Theorem 5.4]. There, the authors showed the mesoscopic CLT in the case of a quadratic potential, for small o (see Remark 5.5). In Section 2.2, we introduce the model and recall some background results and Section 2.4 will be dedicated to the proof of Theorem 2.5.

2.2 Definitions and Background We consider the general -matrix model. For a potential V R -+ R and / > 0, we denote the measure on RN

P (dA1, ... , dAN) jAi - Aj eNJdA V

with zN jo-N17 v(Ai) 1

It is well known (see [MPS95] for the H61der case, and [J] Theorem 2.1 for the contin- uous case) that under PN the empirical measure of the eigenvalues converge towards an equilibrium measure:

Theorem 2.1. Assume that V : R -- + R is continuous and that

lim if V(x)> 1. x-+oo /0log x|

Then the energy defined by

/f (V(xi) + V(x 2 ) / E(pu) =] 2 2 log Ixi - X21 dp(x1)dp(x 2) (2.1)

has a unique global minimum on the space M 1 (R) of probabilitymeasures on R. Moreover, under P > the normalized empirical measure LN = N- 1 zN_ 1 6 converges almost surely and in expectation towards the unique probabilitymeasure AV which minimizes the energy. Furthermore, Av has compact support A and is uniquely determined by the existence of a constant C such that: /3 log Ix - yldpv(y) - V(x) < C , with equality almost everywhere on the support.

Hypothesis 2.2. For what proceeds, we assume the following " V is continuous and goes to infinity faster than / log lxi. * The support of Av is a connected interval A = [a; b] and

dliV dx =pv(x)=S(x)\ (b-x)(x-a) with S>0 on[a;b].

41 * The function V(-) - 13 f log I - -yldpv(y) achieves its minimum on the support only. Remark 2.3. The second and third assumptions are typically known as the one-cut and off-criticality assumptions. In the case where the support of the equilibrium measure is no longer connected, the macroscopic central limit theorem does not hold anymore in generality (see [BG13b] , [Shc13]). Whether the theorem holds for critical potentials is still an open question.

Remark 2.4. If the previous assumptions are fulfilled, and V E CP(R) then S E CP- 3 (R) (see for instance [BFG13], Lemma 3.2). Theorem 2.5. Let 0 < a < 1 , E a point in the bulk (a; b), V E C6 (R) and f E C6 (R) with compact support. Then, under PN

N Z f (N"(Ai - E)) - N f(N"(x - E))dpv(x) - N(0, o,)

where the convergence holds in moments (and thus, in distribution), and

2 2 = 2/3i1 2 JJdxdy.(f(x)_f(y) f 2&72 X __ Y

Note that, as in the macroscopic central limit theorem, the variance is universal in the potential with a multiplicative factor proportional to 3. Interestingly and in contrast with the macroscopic scale, the limit is always centered. The proof relies on an explicit computation of the moments of the linear statistics. We will use two tools: optimal rigidity for the eigenvalues of beta-ensembles to provide a bound on the linear statistics (as in [BEY14b], [BEY14a]) and the loop equations at all orders to derive a recurrence relationship between the moments. For what follows, set

IN N NMNJ M - Npv. LN , i=21 i=1

and for a measure v and an integrable function h set

v(h) = hdv and (h) = hd - E N hdu), (2.2)

when v is random and where EN is expectation with respect to PN. Further f will be any function as in Theorem 2.5, and

fN(x) := f(N0 (x - E)).

Finally, for any function g C CP(R), let

p II9I cP(R) := sup 19(')(X)1, l=0 XER

42 when it exists. Loop Equations: To prove the convergence, we use the loop equations at all orders. Loop equations have been used previously to derive recurrence relationships between correlators and derive a full expansion of the free energy for 3-ensembles in [Shc13], [BG13b], and [BG13a] (from which the authors also derive a macroscopic central limit theorem). The first loop equation was used to prove the central limit theorem at the macroscopic scale in [J] and used subsequently in [BEYY15]. Here, rather than using the first loop equation to control the Stieltjes transform as in [J] and [BEYY15], we rely on the analysis of the loop equations at all orders to compute directly the moments.

1 Proposition 2.6. Let h, hl, h2 , - - - be a sequence of bounded functions in C (R). Define

___ h(x) - h(y)dL)d()# F1N(h) N dLN()dLN(Y) - NLN( ) + - LN(hl) (2.3) 1 2 ffX- y 2 and for all k > 1

hk) := FkN(h, h1 , ---, hk-_)MN(hk) + (k M N(hl) LN(hh') (2.4) / \ 1=1 where the product is equal to 1 when k = 1 and MN was defined by the convention eq. (2.2). Then we have for all k > 1

E (FN(h, h1 , * - , hk-1)) = 0, (2.5) which is called the loop equation of order k. Proof. The first loop equation (2.3) is derived by integration by parts (see also [J] eq. (2.18) for a proof using a change of variables). More precisely, for a fixed index 1, integration by parts with respect to Al yields the equality:

E N (h'(Al)) = -E NV'(Al)II (h(Al) ( Y1 Al-Ai 1, #1i<

Summing over 1 we get by symmetry

N h(AI)-h(X 2 ) N N)3 h'(A,)) = 0 1<

Writing the sums in term of LN gives eq. (2.5) for k = 1. To derive the loop equation at order k + 1 from the one at order k, replace V by V + 6hk and notice that for any functional F that is independent of 6,

3 EN +jhc(F) = -N E (FMN(hk)) 36h= 6a6

43 Also observe that the loop equation eq. (2.5) is now

k-i

E + hk (FkN (h, h1 , .,hk_)) - 6NE+h (( IiN (hl)LN(hh'))

by induction and the definitions given in eqns. (2.3) and (2.4). Differentiating both sides with respect to 6 and setting 6 = 0 yields the loop equation at order k + 1. E It will be easier to compute the moments of MN (f) by re-centering the first loop equation - that is, we wish to replace LN by LN - pv. To that end, define the operator acting on smooth functions h : R -- R by

Bh(x) := h(x) - h(y) dpy(y) - V'(x)h(x)

We now prove the equilibrium relation (2.6) below, to recenter LN by ptv. Consider for a smooth function h : R -+ R and 6 in a neighbourhood of 0, pv,,5 = (x + 6h(x))#pv, where for a map T and measure /t, T#p refers to the push-forward measure of A by T. Then by (2.1) we have E(pyv,5) > E(py), which writes

V(xi + 6h(xi)) + V(x 2 + 3h(x 2)) JJ1~ 2

-log x1 - x 2 + 6(h(x1 ) - h(x2 ))I dpv(xi)dpyv(x 2 ) lg li X-X2) dp/v(xl)dpyv(x2)- > JJ(V(Xi) + V(x 2) 0

As 6 approaches 0 we get

6(JV'(x)h(x)dpv(x) - /3 h(x) h(y)dp()dp(y)) 0(62)> 0

So that ,3 JJh(x) - h()dpy(x)dp (y)= V'(x)h(x)dpv(x) , (2.6)

and thus 0 J fh(x) h(Y) dLN(x)dLN(y) - LN(hV')

-MN (Eh) + JJ - dMN (x)dMN (y) - Consequently, we can write

FN (h) = MN (Eh) + I - )LN (h') + h(x) - (Y)dMN (x)dMN(y] (2.7) 2 N -2J x - yI One of the key features of the operator E is that it is invertible (modulo constants) in the space of smooth functions. More precisely, we have the following Lemma (see [BFG13, Lemma 3.2] for the proof):

44 Lemma 2.7 (Inversion of E). Assume that V E CP(R) and satisfies Hypothesis 2.2. Let [a; b] denote the support of Av and set

dpv - S (x) (b - x)(x - a) = S(x) -(x), dx

where S > 0 on [a; b]. Then for any k E C'(R) there exists a unique constant Ck and h E C(r- 2 )A(P- 3 ) (R) such that B(h) =k+ck.

Moreover the inverse is given by the following formulas: * Vx E supp (Av)

h(x) (- b k(y)-k(x) dy) (2.8) &72S(X) ( o-0,(y) (y -X)

" Vx supp (Av)

h(x) = h3y) dpdv(y) + k(x) + ck 3 dv(y) - V'(x). (2.9)

And ck = - f d pv(y) - k(a). Note that the definition (2.9) is proper since h has been defined on the support.

We shall denote this inverse by E- 1 k.

Remark 2.8. For f and V as in Theorem 2.5, p = 6 and r = 6 so - C (R)-

Remark 2.9. The denominator 3 f X' dpV (y) - V'(x) is identically null on supp pv and behaves like a square root at the edges. As we can modify freely the potential outside any neighbourhood of the support (see for instance the large deviation estimates Section 2.1 of [BG13a]), we may assume that it doesn't vanish outside Av.

In order to bound the linear statistics we use the following lemma to bound E-1 (fN) and its derivatives.

Lemma 2.10. Let supp f c [-M, M] for some constant M > 0. For each p E {0, 1, 2, 3}, there is a constant C > 0 such that

(1=-1N)1 CP (R) < CNPa log N, (2.10)

Moreover, there is a constant C such that whenever x E supp Av and NIx - E| > M + 1

C -7-1JUN) (P) W Na(x - E)P+ ' (2.11) and when x V supp pv

(fNNo ClogN .(2.12)

45 Proof. We start by proving (2.10) on the support. For x E supp pv we use

N a b1 f B 1 (fN)(X) - f(Nat(x - E) + Nc(1 - t)(y - E))dtdy 072S(X) Ja 0-(y) 0 so that

~-(X E~(fN)P (X) 2

1 X b Nt+ [' +1) (Not(x - E) + Na (1 - t)(y - E))dtdy fa J(y) J)

Let A(x) {(t, y) E [0; 1] x [a; b] , Najt(x - E) + (1 - t)(y - E)I M}. We have

2M A 1A (x)(t,y)dt < , /\ 0 No, __-Y1 and thus

1 ClogNN', Iba 0-(y)+0a J01lf('+ )(Nt(x - E) +N"(1 -t)(y - E))Idtdy and this proves (2.10). We now proceed with the proof of (2.11). First, let x E suppllv such that N'flx-EJ > M + 1. The inversion formula (2.8) writes

-1(_ 21 bf(Na (y - E)) (f)() w~ S(X) Ia a (Y) (Y - X) dy(2.13)

w7r2 S(x) j (E + )(u - N(x - E))

and we can conclude in this setting by differentiating under the integral. Moreover we see that E- 1(fN) is in fact of class C' on supp /v and simiar bounds holds for p C {4, 5}. We now prove the bounds for x V supp pv. Let /N be an arbitrary extension of 1 (fN) Isupp [v in C5 (R), bounded by C/Na outside the support (and its five first deriva- tives as well). This is possible by what we just proved and a Taylor expansion. Using (2.9) we notice that

O3 Nf 'N(Y) d Nv +CfN 3 --13N f bX)-Y (Y) +~vy 'Cx gfy fN(.4

Of x1 ydpv(y) - V'(x)

CfN =ON (X) - (VN)(X)- f f XY dpy (y) - V'(x)

46 f has compact support we may write --(ON) - CfN= - )N) - CfN - fN on [a; a + E] and [b - E; b] for e small enough. Furthermore this quantity vanishes identicaly on these intervals by definition of ON. Consequently, E(ON) - CfN and its four first derivatives vanish at the edges. By definition we also get that

|CfNI = N3j dy v(y)

2M/N- (y - a)y - El < log N Na

On the other hand, for p E [4] and x supp gv,

O (+Y E- y) - X)PI -(O) JN(P)-~(y -( N dv(y) - (V'N)'(X)

By doing a similar splitting, and bounding the fifth derivative of ON uniformely away from E, we obtain the same bound Clog N/N' on B(4N)(P) outside the support. By Remark 2.9 and (2.14), we conclude that we can bound the C' norm of E- 1 (fN) by ClogN/Na outside the support. U Sketch of the Proof: We have developed the tools we need to prove Theorem 2.5. In order to motivate the technical estimates in the following section, we now sketch the proof by computing the first moments. Consider a function f satisfying the hypothesis of Theorem 2.5. Applying (2.7) to -'(fN) yields

1 FfN (E- (fN)) =MN(fN) + (1 - LN ( N))

+ I lfN(X-fN(Y)dMN()dMN(y)] N [Off_2 JJ - y

If the central limit theorem holds, we expect terms of the type MN(h) where h is fixed to be almost of constant order, and this an easy consequence of the rigidity estimates from [BEY14a] (stated as Theorem 2.11 below). Due to the dependency in N of fN (and its inverse under E), a little care must be taken for these estimates to yield a bound on the last term in the right handside, and this is precisely the point of Lemma 2.14, eq. (2.24). Similarly, we have

1 LN ((E fN)') = PV ( IfN ') + IMN ((= 1 fN)') and Lemma 2.13 shows the term in the right handside is a small error term. Thus admitting the results of the next section, we would get with high probability and for eN small

1 F1 (- (fN)) = MN(fN) -+- - 2 V N + CN-

47 By the first loop equation from Proposition 2.6, the expectation of Ff is zero and this shows that the first moment

1 E N(MN(fN)) ~- V ((5 fN' + o(1)

The term on the right handside is deterministic and is shown to decrease towards zero in Lemma *INSERT*. Thus the first moment converges to 0. In order to exhibit all the terms we will need to control, we proceed with the compu- tation of the second moment. By definition

F2N (B(fN),fN) = Ff (- 1 (fN)) KIN (fN) + LN (E (fN)fN')

which we can write (with now an EN incorporating the deterministic mean converging to zero)

1 1 F 2N - (N), fN) MN(fN)MN(fN) + EN MN(fN) + LN (B (fN)fN)

Lemma 2.13 ensures that eN N fN) remains small, and that the term in the right handside of the decomposition

) V N (f)fN) + IMN ( 1(fN)f) LN (B 1 (fN) =

is also controlled. Consequently, using the second loop equation we see that

E -MN (N 1 (fN ) =o()

The limit of the term appearing on the right handside is then computed in Lemma 2.14 equation (2.23). The following moments are computed similarly (see Section 2.4). In the following section, we establish all the bounds we need for the proof of Theorem 2.5. The previous steps will then be made rigorous in the last section.

48 2.3 Control of the linear statistics. We now make use of the strong rigidity estimates proved in [BEY14a, Theorem 2.4] to control the linear statistics. We recall the result here Theorem 2.11. Let 'yi the quantile defined by

dpv(x)= N

Then, under Hypothesis 2.2 and for all & > 0 there exists constants c > 0 such that for N large enough 1 3 PN (JAi - il > N-2/3+ i- / ) < e-Nc where i = i A (N + 1 - i). We will use the following lemma quite heavily in what proceeds. Lemma 2.12. Let -yi and i be as in Theorem 2.11, let t G [0; 1], and let Ai, i E [N], be a configuration of points such that |Ai - -yi N- 2 /3 + i-1/3 for 0 < g < (1 - a) A 2, and let M > 1 be a constant. Define the pairwise disjoint sets:

J:{i E [N], IN' (yi - E)I < 2M}, (2.15)

J2 i E Jf,1(-/i - E)J < (E-a) A (b- E), (2.16)

J3 : 1 n JS.

The following statements hold:

(a) For all i E J1 U J2 , i > CN, for some C > 0 that depend only on pv in a neighborhood of E, also for all such i, |7i - i+i| < for a constant C > 0 depending only on pv in a neighborhood of E.

(b) Uniformly in all i E Jf = J2 U J 3 , x c [i, yi+] and all t E [0; 1],

IN't(A - x) + N'(x - E)I > M + 1, (2.17)

for N large enough. (c) The cardinality of J1 is of order CN-, where again, C > 0 depends only on PV in a neighborhood of E.

Proof. The first part of statement (a) holds by the observation that for i e Ji U J2 , 7i is in the bulk, so

0 < C< I dpv (x)= < C< I a N for constants C, c > 0 depending only on pv. For the second part of statement (a), the density of pv is bounded below uniformly in i E J1 U J2 , so

cI7i - 7i+iI < dpv(x) =

49 Statement (b) can be seen as follows: let i C J2 and consider first x = 'y. On this set i > CN by (a), so uniformly in such i, NOAi - il CN- 1 +', which goes to zero, while NOl-y -El > 2M. On the other hand, for i C J3, we have N'jIyj-E > }Na(E-a)A(b-E), which goes to infinity faster than NalAj - 7iy < N0-+, by our choice of (. When we substitute y by x, the same argument holds because N' x - yj I No y - 7+11, which is of order No- on J2 (as we showed in statement (a)) and bounded by CNo- on J3 - Statement (c) follows by the observation that on the set x E [a, b] such that Ix - El < 2 the density of pv is bounded uniformly above and below, so

c / dav(x) = d (x) + 0 - < Na- ix-EI:5 Na iEJi C giving the required result. U The rigidity of eigenvalues, Theorem 2.11, along with the previous Lemma leads to the following estimates

Lemma 2.13. For all 0 < < (1 - a) A 13 there exists constants C, c > 0 such that for N large enough we have the concentration bounds

j(IMN(M(fN))I > CN" llflIc1(R)) e(2.2) P (|MN N)' )j > C~at|c() - e (2.20 PV (IMN (Ek) > Na+ 11f||1 1(R)) 9eN

Proof. Let M > 1 such that supp f c [-M, M] and fix 0 < ( < (1- a) A 3.2. For the remainder of the proof, we may assume that we are on the event Q := {Vi , XAi - 'yj < N-2/3+ i-1/ 3 }. This follows from the fact that, for example,

PN (IMN(fUN) > CNEIIf Ic1(R)) - PV ({IMN(fN)l > CNEl1fllcl(R)} fQ) + N

and by Theorem 2.11, we may bound PV(QC) by eNc for some constant c > 0, and N large enough. On Q, the A satisfy the conditions of Lemma 2.12, we will utilize the sets

J1 , J2, and J3 as defined there. We begin by controlling (2.18). We have that

N IMN(fN)| = f(N'(Ai - E)) - N pv(f)

N N < Yf (Na(A - E)) - f (N'(-yi - E)) (2.21)

N + f(Nc(i - E))- i=1

50 the first term in (2.21) may be bounded (on Q) by N N f(No(A -E)) -f(N (-Yj - E)) = i=1

Na (Ai - -y) f'(tNO'(Aj - 7) + N'(-j - E))dt fo N 1 5 Na-2/3+ i-1/ 3 If'(tN(A - Yi) + Nc(-i - E))|dt, i:=1 By Lemma 2.12 (b), for N large enough, we have

I N 2 3 3 Na- / + i-1/ If'(tN (Ai - -y) + Na ('1 - E))Idt

= j-/ i-1/3 |f'(tNo(A - y}) + No (yj - E))Idt, SiE Ji Na 1 +tI|fI|c1(R) CNE||fI|c1(R),

iE J1 where, in the third line we used Lemma 2.12 (a) and (c) in order. Thus N N f(Nc(Ai - E)) - f(N'(yi - E)) < CN111f ||CI(R). F=r ti=1 For the second term in (2.21),

N b f(N"(-yi - E)) - N f (No'( x - E))dyuv(x)

2i=1

< N YI f(No(i - E)) - f(N'(x - E))Idpv(x) i(E J1

< N+IlfIIcl(R)IJ1I sup(7Yi+ 1 - 7i) dpv(x) CI|f Ic'(R) iE Ji since the spacing of the quantiles in J, is bounded by -!. This proves (2.18). We now proceed with the proof of (2.19). N 1 MN ( fNY)' (fN)'(Ai) - N -- I(fN)x dpv (x) I-ti -Yi+ 1 N

< N i -I(N)X Ip~v -y+ 7-1 (fN)(Ai) - (x) i=1 -Yi N f 1 < N 1 Aj - xI (fN)2) - x) + x) dtdv ,

51 Recall from the proof of Lemma 2.12 that uniformly in i E J2 and x E [yi, yi+1], -yi - El > 2M while lx - 'YI !; further, IAj - xj < CN~1 '+ so for N large enough we can replace jt(Aj - x) + (yi - E)j by -yi - El uniformly in t E [0; 1]. Likewise, uniformly in i E J3, x E ['-y, yi+1] and t E [0; 1] we can bound below lt(Aj - x) + (x - E) I by a constant for what follows.

For i E J2 , by the observations in the previous paragraph, along with Lemma 2.12 (b), Lemma 2.10 (2.11), and Lemma 2.12 (a),

N JEX - f B- 1 (fN)(2) (t(A - x) + x) iEJ2 f4 0

C - )' < N f Yi+1 f ClAi -X1 z (7y4 - E)3 1 EJ - x) + x - E )dtdV(x) - - 2 o Na(jt(Aj 3 iETJ2

The same reasoning for i E J3 using also (2.12) yields

N I 1 E CN-- 3-z. Z+Y I I lA -xI B- (fN)(2)(t(Ai - x) + x) dt dlv(x) < log N iEJ3 iE J3 For i E Ji, by Lemma 2.10 (2.10) and Lemma 2.12 (a), 1 + N E /7i j IAj - xl =- 1 (fN) 2 (t(Ai - x) - x) dt dpv (x) iE J1 7i 0

SZ CN 2a log N|A - xldipv(x) < E CN 2a+ i log N.

iEJ1 f iE J1 It follows that

IMN( =(fN)')I CN2 +, logN + E OM-N-E + log N E CN -- i3

iEJ1 iEJ 2 ('yj E iE J3 < CN+ log N + C log NN+a

b NM-a-1 (EE- CN2M 7 dx 3 < CN +a (yi - E) 3 -- N a (x - E) IE+g (x - E)3 - N -1 CN E i C N - x N < CN -" i EJ3 = This proves (2.19). The bound (2.20) is obtained in a similar way and we omit the details. M For convenience we introduce the following notation: for a sequence of random variable (XN)NEN we write XN = w(1) if there exists constants c, C and 6 > 0 such that the bound IXNI < C holds with probability greater than 1-- e-N. Using Lemma 2.13 we prove the following bounds:

52 Lemma 2.14. The following estimates hold:

LN (=1(fN)) W(1) , (2.22)

LN (fN)f) + o = W(1), (2.23)

YN) (X) - (N)(Y) dMN(x)dMN(y) ) - (2.24) N j Jx - y

Proof. For both (2.22) and (2.23), we use

MN (B(fN) + LN(B1(fN)I) N -1 N),

MN (-r 1 (fN)) LN (-fN f N N

Lemma 2.13 implies that the first term in both equations are w(1) so (2.22) and (2.23) simplify to deterministic statements about the speed of convergence of the integrals against UV above. To show (2.22), integration by parts yields:

fb=- fN)(x)(S()a(x) + S(x) u'(x))dx, I (B-fN)'(x)dpv(x) - -

inserting the formula for -ifN we obtain

jb fN(X)-f N(y) K S,(X)o() + ) dxdy _ (-'N ) V() f - 72 Ja a y - ( S()O-(y) -(y) Recall that S is bounded below on [a, b], S' is bounded above on [a, b], further, up to a constant, . can be bounded above by (-(x)o-(y))- 1 . We define the sets

AN := [N0 (a - E); N'(b - E)], R [ BN :=N"(a - E); -N"(b - E)].

By the observations above, and the change of variable n = N (x - E) and v N'(y - E) we get

(E -fN)(x)dMlv(x) <

Cg 11A2 f(u) - f(v) (o(E + ) 1 N U - V o(E + N) a-(E + u)-(E + v dudv. (2.25)

53 For large enough N, on the set (U, v) E (AN\BN) 2 , the function If(u) - f(v)I is always zero, thus the integral on the right above can be divided into integrals over the sets:

(AN XAN)f(AN\BN XAN\BN)c = BN XBNUBN X (AN\BN)U(AN\BN) XBN- (2.26)

We bound the integral in (2.25) over each set in (2.26). We begin with the first set in (2.26). For (u, v) E BN x BN, -(E + u ) and -(E + ') are uniformly bounded above and below. Therefore, the integral in (2.25) can be bounded in this region by

f(u) z f(v) dudv

f (U) f (V) dudv +f(V) du dv, I_ M;M]2 u -f _M IBNfl{uI the integral over [-M; M 2 exists by the differentiability of f, while:

1 (v) du dv < C JIf (v)Ilog[NIv + MI|v - M] dv < C log N, f-M BNnju~jU-V_ for N large enough. For the second set in (2.25), observe that for (u, v) E BN x (AN\BN), f(v) is 0 for N sufficiently large, and c-(E + u) is bounded uniformly above and below while f(u) is 0 outside [-M; M]. This implies that the integral in (2.25) can be bounded in this region by

f (u) a(E + *) + dudv JAN\BN :-M -V ( E + v ) o-(E + u)-(E + 4 ) )

C||f||IC(R) I1 dV < C NCc IAN\BN -(E+N d C where in the final line we used ju - vI > cN' for u E [-M; M] , v E AN\BN- We can do similarly for the third set in (2.25) and putting together these bounds on the right hand side of (2.25) gives

t

f-1f (xfI(d(1 v (x a(x)f ' (x)(fN(x) - fN(y)) N NX 2 7r Ja a .(y)(X - y)

54 Observe that

2 IOx (fN (X) - fN (Y)) = f(X)(fN (X) - 2 fN (y)), -}(a + b)(x + y) + ab + xy ax (9) ( - y a(x)(x - y)2 Therefore, integration by parts yields

/ N(x)fN(x)dIv(x) =- I Ib 1 b (x)0x(fN() -fN(y)) 2dd I 2#72 a a o(y)(x -y) ab+xy- (a+b)(x+y) 2- x o- y d xd ' 207w Ia Ja(fN(X) x -_ y )

By changing variables again to (u, v) = (No(x - E), No(y - E)) and observing that

2 uv ab+xy -- I(a +b)(x +y) =-u(E) + U + __a+_ + E) 2 No,( ~ we obtain I =- fN{x) fN x p x (2.27) f (u) f ()2 o(E)2 _ (a+b + E) - dd. 207F2 JA2 U - V o-(E + u)o-(E + J

As before, (f(u) - f(v)) 2 is zero for all (u, v) E (AN\BN ) 2 for large enough N, therefore we split the above integral into the regions defined in (2.26). Notice that uniformly in u C BN 1 1 + 0 Ju _____+ u 7(E) and further notice (u + v)/Na and uv/N 2a are bounded uniformly by constants in the entire region AN x AN and converge pointwise to 0 for each (u, v). Consequently the integral (2.27) over the region BN x BN is:

- f(v) 2 o-(E) NQ(a2 Sf(u) +E) N2 ) dudv IIB u -v o-( E + u )o-( E + v) f M 2 u+v a+b 11B2 f(u) + E) Noo-(E) 2 2 N2ao-(E)2) dudv + J( u- V (I1 2 f(u) f(v) +0 (|ul + Iv|)dudv) ( NIalI 1BN (2.28)

55 the first term of (2.28) is equal to,

2Ir2 (f(U)f(v)) 2dud+o(

while the second term in (2.28) can be written as

f (u)- f 2 u -Uv) (Jul + Iv)dudv I f (u) f) (Jul + IvI)dudv 1 1B2N 'B_2 ;M2 u -- )

+2 IM U 2 (|u| + | v|1)dudv, _-M BNnjjm(B

the integral over [-M; M] 2 is finite by differentiability of f while the second is bounded by

2 (v)1 ( i + 2vl dudv -M IBNfl{uI MI

I- log[Nlv-M|lv+M|] ClogN Cof I(v) 12 ( + C[-M iM] |V- M| |M +V|

since supp f C [-M, M]. In the region (u, v) c BN X (AN\BN), u(E + u) is bounded above and below while, for N large enough f(v) = 0, thus the integral over BN X (AN\BN) is bounded above by

f (u) f (v) ) )1 dud IAN\BN JBN \ u v |a( E + u )o-( E + v ) )

LNBN (U())2 1 __dudv f AN\BN f-M \N- o(E+%

< 2 Ia dv

where in the second line we used Ju - vI > cN' for u E [-M; M] and v E AN\B. By symmetry of the integrand in (2.27) this argument extends to the region (U, v) E (AN\BN) x BN. Altogether, our bounds show

f (Y) 2 f W - dxdy + 0 log N J-1fN(x)f(x)dzv(x) = - 2Ir2 J X-Y ) which shows (2.23). We conclude by proving (2.24). The proof will be similar to the proof of Lemma 2.13. As in Lemma 2.13 we may restrict our attention to the event Q = {Vi : IA - yiI <

56 N-3i i} by applying Theorem 2.11. Further, we use again the sets J1, J2 , and J3 defined in Lemma 2.12. The general idea will be that we can use the uniform bounds (2.10) for particles close to the bulk point E (corresponding to the indices in JI), and control the number of such particles. In the intermediary regime we will use the bounds (2.12) or the explicit formula (2.13). On the other hand, for the particles far away from E (corresponding to J3 ) we can use the uniform decay of E-'fN and its derivative by (2.11) and (2.12). Define for j c {1,2,3}:

MU = ( \ - Nl pv) iE Jj

so that MN = M( + M + M(. We can write

//=-1fN)- -(fN)(Y) dMN(x)dMN(y)= x - y d~ dx df (y) - (N N l

Integrating repeatedly for each (3i, j2), and using that Npv ([-y, 7ei+i]) = 1 for all indices i yields:

S (fN)(X) - E(fN) (Y dM ()(x)dM ( ) - X YN N -

f2+1f( 2 fY i1+1 N d[il v(xi)]1 duv(x2) dudv dtj (Ail - x1)(Ai 2 - x2)t(1 - t)

i 1 Ej3 'yiT '2E Jj2

x (tv(Ai - x - - (f 1 1) + Ut(x2 Ai2) u(A i2 x 2 ) + t(xI - x 2 ) x 2 ) (2.29) where T = [0; 1]3. We will bound (2.29) for each pair (Ji, j2). 1 For (jil, j2) = (1, 1). Recall by Lemma 2.12 (c) that IJ1 CN -, and further from 1 the proof of Lemma 2.12 uniformly in i E J1, IAi - xj < CN- whenever x E [7y, 7i+1]- We use (2.29), Lemma 2.10 (2.10) to obtain the upper bound

JJB(fN) (X) -E(fN) (Y) d~~(d W N UdM ()(x) dM y II (fNxx - yN N(Y

N 2 2 5 J J N 3log NjAil - xI|Ai2 - x 2I dliv(xi) dpv(x 2 ) CN +a log N, hiEJ1 1 e 'Yd2

which is w(1) when divided by N.

57 For (Ji, j2) = (3, 3). We bound as in the previous case, using (2.12) and that uni- formely uniformly in i c J3 , IAj - xl C < N-3K-

3 JJ 7(fzv)(X) -' (fN) (Y) (M3) (d)( (x) dM (y) < I Nx S- y N y dMN

2 2 N J + Y' +l | - x1||Ai - z 2 No 2 X21 dpv(xi) dpv(x 2 ) < CN - log N,

which is w(1) when divided by N. For (i, j2) = (2, 2). We remark that the strategy is not as straightforward as the case i E J2 in the proof of Lemma 2.13 (2.19), this is because the term t(xI - x 2 ) + x 2 appearing as an argument in (2.29) may enter a neighborhood of E depending on the indices il,i2 E J2 ; so we may not use the bound Lemma 2.10 (2.11) uniformly in i1 , not Z2 E J 2 . Some care is needed also because MN is a signed measure so IMN(g)I need be bounded by MN(Ig9). It will be convenient to use directly (2.13) from the proof of Lemma 2.10 (this can be done as J2 is located outside the support of f). We can write for x, y E {z E supp pv, N'z - El > M +1}

=-1(fN)(X) - N x - y

1 ) (du 7 _M o(E + )(x - y) S(y)(u - Nc(y - E)) S(x)(u - Nt(x - E))

1_ f (u) S(x) - S(y) 1 -M o-(E + u) (x - y) S(x)S(y)(u - No(y - E)) N0 S(x)(u - Nc(x - E))(u - No(y - E)) (2.30) When we integrate the term on the third line of (2.30) against M2 M (, we obtain

f(U) M(2) (f S(t(- -Y) +Y) dt 1 N2) (y) du, _M o- (E + u ) N 0 S(-SMy (u - NO (y - E)) N (2.31) define the function

g(y) : N M2) S'(t - y) + y) dt)

58 first, g(y) is bounded for any y e [a; b]:

M N) o S'(t(.S(-)S(y) - y) + y) dt

N SS') 1L (ST(Ai - y) + y) S'(t(x - y) + y) dt dpv (x) S(y) E S(Aj) S(x) 1 N E +i S'(t(Ai - y) +y) - S'(t(x - y) +y) S() Jo S(A ) dtd pv(x)

N S(x) - S(Aj) S'(t(x - y) + SS(y y)dtdgv(x) CN , i ) iEJ2 + J S(x)S(A where in the final line we used S and S' are smooth on [a; b] (and therefore uniformly 1 Lipschitz), S > 0 in a neighborhood of [a; b], further Ix - AiI < CN , and IJ21 < CN. Moreover, g(y) is uniformly Lipschitz in [a; b] with constant CN , since:

S'(t( S'(t(- - y) + y) - z) + z) dt o 1 S(-)S(y) S()S(z)J - +t(- - z) + =(z - y)M (j tS"(ut(z y) y)dtdu) S(.SWy

S(z) - S(y)M( S'(t(. SS(z)S(y) -z)+ z)dt (jo S(-O

i (2) and both terms appearing in M above are of the same form as g so they are bounded by CN . Returning to (2.31), we may bound

g(y) 2) u - Na(y - E))

g(A) - N'(Aj - x)g(x) =NZ g(x) dpuv(x) I -Y1 (u - Na(Aj - E))+ (u - N"(Ai - E))(u - NO(x - E)) iEJ.2 [ CN - 2 >2M - Ncl(x - E)I (u - No (x - E)) dx

) N f f) M 2) (2) 1 du. (2.32) J-M or(E +g-) N S(.)(u - NO(. - E))) M(u - Na(- - E))

59 Repeating our argument in the previous paragraph gives:

M(2) < CN-log N + CN, N S(-)(u - Na(- E)))

M(2) 1 < CN MN (u - Na( - - E)) where in the first inequality we use 1/S is uniformly bounded and uniformly Lipschitz on [a; b]. Inserting the bounds into (2.32) gives an upper bound of CN 2+', as f is bounded. Altogether

B-1 fN)()-51fN (ydMN (x)dM (y) S- y I which is w(1) when divided by N. For (J, J2) = (1, 2). By the bounds IAi, - 7

N'ltv(Ai, - Xi) + ut(x 2 - Ai 2 ) + t(X 1 - X2) + u(Ai2 - X2) + 2 - El > M + 1, (2.33) we have

It(l - 73 2 ) + (b 3 2 - E)f + CN-1 ;> M + 1, by triangle inequality. It follows that for N sufficiently large, uniformly in xi E [7i1; 1ii+1],

X2 E [i 2 ; Yi2 +1], u, V E [0; 1] 1

ltv(Ail - i) + ut(X 2 - Ai 2 ) + t(c1 - X2)+ u(Ai2 - X2) + 2 - El C < lt(7ii- y%) + (7yi 2 - E)I' where the constant C only depends on M. Therefore, whenever (2.33) is satisfied, applying Lemma 2.10 (2.11) yields

1 3 - (fN)( ) (tv(Asi - XI) + ut(X 2 - Ai2) + t( 1 - X2) u(Ai 2 - 2) + 2) C (2.34) <1

- N(t(-yj - 7i 2 ) + 7i 2 - E)4

Now, let t E (0, 1) fixed and define the sets

2M (t - 2M J - +- -E;> {j 2 , tE N 7i ) NcJ' Kt:= 2M \ 2M C J , t (E+ +- j-E < {j 2 Na ) - , Kt := Kt U Kt.

60 By construction, if i2 E K then 2M (7- 2) + ( 2 - E) __ uniformly in i1 E J1 . Thus for such i2 E Kt, (2.33) is satisfied for N sufficiently large on (uniformly in u, v, x1 , and X2 ; also the choice of how large N must be only depends and pv). The same statement holds for Kt. We now proceed to bound (2.29) for ji = 1 and J2 = 2 by splitting J2 into the regions Kt, Kt and J 2 \Kt. We start with Kt (the argument for Kt is identical). Our observations from the previous paragraph along with (2.34) gives:

//7f11i 2 +1 2 ]dudvdt N ) dpv(x 1 )] dyv(x 2 ){ Ai1 - x1 )(Ai 2 - x 2)t(1 - t) T i1 Ej1 Y i '2EKt

3 x - (fN)( ) (tv(A3) - X1) + Ut(X 2 - A i2 + u(Ai 2 - X2 ) t(x1 - X2 ) + X2 ) /1 z CN-2-at(1 - t) dt < CN-1-2t(1 - t) dt 4 4 Jo EI (tbil - 7Y2) + (7i 2 - E)) - ((1 - )(7Y 2 - E) - tfM ) 2EK1 where in the final line we used Ji < CN '-1 from Lemma 2.12 (c). Next, note that

1 1 E+ (E-a)A(b-E) dx

since, by definition of Kt, 7i 2 > E + ). We conclude,

2 2 K CN +a. I/ 1 . CN2 -1- at(1 - t) dt i2EK' ((1 - )(7y 2 - E) - t) -

We continue with J2 \Kt. By the same argument as in Lemma 2.12 (c) J2\KtI < 1 CN a where the constant C does not depend on t, we use this in addition with 1-t Lemma 2.10 (2.11), |JIA < CN 1-", and JAi, - xj < CN- 1 to obtain the bound

-t) 2 dpv(x2){(Ail -x 1 )(Ai 2 -x 2 )t(1 dudvdt N S J dpv(x1)J j T i EJi1i 'YY27 \Kt i2 EJ 2

x - (fN)(3) (tv(A _- X1) - Ut(X2 - Ai2 ) +u(Ai 2 - X2) + t(XI - X2) + X2 )

< C j N 3a log N x N2 2 x N 2- 2 0t dt < CNa+x2 log N.

61 Combining the bounds we have obtained gives

JJ fN() - fN( ()dM(x)dM () < CNO'x log N,

which is w(1) when divided by N for small enough. For j, = 1 or 2 and j2 = 3. the proof is similar and we omit the details. U

2.4 Proof of Theorem 2.5. We proceed with the proof of Theorem 2.5. As we did in the sketch of the proof, by (2.7) applied to h = B- 1 (fN) yields

1 F ( (fN)) =MN (fN) +I - LN ((B-fN)')

[l J-lfN() - IfN(Y) dMN(x)dMN()] + . N 2 xy x -

Combining Lemma 2.14 (2.22) and (2.24) we can bound the two terms on the right hand side to get F1 (=-1(fN )) = MN (fN) + W(I). (2.35)

We consider an event A 1 of probability higher than 1 - e-Nc on which C |FN( - 1 (fN)) - MN fN) < '

for some positive constants c, C and J. Using the first loop equation from Proposition 2.6, and the trivial deterministic bounds

MN(fN) = O(NIjfI11) , F(-'(fN)) = O(N(lf 11. + I-(fN)IICl(R))= O(N 2 ),

we obtain

o = Eg (FlN(I-1(fN)) = Ef (Ff(E-1(fN))1A) + Eg (Ff (B~'(fN)A

= EV (MN(fN)1A,)+ o(1) + O(N 2 P (Ac))

= EV (MN (fN)) + 0(1) ,

and thus EV (MN(fN)) = o(1)- (2-36)

We now show recursively that

1 2 F(-- (fN), fN, ',fN) = fNN) - (k - 1)O2 MN(fN)k

62 Here, the set on which the bound holds might vary from one k to another but each bound has probability greater than 1 - e-Nck for each fixed k. The bound holds for k = 1, by (2.35). Now, assume this holds for k > 1. On a set of probability greater than 1 - e-NCk+1 we have by the induction hypothesis, Lemma 2.13 (2.18), and Lemma 2.14 (2.23), for some 6> 0 and a constant C

1 FkN -F(fN), fNv ,fN)-~MN(fN) +(k -)o N(fN) N

LN _ (fN)fN) + 0-21 < C IMN(fN~ MN~fN < NN/2k6/'

On this set, using the definition of FN+l from Proposition 2.6,

FkN -_1(f

=FkN (-(fN) fN, fN)N (fN)+ IMN (fN)kl LNf

= (fNk- (k - 1)o MNfN) k-2 O )N (fN)

2 + MN(fN)k-(

N12 =MN(fN) k+1-k MN fN) k- oj/ 2 )

and this proves the induction. Using the fact that Fk is bounded polynomially and deter- ministically as before, we see that for any k > 1

E V(MN(fN)k+1) = 0- k E NMN (fN)k-1 (1).

Coupled with (2.36), the computation of the moments is then straightforward and we obtain for all k E N N (N \2k\ =,2k (2k)! EVMNfN) f 2kk! (1)

EV (MN(fN)2k+1) = o(j)

This concludes the proof of Theorem 2.5.

Remark 2.15. The same proof would also show the macroscopic central limit Theorem already shown in [BG13b, Shc13, J] with appropriatedecay conditions on f.

63 PART 3

MARiENKO-PASTUR LAW FOR KENDALL'S TAU

Based on joint work with A. S. Bandeira and P. Rigollet

Acknowledgments: The authors thank Alice Guionnet for helpful comments and dis- cussions. A. S. B. was supported by Grant NSF-DMS-1541100. Part of this work was done while A. S. B. was with the Department of Mathematics at MIT. A. L. was sup- ported by Grant NSF-MS-1307704. P. R was supported by Grants NSF-DMS-1541099, NSF-DMS-1541100 and DARPA-BAA-16-46 and a grant from the MIT NEC Corporation.

64 3.1 Introduction Estimating the association between two random variables X, Y E R is a central statistical problem. As such many methods have been proposed, most notably Pearson's correlation coefficient. While this measure of association is well suited to the Gaussian case, it may be inaccurate in other cases. This observation has led statisticians to consider other measures of associations such as Spearman's p and Kendall's T that can be proved to be more robust to heavy-tailed distributions (see, e.g., [LHY+12]). In a multivariate setting, covariance and correlation matrices are preponderant tools to understand the interaction between variables. They are also used as building blocks for more sophisticated statistical questions such as principal component analysis or graphical models. The past decade has witnessed an unprecedented and fertile interaction between ran- dom matrix theory and high-dimensional statistics (see [PA14] for a recent survey). Indeed, in high-dimensional settings, traditional asymptotics where the sample size tends to infin- ity fail to capture a delicate interaction between sample size and dimension and random matrix theory has allowed statisticians and practitioners alike to gain valuable insight on a variety of multivariate problems. The terminology "Wishart matrices" is often, though sometimes abusively, used to refer to p x p random matrices of the form XTX/n, where X is an n x p random matrix with independent rows (throughout this paper we restrict our attention to real random matrices). The simplest example arises where X has i.i.d standard Gaussian entries but the main characteristics are shared by a much wider class of random matrices. This universality phenomenon manifests itself in various aspects of the limit distribution, and in particular in the limiting behavior of the empirical spectral distribution of the matrix. Let W = XTX/n be a p x p Wishart matrix and denote by A,..., AP its eigenvalues; then the empirical spectral distribution 4, of W is the distribution on R defined as the following mixture of Dirac point masses at the Ajs:

k=1

Assuming that the entries of X are independent, centered and of unit variance, it can be shown that p, converges weakly to the Mareenko-Pastur distribution under weak moment conditions (see [EKYY12] for the weakest condition). While this development alone has led to important statistical advances, it fails to capture more refined notions of correlations, notably more robust ones involving ranks and therefore dependent observations. A first step in this direction was made by [YK86], where the matrix X is assumed to have independent rows with isotropic distribution. More recently, this result was extended in [BZ08, O'R12] and covers for example the case of Spearman's p matrix that is based on ranks, which is also a Wishart matrix of the form XTX/n. The main contribution of this paper is to derive the limiting distribution of Kendall's T matrix, a cousin of Spearman's p matrix but which is not of the Wishart type but rather a matrix whose entries are U-statistics. Kendall's T matrix is a very popular surrogate for correlation matrices but an understanding the fluctuations of its eigenvalues is still missing.

65 Interestingly, Mareenko-Pastur results have been used as heuristics, without justification, precisely for Kendall's T in the context of certain financial applications [CCL+15]. As it turns out, the limiting distribution of A, is not exactly Mareenko-Pastur, but rather an affine transformation of it. Our main theorem below gives the precise form of this transformation.

Theorem 3.1. Let X1 ,..., Xn, be n independent random vectors in RP whose compo- nents Xi(k) are independent random variables that have a density with respect to the Lebesgue measure on R. Then as n -- oo and nE -+ -y > 0 the empirical spectral distribu- tion of T converges in probability to

2- Y + 1 3 3'

where Y is distributed according to the standard Martenko-Pasturlaw with parameter -Y (see Theorem D.1 for the appropriatedefinition).

Notation: For any integer k > 1 we write [k] = {1,... , k}. We denote by I, the identity matrix of RP. For a vector x E RP, we denote by x(j) it's jth coordinate. For any p x p matrix M, we denote by diag (M) the p x p diagonal matrix with the same diagonal elements as M and any real number r, we define D,(M) = M - diag (M) + rIp. In other words, the operator D, replaces each diagonal element of a matrix by the value r. We denote sign the sign function with convention that sign(0) = 1. We define the Frobenius norm of a p x p matrix H as 11HI11 := Tr (HT H). Finally, we define Unif([a, b]) to be the uniform distribution on the interval [a, b].

3.2 Kendall's Tau The (univariate) Kendall T statistic [Ess24, Lin25, Lin29, Ken38] is defined as follows. Let

(Y1 , Z 1 ), ... , (Y, Z,) be n independent samples of a pair (Y, Z) c R x R of real-valued random variables. Then the (empirical) Kendall T between Y and Z is defined as

T(Y, Z) =- 1 - >

The statistic T takes values in [-1, 1] and it is not hard to see that it can be expressed as

1 T = (# {concordant pairs} - # {discordant pairs})

Where a pair (i, j) is said to be concordant if Y - Y and Zi - Z have the same sign and discordant otherwise. It is known that the Kendall T statistic is asymptotically Gaussian (see, e.g., [Ken38]). Specifically, if Y and Z are independent, then as n - oc,

V'nT(Y, Z) -> r O, .) (3.1)

66 This property has been central to construct independence tests between two random vari- ables X and Y (see, e.g, [KG90]).

Kendall's T stastistic can be extended to the multivariate case. Let X 1 , ... , Xn, be n independent copies of a random vector X E RP, with independent coordinates X(1), ... , X(p). The (empirical) Kendall T matrix of X is defined to be the p x p matrix whose entries rkI are given by

\tau_{kl} := \tau(X(k), X(l)) = \binom{n}{2}^{-1} \sum_{1 \le i < j \le n} sign(X_i(k) - X_j(k))\, sign(X_i(l) - X_j(l)), \qquad 1 \le k, l \le p. \qquad (3.2)

Note that \tau can be written as an average of \binom{n}{2} rank-one random matrices:

\tau = \binom{n}{2}^{-1} \sum_{1 \le i < j \le n} sign(X_i - X_j) \otimes sign(X_i - X_j), \qquad (3.3)

where the sign function is applied coordinatewise.

It is easy to see that \tau_{ii} = 1 for all i. Together with (3.1), it implies that the matrix

\tilde{\tau} := \tfrac{3}{2}\tau - \tfrac{1}{2} I_p = D_1\big(\tfrac{3}{2}\tau\big) is such that E[\tilde{\tau}] = I_p and Var[\sqrt{n}\, \tilde{\tau}_{ij}] \to \mathbf{1}\{i \ne j\} as n \to \infty. This suggests that if the empirical spectral distribution of \tilde{\tau} converges to a Marchenko-Pastur distribution, it should be a standard Marchenko-Pastur distribution. This heuristic argument supports the affine transformation arising in Theorem 3.1. However, the matrix \tau is not Wishart, and the Marchenko-Pastur limit distribution does not follow from standard arguments. Nevertheless, Kendall's \tau is a U-statistic, and U-statistics are known to satisfy the weakest form of universality, namely a central limit theorem under general conditions [Hoe48, dlPG99]. In this paper, we show that in the case of the Kendall \tau matrix, this universality phenomenon extends to the empirical spectral distribution.

3.3 Proof of Theorem 3.1.

For any pair (i, j) such that 1 \le i, j \le n and i \ne j, let A^{(i,j)} be the vector

A^{(i,j)} := sign(X_i - X_j), and recall from (3.3) that

\tau = \binom{n}{2}^{-1} \sum_{1 \le i < j \le n} A^{(i,j)} \otimes A^{(i,j)}.

Akin to most asymptotic results on U-statistics, we utilize a variant of Hoeffding's (a.k.a. Efron-Stein, a.k.a. ANOVA) decomposition [Hoe48]:

A^{(i,j)} = \tilde{A}^{(i,j)} + A^{(i,\cdot)} + A^{(\cdot,j)}, \qquad (3.4)

where

A^{(i,\cdot)} := E\big[A^{(i,j)} \,\big|\, X_i\big], \qquad A^{(\cdot,j)} := E\big[A^{(i,j)} \,\big|\, X_j\big], \qquad \tilde{A}^{(i,j)} := A^{(i,j)} - A^{(i,\cdot)} - A^{(\cdot,j)}.

It is easy to check that each of the vectors on the right-hand side of (3.4) is centered and that they are orthogonal to one another with respect to the inner product E[v^T w], v, w \in R^p. These random vectors can be expressed conveniently thanks to the following lemma.

Lemma 3.2. For k \in [p], let F_k denote the cumulative distribution function of X(k). Fix i \in [n] and let U_i \in R^p be the random vector with k-th coordinate given by U_i(k) = 2F_k(X_i(k)) - 1 \sim Unif([-1, 1]). Then

A^{(i,\cdot)} = -A^{(\cdot,i)} = U_i.

Proof. For any i \in [n], observe that since the components of X have a density, P(X_i(k) = X_j(k) \mid X_i) = 0, so that

E\big[sign(X_i(k) - X_j(k)) \,\big|\, X_i\big] = P\big(X_i(k) > X_j(k) \,\big|\, X_i\big) - P\big(X_i(k) < X_j(k) \,\big|\, X_i\big) = 2F_k(X_i(k)) - 1.

The observation that A^{(i,\cdot)} = -A^{(\cdot,i)} follows from the fact that A^{(i,j)} = -A^{(j,i)}.
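A minimal numerical check of Lemma 3.2 (a sketch under assumed Exp(1) marginals; sizes, seed, and variable names are illustrative, not from the thesis): conditionally on X_i = x, the Monte Carlo average of sign(x - X_j) should match 2F(x) - 1 coordinatewise, and U_i(k) = 2F_k(X_i(k)) - 1 should have the Unif([-1, 1]) moments used in (3.7) below.

```python
import numpy as np

rng = np.random.default_rng(2)
p, m = 3, 200_000

x = rng.exponential(size=p)            # one fixed realization of X_i, Exp(1) marginals
Xj = rng.exponential(size=(m, p))      # many independent copies of X_j

# Lemma 3.2: E[sign(X_i(k) - X_j(k)) | X_i = x] = 2 F_k(x(k)) - 1, with F_k(t) = 1 - exp(-t).
cond_mean = np.sign(x - Xj).mean(axis=0)
U = 2 * (1 - np.exp(-x)) - 1
print("Monte Carlo E[sign(x - X_j) | x]:", np.round(cond_mean, 3))
print("2 F(x) - 1                      :", np.round(U, 3))

# U_i(k) = 2 F_k(X_i(k)) - 1 is Unif([-1, 1]): mean 0 and second moment 1/3.
Ui = 2 * (1 - np.exp(-rng.exponential(size=m))) - 1
print("mean, second moment of U_i(k):", round(Ui.mean(), 4), round((Ui ** 2).mean(), 4))
```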

Using (3.4) we obtain the equality

A^{(i,j)} \otimes A^{(i,j)} = M^{(1)}_{(i,j)} + M^{(2)}_{(i,j)} + \big(M^{(2)}_{(i,j)}\big)^T + M^{(3)}_{(i,j)}, \qquad (3.5)

where

M^{(1)}_{(i,j)} := I_p + D_0\big[\{A^{(i,\cdot)} + A^{(\cdot,j)}\} \otimes \{A^{(i,\cdot)} + A^{(\cdot,j)}\}\big],

M^{(2)}_{(i,j)} := D_0\big[\tilde{A}^{(i,j)} \otimes \{A^{(i,\cdot)} + A^{(\cdot,j)}\}\big],

M^{(3)}_{(i,j)} := D_0\big[\tilde{A}^{(i,j)} \otimes \tilde{A}^{(i,j)}\big].

By the relation A^{(i,\cdot)} = -A^{(\cdot,i)} from Lemma 3.2 we have

\binom{n}{2}^{-1} \sum_{1 \le i < j \le n} M^{(1)}_{(i,j)} = I_p + \binom{n}{2}^{-1} \sum_{1 \le i < j \le n} D_0\big[(U_i - U_j) \otimes (U_i - U_j)\big].

Expanding the products yields

\binom{n}{2}^{-1} \sum_{1 \le i < j \le n} M^{(1)}_{(i,j)} = I_p + \frac{2}{n} \sum_{i=1}^{n} D_0\big[U_i \otimes U_i\big] - \binom{n}{2}^{-1} \sum_{(i,j) \in [n]^2 : i \ne j} D_0\big[U_i \otimes U_j\big]. \qquad (3.6)

Next, note that the coordinates of each U_i, i = 1, ..., n, are mutually independent, so that E[U_i] = 0 and

E[U_i \otimes U_i] = E[T^2]\, I_p = \frac{1}{3}\, I_p, \qquad (3.7)

where T \sim Unif([-1, 1]). Theorem D.1 implies that, as n \to \infty and p/n \to \gamma > 0, the empirical spectral distribution of

\frac{2}{n} \sum_{i=1}^{n} U_i \otimes U_i

converges in probability to (2/3)Y, where Y is distributed according to the standard Marchenko-Pastur law with parameter \gamma. Moreover,

\frac{1}{p}\, E\Big\| \frac{2}{n} \sum_{i=1}^{n} diag(U_i \otimes U_i) - \frac{2}{3}\, I_p \Big\|_F^2 = \frac{4}{p n^2} \sum_{k=1}^{p} E\Big[\Big( \sum_{i=1}^{n} \{U_i(k)^2 - E[U_i(k)^2]\} \Big)^2\Big] \le \frac{C}{n} \longrightarrow 0,

\frac{1}{p}\, E\Big\| \binom{n}{2}^{-1} \sum_{(i,j) \in [n]^2 : i \ne j} D_0[U_i \otimes U_j] \Big\|_F^2 = \frac{1}{p} \binom{n}{2}^{-2} \sum_{(k,l) \in [p]^2 : k \ne l}\ \sum_{(i,j) \in [n]^2 : i \ne j} E[U_i(k)^2]\, E[U_j(l)^2] \le \frac{C p}{n^2},

for some constant C > 0 independent of n. By Lemma D.2, the normalized Frobenius norm bounds the Lévy distance between spectral measures. An application of Lemma D.2, together with (3.6), the triangle inequality and the above bounds, yields the following result.
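The two bounds above can be eyeballed numerically. The sketch below (illustrative sizes and seed; not the thesis' code) draws the U_i directly as Unif([-1, 1])^p vectors and checks that the diagonal correction is O(1/n) and the cross term is O(p/n^2) in the normalized Frobenius norm.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 2000, 500
U = rng.uniform(-1.0, 1.0, size=(n, p))       # rows play the role of U_1, ..., U_n

# (1/p) || (2/n) sum_i diag(U_i (x) U_i) - (2/3) I_p ||_F^2, expected to be O(1/n)
diag_term = (2.0 / n) * (U ** 2).sum(axis=0) - 2.0 / 3.0
print("diagonal term: %.2e   (1/n = %.2e)" % ((diag_term ** 2).mean(), 1.0 / n))

# (1/p) || binom(n,2)^{-1} sum_{i != j} D_0[U_i (x) U_j] ||_F^2, expected to be O(p/n^2)
s = U.sum(axis=0)
M = np.outer(s, s) - U.T @ U                  # sum over i != j of U_i (x) U_j
np.fill_diagonal(M, 0.0)                      # apply D_0
M /= n * (n - 1) / 2
print("cross term   : %.2e   (p/n^2 = %.2e)" % ((M ** 2).sum() / p, p / n ** 2))
```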

Proposition 3.3. As n \to \infty and p/n \to \gamma > 0, the empirical spectral distribution \tilde{\mu}_n of

\binom{n}{2}^{-1} \sum_{1 \le i < j \le n} M^{(1)}_{(i,j)}

converges in probability to the law of \frac{2}{3}\, Y + \frac{1}{3}, where Y is distributed according to the standard Marchenko-Pastur law with parameter \gamma.

Let \mu_n denote the empirical spectral distribution of \tau. Using Lemma D.2 once more, we show that the Lévy distance between \mu_n and \tilde{\mu}_n converges to zero. This implies Theorem 3.1 by Proposition 3.3. To that end, observe that by (3.5) and the triangle inequality:

\frac{1}{p}\, E\Big\| \tau - \binom{n}{2}^{-1} \sum_{1 \le i < j \le n} M^{(1)}_{(i,j)} \Big\|_F^2 \le \frac{2}{p}\, E\Big\| \binom{n}{2}^{-1} \sum_{1 \le i < j \le n} \big\{ M^{(2)}_{(i,j)} + (M^{(2)}_{(i,j)})^T \big\} \Big\|_F^2 + \frac{2}{p}\, E\Big\| \binom{n}{2}^{-1} \sum_{1 \le i < j \le n} M^{(3)}_{(i,j)} \Big\|_F^2. \qquad (3.8)

We claim that

E\, Tr\Big[ \big(M^{(3)}_{(i,j)}\big)^T M^{(3)}_{(i',j')} \Big] = 0 \quad \text{unless } (i, j) = (i', j'). \qquad (3.9)

To see this, expand

E\, Tr\Big[ \big(M^{(3)}_{(i,j)}\big)^T M^{(3)}_{(i',j')} \Big] = \sum_{(k,l) \in [p]^2 : k \ne l} \Big( E\big[ \{A^{(i,j)}(k) - U_i(k) + U_j(k)\}\{A^{(i',j')}(k) - U_{i'}(k) + U_{j'}(k)\} \big]

\times E\big[ \{A^{(i,j)}(l) - U_i(l) + U_j(l)\}\{A^{(i',j')}(l) - U_{i'}(l) + U_{j'}(l)\} \big] \Big), \qquad (3.10)

and notice that each expectation is zero unless (i, j) = (i', j') by the tower property and Lemma 3.2. Note also that when (i, j) = (i', j'), the expression (3.10) is bounded by Cp^2 for some C > 0. The identity (3.9) also holds for the collection of matrices \{M^{(2)}_{(i,j)}\}, and we also have E\|M^{(2)}_{(i,j)}\|_F^2 \le Cp^2 by a similar argument. Therefore the right-hand side of (3.8) is bounded by

\frac{C p^2}{p} \binom{n}{2}^{-2} \times \#\big\{ (i, j, i', j') \in [n]^4 : (i, j) = (i', j') \big\} \le \frac{C p}{\binom{n}{2}},

for some constant C > 0, which vanishes as n \to \infty. This concludes the proof of Theorem 3.1. \square
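The closing Frobenius-norm step can also be seen numerically. The sketch below (uniform marginals so that F_k(x) = x, with arbitrary sizes and seed; not the thesis' code) compares τ with the averaged M^(1) matrices and confirms that their normalized Frobenius distance, and hence by Lemma D.2 the Lévy distance between their spectral measures, is small.

```python
import numpy as np

rng = np.random.default_rng(6)
n, p = 300, 75
X = rng.uniform(size=(n, p))                 # Unif(0,1) coordinates, so F_k(x) = x
U = 2 * X - 1                                # U_i from Lemma 3.2

# tau and the averaged M^(1) matrices over all pairs i < j
tau = np.zeros((p, p))
M1 = np.zeros((p, p))
for i in range(n):
    S = np.sign(X[i] - X[i + 1:])            # A^{(i,j)} for j > i
    D = U[i] - U[i + 1:]                     # A^{(i,.)} + A^{(.,j)} = U_i - U_j
    tau += S.T @ S
    M1 += D.T @ D
npairs = n * (n - 1) / 2
tau /= npairs
M1 /= npairs
np.fill_diagonal(M1, 1.0)                    # M^(1) = I_p + D_0[...]

gap = ((tau - M1) ** 2).sum() / p            # (1/p) || tau - avg M^(1) ||_F^2
print("normalized Frobenius gap: %.2e   (p/n^2 = %.2e)" % (gap, p / n ** 2))
```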

APPENDIX

A. Weak Convergence results

Theorem A.1 (Martingale Central Limit Theorem, [Bil08, Theorem 35.12]). Let M_{k,N}, 1 \le k \le N, N \ge 1, be a sequence of zero-mean, square-integrable martingales adapted to the filtration F_{k,N}, and let F_{0,N} denote the trivial \sigma-field. Let Y_{k,N} denote the martingale difference sequence Y_{k,N} = M_{k,N} - M_{k-1,N}, 1 \le k \le N. Suppose the following conditions hold:

\sum_{k=1}^{N} E\big[ Y_{k,N}^2\, \mathbf{1}\{|Y_{k,N}| > \epsilon\} \,\big|\, F_{k-1,N} \big] \longrightarrow 0 \quad \text{in probability, for all } \epsilon > 0,

\sum_{k=1}^{N} E\big[ Y_{k,N}^2 \,\big|\, F_{k-1,N} \big] \longrightarrow \sigma^2 \quad \text{in probability},

where \sigma^2 > 0. Then M_{N,N} converges in distribution to a Gaussian random variable with mean 0 and variance \sigma^2.

Theorem A.2 ([CK86, Theorem 1; Lemma 1]). Let \Phi be a space of measurable functions \phi(z, w) : H \times C \to C such that \phi(z, \cdot) is continuous for all z \in H. Let M be a set of measurable functions x : H \to C such that

\int_{H} |\phi(z, x(z))|\, dz < \infty \quad \text{for all } \phi \in \Phi.

Suppose \{\xi_N : N \in N_0\} is a sequence of stochastic processes \xi_N(z) : H \to C with paths in M. If the finite-dimensional distributions of \xi_N converge weakly to those of \xi_0 Lebesgue-almost everywhere in H, and for all \epsilon > 0 and \phi \in \Phi:

\lim_{K \to \infty} \limsup_{N \to \infty} P\Big( \int_{H} \big( |\phi(z, \xi_N(z))| - K \big)_+\, dz \ge \epsilon \Big) = 0,

\inf_{B \subset H,\ \lambda(B) < \infty}\ \limsup_{N \to \infty} P\Big( \int_{H \setminus B} |\phi(z, \xi_N(z))|\, dz \ge \epsilon \Big) = 0,

then \int_{H} \phi(z, \xi_N(z))\, dz converges in distribution to \int_{H} \phi(z, \xi_0(z))\, dz for every \phi \in \Phi.
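Theorem A.1 is easy to see in action. The following sketch is an assumed toy construction, not from the thesis: it builds a martingale whose increments depend on the past through their sign but whose conditional variances sum to exactly 1, and checks that the terminal value has the Gaussian moments predicted by the theorem.

```python
import numpy as np

rng = np.random.default_rng(3)

def martingale_sum(N):
    """M_{N,N} for increments Y_k = xi_k * sign(M_{k-1}) / sqrt(N), xi_k iid N(0,1)."""
    M = 0.0
    for _ in range(N):
        xi = rng.standard_normal()
        # E[Y_k | F_{k-1}] = 0 and E[Y_k^2 | F_{k-1}] = 1/N, so the conditions of
        # Theorem A.1 hold with sigma^2 = 1.
        M += xi * (1.0 if M >= 0 else -1.0) / np.sqrt(N)
    return M

samples = np.array([martingale_sum(500) for _ in range(2000)])
print("sample mean     (should be ~ 0):", round(samples.mean(), 3))
print("sample variance (should be ~ 1):", round(samples.var(), 3))
```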

B. Concentration inequalities and bounds on the resolvent

In this section we record some important bounds required in Part 1. We remind the reader of the notation used in that Part:

\delta_{k,1}(z) := h_k^T G_k(z) h_k - N^{-1}\, Tr\big(G_k(z)\big), \qquad \delta_{k,2}(z) := h_k^T G_k^2(z) h_k - N^{-1}\, Tr\big(G_k^2(z)\big), \qquad (B.1)

where h_k denotes the k-th column of W with its k-th element removed. The matrix W_k is defined as the matrix W with the k-th row and column erased, and G_k(z) := (W_k - z)^{-1} is the resolvent of W_k.

Lemma B.1. Let z = E + (\tau + i\eta)/d_N with fixed \eta > 0, \tau \in R, and E \in (-2 + \delta, 2 - \delta) for some fixed \delta > 0. Then for all q \ge 1, there are positive constants c_1, c_2 such that

E\big|\delta_{k,1}(z)\big|^q \le c_1 \Big( \frac{N\eta}{d_N} \Big)^{-q/2}, \qquad (B.2)

E\Big| \frac{1}{z + N^{-1}\, Tr(G_k(z))} \Big|^q \le c_2, \qquad (B.3)

E\Big| \frac{1}{z + N^{-1}\, Tr(G_k(z))} + s(z) \Big|^q \le c_2 \Big( \frac{N\eta}{d_N} \Big)^{-q}. \qquad (B.4)

Proof. Set \Lambda_k := N^{-1}\, Tr(G_k(z)) - s(z). For any integer p \ge 1 we may expand

\frac{1}{z + N^{-1}\, Tr(G_k(z))} = \sum_{i=0}^{p} \frac{(-\Lambda_k)^i}{(z + s(z))^{i+1}} + \frac{(-\Lambda_k)^{p+1}}{(z + s(z))^{p+1}\, (z + N^{-1}\, Tr(G_k(z)))}. \qquad (B.5)

It is known that |z + s(z)|^{-1} \le 1 uniformly on the upper-half plane z \in C^+ (see [BS10a, Eq. 8.1.19]). Furthermore, we have the trivial bound |z + N^{-1}\, Tr(G_k(z))|^{-1} \le (\eta/d_N + Im\, N^{-1}\, Tr(G_k(z)))^{-1} \le d_N/\eta. Writing b_k := (z + N^{-1}\, Tr(G_k(z)))^{-1}, this gives us

|b_k| \le \sum_{i=0}^{p} |\Lambda_k|^i + \frac{d_N}{\eta}\, |\Lambda_k|^{p+1}.

On the other hand, by the rigidity estimates used in Part 1 we know that E|\Lambda_k|^q \le C (N\eta/d_N)^{-q}, so for example

E|b_k|^q \le 1 + O\big( (N\eta/d_N)^{-q} \big) + O\big( (d_N/\eta)^q\, (N\eta/d_N)^{-(p+1)q} \big).

This last error term can be made o(1) by choosing p large enough (setting d_N = N^\alpha, one finds the condition p + 1 > \alpha/(1 - \alpha)). For (B.4), note the identity (z + s(z))^{-1} = -s(z), and so after subtracting -s(z) the remaining error terms in (B.5) are smaller than O_q((N\eta/d_N)^{-q}), except for the last term. As before, this last term is estimated with the trivial bound on (z + N^{-1}\, Tr(G_k(z)))^{-1} and by choosing p large enough. \square

Lemma B.2. Let z = E + (\tau + i\eta)/d_N be as in Lemma B.1. For all q \ge 1, with fixed \eta > 0 and \tau \in R, there are positive constants c, C such that

E\big| d_N^{-1}\, \delta_{k,2}(z) \big|^q \le C \Big( \frac{N}{d_N} \Big)^{-q/2} \eta^{-3q/2},

E\Big| \frac{1}{N}\, Tr\big(G_k^2(z)\big) - s'(z) \Big|^q \le c\, \max\Big( \Big(\frac{d_N^2}{N\eta^2}\Big)^{q},\ \Big(\frac{d_N}{N\eta}\Big)^{q} \Big),

for all k, N.

Proof. Since G_k(z) is an analytic function of z in the upper half plane and \frac{d}{dz} G_k(z) = G_k^2(z), we may write

d_N^{-1}\, \delta_{k,2}(z) = d_N^{-1}\, \frac{d}{dz}\Big( h_k^T G_k(z) h_k - N^{-1}\, Tr(G_k(z)) \Big) = \frac{d_N^{-1}}{2\pi i} \oint_{S_z} \frac{h_k^T G_k(w) h_k - N^{-1}\, Tr(G_k(w))}{(w - z)^2}\, dw,

where S_z is a small circle of radius \eta/(2 d_N) around the point z. Note that S_z is of distance of order \eta/d_N from the real axis and that |z - w| = \eta/(2 d_N) on S_z. By Hölder's inequality and Lemma B.1, we have

E\big| d_N^{-1}\, \delta_{k,2}(z) \big|^q \le \frac{d_N^{-q}}{(2\pi)^q} \Big( \frac{2 d_N}{\eta} \Big)^{2q} \Big( \frac{\pi\eta}{d_N} \Big)^{q-1} \oint_{S_z} E\big|\delta_{k,1}(w)\big|^q\, |dw| \le C \Big( \frac{N}{d_N} \Big)^{-q/2} \eta^{-3q/2}.

As for Tr(G_k^2), we similarly have

\frac{1}{N}\, Tr\big(G_k^2(z)\big) - s'(z) = \frac{1}{2\pi i} \oint_{S_z} \frac{N^{-1}\, Tr(G_k(w)) - s(w)}{(w - z)^2}\, dw, \qquad (B.6)

where S_z is a small circle of radius \eta/(2 d_N) with center z. Then, using the known rigidity estimates, we get E|N^{-1}\, Tr(G_k(w)) - s(w)|^q \le (N\eta/d_N)^{-q}, which we use to estimate (B.6). Then

E\Big| \frac{1}{N}\, Tr\big(G_k^2(z)\big) - s'(z) \Big|^q \le \Big( \frac{2 d_N}{\eta} \Big)^{q} \sup_{w \in S_z} E\big| N^{-1}\, Tr(G_k(w)) - s(w) \big|^q \le c\, \max\Big( \Big(\frac{d_N^2}{N\eta^2}\Big)^{q},\ \Big(\frac{d_N}{N\eta}\Big)^{q} \Big),

as required. \square

Lemma B.3. Let z = E + (\tau + i\eta)/d_N be as in Lemma B.1, let F_{k,N} := \sigma(\{W_{ij} : 1 \le i, j \le k\}), where W is a Wigner matrix as in Part 1, and define E_k[\,\cdot\,] := E[\,\cdot\, \mid F_{k,N}]. We have the following identity:

\frac{1}{d_N}\, (E_k - E_{k-1})\, \frac{1 + h_k^T G_k(z)^2 h_k}{W_{kk} - z - h_k^T G_k(z) h_k} = (E_k - E_{k-1})\, \frac{\partial}{\partial z}\, \frac{1}{d_N}\, \frac{W_{kk} - \delta_{k,1}(z)}{z + \frac{1}{N}\, Tr(G_k(z))} + \varepsilon_{k,N}(z),

where

\varepsilon_{k,N}(z) := \frac{1}{d_N}\, (E_k - E_{k-1}) \Bigg[ \frac{\big(W_{kk} - \delta_{k,1}(z)\big)^2\, G_{kk}(z)}{\big(z + \frac{1}{N}\, Tr(G_k(z))\big)^2} - \frac{\delta_{k,2}(z)\,\big(W_{kk} - \delta_{k,1}(z)\big)}{z + \frac{1}{N}\, Tr(G_k(z))} \Bigg].

Proof. We omit the explicit z-dependence of the resolvent and make use of the quantities \delta_{k,1}, \delta_{k,2} given in (B.1) and two further quantities

b_k := \frac{1}{z + \frac{1}{N}\, Tr(G_k)}, \qquad G_{kk} = \frac{1}{W_{kk} - z - h_k^T G_k h_k}.

Using the identity z + \frac{1}{N}\, Tr(G_k) = W_{kk} - \delta_{k,1} - G_{kk}^{-1}, we obtain

\frac{1 + h_k^T G_k^2 h_k}{W_{kk} - z - h_k^T G_k h_k} = G_{kk}\,\big(1 + h_k^T G_k^2 h_k\big) \qquad (B.7)

= -b_k\,\big(1 + h_k^T G_k^2 h_k\big) + b_k\, G_{kk}\,\big(1 + h_k^T G_k^2 h_k\big)\big(W_{kk} - \delta_{k,1}\big) \qquad (B.8)

= -b_k\,\big(1 + \tfrac{1}{N}\, Tr(G_k^2)\big) - b_k\,\delta_{k,2} + b_k\, G_{kk}\,\big(1 + h_k^T G_k^2 h_k\big)\big(W_{kk} - \delta_{k,1}\big), \qquad (B.9)

where to obtain (B.8) we expanded using the simple identity G_{kk} = -b_k + b_k\, G_{kk}\,(W_{kk} - \delta_{k,1}), and to obtain (B.9) from (B.8) we used that h_k^T G_k^2 h_k = \frac{1}{N}\, Tr(G_k^2) + \delta_{k,2}. Expanding G_{kk} once more in the last term of (B.9), the terms involving \delta_{k,2} and W_{kk} - \delta_{k,1} combine, up to the error term \varepsilon_{k,N}(z), into the exact derivative

\frac{\partial}{\partial z}\Big[\big(W_{kk} - \delta_{k,1}\big)\, b_k\Big] = -\delta_{k,2}\, b_k - \big(W_{kk} - \delta_{k,1}\big)\, b_k^2\,\big(1 + \tfrac{1}{N}\, Tr(G_k^2)\big),

while the first term of (B.9) does not depend on the k-th row and column of W. Finally, we conclude the proof of the identity by applying (E_k - E_{k-1}) to (B.7), noting that this first term vanishes under E_k - E_{k-1}. \square

Proposition B.4. Fix \tilde{\eta} > 0. Then, under the same conditions as Theorem 1 in Part 1, there exist positive constants N_0, M_0, C, c_0, and c_1 = c_1(C, c_0) such that

E|V_N(z)|^2 = \mathrm{Var}\Big( \frac{1}{d_N}\, Tr\, G(E + z/d_N) \Big) \le \frac{c_1}{\eta^2}

for all N \ge N_0 such that N\eta/d_N \ge M_0, \eta/d_N \le \tilde{\eta}, and |E + \tau/d_N| \le 2 + \eta/d_N.

Proof. By the triangle inequality, we can write

E|V_N(z)|^2 = E\big| (N/d_N)\big( s_N(E + z/d_N) - E s_N(E + z/d_N) \big) \big|^2
\le 2\, E\big| (N/d_N)\big( s_N(E + z/d_N) - s(E + z/d_N) \big) \big|^2 + 2\, \big| (N/d_N)\, E\big( s_N(E + z/d_N) - s(E + z/d_N) \big) \big|^2
\le 4\, E\big| (N/d_N)\big( s_N(E + z/d_N) - s(E + z/d_N) \big) \big|^2,

where the last line follows from Jensen's inequality. The latter expectation is by definition the integral

\int_0^\infty 2u\, P\Big( \big|s_N(E + z/d_N) - s(E + z/d_N)\big| > \frac{u\, d_N}{N} \Big)\, du = \eta^{-2} \int_0^\infty 2K\, P\Big( \big|s_N(E + z/d_N) - s(E + z/d_N)\big| > \frac{K\, d_N}{N\eta} \Big)\, dK.

It suffices to bound the contribution to the above integral when K \ge 1. We apply Theorem *INSERT* from Part 1 with e.g. q = 3 to obtain a convergent estimate. \square

C. Function extension

The following lemma is useful for proving results about the linear spectral statistics of a random matrix via analogous results for the resolvent. It can be found in [AGZ09, Lemma 5.5.5].

Lemma C.1. Let f : R \to R be a compactly supported function with continuous first derivative (we write f \in C_c^1(R)), and consider an extension \Psi f : R^2 \to C that inherits the same regularity and is such that \Psi f(x, 0) = f(x) for all x and (\partial_y \Psi f)(x, 0) = 0. Further assume that \bar{\partial}\, \Psi f(x, y) = O(y) as y \to 0. Then one has

f(\lambda) + i\, H[f](\lambda) = \frac{1}{\pi} \int_{R} \int_{0}^{\infty} \frac{\bar{\partial}\, \Psi f(x, y)}{x + i y - \lambda}\, dy\, dx,

where \bar{\partial} := \partial_x + i\, \partial_y and H : L^2(R) \to L^2(R) is the Hilbert transform of f:

H[f](x) := \frac{1}{\pi}\, \mathrm{P.V.} \int_{-\infty}^{\infty} \frac{f(t)}{x - t}\, dt.

Remark C.2. Our particular choice of \Psi f in this paper will be the following (see [Dav95]):

\Psi f(x, y) = \big( f(x) + i\,(f(x + y) - f(x)) \big)\, J(y), where J(y) is a smooth function of compact support, equal to 1 in a neighborhood of 0 and equal to 0 if y > 1.

Proof. We make use of the substitution x \to x + \lambda and compute the real part (writing \Psi f = u + i v):

\mathrm{Re} \int_{0}^{\infty} \int_{R} \frac{\bar{\partial}\, \Psi f(x + \lambda, y)}{x + i y}\, dx\, dy = \lim_{\epsilon \to 0} \iint_{A_{\epsilon, R}} \frac{x\, \big( \partial_x u - \partial_y v \big) + y\, \big( \partial_y u + \partial_x v \big)}{x^2 + y^2}\, dx\, dy,

where A_{\epsilon, R} := \{ \epsilon \le \sqrt{x^2 + y^2} \le R \} and R is sufficiently large (by the compact-support hypothesis). Changing to polar coordinates in the latter integral, a simple computation shows that the integral transforms as

\lim_{\epsilon \to 0} \int d\theta \int_{\epsilon}^{R} \Big( \frac{\partial u(r \cos\theta, r \sin\theta)}{\partial r} - \frac{1}{r}\, \frac{\partial v(r \cos\theta, r \sin\theta)}{\partial \theta} \Big)\, dr

= \lim_{\epsilon \to 0} \Big( \int d\theta\, u(R \cos\theta, R \sin\theta) - \int d\theta\, u(\epsilon \cos\theta, \epsilon \sin\theta) - \int d\theta \int_{\epsilon}^{R} \frac{1}{r}\, \frac{\partial v}{\partial \theta}\, dr \Big).

The boundary term at r = R vanishes for R large by the compact support of \Psi f, while the boundary term at r = \epsilon produces f(\lambda) in the limit \epsilon \to 0, using \Psi f(x, 0) = f(x); the imaginary part, which produces the Hilbert transform H[f](\lambda), is treated in the same way.

D. The Standard Marchenko-Pastur Law

We include here the definition of the standard Marchenko-Pastur law and a bound on the distance between the empirical spectral distributions of two matrices.

Theorem D.1 (Marchenko-Pastur law, [BS10b, Theorem 3.6]). Let X_1, ..., X_n be independent copies of a random vector X \in R^p such that

E[X] = 0, \qquad E[X \otimes X] = I_p.

Suppose that n \to \infty, p/n \to \gamma > 0, and define a := (1 - \sqrt{\gamma})^2 and b := (1 + \sqrt{\gamma})^2. Then the empirical spectral distribution of the matrix

\frac{1}{n} \sum_{i=1}^{n} X_i \otimes X_i

converges almost surely to the standard Marchenko-Pastur law, which has density:

\rho_\gamma(x) = \frac{1}{2\pi \gamma x}\, \sqrt{(b - x)(x - a)} \quad \text{if } a \le x \le b, \quad \text{and } 0 \text{ otherwise},

together with an additional point mass of 1 - 1/\gamma at the origin if \gamma > 1.

Lemma D.2 ([BS10b, Corollary A.41]). Let A and B be two p \times p normal matrices, with empirical spectral distributions \mu_A and \mu_B. Then

L(\mu_A, \mu_B)^3 \le \frac{1}{p}\, \|A - B\|_F^2,

where L(\mu_A, \mu_B) is the Lévy distance between the distribution functions of \mu_A and \mu_B.
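A quick numerical check of Theorem D.1 (an illustrative sketch with assumed Gaussian vectors, arbitrary sizes and seed, and a hypothetical helper named mp_density): the eigenvalue distribution of n^{-1} \sum_i X_i \otimes X_i should put approximately the Marchenko-Pastur mass on any fixed window.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 2000, 500                                   # gamma = p/n = 0.25
gamma = p / n
X = rng.standard_normal((n, p))                    # E[X] = 0, E[X (x) X] = I_p
eigs = np.linalg.eigvalsh(X.T @ X / n)

a, b = (1 - np.sqrt(gamma)) ** 2, (1 + np.sqrt(gamma)) ** 2

def mp_density(x):
    """Standard Marchenko-Pastur density with parameter gamma on [a, b]."""
    inside = (x > a) & (x < b)
    out = np.zeros_like(x)
    out[inside] = np.sqrt((b - x[inside]) * (x[inside] - a)) / (2 * np.pi * gamma * x[inside])
    return out

lo, hi = 0.8, 1.2
grid = np.linspace(lo, hi, 2001)
mp_mass = np.trapz(mp_density(grid), grid)         # MP mass of the window [0.8, 1.2]
emp_mass = np.mean((eigs >= lo) & (eigs <= hi))    # fraction of eigenvalues in the window
print("MP mass on [0.8, 1.2]       : %.4f" % mp_mass)
print("empirical mass on [0.8, 1.2]: %.4f" % emp_mass)
```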

REFERENCES

[AGZ09] G. W. Anderson, A. Guionnet, and O. Zeitouni. An Introduction to Random Matrices. Cambridge University Press, 2009.

[AHM11] Yacin Ameur, Håkan Hedenmalm, and Nikolai Makarov. Fluctuations of eigenvalues of random normal matrices. Duke Math. J., 159(1):31-81, 2011.

[AZ06] Greg W. Anderson and Ofer Zeitouni. A CLT for a band matrix model. Probability Theory and Related Fields, 134(2):283-338, 2006.

[BBNY16] Roland Bauerschmidt, Paul Bourgade, Miika Nikula, and Horng-Tzer Yau. The two-dimensional coulomb plasma: quasi-free approximation and central limit theorem. arXiv preprint arXiv:1609.08582, 2016.

[BD14] J. Breuer and M. Duits. Universality of mesoscopic fluctuations for orthogonal polynomial ensembles. eprint = arXiv:1411.5205, 2014.

[BdMK99a] A. Boutet de Monvel and A. Khorunzhy. Asymptotic distribution of smoothed eigenvalue density. II. Wigner random matrices. Random Oper. Stochastic Equations, 7(2):149-168, 1999.

[BdMK99b] A. Boutet de Monvel and A. Khorunzhy. Asymptotic distribution of smoothed eigenvalue density. I. Gaussian random matrices. Random Oper. Stochastic Equations, 7(1):1-22, 1999.

[BDS16] Jinho Baik, Percy Deift, and Toufic Suidan. Combinatorics and random ma- trix theory, volume 172 of Graduate Studies in Mathematics. American Math- ematical Society, Providence, RI, 2016.

[Bek15] Florent Bekerman. Transport maps for β-matrix models in the multi-cut regime. arXiv preprint arXiv:1512.00302, 2015.

[BEY14a] Paul Bourgade, László Erdős, and Horng-Tzer Yau. Edge universality of beta ensembles. Communications in Mathematical Physics, 332(1):261-353, 2014.

[BEY14b] Paul Bourgade, László Erdős, and Horng-Tzer Yau. Universality of general β-ensembles. Duke Mathematical Journal, 163(6):1127-1190, 2014.

[BEYY14] P. Bourgade, L. Erdős, H.-T. Yau, and J. Yin. Fixed energy universality for generalized Wigner matrices. eprint = arXiv:1407.5606, 2014.

[BEYY15] Paul Bourgade, L. Erdős, H.-T. Yau, and Jun Yin. Fixed energy universality for generalized Wigner matrices. Communications on Pure and Applied Mathematics, 2015.

[BFG13] Florent Bekerman, Alessio Figalli, and Alice Guionnet. Transport maps for β-matrix models and universality. Communications in Mathematical Physics, 338(2):589-619, 2013.

[BG] A. Borodin and V. Gorin. General beta Jacobi corners process and the Gaussian Free Field. To appear in Communications on Pure and Applied Mathematics. eprint = arXiv:1305.3627.

[BG13a] Gaëtan Borot and Alice Guionnet. Asymptotic expansion of β matrix models in the one-cut regime. Communications in Mathematical Physics, 317(2):447-483, 2013.

[BG13b] Gaëtan Borot and Alice Guionnet. Asymptotic expansion of β matrix models in the multi-cut regime. arXiv preprint arXiv:1303.1045, 2013.

[BGGM13] F. Benaych-Georges, A. Guionnet, and C. Male. Central limit theorems for linear statistics of heavy tailed random matrices. eprint = arXiv:1301.0448, 2013.

[Bil08] Patrick Billingsley. Probability and measure. John Wiley & Sons, 2008.

[BK14] P. Bourgade and J. Kuan. Strong Szegő asymptotics and zeros of the zeta function. Communications on Pure and Applied Mathematics, 67(6):1028-1044, 2014.

[Bor14] A. Borodin. CLT for spectra of submatrices of Wigner random matrices. Mosc. Math. J., 2014(1):29-38, 2014.

[BS04] Z.D. Bai and J.W. Silverstein. CLT for linear spectral statistics of large-dimensional sample covariance matrices. The Annals of Probability, 32(1A):553-605, 2004.

[BS10a] Z.D. Bai and J.W. Silverstein. Spectral Analysis of Large Dimensional Ran- dom Matrices. Springer, 2010.

[BS10b] Zhidong Bai and Jack W. Silverstein. Spectral analysis of large dimensional random matrices. Springer Series in Statistics. Springer, New York, second edition, 2010.

[BWZ09] Z. Bai, X. Wang, and W. Zhou. CLT for Linear Spectral Statistics of Wigner matrices. Electronic Journal of Probability, 14:2391-2417, 2009.

[BY05] Z. D. Bai and J. Yao. On the convergence of the spectral empirical process of Wigner matrices. Bernoulli, 11(6):949-1128, 2005.

[BZ93] E. Brézin and A. Zee. Universality of the correlations between eigenvalues of large random matrices. Nucl. Phys. B., 402:613-627, 1993.

[BZ08] Zhidong Bai and Wang Zhou. Large sample covariance matrices without independence structures in columns. Statist. Sinica, 18(2):425-442, 2008.

[CCL+15] François Crénin, David Cressey, Sophie Lavaud, Jiali Xu, and Pierre Clauss. Random matrix theory applied to correlations in operational risk. Journal of Operational Risk, 10(4):45-71, December 2015.

[CK86] Heinz Cremers and Dieter Kadelka. On weak convergence of integral functionals of stochastic processes with applications to processes taking paths in L_E. Stochastic Processes and their Applications, 21(2):305-317, 1986.

[CMS14] Claudio Cacciapuoti, Anna Maltsev, and Benjamin Schlein. Bounds for the Stieltjes transform and the density of states of Wigner matrices. Probability Theory and Related Fields, pages 1-59, 2014.

[Dav95] E. B. Davies. The functional calculus. J. London Math. Soc. (2), 52(1):166-176, 1995.

[DJ13] M. Duits and K. Johansson. On mesoscopic equilibrium for linear statistics in Dyson's Brownian Motion. eprint = arXiv:1312.4295, 2013.

[dlPG99] Victor H. de la Peña and Evarist Giné. Decoupling. Probability and its Applications (New York). Springer-Verlag, New York, 1999.

[DM63] F. J. Dyson and M. L. Mehta. Statistical Theory of the Energy Levels of Complex Systems. IV. J. Math. Phys., 4:701, 1963.

[DOT03] P. Doukhan, G. Oppenheim, and M. Taqqu. Theory and Applications of Long-Range Dependence. Birkhäuser, Boston, 2003.

[DRSV14] Bertrand Duplantier, Rémi Rhodes, Scott Sheffield, and Vincent Vargas. Critical Gaussian multiplicative chaos: convergence of the derivative martingale. Ann. Probab., 42(5):1769-1808, 2014.

[EK13] László Erdős and Antti Knowles. Averaging Fluctuations in Resolvents of Random Band Matrices. Ann. Henri Poincaré, 14:1837-1926, 2013.

[EK14a] L. Erdős and A. Knowles. The Altshuler-Shklovskii Formulas for Random Band Matrices I: the Unimodular Case. Communications in Mathematical Physics, 2014.

[EK14b] L. Erdős and A. Knowles. The Altshuler-Shklovskii Formulas for Random Band Matrices II: the General Case. Ann. Inst. Henri Poincaré, 2014.

[EKYY12] László Erdős, Antti Knowles, Horng-Tzer Yau, and Jun Yin. Spectral statistics of Erdős-Rényi Graphs II: Eigenvalue spacing and the extreme eigenvalues. Comm. Math. Phys., 314(3):587-640, 2012.

[EKYY13] L. Erdős, A. Knowles, H.-T. Yau, and J. Yin. The Local Semicircle Law for a General Class of Random Matrices. Electron. J. Probab., 18:1-58, 2013.

[Erd11] L. Erdős. Universality of Wigner random matrices: a survey of recent results. Russian Math. Surveys, 66:507-626, 2011.

[Ess24] Fredrick Esscher. On a method of determining correlation from the ranks of the variates. Scandinavian Actuarial Journal, 1924(1):201-219, 1924.

[ESY09a] L. Erdős, B. Schlein, and H.-T. Yau. Local Semicircle Law and Complete Delocalization for Wigner Random Matrices. Comm. Math. Phys., 287:641-655, 2009.

[ESY09b] L. Erdős, B. Schlein, and H.-T. Yau. Wegner estimate and Level Repulsion for Wigner Random Matrices. Int. Math. Res. Not., 2010:436-479, 2009.

[EYY12a] L. Erdős, H.-T. Yau, and J. Yin. Bulk universality for generalized Wigner matrices. Probab. Theory Related Fields, 154:341-407, 2012.

[EYY12b] L. Erdős, H.-T. Yau, and J. Yin. Rigidity of eigenvalues of generalized Wigner matrices. Adv. Math., 229:1435-1515, 2012.

[FK14] Yan V. Fyodorov and Jonathan P. Keating. Freezing transitions and extreme values: random matrix theory, and disordered landscapes. Philos. Trans. R. Soc. Lond. Ser. A Math. Phys. Eng. Sci., 372(2007):20120503, 32, 2014.

[FKS13] Y.V. Fyodorov, B.A. Khoruzhenko, and N.J. Simm. Fractional Brownian motion with Hurst index H = 0 and the Gaussian Unitary Ensemble. eprint = arXiv:1312.0212, 2013.

[FLDR12] Y.V. Fyodorov, P. Le Doussal, and A. Rosso. Counting function fluctuations and extreme value threshold in multifractal patterns: the case study of an ideal 1/f noise. J. Stat. Phys., 149:898-920, 2012.

[Gri76] L. S. Grinblat. A limit theorem for measurable random processes and its applications. Proc. Amer. Math. Soc., 61:371-376, 1976.

[HK16] Yukun He and Antti Knowles. Mesoscopic eigenvalue statistics of Wigner matrices. eprint = arXiv:1603.01499, 2016.

[HKO01] C. P. Hughes, J. P. Keating, and N. O'Connell. On the Characteristic Polyno- mial of a Random Unitary Matrix. Comm. Math. Phys., 220:429-451, 2001.

[Hoe48] Wassily Hoeffding. A class of statistics with asymptotically normal distribu- tion. Ann. Math. Statistics, 19:293-325, 1948.

[JL15] K. Johansson and G. Lambert. Gaussian and non-Gaussian fluctuations for mesoscopic linear statistics in determinantal processes. eprint = arXiv:1504.06455, 2015.

[Joh98] K. Johansson. On fluctuations of eigenvalues of random Hermitian matrices. Duke Math. J., 91(1):151-204, 1998.

[Ken38] M. G. Kendall. A new measure of rank correlation. Biometrika, 30(1/2):81-93, 1938.

[KG90] Maurice Kendall and Jean Dickinson Gibbons. Rank correlation methods. A Charles Griffin Title. Edward Arnold, London, fifth edition, 1990.

[KKP96] A.M. Khorunzhy, B.A. Khoruzhenko, and L.A. Pastur. Asymptotic properties of large random matrices with independent entries. J. Math. Phys., 37:5033, 1996.

[Lam16] G. Lambert. Mesoscopic fluctuations for unitary invariant ensembles. eprint = arXiv:1510.03641v2, 2016.

[LHY+12] Han Liu, Fang Han, Ming Yuan, John Lafferty, and Larry Wasserman. High-dimensional semiparametric Gaussian copula graphical models. Ann. Statist., 40(4):2293-2326, 2012.

[Lin25] Jarl Waldemar Lindeberg. Über die Korrelation. Den VI skandinaviske Matematikerkongres i København, pages 437-446, 1925.

[Lin29] J. W. Lindeberg. Some remarks on the mean error of the percentage of correlation. Nordic Statistical Journal, 1:137-141, 1929.

[Lod15] A. Lodhia and N. Simm. Mesoscopic linear statistics of Wigner matrices. eprint = arXiv:1503.03533, 2015.

[LP09a] A. Lytova and L. Pastur. Central limit theorem for linear eigenvalue statistics of random matrices with independent entries. Ann. Probab., 37(5):1778-1840, 2009.

[LP+09b] Anna Lytova, Leonid Pastur, et al. Central limit theorem for linear eigen- value statistics of random matrices with independent entries. The Annals of Probability,37(5):1778-1840, 2009.

[LS16] Thomas Lebl6 and Sylvia Serfaty. Fluctuations of two-dimensional coulomb gases. arXiv preprint arXiv:1609.08088, 2016.

[LSSW14] A. Lodhia, S. Sheffield, X. Sun, and S. Watson. Fractional gaussian fields: a survey. eprint = arXiv:1407.5598, 2014.

[Meh04] M. L. Mehta. Random Matrices. Academic Press; 3rd edition, 2004.

[MPS95] A. Boutet de Monvel, Leonid Pastur, and Maria Shcherbina. On the statistical mechanics approach in the random matrix theory: integrated density of states. Journal of Statistical Physics, 79(3):585-611, 1995.

[MvN68] B. B. Mandelbrot and J. W. van Ness. Fractional Brownian Motions, Frac- tional Noises and Applications. SIAM Review, 10(4):422-437, 1968.

[O'R12] Sean O'Rourke. A note on the Marchenko-Pastur law for a class of random matrices with dependent entries. Electron. Commun. Probab., 17:no. 28, 13, 2012.

[OR14] S. O'Rourke and D. Renfrew. Central limit theorem for linear eigenvalue statistics of elliptic random matrices. eprint = arXiv:1410.4586, 2014.

[PA14] Debashis Paul and Alexander Aue. Random matrix theory in statistics: a review. J. Statist. Plann. Inference, 150:1-29, 2014.

[Pas06] L. Pastur. Limiting laws of linear eigenvalue statistics for Hermitian matrix models. J. Math. Phys., 47:103303 (22pp), 2006.

[RRV11] José Ramírez, Brian Rider, and Bálint Virág. Beta ensembles, stochastic Airy spectrum, and a diffusion. Journal of the American Mathematical Society, 24(4):919-944, 2011.

[RS06] B. Rider and J. W. Silverstein. Gaussian fluctuations for non-Hermitian random matrix ensembles. The Annals of Probability, 34(6):2118-2143, 2006.

[RV07] B. Rider and B. Virág. The noise in the Circular Law and the Gaussian free field. Int. Math. Res. Not., 2:33pp, 2007.

[RV13] Mark Rudelson and Roman Vershynin. Hanson-Wright inequality and sub-Gaussian concentration. Electron. Commun. Probab., 18:no. 82, 1-9, 2013.

[Shc11] M. Shcherbina. Central Limit Theorem for linear eigenvalue statistics of the Wigner and sample covariance random matrices. J. Math. Phys., Analysis, Geometry, 7, 2011.

[Shc13] Mariya Shcherbina. Fluctuations of linear eigenvalue statistics of β matrix models in the multi-cut regime. Journal of Statistical Physics, 151(6):1004-1034, 2013.

[Shc14] Mariya Shcherbina. Change of variables as a method to study general β-models: bulk universality. Journal of Mathematical Physics, 55(4):043504, 2014.

[She07] S. Sheffield. Gaussian free fields for mathematicians. Probab. Theory Related Fields, 139(3-4):521-541, 2007.

[Sos00] A. Soshnikov. The central limit theorem for local linear statistics in classical compact groups and related combinatorial identities. Ann. Probab., 28(3):1353-1370, 2000.

[SS13] Oded Schramm and Scott Sheffield. A contour line of the continuum Gaussian free field. Probab. Theory Related Fields, 157(1-2):47-80, 2013.

[SW13] P. Sosoe and P. Wong. Regularity conditions in the CLT for linear eigenvalue statistics of Wigner matrices. Adv. Math., 249:37-87, 2013.

[TV11] T. Tao and V. Vu. Random matrices: Universality of the local eigenvalue statistics. Acta Mathematica, 206:127-204, 2011.

[TV12] T. Tao and V. Vu. A central limit theorem for the determinant of a Wigner matrix. Adv. Math., 231:74-101, 2012.

[Unt09] J. Unterberger. Stochastic calculus for fractional Brownian motion with Hurst exponent H > 1/4: A rough path method by analytic extension. Ann. Probab., 37(2), 2009.

[VV09] Benedek Valkó and Bálint Virág. Continuum limits of random matrices and the Brownian carousel. Inventiones mathematicae, 177(3):463-508, 2009.

[Web14] C. Webb. The characteristic polynomial of a random unitary matrix and Gaussian multiplicative chaos - the L^2-phase. eprint = arXiv:1410.0939, 2014.

[YK86] Y. Q. Yin and P. R. Krishnaiah. Limit theorems for the eigenvalues of product of large-dimensional random matrices when the underlying distribution is isotropic. Teor. Veroyatnost. i Primenen., 31(2):394-398, 1986.
