Global and Local Limit Laws for Eigenvalues of the Gaussian Unitary Ensemble and the Wishart Ensemble
Author: Supervisor: Zhou Fan Prof. Horng-Tzer Yau
Submitted to the Harvard University Department of Mathematics in partial fulfillment of the requirements for the degree of A.B. in Mathematics
March 22, 2010 Contents
1 Introduction 1
2 Convergence of the Empirical Distribution of Eigenvalues 3 2.1 Convergence in moment of the mean empirical distribution Ln ...... 4 2.2 Convergence of the empirical distribution Ln ...... 7 2.3 Computation of the limit distribution of Ln for the GUE and Wishart Ensemble 9 2.3.1 The GUE and the semicircle law ...... 10 2.3.2 The Wishart Ensemble and the law of Mar˘cenko-Pastur ...... 12
3 Joint Distribution of Eigenvalues 15 3.1 Joint eigenvalue density for the GUE ...... 18 3.2 Joint eigenvalue density for the Wishart Ensemble ...... 18
4 Convergence of the Local Eigenvalue Correlation Function 20 4.1 Properties of the k-point eigenvalue correlation functions ...... 21 4.2 Limiting behavior of the correlation function in the bulk of the spectrum . . . 24 4.2.1 Hermite polynomials and the GUE ...... 26 4.2.2 Laguerre polynomials and the Wishart Ensemble for p = 1 ...... 28
1 Introduction
Linear algebra deals with the study of vector spaces and linear transformations between them. These transformations can be represented as matrices, and various properties of a linear transformation are reflected in the properties of its corresponding matrix. When the elements of a matrix are replaced by random variables, tools from probability theory can be used to study the resulting properties of these matrices. This paper presents a survey of several results in random matrix theory, the study of spectral properties of matrices with random elements, and analyzes two families of random matrices in a unified way. To motivate our discussion, we begin with three examples:
Example 1.1 (Nuclear Physics [18]). Consider the nucleus of a large atom (e.g. Uranium- 238). We are interested in determining the energy levels of this nucleus. From quantum mechanics, the nuclear energy levels En are given as the eigenvalues of the Hamiltonian operator H of the system, Hψn = Enψn. Unfortunately, for large atoms, the Hamiltonian H cannot be explicitly computed and so we cannot explicitly determine its spectrum. However, we may instead model the system by approximating the infinite-dimensional Hilbert space of wave functions with a large finite-dimensional space, and approximating H by a matrix operator on this space with random elements. We may then study local statistics of the energy levels, such as their pairwise joint distributions or distributions concerning the spacings between them, using this model. By imposing conditions on the joint distribution of the matrix elements based on the symmetries of the system, we find that the local statistical properties of the eigenvalue distributions of these random matrix models closely match those of observed nuclear energy levels in large atoms. Example 1.2 (Wireless Communications [27]). Consider the transmission of information over a wireless communication channel. The relation between a vector of transmitted data x and received data y can be modeled by a linear channel y = Hx + n, where H is a random channel matrix and n is a vector of random Gaussian noise. An information theoretic quantity of interest is the channel capacity, an upper bound on the rate of information transmission over the channel. Suppose that x ∈ Cm and y ∈ Cn, and the matrix HH∗ has
1 1 eigenvalues λ1, . . . , λn. Let FH (x) = n · #{λi ≤ x} be the cumulative distribution function of the probability mass function of these eigenvalues. Then, under suitable assumptions on the distributions of x and H, the channel capacity is given by the expected value of
Z ∞ 2 nE[kxk ] n log 1 + 2 x dFH (x) 0 kE[knk ] over the distribution of the channel matrix H. In particular, the channel capacity is depen- ∗ dent on the global distribution FH of the eigenvalues of HH . Example 1.3 (Financial Portfolio Optimization [16, 21]). Consider a collection of n stocks. We are interested in understanding the risk associated to a portfolio of these stocks and in constructing portfolios of low risk. If we model the return of each stock as a random variable ri and consider the covariance matrix C with entries Cij = Cov(ri, rj), then the risk associated to a portfolio p = (p1, . . . , pn) where pi is the amount invested in stock i is given by ptCp. Low risk portfolios can be selected to have large components in the directions of eigenvectors of C with the lowest eigenvalues. However, by estimating C using the sample covariance matrix of observed returns for each stock, the noise in the sample matrix can cause us to misidentify the eigenvectors corresponding to the lowest eigenvalues of C and to underestimate the risk associated with the chosen portfolio when n is large. We may use properties of the global eigenvalue distribution of the random sample covariance matrix to devise more accurate methods of portfolio selection and adjust for the noise factor in the computation of risk. There exists a body of research pertaining to each of these applications of random matrix theory. It is not the goal of this paper to discuss the details of these applications; we refer the interested reader to the listed references. We present these examples as an illustration of the diversity of the range of applications of this theory and as motivation for the specific families of random matrices and the specific properties of their eigenvalue distributions that we will examine. In particular, we will focus on two families of random matrices, defined as follows: (n) (n) Definition 1.4. Let {αij }n∈N,1≤i≤j≤n and {βij }n∈N,1≤i All three of the examples above deal with random matrices of large dimensionality. In single-variate statistics, large collections of random variables are analyzed using limit theo- rems such as the Law of Large Numbers and the Central Limit Theorem. The goal of this paper is to develop similar limit theorems for spectral properties of interest for the GUE and Wishart Ensemble, as the matrix size tends to infinity. Our examples motivate the study of the limit theorems for two distinct spectral properties: the global distribution of eigen- values, as relevant to Example 1.2, and the local statistics of the eigenvalue distribution, as relevant to Example 1.1. We will address these properties separately in the subsequent 2 sections. We will see that the assumption of a normal distribution in Definitions 1.4 and 1.5 is not necessary for our study of the global eigenvalue distribution, but we will rely on this assumption when we turn to the examination of the local statistics. A theme of this paper is the unified derivation of our results for the GUE and Wishart Ensemble, the studies of which had historical origins in different fields of application. Each section of the paper takes advantage of a similarity between the two families of matrices to derive a general result, which is then specialized to the GUE and Wishart cases. In Section 2, we prove the existence of a limit law for the global empirical distribution of eigenvalues for a class of general band matrices, using a combinatorial and graph theoretic approach. In Section 3, we use a change of variables formula to derive the form of the joint density function for eigenvalues of matrix distributions invariant under conjugation by unitary matrices. Finally, in Section 4, we compute a local correlation function for matrix eigenvalues in terms of orthogonal polynomials and derive a limit law for this function in the cases of the GUE and Wishart Ensemble with p = 1. 2 Convergence of the Empirical Distribution of Eigen- values We prove in this section the convergence in probability of the empirical distribution of eigenvalues for a class of general band matrices, and we specialize the result to the GUE and Wishart Ensemble by an explicit computation of the limit distribution. The result was first proven for the GUE by Wigner in [28] and [29], and we will follow Wigner’s general strategy using the method of moments, with the generalizations provided in [2]. Our presentation draws on [1] and [2]. The class of band matrices we choose to work with is much more (n) general than what is needed for the GUE and Wishart Ensemble—we do not require αij (n) and βij from Definitions 1.4 and 1.5 to have normal distributions, or even to be identically distributed. Specifically, we will consider a class of matrices Xn according to the following definition: Definition 2.1. Let s : [0, 1] × [0, 1] → [0, ∞) be a symmetric bounded measurable function (n) with discontinuity set of measure zero. Let {ξij }n∈N,1≤i,j≤n be a collection of random variables with the following properties: (n) (n) 1. ξii is real-valued for all i and ξij is complex-valued for all i 6= j. (n) 2. For each n, {ξij }1≤i≤j≤n is independent. (n) (n) 3. For all i 6= j, ξij = ξji . (n) 4. E[ξij ] = 0 for all i, j. 2 5. ξ(n) = s i , j for all i, j. E ij n n k (n) 6. For each k ∈ N, supn,i,j E ξij < ∞. (n) √1 For each n ∈ N, let Xn be the matrix with Xn(i, j) = n ξij . √1 In particular, the GUE scaled by n satisfies this definition with s ≡ 1. The primary concern of this section is the following notion of an empirical eigenvalue distribution: 3 Definition 2.2. Suppose that Xn has eigenvalues λ1 ≤ ... ≤ λn. The empirical distribu- n 1 X tion of eigenvalues of X is the counting measure L = δ . The mean empirical n n n λi i=1 distribution of eigenvalues is Ln = ELn. With this notation, Ln is a random measure over our probability space such that Ln(S) is the fraction of eigenvalues of Xn contained in the set S. Ln is a fixed measure determined by our matrix class {Xn}, so that Ln(S) is the expected fraction of eigenvalues in S for a random matrix from our matrix class. The central result of this section is the following proposition: Proposition 2.3. There exists a measure µ of bounded support, symmetric about 0, and uniquely defined by its moments, such that Ln converges weakly, in probability, to µ. That is, for any f ∈ Cb(R) and any ε > 0, Z Z lim P f dLn − f dµ > ε = 0. n→∞ Taking f to be a continuous approximation of an indicator function for an interval S, this R proposition tells us that for large n, Ln(S) ≈ S dµ with high probability, i.e. the fraction of eigenvalues of Xn contained in S is approximately the integral of a fixed density function over S. Visually, we may view this result as stating that a histogram of eigenvalues of Xn will, with high probability, converge in shape to the graph of a fixed density function. We will prove Proposition 2.3 in two steps in Sections 2.1 and 2.2 and explicitly compute the limit measure µ for the GUE and Wishart Ensemble in Section 2.3. 2.1 Convergence in moment of the mean empirical distribution Ln Our proof of Proposition 2.3 is based on a computation of the moments of Ln using a combinatorial analysis. We will use the following definitions: Definition 2.4. An n-path of length k is a (k + 1)-tuple of indices i = (i1, i2, . . . , ik+1) ∈ k+1 {1, . . . , n} . The path is closed if ik+1 = i1. Let l(i) = k be the length and w(i) = #{i1, . . . , ik+1} be the number of distinct indices of the path i. Let us consider the path i = (i1) of a single index to be a closed path with l(i) = 0 and w(i) = 1. For any ordered pair 0 0 0 0 0 (i, i ) of indices, let fi(i, i ) = #{j | ij = i, ij+1 = i }. Let bi(i, i ) = #{j | ij = i , ij+1 = i} 0 0 0 if i 6= i and bi(i, i ) = 0 if i = i . Definition 2.5. A Wigner n-path is a closed n-path i with the following properties: 1. For each j = 1, . . . , k, fi(ij, ij+1) = 1 and bi(ij, ij+1) = 1. l(i) 2. w(i) = 2 + 1. We may think of an n-path of length k as a walk of k steps along the edges of the complete 0 0 undirected graph of n vertices. Then fi(i, i ) and bi(i, i ) are the numbers of times the walk traverses the edge i → i0 in the forward and backward directions respectively. (If i = i0, we count a traversal of this self-loop as a traversal in the forward direction only.) The first condition of the definition of a Wigner n-path specifies that the path traverses no self-loops and traverses each edge along the path exactly twice, once in each direction. Then the l(i) number of undirected edges traversed by the Wigner n-path is 2 , and hence the second condition specifies that the subgraph of undirected edges traversed by the Wigner n-path is a tree. We consider an equivalence relation on paths under permutation of the vertex labels: 4 Definition 2.6. Two paths i and i0 are equivalent if there is a bijection ϕ : N → N that maps i to i0 when applied componentwise. Let [i] denote the equivalence class of i. As paths in an equivalence class have the same length and number of distinct indices, let l([i]) = l(i) and w([i]) = w(i). Let Ck be the set of equivalence classes of closed paths of length k and Wk ⊂ Ck be the set of equivalence classes of Wigner paths of length k. We note that this notion of equivalence does not make reference to the size n of the index set, so an equivalence class C ∈ Ck contains an infinite number of paths, and the n-paths belonging to C will be denoted as C ∩ {1, . . . , n}k+1. Lemma 2.7. The moments of Ln satisfy Z k 1/2 k 1 X X Y ij ij+1 lim x dLn = lim s , , (1) n→∞ n→∞ nk/2+1 n n C∈Wk i∈C∩{1,...,n}k+1 j=1 R k if the limit on the right exists. In particular, limn→∞ x dLn = 0 if k is odd. Proof. Letting λ1 ≤ ... ≤ λn be the eigenvalues of Xn, " n # Z 1 X xkdL = λk n E n i i=1 1 = tr Xk E n n n 1 X = [X (i , i )X (i , i ) ...X (i , i )] n E n 1 2 n 2 3 n k 1 i1,...,ik=1 1 X X h (n) (n) i = E ξi i . . . ξi i . nk/2+1 1 2 k 1 C∈Ck i∈C∩{1,...,n}k+1 0 0 0 0 If Ei = (i, i ) | i ≤ i , (i, i ) = (ij, ij+1) or (i , i) = (ij, ij+1) for some j is the set of edges traversed by any closed path i, then f (i,i0) b (i,i0) h (n) (n) i Y (n) i (n)∗ i ξ . . . ξ = ξ 0 ξ 0 E i1i2 iki1 E ii ii 0 (i,i )∈Ei (n) (n) by the independence condition on {ξij }i≤j. Therefore, since each ξij has mean zero, [ξ(n) . . . ξ(n) ] = 0 for any i that traverses an edge only once. Also, by the bounded E i1i2 iki1 (n) moment condition on the ξij , 0 0 0 0 fi(i,i ) bi(i,i ) fi(i,i )+bi(i,i ) Y (n) (n)∗ Y (n) E ξii0 ξii0 ≤ E ξii0 < A[i] 0 0 (i,i )∈Ei (i,i )∈Ei for some constant A[i] depending on the equivalence class of i. If C is an equivalence class of paths for which l(C) = k and w(C) = m, then there are n(n − 1) ... (n − m + 1) paths in k+1 k C ∩ {1, . . . , n} . Thus, if m < 2 + 1, then 1 X h (n) (n) i n(n − 1) ... (n − m + 1) lim E ξi i . . . ξi i ≤ lim · AC = 0. n→∞ nk/2+1 1 2 k 1 n→∞ nk/2+1 i∈C∩{1,...,n}k+1 k On the other hand, if each edge in Ei is traversed at least twice, then i traverses at most 2 k edges, and hence m ≤ 2 + 1. Equality holds when each edge is traversed exactly twice and 5 l(i) w(i) = 2 + 1, i.e. the subgraph traversed by i is a tree. A path that traverses each edge of a tree exactly twice must traverse each edge once in either direction, so i is a Wigner path. Thus Z k 1 X X Y h (n) (n)∗i lim x dLn = lim E ξ 0 ξ 0 , n→∞ n→∞ nk/2+1 ii ii k+1 0 C∈Wk i∈C∩{1,...,n} (i,i )∈Ei 2 h (n) (n)∗i (n) i i0 if the limit on the right exists. As ξ 0 ξ 0 = ξ 0 = s , , this gives the E ii ii E ii n n desired result. If k is odd, then Wk is empty, so the limit is 0. Let us recall the following solution to the classical Hamburger moment problem, whose proof can be found in [23]: ∞ Lemma 2.8. Given a sequence of real values {mk}k=0, there exists a Borel measure µ with R ∞ k moments −∞ x dµ = mk if and only if the k × k Hankel matrix {hij}0≤i,j≤k−1 given by hij = mi+j is positive semi-definite for each k. These lemmas are sufficient to show that the mean empirical eigenvalue distributions Ln converge in moment to a fixed measure µ: Proposition 2.9. There exists a Borel measure µ of bounded support, symmetric about 0, such that for all k ≥ 0, Z Z k k lim x dLn = x dµ. n→∞ k C Proof. Let us fix C ∈ Wk with m = 2 + 1 = w(C). Let i ∈ C be the Wigner path such that the distinct indices of iC , in the order in which they are visited, are 1, 2, . . . , m. Define m Qk 1/2 a function fC : [0, 1] → [0, ∞) by fC (x1, . . . , xm) = s(x C , x C ) if k ≥ 1, or j=1 ij ij+1 fC (x1) = 1 if k = 0. Then fC is bounded with discontinuity set of measure zero, because C C C the same properties hold for s and ij 6= ij+1 for any j if i is a Wigner path. So fC is Riemann integrable, and we have Z 1 Z 1 1 n n X X x1 xm ... fC (x1, . . . , xm)dx1 . . . dxm = lim m ... fC n ,..., n 0 0 n→∞ n x1=1 xm=1 1 n X x1 xm = lim fC ,..., , n→∞ nm n n x1,...,xm=1 x16=...6=xm where the second equality holds because fC is bounded. But by the definition of fC , n k 1/2 X x1 xm X Y ij ij+1 fC n ,..., n = s n , n , x1,...,xm=1 i∈C∩{1,...,n}k+1 j=1 x16=...6=xm which is the quantity appearing on the right hand side of (1) in Lemma 2.7. Thus the limit in Lemma 2.7 exists, and we have Z k X lim x dLn = [fC ] n→∞ E C∈Wk m for even k, where E[fC ] denotes the average value of fC over [0, 1] . For each n, the k × k Hankel matrix of moments for Ln as defined in Lemma 2.8 is positive semi-definite for all k. As all moments of Ln converge as n → ∞, the limits of these k × k Hankel matrices as n → ∞ exist and must also be positive semi-definite for all k. Hence there exists a Borel measure µ whose moments are the limits of those of Ln, by Lemma 2.8. 6 To show that µ has bounded support, let us compute the size of Wk. Consider the map k+1 ϕ : Wk → Z such that ϕ([i])1 = 0, and ϕ([i])j+1 = ϕ([i])j + 1 if (ij, ij+1) is the first traversal of that edge in i and ϕ([i])j+1 = ϕ([i])j − 1 if (ij, ij+1) is the second traversal of that edge in i, for each j = 1, . . . , k. Then ϕ([i])k+1 = 0 and ϕ([i])j ≥ 0 for all j, i.e. ϕ([i]) is a Dyck path. It is straightforward to verify that ϕ is a bijection between Wigner paths and Dyck paths, the latter enumerated by the Catalan numbers. Hence, for 1 k k 1/2 k even k, |Wk| = k/2+1 k/2 ≤ 2 . Together with the bound E[fC ] ≤ (ksk∞ ) , we obtain k R k 1/2 h 1/2 1/2 i that limn→∞ x dLn ≤ 2(ksk∞ ) , so µ is supported on −2(ksk∞ ), 2(ksk∞ ) . It is symmetric about 0 because its odd moments are 0 by Lemma 2.7. 2.2 Convergence of the empirical distribution Ln To conclude the proof of Proposition 2.3, we extend our notion of path equivalence to an equivalence relation on pairs of paths: Definition 2.10. Two ordered pairs of paths (i, i0) and (j, j0) are equivalent if there is a bijection ϕ : N → N that maps i to j and i0 to j0 when applied componentwise. Let [(i, i0)] 0 denote the equivalence class of (i, i ). Let Pk be the set of equivalence classes of pairs of closed paths of length k. The following lemma shows convergence in probability of the moments of Ln to those of Ln: Lemma 2.11. For all ε > 0 and k ≥ 0, Z Z k k lim P x dLn − x dLn ≥ ε = 0. n→∞ R k R k Proof. As E x dLn = x dLn, Chebyshev’s inequality gives Z Z "Z Z 2# k k 1 k k P x dLn − x dLn ≥ ε ≤ E x dLn − x dLn ε2 " # ! 1 Z 2 Z 2 = xkdL − xkdL ε2 E n n 1 2 = ( 1 tr Xk)2 − 1 tr Xk ε2 E n n E n n n 1 X = [X (i , i ) ...X (i , i )X (i0 , i0 ) ...X (i0 , i0 )] ε2n2 E n 1 2 n k 1 n 1 2 n k 1 i1,...,ik=1 0 0 i1,...,ik=1 0 0 0 0 − E[Xn(i1, i2) ...Xn(ik, i1)]E[Xn(i1, i2) ...Xn(ik, i1)] 1 X X h (n) (n) (n) (n) i = E ξi i . . . ξi i ξi0 i0 . . . ξi0 i0 ε2nk+2 1 2 k 1 1 2 k 1 C∈Pk (i,i0)∈C∩{1,...,n}k+1×{1,...,n}k+1 h (n) (n) i h (n) (n) i − E ξi i . . . ξi i E ξ 0 0 . . . ξ 0 0 . (2) 1 2 k 1 i1i2 iki1 0 0 0 0 If Ei = (i, i ) | i ≤ i , (i, i ) = (ij, ij+1) or (i , i) = (ij, ij+1) for some j is the set of edges traversed by any closed path i, then f (i,i0) b (i,i0) h (n) (n) i Y (n) i (n)∗ i ξ . . . ξ = ξ 0 ξ 0 E i1i2 iki1 E ii ii 0 (i,i )∈Ei 7 and 0 0 0 0 f (i,i )+f 0 (i,i ) b (i,i )+b 0 (i,i ) h (n) (n) (n) (n) i Y (n) i i (n)∗ i i E ξi i . . . ξi i ξ 0 0 . . . ξ 0 0 = E ξ 0 ξ 0 1 2 k 1 i1i2 iki1 ii ii 0 (i,i )∈Ei∪Ei0 (n) 0 by the independence condition on {ξij }i≤j. Hence for any pair of paths (i, i ) that together traverse an edge only once, the corresponding term in the sum in (2) is 0, and any other pair of paths traverses at most k distinct undirected edges. Also, for any pair of paths (i, i0) for which Ei and Ei0 are disjoint, the corresponding term in (2) is 0 since 0 0 0 0 f (i,i )+f 0 (i,i ) b (i,i )+b 0 (i,i ) Y (n) i i (n)∗ i i E ξii0 ξii0 0 (i,i )∈Ei∪Ei0 0 0 0 0 f (i,i ) b (i,i ) f 0 (i,i ) b 0 (i,i ) Y (n) i (n)∗ i Y (n) i (n)∗ i = E ξii0 ξii0 E ξii0 ξii0 . 0 0 (i,i )∈Ei (i,i )∈Ei0 Any pair of paths for which Ei and Ei0 are not disjoint is such that the undirected edges of Ei ∪ Ei0 form a connected subgraph. Hence any equivalence class C of closed path pairs that contributes to the sum in (2) defines a pair of paths with at most k + 1 distinct indices, and so #{(i, i0) ∈ C} ≤ n(n − 1) ... (n − k). Together with the bounded moment condition (n) on ξij , this implies Z Z k k n(n − 1) ... (n − k) x dLn − x dLn ≥ ε ≤ · max A[i,i0] P 2 k+2 0 ε n [i,i ]∈Pk for a collection of bounds A[i,i0]. Taking the limit as n → ∞ gives the desired result. Proof of Proposition 2.3. By Proposition 2.9, Ln converges in moment to a measure µ, symmetric about 0, of bounded support. Suppose that µ is supported on [−C,C], and set B > max(1,C2) ≥ C. Given any bounded continuous function f, there exists a polynomial Q such that g = f − Q satisfies sup|x|≤B |g(x)| < ε/8 by the Weierstrass approximation theorem. Then we have Z Z Z Z Z Z Z f dLn − f dµ ≤ g1|x|≤B dLn − g1|x|≤B dµ + g1|x|>B dLn + Q dLn − Q dµ Z Z Z Z Z ε ≤ + g1|x|>B dLn + Q dLn − Q dLn + Q dLn − Q dµ , 4 so Z Z Z Z Z 1 ε ε P f dLn − f dµ > ε ≤ P g |x|>B dLn > + P Q dLn − Q dLn > 4 4 Z Z ε + P Q dLn − Q dµ > . 4 As f is bounded, Q is a polynomial, and B > 1, there exists some k such that g(x) ≤ |x|k on |x| > B. Then Z Z Z 1 ε k1 ε 4 k1 P g |x|>B dLn > ≤ P |x| |x|>B dLn > ≤ |x| |x|>B dLn 4 4 ε by Markov’s inequality. Cauchy-Schwartz’s inequality then gives Z sZ sZ k 2k |x| 1|x|>B dLn ≤ x dLn 1|x|>B dLn, 8 and Chebyshev’s inequality gives Z 1 Z 1 dL = L |x|k > Bk ≤ x2kdL . |x|>B n n B2k n Hence, by Proposition 2.9, Z 4 R x2kdL 4 R x2kdµ 2k k1 ε n 4C lim sup P |x| |x|>B dLn > ≤ lim k = k ≤ k . n→∞ 4 n→∞ εB εB εB