Introduction to Random Matrix Theory

MATH 247A: INTRODUCTION TO RANDOM MATRIX THEORY TODD KEMP For the next ten weeks, we will be studying the eigenvalues of random matrices. A random matrix is simply a matrix (for now square) all of whose entries are random variables. That is: X is an n×n matrix with fXijg1≤i;j≤n random variables on some probability space 2 2 (Ω; F ; P). Alternatively, one can think of X as a random vector in Rn (or Cn ), although it is better to think of it as taking values in Mn(R) or Mn(C) (to keep the matrix structure in mind). For any fixed instance ! 2 Ω, then, X(!) is an n × n matrix and (maybe) has eigenvalues λ1(!); : : : ; λn(!). So the eigenvalues are also random variables. We seek to understand aspects of the distributions of the λi from knowledge of the distribution of the Xij. In order to guarantee that the matrix actually has eigenvalues (and they accurately capture the behavior of the linear transformation X), we will make the assumption that X is a symmetric / Hermitian matrix. 1. WIGNER MATRICES We begin by fixing an infinite family of real-valued random variables fYijg1≤i≤j. Then we can define a sequence of symmetric random matrices Yn by ( Yij; i ≤ j [Yn]ij = : Yji; i > j The matrix Yn is symmetric and so has n real eigenvalues, which we write in increasing order λ1(Yn) ≤ · · · ≤ λn(Yn). In this very general setup, little can be said in general about these eigenvalues. The class of matrices we are going to spend most of this quarter studying, Wigner matrices, are given by the following conditions. • We assume that the fYijg1≤i≤j are independent. • We assume that the diagonal entries fYiigi≥1 are identically-distributed, and the off-diagonal entries fYijg1≤i<j are identically-distributed. 2 2 2 • We assume that E(Yij) < 1 for all i; j. (I.e. r2 = maxfE(Y11); E(Y12)g < 1.) It is not just for convenience that we separate out the diagonal terms; as we will see, they really do not contribute to the eigenvalues in the limit as n ! 1. It will also be conve- nient, at least at the start, to strengthen the final assumption to moments of all orders: for k ≥ 1, set k k rk = maxfE(Y11); E(Y12)g: To begin, we will assume that rk < 1 for each k; we will weaken this assumption later. Note: in the presence of higher moments, much of the following will not actually require identically-distributed entries. Rather, uniform bounds on all moments suffice. Herein, we will satisfy ourselves with the i.i.d. case. Date: Fall 2011. 1 2 TODD KEMP One variation we will allow from i.i.d. is a uniform scaling of the matrices as n in- creases. That is: let αn be a sequence of positive scalars, and set Xn = αnYn: In fact, there is a natural choice for the scaling behavior, in order for there to be limiting behavior of eigenvalues. That is to say: we would like to arrange (if possible) that both λ1(Xn) and λn(Xn) converge to finite numbers (different from each other). An easy way to test this is suggested by the following simple calculation: n 2 2 2 2 2 n 2 2 2 n · minfλ1; λng ≤ n · minfλj g ≤ λ1 + ··· + λn ≤ n · maxfλj g = n · maxfλ1; λng: j=1 j=1 Hence, in order for λ1 and λn to both converge to distinct constants, it is necessary for the 1 2 2 sequence n (λ1+···+λn) to be bounded (and not converge to 0). Fortunately, this sequence can be calculated without explicit reference to eigenvalues: it is the (normalized) Hilbert- Schmidt norm (or Fröbenius norm) of a symmetric matrix X: n 1 X 1 X kXk2 = λ (X )2 = X2 : 2 n i n n ij i=1 1≤i;j≤n Exercise 1.0.1. Verify the second equality above, by showing (using the spectral theorem) that 1 2 both expressions are equal to the quantity n Tr (X ). For our random matrix Xn above, then, we can calculate the expected value of this norm: n 1 X α2 X 2α2 X kX k2 = α2 (Y 2) = n (Y )2 + n (Y 2) E n 2 n n E ij n E ii n E ij 1≤i;j≤n i=1 1≤i<j≤n 2 2 2 2 = αn · E(Y11) + (n − 1)αn · E(Y12): 2 We now have two cases. If E(Y12) = 0 (meaning the off-diagonal terms are all 0 a:s:) then we see the “correct” scaling for αn is αn ∼ 1. This is a boring case: the matrices Xn are diagonal, with all diagonal entries identically distributed. Thus, these entries are also the eigenvalues, and so the distribution of eigenvalues is given by the common distribution 2 of the diagonal entries. We ignore this case, and therefore assume that E(Y12) > 0. Hence, 2 in order for EkXnk2 to be a bounded sequence (that does not converge to 0), we must −1=2 have αn ∼ n . 2 Definition 1.1. Let fYijg1≤i≤j and Yn be as above, with r2 < 1 and E(Y12) > 0. Then the −1=2 matrices Xn = n Yn are Wigner matrices. It is standard to abuse notation and refer to the sequence Xn as a a Wigner matrix. The pre- ceding calculation shows that, if Xn is a Wigner matrix, then the expected Hilbert-Schmidt 2 norm EkXnk2 converges (as n ! 1) to the second moment of the (off-diagonal) entries. As explained above, this is prerequisite to the bulk convergence of the eigenvalues. As we will shortly see, it is also sufficient. Consider the following example. Take the entires Yij to be N(0; 1) normal random variables. These are easy to simulate with MATLAB. Figure 1 shows the histogram of all n = 4000 eigenvalues of one instance of the corresponding Gaussian Wigner matrix MATH 247A: INTRODUCTION TO RANDOM MATRIX THEORY 3 X4000. The plot suggests that λ1(Xn) ! −2 while λn(Xn) ! 2 in this case. Moreover, although random fluctuations remain, it appears that the histogram of eigenvalues (some- times called the density of eigenvalues) converges to a deterministic shape. In fact, this is universally true. There is a universal probability distribution σt such that the density of eigenvalues of any Wigner matrix (with second moment t) converges to σt. The limiting distribution is known as Wigner’s semicircle law: 1 σ (dx) = p(4t − x2) dx: t 2πt + 60 50 40 30 20 10 0 ï2 ï1.5 ï1 ï0.5 0 0.5 1 1.5 2 FIGURE 1. The density of eigenvalues of an instance of X4000, a Gaussian Wigner matrix. 2. WIGNER’S SEMICIRCLE LAW Theorem 2.1 (Wigner’s Semicircle Law). Let Xn be a sequence of Wigner matrices, with entries 2 satisfying E(Yij) = 0 for all i; j and E(Y12) = t. Let I ⊂ R be an interval. Define the random variables #(fλ (X ); : : : ; λ (X )g \ I) E (I) = 1 n n n n n Then En(I) ! σt(I) in probability as n ! 1. 4 TODD KEMP The first part of this course is devoted to proving Wigner’s Semicircle Law. The key ob- servation (that Wigner made) is that one can study the behavior of the random variables En(I) without computing the eigenvalues directly. This is accomplished by reinterpreting the theorem in terms of a random measure, the empirical law of eigenvalues. Definition 2.2. Let Xn be a Wigner matrix. Its empirical law of eigenvalues µXn is the random discrete probability measure n 1 X µ = δ : Xn n λj (Xn) j=1 That is: µX is defined as the (random) probability measure such that, for any continuous function n R f 2 C(R), the integral f dµXn is the random variable n Z 1 X f dµ = f(λ (X )): Xn n j n j=1 R 1 Note that the random variables En(I) in Theorem 2.1 are given by En(I) = I dµXn . Although 1I is not a continuous function, a simple approximation argument shows that the following theorem (which we also call Wigner’s semicircle law) is a stronger version of Theorem 2.1. Theorem 2.3 (Wigner’s Semicircle Law). Let Xn be a sequence of Wigner matrices, with entries 2 satisfying E(Yij) = 0 for all i; j and E(Y12) = t. Then the empirical law of eigenvalues µXn converges in probability to σt as n ! 1. Precisely: for any f 2 Cb(R) (continuous bounded functions) and each > 0, Z Z lim P f dµXn − f dσt > = 0: n!1 In this formulation, we can use the spectral theorem to eliminate the explicit appearance > of eigenvalues in the law µXn . Diagonalize Xn = Un ΛnUn. Then (by definition) n n Z 1 X 1 X f dµ = f(λ (X )) = f([Λ ] ) Xn n j n n n jj j=1 j=1 1 1 1 = Tr f(Λ ) = Tr U >f(Λ )U = Tr f(X ): n n n n n n n n The last equality is the statement of the spectral theorem. Usually we would use it in reverse to define f(Xn) for measurable f. However, in this case, we will take f to be a polynomial. (Since both µXn and σt are compactly-supported, any polynomial is equal to a Cb(R) function on their supports, so this is consistent with Theorem 2.3.) This leads us to the third, a priori weaker form of Wigner’s law.

Introduction to Random Matrix Theory

Random Matrix Theory and Covariance Estimation

Rotation Matrix Sampling Scheme for Multidimensional Probability Distribution Transfer

The Distribution of Eigenvalues of Randomized Permutation Matrices Tome 63, No 3 (2013), P

New Results on the Spectral Density of Random Matrices

Random Rotation Ensembles

Moments of Random Matrices and Hypergeometric Orthogonal Polynomials

"Integrable Systems, Random Matrices and Random Processes"

Computing the Jordan Structure of an Eigenvalue∗

Cycle Indices for Finite Orthogonal Groups of Even Characteristic

Random Density Matrices Results at ﬁxed Size Asymptotics

Math-Ph/0111005V2 4 Sep 2003 Osbe Hsrsac a Enspotdi Atb H R Gran FRG Stay the My by Making Part for in Conrey Supported NSF

Random Matrix Theory and Financial Correlations