Lecture 12: Concentration for sums of random matrices CSE 525, Spring 2019 Instructor: James R. Lee

1 Random matrices

Consider the graph sparsification problem: Given a graph $G = (V, E)$, we want to approximate $G$ (in a sense to be defined later) by a sparse graph $H = (V, E')$. Generally we would like that $E' \subseteq E$ and moreover $|E'|$ is as small as possible, say $O(n)$ or $O(n \log n)$, where $n = |V|$. We will be able to do this by choosing a (nonuniform) random sample of the edges, but to analyze such a process, we will need a large-deviation inequality for sums of random matrices.

1.1 Symmetric matrices

If $A$ is a $d \times d$ real symmetric matrix, then $A$ has all real eigenvalues, which we can order $\lambda_1(A) \geq \lambda_2(A) \geq \cdots \geq \lambda_d(A)$. The operator norm of $A$ is

$$\|A\| := \max_{\|x\|_2 = 1} \|Ax\|_2 = \max\{|\lambda_i(A)| : i \in \{1, \ldots, d\}\}.$$

The trace of $A$ is $\operatorname{Tr}(A) = \sum_{i=1}^d A_{ii} = \sum_{i=1}^d \lambda_i(A)$. The trace norm of $A$ is $\|A\|_* = \sum_{i=1}^d |\lambda_i(A)|$. $A$ is positive semidefinite (PSD) if all its eigenvalues are nonnegative. Note that for a PSD matrix $A$, we have $\operatorname{Tr}(A) = \|A\|_*$. We also recall the matrix exponential $e^A = \sum_{k=0}^{\infty} \frac{A^k}{k!}$, which is well-defined for all real symmetric $A$ and is itself also a real symmetric matrix.

If $A$ is symmetric, then $e^A$ is always PSD, as the next argument shows. Every real symmetric matrix can be diagonalized, writing $A = U^T D U$, where $U$ is an orthogonal matrix, i.e. $U U^T = U^T U = I$, and $D$ is diagonal. One can easily check that $A^k = U^T D^k U$ for any $k \in \mathbb{N}$, thus $A^k$ and $A$ are simultaneously diagonalizable. It follows that $A$ and $e^A$ are simultaneously diagonalizable. In particular, we have $\lambda_i(e^A) = e^{\lambda_i(A)}$.

Finally, note that for symmetric matrices $A$ and $B$, we have $|\operatorname{Tr}(AB)| \leq \|A\| \cdot \|B\|_*$. To see this, let $\{u_i\}$ be an orthonormal basis of eigenvectors of $B$ with $B u_i = \lambda_i(B)\, u_i$. Then

$$|\operatorname{Tr}(AB)| = \left|\sum_{i=1}^d \langle u_i, A B u_i \rangle\right| = \left|\sum_{i=1}^d \lambda_i(B) \langle u_i, A u_i \rangle\right| \leq \sum_{i=1}^d |\lambda_i(B)| \cdot \|A\| = \|B\|_* \, \|A\|.$$
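These facts are easy to sanity-check numerically. The following is a minimal sketch (not part of the notes), assuming numpy and scipy are available; the random matrices and all variable names are purely illustrative.

```python
import numpy as np
from scipy.linalg import expm  # matrix exponential

rng = np.random.default_rng(0)
d = 5

# Random real symmetric matrices A and B.
M = rng.standard_normal((d, d)); A = (M + M.T) / 2
N = rng.standard_normal((d, d)); B = (N + N.T) / 2

# Operator norm = largest |eigenvalue|; trace norm = sum of |eigenvalues|.
eig_A = np.linalg.eigvalsh(A)
assert np.isclose(np.linalg.norm(A, 2), np.max(np.abs(eig_A)))
trace_norm_B = np.sum(np.abs(np.linalg.eigvalsh(B)))

# e^A is PSD and lambda_i(e^A) = exp(lambda_i(A)).
eig_expA = np.linalg.eigvalsh(expm(A))
assert np.all(eig_expA > 0)
assert np.allclose(eig_expA, np.exp(eig_A))

# |Tr(AB)| <= ||A|| * ||B||_*.
assert abs(np.trace(A @ B)) <= np.linalg.norm(A, 2) * trace_norm_B + 1e-9
```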

Many classical statements are either false or significantly more difficult to prove when translated to the matrix setting. For instance, while $e^{x+y} = e^x e^y = e^y e^x$ is true for arbitrary real numbers $x$ and $y$, it is only the case that $e^{A+B} = e^A e^B$ if $A$ and $B$ are simultaneously diagonalizable. However, somewhat remarkably, a matrix analog does hold, as an inequality, once we take the trace.
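As a quick numerical illustration of this point (a sketch not from the notes, again assuming numpy and scipy), one can check on random symmetric matrices that $e^{A+B}$ and $e^A e^B$ generally differ, while the trace inequality of Theorem 1.1 below still holds:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)
d = 4
M = rng.standard_normal((d, d)); A = (M + M.T) / 2
N = rng.standard_normal((d, d)); B = (N + N.T) / 2

# For non-commuting A and B, e^{A+B} and e^A e^B generally differ...
print(np.allclose(expm(A + B), expm(A) @ expm(B)))           # typically False

# ...but Tr(e^{A+B}) <= Tr(e^A e^B) still holds (Golden-Thompson).
print(np.trace(expm(A + B)) <= np.trace(expm(A) @ expm(B)))  # True
```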

Theorem 1.1 (Golden-Thompson inequality). If $A$ and $B$ are real symmetric matrices, then

$$\operatorname{Tr}(e^{A+B}) \leq \operatorname{Tr}(e^A e^B).$$

Proof. We can prove this using the non-commutative Hölder inequality: For any even integer $p \geq 2$ and real symmetric matrices $A_1, A_2, \ldots, A_p$:

$$\operatorname{Tr}(A_1 A_2 \cdots A_p) \leq \|A_1\|_{S_p} \|A_2\|_{S_p} \cdots \|A_p\|_{S_p},$$

where $\|A\|_{S_p} = \operatorname{Tr}\big((A^T A)^{p/2}\big)^{1/p}$ is the Schatten $p$-norm. Consider real symmetric matrices $U, V$. Applying this with $A_1 = \cdots = A_p = UV$ gives, for every even $p \geq 2$:
$$\operatorname{Tr}\big((UV)^p\big) \leq \|UV\|_{S_p}^p = \operatorname{Tr}\big((V^T U^T U V)^{p/2}\big) = \operatorname{Tr}\big((V U^2 V)^{p/2}\big) = \operatorname{Tr}\big((U^2 V^2)^{p/2}\big),$$
where the last equality uses the cyclic property of the trace. Applying this inequality repeatedly (to $U^2$ and $V^2$ with exponent $p/2$, and so on) now yields, for every even $p$,
$$\operatorname{Tr}\big((UV)^p\big) \leq \operatorname{Tr}(U^p V^p).$$
If we now take $U = e^{A/p}$ and $V = e^{B/p}$, this gives

$$\operatorname{Tr}\big((e^{A/p} e^{B/p})^p\big) \leq \operatorname{Tr}(e^A e^B). \tag{1.1}$$
For $p$ large, we can use the Taylor approximation $e^{A/p} = I + A/p + O(1/p^2)$ and similarly for $e^{B/p}$. Thus $e^{A/p} e^{B/p} = e^{(A+B)/p} + O(1/p^2)$, and hence $(e^{A/p} e^{B/p})^p \to e^{A+B}$. Therefore taking $p \to \infty$ in (1.1) gives
$$\operatorname{Tr}(e^{A+B}) \leq \operatorname{Tr}(e^A e^B). \qquad \square$$

1.2 The Laplace transform for matrices

We will now consider a random $d \times d$ real matrix $X$. The entries $(X_{ij})$ of $X$ are all (not necessarily independent) random variables. We have seen inequalities (like those named after Chernoff and Azuma) which assert that if $X = X_1 + X_2 + \cdots + X_n$ is a sum of independent random numbers, then $X$ is tightly concentrated around its mean. Our goal now is to prove a similar fact for sums of independent random symmetric matrices.

First, observe that the trace is a linear operator; this is easy to see from the fact that it is the sum of the diagonal entries of its argument. If $A$ and $B$ are arbitrary real matrices, then $\operatorname{Tr}(A + B) = \operatorname{Tr}(A) + \operatorname{Tr}(B)$. This implies that if $X$ is a random matrix, then $\mathbb{E}[\operatorname{Tr}(X)] = \operatorname{Tr}(\mathbb{E}[X])$. Note that $\mathbb{E}[X]$ is the matrix defined by $(\mathbb{E}[X])_{ij} = \mathbb{E}[X_{ij}]$.

Suppose that $X_1, X_2, \ldots, X_n$ are independent random real symmetric matrices. Let $X = X_1 + X_2 + \cdots + X_n$. Let $S_k = X_1 + \cdots + X_k$ be the partial sum of the first $k$ terms, so that $X = S_n$. Our first goal will be to bound the probability that $X$ has an eigenvalue bigger than $t$. To do this, we will try to extend the method of exponential moments to work with symmetric matrices, as discovered by Ahlswede and Winter. It is much simpler than previous approaches that only worked for special cases.

Note that for $\beta > 0$, we have $\lambda_i(e^{\beta X}) = e^{\beta \lambda_i(X)}$. Therefore:
$$\mathbb{P}\Big[\max_i \lambda_i(X) > t\Big] = \mathbb{P}\Big[\max_i \lambda_i(e^{\beta X}) > e^{\beta t}\Big] \leq \mathbb{P}\big[\operatorname{Tr}(e^{\beta X}) > e^{\beta t}\big], \tag{1.2}$$
where the last inequality uses the fact that all the eigenvalues of $e^{\beta X}$ are nonnegative, hence $\operatorname{Tr}(e^{\beta X}) = \sum_i \lambda_i(e^{\beta X}) \geq \max_i \lambda_i(e^{\beta X})$.

Now Markov's inequality implies that

$$\mathbb{P}\big[\operatorname{Tr}(e^{\beta X}) > e^{\beta t}\big] \leq \frac{\mathbb{E}[\operatorname{Tr}(e^{\beta X})]}{e^{\beta t}}. \tag{1.3}$$
As in our earlier uses of the Laplace transform, our goal is now to bound $\mathbb{E}[\operatorname{Tr}(e^{\beta X})]$ by a product that has one factor for each term $X_i$.

In the matrix setting, this is more subtle: Using Theorem 1.1,

$$\mathbb{E}[\operatorname{Tr}(e^{\beta X})] = \mathbb{E}\big[\operatorname{Tr}(e^{\beta (S_{n-1} + X_n)})\big] \leq \mathbb{E}\big[\operatorname{Tr}(e^{\beta S_{n-1}} e^{\beta X_n})\big].$$

Now we push the expectation over $X_n$ inside the trace:

$$\mathbb{E}\big[\operatorname{Tr}(e^{\beta S_{n-1}} e^{\beta X_n})\big] = \mathbb{E}\Big[\operatorname{Tr}\big(e^{\beta S_{n-1}}\, \mathbb{E}[e^{\beta X_n} \mid X_1, \ldots, X_{n-1}]\big)\Big] = \mathbb{E}\Big[\operatorname{Tr}\big(e^{\beta S_{n-1}}\, \mathbb{E}[e^{\beta X_n}]\big)\Big],$$
and we have used independence to pull $e^{\beta S_{n-1}}$ outside the expectation and then to remove the conditioning. Finally, we use the fact that $\operatorname{Tr}(AB) \leq \|A\| \cdot \|B\|_*$ and $\|B\|_* = \operatorname{Tr}(B)$ when $B$ has all nonnegative eigenvalues (as is the case for $e^{\beta S_{n-1}}$):

$$\mathbb{E}\Big[\operatorname{Tr}\big(e^{\beta S_{n-1}}\, \mathbb{E}[e^{\beta X_n}]\big)\Big] \leq \big\|\mathbb{E}[e^{\beta X_n}]\big\| \cdot \mathbb{E}\big[\operatorname{Tr}(e^{\beta S_{n-1}})\big].$$
Doing this $n$ times yields

$$\mathbb{E}[\operatorname{Tr}(e^{\beta X})] \leq \operatorname{Tr}(I) \prod_{i=1}^n \big\|\mathbb{E}[e^{\beta X_i}]\big\| = d \prod_{i=1}^n \big\|\mathbb{E}[e^{\beta X_i}]\big\|.$$
Combining this with (1.2) and (1.3) yields

$$\mathbb{P}\Big[\max_i \lambda_i(X) > t\Big] \leq e^{-\beta t}\, d \prod_{i=1}^n \big\|\mathbb{E}[e^{\beta X_i}]\big\|.$$
We can also apply this to $-X$ to get
$$\mathbb{P}\big[\|X\| > t\big] \leq e^{-\beta t}\, d \left( \prod_{i=1}^n \big\|\mathbb{E}[e^{\beta X_i}]\big\| + \prod_{i=1}^n \big\|\mathbb{E}[e^{-\beta X_i}]\big\| \right). \tag{1.4}$$
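To see how (1.4) can be used, here is a minimal numerical sketch (not from the notes; the toy distribution and all names are illustrative). The $X_i$ are diagonal matrices of independent random signs, so $\|\mathbb{E}[e^{\pm\beta X_i}]\| = \cosh(\beta)$ exactly, and the right-hand side of (1.4) can be compared against an empirical estimate of the tail probability:

```python
import numpy as np

rng = np.random.default_rng(2)
d, n, beta, t = 10, 400, 0.2, 80.0

# Toy case: X_i = diag(eps_{i,1}, ..., eps_{i,d}) with i.i.d. +/-1 signs, so
# E[e^{beta X_i}] = cosh(beta) * I and its operator norm is cosh(beta).
norm_mgf = np.cosh(beta)

# Right-hand side of (1.4); both products are equal by symmetry of the signs.
rhs = np.exp(-beta * t) * d * 2 * norm_mgf ** n

# Empirical estimate of P[||X_1 + ... + X_n|| > t]; for diagonal matrices the
# operator norm of the sum is the largest coordinate-wise |sum of signs|.
trials = 1000
sums = rng.choice([-1.0, 1.0], size=(trials, n, d)).sum(axis=1)
emp = np.mean(np.max(np.abs(sums), axis=1) > t)
print(emp, "<=", rhs)
```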

1.3 Concentration

Let $Y$ be a random, symmetric, PSD $d \times d$ matrix with $\mathbb{E}[Y] = I$. Suppose that $\|Y\| \leq L$ with probability one.

Theorem 1.2. If $Y_1, Y_2, \ldots, Y_n$ are i.i.d. copies of $Y$, then for any $\varepsilon \in (0, 1)$ the following holds. Let $\lambda_1, \lambda_2, \ldots, \lambda_d$ denote the eigenvalues of $\frac{1}{n} \sum_{i=1}^n Y_i$. Then
$$\mathbb{P}\big[\{\lambda_1, \lambda_2, \ldots, \lambda_d\} \subseteq [1 - \varepsilon, 1 + \varepsilon]\big] \geq 1 - 2d \exp\big(-\varepsilon^2 n/(4L)\big).$$

There is a slightly nicer way to write this using the Löwner ordering of symmetric matrices: We write $A \preceq B$ to denote that the matrix $B - A$ is positive semidefinite. We can rewrite the conclusion of Theorem 1.2 as
$$\mathbb{P}\left[(1 - \varepsilon)\, I \preceq \frac{1}{n} \sum_{i=1}^n Y_i \preceq (1 + \varepsilon)\, I\right] \geq 1 - 2d \exp\big(-\varepsilon^2 n/(4L)\big). \tag{1.5}$$

Proof of Theorem 1.2. Define $X_i := Y_i - \mathbb{E}[Y_i]$ and $X := X_1 + \cdots + X_n$. Then (1.5) is equivalent to
$$\mathbb{P}\big[\|X\| > \varepsilon n\big] \leq 2d \exp\big(-\varepsilon^2 n/(4L)\big).$$

We know from (1.4) that it will suffice to bound $\|\mathbb{E}[e^{\beta X_i}]\|$ for each $i$. To do this, we will use the fact that
$$1 + x \leq e^x \leq 1 + x + x^2 \qquad \forall x \in [-1, 1].$$
Note that if $A$ is a real symmetric matrix, then since $I$, $A$, $A^2$, and $e^A$ are simultaneously diagonalizable, this yields
$$I + A \preceq e^A \preceq I + A + A^2 \tag{1.6}$$
for any $A$ with $\|A\| \leq 1$.

Observe that $\mathbb{E}[X_i] = 0$. To evaluate $\|X_i\|$, let us use the fact that for real symmetric $A$, we have $\|A\| = \max_{\|x\|_2 = 1} |x^T A x|$. So consider some $x \in \mathbb{R}^d$ with $\|x\|_2 = 1$ and write
$$|x^T X_i x| = \big|x^T Y_i x - x^T \mathbb{E}[Y_i] x\big| \leq \max\big\{x^T Y_i x,\; x^T \mathbb{E}[Y_i] x\big\} \leq L,$$

where we have used the fact that since $Y_i$ is PSD, so is $\mathbb{E}[Y_i]$, and thus $x^T \mathbb{E}[Y_i] x$ and $x^T Y_i x$ are both nonnegative. We also used our assumption that $\|Y_i\| \leq L$. We conclude that $\|X_i\| \leq L$.

Moreover, we have

$$\mathbb{E}[X_i^2] = \mathbb{E}\big[(Y_i - \mathbb{E}[Y_i])^2\big] = \mathbb{E}[Y_i^2] - (\mathbb{E}[Y_i])^2 \preceq \mathbb{E}[Y_i^2] \preceq \mathbb{E}\big[\|Y_i\|\, Y_i\big] \preceq L\, \mathbb{E}[Y_i],$$
where in the final step we again used the assumption $\|Y_i\| \leq L$. Finally, since $\mathbb{E}[Y_i] = I$, we conclude that $\|\mathbb{E}[X_i^2]\| \leq L$.

Therefore for any $\beta \leq 1/L$, we can apply (1.6), yielding
$$\mathbb{E}[e^{\beta X_i}] \preceq I + \beta\, \mathbb{E}[X_i] + \beta^2\, \mathbb{E}[X_i^2] = I + \beta^2\, \mathbb{E}[X_i^2] \preceq e^{\beta^2 \mathbb{E}[X_i^2]}.$$
We conclude that for $\beta \leq 1/L$, we have $\|\mathbb{E}[e^{\beta X_i}]\| \leq e^{\beta^2 L}$.

Plugging this into (1.4), we see that

$$\mathbb{P}\big[\|X\| > \varepsilon n\big] \leq 2d\, e^{-\varepsilon n \beta} e^{\beta^2 L n}.$$
Choosing $\beta := \frac{\varepsilon}{2L}$ yields
$$\mathbb{P}\big[\|X\| > \varepsilon n\big] \leq 2d\, e^{-\varepsilon^2 n/(4L)},$$
completing the argument. $\square$
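As a quick Monte Carlo sanity check of Theorem 1.2 (a sketch under illustrative assumptions, not part of the notes): take $Y = d\, e_j e_j^T$ with $j$ uniform in $\{1, \ldots, d\}$, so that $Y$ is PSD, $\mathbb{E}[Y] = I$, and $\|Y\| = d = L$ with probability one. The eigenvalues of the empirical mean are then easy to read off.

```python
import numpy as np

rng = np.random.default_rng(3)
d, n = 20, 50000
L = d            # ||Y|| = d with probability one for this distribution

# Y = d * e_j e_j^T with j uniform, so E[Y] = I. The empirical mean
# (1/n) * sum_i Y_i is diagonal with entries counts[j] * d / n.
js = rng.integers(d, size=n)
counts = np.bincount(js, minlength=d)
mean_eigs = counts * d / n       # eigenvalues of the empirical mean

eps = 0.2
print(np.all((1 - eps <= mean_eigs) & (mean_eigs <= 1 + eps)))  # typically True
print("failure probability bound:", 2 * d * np.exp(-eps**2 * n / (4 * L)))
```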

Finally, we can prove a generalization of Theorem 1.2 for random matrices whose expectation is not the identity.

Theorem 1.3. Let $Z$ be a $d \times d$ random real, symmetric, PSD matrix. Suppose also that $Z \preceq L \cdot \mathbb{E}[Z]$ for some $L \geq 1$. If $Z_1, Z_2, \ldots, Z_n$ are i.i.d. copies of $Z$, then for any $\varepsilon > 0$, it holds that

" n # 1 Õ ε2n 4L P 1 ε E Z Zi 1 + ε E Z > 1 2de− / . ( − ) [ ]  n  ( ) [ ] − i1

In other words, the empirical mean $\frac{1}{n} \sum_{i=1}^n Z_i$ is very close (in a spectral sense) to the expectation $\mathbb{E}[Z]$.

Proof of Theorem 1.3. This is made difficult only because it may be that $A = \mathbb{E}[Z]$ is not invertible. Suppose that $A = U D U^T$, where $D$ is a diagonal matrix $D = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_k, 0, \ldots, 0)$, $k$ is the rank of $A$, and $\lambda_i \neq 0$ for $i = 1, \ldots, k$. Then the pseudoinverse of $A$ is defined by

$$A^+ = U \operatorname{diag}\big(\lambda_1^{-1}, \lambda_2^{-1}, \ldots, \lambda_k^{-1}, 0, \ldots, 0\big)\, U^T.$$

Note that $A A^+ = A^+ A = I_{\operatorname{im}(A)}$, where $I_{\operatorname{im}(A)}$ denotes the operator that acts by the identity on the image of $A$ and annihilates $\ker(A)$.

Since $A$ is PSD, $A^+$ is also PSD, and we can define $A^{+/2}$ as the square root of $A^+$. One can write this explicitly as
$$A^{+/2} = U \operatorname{diag}\big(\lambda_1^{-1/2}, \lambda_2^{-1/2}, \ldots, \lambda_k^{-1/2}, 0, \ldots, 0\big)\, U^T.$$
Now to prove Theorem 1.3, it suffices to apply Theorem 1.2 to the matrices $Y_i = A^{+/2} Z_i A^{+/2}$. Verification is left as an exercise. $\square$
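Below is a minimal sketch of this reduction (assuming numpy; the toy distribution for $Z$ and all names are illustrative, and the condition $Z \preceq L\, \mathbb{E}[Z]$ holds here with $L = m$). It computes $A^{+/2}$ from an eigendecomposition of $A = \mathbb{E}[Z]$ and checks that the transformed samples $Y_i = A^{+/2} Z_i A^{+/2}$ have empirical mean close to $I_{\operatorname{im}(A)}$.

```python
import numpy as np

rng = np.random.default_rng(4)
d, m = 6, 4

# Fixed vectors b_1, ..., b_m spanning an m-dimensional subspace of R^d, and
# Z = m * b_j b_j^T with j uniform, so E[Z] = sum_j b_j b_j^T = B B^T =: A.
B = rng.standard_normal((d, m))
A = B @ B.T                       # rank m < d, so A is not invertible

# Pseudoinverse square root A^{+/2} via an eigendecomposition of A.
lam, U = np.linalg.eigh(A)
inv_sqrt = np.where(lam > 1e-10, 1.0 / np.sqrt(np.maximum(lam, 1e-10)), 0.0)
A_half_pinv = U @ np.diag(inv_sqrt) @ U.T

# Empirical mean of Y_i = A^{+/2} Z_i A^{+/2}, which should be close to the
# identity on im(A) for large n.
n = 5000
Y_mean = np.zeros((d, d))
for j in rng.integers(m, size=n):
    Z = m * np.outer(B[:, j], B[:, j])
    Y_mean += A_half_pinv @ Z @ A_half_pinv / n

I_imA = A_half_pinv @ A @ A_half_pinv        # identity on the image of A
print(np.linalg.norm(Y_mean - I_imA, 2))     # small for large n
```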
