School of Industrial and Information Engineering
Master of Science in Mathematical Engineering
RANDOM MATRIX THEORY AND APPLICATIONS IN TELECOMMUNICATION AND QUANTUM SYSTEMS
Author: Zheng LI
Supervisor: Franco FAGNOLA
Academic Year 2018/2019
To C.S. Ying
Abstract
The study of random matrices started in the 1940s, when physicists observed that the empirical spectral distributions of random Hamiltonians tend to a semicircle. Since then, more and more research results about random matrices have been published, and random matrix theory has turned out to have deep connections with free probability and combinatorics. In the meantime, random matrix theory has been applied in many other fields, in almost all situations where one wants to know the asymptotic behavior of some statistic determined by the spectra of large matrices. In this thesis, we present an overview of random matrix theory and illustrate its applications: to MIMO telecommunication systems, for the computation of channel capacities; to CDMA systems, for the evaluation of minimum mean square errors and spectral efficiency; to quantum information, for the study of quantum channel capacities and the celebrated conjecture on the additivity of quantum entropy; and to open quantum systems, for finding the spectra of random Lindblad operators.
Keywords: Random Matrix Theory, Telecommunication Systems, Open Quan- tum Systems.
Acknowledgements
Firstly, I would like to express my sincere gratitude to Prof. F. Fagnola, who supervised the whole writing of my thesis. He gave me clear guidelines for studying the different subjects, and was always available when I encountered difficulties. I would also like to thank Prof. V. Moretti, who patiently spent much time demonstrating some proofs for me. I wish to thank T. Kletti, who kindly read the manuscript, gave me suggestions, and always helped me in my study of mathematics. I also wish to thank K. Dong for meaningful discussions on telecommunication systems, and S.F. Zhang for help with English writing. I also have to thank my family, in particular my uncle D.S. Li, the main financial supporter of my studies here in Italy, who unfortunately passed away last year; I hope he would be glad to know of the accomplishment of my master thesis. Finally, I want to express special gratitude to the Department of Mathematics of Politecnico di Milano, which gave me, once a layman, a precious opportunity to study mathematics, and has opened my eyes to a new and beautiful world.
Contents
Abstract
Acknowledgements
1 Introduction
2 Preliminaries
  2.1 Information Theory
    2.1.1 Complex Random Vector
    2.1.2 Entropy and Mutual Information
  2.2 Estimation Theory
    2.2.1 Minimal Mean Squared Error Estimator
    2.2.2 Linear MMSE Estimator
  2.3 Probability Measures on Metric Space
    2.3.1 Weak Convergence of Probability Measures
    2.3.2 Tightness and Relative Compactness
    2.3.3 Other Types of Convergence
  2.4 Bounded Linear Operators on Hilbert Space
    2.4.1 Banach Algebra and C*-Algebra
    2.4.2 Adjoint Operator
    2.4.3 Isometry and Partial Isometry
    2.4.4 Trace Class Operator
    2.4.5 Von Neumann Algebra
3 Random Matrix Theory
  3.1 Empirical Spectral Distribution
  3.2 Convergence of Random Distributions
    3.2.1 From Deterministic Distribution to Random Distribution
    3.2.2 General Facts on Convergence of Random Distributions
    3.2.3 Common Types of Convergence Used in RMT
  3.3 Stieltjes Transform
    3.3.1 Definition and Basic Properties of Stieltjes Transform
    3.3.2 Derivation of Semi-circular Law Using Stieltjes Transform
  3.4 Asymptotic Results in Random Matrix Theory
    3.4.1 Wigner Matrices and Semi-circular Law
    3.4.2 Wishart Matrices and Marchenko-Pastur Distribution
    3.4.3 Ginibre Matrices and Circular Law
    3.4.4 ESD of Another Important Class of Random Matrices
  3.5 Convergence Rates of ESD
    3.5.1 In Cases of Wigner Matrices and Wishart Matrices
    3.5.2 Simulation of Convergence Rate of ESD
  3.6 Connections with Free Probability
    3.6.1 Non-commutative Probability Space and Freeness
    3.6.2 Free Product and Free Probability
    3.6.3 Free Central Limit Theorem and Asymptotic Freeness
4 Applications of RMT
  4.1 In MIMO System
    4.1.1 Asymptotic Result I: Fixed Number of Receivers
    4.1.2 Asymptotic Result II: Simultaneously Tending to Infinity
  4.2 In CDMA System
    4.2.1 Cross-correlations of Random Spreading Sequences
    4.2.2 MMSE Multiuser Detection and Spectral Efficiency
    4.2.3 Other Types of Detections in CDMA System
  4.3 In Open Quantum System
    4.3.1 Quantum State and Quantum Channel
    4.3.2 Asymptotic Minimal Output Entropy
    4.3.3 Spectrum of Random Quantum Channel
    4.3.4 Quantum Markov Semigroup and Lindblad Equation
    4.3.5 Random Matrix Model of Lindbladian
5 Conclusions and Future Development
Bibliography
Chapter 1
Introduction
Random matrix theory (RMT) first appeared in the study of quantum mechanics in the 1940s. In quantum mechanics, the values of physical observables such as energy and momentum are regarded as eigenvalues of linear operators on a Hilbert space. In particular, the Hamiltonian, a Hermitian operator closely related to the time evolution of a quantum system (this will also be mentioned in Section 4.3.4), plays a vital role in the theory. Hence, the asymptotic behavior (in particular the distribution of the spectrum) of large random matrices of this type attracted special interest, and the semi-circular law was discovered during that time [2].
Later, many researchers started to work in this field. Many other types of matrices have been studied, and the convergence of the empirical spectral distributions of random matrices has been proven in stronger and stronger senses. In the 1980s, L. Pastur [5], as a pioneer, introduced the Stieltjes transform into random matrix theory; in many cases it can uncover the limit spectral distribution without any prior knowledge of it. Moreover, Z.D. Bai et al. [11][12] used the Stieltjes transform as the main tool to study the convergence rate of empirical spectral distributions. It is worth mentioning that, in the meantime, D. Voiculescu created free probability, a theory of non-commutative random variables in which the limit distribution of the free version of the central limit theorem is the semi-circular law. Around 1991, D. Voiculescu [9] also discovered that freeness holds asymptotically for many kinds of random matrices. Nowadays, random matrix theory is applied in numerous fields, wherever one wants to know the asymptotic behavior of some statistic depending on the spectra of matrices.
In this master thesis, we first introduce, in Chapter 2, some preliminary knowledge on several quite different topics, which will be used in the subsequent chapters. Note that we select only material that lies outside the curriculum of the Mathematical Engineering program at Politecnico di Milano; in other words, we assume that the reader is familiar with the basics of algebra, probability, measure theory, and functional analysis.
In Chapter 3, we clarify the formal definition of "convergence" for random probability measures and give some properties of the aforementioned powerful tool, the Stieltjes transform. The limit spectral distributions of different types of ensembles, such as the Wigner, Wishart, and Ginibre ensembles, will be listed. Moreover, we discuss free probability and its connections to random matrix theory.
In Chapter 4 one can find several applications of random matrix theory in telecommunication systems and open quantum systems. The first application concerns the multiple-input multiple-output (MIMO) system, in which the channel can be modelled as a matrix; information theory tells us that the capacity of the channel is determined by the eigenvalues of that matrix, hence we can apply random matrix theory to analyze the asymptotic capacity of the channel. The second application is about the code-division multiple access (CDMA) system, in which random spreading sequences are used to modulate the signal and the linear minimal mean square error estimator is used to demodulate it; we will analyze the asymptotic error and capacity of such estimation. The third application is about the asymptotic capacity of the random quantum channel, which depends on the eigenvalues in a random subspace of a tensor product. The last application is about sampling the spectrum of the random Lindblad operator in high-dimensional open quantum systems; we will give a random matrix model of this operator which preserves the asymptotic spectral properties but dramatically reduces the sampling time.
Chapter 2
Preliminaries
We begin with some preliminary but important results that will be used in the remaining chapters. Sections 2.1 and 2.2 prepare for the engineering problems. Section 2.3 helps us review probability theory; in particular, in Section 2.3.3 an important result will be given, which tells us that different kinds of convergence of probability measures coincide under specific conditions. Section 2.4, instead, covers some basics of operator algebra, which is the foundation of quantum probability.
2.1 Information Theory
2.1.1 Complex Random Vector
We define a complex-valued random vector in the following way:
Definition 2.1 (Complex random vector) A complex random vector Z = (Z1, ··· , Zn)′ on the probability space (Ω, F, P) is a measurable function Z : Ω → Cⁿ such that the vector (Re(Z1), Im(Z1), ··· , Re(Zn), Im(Zn))′ is a real random vector on (Ω, F, P).
For a complex random vector, we define its expectation as
μ := E[Z] = (E[Z1], E[Z2], ··· , E[Zn])′   (2.1.1)
and define its covariance matrix as
Σ := E[(Z − E[Z])(Z − E[Z])†]   (2.1.2)
where (Z − E[Z])† denotes the conjugate transpose of Z − E[Z]. Moreover, differently from the real-valued case, for a complex random vector we additionally define
Γ := E[(Z − E[Z])(Z − E[Z])′]   (2.1.3)
where (Z − E[Z])′ denotes the transpose of Z − E[Z]. This Γ also plays a role in determining a distribution. Then, we turn to defining the complex-valued Gaussian random vector:
Definition 2.2 (Complex Gaussian random vector) A complex random vector Z is complex Gaussian distributed if (Re(Z1), Im(Z1), ··· , Re(Zn), Im(Zn))′ is a real Gaussian distributed random vector.
If Z is complex Gaussian distributed, we write Z ∼ CN(μ, Σ, Γ), where the parameters are defined as in 2.1.1–2.1.3. The probability density function of Z ∼ CN(μ, Σ, Γ) is rather complicated and will not be used in the following sections, so we omit it here; one can easily find it in [16]. In particular, we say Z is standard complex Gaussian distributed if Z ∼ CN(0, I, 0).
Now, we introduce the concept of a circularly symmetric complex random vector, which is widely used in telecommunication engineering.
Definition 2.3 (Circular symmetry) A complex random vector Z is said to be circularly symmetric if for every x ∈ [−π, π) its distribution is identical to that of e^{ix}Z.
As a direct result of circular symmetry, E[Z] = 0. In fact, taking the expectation, we have E[Z] = e^{ix} · E[Z] for all x ∈ [−π, π), which compels E[Z] = 0. Next, we will see that if a complex random vector is circularly symmetric, then its Γ also has a special structure.
Proposition 2.4 If Z is circularly symmetric, then its corresponding μ = 0 and Γ = 0.
Proof. μ = E[Z] = 0 has already been demonstrated. For the structure of Γ we follow the same idea. Notice that Γjk = E[Zj Zk] because Z is centered, and we can write
E[Zj Zk] = e^{−i2x} E[(e^{ix} Zj)(e^{ix} Zk)] = e^{−i2x} E[Zj Zk]   (2.1.4)
The second equality holds because the distributions of Z and e^{ix}Z must be identical. The arbitrariness of x compels E[Zj Zk] = 0; moreover, j and k are also arbitrary, therefore Γ = 0.
Then we can combine the complex Gaussian distribution and circular symmetry, and give a formal definition as follows.
Definition 2.5 (Circularly symmetric complex Gaussian random vector) A complex random vector Z is circularly symmetric complex Gaussian distributed if it is complex Gaussian distributed and circularly symmetric; it is simply denoted by Z ∼ CN(0, Σ).
Here Γ is suppressed because of Proposition 2.4, and the probability density function of Z ∼ CN(0, Σ) simplifies to [46]
fZ(z) = (1/(π^N det(Σ))) e^{−z† Σ⁻¹ z}   (2.1.5)
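To make these definitions concrete, here is a minimal Python sketch (ours, not from the thesis; it assumes NumPy is available, and the helper name sample_cscg is our own). It draws samples from CN(0, Σ) and checks empirically that the covariance 2.1.2 matches Σ while the pseudo-covariance Γ of 2.1.3 vanishes, as Proposition 2.4 predicts.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_cscg(sigma, n_samples, rng):
    """Draw samples Z ~ CN(0, Sigma): if W has i.i.d. CN(0,1) entries,
    then Z = A W has covariance Sigma = A A^dagger and Gamma = 0."""
    n = sigma.shape[0]
    a = np.linalg.cholesky(sigma)                    # Sigma = A A^dagger
    w = (rng.standard_normal((n, n_samples)) +
         1j * rng.standard_normal((n, n_samples))) / np.sqrt(2)
    return a @ w                                     # each column is one sample

sigma = np.array([[2.0, 0.5], [0.5, 1.0]])           # an example covariance matrix
z = sample_cscg(sigma, 200_000, rng)

cov   = z @ z.conj().T / z.shape[1]                  # estimates Sigma (2.1.2)
gamma = z @ z.T        / z.shape[1]                  # estimates Gamma (2.1.3)
print(np.round(cov, 2))    # close to Sigma
print(np.round(gamma, 2))  # close to the zero matrix, as Proposition 2.4 predicts
```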
2.1.2 Entropy and Mutual Information
In this section we turn to defining the entropy, which measures the uncertainty of a random variable (or vector).
Definition 2.6 (Entropy) Let X = (X1, X2, ··· , XN)′ be a random vector with probability density function fX(x); its entropy is defined as
h(X) = − ∫ fX(x) log fX(x) dx
Definition 2.7 (Conditional entropy) Let X = (X1, X2, ··· , XN)′ and Y = (Y1, Y2, ··· , YN)′ be two random vectors with known joint density function f(X,Y) and conditional probability density function fX|Y. Then the entropy of X conditioned on Y is defined as
h(X|Y) = − ∬ f(X,Y)(x, y) log fX|Y(x|y) dx dy
Next we want to study the entropy of a circularly symmetric complex Gaussian distributed random vector, but first we need to prove an important lemma.
Lemma 2.8 If Z ∼ CN_N(0, Σ) (an N-dimensional circularly symmetric complex Gaussian vector), then
E[Z† Σ⁻¹ Z] = N
Proof. Denote by (Σ⁻¹)_{ij} the entry of Σ⁻¹ in the i-th row and j-th column, and observe that
E[Z† Σ⁻¹ Z] = ∑_{i=1}^{N} ∑_{j=1}^{N} (Σ⁻¹)_{ij} E[Z̄_i Z_j] = ∑_{i=1}^{N} ∑_{j=1}^{N} (Σ⁻¹)_{ij} Σ_{ji} = tr(Σ⁻¹ Σ) = N
which completes the proof.
Proposition 2.9 If Z ∼ CN_N(0, Σ), then its entropy is
h(Z) = log det (πeΣ) (2.1.6)
Proof. Recall that the probability density function fZ(z) of a circularly symmetric complex Gaussian vector is given in Expression 2.1.5. By direct calculation, using Lemma 2.8, we have
h(Z) = − ∫ fZ(z) log fZ(z) dz
     = − ∫ fZ(z) [ − log(π^N det Σ) − z† Σ⁻¹ z log e ] dz
     = log(π^N det Σ) · ∫ fZ(z) dz + log e · ∫ fZ(z) z† Σ⁻¹ z dz
     = log(π^N det Σ) + log e · E[Z† Σ⁻¹ Z]
     = log(π^N det Σ) + log e^N
     = log det(πeΣ)
which reaches the goal.
Next we define the mutual information, which measures the amount of information that one random variable contains about another. In our case we are concerned with random vectors, i.e. multi-dimensional random variables.
Definition 2.10 (Mutual information) Let X and Y be two random vectors with probability density functions fX(x) and fY(y), respectively; their mutual information is
I(X, Y) = ∬ f(X,Y)(x, y) log [ f(X,Y)(x, y) / (fX(x) fY(y)) ] dx dy   (2.1.7)
Moreover, we can also express the mutual information through the entropy, which reveals that the mutual information I(X, Y) is exactly the reduction in the uncertainty of X due to the knowledge of Y [35]. Symmetrically, it is also the reduction in the uncertainty of Y due to the knowledge of X.
Proposition 2.11 For two random vectors X and Y, we have
I(X, Y) = h(X) − h(X|Y) = h(Y) − h(Y|X)
Proof. According to Definitions 2.6 and 2.7, we have
h(X) − h(X|Y) = − ∫ fX(x) log fX(x) dx + ∬ f(X,Y)(x, y) log fX|Y(x|y) dx dy
             = − ∬ f(X,Y)(x, y) log fX(x) dx dy + ∬ f(X,Y)(x, y) log [ f(X,Y)(x, y) / fY(y) ] dx dy
             = ∬ f(X,Y)(x, y) log [ f(X,Y)(x, y) / (fX(x) fY(y)) ] dx dy = I(X, Y)
By the same strategy one can prove I(X, Y) = h(Y) − h(Y|X).
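As a sanity check on Proposition 2.9, here is a sketch of ours (not from the thesis; it assumes NumPy and uses natural logarithms, so entropy is in nats). It estimates h(Z) = −E[log fZ(Z)] by Monte Carlo and compares the result with log det(πeΣ).

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
n = sigma.shape[0]

# Draw samples Z ~ CN(0, Sigma), as in the previous sketch.
a = np.linalg.cholesky(sigma)
w = (rng.standard_normal((n, 500_000)) +
     1j * rng.standard_normal((n, 500_000))) / np.sqrt(2)
z = a @ w

# h(Z) = -E[log f_Z(Z)], with f_Z from (2.1.5); average log f_Z over the samples.
sigma_inv = np.linalg.inv(sigma)
quad = np.einsum('is,ij,js->s', z.conj(), sigma_inv, z).real   # z^dagger Sigma^{-1} z
log_f = -n * np.log(np.pi) - np.log(np.linalg.det(sigma)) - quad
print(-log_f.mean())                                  # Monte Carlo estimate of h(Z)
print(np.log(np.linalg.det(np.pi * np.e * sigma)))    # log det(pi e Sigma), as in (2.1.6)
```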
2.2 Estimation Theory
2.2.1 Minimal Mean Squared Error Estimator
Suppose that we want to estimate the value of an unobserved random variable X (here we are not estimating a parameter), given the observation Y = y. In general, our estimate x̂ is a function of y, i.e.
x̂ = g(y)
The mean squared error is defined by
MSE(x̂) := E[(X − x̂)² | Y = y] = E[(X − g(y))² | Y = y]   (2.2.1)
By a simple computation one obtains that x̂ = E[X|Y = y] is the estimate which minimizes the quantity 2.2.1. Therefore, we define the minimal mean square error (MMSE) estimator in the following way:
Definition 2.12 (MMSE estimator) Let X̂ = g(Y) be an estimator of a random variable X based on the observation of some random variable Y. We say X̂_MMSE is the MMSE estimator if X̂_MMSE = E[X|Y], which minimizes the mean square error among all estimators.
By the law of total expectation and law of total variance, we know:
E[X̂] = E[E[X|Y]] = E[X]
Var(X̂) = Var(X) − E[Var(X|Y)]
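A small numerical illustration (ours; the model is a hypothetical example, and NumPy is assumed): when X = Y² + W with W independent noise, the MMSE estimator is E[X|Y] = Y², and its mean square error beats that of any linear function of Y.

```python
import numpy as np

rng = np.random.default_rng(2)
y = rng.standard_normal(200_000)
x = y**2 + 0.5 * rng.standard_normal(200_000)   # X = Y^2 + noise, so E[X|Y] = Y^2

mse_mmse = np.mean((x - y**2) ** 2)             # MSE of the MMSE estimator E[X|Y]
a, b = np.polyfit(y, x, 1)                      # best linear estimator aY + b
mse_lin = np.mean((x - (a * y + b)) ** 2)
print(mse_mmse, mse_lin)   # about 0.25 for E[X|Y]; much larger for the linear fit
```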
2.2.2 Linear MMSE Estimator
In most cases, the MMSE estimator X̂ = E[X|Y] is not easy to compute explicitly even when we know Y = y. We might hope for an estimator with a simpler structure; for example, we let
X̂_LMMSE = g(Y) = aY + b
in which a, b ∈ R are real numbers to be determined. This X̂_LMMSE is called the linear MMSE estimator. In particular, we want to choose a and b such that X̂_LMMSE has the minimal mean square error among all such choices, and we have the following proposition:
Proposition 2.13 Let X and Y be two random variables with finite mean and variance, and consider the function h defined as
h(a, b) := E[(X − X̂)²] = E[(X − aY − b)²]   (2.2.2)
Then h(a, b) is minimized by
a = a∗ := Cov(X, Y) / Var(Y)   (2.2.3)
and
b = b∗ := E[X] − (Cov(X, Y) / Var(Y)) E[Y]   (2.2.4)
Moreover,
h(a∗, b∗) = (1 − ρ²) Var(X)   (2.2.5)
where ρ is the correlation coefficient between X and Y; and we also have
E[(X − a∗Y − b∗)Y] = 0   (2.2.6)
which is also called the orthogonality principle.
Proof. We can directly expand 2.2.2 and get
h(a, b) = E[X²] + a²E[Y²] + b² − 2aE[XY] − 2bE[X] + 2abE[Y]
then
Hess h = ( 2E[Y²]  2E[Y]
            2E[Y]    2   )
is positive definite (provided Var(Y) > 0), which is easy to check. Therefore h is strictly convex on R², which guarantees the existence and uniqueness of the minimizer and allows us to find it by setting the partial derivatives to zero. Thus we impose
∂h/∂a = 2aE[Y²] − 2E[XY] + 2bE[Y] = 0   (2.2.7)
∂h/∂b = 2b − 2E[X] + 2aE[Y] = 0
Solving the above system of equations yields 2.2.3, 2.2.4 and 2.2.5, and 2.2.6 is an immediate consequence of 2.2.7. In this way, we can write the linear MMSE estimator as
X̂_LMMSE = (Cov(X, Y) / Var(Y)) (Y − E[Y]) + E[X]
Obviously we have E[X̂_LMMSE] = E[X] and Var(X̂_LMMSE) = ρ² Var(X).
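The closed forms 2.2.3–2.2.6 are easy to verify numerically. The following sketch (ours, assuming NumPy; the linear model is a made-up example) estimates a∗ and b∗ from samples and checks 2.2.5 and the orthogonality principle 2.2.6.

```python
import numpy as np

rng = np.random.default_rng(3)
# A correlated pair: Y is observed, X is to be estimated.
y = rng.standard_normal(300_000)
x = 2.0 * y + rng.standard_normal(300_000)

a_star = np.cov(x, y, bias=True)[0, 1] / np.var(y)   # (2.2.3)
b_star = x.mean() - a_star * y.mean()                # (2.2.4)
x_hat = a_star * y + b_star

rho = np.corrcoef(x, y)[0, 1]
print(np.mean((x - x_hat) ** 2))      # empirical h(a*, b*)
print((1 - rho**2) * np.var(x))       # (2.2.5): the two values agree
print(np.mean((x - x_hat) * y))       # (2.2.6): approximately zero
```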
2.3 Probability Measures on Metric Space
2.3.1 Weak Convergence of Probability Measures
Now we review a little measure theory. Suppose we have a metric space S with Borel σ-algebra B(S), on which we consider the space of probability measures P(S). We define the weak convergence of probability measures, sometimes also called narrow convergence.
Definition 2.14 (Weak convergence) Let {µn}n∈N and µ be probability measures on (S, B(S)). We say µn → µ weakly if
∫_S f dµn → ∫_S f dµ,  ∀f ∈ Cb(S)
Then, naturally, we consider the question: is the weak limit unique? The answer is positive, as a consequence of the following theorem [38], since the space of bounded uniformly continuous functions is a subspace of Cb(S).
Theorem 2.15 Probability measures µ and ν on (S, B(S)) coincide if ∫_S f dµ = ∫_S f dν for all bounded uniformly continuous functions f on S.
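To see Definition 2.14 in action numerically (a sketch of ours, with NumPy assumed): by the central limit theorem, the law µn of the standardized mean of n i.i.d. Uniform(0, 1) variables converges weakly to N(0, 1), so integrals of a bounded continuous test function converge accordingly.

```python
import numpy as np

rng = np.random.default_rng(4)

def integral_against_mu_n(n, n_samples=200_000):
    """Monte Carlo estimate of the integral of cos against mu_n, where mu_n is
    the law of the standardized mean of n i.i.d. Uniform(0,1) variables."""
    u = rng.random((n_samples, n))
    t = (u.mean(axis=1) - 0.5) * np.sqrt(12 * n)   # standardized sample mean
    return np.cos(t).mean()                        # cos is in C_b(R)

for n in (1, 2, 10, 100):
    print(n, integral_against_mu_n(n))
# CLT limit: the integral of cos against N(0,1) equals exp(-1/2)
print(np.exp(-0.5))
```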
Other characterizations of weak convergence are given by the following Portmanteau Theorem [38].
Theorem 2.16 (Portmanteau) The following conditions are equivalent:
1. µn → µ weakly.
2. ∫_S f dµn → ∫_S f dµ for all bounded uniformly continuous functions f on S.
3. µn(A) → µ(A) for all µ-continuity sets A ⊂ S.
In condition 3, A ⊂ S is called a µ-continuity set if its boundary ∂A satisfies
µ(∂A) = 0
In particular, when S is a Euclidean space (here we can even set S = R for simplicity), we can introduce the concept of the cumulative distribution function. If we have a probability measure µ, its corresponding distribution function Fµ is defined as
Fµ(x) := µ((−∞, x]) = ∫_R 1_{(−∞,x]} dµ
Conversely, Fµ also characterizes µ. This tells us there is a kind of conjugacy between the distribution function and the probability measure. Because of this conjugacy, we sometimes abuse the term "distribution": especially in random matrix theory, it can represent either the distribution function or the corresponding probability measure. However, one should always be able to tell which is meant in concrete situations.
More specifically, a function F : R → [0, 1] is the distribution function of some probability measure on (R, B(R)) if and only if the following three conditions are satisfied [36]:
1. F is non-decreasing.
2. F is right continuous.
3. lim_{x→−∞} F(x) = 0 and lim_{x→+∞} F(x) = 1.
Finally, we give a characterization of the weak convergence of probability measures in terms of their corresponding cumulative distribution functions.
Theorem 2.17 Let {µn}n∈N and µ be probability measures on R, and denote their corresponding distribution functions by Fµn and Fµ. Then µn → µ weakly if and only if Fµn(x) → Fµ(x) for every x ∈ R at which Fµ is continuous.
Proof. One direction follows directly from condition 3 in Theorem 2.16. For the proof of the whole theorem please refer to [36].
Here we emphasize that µ must be a probability measure on (R, B(R)); otherwise, we can let {µn}n∈N be the sequence of Dirac measures, each centered at n, and let µ = 0. Obviously Fµn has a unit jump at n and Fµ is identically zero. Now observe that Fµn(x) → Fµ(x) pointwise for all x ∈ R, but µn does not converge to µ weakly. We will see this sequence of Dirac measures several times in what follows.
In fact, we have Helly's Selection Theorem, which in general describes sequential compactness in the space of locally integrable functions of bounded variation; luckily, every cumulative distribution function is just a special case.
Theorem 2.18 (Helly's Selection Theorem) For every sequence {Fn}n∈N of distribution functions there exists a subsequence {Fnk}k∈N and a non-decreasing, right-continuous function F such that
lim_{k→∞} Fnk(x) = F(x)
at continuity points x of F.
Proof. Applying the classical diagonal technique, we obtain a sequence {nk}k∈N of integers along which the limit G(r) = lim_{k→∞} Fnk(r) exists for all r ∈ Q. Then we define
F(x) := inf_{r ∈ Q, r > x} G(r)
and clearly F is non-decreasing. Moreover, for each x and ε > 0, there is an r for which x < r and G(r) < F(x) + ε. If x ≤ y < r, then F(y) ≤ G(r) < F(x) + ε, so F is right-continuous. If F is continuous at x, choose y < x such that F(x) − ε < F(y); now choose rationals r and s such that y < r < x < s and G(s) < F(x) + ε. From F(x) − ε < G(r) ≤ G(s) < F(x) + ε and Fn(r) ≤ Fn(x) ≤ Fn(s), it follows that, as k → +∞, Fnk(x) has limits superior and inferior within ε of F(x).
The F in this theorem necessarily satisfies 0 ≤ F(x) ≤ 1 for all x ∈ R, but F need not be a distribution function, as we saw in the aforementioned counterexample. Sometimes we say such an F is the (extended) cumulative distribution function of a sub-probability measure. This kind of convergence is in fact called vague convergence and will be introduced later (in Definition 2.22; one can prove that they are equivalent).

2.3.2 Tightness and Relative Compactness

Another important concept we introduce here is tightness; this nice property of measures will help to force the limit in Helly's Selection Theorem to be a probability measure.
Definition 2.19 (Tightness) A collection M ⊂ P(S) of probability measures on (S, B(S)) is called (uniformly) tight if for every ε > 0 there exists a compact subset Kε ⊂ S such that
µ(S \ Kε) < ε,  ∀µ ∈ M
In particular, if M consists of a single measure µ, then µ is called a tight measure.
If the collection M is sequential, that is, if we can write it as {µn}n∈N, then tightness ensures that µn does not "flee" to infinity as n → +∞. Consider the sequence of probability measures {δn}n∈N on (R, B(R)), where δn is the Dirac measure centered at n. Obviously {δn}n∈N cannot be tight, since once ε is fixed, we can let n be so large that no such Kε exists. The tightness of a sequence of probability measures can also be equivalently expressed as [26]
sup_{K⊂⊂S} lim inf_{n→∞} µn(K) = 1   (2.3.1)
Moreover, we give a further simple condition for the weak convergence of probability measures:
Proposition 2.20 A necessary and sufficient condition for µn → µ weakly is that each subsequence {µnk}k∈N contains a further subsequence {µnk(i)}i∈N converging weakly to µ.
Proof. The necessity is trivial. For the sufficiency, if µn does not converge weakly to µ, then
∫_S f dµn ↛ ∫_S f dµ
for some f ∈ Cb(S). But then, for some ε > 0 and some subsequence {µnk}k∈N,
| ∫_S f dµnk − ∫_S f dµ | > ε,  ∀k ∈ N
so no further subsequence can converge weakly to µ.
Then we reveal the connection between tightness and relative compactness of M, in the following Prokhorov's Theorem [38]. Recall that M is said to be relatively compact, or pre-compact, if every sequence of elements of M contains a weakly convergent subsequence. For the most part we are concerned with the relative compactness of sequences {µn}n∈N; this means that every subsequence {µnk}k∈N contains a further subsequence {µnk(i)}i∈N such that µnk(i) → ν weakly as i → +∞, for some probability measure ν ∈ P(S).
Theorem 2.21 (Prokhorov's Theorem) Suppose M ⊂ P(S) is a collection of probability measures. If M is tight, then M is relatively compact. Conversely, if M is relatively compact and S is separable and complete, i.e. S is Polish, then M is tight.
This theorem will be used in the next section.
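The Dirac sequence {δn}n∈N used above can also be probed numerically; in the trivial sketch below (ours, assuming NumPy), integrals against δn reduce to point evaluations f(n), which shows the mass escaping every compact set, i.e. the failure of tightness in the sense of 2.3.1.

```python
import numpy as np

# Integrals against the Dirac measure delta_n reduce to point evaluation: f(n).
ns = np.array([1, 10, 100, 1000])

f0 = lambda x: np.exp(-x**2)   # a function vanishing at infinity (f in C_0(R))
fb = lambda x: np.cos(x)       # a bounded continuous function (f in C_b(R))

print(f0(ns))   # -> 0: the "limit" is the zero (sub-probability) measure
print(fb(ns))   # oscillates without a limit: delta_n has no weak limit

# Failure of tightness: delta_n puts no mass on K = [-M, M] once n > M.
M = 50
print([float(n <= M) for n in ns])
```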
2.3.3 Other Types of Convergence
Moreover, there are some other, weaker types of convergence of probability measures, which will also be used later.
Definition 2.22 (Vague convergence - type I) Let {µn}n∈N and µ be probability measures on (S, B(S)). We say µn → µ vaguely if
∫ f dµn → ∫ f dµ,  ∀f ∈ C0(S)
Definition 2.23 (Vague convergence - type II) Let {µn}n∈N and µ be probability measures on (S, B(S)). We say µn → µ vaguely if
∫ f dµn → ∫ f dµ,  ∀f ∈ Cc(S)
Definition 2.24 (Distributional convergence) Let {µn}n∈N and µ be probability measures on (S, B(S)). We say µn → µ distributionally if
∫ f dµn → ∫ f dµ,  ∀f ∈ Cc∞(S)
Here C0(S) denotes the Banach space of continuous functions on S vanishing at infinity, equipped with the uniform norm, so boundedness is automatic. Since S is a metric space which does not necessarily carry a norm, "vanishing at infinity" means: given any ε > 0, there is a compact subset Kε ⊂ S such that |f(x)| < ε for all x ∈ S \ Kε. Moreover, Cc(S) denotes the space of compactly supported continuous functions on S, and Cc∞(S) the space of compactly supported smooth functions on S.
Obviously Cc∞(S) ⊂ Cc(S) ⊂ C0(S) ⊂ Cb(S), so once we have weak convergence of probability measures, we have all the other kinds of convergence. Now we demonstrate the most important result of this section: if S = R^d, all the aforementioned notions of convergence, i.e. Definitions 2.14, 2.22, 2.23 and 2.24, are equivalent. It suffices to prove that distributional convergence and weak convergence coincide.
Before proving the main theorem, we record a useful lemma which allows us to approximate the target function by a sequence of smooth functions. For more detail about the mollifier and convolution technique, please refer to [45].
Lemma 2.25 Let f ∈ L^p(Ω), 1 ≤ p ≤ ∞, Ω ⊆ R^d, and define fη := gη ∗ f, where gη is the mollifier. Then:
1. fη ∈ C∞(R^d) for all η > 0.
2. If f ∈ C(Ω), then fη → f uniformly on every compact K ⊂ Ω, as η → 0.
Theorem 2.26 Suppose {µn}n∈N and µ are probability measures on (R^d, B(R^d)). Then µn → µ distributionally if and only if µn → µ weakly.
Proof. The necessity is obvious; we show the sufficiency.
First, we show that {µn}n∈N is tight. Let ζ ∈ Cc∞(R^d) satisfy 0 ≤ ζ ≤ 1 and
ζ(x) = 1 for |x| ≤ 1/2,  ζ(x) = 0 for |x| ≥ 1
Define ζk(x) := ζ(x/k); by the distributional convergence we have
lim_{n→∞} ∫_{R^d} ζk(x) dµn = ∫_{R^d} ζk(x) dµ   (2.3.2)
Observe that ζk(x) → 1 pointwise for all x ∈ R^d as k → ∞, and the constant function 1 is µ-integrable, so by the Dominated Convergence Theorem
lim_{k→∞} ∫_{R^d} ζk(x) dµ = ∫_{R^d} 1 dµ = 1   (2.3.3)
Moreover, since ζk is supported in Bk(0), the left-hand side of 2.3.2 is bounded by
lim inf_{n→∞} µn(Bk(0)) ≥ lim_{n→∞} ∫_{R^d} ζk(x) dµn   (2.3.4)
Letting k tend to infinity and combining 2.3.4 and 2.3.3, we get
lim_{k→∞} lim inf_{n→∞} µn(Bk(0)) ≥ lim_{k→∞} lim_{n→∞} ∫_{R^d} ζk(x) dµn = lim_{k→∞} ∫_{R^d} ζk(x) dµ = 1
which implies the tightness of {µn}n∈N, in the sense of Expression 2.3.1. By Prokhorov's Theorem each subsequence {µnk}k∈N has at least one weakly convergent further subsequence {µnk(i)}i∈N, with µnk(i) → ν for some ν ∈ P(R^d). Then we have to show that ν coincides with µ. For simplicity of notation, we write {µnk}k∈N for the weakly convergent further subsequence instead of {µnk(i)}i∈N.
Since we want to reach weak convergence, we have to use functions in Cc∞(R^d) to approximate functions in Cb(R^d). This proceeds in two steps. For a function f ∈ Cb(R^d), by Lemma 2.25 we can convolve f with the mollifier to obtain a sequence of smooth functions fη such that fη → f uniformly on every compact subset of R^d. Then, since {µn}n∈N is tight, for each ε > 0 we can find a compact set Kε ⊂ R^d such that µn(R^d \ Kε) < ε for all n ∈ N. Therefore we can always construct a compactly supported smooth function fη^ε such that fη^ε = fη on Kε, and we do not care what happens outside Kε. Obviously fη^ε → f pointwise as η → 0 and ε → 0.
Therefore, for f ∈ Cb(R^d) and fixed η > 0, ε > 0, we have
| lim_{k→∞} ∫_{R^d} fη^ε dµnk − lim_{k→∞} ∫_{R^d} f dµnk |
≤ lim_{k→∞} ∫_{R^d} |fη^ε − f| dµnk
= lim_{k→∞} ∫_{R^d \ Kε} |fη^ε − f| dµnk + lim_{k→∞} ∫_{Kε} |fη − f| dµnk
≤ lim_{k→∞} ∫_{R^d \ Kε} |fη^ε| dµnk + lim_{k→∞} ∫_{R^d \ Kε} ‖f‖_{Cb(R^d)} dµnk + lim_{k→∞} ∫_{Kε} |fη − f| dµnk
≤ ( ‖fη^ε‖_{Cb(R^d)} + ‖f‖_{Cb(R^d)} ) · ε + lim_{k→∞} ∫_{Kε} |fη − f| dµnk
Letting η → 0, fη − f → 0 uniformly on Kε; then letting also ε → 0, we conclude that
lim_{k→∞} ∫_{R^d} fη^ε dµnk → lim_{k→∞} ∫_{R^d} f dµnk   (2.3.5)
remembering that fη^ε ∈ Cc∞(R^d), f ∈ Cb(R^d), and fη^ε → f pointwise.
We know that µn → µ distributionally and µnk → ν weakly, so obviously
∫_{R^d} g dµ = ∫_{R^d} g dν,  ∀g ∈ Cc∞(R^d)
but, according to Theorem 2.15, this is not enough to conclude that µ and ν are identical. Thus we proceed by contradiction.
Assume µ ≠ ν; then we can find a function f ∈ Cb(R^d) \ Cc∞(R^d) such that
∫_{R^d} f dµ ≠ ∫_{R^d} f dν
We use fη^ε to approximate f. Applying the distributional convergence and the Dominated Convergence Theorem on the left-hand side of 2.3.5, and the weak convergence on the right-hand side, we obtain
lim_{k→∞} ∫_{R^d} fη^ε dµnk = ∫_{R^d} fη^ε dµ → ∫_{R^d} f dµ   (2.3.6)
and
lim_{k→∞} ∫_{R^d} f dµnk = ∫_{R^d} f dν ≠ ∫_{R^d} f dµ   (2.3.7)
By 2.3.5, 2.3.6 and 2.3.7 we reach a contradiction; therefore µ = ν.
So far we have shown: µn → µ distributionally; {µn}n∈N is tight; and each subsequence of {µn}n∈N contains a weakly convergent further subsequence whose weak limit is µ. By Proposition 2.20 we conclude that µn → µ weakly.
Notice that, in Theorem 2.26, distributional convergence and weak convergence coincide only when the limit is certainly a probability measure. Using the example of Dirac measures again: for the sequence {δn}n∈N one easily checks that δn → 0 distributionally, but δn does not converge to 0 weakly, as n → ∞.

2.4 Bounded Linear Operators on Hilbert Space

Let H be a separable Hilbert space, and denote by B(H) the space of bounded linear operators on H. We define the operator norm as
‖T‖ := sup_{x∈H, ‖x‖=1} ‖Tx‖   (2.4.1)
As is well known, B(H) is a Banach space with respect to the operator norm. In this section we discuss the properties of the elements of B(H), but first we define some objects appearing in the algebra literature.

2.4.1 Banach Algebra and C*-Algebra

Definition 2.27 (Complex algebra) A complex algebra is a vector space A over the complex field C in which a multiplication is defined that satisfies
x(yz) = (xy)z,  (x + y)z = xz + yz,  x(y + z) = xy + xz
and
α(xy) = (αx)y = x(αy)
for all x, y, z ∈ A and all scalars α ∈ C.
Definition 2.28 (Banach algebra) If A is a complex algebra, and at the same time A is a Banach space with respect to a norm ‖·‖ that satisfies the multiplicative inequality
‖xy‖ ≤ ‖x‖‖y‖   (2.4.2)
then A is called a Banach algebra.
Notice that a complex or Banach algebra is not necessarily commutative.
Definition 2.29 (Involution) A map x ↦ x* of a complex algebra A into A is called an involution on A if it has the following four properties, for all x, y ∈ A and α ∈ C:
(x + y)* = x* + y*
(αx)* = ᾱx*
(xy)* = y*x*
x** = x
By convention, a complex algebra with an involution is called a *-algebra. If the *-algebra carries a Banach structure and the multiplicative inequality 2.4.2, we may call it a Banach *-algebra, but this structure is not often used. Instead, we may expect a nicer structure by imposing the rather strong condition 2.4.3 below, which defines the C*-algebra:
Definition 2.30 (C*-algebra) A Banach algebra A with an involution x ↦ x* is said to be a C*-algebra if it also satisfies
‖xx*‖ = ‖x‖²   (2.4.3)
for all x ∈ A.
Notice that from 2.4.3 we easily get ‖x‖² = ‖xx*‖ ≤ ‖x‖‖x*‖, hence ‖x‖ ≤ ‖x*‖. Similarly, ‖x*‖ = ‖x*x**‖ ≤ ‖x*‖‖x**‖ = ‖x*‖‖x‖, which implies ‖x*‖ ≤ ‖x‖. Therefore
‖x‖ = ‖x*‖   (2.4.4)
in each C*-algebra. It also follows that
‖xx*‖ = ‖x‖‖x*‖   (2.4.5)
Conversely, 2.4.4 and 2.4.5 obviously imply 2.4.3.
An element of a C*-algebra A is said to be positive if it can be written in the form a*a for some a ∈ A. A linear operator T : A → B between two C*-algebras is said to be positive if it sends positive elements of A to positive elements of B. Positive operators on a C*-algebra have many good properties, but this may not be enough; hence the notion of completely positive operator, whose definition we postpone to Section 4.3.1. For details on the properties of positive and completely positive operators, we refer to [22].

2.4.2 Adjoint Operator

Theorem 2.31 If f : H × H → C is sesquilinear and bounded, in the sense that
M := sup_{‖x‖=‖y‖=1} |f(x, y)| < ∞
then there exists a unique S ∈ B(H) which satisfies
f(x, y) = ⟨x, Sy⟩,  ∀x, y ∈ H   (2.4.6)
Moreover, ‖S‖ = M.
Proof. Since |f(x, y)| ≤ M‖x‖‖y‖, the map x ↦ f(x, y) is, for each y ∈ H, a bounded linear functional on H. By the Riesz representation theorem there exists a unique element Sy ∈ H such that 2.4.6 holds; also, ‖Sy‖ ≤ M‖y‖. It is clear that S : H → H is additive. If α ∈ C, then
⟨x, S(αy)⟩ = f(x, αy) = α f(x, y) = α⟨x, Sy⟩ = ⟨x, αSy⟩
for all x, y ∈ H. Therefore S is linear, S ∈ B(H), and ‖S‖ ≤ M. On the other hand, we also have
|f(x, y)| = |⟨x, Sy⟩| ≤ ‖x‖‖Sy‖ ≤ ‖x‖‖S‖‖y‖
which gives the opposite inequality M ≤ ‖S‖.
If T ∈ B(H), then ⟨Tx, y⟩ is sesquilinear and bounded; as a consequence of Theorem 2.31, there exists a unique T* ∈ B(H) for which
⟨Tx, y⟩ = ⟨x, T*y⟩,  ∀x, y ∈ H
and we call T* the adjoint operator of T.
Theorem 2.32 B(H) is a C*-algebra.
Proof. It is easy to verify that B(H) is a Banach algebra, and that the map sending an operator T ∈ B(H) to its adjoint T* ∈ B(H) is an involution. Moreover,
‖Tx‖² = ⟨Tx, Tx⟩ = ⟨T*Tx, x⟩ ≤ ‖T*T‖‖x‖²,  ∀x ∈ H
so ‖T‖² ≤ ‖T*T‖. On the other hand, ‖T‖ = ‖T*‖ gives
‖T*T‖ ≤ ‖T*‖‖T‖ = ‖T‖²
hence the equality ‖T*T‖ = ‖T‖² holds for all T ∈ B(H).
In fact, we have the Gelfand–Naimark theorem, which tells us that an arbitrary C*-algebra is isometrically *-isomorphic to a C*-algebra of bounded operators on some Hilbert space.
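On a finite-dimensional H = Cⁿ, where B(H) is just the n × n matrices and the operator norm is the largest singular value, the C*-identity 2.4.3 of Theorem 2.32 can be checked numerically. The following sketch is ours (assuming NumPy), not part of the thesis.

```python
import numpy as np

rng = np.random.default_rng(5)

def op_norm(a):
    """Operator norm (2.4.1) of a matrix = its largest singular value."""
    return np.linalg.norm(a, 2)

# On H = C^n, B(H) is the set of n x n matrices; check the C*-identity (2.4.3).
for _ in range(3):
    t = rng.standard_normal((6, 6)) + 1j * rng.standard_normal((6, 6))
    lhs = op_norm(t.conj().T @ t)     # ||T* T||
    rhs = op_norm(t) ** 2             # ||T||^2
    print(np.isclose(lhs, rhs))       # True: the C*-identity holds in B(C^n)
```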
2.4.3 Isometry and Partial Isometry

Definition 2.33 (Isometry) An operator T ∈ B(H) is said to be an isometry on H if
‖Tx‖ = ‖x‖,  ∀x ∈ H
As we can see, an isometry is a linear mapping which preserves the norm; more generally, an isometry can be defined as a mapping between two metric spaces. We also have the following characterization:
Proposition 2.34 Suppose T ∈ B(H); the following statements are equivalent:
1. T is an isometry on H.
2. ⟨Tx, Ty⟩ = ⟨x, y⟩ holds for all x, y ∈ H.
3. T*T = I on H.
Proof. From 1 to 2 we need the following polarization identity, which is easily checked:
⟨x, y⟩ = ( ‖x + y‖² − ‖x − y‖² + i‖x + iy‖² − i‖x − iy‖² ) / 4
Therefore
⟨Tx, Ty⟩ = ( ‖Tx + Ty‖² − ‖Tx − Ty‖² + i‖Tx + iTy‖² − i‖Tx − iTy‖² ) / 4
         = ( ‖T(x + y)‖² − ‖T(x − y)‖² + i‖T(x + iy)‖² − i‖T(x − iy)‖² ) / 4
         = ( ‖x + y‖² − ‖x − y‖² + i‖x + iy‖² − i‖x − iy‖² ) / 4
         = ⟨x, y⟩
for all x, y ∈ H. Now starting from 2, ⟨Tx, Ty⟩ = ⟨x, y⟩ implies ⟨(T*T − I)x, y⟩ = 0 for all x, y ∈ H. Taking y = (T*T − I)x, we conclude that T*T = I on the whole of H, which is exactly point 3. By the same ideas, we can go back from 3 to 2 and from 2 to 1, which completes the proof.
Before introducing the partial isometry, we need to prove two simple but useful lemmas.
Lemma 2.35 If T ∈ B(H), then ker(T*) = Im(T)^⊥ and ker(T) = Im(T*)^⊥.
Proof. Observe that
y ∈ ker(T*) ⇔ ⟨x, T*y⟩ = 0, ∀x ∈ H ⇔ ⟨Tx, y⟩ = 0, ∀x ∈ H ⇔ y ∈ Im(T)^⊥
and the second equality can be demonstrated in the same way.
Lemma 2.36 Let T ∈ B(H); the following four statements are equivalent:
1. T*T is a projector, i.e. it is idempotent.
2. T = TT*T.
3. T* = T*TT*.
4. TT* is a projector.
Proof. If P := T*T is a projector, define S := TT*T − T and observe that
S*S = (T*TT* − T*)(TT*T − T) = P³ − P² − P² + P = 0
By the C*-property of B(H), we know that 0 = ‖S*S‖ = ‖S‖², thus TT*T = T. Taking adjoints on both sides gives T*TT* = T*, and then multiplying by T on the left gives TT*TT* = TT*, so TT* is also a projector. The proof from 4 to 1 follows the same idea as the proof from 1 to 4.
Now we give the definition of partial isometry, which plays an important role in the polar decomposition theorem and in the study of von Neumann algebras.
Definition 2.37 (Partial isometry) An operator T ∈ B(H) is said to be a partial isometry on H if it is an isometry on ker(T)^⊥.
Obviously, if T ∈ B(H) is an isometry on H, it is also a partial isometry on H; but the converse is not true. For partial isometries we have the following characterization:
Proposition 2.38 T ∈ B(H) is a partial isometry on H if and only if T*T is a projector.
Proof. In Lemma 2.36 we stated that T*T is a projector if and only if T* = T*TT*; then we have
T*x = T*TT*x,  ∀x ∈ H
or equivalently
y = T*Ty,  ∀y ∈ Im(T*)
So T*T = I on Im(T*), and Lemma 2.35 tells us Im(T*) = ker(T)^⊥. By Proposition 2.34, this is equivalent to saying that T is an isometry on ker(T)^⊥, which is indeed the definition of a partial isometry on H.
Then we address the question: if T ∈ B(H) is a partial isometry on H, what about T*? In fact, again with the help of Lemma 2.36, we immediately have the following result:
Corollary 2.39 T ∈ B(H) is a partial isometry on H if and only if T* is a partial isometry.
One should also notice that, if H is a finite-dimensional separable Hilbert space, then T ∈ B(H) is an isometry if and only if T* is an isometry, since in the finite-dimensional case T*T = I and TT* = I are equivalent. However, if H is infinite-dimensional, this may fail.
For example, consider the Hilbert space l²(N) and the left-shift operator L : l²(N) → l²(N) defined as
L : (x₁, x₂, x₃, ···) ↦ (x₂, x₃, x₄, ···)   (2.4.7)
Its adjoint is the so-called right-shift operator R : l²(N) → l²(N), defined as
R : (x₁, x₂, x₃, ···) ↦ (0, x₁, x₂, ···)   (2.4.8)
Obviously L is not an isometry on l²(N), but R is; and they are both partial isometries.
At the end of this section, we record the famous polar decomposition theorem [8], in which the partial isometry appears.
Theorem 2.40 (Polar decomposition) If T ∈ B(H), then T has a factorization
T = U √(T*T)
where U is a partial isometry on H.

2.4.4 Trace Class Operator

Definition 2.41 (Trace-class operator) An operator A ∈ B(H) is said to be in the trace class if, for some (and hence every) orthonormal basis {e_k}_k of H, the sum of positive terms
‖A‖₁ := Tr |A| := ∑_k ⟨√(A*A) e_k, e_k⟩
is finite. In this case the trace of A, given by the sum
Tr A := ∑_k ⟨A e_k, e_k⟩
is absolutely convergent and independent of the choice of the orthonormal basis.
The set of trace-class operators on H will be denoted by B₁(H). It has been shown [39] that (B₁(H), ‖·‖₁) is a Banach space, and ‖·‖₁ is sometimes called the nuclear norm or trace norm.
Proposition 2.42 For A ∈ B₁(H), we have |Tr A| ≤ Tr |A|.
Proof. By the polar decomposition theorem, there exists a partial isometry U such that A = U√(A*A). Moreover, the trace is invariant under change of orthonormal basis, so we may choose the basis {e_k}_k consisting of eigenvectors of √(A*A). Observe that, if e_k ∈ ker U, then ‖Ue_k‖ = 0; otherwise ‖Ue_k‖ = ‖e_k‖ = 1. Therefore,
|Tr A| = |Tr U√(A*A)|
       = | ∑_k ⟨U√(A*A) e_k, e_k⟩ |
       = | ∑_k λ_k ⟨Ue_k, e_k⟩ |
       ≤ ∑_k λ_k ‖e_k‖² = Tr |A|
in which the λ_k are the singular values of A, i.e. the eigenvalues of √(A*A).
Corollary 2.43 Tr : B₁(H) → C is continuous with respect to ‖·‖₁.
Proof. Immediately, for A, B ∈ B₁(H), by Proposition 2.42:
|Tr A − Tr B| = |Tr(A − B)| ≤ Tr |A − B| = ‖A − B‖₁
so Tr is a continuous functional.
Another fact of interest is that the operator norm ‖·‖ defined in 2.4.1 is always controlled by the nuclear norm ‖·‖₁.
Proposition 2.44 For A ∈ B₁(H), we have ‖A‖ ≤ ‖A‖₁.
Proof. For convenience we compare ‖A‖² and ‖A‖₁²; observe that
‖A‖² = sup_{‖x‖=1} ‖Ax‖²
     = sup_{‖x‖=1} ⟨A*Ax, x⟩
     = sup_{‖x‖=1} ⟨A*A (∑_k ⟨x, e_k⟩ e_k), x⟩
     = sup_{‖x‖=1} ∑_k ⟨x, e_k⟩ ⟨A*Ae_k, x⟩
     = sup_{‖x‖=1} ∑_k λ_k² |⟨x, e_k⟩|²
     ≤ ∑_k λ_k² ≤ ( ∑_k λ_k )² = ‖A‖₁²
in which {e_k}_k is an orthonormal basis consisting of eigenvectors of √(A*A), and the λ_k are the corresponding eigenvalues.

2.4.5 Von Neumann Algebra

A von Neumann algebra, or W*-algebra, is a unital *-subalgebra of B(H) that is closed in the weak operator topology. It is a very important object in operator algebra and quantum mechanics, and a special type of C*-algebra. To understand the definition we need to introduce some common topologies on B(H). The strongest is the norm topology, sometimes also called the uniform topology, i.e. the topology induced by the operator norm 2.4.1. A weaker one is called the strong operator topology:
Definition 2.45 (Strong operator topology) A net {Tα}α ⊂ B(H) is said to converge to some T ∈ B(H) in the strong operator topology if
Tα x → Tx,  ∀x ∈ H
Then we turn to the weak operator topology, which is even weaker than the strong operator topology.
Definition 2.46 (Weak operator topology) A net {Tα}α ⊂ B(H) is said to converge to some T ∈ B(H) in the weak operator topology if
y(Tα x) → y(Tx),  ∀x ∈ H, ∀y ∈ H*   (2.4.9)
Equivalently, by the Riesz representation theorem, we can rewrite 2.4.9 as
⟨Tα x, y⟩ → ⟨Tx, y⟩,  ∀x, y ∈ H
For example, on the Hilbert space H = l²(N), let R ∈ B(H) be the right-shift operator defined in 2.4.8, and construct the sequence Rn := Rⁿ. It is easy to verify that {Rn}n∈N converges to 0 in the weak operator topology. In fact,
|⟨Rn x, y⟩ − ⟨0x, y⟩| = |⟨Rn x, y⟩| = |⟨x, Ln y⟩| ≤ ‖x‖ ‖Ln y‖ → 0
where Ln := Lⁿ and L = R* is the left-shift operator defined in 2.4.7; note that ‖Ln y‖² = ∑_{k>n} |y_k|² is the tail of a convergent series, hence it vanishes as n → ∞.
Besides, another common topology is the σ-weak topology. It is a well-known result that the predual of B(H) is the space B₁(H) of trace-class operators, and it generates the weak-* topology on B(H), called the σ-weak topology.
Definition 2.47 (σ-weak operator topology) A net {Tα}α ⊂ B(H) is said to converge to some T ∈ B(H) in the σ-weak operator topology if
Tr(Tα F) → Tr(TF),  ∀F ∈ B₁(H)
Now we can formally set down the definition of the von Neumann algebra.
Definition 2.48 (von Neumann algebra) A von Neumann algebra is a unital *-subalgebra of B(H) that is closed in the weak operator topology.
As usual, we wish to characterize von Neumann algebras in other ways, but first we should define the commutant:
Definition 2.49 (Commutant) Let A ⊂ B(H); the commutant of A is
A′ := {T ∈ B(H) : AT = TA, ∀A ∈ A}
Then we have the bicommutant theorem given by von Neumann:
Theorem 2.50 (Bicommutant theorem) Let A ⊂ B(H) be a unital *-subalgebra. The following three statements are equivalent:
1. A is closed in the weak operator topology, i.e. A is a von Neumann algebra.
2. A is closed in the strong operator topology.
3. A = A″.
The proof of the bicommutant theorem can easily be found in every relevant textbook. Next, we define some terms which frequently appear in the literature on von Neumann algebras.
Definition 2.51 (Center) The center of a von Neumann algebra A is the subset A ∩ A′.
Definition 2.52 (Factor) A factor is a von Neumann algebra A whose center is trivial, i.e.
A ∩ A′ = {cI}_{c∈C}
Besides, in a von Neumann algebra we usually consider only orthogonal projections (from now on we omit "orthogonal"), i.e. operators P ∈ A such that
P = P² = P*
These P are exactly the operators giving an orthogonal projection of H onto some closed subspace.
A subspace of the Hilbert space H is said to belong to the von Neumann algebra A if it is the image of some projection in A. This establishes a one-to-one correspondence between projections of A and subspaces belonging to A [54]. Now, with the help of this correspondence, we define an order on projections, which is fundamental to the classification of factors:
Definition 2.53 (Order on projections) Two subspaces E, F belonging to A are called Murray–von Neumann equivalent if there is a partial isometry u ∈ A mapping E isometrically onto F. Correspondingly, setting p := P_E and q := P_F, the Murray–von Neumann equivalence of the two projections is denoted p ∼ q. Moreover, the subspaces belonging to A are partially ordered by inclusion, and this induces a partial order on projections: we write p ≼ q if E ⊆ F.
By Proposition 2.38, we have another characterization of Murray–von Neumann equivalence: p ∼ q if and only if there exists a partial isometry u ∈ A with uu* = p and u*u = q.
Rigorously, we have the following theorem [40]:
Theorem 2.54 The relation ∼ is an equivalence relation, and ≼ is a partial order on the equivalence classes of projections.
One should notice that ≼ is not a total order on projections; however, one can construct a total order on the equivalence classes of projections generated by ∼. We write q ≺ p to mean q ≼ p but q ≠ p. Then we need to define some special projections, in order to proceed with the classification of factors.
Definition 2.55 (Minimal projection) A projection p in a von Neumann algebra A is said to be minimal if there is no other projection q with 0 ≺ q ≺ p.
Definition 2.56 (Finite projection) A projection p in a von Neumann algebra A is called infinite if p ∼ q for some q ≺ p; otherwise p is called finite.
Then we can state the type classification of factors:
Definition 2.57 (Type classification of factors) Suppose we have a factor A; then:
1. A is said to be of type I if there is a minimal projection. It is customary to call the bounded operators on a Hilbert space of finite dimension n a factor of type Iₙ, and the bounded operators on a separable infinite-dimensional Hilbert space a factor of type I∞.
2. A is said to be of type II if there is no minimal projection but there are non-zero finite projections. Moreover, if the identity operator in A is finite, the factor is said to be of type II₁; otherwise, it is said to be of type II∞.
3. A is said to be of type III if A does not contain any nonzero finite projection at all.
We are also concerned with linear functionals defined on a von Neumann algebra:
Definition 2.58 (Tracial state) A tracial state φ on a von Neumann algebra A is a linear functional from the set of positive elements to [0, +∞] such that φ(a*a) = φ(aa*) for all a ∈ A and φ(1) = 1.
Murray and von Neumann proved the fundamental result that a factor of type II₁ has a unique finite tracial state, and the set of traces of projections is [0, 1] [1]. This result will be useful in Section 4.3.2.

Chapter 3

Random Matrix Theory

In classical probability theory, a random matrix can be regarded either as a matrix-valued random variable or as a matrix with random variable entries; these two points of view are equivalent. In the non-commutative theory, a random matrix is an element of the non-commutative probability space Mn(C) ⊗ L^{∞−} (in which L^{∞−} is defined in 3.6.1), equipped with the trace τn ⊗ E. In particular, we focus on the distribution of the eigenvalues of random matrices, given some specific probabilistic assumptions on their entries. Firstly, we give the definition of the empirical spectral distribution (ESD), which is a random distribution, in Section 3.1. Then, in Section 3.2, we define the convergence of random distributions, based on what we introduced in the previous chapter. The Stieltjes transform will be introduced in Section 3.3; with it we can even derive the limit spectral distribution of some random matrices. We will see, in Section 3.4, that under some special structural assumptions the empirical spectral distributions tend to a limit spectral distribution as the size of the matrices grows. Besides, the convergence rates of empirical spectral distributions are considered in Section 3.5. Lastly, we explain what the space Mn(C) ⊗ L^{∞−} is in Section 3.6, where we also introduce some basics of free probability and discover that there is a natural connection between free probability and random matrices.
3.1 Empirical Spectral Distribution

To begin, we discard the randomness and define the empirical spectral distribution of a deterministic matrix A. Unless otherwise noted, we suppose our random matrices to be self-adjoint, so that their spectra are always contained in the real line.
Definition 3.1 (Empirical spectral distribution) For a Hermitian matrix A ∈ C^{N×N}, the empirical spectral distribution of its eigenvalues µ_A : B(R) → [0, 1] is defined as
µ_A := (1/N) ∑_{i=1}^{N} δ_{λi(A)}
We also have the same definition in the language of cumulative distribution functions.
Definition 3.2 (Empirical spectral distribution) For a Hermitian matrix A ∈ C^{N×N}, the empirical spectral distribution of its eigenvalues F_A : R → [0, 1] is defined as
F_A(x) := (1/N) ∑_{i=1}^{N} 1_{λi(A) ≤ x}
where λi(A) is the i-th eigenvalue of A.
Obviously, F_A is the distribution function induced by µ_A. If, from now on, we assume A is a random matrix, then F_A is no longer deterministic, and can be regarded as a random variable taking values in the space of probability distribution functions on R. Similarly, µ_A becomes a random measure, i.e. a random variable taking values in the space of probability measures on (R, B(R)).

3.2 Convergence of Random Distributions

As said at the beginning, under some specific assumptions the empirical spectral distribution of a random matrix converges in some sense to a deterministic distribution, called the limit spectral distribution. Therefore, we have to rigorously define the different types of convergence of random distributions.

3.2.1 From Deterministic Distribution to Random Distribution

Now we define the random distribution. Again, suppose S is a metric space, B(S) is the Borel σ-algebra on S, P(S) is the set of probability measures on (S, B(S)), and (Ω, A, P) is a probability space. We define the random probability measure in the following way:
Definition 3.3 (Random probability measure) A random probability measure is a map µ : Ω → P(S) such that ω ↦ µ(ω)(E) is measurable for all E ∈ B(S).
In particular, when S = R, we can define the corresponding random cumulative distribution function.
Definition 3.4 (Random distribution function) A random distribution function Fµ(ω)(x) can be generated from a random probability measure µ(ω) by
Fµ(ω)(x) := µ(ω)((−∞, x])
Sometimes we may also be interested in the expectation of a random probability measure. If we assume in addition that S is locally compact, then with the help of the Riesz–Markov–Kakutani Representation Theorem we can give the following definition.
Definition 3.5 (Expectation of random probability measure) Let µ(ω) be a random probability measure; its expectation E[µ(ω)] is defined by duality, that is, for all f ∈ C0(R),
∫_R f dE[µ(ω)] := E[ ∫_R f dµ(ω) ]

3.2.2 General Facts on Convergence of Random Distributions

In this section we mainly present results on the convergence of random probability measures given by Berti and Pratelli [25], which will be extremely useful later.
Theorem 3.6 Suppose {µn}n∈N is a sequence of random probability measures and µ is a random probability measure, all of them measures on (S, B(S)) defined on the probability space (Ω, A, P). If S is a Radon space, then the following two statements are equivalent:
1. µn(ω) → µ(ω) weakly for almost all ω ∈ Ω.
2. ∫_S f dµn(ω) → ∫_S f dµ(ω) almost surely, for all f ∈ Cb(S).
According to Definition 2.14, statement 1 means: for almost all ω ∈ Ω, ∫_S f dµn(ω) → ∫_S f dµ(ω) for all f ∈ Cb(S); we can easily interchange the quantifiers to get statement 2. But the converse is not obvious: if we fix f ∈ Cb(S) and state that ∫_S f dµn(ω) → ∫_S f dµ(ω) on a set Ω_f ⊂ Ω of probability one, we have to show that the intersection ∩_{f∈Cb(S)} Ω_f still has probability one.
As a corollary of Theorem 3.6, we have
Corollary 3.7 With the same setting, if S is a Radon space, then the following two statements are equivalent:
1. For each subsequence of µn, denoted µn′, there exists a further subsequence, denoted µn″, such that µn″(ω) → µ(ω) weakly for almost all ω ∈ Ω.
2. ∫_S f dµn(ω) → ∫_S f dµ(ω) in probability for all f ∈ Cb(S), i.e. for any ε > 0 and any δ > 0, there exists N = N(δ) such that
P({ω : |∫_S f dµn(ω) − ∫_S f dµ(ω)| > ε}) < δ,  ∀n ≥ N, ∀f ∈ Cb(S)

3.2.3 Common Types of Convergence Used in RMT

In the literature of random matrix theory, when we say the empirical spectral distributions tend to the limit spectral distribution, the meaning of "tend" varies by situation. Also, remember that the term "distribution" indicates either a probability measure or a distribution function, which brings further confusion. Here we list some types of convergence widely used in random matrix theory and then reveal their relationships. For simplicity of notation, we sometimes do not write ω explicitly.
First, we introduce the almost sure convergence of random distribution functions, which generalizes Theorem 2.17 directly: we do nothing but introduce randomness into the weak convergence of probability measures. One will find that almost sure convergence of random probability measures just means the sequence "converges weakly, almost surely", or "converges in the weak topology almost surely"; we must be careful about the order of the statement.
Definition 3.8 (Almost sure convergence of random distribution functions) We say a sequence of random distribution functions Fn → F almost surely, where F is a random distribution function on R, if, almost surely, Fn(x) → F(x) for all x ∈ R at which F is continuous.
Besides, we can also define the almost sure convergence of random distributions in terms of random probability measures; we introduce the definition used in [30].
Definition 3.9 (Almost sure convergence of random probability measures) We say a sequence of random probability measures µn → µ almost surely, where µ is a random probability measure on (R, B(R)), if for every test function f ∈ Cb(R) we have
∫_R f dµn → ∫_R f dµ
almost surely.
Now we show that Definitions 3.8 and 3.9 are equivalent, which completes the definition of almost sure convergence of random probability distributions.
Proposition 3.10 Suppose there is a sequence of random probability measures {µn}n∈N and a random probability measure µ, all measures on (R, B(R)) defined on the probability space (Ω, A, P). Let Fµn denote the random distribution function corresponding to µn for each n ∈ N, and Fµ the one corresponding to µ, obtained via Definition 3.4. Then µn → µ almost surely if and only if Fµn → Fµ almost surely.
Proof. µn(ω) → µ(ω) almost surely means ∫_R f dµn(ω) → ∫_R f dµ(ω) almost surely for all f ∈ Cb(R); notice the measures are defined on the measurable space (R, B(R)), where R is obviously Radon.
Then, by Theorem 3.6, µn(ω) → µ(ω) almost surely is equivalent to µn(ω) → µ(ω) weakly for almost all ω ∈ Ω. Next, by Theorem 2.17, this is also equivalent to saying that Fµn(ω)(x) → Fµ(ω)(x) for all x ∈ R at which Fµ(ω) is continuous, for almost all ω ∈ Ω; this is exactly Fµn(ω) → Fµ(ω) almost surely.
Having defined the almost sure convergence of random probability distributions, we now loosen the condition and ask only for convergence in probability. Recall that the definition of convergence in probability of random variables depends on the metric on the space where the random variables take their values. For random probability measures, we can still test against bounded continuous functions as in Definition 3.9, and we obtain Definition 3.11 below, which is used in [30].
Definition 3.11 (Convergence in probability of random probability measures) We say a sequence of random probability measures µn → µ in probability, where µ is a random probability measure on (R, B(R)), if for every test function f ∈ Cb(R) we have
∫_R f dµn → ∫_R f dµ
in probability. More explicitly: for any ε > 0,
lim_{n→∞} P({ω : |∫_R f dµn(ω) − ∫_R f dµ(ω)| > ε}) = 0,  ∀f ∈ Cb(R)
Notice that if µn(ω) → µ(ω) almost surely, as in Definition 3.9, then µn(ω) → µ(ω) in probability, as in Definition 3.11; this is trivial. Unfortunately, when it comes to random distribution functions, regarded as random variables taking values in the space of probability distribution functions on R, there is no nice norm at our fingertips with which to loosen Definition 3.8. Thus we simply introduce the definition of convergence in probability of random distribution functions used in [7], which exchanges the order of the quantifiers over Ω and R in Definition 3.8:
Definition 3.12 (Convergence in probability of random distribution functions) We say a sequence of random distribution functions Fn → F in probability, where F is a random distribution function on R, if for any x ∈ R, Fn(x) → F(x) in probability. Equivalently: for any x ∈ R and any ε > 0,
lim_{n→∞} P({ω : |Fn(ω)(x) − F(ω)(x)| > ε}) = 0
Definition 3.12 is useful particularly when the limit random distribution function F(ω)(x) has good properties. For example, if F(ω)(x) is continuous on R for all ω ∈ Ω, then we can build a bridge from Definition 3.8 to Definition 3.12.
Proposition 3.13 If a sequence of random distribution functions Fn → F almost surely, where F is a random distribution function on R, and F(ω) is continuous on R for all ω ∈ Ω, then Fn → F in probability.
Proof. From the assumptions, for almost all ω ∈ Ω, Fn(ω)(x) → F(ω)(x) for all x ∈ R (every x is a continuity point of F(ω)). Exchanging the order of quantifiers, for every x ∈ R we have Fn(ω)(x) → F(ω)(x) for almost all ω ∈ Ω; therefore Fn(x) → F(x) almost surely, hence in probability, for every x ∈ R, as in Definition 3.12.

3.3 Stieltjes Transform

As we said at the beginning of Section 3.2, the empirical spectral distributions of some random matrices tend, in the senses introduced above, to a limit spectral distribution. But how do we find the form of the limit spectral distribution? The super-important tool is the Stieltjes transform, introduced below.

3.3.1 Definition and Basic Properties of Stieltjes Transform

Definition 3.14 (Stieltjes transform) If µ is a probability measure on (R, B(R)), then its Stieltjes transform is defined by
sµ(z) = ∫_R 1/(x − z) dµ   (3.3.1)
where z ∈ D := {z ∈ C : Im z > 0}.
3.3 Stieltjes Transform

As we said at the beginning of Section 3.2, the empirical spectral distributions of some random matrices tend, in the senses introduced above, to a limit spectral distribution. But how do we find the form of this limit spectral distribution? The key tool is the Stieltjes transform, which we now introduce.

3.3.1 Definition and Basic Properties of Stieltjes Transform

Definition 3.14 (Stieltjes transform) If µ is a probability measure on (ℝ, B(ℝ)), then its Stieltjes transform is defined by
$$s_\mu(z) = \int_{\mathbb{R}} \frac{1}{x - z} \, d\mu \qquad (3.3.1)$$
where z ∈ D := {z ∈ ℂ : Im z > 0}.

More explicitly, writing z = u + iv, we can rewrite (3.3.1) as
$$s_\mu(u + iv) = \int_{\mathbb{R}} \frac{x - u}{(x - u)^2 + v^2} \, \mu(dx) + i \int_{\mathbb{R}} \frac{v}{(x - u)^2 + v^2} \, \mu(dx) \qquad (3.3.2)$$
and observe that it is well defined on the upper and lower half-planes of the complex plane, whereas if v = 0 the integrand of the real part in (3.3.2) is singular at x = u. We restrict our attention to z in the upper half-plane D.

Then, we want to show that a probability measure µ can be recovered from the limit behavior of its Stieltjes transform, which establishes a one-to-one correspondence between probability measures and their Stieltjes transforms.

Theorem 3.15 (Inverse Stieltjes transform) Define $f_\varepsilon(\lambda) := \frac{1}{\pi} \operatorname{Im} s_\mu(\lambda + i\varepsilon)$. Then f_ε is the density function of some probability measure ν_ε on (ℝ, B(ℝ)), and ν_ε → µ weakly as ε → 0⁺.

Proof. Observe that
$$f_\varepsilon(\lambda) = \frac{1}{\pi} \operatorname{Im} s_\mu(\lambda + i\varepsilon) = \frac{1}{\pi} \int_{\mathbb{R}} \frac{\varepsilon}{(x - \lambda)^2 + \varepsilon^2} \, \mu(dx)$$
is the probability density of the random variable X + C_ε, where X is distributed according to µ, C_ε is Cauchy distributed with parameter ε, and X ⊥ C_ε. We recall that the density function of the Cauchy distribution is $x \mapsto \frac{\varepsilon}{\pi(x^2 + \varepsilon^2)}$. Now we study the cumulative distribution function of Y_ε := X + C_ε; integrating with respect to λ we have
$$F_\varepsilon(y) = \int_{-\infty}^{y} \int_{\mathbb{R}} \frac{1}{\pi} \frac{\varepsilon}{(x - \lambda)^2 + \varepsilon^2} \, \mu(dx) \, d\lambda \qquad (3.3.3)$$
Notice that the integrand in (3.3.3) is nonnegative, and it is continuous, hence measurable. Moreover, µ and the Lebesgue measure are both σ-finite. Therefore we can apply the Fubini-Tonelli Theorem and get
$$F_\varepsilon(y) = \int_{\mathbb{R}} \int_{-\infty}^{y} \frac{\varepsilon}{\pi\left[(\lambda - x)^2 + \varepsilon^2\right]} \, d\lambda \, \mu(dx) = \int_{\mathbb{R}} \left[ \frac{1}{\pi} \arctan\left( \frac{y - x}{\varepsilon} \right) + \frac{1}{2} \right] \mu(dx) \qquad (3.3.4)$$
Define the integrand in (3.3.4) as $h_\varepsilon(x) := \frac{1}{\pi} \arctan\left( \frac{y - x}{\varepsilon} \right) + \frac{1}{2}$. Obviously h_ε → h pointwise as ε → 0⁺, where
$$h(x) = \frac{1}{2} \mathbf{1}_{\{x = y\}}(x) + \mathbf{1}_{\{x < y\}}(x)$$
Since 0 ≤ h_ε ≤ 1, dominated convergence gives $F_\varepsilon(y) \to \int_{\mathbb{R}} h \, d\mu = \mu((-\infty, y)) + \frac{1}{2}\mu(\{y\})$; in particular, F_ε(y) → F_µ(y) at every point y at which F_µ is continuous. According to Theorem 2.17, ν_ε → µ weakly.

Next, we introduce a theorem which permits us to work with the convergence of random probability measures through their Stieltjes transforms.

Theorem 3.16 Let {µ_n}_{n∈ℕ} be probability measures on (ℝ, B(ℝ)), and let µ be a sub-probability measure on (ℝ, B(ℝ)). Then µ_n converges to µ in the vague topology if and only if s_{µ_n}(z) converges to s_µ(z) for all z ∈ D.

Proof. Suppose µ_n → µ vaguely, i.e. $\int_{\mathbb{R}} f \, d\mu_n \to \int_{\mathbb{R}} f \, d\mu$ for all f ∈ C_0(ℝ). Observe that, for fixed z ∈ D, the integrands of the real and imaginary parts in (3.3.2) both vanish as |x| → ∞; hence s_{µ_n}(z) → s_µ(z). Conversely, suppose s_{µ_n}(z) → s_µ(z) for all z ∈ D. By Helly's Selection Theorem, from every subsequence {µ_{n_k}}_{k∈ℕ} we can extract a further subsequence {µ_{n_k(i)}}_{i∈ℕ} which converges vaguely to some sub-probability measure ν. By the implication already proved, s_{µ_{n_k(i)}}(z) → s_ν(z) for all z ∈ D. But Theorem 3.15 tells us that the Stieltjes transform uniquely determines a measure, thus µ = ν. By the same idea as in the proof of Proposition 2.20, we conclude that µ_n → µ vaguely.
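A quick numerical sketch of the inversion formula (ours, assuming NumPy and SciPy; the test measure, here standard Gaussian, and the helper name `stieltjes` are our choices): as ε → 0⁺, (1/π) Im s_µ(λ + iε), which is the density of µ smoothed by a Cauchy(ε) kernel, approaches the density of µ at λ.

```python
# Numerical check (ours) of Theorem 3.15 for mu = N(0,1).
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def stieltjes(p, lam, eps, lo=-12.0, hi=12.0):
    """Approximate s_mu(lam + i*eps) by quadrature, for mu with density p."""
    re = quad(lambda x: p(x) * (x - lam) / ((x - lam)**2 + eps**2),
              lo, hi, points=[lam], limit=200)[0]
    im = quad(lambda x: p(x) * eps / ((x - lam)**2 + eps**2),
              lo, hi, points=[lam], limit=200)[0]
    return re + 1j * im

lam = 0.7
for eps in [1.0, 0.1, 0.01]:
    f_eps = stieltjes(norm.pdf, lam, eps).imag / np.pi
    print(f"eps={eps:5.2f}  (1/pi) Im s(lam+i*eps) = {f_eps:.4f}  vs  p(lam) = {norm.pdf(lam):.4f}")
```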
Now we turn to computing the Stieltjes transform of the famous semi-circular law, first proposed by Wigner [2] in 1958.

Definition 3.17 (semi-circular law) A probability measure µ_sc follows the semi-circular law if its probability density p_sc is
$$p_{sc}(x) = \frac{1}{2\pi} \sqrt{4 - x^2} \cdot \mathbf{1}_{\{|x| \leq 2\}}$$

Immediately, we have
$$s_{sc}(z) = \int_{\mathbb{R}} \frac{1}{x - z} \cdot p_{sc}(x) \, dx \qquad (3.3.5)$$
Using twice the change of variables x = 2 cos y and ζ = e^{iy}, together with the periodicity of the trigonometric functions, we can transform (3.3.5) into an integral over the unit circle in the complex plane:
$$s_{sc}(z) = \int_{\mathbb{R}} \frac{1}{x - z} \cdot \frac{1}{2\pi} \sqrt{4 - x^2} \, \mathbf{1}_{\{|x| \leq 2\}} \, dx = \frac{1}{\pi} \int_0^{2\pi} \frac{\sin^2 y}{2\cos y - z} \, dy = \frac{1}{\pi} \int_0^{2\pi} \frac{1}{e^{iy} + e^{-iy} - z} \left( \frac{e^{iy} - e^{-iy}}{2i} \right)^2 dy = -\frac{1}{4\pi i} \oint_{|\zeta| = 1} \frac{(\zeta^2 - 1)^2}{\zeta^2 (\zeta^2 - z\zeta + 1)} \, d\zeta \qquad (3.3.6)$$
Then we intend to use the Residue Theorem; observe that the integrand in (3.3.6) has three poles:
$$\zeta_0 = 0, \qquad \zeta_1 = \frac{z + \sqrt{z^2 - 4}}{2}, \qquad \zeta_2 = \frac{z - \sqrt{z^2 - 4}}{2} \qquad (3.3.7)$$
Recall that, for an arbitrary z ∈ ℂ, the real and imaginary parts of its square root √z can be written as
$$\operatorname{Re} \sqrt{z} = \frac{1}{\sqrt{2}} \operatorname{sign}(\operatorname{Im} z) \sqrt{|z| + \operatorname{Re} z} \qquad (3.3.8)$$
and
$$\operatorname{Im} \sqrt{z} = \frac{1}{\sqrt{2}} \sqrt{|z| - \operatorname{Re} z}$$
Substituting z² − 4 into (3.3.8) we get
$$\operatorname{Re} \sqrt{z^2 - 4} = \frac{1}{\sqrt{2}} \operatorname{sign}(2 \operatorname{Re} z \operatorname{Im} z) \sqrt{|z^2 - 4| + \operatorname{Re}(z^2 - 4)}$$
thus we see that Re √(z² − 4) and Re z have the same sign, since in the Stieltjes transform we consider only z ∈ ℂ with Im z > 0. Therefore we conclude that |ζ_1| > |ζ_2|. Moreover, ζ_1 ζ_2 = 1, so we must have |ζ_1| > 1 and |ζ_2| < 1. On the other hand, the residues at the poles inside the unit circle are:
$$\operatorname{res}_{\zeta_0} = \lim_{\zeta \to \zeta_0} \frac{d}{d\zeta} \frac{(\zeta^2 - 1)^2}{\zeta^2 - z\zeta + 1} = \lim_{\zeta \to \zeta_0} \frac{4\zeta(\zeta^2 - 1)(\zeta^2 - z\zeta + 1) - (\zeta^2 - 1)^2 (2\zeta - z)}{(\zeta^2 - z\zeta + 1)^2} = z$$
$$\operatorname{res}_{\zeta_2} = \lim_{\zeta \to \zeta_2} \frac{(\zeta^2 - 1)^2}{\zeta^2 (\zeta - \zeta_1)} = -\sqrt{z^2 - 4}$$
By the Residue Theorem we obtain the final result: the Stieltjes transform of the semi-circular law is
$$s_{sc}(z) = -\frac{1}{2} \left( z - \sqrt{z^2 - 4} \right) \qquad (3.3.9)$$

3.3.2 Derivation of semi-circular Law Using Stieltjes Transform

In this section we outline, without details, the derivation of the semi-circular law via the Stieltjes transform. We will show that, even without prior knowledge of the semi-circular law, we could still discover it. For a rigorous proof, please refer to [37] or [31].

In Definition 3.2 we defined the empirical spectral distribution of a Hermitian matrix A ∈ ℂ^{N×N}. Now we assume that its upper-triangular entries are independent and identically distributed complex-valued random variables with zero mean and unit variance, and that its diagonal entries are independent and identically distributed real-valued random variables.

We shall work with the normalized matrix $\frac{1}{\sqrt{N}} A_N$, whose corresponding empirical spectral distribution is
$$\mu_N := \mu_{\frac{1}{\sqrt{N}} A_N} = \frac{1}{N} \sum_{i=1}^{N} \delta_{\lambda_i\left(\frac{1}{\sqrt{N}} A_N\right)}$$
and its Stieltjes transform is
$$s_N(z) := s_{\mu_N}(z) = \int_{\mathbb{R}} \frac{1}{x - z} \, d\mu_N = \frac{1}{N} \operatorname{tr}\left[ \left( \frac{1}{\sqrt{N}} A_N - z I_N \right)^{-1} \right]$$
and taking the expectation we get
$$\mathbb{E} s_N(z) = \frac{1}{N} \sum_{i=1}^{N} \mathbb{E}\left[ \left( \frac{1}{\sqrt{N}} A_N - z I_N \right)^{-1} \right]_{ii}$$
One can follow three steps to carry out the derivation of the semi-circular law [31]:

1. For any z ∈ D = {z ∈ ℂ : Im z > 0}, s_N(z) − 𝔼 s_N(z) → 0 almost surely.

2. For any fixed z ∈ D, 𝔼 s_N(z) converges to some s(z). In particular, one finds a recursion for 𝔼 s_N(z); passing to the limit yields the equation
$$s(z) = -\frac{1}{z + s(z)}$$
whose solution is exactly s(z) = s_sc(z), the Stieltjes transform of the semi-circular law.

3. s_N(z) → s(z) almost surely for all z ∈ D.

Finally, by Theorem 3.16 we conclude that µ_N → µ_sc in the vague topology almost surely. Further, noticing that µ_sc is indeed a probability measure on (ℝ, B(ℝ)), thanks to Theorem 2.26 we conclude that µ_N → µ_sc almost surely.
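The outcome of these three steps is easy to check numerically. The following minimal simulation (ours, not taken from [31] or [37]; it assumes NumPy) draws one large Wigner matrix with Gaussian entries, compares the empirical Stieltjes transform s_N(z) with formula (3.3.9), and compares the eigenvalue histogram with the semicircle density p_sc.

```python
# Simulation sketch (ours) of the semi-circular law for a Wigner matrix.
import numpy as np

rng = np.random.default_rng(1)
N = 2000
# Hermitian matrix: i.i.d. complex entries (unit variance) above the diagonal,
# i.i.d. real standard Gaussian entries on the diagonal.
G = (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))) / np.sqrt(2)
A = np.triu(G, 1)
A = A + A.conj().T + np.diag(rng.standard_normal(N))
eigs = np.linalg.eigvalsh(A / np.sqrt(N))

z = 1.0 + 0.5j                       # any z in the upper half-plane D
s_N = np.mean(1.0 / (eigs - z))      # s_N(z) = (1/N) tr(A/sqrt(N) - z I)^(-1)
# For Re z > 0, NumPy's principal square root agrees with the branch chosen above.
s_sc = -(z - np.sqrt(z**2 - 4)) / 2  # formula (3.3.9)
print("s_N(z) =", s_N, "  s_sc(z) =", s_sc)

# Eigenvalue histogram against the semicircle density p_sc.
hist, edges = np.histogram(eigs, bins=20, range=(-2, 2), density=True)
mid = (edges[:-1] + edges[1:]) / 2
print("max density deviation:", np.max(np.abs(hist - np.sqrt(4 - mid**2) / (2 * np.pi))))
```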
3.4 Asymptotic Results in Random Matrix Theory

We have seen that, when the entries of a random matrix satisfy suitable assumptions, its empirical spectral distribution tends to some limit. For example, in the last section we assumed that a Hermitian matrix has i.i.d. upper-triangular entries with zero mean and unit variance, and i.i.d. diagonal entries. In this section we give formal names to some special random matrices and introduce their corresponding limit spectral distributions.

3.4.1 Wigner Matrices and semi-circular Law

In fact, the random matrix that appeared in the last section is called a Wigner matrix.

Definition 3.18 (Wigner matrix) Consider a family of independent, zero-mean, real- or complex-valued random variables {Z_{ij}}_{1≤i<j}, independent from a family {Y_i}_{i≥1} of zero-mean, real-valued random variables. The Hermitian N × N matrix A_N with entries
$$(A_N)_{ij} = Z_{ij} = \overline{(A_N)_{ji}} \ \text{ for } 1 \leq i < j \leq N, \qquad (A_N)_{ii} = Y_i$$
is called a Wigner matrix.

As we can see, Definition 3.18 contains no requirement that the entries be identically distributed. Moreover, the following theorems tell us that, whether or not a Wigner matrix has identically distributed entries, under the right assumptions its empirical spectral distribution always tends to the semi-circular law introduced in Definition 3.17.

Theorem 3.19 (Semi-circular law in the i.i.d. case [31]) Suppose that A_N is an N × N Wigner matrix whose diagonal entries are i.i.d. real random variables with zero mean and whose entries above the diagonal are i.i.d. complex random variables with zero mean and unit variance. Then the empirical spectral distribution of $W_N := \frac{1}{\sqrt{N}} A_N$ tends to the semi-circular law almost surely.

Theorem 3.20 (Semi-circular law in the non-i.i.d. case [31]) Suppose that $W_N := \frac{1}{\sqrt{N}} A_N$ is a normalized Wigner matrix whose entries on or above the diagonal are independent, but may depend on N and need not be identically distributed. Assume that all the entries of A_N have zero mean and unit variance and satisfy the condition that, for any constant η > 0,
$$\lim_{N \to \infty} \frac{1}{N^2} \sum_{j=1}^{N} \sum_{k=1}^{N} \mathbb{E}\left[ \left| A_{jk}^N \right|^2 \cdot \mathbf{1}_{\left\{ \left| A_{jk}^N \right| \geq \eta \sqrt{N} \right\}} \right] = 0$$
where A^N_{jk} is the entry of A_N in the j-th row and k-th column. Then the empirical spectral distribution converges to the semi-circular law almost surely.

3.4.2 Wishart Matrices and Marchenko-Pastur Distribution

Simply put, we call the random matrix W = AA† a Wishart matrix, where A is an N × K random matrix with independent entries. Sometimes we explicitly write A_{NK} to emphasize the size of A.

Theorem 3.21 (Marchenko-Pastur distribution in the i.i.d. case [31]) Consider an N × K matrix A whose entries are independent and identically distributed complex random variables with mean 0 and variance σ². As N, K → ∞ with N/K → β > 0, the empirical spectral distribution of $\frac{1}{K} A A^\dagger$ converges almost surely to a deterministic limit distribution F_β, which is called the Marchenko-Pastur distribution. In particular, if 0 < β ≤ 1, F_β has density
$$f_\beta(x) = \frac{\sqrt{(b_\beta - x)(x - a_\beta)}}{2\pi x \beta \sigma^2} \qquad (3.4.1)$$
on the interval (a_β, b_β), where $a_\beta = \sigma^2 (1 - \sqrt{\beta})^2$ and $b_\beta = \sigma^2 (1 + \sqrt{\beta})^2$, and it has density 0 outside this interval. If β > 1, F_β is a mixed distribution: it has density (3.4.1) on (a_β, b_β), a point mass $1 - \frac{1}{\beta}$ at x = 0, and zero mass or density anywhere else.

A transformation trick may be useful in future chapters: defining
$$B := \frac{1}{\sqrt{K}} A$$
we immediately have
$$B B^\dagger = \frac{1}{K} A A^\dagger$$
Then, according to Theorem 3.21, the limit spectral distribution of BB† is still (3.4.1). In this way we can equivalently study the spectral distribution of BB†, where B has i.i.d. entries with mean 0 and variance σ²/K. In particular, when σ² = 1 we call F_β the standard Marchenko-Pastur distribution.
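A minimal simulation (ours; it assumes NumPy) of Theorem 3.21: the ESD of (1/K) AA† for an N × K matrix A with i.i.d. standardized complex entries is compared against the Marchenko-Pastur density (3.4.1), here with β = 1/2 ≤ 1 and σ² = 1, so there is no point mass at zero.

```python
# Simulation sketch (ours) of the Marchenko-Pastur law for a Wishart matrix.
import numpy as np

rng = np.random.default_rng(2)
N, K = 1000, 2000
beta, sigma2 = N / K, 1.0
A = (rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))) / np.sqrt(2)
eigs = np.linalg.eigvalsh(A @ A.conj().T / K)

a = sigma2 * (1 - np.sqrt(beta)) ** 2   # left edge  a_beta
b = sigma2 * (1 + np.sqrt(beta)) ** 2   # right edge b_beta
hist, edges = np.histogram(eigs, bins=30, range=(a, b), density=True)
mid = (edges[:-1] + edges[1:]) / 2
f_mp = np.sqrt((b - mid) * (mid - a)) / (2 * np.pi * mid * beta * sigma2)  # (3.4.1)
print("mean density deviation:", np.mean(np.abs(hist - f_mp)))
```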
The Marchenko-Pastur distribution can be used to study the asymptotic properties of the eigenvalues of sample covariance matrices in statistics, and it also describes the asymptotic behavior of the singular values of large rectangular random matrices. Similarly, there is a version for Wishart matrices with non-identically distributed entries.

Theorem 3.22 (Marchenko-Pastur distribution in the non-i.i.d. case [31]) Suppose that the entries of A are independent complex random variables with a common mean µ and variance σ². Assume that N/K → β > 0 and that, for any η > 0,
$$\lim_{K \to \infty} \frac{1}{\eta^2 N K} \sum_{j=1}^{N} \sum_{k=1}^{K} \mathbb{E}\left[ \left| A_{jk} \right|^2 \cdot \mathbf{1}_{\left\{ \left| A_{jk} \right| \geq \eta \sqrt{K} \right\}} \right] = 0$$
Then the empirical spectral distribution of $\frac{1}{K} A A^\dagger$ tends to the Marchenko-Pastur law almost surely, with ratio index β and scale index σ².

3.4.3 Ginibre Matrices and Circular Law

In this section we have to abandon some of the conventions introduced before, since we now deal with random matrices that are no longer Hermitian; consequently, their spectra lie in the complex plane instead of on the real line.

Suppose that X_N is an N × N matrix with independent and identically distributed entries of zero mean and unit variance; we call it a Ginibre ensemble. If λ_1, λ_2, ..., λ_N are the eigenvalues of $\frac{1}{\sqrt{N}} X_N$, we define the two-dimensional empirical spectral distribution by
$$\mu_N(x, y) = \frac{1}{N} \#\{ k \leq N : \operatorname{Re}(\lambda_k) \leq x, \ \operatorname{Im}(\lambda_k) \leq y \}$$
One should notice that the techniques used for Hermitian matrices, such as the truncation method and the moment method, fail in the non-Hermitian case; moreover, the Stieltjes transform approach becomes considerably harder. For this reason, the conjecture that the limit spectral distribution of Ginibre matrices is the circular law remained unproven for decades. We now give the newest result [28]; it requires the (2 + ε)-th moment to be finite, for some ε > 0, while the problem under the sole condition of a finite second moment is still open.

Theorem 3.23 (Circular law) Suppose that X_N is a Ginibre ensemble each of whose entries has finite (2 + ε)-th moment. Then the empirical spectral distribution of $\frac{1}{\sqrt{N}} X_N$ converges almost surely to the circular law, i.e. the uniform distribution on the unit disk of the complex plane.

3.4.4 ESD of Another Important Class of Random Matrices

Naturally, we want to study the empirical spectral distribution of matrices with a more general structure. For example, we can generalize XX† to XTX†, or even to A + XTX†; of course, this should be done under proper assumptions on T and A. In fact, we have the following theorem [14]:

Theorem 3.24 Assume that:

1. For all N ∈ ℕ we have the matrix
$$X_N := \left( \frac{1}{\sqrt{N}} X_{ij}^N \right)_{\substack{1 \leq i \leq N \\ 1 \leq j \leq K}}$$
in which the $X_{ij}^N$ are complex-valued random variables, identically distributed for all N ∈ ℕ, i and j, and independent across i and j for each N ∈ ℕ. Moreover,
$$\mathbb{E}\left| X_{11}^1 - \mathbb{E} X_{11}^1 \right|^2 = 1$$

2. As N → ∞, we have K → ∞ and K/N → β.