<<

School of Industrial and Information Engineering Master of Science in Mathematical Engineering

RANDOM MATRIX THEORY AND APPLICATIONS IN TELECOMMUNICATION AND QUANTUM SYSTEMS

Author: Zheng LI
Supervisor: Franco FAGNOLA

Academic Year 2018/2019

To C.S. Ying

Abstract

The study of random matrices started in the 1940s, when physicists observed that the empirical spectral distributions of random Hamiltonians tend to a semicircle. Since then, more and more research results about random matrices have been published, and random matrix theory turned out to have deep connections with free probability and combinatorics. In the meantime, random matrix theory has been applied in many other fields, in almost all situations where one wants to know the asymptotic behaviour of some statistic determined by the spectra of large matrices. In this thesis, we present an overview of random matrix theory and illustrate its applications to: MIMO telecommunication systems, for the computation of channel capacities; CDMA systems, for the evaluation of minimum mean square errors and spectral efficiency; random quantum channels, for the study of capacities and the celebrated conjecture on additivity of quantum entropy; and open quantum systems, for finding spectra of random Lindblad operators.

Keywords: Random Matrix Theory, Telecommunication Systems, Open Quantum Systems.


Acknowledgements

Firstly I would like to express my sincere gratitude to Prof. F. Fagnola, who has supervised the whole writing of my thesis. He gave me clear guidelines for studying the different subjects, and was always available when I encountered difficulties. I would also like to thank Prof. V. Moretti, who patiently spent much time demonstrating some proofs for me. I wish to thank T. Kletti, who kindly read the manuscript, gave me suggestions, and always helped me in studying mathematics. I also wish to thank K. Dong for meaningful discussions on telecommunication systems, and S.F. Zhang for help with English writing. I also have to thank my family, in particular my uncle D.S. Li, who was the main financial supporter of my studies here in Italy and who unfortunately passed away last year; I hope he would be glad to know that my master thesis is complete. Finally I want to express a special gratitude to the Department of Mathematics of Politecnico di Milano, which gave me, once a layman, a precious opportunity to study mathematics, and has opened my eyes to a new and beautiful world.


Contents

Abstract iv

Acknowledgement vi

1 Introduction 1

2 Preliminaries 3
2.1 Information Theory...... 3
2.1.1 Complex Random Vector...... 3
2.1.2 Entropy and Mutual Information...... 5
2.2 Estimation Theory...... 7
2.2.1 Minimal Mean Squared Error Estimator...... 7
2.2.2 Linear MMSE Estimator...... 7
2.3 Probability Measures on Metric Space...... 9
2.3.1 Weak Convergence of Probability Measures...... 9
2.3.2 Tightness and Relative Compactness...... 11
2.3.3 Other Types of Convergence...... 12
2.4 Bounded Linear Operators on Hilbert Space...... 15
2.4.1 Banach Algebra and C*-Algebra...... 16
2.4.2 Adjoint Operator...... 17
2.4.3 Isometry and Partial Isometry...... 18
2.4.4 Trace Class Operator...... 21
2.4.5 Von Neumann Algebra...... 23

3 Random Matrix Theory 27
3.1 Empirical Spectral Distribution...... 27
3.2 Convergence of Random Distributions...... 28
3.2.1 From Deterministic Distribution to Random Distribution.. 28
3.2.2 General Facts on Convergence of Random Distributions.. 29
3.2.3 Common Types of Convergence Used in RMT...... 30
3.3 Stieltjes Transform...... 32
3.3.1 Definition and Basic Properties of Stieltjes Transform.... 32
3.3.2 Derivation of semi-circular Law Using Stieltjes Transform. 35
3.4 Asymptotic Results in Random Matrix Theory...... 36
3.4.1 Wigner Matrices and semi-circular Law...... 36
3.4.2 Wishart Matrices and Marchenko-Pastur Distribution.... 37
3.4.3 Ginibre Matrices and Circular Law...... 38
3.4.4 ESD of Another Important Class of Random Matrices.... 39
3.5 Convergence Rates of ESD...... 40
3.5.1 In Cases of Wigner Matrices and Wishart Matrices...... 40
3.5.2 Simulation of Convergence Rate of ESD...... 42
3.6 Connections with Free Probability...... 43
3.6.1 Non-commutative Probability Space and Freeness...... 43
3.6.2 Free Product and Free Probability...... 45
3.6.3 Free Central Limit Theorem and Asymptotic Freeness.... 47

4 Applications of RMT 49
4.1 In MIMO System...... 49
4.1.1 Asymptotic Result I: Fixed Number of Receivers...... 50
4.1.2 Asymptotic Result II: Simultaneously Tending to Infinity.. 51
4.2 In CDMA System...... 53
4.2.1 Cross-correlations of Random Spreading Sequences..... 55
4.2.2 MMSE Multiuser Detection and Spectral Efficiency..... 56
4.2.3 Other Types of Detections in CDMA System...... 60
4.3 In Open Quantum System...... 61
4.3.1 and Quantum Channel...... 61
4.3.2 Asymptotic Minimal Output Entropy...... 65
4.3.3 Spectrum of Random Quantum Channel...... 68
4.3.4 Quantum Markov Semigroup and Lindblad Equation.... 71
4.3.5 Random Matrix Model of Lindbladian...... 74

5 Conclusions and Future Development 76

Bibliography 78

Chapter 1

Introduction

Random matrix theory (RMT) first appeared in the study of quantum mechanics in the 1940s. In quantum mechanics, values of physical quantities such as energy and momentum are regarded as eigenvalues of bounded linear operators on a Hilbert space. In particular, the Hamiltonian, a Hermitian operator closely related to the time evolution of a quantum system (this will also be mentioned in our Section 4.3.4), plays a vital role in the theory of quantum mechanics. Hence, the asymptotic behavior (in particular the distribution of the spectrum) of large dimensional random matrices of this type attracted special interest, and the semi-circular law was discovered during that time [2].
Later, many researchers started to work in this field. Many other types of matrices have been studied, and the convergence of the empirical spectral distributions of random matrices has been proven in increasingly strong senses. In the 1980s, L. Pastur [5], as a pioneer, introduced the Stieltjes transform into random matrix theory, with which one can identify the limit spectral distribution in many cases without prior knowledge. Moreover, Z.D. Bai et al. [11][12] have used the Stieltjes transform as the main tool to study the convergence rate of empirical spectral distributions. It is worth mentioning that, in the meantime, D. Voiculescu created free probability, a theory that studies non-commutative random variables, in which the limit distribution in the free version of the central limit theorem is the semi-circular law. Around 1991, D. Voiculescu [9] also discovered that freeness holds asymptotically for many kinds of random matrices. Nowadays, random matrix theory has been applied to numerous fields, wherever one wants to know the asymptotic property of some statistics depending on the spectra of matrices.
In this master thesis, we first introduce some preliminary knowledge on several quite different topics in Chapter 2, which will be used in the subsequent chapters. Notice that we will select only the knowledge that is outside the curriculum of the Mathematical Engineering program at Politecnico di Milano; in other words, we assume that the reader is familiar with the basics of Algebra, Probability, Measure Theory, and Functional Analysis.

In Chapter 3, we will clarify the formal definition of "convergence" for random probability measures, and give some properties of the aforementioned powerful tool, the Stieltjes transform. The limit spectral distributions of different types of ensembles, such as the Wigner, Wishart and Ginibre ensembles, will be listed. Moreover, we will discuss free probability and its connections to random matrix theory.
In Chapter 4 one can find several applications of random matrix theory in telecommunication systems and open quantum systems. The first application appears in multiple-input multiple-output (MIMO) systems, in which the channel can be modelled as a matrix; information theory tells us that the capacity of the channel is determined by the eigenvalues of that matrix, hence we can apply random matrix theory to analyze the asymptotic capacity of the channel. The second application is about code-division multiple access (CDMA) systems, in which random spreading sequences are used to modulate the signal and the linear minimal mean square error estimator is used to demodulate it. We will analyze the asymptotic error and capacity of such estimation. The third application is about the asymptotic capacity of the random quantum channel, which depends on the eigenvalues in a random subspace of a tensor product. The last application is about sampling the spectrum of random Lindblad operators in high dimensional open quantum systems. We will give a random matrix model of such operators, which preserves the asymptotic spectral properties but dramatically reduces the sampling time.

Chapter 2

Preliminaries

We begin with some preliminary but important results that will be used in the remaining chapters. Sections 2.1 and 2.2 prepare for the engineering problems. Section 2.3 helps us review probability theory; in particular, in Section 2.3.3 an important result will be given, which shows that different kinds of convergence of probability measures coincide under a specific condition. Section 2.4, on the other hand, presents some basic knowledge of Operator Algebra, which is the foundation of Quantum Probability.

2.1 Information Theory

2.1.1 Complex Random Vector

We define a complex-valued random vector in the following way:

Definition 2.1 (Complex random vector) A complex random vector Z = (Z_1, ··· , Z_n)′ on the probability space (Ω, F, P) is a measurable function Z : Ω → C^n such that the vector (Re(Z_1), Im(Z_1), ··· , Re(Z_n), Im(Z_n))′ is a real random vector on (Ω, F, P).

For a complex random vector, we define its expectation as

µ := E[Z] = (E[Z_1], E[Z_2], ··· , E[Z_n])′   (2.1.1)

and define its covariance matrix as

Σ := E[(Z − E[Z])(Z − E[Z])†]   (2.1.2)

where (Z − E[Z])† denotes the conjugate transpose of Z − E[Z]. Moreover, differently from the real-valued case, for complex random vectors we additionally define

Γ := E[(Z − E[Z])(Z − E[Z])′]   (2.1.3)

where (Z − E[Z])′ denotes the matrix transpose of Z − E[Z]. This Γ also plays a role in determining a distribution. Then, we turn to define the complex-valued Gaussian random vector:

Definition 2.2 (Complex Gaussian random vector) A complex random vector Z is complex Gaussian distributed if (Re(Z_1), Im(Z_1), ··· , Re(Z_n), Im(Z_n))′ is a real Gaussian distributed random vector.

If Z is complex Gaussian distributed, we use the notation Z ∼ CN(µ, Σ, Γ), where the parameters are defined as in 2.1.1–2.1.3. The probability density function of Z ∼ CN(µ, Σ, Γ) is rather complicated and will not be used in the following sections, so we omit it here, but one can easily find it in [16]. In particular, we say Z is standard complex Gaussian distributed if Z ∼ CN(0, I, 0).
Now, we introduce the concept of circularly symmetric complex random vector, which is widely used in the field of Telecommunication Engineering.

Definition 2.3 (Circular symmetry) A complex random vector Z is said to be circularly symmetric if for every x ∈ [−π, π) its distribution is identical to that of e^{ix}Z.

As a direct result of circular symmetry, E[Z] = 0. In fact, taking the expectation, we have E[Z] = e^{ix} · E[Z] for all x ∈ [−π, π), which compels E[Z] = 0. Next, we will see that if a complex random vector is circularly symmetric, then its Γ also has a special structure.

Proposition 2.4 If Z is circularly symmetric, then its corresponding µ = 0 and Γ = 0.

Proof. µ = E[Z] = 0 has already been demonstrated. For the structure of Γ we follow the same idea. Notice that Γ_jk = E[Z_j Z_k] because Z is centered, and we can write

E[Z_j Z_k] = e^{−i2x} E[(e^{ix}Z_j)(e^{ix}Z_k)] = e^{−i2x} E[Z_j Z_k]   (2.1.4)

The second equality holds because the distributions of Z and e^{ix}Z must be identical. Obviously the arbitrariness of x compels E[Z_j Z_k] = 0. Moreover, here j, k are also arbitrary, therefore Γ = 0. □

Then we can combine the complex Gaussian distribution and the circular symmetry, and give a formal definition as follows.

Definition 2.5 (Circularly symmetric complex Gaussian random vector) A complex random vector Z is circularly symmetric complex Gaussian distributed if it is complex Gaussian distributed and circularly symmetric, and it can simply be denoted as Z ∼ CN(0, Σ).

Here Γ is suppressed because of Proposition 2.4, and the probability density function of Z ∼ CN(0, Σ) can be simplified as [46]

f_Z(z) = (1 / (π^N det(Σ))) e^{−z†Σ⁻¹z}   (2.1.5)
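As a quick numerical illustration (not part of the original derivation), the following Python sketch draws samples from CN(0, Σ) for an arbitrary example covariance Sigma and checks empirically that the sample covariance matches Σ while the pseudo-covariance Γ stays close to zero, as Proposition 2.4 predicts.

import numpy as np

rng = np.random.default_rng(0)
N, n_samples = 3, 200_000

# an arbitrary Hermitian positive-definite covariance matrix (example only)
A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
Sigma = A @ A.conj().T + N * np.eye(N)

# sample Z ~ CN(0, Sigma): Z = L W with W standard circularly symmetric,
# where W = (X + iY)/sqrt(2), X, Y ~ N(0, I) independent, and Sigma = L L^dagger
L = np.linalg.cholesky(Sigma)
W = (rng.standard_normal((N, n_samples)) + 1j * rng.standard_normal((N, n_samples))) / np.sqrt(2)
Z = L @ W

Sigma_hat = Z @ Z.conj().T / n_samples   # should approximate Sigma
Gamma_hat = Z @ Z.T / n_samples          # should be close to the zero matrix

print(np.max(np.abs(Sigma_hat - Sigma)))
print(np.max(np.abs(Gamma_hat)))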

2.1.2 Entropy and Mutual Information

In this section we turn to define the entropy, which measures the uncertainty of a random variable (or vector).

Definition 2.6 (Entropy) Let X = (X_1, X_2, ··· , X_N)′ be a random vector with probability density function f_X(x); its entropy is defined as

h(X) = −∫ f_X(x) log f_X(x) dx

Definition 2.7 (Conditional entropy) Let X = (X_1, X_2, ··· , X_N)′ and Y = (Y_1, Y_2, ··· , Y_N)′ be two random vectors with known joint density function f_{(X,Y)} and conditional probability density function f_{X|Y}. Then, the entropy of X conditioned on Y is defined as

h(X|Y) = −∬ f_{(X,Y)}(x, y) log f_{X|Y}(x|y) dx dy

Next we want to study the entropy of a circularly symmetric complex Gaussian distributed random vector, but first we need to prove an important lemma.

Lemma 2.8 If Z ∼ CN_N(0, Σ), then

E[Z†Σ⁻¹Z] = N

Proof. Denote by Σ⁻¹_{ij} the entry of Σ⁻¹ at the i-th row and j-th column, and observe that

E[Z†Σ⁻¹Z] = ∑_{i=1}^N ∑_{j=1}^N Σ⁻¹_{ij} E[Z̄_i Z_j] = ∑_{i=1}^N ∑_{j=1}^N Σ⁻¹_{ij} Σ_{ji} = tr(Σ⁻¹Σ) = N

which completes the proof. 

Proposition 2.9 If Z ∼ CN_N(0, Σ), then its entropy is

h(Z) = log det (πeΣ) (2.1.6)

Proof. Recall that the probability density function f_Z(z) of a circularly symmetric complex Gaussian vector is given in Expression 2.1.5; by direct calculation and using Lemma 2.8 we have

h(Z) = −∫ f_Z(z) log f_Z(z) dz
     = −∫ f_Z(z) [−log(π^N det Σ) − z†Σ⁻¹z log e] dz
     = log(π^N det Σ) ∫ f_Z(z) dz + log e ∫ f_Z(z) z†Σ⁻¹z dz
     = log(π^N det Σ) + log e · E[Z†Σ⁻¹Z]
     = log(π^N det Σ) + log e^N
     = log det(πeΣ)

so we have reached the goal. □
Then we define the mutual information, which is a measure of the amount of information that one random variable contains about another random variable. In our case we are concerned with random vectors, i.e. multi-dimensional random variables.

Definition 2.10 (Mutual information) Let X and Y be two random vectors with probability density functions f_X(x) and f_Y(y), respectively; then their mutual information is

I(X, Y) = ∬ f_{(X,Y)}(x, y) log [ f_{(X,Y)}(x, y) / (f_X(x) f_Y(y)) ] dx dy   (2.1.7)

Moreover, we can also represent the mutual information in terms of entropies, which reveals that the mutual information I(X, Y) is exactly the reduction in the uncertainty of X due to the knowledge of Y [35]. Symmetrically, it is also the reduction in the uncertainty of Y due to the knowledge of X.

Proposition 2.11 For two random vectors X and Y, we have

I(X, Y) = h(X) − h(X|Y) = h(Y) − h(Y|X)

Proof. According to Definition 2.6 and Definition 2.7, we have

h(X) − h(X|Y) = −∫ f_X(x) log f_X(x) dx + ∬ f_{(X,Y)}(x, y) log f_{X|Y}(x|y) dx dy
             = −∬ f_{(X,Y)}(x, y) log f_X(x) dx dy + ∬ f_{(X,Y)}(x, y) log [ f_{(X,Y)}(x, y) / f_Y(y) ] dx dy
             = ∬ f_{(X,Y)}(x, y) log [ f_{(X,Y)}(x, y) / (f_X(x) f_Y(y)) ] dx dy = I(X, Y)

By the same strategy we can prove I(X, Y) = h(Y) − h(Y|X). 
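Proposition 2.9 can also be checked numerically. The sketch below (an illustration, reusing the sampling scheme shown earlier; the covariance Sigma is again an arbitrary example) estimates h(Z) by Monte Carlo as −E[log f_Z(Z)] in nats and compares it with log det(πeΣ).

import numpy as np

rng = np.random.default_rng(1)
N, n_samples = 3, 200_000

A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
Sigma = A @ A.conj().T + N * np.eye(N)
L = np.linalg.cholesky(Sigma)
Z = L @ (rng.standard_normal((N, n_samples)) + 1j * rng.standard_normal((N, n_samples))) / np.sqrt(2)

Sigma_inv = np.linalg.inv(Sigma)
# log-density of CN(0, Sigma), Expression 2.1.5, with natural logarithms
quad = np.einsum('in,ij,jn->n', Z.conj(), Sigma_inv, Z).real
log_f = -quad - N * np.log(np.pi) - np.log(np.linalg.det(Sigma).real)

h_mc = -log_f.mean()                                           # Monte Carlo estimate of h(Z)
h_formula = np.log(np.linalg.det(np.pi * np.e * Sigma).real)   # log det(pi e Sigma)
print(h_mc, h_formula)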

2.2 Estimation Theory

2.2.1 Minimal Mean Squared Error Estimator Suppose that we want to estimate the value of the unobserved random variable X (here we are not estimating a parameter), given the observation Y = y. In general, our estimate xˆ is a function of y, i.e.

xˆ = g(y)

The mean squared error is defined by

MSE(x̂) := E[(X − x̂)² | Y = y] = E[(X − g(y))² | Y = y]   (2.2.1)

By simple computation one can obtain that x̂ = E[X|Y = y] is the estimate which minimizes the quantity 2.2.1. Therefore, we shall define the minimal mean square error (MMSE) estimator in the following way:

Definition 2.12 (MMSE estimator) Let Xˆ = g(Y) be an estimator of random variable X, needing to observe the value of some random variable Y. We say Xˆ MMSE is the MMSE estimator if Xˆ MMSE = E [X|Y] which minimizes the mean square error among all estimators.

By the law of total expectation and law of total variance, we know:

E[X̂] = E[E[X|Y]] = E[X],    Var(X̂) = Var(X) − E[Var(X|Y)]

2.2.2 Linear MMSE Estimator

In most cases, the MMSE estimator X̂ = E[X|Y] is not easy to compute explicitly even when we know Y = y. We might hope that the MMSE estimator has a simpler structure, for example we may let

X̂_LMMSE = g(Y) = aY + b

in which a, b ∈ R are real numbers to be determined. This X̂_LMMSE is called the linear MMSE estimator. In particular, we want to choose a and b such that X̂_LMMSE has the minimal mean square error among all such estimators, and we have the following proposition:

Proposition 2.13 Let X and Y be two random variables with finite mean and variance, and consider the function h defined as

h(a, b) := E[(X − X̂)²] = E[(X − aY − b)²]   (2.2.2)

Then, h(a, b) is minimized if

a = a* := Cov(X, Y) / Var(Y)   (2.2.3)

and

b = b* := E[X] − (Cov(X, Y) / Var(Y)) E[Y]   (2.2.4)

Moreover,

h(a*, b*) = (1 − ρ²) Var(X)   (2.2.5)

where ρ is the correlation coefficient between X and Y; and we also have

E[(X − a*Y − b*)Y] = 0   (2.2.6)

which is also called the orthogonality principle.

Proof. We can directly expand 2.2.2 and get

h(a, b) = EX² + a²EY² + b² − 2aEXY − 2bEX + 2abEY

Then the Hessian

Hess h = [ 2EY²  2EY ; 2EY  2 ]

is positive definite, which is easy to check (its determinant is 4 Var(Y) > 0). Therefore we conclude that h is strictly convex on R², which guarantees the existence and uniqueness of the minimizer, and allows us to obtain the minimizer by setting the partial derivatives to zero. Thus, we impose

∂h/∂a = 2aEY² − 2EXY + 2bEY = 0   (2.2.7)
∂h/∂b = 2b − 2EX + 2aEY = 0

By solving the above system of equations we obtain 2.2.3, 2.2.4 and 2.2.5, and observe that 2.2.6 is an immediate consequence of 2.2.7. □

In this way, we can write our linear MMSE estimator as

X̂_LMMSE = (Cov(X, Y) / Var(Y)) (Y − EY) + EX

Obviously we have E[X̂_LMMSE] = E[X] and Var(X̂_LMMSE) = ρ² Var(X).

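The closed form of Proposition 2.13 and the orthogonality principle 2.2.6 are easy to check on simulated data. The sketch below is only an illustration with an arbitrary joint model for (X, Y); it compares the sample-based coefficients with a*, b*, the achieved mean square error with 2.2.5, and verifies that the residual is uncorrelated with Y.

import numpy as np

rng = np.random.default_rng(2)
n = 500_000

# an arbitrary correlated pair (X, Y), used only for illustration
Y = rng.standard_normal(n)
X = 2.0 * Y + 1.0 + rng.standard_normal(n)   # so Cov(X, Y) = 2 and Var(Y) = 1

a_star = np.cov(X, Y)[0, 1] / np.var(Y)
b_star = X.mean() - a_star * Y.mean()
X_hat = a_star * Y + b_star

rho = np.corrcoef(X, Y)[0, 1]
print(a_star, b_star)                                          # close to 2 and 1
print(np.mean((X - X_hat) ** 2), (1 - rho**2) * np.var(X))     # minimal MSE vs Equation 2.2.5
print(np.mean((X - X_hat) * Y))                                # orthogonality principle, close to 0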
2.3 Probability Measures on Metric Space

2.3.1 Weak Convergence of Probability Measures

Now we go through a bit of measure theory. Suppose that we have a metric space S with Borel σ-algebra B(S), on which we have the space of probability measures P(S). Then, we shall define the weak convergence of probability measures, which is sometimes also called narrow convergence.

Definition 2.14 (Weak convergence) Let {µ_n}_{n∈N} and µ be probability measures on (S, B(S)). We say µ_n → µ weakly if

∫_S f dµ_n → ∫_S f dµ,  ∀f ∈ C_b(S)

Then, naturally, we will consider the question: is the weak limit unique? The answer is positive, as a consequence of the following theorem [38], since the space of bounded uniformly continuous functions is a subspace of C_b(S).

Theorem 2.15 Probability measures µ and ν on (S, B(S)) coincide if ∫_S f dµ = ∫_S f dν for all bounded uniformly continuous functions f on S.

Other characterizations of weak convergence are given by the following Portmanteau Theorem [38].

Theorem 2.16 (Portmanteau) The following conditions are equivalent:

1. µ_n → µ weakly.

2. ∫_S f dµ_n → ∫_S f dµ for all bounded uniformly continuous functions f on S.

3. µn(A) → µ(A) for all µ-continuity sets A ⊂ S.

In condition 3, A ⊂ S is called a µ-continuity set if its boundary ∂A satisfies

µ(∂A) = 0

In particular, when S is a Euclidean space (here we can even set S = R for simplicity), we can introduce the concept of cumulative distribution function. In fact, if we have a probability measure µ, its corresponding distribution function F_µ is defined as

F_µ(x) := µ((−∞, x]) = ∫_R 1_{(−∞,x]} dµ

Conversely, F_µ also characterizes µ. This tells us there is a kind of conjugacy between the distribution function and the probability measure. Because of this conjugacy, we sometimes abuse the term "distribution": especially in random matrix theory, it can represent either the distribution function or the corresponding probability measure. However, one should always be able to verify what it indicates in concrete situations.
More specifically, a function F : R → [0, 1] is the distribution function of some probability measure on (R, B(R)) if and only if the following three conditions are satisfied [36]:

1. F is non-decreasing.

2. F is right continuous.

3. limx→−∞ F(x) = 0 and limx→+∞ F(x) = 1. Finally, we give a characterization of the weak convergence of probability measures in terms of their corresponding cumulative distribution functions.

Theorem 2.17 Let {µ_n}_{n∈N} and µ be probability measures on R, and denote their corresponding distribution functions by F_{µ_n} and F_µ. Then, µ_n → µ weakly if and only if F_{µ_n}(x) → F_µ(x) for every x ∈ R at which F_µ is continuous.

Proof. One direction follows directly from condition 3 in Theorem 2.16. For the proof of the whole theorem please refer to [36]. □
Here we emphasize that µ must be a probability measure on (R, B(R)); otherwise, we can let {µ_n}_{n∈N} be the sequence of Dirac measures, each centered at n, and let µ = 0. Obviously, F_{µ_n} has a unit jump at n and F_µ is identically zero. Now, observe that F_{µ_n}(x) → F_µ(x) pointwise for all x ∈ R, but µ_n does not converge to µ weakly. We will see this sequence of Dirac measures several times later on.
In fact, we have Helly's Selection Theorem, which in general describes sequential compactness in the space of locally integrable functions of bounded variation, and every cumulative distribution function happens to be a special case.

Theorem 2.18 (Helly's Selection Theorem) For every sequence {F_n}_{n∈N} of distribution functions there exist a subsequence {F_{n_k}}_{k∈N} and a non-decreasing, right-continuous function F such that

lim_{k→∞} F_{n_k}(x) = F(x)

at continuity points x of F.

Proof. Applying the classical diagonal technique, we can obtain a sequence {n_k}_{k∈N} of integers along which the limit G(r) = lim_{k→∞} F_{n_k}(r) exists for all r ∈ Q. Then we define

F(x) := inf_{r∈Q, r>x} G(r)

Clearly F is non-decreasing. Moreover, for each x and ε > 0, there is an r for which x < r and G(r) < F(x) + ε. If x ≤ y < r, then F(y) ≤ G(r) < F(x) + ε, so F is right-continuous. If F is continuous at x, choose y < x such that F(x) − ε < F(y); now choose rationals r and s such that y < r < x < s and G(s) < F(x) + ε. From F(x) − ε < G(r) ≤ G(s) < F(x) + ε and F_{n_k}(r) ≤ F_{n_k}(x) ≤ F_{n_k}(s), it follows that, as k → +∞, F_{n_k}(x) has limits superior and inferior within ε of F(x). □
The F in this theorem necessarily satisfies 0 ≤ F(x) ≤ 1 for all x ∈ R, but F need not be a distribution function, as we saw in the aforementioned counterexample. Sometimes, we say such an F is the (extended) cumulative distribution function of a sub-probability measure. In fact, this kind of convergence is called vague convergence, which will be introduced later (see Definition 2.22; one can prove that the two notions are equivalent).

2.3.2 Tightness and Relative Compactness

Another important concept we are about to introduce is tightness; this nice property of measures will force the limit in Helly's Selection Theorem to be a probability measure.

Definition 2.19 (Tightness) Let M ⊂ P(S) be a collection of probability measures on (S, B(S)), it is called (uniformly) tight if for every e > 0, there exists a compact subset Ke ⊂ S such that µ (S \ Ke) < e, ∀µ ∈ M Especially, if M consists of a single measure µ, then µ is called a tight measure.

If the collection M is sequential, that is, we can write it as {µ_n}_{n∈N}, then tightness ensures that µ_n will not "flee" to infinity as n → +∞. Consider the sequence of probability measures {δ_n}_{n∈N} on (R, B(R)), where δ_n is the Dirac measure centered at n. Obviously {δ_n}_{n∈N} cannot be tight, since once we have fixed ε, we can let n be large enough so that no such K_ε exists. In this case, the tightness of a sequence of probability measures can also be equivalently expressed as [26]

sup_{K⊂⊂S} lim inf_{n→∞} µ_n(K) = 1   (2.3.1)

Moreover, we give a further simple condition for the weak convergence of probability measures:

Proposition 2.20 A necessary and sufficient condition for µ_n → µ weakly is that each subsequence {µ_{n_k}}_{k∈N} contains a further subsequence {µ_{n_k(i)}}_{i∈N} converging weakly to µ.

Proof. The necessity is trivial. For the sufficiency, if µ_n does not converge weakly to µ, then ∫_S f dµ_n does not converge to ∫_S f dµ for some f ∈ C_b(S). But then, for some ε > 0 and some subsequence {µ_{n_k}}_{k∈N},

|∫_S f dµ_{n_k} − ∫_S f dµ| > ε,  ∀k ∈ N

so no further subsequence can converge weakly to µ. □
Then, we will reveal the connection between tightness and relative compactness of M, in the following Prokhorov's Theorem [38]. Recall that we say M is relatively compact or pre-compact if every sequence of elements of M contains a weakly convergent subsequence. For the most part we are concerned with the relative compactness of sequences {µ_n}_{n∈N}; this means that every subsequence {µ_{n_k}}_{k∈N} contains a further subsequence {µ_{n_k(i)}}_{i∈N} such that µ_{n_k(i)} → ν weakly as i → +∞, for some probability measure ν ∈ P(S).

Theorem 2.21 (Prokhorov's Theorem) Suppose M ⊂ P(S) is a collection of probability measures. If M is tight, then M is relatively compact. Conversely, if M is relatively compact and S is separable and complete, i.e. S is Polish, then M is tight.

This theorem will be used in the next section.

2.3.3 Other Types of Convergence Moreover, we have some other weaker types of convergences of probability mea- sures, which will also be used in the further part.

Definition 2.22 (Vague convergence - type I) Let {µ_n}_{n∈N} and µ be probability measures on (S, B(S)). We say µ_n → µ vaguely if

∫ f dµ_n → ∫ f dµ,  ∀f ∈ C_0(S)

Definition 2.23 (Vague convergence - type II) Let {µ_n}_{n∈N} and µ be probability measures on (S, B(S)). We say µ_n → µ vaguely if

∫ f dµ_n → ∫ f dµ,  ∀f ∈ C_c(S)

Definition 2.24 (Distributional convergence) Let {µ_n}_{n∈N} and µ be probability measures on (S, B(S)). We say µ_n → µ distributionally if

∫ f dµ_n → ∫ f dµ,  ∀f ∈ C_c^∞(S)

Notice that C_0(S) denotes the Banach space of continuous functions on S vanishing at infinity, equipped with the uniform norm, so we have boundedness. Since S is a metric space which does not necessarily carry a norm, vanishing at infinity means: given any ε > 0, there is a compact subset K_ε ⊂ S such that |f(x)| < ε for all x ∈ S \ K_ε. Moreover, C_c(S) denotes the space of compactly supported continuous functions on S, and C_c^∞(S) denotes the space of compactly supported smooth functions on S.
Obviously we have the inclusions C_c^∞(S) ⊂ C_c(S) ⊂ C_0(S) ⊂ C_b(S), so once we have the weak convergence of probability measures, we have all the other kinds of convergence. Now, we are going to demonstrate the most important result of this section, that is, if S = R^d, all the aforementioned types of convergence, i.e. Definitions 2.14, 2.22, 2.23, 2.24, are equivalent. It suffices to prove that the distributional convergence and the weak convergence coincide.
Before proving the main theorem, we need to write down a useful lemma, which allows us to approximate the target function by a sequence of smooth functions. For more detail about the mollifier and convolution technique, please refer to [45].

Lemma 2.25 Let f ∈ L^p(Ω), 1 ≤ p ≤ ∞, Ω ⊆ R^d, and define f_η := g_η ∗ f, where g_η is the mollifier; then we have:

1. f_η ∈ C^∞(R^d) for all η > 0.

2. if f ∈ C(Ω), then fη → f uniformly in every compact K ⊂ Ω, as η → 0.

Theorem 2.26 Suppose that {µ_n}_{n∈N} and µ are probability measures on (R^d, B(R^d)); then µ_n → µ distributionally if and only if µ_n → µ weakly.

Proof. The necessity is obvious, so we just show the sufficiency.
First, we show that {µ_n}_{n∈N} is tight. Let ζ ∈ C_c^∞(R^d) satisfy 0 ≤ ζ ≤ 1 and

ζ(x) = 1 for |x| ≤ 1/2,   ζ(x) = 0 for |x| ≥ 1

Define ζ_k(x) := ζ(x/k); by the distributional convergence we have

lim_{n→∞} ∫_{R^d} ζ_k(x) dµ_n = ∫_{R^d} ζ_k(x) dµ   (2.3.2)

Observe that ζ_k(x) → 1 pointwise for all x ∈ R^d, and the constant function 1 is µ-integrable, so by the Dominated Convergence Theorem we have

lim_{k→∞} ∫_{R^d} ζ_k(x) dµ = ∫_{R^d} 1 dµ = 1   (2.3.3)

Moreover, notice that the left hand side of 2.3.2 is naturally bounded by

lim inf_{n→∞} µ_n(B_k(0)) ≥ lim_{n→∞} ∫_{R^d} ζ_k(x) dµ_n   (2.3.4)

Letting k tend to infinity, and combining 2.3.4 and 2.3.3, we have

lim_{k→∞} lim inf_{n→∞} µ_n(B_k(0)) ≥ lim_{k→∞} lim_{n→∞} ∫_{R^d} ζ_k(x) dµ_n = lim_{k→∞} ∫_{R^d} ζ_k(x) dµ = 1

which implies the tightness of {µ_n}_{n∈N}, in terms of Expression 2.3.1. By Prokhorov's Theorem, each subsequence {µ_{n_k}}_{k∈N} has at least one weakly convergent further subsequence {µ_{n_k(i)}}_{i∈N}, such that µ_{n_k(i)} → ν for some ν ∈ P(R^d). Then we have to show that ν must coincide with µ. For simplicity of notation, we use {µ_{n_k}}_{k∈N} to denote the weakly convergent further subsequence, rather than {µ_{n_k(i)}}_{i∈N}.
Since we want to reach the weak convergence, we have to use functions in C_c^∞(R^d) to approximate functions in C_b(R^d). This proceeds in two steps: for a function f ∈ C_b(R^d), by Lemma 2.25 we can convolve f with the mollifier to obtain a sequence of smooth functions {f_η} such that f_η → f uniformly on every compact subset of R^d. Then, since {µ_n}_{n∈N} is tight, for each ε > 0 we can find a compact set K_ε ⊂ R^d such that µ_n(R^d \ K_ε) < ε for all n ∈ N. Therefore, we can always construct a compactly supported smooth function f_η^ε such that f_η^ε = f_η on K_ε, and we do not care what happens outside K_ε. Obviously, f_η^ε → f pointwise as η → 0 and ε → 0.
Therefore, for f ∈ C_b(R^d), with η > 0 and ε > 0 fixed, we have

|lim_{k→∞} ∫_{R^d} f_η^ε dµ_{n_k} − lim_{k→∞} ∫_{R^d} f dµ_{n_k}|
  ≤ lim_{k→∞} ∫_{R^d} |f_η^ε − f| dµ_{n_k}
  = lim_{k→∞} ∫_{R^d \ K_ε} |f_η^ε − f| dµ_{n_k} + lim_{k→∞} ∫_{K_ε} |f_η^ε − f| dµ_{n_k}
  ≤ lim_{k→∞} ∫_{R^d \ K_ε} |f_η^ε| dµ_{n_k} + lim_{k→∞} ∫_{R^d \ K_ε} ‖f‖_{C_b(R^d)} dµ_{n_k} + lim_{k→∞} ∫_{K_ε} |f_η − f| dµ_{n_k}
  ≤ (‖f_η^ε‖_{C_b(R^d)} + ‖f‖_{C_b(R^d)}) · ε + lim_{k→∞} ∫_{K_ε} |f_η − f| dµ_{n_k}

Letting η → 0, we know that f_η − f → 0 uniformly on K_ε; then also letting ε → 0, we conclude that

lim_{k→∞} ∫_{R^d} f_η^ε dµ_{n_k} → lim_{k→∞} ∫_{R^d} f dµ_{n_k}   (2.3.5)

remembering that f_η^ε ∈ C_c^∞(R^d), f ∈ C_b(R^d), and f_η^ε → f pointwise.

We know that µ_n → µ distributionally and µ_{n_k} → ν weakly, so obviously

∫_{R^d} g dµ = ∫_{R^d} g dν,  ∀g ∈ C_c^∞(R^d)

but according to Theorem 2.15, this is not enough to determine that µ and ν are identical. Thus, we proceed by contradiction.
Assume that µ ≠ ν; then we can find a function f ∈ C_b(R^d) \ C_c^∞(R^d) such that

∫_{R^d} f dµ ≠ ∫_{R^d} f dν

Then we use f_η^ε to approximate f and, respectively, on the left hand side of 2.3.5 we use the distributional convergence and apply the Dominated Convergence Theorem, and on the right hand side the weak convergence; we obtain

lim_{k→∞} ∫_{R^d} f_η^ε dµ_{n_k} = ∫_{R^d} f_η^ε dµ → ∫_{R^d} f dµ   (2.3.6)

and

lim_{k→∞} ∫_{R^d} f dµ_{n_k} = ∫_{R^d} f dν ≠ ∫_{R^d} f dµ   (2.3.7)

By 2.3.5, 2.3.6 and 2.3.7, there is a contradiction, therefore µ = ν.
So far we have shown: µ_n → µ distributionally; {µ_n}_{n∈N} is tight, and each subsequence of {µ_n}_{n∈N} contains a weakly convergent further subsequence whose weak limit is also µ. By Proposition 2.20, we conclude that µ_n → µ weakly. □
Notice that, in Theorem 2.26, the distributional convergence and the weak convergence coincide only when their limit is certainly a probability measure. Let us use the example of Dirac measures again: considering the sequence of probability measures {δ_n}_{n∈N}, we easily see that δ_n → 0 distributionally, but δ_n does not converge to 0 weakly, as n → ∞.

2.4 Bounded Linear Operators on Hilbert Space

Let H be a separable Hilbert space, and let B(H) denote the space of bounded linear operators on H. Moreover, we define the operator norm as

‖T‖ := sup_{x∈H, ‖x‖=1} ‖Tx‖   (2.4.1)

As is well known, B(H) is a Banach space with respect to the operator norm. In this section we will discuss the properties of elements of B(H), but first we shall define some objects appearing in the algebra literature.

2.4.1 Banach Algebra and C*-Algebra

Definition 2.27 (Complex algebra) A complex algebra is a vector space A over the complex field C in which a multiplication is defined that satisfies

x(yz) = (xy)z (x + y)z = xz + yz x(y + z) = xy + xz

and α(xy) = (αx)y = x(αy) for all x, y and z in A and for all scalars α ∈ C.

Definition 2.28 (Banach algebra) If A is a complex algebra, and at the same time A is also a Banach space with respect to a norm k · k that satisfies the multiplicative inequality

kxyk ≤ kxkkyk (2.4.2)

then A is called a Banach algebra.

Notice that a complex or Banach algebra is not necessarily to be commutative.

Definition 2.29 (Involution) A map x 7→ x∗ of a complex algebra A into A is called an involution on A if it has the following four properties, for all x, y ∈ A and α ∈ C:

(x + y)∗ = x∗ + y∗ (αx)∗ = αx∗ (xy)∗ = y∗x∗ x∗∗ = x

By convention, a complex algebra with an involution is called a *-algebra. If the *-algebra also carries a Banach structure satisfying the multiplicative inequality 2.4.2, we may call it a Banach *-algebra, but this structure is not often used. Instead, we obtain a nicer structure by imposing the following rather strong condition 2.4.3, which defines the C*-algebra:

Definition 2.30 (C*-algebra) A Banach algebra A with an involution x 7→ x∗ is said to be a C*-algebra, if it also satisfies

kxx∗k = kxk2 (2.4.3) for all x ∈ A.

16 Notice that from 2.4.3 we can easily know kxk2 = kxx∗k ≤ kxkkx∗k, then kxk ≤ kx∗k. Similarly, there is kx∗k = kx∗x∗∗k ≤ kx∗kkx∗∗k = kx∗kkxk, which implies kx∗k ≤ kxk. Therefore

kxk = kx∗k (2.4.4) in each C*-algebra. It also follows that

kxx∗k = kxkkx∗k (2.4.5)

Conversely, 2.4.4 and 2.4.5 obviously imply 2.4.3.
An element of a C*-algebra A is said to be positive if it can be written in the form a*a for some a ∈ A. A linear operator T : A → B between two C*-algebras is said to be positive if it sends positive elements of A into positive elements of B. Positive operators defined on a C*-algebra have many good properties, but this might not be enough. Hence, one introduces the notion of completely positive operator, whose definition we postpone to Section 4.3.1. For details about the properties of positive and completely positive operators, we refer to [22].

2.4.2 Adjoint Operator Theorem 2.31 If f : H × H → C is sesquilinear and bounded, in the sense that

M := sup_{‖x‖=‖y‖=1} |f(x, y)| < ∞

then there exists a unique S ∈ B(H) which satisfies

f (x, y) = (x, Sy), ∀x, y ∈ H (2.4.6)

Moreover, kSk = M.

Proof. Since | f (x, y)| ≤ Mkxkkyk, the map x 7→ f (x, y) is, for each y ∈ H, a bounded linear functional on H. By Riesz representation theorem we know there exists a unique element Sy ∈ H such that 2.4.6 holds; also, kSyk ≤ Mkyk. It is clear that S : H → H is additive. If α ∈ C, then

hx, S(αy)i = f (x, αy) = α f (x, y) = αhx, Syi = hx, αSyi for all x, y ∈ H. Therefore S is linear, and S ∈ B(H), and kSk ≤ M. On the other hand, we also have

| f (x, y)| = |hx, Syi| ≤ kxkkSyk ≤ kxkkSkkyk which gives the opposite inequality M ≤ kSk. 

17 If T ∈ B(H), then hTx, yi is sesquilinear and bounded, as a consequence of Theorem 2.31, there exists a unique T∗ ∈ B(H) for which

hTx, yi = hx, T∗yi, ∀x, y ∈ H

and we call T∗ the adjoint operator of T.

Theorem 2.32 B(H) is a C*-algebra.

Proof. It is easy to verify that B(H) is a Banach algebra, and the mapping from an operator T ∈ B(H) to its adjoint operator T∗ ∈ B(H) is an involution. Moreover,

kTxk2 = hTx, Txi = hT∗Tx, xi ≤ kT∗Tkkxk2, ∀x ∈ H

so we have kTk2 ≤ kT∗Tk. On the other hand, kTk = kT∗k gives

kT∗Tk ≤ kT∗kkTk = kTk2 hence the equality kT∗Tk = kTk2 holds for all T ∈ B(H).  In fact, we have the Gelfand–Naimark theorem, which tells us an arbitrary C*-algebra is isometrically *-isomorphic to a C*-algebra of bounded operators on some Hilbert space.

2.4.3 Isometry and Partial Isometry Definition 2.33 (Isometry) An operator T ∈ B(H) is said to be an isometry on H, if

kTxk = kxk, ∀x ∈ H

As we can see, an isometry is a linear mapping which preserves the norm; more generally, an isometry could be defined on the mapping between two metric spaces. Also, we have the following characterization:

Proposition 2.34 Suppose T ∈ B(H), the following statements are equivalent:

1. T is an isometry on H.

2. hTx, Tyi = hx, yi holds for all x, y ∈ H.

3.T ∗T = I on H.

18 Proof. From 1 to 2 we need the following identity, which can be easily checked:

⟨x, y⟩ = (‖x + y‖² − ‖x − y‖² + i‖x + iy‖² − i‖x − iy‖²) / 4

Therefore

⟨Tx, Ty⟩ = (‖Tx + Ty‖² − ‖Tx − Ty‖² + i‖Tx + iTy‖² − i‖Tx − iTy‖²) / 4
         = (‖T(x + y)‖² − ‖T(x − y)‖² + i‖T(x + iy)‖² − i‖T(x − iy)‖²) / 4
         = (‖x + y‖² − ‖x − y‖² + i‖x + iy‖² − i‖x − iy‖²) / 4
         = ⟨x, y⟩

holds for all x, y ∈ H. Now starting from 2,

⟨Tx, Ty⟩ = ⟨x, y⟩ ⇒ ⟨(T*T − I)x, y⟩ = 0

for all x, y ∈ H. Just take y = (T*T − I)x; we conclude that T*T = I on the whole of H, which is exactly point 3. By the same ideas, we can go back from 3 to 2 and from 2 to 1, which completes the proof. □
Before introducing the partial isometry, we need to prove two simple but useful lemmas.

Lemma 2.35 If T ∈ B(H), then ker(T*) = Im(T)^⊥, and ker(T) = Im(T*)^⊥.

Proof. Observe that

y ∈ ker(T∗) ⇔ hx, T∗yi = 0, ∀x ∈ H ⇔ hTx, yi = 0, ∀x ∈ H ⇔ y ∈ Im(T)⊥ and the second equality can be demonstrated in the same way. 

Lemma 2.36 Let T ∈ B(H), the following four statements are equivalent: 1.T ∗T is a projector, i.e. it is idempotent.

2.T = TT∗T.

3.T ∗ = T∗TT∗.

4.TT ∗ is a projector.

19 Proof. If P := T∗T is a projector, then define S := TT∗T − T, and observe that

S∗S = (T∗TT∗ − T∗)(TT∗T − T) = P3 − P2 − P2 + P = 0

By the C*-property of B(H), we know that

0 = ‖S*S‖ = ‖S‖²

thus TT*T = T. Taking the conjugate on both sides we get T*TT* = T*, and then multiplying by T on the left we get TT*TT* = TT*, so TT* is also a projector. The proof from 4 to 1 follows the same idea as the proof from 1 to 4. □
Now, we give the definition of partial isometry, which plays an important role in the polar decomposition theorem and in the study of von Neumann algebras.

Definition 2.37 (Partial isometry) An operator T ∈ B(H) is said to be a partial isometry on H if it is an isometry on ker(T)^⊥.

Obviously, if T ∈ B(H) is an isometry on H, it must also be a partial isometry on H; but the converse is not true. For the partial isometry we have the following characterization:

Proposition 2.38 T ∈ B(H) is a partial isometry on H if and only if T∗T is a projector.

Proof. In Lemma 2.36 we stated that T*T is a projector if and only if T* = T*TT*; then we have

T*x = T*TT*x,  ∀x ∈ H

or equivalently

y = T*Ty,  ∀y ∈ Im(T*)

So T*T = I on Im(T*), and Lemma 2.35 tells us Im(T*) = ker(T)^⊥. By Proposition 2.34, this is equivalent to saying that T is an isometry on ker(T)^⊥, which is indeed the definition of a partial isometry on H. □
Then, we consider the question: if T ∈ B(H) is a partial isometry on H, what about T*? In fact, again with the help of Lemma 2.36, we immediately have the following result:

Corollary 2.39 T ∈ B(H) is a partial isometry on H if and only if T∗ is a partial isometry.

One should also notice that, if H is a finite-dimensional Hilbert space, then T ∈ B(H) is an isometry if and only if T* is an isometry, since in the finite-dimensional case T*T = I and TT* = I are equivalent. However, if H is infinite-dimensional, this may fail. For example, consider the Hilbert space l²(N) and the left-shift operator L : l²(N) → l²(N) defined as

L : (x1, x2, x3 ··· ) 7→ (x2, x3, x4 ··· ) (2.4.7)

and its adjoint operator is the so-called right-shift operator R : l2(N) → l2(N), which is similarly defined as

R : (x1, x2, x3 ··· ) 7→ (0, x1, x2, ··· ) (2.4.8)

Obviously L is not an isometry on l²(N), but R is; and they are both partial isometries. At the end of this section, we write down the famous polar decomposition theorem [8], in which the partial isometry appears.

Theorem 2.40 (Polar decomposition theorem) If T ∈ B(H), then T has a factorization

T = U √(T*T)

where U is a partial isometry on H.
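As a small numerical illustration of Proposition 2.38 and Lemma 2.36 (a sketch only; on C^n the truncated shift is a partial isometry but, unlike R on l²(N), it is no longer an isometry), the following checks that R*R and RR* are orthogonal projectors for a finite truncation of the right shift 2.4.8.

import numpy as np

n = 5
R = np.zeros((n, n))
for i in range(1, n):
    R[i, i - 1] = 1.0          # truncated right shift: e_k -> e_{k+1}, last basis vector -> 0

P = R.conj().T @ R             # R*R
Q = R @ R.conj().T             # RR*

# both products are orthogonal projectors (idempotent and self-adjoint)
print(np.allclose(P, P @ P), np.allclose(P, P.conj().T))
print(np.allclose(Q, Q @ Q), np.allclose(Q, Q.conj().T))
print(np.allclose(P, np.eye(n)))   # False: the truncation is not an isometry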

2.4.4 Trace Class Operator

Definition 2.41 (Trace-class operator) An operator A ∈ B(H) is said to be in the trace class if, for some (and hence every) orthonormal basis {e_k}_k of H, the sum of positive terms

‖A‖₁ := Tr|A| := ∑_k ⟨√(A*A) e_k, e_k⟩

is finite.

In this case, the trace of A, which is given by the sum

Tr A := ∑_k ⟨A e_k, e_k⟩

is absolutely convergent and is independent of the choice of the orthonormal basis. The set of trace-class operators on H will be denoted by B₁(H). It has been shown [39] that (B₁(H), ‖·‖₁) is a Banach space, and ‖·‖₁ is sometimes called the nuclear norm or trace norm.

Proposition 2.42 For A ∈ B1(H), we have |Tr A| ≤ Tr |A|.

Proof. By the polar decomposition theorem, there exists a partial isometry U such that A = U√(A*A). Moreover, the trace is invariant under changes of orthonormal basis, so we may choose the basis {e_k}_k consisting of eigenvectors of √(A*A). Observe that, if e_k ∈ ker U, then ‖Ue_k‖ = 0; otherwise we have ‖Ue_k‖ = ‖e_k‖ = 1. Therefore,

|Tr A| = |Tr(U√(A*A))|
       = |∑_k ⟨U√(A*A) e_k, e_k⟩|
       = |∑_k λ_k ⟨U e_k, e_k⟩|
       ≤ ∑_k λ_k ‖e_k‖² = Tr|A|

in which the λ_k are the singular values of A, i.e. the eigenvalues of √(A*A). □

Corollary 2.43 Tr : B1(H) → C is continuous with respect to k · k1.

Proof. Immediately, suppose A, B ∈ B1(H), by Proposition 2.42:

|Tr A − Tr B| = |Tr(A − B)| ≤ Tr |A − B| = kA − Bk1

so Tr is a continuous functional. □
Another fact of interest is that the operator norm ‖·‖ defined in 2.4.1 is always controlled by the nuclear norm ‖·‖₁.

Proposition 2.44 For A ∈ B1(H), we have kAk ≤ kAk1.

Proof. For convenience we compare ‖A‖² and ‖A‖₁², and observe that:

‖A‖² = sup_{‖x‖=1} ‖Ax‖²
     = sup_{‖x‖=1} ⟨A*Ax, x⟩
     = sup_{‖x‖=1} ⟨A*A(∑_k ⟨x, e_k⟩ e_k), x⟩
     = sup_{‖x‖=1} ∑_k ⟨x, e_k⟩ ⟨A*A e_k, x⟩
     = sup_{‖x‖=1} ∑_k ⟨x, e_k⟩ ⟨λ_k² e_k, x⟩
     = sup_{‖x‖=1} ∑_k λ_k² |⟨x, e_k⟩|²
     ≤ ∑_k λ_k² ≤ (∑_k λ_k)² = ‖A‖₁²

in which {e_k}_k is the orthonormal basis consisting of eigenvectors of √(A*A), and the λ_k are the corresponding eigenvalues. □
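In finite dimensions every matrix is trace class, so both inequalities (Propositions 2.42 and 2.44) can be verified directly; a short numpy sketch, using an arbitrary random matrix as the example:

import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((6, 6)) + 1j * rng.standard_normal((6, 6))

singular_values = np.linalg.svd(A, compute_uv=False)   # eigenvalues of sqrt(A*A)
op_norm = singular_values.max()          # ||A||
trace_norm = singular_values.sum()       # ||A||_1 = Tr|A|

print(abs(np.trace(A)) <= trace_norm + 1e-12)   # |Tr A| <= Tr|A|
print(op_norm <= trace_norm + 1e-12)            # ||A|| <= ||A||_1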

2.4.5 Von Neumann Algebra A von Neumann algebra or W*-algebra is a unital *-subalgebra of B(H) that is closed in the weak operator topology. It is a very important object in operator algebra and quantum mechanics. Also, it is a special type of C*-algebra. For understanding the definition, we need to introduce some common topologies in B(H). The strongest one is the norm topology or sometimes we also call it uniform topology, i.e. the topology induced by operator norm 2.4.1. The weaker one is called the strong operator topology:

Definition 2.45 (Strong operator topology) A net {Tα}α ⊂ B(H) is said to be con- vergent to some T ∈ B(H) in the strong operator topology, if

Tαx → Tx, ∀x ∈ H Then, we turn to give the definition of weak operator topology, which is even weaker than the strong operator topology.

Definition 2.46 (Weak operator topology) A net {T_α}_α ⊂ B(H) is said to converge to some T ∈ B(H) in the weak operator topology if

y(T_α x) → y(Tx),  ∀x ∈ H, ∀y ∈ H*   (2.4.9)

23 Equivalently, by Riesz representation theorem, we can also rewrite 2.4.9 as

hTαx, yi → hTx, yi , ∀x, y ∈ H

For example, on the Hilbert space H = l²(N), let R ∈ B(H) be the right shift operator defined as in 2.4.8, and construct the sequence R_n := R^n. It is easy to verify that {R_n}_{n∈N} converges to 0 in the weak operator topology. In fact,

|hRnx, yi − h0x, yi| = |hRnx, yi| = |hx, Lnyi| ≤ kxkkLnyk → 0

where L_n := L^n and L = R* is the left shift operator, as defined in 2.4.7.
Besides, another common topology is the σ-weak topology. It is a well-known result that the predual of B(H) is the space of trace class operators B₁(H), and it generates the weak-* topology on B(H), called the σ-weak topology.

Definition 2.47 (σ-weak operator topology) A net {Tα}α ⊂ B(H) is said to be con- vergent to some T ∈ B(H) in the σ-weak operator topology if

Tr (TαF) → Tr (TF) , ∀F ∈ B1(H)

Now, we can formally set down the definition of the von Neumann algebra. Definition 2.48 (von Neumann algebra) A von Neumann algebra is a unital *-subalgebra of B(H) that is closed in the weak operator topology. As usual, we wish to characterize von Neumann algebra in other ways, but first we should define the commutant:

Definition 2.49 (Commutant) Let A ⊂ B(H), the commutant of A is

A′ := {T ∈ B(H) : AT = TA, ∀A ∈ A}

Then we have the bicommutant theorem given by von Neumann:

Theorem 2.50 (Bicommutant theorem) Let A ⊂ B(H) be a unital *-subalgebra. The following three statements are equivalent: 1. A is closed in weak operator topology, i.e. A is a von Neumann algebra.

2. A is closed in strong operator topology.

3. A = A″.

The proof of bicommutant theorem can be easily found in every relevant text- book. Next, we define some more terms which frequently appear in the literature of von Neumann algebra.

Definition 2.51 (Center) The center of a von Neumann algebra A is the subset A ∩ A′.

Definition 2.52 (Factor) A factor is a von Neumann algebra A whose center is trivial, i.e.

A ∩ A′ = {cI : c ∈ C}

Besides, in a von Neumann algebra we are usually concerned only with orthogonal projections (without special clarification we shall omit "orthogonal"), i.e. operators P ∈ A such that

P = P² = P*

These P are exactly the operators which give an orthogonal projection of H onto some closed subspace. A subspace of the Hilbert space H is said to belong to the von Neumann algebra A if it is the image of some projection in A. This establishes a 1-to-1 correspondence between projections of A and subspaces that belong to A [54]. Now we define the order on projections, which is fundamental for the classification of factors, with the help of the aforementioned 1-to-1 correspondence:

Definition 2.53 (Order on projections) Two subspaces E, F belonging to A are called Murray–von Neumann equivalent if there is a partial isometry u ∈ A mapping E isomorphically onto F. Correspondingly, if we define p := P_E and q := P_F, the Murray–von Neumann equivalence between the two projections is denoted p ∼ q. Moreover, the subspaces in A are partially ordered by inclusion, and this induces a partial order ≼ of projections; that is, we write p ≼ q if E ⊆ F.

By Proposition 2.38, we can have another characterization of the Murray–von Neumann equivalence, that is: we say p ∼ q if there exists a partial isometry u ∈ A with uu∗ = p and u∗u = q. Rigorously, we have the following theorem [40]:

Theorem 2.54 The relation ∼ is an equivalence relation, and the relation ≼ induces a partial order on the equivalence classes of projections.

One should notice that ≼ is not a total order on projections but, potentially, we can construct a total order on the equivalence classes of projections generated by ∼. We write q ≺ p to mean q ≼ p but q ≠ p. Then we need to define some special projections, in order to proceed with the classification of factors.

Definition 2.55 (Minimal projection) A projection p in a von Neumann algebra A is said to be minimal if there is no other projection q with 0 ≺ q ≺ p.

Definition 2.56 (Finite projection) A projection p in a von Neumann algebra A is called infinite if p ∼ q for some q ≺ p, Otherwise p is called finite.

Then we can state the type classification of factors:

25 Definition 2.57 (Type classification of factors) Suppose we have a factor A, then:

1. A is said to be of type I if there is a minimal projection. It is customary to call the bounded operators on a Hilbert space of finite dimension n a factor of type I_n, and the bounded operators on a separable infinite-dimensional Hilbert space a factor of type I_∞.

2. A is said to be of type II if there is no minimal projection but there are non-zero finite projections. Moreover, if the identity operator in A is finite, the factor is said to be of type II_1; otherwise, it is said to be of type II_∞.

3. A is said to be of type III if A does not contain any nonzero finite projections at all.

We also concern the linear functional defined on a von Neumann algebra:

Definition 2.58 (Tracial state) A tracial state φ on a von Neumann algebra A is a linear functional from the set of positive elements to [0, +∞], such that φ(a∗a) = φ(aa∗) for all a ∈ A and φ(1) = 1.

Murray and von Neumann proved the fundamental result that a factor of type II1 has a unique finite tracial state, and the set of traces of projections is [0, 1] [1]. This result will be useful in Section 4.3.2.

Chapter 3

Random Matrix Theory

In classical probability theory, a random matrix can be regarded either as a matrix-valued random variable, or as a matrix with random variable entries; these two points of view are equivalent. In the non-commutative theory, a random matrix is an element of the non-commutative probability space M_n(C) ⊗ L^{∞−} (in which L^{∞−} is defined in 3.6.1), equipped with the trace τ_n ⊗ E. In particular, we focus on the distribution of eigenvalues of random matrices, given some specific probabilistic assumptions on their entries. Firstly, we give the definition of the empirical spectral distribution (ESD), which is a random distribution, in Section 3.1. Then, in Section 3.2 we define the convergence of random distributions, based on what we have introduced in the previous chapter. The Stieltjes transform will be introduced in Section 3.3, with which we can even derive the limit spectral distribution of some random matrices. We will see in Section 3.4 that, under some special structure, the empirical spectral distributions tend to a limit spectral distribution as the size of the matrices grows. Besides, the convergence rates of empirical spectral distributions will be considered in Section 3.5. Lastly, we explain what the space M_n(C) ⊗ L^{∞−} is in Section 3.6, where we also introduce some basics of free probability and discover that there is a natural connection between free probability and random matrices.

3.1 Empirical Spectral Distribution

At the beginning, we discard the randomness and define the empirical spectral distribution of a deterministic matrix A. Unless otherwise specified we will suppose our random matrices to be self-adjoint, so that their spectra are always contained in the real line.

Definition 3.1 (Empirical spectral distribution) For a Hermitian matrix A ∈ CN×N, its empirical spectral distribution of the eigenvalues µA : B(R) → [0, 1] can be defined

as

µ_A := (1/N) ∑_{i=1}^N δ_{λ_i(A)}

Also we have the same definition but in the language of cumulative distribu- tion function.

Definition 3.2 (Empirical spectral distribution) For a Hermitian matrix A ∈ CN×N, its empirical spectral distribution of the eigenvalues FA : R → [0, 1] is defined as

F_A(x) := (1/N) ∑_{i=1}^N 1_{{λ_i(A) ≤ x}}

where λi(A) is the i-th eigenvalue of A.

Obviously, F_A is the distribution function induced by µ_A. If, from now on, we assume A is a random matrix, then F_A is no longer deterministic, and can be regarded as a random variable taking values in the space of probability distribution functions on R. Similarly, µ_A becomes a random measure, i.e. a random variable taking values in the space of probability measures on (R, B(R)).
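Computationally, the ESD of a given Hermitian matrix is just the normalized counting measure of its eigenvalues. The sketch below (an illustration only) builds the ESD of one realization of a Hermitian matrix with i.i.d. Gaussian entries normalized by √N (a Wigner-type matrix, whose limit behaviour is the subject of Section 3.4), and evaluates F_A at a few points.

import numpy as np

rng = np.random.default_rng(4)
N = 1000

# one realization of a Hermitian random matrix, normalized by sqrt(N)
G = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
A = (G + G.conj().T) / (2 * np.sqrt(N))

eigenvalues = np.linalg.eigvalsh(A)      # real, since A is Hermitian

def F_A(x):
    """Empirical spectral distribution F_A(x) = (1/N) #{i : lambda_i <= x}."""
    return np.mean(eigenvalues <= x)

print([F_A(x) for x in (-2.0, -1.0, 0.0, 1.0, 2.0)])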

3.2 Convergence of Random Distributions

We said at the beginning that, under some specific assumptions, the empirical spectral distribution of a random matrix converges in some sense to a deterministic distribution, which is called the limit spectral distribution. Therefore, we have to rigorously define the different types of convergence of random distributions.

3.2.1 From Deterministic Distribution to Random Distribution Now, we start to define the random distribution. Again, suppose S is a metric space, B(S) is the Borel σ-algebra on S, P(S) is the set of probability measures on (S, B(S)); and (Ω, A, P) is a probability space. We define the random probability measure in the following way:

Definition 3.3 (Random probability measure) A random probability measure is a map µ : Ω → P(S) such that ω 7→ µ(ω)(E) is measurable, for all E ∈ B(S).

Especially, when S = R, we can define the corresponding random cumulative distribution function.

28 Definition 3.4 (Random distribution function) A random distribution function Fµ(ω)(x) can be generated from a random probability measure µ(ω), by

F_{µ(ω)}(x) := µ(ω)((−∞, x])

Sometimes we may also be interested in the expectation of a random probability measure. If here we assume S is also locally compact, then with the help of the Riesz–Markov–Kakutani Representation Theorem we can give the following definition.

Definition 3.5 (Expectation of random probability measure) Let µ(ω) be a random probability measure; then its expectation E[µ(ω)] is defined by duality, that is, for all f ∈ C_0(R)

∫_R f dE[µ(ω)] := E[∫_R f dµ(ω)]

3.2.2 General Facts on Convergence of Random Distributions

In this section we mainly introduce the results on convergence of random probability measures given by Berti and Pratelli [25], which will be extremely useful later.

Theorem 3.6 Suppose {µ_n}_{n∈N} is a sequence of random probability measures and µ is a random probability measure; they are all measures on (S, B(S)) and defined on the probability space (Ω, A, P). If S is a Radon space, then the following two statements are equivalent:

1. µ_n(ω) → µ(ω) weakly for almost all ω ∈ Ω.

2. ∫_S f dµ_n(ω) → ∫_S f dµ(ω) almost surely for all f ∈ C_b(S).

According to Definition 2.14, statement 1 means: for almost all ω ∈ Ω, ∫_S f dµ_n(ω) → ∫_S f dµ(ω) for all f ∈ C_b(S); then we can easily interchange the order to get statement 2. But the converse is not obvious, since if we fix f ∈ C_b(S) and write that ∫_S f dµ_n(ω) → ∫_S f dµ(ω) on a set Ω_f ⊂ Ω with probability one, we have to show that the intersection ∩_{f∈C_b(S)} Ω_f still has probability one. As a corollary to Theorem 3.6, we have

Corollary 3.7 With the same setting, if S is a Radon space, then the following two statements are equivalent:

1. For each subsequence of µ_n, denoted µ_{n′}, there exists a subsequence of µ_{n′}, denoted µ_{n″}, such that µ_{n″}(ω) → µ(ω) weakly for almost all ω ∈ Ω.

2. ∫_S f dµ_n(ω) → ∫_S f dµ(ω) in probability for all f ∈ C_b(S), i.e. for any ε > 0 and any δ > 0, there exists N = N(δ) such that

P({ω : |∫_S f dµ_n(ω) − ∫_S f dµ(ω)| > ε}) < δ,  ∀n ≥ N, ∀f ∈ C_b(S)

3.2.3 Common Types of Convergence Used in RMT

In the literature of Random Matrix Theory, when we say the empirical spectral distributions tend to the limit spectral distribution, the meaning of "tend" varies in different situations. Also, remember that the term "distribution" indicates either a probability measure or a distribution function, which brings more confusion. Here we list some types of convergence which are widely used in Random Matrix Theory, and then reveal their relationships. For simplicity of notation, we will sometimes not write ω explicitly.
First, we introduce the almost sure convergence of random distribution functions, which is directly generalized from Theorem 2.17; that is, we do nothing but introduce randomness into the weak convergence of probability measures. One will find that the almost sure convergence of random probability measures just means the sequence "converges weakly, almost surely", or "converges in the weak topology almost surely"; and we must be careful about the order of the statement.

Definition 3.8 (Almost sure convergence of random distribution functions) We say a sequence of random distribution functions F_n → F almost surely, where F is a random distribution function on R, if, almost surely, F_n(x) → F(x) for all x ∈ R at which F is continuous.

Besides, we can also define the almost sure convergence of random distribu- tions in terms of random probability measures. So we introduce the definition used in [30].

Definition 3.9 (Almost sure convergence of random probability measures) We say a sequence of random probability measures µ_n → µ almost surely, where µ is a random probability measure on (R, B(R)), if for every test function f ∈ C_b(R) we have

∫_R f dµ_n → ∫_R f dµ

almost surely.

Now, we show Definition 3.8 and 3.9 are equivalent, which completes the def- inition of almost sure convergence of random probability distributions.

Proposition 3.10 Suppose there is a sequence of random probability measures {µn}n∈N and a random probability measure µ, and they are all measures on (R, B(R)) and defined on probability space (Ω, A, P). Let Fµn denote the corresponding random distribution function of µn for all n ∈ N, and let Fµ denote the corresponding random distribution function of µ, obtained via Definition 3.4. Then, µn → µ almost surely if and only if

Fµn → Fµ almost surely.

30 R R Proof. µn(ω) → µ(ω) almost surely means R f dµn(ω) → R f dµ(ω) almost surely for all f ∈ Cb(R), and notice the measures are defined on measurable space (R, B(R)), where R is obviously Radon. Then by Theorem 3.6, µn(ω) → µ(ω) almost surely is equivalent to µn(ω) → µ(ω) weakly for almost all ω ∈ Ω.

Next, by Theorem 2.17, it is also equivalent to say Fµn (ω)(x) → Fµ(ω)(x) for all x ∈ R at which Fµ(ω) is continuous, for almost all ω ∈ Ω, this is exactly

Fµn (ω) → Fµ(ω) almost surely.  After having defined the almost sure convergence of random probability dis- tribution, we shall loose the condition, that is, we ask only the convergence in probability. But recall that, the definition of convergence in probability of ran- dom variables depends on the metric on the space where the random variables take the value. For the random probability measures, we can still utilize the met- ric on R through the weak convergence, like in Definition 3.9, and then we have the Definition 3.11, which is used in [30]. Definition 3.11 (Convergence in probability of random probability measures) We say a sequence of random probability measures µn → µ in probability, where µ is a random probability measure on (R, B(R)), if for all test function f ∈ Cb(R) we have Z Z f dµn → f dµ R R in probability. More explicitly, that is to say: for any e > 0,   Z Z lim P ω : f dµn(ω) − f dµ(ω) > e = 0, ∀ f ∈ Cb(R) n→∞ R R

Notice that if µn(ω) → µ(ω) almost surely, as in Definition 3.9, then µn(ω) → µ(ω) in probability, as in Definition 3.11, which is trivial. Unfortunately, when it turns to random distribution function, which can be regarded as the random variable taking values from probability distribution function on R, there is not a nice norm at our fingertips to loose Definition 3.8. Thus, we just introduce the definition of convergence in probability of random distribution functions used in [7], which has exchanged the order of using Ω and R in Definition 3.8, as following: Definition 3.12 (Convergence in probability of random distribution functions) We say a sequence of random distribution functions Fn → F in probability, where F is a random distribution function on R, if for any x ∈ R,Fn(x) → F(x) convergences in probability. More clearly, it is equivalent to say: for any x ∈ R, any e > 0,

lim P ({ω : |Fn(ω)(x) − F(ω)(x)| > e}) = 0 n→∞ This Definition 3.12 is useful particularly when the limit random distribution function F(ω)(x) has good property. For example, if F(ω)(x) is continuous on R for all ω ∈ Ω, then we can construct a bridge from Definition 3.8 to Definition 3.12.

31 Proposition 3.13 If a sequence of random distribution functions Fn → F almost surely, where F is a random distribution function on R, and F is continuous on R for all ω ∈ Ω, then Fn → F converges in probability.

Proof. From the assumptions we know Fn(ω)(x) → F(ω)(x) converges for all x ∈ R, for almost all ω ∈ Ω. Exchanging the order we know Fn(ω)(x) → F(ω)(x) converges for almost all ω ∈ Ω for all x ∈ R, therefore Fn(ω) → Fn(ω)(x) in probability for all x ∈ R, as in Definition 3.12. 

3.3 Stieltjes Transform

As we said at the beginning of Section 3.2, the empirical spectral distribution of some random matrices will tend, in sense we introduced above, to a limit spectral distribution. But how do we know the from of the limit special distribution? The super-important tool is the Stieltjes transform, which will be introduced later.

3.3.1 Definition and Basic Properties of Stieltjes Transform Definition 3.14 (Stieltjes transform) If µ is a probability measure on (R, B(R)). then its Stieltjes transform is defined by

Z 1 sµ(z) = dµ (3.3.1) R x − z where z ∈ D := {z ∈ C : Im z > 0}.

More explicitly, if let z = u + iv, we can write 3.3.1 as Z x − u Z v sµ(u + iv) = µ(dx) + i · µ(dx) (3.3.2) R (x − u)2 + v2 R (x − u)2 + v2 and observe it is well-defined upper and lower half-planes in the complex plane, since if v = 0, then in 3.3.2 the integrand of the real part is ill at x = u. But we restrict our attention only to z in the upper half-plane D. Then, we want to show that a probability measure µ can be recovered by the limit behavior of its Stieltjes transform, which shows a one-to-one correspon- dence between the probability measure and their Stieltjes transforms.

1 Theorem 3.15 (Inverse Stieltjes transform) Define fe(λ) := π Im sµ(λ + ie), then fe is a density function of some probability measure νe on (R, B(R)), and νe → µ weakly as e → 0+.

32 Proof. Observe that 1 1 Z e fe(λ) = Im sµ(λ + ie) = µ(dx) π π R (x − λ)2 + e2

is the probability density of random variable X + Ce, where X is distributed ac- cording to µ, Ce is Cauchy distributed with parameter e, and X ⊥ Ce. We also 7→ e recall that the density function of Cauchy distribution is x π(x2+e2) . Now, we study the cumulative distribution function of Ye := X + Ce, by integrating with respect to λ we have

Z y Z 1 e  ( ) = ( ) Fe y 2 2 µ dx dλ (3.3.3) −∞ R π (x − λ) + e Notice that the integrand in 3.3.3 is always equal or greater than 0, and it is also continuous then measurable. Moreover, µ and the Lebesgue measure are both σ-finite. Therefore we can apply the Fubini-Tonelli Theorem and get Z Z y e ( ) = ( ) Fe y 2 2 dλ µ dx R −∞ π [(λ − x) + e ] Z  1 y − x 1 = arctan + µ(dx) (3.3.4) R π e 2

1 y−x 1 Define the integrand in 3.3.4 as he(x) := π arctan e + 2 . Obviously he → h pointwisely, where 1 h(x) = 1 (x) + 1 (x) 2 {x=y} {x

According to Theorem 2.17, νe → µ weakly.  Next, we introduce a theorem which permits us to work with the convergence of random probability measures after the Stieltjes transform.

Theorem 3.16 Let {µn}n∈N be probability measures on (R, B(R)), and µ be a sub- probability measure on (R, B(R)). Then, µn converges to µ in the vague topology if and

only if sµn (z) converges to sµ(x) for all z ∈ D. R R Proof. If µn → µ vaguely, i.e. R f dµn → R f dµ for all f ∈ C0(R). Observe that, when z ∈ D is fixed, the integrands of the real part and imaginary part in 3.3.2

both vanish as |x| → ∞, then sµn (z) → sµ(z).

33 Conversely, suppose we have sµn (z) → sµ(x) for all z ∈ D. By Helly’s Se- lection Theory, for every subsequence {µnk }k∈N we can always extract a further { } subsequence µnk(i) i∈N, which converges vaguely to some sub-probability mea- sure ν. By the sufficiency that we have proved, sµ (z) → sν(x) for all z ∈ D. nk(i) But Theorem 3.15 tells us the Stieltjes transform uniquely determines a measure, thus µ = ν. As the same idea in the proof of Proposition 2.20, we conclude that µn → µ vaguely.  Now, we turn to compute the Stieltjes transform of the famous semi-circular law, which was first posed by Wigner [2] in 1958.

Definition 3.17 (semi-circular law) A probabilistic measure µsc is with semi-circular law, if its probability density psc is as 1 p p (x) = 4 − x2 · 1 sc 2π {|x|≤2} Immediately, we have

Z 1 ssc(z) = · psc(x) dx (3.3.5) R x − z

y By using twice the change of variable x = 2 cos y, ζ = ei , and the periodicity of triangular functions, we can transform 3.3.5 into the integral over the unit circle in the complex plane:

Z p 1 1 2 1 ssc(z) = · 4 − x · {|x|≤2} dx R x − z 2π 1 Z 2π 1 = sin2 y dy π 0 2 cos y − z 2 1 Z 2π 1 eiy − e−iy  = iy −iy dy π 0 e + e − z 2i 2 1 I ζ2 − 1 = − 2 2 dζ (3.3.6) 4πi |ζ|=1 ζ (ζ − zζ + 1)

Then, we attend to use the Residue Theorem; observe the integrand in 3.3.6 has three poles: √ √ z + z2 − 4 z − z2 − 4 ζ = 0, ζ = , ζ = (3.3.7) 0 1 2 2 2 Recall√ that, for an arbitrary z ∈ C, the real part and imaginary parts of its square root z can be written as

√ 1 q Re z = √ sign (Im z) |z| + Re z (3.3.8) 2

34 and √ 1 q Im z = √ |z| − Re z 2 √ Put z2 − 4 into 3.3.8 we get p  1 q Re z2 − 4 = √ sign (2 Re z Im z) |z2 − 4| + Re (z2 − 4) 2 √ thus we see that z2 − 4 and the real part of z have the same sign, since in Stieltjes transform we consider only z ∈ C such that Im z > 0. Therefore, we conclude that |ζ1| > |ζ2|. Moreover, ζ1ζ2 = 1, then we must have |ζ1| > 1 and |ζ2| < 1. On the other hand, the residues are:

2 4ζ ζ2 − 1 ζ2 − zζ + 1 − ζ2 − 1 (2ζ − z) = = res ζ0 lim 2 z ζ→ζ0 (ζ2 − zζ + 1) 2 ζ2 − 1 p = = − 2 − res ζ2 lim 2 z 4 ζ→ζ2 ζ (ζ − ζ1) By the Residue Theorem we obtain the final result: the Stieltjes transform of semi- circular law is 1  p  s (z) = − z − z2 − 4 (3.3.9) sc 2

3.3.2 Derivation of semi-circular Law Using Stieltjes Transform In this section, we introduce the procedures, without detail, of the derivation of semi-circular law by the Stieltjes Transform. We will show that, even without the prior knowledge about the semi-circular law, we can still discover it. For the rigorous proof please refer to [37] or [31]. In Definition 3.2, we defined the empirical spectral distribution of a Hermitian matrix A ∈ CN×N. Now, we assume the its upper-triangular entries are inde- pendent and identically distributed complex-valued random variables with zero mean and unit variance, and the diagonal entries are independent and identically distributed real-valued random variables. We shall work with normalized matrix √1 A , whose corresponding empiri- N N cal special distribution is

1 N µN := µ √1 = δ  1  AN ∑ λi √ AN N N i=1 N and its Stieltjes transform is

− Z 1 1  1  1 sN(z) := sµN (z) = dµN = tr √ AN − zIN R x − z N N

35 and taking the expectation we get " − # 1 N  1  1 EsN(z) = E √ AN − zIN N ∑ N i=1 ii One can follow these three steps to proceed the derivation of the semi-circular law [31]:

1. For any z ∈ D = {z ∈ C : Im z > 0}, sN(z) → EsN(z) almost surely.

2. For any fixed z ∈ D, EsN(z) converges to some s(z). In particular, we will find a recursion of EsN(z), pass it into limit we will get the equation 1 s(z) = − z + s(z)

whose solution is exactly s(z) = ssc(z), the Stieltjes transform of semi- circular law.

3. sn(z) → s(z) almost surely for all z ∈ D.

Finally, by Theorem 3.16 we conclude that µn → µsc in vague topology almost surely. Further, notice µsc is indeed a probability measure on (R, B(R)), then thanks to Theorem 2.26, we know µn → µsc almost surely.

3.4 Asymptotic Results in Random Matrix Theory

We have seen that, more or less, when the entries of a random matrix meet some specific assumptions, its empirical specific distribution tends to some limit. For example, in the last section, we assume that a Hermitian matrix has i.i.d upper- triangular entries with zero mean and unit variance, and has i.i.d. diagonal en- tries. In the section, we give formal names to some special random matrices, and introduce their corresponding limit spectral distributions.

3.4.1 Wigner Matrices and semi-circular Law In fact, the random matrix appeared in the last section is called Wigner matrices. Definition 3.18 (Wigner matrix) Consider a family of independent, zero-mean, real or complex valued random variables Z , independent from a family {Y } of zero- ij 1≤i

36 As we can see, in Definition 3.18 there is no requirement of having identically distributed entries. Moreover, the following theorems tells us, no matter if the Wigner matrix has the identically distributed entries, under right setting, the its empirical spectral distribution will always tend to the semi-circular law, which was introduced in the Definition 3.17.

Theorem 3.19 (Semi-circular law in i.i.d case [31]) Suppose that AN is an N × N Wigner matrix whose diagonal entries are i.i.d. real random variables with zero mean and those above the diagonal are i.i.d. complex random variables with zero mean and unit variance. Then, the empirical spectral distribution of W := √1 A tends to the N N N semicircular law almost surely. Theorem 3.20 (Semi-circular law in non-i.i.d case [31]) Suppose that W := √1 A N N N is a Wigner matrix and the entries above or on the diagonal of WN are independent but may be dependent on N and may not necessarily be identically distributed. Assume that all the entries of AN are of zero mean and unit variance and satisfy the condition that, for any constant η > 0

1 N N  2 N 1n √ o lim E Ajk · N = 0 N→∞ 2 ∑ ∑ A ≥η N N j=1 k=1 jk

N where Ajk is the entry of AN on j-th row and k-th column. Then, the empirical spectral distribution converges to the semicircular law almost surely.

3.4.2 Wishart Matrices and Marchenko-Pastur Distribution Simply, we call the random matrix W = AA† a Wishart matrix, where A is a N × K random matrix with independent entries. Sometimes we explicitly write ANK to empathize the size of A. Theorem 3.21 (Marchenko-Pastur distribution in i.i.d. case[31]) Consider an N × K matrix A whose entries are independent and identically distributed complex random variables with mean 0 and variance σ2. As N, K → ∞ with N/K → β > 0, the em- 1 † pirical spectral distribution of K AA converges almost surely to a deterministic limit distribution Fβ, which is called Marchenko-Pastur distribution. In particular, If 0 < β ≤ 1,Fβ has density q   bβ − x x − aβ f (x) = (3.4.1) β 2πxβσ2

2 p 2 2 p 2 on interval (aβ, bβ), where aβ = σ 1 − β and bβ = σ 1 + β ; and it has probability density 0 outside this interval. If β > 1,Fβ is a mixed distribution, it has probability density as 3.4.1 on (aβ, bβ), 1 and has probability mass 1 − β at x = 0, and has zero mass or density else anywhere.

37 A trick of transform might be useful in the future chapters, that is, we can define 1 B := √ A K immediately 1 BB† = AA† K Then according to the Theorem 3.21, the limit spectral distribution of BB† is still 3.4.1. In this way, we can equivalently turn to study the spectral distribution of BB†, where B has i.i.d. entries with mean 0 and variance σ2/K. Especially, when 2 σ = 1, we call Fy standard Marchenko-Pastur distribution. Marchenko-Pastur distribution can be used to study the asymptotic property of the eigenvalues of the sample covariance matrices in Statistics, and it can also describe the asymptotic behavior of singular values of large rectangular random matrices. Similarly, we also have a version for Wishart matrix with non identi- cally distributed entries. Theorem 3.22 (Marchenko-Pastur distribution in non-i.i.d. case[31]) Suppose that the entries of A are independent complex variables with a common mean µ and variance σ2. Assume that N/K → β > 0 and that, for any η > 0, 1 N K  2 NK 1n √ o lim E Ajk · NK = 0 K→∞ 2 ∑ ∑ A ≥η K η NK j=1 k=1 jk

1 † Then the empirical spectral distribution of K AA tends to the Marchenko-Pastur law almost surely, with ratio index β and scale index σ2.

3.4.3 Ginibre Matrices and Circular Law In this section we have to throw away some conventional settings introduced before, since now we would like to deal with the case where random matrices are no longer Hermitian, and consequently their spectrum will be on the complex plane instead of the real line. Suppose that XN is an N × N matrix with independent and identically dis- tributed entries, which are of zero mean and unit variance, and we also call it Ginebre ensemble. Now if λ , λ , ··· , λ are eigenvalues of √1 X, then we can 1 2 n N define the two-dimensional empirical distribution by 1 µ (x, y) = #{k ≤ N : Re(λ ) ≤ x, Im(λ ) ≤ y} N N k k One should notice that the techniques dealing with Hermitian matrices wil fail in the non-Hermitian case, like truncation method and moment method. More- over, the Stieltjes transform will become pretty hard. For this reason, the con- jecture that the limit spectral distribution of Ginibre matrices is the circular law

38 had not been proven for decades. Now, we are going to give the newest result [28]. It requires the (2 + e)-th moment to be finite, for some e > 0. However, the problem under the only condition of finite second moment is still open.

Theorem 3.23 (Circular law) Suppose X is a Ginibre ensemble, and each its entry is with finite (2 + e)-th moment, then the empirical spectral distribution of √1 X con- N N verges almost surely to the circular law, which is the uniform distribution on the unit disk of the complex plane.

3.4.4 ESD of Another Important Class of Random Matrices Naturally we want to study the empirical spectral distribution of the matrices with a more general structure. For example, we can generalize XX† to XTX†, or even to A + XTX†. Of course it should be done under proper assumptions on T and A. In fact, we have the following theorem [14]:

Theorem 3.24 Assume that

1. For all N ∈ N, we have the matrix   1 N XN := X N ij 1≤i≤N 1≤j≤K

N in which Xij ’s are complex-valued random variables, and they are identically dis- tributed for all N ∈ N, independent across i and j for each N ∈ N. Moreover,

2 1 1 E X11 − EX11 = 1

2. When N → ∞, there are K → ∞ and K/N → β.

N N 3.T N = diag τ1 , ··· , τK is with entries of real-valued random variable, and its ESD converges almost surely to a probability distribution function H as N → ∞.

† 4.B N = AN + XN TN XN, where AN is N × N Hermitian random matrix, for which FAN converges to Λ in vague topology almost surely. Notice that here Λ is a (possibly defective) deterministic distribution function.

5.X N,TN and AN are independent.

Then, FBN → F in vague topology almost surely as N → ∞, where F is a deterministic probability distribution function, whose Stieltjes transform is

 Z x  sF(z) = sΛ z − β H (dx) (3.4.2) R 1 + x · SF(z)

39 From 3.4.2 we can directly know what will happen if the matrix A disappears. Observe that this is equivalent to adding a zero matrix AN for each N ∈ N, and obviously AN has the eigenvalue 0 with multiplicity N, therefore here Λ is exactly the Heaviside function. By definition of Stieltjes transform we know Z 1 1 sΛ = H(dx) = − R x − z x Then we have  Z x −1 sF(z) = − z − β H (dx) (3.4.3) R 1 + x · SF(z) † where F now is the limit spectral distribution of matrices XN TN XN.

3.5 Convergence Rates of ESD

When we want to apply the RMT in engineering, the convergence rates of the empirical spectral distribution must be also considered, since we do not want to waste our limited computation capacity. In this section we will introduce the con- vergence rates of empirical spectral distribution of Wigner matrices, and Wishart matrices, respectively. But first, when talking about the convergence, we must be able to measure the similarity between the objects on which we want to define the convergence. In Section 3.2.3 we mentioned that there is no readily available norm in the space of distribution functions on (R, B(R)), since this space is without algebraically lin- ear structure. Intuitively we will consider the metric kF − Gk := supx∈R |F(x) − G(x)| (it is inherited from the norm of the space of bounded continuous functions on R), where F and G are distribution functions. In fact, it will turn out to be more effi- cient to use the Stieltjes transform of the distribution functions to bound kF − Gk.

3.5.1 In Cases of Wigner Matrices and Wishart Matrices For a Wigner matrix W = √1 A , suppose there are N N N

N N N 1. EAij = 0, Aij = Aji , for all 1 ≤ i ≤ j ≤ N.

 2 2 2. E AN = 1 for all 1 ≤ i < j ≤ N, and E AN = σ2 for all 1 ≤ i ≤ N. ij ii

 6 3 3. sup sup E AN < ∞ and sup sup E AN < ∞. n∈N 1≤i≤j≤N ij n∈N 1≤i≤N ii Under above assumptions, and heavily relying on the usage of Stieltjes trans- form, we have the result [11] as

40 Theorem 3.25 Let FN denote the empirical distribution functions of WN, and let F de- note the semi-circular law, then

 − 1  kEFN − Fk = O N 2 (3.5.1)

The term EFN is as in Definition 3.5, and 3.5.1 means, there exists C > 0 and N0 ∈ N0 such that, for all N > N0, we have

− 1 kEFN − Fk < CN 2

For a Wishart matrix W = AA† ∈ CN×N, where A is of size N × K, if the following assumptions holds

1. All the entries of ANK are independent.

 2 2. EANK = 0 and E ANK = 1 for all 1 ≤ i ≤ N, 1 ≤ j ≤ K. ij ij

 6 3. sup sup E ANK < ∞. n∈N i,j ij then we have the result [12] as:

+ Theorem 3.26 Suppose that as N, K → ∞,N/K → y ∈ R . Let FN denote the 1 † N×N empirical distribution functions of K AA ∈ C , and let Fy denote the Marchenko- Pastur distribution defined as 3.4.1 then   − 1 −1 − 1 O K 2 a a > K 3 EF − F = (3.5.2) N yN  − 1  O K 6 otherwise

√ 2 where yN = N/K and a := 1 − y .

Notice that, if we know nothing about the convergence rate of |y − y|, it N is impossible to establish any rate for the convergence of EFN − Fy , since the convergence of |yN − y| could be arbitrarily slow. Conversely, if we know the convergence rate of |y − y|, then from 3.5.2 we can easily derive a convergence N rate for EFN − Fy . For example, while doing the simulation, we can control the convergence rate of |y − y|, and we could even set y ≡ y, which makes N N EFN − FyN degenerate to EFN − Fy .

41 3.5.2 Simulation of Convergence Rate of ESD

One can observe that, once having fixed ω ∈ Ω, the value of kFN(ω) − Fk can be numerically obtained. In fact, FN(ω) is a stepwise function, which is a constant on interval [λi(WN), λi+1(WN)) for all 1 ≤ i ≤ N − 1. Moreover, FN(ω) and F are both non-decreasing, right-continuous, and with property limx→−∞ FN(ω)(x) = limx→−∞ F(x) = 0 and limx→+∞ FN(ω)(x) = limx→+∞ F(x) = 1. It is obvious that on each interval [λ (W ), λ + (W )), sup kF (ω)(x) − F(x)k must i N i 1 N x∈[λi,λi+1) N appear on the boundary of the interval, i.e. we have

sup kFN(ω)(x) − F(x)k = max kFN(ω)(x) − F(x)k x∈{λ ,λ + } x∈[λi,λi+1) i i 1

Specially, the supremum of the difference on interval (−∞, λ1) must occur at x = λ1, on interval [λN, +∞) at x = λN. In this way, we can obtain the supre- mum of the difference by selecting the maximal difference among the differences computed on each splited interval. This procedure takes 2N times computations.

Figure 3.5.1: Convergence rate of empirical spectral distribution for Wigner matrices

‖N 2

2

F N F

2 2 N

Then we can do the simulation of the convergence rate of the empirical spec- tral distribution for the standard Gaussian Wigner matrices, the result is shown in Figure 3.5.1, in which the red line is obtained by fitting the simulation data, ac- cording to Expression 3.5.1. Although, Expression 3.5.1 only gives a lower bound of the convergence rate, which means the convergence rate could be faster. How- ever, according to the simulation result, we can guess that the true convergence rate will not be far from the given bound.

42 3.6 Connections with Free Probability

Around 1985, for solving the free group factors isomorphism problem, Voiculescu initiated the theory of free probability, in which the random variables are no more necessarily commutative. In fact, the probability space will not be constructed with the help of σ-algebra and measure, instead, we use the algebraic tools. We will see, the random matrix theory is strongly related to free probability, since random matrices are naturally non-commutative random variables. Additionally, in the so-called free central limit theorem in the theory of free probability, which is the analogue of central limit theorem in the classical probability theorem, the semi-circular law will take over the role of normal distribution; this is a striking result. We assume the readers have the basic knowledge on algebra structures and category theory.

3.6.1 Non-commutative Probability Space and Freeness We shall firstly introduce the definition of algebraic probability space.

Definition 3.27 (Algebraic probability space) Let A be a unital algebra, and φ : A → C be a linear functional such that φ(e) = 1, where e is the identity in A. Then, we say (A, φ) is an algebraic probability space.

Definition 3.28 (Random variable and distribution) An element of a ∈ A is called a random variable in (A, φ). The distribution of a is the linear functional µa on the algebra of complex polynomials in one variable, denoted as C [X], defined as

µa(P) = φ (P(a)) , ∀P ∈ C[X]

(The definition of joint distribution of random variables can be given in a similar way, but we will not give the detail here.) Moreover, we may limit our work on particular algebras, like *-algebras, C*- algebras and von Neumann algebras, or add more assumptions on the linear functional φ, for obtaining better properties. Usually, we have the following com- mon settings on φ:

Definition 3.29 (State) Let A be a unital *-algebra, the linear functional φ : A → C is said to be a state, if φ(e) = 1 and φ(a∗a) ≥ 0 for all a ∈ A. Especially, we say state φ is tracial, if φ(ab) = φ(ba) for every a, b ∈ A; we say φ is faithful, if φ(a∗a) = 0 if and only if a = 0.

In (A, φ), if A is a unital C*-algebra, and the linear functional φ is a state, we say (A, φ) is a C*-probability space. Furthermore, if A is a unital von Neumann algebra, we will require the state φ to be also σ-weakly continuous, and here we

43 recall Definition 2.47; then (A, φ) becomes a von Neumann probability space, which is the fundamental of quantum probability theory (another branch of non- commutative probability). However, in free probability we may just need the setting as in Definition 3.27. In general we assume A is non-commutative, and we will use the term "non-commutative probability space" to refer it. But remember that, if A is com- mutative, it is compatible with the classical probability space. In fact, the algebra of some specific random matrices can be equipped with an appropriate linear functional to general a non-commutative probability space. Assume (Ω, F, µ) is a classical probability space, then we define the algebra \ L−∞ := Lp(Ω, µ) (3.6.1) 1≤p<∞ here we have not chosen the algebra L∞(Ω, µ) since we do not want to exclude Gaussian random variables. Next, we define the tensor algebra M(L−∞) =∼ −∞ Mn(C) ⊗ L (we will discuss an almost instinct algebra structure with detail in Section 4.3.1), with a linear functional τn ⊗ E, where E is the integration with respect to µ, and τn is the normalized trace on elements of Mn(C). Then we start to discuss the independence in the free probability. As known to all, independence is the core of probability; in the classical probability theory, we define the independence of a collection of σ-algebras. Similarly, in the free probability, we define the independence of unital subalgebras in the following manner: Definition 3.30 (Freeness) Let (A, φ) be a non-commutative probability space. A col- lection of unital subalgebras {Ai}i∈I is called freely independent (or in short, free), if

φ(a1a2 ··· an) = 0 ∈ A 6= 6= · · · 6= ( ) = ∈ { ··· } whenever aj ij , i1 i2 in, and φ aj 0 for all j. Here j 1, 2, , n .

Note that, in the above definition, i1 6= i2 6= · · · 6= in means i1 6= i2, i2 6= i3 and so on. In other words, neighboring elements in a1a2 ··· an are from different subalgebras. Moreover, a family of random variables {an}n are called free if the algebras generated by an and e are free. We will talk about the origin of this definition later, now we shall give another example of non-commutative probability space, thanks to [51]. Let G be a group, its group algebra CG is defined as CG := ∑g∈G αgg, where αg ∈ C for all g ∈ G and α 6= 0 for only finitely many g ∈ G. This group algebra is equipped with the multiplication ! ! ∑ αgg · ∑ βhh := ∑ αgβh (gh) g∈G h∈G g,h∈G

44   C for all αg g∈G, βg g∈G meeting the above requirement. Obviously, G and G have the same identity. Therefore, CG is a unital algebra. On CG we can define a unital linear functional φG : CG → C by ! φG ∑ αgg := αe g∈G

Then the tuple (CG, φG) is a non-commutative probability space.

3.6.2 Free Product and Free Probability In fact, the Definition 3.30 is inspired from the free product of groups, which is exactly the coproduct of a collection of groups, defined by the universality. More generally, we give the definition of the coproduct of a collection of objects in some category:

Definition 3.31 (Coproduct) Let {A } ∈ be a family of objects in a category A. By i i I their coproduct one means a pair C, { fi}i∈I consisting of an object C ∈ A and a family of morphisms { fi}i∈I satisfying the following property: given a family of morphisms {gi : Ai → S}, there exists a unique morphism h : C → S such that h ◦ fi = gi for all i ∈ I.  Observe the coproduct C, { fi}i∈I of the family {Ai}i∈I is a universal object in the category whose objects consist of collections of morphism {gi : Ai → S}. Besides, we usually write qi∈I Gi := C. Now we turn to free product for a family of groups. For the category of groups G, we can construct the free product in the following way: given a family of groups {Gi}i∈I, we define its word as a product of the form

x1x2 ··· xn

where each xj can be from arbitrary Gi. The reduced word can be defined, using the following operations:

1. Remove xj if it is identity element in some Gi.

2. If the adjacent elements xjxj+1 are from the same group Gi, replace it by their product in Gi.

Then the free product of {Gi}i∈I, denoted as F, is the group whose elements are the reduced words defined as above, with the empty word e0 as identity, and the concatenation followed by reduction as the product. Naturally, for each i ∈ I, we have the homomorphism fi : Gi → F defined by mapping xj to the word of 0 length one xj if xj 6= e in Gi, and to the empty world e if xj = e in Gi. One can

45 also observe that: any element x ∈ F different from the identity e0, i.e. the empty word, can be written as a finite product = ··· x xi1 xi2 xin (3.6.2) ∈ 6= 6= 6= · · · 6= with xik Gik , xik e and i1 i2 in. It is easy to verify that the above constructed free product is exactly the co- product in category of groups:

Proposition 3.32 Let {Gi}i∈I be a family of groups, then its free product F with { fi}i∈I is the coproduct of the family.

Proof. Let H be any group, and let gi : Gi → H be a collection of homomorphisms. Then we define the map φ : F → H by ( ) =   ··· φ x gi1 xi1 gi2 xi2 gin (xin ) (3.6.3) 0 0 when x 6= e and φ(e ) = e, so obviously φ is a homomorphism, and gi = φ ◦ fi immediately holds, and gives the uniqueness. 

Therefore F = qi∈I Gi, and from now on we will not distinguish the free product and coproduct for a family of groups, we will use the only one notation (qi∈I Gi, { fi}i∈I). Inversely, we have the following characterization of the free product, which is very useful later when we build the bridge between free product and free proba- bility.

Proposition 3.33 Let G be a group and {Gi}i∈I a family of subgroups. Assume:

1. The family {Gi}i∈I generates G. = ··· ∈ 6= 6= 6= · · · 6= 6= 2. If x xi1 xi2 xin , with xik Gik , xik e, and i1 i2 ik, then x e in G.

Then G is the free product of {Gi}i∈I.

0 Proof. We use gi : Gi → G to denote the subgroup inclusion, e and e the iden- tity in G and qi∈I Gi, respectively. By the universal property of the free product qi∈I Gi, there exists a unique homomorphism φ : qi∈I Gi → G such that:

φ ◦ fi = gi, ∀i ∈ I

It is obvious that φ is surjective, since {Gi}i∈I generates G. In fact, φ is also injective. let x ∈ ker φ, and we suppose that x 6= e0, i.e. being not the empty word, then as in Expression 3.6.2, we know that x can be represented as: = ··· x xi1 xi2 xin

46 ∈ 6= 6= 6= · · · 6= with xik Gik , xik e and i1 i2 in. Putting φ on both sides, with applying Expression 3.6.3, we get = ( ) = ( ··· ) e φ x φ xi1 xi2 xin =   ··· gi1 xi1 gi2 xi2 gin (xin ) = ··· xi1 xi2 xin but according to Condition 2 we have the contradiction, thus ker φ = {e0}. ∼ In conclusion, φ : qi∈I Gi → G is an isomorphism, i.e. qi∈I Gi = G.  Note that some time we say the subgroups {Gi}i∈I are free (in algebraic sense) in G, if they meet the Condition2 in Proposition 3.33. Now we can state the most important theorem in this section, which reveals the connection between free product and free probability.

Theorem 3.34 Let {Gi}i∈I be subgroups of a group G. Then the following two state- ments are equivalent:

1. The subgroups {Gi}i∈I are free in G.

2. The subalgebras {CGi}i∈I are freely independent in the non-commutative proba- bility space (CG, τG).

Proof. From2 to1: suppose {CGi}i∈I are freely independent, then for any finite ··· ∈ 6= 6= · · · 6= ( ) = sequence a1, a2, , an such that aj Gij , i1 i2 in, and τG aj 0 for all j, there is τG (a1a2 ··· an) = 0 (3.6.4) = ( ) = Recall that aj ∑gj∈G αgj gj, and τG aj 0 tells us the coefficient of aj on e is ··· 0. Therefore 3.6.4 implies any product of form xi1 xi2 xin cannot be equal to e, ∈ 6= 6= 6= · · · 6= where xij Gij , xij e, and i1 i2 in. From1 to2: it is trivial, just consider it inversely. 

3.6.3 Free Central Limit Theorem and Asymptotic Freeness From Definition 3.28 we already know what is the random variable and its distri- bution in an non-commutative probability space. In this section, we will discuss the analogue of central limit theorem in the theory of free probability. So, first we should clarify what kind of convergence will be used.

Definition 3.35 (Convergence in distribution) Let {an}n∈N be a sequence of ran- dom variables in the non-commutative probability space (A, φ), we say {an}n∈N con- verges in distribution if there is

lim µa (P) = µ(P), ∀P ∈ C[X] n→∞ n where µ : C[X] → C is a linear functional such that µ(1) = 1.

47 Then we give the free central limit theorem [10] without proof, which is the evidence that semi-circular law, our old friend in random matrix theory, is the analogue of normal distribution in the theory of free probability.

Theorem 3.36 Let (A, φ) be a non-commutative probability space, and let {an}n∈N be a free family of random variables in A such that:

1. φ(an) = 0 for all n ∈ N;

k 2. supn∈N φ(an) < ∞ for all k ≥ 2;

1 n 2 3. limn→∞ n ∑j=1 φ(aj ) = 1. Define 1 sn := √ (a + ··· + an) , n 1 then sequence {sn}n∈N converges in distribution to the semicircle law µ. We remind that in this case µ : C[X] → C is defined as

Z p 1 2 1 µ(P) = P(x) 4 − x · {|x|≤2} dx 2π R Another topic we would like to cover in this section is the asymptotic freeness, which means the freeness holds in the limit case.

Definition 3.37 (Asymptotic freeness) A sequence of families of random variables is said to be asymptotically free, if it converges in distribution to a family of free random variables.

It has been shown that the asymptotic freeness appears between independent Gaussian random matrices, but there are many more classes of random matrices which show asymptotic freeness. In particular, in the book of Mingo and Speicher they have presented similar results for Wigner matrices, Haar unitary random matrices and treat also the relation between such ensembles and deterministic matrices. For the detail, we refer to [47].

48 Chapter 4

Applications of RMT

In this chapter we introduce some applications of random matrix theory in the wireless communication systems and open quantum systems.

4.1 In MIMO System

At one time, in telecommunication engineering, the term "MIMO" referred to the use of multiple antennas at the transmitter and the receiver. In modern us- age, "MIMO" specifically refers to a practical technique for sending and receiving more than one data signal simultaneously over the same radio channel by ex- ploiting multi-path propagation. MIMO is fundamentally different from smart antenna techniques developed to enhance the performance of a single data sig- nal, such as beamforming and diversity [52]. Suppose we have a memoryless multiple-input and multiple-output (MIMO) system without power constraint, as following: Y = HX + E (4.1.1) 2 in which X ∼ CNK(0, σX IK) is the N-dimensional input vector, Y is the N-dimensional 2 N×K output vector, E ∼ CNN(0, σE IN) is the noise, and H ∈ C is a random channel matrix, whose entries describe the variation of input vector while the transmis- sion, but at present we assume it is deterministic. Obviously we have 2 † 2 Y ∼ CNN(0, σX HH + σE IN) (4.1.2) Now, we can study the joint probability density function of (X, Y). Using the relationship in 4.1.1, there is

f(X,Y)(x, y) = f(X,E)(x, y − Hx) = fX (x) fE(y − Hx) (4.1.3) then directly we have the conditional probability density

f(X,Y)(x, y) fY|X (y|x) = = fE(y − Hx) (4.1.4) fX (x)

49 By the joint probability density 4.1.3 and the conditional probability density 4.1.4 and recall Definition 2.7, we get the conditional entropy h(Y|X) as ZZ h(Y|X) = − f(X,Y)(x, y) log fY|X (y|x)dxdy ZZ = − fX (x) fE(y − Hx) log fE(y − Hx)dxdy Z Z  = − fX (x) fE(y − Hx) log fE(y − Hx)dy dx Z = h(E) · fX (x)dx = h(E)

According to Proposition 2.9 and Proposition 2.11, we can compute the mutual information between input vector X and Y as following

1 1 I(X, Y) = [h(Y) − h(Y|X)] N N 1 = [h(Y) − h(E)] N 1 h      i = log det πe σ2 HH† + σ2 I − log det πe σ2 I N X E N E N 1  †  = log det SNR · HH + I (4.1.5) N N N 1  †  = ∑ log SNR · λi(HH ) + 1 (4.1.6) N i=1

2 2 in which SNR := σX/σE, it represents the ratio of signal power to the noise power.

4.1.1 Asymptotic Result I: Fixed Number of Receivers From now on, we add some assumptions on the entries of H ∈ CN×K. In the sim- plest case, we suppose that they are independent identically distributed complex- 1 valued random variables, with mean 0 and variance K ; notice that the variance must be controlled to avoid divergence. Suppose the number of transmitters K tends to infinity, and the number of receivers N is fixed. We write † a.s. HH −→ IN in sense that the almost sure convergence holds entrywisely. In fact, consider the entry of HH† located at the i-th row and j-th column,

K (  † † a.s. 0 i 6= j HH = Hik Hkj −→ ij ∑ k=1 1 i = j

50 which is an immediate result of strong law of large numbers. Observe that the 2 map 4.1.5 is continuous, from a subspace of RN to R, thus we conclude, as K → ∞, with N being fixed,

1 a.s. I(X, Y) −→ log (SNR +1) (4.1.7) N

Figure 4.1.1: The channel capacity increase with respect to SNR under different K and fixed N = 5

0 00 00 0

0 Y X I

N

0

0

00

0 0

As displayed in Figure 4.1.1, the red line represents the right-hand side of 4.1.7, and the blue lines depict the results obtained by simulating 4.1.6. One may also be curious about what will happen when the number of trans- mitters N tends to infinity while K being fixed. Unfortunately, no obvious math- ematical tool could be found for analyzing the asymptotic result, so we turn to simulation. By the simulation we believe that, in this case, I(X, Y) diverges and 1 N I(X, Y) → 0 in some sense.

4.1.2 Asymptotic Result II: Simultaneously Tending to Infinity Now, we turn to assume that N, K → ∞ simultaneously, but with some fixed ratio. We use the notation of empirical spectral distribution, as in Definition 3.2, to rewrite 4.1.6 as N 1 1  †  I(X, Y) = ∑ log SNR · λi(HH ) + 1 N N i=1 Z N = log (SNR · x + 1) dF † R HH

51 Here we have the same assumptions on the entries of H, as in Section 4.1.1. Then, let N, K → ∞ with N/K → β. According to Theorem 3.21, we have the limit spectral distribution of HH†. As a consequence, for all β > 0, we have

q   b b − x x − a 1 a.s. Z β β β I(X, Y) −→ log (SNR · x + 1) · dx (4.1.8) N aβ 2πxβ

p 2 p 2 in which aβ = 1 − β and bβ = 1 + β . One may notice that the function x 7→ log (SNR · x + 1) does not belong to Cb(R), which seems to be an obstacle for concluding the convergence. Fortu- → → N nately, as N, K ∞ with N/K β, the limit distribution of FHH† , the Marchenko- + Pastur distribution, is compactly supported by [aβ, bβ] ⊂ R ; and for fixed N ∈ N N R+ † , FHH† is always with finitely many atoms on (observe HH is positive semi- definite), so it is not difficult to prove the tightness. We assert that we can always find a compact subset K such that [aβ, bβ] ⊂ K ⊂ (−1/ SNR, +∞), and outside of N K, FHH vanishes for all N ∈ N. Easily, we can construct a continuous function g(x) such that

g(x) = log(SNR · x + 1), ∀x ∈ K and |g(x)| decreases to 0 outside K. Obviously g ∈ Cb(R). Then, we have Z Z N 1 N log(SNR · x + 1) dF † = log(SNR · x + 1) · K(x) dF † R HH R HH Z N = g(x) dF † R HH q   b b − x x − a a.s. Z β β β −→ g(x) · dx aβ 2πxβ q   Z bβ bβ − x x − aβ = log (SNR · x + 1) · dx aβ 2πxβ which completes the demonstration of 4.1.8. Tulino and Verdú [24] gave an analytical integration result of 4.1.8. For the purpose of simulation, the numerical integration is acceptable. Then we have the Figure 4.1.2, in which the red line is plotted based on Expression 4.1.8 and the blue line based on Expression 4.1.6; as we can see, the empirical spectral distribution converges quite quickly, since there is few fluctuation when N equals to a small integer, for example, one hundred, this also validates the results have introduced in the Section 3.5.

52 Figure 4.1.2: The channel capacity increase with respect to SNR, under different N and K

0 0 0

0 0 0 Y Y X X

I I N N 0 0

0 0

00 00

0 0 0 0 000 00000

0 0

Y Y X X I I N N 0 0

0 0

00 00

0 0 0 0

4.2 In CDMA System

Code-division multiple access (CDMA) is an example of multiple access, where several transmitters can send information simultaneously over a single commu- nication channel. To permit this without undue interference between the users, which indicate the message source in the multiple access channel; CDMA em- ploys spread spectrum technology and a special coding scheme. [44] Unlike Time-division multiple access (TDMA) and Frequency-division mul- tiple access (FDMA), in CDMA the signals overlap both the time and frequency domain. Here we give an extremely simple example of CDMA. Consider two carrier signals (or we call them signature waves) s1(t) and s2(t)

53 Figure 4.2.1: Orthogonal signals s1 and s2 assigned to two users

t t

0 0 s s

0 0 t t like in Figure 4.2.1, we say they are orthogonal because

Z 1 (s1, s2)L2([0,1]) = s1(t)s2(t) dt = 0 0 Obviously, they are overlapped in time domain, and in frequency domain. Sup- pose we have the codes

b1 = {1, 0, 1, 0} b2 = {1, 1, 0, 0} that expected to be transmitted, then, just by the modulation strategy showed in Figure 4.2.2, we shall finally transmit the signal which is as sum of two wave- forms (as in the third sub-figure). Obviously, we can immediately proceed the demodulation of the signal to obtain the b1 and b2. In general [18], the basic CDMA K-user channel model can be written as

K y(t) = ∑ Akbksk(t) + σn(t), t ∈ [0, T] k=1 where sk is the deterministic signature waveform assigned to the k-th user, with 2 unit norm in L ([0, T]), Ak is the received amplitude of the k-th user’s signal, bk ∈ {−1, +1} is the bit transmitted by the k-th user, and n(t) is the white Gaussian noise with unit variance. Moreover [20], if the k-th user want to send the multi-dimensional codeword, n i.e. bk ∈ R , the transmitting signal can be represented as

n Ak ∑ bkisk(t − iT) i=1

54 Figure 4.2.2: Modulation using carrier signals s1 and s2

1

1 1

1

1 1

1

1 1

1 1

1

After having transmitted the signal via the channel according to the aforemen- tioned forms, our current mission is to extract the codewords buried in the signal; this is also called demodulation or user detection.

4.2.1 Cross-correlations of Random Spreading Sequences

Usually, the signature waveform sk has duration T, with unit energy, lives in an N-dimensional space N sk(t) = ∑ ckjφj(t) j=1 0 where ck = [ck1, ··· , ckN] is the spreading√ code√ assigned to the k-th user with each elements chosen from set {−1/ N, +1/ N}; and φ(t)’s are called orthonor- mal chip waveforms, since

 = φi, φj L2([0,T]) δij where δij is the Kronecker delta notation.

55 Now, we define the important cross-correlation matrix R ∈ RK×K, whose en- tries Rkl’s represent the cross-correlation between the signature waveforms sk and st. More explicitly, for all 1 ≤ k, l ≤ K, we have

N Rkl = (sk, sl)L2([0,T]) = ∑ ckncln (4.2.1) n=1

apparently each Rkl satisfies −1 ≤ Rkl ≤ 1. If use C to denote the following matrix

C := [c1, c2, ··· , cK] (4.2.2)   c11 c21 ··· cK1  c12 c22 ··· cK2  =   ∈ RN×K  . . .. .   . . . .  c1N c1N ··· cKN then 4.2.1 can be rewritten by 4.2.2, we get

R = C0C ∈ RK×K

For analysis of the average performance of multiple user detection, we as- sume C to be a random matrix, with assumptions that: all the entries of C are independent and identically distributed with two-point distribution    √1 1 P ckj = − = 2  N  ∀1 ≤ k ≤ K, 1 ≤ j ≤ N (4.2.3) P c = + √1 = 1  kj N 2

From 4.2.3 we can directly obtain the moment information of the entries of C; as K, N tends to infinity in some way, we shall use the Marchenko-Pastur law to study the asymptotic properties.

4.2.2 MMSE Multiuser Dectection and Spectral Efficiency In user detection, we can apply the mean squared error criterion, which is intro- 2 duced in Section 2.2. The k-th user should choose the waveform wk ∈ L ([0, T]) such that h i2 min E bk − (wk, y) 2 (4.2.4) 2 L ([0,T]) wk∈L ([0,T])

in which bk ∈ R is the codeword, and y is the received continuous-time wave- form, which carries the randomness since it is polluted by the noise. According to Verdú [18], we can formulate 4.2.4 as an equivalent form, which transforms

56 the optimization problem in infinitely dimensional space into the one in finitely dimensional space: h i min E kb − Myk2 M∈RK×K s.t. y = RAb + n where R ∈ RK×K is the cross-correlation matrix of the spreading sequence, as defined in 4.2.1, A ∈ RK×K describes the power of the signals sent by users, y ∈ RK×1 is the vector of output of normalized matched filter, n ∈ RK×1 is the ad- ditive noise with zero mean and covariance matrix σ2R, and k · k is the Frobenius matrix norm. Verdú [18] also proved that the achieved minimal mean squared error is  −1 h 2i h 2 i MMSE := min E kb − Myk = tr IK − σ ARA (4.2.5) M∈RK×K Now we assume that all the users have the equal power, which simplifies the A, then the Expression 4.2.5 can be transformed into

1 1  −1 MMSE = tr [I + SNR R] K K K 1 K 1 = (4.2.6) ∑ + SNR ( ) K k=1 1 λk R Z 1 K = dFR R 1 + SNR · x

We notice that the function x 7→ 1/ (1 + SNR · x) is not bounded on whole R, use the trick that we applied in derivation of 4.1.8, and recall the entries of C ∈ RN×K are i.i.d. with mean zero and variance 1/N; we conclude that, as N, K → ∞, with K/N → β > 0, by the Theorem 3.21, there is: If β ≤ 1, then q   b b − x x − a 1 a.s. Z β 1 β β MMSE −→ dx (4.2.7) K aβ 1 + SNR · x 2πxβ otherwise, if β > 1, then q     b b − x x − a 1 a.s. 1 Z β 1 β β MMSE −→ 1 − + dx (4.2.8) K β aβ 1 + SNR · x 2πxβ

p 2 p 2 in which aβ = (1 − β) and bβ = (1 + β) . The Figure 4.2.3 shows the asymptotic behavior of MMSE, in which the blue lined are obtained by generating random matrix and computing 4.2.6, and the red lines are obtained by integrating 4.2.7 if β ≤ 1 and 4.2.8 if β > 1.

57 Figure 4.2.3: Variation of the MMSE with the increase of SNR

N= 10, K= 15, beta= 1.50 N= 15, K= 10, beta= 0.67

1.0 1.0

0.9 0.8

0.8

0.6 0.7 MMSE MMSE 1 K 1 K

0.6 0.4

0.5 0.2

0.4 0 2 4 6 8 10 0 2 4 6 8 10 SNR SNR

On the other hand, we also concern the spectral efficiency of system, which refers to the information rate that can be transmitted over a given bandwidth in a specific communication system. Then, recall that ck denotes the spreading codes of the k-th user, and we shall define K−1 0 N×N Q := ∑ ckck ∈ R k=1 N×N Σ := IN + SNR · Q ∈ R Here Q is also a Wishart matrix; in fact, if we continue to define  0    c1 c11 c12 ··· c1N 0  c   c21 c22 ··· c2N  ˜ =  2  =   ∈ R(K−1)×N C :  .   . . .. .   .   . . . .  0 cK−1 c(K−1)1 c(K−1)2 ··· c(K−1)N Obviously we have Q = C˜0C˜ (4.2.9) and entries of C˜ are i.i.d. with two-point distribution as above defined. However, as we showed the their variance are identically 1/N, and what we expect is 1/K, since we want to properly apply the Marchenko-Pastur law, so we have to do the transformation, define r N Cˆ := C˜ (4.2.10) K then putting 4.2.10 into 4.2.9, get K N Q = Cˆ0Cˆ ⇐⇒ Q = Cˆ0Cˆ N K

58 in which Cˆ has i.i.d. entries with mean zero and variance 1/K. Verdú [18] tells us: the signal-to-interference ration (SIR) can be written as

SNR  −1 SIR = tr Σ N N SNR 1 = ∑ ( ) N n=1 λn Σ N 1 SNR = ∑ + · ( ) N n=1 1 SNR λn Q N 1 SNR = N ∑ K N  n=1 1 + SNR · N · λn K Q Z SNR = N K dFN (4.2.11) R K Q 1 + SNR · N · x and the spectral efficiency of the MMSE detector can be calculated by the SIR:

1 K C := log (1 + SIR) (4.2.12) MMSE 2 N Moreover, We notice that the integrand in 4.2.11 contains the factor K/N, which brings the difficulty for applying the Marchenko-Pastur law. For the sim- plicity, we may assume N, K → ∞ with N/K = β (rather than N/K → β, unlike the previous cases). Again, by the technique we have used in 4.1.8, and the con- tinuity of function 4.2.12 (with respect to SIR), and combining with 4.2.11, we get When β ≤ 1, ! 1 K Z SNR = + N CMMSE log 1 K dFN 2 N R K Q 1 + SNR · N · x  q  −  −  1 Z bβ SNR bβ x x aβ −→a.s. + log 1 −1 dx 2β aβ 1 + SNR β x 2πxβ

Or when β > 1,    a.s. 1 1 C −→ log 1 + SNR 1 − + ··· MMSE 2β β q    Z bβ SNR bβ − x x − aβ −1 dx (4.2.13) aβ 1 + SNR β x 2πxβ

p 2 p 2 in which aβ = (1 − β) and bβ = (1 + β) .

59 4.2.3 Other Types of Detections in CDMA System Besides MMSE detection, we have also the asymptotic analysis of spectral effi- ciency of other kinds of detection, like optimal, single-user matched-filter and decorrelator detection. However, expect the optimal detection, the derivation of other results do not rely on the random matrix theory, but central limit theory or large deviation theory.

Figure 4.2.4: Variation of the spectral efficiency with the increase of SNR for different detectors when β = 2

0.8 Optimal Single-user Matched-filter 0.7 Decorrelator MMSE 0.6

0.5

0.4 Efficiency

0.3 Spectral

0.2

0.1

0.0

0 2 4 6 8 10 SNR

Define β˜ := β−1, Verdú [20] gave the following results

r    b bβ˜ − x x − aβ˜ a.s. β˜ Z β˜ C −→ log (1 + SNR x) dx (4.2.14) OPT ˜ 2 aβ˜ 2πxβ   a.s. β˜ SNR CSUMF −→ log 1 + (4.2.15) 2 1 + SNR β˜ a.s. β˜  C −→ log 1 + SNR 1 − β˜ (4.2.16) DECO 2 as K, N → ∞ with N/K = β > 1. Here we compels β > 1 only for making 4.2.16 to be meaningful. Besides, it is worthy to notice that 4.2.14 is almost as same as 4.1.8. Figure 4.2.4 shows the curves of 4.2.13, 4.2.14, 4.2.15 and 4.2.16 when β = 2, which roughly describes the asymptotic, more explicitly, when the number of users K → ∞, performance of different types of detectors used in demodulation.

60 4.3 In Open Quantum System

In physics, an open quantum system is a quantum-mechanical system which in- teracts with an external quantum system, for example the environment. In gen- eral, these interactions significantly change the dynamics of the system and result in quantum dissipation, where the information contained in the system is lost to its environment. Because no quantum system is completely isolated from its sur- roundings, it is important to develop a theoretical framework for treating these interactions in order to obtain an accurate understanding of quantum systems [53]. Mathematically, time evolution in the open quantum system is no longer de- scribed by means of one-parameter groups of unitary maps, but one needs to introduce semigroups of completely positive maps or homomorphisms of a non- commutative algebra. In this very general framework several problems, relevant from both the physical and applied perspective, are still open and lack a firm mathematical formalization. However, in this chapter we will not go so far, and we will be concerned only with the finite dimensional case, so all the aforemen- tioned linear maps will become matrices. We will see that some distributions on these linear maps can be introduced to construct random maps, and then the asymptotic property of their spectra can be studied. Of course, many results given in Chapter3 will be applied.

4.3.1 Quantum State and Quantum Channel We start by introducing the basic components in an open quantum system. A is a matrix that describes the statistical state of the system, defined as:

Definition 4.1 (Density Matrix) A density matrix is a positive semi-definite matrix of unit trace.

1,+ We shall use Md (C) to denote the collection of density matrices of size d × d over the field C, i.e.

1,+ Md (C) := {ρ ∈ Md(C) : ρ ≥ 0, Tr ρ = 1}

in which Md(C) represents the space of matrices of size d × d.

1,+ Proposition 4.2 Md (C) ⊂ Md(C) is convex; also closed and bounded with respect to the nuclear norm k · k1, which is defined in 2.41.

Proof. Firstly we show the convexity: for arbitrary t ∈ [0, 1], and ρ1, ρ2 ∈ 1,+ Md (C), consider the convex combination tρ1 + (1 − t)ρ2, obviously we have d h(tρ1 + (1 − t)ρ2) x, xi = thρ1x, xi + (1 − t)hρ2x, xi ≥ 0, ∀x ∈ C

61 and Tr (tρ1 + (1 − t)ρ2) = t Tr (ρ1) + (1 − t) Tr (ρ2) = 1

since Tr is a linear operator on Md(C). 1,+ Then we show that Md (C) is a closed and bounded subset of the finite di- 1,+ mensional Banach space (Md(C), k · k1). Take {ρn}n∈N ⊂ Md (C) with ρn → ρ ∈ Md(C) in nuclear norm. Since ρn’s are all positive by definition, therefore

p ∗  kρnk1 = Tr ρnρn = Tr ρn = 1, ∀n ∈ N

and we know that the norm must be a continuous map, so

Tr ρ = kρk = lim kρnk = 1 1 n→∞ 1

Moreover, with the help of Proposition 2.44, for fixed x ∈ Cd, there is

|hρnx, xi − hρx, xi| = |h(ρn − ρ) x, xi| 2 ≤ kρn − ρkkxk 2 ≤ kρn − ρk1kxk → 0

1,+ Then hρx, xi ≥ 0; in other words, ρ must also be positive. Thus, ρ ∈ Md (C), 1,+ and then Md (C) is closed. The boundedness is trivial, and notice that we are now working on the finite dimensional space, so all the norms are equivalent, i.e. they will generate the same topology. 1,+ Finally, we conclude that: Md (C) ⊂ Md(C) is convex; closed and bounded with respect to any norm on Md(C).  1,+ The extreme points of Md (C), i.e. point that cannot be represented as the convex combination of two other different points in the convex set, are the rank ∗ d one projectors Px := xx , where x ∈ C and kxk = 1, and in particular, they are called pure states.

Now we turn to give the definition of quantum channels, which is a specific class of linear maps between Mn(C) and Mk(C). In section 2.4.1 we have already mentioned the name of completely positive operator between C*-algebras, now we are going to give the formal definition. Suppose there are two C*-algebras A and B and a φ : A → B, with given k ∈ N, we can construct the map

idk ⊗φ : Mk(C) ⊗ A → Mk(C) ⊗ B (4.3.1) and we shall define Definition 4.3 (Completely positive map) A linear map φ : A → B is said to be completely positive if idk ⊗φ, as in 4.3.1, is positive for all k ∈ N.

62 Notice that a element of M (C) ⊗ A can be identified by the element in M (A), k  k which is the algebra of k × k matrices with entries in A. In fact, let aij ∈ Mk(A)  and Eij be the canonical orthonormal basis in Mk(C), then it is easy to check that k  aij 7→ ∑ aij ⊗ Eij i,j=1

defines a *-isomorphism between Mk(A) and Mk(C) ⊗ A. One can prove that Mk(A) is a C*-algebra in another way. The Gelfand– Naimark theorem tells us: A can be identified as B(H) for some Hilbert space H, then it is not difficult to observe that there is also a *-isomorphism between (k) (k) k Mk(B(H)) and B(H ), where H := ⊗i=1Hi and Hi = H for all 1 ≤ i ≤ k, so the natural *-algebra Mk(A) can inherit the C*-norm and positive elements from B(H(k)), then becomes a C*-algebra. Up to now, we have realized that there is another equivalent definition of completely positive map in terms of Mk(A) and Mk(B): a linear map φ : A → B is said to be completely positive if     a11 ··· a1k φ(a11) ··· φ(a1k)  . .. .   . .. .  φk :  . . .  7→  . . .  ak1 ··· akk φ(ak1) ··· φ(akk)

is positive for all k ∈ N. Then, we give the definition of quantum channel, which is exactly the com- pletely positive operator between finitely dimensional C*-algebras. Since we want to map density matrices to density matrices, the property of trace preserv- ing is necessary.

Definition 4.4 (Quantum channel) A quantum channel is a linear completely posi- tive trace preserving map Φ : Mn(C) → Mk(C).

Here we also give an important characterization of quantum channels [33], which is a result of Stinespring dilation theorem.

Proposition 4.5 A linear map Φ : Mn(C) → Mk(C) is a quantum channel if and only if there exists a finite dimensional Hilbert space K = Cd and an isometry V : Cn → Ckd such that ∗ Φ(X) = TrK (VXV )

for all X ∈ Mn(C), in which TrK is the partial trace over K.

Now, we turn to talk about the capacity of a quantum channel, the main topic of this section, and it is characterized by the Rényi entropy, an object defined on the probability simplex.

63 Definition 4.6 (Probability simplex) A probability simplex ∆d is the set

( d ) d ∆d := x ∈ R : xi ≥ 0 ∀i, ∑ xi = 1 i=1

↓ and we say it is descending if x1 ≥ x2 ≥ · · · ≥ xd, denoted as ∆d.

1,+ Obviously, if ρ ∈ Md (C), then its spectrum λ(ρ) is exactly a descending probability simplex. Then, we define the Rényi entropy:

Definition 4.7 (Rényi entropy) The Rényi Entropy of order p ∈ (0, 1) ∪ (1, ∞) of a probability vector x ∈ ∆d is defined as

d 1 p H (x) = log x (4.3.2) p − ∑ i 1 p i=1

Obviously limp→1 Hp(x) exists and it is exactly the famous Shannon entropy. By the functional calculus [42] we can extend the definition of Rényi entropy 1,+ to the density matrices Md (C), just as 1 H (ρ) = log Tr ρp (4.3.3) p 1 − p We can easily return back from 4.3.3 to 4.3.2, just because the following iden- tity 4.3.4 always holds Hp(ρ) = Hp(λ(ρ)) (4.3.4) 1,+ in which ρ ∈ Md (C), and λ(ρ) is its spectrum, also a descending probability simplex, as stated before. The following defined minimal Rényi output entropy describes the capacity of the quantum channel: Definition 4.8 (Minimal output entropy) The minimal output entropy of a quantum channel is defined as min Hp (Φ) := min Hp (Φ(ρ)) (4.3.5) 1,+ ρ∈Mn (C) One should notice in 4.3.5 the existence of the minimizer is always admitted. In fact, Φ is a linear mapping starting from a finite dimensional space Mn(C) to another complex topological vector space, so Φ must be continuous [8]. By Corollary 2.43, and continuity of some basic functions, we conclude that ρ 7→ Hp (Φ(ρ)) is also continuous, as the multi-composition of continuous mappings. 1,+ Moreover, in Proposition 4.2 we have shown that Mn (C) ⊂ Mn(C) is closed and bounded, then compact. According to Weierstrass Theorem, the minimizer must exist.

4.3.2 Asymptotic Minimal Output Entropy

Amosov, Holevo and Werner formulated the "additivity conjecture": the minimal output entropy $H_p^{\min}$ is additive, that is, for arbitrary quantum channels $\Phi_1, \Phi_2$,

$$H_p^{\min}(\Phi_1 \otimes \Phi_2) = H_p^{\min}(\Phi_1) + H_p^{\min}(\Phi_2) \qquad (4.3.6)$$

But 4.3.6 turns out to be false, as was proved by random methods: the identity is violated with non-zero probability. In particular, deterministic counterexamples [23][32] were given for suitable values of p, for which only < holds in Equality 4.3.6. Until now, however, no concrete deterministic counterexample is known in the case 1 ≤ p ≤ 2.
The random methods rely on the so-called random quantum channel, which we discuss in this section.

Let us consider a probability distribution on quantum channels. By Proposition 4.5, a quantum channel $\Phi : M_n(\mathbb{C}) \to M_k(\mathbb{C})$ can be written as
$$\Phi(X) = \mathrm{Tr}_{\mathbb{C}^d}(V X V^*), \quad \forall X \in M_n(\mathbb{C})$$

Now we take $V : \mathbb{C}^n \to \mathbb{C}^k \otimes \mathbb{C}^d$ to be a random isometry, usually a truncation of a Haar-distributed random unitary; then $\Phi$ becomes a uniformly distributed random channel. The question we are concerned with in this section is: how will the minimal output entropy behave as $d \to \infty$? Will the randomness fade away, as happened in the asymptotic analysis of telecommunication systems? Fortunately the answer is positive, but before doing the asymptotic analysis we transform the expression 4.3.5 of the minimal output entropy; this will provide some shortcuts.
Since the Rényi entropies are Schur-concave functions, their minima are attained at the extreme points of the set of density matrices [33], more explicitly at the rank-one projectors $P_x := xx^*$, as mentioned before, and hence

$$H_p^{\min}(\Phi) = \min_{\rho \in M_n^{1,+}(\mathbb{C})} H_p(\Phi(\rho)) = \min_{\substack{x \in \mathbb{C}^n \\ \|x\|=1}} H_p\big(\Phi(P_x)\big) = \min_{\substack{x \in \mathbb{C}^n \\ \|x\|=1}} H_p\big(\mathrm{Tr}_{\mathbb{C}^d}(V P_x V^*)\big)$$

Define $y := Vx$ and recall that $V : \mathbb{C}^n \to \mathbb{C}^k \otimes \mathbb{C}^d$ is an isometry, so that $\|y\| = 1$ and $\Phi(P_x) = \mathrm{Tr}_{\mathbb{C}^d}(P_y)$. Moreover, under the canonical identification of $\mathbb{C}^k \otimes \mathbb{C}^d$ with the space of $k \times d$ complex matrices, the partial trace $\mathrm{Tr}_{\mathbb{C}^d}(P_y)$ is simply the $k \times k$ matrix $yy^*$, where $y$ now denotes the corresponding $k \times d$ matrix.

Therefore
$$H_p^{\min}(\Phi) = \min_{\substack{y \in \mathrm{Im}\,V \\ \|y\|=1}} H_p\big(\mathrm{Tr}_{\mathbb{C}^d}(P_y)\big) = \min_{\substack{y \in \mathrm{Im}\,V \\ \|y\|=1}} H_p(yy^*)$$
where in the last expression $y$ is regarded as a $k \times d$ matrix.

For a subspace $\mathrm{Im}\,V \subset \mathbb{C}^k \otimes \mathbb{C}^d$ of dimension n, define the set of singular values (Schmidt coefficients)

$$K_V := \{\sigma(y) : y \in \mathrm{Im}\,V,\ \|y\| = 1\}$$
where $\sigma(y)$ is the vector of singular values of $y$ (regarded as a $k \times d$ matrix), i.e. the eigenvalues of $\sqrt{y^* y}$, so that $\sigma(y) = \lambda(\sqrt{y^* y})$.
One may notice that $yy^*$ is not exactly $y^* y$; but they have almost the same spectrum. That is to say, a non-zero eigenvalue $\lambda$ of $yy^*$ must also be an eigenvalue of $y^* y$. In fact, if $yy^* x = \lambda x$ for some $x \neq 0$, then multiplying both sides on the left by $y^*$ gives $y^* y (y^* x) = \lambda (y^* x)$, and clearly $y^* x \neq 0$, so $\lambda$ is also an eigenvalue of $y^* y$. Moreover, both $yy^*$ and $y^* y$ are positive semi-definite, so we conclude that the only difference (that might appear) between the spectra of $yy^*$ and $y^* y$ is the multiplicity of the eigenvalue 0. Luckily, when plugged into $H_p$ they give the same result.
One step further, we can use $K_V$ to simplify $H_p^{\min}(\Phi)$:

$$H_p^{\min}(\Phi) = \min_{\substack{y \in \mathrm{Im}\,V \\ \|y\|=1}} H_p(yy^*) = \min_{\substack{y \in \mathrm{Im}\,V \\ \|y\|=1}} H_p(y^* y) = \min_{\substack{y \in \mathrm{Im}\,V \\ \|y\|=1}} H_p\big(\lambda(y^* y)\big) = \min_{\substack{y \in \mathrm{Im}\,V \\ \|y\|=1}} H_p\big(\sigma^2(y)\big) = \min_{z \in K_V} H_p(z^2)$$
Notice that here we abuse notation in $\sigma^2(y)$ and $z^2$: $\sigma^2(y)$ represents the vector
$$\big(\sigma_1^2(y), \sigma_2^2(y), \cdots, \sigma_{\min\{k,d\}}^2(y)\big)$$
and similarly for $z^2$.
Now, more explicitly, we write $V_d$ to denote $V : \mathbb{C}^n \to \mathbb{C}^k \otimes \mathbb{C}^d$, and $\Phi_d$ to denote the random quantum channel generated by $V_d$. At this stage, we are prepared to study
$$\lim_{d \to \infty} H_p^{\min}(\Phi_d) = \lim_{d \to \infty} \min_{z \in K_{V_d}} H_p(z^2)$$
As we can see, the minimal output entropy is in fact determined by the singular values of the unit vectors in a random subspace of the tensor product.
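The reduction above can be explored numerically. The sketch below (NumPy; the names and the construction are ours) samples a Haar-distributed isometry $V_d$ by QR-decomposing a complex Ginibre matrix, draws random unit vectors $x$, reshapes $y = V_d x$ into a $k \times d$ matrix, and evaluates $H_p(\sigma^2(y))$. Since it only samples rather than truly minimizes over the subspace, the printed value is just an upper bound on $H_p^{\min}(\Phi_d)$.

```python
import numpy as np

def haar_isometry(n, k, d, rng):
    """Random isometry V: C^n -> C^k (x) C^d (n columns of a Haar-random unitary),
    obtained by QR-decomposing a complex Ginibre matrix."""
    G = rng.standard_normal((k * d, n)) + 1j * rng.standard_normal((k * d, n))
    Q, R = np.linalg.qr(G)
    return Q * (np.diag(R) / np.abs(np.diag(R)))   # fix column phases

def renyi(x, p):
    """H_p of a probability vector (natural log)."""
    x = np.asarray(x); x = x[x > 0]
    return -np.sum(x * np.log(x)) if np.isclose(p, 1) else np.log(np.sum(x ** p)) / (1 - p)

rng = np.random.default_rng(0)
k, d, t, p = 4, 200, 0.5, 2.0
n = int(t * k * d)
V = haar_isometry(n, k, d, rng)
best = np.inf
for _ in range(2000):
    x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    x /= np.linalg.norm(x)
    z2 = np.linalg.svd((V @ x).reshape(k, d), compute_uv=False) ** 2   # sigma^2(y)
    best = min(best, renyi(z2, p))
print(best)   # an upper bound on H_p^min(Phi_d), since we do not truly minimize
```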

As $d \to \infty$, with $k$ fixed and $n \sim tkd$ for an arbitrary constant $t \in (0, 1)$, Belinschi, Collins and Nechita [34] proved that the behaviour of these singular values becomes deterministic, as stated in the following theorem:

Theorem 4.9 For a sequence of uniformly distributed random subspaces $\mathrm{Im}\,V_d \subset \mathbb{C}^k \otimes \mathbb{C}^d$, the set $K_{V_d}$ of singular values of unit vectors from $\mathrm{Im}\,V_d$ converges, almost surely and in the Hausdorff distance, to a deterministic convex subset $K_{k,t}$ of the probability simplex $\Delta_k$, where
$$K_{k,t} := \left\{ z \in \Delta_k : \langle z, x \rangle_{\mathbb{R}^k} \le \|x\|_{(t)},\ \forall x \in \Delta_k \right\} \qquad (4.3.7)$$

In 4.3.7 we need a new norm on $\mathbb{R}^k$ [34]:

Definition 4.10 ((t)-norm) For a positive integer k, embed $\mathbb{R}^k$ as a self-adjoint subalgebra $R$ of a $\mathrm{II}_1$ factor $\mathcal{A}$ endowed with a trace $\varphi$, so that $\varphi((x_1, \cdots, x_k)) = (x_1 + \cdots + x_k)/k$. Let $p_t \in \mathcal{A}$ be a projection of trace $t := \varphi(p_t) \in (0, 1]$, free from $R$. The (t)-norm is then defined as

$$\|x\|_{(t)} := \|p_t x p_t\|_{\infty}$$
where the vector $x \in \mathbb{R}^k$ is identified with its image in $R$.

Therefore, in the asymptotic regime, the problem is reduced to finding the minimal output entropy on the deterministic set $K_{k,t}$ rather than on the random set $K_{V_d}$. As a consequence of Theorem 4.9, also proved by Belinschi, Collins and Nechita [41], we have the following corollary, which describes the asymptotic capacity of the random quantum channel:

Corollary 4.11 For all p ≥ 1, we have

$$\lim_{d \to \infty} H_p^{\min}(\Phi_d) = \min_{\lambda \in K_{k,t}} H_p(\lambda^2) = H_p(a, b, b, \cdots, b)$$

where $b = \frac{1-a}{k-1}$, $a = \varphi\!\left(\frac{1}{k}, t\right)$, and $\varphi$ is defined as
$$\varphi(s, t) = \begin{cases} s + t - 2st + 2\sqrt{st(1-s)(1-t)}, & s + t < 1 \\ 1, & s + t \ge 1 \end{cases}$$

We can see that the asymptotic minimal output entropy $\lim_{d \to \infty} H_p^{\min}(\Phi_d)$ is completely determined by the parameters $k \in \mathbb{N}$ and $t \in (0, 1)$. This is illustrated in the following figure, in which we have used the Shannon entropy, i.e. the limit p → 1.

[Figure 4.3.1: Asymptotic MOE of random quantum channels as a function of t (horizontal axis, 0.0 to 1.0), for k = 2, 10, 50, 100, 500.]
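A curve like the one in Figure 4.3.1 can be reproduced directly from Corollary 4.11. The following sketch (Python/matplotlib; the use of the Shannon limit p → 1 with the natural logarithm appears consistent with the figure but is our assumption, and the function names are ours) plots the asymptotic MOE as a function of t for several values of k.

```python
import numpy as np
import matplotlib.pyplot as plt

def phi(s, t):
    """phi(s, t) from Corollary 4.11."""
    if s + t >= 1:
        return 1.0
    return s + t - 2 * s * t + 2 * np.sqrt(s * t * (1 - s) * (1 - t))

def shannon(x):
    x = np.asarray(x, dtype=float)
    x = x[x > 0]
    return -np.sum(x * np.log(x))          # natural log; the thesis does not fix the base

def asymptotic_moe(k, t):
    """Asymptotic MOE H_1(a, b, ..., b) with a = phi(1/k, t), b = (1 - a)/(k - 1)."""
    a = phi(1.0 / k, t)
    b = (1.0 - a) / (k - 1)
    return shannon([a] + [b] * (k - 1))

ts = np.linspace(0.01, 0.99, 99)
for k in (2, 10, 50, 100, 500):
    plt.plot(ts, [asymptotic_moe(k, t) for t in ts], label=f"k = {k}")
plt.xlabel("t"); plt.ylabel("Asymptotic MOE"); plt.legend(); plt.show()
```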

4.3.3 Spectrum of a Random Quantum Channel

In Definition 4.4, if we set k = n, then a quantum channel is a linear map on $M_n(\mathbb{C})$, and we are able to study its spectrum.
In fact, we have the following theorem, an immediate consequence of Corollary 2.8 in [22]; it gives us some information about the spectrum of quantum channels.

Theorem 4.12 Let A and B be unital C*-algebras and let $\Phi : A \to B$ be a positive map. Then
$$\|\Phi(a)\| \le \|\Phi(1)\|\, \|a\| \qquad (4.3.8)$$
for all a ∈ A.

Remember that, in our case, the norm in Inequality 4.3.8 is the one that makes $M_n(\mathbb{C})$ a C*-algebra, the so-called spectral norm; for an element of $M_n(\mathbb{C})$, the spectral norm equals its largest singular value.
On the other hand, a quantum channel $\Phi$ is trace preserving, which is equivalent to saying that its adjoint $\Phi^*$ with respect to the Hilbert–Schmidt inner product (defined by $\mathrm{Tr}[\Phi(X)^* Y] = \mathrm{Tr}[X^* \Phi^*(Y)]$) is unital, i.e. $\Phi^*(1) = 1$, where 1 is the identity matrix $I_n$; moreover $\Phi^*$ is again (completely) positive, and its eigenvalues are the complex conjugates of those of $\Phi$. Obviously $\|\Phi^*(1)\| = 1$.
Now, if $\lambda \in \mathbb{C}$ is an eigenvalue of $\Phi^*$ (since we are working on finite-dimensional spaces, we do not distinguish between spectrum and eigenvalues), we have $\Phi^*(X) = \lambda X$ for some $X \neq 0$ in $M_n(\mathbb{C})$. By Inequality 4.3.8 applied to $\Phi^*$, we get
$$|\lambda|\,\|X\| = \|\lambda X\| = \|\Phi^*(X)\| \le \|\Phi^*(1)\|\,\|X\| = \|X\|$$
so we conclude $|\lambda| \le 1$; since the eigenvalues of $\Phi$ are the conjugates of those of $\Phi^*$, the eigenvalues of a quantum channel also lie inside the closed unit disc of the complex plane. In particular, $\Phi^*(1) = 1$ tells us that 1 is always an eigenvalue of $\Phi^*$, and hence of any quantum channel $\Phi$.
Bruzda, Cappellini, Sommers and Życzkowski [29] introduced an ensemble of random quantum channels and presented an algorithm to sample from it; as a convention, we call it the BCSZ ensemble.

[Figure 4.3.2: Spectra of a random quantum channel Φ : M_n(C) → M_n(C) from the BCSZ ensemble, with auxiliary circles of radius 1 and 1/n.]

In Figure 4.3.2 we can observe that, apart from the leading eigenvalue 1, most of the eigenvalues lie on the real axis, while the remaining ones seem to be uniformly scattered inside the circle of radius 1/n; but this is only an observation, and little rigorous analysis has been developed. The gap between the first and second eigenvalues of a random quantum channel remains a very interesting topic.

González-Guillén et al. [49] consider a different probability distribution on the set of quantum channels, which is in fact the one we defined in Section 4.3.2, and they proved a lower bound on the difference between the first and second singular values of random quantum channels.
Now we deal with a slightly tangential problem: usually we want to obtain the spectrum of a given quantum channel numerically, but a quantum channel in the form of Proposition 4.5 is not directly an executable object for a computer. Fortunately, quantum channels are defined on finite-dimensional spaces, so we can rewrite them in matrix form, with the help of the following proposition:

Proposition 4.13 Let V be an n-dimensional vector space with a basis $\{e_i\}_{i \in I}$, and let $T \in L(V)$. Let $\hat{T} \in L(\mathbb{C}^n)$ be the $n \times n$ matrix generated by T and $\{e_i\}_{i \in I}$, i.e.
$$T(e_j) = \sum_{i \in I} \hat{T}_{ij}\, e_i, \quad \forall j \in I \qquad (4.3.9)$$

where $\hat{T}_{ij} \in \mathbb{C}$. Then T and $\hat{T}$ have the same spectrum.

Proof. Suppose there exist a non-zero element v ∈ V and λ ∈ C such that

T(v) = λv (4.3.10)

Expanding v as $v = \sum_{i \in I} v_i e_i$ and substituting into 4.3.10, we get
$$T\Big(\sum_{i \in I} v_i e_i\Big) = \lambda \sum_{k \in I} v_k e_k
\;\Rightarrow\; \sum_{i \in I} v_i\, T(e_i) = \lambda \sum_{k \in I} v_k e_k
\;\Rightarrow\; \sum_{i \in I} v_i \big(\hat{T}_{1i} e_1 + \cdots + \hat{T}_{ni} e_n\big) = \lambda \sum_{k \in I} v_k e_k
\;\Rightarrow\; \sum_{k \in I} \Big(\sum_{i \in I} v_i \hat{T}_{ki}\Big) e_k = \lambda \sum_{k \in I} v_k e_k$$
which is equivalent to the matrix identity

$$\hat{T}\hat{v} = \lambda \hat{v}$$

where $\hat{v} = (v_1, \cdots, v_n) \in \mathbb{C}^n$. Observe that $v \neq 0$ implies $\hat{v} \neq 0$, so λ is also an eigenvalue of $\hat{T}$. All the steps can be reversed to prove the other direction. □

In this way, for a quantum channel $\Phi : M_n(\mathbb{C}) \to M_n(\mathbb{C})$, we choose the orthonormal basis $\{E_{ij}\}$, where $E_{ij}$ is the matrix with entry 1 at position (i, j) and 0 elsewhere, order $\{E_{ij}\}$ lexicographically, and write Φ as

an $n^2 \times n^2$ matrix, whose spectrum can then be computed immediately. In particular, for a linear operator $T : M_n(\mathbb{C}) \to M_n(\mathbb{C})$ we can explicitly calculate its matrix representation $\hat{T}$, as defined in 4.3.9, by
$$\hat{T}_{ij} = \mathrm{Tr}\big[F_i^{\dagger}\, T(F_j)\big] \qquad (4.3.11)$$
where $(F_i)_{i \in I}$ is an arbitrary orthonormal basis of $M_n(\mathbb{C})$ (with respect to the Hilbert–Schmidt inner product). This is equivalent to verifying that, for every j ∈ I,

$$\Big\| T(F_j) - \sum_{i \in I} \mathrm{Tr}\big[F_i^{\dagger} T(F_j)\big]\, F_i \Big\|_2 = 0$$
where $\|\cdot\|_2$ is the Frobenius norm. This equality is easy to prove: expanding the square of the left-hand side, one obtains $\|T(F_j)\|_2^2 - \sum_{i \in I} \big|\langle F_i, T(F_j)\rangle\big|^2$, which vanishes by Parseval's identity.
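Formula 4.3.11 translates directly into code. The sketch below (NumPy; the helper name channel_matrix and the example channel, a random mixture of unitary conjugations, are ours and not taken from the thesis) builds the $n^2 \times n^2$ matrix of a linear map on $M_n(\mathbb{C})$ in the lexicographically ordered basis $\{E_{ij}\}$ and computes its eigenvalues; for a quantum channel the spectral radius comes out equal to 1, in agreement with the discussion above.

```python
import numpy as np

def channel_matrix(phi, n):
    """n^2 x n^2 matrix of a linear map phi on M_n(C), via (4.3.11) with the
    lexicographically ordered matrix units E_ij (orthonormal for Tr[A^dag B])."""
    basis = []
    for i in range(n):
        for j in range(n):
            E = np.zeros((n, n), dtype=complex); E[i, j] = 1.0
            basis.append(E)
    M = np.zeros((n * n, n * n), dtype=complex)
    for b, Fb in enumerate(basis):
        out = phi(Fb)
        for a, Fa in enumerate(basis):
            M[a, b] = np.trace(Fa.conj().T @ out)   # hat{T}_ab = Tr[F_a^dag T(F_b)]
    return M

# Example channel (for illustration only): a random mixture of unitary conjugations,
# Phi(X) = sum_a p_a U_a X U_a^dag, which is completely positive and trace preserving.
rng = np.random.default_rng(1)
n, m = 4, 3
ps = rng.dirichlet(np.ones(m))
Us = [np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))[0]
      for _ in range(m)]
phi = lambda X: sum(p * U @ X @ U.conj().T for p, U in zip(ps, Us))
eigs = np.linalg.eigvals(channel_matrix(phi, n))
print(np.max(np.abs(eigs)))   # spectral radius is 1, consistent with |lambda| <= 1
```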

4.3.4 Quantum Markov Semigroup and Lindblad Equation

In this section we turn to the dynamics of a Markovian open quantum system, which can be modelled by a semigroup of completely positive operators together with its infinitesimal generator. Conventionally, this structure is called a quantum Markov semigroup; we will give its formal definition later, after first introducing some general facts about semigroups of bounded linear operators.

Definition 4.14 (Semigroup of operators) Let X be a Banach space and let $\{T_t\}_{t \ge 0}$ be a collection of operators with $T_t \in B(X)$ for all t ≥ 0 such that

1. $T_0 = I$,

2. $T_{s+t} = T_s T_t$ for all s ≥ 0 and t ≥ 0;

then we say $\{T_t\}_{t \ge 0}$ is a semigroup of operators.

In particular, we often assume that

$$\lim_{t \to 0} \|T_t x - x\| = 0, \quad \forall x \in X \qquad (4.3.12)$$

Borrowing the terminology of Definition 2.45, we can restate 4.3.12 as $T_t \to I$ when t → 0 in the strong operator topology, and in this case we say $\{T_t\}_{t \ge 0}$ is a strongly continuous semigroup.
Observe that every continuous complex-valued function satisfying $f(s+t) = f(s)f(t)$ has the form $f(t) = \exp(at)$, and f is determined by the number $a = f'(0)$. By analogy, we associate with $\{T_t\}_{t \ge 0}$ the operators $L_{\epsilon}$ defined, for fixed x ∈ X and ε > 0, by
$$L_{\epsilon} x = \frac{T_{\epsilon} x - x}{\epsilon}$$
and we define
$$Lx := \lim_{\epsilon \to 0} L_{\epsilon} x \qquad (4.3.13)$$
for all x ∈ D(L), that is, for all x for which the limit 4.3.13 exists in the norm topology. Notice that D(L) is a subspace of X and that L is a linear operator defined on D(L). The operator L, which is essentially $T_0'$, is called the infinitesimal generator of the semigroup $\{T_t\}_{t \ge 0}$. If $\{T_t\}_{t \ge 0}$ is strongly continuous, then D(L) is dense in X [8].
If we strengthen the continuity assumption, for instance if we suppose $\lim_{t \to 0} \|T_t - I\| = 0$, i.e. $T_t \to I$ in the operator norm topology (sometimes called the uniform topology), we have the following stronger result [8]:

Theorem 4.15 If {Tt}t≥0 is a semigroup of operators on a Banach space X, then the following three conditions are equivalent:

1. $\lim_{t \to 0} \|T_t - I\| = 0$,

2. D(L) = X,

3. L ∈ B(X) and $T_t = e^{tL}$.
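For a bounded generator, condition 3 can be checked numerically. The short sketch below (Python/SciPy; the toy generator is an arbitrary matrix of our choosing) builds $T_t = e^{tL}$ and verifies the semigroup property of Definition 4.14.

```python
import numpy as np
from scipy.linalg import expm

# A toy bounded generator L on X = C^3 (any matrix will do for illustration).
rng = np.random.default_rng(2)
L = rng.standard_normal((3, 3))

def T(t):
    """Uniformly continuous semigroup T_t = exp(tL) generated by the bounded L."""
    return expm(t * L)

# Semigroup property T_{s+t} = T_s T_t and T_0 = I, checked numerically.
s, t = 0.3, 0.7
print(np.allclose(T(s + t), T(s) @ T(t)))   # True up to round-off
print(np.allclose(T(0.0), np.eye(3)))       # True
```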

Now, based on this brief introduction to semigroups of operators, we can give the definition of a quantum Markov semigroup. Rigorously speaking, the object defined below is a uniformly continuous quantum Markov semigroup.

Definition 4.16 (Quantum Markov semigroup) Let A be a von Neumann algebra of bounded linear operators acting on some Hilbert space H, and let $\{T_t\}_{t \ge 0}$ be a collection of bounded linear operators on A. Then we call $\{T_t\}_{t \ge 0}$ a quantum Markov semigroup if

1. $T_0 = I$,

2. (Markovianity) $T_t(1) = 1$ for all t ≥ 0,

3. (Complete positivity) $T_t$ is completely positive for all t ≥ 0,

4. (Semigroup) $T_{s+t} = T_s T_t$ for all s ≥ 0 and t ≥ 0,

5. $T_t$ is σ-weakly continuous for all t ≥ 0,

6. (Uniform continuity) $\lim_{t \to 0} \|T_t - I\| = 0$.

In the general setting, the uniform continuity of Condition 6 is relaxed to σ-weak continuity; in that case one must also add some assumption on the continuity of the map $t \mapsto T_t(a)$ for every fixed a ∈ A. This is not necessary here, since from Condition 6 the continuity of $t \mapsto T_t(a)$ follows easily. For more details on the general theory of quantum Markov semigroups we refer to [19].
In 1976, Lindblad [6] gave a characterization of the infinitesimal generator of a quantum Markov semigroup, which is a bounded linear operator on A, as we have seen from Theorem 4.15. The following theorem is of vital importance in the theory of open quantum systems. We shall call L the Lindblad operator.

Theorem 4.17 (Lindblad) $\{T_t\}_{t \ge 0}$ is a quantum Markov semigroup if and only if its infinitesimal generator L has the form
$$L(a) = i[H, a] + \sum_j \left( V_j^* a V_j - \frac{1}{2}\{V_j^* V_j, a\} \right) \qquad (4.3.14)$$
where $V_j \in B(H)$, $\sum_j V_j^* V_j \in B(H)$, $H = H^* \in B(H)$; here $[\,\cdot\,,\,\cdot\,]$ denotes the commutator and $\{\,\cdot\,,\,\cdot\,\}$ the anticommutator.
As in Section 4.3.1, we are sometimes more interested in the finite-dimensional case; as shown in [21]:
Corollary 4.18 Keep all the settings as in Theorem 4.17, except that now H is a finite-dimensional Hilbert space of dimension N. Then $\{T_t\}_{t \ge 0}$ is a quantum Markov semigroup if and only if its infinitesimal generator L has the form

$$L(\rho) = L_H(\rho) + L_D(\rho) = -i[H, \rho] + \sum_{m,n=1}^{N^2-1} K_{mn} \left( F_n \rho F_m^{\dagger} - \frac{1}{2}\{F_m^{\dagger} F_n, \rho\} \right) \qquad (4.3.15)$$
Notice that now $B(H) \cong M_N(\mathbb{C})$, and in 4.3.15: $\{F_n\}$ is a collection of traceless matrices in $M_N(\mathbb{C})$, orthonormal with respect to the Hilbert–Schmidt inner product; $K = (K_{mn})$ is a positive semidefinite matrix, called the Kossakowski matrix; and the Hamiltonian $H = H^{\dagger} \in M_N(\mathbb{C})$ plays the same role as before.

As a convention, we call $L_H$ the Hamiltonian part and $L_D$ the dissipative part. Moreover, by diagonalizing the Kossakowski matrix, one can bring $L_D$ into the following form (which is in fact the finite-dimensional version of Expression 4.3.14):

$$L_D(\rho) = \sum_{n=1}^{N^2-1} \gamma_n \left( V_n \rho V_n^{\dagger} - \frac{1}{2}\{V_n^{\dagger} V_n, \rho\} \right) \qquad (4.3.16)$$

where the $\gamma_n$ are the eigenvalues of the Kossakowski matrix, and $V_n \in M_N(\mathbb{C})$.
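The finite-dimensional generator 4.3.16 is easy to realize numerically. The following sketch (NumPy; the helper names lindbladian and superop_matrix, and the two-level example with dephasing and decay, are ours) builds L as a map on $M_N(\mathbb{C})$, converts it to an $N^2 \times N^2$ matrix via 4.3.11, and computes its spectrum; the eigenvalues have non-positive real parts, with 0 always among them.

```python
import numpy as np

def lindbladian(H, jumps, gammas):
    """Generator (4.3.16): L(rho) = -i[H, rho]
    + sum_n gamma_n (V_n rho V_n^dag - 1/2 {V_n^dag V_n, rho})."""
    def L(rho):
        out = -1j * (H @ rho - rho @ H)
        for g, V in zip(gammas, jumps):
            VdV = V.conj().T @ V
            out = out + g * (V @ rho @ V.conj().T - 0.5 * (VdV @ rho + rho @ VdV))
        return out
    return L

def superop_matrix(L, N):
    """N^2 x N^2 matrix of L via (4.3.11) with the matrix-unit basis E_ij."""
    basis = [np.eye(1, N * N, i).reshape(N, N).astype(complex) for i in range(N * N)]
    M = np.zeros((N * N, N * N), dtype=complex)
    for b, Fb in enumerate(basis):
        out = L(Fb)
        for a, Fa in enumerate(basis):
            M[a, b] = np.trace(Fa.conj().T @ out)
    return M

# Tiny example (N = 2): dephasing plus decay with a toy Hamiltonian.
sz = np.array([[1, 0], [0, -1]], dtype=complex)
sm = np.array([[0, 0], [1, 0]], dtype=complex)       # lowering operator
L = lindbladian(0.5 * sz, [sz, sm], [0.3, 1.0])
eigs = np.linalg.eigvals(superop_matrix(L, 2))
print(np.round(np.sort(eigs.real), 6))               # real parts <= 0; one eigenvalue is 0
```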

4.3.5 Random Matrix Model of the Lindbladian

In Section 4.3.3 we studied the spectra of random quantum channels and observed some interesting phenomena. On the other hand, Section 4.3.4 tells us that in the finite-dimensional case the Lindblad operator L can be represented as Expression 4.3.15, where $\{F_n\}$ can be chosen as orthonormal traceless matrices. In particular, we may choose as $\{F_n\}$ the generators of the special unitary group SU(N), the Lie group of N × N unitary matrices with determinant 1. Moreover, K can easily be sampled as a Wishart matrix. We have therefore found a way to sample a random Lindblad operator, and by Proposition 4.13 we can simulate its spectrum numerically. In this section we study the properties of the spectra of random Lindblad operators.
When dealing with numerical issues, one has to consider the complexity of the algorithms. Unfortunately, our attempt to compute the spectrum of 4.3.15 using Formula 4.3.9 or Formula 4.3.11 shows that the brute-force algorithm is quite slow: in our experiment it takes around five minutes to compute the spectrum of one random Lindblad operator with N = 10. The bottleneck is computing the matrix representation of the Lindblad operator.
Denisov et al. [50] also report that a single-core implementation of such a sampling produces a single realization for N = 100 on the time scale of several days. They developed some optimizations and used parallel computation to accelerate the calculation; their work was performed on the Lomonosov supercomputer (Moscow State University) and the MPIPKS cluster (Dresden). In short, it is a painful task to obtain the spectrum of a random Lindblad operator when N is large.
Recall, however, that in the chapter on random matrix theory we observed that the empirical spectral distributions of a sequence of matrices tend to a limit spectral distribution. Does this also happen for the Lindblad operator? The answer is positive. In fact, Denisov et al. gave the following result:

Proposition 4.19 The random matrix model of the Lindblad operator is
$$\underbrace{i\alpha\,(H \otimes 1 - 1 \otimes H)}_{\text{Hamiltonian part}} + \underbrace{G - 1 \otimes 1 - (C \otimes 1 + 1 \otimes C)}_{\text{Dissipative part}} \qquad (4.3.17)$$
Here H denotes an N × N Hermitian random matrix sampled from the Gaussian unitary ensemble, representing a generic Hamiltonian, and α is the weight of the Hamiltonian part. G is sampled from the real Ginibre ensemble of size $N^2 \times N^2$, and C from the Gaussian orthogonal ensemble of size N × N.

Here "random matrix model" means that Expression 4.3.17 has the same limit spectral distribution as the large random Lindblad operator 4.3.15 (after truncation of the origin point and rescaling). The proof of this proposition can be found in the supplemental material of [50], in which the free central limit theorem

has been used. This nice result not only allows us to quickly visualize the limit spectral distribution of the random Lindbladian, but also makes it easier to compute the boundary of the spectrum in the asymptotic case.

[Figure 4.3.3: Spectra of random Lindbladians L of dimension N = 50, approximated by the random matrix model 4.3.17, under different weights α.]

Figure 4.3.3 can be plotted in several minutes; the main bottleneck is no longer computing the matrix representation of the random Lindbladian, but computing the eigenvalues of a large matrix. Notice that in the plot we have removed the shift component 1 ⊗ 1 from 4.3.17, so the spectra are centered. From Figure 4.3.3, we believe that the spectrum of the dissipative part $L_D$ of a large-dimensional Lindbladian L is lemon-shaped. As the weight of the Hamiltonian part $L_H$ increases, the spectrum of L contracts slightly along the real axis, and expands quickly along the imaginary axis.
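A plot in the spirit of Figure 4.3.3 can be obtained from the random matrix model 4.3.17. In the sketch below (NumPy/matplotlib) the ensemble normalizations, which the text does not specify, are our guesses, and the shift 1 ⊗ 1 is dropped so that the spectrum is centred.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)

def gue(N, rng):
    """GUE matrix (one common normalization; the thesis does not fix one)."""
    A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
    return (A + A.conj().T) / (2 * np.sqrt(N))

def goe(N, rng):
    A = rng.standard_normal((N, N))
    return (A + A.T) / (2 * np.sqrt(N))

def ginibre_real(M, rng):
    return rng.standard_normal((M, M)) / np.sqrt(M)

def lindblad_model(N, alpha, rng):
    """The random matrix model (4.3.17), with the shift 1 (x) 1 dropped so the
    spectrum is centred, as in Figure 4.3.3. Normalizations are our assumption."""
    H, C, G = gue(N, rng), goe(N, rng), ginibre_real(N * N, rng)
    I = np.eye(N)
    ham = 1j * alpha * (np.kron(H, I) - np.kron(I, H))
    dis = G - (np.kron(C, I) + np.kron(I, C))
    return ham + dis

eigs = np.linalg.eigvals(lindblad_model(50, 1.5, rng))
plt.scatter(eigs.real, eigs.imag, s=2)
plt.xlabel("Re"); plt.ylabel("Im"); plt.show()
```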

Chapter 5

Conclusions and Future Development

Several applications of random matrix theory in telecommunication and quantum systems have been presented in Chapter 4, from which one can appreciate the importance of random matrix theory. However, the applications we have displayed are just the tip of the iceberg. Apart from telecommunication engineering and quantum physics, there are many other fields in which random matrix theory has been applied successfully. Here we select some typical examples.
In Statistics, the Wishart matrix, a frequently appearing object, arises as the distribution of the sample covariance matrix of a sample from a multivariate normal distribution. Random matrix theory is therefore one possible tool for large-dimensional data analysis, and it has received growing attention among statisticians in recent decades.
In Number Theory, the distribution of the zeros of the Riemann zeta function, and of other L-functions, is modelled by the distribution of eigenvalues of certain random matrices. Bogomolny and Keating [13][15] applied RMT to the study of the Riemann zeros in 1995 and 1996; at the same time, Rudnick and Sarnak [17] applied it to the case of L-functions.
Recently, many applications of random matrix theory have appeared even in Machine Learning. For example, Pennington and Worah [48] demonstrated that the pointwise nonlinearities of neural networks can be incorporated into the moments method, thus removing the barrier to applying random matrix theory directly to neural networks.
On the other hand, although 80 years have passed since the beginning of the study of random matrices, and numerous new papers are still published every year, there remain open problems to be solved, for instance the one mentioned in Section 3.4.3, and many others that can be found in [27]. There are also fields where random matrix theory could still be applied. For instance, in 1966 Gustafson [4] introduced antieigenvalue theory, which is applicable to

Numerical Analysis, Statistics, etc. We may then ask ourselves: what about the distribution of the antieigenvalues of random matrices? Some simulations and observations of the limit distribution have been carried out by our colleague F. Ferrari [43] in his master thesis, in which he also introduced the Growth Curve Model [3] to show the necessity of this study. However, the theoretical side has not been developed yet, and this could be an interesting research objective.

Bibliography

[1] Francis J. Murray and John von Neumann. "On rings of operators. II". In: Transactions of the American Mathematical Society 41.2 (1937), pp. 208–248.
[2] Eugene P. Wigner. "On the distribution of the roots of certain symmetric matrices". In: Annals of Mathematics 67.2 (1958), pp. 325–327.
[3] Richard F. Potthoff and S. N. Roy. "A generalized multivariate analysis of variance model useful especially for growth curve problems". In: Biometrika 51.3-4 (1964), pp. 313–326.
[4] Karl Gustafson. "The angle of an operator and positive operator products". In: Bulletin of the American Mathematical Society 74.3 (1968), pp. 488–492.
[5] Leonid Andreevich Pastur. "On the spectrum of random matrices". In: Teoreticheskaya i Matematicheskaya Fizika 10.1 (1972), pp. 102–112.
[6] Goran Lindblad. "On the generators of quantum dynamical semigroups". In: Communications in Mathematical Physics 48.2 (1976), pp. 119–130.
[7] Dag Jonsson. "Some limit theorems for the eigenvalues of a sample covariance matrix". In: Journal of Multivariate Analysis 12.1 (1982), pp. 1–38.
[8] Walter Rudin. Functional analysis. Internat. Ser. Pure Appl. Math. 1991.
[9] Dan Voiculescu. "Limit laws for random matrices and free products". In: Inventiones Mathematicae 104.1 (1991), pp. 201–220.
[10] Dan V. Voiculescu, Ken J. Dykema, and Alexandru Nica. Free random variables. Vol. 1. American Mathematical Society, 1992.
[11] Z. D. Bai et al. "Convergence rate of expected spectral distributions of large random matrices. Part I. Wigner matrices". In: The Annals of Probability 21.2 (1993), pp. 625–648.
[12] Z. D. Bai et al. "Convergence rate of expected spectral distributions of large random matrices. Part II. Sample covariance matrices". In: The Annals of Probability 21.2 (1993), pp. 649–672.
[13] Eugene B. Bogomolny and Jon P. Keating. "Random matrix theory and the Riemann zeros. I. Three- and four-point correlations". In: Nonlinearity 8.6 (1995), p. 1115.
[14] Jack W. Silverstein and Z. D. Bai. "On the empirical distribution of eigenvalues of a class of large dimensional random matrices". In: Journal of Multivariate Analysis 54.2 (1995), pp. 175–192.
[15] Eugene B. Bogomolny and Jonathan P. Keating. "Random matrix theory and the Riemann zeros II: n-point correlations". In: Nonlinearity 9.4 (1996), p. 911.
[16] Bernard Picinbono. "Second-order complex random vectors and normal distributions". In: IEEE Transactions on Signal Processing 44.10 (1996), pp. 2637–2640.
[17] Zeév Rudnick and Peter Sarnak. "Zeros of principal L-functions and random matrix theory". In: Duke Mathematical Journal 81.2 (1996), pp. 269–322.
[18] Sergio Verdú. Multiuser detection. Cambridge University Press, 1998.
[19] Franco Fagnola. "Quantum Markov semigroups and quantum flows". In: Proyecciones 18.3 (1999), pp. 1–144.
[20] Sergio Verdú and Shlomo Shamai. "Spectral efficiency of CDMA with random spreading". In: IEEE Transactions on Information Theory 45.2 (1999), pp. 622–640.
[21] Heinz-Peter Breuer and Francesco Petruccione. The theory of open quantum systems. Oxford University Press, 2002.
[22] Vern Paulsen. Completely bounded maps and operator algebras. Vol. 78. Cambridge University Press, 2002.
[23] Reinhard F. Werner and Alexander S. Holevo. "Counterexample to an additivity conjecture for output purity of quantum channels". In: Journal of Mathematical Physics 43.9 (2002), pp. 4353–4357.
[24] Antonia M. Tulino and Sergio Verdú. "Random matrix theory and wireless communications". In: Foundations and Trends in Communications and Information Theory 1.1 (2004), pp. 1–182.
[25] Patrizia Berti, Luca Pratelli, and Pietro Rigo. "Almost sure weak convergence of random probability measures". In: Stochastics and Stochastics Reports 78.2 (2006), pp. 91–97.
[26] Luigi Ambrosio, Nicola Gigli, and Giuseppe Savaré. Gradient flows: in metric spaces and in the space of probability measures. Springer Science & Business Media, 2008.
[27] Percy Deift. "Some open problems in random matrix theory and the theory of integrable systems". In: Contemporary Mathematics 458 (2008), p. 419.
[28] Terence Tao and Van Vu. "Random matrices: the circular law". In: Communications in Contemporary Mathematics 10.02 (2008), pp. 261–307.
[29] Wojciech Bruzda et al. "Random quantum operations". In: Physics Letters A 373.3 (2009), pp. 320–324.
[30] Greg W. Anderson, Alice Guionnet, and Ofer Zeitouni. An introduction to random matrices. Vol. 118. Cambridge University Press, 2010.
[31] Zhidong Bai and Jack W. Silverstein. Spectral analysis of large dimensional random matrices. Vol. 20. Springer, 2010.
[32] Andrzej Grudka, Michał Horodecki, and Łukasz Pankowski. "Constructive counterexamples to the additivity of the minimum output Rényi entropy of quantum channels for all p > 2". In: Journal of Physics A: Mathematical and Theoretical 43.42 (2010), p. 425304.
[33] Benoît Collins and Ion Nechita. "Random quantum channels II: Entanglement of random subspaces, Rényi entropy estimates and additivity problems". In: Advances in Mathematics 226.2 (2011), pp. 1181–1201.
[34] Serban Belinschi, Benoît Collins, and Ion Nechita. "Eigenvectors and eigenvalues in a random subspace of a tensor product". In: Inventiones Mathematicae 190.3 (2012), pp. 647–697.
[35] Thomas M. Cover and Joy A. Thomas. Elements of information theory. John Wiley & Sons, 2012.
[36] Jean Jacod and Philip Protter. Probability essentials. Springer Science & Business Media, 2012.
[37] Terence Tao. Topics in random matrix theory. Vol. 132. American Mathematical Society, 2012.
[38] Patrick Billingsley. Convergence of probability measures. John Wiley & Sons, 2013.
[39] Robert Schatten. Norm ideals of completely continuous operators. Vol. 27. Springer-Verlag, 2013.
[40] Vaughan F. R. Jones. "Von Neumann algebras". 2015.
[41] Serban T. Belinschi, Benoît Collins, and Ion Nechita. "Almost one bit violation for the additivity of the minimum output entropy". In: Communications in Mathematical Physics 341.3 (2016), pp. 885–909.
[42] Benoît Collins and Ion Nechita. "Random matrix techniques in quantum information theory". In: Journal of Mathematical Physics 57.1 (2016), p. 015215.
[43] Fabio Ferrari. "Antieigenvalues and sample covariance matrices". Master thesis, 2016. URL: http://hdl.handle.net/10589/123744.
[44] Guowang Miao et al. Fundamentals of mobile data networks. Cambridge University Press, 2016.
[45] Sandro Salsa. Partial differential equations in action: from modelling to theory. Vol. 99. Springer, 2016.
[46] Amos Lapidoth. A foundation in digital communication. Cambridge University Press, 2017.
[47] James A. Mingo and Roland Speicher. Free probability and random matrices. Vol. 35. Springer, 2017.
[48] Jeffrey Pennington and Pratik Worah. "Nonlinear random matrix theory for deep learning". In: Advances in Neural Information Processing Systems. 2017, pp. 2637–2646.
[49] Carlos E. González-Guillén, Marius Junge, and Ion Nechita. "On the spectral gap of random quantum channels". In: arXiv preprint arXiv:1811.08847 (2018).
[50] Sergey Denisov et al. "Universal spectra of random Lindblad operators". In: Physical Review Letters 123.14 (2019), p. 140403.
[51] Roland Speicher. "Lecture notes on free probability theory". In: arXiv preprint arXiv:1908.08125 (2019).
[52] Wikipedia contributors. MIMO — Wikipedia, The Free Encyclopedia. 2019. URL: https://en.wikipedia.org/w/index.php?title=MIMO&oldid=923129758.
[53] Wikipedia contributors. Open quantum system — Wikipedia, The Free Encyclopedia. 2019. URL: https://en.wikipedia.org/w/index.php?title=Open_quantum_system&oldid=929489775.
[54] Wikipedia contributors. Von Neumann algebra — Wikipedia, The Free Encyclopedia. 2020. URL: https://en.wikipedia.org/w/index.php?title=Von_Neumann_algebra&oldid=935662315.
