Introduction to Matrix Analysis and Applications
Fumio Hiai and Dénes Petz
Graduate School of Information Sciences Tohoku University, Aoba-ku, Sendai, 980-8579, Japan E-mail: [email protected]
Alfréd Rényi Institute of Mathematics, Reáltanoda utca 13-15, H-1364 Budapest, Hungary. E-mail: [email protected]
Preface
A part of the material of this book is based on the lectures of the authors in the Graduate School of Information Sciences of Tohoku University and in the Budapest University of Technology and Economics. The aim of the lectures was to explain certain important topics of matrix analysis from the point of view of functional analysis. The concept of a Hilbert space appears many times, but only finite-dimensional spaces are used. The book treats some aspects of analysis related to matrices, including such topics as matrix monotone functions, matrix means, majorization, entropies and quantum Markov triplets. There are several popular matrix applications in quantum theory.

The book is organized into seven chapters. Chapters 1-3 form an introductory part of the book and could be used as a textbook for an advanced undergraduate special topics course. The word "matrix" appeared in 1848 and applications arose in many different areas. Chapters 4-7 contain a number of more advanced and less known topics. They could be used for an advanced specialized graduate-level course aimed at students who will specialize in quantum information. But the best use for this part is as a reference for active researchers in the field of quantum information theory. Researchers in statistics, engineering and economics may also find this book useful.

Chapter 1 contains the basic subjects. We prefer the Hilbert space concepts, so complex numbers are used. Spectrum and eigenvalues are important. Determinant and trace are used later in several applications. The tensor product has symmetric and antisymmetric subspaces. In this book "positive" means $\ge 0$; the word "non-negative" is not used here. The end of the chapter contains many exercises.

Chapter 2 contains block-matrices, partial ordering and an elementary theory of von Neumann algebras in the finite-dimensional setting. The Hilbert space concept requires the projections $P = P^2 = P^*$. Self-adjoint matrices are linear combinations of projections. Not only single matrices are required; subalgebras are also used. The material includes Kadison's inequality and completely positive mappings.

Chapter 3 contains the matrix functional calculus. The functional calculus provides a new matrix $f(A)$ when a matrix $A$ and a function $f$ are given. This is an essential tool in matrix theory as well as in operator theory. A typical example is the exponential function $e^A = \sum_{n=0}^{\infty} A^n/n!$. If $f$ is sufficiently smooth, then $f(A)$ is also smooth and we have a useful Fréchet differential formula.

Chapter 4 contains matrix monotone functions. A real function defined on an interval is matrix monotone if $A \le B$ implies $f(A) \le f(B)$ for Hermitian matrices $A, B$ whose eigenvalues are in the domain interval. We have a beautiful theory on such functions, initiated by Löwner in 1934. A highlight is the integral expression of such functions. Matrix convex functions are also considered. Graduate students in mathematics and in information theory will benefit from a single source for all of this material.

Chapter 5 contains matrix (operator) means for positive matrices. Matrix extensions of the arithmetic mean $(a+b)/2$ and the harmonic mean
$$\left(\frac{a^{-1} + b^{-1}}{2}\right)^{-1}$$
are rather trivial; however, it is non-trivial to define the matrix version of the geometric mean $\sqrt{ab}$. This was first done by Pusz and Woronowicz. A general theory of matrix means developed by Kubo and Ando is closely related to operator monotone functions on $(0, \infty)$. There are also more complicated means. The mean transformation $M(A, B) := m(L_A, R_B)$ is a mean of the left-multiplication $L_A$ and the right-multiplication $R_B$ recently studied by Hiai and Kosaki. Another concept is a multivariable extension of two-variable matrix means.

Chapter 6 contains majorizations for eigenvalues and singular values of matrices. Majorization is a certain order relation between two real vectors. Section 6.1 recalls classical material that is available from other sources. There are several famous majorizations for matrices which have strong applications to matrix norm inequalities in symmetric norms. For instance, an extremely useful inequality is the Lidskii-Wielandt theorem.

The last chapter contains topics related to quantum applications. Positive matrices with trace 1 are the states in quantum theories and they are also called density matrices. The relative entropy appeared in 1962 and matrix theory has many applications in the quantum formalism. Unknown quantum states can be identified by the use of positive operators $F(x)$ with
$\sum_x F(x) = I$. This is called a POVM; there are a few mathematical results here, but in quantum theory there are many more relevant subjects. These subjects are close to the authors and there are some very recent results.

The authors thank several colleagues for useful communications; Professor Tsuyoshi Ando made several remarks.
Fumio Hiai and Dénes Petz
April, 2013

Contents
1 Fundamentals of operators and matrices
  1.1 Basics on matrices
  1.2 Hilbert space
  1.3 Jordan canonical form
  1.4 Spectrum and eigenvalues
  1.5 Trace and determinant
  1.6 Positivity and absolute value
  1.7 Tensor product
  1.8 Notes and remarks
  1.9 Exercises

2 Mappings and algebras
  2.1 Block-matrices
  2.2 Partial ordering
  2.3 Projections
  2.4 Subalgebras
  2.5 Kernel functions
  2.6 Positivity preserving mappings
  2.7 Notes and remarks
  2.8 Exercises

3 Functional calculus and derivation
  3.1 The exponential function
  3.2 Other functions
  3.3 Derivation
  3.4 Fréchet derivatives
  3.5 Notes and remarks
  3.6 Exercises

4 Matrix monotone functions and convexity
  4.1 Some examples of functions
  4.2 Convexity
  4.3 Pick functions
  4.4 Löwner's theorem
  4.5 Some applications
  4.6 Notes and remarks
  4.7 Exercises

5 Matrix means and inequalities
  5.1 The geometric mean
  5.2 General theory
  5.3 Mean examples
  5.4 Mean transformation
  5.5 Notes and remarks
  5.6 Exercises

6 Majorization and singular values
  6.1 Majorization of vectors
  6.2 Singular values
  6.3 Symmetric norms
  6.4 More majorizations for matrices
  6.5 Notes and remarks
  6.6 Exercises

7 Some applications
  7.1 Gaussian Markov property
  7.2 Entropies and monotonicity
  7.3 Quantum Markov triplets
  7.4 Optimal quantum measurements
  7.5 Cramér-Rao inequality
  7.6 Notes and remarks
  7.7 Exercises

Index

Bibliography
Chapter 1
Fundamentals of operators and matrices
A linear mapping is essentially a matrix if the vector space is finite dimensional. In this book the vector space is typically a finite-dimensional complex Hilbert space.
1.1 Basics on matrices
For $n, m \in \mathbb{N}$, $M_{n \times m} = M_{n \times m}(\mathbb{C})$ denotes the space of all $n \times m$ complex matrices. A matrix $M \in M_{n \times m}$ is a mapping $\{1, 2, \dots, n\} \times \{1, 2, \dots, m\} \to \mathbb{C}$. It is represented as an array with $n$ rows and $m$ columns:
$$M = \begin{pmatrix} m_{11} & m_{12} & \cdots & m_{1m} \\ m_{21} & m_{22} & \cdots & m_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ m_{n1} & m_{n2} & \cdots & m_{nm} \end{pmatrix}.$$
Here $m_{ij}$ is the entry at the intersection of the $i$th row and the $j$th column. If the matrix is denoted by $M$, then this entry is denoted by $M_{ij}$. If $n = m$, then we write $M_n$ instead of $M_{n \times n}$. A simple example is the identity matrix $I_n \in M_n$ defined as $m_{ij} = \delta_{i,j}$, or
$$I_n = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix}.$$
$M_{n \times m}$ is a complex vector space of dimension $nm$. The linear operations are defined as follows:
$$[\lambda A]_{ij} := \lambda A_{ij}, \qquad [A + B]_{ij} := A_{ij} + B_{ij},$$
where $\lambda$ is a complex number and $A, B \in M_{n \times m}$.

Example 1.1 For $i, j = 1, \dots, n$ let $E(ij)$ be the $n \times n$ matrix such that the $(i,j)$-entry equals one and all other entries equal zero. The $E(ij)$ are called matrix-units and form a basis of $M_n$:
$$A = \sum_{i,j=1}^{n} A_{ij}\, E(ij).$$
Furthermore,
$$I_n = \sum_{i=1}^{n} E(ii).$$

If $A \in M_{n \times m}$ and $B \in M_{m \times k}$, then the product $AB$ of $A$ and $B$ is defined by
$$[AB]_{ij} = \sum_{\ell=1}^{m} A_{i\ell} B_{\ell j},$$
where $1 \le i \le n$ and $1 \le j \le k$. Hence $AB \in M_{n \times k}$. So $M_n$ becomes an algebra. The most significant feature of matrices is the non-commutativity of the product: $AB \ne BA$ in general. For example,
$$\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}\begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \qquad \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}.$$
In the matrix algebra $M_n$, the identity matrix $I_n$ behaves as a unit: $I_n A = A I_n = A$ for every $A \in M_n$. The matrix $A \in M_n$ is invertible if there is a $B \in M_n$ such that $AB = BA = I_n$. This $B$ is called the inverse of $A$, in notation $A^{-1}$.
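As a quick numerical illustration of these algebraic rules (a sketch added here, assuming Python with NumPy; the original text contains no code), one can check the non-commutativity example above and the defining property of the inverse:

```python
import numpy as np

A = np.array([[0, 1], [0, 0]])
B = np.array([[0, 0], [1, 0]])
print(A @ B)   # [[1, 0], [0, 0]]
print(B @ A)   # [[0, 0], [0, 1]] -- AB != BA

C = np.array([[1.0, 2.0], [3.0, 4.0]])
Cinv = np.linalg.inv(C)
print(np.allclose(C @ Cinv, np.eye(2)))  # True: C C^{-1} = I_2
print(np.allclose(Cinv @ C, np.eye(2)))  # True: C^{-1} C = I_2
```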
Example 1.2 The linear equations
$$ax + by = u$$
$$cx + dy = v$$
can be written in a matrix formalism:
$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} u \\ v \end{pmatrix}.$$
If $x$ and $y$ are the unknown parameters and the coefficient matrix is invertible, then the solution is
$$\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} a & b \\ c & d \end{pmatrix}^{-1}\begin{pmatrix} u \\ v \end{pmatrix}.$$
So the solution of linear equations is based on the inverse matrix, whose entries are described in Theorem 1.33.
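A brief numerical sketch of this example (assuming NumPy; the concrete coefficients below are chosen only for illustration):

```python
import numpy as np

# the system 2x + y = 5, x + 3y = 10 in matrix form
M = np.array([[2.0, 1.0], [1.0, 3.0]])
rhs = np.array([5.0, 10.0])

sol = np.linalg.solve(M, rhs)                      # direct solver
print(sol)                                         # [1. 3.]
print(np.allclose(np.linalg.inv(M) @ rhs, sol))    # the inverse-matrix route agrees
```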
The transpose $A^t$ of the matrix $A \in M_{n \times m}$ is an $m \times n$ matrix,
$$[A^t]_{ij} = A_{ji} \qquad (1 \le i \le m,\ 1 \le j \le n).$$
It is easy to see that if the product $AB$ is defined, then $(AB)^t = B^t A^t$. The adjoint matrix $A^*$ is the complex conjugate of the transpose $A^t$. The space $M_n$ is a *-algebra:
$$(AB)C = A(BC), \qquad (A + B)C = AC + BC, \qquad A(B + C) = AB + AC,$$
$$(A + B)^* = A^* + B^*, \qquad (\lambda A)^* = \bar{\lambda} A^*, \qquad (A^*)^* = A, \qquad (AB)^* = B^* A^*.$$
Let $A \in M_n$. The trace of $A$ is the sum of the diagonal entries:
$$\mathrm{Tr}\, A := \sum_{i=1}^{n} A_{ii}. \qquad (1.1)$$
It is easy to show that $\mathrm{Tr}\, AB = \mathrm{Tr}\, BA$, see Theorem 1.28. The determinant of $A \in M_n$ is slightly more complicated:
$$\det A := \sum_{\pi} (-1)^{\sigma(\pi)} A_{1\pi(1)} A_{2\pi(2)} \cdots A_{n\pi(n)}, \qquad (1.2)$$
where the sum is over all permutations $\pi$ of the set $\{1, 2, \dots, n\}$ and $\sigma(\pi)$ is the parity of the permutation $\pi$. Therefore
$$\det \begin{pmatrix} a & b \\ c & d \end{pmatrix} = ad - bc,$$
and another example is the following:
$$\det \begin{pmatrix} A_{11} & A_{12} & A_{13} \\ A_{21} & A_{22} & A_{23} \\ A_{31} & A_{32} & A_{33} \end{pmatrix} = A_{11} \det \begin{pmatrix} A_{22} & A_{23} \\ A_{32} & A_{33} \end{pmatrix} - A_{12} \det \begin{pmatrix} A_{21} & A_{23} \\ A_{31} & A_{33} \end{pmatrix} + A_{13} \det \begin{pmatrix} A_{21} & A_{22} \\ A_{31} & A_{32} \end{pmatrix}.$$
It can be proven that
$$\det(AB) = (\det A)(\det B). \qquad (1.3)$$
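The two basic identities $\mathrm{Tr}\, AB = \mathrm{Tr}\, BA$ and (1.3) are easy to test numerically (an illustrative sketch, assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))
B = rng.normal(size=(3, 3))

print(np.isclose(np.trace(A @ B), np.trace(B @ A)))           # Tr AB = Tr BA
print(np.isclose(np.linalg.det(A @ B),
                 np.linalg.det(A) * np.linalg.det(B)))        # det(AB) = det A det B
```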
1.2 Hilbert space

Let $\mathcal{H}$ be a complex vector space. A functional $\langle\,\cdot\,,\,\cdot\,\rangle : \mathcal{H} \times \mathcal{H} \to \mathbb{C}$ of two variables is called an inner product if
(1) $\langle x + y, z\rangle = \langle x, z\rangle + \langle y, z\rangle$  ($x, y, z \in \mathcal{H}$),

(2) $\langle \lambda x, y\rangle = \bar{\lambda}\, \langle x, y\rangle$  ($\lambda \in \mathbb{C}$, $x, y \in \mathcal{H}$),

(3) $\langle x, y\rangle = \overline{\langle y, x\rangle}$  ($x, y \in \mathcal{H}$),

(4) $\langle x, x\rangle \ge 0$ for every $x \in \mathcal{H}$ and $\langle x, x\rangle = 0$ only for $x = 0$.

Condition (2) states that the inner product is conjugate linear in the first variable (and it is linear in the second variable). The Schwarz inequality
$$|\langle x, y\rangle|^2 \le \langle x, x\rangle\, \langle y, y\rangle \qquad (1.4)$$
holds. The inner product determines a norm for the vectors:
$$\|x\| := \sqrt{\langle x, x\rangle}. \qquad (1.5)$$
This has the properties
$$\|x + y\| \le \|x\| + \|y\| \qquad \text{and} \qquad |\langle x, y\rangle| \le \|x\|\, \|y\|.$$
$\|x\|$ is interpreted as the length of the vector $x$. A further requirement in the definition of a Hilbert space is that every Cauchy sequence must be convergent, that is, the space is complete. (In the finite-dimensional case, completeness always holds.) The linear space $\mathbb{C}^n$ of all $n$-tuples of complex numbers becomes a Hilbert space with the inner product
$$\langle x, y\rangle = \sum_{i=1}^{n} \overline{x_i}\, y_i = [\bar{x}_1, \bar{x}_2, \dots, \bar{x}_n] \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix},$$
where $\bar{z}$ denotes the complex conjugate of the complex number $z \in \mathbb{C}$. Another example is the space of square integrable complex-valued functions on the real Euclidean space $\mathbb{R}^n$. If $f$ and $g$ are such functions, then
$$\langle f, g\rangle = \int_{\mathbb{R}^n} \overline{f(x)}\, g(x)\, dx$$
gives the inner product. The latter space is denoted by $L^2(\mathbb{R}^n)$ and it is infinite dimensional, contrary to the $n$-dimensional space $\mathbb{C}^n$. Below we are mostly satisfied with finite dimensional spaces.

If $\langle x, y\rangle = 0$ for vectors $x$ and $y$ of a Hilbert space, then $x$ and $y$ are called orthogonal, in notation $x \perp y$. When $H \subset \mathcal{H}$, then $H^{\perp} := \{x \in \mathcal{H} : x \perp h$ for every $h \in H\}$. For any subset $H \subset \mathcal{H}$ the orthogonal complement $H^{\perp}$ is a closed subspace.

A family $\{e_i\}$ of vectors is called orthonormal if $\langle e_i, e_i\rangle = 1$ and $\langle e_i, e_j\rangle = 0$ if $i \ne j$. A maximal orthonormal system is called a basis or orthonormal basis. The cardinality of a basis is called the dimension of the Hilbert space. (The cardinality of any two bases is the same.) In the space $\mathbb{C}^n$, the standard orthonormal basis consists of the vectors
$$\delta_1 = (1, 0, \dots, 0),\quad \delta_2 = (0, 1, 0, \dots, 0),\quad \dots,\quad \delta_n = (0, 0, \dots, 0, 1); \qquad (1.6)$$
each vector has the coordinate 0 exactly $n-1$ times and one coordinate equal to 1.
Example 1.3 The space $M_n$ of matrices becomes a Hilbert space with the inner product
$$\langle A, B\rangle = \mathrm{Tr}\, A^* B, \qquad (1.7)$$
which is called the Hilbert-Schmidt inner product. The matrix units $E(ij)$ ($1 \le i, j \le n$) form an orthonormal basis. It follows that the Hilbert-Schmidt norm
$$\|A\|_2 := \sqrt{\langle A, A\rangle} = \sqrt{\mathrm{Tr}\, A^* A} = \Bigg(\sum_{i,j=1}^{n} |A_{ij}|^2\Bigg)^{1/2} \qquad (1.8)$$
is a norm for the matrices.
Assume that in an $n$-dimensional Hilbert space linearly independent vectors $\{v_1, v_2, \dots, v_n\}$ are given. By the Gram-Schmidt procedure an orthonormal basis can be obtained by linear combinations:
$$e_1 := \frac{1}{\|v_1\|}\, v_1,$$
$$e_2 := \frac{1}{\|w_2\|}\, w_2 \quad \text{with} \quad w_2 := v_2 - \langle e_1, v_2\rangle e_1,$$
$$e_3 := \frac{1}{\|w_3\|}\, w_3 \quad \text{with} \quad w_3 := v_3 - \langle e_1, v_3\rangle e_1 - \langle e_2, v_3\rangle e_2,$$
$$\vdots$$
$$e_n := \frac{1}{\|w_n\|}\, w_n \quad \text{with} \quad w_n := v_n - \langle e_1, v_n\rangle e_1 - \dots - \langle e_{n-1}, v_n\rangle e_{n-1}.$$
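The procedure translates directly into code. The sketch below is an illustrative addition (assuming NumPy; np.vdot conjugates its first argument, matching the convention for $\langle\cdot,\cdot\rangle$ used here):

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize linearly independent vectors (classical Gram-Schmidt)."""
    basis = []
    for v in vectors:
        w = v.astype(complex)
        for e in basis:
            w = w - np.vdot(e, v) * e          # subtract <e_i, v> e_i
        basis.append(w / np.linalg.norm(w))    # normalize w_k
    return basis

vs = [np.array([1.0, 1.0, 0.0]),
      np.array([1.0, 0.0, 1.0]),
      np.array([0.0, 1.0, 1.0])]
es = gram_schmidt(vs)
G = np.array([[np.vdot(a, b) for b in es] for a in es])
print(np.allclose(G, np.eye(3)))   # True: the e_i are orthonormal
```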
The next theorem states that any vector has a unique Fourier expansion.
Theorem 1.4 Let $e_1, e_2, \dots$ be a basis in a Hilbert space $\mathcal{H}$. Then for any vector $x \in \mathcal{H}$ the expansion
$$x = \sum_n \langle e_n, x\rangle\, e_n$$
holds. Moreover,
$$\|x\|^2 = \sum_n |\langle e_n, x\rangle|^2.$$

Let $\mathcal{H}$ and $\mathcal{K}$ be Hilbert spaces. A mapping $A : \mathcal{H} \to \mathcal{K}$ is called linear if it preserves linear combinations:
$$A(\lambda f + \mu g) = \lambda Af + \mu Ag \qquad (f, g \in \mathcal{H},\ \lambda, \mu \in \mathbb{C}).$$
The kernel and the range of $A$ are
$$\ker A := \{x \in \mathcal{H} : Ax = 0\}, \qquad \mathrm{ran}\, A := \{Ax \in \mathcal{K} : x \in \mathcal{H}\}.$$
The dimension formula familiar in linear algebra is
$$\dim \mathcal{H} = \dim(\ker A) + \dim(\mathrm{ran}\, A). \qquad (1.9)$$
The quantity $\dim(\mathrm{ran}\, A)$ is called the rank of $A$, with notation $\mathrm{rank}\, A$. It is easy to see that $\mathrm{rank}\, A \le \min\{\dim \mathcal{H}, \dim \mathcal{K}\}$.

Let $e_1, e_2, \dots, e_n$ be a basis of the Hilbert space $\mathcal{H}$ and $f_1, f_2, \dots, f_m$ be a basis of $\mathcal{K}$. The linear mapping $A : \mathcal{H} \to \mathcal{K}$ is determined by the vectors $Ae_j$, $j = 1, 2, \dots, n$. Furthermore, the vector $Ae_j$ is determined by its coordinates:
$$Ae_j = c_{1,j} f_1 + c_{2,j} f_2 + \dots + c_{m,j} f_m.$$
The numbers $c_{i,j}$, $1 \le i \le m$, $1 \le j \le n$, form an $m \times n$ matrix; it is called the matrix of the linear transformation $A$ with respect to the bases $(e_1, e_2, \dots, e_n)$ and $(f_1, f_2, \dots, f_m)$. If we want to distinguish the linear operator $A$ from its matrix, then the latter one will be denoted by $[A]$. We have
$$[A]_{ij} = \langle f_i, Ae_j\rangle \qquad (1 \le i \le m,\ 1 \le j \le n).$$
Note that the order of the basis vectors is important. We shall mostly consider linear operators of a Hilbert space into itself. Then only one basis is needed and the matrix of the operator is square. So a linear transformation and a basis yield a matrix. If an $n \times n$ matrix is given, then it can always be considered as a linear transformation of the space $\mathbb{C}^n$ endowed with the standard basis (1.6).

The inner product of the vectors $|x\rangle$ and $|y\rangle$ will often be denoted as $\langle x|y\rangle$; this notation, sometimes called bra and ket, is popular in physics. On the other hand, $|x\rangle\langle y|$ is a linear operator which acts on the vector $|z\rangle$ as
$$(|x\rangle\langle y|)\, |z\rangle := |x\rangle\, \langle y|z\rangle \equiv \langle y|z\rangle\, |x\rangle.$$
Therefore,
$$|x\rangle\langle y| = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} [\bar{y}_1, \bar{y}_2, \dots, \bar{y}_n]$$
is conjugate linear in $|y\rangle$, while $\langle x|y\rangle$ is linear in $|y\rangle$. The next example shows the possible use of the bra and ket.
Example 1.5 If $X, Y \in M_n(\mathbb{C})$, then
$$\sum_{i,j=1}^{n} \mathrm{Tr}\, E(ij)\, X\, E(ji)\, Y = (\mathrm{Tr}\, X)(\mathrm{Tr}\, Y). \qquad (1.10)$$
Since both sides are bilinear in the variables $X$ and $Y$, it is enough to check the case $X = E(ab)$ and $Y = E(cd)$. A simple computation gives that the left-hand side is $\delta_{ab}\delta_{cd}$ and this is the same as the right-hand side. Another possibility is to use the formula $E(ij) = |e_i\rangle\langle e_j|$. So
$$\sum_{i,j} \mathrm{Tr}\, E(ij)\, X\, E(ji)\, Y = \sum_{i,j} \mathrm{Tr}\, |e_i\rangle\langle e_j|\, X\, |e_j\rangle\langle e_i|\, Y = \sum_{j} \langle e_j| X |e_j\rangle \sum_{i} \langle e_i| Y |e_i\rangle$$
and the right-hand side is $(\mathrm{Tr}\, X)(\mathrm{Tr}\, Y)$.
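Identity (1.10) can also be checked numerically by looping over the matrix units (a sketch added here, assuming NumPy):

```python
import numpy as np

n = 3
rng = np.random.default_rng(1)
X = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
Y = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))

total = 0
for i in range(n):
    for j in range(n):
        E = np.zeros((n, n)); E[i, j] = 1          # matrix unit E(ij)
        total += np.trace(E @ X @ E.T @ Y)         # E(ji) = E(ij)^t
print(np.isclose(total, np.trace(X) * np.trace(Y)))  # True
```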
Example 1.6 Fix a natural number $n$ and let $\mathcal{H}$ be the space of polynomials of degree at most $n$. Assume that the variable of these polynomials is $t$ and the coefficients are complex numbers. The typical elements are
$$p(t) = \sum_{i=0}^{n} u_i t^i \qquad \text{and} \qquad q(t) = \sum_{i=0}^{n} v_i t^i.$$
If their inner product is defined as
$$\langle p(t), q(t)\rangle := \sum_{i=0}^{n} \overline{u_i}\, v_i,$$
then $\{1, t, t^2, \dots, t^n\}$ is an orthonormal basis.

Differentiation is a linear operator:
$$\sum_{k=0}^{n} u_k t^k \mapsto \sum_{k=1}^{n} k u_k t^{k-1}.$$
In the above basis, its matrix is
$$\begin{pmatrix} 0 & 1 & 0 & \cdots & 0 & 0 \\ 0 & 0 & 2 & \cdots & 0 & 0 \\ 0 & 0 & 0 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 0 & n \\ 0 & 0 & 0 & \cdots & 0 & 0 \end{pmatrix}. \qquad (1.11)$$
This is an upper triangular matrix: the $(i,j)$ entry is 0 if $i > j$.
Let $\mathcal{H}_1$, $\mathcal{H}_2$ and $\mathcal{H}_3$ be Hilbert spaces and let us fix a basis in each of them. If $B : \mathcal{H}_1 \to \mathcal{H}_2$ and $A : \mathcal{H}_2 \to \mathcal{H}_3$ are linear mappings, then the composition
$$f \mapsto A(Bf) \in \mathcal{H}_3 \qquad (f \in \mathcal{H}_1)$$
is linear as well and it is denoted by $AB$. The matrix $[AB]$ of the composition $AB$ can be computed from the matrices $[A]$ and $[B]$ as follows:
$$[AB]_{ij} = \sum_{k} [A]_{ik} [B]_{kj}. \qquad (1.12)$$
The right-hand side is defined to be the product $[A][B]$ of the matrices $[A]$ and $[B]$. Then $[AB] = [A][B]$ holds. It is obvious that for a $k \times m$ matrix $[A]$ and an $m \times n$ matrix $[B]$, their product $[A][B]$ is a $k \times n$ matrix.

Let $\mathcal{H}_1$ and $\mathcal{H}_2$ be Hilbert spaces and let us fix a basis in each of them. If $A, B : \mathcal{H}_1 \to \mathcal{H}_2$ are linear mappings, then their linear combination $\lambda A + \mu B$,
$$f \mapsto \lambda(Af) + \mu(Bf),$$
is a linear mapping and
$$[\lambda A + \mu B]_{ij} = \lambda [A]_{ij} + \mu [B]_{ij}. \qquad (1.13)$$
Let $\mathcal{H}$ be a Hilbert space. The linear operators $\mathcal{H} \to \mathcal{H}$ form an algebra. This algebra $B(\mathcal{H})$ has a unit, the identity operator denoted by $I$, and the product is non-commutative. Assume that $\mathcal{H}$ is $n$-dimensional and fix a basis. Then to each linear operator $A \in B(\mathcal{H})$ an $n \times n$ matrix $[A]$ is associated. The correspondence $A \mapsto [A]$ is an algebraic isomorphism from $B(\mathcal{H})$ to the algebra $M_n(\mathbb{C})$ of $n \times n$ matrices. This isomorphism shows that the theory of linear operators on an $n$-dimensional Hilbert space is the same as the theory of $n \times n$ matrices.

Theorem 1.7 (Riesz-Fischer theorem) Let $\varphi : \mathcal{H} \to \mathbb{C}$ be a linear mapping on a finite dimensional Hilbert space $\mathcal{H}$. Then there is a unique vector $v \in \mathcal{H}$ such that $\varphi(x) = \langle v, x\rangle$ for every vector $x \in \mathcal{H}$.
Proof: Let $e_1, e_2, \dots, e_n$ be an orthonormal basis in $\mathcal{H}$. Then we need a vector $v \in \mathcal{H}$ such that $\varphi(e_i) = \langle v, e_i\rangle$. So
$$v = \sum_{i} \overline{\varphi(e_i)}\, e_i$$
will satisfy the condition.

The linear mappings $\varphi : \mathcal{H} \to \mathbb{C}$ are called functionals. If the Hilbert space is not finite dimensional, then in the previous theorem the condition $|\varphi(x)| \le c \|x\|$ should be added, where $c$ is a positive number.

The operator norm of a linear operator $A : \mathcal{H} \to \mathcal{K}$ is defined as
$$\|A\| := \sup\{\|Ax\| : x \in \mathcal{H},\ \|x\| = 1\}.$$
It can be shown that $\|A\|$ is finite. In addition to the common properties $\|A + B\| \le \|A\| + \|B\|$ and $\|\lambda A\| = |\lambda|\, \|A\|$, the submultiplicativity
$$\|AB\| \le \|A\|\, \|B\|$$
also holds. If $\|A\| \le 1$, then the operator $A$ is called a contraction.

The set of linear operators $\mathcal{H} \to \mathcal{H}$ is denoted by $B(\mathcal{H})$. The convergence $A_n \to A$ means $\|A - A_n\| \to 0$. In the case of a finite dimensional Hilbert space the norm here can be the operator norm, but also the Hilbert-Schmidt norm. The operator norm of a matrix is not expressed explicitly by the matrix entries.
Example 1.8 Let $A \in B(\mathcal{H})$ and $\|A\| < 1$. Then $I - A$ is invertible and
$$(I - A)^{-1} = \sum_{n=0}^{\infty} A^n.$$
Since
$$(I - A) \sum_{n=0}^{N} A^n = I - A^{N+1} \qquad \text{and} \qquad \|A^{N+1}\| \le \|A\|^{N+1},$$
we can see that the limit of the first equation is
$$(I - A) \sum_{n=0}^{\infty} A^n = I.$$
This shows the statement, which is called the Neumann series.
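A numerical sketch of the Neumann series (illustrative only; assumes NumPy, and the infinite series is truncated at a finite number of terms):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(4, 4))
A = 0.4 * A / np.linalg.norm(A, 2)    # rescale so that the operator norm is 0.4 < 1

S = np.zeros_like(A)
term = np.eye(4)
for _ in range(200):                   # partial sum of the Neumann series
    S += term
    term = term @ A                    # next power A^n
print(np.allclose(S, np.linalg.inv(np.eye(4) - A)))   # True
```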
Let $\mathcal{H}$ and $\mathcal{K}$ be Hilbert spaces. If $T : \mathcal{H} \to \mathcal{K}$ is a linear operator, then its adjoint $T^* : \mathcal{K} \to \mathcal{H}$ is determined by the formula
$$\langle x, Ty\rangle_{\mathcal{K}} = \langle T^* x, y\rangle_{\mathcal{H}} \qquad (x \in \mathcal{K},\ y \in \mathcal{H}). \qquad (1.14)$$
The operator $T \in B(\mathcal{H})$ is called self-adjoint if $T^* = T$. The operator $T$ is self-adjoint if and only if $\langle x, Tx\rangle$ is a real number for every vector $x \in \mathcal{H}$. For self-adjoint operators and matrices the notations $B(\mathcal{H})^{\mathrm{sa}}$ and $M_n^{\mathrm{sa}}$ are used.

Theorem 1.9 The properties of the adjoint:
(1) $(A + B)^* = A^* + B^*$, $(\lambda A)^* = \bar{\lambda} A^*$ ($\lambda \in \mathbb{C}$),

(2) $(A^*)^* = A$, $(AB)^* = B^* A^*$,

(3) $(A^{-1})^* = (A^*)^{-1}$ if $A$ is invertible,

(4) $\|A\| = \|A^*\|$, $\|A^* A\| = \|A\|^2$.
Example 1.10 Let $A : \mathcal{H} \to \mathcal{H}$ be a linear mapping and $e_1, e_2, \dots, e_n$ be a basis in the Hilbert space $\mathcal{H}$. The $(i,j)$ element of the matrix of $A$ is $\langle e_i, Ae_j\rangle$. Since
$$\langle e_i, Ae_j\rangle = \overline{\langle e_j, A^* e_i\rangle},$$
this is the complex conjugate of the $(j,i)$ element of the matrix of $A^*$.

If $A$ is self-adjoint, then the $(i,j)$ element of the matrix of $A$ is the conjugate of the $(j,i)$ element. In particular, all diagonal entries are real. Self-adjoint matrices are also called Hermitian matrices.
Theorem 1.11 (Projection theorem) Let $\mathcal{M}$ be a closed subspace of a Hilbert space $\mathcal{H}$. Any vector $x \in \mathcal{H}$ can be written in a unique way in the form $x = x_0 + y$, where $x_0 \in \mathcal{M}$ and $y \perp \mathcal{M}$.
Note that a subspace of a finite dimensional Hilbert space is always closed.
The mapping $P : x \mapsto x_0$ defined in the context of the previous theorem is called the orthogonal projection onto the subspace $\mathcal{M}$. This mapping is linear:
$$P(\lambda x + \mu y) = \lambda Px + \mu Py.$$
Moreover, $P^2 = P = P^*$. The converse is also true: if $P^2 = P = P^*$, then $P$ is an orthogonal projection (onto its range).
Example 1.12 The matrix $A \in M_n$ is self-adjoint if $A_{ji} = \overline{A_{ij}}$. A particular example is the self-adjoint Toeplitz matrix:
$$\begin{pmatrix} a_1 & a_2 & a_3 & \cdots & a_{n-1} & a_n \\ \bar{a}_2 & a_1 & a_2 & \cdots & a_{n-2} & a_{n-1} \\ \bar{a}_3 & \bar{a}_2 & a_1 & \cdots & a_{n-3} & a_{n-2} \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ \bar{a}_{n-1} & \bar{a}_{n-2} & \bar{a}_{n-3} & \cdots & a_1 & a_2 \\ \bar{a}_n & \bar{a}_{n-1} & \bar{a}_{n-2} & \cdots & \bar{a}_2 & a_1 \end{pmatrix}, \qquad (1.15)$$
where $a_1 \in \mathbb{R}$.

An operator $U \in B(\mathcal{H})$ is called unitary if $U^*$ is the inverse of $U$. Then $U^* U = I$ and
$$\langle x, y\rangle = \langle U^* U x, y\rangle = \langle Ux, Uy\rangle$$
for any vectors $x, y \in \mathcal{H}$. Therefore the unitary operators preserve the inner product. In particular, orthogonal unit vectors are mapped into orthogonal unit vectors.
Example 1.13 The permutation matrices are simple unitaries. Let $\pi$ be a permutation of the set $\{1, 2, \dots, n\}$. The $A_{i, \pi(i)}$ entries of $A \in M_n(\mathbb{C})$ are 1 and all others are 0. Every row and every column contain exactly one entry 1. If such a matrix $A$ is applied to a vector, it permutes the coordinates:
$$\begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} x_2 \\ x_3 \\ x_1 \end{pmatrix}.$$
This shows the reason for the terminology. Another possible formalism is $A(x_1, x_2, x_3) = (x_2, x_3, x_1)$.
An operator $A \in B(\mathcal{H})$ is called normal if $AA^* = A^* A$. It follows immediately that
$$\|Ax\| = \|A^* x\| \qquad (1.16)$$
for any vector $x \in \mathcal{H}$. Self-adjoint and unitary operators are normal.

The operators we need are mostly linear, but sometimes conjugate-linear operators appear. $\Lambda : \mathcal{H} \to \mathcal{K}$ is conjugate-linear if
$$\Lambda(\lambda x + \mu y) = \bar{\lambda}\, \Lambda x + \bar{\mu}\, \Lambda y$$
for any complex numbers $\lambda$ and $\mu$ and for any vectors $x, y \in \mathcal{H}$. The adjoint $\Lambda^*$ of the conjugate-linear operator $\Lambda$ is determined by the equation
$$\langle x, \Lambda y\rangle_{\mathcal{K}} = \langle y, \Lambda^* x\rangle_{\mathcal{H}} \qquad (x \in \mathcal{K},\ y \in \mathcal{H}). \qquad (1.17)$$

A mapping $\varphi : \mathcal{H} \times \mathcal{H} \to \mathbb{C}$ is called a complex bilinear form if $\varphi$ is linear in the second variable and conjugate linear in the first variable. The inner product is a particular example.
Theorem 1.14 On a finite dimensional Hilbert space there is a one-to-one correspondence $\varphi(x, y) = \langle Ax, y\rangle$ between the complex bilinear forms $\varphi : \mathcal{H} \times \mathcal{H} \to \mathbb{C}$ and the linear operators $A : \mathcal{H} \to \mathcal{H}$.

Proof: Fix $x \in \mathcal{H}$. Then $y \mapsto \varphi(x, y)$ is a linear functional. Due to the Riesz-Fischer theorem $\varphi(x, y) = \langle z, y\rangle$ for a vector $z \in \mathcal{H}$. We set $Ax = z$.

The polarization identity
$$4\varphi(x, y) = \varphi(x+y, x+y) - \mathrm{i}\,\varphi(x + \mathrm{i}y, x + \mathrm{i}y) - \varphi(x-y, x-y) + \mathrm{i}\,\varphi(x - \mathrm{i}y, x - \mathrm{i}y) \qquad (1.18)$$
shows that a complex bilinear form $\varphi$ is determined by its so-called quadratic form $x \mapsto \varphi(x, x)$.

The $n \times n$ matrices $M_n$ can be identified with the linear operators $B(\mathcal{H})$ where the Hilbert space $\mathcal{H}$ is $n$-dimensional. To make a precise identification an orthonormal basis should be fixed in $\mathcal{H}$.
1.3 Jordan canonical form
A Jordan block is a matrix
$$J_k(a) = \begin{pmatrix} a & 1 & 0 & \cdots & 0 \\ 0 & a & 1 & \cdots & 0 \\ 0 & 0 & a & \ddots & \vdots \\ \vdots & \vdots & \vdots & \ddots & 1 \\ 0 & 0 & 0 & \cdots & a \end{pmatrix}, \qquad (1.19)$$
where $a \in \mathbb{C}$. This is an upper triangular matrix $J_k(a) \in M_k$. We also use the notation $J_k := J_k(0)$. Then
$$J_k(a) = a I_k + J_k \qquad (1.20)$$
and the sum consists of commuting matrices.
Example 1.15 The matrix $J_k$ is
$$(J_k)_{ij} = \begin{cases} 1 & \text{if } j = i+1, \\ 0 & \text{otherwise.} \end{cases}$$
Therefore
$$(J_k)_{ij} (J_k)_{jm} = \begin{cases} 1 & \text{if } j = i+1 \text{ and } m = i+2, \\ 0 & \text{otherwise.} \end{cases}$$
It follows that
$$(J_k^2)_{im} = \begin{cases} 1 & \text{if } m = i+2, \\ 0 & \text{otherwise.} \end{cases}$$
We observe that taking the powers of $J_k$, the line of the 1 entries moves upward; in particular $J_k^k = 0$. The matrices $\{J_k^m : 0 \le m \le k-1\}$ are linearly independent.

If $a \ne 0$, then $\det J_k(a) \ne 0$ and $J_k(a)$ is invertible. We can search for the inverse by the equation
$$(a I_k + J_k) \sum_{j=0}^{k-1} c_j J_k^j = I_k.$$
Rewriting this equation we get
$$a c_0 I_k + \sum_{j=1}^{k-1} (a c_j + c_{j-1}) J_k^j = I_k.$$
The solution is $c_j = (-1)^j a^{-j-1}$ $(0 \le j \le k-1)$. In particular,
$$\begin{pmatrix} a & 1 & 0 \\ 0 & a & 1 \\ 0 & 0 & a \end{pmatrix}^{-1} = \begin{pmatrix} a^{-1} & -a^{-2} & a^{-3} \\ 0 & a^{-1} & -a^{-2} \\ 0 & 0 & a^{-1} \end{pmatrix}.$$
Computation with a Jordan block is convenient.
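The coefficients $c_j$ can be tested directly (a sketch added for illustration, assuming NumPy):

```python
import numpy as np

a, k = 2.0, 4
N = np.diag(np.ones(k - 1), 1)                   # the nilpotent part J_k
J = a * np.eye(k) + N                            # the Jordan block J_k(a)
inv = sum((-1) ** j * a ** (-j - 1) * np.linalg.matrix_power(N, j)
          for j in range(k))                     # sum_j c_j J_k^j
print(np.allclose(J @ inv, np.eye(k)))           # True
```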
The Jordan canonical form theorem is the following.

Theorem 1.16 Given a matrix $X \in M_n$, there is an invertible matrix $S \in M_n$ such that
$$X = S \begin{pmatrix} J_{k_1}(\lambda_1) & 0 & \cdots & 0 \\ 0 & J_{k_2}(\lambda_2) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & J_{k_m}(\lambda_m) \end{pmatrix} S^{-1} = S J S^{-1},$$
where $k_1 + k_2 + \dots + k_m = n$. The Jordan matrix $J$ is uniquely determined (up to the permutation of the Jordan blocks in the diagonal).
Note that the numbers $\lambda_1, \lambda_2, \dots, \lambda_m$ are not necessarily different. Example 1.15 showed that it is rather easy to handle a Jordan block. If the Jordan canonical decomposition is known, then the inverse can be obtained. The theorem is about complex matrices.

Example 1.17 An essential application concerns the determinant. Since $\det X = \det(S J S^{-1}) = \det J$, it is enough to compute the determinant of the upper-triangular Jordan matrix $J$. Therefore
$$\det X = \prod_{j=1}^{m} \lambda_j^{k_j}. \qquad (1.21)$$
The characteristic polynomial of $X \in M_n$ is defined as
$$p(x) := \det(x I_n - X).$$
From the computation (1.21) we have
$$p(x) = \prod_{j=1}^{m} (x - \lambda_j)^{k_j} = x^n - \Bigg(\sum_{j=1}^{m} k_j \lambda_j\Bigg) x^{n-1} + \dots + (-1)^n \prod_{j=1}^{m} \lambda_j^{k_j}. \qquad (1.22)$$
The numbers $\lambda_j$ are roots of the characteristic polynomial.
The powers of a matrix $X \in M_n$ are well-defined. For a polynomial $p(x) = \sum_{k=0}^{m} c_k x^k$ the matrix $p(X)$ is
$$\sum_{k=0}^{m} c_k X^k.$$
If $q$ is a polynomial, then it is annihilating for a matrix $X \in M_n$ if $q(X) = 0$. The next result is the Cayley-Hamilton theorem.
Theorem 1.18 If $p$ is the characteristic polynomial of $X \in M_n$, then $p(X) = 0$.
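A quick numerical check of the Cayley-Hamilton theorem (a sketch, assuming NumPy; np.poly returns the coefficients of the characteristic polynomial, leading coefficient first):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(4, 4))
p = np.poly(X)                 # coefficients of det(xI - X)
pX = sum(c * np.linalg.matrix_power(X, 4 - i) for i, c in enumerate(p))
print(np.allclose(pX, np.zeros((4, 4))))   # True: p(X) = 0
```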
1.4 Spectrum and eigenvalues

Let $\mathcal{H}$ be a Hilbert space. For $A \in B(\mathcal{H})$ and $\lambda \in \mathbb{C}$, we say that $\lambda$ is an eigenvalue of $A$ if there is a non-zero vector $v \in \mathcal{H}$ such that $Av = \lambda v$. Such a vector $v$ is called an eigenvector of $A$ for the eigenvalue $\lambda$. If $\mathcal{H}$ is finite-dimensional, then $\lambda \in \mathbb{C}$ is an eigenvalue of $A$ if and only if $A - \lambda I$ is not invertible.

Generally, the spectrum $\sigma(A)$ of $A \in B(\mathcal{H})$ consists of the numbers $\lambda \in \mathbb{C}$ such that $A - \lambda I$ is not invertible. Therefore in the finite-dimensional case the spectrum is the set of eigenvalues.
Example 1.19 We show that $\sigma(AB) = \sigma(BA)$ for $A, B \in M_n$. It is enough to prove that $\det(\lambda I - AB) = \det(\lambda I - BA)$. Assume first that $A$ is invertible. We then have
$$\det(\lambda I - AB) = \det(A^{-1}(\lambda I - AB)A) = \det(\lambda I - BA)$$
and hence $\sigma(AB) = \sigma(BA)$.

When $A$ is not invertible, choose a sequence $\varepsilon_k \in \mathbb{C} \setminus \sigma(A)$ with $\varepsilon_k \to 0$ and set $A_k := A - \varepsilon_k I$. Then each $A_k$ is invertible and
$$\det(\lambda I - AB) = \lim_{k \to \infty} \det(\lambda I - A_k B) = \lim_{k \to \infty} \det(\lambda I - B A_k) = \det(\lambda I - BA).$$
(Another argument is in Exercise 3 of Chapter 2.)
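The equality of the two spectra is easy to observe numerically (an illustrative sketch, assuming NumPy; for generic random matrices the sorted eigenvalue lists agree up to rounding):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.normal(size=(5, 5))
B = rng.normal(size=(5, 5))
ev1 = np.sort_complex(np.linalg.eigvals(A @ B))
ev2 = np.sort_complex(np.linalg.eigvals(B @ A))
print(np.allclose(ev1, ev2))   # True: AB and BA have the same eigenvalues
```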
Example 1.20 In the history of matrix theory the particular matrix
$$\begin{pmatrix} 0 & 1 & 0 & \cdots & 0 & 0 \\ 1 & 0 & 1 & \cdots & 0 & 0 \\ 0 & 1 & 0 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 0 & 1 \\ 0 & 0 & 0 & \cdots & 1 & 0 \end{pmatrix} \qquad (1.23)$$
has importance. Its eigenvalues were computed by Joseph Louis Lagrange in 1759. He found that the eigenvalues are $2 \cos\big(j\pi/(n+1)\big)$ $(j = 1, 2, \dots, n)$.
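Lagrange's formula can be confirmed numerically (a sketch added here, assuming NumPy):

```python
import numpy as np

n = 6
A = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)   # matrix (1.23)
computed = np.sort(np.linalg.eigvalsh(A))
predicted = np.sort(2 * np.cos(np.arange(1, n + 1) * np.pi / (n + 1)))
print(np.allclose(computed, predicted))   # True
```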
The matrix (1.23) is tridiagonal. This means that $A_{ij} = 0$ if $|i - j| > 1$.

Example 1.21 Let $\lambda \in \mathbb{R}$ and consider the matrix
$$J_3(\lambda) = \begin{pmatrix} \lambda & 1 & 0 \\ 0 & \lambda & 1 \\ 0 & 0 & \lambda \end{pmatrix}. \qquad (1.24)$$
Now $\lambda$ is the only eigenvalue and the eigenvectors are the vectors $(y, 0, 0)$ with $y \ne 0$. The situation is similar in the $k \times k$ generalization $J_k(\lambda)$: $\lambda$ is the eigenvalue of $S J_k(\lambda) S^{-1}$ for an arbitrary invertible $S$ and there is one eigenvector (up to constant multiples). This has the consequence that the characteristic polynomial gives the eigenvalues without the multiplicity.
If X has the Jordan form as in Theorem 1.16, then all λj’s are eigenvalues. Therefore the roots of the characteristic polynomial are eigenvalues.
For the above J3(λ) we can see that
$$J_3(\lambda)(0, 0, 1) = (0, 1, \lambda), \qquad J_3(\lambda)^2 (0, 0, 1) = (1, 2\lambda, \lambda^2);$$
therefore $(0, 0, 1)$ and these two vectors linearly span the whole space $\mathbb{C}^3$. The vector $(0, 0, 1)$ is called a cyclic vector.

Assume that a matrix $X \in M_n$ has a cyclic vector $v \in \mathbb{C}^n$, which means that the set $\{v, Xv, X^2 v, \dots, X^{n-1} v\}$ spans $\mathbb{C}^n$. Then $X = S J_n(\lambda) S^{-1}$ with some invertible matrix $S$; the Jordan canonical form is very simple.
Theorem 1.22 Assume that $A \in B(\mathcal{H})$ is normal. Then there exist $\lambda_1, \dots, \lambda_n \in \mathbb{C}$ and $u_1, \dots, u_n \in \mathcal{H}$ such that $\{u_1, \dots, u_n\}$ is an orthonormal basis of $\mathcal{H}$ and $Au_i = \lambda_i u_i$ for all $1 \le i \le n$.

Proof: Let us prove by induction on $n = \dim \mathcal{H}$. The case $n = 1$ trivially holds. Suppose the assertion holds for dimension $n - 1$. Assume that $\dim \mathcal{H} = n$ and $A \in B(\mathcal{H})$ is normal. Choose a root $\lambda_1$ of $\det(\lambda I - A) = 0$. As explained before the theorem, $\lambda_1$ is an eigenvalue of $A$ so that there is an eigenvector $u_1$ with $Au_1 = \lambda_1 u_1$. One may assume that $u_1$ is a unit vector, i.e., $\|u_1\| = 1$. Since $A$ is normal, we have
$$(A - \lambda_1 I)^* (A - \lambda_1 I) = (A^* - \bar{\lambda}_1 I)(A - \lambda_1 I) = A^* A - \lambda_1 A^* - \bar{\lambda}_1 A + \lambda_1 \bar{\lambda}_1 I$$
$$= A A^* - \lambda_1 A^* - \bar{\lambda}_1 A + \lambda_1 \bar{\lambda}_1 I = (A - \lambda_1 I)(A - \lambda_1 I)^*,$$
that is, $A - \lambda_1 I$ is also normal. Therefore,
$$\|(A^* - \bar{\lambda}_1 I) u_1\| = \|(A - \lambda_1 I)^* u_1\| = \|(A - \lambda_1 I) u_1\| = 0,$$
so that $A^* u_1 = \bar{\lambda}_1 u_1$. Let $\mathcal{H}_1 := \{u_1\}^{\perp}$, the orthogonal complement of $\{u_1\}$. If $x \in \mathcal{H}_1$ then
$$\langle Ax, u_1\rangle = \langle x, A^* u_1\rangle = \langle x, \bar{\lambda}_1 u_1\rangle = \bar{\lambda}_1 \langle x, u_1\rangle = 0,$$
$$\langle A^* x, u_1\rangle = \langle x, A u_1\rangle = \langle x, \lambda_1 u_1\rangle = \lambda_1 \langle x, u_1\rangle = 0,$$
so that $Ax, A^* x \in \mathcal{H}_1$. Hence we have $A \mathcal{H}_1 \subset \mathcal{H}_1$ and $A^* \mathcal{H}_1 \subset \mathcal{H}_1$. So one can define $A_1 := A|_{\mathcal{H}_1} \in B(\mathcal{H}_1)$. Then $A_1^* = A^*|_{\mathcal{H}_1}$, which implies that $A_1$ is also normal. Since $\dim \mathcal{H}_1 = n - 1$, the induction hypothesis can be applied to obtain $\lambda_2, \dots, \lambda_n \in \mathbb{C}$ and $u_2, \dots, u_n \in \mathcal{H}_1$ such that $\{u_2, \dots, u_n\}$ is an orthonormal basis of $\mathcal{H}_1$ and $A_1 u_i = \lambda_i u_i$ for all $i = 2, \dots, n$. Then $\{u_1, u_2, \dots, u_n\}$ is an orthonormal basis of $\mathcal{H}$ and $Au_i = \lambda_i u_i$ for all $i = 1, 2, \dots, n$. Thus the assertion holds for dimension $n$ as well.

It is an important consequence that the matrix of a normal operator is diagonal in an appropriate orthonormal basis and the trace is the sum of the eigenvalues.
Theorem 1.23 Assume that $A \in B(\mathcal{H})$ is self-adjoint. If $Av = \lambda v$ and $Aw = \mu w$ with non-zero eigenvectors $v, w$ and the eigenvalues $\lambda$ and $\mu$ are different, then $v \perp w$ and $\lambda, \mu \in \mathbb{R}$.

Proof: First we show that the eigenvalues are real:
$$\bar{\lambda}\langle v, v\rangle = \langle \lambda v, v\rangle = \langle Av, v\rangle = \langle v, Av\rangle = \langle v, \lambda v\rangle = \lambda \langle v, v\rangle.$$
The $\langle v, w\rangle = 0$ orthogonality comes similarly:
$$\mu \langle v, w\rangle = \langle v, \mu w\rangle = \langle v, Aw\rangle = \langle Av, w\rangle = \langle \lambda v, w\rangle = \lambda \langle v, w\rangle,$$
using that $\lambda$ is real.
If $A$ is a self-adjoint operator on an $n$-dimensional Hilbert space, then from the eigenvectors we can find an orthonormal basis $v_1, v_2, \dots, v_n$. If $Av_i = \lambda_i v_i$, then
$$A = \sum_{i=1}^{n} \lambda_i\, |v_i\rangle\langle v_i|, \qquad (1.25)$$
which is called the Schmidt decomposition. The Schmidt decomposition is unique if all the eigenvalues are different, otherwise not. Another useful decomposition is the spectral decomposition. Assume that the self-adjoint operator $A$ has the eigenvalues $\mu_1 > \mu_2 > \dots > \mu_k$. Then
$$A = \sum_{j=1}^{k} \mu_j P_j, \qquad (1.26)$$
where $P_j$ is the orthogonal projection onto the subspace spanned by the eigenvectors with eigenvalue $\mu_j$. (From the Schmidt decomposition (1.25),
$$P_j = \sum_i |v_i\rangle\langle v_i|,$$
where the summation is over all $i$ such that $\lambda_i = \mu_j$.) This decomposition is always unique. Actually, the Schmidt decomposition and the spectral decomposition exist for all normal operators.

If $\lambda_i \ge 0$ in (1.25), then we can set $|x_i\rangle := \sqrt{\lambda_i}\, |v_i\rangle$ and we have
$$A = \sum_{i=1}^{n} |x_i\rangle\langle x_i|.$$
If the orthogonality of the vectors $|x_i\rangle$ is not assumed, then there are several similar decompositions, but they are connected by a unitary. The next lemma and its proof is a good exercise for the bra and ket formalism. (The result and the proof are due to Schrödinger [73].)
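Before the lemma, here is a numerical sketch of the Schmidt decomposition (1.25), added for illustration (assumes NumPy, whose eigh returns orthonormal eigenvectors of a self-adjoint matrix):

```python
import numpy as np

rng = np.random.default_rng(5)
B = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
A = (B + B.conj().T) / 2                     # a self-adjoint matrix
lam, v = np.linalg.eigh(A)                   # eigenvalues and orthonormal eigenvectors
# A = sum_i lam_i |v_i><v_i|
recon = sum(lam[i] * np.outer(v[:, i], v[:, i].conj()) for i in range(4))
print(np.allclose(recon, A))                 # True
```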
Lemma 1.24 If
$$A = \sum_{j=1}^{n} |x_j\rangle\langle x_j| = \sum_{i=1}^{n} |y_i\rangle\langle y_i|,$$
then there exists a unitary matrix $[U_{ij}]_{i,j=1}^{n}$ such that
$$\sum_{j=1}^{n} U_{ij}\, |x_j\rangle = |y_i\rangle. \qquad (1.27)$$

Proof: Assume first that the vectors $|x_j\rangle$ are orthogonal. Typically they are not unit vectors and several of them can be 0. Assume that $|x_1\rangle, |x_2\rangle, \dots, |x_k\rangle$ are not 0 and $|x_{k+1}\rangle = \dots = |x_n\rangle = 0$. Then the vectors $|y_i\rangle$ are in the linear span of $\{|x_j\rangle : 1 \le j \le k\}$, therefore
$$|y_i\rangle = \sum_{j=1}^{k} \frac{\langle x_j|y_i\rangle}{\langle x_j|x_j\rangle}\, |x_j\rangle$$
is the orthogonal expansion. We can define $[U_{ij}]$ by the formula
$$U_{ij} = \frac{\langle x_j|y_i\rangle}{\langle x_j|x_j\rangle} \qquad (1 \le i \le n,\ 1 \le j \le k).$$
We easily compute that
$$\sum_{i=1}^{n} \overline{U}_{it} U_{iu} = \sum_{i=1}^{n} \frac{\langle y_i|x_t\rangle\, \langle x_u|y_i\rangle}{\langle x_t|x_t\rangle\, \langle x_u|x_u\rangle} = \frac{\langle x_u| A |x_t\rangle}{\langle x_u|x_u\rangle\, \langle x_t|x_t\rangle} = \delta_{t,u},$$
and this relation shows that the $k$ column vectors of the matrix $[U_{ij}]$ are orthonormal. If $k < n$, then we can append further columns to get an $n \times n$ unitary, see Exercise 37. (One can see in (1.27) that if $|x_j\rangle = 0$, then $U_{ij}$ does not play any role.)

In the general situation
$$A = \sum_{j=1}^{n} |z_j\rangle\langle z_j| = \sum_{i=1}^{n} |y_i\rangle\langle y_i|$$
we can make a unitary $U$ from an orthogonal family to the $|y_i\rangle$'s and a unitary $V$ from the same orthogonal family to the $|z_i\rangle$'s, and $UV^*$ goes from the $|z_i\rangle$'s to the $|y_i\rangle$'s.
Example 1.25 Let $A \in B(\mathcal{H})$ be a self-adjoint operator with eigenvalues $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n$ (counted with multiplicity). Then
$$\lambda_1 = \max\{\langle v, Av\rangle : v \in \mathcal{H},\ \|v\| = 1\}. \qquad (1.28)$$
We can take the Schmidt decomposition (1.25). Assume that
$$\max\{\langle v, Av\rangle : v \in \mathcal{H},\ \|v\| = 1\} = \langle w, Aw\rangle$$
for a unit vector $w$. This vector has the expansion
$$w = \sum_{i=1}^{n} c_i\, |v_i\rangle$$
and we have
$$\langle w, Aw\rangle = \sum_{i=1}^{n} |c_i|^2 \lambda_i \le \lambda_1.$$
The equality holds if and only if $\lambda_i < \lambda_1$ implies $c_i = 0$. The maximizer should be an eigenvector with eigenvalue $\lambda_1$. Similarly,
$$\lambda_n = \min\{\langle v, Av\rangle : v \in \mathcal{H},\ \|v\| = 1\}. \qquad (1.29)$$
The formulae (1.28) and (1.29) will be extended below.
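A Monte Carlo sketch of (1.28) and (1.29) (illustrative only; assumes NumPy, and random unit vectors merely approximate the extrema):

```python
import numpy as np

rng = np.random.default_rng(6)
B = rng.normal(size=(4, 4))
A = (B + B.T) / 2                         # a real symmetric matrix
lam = np.linalg.eigvalsh(A)               # eigenvalues in increasing order

vals = []
for _ in range(20000):
    v = rng.normal(size=4)
    v /= np.linalg.norm(v)                # random unit vector
    vals.append(v @ A @ v)                # the quadratic form <v, Av>
print(lam[-1], max(vals))   # sampled maximum approaches lambda_1
print(lam[0], min(vals))    # sampled minimum approaches lambda_n
```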
Theorem 1.26 (Poincaré's inequality) Let $A \in B(\mathcal{H})$ be a self-adjoint operator with eigenvalues $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n$ (counted with multiplicity) and let $\mathcal{K}$ be a $k$-dimensional subspace of $\mathcal{H}$. Then there are unit vectors $x, y \in \mathcal{K}$ such that
$$\langle x, Ax\rangle \le \lambda_k \qquad \text{and} \qquad \langle y, Ay\rangle \ge \lambda_k.$$

Proof: Let $v_k, \dots, v_n$ be orthonormal eigenvectors corresponding to the eigenvalues $\lambda_k, \dots, \lambda_n$. They span a subspace $\mathcal{M}$ of dimension $n - k + 1$ which must have a non-trivial intersection with $\mathcal{K}$. Take a unit vector $x \in \mathcal{K} \cap \mathcal{M}$ which has the expansion
$$x = \sum_{i=k}^{n} c_i v_i;$$
it has the required property:
$$\langle x, Ax\rangle = \sum_{i=k}^{n} |c_i|^2 \lambda_i \le \lambda_k \sum_{i=k}^{n} |c_i|^2 = \lambda_k.$$
To find the other vector $y$, the same argument can be used with the matrix $-A$.

The next result is a minimax principle.
Theorem 1.27 Let $A \in B(\mathcal{H})$ be a self-adjoint operator with eigenvalues $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n$ (counted with multiplicity). Then
$$\lambda_k = \min\big\{\max\{\langle v, Av\rangle : v \in \mathcal{K},\ \|v\| = 1\} : \mathcal{K} \subset \mathcal{H},\ \dim \mathcal{K} = n + 1 - k\big\}.$$

Proof: Let $v_k, \dots, v_n$ be orthonormal eigenvectors corresponding to the eigenvalues $\lambda_k, \dots, \lambda_n$. They span a subspace $\mathcal{K}$ of dimension $n + 1 - k$. According to (1.28) we have
$$\lambda_k = \max\{\langle v, Av\rangle : v \in \mathcal{K},\ \|v\| = 1\}$$
and it follows that the inequality $\ge$ in the statement of the theorem is true: $\lambda_k$ is at least the minimum.

To complete the proof we have to show that for any subspace $\mathcal{K}$ of dimension $n + 1 - k$ there is a unit vector $v \in \mathcal{K}$ such that $\lambda_k \le \langle v, Av\rangle$, or $-\lambda_k \ge \langle v, (-A)v\rangle$. The decreasing eigenvalues of $-A$ are $-\lambda_n \ge -\lambda_{n-1} \ge \dots \ge -\lambda_1$, where the $\ell$th is $-\lambda_{n+1-\ell}$. The existence of such a unit vector $v$ is guaranteed by Poincaré's inequality, where we take $\ell$ with the property $n + 1 - \ell = k$.
1.5 Trace and determinant
When $\{e_1, \dots, e_n\}$ is an orthonormal basis of $\mathcal{H}$, the trace $\mathrm{Tr}\, A$ of $A \in B(\mathcal{H})$ is defined as
$$\mathrm{Tr}\, A := \sum_{i=1}^{n} \langle e_i, Ae_i\rangle. \qquad (1.30)$$
Theorem 1.28 The definition (1.30) is independent of the choice of an orthonormal basis $\{e_1, \dots, e_n\}$ and $\mathrm{Tr}\, AB = \mathrm{Tr}\, BA$ for all $A, B \in B(\mathcal{H})$.

Proof: We have
$$\mathrm{Tr}\, AB = \sum_{i=1}^{n} \langle e_i, ABe_i\rangle = \sum_{i=1}^{n} \langle A^* e_i, Be_i\rangle = \sum_{i=1}^{n} \sum_{j=1}^{n} \langle A^* e_i, e_j\rangle\, \langle e_j, Be_i\rangle$$
$$= \sum_{j=1}^{n} \sum_{i=1}^{n} \langle B^* e_j, e_i\rangle\, \langle e_i, Ae_j\rangle = \sum_{j=1}^{n} \langle B^* e_j, Ae_j\rangle = \sum_{j=1}^{n} \langle e_j, BAe_j\rangle = \mathrm{Tr}\, BA.$$
Now, let $\{f_1, \dots, f_n\}$ be another orthonormal basis of $\mathcal{H}$. Then a unitary $U$ is defined by $Ue_i = f_i$, $1 \le i \le n$, and we have
$$\sum_{i=1}^{n} \langle f_i, Af_i\rangle = \sum_{i=1}^{n} \langle Ue_i, AUe_i\rangle = \mathrm{Tr}\, U^* A U = \mathrm{Tr}\, A U U^* = \mathrm{Tr}\, A,$$
which says that the definition of $\mathrm{Tr}\, A$ is actually independent of the choice of an orthonormal basis.

When $A \in M_n$, the trace of $A$ is nothing but the sum of the principal diagonal entries of $A$:
$$\mathrm{Tr}\, A = A_{11} + A_{22} + \dots + A_{nn}.$$
The trace is the sum of the eigenvalues. Computation of the trace is very simple; the case of the determinant (1.2) is very different. In terms of the Jordan canonical form described in Theorem 1.16, we have
$$\mathrm{Tr}\, X = \sum_{j=1}^{m} k_j \lambda_j \qquad \text{and} \qquad \det X = \prod_{j=1}^{m} \lambda_j^{k_j}.$$
Formula (1.22) shows that trace and determinant are certain coefficients of the characteristic polynomial. The next example is about the determinant of a special linear mapping.
Example 1.29 On the linear space $M_n$ we can define a linear mapping $\alpha : M_n \to M_n$ as $\alpha(A) = V A V^*$, where $V \in M_n$ is a fixed matrix. We are interested in $\det \alpha$.

Let $V = S J S^{-1}$ be the canonical Jordan decomposition and set
$$\alpha_1(A) = S^{-1} A (S^{-1})^*, \qquad \alpha_2(B) = J B J^*, \qquad \alpha_3(C) = S C S^*.$$
Then $\alpha = \alpha_3 \circ \alpha_2 \circ \alpha_1$ and $\det \alpha = \det \alpha_3 \times \det \alpha_2 \times \det \alpha_1$. Since $\alpha_1 = \alpha_3^{-1}$, we have $\det \alpha = \det \alpha_2$, so only the Jordan block part has influence on the determinant.

The following example helps to understand the situation. Let
$$J = \begin{pmatrix} \lambda_1 & x \\ 0 & \lambda_2 \end{pmatrix}$$
and
$$A_1 = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \quad A_2 = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \quad A_3 = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}, \quad A_4 = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}.$$
Then $\{A_1, A_2, A_3, A_4\}$ is a basis in $M_2$. If $\alpha(A) = J A J^*$, then from the data
$$\alpha(A_1) = |\lambda_1|^2 A_1, \qquad \alpha(A_2) = \lambda_1 \bar{x}\, A_1 + \lambda_1 \bar{\lambda}_2\, A_2,$$
$$\alpha(A_3) = \bar{\lambda}_1 x\, A_1 + \bar{\lambda}_1 \lambda_2\, A_3, \qquad \alpha(A_4) = |x|^2 A_1 + x \bar{\lambda}_2\, A_2 + \bar{x} \lambda_2\, A_3 + |\lambda_2|^2 A_4$$
we can observe that the matrix of $\alpha$ is upper triangular:
$$\begin{pmatrix} |\lambda_1|^2 & \lambda_1 \bar{x} & \bar{\lambda}_1 x & |x|^2 \\ 0 & \lambda_1 \bar{\lambda}_2 & 0 & x \bar{\lambda}_2 \\ 0 & 0 & \bar{\lambda}_1 \lambda_2 & \bar{x} \lambda_2 \\ 0 & 0 & 0 & |\lambda_2|^2 \end{pmatrix}.$$
So its determinant is the product of the diagonal entries:
$$|\lambda_1|^2 (\lambda_1 \bar{\lambda}_2)(\bar{\lambda}_1 \lambda_2) |\lambda_2|^2 = |\lambda_1 \lambda_2|^4 = |\det J|^4.$$

Now let $J \in M_n$ and assume that only the entries $J_{ii}$ and $J_{i,i+1}$ can be non-zero. In $M_n$ we choose the basis of the matrix units in lexicographical order:
$$E(1,1), E(1,2), \dots, E(1,n), E(2,1), \dots, E(2,n), \dots, E(n,1), \dots, E(n,n).$$
We want to see that the matrix of $\alpha$ is upper triangular. From the computation
$$J E(j,k) J^* = J_{j-1,j} \overline{J_{k-1,k}}\, E(j-1, k-1) + J_{j-1,j} \overline{J_{kk}}\, E(j-1, k) + J_{jj} \overline{J_{k-1,k}}\, E(j, k-1) + J_{jj} \overline{J_{kk}}\, E(j, k)$$
we can see that the matrix of the mapping $A \mapsto J A J^*$ is upper triangular. (In the lexicographical order of the matrix units, $E(j-1, k-1)$, $E(j-1, k)$ and $E(j, k-1)$ come before $E(j, k)$.) The determinant is the product of the diagonal entries:
$$\prod_{j,k=1}^{n} J_{jj} \overline{J_{kk}} = (\det J)^n\, \overline{\det J}^{\,n}.$$
It follows that the determinant of $\alpha(A) = V A V^*$ is $(\det V)^n\, \overline{\det V}^{\,n} = |\det V|^{2n}$, since the determinant of $V$ equals the determinant of its Jordan matrix $J$. If $\beta(A) = V A V^t$, then the argument is similar; only the conjugation is missing and $\det \beta = (\det V)^{2n}$.

Next we deal with the space $\mathcal{M}$ of real symmetric $n \times n$ matrices. Set $\gamma : \mathcal{M} \to \mathcal{M}$, $\gamma(A) = V A V^t$. The canonical Jordan decomposition holds also for real matrices and it gives again that the Jordan matrix $J$ of $V$ determines the determinant of $\gamma$. To have a matrix of $A \mapsto J A J^t$, we need a basis in $\mathcal{M}$. We can take
$$\{E(j,k) + E(k,j) : 1 \le j \le k \le n\}.$$
Similarly to the above argument, one can see that the matrix is upper triangular, so we need the diagonal entries. $J(E(j,k) + E(k,j))J^t$ can be computed from the above formula and the coefficient of $E(j,k) + E(k,j)$ is $J_{kk} J_{jj}$. The determinant is
$$\prod_{1 \le j \le k \le n} J_{kk} J_{jj} = (\det J)^{n+1} = (\det V)^{n+1}.$$
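This determinant can be verified numerically: with the column-stacking convention $\mathrm{vec}(AXB) = (B^t \otimes A)\,\mathrm{vec}(X)$, the matrix of $\alpha(A) = V A V^*$ is the Kronecker product $\bar{V} \otimes V$ (a sketch, assuming NumPy):

```python
import numpy as np

n = 3
rng = np.random.default_rng(7)
V = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
M = np.kron(V.conj(), V)        # matrix of A -> V A V* in the vec basis
print(np.isclose(np.linalg.det(M),
                 abs(np.linalg.det(V)) ** (2 * n)))   # True: det(alpha) = |det V|^{2n}
```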
Theorem 1.30 The determinant of a positive matrix $A \in M_n$ does not exceed the product of the diagonal entries:
$$\det A \le \prod_{i=1}^{n} A_{ii}.$$
This is a consequence of the concavity of the log function, see Example 4.18 (or Example 1.43).

If $A \in M_n$ and $1 \le i, j \le n$, then in the next theorems $[A]^{ij}$ denotes the $(n-1) \times (n-1)$ matrix which is obtained from $A$ by striking out the $i$th row and the $j$th column.
Theorem 1.31 Let $A \in M_n$ and $1 \le j \le n$. Then
$$\det A = \sum_{i=1}^{n} (-1)^{i+j} A_{ij} \det([A]^{ij}).$$

Example 1.32 Here is a simple computation using the row version of the previous theorem:
$$\det \begin{pmatrix} 1 & 2 & 0 \\ 3 & 0 & 4 \\ 0 & 5 & 6 \end{pmatrix} = 1 \cdot (0 \cdot 6 - 5 \cdot 4) - 2 \cdot (3 \cdot 6 - 0 \cdot 4) + 0 \cdot (3 \cdot 5 - 0 \cdot 0) = -56.$$
The theorem is useful if the matrix has several 0 entries.
The determinant has an important role in the computation of the inverse.
Theorem 1.33 Let $A \in M_n$ be invertible. Then
$$(A^{-1})_{ki} = (-1)^{i+k}\, \frac{\det([A]^{ik})}{\det A}$$
for $1 \le i, k \le n$.
Example 1.34 A standard formula is
$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}^{-1} = \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}$$
when the determinant $ad - bc$ is not 0.
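Theorem 1.33 can be tested on the matrix of Example 1.32 (an illustrative sketch, assuming NumPy):

```python
import numpy as np

A = np.array([[1.0, 2.0, 0.0],
              [3.0, 0.0, 4.0],
              [0.0, 5.0, 6.0]])
detA = np.linalg.det(A)                  # -56
inv = np.empty((3, 3))
for i in range(3):
    for k in range(3):
        minor = np.delete(np.delete(A, i, axis=0), k, axis=1)  # strike row i, column k
        inv[k, i] = (-1) ** (i + k) * np.linalg.det(minor) / detA
print(np.allclose(inv, np.linalg.inv(A)))   # True
```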
The next example is about the Haar measure on some matrices. Mathematical analysis is essential there.
Example 1.35 denotes the set of invertible real 2 2 matrices. is a G R R4 × G (non-commutative) group and M2( ) ∼= is an open set. Therefore it is a locally compact topologicalG group. ⊂ The Haar measure is defined by the left-invariance property: