Trace Inequalities and Quantum Entropy: an Introductory Course

Contemporary Mathematics

Trace Inequalities and Quantum Entropy: An Introductory Course

Eric Carlen

Abstract. We give an elementary introduction to the subject of trace inequalities and related topics in analysis, with a special focus on results that are relevant to quantum statistical mechanics. This introductory course was presented at Entropy and the Quantum: A school on analytic and functional inequalities with applications, Tucson, Arizona, March 16-20, 2009.

Contents 1. Introduction 74 1.1. Basic deﬁnitions and notation 74 1.2. Trace inequalities and entropy 76 2. Operator convexity and monotonicity 81 2.1. Some examples and the L¨owner-Heinz Theorem 81 2.2. Convexity and monotonicity for trace functions 85 2.3. Klein’s Inequality and the Peierls-Bogoliubov Inequality 87 3. The joint convexity of certain operator functions 90 3.1. The joint convexity of the map (A, B) B∗A−1B∗ on H+ M 90 7→ n × n 3.2. Joint concavity of the harmonic mean 91 3.3. Joint concavity of the geometric mean 93 3.4. The arithmetic-geometric-harmonic mean inequality 94 4. Projections onto -subalgebras and convexity inequalities 94 ∗ 4.1. A simple example 94 4.2. The von Neumann Double Commutant Theorem 96 4.3. Propertiesoftheconditionalexpectation 100 4.4. Pinching, conditional expectations, and the Operator Jensen Inequality 106

1991 Mathematics Subject Classiﬁcation. 47A63, 15A90. Key words and phrases. convexity, concavity, trace inequality, entropy, operator norms. Work partially supported by U.S. National Science Foundation grant DMS 09-01632. c 2010 by the author. This paper may be reproduced, in its entirety, for non-commercial purposes.

c 0000 (copyright holder) 73 74 ERIC CARLEN

5. Tensor products 108 5.1. Basic deﬁnitions and elementary properties of tensor products 108 5.2. Tensor products and inner products 111 5.3. Tensor products of matrices 113 5.4. The partial trace 114 5.5. Ando’s identity 118 6. Lieb’sConcavityTheoremandrelatedresults 118 6.1. Lieb’s Concavity Theorem 118 6.2. Ando’s Convexity Theorem 119 6.3. Lieb’s Concavity Theorem and joint convexity of the relative entropy 120 6.4. Monotonicity of the relative entropy 121 6.5. Subadditivity and strong subadditivity of the entropy 122 7. Lp norms for matrices and entropy inequalities 123 7.1. The matricial analogs of the Lp norms 123 7.2. Convexity of A Tr (B∗ApB)q/p and certain of its applications 125 7→ 7.3. Proof of the convexity of A Tr (B∗ApB)q/p 129 7→ 8. Brascamp-Liebtypeinequalitiesfortraces 131 8.1. A generalized Young’s inequality in the context of non-commutative integration 132 8.2. A generalized Young’s inequality for tensor products 133 8.3. Subadditivity of Entropy and Generalized Young’s Inequalities 135 Acknoledgements 138 References 138

1. Introduction 1.1. Basic definitions and notation. Let M denote the space of n n n × matrices, which we sometimes refer to as operators on Cn. The inner product on Cn, and other Hilbert spaces, shall be denoted by , , conjugate linear on the left, h· ·i and the Hermitian conjugate, or simply adjoint of an A M shall be denoted by ∈ n A∗. Let Hn denote the n n Hermitian matrices, i.e.; the subset of Mn consisting ×∗ of matrices A such that A = A. There is a natural partial order on Hn: A matrix A H is said to be positive semi-definite in case ∈ n (1.1) v, Av 0 for all v Cn , h i≥ ∈ in which case we write A 0. A is said to positive definite in case the inequality ≥ in (1.1) is strict for all v = 0 in Cn, in which case we write A> 0. Notice that in 6 the finite dimensional case we presently consider, A > 0 if and only A 0 and A ≥ is invertible. By the Spectral Theorem, A 0 if and only if all of the eigenvalues of A are ≥ non-negative, which is the case if and only if there is some B M such that ∈ n A = B∗B. TRACE INEQUALITIES AND QUANTUM ENTROPY 75

Finally, we partially order H by defining A B to mean that A B 0. We n ≥ − ≥ shall also write A>B to mean that A B > 0. Let H+ denote the n n positive − n × definite matrices. For A M , the trace of A, Tr(A), is defined by ∈ n n Tr(A)= Aj,j . j=1 X For any A, B M , ∈ n n n (1.2) Tr(AB)= Ai,j Bj,i = Bj,iAi,j = Tr(BA) . i,j=1 i,j=1 X X This is known as cyclicity of the trace. It tells us, for example that if u ,...,u { 1 n} is any orthonormal basis for Cn, then

n (1.3) Tr(A)= u , Au . h j ji j=1 X Indeed if U is the unitary matrix whose jth column is u , (U ∗AU) = u , Au , j j,j h j j i and then by (1.2), Tr(U ∗AU) = Tr(AUU ∗) = Tr(A). Thus, Tr(A) is a unitarily invariant function of A, and as such, depends only on the eigenvalues of A. In fact, taking u ,...,u to be an orthonormal basis of Cn with Au = λ u , j =1,...,n, { 1 n} j j j (1.2) yields

n (1.4) Tr(A)= λj . j=1 X An n n density matrix is a matrix ρ H+ with Tr(ρ) = 1. The symbols ρ × ∈ n (and σ) are traditional for density matrices, and they are the quantum mechanical analogs of probability densities, and they are in one-to-one correspondence with the set of states of a quantum mechanical system whose observables are self adjoint operators on Cn. n Let Sn denote the set of density matrices on C . This is a convex set, and it is easy to see that the extreme points of Sn are precisely the rank one orthogonal projections on Cn. These are called pure states. Of course in many, if not most, quantum mechanical systems, the observables are operators on an infinite dimensional, but separable, Hilbert space . It is H easy to extend the definition of the trace, and hence of density matrices, to this infinite dimensional setting. However, it is not hard to show that any positive semi- definite operator ρ on with Tr(ρ) = 1 is a compact operator, and thus it may be H approximated in the operator norm by a finite rank operator. Simon’s book [31] contains a very elegant account of all this. Here we simply note that essentially for this reason, the essential aspects of the inequalities for density matrices that we study here are contained in the finite dimensional case, to which we restrict our attention for the most part of these notes. 76 ERIC CARLEN

1.2. Trace inequalities and entropy. Much of what we discuss here is directly related to some notion of entropy.

1.1. Definition. The von Neuman entropy of ρ S , S(ρ), is defined by ∈ n (1.5) S(ρ)= Tr(ρ log ρ) . − The operator ρ log ρ H is defined using the Spectral Theorem. This says ∈ n that for every self adjoint operator A on Cn, there exists an orthonormal basis u ,...,u consisting of eigenvectors of A, meaning that for some real numbers { 1 n} λ ,...,λ , Au = λ u for j =1,...,n. The set λ ,...,λ is the spectrum of { 1 n} j j j { 1 n} A, and the numbers in it are the eigenvalues of A. n Given such a basis of eigenvectors, let Pj be the orthogonal projection in C onto the span of uj . Then A can be written in the form n (1.6) A = λj Pj j=1 X This is the spectral decomposition of A. For any function f : R R, the operator f(A) H is then defined by → ∈ n m (1.7) f(A)= f(λj )Pj . j=1 X Using the evident fact that Pj Pk = δj,kPj , it is easily checked that if f is a k j polynomial in the real variable t, say f(t) = j=0 aj t , then for this definition, k j f(A)= j=0 aj A . It is easily checked that inP general, as in the polynomial case, f(A) does not depend on the choice of the orthonormal basis u ,...,u consisting P { 1 n} of eigenvectors of A. A case that will be of particular focus in these notes is f(t) = t log(t). Given ρ S , let u ,...,u be an orthonormal basis of Cn consisting of eigenvectors ∈ n { 1 n} of ρ: ρu = λ u . Since ρ 0, each λ 0 for each j. Then by (1.3), n λ = 1, j j j ≥ j ≥ j=1 j and so λ 1 for each j. By (1.3) once more, j ≤ P n (1.8) S(ρ)= λ log λ . − j j j=1 X That is, S(ρ) depends on ρ only through its eigenvalues. Otherwise put, the von Neumann entropy is unitarily invariant; i.e.,

(1.9) S(UρU ∗)= S(ρ) .

The fact that t t log(t) is strictly convex together with (1.8) tells us that 7→ n n n 1 1 1 S(ρ)= n λ log λ n λ log λ − n j j ≤ n j  n j  j=1 j=1 j=1 X X X 1 1     = n log = log(n) , n n − TRACE INEQUALITIES AND QUANTUM ENTROPY 77 and there is equality if and only if each λj =1/n. Thus, we have (1.10) 0 S(ρ) log n ≤ ≤ for all ρ S , and there is equality on the left iff ρ is a pure state, and there is ∈ n equality on the right iff ρ = (1/n)I. Actually, S(ρ) is not only a strictly concave function of the eigenvalues of ρ, it is strictly concave function of ρ itself. That is, as we shall show in the next section, (1.11) S((1 t)ρ + tρ ) (1 t)S(ρ )+ tS(ρ ) − 0 1 ≥ − 0 1 for all ρ ,ρ S , with equality iff ρ = ρ . This is much stronger than concavity 0 1 ∈ n 0 1 as a function of the eigenvalues since if ρ0 and ρ1 do not commute, the eigenvalues of (1 t)ρ + tρ are not simply linear combinations of the eigenvalues of ρ and − 0 1 0 ρ1. Since we shall be focusing on convexity and concavity of trace functions in these notes, we briefly discuss one reason this concavity matters, starting with the simpler fact (1.10) that we have deduced from the concavity of the entropy as a function of the eigenvalues of ρ. In quantum statistical mechanics, equilibrium states are determined by maximum entropy principles, and the fact that (1.12) sup S(ρ) = log n ρ∈Sn reflects Boltzmann’s identity S = k log W which is engraved on his funerary monument in Vienna. Often however, we are not interested in the unconstrained supremum in (1.12), but instead the constrained supremum over states with a specified energy: Consider a quantum system in which the observables are self adjoint operators on Cn, and in particular, in which the energy is represented by H H . By the rules of quantum ∈ n mechanics, the expected value of the energy in the state ρ is given by Tr(Hρ). Our constrained optimization problem is to compute (1.13) sup S(ρ) : ρ S , Tr(Hρ)= E . { ∈ n } The key to this solving this problem is a duality formula for the entropy that is part of Theorem 2.13 which we prove in the next section. The formula in question says that for any density matrix ρ in Mn, (1.14) S(ρ) = sup Tr(Aρ) ln TreA , − A∈Hn{ − } and moreover 1 (1.15) S(ρ) = Tr(Aρ) ln TreA ρ = eA . − − ⇐⇒ Tr(eA) Notice that for any fixed A H , the function ∈ n ρ Tr(Aρ) ln TreA 7→ − 78 ERIC CARLEN is affine, and therefore convex. It is easy to see that the pointwise supremum of any family of convex functions is again convex, and so proving the duality formula (1.14) would immediately prove the concavity of the entropy. In fact, under mild regularity conditions, every convex function can be written in this way, as a supremum of affine functions. Such dual representations are very useful in solving variations problems involving convex functions. We illustrate this by solving the optimization problem in (1.13). 1.2. Definition (Gibbs state). Given H H and β R, the Gibbs state for ∈ n ∈ Hamiltonian H at inverse temperature β is the density matrix ρβ,H where 1 (1.16) ρ := e−βH . β,H Tr [e−βH ]

Define ρ∞,H = limβ→∞ ρβ,H and ρ−∞,H = limβ→−∞ ρβ,H . It may seem strange that we allow negative “temperatures” in this definition. However, it is physically natural for systems with finitely many degrees of freedom. On a mathematical level, negative “temperatures” are natural on account of the following theorem: 1.3. Theorem (Variational principle for Gibbs states). Let H H , and let ∈ n λ λ be the n eigenvalues of H. Then for any E with 1 ≤···≤ n λ E λ 1 ≤ ≤ n there is a unique β [ , ] such that ∈ −∞ ∞

E = Tr(HρβE ,H ) , and for any ρ˜ S , ∈ n Tr(Hρ˜)= E and S(˜ρ) = sup S(ρ) : ρ S , Tr(Hρ)= E ρ˜ = ρ . { ∈ n } ⇐⇒ βE ,H Proof: We shall use (1.14) and (1.15), though these shall only be proved in Theo- rem 2.13. Note that by (1.14), for anyρ ˜ S with Tr(˜ρH)= E, ∈ n (1.17) S(˜ρ) βE + ln Tre−βH , ≤ and by (1.15), there is equality in (1.17) if and only if

1 −βEH (1.18)ρ ˜ = e = ρβ ,H , Tr [e−βEH ] E where βE is such that Tr(ρβE,H H) = E, so that our constraint is satisﬁed. For λ E λ , there is exactly one such β as we explain next. 1 ≤ ≤ n E Notice that 1 d Tr(Hρ )= Tr[He−βH ]= log Tr[e−βH ] . β,H Tr [e−βH ] −dβ A direct calculation shows that d2 1 1 2 log Tr[e−βH ] = Tr[H2e−βH ] Tr[He−βH ] . dβ2 Tr [e−βH ] − Tr [e−βH ] By the Schwarz inequality, this is strictly positive unless H is a multiple of the identity. If H is a multiple of the identity, necessarily EI, the optimization problem TRACE INEQUALITIES AND QUANTUM ENTROPY 79 is trivial, and the optimal ρ is (1/n)I. So assume that H is not a multiple of the identity. Then the map β log Tr[e−βH ] is strictly convex, and hence β 7→ 7→ Tr(Hρ ) is strictly monotone decreasing, and in fact, β,H n 1 −βλj Tr(Hρβ,H )= n λj e . e−βλj j=1 j=1 X It follows that P

lim Tr(Hρβ,H )= λ1 and lim Tr(Hρβ,H )= λn . β→∞ β→−∞ By the Spectral Theorem, λ Tr(Hρ) λ for all ρ S , and so for all E 1 ≤ ≤ n ∈ n ∈ [λ , λ ], there is a unique value of β [ , ] such that Tr(Hρ )= E. 1 n ∈ −∞ ∞ β,H

The fact that the Gibbs state ρβE,H is characterized by the fact that S(ρ ) S(˜ρ) forallρ ˜ S with Tr(Hρ˜)= E βE ,H ≥ ∈ n gives rise to the variational principle for Gibbs states. If one accepts the idea that equilibrium states should maximize the entropy given the expected values of macroscopic observables, such as energy, this leads to an identification of Gibbs states with thermodynamic equilibrium states. There is a great deal of physics, and many open questions, that one should discuss to set this variational principle in proper physical context. Such a discussion is far beyond the scope of these notes. Here we shall simply accept the deep physical insight that equilibrium states should maximize the entropy given the expected values of macroscopic observables, such as energy. Then, as we have seen, this extremal property is intimately connected with the concavity of the entropy. Further exploration of these ideas would lead one to investigate certain convexity, concavity and monotonicity properties of other trace functions. These notes will focus on the mathematical aspects of convexity, concavity and monotonicity properties of trace functions. This is an extremely beautiful theory, and the beauty can be deeply appreciated on a purely mathematical basis. However, one should bear in mind that many of the most profound aspects of this theory were discovered through physical enquiry. As we have indicated, the entropy is not the only trace function that matters in statistical mechanics: Even in this very particular context of the entropy maximization problem, the duality formula (1.14) involves the trace functional A log Tr(eA) , and its proof makes use of what is known as the relative en- 7→ tropy: 1.4. Definition. The relative entropy of ρ S with respect to σ S , S(ρ σ), ∈ n ∈ n | is defined to be + unless the nullspace of ρ contains the nullspace of σ in which ∞ case it is defined by (1.19) S(ρ σ) = Tr(ρ log ρ) Tr(ρ log σ) . | − As we shall see, (ρ, σ) S(ρ σ) is jointly convex in the sense that for all 7→ | ρ ,ρ , σ , σ S and any 0 t 1, 0 1 0 1 ∈ n ≤ ≤ (1.20) S((1 t)ρ + tρ (1 t)σ + tσ ) (1 t)S(ρ σ )+ tS(ρ σ ) . − 0 1| − 0 1 ≤ − 0| 0 1| 1 80 ERIC CARLEN

This is a deeper fact than is needed to prove the duality formula (1.14) – which only uses the much more easily proved fact that S(ρ σ) 0 with equality if and only | ≥ if σ = ρ. This is a special case of Klein’s inequality, which we prove in Section 2, and show it to be a direct consequences of the strict concavity of the von Neuman entropy. The joint convexity of S(ρ σ) is, as we have noted, much deeper than the strict | concavity of the von Neuman entropy. Its proof, and its applications, shall have to wait until later. The ideas leading to the proof are closely connected with yet another type of “entropy”; i.e., the Wigner-Yanase skew information. The notions of “entropy” and “information” are distinct, but closely inter- twined. The Shannon information content I (ρ) of a density matrix ρ S is S ∈ n defined by I (ρ) = S(ρ). See [19] for a discussion of this definition. Note that S − with this definition, the information content of any pure state ρ is zero. However, for a quantum mechanical system in which the observables are self adjoint operators on Cn, and in which the energy is the observable H H , some ∈ n states are easier to measure than others: Those that commute with H are easy to measure, and those that do not are more difficult to measure. This led Wigner and Yanase [38, 39] to introduce the Wigner-Yanase skew information IW Y (ρ) of a density matrix ρ in a quantum mechanical system with energy operator H to be 1 I (ρ)= Tr [√ρ, H]2 . W Y −2 Note that (1.21) I (ρ) = TrH2ρ Tr√ρH√ρH , W Y − which vanishes if and only if ρ commutes with H. Wigner and Yanase [38, 39] proved that ρ I (ρ) is convex on S , i.e., 7→ W Y n (1.22) I ((1 t)ρ + tρ ) (1 t)I (ρ )+ tI (ρ ) , W Y − 0 1 ≤ − W Y 0 W Y 1 and explained the information theoretic consequences of this. From the formula (1.21), it is clear that convexity of ρ I (ρ) amounts to concavity of ρ 7→ W Y 7→ Tr√ρH√ρH, which they proved. Even though this is a convexity result for one variable, and not a joint convexity result, it too is much harder to prove than the concavity of the von Neuman entropy, or what is the same thing, the convexity of the Shannon information. Wigner and Yanase left open the more general problem of concavity of (1.23) ρ Tr(ρpKρ1−pK∗) 7→ for 0

For now, we mention that strong subadditivity of the entropy is an example of a trace inequality for density matrices acting on a tensor product of Hilbert spaces – in this case – that involves partial traces. The different Hilbert H1 ⊗ H2 ⊗ H3 spaces correspond to different parts of the quantum mechanical system: Here, , H1 and are the state spaces for degrees of freedom in various subsystems of the H1 H3 whole system, and it is often important to estimate the entropy of a density matrix ρ on the whole system in terms of the entropies of induced density matrices on the various subsystems. Later in these notes, we shall extensively develop this general topic, and inequalities for tensor products are absolutely fundamental throughout the subject. In fact, the easiest (by far) proof of the Lieb Concavity Theorem proceeds through a simple tensor product argument devised by Ando [1]. Before entering fully into our subject, let us close the introduction by empha- sizing that in our exposition, we shall provide full mathematical detail and context, but we shall be comparatively sketchy when it comes to physical detail and context. There are many excellent accounts of the physics behind the definitions of S(ρ), S(ρ σ) and I (ρ) and other mathematical constructions that we shall encounter | W Y here. Thirring’s book [33] is an especially good reference for much of this. It is especially good at connecting the physics with the rigorous mathematics, but still, what we do here provides a mathematical complement to it. For example, [33] does not contain a full proof of the joint convexity of S(ρ σ). It only give the simple | argument which reduces this to the Lieb Concavity Theorem, about which it says: “The proof of this rather deep proposition . . . is too laborious to be repeated here”. In fact, as we shall see, one can give a very simple, clean and clear proof of this. The point of view that leads to this simple proof is due to Ando [1], and as we shall see, it provides insight into a number of other interesting questions as well. We now turn to the systematic development of our subject – inequalities for operators and traces, with special focus on monotonicity and convexity.

2. Operator convexity and monotonicity 2.1. Some examples and the L¨owner-Heinz Theorem.

2.1. Definition (Operator monotonicity and operator convexity). A function f : (0, ) R is said to be operator monotone in case whenever for all n, and all ∞ → A, B H+, ∈ n (2.1) A B f(A) f(B) . ≥ ⇒ ≥ A function f : (0, ) R is said to be operator convex in case for all n and all ∞ → A, B H+, and 0

2.2. Example (Non-monotonicity of the square). The function f(t) = t2 is monotone in the usual sense, but for A, B H+, ∈ n (A + B)2 = A2 + (AB + BA)+ B2 . For any choice of A and B such that AB + BA has even one strictly negative eigenvalue, (A + tB)2 A2 will fail for all sufficiently small t. It is easy to find + ≥ such A and B in Hn . For example, take 1 1 1 0 (2.3) A = and B = , 1 1 0 0 so that 2 1 AB + BA = . 1 0 Thus, not even the square function is operator monotone. It turns out, however, that the square root function is operator monotone. This is an important theorem of Heinz [13]. The proof we give is due to Kato [17]; see [9] for its specialization to the matrix case, which we present here. 2.3. Example (Monotoncity of the square root). The square root function, f(t)= t1/2, is operator monotone. To see this, it suffices to show that if A, B H+ ∈ n and A2 B2, then A B. ≤ ≤ Towards this end, consider any eigenvalue λ of the Hermitian matrix B A, − and let u be a unit vector that is an eigenvector with this eigenvalue. We must show that λ 0. Observe that ≥ (B λ)u = Au Bu, (B λ)u = Bu,Au . − ⇒ h − i h i Then by the Schwarz inequality, Bu 2 λ u,Bu Bu Au . k k − h i≤k kk k But since A2 B2, ≤ Bu Au = u,B2u 1/2 u, A2u 1/2 u,B2u , k kk k h i h i ≤ h i we have Bu 2 λ u,Bu Bu 2 , k k − h i≤k k and this shows that λ u,Bu 0, and hence λ 0. h i≥ ≥ We now give two examples pertaining to convexity: 2.4. Example (Convexity of the square). The square function is operator convex: One has the parallelogram law A + B 2 A B 2 1 1 + − = A2 + B2 , 2 2 2 2 so certainly for f(t) = t2, one always has (2.2) for t = 1/2, which is known as midpoint convexity. A standard argument then gives (2.2) whenever t is a dyadic rational, and then by continuity one has it for all t, in (0, 1) of course. We will often use the fact that in the presence of continuity, it suffices to check midpoint convexity. TRACE INEQUALITIES AND QUANTUM ENTROPY 83

2.5. Example (Non-convexity of the cube). The cube function is not operator convex. To easily see this, let us deduce a consequence of (2.2) that must hold for any operator convex function f: Take A, B H+, and all 0

After seeing these negative examples, one might suspect that the notions of operator monotonicity and operator convexity are too restrictive to be of any interest. Fortunately, this is not the case. The following result furnishes a great many positive examples.

2.6. Theorem (L¨owner-Heinz Theorem). For 1 p 0, the function f(t)= − ≤ ≤ tp is operator monotone and operator concave. For 0 p 1, the function − ≤ ≤ f(t)= tp is operator monotone and operator concave. For 1 p 2, the function ≤ ≤ f(t)= tp and operator convex. Furthermore f(t) = log(t) is operator concave and operator monotone, while f(t)= t log(t) is operator convex.

Löwner actually proved more; he gave a necessary and sufficient condition for f to be operator monotone. But what we have stated is all that we shall use. We shall give an elementary proof of Theorem 2.6 after first proving two lemmas. The first lemma addresses the special case f(t)= t−1.

2.7. Lemma. The function f(t) = t−1 is operator convex, and the function f(t)= t−1 is operator monotone. − Proof: We begin with the monotonicity. Let A, B H+. Let C = A−1/2BA−1/2 ∈ n so that A−1 (A + B)−1 = A−1/2[I (I + C)−1]A−1/2 . − − Since C H+, I (I + C)−1 H+, and hence A−1/2[I (I + C)−1]A−1/2 > 0. ∈ n − ∈ n − This proves the monotonicity. Similarly, to prove midpoint convexity, we have 1 1 A + B −1 1 1 I + C −1 A−1 + B−1 = A−1/2 I + C−1 A−1/2 . 2 2 − 2 2 2 − 2 " # 84 ERIC CARLEN

By the arithmetic-harmonic mean inequality, for any real numbers a,b > 0, a + b a−1 + b−1 −1 . 2 ≥ 2 Applying this with a = 1 and c any eigenvalue of C−1, we see from the Spectral Theorem that 1 1 I + C −1 I + C−1 0 , 2 2 − 2 ≥ " # from which 1 1 A + B −1 A−1 + B−1 0 2 2 − 2 ≥ follows directly. Again, by continuity, the full convexity we seek follows from the midpoint convexity that we have now proved. The other ingredient to the proof of Theorem 2.6 is a set of integral representations for the functions A Ap in H+ for p in the ranges 1

2.8. Lemma. For all A H+, one has the following integral formulas: ∈ n π ∞ 1 (2.5)Ap = tp dt for all 1 0, and all 0 0, and some finite positive measure µ. ∈ 2.2. Convexity and monotonicity for trace functions. Given a function f : R R, consider the associated trace function on H given by → n A Tr[f(A)] . 7→ In this subsection we address the question: Under what conditions on f is such a trace function monotone, and under what conditions on f is it convex? We shall see that much less is required of f in this context than is required for operator monotonicity or operator convexity. Notice first of all that we are working now on H and not only H+, and with functions defined on all of R and not only on (0, ). n n ∞ The question concerning monotonicity is very simple. Suppose that f is continuously differentiable. Let B, C H+. Then by the Spectral Theorem and first ∈ n 86 ERIC CARLEN order perturbation theory, d Tr(f(B + tC)) = Tr(f ′(B)C) = Tr(C1/2f ′(B)C1/2) , dt t=0 where in the last step we have used cyclicity of the trace. As long as f has a positive derivative, all of the eigenvalues of f ′(B) will be positive, and so f ′(B) is positive semi-definite, and therefore so is C1/2f ′(B)C1/2. It follows immediately that Tr(C1/2f ′(B)C1/2) 0, and from here one easily sees that for A B, and ≥ ≥ with C = A B, − 1 Tr[f(A)] Tr[f(B)] = Tr(C1/2f ′(A + tB)C1/2)dt 0 . − ≥ Z0 Thus, Tr[f(A)] Tr[f(B)] whenever A>B and f is continuously differentiable ≥ and monotone increasing. By a simple continuity argument, one may relax the requirement that f be continuously differentiable to the requirement that f be continuous. The question concerning convexity is more interesting. Here we have the following theorem:

2.9. Theorem (Peierls Inequality). Let A H , and let f be any convex ∈ n function on R. Let u ,...,u be any orthonormal base of Cn. Then { 1 n} n (2.9) f ( u , Au ) Tr[f(A)] . h j j i ≤ j=1 X There is equality if each uj is an eigenvector of A, and if f is strictly convex, only in this case.

Proof: By (1.3) together with the spectral representation (1.7),

n m Tr[f(A)] = uj f(λk)Pk uj j=1 * " # + X kX=1 n m = f(λ ) P u 2 k k k j k j=1 ! X kX=1 n m (2.10) f λ P u 2 ≥ kk k j k j=1 ! X Xk=1 n = f ( u , Au ) . h j j i j=1 X The inequality above is simply the convexity of f, since for each j, m P u 2 = k=1 k k j k u 2 = 1, and thus m λ P u 2 is a weighted average of the eigenvalues of k jk k=1 kk k jk P A. Note that each u is an eigenvector of A if and only if P u 2 = 1 for some jP k k j k k, and is 0 otherwise, in which case the inequality in (2.10) is a trivial equality. And clearly when f is strictly convex, equality can hold in (2.10) only if for each j, P u 2 = 1 for some k, and is 0 otherwise. k k jk TRACE INEQUALITIES AND QUANTUM ENTROPY 87

Now consider A, B H , and let f : R R be convex. Let u ,...,u be an ∈ n → { 1 n} orthonormal basis of Cn consisting of eigenvectors of (A+B)/2. Then, Theorem 2.9, n A + B A + B Tr f = f u , u 2 j 2 j j=1 X n 1 1 = f u , Au + u ,Bu 2h j j i 2h j j i j=1 X n 1 1 (2.11) f( u , Au )+ f( u ,Bu ) ≤ 2 h j ji 2 h j j i j=1 X 1 1 (2.12) Tr[f(A)] + Tr[f(B)] . ≤ 2 2 where we have used Theorem 2.9, in the first equality, and where in (2.11) we have used the (midpoint) convexity of f, and in (2.12) we have used Theorem 2.9 again. This shows that for every natural number n, whenever f is midpoint convex, the map A Tr[f(A)] is midpoint convex on H . Note that if f is strictly convex 7→ n and Tr[f(A + B)/2] = Tr[f(A)]/2 + Tr[f(A)]/2, we must have equality in both (2.11) and (2.12). On account of the strict convexity of f, equality in (2.11) implies that u , Au = u ,Bu for each u . By the conditions for equality in Peierl’s h j j i h j ji j inequality, equality in (2.11) implies that each uj is an eigenvector of both A and B. Thus, Au = u , Au u = u ,Bu u = Bu , j h j j i j h j ji j j and so A = B. A simple continuity argument now shows that if f continuous as well as convex, A Tr[f(A)] is convex on H , and strictly so if f is strictly convex. 7→ n Let us summarize some conclusions we have drawn so far in a theorem: 2.10. Theorem (Convexity and monotonicty of trace functions). Let f : R R → be continuous, and let n be any natural number. Then if t f(t) is monotone 7→ increasing, so is A Tr[f(A)] on H . Likewise, if t f(t) is convex, so is 7→ n 7→ A Tr[f(A)] on H , and strictly so if f is strictly convex. 7→ n 2.3. Klein’s Inequality and the Peierls-Bogoliubov Inequality. We close this section with three trace theorems that have significant applications in statistical quantum mechanics. 2.11. Theorem (Klein’s Inequality). For all A, B H , and all differentiable ∈ n convex functions f : R R, or for all A, B H+, and all differentiable convex → ∈ n functions f : (0, ) R ∞ → (2.13) Tr[f(A) f(B) (A B)f ′(B)] 0 . − − − ≥ In either case, if f is strictly convex, there is equality if and only if A = B.

Proof: Let C = A B so that for 0

Tr(A ln A) A Sn, (2.17) S(A) := − ∈ ( otherwise. −∞ 2.13. Theorem (Duality formula for the entropy). For all A H , ∈ n (2.18) S(A) = sup Tr(AH) ln Tr eH : H H . − − ∈ n The supremum is an attained maximum if and only if A is a strictly positive probability density, in which case it is attained at H if and only if H = ln A + cI for some c R. Consequently, for all H H , ∈ ∈ n (2.19) ln Tr eH = sup Tr(AH)+ S(A) : A H . { ∈ n} The supremum is a maximum at all points of the domain of ln Tr eH , in which case it is attained only at the single point A = eH /(Tr(eH )). Proof: To see that the supremum is unless 0 A I, let c be any real number, ∞ ≤ ≤ and let u be any unit vector. Then let H be c times the orthogonal projection onto u. For this choice of H,

Tr(AH) ln Tr eH = c u,Au ln(ec + (n 1)) . − h i− − If u,Au < 0, this tends to as ctends to . If u,Au > 1, this tends to h i ∞ −∞ h i ∞ as c tends to . Hence we need only consider 0 A I. Next, taking H = cI, ∞ ≤ ≤ c R, ∈ Tr(AH) ln Tr eH = cTr(A) c ln(n) . − − − Unless Tr(A) = 1, this tends to as ctends to . Hence we need only consider ∞ ∞ the case that A is a density matrix ρ. Hence, consider any ρ S , and let H be any self-adjoint operator. In finite ∈ n dimensions, necessarily Tr(eH ) < , and then we may define the density matrix σ ∞ 90 ERIC CARLEN by eH σ = . Tr(eH ) By Klein’s inequality, Tr(ρ ln ρ ρ ln σ) 0 with equality if and only if σ = ρ. But − ≥ by the definition of σ, this reduces to Tr(ρ ln ρ) Tr(ρH) ln Tr eH , ≥ − with equality if and only if H = ln ρ. From here, there rest is very simple. As we have explained in the introduction, (1.14), which is now justified by Theorem 2.13, shows that the Gibbs states maximize the entropy given the expected value of the energy.

3. The joint convexity of certain operator functions The route to the proof of the joint convexity of the relative entropy passes through the investigation of joint convexity for certain operator functions. This section treats three important examples.

3.1. The joint convexity of the map (A, B) B∗A−1B∗ on H+ M . In 7→ n × n this section we shall prove the joint convexity or joint concavity of certain operator functions. Our ﬁrst example concerns the map (A, B) B∗A−1B∗ on H+ M 7→ n × n which we shall show to be convex. Our next two examples concern the operator version of the harmonic and geometric means of two operators A, B H+. We ∈ n shall show that these are jointly concave. All three proofs follow the same pattern: In each of them, we show that the function in question has a certain maximality or minimality property, and we then easily prove the concavity or convexity as a consequence of this. All three proofs are taken from Ando’s paper [1]. Here is the main theorem of this subsection:

3.1. Theorem. The map (A, B) B∗A−1B from H+ M to H+ is jointly 7→ n × n n convex. That is, for all (A ,B ), (A ,B ) H+ M , and all 0

Proof: Define D := C B∗A−1B∗, so that − A B A B 0 0 = + . B∗ C B∗ B∗A−1B 0 D Now notice that ∗ A B A1/2 A−1/2B A1/2 A−1/2B (3.2) = 0 . B∗ B∗A−1B 0 0 0 0 ≥ A B Hence, positive semi-definiteness of D is sufficient to ensure that is B∗ C positive semi-definite. It is also evident from the factorization (3.2) that for any v Cn, the vector ∈ A−1Bv A B belongs to the null space of , so that v B∗ B∗A−1B A−1Bv A B A−1Bv , = v,Dv , v B∗ B v h i A B and hence positive semi-definiteness of D is necessary to ensure that B∗ C is positive semi-definite. A B Lemma 3.2 says that the set of matrices C H+ such that is ∈ n B∗ C positive semi-definite has a minimum, namely C = B∗A−1B∗. Form this, Ando draws a significant conclusion: Proof of Theorem 3.1: By Lemma 3.2, A B A B (1 t) 0 0 + t 1 1 − B∗ B∗A−1B B∗ B∗A−1B 0 0 0 0 1 1 1 1 is a convex combination of positive semi-definite matrices, and is therefore positive semi-definite. It is also equal to (1 t)A + tA (1 t)B + tB − 0 1 − 0 1 . (1 t)B∗ + tB∗ (1 t)B∗A−1B + tB∗A−1B − 0 1 − 0 0 0 1 1 1 Now (3.1) follows by one more application of Lemma 3.2.

3.2. Joint concavity of the harmonic mean. Ando also uses Lemma 3.2 to prove a theorem of Anderson and Duﬃn on the concavity of the harmonic mean for operators.

3.3. Definition. For A, B H+, the harmonic mean of A and B, M (A, B) ∈ n −1 is deﬁned by A−1 + B−1 −1 M (A, B)= . −1 2 3.4. Theorem (Joint concavity of the harmonic mean). The map (A, B) 7→ M (A, B) on H+ H+ is jointly concave. −1 n × n 92 ERIC CARLEN

Proof: The key is the identity (3.3) M (A, B)=2B 2B(A + B)−1B . −1 − Granted this, the jointly concavity of M−1(A, B) is a direct consequence of Theo- rem 3.1. To prove (3.3), note that B B(A + B)−1B = (A + B)(A + B)−1B B(A + B)−1B − − = A(A + B)−1B , and also (A(A + B)−1B)−1 = B−1(A + B)A−1 = A−1 + B−1 .

Here we have used the minimality property in Lemma 3.2 implicitly, but there is another way to proceed: It turns out that the harmonic mean has a certain maximality property: 3.5. Theorem (Ando’s variational formula for the harmonic mean). For all A, B H+, the set of all C H such that ∈ n ∈ n A 0 C C 2 0 0 B − C C ≥ has a maximal element, which is M−1(A, B). We remark that this maximum property of the harmonic mean gives another proof of the concavity of the harmonic mean, just as the minimum property of (A, B) B∗A−1B from Lemma 3.2 gave proof of the convexity of this function. 7→ Proof of Theorem 3.5: Note that as a consequence of (3.4), (3.3) and the fact that M−1(A, B)= M−1(B, A), we have

M (B, A) = [A A(A + B)−1A] + [B B(A + B)−1B] −1 − − (3.4) = A(A + B)−1B + B(A + B)−1A , from which it follows that 1 (3.5) 2M (A, B) = (A + B) (A B) (A B) . −1 − − A + B − (Incidentally, we remark that from this identity one easily see the harmonic- arithmetic mean inequality: M (A, B) (A + B)/2 with equality if and only −1 ≤ if A = B.) A + B A B 2C 0 Furthermore, by Lemma 3.2 , − 0 if and only A B A + B − 0 0 ≥ if − (A + B) 2C (A B)(A + B)−1(A B) , − ≥ − − and by (3.5), this is the case if and only if C M (A, B). Finally, note that ≤ −1 A + B A B 2C 0 A 0 C C − 0 2 0 . A B A + B − 0 0 ≥ ⇐⇒ 0 B − C C ≥ − TRACE INEQUALITIES AND QUANTUM ENTROPY 93

3.3. Joint concavity of the geometric mean.

3.6. Definition. For A, B H+, the geometric mean of A and B, M (A, B) ∈ n 0 is deﬁned by 1/2 −1/2 −1/2 1/2 1/2 M0(A, B)= A (A BA ) A .

We note that if A and B commute, this deﬁnition reduces to M0(A, B) = A1/2B1/2. While it is not immediately clear from the deﬁning formula that in general one has that M0(A, B)= M0(B, A), this is clear from the following variational formula of Ando:

3.7. Theorem (Ando’s variational formula for the geometric mean). For all A, B H+, the set of all C H such that ∈ n ∈ n A C 0 C B ≥ has a maximal element, which is M0(A, B). A C Proof: If 0, then by Lemma 3.2, B CA−1C, and hence C B ≥ ≥ A−1/2BA−1/2 A−1/2CA−1CA−1/2 = (A−1/2CA−1/2)2 . ≥ By the operator monotonicity of the square root functions, which has been proved in Example 2.3 and as a special case of the L¨owner-Heinz Theorem,

A1/2(A−1/2BA−1/2)1/2A1/2 C . ≤ On the other hand, if C = A1/2(A−1/2BA−1/2)1/2A1/2, then B = CA−1C This shows the maximality property of M0(A, B).

3.8. Theorem (Joint concavity of the geometric mean). The map (A, B) 7→ M (A, B) on H+ H+ is jointly concave, and is symmetric in A and B. Moreover, 0 n × n for any non-singular matrix D M , ∈ n ∗ ∗ ∗ (3.6) M0(D AD,D BD)= D M0(A, B)D . Finally, (A, B) M (A, B) is monotone increasing in each variable. 7→ 0 Proof: The argument for the concavity is by now familiar. For (3.6), note that A C D∗AD D∗CD > 0 > 0 . C B ⇐⇒ D∗CD D∗BD Finally, for ﬁxed A, the fact that B A1/2(A−1/2BA−1/2)1/2A1/2 = M (A, B) 7→ 0 is monotone increasing is a direct consequence of the monotonicity of the square root function, which is contained in Theorem 2.6. By symmetry, for ﬁxed B, A M (A, B) is monotone increasing. 7→ 0 94 ERIC CARLEN

3.4. The arithmetic-geometric-harmonic mean inequality.

Let M1(A, B) denote the arithmetic mean of A and B; i.e., A + B M (A, B)= . 1 2 3.9. Theorem (Arithmetic-Geomemtric-Harmonic Mean Inequality). For all A, B H+, ∈ n M (A, B) M (A, B) M (A, B) , 1 ≥ 0 ≥ −1 with strict inequality everywhere unless A = B.

Proof: We first note that one can also use (3.5) to deduce that for all A, B H+ ∈ n and nonsingular D M , ∈ n ∗ ∗ ∗ (3.7) M−1(D AD,D BD)= D M−1(A, B)D in the same way that we deduced (3.6). However, (3.7) can also be deduced very simply from the formula that defines M−1(A, B). We now show that (3.6) and (3.7) together reduce the proof to the corresponding inequality for numbers [16], which is quite elementary. To see this, take D = A−1/2 and letting L = A−1/2BA−1/2, we have from the obvious companion for M1(A, B) to (3.6) and (3.7) that M (A, B) M (A, B) = A1/2[M (I,L) M (I,L)]A1/2 1 − 0 1 − 0 1 = A1/2 I + L 2L1/2 A1/2 2 − 1 h i = A1/2(I L1/2)2A1/2 . 2 − The right hand side is evidently positive semi-definite, and even positive definite unless L = I, which is the case if and only if A = B. Likewise,

M (A, B) M (A, B) = A1/2[M (I,L) M (I,L)]A1/2 0 − −1 0 − −1 = A1/2(L1/2 2(I + L−1)−1)2A1/2 . − The right hand side is positive semi-deﬁnite by the Spectral Theorem and the geometric-harmonic mean inequality for positive numbers, and even positive deﬁnite unless L = I, which is the case if and only if A = B.

4. Projections onto -subalgebras and convexity inequalities ∗ 4.1. A simple example. The notion of operator convexity shall be useful to us when we encounter an operation on matrices that involves averaging; i.e., forming convex combinations. One is likely to encounter such operations much more frequently than one might ﬁrst expect. It turns out that many natural operations on Mn can be written in the form N (4.1) A (A) := w U AU ∗ 7→ C j j j j=1 X TRACE INEQUALITIES AND QUANTUM ENTROPY 95

N where the weights w1,...,wN are positive numbers wtih j=1 wj = 1, and the U ,...,U are unitaries in M . A basic example concerns orthogonal projections 1 N n P onto -subalgebras, as we now explain. ∗ A unital -subalgebra of M is a subspace A that contains the identity I, is ∗ n closed under matrix multiplication and Hermitian conjugation: That is, if A, B A, ∈ then so are AB and A∗. In what follows, all -subalgebras that we consider shall be ∗ unital, and to simplify the notation, we shall simply write -subalgebra in place of ∗ unital -subalgebra. ∗ Perhaps the simplest example is z w A = A M : A = w,z C . ∈ 2 w z ∈ Since x y ξ η xξ + yη xη + yξ (4.2) = , y x η ξ xη + yξ xξ + yη we see that A is in fact closed under multiplication, and quite obviously it is closed under Hermitian conjugation. Moreover, one sees from (4.2) that

x y ξ η ξ η x y = ; y x η ξ η ξ y x that is, the algebra A is a commutative subalgebra of M2. The main theme of this section is that: Orthogonal projections onto subalgebras can be expressed in terms of averages over unitary conjugations, as in (4.1), and that this introduction of averages opens the way to the application of convexity inequalities. By orthogonal, we mean of course orthogonal with respect to the Hilbert-Schmidt inner product. This theme was first developed by C. Davis [8]. Our treatment will be slightly different, and presented in a way that shall facilitate the transition to infinite dimensions. In our simple example, it is particularly easy to see how the projection of M2 onto A can be expressed in terms of averages. Let 0 1 (4.3) Q = . 1 0 Clearly Q = Q∗ and QQ∗ = Q2 = I, so that Q is both self-adjoint and unitary. a b Notice that for any A = M , c d ∈ 2 1 1 a + d b + c (A + QAQ∗)= A . 2 2 b + c a + d ∈

1 ∗ Let us denote (A + QAQ ) by EA(A) for reasons that shall soon be explained. 2 One can easily check that EA(A) is the orthogonal projection of A onto A. Indeed, it suﬃces to check by direct computation that EA(A) and A EA(A) are − orthogonal in the Hilbert-Schmidt inner product. 96 ERIC CARLEN

The fact that

1 ∗ EA(A)= (A + QAQ ) 2 is an average over unitary conjugations of A means that if f is any operator convex function, then f(A)+ Qf(A)Q∗ f(A)+ f(QAQ∗) EA[f(A)] = = 2 2 A + QAQ∗ f = f (EA[A]) . ≥ 2 Let us look ahead and consider an important application of this line of reasoning. Consider relative entropy function A, B H(A B) = Tr[A log A] 7→ | − Tr[A log(B)]. In Theorem 6.3, this function shall be proved to be jointly convex on H+ H+. It is also clearly unitarily invariant; i.e., for any n n unitary n × n × matrix U, H(UAU ∗ UBU ∗)= H(A B) . | | It then follows that, for n = 2, and A being the subalgebra of M2 deﬁned above, H(A B)+ H(QAQ∗ QBQ∗) H(A B) = | | | 2 A + QAQ∗ B + QBQ∗ H ≥ 2 2

= H(EA(A) EA(B)) . | It turns out that there is nothing very special about the simple example we have been discussing: if A is any -subalgebra of M , the orthogonal projection ∗ n onto A can be expressed in terms of averages of unitary conjugations, and from this fact we shall be able to conclude a number of very useful convexity inequalities. The notation EA for the orthogonal projection onto a -subalgebra reﬂects a ∗ close analogy with the operation of taking conditional expectations in probability theory: Let (Ω, , µ) be a probability space, and suppose that is a sub-σ-algebra F S of . Then L2(Ω, , µ) will be a closed subspace of L2(Ω, , µ), and if X is any F S F bounded random variable on (Ω, , µ); i.e., any function on Ω that is measurable F with respect to , and essentially bounded with respect to µ, the conditional expec- F tation of X given is the orthogonal projection of X, which belongs to L2(Ω, , µ), S S onto L2(Ω, , µ). The bounded measurable functions on (Ω, , µ) of course form F F an commutative -algebra (in which the operation is pointwise complex conjuga- ∗ ∗ tion), of which the bounded measurable functions on (Ω, , µ) form a commutative S -subalgebra. The non-commutative analog of the conditional expectation that we ∗ now develop is more than an analog; it is part of a far reaching non-commutative extension of probability theory, and integration in general, due to Irving Segal [28, 30].

4.2. The von Neumann Double Commutant Theorem. TRACE INEQUALITIES AND QUANTUM ENTROPY 97

4.1. Definition (Commutant). Let be any subset of M . Then ′, the A n A commutant of , is given by A ′ = B M : BA = AB for all A . A { ∈ n ∈A} It is easy to see that for any set , ′ is a subalgebra of M , and if is closed A A n A under Hermitian conjugation, then ′ is a -subalgebra of M . A ∗ n In particular, if A is a -subalgebra of M , then so is A′, the commutant of A. ∗ n Continuing in this way, the double commutant A′′ is also a -subalgebra of M , ∗ n but it is nothing new: 4.2. Theorem (von Neumann Double Commutant Theorem). For any - ∗ subalgebra A of Mn, (4.4) A′′ = A .

Proof: We ﬁrst show that for any -subalgebra A, and any B A′′ and any v Cn, ∗ ∈ ∈ there exists an A A such that ∈ (4.5) Av = Bv . Suppose that this has been established. We then apply it to the -subalgebra ∗ M of Mn2 consisting of diagonal block matrices of the form A 0 . . . 0 0 A ... 0   A . = A In×n , A . 0 .. 0 ⊗ ∈      0 0 ... A    It is then easy to see that M′′ consists of diagonal block matrices of the form B 0 . . . 0 0 B ... 0   A′′ . = B In×n , B . 0 .. 0 ⊗ ∈      0 0 ... B    v1 2 Now let v ,...,v be any basis of Cn, and form the vector v = . Cn . { 1 n}  .  ∈ v  n  Then   (A I )v = (B I )v Av = Bv j =1,...,n . ⊗ n×n ⊗ n×n ⇒ j j Since v ,...,v is a basis of Cn, this means B = A A. Since B was an arbitrary { 1 n} ∈ element of A′′, this shows that A′′ A . ⊂ Since A A′′ is an automatic consequence of the deﬁnitions, this shall prove that ⊂ A = A′′. Therefore, it remains to prove (4.5). Fix any v Cn, and let V be the subspace ∈ of Cn given by (4.6) V = Av : A A . { ∈ } 98 ERIC CARLEN

Let P be the orthogonal projection onto V in Cn. Since, by construction, V is invariant under the action of A, P AP = AP for all A A. Taking Hermitian ∈ conjugates, P A∗P = P A∗ for all A A. Since A is a -algebra, this imples ∈ ∗ P A = AP for all A A. That is, P A′. ∈ ∈ Thus, for any B A′′, BP = PB, and so V is invariant under the action of ∈ A′′. In particular, Bv V , and hence, by the deﬁnition of V , Bv = Av for some ∈ A A. ∈

4.3. Remark. von Neumann proved his double commutant theorem [37] for operators on an infinite dimensional Hilbert space, but all of the essential aspects are present in the proof of the finite dimensional specialization presented here. The relevant difference between finite and infinite dimensions is, of course, that in finite dimensional all subspaces are closed, while this is not the case in infinite dimensions. Thus in the infinite dimensional case, we would have to replace (4.6) by (4.7) V = Av : A A , { ∈ } taking the closure of Av : A A . The same proof would then lead to the { ∈ } conclusion that for all B A′′, Bv lies in the closure of Av : A A . Thus ∈ { ∈ } one concludes that A′′ = A if and only if A is closed in the weak operator topology, which is the usual formulation in the infinite dimensional context.

4.4. Example. The -subalgebra A of M from subsection 4.1 is spanned by ∗ 2 I and Q, where Q is given by (4.3). This -subalgebra happens to be commu- 2×2 ∗ tative, and so it is certainly the case that A A′ – a feature that is special to ⊂ the commutative case. In fact, one can easily check that AQ = QA if and only if A A, and so A = A′. It is then obviously the case that A = A′′, as von Neumann’s ∈ theorem tells us.

4.5. Lemma. Let A be a -subalgebra of Mn. For any self-adjoint A A m ∗ ∈ let A = j=1 λj Pj be the spectral decomposition of A. Then each of the spectral projections P , j =1,...,m belongs to A. Moreover, each A A can be written as P j ∈ a linear combination of at most 4 unitary matrices, each of which belongs to A.

Proof: It is easy to see that 1 (4.8) Pj = (λi A) . λi λj − i∈{1,...,nY}\{j} − As a polynomial in A, this belongs to A. If furthermore A is a self adjoint contraction, then each λ lies in [ 1, 1], and j − hence λ = cos(θ ) for some θ [0, π]. In this case we may write j j ∈ m 1 m m A = λ P = eiθj P + e−iθj P . j j 2  j j  j=1 j=1 j=1 X X X m iθj   Note that U = j=1 e Pj is unitary, and since it is a linear combination of elements of A, U A. Thus we have seen that every self-adjoint contraction in A P∈ TRACE INEQUALITIES AND QUANTUM ENTROPY 99 is of the form 1 A = (U + U ∗) U A . 2 ∈ Now for general A A, write ∈ 1 1 A = (A + A∗)+ i(A A∗) . 2 2i − which expresses A as a linear combination of self-adjoint elements of A. From here, the rest easily follows.

4.6. Lemma. Let A be a -subalgebra of M . Then for any A M ∗ n ∈ n (4.9) A A A = UAU ∗ for all U A′ . ∈ ⇐⇒ ∈ Proof: Since for unitary U, A = UAU ∗ if and only if UA = AU, the condition that A = UAU ∗ for all U A′ amounts to the condition that A commutes with every ∈ unitary matrix in A′. But by the previous lemma, commuting with every unitary matrix in A′ is the same thing as commuting with every matrix in A′. Thus A = UAU ∗ for all U A′ A A′′ . ∈ ⇐⇒ ∈ Then by the von Neumann Double Commutant Theorem, (4.9) follows. The fact that all -subalgebras of M contain “plenty of projections, and plenty ∗ n of unitaries”, as expressed by Lemma 4.5, is often useful. As we shall see, there is another important sense in which -subalgebra of M are rich in unitaries. We ∗ n ﬁrst recall the polar factorization of a matrix A M . ∈ n 4.7. Lemma. For any matrix A M , let A = (A∗A)1/2. Then there is a ∈ n | | unique partial isometry U M such that A = U A , U is an isometry from the ∈ n | | range of A∗ onto the range of A, and U is zero on the nullspace of A. If A is invertible, U is unitary, and in any case, for all v Cn, ∈ Uv = lim A(A∗A + ǫI)−1/2v . ǫ→0

We leave the easy proof to the reader. Now let pn(t) be a sequence of polynomi- als such that limn→∞ pn(t)= √t, uniformly on an interval containing the spectrum of A∗A. Then ∗ A = lim pn(A A) . | | n→∞ Now, if A is any -subalgebra of M , and A any matrix in A, then for each n, ∗ n p (A∗A) A, and hence A A. The same argument shows that for each ǫ > 0, n ∈ | | ∈ (A∗A + ǫI)1/2 A. We now claim that (A∗A + ǫI)−1/2 A as well. Indeed: ∈ ∈ 4.8. Lemma. Let A be any -subalgebra of M , and let B H+ belong to A. ∗ n ∈ n Then the inverse of B also belongs to A.

Proof: The spectrum of B −1B lies in the interval (0, 1], and hence I k k k − B −1B < 1. Thus, k k k ∞ ( B B)−1 = [I (I B B)]−1 = (I B B)n , k k − −k k −k k n=0 X 100 ERIC CARLEN and by the above, each term in the convergent power series on the right belongs to A. Thus ( B B)−1 belongs to A, and hence so does B−1. k k Thus for each ǫ> 0,

A A A(A∗A + ǫI)−1/2 A . ∈ ⇒ ∈ Taking limits, we see that if A = U A is the polar factorization of A, then both U | | and A belong to A. We can now improve Lemma 4.8: | | 4.9. Theorem. Let A be any -subalgebra of M . Then for all A A such ∗ n ∈ that A is invertible in Mn, A is invertible in A; i.e., the inverse of A belongs to A.

Proof: Let A be invertible in M , and let A = U A be the polar factorization of n | | A. Then A−1 = A −1U ∗. Since U A, which is a -subalgebra, U ∗ A as well. | | ∈ ∗ ∈ Since A is invertible, so is A , and we have seen that A A. Then by Lemma 4.8, | | | | ∈ A −1 A. Altogether, we have that | | ∈ A−1 = A −1U ∗ A . | | ∈

4.3. Properties of the conditional expectation.

4.10. Definition. For any -subalgebra A of M , let EA be the orthogonal ∗ n projection, with respect to the Hilbert-Schmidt inner product of Mn onto A (which is a closed subspace of Mn.) We refer to EA as the conditional expectation given A.

4.11. Example. Let u ,...,u be any orthonormal basis of Cn. Let A be { 1 n} the subalgebra of Mn consisting of matrices that are diagonal in this basis; i.e., A A if and only if A = n a u u∗ for some a ,...,a C. Here, we are ∈ j=1 j j j 1 n ∈ writing u u∗ to denote the orthogonal projection onto the span of the unit vector j j P u . In the usual physics notation, this is denoted by u u . We shall use these j | jih j | notations interchangeably. For any B M , the matrix B := n u ,Bu u u A, and moreover, ∈ n j=1h j j i| j ih j | ∈ for any A A, P ∈ e n Tr[A(B B)] = u , A(B B)u =0 , − h j − j i j=1 X e e and so B is the orthogonal projection of B onto A. Thus, we have the formula n (4.10) e EA(B)= u ,Bu u u h j j i| jih j | j=1 X for all B M , where in the usual physical notation, u denotes the orthogonal ∈ n ih j | projection onto the span of uj.

The next result is based on the projection lemma, which is very useful also in ﬁnite dimensions! For the sake of completeness, we give the simple proof: TRACE INEQUALITIES AND QUANTUM ENTROPY 101

4.12. Theorem (Projection Lemma). Let K be a closed convex set in a Hilbert space. Then K contains a unique element of minimal norm. That is, there exists v K such that v < w for all w K, w = v. ∈ k k k k ∈ 6 Proof: Let D := inf w : w K . If D = 0, then 0 K since K is closed, and {k k ∈ } ∈ this is the unique element of minimal norm. Hence we may suppose that D > 0. Let w N be a sequence in K such that lim w = D. By the parallelogram { n}n∈ n→∞ k nk identity w + w 2 w w 2 w 2 + w 2 m n + m − n = k mk k nk . 2 2 2

2 wm + wn 2 By the convexity of K, and the deﬁnition of D, D and so 2 ≥

2 2 2 2 2 w w wm D + wn D m − n k k − k k − . 2 ≤ 2

By construction, the right side tends to zero, and so wn n∈N is a Cauchy sequence. { } Then, by the completeness that is a deﬁning property of Hilbert spaces, w N { n}n∈ is a convergent sequence. Let v denote the limit. By the continuity of the norm, v = lim w = D. Finally, if u is any other vector in K with u = D, k k n→∞ k nk k k (u + v)/2 K, so that (u + v)/2 D. Then by the parallelogram identity once ∈ k k≥ more (u v)/2 = 0, and so u = v. This proves the uniqueness. k − k 4.13. Theorem. For any A M , and any -subalgebra A of M , let K ∈ n ∗ n A denote the closed convex hull of the operators UAU ∗, U A′. Then EA(A) is the ∈ unique element of minimal (Hilbert-Schmidt) norm in KA. Furthermore,

(4.11) Tr[EA(A)] = Tr[A] ,

(4.12) A> 0 EA(A) > 0 , ⇒ and for each B A, ∈ (4.13) EA(BA)= BEA(A) and EA(AB)= EA(A)B .

Proof: We apply the Projection Lemma in Mn equipped with the Hilbert-Schmidt inner product, so that it becomes a Hilbert space, and we may then apply the Projection Lemma. For each A M , let A denote the unique element of minimal ∈ n norm in K , and let U A′ be unitary. Then by the parallelogram law, A ∈ 2 2 e A + UAU ∗ A UAU ∗ A 2 + UAU ∗ 2 + − = k k k k = A 2 . 2 2 2 k k

e e e e e e Since (A + UAU ∗)/2 K , (A + UAU ∗)/2 A , the minimale norm in K , ∈ A k k≥k k A and hence 2 e e eA UeAU ∗ e − =0 . 2 e e ∗ ′ This means that A = UAU for all unitary U A . By Lemma 4.6, this means ∈ that A A. ∈ e e e 102 ERIC CARLEN

Next we claim that (4.14) B, A A = 0 for all B A , h − i ∈ which, together with the fact that A A identiﬁes A as the orthogonal projection e ∈ of A onto A. ′ ∗ ∗ To prove (4.14) note that for Be A and Uj Ae, U(AB)U = UAU B Hence N ∗ ∈ ∈ if j=1 wj Uj(AB)Uj is any convex combination of unitary conjugations of AB with each U A′, P j ∈ N N w U (AB)U ∗ = w U AU ∗ B . j j j  j j j  j=1 j=1 X X It readily follows that   (4.15) AB = AB . Now observe that since unitary conjugation preserves the trace, each element g e of K has the same trace, namely the trace of A. In particular, for all A M , A ∈ n (4.16) Tr[A] = Tr[A] . Combining this with (4.15) yields e 0 = Tr[AB] Tr[AB] − = Tr[AB] Tr[AB] g − = Tr[(A A)B] . e − Since B is an arbitrary element of A, this proves (4.14). Thus, A is the orthogonal e projection of A onto A. Now that we know A = EA(A), (4.11) follows from (4.16), and the identity on the right in (4.13) now follows from (4.15), ande then the identity on the left follows by Hermitian conjugation.e Finally, if A > 0, so is each UAU ∗, and hence so is each member of KA, including the element of minimal norm, EA(A).

4.14. Remark. Theorem 4.13 says that for all A M , all -subalgebras A of ∈ n ∗ M and all ǫ> 0, there exists some set of N unitaries U ,...,U A′ and some n 1 N ∈ set of N weights, w1,...,wN , non-negative and summing to one, such that N ∗ (4.17) EA(A) w U AU ǫ . k − j j j k≤ j=1 X In fact, in finite dimensional settings, one can often avoid the limiting procedure and simply write N ∗ (4.18) EA(A)= wj Uj AUj , j=1 X as an exact equality. An important instance is provided in the next example. How- ever, while the statement of Theorem 4.13 is couched in finite dimensional terms, this is only for simplicity of exposition: The proof makes no reference to the finite TRACE INEQUALITIES AND QUANTUM ENTROPY 103 dimension of Mn, and the approximation of conditional expectations provided by (4.17) is valid – and useful – in infinite dimensions as well.

4.15. Example. As in Example 4.11, let u1,...,un be any orthonormal basis n { } of C , and let A be the subalgebra of Mn consisting of matrices that are diagonal in this basis. There we derived an explicit formula (4.10) for EA. Theorem 4.13 says that there exists an alternate expression of EA as a limit of averages of unitary conjugations. In fact, it can be expressed as an average over n unitary conjugations, and no limit is needed in this case. To see this, for k =1,...,n deﬁne the unitary matrix Uk by n (4.19) U = ei2πℓk/n u u . k | ℓih ℓ| Xℓ=1 n Then, for any B M , U BU ∗ = u Bu ei2π(m−ℓ)k/n u u . Therefore, ∈ n k k h m ℓi | mih ℓ| ℓ,m=1 averaging over k, and then swappingX orders of summation,

n n n 1 ∗ 1 i2π(m−ℓ)k/n UkBUk = umBuℓ e um uℓ n h i n ! | ih | kX=1 ℓ,mX=1 Xk=1 n = u Bu u u = EA(B) . h m mi| mih m| m=1 X In summary, with U ,...,U deﬁned by (4.19), and A being the -subalgebra 1 n ∗ of M consisting of matrices that are diagonalized by u ,...,u , n { 1 n} n 1 ∗ (4.20) EA(B)= U BU . n k k Xk=1 In other words, the “diagonal part” of B is an average over n unitary conjugations of B.

Theorem 4.13 is the source of many convexity inequalities for trace functions. Here is one of the most important:

4.16. Theorem. Let f be any operator convex function. Then for any - ∗ subalgebra A of M , and any self-adjoint operator A M , n ∈ n f(EA(A)) EA(f(A)) . ≤ Proof: Since f(EA(A)) EA(f(A)) is a self-adjoint operator in A, all of its spec- − tral projections are in A, and thus, it suﬃces to show that for every orthogonal projection P in A,

(4.21) Tr [Pf(EA(A))] Tr [P EA(f(A))] . ≤ But, since P A, ∈ (4.22) Tr [P EA(f(A))] = Tr [Pf(A)] . 104 ERIC CARLEN

Next, by Theorem 4.13, for any A M , EA(A) is a limit of averages of unitary ∈ n conjugates of A. That is EA(A) = lim (A), where each (A) has the form k→∞ Ck Ck Nk (4.23) (A)= p U AU ∗ Ck k,j n,j k,j j=1 X A′ Nk and where for each k,j Uk,j is a unitary in , pk,j > 0, and j=1 pk,j = 1. Then, by the operator convexity of f, P

Nk f( (A)) p U f(A)U ∗ , Ck ≤  k,j k,j k,j  j=1 X   and then since P A and U A′ for each k, j, ∈ k,j ∈

Nk Tr[Pf( (A))] p Tr U Pf(A)U ∗ Ck ≤ k,j k,j k,j j=1 X Nk = pk,j Tr [Pf (A)] j=1 X = Tr[Pf(A)] .

Therefore,

Tr [Pf(EA(A))] = lim Tr[Pf( k(A))] Tr[Pf(A)] . k→∞ C ≤ Combining this with (4.22) proves (4.21). Let us apply this to the von Neumann entropy. First of all, note that since EA is the orthogonal projection onto A, it is continuous. Thus, EA not only preserves the class of positive deﬁnite operators, as asserted in (4.12), it also preserves the class of positive semideﬁnite operators:

(4.24) A 0 EA(A) 0 . ≥ ⇒ ≥ This, together with (4.11) implies that if ρ M is a density matrix, then so is ∈ n EA(ρ). Now we may apply Theorem 4.16 to see that for any -subalgebra of M , and ∗ n any ρ S , the von Neumann entropy of EA(ρ), S(EA(ρ)), is no less than the von ∈ n Neumann entropy of ρ, S(ρ):

S(EA(ρ)) S(ρ) . ≥

In later sections, where we shall encounter jointly convex functions on Mn, we shall apply the same sort of reasoning. Towards that end, the following simple observation is useful: For any A, B in M , there is a single sequence N of n {Ck}k∈ operators of the form (4.23) such that both

EA(A) = lim k(A) and EA(B) = lim k(B) . k→∞ C k→∞ C TRACE INEQUALITIES AND QUANTUM ENTROPY 105

There is a very simple way to see that this is possible: Consider the -subalgebra ∗ M2(A) of M2n consisting of block matrices of the form

A B , A,B,C,D A . C D ∈

Then the same computations which show that the only matrices in M2 that commute with all other matrices in M2 are multiples of the identity show that ′ (M2(A)) consists of matrices of the form

X 0 , X A′ , 0 X ∈ ′ and hence the unitaries in (M2(A)) have the form

U 0 , U A′ ,UU ∗ = I , 0 U ∈ One readily computes that for any A,B,C,D M (M now, not only A), ∈ n n and any U A′, ∈ ∗ U 0 A B U 0 UAU ∗ UBU ∗ = . 0 U C D 0 U UCU ∗ UDU ∗ and moreover,

2 A B = A 2 + B 2 + C 2 + D 2 . C D k kHS k kHS k kHS k kHS HS

From this and Theorem 4.13, one readily concludes that

A B EA(A) EA(B) EM2(A) = , C D EA(C) EA(D) and that there exists a sequence N of operators of the form (4.23) such that {Ck}k∈ EA(A) EA(B) (A) (B) = lim Ck Ck . EA(C) EA(D) k→∞ (C) (D) Ck Ck

The same argument clearly applies to the larger block-matrix algebras Mm(A), m 2, and we draw the following conclusion: ≥ 4.17. Lemma. For any m matrices A ,...,A M , and any -subalgebra A 1 m ∈ n ∗ of M , there exists a sequence N of operators of the form (4.23) such that , n {Ck}k∈

EA(Aj ) = lim k(Aj ) for each j =1,...,m . k→∞ C The block matrix construction that has led to the proof of Lemma 4.17 provides a powerful perspective on a great many problems, and it will turn out to be important for far more than the proof of this lemma. In the meantime however, let us turn to some concrete examples of conditional expectations. 106 ERIC CARLEN

4.4. Pinching, conditional expectations, and the Operator Jensen In- equality.

4.18. Example (Pinching). Let A Hn have the spectral representation A = k k ∈ j=1 λj Pj , with j=1 Pj = I. (That is, we include the zero eigenvalue in the sum if zero is an eigenvalue.) Let A denote the commutant of A, which, as we have P P A observed is a -subalgebra of M . For simplicity of notation, let E denote the ∗ n A conditional expectation given AA; i.e.,

EA := EAA . We now claim that for any B M , ∈ n k (4.25) EA(B)= Pj BPj . j=1 X To prove this, note that Pj A = APj = λj Pj , so that

k k k P BP A = λ P BP = A P BP ,  j j  j j j  j j  j=1 j=1 j=1 X X X     and thus the right hand side of (4.25) belongs to AA. Next, ss we have seen in (4.8), each of the spectral projections Pj can be written as a polynomial in A, and hence belongs to A . Furthermore, for all C A , and A ∈ A each j =1 ...,k, CPj = Pj C. Therefore, for such C,

k k P BP C = P BCP ,  j j  j j j=1 j=1 X X so that   k k Tr B P BP C = Tr BC P BCP =0  −  j j    −  j j  j=1 j=1 X X k        since j=1 Pj = I. This shows that the right hand side of (4.25) is in fact the orthogonal projection of B onto AA, and proves (4.25). P k Davis [8] refers to the operation B j=1 Pj BPj for a set of orthogonal pro- k 7→ jections P1,...,Pk satisfying j=1 Pj = I Pas a pinching operation. The calculation we have just made shows that pinching is a conditional expectation: Indeed, given P k the orthogonal projections P1,...,Pk satisfying j=1 Pj = I, deﬁne the self-adjoint k operator A by A = j=1 jPj = I. With this deﬁnitionP of A, (4.25) is true using A on the left and P ,...,P on the right. 1 P k It now follows from Theorem 4.16 that for any operator convex function f, and k any set of orthogonal projections P1,...,Pk satisfying j=1 Pj = I,

k P (4.26) f P BP f(B)  j j  ≤ j=1 X   TRACE INEQUALITIES AND QUANTUM ENTROPY 107 for all B in Hn. The fact that “pinching” is actually a conditional expectation has several useful consequences, with which we close this section.

4.19. Theorem (Sherman-Davis Inequality). For all operator convex functions f, and all orthogonal projectios P M , ∈ n (4.27) Pf(P AP )P Pf(A)P ≤ for all A H . ∈ n

Proof: We take P = P and let P = I P . Then using (4.26) for P + P = I 1 2 − 1 1 2 and 2 P f P BP P = P f(P BP )P, 1  j j  1 1 1 1 j=1 X we obtain (4.27).   Theorem 4.19 is due to Davis and Sherman [7, 8]. Note that if f(0) = 0, (4.27) may be shortened to f(P AP ) Pf(A)P , ≤ but one case that comes up in applications is f(s)= s−1 where (4.27) must be used as is. The next inequality is a variant of Theorem 4.19 due to Davis [8].

4.20. Theorem (The Operator Jensen Inequality). Let V ,...,V M satisfy 1 k ∈ n k ∗ (4.28) Vj Vj = I j=1 X Then for any operator convex function f, and any B ,...,B H+, 1 k ∈ n k k (4.29) f V ∗B V V ∗f(B )V .  j j j  ≤ j j j j=1 j=1 X X   Proof: Let be any kn kn unitary matrix which, when viewed as a k k block U × × matrix with n n blocks has × Ui,j (4.30) = V i =1,...,n . Ui,n i Since (4.28) is satisfied, there are many ways to construct such a matrix : (4.30) U specifies the final n columns, which are unit vectors in Ckn by (4.28), and then the remaining columns can be filled in by extending these n unit vectors to an orthonormal basis of Ckn. Next, let be the kn kn matrix, again viewed as an k k block matrix, which B × × has B for its jth diagonal block, and zero for all off-diagonal blocks. Finally, let j P by the kn kn orthogonal projection with I in the upper left block, and zeros × n×n elsewhere. 108 ERIC CARLEN

Note that f( ) has f(B ) as its jth diagonal block, and zeros elsewhere. A B j simple calculation now shows that ∗ has k V ∗B V as its upper left n n U BU j=1 j j j × block, and P f ( ∗ )= ∗f( ) U BU U B U has k V ∗f(B )V as its upper left n n block. j=1 j j j × By (4.27), P f ( ∗ ) f ( ∗ ) , P PU BUP P≤P U BU P which, by the calculation we have just made, and by the deﬁnition of , is equivalent P to (4.29).

4.21. Remark. It is clear, upon taking each Vj to be a positive multiple of the identity, that the operator convexity of f is not only a suﬃcient condition for (4.29) to hold whenever V ,...,V M satisfy (4.28); it is also necessary. It 1 k ∈ n is remarkable that the class of functions f with the operator convexity property of Theorem 4.20, in which the convex combination is taken using operator valued weights, is not a proper subclass of the class of operator convex functions we have already deﬁned using scalar valued weights.

5. Tensor products 5.1. Basic definitions and elementary properties of tensor products. If V and W are two finite dimensional vector spaces, their tensor product is the space of all bilinear forms on V ∗ W ∗, where V ∗ and W ∗ are the dual spaces of V × and W respectively. That is, V ∗ consists of the linear functionals f on V , with the usual vector space structure ascribed to spaces of functions, and similarly for W . Of course, in any inner product space, we have an identification of V ∗ with V provided by the inner product. However, certain formulas for the tensor product that we shall make much use of will be most clear if we introduce the tensor product in its purest form as a vector space construct, without reference to any inner product. The next few paragraphs recall some elementary facts about dual spaces and matrix representations. Many readers may prefer to skip ahead to the formal definition of the tensor product, but we include the material to ensure that our notation is absolutely unambiguous. If v ,...,v is any basis of V , let f ,...,f denote the corresponding dual { 1 m} { 1 m} basis of V ∗. That is, for any v V , write ∈ m v = a v , a ,...,a C . j j 1 m ∈ j=1 X Since the coefficients a1,...,aj are uniquely determined by v, the map f : v a j 7→ j is well defined and is clearly a linear transformation from V to C; i.e., an element of V ∗. It is easy to see that f ,...,f spans V ∗, and also that { 1 m} f (v )= δ , 1 i, j m i j i,j ≤ ≤ TRACE INEQUALITIES AND QUANTUM ENTROPY 109 from which linear independence easily follows. Thus, f ,...,f is a basis of V ∗, { 1 m} and is, by definition, the basis dual to v ,...,v . { 1 m} The coordinate maps v (f (v),...,f (v)) Cm 7→ 1 m ∈ and f (f(v ),...,f(v )) Cm 7→ 1 m ∈ are the isomorphisms of V and V ∗ respectively with Cm that are induced by the dual bases v ,...,v and f ,...,f , and ultimately by the basis v ,...,v , { 1 m} { 1 m} { 1 m} since this determines its dual basis. In particular, for any v V and any f V ∗, ∈ ∈ m m (5.1) v = fj(v)vj and f = f(vj )fj . j=1 j=1 X X The dual basis is useful for many purposes. One is writing down matrix representations of linear transformations. If T : V V is any linear transformation of → V , let [T ] M be defined by ∈ m [T ]i,j = fi(T (vj )) , For any fixed basis, the matrix [T ] gives the action of T on coordinate vectors for that basis: If a vector v V has jth coordinate aj ; i.e., aj = fj (v), j = 1,...,m, ∈ m then T v has ith coordinate j=1[T ]i,j aj . 5.1. Definition (TensorP product of two finite dimensional vector spaces). For two finite dimensional vector spaces V and W , their tensor product space V W ⊗ is the vector space consisting of all bilinear forms K on V ∗ W ∗, equipped with × the usual vector space structure ascribed to spaces of functions. Given v V and w W , v w denote the bilinear from on V ∗ W ∗ given by ∈ ∈ ⊗ × (5.2) v w(f,g)= f(v)g(w) for all f V ∗ ,g W ∗ . ⊗ ∈ ∈ If v ,...,v and w ,...,w are bases of V and W respectively, let { 1 m} { 1 n} f ,...,f and g ,...,g denote the corresponding dual bases. By (5.1) and { 1 m} { 1 n} the definition of v w , for any bilinear form K on V ∗ W ∗, i ⊗ j × K(f,g)= K(f(vi)fi,g(wj )gj)= K(fi,gj )f(vi)g(wj ) i,j i,j X X = K(f ,g )[v w ](f,g) . i j i ⊗ j i,j X That is, (5.3) K = K(f ,g )[v w ] . i j i ⊗ j i,j X Thus, (5.4) v w : 1 i m , 1 j n { i ⊗ j ≤ ≤ ≤ ≤ } spans V W . It is also linearly independent, and is therefore a basis of V W . ⊗ ⊗ To see this linear independence, suppose that for some numbers b , 1 i i,j ≤ ≤ m , 1 j n, b v w = 0; i.e., b v w is the bilinear map on ≤ ≤ i,j i,j i ⊗ j i,j i,j i ⊗ j P P 110 ERIC CARLEN

V W sending everything to zero. But then applying b v w to (f ,g ) × i,j i,j i ⊗ j k ℓ we see P 0= b v w (f ,g )= b f (v )g (w )= b ,  i,j i ⊗ j  k ℓ i,j k i ℓ j k,ℓ i,j i,j X X which shows the linear independence. We are now in a position to deﬁne a key isomorphism: 5.2. Definition (Matrix isomorphism). Given any two bases v ,...,v and { 1 m} w ,...,w of V and W respectively, and hence the corresponding dual bases { 1 n} f ,...,f and g ,...,g of V ∗ and W ∗ respectively, the matrix isomorphism { 1 m} { 1 n} is the identiﬁcation of V W with the space M of m n matrices given by ⊗ m×n × (5.5) V W K [K] M ⊗ ∋ 7→ ∈ m×n where

(5.6) [K]i,j = K(fi,gj) . The fact that (5.5) is an isomorphism follows directly from (5.3) and the fact that (5.4) is a basis of V W . Of course, this isomorphism depends on the choice ⊗ of bases, but that shall not diminish its utility. 5.3. Example. For any v V and w W , what is the matrix corresponding ∈ ∈ to v w? (Of course we assume that the bases v ,...,v and w ,...,w of ⊗ { 1 m} { 1 n} V and W , and their corresponding dual bases are specified.) Since v w(f ,g )= ⊗ i j fi(v)gj (w), we have (5.7) [v w] = [v w](f ,g )= f (v)g (w) = [v] [w] ⊗ i,j ⊗ i j i j i j where [v]i := fi(v) is the ith coordinate of v, while [w]j := gj(w) is the jth coordinate of w. In other words, the matrix corresponding to v w, for this choice of ⊗ bases, is the rank one matrix with entries [v]i[w]j . Every rank one matrix arises this way, and thus the matrix isomorphism identi- fies the set of product vectors in V W with the set of rank one matrices in M . ⊗ m×n Since we know how to compute the rank of matrices by row reduction, this gives us a means to determine whether or not any given element of V W is a product ⊗ vector on not. 5.4. Definition (Entanglement). The Schmidt rank of a vector K in V W ⊗ is the rank of the corresponding matrix [K] M . (Note that this rank is ∈ m×n independent of the choice of bases used to determine [K].) If the Schmidt rank of K is greater than one, then K is an entangled vector, and otherwise, if the Schmidt rank of K equals one, K is a product vector, in which case one may say K is unentangled. As noted above, the matrix isomorphism provides an effective means to determine whether a given K V W is entangled or not. ∈ ⊗ Now let T : V V and S : W W be linear transformations. This pair, → → (T,S), induces a linear transformation T S : V W V W by ⊗ ⊗ → ⊗ [T S(K)](f,g)= K(f S,g T ) , ⊗ ◦ ◦ TRACE INEQUALITIES AND QUANTUM ENTROPY 111 where of course f T V ∗ is given by f T (v)= f(T (v)), and similarly for g S. ◦ ∈ ◦ ◦ By (5.1), m m m f T = ((f T )(v )f = f (T (v ))f = [T ] f , i ◦ i ◦ k k i k k i,k k kX=1 Xk=1 Xk=1 and likewise n g S = [S] g . j ◦ j,ℓ ℓ Xℓ=1 Therefore, (5.8) (T S)K(f ,g )= [T ] [S] K(f ,g ) , ⊗ i j i,k j,ℓ k ℓ Xk,ℓ which means that (5.9) [(T S)K] = [T ] [S] [K] . ⊗ i,j i,k j,ℓ k,ℓ Xk,ℓ In other words, under the isometry K [K] of V W with the space of m n 7→ ⊗ × matrices, the action of T S on V W has a very simple matrix expression: The ⊗ ⊗ matrix [T ] of T acts on the left index of [K], and the matrix [S] of S acts on the right index of K.

5.2. Tensor products and inner products. Now suppose that V and W are inner product spaces; i.e., ﬁnite dimensional Hilbert spaces. We denote the inner product on either space by , . At this stage we may as well identify V with h· ·i Cm and W with Cn, so let us suppose that V = Cm and W = Cn, both equipped with their standard Euclidean inner products. Now let v ,...,v and w ,...,w be orthonormal bases for V and W { 1 m} { 1 n} respectively. We can now express the dual basis elements in terms of the inner product: For any v V and w W , f (v) = v , v , i = 1,...,m and g (w) = ∈ ∈ i h i i j w , w , j =1,...,n. In particular, from (5.7) we have that h j i (5.10) [v w] = v , v w , w 1 i m , 1 j n . ⊗ i,j h i ih j i ≤ ≤ ≤ ≤ As above, for K V W , let [K] denote the m n matrix corresponding ∈ ⊗ × to K under the matrix isomorphism that is induced by our choice of orthonormal bases. Now use the Hilbert-Schmidt inner product on the space of m n matrices × to induce an inner product on V W . Deﬁne, for B, C V W , ⊗ ∈ ⊗ (5.11) B, C = Tr ([B]∗[C]) = [B] [C] . h i i,j j,i i,j X Combining (5.10) and (5.11), we see that for any v, v′ V and any w, w′ W , ∈ ∈ v w, v′ w′ = v , v w , w v , v′ w , v′ h ⊗ ⊗ i h i ih i ih i ih i i i,j X m m = v, v v , v′ w, w w , w′ h iih i i h j ih j i i=1 ! i=1 ! X X (5.12) = v, v′ w, w′ . h ih i 112 ERIC CARLEN

Notice that the right hand side does not depend on our choices of orthonormal bases. Thus, while our inner product on V W defined by (5.11) might at first ⊗ sight seem to depend on the choice of the orthonormal bases used to identify V W ⊗ with the space of m n matrices, we see that this is not the case. × There is one more important conclusion to be drawn from (5.12): For any orthonormal bases v ,...,v and w ,...,w of V and W respectively, { 1 m} { 1 n} v w : 1 i m , 1 j n { i ⊗ j ≤ ≤ ≤ ≤ } is an orthonormal basis of V W . ⊗ When V and W are inner product spaces, we can quantify the degree of entanglement of vectors in V W in a meaningful way, independent of the choice ⊗ of bases. The Schmidt rank gives one such quantification, but as rank is not a continuous function on Mm×n, it has limited use beyond its fundamental role in defining entanglement. Recall that any K M has a singular value decomposition ∈ m×n (5.13) K = UΣV ∗ where Σ is an r r diagonal matrix with strictly positive entries σ σ × 1 ≥ ··· ≥ r known as the singular values of K, and where U and V are isometries from Cr into m n C and C respectively. That is U = [u1,...,ur] where each uj is a unit vector in m n C , and V = [v1,...,vr] where each vj is a unit vector in C . In other words, ∗ ∗ (5.14) U U = V V = Ir×r . Evidently r is the rank of K. While the matrices U and V are “essentially uniquely” determined by K, what is important to us here is that the matrix Σ is absolutely uniquely determined by K: It makes sense to speak of the singular values of K. By the definition of the inner product on V W , if K is any unit vector in ⊗ V W , then ⊗ (5.15) Tr[K∗K] = Tr[KK∗]=1 , and so both K∗K and KK∗ are density matrices, on Cn and Cm respectively. Notice that by (5.14) K∗K = V Σ2V ∗ and KK∗ = UΣ2U ∗ so the squares of the singular values of K are the non-zero eigenvalues of these two r 2 density matrices, and in particular j=1 σj = 1. Computing the von Neumann entropies of these two density matrices, we find P r S(K∗K)= S(KK∗)= σ2 log(σ2) . − j j j=1 X Thus, we come to the conclusion that K is a product state if and only if S(K∗K) = 0, and otherwise, if S(K∗K) > 0, K is entangled. Since S(K∗K) depends continuously on K, this provides us with a useful measure of the degree of entanglement. TRACE INEQUALITIES AND QUANTUM ENTROPY 113

5.3. Tensor products of matrices. In this subsection, we focus on the tensor product of Cm and Cn, each equipped with their usual Euclidean inner products. Given matrices A M and B M , identify these matrices with the linear ∈ m ∈ n transformations that they induce on Cm and Cn respectively. It then follows from (5.7) and (5.9) that the map (5.16) v w Av Bw , ⊗ 7→ ⊗ extends to a linear transformation on Cm Cn. This linear transformation is ⊗ denoted by A B. Expressing it more generally and concretely, it follows from ⊗ (5.9) that for any K M regarded as a vector in Cm Cn, ∈ m×n ⊗ (5.17) [(A B)K] = A B K . ⊗ i,j i,k j,ℓ k,ℓ Xk,ℓ Thus, for all A, C M and B,D M , ∈ m ∈ n (5.18) (A B)(C D) = (AC) (BD) . ⊗ ⊗ ⊗ In particular, if A and B are invertible, then so is A B, and ⊗ (A B)−1 = A−1 B−1 . ⊗ ⊗ It follows from (5.12) that for all A M , B M , v , v Cm and w , w ∈ m ∈ n 1 2 ∈ 1 2 ∈ Cn, v w , (A B)v w = v , Av w , Bw = A∗v , v B∗w , w h 1 ⊗ 1 ⊗ 2 ⊗ 2i h 1 2ih 1 2i h 1 2ih 1 2i = (A∗ B∗)v w , v w . h ⊗ 1 ⊗ 1 2 ⊗ 2i That is, (5.19) (A B)∗ = A∗ B∗ . ⊗ ⊗ Consequently, suppose A H+ and B H+. Then we can write A = C∗C ∈ m ∈ n and B = D∗D for C M and D M . Then A B = (C D)∗(C D), so ∈ m ∈ n ⊗ ⊗ ⊗ (A B) is at least positive semi-definite. Since A and B are both invertible, so ⊗ is A B, and hence (A B) is positive definite. That is, whenever A H+ and ⊗ ⊗ ∈ m B H+, then A B is positive definite. ∈ n ⊗ The equation (5.17) provides one useful way to represent the action of the operator A B, but there is another that is also often useful: a representation of ⊗ the operator A B in terms of block matrices. If K = [v ,...,v ] is the m n ⊗ 1 n × matrix whose jth column is v Cm, let us “vertically stack” K as a vector in j ∈ Cmn:

v1 . (5.20) Kvec =  .  . v  n  Then A B is represented by the block matrix ⊗ B1,1A B1,nA . ·. · · . (5.21)  . .. .  . B A B A  n,1 n,n   · · ·  114 ERIC CARLEN

5.4. The partial trace. Let D be any operator on Cm Cn, regarded as an ⊗ mn dimensional inner product space, as described above. Let Tr denote the trace on Cm Cn, and consider the linear functional ⊗ A Tr[DA] 7→ defined on A := (Cm Cn), the operators on Cm Cn. L ⊗ ⊗ Let A1 and A2 be the subalgebras of A consisting of operators of the form B I and I C respectively. (In what follows, we shall usually simply ⊗ n×n m×m ⊗ write I in place of Im×m or In×n where the meaning is clear.) The maps B Tr[D(B I)] and C Tr[D(I C)] 7→ ⊗ 7→ ⊗ are then linear functionals on Mm and Mn respectively. Since M is an inner product space with the inner product X, Y = Tr[X∗Y ], n h i for every linear functional ϕ on M , there is a unique X M such that n ϕ ∈ n ϕ(Y )= X , Y = Tr[X Y ] for all Y M . h ϕ i ϕ ∈ n This justifies the following definition:

5.5. Definition. For any operator D on Cm Cn, Tr [D] is the unique element ⊗ 1 of Mn such that (5.22) Tr[D(I C)] = (I C∗),D = C∗, Tr [D] = Tr [Tr [D]C] , ⊗ h ⊗ i h 1 i 1 where the trace on the left hand side of (5.22) is taken on Cm Cn, and the trace n ⊗ on the right is taken on C . We refer to Tr1[D] as the partial trace of D onto Mn. In the same way, we deﬁne Tr2[D] so that (5.23) Tr[(Tr D)B] = Tr[D(B I)] 2 ⊗ for all B M . ∈ m If we represent K Cm Cn as a vector in Cmn as in (5.20) then D can be ∈ ⊗ represented as a block matrix with n2 blocks

D(1,1) D(1,n) . ·. · · . (5.24)  . .. .  , D D  (n,1) (n,n)   · · ·  where each D M . Then by (5.20), (i,j) ∈ m

D(1,1)B D(1,n)B . ·. · · . (5.25) D(B I)= . .. . , ⊗  . .  D B D B  (n,1) (n,n)   · · ·  and therefore n Tr[D(B I)] = Tr[D B] , ⊗ (j,j) j=1 X TRACE INEQUALITIES AND QUANTUM ENTROPY 115 where the trace on the left is taken in Cm Cn, and on the right in Cm. Thus we ⊗ see n

Tr2[D]= D(j,j) . j=1 X That is, if D is written in block matrix form, then the partial trace is simply the sum of the diagonal blocks. The partial trace has an important physical interpretation. In quantum mechanics, the density matrices ρ on Cm Cn represent the possible states of a system ⊗ whose observables are operators A on Cm Cn. Then the value Tr[ρA] represents ⊗ the expected value of a measurement of the observable A, at least in the case that S is self-adjoint, in which case a well-defined measurement procedure is supposed to exist. Let A denote the algebra of observables on the whole system, i.e., A denotes the linear transformations from Cm Cn into itself. ⊗ The tensor product structure of our (finite dimensional) Hilbert space Cm Cn ⊗ arises whenever our quantum mechanical system is composed of two subsystems: The first may consist of some degrees of freedom that we are trying to measure; i.e., that are coupled to some experimental apparatus, and the second may be the “environment”, a heat bath of some sort, or just some other degrees of freedom that are not directly coupled to our measurement apparatus. In this circumstance, the subalgebra A of observables of the form B I; i.e., 1 ⊗ observables on the first subsystem is of obvious interest. And clearly, it is of obvious interest to restrict the linear functional A Tr[ρA] , 7→ which gives expected values, to the subalgebra A1 of observables that our apparatus might measure. This restriction is evidently given by (B I) Tr[ρ(B I)] . ⊗ 7→ ⊗ The partial trace allows us to express this restriction in terms of a density matrix on the subsystem. By the definition of the partial trace, Tr[ρ(B I)] = Tr [Tr [ρ]B] . ⊗ 2 m The fact that Tr2[ρ] is a density matrix on C whenever ρ is a density matrix on Cm Cn is clear from the fact that ⊗ B 0 B I 0 Tr[ρ(B I)] 0 Tr [Tr [ρ]B] 0 , ≥ ⇒ ⊗ ≥ → ⊗ ≥ ⇒ 2 ≥ so that Tr [ρ] 0, and taking B = I, we see that Tr[Tr [ρ]] = 1. In summary: 2 ≥ 2 5.6. Theorem (The partial traces preserve positivity and traces). For all operators D on Cm Cn, the map D Tr (D), j =1, 2 satisfies ⊗ 7→ j (5.26) Tr[Trj (D)] = Tr[D] , and (5.27) D 0 Tr (D) 0 . ≥ ⇒ j ≥ That is, D Tr (D), j =1, 2, is trace preserving and positivity preserving. 7→ j 116 ERIC CARLEN

We now make an observation that may already have occurred to the reader: The partial trace is nothing but a special case of the conditional expectation: Using the notation introduced above, consider EA2 , the conditional expectation that is the orthogonal projection onto the -subalgebra A consisting of operators in A of ∗ 2 the form I C, C M . Then, by the definition of the conditional expectation as ⊗ ∈ n an orthogonal projection, for any D A, and any C M , ∈ ∈ n ∗ ∗ (5.28) I C ,D = I C , EA (D) . h ⊗ i h ⊗ 2 i By definition, EA (D) has the form I D for some D M . Thus, we can rewrite 2 ⊗ ∈ n (5.28) as e e TrCm Cn [(I C)D] = TrCm Cn [(I C)(I D)] = mTrCn [CD] , ⊗ ⊗ ⊗ ⊗ ⊗ where the subscripts indicate the different Hilbert spacese over which thee traces are taken. Comparing this with (5.22), we see that D = Tr1[D]. That is, 1 I Tr1[D] = EAe2 (D) . m ⊗ This result, combined with Theorem 4.13 allows us to express partial traces as averages over unitary conjugations, and this will be useful in many applications of convexity. Therefore, we summarize in a theorem:

5.7. Theorem. Let A := (Cm Cn), the -algebra of linear transformations L ⊗ ∗ from Cm Cn into itself. Let A and A be the -subalgebras of A consisting of ⊗ 1 2 ∗ operators of the form B I and I C respectively, with B M and ⊗ n×n m×m ⊗ ∈ m C M . Then, for any D A, ∈ n ∈ 1 1 (5.29) I Tr [D]=EA (D) and Tr [D] I =EA (D) . m m×m ⊗ 1 2 n 2 ⊗ n×n 1 A′ A Continuing with the notation of Theorem 5.7, we observe that 1 = 2 and A′ A A′ A 2 = 1. In particular, the unitaries in 2 are the unitaries in 1, which means they have the form U I, where U is a unitary in M . ⊗ m We also mention at this point that the maps D Tr (D), j = 1, 2 are not 7→ j only positivity preserving, but that they also have a stronger property known as complete positivity. This is physically very signiﬁcant, and we shall explain this later. In the meantime, let us make one more deﬁnition, and then turn to examples and digest what has been introduced so far.

5.8. Definition. The map B B on M is deﬁned so that each entry 7→ m×n of B is the complex conjugate of the corresponding entry of B. This map is an antilinear isometry from Mm×n to itself.

The map B B preserves positivity: From the spectral representation written 7→ n ∗ in the form B = j=1 λj ujuj , one sees that B is unitarily equivalent to B under the unitary transformation that takes each u to its complex conjugate u . In P j j particular, if B H+, then B H+ as well, and for any f : R R, ∈ n ∈ n → f(B)= f(B) . TRACE INEQUALITIES AND QUANTUM ENTROPY 117

5.9. Example (Tr ( K K ) and Tr ( K K )). For K M with 1 | vecih vec| 2 | vecih vec| ∈ m×n Tr[K∗K] = 1, considered as a unit vector K in Cm Cn, let K K denote vec ⊗ | vecih vec| the rank one projection onto the span of K in Cm Cn. In the language and vec ⊗ notation of quantum statistical mechanics, this projection is a pure state density matrix on Cm Cn. . ⊗ By deﬁnition, for all A M , ∈ m Tr[Tr ( K K )A] = Tr[ K K (A I )] 2 | vecih vec| | vecih vec| ⊗ n×n = K , (A I )K h vec ⊗ n×n veci ∗ = Kj,i Ai,kKk,j i,j X Xk = Tr[K∗AK] = Tr[(KK∗)A] .

Tr ( K K )= K∗K and Tr ( K K )= KK∗ 1 | vecih vec| 2 | vecih vec| has a signiﬁcant consequence:

5.10. Theorem. For any pure state ρ on Cm Cn, the two restricted density ⊗ matrices Tr1ρ and Tr2ρ have the same non-zero spectrum, including multiplicities. In particular, S(Tr1ρ)= S(Tr2ρ).

Proof: For any m n matrix K, KK∗, K∗K and K∗K have the same non-zero × spectrum, including multiplicities, and the statement about entropy follows directly from this.

5.11. Example (Mixed states as partial traces of pure states). Consider any ρ S . Let u ,...,u be any orthonormal basis for Cn. Let K Cn Cn by ∈ n { 1 n} vec ∈ ⊗ given by n K = u ρ1/2u . vec j ⊗ j j=1 X We now claim that Tr ( K K )= ρ . 1 | vecih vec| This shows that every mixed state; i.e., density matrix on Cn is the partial trace of a pure state on Cn Cn. ⊗ 118 ERIC CARLEN

To verify the claim, consider any B M . Then ∈ n Tr[Tr ( K K )B] = Tr[( K K (I B)] 1 | vecih vec| | vecih vec| m×m ⊗ = K , (I B)K h vec m×m ⊗ veci n n = u ρ1/2u , u Bρ1/2u i ⊗ i j ⊗ j *i=1 j=1 + X X n = u ,u ρ1/2u ,Bρ1/2u = Tr[Bρ] . h i j ih i j i i,j=1 X (5.30)

5.5. Ando’s identity. The next lemma records an important observation of Ando.

5.12. Lemma (Ando’s identity). Let A H+ , B H+ and let K be any m n ∈ m ∈ n × matrix considered as a vector in Cm Cn. Then ⊗ (5.31) K, (A B)K = Tr(K∗AKB) . h ⊗ i Proof: (A B)K, considered as an m n matrix, has the entries ⊗ × [(A B)K] = A B K . ⊗ i,j i,k j,ℓ k,ℓ Xk,ℓ Since B H , B = B , and so ∈ n j,ℓ ℓ,j [(A B)K] = A K B = [AKB] . ⊗ i,j i,k k,ℓ ℓ,j i,j Xk,ℓ Then since K, (A B)K = Tr(K∗[(A B)K]), the result is proved. h ⊗ i ⊗ One easy consequence of this identity is the following: For A H+ , B H+, ∈ m ∈ n by cyclicity of the trace, 1/2 1/2 K, (A B)K = Tr(B K∗AKB ) = Tr(A1/2KBK∗A1/2) , h ⊗ i and so the map (A, B) (A B) 7→ ⊗ on H+ H+ is monotone in each argument. m × n 6. Lieb’s Concavity Theorem and related results 6.1. Lieb’s Concavity Theorem. In this section, we prove the following fundamental theorem of Lieb [18]:

6.1. Theorem (Lieb’s Concavity Theorem). For all m n matrices K, and all × 0 q, r 1, with q + r 1 the real valued map on H+ H+ given by ≤ ≤ ≤ m × n (6.1) (A, B) Tr(K∗AqKBr) 7→ is concave. TRACE INEQUALITIES AND QUANTUM ENTROPY 119

The following proof is due to Ando [1] Proof of Theorem 6.1: Since the map B B is linear over R, Theorem 5.12 7→ shows that an equivalent formulation of Lieb’s Concavity Theorem is that for 0 ≤ q, r 1, ≤ (6.2) (A, B) Aq Br 7→ ⊗ is concave from H+ H+ to H+ m × n mn Let Ω be the subset of (0, ) (0, ) consisting of points (q, r) such that ∞ × ∞ (A, B) Aq Br is concave. Obviously, (0, 1), (1, 0) and (0, 0) all belong to Ω, 7→ ⊗ and hence it suﬃces to show that Ω is convex. By continuity, it suﬃces to show that if (q , r ), (q , r ) Ω, then so is 1 1 2 2 ∈ q + q r + r (q, r) := 1 2 , 1 2 . 2 2 The key to this is to use the joint concavity properties of the em operator geometric mean M0 that we have studied in Section 3 of these notes: Observe that by (5.18), for such (p, q), (p1, q1) and (p2, q2), Aq Br = M (Aq1 Br1 , Aq2 Br2 ) . ⊗ 0 ⊗ ⊗ Since (q , r ), (q , r ) Ω, 1 1 2 2 ∈ q r A + C j B + D j Aqj Brj + Cqj Drj ⊗ ⊗ j =1, 2 . 2 ⊗ 2 ≥ 2 Then by the monotonicity and concavity of the operator geometric mean,

A + C q B + D r A + C q1 B + D r1 A + C q2 B + D r2 = M , 2 ⊗ 2 0 2 ⊗ 2 2 ⊗ 2 Aq1 Br1 + Cq1 Dr1 Aq2 Br2 + Cq2 Dr2 M ⊗ ⊗ , ⊗ ⊗ ≥ 0 2 2 1 1 M (Aq1 Br1, Aq2 Br2 )+ M (Cq1 Dr1, Cq2 Dr2 ) ≥ 2 0 ⊗ ⊗ 2 0 ⊗ ⊗ 1 1 = Aq Br + Cq Dr . 2 ⊗ 2 ⊗ This proves the midpoint concavity of (A, B) Aq Br, and now the full concavity 7→ ⊗ follows by continuity. Thus, (q, r) Ω, as was to be shown. ∈ 6.2. Ando’s Convexity Theorem. Ando’s proof [1] of Lieb’s Concavity Theorem leads to the following signiﬁcant complement to it:

6.2. Theorem (Ando’s Convexity Theorem). For all m n matrices K, and × all 1 q 2 and 0 r 1 with q r 1, the real valued map on H+ H+ given ≤ ≤ ≤ ≤ − ≥ m × n by

(6.3) (A, B) Tr(K∗AqKB−r) 7→ is convex. 120 ERIC CARLEN

Proof of Theorem 6.2: First note that 1 (6.4) Aq B−r = A I A I . ⊗ ⊗ A2−q Br ⊗ ⊗ Next, for 1 q 2, 0 r 1 and q r 1, we have 0 2 q 1 and ≤ ≤ ≤ ≤ − ≥ ≤ − ≤ 0 (2 q)+ r 1. Therefore, by Theorem 6.1, (A, B) A2−q Br is concave, ≤ − ≤ 7→ ⊗ so that for all A, C H+ and all B,D H+, ∈ m ∈ n A + C 2−q B + D r A2−q Br + C2−q Dr (6.5) ⊗ ⊗ . 2 ⊗ 2 ≥ 2 Thus, by the obvious monotonicity of X Y ∗X−1Y , 7→ −1 A + C q A + C −r A + C A + C 2−q B + D r A + C = I I 2 ⊗ 2 2 ⊗ 2 ⊗ 2 2 ⊗ " # A + C A2−q Br + C2−q Dr −1 A + C I ⊗ ⊗ I ≤ 2 ⊗ 2 2 ⊗ Finally, by Theorem 3.1, which asserts the joint convexity of (X, Y ) Y ∗XY , and 7→ then (6.4) once more, we have

A + C q B + D −r Aq B−r + Cq D−r ⊗ ⊗ , 2 ⊗ 2 ≤ 2 which is the midpoint version of the desired convexity statement. The general case follows by continuity.

6.3. Lieb’s Concavity Theorem and joint convexity of the relative entropy. Consider the map

(6.6) (A, B) Tr[A log A] Tr[A log(B)] := H(A B) 7→ − | on H+ H+. In particular, for density matrices ρ and σ, H(ρ σ) = S(ρ σ), the n × n | | relative entropy of ρ with respect to σ. We shall prove:

6.3. Theorem. The map (A, B) Tr[A log A] Tr[A log(B)] from H+ H+ 7→ − n × n to R is jointly convex.

Proof: For all 0

6.4. Monotonicity of the relative entropy.

6.4. Theorem. Let A be any -subalgebra of M . Then for any two density ∗ n matrices ρ, σ S , ∈ n

(6.7) S(ρ σ) S(EA(ρ) EA(σ)) . | ≥ | Proof: We suppose first that ρ and σ are both positive definite, so that (ρ, σ) 7→ S(ρ σ) is continuous in a neighborhood of (ρ, σ). By Lemma 4.17, there is a sequence | k n∈N of operators of the form (4.23) such that e e {C } e e Eρ(σ) = lim k(ρ) and Eσ(σ) = lim k(σ) . k→∞ C k→∞ C Then by the joint convexity of the relative entropy from Theorem 6.3, and the unitary invariance of the relative entropy; i.e., S(UρU ∗ UσU ∗) = S(ρ σ), and the | | specific form (4.23) of , we have that for each k, Ck S( (ρ) (σ)) S(ρ σ) . Ck |Ck ≤ | Now taking k to infinity, we obtain the result for positive definite ρ and σ. To pass to the general case, first note that unless the nullspace of σ is contained in the nullspace of ρ, S(ρ σ)= , and there is nothing to prove. | ∞ Hence we assume that the nullspace of σ is contained in the nullspace of ρ. This implies that the nullspace of EA(σ) is contained in the nullspace of EA(ρ). To see this, let P denote the orthogonal projection onto the nullspace of EA(σ). Then P A, and so ∈ 0 = Tr(P EA(σ)) = Tr(P σ) , and then since the nullspace of σ is contained in the nullspace of ρ,

0 = Tr(Pρ) = Tr(P EA(ρ)) .

Now replace ρ and σ by ρ := (1 ǫ)ρ + (ǫ/n)I and σ := (1 ǫ)σ + (ǫ/n)I ǫ − ǫ − respectively with 1 >ǫ> 0. Note that since I A, ∈

EA((1 ǫ)ρ + (ǫ/n)I)=(1 ǫ)EA(ρ) + (ǫ/n)I = (EA(ρ)) , − − ǫ and likewise, EA(σǫ) = (EA(σ))ǫ. Therefore

S(ρ σ ) S(EA(ρ ) EA(σ )) ǫ| ǫ ≥ ǫ | ǫ (6.8) = S((EA(ρ)) (EA(σ)) ) . ǫ| ǫ It is easy to show that when the nullspace of σ is contained in the nullspace of ρ, S(ρ σ) = lim S(ρ σ ). By what we have shown above, we then also have | ǫ→0 ǫ| ǫ S(EA(ρ) EA(σ)) = lim S((EA(ρ)) (EA(σ)) ). This together with (6.8) proves | ǫ→0 ǫ| ǫ the result in the general case. 122 ERIC CARLEN

6.5. Subadditivity and strong subadditivity of the entropy. Consider a density matrix ρ on the tensor product of two finite dimensional Hilbert H1 ⊗ H2 spaces. To be concrete, we may as well suppose that for some m and n, = Cm H1 and = Cn. Let ρ = Tr ρ be the density matrix on obtained by taking the H2 1 2 H1 partial trace over of ρ, and let ρ be defined in the analogous way. H2 2 Then ρ ρ is a density matrix on , and by Klein’s inequality, 1 ⊗ 2 H1 ⊗ H2 S(ρ ρ ρ ) 0 | 1 ⊗ 2 ≥ with equality if and only if ρ ρ = ρ. Let us assume that ρ is strictly positive, 1 ⊗ 2 so that ρ1 and ρ2 are also strictly positive, and compute the left hand side. Then since log(ρ ) I and I log(ρ ) commute, it is clear that 1 ⊗ H2 H1 ⊗ 2 (6.9) exp(log(ρ ) I + I log(ρ )) = exp(log(ρ ) I ) exp(I log(ρ )) = 1 ⊗ H2 H1 ⊗ 2 1 ⊗ H2 H1 ⊗ 2 (ρ I ) (I ρ )= ρ ρ . 1 ⊗ H2 H1 ⊗ 2 1 ⊗ 2 It follows that log(ρ ρ ) = log(ρ ) I + I log(ρ ) , 1 ⊗ 2 1 ⊗ H2 H1 ⊗ 2 and hence that S(ρ ρ ρ ) = S(ρ) Tr [ρ (log(ρ ) I + I log(ρ ))] | 1 ⊗ 2 − − 1 ⊗ H2 H1 ⊗ 2 = S(ρ)+ S(ρ )+ S(ρ ) − 1 2 where we have used the definition of the partial trace in the second equality. Since the left hand side is non-negative by Klein’s inequality, we conclude that S(ρ) ≤ S(ρ1)+S(ρ2). This inequality is known as the subadditivity of the quantum entropy. We summarize our conclusions in the following theorem: 6.5. Theorem (Subadditivity of quantum entropy). Let ρ be a density matrix on the tensor product of two finite dimensional Hilbert spaces. For j =1, 2, H1 ⊗H2 let ρ denote the density matrix on obtained by taking the partial trace of ρ over j Hj the other Hilbert space. Then (6.10) S(ρ) S(ρ )+ S(ρ ) , ≤ 1 2 and there is equality if and only if ρ ρ = ρ. 1 ⊗ 2 Note that the dimension does not really enter our considerations, and so this inequaity is easily generalized to the infinite dimensional case. In the spirit of these notes, we leave this to the reader. There is a much deeper subadditivity inequality for density matrices on a tensor product of three Hilbert spaces . Let ρ be a density matrix. By taking H1 ⊗H2 ⊗H3 the various partial traces of ρ, we obtain various density matrices from ρ. We shall use the following notation for these:

ρ123 := ρ ρ23 := Tr1ρ, ρ3 := Tr12ρ and so forth, where Tr denotes the partial trace over , Tr denotes the partial 1 H1 12 trace over and so forth. (That is, the subscripts indicate the spaces H1 ⊗ H2 “remaining” after the traces.) TRACE INEQUALITIES AND QUANTUM ENTROPY 123

6.6. Theorem (Strong subadditivity of quantum entropy). Let ρ be a density matrix on the tensor product of three ﬁnite dimensional Hilbert spaces. H1 ⊗H2⊗H3 Then, using the notation introduced above (6.11) S(ρ )+ S(ρ ) S(ρ )+ S(ρ ) . 13 23 ≥ 123 3 This theorem was conjectured by Lanford, Robinson and Ruelle [25], and was proved by Lieb and Ruskai [20]. Proof of Theorem 6.6: As in the proof of Theorem 6.5, we compute that (6.12) S(ρ ρ ρ )= S(ρ )+ S(ρ )+ S(ρ ) . 123| 12 ⊗ 3 − 123 12 3 Now let A be the -subalgebra of operators on of the form I A ∗ H1 ⊗ H2 ⊗ H3 H1 ⊗ where A is an operator on . Then by the monotonicity of the relative H2 ⊗ H3 entropy, Thoerem 6.4,

(6.13) S(ρ ρ ρ ) S(EA(ρ ) EA(ρ ρ )) . 123| 12 ⊗ 3 ≥ 123 | 12 ⊗ 3 But by Theorem 6.4, (6.14) 1 1 EA(ρ )= I ρ and EA(ρ ρ )= I (ρ ρ ) . 123 dim( ) H1 ⊗ 23 12 ⊗ 3 dim( ) H1 ⊗ 2 ⊗ 3 H1 H1 Therefore, by Theorem 6.5,

S(EA(ρ ) EA(ρ ρ )) = S(ρ )+ S(ρ )+ S(ρ ) . 123 | 12 ⊗ 3 − 23 2 3 Combining this with (6.13) and (6.12) yields (6.11).

7. Lp norms for matrices and entropy inequalities In this section, we shall prove various Lp norm inequalities for matrices that have a connection with quantum entropy. The basic idea is this: Let ρ S . Then ∈ n the map p Tr[ρp] 7→ is diﬀerentiable at p = 1, and d Tr[ρp] = Tr[ρ log(ρ)] = S(ρ). dp − p=1

The inequalities we obtain in this section will be of interest in their own right, but shall also lead to new proofs of entropy inequalities such as strong subadditivitiy of quantum entropy. We begin with an elementary introduction to the matricial analogs of the Lp norms.

p 7.1. The matricial analogs of the L norms. Let Mn denote the set of n n matrices with complex entries, and let A∗ denote the Hermitian conjugate of × A M . For 0 0. 124 ERIC CARLEN

We shall now show that is in fact a norm for 1 leqq < . Let A k·kq ∞ | | denote (A∗A)1/2, and let u ,...,u be an orthonormal basis of Cn consisting of { 1 n} eigenvectors of A with A u = λ u . Then | | | | j j j 1/q n A = (Tr[(A∗A)q/2])1/q = (Tr[ A q])1/q = λq . k kq | |  j  j=1 X   The eigenvalues of A are the singular values of A. Thus, A is the ℓ norm of | | k kq q the sequence of singular values of A.

7.1. Theorem (Duality formula for A ). For all q 1, deﬁne p by 1/q + k kq ≥ 1/p =1. Then for all A in Mn,

∗ A q = sup Tr[B A] : B p =1 . k k B∈Mn{ k k } Proof: For any invertible A, B M let A = U A and B = V B be their polar ∈ n | | | | decompositions, and let W = V ∗U. Let u ,...,u be an orthonormal basis of { 1 n} Cn consisting of eigenvectors of B with B u = λ u . Then | | | | j j j n n (7.2) Tr(B∗A)= u , B W A u = λ u , W A u . h j | | | | j i j h j | | j i j=1 j=1 X X Now let us suppose that q > 1. By H¨older’s inequality, for any q > 1 and p = q/(q 1), − 1/p 1/q n n n λ u W A u λp u , W A u q j h j | | j i ≤  j   |h j | | j i|  j=1 j=1 j=1 X X X    1/q  n

(7.3) = B u , W A u q . k kp  |h j | | j i|  j=1 X   ∗ Now deﬁne vj = W uj . Then by the Schwarz inequality twice , and then Peierl’s inequality, n n u , W A u v , A v q/2 u , A u q/2 |h j | | j i| ≤ h j | | j i h j | | j i j=1 j=1 X X 1/2 1/2 n n v , A v q u , A u q ≤  h j | | j i   h j | | j i  j=1 j=1 X X  1/2  1/2  (7.4) (Tr[ A q]) (Tr[ A q]) = A q . ≤ | | | | k kq

Combining (7.2), (7.3) and (7.4), we have

(7.5) Tr(B∗A) B A , | |≤k kpk kq TRACE INEQUALITIES AND QUANTUM ENTROPY 125 which is the tracial version of Hölder’s inequality, and we note that if B = A 1−qU A q−1, then B = 1 and k kq | | k kp Tr(B∗A)= A 1−qTr A q−1U ∗U A = A 1−qTr [ A q]= A . k kq | | | | k kq | | k kq Combining this with (7.5) yields the result for q > 1. The easy extension to q =1 is left to the reader. Starting from Theorem 7.1, the proof of the Minkowski inequality for k·kq proceeds exactly as it does for the Lp norms: Given A, C M , ∈ n ∗ A + C q = sup Tr[B (A + C)] : B p =1 k k B∈Mn{| | k k } ∗ ∗ sup Tr[B A] : B p =1 + sup Tr[B C] : B p =1 ≤ B∈Mn{| | k k } B∈Mn{| | k k } = A + C . k kq k kq 7.2. Convexity of A Tr (B∗ApB)q/p and certain of its applications. 7→ For any fixed B M and any numbers p,q > 0, define Υ on H+ by ∈ n p,q n ∗ p q/p (7.6) Υp,q(A) = Tr (B A B) . h i 7.2. Theorem. For all 1 p 2, and for all q 1, Υ is convex on H+. ≤ ≤ ≥ p,q n For 0 p q 1, Υ is concave on H+. For p> 2, there exist B such that Υ ≤ ≤ ≤ p,q n p,q is not convex or concave for any values of q = p. 6 1/q 7.3. Remark. The function Υp,q has the same convexity and concavity properties as Υ . To see this, note that Υ is homogeneous of degree q 1. Recall p,q p,q ≥ that a function f that is homogeneous of degree one is convex if and only if the level set x : f(x) 1 is convex, while it is concave if and only if the level set { ≤ } x : f(x) 1 is convex. Hence, if g(x) is homogeneous of degree q, and convex, { ≥ } so that x : g(x) 1 is convex, g1/q is convex, and similarly for concavity. { ≤ }

The concavity of Υp,1 for 0 2. Finally the convexity 1

7.4. Theorem (Lieb-Thirring trace inequality). For all A, B H+ and all ∈ n t 1, ≥ (7.7) Tr (B1/2AB1/2)t Tr Bt/2AtBt/2 . ≤ h i h i Proof: Deﬁne C = At and p =1/t 1, so that A = Cp. Then ≤ Tr (B1/2AB1/2)t Tr Bt/2AtBt/2 = Tr (B1/2CpB1/2)1/p Tr CB1/p , − − h i h i h i h i and by Epstein’s part of Theorem 7.2 the right hand side is a concave function of C. Now we apply Example 4.15: Choose an orthonormal basis u ,...,u { 1 n} diagonalizing B. Let A be the -subalgebra of M consisting of matrices that are ∗ n 126 ERIC CARLEN diagonal in this basis. Then as shown in Example 4.15, EA(C) is an average over unitary conjugates of C, by unitaries that commute with B. It follows that

Tr (B1/2CpB1/2)1/p Tr CB1/p − h i h 1/2 i p 1/2 1/p 1/p Tr (B (EA(C)) B ) Tr (EA(C))B . ≥ − h i h i However, since EA(C) and B commute, the right hand side is zero. Theorem 7.4 was first proved in [22], and has had many applications since then. The next application is taken from [3, 4]. We first define another trace function: + m For any numbers p,q > 0, and any positive integer m, define Φp,q on (Hn ) , + the m-fold cartesian product of Hn with itself by

(7.8) Φ (A ,...,A )= ( m Ap)1/p . p,q 1 m k j=1 j kq 7.5. Theorem. For all 1 p 2, and forP all q 1, Φ is jointly convex on ≤ ≤ ≥ p,q (H+)m, while for 0 p q 1, Φ is jointly concave on (H+)m. For p > 2, n ≤ ≤ ≤ p,q n Φ is not convex or concave, even separately, for any value of q = p. p,q 6

A1 0 Proof : Define the mn mn matrices = .. and × A  .  0 A  m  I 0 . . . 0   = . . . . ( is block diagonal with A as the jth diagonal block, B  . . .  A j I 0 . . . 0   and  has n n identities in each block in the first column, and zeros elsewhere.) B × Then p is the block matrix with m Ap in its upper left block, and zeros BA B j=1 j elsewhere. Thus, P q/p 1/q 1/q Tr m Ap = Tr ( p )q/p . j=1 j BA B P h i By Theorem 7.2 and Remark 7.3, the right hand side is convex in for all 1 p 2, A ≤ ≤ q 1, and concave in for 0 p q 1. ≥ A ≤ ≤ ≤ We now show, by means of a Taylor expansion, that both convexity and concavity fail for p> 2 and any q = p. By simple differentiation one finds that for any 6 A, B H+, ∈ n tp (7.9) Φ (tA,B)= B + B 1−q TrApBq−p + O(t2p) . p,q k kq p k kq

Keeping B ﬁxed, but replacing A by A1, A2 and (A1 + A2)/2, we ﬁnd 1 1 A + A Φ (tA ,B)+ Φ (tA ,B) Φ t 1 2 , B = 2 p,q 1 2 p,q 2 − p,q 2 tp 1 1 A + A p B 1−q Tr ApBq−p + Tr ApBq−p Tr 1 2 Bq−p + O(t2p). p k kq 2 1 2 2 − 2 TRACE INEQUALITIES AND QUANTUM ENTROPY 127

p Now if p > 2, A A is not operator convex, and so we can ﬁnd A1 and A2 + 7→ Cn in Hn and a unit vector v in such that

1 1 A + A p (7.10) v, Apv + v, Apv v, 1 2 v < 0 , 2h 1 i 2h 2 i− 2 and of course since A Tr(Ap) is convex for p> 2, we can ﬁnd v so that the left 7→ hand side in (7.10) is positive. For q = p, take Bq−p to be (a close approximation 6 of) the rank one projection onto v.

7.6. Remark. We showed in the proof of Theorem 7.5 that Φp,q is convex or concave for given p and q whenever Υp,q is. Thus our proof that Φp,q is not convex or concave for p> 2 and q = p implies this part of Theorem 7.2. 6 We now deﬁne one more trace function, and make a simple reformulation of Theorem 7.5 that will lead to another proof of the strong subadditivity of quantum entropy. For any numbers p,q > 0, and any positive integers m and n, deﬁne Ψp,q on H+ , viewed as the space of linear operators on Cm Cn by mn ⊗

(7.11) Ψ (A)= (Tr Ap)1/p . p,q k 1 kq 7.7. Theorem. For 1 p 2 and q 1, Ψ is convex on H+ , while for ≤ ≤ ≥ p,q mn 0 p q 1, Ψ is concave on H+ . ≤ ≤ ≤ p,q mn

Proof: We shall apply Theorem 5.7. Let A be the subalgebra of Mmn, identiﬁed with Cm Cn, consisting of operators of the form I B, B M . By ⊗ m×m ⊗ ∈ n Theorem 5.7,

1 p p I Tr A =EA(A ) , m m×m ⊗ 1 and so

p 1/p p 1/p (7.12) Ψ (A )= m (EA(A )) . p,q k kq

1/p The factor of n does not aﬀect the convexity properties of Ψp,q, and so it suﬃces 1/p to consider the convexity properties of A (EA(Ap)) . 7→ k kq By Theorem 4.13 and the remark following its proof, EA(Ap) is a limit of operators (Ap) of the form Ck

Nk (7.13) (Ap)= p I W ApI W ∗ . Ck k,j m×m ⊗ k,j m×m ⊗ k,j j=1 X 128 ERIC CARLEN

′ where each Wk,j is a unitary matrix in A (which means that it has the form I U where U is unitary in M ). But then m×m ⊗ n 1/p Nk −1/p p ∗ n Ψp,q(A) = lim pk,j Wk,j A Wk,j k→∞   j=1 X q   1/p N k p 1/p ∗ = lim pk,j pk,j Wk,j AWk,j k→∞   j=1 X q   1/p ∗ 1/p ∗ = lim Φp,q p Wk,1AWk,1 ,... ,p Wk,Nk AWk,N . k→∞ k,1 k,Nk k Since a limit of convex functions is convex, we see that Ψp,q is convex or concave whenever Φp,q is. The reverse implication is even more elementary: To see this, suppose that the matrix A in Theorem 7.7 is the block diagonal matrix whose jth diagonal block is Aj . Then, clearly, Ψp,q(A)=Φp,q(A1, A2,...,Am). We now return to the theme with which we began this section, and explain how to deduce the strong subadditivity of quantum entropy from Theorem 7.7. Let ρ be a density matrix on = Cm Cn. Let ρ = Tr ρ and H1 ⊗ H2 ⊗ 1 H2 ρ2 = TrH1 ρ be its two partial traces. As in our previous discussion of subadditivity, we shall also use ρ12 to denote the full density matrix ρ. Then a simple calculation shows that d (7.14) Ψ (ρ) = S(ρ ) S(ρ ) . dp p,1 2 − 12 p=1

To see this, observe that for a positive operator A, and ε close to zero,

A1+ε = A + εA ln A + (ε2) . O At least in ﬁnite dimensions, one can take a partial trace of both sides, and the resulting identity still holds. Applying this with A = ρ = ρ12, we compute Tr ρ1+ε = ρ + εTr (ρ ln ρ )+ (ε2) . H1 2 H1 12 12 O Then, since to leading order in ε, 1/(1 + ε) is 1 ε, − 1/(1+ε) Tr (ρ1+ε) = ρ + εTr ((ρ ln ρ ) ερ ln ρ + (ε2) . H1 2 H1 12 12 − 2 2 O Thus, 1/(1+ε) (7.15) Tr ( Tr ((ρ1+ε) =1 εS(ρ )+ εS(ρ )+ (ε2) . H2 H1 − 12 2 O This proves (7.14). Now we apply Theorem 7.7. Let us take the case where is replaced by H2 a tensor product of two ﬁnite dimensional Hilbert spaces , so that we H2 ⊗ H3 identify operators on with M where m = dim( ) and n = H1 ⊗ H2 ⊗ H3 mn H1 dim( ) dim( ). Let A denote the -subalgebra of M consisting of operators H2 × H3 ∗ mn of the form A I where A is an operator on . ⊗ H3 H1 ⊗ H2 TRACE INEQUALITIES AND QUANTUM ENTROPY 129

We now claim that for any density matrix ρ on , 123 H1 ⊗ H2 ⊗ H3 (7.16) Ψ (ρ ) Ψ (EA(ρ )) . p,1 123 ≥ p,1 123 To see this, we apply Theorem 4.13 as in the proof of Theorem 7.7, together with the fact that the unitaries in A′ are of the form I U, where U is unitary on H1⊗H2 ⊗ . H3 Then by (7.14) and Theorem 5.7, d Ψ (EA(ρ )) = S(Tr [EA(ρ )]) S(EA(ρ )) dp p,1 123 H1 123 − 123 p=1

= S(ρ2) S(ρ12) . − Also, directly from (7.14), d (7.17) Ψ (ρ ) = S(ρ ) S(ρ ) . dp p,1 123 23 − 123 p=1

Then, since at p = 1, both sides of (7.16) equal one,

S(ρ ) S(ρ ) S(ρ ) S(ρ ) , 23 − 123 ≥ 2 − 12 which of course is equivalent to (6.11). This shows that the strong subadditivity of quantum entropy can be viewed as a consequence, via diﬀerentiation in p, of the inequality of Theorem 7.7. One may therefore view the inequality of Theorem 7.7 as a generlaization of the strong subadditivity inequality. For another Lp inequality that can be diﬀerentiated to yield strong subadditivity, namely a Minkowski type inequality for traces of operators on a tensor product of three Hilbert spaces, see [3, 4]. For other applications of Theorem 7.2, see [4]. 7.3. Proof of the convexity of A Tr (B∗ApB)q/p . We now close this 7→ section by proving Theorem 7.2, and thus completing the proofs of all of the theorems in this subsection. We prepare the way for the proof of Theorem 7.2 with some lemmas. The proof of convexity of Υ divides into two cases, namely 1 q p 2 and 1 p 2 p,q ≤ ≤ ≤ ≤ ≤ with q>p. The latter case, q>p, is the easier one, and the next lemma takes care of it: 7.8. Lemma. For 1 p 2 and q p, Υ is convex on H+. ≤ ≤ ≥ p,q n Proof: Since r := q/p 1 and since B∗ApB 0, we can write ≥ ≥ (7.18) B∗ApB = sup Tr(B∗ApBY ) k kr kY kr′ ≤ 1, Y ≥ 0 where 1/r +1/r′ = 1. Since Ap is well known to be operator convex in A for 1 p 2, so is B∗ApB. Since the right side of (7.18) is the supremum of a family ≤ ≤ of convex functions (note that Y 0 is needed here) we conclude that B∗ApB ≥ k kr is convex. (Υp,q(A) is the rth power of this quantity and is therefore convex.) The case q

For r> 1, and c,x > 0, the arithmetic–geometric mean inequality says 1 r 1 cr + − xr cxr−1 , r r ≥ and hence 1 cr (7.19) c = inf + (r 1)x : x> 0 . r xr−1 − With the inﬁmum replaced by a supremum, the resulting formula is valid for 0

p/2 ∗ p/2 q/p (7.20) Υp,q(A) = Tr (A BB A ) . h i

7.9. Lemma. For any positive n n matrix A, and with r = p/q > 1, × 1 1 (7.21) Υ (A)= inf Tr Ap/2B B∗Ap/2 + (r 1)X : X > 0 p,q r Xr−1 − where the inﬁmum is taken over all positive n n matrices X. Likewise, if the × inﬁmum replaced by a supremum, the resulting formula is valid for r = p/q < 1.

Proof: Let C = B∗Ap/2. By continuity we may assume that C∗C is strictly positive. Then, for r> 1, there is a minimizing X. Let

Y = X1−r and note that minimizing 1 Tr Ap/2B B∗Ap/2 + (r 1)X Xr−1 − with respect to X > 0 is the same as minimizing

Tr CC∗Y + (r 1)Y −1/(r−1) − with respect to Y > 0. Since the minimizing Y is strictly positive, we may replace the minimizing Y by Y +tD, with D self adjoint, and set the derivative with respect to t equal to 0 at t = 0. This leads to TrD[CC∗ Y −r/(r−1)] = 0. Therefore − Y −r/(r−1) = CC∗ and we are done. The variational formula for p/q < 1 is proved in the same manner.

7.10. Lemma. If f(x, y) is jointly convex, then g(x) deﬁned by g(x) = infy f(x, y) is convex. The analogous statement with convex replaced by concave and inﬁmum replaced by supremum is also true.

Proof: For ε> 0, choose (x0,y0) and (x1,y1) so that f(x ,y ) g(x )+ ε and f(x ,y ) g(x )+ ε . 0 0 ≤ 0 1 1 ≤ 1 TRACE INEQUALITIES AND QUANTUM ENTROPY 131

Then: g((1 λ)x + λx ) f((1 λ)x + λx , (1 λ)y + λy ) − 0 1 ≤ − 0 1 − 0 1 (1 λ)f(x ,y )+ λf(x ,y ) ≤ − 0 0 1 1 (1 λ)g(x )+ λg(x )+ ε . ≤ − 0 1 On account of Lemmas 7.9 and 7.10 we shall be easily able to prove the stated convexity and concavity properties of Υp,q once we have proved: 7.11. Lemma. The map 1 (7.22) (A, X) Tr Ap/2B∗ BAp/2 7→ Xr−1 is jointly convex on H+ H+ for all 1 r p 2 and is jointly concave for all n × n ≤ ≤ ≤ 0

Proof: We ﬁrst rewrite the right hand side of (7.22) in a more convenient form: Deﬁne A 0 0 0 B∗X1−pB 0 Z = and K = so that K∗Z1−pK = . 0 X B 0 0 0 1 Then, by cyclicity of the trace, Tr(ZpK∗Z1−rK) = Tr Ap/2B∗ BAp/2 . Xr−1 Note that convexity/concavity of the left hand side in Zis the same as convexity/concavity of the right hand side in (A, X). The result now follows from The- orem 6.1, the Lieb Concavity Theorem and Theorem 6.2, the Ando Convexity Theorem. Proof of Theorem 7.2: By Lemma 7.11, the mapping in (7.22) is jointly convex for 1 r p 2. Then taking r = p/q, we have from Lemma 7.9 and Lemma ≤ ≤ ≤ 7.10 Υp,q(A) = infX f(A, X) where f(A, X) is jointly convex in A and X. The convexity of Υp,q now follows by Lemma 7.10. The concavity statement is proved in the same way. We have already observed that Φp,q inherits its convexity and concavity properties from those of Υp,q, and thus, having shown in the proof of Theorem 7.5 that Φ is neither convex nor concave for p> 2 and q = p, the same p,q 6 is true for Υp,q.

8. Brascamp-Lieb type inequalities for traces We recall the original Young’s inequality: For non-negative measurable functions f1, f2 and f3 on R, and 1/p1 +1/p2 +1/p3 =2

(8.1) f1(x)f2(x y)f3(y)dxdy R2 − ≤ Z 1/p1 1/p2 1/p3 p1 p2 p3 f1 (t)dt f2 (t)dt f3 (t)dt . R R R Z Z Z Deﬁne the maps φ : R2 R, j =1, 2, 3, by j → φ (x, y)= x φ (x, y)= x y and φ (x, y)= y . 1 2 − 3 132 ERIC CARLEN

Then (8.1) can be rewritten as

3 3 1/pj 2 pj (8.2) fj φj d x fj (t)dt . R2  ◦  ≤ R j=1 j=1 Z Y Y Z There is now no particular reason to limit ourselves to products of only three functions, or to integrals over R2 and R, or even any Euclidean space for that matter: 8.1. Definition. Given measure spaces (Ω, , µ) and (M , , ν ), j = S j Mj j 1,...,N, not necessarily distinct, together with measurable functions φ :Ω M j → j and numbers p ,...,p with 1 p , 1 j N, we say that a generalized 1 N ≤ j ≤ ∞ ≤ ≤ Young’s inequality holds for φ ,...,φ and p ,...,p in case there is a ﬁnite { 1 N } { 1 N } constant C such that N N

(8.3) f φ dµ C f pj j ◦ j ≤ k jkL (νj ) Ω j=1 j=1 Z Y Y holds whenever fj is non-negative and measurable on Mj , j =1,...,N. 8.1. A generalized Young’s inequality in the context of non- commutative integration. In non-commutative integration theory, as expounded by I. Segal [28, 30] and J. Dixmier [10], the basic data is an operator algebra equipped with a positive A linear functional λ. (For more information, see especially Nelson’s paper [23].) The algebra corresponds to the algebra of bounded measurable functions, A and applying the positive linear functional λ to a positive operator corresponds to taking the integral of a positive function. That is,

A λ(A) correspondsto f fdν . 7→ 7→ ZM To frame an analog of (8.3) in an operator algebra setting, we replace the measure spaces by non-commutative integration spaces: (M , , ν ) ( , λ ) j =1,...,N j Mj j −→ Aj j and (Ω, , µ) ( , λ) . S −→ B The right hand side of (8.3) is easy to generalize to the operator algebra setting; for A ( , λ), and 1 q , we deﬁne ∈ A ≤ ≤ ∞ A = (λ( A )q )1/q . k kq,λ | | Then the natural analog of the right hand side of (8.3) is N A . k j k(qj ,λj ) j=1 Y As for the left hand side of (8.3), regard fj fj φj can easily be interpreted ∞ 7→∞ ◦ in operator algebra terms: Think of L (Ω) and L (Mj ) as (commuative) operator algebras. Indeed, we can consider their elements as multiplication operators in the TRACE INEQUALITIES AND QUANTUM ENTROPY 133 obvious way. Then the map f f φ is an operator algebra homomorphism; j 7→ j ◦ j i.e. , a linear transformation respecting the product and the conjugation . ∗ Therefore, suppose we are given operator algebra homomorphisms φ : . j Aj → A Then each φ (A ) belongs to , however in the non-commutative case, the product j j A of the φj (Aj ) depends on the order, and need not be self adjoint even – let alone positive – even if each of the Aj are positive. Therefore, let us return to the left side of (8.3), and suppose that each fj is strictly positive. Then deﬁning h = ln(f ) so that f φ = ehj ◦φj , j j j ◦ j we can rewrite (8.3) as

N N hj (8.4) exp h φ dµ C e pj ,  j ◦ j  ≤ k kL (νj ) Ω j=1 j=1 Z X Y We can now formulate our operator algebra analog of (8.3): 8.2. Definition. Given non-commutative integration spaces ( , λ) and A ( , λ ), j = 1,...,N, together with operator algebra homomorphisms φ : Aj j j Aj → , j = 1,...,N, and indices 1 p , j = 1,...,N, a generalized Young’s A ≤ j ≤ ∞ inequality holds for φ ,...,φ and p ,...,p if there is a finite constant C so { 1 N } { 1 N } that N N (8.5) λ exp φ (H ) C (λ (exp [p H ]))1/pj   j j  ≤ j j j j=1 j=1 X Y whenever H is self adjoint in , j=1,...,N. j Aj We are concerned with determining the indices and the best constant C for which such an inequality holds, and shall focus on one example arising in mathematical physics. 8.2. A generalized Young’s inequality for tensor products. Let , Hj j =1,...,N be separable Hilbert spaces, and let denote the tensor product K = . K H1 ⊗···⊗HN Define to be ( ), the algebra of bounded linear operators on , and define A B K K λ to be the trace Tr on , so that ( , λ) = ( ( ), Tr). K A B K For any non-empty subset J of 1,...,n , let denote the tensor product { } KJ = . KJ ⊗j∈J Hj Define to be ( ), the algebra of bounded linear operators on , and define AJ B KJ KJ λ be the trace on , so that ( , λ ) = ( ( ), Tr ). J KJ AJ J B KJ J There are natural endomorphisms φ embedding the 2N 1 algebras into J − AJ . For instance, if J = 1, 2 , A { } (8.6) φ (A A )= A A I I , {1,2} 1 ⊗ 2 1 ⊗ 2 ⊗ H3 ⊗···⊗ HN 134 ERIC CARLEN and is extended linearly. It is obvious that in case J K = and J K = 1,...,n , then for all ∩ ∅ ∪ { } H and H , J ∈ AJ K ∈ AK HJ +HK HJ HK (8.7) Tr e = TrJ e TrK e , but things are more interesting when J K = and J and K are both proper ∩ 6 ∅ subsets of 1,...,N . The following is proved in [6]: { } 8.3. Theorem (Generalized Young’s Inequality for Tensor Products). Let J ,...,J be N non-empty subsets of 1,...,n For each i 1,...,n , let p(i) de- 1 N { } ∈{ } note the number of the sets J1,...,JN that contain i, and let p denote the minimum of the p(i). Then, for self adjoint operators H on , j =1,...,N, j KJj N N 1/q (8.8) Tr exp φ (H ) Tr eqHj   Jj j  ≤ Jj j=1 j=1 X Y for all 1 q p, while for all q>p, it is possible for the left hand side to be ≤ ≤ infinite, while the right hand side is finite.

Note that in the generalized Young’s inequality in Theorem 8.3, the constant C in Deﬁnition (8.2) is 1. The fact that the constant C = 1 is best possible, and that the inequality cannot hold for q>p is easy to see by considering the case that each has ﬁnite Hj dimension dj , and Hj = 0 for each j. Then

N n Tr exp φ (H ) = d   Jj j  k j=1 X kY=1 N   N  n 1/q qHj 1/q p(k)/q TrJj e = dk = dk . j=1 j=1 k∈J k=1 Y Y Yj Y 1/p 1/q Moreover, since for q >p, Tr epHj > Tr eqHj , it suﬃces to prove the inequality (8.8) for q = p. We will do this later in this section. As an example, consider the case of overlapping pairs with a periodic boundary condition: J = j, j +1 j =1,...,n 1 and J = n, 1 . j { } − n { } Here, N = n, and obviously p = 2. Therefore,

N N 1/2 (8.9) Tr exp φ (H ) Tr e2Hj .   j j  ≤ j=1 j=1 X Y The inequality (8.9) has an interesting statistical mechanical interpretation as a bound on the partition function of an arbitrarily long chain of interacting spins in terms of a product of partition functions of simple constituent two–spin systems. Again, the inequality (8.9) is non-trivial due to the “overlap” in the algebras . Aj TRACE INEQUALITIES AND QUANTUM ENTROPY 135

8.3. Subadditivity of Entropy and Generalized Young’s Inequalities. In the examples we consider, the positive linear functionals λ under consideration are either traces or normalized traces. Throughout this section, we assume that our non-commutative integration spaces ( , λ) are based on tracial positive linear A functionals λ. That is, we require that for all A, B , ∈ A λ(AB)= λ(BA) .

In such a non-commutative integration space ( , λ), a probability density is a A non-negative element ρ of such that λ(ρ) = 1. Indeed, the tracial property of λ A ensures that λ(ρA)= λ(Aρ)= λ(ρ1/2Aρ1/2) so that A λ(ρA) is a positive linear functional that is 1 on the identity. 7→ Now suppose we have N non-commutative integration spaces ( , λ ) and op- Aj j erator algebra homomorphisms φ : . Then these homomorphisms induce j Aj → A maps from the space of probability densities on to the spaces of probability A densities on the , as follows: Aj For any probability density ρ on ( , λ), let ρ be the probability density on A j ( , λ ) by Aj j λj (ρj A)= λ(ρφj (A)) for all A . ∈ Aj

For example, in the setting we are discussing here, ρJj is just the partial trace of ρ over k∈Jc k leaving an operator on k∈J k. ⊗ j H ⊗ j H In this section, we are concerned with the relations between the entropies of ρ and the ρ1,...,ρN . The entropy of a probability density ρ, S(ρ), is deﬁned by S(ρ)= λ(ρ ln ρ) . − Evidently, the entropy functional is concave on the set of probability densities.

8.4. Definition. Given tracial non-commutative integration spaces (A, λ) and (A , λ ), j = 1,...,N, together with C∗ algebra endomorphisms φ : A A, j j j j → j = 1,...,N, and numbers 1 p , j = 1,...,N, a generalized subadditivity ≤ j ≤ ∞ of entropy inequality holds if there is a ﬁnite constant C so that

N 1 (8.10) S(ρ ) S(ρ) ln C p j ≥ − j=1 j X for all probability densities ρ in A.

The following is a non-commutative version of theorem proved in [5]. The non-commuative version is proved in [6].

8.5. Theorem (Duality for Generalized Youngs Inequalities and Entropy). Let (A, λ) and (Aj , λj ), j = 1,...,N, be tracial non-commutative integration spaces. Let φ : A A, j =1,...,N be C∗ algebra endomorphisms. j j → 136 ERIC CARLEN

Then for any numbers 1 p , j = 1,...,N, and any ﬁnite constant C, ≤ j ≤ ∞ the generalized subadditivity of entropy inequality

N 1 S(ρ ) S(ρ) ln C p j ≥ − j=1 j X is true for all probability densities ρ on A if and only if the generalized Young’s inequality

N N λ exp φ (H ) C (λ exp[p H ])1/pj   j j  ≤ j j j j=1 j=1 X Y    is true for all self-adjoint H A , j =1,...,N, with the same p ,...,p and the j ∈ j 1 N same C.

Proof of Theorem 8.5: We make use of the duality formula for the entropy given in Theorem 2.13. Suppose ﬁrst that the generalized Young’s inequality (8.5) holds. Then, for any probability density ρ in A, and any self adjoint H A , j ∈ j j =1,...,N, we have

N N S(ρ) λ ρ φ (H ) ln λ exp φ (H ) − ≥   j j  −    j j  j=1 j=1 X X N    N   = λ (ρ H ) ln λ exp φ (H ) j j j −    j j  j=1 j=1 X X N  N   1/p λ (ρ H ) ln C λ epj Hj j ≥ j j j −  j  j=1 j=1 X Y N   1 = λ (ρ [p H ]) ln λ e[pj Hj ] ln C . p j j j j − j − j=1 j X h i

Now choosing p H to maximize λ (ρ [p H ]) ln λ e[pj Hj ] , we get j j j j j j − j λ (ρ [p H ]) ln λ e[pj Hj ] = S(ρ )= λ (ρ ln ρ ) . j j j j − j − j j j j Next, suppose that the subadditivity inequality is true. Let the self adjoint operators H1,...,HN be given, and deﬁne

−1 N N ρ = λ exp φ (H ) exp φ (H ) .    j j   j j  j=1 j=1 X X       TRACE INEQUALITIES AND QUANTUM ENTROPY 137

Then by Theorem 2.13,

N N ln λ exp φ (H ) = λ ρ φ (H ) + S(ρ)    j j    j j  j=1 j=1 X X     N   = λj [ρj Hj ]+ S(ρ) j=1 X N 1 [λ [ρ (p H )] + S(ρ )] + ln C ≤ p j j j j j j=1 j X N 1 ln [λ (exp(p H ))] + ln C ≤ p j j j j=1 j X

Proof of Theorem 8.3: By Theorem 8.5, in order to prove Theorem 8.3, it suffices to prove the corresponding generalized subadditivity of entropy inequality for tensor products of Hilbert spaces, which we now formulate and prove. The crucial tool that we use here is the strong subadditivity of the entropy; i.e., Theorem 6.6, except that we shall use a slightly different indexing of the various partial traces that is better adapted to our application. Suppose, as in the case we are discussing, that we are given n separable Hilbert spaces ,..., . As before, let denote their tensor product, and for any non- H1 Hn K empty subset J of 1,...,n , let denote . { } KJ ⊗j∈J Hj For a density matrix ρ on , and any non-empty subset J of 1,...,n , define K { } ρ = Tr c ρ to be the density matrix on induced by the natural injection of J J KJ ( ) into ( ). As noted above, ρ is nothing other than the partial trace of ρ B KJ B K J over the complementary product of Hilbert spaces, . ⊗j∈ /J Hj The strong subadditivity of the entropy of Theorem 6.6 can be formulated as the statement that for all non-empty J, K 1,...,n , ⊂{ } (8.11) S(ρ )+ S(ρ ) S(ρ )+ S(ρ ) . J K ≥ J∪K J∩K In case J K = , it reduces to the ordinary subadditivity of the entropy, which is ∩ ∅ the elementary inequality

(8.12) S(ρ )+ S(ρ ) S(ρ ) for J K = . J K ≥ J∪K ∩ ∅ Combining these, we have

S(ρ )+ S(ρ )+ S(ρ ) S(ρ )+ S(ρ )+ S(ρ ) {1,2} {2,3} {3,1} ≥ {1,2,3} {2} {1,3} 2S(ρ ) , ≥ {1,2,3} (8.13) where the ﬁrst inequality is the strong subadditivity (8.11) and the second is the ordinary subadditivity (8.12). Thus, for n = 3 and J = 1, 2 , J = 2, 3 and 1 { } 2 { } 138 ERIC CARLEN

J3 = 3, 1 , we obtain { } N 1 S(ρ ) S(ρ) . 2 Jj ≥ j=1 X 8.6. Theorem. Let J ,...,J be N non-empty subsets of 1,...,n . For each 1 N { } i 1,...,n , let p(i) denote the number of the sets J ,...,J that contain i, and ∈{ } 1 N let p denote the minimum of the p(i). Then

N 1 (8.14) S(ρ ) S(ρ) p Jj ≥ j=1 X for all density matrices ρ on = . K H1 ⊗···⊗Hn Proof: Simply use strong subadditivity to combine overlapping sets to produce as many “complete” sets as possible, as in the example above. Clearly, there can be no more than p of these. If p(i) > p for some indices i, there will be “left over” partial sets. The entropy is always non-negative, and therefore, discarding the corresponding entropies gives us N S(ρ ) pS(ρ), and hence the inequality. j=1 Jj ≥ P Proof of Theorem 8.3: This now follows directly from Theorem 8.5 and Theo- rem 8.6.

Acknoledgements I would like to thank the organizers of the 2009 Arizona Summer School Entropy and the quantum, Robert Sims and Daniel Ueltschi, for the invitation to deliver these lectures. Some of the results presented here were obtained in collaboration with Elliott Lieb, and beyond that, my view of the subject presented in these notes draws very much on the many discussions we had in the course of these collaborations. Though I take full responsibility for any shortcomings in these notes, I would like to thank Elliott Lieb for many enlightening conversations that have certainly added to whatever utility these notes may ﬁnd. I would also like to thank Robert Seiringer for very carefully reading them, correcting many misprints, and making valuable suggestions.

References

[1] T. Ando: Convexity of certain maps on positive deﬁnite matrices and applications to Hadamard products, Lin. Alg. and Appl., 26 203–241 (1979) [2] H.J. Brascamp and E.H. Lieb: The best constant in Young’s inequality and its generalization to more than three functions,. J. Math. Pures Appl. 86, no. 2, 2006 pp. 89–99. [3] E.A. Carlen and E.H. Lieb: A Minkowski type trace inequality and strong subadditivity of quantum entropy, Advances in the Mathematical Sciences, AMS Transl., 189 Series 2, (1999) 59-68. arXiv:math/0701352. [4] E.A. Carlen and E.H. Lieb: A Minkowski type trace inequality and strong subadditivity of the quantum entropy II Lett. Math. Phys., Vol. 83, no. 2, (2008) pp. 107-126 [5] E.A. Carlen and D. Cordero–Erausquin: Subadditivity of the entropy and its relation to Brascamp-Lieb type inequalities, to appear in Geom. Funct. Anal. (2008). TRACE INEQUALITIES AND QUANTUM ENTROPY 139

[6] E.A. Carlen and E.H. Lieb: Bascamp-Lieb inequalities for non-commutative integration Documenta Mathematica, 13 (2008) 553-584 [7] C. Davis A Schwarz inequality ofr operator convex functions. Proc. Amer. Math. Soc., 8, 42–44, (1957) [8] C. Davis Various averaging operations onto subalgebras. Illinois J. Math., 3, 528–553, (1959) [9] C. Davis Notions generalizing convexity for functions defined on spaces of matrices, in Convexity, Proceedings of Symposia in Pure Mathematics, VII, ed. V. Klee, Amer. Math. Soc., Providence, 1963, pp 187–202 [10] J. Dixmier: Formes linéaires sur un anneau d’opérateurs, Bull. Soc. Math. France 81, 1953, pp. 222–245 [11] H. Epstein: Remarks on two theorems of E. Lieb, Commun. Math. Phys. 31 317-325 (1973) [12] S. Golden: Lower bounds for Helmholtz functions. Phys. Rev. 137B, 1965, pp. 1127-1128 [13] E. Heinz: Beiträge zur Störungstheorie der Spektralzerlegung, Math. Ann. 123, 415–438 (1951) [14] F. Hiai: Concavity of certain matrix trace functions. Taiwanese J. Math., 5, no. 3, 535–554 (2001) [15] F. Hansen: The Wigner-Yanase entropy is not subadditive, J. of Stat. Phys., 126, No. 3, 643–648 (2007) [16] G.H. Hardy, J.E. Littlewood and G. Pólya: Inequalities. Cambridge University Press, Cam- bridge, 1934 [17] T Kato: Notes on some inequalities for linear operators, Math. Ann., 125, 208–212 (1952) [18] E.H. Lieb: Convex trace functions and the Wigner-Yanase-Dyson Conjecture, Adv. Math. 11 267-288 (1973) [19] E.H. Lieb: Some convexity and subadditivity properties of entropy, Bull. Amer. Math. Soc. 81, 1975, pp. 1-13. [20] E.H. Lieb and M.B. Ruskai: Proof of the strong subadditivity of quantum-mechanical entropy, J. Math. Phys. 14, 1973, pp. 1938-1941. [21] E.H. Lieb and M.B. Ruskai: Some operator inequalities of the Schwarz type, Adv. in Math. 12, 269-273 (1974). [22] E.H. Lieb and W. Thirring, Inequalities for the moments of the eigenvalues of the Schrödinger hamiltonian and their relation to Sobolev inequalities, in Studies in Mathemat- ical Physics, E.H. Lieb, B. Simon, A. Wightman eds., Princeton University Press, 269-303 (1976). [23] E. Nelson: Notes on non-commutative integration, Jour. Funct. Analysis, 15, 1974, pp. 103- 116 [24] D. Petz: A variational expression for the relative entropy, Comm. Math. Phys., 114, 1988, pp. 345–349 [25] D.W.. Robinson and D. Ruelle: Mean entropy of states in classical statistical mechanics Commun. Math. Phys 5, 1967, pp.288-300 [26] T. Rockafellar: Conjugate duality and optimization, Vol. 16, Regional conference series in applied mathematics, SIAM, Philadelphia, 1974 [27] M.B. Ruskai, Inequalities for quantum entropy: A review with conditions for equality, J. Math. Phys. 43, 2005, pp. 4358-4375 (2002). Erratum ibid 46, pp. 019901 [28] I.E. Segal: A non-commutative extension of abstract integration, Annals of Math., 57, 1953, pp. 401–457 [29] I.E. Segal.: Tensor algebras over Hilbert spaces II, Annals of Math., 63, 1956, pp. 160–175 [30] I.E. Segal.: Algebraic integration theory, Bull. Am. Math. Soc.., 71, 1965, no. 3, pp. 419-489 [31] B. Segal.: Trace ideals and their applications, London Math. Soc. Lecture Note Series, 35, Cambridge University Press, Cambridge, 1979 [32] C. Thompson: Inequality with application in statistical mechanics J. Math. Phys. 6, 1812– 1813 (1965) [33] W. Thirring: A course in mathematical physics, Volume IV: quantum mechanics of large systems Springer Verlag, New York, 1983 140 ERIC CARLEN

[34] A. Uhlmann: Sätze über Dichtematrizen, Wiss. Z. Karl-Marx Univ. Leipzig 20, 633-53 (1971) [35] A. Uhlmann: Relative entropy and the Wigner-Yanase-Dyson-Lieb concavity in an interpo- lation theory Commun. Math. Phys., 54, 1977, pp. 21-32 [36] H. Umegaki: Conditional expectation in operator algebras I, Tohoku Math. J., 6, 1954, pp. 177-181 [37] J. von Neumann, Zur Algebra der Funktionaloperatoren und Theorie der normalen Opera- toren, Math. Ann. 102, 370427 (1929) [38] E.P. Wigner and M.M. Yanase Information contents of distributions Proc. Nat. Acad. Sci 49 910–918 (1963) [39] E.P. Wigner and M.M. Yanase On the positive semidefinite nature of certain matrix expres- sions Canad. J. Math., 16 397–406 (1964) [40] W.H.. Young: On the multiplication of successions of Fourier constants, Proc. Royal soc. A., 97, 1912, pp. 331-339 [41] W.H.. Young: Sur la généralization du théore `me du Parseval, Comptes rendus., 155, 1912, pp. 30-33

Department of Mathematics, Hill Center, Rutgers University, 110 Frelinghuysen Road, Piscataway NJ 08854-8019 USA E-mail address: [email protected]