
May 13, 2019 11:36 WSPC/1664-3607 319-BMS 1950008

Bulletin of Mathematical Sciences, Vol. 1, No. 2 (2019) 1950008 (39 pages) © The Author(s). DOI: 10.1142/S1664360719500085

Some inequalities for exponential and logarithmic functions

Eric A. Carlen∗,‡ and Elliott H. Lieb†,§

∗Department of Mathematics, Hill Center, Rutgers University, 110 Frelinghuysen Road, Piscataway, NJ 08854-8019, USA

†Departments of Mathematics and Physics, Jadwin Hall, Princeton University, Washington Road, Princeton, NJ 08544, USA
‡[email protected]
§[email protected]

Received 8 October 2017 Accepted 24 April 2018 Published 14 May 2019

Communicated by Ari Laptev

Consider a function F(X, Y) of pairs of positive matrices, with values in the positive matrices, such that whenever X and Y commute, F(X, Y) = X^p Y^q. Our first main result gives conditions on F such that Tr[X log(F(Z, Y))] ≤ Tr[X(p log X + q log Y)] for all X, Y, Z such that Tr Z = Tr X. (Note that Z is absent from the right side of the inequality.) We give several examples of functions F to which the theorem applies. Our theorem allows us to give simple proofs of the well-known logarithmic inequalities of Hiai and Petz, and several new generalizations of them which involve three variables X, Y, Z instead of just X, Y alone.

The investigation of these logarithmic inequalities is closely connected with three quantum relative entropy functionals: the standard Umegaki quantum relative entropy D(X‖Y) = Tr[X(log X − log Y)], and two others, the Donald relative entropy D_D(X‖Y) and the Belavkin–Staszewski relative entropy D_BS(X‖Y). They are known to satisfy D_D(X‖Y) ≤ D(X‖Y) ≤ D_BS(X‖Y). We prove that the Donald relative entropy provides the sharp upper bound, independent of Z, on Tr[X log(F(Z, Y))] in a number of cases in which F(Z, Y) is homogeneous of degree 1 in Z and −1 in Y. We also investigate the Legendre transforms in X of D_D(X‖Y) and D_BS(X‖Y), and show how our results for these lead to new refinements of the Golden–Thompson inequality.

Keywords: Trace inequalities; quantum relative entropy; convexity.

Mathematics Subject Classification: 39B62, 65F60, 94A17

This is an Open Access article published by World Scientific Publishing Company. It is distributed under the terms of the Creative Commons Attribution 4.0 (CC-BY) License. Further distribution of this work is permitted, provided the original work is properly cited.


1. Introduction

Let M_n denote the set of complex n × n matrices. Let P_n and H_n denote the subsets of M_n consisting of strictly positive and self-adjoint matrices, respectively. For X, Y ∈ H_n, X ≥ Y indicates that X − Y is positive semi-definite, i.e. in the closure of P_n, and X > Y indicates that X − Y ∈ P_n.

Let p and q be nonzero real numbers. There are many functions F : P_n × P_n → P_n such that F(X, Y) = X^p Y^q whenever X and Y commute. For example,

F(X, Y) = X^{p/2} Y^q X^{p/2} or F(X, Y) = Y^{q/2} X^p Y^{q/2}.  (1.1)

Further examples can be constructed using geometric means: for positive n × n matrices X and Y, and t ∈ [0, 1], the t-geometric mean of X and Y, denoted by X #_t Y, is defined by Kubo and Ando [26] to be

X #_t Y := X^{1/2}(X^{-1/2} Y X^{-1/2})^t X^{1/2}.  (1.2)

The geometric mean for t = 1/2 was initially defined and studied by Pusz and Woronowicz [36]. Equation (1.2) makes sense for all t ∈ ℝ, and it has a natural geometric meaning [40]; see the discussion around Definition 2.4 and in Appendix C. Then for all r > 0 and all t ∈ (0, 1),

F(X, Y) = X^r #_t Y^r  (1.3)

is such a function with p = r(1 − t) and q = rt. Other examples will be considered below.

If F is such a function, then Tr[X log F(X, Y)] = Tr[X(p log X + q log Y)] whenever X and Y commute. We are interested in conditions on F that guarantee either

Tr[X log F(X, Y)] ≥ Tr[X(p log X + q log Y)]  (1.4)

or

Tr[X log F(X, Y)] ≤ Tr[X(p log X + q log Y)]  (1.5)

for all X, Y ∈ P_n. Some examples of such inequalities are known: Hiai and Petz [23] proved that

(1/p) Tr[X log(Y^{p/2} X^p Y^{p/2})] ≤ Tr[X(log X + log Y)] ≤ (1/p) Tr[X log(X^{p/2} Y^p X^{p/2})]  (1.6)

for all X, Y > 0 and all p > 0. Replacing Y by Y^{q/p} shows that for F(X, Y) = X^{p/2} Y^q X^{p/2}, (1.4) is valid, while for F(X, Y) = Y^{q/2} X^p Y^{q/2}, (1.5) is valid: remarkably, the effects of non-commutativity go in different directions in these two examples. Other examples involving functions F of the form (1.3) have been proved by Ando and Hiai [2].
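The two-sided bound (1.6) is easy to test numerically. The following Python sketch (our illustration only, not part of the arguments of the paper; it assumes NumPy and realizes matrix powers and logarithms through the spectral decomposition) checks one random non-commuting instance:

```python
import numpy as np

def f_pos(A, f):
    # Apply the scalar function f to the positive matrix A spectrally.
    w, V = np.linalg.eigh(A)
    return (V * f(w)) @ V.conj().T

def rand_pos(n, rng):
    # A random strictly positive n x n matrix.
    G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return G @ G.conj().T + 0.1 * np.eye(n)

rng = np.random.default_rng(1)
n, p = 4, 0.7
X, Y = rand_pos(n, rng), rand_pos(n, rng)

Xp = f_pos(X, lambda t: t ** p)          # X^p
Yp = f_pos(Y, lambda t: t ** p)          # Y^p
Xh = f_pos(X, lambda t: t ** (p / 2))    # X^{p/2}
Yh = f_pos(Y, lambda t: t ** (p / 2))    # Y^{p/2}

left  = np.trace(X @ f_pos(Yh @ Xp @ Yh, np.log)).real / p
mid   = np.trace(X @ (f_pos(X, np.log) + f_pos(Y, np.log))).real
right = np.trace(X @ f_pos(Xh @ Yp @ Xh, np.log)).real / p

# (1.6): left <= mid <= right, strictly when X and Y do not commute.
assert left < mid < right
```

Since random matrices almost surely fail to commute, both inequalities are observed to be strict, in line with the discussion following (1.6).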


Here we prove several new inequalities of this type, and we also strengthen the results cited above by bringing in a third operator Z: for example, Theorem 1.4 says that for all positive X, Y and Z such that Tr[Z] = Tr[X],

(1/p) Tr[X log(Y^{p/2} Z^p Y^{p/2})] ≤ Tr[X(log X + log Y)]  (1.7)

with strict inequality if Y and Z do not commute. If Y and Z do commute, the left side of (1.7) is simply Tr[X(log Z + log Y)], and the inequality (1.7) would then follow from the inequality Tr[X log Z] ≤ Tr[X log X] for all positive X and Z with Tr[Z] = Tr[X]. Our result shows that this persists in the noncommutative case, and we obtain similar results for other choices of F, in particular for those defined in terms of geometric means.

One of the reasons that inequalities of this sort are of interest is their connection with quantum relative entropy. By taking Y = W^{-1}, with X and W both having unit trace, so that both X and W are density matrices, the middle quantity in (1.6), Tr[X(log X − log W)], is the Umegaki relative entropy of X with respect to W [43]. Thus (1.6) provides upper and lower bounds on the relative entropy.

There is another source of interest in the inequalities (1.6), which Hiai and Petz refer to as logarithmic inequalities. As they point out, logarithmic inequalities are dual, via the Legendre transform, to certain exponential inequalities related to the Golden–Thompson inequality. Indeed, the quantum Gibbs variational principle states that

sup{ Tr[XH] − Tr[X(log X − log W)] : X ≥ 0, Tr[X] = 1 } = log(Tr[e^{H + log W}])  (1.8)

for all self-adjoint H and all nonnegative W. (The quantum Gibbs variational principle is a direct consequence of the Peierls–Bogoliubov inequality; see Appendix A.)

It follows immediately from (1.6) and (1.8) that

sup{ Tr[XH] − Tr[X log(X^{1/2} W^{-1} X^{1/2})] : X ≥ 0, Tr[X] = 1 } ≤ log(Tr[e^{H + log W}]).  (1.9)

The left side of (1.9) provides a lower bound for log(Tr[e^{H + log W}]) in terms of a Legendre transform, which, unfortunately, cannot be evaluated explicitly.

An alternate use of the inequality on the right in (1.6) does yield an explicit lower bound on log(Tr[e^{H + log W}]) in terms of a geometric mean of e^H and W. This was done in [23]; the bound is

Tr[(e^{rH} #_t e^{rK})^{1/r}] ≤ Tr[e^{(1−t)H + tK}],  (1.10)

which is valid for all self-adjoint H, K, and all r > 0 and t ∈ [0, 1]. Since the Golden–Thompson inequality is Tr[e^{(1−t)H + tK}] ≤ Tr[e^{(1−t)H} e^{tK}], (1.10) is viewed in [23] as a complement to the Golden–Thompson inequality.

Hiai and Petz show [23, Theorem 2.1] that the inequality (1.10) is equivalent to the inequality on the right in (1.6). One direction in proving the equivalence,


starting from (1.10), is a simple differentiation argument; differentiating (1.10) at t = 0 yields the result. While the inequality on the left in (1.6) is relatively simple to prove, the one on the right appears to be deeper and more difficult to prove, from the perspective of [23].

In our paper, we prove a number of new inequalities, some of which strengthen and extend (1.6) and (1.10). Our results show, in particular, that the geometric mean provides a natural bridge between the pair of inequalities (1.6). This perspective yields a fairly simple proof of the deeper inequality on the right of (1.6), and thereby places the appearance of the geometric mean in (1.10) in a natural context.

Before stating our results precisely, we recall the notions of operator concavity and operator convexity. A function F : P_n → H_n is concave in case for all X, Y ∈ P_n and all t ∈ [0, 1],

F((1 − t)X + tY) − (1 − t)F(X) − tF(Y) ≥ 0,

and F is convex in case −F is concave. For example, F(X) := X^p is concave for p ∈ [0, 1], as is F(X) := log X.

A function F : P_n × P_n → H_n is jointly concave in case for all X, Y, W, Z ∈ P_n and all t ∈ [0, 1],

F((1 − t)X + tY, (1 − t)Z + tW) − (1 − t)F(X, Z) − tF(Y, W) ≥ 0,

and F is jointly convex in case −F is jointly concave. Strict concavity or convexity means that the left side is never zero for any t ∈ (0, 1) unless X = Y and Z = W. A particularly well-known and important example is provided by the generalized geometric means: by a theorem of Kubo and Ando [26], for each t ∈ [0, 1], F(X, Y) := X #_t Y is jointly concave in X and Y. Other examples of jointly concave functions are discussed below.

Our first main result is the following.

Theorem 1.1. Let F : P_n × P_n → P_n be such that:

(1) For each fixed Y ∈ P_n, X ↦ F(X, Y) is concave, and for all λ > 0, F(λX, Y) = λF(X, Y).

(2) For each n × n unitary U, and each X, Y ∈ P_n,

F(UXU*, UYU*) = U F(X, Y) U*.  (1.11)

(3) For some q ∈ ℝ, if X and Y commute, then F(X, Y) = X Y^q.

Then, for all X, Y, Z ∈ P_n such that Tr[Z] = Tr[X],

Tr[X log(F(Z, Y))] ≤ Tr[X(log X + q log Y)].  (1.12)

If, moreover, X ↦ F(X, Y) is strictly concave, then the inequality in (1.12) is strict when Z and Y do not commute.


Remark 1.2. Note that (1.12) has three variables on the left, but only two on the right. The third variable Z is related to X and Y only through the constraint Tr[Z] = Tr[X].

Different choices of the function F(X, Y) yield different corollaries. For our first corollary, we take the function F(X, Y) = ∫_0^∞ (λ + Y)^{-1} X (λ + Y)^{-1} dλ, which evidently satisfies the conditions of Theorem 1.1 with q = −1. We obtain, thereby, the following inequality.

Theorem 1.3. Let X, Y, Z ∈ P_n be such that Tr[Z] = Tr[X]. Then

Tr[ X log( ∫_0^∞ (λ + Y)^{-1} Z (λ + Y)^{-1} dλ ) ] ≤ Tr[X(log X − log Y)].  (1.13)

Another simple application can be made to the function F(X, Y) = Y^{1/2} X Y^{1/2}; however, in this case, an adaptation of the method of proof of Theorem 1.1 yields a more general result for the one-parameter family of functions F(X, Y) = Y^{p/2} X^p Y^{p/2}, p > 0.

Theorem 1.4. For all X, Y, Z ∈ P_n such that Tr[Z] = Tr[X], and all p > 0,

Tr[X log(Y^{p/2} Z^p Y^{p/2})] ≤ Tr[X(log X^p + log Y^p)].  (1.14)

The inequality in (1.14) is strict unless Z and Y commute.

Specializing to the case Z = X, (1.14) reduces to the inequality on the left in (1.6). Theorem 1.4 thus extends the inequality of [23] by inclusion of the third variable Z, and specifies the cases of equality there.

Remark 1.5. If Z does commute with Y, (1.14) reduces to Tr[X log Z] ≤ Tr[X log X], which is well known to be true under the condition Tr[Z] = Tr[X], with equality if and only if Z = X.
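Theorem 1.4 can likewise be probed numerically. In the sketch below (our illustration, not part of the paper's arguments; NumPy assumed), Z is an independent random positive matrix rescaled so that Tr[Z] = Tr[X]:

```python
import numpy as np

def f_pos(A, f):
    # Spectral application of the scalar function f to a positive matrix.
    w, V = np.linalg.eigh(A)
    return (V * f(w)) @ V.conj().T

def rand_pos(n, rng):
    G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return G @ G.conj().T + 0.1 * np.eye(n)

rng = np.random.default_rng(2)
n, p = 4, 1.3
X, Y, Z = (rand_pos(n, rng) for _ in range(3))
Z *= np.trace(X).real / np.trace(Z).real   # enforce Tr[Z] = Tr[X]

Yh = f_pos(Y, lambda t: t ** (p / 2))      # Y^{p/2}
Zp = f_pos(Z, lambda t: t ** p)            # Z^p

lhs = np.trace(X @ f_pos(Yh @ Zp @ Yh, np.log)).real
rhs = p * np.trace(X @ (f_pos(X, np.log) + f_pos(Y, np.log))).real

# (1.14): strict inequality whenever Z and Y do not commute.
assert lhs < rhs
```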

We also obtain results for the two-parameter family of functions

F(X, Y) = Y^r #_s X^r  (1.15)

with s ∈ [0, 1] and r > 0. In this case, when X and Y commute, F(X, Y) = X^p Y^q with p = rs and q = r(1 − s).

It would be possible to deduce at least some of these results directly from Theorem 1.1 if we knew that, for example, X ↦ Y^2 #_{1/2} X^2 = Y(Y^{-1} X^2 Y^{-1})^{1/2} Y is concave in X. While we have no such result, it turns out that we can use Theorem 1.4 to obtain the following.

Theorem 1.6. Let X, Y, Z ∈ P_n be such that Tr[Z] = Tr[X]. Then for all s ∈ [0, 1] and all r > 0,

Tr[X log(Y^r #_s Z^r)] ≤ Tr[X(s log X^r + (1 − s) log Y^r)].  (1.16)

For s ∈ (0, 1), when Z does not commute with Y, the inequality is strict.
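The geometric mean inequality (1.16) can be tested in the same spirit; the sketch below (our illustration; NumPy assumed) implements the Kubo–Ando mean (1.2) spectrally and checks (1.16) on a random triple with Tr[Z] = Tr[X]:

```python
import numpy as np

def f_pos(A, f):
    # Spectral application of the scalar function f to a positive matrix.
    w, V = np.linalg.eigh(A)
    return (V * f(w)) @ V.conj().T

def geo_mean(A, B, t):
    # Kubo-Ando t-geometric mean A #_t B = A^{1/2} (A^{-1/2} B A^{-1/2})^t A^{1/2}.
    Ah = f_pos(A, np.sqrt)
    Ahi = f_pos(A, lambda x: 1.0 / np.sqrt(x))
    return Ah @ f_pos(Ahi @ B @ Ahi, lambda x: x ** t) @ Ah

def rand_pos(n, rng):
    G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return G @ G.conj().T + 0.1 * np.eye(n)

rng = np.random.default_rng(3)
n, r, s = 4, 0.8, 0.35
X, Y, Z = (rand_pos(n, rng) for _ in range(3))
Z *= np.trace(X).real / np.trace(Z).real   # Tr[Z] = Tr[X]

Yr = f_pos(Y, lambda t: t ** r)
Zr = f_pos(Z, lambda t: t ** r)
lhs = np.trace(X @ f_pos(geo_mean(Yr, Zr, s), np.log)).real
rhs = r * np.trace(X @ (s * f_pos(X, np.log)
                        + (1 - s) * f_pos(Y, np.log))).real

assert lhs < rhs   # (1.16), strict since Z and Y do not commute
```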


The case in which Z = X is proved in [2] using log-majorization methods. The inequality (1.16) is an identity at s = 1. As we shall show, differentiating it at s = 1 in the case Z = X yields the inequality on the right in (1.6). Since the geometric mean inequality (1.16) is a consequence of our generalization of the inequality on the left in (1.6), this derivation shows how the geometric means construction "bridges" the pair of inequalities (1.6).

Theorems 1.3, 1.4 and 1.6 provide infinitely many new lower bounds on the Umegaki relative entropy, one for each choice of Z. The trace functional on the right side of (1.6) bounds the Umegaki relative entropy from above, and is in many ways better behaved than the trace functional on the left, or any of the individual new lower bounds. By a theorem of Fujii and Kamei [17],

X, W ↦ X^{1/2} log(X^{1/2} W^{-1} X^{1/2}) X^{1/2}

is jointly convex as a function from P_n × P_n to P_n, and then, as a trivial consequence,

X, W ↦ Tr[X log(X^{1/2} W^{-1} X^{1/2})]

is jointly convex. When X and W are density matrices,

Tr[X log(X^{1/2} W^{-1} X^{1/2})] =: D_BS(X‖W)

is the Belavkin–Staszewski relative entropy [6]. The joint convexity of the Umegaki relative entropy is a theorem of Lindblad [32], who deduced it as a direct consequence of the main concavity theorem in [30]. See also [42].

A seemingly small change in the arrangement of the operators, replacing X^{1/2} W^{-1} X^{1/2} with W^{-1/2} X W^{-1/2}, obliterates convexity;

X, W ↦ Tr[X log(W^{-1/2} X W^{-1/2})]  (1.17)

is not jointly convex, and even worse, the function W ↦ Tr[X log(W^{-1/2} X W^{-1/2})] is not convex for all fixed X ∈ P_n. Therefore, although the function in (1.17) agrees with the Umegaki relative entropy when X and W commute, its lack of convexity makes it unsuitable for consideration as a relative entropy functional. We discuss the failure of convexity at the end of Sec. 3.

However, Theorem 1.4 provides a remedy by introducing a third variable Z with respect to which we can maximize. The resulting functional is still bounded above by the Umegaki relative entropy; that is, for all density matrices X and W,

sup{ Tr[X log(W^{-1/2} Z W^{-1/2})] : Z ≥ 0, Tr[Z] ≤ 1 } ≤ D(X‖W).  (1.18)

One might hope that the left side is a jointly convex function of X and W, and this does turn out to be the case. In fact, the left-hand side is a quantum relative entropy originally introduced by Donald [14] through a quite different formula. Given any orthonormal basis {u_1, ..., u_n} of ℂ^n, define a "pinching" map Φ : M_n → M_n by defining Φ(X) to be the diagonal matrix whose jth diagonal entry is ⟨u_j, X u_j⟩. Let 𝒫


denote the set of all such pinching operations. For density matrices X and Y, the Donald relative entropy D_D(X‖Y) is defined by

D_D(X‖Y) = sup{ D(Φ(X)‖Φ(Y)) : Φ ∈ 𝒫 }.  (1.19)

Hiai and Petz [22] showed that for all density matrices X and all Y ∈ P_n,

D_D(X‖Y) = sup{ Tr[XH] − log(Tr[e^H Y]) : H ∈ H_n },  (1.20)

arguing as follows. Fix any orthonormal basis {u_1, ..., u_n} of ℂ^n. Let X be any density matrix and let Y be any positive matrix. Define x_j = ⟨u_j, X u_j⟩ and y_j = ⟨u_j, Y u_j⟩ for j = 1, ..., n. For (h_1, ..., h_n) ∈ ℝ^n, define H to be the self-adjoint operator given by H u_j = h_j u_j, j = 1, ..., n. Then, by the classical Gibbs variational principle,

∑_{j=1}^n x_j (log x_j − log y_j) = sup{ ∑_{j=1}^n x_j h_j − log( ∑_{j=1}^n e^{h_j} y_j ) : (h_1, ..., h_n) ∈ ℝ^n }
= sup{ Tr[XH] − log(Tr[e^H Y]) : H ∈ H_n }.

Taking the supremum over all choices of the orthonormal basis yields (1.20). For our purposes, a variant of (1.20) is useful.

Lemma 1.7. For all density matrices X, and all Y ∈ P_n,

D_D(X‖Y) = sup{ Tr[XH] : H ∈ H_n, Tr[e^H Y] ≤ 1 }.  (1.21)

Proof. Observe that we may add a constant to H without changing Tr[XH] − log(Tr[e^H Y]), and thus, in taking the supremum in (1.20), we may restrict our attention to H ∈ H_n such that Tr[e^H Y] = 1. Then Tr[XH] − log(Tr[e^H Y]) = Tr[XH], and the constraint in (1.21) is satisfied. Hence the supremum in (1.20) is no larger than the supremum in (1.21). Conversely, if Tr[e^H Y] ≤ 1, then

Tr[XH] ≤ Tr[XH] − log(Tr[e^H Y]),

and thus the supremum in (1.21) is no larger than the supremum in (1.20).

By the joint convexity of the Umegaki relative entropy, for each Φ ∈ 𝒫, D(Φ(X)‖Φ(Y)) is jointly convex in X and Y, and then, since the supremum of a family of convex functions is convex, the Donald relative entropy D_D(X‖Y) is jointly convex.

Making the change of variables Z = W^{1/2} e^H W^{1/2} in (1.18), one sees that the supremum in (1.18) is exactly the same as the supremum in (1.21), and thus, for all density matrices X and W, D_D(X‖W) ≤ D(X‖W), which can also be seen as a consequence of the joint convexity of the Umegaki relative entropy.
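The mechanism behind D_D(X‖Y) ≤ D(X‖Y) can be observed numerically from the definition (1.19): every pinching produces a classical relative entropy of diagonal entries, and each such value lower-bounds the Umegaki relative entropy. A sketch (our illustration; NumPy assumed):

```python
import numpy as np

def f_pos(A, f):
    # Spectral application of the scalar function f to a positive matrix.
    w, V = np.linalg.eigh(A)
    return (V * f(w)) @ V.conj().T

def rand_density(n, rng):
    G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    P = G @ G.conj().T + 0.1 * np.eye(n)
    return P / np.trace(P).real

rng = np.random.default_rng(4)
n = 5
X, Y = rand_density(n, rng), rand_density(n, rng)

# Umegaki relative entropy D(X || Y).
D = np.trace(X @ (f_pos(X, np.log) - f_pos(Y, np.log))).real

# Pinch in several random orthonormal bases; each pinching gives a
# classical relative entropy D(Phi(X) || Phi(Y)) that lower-bounds D.
best = -np.inf
for _ in range(50):
    V, _ = np.linalg.qr(rng.standard_normal((n, n))
                        + 1j * rng.standard_normal((n, n)))
    x = np.diag(V.conj().T @ X @ V).real
    y = np.diag(V.conj().T @ Y @ V).real
    best = max(best, np.sum(x * (np.log(x) - np.log(y))))

assert best <= D + 1e-10   # every pinching stays below D(X || Y)
```

The supremum over all bases (and, more generally, over all H in (1.20)) closes part of the remaining gap, but by (1.29) it never exceeds D(X‖Y).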


Theorems 1.3 and 1.6 give two more lower bounds on the Umegaki relative entropy for density matrices X and Y, namely

sup_{Z ∈ P_n, Tr[Z]=Tr[X]} Tr[ X log( ∫_0^∞ (λ + Y)^{-1} Z (λ + Y)^{-1} dλ ) ]  (1.22)

and

sup_{Z ∈ P_n, Tr[Z]=Tr[X]} { Tr[X log((Y^{-1} #_{1/2} Z)^2)] }.  (1.23)

Proposition 3.1 shows that both of these suprema are equal to D_D(X‖Y).

Our next results concern the partial Legendre transforms of the three relative entropies D_D(X‖Y), D(X‖Y) and D_BS(X‖Y). For this, it is natural to consider them as functions on P_n × P_n, and not only on density matrices. The natural extension of the Umegaki relative entropy functional to P_n × P_n is

D(X‖W) := Tr[X(log X − log W)] + Tr[W] − Tr[X].  (1.24)

It is homogeneous of degree one in X and W and, with this definition, D(X‖W) ≥ 0 with equality only in case X = W, which is a consequence of Klein's inequality, as discussed in Appendix A. The natural extension of the Belavkin–Staszewski relative entropy functional to P_n × P_n is

D_BS(X‖W) = Tr[X log(X^{1/2} W^{-1} X^{1/2})] + Tr[W] − Tr[X].  (1.25)

Introducing Q := e^H, the supremum in (1.21) is

sup{ Tr[X log Q] : Q ≥ 0, Tr[WQ] ≤ 1 },

and the extension of the Donald relative entropy to P_n × P_n is

D_D(X‖W) = sup_{Q>0}{ Tr[X log Q] : Tr[WQ] ≤ Tr[X] } + Tr[W] − Tr[X].  (1.26)

To avoid repetition, it is useful to note that all three of these functionals are examples of quantum relative entropy functionals, in the sense of satisfying the following axioms. This axiomatization differs from many others, such as the ones in [14] and [18], which are designed to single out the Umegaki relative entropy.

Definition 1.8. A quantum relative entropy is a function R(X‖W) on P_n × P_n with values in [0, ∞] such that:

(1) X, W ↦ R(X‖W) is jointly convex.
(2) For all X, W ∈ P_n and all λ > 0, R(λX‖λW) = λ R(X‖W) and

R(λX‖W) = λ R(X‖W) + λ log λ Tr[X] + (1 − λ) Tr[W].  (1.27)

(3) If X and W commute, R(X‖W) = D(X‖W).


The definition does not include the requirement that R(X‖W) ≥ 0, with equality if and only if X = W, because this follows directly from (1)–(3).

Proposition 1.9. Let R(X‖W) be any quantum relative entropy. Then

R(X‖W) ≥ (1/2) Tr[X] ‖ X/Tr[X] − W/Tr[W] ‖_1^2,  (1.28)

where ‖·‖_1 denotes the trace norm.

The proof is given towards the end of Sec. 3. It is known for the Umegaki relative entropy [21], but the proof uses only the properties (1)–(3).

The following pair of inequalities summarizes the relations among the three relative entropies: for all X, W ∈ P_n,

D_D(X‖W) ≤ D(X‖W) ≤ D_BS(X‖W).  (1.29)

These inequalities will imply a corresponding pair of inequalities for the partial Legendre transforms in X.
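Proposition 1.9 can be checked numerically in the Umegaki case, using the extended functional (1.24); the following sketch (our illustration; NumPy assumed) verifies the Pinsker-type bound (1.28) on a random pair of positive matrices:

```python
import numpy as np

def f_pos(A, f):
    # Spectral application of the scalar function f to a positive matrix.
    w, V = np.linalg.eigh(A)
    return (V * f(w)) @ V.conj().T

def rand_pos(n, rng):
    G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return G @ G.conj().T + 0.1 * np.eye(n)

rng = np.random.default_rng(5)
n = 4
X, W = rand_pos(n, rng), rand_pos(n, rng)
tx, tw = np.trace(X).real, np.trace(W).real

# Extended Umegaki relative entropy (1.24).
D = np.trace(X @ (f_pos(X, np.log) - f_pos(W, np.log))).real + tw - tx

# Trace norm of the difference of the normalized matrices.
diff = X / tx - W / tw
tnorm = np.sum(np.abs(np.linalg.eigvalsh(diff)))

assert D >= 0.5 * tx * tnorm ** 2 - 1e-12   # (1.28)
```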

Remark 1.10. The partial Legendre transform of the relative entropy, which figures in the Gibbs variational principle, is in many ways better behaved than the full Legendre transform. Indeed, the Legendre transform F* of a function F on ℝ^n that is convex and homogeneous of degree one always has the form

F*(y) = 0 for y ∈ C, and F*(y) = ∞ for y ∉ C,

for some convex set C [38]. The set C figuring in the full Legendre transform of the Umegaki relative entropy was first computed by Pusz and Woronowicz [37], and somewhat more explicitly by Donald in [14].

Consider any function R(X‖Y) on P_n × P_n that is convex and lower semicontinuous in X. There are two natural partial Legendre transforms that are related to each other, namely Φ_R(H, Y) and Ψ_R(H, Y), defined by

Φ_R(H, Y) = sup_{X ∈ P_n} { Tr[XH] − R(X‖Y) : Tr[X] = 1 }  (1.30)

and

Ψ_R(H, Y) = sup_{X ∈ P_n} { Tr[XH] − R(X‖Y) },  (1.31)

where H ∈ H_n is the conjugate variable to X.

For example, let R(X‖Y) = D(X‖Y), the Umegaki relative entropy. Then, by the Gibbs variational principle,

Φ(H, Y) = 1 − Tr[Y] + log(Tr e^{H + log Y})  (1.32)


and

Ψ(H, Y) = Tr e^{H + log Y} − Tr Y.  (1.33)

Lemma 1.11. Let R(X‖Y) be any function on P_n × P_n that is convex and lower semicontinuous in X, and which satisfies the scaling relation (1.27). Then, for all H ∈ H_n and all Y ∈ P_n,

Ψ_R(H, Y) = e^{Φ_R(H, Y) + Tr[Y] − 1} − Tr[Y].  (1.34)

This simple relation between the two Legendre transforms is a consequence of scaling, and hence the corresponding relation holds for any quantum relative entropy.

Consider the Donald relative entropy and define
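In the Umegaki case, the scaling relation (1.34) can be confirmed directly from the closed forms (1.32) and (1.33); the sketch below (our illustration; NumPy assumed) does so on a random instance:

```python
import numpy as np

def f_herm(A, f):
    # Spectral application of the scalar function f to a Hermitian matrix.
    w, V = np.linalg.eigh(A)
    return (V * f(w)) @ V.conj().T

rng = np.random.default_rng(6)
n = 4
G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
H = (G + G.conj().T) / 2                       # self-adjoint H
P = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
Y = P @ P.conj().T + 0.1 * np.eye(n)           # positive Y

trY = np.trace(Y).real
Ztr = np.trace(f_herm(H + f_herm(Y, np.log), np.exp)).real  # Tr e^{H + log Y}

Phi = 1 - trY + np.log(Ztr)   # (1.32)
Psi = Ztr - trY               # (1.33)

# The scaling relation (1.34): Psi = exp(Phi + Tr[Y] - 1) - Tr[Y].
assert np.isclose(Psi, np.exp(Phi + trY - 1) - trY)
```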

Ψ_D(H, Y) := sup_{X>0} { Tr[XH] − D_D(X‖Y) }  (1.35)

and

Φ_D(H, Y) := sup_{X>0, Tr[X]=1} { Tr[XH] − D_D(X‖Y) }.  (1.36)

In Lemma 3.7, we prove the following analog of (1.32): for H ∈ H_n and Y ∈ P_n,

Φ_D(H, Y) = 1 − Tr[Y] + inf{ λ_max(H − log Q) : Q ∈ P_n, Tr[QY] ≤ 1 },  (1.37)

where, for any self-adjoint operator K, λ_max(K) is the largest eigenvalue of K, and we prove that Φ_D(H, Y) is concave in Y. As a consequence of this, we prove in Theorem 3.10 that for all H ∈ H_n, the function

Y ↦ exp( inf_{Q>0, Tr[QY]≤1} λ_max(H − log Q) )  (1.38)

is concave on P_n. Moreover, for all H, K ∈ H_n,

log(Tr[e^{H+K}]) ≤ inf_{Q>0, Tr[Q e^K]≤1} λ_max(H − log Q) ≤ log(Tr[e^H e^K]).  (1.39)

These inequalities improve upon the Golden–Thompson inequality. Note that, by Lemma 1.11, (1.33) and (1.37), the inequality on the left in (1.39) is equivalent to Ψ(H, Y) ≤ Ψ_D(H, Y), which in turn is equivalent under the Legendre transform to D_D(X‖Y) ≤ D(X‖Y).
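The chain (1.39) can be probed numerically: every Q > 0 with Tr[Q e^K] ≤ 1 yields the upper bound λ_max(H − log Q) for log Tr[e^{H+K}], and the choice Q = e^H / Tr[e^K e^H] reproduces the Golden–Thompson bound on the right. A sketch (our illustration; NumPy assumed):

```python
import numpy as np

def f_herm(A, f):
    # Spectral application of the scalar function f to a Hermitian matrix.
    w, V = np.linalg.eigh(A)
    return (V * f(w)) @ V.conj().T

def rand_herm(n, rng):
    G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (G + G.conj().T) / 2

rng = np.random.default_rng(7)
n = 4
H, K = rand_herm(n, rng), rand_herm(n, rng)
eH, eK = f_herm(H, np.exp), f_herm(K, np.exp)

lhs = np.log(np.trace(f_herm(H + K, np.exp)).real)
gt  = np.log(np.trace(eH @ eK).real)           # Golden-Thompson bound

def bound(Q):
    # lambda_max(H - log Q) after scaling Q so that Tr[Q e^K] = 1.
    Q = Q / np.trace(Q @ eK).real
    return np.linalg.eigvalsh(H - f_herm(Q, np.log)).max()

# Include the Golden-Thompson choice Q = e^H / Tr[e^K e^H] among candidates.
cands = [eH] + [f_herm(rand_herm(n, rng), np.exp) for _ in range(30)]
mid = min(bound(Q) for Q in cands)

assert lhs <= mid + 1e-10 <= gt + 2e-10        # the chain (1.39)
```

The infimum over all feasible Q (not just these candidates) is what appears in (1.39); the random search merely illustrates that feasible choices of Q interpolate between the two classical quantities.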

The inequality on the right in (1.39) arises through the simple choice Q = e^H / Tr[Y e^H] in the variational formula for Ψ_D(H, Y). The Q chosen here is optimal only when H and Y commute. Otherwise, there is a better choice for Q, which we shall identify in Sec. 4, and which will lead to a tighter upper bound. In Sec. 4, we shall also discuss the Legendre transform of the Belavkin–Staszewski relative entropy, and from this we derive further refinements of the Golden–Thompson inequality. Finally, in Theorem 4.3, we prove a sharpened form of (1.10), the complementary Golden–Thompson inequality of Hiai and Petz, incorporating a relative entropy remainder term. Three appendices collect background material for the convenience of the reader.


2. Proof of Theorem 1.1 and Related Inequalities

Proof of Theorem 1.1. Our goal is to prove that for all X, Y, Z ∈ P_n such that Tr[Z] = Tr[X],

Tr[X log(F(Z, Y))] ≤ Tr[X(log X + q log Y)]  (2.1)

whenever F has the properties (1)–(3) listed in the statement of Theorem 1.1. By the homogeneity specified in (1), we may assume without loss of generality that Tr[X] = Tr[Z] = 1. Note that (2.1) is equivalent to

Tr[X(log(F(Z, Y)) − log X − q log Y)] ≤ 0.  (2.2)

By the Peierls–Bogoliubov inequality (A.3), it suffices to prove that

Tr[exp(log(F(Z, Y)) − q log Y)] ≤ 1.  (2.3)

Let 𝒥 denote an arbitrary finite index set with cardinality |𝒥|. Let 𝒰 = {U_1, ..., U_{|𝒥|}} be any set of unitary matrices, each of which commutes with Y. Then for each j ∈ 𝒥, by (2),

Tr[exp(log(F(Z, Y)) − q log Y)] = Tr[U_j exp(log(F(Z, Y)) − q log Y) U_j*]
= Tr[exp(log(F(U_j Z U_j*, Y)) − q log Y)].  (2.4)

Define

Z̃ = (1/|𝒥|) ∑_{j ∈ 𝒥} U_j Z U_j*.

Recall that W ↦ Tr[e^{H + log W}] is concave [30]. Using this, the concavity of Z ↦ F(Z, Y) specified in (1), and the monotonicity of the logarithm, averaging both sides of (2.4) over j yields

Tr[exp(log(F(Z, Y)) − q log Y)] ≤ Tr[exp(log(F(Z̃, Y)) − q log Y)].

Now, making an appropriate choice of 𝒰 [13], Z̃ becomes the "pinching" of Z with respect to Y, i.e. the orthogonal projection in M_n onto the ∗-subalgebra generated by Y and 1. In this case, Z̃ and Y commute, so that by (3),

log(F(Z̃, Y)) − q log Y = log Z̃.

Altogether,

Tr[exp(log(F(Z, Y)) − q log Y)] ≤ Tr[Z̃] = Tr[Z] = 1,

and this proves (2.3).

For the case F(X, Y) = Y^{p/2} X^p Y^{p/2}, we can make a similar use of the Peierls–Bogoliubov inequality, but can avoid the appeal to convexity.


Proof of Theorem 1.4. The inequality we seek to prove is equivalent to

Tr[ X( (1/p) log(Y^{p/2} Z^p Y^{p/2}) − log X − log Y ) ] ≤ 0,  (2.5)

and again, by the Peierls–Bogoliubov inequality, it suffices to prove that

Tr[ exp( (1/p) log(Y^{p/2} Z^p Y^{p/2}) − log Y ) ] ≤ 1.  (2.6)

A refined version of the Golden–Thompson inequality due to Friedland and So [16] says that for all positive A, B, and all r > 0,

Tr[e^{log A + log B}] ≤ Tr[(A^{r/2} B^r A^{r/2})^{1/r}],  (2.7)

and moreover the right-hand side is a strictly increasing function of r, unless A and B commute, in which case it is constant in r. The fact that the right side of (2.7) is increasing in r is a consequence of the Araki–Lieb–Thirring inequality [4], but here we shall need to know that the increase is strict when A and B do not commute; this is the contribution of [16]. Applying (2.7) with r = p,

Tr[ exp( (1/p) log(Y^{p/2} Z^p Y^{p/2}) − log Y ) ] ≤ Tr[(Y^{-p/2}(Y^{p/2} Z^p Y^{p/2})Y^{-p/2})^{1/p}] = Tr[Z] = 1.  (2.8)

By the condition for equality in (2.7), there is equality in (2.8) if and only if (Y^{p/2} Z^p Y^{p/2})^{1/p} and Y commute, and evidently this is the case if and only if Z and Y commute.

In the one-parameter family of inequalities provided by Theorem 1.4, some are stronger than others. It is worth noting that the lower the value of p > 0 in (1.14), the stronger the inequality is, in the following sense.

Proposition 2.1. The validity of (1.14) for p = p_1 and for p = p_2 implies its validity for p = p_1 + p_2.

Proof. Since there is no constraint on Y other than that Y is positive, we may replace Y by any power of Y. Therefore, it is equivalent to prove that for all X, Y, Z ∈ P_n such that Tr[Z] = Tr[X] and all p > 0,

Tr[X log(Y Z^p Y)] ≤ Tr[X(p log X + 2 log Y)].  (2.9)

If (2.9) is valid for p = p_1 and for p = p_2, then it is also valid for p = p_1 + p_2:

Y Z^{p_1+p_2} Y = (Y Z^{p_2/2}) Z^{p_1} (Z^{p_2/2} Y)
= (Y Z^{p_2} Y)^{1/2} U* Z^{p_1} U (Y Z^{p_2} Y)^{1/2}
= (Y Z^{p_2} Y)^{1/2} (U* Z U)^{p_1} (Y Z^{p_2} Y)^{1/2},

where U(Y Z^{p_2} Y)^{1/2} is the polar factorization of Z^{p_2/2} Y. Since Tr[U* Z U] = Tr[Z] = Tr[X], we may apply (2.9) for p_1 to conclude Tr[X log(Y Z^{p_1+p_2} Y)] ≤ p_1 Tr[X log X] + Tr[X log(Y Z^{p_2} Y)]. One more application of (2.9), this time with p = p_2, yields

Tr[X log(Y Z^{p_1+p_2} Y)] ≤ (p_1 + p_2) Tr[X log X] + 2 Tr[X log Y].  (2.10)

By the last line of Theorem 1.4, the inequality (2.10) is strict if Z and Y do not commute and at least one of p_1 or p_2 belongs to (0, 1).
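The polar factorization step in the proof above is a purely algebraic identity, and can be checked numerically; the sketch below (our illustration; NumPy assumed) verifies it for random Y, Z and sample exponents p_1, p_2:

```python
import numpy as np

def f_pos(A, f):
    # Spectral application of the scalar function f to a positive matrix.
    w, V = np.linalg.eigh(A)
    return (V * f(w)) @ V.conj().T

def rand_pos(n, rng):
    G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return G @ G.conj().T + 0.1 * np.eye(n)

rng = np.random.default_rng(8)
n, p1, p2 = 4, 0.6, 1.1
Y, Z = rand_pos(n, rng), rand_pos(n, rng)

M = f_pos(Z, lambda t: t ** (p2 / 2)) @ Y                 # Z^{p2/2} Y
P = f_pos(Y @ f_pos(Z, lambda t: t ** p2) @ Y, np.sqrt)   # (Y Z^{p2} Y)^{1/2}
U = M @ np.linalg.inv(P)                                  # polar factor: M = U P

assert np.allclose(U @ U.conj().T, np.eye(n))             # U is unitary
lhs = Y @ f_pos(Z, lambda t: t ** (p1 + p2)) @ Y
rhs = P @ f_pos(U.conj().T @ Z @ U, lambda t: t ** p1) @ P
assert np.allclose(lhs, rhs)   # Y Z^{p1+p2} Y = P (U* Z U)^{p1} P
```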

Our next goal is to prove Theorem 1.6. As indicated in the Introduction, we will show that Theorem 1.6 is a consequence of Theorem 1.4. The determination of the cases of equality in Theorem 1.4 is essential for the proof of the key lemma that follows.

Lemma 2.2. Fix X, Y, Z ∈ P_n such that Tr[Z] = Tr[X], and fix p > 0. Then there is some ϵ > 0 so that (1.16) is valid for all s ∈ [0, ϵ], and such that, when Y and Z do not commute, (1.16) is valid as a strict inequality for all s ∈ (0, ϵ).

Proof. We may suppose, without loss of generality, that Y and Z do not commute since, if they do commute, the inequality is trivially true, just as in Remark 1.5. We compute

d/ds Tr[X log(Y^p #_s Z^p)] |_{s=0} = Tr[ X ∫_0^∞ Y^{p/2}(t + Y^p)^{-1} log(Y^{-p/2} Z^p Y^{-p/2}) (t + Y^p)^{-1} Y^{p/2} dt ]
= Tr[W log(Y^{-p/2} Z^p Y^{-p/2})],

where

W := ∫_0^∞ Y^{p/2}(t + Y^p)^{-1} X (t + Y^p)^{-1} Y^{p/2} dt.

Evidently, Tr[W] = Tr[X] = Tr[Z]. Therefore, by Theorem 1.4 (with X replaced by W and Y replaced by Y^{-1}),

Tr[W log(Y^{-p/2} Z^p Y^{-p/2})] ≤ Tr[W(log W^p − log Y^p)].

Now note that

Tr[W log Y^p] = Tr[ X ∫_0^∞ Y^{p/2}(t + Y^p)^{-1} (log Y^p) (t + Y^p)^{-1} Y^{p/2} dt ] = Tr[X log Y^p].

Moreover, by definition, W = Φ(X), where Φ is a completely positive, trace and identity preserving map. By Lemma B.2, this implies that Tr[W log W^p] ≤ Tr[X log X^p]. Consequently,

d/ds Tr[ X( log(Y^p #_s Z^p) − s log X^p − (1 − s) log Y^p ) ] |_{s=0} ≤ Tr[W log W^p] − Tr[X log X^p].


Therefore, unless Y and Z commute, the derivative on the left is strictly negative, and hence, for some ϵ > 0, (1.16) is valid as a strict inequality for all s ∈ (0, ϵ). If Y and Z commute, (1.16) is trivially true for all p > 0 and all s ∈ [0, 1].

Proof of Theorem 1.6. Suppose that (1.16) is valid for s = s_1 and s = s_2. Since (by Eqs. (C.7) and (C.8))

(Y^p #_{s_1} Z^p) #_{s_2} Z^p = Y^p #_{s_1+s_2−s_1 s_2} Z^p,

we have

Tr[X log(Y^p #_{s_1+s_2−s_1 s_2} Z^p)] = Tr[X log((Y^p #_{s_1} Z^p) #_{s_2} Z^p)]
≤ Tr[X(s_2 log X^p + (1 − s_2) log(Y^p #_{s_1} Z^p))]
≤ Tr[X((s_1 + s_2 − s_1 s_2) log X^p + (1 − s_2)(1 − s_1) log Y^p)].

Therefore, whenever (1.16) is valid for s = s_1 and s = s_2, it is valid for s = s_1 + s_2 − s_1 s_2.

By Lemma 2.2, there is some ϵ > 0 so that (1.16) is valid as a strict inequality for all s ∈ (0, ϵ). Define an increasing sequence {t_n}_{n∈ℕ} recursively by t_1 = ϵ and t_{n+1} = 2t_n − t_n^2 for n ≥ 1. Then, by what we have just proved, (1.16) is valid as a strict inequality for all s ∈ (0, t_n). Since lim_{n→∞} t_n = 1, the proof is complete.

The next goal is to show that the inequality on the right in (1.6) is a consequence of Theorem 1.6 by a simple differentiation argument. This simple proof is the new feature; the statement concerning the cases of equality was proved in [20].

Theorem 2.3. For all X, Y ∈ P_n and all p > 0,

Tr[X(log X^p + log Y^p)] ≤ Tr[X log(X^{p/2} Y^p X^{p/2})],  (2.11)

and this inequality is strict unless X and Y commute.

Proof. Specializing to the case $Z = X$ in Theorem 1.6,
\[
\operatorname{Tr}[X\log(Y^r\#_sX^r)] \le \operatorname{Tr}[X(s\log X^r + (1-s)\log Y^r)]. \tag{2.12}
\]
At $s = 1$, both sides of (2.12) equal $\operatorname{Tr}[X\log X^r]$. Therefore, we may differentiate at $s = 1$ to obtain a new inequality. Rearranging terms in (2.12) yields

\[
\frac{\operatorname{Tr}[X\log X^r] - \operatorname{Tr}[X\log(Y^r\#_sX^r)]}{1-s} \ge \operatorname{Tr}[X(\log X^r - \log Y^r)]. \tag{2.13}
\]
Taking the limit $s \uparrow 1$ on the left side of (2.13) yields $\frac{d}{ds}\operatorname{Tr}[X\log(Y^r\#_sX^r)]\big|_{s=1}$. From the integral representation for the logarithm, namely $\log A = \int_0^\infty\big(\frac{1}{\lambda} - \frac{1}{\lambda+A}\big)$


$d\lambda$, it follows that for all $A \in P_n$ and $H \in H_n$,
\[
\frac{d}{du}\log(A+uH)\Big|_{u=0} = \int_0^\infty \frac{1}{\lambda+A}\,H\,\frac{1}{\lambda+A}\,d\lambda.
\]
Since (see (C.8)) $Y^r\#_sX^r = X^r\#_{1-s}Y^r = X^{r/2}(X^{-r/2}Y^rX^{-r/2})^{1-s}X^{r/2}$,

\[
\frac{d}{ds}\,Y^r\#_sX^r\Big|_{s=1} = -X^{r/2}\log(X^{-r/2}Y^rX^{-r/2})X^{r/2} = X^{r/2}\log(X^{r/2}Y^{-r}X^{r/2})X^{r/2}.
\]
Altogether, by the cyclicity of the trace,

\[
\frac{d}{ds}\operatorname{Tr}[X\log(Y^r\#_sX^r)]\Big|_{s=1} = \operatorname{Tr}\Big[\int_0^\infty \frac{X^{1+r}}{(\lambda+X^r)^2}\,d\lambda\;\log(X^{r/2}Y^{-r}X^{r/2})\Big] = \operatorname{Tr}[X\log(X^{r/2}Y^{-r}X^{r/2})].
\]
Combining this with (2.13) gives $\operatorname{Tr}[X\log(X^{r/2}Y^{-r}X^{r/2})] \ge \operatorname{Tr}[X(\log X^r - \log Y^r)]$.

Replacing $Y$ by $Y^{-1}$ yields (2.11). This completes the proof of the inequality itself, and it remains to deal with the cases of equality. Fix $r > 0$ and $X$ and $Y$ that do not commute. By Theorem 1.3 applied with $Z = X$ and $s = 1/2$, there is some $\delta > 0$ such that

\[
\operatorname{Tr}[X\log(Y\#_{1/2}X)] \le \operatorname{Tr}\Big[X\Big(\tfrac12\log X + \tfrac12\log Y\Big)\Big] - \frac{\delta}{2}. \tag{2.14}
\]

Now use the fact that $Y\#_{3/4}X = (Y\#_{1/2}X)\#_{1/2}X$, and apply (2.12) and then (2.14):

\begin{align*}
\operatorname{Tr}[X\log(Y\#_{3/4}X)] &= \operatorname{Tr}[X\log((Y\#_{1/2}X)\#_{1/2}X)]\\
&\le \operatorname{Tr}\big[X\big(\tfrac12\log X + \tfrac12\log(Y\#_{1/2}X)\big)\big]\\
&= \tfrac12\operatorname{Tr}[X\log X] + \tfrac12\operatorname{Tr}[X\log(Y\#_{1/2}X)]\\
&\le \tfrac12\operatorname{Tr}[X\log X] + \tfrac12\operatorname{Tr}\big[X\big(\tfrac12\log X + \tfrac12\log Y\big)\big] - \frac{\delta}{4}\\
&= \operatorname{Tr}\big[X\big(\tfrac34\log X + \tfrac14\log Y\big)\big] - \frac{\delta}{4}.
\end{align*}
We may only apply the strict inequality in the last step since $\delta$ depends on $X$ and $Y$, and

strict inequality need not hold if $Y$ is replaced by $Y\#_{1/2}X$. However, in this case, we may apply (2.12). Further iteration of this argument evidently yields the inequalities

\[
\operatorname{Tr}[X\log(Y\#_{1-t_k}X)] \le \operatorname{Tr}[X((1-t_k)\log X + t_k\log Y)] - t_k\delta, \qquad t_k = 2^{-k},
\]


for each $k \in \mathbb{N}$. We may now improve (2.13) to
\[
\frac{\operatorname{Tr}[X\log X^r] - \operatorname{Tr}[X\log(Y^r\#_sX^r)]}{1-s} \ge \operatorname{Tr}[X(\log X^r - \log Y^r)] + \delta \tag{2.15}
\]
for $s = 1 - 2^{-k}$, $k \in \mathbb{N}$. By the calculations above, taking $s \to 1$ along this sequence yields the desired strict inequality.

Further inequalities, which we discuss now, involve an extension of the notion of geometric means. This extension is introduced here and explained in more detail in Appendix C.

Recall that for $t \in [0,1]$ and $X, Y \in P_n$, $X\#_tY := X^{1/2}(X^{-1/2}YX^{-1/2})^tX^{1/2}$. As noted earlier, this formula makes sense for all $t \in \mathbb{R}$, and it has a natural geometric meaning. The map $t \mapsto X\#_tY$, defined for $t \in \mathbb{R}$, is a constant speed geodesic running between $X$ and $Y$ for a particular Riemannian metric on the space of positive matrices.

Definition 2.4. For $X, Y \in P_n$ and for $t \in \mathbb{R}$,
\[
X\#_tY := X^{1/2}(X^{-1/2}YX^{-1/2})^tX^{1/2}. \tag{2.16}
\]
The geometric picture leads to an easy proof of the following identity: Let $X, Y \in P_n$, and $t_0, t_1 \in \mathbb{R}$. Then for all $t \in \mathbb{R}$,

\[
X\#_{(1-t)t_0+tt_1}Y = (X\#_{t_0}Y)\#_t(X\#_{t_1}Y). \tag{2.17}
\]
See Theorem C.4 for the proof. As a special case, take $t_1 = 0$ and $t_0 = 1$. Then, for all $t$,

\[
X\#_{1-t}Y = Y\#_tX. \tag{2.18}
\]
With this definition of $X\#_tY$ for $t \in \mathbb{R}$, we have the following.
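Definition 2.4 and the identities (2.17) and (2.18) can be checked numerically, including for parameters outside $[0,1]$. A small sketch, not part of the paper, assuming NumPy; `geom` and `funm` are our helper names:

```python
import numpy as np

def funm(A, f):
    # f(A) for a symmetric matrix A, via its spectral decomposition
    w, U = np.linalg.eigh(A)
    return (U * f(w)) @ U.T

def geom(X, Y, t):
    # X #_t Y = X^{1/2} (X^{-1/2} Y X^{-1/2})^t X^{1/2}, any real t
    Xh, Xmh = funm(X, np.sqrt), funm(X, lambda w: w**-0.5)
    return Xh @ funm(Xmh @ Y @ Xmh, lambda w: w**t) @ Xh

rng = np.random.default_rng(0)
M1, M2 = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))
X = M1 @ M1.T + 3 * np.eye(3)
Y = M2 @ M2.T + 3 * np.eye(3)

t, t0, t1 = -0.7, 0.2, 1.5
# (2.18): X #_{1-t} Y = Y #_t X
err_2_18 = np.linalg.norm(geom(X, Y, 1 - t) - geom(Y, X, t))
# (2.17): X #_{(1-t)t0 + t t1} Y = (X #_{t0} Y) #_t (X #_{t1} Y)
G0, G1 = geom(X, Y, t0), geom(X, Y, t1)
err_2_17 = np.linalg.norm(geom(X, Y, (1 - t) * t0 + t * t1) - geom(G0, G1, t))
assert err_2_18 < 1e-7 and err_2_17 < 1e-7
```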

Theorem 2.5. For all $X, Y, Z \in P_n$ such that $\operatorname{Tr}[Z] = \operatorname{Tr}[X]$,
\[
\operatorname{Tr}[X\log(Z^r\#_tY^r)] \ge \operatorname{Tr}[X((1-t)\log X^r + t\log Y^r)] \tag{2.19}
\]
is valid for all $t \in [1,\infty)$ and $r > 0$. If $Y$ and $Z$ do not commute, the inequality is strict for all $t > 1$.

The inequalities in Theorems 2.5 and 1.6 are equivalent. The following simple

identity is the key to this observation.

Lemma 2.6. For $B, C \in P_n$ and $s \ne 1$, let $A = B\#_sC$. Then
\[
B = C\#_{1/(1-s)}A. \tag{2.20}
\]

Proof. Note that by (2.16) and (2.18), $A = B\#_sC$ is equivalent to $A = C^{1/2}(C^{-1/2}BC^{-1/2})^{1-s}C^{1/2}$, so that $C^{-1/2}AC^{-1/2} = (C^{-1/2}BC^{-1/2})^{1-s}$. Raising both sides to the power $1/(1-s)$ and multiplying on the left and right by $C^{1/2}$ yields (2.20).
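Lemma 2.6 says that $\#_s$ can be inverted by moving further along the same geodesic. A quick numerical check, not part of the paper, assuming NumPy; the helper names are ours:

```python
import numpy as np

def funm(A, f):
    # f(A) for a symmetric matrix A, via its spectral decomposition
    w, U = np.linalg.eigh(A)
    return (U * f(w)) @ U.T

def geom(X, Y, t):
    # extended geometric mean X #_t Y
    Xh, Xmh = funm(X, np.sqrt), funm(X, lambda w: w**-0.5)
    return Xh @ funm(Xmh @ Y @ Xmh, lambda w: w**t) @ Xh

rng = np.random.default_rng(1)
MB, MC = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))
B = MB @ MB.T + 3 * np.eye(3)
C = MC @ MC.T + 3 * np.eye(3)

s = 0.3
A = geom(B, C, s)                    # A = B #_s C
B_rec = geom(C, A, 1.0 / (1.0 - s))  # Lemma 2.6: B = C #_{1/(1-s)} A
err = np.linalg.norm(B - B_rec)
assert err < 1e-8
```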

Lemma 2.7. Let $X, Y, Z \in P_n$ be such that $\operatorname{Tr}[Z] = \operatorname{Tr}[X]$. Let $r > 0$. Then (1.16) is valid for $s \in (0,1)$ if and only if (2.19) is valid for $t = 1/(1-s)$.


Proof. Define $W \in P_n$ by $W^r := Y^r\#_sZ^r$. The identity (2.20) then says that $Y^r = Z^r\#_{1/(1-s)}W^r$. Therefore,
\begin{align*}
&\operatorname{Tr}[X(\log(Y^r\#_sZ^r) - s\log X^r - (1-s)\log Y^r)]\\
&\quad = \operatorname{Tr}[X(\log W^r - s\log X^r - (1-s)\log(Z^r\#_{1/(1-s)}W^r))]. \tag{2.21}
\end{align*}
Since $s \in (0,1)$, the right side of (2.21) is nonpositive if and only if

\[
\operatorname{Tr}[X\log(Z^r\#_{1/(1-s)}W^r)] \ge \operatorname{Tr}\Big[X\Big(-\frac{s}{1-s}\log X^r + \frac{1}{1-s}\log W^r\Big)\Big].
\]
With this lemma, we can now prove Theorem 2.5.

Proof of Theorem 2.5. Lemma 2.7 says that Theorem 2.5 is equivalent to Theorem 1.6.

There is a complement to Theorem 2.5 in the case $Z = X$ that is equivalent to a result of Hiai and Petz, who formulate it differently and do not discuss extended geometric means. The statement concerning cases of equality is new.

Theorem 2.8. For all $X, Y \in P_n$,
\[
\operatorname{Tr}[X\log(X^r\#_tY^r)] \ge \operatorname{Tr}[X((1-t)\log X^r + t\log Y^r)] \tag{2.22}
\]
is valid for all $t \in (-\infty, 0]$ and $r > 0$. If $Y$ and $X$ do not commute, the inequality is strict for all $t < 0$.
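A numerical spot-check of (2.22) for a negative $t$ (with $r = 1$), not part of the paper, assuming NumPy; the helper names are ours:

```python
import numpy as np

def funm(A, f):
    # f(A) for a symmetric matrix A, via its spectral decomposition
    w, U = np.linalg.eigh(A)
    return (U * f(w)) @ U.T

def geom(X, Y, t):
    # extended geometric mean X #_t Y
    Xh, Xmh = funm(X, np.sqrt), funm(X, lambda w: w**-0.5)
    return Xh @ funm(Xmh @ Y @ Xmh, lambda w: w**t) @ Xh

rng = np.random.default_rng(8)
M1, M2 = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))
X = M1 @ M1.T + 3 * np.eye(3)
Y = M2 @ M2.T + 3 * np.eye(3)

t = -0.5   # any t <= 0; here r = 1
lhs = np.trace(X @ funm(geom(X, Y, t), np.log))
rhs = np.trace(X @ ((1 - t) * funm(X, np.log) + t * funm(Y, np.log)))
assert lhs >= rhs - 1e-8   # (2.22)
```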

Proof. By Definition 2.4,
\[
X^r\#_tY^r = X^{r/2}(X^{r/2}Y^{-r}X^{r/2})^{|t|}X^{r/2} = X^{r/2}W^rX^{r/2}, \quad\text{where } W := (X^{r/2}Y^{-r}X^{r/2})^{|t|/r}.
\]

Therefore, by (2.11),

\[
\operatorname{Tr}[X\log(X^r\#_tY^r)] = \operatorname{Tr}[X\log(X^{r/2}W^rX^{r/2})] \ge r\operatorname{Tr}[X\log X] + r\operatorname{Tr}[X\log W].
\]
By the definition of $W$ and (2.11) once more,

\[
\operatorname{Tr}[X\log W] = \frac{|t|}{r}\operatorname{Tr}[X\log(X^{r/2}Y^{-r}X^{r/2})] \ge |t|\operatorname{Tr}[X(\log X - \log Y)].
\]
By combining the inequalities, we obtain (2.22).

The proof given by Hiai and Petz is quite different. It uses a tensorization argument.


3. Quantum Relative Entropy Inequalities

Theorems 1.3, 1.4 and 1.6 show that the three functions

\[
X, Y \mapsto \sup_{Z\in P_n,\ \operatorname{Tr}[Z]=\operatorname{Tr}[X]}\{\operatorname{Tr}[X\log(Y^{-1/2}ZY^{-1/2})]\} + \operatorname{Tr}[Y] - \operatorname{Tr}[X], \tag{3.1}
\]
\[
X, Y \mapsto \sup_{Z\in P_n,\ \operatorname{Tr}[Z]=\operatorname{Tr}[X]}\{\operatorname{Tr}[X\log((Y^{-1}\#_{1/2}Z)^2)]\} + \operatorname{Tr}[Y] - \operatorname{Tr}[X] \tag{3.2}
\]

and

\[
X, Y \mapsto \sup_{Z\in P_n,\ \operatorname{Tr}[Z]=\operatorname{Tr}[X]}\operatorname{Tr}\Big[X\log\Big(\int_0^\infty \frac{1}{\lambda+Y}\,Z\,\frac{1}{\lambda+Y}\,d\lambda\Big)\Big] + \operatorname{Tr}[Y] - \operatorname{Tr}[X] \tag{3.3}
\]

are all bounded above by the Umegaki relative entropy $X, Y \mapsto \operatorname{Tr}[X(\log X - \log Y)] + \operatorname{Tr}[Y] - \operatorname{Tr}[X]$. The next proposition shows that these functions are actually one and the same.

Proposition 3.1. The three functions defined in (3.1)–(3.3) are all equal to the Donald relative entropy $D_D(X\|Y)$. Consequently, for all $X, Y \in P_n$,

\[
D_D(X\|Y) \le D(X\|Y). \tag{3.4}
\]

Proof of Proposition 3.1. The first thing to note is that the relaxed constraint $\operatorname{Tr}[YQ] \le \operatorname{Tr}[X]$ imposes the same restriction in (1.26) as does the hard constraint $\operatorname{Tr}[YQ] = \operatorname{Tr}[X]$ since, if $\operatorname{Tr}[YQ] < \operatorname{Tr}[X]$, we may replace $Q$ by $(\operatorname{Tr}[X]/\operatorname{Tr}[YQ])Q$

so that the hard constraint is satisfied. Thus we may replace the relaxed constraint in (1.26) by the hard constraint without affecting the function $D_D(X\|Y)$. This will be convenient in the lemma, though elsewhere the relaxed constraint will be essential.

Next, for each of (3.1)–(3.3), we make a change of variables. In the first case, define $\Phi: P_n \to P_n$ by $\Phi(Z) = Y^{-1/2}ZY^{-1/2} =: Q$. Then $\Phi$ is invertible with $\Phi^{-1}(Q) = Y^{1/2}QY^{1/2}$. Under this change of variables, the constraint $\operatorname{Tr}[X] = \operatorname{Tr}[Z]$ becomes $\operatorname{Tr}[X] = \operatorname{Tr}[Y^{1/2}QY^{1/2}] = \operatorname{Tr}[YQ]$. Thus (3.1) gives us another expression for the Donald relative entropy.

For the function in (3.2), we make a similar change of variables. Define $\Phi: P_n \to P_n$ by $\Phi(Z) = (Z\#_{1/2}Y^{-1})^2 =: Q$. This map is invertible: it follows by direct computation from the definition (1.2) that for $Q^{1/2} := Z\#_{1/2}Y^{-1}$, $Z = Q^{1/2}YQ^{1/2}$, so that $\Phi^{-1}(Q) = Q^{1/2}YQ^{1/2}$. (This has an interesting and useful geometric interpretation that is discussed in Appendix C.) Under this change of variables, the constraint $\operatorname{Tr}[X] = \operatorname{Tr}[Z]$ becomes $\operatorname{Tr}[X] = \operatorname{Tr}[Q^{1/2}YQ^{1/2}] = \operatorname{Tr}[YQ]$. Thus (3.2) gives another expression for the Donald relative entropy.


Finally, for the function in (3.3), we make a similar change of variables. Define $\Phi: P_n \to P_n$ by
\[
\Phi(Z) = \int_0^\infty \frac{1}{\lambda+Y}\,Z\,\frac{1}{\lambda+Y}\,d\lambda =: Q.
\]
This map is invertible: $\Phi^{-1}(Q) = \int_0^1 Y^{1-s}QY^s\,ds$. Under this change of variables, the constraint $\operatorname{Tr}[X] = \operatorname{Tr}[Z]$ becomes $\operatorname{Tr}[X] = \operatorname{Tr}[\int_0^1 Y^{1-s}QY^s\,ds] = \operatorname{Tr}[YQ]$.

With the Donald relative entropy having taken center stage, we now bend our efforts to establishing some of its properties.

Lemma 3.2. Fix $X, Y \in P_n$, and define $\mathcal{K}_{X,Y} := \{Q \in P_n : \operatorname{Tr}[QY] \le \operatorname{Tr}[X]\}$. There exists a unique $Q_{X,Y} \in \mathcal{K}_{X,Y}$ such that
\[
\operatorname{Tr}[X\log Q_{X,Y}] > \operatorname{Tr}[X\log Q]
\]
for all other $Q \in \mathcal{K}_{X,Y}$. The equation
\[
\int_0^\infty \frac{1}{t+Q}\,X\,\frac{1}{t+Q}\,dt = Y \tag{3.5}
\]
has a unique solution in $P_n$, and this unique solution is the unique maximizer $Q_{X,Y}$.

Proof. Note that $\mathcal{K}_{X,Y}$ is a compact, convex set. Since $Q \mapsto \log Q$ is strictly concave, $Q \mapsto \operatorname{Tr}[X\log Q]$ is strictly concave on $\mathcal{K}_{X,Y}$, and since it has the value $-\infty$ on $\partial P_n \cap \mathcal{K}_{X,Y}$, there is a unique maximizer $Q_{X,Y}$ that lies in $P_n \cap \mathcal{K}_{X,Y}$.

Let $H \in H_n$ be such that $\operatorname{Tr}[HY] = 0$. For all $t$ in a neighborhood of $0$, $Q_{X,Y} + tH \in P_n \cap \mathcal{K}_{X,Y}$. Differentiating in $t$ at $t = 0$ yields
\[
0 = \operatorname{Tr}\Big[X\int_0^\infty \frac{1}{t+Q_{X,Y}}\,H\,\frac{1}{t+Q_{X,Y}}\,dt\Big] = \operatorname{Tr}\Big[H\int_0^\infty \frac{1}{t+Q_{X,Y}}\,X\,\frac{1}{t+Q_{X,Y}}\,dt\Big],
\]
and hence
\[
\int_0^\infty \frac{1}{t+Q_{X,Y}}\,X\,\frac{1}{t+Q_{X,Y}}\,dt = \lambda Y
\]
for some $\lambda \in \mathbb{R}$. Multiplying through on both sides by $Q_{X,Y}^{1/2}$ and taking the trace yields $\lambda = 1$, which shows that $Q_{X,Y}$ solves (3.5). Conversely, any solution of (3.5) yields a critical point of our strictly concave functional, and hence must be the unique maximizer.


Remark 3.3. There is one special case for which we can give a formula for the solution $Q_{X,Y}$ to (3.5): when $X$ and $Y$ commute, $Q_{X,Y} = XY^{-1}$.
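In the commuting case, Lemma 3.2 and Remark 3.3 can be confirmed numerically: with $Q_{X,Y} = XY^{-1}$, no other feasible $Q$ gives a larger value of $\operatorname{Tr}[X\log Q]$. A sketch, not part of the paper, assuming NumPy; the helper names are ours:

```python
import numpy as np

def funm(A, f):
    # f(A) for a symmetric matrix A, via its spectral decomposition
    w, U = np.linalg.eigh(A)
    return (U * f(w)) @ U.T

rng = np.random.default_rng(2)
x = rng.uniform(0.5, 2.0, 4)
y = rng.uniform(0.5, 2.0, 4)
X, Y = np.diag(x), np.diag(y)   # a commuting pair
Q_opt = np.diag(x / y)          # Remark 3.3: Q_{X,Y} = X Y^{-1}

best = np.trace(X @ funm(Q_opt, np.log))
vals = []
for seed in range(30):
    M = np.random.default_rng(10 + seed).standard_normal((4, 4))
    Q = M @ M.T + 4 * np.eye(4)
    Q = Q * (np.trace(X) / np.trace(Q @ Y))   # rescale so Tr[QY] = Tr[X]
    vals.append(np.trace(X @ funm(Q, np.log)))
worst = max(vals)
assert worst <= best + 1e-8   # Lemma 3.2: Q_{X,Y} maximizes Tr[X log Q]
```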

Lemma 3.4. For all $X, Y \in P_n$ and all $\lambda > 0$,
\[
D_D(\lambda X\|\lambda Y) = \lambda D_D(X\|Y), \tag{3.6}
\]
and
\[
D_D(\lambda X\|Y) = \lambda D_D(X\|Y) + \lambda\log\lambda\,\operatorname{Tr}[X] + (1-\lambda)\operatorname{Tr}[Y]. \tag{3.7}
\]

Proof. By (3.5), the maximizer $Q_{X,Y}$ in Lemma 3.2 satisfies the scaling relations
\[
Q_{\lambda X,Y} = \lambda Q_{X,Y} \quad\text{and}\quad Q_{X,\lambda Y} = \lambda^{-1}Q_{X,Y}, \tag{3.8}
\]
and (3.6) follows immediately. Next, by (3.8) again,
\[
D_D(\lambda X\|Y) = \lambda(\operatorname{Tr}[X\log Q_{X,Y}] + \operatorname{Tr}[Y] - \operatorname{Tr}[X]) + \lambda\log\lambda\,\operatorname{Tr}[X] + (1-\lambda)\operatorname{Tr}[Y],
\]
which proves (3.7).

Lemma 3.5. If $X$ and $Y$ commute, $D_D(X\|Y) = D(X\|Y)$.

Proof. Let $\{U_1,\dots,U_N\}$ be any set of unitary matrices that commute with $X$ and $Y$. Then for each $j = 1,\dots,N$, $\operatorname{Tr}[Y(U_j^*QU_j)] = \operatorname{Tr}[YQ]$. Define
\[
\widetilde{Q} = \frac{1}{N}\sum_{j=1}^{N}U_j^*QU_j.
\]
For an appropriate choice of the set $\{U_1,\dots,U_N\}$, $\widetilde{Q}$ is the orthogonal projection of $Q$, with respect to the Hilbert–Schmidt inner product, onto the abelian subalgebra of $M_n$ generated by $X$, $Y$ and $1$ [13]. By the concavity of the logarithm,
\[
\operatorname{Tr}[X\log\widetilde{Q}] \ge \frac{1}{N}\sum_{j=1}^{N}\operatorname{Tr}[X\log(U_j^*QU_j)] = \frac{1}{N}\sum_{j=1}^{N}\operatorname{Tr}[U_jXU_j^*\log Q] = \operatorname{Tr}[X\log Q].
\]
Therefore, in taking the supremum, we need only consider operators $Q$ that commute with both $X$ and $Y$. The claim now follows by Remark 3.3.

Remark 3.6. Another simple proof of this can be given using Donald’s original formula (1.19).

We have now proved that $D_D$ has properties (2) and (3) in the Definition 1.8 of relative entropy, and have already observed that it inherits joint convexity from the Umegaki relative entropy through its original definition by Donald. We now compute the partial Legendre transform of $D_D(X\|Y)$. In doing so, we arrive at a direct proof of the joint convexity of $D_D(X\|Y)$, independent of the joint convexity of the Umegaki relative entropy. We first prove Lemma 1.11.


Proof of Lemma 1.11. For $X \in P_n$, define $a = \operatorname{Tr}[X]$ and $W := a^{-1}X$, so that $W$ is a density matrix. Then
\[
\operatorname{Tr}[XH] - R(X\|Y) = a\operatorname{Tr}[WH] - aR(W\|Y) - a\log a - (1-a)\operatorname{Tr}[Y].
\]
Therefore,
\begin{align*}
\Psi_R(H,Y) &= \sup_{a>0}\Big\{a\,\sup_{W\in P_n}\{\operatorname{Tr}[WH] - R(W\|Y) : \operatorname{Tr}[W] = 1\} + a\operatorname{Tr}[Y] - a\log a\Big\} - \operatorname{Tr}[Y]\\
&= \sup_{a>0}\{a(\Phi_R(H,Y) + \operatorname{Tr}[Y]) - a\log a\} - \operatorname{Tr}[Y].
\end{align*}
Now use the fact that for all $a > 0$ and all $b \in \mathbb{R}$, $a\log a + e^{b-1} \ge ab$, with equality if and only if $b = 1 + \log a$, to conclude that (1.34) is valid.

The function $D_D$ evidently satisfies the conditions of this lemma. Our immediate goal is to compute $\Phi_R(H,Y)$ for this choice of $R$, and to show its concavity as a function of $Y$. Recall the definition
\[
\Phi_D(H,Y) := \sup_{X>0,\ \operatorname{Tr}[X]=1}\{\operatorname{Tr}[XH] - D_D(X\|Y)\}. \tag{3.9}
\]
We wish to evaluate the supremum as explicitly as possible.

Lemma 3.7. For $H \in H_n$ and $Y \in P_n$,
\[
\Phi_D(H,Y) = 1 - \operatorname{Tr}[Y] + \inf\{\lambda_{\max}(H - \log Q) : Q \in P_n,\ \operatorname{Tr}[QY] \le 1\}, \tag{3.10}
\]
where for any self-adjoint operator $K$, $\lambda_{\max}(K)$ is the largest eigenvalue of $K$.

Our proof of (3.10) makes use of a Minimax Theorem; such theorems give conditions under which a function $f(x,y)$ on $A \times B$ satisfies
\[
\sup_{x\in A}\inf_{y\in B}f(x,y) = \inf_{y\in B}\sup_{x\in A}f(x,y). \tag{3.11}
\]
The original Minimax Theorem was proved by von Neumann [44]. While most of his paper deals with the case in which $f$ is a bilinear function on $\mathbb{R}^m \times \mathbb{R}^n$ for some $m$ and $n$, and $A$ and $B$ are simplexes, he also proves [44, p. 309] a more general result for functions on $\mathbb{R} \times \mathbb{R}$ that are quasi-concave in $x$ and quasi-convex in $y$. According to Kuhn and Tucker [27, p. 113], a multidimensional version of this is implicit in the paper. von Neumann's work inspired a host of researchers to undertake extensions and generalizations; [15] contains a useful survey. A theorem of Peck and Dulmage [34] serves our purpose. See [39] for a more general extension.

Theorem 3.8 (Peck and Dulmage). Let $\mathcal{X}$ be a topological vector space, and let $\mathcal{Y}$ be a vector space. Let $A \subset \mathcal{X}$ be nonempty, compact and convex, and let $B \subset \mathcal{Y}$ be nonempty and convex. Let $f$ be a real valued function on $A \times B$ such that for each fixed $y \in B$, $x \mapsto f(x,y)$ is concave and upper semicontinuous, and for each fixed $x \in A$, $y \mapsto f(x,y)$ is convex. Then (3.11) is valid.


Proof of Lemma 3.7. Define $\mathcal{X} = \mathcal{Y} = M_n$, $A = \{W \in P_n : \operatorname{Tr}[W] = 1\}$ and $B := \{Q \in P_n : \operatorname{Tr}[QY] \le 1\}$. For $H \in H_n$, define
\[
f(X,Q) := \operatorname{Tr}[X(H - \log Q)].
\]
Then the hypotheses of Theorem 3.8 are satisfied, and hence
\[
\sup_{X\in A}\inf_{Q\in B}f(X,Q) = \inf_{Q\in B}\sup_{X\in A}f(X,Q). \tag{3.12}
\]
Using the definition (3.9) and the identity (3.12),
\begin{align*}
\Phi_D(H,Y) + \operatorname{Tr}[Y] - 1 &= \sup_{X>0,\,\operatorname{Tr}[X]=1}\Big\{\operatorname{Tr}[XH] - \sup_{Q>0,\,\operatorname{Tr}[QY]\le 1}\{\operatorname{Tr}[X\log Q]\}\Big\}\\
&= \sup_{X>0,\,\operatorname{Tr}[X]=1}\ \inf_{Q>0,\,\operatorname{Tr}[QY]\le 1}\{\operatorname{Tr}[X(H-\log Q)]\}\\
&= \inf_{Q>0,\,\operatorname{Tr}[QY]\le 1}\ \sup_{X>0,\,\operatorname{Tr}[X]=1}\{\operatorname{Tr}[X(H-\log Q)]\}\\
&= \inf_{Q>0,\,\operatorname{Tr}[QY]\le 1}\lambda_{\max}(H - \log Q). \tag{3.13}
\end{align*}

Lemma 3.9. For each $H \in H_n$, $Y \mapsto \Phi_D(H,Y)$ is concave.

Proof. Fix $Y > 0$ and let $A \in H_n$ be such that $Y_\pm := Y \pm A$ are both positive. Let $Q$ be optimal in the variational formula (3.10) for $\Phi_D(H,Y)$. We claim that there exists $c \in \mathbb{R}$ so that
\[
\operatorname{Tr}[Y_+Qe^c] \le 1 \quad\text{and}\quad \operatorname{Tr}[Y_-Qe^{-c}] \le 1. \tag{3.14}
\]
Suppose for the moment that this is true. Then
\[
\lambda_{\max}(H - \log Q) = \tfrac12\lambda_{\max}(H - \log(Qe^c)) + \tfrac12\lambda_{\max}(H - \log(Qe^{-c})).
\]
By (3.14),
\[
\Phi_D(H,Y) \ge \tfrac12\Phi_D(H,Y_+) + \tfrac12\Phi_D(H,Y_-),
\]
which proves midpoint concavity. The general concavity statement follows by continuity.

To complete this part of the proof, it remains to show that we can choose $c \in \mathbb{R}$ so that (3.14) is satisfied. Define $a := \operatorname{Tr}[QA]$. Since $Y \pm A > 0$, $\operatorname{Tr}[Q(Y \pm A)] > 0$, which is the same as $1 \pm a > 0$; that is, $|a| < 1$. We then compute
\[
\operatorname{Tr}[Y_+Qe^c] = e^c\operatorname{Tr}[YQ + AQ] = e^c(1+a),
\]
and likewise $\operatorname{Tr}[Y_-Qe^{-c}] = e^{-c}(1-a)$. We wish to choose $c$ so that
\[
e^c(1+a) \le 1 \quad\text{and}\quad e^{-c}(1-a) \le 1.
\]


This is the same as
\[
\log(1-a) \le c \le -\log(1+a).
\]
Since $-\log(1+a) - \log(1-a) = -\log(1-a^2) > 0$, the interval $[\log(1-a), -\log(1+a)]$ is nonempty, and we may choose any $c$ in this interval.

We may now improve on Lemma 3.9: not only is $\Phi_D(H,Y)$ concave in $Y$; its exponential is also concave in $Y$.

Theorem 3.10. For all $H \in H_n$, the function
\[
Y \mapsto \exp\Big(\inf_{Q>0,\ \operatorname{Tr}[QY]\le 1}\lambda_{\max}(H-\log Q)\Big) \tag{3.15}
\]
is concave on $P_n$. Moreover, for all $H, K \in H_n$,
\[
\log(\operatorname{Tr}[e^{H+K}]) \le \inf_{Q>0,\ \operatorname{Tr}[Qe^K]\le 1}\lambda_{\max}(H-\log Q) \le \log(\operatorname{Tr}[e^He^K]). \tag{3.16}
\]
These inequalities improve upon the Golden–Thompson inequality.

Proof. Let $\Psi_D(H,Y)$ be the partial Legendre transform of $D_D(X\|Y)$ in $X$ without any restriction on $X$:
\[
\Psi_D(H,Y) := \sup_{X>0}\{\operatorname{Tr}[XH] - D_D(X\|Y)\}. \tag{3.17}
\]
By [9, Theorem 1.1] and the joint convexity of $D_D(X\|Y)$, $\Psi_D(H,Y)$ is concave in $Y$ for each fixed $H \in H_n$. By Lemma 1.11,
\[
\Psi_D(H,Y) = e^{\Phi_D(H,Y)+\operatorname{Tr}[Y]-1} - \operatorname{Tr}[Y],
\]
and thus we conclude
\[
\Psi_D(H,Y) = \exp\Big(\inf_{Q>0,\ \operatorname{Tr}[QY]\le 1}\lambda_{\max}(H-\log Q)\Big) - \operatorname{Tr}[Y]. \tag{3.18}
\]
The inequality $\Psi(H,Y) \le \Psi_D(H,Y)$ follows from $D_D(X\|Y) \le D(X\|Y)$ and the order reversing property of Legendre transforms. Taking logarithms and writing $Y = e^K$ yields the first inequality in (3.16). Finally, choosing $Q := e^H/\operatorname{Tr}[e^HY]$, so that the constraint $\operatorname{Tr}[QY] \le 1$ is satisfied, we obtain
\[
\inf_{Q>0,\ \operatorname{Tr}[QY]\le 1}\lambda_{\max}(H-\log Q) \le \log(\operatorname{Tr}[e^HY]).
\]
Writing $Y = e^K$ now yields the second inequality in (3.16).

The proof that the function in (3.15) is concave has two components. One is the identification (3.18) of this function with $\Psi_D(H,Y) + \operatorname{Tr}[Y]$. The second makes use of the direct analog of an argument of Tropp [41] proving the concavity in $Y$ of $\operatorname{Tr}[e^{H+\log Y}] = \Psi(H,Y) + \operatorname{Tr}[Y]$ as a consequence of the joint convexity of the Umegaki relative entropy. Once one has Eq. (3.18), the concavity of the function in (3.15) follows from the same argument, applied instead to the Donald relative entropy, which is also jointly convex.


However, it is of interest to note here that this argument can be run in reverse to deduce the joint convexity of the Donald relative entropy without invoking the joint convexity of the Umegaki relative entropy. To see this, note that Lemma 3.9 provides a simple direct proof of the concavity in $Y$ of $\Phi_D(H,Y)$. By the Fenchel–Moreau Theorem, for all density matrices $X$,
\[
D_D(X\|Y) = \sup_{H\in H_n}\{\operatorname{Tr}[XH] - \Phi_D(H,Y)\}. \tag{3.19}
\]
For each fixed $H \in H_n$, $X, Y \mapsto \operatorname{Tr}[XH] - \Phi_D(H,Y)$ is evidently jointly convex. Since the supremum of any family of convex functions is convex, we conclude that with the $X$ variable restricted to be a density matrix, $X, Y \mapsto D_D(X\|Y)$ is jointly convex. The restriction on $X$ is then easily removed; see Lemma 3.11. This gives an elementary proof of the joint convexity of $D_D(X\|Y)$.

It is somewhat surprising that the joint convexity of the Umegaki relative entropy is deeper than the joint convexity of either $D_D(X\|Y)$ or $D_{BS}(X\|Y)$. In fact, the simple proof by Fujii and Kamei that the latter is jointly convex stems from a joint operator convexity result; see the discussion in Appendix C. The joint convexity of the Umegaki relative entropy, in contrast, stems from the basic concavity theorem in [30].

Lemma 3.11. Let $f(x,y)$ be a $(-\infty,\infty]$-valued function on $\mathbb{R}^m \times \mathbb{R}^n$ that is homogeneous of degree one. Let $a \in \mathbb{R}^m$, let $K_a = \{x \in \mathbb{R}^m : \langle a,x\rangle = 1\}$, and suppose that whenever $f(x,y) < \infty$, $\langle a,x\rangle > 0$. If $f$ is convex on $K_a \times \mathbb{R}^n$, then it is convex on $\mathbb{R}^m \times \mathbb{R}^n$.

Proof. Let $x_1, x_2 \in \mathbb{R}^m$ and $y_1, y_2 \in \mathbb{R}^n$. We may suppose that $f(x_1,y_1), f(x_2,y_2) < \infty$. Define $\alpha_1 = \langle a,x_1\rangle$ and $\alpha_2 = \langle a,x_2\rangle$. Then $\alpha_1, \alpha_2 > 0$, and $x_1/\alpha_1, x_2/\alpha_2 \in K_a$. With $\lambda := \alpha_1/(\alpha_1+\alpha_2)$,
\begin{align*}
f(x_1+x_2, y_1+y_2) &= (\alpha_1+\alpha_2)\,f\Big(\lambda\frac{x_1}{\alpha_1} + (1-\lambda)\frac{x_2}{\alpha_2},\ \lambda\frac{y_1}{\alpha_1} + (1-\lambda)\frac{y_2}{\alpha_2}\Big)\\
&\le (\alpha_1+\alpha_2)\lambda\,f\Big(\frac{x_1}{\alpha_1}, \frac{y_1}{\alpha_1}\Big) + (\alpha_1+\alpha_2)(1-\lambda)\,f\Big(\frac{x_2}{\alpha_2}, \frac{y_2}{\alpha_2}\Big)\\
&= f(x_1,y_1) + f(x_2,y_2).
\end{align*}
Thus, $f$ is subadditive on $\mathbb{R}^m \times \mathbb{R}^n$, and by the homogeneity once more, jointly convex.

We next provide the proof of Proposition 1.9, which, we recall, says that any quantum relative entropy functional satisfies the inequality
\[
R(X\|W) \ge \frac12\operatorname{Tr}[X]\left\|\frac{X}{\operatorname{Tr}[X]} - \frac{W}{\operatorname{Tr}[W]}\right\|_1^2 \tag{3.20}
\]
for all $X, W \in P_n$, where $\|\cdot\|_1$ denotes the trace norm.


Proof of Proposition 1.9. By scaling, it suffices to show that when $X$ and $W$ are density matrices,
\[
R(X\|W) \ge \frac12\|X - W\|_1^2. \tag{3.21}
\]
Let $X$ and $W$ be density matrices and define $H = X - W$. Let $P$ be the spectral projection onto the subspace of $\mathbb{C}^n$ spanned by the eigenvectors of $H$ with nonnegative eigenvalues. Let $\mathcal{A}$ be the $*$-subalgebra of $M_n$ generated by $H$ and $1$, and let $E_{\mathcal{A}}$ be the orthogonal projection, in $M_n$ equipped with the Hilbert–Schmidt inner product, onto $\mathcal{A}$. Then $A \mapsto E_{\mathcal{A}}A$ is a convex operation [13], and then by the joint convexity of $R$,
\[
R(X\|W) \ge R(E_{\mathcal{A}}X\|E_{\mathcal{A}}W). \tag{3.22}
\]
Since both $E_{\mathcal{A}}X$ and $E_{\mathcal{A}}W$ belong to the commutative algebra $\mathcal{A}$, (3.22) together with property (3) in the definition of quantum relative entropies then gives us
\[
R(X\|W) \ge D(E_{\mathcal{A}}X\|E_{\mathcal{A}}W).
\]
Since $\|E_{\mathcal{A}}X - E_{\mathcal{A}}W\|_1 = \|X - W\|_1$, the inequality now follows from the classical Csiszár–Kullback–Leibler–Pinsker inequality [12, 28, 29, 35] on a two-point probability space.
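The bound (3.21) can be spot-checked for the Umegaki relative entropy on random density matrices. A sketch, not part of the paper, assuming NumPy; the helper names are ours:

```python
import numpy as np

def funm(A, f):
    # f(A) for a symmetric matrix A, via its spectral decomposition
    w, U = np.linalg.eigh(A)
    return (U * f(w)) @ U.T

def density(n, seed):
    # random full-rank density matrix
    rng = np.random.default_rng(seed)
    M = rng.standard_normal((n, n))
    rho = M @ M.T + 0.1 * np.eye(n)
    return rho / np.trace(rho)

gap_min = np.inf
for seed in range(20):
    X, W = density(4, seed), density(4, 50 + seed)
    D = np.trace(X @ (funm(X, np.log) - funm(W, np.log)))  # Umegaki D(X||W)
    tn = np.sum(np.abs(np.linalg.eigvalsh(X - W)))         # trace norm of X - W
    gap_min = min(gap_min, D - 0.5 * tn**2)

assert gap_min > -1e-8   # quantum Pinsker bound (3.21)
```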

Remark 3.12. The proof of the lower bound (3.21) given here is essentially the same as the proof for the case of the Umegaki relative entropy given in [21]. The proof gives one reason for attaching importance to the joint convexity property, and since it is short, we spelled it out to emphasize this.

We conclude this section with a brief discussion of the failure of convexity of the function $\varphi(X,Y) = \operatorname{Tr}[X^{1/2}\log(Y^{-1/2}XY^{-1/2})X^{1/2}]$. We recall that if we write this in the other order, i.e. define the function $\psi(X,Y) = \operatorname{Tr}[X^{1/2}\log(X^{1/2}Y^{-1}X^{1/2})X^{1/2}]$, the function $\psi$ is jointly convex. In fact, $\psi$ is operator convex if the trace is omitted. We might have hoped, therefore, that $\varphi$ would at least be convex in $Y$ alone, and even have hoped that $\log(Y^{-1/2}XY^{-1/2})$ is operator convex in $Y$. Neither of these things is true. The following lemma precludes the operator convexity.

Lemma 3.13. Let $F$ be a function mapping the set of positive semidefinite matrices into itself. Let $f: [0,\infty) \to \mathbb{R}$ be a concave, monotone increasing function. If $Y \mapsto f(F(Y))$ is operator convex, then $Y \mapsto F(Y)$ is operator convex.

Proof. If $Y \mapsto F(Y)$ is not operator convex, then there is a unit vector $v$ and there are density matrices $Y_1$ and $Y_2$ such that with $Y = \frac12(Y_1+Y_2)$,
\[
\langle v, F(Y)v\rangle < \frac12(\langle v, F(Y_1)v\rangle + \langle v, F(Y_2)v\rangle).
\]


By Jensen's inequality, for all density matrices $X$, $\langle v, f(F(X))v\rangle \le f(\langle v, F(X)v\rangle)$. Therefore,
\begin{align*}
\frac12(\langle v, f(F(Y_1))v\rangle + \langle v, f(F(Y_2))v\rangle) &\le \frac12(f(\langle v, F(Y_1)v\rangle) + f(\langle v, F(Y_2)v\rangle))\\
&\le f\Big(\frac12(\langle v, F(Y_1)v\rangle + \langle v, F(Y_2)v\rangle)\Big)\\
&< \langle v, f(F(Y))v\rangle.
\end{align*}
By the lemma, if $Y \mapsto \log(Y^{-1/2}ZY^{-1/2})$ were operator convex, $Y \mapsto Y^{-1/2}ZY^{-1/2}$ would be operator convex. But this may be shown to be false in the $2\times 2$ case by simple computations in a neighborhood of the identity with $Z$ a rank-one projector. A more intricate computation of the same type shows that, even with the trace, convexity fails.

4. Exponential Inequalities Related to the Golden–Thompson Inequality

Let $\Psi(H,Y)$ be given in (1.33) and $\Psi_D(H,Y)$ be given in (3.17). We have seen in the previous section that the inequality $D_D(X\|Y) \le D(X\|Y)$ leads to the inequality $\Psi(H,Y) \le \Psi_D(H,Y)$. This inequality, which may be written explicitly as
\[
\operatorname{Tr}[e^{H+\log Y}] \le \exp(\inf\{\lambda_{\max}(H - \log Q) : Q \in P_n,\ \operatorname{Tr}[QY] \le 1\}), \tag{4.1}
\]
immediately implies the Golden–Thompson inequality through the simple choice $Q = e^H/\operatorname{Tr}[Ye^H]$. The $Q$ chosen here is optimal only when $H$ and $Y$ commute. Otherwise, there is a better choice for $Q$, which will lead to a tighter upper bound.

A similar analysis can be made with respect to the BS relative entropy. Define $\Psi_{BS}(H,Y)$ by

\[
\Psi_{BS}(H,Y) := \sup\{\operatorname{Tr}[HX] - D_{BS}(X\|Y) : X \in P_n\}. \tag{4.2}
\]
The inequality $D(X\|Y) \le D_{BS}(X\|Y)$ together with Lemma 1.11 gives
\[
\Psi_{BS}(H,Y) \le \Psi(H,Y) = \operatorname{Tr}[e^{H+\log Y}] - \operatorname{Tr}[Y]. \tag{4.3}
\]
It does not seem possible to compute $\Psi_{BS}(H,Y)$ explicitly, but it is possible to give an alternate expression for it in terms of the solutions of a nonlinear matrix equation similar to the one, (3.5), that arises in the context of the Donald relative entropy.

Writing out the identity $X\#_tY = Y\#_{1-t}X$ gives
\[
X^{1/2}(X^{-1/2}YX^{-1/2})^tX^{1/2} = Y^{1/2}(Y^{-1/2}XY^{-1/2})^{1-t}Y^{1/2}.
\]
Differentiating at $t = 0$ yields
\[
X^{1/2}\log(X^{1/2}Y^{-1}X^{1/2})X^{1/2} = Y^{1/2}(Y^{-1/2}XY^{-1/2})\log(Y^{-1/2}XY^{-1/2})Y^{1/2}.
\]
This provides an alternate expression for $D_{BS}(X\|Y)$ that involves $X$ in a somewhat simpler way that is advantageous for the partial Legendre transform in $X$:
\[
D_{BS}(X\|Y) = \operatorname{Tr}[Yf(Y^{-1/2}XY^{-1/2})] - \operatorname{Tr}[X] + \operatorname{Tr}[Y], \tag{4.4}
\]
where $f(x) = x\log x$. A different derivation of this formula may be found in [23].
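The identity (4.4) (equivalently, the traced form of the display above it) can be verified numerically. A sketch, not part of the paper, assuming NumPy; the helper names are ours:

```python
import numpy as np

def funm(A, f):
    # f(A) for a symmetric matrix A, via its spectral decomposition
    w, U = np.linalg.eigh(A)
    return (U * f(w)) @ U.T

rng = np.random.default_rng(4)
M1, M2 = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))
X = M1 @ M1.T + 3 * np.eye(3)
Y = M2 @ M2.T + 3 * np.eye(3)

Xh = funm(X, np.sqrt)
Ymh = funm(Y, lambda w: w**-0.5)
# Tr[X log(X^{1/2} Y^{-1} X^{1/2})]
lhs = np.trace(X @ funm(Xh @ np.linalg.inv(Y) @ Xh, np.log))
# Tr[Y f(Y^{-1/2} X Y^{-1/2})] with f(x) = x log x
R = Ymh @ X @ Ymh
rhs = np.trace(Y @ funm(R, lambda w: w * np.log(w)))
err = abs(lhs - rhs)
assert err < 1e-7
```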


Introducing the variable $R = Y^{-1/2}XY^{-1/2}$, we have, for all $H \in H_n$,
\begin{align*}
\operatorname{Tr}[XH] - D_{BS}(X\|Y) &= \operatorname{Tr}[X(H+1)] - \operatorname{Tr}[Yf(Y^{-1/2}XY^{-1/2})] - \operatorname{Tr}[Y]\\
&= \operatorname{Tr}[R(Y^{1/2}(H+1)Y^{1/2})] - \operatorname{Tr}[Yf(R)] - \operatorname{Tr}[Y].
\end{align*}
Therefore,

\[
\Psi_{BS}(H,Y) + \operatorname{Tr}[Y] = \sup_{R\in P_n}\{\operatorname{Tr}[R(Y^{1/2}(H+1)Y^{1/2})] - \operatorname{Tr}[Yf(R)]\}. \tag{4.5}
\]
When $Y$ and $H$ commute, the supremum on the right is achieved at $R = e^H$ since for this choice of $R$,
\[
\operatorname{Tr}[R(Y^{1/2}(H+1)Y^{1/2})] - \operatorname{Tr}[Yf(R)] = \operatorname{Tr}[Ye^H] = \operatorname{Tr}[e^{H+\log Y}],
\]
and by (4.3), this is the maximum possible value. In general, without assuming that $H$ and $Y$ commute, this choice of $R$ and (4.3) yields an interesting inequality.

Theorem 4.1. For all self-adjoint $H$ and $L$,
\[
\operatorname{Tr}[e^He^L] - \operatorname{Tr}[e^{H+L}] \le \operatorname{Tr}[e^HHe^L] - \operatorname{Tr}[e^He^{L/2}He^{L/2}]. \tag{4.6}
\]
Proof. With the choice $R = e^H$, the inequality (4.3) together with (4.5) yields
\[
\operatorname{Tr}[e^H(Y^{1/2}HY^{1/2} + Y)] - \operatorname{Tr}[Ye^HH] \le \operatorname{Tr}[e^{H+\log Y}],
\]
or, rearranging terms,
\[
\operatorname{Tr}[e^HY] - \operatorname{Tr}[e^{H+\log Y}] \le \operatorname{Tr}[e^HHY] - \operatorname{Tr}[e^H(Y^{1/2}HY^{1/2})].
\]
The inequality is proved by writing $Y = e^L$.

We now turn to the specification of the actual maximizer.
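Theorem 4.1's inequality (4.6) is also easy to spot-check on random self-adjoint matrices. A sketch, not part of the paper, assuming NumPy; the helper names are ours:

```python
import numpy as np

def funm(A, f):
    # f(A) for a symmetric matrix A, via its spectral decomposition
    w, U = np.linalg.eigh(A)
    return (U * f(w)) @ U.T

rng = np.random.default_rng(5)
H = rng.standard_normal((4, 4)); H = (H + H.T) / 2
L = rng.standard_normal((4, 4)); L = (L + L.T) / 2
eH, eL = funm(H, np.exp), funm(L, np.exp)
eL2 = funm(L, lambda w: np.exp(w / 2))

lhs = np.trace(eH @ eL) - np.trace(funm(H + L, np.exp))
rhs = np.trace(eH @ H @ eL) - np.trace(eH @ eL2 @ H @ eL2)
assert lhs <= rhs + 1e-6   # (4.6)
```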

Lemma 4.2. For $K \in H_n$ and $Y \in P_n$, the function
\[
R \mapsto \operatorname{Tr}[RK] - \operatorname{Tr}[Yf(R)]
\]
on $P_n$ has a unique maximizer $R_{K,Y}$, which is contained in $P_n$, and $R_{K,Y}$ is the unique critical point of this function in $P_n$.

Proof. Since $f$ is strictly operator convex, $R \mapsto \operatorname{Tr}[RK] - \operatorname{Tr}[Yf(R)]$ is strictly concave. There are no local maximizers on the boundary of $P_n$ since $\lim_{x\downarrow 0}(-f'(x)) = \infty$, so that if $R$ has a zero eigenvalue, a small perturbation of $R$ will yield a higher value. Finally,
\[
\operatorname{Tr}[RK] - \operatorname{Tr}[Yf(R)] \le \|K\|\operatorname{Tr}\Big[R - \frac{1}{a}R\log R\Big],
\]


where $a = \|K\|\,\|Y^{-1}\|$. This shows that
\[
\sup_{R\in P_n}\{\operatorname{Tr}[RK] - \operatorname{Tr}[Yf(R)]\} = \sup\{\operatorname{Tr}[RK] - \operatorname{Tr}[Yf(R)] : R \ge 0,\ \|R\| \le e^{1/a}\}.
\]
Since the set on the right is compact and convex, and since the function $R \mapsto \operatorname{Tr}[RK] - \operatorname{Tr}[Yf(R)]$ is strictly concave and upper-semicontinuous on this set, there exists a unique maximizer, which we have seen must be in the interior, and by the strict concavity, there can be no other interior critical point.

It is now a simple matter to derive the Euler–Lagrange equation that determines the maximizer in Lemma 4.2. The integral representation for $f(A) = A\log A$ is
\[
A\log A = \int_0^\infty\Big(\frac{A}{\lambda+1} - 1 + \frac{\lambda}{\lambda+A}\Big)d\lambda,
\]
and then one readily concludes that the unique maximizer $R_{H,Y}$ of the variational problem in (4.5) is the unique solution in $P_n$ of
\[
\int_0^\infty\Big(\frac{Y}{\lambda+1} - \lambda\,\frac{1}{\lambda+R}\,Y\,\frac{1}{\lambda+R}\Big)d\lambda = Y^{1/2}(H+1)Y^{1/2}.
\]
When $H$ and $Y$ commute, one readily checks that $R = e^H$ is the unique solution in $P_n$.

We now show how some of the logarithmic inequalities that follow from Theorem 1.1 may be used to get upper and lower bounds on $\operatorname{Tr}[e^{H+\log Y}]$.

Given two positive matrices $W$ and $V$, one way to show that $\operatorname{Tr}[W] \le \operatorname{Tr}[V]$ is to show that

\[
\operatorname{Tr}[W\log W] \le \operatorname{Tr}[W\log V]. \tag{4.7}
\]
Then
\[
0 \le D(W\|V) = \operatorname{Tr}[W\log W] - \operatorname{Tr}[W\log V] - \operatorname{Tr}[W] + \operatorname{Tr}[V] \le -\operatorname{Tr}[W] + \operatorname{Tr}[V]. \tag{4.8}
\]
Thus, when (4.7) is satisfied, one not only has $\operatorname{Tr}[W] \le \operatorname{Tr}[V]$, but the stronger bound $D(W\|V) + \operatorname{Tr}[W] \le \operatorname{Tr}[V]$.

Theorem 4.3. Let $H, K \in H_n$. For $r > 0$, define
\[
W := (e^{rH}\#_s e^{rK})^{1/r} \quad\text{and}\quad V := e^{(1-s)H+sK}. \tag{4.9}
\]

Then for $s \in [0,1]$,
\[
D(W\|V) + \operatorname{Tr}[W] \le \operatorname{Tr}[V]. \tag{4.10}
\]
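Theorem 4.3 can be spot-checked numerically. The sketch below, not part of the paper, builds $W$ and $V$ from (4.9) and verifies (4.10); it assumes NumPy, and `geom` and `funm` are our helper names:

```python
import numpy as np

def funm(A, f):
    # f(A) for a symmetric matrix A, via its spectral decomposition
    w, U = np.linalg.eigh(A)
    return (U * f(w)) @ U.T

def geom(X, Y, t):
    # X #_t Y, extended geometric mean
    Xh, Xmh = funm(X, np.sqrt), funm(X, lambda w: w**-0.5)
    return Xh @ funm(Xmh @ Y @ Xmh, lambda w: w**t) @ Xh

rng = np.random.default_rng(6)
H = rng.standard_normal((3, 3)); H = (H + H.T) / 4
K = rng.standard_normal((3, 3)); K = (K + K.T) / 4

r, s = 2.0, 0.4
erH = funm(H, lambda w: np.exp(r * w))
erK = funm(K, lambda w: np.exp(r * w))
W = funm(geom(erH, erK, s), lambda w: w**(1.0 / r))   # (e^{rH} #_s e^{rK})^{1/r}
V = funm((1 - s) * H + s * K, np.exp)                  # e^{(1-s)H + sK}

D = np.trace(W @ (funm(W, np.log) - funm(V, np.log)))  # Umegaki D(W||V)
slack = np.trace(V) - np.trace(W) - D
assert slack > -1e-6   # (4.10)
```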


Proof. By the remarks preceding the theorem, it suffices to show that for this choice of $V$ and $W$, $\operatorname{Tr}[W\log W] \le \operatorname{Tr}[W\log V]$. Define $X = e^H$ and $Y = e^K$. The identity

\[
A = (A\#_sB)\#_{-s/(1-s)}B, \tag{4.11}
\]
valid for $A, B \in P_n$, is the special case of Theorem C.4 in which $t_1 = 1$, $t = -t_0/(t_1 - t_0)$ and $t_0 = s$. Taking $A = X^r = e^{rH}$ and $B = Y^r = e^{rK}$, we have $X^r = W^r\#_\beta Y^r$ with $\beta = -s/(1-s)$. Therefore, by (2.22),
\[
\operatorname{Tr}[W\log X] = \frac{1}{r}\operatorname{Tr}[W\log(W^r\#_\beta Y^r)] \ge \operatorname{Tr}[W((1-\beta)\log W + \beta\log Y)].
\]
Since
\[
\frac{1}{1-\beta}\log X - \frac{\beta}{1-\beta}\log Y = (1-s)\log X + s\log Y = \log V,
\]
this last inequality is equivalent to $\operatorname{Tr}[W\log W] \le \operatorname{Tr}[W\log V]$.

Remark 4.4. Since $D(W\|V) > 0$ unless $W = V$, (4.10) is stronger than the inequality $\operatorname{Tr}[W] \le \operatorname{Tr}[V]$, which is the complemented Golden–Thompson inequality of Hiai and Petz [23]. Their proof is also based on (2.22), together with an identity equivalent to (4.11), but they employ these differently, thereby omitting the remainder term $D(W\|V)$.

We remark that one may obtain at least one of the cases of (1.10) directly from (4.2) and (4.3) by making an appropriate choice of $X$ in terms of $H$ and $Y$: Define $X^{1/2} := Y\#e^H$. Then

\[
X^{1/2}Y^{-1}X^{1/2} = X^{1/2}\#_{-1}Y = e^H,
\]
and, therefore, making this choice of $X$,

\begin{align*}
\Psi_{BS}(H,Y) &\ge \operatorname{Tr}[(Y\#e^H)^2H] - \operatorname{Tr}[(Y\#e^H)^2H] + \operatorname{Tr}[(Y\#e^H)^2] - \operatorname{Tr}[Y]\\
&= \operatorname{Tr}[(Y\#e^H)^2] - \operatorname{Tr}[Y].
\end{align*}
This proves $\operatorname{Tr}[(Y\#e^H)^2] \le \operatorname{Tr}[e^{H+\log Y}]$, which is equivalent to the $r = 1/2$, $t = 1/2$ case of (1.10).

Appendix A. The Peierls–Bogoliubov Inequality and the Gibbs Variational Principle

For $A \in H_n$, let $\sigma(A)$ denote the spectrum of $A$, and let $A = \sum_{\lambda\in\sigma(A)}\lambda P_\lambda$ be the spectral decomposition of $A$. For a function $f$ defined on $\sigma(A)$, $f(A) = \sum_{\lambda\in\sigma(A)}f(\lambda)P_\lambda$. Likewise, for $B \in H_n$, let $B = \sum_{\mu\in\sigma(B)}\mu Q_\mu$ be the spectral


decomposition of $B$. Let $f$ be convex and differentiable on an interval containing $\sigma(A)\cup\sigma(B)$. Then, since $\sum_{\lambda\in\sigma(A)} P_\lambda = \sum_{\mu\in\sigma(B)} Q_\mu = \mathbf{1}$,
$$\mathrm{Tr}[f(B) - f(A) - f'(A)(B-A)] = \sum_{\lambda\in\sigma(A)}\sum_{\mu\in\sigma(B)} [f(\mu) - f(\lambda) - f'(\lambda)(\mu-\lambda)]\,\mathrm{Tr}[P_\lambda Q_\mu]. \tag{A.1}$$
For each $\mu$ and $\lambda$, both $[f(\mu) - f(\lambda) - f'(\lambda)(\mu-\lambda)]$ and $\mathrm{Tr}[P_\lambda Q_\mu]$ are nonnegative, and hence the right side of (A.1) is nonnegative. This yields Klein's inequality [25]:

$$\mathrm{Tr}[f(B)] \ge \mathrm{Tr}[f(A)] + \mathrm{Tr}[f'(A)(B-A)]. \tag{A.2}$$
Now suppose that the function $f$ is strictly convex on an interval containing $\sigma(A)\cup\sigma(B)$. Then for $\mu \ne \lambda$, $[f(\mu) - f(\lambda) - f'(\lambda)(\mu-\lambda)] > 0$. If there is equality in (A.2), then $\mathrm{Tr}[P_\lambda Q_\mu] = 0$ for each $\lambda\in\sigma(A)$ and $\mu\in\sigma(B)$ such that $\lambda\ne\mu$. Since $\sum_{\mu\in\sigma(B)}\mathrm{Tr}[P_\lambda Q_\mu] = \mathrm{Tr}[P_\lambda] > 0$, $\lambda\in\sigma(B)$ and $P_\lambda \le Q_\lambda$. The same reasoning shows that for each $\mu\in\sigma(B)$, $\mu\in\sigma(A)$ and $Q_\mu \le P_\mu$. Thus, there is equality in Klein's inequality if and only if $A = B$.

Taking $f(t) = e^t$, (A.2) becomes $\mathrm{Tr}[e^B] \ge \mathrm{Tr}[e^A] + \mathrm{Tr}[e^A(B-A)]$. For $c\in\mathbb{R}$ and $H,K\in\mathbf{H}_n$, choose $A = c + H$ and $B = H + K$ to obtain
$$\mathrm{Tr}[e^{H+K}] \ge e^c\,\mathrm{Tr}[e^H] + e^c\,\mathrm{Tr}[e^H(K - c)].$$
Choosing $c = \mathrm{Tr}[e^H K]/\mathrm{Tr}[e^H]$, we obtain $\mathrm{Tr}[e^{H+K}] \ge e^{\mathrm{Tr}[e^HK]/\mathrm{Tr}[e^H]}\,\mathrm{Tr}[e^H]$, which can be written as
$$\frac{\mathrm{Tr}[e^H K]}{\mathrm{Tr}[e^H]} \le \log(\mathrm{Tr}[e^{H+K}]) - \log(\mathrm{Tr}[e^H]), \tag{A.3}$$
the Peierls–Bogoliubov inequality [8], valid for all $H,K\in\mathbf{H}_n$.

The original application of Klein's inequality was to the entropy. It may be used to prove the nonnegativity of the relative entropy. Let $A,B\in\mathbf{P}_n$, and apply Klein's inequality with $f(x) = x\log x$ to obtain
$$\mathrm{Tr}[B\log B] \ge \mathrm{Tr}[A\log A] + \mathrm{Tr}[(\mathbf{1}+\log A)(B-A)] = \mathrm{Tr}[B] - \mathrm{Tr}[A] + \mathrm{Tr}[B\log A].$$
Rearranging terms yields $\mathrm{Tr}[B(\log B - \log A)] + \mathrm{Tr}[A] - \mathrm{Tr}[B] \ge 0$; that is, $D(B\|A) \ge 0$.

The Peierls–Bogoliubov inequality has as a direct consequence the quantum Gibbs variational principle. Suppose that $H\in\mathbf{H}_n$ and $\mathrm{Tr}[e^H] = 1$. Define $X := e^H$ so that $X$ is a density matrix. Then (A.3) specializes to
$$\mathrm{Tr}[XK] \le \log(\mathrm{Tr}[e^{\log X + K}]), \tag{A.4}$$
which is valid for all density matrices $X$ and all $K\in\mathbf{H}_n$. Replacing $K$ in (A.4) with $K - \log X$ yields
$$\mathrm{Tr}[XK] \le \log(\mathrm{Tr}[e^K]) + \mathrm{Tr}[X\log X]. \tag{A.5}$$
For fixed $X$, there is equality in (A.5) for $K = \log X$, and for fixed $K$, there is equality in (A.5) for $X := e^K/\mathrm{Tr}[e^K]$.
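As an illustration (not part of the original text), the Peierls–Bogoliubov inequality (A.3) can be checked numerically. The sketch below assumes nothing beyond the Python standard library: it implements the functional calculus for $2\times 2$ real symmetric matrices by explicit spectral decomposition, and the matrices $H$ and $K$ are arbitrary test data.

```python
import math

def eig_sym2(M):
    """Spectral decomposition of a real symmetric 2x2 matrix M."""
    a, b, c = M[0][0], M[0][1], M[1][1]
    m, d = 0.5 * (a + c), math.hypot(0.5 * (a - c), b)
    l1, l2 = m + d, m - d          # eigenvalues, l1 >= l2
    if abs(b) < 1e-14 * (abs(a) + abs(c) + 1.0):
        V = [[1.0, 0.0], [0.0, 1.0]] if a >= c else [[0.0, 1.0], [1.0, 0.0]]
    else:
        n = math.hypot(b, l1 - a)  # an eigenvector for l1 is (b, l1 - a)
        V = [[b / n, -(l1 - a) / n], [(l1 - a) / n, b / n]]
    return (l1, l2), V

def fsym(M, f):
    """f(M) = f(l1) P_1 + f(l2) P_2, the functional calculus used above."""
    (l1, l2), V = eig_sym2(M)
    f1, f2 = f(l1), f(l2)
    return [[V[i][0] * f1 * V[j][0] + V[i][1] * f2 * V[j][1]
             for j in range(2)] for i in range(2)]

def mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def tr(A):
    return A[0][0] + A[1][1]

# Arbitrary noncommuting symmetric test matrices.
H = [[1.0, 0.3], [0.3, -0.5]]
K = [[0.2, -0.7], [-0.7, 0.4]]

expH = fsym(H, math.exp)
HK = [[H[i][j] + K[i][j] for j in range(2)] for i in range(2)]
lhs = tr(mul(expH, K)) / tr(expH)                       # Tr[e^H K]/Tr[e^H]
rhs = math.log(tr(fsym(HK, math.exp))) - math.log(tr(expH))
assert lhs < rhs   # (A.3); strict, since K is not a multiple of the identity
```

Since $K$ is not a scalar, the equality analysis of Klein's inequality predicts a strict inequality here, which the final assertion confirms.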


It follows that for all density matrices $X$,
$$\mathrm{Tr}[X\log X] = \sup\{\mathrm{Tr}[XK] - \log(\mathrm{Tr}[e^K]) : K\in\mathbf{H}_n\} \tag{A.6}$$
and that for all $K\in\mathbf{H}_n$,
$$\log(\mathrm{Tr}[e^K]) = \sup\{\mathrm{Tr}[XK] - \mathrm{Tr}[X\log X] : X\in\mathbf{P}_n,\ \mathrm{Tr}[X] = 1\}. \tag{A.7}$$
This is the Gibbs variational principle for the entropy $S(X) = -\mathrm{Tr}[X\log X]$.

Now let $Y\in\mathbf{P}_n$ and replace $K$ with $K + \log Y$ in (A.5) to conclude that for all density matrices $X$, all $Y\in\mathbf{P}_n$ and all $K\in\mathbf{H}_n$,
$$\mathrm{Tr}[XK] \le \log(\mathrm{Tr}[e^{K+\log Y}]) + \mathrm{Tr}[X(\log X - \log Y)] = (\log(\mathrm{Tr}[e^{K+\log Y}]) + 1 - \mathrm{Tr}[Y]) + D(X\|Y). \tag{A.8}$$
For fixed $X$, there is equality in (A.8) for $K = \log X - \log Y$, and for fixed $K$, there is equality in (A.8) for $X := e^{K+\log Y}/\mathrm{Tr}[e^{K+\log Y}]$. Recalling that for $\mathrm{Tr}[X] = 1$, $\mathrm{Tr}[X(\log X - \log Y)] = D(X\|Y) + 1 - \mathrm{Tr}[Y]$, we have that for all density matrices $X$ and all $Y\in\mathbf{P}_n$,
$$D(X\|Y) = \sup\{\mathrm{Tr}[XK] - (\log(\mathrm{Tr}[e^{K+\log Y}]) + 1 - \mathrm{Tr}[Y]) : K\in\mathbf{H}_n\} \tag{A.9}$$
and that for all $K\in\mathbf{H}_n$ and all $Y\in\mathbf{P}_n$,
$$\log(\mathrm{Tr}[e^{K+\log Y}]) + 1 - \mathrm{Tr}[Y] = \sup\{\mathrm{Tr}[XK] - D(X\|Y) : X\in\mathbf{P}_n,\ \mathrm{Tr}[X] = 1\}. \tag{A.10}$$
The paper [3] of Araki contains a discussion of the Peierls–Bogoliubov and Golden–Thompson inequalities in a very general von Neumann algebra setting.
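As a numerical illustration (not part of the original text), the variational inequality (A.5) and its equality case $X = e^K/\mathrm{Tr}[e^K]$ can be checked for $2\times 2$ real symmetric matrices, again using only the standard library and an explicit spectral decomposition; $H$ and $K$ are arbitrary test data.

```python
import math

def eig_sym2(M):
    """Spectral decomposition of a real symmetric 2x2 matrix M."""
    a, b, c = M[0][0], M[0][1], M[1][1]
    m, d = 0.5 * (a + c), math.hypot(0.5 * (a - c), b)
    l1, l2 = m + d, m - d
    if abs(b) < 1e-14 * (abs(a) + abs(c) + 1.0):
        V = [[1.0, 0.0], [0.0, 1.0]] if a >= c else [[0.0, 1.0], [1.0, 0.0]]
    else:
        n = math.hypot(b, l1 - a)
        V = [[b / n, -(l1 - a) / n], [(l1 - a) / n, b / n]]
    return (l1, l2), V

def fsym(M, f):
    """Functional calculus f(M) for symmetric 2x2 M."""
    (l1, l2), V = eig_sym2(M)
    f1, f2 = f(l1), f(l2)
    return [[V[i][0] * f1 * V[j][0] + V[i][1] * f2 * V[j][1]
             for j in range(2)] for i in range(2)]

def mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def tr(A):
    return A[0][0] + A[1][1]

H = [[0.5, 0.2], [0.2, -0.3]]
K = [[1.0, -0.6], [-0.6, 0.1]]

expH = fsym(H, math.exp)
Z = tr(expH)
X = [[expH[i][j] / Z for j in range(2)] for i in range(2)]  # a density matrix

lhs = tr(mul(X, K))
rhs = math.log(tr(fsym(K, math.exp))) + tr(fsym(X, lambda t: t * math.log(t)))
assert lhs <= rhs                                          # inequality (A.5)

# Equality at the Gibbs state X* = e^K / Tr[e^K].
expK = fsym(K, math.exp)
ZK = tr(expK)
Xs = [[expK[i][j] / ZK for j in range(2)] for i in range(2)]
gap = math.log(ZK) + tr(fsym(Xs, lambda t: t * math.log(t))) - tr(mul(Xs, K))
assert abs(gap) < 1e-9
```

The vanishing `gap` reflects the identity $\log X^* = K - \log(\mathrm{Tr}[e^K])\mathbf{1}$, which is exactly the equality mechanism behind (A.6) and (A.7).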

Appendix B. Majorization Inequalities

Let $x = (x_1,\dots,x_n)$ and $y = (y_1,\dots,y_n)$ be two vectors in $\mathbb{R}^n$ such that $x_{j+1} \le x_j$ and $y_{j+1} \le y_j$ for each $j = 1,\dots,n-1$. Then $y$ is said to majorize $x$ in case

$$\sum_{j=1}^k x_j \le \sum_{j=1}^k y_j \quad\text{for } k = 1,\dots,n-1 \quad\text{and}\quad \sum_{j=1}^n x_j = \sum_{j=1}^n y_j, \tag{B.1}$$

and in this case we write $x \prec y$.

A matrix $P\in M_n$ is doubly stochastic in case $P$ has nonnegative entries and the entries in each row and column sum to one. By a theorem of Hardy, Littlewood and Pólya [19], $x \prec y$ if and only if there is a doubly stochastic matrix $P$ such that $x = Py$. Therefore, if $\varphi$ is convex on $\mathbb{R}$ and $x \prec y$, let $P$ be a doubly stochastic matrix such that $x = Py$. By Jensen's inequality,
$$\sum_{j=1}^n \varphi(x_j) = \sum_{j=1}^n \varphi\Big(\sum_{k=1}^n P_{j,k}y_k\Big) \le \sum_{j,k=1}^n P_{j,k}\varphi(y_k) = \sum_{k=1}^n \varphi(y_k).$$


That is, for every convex function $\varphi$,
$$x \prec y \Rightarrow \sum_{j=1}^n \varphi(x_j) \le \sum_{j=1}^n \varphi(y_j). \tag{B.2}$$
Let $X,Y\in\mathbf{H}_n$, and let $\lambda^X$ and $\lambda^Y$ be the eigenvalue sequences of $X$ and $Y$, respectively, with the eigenvalues repeated according to their geometric multiplicity and arranged in decreasing order, considered as vectors in $\mathbb{R}^n$. Then $Y$ is said to majorize $X$ in case $\lambda^X \prec \lambda^Y$, and in this case we write $X \prec Y$. It follows immediately from (B.2) that if $\varphi$ is an increasing convex function,
$$X \prec Y \Rightarrow \mathrm{Tr}[\varphi(X)] \le \mathrm{Tr}[\varphi(Y)] \quad\text{and}\quad \mathrm{Tr}[X] = \mathrm{Tr}[Y]. \tag{B.3}$$
The following extends a theorem of Bapat and Sunder [5].

Theorem B.1. Let $\Phi : M_n \to M_n$ be a linear transformation such that $\Phi(A) \ge 0$ for all $A \ge 0$, $\Phi(\mathbf{1}) = \mathbf{1}$ and $\mathrm{Tr}[\Phi(A)] = \mathrm{Tr}[A]$ for all $A\in M_n$. Then for all $X\in\mathbf{H}_n$,
$$\Phi(X) \prec X. \tag{B.4}$$
Proof. Note that $\Phi(X)\in\mathbf{H}_n$. Let $\Phi(X) = \sum_{j=1}^n \lambda_j |v_j\rangle\langle v_j|$ be the spectral resolution of $\Phi(X)$ with $\lambda_j \ge \lambda_{j+1}$ for $j = 1,\dots,n-1$. Fix $k\in\{1,\dots,n-1\}$ and let $P_k = \sum_{j=1}^k |v_j\rangle\langle v_j|$. Then with $\Phi^*$ denoting the adjoint of $\Phi$ with respect to the Hilbert–Schmidt inner product,
$$\sum_{j=1}^k \lambda_j = \mathrm{Tr}[P_k\Phi(X)] = \mathrm{Tr}[\Phi^*(P_k)X] \le \sup\{\mathrm{Tr}[QX] : 0 \le Q \le \mathbf{1},\ \mathrm{Tr}[Q] = k\} = \sum_{j=1}^k \mu_j,$$
where $\{\mu_1,\dots,\mu_n\}$ is the eigenvalue sequence of $X$ arranged in decreasing order. Since $\mathrm{Tr}[\Phi(X)] = \mathrm{Tr}[X]$, this proves (B.4).
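A concrete check of (B.1) and the Hardy–Littlewood–Pólya mechanism behind (B.2) (illustrative only; the vector `y` and the doubly stochastic matrix `P` are arbitrary test data):

```python
import math

y = [5.0, 3.0, 1.0, 1.0]                       # decreasing, as in (B.1)
P = [[0.5, 0.5, 0.0, 0.0],                     # a doubly stochastic matrix
     [0.5, 0.5, 0.0, 0.0],
     [0.0, 0.0, 0.7, 0.3],
     [0.0, 0.0, 0.3, 0.7]]
assert all(abs(sum(row) - 1.0) < 1e-12 for row in P)
assert all(abs(sum(P[i][j] for i in range(4)) - 1.0) < 1e-12 for j in range(4))

# By Hardy-Littlewood-Polya, x = P y is majorized by y.
x = sorted((sum(P[i][j] * y[j] for j in range(4)) for i in range(4)),
           reverse=True)

for k in range(1, 4):                          # partial sums in (B.1)
    assert sum(x[:k]) <= sum(y[:k]) + 1e-12
assert abs(sum(x) - sum(y)) < 1e-12            # equal total sums

# (B.2) for two convex functions
assert sum(map(math.exp, x)) <= sum(map(math.exp, y))
assert sum(t * t for t in x) <= sum(t * t for t in y)
```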

Bapat and Sunder prove this for $\Phi$ of the form $\Phi(A) = \sum_{j=1}^m V_j^* A V_j$, where $V_1,\dots,V_m\in M_n$ satisfy

$$\sum_{j=1}^m V_jV_j^* = \mathbf{1} = \sum_{j=1}^m V_j^*V_j. \tag{B.5}$$
Choi [10, 11] has shown that, for all $n \ge 2$, the transformation
$$\Phi(A) = \frac{1}{n^2 - n - 1}\big((n-1)\mathrm{Tr}[A]\mathbf{1} - A\big)$$
cannot be written in the form (B.5), yet it satisfies the conditions of Theorem B.1.

Lemma B.2. Let $A\in\mathbf{P}_n$ and let $\Phi$ be defined by
$$\Phi(X) = \int_0^\infty \frac{A^{1/2}}{\lambda + A}\,X\,\frac{A^{1/2}}{\lambda + A}\,d\lambda. \tag{B.6}$$


Then for all $X\in\mathbf{H}_n$, (B.4) is satisfied, and for all $p \ge 1$,
$$\mathrm{Tr}[|\Phi(X)|^p] \le \mathrm{Tr}[|X|^p]. \tag{B.7}$$
Proof. $\Phi$ evidently satisfies the conditions of Theorem B.1, and then (B.4) implies (B.7) as discussed above.
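Choi's transformation above can be checked directly. The sketch below (for $n = 3$; the matrix `A` and the eigenvalue list `xs` are arbitrary test data) verifies that $\Phi$ is unital and trace preserving, and checks the majorization (B.4) for a diagonal $X$, in which case $\Phi(X)$ is again diagonal and (B.4) reduces to scalar partial sums:

```python
n = 3

def phi(A):
    """Choi's map Phi(A) = ((n-1) Tr[A] 1 - A) / (n^2 - n - 1)."""
    t = sum(A[i][i] for i in range(n))
    return [[((n - 1) * t * (1.0 if i == j else 0.0) - A[i][j])
             / (n * n - n - 1) for j in range(n)] for i in range(n)]

I = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
A = [[2.0, 0.5, 0.0], [0.5, 1.0, -0.3], [0.0, -0.3, 0.7]]

# Phi(1) = 1 and Tr[Phi(A)] = Tr[A]
assert all(abs(phi(I)[i][j] - I[i][j]) < 1e-12
           for i in range(n) for j in range(n))
assert abs(sum(phi(A)[i][i] for i in range(n))
           - sum(A[i][i] for i in range(n))) < 1e-12

# Majorization (B.4) for X = diag(3, 1, -1); Phi(X) is diagonal too.
xs = [3.0, 1.0, -1.0]
s = sum(xs)
ys = sorted((((n - 1) * s - v) / (n * n - n - 1) for v in xs), reverse=True)
for k in range(1, n):
    assert sum(ys[:k]) <= sum(xs[:k]) + 1e-12
assert abs(sum(ys) - sum(xs)) < 1e-12
```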

Appendix C. Geodesics and Geometric Means

There is a natural Riemannian metric on $\mathbf{P}_n$ such that the corresponding distance $\delta(X,Y)$ is invariant under conjugation:

$$\delta(A^*XA, A^*YA) = \delta(X,Y)$$

for all $X,Y\in\mathbf{P}_n$ and all invertible $n\times n$ matrices $A$. It turns out that for $A,B\in\mathbf{P}_n$, $t\mapsto A\#_tB$, $t\in[0,1]$, is a constant speed geodesic for this metric that connects $A$ and $B$. This geometric point of view originated in the work of statisticians, and was developed in the form presented here by Bhatia and Holbrook [7]. See also [33].

Definition C.1. Let $t\mapsto X(t)$, $t\in[a,b]$, be a smooth path in $\mathbf{P}_n$. The arc-length along this path in the conjugation invariant metric is

$$\int_a^b \|X(t)^{-1/2}X'(t)X(t)^{-1/2}\|_2\,dt,$$
where $\|\cdot\|_2$ denotes the Hilbert–Schmidt norm and the prime denotes the derivative. The corresponding distance between $X,Y\in\mathbf{P}_n$ is defined by
$$\delta(X,Y) = \inf\Big\{\int_0^1 \|X(t)^{-1/2}X'(t)X(t)^{-1/2}\|_2\,dt : X(t)\in\mathbf{P}_n \text{ for } t\in(0,1),\ X(0)=X,\ X(1)=Y\Big\}.$$
To see the conjugation invariance, let the smooth path $X(t)$ be given, let an invertible matrix $A$ be given, and define $Z(t) := A^*X(t)A$. Then by cyclicity of the trace,

$$\|Z(t)^{-1/2}Z'(t)Z(t)^{-1/2}\|_2^2 = \mathrm{Tr}[Z(t)^{-1}Z'(t)Z(t)^{-1}Z'(t)] = \mathrm{Tr}[A^{-1}X(t)^{-1}X'(t)X(t)^{-1}X'(t)A] = \|X(t)^{-1/2}X'(t)X(t)^{-1/2}\|_2^2.$$
Given any smooth path $t\mapsto X(t)$, define $H(t) := \log(X(t))$ so that $X(t) = e^{H(t)}$, and then

$$X'(t) = \int_0^1 X(t)^{1-s}H'(t)X(t)^s\,ds, \tag{C.1}$$


or equivalently,

$$H'(t) = \int_0^\infty \frac{1}{\lambda + X(t)}X'(t)\frac{1}{\lambda + X(t)}\,d\lambda = \int_0^\infty \frac{X(t)^{1/2}}{\lambda + X(t)}\big(X(t)^{-1/2}X'(t)X(t)^{-1/2}\big)\frac{X(t)^{1/2}}{\lambda + X(t)}\,d\lambda. \tag{C.2}$$
Lemma B.2 yields $H'(t) \prec X(t)^{-1/2}X'(t)X(t)^{-1/2}$ and its consequence
$$\|H'(t)\|_2 \le \|X(t)^{-1/2}X'(t)X(t)^{-1/2}\|_2. \tag{C.3}$$
Now let $X(t)$ be a smooth path in $\mathbf{P}_n$ with $X(0) = X$ and $X(1) = Y$. Then, with $H(t) = \log X(t)$,
$$\|\log Y - \log X\|_2 = \Big\|\int_0^1 H'(t)\,dt\Big\|_2 \le \int_0^1 \|H'(t)\|_2\,dt \le \int_0^1 \|X(t)^{-1/2}X'(t)X(t)^{-1/2}\|_2\,dt, \tag{C.4}$$
and taking the infimum over all such paths, $\|\log Y - \log X\|_2 \le \delta(X,Y)$. If $X$ and $Y$ commute, this lower bound is exact: Given $X,Y\in\mathbf{P}_n$ that commute, define $H(t) = (1-t)\log X + t\log Y$, and $X(t) = e^{H(t)}$. Then $H'(t) = \log Y - \log X$, independent of $t$. Hence all of the inequalities in (C.4) are equalities. Moreover, if there is equality in (C.4), then necessarily
$$H'(s) = \int_0^1 H'(t)\,dt = \log Y - \log X$$
for all $s\in[0,1]$. This proves the following.

Lemma C.2. When $X,Y\in\mathbf{P}_n$ commute, there is exactly one constant speed geodesic running from $X$ to $Y$ in unit time, namely, $X(t) = e^{(1-t)\log X + t\log Y}$,

and $\delta(X,Y) = \|\log Y - \log X\|_2$.

Since conjugation is an isometry in this metric, it is now a simple matter to find the explicit formula for the geodesic connecting $X$ and $Y$ in $\mathbf{P}_n$. Apart from the statement on uniqueness, the following theorem is due to Bhatia and Holbrook [7].

Theorem C.3. For all $X,Y\in\mathbf{P}_n$, there is exactly one constant speed geodesic running from $X$ to $Y$ in unit time, namely,

$$X(t) = X\#_tY := X^{1/2}(X^{-1/2}YX^{-1/2})^tX^{1/2}, \tag{C.5}$$
and

$$\delta(X,Y) = \|\log(X^{-1/2}YX^{-1/2})\|_2.$$
Proof. By Lemma C.2, the unique constant speed geodesic running from $\mathbf{1}$ to $X^{-1/2}YX^{-1/2}$ in unit time is $W(t) = (X^{-1/2}YX^{-1/2})^t$; it has the constant speed


$\|\log(X^{-1/2}YX^{-1/2})\|_2$, and
$$\delta(\mathbf{1}, X^{-1/2}YX^{-1/2}) = \|\log(X^{-1/2}YX^{-1/2})\|_2.$$
By the conjugation invariance of the metric, $\delta(X,Y) = \delta(\mathbf{1}, X^{-1/2}YX^{-1/2})$, and $X(t)$ as defined in (C.5) has the constant speed $\delta(X,Y)$ and runs from $X$ to $Y$ in unit time. Thus it is a constant speed geodesic running from $X$ to $Y$ in unit time. If there were another such geodesic, say $\widetilde{X}(t)$, then $X^{-1/2}\widetilde{X}(t)X^{-1/2}$ would be a constant speed geodesic running from $\mathbf{1}$ to $X^{-1/2}YX^{-1/2}$ in unit time, and different from $W(t)$, but this would contradict the uniqueness in Lemma C.2.

In particular, the midpoint of the unique constant speed geodesic running from $X$ to $Y$ in unit time is the geometric mean of $X$ and $Y$ as originally defined by Pusz and Woronowicz [36]:
$$X\#Y = X^{1/2}(X^{-1/2}YX^{-1/2})^{1/2}X^{1/2}.$$

In fact, the Riemannian manifold $(\mathbf{P}_n,\delta)$ is geodesically complete: The smooth path
$$t \mapsto X^{1/2}(X^{-1/2}YX^{-1/2})^tX^{1/2} := X\#_tY$$
is well defined for all $t\in\mathbb{R}$. By the conjugation invariance and Lemma C.2, for all $s,t\in\mathbb{R}$,
$$\delta(X\#_sY, X\#_tY) = \delta((X^{-1/2}YX^{-1/2})^s, (X^{-1/2}YX^{-1/2})^t)$$

$$= |t-s|\,\|\log(X^{-1/2}YX^{-1/2})\|_2.$$
Since the speed along the curve $t\mapsto X\#_tY$ has the constant value $\|\log(X^{-1/2}YX^{-1/2})\|_2$, this, together with the uniqueness in Theorem C.3, shows that for all $t_0 < t_1$, $t\mapsto X\#_{(1-t)t_0+tt_1}Y$ is the unique constant speed geodesic running from $X\#_{t_0}Y$ to $X\#_{t_1}Y$ in unit time. This has a number of consequences.

Theorem C.4. Let $X,Y\in\mathbf{P}_n$, and $t_0,t_1\in\mathbb{R}$. Then for all $t\in\mathbb{R}$,
$$X\#_{(1-t)t_0+tt_1}Y = (X\#_{t_0}Y)\#_t(X\#_{t_1}Y). \tag{C.6}$$

Proof. By what we have noted above, $t\mapsto X\#_{(1-t)t_0+tt_1}Y$ is a constant speed geodesic running from $X\#_{t_0}Y$ to $X\#_{t_1}Y$ in unit time, as is $t\mapsto (X\#_{t_0}Y)\#_t(X\#_{t_1}Y)$. The identity (C.6) now follows from the uniqueness in Theorem C.3.

Taking $t_0 = 0$ and $t_1 = s$, we have the special case
$$X\#_{ts}Y = X\#_t(X\#_sY). \tag{C.7}$$
Taking $t_0 = 1$ and $t_1 = 0$, we have the special case
$$X\#_{1-t}Y = Y\#_tX. \tag{C.8}$$
The identity (C.8) is well known, and may be derived directly from the formula in (C.5).
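The identities (C.7) and (C.8) are easy to confirm numerically from the formula (C.5). The following sketch (standard-library Python only; the $2\times 2$ symmetric positive definite matrices $X$ and $Y$ are arbitrary test data) implements $X\#_tY$ by explicit spectral decomposition:

```python
import math

def eig_sym2(M):
    """Spectral decomposition of a real symmetric 2x2 matrix M."""
    a, b, c = M[0][0], M[0][1], M[1][1]
    m, d = 0.5 * (a + c), math.hypot(0.5 * (a - c), b)
    l1, l2 = m + d, m - d
    if abs(b) < 1e-14 * (abs(a) + abs(c) + 1.0):
        V = [[1.0, 0.0], [0.0, 1.0]] if a >= c else [[0.0, 1.0], [1.0, 0.0]]
    else:
        n = math.hypot(b, l1 - a)
        V = [[b / n, -(l1 - a) / n], [(l1 - a) / n, b / n]]
    return (l1, l2), V

def fsym(M, f):
    """Functional calculus f(M) for symmetric 2x2 M."""
    (l1, l2), V = eig_sym2(M)
    f1, f2 = f(l1), f(l2)
    return [[V[i][0] * f1 * V[j][0] + V[i][1] * f2 * V[j][1]
             for j in range(2)] for i in range(2)]

def mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def sym(M):
    """Symmetrize to suppress floating-point asymmetry."""
    return [[0.5 * (M[i][j] + M[j][i]) for j in range(2)] for i in range(2)]

def geo(X, Y, t):
    """X #_t Y = X^{1/2} (X^{-1/2} Y X^{-1/2})^t X^{1/2}, formula (C.5)."""
    Xh = fsym(X, math.sqrt)
    Xmh = fsym(X, lambda v: 1.0 / math.sqrt(v))
    B = sym(mul(Xmh, mul(Y, Xmh)))
    return sym(mul(Xh, mul(fsym(B, lambda v: v ** t), Xh)))

X = [[2.0, 0.5], [0.5, 1.0]]
Y = [[1.0, -0.3], [-0.3, 3.0]]

def close(A, B, tol=1e-9):
    return all(abs(A[i][j] - B[i][j]) < tol for i in range(2) for j in range(2))

t, s = 0.3, 0.6
assert close(geo(X, Y, t * s), geo(X, geo(X, Y, s), t))    # (C.7)
assert close(geo(X, Y, 1.0 - t), geo(Y, X, t))             # (C.8)
```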


We are particularly concerned with $t\mapsto X\#_tY$ for $t\in[-1,2]$. Indeed, from the formula in (C.5),
$$X\#_{-1}Y = XY^{-1}X \quad\text{and}\quad X\#_2Y = YX^{-1}Y. \tag{C.9}$$
Let $t\in(0,1)$. By combining the formula
$$X\#_tY = X^{1/2}(X^{-1/2}YX^{-1/2})^tX^{1/2} = X^{1/2}\big(X^{1/2}Y^{-1}X^{1/2}\big)^{-t}X^{1/2}$$
with the integral representation

$$A^{-t} = \frac{\sin(\pi t)}{\pi}\int_0^\infty \frac{\lambda^{-t}}{\lambda + A}\,d\lambda = \frac{\sin(\pi t)}{\pi}\int_0^\infty \frac{\lambda^{t-1}}{1 + \lambda A}\,d\lambda,$$
we obtain, for $t\in(0,1)$,
$$X\#_tY = \frac{\sin(\pi t)}{\pi}\int_0^\infty \lambda^{t-1}\,X^{1/2}\frac{1}{1 + \lambda X^{1/2}Y^{-1}X^{1/2}}X^{1/2}\,d\lambda = \frac{\sin(\pi t)}{\pi}\int_0^\infty \frac{\lambda^{t-1}}{X^{-1} + \lambda Y^{-1}}\,d\lambda. \tag{C.10}$$
The merit of this formula lies in the following lemma [1].

Lemma C.5 (Ando). The function $(A,B)\mapsto (A^{-1}+B^{-1})^{-1}$ is jointly concave on $\mathbf{P}_n$.

Proof. Note that $A^{-1} + B^{-1} = A^{-1}(A+B)B^{-1}$, so that
$$(A^{-1}+B^{-1})^{-1} = B(A+B)^{-1}A = ((A+B) - A)(A+B)^{-1}A = A - A(A+B)^{-1}A,$$
and the claim now follows from the convexity of $(A,B)\mapsto A(A+B)^{-1}A$ [24, 31].

The harmonic mean of positive operators $A$ and $B$, $A : B$, is defined by
$$A : B := 2(A^{-1} + B^{-1})^{-1}, \tag{C.11}$$
and hence Lemma C.5 says that $(A,B)\mapsto A : B$ is jointly concave. Moreover,

(C.10) can be written in terms of the harmonic mean as
$$X\#_tY = \frac{\sin(\pi t)}{2\pi}\int_0^\infty X : (\lambda Y)\,\lambda^{-t-1}\,d\lambda, \tag{C.12}$$
which expresses weighted geometric means as averages over harmonic means. Since $A\mapsto A^{-1}$ is operator monotone decreasing, the map $(X,Y)\mapsto X : Y$ is monotone increasing in each variable, and then by (C.12), this is also true for $(X,Y)\mapsto X\#_tY$. This proves the following result of Ando and Kubo [26].

Theorem C.6 (Ando and Kubo). For all $t\in[0,1]$, $(X,Y)\mapsto X\#_tY$ is jointly concave, and monotone increasing in $X$ and $Y$.
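The joint concavity asserted by Lemma C.5 can be spot-checked in the Loewner order. The following sketch (the $2\times 2$ positive definite test matrices are chosen arbitrarily) verifies midpoint concavity of the harmonic mean (C.11), using the fact that a symmetric $2\times 2$ matrix is positive semidefinite iff its trace and determinant are nonnegative:

```python
def inv2(M):
    """Inverse of a 2x2 matrix."""
    d = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / d, -M[0][1] / d], [-M[1][0] / d, M[0][0] / d]]

def hmean(A, B):
    """A : B = 2 (A^{-1} + B^{-1})^{-1}, formula (C.11)."""
    Ai, Bi = inv2(A), inv2(B)
    Si = inv2([[Ai[i][j] + Bi[i][j] for j in range(2)] for i in range(2)])
    return [[2.0 * Si[i][j] for j in range(2)] for i in range(2)]

def comb(u, A, v, B):
    """The linear combination u*A + v*B."""
    return [[u * A[i][j] + v * B[i][j] for j in range(2)] for i in range(2)]

A1, B1 = [[2.0, 0.3], [0.3, 1.0]], [[1.0, 0.0], [0.0, 2.0]]
A2, B2 = [[1.0, -0.4], [-0.4, 3.0]], [[2.0, 0.5], [0.5, 1.0]]

# ((A1+A2)/2) : ((B1+B2)/2) - (A1:B1)/2 - (A2:B2)/2 should be PSD.
M = comb(1.0, hmean(comb(0.5, A1, 0.5, A2), comb(0.5, B1, 0.5, B2)),
         -1.0, comb(0.5, hmean(A1, B1), 0.5, hmean(A2, B2)))
assert M[0][0] + M[1][1] >= -1e-12                      # trace >= 0
assert M[0][0] * M[1][1] - M[0][1] * M[1][0] >= -1e-12  # determinant >= 0
```

A single midpoint check is of course not a proof; it is only a numerical consistency test of the concavity statement.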


The method of Ando and Kubo can be used to prove joint operator concavity theorems for functions on $\mathbf{P}_n\times\mathbf{P}_n$ that are not connections. The next theorem, due to Fujii and Kamei [17], provides an important example.

Theorem C.7. The map $(X,Y)\mapsto -X^{1/2}\log(X^{1/2}Y^{-1}X^{1/2})X^{1/2}$ is jointly concave.

Proof. The representation

$$\log A = \int_0^\infty \Big(\frac{1}{\lambda+1} - \frac{1}{\lambda+A}\Big)\,d\lambda$$
yields
$$-X^{1/2}\log(X^{1/2}Y^{-1}X^{1/2})X^{1/2} = \int_0^\infty \Big(\frac{1}{\lambda X^{-1} + Y^{-1}} - \frac{X}{\lambda+1}\Big)\,d\lambda,$$
from which the claim follows.

Theorem C.8. For all $t\in[-1,0]\cup[1,2]$, the map $(X,Y)\mapsto X\#_tY$ is jointly convex.

Proof. First suppose that $t\in[-1,0]$. The case $t = 0$ is trivial, and since $X\#_{-1}Y = XY^{-1}X$, which is jointly convex, we may suppose that $t\in(-1,0)$. Let $s = -t$ so that $s\in(0,1)$. We use the integral representation
$$A^s = \frac{\sin \pi s}{\pi}\int_0^\infty \lambda^s\Big(\frac{1}{\lambda} - \frac{1}{\lambda+A}\Big)\,d\lambda,$$
valid for $A\in\mathbf{P}_n$ and $s\in(0,1)$, to obtain
$$X\#_{-s}Y = \frac{\sin \pi s}{\pi}\int_0^\infty \lambda^s\Big(X - \frac{1}{X^{-1} + (\lambda Y)^{-1}}\Big)\frac{d\lambda}{\lambda},$$
which by Lemma C.5 is jointly convex. Finally, the identity $Y\#_{1-t}X = X\#_tY$ shows that the joint convexity for $t\in[1,2]$ follows from the joint convexity for $t\in[-1,0]$.

Acknowledgment

This work was partially supported by U.S. National Science Foundation Grants DMS 1501007, 1746254 and PHY 1265118.

References

[1] T. Ando, Concavity of certain maps on positive definite matrices and applications to Hadamard products, Linear Algebra Appl. 26 (1979) 203–241.
[2] T. Ando and F. Hiai, Log majorization and complementary Golden–Thompson type inequalities, Linear Algebra Appl. 197 (1994) 113–131.
[3] H. Araki, Golden–Thompson and Peierls–Bogoliubov inequalities for a general von Neumann algebra, Comm. Math. Phys. 34 (1973) 167–178.


[4] H. Araki, On an inequality of Lieb and Thirring, Lett. Math. Phys. 19 (1990) 167–170.
[5] R. B. Bapat and V. S. Sunder, On majorization and Schur products, Linear Algebra Appl. 72 (1985) 107–117.
[6] V. P. Belavkin and P. Staszewski, C*-algebraic generalization of relative entropy and entropy, Ann. Inst. H. Poincaré Sect. A 37 (1982) 51–58.
[7] R. Bhatia and J. Holbrook, Riemannian geometry and matrix geometric means, Linear Algebra Appl. 413 (2006) 594–618.
[8] N. N. Bogoliubov, On a variational principle in the many body problem, Sov. Phys. Doklady 3 (1958) 292.
[9] E. A. Carlen and E. H. Lieb, A Minkowski-type trace inequality and strong subadditivity of quantum entropy II: Convexity and concavity, Lett. Math. Phys. 83 (2008) 107–126.
[10] M. D. Choi, Positive linear maps on C*-algebras, Can. J. Math. 24 (1972) 520–529.
[11] M. D. Choi, Completely positive linear maps on complex matrices, Linear Algebra Appl. 10 (1975) 285–290.
[12] I. Csiszár, Information-type measures of difference of probability distributions and indirect observations, Studia Sci. Math. Hungar. 2 (1967) 299–318.
[13] C. Davis, Various averaging operations onto subalgebras, Illinois J. Math. 3 (1959) 528–553.
[14] M. J. Donald, On the relative entropy, Comm. Math. Phys. 105 (1986) 13–34.
[15] J. B. G. Frenk, G. Kassay and J. Kolumbán, On equivalent results in minimax theory, European J. Operat. Res. 157 (2004) 46–58.
[16] S. Friedland and W. So, On the product of matrix exponentials, Linear Algebra Appl. 196 (1994) 193–205.
[17] J. I. Fujii and E. Kamei, Relative operator entropy in noncommutative information theory, Math. Japon. 34 (1989) 341–348.
[18] F. Hansen, Quantum entropy derived from first principles, J. Stat. Phys. 165 (2016) 799–808.
[19] G. H. Hardy, J. E. Littlewood and G. Pólya, Some simple inequalities satisfied by convex functions, Messenger Math. 58 (1929) 145–152.
[20] F. Hiai, Equality cases in matrix norm inequalities of Golden–Thompson type, Linear Multilinear Algebra 36 (1994) 239–249.
[21] F. Hiai, M. Ohya and M. Tsukada, Sufficiency, KMS condition, and relative entropy in von Neumann algebras, Pacific J. Math. 96 (1981) 99–109.
[22] F. Hiai and D. Petz, The proper formula for relative entropy and its asymptotics in quantum probability, Comm. Math. Phys. 143 (1991) 99–114.
[23] F. Hiai and D. Petz, The Golden–Thompson trace inequality is complemented, Linear Algebra Appl. 181 (1993) 153–185.
[24] J. Kiefer, Optimum experimental designs, J. R. Stat. Soc. Ser. B 21 (1959) 272–310.
[25] O. Klein, Zur quantenmechanischen Begründung des zweiten Hauptsatzes der Wärmelehre, Z. Phys. 72 (1931) 767–775.
[26] F. Kubo and T. Ando, Means of positive linear operators, Math. Ann. 246 (1980) 205–224.
[27] H. W. Kuhn and A. W. Tucker, John von Neumann's work in the theory of games and mathematical economics, Bull. Amer. Math. Soc. 64 (1958) 100–122.
[28] S. Kullback and R. A. Leibler, On information and sufficiency, Ann. Math. Stat. 22 (1951) 79–86.
[29] S. Kullback, Lower bound for discrimination information in terms of variation, IEEE Trans. Inform. Theory 13 (1967) 126–127; Correction 16 (1970) 652.

[30] E. H. Lieb, Convex trace functions and the Wigner–Yanase–Dyson conjecture, Adv. Math. 11 (1973) 267–288.
[31] E. H. Lieb and M. B. Ruskai, Some operator inequalities of the Schwarz type, Adv. Math. 12 (1974) 269–273.
[32] G. Lindblad, Expectations and entropy inequalities for finite quantum systems, Comm. Math. Phys. 39 (1974) 111–119.
[33] M. Moakher, A differential geometric approach to the geometric mean of symmetric positive-definite matrices, SIAM J. Matrix Anal. Appl. 26 (2005) 735–747.
[34] J. E. L. Peck and A. L. Dulmage, Games on a compact set, Canad. J. Math. 9 (1957) 450–458.
[35] M. S. Pinsker, Information and Information Stability of Random Variables and Processes (Holden Day, San Francisco, 1964).
[36] W. Pusz and S. L. Woronowicz, Functional calculus for sesquilinear forms and the purification map, Rep. Math. Phys. 8 (1975) 159–170.
[37] W. Pusz and S. L. Woronowicz, Form convex functions and the WYDL and other inequalities, Lett. Math. Phys. 2 (1978) 505–512.
[38] R. T. Rockafellar, Convex Analysis (Princeton University Press, Princeton, 1970).
[39] M. Sion, On general minimax theorems, Pacific J. Math. 8 (1958) 171–175.
[40] L. T. Skovgaard, A Riemannian geometry of the multivariate normal model, Scand. J. Stat. 11 (1984) 211–223.
[41] J. Tropp, From joint convexity of quantum relative entropy to a concavity theorem of Lieb, Proc. Amer. Math. Soc. 140 (2012) 1757–1760.
[42] A. Uhlmann, Relative entropy and the Wigner–Yanase–Dyson–Lieb concavity in an interpolation theory, Comm. Math. Phys. 54 (1977) 21–32.
[43] H. Umegaki, Conditional expectation in an operator algebra, IV (entropy and information), in Kodai Mathematical Seminar Reports (Department of Mathematics, Tokyo Institute of Technology, 1962), pp. 59–85.
[44] J. von Neumann, Zur Theorie der Gesellschaftsspiele, Math. Ann. 100 (1928) 295–320.