ORDERING POSITIVE DEFINITE MATRICES∗
CYRUS MOSTAJERAN† AND RODOLPHE SEPULCHRE†
+ Abstract. We introduce new partial orders on the set Sn of positive definite matrices of + dimension n derived from the affine-invariant geometry of Sn . The orders are induced by affine- invariant cone fields, which arise naturally from a local analysis of the orders that are compatible + with the homogeneous geometry of Sn defined by the natural transitive action of the general linear + group GL(n). We then take a geometric approach to the study of monotone functions on Sn and establish a number of relevant results, including an extension of the well-known L¨owner-Heinz theorem derived using differential positivity with respect to affine-invariant cone fields.
Key words. Positive Definite Matrices, Partial Orders, Monotone Functions, Monotone Flows, Differential Positivity, Matrix Means
AMS subject classifications. 15B48, 34C12, 37C65, 47H05
1. Introduction. Well-defined notions of ordering of elements of a space are of fundamental importance to many areas of applied mathematics, including the theory of monotone functions and matrix means in which orders play a defining role [17, 11,2, 14]. Partial orders play a key part in a wide variety of applications across information geometry where one is interested in performing statistical analysis on sets of matrices. In such applications, the choice of order relation is often taken for granted. This choice, however, is of crucial significance since a function that is not monotone with respect to one order, may be monotone with respect to another. We outline a geometric approach to systematically generate orders on homoge- neous spaces. A homogeneous space is a manifold that admits a transitive action by a Lie group, in the sense that any two points on the manifold can be mapped onto each other by elements of a group of transformations that act on the space. The obser- vation that cone fields induce conal orders on continuous spaces, combined with the geometry of homogeneous spaces forms the basis of the approach taken in this paper. The aim is to generate cone fields that are invariant with respect to the homogeneous geometry, thereby defining partial orders built upon the underlying symmetries of the space. A smooth cone field on a manifold is often also referred to as a causal struc- ture. The geometry of invariant cone fields and causal structures on homogeneous spaces has been the subject of extensive studies from a Lie theoretic perspective; see [18, 13, 12], for instance. Causal structures induced by quadratic cone fields on man- ifolds also play a fundamental role in mathematical physics, in particular within the theory of general relativity [22]. The focus of this paper is on ordering the elements of the set of symmetric pos- + arXiv:1712.02573v5 [math.DG] 3 May 2018 itive definite matrices Sn of dimension n. Positive definite matrices arise in numer- ous applications, including as covariance matrices in statistics and computer vision, as variables in convex and semidefinite programming, as unknowns in fundamental problems in systems and control theory, as kernels in machine learning, and as dif- + fusion tensors in medical imaging. The space Sn forms a smooth manifold that can be viewed as a homogeneous space admitting a transitive action by the general linear
∗ This work was funded by the Engineering and Physical Sciences Research Council (EPSRC) of the United Kingdom, as well as the European Research Council under the Advanced ERC Grant Agreement Switchlet n.670645. †Department of Engineering, University of Cambridge, United Kingdom ([email protected], [email protected]). 1 2 C. MOSTAJERAN, AND R. SEPULCHRE
group GL(n), which endows the space with an affine-invariant geometry as reviewed in Section2. In Section3, this geometry is used to construct affine-invariant cone + fields and new partial orders on Sn . In Section4, we discuss how differential positiv- + ity [9] can be used to study and characterize monotonicity on Sn with respect to the invariant orders introduced in this paper. We also state and prove a generalized ver- sion of the celebrated L¨owner-Heinztheorem [17, 11] of operator monotonicity theory derived using this approach. In Section5, we consider preorder relations induced by + affine-invariant and translation-invariant half-spaces on Sn , and provide examples of functions and flows that preserve such structures. Finally, in Section6, we review the notion of matrix means and establish a connection between the geometric mean and + affine-invariant cone fields on Sn . + + 2. Homogeneous geometry of Sn . The set Sn of symmetric positive definite matrices of dimension n has the structure of a homogeneous space with a transitive + GL(n)-action. The transitive action of GL(n) on Sn is given by congruence transfor- mations of the form
(1) τ :Σ AΣAT A GL(n), Σ S+. A 7→ ∀ ∈ ∀ ∈ n + 1/2 1/2 Specifically, if Σ1, Σ2 Sn , then τA with A = Σ2 Σ1− GL(n) maps Σ1 onto Σ2, where Σ1/2 denotes the∈ unique positive definite square root∈ of Σ. This action is said to be almost effective in the sense that I are the only elements of GL(n) that fix + ± every Σ Sn . The isotropy group of this action at Σ = I is precisely the orthogonal ∈ T group O(n), since τQ : I QIQ = I if and only if Q O(n). Thus, we can identify any Σ S+ with an element7→ of the quotient space GL(∈n)/O(n). That is ∈ n + (2) Sn ∼= GL(n)/O(n). + The identification in (2) can also be made by noting that Σ Sn admits a Cholesky decomposition Σ = CCT for some C GL(n). The Cauchy∈ polar decomposition of the invertible matrix C yields a unique∈ decomposition C = PQ of C into an orthogonal n matrix Q O(n) and a symmetric positive definite matrix P S+. Now note that if Σ has Cholesky∈ decomposition Σ = CCT and C has a Cauchy∈ polar decomposition C = PQ, then Σ = P QQT P = P 2. That is, Σ is invariant with respect to the + orthogonal part Q of the polar decomposition. Therefore, we can identify any Σ Sn with the equivalence class [Σ1/2] = Σ1/2 O(n) in the quotient space GL(n)/O(n∈). · n n Recall that the Lie algebra gl(n) of GL(n) consists of the set R × of all real n n matrices equipped with the Lie bracket [X,Y ] = XY YX, while the Lie algebra× of n n T − n n O(n) is o(n) = X R × : X = X . Since any matrix X R × has a unique { ∈1 T −1 } T ∈ decomposition X = 2 (X X ) + 2 (X + X ), as a sum of an antisymmetric part − n n T and a symmetric part, we have gl(n) = o(n) m, where m = X R × : X = X . 1 ⊕T { ∈ } Furthermore, since AdQ(S) = QSQ− = QSQ is a symmetric matrix for each S m, we have ∈
(3) Ad m m, O(n) ⊆ + which shows that Sn = GL(n)/O(n) is in fact a reductive homogeneous space with reductive decomposition gl(n) = o(n) m. Also, note that since (XY YX)T = Y T XT XT Y T , we have [o(n), o(n)] ⊕o(n), [m, m] o(n), and [o(n), m−] m. The − + + ⊆ ⊆ ⊆ tangent space ToSn of Sn at the base-point o = [I] = I O(n) is identified with m. + + + · For each Σ S , the action τ 1/2 : S S induces the vector space isomorphism ∈ n Σ n → n ORDERING POSITIVE DEFINITE MATRICES 3
+ + dτ 1/2 : T S T S given by Σ |I I n → Σ n 1/2 1/2 (4) dτ 1/2 X = Σ XΣ , X m. Σ I ∀ ∈ + The map (4) can be used to extend structures defined in ToSn to structures defined + on the tangent bundle TSn through affine-invariance, provided that the structures + in ToSn are AdO(n)-invariant. The AdO(n)-invariance is required to ensure that the + extension to TSn is unique and thus well-defined. For instance, any homogeneous + Riemannian metric on Sn ∼= GL(n)/O(n) is determined by an AdO(n)-invariant inner product on m. Any such inner product induces a norm that is rotationally invariant and so can only depend on the scalar invariants tr(Xk) where k 1 and X m. Moreover, as the inner product is a quadratic function, X 2 must be≥ a linear combi-∈ 2 2 k k nation of (tr(X)) and tr(X ). Thus, any AdO(n)-invariant inner product on m must be a scalar multiple of
(5) X,Y m = tr(XY ) + µ tr(X) tr(Y ), h i where µ is a scalar parameter with µ > 1/n to ensure positive-definiteness [21]. Therefore, the corresponding affine-invariant− Riemannian metrics are generated by (4) and given by
1/2 1/2 1/2 1/2 X,Y Σ = Σ− XΣ− , Σ− Y Σ− m h i h 1 1 1 i 1 (6) = tr(Σ− XΣ− Y ) + µ tr(Σ− X) tr(Σ− Y ),
+ + for Σ Sn and X,Y TΣSn . In the case µ = 0, (6) yields the most commonly used ∈ ∈ + ‘natural’ Riemannian metric on Sn , which corresponds to the Fisher information metric for the multivariate normal distribution [8, 23], and has been widely used in applications such as tensor computing in medical imaging [4]. 3. Affine-invariant orders. + 3.1. Affine-invariant cone fields. A cone field on Sn smoothly assigns a + + K cone (Σ) TΣSn to each point Σ Sn . In this paper, we consider a cone to be a solidK and⊂ pointed subset of a vector∈ space that is closed under linear combinations with positive coefficients. We say that is affine-invariant or homogeneous with + K respect to the quotient geometry Sn ∼= GL(n)/O(n) if (7) dτ (Σ) = (τ (Σ)), A ΣK K A + for all Σ Sn and A GL(n). The procedure we will use for constructing affine- ∈ ∈ + invariant cone fields on Sn is similar to the approach taken for generating the affine- invariant Riemannian metrics in Section2. We begin by defining a cone (I) at I K that is AdO(n)-invariant:
(8) X (I) Ad X = dτ X = QXQT (I), Q O(n). ∈ K ⇐⇒ Q Q I ∈ K ∀ ∈
Using such a cone, we generate a cone field via + 1/2 1/2 (9) (Σ) = dτ 1/2 (I) = X T S :Σ− XΣ− (I) . K Σ I K { ∈ Σ n ∈ K }
The AdO(n)-invariance condition (8) is satisfied if (I) has a spectral characterization; K + m that is, we can check to see if any given X TI Sn ∼= lies in (I) using only properties of X that are characterized by its spectrum.∈ This observationK leads to the following result. 4 C. MOSTAJERAN, AND R. SEPULCHRE
+ Proposition 3.1. A cone (I) TI Sn is AdO(n)-invariant if and only if there n K ∈ exists a cone Λ R that satisfies K ⊂ (10) λ P λ , ∈ KΛ ⇐⇒ ∈ KΛ n n for all permutation matrices P R × , such that X (I) whenever λX Λ, ∈ ∈ K ∈ K where λX = (λi(X)) is a vector consisting of the n real eigenvalues of the symmetric matrix X. For instance, tr(X) and tr(X2) are both functions of X that are spectrally character- ized and indeed AdO(n)-invariant. Quadratic AdO(n)-invariant cones are defined by inequalities on suitable linear combinations of (tr(X))2 and tr(X2). Proposition 3.2. For any choice of parameter µ (0, n), the set ∈ (11) (I) = X T S+ : (tr(X))2 µ tr(X2) 0, tr(X) 0 , K { ∈ I n − ≥ ≥ } + n n T defines an AdO(n)-invariant cone in TI S = X R × : X = X . n { ∈ } 2 T T Proof. AdO(n)-invariance is clear since tr(X ) = tr(QXQ QXQ ) and tr(X) = tr(QXQT ) for all Q O(n). To prove that (11) is a cone, first note that 0 (I) and for λ > 0, X ∈(I), we have λX (I) since tr(λX) = λ tr(X) 0 and∈ K ∈ K ∈ K ≥ (12) (tr(λX))2 µ tr((λX)2) = λ2[(tr(X))2 µ tr(X2)] 0. − − ≥ To show convexity, let X ,X (I). Now tr(X + X ) = tr(X ) + tr(X ) 0, and 1 2 ∈ K 1 2 1 2 ≥ (tr(X + X ))2 µ tr((X + X )2) = [(tr(X ))2 µ tr(X2)] 1 2 − 1 2 1 − 1 (13) + [(tr(X ))2 µ tr(X2)] + 2[tr(X ) tr(X ) µ tr(X X )] 0, 2 − 2 1 2 − 1 2 ≥ 2 1 2 1 1 1 since tr(X X ) (tr(X )) 2 (tr(X )) 2 tr(X ) tr(X ), where the first in- 1 2 ≤ 1 2 ≤ √µ 1 √µ 2 equality follows by Cauchy-Schwarz. Finally, we need to show that (I) is pointed. If X (I) and X (I), then tr( X) = tr(X) = 0. Thus, (tr(XK ))2 µ tr(X2) = µ∈tr( KX2) 0,− which∈ K is possible if− and only− if all of the eigenvalues of−X are zero; −i.e., if and only≥ if X = 0. The parameter µ controls the opening angle of the cone. If µ = 0, then (11) defines the half-space tr(X) 0. As µ increases, the opening angle of the cone becomes smaller and for µ = n≥(11) collapses to a ray. For each µ (0, n), the cone µ n ∈ Λ = R of Proposition 3.1 is given by K KΛ ⊂ n 2 n n µ n 2 (14) = λ = (λi) R : λi µ λ 0, λi 0 , KΛ ∈ − i ≥ ≥ ( i=1 ! i=1 i=1 ) X X X since tr(X) = n λ (X) and tr(X2) = n λ2(X). Indeed µ is a quadratic cone i=1 i i=1 i KΛ µ n T T (15) P = λ R : λ PQµλ 0, 1 λ 0 , KΛ { ∈ ≥ ≥ } T n where 1 = (1, , 1) R , and Qµ is the n n matrix with entries (Qµ)ii = 1 µ and (Q ) = 1··· for i =∈j. × − µ ij 6 The dual cone C∗ of a subset C of a vector space is a very important notion in convex analysis. For a vector space endowed with an inner product , , the dual V h· ·i cone can be defined as C∗ = y : y, x 0, x C . A cone is said to be self-dual if it coincides with its{ dual∈ V cone.h Iti is ≥ well-known∀ ∈ that} the cone of positive semidefinite matrices is self-dual. The following lemma will be used to characterize µ the form of the dual cone ( )∗ for each µ (0, n) with respect to the standard inner KΛ ∈ product on Rn. ORDERING POSITIVE DEFINITE MATRICES 5
Lemma 3.3. The dual cone of the quadratic cone defined by (15) with respect to the standard inner product on Rn is given by
µ n T 1 T (16) ( )∗ = λ R : λ Q− λ 0, 1 λ 0 . KΛ { ∈ µ ≥ ≥ } 1 The inverse matrix Qµ− is given by
µ (n 1) − − 1 µ(n µ) i = j, (17) (Qµ− )ij = 1− ( µ(n µ) i = j. − 6
T 1 Since µ(n µ) > 0 and µ (n 1) = 1 µ∗ where µ∗ = n µ, we find that λ Qµ− λ 0 − T − − − − ≥ if and only if λ Q ∗ λ 0. That is, µ ≥ µ n µ (18) ( )∗ = − . KΛ KΛ
We notice of course from (18) that AdO(n)-invariant cones are generally not self-dual. Indeed, for quadratic AdO(n)-invariant cones, self-duality is only achieved for µ = n/2. Now for any fixed µ (0, n), we obtain a unique well-defined affine-invariant cone field given by ∈
+ 1 2 1 1 1 (19) (Σ) = X T S : (tr(Σ− X)) µ tr(Σ− XΣ− X) 0, tr(Σ− X) 0 . K { ∈ Σ n − ≥ ≥ } Note that for the value µ = 0, (19) reduces to the affine-invariant half-space field + 1 X TΣSn : tr(Σ− X) 0 . At the other extreme, for µ = n, it is easy to show { ∈ ≥ } + that the set at I is given by the ray X TI Sn : X = λI, λ 0 . By affine- + { ∈ ≥ } invariance, (19) reduces to X TΣSn : X = λΣ, λ 0 for µ = n, which describes { ∈ + ≥ } an affine-invariant field of rays in Sn . It should be noted that of course not all AdO(n)-invariant cones at I are quadratic. Indeed, it is possible to construct polyhedral AdO(n)-invariant cones that arise as the + intersections of a collection of spectrally defined half-spaces in TI Sn . The clearest + example of such a construction is the cone of positive semidefinite matrices in TI Sn , + which of course itself has a spectral characterization (I) = X TI Sn : λi(X) 0, i = 1, . . . , n . K { ∈ ≥ } + 3.2. Affine-invariant pseudo-Riemannian structures on Sn . At this point it is instructive to note the following systematic analysis of all affine-invariant pseudo- + Riemannian structures on Sn before continuing with our treatment of affine-invariant cone fields. This elegant characterization presents the affine-invariant Riemannian metrics of (6) and the quadratic affine-invariant cone fields of (19) within a unified and rigorous mathematical framework. Recall that a pseudo-Riemannian metric is a generalization of a Riemannian metric in which the metric tensor need not be positive definite, but need only be a non-degenerate, smooth, symmetric bilinear form. The signature of such a metric tensor is defined as the ordered pair consisting of the number of positive and negative eigenvalues of the real and symmetric matrix of the metric tensor with respect to a basis. Note that the signature of a metric tensor is independent of the choice of basis by Sylvester’s law of inertia. A metric tensor on a smooth manifold is called Lorentzian if its signature is (1, dim 1). M M − The irreducible decomposition of m under the AdO(n)-action is given by m = RI m0, where m0 := X m : tr X = 0 . According to this decomposition, we have⊕X = tr X I π(X){ for any∈ X m, where} π(X) := X tr X I m . Denote by n ⊕ ∈ − n ∈ 0 6 C. MOSTAJERAN, AND R. SEPULCHRE
2 X,Y std the standard inner product tr(XY ) on m, and let X std := X,X std be hthe correspondingi norm. Then we have k k h i
(tr X)2 (20) tr(X2) = X 2 = + π(X) 2 . k kstd n k kstd
Now since m0 is an irreducible AdO(n)-module, any AdO(n)-invariant quadratic form 2 on m0 is simply a scalar multiple of std by Schur’s lemma. Therefore, any AdO(n)- invariant quadratic form on m is ofk the · k form
(tr X)2 (21) Q (X) := α + β π(X) 2 , αβ n k kstd
with α, β R. Clearly, Qαβ is positive definite if and only if α > 0 and β > 0. ∈ Moreover, if α > 0 and β < 0, then Qαβ is Lorentzian and the set X m : Q (X) 0, tr X 0 defines a pointed cone. Noting that { ∈ αβ ≥ ≥ } 1 (22) tr(XY ) + µ tr(X) tr(Y ) = µ + tr(X) tr(Y ) + π(X), π(Y ) , n h istd for each X,Y m, we confirm that the metrics in (6) are indeed positive definite if and only if µ >∈ 1/n. Similarly, we find that − n µ (23) (tr X)2 µ tr(X2) = − (tr X)2 µ π(X) 2 , − n − k kstd which is Lorentzian if and only if 0 < µ < n. Thus, we see that the affine-invariant + pseudo-Riemannian structures on Sn are essentially either Riemannian or Lorentzian, and the quadratic cone fields in (19) are precisely the cone fields defined by the affine- invariant Lorentzian metrics. + 3.3. Affine-invariant partial orders on Sn . A smooth cone field on a manifold gives rise to a conal order on , defined by x y if thereK exists a M ≺K M ≺K (piecewise) smooth curve γ : [0, 1] with γ(0) = x, γ(1) = y and γ0(t) (γ(t)) whenever the derivative exists. The→ Mclosure of this order is again an order∈ K and satisfies x y if and only if y z : x z≤.K We say that is globally orderable if is a≤ partialK order. Here we∈ will{ prove≺K that} the conal ordersM induced by affine- ≤K + invariant cone fields on Sn define partial orders. That is, we will show that the conal orders satisfy the antisymmetry property that Σ1 Σ2 and Σ2 Σ1 together ≤K + ≤K imply Σ1 = Σ2, for any affine-invariant cone field on Sn . In other words, we K + will prove that there do not exist any non-trivial closed conal curves in Sn . In the following, we will make use of the preimage theorem [3] given below. Recall that given a smooth map F : between manifolds, we say that a point y is a regular M → N 1 ∈ N value of F if for all x F − (y) the map dF : T T is surjective. ∈ |x xM → yN Theorem 3.4 (The preimage theorem). Let F : be a smooth map of manifolds, with dim = m and dim = n. If x Mis→ a regular N value of F , then 1 M N ∈ N F − (c) is a submanifold of of dimension m n. Moreover, the tangent space of 1 M − F − (c) at x is equal to ker(dF ). |x Now define F : S+ R by F (Σ) = det Σ. By Jacobi’s formula, the differential n → of the determinant takes the form d(det) ΣX = tr (adj(Σ)X) , where adj(Σ) denotes the adjugate of Σ. That is, |
1 (24) dF X = (det Σ) tr Σ− X , |Σ ORDERING POSITIVE DEFINITE MATRICES 7
+ 1 for all X TΣSn . Note that for c > 0 and any Σ F − (c), we have dF ΣI = 1 ∈ ∈ | c tr Σ− > 0, which clearly shows that any c > 0 is a regular value of F . Hence, 1 F − (c) is a submanifold of codimension 1 for any choice of c > 0. Furthermore, as + 1 im(F ) = R = c R : c > 0 , the collection of submanifolds F − (c) c>0 forms a + { ∈ } + { } foliation of Sn . Since det Σ > 0 for any Σ Sn ,(24) implies that ker(dF Σ) = X + 1 ∈ | 1 { ∈ TΣSn : tr(Σ− X) = 0 . Thus, the tangent spaces to the submanifolds F − (c) c>0 } + { } are described by the affine-invariant distribution Σ of rank dim Sn 1 = n(n+1)/2 1 + + 1 D − − on S defined by := X T S : tr(Σ− X) = 0 . n DΣ { ∈ Σ n } + Proposition 3.5. If γ : [0, 1] Sn is a non-trivial conal curve with respect to a quadratic affine-invariant cone field→ (19), then K (25) t > t = det(γ(t )) > det(γ(t )), 2 1 ⇒ 2 1 for t , t [0, 1]. 1 2 ∈ 1 Proof. First note that X (Σ) 0 implies that tr(Σ− X) > 0. This follows by 1 ∈ K \{ }1 1 1/2 1/2 2 noting that if tr(Σ− X) = 0, then tr(Σ− XΣ− X) = tr[(Σ− XΣ− ) ] 0, which is a contradiction. For simplicity, we assume that γ is a non-trivial smooth conal≤ curve. 1 The proof for a piecewise smooth curve is similar. We then have tr(γ(t)− γ0(t)) > 0, which implies that
d 1 (26) det γ(t) = (det γ(t)) tr γ(t)− γ0(t) > 0. dt
+ Proposition 3.5 clearly implies that Sn equipped with any of the cone fields described by (19) does not admit any non-trivial closed conal curves. Indeed, this result holds for all affine-invariant cone fields, not just quadratic ones. To see this, note 1 that the permutation symmetry (10) of Proposition 3.1, implies that tr(Σ− X) = 0 6 whenever X (Σ) 0 . It thus follows by (26) that det γ : [0, 1] R+ is a strictly monotone function∈ K for\{ any} non-trivial conal curve γ, which◦ rules out→ the existence of closed conal curves. We thus arrive at the following theorem. + Theorem 3.6. All affine-invariant conal orders on Sn are partial orders. At this point it is worth noting a few interesting features of the collection of 1 + submanifolds F − (c) c>0 of Sn . First note that if γ is an inextensible conal curve, { } 1 then by (26) it must intersect each of the submanifolds F − (c) exactly once. That 1 is, for each c > 0, F − (c) defines a Cauchy surface for the causal structure induced by any affine-invariant cone field. We also note the following results which connect + these submanifolds to geodesics on Sn with respect to the standard affine-invariant 2 1 2 + Riemannian metric ds = tr[(Σ− dΣ) ] on Sn . + Proposition 3.7. Endow Sn with the Riemannian structure defined by the stan- 2 1 2 dard Riemannian metric ds = tr[(Σ− dΣ) ]. We have the following results. + i) If Σ1, Σ2 Sn satisfy det Σ1 = det Σ2 = c, then the geodesic from Σ1 to Σ2 lies 1 ∈ in F − (c). + 1 ii) If X TΣSn satisfies tr(Σ− X) = 0, then the geodesic through Σ in the direction ∈ 1 of X stays on the submanifold F − (det Σ). + Proof. i) Let Σ1, Σ2 Sn satisfy det Σ1 = det Σ2. The geodesic γ from Σ1 to Σ2 is given by ∈
1/2 1/2 1/2 1/2 (27) γ(t) = Σ1 exp t log Σ1− Σ2Σ1− Σ1 . 8 C. MOSTAJERAN, AND R. SEPULCHRE
1/2 1/2 Thus, det(γ(t)) = (det Σ1) det(exp(t log(Σ1− Σ2Σ1− )). Using the matrix identity log(det A) = tr(log A), we find that
1 1 1 1 − 2 − 2 − 2 − 2 log det exp t log Σ1 Σ2Σ1 = tr log exp t log Σ1 Σ2Σ1 h i h 1 1 i − 2 − 2 (28) = t tr log Σ1 Σ2Σ1 (29) = t log (det Σ2/ det Σ1) = 0.
1/2 1/2 Therefore, det(exp(t log(Σ1− Σ2Σ1− )) = 1, which implies that det(γ(t)) = det Σ1 for all t R. ∈ + ii) The geodesic γ from Σ in the direction of X TΣSn takes the form γ(t) = 1/2 1/2 1/2 1/2 1 ∈ Σ exp(tΣ− XΣ− )Σ . If tr(Σ− X) = 0, then
1/2 1/2 1/2 1/2 1 (30) log(det(exp(tΣ− XΣ− ))) = tr(tΣ− XΣ− ) = t tr(Σ− X) = 0,
1/2 1/2 which implies that det(γ(t)) = (det Σ) det(exp(tΣ− XΣ− )) = det Σ for all t ∈ R. 3.4. Causal semigroups. Define a wedge to be a closed and convex subset of a vector space that is also invariant with respect to scaling by positive numbers. Notice in particular that a wedge need not be pointed. Let = G/H be a homogeneous space, G a Lie group with group identity element e andM Lie algebra g, H a closed subgroup with Lie algebra h, and π : G the associated projection map. Assume that the Lie algebra g contains a wedge→ MW such that (i) W W = h and (ii) Ad(h)W = W for all h H. A wedge W is said to be a Lie wedge∩ − if ead hW = W for all h W W . Denoting∈ the left action of G on by τ : , we have ∈ ∩ − M g M → M π λg = τg π, where λg is the left multiplication with g on G. Conditions (i) and (ii◦) ensure that◦ dπ dλ W only depends on π(g), so that |g ◦ g|e (31) (π(g)) = (dπ dλ ) W, K |g ◦ g|e yields a well-defined field of pointed cones on that is invariant under the action of G on : dτ (x) = (τ (x)). These resultsM can be found in [12]. The set M g|xK K g S = g G : o τg(o) , where o = π(e), is a closed semigroup of G referred to as the causal{ ∈ semigroup≤K of (} , G, ). The following theorem is derived from [18]. M K 1 Theorem 3.8. Let S = exp W H G, then S = π− ( x : o x ) and is globally orderable with respecth i to ⊆if and only if W = L{ (S∈), M where≤K } M K + (32) L(S) = Z g : exp(R Z) S . { ∈ ⊆ } + The affine-invariant cone fields on Sn = GL(n)/O(n) can be viewed as projections of invariant wedge fields on the Lie group GL(n) in the sense of the above results. Since we have the reductive decomposition gl(n) = o(n) m, it is easy to construct the corresponding wedge field W that satisfies conditions (⊕i) and (ii) for a given affine- invariant cone field . We will now use this structure and Theorem 3.8 to prove the following importantK result. + Theorem 3.9. Let Sn be equipped with an affine-invariant cone field and the 2 1 2 K standard affine-invariant Riemannian metric ds = tr[(Σ− dΣ) ]. For any pair of + matrices Σ1, Σ2 Sn , we have Σ1 Σ2 if and only if the geodesic from Σ1 to Σ2 is a conal curve. ∈ ≤K ORDERING POSITIVE DEFINITE MATRICES 9
Proof. Note that the expression of the geodesic from Σ1 to Σ2 given in (27) implies that this theorem is equivalent to
1/2 1/2 (33) Σ1 Σ2 log Σ1− Σ2Σ1− (I). ≤K ⇐⇒ ∈ K 1/2 1/2 As is affine-invariant, Σ1 Σ2 is equivalent to I Σ1− Σ2Σ1− . Thus, it is sufficientK to prove that ≤K ≤
(34) I Σ log (Σ) (I), ≤K ⇐⇒ ∈ K for any Σ S+. We define a wedge W in gl(n) by ∈ n (35) W := X + Y : X (I),Y o(n) gl(n) = m o(n), { ∈ K ∈ } ⊂ ⊕ m + where (I) is viewed as a subset of ∼= TI Sn . Note that (35) ensures that W satisfies the propertiesK required of it in Theorem 3.8. If I Σ, it follows from Theorem 3.8 that there exists A W such that ≤K ∈ T (36) Σ = π(exp A) = τexp A(I) = (exp A)(exp A) .
By the polar decomposition theorem of [16], any element g = exp A of the semigroup S = exp W O(n) GL(n) admits a unique decomposition as g = (exp X)Q with X Wh m andi Q ⊂ O(n). Thus, we have ∈ ∩ ∈
(37) Σ = τg(I) = τexp X (I) = exp 2X, so that log Σ = 2X (I). ∈ K Remark 1. Let be a quadratic affine-invariant cone field described by (19). K + Given a pair Σ1, Σ2 Sn , we have by Theorem 3.9 that Σ1 Σ2 if and only if 1/2 1/2 ∈ ≤K log Σ− Σ Σ− (I), which is equivalent to 1 2 1 ∈ K 1/2 1/2 tr log(Σ1− Σ2Σ1− ) 0, (38) ≥2 1/2 1/2 1/2 1/2 2 tr(log(Σ− Σ Σ− )) µ tr (log(Σ− Σ Σ− )) 0. 1 2 1 − 1 2 1 ≥ h i 1/2 1/2 1 Since Σ1− Σ2Σ1− and Σ2Σ1− have the same spectrum, (38) can be written as
1 tr log(Σ Σ− ) 0, (39) 2 1 ≥ 1 2 1 2 ( tr(log(Σ2Σ1− )) µ tr (log(Σ2Σ1− )) 0, − ≥ which has the virtue of not involving square roots of Σ1 and Σ2. Equation (39) in turn is equivalent to
log λ 0, (40) i i ≥ ( log λ )2 µ (log λ )2 0, (P i i − i i ≥ 1 P P where λi = λi(Σ2Σ1− )(i = 1, ..., n) denote the n real and positive eigenvalues of 1 Σ2Σ1− . We have thus used invariance to reduce the question of whether a pair of positive definite matrices Σ1 and Σ2 are ordered with respect to any of the quadratic 1 affine-invariant cone fields to a pair of inequalities involving the spectrum of Σ2Σ1− . 10 C. MOSTAJERAN, AND R. SEPULCHRE
2 2 2 z 0 z x y 0 λ2 + ≥ z − − ≥ S2
µ + Λ (a) (I) TI S K I K ⊂ n (b) y λ1 x
+ Fig. 1.(a) Identification of S2 with the interior of the closed, convex, pointed cone K = 3 2 2 2 3 + {(x, y, z) ∈ R : z − x − y ≥ 0, z ≥ 0} in R . The AdO(n)-invariant cone K(I) ⊂ TI Sn at µ 2 identity is also shown for a choice of µ ∈ (0, 1). (b) The corresponding spectral cone KΛ ⊂ R which + characterizes the cone K(I) ⊂ TI Sn .
+ 3.5. Visualization of affine-invariant cone fields on S2 . It is well-known that the set of positive semidefinite matrices of dimension n forms a cone in the + space of symmetric n n matrices. Moreover, Sn forms the interior of this cone. A concrete visualization× of this identification can be made in the n = 2 case, as + shown in Figure1( a). The set S2 can be identified with the interior of the set 3 2 2 2 + K = (x, y, z) R : z x y 0, z 0 , through the bijection φ : S2 int K given{ by ∈ − − ≥ ≥ } →
a b 1 1 (41) φ : (x, y, z) = √2b, (a c), (a + c) . b c 7→ √2 − √2 Inverting φ, we find that a = (z + y)/√2, b = x/√2, c = (z y)/√2. Note that − + the point (x, y, z) = (0, 0, √2) corresponds to the identity matrix I S2 . We seek to arrive at a visual representation of the affine-invariant cone fields∈ generated from the AdO(n)-invariant cones (11) for different choices of the parameter µ. The defining inequalities tr(X) 0 and (tr(X))2 µ tr(X2) 0 in T S+ take the forms ≥ − ≥ I 2 2 (42) δz 0, and 1 δz2 δx2 δy2 0, ≥ µ − − − ≥ + respectively, where (δx, δy, δz) T(0,0,√2)K ∼= TI S2 . The corresponding spectral µ ∈ cone R2 is given by KΛ ⊂ (43) λ + λ 0, and (λ + λ )2 µ(λ2 + λ2) 0. 1 2 ≥ 1 2 − 1 2 ≥ See Figure1( b) for an illustration of such a cone for a choice of µ (0, 1). Clearly the translation invariant cone fields generated from this∈ cone are given + by the same equations as in (42) for (δx, δy, δz) T(x,y,z)K ∼= TΣS2 , where φ(Σ) = ∈ 1 + (x, y, z). To obtain the affine-invariant cone fields, note that at Σ = φ− (x, y, z) S2 , 1 ∈ the inequality tr(Σ− X) 0 takes the form ≥ c b δa δb tr − = c δa 2b δb + a δc 0(44) b a δb δc − ≥ − (45) z δz x δx y δy 0. ⇐⇒ − − ≥ ORDERING POSITIVE DEFINITE MATRICES 11
(a) µ>1 µ =1 µ<1
(b)
+ Fig. 2. Cone fields on S2 : (a) Quadratic affine-invariant cone fields for different choices of the parameter µ ∈ (0, 2). (b) The corresponding translation-invariant cone fields.
1 2 1 1 Similarly, the inequality (tr(Σ− X)) µ tr(Σ− XΣ− X) 0 is equivalent to − ≥ 2(x δx + y δy z δz)2 µ (z2 + x2 y2)δx2 + (z2 x2 y2)δy2 − − − − − (46) + (x2 + y2 + z2)δz2 + 4xy δxδy 4xz δxδz 4yz δyδz] 0, − − ≥ where (δx, δy, δz) T K = T S+. In the case µ = 1, this reduces to ( 2 1)δz2 ∈ (x,y,z) ∼ Σ 2 µ − − δx2 δy2 0. Thus, for µ = 1 the quadratic cone field generated by affine-invariance coincides− with≥ the corresponding translation-invariant cone field. Generally, however, affine-invariant and translation-invariant cone fields do not agree, as depicted in Figure + 2. Each of the distinct cone fields in Figure2 induces a distinct partial order on Sn . + 3.6. The L¨ownerorder. The L¨owner order is the partial order L on Sn defined by ≥
(47) A B A B O, ≥L ⇐⇒ − ≥L where the inequality on the right denotes that A B is positive semidefinite [5]. The − + definition in (47) is based on translations and the ‘flat’ geometry of Sn . It is clear that the L¨ownerorder is translation invariant in the sense that A L B implies that + ≥ A + C L B + C for all A, B, C Sn . From the perspective of conal orders, the L¨ownerorder≥ is the partial order induced∈ by the cone field generated by translations + of the cone of positive semidefinite matrices at TI Sn . In the previous section, we gave an explicit construction showing that the cone field generated through translations of the cone of positive semidefinite matrices at + TI Sn coincides with the cone field generated through affine-invariance in the n = 2 case. We will now show that this is a general result which holds for all n. First note + that the cone at TI Sn can be expressed as
+ T n T (48) (I) = X TI S : u Xu 0 u R , u Xu = 0 u = 0 , K { ∈ n ≥ ∀ ∈ ⇒ } and the resulting translation-invariant cone field is simply given by
+ T n T (49) T (Σ) = X TΣS : u Xu 0 u R , u Xu = 0 u = 0 . K { ∈ n ≥ ∀ ∈ ⇒ } 12 C. MOSTAJERAN, AND R. SEPULCHRE
The corresponding affine-invariant cone field is given by
+ T 1/2 1/2 n A(Σ) = X TΣS : u Σ− XΣ− u 0 u R , K { ∈ n ≥ ∀ ∈ T 1/2 1/2 (50) u Σ− XΣ− u = 0 u = 0 , ⇒ }
which is seen to be equal to T by introducing the invertible transformationu ¯ = 1/2 K Σ− u in (50). Thus we see that the L¨ownerorder enjoys the special status of being both affine-invariant and translation-invariant, even though its classical definition is + based on the ‘flat’ or translational geometry on Sn . + 4. Monotone functions on Sn . + 4.1. Differential positivity. Let f be a map of Sn into itself. We say that f is monotone with respect to a partial order on S+ if f(Σ ) f(Σ ) whenever ≥ n 1 ≥ 2 Σ1 Σ2. Such functions were introduced by L¨ownerin his seminal paper [17] on oper- ator≥ monotone functions. Since then operator monotone functions have been studied extensively and found applications to many fields including electrical engineering [1], network theory, and quantum information theory [6, 19]. Monotonicity of mappings and dynamical systems with respect to partial orders induced by cone fields have a local geometric characterization in the form of differential positivity [9]. A smooth + + map f : Sn Sn is said to be differentially positive with respect to a cone field + → + K+ on Sn if df Σ(δΣ) (f(Σ)) whenever δΣ (Σ), where df Σ : TΣSn Tf(Σ)Sn denotes the| differential∈ K of f at Σ. Assuming∈ that K is a partial| order→ induced by , then f is monotone with respect to if and only≥K if it is differentially positive K ≥K with respect to . To see this, recall that Σ2 Σ1 means that there exists some K + ≥K conal curve γ : [0, 1] Sn such that γ(0) = Σ1, γ(1) = Σ2 and γ0(t) (γ(t)) for → + + ∈ K all t (0, 1). Now f γ : [0, 1] Sn is a curve in Sn with (f γ)(0) = f(Σ1), (f γ∈)(1) = f(Σ ), and◦ → ◦ ◦ 2
(51) (f γ)0(t) = df γ0(t). ◦ |γ(t)
Hence, f γ is a conal curve joining f(Σ1) to f(Σ2) if and only if df γ(t) (γ(t)) (f(γ(t)).◦ | K ⊆ K 4.2. The Generalized L¨owner-HeinzTheorem. One of the most fundamen- tal results in operator theory is the L¨owner-Heinztheorem [17, 11] stated below. Theorem 4.1 (L¨owner-Heinz). If Σ Σ in S+ and r [0, 1], then 1 ≥L 2 n ∈ (52) Σr Σr. 1 ≥L 2 Furthermore, if n 2 and r > 1, then Σ Σ Σr Σr. ≥ 1 ≥L 2 6⇒ 1 ≥L 2 There are several different proofs of the L¨owner-Heinztheorem. See [5, 20, 17, 11], for instance. Most of these proofs are based on analytic methods, such as integral representations from complex analysis. Instead we employ a geometric approach to study monotonicity based on a differential analysis of the system. One of the advan- tages of such an approach is that it is immediately applicable to all of the conal orders considered in this paper, while providing geometric insight into the behavior of the map under consideration. By using invariant differential positivity with respect to the family of affine-invariant cone fields in (19), we arrive at the following extension to the L¨owner-Heinztheorem. ORDERING POSITIVE DEFINITE MATRICES 13
Theorem 4.2 (Generalized L¨owner-Heinz). For any of the affine-invariant par- tial orders induced by the quadratic cone fields (19) parametrized by µ, the map f (Σ) = Σr is monotone on S+ for any r [0, 1]. r n ∈ r This result suggests that the monotonicity of the map fr :Σ Σ for r (0, 1) is + 7→ ∈ intimately connected to the affine-invariant geometry of Sn and not its translational geometry. The structure of the proof of Theorem 4.2 is as follows. We first prove that 1/p the map f1/p :Σ Σ is monotone for any p N. We then extend this result to 7→q/p ∈ maps fq/p :Σ Σ for rational numbers q/p Q (0, 1), before arriving at the full result via a7→ density argument. We prove monotonicty∈ ∩ by establishing differential 1/p positivity in each case. To prove the monotonicity of f1/p :Σ Σ , p N, we only need the following lemma [24]. 7→ ∈ Lemma 4.3. If A and B are Hermitian n n matrices, then × 2m 2m 2m (53) tr[(AB) ] tr[A B ], m N. ≤ ∈ The proof of the theorem for rational exponents is based on a simple observation whose proof nonetheless requires a few technical steps that are based on Proposition 4.5, which itself relies on Lemma 4.4 established in [7, 10]. Lemma 4.4. Let F,G be real-valued functions on some domain D R and Σ, X be Hermitian matrices, such that the spectrum of Σ is contained in D.⊆ If (F,G) is an antimonotone pair so that (F (a) F (b))(G(a) G(b)) 0 for all a, b D, then − − ≤ ∈ (54) tr [F (Σ)XG(Σ)X] tr F (Σ)G(Σ)X2 . ≥ Proposition 4.5. If Σ S+ and X is a Hermitian matrix, then ∈ n 2 k k 1 k 1+k (55) tr Σ− − XΣ X tr Σ− − XΣ− X , ≥ for integers k 0. ≥ 1 2k Proof. Define F,G : (0, ) R by F (x) := x− − and G(x) := x, and note that ∞ → + (F (a) F (b))(G(a) G(b)) 0 for all a, b > 0. Let Σ Sn and X be a Hermitian matrix.− Then, we have− ≤ ∈
2 k k 1 2k −1+k −1+k −1+k −1+k tr Σ− − XΣ X = tr Σ− − Σ 2 XΣ 2 Σ Σ 2 XΣ 2
h 2k −1+k −1+k −1+k −1+k i (56) tr Σ− Σ 2 XΣ 2 Σ 2 XΣ 2 ≥ h 1 k 1+k i (57) = tr Σ− − XΣ− X ,