
Semi-Riemannian Optimization

Tingran Gao

William H. Kruskal Instructor, Committee on Computational and Applied Mathematics (CCAM), Department of Statistics, The University of Chicago

2018 China-Korea International Conference on Matrix Theory with Applications, Shanghai University, Shanghai, China

December 19, 2018

Outline

Motivation

- Riemannian Manifold Optimization
- Barrier Functions and Interior Point Methods

Semi-Riemannian Manifold Optimization

- Optimality Conditions
- Descent Direction
- Semi-Riemannian First Order Methods
- Metric Independence of Second Order Methods

Optimization on Semi-Riemannian Submanifolds

Joint work with Lek-Heng Lim (The University of Chicago) and Ke Ye (Chinese Academy of Sciences)

Riemannian Manifold Optimization

- Optimization on a manifold:

  min_{x ∈ M} f(x)

  where (M, g) is a (typically nonlinear, non-convex) Riemannian manifold and f : M → R is a smooth function on M
- A difficult constrained optimization problem, handled as unconstrained optimization from the perspective of manifold optimization
- Key ingredients: tangent spaces, exponential map, parallel transport, Riemannian gradient ∇f, Riemannian Hessian ∇²f

Riemannian Manifold Optimization

- Riemannian Steepest Descent:

  x_{k+1} = Exp_{x_k}(−t ∇f(x_k)), t ≥ 0

- Riemannian Newton's Method:

  [∇²f(x_k)] η_k = −∇f(x_k),  x_{k+1} = Exp_{x_k}(−t η_k), t ≥ 0

- Riemannian trust region
- Riemannian conjugate gradient
- ......

Gabay (1982), Smith (1994), Edelman et al. (1998), Adler et al. (2002), Absil et al. (2008), Ring and Wirth (2012), Huang et al. (2015), Sato (2016), etc.

Riemannian Manifold Optimization

- Applications: optimization problems on matrix manifolds
- Stiefel manifolds:

  V_k(R^n) = O(n)/O(n − k)

- Grassmannian manifolds:

  Gr(k, n) = O(n)/(O(k) × O(n − k))

- Flag manifolds: for n_1 + ··· + n_d = n,

  Flag_{n_1,···,n_d} = O(n)/(O(n_1) × ··· × O(n_d))

- Shape spaces:

  (R^{k×n} \ {0})/O(n)

- ......

From Riemannian to Semi-Riemannian

Motivation 1: Question from a Geometer

How does optimization depend on the choice of metrics?

- Critical points {x ∈ M | ∇f(x) = 0} are metric independent
- Gradients, Hessians, and geodesics are metric dependent
- The index of the Hessian becomes metric independent at a critical point
- Optimization trajectories are obviously metric dependent
- Newton's equation is metric independent:

  [∇²f(x_k)] η_k = −∇f(x_k) ⇔ [∇̃²f(x_k)] η_k = −∇̃f(x_k)
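For a constant metric on flat space this independence can be checked numerically; here is a small sketch (my own, with an illustrative quadratic f and an arbitrary SPD metric G): the gradient becomes G⁻¹∇f and the Hessian operator G⁻¹∇²f, so the Newton equation yields the same direction for every G.

```python
# Flat-space sanity check (illustrative sketch): on R^2 with a constant
# metric G, the gradient is G^{-1} grad f and the Hessian operator is
# G^{-1} Hess f, so the Newton equation gives the same direction eta
# for every symmetric positive definite G.

def solve2(M, b):
    # solve the 2x2 linear system M eta = b by Cramer's rule
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return ((b[0] * M[1][1] - b[1] * M[0][1]) / det,
            (M[0][0] * b[1] - M[1][0] * b[0]) / det)

# f(x) = x1^2 + x1*x2 + 2*x2^2 + x1, evaluated at x = (1, 0.5):
grad = (2 * 1.0 + 0.5 + 1.0, 1.0 + 4 * 0.5)   # Euclidean gradient of f
hess = ((2.0, 1.0), (1.0, 4.0))               # (constant) Hessian of f

# Euclidean Newton direction: Hess f * eta = -grad f
eta_euc = solve2(hess, (-grad[0], -grad[1]))

# Newton direction under the metric G: (G^{-1} Hess f) eta = -G^{-1} grad f
G = ((3.0, 1.0), (1.0, 2.0))                  # an arbitrary SPD metric
col0 = solve2(G, (hess[0][0], hess[1][0]))    # columns of G^{-1} Hess f
col1 = solve2(G, (hess[0][1], hess[1][1]))
Gh = ((col0[0], col1[0]), (col0[1], col1[1]))
Gg = solve2(G, grad)                          # G^{-1} grad f
eta_G = solve2(Gh, (-Gg[0], -Gg[1]))

assert max(abs(eta_euc[0] - eta_G[0]), abs(eta_euc[1] - eta_G[1])) < 1e-10
```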

- Does it still work if the metric structure is more general than Riemannian? (E.g. semi-Riemannian, Finsler, etc.)

Barrier Functions and Interior Point Methods

f : Q → R strictly convex, defined on an open subset Q ⊂ R^d.

- ∇²f(x) ≻ 0 for all x ∈ Q
- ∇²f(x) defines a Riemannian metric on Q
- Gradient under this Riemannian metric: [∇²f(x)]⁻¹ ∇f(x)

- Newton's method for f is exactly gradient descent under the Riemannian metric defined by ∇²f(x)!
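In one dimension this identity is easy to watch; a sketch with an illustrative barrier-style objective f(x) = x − ln x on Q = (0, ∞), whose minimizer is x* = 1:

```python
# Sketch (illustrative objective): for f(x) = x - ln(x) on Q = (0, oo),
# the gradient under the Riemannian metric f''(x) dx^2 is
# f'(x)/f''(x), so a unit-step gradient descent under this metric is
# exactly Newton's method, converging quadratically to x* = 1.

def grad(x):
    return 1.0 - 1.0 / x       # f'(x)

def hess(x):
    return 1.0 / x ** 2        # f''(x) > 0 on Q

x = 0.5
for _ in range(6):
    x = x - grad(x) / hess(x)  # metric gradient step = Newton step

# error squares at each step: 0.5, 0.25, 0.0625, ...
assert abs(x - 1.0) < 1e-10
```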

- Nesterov and Todd (2002): primal-dual central paths are "close to" geodesics under this Riemannian metric

• Nesterov, Yurii E., and Michael J. Todd. On the Riemannian geometry defined by self-concordant barriers and interior-point methods. Foundations of Computational Mathematics 2, no. 4 (2002): 333-361.
• Duistermaat, Johannes J. On Hessian Riemannian structures. Asian Journal of Mathematics 5, no. 1 (2001): 79-91.

From Riemannian to Semi-Riemannian

Motivation 2: Nonconvex objectives for interior point methods

- What if f : Q → R is not strictly convex?
- What happens to Newton's method if the Hessian is not positive semi-definite? (Newton's method with modified Hessian)

Semi-Riemannian Geometry

- Scalar product: ⟨·, ·⟩ : V × V → R, a symmetric bilinear form on a vector space V, not necessarily positive (semi-)definite
- Non-degeneracy: ⟨v, w⟩ = 0 for all w ∈ V ⇔ v = 0
- A semi-Riemannian manifold is a differentiable manifold M with a scalar product ⟨·, ·⟩_x defined on each T_x M, such that ⟨·, ·⟩_x varies smoothly with x ∈ M
- A semi-Riemannian manifold is non-degenerate if ⟨·, ·⟩_x is non-degenerate for all x ∈ M
- Non-degeneracy need not be inherited by subspaces!
- Semi-Riemannian connections, geodesics, parallel transports, gradients, Hessians, ...... are all well-defined if non-degenerate
- Of great interest to physics as a model of spacetime (Lorentzian geometry)

Semi-Riemannian Geometry

A Simplest Degenerate Subspace Example

- Consider R² equipped with the Lorentzian metric of signature (−, +), i.e., for x = (x_1, x_2) ∈ R² and y = (y_1, y_2) ∈ R², define the scalar product as ⟨x, y⟩ = −x_1 y_1 + x_2 y_2
- Let W = span{(1, 1)}, which is a degenerate subspace of R²
- If we define W^⊥ := {v ∈ R² | ⟨w, v⟩ = 0 ∀w ∈ W}, then one has W = W^⊥!

- Take-home message: if W is a degenerate subspace of a scalar product space (V, ⟨·, ·⟩), then in general W + W^⊥ ≠ V, and it may well be the case that W ∩ W^⊥ ≠ {0}!
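The degenerate-subspace example above can be checked directly (a small sketch; the grid of test vectors is illustrative):

```python
# Numeric check of the example: in R^{1,1} with
# <x, y> = -x1*y1 + x2*y2, the degenerate line W = span{(1, 1)}
# coincides with its own orthogonal complement W^perp.

def sp(x, y):
    return -x[0] * y[0] + x[1] * y[1]

w = (1.0, 1.0)
assert sp(w, w) == 0.0                       # W is degenerate: <w, w> = 0

# scan a grid of vectors v: <w, v> = 0 holds exactly when v lies on W
for a in range(-3, 4):
    for b in range(-3, 4):
        v = (float(a), float(b))
        assert (sp(w, v) == 0.0) == (a == b)  # hence W^perp = W
```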

- Nevertheless: if W is a non-degenerate subspace of V, then it is still true that V = W ⊕ W^⊥

Examples of Semi-Riemannian Manifolds

- R^{p,q} (p ≥ 0, q ≥ 0): R^{p+q} equipped with the scalar product

  ⟨u, v⟩ = u^T I_{p,q} v,  where I_{p,q} = [ −I_p 0; 0 I_q ]

- Square matrices R^{n×n}, equipped with the scalar product

  ⟨A, B⟩ = Tr(A^T I_{p,q} B) = vec(A)^T (I_n ⊗ I_{p,q}) vec(B)
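A small sketch (p = q = 1 and 2×2 matrices; the particular A, B are illustrative) confirming that the trace form and its vectorized form agree:

```python
# Sketch (p = q = 1, 2x2 matrices): the matrix scalar product
# <A, B> = Tr(A' I_{p,q} B) equals vec(A)' (I_n kron I_{p,q}) vec(B).
# Since I_{p,q} is diagonal, entry (i, j) simply carries the sign of
# its row index i (-1 for i < p, +1 otherwise).

p = 1

def sgn(i):
    return -1.0 if i < p else 1.0

def sp_mat(A, B):
    # Tr(A' I_{p,q} B) = sum_{i,j} sgn(i) A[i][j] B[i][j]
    return sum(sgn(i) * A[i][j] * B[i][j]
               for i in range(2) for j in range(2))

A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]

# vec(.) stacks columns; (I_2 kron I_{1,1}) vec(B) flips the sign of
# the entries coming from row 0 of B
vecA = [A[i][j] for j in range(2) for i in range(2)]
vecB = [sgn(i) * B[i][j] for j in range(2) for i in range(2)]
via_vec = sum(a * b for a, b in zip(vecA, vecB))

assert sp_mat(A, B) == via_vec
```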

- Submanifolds of semi-Riemannian manifolds

Examples of Semi-Riemannian Manifolds: Applications

- Spacetime models: Lorentzian spaces, de Sitter spaces, anti-de Sitter spaces, ......
- Network models: social networks "behave like" spaces of negative curvature (in the sense of Alexandrov)

• Carroll, Sean M. Spacetime and geometry: An introduction to general relativity. San Francisco, USA: Addison-Wesley, 2004.
• Krioukov, Dmitri, Maksim Kitsak, Robert S. Sinkovits, David Rideout, David Meyer, and Marián Boguñá. Network cosmology. Scientific Reports 2 (2012): 793.

Outline

Motivation

- Riemannian Manifold Optimization
- Barrier Functions and Interior Point Methods

Semi-Riemannian Manifold Optimization

- Optimality Conditions
- Descent Direction
- Semi-Riemannian First Order Methods
- Metric Independence of Second Order Methods

Optimization on Semi-Riemannian Submanifolds

Optimality Conditions

Given a semi-Riemannian manifold M with scalar product ⟨·, ·⟩_x on T_x M, define the semi-Riemannian gradient of a smooth function f ∈ C²(M) by

  ⟨Df, X⟩_x = Xf(x), ∀X ∈ Γ(M, TM), x ∈ M

and define the semi-Riemannian Hessian of f by

  D²f(X, Y) = X(Yf) − (D_X Y)f, ∀X, Y ∈ Γ(M, TM)

where D_X Y is the covariant derivative of Y with respect to X.

Optimality Conditions

Necessary conditions: a local optimum x ∈ M satisfies

- (First order necessary condition) Df(x) = 0, if f : M → R is first-order differentiable
- (Second order necessary condition) Df(x) = 0 and D²f(x) ⪰ 0, if f : M → R is second-order differentiable

Sufficient condition:

- (Second order sufficient condition) If x ∈ M is an interior point of a semi-Riemannian manifold M, Df(x) = 0, and D²f(x) ≻ 0, then x is a strict relative minimum of f

Descent Direction

- In general, −Df(x) is not a descent direction at x ∈ M! The main issue is that ⟨Df(x), Df(x)⟩_x may be negative, so the first-order approximation

  f(x − Df(x)) ≈ f(x) − ⟨Df(x), Df(x)⟩_x

  need not be smaller than f(x)
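A two-line numeric illustration (my sketch; the test point and step size are arbitrary): in R^{1,1}, for f(x) = x₁² + x₂² the semi-Riemannian gradient is Df(x) = I_{1,1}∇f(x), and stepping along −Df(x) can increase f.

```python
# Sketch in R^{1,1} (scalar product <u, v> = -u1*v1 + u2*v2): for
# f(x) = x1^2 + x2^2, the semi-Riemannian gradient Df = I_{1,1} grad f
# satisfies <Df, Df> < 0 where the "negative" coordinate dominates,
# and there -Df is an ascent direction.

def f(x):
    return x[0] ** 2 + x[1] ** 2

x = (1.0, 0.25)                   # test point (illustrative)
eg = (2 * x[0], 2 * x[1])         # Euclidean gradient
Df = (-eg[0], eg[1])              # semi-Riemannian gradient I_{1,1} eg

sp = -Df[0] ** 2 + Df[1] ** 2     # <Df, Df> = -3.75 < 0 here
assert sp < 0

t = 0.01
x_new = (x[0] - t * Df[0], x[1] - t * Df[1])
assert f(x_new) > f(x)            # stepping along -Df increased f
```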

- A similar issue exists in Newton's method: when the Hessian ∇²f(x) is not positive definite, the Newton direction −[∇²f(x)]⁻¹ ∇f(x) may not be a descent direction
- Solution for Newton's method: modified Hessian methods
- For semi-Riemannian optimization: a similar idea of modifying Df(x)

Descent Direction

Solution: modify −Df(x) to obtain a descent direction on the semi-Riemannian manifold

- For each T_x M, use semi-Riemannian Gram-Schmidt to find an orthonormal basis e_1, ···, e_n for T_x M
- Construct

  [Df(x)]^+ = Σ_{i=1}^n ⟨Df(x), e_i⟩ e_i

- −[Df(x)]^+ is a descent direction, since

  ⟨−[Df(x)]^+, Df(x)⟩ = −Σ_{i=1}^n |⟨Df(x), e_i⟩|² < 0

Semi-Riemannian Steepest Descent I/II

Semi-Riemannian Steepest Descent II/II

Semi-Riemannian Conjugate Gradient

Semi-Riemannian Second-Order Methods

- Newton's Method: solve [D²f(x)] η = −Df(x) for a search direction η ∈ T_x M at x ∈ M. Note that this is exactly the same Newton equation as in Riemannian optimization!
- Trust Region: minimize a local quadratic model near each x ∈ M. Again, the same as in Riemannian optimization!
- The equivalence between Riemannian and semi-Riemannian second-order methods is not just a coincidence: these methods build upon local polynomial surrogates, which are metric-independent in the sense of jets.

Outline

Motivation

- Riemannian Manifold Optimization
- Barrier Functions and Interior Point Methods

Semi-Riemannian Manifold Optimization

- Optimality Conditions
- Descent Direction
- Semi-Riemannian First Order Methods
- Metric Independence of Second Order Methods

Optimization on Semi-Riemannian Submanifolds

Which manifolds admit semi-Riemannian structures?

- Euclidean spaces and all Riemannian manifolds are particular semi-Riemannian manifolds
- The tangent bundle of a manifold admitting a semi-Riemannian metric that is not Riemannian must admit a decomposition into two sub-bundles
- Any manifold with trivial tangent bundle admits semi-Riemannian metrics, e.g. all matrix Lie groups and the Minkowski spaces R^{p,q}
- Unfortunately, almost all matrix Lie groups are either degenerate semi-Riemannian manifolds as submanifolds of the ambient Minkowski space, or do not admit closed-form semi-Riemannian geodesics and/or parallel transports

Optimization on Semi-Riemannian Submanifolds

Minkowski Spaces R^{p,q} (≅ R^{p+q} as vector spaces)

For all x, y ∈ R^{p,q} and V, W ∈ T_x R^{p,q}:

- Tangent Spaces: T_x R^{p,q} = R^{p,q}
- Scalar Product: ⟨V, W⟩_x = −Σ_{i=1}^p V_i W_i + Σ_{j=p+1}^{p+q} V_j W_j
- Geodesics: Exp_x(tV) = x + tV
- Parallel Transport: P_{x→y} V = V, ∀V ∈ T_x R^{p,q} = T_y R^{p,q}
- Semi-Riemannian Gradient: Df(x) = I_{p,q} ∇f(x)
- Semi-Riemannian Hessian: D²f(x) = I_{p,q} ∇²f(x) I_{p,q}

Example: Minkowski Spaces

min_{x ∈ R^{1,1}} x^T A x, where A ≻ 0
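A runnable sketch of this example (the particular A ≻ 0, step size, and iteration count are illustrative): in R^{1,1} the standard basis is already orthonormal for the scalar product, so no Gram-Schmidt pass is needed, and the modified gradient [Df(x)]^+ drives the iterates to the minimizer x* = 0.

```python
# Sketch of semi-Riemannian steepest descent with the modified gradient
# [Df]^+ on R^{1,1}, for f(x) = x' A x with an (illustrative) A > 0.
# The standard basis e1, e2 is orthonormal for <u, v> = -u1*v1 + u2*v2.

A = ((2.0, 0.5), (0.5, 1.0))          # symmetric positive definite

def sp(u, v):
    return -u[0] * v[0] + u[1] * v[1]

def f(x):
    return (A[0][0] * x[0] ** 2 + 2 * A[0][1] * x[0] * x[1]
            + A[1][1] * x[1] ** 2)

def modified_grad(x):
    eg = (2 * (A[0][0] * x[0] + A[0][1] * x[1]),      # Euclidean grad
          2 * (A[1][0] * x[0] + A[1][1] * x[1]))
    Df = (-eg[0], eg[1])              # semi-Riemannian gradient I_{1,1} eg
    e1, e2 = (1.0, 0.0), (0.0, 1.0)
    # [Df]^+ = <Df, e1> e1 + <Df, e2> e2
    d = (sp(Df, e1), sp(Df, e2))
    assert sp((-d[0], -d[1]), Df) < 0  # -[Df]^+ is a descent direction
    return d

x = (1.0, 1.0)
t = 0.1
for _ in range(300):
    d = modified_grad(x)
    x = (x[0] - t * d[0], x[1] - t * d[1])

assert f(x) < 1e-12                   # converged to the minimizer x* = 0
```

In this flat case [Df]^+ reduces to the Euclidean gradient, which is consistent with critical points being metric independent.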

Optimization on Semi-Riemannian Submanifolds

Euclidean sphere S^{p+q−1} in a Minkowski space R^{p,q}

- Tangent Spaces: T_x S^{p+q−1} = {V ∈ T_x R^{p,q} | ⟨V, I_{p,q} x⟩ = 0}
- Scalar Product: inherited from the ambient space R^{p,q}
- The induced scalar product is non-degenerate except on a set of measure zero, Z := {x ∈ S^{p+q−1} | x^T I_{p,q} x = 0}
- Projection from T_x R^{p,q} to T_x S^{p+q−1}:

  Proj_x(V) = V − (V^T x / x^T I_{p,q} x) I_{p,q} x, ∀x ∈ S^{p+q−1} \ Z

- Geodesics: no closed-form expression, but Riemannian geodesics can be used for optimization purposes
- Parallel Transport: no closed-form expression, but Riemannian parallel transports can be used for optimization purposes

Optimization on Semi-Riemannian Submanifolds

Euclidean sphere S^{p+q−1} in a Minkowski space R^{p,q}

- Semi-Riemannian Gradient:

  Df(x) = Proj_x(∇f(x)), ∀x ∈ S^{p+q−1} \ Z

- Semi-Riemannian Hessian:

  D²f(x) = Proj_x ∘ ∇²f(x), ∀x ∈ S^{p+q−1} \ Z

  (viewing ∇²f(x) as a linear map from T_x R^{p,q} to T_x R^{p,q}, and D²f(x) as a linear map from T_x S^{p+q−1} to T_x S^{p+q−1})

Example: Euclidean Spheres in Minkowski Spaces

max_{x_1² + ··· + x_{p+q}² = 1} x^T A x, where A = A^T
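A sketch for the smallest case (p = q = 1, with the illustrative A = diag(2, 5), and reading ∇f in the projection formula as the gradient in the ambient R^{p,q}, i.e. I_{1,1} times the Euclidean gradient): the resulting semi-Riemannian gradient is tangent to the sphere and vanishes exactly at eigenvectors of A, matching the fact that critical points are metric independent.

```python
# Sketch on the Euclidean sphere S^1 inside R^{1,1} (p = q = 1), with
# f(x) = x' A x for the illustrative A = diag(2, 5).  The
# semi-Riemannian gradient Proj_x(I_{1,1} grad f(x)) is tangent to the
# sphere and vanishes exactly at the eigenvectors of A.

A = (2.0, 5.0)

def semi_grad(x):
    eg = (2 * A[0] * x[0], 2 * A[1] * x[1])    # Euclidean gradient of f
    V = (-eg[0], eg[1])                        # ambient gradient I_{1,1} eg
    xIx = -x[0] ** 2 + x[1] ** 2               # nonzero iff x not in Z
    c = (V[0] * x[0] + V[1] * x[1]) / xIx
    return (V[0] + c * x[0], V[1] - c * x[1])  # V - c * I_{1,1} x

# at the eigenvector x = (1, 0): the gradient vanishes
g = semi_grad((1.0, 0.0))
assert abs(g[0]) < 1e-12 and abs(g[1]) < 1e-12

# at a generic point: nonzero, but tangent to the sphere (x' Df = 0)
x = (0.6, 0.8)
g = semi_grad(x)
assert abs(x[0] * g[0] + x[1] * g[1]) < 1e-10
assert abs(g[0]) + abs(g[1]) > 1.0
```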

Optimization on Semi-Riemannian Submanifolds

Pseudosphere S^{p,q} in a Minkowski space R^{p,q}

S^{p,q} := {x ∈ R^{p,q} | −x_1² − ··· − x_p² + x_{p+1}² + ··· + x_{p+q}² = 1}

- Tangent Spaces: T_x S^{p,q} = {V ∈ T_x R^{p,q} | ⟨V, x⟩ = 0}
- Scalar Product: inherited from the ambient space R^{p,q}; non-degenerate!
- Projection from T_x R^{p,q} to T_x S^{p,q}:

  Proj_x(V) = V − (V^T I_{p,q} x) x, ∀x ∈ S^{p,q}

- Semi-Riemannian Gradient:

  Df(x) = Proj_x(∇f(x)), ∀x ∈ S^{p,q}

- Semi-Riemannian Hessian:

  D²f(x) = Proj_x ∘ ∇²f(x), ∀x ∈ S^{p,q}

Optimization on Semi-Riemannian Submanifolds

Pseudosphere S^{p,q} in a Minkowski space R^{p,q}

S^{p,q} := {x ∈ R^{p,q} | −x_1² − ··· − x_p² + x_{p+1}² + ··· + x_{p+q}² = 1}

- Geodesics: for any x ∈ S^{p,q} and V ∈ T_x S^{p,q},

  Exp_x(tV) = x cos(t‖V‖) + (V/‖V‖) sin(t‖V‖),   if ⟨V, V⟩ > 0,
  Exp_x(tV) = x cosh(t‖V‖) + (V/‖V‖) sinh(t‖V‖), if ⟨V, V⟩ < 0,
  Exp_x(tV) = x + tV,                             if ⟨V, V⟩ = 0,

  where ‖V‖ := √|⟨V, V⟩|.

Optimization on Semi-Riemannian Submanifolds

Pseudosphere S^{p,q} in a Minkowski space R^{p,q}

- Parallel Transport along geodesics: for any x ∈ S^{p,q} and V ∈ T_x S^{p,q}, the parallel transport of V along the geodesic emanating from x in the direction X ∈ T_x S^{p,q} is

  P_{x→Exp_x(tX)}(V) =
    −(⟨V, X⟩/‖X‖) [x sin(t‖X‖) − (X/‖X‖) cos(t‖X‖)] + V − ⟨V, X⟩X/‖X‖²,   if ⟨X, X⟩ > 0,
    −(⟨V, X⟩/‖X‖) [x sinh(t‖X‖) + (X/‖X‖) cosh(t‖X‖)] + V + ⟨V, X⟩X/‖X‖², if ⟨X, X⟩ < 0,
    −⟨V, X⟩ (tx + (t²/2) X) + V,                                           if ⟨X, X⟩ = 0.

Optimization on Semi-Riemannian Submanifolds

Pseudosphere S^{p,q} in a Minkowski space R^{p,q}

S^{p,q} := {x ∈ R^{p,q} | −x_1² − ··· − x_p² + x_{p+1}² + ··· + x_{p+q}² = 1}

Semi-Riemannian vs. Riemannian optimization on pseudospheres:

- Riemannian geodesics and parallel transports are much harder to compute on S^{p,q}, if possible at all
- S^{p,q} is non-compact

• Fiori, Simone. Learning by natural gradient on noncompact matrix-type pseudo-Riemannian manifolds. IEEE Transactions on Neural Networks 21, no. 5 (2010): 841-852.

Example: Pseudospheres

min_{x ∈ S^{p,q}} ‖x − ξ‖₂², where ξ ∉ S^{p,q}
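The closed-form exponential map makes the pseudosphere easy to work with numerically; a sketch for S^{1,1} (the base point and tangent vector are illustrative) verifying that geodesics stay on the manifold:

```python
import math

# Sketch on the pseudosphere S^{1,1} = {x in R^{1,1} : -x1^2 + x2^2 = 1}:
# the closed-form exponential map (all three branches) keeps geodesics
# exactly on the manifold.

def sp(u, v):
    return -u[0] * v[0] + u[1] * v[1]

def exp_map(x, V, t):
    q = sp(V, V)
    if q > 1e-15:                      # spacelike direction
        n = math.sqrt(q)
        c, s = math.cos(t * n), math.sin(t * n) / n
    elif q < -1e-15:                   # timelike direction
        n = math.sqrt(-q)
        c, s = math.cosh(t * n), math.sinh(t * n) / n
    else:                              # null direction
        return (x[0] + t * V[0], x[1] + t * V[1])
    return (c * x[0] + s * V[0], c * x[1] + s * V[1])

x = (0.0, 1.0)                         # base point: <x, x> = 1
V = (0.7, 0.0)                         # tangent: <V, x> = 0, <V, V> < 0
assert abs(sp(V, x)) < 1e-15

for t in (0.5, 1.0, 3.0):
    y = exp_map(x, V, t)
    assert abs(sp(y, y) - 1.0) < 1e-10  # geodesic stays on S^{1,1}
```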


• Tingran Gao, Lek-Heng Lim, and Ke Ye. Semi-Riemannian Manifold Optimization. arXiv:1812.07643 (2018)
• MATLAB code available at https://github.com/trgao10/SemiRiem

Matrix Manifolds

- Matrix Lie groups have trivial tangent bundles, so they admit semi-Riemannian metrics of arbitrary index.
- Unfortunately, very few matrix manifolds inherit non-degenerate semi-Riemannian structures from the ambient Minkowski space, and geodesics and parallel transports for the non-degenerate ones are hard to compute in general

• Tingran Gao, Lek-Heng Lim, and Ke Ye. Semi-Riemannian Manifold Optimization. arXiv:1812.07643 (2018)

Thank you!

Future Work:

- More applications, especially for matrix computations
- Trajectory analysis for non-convex optimization
- Numerical differential geometry

• Tingran Gao, Lek-Heng Lim, and Ke Ye. Semi-Riemannian Manifold Optimization. arXiv:1812.07643 (2018)
• MATLAB code available at https://github.com/trgao10/SemiRiem