Semi-Riemannian Manifold Optimization
Tingran Gao
William H. Kruskal Instructor
Committee on Computational and Applied Mathematics (CCAM)
Department of Statistics, The University of Chicago
2018 China-Korea International Conference on Matrix Theory with Applications Shanghai University, Shanghai, China
December 19, 2018

Outline
Motivation
▶ Riemannian Manifold Optimization
▶ Barrier Functions and Interior Point Methods

Semi-Riemannian Manifold Optimization
▶ Optimality Conditions
▶ Descent Direction
▶ Semi-Riemannian First Order Methods
▶ Metric Independence of Second Order Methods

Optimization on Semi-Riemannian Submanifolds

Joint work with Lek-Heng Lim (The University of Chicago) and Ke Ye (Chinese Academy of Sciences)

Riemannian Manifold Optimization
▶ Optimization on manifolds:

    min_{x ∈ M} f(x)

where (M, g) is a (typically nonlinear, non-convex) Riemannian manifold and f : M → R is a smooth function on M
▶ A difficult constrained optimization problem, but handled as an unconstrained one from the perspective of manifold optimization
▶ Key ingredients: tangent spaces, geodesics, exponential map, parallel transport, gradient ∇f, Hessian ∇²f

Riemannian Manifold Optimization
▶ Riemannian Steepest Descent

    x_{k+1} = Exp_{x_k}(−t ∇f(x_k)), t ≥ 0

▶ Riemannian Newton’s Method

    ∇²f(x_k) η_k = −∇f(x_k), x_{k+1} = Exp_{x_k}(t η_k), t ≥ 0
▶ Riemannian trust region
▶ Riemannian conjugate gradient
▶ ......
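As a minimal illustration of the steepest descent iteration above, the sketch below runs it on the unit sphere S^{n−1}, whose exponential map has the standard closed form Exp_x(v) = cos(‖v‖)x + sin(‖v‖)v/‖v‖. The function names, the fixed step size, and the eigenvector example are illustrative choices, not code from the talk:

```python
import numpy as np

def sphere_exp(x, v):
    """Exponential map on the unit sphere: geodesic step from x along tangent v."""
    nv = np.linalg.norm(v)
    if nv < 1e-16:
        return x
    return np.cos(nv) * x + np.sin(nv) * (v / nv)

def riemannian_steepest_descent(egrad, x0, step=0.1, iters=500):
    x = x0 / np.linalg.norm(x0)
    for _ in range(iters):
        g = egrad(x)
        rgrad = g - (x @ g) * x           # project Euclidean gradient onto T_x S^{n-1}
        x = sphere_exp(x, -step * rgrad)  # move along the geodesic
    return x

# Minimizing f(x) = -x^T A x on the sphere recovers a leading eigenvector of A
A = np.diag([3.0, 1.0, 0.5])
x_star = riemannian_steepest_descent(lambda x: -2.0 * A @ x, np.array([1.0, 1.0, 1.0]))
```

Every iterate stays exactly on the sphere, which is the point of replacing the Euclidean update x − t∇f(x) by the exponential map.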
Gabay (1982), Smith (1994), Edelman et al. (1998), Absil et al. (2008), Adler et al. (2002), Ring and Wirth (2012), Huang et al. (2015), Sato (2016), etc.

Riemannian Manifold Optimization
▶ Applications: optimization problems on matrix manifolds
▶ Stiefel manifolds

    V_k(R^n) = O(n)/O(n − k)

▶ Grassmannian manifolds

    Gr(k, n) = O(n)/(O(k) × O(n − k))

▶ Flag manifolds: for n_1 + ··· + n_d = n,

    Flag_{n_1,...,n_d} = O(n)/(O(n_1) × ··· × O(n_d))

▶ Shape spaces

    (R^{k×n} \ {0})/O(n)
▶ ......

From Riemannian to Semi-Riemannian

Motivation 1: Question from a Geometer
How does optimization depend on the choice of metrics?
▶ Critical points {x ∈ M | ∇f(x) = 0} are metric independent
▶ Gradients, Hessians, geodesics are metric dependent
▶ The index of the Hessian becomes metric independent at a critical point; cf. Morse theory
▶ Optimization trajectories are obviously metric dependent
▶ Newton’s equation is metric independent:

    ∇²f(x_k) η_k = −∇f(x_k) ⇔ ∇̃²f(x_k) η_k = −∇̃f(x_k)

(∇̃ denotes the gradient/Hessian with respect to a different Riemannian metric)
▶ Does it still work if the metric structure is more general than Riemannian? (E.g. semi-Riemannian, Finsler, etc.)

Barrier Functions and Interior Point Methods
f : Q → R strictly convex, defined on an open subset Q ⊂ R^d
▶ ∇²f(x) ≻ 0 at every x ∈ Q
▶ ∇²f(x) defines a Riemannian metric on Q
▶ gradient under this Riemannian metric: [∇²f(x)]^{−1} ∇f(x)
▶ Newton’s method for f is exactly gradient descent under the Riemannian metric defined by ∇²f(x)!
▶ Nesterov and Todd (2002): the primal-dual central path is “close to” a geodesic under this Riemannian metric
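A quick numerical illustration of the identity above: for a strictly convex f, the gradient with respect to the Hessian metric coincides with the Newton direction. The log-barrier f below is an illustrative choice, not taken from the slides:

```python
import numpy as np

# f(x) = -sum(log(x_i)) on the positive orthant: strictly convex,
# with Euclidean gradient -1/x and Hessian diag(1/x^2).
x = np.array([0.5, 2.0, 1.0])
grad = -1.0 / x
hess = np.diag(1.0 / x**2)

# Newton direction: solve (Hess f) d = grad f
newton_dir = np.linalg.solve(hess, grad)

# Gradient w.r.t. the Riemannian metric g_x = Hess f(x), defined by
# g_x(metric_grad, v) = <grad f, v> for all v, i.e. hess @ metric_grad = grad
metric_grad = np.linalg.inv(hess) @ grad
# both equal -x here: Newton's method = steepest descent in the Hessian metric
```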
• Nesterov, Yurii E., and Michael J. Todd. On the Riemannian geometry defined by self-concordant barriers and interior-point methods. Foundations of Computational Mathematics 2, no. 4 (2002): 333-361.
• Duistermaat, Johannes J. On Hessian Riemannian structures. Asian Journal of Mathematics 5, no. 1 (2001): 79-91.

From Riemannian to Semi-Riemannian
Motivation 2: Nonconvex objectives for interior point methods
▶ What if f : Q → R is not strictly convex?
▶ What happens to Newton’s method if the Hessian is not positive semi-definite? (Cf. Newton’s method with modified Hessian)

Semi-Riemannian Geometry
▶ Scalar product: a symmetric bilinear form ⟨·, ·⟩ : V × V → R on a vector space V, not necessarily positive (semi-)definite
▶ Non-degeneracy: ⟨v, w⟩ = 0 for all w ∈ V ⇔ v = 0
▶ A semi-Riemannian manifold is a differentiable manifold M with a scalar product ⟨·, ·⟩_x defined on each T_xM, varying smoothly with x ∈ M
▶ A semi-Riemannian manifold is non-degenerate if ⟨·, ·⟩_x is non-degenerate for all x ∈ M
▶ Non-degeneracy need not be inherited by subspaces!
▶ Semi-Riemannian submanifolds, geodesics, parallel transports, gradients, Hessians, ...... are all well-defined if non-degenerate
▶ Of great interest in general relativity as a spacetime model (Lorentzian geometry)

Semi-Riemannian Geometry
A Simplest Degenerate Subspace Example
▶ Consider R² equipped with the Lorentzian metric of signature (−, +), i.e., for x = (x₁, x₂) ∈ R² and y = (y₁, y₂) ∈ R², define the scalar product as ⟨x, y⟩ = −x₁y₁ + x₂y₂
▶ Let W = span{(1, 1)}, which is a degenerate subspace of R²
▶ If we define W^⊥ := {v ∈ R² | ⟨w, v⟩ = 0 ∀w ∈ W}, then one has W = W^⊥!
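This example can be checked in a few lines (a numpy sketch with illustrative names):

```python
import numpy as np

# The Lorentzian scalar product of signature (-, +) on R^2.
def lorentz(x, y):
    return -x[0] * y[0] + x[1] * y[1]

w = np.array([1.0, 1.0])
self_prod = lorentz(w, w)                  # 0: w is orthogonal to itself
other = lorentz(w, np.array([1.0, 0.0]))   # -1: (1, 0) is not orthogonal to w
# hence W = span{w} satisfies W = W^perp: the subspace is degenerate
```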
▶ Take-home message: if W is a degenerate subspace of a scalar product space (V, ⟨·, ·⟩), then in general W + W^⊥ ≠ V, and it may well be the case that W ∩ W^⊥ ≠ {0}!
▶ Nevertheless: if W is a non-degenerate subspace of V, then it is still true that V = W ⊕ W^⊥

Examples of Semi-Riemannian Manifolds
▶ Minkowski space R^{p,q} (p ≥ 0, q ≥ 0): the vector space R^{p+q} equipped with the scalar product

    ⟨u, v⟩ = u^T I_{p,q} v, where I_{p,q} = diag(−I_p, I_q)

▶ Square matrices R^{n×n}, equipped with the scalar product

    ⟨A, B⟩ = Tr(A^T I_{p,q} B) = vec(A)^T (I_n ⊗ I_{p,q}) vec(B)

▶ Submanifolds of semi-Riemannian manifolds

Examples of Semi-Riemannian Manifolds: Applications
▶ Spacetime models: Lorentzian spaces, de Sitter spaces, anti-de Sitter spaces, ......
▶ Network models: social networks “behave like” spaces of negative Alexandrov curvature
• Carroll, Sean M. Spacetime and geometry: An introduction to general relativity. San Francisco, USA: Addison-Wesley, 2004.
• Krioukov, Dmitri, Maksim Kitsak, Robert S. Sinkovits, David Rideout, David Meyer, and Marián Boguñá. Network cosmology. Scientific Reports 2 (2012): 793.

Optimality Conditions
Given a semi-Riemannian manifold M with scalar product ⟨·, ·⟩_x on T_xM, define the semi-Riemannian gradient of a smooth function f ∈ C²(M) by

    ⟨Df, X⟩_x = Xf(x), ∀X ∈ Γ(M, TM), x ∈ M

and define the semi-Riemannian Hessian of f by

    D²f(X, Y) = X(Yf) − (D_X Y)f, ∀X, Y ∈ Γ(M, TM)

where D_X Y is the covariant derivative of Y with respect to X.

Optimality Conditions
Necessary conditions: a local optimum x ∈ M satisfies
▶ (First-order necessary condition) Df(x) = 0, if f : M → R is first-order differentiable
▶ (Second-order necessary condition) Df(x) = 0 and D²f(x) ⪰ 0, if f : M → R is second-order differentiable

Sufficient condition:
▶ (Second-order sufficient condition) If x ∈ M is an interior point of a semi-Riemannian manifold M with Df(x) = 0 and D²f(x) ≻ 0, then x is a strict relative minimum of f

Descent Direction
▶ In general, −Df(x) is not a descent direction at x ∈ M! The main issue is that ⟨Df(x), Df(x)⟩_x may be negative, so the first-order approximation

    f(x − Df(x)) ≈ f(x) − ⟨Df(x), Df(x)⟩_x

need not be smaller than f(x)
▶ A similar issue exists in Newton’s method: when the Hessian ∇²f(x) is not positive definite, the Newton direction −[∇²f(x)]^{−1} ∇f(x) may not be a descent direction
▶ Solution for Newton’s method: modified Hessian methods
▶ For semi-Riemannian optimization: a similar idea of modifying Df(x)

Descent Direction
Solution: modify −Df(x) to obtain a descent direction on the semi-Riemannian manifold
▶ For each T_xM, use semi-Riemannian Gram-Schmidt to find an orthonormal basis e₁, ..., e_d for T_xM
▶ Construct

    [Df(x)]⁺ = Σ_{i=1}^{d} ⟨Df(x), e_i⟩ e_i

▶ −[Df(x)]⁺ is a descent direction, since

    ⟨−[Df(x)]⁺, Df(x)⟩ = −Σ_{i=1}^{d} |⟨Df(x), e_i⟩|² < 0

Semi-Riemannian Steepest Descent I/II

Semi-Riemannian Steepest Descent II/II

Semi-Riemannian Conjugate Gradient

Semi-Riemannian Second-Order Methods
▶ Newton’s Method: solve D²f(x) η = Df(x) for a Newton direction η ∈ T_xM at x ∈ M. Note that this is exactly the same Newton equation as in Riemannian optimization!
▶ Trust Region: solve a local quadratic optimization problem near each x ∈ M. Again the same as in Riemannian optimization!
▶ The equivalence between Riemannian and semi-Riemannian second-order methods is not just a coincidence; these methods build upon local polynomial surrogates, which are always metric independent in the sense of jets.

Which manifolds admit semi-Riemannian structures?
▶ Euclidean spaces and all Riemannian manifolds are particular semi-Riemannian manifolds
▶ The tangent bundle of a manifold admitting a semi-Riemannian metric that is not Riemannian must admit a decomposition into two sub-bundles
▶ Any manifold with a trivial tangent bundle admits semi-Riemannian metrics, e.g. all matrix Lie groups, Minkowski spaces R^{p,q}
▶ Unfortunately, almost all matrix Lie groups are either degenerate semi-Riemannian manifolds as submanifolds of the ambient Minkowski space, or do not admit closed-form semi-Riemannian geodesics and/or parallel transports

Optimization on Semi-Riemannian Submanifolds
Minkowski Spaces R^{p,q} (≅ R^{p+q} as vector spaces)

For all x, y ∈ R^{p,q} and V, W ∈ T_x R^{p,q}:
▶ Tangent spaces: T_x R^{p,q} = R^{p,q}
▶ Scalar product: ⟨V, W⟩_x = −Σ_{i=1}^{p} V_i W_i + Σ_{j=p+1}^{p+q} V_j W_j
▶ Geodesics: Exp_x(tV) = x + tV
▶ Parallel transport: P_{x→y} V = V, ∀V ∈ T_x R^{p,q} = T_y R^{p,q}
▶ Semi-Riemannian gradient: Df(x) = I_{p,q} ∇f(x)
▶ Semi-Riemannian Hessian: D²f(x) = I_{p,q} ∇²f(x) I_{p,q}

Example: Minkowski Spaces
min_{x ∈ R^{1,1}} x^T A x, where A ≻ 0
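This toy problem can be run end-to-end with the ingredients above: straight-line geodesics, Df(x) = I_{p,q} ∇f(x), and the modified direction [Df(x)]⁺ (on R^{p,q} the standard basis is already orthonormal for the Minkowski scalar product). A numpy sketch with illustrative names, a specific A, and a fixed step size:

```python
import numpy as np

# Semi-Riemannian steepest descent for f(x) = x^T A x on R^{1,1}.
# Geodesics are straight lines, Df(x) = I_{1,1} grad f(x), and the standard
# basis e_i is orthonormal for <u, v> = u^T I_{1,1} v, so the modified
# direction [Df]^+ = sum_i <Df, e_i> e_i has coordinates <Df, e_i> = (I_{1,1} Df)_i.

G = np.diag([-1.0, 1.0])                 # I_{1,1}
A = np.array([[2.0, 0.5], [0.5, 1.0]])   # symmetric positive definite

def Df(x):
    return G @ (2.0 * A @ x)             # semi-Riemannian gradient

def modified_direction(x):
    return G @ Df(x)                     # coordinates <Df, e_i> of [Df(x)]^+

# Why the modification is needed: <Df, Df> can be negative, e.g. at (1, 0.1),
# so -Df(x) itself need not be a descent direction.
x0 = np.array([1.0, 0.1])
indefinite = Df(x0) @ G @ Df(x0)

x = np.array([1.0, -1.0])
for _ in range(200):
    x = x - 0.1 * modified_direction(x)  # Exp_x(tV) = x + tV on R^{p,q}
# x converges to the unique minimizer 0
```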
[Figure: two panels with axes spanning [−1, 1] × [−1, 1]]

Optimization on Semi-Riemannian Submanifolds
Euclidean sphere S^{p+q−1} in a Minkowski space
▶ Tangent spaces: T_x S^{p+q−1} = {V ∈ T_x R^{p,q} | ⟨V, I_{p,q} x⟩ = 0}
▶ Scalar product: inherited from the ambient space R^{p,q}
▶ The induced scalar product is non-degenerate except on a set of measure zero Z := {x ∈ S^{p+q−1} | x^T I_{p,q} x = 0}
▶ Projection from T_x R^{p,q} to T_x S^{p+q−1}:

    Proj_x(V) = V − (V^T x)/(x^T I_{p,q} x) · I_{p,q} x, ∀x ∈ S^{p+q−1} \ Z
▶ Geodesics: no closed-form expression, but one can use Riemannian geodesics for optimization purposes
▶ Parallel transport: no closed-form expression, but one can use Riemannian parallel transports for optimization purposes

Optimization on Semi-Riemannian Submanifolds
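A quick numerical check of the projection Proj_x introduced on the previous slide; a numpy sketch with illustrative names and a hand-picked point x ∉ Z:

```python
import numpy as np

# Check the tangent-space projection on S^{p+q-1}:
# Proj_x(V) = V - (V^T x / x^T I_{p,q} x) I_{p,q} x, for x outside Z.
# Tangency at x means <W, I_{p,q} x> = W^T I_{p,q} I_{p,q} x = W^T x = 0.

p, q = 1, 2
G = np.diag([-1.0, 1.0, 1.0])            # I_{1,2}

x = np.array([0.6, 0.8, 0.0])            # on the sphere; x^T G x = 0.28 != 0, so x not in Z
V = np.array([1.0, 2.0, 3.0])            # an arbitrary ambient vector

def proj(x, V):
    return V - (V @ x) / (x @ G @ x) * (G @ x)

W = proj(x, V)
tangency = W @ x                         # 0 up to round-off: W is tangent at x
idem = np.linalg.norm(proj(x, W) - W)    # 0: projecting twice changes nothing
```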
Euclidean sphere S^{p+q−1} in a Minkowski space
▶ Semi-Riemannian gradient:

    Df(x) = Proj_x(∇f(x)), ∀x ∈ S^{p+q−1} \ Z

▶ Semi-Riemannian Hessian:

    D²f(x) = Proj_x ∘ ∇²f(x), ∀x ∈ S^{p+q−1} \ Z

(viewing ∇²f(x) as a linear map from T_x R^{p,q} to T_x R^{p,q}, and D²f(x) as a linear map from T_x S^{p+q−1} to T_x S^{p+q−1})

Example: Euclidean Spheres in Minkowski Spaces
max_{x₁² + ··· + x_{p+q}² = 1} x^T A x, where A = A^T
[Figure: two panels of convergence curves, log scale from 10¹ down to 10⁻⁸, over roughly 160–180 iterations]

Optimization on Semi-Riemannian Submanifolds

Pseudosphere S^{p,q} in a Minkowski space
S^{p,q} := {x ∈ R^{p,q} | −x₁² − ··· − x_p² + x_{p+1}² + ··· + x_{p+q}² = 1}
▶ Tangent spaces: T_x S^{p,q} = {V ∈ T_x R^{p,q} | ⟨V, x⟩ = 0}
▶ Scalar product: inherited from the ambient space R^{p,q}. Non-degenerate!
▶ Projection from T_x R^{p,q} to T_x S^{p,q}:

    Proj_x(V) = V − (V^T I_{p,q} x) x, ∀x ∈ S^{p,q}

▶ Semi-Riemannian gradient:

    Df(x) = Proj_x(∇f(x)), ∀x ∈ S^{p,q}

▶ Semi-Riemannian Hessian:

    D²f(x) = Proj_x ∘ ∇²f(x), ∀x ∈ S^{p,q}

Optimization on Semi-Riemannian Submanifolds
Pseudosphere S^{p,q} in a Minkowski space

S^{p,q} := {x ∈ R^{p,q} | −x₁² − ··· − x_p² + x_{p+1}² + ··· + x_{p+q}² = 1}

▶ Geodesics: for any x ∈ S^{p,q} and V ∈ T_x S^{p,q},

    Exp_x(tV) = x cos(t‖V‖) + (V/‖V‖) sin(t‖V‖),    if ⟨V, V⟩ > 0,
    Exp_x(tV) = x cosh(t‖V‖) + (V/‖V‖) sinh(t‖V‖),  if ⟨V, V⟩ < 0,
    Exp_x(tV) = x + tV,                              if ⟨V, V⟩ = 0,

where ‖V‖ := √|⟨V, V⟩|.

Optimization on Semi-Riemannian Submanifolds
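The three-case exponential map above can be sketched as follows (a numpy sketch; function names are illustrative). Each branch keeps the iterate exactly on S^{p,q}:

```python
import numpy as np

def minkowski(u, v, p):
    """Minkowski scalar product with p minus signs."""
    return -u[:p] @ v[:p] + u[p:] @ v[p:]

def pseudosphere_exp(x, V, t, p):
    s = minkowski(V, V, p)
    if s > 0:                                   # spacelike: trigonometric branch
        n = np.sqrt(s)
        return x * np.cos(t * n) + (V / n) * np.sin(t * n)
    if s < 0:                                   # timelike: hyperbolic branch
        n = np.sqrt(-s)
        return x * np.cosh(t * n) + (V / n) * np.sinh(t * n)
    return x + t * V                            # null: straight line

# On S^{1,2} = { -x1^2 + x2^2 + x3^2 = 1 }: x = (0, 1, 0), with tangent vectors
# of each causal type (<V, x> = 0 for all three):
x = np.array([0.0, 1.0, 0.0])
y = pseudosphere_exp(x, np.array([0.0, 0.0, 1.0]), 0.7, p=1)   # spacelike
z = pseudosphere_exp(x, np.array([1.0, 0.0, 0.0]), 0.7, p=1)   # timelike
w = pseudosphere_exp(x, np.array([1.0, 0.0, 1.0]), 0.7, p=1)   # null
# each result satisfies minkowski(., ., 1) == 1, i.e. stays on the pseudosphere
```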
Pseudosphere S^{p,q} in a Minkowski space
▶ Parallel transport along geodesics: for any x ∈ S^{p,q} and V ∈ T_x S^{p,q}, the parallel transport of V along the geodesic emanating from x in the direction X ∈ T_x S^{p,q} is

    P_{x→Exp_x(tX)}(V) = (⟨V, X⟩/‖X‖) [−x sin(t‖X‖) + (X/‖X‖) cos(t‖X‖)] + (V − ⟨V, X⟩ X/‖X‖²),   if ⟨X, X⟩ > 0,
    P_{x→Exp_x(tX)}(V) = −(⟨V, X⟩/‖X‖) [x sinh(t‖X‖) + (X/‖X‖) cosh(t‖X‖)] + (V + ⟨V, X⟩ X/‖X‖²), if ⟨X, X⟩ < 0,
    P_{x→Exp_x(tX)}(V) = −⟨V, X⟩ (t x + ½ t² X) + V,                                              if ⟨X, X⟩ = 0.

Optimization on Semi-Riemannian Submanifolds
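Similarly, a sketch of the parallel transport formula above (numpy; illustrative names). Parallel transport preserves the scalar product and keeps vectors tangent along the geodesic, which gives two easy checks:

```python
import numpy as np

def mink(u, v, p):
    """Minkowski scalar product with p minus signs."""
    return -u[:p] @ v[:p] + u[p:] @ v[p:]

def pseudosphere_transport(x, X, V, t, p):
    s = mink(X, X, p)
    c = mink(V, X, p)
    if s > 0:                                   # spacelike geodesic direction
        n = np.sqrt(s)
        return (c / n) * (-x * np.sin(t * n) + (X / n) * np.cos(t * n)) \
               + (V - c * X / n**2)
    if s < 0:                                   # timelike geodesic direction
        n = np.sqrt(-s)
        return -(c / n) * (x * np.sinh(t * n) + (X / n) * np.cosh(t * n)) \
               + (V + c * X / n**2)
    return -c * (t * x + 0.5 * t**2 * X) + V    # null geodesic direction

# On S^{1,2}: x = (0,1,0), spacelike X, null tangent V; transport V along t = 0.7
x = np.array([0.0, 1.0, 0.0])
X = np.array([0.0, 0.0, 1.0])      # <X, X> = 1 > 0
V = np.array([1.0, 0.0, 1.0])      # tangent at x, with <V, V> = 0
PV = pseudosphere_transport(x, X, V, 0.7, p=1)
# <PV, PV> = <V, V>, and PV is tangent at Exp_x(0.7 X) = x cos(0.7) + X sin(0.7)
```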
Pseudosphere S^{p,q} in a Minkowski space

S^{p,q} := {x ∈ R^{p,q} | −x₁² − ··· − x_p² + x_{p+1}² + ··· + x_{p+q}² = 1}

Semi-Riemannian vs. Riemannian optimization on pseudospheres
▶ Riemannian geodesics and parallel transports are much harder to compute on S^{p,q}, if possible at all
▶ S^{p,q} is non-compact
• Fiori, Simone. Learning by natural gradient on noncompact matrix-type pseudo-Riemannian manifolds. IEEE Transactions on Neural Networks 21, no. 5 (2010): 841-852.

Example: Pseudospheres
min_{x ∈ S^{p,q}} ‖x − ξ‖₂², where ξ ∉ S^{p,q}
[Figure: convergence curve, log scale from 10⁰ down to 10⁻⁶, over 200 iterations]
• Tingran Gao, Lek-Heng Lim, and Ke Ye. Semi-Riemannian Manifold Optimization. arXiv:1812.07643 (2018)
• MATLAB code available at https://github.com/trgao10/SemiRiem

Matrix Manifolds
▶ Matrix manifolds that are Lie groups have trivial tangent bundles, so they admit semi-Riemannian metrics of arbitrary index
▶ Unfortunately, very few matrix manifolds inherit non-degenerate semi-Riemannian structures from the ambient Minkowski space; geodesics and parallel transports for the non-degenerate ones are hard to compute in general
• Tingran Gao, Lek-Heng Lim, and Ke Ye. Semi-Riemannian Manifold Optimization. arXiv:1812.07643 (2018)

Thank you!
Future Work:
▶ More applications, especially for matrix computation
▶ Trajectory analysis for non-convex optimization
▶ Numerical differential geometry
• Tingran Gao, Lek-Heng Lim, and Ke Ye. Semi-Riemannian Manifold Optimization. arXiv:1812.07643 (2018)
• MATLAB code available at https://github.com/trgao10/SemiRiem