Semi-Riemannian Manifold Optimization
Semi-Riemannian Manifold Optimization
Tingran Gao
William H. Kruskal Instructor
Committee on Computational and Applied Mathematics (CCAM), Department of Statistics, The University of Chicago
2018 China-Korea International Conference on Matrix Theory with Applications
Shanghai University, Shanghai, China, December 19, 2018

Outline
Motivation
- Riemannian Manifold Optimization
- Barrier Functions and Interior Point Methods
Semi-Riemannian Manifold Optimization
- Optimality Conditions
- Descent Direction
- Semi-Riemannian First Order Methods
- Metric Independence of Second Order Methods
Optimization on Semi-Riemannian Submanifolds

Joint work with Lek-Heng Lim (The University of Chicago) and Ke Ye (Chinese Academy of Sciences)

Riemannian Manifold Optimization
- Optimization on manifolds:
  $\min_{x \in M} f(x)$
  where $(M, g)$ is a (typically nonlinear, non-convex) Riemannian manifold and $f : M \to \mathbb{R}$ is a smooth function on $M$
- A difficult constrained optimization problem, handled as unconstrained optimization from the perspective of manifold optimization
- Key ingredients: tangent spaces, geodesics, the exponential map, parallel transport, gradient $\nabla f$, Hessian $\nabla^2 f$

Riemannian Manifold Optimization
- Riemannian steepest descent:
  $x_{k+1} = \mathrm{Exp}_{x_k}\left(-t \nabla f(x_k)\right), \quad t \geq 0$
- Riemannian Newton's method:
  $\nabla^2 f(x_k)\, \eta_k = -\nabla f(x_k), \qquad x_{k+1} = \mathrm{Exp}_{x_k}\left(t \eta_k\right), \quad t \geq 0$
- Riemannian trust region
- Riemannian conjugate gradient
- ......
Gabay (1982), Smith (1994), Edelman et al. (1998), Absil et al. (2008), Adler et al. (2002), Ring and Wirth (2012), Huang et al. (2015), Sato (2016), etc.

Riemannian Manifold Optimization
- Applications: optimization problems on matrix manifolds
- Stiefel manifolds: $V_k(\mathbb{R}^n) = O(n)/O(n-k)$
- Grassmannian manifolds: $\mathrm{Gr}(k, n) = O(n)/(O(k) \times O(n-k))$
- Flag manifolds: for $n_1 + \cdots + n_d = n$, $\mathrm{Flag}_{n_1, \cdots, n_d} = O(n)/(O(n_1) \times \cdots \times O(n_d))$
- Shape spaces: $\left(\mathbb{R}^{k \times n} \setminus \{0\}\right)/O(n)$
- ......
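The ingredients above can be made concrete in a small numerical sketch (not from the talk): Riemannian steepest descent on the unit sphere $S^{n-1}$ for the Rayleigh quotient $f(x) = x^\top A x$, whose minimizers are eigenvectors of the smallest eigenvalue. NumPy is assumed, and all function names and step-size choices here are illustrative.

```python
import numpy as np

def sphere_exp(x, v):
    """Exponential map on the unit sphere at x, applied to a tangent vector v."""
    nv = np.linalg.norm(v)
    if nv < 1e-14:
        return x
    return np.cos(nv) * x + np.sin(nv) * (v / nv)

def riemannian_gradient(x, egrad):
    """Riemannian gradient = projection of the Euclidean gradient onto T_x S^{n-1}."""
    return egrad - (egrad @ x) * x

def steepest_descent(A, x0, step=0.05, iters=5000):
    """Riemannian steepest descent for f(x) = x^T A x on the unit sphere."""
    x = x0 / np.linalg.norm(x0)
    for _ in range(iters):
        g = riemannian_gradient(x, 2 * A @ x)   # Euclidean gradient of f is 2Ax
        x = sphere_exp(x, -step * g)
        x /= np.linalg.norm(x)                  # guard against numerical drift
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
A = (A + A.T) / 2                               # symmetric test matrix
x = steepest_descent(A, rng.standard_normal(5))
```

The same skeleton accommodates the other methods on the slide by swapping the search direction (Newton, conjugate gradient) while keeping the exponential-map update.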
From Riemannian to Semi-Riemannian
Motivation 1: A Question from a Geometer
How does optimization depend on the choice of metric?
- Critical points $\{x \in M \mid \nabla f(x) = 0\}$ are metric independent
- Gradients, Hessians, and geodesics are metric dependent
- The index of the Hessian becomes metric independent at a critical point (Morse theory)
- Optimization trajectories are obviously metric dependent
- Newton's equation is metric independent:
  $\nabla^2 f(x_k)\, \eta_k = -\nabla f(x_k) \iff \widetilde{\nabla}^2 f(x_k)\, \eta_k = -\widetilde{\nabla} f(x_k)$
- Does it still work if the metric structure is more general than Riemannian (e.g. semi-Riemannian, Finsler, etc.)?

Barrier Functions and Interior Point Methods
Let $f : Q \to \mathbb{R}$ be strictly convex, defined on an open subset $Q \subset \mathbb{R}^d$.
- $\nabla^2 f(x) \succ 0$ at any $x \in Q$
- $\nabla^2 f(x)$ defines a Riemannian metric on $Q$
- The gradient under this Riemannian metric is $\left[\nabla^2 f(x)\right]^{-1} \nabla f(x)$
- Newton's method for $f$ is exactly gradient descent under the Riemannian metric defined by $\nabla^2 f(x)$!
- Nesterov and Todd (2002): primal-dual central paths are "close to" geodesics under this Riemannian metric
• Nesterov, Yurii E., and Michael J. Todd. On the Riemannian geometry defined by self-concordant barriers and interior-point methods. Foundations of Computational Mathematics 2, no. 4 (2002): 333-361.
• Duistermaat, Johannes J. On Hessian Riemannian structures. Asian Journal of Mathematics 5, no. 1 (2001): 79-91.

From Riemannian to Semi-Riemannian
Motivation 2: Nonconvex objectives for interior point methods
- What if $f : Q \to \mathbb{R}$ is not strictly convex?
- What happens to Newton's method if the Hessian is not positive semi-definite? (Newton's method with modified Hessian)

Semi-Riemannian Geometry
- Scalar product: a symmetric bilinear form $\langle \cdot, \cdot \rangle : V \times V \to \mathbb{R}$ on a vector space $V$, not necessarily positive (semi-)definite
- Non-degeneracy: $\langle v, w \rangle = 0$ for all $w \in V$ implies $v = 0$
- A semi-Riemannian manifold is a differentiable manifold $M$ with a scalar product $\langle \cdot, \cdot \rangle_x$ defined on each $T_x M$, varying smoothly with respect to $x \in M$
- A semi-Riemannian manifold is non-degenerate if $\langle \cdot, \cdot \rangle_x$ is non-degenerate for all $x \in M$
- Non-degeneracy need not be inherited by subspaces!
- Semi-Riemannian submanifolds, geodesics, parallel transport, gradients, Hessians, ...... are all well-defined in the non-degenerate case
- Of great interest to general relativity as a spacetime model (Lorentzian geometry)

Semi-Riemannian Geometry
A Simplest Degenerate Subspace Example
- Consider $\mathbb{R}^2$ equipped with the Lorentzian metric of signature $(-, +)$, i.e., for $x = (x_1, x_2) \in \mathbb{R}^2$ and $y = (y_1, y_2) \in \mathbb{R}^2$, define the scalar product $\langle x, y \rangle = -x_1 y_1 + x_2 y_2$
- Let $W = \mathrm{span}\{(1, 1)\}$, which is a degenerate subspace of $\mathbb{R}^2$
- If we define $W^\perp := \{v \in \mathbb{R}^2 \mid \langle w, v \rangle = 0 \ \forall w \in W\}$, then one has $W = W^\perp$!
- Take-home message: if $W$ is a degenerate subspace of a scalar product space $(V, \langle \cdot, \cdot \rangle)$, then in general $W + W^\perp \neq V$, and it may well be the case that $W \cap W^\perp \neq 0$!
- Nevertheless: if $W$ is a non-degenerate subspace of $V$, then it is still true that $V = W \oplus W^\perp$

Examples of Semi-Riemannian Manifolds
- Minkowski space $\mathbb{R}^{p,q}$ ($p \geq 0$, $q \geq 0$): the vector space $\mathbb{R}^{p+q}$ equipped with the scalar product
  $\langle u, v \rangle = u^\top I_{p,q}\, v, \qquad I_{p,q} = \begin{pmatrix} -I_p & 0 \\ 0 & I_q \end{pmatrix}$
- Square matrices $\mathbb{R}^{n \times n}$, equipped with the scalar product
  $\langle A, B \rangle = \mathrm{Tr}\left(A^\top I_{p,q} B\right) = \mathrm{vec}(A)^\top \left(I_n \otimes I_{p,q}\right) \mathrm{vec}(B)$
- Submanifolds of semi-Riemannian manifolds

Examples of Semi-Riemannian Manifolds: Applications
- Spacetime models: Lorentzian spaces, de Sitter spaces, anti-de Sitter spaces, ......
- Network models: social networks "behave like" spaces of negative Alexandrov curvature
• Carroll, Sean M. Spacetime and geometry: An introduction to general relativity. San Francisco, USA: Addison-Wesley, 2004.
• Krioukov, Dmitri, Maksim Kitsak, Robert S. Sinkovits, David Rideout, David Meyer, and Marián Boguñá. Network cosmology. Scientific Reports 2 (2012): 793.

Optimality Conditions
Given a semi-Riemannian manifold $M$ with scalar product $\langle \cdot, \cdot \rangle_x$ on $T_x M$, define the semi-Riemannian gradient of a smooth function $f \in C^2(M)$ by
  $\langle Df, X \rangle_x = Xf(x), \quad \forall X \in \Gamma(TM),\ x \in M$
and define the semi-Riemannian Hessian of $f$ by
  $D^2 f(X, Y) = X(Yf) - (D_X Y) f, \quad \forall X, Y \in \Gamma(TM)$
where $D_X Y$ is the covariant derivative of $Y$ with respect to $X$.

Optimality Conditions
Necessary conditions: a local minimum $x \in M$ satisfies
- (First-order necessary condition) $Df(x) = 0$ if $f : M \to \mathbb{R}$ is first-order differentiable
- (Second-order necessary condition) $Df(x) = 0$ and $D^2 f(x) \succeq 0$ if $f : M \to \mathbb{R}$ is second-order differentiable
Sufficient condition:
- (Second-order sufficient condition) If $x \in M$ is an interior point of a semi-Riemannian manifold $M$, $Df(x) = 0$, and $D^2 f(x) \succ 0$, then $x$ is a strict relative minimum of $f$

Descent Direction
- In general, $-Df(x)$ is not a descent direction at $x \in M$!
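This failure is easy to exhibit numerically. The sketch below (not from the talk; NumPy assumed, names illustrative) works in the Minkowski plane $\mathbb{R}^{1,1}$ with $f(x) = x_1$: the semi-Riemannian gradient has negative self-pairing, so $-Df$ actually increases $f$, while the modified direction $[Df]^+$ built from an orthonormal basis (introduced on the following slides) does descend.

```python
import numpy as np

I_pq = np.diag([-1.0, 1.0])                 # Minkowski metric, signature (-, +)
sp = lambda u, v: u @ I_pq @ v              # scalar product <u, v>

f = lambda x: x[0]                          # smooth objective f(x) = x_1
egrad = np.array([1.0, 0.0])                # Euclidean gradient of f

# Semi-Riemannian gradient: <Df, X> = Xf for all X  =>  I_pq Df = egrad
Df = np.linalg.solve(I_pq, egrad)           # here Df = (-1, 0)

x = np.array([0.0, 0.0])
t = 0.1
assert f(x - t * Df) > f(x)                 # -Df is an *ascent* direction here
assert sp(-Df, Df) > 0                      # because <Df, Df> = -1 < 0

# Fix: [Df]^+ = sum_i <Df, e_i> e_i over an orthonormal basis (here: standard)
Dfp = sum(sp(Df, e) * e for e in np.eye(2))
assert f(x - t * Dfp) < f(x)                # -[Df]^+ does decrease f
assert sp(-Dfp, Df) < 0                     # equals -sum_i |<Df, e_i>|^2
```

In this flat example the modified direction $[Df]^+$ coincides with the Euclidean gradient, which is why it descends.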
- The main issue is that $\langle Df(x), Df(x) \rangle_x$ may be negative, so the first-order approximation $f(x - Df(x)) \approx f(x) - \langle Df(x), Df(x) \rangle_x$ need not be smaller than $f(x)$
- A similar issue exists in Newton's method: when the Hessian $\nabla^2 f(x)$ is not positive definite, the Newton direction $-\left[\nabla^2 f(x)\right]^{-1} \nabla f(x)$ may not be a descent direction
- Solution for Newton's method: modified Hessian methods
- For semi-Riemannian optimization: a similar idea of modifying $Df(x)$

Descent Direction
Solution: modify $-Df(x)$ to obtain a descent direction on the semi-Riemannian manifold
- For each $T_x M$, use semi-Riemannian Gram-Schmidt to find an orthonormal basis $e_1, \cdots, e_d$ of $T_x M$
- Construct $[Df(x)]^+ = \sum_{i=1}^{d} \langle Df(x), e_i \rangle\, e_i$
- $-[Df(x)]^+$ is a descent direction, since
  $\left\langle -[Df(x)]^+, Df(x) \right\rangle = -\sum_{i=1}^{d} \left|\langle Df(x), e_i \rangle\right|^2 < 0$
  whenever $Df(x) \neq 0$

Semi-Riemannian Steepest Descent I/II
Semi-Riemannian Steepest Descent II/II
Semi-Riemannian Conjugate Gradient
[Algorithm listings on these three slides were not recovered in this extraction.]

Semi-Riemannian Second-Order Methods
- Newton's method: solve $D^2 f(x)\, \eta = -Df(x)$ for a direction $\eta \in T_x M$ at $x \in M$. Note that this is exactly the same Newton's equation as in Riemannian optimization!
- Trust region: minimize a local quadratic model near each $x \in M$. Again the same as in Riemannian optimization!
- The equivalence between Riemannian and semi-Riemannian second-order methods is not just a coincidence; these methods build upon local polynomial surrogates, which are always metric-independent in the sense of jets.
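This metric independence can be checked numerically in the flat case, where the metric is a constant nondegenerate matrix $G$, the metric gradient is $G^{-1}\nabla f$, and the metric Hessian operator is $G^{-1}\nabla^2 f$ (Christoffel symbols vanish). The sketch below (not from the talk; NumPy assumed) shows that the resulting Newton direction does not depend on $G$:

```python
import numpy as np

rng = np.random.default_rng(1)
H = rng.standard_normal((3, 3))
H = H @ H.T + np.eye(3)                       # Euclidean Hessian of some f at x
g = rng.standard_normal(3)                    # Euclidean gradient of f at x

for G in [np.eye(3), np.diag([-1.0, 1.0, 1.0])]:   # Euclidean and Minkowski metrics
    metric_grad = np.linalg.solve(G, g)            # G^{-1} grad f
    metric_hess = np.linalg.solve(G, H)            # G^{-1} Hess f
    eta = np.linalg.solve(metric_hess, -metric_grad)
    # the Newton direction is the same for every nondegenerate constant metric G:
    assert np.allclose(eta, np.linalg.solve(H, -g))
```

Algebraically this is immediate: $G^{-1} H \eta = -G^{-1} g$ is equivalent to $H \eta = -g$, so the factor of $G$ cancels.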
Which manifolds admit semi-Riemannian structures?
- Euclidean spaces and all Riemannian manifolds are particular semi-Riemannian manifolds
- The tangent bundle of a manifold admitting a semi-Riemannian metric that is not Riemannian must admit at least one decomposition into two sub-bundles
- Any manifold with trivial tangent bundle admits semi-Riemannian metrics, e.g.