
arXiv:1802.02628v1 [math.OC] 7 Feb 2018

Manifold Optimization Over the Set of Doubly Stochastic Matrices: A Second-Order Geometry

Ahmed Douik, Student Member, IEEE, and Babak Hassibi, Member, IEEE

Ahmed Douik and Babak Hassibi are with the Department of Electrical Engineering, California Institute of Technology, Pasadena, CA 91125 USA (e-mail: {ahmed.douik, hassibi}@caltech.edu).

Abstract—Convex optimization is a well-established research area with applications in almost all fields. Over the decades, multiple approaches have been proposed to solve convex programs. The development of interior-point methods allowed solving a more general set of convex programs known as semi-definite programs and second-order cone programs. However, it has been established that these methods are excessively slow for high dimensions, i.e., they suffer from the curse of dimensionality. On the other hand, optimization algorithms on manifolds have shown great ability in finding solutions to nonconvex problems in reasonable time. This paper is interested in solving a subset of convex optimization problems using a different approach. The main idea behind Riemannian optimization is to view a constrained optimization problem as an unconstrained one over a restricted search space. The paper introduces three manifolds to solve convex programs under particular box constraints. The manifolds, called the doubly stochastic, the symmetric, and the definite multinomial manifolds, generalize the simplex, also known as the multinomial manifold. The proposed manifolds and algorithms are well-adapted to solving convex programs in which the variable of interest is a multidimensional probability distribution function. Theoretical analysis and simulation results testify the efficiency of the proposed method over state of the art methods. In particular, they reveal that the proposed framework outperforms conventional generic and specialized solvers, especially in high dimensions.

Index Terms—Riemannian manifolds, symmetric doubly stochastic matrices, positive matrices, convex optimization.

I. INTRODUCTION

Numerical optimization is the foundation of various engineering and computational sciences. Consider a mapping f from a subset D of R^n to R. The goal of the optimization algorithms is to find an extreme point x* ∈ D such that f(x*) ≤ f(y) for all points y in the neighborhood of x*. Unconstrained Euclidean optimization refers to the setup in which the domain of the objective function is the whole space, i.e., D = R^n. On the other hand, constrained Euclidean optimization denotes the optimization problem in which the search set is constrained, i.e., D ⊂ R^n. The traditional optimization schemes are identified with the word Euclidean, in contrast with the Riemannian algorithms in the rest of the paper.

Convex optimization is a special case of constrained optimization in which both the search set and the objective function are convex. Historically initiated with the study of least-squares and linear programming problems, convex optimization plays a crucial role in optimization algorithms thanks to the desirable convergence property it exhibits. The development of interior-point methods allowed solving a more general set of convex programs known as second-order cone programs and semi-definite programs. A summary of convex optimization methods and their performance analysis can be found in the seminal book [1].

Riemannian optimization exploits the fact that the search space can be identified with a lower-dimensional manifold that is embedded in a higher-dimensional Euclidean space and takes advantage of its underlying geometric structure. The constrained Euclidean optimization problem, which can be excessively slow to solve in high dimensions, is reformulated as an unconstrained optimization over a restricted search space, a.k.a., a Riemannian manifold, that can be solved using advanced tools, e.g., [2].

This paper introduces a framework for solving optimization problems in which the optimization variable is a doubly stochastic matrix. Such an optimization framework is particularly interesting for clustering applications [4]–[7], e.g., problems in which one wishes to recover the structure of a graph by minimizing a predefined cost function under the constraint that the variable is a doubly stochastic matrix. This work provides a unified framework to carry such optimization.

A. State of the Art

Thanks to the aforementioned low-dimension feature, optimization over Riemannian manifolds is expected to perform more efficiently [3]. Therefore, a large body of literature is dedicated to adapting traditional Euclidean optimization methods and their convergence properties to Riemannian manifolds.

Optimization algorithms on Riemannian manifolds appeared in the optimization literature as early as the 1970's with the work of Luenberger [8], wherein the standard Newton's method has been adapted to optimization problems on manifolds. A decade later, Gabay [9] introduces the steepest descent and the quasi-Newton algorithm on embedded submanifolds of R^n. The work investigates the global and local convergence properties of both methods. The analysis of both the steepest descent and the Newton method is extended in [10], [11] to Riemannian manifolds. By using the exact line search, the authors concluded the convergence of their proposed algorithms. The assumption is relaxed in [12], wherein the author provides convergence guarantees for the steepest descent with Armijo step-size control and a convergence rate for Newton's method.

The above-mentioned works substitute the concept of the line search in Euclidean algorithms by a search along a geodesic, which generalizes the idea of finding and searching along a straight line. While the idea is natural and intuitive, it might not be practical. Indeed, the expression of the geodesic requires computing the exponential map, which may be as complicated as solving the original optimization problem [13]. To overcome the problem, the authors in [14] suggest approximating the exponential map up to a given order, called a retraction,
and show quadratic convergence for Newton's method under such setup. The work initiated more sophisticated optimization algorithms such as the trust region methods [3], [15]–[18]. Analysis of the convergence of first and second order methods on Riemannian manifolds, e.g., gradient and conjugate gradient descent, Newton's method, and trust region methods, using general retractions is summarized in [13].

Thanks to the theoretical convergence guarantees mentioned above, the optimization algorithms on Riemannian manifolds are gradually gaining momentum in the optimization field [3]. Several successful algorithms have been proposed to solve non-convex problems, e.g., low-rank matrix completion [19]–[21], online learning [22], clustering [23], [24], and tensor decomposition [25]. It is worth mentioning that these works modify the optimization algorithm by using a general connection instead of the genuine parallel vector transport to move from a tangent space to the other while computing the (approximate) Hessian. Such an approach conserves the global convergence of the quasi-Newton scheme but no longer ensures its superlinear convergence behavior [26].

Despite the advantages cited above, the use of optimization algorithms on manifolds is relatively limited. This is mainly due to the lack of a systematic mechanism to turn a constrained optimization problem into an optimization over a manifold, provided that the search space forms a manifold, e.g., convex optimization. Such reformulation, usually requiring some level of understanding of differential geometry and Riemannian manifolds, is prohibitively complex for regular use. This paper addresses the problem by introducing new manifolds that allow solving a non-negligible class of optimization problems in which the variable of interest can be identified with a multidimensional probability distribution function.

B. Contributions

In [25], in a context of tensor decomposition, the authors propose a framework to optimize functions in which the variables are stochastic matrices. This paper proposes extending the results to a more general class of manifolds by proposing a framework for solving a subset of convex programs including those in which the optimization variable represents a doubly stochastic and possibly symmetric and/or definite multidimensional probability distribution function. To this end, the paper introduces three manifolds which generalize the multinomial manifold. While the multinomial manifold allows representing only stochastic matrices, the proposed ones characterize doubly stochastic, symmetric, and definite arrays, respectively. Therefore, the proposed framework allows solving a subset of convex programs. To the best of the authors' knowledge, the proposed manifolds have not been introduced or studied in the literature.

The first part of the manuscript introduces all relevant concepts of the Riemannian geometry and provides insights on the optimization algorithms on such manifolds. In an effort to make the content of this document accessible to a larger audience, it does not assume any prerequisite on differential geometry. As a result, the definitions, concepts, and results in this paper are tailored to the manifolds of interest and may not be applicable to abstract manifolds.

The paper investigates the first and second order Riemannian geometry of the proposed manifolds endowed with the Fisher information metric, which guarantees that the manifolds have a differentiable structure. For each manifold, the tangent space, Riemannian gradient, Hessian, and retraction are derived. With the aforementioned expressions, the manuscript formulates first and second order optimization algorithms and characterizes their complexity. Simulation results are provided to further illustrate the efficiency of the proposed method against state of the art algorithms.

The rest of the manuscript is organized as follows: Section II introduces the optimization algorithms on manifolds and lists the problems of interest in this paper. In Section III, the doubly stochastic manifold is introduced and its first and second order geometry derived. Section IV iterates a similar study for a particular case of doubly stochastic matrices known as the symmetric manifold. The study is extended to the definite symmetric manifold in Section V. Section VI suggests first and second order algorithms and analyzes their complexity. Finally, before concluding in Section VIII, the simulation results are plotted and discussed in Section VII.

II. OPTIMIZATION ON RIEMANNIAN MANIFOLDS

This section introduces the numerical optimization methods on smooth matrix manifolds. The first part introduces the Riemannian manifold notations and operations. The second part extends the first and second order Euclidean optimization algorithms to the Riemannian manifolds and introduces the necessary machinery. Finally, the problems of interest in this paper are provided and the different manifolds identified.

A. Manifold Notation and Operations

The study of optimization algorithms on smooth manifolds has attracted significant attention in the previous years. However, such studies require some level of knowledge of differential geometry. In this paper, only smooth embedded matrix manifolds are considered. Hence, the definitions and theorems may not apply to abstract manifolds. In addition, the authors opted for a coordinate-free analysis, omitting the chart and the differentiable structure of the manifold. For an introduction to differential geometry, abstract manifolds, and Riemannian manifolds, we refer the readers to the references [27]–[29], respectively.

An embedded matrix manifold M is a smooth subset of a vector space E included in the set of matrices R^{n×m}. The set E is called the ambient or the embedding space. By smooth subset, we mean that M can be mapped by a bijective function, i.e., a chart, to an open subset of R^d, where d is called the dimension of the manifold. The dimension d can be thought of as the degree of freedom of the manifold. In particular, a vector space E is a manifold.

In the same line of thought of approximating a function locally by its derivative, a manifold M of dimension d can be approximated locally at a point X by a d-dimensional vector space T_X M generated by taking derivatives of all smooth curves going through X. Formally, let γ(t) : I ⊂ R −→ M be a curve on M with γ(0) = X. Define the derivative of γ(t) at zero as follows:
    γ'(0) = lim_{t→0} (γ(t) − γ(0)) / t.   (1)
The space generated by all γ'(0) represents a vector space T_X M called the tangent space of M at X. Figure 1 shows an example of a two-dimensional tangent space generated by a couple of curves. The tangent space plays a primordial role in the optimization algorithms over manifolds in the same way as the derivative of a function plays an important role in Euclidean optimization. The union of all tangent spaces is referred to as the tangent bundle of M, i.e.,:
    T M = ∪_{X ∈ M} T_X M.   (2)

[Fig. 1. Tangent space of a 2-dimensional manifold embedded in R^3. The tangent space T_X M is computed by taking derivatives of the curves going through X at the origin.]

As shown previously, the notion of tangent space generalizes the notion of directional derivative. However, to optimize functions, one needs the notion of directions and lengths, which can be achieved by endowing each tangent space T_X M with a bilinear, symmetric positive form ⟨., .⟩_X, i.e., an inner product. Let g : T M × T M −→ R be a smoothly varying bilinear form such that its restriction on each tangent space is the previously defined inner product. In other words:
    g(ξ_X, η_X) = ⟨ξ_X, η_X⟩_X, ∀ ξ_X, η_X ∈ T_X M.   (3)
Such a metric, known as the Riemannian metric, turns the manifold into a Riemannian manifold. Any manifold (in this paper) admits at least a Riemannian metric. Lengths of tangent vectors are naturally induced from the inner product. The norm on the tangent space T_X M is denoted by ||.||_X and defined by:
    ||ξ_X||_X = sqrt(⟨ξ_X, ξ_X⟩_X), ∀ ξ_X ∈ T_X M.   (4)
Both the ambient space and the tangent space being vector spaces, one can define the orthogonal projection Π_X : E −→ T_X M verifying Π_X ∘ Π_X = Π_X. The projection is said to be orthogonal with respect to the restriction of the Riemannian metric to the tangent space, i.e., Π_X is orthogonal in the ⟨., .⟩_X sense.

B. First and Second Order Algorithms

The general idea behind unconstrained Euclidean numerical optimization methods is to start with an initial point X^0 and to iteratively update it according to certain predefined rules in order to obtain a sequence {X^t} which converges to a local minimizer of the objective function. A typical update strategy is the following:
    X^{t+1} = X^t + α^t p^t,   (5)
where α^t is the step size and p^t the search direction. Let Grad f(X) be the Euclidean gradient^2 of the objective function defined as the unique vector satisfying:
    ⟨Grad f(X), ξ⟩ = Df(X)[ξ], ∀ ξ ∈ E,   (6)
where ⟨., .⟩ is the inner product on the vector space E and Df(X)[ξ] is the directional derivative of f given by:
    Df(X)[ξ] = lim_{t→0} (f(X + tξ) − f(X)) / t.   (7)
In order to obtain a descent direction, i.e., f(X^{t+1}) < f(X^t) for a small enough step size α^t, the search direction p^t is chosen in the half space spanned by −Grad f(X). In other words, the following expression holds:
    ⟨Grad f(X^t), p^t⟩ < 0.   (8)
In particular, the choices of the search direction satisfying
    p^t = −Grad f(X^t) / ||Grad f(X^t)||   (9)
    Hess f(X^t)[p^t] = −Grad f(X^t)   (10)
yield the celebrated steepest descent (9) and the Newton's method (10), wherein Hess f(X)[ξ] is the Euclidean Hessian^3 of f at X defined as an operator from E to E satisfying:
1) ⟨Hess f(X)[ξ], ξ⟩ = D^2 f(X)[ξ, ξ] = D(Df(X)[ξ])[ξ],
2) ⟨Hess f(X)[ξ], η⟩ = ⟨ξ, Hess f(X)[η]⟩, ∀ ξ, η ∈ E.
After choosing the search direction, the step size α^t is chosen so as to satisfy the Wolfe conditions for some constants c_1 ∈ (0, 1) and c_2 ∈ (c_1, 1), i.e.,
1) The Armijo condition:
    f(X^t + α^t p^t) − f(X^t) ≤ c_1 α^t ⟨Grad f(X^t), p^t⟩   (11)
2) The curvature condition:
    ⟨Grad f(X^t + α^t p^t), p^t⟩ ≥ c_2 ⟨Grad f(X^t), p^t⟩.   (12)
The Riemannian version of the steepest descent, called the line-search algorithm, follows a similar logic as the Euclidean one. The search direction is obtained with respect to the Riemannian gradient, which is defined in a similar manner as the Euclidean one with the exception that it uses the Riemannian geometry, i.e.,:

Definition 1. The Riemannian gradient of f at X, denoted by grad f(X), of a manifold M is defined as the unique vector in T_X M that satisfies:
    ⟨grad f(X), ξ_X⟩_X = Df(X)[ξ_X], ∀ ξ_X ∈ T_X M.   (14)

^2 The expression of the Euclidean gradient (denoted by Grad) is explicitly given to show the analogy with the Riemannian gradient (denoted by grad). The nabla symbol ∇ is not used in the context of gradient as it is reserved for the Riemannian connection. Similar notations are used for the Hessian.
^3 The Euclidean Hessian is seen as an operator to show the connection with the Riemannian Hessian. One can show that the proposed definition matches the "usual" second order derivative matrix for ξ = I.
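To make the Euclidean update rule (5) and the Armijo condition (11) concrete, the following Python sketch implements a plain steepest descent with backtracking line search. It is an illustration only, not code from the paper; the quadratic test objective, the constant c1, and the shrinking factor are arbitrary choices.

```python
import numpy as np

def armijo_steepest_descent(f, grad_f, x0, c1=1e-4, shrink=0.5, tol=1e-8, max_iter=500):
    """Euclidean steepest descent X^{t+1} = X^t + alpha^t p^t with p^t = -Grad f / ||Grad f||,
    cf. (5) and (9), where alpha^t is found by backtracking until the Armijo condition (11) holds."""
    x = x0.copy()
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:
            break
        p = -g / np.linalg.norm(g)           # normalized descent direction, satisfies (8)
        alpha, slope = 1.0, np.vdot(g, p)    # slope = <Grad f(x), p> < 0
        while f(x + alpha * p) - f(x) > c1 * alpha * slope:
            alpha *= shrink                  # backtrack until (11) is satisfied
            if alpha < 1e-20:
                break
        x = x + alpha * p
    return x

# Toy usage: minimize f(X) = 0.5 * ||X - A||_F^2 over R^{5x5}.
A = np.random.rand(5, 5)
f = lambda X: 0.5 * np.linalg.norm(X - A) ** 2
grad_f = lambda X: X - A
X_star = armijo_steepest_descent(f, grad_f, np.zeros_like(A))
```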

Algorithm 1 Line-Search Method on Riemannian Manifold
Require: Manifold M, function f, and retraction R.
1: Initialize X ∈ M.
2: while ||grad f(X)||_X ≥ ǫ do
3:    Choose a search direction ξ_X ∈ T_X M such that:
         ⟨grad f(X), ξ_X⟩_X < 0.   (13)
4:    Compute the Armijo step size α.
5:    Retract X = R_X(α ξ_X).
6: end while
7: Output X.

After choosing the search direction as mandated by (8), the step size is selected according to Wolfe's conditions similar to the ones in (11) and (12). A more general definition of a descent direction, known as gradient related sequence, and the Riemannian Armijo step expression can be found in [13].

While the update step X^{t+1} = X^t + α^t p^t is trivial in the Euclidean optimization thanks to its vector space structure, it might result in a point X^{t+1} outside of the manifold. Moving in a given direction of a tangent space while staying on the manifold is realized by the concept of retraction. The ideal retraction is the exponential map Exp_X as it maps a tangent vector ξ_X ∈ T_X M to a point along the geodesic curve (straight line on the manifold) that goes through X in the direction of ξ_X. However, computing the geodesic curves is challenging and may be more difficult than the original optimization problem. Luckily, one can use a first-order retraction (called simply retraction in this paper) without compromising the convergence properties of the algorithms. A first-order retraction is defined as follows:

Definition 2. A retraction on a manifold M is a smooth mapping R from the tangent bundle T M onto M. For all X ∈ M, the restriction of R to T_X M, called R_X, satisfies the following properties:
• Centering: R_X(0) = X.
• Local rigidity: The curve γ_{ξ_X}(τ) = R_X(τ ξ_X) satisfies dγ_{ξ_X}(τ)/dτ |_{τ=0} = ξ_X, ∀ ξ_X ∈ T_X M.
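As a quick numerical illustration of Definition 2, the sketch below checks the centering and local rigidity properties for the classical rescaling retraction on the unit sphere, the two-dimensional manifold used in Figure 2. The sphere and this particular retraction are illustrative choices and are not objects introduced by the paper.

```python
import numpy as np

def sphere_retraction(x, xi):
    """A classical retraction on the unit sphere: rescale x + xi back to unit norm."""
    y = x + xi
    return y / np.linalg.norm(y)

rng = np.random.default_rng(0)
x = rng.normal(size=3); x /= np.linalg.norm(x)     # point on the sphere
xi = rng.normal(size=3); xi -= (xi @ x) * x        # tangent vector: <xi, x> = 0

# Centering: R_x(0) = x.
assert np.allclose(sphere_retraction(x, np.zeros(3)), x)

# Local rigidity: d/dtau R_x(tau * xi) at tau = 0 equals xi (finite-difference check).
tau = 1e-6
speed = (sphere_retraction(x, tau * xi) - sphere_retraction(x, -tau * xi)) / (2 * tau)
assert np.allclose(speed, xi, atol=1e-6)
```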

For some predefined Armijo step size, the procedure above is guaranteed to converge for all retractions [13]. The generalization of the steepest descent to the Riemannian manifold is obtained by finding the search direction that satisfies a similar equation as in the Euclidean scenario (9) using the Riemannian gradient. The update is then retracted to the manifold. The steps of the line-search method are summarized in Algorithm 1, and an illustration of an iteration of the algorithm is given in Figure 2.

[Fig. 2. The update step for the two-dimensional sphere embedded in R^3. The update direction ξ_{X^t} and step length α^t are computed in the tangent space T_{X^t} M. The point X^t + α^t ξ_{X^t} lies outside the manifold and needs to be retracted to obtain the update X^{t+1}. The update is not located on the geodesic γ(t) due to the use of a retraction instead of the exponential map.]

Generalizing the Newton's method to the Riemannian setting requires computing the Riemannian Hessian operator, which requires taking a directional derivative of a vector field. As the vector fields belong to different tangent spaces, one needs the notion of connection ∇ that generalizes the notion of directional derivative of a vector field. The notion of connection is intimately related to the notion of vector transport, which allows moving from a tangent space to the other as shown in Figure 3. The definition of a connection is given below:

[Fig. 3. An illustration of a vector transport T on a two-dimensional manifold embedded in R^3 that connects the tangent space of X with tangent vector ξ_X with the one of its retraction R_X(ξ_X). A connection ∇ can be obtained from the speed at the origin of the inverse of the vector transport T^{-1}.]

Definition 3. An affine connection ∇ is a mapping from T M × T M to T M that associates to each (η, ξ) the tangent vector ∇_η ξ satisfying, for all smooth f, g : M −→ R and a, b ∈ R:
• ∇_{f(η)+g(χ)} ξ = f(∇_η ξ) + g(∇_χ ξ)
• ∇_η(aξ + bϕ) = a ∇_η ξ + b ∇_η ϕ
• ∇_η(f ξ) = η(f) ξ + f(∇_η ξ),
wherein a vector field ξ acts on a function f by derivation, i.e., ξ(f) = D(f)[ξ], also noted ξf in the literature.

On a Riemannian manifold, the Levi-Civita connection is the canonical choice as it preserves the Riemannian metric. The connection is computed as follows:

Algorithm 2 Newton's Method on Riemannian Manifold
Require: Manifold M, function f, retraction R, and affine connection ∇.
1: Initialize X ∈ M.
2: while ||grad f(X)||_X ≥ ǫ do
3:    Find the descent direction ξ_X ∈ T_X M such that:
         hess f(X)[ξ_X] = −grad f(X),   (17)
      wherein hess f(X)[ξ_X] = ∇_{ξ_X} grad f(X).
4:    Retract X = R_X(ξ_X).
5: end while
6: Output X.

Definition 4. The Levi-Civita connection is the unique affine connection on M with the Riemannian metric ⟨., .⟩ that satisfies, for all η, ξ, χ ∈ T M:
1) ∇_η ξ − ∇_ξ η = [η, ξ]
2) χ⟨η, ξ⟩ = ⟨∇_χ η, ξ⟩ + ⟨η, ∇_χ ξ⟩,
where [ξ, η] is the Lie bracket, i.e., a function from the set of smooth functions to itself defined by [ξ, η]g = ξ(η(g)) − η(ξ(g)).

For the manifolds of interest in this paper, the Lie bracket can be written as the implicit directional differentiation [ξ, η] = D(η)[ξ] − D(ξ)[η]. The expression of the Levi-Civita connection can be computed using the Koszul formula:
    2⟨∇_χ η, ξ⟩ = χ⟨η, ξ⟩ + η⟨ξ, χ⟩ − ξ⟨χ, η⟩ − ⟨χ, [η, ξ]⟩ + ⟨η, [ξ, χ]⟩ + ⟨ξ, [χ, η]⟩.   (15)
Note that connections, and in particular the Levi-Civita connection, are defined for all vector fields on M. However, for the purpose of this paper, only the tangent bundle is of interest. With the above notion of connection, the Riemannian Hessian can be written as:

Definition 5. The Riemannian Hessian of f at X, denoted by hess f(X), of a manifold M is a mapping from T_X M into itself defined by:
    hess f(X)[ξ_X] = ∇_{ξ_X} grad f(X), ∀ ξ_X ∈ T_X M,   (16)
where grad f(X) is the Riemannian gradient and ∇ is the Riemannian connection on M.

It can readily be verified that the Riemannian Hessian verifies a similar property as the Euclidean one, i.e., for all ξ_X, η_X ∈ T_X M, we have
    ⟨hess f(X)[ξ_X], η_X⟩_X = ⟨ξ_X, hess f(X)[η_X]⟩_X.

Remark 1. The name of Riemannian gradient and Hessian is due to the fact that the function f can be approximated in a neighborhood of X by the following:
    f(X + δ_X) = f(X) + ⟨grad f(X), Exp_X^{-1}(δ_X)⟩_X + (1/2)⟨hess f(X)[Exp_X^{-1}(δ_X)], Exp_X^{-1}(δ_X)⟩_X.   (18)

Using the above definitions, the generalization of Newton's method to Riemannian optimization is done by replacing both the Euclidean gradient and Hessian by their Riemannian counterparts in (10). Hence, the search direction is the tangent vector ξ_X that satisfies hess f(X)[ξ_X] = −grad f(X). The update is found by retracting the tangent vector to the manifold. The steps of the algorithm are illustrated in Algorithm 2.

C. Problems of Interest

As shown in the previous section, computing the Riemannian gradient and Hessian for a given function over some manifold M allows the design of efficient algorithms that exploit the geometrical structure of the problem. The paper's main contribution is to propose a framework for solving a subset of convex programs including those in which the optimization variable represents a doubly stochastic and possibly symmetric and/or definite multidimensional probability distribution function.

In particular, the paper derives the relationship between the Euclidean gradient and Hessian and their Riemannian counterpart for the manifolds of doubly stochastic matrices, symmetric stochastic matrices, and symmetric positive stochastic matrices. In other words, for a convex function f : R^{n×m} −→ R, the paper proposes solving the following problem:
    min f(X)   (19a)
    s.t. X_ij > 0, ∀ 1 ≤ i ≤ n, 1 ≤ j ≤ m,   (19b)
    Σ_{j=1}^{m} X_ij = 1, ∀ 1 ≤ i ≤ n,   (19c)
    Σ_{i=1}^{n} X_ij = 1, ∀ 1 ≤ j ≤ m,   (19d)
    X = X^T,   (19e)
    X ≻ 0,   (19f)
wherein constraints (19b)-(19c) produce a stochastic matrix, (19b)-(19d) a doubly stochastic one, (19b)-(19e) a symmetric stochastic one, and (19b)-(19f) a definite symmetric stochastic one. While the first scenario is studied in [25], the next sections study each problem, respectively. Let 1 be the all-ones vector and define the multinomial, doubly stochastic multinomial, symmetric multinomial, and definite multinomial manifolds, respectively, as follows:
    P^m_n = {X ∈ R^{n×m} | X_ij > 0, X1 = 1}
    DP_n = {X ∈ R^{n×n} | X_ij > 0, X1 = 1, X^T 1 = 1}
    SP_n = {X ∈ R^{n×n} | X_ij > 0, X1 = 1, X = X^T}
    SP+_n = {X ∈ R^{n×n} | X_ij > 0, X1 = 1, X = X^T, X ≻ 0}.
For all the above manifolds, the paper uses the Fisher information as the Riemannian metric g whose restriction on T_X M is defined by:
    g(ξ_X, η_X) = ⟨ξ_X, η_X⟩_X = Tr((ξ_X ⊘ X)(η_X)^T) = Σ_{i=1}^{n} Σ_{j=1}^{m} (ξ_X)_ij (η_X)_ij / X_ij, ∀ ξ_X, η_X ∈ T_X M.   (20)
Endowing the multinomial with the Fisher information as Riemannian metric gives the manifold a differential structure that is invariant over the choice of coordinate system. More information about the Fisher information metric and its use in information geometry can be found in [30]. Using the manifold definitions above, the optimization problems can be reformulated over the manifolds as:
    min_{X ∈ P^m_n} f(X),   min_{X ∈ DP_n} f(X),   min_{X ∈ SP_n} f(X),   min_{X ∈ SP+_n} f(X).
In the rest of the paper, the notation A ⊘ B refers to the Hadamard, i.e., element-wise, division of A by B. Similarly, the symbol ⊙ denotes the Hadamard product.
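The four search spaces defined above translate directly into simple feasibility tests. The following Python helpers are a minimal illustration (the tolerance is an arbitrary choice) of which constraints of (19) each manifold enforces.

```python
import numpy as np

def in_multinomial(X, tol=1e-9):
    """X in P^m_n: strictly positive entries and unit row sums, cf. (19b)-(19c)."""
    return X.min() > 0 and np.allclose(X.sum(axis=1), 1, atol=tol)

def in_doubly_stochastic(X, tol=1e-9):
    """X in DP_n: additionally unit column sums, cf. (19d)."""
    return in_multinomial(X, tol) and np.allclose(X.sum(axis=0), 1, atol=tol)

def in_symmetric_stochastic(X, tol=1e-9):
    """X in SP_n: stochastic and symmetric, cf. (19e)."""
    return in_multinomial(X, tol) and np.allclose(X, X.T, atol=tol)

def in_definite_stochastic(X, tol=1e-9):
    """X in SP+_n: symmetric stochastic and positive definite, cf. (19f)."""
    return in_symmetric_stochastic(X, tol) and np.linalg.eigvalsh(X).min() > 0
```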

III. THE DOUBLY STOCHASTIC MULTINOMIAL MANIFOLD

This section studies the structure of the doubly stochastic manifold DP_n and provides the expressions of the necessary ingredients to design Riemannian optimization algorithms over the manifold.

A. Manifold Geometry

The set of doubly stochastic matrices is the set of square matrices with positive entries such that each column and row sums to 1. It can easily be shown that only a square matrix can verify such property. As a consequence of the Birkhoff-von Neumann theorem, DP_n is an embedded manifold of R^{n×n}. A short proof of the Birkhoff-von Neumann theorem using elementary geometry concepts can be found in [31]. The dimension of DP_n is (n−1)^2, which can be seen from the fact that the manifold is generated from 2n−1 linearly independent equations specifying that the rows and columns all sum to one. The dimension of the manifold would be clearer after deriving the tangent space, which is a linear space with the same dimension as the manifold.

Let X ∈ DP_n be a point on the manifold; the tangent space T_X DP_n is given by the following preposition.

Preposition 1. The tangent space T_X DP_n is defined by:
    T_X DP_n = {Z ∈ R^{n×n} | Z1 = 0, Z^T 1 = 0},   (21)
wherein 0 is the all-zeros vector.

Proof. The technique of computing the tangent space of the manifolds of interest in this paper can be found in Appendix A. The complete proof of the expression of the tangent space of doubly stochastic matrices is located in the first subsection of Appendix A.

From the expression of the tangent space, it is clear that the equations Z1 = 0 and Z^T 1 = 0 yield only 2n−1 linearly independent constraints, as the last column constraint can be written as the sum of all rows, using the fact that the previous (n−1) columns sum to zero. Let Π_X : R^{n×n} −→ T_X DP_n be the orthogonal, in the ⟨., .⟩_X sense, projection of the ambient space onto the tangent space. The expression of such operator is given in the upcoming theorem:

Theorem 1. The orthogonal projection Π_X has the following expression:
    Π_X(Z) = Z − (α1^T + 1β^T) ⊙ X,   (22)
wherein the vectors α and β are obtained through the following equations:
    α = (I − XX^T)^†(Z − XZ^T)1,   (23)
    β = Z^T 1 − X^T α,   (24)
with Y^† being the left-pseudo inverse that satisfies Y^†Y = I.

Proof. Techniques for computing the orthogonal projection on the tangent space for the manifolds of interest in this paper can be found in Appendix B. The projection on the tangent space of the doubly stochastic matrices can be found in the first subsection of Appendix B.

The projection Π_X is of great interest as it would allow in the next subsection to relate the Riemannian gradient and Hessian to their Euclidean equivalent.

Remark 2. The above theorem gives separate expressions of α and β for ease of notation in the upcoming computation of the Hessian. However, such expressions require squaring the matrix X, i.e., XX^T, which might not be numerically stable. For implementation purposes, the vectors α and β are obtained as one of the solutions (typically the left-pseudo inverse) to the linear system:
    [ I    X ] [α]   [ Z1    ]
    [ X^T  I ] [β] = [ Z^T 1 ].   (25)

B. Riemannian Gradient and Retraction Computation

This subsection first derives the relationship between the Riemannian gradient and its Euclidean counterpart for the manifold of interest. The equation relating these two quantities is first derived in [25] for the multinomial manifold, but no proof is provided therein. For completeness purposes, we provide the lemma with its proof in this manuscript.

Lemma 1. The Riemannian gradient grad f(X) can be obtained from the Euclidean gradient Grad f(X) using the identity:
    grad f(X) = Π_X(Grad f(X) ⊙ X).   (26)

Proof. As shown in Section II, the Riemannian gradient is by definition the unique element of T_X DP_n that is related to the directional derivative through the Riemannian metric as follows:
    ⟨grad f(X), ξ_X⟩_X = Df(X)[ξ_X], ∀ ξ_X ∈ T_X DP_n.   (27)
Since the Riemannian gradient is unique, finding an element of the tangent space that verifies the equality for all tangent vectors is sufficient to conclude that it is the Riemannian gradient. Now note that the Euclidean gradient can be written as a function of the directional derivative using the usual scalar product as:
    ⟨Grad f(X), ξ⟩ = Df(X)[ξ], ∀ ξ ∈ R^{n×n}.   (28)
In particular, by restricting the above equation to T_X DP_n ⊂ R^{n×n} and converting the usual inner product to the Riemannian one, we can write:
    ⟨Grad f(X), ξ_X⟩ = ⟨Grad f(X) ⊙ X, ξ_X⟩_X = Df(X)[ξ_X], ∀ ξ_X ∈ T_X DP_n.   (29)
Finally, projecting the scaled Euclidean gradient onto the tangent space and its orthogonal complement, i.e., Grad f(X) ⊙ X = Π_X(Grad f(X) ⊙ X) + Π_X^⊥(Grad f(X) ⊙ X), yields
    ⟨Grad f(X) ⊙ X, ξ_X⟩_X = ⟨Π_X(Grad f(X) ⊙ X), ξ_X⟩_X,

6 wherein, by definition of the projection on the orthogonal be the projection onto the set of doubly stochastic matri- complement of the tangent space, the following holds ces obtained using the Sinkhorn-Knopp algorithm [32]. The ⊥ ΠX(Grad f(X) X), ξX X =0 (30) proposed retraction, using the element-wise exponential of a h ⊙ i The element ΠX(Grad f(X) X) being a tangent vector that matrix exp(.), is given in the following lemma satisfy (27), we conclude that:⊙ Lemma 2. The mapping R : DPn DPn whose X X X T −→ grad f( ) = ΠX(Grad f( ) ) (31) restriction RX to XDPn is given by: ⊙ T RX(ξX)= (X exp(ξX X)) , (34) P ⊙ ⊘ Note that the result of Lemma 1 depends solely on the is a retraction on the doubly stochastic multinomial manifold for all ξX DPn. expression of the Riemannian metric and thus is applicable ∈ T to the three manifolds of interest in this paper. Combining Proof. To show that the operator represents a well-defined re- the expression of the orthogonal projection with the one of traction, one needs to demonstrate that the centering and local Lemma 1, we conclude that the Riemannian gradient has the rigidity conditions are satisfied. The fact that the mapping is following expression: obtained from the smoothness of the projection onto the set of grad f(X)= γ (α1T + 11T γ 1αT X) X doubly stochastic matrices which is provided in Appendix C. − − ⊙ α = (I XXT )†(γ XγT )1 The centering property is straightforward, i.e.,: − − RX(0)= (X exp(0)) = (X)= X, (35) γ = Grad f(X) X. (32) P ⊙ P ⊙ wherein the last inequality is obtained from the fact that X is For numerical stability, the term α is computed in a similar fashion as the procedure described in Remark 2 wherein Z is a . replaced by . To prove the local rigidity condition, one needs to study the γ X X As shown in Section II, one needs only to define a retraction perturbation of ( ) around a “small” perturbation ∂ in the PDP X from the tangent bundle to the manifold instead of the com- tangent space n wherein small refers to the fact that + X Rn×n T plex exponential map to take advantage of the optimization ∂ . First note from the Sinkhorn-Knopp algorithm ∈ X D XD X algorithms on the Riemannian manifolds. Among all possible that ( ) = 1 2. However, since is already doubly P D D I retractions, one needs to derive one that have low-complexity stochastic, then 1 = 2 = . The first order approximation of the can be written as: in order to obtain efficient optimization algorithms. Therefore, P (X + ∂X) = (D + ∂D )(X + ∂X)(D + ∂D ) the canonical choice is to exploit the linear structure of the P 1 1 2 2 embedding space in order to derive a retraction that does not D1XD2 + D1∂XD2 + ∂D1XD2 + D1X∂D2 require a projection on the manifold. Such canonical retraction ≈ X + ∂X + ∂D1X + X∂D2 (36) is given in the following theorem: ≈ Since (X + ∂X) and X are doubly stochastic and ∂X is in P Theorem 2. The mapping R : DPn DPn whose the tangent space, then we obtain: T −→ restriction RX to XDPn is given by: (X + ∂X)1 = (X + ∂X + ∂D1X + X∂D2)1 T X P ⇒ RX(ξX)= + ξX, (33) ∂D1X1 + X∂D21 = ∂D11 + X∂D21 = 0 (37) T T represents a well-defined retraction on the doubly stochastic Similarly, by post multiplying by 1 , we obtain 1 ∂D1X + T T multinomial manifold provided that ξX is in the neighborhood 1 ∂D2 = 0 . For easy of notation, let ∂D11 = ∂d1, i.e., 0 X of X, i.e., ij > (ξX)ij , 1 i, j n. ∂d1 is the vector created from the diagonal entries of ∂D1 − ≤ ≤ D Proof. The proof of this theorem relies on the fact that the and the same for ∂ 2. 
Combining both equations above, the manifold of interest is an embedded manifold of an Euclidean perturbation on the diagonal matrices satisfy the condition: I X d 0 space. For such manifold, one needs to find a matrix decom- ∂ 1 XT I d = 0 (38) position with desirable dimension and smoothness properties. ∂ 2  d      The relevant theorem and techniques for computing the canon- ∂ 1 In other words, d is the null space of the above matrix ical retraction on embedded manifold are given in Appendix C. ∂ 2   1 The proof of Theorem 2 is accomplished by extending the which is generated by 1 from the previous analysis. As a Sinkhorn’s theorem [32] and can be found in the first section d d −1  D X X D 0 of Appendix C. result, ∂ 1 = ∂ 2 = c which gives ∂ 1 + ∂ 2 = . Therefore, (X− + ∂X) X + ∂X. Now, consider the curve P ≈ The performance of the above retraction are satisfactory γξX (τ)= RX(τξX). The derivative of the curve at the origin as long as the optimal solution X does not have vanishing can be written as: X dγξX (τ) γξX (τ) γξX (0) entries, i.e., some ij that approach 0. In such situations, the = lim − update procedure results in tiny steps which compromises the dτ τ=0 τ→0 τ X X X convergence speed of the optimization algorithms. Although ( exp(τξX )) = lim P ⊙ ⊘ − (39) the projection on the set of doubly stochastic matrices is τ→0 τ difficult [33], this paper proposes a highly efficient retraction A first order approximation of the exponential allows to that take advantage of the structure of both the manifold and express the first term in the denominator as: its tangent space. Define the set of entry-wise positive matrices (X exp(τξX X)) = (X + τξX)= X + τξX (40) n×n n×n n×n P ⊙ ⊘ P R = X R Xij > 0 and let : R DPn wherein the last equality is obtained from the previous analy- { ∈ | } P −→ 7 sis. Plugging the expression in the limit expression shows the space generated by X are written without the subscript. The local rigidity condition, i.e.,: left hand side gives:

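For reference, the following Python sketch implements the tangent-space projection of Theorem 1 and the gradient conversion of Lemma 1 for the doubly stochastic multinomial manifold. As suggested in Remark 2, the coefficients α and β are obtained from the linear system (25) (here via a least-squares solve) rather than from the pseudo-inverse expressions (23)-(24); the sketch assumes X is doubly stochastic with strictly positive entries.

```python
import numpy as np

def project_tangent_dp(X, Z):
    """Orthogonal projection onto T_X DP_n (Theorem 1):
    Pi_X(Z) = Z - (alpha 1^T + 1 beta^T) .* X, with (alpha, beta) solving (25)."""
    n = X.shape[0]
    one = np.ones(n)
    A = np.block([[np.eye(n), X], [X.T, np.eye(n)]])
    b = np.concatenate([Z @ one, Z.T @ one])
    sol = np.linalg.lstsq(A, b, rcond=None)[0]   # the system is rank deficient by one
    alpha, beta = sol[:n], sol[n:]
    return Z - (np.outer(alpha, one) + np.outer(one, beta)) * X

def riemannian_grad_dp(X, egrad):
    """Euclidean-to-Riemannian gradient conversion of Lemma 1: grad f = Pi_X(Grad f .* X)."""
    return project_tangent_dp(X, egrad * X)
```

One can verify that the output of project_tangent_dp has zero row and column sums, as required by the tangent space characterization (21).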
dγξX (τ) χ η, ξ X = D( η, ξ X)[χ] = ξX (41) h i h i dτ τ=0 n n ηij ξij Therefore, RX(ξX) is a retraction on the doubly stochastic = D [χij ] Xij multinomial manifold. i=1 j=1 ! X X n ηij ξij 1 = D (ξij )+ D (ηij )+ ηij ξij D [χij ] Note that the above retraction resembles the one proposed Xij Xij Xij i,j=1 !! in [25] (without proof) for the trivial case of the multinomial X n n manifold. The projection on the set of doubly stochastic ηij ξij χij = Dχη, ξ X + η, Dχξ X h i h i − X2 matrices is more involved as shown in the lemma above. i=1 j=1 ij Further, note that the retraction does not require the tangent X X = χη, ξ X + η, χξ X. (44) vector ξX to be in the neighborhood of X as the one derived h∇ i h ∇ i in Theorem 2. However, it is more expensive to compute as it requires projecting the update onto the manifold. Recall that the Euclidean Hessian Hess f(X)[ξX] = D(Grad f(X))[ξX] is defined as the directional derivative of Remark 3. The local rigidity condition of the retraction in the Euclidean gradient. Using the results above, the Rieman- (39) is particularly interesting as it shows that the canonical nian Hessian can be written as a function of the Euclidean retraction X + ξX is the first order approximation of the gradient and Hessian as follows: retraction (X exp(ξX X)) around the origin. P ⊙ ⊘ Theorem 3. The Riemannian Hessian hess f(X)[ξX] can be obtained from the Euclidean gradient Grad f(X) and the Euclidean Hessian Hess f(X)[ξX] using the identity: C. Connection and Riemannian Hessian Computation 1 hess f(X)[ξX] = ΠX δ˙ (δ ξX) X As shown in Section II, the computation of the Riemannian − 2 ⊙ ⊘ ! Hessian requires the derivation of the Levi-Civita connection α = ǫ(γ XγT )1 ηX ξX. Using the result of [13], the Levi-Civita connection − T 1 XT ∇of a submanifold of the Euclidean space Rn×n can be ob- β = γ α M − tained by projecting the Levi-Civita ηX ξX of the embedding γ = Grad f(X) X ∇ T ⊙ T space onto the manifold, i.e., ηX ξX = ΠX( ηX ξX). From δ = γ (α1 + 1β ) X ∇ ∇ n×n the Koszul formula (15), the connection ηX ξX on R − ⊙ ∇ ǫ = (I XXT )† solely depends on the Riemannian metric. In other words, − X T X T 1 the connection ηX ξX on the embedding space is the same α˙ = ǫ˙(γ γ )+ ǫ(γ ˙ ξXγ γ˙ ) ∇ − − − for all the considered manifolds in this paper. For manifolds ˙ T 1 T XT β =γ ˙ ξXα α˙  endowed with the Fisher information as metric, the Levi-Civita − − γ˙ = Hess f(X)[ξX] X + Grad f(X) ξX connection on Rn×n is given in [25] as follows: ⊙ ⊙ T T T T δ˙ =γ ˙ (α ˙ 1 + 1β˙ ) X (α1 + 1β ) ξX Proposition 1. The Levi-Civita connection on the Euclidean − T T ⊙ − ⊙ ǫ˙ = ǫ(Xξ + ξXX )ǫ (45) space Rn×n endowed with the Fisher information is given by: X 1 X ηX ξX = D(ξX)[ηX] (ηX ξX) (42) ∇ − 2 ⊙ ⊘ Proof. The useful results to compute the Riemannian Hessian Proof. The Levi-Civita connection is computed in [25] using for the manifold of interest in this paper can be found in the Koszul formula. For completeness, this short proof shows Appendix D. The expression of the Riemannian Hessian for that the connection do satisfy the conditions proposed in the doubly stochastic multinomial manifold can be found in Definition 4. Since the embedding space is a Euclidean space, the first subsection of the appendix. the Lie bracket can be written as directional derivatives as follows: IV. 
THE SYMMETRIC MULTINOMIAL MANIFOLD [ηX, ξX]= D(ξX)[ηX] D(ηX)[ξX] Whereas the doubly stochastic multinomial manifold is − 1 regarded as an embedded manifold of the vector space of = D(ξX)[ηX] (ηX ξX) X matrices Rn×n, the symmetric and positive multinomial mani- − 2 ⊙ ⊘ folds are seen as embedded manifolds of the set of symmetric 1 D(ηX)[ξX] (ηX ξX) X matrices. In other words, the embedding Euclidean space is − − 2 ⊙ ⊘ ! the space of symmetric matrices n defined as: X Rn×nS X XT = ηX ξX ξX ηX (43) n = = (46) ∇ − ∇ S ∈ The second property is obtained by direct computation of the Such choice of the ambient space allows to reduce the  2 n(n+1) right and left hand sides in Definition 4. For simplicity of ambient dimension from n to 2 and thus enables the notation, all the tangent vectors χX, ηX, ξX in the tangent the simplification of the projection operators. As a result,

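A possible implementation of the Sinkhorn-Knopp projection P and of the retraction of Lemma 2, R_X(ξ_X) = P(X ⊙ exp(ξ_X ⊘ X)), is sketched below in Python. The iteration cap and tolerance of the projection are illustrative choices; together with the gradient sketch above, these callables provide the ingredients of a first order method on DP_n.

```python
import numpy as np

def sinkhorn_projection(A, max_iter=1000, tol=1e-10):
    """Map an entry-wise positive matrix to a doubly stochastic one by alternately
    normalizing rows and columns (Sinkhorn-Knopp algorithm, [32])."""
    B = A.copy()
    for _ in range(max_iter):
        B = B / B.sum(axis=1, keepdims=True)   # unit row sums
        B = B / B.sum(axis=0, keepdims=True)   # unit column sums
        if (np.abs(B.sum(axis=1) - 1).max() < tol and
                np.abs(B.sum(axis=0) - 1).max() < tol):
            break
    return B

def retract_dp(X, xi):
    """Retraction of Lemma 2 on DP_n: R_X(xi) = P(X .* exp(xi ./ X))."""
    return sinkhorn_projection(X * np.exp(xi / X))
```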
8 the expression of Riemannian gradient and Hessian can be considering that the manifold is embedded in an Euclidean computed more efficiently. subspace. Techniques for computing the retraction on embed- ded manifold is given in Appendix C. Note that the result of A. Manifold Geometry, Gradient, and Retraction the doubly stochastic multinomial is not directly applicable as n×n the embedding space is different ( n instead of R ). The Let X SPn be a point on the manifold, the tangent space ∈ problem is solved by using the DADS theorem [34] instead of XSPn is given by the following preposition. T the Sinkhorn’s one [32]. The complete proof of the corollary Preposition 2. The tangent space XSPn is defined by: is given in the second subsection of Appendix C. T XSPn = Z n Z1 = 0 . (47) T ∈ S 

Proof. The technique of computing the tangent space of The canonical retraction suffers from the same limitation manifold of interest in this paper can be found in Appendix A. as the one discussed in the previous section. Indeed, the The complete proof of the expression of the tangent space performance of the optimization algorithm heavily depend on of symmetric stochastic matrices is located in the second whether the optimal solution has vanishing entries or not. This subsection of Appendix A. section shows that the retraction proposed in Lemma 2 is Let ΠX : n XSPn be the orthogonal, in the ., . X a valid retraction on the set of symmetric double stochastic S −→ T h i sens, projection of the ambient space onto the tangent space. matrices. However, instead of the Sinkhorn-Knopp algorithm Note that the ambient space for the symmetric multinomial [32], this part uses the DAD algorithm [34] to project the n×n T SPn is the set of symmetric matrices n and not the set of all retracted vector. Let = X R X > 0, X = X S n ij matrices Rn×n as in Section III. The following theorem gives represent the set of symmetric,S element-wise∈ positive matrices.  the expression of the projection operator: The projection onto the set of symmetric doubly stochastic + matrices is denoted by the operator : n SPn. The Theorem 4. The orthogonal projection ΠX operator onto the P S −→ tangent set has the following expression retraction is given in the following corollary. T T ΠX(Z)= Z (α1 + 1α ) X, (48) Corollary 2. The mapping R : SPn SPn whose − ⊙ T −→ wherein the vector α is computed as: restriction RX to XSPn is given by: I X −1Z1 T + α = ( + ) . (49) RX(ξX)= (X exp(ξX X)) , (52) P ⊙ ⊘ is a retraction on the symmetric doubly stochastic multinomial manifold for all ξX SPn. Proof. Techniques for computing the orthogonal projection on ∈ T the tangent space for the manifolds of interest in this paper Proof. The proof of this corollary is straightforward. Indeed, can be found in Appendix B. The projection on the tangent after showing that the range space of RX(ξX) is the set space of the doubly stochastic matrices can be found in the symmetric element-wise matrices , the proof concerning second subsection of Appendix B and its derivation from n the centering and local rigidity of theS retraction are similar to the projection of the doubly stochastic manifold in the third the one in Lemma 2. subsection. Using the result of Lemma 1 and using the expression of the projection onto the tangent space, the Rienmannian gradient can be efficiently computed as: T T grad f(X)= γ (α1 + 1α ) X B. Connection and Riemannian Hessian Computation − ⊙ α = (I + X)−1γ1 γ = (Grad f(X) X), (50) As discussed earlier, the Levi-Civita connection solely de- ⊙ where γ is a simple sum that can be computed efficiently. pends on the Riemannian metric. Therefore, the symmetric Similar to the result for the doubly stochastic multinomial stochastic multinomial manifold shares the same retraction on manifold, the canonical retraction on the symmetric multino- the embedding space4 as the doubly stochastic multinomial mial manifold can be efficiently computed as shown in the manifold. The Riemannian Hessian can be obtained by dif- following corollary. ferentiating the Riemnanian gradient using the projection of the Levi-Civita connection onto the manifold as shown in the Corollary 1. The mapping R : SPn SPn whose T −→ below corollary: restriction RX to XSPn is given by: T X X X X R (ξ )= + ξ , (51) Corollary 3. 
The Riemannian Hessian hess f(X)[ξX] can represents a well-defined retraction on the symmetric multino- be obtained from the Euclidean gradient Grad f(X) and the mial manifold provided that ξX is in the neighborhood of 0X, i.e., Xij > (ξX) , 1 i, j n. − ij ≤ ≤ Proof. The proof of this corollary follows similar steps like 4Even though the embedding spaces are not the same for both manifolds, the one for the doubly stochastic multinomial manifold by one can easily show that the expression of the connection is invariant.

9 Euclidean Hessian Hess f(X)[ξX] using the identity: exponential to retract the tangent vectors as shown in the 1 following theorem. hess f(X)[ξX] = ΠX δ˙ (δ ξX) X − 2 ⊙ ⊘ ! SP+ SP+ Theorem 5. Define the map RX from X n to n by: I X −1 1 T α = ( + ) γ 1 1 −ωXξX RX(ξX)= X + I e , (55) δ = γ (α1T + 1αT ) X ωX − ωX Y 5 − ⊙ wherein e the of matrix Y and ωX is a γ = Grad f(X) X scalar that ensures: −1⊙ −1 −1 α˙ = (I + X) γ˙ (I + X) ξX(I + X) γ 1 − RX(ξX)ij > 0, 1 i, j n (56) T T T T ≤ ≤ δ˙ =γ ˙ (α ˙ 1 + 1α˙ ) X (α1 + 1α ) ξX 0 − ⊙ − ⊙ RX(ξX) , (57) X X X + ≻ γ˙ = Hess f( )[ξX] + Grad f( ) ξX (53) for all ξX XSP in the neighborhood of 0X, i.e., ⊙ ⊙ ∈ T n ξX F ǫ for some ǫ > 0. Then, there exists a sequence || || ≤ + of scalars ωX + such that the mapping R : SP { }X∈SPn T n −→ SP+, whose restriction RX to XSP+, is a retraction on the V. EXTENSIONTOTHE DEFINITE SYMMETRIC n T n MULTINOMIAL MANIFOLD definite symmetric multinomial manifold. The definite symmetric stochastic multinomial manifold is Proof. Unlike the previous retractions that rely on the Eu- defined as the subset of the symmetric stochastic multinomial clidean structure of the embedding space, this retraction is manifold wherein the matrix of interest is positive definite. obtained by direct computation of the properties of the re- Similar to the condition Xij > 0, the strict condition X 0, ≻ traction given in Section II. The organization of the proof is i.e., full-rank matrix, ensures that the manifold has a differen- the following: First, assuming the existence of ωX, we show tiable structure. that the range of the mapping RX is included in the definite The positive-definiteness constraint is a difficult one to symmetric multinomial manifold. Afterward, we demonstrate retract. In order to produce highly efficient algorithms, one that the operator satisfies the centering and local rigidity usually needs a re-parameterization of the manifold and to conditions. Therefore, the operator represents a retraction. regard the new structure as a quotient manifold, e.g., a Finally, showing the existence of the scalar ωX for an arbitrary Grassmann manifold. However, this falls outside the scope of X SP+ n concludes the proof. this paper and is left for future investigation. This part extends ∈ the previous study and regards the manifold as an embedded Recall that the matrix exponential of a symmetric real T matrix ξX with eigenvalue decomposition ξX = UΛU is manifold of n for which two retractions are proposed. S given by eξX = U exp(Λ)UT , where exp(Λ) is the usual element-wise exponential of the element on the diagonal and A. Manifold Geometry zeros elsewhere. From the derivation of the tangent space SP+ The manifold geometry of the definite symmetric stochastic of the definite symmetric multinomial manifold X n , we T multinomial manifold is similar to the one of the previous have ξX1 = 0. Therefore, ξX has an eigenvalue of 0 cor- manifold. Indeed, the strictly positive constraint being an responding to the eigenvector 1. As stated by the definition inequality one, the dimension of the manifold is unchanged. of the matrix exponential, the eigenvalue are exponentiated X SP+ and the eigenvectors are unchanged. Therefore, eξX (and thus Therefore, let n be a point on the manifold, the SP∈ + e−ωXξX ) has an eigenvalue of exp(0) = 1 corresponding to tangent space X n of the definite symmetric stochastic ωXξX multinomial manifoldT is similar to tangent space of the sym- the eigenvector 1, i.e., e− 1 = 1. 
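For the symmetric multinomial manifold, the Euclidean-to-Riemannian gradient conversion (50) reduces to a single linear solve with I + X. A minimal Python sketch, assuming X is a symmetric doubly stochastic matrix with strictly positive entries (so that I + X is invertible), is given below.

```python
import numpy as np

def riemannian_grad_sym(X, egrad):
    """Riemannian gradient on the symmetric multinomial manifold, cf. (50):
    gamma = Grad f(X) .* X,  alpha = (I + X)^{-1} gamma 1,
    grad f(X) = gamma - (alpha 1^T + 1 alpha^T) .* X."""
    n = X.shape[0]
    gamma = egrad * X
    alpha = np.linalg.solve(np.eye(n) + X, gamma @ np.ones(n))
    return gamma - (np.outer(alpha, np.ones(n)) + np.outer(np.ones(n), alpha)) * X
```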
First note from the first metric stochastic multinomial manifold, i.e.,: condition on ωX, all entries are positive. Now, computing the SP+ Z Z1 0 rows summation gives: X n = n = . (54) T ∈ S 1 1 −ωXξX As a result, the expression of the Riemannian gradient and RX(ξX)1 = X1 + I1 e 1  ωX − ωX Hessian are the same as the one presented in the previous 1 1 section. Hence, one only needs to design a retraction to derive = 1 + 1 1 optimization algorithms on the manifold. ωX − ωX = 1. (58) Hence RX(ξX) is stochastic. Besides, all matrices are sym- B. Retraction on the Cone of Positive Definite Matrices metric which concludes that the matrix is doubly stochastic. As shown in the previous subsection, the geometry of the Finally, the second condition on ωX ensure the definiteness of definite symmetric multinomial manifold is similar to the the matrix which concludes that RX(ξX) SP+. ∈ n symmetric multinomial manifold one. Therefore, one can ex- The centering property can be easily checked by evaluating tend the canonical retraction proposed in the previous section. + the retraction RX at the zero-element 0X of XSP . Indeed, However, even tough the retraction looks similar to the one T n proposed in the previous section, its implementation is more problematic as it includes a condition on the eigenvalues. Hence, the section proposes another retraction that exploits 5Not to be confused with the exponential map Exp of the Riemannian the definite structure of the manifold and uses the matrix geometry or with the element-wise exponential exp.

10 we obtain: decreasing to 0, then there exists M0 such that m M0, 1 1 −ωX0X ∀ ≥ RX(0X)= X + I e the following is true: n ωX − ωX m 2 ′ (64) m 2 (1 exp( ωX ǫ)) ǫ ǫ X 1 I 1 I X (ωX ) − − ≤ ≤ = + = . (59) ′ ωX − ωX Combining the above results, we find out that ǫ > 0, M0 ∀ ∃ The speed of the rigidity curve γξX (τ) = RX(τξX) at the such that m M0 the following holds: origin is given by: X ∀ X≥ ′ SP+ m(ξX) F <ǫ , ξX X n with ξX F ǫ e−ωXτξX || − || ∀ ∈ T || || ≤ dγξx (τ) 1 d = dτ τ=0 −ωX dτ τ=0 Finally, as stated earlier, combining the uniform conver- 1 d exp( ωXτΛ) T + = U − U gence and the fact that n is an open subset of n allows −ωX dτ τ=0 S S us to conclude the existence of ωX such that both conditions U UT SP+ = Λ exp( ωXτΛ) (56) and (57) are satisfied for all tangent vectors ξX X n − τ=0 ∈ T with ξX ǫ. T T F = UΛUU exp( ωXτΛ) U || || ≤ − τ=0 Remark 4. The retraction developed in Theorem 5 can be −ωXτξX = ξXe = ξX (60) applied to the symmetric multinomial manifold in which the τ=0 scalar ωX is chosen in such fashion so as to satisfy condition Therefore, we conclude that RX(ξX ) is a well-defined retrac- tion. (56) independently of condition (57). However, the retraction is not valid for the doubly stochastic multinomial manifold. Finally, the existence of the weight ωX is ensured by the fact + Indeed, the tangent vectors of the doubly stochastic multino- that n is an open subset of n. Consider a positive sequence mS∞ S mial manifold being not necessarily symmetric, nothing can ωX m=1 decreasing to 0 and construct the function {X } ∞ be claimed about the exponential of the vector ξX. m m=1 as follows: { } 1 1 m X X I e−ωX ξX While the retraction proposed in Theorem 5 is superior to m(ξX)= + m m . (61) ωX − ωX the canonical one, it still suffers from the scaling of ξX. In X ∞ We aim to show that m m=1 uniformly converges to the other words, if the optimal solution has vanishing entries or + { +} constant X . Since is an open set, then there exists vanishing eigenvalues, the retraction results in tiny updates ∈ Sn Sn an index m0 above which, i.e., m m0 the sequence which compromise the convergence speed of the proposed X + SP+∀ ≥ m(ξX) n , ξX X n with ξX F ǫ. Hence algorithms. As stated at the beginning of the section, the ∈ S ∀ ∈ T m || || ≤ ωX can be chosen to be any ωX for m m0. The uniform problem can be solved by re-parameterizing the manifold convergence of the series is given in the≥ following lemma which falls outside the scope of this paper.
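The retraction of Theorem 5 can be sketched with a matrix exponential and a simple search over the scalar ω_X: starting from an arbitrary value, ω is shrunk until the positivity condition (56) and the definiteness condition (57) hold, which the theorem guarantees is possible for tangent vectors that are small enough. The starting value and shrinking factor below are illustrative choices, not prescriptions from the paper.

```python
import numpy as np
from scipy.linalg import expm

def retract_definite(X, xi, omega0=1.0, shrink=0.5, max_tries=60):
    """Retraction of Theorem 5 on the definite symmetric multinomial manifold:
    R_X(xi) = X + (1/omega) I - (1/omega) expm(-omega * xi).
    Assumes xi is a symmetric tangent vector (xi @ 1 = 0); omega is shrunk until the
    result is entry-wise positive (56) and positive definite (57)."""
    n = X.shape[0]
    omega = omega0
    for _ in range(max_tries):
        R = X + (np.eye(n) - expm(-omega * xi)) / omega
        if R.min() > 0 and np.linalg.eigvalsh(R).min() > 0:
            return R
        omega *= shrink
    raise ValueError("no suitable omega found; xi may be too large")
```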

Lemma 3. The uniform convergence of the function series VI. OPTIMIZATION ALGORITHMS AND COMPLEXITY X ∞ is satisfied as ′ , such that m m=1 ǫ > 0 M0 m M0 This section analysis the complexity of the steepest de- the{ following} holds: ∀ ∃ ∀ ≥ scent, summarized in Algorithm 1, and the Newton’s method, ′ + Xm(ξX) X F <ǫ , ξX XSP with ξX F ǫ || − || ∀ ∈ T n || || ≤ summed up in Algorithm 2, algorithms on the proposed Proof. Note that the condition over all tangent vectors can be Riemannian Manifolds. While this section only presents the replaced by the following condition (up to an abuse of notation simplest first and second order algorithms, the simulation with the ǫ′): section uses the more sophisticated conjugate gradient and ′ + trust regions methods as first and second order algorithms to Xm(ξX) X F <ǫ , ξX XSP with ξX F ǫ || − || ∀ ∈ T n || || ≤ obtain the curves. The complexity analysis of these algorithms X X 2 ′ sup m(ξX) F <ǫ is similar to the one presented herein as it is entirely governed ⇔ + ξX∈TXSPn || − || ||ξX||F ≤ǫ by the complexity of computing the Riemannian gradient 2 ′ and Hessian. Table I summarizes the first and second order max Xm(ξX) X <ǫ , (62) + F ⇔ ξX∈TXSPn || − || complexity for each manifold. ||ξX||F ≤ǫ wherein the last equivalence is obtained from the fact that the A. Gradient Descent Complexity search space is closed. The last expression allows us to work with an upper bound of the distance. Indeed, the distance can The complexity of computing the gradient (32) of the be bound by: doubly stochastic multinomial manifold can be decomposed m into the complexity of computing γ, α, and grad f(X). The 2 1 −ω ξX 2 X (ξX) X = I e X m F (ωm)2 F term γ is a simple Hamadard product that can be computed in || − || X || − || 2 n n operations. The term α is obtained by solving the system of 1 m 2 equations in (25) which takes (2/3)(2n)3 when using an LU = m 2 (1 exp( ωX λi)) (ωX ) − − X i=1 factorization. Finally, the expression of grad f( ) requires a n X couple of additions and an hadamard product which can be m 2 (63) m 2 (1 exp( ωX ǫ)) , 2 ≤ (ωX ) − − done in 3n operations. Finally, the complexity of computing with the last inequality being obtained from ξX F ǫ the retraction can be decomposed into the complexity of com- || ||m ≤∞ ⇒ λi ǫ, 1 i n. Now using the fact that ωX is puting the update vector and the complexity of the projection. ≤ ≤ ≤ { }m=1 11 TABLE I COMPLEXITYOFTHESTEEPESTDESCENTAND NEWTON’SMETHODALGORITHMSFORTHEPROPOSEDMANIFOLDS. Manifold Steepest descent algorithm Newton’s method algorithm 3 2 3 2 Doubly Stochastic Multinomial DPn (16/3)n +7n + log(n)√n 32/3n + 15n + log(n)√n 3 2 3 2 Symmetric Multinomial SPn (1/3)n +2n +2n + log(n)√n n +8n + 17/2n + log(n)√n SP+ 3 2 3 2 Definite Symmetric Multinomial n n +3n +3n 4/3n + 13/2n +7n

The update vector is a Hadamard product and division that can be computed in 3n^2 operations. The complexity of projecting a matrix A onto the set of doubly stochastic matrices [35] with accuracy ǫ is given by:
O((1/ǫ + log(n)) √n V/v),   (65)
wherein V = max(A) and v = min(A). Therefore, the total complexity of an iteration of the gradient descent algorithm on the doubly stochastic multinomial manifold is (16/3)n^3 + 7n^2 + log(n)√n.

The complexity of the symmetric stochastic manifold can be obtained in a similar manner. Due to the symmetry, the term γ only requires n(n+1)/2 operations. The term α is the solution to an n × n system of equations, which can be solved in (1/3)n^3. Similarly, grad f(X) requires 3/2 n(n+1) operations. Therefore, the total complexity can be written as:
(1/3)n^3 + 2n^2 + 2n + log(n)√n.   (66)
The retraction on the cone of positive matrices requires n^3 + 2n^2, which gives a total complexity of the algorithm for the positive symmetric multinomial manifold of n^3 + 3n^2 + 3n.

B. Newton's Method Complexity

Newton's method requires computing the Riemannian gradient and Hessian and solving an n × n linear system. However, from the expressions of the Riemannian Hessian, one can note that the complexity of computing the Riemannian gradient is included in the one of the Riemannian Hessian.

For the doubly stochastic manifold, the complexity of computing the Riemannian Hessian is controlled by the complexity of the projection and the inversions. The projection onto the tangent space requires solving an n × n system and a couple of additions and a Hadamard product. The total cost of the operation is 2/3(2n)^3 + 3n^2. The ǫ and ǫ̇ terms are inversions and matrix products that require 4n^3. The other terms combined require 9n^2 operations. The retraction costs 3n^2 + log(n)√n and solving for the search direction requires 2/3(2n)^3, which gives a total complexity of:
32/3 n^3 + 15n^2 + log(n)√n.   (67)
A similar analysis as the one above allows to conclude that the total complexity of a second order method on the symmetric and positive doubly stochastic manifolds requires, respectively, the following number of operations:
n^3 + 8n^2 + 17/2 n + log(n)√n   (68)
4/3 n^3 + 13/2 n^2 + 7n.   (69)

VII. SIMULATION RESULTS

This section attests the performance of the proposed framework in efficiently solving optimization problems in which the optimization variable is a doubly stochastic matrix. The experiments are carried out using Matlab on an Intel Xeon Processor E5-1650 v4 (15M Cache, 3.60 GHz) computer with 32GB 2.4 GHz DDR4 RAM. The optimization is performed using the Matlab toolbox "Manopt" [36] and the conjugate gradient (denoted by CG) and trust region (denoted by TR) algorithms.

The section is divided into three subsections. The first subsection tests the performance of the proposed solution against the popular generic solver "CVX" [37] for each of the manifolds. The subsection further shows the convergence of each manifold in reaching the same solution by means of regularization. The second subsection solves a convex clustering problem [4] and testifies to the efficiency of the proposed algorithm against a generic solver. Finally, the last subsection shows that the proposed framework outperforms a specialized algorithm [5] in finding the solution of a non-convex clustering problem.

A. Performance of the Proposed Manifolds

This section solves the following optimization problem:
min_{X ∈ M} ||A − X||_F^2,   (70)
wherein the manifold M is the doubly stochastic, symmetric, and definite stochastic multinomial manifold, respectively. For each of the experiments, the matrix A is generated by A = M + N with M ∈ M belonging to the manifold of interest and N a zero-mean white Gaussian noise.

The optimization problem is first solved using the toolbox CVX to obtain the optimal solution X* with a predefined precision. The proposed algorithms are iterated until the desired optimal solution X* is reached with the same precision, and the total execution time is displayed in the corresponding table.

TABLE II
PERFORMANCE OF THE DOUBLY STOCHASTIC MULTINOMIAL MANIFOLD AGAINST THE PROBLEM DIMENSION.
n         | 60    | 70    | 80    | 90     | 100
CVX DP_n  | 32.04 | 60.19 | 97.69 | 152.32 | 267.45
CG DP_n   | 0.80  | 0.89  | 1.42  | 1.69   | 2.16
TR DP_n   | 7.69  | 11.03 | 18.24 | 22.60  | 24.17

TABLE III
PERFORMANCE OF THE SYMMETRIC STOCHASTIC MULTINOMIAL MANIFOLD AGAINST THE PROBLEM DIMENSION.
n         | 60   | 70    | 80    | 90    | 100
CVX SP_n  | 8.66 | 16.08 | 26.12 | 39.95 | 59.57
CG SP_n   | 0.18 | 0.31  | 0.34  | 0.44  | 0.53
TR SP_n   | 0.61 | 1.15  | 1.43  | 2.07  | 2.50
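For concreteness, the following is a minimal sketch of the data generation used in experiment (70), assuming the Sinkhorn-Knopp scaling referenced in [32], [35] to produce a doubly stochastic matrix M and an assumed noise level sigma. It is written in Python for illustration and is not the authors' Matlab/Manopt code.

```python
import numpy as np

def sinkhorn(A, n_iter=1000, tol=1e-10):
    """Alternately normalize rows and columns of a positive matrix (Sinkhorn-Knopp,
    the scaling procedure referenced in [32], [35]) until it is doubly stochastic."""
    M = A.copy()
    for _ in range(n_iter):
        M /= M.sum(axis=1, keepdims=True)   # row normalization
        M /= M.sum(axis=0, keepdims=True)   # column normalization
        if np.allclose(M.sum(axis=1), 1.0, atol=tol):
            break
    return M

rng = np.random.default_rng(1)
n, sigma = 100, 0.1                          # problem size and noise level (assumed values)

M = sinkhorn(rng.uniform(0.1, 1.0, (n, n)))  # a point on the doubly stochastic manifold
A = M + sigma * rng.standard_normal((n, n))  # noisy observation used in problem (70)
```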

TABLE IV
PERFORMANCE OF THE DEFINITE SYMMETRIC STOCHASTIC MULTINOMIAL MANIFOLD AGAINST THE PROBLEM DIMENSION.
n           | 60   | 70    | 80    | 90    | 100
CVX SP_n^+  | 5.43 | 11.39 | 22.84 | 33.77 | 53.79
CG SP_n^+   | 0.17 | 0.18  | 0.21  | 0.28  | 0.34
TR SP_n^+   | 0.58 | 0.78  | 0.93  | 1.32  | 1.90

Table II illustrates the performance of the proposed method in denoising a doubly stochastic matrix against the problem dimension. The table reveals a significant gain in the simulation time, ranging from 39 to 123 fold for the first order method and from 4 to 11 fold for the second order algorithm, as compared with the generic solver. The gain in performance can be explained by the fact that the proposed method uses the geometry of the problem efficiently, unlike generic solvers which convert the problem into a standard form and solve it using standard methods. The second order method performs poorly as compared with the first order method due to the fact that the expression of the Riemannian Hessian is complex to compute.

Table III shows the simulation time of the symmetric doubly stochastic multinomial manifold against the problem size. One can note that the gain is more important than the one in Table II. Indeed, as shown in the complexity analysis section, the symmetric manifold enjoys a large dimension reduction as compared with the doubly stochastic one, which makes the required ingredients easier to compute. One can also note that the computation of the Riemannian Hessian of the symmetric stochastic manifold is more efficient than that of the doubly stochastic manifold, which is reflected in a better performance against the conjugate gradient algorithm.

Table IV displays similar performance for the positive symmetric doubly stochastic multinomial manifold. The proposed definite multinomial manifold efficiently finds the solution. This is due to the fact that the optimal solution does not have vanishing entries or eigenvalues, as pointed out in Section V, which makes the retraction efficient. Such condition being not fulfilled in the upcoming couple of subsections, the performance of the positive definite manifold is omitted and a relaxed version using the symmetric manifold and regularization is presented.

The rest of the subsection confirms the linear and quadratic convergence rate behaviors of the proposed methods by plotting the norm of the gradient against the iteration number for each of the manifolds and algorithms. For each of the manifolds, an optimization problem is set up using regularizers in order to reach the optimal solution of optimization problem (70) with M = SP_n^+. The definition of the regularized objective functions is delayed to the next subsection for the more interesting clustering problem. Since the complexity of each step depends on the manifold, optimization problem, and used algorithm, nothing is concluded in this subsection about the efficiency of the algorithms in reaching the same solution.

Fig. 4. Convergence rate of the conjugate gradient algorithm on the various proposed manifolds against the iteration number for a high dimension system n = 1000 (norm of the gradient versus number of iterations).

Figure 4 plots the convergence rate of the first order method using the doubly stochastic, symmetric, and positive manifolds. The figure clearly shows that the conjugate gradient algorithm exhibits a linear convergence rate behavior similar to the one of unconstrained optimization. This is mainly due to the fact that the Riemannian optimization approach converts a constrained optimization into an unconstrained one over a constrained set.

Fig. 5. Convergence rate of the trust region method on the various proposed manifolds against the iteration number for a high dimension system n = 1000 (norm of the gradient versus number of iterations).

Figure 5 shows that the trust region method has a super linear, i.e., quadratic, convergence behavior with respect to the iteration number. The figure particularly shows that the quadratic rate is achieved only after a number of iterations, which can be explained by the fact that our implementation uses a general retraction instead of the optimal (and complex) exponential map.

B. Similarity Clustering via Convex Programming

This section suggests using the proposed framework to solve the convex clustering problem [4]. Given an entry-wise non-negative similarity matrix A between n data points, the goal is to cluster these data points into r clusters. Similar to [4], the matrix is generated by assuming a noisy stochastic block model with 3 blocks, a connection probability of 0.7 intra-cluster and 0.2 inter-cluster, and a noise variance of 0.2.
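The sketch below shows one plausible instantiation of the block model described above. The per-cluster size, the Bernoulli sampling of the adjacency, and the clipping to non-negative entries are assumptions made here for illustration; the paper does not spell out these details.

```python
import numpy as np

rng = np.random.default_rng(2)
r, size = 3, 30                        # 3 blocks; the per-cluster size is an assumption
n = r * size
labels = np.repeat(np.arange(r), size)

# Connection probabilities of the block model described above, plus noise variance.
p_in, p_out, noise_var = 0.7, 0.2, 0.2
P = np.where(labels[:, None] == labels[None, :], p_in, p_out)

# Symmetric 0/1 adjacency sampled from the block model, with zero diagonal.
B = (rng.uniform(size=(n, n)) < P).astype(float)
B = np.triu(B, 1); B = B + B.T

# Entry-wise non-negative, symmetric similarity matrix with additive Gaussian noise.
A = np.clip(B + np.sqrt(noise_var) * rng.standard_normal((n, n)), 0.0, None)
A = (A + A.T) / 2
```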

Under the above conditions, the reference guarantees the recovery of the clusters by solving the following optimization problem:
min_{X ∈ SP_n, X ⪰ 0} ||A − X||_F^2 + λ Tr(X),   (71)
wherein λ is the regularizer parameter whose expression is derived in [4]. The optimal solution to the above problem is (up to a permutation) a block diagonal matrix of rank equal to the number of clusters. Due to such rank deficiency of the optimal solution, the definite positive manifold cannot be used to solve the above problem. Therefore, we reformulate the problem on SP_n by adding the adequate regularizers as below:
min_{X ∈ SP_n} ||A − X||_F^2 + λ Tr(X) + ρ(||X||_* − Tr(X)),   (72)
wherein ρ is the regularization parameter. The expression of such a regularizer can be obtained by expressing the Lagrangian of the original problem and deriving the expression of the Lagrangian multiplier. However, this falls outside the scope of this paper. Clearly, the expression ||X||_* − Tr(X) = Σ_{i=1}^n |λ_i| − λ_i is positive and equal to zero if and only if all the eigenvalues are positive, which concludes that X is positive. Similarly, the problem can be reformulated on DP_n as follows:
min_{X ∈ DP_n} f(X) + ρ(||X||_* − Tr(X)) + µ ||X − X^T||_F^2,   (73)
where f(X) is the original objective function in (71), regularized with ρ and µ to promote positiveness and symmetry.

Fig. 6. Performance of the doubly stochastic and symmetric multinomial manifolds in solving the convex clustering problem in terms of running time (plotted against the dimension of the problem) and objective cost (71), reported below.
n         | 30    | 60     | 90     | 120     | 150
CVX SP_n  | 90.53 | 383.69 | 881.49 | 1590.41 | 2499.86
CG SP_n   | 90.53 | 383.67 | 881.44 | 1590.33 | 2499.74
TR SP_n   | 90.77 | 384.02 | 881.98 | 1590.90 | 2500.36
CG DP_n   | 90.53 | 383.66 | 881.46 | 1590.32 | 2499.76
TR DP_n   | 90.87 | 384.15 | 881.92 | 1590.58 | 2500.06

Figure 6 plots the running time required to solve the convex clustering problem and shows the achieved original cost (71) for the different optimization methods. Clearly, the proposed Riemannian optimization algorithms largely outperform the standard approach, with gains ranging from 15 to 665 fold for the first order methods. The precision of the algorithms is satisfactory as they achieve the CVX precision in almost all experiments. Also note that using the symmetric multinomial manifold produces better results. This can be explained by the fact that not only is the objective function (72) simpler than (73), but also the manifold contains fewer degrees of freedom, which makes the projections more efficient.

C. Clustering by Low-Rank Doubly Stochastic Matrix Decomposition

This last part of the simulations tests the performance of the proposed method for clustering by low-rank doubly stochastic matrix decomposition in the setting proposed in [5]. Given a similarity matrix as in the previous section, the authors in the above reference claim that a suitable objective function to determine the cluster structure is the following non-convex cost:
min_{X ∈ SP_n, X ⪰ 0}  − Σ_{i,j} A_{ij} log( Σ_k X_{ik} X_{jk} / Σ_v X_{vk} ) − (α − 1) Σ_{i,j} log(X_{ij}).
The authors propose a specialized algorithm, known as "Relaxed MM", to efficiently solve the problem above. This section suggests solving the above problem using the positive and the symmetric multinomial manifolds (with the proper regularization as shown in the previous subsection). In order to reach the same solution, all algorithms are initialized with the same value. The objective function achieved by the algorithm of [5] is taken as a reference, and the other algorithms stop as soon as their cost drops below such value.

Fig. 7. Performance of the positive and symmetric multinomial manifolds in solving the non-convex clustering problem against the Relaxed MM algorithm (running time against the dimension of the problem).

Figure 7 illustrates the running time of the different algorithms in order to reach the same solution.
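The spectral regularizer ||X||_* − Tr(X) used in (72)-(73) is straightforward to evaluate from the eigenvalues of the symmetric part of X. The short sketch below shows one way to do so; the function names and the placeholder parameters lam_reg and rho are ours and are not taken from the paper.

```python
import numpy as np

def psd_penalty(X):
    """||X||_* - Tr(X) = sum_i (|lambda_i| - lambda_i): zero iff X has no negative eigenvalues.
    The symmetric part is used so that the eigenvalues are real."""
    lam = np.linalg.eigvalsh((X + X.T) / 2)
    return np.sum(np.abs(lam) - lam)

def objective_72(X, A, lam_reg, rho):
    """Regularized cost of problem (72); lam_reg and rho are placeholder parameter values."""
    return np.linalg.norm(A - X, 'fro') ** 2 + lam_reg * np.trace(X) + rho * psd_penalty(X)
```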
The plot reveals that the proposed framework is highly efficient in high dimensions, with a significant gain over the specialized algorithm. The performance of the first order method is noticeably better than that of the second order one. This can be explained by the complexity of deriving the Riemannian Hessian. In practical implementations, one would use an approximation of the Hessian in a similar manner as quasi-Newton methods, e.g., BHHH, BFGS.

Finally, one can note that the symmetric multinomial performs better than the positive one, which can be explained by the fact that the optimal solution has vanishing eigenvalues, which make the retraction on the cone of positive matrices non-efficient.

VIII. CONCLUSION

This paper proposes a Riemannian geometry-based framework to solve a subset of convex (and non-convex) optimization problems in which the variable of interest represents a multidimensional probability distribution function. The optimization problems are reformulated from constrained optimizations into unconstrained ones over a restricted search space. The fundamental philosophy of the Riemannian optimization is to take advantage of the low-dimension manifold in which the solution lives and to use efficient unconstrained optimization algorithms, while ensuring that each update remains feasible. The geometrical structure of the doubly stochastic, symmetric stochastic, and definite multinomial manifolds is studied and efficient first and second order optimization algorithms are proposed. Simulation results reveal that the proposed approach outperforms conventional generic and specialized solvers in efficiently solving the problem in high dimensions.

APPENDIX A
COMPUTATION OF THE TANGENT SPACE

This section computes the tangent spaces of the different manifolds of interest in this paper. Recall that the tangent space T_x M of the manifold M at x is the d-dimensional vector space generated by the derivatives of all curves going through x. Therefore, the computation of such tangent space requires considering a generic smooth curve γ(t) : R → M such that γ(0) = x and γ(t) ∈ M for some t in the neighborhood of 0. The evaluation of the derivative of such parametric curves at the origin generates a vector space D such that D ⊆ T_x M. Two approaches can be used to show the converse. If the dimension of the manifold is known a priori and matches the dimension of D, then T_x M = D. Such an approach is referred to as dimension count. The second and more direct method is to consider each element d ∈ D and construct a smooth curve γ(t) for some t ∈ I ⊂ R such that γ(0) = x ∈ M and γ′(0) = d. For illustration purposes, the first and second subsections use the former and latter approaches, respectively.

A. Proof of Proposition 1

Recall the definition of the doubly stochastic manifold DP_n = {X ∈ R^{n×n} | X_{ij} > 0, X1 = 1, X^T 1 = 1}. Consider X ∈ DP_n and let X(t) be a smooth curve such that X(0) = X. Since X(t) ∈ DP_n for some t in the neighborhood of the origin, the curve satisfies:
X(t)1 = 1 ⇒ Ẋ(t)1 = 0   (A.1)
(X(t))^T 1 = 1 ⇒ (Ẋ(t))^T 1 = 0.   (A.2)
Differentiating both equations above concludes that the tangent space is a subset of
T_X DP_n ⊆ {Z ∈ R^{n×n} | Z1 = 0, Z^T 1 = 0}.   (A.3)
From the Birkhoff-von Neumann theorem, the number of degrees of freedom of doubly stochastic matrices is (n−1)^2. Similarly, one can note that the above space is generated by 2n−1 independent linear equations (the sum of the last column can be written as the difference of the sum of all rows and the sum of all columns except the last one). In other words, the dimension of the space is n^2 − (2n−1) = (n−1)^2. Therefore, a dimension count argument concludes that the tangent space has the above expression.

B. Proof of Proposition 2

The symmetric multinomial manifold has the following expression SP_n = {X ∈ R^{n×n} | X_{ij} > 0, X1 = 1, X = X^T}. Therefore, a smooth curve X(t) that goes through a point X ∈ SP_n satisfies:
X(t) = (X(t))^T ⇒ Ẋ(t) = (Ẋ(t))^T   (A.4)
X(t)1 = 1 ⇒ Ẋ(t)1 = 0,   (A.5)
which concludes that the tangent space T_X SP_n is included in the set {Z ∈ S_n | Z1 = 0}. Now consider Z in the above set and the smooth curve γ(t) = X + tZ. Clearly, γ(t) = (γ(t))^T for all t ∈ R. Furthermore, we have:
γ(t)1 = X1 + tZ1 = X1 = 1.   (A.6)
Finally, since X_{ij} > 0 defines an open set, there exists an interval I ⊂ R such that γ(t)_{ij} > 0. Finally, it is clear that γ(0) = X and γ′(0) = Z, which concludes that:
T_X SP_n = {Z ∈ S_n | Z1 = 0}.   (A.7)

APPENDIX B
ORTHOGONAL PROJECTION ONTO THE TANGENT SPACE

This section describes the general procedure to obtain the orthogonal projection onto the tangent spaces of the manifolds of interest in this paper. For a manifold M embedded in a vector space E, the orthogonal projection Π_x(z) projects a point z ∈ E onto the tangent space T_x M for some x ∈ M. The term orthogonal herein refers to the fact that the difference z − Π_x(z) is orthogonal to the space T_x M for the inner product ⟨., .⟩_x. Therefore, the first step in deriving the expression of the orthogonal projection onto the tangent space T_x M is to determine its orthogonal complement T_x^⊥ M defined as:
T_x^⊥ M = {z^⊥ ∈ E | ⟨z^⊥, z⟩_x = 0, ∀ z ∈ T_x M}.   (B.1)
As the dimension of T_x^⊥ M can be written as dim(E) − dim(T_x M), a typical method for deriving the expression of the orthogonal complement is to find a generating family for the space and check its dimension. Now, let Π_x^⊥(z) be the orthogonal projection onto the orthogonal complement T_x^⊥ M; each point in the ambient space can be decomposed as:
z = Π_x(z) + Π_x^⊥(z),  ∀ z ∈ E, ∀ x ∈ M.   (B.2)
Using the expressions of both the tangent set and its complement, the above equation allows deriving the expressions of both projections simultaneously. The next subsections compute the orthogonal projection on the set of doubly stochastic and symmetric multinomial manifolds using the described method.
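As a small numerical sanity check of the dimension count argument of Appendix A (not part of the original derivation), the linear constraints of (A.3) can be stacked as a 2n x n^2 matrix whose rank is 2n−1, so that its null space has dimension (n−1)^2. The sketch below assumes row-major vectorization of Z; the value n = 6 is arbitrary.

```python
import numpy as np

n = 6
I, ones_row = np.eye(n), np.ones((1, n))

# Linear constraints of (A.3) on vec(Z) (row-major): row sums = 0 and column sums = 0.
L = np.vstack([np.kron(I, ones_row),      # Z @ 1 = 0   (row sums)
               np.kron(ones_row, I)])     # Z.T @ 1 = 0 (column sums)

rank = np.linalg.matrix_rank(L)
print(rank, 2 * n - 1)                    # only 2n - 1 of the 2n equations are independent
print(n**2 - rank, (n - 1)**2)            # dimension of the tangent space, matching (n-1)^2
```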

A. Proof of Theorem 1

As stated earlier, as a first step in deriving the expression of the orthogonal projection onto the tangent space, one needs to derive the expression of the orthogonal complement, which is given in the following lemma.

Lemma 4. The orthogonal complement of the tangent space of the doubly stochastic multinomial has the following expression:
T_X^⊥ DP_n = {Z^⊥ ∈ R^{n×n} | Z^⊥ = (α1^T + 1β^T) ⊙ X}   (B.3)
for some α, β ∈ R^n.

Proof. As stated in the introduction of the section, the computation of the orthogonal complement of the tangent space requires deriving a basis for the space and counting the dimension. Let Z^⊥ ∈ T_X^⊥ DP_n and Z ∈ T_X DP_n; the inner product can be written as:
⟨Z^⊥, Z⟩_X = Tr((Z^⊥ ⊘ X) Z^T) = Tr((α1^T + 1β^T) Z^T) = α^T Z1 + β^T Z^T 1.   (B.4)
But Z1 = Z^T 1 = 0 by definition of the tangent space. Therefore, we have ⟨Z, Z^⊥⟩_X = 0, ∀ Z ∈ T_X DP_n. Finally, one can note that the dimension of the set is 2n−1, which is the correct dimension for T_X^⊥ DP_n. Therefore, the derived set is the orthogonal complement of the tangent space.

Let Z ∈ R^{n×n} be a vector in the ambient space and X ∈ DP_n. The expressions of the projections are obtained using the following decomposition:
Z = Π_X(Z) + Π_X^⊥(Z)
Z1 = Π_X(Z)1 + Π_X^⊥(Z)1.   (B.5)
However, by definition of the tangent space, the first term in the right hand side of the above equation vanishes. Similarly, from Lemma 4, the second term can be replaced by its expression (α1^T + 1β^T) ⊙ X. Therefore, the first equation implies
Z1 = ((α1^T + 1β^T) ⊙ X)1
Σ_{j=1}^n Z_{ij} = Σ_{j=1}^n (α_i + β_j) X_{ij},  1 ≤ i ≤ n
Σ_{j=1}^n Z_{ij} = α_i + Σ_{j=1}^n β_j X_{ij},  1 ≤ i ≤ n
Z1 = α + Xβ.   (B.6)
A similar argument allows to conclude that Z^T 1 = X^T α + β. Grouping the equations above gives the following system of equations:
[Z1 ; Z^T 1] = [I X ; X^T I] [α ; β].   (B.7)
Even though the matrix [I X ; X^T I] is rank deficient due to the presence of the null vector [1 ; −1], the system admits infinitely many solutions. Indeed, from the orthogonal complement identification of the range and null space of a matrix A, i.e., R(A) = N^⊥(A), it is sufficient to show that the vector of interest is orthogonal to the null space of the matrix of interest as follows:
[Z1 ; Z^T 1]^T [1 ; −1] = 1^T Z^T 1 − 1^T Z1 = 1^T Z1 − 1^T Z1 = 0.   (B.8)
A particular solution to the system is to solve for β as a function of α and then solve for α, which gives the following solution:
α = (I − XX^T)^† (Z − XZ^T) 1   (B.9)
β = Z^T 1 − X^T α.   (B.10)
Finally, rearranging the terms in (B.5) allows to conclude that the orthogonal projection onto the tangent space has the following expression:
Π_X(Z) = Z − (α1^T + 1β^T) ⊙ X,   (B.11)
wherein α and β are obtained according to (B.9)-(B.10) or, more generally, (B.7).

B. Proof of Theorem 4

The proof of this theorem is closely related to the proof of Theorem 1. Indeed, the next section shows that the orthogonal projection on the symmetric doubly stochastic multinomial manifold is a special case of the projection on the doubly stochastic multinomial manifold. This section provides a more direct proof that does not rely on the previously derived result. The expression of the orthogonal complement of the tangent space is given in the following lemma.

Lemma 5. The orthogonal complement of the tangent space of the symmetric multinomial can be represented by the following set:
T_X^⊥ SP_n = {Z^⊥ ∈ S_n | Z^⊥ = (α1^T + 1α^T) ⊙ X}   (B.12)
for some α ∈ R^n.

Proof. The proof of this lemma uses similar steps as the one of Lemma 4 and thus is omitted herein.

Let Z ∈ R^{n×n} be a vector in the ambient space and X ∈ SP_n. The decomposition of Z gives the following:
Z = Π_X(Z) + Π_X^⊥(Z)
Z1 = Π_X(Z)1 + Π_X^⊥(Z)1
Z1 = ((α1^T + 1α^T) ⊙ X)1
Z1 = α + Xα = (I + X)α
α = (I + X)^{-1} Z1,   (B.13)
wherein the steps of the computation are obtained in a similar fashion as the ones in (B.6). Therefore, the orthogonal projection on the tangent set of the symmetric doubly stochastic multinomial manifold is given by:
Π_X(Z) = Z − (α1^T + 1α^T) ⊙ X,   (B.14)
with α being derived in (B.13).

C. Relationship Between the Orthogonal Projections

One can note that the projection onto the tangent space of the symmetric multinomial is a special case of the projection onto the tangent space of the doubly stochastic manifold when both the point on the manifold X and the ambient vector Z are symmetric. In other words, the projection can be written as
Π_X(Z) = Z − (α1^T + 1β^T) ⊙ X,   (B.15)
with
α = (I − XX^T)^† (Z − XZ^T) 1   (B.16)
β = (Z^T − (X^{-1} − X)^† (Z − XZ^T)) 1,   (B.17)
and the additional identities X = X^T and Z = Z^T. Using the economic eigenvalue decomposition of the symmetric matrix X = UΛU^T, the vector α can be expressed as:
α = (I − XX^T)^† (Z − XZ^T) 1
  = (I − XX^T)^† Z1 − (I − XX^T)^† X Z^T 1
  = (I − X^2)^† Z1 − (X^{-1} − X)^† Z1
  = U [(I − Λ^2)^{-1} − (Λ^{-1} − Λ)^{-1}] U^T Z1.   (B.18)
The inner matrix is a diagonal one with diagonal entries equal to:
1/(1 − λ^2) − 1/(1/λ − λ) = 1/(1 − λ^2) − λ/(1 − λ^2) = 1/(1 + λ).   (B.19)
Therefore, (I − Λ^2)^{-1} − (Λ^{-1} − Λ)^{-1} = (I + Λ)^{-1}, which gives the final expression of α as:
α = (I + X)^{-1} Z1.   (B.20)
Finally, the expression of β is given by:
β = Z^T 1 − X^T α = (I − X(I + X)^{-1}) Z1 = U [I − Λ(I + Λ)^{-1}] U^T Z1,   (B.21)
with the inner matrix equal to:
1 − λ/(1 + λ) = 1/(1 + λ)  ⇒  (I − X(I + X)^{-1}) = (I + X)^{-1}.
Therefore, we conclude that β = (I + X)^{-1} Z1 = α, which is in accordance with the result derived in Theorem 4.

Remark 5. Note that the link between both expressions can be obtained more easily by assuming that (I − XX^T) is invertible. Indeed, for example, the expression of α can be easily computed as:
α = (I − XX^T)^{-1} (Z − XZ^T) 1
  = (I − XX)^{-1} (Z − XZ) 1
  = (I + X)^{-1} (I − X)^{-1} (I − X) Z1
  = (I + X)^{-1} Z1.   (B.22)
However, due to the eigenvalue at 1 resulting from X1 = 1, such a proof is not valid and we need to use the pseudo-inverse as shown in the section above.

APPENDIX C
RETRACTION ON EMBEDDED MANIFOLDS

This section exploits the vector space structure of the embedding space to design efficient, i.e., low-complexity, retractions on the manifolds of interest in this paper. The construction of the retraction relies on the following theorem, whose proof can be found in [13].

Theorem 6. Let M be an embedded manifold of the Euclidean space E and let N be an abstract manifold such that dim(M) + dim(N) = dim(E). Assume that there is a diffeomorphism
φ : M × N* → E
(F, G) ↦ φ(F, G),   (C.1)
where N* is an open subset of N, with a neutral element I ∈ N* satisfying
φ(F, I) = F,  ∀ F ∈ M.   (C.2)
Under the above assumption, the mapping
R_x : T_x M → M
ξ_x ↦ R_x(ξ_x) = π_1(φ^{-1}(x + ξ_x)),   (C.3)
where π_1 : M × N* → M : (F, G) ↦ F is the projection onto the first component, defines a retraction on the manifold M for all x ∈ M and ξ_x in the neighborhood of 0_x.

The upcoming sections take advantage of the matrix decomposition to design a mapping φ. Interestingly, the inverse of the map φ turns out to be straightforward to compute, even though the projection on the doubly stochastic matrices space is challenging.

A. Proof of Theorem 2

This subsection uses the Sinkhorn theorem [32] to derive an expression for the mapping φ. The Sinkhorn theorem states:

Theorem 7. Let A ∈ R^{n×n} be an element-wise positive matrix. There exist two strictly positive diagonal matrices D_1 and D_2 such that D_1 A D_2 is doubly stochastic.

Due to the invariance of the above theorem to the scaling of D_1 and D_2, the rest of the paper assumes that (D_1)_{11} = 1 without loss of generality. Define the φ mapping as follows:
φ : DP_n × R^{2n−1} → R^{n×n}
(A, [d_1 ; d_2]) ↦ diag(1, d_1) A diag(d_2).   (C.4)
Note that R^{2n−1} is an open subset of R^{2n−1} and thus is a manifold by definition. Similarly, R^{n×n} is an open subset of R^{n×n}. Finally, dim(DP_n) + dim(R^{2n−1}) = (n−1)^2 + 2n − 1 = n^2 = dim(R^{n×n}). Also, the all-one element 1 of R^{2n−1} satisfies φ(A, 1) = A. Clearly, the mapping φ is smooth by the smoothness of the matrix product. The existence of the inverse map is guaranteed by the Sinkhorn theorem. Such an inverse map is obtained through the Sinkhorn algorithm [32], which scales the rows and columns of the matrix, i.e., the inverse map is smooth. Finally, we conclude that φ represents a diffeomorphism.

Using the result of Theorem 6, we conclude that π_1(φ^{-1}(X + ξ_X)) is a valid retraction for ξ_X in the neighborhood of 0_X, i.e., for X + ξ_X with positive entries, which can explicitly be written as X_{ij} > −(ξ_X)_{ij}, 1 ≤ i, j ≤ n. Using the properties of the manifold and its tangent space, the inverse map reduces to the identity. Indeed, it holds true that:
(X + ξ_X)1 = X1 + ξ_X 1 = 1 + 0 = 1   (C.5)
(X + ξ_X)^T 1 = X^T 1 + ξ_X^T 1 = 1 + 0 = 1.   (C.6)
Therefore, the canonical retraction is defined by R_X(ξ_X) = X + ξ_X.

B. Proof of Corollary 1

The proof of this corollary follows similar steps as the one of Theorem 2. However, instead of using the Sinkhorn theorem to find the adequate matrix decomposition, we use its extension to the symmetric case known as the DAD theorem [34], given below:

Theorem 8. Let A ∈ S_n be a symmetric, element-wise positive matrix. There exists a strictly positive diagonal matrix D such that DAD is symmetric doubly stochastic.

With the theorem above, define the map φ as follows:
φ : SP_n × R^n → S_n
(A, d) ↦ diag(d) A diag(d).   (C.7)
Similar to the previous proof, S_n is an open subset of the vector space S_n and R^n is a manifold. The dimension of the left hand side gives:
dim(SP_n) + dim(R^n) = n(n−1)/2 + n = n(n+1)/2 = dim(S_n).   (C.8)
Using similar techniques as the ones in Theorem 2, we can conclude that φ is a diffeomorphism whose inverse is ensured by the DAD algorithm. Finally, one can note that the projection onto the set of symmetric doubly stochastic matrices leaves X + ξ_X unchanged for ξ_X in the neighborhood of 0_X.

APPENDIX D
RIEMANNIAN HESSIAN COMPUTATION

Recall that the Riemannian Hessian is related to the Riemannian connection and Riemannian gradient through the following equation:
hess f(X)[ξ_X] = ∇_{ξ_X} grad f(X),  ∀ ξ_X ∈ T_X M.   (D.1)
Furthermore, the connection ∇_{η_X} ξ_X on the submanifold is given by the Levi-Civita connection ∇̄_{η_X} ξ_X on R^{n×n} through ∇_{η_X} ξ_X = Π_X(∇̄_{η_X} ξ_X). Substituting in the expression of the Riemannian Hessian yields:
hess f(X)[ξ_X] = Π_X(D(grad f(X))[ξ_X]) − (1/2) Π_X((grad f(X) ⊙ ξ_X) ⊘ X)
              = Π_X(D(grad f(X))[ξ_X]) − (1/2) Π_X((Π_X(Grad f(X) ⊙ X) ⊙ ξ_X) ⊘ X).   (D.2)
Apart from the term D(grad f(X))[ξ_X], all the other terms in the above equation are available. Therefore, one only needs to derive the expression of D(grad f(X))[ξ_X] to obtain the mapping from the Euclidean gradient and Hessian to their Riemannian counterparts. For ease of notation, the section uses the short notation ḟ[ξ] to denote the directional derivative D(f)[ξ] (also denoted by ∇_ξ f in the Riemannian geometry community). The computation of the directional derivative of the Riemannian gradient uses the result of the following proposition:

Proposition 2. Let f and g be two matrix functions, i.e., f, g : R^{n×n} → R^{n×n}. The directional derivatives of the Hadamard product f ⊙ g and the matrix product fg are given by:
D(f ⊙ g)[ξ] = ḟ[ξ] ⊙ g + f ⊙ ġ[ξ]   (D.3)
D(fg)[ξ] = ḟ[ξ] g + f ġ[ξ].   (D.4)

Proof. The matrix identities and differentiation rules, including the above identities, are summarized in the reference [38].

The next subsections derive such directional derivatives for the doubly stochastic and the symmetric multinomial manifolds in order to obtain the final expression of the Riemannian Hessian. In both subsections, let γ denote Grad f(X) ⊙ X.

A. Proof of Theorem 3

Recall that the projection on the doubly stochastic multinomial manifold is given by:
Π_X(Z) = Z − (α1^T + 1β^T) ⊙ X
α = (I − XX^T)^† (Z − XZ^T) 1
β = Z^T 1 − X^T α.   (D.5)
Therefore, the directional derivative can be expressed as:
D(grad f(X))[ξ_X] = D(Π_X(γ))[ξ_X]
                  = D(γ − (α1^T + 1β^T) ⊙ X)[ξ_X]
                  = D(γ)[ξ_X] − D((α1^T + 1β^T) ⊙ X)[ξ_X]
                  = γ̇[ξ_X] − (α̇[ξ_X]1^T + 1β̇[ξ_X]^T) ⊙ X − (α1^T + 1β^T) ⊙ ξ_X,   (D.6)
with
• γ̇[ξ_X] = D(γ)[ξ_X], which can be expressed as:
γ̇[ξ_X] = D(Grad f(X))[ξ_X] ⊙ X + Grad f(X) ⊙ ξ_X = Hess f(X)[ξ_X] ⊙ X + Grad f(X) ⊙ ξ_X.
• α̇[ξ_X] = D(α)[ξ_X], which can be computed as follows:
α̇[ξ_X] = D((I − XX^T)^† (γ − Xγ^T) 1)[ξ_X]
        = D((I − XX^T)^†)[ξ_X] (γ − Xγ^T) 1 + (I − XX^T)^† (γ̇[ξ_X] − ξ_X γ^T − X γ̇[ξ_X]^T) 1,   (D.7)
with the term D((I − XX^T)^†)[ξ_X] being derived below.
• β̇[ξ_X] = D(β)[ξ_X], which can be computed as follows:
β̇[ξ_X] = D(γ^T 1 − X^T α)[ξ_X] = γ̇[ξ_X]^T 1 − ξ_X^T α − X^T α̇[ξ_X].   (D.8)
In order to compute D((I − XX^T)^†)[ξ_X], first introduce the following lemma:

Lemma 6. Let A be an n × n matrix with a left pseudo-inverse A^†. The left pseudo-inverse of (A + BC) is given by:
(A + BC)^† = A^† − A^†B(I + CA^†B)^†CA^†.   (D.9)

Proof. The above identity is similar to the Kailath variant of the Sherman-Morrison-Woodbury formula [39] for an invertible matrix A. The proof is given by a simple left multiplication as follows:
(A + BC)^†(A + BC) = A^†A − A^†B(I + CA^†B)^†CA^†A + A^†BC − A^†B(I + CA^†B)^†CA^†BC   (D.10)
                   = I − A^†B((I + CA^†B)^†(I + CA^†B)C − C) = I.

Using the identity above, the pseudo-inverse of the perturbed (I − XX^T)^† along ξ_X is given by:
(I − (X + tξ_X)(X + tξ_X)^T)^† = A^†[ξ_X] = A^† + tA^†(I − tCA^†)^†CA^†,   (D.11)
wherein A = I − XX^T, B = −t, and C = Xξ_X^T + ξ_X X^T + tξ_X ξ_X^T in the above inversion lemma. Therefore, the directional derivative can be obtained by:
D((I − XX^T)^†)[ξ_X] = lim_{t→0} (A^†[ξ_X] − A^†)/t
                     = lim_{t→0} tA^†(I − tCA^†)^†CA^† / t
                     = lim_{t→0} A^†(I − tCA^†)^†CA^†
                     = A^† (lim_{t→0} C) A^†   (D.12)
                     = (I − XX^T)^† (Xξ_X^T + ξ_X X^T) (I − XX^T)^†.

B. Proof of Corollary 3

The proof of this corollary follows similar steps as the one of Theorem 3. Recall that the projection on the symmetric multinomial manifold is given by:
Π_X(Z) = Z − (α1^T + 1α^T) ⊙ X
α = (I + X)^{-1} Z1.   (D.13)
Therefore, using a technique similar to Theorem 3, the directional derivative can be expressed as:
D(grad f(X))[ξ_X] = γ̇[ξ_X] − (α̇[ξ_X]1^T + 1α̇[ξ_X]^T) ⊙ X − (α1^T + 1α^T) ⊙ ξ_X,   (D.14)
with γ̇[ξ_X] = Hess f(X)[ξ_X] ⊙ X + Grad f(X) ⊙ ξ_X. The computation of the directional derivative of α requires differentiating (I + X)^{-1} = A^{-1}. Using the Kailath variant of the Sherman-Morrison-Woodbury formula [39], the inverse is given by:
(I + X + tξ_X)^{-1} = A^{-1} − A^{-1} t(I + tξ_X A^{-1})^{-1} ξ_X A^{-1}.   (D.15)
Therefore, the directional derivative can be expressed as:
D((I + X)^{-1})[ξ_X] = lim_{t→0} (A^{-1}[ξ_X] − A^{-1})/t
                     = lim_{t→0} −A^{-1} t(I + tξ_X A^{-1})^{-1} ξ_X A^{-1} / t
                     = −(I + X)^{-1} ξ_X (I + X)^{-1}.   (D.16)
Hence, we obtain:
α̇[ξ_X] = ((I + X)^{-1} γ̇[ξ_X] − (I + X)^{-1} ξ_X (I + X)^{-1} γ) 1.

REFERENCES

[1] S. Boyd and L. Vandenberghe, Convex Optimization. New York, NY, USA: Cambridge University Press, 2004.
[2] J. Nocedal and S. J. Wright, Numerical Optimization, ser. Springer Series in Operations Research and Financial Engineering. New York: Springer, second edition, 2006.
[3] W. Ring and B. Wirth, "Optimization methods on Riemannian manifolds and their application to shape space," SIAM Journal on Optimization, vol. 22, no. 2, pp. 596-627, 2012.
[4] R. K. Vinayak and B. Hassibi, "Similarity clustering in the presence of outliers: Exact recovery via convex program," in 2016 IEEE International Symposium on Information Theory (ISIT' 2016), July 2016, pp. 91-95.
[5] Z. Yang and E. Oja, "Clustering by low-rank doubly stochastic matrix decomposition," in Proceedings of the 29th International Conference on Machine Learning (ICML'12). USA: Omnipress, 2012, pp. 707-714. [Online]. Available: http://dl.acm.org/citation.cfm?id=3042573.3042666
[6] X. Wang, F. Nie, and H. Huang, "Structured doubly stochastic matrix for graph based clustering: Structured doubly stochastic matrix," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '16), pp. 1245-1254.
[7] R. Zass and A. Shashua, "Doubly stochastic normalization for spectral clustering," in NIPS, 2006, pp. 1569-1576.
[8] D. G. Luenberger, "The gradient projection method along geodesics," Management Science, vol. 18, no. 11, pp. 620-631, 1972.
[9] D. Gabay, "Minimizing a differentiable function over a differential manifold," Journal of Optimization Theory and Applications, vol. 37, no. 2, pp. 177-219, 1982.
[10] S. T. Smith, Geometric Optimization Methods for Adaptive Filtering. Cambridge, MA, USA: Harvard University, UMI Order No. GAX93-31032.
[11] C. Udriste, Convex Functions and Optimization Methods on Riemannian Manifolds. Springer Science & Business Media, 1994, vol. 297.
[12] Y. Yang, "Globally convergent optimization algorithms on Riemannian manifolds: Uniform framework for unconstrained and constrained optimization," Journal of Optimization Theory and Applications, vol. 132, no. 2, pp. 245-265, 2007.
[13] P.-A. Absil, R. Mahony, and R. Sepulchre, Optimization Algorithms on Matrix Manifolds. Princeton, NJ: Princeton University Press, 2008.
[14] K. Huper and J. Trumpf, "Newton-like methods for numerical optimization on manifolds," in Proc. of the 38th Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, USA, vol. 1, Nov. 2004, pp. 136-139.
[15] C. G. Baker, P.-A. Absil, and K. A. Gallivan, "An implicit trust-region method on Riemannian manifolds," IMA J. Numer. Anal., vol. 28, no. 4, pp. 665-689, 2008.
[16] C. Baker, Riemannian Manifold Trust-Region Methods with Applications to Eigenproblems, 2008.
[17] P.-A. Absil, C. G. Baker, and K. A. Gallivan, "Trust-region methods on Riemannian manifolds," Foundations of Computational Mathematics, vol. 7, no. 3, pp. 303-330, 2007.
[18] P. Absil, C. G. Baker, and K. A. Gallivan, "Trust-region methods on Riemannian manifolds," in Proceedings of the 16th International Symposium on Mathematical Theory of Networks and Systems (MTNS' 04), Leuven, Belgium.
[19] N. Boumal and P.-A. Absil, "RTRMC: A Riemannian trust-region method for low-rank matrix completion," in Advances in Neural Information Processing Systems, 2011, pp. 406-414.
[20] B. Vandereycken, "Low-rank matrix completion by Riemannian optimization," SIAM Journal on Optimization, vol. 23, no. 2, pp. 1214-1236, 2013.
[21] L. Cambier and P.-A. Absil, "Robust low-rank matrix completion via Riemannian optimization," to appear in SIAM Journal on Scientific Computing, 2015.
[22] U. Shalit, D. Weinshall, and G. Chechik, "Online learning in the embedded manifold of low-rank matrices," Journal of Machine Learning Research, vol. 13, no. 1, pp. 429-458, Feb. 2012.
[23] R. Inokuchi and S. Miyamoto, "c-means clustering on the multinomial manifold," in Proceedings of the 4th International Conference on Modeling Decisions for Artificial Intelligence (MDAI '07), Kitakyushu, Japan, 2007, pp. 261-268.
[24] J. Lafferty and G. Lebanon, "Diffusion kernels on statistical manifolds," Journal of Machine Learning Research, vol. 6, pp. 129-163, Dec. 2005.
[25] Y. Sun, J. Gao, X. Hong, B. Mishra, and B. Yin, "Heterogeneous tensor decomposition for clustering via manifold optimization," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 3, pp. 476-489, March 2016.
[26] H. Ji, Optimization Approaches on Smooth Manifolds, 2007.
[27] M. Berger and B. Gostiaux, Differential Geometry: Manifolds, Curves, and Surfaces, ser. Graduate Texts in Mathematics. Springer-Verlag, 1988.
[28] J. Lee, Introduction to Topological Manifolds, ser. Graduate Texts in Mathematics. Springer New York, 2010.
[29] P. Petersen, Riemannian Geometry, ser. Graduate Texts in Mathematics. Springer, 1998.
[30] S. Amari and H. Nagaoka, Methods of Information Geometry, ser. Translations of Mathematical Monographs. American Mathematical Society, 2007.
[31] G. Hurlbert, "A short proof of the Birkhoff-von Neumann Theorem," preprint (unpublished), 2008.

[32] R. Sinkhorn, "A relationship between arbitrary positive matrices and doubly stochastic matrices," The Annals of Mathematical Statistics, vol. 35, no. 2, pp. 876-879, 1964.
[33] R. Sinkhorn and P. Knopp, "Concerning nonnegative matrices and doubly stochastic matrices," Pacific Journal of Mathematics, vol. 21, no. 2, pp. 343-348, 1967.
[34] J. Csima and B. Datta, "The DAD theorem for symmetric non-negative matrices," Journal of Combinatorial Theory, Series A, vol. 12, no. 1, pp. 147-152, 1972.
[35] M. Idel, "A review of matrix scaling and Sinkhorn's normal form for matrices and positive maps," ArXiv preprint. [Online]. Available: https://arxiv.org/abs/1609.06349
[36] N. Boumal, B. Mishra, P.-A. Absil, and R. Sepulchre, "Manopt, a Matlab toolbox for optimization on manifolds," Journal of Machine Learning Research, vol. 15, pp. 1455-1459, 2014. [Online]. Available: http://www.manopt.org
[37] M. Grant and S. Boyd, "CVX: Matlab software for disciplined convex programming, version 2.1," http://cvxr.com/cvx, Mar. 2014.
[38] K. B. Petersen and M. S. Pedersen, "The matrix cookbook," Tech. Rep., 2006.
[39] C. M. Bishop, Neural Networks for Pattern Recognition. New York, NY, USA: Oxford University Press, Inc., 1995.
