Fenchel Duality Theory and a Primal-Dual Algorithm on Riemannian Manifolds
Ronny Bergmann∗ Roland Herzog∗ Maurício Silva Louzeiro∗ Daniel Tenbrinck† José Vidal-Núñez‡
This paper introduces a new notion of a Fenchel conjugate, which generalizes the classical Fenchel conjugation to functions dened on Riemannian manifolds. We investigate its properties, e.g., the Fenchel–Young inequality and the characterization of the convex subdierential using the analogue of the Fenchel–Moreau Theorem. These properties of the Fenchel conjugate are employed to derive a Riemannian primal-dual optimization algorithm, and to prove its convergence for the case of Hadamard manifolds under appropriate assumptions. Numerical results illustrate the performance of the algorithm, which competes with the recently derived Douglas–Rachford algorithm on manifolds of nonpositive curvature. Furthermore, we show numerically that our novel algorithm even converges on manifolds of positive curvature.
Keywords. convex analysis, Fenchel conjugate function, Riemannian manifold, Hadamard manifold, primal-dual algorithm, Chambolle–Pock algorithm, total variation
AMS subject classications (MSC2010). 49N15, 49M29, 90C26, 49Q99
1 Introduction
Convex analysis plays an important role in optimization, and an elaborate theory on convex analysis and conjugate duality is available on locally convex vector spaces. Among the vast references on this topic, we mention Bauschke, Combettes, 2011 for convex analysis and monotone operator techniques, Ekeland,
∗Technische Universität Chemnitz, Faculty of Mathematics, 09107 Chemnitz, Germany ([email protected] arXiv:1908.02022v4 [math.NA] 9 Oct 2020 chemnitz.de, https://www.tu-chemnitz.de/mathematik/part dgl/people/bergmann, ORCID 0000-0001-8342- 7218, [email protected], https://www.tu-chemnitz.de/mathematik/part dgl/people/herzog, ORCID 0000-0003-2164-6575, [email protected], https://www.tu-chemnitz.de/ mathematik/part dgl/people/louzeiro, ORCID 0000-0002-4755-3505). †Friedrich-Alexander Universität Erlangen-Nürnberg, Department of Mathematics, Chair for Applied Mathematics (Modeling and Numerics), 91058 Erlangen, Germany ([email protected], https://en.www.math.fau.de/ angewandte-mathematik-1/mitarbeiter/dr-daniel-tenbrinck/). ‡University of Alcalá, Department of Physics and Mathematics, 28801 Alcalá de Henares, Spain ([email protected], https:// www.uah.es/es/estudios/profesor/Jose-Vidal-Nunez/, ORCID 0000-0002-1190-6700).
2020-10-12 page 1 of 38 R. Bergmann, R. Herzog, M. Silva Louzeiro, D. Ten [...] Fenchel Duality on Manifolds
Temam, 1999 for convex analysis and the perturbation approach to duality, or Rockafellar, 1970 for an in-depth development of convex analysis on Euclidean spaces. Rockafellar, 1974 focuses on conjugate duality on Euclidean spaces, Zălinescu, 2002; Boţ, 2010 on conjugate duality on locally convex vector spaces, and Martínez-Legaz, 2005 on some particular applications of conjugate duality in economics.
We wish to emphasize in particular the role of convex analysis in the analysis and numerical solution of regularized ill-posed problems. Consider for instance the total variation (TV) functional, which was introduced for imaging applications in the famous Rudin–Osher–Fatemi (ROF) model, see Rudin, Osher, Fatemi, 1992, and which is known for its ability to preserve sharp edges. We refer the reader to Chambolle, Caselles, et al., 2010 for further details about total variation for image analysis. Further applications and regularizers can be found in Chambolle, Lions, 1997; Strong, Chan, 2003; Chambolle, 2004; Chan, Esedoglu, et al., 2006; Wang et al., 2008. In addition, higher order dierences or dierentials can be taken into account, see for example Chan, Marquina, Mulet, 2000; Papatsoros, Schönlieb, 2014 or most prominently the total generalized variation (TGV) Bredies, Kunisch, Pock, 2010. These models use the idea of the pre-dual formulation of the energy functional and Fenchel duality to derive ecient algorithms. Within the image processing community the resulting algorithms of primal-dual hybrid gradient type are often referred to as the Chambolle–Pock algorithm, see Chambolle, Pock, 2011.
In recent years, optimization on Riemannian manifolds has gained a lot of interest. Starting in the 1970s, optimization on Riemannian manifolds and corresponding algorithms have been investigated; see for instance Udrişte, 1994 and the references therein. In particular, we point out the work by Rapcsák with regard to geodesic convexity in optimization on manifolds; see for instance Rapcsák, 1986; 1991 and Rapcsák, 1997, Ch. 6. The latter reference also serves as a source for optimization problems on manifolds obtained by rephrasing equality constrained problems in vector spaces as unconstrained problems on certain manifolds. For a comprehensive textbook on optimization on matrix manifolds, see Absil, Mahony, Sepulchre, 2008 and the recent Boumal, 2020.
With the emergence of manifold-valued imaging, for example in InSAR imaging Bürgmann, Rosen, Fielding, 2000, data consisting of orientations for example in electron backscattered diraction (EBSD) Adams, Wright, Kunze, 1993; Kunze et al., 1993, dextrous hand grasping Dirr, Helmke, Lageman, 2007, or for diusion tensors in magnetic resonance imaging (DT-MRI), for example discussed in Pennec, Fillard, Ayache, 2006, the development of optimization techniques and/or algorithms on manifolds (especially for non-smooth functionals) has gained a lot of attention. Within these applications, the same tasks appear as for classical, Euclidean imaging, such as denoising, inpainting or segmentation. Both Lellmann et al., 2013 as well as Weinmann, Demaret, Storath, 2014 introduced the total variation as a prior in a variational model for manifold-valued images. While the rst extends a lifting approach previously introduced for cyclic data in Strekalovskiy, Cremers, 2011 to Riemannian manifolds, the latter introduces a cyclic proximal point algorithm (CPPA) to compute a minimizer of the variational model. Such an algorithm was previously introduced by Bačák, 2014a on CAT(0) spaces based on the proximal point algorithm introduced by Ferreira, Oliveira, 2002 on Riemannian manifolds. Based on these models and algorithms, higher order models have been derived Bergmann, Laus, et al., 2014; Bačák et al., 2016; Bergmann, Fitschen, et al., 2018; Bredies, Holler, et al., 2018. Using a relaxation, the half-quadratic minimization Bergmann, Chan, et al., 2016, also known as iteratively reweighted least squares (IRLS) Grohs, Sprecher, 2016, has been generalized to manifold-valued image processing tasks and employs a quasi-Newton method. Finally, the parallel Douglas–Rachford algorithm (PDRA) was introduced on Hadamard manifolds Bergmann, Persch, Steidl, 2016 and its convergence proof is, to the best of our knowledge, limited to manifolds with constant nonpositive curvature. Numerically,
2020-10-12 cbna page 2 of 38 R. Bergmann, R. Herzog, M. Silva Louzeiro, D. Ten [...] Fenchel Duality on Manifolds the PDRA still performs well on arbitrary Hadamard manifolds. However, for the classical Euclidean case the Douglas–Rachford algorithm is equivalent to applying the alternating directions method of multipliers (ADMM) Gabay, Mercier, 1976 on the dual problem and hence is also equivalent to the algorithm of Chambolle, Pock, 2011.
In this paper we introduce a new notion of Fenchel duality for Riemannian manifolds, which allows us to derive a conjugate duality theory for convex optimization problems posed on such manifolds. Our theory allows new algorithmic approaches to be devised for optimization problems on manifolds. In the absence of a global concept of convexity on general Riemannian manifolds, our approach is local in nature. On so-called Hadamard manifolds, however, there is a global notion of convexity and our approach also yields a global method.
The work closest to ours is Ahmadi Kakavandi, Amini, 2010, who introduce a Fenchel conjugacy-like concept on Hadamard metric spaces, using a quasilinearization map in terms of distances as the duality product. In contrast, our work makes use of intrinsic tools from dierential geometry such as geodesics, tangent and cotangent vectors to establish a conjugation scheme which extends the theory from locally convex vector spaces to Riemannian manifolds. We investigate the application of the correspondence of a primal problem Minimize 퐹 (푝) + 퐺 (Λ(푝)) (1.1) to a suitably dened dual and derive a primal-dual algorithm on Riemannian manifolds. In the absence of a concept of linear operators between manifolds we follow the approach of Valkonen, 2014 and state an exact and a linearized variant of our newly established Riemannian Chambolle–Pock algorithm (RCPA). We then study convergence of the latter on Hadamard manifolds. Our analysis relies on a careful investigation of the convexity properties of the functions 퐹 and 퐺. We distinguish between geodesic convexity and convexity of a function composed with the exponential map on the tangent space. Both types of convexity coincide on Euclidean spaces. This renders the proposed RCPA a direct generalization of the Chambolle-Pock algorithm to Riemannian manifolds.
As an example for a problem of type (1.1), we detail our algorithm for the anisotropic and isotropic total variation with squared distance data term, i. e., the variants of the ROF model on Riemannian manifolds. After illustrating the correspondence to the Euclidean (classical) Chambolle–Pock algorithm, we compare the numerical performance of the RCPA to the CPPA and the PDRA. While the latter has only been shown to converge on Hadamard manifolds of constant curvature, it performs quite well on Hadamard manifolds in general. On the other hand, the CPPA is known to possibly converge arbitrarily slowly; even in the Euclidean case. We illustrate that our linearized algorithm competes with the PDRA, and it even performs favorably on manifolds with non-negative curvature, like the sphere.
The remainder of the paper is organized as follows. In Section 2 we recall a number of classical results from convex analysis in Hilbert spaces. In an eort to make the paper self-contained, we also briey state the required concepts from dierential geometry. Section 3 is devoted to the development of a complete notion of Fenchel conjugation for functions dened on manifolds. To this end, we extend some classical results from convex analysis and locally convex vector spaces to manifolds, like the Fenchel–Moreau Theorem (also known as the Biconjugation Theorem) and useful characterizations of the subdierential in terms of the conjugate function. In Section 4 we formulate the primal-dual hybrid gradient method (also referred to as the Riemannian Chambolle–Pock algorithm, RCPA) for
2020-10-12 cbna page 3 of 38 R. Bergmann, R. Herzog, M. Silva Louzeiro, D. Ten [...] Fenchel Duality on Manifolds general optimization problems on manifolds involving non-linear operators. We present an exact and a linearized formulation of this novel method and prove, under suitable assumptions, convergence for the linearized variant to a minimizer of a linearized problem on arbitrary Hadamard manifolds. As an application of our theory, Section 5 focuses on the analysis of several total variation models on manifolds. In Section 6 we carry out numerical experiments to illustrate the performance of our novel primal-dual algorithm. Finally, we give some conclusions and further remarks on future research in Section 7.
2 Preliminaries on Convex Analysis and Differential Geometry
In this section we review some well known results from convex analysis in Hilbert spaces as well as necessary concepts from dierential geometry. We also revisit the intersection of both topics, convex analysis on Riemannian manifolds, including its subdierential calculus.
2.1 Convex Analysis
In this subsection let 푓 : X → R, where R B R ∪ {±∞} denotes the extended real line and X is a ∗ Hilbert space with inner product (· , ·)X and duality pairing h· , ·iX∗,X, respectively. Here, X denotes the dual space of X. When the space X and its dual X∗ are clear from the context, we omit the space and just write (· , ·) and h· , ·i, respectively. For standard denitions like closedness, properness, lower semicontinuity (lsc) and convexity of 푓 we refer the reader, e. g., to the textbooks Rockafellar, 1970; Bauschke, Combettes, 2011.
Denition 2.1. The Fenchel conjugate of a function 푓 : X → R is dened as the function 푓 ∗ : X∗ → R such that 푓 ∗ (푥∗) B sup {h푥∗ , 푥i − 푓 (푥)} . (2.1) 푥 ∈X
We recall some properties of the classical Fenchel conjugate function in the following lemma.
Lemma 2.2 (Bauschke, Combettes, 2011, Ch. 13). Let 푓 ,푔 : X → R be proper functions, 훼 ∈ R, 휆 > 0 and 푏 ∈ X. Then the following statements hold.
(푖) 푓 ∗ is convex and lsc. (푖푖) If 푓 (푥) ≤ 푔(푥) for all 푥 ∈ X, then 푓 ∗ (푥∗) ≥ 푔∗ (푥∗) for all 푥∗ ∈ X∗. (푖푖푖) If 푔(푥) = 푓 (푥) + 훼 for all 푥 ∈ X, then 푔∗ (푥∗) = 푓 ∗ (푥∗) − 훼 for all 푥∗ ∈ X∗. (푖푣) If 푔(푥) = 휆푓 (푥) for all 푥 ∈ X, then 푔∗ (푥∗) = 휆푓 ∗ (푥∗/휆) for all 푥∗ ∈ X∗. (푣) If 푔(푥) = 푓 (푥 + 푏) for all 푥 ∈ X, then 푔∗ (푥∗) = 푓 ∗ (푥∗) − h푥∗ ,푏i for all 푥∗ ∈ X∗.
2020-10-12 cbna page 4 of 38 R. Bergmann, R. Herzog, M. Silva Louzeiro, D. Ten [...] Fenchel Duality on Manifolds
(푣푖) The Fenchel–Young inequality holds, i. e., for all (푥, 푥∗) ∈ X × X∗ we have h푥∗ , 푥i ≤ 푓 (푥) + 푓 ∗ (푥∗). (2.2)
4 푓 (푥) 1 푓 ∗(푥∗) 푓 ∗(푥∗) 2 푥∗ −4 −3 −2 −1 1 2 푥 -1 푥 1 2 ˆ ∗ ∗ −푓 (푥 ) −1 푓 (a) The function (solid) and the linear function of (b) The Fenchel conjugate 푓 ∗ of 푓 . 푥∗ = −4 (dashed) and its shifted tangent (dotted).
Figure 2.1: Illustration of the Fenchel conjugate for the case 푑 = 1 as an interpretation by the tangents of slope 푥∗.
The Fenchel conjugate of a function 푓 : R푑 → R can be interpreted as a maximum seeking problem on the epigraph epi 푓 B {(푥, 훼) ∈ R푑 × R | 푓 (푥) ≤ 훼}. For the case 푑 = 1 and some xed 푥∗ the conjugate maximizes the (signed) distance h푥∗ , 푥i − 푓 (푥) of the line of slope 푥∗ to 푓 . For instance, let us focus ∗ ∗ on the case 푥 = −4 highlighted in Fig. 2.1a. For the linear functional 푔푥∗ (푥) = h푥 , 푥i (dashed), the maximal distance is attained at 푥ˆ. We can nd the same value by considering the shifted functional ∗ ∗ ∗ ∗ ℎ푥∗ (푥) = 푔푥∗ (푥) − 푓 (푥 ) (dotted line) and its negative value at the origin, i. e., −ℎ푥∗ (0) = 푓 (푥 ). Furthermore ℎ푥∗ is actually tangent to 푓 at the aforementioned maximizer 푥ˆ. The function ℎ푥∗ also illustrates the shifting property from Lemma 2.2( 푣) and its linear oset −h푥∗ ,푏i. The overall plot of the Fenchel conjugate 푓 ∗ over an interval of values 푥∗ is shown in Fig. 2.1b.
We now recall some results related to the denition of the subdierential of a proper function.
Denition 2.3 (Bauschke, Combettes, 2011, Def. 16.1). Let 푓 : X → R be a proper function. Its subdierential is dened as 휕푓 (푥) B {푥∗ ∈ X∗ | 푓 (푧) ≥ 푓 (푥) + h푥∗ , 푧 − 푥i for all 푧 ∈ X} . (2.3)
Theorem 2.4 (Bauschke, Combettes, 2011, Prop. 16.9). Let 푓 : X → R be a proper function and 푥 ∈ X. Then 푥∗ ∈ 휕푓 (푥) holds if and only if 푓 (푥) + 푓 ∗ (푥∗) = h푥∗ , 푥i. (2.4)
Corollary 2.5 (Bauschke, Combettes, 2011, Thm. 16.23). Let 푓 : X → R be a lsc, proper, and convex function and 푥∗ ∈ X∗. Then 푥 ∈ 휕푓 ∗ (푥∗) holds if and only if 푥∗ ∈ 휕푓 (푥).
The Fenchel biconjugate 푓 ∗∗ : X → R of a function 푓 : X → R is given by 푓 ∗∗ (푥) = (푓 ∗)∗ (푥) = sup {h푥∗ , 푥i − 푓 ∗ (푥∗)} . (2.5) 푥∗ ∈X∗
2020-10-12 cbna page 5 of 38 R. Bergmann, R. Herzog, M. Silva Louzeiro, D. Ten [...] Fenchel Duality on Manifolds
Finally, we conclude this section with the following result known as the Fenchel–Moreau or Biconjugation Theorem.
Theorem 2.6 (Bauschke, Combettes, 2011, Thm. 13.32). Given a proper function 푓 : X → R, the equality 푓 ∗∗ (푥) = 푓 (푥) holds for all 푥 ∈ X if and only if 푓 is lsc and convex. In this case 푓 ∗ is proper as well.
2.2 Differential Geometry
This section is devoted to the collection of necessary concepts from dierential geometry. For details concerning the subsequent denitions, the reader may wish to consult do Carmo, 1992; Lee, 2003; Jost, 2017.
Suppose that M is a 푑-dimensional connected, smooth manifold. The tangent space at 푝 ∈ M is a vector space of dimension 푑 and it is denoted by T푝 M. Elements of T푝 M, i. e., tangent vectors, will be denoted by 푋푝 and 푌푝 etc. or simply 푋 and 푌 when the base point is clear from the context. The disjoint union of all tangent spaces, i. e., Ø TM B T푝 M, (2.6) 푝 ∈M is called the tangent bundle of M. It is a smooth manifold of dimension 2푑.
∗ 푝 The dual space of T푝 M is denoted by T푝 M and it is called the cotangent space to M at . The disjoint union ∗ Ø ∗ T M B T푝 M (2.7) 푝 ∈M ∗ 푝 is known as the cotangent bundle. Elements of T푝 M are called cotangent vectors to M at and they will be denoted by 휉푝 and 휂푝 or simply 휉 and 휂. The natural duality product between 푋 ∈ T푝 M and 휉 ∗ 휉 , 푋 휉 푋 ∈ T푝 M is denoted by h i = ( ) ∈ R.
We suppose that M is equipped with a Riemannian metric, i. e., a smoothly varying family of inner products on the tangent spaces T푝 M. The metric at 푝 ∈ M is denoted by (· , ·)푝 : T푝 M × T푝 M → R. The induced norm on T푝 M is denoted by k·k푝 . The Riemannian metric furnishes a linear bijective correspondence between the tangent and cotangent spaces via the Riesz map and its inverse, the so-called musical isomorphisms; see Lee, 2003, Ch. 8. They are dened as 푋 푋 ♭ ∗ ♭: T푝 M 3 ↦→ ∈ T푝 M (2.8) satisfying ♭ h푋 , 푌i = (푋 , 푌)푝 for all 푌 ∈ T푝 M, (2.9) and its inverse, ∗ 휉 휉♯ ♯ : T푝 M 3 ↦→ ∈ T푝 M (2.10) satisfying ♯ (휉 , 푌 )푝 = h휉 , 푌i for all 푌 ∈ T푝 M. (2.11)
2020-10-12 cbna page 6 of 38 R. Bergmann, R. Herzog, M. Silva Louzeiro, D. Ten [...] Fenchel Duality on Manifolds
The ♯-isomorphism further introduces an inner product and an associated norm on the cotangent ∗ , space T푝 M, which we will also denote by (· ·)푝 and k·k푝 , since it is clear which inner product or norm we refer to based on the respective arguments.
The tangent vector of a curve 푐 : 퐼 → M dened on some open interval 퐼 is denoted by 푐¤(푡). A curve is said to be geodesic if the directional (covariant) derivative of its tangent in the direction of the tangent vanishes, i. e., if ∇푐¤(푡)푐¤(푡) = 0 holds for all 푡 ∈ 퐼, where ∇ denotes the Levi-Cevita connection, cf. do Carmo, 1992, Ch. 2 or Lee, 2018, Thm. 4.24. As a consequence, geodesic curves have constant speed.
We say that a geodesic connects 푝 to 푞 if 푐(0) = 푝 and 푐(1) = 푞 holds. Notice that a geodesic connecting 푝 to 푞 need not always exist, and if it exists, it need not be unique. If a geodesic connecting 푝 to 푞 exists, there also exists a shortest geodesic among them, which may in turn not be unique. If it is, we 푝 푞 훾 denote the unique shortest geodesic connecting and by 푝,푞.
N Using the length of piecewise smooth curves, one can introduce a notion of metric (also known as 푑 , Riemannian distance) M (· ·) on M; see for instance Lee, 2018, Ch. 2, pp.33–39. As usual, we denote by 푝 푦 푑 푝, 푞 푟 B푟 ( ) B { ∈ M | M ( ) < } (2.12) 푟 푝 푝 Ð 푝 the open metric ball of radius > 0 with center ∈ M. Moreover, we dene B∞ ( ) = 푟 >0 B푟 ( ).
We denote by 훾푝,푋 : 퐼 → M, with 퐼 ⊂ R being an open interval containing 0, a geodesic starting at 푝 with 훾¤푝,푋 (0) = 푋 for some 푋 ∈ T푝 M. We denote the subset of T푝 M for which these geodesics are well dened until 푡 = 1 by G푝 . A Riemannian manifold M is said to be complete if G푝 = T푝 M holds for some, and equivalently for all 푝 ∈ M.
푋 훾 The exponential map is dened as the function exp푝 : G푝 → M with exp푝 B 푝,푋 (1). Note 푡푋 훾 푡 푡 , 0 that exp푝 ( ) = 푝,푋 ( ) holds for every ∈ [0 1]. We further introduce the set G푝 ⊂ T푝 M as some open 푟 0 0 ball of radius 0 < ≤ ∞ about the origin such that exp푝 : G푝 → exp푝 (G푝 ) is a dieomorphism. The 0 0 logarithmic map is dened as the inverse of the exponential map, i. e., log푝 : exp푝 (G푝 ) → G푝 ⊂ T푝 M.
In the particular case where the sectional curvature of the manifold is nonpositive everywhere, all geodesics connecting any two distinct points are unique. If furthermore, the manifold is simply connected and complete, the manifold is called a Hadamard manifold, see Bačák, 2014b, p.10. Then the exponential and logarithmic maps are dened globally.
Given 푝, 푞 ∈ M and 푋 ∈ T푝 M, we denote by P푞←푝푋 the so-called parallel transport of 푋 along a 훾 unique shortest geodesic 푝,푞. Using the musical isomorphisms presented above, we also have a parallel transport of cotangent vectors along geodesics according to N 휉 휉♯ ♭. P푞←푝 푝 B P푞←푝 푝 (2.13)
푑 푑 푑 Finally, by a Euclidean space we mean R (where T푝 R = R holds), equipped with the Riemannian 푋 푝 푋 푞 푞 푝 metric given by the Euclidean inner product. In this case, exp푝 = + and log푝 = − hold.
2020-10-12 cbna page 7 of 38 R. Bergmann, R. Herzog, M. Silva Louzeiro, D. Ten [...] Fenchel Duality on Manifolds
2.3 Convex Analysis on Riemannian Manifolds
Throughout this subsection, M is assumed to be a complete and connected Riemannian manifold and we are going to recall the basic concepts of convex analysis on M. The central idea is to replace straight lines in the denition of convex sets in Euclidean vector spaces by geodesics.
Denition 2.7 (Sakai, 1996, Def. IV.5.1). A subset C ⊂ M of a Riemannian manifold M is said to be strongly convex if for any two points 푝, 푞 ∈ C, there exists a unique shortest geodesic of M connecting 푝 푞 훾 to , and that geodesic, denoted by 푝,푞, lies completely in C.
N On non-Hadamard manifolds, the notion of strongly convex subsets can be quite restrictive. For 푛 instance, on the round sphere S with 푛 ≥ 1, a metric ball B푟 (푝) is strongly convex if and only if 푟 < 휋/2.
Denition 2.8. Let C ⊂ M and 푝 ∈ C. We introduce the tangent subset LC,푝 ⊂ T푝 M as n o L 푋 ∈ T M 푋 ∈ C k푋 k = 푑 푋, 푝 , C,푝 B 푝 exp푝 and 푝 M exp푝 a localized variant of the pre-image of the exponential map.
Note that if C is strongly convex, the exponential and logarithmic maps introduce bijections between C and LC,푝 for any 푝 ∈ C. In particular, on a Hadamard manifold M, we have LM,푝 = T푝 M.
The following denition states the important concept of convex functions on Riemannian manifolds.
Denition 2.9 (Sakai, 1996, Def. IV.5.9).
(푖) A function 퐹 : M → R is proper if dom 퐹 B {푝 ∈ M | 퐹 (푝) < ∞} ≠ ∅ and 퐹 (푝) > −∞ holds for all 푝 ∈ M. (푖푖) Suppose that C ⊂ M is strongly convex. A function 퐹 : M → R is called geodesically convex 푝, 푞 퐹 훾 , on C ⊂ M if, for all ∈ C, the composition ◦ 푝,푞 is a convex function on [0 1] in the classical sense. Similarly, 퐹 is called strictly or strongly convex if 퐹 ◦ 훾 fullls these properties. N 푝,푞 (푖푖푖) Suppose that 퐴 ⊂ M. The epigraph of a function 퐹 : 퐴 → R isN dened as
epi 퐹 B {(푝, 훼) ∈ 퐴 × R | 퐹 (푝) ≤ 훼}. (2.14)
(푖푣) Suppose that 퐴 ⊂ M. A proper function 퐹 : 퐴 → R is called lower semicontinuous (lsc) if epi 퐹 is closed.
Suppose that C ⊂ M is strongly convex and 퐹 : C → R, then an equivalent way to describe its lower semicontinuity (item (푖푣)) is to require that the composition 퐹 ◦ exp푚 : LC,푚 → R (2.15)
2020-10-12 cbna page 8 of 38 R. Bergmann, R. Herzog, M. Silva Louzeiro, D. Ten [...] Fenchel Duality on Manifolds
is lsc for an arbitrary 푚 ∈ C in the classical sense, where LC,푚 is dened in Denition 2.8.
We now recall the notion of the subdierential of a geodesically convex function dened on a Riemannian manifold.
Denition 2.10 (Ferreira, Oliveira, 1998, Udrişte, 1994, Def. 3.4.4). Suppose that C ⊂ M is strongly convex. The subdierential 휕M퐹 on C at a point 푝 ∈ C of a proper, geodesically convex function 퐹 : C → R is given by n o 휕 퐹 (푝) 휉 ∈ T ∗M 퐹 (푞) ≥ 퐹 (푝) + h휉 , 푞i 푞 ∈ C . M B 푝 log푝 for all (2.16)
In the above notation, the index M refers to the fact that it is the Riemannian subdierential; the set C should always be clear from the context.
We further recall the denition of the proximal map, which was generalized to Hadamard manifolds in Ferreira, Oliveira, 2002.
Denition 2.11. Let M be a Riemannian manifold, 퐹 : M → R be proper, and 휆 > 0. The proximal map of 퐹 is dened as 푝 1푑2 푝, 푞 휆 퐹 푞 . prox휆 퐹 ( ) B Arg min M ( ) + ( ) (2.17) 푞∈M 2
Note that on Hadamard manifolds, the proximal map is single-valued for proper geodesically convex functions; see Bačák, 2014b, Ch. 2.2 or Ferreira, Oliveira, 2002, Lem. 4.2 for details. The following lemma is used later on to characterize the proximal map using the subdierential on Hadamard manifolds.
Lemma 2.12 (Ferreira, Oliveira, 2002, Lem. 4.2). Let 퐹 : M → R be a proper, geodesically convex 푞 푝 function on the Hadamard manifold M. Then the equality = prox휆 퐹 is equivalent to 1 푝♭ ∈ 휕 퐹 (푞). 휆 log푞 M (2.18)
3 Fenchel Conjugation Scheme on Manifolds
In this section we present a novel Fenchel conjugation scheme for extended real-valued functions dened on manifolds. We generalize ideas from Bertsekas, 1978, who dened local conjugation on manifolds embedded in R푑 specied by nonlinear equality constraints.
Throughout this section, suppose that M is a Riemannian manifold and C ⊂ M is strongly convex. The denition of the Fenchel conjugate of 퐹 is motivated by Rockafellar, 1970, Thm. 12.1.
2020-10-12 cbna page 9 of 38 R. Bergmann, R. Herzog, M. Silva Louzeiro, D. Ten [...] Fenchel Duality on Manifolds
Denition 3.1. Suppose that 퐹 : C → R, where C ⊂ M is strongly convex, and 푚 ∈ C. The 푚-Fenchel 퐹 퐹 ∗ ∗ conjugate of is dened as the function 푚 : T푚 M → R such that 퐹 ∗ 휉 휉 , 푋 퐹 푋 , 휉 ∗ . 푚 ( 푚) B sup h 푚 i − (exp푚 ) 푚 ∈ T푚 M (3.1) 푋 ∈LC,푚
퐹 ∗ Remark 3.2. Note that the Fenchel conjugate 푚 depends on both the strongly convex set C and on the base point 푚. Observe as well that when M is a Hadamard manifold, it is possible to have C = M. In the particular case of the Euclidean space C = M = R푑 , Denition 3.1 becomes 퐹 ∗ 휉 휉 , 푋 퐹 푚 푋 휉 , 푌 푚 퐹 푌 퐹 ∗ 휉 휉 ,푚 푚 ( ) = sup {h i − ( + )} = sup {h − i − ( )} = ( ) − h i 푋 ∈R푑 푌 ∈R푑 for 휉 ∈ R푑 . Hence, taking 푚 to be the zero vector we recover the classical (Euclidean) conjugate 퐹 ∗ from Denition 2.1 with X = R푛.
M 푚 ∈ M 퐹 M → 퐹 (푝) 1 푑2 (푝,푚) Example 3.3. Let be a Hadamard manifold, and : R dened as = 2 M . Due to the fact that 1 1 퐹 (푝) = 푑2 (푝,푚) = klog 푝k2 , 2 M 2 푚 푚 we obtain from Denition 3.1 the following representation of the 푚-conjugate of 퐹: n o 퐹 ∗ 휉 휉 , 푋 1 푋 2 푚 ( 푚) = sup h 푚 i − klog푚 exp푚 k푚 푋 ∈T푚 M 2 n o 휉 , 푋 1 푋 2 1 휉 2 . = sup h 푚 i − k k푚 = k 푚 k푚 푋 ∈T푚 M 2 2 Notice that the conjugate w.r.t. base points other than 푚 does not have a similarly simple expression. In M 푑 퐹 (푝) 1 k푝 − 푚k2 the Euclidean setting with = R and = 2 , it is well known that 1 1 퐹 ∗ (휉) = 퐹 ∗ (휉) = k휉 + 푚k2 − k푚k2 0 2 2 holds and thus, by Remark 3.2, 1 퐹 ∗ (휉) = 퐹 ∗ (휉) − h휉 ,푚i = k휉 k2 푚 2 holds in accordance with the expression obtained above.
We now establish a result regarding the properness of the 푚-conjugate function, generalizing a result from Bauschke, Combettes, 2011, Prop. 13.9.
퐹 푚 퐹 ∗ 퐹 Lemma 3.4. Suppose that : C → R and ∈ C where C is strongly convex. If 푚 is proper, then is also proper.
2020-10-12 cbna page 10 of 38 R. Bergmann, R. Herzog, M. Silva Louzeiro, D. Ten [...] Fenchel Duality on Manifolds
퐹 ∗ 휉 퐹 ∗ Proof. Since 푚 is proper we can pick some 푚 ∈ dom 푚. Hence, applying Denition 3.1 we get 퐹 ∗ 휉 휉 , 푋 퐹 푋 , 푚 ( 푚) = sup h 푚 i − (exp푚 ) < +∞ 푋 ∈LC,푚 푋¯ 퐹 푋¯ 퐹 so there must exist at least one ∈ LC,푚 such that (exp푚 ) ∈ R. This shows that . +∞. On the 푝 푋 푝 퐹 푝 퐹 ∗ 휉 other hand, let ∈ C and take B log푚 . If ( ) were equal to −∞, then 푚 ( 푚) = +∞ for any 휉 ∗ 퐹 ∗ 퐹 푚 ∈ T푚 M, which would contradict the properness of 푚. Consequently, is proper.
Denition 3.5. Suppose that 퐹 : C → R, where C is strongly convex, and 푚,푚0 ∈ C. Then the (푚푚0)- 퐹 ∗∗ Fenchel biconjugate function 푚푚0 : C → R is dened as 퐹 ∗∗ 푝 휉 , 푝 퐹 ∗ 휉 , 푝 . 푚푚0 ( ) = sup h 푚0 log푚0 i − 푚 (P푚←푚0 푚0 ) ∈ C (3.2) 휉 0 ∈T∗ M 푚 푚0
퐹 ∗∗ 퐹 ∗∗ Note that 푚푚0 is again a function dened on the Riemannian manifold. The relation between 푚푚 and 퐹 is discussed further below, as well as properties of higher order conjugates.
퐹 푚 퐹 ∗∗ 푝 퐹 푝 푝 Lemma 3.6. Suppose that : C → R and ∈ C. Then 푚푚 ( ) ≤ ( ) holds for all ∈ C.
Proof. Applying (3.2), we have
퐹 ∗∗ 푝 휉 , 푝 퐹 ∗ 휉 푚푚 ( ) = sup h 푚 log푚 i − 푚 ( 푚) ∗ 휉푚 ∈T푚 M n o 휉 , 푝 휉 , 푋 퐹 푋 = sup h 푚 log푚 i − sup h 푚 i − (exp푚 ) ∗ 휉푚 ∈T푚 M 푋 ∈LC,푚 n o 휉 , 푝 휉 , 푋 퐹 푋 = sup h 푚 log푚 i + inf −h 푚 i + (exp푚 ) ∗ 푋 ∈LC,푚 휉푚 ∈T푚 M 휉 , 푝 휉 , 푝 퐹 푝 ≤ sup h 푚 log푚 i − h 푚 log푚 i + exp푚 log푚 ∗ 휉푚 ∈T푚 M = 퐹 (푝), which nishes the proof.
The following lemma proves that our denition of the Fenchel conjugate enjoys properties( 푖푖)–(푖푣) stated in Lemma 2.2 for the classical denition of the conjugate on a Hilbert space. Results parallel to properties( 푖) and( 푣푖) in Lemma 2.2 will be given in Lemma 3.12 and Proposition 3.9, respectively. Observe that an analogue of property( 푣) in Lemma 2.2 cannot be expected for 퐹 : M → R due to the lack of a concept of linearity on manifolds.
Lemma 3.7. Suppose that C ⊂ M is strongly convex. Let 퐹,퐺 : C → R be proper functions, 푚 ∈ C, 훼 ∈ R and 휆 > 0. Then the following statements hold.
푖 퐹 푝 퐺 푝 푝 퐹 ∗ 휉 퐺∗ 휉 휉 ∗ ( ) If ( ) ≤ ( ) for all ∈ C, then 푚 ( 푚) ≥ 푚 ( 푚) for all 푚 ∈ T푚 M.
2020-10-12 cbna page 11 of 38 R. Bergmann, R. Herzog, M. Silva Louzeiro, D. Ten [...] Fenchel Duality on Manifolds
푖푖 퐺 푝 퐹 푝 훼 푝 퐺∗ 휉 퐹 ∗ 휉 훼 휉 ∗ ( ) If ( ) = ( ) + for all ∈ C, then 푚 ( 푚) = 푚 ( 푚) − for all 푚 ∈ T푚 M. 푖푖푖) 퐺 (푝) 휆 퐹 (푝) 푝 ∈ C 퐺∗ (휉 ) 휆 퐹 ∗ 휉푚 휉 ∈ T ∗M ( If = for all , then 푚 푚 = 푚 휆 for all 푚 푚 .
퐹 푝 퐺 푝 푝 퐹 푋 퐺 푋 푋 Proof. If ( ) ≤ ( ) for all ∈ C, then it also holds (exp푚 ) ≤ (exp푚 ) for every ∈ LC,푚. 휉 ∗ Then we have for any 푚 ∈ T푚 M that 퐹 ∗ 휉 휉 , 푋 퐹 푋 푚 ( 푚) = sup h 푚 i − (exp푚 ) 푋 ∈LC,푚 휉 , 푋 퐺 푋 퐺∗ 휉 . ≥ sup h 푚 i − (exp푚 ) = 푚 ( 푚) 푋 ∈LC,푚
This shows( 푖). Similarly, we prove( 푖푖): let us suppose that 퐺 (푝) = 퐹 (푝) + 훼 for all 푝 ∈ C. Then 퐺 푋 퐹 푋 훼 푋 휉 ∗ (exp푚 ) = (exp푚 ) + for every ∈ LC,푚. Hence, for any 푚 ∈ T푚 M we obtain 퐺∗ 휉 휉 , 푋 퐺 푋 푚 ( 푚) = sup h 푚 i − (exp푚 ) 푋 ∈LC,푚 휉 , 푋 퐹 푋 훼 = sup h 푚 i − ( (exp푚 ) + ) 푋 ∈LC,푚 휉 , 푋 퐹 푋 훼 퐹 ∗ 휉 훼. = sup {h 푚 i − (exp푚 )} − = 푚 ( 푚) − 푋 ∈LC,푚 푖푖푖 휆 퐺 푋 휆 퐹 푋 푋 Let us now prove( ) and suppose that > 0 and (exp푚 ) = (exp푚 ) for all ∈ LC,푚. Then 휉 ∗ we have for any 푚 ∈ T푚 M that 퐺∗ 휉 휉 , 푋 퐺 푋 푚 ( 푚) = sup {h 푚 i − (exp푚 )} 푋 ∈LC,푚 휉 , 푋 휆 퐹 푋 = sup h 푚 i − (exp푚 ) 푋 ∈LC,푚 휆 h휆−1휉 , 푋i − 퐹 ( 푋) 휆 퐹 ∗ 휉푚 . = sup 푚 exp푚 = 푚 휆 푋 ∈LC,푚
Suppose that 퐹 : C → R, where C is strongly convex, and 푚,푚0,푚00 ∈ C. The following proposition 퐹 ∗∗∗ ∗ 퐹 addresses the triconjugate 푚푚0푚00 : T푚00 M → R of , which we dene as 퐹 ∗∗∗ 퐹 ∗∗ ∗ . 푚푚0푚00 B ( 푚푚0 )푚00 (3.3)
Proposition 3.8. Suppose that M is a Hadamard manifold, 푚 ∈ M and 퐹 : M → R. Then the following holds: 퐹 ∗∗∗ 퐹 ∗∗ ∗ 퐹 ∗ ∗∗ 퐹 ∗ ∗ . 푚푚푚 = ( 푚푚)푚 = ( 푚) = 푚 on T푚 M (3.4)
Proof. Using Denitions 2.1, 3.1 and 3.5, it is easy to see that
퐹 ∗ ∗ 푝 퐹 ∗∗ 푝 ( 푚) (log푚 ) = 푚푚 ( )
2020-10-12 cbna page 12 of 38 R. Bergmann, R. Herzog, M. Silva Louzeiro, D. Ten [...] Fenchel Duality on Manifolds
푝 holds for all in M. Now (3.3), Denition 3.1, and the bijectivity of exp푚 and log푚 imply that 퐹 ∗∗∗ 휉 퐹 ∗∗ ∗ 휉 , 푋 퐹 ∗∗ 푋 푚푚푚 ( 푚) = ( 푚푚)푚 = sup h 푚 i − 푚푚 (exp푚 ) 푋 ∈T푚 M 휉 , 푝 퐹 ∗∗ 푝 = sup h 푚 log푚 i − 푚푚 ( ) 푝 ∈M 휉 , 푝 퐹 ∗ ∗ 푝 = sup h 푚 log푚 i − ( 푚) (log푚 ) 푝 ∈M 휉 ∗ 푓 퐹 holds for all 푚 ∈ T푚 M. We now set 푚 B ◦ exp푚 and use Denitions 2.1 and 3.1 to infer that 퐹 ∗ 휉 휉 , 푋 퐹 푋 푓 ∗ 휉 푚 ( 푚) = sup h 푚 i − (exp푚 ) = 푚 ( 푚) 푋 ∈T푚 M 휉 ∗ holds for all 푚 ∈ T푚 M. Consequently, we obtain 퐹 ∗∗∗ 휉 휉 , 푝 푓 ∗∗ 푝 푚푚푚 ( 푚) = sup h 푚 log푚 i − 푚 (log푚 ) 푝 ∈M 휉 , 푋 푓 ∗∗ 푋 = sup h 푚 i − 푚 ( ) 푋 ∈T푚 M 푓 ∗∗∗ 휉 . = 푚 ( 푚) 푓 ∗∗∗ 푓 ∗ According to Bauschke, Combettes, 2011, Prop. 13.14 (iii), we have 푚 = 푚. Collecting all equalities conrms (3.4).
The following is the analogue of item (푣푖) in Lemma 2.2.
Proposition 3.9 (Fenchel–Young inequality). Suppose that C ⊂ M is strongly convex. Let 퐹 : C → R be proper and 푚 ∈ C. Then 퐹 푝 퐹 ∗ 휉 휉 , 푝 ( ) + 푚 ( 푚) ≥ h 푚 log푚 i (3.5) 푝 휉 ∗ holds for all ∈ C and 푚 ∈ T푚 M.
Proof. If 퐹 (푝) = ∞ the inequality trivially holds, since 퐹 is proper and hence 퐹 ∗ is nowhere −∞. It 퐹 푝 휉 ∗ 푝 푋 푝 remains to consider ( ) < ∞. Suppose that 푚 ∈ T푚 M, ∈ C and set B log푚 . From Denition 3.1 we obtain 퐹 ∗ 휉 휉 , 푝 퐹 푝, 푚 ( 푚) ≥ h 푚 log푚 i − exp푚 log푚 which is equivalent to (3.5).
We continue by introducing the manifold counterpart of the Fenchel–Moreau Theorem, compare Theorem 2.6. Given a set C ⊂ M, 푚 ∈ C and a function 퐹 : C → R, we dene 푓푚 : T푚M → R by (퐹 푋 , 푋 , (exp푚 ) ∈ LC,푚 푓푚 (푋) = (3.6) +∞, 푋 ∉ LC,푚.
Throughout this section, the convexity of the function 푓푚 : T푚M → R is the usual convexity on the vector space T푚M, i. e., for all 푋, 푌 ∈ T푚M and 휆 ∈ [0, 1] it holds 푓푚 (1 − 휆)푋 + 휆푌 ≤ (1 − 휆)푓푚 (푋) + 휆푓푚 (푌). (3.7)
2020-10-12 cbna page 13 of 38 R. Bergmann, R. Herzog, M. Silva Louzeiro, D. Ten [...] Fenchel Duality on Manifolds
We present two examples of functions 퐹 : M → R dened on Hadamard manifolds such that 푓푚 is convex. In the rst example, 퐹 depends on an arbitrary xed point 푚0 ∈ M. In this case, we can 0 guarantee that 푓푚 is convex only when 푚 = 푚 . In the second example, 퐹 is dened on a particular Hadamard manifold and 푓푚 is convex for any base point 푚 ∈ M. It is worth emphasizing that the functions in the following examples are geodesically convex as well but in general, the convexity of 퐹 and 푓푚 are unrelated and all four cases can occur.
0 Example 3.10. Let M be any Hadamard manifold and 푚 ∈ M arbitrary. Consider the function 푓푚0 퐹 퐹 푝 푑 푚0, 푝 푝 dened in (3.6) with : M → R given by ( ) = M ( ) for all ∈ M. Note that 푓 푋 퐹 푋 푑 푚0, 푋 푋 푋 . 푚0 ( ) = (exp푚0 ) = M exp푚0 = k k푚0 for all ∈ T푚0 M
Hence, it is easy to see that 푓푚0 satises (3.7) and, consequently, it is convex on T푚0 M.
Our second example is slightly more involved. A problem involving the special case 푎 = 0 and 푏 = 1 appears in the dextrous hand grasping problem in Dirr, Helmke, Lageman, 2007, Sect. 3.4.
Example 3.11. Denote by P(푛) the set of symmetric matrices of size 푛 × 푛 for some 푛 ∈ N, and by M = P+ (푛) the cone of symmetric positive denite matrices. The latter is endowed with the ane invariant Riemannian metric, given by
−1 −1 (푋 , 푌)푝 B trace(푋푝 푌푝 ) for 푝 ∈ M and 푋, 푌 ∈ T푝 M. (3.8)
The tangent space T푝 M can be identied with P(푛). M is a Hadamard manifold, see for example Lang, 1999, Thm. 1.2, p. 325. The exponential map exp푝 : T푝 M → M is given by 푋 푝1/2 (푝−1/2푋푝−1/2)푝1/2 푝, 푋 . exp푝 = e for ( ) ∈ T M (3.9) Consider the function 퐹 : M → R, dened by 퐹 (푝) = 푎 ln2 (det 푝) − 푏 ln(det 푝), (3.10) where 푎 ≥ 0 and 푏 ∈ R are constants. Using (3.9) and properties of det: P(푛) → R, we have 푋 (푚−1/2푋푚−1/2) 푚 det(exp푚 ) = det e det −1/2 −1/2 −1 = etrace(푚 푋푚 ) det푚 = etrace(푚 푋 ) det푚, 푚 푓 푋 퐹 푋 for any ∈ M. Hence, considering 푚 ( ) = (exp푚 ), we obtain
2 −1 −1 2 푓푚 (푋) = 푎 trace (푚 푋) + 2푎 trace(푚 푋) ln(det푚) + 푎 ln (det푚) − 푏 trace(푚−1푋) − 푏 ln(det푚), for any 푚 ∈ M. The Euclidean gradient and Hessian of 푓푚 are given by 푓 0 푋 푎 푚−1푋 푚−1 푎 푚 푚−1 푏푚−1, 푚 ( ) = 2 trace( ) + 2 ln(det ) − 푓 00 푋 푌, 푎 푚−1푌 푚−1, 푚 ( )( ·) = 2 trace( ) 푋, 푌 푛 푓 00 푋 푌, 푌 푎 2 푚−1푌 respectively, for all ∈ P( ). Hence 푚 ( )( ) = 2 trace ( ) ≥ 0 holds. Thus, the function 푓푚 is convex for any 푚 ∈ M. From Ferreira, Louzeiro, Prudente, 2019, Ex. 4.4 we can conclude that (3.10) is also geodesically convex.
2020-10-12 cbna page 14 of 38 R. Bergmann, R. Herzog, M. Silva Louzeiro, D. Ten [...] Fenchel Duality on Manifolds