arXiv:2102.05134v2 [math.OC] 18 Feb 2021 htflos hs ag ucinidcste norm. a then symme induces centrally function consider gauge whose only follows, will what we again, simplicity For ucinof function yisui alsuiomcneiy hc oal rvst Gauges. drives notably [ which inequalities convexity, concentration several uniform induces ball’s the instance, unit For fields. its many in by role central a plays and set, [ resul two only knowledge, our to that, is work arguab our is of This tivation covered. gro characterizat sparsely recent equivalent only the survey, are Despite briefly opti conditions of now weaker explored. set we less feasible which much the ture, been on has conditions but similar icant of That stood. KBGY20 s various in LR15 applications with learning, machine in notably ovxasmtoso h osritstt describe to set constraint the of assumptions convex [ in introduced sets gauge of tion ataon fltrtr rudlclzdpoete fob of properties algorith localized on [ around properties impact literature significant of a amount vast have properties local these aeasgicn mato rbe opeiy[ complexity problem on impact significant a have BDLM10 § ‡ ∗ † mor to convexity strong generalizes (UC) convexity Uniform of convexity uniform or convexity strong of impact the While togcneiyo nfr ovxt rpriso h obj the of properties convexity uniform or convexity Strong D.I. 8548. UMR CNRS Germany. Berlin, Institute, Zuse ehiceUiesta,Bri,Germany. Universit¨at, Berlin, Technische , MSJ cl oml u´rer,Prs France. Sup´erieure, Paris, Normale Ecole ´ ihsm rcia xmlsi piiainadmciele results. machine complexity and new to optimization lead in assumptions localized examples practical wo this some and with counterparts, functional their con as o uniform thoroughly or convexity as convexity strong strong the these complexity, on on rely impact which co of optimization, body offline recent a or on insights methods. sharper provide optimization to of conditions convergence the analyzing ge in the from used stemming results connect and spaces dimensional A , o ipiiy efcshr ncmatcne sets convex compact on here focus we simplicity, For BSTRACT , FKT20 BDL07 C ABRS10 + HMSKERDREUX THOMAS 15 rvdsacrepnec ewe esadnr-iefunct norm-like and sets between correspondence a provides , OA N LBLUIOMCNEIYCONDITIONS CONVEXITY UNIFORM GLOBAL AND LOCAL SFM erve aiu hrceiain fuiomconvexity uniform of characterizations various review We . ,gm hoy[ theory game ], o ntne eeae ntecnegneaaye ffirs of analyses convergence the in leveraged instance, for ] , BNPS17 + 17 , Sti18 , Rd20 † ,dfeeta rvc [ privacy differential ], , DH19 ∗ ALLW18 LXNR D’ASPREMONT ALEXANDRE , , k KdP19 x , k LS19 C Pis75 seuvln osrn ovxt [ convexity strong to equivalent is ] .I 1. , , , Ker20 inf NTRODUCTION ALW19 , Pin94 Nes15 λ ]. 1 ≥ global ki nefr ocretti maac.W conclude We imbalance. this correct to effort an is rk , 0 , IN14 methods, order first by exploited heavily are and ] TTZ14 MOP20 peiyrslsi erigter,oln learning, online theory, learning in results mplexity | eiypoete ffail esaentexploited not are sets feasible of properties vexity tig uha itiue piiain[ optimization distributed as such ettings mtyo aahsae with spaces Banach of ometry npriua,w sals oa esoso these of versions local establish we particular, In h esbest hl hyhv significant a have they While set. feasible the f emtyo aahsaei ral influenced greatly is space Banach a of geometry x etv ucin,sc sKurdyka-Łojasiewicz as such functions, jective rigweelvrgn hs odtosand conditions these leveraging where arning efrac.Ti ssrrsn ie the given surprising is This performance. m ecnegnebhvo fmrigls and martingales, of behavior convergence he ∈ ahn erigpolmcmlxt,while complexity, problem learning machine oso togcneiyo esadrelated and sets of convexity strong of ions rccne oiswt oepyitro in interior nonempty with bodies convex tric ]. signif- as just priori a is problems mization rcsl uniytecraueo convex a of curvature the quantify precisely e ylaigt oeconfusion, some to leading ly s[ ts ciefnto fa piiainproblem optimization an of function ective λ ‡ , , C igltrtr eeaigsc e struc- set such leveraging literature wing § C ]. ZZMW17 N EATA POKUTTA SEBASTIAN AND ,
nfiiedmninlsae.Tegauge The spaces. finite-dimensional in Dun79 n moheso omblsi finite- in balls norm on smoothness and . h betv ucini elunder- well is function objective the , os[ ions KdP20 , CLK19 -re piiainmethods optimization t-order Roc70 Mol20 consider ] cln inequalities scaling , n sdfie as defined is and ] INS .Aohrkymo- key Another ]. + 19 † local , ∗ , e.g. BFTGT19 h no- the , JST (Gauge) strongly + 14 , , Uniformly Convex Sets in Optimization. Some feasible set structures lead to accelerated convergence rates for first-order algorithms, e.g., projection-free algorithms. Conditional gradients, a.k.a. Frank-Wolfe (FW) algorithms, are known to enjoy accelerated convergence rates compared to the O(1/T ) baseline when the set is globally strongly convex [Pol66, DR70, GH15]. However, to our knowledge, only two results in machine learning consider local strong convexity assumptions on the feasible set. [Dun79] proposes a geometrical condition on a given point x∗ ∈ ∂C ensuring accelerated convergence rates for Frank-Wolfe algorithms and [KdP20] then show that this assumption is equivalent to local strong convexity and further generalizes all existing accelerated Frank-Wolfe regimes to hold also on locally uniformly convex sets. Other projection-free algorithms exist with improved guarantees on strongly convex sets, e.g., for non- convex optimization [RBWM19], min-max problems [GJLJ17, WA18] or approximate Carath´eodory results [CP19]. The various equivalent definitions of strongly convex sets have also stimulated an interest in de- signing and analysing affine-invariant first-order methods. For instance, [dGJ18] proposed a choice of norm and prox-function in the implementation of first-order accelerated methods from [Nes05] which make these methods affine-invariant and provably optimal for optimization problems constrained on uniformly convex ℓp balls with p > 1. [KLLJS20] proposed an optimal (w.r.t. known analyses) affine-invariant analysis of the affine-covariant Frank-Wolfe algorithm on strongly convex sets. Their analysis rely on assumptions that combine scaling inequalities for strongly convex feasible sets and an affine-invariant characterization of smoothness [Jag13]. Finally, strong convexity for sets was also used outside of projection-free optimization techniques in, e.g., [VV20, Bac20].
Uniformly Convex Sets in Machine Learning. The global strong convexity of sets also characterizes performance in learning theory and online learning. [HLGS16, HLGS17] studied logarithmic regret bounds of simple algorithms for online linear learning on smooth strongly convex decisions sets. [Mol20, KdP20] later extended these results to non-smooth and uniformly convex sets. [AYAS09, RT10] considered such assumptions of the constraint set for stochastic linear bandits and [AR09, BCL18] for non-stochastic linear bandits. The global uniform convexity of the decision set has recently attracted much attention in “online learning with a hint”, which is a multiplicative version of optimistic online learning. In this framework, regret bounds are obtained in terms of the uniform convexity power type of the decision set [DFHJ17, BCKP20a, BCKP20b]. [KST09] studied generalization bounds of low-norm linear classes. They obtain upper bounds on the Rademacher constant of the hypothesis class that depend on the strong convexity of the norm regularizing the class. However, they expressed these results in terms of the functional strong convexity of the square of the norm. In Section 6.2, we recall that this result is a quantitative corollary of known results in the geometrical study of Banach spaces: a uniformly convex space has a non-trivial Rademacher type. [EBEGT19] also consider global strong convexity of the feasible region to strengthen convergence results in generalization bounds in the Predict-Then-Optimize framework. They notably rely on a characterization of strong convexity akin to scaling inequalities covered in (b) of Theorems 4.1-5.1. In online learning on Banach spaces, several works analyse regret bounds in terms of the martingale type/cotype of the space [ST10, SST11], a property directly tied with uniform convexity. In fact, [ST10, SST11] relies on the fact that the martingale type of a space is related to the existence of a uniformly convex function on this space, see [ST10, Theorem 1]. Besides, as we recall in Section 6.2, a uniformly convex space has also a Rademacher type (the reverse might not be true), a notion related to the martingale type. This martingale type structure has been leveraged in various applications in learning [Sch16, KCd17] as it is a central tool to derive concentration inequalities [Pis75, Pin94, Pis11]. However, our main focus here remains on uniform convexity as it has a simple geometrical interpretation in terms of scaling inequalities with direct algorithmic consequences (items (b) in Theorems 4.1-5.1), and admits local versions (Theorem 5.1) which also better characterize empirical performance, as opposed to martingale type/cotype properties.
Contributions. We first provide elementary proofs of various local and global equivalent characterizations of uniform convexity of sets. We then discuss applications in machine learning and cover some practical examples leveraging these alternative points of view in Section 6. Most of our results are quantitative. 2 We then characterize the uniform convexity of a set in terms of the “angles” between normal cone di- rections and feasible directions at boundary points. These quantifications appear regularly in convergence proofs of algorithms such as Frank-Wolfe and we call them scaling inequalities. The link with uniform convexity is often ignored and our objective here is to explicitly quantify this connection. Finally, we derive equivalent relationships for the localized versions of UC (see Theorem 5.1) to better explain empirical performance in optimization methods.
Related Works. Our work connects different perspectives of uniform convexity of a set. Our Theorems 4.1- 5.1 rely on several classical monographs. We refer to [Zˇa83, AP95, Zˇa02] for the study of functional uniform convexity and smoothness, to [LT13, Bea11, DGZ93, BGHV09] for the study of the geometry of Banach spaces in terms of uniform convexity and smoothness, and to [Pis11] for results on type/cotype properties of a Banach space. We also invoke [GI17] for practical local characterizations of the strong convexity of sets. Finally, we rely on [Roc70] for convex analysis references and on [Sch14] for convex geometry in finite dimensions. Whenever possible, we keep track of the precise reference to these monographs when establishing the results in Sections 3-5. In many cases, we have adapted the proofs to make the results quantitative.
Outline. In Section 2 we group some preliminary facts and in Section 3, we recall the definition of uniform convexity and smoothness for functions and spaces. In Section 4, we present Theorem 4.1 stating different equivalent definitions of the uniform convexity of a norm ball in finite-dimensional spaces. Theorem 5.1 in Section 5 provides the same results but with local assumptions. Results in Section 4-5 are self-contained and proofs are elementary. However, they hold even in infinite-dimensional spaces. Finally, in Section 6, we provide three examples in offline optimization and learning theory where these different points of view on uniform convexity lead to new results.
Notations. The finite-dimensional ambient vector space is Rm and by Int(C) and ∂C, we denote the interior , of C and the boundary of C respectively. The support function of C is defined as σC(d) supv∈C hv; di. The ∗ ∗ ∗ normal cone of C at x ∈C is defined as NC(x ) , d | hd; x − x i≤ 0 ∀x ∈C and the support set of C at ∗ d is F (d) , x ∈C|hx; di = σ (d) . We write f (y)= sup m hx; yi− f(x) as the Fenchel conjugate C C x∈R of f. We will consider convex functions f : Rm → R, finite everywhere and continuous. In particular, we ∗∗ then have that f = f. For a norm k · k, we write kxk⋆ , sup hx; yi | kyk≤ 1 to denote its dual norm. We sometimes also use k · k⋆. We use different star symbols to distinguish between dual norms and Fenchel dual, e.g., the Fenchel dual of a norm is not the dual norm in general. We write Bk·k the unit ball and Sk·k the unit sphere associated to a norm k · k. We most often consider (p,q) s.t. p ≥ 2, q ∈]1, 2] and 1/p + 1/q = 1. The p (resp. q) parameter will hence be employed in the context of uniform convexity (resp. smoothness).
2. PRELIMINARIES We restrict the discussion to finite-dimensional spaces for simplicity. It allows for a direct analogy of duality between a norm and its dual norm with the duality between the norm ball’s gauge function and the support function of the norm ball’s polar, which we detail now. Note that results similar to Theorems 4.1 and 5.1 hold in infinite-dimensional Banach spaces though. We consider centrally symmetric convex bodies C with non-empty interior so that the gauge function k·kC of C is a norm [Roc70, Theorem 15.2.]. In particular, the unit ball (resp. the sphere) of k · kC corresponds to C (resp. ∂C), i.e., C = Bk·kC and ∂C = Sk·kC . The m function k · kC and σC are every-where finite convex functions from R to R+ and, e.g., subdifferentiable [Roc70, Theorem 23.4]. A strictly convex set C is such that for any distinct (x,y) ∈ ∂C, we have (x + y)/2 ∈ C \ ∂C. Conversely, C is smooth if there is only one supporting hyperplane at each boundary point of C. The following lemma recalls the classical relation between strict convexity of a set and differentiability of the support function [Sch14, Cor 1.7.3]. 3 m Lemma 2.1 (Support/Gauge Differentiability). Consider C ⊂ R a compact convex set. σC is differentiable m at d ∈ R \ {0} if and only if {y | hy; di = σC(d)} = {x}. In that case ∇σC(d) = x. In particular, if C is m strictly convex, then σC is differentiable on R \ {0}. The polar of C is defined as C◦ = d ∈ Rm | hx; di ≤ 1 ∀x ∈ C . Importantly, the support and gauge function are dual to each other via the polar operation, i.e., σC(·) = k · kC◦ [Roc70, Theorem 14.5.]. We systematically write x (resp. d) for an element of C (resp. C◦). This duality parallels that of a norm and its ⋆ dual. Indeed, if k · kC is a norm, then k · kC◦ is a norm and k · kC = k · kC◦ [Roc70, Cor 15.1.2]. Finally, the following classical lemma will be particularly useful [Asp68, Lemma 2]. 1 1 Lemma 2.2. Let p,q > 1 s.t. p + q = 1. Then, for any α> 0, we have 1 α ασp ∗(·)= − k · kq . C (αp)1/(p−1) (αp)q C 1 h 1 ip 1 q In particular for α = p , it means that the Fenchel conjugate of p k · k is q k · k⋆. ∗ , Proof of Lemma 2.2. We recall the proof for completeness. Consider ρ (u) supt>0 tu − ρ(t) . For any y, we have ∗ ρ (kyk⋆) = supt>0 tkyk⋆ − ρ(t) = supt>0supx6=0 thy; xi/kxk− ρ(t) hy; xt/k xki h hy; xi i = sup t − ρ(t) = sup t − ρ(t); kxk = t t>0;x6=0 kxt/kxkk t>0;x6=0 kxk h i ∗ n o = supx=06 hy; xi− ρ(kxk) = (ρ ◦k·k) (y). r ∗ Also, an immediate calculation proves that for u ≥ 0 and when ρ(t) = αt with r > 1, we have ρ (u) = 1 α r/(r−1) − u . We finally conclude noting that σ ◦ (·)= k · k and k · k ◦ = k · k . (αr)1/(r−1) (αr)r/(r−1) C C C ⋆ h i
3. SPACES,SETS,FUNCTIONS UNIFORM SMOOTHNESSAND CONVEXITY In this section, we introduce the necessary concepts to state the main theorems in Sections 4-5. We recall the classical notions of uniform convexity and smoothness for functions (Section 3.1) and Banach spaces (Section 3.2). We also recall quantitative statements on the duality correspondence between smoothness and uniform convexity in each of these situations. 3.1. Uniform Convexity and Smoothness of Functions. Uniform convexity and smoothness of functions were introduced to analyse optimization algorithms [Pol66] and extensively studied in [Zˇa83, AP95, Zˇa02], and is now a standard assumption in the analysis of first order methods, see, e.g., [IN14]. The following equivalent definitions of uniformly smooth function are classical, see, e.g., [Zˇa02, (i)- (iv)-(ix) of Theorem 3.5.6.], which notably shows that a continuous uniformly smooth function is Fr´echet differentiable. This means that a norm for instance is not uniformly smooth as it is not differentiable at 0, see Lemma 2.1. This explains why hypothesis (c) in Theorem 4.1 below is restricted to Sk·k(1). In the following sections, we consider only uniform convexity and smoothness of functions to ultimately apply it to simple transformations of the gauge and support functions. We recall self-contained proofs of the equivalences in the definition to obtain quantitative statements. Note that whenever we invoke uniformly smooth or convex functions in the other sections, we will often refer to these zero-order characterization. Definition 3.1 (Uniformly Smooth Functions). Consider a convex function f : Rm → R and q ∈]1, 2]. The following assertions are equivalent (a) (Zero-order) There exists c > 0 s.t. f is (c, q)-uniformly smooth with respect to k · k, i.e., for any (x,y) and λ ∈ [0, 1] f(λx + (1 − λ)y) + (c/q)λ(1 − λ)kx − ykq ≥ λf(x) + (1 − λ)f(y).
4 (b) (First-order) f is differentiable and there exists c′ > 0 such that for any (x,y), we have c′ f(y) ≤ f(x)+ h∇f(x); y − xi + kx − ykq. q
(c) (Holder¨ gradient) f is differentiable and there exists c′′ > 0 such that f is (c′′,q)-Holder-smooth¨ w.r.t. k · k, i.e., for any (x,y) ′′ q−1 ∇f(x) −∇f(y) ⋆ ≤ c kx − yk .
Proof of equivalency in Definition 3.1. We adapt the proof of [Zˇa02, Theorem 3.5.6] to our case. (a) =⇒ (b). Let (x,y) ∈ Rm and λ ∈]0, 1]. The zero-order condition evaluated at (x,y) implies that f(y + λ(x − y)) − f(y) + (c/q)(1 − λ)kx − ykp ≥ f(x) − f(y). (1) λ And because f is a finite convex function, the limit of f(y + λ(x − y)) − f(y) /λ when λ converges to 0+ exists [Roc70, Theorem 23.1.] and f ′(x, ·) is defined for d ∈ Rm as f(y + λd) − f(y) f ′(x, d) , lim . λ→0+ λ In particular, with d = x − y, it implies in (1) that f ′(y,x − y) + (c/q)kx − ykq ≥ f(x) − f(y). (2) Let us now show that f ′(x, ·) is linear. By definition of f ′(x, ·), we have that f ′(x,y) ≥ −f ′(x, −y). Let us now show that the other side inequality is also true. Summing the two versions of (2) by interchanging x and y, we obtain f ′(y,x − y)+ f ′(x,y − x) + (2c/q)kx − ykq ≥ 0. Let u ∈ Rm, t> 0 and write ρ(t)= f(x + tu). Then
′ , ρ(λ) − ρ(t) ′ ρ+(t) lim = f (x + tu, u) λ→t+ λ − t ′ , ρ(λ) − ρ(t) ′ ρ−(t) lim = −f (x + tu, −u). λ→t− λ − t ′ ′ ′ ′ Then, because ρ is convex, we have ρ+(−t) ≤ ρ−(0) ≤ ρ+(0) ≤ ρ−(t). Hence, for any t> 0 2c f ′(x, u)+f ′(x, −u)= ρ′ (0)−ρ′ (0) ≤ ρ′ (t)−ρ′ (−t)= − f ′(x+tu, −u)+f ′(x−tu, u) ≤ (2t)qkukq. + − − + q We conclude that f ′(x, u) ≤ −f ′(x, −u) and finally that f′(x, u) = −f ′(x, −u). Hence f ′(x, ·) is a bounded linear function for any x so that f is differentiable with f ′(x, h) = h∇f(x); hi. We conclude by letting λ converging to 1 in (1). (b) =⇒ (a). Write xλ = λx+(1−λ)y. Applying the first order at x = xλ +x−xλ and y = xλ +y−xλ, we obtain q q f(x) ≤ f(xλ) + (1 − λ)h∇f(xλ); x − yi + (c/q)(1 − λ) kx − yk q q f(y) ≤ f(xλ)+ λh∇f(xλ); y − xi + (c/q)λ kx − yk . Then, by multiplying the inequalities respectively with λ and 1 − λ and summing then, we obtain q−1 q−1 q λf(x) + (1 − λ)f(y) ≤ f(xλ) + (c/q)λ(1 − λ) (1 − λ) + λ kx − yk . Then, by symmetry of (1 − λ)q−1 + λq−1 and because q − 1 ∈]0, 1], we obtain q λf(x) + (1 − λ)f(y) ≤ f(xλ) + 2(c/q)λ(1 − λ)kx − yk .
5 (b) =⇒ (c). For any z, by convexity of f, we have f(y + z) ≥ f(x)+ h∇f(x); y + z − xi and by c′ q assumption we have f(y + z) ≤ f(y)+ h∇f(y); zi + q kzk . Hence c′ f(y)+ h∇f(y); zi + kzkq ≥ f(x)+ h∇f(x); y + z − xi q so that for any z c′ c′ hz; ∇f(x) −∇f(y)i− kzkq ≤ f(y) − f(x)+ h∇f(x); x − yi≤ kx − ykq. q q Then, by taking the supremum over z on both sides, we obtain 1 ∗ c′ c′ k · kq (∇f(x) −∇f(y))/c′ ≤ kx − ykq. q q 1 1 With p ≥ 2 s.t. p + q = 1, Lemma 2.2 then implies c′ ∇f(x) −∇f(y) p c′ ≤ kx − ykq. p c′ ⋆ q
p 1 In particular, q = q−1 and we obtain ′′ c q−1 k∇f(x) −∇f(y)k⋆ ≤ kx − yk . (q − 1)1/p (c) =⇒ (b). By convexity of f, we have f(x) ≥ f(y)+ h∇f(y); x − yi. Hence by definition of the dual norm, we obtain
f(y) − f(x) − h∇f(x); y − xi≤h∇f(y) −∇f(x); y − xi≤k∇f(y) −∇f(x)k⋆ky − xk. and using the H¨older-smoothness of f we obtain f(y) − f(x) − h∇f(x); y − xi≤ c′ky − xkq. We now define uniform convexity of a function, see, e.g., [AP95, Definition 1]. We state the results in terms of subgradients as gauge or support functions are not necessarily differentiable. Definition 3.2 (Uniformly Convex Functions). Consider a convex function f : Rm → R and p ≥ 2. The following assertions are equivalent (a) (Zero-order) There exists c > 0 s.t. f is (c, p)-uniformly convex with respect to k · k, i.e., for any (x,y) and λ ∈ [0, 1], we have f(λx + (1 − λ)y) + (c/p)λ(1 − λ)kx − ykp ≤ λf(x) + (1 − λ)f(y).
(b) (First-order) There exists α> 0 s.t. for any (x,y) ∈C and d ∈ ∂f(x), we have α f(y) ≥ f(x)+ hd; y − xi + kx − ykp. p
Proof of equivalency in Definition 3.2. (a) =⇒ (b). Let (x,y) ∈ Rm and d ∈ ∂f(y). Combining convexity of f and zero-order uniform convexity, we have f(y)+ λhd; x − yi≤ f(y + λ(x − y)) ≤ f(y)+ λ(f(x) − f(y)) − (c/p)λ(1 − λ)kx − ykp. Then, dividing by λ and evaluating with λ converging to zero, we have hd; x − yi≤ f(x) − f(y) − (c/p)kx − ykp.
6 (b) =⇒ (a). Write xλ = λx + (1 − λ)y. We apply the first-order condition at x = xλ + x − xλ and y = xλ + y − xλ. With d ∈ ∂f(xλ), we have α f(x) ≥ f(x ) + (1 − λ)hd; x − yi + (1 − λ)pky − xkp λ p α f(y) ≥ f(x )+ λhd; y − xi + λpky − xkp. λ p Multiplying the inequalities respectively by λ and 1 − λ and summing them, we obtain α λf(x) + (1 − λ)f(y) ≥ f(x )+ λ(1 − λ) (1 − λ)p−1 + λp ky − xkp. λ p p−1 p p−2 Then, by symmetry, we have that minλ∈[0,1] (1 − λ) + λ = 1/2 , which concludes that α λf(x) + (1 − λ)f(y) ≥ f(x )+ λ(1 − λ)ky − xkp. λ 2p−2p Uniform smoothness (US) and uniform convexity (UC) are dual properties by Fenchel conjugacy [Zˇa83, Theorem 2.1.] or [AP95, Proposition 2.6]. We recall a proof below, both for completeness and to obtain quantitative statements. Proposition 3.3 (Uniform Smoothness and Convexity with Fenchel duality). Consider α,c > 0, p ≥ 2 and 1 1 Rm R q ∈]1, 2] such that p + q = 1, and a norm k · k with its dual norm k · k⋆. Let f : → be a convex function. We have the following implications (a) If f is (α, q)-uniformly smooth w.r.t. k · k (Definition 3.1 (a)), then f ∗ is (1/(pαp−1),p)-uniformly convex w.r.t. k · k⋆ (Definition 3.2 (a)). (b) If f is (c, p)-uniformly convex w.r.t. k · k, then f ∗ is 1/(qcq−1),q -uniformly smooth with respect to k · k . ⋆ Proof of Proposition 3.3. Let us prove (b), (a) follows similarly. Assume f is (c, p)-uniformly convex. m m Consider (y1,y2) ∈ R , λ ∈ [0, 1] and write yλ = λy1 + (1 − λ)y2. Similarly, for any (x1,x2) ∈ R , let ∗ ∗ us write xλ = λx1 + (1 − λ)x2, f(xi) = fi for i = 1, 2, f(xλ) = fλ, f(xλ) = fλ etc. By definition of conjugate functions, and using the zero-order uniform convexity of f(·) at xλ, we have ∗ ∗ p hyλ; xλi≤ f (yλ)+ fλ ≤ f (yλ) − (c/p)λ(1 − λ)kx1 − x2k + λf1 + (1 − λ)f2.
By adding and subtracting λ(1 − λ)hy1 − y2; x1 − x2i, we obtain ∗ p hyλ; xλi≤ f (yλ)+λf1+(1−λ)f2−λ(1−λ)hy1−y2; x1−x2i+λ(1−λ) hy1−y2; x1−x2i−(c/p)kx1−x2k . p ∗ The right term in brackets is upper bounded by ((c/p)k · k ) (y1 − y2), so that ∗ p ∗ hyλ; xλi− λf1 − (1 − λ)f2 + λ(1 − λ)hy1 − y2; x1 − x2i≤ f (yλ)+ λ(1 − λ)((c/p)k · k ) (y1 − y2). Note also the following equality
hyλ; xλi + λ(1 − λ)hy1 − y2; x1 − x2i = λhy1; x1i + (1 − λ)hy2; x2i. Hence, we obtain ∗ p ∗ λhy1; x1i + (1 − λ)hy2; x2i− λf1 − (1 − λ)f2 ≤ f (yλ)+ λ(1 − λ)((c/p)k · k ) (y1 − y2) ∗ p ∗ λ hy1; x1i− f2 + (1 − λ) hy2; x2i− f2 ≤ f (yλ)+ λ(1 − λ)((c/p)k · k ) (y1 − y2).
Because the last inequality is true for any (x1,x2), we conclude that ∗ ∗ ∗ p ∗ λf (y1) + (1 − λ)f (y2) ≤ f (yλ)+ λ(1 − λ)((c/p)k · k ) (y1 − y2). p ∗ 1 q ∗ q−1 Lemma 2.2 implies that ((c/p)k · k ) (y1 − y2)= qcq−1 ky1 − y2k⋆. Finally f is 1/(qc ),q -uniformly smooth with respect to k · k . ⋆ In the following proposition, we provide similar results for local notions of uniform convexity and smoothness of a function. These are quantitative versions of [AP95, Proposition 3.2.] or [Zˇa83, (iv) & (v) Theorem 2.1.]. 7 1 1 Rm R Proposition 3.4. Consider α,c > 0, p ≥ 2 and q ∈]1, 2] such that p + q = 1. Let f : → a convex function and (x, d) such that d ∈ ∂f(x) and x ∈ ∂f ∗(d). The following assertions are equivalent ∗ (a) For some α> 0, f is (α, q)-uniformly-smooth at d w.r.t. to k · k⋆, i.e., for all d1, we have α f ∗(d ) ≤ f ∗(d)+ hx; d − di + kd − dkq . 1 1 q 1 ⋆
(b) For some c> 0, f is (c, p)-uniformly convex at x w.r.t k · k, i.e., for any y, we have c f(y) ≥ f(x)+ hd; y − xi + ky − xkp. p
Proof of Proposition 3.4. First note, that since f is finite l.s.c., for d ∈ ∂f(x), we have x ∈ ∂f ∗(d) [Roc70, Theorem 23.5.]. Let us show that (a) =⇒ (b), the converse follows similarly. Recall that ∗ q sup m . Write , . Combining the uniform smooth- f(y) = d1∈R hy; d1i− f (d1) Φ(d1) α/qkd1 − dk⋆ ness assumption on f ∗, adding and subtracting hy; di and with the equality f ∗(d)+ f(x)= hd; xi [Roc70, Theorem 23.5.], we have for any y ∗ sup m f(y) ≥ d1∈R hy; d1i− f (d)+ hx; d1 − di + Φ(d1 − d) ∗ f(y) ≥ sup m hy; d − di− f (d)+ hx; d − di + Φ(d − d) + hy; di d1∈R 1 1 1 ∗ sup m f(y) ≥ d1∈R hy − x; d1 − d i− Φ(d1 − d)+ hy; di− f (d)
f(y) ≥ sup m hy − x; d − di− Φ(d − d) + hy; di + f(x) − hd; xi d1∈R 1 1 ∗ f(y) ≥ f(x)+ hd; y − xi +Φ (y − x). Write k · k = k · kC for some compact centrally symmetric convex set C with nonempty interior. Then, note α q ∗ p 1/(q−1) that Φ= q σC. Hence, by Lemma 2.2, we have Φ (·)= k · kC /(pα ). Finally 1 f(y) ≥ f(x)+ hd; y − xi + ky − xkp . pα1/(q−1) C 3.2. Uniform Convexity and Smoothness for Sets and Spaces. Moduli of convexity and smoothness of m a norm k · kC help characterize the geometry of the normed space (R , k · kC) or the convex set C. This connects set uniform convexity with results in the study of Banach spaces, in the special case where C is cen- trally symmetric with nonempty interior. In Section 6.2, we provide an important use case stemming from this other perspective on uniform convexity. These moduli are classical objects characterizing either en- hanced convex properties of C (for uniform convexity, rotundity) or regularity of the boundary of C (uniform smoothness). Here too, these properties are dual for a normed space and its dual space [Lin63]. The (global) modulus of convexity [Cla36] is defined, for ǫ ∈ [0, 2], as
δk·kC (ǫ)= inf{1 − k(x + y)/2kC | kxkC = kykC = 1; kx − ykC ≥ ǫ}. (3)
The restriction of ǫ ∈ [0, 2] ensures that the infimum is defined. It measures the convexity of k · kC at midpoints on the border of C. Note that the value of δk·kC does not change by considering (x,y) ∈ Bk·k(1) in place of Sk·k(1), see discussion following [LT13, Definition 1.e.1]. The modulus of smoothness [Lin63] of k · kC is defined, for τ > 0, as
ρk·kC (τ)= sup{(kx + τykC + kx − τykC )/2 − 1 | kxkC = kykC = 1}. (4) We can now define uniformly convex (resp. smooth) norm balls and normed spaces. Definition 3.5 (Uniformly Convex Set or Space). Consider a compact convex set C, p ≥ 2 and α > 0. Assume C is centrally symmetric with nonempty interior. C is (α, p)-uniformly convex iff for any ǫ ∈ [0, 2] p δk·kC (ǫ) ≥ αǫ . m In that case, we also say that the normed space (R , k · kC) is uniformly convex of type p. 8 There are other equivalent definitions of the set uniform convexity of C. We will detail some of them in Theorem 4.1, prove their equivalence and discuss their practical significance. Definition 3.6 (Uniformly Smooth Set or Space). Consider a compact convex set C and q ∈]1, 2]. Assume C is centrally symmetric with nonempty interior. C is (α, q)-uniformly smooth if for any τ > 0, we have q ρk·kC (τ) ≤ ατ . m In that case, we also say that the normed space (R , k · kC) is uniformly smooth of type q. When a set is (µ, 2)-uniformly convex (resp. (L, 2)-uniformly smooth), we say it is µ-strongly convex (resp. L-smooth), see [GI17, Theorem 2.1.] for a thorough review on strongly convex sets in Hilbert spaces. These properties are dual to each other, in terms of the set C and its polar C◦, or the norm ball and its dual norm ball [DGZ93, Proposition IV 1.12]. The Lindenstrauss formula [Lin63, Theorem 1] leads to quantitative versions of that duality. For any τ > 0, we have τǫ ρ (τ)= sup − δ (ǫ) . (Lindenstrauss) k·kC◦ ǫ∈[0,2] 2 k·kC The following lemma [DGZ93, Proposition 1.12.] thenn quantifies thiso duality and is similar to Proposition 3.3 on a function and its Fenchel conjugate. The proof directly follows from (Lindenstrauss). Proposition 3.7 (Uniform Smoothness and Convexity with dual norms). Consider α,c > 0, p ≥ 2 and 1 1 q ∈]1, 2] such that p + q = 1 and a compact convex set C centrally symmetric with nonempty interior. We have the following implications (a) If C is (α, q)-uniformly smooth (Definition 3.5), then C◦ is (1/ 2p(2αq)1/(q−1) ,p)-uniformly con- vex (Definition 3.6). (b) If C is (c, p)-uniformly convex, then C◦ is (1/ 2q(2αp)q−1 ,q) -uniformly smooth. Proof of Proposition 3.7. For instance, let us prove (a) . With (Lindenstrauss ), we have for any τ > 0 and that q. Optimizing w.r.t. to , the optimal ∗ 1/(q−1) leads to ǫ ∈ [0, 2] τǫ/2 − δk·kC◦ (ǫ) ≤ ατ τ τ = (ǫ/(2αq)) p δ (ǫ) ≤ 1 ǫ . k·kC◦ 2p (2αq)1/(q−1) 3.3. Local Moduli. Local counterparts of the global moduli characterize local properties of C around a ∗ ∗ point x ∈ ∂C with respect to a (normalized) direction d in the normal cone NC(x ). As we will see, these local properties are important as they explain empirical globally accelerated convergence rates in optimization problems where the functions or constraints do not satisfy global regularity assumptions such as, e.g., strong convexity [Dun79, KdP20]. The local modulus of smoothness [GI17, (15)] of at ∗ with respect to is defined as, C x ∈ ∂C d ∈ Sk·kC◦ (1) for t> 0, ∗ ∗ ∗ ρk·kC (t,x , d)= sup kx + txkC − kx kC − thd; xi |kxkC ≤ 1 . (Loc. Smoothness) Similarly to all moduli seen so far, the local modulus of smoothness is designed so that when t goes to zero, the first order terms cancel. In the following, for convenience we write ρC for ρk·kC . We measure the local uniform convexity at x∗ via the local modulus of rotundity. In the equivalent characterization of the set uniform convexity, the definition of modulus of rotundity is most related with the scaling inequalities characterizations, see (b) in Theorems 4.1 and 5.1. For ∗ and ∗ , the local x ∈ ∂C d ∈ NC(x ) ∩ Sk·kC◦ (1) modulus of rotundity at x∗ w.r.t. d is defined for ǫ ∈ [0, 2] as ∗ ∗ ∗ νC(ǫ,x , d)= inf hd; x − xi | x ∈C, kx − xkC ≥ ǫ . (Rotundity) The following lemma makes the duality between smoothness and rotundity explicit by linking the two moduli, to produce a local counterpart to (Lindenstrauss). We cite a version giving a quantitative dual relationship between local modulus of smoothness and local modulus of rotundity [GI17, Theorem 2.7.]. Lemma 3.8 (Local Lindstrauss formula). Consider ∗ and ∗ . Then the local x ∈ ∂C d ∈ Sk·kC◦ (1) ∩ NC(x ) modulus of smoothness and rotundity satisfy for any t> 0 ∗ ∗ ρC◦ (t,d,x )= supǫ∈[0,2] ǫt − νC(ǫ,x , d) . (Loc. Lindenstrauss) 9 ◦ ∗ Proof of Lemma 3.8. Let t> 0, by definition of ρC◦ , for η > 0 there exists dη ∈C such that ρC◦ (t,d,x ) ≤ ∗ kd + tdηkC◦ − kdkC◦ − thx ; dηi + η. Also, by compactness of C, there exists xη ∈ ∂C s.t. ktdη + dkC◦ = ∗ ∗ σC(tdη + d)= htdη + d; xηi. Since d ∈ NC(x ), we have kdkC◦ = σC(d)= hd; x i and hence ∗ ∗ ρC◦ (t,d,x ) ≤ ktdη + dkC◦ − kdkC◦ − thx ; dηi + η ∗ ∗ ∗ ρC◦ (t,d,x ) ≤ htdη + d; xηi − hd; x i− thx ; dηi + η ∗ ∗ ∗ ∗ ∗ ρC◦ (t,d,x ) ≤ hd; xη − x i + htdη; xη − x i + η ≤ hd; xη − x i + tσC◦ (xη − x )+ η ∗ ∗ ∗ ρC◦ (t,d,x ) ≤ supx∈C hd; x − x i + tkx − x kC + η ∗ ∗ ∗ ∗ ρC◦ (t,d,x ) ≤ supǫ∈[0,2]supx∈C hd; x − x i + tkx − x kC kx − x kC = ǫ + η ∗ ∗ ∗ ρC◦ (t,d,x ) ≤ supǫ∈[0,2] tǫ − inf x∈C hd; x − xi | kx − x kC = ǫ + η
∗ ∗ ρC◦ (t,d,x ) ≤ supǫ∈[0,2]tǫ − νC(ǫ,x, d) + η.
We used that for any x,y ∈C we have kx − ykC ∈ [0, 2] . Finally, last inequality is true for any η > 0, hence ∗ ∗ ρC◦ (t,d,x ) ≤ supǫ∈[0,2] tǫ − νC(ǫ,x , d) . We now provide a similar reasoning to obtain the equality. ∗ ∗ Indeed, for λ > 0, there exists ǫλ > 0 such that supǫ∈[0,2] tǫ − νC(ǫ,x , d) ≤ tǫλ − νC(ǫλ,x , d)+ λ. ∗ ∗ ∗ Also, for η > 0, there exists xη ∈ C s.t. νC(ǫλ,x , d) ≥ hd; x − xηi− η with kxη − x kC ≥ ǫλ. By ◦ ∗ ∗ ∗ compactness of C, there exists dη ∈C such that kxη − x kC = σC◦ (x − xη)= hx − xη; dηi. Therefore, for t> 0, we have
∗ ∗ ∗ ∗ supǫ∈[0,2] tǫ − νC(ǫ,x , d) ≤ tǫλ − hd; x − xηi + λ + η ≤ tkxη − x kC − hd; x − xηi + λ + η ∗ ∗ ∗ supǫ∈[0,2]tǫ − νC(ǫ,x , d) ≤ thdη; x − xηi − hd; x − xηi + λ + η ∗ ∗ ∗ supǫ∈[0,2]tǫ − νC(ǫ,x , d) ≤ hxη; d − tdηi − hd; x i + thdη; x i + λ + η ∗ ∗ supǫ∈[0,2]tǫ − νC(ǫ,x ,p) ≤ σC(d − tdη) − σC(d) − thx ; −dηi + λ + η ∗ ∗ supǫ∈[0,2]tǫ − νC(ǫ,x , d) ≤ kd − tdηkC◦ − kdkC◦ − thx ; −dηi + λ + η. ◦ ∗ ∗ Hence, since −dη ∈ C , for any λ, η > 0, we have νC(ǫ,x , d) ≤ ρC◦ (t,d,x )λ + η. We conclude that ∗ ∗ sup tǫ − ν (ǫ,x , d) ≤ ρ ◦ (t,d,x ). ǫ∈[0,2] C C
4. EQUIVALENCE BETWEEN GLOBAL SETAND FUNCTIONAL ASSUMPTIONS We expose some classical equivalence between functional and geometrical properties in Theorem 4.1 below. This leads to new insights in learning theory in Section 6.2 and in optimization in Sections 6.1-6.3. Item (a) is similar to the definition appearing in most machine learning papers [GH15, HLGS16, HLGS17] and gives an intuitive understanding of set uniform convexity. The uniformly mid-convex property is equiv- alent to its continuous counterpart, see, e.g., [Mol20, Lemma 9], but allows more concise proofs. Item (b) is an essential inequality in analysing projection-free online or offline optimization methods. There are other related and useful inequalities that can be seamlessly derived from this one, see, e.g., Lemma 6.6. Item (d)-(f) provides equivalent functional properties of the gauge and support function of C. Note that C is UC, but it is only a power of its gauge k·kC that is UC in the sense of functions. Also, the support function is only partially H¨older smooth as Item (d) holds on the sphere . Again, it is only a specific power Sk·kC◦ (1) of the support function that is uniformly smooth in the sense of functions without restriction on its domain. Finally, item (c) connects all other perspectives with the study of uniformly convex Banach spaces. This connection is rich with hindsights, see, e.g., Section 6.2. These results are classical and appear in many textbooks [DGZ93, LT13] often in non-quantitative, scat- tered, or too generic forms. We detail self-contained elementary proofs and provide quantitative versions in 10 the finite-dimensional setting. Further, we only present the most practically significant equivalent charac- terizations here. In Section 5, we will provide similar quantitative results with local uniform convexity and smoothness of C. 1 1 Theorem 4.1 (Global Set Uniform Convexity). Consider p ≥ 2 and q ∈]1, 2] s.t. p + q = 1. Let C be a centrally symmetric compact convex set with nonempty interior. The following assertions are equivalent (a) (Set mid-convex property) There exists α> 0 s.t. for all (x,y) ∈C we have x + y + αkx − ykp B (1) ⊂C. 2 C k·kC
(b) (Global scaling inequality) There exists α > 0 s.t. for any (x,y) ∈ C× ∂C and d ∈ Rm with d ∈ NC(y) (or y ∈ argmaxv∈Chd; vi) we have p hd; y − xi≥ αkdkC◦ ky − xkC . (Global-Scaling)
(c) (Set Modulus UC) There exists α> 0 s.t. C is (α, p)-uniformly convex (Definition 3.5), i.e., for any ǫ> 0, we have the following lower bound on the modulus (3), p δk·kC (ǫ) ≥ αǫ .
(d) (Support Holder-Smooth¨ Sphere) The exists c > 0 s.t. the support function σC(·) is (c, q − 1)- Holder¨ smooth with respect to ◦ on , i.e., it is differentiable on and for any k · kC Sk·kC◦ (1) Sk·kC◦ , we have (d1, d2) ∈ Sk·kC◦ (1) q−1 1/(p−1) ∇σC(d1) −∇σC(d2) C ≤ c d1 − d2 C◦ = c d1 − d2 C◦ .
q R m q (e) (Support US) σC(·) is differentiable on and there exists c > 0 s.t. σC(·) is (c, q)-uniformly m smooth on R with respect to k · kC◦ for some c> 0. p (f) (Gauge UC) There exists α> 0 s.t. k·kC is (α, p)-uniformly convex with respect to k·kC (Definition 3.2). Proof of Theorem 4.1. (a) =⇒ (c). Let (x,y) ∈ S (1). For z = x+y , we have x+y +αkx−ykp z ∈C. k·kC kx+ykC 2 C Hence x + y p 2 1+ αkx − ykC ≤ 1. 2 C kx + ykC This shows that 1 − k(x + y)/2 k ≥ α kx − ykp and hence δ (ǫ) ≥ αǫp. C C k·kC
(c) =⇒ (a). Recall that the modulus of convexity ρk·kC (ǫ) in (3), can be written as the infimum over (x,y) ∈ Bk·k(1) instead of Sk·k(1), see discussion following [LT13, Definition 1.e.1]. Let (x,y) ∈ C. By p definition of the modulus of convexity, we have 1 − k(x + y)/2kC ≥ αkx − ykC . Hence by the triangle p p inequality, for any z ∈ Bk·kC (1), we have k(x+y)/2+αkx−ykCzkC ≤ 1, so that (x+y)/2+αkx−ykCz ∈ C.
Rm (a) =⇒ (b). Let x ∈ C, y ∈ ∂C and d ∈ s.t. d ∈ NC(y). We have y ∈ argmaxv∈C hd; vi. Because p (x + y)/2+ αkx − ykCz ∈C, for any z ∈ Bk·kC (1), the optimality of y implies p hd; (x + y)/2+ αkx − ykCzi ≤ hd; yi. p Hence, for any z ∈ Bk·kC (1) we have 2αkx − ykChd; zi ≤ hd; y − xi. By definition of the dual norm, we p ⋆ ⋆ hence have 2αkx − ykC kdkC ≤ hd; y − xi and conclude with kdkC = kdkC◦ .
11 (b) (d). Let and consider argmax for . We have that =⇒ (d1, d2) ∈ Sk·kC◦ (1) vdi ∈ v∈C hdi; vi i = 1, 2 for any x ∈C p p ◦ hd1; vd1 − xi≥ αkd1kC · kvd1 − xkC = αkvd1 − xkC p p ◦ (hd2; vd2 − xi≥ αkd2kC · kvd2 − xkC = αkvd2 − xkC.
Then, by summing the two inequalities evaluated respectively at x = vd2 and x = vd1 , we have p hd1 − d2; vd1 − vd2 i≥ 2αkvd1 − vd2 kC.
By Cauchy-Schwartz and since vdi = ∇σC(di) for i = 1, 2 (Lemma 2.1 applies because C is strictly convex and di 6= 0), we obtain p kd1 − d2kC◦ · k∇σC (d1) −∇σC(d2)kC ≥ 2αk∇σC (d1) −∇σC(d2)kC , and conclude that 1 1/(p−1) k∇σC(d1) −∇σC(d2)kC ≤ kd1 − d2k ◦ . (2α)1/(p−1) C Note finally that 1/(p − 1) = q − 1.
(e) =⇒ (c). Note that [BGHV09, (d) =⇒ (a) of Theorem 2.2.] is not constructive and that [DGZ93, (ii) =⇒ (i) in Lemma 5.1.] is incomplete as it only proves that the modulus of smoothness has the right lower-bound for τ ∈ [0, 1/2[. [LT13] do not consider these aspects and [K¨ot83, §26] neither. [Zˇa02, (iii) of Theorem 3.7.4.] bears some similarity. Recall the duality between support and gauge functions ◦ σC (·) = k · kC◦ . We now show that C is uniformly smooth by providing an upper bound on its modulus of smoothness and conclude on (c) by duality. Recall that for τ > 0, the modulus of smoothness of C◦ is defined as
ρC◦ (τ)= sup kd1 + τd2kC◦ + kd1 − τd2kC◦ /2 − 1 kd1kC◦ = kd2kC◦ = 1 . Consider , since q is -uniformly smooth on Rm and by equivalence between (a) and (d1, d2) ∈ Sk·kC◦ σC (c, q) (b) in Definition 3.1, we have
q q 2c q kd + τd k ◦ ≤ 1+ h∇k · k ◦ (d ); τd i + kτd k ◦ 1 2 C C 1 2 q 2 C q q 2c q kd − τd k ◦ ≤ 1 − h∇k · k ◦ (d ); τd i + kτd k ◦ . 1 2 C C 1 2 q 2 C 1/q 1/q When q ∈]1, 2], (1 + x) is concave and below its tangent. In particular, (1 + x) ≤ 1+ x/q. Hence, combined with kd2kC◦ = 1, we have
1 q 2c q kd + τd k ◦ ≤ 1+ h∇k · k ◦ (d ); τd i + τ 1 2 C q C 1 2 q2 1 q 2c q kd − τd k ◦ ≤ 1 − h∇k · k ◦ (d ); τd i + τ . 1 2 C q C 1 2 q2 Then summing the two inequalities and dividing by 2, we obtain 2c q kd + τd k ◦ + kd − τd k ◦ /2 − 1 ≤ τ . 1 2 C 1 2 C q2 Hence, C◦ is (2c/q2,q)-uniformly smooth. Then Proposition 3.7 (a), implies that C is (1/(2p(2αq)p−1),p)- uniformly convex with α = 2c/q2, i.e., C is (qp−1/(22p−1pcp−1))-uniformly convex.
(f) =⇒ (e) From Lemma 2.2, we have that k · kp ⋆(·) = 1 − 1 σq (·). Then Item (b) of C p1/(p−1) pq C p−1 q ′ h i ⋆ ◦ Proposition 3.3 implies that pq−1 σC(·) is (c ,q)-uniformly smooth on C with respect to k·kC = k·kC and ′ q−1 q h i pq−1 c = 1/(qc ). Hence, σC is ( (p−1)qcq−1 ,q)-uniformly smooth. Note also that by equivalence between (a) and (b) in Definition 3.1, we have that q is differentiable. σC
12 q (e) =⇒ (f). Conversely, let us assume that σC is (α, q)-uniformly smooth. From Lemma 2.2, we have that q ⋆ 1 1 p . And, with Proposition 3.3 (a), q−1 p is ′ -uniformly σC (·)= q1/(q−1) − qp k · kC(·) qp−1 k · kC (·) (c ,p) h i ′ p−1 h i p qp−1 convex with respect to k · kC with c = 1/(pα ). Finally, we conclude that | · kC(·) is ( (q−1)pαp−1 ,p)- uniformly convex. q (d) =⇒ (e). Conversely, let us show that σC(·) is uniformly smooth. The proof follows that of [BGHV09, q Rm Rm Theorem 2.1.]. Let us start by showing that σC(·) is differentiable on . For d1 ∈ \ {0}, we have q q−1 ∇σC(d1) = qkd1k ∇σC(d1). Because C is strictly convex, there is a unique x1 ∈ ∂C s.t. d1 ∈ NC(x1). From [Sch14, Corollary 1.7.3.], we have that ∇σC(d1) = x1. Because q > 1, when d1 converges to 0, we q q q have that ∇σC(d1) also converges to zero. Hence, σC is differentiable at zero with ∇σC(0) = 0. m Let (d1, d2) ∈ R and xi ∈ ∂C s.t. di ∈ NC(xi), i.e., ∇σC(di) = xi. Because σC is H¨older smooth on 1/(q−1) , we have ◦ ◦ . We then obtain Sk·kC◦ k∇σC(d1) −∇σC(d2)kC ≤ ckd1/kd1kC − d2/kd2kC kC◦ q q q−1 q−1 k∇σC(d1) −∇σC(d2)kC = kqσC (d1)∇σC(d1) − qσC (d2)∇σC(d2)kC q−1 q−1 q−1 ≤ qσC (d1) ∇σC(d1) −∇σC(d2) C + q ∇σC(d2) C σC (d1) − σC (d2) q−1 q−1 q−1 q−1 ≤ qckd k ◦ d /kd k ◦ − d /kd k ◦ + q kd k ◦ − kd k ◦ 1 C 1 1 C 2 2 C C◦ 1 C 2 C q−1 q−1 q−1 ≤ qc d − d kd k ◦ /kd k ◦ + q kd k ◦ − kd k ◦ . 1 2 1 C 2 C C◦ 1 C 2 C r r r We have for λ1, λ2 > 0 and r ∈]0, 1] |λ 1 − λ2| ≤ |λ1− λ2| [BGHV09 , Lemma 2.1.]. Hence, for q−1 q−1 q −1 q−1 q − 1 ∈]0, 1], we have kd1kC◦ − kd2kC◦ ≤ kd1kC◦ − kd2kC◦ ≤ kd1 − d2kC◦ . Also, with the triangle inequality d − d kd k ◦ /kd k ◦ ≤ kd − d k ◦ + kd k ◦ − kd k ◦ ≤ 2kd − d k ◦ . Hence 1 2 1 C 2 C 1 2 C 2 C 1 C 1 2 C q q q−1 q−1 k∇ σ (d ) −∇σ (d )k ≤ q(c2 + 1)kd − d k ◦ . C 1 C 2 C 1 2 C q 2 q−1 Equivalence between (a) and (c) in Definition 3.1 shows that σC is (2q (c2 + 1),q)-uniformly smooth. (e) (d). Let . Since, for , q q−1 , =⇒ (d1, d2) ∈ Sk·kC◦ (1) i = 1, 2 ∇σC(di)= qσC(di) ∇σC(d1)= qσC(d1) we directly have (because of the equivalence between (a) and (c) in Definition 3.1) c 1/(p−1) k∇σ (d ) −∇σ (d )k ≤ kd − d k ◦ . C 1 C 2 C q 1 2 C Remark 1. From the proof of Theorem 4.1, one can obtain quantitative results. (a) and (c) are equiv- alent with the same constant. (a) with (α, p) implies (b) with (2α, p); (b) with (α, p) implies (d) with (1/(2α)q−1,q − 1); (e) with (c, q) implies (c) with (qp−1/(22p−1pcp−1),p); (f) with (α, p) implies (e) with (pq−1/((p − 1)qcq−1),q); Conversely, (e) with (α, q) implies (f) with (qp−1/((q − 1)pαp−1),p); Finally, (d) with (c, q − 1) implies (e) with (2q2(c2q−1 + 1)).
5. EQUIVALENCE BETWEEN LOCAL SETAND FUNCTIONAL ASSUMPTIONS In this section, we provide equivalent characterizations of the local uniform convexity of C at x∗ ∈ ∂C. The results are summarized in Theorem 5.1, the analog to Theorem 4.1. We seek to articulate different useful views on the local uniform convexity property of a set. Item (a) is a Banach geometry definition via the local modulus of rotundity. Item (b) is a geometric local scaling inequality useful in some algorithm analysis, see for instance the Frank-Wolfe method on locally uniformly convex sets [KdP20]. Note that a natural local version of (Global-Scaling), could be that for any ∗ d ∈ NC (x ), for any x ∈C, we require ∗ ∗ q hd; x − xi≥ αkdkC◦ kx − xkC . However, we opted for a weaker version in (Local-Scaling) which expresses the property only with respect to a single direction in the normal cone at the point of interest. Finally Items (b) and (e) connect these 13 geometrical characterization with their functional counterpart, both in term of smoothness and uniform con- vexity. These results appear scattered in the literature, see, e.g., [Zˇa83, Chapter 3.7] or [AP95, Proposition 3.2.]. We expect these various equivalences to provide convergence proof of algorithms in online and offline settings when the decision sets or constraints sets are not globally strongly convex. We provide an example of such a result in Section 6.1. 1 1 Theorem 5.1 (Local Set Uniform Convexity). Consider p ≥ 2 and q ∈]1, 2] s.t. p + q = 1. Let C be ∗ ∗ a compact strictly convex set centrally symmetric with nonempty interior. Let x ∈ ∂C, d1 ∈ NC(x ) ∩ (note ◦). The following assertions are equivalent Sk·kC◦ (1) Sk·kC◦ (1) = ∂C (a) (Modulus of Rotundity) There exists α > 0 s.t. C is (α, p)-locally uniformly convex at x∗ w.r.t. direction d1, i.e., for any ǫ ∈ [0, 2], we have ∗ ∗ ∗ p νC(ǫ,x , d1) , inf hd1; x − xi | x ∈C, kx − x kC ≥ ǫ ≥ αǫ . (b) (Local scaling inequality) For any x ∈C, we have ∗ ∗ p hd1; x − xi≥ αkx − xkC . (Local-Scaling)
(c) (Support Local Holder-Smooth¨ Sphere) There exists c> 0 s.t. σC(·) is (c, q − 1)-Holder¨ smooth at on w.r.t. ◦ , i.e., for any , we have d1 Sk·kC◦ (1) k · kC d2 ∈ Sk·kC◦ (1) q−1 1/(p−1) ∇σC(d1) −∇σC(d2) C ≤ c d1 − d2 C◦ = d1 − d2 C◦ .
q (d) (Support Local US) There exists c > 0 s.t. σC(·) is (α, q)-uniformly smooth at d1 w.r.t k · kC◦ , i.e., m for any d2 ∈ R , we have q q ∗ α q σ (d ) ≤ σ (d )+ qhx ; d − d i + kd − d k ◦ , C 2 C 1 2 1 q 2 1 C q ∗ where ∇σC(d1)= qx . p ∗ (e) (Gauge local UC) There exists µ> 0 s.t. k · kC is (µ,p)-uniformly convex at x on C in direction d1 m w.r.t. k · kC, i.e., for any y ∈ R µ kykp ≥ kx∗kp + phd ; y − x∗i + ky − x∗kp . C C 1 2 C
m Proof of Theorem 5.1. Because C is strictly convex, σC is differentiable on R \ {0}, see Lemma 2.1. ∗ ∗ q In particular, ∇σC(d1) = x since d1 ∈ NC(x ). Also, because kd1kC◦ = 1, note that ∇σC(d1) = q−1 ∗ qkd1kC◦ ∇σC(d1) = qx Finally, note that k · kC = σC◦ is not necessarily differentiable (would require assuming that C◦ is smooth). (a) ⇐⇒ (b) is immediate.
(a) (c). Let us assume that is -uniformly convex at ∗ with respect to =⇒ C (α, p) x ∈ ∂C d1 ∈ Sk·kC◦ (1)∩ ∗ ∗ p NC(x ), i.e., for any ǫ> 0, νC(ǫ,x , d1) ≥ αǫ . Hence, we have for any x ∈C ∗ ∗ p hd1; x − xi≥ αkx − x kC . Let and , argmax (it is unique because is strictly convex compact). In d2 ∈ Sk·kC◦ (1) x2 x∈Chx; d2i C ∗ particular, hx − x2; d2i≤ 0, hence we have ∗ ∗ ∗ ∗ ∗ p hd1 − d2; x − x2i ≥ hd1 − d2; x − x2i + hd2; x − x2i = hd1; x − x2i≥ αkx − x2kC. ≤0 ∗ ∗ p Then, with Cauchy-Schwartz we have kd1 − d2kC◦ k|x −{zx2kC ≥} αkx − x2kC. Hence, ∗ 1 1/(p−1) kx2 − x kC ≤ kd1 − d2k ◦ . α1/(p−1) C 14 ∗ With Lemma 2.1, we have ∇σC(d2)= x2 and x = ∇σC(d1), which concludes with q − 1 = 1/(p − 1).
q (d) =⇒ (a). Let us now assume that σC(·) is (α, q)-uniformly smooth at d1 w.r.t k·kC◦ . Also σC(·)= k·kC◦ . ∗ ◦ ∗ Let us first prove an upper bound on the local modulus of smoothness ρC◦ (t, d1,x ) of C at d1 w.r.t. x , see (Loc. Smoothness). By the duality formula (Loc. Lindenstrauss), we will then obtain a lower bound on the modulus of rotundity. Recall that the local modulus of smoothness in (Loc. Smoothness) is defined for any t> 0, as ∗ ∗ ◦ ρC◦ (t, d1,x )= sup kd1 + td2kC◦ − kd1kC◦ − thx ; d2i | d2 ∈C . m By (d), we have for any d2 ∈ R
q q q α q q kd + td k ◦ ≤ kd k ◦ + th∇σ (d ); d i + t kd k ◦ . 1 2 C 1 C C 1 2 q 2 C
q ∗ 1/q Recall from the beginning of the proofs that ∇σC(d1) = qx . Then, we have by concavity of (1 + x) when q ∈]1, 2] q ∗ α q ∗ α q kd + td k ◦ ≤ 1+ tqhx ; d i + t ≤ 1+ thx ; d i + t . 1 2 C 2 q 2 q2 ◦ ∗ 2 q In particular, for d2 ∈ C and because kd1kC◦ = 1, we have ρC◦ (t, d1,x ) ≤ α/q t . Then, with Lemma 3.8, we have that for any ǫ ∈ [0, 2] and t> 0
∗ 2 q supǫ∈[0,2] ǫt − νC(ǫ,x , d1) ≤ α/q t . Hence, for any ǫ ∈ [0, 2] ∗ 2 q νC(ǫ,x , d1) ≥ ǫt − α/q t . Then for t = (qǫ/α)1/(q−1), we have
qp−2 ν (ǫ,x∗, d ) ≥ (q − 1)ǫp. C 1 αp−1
qp−2 ∗ Therefore, C is ( αp−1 (q − 1),p)-locally uniformly convex at x with respect to d1.
(c) =⇒ (d). The proof is similar to that of (d) =⇒ (f) in Theorem 4.1, we repeat it for completeness. q Rm First, by the very same argument, σC is differentiable on (recall that σC is not differentiable at 0). Now, m consider d2 ∈ R \ {0} and the unique (because C is strictly convex) x2 ∈ ∂C s.t. d2 ∈ NC (x2). Then, with Lemma 2.1, we have ∇σC(d2)= x2 and with the same argument ∇σC(d2/kd2kC◦ )= x2. Because σC 1/(q−1) is H¨older smooth at on , we have ◦ . We then d1 Sk·kC◦ k∇σC(d1) −∇σC(d2)kC ≤ ckd1 − d2/kd2kC kC◦ q−1 obtain, by adding and subtracting qσC (d1)∇σC(d2) and applying the triangle inequality q q q−1 q−1 k∇σC(d1) −∇σC(d2)kC = kqσC (d1)∇σC(d1) − qσC (d2)∇σC(d2)kC q−1 q−1 q−1 ≤ qσC (d1) ∇σC(d1) −∇σC(d2) C + q ∇σC(d2) C σC (d1) − σC (d2) q−1 q−1 q−1 q−1 ≤ qckd k ◦ d /kd k ◦ − d /kd k ◦ + q kd k ◦ − kd k ◦ 1 C 1 1 C 2 2 C C◦ 1 C 2 C q−1 q−1 q−1 ≤ qc d − d kd k ◦ /kd k ◦ + q kd k ◦ − kd k ◦ . 1 2 1 C 2 C C◦ 1 C 2 C r r r We have for λ1, λ2 > 0 and r ∈]0, 1] |λ1 − λ2| ≤ |λ1 − λ2| . Hence, for q − 1 ∈ ]0, 1], we have q−1 q−1 q−1 q−1 kd1kC◦ − kd2kC◦ ≤ kd1kC◦ − kd2kC◦ ≤ kd1 − d2kC◦ . Also, by the triangle inequality, d1 − m d kd k ◦ /kd k ◦ ≤ kd − d k ◦ + kd k ◦ − kd k ◦ ≤ 2kd − d k ◦ . Hence for any d ∈ R \ {0} 2 1 C 2 C 1 2 C 2 C 1 C 1 2 C 2