A perturbation view of level-set methods for convex optimization

Ron Estrin (a) and Michael P. Friedlander (b)

(a) Institute for Computational and Mathematical Engineering, Stanford University, Stanford, CA, USA ([email protected]); (b) Department of Computer Science and Department of Mathematics, University of British Columbia, Vancouver, BC, Canada ([email protected]). Research supported by ONR award N00014-16-1-2242.

July 31, 2018

Level-set methods for convex optimization are predicated on the idea that certain problems can be parameterized so that their solutions can be recovered as the limiting process of a root-finding procedure. This idea emerges time and again across a range of algorithms for convex problems. Here we demonstrate that strong duality is a necessary condition for the level-set approach to succeed. In the absence of strong duality, the level-set method identifies ε-infeasible points that do not converge to a feasible point as ε tends to zero. The level-set approach is also used as a proof technique for establishing sufficient conditions for strong duality that are different from Slater's constraint qualification.

constrained optimization | duality | level-set methods

Duality in convex optimization may be interpreted as a notion of the sensitivity of an optimization problem to perturbations of its data. Similar notions of sensitivity appear in numerical analysis, where the effects of numerical errors on the stability of the computed solution are of central concern. Indeed, backward-error analysis (1, §1.5) describes the related notion that computed approximate solutions may be considered as exact solutions of perturbations of the original problem. It is natural, then, to ask if duality can help us to understand the behavior of a class of numerical algorithms for convex optimization. In this paper, we describe how the level-set method (2–4) produces an incorrect solution when applied to a problem for which strong duality fails to hold. In other words, the level-set method cannot succeed if there does not exist a dual pairing that is tight. This failure of strong duality indicates that the stated optimization problem is brittle, in the sense that its value as a function of small perturbations to its data is discontinuous; this violates a vital assumption needed for the level-set method to succeed.

Consider the convex optimization problem

    minimize_{x ∈ X} f(x) subject to g(x) ≤ 0,    [P]

where f and g are closed proper convex functions that map R^n to the extended real line R ∪ {∞}, and X is a convex set in R^n. Let the optimal value be τp* := inf (P) < ∞, so that (P) is feasible. In the context of level-set methods, we may think of g(x) ≤ 0 as representing a constraint that poses a computational challenge. For example, there may not exist any efficient algorithm that can compute the projection onto the constraint set { x ∈ X | g(x) ≤ 0 }. In many important cases, the objective function has a useful structure that makes it computationally convenient to swap the roles of the objective f with the constraint g, and to instead solve the level-set problem

    minimize_{x ∈ X} g(x) subject to f(x) ≤ τ,    [Qτ]

where τ is an estimate of the optimal value τp*. If τ ≈ τp*, the level-set constraint f(x) ≤ τ ensures that a solution xτ ∈ X of this problem causes f(xτ) to have a value near τp*. If, additionally, g(xτ) ≤ 0, then xτ is a nearly optimal and feasible solution for (P). The trade-off for this potentially more convenient problem is that we must compute a sequence of parameters τk that converges to the optimal value τp*.

Define the optimal-value function, or simply the value function, of (Qτ) by

    v(τ) = inf_{x ∈ X} { g(x) | f(x) ≤ τ }.    [1]

This definition suggests that if the constraint in (P) is active at a solution (i.e., g(x) = 0), then τp* is a root of the equation

    v(τ) = 0,

and in particular, the leftmost root:

    τp* = inf { τ | v(τ) = 0 }.    [2]

The surprise is that this is not always true.

Fix τd* to be the optimal value of the Lagrange-dual problem (5, §I) to (P). If strong duality fails for (P), i.e., τp* > τd*, then in fact it holds that

    τd* = inf { τ | v(τ) = 0 } < τp*,

and the inequality is tight only when strong duality holds, i.e., τp* = τd*. Thus, a root-finding algorithm, such as bisection or Newton's method, implemented so as to yield the leftmost root of the equation v(τ) = 0, converges to a value of τ that prevents (Qτ) from attaining an optimal solution. This idea is illustrated in Figure 1 and manifested by Example 2.

Our discussion identified τd* as the optimal value of the Lagrange dual problem. However, the same conclusion holds when we consider any dual pairing that arises from a convex perturbation. The following theorem, proven in a later section, encapsulates the main result of this paper. The inequality on v in the following statement accommodates the possibility that the constraint in (P) is not active at the solution, and so applies more generally than the preceding discussion.

Theorem 1. For the value function v,

    τd* = inf { τ | v(τ) ≤ 0 } and v(τ) ≤ 0 for all τ > τd*.

Note that the theorem does not address conditions under which v(τp*) ≤ 0, which is true if and only if the solution set argmin (P) is not empty. In particular, any x* ∈ argmin (P) is a solution of (Qτ) for τ = τp*, and hence v(τp*) ≤ 0. However, if argmin (P) is empty, then there is no solution to (Qτ) and hence v(τp*) = +∞.
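The leftmost-root characterization in Eq. (2) is easy to exercise on a toy instance whose value function is available in closed form. The following sketch is our own illustration, not part of the paper: it takes f(x) = x, g(x) = x² − 1, and X = R, so that v(τ) = τ² − 1 for τ ≤ 0 and v(τ) = −1 otherwise, and the leftmost root is the optimal value τp* = −1, which bisection recovers.

```python
def v(tau):
    # value function v(tau) = inf { x**2 - 1 : x <= tau } for the toy
    # instance f(x) = x, g(x) = x**2 - 1, X = R (available in closed form)
    return tau**2 - 1.0 if tau <= 0 else -1.0

def leftmost_root(v, lo, hi, tol=1e-10):
    # bisection for the leftmost root of v(tau) = 0, assuming v is
    # nonincreasing with v(lo) > 0 >= v(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if v(mid) > 0 else (lo, mid)
    return hi

tau = leftmost_root(v, lo=-10.0, hi=0.0)
print(tau)  # close to tau_p* = -1
```

Any monotone root-finder could replace bisection here; the point is only that, when strong duality holds, the leftmost root of v coincides with τp*.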

Level-set methods

The technique of exchanging the roles of the objective and constraint functions has a long history. For example, the isoperimetric problem, which dates back to the second century B.C.E., seeks the maximum area that can be circumscribed by a curve of fixed length (6). The converse problem seeks the minimum-length curve that encloses a certain area. Both problems yield the same circular solution. The mean-variance model of financial portfolio optimization pioneered by Markowitz (7) is another example. It can be phrased as either the problem of allocating assets that minimize risk (i.e., variance) subject to a specified mean return, or as the problem of maximizing the mean return subject to a specified risk. The correct choice of parameters causes both problems to have the same solution.

The idea of rephrasing an optimization problem as a root-finding problem appears often in the optimization literature. The celebrated Levenberg-Marquardt algorithm (8, 9), and trust-region methods (10) more generally, use a root-finding procedure to solve a parameterized version of the optimization problem. The widely-used SPGL1 software package for sparse optimization (11) implements the level-set method for obtaining sparse solutions of linear least-squares and underdetermined linear systems (12, 13).

In practice, only an approximate solution of the problem (P) is required, and the level-set method can be used to obtain an approximate root that satisfies v(τ) ≤ ε. The solution x ∈ X of the corresponding level-set problem (Qτ) is super-optimal and ε-infeasible:

    f(x) ≤ τp* and g(x) ≤ ε.    [3]

Aravkin et al. (4) give a full description of the general algorithm, together with a complexity analysis showing that only O(log(1/ε)) approximate evaluations of v are required to obtain an ε-infeasible solution, where the subproblem accuracy is proportional to the final required accuracy specified by Eq. (3).

Problem formulation. The formulation (P) is very general, even though the constraint g(x) ≤ 0 represents only a single function of the full constraint set represented by X. There are various avenues for reformulating any combination of constraints that lead to this formulation. For instance, Example 2 demonstrates how multiple linear constraints of the form Ax = b can be represented as a constraint on the norm of the residual, i.e., g(x) = ||Ax − b|| ≤ 0. More generally, for any set of constraints c(x) ≤ 0, where c = (c_i) is a vector of functions c_i, we may set g(x) = ρ(max{0, c(x)}) for any convenient nonnegative function ρ that vanishes only at the origin.

Fig. 1. A sketch of the value function v for Example 2; cf. Eq. (9). The value function v(τ) vanishes for all τ ≥ τd*, where τd* < τp*; the gap between τd* and τp* separates the super-optimal, infeasible values of τ from the sub-optimal, feasible ones. This causes the level-set method to converge to an incorrect solution when strong duality fails to hold.

Examples

Some examples will best illustrate the behavior of the value function. The first two are semidefinite programs (SDPs) for which strong duality fails to hold. These demonstrate that the level-set method can produce diverging iterates. The last example is a 1-norm regularized least-squares problem. It confirms that the level-set method succeeds in that important case, often used in solving problems that arise in applications of sparse recovery (14) and compressed sensing (15, 16).

For the first two examples below, let x_ij denote the (i, j)th entry of the n-by-n symmetric matrix X = (x_ij). The notation X ⪰ 0 denotes the requirement that X is symmetric positive semidefinite.

Example 1 (SDP with infinite gap). Consider the 2×2 SDP

    minimize_{X ⪰ 0} −2 x21 subject to x11 = 0,    [4]

whose solution and optimal value are given, respectively, by

    X* = [ 0 0 ; 0 0 ] and τp* = 0.

The Lagrange dual is a feasibility problem:

    maximize_{y ∈ R} 0 subject to [ y −1 ; −1 0 ] ⪰ 0.

Because the dual problem is infeasible, we assign the dual optimal value τd* = −∞. Thus, τd* = −∞ < τp* = 0, and this dual pairing fails to have strong duality.

The application of the level-set method to the primal problem Eq. (4) can be accomplished by defining the functions

    f(X) := −2 x21 and g(X) := |x11|,

which together define the value function of the level-set problem (Qτ):

    v(τ) = inf_{X ⪰ 0} { |x11| | −2 x21 ≤ τ }.    [5]

Because X* is primal optimal, v(τ) = 0 for all τ ≥ τp* = 0. Now consider the parametric matrix

    X(τ, ε) := [ ε −τ/2 ; −τ/2 τ²/(4ε) ] for all τ < 0 and ε > 0,

which is feasible for the level-set problem Eq. (5), i.e., v(τ) is finite. The level-set problem clearly has a zero lower bound

that can be approached by sending ε ↓ 0. Thus, v(τ) = 0 for all τ < 0.

In summary, v(τ) = 0 for all τ, and so v(τ) has roots less than the true optimal value τp*. Furthermore, for τ < 0, there is no primal attainment for Eq. (5), because lim_{ε ↓ 0} X(τ, ε) does not exist.

Example 2 (SDP with finite gap). Consider the 3×3 SDP

    minimize_{X ⪰ 0} −2 x31 subject to x11 = 0, x22 + 2 x31 = 1.    [6]

The positive semidefinite constraint on X, together with the constraint x11 = 0, implies that x31 must vanish. Thus, the solution and optimal value are given, respectively, by

    X* = [ 0 0 0 ; 0 1 0 ; 0 0 0 ] and τp* = 0.    [7]

The Lagrange dual problem is

    maximize_{y ∈ R²} −y2 subject to [ y1 0 y2−1 ; 0 y2 0 ; y2−1 0 0 ] ⪰ 0.

The dual constraint requires y2 = 1, and thus the optimal dual value is τd* = −1 < 0 = τp*.

For the application of the level-set method to the primal problem Eq. (6), we can assign

    f(X) := −2 x31 and g(X) := x11² + (x22 + 2 x31 − 1)²,    [8]

which together define the value function

    v(τ) = inf_{X ⪰ 0} { x11² + (x22 + 2 x31 − 1)² | −2 x31 ≤ τ }.    [9]

As in Example 1, any convex nonnegative function g that vanishes on the feasible set could have been used to define v. It follows from Eq. (7) that v(τ) = 0 for all τ ≥ 0. Also, it can be verified that v(τ) = 0 for all τ ≥ τd* = −1. To understand this, first define the parametric matrix

    X_ε = [ ε 0 1/2 ; 0 0 0 ; 1/2 0 1/(4ε) ] with ε > 0,

which is feasible for the level-set problem Eq. (9), and has objective value g(X_ε) = ε². Because X_ε is feasible for all positive ε, the optimal value vanishes: v(τ) = inf { g(X_ε) | ε > 0 } = 0. Moreover, the set of minimizers for Eq. (9) is empty for all τ ∈ (−1, 0). Figure 1 illustrates the behavior of this value function.

The level-set method fails because the root of v(τ) does not identify the optimal primal value τp*, and instead identifies the optimal dual value τd* < τp*.

The proof of Theorem 1 reveals that the behavior exhibited by Examples 1 and 2 stems from failure of strong duality with respect to perturbations in the linear constraints. In the case of Example 2, we can produce a sequence of matrices X_ε, each of which is ε-infeasible with respect to the infeasibility measure g; cf. Eq. (8). However, the limit as ε ↓ 0 does not produce a feasible point, and the limit does not even exist because the entry x33 of X_ε goes to infinity.

Example 3 (Basis pursuit denoising (14, 17)). The level-set method implemented in the SPGL1 software package solves the 1-norm regularized least-squares problem

    minimize_x ||x||₁ subject to ||Ax − b||₂ ≤ σ

for any value of σ ≥ 0, assuming that the problem remains feasible. (The case σ = 0 is important, as it accommodates the case in which we seek a sparse solution to the under-determined linear system Ax = b.) The algorithm approximately solves a sequence of flipped problems

    minimize_x ||Ax − b||₂ subject to ||x||₁ ≤ τk,

where τk is chosen so that the corresponding solution xk satisfies ||A xk − b||₂ ≈ σ. Strong duality holds because the domains of the nonlinear functions (i.e., the 1- and 2-norms) cover the whole space; see Rockafellar and Wets (5, Theorem 11.39), which is also summarized by Theorem 3.

Equivalence between value functions. The level-set method is based on a kind of inverse-function relationship between the value function Eq. (1) of the flipped problem and the value function of the perturbed original problem

    p(σ) = inf_{x ∈ X} { f(x) | g(x) ≤ σ }.

Clearly, τp* = p(0). Aravkin et al. (18) describe the formal relationship between the value functions v and p, and their respective solutions. We summarize the key aspects here.

Use argmin v(τ) and argmin p(σ), respectively, to denote the set of solutions to the optimization problem underlying the value functions v and p (which may be empty). Thus, for example, if p(σ) < ∞,

    argmin p(σ) = { x ∈ X | f(x) = p(σ), g(x) ≤ σ },

and argmin p(σ) is empty otherwise. Clearly, argmin p(0) = argmin (P). Because p is defined via an infimum, argmin p(σ) can be empty even if p is finite, in which case we say that the optimal value p(σ) is not attained.

Let S be the set of parameters τ for which the level-set constraint f(x) ≤ τ of (Qτ) holds with equality. Formally,

    S = { τ ≤ +∞ | ∅ ≠ argmin v(τ) ⊆ { x ∈ X | f(x) = τ } }.

Theorem 2 (Value-function inverses (18, Theorem 2.1)). For every τ ∈ S, the following statements hold:

(a) (p ∘ v)(τ) = τ;
(b) argmin v(τ) = argmin p(v(τ)) ⊆ { x ∈ X | f(x) = τ }.

The condition τ ∈ S means that the constraint of the level-set problem (Qτ) must be active in order for the result to hold. The following example illustrates the need for this condition.

Example 4. The univariate problem

    minimize_{x ∈ R} |x| subject to |x| − 1 ≤ 0

has the obvious solution x* = 0 and optimal value τp* = 0, and an inactive constraint. The corresponding level-set problem

    minimize_{x ∈ R} |x| − 1 subject to |x| ≤ τ

shares the same solution x* when τ = τp*. However, the corresponding value function for the level-set problem is

    v(τ) = −1 if τ ≥ 0, and v(τ) = +∞ if τ < 0,

so it has the wrong optimal value.

This theorem is symmetric, and holds if the roles of f and g, and of p and v, are reversed. We note that this result holds even if any of the underlying objects that define (P) are not convex.

Part (b) of the theorem confirms that if τp* ∈ S, i.e., the constraint g(x) ≤ 0 holds with equality at a solution of (P), then solutions of the level-set problem coincide with solutions of the original problem defined by p(0). More formally,

    argmin v(τp*) = argmin (P).

Again consider Example 2, where we set τ = −1/2, which falls midway in the interval (τd*, τp*) = (−1, 0). Because the solution set argmin v(τ) is empty, τ ∉ S. Thus,

    (p ∘ v)(τ) = p(0) = 0 ≠ τ,

and the level-set method fails.

In order to establish an inverse-function-like relationship between the value functions p and v that always holds for convex problems, we provide a modified definition of the epigraphs for v and p.

Definition 1 (Value function epigraph). The value function epigraph of the optimal value function p(σ) is

    vfepi p = { (σ, τ) | ∃ x ∈ X, f(x) ≤ τ, g(x) ≤ σ }.

This definition is almost exactly the same as the regular definition for the epigraph of a function, given by

    epi p = { (σ, τ) | p(σ) ≤ τ },

except that if τ = p(σ) but argmin p(σ) is empty, then (σ, τ) ∉ vfepi p. The result below follows immediately from the definition of the value function epigraph. It establishes that Eq. (2) holds if (Qτ) has a solution that attains its optimal value (as opposed to relying on the infimal operator to achieve that value).

Proposition 1. For the value functions p and v,

    (σ, τ) ∈ vfepi p ⟺ (τ, σ) ∈ vfepi v.

Duality in convex optimization

Duality in convex optimization can be understood as describing the behavior of an optimization problem under perturbation to its data. From this point of view, dual variables describe the sensitivity of the problem's optimal value to that perturbation. The description that we give here summarizes a well-developed theory fully described by Rockafellar and Wets (5). We adopt a geometric viewpoint that we have found helpful for understanding the connection between duality and the level-set method.

For this section only, consider the generic convex optimization problem

    minimize_x f(x),

where f: R^n → R ∪ {∞} is an arbitrary closed proper convex function. The perturbation approach is predicated on fixing an arbitrary convex function F(x, u): R^n × R^m → R ∪ {∞} with the property that

    F(x, 0) = f(x) ∀x.

(Example 5 below describes an instance of such a function.) The choice of F determines the perturbation function

    p(u) := inf_x F(x, u),

which describes how the optimal value of f changes under a perturbation u. We seek the behavior of the perturbation function about the origin, at which the value of p coincides with the optimal value τp*, i.e., p(0) = τp*.

A dual optimization problem is constructed by considering the set of affine functions u ↦ ⟨µ, u⟩ + q, parameterized by (µ, q) ∈ R^m × R, that minorize p:

    ⟨µ, u⟩ + q ≤ p(u) ∀u.    [10]

The tightest lower minorant is obtained by maximizing the left-hand side ⟨µ, u⟩ + q over (µ, q) subject to the constraint above. This can be accomplished by first eliminating q and setting it to

    q ≡ qµ := −sup_u { ⟨µ, u⟩ − p(u) } ≡ −p*(µ),

where

    p*(µ) = sup_u { ⟨µ, u⟩ − p(u) }

is the conjugate of p. Figure 2(a) illustrates how the choice of qµ causes the affine function to support the epigraph of p. Thus, Eq. (10) is equivalent to the expression

    ⟨µ, u⟩ − p*(µ) ≤ p(u) ∀u.

Now take the supremum over µ of the left-hand side, which results in the value

    p**(u) = sup_µ { ⟨µ, u⟩ − p*(µ) }.

It is thus evident that the biconjugate p**, which is necessarily closed convex, provides a global lower envelope for p, i.e.,

    p**(u) ≤ p(u) ∀u.

This inequality is tight at a point u, i.e., p**(u) = p(u), if and only if p is lower-semicontinuous at u. Because of the connection between lower semicontinuity and the closure of the epigraph, we say that p is closed at such points u; see Rockafellar (19, Theorem 7.1).

Weak duality expresses the following relationship between the optimal primal and dual values:

    τd* ≡ p**(0) ≤ p(0) ≡ τp*.    [11]

Strong duality holds when τp* = τd*, which is equivalent to the closure of p at the origin. The functions p and p**, evaluated at the origin, define dual pairs of optimization problems given by

    p(0) = inf_x F(x, 0) and p**(0) = sup_y −F*(0, y).    [12]
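The gap τd* ≡ p**(0) < p(0) ≡ τp* in Eq. (11) can be observed numerically by discretizing the conjugate computations. The sketch below is our own illustration, not from the paper: it uses the convex but non-closed function with p(u) = +∞ for u < 0, p(0) = 1, and p(u) = 0 for u > 0, approximates p*(µ) = sup_u { µu − p(u) } on a finite grid of slopes, and evaluates the biconjugate at the origin; the grid sizes are arbitrary choices.

```python
import numpy as np

def p(u):
    # convex, but not closed at the origin: p(0) = 1, yet p(u) -> 0 as u -> 0+
    if u < 0:
        return np.inf
    return 1.0 if u == 0 else 0.0

U = np.concatenate(([0.0], np.geomspace(1e-8, 10.0, 2001)))  # grid over dom p
M = np.linspace(-50.0, 50.0, 4001)                           # grid of slopes mu

pU = np.array([p(u) for u in U])
p_star = np.array([np.max(mu * U - pU) for mu in M])  # conjugate p*(mu) on the grid
p_bistar0 = np.max(-p_star)                           # biconjugate p**(0)

print(p(0.0), p_bistar0)  # 1.0 versus roughly 0: a duality gap at the origin
```

The biconjugate sees only the closure of the epigraph, so it reports the value 0 that p approaches from the right, not the value 1 that p actually takes at the origin.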

Fig. 2. The relationship between the primal perturbation p(u) and a single instance (with slope µ and intercept qµ) of the uncountably many minorizing affine functions that define the dual problem. The panel on the left (a, non-optimal dual) depicts a non-optimal supporting hyperplane that generates a value qµ < τp*; the panel on the right (b, optimal dual) depicts an optimal supporting hyperplane that generates a slope µ that causes qµ ≡ τd* = τp*.
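The supporting-hyperplane picture in Fig. 2 can be reproduced numerically when p is closed. In the sketch below (our own toy, not from the paper), p(u) = −√(1 + u) on u ≥ −1 is the perturbation function of minimizing f(x) = x subject to x² − 1 ≤ u; a grid search over slopes µ < 0 shows that the best intercept qµ = −p*(µ) is attained near µ = −1/2 with value −1 = τd* = τp*. The grid bounds and resolutions are arbitrary choices.

```python
import numpy as np

# perturbation function p(u) = inf { x : x**2 - 1 <= u } = -sqrt(1 + u)
U = np.linspace(-1.0, 50.0, 20001)
pU = -np.sqrt(1.0 + U)

mus = np.linspace(-20.0, -1e-3, 2001)               # candidate slopes mu < 0
q = np.array([-np.max(mu * U - pU) for mu in mus])  # intercepts q_mu = -p*(mu)

i = np.argmax(q)      # the best minorant supports epi p at the origin
print(mus[i], q[i])   # roughly mu = -0.5 and q_mu = -1 = tau_p* = tau_d*
```

Because this p is closed at the origin, the best intercept equals p(0) itself: the optimal supporting hyperplane of Fig. 2(b) exists and there is no duality gap.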

The expression for p**(0) is a straightforward application of the conjugate operator.

The following theorem confirms the weak duality that always holds between the optimal values τp* and τd*. It also establishes sufficient conditions that guarantee when the perturbation function p is closed at the origin, and hence strong duality holds. The complete version of this theorem includes statements regarding the relationships between the subdifferentials of p and q at the origin (5, Theorem 11.39).

Theorem 3 (Weak and strong duality). Consider the primal-dual pair Eq. (12).

(a) [Weak duality] The inequality τp* ≥ τd* always holds.
(b) [Strong duality] If 0 ∈ int dom p, then τp* = τd*.

Example 5. Consider the convex optimization problem

    minimize_x f(x) subject to c(x) ≤ 0, Ax = b,    [13]

where c is a vector-valued convex function and A is a matrix. Introduce perturbations u1 and u2 to the right-hand sides of the constraints, which gives rise to Lagrange duality, and corresponds to the perturbation function

    p(u1, u2) = inf_x { f(x) | c(x) ≤ u1, Ax − b = u2 }.

One valid choice for the value function that corresponds to swapping both constraints with the objective in Eq. (13) can be expressed as

    v(τ) = inf_{x, u1, u2} { (1/2) ||[u1]₊||² + (1/2) ||u2||² | f(x) ≤ τ, c(x) ≤ u1, Ax − b = u2 },

where the operator [u1]₊ = max{0, u1} is taken componentwise on the elements of u1. This particular formulation of the value function makes explicit the connection to the perturbation function. We may thus interpret the value function as giving the minimal perturbation that corresponds to an objective value less than or equal to τ.

The level-set method gives a false root

We discuss some intuition as to why we can expect level-set methods to return incorrect solutions when applied to problems without strong duality. Apply the level-set method to (P) and, for simplicity, suppose that (P) attains its optimal value and that the constraint g(x) ≤ 0 is active at any solution. Consider the perturbation function

    p(u) = inf_{x ∈ X} { f(x) | g(x) ≤ u },    [14]

and the value function v(τ) given by Eq. (1).

First, suppose that strong duality holds for the problem under the given perturbation, so that p(0) = p**(0), which means that the perturbation function p is closed at the origin. We sketch an example p(u) and the corresponding v(τ) in the top row of Figure 3. To understand this picture, first consider the value τ1 < τp*, as in the top row of Figure 3. It is evident that v(τ1) is positive, because otherwise there must exist a vector x ∈ X that is super-optimal (f(x) ≤ τ1 < τp*) and feasible (g(x) ≤ 0), which contradicts the definition of τp*. It then follows that the value u := v(τ1) yields p(u) = τ1. For τ2 > τp*, any solution to the original problem would be feasible (therefore requiring no perturbation u) and would achieve objective value p(0) = τp* < τ2. Furthermore, notice that as τ1 → τp*, the value p(u1) varies continuously in τ1, where u1 is the smallest root of p(u) = τ1.

Next consider the second row of Figure 3. In this case, strong duality fails, which means that

    lim_{u ↓ 0} p(u) = τd* ≠ p(0).

With τ = τ1, we have v(τ1) > 0. With τ = τ3 > τp*, we have v(τ3) = 0, because any solution to (P) causes (Qτ) to have zero value. But we see that for τd* < τ2 < τp*, v(τ2) = 0, because for any positive ε there exists positive u < ε such that p(u) ≤ τ2. Even though there is no feasible point that achieves a super-optimal value f(x) ≤ τ2 < τp*, for any positive ε there exists an ε-infeasible point that achieves that objective value.

Lagrange duality framework. Define

    F(x, u) = f(x) if g(x) ≤ u, and F(x, u) = +∞ otherwise.

Fig. 3. The perturbation function p(u) and corresponding level-set value function v(τ) for problems with strong duality (top row; panels (a) and (b)) and no strong duality (bottom row; panels (c) and (d)): (a) perturbation function p under strong duality; (b) level-set value function v corresponding to (a); (c) perturbation function p with no strong duality; (d) level-set value function v corresponding to (c). Panel (c) illustrates the case when strong duality fails and the graph of p is open at the origin, which implies that τd* < τp* ≡ p(0).
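The top row of Figure 3 is the regime in which the level-set method behaves as intended, and Example 3 is its flagship instance. The sketch below is our own plain-numpy illustration, not the actual SPGL1 implementation: it evaluates the flipped-problem value function φ(τ) = min { ||Ax − b||₂ : ||x||₁ ≤ τ } by projected gradient descent with an ℓ1-ball projection, and locates the root φ(τ) = σ by bisection on a small synthetic instance. The problem sizes, random seed, target σ, and iteration counts are all arbitrary choices made for the demonstration.

```python
import numpy as np

def proj_l1(w, tau):
    # Euclidean projection onto the l1-ball { x : ||x||_1 <= tau }
    if np.abs(w).sum() <= tau:
        return w.copy()
    u = np.sort(np.abs(w))[::-1]
    cs = np.cumsum(u)
    k = np.nonzero(u * np.arange(1, w.size + 1) > cs - tau)[0][-1]
    theta = (cs[k] - tau) / (k + 1.0)
    return np.sign(w) * np.maximum(np.abs(w) - theta, 0.0)

def phi(A, b, tau, iters=8000):
    # phi(tau) = min ||Ax - b||_2 s.t. ||x||_1 <= tau, by projected gradient
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x = proj_l1(x - step * (A.T @ (A @ x - b)), tau)
    return np.linalg.norm(A @ x - b)

rng = np.random.default_rng(0)
A = rng.standard_normal((10, 20))
x0 = np.zeros(20)
x0[:3] = [1.0, -2.0, 1.5]
b = A @ x0
sigma = 1.0                       # target misfit: find tau with phi(tau) = sigma

lo, hi = 0.0, np.abs(x0).sum()    # bracket: phi(lo) = ||b|| > sigma >= phi(hi)
for _ in range(40):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if phi(A, b, mid) > sigma else (lo, mid)

res = phi(A, b, hi)
print(hi, res)                    # root tau with phi(tau) close to sigma
```

SPGL1 itself replaces bisection with an inexact Newton iteration on the root equation φ(τ) = σ (12, 13), which requires far fewer evaluations of the flipped problem; bisection is used above only because it makes the monotone root-finding structure explicit.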

The function F is convex in both of its arguments, and gives rise to the perturbation function p defined by Eq. (14). In summary, this choice of perturbation leads to the pair

    p(u) = inf_{x ∈ X} { f(x) | g(x) ≤ u },    [15a]
    v(τ) = inf_{x ∈ X} { g(x) | f(x) ≤ τ },    [15b]

as defined by Eqs. (1) and (14).

Proof of Theorem 1. We first prove the second result, that v(τ) ≤ 0 if τ > τd*. Suppose that strong duality holds, i.e., τp* = τd*. Then the required result is immediate because if τp* is the optimal value, then for any τ > τp* there exists feasible x such that f(x) ≤ τ.

Suppose that strong duality does not hold, i.e., τp* > τd*. If τ > τp*, it is immediate that v(τ) ≤ 0. Assume, then, that τ ∈ (τd*, τp*]. Note that the conditions g(x) ≤ u and f(x) ≤ τ are equivalent to F(x, u) ≤ τ. We will therefore prove that

    ∀ε > 0, ∃x ∈ X such that F(x, u) ≤ τ, u ≤ ε,    [16]

which is equivalent to the required condition v(τ) ≤ 0. It follows from the convexity of epi p and from Eq. (11) that (0, τd*) ∈ epi p** = cl epi p. Thus,

    ∀η > 0, ∃(u, ω) ∈ epi p such that ||(u, ω) − (0, τd*)|| < η.

Moreover,

    lim_{ε ↓ 0} inf { p(u) | |u| ≤ ε } =(i) lim_{ε ↓ 0} inf { p**(u) | |u| ≤ ε } =(ii) p**(0) =(iii) τd*,    [17]

where equality (i) follows from the fact that p(u) = p**(u) for all u ∈ dom p, equality (ii) follows from the closure of p**, and equality (iii) follows from Eq. (11). This implies that

    ∀η > 0, ∃u ∈ dom p such that ||(u, p(u)) − (0, τd*)|| < η.

For any fixed positive ε, define µ = min{ ε, (1/4)(τ − τd*) }. Choose û ∈ dom p such that ||(û, p(û)) − (0, τd*)|| < µ, and so

    µ > ||(û, p(û)) − (0, τd*)|| ≥ max{ ||û||, |p(û) − τd*| }.

Thus,

    p(û) < τd* + µ.    [18]

Moreover, it follows from the definition of p(û), cf. Eq. (14), that

    ∀ν > 0, ∃x ∈ X such that F(x, û) ≤ p(û) + ν.

Choose ν = µ, and so there exists x̂ such that F(x̂, û) ≤ p(û) + µ. Together with Eq. (18), we have

    f(x̂) ≤ p(û) + µ < τd* + 2µ ≤ τ.

Therefore, for each ε > 0, we can find a pair (x̂, û) that satisfies Eq. (16), which completes the proof of the second result.

Next we prove the first result, which is equivalent to proving that v(τ) > 0 if τ < τd*, because v(τ) is convex. Observe that τ < τd* ≡ p**(0) is equivalent to (0, τ) ∉ cl epi p, which implies that

    0 < inf_u { |u| | (u, τ) ∈ cl epi p }
      = inf_u { |u| | (u, τ) ∈ epi p }    [19]
      = inf_u { |u| | ∃x ∈ X such that F(x, u) ≤ τ } = v(τ),

which completes the proof. ∎
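The ε-infeasible construction in the proof is exactly the mechanism at work in Example 2. The short check below (our own, using numpy) verifies that the parametric matrix X_ε from that example is positive semidefinite and super-optimal, with infeasibility g(X_ε) = ε² that vanishes while the entry x33 diverges.

```python
import numpy as np

def X_eps(eps):
    # parametric feasible matrix for Example 2 (SDP with finite gap)
    return np.array([[eps, 0.0, 0.5],
                     [0.0, 0.0, 0.0],
                     [0.5, 0.0, 1.0 / (4.0 * eps)]])

for eps in [1e-1, 1e-2, 1e-4]:
    X = X_eps(eps)
    f = -2.0 * X[2, 0]                                       # objective: -1 < tau_p* = 0
    g = X[0, 0] ** 2 + (X[1, 1] + 2.0 * X[2, 0] - 1.0) ** 2  # infeasibility: eps**2
    assert np.linalg.eigvalsh(X).min() >= -1e-9              # X_eps is PSD
    print(eps, f, g, X[2, 2])                                # x33 = 1/(4 eps) diverges
```

Each X_ε is super-optimal for any τ in (τd*, τp*) = (−1, 0) and nearly feasible, yet the family has no limit point: precisely the failure mode that Theorems 1 and 4 attribute to the duality gap.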

6 Estrin and Friedlander General duality framework. We generalize the notion of the We can use Proposition 2 to establish that certain op- level-set method to arbitrary perturbations and therefore more timization problems that do not satisfy a Slater constraint general notions of duality. In this case we are interested in qualification still enjoy strong duality. As an example, consider the value function pair the conic optimization problem

p(u) = inf F (x, u), [20a] minimize c, x subject to x = b, x , [21] x x h i A ∈ K ∈X v(τ) = inf u F (x, u) τ , [20b] where : E is a linear map between Euclidean x { k k | ≤ } A E1 → 2 ∈X spaces and , and is a closed proper convex E1 E2 K ⊆ E1 where is any norm. Because p is parameterized by a vector cone. This wide class of problems includes linear program- k·k u, we must consider the norm of the perturbation, rather than ming (LP), second-order programming (SOCP), and SDPs, just its value. Therefore, v(τ) is necessarily non-negative. and has many important scientific and engineering applica- We are thus interested in the leftmost root of the equation tions (21). If c is contained in the interior of the dual cone ∗ = y x, y 0 x , then c, x > 0 for all fea- v(τ) = 0, rather than an inequality as in Theorem 1. K { ∈ E1 | h i ≥ ∀ ∈ K } h i sible x . Equivalently, the function f(x) := c, x + δ (x) ∈ K h i K Theorem 4. For the functions p and v defined by Eq. (20), is coercive. Thus, Eq. (21) is equivalent to the problem

τd∗ = inf { τ | v(τ) = 0 } and v(τ) = 0 for all τ > τd∗.

The proof is almost identical to that of Theorem 1, except that we treat u as a vector, and replace u by ‖u‖ in Eqs. (16), (17), and (19).

Theorems 1 and 4 imply that v(τ) ≤ 0 for all values larger than the optimal dual value (the inequality τ > τd∗ is strict, as v(τd∗) may be infinite). Thus, if strong duality does not hold, then v(τ) identifies the wrong optimal value for the original problem being solved. This means that the level-set method may provide a point that is arbitrarily close to feasibility, but remains at least a fixed distance away from the true solution, independent of how close to feasibility the returned point may be.

Sufficient conditions for strong duality

The condition that 0 ∈ dom p may be interpreted as Slater's constraint qualification (20, §3.2), which in the context of (P) requires that there exist a point x̂ in the domain of f for which g(x̂) < 0. This condition is sufficient to establish strong duality. Here we show how Theorem 1 can be used as a device to characterize an alternative set of sufficient conditions that continue to ensure strong duality even for problems that do not satisfy Slater's condition.

Proposition 2. Problem (P) satisfies strong duality if either one of the following conditions holds:

(a) the objective f is coercive, i.e., f(x) → ∞ as ‖x‖ → ∞;

(b) X is compact.

Proof. Consider the level-set problem (Qτ) and its corresponding optimal-value function v(τ) given by Eq. (1). In either case (a) or (b), the feasible set { x ∈ X | f(x) ≤ τ } of Eq. (1) is compact, because either X is compact or the level sets of f are compact. Therefore, (Qτ) always attains its minimum for all τ ≥ inf { f(x) | x ∈ X }.

Suppose strong duality does not hold. Theorem 1 then confirms that there exists a parameter τ ∈ (τd∗, τp∗) such that v(τ) = 0. However, because (Qτ) always attains its minimum, there must exist a point x̂ ∈ X such that f(x̂) ≤ τ < τp∗ and g(x̂) ≤ 0, which contradicts the fact that τp∗ is the optimal value of (P). We have therefore established that τd∗ = τp∗, and hence that (P) satisfies strong duality.

minimize_x f(x) subject to Ax = b,

which has a coercive objective. Thus, Part (a) of Proposition 2 applies, and strong duality holds.

A concrete application of this model problem is the SDP relaxation of the celebrated phase-retrieval problem (22, 23):

minimize_X tr(X) subject to AX = b, X ⪰ 0, [22]

where K is now the cone of Hermitian positive semidefinite matrices (i.e., all the eigenvalues are real-valued and nonnegative) and c = I is the identity matrix, so that ⟨c, X⟩ = tr(X). In that setting, the authors of (22) prove that, with high probability, the feasible set of Eq. (21) is a rank-1 singleton (the desired solution), and thus we cannot use Slater's condition to establish strong duality. However, because K is self-dual (24, Example 2.24), clearly c ∈ int K, and by the discussion above, we can use Proposition 2 to establish that strong duality holds for Eq. (22).

A consequence of Proposition 2 is that it is possible to modify (P) in order to guarantee strong duality. In particular, we may regularize the objective, and instead consider a version of the problem with objective f(x) + µ‖x‖, where the parameter µ controls the degree of regularization contributed by the term ‖x‖. If, for example, f is bounded below on X, then the regularized objective is coercive, and Proposition 2 asserts that the revised problem satisfies strong duality. Thus, the optimal-value function of the level-set problem has the correct root, and the level-set method is applicable. For toy problems such as Examples 1 and 2, where all of the feasible points are optimal, regularization would not perturb the solution; in general, however, we expect that the regularization will perturb the resulting solution, and in some cases this may be the desired outcome.

ACKNOWLEDGMENTS. The authors are indebted to Professor Bruno F. Lourenço of Seikei University for fruitful discussions that followed the second author's course on first-order methods at the summer school associated with the 2016 International Conference on Continuous Optimization, held in Tokyo. Professor Lourenço asked if level-set methods could be applied to solve degenerate SDPs. His thinking was that the level-set problems (Qτ) might satisfy Slater's constraint qualification even if the original problem (P) did not, and therefore the level-set method might be useful as a way to alleviate numerical difficulties that can arise when an algorithm is applied directly to an SDP without strong duality. The conclusion of this paper suggests that this is not always the case.
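The root-finding mechanism that underlies the level-set method can be illustrated on a toy instance of (P). The sketch below is ours, not the paper's: the instance f(x) = x², g(x) = 1 − x, X = R is hypothetical (it satisfies Slater's condition, since g(2) = −1 < 0, so strong duality holds and the leftmost root of v equals τp∗ = 1), and the names v and level_set_root are our own.

```python
import math

# Toy instance of (P): minimize f(x) = x^2 subject to g(x) = 1 - x <= 0,
# with optimal value tau_p* = 1, attained at x = 1.
# Its level-set value function (Eq. [1]) is computable in closed form:
#   v(tau) = inf { g(x) : f(x) <= tau } = 1 - sqrt(tau)  for tau >= 0,
# since the minimizer over the level set {x : x^2 <= tau} is x = sqrt(tau).
def v(tau):
    if tau < 0:
        return math.inf          # the level set {x : x^2 <= tau} is empty
    return 1.0 - math.sqrt(tau)

# Bisection for the leftmost root of v(tau) = 0 (cf. Eq. [2]),
# maintaining v(lo) > 0 (infeasible) and v(hi) <= 0.
def level_set_root(lo=0.0, hi=10.0, tol=1e-10):
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if v(mid) > 0:           # still infeasible: root lies to the right
            lo = mid
        else:                    # v(mid) <= 0: root is at or left of mid
            hi = mid
    return 0.5 * (lo + hi)

print(round(level_set_root(), 6))  # prints 1.0, the optimal value tau_p*
```

Because strong duality holds for this instance, the recovered root is the true optimal value; the point of Theorems 1 and 4 is precisely that without strong duality this procedure would instead stop at τd∗ < τp∗.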

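The coercivity mechanism behind the regularization f(x) + µ‖x‖ can also be checked numerically. The construction below is our own illustration, not from the paper: f is linear, hence not coercive (and not even bounded below on X = Rⁿ), but for a linear objective any µ > ‖c‖ already makes the sum grow along every unbounded ray, which is the coercivity that part (a) of Proposition 2 requires.

```python
import numpy as np

rng = np.random.default_rng(0)
c = rng.standard_normal(5)
mu = 2.0 * np.linalg.norm(c)   # any mu > ||c|| makes the sum coercive here

f = lambda x: c @ x                               # linear: unbounded below
f_reg = lambda x: c @ x + mu * np.linalg.norm(x)  # regularized objective

# Along the steepest-descent ray x = -t*c, f(x) = -t*||c||^2 decreases
# without bound, while f_reg(x) = t*||c||*(mu - ||c||) grows linearly in t.
for t in (1.0, 10.0, 100.0):
    x = -t * c
    print(f"t={t:6.1f}   f={f(x):10.2f}   f_reg={f_reg(x):10.2f}")
```

With a coercive regularized objective, the level sets { x | f(x) + µ‖x‖ ≤ τ } are compact, so the level-set subproblems attain their minima and the argument of Proposition 2 goes through.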
Estrin and Friedlander 7

1. Higham NJ (2002) Accuracy and Stability of Numerical Algorithms. (Society for Industrial and Applied Mathematics) Vol. 80.
2. van den Berg E, Friedlander MP (2007) SPGL1: A solver for large-scale sparse reconstruction. http://www.cs.ubc.ca/labs/scl/spgl1.
3. van den Berg E, Friedlander MP (2008) Probing the Pareto frontier for basis pursuit solutions. SIAM Journal on Scientific Computing 31(2):890–912.
4. Aravkin AY, Burke JV, Drusvyatskiy D, Friedlander MP, Roy S (2017) Level-set methods for convex optimization. ArXiv e-prints. Submitted to Mathematical Programming.
5. Rockafellar RT, Wets RJB (1998) Variational Analysis. (Springer) Vol. 317. 3rd printing.
6. Wiegert J (2010) The sagacity of circles: A history of the isoperimetric problem (https://www.maa.org/press/periodicals/convergence/the-sagacity-of-circles-a-history-of-the-isoperimetric-problem). Accessed: 2018-07-25.
7. Markowitz HM (1987) Mean-Variance Analysis in Portfolio Choice and Capital Markets. (Frank J. Fabozzi Associates, New Hope, Pennsylvania).
8. Marquardt D (1963) An algorithm for least-squares estimation of nonlinear parameters. SIAM Journal on Applied Mathematics 11:431–441.
9. Morrison DD (1960) Methods for nonlinear least squares problems and convergence proofs in Proceedings of the Seminar on Tracking Programs and Orbit Determination, eds. Lorell J, Yagi F. (Jet Propulsion Laboratory, Pasadena, USA), pp. 1–9.
10. Conn AR, Gould NIM, Toint PhL (2000) Trust-Region Methods, MPS-SIAM Series on Optimization. (Society for Industrial and Applied Mathematics, Philadelphia).
11. van den Berg E, Friedlander MP (2013) SPGL1: A solver for large-scale sparse reconstruction.
12. van den Berg E, Friedlander MP (2011) Sparse optimization with least-squares constraints. SIAM J. Optim. 21(4):1201–1229.
13. van den Berg E, Friedlander MP (2008) Probing the Pareto frontier for basis pursuit solutions. SIAM J. Sci. Comput. 31(2):890–912.
14. Chen SS, Donoho DL, Saunders MA (2001) Atomic decomposition by basis pursuit. SIAM Rev. 43(1):129–159.
15. Candès EJ (2006) Compressive sampling in Proceedings of the International Congress of Mathematicians.
16. Donoho DL (2006) For most large underdetermined systems of linear equations, the minimal ℓ1-norm solution is also the sparsest solution. Communications on Pure and Applied Mathematics 59.
17. Chen SS, Donoho DL, Saunders MA (1998) Atomic decomposition by basis pursuit. SIAM J. Sci. Comput. 20(1):33–61.
18. Aravkin AY, Burke J, Friedlander MP (2013) Variational properties of value functions. SIAM J. Optim. 23(3):1689–1717.
19. Rockafellar RT (1970) Convex Analysis. (Princeton University Press, Princeton).
20. Borwein J, Lewis AS (2010) Convex Analysis and Nonlinear Optimization: Theory and Examples. (Springer Science & Business Media).
21. Ben-Tal A, Nemirovski A (2001) Lectures on Modern Convex Optimization: Analysis, Algorithms, and Engineering Applications, MPS/SIAM Series on Optimization. (Society for Industrial and Applied Mathematics, Philadelphia) Vol. 2.
22. Candès EJ, Strohmer T, Voroninski V (2013) PhaseLift: exact and stable signal recovery from magnitude measurements via convex programming. Comm. Pure Appl. Math. 66(8):1241–1274.
23. Waldspurger I, d'Aspremont A, Mallat S (2015) Phase recovery, MaxCut and complex semidefinite programming. Math. Program. 149(1-2):47–81.
24. Boyd S, Vandenberghe L (2004) Convex Optimization. (Cambridge University Press, Cambridge), pp. xiv+716.
