arXiv:2105.00244v1 [math.OC] 1 May 2021

ℓ1-Norm Minimization with Regula Falsi Type Root Finding Methods

Metin Vural, Aleksandr Y. Aravkin, and Sławomir Stańczak

Abstract—Sparse level-set formulations allow practitioners to find the minimum ℓ1-norm solution subject to likelihood constraints. Prior art requires this constraint to be convex. In this letter, we develop an efficient approach for nonconvex likelihoods, using Regula Falsi root-finding techniques to solve the level-set formulation. Regula Falsi methods are simple, derivative-free, and efficient, and the approach provably extends level-set methods to the broader class of nonconvex inverse problems. Practical performance is illustrated using ℓ1-regularized Student's t inversion, which is a nonconvex approach used to develop outlier-robust formulations.

Index Terms—ℓ1-norm minimization, nonconvex models, Regula-Falsi, root-finding.

I. INTRODUCTION

It has been known for a long time that ℓ1-norm minimization plays a major role in many signal processing applications, including compressed sensing [1], [2], overcomplete signal representation [3], [4], coding theory [5], and image processing [6]. Denoting y ∈ R^M as a measurement vector, D ∈ R^(M×N) as an overcomplete matrix with M < N, and ρ as the penalty that measures the data misfit, the 'noise-aware' level-set problem is

(Pσ)    minimize_{x ∈ R^N}  ||x||_1   s.t.   ρ(y − Dx) ≤ σ,

where σ indicates the noise level. The formulation (Pσ) allows one to specify the noise tolerance σ directly. An efficient way to solve (Pσ) is to develop an explicit relationship with a simpler problem that can be directly solved with primal-only methods, such as the prox-gradient [7]:

(Pτ)    minimize_{x ∈ R^N}  ρ(y − Dx)   s.t.   ||x||_1 ≤ τ.

It is well known that (Pσ) and (Pτ) can provide equivalent solutions [8], and the idea of inexactly optimizing a sequence of simpler (Pτ) problems to obtain the solution of (Pσ) for a given noise tolerance σ was first proposed by [9], [10]. Their idea exploits the optimality trade-off between the minimum ℓ1-norm and the least squares data misfit, which generates a differentiable convex Pareto frontier. Tracing this optimality frontier, i.e. getting to the exact level set σ, is formulated as a non-linear root finding problem and is solved by an inexact Newton method. This approach has been generalized to other instances of convex programming by [7], [11].

In the most general case, the relationship between (Pτ) and (Pσ) does not require convexity [11, Theorem 2.1]. However, practical implementations of the root-finding approach require convexity of the Pareto frontier to guarantee the success of the root finding procedure, limiting the approach to the convex case. Current implementations favor Newton's method, which requires the derivative of the nonlinear equation; to address this issue, an extension using an inexact secant method has also been developed [7].

Moving outside of the convex class opens the way for using many useful nonconvex models in (Pσ) formulations. For example, [13] and [14] consider mixture models whose negative log-likelihoods are nonconvex, with applications to inhomogeneous high-dimensional data where the number of covariates could be larger than the sample size. A second application area uses nonconvex Student's t likelihoods to develop outlier-robust approaches [15]–[17]. In this paper, we show how well the approach works for the convex least-squares and Huber losses, as well as the nonconvex Student's t loss.

In this work, we also seek to solve (Pσ) by working with (Pτ). Under simple 'active constraint' conditions, problems (Pτ) and (Pσ) are equivalent for some pair (τ, σ) [11]. Pareto frontier approaches use root finding and inexact solutions of a sequence of (Pτ) problems to solve (Pσ).

In this paper, we introduce Regula Falsi type derivative-free non-linear root finding schemes to solve (Pσ). Regula Falsi type methods are bracketing type methods that offer convergence guarantees for convex and nonconvex models: with a proper choice of the root searching interval, two initial points with opposite signs assure convergence [12]. They do not require convexity to trace the root, allowing nonconvex loss functions in the (Pσ) formulations, and they are derivative free. All of these advantages allow Regula Falsi type methods to be applied to cases where Newton, secant, and their variants are not guaranteed to converge.

This paper is organized as follows. In Section II, the Pareto frontier is defined and Regula Falsi type methods are introduced. Section III presents the proposed (Pσ) solver, while the simulation results are discussed in Section IV.

II. PARETO FRONTIER AND REGULA FALSI-TYPE ROOT FINDING METHODS

A. Pareto Optimality

Definition 1: i) A feasible point of (Pτ) is Pareto optimal if it attains the minimal achievable objective value over the feasible set. ii) The set that is comprised of Pareto optimal points is called the Pareto frontier [11]. The Pareto frontier reveals the relation between τ and σ.

Specifically, we are interested in the optimal objective value of (Pτ) for a given y and τ, which can be expressed as

ν(τ) := inf_{x ∈ R^N} { ρ(Dx − y) : ||x||_1 ≤ τ },    (1)

and the corresponding Pareto frontier can be defined as

ψ(τ) := ν(τ) − σ.    (2)

[Fig. 1: Pareto frontiers for convex and quasi-convex losses ρ. (a) Convex ρ. (b) Quasi-convex ρ.]

TABLE I: Regula Falsi-type methods with different µ values.

  Method           µ
  Regula Falsi     1
  Illinois         0.5
  Pegasus          f(b) / (f(b) + f(c))
  Anderson-Björck  1 − f(c)/f(b); in case f(c)/f(b) ≥ 1, set µ = 0.5

Theorem 1: i) If ρ is a convex function (e.g. the ℓ2-norm or the Huber function), then so is ψ. ii) If ρ is a nonconvex function, convexity of ψ does not follow. When ρ is a quasi-convex function, then so is ψ.

Proof 1: Let us consider any two solutions x1 and x2 of (Pτ) for any τ1 and τ2, respectively. Since the ℓ1-norm is convex, for any β ∈ [0, 1] the following holds:

||βx1 + (1 − β)x2||_1 ≤ β||x1||_1 + (1 − β)||x2||_1 = βτ1 + (1 − β)τ2.    (3)

An immediate outcome of eq. (3) is that βx1 + (1 − β)x2 is a feasible point of (Pτ) with τ = βτ1 + (1 − β)τ2. Thus we can write the following inequality:

ν(βτ1 + (1 − β)τ2) ≤ ρ(D(βx1 + (1 − β)x2) − y) = ρ(β(Dx1 − y) + (1 − β)(Dx2 − y)).    (4)

i) If ρ is convex, then

ρ(β(Dx1 − y) + (1 − β)(Dx2 − y)) ≤ βρ(Dx1 − y) + (1 − β)ρ(Dx2 − y) = βν(τ1) + (1 − β)ν(τ2),    (5)

which shows that ν is convex, and hence so is ψ.

ii) If ρ is quasi-convex, then

ρ(β(Dx1 − y) + (1 − β)(Dx2 − y)) ≤ max{ρ(Dx1 − y), ρ(Dx2 − y)} = max{ν(τ1), ν(τ2)},    (6)

which shows that ν is quasi-convex, and hence so is ψ.

Pareto optimal points are unique for (Pτ) with convex and quasi-convex losses ρ, as can be inferred from [18, Theorem 1.1, Theorem 1.2], [19]. Also, the feasible set of (Pτ) enlarges as τ increases, thus ψ(τ) is nonincreasing [20]. In Fig. 1 an abstract ψ(τ) is depicted for convex and quasi-convex losses ρ, where the red line represents the σ level.

Obtaining the solution of (Pσ) by solving (Pτ) proceeds as follows. We start with a τ parameter to solve (Pτ), and using the solution of (Pτ), find a new τ value. We proceed iteratively until ψ(τσ) → 0. τσ occurs at the intersection of the red line and the black curve in Fig. 1, where we immediately see that the solution of (Pτ) is also a solution of (Pσ), a fact proven formally by [11]. Finding τσ can be formulated as a nonlinear root finding problem.

B. Regula Falsi Type Methods

Our aim is to find τ such that

ψ(τ) = 0.    (7)

If ρ is nonconvex, neither Newton's method nor secant variants are guaranteed to solve (7). In particular, the tangent lines may cross in the feasible region, and secant lines may not bracket the feasible area. In contrast, regardless of the shape of ρ, bracketing type root finding methods are guaranteed to solve (7). Here, we develop Regula Falsi type methods for (7). We denote the solution of a nonlinear equation f by x∗, i.e. f(x∗) = 0. With this notation, Regula Falsi type methods starting with the points a and b proceed as follows (a code sketch of this iteration is given after eq. (12) below).

1) Calculate the slope between a and b,

s_ab = (f(b) − f(a)) / (b − a),    (8)

and find the point where the secant with slope (8) intersects the x-axis, which is c = b − f(b)/s_ab.

2) Calculate f(c). If f(c) = 0 then x∗ = c; otherwise continue.

3) Adjust the new interval: if f(c)f(b) < 0, x∗ should be in between b and c. Set

a = b, b = c, and f(a) = f(b), f(b) = f(c);    (9)

if f(c)f(b) > 0, x∗ should be in between a and c. Set

b = c, and f(a) = µf(a), f(b) = f(c),    (10)

where µ is the scaling factor.

4) Check the ending condition: if |b − a| ≤ ε, stop the iteration. Take

x∗ = b, if |f(b)| ≤ |f(a)|;  x∗ = a, if |f(b)| > |f(a)|.    (11)

If |b − a| > ε, continue the iteration: go back to 1) with the values a, b and f(a), f(b) from 3).

Regula Falsi type methods differ from each other in the choice of the scaling factor µ. Several commonly considered µ values from the literature are summarized in Table I. Additional options for µ are studied in [21], [22].

III. SOLVING (Pσ)

A. (Pτ) Solver

In order to solve (Pσ), we repeatedly solve (Pτ). (Pτ) can be solved using the simple projected gradient method

x^(k) = proj_{τB1}( x^(k−1) + γ D^T ∇ρ(y − Dx^(k−1)) ).    (12)
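To make steps 1)–4) concrete, the following minimal sketch implements the generic Regula Falsi-type update for a scalar equation f, with the scaling factors µ of Table I selectable by name. It is an illustration rather than the code used in our experiments, and the function and parameter names are our own.

```python
# Minimal sketch of a Regula Falsi-type root finder (steps 1-4 above).
# mu_rule selects the scaling factor of Table I.

def regula_falsi(f, a, b, mu_rule="illinois", eps=1e-8, max_iter=100):
    fa, fb = f(a), f(b)
    assert fa * fb < 0, "initial points must bracket the root"
    for _ in range(max_iter):
        s_ab = (fb - fa) / (b - a)       # slope, eq. (8)
        c = b - fb / s_ab                # intersection with the x-axis
        fc = f(c)
        if fc == 0.0:
            return c
        if fc * fb < 0:                  # root between b and c, eq. (9)
            a, fa = b, fb
        else:                            # root between a and c, eq. (10)
            if mu_rule == "regula_falsi":
                mu = 1.0
            elif mu_rule == "illinois":
                mu = 0.5
            elif mu_rule == "pegasus":
                mu = fb / (fb + fc)
            else:                        # Anderson-Bjoerck
                mu = 1.0 - fc / fb
                if mu <= 0.0:
                    mu = 0.5
            fa = mu * fa
        b, fb = c, fc
        if abs(b - a) <= eps:            # ending condition, eq. (11)
            break
    return b if abs(fb) <= abs(fa) else a
```

For instance, regula_falsi(lambda t: t**3 - 2.0, 0.0, 2.0) approximates the real cube root of 2; passing mu_rule="pegasus" or mu_rule="anderson_bjoerck" exercises the other rows of Table I.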

1) Projection onto the ℓ1-ball: The projection of a vector a = [a1, a2, ..., aN] onto the ℓ1-ball can be written as

proj_1(a, τ) = a, if ||a||_1 ≤ τ;  sgn(a_i) max{|a_i| − κ, 0}, else,    (13)

where κ is the Lagrange multiplier of proj_1(a, τ) [23]. The tricky part of the projection is to find, in an efficient way, the κ that satisfies the Karush-Kuhn-Tucker optimality condition Σ_{i=1}^N max{|a_i| − κ, 0} = τ.

To find κ, we utilize the simple, sorting based approach introduced in [24]. Additional variations of this method are described in [25]. To find κ:

• Sort |a| as c1 ≥ c2 ≥ ... ≥ cN,
• Find K = max_{1≤k≤N} { k | (Σ_{j=1}^k c_j − τ)/k ≤ c_k },
• Calculate κ = (Σ_{k=1}^K c_k − τ)/K.

To solve (Pτ), we used a projected gradient method with the spectral line search strategy discussed by [9], with the projection steps given in III-A1.
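As an illustration, the sorting-based search for κ and the thresholding step of eq. (13) can be sketched as follows; this is a NumPy sketch with our own naming, assuming τ > 0, not a reference implementation.

```python
import numpy as np

# Sketch of the sorting-based projection onto the l1-ball of radius tau,
# following eq. (13) and the three steps above.

def proj_l1(a, tau):
    if np.sum(np.abs(a)) <= tau:
        return a.copy()                       # already inside the ball
    c = np.sort(np.abs(a))[::-1]              # c1 >= c2 >= ... >= cN
    csum = np.cumsum(c)
    k = np.arange(1, a.size + 1)
    K = k[(csum - tau) / k <= c][-1]          # largest k passing the test
    kappa = (csum[K - 1] - tau) / K           # Lagrange multiplier
    return np.sign(a) * np.maximum(np.abs(a) - kappa, 0.0)
```

For example, proj_l1(np.array([3.0, 1.0]), 2.0) returns [2., 0.], whose ℓ1-norm equals the radius τ = 2.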
B. Solving (Pσ)

1) Bracketing: In order to choose the root searching interval, we consider a well-known decomposition method called the method of frames (MOF) [26]. The MOF decomposition of a signal y can be obtained with the inverse linear mapping such that x_MF = D^T (DD^T)^{−1} y = arg min { ||x||_2 s.t. y = Dx }. Many common loss functions ρ are nonnegative and vanish at the origin, including the gauges and nonconvex losses considered in this study. For these losses, under the assumption that D is full row-rank, ρ(y − Dx_MF) = 0 and ψ(τ_MF) = −σ with τ_MF = ||x_MF||_1. For the left endpoint, we consider x = 0 and τ = 0, with loss equal to ρ(y). Bracketing the root searching interval between the points x_MF and 0 ensures finding a solution for (7), since they provide two initial points with opposite signs for ψ, as long as ρ(y) > σ.

2) Solving (Pσ): We combine the proposed Regula Falsi methods and a (Pτ) solver to solve (Pσ) as follows (see the sketch after this list):

• Choose initial τ values. Choose two initial values with opposite signs to ensure convergence of Regula Falsi type methods. The default choice is given by τ = 0 and τ = τ_MF.
• Apply the steps of the Regula Falsi type methods. Every iteration of the Regula Falsi-type methods requires solving (Pτ), except at τ = 0 and τ = τ_MF.
• Terminate once the stopping criteria are met.
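A minimal sketch of the overall procedure, composing the regula_falsi and proj_l1 routines sketched above with the projected gradient step (12), could look as follows. The fixed step size γ and iteration counts are illustrative stand-ins for the spectral line search of [9], and all names are our own.

```python
import numpy as np

# Sketch of the level-set driver: bracket [0, tau_MF], then run a
# Regula Falsi-type method on psi(tau) = nu(tau) - sigma.
# Assumes rho(y) > sigma and D full row-rank, so psi(0) > 0 > psi(tau_MF).

def solve_P_tau(D, y, tau, loss_grad, gamma=1e-2, iters=500):
    x = np.zeros(D.shape[1])                  # projected gradient, eq. (12)
    for _ in range(iters):
        x = proj_l1(x + gamma * D.T @ loss_grad(y - D @ x), tau)
    return x

def solve_P_sigma(D, y, sigma, loss, loss_grad):
    def psi(tau):                             # eq. (2): nu(tau) - sigma
        return loss(y - D @ solve_P_tau(D, y, tau, loss_grad)) - sigma
    x_mf = D.T @ np.linalg.solve(D @ D.T, y)  # method-of-frames point [26]
    tau_mf = np.sum(np.abs(x_mf))             # right endpoint of the bracket
    tau_star = regula_falsi(psi, 0.0, tau_mf, mu_rule="illinois")
    return solve_P_tau(D, y, tau_star, loss_grad)

# Example loss: Student's t with parameter nu (see Section IV), e.g.
#   nu = 1e-2
#   loss = lambda r: np.sum(nu * np.log1p(r**2 / nu))
#   loss_grad = lambda r: 2 * r / (1 + r**2 / nu)
#   x_hat = solve_P_sigma(D, y, sigma, loss, loss_grad)
```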

IV. SIMULATIONS

In order to examine the performance of the Regula Falsi type methods given in Table I, we created a test environment and benchmark using the Sparco framework [27]. For the simulations, real-valued problems are chosen from this collection of test problems, which includes many examples from the literature. Details about the problems and related publications can be found in [27].

The performance of the Regula Falsi type methods is investigated for three different loss functions ρ: least squares ρl(x) = ||x||_2, Huber

ρh(x) = Σ_{i=1}^N  { x_i^2 / (2δ), if |x_i| ≤ δ;  |x_i| − δ/2, otherwise },    (14)

and Student's t, ρs(x) = Σ_{i=1}^N ν log(1 + x_i^2/ν), where δ and ν are the tuning parameters for ρh and ρs, respectively; we take δ = 5 × 10^−3 and ν = 10^−2 in our simulations. The N, M, and ρ(y) values for the chosen problems are given in Table II.

TABLE II: N, M, ρ(y) values for the problem setups.

  Problems   id   M     N     ρl(y)     ρh(y)   ρs(y)
  cos-spike  3    1024  2048  102.2423  2378.8  25.09
  gauss-en   11   256   1024  99.9055   1272.5  8.957
  jitter     902  200   1000  0.4476    4.6881  0.0901

We solve (Pσ) for a range of σ values chosen relative to ρ(y), in particular 5 × 10^−1 ρ(y), 5 × 10^−2 ρ(y) and 5 × 10^−3 ρ(y). We compute the residuals ρ(rσ) = ρ(y − Dxσ), the norms ||xσ||_1, the number of nonzeros (nnz) for each solution xσ, and iter, the total number of (Pτ) solves needed to reach these solutions; the results are displayed in Table III.

Newton's method requires fewer (Pτ) solves (see Table III). However, solving (7) is more expensive for Newton's method than for Regula Falsi-type methods, since the derivative of the nonlinear equation must be calculated along with the function evaluation, while Regula Falsi-type methods need only the function evaluation. From a robustness standpoint, Newton, secant, and their variants are not guaranteed to converge for nonconvex loss functions, in contrast to Regula Falsi-type methods.

The Pareto frontier ν(τ) is shown for the gauss-en problem with losses ρl, ρh and ρs in Figure 2. We expect similar patterns in many problems. As expected, the Pareto frontier is nonconvex, and Newton iterations will not stay in the feasible area for ρs in this example, since the tangent lines can leave the feasible region.

Figure 3 shows a typical compressed sensing example (one way to generate such an instance is sketched after the Figure 3 caption below). A 20-sparse vector is recovered using a normally distributed Parseval frame D ∈ R^(175×600). A measurement is generated according to y = Dx + w + ζ, where the noise w is zero-mean normal with variance 0.005 and ζ has five randomly placed outliers drawn from a zero-mean normal distribution with variance 4. (Pσ) is solved with σ = ρ(ζ) for a fair comparison of ρl, ρh and ρs. The Huber loss is less sensitive to outliers in the measurement data than least squares, but Student's t outperforms the Huber loss since it grows sublinearly as outliers increase, a property also noted by [11].

V. CONCLUSION

In this note, we developed a new approach using bracketing Regula Falsi-type methods that enables level-set methods to be applied to sparse optimization problems with nonconvex likelihood constraints, significantly expanding their usability. These methods achieve performance comparable to Newton's method for root finding on convex problems, and are guaranteed to converge in the nonconvex case, where Newton and secant variants may fail.


Fig. 2: From left to right, ν(τ) for the gauss-en problem with ρl, ρh and ρs.

Fig. 3: Left, top to bottom: true signal, reconstructions with least squares, Huber and Student’s t losses. Right, top to bottom: true errors, least squares, Huber and Student’s t residuals.
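For reference, a test instance matching the description of the Figure 3 experiment in Section IV could be generated along the following lines; the random seed and the frame construction via an SVD are our own choices, so the exact data will differ from the figure.

```python
import numpy as np

# Sketch of the Figure 3 setup: a 20-sparse signal, a Parseval frame
# D in R^(175x600), Gaussian noise w, and five large outliers zeta.

rng = np.random.default_rng(0)
M, N, k = 175, 600, 20
A = rng.standard_normal((M, N))
U, _, Vt = np.linalg.svd(A, full_matrices=False)
D = U @ Vt                                    # D @ D.T = I: a Parseval frame
x_true = np.zeros(N)
x_true[rng.choice(N, k, replace=False)] = rng.standard_normal(k)
w = np.sqrt(0.005) * rng.standard_normal(M)   # zero-mean noise, variance 0.005
zeta = np.zeros(M)
zeta[rng.choice(M, 5, replace=False)] = 2.0 * rng.standard_normal(5)  # variance 4
y = D @ x_true + w + zeta                     # the paper solves (P_sigma) with sigma = rho(zeta)
```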

TABLE III: Simulation Results for Solving (Pσ).

Column groups, left to right: least squares, Huber (δ = 5 × 10⁻³), Student's t (ν = 10⁻²); each group reports ρ(rσ), ||xσ||1, nnz, iter.

  Problem    σ/ρ(y)  Method        | ρl(rσ)  ||xσ||1  nnz  iter | ρh(rσ)  ||xσ||1  nnz  iter | ρs(rσ)  ||xσ||1  nnz   iter
  cos-spike  0.5     Regula Falsi  | 51.12   66.24    2    10   | 1189.4  68.328   2    8    | 12.545  94.289   10    14
                     Illinois      | 51.12   66.24    2    7    | 1189.4  68.328   2    7    | 12.545  94.289   10    7
                     Pegasus       | 51.12   66.24    2    7    | 1189.4  68.328   3    6    | 12.545  94.289   10    6
                     And.-Björck   | 51.12   66.24    2    9    | 1189.4  68.328   3    9    | 12.545  94.289   10    10
                     Newton        | 51.12   66.24    2    3    | 1189.4  68.328   2    3    | −       −        −     −
             0.05    Regula Falsi  | 5.112   188.4    75   85   | 118.94  134.71   2    68   | 1.2105  171.09   79    72
                     Illinois      | 5.112   188.4    75   13   | 118.94  134.71   2    15   | 1.2214  172.65   78    16
                     Pegasus       | 5.112   188.4    75   12   | 118.94  134.71   2    14   | 1.259   170.95   76    34
                     And.-Björck   | 5.112   188.4    75   27   | 118.94  134.71   2    32   | 1.251   170.85   75    50
                     Newton        | 5.112   188.4    75   5    | 118.94  134.7    2    4    | −       −        −     −
             0.005   Regula Falsi  | 0.511   235.6    123  396  | 11.893  229.02   127  451  | 0.1272  231.15   124   383
                     Illinois      | 0.511   235.6    123  19   | 11.894  229.02   125  15   | 0.1183  231.15   124   30
                     Pegasus       | 0.511   235.6    123  19   | 11.894  229.02   130  14   | 0.118   231.16   121   27
                     And.-Björck   | 0.511   235.6    123  34   | 11.894  229.02   130  21   | 0.1273  231.13   122   19
                     Newton        | 0.511   235.6    123  5    | 11.894  229.02   125  3    | −       −        −     −
  gauss-en   0.5     Regula Falsi  | 49.95   11.9     16   9    | 636.27  11.97    19   9    | 4.479   16.6     139   11
                     Illinois      | 49.95   11.9     16   7    | 636.27  11.97    19   6    | 4.479   16.6     139   8
                     Pegasus       | 49.95   11.9     16   5    | 636.27  11.97    19   5    | 4.479   16.6     139   6
                     And.-Björck   | 49.95   11.9     16   7    | 636.27  11.97    19   7    | 4.479   16.6     139   8
                     Newton        | 49.95   11.9     16   4    | 636.27  11.97    19   4    | −       −        −     −
             0.05    Regula Falsi  | 4.995   26.45    35   26   | 63.63   26.406   50   27   | 0.448   27.82    39    24
                     Illinois      | 4.995   26.45    35   11   | 63.63   26.406   50   14   | 0.448   27.82    39    15
                     Pegasus       | 4.995   26.45    35   11   | 63.63   26.406   50   11   | 0.448   27.82    39    15
                     And.-Björck   | 4.995   26.45    35   15   | 63.63   26.406   50   27   | 0.448   27.82    39    28
                     Newton        | 4.995   26.45    35   4    | 63.63   26.406   50   4    | −       −        −     −
             0.005   Regula Falsi  | 0.499   28.06    35   208  | 6.3624  28.036   46   209  | 0.045   28.12    1024  207
                     Illinois      | 0.499   28.06    35   19   | 6.3627  28.036   46   19   | 0.045   28.12    1024  27
                     Pegasus       | 0.499   28.06    35   19   | 6.3627  28.036   46   20   | 0.045   28.12    1024  24
                     And.-Björck   | 0.499   28.06    35   58   | 6.3627  28.036   46   62   | 0.045   28.12    1024  65
                     Newton        | 0.499   28.06    35   3    | 6.3628  28.036   46   4    | −       −        −     −
  jitter     0.5     Regula Falsi  | 0.224   0.766    3    7    | 2.344   0.6959   3    7    | 0.045   0.4548   2     9
                     Illinois      | 0.224   0.766    3    6    | 2.344   0.6958   3    7    | 0.045   0.4548   2     7
                     Pegasus       | 0.224   0.766    3    6    | 2.344   0.6958   3    6    | 0.045   0.4548   2     6
                     And.-Björck   | 0.224   0.766    3    8    | 2.344   0.6958   3    8    | 0.045   0.4548   2     9
                     Newton        | 0.224   0.766    3    3    | 2.344   0.6958   3    3    | −       −        −     −
             0.05    Regula Falsi  | 0.022   1.537    3    39   | 0.2343  1.4338   3    43   | 0.005   1.2587   3     54
                     Illinois      | 0.022   1.537    3    12   | 0.2344  1.4338   3    15   | 0.005   1.2585   3     14
                     Pegasus       | 0.022   1.537    3    12   | 0.2344  1.4338   3    14   | 0.005   1.2585   3     12
                     And.-Björck   | 0.022   1.537    3    30   | 0.2344  1.4338   3    39   | 0.005   1.2585   3     41
                     Newton        | 0.022   1.537    3    2    | 0.2344  1.4338   3    4    | −       −        −     −
             0.005   Regula Falsi  | 0.002   1.614    3    375  | 0.0233  1.5646   3    391  | 0.0005  1.5086   3     413
                     Illinois      | 0.002   1.614    3    19   | 0.0234  1.5644   3    16   | 0.0005  1.508    3     16
                     Pegasus       | 0.002   1.614    3    20   | 0.0234  1.5644   3    14   | 0.0005  1.508    3     16
                     And.-Björck   | 0.002   1.614    3    35   | 0.0234  1.5644   3    27   | 0.0005  1.508    3     107
                     Newton        | 0.002   1.614    3    2    | 0.0235  1.5643   3    5    | −       −        −     −

REFERENCES

[1] E. J. Candes, J. Romberg, and T. Tao, "Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information," IEEE Transactions on Information Theory, vol. 52, no. 2, pp. 489–509, 2006.
[2] D. L. Donoho, "Compressed sensing," IEEE Trans. Inf. Theor., vol. 52, no. 4, pp. 1289–1306, Apr. 2006. [Online]. Available: https://doi.org/10.1109/TIT.2006.871582
[3] D. L. Donoho and X. Huo, "Uncertainty principles and ideal atomic decomposition," IEEE Transactions on Information Theory, vol. 47, no. 7, pp. 2845–2862, 2001.
[4] J.-J. Fuchs, "On sparse representations in arbitrary redundant bases," IEEE Transactions on Information Theory, vol. 50, no. 6, pp. 1341–1344, 2004.
[5] E. J. Candes and T. Tao, "Decoding by linear programming," IEEE Transactions on Information Theory, vol. 51, no. 12, pp. 4203–4215, 2005.
[6] M. Lustig, J. H. Lee, D. L. Donoho, and J. M. Pauly, "Faster imaging with randomly perturbed, undersampled spirals and ℓ1 reconstruction," in Proceedings of the 13th Annual Meeting of ISMRM, 2005, p. 685.
[7] A. Y. Aravkin, J. V. Burke, D. Drusvyatskiy, M. P. Friedlander, and S. Roy, "Level-set methods for convex optimization," Mathematical Programming, vol. 174, no. 1, pp. 359–390, 2019.
[8] S. Foucart and H. Rauhut, A Mathematical Introduction to Compressive Sensing, ser. Applied and Numerical Harmonic Analysis. Springer New York, 2013.
[9] E. van den Berg and M. Friedlander, "Probing the Pareto frontier for basis pursuit solutions," SIAM Journal on Scientific Computing, vol. 31, no. 2, pp. 890–912, 2009. [Online]. Available: https://doi.org/10.1137/080714488
[10] ——, "Sparse optimization with least-squares constraints," SIAM Journal on Optimization, vol. 21, no. 4, pp. 1201–1229, 2011. [Online]. Available: https://doi.org/10.1137/100785028
[11] A. Y. Aravkin, J. V. Burke, and M. P. Friedlander, "Variational properties of value functions," SIAM Journal on Optimization, vol. 23, no. 3, pp. 1689–1717, 2013.
[12] G. Dahlquist and Å. Björck, Numerical Methods, ser. Dover Books on Mathematics. Dover Publications, 2003. [Online]. Available: https://books.google.de/books?id=armfeHpJIwAC
[13] N. Städler, P. Buehlmann, and S. van de Geer, "ℓ1-penalization for mixture regression models," Test, vol. 19, no. 2, pp. 209–256, 2010.
[14] P. Buehlmann and S. van de Geer, "Non-convex loss functions and ℓ1-regularization," Statistics for High-Dimensional Data, pp. 293–338, 2011.
[15] "Robust and trend-following Kalman smoothers using Student's t," IFAC Proceedings Volumes, vol. 45, no. 16, pp. 1215–1220, 2012, 16th IFAC Symposium on System Identification.
[16] A. Y. Aravkin, J. V. Burke, and G. Pillonetto, "Sparse/robust estimation and Kalman smoothing with nonsmooth log-concave densities: Modeling, computation, and theory," The Journal of Machine Learning Research, vol. 14, no. 1, pp. 2689–2728, 2013.
[17] A. Aravkin, M. P. Friedlander, F. J. Herrmann, and T. van Leeuwen, "Robust inversion, dimensionality reduction, and randomized sampling," Mathematical Programming, vol. 134, no. 1, pp. 101–125, 2012.
[18] P. M. Pardalos, A. Žilinskas, and J. Žilinskas, Non-convex multi-objective optimization. Springer, 2017.
[19] K. Miettinen, "Some methods for nonlinear multi-objective optimization," in International Conference on Evolutionary Multi-Criterion Optimization. Springer, 2001, pp. 1–20.
[20] E. van den Berg, "Convex optimization for generalized sparse recovery," Ph.D. dissertation, University of British Columbia, 2009.
[21] S. Galdino, "A family of regula falsi root-finding methods."
[22] J. A. Ford, "Improved algorithms of Illinois-type for the numerical solution of nonlinear equations."
[23] J. Duchi, S. Shalev-Shwartz, Y. Singer, and T. Chandra, "Efficient projections onto the ℓ1-ball for learning in high dimensions," in ICML '08: Proceedings of the 25th International Conference on Machine Learning, New York, NY, USA, 2008, pp. 272–279. [Online]. Available: http://doi.acm.org/10.1145/1390156.1390191
[24] M. Held, P. Wolfe, and H. P. Crowder, "Validation of subgradient optimization," Math. Program., vol. 6, no. 1, pp. 62–88, Dec. 1974. [Online]. Available: https://doi.org/10.1007/BF01580223
[25] L. Condat, "Fast projection onto the simplex and the ℓ1 ball," Mathematical Programming, Series A, vol. 158, no. 1, pp. 575–585, Jul. 2016. [Online]. Available: https://hal.archives-ouvertes.fr/hal-01056171
[26] I. Daubechies, "Time-frequency localization operators: a geometric phase space approach," IEEE Transactions on Information Theory, vol. 34, no. 4, pp. 605–612, July 1988.
[27] E. van den Berg, M. P. Friedlander, G. Hennenfent, F. J. Herrmann, R. Saab, and Ö. Yılmaz, "Algorithm 890: Sparco: A testing framework for sparse reconstruction," ACM Trans. Math. Softw., vol. 35, no. 4, Feb. 2009. [Online]. Available: https://doi.org/10.1145/1462173.1462178