arXiv:2105.00244v1 [math.OC] 1 May 2021

ℓ1-Norm Minimization with Regula Falsi Type Root Finding Methods

Metin Vural, Aleksandr Y. Aravkin, and Sławomir Stańczak

Abstract—Sparse level-set formulations allow practitioners to find the minimum ℓ1-norm solution subject to likelihood constraints. Prior art requires this constraint to be convex. In this letter, we develop an efficient approach for nonconvex likelihoods, using Regula Falsi root-finding techniques to solve the level-set formulation. Regula Falsi methods are simple, derivative-free, and efficient, and the approach provably extends level-set methods to the broader class of nonconvex inverse problems. Practical performance is illustrated using ℓ1-regularized Student's t inversion, which is a nonconvex approach used to develop outlier-robust formulations.

Index Terms—ℓ1-norm minimization, nonconvex models, Regula-Falsi, root-finding.

I. INTRODUCTION

It has been known for a long time that ℓ1-norm minimization plays a major role in many signal processing applications, including compressed sensing [1], [2], overcomplete signal representation [3], [4], coding theory [5], and image processing [6]. Denoting y ∈ R^M as a measurement vector, D ∈ R^(M×N) as an overcomplete matrix with M < N, and ρ as the penalty that measures the data misfit, the 'noise-aware' level-set problem is

(Pσ)    minimize_{x ∈ R^N}  ||x||_1   s.t.   ρ(y − Dx) ≤ σ,

where σ indicates the noise level. The formulation (Pσ) allows one to specify the noise tolerance σ directly. An efficient way to solve (Pσ) is to develop an explicit relationship with a simpler problem that can be directly solved with primal-only methods, such as the prox-gradient [7]:

(Pτ)    minimize_{x ∈ R^N}  ρ(y − Dx)   s.t.   ||x||_1 ≤ τ.

It is well known that (Pσ) and (Pτ) can provide equivalent solutions [8], and the idea of inexactly optimizing a sequence of simpler (Pτ) problems to obtain the solution of (Pσ) for a given noise tolerance σ was first proposed by [9], [10]. Their idea exploits the optimality trade-off between the minimum ℓ1-norm and the least squares data misfit, which generates a differentiable convex Pareto frontier. Tracing this optimality frontier, i.e. getting to the exact level set σ, is formulated as a non-linear root finding problem and is solved by an inexact Newton method. This approach has been generalized to other instances of convex programming by [7], [11].

In the most general case, the relationship between (Pτ) and (Pσ) does not require convexity [11, Theorem 2.1]. However, practical implementations of the root-finding approach require convexity of the Pareto frontier to guarantee the success of the root finding procedure, limiting the approach to the convex case. Current implementations favor Newton's method, which requires the derivative of the nonlinear equation; to address this issue, an extension using an inexact secant method has also been developed [7].

Moving outside of the convex class opens the way for using many useful nonconvex models in (Pσ) formulations. For example, [13] and [14] consider mixture models whose negative log-likelihoods are nonconvex, with applications to inhomogeneous high-dimensional data where the number of covariates could be larger than the sample size. A second application area uses nonconvex Student's t likelihoods to develop outlier-robust approaches [15]–[17]. In this paper, we show how well the approach works for the convex least-squares and Huber losses, as well as the nonconvex Student's t loss.

In this work, we also seek to solve (Pσ) by working with (Pτ). Under simple 'active constraint' conditions, problems (Pτ) and (Pσ) are equivalent for some pair (τ, σ) [11]. Pareto frontier approaches use root finding and inexact solutions of a sequence of (Pτ) problems to solve (Pσ).

In this paper, we introduce Regula Falsi type derivative-free non-linear root finding schemes to solve (Pσ). Regula Falsi type methods are bracketing type methods that offer convergence guarantees for convex and nonconvex models: with a proper choice of the root searching interval, two initial points with opposite signs assure convergence [12]. They do not require convexity to trace the root, allowing nonconvex loss functions in the (Pσ) formulations, and they are derivative free. All of these advantages allow Regula Falsi type methods to be applied to cases where Newton, secant, and their variants are not guaranteed to converge.

This paper is organized as follows. In Section II, the Pareto frontier is defined and Regula Falsi type methods are introduced. Section III presents the proposed (Pσ) solver, while the simulation results are discussed in Section IV.

II. PARETO FRONTIER AND REGULA FALSI-TYPE ROOT FINDING METHODS

A. Pareto Optimality

Definition 1: i) A feasible point of (Pτ) is Pareto optimal if it attains the minimal achievable objective value over the feasible set. ii) The set that is comprised of Pareto optimal points is called the Pareto frontier [11]. The Pareto frontier reveals the relation between τ and σ.

Specifically, we are interested in the optimal objective value of (Pτ) for a given y and τ, which can be expressed as

ν(τ) := inf_{x ∈ R^N} { ρ(Dx − y) : ||x||_1 ≤ τ },    (1)

and the corresponding Pareto frontier can be defined as

ψ(τ) := ν(τ) − σ.    (2)

[Fig. 1: Pareto frontiers for convex and quasi-convex losses ρ. (a) Convex ρ. (b) Quasi-convex ρ.]

TABLE I: Regula Falsi-type methods with different µ values.

  Method           µ
  Regula Falsi     1
  Illinois         0.5
  Pegasus          f(b) / (f(b) + f(c))
  Anderson-Björck  1 − f(c)/f(b); in case f(c)/f(b) ≥ 1, set µ = 0.5

Theorem 1: i) If ρ is a convex function (e.g. the ℓ2-norm or the Huber function), then so is ψ. ii) If ρ is a nonconvex function, convexity of ψ does not follow. When ρ is a quasi-convex function, then so is ψ.

Proof 1: Let us consider any two solutions x1 and x2 of (Pτ) for any τ1 and τ2, respectively. Since the ℓ1-norm is convex, for any β ∈ [0, 1] the following holds:

||βx1 + (1 − β)x2||_1 ≤ β||x1||_1 + (1 − β)||x2||_1 = βτ1 + (1 − β)τ2.    (3)

An immediate outcome of eq. (3) is that βx1 + (1 − β)x2 is a feasible point of (Pτ) with τ = βτ1 + (1 − β)τ2. Thus we can write the following inequality:

ν(βτ1 + (1 − β)τ2) ≤ ρ(D(βx1 + (1 − β)x2) − y) = ρ(β(Dx1 − y) + (1 − β)(Dx2 − y)).    (4)

i) If ρ is convex, then

ρ(β(Dx1 − y) + (1 − β)(Dx2 − y)) ≤ βρ(Dx1 − y) + (1 − β)ρ(Dx2 − y) = βν(τ1) + (1 − β)ν(τ2),    (5)

which shows that ν is convex, and hence so is ψ.

ii) If ρ is quasi-convex, then

ρ(β(Dx1 − y) + (1 − β)(Dx2 − y)) ≤ max{ρ(Dx1 − y), ρ(Dx2 − y)} = max{ν(τ1), ν(τ2)},    (6)

which shows that ν is quasi-convex, and hence so is ψ.

Pareto optimal points are unique for (Pτ) with convex and quasi-convex losses ρ, as can be inferred from [18, Theorem 1.1, Theorem 1.2], [19]. Also, the feasible set of (Pτ) enlarges as τ increases, thus ψ(τ) is nonincreasing [20]. In Fig. 1 an abstract ψ(τ) is depicted for convex and quasi-convex losses ρ, where the red line represents the σ level.

Obtaining the solution of (Pσ) by solving (Pτ) proceeds as follows. We start with a τ parameter to solve (Pτ), and using the solution of (Pτ), find a new τ value. We proceed iteratively until ψ(τσ) → 0. τσ occurs at the intersection of the red line and the black curve in Fig. 1, where we immediately see that the solution of (Pτ) is also a solution of (Pσ), a fact proven formally by [11]. Finding τσ can be formulated as a nonlinear root finding problem.

B. Regula Falsi Type Methods

Our aim is to find τ such that

ψ(τ) = 0.    (7)

If ρ is nonconvex, neither Newton's method nor secant variants are guaranteed to solve (7). In particular, the tangent lines may cross in the feasible region, and secant lines may not bracket the feasible area. In contrast, regardless of the shape of ρ, bracketing type root finding methods are guaranteed to solve (7). Here, we develop Regula Falsi type methods for (7). We denote the solution of a nonlinear equation f by x∗, i.e. f(x∗) = 0. With this notation, Regula Falsi type methods starting with the points a and b proceed as follows (a code sketch of this iteration is given after eq. (12) below).

1) Calculate the slope between a and b,

s_ab = (f(b) − f(a)) / (b − a),    (8)

and find the point where the secant with slope (8) intersects the x-axis, which is c = b − f(b)/s_ab.

2) Calculate f(c). If f(c) = 0 then x∗ = c; otherwise continue.

3) Adjust the new interval: if f(c)f(b) < 0, x∗ should be in between b and c. Set

a = b, b = c, and f(a) = f(b), f(b) = f(c);    (9)

if f(c)f(b) > 0, x∗ should be in between a and c. Set

b = c, and f(a) = µf(a), f(b) = f(c),    (10)

where µ is the scaling factor.

4) Check the ending condition: if |b − a| ≤ ε, stop the iteration. Take

x∗ = b, if |f(b)| ≤ |f(a)|;  x∗ = a, if |f(b)| > |f(a)|.    (11)

If |b − a| > ε, continue the iteration: go back to 1) with the values a, b and f(a), f(b) from 3).

Regula Falsi type methods differ from each other in the choice of the scaling factor µ. Several commonly considered µ values from the literature are summarized in Table I. Additional options for µ are studied in [21], [22].

III. SOLVING (Pσ)

A. (Pτ) Solver

In order to solve (Pσ), we repeatedly solve (Pτ). (Pτ) can be solved using the simple projected gradient method

x^(k) = proj_{τB1}( x^(k−1) + γ D^T ∇ρ(y − Dx^(k−1)) ).    (12)
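To make steps 1)–4) concrete, the following minimal sketch implements the generic Regula Falsi-type update for a scalar equation f, with the scaling factors µ of Table I selectable by name. It is an illustration rather than the code used in our experiments, and the function and parameter names are our own.

```python
# Minimal sketch of a Regula Falsi-type root finder (steps 1-4 above).
# mu_rule selects the scaling factor of Table I.

def regula_falsi(f, a, b, mu_rule="illinois", eps=1e-8, max_iter=100):
    fa, fb = f(a), f(b)
    assert fa * fb < 0, "initial points must bracket the root"
    for _ in range(max_iter):
        s_ab = (fb - fa) / (b - a)       # slope, eq. (8)
        c = b - fb / s_ab                # intersection with the x-axis
        fc = f(c)
        if fc == 0.0:
            return c
        if fc * fb < 0:                  # root between b and c, eq. (9)
            a, fa = b, fb
        else:                            # root between a and c, eq. (10)
            if mu_rule == "regula_falsi":
                mu = 1.0
            elif mu_rule == "illinois":
                mu = 0.5
            elif mu_rule == "pegasus":
                mu = fb / (fb + fc)
            else:                        # Anderson-Bjoerck
                mu = 1.0 - fc / fb
                if mu <= 0.0:
                    mu = 0.5
            fa = mu * fa
        b, fb = c, fc
        if abs(b - a) <= eps:            # ending condition, eq. (11)
            break
    return b if abs(fb) <= abs(fa) else a
```

For instance, regula_falsi(lambda t: t**3 - 2.0, 0.0, 2.0) approximates the real cube root of 2; passing mu_rule="pegasus" or mu_rule="anderson_bjoerck" exercises the other rows of Table I.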

1) Projection onto the ℓ1-ball: The projection of a vector a = [a1, a2, ..., aN] onto the ℓ1-ball can be written as

proj_1(a, τ) = a, if ||a||_1 ≤ τ;  sgn(a_i) max{|a_i| − κ, 0}, else,    (13)

where κ is the Lagrange multiplier of proj_1(a, τ) [23]. The tricky part of the projection is to find, in an efficient way, the κ that satisfies the Karush-Kuhn-Tucker optimality condition Σ_{i=1}^N max{|a_i| − κ, 0} = τ.

To find κ, we utilize the simple, sorting based approach introduced in [24]. Additional variations of this method are described in [25]. To find κ:

• Sort |a| as c1 ≥ c2 ≥ ... ≥ cN,
• Find K = max_{1≤k≤N} { k | (Σ_{j=1}^k c_j − τ)/k ≤ c_k },
• Calculate κ = (Σ_{k=1}^K c_k − τ)/K.

To solve (Pτ), we used a projected gradient method with the spectral line search strategy discussed by [9], with the projection steps given in III-A1.
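As an illustration, the sorting-based search for κ and the thresholding step of eq. (13) can be sketched as follows; this is a NumPy sketch with our own naming, assuming τ > 0, not a reference implementation.

```python
import numpy as np

# Sketch of the sorting-based projection onto the l1-ball of radius tau,
# following eq. (13) and the three steps above.

def proj_l1(a, tau):
    if np.sum(np.abs(a)) <= tau:
        return a.copy()                       # already inside the ball
    c = np.sort(np.abs(a))[::-1]              # c1 >= c2 >= ... >= cN
    csum = np.cumsum(c)
    k = np.arange(1, a.size + 1)
    K = k[(csum - tau) / k <= c][-1]          # largest k passing the test
    kappa = (csum[K - 1] - tau) / K           # Lagrange multiplier
    return np.sign(a) * np.maximum(np.abs(a) - kappa, 0.0)
```

For example, proj_l1(np.array([3.0, 1.0]), 2.0) returns [2., 0.], whose ℓ1-norm equals the radius τ = 2.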
B. Solving (Pσ)

1) Bracketing: In order to choose the root searching interval, we consider a well-known decomposition method called the method of frames (MOF) [26]. The MOF decomposition of a signal y can be obtained with the inverse linear mapping such that x_MF = D^T (DD^T)^{−1} y = arg min { ||x||_2 s.t. y = Dx }. Many common loss functions ρ are nonnegative and vanish at the origin, including the gauges and nonconvex losses considered in this study. For these losses, under the assumption that D is full row-rank, ρ(y − Dx_MF) = 0 and ψ(τ_MF) = −σ with τ_MF = ||x_MF||_1. For the left endpoint, we consider x = 0 and τ = 0, with loss equal to ρ(y). Bracketing the root searching interval between the points x_MF and 0 ensures finding a solution for (7), since they provide two initial points with opposite signs for ψ, as long as ρ(y) > σ.

2) Solving (Pσ): We combine the proposed Regula Falsi methods and a (Pτ) solver to solve (Pσ) as follows (see the sketch after this list):

• Choose initial τ values. Choose two initial values with opposite signs to ensure convergence of Regula Falsi type methods. The default choice is given by τ = 0 and τ = τ_MF.
• Apply the steps of the Regula Falsi type methods. Every iteration of the Regula Falsi-type methods requires solving (Pτ), except at τ = 0 and τ = τ_MF.
• Terminate once the stopping criteria are met.
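A minimal sketch of the overall procedure, composing the regula_falsi and proj_l1 routines sketched above with the projected gradient step (12), could look as follows. The fixed step size γ and iteration counts are illustrative stand-ins for the spectral line search of [9], and all names are our own.

```python
import numpy as np

# Sketch of the level-set driver: bracket [0, tau_MF], then run a
# Regula Falsi-type method on psi(tau) = nu(tau) - sigma.
# Assumes rho(y) > sigma and D full row-rank, so psi(0) > 0 > psi(tau_MF).

def solve_P_tau(D, y, tau, loss_grad, gamma=1e-2, iters=500):
    x = np.zeros(D.shape[1])                  # projected gradient, eq. (12)
    for _ in range(iters):
        x = proj_l1(x + gamma * D.T @ loss_grad(y - D @ x), tau)
    return x

def solve_P_sigma(D, y, sigma, loss, loss_grad):
    def psi(tau):                             # eq. (2): nu(tau) - sigma
        return loss(y - D @ solve_P_tau(D, y, tau, loss_grad)) - sigma
    x_mf = D.T @ np.linalg.solve(D @ D.T, y)  # method-of-frames point [26]
    tau_mf = np.sum(np.abs(x_mf))             # right endpoint of the bracket
    tau_star = regula_falsi(psi, 0.0, tau_mf, mu_rule="illinois")
    return solve_P_tau(D, y, tau_star, loss_grad)

# Example loss: Student's t with parameter nu (see Section IV), e.g.
#   nu = 1e-2
#   loss = lambda r: np.sum(nu * np.log1p(r**2 / nu))
#   loss_grad = lambda r: 2 * r / (1 + r**2 / nu)
#   x_hat = solve_P_sigma(D, y, sigma, loss, loss_grad)
```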

IV. SIMULATIONS

In order to examine the performance of the Regula Falsi type methods given in Table I, we created a test environment and benchmark using the Sparco framework [27]. For the simulations, real-valued problems are chosen from this collection of test problems, which includes many examples from the literature. Details about the problems and related publications can be found in [27].

The performance of the Regula Falsi type methods is investigated for three different loss functions ρ: least squares ρl(x) = ||x||_2, Huber

ρh(x) = Σ_{i=1}^N  { x_i^2 / (2δ), if |x_i| ≤ δ;  |x_i| − δ/2, otherwise },    (14)

and Student's t, ρs(x) = Σ_{i=1}^N ν log(1 + x_i^2/ν), where δ and ν are the tuning parameters for ρh and ρs, respectively; we take δ = 5 × 10^−3 and ν = 10^−2 in our simulations. The N, M, and ρ(y) values for the chosen problems are given in Table II.

TABLE II: N, M, ρ(y) values for the problem setups.

  Problems   id   M     N     ρl(y)     ρh(y)   ρs(y)
  cos-spike  3    1024  2048  102.2423  2378.8  25.09
  gauss-en   11   256   1024  99.9055   1272.5  8.957
  jitter     902  200   1000  0.4476    4.6881  0.0901

We solve (Pσ) for a range of σ values chosen relative to ρ(y), in particular 5 × 10^−1 ρ(y), 5 × 10^−2 ρ(y) and 5 × 10^−3 ρ(y). We compute the residuals ρ(rσ) = ρ(y − Dxσ), the norms ||xσ||_1, the number of nonzeros (nnz) for each solution xσ, and iter, the total number of (Pτ) solves needed to reach these solutions; the results are displayed in Table III.

Newton's method requires fewer (Pτ) solves (see Table III). However, solving (7) is more expensive for Newton's method than for Regula Falsi-type methods, since the derivative of the nonlinear equation must be calculated along with the function evaluation, while Regula Falsi-type methods need only the function evaluation. From a robustness standpoint, Newton, secant, and their variants are not guaranteed to converge for nonconvex loss functions, in contrast to Regula Falsi-type methods.

The Pareto frontier ν(τ) is shown for the gauss-en problem with losses ρl, ρh and ρs in Figure 2. We expect similar patterns in many problems. As expected, the Pareto frontier is nonconvex, and Newton iterations will not stay in the feasible area for ρs in this example, since the tangent lines can leave the feasible region.

Figure 3 shows a typical compressed sensing example (one way to generate such an instance is sketched after the Figure 3 caption below). A 20-sparse vector is recovered using a normally distributed Parseval frame D ∈ R^(175×600). A measurement is generated according to y = Dx + w + ζ, where the noise w is zero-mean normal with variance 0.005 and ζ has five randomly placed outliers drawn from a zero-mean normal distribution with variance 4. (Pσ) is solved with σ = ρ(ζ) for a fair comparison of ρl, ρh and ρs. The Huber loss is less sensitive to outliers in the measurement data than least squares, but Student's t outperforms the Huber loss since it grows sublinearly as outliers increase, a property also noted by [11].

V. CONCLUSION

In this note, we developed a new approach using bracketing Regula Falsi-type methods that enables level-set methods to be applied to sparse optimization problems with nonconvex likelihood constraints, significantly expanding their usability. These methods achieve performance comparable to Newton's method for root finding on convex problems, and are guaranteed to converge in the nonconvex case, where Newton and secant variants may fail.


Fig. 2: From left to right, ν(τ) for the gauss-en problem with ρl, ρh and ρs.

Fig. 3: Left, top to bottom: true signal, reconstructions with least squares, Huber and Student’s t losses. Right, top to bottom: true errors, least squares, Huber and Student’s t residuals.
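For reference, a test instance matching the description of the Figure 3 experiment in Section IV could be generated along the following lines; the random seed and the frame construction via an SVD are our own choices, so the exact data will differ from the figure.

```python
import numpy as np

# Sketch of the Figure 3 setup: a 20-sparse signal, a Parseval frame
# D in R^(175x600), Gaussian noise w, and five large outliers zeta.

rng = np.random.default_rng(0)
M, N, k = 175, 600, 20
A = rng.standard_normal((M, N))
U, _, Vt = np.linalg.svd(A, full_matrices=False)
D = U @ Vt                                    # D @ D.T = I: a Parseval frame
x_true = np.zeros(N)
x_true[rng.choice(N, k, replace=False)] = rng.standard_normal(k)
w = np.sqrt(0.005) * rng.standard_normal(M)   # zero-mean noise, variance 0.005
zeta = np.zeros(M)
zeta[rng.choice(M, 5, replace=False)] = 2.0 * rng.standard_normal(5)  # variance 4
y = D @ x_true + w + zeta                     # the paper solves (P_sigma) with sigma = rho(zeta)
```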

TABLE III: Simulation Results for Solving (Pσ).

Column groups, left to right: least squares, Huber (δ = 5 × 10⁻³), Student's t (ν = 10⁻²); each group reports ρ(rσ), ||xσ||1, nnz, iter.

  Problem    σ/ρ(y)  Method        | ρl(rσ)  ||xσ||1  nnz  iter | ρh(rσ)  ||xσ||1  nnz  iter | ρs(rσ)  ||xσ||1  nnz   iter
  cos-spike  0.5     Regula Falsi  | 51.12   66.24    2    10   | 1189.4  68.328   2    8    | 12.545  94.289   10    14
                     Illinois      | 51.12   66.24    2    7    | 1189.4  68.328   2    7    | 12.545  94.289   10    7
                     Pegasus       | 51.12   66.24    2    7    | 1189.4  68.328   3    6    | 12.545  94.289   10    6
                     And.-Björck   | 51.12   66.24    2    9    | 1189.4  68.328   3    9    | 12.545  94.289   10    10
                     Newton        | 51.12   66.24    2    3    | 1189.4  68.328   2    3    | −       −        −     −
             0.05    Regula Falsi  | 5.112   188.4    75   85   | 118.94  134.71   2    68   | 1.2105  171.09   79    72
                     Illinois      | 5.112   188.4    75   13   | 118.94  134.71   2    15   | 1.2214  172.65   78    16
                     Pegasus       | 5.112   188.4    75   12   | 118.94  134.71   2    14   | 1.259   170.95   76    34
                     And.-Björck   | 5.112   188.4    75   27   | 118.94  134.71   2    32   | 1.251   170.85   75    50
                     Newton        | 5.112   188.4    75   5    | 118.94  134.7    2    4    | −       −        −     −
             0.005   Regula Falsi  | 0.511   235.6    123  396  | 11.893  229.02   127  451  | 0.1272  231.15   124   383
                     Illinois      | 0.511   235.6    123  19   | 11.894  229.02   125  15   | 0.1183  231.15   124   30
                     Pegasus       | 0.511   235.6    123  19   | 11.894  229.02   130  14   | 0.118   231.16   121   27
                     And.-Björck   | 0.511   235.6    123  34   | 11.894  229.02   130  21   | 0.1273  231.13   122   19
                     Newton        | 0.511   235.6    123  5    | 11.894  229.02   125  3    | −       −        −     −
  gauss-en   0.5     Regula Falsi  | 49.95   11.9     16   9    | 636.27  11.97    19   9    | 4.479   16.6     139   11
                     Illinois      | 49.95   11.9     16   7    | 636.27  11.97    19   6    | 4.479   16.6     139   8
                     Pegasus       | 49.95   11.9     16   5    | 636.27  11.97    19   5    | 4.479   16.6     139   6
                     And.-Björck   | 49.95   11.9     16   7    | 636.27  11.97    19   7    | 4.479   16.6     139   8
                     Newton        | 49.95   11.9     16   4    | 636.27  11.97    19   4    | −       −        −     −
             0.05    Regula Falsi  | 4.995   26.45    35   26   | 63.63   26.406   50   27   | 0.448   27.82    39    24
                     Illinois      | 4.995   26.45    35   11   | 63.63   26.406   50   14   | 0.448   27.82    39    15
                     Pegasus       | 4.995   26.45    35   11   | 63.63   26.406   50   11   | 0.448   27.82    39    15
                     And.-Björck   | 4.995   26.45    35   15   | 63.63   26.406   50   27   | 0.448   27.82    39    28
                     Newton        | 4.995   26.45    35   4    | 63.63   26.406   50   4    | −       −        −     −
             0.005   Regula Falsi  | 0.499   28.06    35   208  | 6.3624  28.036   46   209  | 0.045   28.12    1024  207
                     Illinois      | 0.499   28.06    35   19   | 6.3627  28.036   46   19   | 0.045   28.12    1024  27
                     Pegasus       | 0.499   28.06    35   19   | 6.3627  28.036   46   20   | 0.045   28.12    1024  24
                     And.-Björck   | 0.499   28.06    35   58   | 6.3627  28.036   46   62   | 0.045   28.12    1024  65
                     Newton        | 0.499   28.06    35   3    | 6.3628  28.036   46   4    | −       −        −     −
  jitter     0.5     Regula Falsi  | 0.224   0.766    3    7    | 2.344   0.6959   3    7    | 0.045   0.4548   2     9
                     Illinois      | 0.224   0.766    3    6    | 2.344   0.6958   3    7    | 0.045   0.4548   2     7
                     Pegasus       | 0.224   0.766    3    6    | 2.344   0.6958   3    6    | 0.045   0.4548   2     6
                     And.-Björck   | 0.224   0.766    3    8    | 2.344   0.6958   3    8    | 0.045   0.4548   2     9
                     Newton        | 0.224   0.766    3    3    | 2.344   0.6958   3    3    | −       −        −     −
             0.05    Regula Falsi  | 0.022   1.537    3    39   | 0.2343  1.4338   3    43   | 0.005   1.2587   3     54
                     Illinois      | 0.022   1.537    3    12   | 0.2344  1.4338   3    15   | 0.005   1.2585   3     14
                     Pegasus       | 0.022   1.537    3    12   | 0.2344  1.4338   3    14   | 0.005   1.2585   3     12
                     And.-Björck   | 0.022   1.537    3    30   | 0.2344  1.4338   3    39   | 0.005   1.2585   3     41
                     Newton        | 0.022   1.537    3    2    | 0.2344  1.4338   3    4    | −       −        −     −
             0.005   Regula Falsi  | 0.002   1.614    3    375  | 0.0233  1.5646   3    391  | 0.0005  1.5086   3     413
                     Illinois      | 0.002   1.614    3    19   | 0.0234  1.5644   3    16   | 0.0005  1.508    3     16
                     Pegasus       | 0.002   1.614    3    20   | 0.0234  1.5644   3    14   | 0.0005  1.508    3     16
                     And.-Björck   | 0.002   1.614    3    35   | 0.0234  1.5644   3    27   | 0.0005  1.508    3     107
                     Newton        | 0.002   1.614    3    2    | 0.0235  1.5643   3    5    | −       −        −     −

REFERENCES

[1] E. J. Candes, J. Romberg, and T. Tao, "Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information," IEEE Transactions on Information Theory, vol. 52, no. 2, pp. 489–509, 2006.
[2] D. L. Donoho, "Compressed sensing," IEEE Trans. Inf. Theor., vol. 52, no. 4, pp. 1289–1306, Apr. 2006. [Online]. Available: https://doi.org/10.1109/TIT.2006.871582
[3] D. L. Donoho and X. Huo, "Uncertainty principles and ideal atomic decomposition," IEEE Transactions on Information Theory, vol. 47, no. 7, pp. 2845–2862, 2001.
[4] J.-J. Fuchs, "On sparse representations in arbitrary redundant bases," IEEE Transactions on Information Theory, vol. 50, no. 6, pp. 1341–1344, 2004.
[5] E. J. Candes and T. Tao, "Decoding by linear programming," IEEE Transactions on Information Theory, vol. 51, no. 12, pp. 4203–4215, 2005.
[6] M. Lustig, J. H. Lee, D. L. Donoho, and J. M. Pauly, "Faster imaging with randomly perturbed, undersampled spirals and ℓ1 reconstruction," in Proceedings of the 13th Annual Meeting of ISMRM, 2005, p. 685.
[7] A. Y. Aravkin, J. V. Burke, D. Drusvyatskiy, M. P. Friedlander, and S. Roy, "Level-set methods for convex optimization," Mathematical Programming, vol. 174, no. 1, pp. 359–390, 2019.
[8] S. Foucart and H. Rauhut, A Mathematical Introduction to Compressive Sensing, ser. Applied and Numerical Harmonic Analysis. Springer New York, 2013.
[9] E. van den Berg and M. Friedlander, "Probing the Pareto frontier for basis pursuit solutions," SIAM Journal on Scientific Computing, vol. 31, no. 2, pp. 890–912, 2009. [Online]. Available: https://doi.org/10.1137/080714488
[10] ——, "Sparse optimization with least-squares constraints," SIAM Journal on Optimization, vol. 21, no. 4, pp. 1201–1229, 2011. [Online]. Available: https://doi.org/10.1137/100785028
[11] A. Y. Aravkin, J. V. Burke, and M. P. Friedlander, "Variational properties of value functions," SIAM Journal on Optimization, vol. 23, no. 3, pp. 1689–1717, 2013.
[12] G. Dahlquist and Å. Björck, Numerical Methods, ser. Dover Books on Mathematics. Dover Publications, 2003. [Online]. Available: https://books.google.de/books?id=armfeHpJIwAC
[13] N. Städler, P. Buehlmann, and S. van de Geer, "ℓ1-penalization for mixture regression models," Test, vol. 19, no. 2, pp. 209–256, 2010.
[14] P. Buehlmann and S. van de Geer, "Non-convex loss functions and ℓ1-regularization," Statistics for High-Dimensional Data, pp. 293–338, 2011.
[15] "Robust and trend-following Kalman smoothers using Student's t," IFAC Proceedings Volumes, vol. 45, no. 16, pp. 1215–1220, 2012, 16th IFAC Symposium on System Identification.
[16] A. Y. Aravkin, J. V. Burke, and G. Pillonetto, "Sparse/robust estimation and Kalman smoothing with nonsmooth log-concave densities: Modeling, computation, and theory," The Journal of Machine Learning Research, vol. 14, no. 1, pp. 2689–2728, 2013.
[17] A. Aravkin, M. P. Friedlander, F. J. Herrmann, and T. van Leeuwen, "Robust inversion, dimensionality reduction, and randomized sampling," Mathematical Programming, vol. 134, no. 1, pp. 101–125, 2012.
[18] P. M. Pardalos, A. Žilinskas, and J. Žilinskas, Non-convex multi-objective optimization. Springer, 2017.
[19] K. Miettinen, "Some methods for nonlinear multi-objective optimization," in International Conference on Evolutionary Multi-Criterion Optimization. Springer, 2001, pp. 1–20.
[20] E. van den Berg, "Convex optimization for generalized sparse recovery," Ph.D. dissertation, University of British Columbia, 2009.
[21] S. Galdino, "A family of regula falsi root-finding methods."
[22] J. A. Ford, "Improved algorithms of Illinois-type for the numerical solution of nonlinear equations."
[23] J. Duchi, S. Shalev-Shwartz, Y. Singer, and T. Chandra, "Efficient projections onto the ℓ1-ball for learning in high dimensions," in ICML '08: Proceedings of the 25th International Conference on Machine Learning, New York, NY, USA, 2008, pp. 272–279. [Online]. Available: http://doi.acm.org/10.1145/1390156.1390191
[24] M. Held, P. Wolfe, and H. P. Crowder, "Validation of subgradient optimization," Math. Program., vol. 6, no. 1, pp. 62–88, Dec. 1974. [Online]. Available: https://doi.org/10.1007/BF01580223
[25] L. Condat, "Fast projection onto the simplex and the ℓ1 ball," Mathematical Programming, Series A, vol. 158, no. 1, pp. 575–585, Jul. 2016. [Online]. Available: https://hal.archives-ouvertes.fr/hal-01056171
[26] I. Daubechies, "Time-frequency localization operators: a geometric phase space approach," IEEE Transactions on Information Theory, vol. 34, no. 4, pp. 605–612, July 1988.
[27] E. van den Berg, M. P. Friedlander, G. Hennenfent, F. J. Herrmann, R. Saab, and Ö. Yılmaz, "Algorithm 890: Sparco: A testing framework for sparse reconstruction," ACM Trans. Math. Softw., vol. 35, no. 4, Feb. 2009. [Online]. Available: https://doi.org/10.1145/1462173.1462178