ℓp Quasi-Norm Minimization

M.E. Ashour∗, C.M. Lagoa∗ and N.S. Aybat† ∗ The Department of Electrical Engineering and Computer Science, The Pennsylvania State University. † The Department of Industrial Engineering, The Pennsylvania State University.

Abstract—The ℓp (0 < p < 1) quasi-norm is used as a sparsity-inducing function and has applications in diverse areas, e.g., statistics, machine learning, and signal processing. This paper proposes a heuristic based on a two-block ADMM algorithm for tackling ℓp quasi-norm minimization problems. For p = s/q < 1, s, q ∈ Z+, the proposed algorithm requires solving for the roots of a scalar degree 2q polynomial, as opposed to applying a soft thresholding operator in the case of ℓ1. We show numerical results for two example applications, sparse signal reconstruction from few noisy measurements and spam email classification using support vector machines. Our method obtains significantly sparser solutions than those obtained by ℓ1 minimization while achieving a similar level of measurement fitting in signal reconstruction, and of training and test set accuracy in classification.

This work was partially supported by National Institutes of Health (NIH) Grant R01 HL142732 and National Science Foundation (NSF) Grant 1808266.

I. INTRODUCTION

This paper considers problems of the form:

\[
\min_{x} \ \|x\|_p^p \triangleq \sum_{i \in [n]} |x_i|^p \quad \text{s.t.} \quad f(x) \le 0, \tag{1}
\]

where p ∈ (0, 1), [n] = {1, . . . , n}, n ∈ Z+, and f : R^n → R is a convex, possibly nonsmooth function. This formulation arises in areas such as machine learning and signal processing. For instance, let {(u_i, v_i)}_{i∈[m]} be a training set of feature-label pairs (u_i, v_i), m ∈ Z+. In regression, one seeks to fit a model that relates v_i ∈ R to u_i via solving (1) with f(x) = ‖Ux − v‖_2 − ε, where the ith row of U ∈ R^{m×n} is constructed using u_i, v = [v_i]_{i∈[m]}, and ε > 0. Alternatively, if v_i ∈ {−1, 1} denotes a class label in a binary classification problem, one might seek a linear classifier with decision rule v̂ = sign(u^⊤x), e.g., using support vector machines, where the first entry of u and of x are 1 and the bias term, respectively. The classifier can be obtained by solving (1) with f(x) = (1/m) Σ_{i∈[m]} (1 − v_i u_i^⊤ x)^+ − ε, where (·)^+ = max(·, 0). In both examples, the ℓp quasi-norm of x is used in (1) as a sparsity-inducing function. Problem (1) provides a tradeoff between how well a model performs on a certain task and its complexity, controlled by ε.

We propose an ADMM algorithm for approximating the solution of (1). For p = s/q < 1, the computational complexity of the proposed algorithm is similar to that of ℓ1 minimization, except for the additional effort of solving for the roots of a scalar degree 2q polynomial as opposed to applying the soft thresholding operator for ℓ1. We present numerical results showing that our method significantly outperforms ℓ1 minimization in terms of the sparsity level of the obtained solutions.

A sparse solution to (1) is defined as one that has a small number of entries whose magnitudes are significantly different than zero [1]. Indeed, many signals/images are either sparse or compressible, i.e., can be approximated by a sparse representation with respect to some transform domain. The development of a plethora of overcomplete waveform dictionaries motivates the basis pursuit principle that decomposes a signal into a sparse superposition of dictionary elements [2]. Furthermore, sparsity finds application in object recognition and classification problems, e.g., [5], and in signal estimation from incomplete linear measurements, known as compressed sensing [6], [7]. Reference [8] provides a comprehensive review of theoretical results on sparse solutions of linear systems and their applications in inverse problems.

Retrieving sparse solutions of underdetermined linear systems has received tremendous attention over the past two decades; see [8] and references therein. Reference [10] identifies the major algorithmic approaches for tackling sparse approximation problems, namely, greedy pursuit [11], convex relaxation [2], [6], [7], [12], and nonconvex optimization [13], [14]. Problems seeking sparse solutions are often posed as min{f(x) + µ g(x)} for some µ > 0 and a sparsity-inducing penalty function g, e.g., g(x) = ‖x‖_p^p, where g can be either convex, e.g., p = 1, or nonconvex, e.g., 0 ≤ p < 1. For a comprehensive reference on sparsity-inducing penalty functions, see [15]. It has been shown that exact sparse signal recovery from few measurements is possible by letting g be the ℓ1 norm if the measurement matrix satisfies a certain restricted isometry property (RIP) [1], [16]. However, RIP is a stringent property. Motivated by the fact that ‖x‖_p^p → ‖x‖_0 as p → 0, it is natural to consider the ℓp quasi-norm problem (0 < p < 1). It has been shown in [17] that ℓp minimization with p < 1 achieves perfect signal reconstruction under less restrictive isometry conditions than those needed for ℓ1. Several references have considered sparse signal reconstruction via nonconvex optimization, [13], [17]–[21] to name a few. In [13], it is shown that by replacing the ℓ1 norm with the ℓp quasi-norm, signal recovery is possible using fewer measurements. Furthermore, [13] presents a simple projected gradient descent method that identifies a local minimizer of the problem. An algorithm that uses operator splitting and Bregman iteration methods as well as a shrinkage operator is presented in [18]. Reference [19] proposes an algorithm based on the idea of locally replacing the original nonconvex objective function by quadratic convex functions that are easily minimized, and establishes a connection to iterative reweighted least squares [22]. In [20], an interior-point potential reduction algorithm is proposed to compute an ε-KKT solution in O((n/ε) log(1/ε)) iterations, where n is the dimension of x. Reference [21] uses ADMM and proposes a generalized shrinkage operator for nonconvex sparsity-inducing penalty functions.
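To make the two instances of f(x) above concrete, the following minimal sketch (our own illustration; variable names and the value of ε are not from the paper) evaluates the regression and hinge-loss constraint functions; a point x is feasible for (1) whenever the returned value is nonpositive.

```python
import numpy as np

def f_regression(x, U, v, eps):
    # f(x) = ||U x - v||_2 - eps: nonpositive iff the residual norm is at most eps.
    return np.linalg.norm(U @ x - v) - eps

def f_hinge(x, U, v, eps):
    # f(x) = (1/m) * sum_i (1 - v_i u_i^T x)^+ - eps: nonpositive iff the
    # average hinge loss of the linear classifier x is at most eps.
    return np.mean(np.maximum(1.0 - v * (U @ x), 0.0)) - eps
```

Here U stacks the feature vectors u_i as rows and v collects the corresponding targets or labels.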


II. ALGORITHM

This section develops a method for approximating the solution of (1). Problem (1) is convex at p = 1; hence, it can be solved efficiently and exactly. However, the problem becomes nonconvex when p < 1. An epigraph equivalent formulation of (1) is obtained by introducing the variable t = [t_i]_{i∈[n]}:

\[
\begin{aligned}
\min_{x,t} \quad & \mathbf{1}^\top t \\
\text{s.t.} \quad & t_i \ge |x_i|^p, \ i \in [n] \\
& f(x) \le 0,
\end{aligned} \tag{2}
\]

where 1 is a vector of all ones. Let the nonconvex set X ⊂ R^2 be the epigraph of the scalar function |x|^p, i.e., X = {(x, t) ∈ R^2 : t ≥ |x|^p}. Then, (2) can be cast as

\[
\begin{aligned}
\min_{x,t} \quad & \sum_{i \in [n]} \mathbf{1}_{\mathcal{X}}(x_i, t_i) + \mathbf{1}^\top t \\
\text{s.t.} \quad & f(x) \le 0,
\end{aligned} \tag{3}
\]

where 1_X(·) is the indicator function of the set X, i.e., it evaluates to zero if its argument belongs to the set X and to +∞ otherwise. ADMM exploits the structure of the problem to split the optimization over the variables by iteratively solving fairly simple subproblems. In particular, we introduce auxiliary variables y = [y_i]_{i∈[n]} and z = [z_i]_{i∈[n]} and obtain an ADMM equivalent formulation of (3) given by:

\[
\begin{aligned}
\min_{x,t,y,z} \quad & \sum_{i \in [n]} \mathbf{1}_{\mathcal{X}}(x_i, t_i) + \mathbf{1}_{\mathcal{Y}}(y) + \mathbf{1}^\top z \\
\text{s.t.} \quad & x = y : \lambda \\
& t = z : \theta,
\end{aligned} \tag{4}
\]

where Y is the 0-sublevel set of f, i.e., Y = {y ∈ R^n : f(y) ≤ 0}. The dual variables associated with the constraints x = y and t = z are λ and θ, respectively. Hence, the Lagrangian function corresponding to (4), augmented with a quadratic penalty on the violation of the equality constraints with penalty parameter ρ > 0, is given by:

\[
L_\rho(x,t,y,z,\lambda,\theta) = \sum_{i \in [n]} \mathbf{1}_{\mathcal{X}}(x_i,t_i) + \mathbf{1}_{\mathcal{Y}}(y) + \mathbf{1}^\top z + \lambda^\top (x-y) + \theta^\top (t-z) + \frac{\rho}{2}\left(\|x-y\|^2 + \|t-z\|^2\right). \tag{5}
\]

Consider the two block variables (x, t) and (y, z). Then, ADMM [23] consists of the following iterations:

\[
\begin{aligned}
(x,t)^{k+1} &= \operatorname*{argmin}_{x,t} \ L_\rho(x, t, y^k, z^k, \lambda^k, \theta^k) && (6) \\
(y,z)^{k+1} &= \operatorname*{argmin}_{y,z} \ L_\rho(x^{k+1}, t^{k+1}, y, z, \lambda^k, \theta^k) && (7) \\
\lambda^{k+1} &= \lambda^k + \rho (x^{k+1} - y^{k+1}) && (8) \\
\theta^{k+1} &= \theta^k + \rho (t^{k+1} - z^{k+1}). && (9)
\end{aligned}
\]

According to the expression of the augmented Lagrangian function in (5), it follows from (6) that the variables x and t are updated via solving the following nonconvex problem:

\[
\begin{aligned}
\min_{x,t} \quad & \left\|x - y^k + \frac{\lambda^k}{\rho}\right\|^2 + \left\|t - z^k + \frac{\theta^k}{\rho}\right\|^2 \\
\text{s.t.} \quad & (x_i, t_i) \in \mathcal{X}, \ i \in [n].
\end{aligned} \tag{10}
\]

Exploiting the separable structure of (10), one immediately concludes that (10) splits into n independent 2-dimensional problems that can be solved in parallel, i.e., for each i ∈ [n],

\[
(x_i, t_i)^{k+1} = \Pi_{\mathcal{X}}\!\left(y_i^k - \frac{\lambda_i^k}{\rho},\ z_i^k - \frac{\theta_i^k}{\rho}\right), \tag{11}
\]

where Π_X(·) denotes the Euclidean projection operator onto the set X. Furthermore, (5) and (7) imply that y and z are independently updated as follows:

\[
y^{k+1} = \Pi_{\mathcal{Y}}\!\left(x^{k+1} + \frac{\lambda^k}{\rho}\right) \tag{12}
\]
\[
z^{k+1} = t^{k+1} + \frac{\theta^k - \mathbf{1}}{\rho}. \tag{13}
\]

Algorithm 1 summarizes the proposed ADMM algorithm.

Algorithm 1: ADMM (ρ > 0)
  1: Initialize y^0, z^0, λ^0, θ^0
  2: for k ≥ 0 do
  3:   (x_i, t_i)^{k+1} ← Π_X(y_i^k − λ_i^k/ρ, z_i^k − θ_i^k/ρ), ∀i ∈ [n]
  4:   y^{k+1} ← Π_Y(x^{k+1} + λ^k/ρ)
  5:   z^{k+1} ← t^{k+1} + (θ^k − 1)/ρ
  6:   λ^{k+1} ← λ^k + ρ(x^{k+1} − y^{k+1})
  7:   θ^{k+1} ← θ^k + ρ(t^{k+1} − z^{k+1})

It is clear that z, λ, and θ admit closed form updates. However, updating (x, t) requires solving n nonconvex problems. Moreover, a projection onto the convex set Y is needed for updating y, which can lead to a heavy computational burden. In the following two sections, we present our approach for handling these two concerns.
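The iterations above reduce to the two projections and the closed-form updates of Algorithm 1. A minimal sketch in Python is given below, assuming the caller supplies the two projection routines: project_X performs the elementwise nonconvex projection onto X (Section III, Algorithm 2 applied coordinate-wise) and project_Y is the Euclidean projection onto the 0-sublevel set of f (Section IV). The function names, zero initialization, and fixed iteration budget are our own choices, not prescribed by the paper.

```python
import numpy as np

def lp_admm(project_X, project_Y, n, rho=1.0, num_iters=100):
    """Sketch of Algorithm 1 (two-block ADMM) for problem (1).

    project_X(xbar, tbar) -> (x, t): elementwise projection onto
        X = {(x, t) : t >= |x|^p}, applied coordinate-wise to the vectors.
    project_Y(ybar) -> y: Euclidean projection onto Y = {y : f(y) <= 0}.
    """
    y = np.zeros(n); z = np.zeros(n)           # auxiliary variables
    lam = np.zeros(n); theta = np.zeros(n)     # dual variables
    for _ in range(num_iters):
        # Step 3: nonconvex projection, n independent 2-D problems (eq. (11)).
        x, t = project_X(y - lam / rho, z - theta / rho)
        # Step 4: convex projection onto the 0-sublevel set of f (eq. (12)).
        y = project_Y(x + lam / rho)
        # Step 5: closed-form z-update (eq. (13)); the "1" is the all-ones vector.
        z = t + (theta - 1.0) / rho
        # Steps 6-7: dual updates (eqs. (8)-(9)).
        lam = lam + rho * (x - y)
        theta = theta + rho * (t - z)
    return y  # y satisfies f(y) <= 0 by construction; at convergence x ≈ y
```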


III. NONCONVEX PROJECTION

In this section, we present the method used to tackle the nonconvex projection problem required to update x and t. Among the advantages of the proposed algorithm is that it is amenable to decentralization. As is clear from (11), x and t can be updated element-wise via performing a projection operation onto the nonconvex set X, one for each i ∈ [n]. The n projection problems can be run independently in parallel. We now outline the proposed idea for solving one such projection, i.e., we suppress the dependence on the index of the entry of x and t. For (x̄, t̄) ∈ R^2, Π_X(x̄, t̄) entails solving

\[
\begin{aligned}
\min_{x,t} \quad & g(x,t) \triangleq (t - \bar{t})^2 + (x - \bar{x})^2 \\
\text{s.t.} \quad & t \ge |x|^p.
\end{aligned} \tag{14}
\]

If t̄ ≥ |x̄|^p, then trivially Π_X(x̄, t̄) = (x̄, t̄). Thus, we focus on the case in which t̄ < |x̄|^p. The following proposition states necessary optimality conditions for (14).

Proposition 1. Let t̄ < |x̄|^p, and let (x*, t*) be an optimal solution of (14). Then, the following properties are satisfied:
(a) sign(x*) = sign(x̄),
(b) t* ≥ t̄,
(c) |x*|^p ≥ t̄,
(d) t* = |x*|^p.

Proof. We prove the statements by contradiction as follows:

(a) Suppose that sign(x*) ≠ sign(x̄). Then

\[
|x^* - \bar{x}| = |x^* - 0| + |\bar{x} - 0| > |\bar{x} - 0|, \tag{15}
\]

i.e., (x* − x̄)^2 > (0 − x̄)^2. Hence, g(x*, t*) − g(0, t*) > 0. Moreover, the feasibility of (x*, t*) implies that t* > 0. Thus, (0, t*) is feasible and attains a lower objective value than that attained by (x*, t*). This contradicts the optimality of (x*, t*).

(b) Assume that t* < t̄. Then,

\[
g(x^*, t^*) - g(x^*, \bar{t}) = (t^* - \bar{t})^2 > 0. \tag{16}
\]

Furthermore, by the feasibility of (x*, t*), we have |x*|^p ≤ t* < t̄. Thus, (x*, t̄) is feasible and attains a lower objective value than that attained by (x*, t*). This contradicts the optimality of (x*, t*).

(c) Suppose that |x*|^p < t̄, i.e.,

\[
-\bar{t}^{1/p} < x^* < \bar{t}^{1/p}. \tag{17}
\]

We now consider two cases, x̄ > 0 and x̄ < 0. First, let x̄ > 0. Then, we have by (a) and (17) that 0 < x* < t̄^{1/p}. Since t̄ < |x̄|^p, i.e., (x̄, t̄) ∉ X, we have t̄^{1/p} < x̄ and hence 0 < x* < t̄^{1/p} < x̄. Pick x_0 > 0 such that |x_0|^p = t̄, i.e., x_0 = t̄^{1/p}. Then clearly, x* < x_0 < x̄. Thus, we have

\[
g(x^*, t^*) - g(x_0, t^*) = (x^* - \bar{x})^2 - (x_0 - \bar{x})^2 > 0, \tag{18}
\]

where the last inequality follows from the just proven relation x* < x_0 < x̄. Moreover, we have by (b) that |x_0|^p = t̄ ≤ t*. Thus, (x_0, t*) is feasible and attains a lower objective value than that attained by (x*, t*). This contradicts the optimality of (x*, t*).

On the other hand, let x̄ < 0. Then, we have by (a) and (17) that −t̄^{1/p} < x* < 0. Since t̄ < |x̄|^p, i.e., (x̄, t̄) ∉ X, then t̄^{1/p} < |x̄|, i.e., x̄ < −t̄^{1/p}. Therefore,

\[
\bar{x} < -\bar{t}^{1/p} < x^*. \tag{19}
\]

Pick x_0 < 0 such that |x_0|^p = t̄, i.e., x_0 = −t̄^{1/p}. Then, (18) also holds when x̄ < 0. Note that |x_0|^p = t̄ ≤ t* by (b). Thus, (x_0, t*) is feasible and attains a lower objective value than that attained by (x*, t*). This contradicts the optimality of (x*, t*).

(d) The feasibility of (x*, t*) eliminates the possibility that t* < |x*|^p. Now let t* > |x*|^p and pick t_0 = |x*|^p. Then, t̄ ≤ |x*|^p = t_0 < t*, where the first inequality follows from (c). Then, 0 ≤ t_0 − t̄ < t* − t̄. Thus, we have

\[
g(x^*, t^*) - g(x^*, t_0) = (t^* - \bar{t})^2 - (t_0 - \bar{t})^2 > 0. \tag{20}
\]

Furthermore, the feasibility of (x*, t_0) follows trivially from the choice of t_0. Thus, (x*, t_0) is feasible and attains a lower objective value than that attained by (x*, t*). This contradicts the optimality of (x*, t*).

This concludes the proof.

We now make use of the fact that, for (14), an optimal solution (x*, t*) satisfies t* = |x*|^p; hence, (14) reduces to solving

\[
\min_{x} \ (|x|^p - \bar{t})^2 + (x - \bar{x})^2. \tag{21}
\]

The first order necessary optimality condition for (21) implies the following:

\[
p |x^*|^{p-1} \operatorname{sign}(x^*)\,(|x^*|^p - \bar{t}) + x^* - \bar{x} = 0. \tag{22}
\]

By the symmetry of the function |x|^p, assume without loss of generality that x* > 0, and let 0 < p = s/q < 1 for some s, q ∈ Z+. A change of variables a = x^{1/q} plugged into (22) shows that finding an optimal solution for (14) reduces to finding a root of the following scalar degree 2q polynomial:

\[
a^{2q} + \frac{s}{q}\left(a^{2s} - \bar{t}\, a^{s}\right) - \bar{x}\, a^{q}. \tag{23}
\]

Thus, to find Π_X(x̄, t̄), solve for a root a* of the polynomial in (23) such that (a*^q, a*^s) minimizes g(x, t). Algorithm 2 summarizes the method we use to solve problem (14). In case x̄ = 0, we set x* = t* = 0. If the set R̄ is empty, we set x* = 0 and t* = (t̄)^+.

Algorithm 2: Nonconvex projection (p = s/q < 1)
  1: R ← roots{a^{2q} + (s/q)(a^{2s} − t̄ a^s) − |x̄| a^q}
  2: R̄ ← R \ {complex numbers and negative reals in R}
  3: T ← {(r^q, r^s) : r ∈ R̄}
  4: (x̂, t*) ← argmin{g(x, t) : (x, t) ∈ T}
  5: x* ← sign(x̄) x̂
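A minimal sketch of Algorithm 2 for a single coordinate is given below (our own code, not from the paper), using numpy.roots as the polynomial root finder; it assumes p = s/q with positive integers s < q and follows the fallback rules stated above for x̄ = 0 and an empty R̄.

```python
import numpy as np

def project_X_scalar(xbar, tbar, s, q):
    """Euclidean projection of (xbar, tbar) onto X = {(x, t) : t >= |x|**(s/q)}."""
    p = s / q
    if tbar >= abs(xbar) ** p:                    # (xbar, tbar) already in X
        return xbar, tbar
    if xbar == 0.0:                               # degenerate case from the text
        return 0.0, 0.0
    # Coefficients of a^{2q} + (s/q)(a^{2s} - tbar*a^s) - |xbar|*a^q,
    # ordered from the highest power (2q) down to the constant term.
    coeffs = np.zeros(2 * q + 1)
    coeffs[0] = 1.0                               # a^{2q}
    coeffs[2 * q - 2 * s] += s / q                # a^{2s}
    coeffs[2 * q - s] += -(s / q) * tbar          # a^{s}
    coeffs[q] += -abs(xbar)                       # a^{q}
    roots = np.roots(coeffs)
    # Keep strictly positive real roots; a = 0 is a spurious root introduced
    # by the change of variables and the multiplication by a^q.
    pos_real = [r.real for r in roots if abs(r.imag) < 1e-9 and r.real > 0]
    if not pos_real:
        return 0.0, max(tbar, 0.0)                # R-bar empty: x* = 0, t* = (tbar)^+
    # Candidates (a^q, a^s) lie on the boundary t = |x|^p; pick the closest one.
    cands = [(r ** q, r ** s) for r in pos_real]
    x_hat, t_star = min(cands,
                        key=lambda c: (c[1] - tbar) ** 2 + (c[0] - abs(xbar)) ** 2)
    return np.sign(xbar) * x_hat, t_star
```

For example, with p = 1/2 (s = 1, q = 2) and (x̄, t̄) = (1, 0), the polynomial is a^4 − a^2/2, whose positive root a = 1/√2 gives the projection (x*, t*) = (1/2, 1/√2), consistent with the stationarity condition (22).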


IV. DISTRIBUTED CONVEX PROJECTION

Step 4 of Algorithm 1 involves solving a convex projection problem. Although convex, solving this problem can potentially be a computational bottleneck in some applications, or not feasible at all in others. For example, designing a classifier with a huge training set renders the projection problem computationally intensive. Moreover, if the data is distributed among multiple entities, i.e., each entity has a private local data set, then a centralized solution to the projection problem might not be a viable option.

In many applications, the function f(x) in (1) evaluates the loss incurred by x averaged over a data set T, i.e., f admits a separable structure with respect to the data. Thus, we propose to solve the projection problem using a scatter-gather type of algorithm in which a master node distributes the computational load over a group of workers N. Each worker performs local few-shot learning, while the master combines their decisions.

Let T = {(u_j, v_j)}_{j∈[m]} be a training set of feature-label pairs with m examples, and consider f(x) = (1/m) Σ_{j∈[m]} ℓ(u_j, v_j, x) − ε, where ℓ is some convex loss function. Then, Π_Y(ȳ) entails solving:

\[
\min_{y} \ \frac{1}{2}\|y - \bar{y}\|^2 \quad \text{s.t.} \quad \frac{1}{m}\sum_{j \in [m]} \ell(u_j, v_j, y) \le \epsilon. \tag{24}
\]

Define {T_i}_{i∈N} as a partition of T, where the ith set of training examples T_i is assigned to worker i ∈ N. Indeed, T_i might be a local private data set generated at worker i ∈ N. Instead of solving (24) in a centralized setting, the master broadcasts ȳ to all workers, and each worker finds

\[
y_i^* = \operatorname*{argmin}_{y_i} \ \frac{1}{2}\|y_i - \bar{y}\|^2 \quad \text{s.t.} \quad \frac{1}{|T_i|}\sum_{j \in T_i} \ell(u_j, v_j, y_i) \le \epsilon, \tag{25}
\]

and reports its local decision y_i^* to the master; the master then combines the workers' decisions simply via averaging, i.e., y^* = Σ_{i∈N} (|T_i|/m) y_i^*. This y^* is the updated y variable with which Algorithm 1 proceeds.
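A minimal sketch of this scatter-gather step is shown below (our own code); it assumes a routine local_project that solves the worker subproblem (25) with whatever convex solver is available at the worker, and it runs the workers sequentially for illustration.

```python
import numpy as np

def distributed_projection(ybar, partitions, local_project, eps):
    """Scatter-gather approximation of the convex projection (24).

    partitions: list of local data sets T_i, one per worker.
    local_project(ybar, T_i, eps): solves (25), i.e.,
        argmin_y 0.5*||y - ybar||^2  s.t.  mean loss of y over T_i <= eps.
    """
    m = sum(len(T_i) for T_i in partitions)
    # Scatter: in a real deployment, each call runs at a different worker.
    local_solutions = [local_project(ybar, T_i, eps) for T_i in partitions]
    # Gather: data-weighted average y* = sum_i (|T_i|/m) * y_i*.
    return sum((len(T_i) / m) * np.asarray(y_i)
               for T_i, y_i in zip(partitions, local_solutions))
```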

V. NUMERICAL RESULTS

We conduct numerical experiments on two families of problems: i) sparse signal reconstruction from noisy measurements, and ii) binary classification using support vector machines. We compare our method on problem (1) at p = 0.5 with a CVX [26] solution of (1) at p = 1.

A. Sparse signal reconstruction

Let n = 2^{10}, m = n/4, and randomly construct a matrix U ∈ R^{m×n} such that U = [M, −M], where M is a sparse binary random matrix with a small number of ones in each column. More precisely, the number of ones in each column of M is generated independently and randomly in the range of integers between 10 and 20, and their locations are chosen independently at random for each column. Sparse binary random matrices satisfy a weak form of the RIP, and following the setup in [20], the concatenation of two copies of the same random matrix in U implies that column orthogonality is not maintained. A reference signal x_opt is generated such that ‖x_opt‖_0 = ⌈0.2n⌉. The indices of the nonzero entries of x_opt are chosen uniformly at random, and the value of each nonzero entry is generated according to a zero-mean unit-variance Gaussian distribution. The measurement vector is v = U x_opt + n, where n is a zero-mean Gaussian noise vector with covariance σ²I, and I is the identity matrix. We reconstruct a sparse signal from the noisy measurements v via solving (1) with f(x) = ‖Ux − v‖/‖v‖ − ε, i.e., f(x) measures the relative residual corresponding to x.

Fig. 1 plots the sparsity level of solutions obtained by both ℓp quasi-norm and ℓ1 norm minimization at various noise variance levels. In particular, we solve (1) at p = 1/2 and p = 1. An entry of x whose absolute value exceeds 10^{−6} is considered a nonzero element, and we compute the zero norm of x accordingly. At each value of σ², after the generation of the corresponding measurement vector v, we choose ε = ‖U x_opt − v‖/‖v‖. Obviously, this choice of ε is not possible in practice as x_opt is not known a priori. Nevertheless, its choice is motivated by the inherent dependence of ε and σ²: one chooses ε depending on the noise level, i.e., the higher the noise variance, the higher the value of ε. Fig. 1 shows that our method produces sparser solutions than those obtained by ℓ1 when the noise level is relatively high. It is worth mentioning that the solutions produced by Algorithm 1 are feasible with respect to (1). Furthermore, the behavior depicted by Fig. 1 persists for any problem instance generated as described above.

Fig. 1. Effect of noise variance on the sparsity of solutions obtained by ℓp quasi-norm and ℓ1 norm minimization (‖x‖_0/n versus σ², for p = 0.5 and p = 1).
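The data generation described above can be sketched as follows (our own script; the random seed and generator are ours, and ε is computed exactly as in the experiment).

```python
import numpy as np

rng = np.random.default_rng(0)                  # hypothetical seed
n, m, sigma2 = 2**10, 2**10 // 4, 1e-2          # sigma2 is one of the tested variances

# Sparse binary M (m x n/2) with 10-20 ones per column; U = [M, -M].
M = np.zeros((m, n // 2))
for col in range(n // 2):
    k = rng.integers(10, 21)                    # number of ones in this column
    M[rng.choice(m, size=k, replace=False), col] = 1.0
U = np.hstack([M, -M])

# Reference signal with ceil(0.2*n) standard-normal nonzero entries.
x_opt = np.zeros(n)
support = rng.choice(n, size=int(np.ceil(0.2 * n)), replace=False)
x_opt[support] = rng.standard_normal(support.size)

# Noisy measurements and the epsilon used for the constraint f(x) <= 0.
noise = np.sqrt(sigma2) * rng.standard_normal(m)
v = U @ x_opt + noise
eps = np.linalg.norm(U @ x_opt - v) / np.linalg.norm(v)
```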
B. Binary classification

We use support vector machines to build a spam email classifier. The training set used is a subset of the SpamAssassin Public Corpus [24]. Let {(u_j, v_j)}_{j∈[m]} be the training set of feature vectors u_j ∈ {0, 1}^n with corresponding labels v_j ∈ {−1, 1} identifying whether the email is spam or not. We do not dwell on the details related to feature extraction since it is not the message that this experiment conveys. Instead, we highlight the effectiveness of our method in deciding whether an email is spam or not based on as few words as possible. Indeed, the body of an email is preprocessed based on ideas promoted by [25]. Following [25], we maintain a vocabulary list of n = 1899 words. Then, for a given preprocessed email j ∈ [m], the wth entry of u_j is 1 if word w in the vocabulary list appears in email j, and it is zero otherwise. We build a linear classifier with a decision rule v̂ = sign(u^⊤x), where u is the feature vector of the email in question, and x holds the linear classifier's coefficients with its first entry being the bias term.

The purpose is to build a classifier that attains a high accuracy on the training set and simultaneously uses the least possible number of words from the vocabulary list to detect whether an email is spam or legitimate, i.e., a low-complexity classifier. We build the classifier by solving (1) with:

\[
f(x) = \frac{1}{m}\sum_{j \in [m]} \left(1 - v_j u_j^\top x\right)^+ - \epsilon. \tag{26}
\]
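For illustration only, a binary bag-of-words feature vector and the decision rule can be formed as in the sketch below (our own helper, with our own convention of reserving index 0 for the bias entry; it is not the paper's feature-extraction code).

```python
import numpy as np

def email_features(tokens, vocab):
    """Binary bag-of-words features for a preprocessed email.

    vocab maps each vocabulary word to an index in 1..n (our convention);
    u[0] = 1 is the constant entry multiplying the bias term of x.
    """
    u = np.zeros(len(vocab) + 1)
    u[0] = 1.0
    for tok in tokens:
        if tok in vocab:
            u[vocab[tok]] = 1.0
    return u

def predict(u, x):
    # Decision rule v_hat = sign(u^T x); ties are broken toward +1 here.
    return 1 if u @ x >= 0 else -1
```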


The desired training set accuracy is controlled by choosing ε, i.e., the lower the value of ε, the higher the accuracy. Clearly, f not only urges u_j^⊤x to have the same sign as v_j, but it also urges that to happen with a margin, i.e., it tries to push u_j^⊤x below −1 if v_j = −1 to incur no cost from the jth training example. Likewise, it tries to push u_j^⊤x above 1 if v_j = 1.

We run Algorithm 1 with p = 1/2 on 2000 training emails at various values of ε. We terminate the algorithm after 100 iterations at each value of ε, and evaluate the performance of the learned classifier on a test set of 1000 emails. We solve (1) at p = 1 for comparing its performance with that obtained by Algorithm 1 at p = 1/2. Fig. 2 shows the number of nonzero entries in the classifier's coefficients x learned at p = 1 and p = 1/2. An entry of x is considered nonzero if its absolute value is above 10^{−4}. The corresponding training and test set accuracies for the obtained classifiers are plotted in Fig. 3. Figs. 2 and 3 together show that our method obtains classifiers that use a significantly smaller number of words to make a decision on the legitimacy of an email while achieving an almost similar level of accuracy on both the training and test sets. This behavior is consistent for all values of ε. For instance, at ε = 0.2, ℓ0.5 minimization uses 11 words from the vocabulary list to make decisions as opposed to 40 words used by ℓ1 minimization. Nevertheless, both ℓ0.5 and ℓ1 attain about the same training and test set accuracies, 93% and 90%, respectively.

Fig. 2. Number of words selected for classification versus ε for ℓp quasi-norm and ℓ1 norm minimization (p = 1 and p = 0.5).

Fig. 3. Training and test set accuracies versus ε for ℓp quasi-norm and ℓ1 norm minimization (percentage of correct decisions for p = 1 and p = 0.5).

VI. CONCLUSION

We present a nonconvex ADMM algorithm that approximates the solution of the ℓp (0 < p < 1) quasi-norm minimization problem. The algorithm is computationally efficient: its complexity is bounded by the effort of finding a root of a scalar degree 2q polynomial for a rational 0 < p = s/q < 1. The method is numerically shown to significantly outperform ℓ1 in terms of the sparsity of the generated solutions at similar performance levels in different tasks.

REFERENCES

[1] Candès E.J. and Tao T. Decoding by linear programming. IEEE Transactions on Information Theory, 2005;51(12):4203-15.
[2] Chen S.S., Donoho D.L. and Saunders M.A. Atomic decomposition by basis pursuit. SIAM Review, 2001;43(1):129-59.
[3] Taubman D.S. and Marcellin M.W. JPEG 2000: Image Compression Fundamentals, Standards and Practice. Kluwer Academic, Norwell, MA; 2001.
[4] Mallat S. A Wavelet Tour of Signal Processing. Academic Press; 1999.
[5] Wright J., Yang A.Y., Ganesh A., Sastry S.S. and Ma Y. Robust face recognition via sparse representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2009;31(2):210-27.
[6] Candès E.J., Romberg J. and Tao T. Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information. IEEE Transactions on Information Theory, 2006;52(2):489-509.
[7] Donoho D.L. Compressed sensing. IEEE Transactions on Information Theory, 2006;52(4):1289-306.
[8] Bruckstein A.M., Donoho D.L. and Elad M. From sparse solutions of systems of equations to sparse modeling of signals and images. SIAM Review, 2009;51(1):34-81.
[9] Natarajan B.K. Sparse approximate solutions to linear systems. SIAM Journal on Computing, 1995;24(2):227-34.
[10] Tropp J.A. and Wright S.J. Computational methods for sparse solution of linear inverse problems. Proceedings of the IEEE, 2010;98(6):948-58.
[11] Tropp J.A. Greed is good: Algorithmic results for sparse approximation. IEEE Transactions on Information Theory, 2004;50(10):2231-42.
[12] Tibshirani R. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B (Methodological), 1996;1:267-88.
[13] Chartrand R. Exact reconstruction of sparse signals via nonconvex minimization. IEEE Signal Processing Letters, 2007;14(10):707-10.
[14] Chartrand R. and Yin W. Iteratively reweighted algorithms for compressive sensing. IEEE International Conference on Acoustics, Speech and Signal Processing, 2008, pp. 3869-3872.
[15] Bach F., Jenatton R., Mairal J. and Obozinski G. Optimization with sparsity-inducing penalties. Foundations and Trends in Machine Learning, 2012;4(1):1-106.
[16] Donoho D.L. For most large underdetermined systems of linear equations the minimal ℓ1 norm solution is also the sparsest solution. Communications on Pure and Applied Mathematics, 2006;59(6):797-829.
[17] Saab R., Chartrand R. and Yilmaz O. Stable sparse approximations via nonconvex optimization. IEEE International Conference on Acoustics, Speech and Signal Processing, 2008, pp. 3885-3888.
[18] Chartrand R. Fast algorithms for nonconvex compressive sensing: MRI reconstruction from very few data. IEEE International Symposium on Biomedical Imaging, 2009, pp. 262-265.
[19] Mourad N. and Reilly J.P. Minimizing nonconvex functions for sparse vector reconstruction. IEEE Transactions on Signal Processing, 2010;58(7):3485-3496.
[20] Ge D., Jiang X. and Ye Y. A note on the complexity of Lp minimization. Mathematical Programming, 2011;129(2):285-299.
[21] Chartrand R. and Wohlberg B. A nonconvex ADMM algorithm for group sparsity with sparse groups. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2013, pp. 6009-6013.
[22] Gorodnitsky I.F. and Rao B.D. Sparse signal reconstruction from limited data using FOCUSS: A re-weighted minimum norm algorithm. IEEE Transactions on Signal Processing, 1997;45(3):600-616.
[23] Boyd S., Parikh N., Chu E., Peleato B. and Eckstein J. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 2011;3(1):1-122.
[24] SpamAssassin Public Corpus, http://spamassassin.apache.org
[25] Andrew Ng, Machine Learning MOOC, available online: https://www.coursera.org/learn/machine-learning
[26] Michael Grant and Stephen Boyd. CVX: Matlab software for disciplined convex programming, version 2.0 beta. http://cvxr.com/cvx, 2013.

