IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, VOL. 1, NO. 4, DECEMBER 2007

An Interior-Point Method for Large-Scale ℓ1-Regularized Least Squares

Seung-Jean Kim, Member, IEEE, K. Koh, M. Lustig, Stephen Boyd, Fellow, IEEE, and Dimitry Gorinevsky, Fellow, IEEE

Manuscript received January 30, 2007; revised August 30, 2007. This work was supported by the Focus Center Research Program Center for Circuit and System Solutions award 2003-CT-888, by JPL award I291856, by the Precourt Institute on Energy Efficiency, by Army award W911NF-07-1-0029, by NSF awards ECS-0423905 and 0529426, by DARPA award N66001-06-C-2021, by NASA award NNX07AE11A, by AFOSR award FA9550-06-1-0514, and by AFOSR award FA9550-06-1-0312. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Yonina Eldar. The authors are with the Information Systems Lab, Department of Electrical Engineering, Stanford University, Stanford, CA 94305 USA (e-mail: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]). Digital Object Identifier 10.1109/JSTSP.2007.910971

Abstract—Recently, a lot of attention has been paid to ℓ1 regularization based methods for sparse signal reconstruction (e.g., basis pursuit denoising and compressed sensing) and feature selection (e.g., the Lasso algorithm) in signal processing, statistics, and related fields. These problems can be cast as ℓ1-regularized least-squares programs (LSPs), which can be reformulated as convex quadratic programs, and then solved by several standard methods such as interior-point methods, at least for small and medium size problems. In this paper, we describe a specialized interior-point method for solving large-scale ℓ1-regularized LSPs that uses the preconditioned conjugate gradients algorithm to compute the search direction. The interior-point method can solve large sparse problems, with a million variables and observations, in a few tens of minutes on a PC. It can efficiently solve large dense problems, that arise in sparse signal recovery with orthogonal transforms, by exploiting fast algorithms for these transforms. The method is illustrated on a magnetic resonance imaging data set.

Index Terms—Basis pursuit denoising, compressive sampling, compressed sensing, convex optimization, interior-point methods, least squares, preconditioned conjugate gradients, ℓ1 regularization.

I. INTRODUCTION

We consider a linear model of the form

    y = Ax + v,

where x ∈ R^n is the vector of unknowns, y ∈ R^m is the vector of observations, v ∈ R^m is the noise, and A ∈ R^{m×n} is the data matrix. When m ≥ n and the columns of A are linearly independent, we can determine x by solving the least squares problem of minimizing the quadratic loss ||Ax − y||_2^2, where ||·||_2 denotes the ℓ2 norm. When m, the number of observations, is not large enough compared to n, simple least-squares regression leads to over-fitting.

A. ℓ2-Regularized Least Squares

A standard technique to prevent over-fitting is ℓ2 or Tikhonov regularization [38], which can be written as

    minimize ||Ax − y||_2^2 + λ||x||_2^2,    (1)

where λ > 0 is the regularization parameter. The Tikhonov regularization problem or ℓ2-regularized least-squares program (LSP) has the analytic solution

    x = (A^T A + λI)^{-1} A^T y.    (2)

We list some basic properties of Tikhonov regularization, which we refer to later when we compare it to ℓ1-regularized least squares.
• Linearity. From (2), we see that the solution to the Tikhonov regularization problem is a linear function of the vector of observations y.
• Limiting behavior as λ → 0. As λ → 0, the Tikhonov regularized solution converges to the Moore–Penrose solution A†y, where A† is the Moore–Penrose pseudoinverse of A. The limit point has the minimum ℓ2-norm among all points that satisfy the normal equations A^T Ax = A^T y.
• Convergence to zero as λ → ∞. The optimal solution tends to zero as λ → ∞.
• Regularization path. The optimal solution is a smooth function of the regularization parameter λ as it varies over (0, ∞). As λ decreases to zero, the solution converges to the Moore–Penrose solution; as λ increases, it converges to zero.

The solution to the Tikhonov regularization problem can be computed by direct methods, which require O(n^3) flops (assuming that m is order n or less), when no structure is exploited. The solution can also be computed by applying iterative (nondirect) methods (e.g., the conjugate gradients method) to the linear system of equations (A^T A + λI)x = A^T y. Iterative methods are efficient especially when there are fast algorithms for the matrix-vector multiplications with the data matrix and its transpose (i.e., Au and A^T w with u ∈ R^n and w ∈ R^m), which is the case when A is sparse or has a special form such as partial Fourier and wavelet matrices.
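The two solution routes just described can be illustrated with a minimal numerical sketch (not from the paper; the sizes and data below are made up for illustration): the closed-form solution (2) computed by a direct factorization, and the same solution computed by conjugate gradients applied to the regularized normal equations, using only products with A and A^T.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

rng = np.random.default_rng(0)
m, n, lam = 200, 500, 0.1          # illustrative sizes, not from the paper
A = rng.standard_normal((m, n))
y = rng.standard_normal(m)

# Direct method: solve (A^T A + lam I) x = A^T y, O(n^3) if no structure is used.
x_direct = np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ y)

# Iterative method: CG on the same symmetric positive definite system,
# using only matrix-vector products with A and A^T.
normal_op = LinearOperator((n, n), matvec=lambda v: A.T @ (A @ v) + lam * v,
                           dtype=float)
x_cg, info = cg(normal_op, A.T @ y, maxiter=2000)

print(np.linalg.norm(x_direct - x_cg))   # small: the two agree to CG tolerance
```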


B. ℓ1-Regularized Least Squares

In ℓ1-regularized least squares (LS), we substitute a sum of absolute values for the sum of squares used in Tikhonov regularization, to obtain

    minimize ||Ax − y||_2^2 + λ||x||_1,    (3)

where ||x||_1 = Σ_i |x_i| denotes the ℓ1 norm of x and λ > 0 is the regularization parameter. We call (3) an ℓ1-regularized LSP. This problem always has a solution, but it need not be unique.

We list some basic properties of ℓ1-regularized LS, pointing out similarities and differences with ℓ2-regularized LS.
• Nonlinearity. From (2), we see that Tikhonov regularization yields a vector x that is a linear function of the vector of observations y. By contrast, ℓ1-regularized least squares yields a vector x that is not linear in y.
• Limiting behavior as λ → 0. ℓ1-regularized LS shows a different limiting behavior from ℓ2-regularized LS as λ → 0. In ℓ1-regularized LS, the limiting point has the minimum ℓ1-norm among all points that satisfy A^T Ax = A^T y.
• Finite convergence to zero as λ → ∞. As in Tikhonov regularization, the optimal solution tends to zero as λ → ∞. For ℓ1-regularized LS, however, the convergence occurs for a finite value of λ:

    λ_max = ||2 A^T y||_inf,    (4)

where ||·||_inf denotes the ℓ∞ norm (the maximum absolute value of the entries) of a vector. For λ ≥ λ_max, the optimal solution of the ℓ1-regularized LSP (3) is 0. In contrast, the optimal solution to the Tikhonov regularization problem is zero only in the limit as λ → ∞. (The derivation of the formula (4) for λ_max is given in Section III-A.)
• Regularization path. The solution to the Tikhonov regularization problem varies smoothly as the regularization parameter λ varies over (0, ∞). By contrast, the regularization path of the ℓ1-regularized LSP (3), i.e., the family of solutions as λ varies over (0, ∞), has the piecewise-linear solution path property [15]: there are values λ_1 > λ_2 > ... > λ_k = 0, with λ_1 = λ_max, such that the regularization path is a piecewise linear curve on (0, ∞),

    x*(λ) = ((λ − λ_{i+1})/(λ_i − λ_{i+1})) x^{(i)} + ((λ_i − λ)/(λ_i − λ_{i+1})) x^{(i+1)},  λ_{i+1} ≤ λ ≤ λ_i,

where x^{(i)} solves the ℓ1-regularized LSP (3) with λ = λ_i. (So x^{(1)} = 0, and x*(λ) = 0 when λ ≥ λ_max.)

More importantly, ℓ1-regularized LS typically yields a sparse vector x, that is, x typically has relatively few nonzero coefficients. (As λ increases, the solution tends to be sparser, although not necessarily monotonically so [52].) In contrast, the solution to the Tikhonov regularization problem typically has all coefficients nonzero.

Recently, the idea of ℓ1 regularization has been receiving a lot of interest in signal processing and statistics. In signal processing, the idea of ℓ1 regularization comes up in several contexts, including basis pursuit denoising [8] and signal recovery from incomplete measurements (e.g., [4], [7], [6], [11], [12], [54], [53]). In statistics, the idea of ℓ1 regularization is used in the well-known Lasso algorithm [52] for feature selection and its extensions, including the elastic net [63]. Some of these problems do not have the standard form (3) but have a more general form

    minimize ||Ax − y||_2^2 + Σ_i λ_i |x_i|,    (5)

where λ_i ≥ 0 are regularization parameters. (The variables x_i that correspond to λ_i = 0 are not regularized.) This general problem can be reformulated as a problem of the form (3).

We now turn to the computational aspect of ℓ1-regularized LS, the main topic of this paper. There is no analytic formula or expression for the optimal solution to the ℓ1-regularized LSP (3), analogous to (2); its solution must be computed numerically. The objective function in the ℓ1-regularized LSP (3) is convex but not differentiable, so solving it is more of a computational challenge than solving the ℓ2-regularized LSP (1).

Generic methods for nondifferentiable convex problems, such as the ellipsoid method or subgradient methods [51], [43], can be used to solve the ℓ1-regularized LSP (3). These methods are often very slow.

The ℓ1-regularized LSP (3) can be transformed to a convex quadratic problem with linear inequality constraints. The equivalent quadratic program (QP) can be solved by standard convex optimization methods such as interior-point methods [32], [37], [57], [58]. Standard interior-point methods are implemented in general purpose solvers including MOSEK [34], which can readily handle small and medium size problems. Standard methods cannot, however, handle large problems in which there are fast algorithms for the matrix-vector operations with A and A^T. Specialized interior-point methods that exploit such algorithms can scale to large problems, as demonstrated in [8], [27]. High-quality implementations of specialized interior-point methods include l1-magic [5] and PDCO [50], which use iterative algorithms, such as the conjugate gradients (CG) or LSQR algorithm [42], to compute the search step.

Recently, several researchers have proposed homotopy methods and variants for solving ℓ1-regularized LSPs [14], [24], [15], [46], [40]. Using the piecewise-linear property of the regularization path, path-following methods can efficiently compute the entire solution path of an ℓ1-regularized LSP. When the solution of (3) is extremely sparse, these methods can be very fast, since the number of kinks the methods need to find is modest [14]. Otherwise, the path-following methods can be slow, which is often the case for large-scale problems.

Other recently developed computational methods for ℓ1-regularized LSPs include coordinate-wise descent methods [19], [26], a fixed-point continuation method [23], Bregman iterative regularization based methods [41], [59], sequential subspace optimization methods [35], bound optimization methods [17], iterated shrinkage methods [9], [16], gradient methods [36], and gradient projection algorithms [18]. Some of these methods, including the gradient projection algorithms [18], can efficiently handle very large problems.
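The threshold (4) is cheap to evaluate directly from the problem data. The following sketch (made-up data, not code from the paper) computes λ_max and checks the condition under which x = 0 is optimal, anticipating the derivation in Section III-A.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 100, 400                      # illustrative sizes
A = rng.standard_normal((m, n))
y = rng.standard_normal(m)

# lambda_max = || 2 A^T y ||_inf : for lam >= lambda_max the l1-regularized
# LSP (3) has the all-zero solution (derivation in Section III-A).
lam_max = np.linalg.norm(2 * A.T @ y, np.inf)

# Optimality of x = 0 holds iff |(2 A^T y)_i| <= lam for all i.
lam = 1.05 * lam_max
print(np.all(np.abs(2 * A.T @ y) <= lam))   # True
```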

C. Outline

The main purpose of this paper is to describe a specialized interior-point method for solving large ℓ1-regularized LSPs that uses the preconditioned conjugate gradients algorithm to compute the search step. The method, which we describe in Section IV, can solve large sparse problems, with m and n on the order of a million, in a few tens of minutes on a PC. It can efficiently solve large dense problems, that arise in sparse signal recovery with orthogonal transforms, by exploiting fast algorithms for these transforms. The specialized method is far more efficient than interior-point methods that use direct or CG methods to compute the search step, as demonstrated with several examples in Section V. Compared with first-order methods such as coordinate-descent methods, our method is comparable in solving large problems with modest accuracy, but is able to solve them with high accuracy at relatively small additional computational cost.

We illustrate the method on a real magnetic resonance imaging data set in Section V. The method can solve the QP that arises in reconstructing the object of interest, with adequate accuracy, from partially sampled Fourier coefficients within around 100 preconditioned conjugate gradients (PCG) steps, which amounts to performing a few hundred fast Fourier transform (FFT) operations on the object.

Although the interior-point method we will describe in Section IV is tailored toward ℓ1-regularized LSPs, the main idea behind the method can be generalized to other convex problems involving ℓ1 regularization. We describe some generalizations of the method in Section VI. We also show how the method can easily handle the general ℓ1-regularized LSP (5), without explicitly forming the data matrix, which is typically dense, of its equivalent formulation of the form (3).

II. ℓ1 REGULARIZATION IN SIGNAL PROCESSING AND STATISTICS

Several signal processing and estimation methods based on the idea of ℓ1 regularization can be formulated as problems of the form (3) or (5). In this section, we consider two important ℓ1 regularization based methods. One of these is related to the magnetic resonance imaging (MRI) example given in Section V.

A. Compressed Sensing/Compressive Sampling

Let z be an unknown vector in R^n (which can represent a 2- or 3-D object of interest). Suppose that we have m linear measurements of the unknown signal z,

    y_i = <a_i, z> + v_i,   i = 1, ..., m,

where <·,·> denotes the usual inner product, v_i is the noise, and a_i ∈ R^n are known signals. Standard reconstruction methods require at least n samples. Suppose we know a priori that z is compressible or has a sparse representation in a transform domain, described by x = Wz (after expanding the real and imaginary parts if necessary). In this case, if the a_i are well chosen, then the number of measurements m can be dramatically smaller than the size n usually considered necessary [4], [11].

Compressed sensing [11] or compressive sampling [4] attempts to exploit the sparsity or compressibility of the true signal in the transform domain by solving a problem of the form

    minimize ||Φz − y||_2^2 + λ||Wz||_1,    (6)

where the variable is z ∈ R^n. Here, Φ ∈ R^{m×n} (whose rows are the a_i^T) is called the compressed sensing matrix, λ > 0 is the regularization parameter, and W is called the sparsifying transform. Compressed sensing has a variety of potential applications, including analog-to-information conversion and sparse MRI.

When W is invertible, we can reformulate the compressed sensing problem (6) as the ℓ1-regularized LSP

    minimize ||Ax − y||_2^2 + λ||x||_1,    (7)

where the variable is x ∈ R^n and the problem data or parameters are A = ΦW^{-1} and y. [If x* solves (7), then W^{-1}x* solves (6), and conversely, if z* solves (6), then Wz* solves (7).] This problem is a basis pursuit denoising (BPDN) problem with the dictionary matrix ΦW^{-1}; see, e.g., [8] for more on basis pursuit denoising.

Before proceeding, we should mention an important feature of the ℓ1-regularized LSPs that arise in compressed sensing. The data matrix A = ΦW^{-1} in the ℓ1-regularized LSP (7) is typically fully dense, and so its equivalent QP formulation is fully dense. But the equivalent dense QP has an important difference from general dense QPs: there are fast algorithms for the matrix-vector operations with A and A^T, based on fast algorithms for the sparsifying transform and its inverse transform.
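The fast matrix-vector operations can be organized as matrix-free operators. The sketch below is illustrative only: it uses a row-subsampled orthonormal DCT in place of the partial Fourier ensemble, takes the sparsifying transform to be that same DCT, and the sizes are made up; the function names are ours, not the paper's.

```python
import numpy as np
from scipy.fft import dct, idct
from scipy.sparse.linalg import LinearOperator

rng = np.random.default_rng(0)
n, m = 4096, 1024                              # illustrative sizes
rows = rng.choice(n, size=m, replace=False)    # samples kept by Phi

# A = Phi W^{-1}: apply the inverse sparsifying transform, then subsample.
def A_mv(x):
    return idct(x, norm="ortho")[rows]

# A^T = W Phi^T: zero-fill the measurements, then apply the forward transform.
def AT_mv(ymeas):
    z = np.zeros(n)
    z[rows] = ymeas
    return dct(z, norm="ortho")

A_op = LinearOperator((m, n), matvec=A_mv, rmatvec=AT_mv, dtype=float)

# Both products cost O(n log n) instead of O(mn); quick adjoint consistency check:
u, w = rng.standard_normal(n), rng.standard_normal(m)
assert abs(w @ (A_op @ u) - A_op.rmatvec(w) @ u) < 1e-8
```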

B. ℓ1-Regularized Linear Regression

Let a ∈ R^n denote a vector of explanatory or feature variables, and b ∈ R denote the associated output. A linear model predicts the output as

    b ≈ w^T a + v,

where v ∈ R is the bias or intercept and w ∈ R^n is the weight vector.

Suppose we are given a set of m (observed or training) examples, (a_i, b_i) ∈ R^n × R, i = 1, ..., m. To estimate the weight coefficients and intercept, the Lasso algorithm [52] solves the ℓ1-regularized LSP

    minimize Σ_i (w^T a_i + v − b_i)^2 + λ||w||_1,    (8)

with variables w ∈ R^n and v ∈ R. The Lasso problem (8) has the form (5) with the variables (v, w); here, the bias v is not regularized.

Extensive research has shown that ℓ1-regularized linear regression can outperform ℓ2-regularized linear regression (also called ridge regression), especially when the number of observations is smaller than the number of features [15], [52]. Recently, theoretical properties of ℓ1-regularized linear regression have been studied by several researchers; see, e.g., [20], [29], [33], [56], [62], [61].

III. OPTIMALITY CONDITIONS AND DUAL PROBLEM

In this section, we give some preliminaries needed later.

A. Optimality Conditions

The objective function of the ℓ1-regularized LSP (3) is convex but not differentiable, so we use a first-order optimality condition based on subdifferential calculus. We can obtain the following necessary and sufficient conditions for x to be optimal for the ℓ1-regularized LSP (3):

    (2A^T(Ax − y))_i ∈ { {−λ}      if x_i > 0,
                         {+λ}      if x_i < 0,
                         [−λ, λ]   if x_i = 0,      i = 1, ..., n.

IV. A TRUNCATED NEWTON INTERIOR-POINT METHOD We can now derive the formula (4) for . The condition The -regularized LSP (3) can be transformed to a convex that 0 is optimal is that for , quadratic problem, with linear inequality constraints i.e., .

B. Dual Problem and Suboptimality Bound

We derive a Lagrange dual of the ℓ1-regularized LSP (3). We start by introducing a new variable z ∈ R^m, as well as new equality constraints z = Ax − y, to obtain the equivalent problem

    minimize z^T z + λ||x||_1   subject to   z = Ax − y.    (9)

Associating dual variables ν ∈ R^m with the equality constraints in (9), the Lagrangian is

    L(x, z, ν) = z^T z + λ||x||_1 + ν^T (Ax − y − z).

The dual function is

    G(ν) = inf_{x,z} L(x, z, ν) = −(1/4) ν^T ν − ν^T y   if |(A^T ν)_i| ≤ λ, i = 1, ..., n,
         = −∞                                            otherwise.

The Lagrange dual of (9) is therefore

    maximize   −(1/4) ν^T ν − ν^T y
    subject to  ||A^T ν||_inf ≤ λ,    (10)

where we call the objective of (10) the dual objective G(ν). (See, e.g., [3, Ch. 5] or [2] for more on convex duality.) The dual problem (10) is a convex optimization problem with variable ν ∈ R^m. We say that ν is dual feasible if it satisfies the constraints of (10), i.e., ||A^T ν||_inf ≤ λ.

Any dual feasible point ν gives a lower bound on the optimal value p* of the primal problem (3), i.e., G(ν) ≤ p*, which is called weak duality. Furthermore, the optimal values of the primal and dual problems are equal, since the primal problem (3) satisfies Slater's condition; this is called strong duality [3].

An important property of the ℓ1-regularized LSP (3) is that from an arbitrary x, we can derive an easily computed bound on the suboptimality of x, by constructing the dual feasible point

    ν = 2s(Ax − y),   s = min{ λ / ||2A^T(Ax − y)||_inf, 1 }.    (11)

The point ν is dual feasible, so G(ν) is a lower bound on p*, the optimal value of the ℓ1-regularized LSP (3). The difference between the primal objective value of x and the associated lower bound G(ν) is called the duality gap. We use η to denote the gap,

    η = ||Ax − y||_2^2 + λ||x||_1 − G(ν).    (12)

The duality gap is always nonnegative by weak duality, and x is no more than η-suboptimal. At an optimal point, the duality gap is zero, i.e., strong duality holds.
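Equations (11) and (12) translate directly into a few lines of code. The sketch below (the helper name is ours, not the paper's) evaluates the suboptimality bound for any candidate x; the interior-point method of Section IV uses exactly this quantity as its stopping criterion.

```python
import numpy as np

def duality_gap(A, y, x, lam):
    """Dual feasible point (11) and duality gap (12) for the l1-regularized LSP (3)."""
    r = A @ x - y                               # residual
    s = min(lam / np.linalg.norm(2 * A.T @ r, np.inf), 1.0)
    nu = 2 * s * r                              # dual feasible by construction
    primal = r @ r + lam * np.linalg.norm(x, 1)
    dual = -0.25 * nu @ nu - nu @ y             # G(nu)
    return primal - dual                        # eta >= 0; x is eta-suboptimal
```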

IV. A TRUNCATED NEWTON INTERIOR-POINT METHOD

The ℓ1-regularized LSP (3) can be transformed to a convex quadratic problem with linear inequality constraints,

    minimize ||Ax − y||_2^2 + λ Σ_i u_i
    subject to −u_i ≤ x_i ≤ u_i,  i = 1, ..., n,    (13)

where the variables are x ∈ R^n and u ∈ R^n. In this section, we describe an interior-point method for solving this equivalent QP.

A. A Custom Interior-Point Method

We start by defining the logarithmic barrier for the bound constraints −u_i ≤ x_i ≤ u_i in (13),

    Φ(x, u) = −Σ_i log(u_i + x_i) − Σ_i log(u_i − x_i),

defined over {(x, u) ∈ R^n × R^n : |x_i| < u_i, i = 1, ..., n}. The central path consists of the unique minimizer (x*(t), u*(t)) of the convex function

    φ_t(x, u) = t(||Ax − y||_2^2 + λ Σ_i u_i) + Φ(x, u),

as the parameter t varies from 0 to ∞. With each point (x, u) on the central path we associate ν = 2(Ax − y), which can be shown to be dual feasible. (Indeed, it coincides with the dual feasible point constructed from x using the method of Section III-B.) In particular, x*(t) is no more than 2n/t-suboptimal, so the central path leads to an optimal solution.

In a primal interior-point method, we compute a sequence of points on the central path, for an increasing sequence of values of t, starting from the previously computed central point. (A typical method uses the sequence t = μ^k t_0, where μ is between 2 and 50 [3, Sec. 11.3].) The method can be terminated when 2n/t ≤ ε, where ε is the target duality gap, since then we can guarantee ε-suboptimality of x*(t). (The reader is referred to [3, Ch. 11] for more on the primal barrier method.) In the primal barrier method, Newton's method is used to minimize φ_t, i.e., the search direction is computed as the exact solution to the Newton system

    ∇^2 φ_t(x, u) [Δx; Δu] = −∇φ_t(x, u),    (14)

where ∇^2 φ_t(x, u) ∈ R^{2n×2n} is the Hessian and ∇φ_t(x, u) ∈ R^{2n} is the gradient at the current iterate (x, u).

For a large ℓ1-regularized LSP, solving the Newton system (14) exactly is not computationally practical. We need to find a search direction that gives a good trade-off between computational effort and the convergence rate it provides. In the method described below, the search direction is computed as an approximate solution to the Newton system (14), using PCG. When an iterative method is used to approximately solve the Newton system, the overall method is called a truncated Newton method. Truncated Newton methods have been applied to interior-point methods; see, e.g., [27], [30], [55], and [44].

In the primal barrier method, the parameter t is held constant until φ_t is (approximately) minimized, i.e., until ||∇φ_t||_2 is small. For faster convergence, however, we can update the parameter t at each iteration, based on the current duality gap, computed using the dual feasible point constructed as described in Section III-B. This leads to the following algorithm.

TRUNCATED NEWTON IPM FOR ℓ1-REGULARIZED LSPs

given relative tolerance ε > 0.
Initialize. t := 1/λ, x := 0, u := 1 = (1, ..., 1) ∈ R^n.
repeat
  1. Compute the search direction (Δx, Δu) as an approximate solution to the Newton system (14).
  2. Compute the step size s by backtracking line search.
  3. Update the iterate by (x, u) := (x, u) + s(Δx, Δu).
  4. Construct a dual feasible point ν from (11).
  5. Evaluate the duality gap η from (12).
  6. quit if η / G(ν) ≤ ε.
  7. Update t.

As a stopping criterion, the method uses the duality gap divided by the dual objective value. By weak duality, this ratio is an upper bound on the relative suboptimality (f(x) − p*)/p*, where p* is the optimal value of the ℓ1-regularized LSP (3) and f(x) is the primal objective computed with the point x (computed in step 3). Therefore, the method solves the problem to guaranteed relative accuracy ε.

Given the search direction (Δx, Δu), the new point is (x, u) + s(Δx, Δu), where s, the step size, is to be computed. In the backtracking line search, the step size is taken as s = β^k, where k is the smallest nonnegative integer that satisfies

    φ_t((x, u) + β^k(Δx, Δu)) ≤ φ_t(x, u) + α β^k ∇φ_t(x, u)^T (Δx, Δu),

where α ∈ (0, 1/2) and β ∈ (0, 1) are algorithm parameters. (Typical values for the line search parameters are α = 0.01, β = 0.5.) The reader is referred to [3, Ch. 9] for more on the backtracking line search.

We use the t update rule

    t := max{ μ min{2n/η, t}, t }   if s ≥ s_min,   t unchanged otherwise,

where μ > 1 and s_min ∈ (0, 1] are parameters to be chosen. The same type of update rule was used in the custom interior-point method for ℓ1-regularized logistic regression described in [30]. The choice μ = 2 and s_min = 0.5 appears to give good performance for a wide range of problems. This update rule uses the step length s as a crude measure of proximity to the central path. In particular, 2n/η is the value of t for which the associated central point has the same duality gap as the current point. The update rule appears to be quite robust and to work well when combined with the PCG algorithm we will describe soon. The reader is referred to [30] for an informal justification of convergence of the interior-point method based on this update rule (with exact search directions).

B. Search Direction via PCGs

We can find compact representations of the Hessian and gradient. The Hessian can be written as

    H = ∇^2 φ_t(x, u) = t [2A^T A  0; 0  0] + [D_1  D_2; D_2  D_1],

where

    D_1 = diag( 1/(u_i + x_i)^2 + 1/(u_i − x_i)^2 ),
    D_2 = diag( 1/(u_i + x_i)^2 − 1/(u_i − x_i)^2 ).

Here, we use diag(w) to denote the diagonal matrix with diagonal entries given by the vector w. The Hessian H is symmetric and positive definite. The gradient can be written as g = ∇φ_t(x, u) = (g_1, g_2), where

    g_1 = 2t A^T(Ax − y) + [ 1/(u_i − x_i) − 1/(u_i + x_i) ]_i,
    g_2 = tλ 1 − [ 1/(u_i + x_i) + 1/(u_i − x_i) ]_i.

We compute the search direction approximately, applying the PCG algorithm [10, Sect. 6.6] to the Newton system (14). The PCG algorithm uses a preconditioner P ∈ R^{2n×2n}, which is symmetric and positive definite. We will not go into the details of the PCG algorithm, and refer the reader to [28], [48, Sect. 6.7], or [39, Sect. 5].
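The compact representations above can be assembled with a few vectorized operations. The sketch below (dense A for clarity; the helper name is ours) returns the gradient of φ_t together with the diagonal data D_1 and D_2 of the barrier Hessian, which are all that the Hessian-vector products and preconditioners of the next subsection need.

```python
import numpy as np

def grad_and_diags(A, y, x, u, t, lam):
    """Gradient of phi_t and diagonals D1, D2 of the barrier Hessian (a sketch)."""
    q1 = 1.0 / (u + x)                  # requires the strict feasibility |x_i| < u_i
    q2 = 1.0 / (u - x)
    d1 = q1**2 + q2**2                  # diag of d^2 Phi / dx^2  (= d^2 Phi / du^2)
    d2 = q1**2 - q2**2                  # diag of d^2 Phi / dx du
    g1 = 2 * t * A.T @ (A @ x - y) + (q2 - q1)     # d phi_t / dx
    g2 = t * lam * np.ones_like(u) - (q1 + q2)     # d phi_t / du
    return np.concatenate([g1, g2]), d1, d2
```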

The preconditioner used in the PCG algorithm approximates the Hessian of t(||Ax − y||_2^2 + λ Σ_i u_i) with its diagonal entries, while retaining the Hessian of the logarithmic barrier Φ(x, u):

    P_1 = t [2 diag(A^T A)  0; 0  0] + [D_1  D_2; D_2  D_1].    (15)

Here, diag(A^T A) is the diagonal matrix obtained by setting the off-diagonal entries of the matrix A^T A to zero.

The cost of computing the diagonal entries of A^T A can be amortized over all interior-point iterations, and over multiple problems with the same data matrix and different observation vectors, since we need to compute them only once. When the amortized cost is still expensive, we can approximate the diagonal matrix 2 diag(A^T A) with a scaled identity matrix to obtain the preconditioner

    P_2 = t [2γI  0; 0  0] + [D_1  D_2; D_2  D_1],    (16)

where γ is a positive constant. This preconditioner performs well especially when the diagonal elements of A^T A show relatively small variations.

The PCG algorithm needs a good initial search direction and an effective truncation rule.
• Initial point. There are many choices for the initial search direction, e.g., 0, the negative gradient, and the search direction found in the previous step. Numerical experiments suggest that initializing the PCG algorithm with the previous search direction appears to have a small advantage over the other two.
• Truncation rule. The truncation rule for the PCG algorithm gives the condition for terminating the algorithm. The truncation rule in our implementation is simple: the PCG algorithm stops when the cumulative number of PCG steps exceeds a given limit N_pcg or when we compute a point with relative tolerance less than ε_pcg. We use the adaptive relative tolerance change rule

    ε_pcg = min{0.1, ξ η / ||g||_2},

where η is the duality gap at the current iterate and ξ is an algorithm parameter. (The choice ξ = 0.3 appears to work well for a wide range of problems.) In other words, we solve the Newton system with low accuracy (but never worse than 10%) at early iterations, and solve it more accurately as the duality gap decreases.

Each iteration of the PCG algorithm involves a handful of inner products, a matrix-vector product with H, and a solve step with P, i.e., computing P^{-1}(z_1, z_2) with z_1, z_2 ∈ R^n. (Here, we use the convention that (z_1, z_2) for column vectors z_1 and z_2 is the column vector obtained by stacking z_1 on top of z_2.) The solve step can be computed in closed form by solving, for each index i, the 2 × 2 system

    [ d_i  (D_2)_ii ; (D_2)_ii  (D_1)_ii ] [ (w_1)_i ; (w_2)_i ] = [ (z_1)_i ; (z_2)_i ],

where d_i = 2t(A^T A)_ii + (D_1)_ii for the preconditioner (15) and d_i = 2tγ + (D_1)_ii for the preconditioner (16). The computational cost is O(n) flops.

The computationally most expensive operation for a PCG step is therefore the matrix-vector product with H. The product H(p, q), with p, q ∈ R^n, can be computed as

    H [p; q] = [ 2t A^T(Ap) + D_1 p + D_2 q ; D_2 p + D_1 q ],

where Ap is computed first and then A^T(Ap). The complexity of these products with A and A^T depends on the problem data and determines the cost of a PCG step. We consider two cases.
• Sparse problems. The cost of computing Ap primarily depends on the number of nonzeros in the matrix A. The cost is O(s) flops when A has s nonzero elements.
• Dense problems with fast matrix-vector multiplication algorithms. The cost of computing Ap is O(mn) if no structure is exploited. If fast algorithms are available for the matrix-vector multiplications with A and A^T, the cost can be reduced substantially. As an example, we consider the compressed sensing problem (6) where Φ is a (partial) Fourier matrix and W is an orthogonal wavelet matrix, so A = ΦW^{-1}. The matrix-vector multiplication Ap = Φ(W^{-1}p) can be done efficiently using the fast algorithms for the inverse wavelet transform and the DFT. The multiplication A^T q can be computed efficiently in a similar manner using the fast algorithm for the wavelet transform; for any q, we can efficiently compute Φ^T q using zero-filling and then performing the inverse Fourier transform. The cost of a PCG step is therefore O(n log n). (The Hessian-vector product involves one FFT, one inverse FFT, one discrete wavelet transform (DWT), and one inverse DWT operation.)

C. Performance

Since the memory requirement of the truncated Newton interior-point method is modest, the method is able to solve very large problems, for which forming the Hessian (let alone computing the search direction) would be prohibitively expensive. The runtime of the truncated Newton interior-point method is determined by the product of the total number of PCG steps required over all iterations and the cost of a PCG step. The total number of PCG iterations required by the truncated Newton interior-point method depends on the value of the regularization parameter λ and the relative tolerance ε. In particular, for very small values of λ (which lead to solutions with a relatively large number of nonzero coefficients), the truncated Newton interior-point method requires a larger total number of PCG steps.

In extensive testing, we found that the total number of PCG steps ranges between a few tens (for medium size problems) and several hundreds (for very large problems) to compute a solution which is never worse than 1% suboptimal, i.e., with relative tolerance 0.01. In particular, we observed that the total number of PCG steps remains modest (around a few hundred) nearly irrespective of the size of the problem when the mutual coherence of the data matrix is small, i.e., when the off-diagonal elements of A^T A are relatively small compared with the diagonal elements. (This observation is not very surprising, since as the mutual coherence tends to zero, the Hessian in the Newton system (14) becomes more and more diagonally dominant.)
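The pieces above can be combined into a matrix-free search-direction computation. The sketch below is illustrative, not the authors' l1_ls code: it reuses the `grad_and_diags` helper from the earlier sketch, forms the Hessian-vector product and the preconditioner (15) implicitly, and hands both to SciPy's preconditioned CG.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def newton_step_pcg(A, y, x, u, t, lam, maxiter=200):
    """Approximate Newton step for phi_t via PCG (a sketch, not the l1_ls code)."""
    n = x.size
    g, d1, d2 = grad_and_diags(A, y, x, u, t, lam)   # from the earlier sketch

    # Hessian-vector product: H [p; q] = [2t A^T(A p) + D1 p + D2 q ; D2 p + D1 q]
    def H_mv(v):
        p, q = v[:n], v[n:]
        top = 2 * t * (A.T @ (A @ p)) + d1 * p + d2 * q
        bot = d2 * p + d1 * q
        return np.concatenate([top, bot])

    # Preconditioner (15): replace 2t A^T A by 2t diag(A^T A), keep the barrier
    # Hessian, and invert the resulting per-index 2x2 blocks exactly.
    p1 = 2 * t * np.einsum("ij,ij->j", A, A) + d1    # 2t diag(A^T A) + D1
    det = p1 * d1 - d2**2                            # > 0, so P is positive definite
    def P_inv(v):
        p, q = v[:n], v[n:]
        return np.concatenate([(d1 * p - d2 * q) / det,
                               (-d2 * p + p1 * q) / det])

    H = LinearOperator((2 * n, 2 * n), matvec=H_mv, dtype=float)
    M = LinearOperator((2 * n, 2 * n), matvec=P_inv, dtype=float)
    step, _ = cg(H, -g, M=M, maxiter=maxiter)
    return step[:n], step[n:]                        # (dx, du)
```

For a fast-transform data matrix, the two products with A and A^T inside `H_mv` would simply be replaced by the corresponding matrix-free operators.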

V. NUMERICAL EXAMPLES

In this section, we give some numerical examples to illustrate the performance of the truncated Newton interior-point method (TNIPM) described in Section IV. The algorithm parameters are taken as the values suggested in Section IV (α = 0.01, β = 0.5, μ = 2, s_min = 0.5, and ξ = 0.3).

Since the iteration limit in the PCG algorithm is set to be large enough, it was never reached in our experiments.

We compare the performance of the Matlab implementation of the truncated Newton interior-point method (available from http://www.stanford.edu/~boyd/l1_ls/) and the following five existing solvers: MOSEK, PDCO-CHOL, PDCO-LSQR, l1-magic, and a homotopy-method solver. MOSEK [34] is a C implementation of a primal-dual interior-point method with a Matlab interface. PDCO-CHOL is a Matlab implementation of a primal-dual interior-point method [50] that uses the Cholesky factorization algorithm to compute the search direction. PDCO-LSQR is a Matlab implementation of a primal-dual interior-point method [50] that uses the LSQR algorithm [42] to compute the search direction. (PDCO-CHOL and PDCO-LSQR are available in SparseLab [13], a library of Matlab routines for finding sparse solutions to underdetermined systems.) l1-magic [5] is a Matlab package devoted to compressed sensing problems. PDCO-LSQR and l1-magic implement interior-point methods that use the CG or LSQR method to compute the search direction and hence are similar in spirit to our method. The fifth solver is an implementation of the homotopy method [14], available in SparseLab [13]. The existing methods and TNIPM were run on an AMD 270 under Linux.

A. A Sparse Signal Recovery Example

As an example, we consider a sparse signal recovery problem with a signal x that consists of 160 spikes with amplitude ±1, shown in the top plot of Fig. 1. The measurement matrix A is created by first generating a matrix with entries drawn independently and identically from the standard normal distribution and then orthogonalizing the rows, as with the example in [5, Sect. 3.1]. In the problem, we have

    y = Ax + v,

where the noise v is drawn from a Gaussian distribution.

[Fig. 1. Sparse signal reconstruction. Top: original signal. Middle: minimum energy reconstruction. Bottom: BPDN reconstruction.]
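One way to generate data matching this description is sketched below. The sizes and the noise level are hypothetical (the text's exact values are not reproduced here), and orthonormalizing the rows by a QR factorization of the transpose is just one way to realize "orthogonalizing the rows".

```python
import numpy as np
from scipy.linalg import qr

rng = np.random.default_rng(0)
n, m, n_spikes, sigma = 4096, 1024, 160, 0.01    # hypothetical sizes and noise level

# Spiky test signal: n_spikes entries equal to +/-1, the rest zero.
x0 = np.zeros(n)
idx = rng.choice(n, size=n_spikes, replace=False)
x0[idx] = rng.choice([-1.0, 1.0], size=n_spikes)

# Gaussian matrix with orthonormalized rows: QR factorization of the transpose.
G = rng.standard_normal((m, n))
Q, _ = qr(G.T, mode="economic")      # columns of Q span the row space of G
A = Q.T                              # so A A^T = I

y = A @ x0 + sigma * rng.standard_normal(m)
```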

[Table I. Runtimes of the truncated Newton interior-point method (TNIPM) and five existing methods, for a signal denoising problem.]

Our method and the five existing methods above were run to find a point which is not worse than 1% suboptimal, i.e., with relative tolerance 0.01. The regularization parameter λ was chosen relative to λ_max, whose value was computed using the formula given in (4). Table I compares the runtimes of the Matlab implementation of our method and the five existing methods. The truncated Newton interior-point method is the most efficient for this medium-size problem.

Fig. 1 shows the reconstruction results. The top plot shows the original signal x. The middle plot shows the minimum energy reconstruction, which is the point in the set {x | Ax = y} closest to the origin. The bottom plot shows the signal obtained by solving the BPDN problem. The ℓ1-regularization based method finds exactly the positions of the nonzeros in the original signal (after appropriate thresholding), although the number of measurements is far less than the number of unknowns. (After identifying the nonzero positions in the reconstructed signal, we can use a standard least-squares method to estimate the values in the nonzero positions, over all signals with the given nonzero positions.) The minimum energy reconstruction method does not identify the nonzero positions at all. We also applied Tikhonov regularization with a wide range of regularization parameters to estimate the signal, but observed similar results.
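Continuing the data-generation sketch above, the minimum energy reconstruction is the pseudoinverse solution A^T(AA^T)^{-1} y, which reduces to A^T y when the rows of A are orthonormal:

```python
# Minimum energy (least l2-norm) reconstruction: A^+ y = A^T (A A^T)^{-1} y.
x_min_energy = np.linalg.pinv(A) @ y

# With orthonormal rows (A A^T = I) this is simply A^T y.
assert np.allclose(x_min_energy, A.T @ y, atol=1e-8)
```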

[Fig. 2. Brain image reconstruction results. Left: collected partial Fourier coefficients (in white). Middle: linear reconstruction. Right: compressed sensing reconstruction.]

B. A Compressed Sensing Example in Rapid MRI

As another example, we consider a sparse signal recovery problem in MRI, using the idea of compressed sensing. We scanned the brain of a healthy volunteer, obtaining 205 out of 512 possible parallel lines in the spatial frequency of the image. The lines were chosen randomly, with higher density sampling at low frequency, achieving a 2.5 scan-time reduction factor, as illustrated in the left panel of Fig. 2. The compressed sensing matrix Φ in (6) is therefore a matrix obtained by randomly removing many rows of the 2-D DFT matrix, called a partial Fourier ensemble. Brain images have a sparse representation in the wavelet domain. In the example shown, we use the Daubechies 4 wavelet transform as the sparsifying transform W in (6).

We compared the compressed sensing or sparse MRI method with a linear reconstruction method, which sets unobserved Fourier coefficients to zero and then performs the inverse Fourier transform. For compressed sensing reconstruction, we solved the problem (6) to a small relative tolerance with an appropriately chosen regularization parameter. Fig. 2 shows the two reconstruction results. The linear reconstruction suffers from incoherent, noise-like streaking artifacts (pointed to by the arrow) due to undersampling, whereas the artifacts are much less noticeable in the compressed sensing reconstruction.

The compressed sensing reconstruction problem can be reformulated as a QP. (One half of the variables are the real and imaginary wavelet coefficients and the other half are the new variables added in transforming the compressed sensing problem into a QP.) The run time of the Matlab implementation of our interior-point method to solve the QP was a few minutes, and the total number of PCG steps required over all interior-point iterations was 137. The exact relative tolerance from the optimal objective value (which was computed using our method with a very small relative tolerance) was below 1%. Solvers that form the Hessian explicitly could not handle the problem, since forming the Hessian (let alone computing the search direction) is prohibitively expensive for direct methods. Some of the existing solvers mentioned above could handle the problem but were much slower than our method, by two orders of magnitude.

C. Scalability

To examine the effect of problem size on the runtime of the truncated Newton interior-point method, we generated a family of ten sparse data sets, with the number of features n varying from one thousand to one million, and m = 0.1n examples, i.e., 10 times more features than examples. In doing so, the sparsity of A was controlled so that the total number of nonzero entries in A scaled with m. The elements of A were independent and identically distributed, drawn from the standard normal distribution. For each data set, the true signal was generated to be sparse, with a small number of nonzero elements, and the measurements were corrupted by zero-mean white noise. The regularization parameter was chosen in the same way for each data set.

Fig. 3 summarizes the scalability comparison results for this choice of regularization parameter, when our method and four of the five existing methods were run to find a point which is not worse than 1% suboptimal, i.e., with relative tolerance 0.01. (The remaining solver was excluded from the scalability comparison, since its current implementation does not handle sparse data effectively.) Evidently the truncated Newton interior-point method is more efficient for small problems, and far more efficient for medium and large problems. By fitting an exponent to the runtime data over the range from the smallest problem to the largest problem successfully solved by each method, we estimated the empirical complexity of each method; the empirical complexities of our method and of the best-scaling existing method grew much more slowly with problem size, while the empirical complexities of the other solvers were more than quadratic.

[Fig. 3. Runtime of the proposed truncated Newton interior-point method (TNIPM), MOSEK, PDCO-CHOL (abbreviated as CHOL), PDCO-LSQR (abbreviated as LSQR), and l1-magic, for ten randomly generated sparse problems.]
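A sparse data set of the kind described above can be generated with standard sparse-matrix tools. The sizes, sparsity level, support size, and noise level in the sketch below are hypothetical, chosen only to mirror the qualitative setup (n = 10m, nonzeros proportional to m).

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)
n, m = 100_000, 10_000                  # hypothetical sizes; n = 10 m as in the text
avg_nnz_per_row = 30                    # hypothetical sparsity level

# Sparse data matrix with standard normal nonzero entries.
A = sp.random(m, n, density=avg_nnz_per_row / n, format="csr",
              random_state=rng, data_rvs=rng.standard_normal)

# Sparse true signal and noisy measurements.
x0 = np.zeros(n)
support = rng.choice(n, size=100, replace=False)   # hypothetical support size
x0[support] = rng.standard_normal(support.size)
y = A @ x0 + 0.01 * rng.standard_normal(m)         # hypothetical noise level
```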

VI. EXTENSIONS AND VARIATIONS

We can solve the general ℓ1-regularized LSP (5) using the truncated Newton interior-point method described in Section IV, and we can extend the idea behind the method to several other problems involving ℓ1 regularization. The most important part of the extensions is to find a preconditioner that gives a good trade-off between the computational complexity and the accelerated convergence it provides. We will focus on this part, and will not go into the details of the other parts, which are rather straightforward.

A. General ℓ1-Regularized LSPs

The general ℓ1-regularized LSP (5) can be reformulated by minimizing the quadratic loss analytically over the unregularized variables (those with λ_i = 0); the first term in the resulting objective can be evaluated analytically and is in fact a quadratic function of the remaining (regularized) variables. The reduced problem can then be cast as a problem of the form (3) via a simple transform of variables, and hence can be solved using the truncated Newton interior-point method described above.

We should point out that the method does not need to form explicitly the data matrix of the equivalent ℓ1-regularized LSP; it only needs an algorithm for multiplying a vector by the data matrix of the equivalent formulation and an algorithm for multiplying a vector by its transpose. These matrix-vector multiplications can be performed with nearly the same computational cost as those with the original data matrix A and its transpose A^T, provided that the number of unregularized variables is small.

As an example, we consider the Lasso problem (8). Minimizing analytically over the intercept v, it can be transformed to a problem of the form

    minimize ||Ã w − b̃||_2^2 + λ||w||_1,

where the variable is w ∈ R^n and the problem data are Ã = (I − (1/m)11^T)A and b̃ = (I − (1/m)11^T)b. (Here, 1 is the vector of all ones whose dimension is clear from the context.) The data matrix in the equivalent formulation is the sum of the original data matrix and a rank-one matrix, so the matrix-vector multiplications with Ã and Ã^T can be performed with the same computational cost as those with the original data matrix and its transpose.
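The rank-one structure makes the products with Ã and Ã^T essentially free once products with A and A^T are available. The sketch below (the helper name is ours, not the authors' code) realizes them without ever forming the centered matrix.

```python
import numpy as np

def lasso_centered_ops(A, b):
    """Matrix-free products with the centered data matrix (I - (1/m) 1 1^T) A and
    with its transpose, plus the centered observations, consistent with the
    rank-one remark above (a sketch)."""
    def A_mv(w):            # (I - (1/m) 1 1^T) (A w)
        z = A @ w
        return z - z.mean()
    def AT_mv(r):           # A^T (I - (1/m) 1 1^T) r
        return A.T @ (r - r.mean())
    return A_mv, AT_mv, b - b.mean()
```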

B. ℓ1-Regularized LSPs With Nonnegativity Constraints

Suppose that the variable x is known to be nonnegative. We add the nonnegativity constraint to the ℓ1-regularized LSP (3), to obtain

    minimize ||Ax − y||_2^2 + λ Σ_i x_i   subject to   x ⪰ 0.

The associated centering problem is to minimize the weighted objective function augmented by the logarithmic barrier for the constraints,

    φ_t(x) = t(||Ax − y||_2^2 + λ Σ_i x_i) − Σ_i log x_i.

The Newton system for the centering problem is

    ∇^2 φ_t(x) Δx = −∇φ_t(x),   ∇^2 φ_t(x) = 2t A^T A + D,

where

    D = diag(1/x_1^2, ..., 1/x_n^2).

The diagonal preconditioner of the form

    P = 2t diag(A^T A) + D

works well for this problem, especially when the optimal solution is sparse.

C. Isotonic Regression With ℓ1 Regularization

We consider a variation of the ℓ1-regularized LSP (3) subject to the monotonicity constraints

    x_1 ≤ x_2 ≤ ... ≤ x_n,

which can be written as

    minimize ||Ax − y||_2^2 + λ||x||_1   subject to   x_1 ≤ x_2 ≤ ... ≤ x_n.

This problem is related to the monotone Lasso [25], which is an extension of isotonic regression; isotonic regression has been extensively studied in statistics [1], [45].

The ℓ1-regularized isotonic regression problem arises in several contexts. As an example, the problem of finding monotone trends in a Gaussian setting can be cast as a problem of this form [49], [21]. As another example, the problem of estimating an accumulating damage trend from a series of structural health monitoring (SHM) images can be formulated as a problem of this form in which the variables are 3-D (a time series of images) [22].

We take the change of variables

    z_1 = x_1,   z_i = x_i − x_{i−1},   i = 2, ..., n,

to reformulate the isotonic regression problem as an ℓ1-regularized LSP with nonnegativity constraints, where the new data matrix is Ã = AL, with L the lower triangular matrix whose entries on and below the diagonal are one (all entries not shown are zero), so that x = Lz. This problem can be solved by a truncated Newton interior-point method with the preconditioner described above, exploiting the structure of the new data matrix. The two operations, multiplying a vector by Ã and multiplying a vector by its transpose, can be done as efficiently as those with A, using the fact that the corresponding operations with L and L^T (cumulative sums) can be done in O(n) flops.

D. ℓ1-Regularized LSPs With Complex Variables

The truncated Newton interior-point method can be extended to a problem of the form

    minimize ||Ax − y||_2^2 + λ||x||_1,    (17)

where the variable is x ∈ C^n and the problem data are A ∈ C^{m×n} and y ∈ C^m. Here ||x||_1, the ℓ1 norm of the complex vector x, is defined by

    ||x||_1 = Σ_i ((Re x_i)^2 + (Im x_i)^2)^{1/2},    (18)

where Re x_i is the real part of x_i and Im x_i is the imaginary part.

We note that (17) cannot be cast as an ℓ1-regularized LSP of the form (3). Instead, it can be reformulated as the convex problem

    minimize ||Ax − y||_2^2 + λ Σ_i u_i
    subject to ((Re x_i)^2 + (Im x_i)^2)^{1/2} ≤ u_i,  i = 1, ..., n,    (19)

where the variables are x (equivalently, its real and imaginary parts) and u ∈ R^n. This problem is a second-order cone program (SOCP) and can be solved by standard interior-point methods; see, e.g., [31] for more on SOCPs.

We show how the truncated Newton interior-point method can be extended to the SOCP (19). Using the standard barrier function for second-order cone constraints, the associated centering problem can be formulated as

    minimize t(||Ax − y||_2^2 + λ Σ_i u_i) + Ψ(x, u),    (20)

where the variables are x and u, and Ψ is the barrier function

    Ψ(x, u) = −Σ_i log(u_i^2 − (Re x_i)^2 − (Im x_i)^2)

for the constraints of (19). The Hessian of this problem is the sum of the Hessian of the weighted quadratic term and the Hessian of the barrier Ψ. As before, we consider the preconditioner that approximates the Hessian of the first term in the objective of the centering problem (20) with its diagonal entries, while retaining the Hessian of the logarithmic barrier. After appropriate reordering, this preconditioner is a block diagonal matrix consisting of n small (3 × 3) blocks, and so the computational effort of applying its inverse is O(n) flops. This preconditioner appears to be quite effective in solving the centering problem (20).

In compressed sensing, a problem of the form (17) arises when the entries of the matrices Φ and W are complex and we do not expand the real and imaginary parts of the matrices and vectors. In this case, we have A = ΦW^{-1}, which is a matrix with complex entries. The resulting formulation is different from the formulation, considered in Section II-A, obtained by expanding the real and imaginary parts. Compared to the formulation described in Section II-A, the formulation based on the penalty function (18) couples together the real and imaginary parts of the entries of x, so the optimal solution found tends to have more entries with simultaneously zero real and imaginary parts. (The idea behind the penalty function (18) is used in the grouped Lasso [60] and in total variation minimization with two- or higher-dimensional data [7], [47].)
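The penalty (18) is simply the sum of complex magnitudes, which is what couples each entry's real and imaginary parts. A one-line sketch of its evaluation:

```python
import numpy as np

def complex_l1_norm(x):
    """The penalty (18): sum over entries of sqrt(Re(x_i)^2 + Im(x_i)^2)."""
    return np.sum(np.hypot(x.real, x.imag))   # equals np.sum(np.abs(x)) for complex x
```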

VII. CONCLUSION

In this paper, we have described a specialized interior-point method for solving large-scale ℓ1-regularized LSPs. The method is similar in spirit to the specialized interior-point method for basis pursuit denoising described in [8]. A major difference is that our method uses the PCG algorithm to compute the search direction, whereas the method described in [8] uses LSQR without preconditioning. Another difference lies in the truncation rule for the iterative algorithm: the truncation rule used in our method is more aggressive, to find a search direction that gives a good tradeoff of computational effort versus the convergence rate it provides.

Our method can be generalized to handle a variety of extensions, as described in Section VI. The extension to total variation minimization with two- or three-dimensional data is more involved, since it is not straightforward to find a simple but effective preconditioner for the associated centering problem. We hope to extend the method to the total variation minimization problem in future work.

In many applications, the choice of an appropriate value of the regularization parameter involves computing a portion of the regularization path, or at the very least solving ℓ1-regularized LSPs with multiple, and often many, values of λ. Incorporating a warm start technique into the truncated Newton interior-point method, we can compute a good approximation of the regularization path much more efficiently than by solving multiple problems independently. This idea has been successfully used in ℓ1-regularized logistic regression [30].

ACKNOWLEDGMENT

The authors are grateful to the anonymous reviewers, E. Candès, M. Calder, J. Duchi, and M. Grant, for helpful comments.

REFERENCES

[1] R. Barlow, D. Bartholomew, J. Bremner, and H. Brunk, Statistical Inference Under Order Restrictions: The Theory and Application of Isotonic Regression. New York: Wiley, 1972.
[2] J. Borwein and A. Lewis, Convex Analysis and Nonlinear Optimization. New York: Springer, 2000.
[3] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge, U.K.: Cambridge University Press, 2004.
[4] E. Candès, "Compressive sampling," in Proc. Int. Congr. Mathematics, 2006.
[5] E. Candès and J. Romberg, l1-magic: A Collection of MATLAB Routines for Solving the Convex Optimization Programs Central to Compressive Sampling, 2006 [Online]. Available: www.acm.caltech.edu/l1magic/
[6] E. Candès, J. Romberg, and T. Tao, "Stable signal recovery from incomplete and inaccurate measurements," Commun. Pure Appl. Math., vol. 59, no. 8, pp. 1207–1223, 2005.
[7] E. Candès, J. Romberg, and T. Tao, "Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information," IEEE Trans. Inf. Theory, vol. 52, no. 2, pp. 489–509, Feb. 2006.
[8] S. Chen, D. Donoho, and M. Saunders, "Atomic decomposition by basis pursuit," SIAM Rev., vol. 43, no. 1, pp. 129–159, 2001.
[9] I. Daubechies, M. Defrise, and C. De Mol, "An iterative thresholding algorithm for linear inverse problems with a sparsity constraint," Commun. Pure Appl. Math., vol. 57, pp. 1413–1541, 2004.
[10] J. Demmel, Applied Numerical Linear Algebra. Cambridge, U.K.: Cambridge Univ. Press, 1997.
[11] D. Donoho, "Compressed sensing," IEEE Trans. Inf. Theory, vol. 52, no. 4, pp. 1289–1306, Apr. 2006.
[12] D. Donoho, M. Elad, and V. Temlyakov, "Stable recovery of sparse overcomplete representations in the presence of noise," IEEE Trans. Inf. Theory, vol. 52, no. 1, pp. 6–18, Jan. 2006.
[13] D. Donoho, V. Stodden, and Y. Tsaig, SparseLab: Seeking Sparse Solutions to Linear Systems of Equations, 2006 [Online]. Available: http://sparselab.stanford.edu/
[14] D. Donoho and Y. Tsaig, "Fast solution of l1-norm minimization problems when the solution may be sparse," manuscript, 2006 [Online]. Available: http://www.stanford.edu/
[15] B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani, "Least angle regression," Ann. Statist., vol. 32, no. 2, pp. 407–499, 2004.
[16] M. Elad, B. Matalon, and M. Zibulevsky, "Image denoising with shrinkage and redundant representations," in Proc. IEEE Computer Society Conf. Computer Vision and Pattern Recognition (CVPR), 2006, vol. 2, pp. 1924–1931.
[17] M. Figueiredo and R. Nowak, "A bound optimization approach to wavelet-based image deconvolution," in Proc. IEEE Int. Conf. Image Processing (ICIP), 2005, pp. 782–785.
[18] M. Figueiredo, R. Nowak, and S. Wright, "Gradient projection for sparse reconstruction: Application to compressed sensing and other inverse problems," IEEE J. Select. Topics Signal Process., 2007.
[19] J. Friedman, T. Hastie, and R. Tibshirani, Pathwise Coordinate Optimization, 2007 [Online]. Available: www-stat.stanford.edu/hastie/pub.htm
[20] J. Fuchs, "Recovery of exact sparse representations in the presence of noise," IEEE Trans. Inf. Theory, vol. 51, no. 10, pp. 3601–3608, Oct. 2005.
[21] D. Gorinevsky, "Monotonic regression filters for trending gradual deterioration faults," in Proc. American Control Conf. (ACC), 2004, pp. 5394–5399.
[22] D. Gorinevsky, S.-J. Kim, S. Bear, S. Boyd, and G. Gordon, "Optimal estimation of deterioration from diagnostic image sequence," submitted to IEEE Trans. Signal Process., 2007 [Online]. Available: http://www.stanford.edu/gorin/papers/MonoImage07twocol.pdf
[23] E. Hale, W. Yin, and Y. Zhang, "A fixed-point continuation method for l1-regularized minimization with applications to compressed sensing," manuscript, 2007 [Online]. Available: http://www.dsp.ece.rice.edu/cs/
[24] T. Hastie, S. Rosset, R. Tibshirani, and J. Zhu, "The entire regularization path for the support vector machine," J. Mach. Learning Res., vol. 5, pp. 1391–1415, 2004.
[25] T. Hastie, J. Taylor, R. Tibshirani, and G. Walther, "Forward stagewise regression and the monotone lasso," Electron. J. Statist., vol. 1, pp. 1–29, 2007.
[26] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning, Springer Series in Statistics. New York: Springer-Verlag, 2001.
[27] C. Johnson, J. Seidel, and A. Sofer, "Interior point methodology for 3-D PET reconstruction," IEEE Trans. Med. Imag., vol. 19, no. 4, pp. 271–285, 2000.
[28] C. T. Kelley, Iterative Methods for Linear and Nonlinear Equations, Frontiers in Applied Mathematics, vol. 16. Philadelphia, PA: SIAM, 1995.
[29] K. Knight and W. Fu, "Asymptotics for lasso-type estimators," Ann. Statist., vol. 28, no. 5, pp. 1356–1378, 2000.
[30] K. Koh, S.-J. Kim, and S. Boyd, "An interior-point method for large-scale ℓ1-regularized logistic regression," J. Mach. Learning Res., vol. 8, pp. 1519–1555, 2007.
[31] M. Lobo, L. Vandenberghe, S. Boyd, and H. Lebret, "Applications of second-order cone programming," Linear Algebra Appl., vol. 284, pp. 193–228, 1998.
[32] D. Luenberger, Linear and Nonlinear Programming, 2nd ed. Reading, MA: Addison-Wesley, 1984.
[33] N. Meinshausen and P. Bühlmann, "High-dimensional graphs and variable selection with the lasso," Ann. Statist., vol. 34, no. 3, pp. 1436–1462, 2006.
[34] MOSEK ApS, The MOSEK Optimization Tools Version 2.5: User's Manual and Reference, 2002 [Online]. Available: www.mosek.com
[35] G. Narkiss and M. Zibulevsky, Sequential Subspace Optimization Method for Large-Scale Unconstrained Problems, Tech. Rep. CCIT No. 559, The Technion, Haifa, Israel, 2005.
[36] Y. Nesterov, "Gradient methods for minimizing composite objective function," CORE Discussion Paper 2007/76, 2007 [Online]. Available: http://www.optimization-online.org/DB_HTML/2007/09/1784.html
[37] Y. Nesterov and A. Nemirovsky, Interior-Point Polynomial Methods in Convex Programming, Studies in Applied Mathematics, vol. 13. Philadelphia, PA: SIAM, 1994.
[38] A. Neumaier, "Solving ill-conditioned and singular linear systems: A tutorial on regularization," SIAM Rev., vol. 40, no. 3, pp. 636–666, 1998.
[39] J. Nocedal and S. Wright, Numerical Optimization, Springer Series in Operations Research. New York: Springer, 1999.
[40] M. Osborne, B. Presnell, and B. Turlach, "A new approach to variable selection in least squares problems," IMA J. Numer. Anal., vol. 20, no. 3, pp. 389–403, 2000.

[41] S. Osher, M. Burger, D. Goldfarb, J. Xu, and W. Yin, "An iterative regularization method for total variation based image restoration," SIAM J. Multiscale Model. Simul., vol. 4, no. 2, pp. 460–489, 2005.
[42] C. Paige and M. Saunders, "LSQR: An algorithm for sparse linear equations and sparse least squares," ACM Trans. Math. Software, vol. 8, no. 1, pp. 43–71, 1982.
[43] B. Polyak, Introduction to Optimization. New York: Optimization Software, 1987, translated from Russian.
[44] L. Portugal, M. Resende, G. Veiga, and J. Júdice, "A truncated primal-infeasible dual-feasible network interior point method," Networks, vol. 35, no. 2, pp. 91–108, 2000.
[45] T. Robertson, F. Wright, and R. Dykstra, Order Restricted Statistical Inference. New York: Wiley, 1988.
[46] S. Rosset, "Tracking curved regularized optimization solution paths," in Advances in Neural Information Processing Systems 17, L. Saul, Y. Weiss, and L. Bottou, Eds. Cambridge, MA: MIT Press, 2005.
[47] L. Rudin, S. Osher, and E. Fatemi, "Nonlinear total variation based noise removal algorithms," Physica D, vol. 60, pp. 259–268, 1992.
[48] Y. Saad, Iterative Methods for Sparse Linear Systems, 2nd ed. Philadelphia, PA: SIAM, 2003.
[49] S. Samar, D. Gorinevsky, and S. Boyd, "Moving horizon filter for monotonic trends," in Proc. 43rd IEEE Conf. Decision and Control, 2004, vol. 3, pp. 3115–3120.
[50] M. Saunders, PDCO: Primal-Dual Interior Method for Convex Objectives, 2002 [Online]. Available: http://www.stanford.edu/group/SOL/software/pdco.html
[51] N. Z. Shor, Minimization Methods for Non-Differentiable Functions, Springer Series in Computational Mathematics. New York: Springer, 1985.
[52] R. Tibshirani, "Regression shrinkage and selection via the lasso," J. Roy. Statist. Soc., ser. B, vol. 58, no. 1, pp. 267–288, 1996.
[53] J. Tropp, "Just relax: Convex programming methods for identifying sparse signals in noise," IEEE Trans. Inf. Theory, vol. 52, no. 3, pp. 1030–1051, Mar. 2006.
[54] Y. Tsaig and D. Donoho, "Extensions of compressed sensing," Signal Process., vol. 86, no. 3, pp. 549–571, 2006.
[55] L. Vandenberghe and S. Boyd, "A primal-dual potential reduction method for problems involving matrix inequalities," Math. Programm., ser. B, vol. 69, pp. 205–236, 1995.
[56] M. Wainwright, "Sharp thresholds for high-dimensional and noisy recovery of sparsity," in Proc. 44th Annu. Allerton Conf. Communication, Control, and Computing, 2006.
[57] S. Wright, Primal-Dual Interior-Point Methods. Philadelphia, PA: SIAM, 1997.
[58] Y. Ye, Interior Point Algorithms: Theory and Analysis. New York: Wiley, 1997.
[59] W. Yin, S. Osher, J. Darbon, and D. Goldfarb, Bregman Iterative Algorithms for Compressed Sensing and Related Problems, 2007 [Online]. Available: http://www.math.ucla.edu/applied/cam/index.shtml
[60] M. Yuan and Y. Lin, "Model selection and estimation in regression with grouped variables," J. Roy. Statist. Soc., ser. B, vol. 68, no. 1, pp. 49–67, 2006.
[61] P. Zhao and B. Yu, "On model selection consistency of lasso," J. Mach. Learning Res., vol. 7, pp. 2541–2563, 2006.
[62] H. Zou, "The adaptive lasso and its oracle properties," J. Amer. Statist. Assoc., vol. 101, no. 476, pp. 1418–1429, 2006.
[63] H. Zou and T. Hastie, "Regularization and variable selection via the elastic net," J. Roy. Statist. Soc., ser. B, vol. 67, no. 2, pp. 301–320, 2005.

Seung-Jean Kim (M'02) received the Ph.D. degree in electrical engineering from Seoul National University, Seoul, Korea. Since 2002, he has been with the Information Systems Laboratory, Department of Electrical Engineering, Stanford University, Stanford, CA, where he is currently a Consulting Assistant Professor. His current research interests include convex optimization with engineering applications, large-scale optimization, robust optimization, computational finance, machine learning, and statistics.

Kwangmoo Koh received the B.S. degree in electrical engineering from Seoul National University, Seoul, Korea, in 1999 and the M.S. degree in electrical engineering in 2004 from Stanford University, Stanford, CA, where he is currently pursuing the Ph.D. degree in electrical engineering. After a three-year stint as a member of technical staff with Humax Co., Ltd., Korea, and as a Research Assistant in the Real-Time Operating System Laboratory, Seoul National University, he resumed his graduate studies at Stanford University. His research interests include convex optimization, computer systems, and time-series analysis.

Michael Lustig received the B.Sc. degree from the Electrical Engineering Department, Technion-Israel Institute of Technology, Haifa, Israel, in 2001, and the M.Sc. degree in 2004 from the Electrical Engineering Department, Stanford University, Stanford, CA, where he is currently pursuing the Ph.D. degree, working on the application of compressed sensing to rapid MRI. His research interests include medical imaging reconstruction techniques and MR pulse-sequence design.

Stephen Boyd (S'82–M'85–SM'97–F'99) received the A.B. degree in mathematics (summa cum laude) from Harvard University, Cambridge, MA, in 1980, and the Ph.D. degree in electrical engineering and computer science from the University of California at Berkeley in 1985. He is the Samsung Professor of Engineering in the Information Systems Laboratory, Electrical Engineering Department, Stanford University, Stanford, CA. His current interests include convex programming applications in control, signal processing, and circuit design. He is the author of Linear Controller Design: Limits of Performance (with Craig Barratt, 1991), Linear Matrix Inequalities in System and Control Theory (with L. El Ghaoui, E. Feron, and V. Balakrishnan, 1994), and Convex Optimization (with Lieven Vandenberghe, 2004). Dr. Boyd received an Office of Naval Research (ONR) Young Investigator Award, a Presidential Young Investigator Award, the 1992 AACC Donald P. Eckman Award, the Perrin Award for Outstanding Undergraduate Teaching in the School of Engineering, an ASSU Graduate Teaching Award, and the 2003 AACC Ragazzini Education Award. He is a Distinguished Lecturer of the IEEE Control Systems Society and holds an honorary Ph.D. degree from the Royal Institute of Technology (KTH), Stockholm, Sweden.

Dimitry Gorinevsky (M'91–SM'98–F'06) received the M.Sc. degree from the Moscow Institute of Physics and Technology and the Ph.D. degree from Moscow (Lomonosov) University, Moscow, Russia. He is a Consulting Professor of Electrical Engineering at Stanford University, Stanford, CA, and an Independent Consultant to NASA. He worked for Honeywell for ten years. Prior to that, he held research, engineering, and academic positions in Moscow, Russia; Munich, Germany; and Toronto and Vancouver, Canada. His research interests are in decision and control systems applications across many industries. He has authored a book, more than 140 reviewed technical papers, and a dozen patents. Dr. Gorinevsky is an Associate Editor of the IEEE TRANSACTIONS ON CONTROL SYSTEMS TECHNOLOGY. He is a recipient of a Control Systems Technology Award (2002) and an IEEE TRANSACTIONS ON CONTROL SYSTEMS TECHNOLOGY Outstanding Paper Award (2004) of the IEEE Control Systems Society.