AN EFFICIENT METHOD FOR COMPRESSED SENSING

Seung-Jean Kim, Kwangmoo Koh, Michael Lustig, and Stephen Boyd

Department of Electrical Engineering, Stanford University, Stanford, CA 94305-9510 USA {sjkim,deneb1,mlustig,boyd}@stanford.edu

ABSTRACT

Compressed sensing or compressive sampling (CS) has been receiving a lot of interest as a promising method for signal recovery and sampling. CS problems can be cast as convex problems, and then solved by several standard methods such as interior-point methods, at least for small and medium size problems. In this paper we describe a specialized interior-point method for solving CS problems that uses a preconditioned conjugate gradient method to compute the search step. The method can efficiently solve large CS problems, by exploiting fast algorithms for the signal transforms used. The method is demonstrated with a magnetic resonance imaging (MRI) example.

Index Terms— compressed sensing, compressive sampling, ℓ₁ regularization, interior-point methods, preconditioned conjugate gradients.

1. INTRODUCTION

1.1. Compressed sensing

Let z be an unknown vector in Rⁿ. Suppose that we have m noisy linear measurements of z of the form

    yᵢ = ⟨φᵢ, z⟩ + vᵢ,   i = 1, …, m,

where ⟨·,·⟩ denotes the usual inner product, v ∈ Rᵐ is the noise, and φᵢ ∈ Rⁿ are known signals. Standard reconstruction methods require at least n samples. Suppose we know a priori that z is compressible or has a sparse representation in a transform domain, described by W ∈ R^{n×n} (after expanding the real and imaginary parts if necessary). In this case, if the measurement vectors are well chosen, then the number of measurements m can be dramatically smaller than the size n usually considered necessary.

Compressed sensing [1] or compressive sampling [2] exploits the compressibility in the transform domain by solving a problem of the form

    minimize   ‖Φz − y‖₂² + λ‖Wz‖₁                                   (1)

where the variable is z ∈ Rⁿ and ‖x‖₁ = ∑_{i} |xᵢ| denotes the ℓ₁ norm. Here, Φ = [φ₁ ⋯ φ_m]ᵀ ∈ R^{m×n} is called the compressed sensing matrix, λ > 0 is the regularization parameter, and W is called the sparsifying transform.

1.2. Solution methods

When W is invertible, the CS problem (1) can be reformulated as the ℓ₁-regularized least squares problem (LSP)

    minimize   ‖Ax − y‖₂² + λ‖x‖₁                                    (2)

where the variable is x ∈ Rⁿ and the problem data or parameters are A = ΦW⁻¹ ∈ R^{m×n} and y ∈ Rᵐ. The ℓ₁-regularized problem (2) can be transformed to a convex quadratic program (QP) with linear inequality constraints,

    minimize   ‖Ax − y‖₂² + λ ∑_{i=1}^n uᵢ
    subject to −uᵢ ≤ xᵢ ≤ uᵢ,   i = 1, …, n,                          (3)

where the variables are x ∈ Rⁿ and u ∈ Rⁿ.

The data matrix A is typically fully dense, so small and medium sized problems can be solved by standard convex optimization methods such as interior-point methods. The QP (3) that arises in compressed sensing applications has an important difference from general dense QPs: there is a fast method for multiplying a vector by A and a fast method for multiplying a vector by Aᵀ, based on fast algorithms for the sparsifying transform and its inverse transform. A specialized interior-point method that exploits such algorithms may scale to large problems [4]. An example is l1-magic [5], which uses the conjugate gradient (CG) method to compute the search step.

Specialized computational methods for problems of the form (2) include path-following methods and variants [6, 7, 8]. When the optimal solution of (2) is extremely sparse, path-following methods can be very fast, but they tend to become slow as the number of nonzeros at the optimal solution increases. Other recently developed computational methods for ℓ₁-regularized LSPs include coordinate-wise descent methods [9], bound optimization methods [10], sequential subspace optimization methods [11], iterated shrinkage methods [12, 13], and gradient projection algorithms [14]. Some of these methods can handle very large problems with modest accuracy.

The main goal of this paper is to describe a specialized interior-point method for solving the QP (3). The method uses a preconditioned conjugate gradient (PCG) method to compute the search step and therefore can exploit fast algorithms for the sparsifying transform and its inverse transform. The specialized method is far more efficient than (primal-dual) interior-point methods that use direct or CG methods to compute the search step. Compared with first-order methods such as coordinate descent methods, the specialized method is comparable in solving large problems with modest accuracy, but is able to solve them with high accuracy at relatively small additional computational cost. The method is demonstrated with an MRI example.
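To make the fast operators of Section 1.2 concrete, the following is a minimal NumPy sketch (ours, not the authors' implementation; all names are hypothetical). It implements matrix-free products with A = ΦW⁻¹ and its adjoint, where Φ keeps a random subset of rows of the orthonormal n-point DFT. For simplicity W is taken to be the identity here; in sparse MRI it would be a fast wavelet transform, and the real and imaginary parts would be expanded so that A is a real matrix.

    import numpy as np

    n, m = 512, 205                    # signal length, number of measurements
    rng = np.random.default_rng(0)
    kept = np.sort(rng.choice(n, size=m, replace=False))   # retained DFT rows

    def A_mul(x):
        # A x = Phi W^{-1} x: inverse sparsifying transform (identity here),
        # then partial DFT; O(n log n) instead of O(mn) for a dense product.
        return np.fft.fft(x, norm="ortho")[kept]

    def A_adj(r):
        # A^H r = W^{-H} Phi^H r: zero-fill missing frequencies, inverse DFT.
        z = np.zeros(n, dtype=complex)
        z[kept] = r
        return np.fft.ifft(z, norm="ortho")

These two callables are all a PCG-based method needs, since it touches A only through products with vectors.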
2. THE DUAL PROBLEM

The dual of the ℓ₁-regularized LSP (2) can be written as

    maximize   G(ν) = −(1/4)νᵀν − νᵀy
    subject to |(Aᵀν)ᵢ| ≤ λ,   i = 1, …, n,                           (4)

where the variable is ν ∈ Rᵐ. For any x ∈ Rⁿ, the point

    ν = 2s(Ax − y),                                                   (5)

with the scaling constant s = min{1, λ/‖2Aᵀ(Ax − y)‖∞}, is dual feasible. Therefore G(ν) is a lower bound on the optimal value of (2). The difference between the primal objective value and the associated lower bound G(ν) is called the duality gap and denoted η:

    η = ‖Ax − y‖₂² + λ ∑_{i=1}^n |xᵢ| − G(ν).                         (6)

The duality gap is always nonnegative, and is zero at the optimal point.

3. AN INTERIOR-POINT METHOD

We start by defining the logarithmic barrier for the bound constraints −uᵢ ≤ xᵢ ≤ uᵢ in (3),

    Φ(x, u) = −∑_{i=1}^n log(uᵢ + xᵢ) − ∑_{i=1}^n log(uᵢ − xᵢ),

with domain dom Φ = {(x, u) ∈ Rⁿ × Rⁿ | |xᵢ| < uᵢ, i = 1, …, n}. For t > 0, the central path consists of the unique minimizer (x*(t), u*(t)) of the convex function

    φ_t(x, u) = t(‖Ax − y‖₂² + λ ∑_{i=1}^n uᵢ) + Φ(x, u),

as t varies from 0 to ∞ [15, §11]. In a primal interior-point method, the search direction at (x, u) is computed from the Newton system

    ∇²φ_t(x, u) (Δx, Δu) = −∇φ_t(x, u).                               (7)

The overall method is summarized as follows.

Interior-point algorithm.
    given relative tolerance ǫ_rel > 0, α ∈ (0, 1/2), β ∈ (0, 1).
    Initialize. t := 1/λ, x := 0, u := 1 = (1, …, 1) ∈ Rⁿ.
    repeat
        1. Compute the search direction (Δx, Δu) as an approximate solution to the Newton system (7).
        2. Backtracking line search. Find the smallest integer k ≥ 0 that satisfies
               φ_t(x + β^k Δx, u + β^k Δu) ≤ φ_t(x, u) + α β^k ∇φ_t(x, u)ᵀ (Δx, Δu).
        3. Update. (x, u) := (x, u) + β^k (Δx, Δu).
        4. Construct dual feasible point ν from (5).
        5. Evaluate duality gap η from (6).
        6. quit if η/G(ν) ≤ ǫ_rel.
        7. Update t.

As a stopping criterion, the method uses the duality gap divided by the dual objective value. By weak duality, this ratio is an upper bound on the relative suboptimality.

The update rule we propose is

    t := max{µ min{t̂, t}, t}   if s ≥ s_min,
    t := t                     if s < s_min,

where t̂ = 2n/η, s = β^k is the step length accepted in the line search, and µ > 1 and s_min ∈ (0, 1] are algorithm parameters. The choice of µ = 2 and s_min = 0.5 appears to give good performance for a wide range of problems. This rule has been used in solving ℓ₁-regularized logistic regression problems in [16]; see [16] for an informal justification of convergence of the interior-point method based on this update rule (with exact search directions).
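Steps 4–6 of the algorithm can be made concrete with a short sketch (our code and naming, not the authors'): construct the dual feasible point (5), evaluate the dual objective (4) and the duality gap (6), and return the quantities used in the stopping test. It assumes the real, expanded formulation, with A available only through fast products A_mul(x) = Ax and At_mul(v) = Aᵀv.

    import numpy as np

    def duality_gap(x, y, lam, A_mul, At_mul):
        r = A_mul(x) - y                                     # residual Ax - y
        s = min(1.0, lam / np.max(np.abs(At_mul(2.0 * r))))  # scaling constant
        nu = 2.0 * s * r                                     # dual point, eq. (5)
        G = -0.25 * (nu @ nu) - nu @ y                       # dual objective (4)
        primal = r @ r + lam * np.sum(np.abs(x))             # primal objective
        return primal - G, G                                 # gap (6) and G(nu)

The interior-point loop then quits as soon as eta / G <= eps_rel, with (eta, G) returned by duality_gap.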
We compute the search direction only approximately, applying the PCG method [17, §6.6] to the Newton system (7). The PCG method uses a (symmetric positive definite) preconditioner P ∈ R^{2n×2n} that approximates the Hessian of φ_t(x, u),

    ∇²φ_t(x, u) = t ∇²‖Ax − y‖₂² + ∇²Φ(x, u) ∈ R^{2n×2n}.

The preconditioner approximates the first term with its diagonal entries, while retaining the second term:

    P = diag(t ∇²‖Ax − y‖₂²) + ∇²Φ(x, u) ∈ R^{2n×2n}.

(Here diag(S) is the diagonal matrix obtained by setting the off-diagonal entries of the matrix S to zero.) The cost of computing the diagonal entries can be amortized over all interior-point iterations, since we need to compute them only once.

The computational effort of each iteration of the PCG algorithm is dominated by one matrix-vector product of the form Hp with the Hessian H = ∇²φ_t(x, u) and one solve step of the form P⁻¹r with the preconditioner P. The product Hp is cheap to compute, since we can use fast algorithms for the transforms Φ and W (e.g., fast discrete wavelet and Fourier transforms). The cost of computing P⁻¹r is O(n) flops.

The PCG algorithm has two parameters: the initial point x₀ and the relative tolerance ǫ_pcg. For the initial point, we use the previous search direction. The relative tolerance ǫ_pcg has to be carefully chosen to obtain good efficiency in the interior-point method. We change it adaptively as

    ǫ_pcg = min{0.1, ξη/‖g‖₂},

where η is the duality gap at the current iterate, g = ∇φ_t(x, u) is the current gradient, and ξ is an algorithm parameter. (The choice of ξ = 0.01 appears to work well for a wide range of problems.) Thus, we solve the Newton system with low accuracy at early iterations, and solve it more accurately as the duality gap decreases.
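To make the O(n) preconditioner solve concrete, here is a sketch (our code with hypothetical names, not the authors' implementation). It uses the fact that ∇²Φ couples only xᵢ with uᵢ, so P is a permutation of a block-diagonal matrix with n independent 2×2 blocks, each solvable in closed form.

    import numpy as np

    def precond_solve(r, x, u, t, d0):
        # Apply P^{-1} to r, with variables ordered as (x, u).
        # d0 = 2 * diag(A^T A), computed once; data-term diagonal is t * d0.
        n = x.size
        rx, ru = r[:n], r[n:]
        q1 = 1.0 / (u + x) ** 2       # from -log(u_i + x_i)
        q2 = 1.0 / (u - x) ** 2       # from -log(u_i - x_i)
        a = t * d0 + q1 + q2          # d^2/dx_i^2 entries of P
        b = q1 - q2                   # d^2/dx_i du_i entries of P
        c = q1 + q2                   # d^2/du_i^2 entries of P
        det = a * c - b * b           # determinants of the 2x2 blocks
        dx = (c * rx - b * ru) / det  # Cramer's rule, vectorized over i
        du = (a * ru - b * rx) / det
        return np.concatenate([dx, du])

Each call costs a fixed handful of vector operations, which is the O(n) claimed above; d0 is the one-time diagonal computation whose cost is amortized over all interior-point iterations.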
4. APPLICATION TO SPARSE MRI

In this section we demonstrate the interior-point method described in Section 3 on real magnetic resonance imaging (MRI) data, using algorithm parameters α = 0.01, β = 0.5, s_min = 0.5, µ = 2, ξ = 0.01, and ǫ_rel = 0.05. The regularization parameter is taken as λ = 0.01. The method was implemented in Matlab, and run on a 3.2GHz Pentium IV under Linux.

In MRI, samples are collected directly in the spatial frequency domain of the object of interest, and the scan time grows with the number of samples acquired. By exploiting the compressibility of the underlying image, one can significantly reduce the number of acquisition samples and hence the scan time. This approach is referred to as sparse MRI [3]. Here, the compressed sensing matrix Φ in (1) is a random Fourier ensemble, i.e., a matrix obtained by randomly removing many rows of the discrete multi-dimensional Fourier transform (DFT) matrix. Brain images have a sparse representation in the wavelet domain. In the example shown, we use the Daubechies 4 wavelet transform as the sparsifying transform W in (1).

We scanned the brain of a healthy volunteer. We obtained 205 out of 512 possible parallel lines in the spatial frequency domain of the image. The lines were chosen randomly, with higher density sampling at low frequency, achieving a 2.5 scan-time reduction factor, as illustrated in the left panel of Fig. 1. We compared the compressed sensing reconstruction method with a linear reconstruction method, which sets unobserved Fourier coefficients to zero and then performs the inverse Fourier transform. Fig. 1 shows the two reconstruction results. The linear reconstruction suffers from incoherent noise-like streaking artifacts (indicated by the arrow) due to undersampling, whereas the artifacts appear benign in the compressed sensing reconstruction.

The QP for compressed sensing reconstruction has around 4 × 512² ≈ 10⁶ variables. (Here one half are the real and imaginary wavelet coefficients, and the other half are the new variables added in transforming the CS problem into a QP.) The run time of the Matlab implementation of our interior-point method was around 3 minutes, and the total number of PCG steps required over all interior-point iterations was 137. MOSEK [18] could not handle the QP, since forming the Hessian H (let alone computing the search direction) is prohibitively expensive for direct methods.

Fig. 1: Brain image reconstruction results. Left. Collected partial Fourier coefficients (in white). Middle. Linear reconstruction. Right. Compressed sensing reconstruction.

5. EXTENSIONS

Although not described here in detail, the interior-point method of Section 3 can be readily extended to other problems that have a similar form. For instance, it extends to CS problems with complex data, where the ℓ₁ norm of a complex vector is the sum of the absolute values of its complex elements.
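The paper does not spell out this complex extension; one natural reading (our sketch, not the authors' derivation) replaces each bound pair −uᵢ ≤ xᵢ ≤ uᵢ in (3) by the second-order cone constraint |xᵢ| ≤ uᵢ with xᵢ complex, and uses the corresponding logarithmic barrier

    Φ(x, u) = −∑_{i=1}^n log(uᵢ² − |xᵢ|²).

This barrier is still separable across i, so its Hessian is block diagonal (3×3 blocks over (Re xᵢ, Im xᵢ, uᵢ)) and the O(n) preconditioner solve of Section 3 carries over.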

6. REFERENCES

[1] D. Donoho, "Compressed sensing," IEEE Transactions on Information Theory, vol. 52, no. 4, pp. 1289–1306, 2006.

[2] E. Candès, "Compressive sampling," in Proceedings of the International Congress of Mathematicians, 2006.

[3] M. Lustig, D. Donoho, and J. Pauly, "Sparse MRI: The application of compressed sensing for rapid MR imaging," 2007, Revised for publication in Magnetic Resonance in Medicine.

[4] S. Chen, D. Donoho, and M. Saunders, "Atomic decomposition by basis pursuit," SIAM Review, vol. 43, no. 1, pp. 129–159, 2001.

[5] E. Candès and J. Romberg, l1-magic: A collection of MATLAB routines for solving the convex optimization programs central to compressive sampling, 2006, Available from www.acm.caltech.edu/l1magic/.

[6] D. Donoho and Y. Tsaig, "Fast solution of ℓ₁-norm minimization problems when the solution may be sparse," 2006, Manuscript. Available from http://www.stanford.edu/%7Etsaig/Papers/FastL1.pdf.

[7] B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani, "Least angle regression," Annals of Statistics, vol. 32, no. 2, pp. 407–499, 2004.

[8] M. Osborne, B. Presnell, and B. Turlach, "A new approach to variable selection in least squares problems," IMA Journal of Numerical Analysis, vol. 20, no. 3, pp. 389–403, 2000.

[9] J. Friedman, T. Hastie, and R. Tibshirani, "Pathwise coordinate optimization," 2007, Manuscript available from www-stat.stanford.edu/~hastie/pub.htm.

[10] M. Figueiredo and R. Nowak, "A bound optimization approach to wavelet-based image deconvolution," in Proceedings of the IEEE International Conference on Image Processing (ICIP), 2005, pp. 782–785.

[11] G. Narkiss and M. Zibulevsky, "Sequential subspace optimization method for large-scale unconstrained problems," Tech. Report CCIT No. 559, EE Dept., Technion, 2005.

[12] I. Daubechies, M. Defrise, and C. De Mol, "An iterative thresholding algorithm for linear inverse problems with a sparsity constraint," Communications on Pure and Applied Mathematics, vol. 57, pp. 1413–1457, 2004.

[13] M. Elad, B. Matalon, and M. Zibulevsky, "Image denoising with shrinkage and redundant representations," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), 2006, vol. 2, pp. 1924–1931.

[14] M. Figueiredo, R. Nowak, and S. Wright, "Gradient projection for sparse reconstruction: Application to compressed sensing and other inverse problems," 2007, Manuscript. Available from http://www.lx.it.pt/~mtf/GPSR/.

[15] S. Boyd and L. Vandenberghe, Convex Optimization, Cambridge University Press, 2004.

[16] K. Koh, S.-J. Kim, and S. Boyd, "An interior-point method for ℓ₁-regularized logistic regression," 2007, To appear in Journal of Machine Learning Research. Available from www.stanford.edu/~boyd/l1_logistic_reg.html.

[17] J. Demmel, Applied Numerical Linear Algebra, Society for Industrial and Applied Mathematics, 1997.

[18] MOSEK ApS, The MOSEK Optimization Tools Version 2.5. User's Manual and Reference, 2002, Available from www.mosek.com.