AN EFFICIENT METHOD FOR COMPRESSED SENSING

Seung-Jean Kim, Kwangmoo Koh, Michael Lustig, and Stephen Boyd

Department of Electrical Engineering, Stanford University, Stanford, CA 94305-9510 USA
{sjkim,deneb1,mlustig,boyd}@stanford.edu

ABSTRACT

Compressed sensing or compressive sampling (CS) has been receiving a lot of interest as a promising method for signal recovery and sampling. CS problems can be cast as convex problems, and then solved by several standard methods, such as interior-point methods, at least for small and medium-size problems. In this paper we describe a specialized interior-point method for solving CS problems that uses a preconditioned conjugate gradient method to compute the search step. The method can efficiently solve large CS problems, by exploiting fast algorithms for the signal transforms used. The method is demonstrated with a magnetic resonance imaging (MRI) example.

Index Terms— compressed sensing, compressive sampling, ℓ1 regularization, interior-point methods, preconditioned conjugate gradients.

1. INTRODUCTION

1.1. Compressed sensing

Let z be an unknown vector in R^n. Suppose that we have m noisy linear measurements of z of the form

    y_i = ⟨φ_i, z⟩ + v_i,    i = 1, ..., m,

where ⟨·, ·⟩ denotes the usual inner product, v ∈ R^m is the noise, and φ_i ∈ R^n are known signals. Standard reconstruction methods require at least n samples. Suppose we know a priori that z is compressible or has a sparse representation in a transform domain, described by W ∈ R^{n×n} (after expanding the real and imaginary parts if necessary). In this case, if the measurement vectors are well chosen, then the number of measurements m can be dramatically smaller than the size n usually considered necessary.

Compressed sensing [1] or compressive sampling [2] exploits the compressibility in the transform domain by solving a problem of the form

    minimize  ‖Φz − y‖_2^2 + λ‖Wz‖_1                              (1)

where the variable is z ∈ R^n and ‖x‖_1 = Σ_i |x_i| denotes the ℓ1 norm. Here, Φ = [φ_1 ··· φ_m]^T ∈ R^{m×n} is called the compressed sensing matrix, λ > 0 is the regularization parameter, and W is called the sparsifying transform.

1.2. Solution methods

When W is invertible, the CS problem (1) can be reformulated as the ℓ1-regularized least squares problem (LSP)

    minimize  ‖Ax − y‖_2^2 + λ‖x‖_1                               (2)

where the variable is x ∈ R^n and the problem data or parameters are A = ΦW^{−1} ∈ R^{m×n} and y ∈ R^m. The ℓ1-regularized problem (2) can be transformed to a convex quadratic program (QP) with linear inequality constraints,

    minimize    ‖Ax − y‖_2^2 + λ Σ_{i=1}^n u_i
    subject to  −u_i ≤ x_i ≤ u_i,    i = 1, ..., n,               (3)

where the variables are x ∈ R^n and u ∈ R^n.

The data matrix A is typically fully dense, so small and medium-sized problems can be solved by standard convex optimization methods such as interior-point methods. The QP (3) that arises in compressed sensing applications has an important difference from general dense QPs: there is a fast method for multiplying a vector by A, and a fast method for multiplying a vector by A^T, based on fast algorithms for the sparsifying transform and its inverse transform (a sketch of such fast operators is given at the end of this section). A specialized interior-point method that exploits such algorithms may scale to large problems [4]. An example is l1-magic [5], which uses the conjugate gradient (CG) method to compute the search step.

Specialized computational methods for problems of the form (2) include path-following methods and variants [6, 7, 8]. When the optimal solution of (2) is extremely sparse, path-following methods can be very fast, but they tend to become slow as the number of nonzeros at the optimal solution increases. Other recently developed computational methods for ℓ1-regularized LSPs include coordinate-wise descent methods [9], bound optimization methods [10], sequential subspace optimization methods [11], iterated shrinkage methods [12, 13], and gradient projection algorithms [14]. Some of these methods can handle very large problems with modest accuracy.

The main goal of this paper is to describe a specialized interior-point method for solving the QP (3). The method uses a preconditioned conjugate gradient (PCG) method to compute the search step, and therefore can exploit fast algorithms for the sparsifying transform and its inverse transform. The specialized method is far more efficient than (primal-dual) interior-point methods that use direct or CG methods to compute the search step. Compared with first-order methods such as coordinate descent methods, the specialized method is comparable in solving large problems with modest accuracy, but is able to solve them with high accuracy with relatively small additional computational cost. The method is demonstrated with an MRI example.
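As a concrete illustration of such fast operators, the following is a minimal sketch (not the implementation used in this paper), assuming Φ is a randomly subsampled orthonormal DFT and W = I, so that A = Φ; the sizes n and m, the index set idx, and the helper names Amul and ATmul are illustrative. Each multiply then costs O(n log n) time and O(n) storage, rather than the O(mn) cost of a dense matrix-vector product.

function fast_ops_demo()
% Matrix-free multiplication by A and A' when A is a randomly subsampled
% orthonormal DFT (an assumption made for this sketch only).
n = 4096; m = 1024;
idx = sort(randperm(n, m)');        % indices of the DFT rows that are measured
x = randn(n, 1);
w = randn(m, 1) + 1i*randn(m, 1);
% adjoint consistency check: w'*(A*x) should equal (A'*w)'*x
fprintf('adjoint check (should be ~0): %.2e\n', ...
        abs(w'*Amul(x, idx, n) - ATmul(w, idx, n)'*x));
end

function b = Amul(x, idx, n)
% A*x in O(n log n): orthonormal FFT, then keep the measured rows
F = fft(x)/sqrt(n);
b = F(idx);
end

function x = ATmul(b, idx, n)
% A'*b in O(n log n): zero-fill the unmeasured rows, then inverse orthonormal FFT
z = zeros(n, 1); z(idx) = b;
x = ifft(z)*sqrt(n);
end

An interior-point method of the kind described here only ever touches A through such products, so the same pattern extends to any sparsifying transform with a fast algorithm, such as a wavelet transform.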
2. PRELIMINARIES

We describe some basic ingredients necessary for the interior-point method described in Section 3.

2.1. Dual problem

To derive a Lagrange dual of (2), we first write it in the equivalent form

    minimize    z^T z + Σ_{i=1}^n λ|x_i|
    subject to  z = Ax − y,

where the variables are x ∈ R^n and z ∈ R^m. We associate dual variables ν_i ∈ R, i = 1, ..., m, with the equality constraints z_i = (Ax − y)_i. The Lagrange dual of (2) can be written as

    maximize    G(ν) = −(1/4)ν^T ν − ν^T y
    subject to  ‖A^T ν‖_∞ ≤ λ.                                    (4)

The dual problem (4) is a convex optimization problem with a variable ν ∈ R^m. We say that ν ∈ R^m is dual feasible if it satisfies the constraints in the dual problem (4). (See [15, §4] for more on Lagrange duality.)

Any dual feasible point ν gives a lower bound on the optimal value p⋆ of the primal problem (2), i.e., G(ν) ≤ p⋆, which is called weak duality. Furthermore, the optimal values of the primal and dual are equal, since the primal problem (3) satisfies Slater's condition; this is called strong duality [15].

2.2. Suboptimality bound

We can derive an easily computed bound on the suboptimality of x by constructing a dual feasible point ν from an arbitrary x. The dual point

    ν = 2s(Ax − y),                                               (5)

with the scaling constant s = min{1, λ/‖2A^T(Ax − y)‖_∞}, is dual feasible. Therefore G(ν) is a lower bound on the optimal value of (2). The difference between the primal objective value and the associated lower bound G(ν) is called the duality gap, and denoted η:

    η = ‖Ax − y‖_2^2 + λ Σ_{i=1}^n |x_i| − G(ν).                  (6)

The duality gap is always nonnegative, and is zero at the optimal point.
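This bound is cheap to evaluate: given any x, computing ν, G(ν), and η costs one multiplication by A and one by A^T, plus O(m + n) vector operations. The following minimal sketch evaluates (4)-(6) on a small dense random instance (the data A, y, λ and the trial point x are illustrative assumptions, not data from the paper):

% Evaluate the dual feasible point (5), the dual objective (4), and the
% duality gap (6) at an arbitrary trial point x (random data for illustration).
m = 50; n = 200; lambda = 0.01;
A = randn(m, n); y = randn(m, 1);
x = randn(n, 1);                          % any trial point works

r   = A*x - y;                            % residual
s   = min(1, lambda/norm(2*(A'*r), inf)); % scaling; makes ||A'*nu||_inf <= lambda
nu  = 2*s*r;                              % dual feasible point, eq. (5)
G   = -0.25*(nu'*nu) - nu'*y;             % dual objective, eq. (4)
eta = r'*r + lambda*norm(x, 1) - G;       % duality gap, eq. (6); always >= 0
fprintf('lower bound G(nu) = %.4f, duality gap eta = %.4f\n', G, eta);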
3. AN INTERIOR-POINT METHOD

We start by defining the logarithmic barrier for the bound constraints −u_i ≤ x_i ≤ u_i in (3),

    Φ(x, u) = −Σ_{i=1}^n log(u_i + x_i) − Σ_{i=1}^n log(u_i − x_i),

with domain dom Φ = {(x, u) ∈ R^n × R^n | |x_i| < u_i, i = 1, ..., n}. The central path consists of the unique minimizers of the convex function

    φ_t(x, u) = t‖Ax − y‖_2^2 + tλ Σ_{i=1}^n u_i + Φ(x, u),

as the parameter t varies from 0 to ∞.

In the primal interior-point method, we compute a sequence of points on the central path, for an increasing sequence of values of t, starting from the previously computed central point. In the primal barrier method, Newton's method is used to minimize φ_t(x, u), i.e., the search direction is computed as the exact solution to the Newton system

    ∇²φ_t(x, u) [Δx; Δu] = −∇φ_t(x, u).                           (7)

(The reader is referred to [15, Chap. 11] for more on the primal barrier method.)

For a large ℓ1-regularized LSP, solving the Newton system exactly is not computationally practical. In the method described below, the search direction is instead computed as an approximate solution to the Newton system, using a truncated Newton method.

In the primal barrier method, the parameter t is held constant until φ_t is (approximately) minimized, i.e., until ‖∇φ_t‖_2 is small; when this occurs, t is increased by a factor typically between 2 and 50. The method described below instead attempts to update the parameter t at each iteration, using the observation made above that we can cheaply compute a dual feasible point and the associated duality gap for any x.

TRUNCATED NEWTON INTERIOR-POINT METHOD.

given relative tolerance ǫ_rel > 0, α ∈ (0, 1/2), β ∈ (0, 1).
Initialize. t := 1/λ, x := 0, u := 1 = (1, ..., 1) ∈ R^n.
repeat
    1. Compute the search direction (Δx, Δu) as an approximate solution to the Newton system (7).
    2. Backtracking line search. Find the smallest integer k ≥ 0 that satisfies
           φ_t(x + β^k Δx, u + β^k Δu) ≤ φ_t(x, u) + αβ^k ∇φ_t(x, u)^T [Δx; Δu].
    3. Update. (x, u) := (x, u) + β^k (Δx, Δu).
    4. Construct a dual feasible point ν from (5).
    5. Evaluate the duality gap η from (6).
    6. quit if η/G(ν) ≤ ǫ_rel.
    7. Update t.

As a stopping criterion, the method uses the duality gap divided by the dual objective value; by weak duality, this ratio is an upper bound on the relative suboptimality. The update rule we propose changes t only when the line search step size β^k is at least a threshold s_min, which indicates that the current iterate is close enough to the central path; in that case t is set according to the current duality gap η, and otherwise t is left unchanged.

4. APPLICATION TO SPARSE MRI

In this section we demonstrate the interior-point method described in Section 3 on real magnetic resonance imaging (MRI) data, using algorithm parameters α = 0.01, β = 0.5, s_min = 0.5, µ = 2, ξ = 0.01, and ǫ_rel = 0.05. The regularization parameter is taken as λ = 0.01. The method was implemented in Matlab, and run on a 3.2 GHz Pentium IV under Linux.
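For concreteness, the following is a minimal sketch of the method of Section 3 on a small dense instance, with two simplifications relative to the paper: the Newton system (7) is solved exactly by a direct solve rather than approximately by PCG, and t is simply multiplied by µ at each iteration rather than adapted via the rule based on s_min. The random problem data are also an assumption. The sketch uses the parameter values α = 0.01, β = 0.5, µ = 2, and ǫ_rel = 0.05 given above.

function ipm_sketch()
% Primal barrier method for the QP (3) on a small dense random instance.
% Simplifications (for this sketch only): exact Newton step via backslash
% instead of truncated Newton/PCG, and a plain t := mu*t update.
m = 50; n = 200; lambda = 0.01;
A = randn(m, n); y = randn(m, 1);
alpha = 0.01; beta = 0.5; eps_rel = 0.05; mu = 2;   % parameters from Sec. 4

t = 1/lambda; x = zeros(n, 1); u = ones(n, 1);
for iter = 1:100
    % gradient and Hessian of phi_t at (x, u)
    r  = A*x - y;
    ip = 1./(u + x); im = 1./(u - x);               % barrier first derivatives
    g  = [2*t*(A'*r) - ip + im;  t*lambda*ones(n, 1) - ip - im];
    d1 = ip.^2 + im.^2; d2 = ip.^2 - im.^2;
    H  = [2*t*(A'*A) + diag(d1), diag(d2); diag(d2), diag(d1)];
    d  = -H\g;                                      % exact Newton direction

    % step 2: backtracking line search on phi_t
    step = 1; f0 = phit(x, u, A, y, lambda, t);
    while phit(x + step*d(1:n), u + step*d(n+1:end), A, y, lambda, t) > ...
          f0 + alpha*step*(g'*d)
        step = beta*step;
    end
    x = x + step*d(1:n); u = u + step*d(n+1:end);   % step 3: update

    % steps 4-6: dual feasible point, duality gap, stopping criterion
    r   = A*x - y;
    s   = min(1, lambda/norm(2*(A'*r), inf));
    nu  = 2*s*r;                                    % eq. (5)
    G   = -0.25*(nu'*nu) - nu'*y;                   % eq. (4)
    eta = r'*r + lambda*norm(x, 1) - G;             % eq. (6)
    fprintf('iter %2d  step %.3f  duality gap %.3e\n', iter, step, eta);
    if eta/abs(G) <= eps_rel, break; end            % paper uses eta/G(nu)
    t = mu*t;                                       % step 7 (simplified update)
end
end

function f = phit(x, u, A, y, lambda, t)
% central-path objective phi_t; +Inf outside the domain |x_i| < u_i
if any(u + x <= 0) || any(u - x <= 0), f = Inf; return; end
r = A*x - y;
f = t*(r'*r) + t*lambda*sum(u) - sum(log(u + x)) - sum(log(u - x));
end

Replacing the backslash solve in step 1 with a PCG iteration driven by fast operators such as the Amul/ATmul sketch in Section 1 is what gives the matrix-free, large-scale method the paper describes.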