Approximate Newton's Method for Tomographic Image Reconstruction
Yao Xie
Report for EE 391, Fall 2007–2008
December 14, 2007

Abstract

In this report we solve a regularized maximum likelihood (ML) image reconstruction problem with Poisson noise. We consider the gradient descent method, the exact Newton's method, the preconditioned conjugate gradient (CG) method, and an approximate Newton's method, with which we can solve a million-variable problem. However, the current version of the approximate Newton's method converges slowly. We could possibly improve it by finding a better approximation to the Hessian matrix that is still easy to invert.

1 Introduction

Deterministic filtered backprojection (FBP) and regularized maximum likelihood (ML) estimation are two major approaches to transmission tomography image reconstruction, such as CT [KS88]. Though the latter method typically yields higher image quality, its practical application is still limited by its high computational cost (0.02 seconds per image for FBP versus several hours per image for ML-based methods). This study works toward solving the large-scale ML imaging problem. We consider several versions of Newton's algorithm, including an approximate Newton's method. We tested our methods on one simulated example and one real data set. Our method works, but converges relatively slowly. One possible remedy is to find a better approximation to the Hessian matrix.

2 Problem Formulation

We consider a computed tomographic (CT) system. The object to be imaged is a square region, which we divide into an $n \times n$ array of square pixels. The pixels are indexed column first, by a single index $i$ ranging from 1 to $n^2$. We are interested in the density, which varies over the region. We assume the density is constant inside each pixel, and we denote by $x_i$ the density in pixel $i$, $i = 1, \dots, n^2$. Thus $x \in \mathbf{R}^{n^2}$ is a vector that describes the density. The problem is to estimate the vector of densities $x$ from a set of sensor measurements.

Each sensor measurement is a line integral of the object density. In addition, each measurement follows a Poisson distribution (counting statistics). Now suppose we have $m$ line-integral measurements. The $j$th line is characterized by $\{l_{ij}\}$, where $l_{ij}$ is the length of the intersection of the $j$th line with the $i$th pixel (or zero if they don't intersect), as illustrated in Fig. 1. The whole set of measurements forms a vector $y \in \mathbf{R}^m$ whose elements are given by

\[ y_j \sim \text{Poisson}(\lambda_j). \qquad (1) \]

The parameters $\{\lambda_j\}$ are determined according to Beer's law:

\[ \lambda_j = I_j \, e^{-\sum_{i=1}^{n^2} l_{ij} x_i}, \qquad (2) \]

where $I_j$ is the intensity of the $j$th X-ray before passing through the object. The problem is to reconstruct the pixel densities $x$ from the line-integral measurements $y$.

We assume the measurements $y_j$ are independent. Then the likelihood function is given by

\[ p_x(y) = \prod_{j=1}^{m} \frac{\lambda_j^{y_j}}{y_j!} \, e^{-\lambda_j}, \qquad (3) \]

and the log-likelihood function is

\[ l(x) = \log p_x(y) = \sum_{j=1}^{m} \left( y_j \log \lambda_j - \lambda_j - \log(y_j!) \right). \qquad (4) \]

The maximum likelihood (ML) problem is to minimize the negative log-likelihood function. To prevent overfitting the noisy data, we also add a regularization term to the cost function:

\[ \text{minimize} \quad -\sum_{j=1}^{m} \left( y_j \log \lambda_j - \lambda_j \right) + \beta \sum_{p=1}^{P} w_p \, \phi([Dx]_p). \qquad (5) \]

The regularization term we use here is the Geman prior [De 01]

\[ \phi(x) = \frac{x^2 \delta^2}{2(x^2 + \delta^2)}, \]

which penalizes roughness of the image. The parameter $\delta$ controls the width of the function. The Geman prior is convex, and has continuous first- and second-order derivatives. The matrix $D \in \mathbf{R}^{P \times n^2}$ takes the differences between neighboring pixels. In total, $P$ differences are taken. The weights $w_p$ depend on the distance between the two neighboring pixels being differenced. The parameter $\beta$ controls the amount of regularization.
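To make the formulation concrete, the objective of the regularized ML problem and the Geman prior are straightforward to prototype. The sketch below is illustrative, not the report's code: it assumes numpy, treats `L` and `D` as precomputed dense or scipy.sparse matrices, and the function names (`geman`, `objective`) are hypothetical.

```python
import numpy as np

def geman(t, delta):
    # Geman prior phi(t) = t^2 delta^2 / (2 (t^2 + delta^2)), applied elementwise
    return t**2 * delta**2 / (2.0 * (t**2 + delta**2))

def objective(x, L, D, y, I, w, beta, delta):
    # f(x) = y^T L x + I^T exp(-L x) + beta w^T phi(D x), as in (6) below;
    # L and D may be numpy arrays or scipy.sparse matrices (both support @)
    Lx = L @ x
    return float(y @ Lx + I @ np.exp(-Lx) + beta * (w @ geman(D @ x, delta)))
```

Since $l_{ij} \ge 0$ and the densities $x_i \ge 0$ here, the exponent $-Lx$ is nonpositive and $e^{-Lx} \le 1$, so this evaluation is numerically safe.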
Using (2) in (5), we can write the regularized ML problem as

\[ \text{minimize} \quad f(x) = y^T L x + I^T e^{-Lx} + \beta \, w^T \phi(Dx), \qquad (6) \]

where the variable is $x$, the forward-projection matrix is $L = [l_1, \dots, l_m]^T \in \mathbf{R}^{m \times n^2}$, $I = [I_1, \dots, I_m]^T \in \mathbf{R}^{m}$, and $w = [w_1, \dots, w_P]^T \in \mathbf{R}^{P}$. The functions $e^x$ and $\phi(x)$ are overloaded to operate elementwise on vector arguments.

2.1 Optimality conditions

Since $f(x)$ is differentiable and convex, a necessary and sufficient condition for a solution $x^\star$ to be optimal is

\[ \nabla f(x^\star) = L^T \left( y - \text{diag}\{I\} e^{-Lx^\star} \right) + \beta D^T \text{diag}\{w\} \, \phi'(Dx^\star) = 0. \qquad (7) \]

2.2 Stopping criterion

Since (7) has no analytical solution, we solve it by an iterative algorithm:

\[ x^{k+1} = x^k + s^k \Delta x^k, \qquad (8) \]

where $\Delta x^k$ is the search direction and $s^k$ is the step size at iteration $k$. We use backtracking line search [BV04] to find a good $s^k$. The stopping criterion for the iterative algorithms is $\|\nabla f\|_2 \le \epsilon$ with small $\epsilon > 0$. For the initial guess $x^0$, we use either the all-zero vector or a smoothed image obtained by FBP [KS88].

3 Approximate Newton's Method

Newton's method solves the linearized optimality condition (7) near $x$ for the update step $\Delta x$:

\[ H \Delta x = -g, \qquad (9) \]

where the gradient vector and the Hessian matrix are given by

\[ g = L^T (y - \hat{y}) + \beta D^T \text{diag}\{w\} \, \phi'(Dx), \qquad (10) \]

\[ H = L^T \text{diag}\{\hat{y}\} L + \beta D^T \text{diag}\{w\} \, \text{diag}\{\phi''(Dx)\} D \succ 0, \qquad (11) \]

respectively. The vector $\hat{y} = \text{diag}\{I\} e^{-Lx}$ can be interpreted as an estimate of the measurement from the current $x$ using (2). The functions $\phi'(\cdot)$ and $\phi''(\cdot)$ are the first- and second-order derivatives of the Geman prior function. The physical interpretation of $L^T$ is the adjoint operator that maps measurements back to image space.

The exact Newton's method solves (9) by inverting $H$. However, directly evaluating this matrix inverse is prohibitive even for a moderately sized image. One may instead consider a truncated Newton's method: the conjugate gradient (CG) method with a diagonal preconditioner [Boy07] applied to (9).

Here we present another approach. Assume the first term dominates the Hessian matrix. This is reasonable, since we do not want the prior knowledge to dominate what we learn from the measurements; moreover, we can tune the parameter $\beta$ to adjust the influence of the prior term. Then

\[ H \approx L^T \text{diag}\{\hat{y}\} L. \qquad (12) \]

Second, note that $L$ is sparse, because each ray passes through only a small number of pixels. If the pixel size is small compared with the ray spacing, we have $l_i^T l_j \approx 0$ for $i \ne j$, as illustrated in Fig. 1. Fig. 2 shows one such case (corresponding to the simulated example with $n^2 = 4096$): $L^T L$ can indeed be approximated by a few diagonals. Then

\[ L^T L \approx \text{diag}\{c\}, \qquad (13) \]

for some constant vector $c$ whose elements are $c_i = \sum_{j=1}^{m} l_{ij}^2$.

Using these two approximations, we can rewrite the Newton system (9). First, use (13) to approximate the prior term as $\gamma \approx L^T L \, \text{diag}\{c\}^{-1} \gamma$, which lets us extract the common factor $L^T$:

\[ L^T \left( \text{diag}\{\hat{y}\} L \Delta x + (y - \hat{y}) + L \, \text{diag}\{c\}^{-1} \gamma \right) = 0, \qquad (14) \]

where $\gamma$ denotes the prior term in (10). If $L^T$ has an empty null space (which holds in the normal case, since all elements of $L$ are nonnegative and at least one ray hits the object), we have

\[ \text{diag}\{\hat{y}\} L \Delta x + (y - \hat{y}) + L \, \text{diag}\{c\}^{-1} \gamma = 0. \qquad (15) \]

Solving (15) for $L \Delta x$ and applying (13) once more ($\Delta x \approx \text{diag}\{c\}^{-1} L^T L \Delta x$) gives

\[ \Delta x = -\,\text{diag}\{c\}^{-1} L^T \text{diag}\{\hat{y}\}^{-1} \left( (y - \hat{y}) + L \, \text{diag}\{c\}^{-1} \gamma \right). \qquad (16) \]

In this way the Hessian matrix inversion is reduced to a set of scalar divisions (together with sparse multiplications by $L$ and $L^T$).
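Under the same assumptions as before, the approximate Newton step (16) and the backtracking line search for (8) take only a few lines. This is a minimal sketch, not the report's implementation: `approx_newton_step` and `backtracking` are hypothetical names, and `rho` denotes the backtracking shrink factor (kept distinct from the regularization parameter $\beta$).

```python
import numpy as np

def approx_newton_step(x, L, D, y, I, w, beta, delta):
    # Approximate Newton direction, eq. (16); c_i = sum_j l_ij^2 = diag(L^T L)
    yhat = I * np.exp(-(L @ x))                 # yhat = diag{I} e^{-Lx}, eq. (2)
    t = D @ x
    dphi = delta**4 * t / (t**2 + delta**2)**2  # phi'(t) for the Geman prior
    gamma = beta * (D.T @ (w * dphi))           # prior term of the gradient (10)
    Lsq = L.power(2) if hasattr(L, "power") else L**2
    c = np.asarray(Lsq.sum(axis=0)).ravel()     # diagonal approximation (13)
    r = (y - yhat) + L @ (gamma / c)            # bracketed residual in (15)
    return -(L.T @ (r / yhat)) / c              # eq. (16): only elementwise divisions

def backtracking(f, g, x, dx, alpha=0.01, rho=0.5):
    # Backtracking line search [BV04] for the step size s^k in (8)
    s, fx, slope = 1.0, f(x), alpha * float(g @ dx)
    while f(x + s * dx) > fx + s * slope and s > 1e-12:
        s *= rho
    return s
```

Each iteration then costs a handful of sparse products with $L$, $L^T$, $D$, and $D^T$ plus elementwise operations, which is what makes the million-variable problem in Section 4 tractable.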
4 Numerical Examples

To test our algorithms, we simulated a small problem with $n^2 = 4096$ variables and $m = 18000$ measurements. We then solved one large-scale problem by the approximate Newton's method using a real measured clinical data set. That problem has about one million variables ($n^2 = 1024^2 \approx 10^6$) and $m \approx 0.8 \times 10^6$ measurements.

4.1 Small example with simulated data

For this case, we compare the approximate Newton's method with the gradient method, the exact Newton's method, and the truncated Newton's method (the preconditioned conjugate gradient (PCG) method with a diagonal preconditioner). For the gradient method, $\Delta x = -g$.

We simulated a parallel-beam CT geometry with 100 detectors and 180 uniform angular samples, so $m = 18000$. The rays spread out wide enough to cover the entire image, with uniform intensities $I_j = 10^6$, $j = 1, \dots, m$. The image has $64 \times 64$ ($n^2 = 4096$) pixels. We use $\|\nabla f\|_2 < 10^{-8}$ as the stopping criterion. The problem is solved on an IBM laptop with an Intel dual-core CPU at 2.0 GHz and 2 GB RAM.

Figs. 3(a) and 3(b) show the negative log-likelihood function value and the norm of the gradient versus the number of iterations. PCG converges within 10 iterations; the approximate Newton's method is faster than the gradient method, but both are slower than PCG. Fig. 4 shows the image $x^k$ in PCG from the first to the 8th iteration, from which we can see how the image converges. Fig. 5 shows the norm of the gradient versus iterations for the PCG algorithm with $N_{\max} = 10, 30, 100$, where $N_{\max}$ is the maximum number of CG iterations. PCG with $N_{\max} = 30$ takes the shortest time to converge.

4.2 Large problem with real data

Next we solve a large problem using real data measured on a GE fan-beam CT scanner. We compare the regularized ML image with the image obtained by FBP. The regularized ML method was run for 200 iterations, with the all-zero image as the initial guess. Figs. 6 and 7 show the images reconstructed by the regularized ML method and by FBP (the standard reconstruction method in commercial scanners).
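For reference, the truncated-Newton (PCG) baseline of Section 4.1 can be sketched with scipy's conjugate gradient solver. This is an assumed reading rather than the report's code: the report does not specify the exact diagonal preconditioner, so the sketch uses the diagonal of the dominant Hessian term $L^T \text{diag}\{\hat{y}\} L$; `Nmax` plays the role of the CG iteration cap varied in Fig. 5, and the function name is hypothetical.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def pcg_newton_step(x, L, D, y, I, w, beta, delta, g, Nmax=30):
    # Truncated-Newton direction: approximately solve H dx = -g, eq. (9),
    # with at most Nmax CG iterations and a diagonal preconditioner
    yhat = I * np.exp(-(L @ x))
    t = D @ x
    ddphi = delta**4 * (delta**2 - 3.0 * t**2) / (t**2 + delta**2)**3  # phi''(t)
    def Hv(v):
        # H v per eq. (11), applied matrix-free
        return L.T @ (yhat * (L @ v)) + beta * (D.T @ (w * ddphi * (D @ v)))
    n = x.size
    H = LinearOperator((n, n), matvec=Hv, dtype=float)
    Lsq = L.power(2) if hasattr(L, "power") else L**2
    hdiag = np.asarray(Lsq.T @ yhat).ravel()       # diag(L^T diag{yhat} L)
    M = LinearOperator((n, n), matvec=lambda v: v / hdiag, dtype=float)
    dx, info = cg(H, -g, maxiter=Nmax, M=M)        # info > 0: hit the iteration cap
    return dx
```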