Approximate Newton's Method for Tomographic Image Reconstruction

Yao Xie
Report for EE 391, Fall 2007–2008
December 14, 2007

Abstract

In this report we solved a regularized maximum likelihood (ML) image reconstruction problem with Poisson noise. We considered the gradient descent method, the exact Newton's method, the preconditioned conjugate gradient (CG) method, and an approximate Newton's method, with which we can solve a million-variable problem. However, the current version of the approximate Newton's method converges slowly. We could possibly improve it by finding a better approximation to the Hessian that is still easy to invert.

1 Introduction

Filtered backprojection (FBP), a deterministic method, and regularized maximum-likelihood (ML) estimation are two major approaches to transmission tomography image reconstruction, such as CT [KS88]. Though the latter method typically yields higher image quality, its practical application is still limited by its high computational cost (0.02 second per image for FBP versus several hours per image for ML-based methods). This study works towards solving the large-scale ML imaging problem. We consider several versions of Newton's algorithm, including an approximate Newton's method. We tested our methods on one simulated example and one real data set. Our method works, but converges relatively slowly. One possible remedy is to find a better approximation to the Hessian matrix.

2 Problem Formulation

We consider a computed tomographic (CT) system. The object to be imaged is a square region, which we divide into an n × n array of square pixels. The pixels are indexed column first, by a single index i ranging from 1 to n². We are interested in the density, which varies over the region. We assume the density is constant inside each pixel, and we denote by x_i the density in pixel i, i = 1, · · · , n². Thus, x ∈ R^{n²} is a vector that describes the density. The problem is to estimate the vector of densities x from a set of sensor measurements. Each sensor measurement is a line integral of the object density. In addition, each measurement follows a Poisson distribution (counting statistics). Now suppose we have m line integral measurements. The jth line is characterized by {l_{ij}}, where l_{ij} is the length of the intersection of the jth line with the ith pixel (or zero if they do not intersect), as illustrated in Fig. 1. Then the whole set of measurements forms a vector y ∈ R^m whose elements are given by

y_j ∼ Poisson(λ_j). (1)

The parameters {λ_j} are determined according to Beer's law:

λ_j = I_j e^{−Σ_{i=1}^{n²} l_{ij} x_i}, (2)

where I_j is the intensity of the jth X-ray beam before it passes through the object. The problem is to reconstruct the pixel densities x from the line integral measurements y. We assume the measurements y_j are independent. Then the likelihood function is given by

p_x(y) = Π_{j=1}^{m} (λ_j^{y_j} / y_j!) e^{−λ_j}, (3)

and the log-likelihood function is

l(x) = log p_x(y) = Σ_{j=1}^{m} ( y_j log λ_j − λ_j − log(y_j!) ). (4)

The maximum likelihood (ML) problem is to minimize the negative log-likelihood function. To prevent overfitting the noisy data, we also add a regularization term to the cost function:

minimize −Σ_{j=1}^{m} ( y_j log λ_j − λ_j ) + β Σ_{p=1}^{P} w_p φ([Dx]_p). (5)

The regularization term we use here is the Geman prior [De 01]

φ(x) = x²δ² / (2(x² + δ²)),

which penalizes roughness of the image. The parameter δ controls the width of the function. The Geman prior is convex and has continuous first and second order derivatives. The matrix D ∈ R^{P×n²} takes differences between neighboring pixels; in total P differences are taken. The weights w_p depend on the distance between the two neighboring pixels whose difference is taken. The parameter β controls the amount of regularization. Using (2) in (5), we can write the regularized ML problem as

minimize f(x) = y^T Lx + I^T e^{−Lx} + β w^T φ(Dx), (6)

where the variable is x, the forward projection matrix is L = [l_1, · · · , l_m]^T ∈ R^{m×n²} with l_j ∈ R^{n²} the vector of intersection lengths along ray j, I = [I_1, · · · , I_m]^T ∈ R^{m×1}, and w = [w_1, · · · , w_P]^T ∈ R^{P×1}. The functions e^x and φ(x) are overloaded to operate on each element of the input vector.
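To make the model concrete, the following minimal NumPy sketch builds a tiny random instance of the problem and evaluates the cost f(x) of (6). The sizes, parameter values, and the sparsity pattern of L are illustrative assumptions, not the geometry or the C projectors used in our experiments.

```python
import numpy as np

# Tiny illustrative sizes (the real problems have m and n^2 up to ~1e6).
m, n2 = 50, 16
rng = np.random.default_rng(0)

# Sparse-ish nonnegative intersection lengths (assumed pattern, not a real CT geometry).
L = rng.random((m, n2)) * (rng.random((m, n2)) < 0.2)
I = np.full(m, 1e6)                       # X-ray intensities I_j
D = np.eye(n2) - np.eye(n2, k=1)          # simple neighboring-pixel differences
w = np.ones(D.shape[0])                   # difference weights w_p
beta, delta = 1e2, 1e-3                   # regularization parameters (assumed values)

def phi(t):
    """Geman prior as given in Section 2: t^2 delta^2 / (2 (t^2 + delta^2))."""
    return t**2 * delta**2 / (2.0 * (t**2 + delta**2))

# Simulated measurements following the Poisson model (1)-(2).
x_true = rng.random(n2)
y = rng.poisson(I * np.exp(-L @ x_true)).astype(float)

def f(x):
    """Regularized negative log-likelihood, equation (6)."""
    return y @ (L @ x) + I @ np.exp(-L @ x) + beta * (w @ phi(D @ x))
```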

2.1 Optimality conditions

Since f(x) is differentiable and convex, a necessary and sufficient condition for a solution x* to be optimal is

∇f(x*) = L^T ( y − diag{I} e^{−Lx*} ) + β D^T diag{w} φ′(Dx*) = 0. (7)
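As a quick sanity check on (7), the analytic gradient can be compared against central finite differences. The sketch below reuses the toy quantities from the previous snippet; the derivative dphi is computed for the prior form assumed there.

```python
def dphi(t):
    """First derivative of the assumed Geman prior: t delta^4 / (t^2 + delta^2)^2."""
    return t * delta**4 / (t**2 + delta**2)**2

def grad_f(x):
    """Analytic gradient of (6), equations (7)/(10)."""
    y_hat = I * np.exp(-L @ x)                      # estimated measurements
    return L.T @ (y - y_hat) + beta * D.T @ (w * dphi(D @ x))

# Central finite-difference check at a random point.
x0, eps = rng.random(n2), 1e-6
g = grad_f(x0)
g_fd = np.array([(f(x0 + eps * e) - f(x0 - eps * e)) / (2 * eps)
                 for e in np.eye(n2)])
assert np.linalg.norm(g - g_fd) <= 1e-4 * np.linalg.norm(g)
```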

2.2 Stopping criterion

Since there is no analytical solution to (7), we solve the optimality equation by an iterative algorithm:

x^{k+1} = x^k + s^k ∆x^k, (8)

where ∆x^k is the search direction and s^k is the step size at iteration k. We use backtracking line search [BV04] to choose s^k. The stopping criterion for the iterative algorithms is ‖∇f‖_2 ≤ ε with small ε > 0. For the initial guess x^0, we use either the all-zero vector or a smoothed image obtained by FBP [KS88].
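A minimal backtracking line search in the style of [BV04], applied to the update (8); the parameters alpha and rho are conventional choices, not values taken from our experiments.

```python
def backtracking(x, dx, alpha=0.01, rho=0.5):
    """Backtracking line search for the update (8): shrink s until the
    sufficient-decrease condition holds (with a floor to guarantee exit)."""
    s, fx, g = 1.0, f(x), grad_f(x)
    while f(x + s * dx) > fx + alpha * s * (g @ dx) and s > 1e-12:
        s *= rho
    return s

# One iteration of the gradient method (search direction dx = -g).
x = np.zeros(n2)                     # all-zero initial guess
dx = -grad_f(x)
x = x + backtracking(x, dx) * dx
```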

3 Approximate Newton’s Method

Newton's method solves the linearized optimality condition (7) near x for the update step ∆x:

H∆x = −g, (9)

where the gradient vector and the Hessian matrix are given by

g = L^T (y − ŷ) + β D^T diag{w} φ′(Dx), (10)

H = L^T diag{ŷ} L + β D^T diag{w} diag{φ″(Dx)} D ≻ 0, (11)

respectively. The vector

ŷ = diag{I} e^{−Lx}

can be interpreted as an estimate of the measurement obtained from the current x using (2). The functions φ′(·) and φ″(·) are the first and second order derivatives of the Geman prior function. Here the physical interpretation of L^T is the adjoint operator that maps measurements back to image space.

The exact Newton's method solves (9) by inverting H. However, directly evaluating this matrix inverse is prohibitive even for a reasonably sized image. One may consider a truncated Newton's method, i.e., the conjugate gradient (CG) method with a diagonal preconditioner [Boy07], to solve this equation. Here we present another approach. Assume the first term dominates the Hessian matrix. This is reasonable since we do not want the prior knowledge to dominate what we learn from the measurements; moreover, we can control the parameter β to adjust the influence of the prior term. Then

H ≈ L^T diag{ŷ} L. (12)

Second, note that L is sparse, because each ray passes through only a small number of pixels. If the pixel size is small compared with the ray spacing, we have l_i^T l_j ≈ 0 for i ≠ j, as illustrated in Fig. 1. Fig. 2 shows one such case (corresponding to the simulated example, where n² = 4096); note that L^T L can indeed be approximated by a few diagonals. Then

L^T L ≈ diag{c}, (13)

for some constant vector c, whose elements are c_i = Σ_{j=1}^{m} l_{ij}². Using these two approximations, we can rewrite the Newton equation (9). First, extract the common factor L^T, using γ ≈ L^T L diag{c}^{−1} γ from (13):

L^T ( diag{ŷ} L∆x + (y − ŷ) + L diag{c}^{−1} γ ) = 0, (14)

where γ = β D^T diag{w} φ′(Dx) denotes the prior term in the gradient (10). If L^T has an empty null space (which is true in the normal case, since all the elements of L are nonnegative and at least one ray hits the object), we have

diag{ŷ} L∆x + (y − ŷ) + L diag{c}^{−1} γ = 0. (15)

Multiplying (15) on the left by diag{c}^{−1} L^T diag{ŷ}^{−1} and using (13) again, we can solve for

∆x = −diag{c}^{−1} L^T ( diag{ŷ}^{−1}(y − ŷ) + diag{ŷ}^{−1} L diag{c}^{−1} γ ). (16)

In this way the Hessian inversion is reduced to matrix-vector products with L and L^T plus a set of elementwise divisions.
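In code, one step of the resulting approximate Newton's method might look like the sketch below (again on the toy quantities defined earlier, and using the reconstructed form of (16); note that no matrix is ever formed or inverted):

```python
# c_i = sum_j l_ij^2, the diagonal approximation (13) of L^T L.
# Assumes every pixel is hit by at least one ray, so c > 0, and that
# y_hat stays bounded away from zero along the iterations.
c = (L**2).sum(axis=0)

def approx_newton_step(x):
    """Approximate Newton direction, equation (16): only products with
    L and L^T plus elementwise divisions."""
    y_hat = I * np.exp(-L @ x)                    # current measurement estimate
    gamma = beta * D.T @ (w * dphi(D @ x))        # prior term of the gradient (10)
    r = (y - y_hat) / y_hat + (L @ (gamma / c)) / y_hat
    return -(L.T @ r) / c

# Iterate (8) with backtracking for a small fixed budget (for the sketch).
x = np.zeros(n2)
for _ in range(100):
    dx = approx_newton_step(x)
    x = x + backtracking(x, dx) * dx
```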

4 Numerical Examples

To test our algorithms, we simulated a small problem with n² = 4096 variables and m = 18000 measurements. We then solved one large-scale problem by the approximate Newton's method using a real measured clinical data set. That problem has about one million variables (n² = 1024² ≈ 10⁶) and m ≈ 0.8 × 10⁶ measurements.

4.1 Small example with simulated data

For this case, we compare the approximate Newton's method with the gradient method, the exact Newton's method, and the truncated Newton's method (the preconditioned conjugate gradient (PCG) method with a diagonal preconditioner). For the gradient method, ∆x = −g. We simulated a parallel-beam CT geometry, with 100 detectors and 180 uniformly spaced angles, so m = 18000. The rays spread out wide enough to cover the entire image, with uniform intensities I_j = 10⁶, j = 1, · · · , m. The image has 64 × 64 (n² = 4096) pixels. We use ‖∇f‖_2 < 10⁻⁸ as the stopping criterion. The problem is solved on an IBM laptop with an Intel dual-core CPU at 2.0 GHz and 2 GB RAM.

Figs. 3(a) and 3(b) show the negative log-likelihood function value and the norm of the gradient, versus the number of iterations. PCG converges within 10 iterations; the approximate Newton's method is faster than the gradient method, but both are slower than PCG. Fig. 4 shows the image x^k in PCG from the first to the 8th iteration, from which we can see how the image converges. Fig. 5 shows the norm of the gradient versus the number of iterations for the PCG algorithm with N_max = 10, 30, 100, where N_max is the maximum number of CG iterations. PCG with N_max = 30 takes the shortest time to converge.

4.2 Large problem with real data

Next we solve a large problem, using real data measured on a GE fan-beam CT scanner. We compare the regularized ML image with the image obtained by FBP. The regularized ML method was run for 200 iterations, with an all-zero image as the initial guess. Figs. 6 and 7 show the images reconstructed by the regularized ML method and by FBP (the standard reconstruction method in commercial scanners). Zoomed views of the spine region of these two images are shown in Fig. 8. Note that the regularized ML image is less noisy than the FBP image. Fig. 9 shows the negative log-likelihood versus the number of iterations, on a log-log scale. Besides the zero initial guess, we also plot the case using a smoothed FBP image as the initial guess; the cost function starts from a lower value, but there is not a big difference.

5 Discussions

• Why not the fast Radon transform? The matrix-vector multiplications Lx and L^T y are the dominating factor in the computation time, and it is possible to do them in a faster manner. Based on the Fourier slice theorem, the fast approximate Radon transform can do this in O(n² log n²) rather than O((n²)²) operations [FS03]. However, it requires interpolation, which introduces noticeable artifacts in the intermediate images. The errors accumulate as the iterations go on, which tends to prevent convergence of the algorithm. We tried this approach in our project, but it was not very successful (for more details, please see my course project report for EE 369C, Fall 2007–2008).


Figure 1: The line intersection model. When the pixel size is sufficiently small, the chance that two rays pass through the same pixel is small. In this figure only the shaded pixel is shared by the two rays l_1 and l_2.

The projectors used here are written in C.

• How can the convergence speed of the approximate Newton's method be improved? The current approximate Newton's method converges slowly. That may be caused by ignoring the off-diagonal entries of the matrix L^T L. Finding a better approximation to this product (one that is still cheap to invert), or to the Hessian matrix, may lead to better convergence speed; one possibility is sketched below.
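One possible direction, sketched here under the same toy setup (we did not implement this), is to keep a few central diagonals of L^T L instead of just its main diagonal, and to solve the resulting banded system with a standard banded solver; the bandwidth k is a tuning parameter.

```python
from scipy.linalg import solve_banded

def banded_solve(L, rhs, k=2):
    """Solve (L^T L) u = rhs approximately, keeping the 2k+1 central
    diagonals of L^T L. Illustrative only: at real problem sizes L^T L
    should be formed sparsely rather than densely."""
    G = L.T @ L
    n = G.shape[0]
    ab = np.zeros((2 * k + 1, n))            # LAPACK banded storage
    for d in range(-k, k + 1):
        diag = np.diagonal(G, offset=d)
        ab[k - d, max(d, 0):max(d, 0) + len(diag)] = diag
    return solve_banded((k, k), ab, rhs)
```

A banded factorization costs O(k²n²) and each solve O(kn²) for n² pixels, so the per-iteration cost stays linear in the number of pixels while capturing more of L^T L than the purely diagonal approximation (13).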


Figure 2: The L^T L matrix for the 64 by 64 image. The figure shows the entries larger than 0.1 times the maximum entry value.


Figure 3: For the small simulated example: (a) negative log-likelihood versus the number of iterations; (b) norm of the gradient versus the number of iterations. Curves are shown for the PCG, approximate Newton, and gradient methods.

Figure 4: Convergence of PCG: the regularized ML image sequence x^k versus the number of iterations.


Figure 5: Norm of the gradient ‖∇f‖_2 versus the number of iterations, using the exact and truncated Newton's methods (PCG with maximum number of CG iterations N_max = 10, 30, and 100, respectively).

Figure 6: The full image (1024 by 1024 pixels) reconstructed by regularized ML.

Figure 7: The full image (1024 by 1024 pixels) reconstructed by FBP.


Figure 8: Spine region of the reconstructed images in Figs. 6 and 7: (a): FBP; (b): ML.


Figure 9: Negative log-likelihood versus the number of iterations, for the large problem, with the all-zero and the smoothed-FBP initial guesses.

References

[Boy07] S. Boyd. EE364b Notes. Stanford University, Jan. 2007.

[BV04] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.

[De 01] Bruno De Man. Iterative reconstruction for reduction of metal artifacts in computed tomography. PhD thesis, University of Leuven, Leuven-Heverlee, Belgium, May 2001.

[FS03] J. A. Fessler and B. P. Sutton. Nonuniform fast Fourier transforms using min-max interpolation. IEEE Transactions on Signal Processing, 51(2):560–574, Feb. 2003.

[KS88] A. C. Kak and M. Slaney. Principles of computerized tomographic imaging. IEEE Press, New York, 1988.
