<<

120 IEEE TRANSACTIONS ON , VOL. 22, NO. 1, JANUARY 2003 PET-CT Image Registration in the Chest Using Free-form Deformations David Mattes, David R. Haynor, Hubert Vesselle, Thomas K. Lewellen, and William Eubank

Abstract—We have implemented and validated an algorithm tered. The idea can still be used that, if a candidate registration for three-dimensional positron emission tomography transmis- matches a set of similar features in the first image to a set of sion-to-computed tomography registration in the chest, using features in the second image that are also mutually similar, it is as a similarity criterion. Inherent differences in the two imaging protocols produce significant nonrigid motion probably correct [23]. For example, according to the principle of between the two acquisitions. A rigid body deformation combined mutual information, homogeneous regions of the first image set with localized cubic B-splines is used to capture this motion. The should generally map into homogeneous regions in the second deformation is defined on a regular grid and is parameterized set [1], [2]. The utility of information theoretic measures arises by potentially several thousand coefficients. Together with a from the fact that they make no assumptions about the actual spline-based continuous representation of images and Parzen histogram estimates, our deformation model allows closed-form intensity values in the images, but instead measure statistical expressions for the criterion and its gradient. A limited-memory relationships between the two images [24]. Rather than mea- quasi-Newton optimization algorithm is used in a hierarchical sure correlation of intensity values in corresponding regions of multiresolution framework to automatically align the images. the two images, which may be low or negative, mutual informa- To characterize the performance of the method, 27 scans from tion measures statistical dependence of intensities in these re- patients involved in routine lung cancer staging were used in a validation study. The registrations were assessed visually by gions. The mutual information metric has been effective in var- two expert observers in specific anatomic locations using a split ious applications where the requisite transformations are linear window validation technique. The visually reported errors are in [2], [3] and more recently in cases involving nonlinear motion the 0- to 6-mm range and the average computation time is 100 min descriptions [4], [7]. on a moderate-performance workstation. We have concentrated our efforts on PET-to-CT image reg- Index Terms—Computed tomography (CT), deformation, istration in the chest, where we attempt to fuse images from a multimodality, multiresolution, mutual information, nonlinear, modality with high anatomic detail (CT) with images from a nonrigid, positron emission tomography (PET), registration, modality delineating biological function (PET). Although PET validation. is a functional imaging modality, a transmission (TR) image is acquired immediately before acquisition of the emission image I. INTRODUCTION and is, therefore, in near-perfect registration with the functional IVEN two image sets acquired from the same patient but scan. The TR image is similar to a CT attenuation map but it G at different times or with different devices, image reg- uses a higher energy radiation beam, resulting in less soft-tissue istration is the process of finding a geometric transformation detail than the CT, and detector configuration limits its spatial between the two respective image-based coordinate systems resolution. Once the TR and CT images are registered, the re- that maps a point in the first image set to the point in sulting transformation can be applied to the emission or standard the second set that has the same patient-based coordinates, i.e. uptake value (SUV) image for improved PET image interpreta- represents the same anatomic location. This notion presupposes tion. Our intentions are to provide a deformation that aligns the that the anatomy is the same in the two image sets, an assump- TR image with the CT image and to present the visually assessed tion that may not be precisely true if, for example, the patient accuracy achievable by this method. has had a surgical resection between the two acquisitions. The Patient and anatomic motion during acquisition blurs the im- situation becomes more complicated if two image sets that re- ages. The sharpness of boundaries in CT scans is obtained by flect different tissue characteristics [e.g. computed tomography having the patient maintain maximum inspiration during the (CT) and positron emission tomography (PET)] are to be regis- 30 s required for acquisition. To avoid attenuation of the X-ray beam by the arms, the patient also holds the arms overhead. Anatomically, this pose causes expansion of the arm muscles. Manuscript received June 11, 2001; revised July 16, 2002. This work was Also, the expansion of the lungs and chest wall due to the breath supported by the National Institutes of Health (NIH) under Grant CA42045 and Grant CA80907. The Associate Editor responsible for coordinating the review hold cause descent of the diaphragm and abdominal organs. of this paper and recommending its publication was D. Townsend. Asterisk in- Most patients cannot endure an arms-up posture for the dura- dicates corresponding author. tion of a PET scan, which can last up to 30 min, and will, of *D. Mattes is with The Boeing Company, Phantom Works, M&CT, Advanced Systems Laboratory, PO Box 3707, MC 7L-40, Seattle, WA 98124-2207 USA course, be engaged in normal tidal breathing. As a result, PET (e-mail: [email protected]). scans show an average of the anatomy over the respiratory cycle. D. R. Haynor, H. Vesselle, T. K. Lewellen, and W. Eubank are with the Uni- Clearly, a linear transformation model will not be sufficient to versity of Washington; Imaging Research Laboratory University of Washington Medical Center, Box 356004, Seattle, WA 98195 USA. match anatomic regions under these significantly different con- Digital Object Identifier 10.1109/TMI.2003.809072 ditions. We present a nonparametric deformation model capable

0278-0062/03$17.00 © 2003 IEEE MATTES et al.: PET-CT IMAGE REGISTRATION IN THE CHEST USING FREE-FORM DEFORMATIONS 121

normalized mutual information may be more robust [22]. The deformation model we propose is parameterized by a large number of coefficients. Under this condition the optimization of mutual information is facilitated by a formulation that is explicitly differentiable. The requirement of differentiability in the cost function in turn means that both the deformations and the similarity criterion must be differentiable. This is accomplished by using a B-Spline basis to represent the test image and model deformations and by estimating the joint (a) (b) probability distribution between the test and reference images with a Parzen window. We draw on the work of Thevenaz and Unser [3] for the mathematical development. We now describe the nonlinear transformation model which is used to incorporate deformations into the geometric manipulation of the test image.

A. Image Representation We assume an image is described by a set of samples , defined on a Cartesian grid with integer (c) (d) spacing. The calculation of at points not on the grid re- Fig. 1. Sample CT and TR images illustrating the necessity for nonrigid quires an interpolation method based on the samples and their registration. (a) Axial and (b) coronal views of a typical CT dataset. (c) Axial and (d) coronal views of a TR scan from the same patient. locations . We utilize an interpolation scheme that represents the underlying continuous image by a B-Spline basis. The expansion coefficients of the basis are computed from the of recovering nonlinear deformations due to anatomic motion image samples through an efficient recursive filtering algo- that occurs in CT images at maximum inspiration and PET TR rithm [5]. Values of that do not lie on the lattice can then images that are an average image of tidal breathing. Our work be interpolated [3] does not include a model of the respiratory process itself, or the resulting blurring that occurs in the TR images. Some sample (2) images of the relevant anatomy are shown in Fig. 1. where is any real-valued voxel location in the II. ALGORITHM DESIGN AND IMPLEMENTATION volume, is the coordinate vector of a lat- tice point, and is a separable Let be a test image we want to register to a reference convolution kernel. The argument of the spline window is the image . We assume that and are defined on the sampled cubic B-Spline continuous domains and , respectively, according to the image model presented in the Section II-A. The transformation describes the deformation from to , where is a (3) set of transformation parameters to be determined. We pose the task of medical image registration as an optimization problem. The gradient of the interpolated image at any location may To align the reference image with the transformed test be calculated in a similar manner, but a derivative operator is image , we seek the set of transformation parameters applied to the convolution. This is simply the derivative of the that minimizes an image discrepancy function spline window in the respective dimension of each gradient component (1) (4) Values for the transformation parameters are iteratively chosen to reduce the discrepancy. In our implementation, with similar formulas for and . The cubic spline the negative of mutual information is used to measure image window can be differentiated explicitly, and after simpli- discrepancy. We hypothesize that the set of transformation fication reduces to the difference of two shifted second-order parameters that minimizes the discrepancy function also splines brings the transformed test image into best registration (as defined above) with the reference image. (5) Mutual information is an entropy-based measure of image alignment derived from probabilistic measures of image in- tensity values [1], [2], [21]. Because a large number of image B. Image Statistics samples are used to estimate image statistics, the effects of Let and be specified numbers of uniformly sized bins image noise on the metric are attenuated. Mutual information along the respective dimensions of the joint histogram of the is also robust under varying amounts of image overlap as reference and test images. The histogram bins are indexed by the test image moves with respect to the reference, although integer values , and , and, in 122 IEEE TRANSACTIONS ON MEDICAL IMAGING, VOL. 22, NO. 1, JANUARY 2003 general, correspond to linearly scaled ranges in intensity values. Calculation of the gradient of the cost function is necessary Scaling the data provides a method of binning data with arbi- for its efficient and robust minimization. For —a set of -in- trary magnitude and dynamic range such that any contribution to dependent parameters—the gradient of mutual information is the histogram will fall into a valid bin. We form a histogram by given as determining the appropriate bin for a set of paired image values and totalling the contributions in each of the bins. When we nor- (10) malize the histogram it behaves like a traditional probability dis- tribution. A Parzen window is used to generate continuous es- A single component of the gradient is found by differentiating timates of the underlying image distributions, thereby reducing (9) with respect to a transformation parameter (as given in [3]), the effects of quantization from interpolation and discretization and requires differentiation of the joint distribution in (6). After from binning the data [3]. successive applications of the chain rule, the i partial deriva- The calculation of mutual information requires estimates of tive of the joint distribution is given as the marginal and joint probability distributions of the intensity values of the reference and test images. Let be a cubic spline Parzen window and be a zero-order spline Parzen window (centered unit pulse), both of which satisfy the partition of unity constraint [3]. The smoothed joint histogram of ( , )is given by (11)

where is the number of voxels used in the summation, the (6) image gradient is calculated as in (4), and the various spline derivatives ( , ) are computed using (5). where is a normalization factor that ensures , The final term to discuss from (11) is the expression for the and and are samples of the reference and partial derivatives of the transformation . This is interpolated test images, respectively. Each contributing image the variation in the deformation at a point due to a variation value is normalized by the minimum intensity value, or , in the transformation parameters, and depends on geometry and and the intensity range of each bin, or , to fit into the the transformation model. As will be described briefly in the specified number of bins ( or ) in the intensity distribu- Section II-D, the linearity of the expression of the transforma- tion. The summation range is over a subset of voxel locations tion makes the differentiation straightforward. in that contribute to the distribution, i.e. is the intersection D. Deformations of transformed voxels chosen from that map into . The marginal smoothed histogram for the test image is com- An important aspect of our algorithm is the expression for puted from the joint histogram the nonrigid transformation of image coordinates. We model de- formations on cubic B-splines, because of their computational (7) efficiency (separability in multiple dimensions, calculation via filtering), smoothness, and local control. The description of a The marginal probability for the reference image can be com- deformation is similar to that of the images; however, the de- puted independently of the transformation parameters by noting formation is defined on a much coarser grid. A deformation is that the B-Spline Parzen window satisfies the partition of unity defined on a sparse, regular grid of control points placed constraint. The reference marginal smoothed histogram is com- over the test image and is then varied by defining the motion puted as of each control point. Using a spline interpolation kernel to compute the deformation values between the control points (8) produces a locally controlled, globally smooth transformation. The spline deformation model is differentiable with respect to where is the constant in (6). both the deformation parameters (necessary for computing the gradient of mutual information) and spatial coordinates (useful C. Mutual Information in other applications, for example estimating inverse coordinate mappings). The image discrepancy measure we use is the negative of mu- The resolution of the deformation deter- tual information. The negative of mutual information between mines the spacing of the grid and can be anisotropic. The set of the reference image and the transformed test image is expressed control points is a regular grid with spacings as a function of the transformation parameters [3] (12) (9) where , , and are the dimensions of the test image. Given where , , and are the joint, marginal test, and marginal the grid spacing, we can compute the coordinates of the control reference probability distributions, respectively. points. The ( , , ) coordinates of the control points are stored MATTES et al.: PET-CT IMAGE REGISTRATION IN THE CHEST USING FREE-FORM DEFORMATIONS 123 in separate three–dimensional (3-D) matrices. Any element of these matrices has a location in the test image given by . Each control point has an associated three-element deformation coefficient , describing the x-, y-, and z-components of the deformation. For a give there will be deformation coefficients. The deformation at any point in the test image is interpolated using a cubic B-Spline convolution kernel

(13) where is a separable cubic B-spline convolution kernel. Only those corresponding to the 64 control points nearest contribute to this sum. When we differentiate (13) with respect to a deformation coefficient the parameter appears only once in the summation, resulting in a single nonzero term. The derivative with respect to a x compo- nent deformation coefficient is (the y and z components follow similarly) x (14)

The transformation of the test image is specified by mapping reference image coordinates according to a locally perturbed rigid body transformation. Given a 3 3 homogeneous rotation matrix , a three-element transformation vector , and a defor- Fig. 2. The registration algorithm in a multiresolution framework. The area mation term , we can apply nonlinear transformations to enclosed in a dashed line is the algorithmic core for computing and optimizing f f the test image mutual information for a single resolution step. and are the test and reference images, respectively. x is any geometric location in the reference image, " is the set of transformation parameters. p, p , p , and rp are the (15) joint, marginal test, marginal reference, and derivative of the joint distribution, respectively. ƒ and rƒ are the values of mutual information and its gradient, where is the location of the center respectively, between the two images. L-BFGS-B is an optimization package of the test volume and is any voxel location in that searches the parameter space of ". the reference image. With this transformation model, the set of transformation parameters becomes In order to avoid local minima, and to decrease computa- (16) tion time, we use a hierarchical multiresolution optimization scheme. A flowchart of the algorithm in the multiresolution con- where are the Roll-Pitch-Yaw (RPY) Euler angles of text is shown in Fig. 2. We initially optimize for a deformation the rotation matrix , , is the translation vector, to recover the gross motion of the patient and large anatomic and is the set of deformation coefficients. In the application structures. As we increase the resolution, we recover increas- described here, the set of deformation coefficients contained up ingly fine misalignments. To simplify the transition between to 2200 members. multiresolution steps, the size of the image volume remains the same while other parameters vary. According to an empirically E. Multiresolution Optimization Strategy determined schedule, we maintain control over the resolution The registration process is automated by varying the defor- of the deformation, the number of image samples used to mea- mation in the test image until the discrepancy between the two sure mutual information, the degree of Gaussian image blurring, images is minimized. The alignment process is divided into two and the optimizer’s termination criterion. The resolution steps registrations by separating (15) into a term containing only rigid are denoted as , where is the number of multires- body parameters and a second with only deformation parame- olution steps (typically ). The optimum set of transfor- ters. We use L-BFGS-B [6], a limited-memory, quasi-Newton mation parameters at a coarser resolution level is upsampled to minimization package, to reduce the cost function in (9) until become the starting point for the next resolution level. termination criteria are satisfied. The limited-memory method If specifies the resolution of the deforma- is useful here because of the high dimensionality of the param- tion, and are the deformation resolutions for each of eter space. Instead of estimating the entire Hessian during mini- the multiresolution steps, then we proceed from resolution mization, only a low-rank approximation is calculated, allowing to resolution as follows. Starting at current deforma- linear or super-linear convergence rates. L-BFGS-B provides an tion resolution with deformation coefficients and grid additional advantage in that it allows bound constraints on the spacing , we first place a deformation grid over the volume independent variables. at higher resolution . The deformation at resolution is 124 IEEE TRANSACTIONS ON MEDICAL IMAGING, VOL. 22, NO. 1, JANUARY 2003

TABLE I MULTIRESOLUTION PARAMETERS USED FOR REGISTERING THE 27 DATASETS

computed for each of the control point locations for res- respectively. In all cases, the number of bins and used olution to compute the joint histogram are both fixed at 50, a number established empirically by ourselves and others [22]. Computa- (17) tion time is most strongly affected by the number of transforma- tion parameters and the number of voxels used to estimate Then the deformation coefficients are the spline coefficients mutual information (and, consequently, the voxel sampling of the new grid of deformation values proportionality constant ). (18) where is the recursive filter used to compute a spline basis of III. VALIDATION the test image [5]. Validating the performance of an image registration algo- In addition to varying the deformation resolution, Gaussian rithm with real images is not straightforward. The lack of a gold blurring is applied to the images with a kernel that narrows standard complicates matters and prevents any automated as- as multiresolution proceeds. Smoothing the images tends to sessment of registration accuracy. Even if individuals trained smooth the mutual information cost function, thereby avoiding to interpret medical images are involved in a validation experi- local maxima. Because the image (and the cost function) is ment, providing a method for consistently assessing individual smoothed we can use fewer samples to estimate the joint images is difficult. There is a tradeoff between the number of distributions, which is an alternative to constructing an image images that can be assessed and the time required to assess each pyramid. We follow the general rule that the number of data one, a situation that often forces researchers to validate algo- points at resolution level should be proportional to the rithms based on a limited sample size. number of independent parameters for which we are trying to For algorithms with linear/global transformation models, ret- solve rospective validation can be performed if fiducial markers are in (19) place at the time the scan is performed [9]. This method would not be appropriate here as the imaged anatomy is subject to non- where is the number of voxels in the volume, rigid motion between examinations or due to physiological ef- is the number of deformation coefficients, fects such as breathing. In cases where placing fiducial markers and is a proportionality constant. Finally, the optimizer’s ter- is impractical or impossible, researchers often conduct valida- mination criterion is varied. The iteration terminates when the tion experiments using simulated data, generated with a physical change in the objective function between consecutive iterations or simulated [29] phantom. Such studies provide insight into falls below a specified value . At coarse resolutions, this cri- the performance range of the algorithm but still mean little in terion can be relaxed, however, it is set more strictly with in- terms of performance on real data. Landmark identification is creasing resolution. another common method of validation, and was considered for The alignment process is divided into two registrations: a our validation experiments. A difficulty with this technique is rigid body (6 parameters, )followed by a deformation that there are few (if any) point landmarks in the anatomy. Struc- ( parameters, ) recovery. Each registration pro- tures like the carina, which shows the split in the trachea and is ceeds from coarse to fine resolution in four steps. The rigid body well-defined in the test and reference images, do not terminate at parameters are recovered first and then fixed for the deformation a single three-dimensional (3-D) coordinate, but are distributed recovery; the flowchart in Fig. 2 is used for both recoveries. The over a surface. There will always be some degree of error in the separation in recovery stages is needed because redundancy in selection of corresponding points. the independent variables can cause competition during mini- We would like to avoid the pitfalls in landmark identifica- mization. For the registrations presented in this paper, the mul- tion, but the capacity of the human eye to rapidly and accurately tiresolution parameter settings for the deformation recovery are (albeit only qualitatively) determine the quality of a registra- summarized in Table I. For rigid body recovery, and follow tion should not be underestimated. While validation methods the same schedules given in Table I, while the number of pa- involving human assessment will be prone to error, bias, and rameters and (and, consequently, ) are fixed at 6 and 0.05, inconsistencies, if two images are presented in a user-friendly MATTES et al.: PET-CT IMAGE REGISTRATION IN THE CHEST USING FREE-FORM DEFORMATIONS 125 manner, experienced physicians can rapidly assess the “gestalt” quality of the registration in a short amount of time. Further- more, they have a priori knowledge of where in the anatomy the quality of the registration should be high and where these requirements can be relaxed. Our aim with the experiment was to assess the algorithm only in anatomically relevant areas. We took seven images from each patient set: five axial slices (four regularly spaced through the lungs, and one in the upper ab- domen), one coronal slice at the carina, and one sagittal midline slice. Additionally, we included 10% duplicate slices to measure intraobserver consistency. We created a user interface for validation that allows rapid visual assessment of registered images. To provide a measure of consistency between multiple observers, we allowed only two-dimensional (2-D) image navigation, and presented only specific images for assessment. The interface is based on a split Fig. 3. A third-order polynomial is used to map PET attenuation values to CT window, a display technique that fixes one image over another Hounsfield units. and allows the user to vary the lines of transition from the top image to the image below. The images are stationary, but with requested to use only the provided presets when measuring reg- the mouse button held the vertical and horizontal lines of tran- istration accuracy. sition move with the mouse cursor. To quantify the perceived error, a ruler is placed over the image and fixed to the mouse IV. DATA cursor. A set of error bars physically sized according to the ruler divisions is used to assess the registration accuracy for The image sets used for the validation study consist of the given image pair. The observer makes two error assessments FDG-PET and CT scans of the chest in 28 patients. Each set for each image pair: overall, or “representative,” error and max- contains an FDG-PET scan and a chest CT scan performed on a imum error. Using this method, the two strongest visual tests patient undergoing initial staging for nonsmall cell lung cancer. will be the presence/absence of similar anatomic structures and These scans were all performed at the University of Washington the magnitude of discontinuities in tissue boundaries. The vali- Medical Center. The chest CT scan was performed with spiral dation technique will measure errors in translation (both locally technique and bolus intravenous contrast administration, using and globally) but will not assess rotation errors. Since both im- a variety of GE CT scanners (and with varying collimation ages contain similar anatomic information, we found the split and pitch) with submillimeter intrinsic spatial resolution. The window display technique preferable to a pseudocolor image imaging protocol specified an arms-up position and breath-hold overlay for validation. at maximum inspiration. PET imaging was performed on a Since the intensity values for CT and TR scans are not in the GE Advance Scanner with measured isotropic resolution of 8-bit display range, window and level functionality is necessary approximately 5 mm in air (which decreases due to scattering) for rendering specific anatomic structure. Image interpretation [31]. For the FDG PET scan, emission imaging was performed is sensitive to the display parameters, in that if the anatomy is not from the neck to the pelvis. Transmission imaging (15 min rendered at similar window and level settings, feature bound- with a 511-keV energy beam) of the thorax and upper abdomen aries can significantly vary. Therefore, some form of display was also performed for the purpose of attenuation correction. normalization is required to provide consistency in the images Therefore, emission images spanning from the neck to pelvis, presented in the validation experiment. Window and level set- transmission and attenuation-corrected images of the thorax tings for CT images are standardized in Hounsfield units so we and upper abdomen were available for testing our estimated the intensity transformation for attenuation values to approach. Hounsfield units using the joint histogram from a single reg- All patients received both scans within at most two months of istered image set. Averaging along the CT Hounsfield (refer- each other, and did not have surgical resection or major tumor ence) and PET attenuation (test) dimensions produces several growth between acquisitions. The TR scans are exported from datapoints that are fit with the third-order polynomial shown in the PET station as a single 3-D floating-point data set of size Fig. 3. As shown in Tai et al. the intensity transformation from . The TR voxels have a physical size of CT to TR values is nonlinear due to scattering, and they gen- mm ; therefore, each TR scan axially images 437.75 erate a figure similar to Fig. 3 [8]. mm in its field-of-view (FOV). The CT scans are acquired ax- During validation the intensity values of the TR slices are ially with slice thicknesses that vary from 2–7 mm. Once the transformed using a lookup table generated from the polyno- patient scan is acquired, the in-plane voxel size is set to mag- mial fit. We should note that the intensity transformation applied nify the patient anatomy as much as possible, within the range to the TR slices discussed here is used for validation only and of 0.7–1.1 mm. The CT FOV varies across patients and is gen- not during the registration process. Standard window and level erally smaller than that of the TR scan (a typical CT FOV covers settings are applied to both the TR and CT slices for improved 350 mm of axial anatomy). For the data presented here, the PET consistency during image interpretation. The physicians were TR scan is the test image and the CT scan is the reference 126 IEEE TRANSACTIONS ON MEDICAL IMAGING, VOL. 22, NO. 1, JANUARY 2003

. We should note that theoretically either scan can be the test or reference, but we have consistently used the CT scan as the reference image. In the one case we did juxtapose the relation- ship, a poor registration resulted. Several preprocessing steps are performed before registering the images. The images are filtered with a Gaussian blur ( voxel), the TR image is resliced to have isotropic voxels, and using cubic B-Spline interpolation the spatial resolution of the CT is reduced to match that of the TR. All the images pre- (a) (b) sented in the validation experiment have an isotropic voxel size of 4.29 mm for the registration. As explained in the introduc- tion the arms are present in the PET images and are extraneous features for which there is no correspondence in the CT. With the aid of a simple graphical user interface, manually placed polygons in several axial slices are extended through the PET volume to exclude the arms. The voxels within this volume [ in (11)] represent the set of image samples from which mutual information is computed. No other cropping or masking is per- (c) (d) formed on the images. A final postprocessing step zooms the selected slices of the TR and CT volumes for validation, pro- ducing 2-D images of size 384 384 with mm pixel dimensions. In some cases, the rigid body registration required a manual initialization along the z axis.

V. R ESULTS (e) (f) Fig. 4. Sample images from a registered TR-CT pair. (a) Axial and (b) coronal Twenty-eight patient sets were registered for the validation views of the CT. (c) Axial and (d) coronal views of the transformed TR image. experiment, with one failing due to a limited CT FOV. All data The crosshairs are at the same image coordinates in both CT and TR volumes. presented is for the 27 patients successfully registered, with no Approximately corresponding (e) axial and (f) coronal (f) slices from the gross misregistrations being the only criteria for a successful unregistered TR volume are shown for comparison. registration. Alignment was assessed by two radiologists with experience interpreting both PET and CT images. Registration was possible for the observers to determine an exact error mea- was performed on a DEC Alpha RS6000 (dual 375-MHz sure, especially for the maximum assessed error. In future work, processors, 1.2-GB RAM) workstation running Compaq Tru64 we will experiment with a method of estimating the error on a Unix. The algorithm is implemented as a library of C modules continuous domain. and compiled on both the Windows 98 and Unix platforms The means, medians, and standard deviations of the overall using Microsoft Visual C 6.0 (Microsoft Corp., Redmond, and maximum assessed errors for both observers are given WA) and gcc (Gnu C compiler), respectively. L-BFGS-B is a in Table III. Although the median error index is zero for the library of routines written in Fortran 77 [20]. The Interactive “overall” assessments, remember that the error index of zero Data Language (IDL, Research Systems Inc., Boulder, CO) corresponds to a mm-equivalent as given in Table II. We can was used to manage program execution by allocating memory, also group the results by anatomic region and compare errors loading and calling library modules, and interfacing with in all patients for a given region and for all the anatomy. The L-BFGS-B. IDL is also used for the graphical user interfaces assessed error by anatomy is shown in Fig. 6. The errors in for manual segmentation, dual image viewing, and validation. axial slices in the upper lung regions are small, while the Using the DEC Alpha workstation, an average of 100 min poorest performance is in the abdomen. As shown in Fig. 6 the ( 10 min for rigid body, 90 min for deformable) was re- sagittal and coronal slices were also assessed as being less well quired to register each dataset. Example images from a regis- registered than the lung regions shown in axial slices. There tered dataset are shown in Fig. 4 with a locked cursor, while is some tendency for observer 2 to grade more harshly than Fig. 5 shows sample images from the validation user interface. observer 1 in the “overall” category, but their choices agree The validation experiment was conducted on a Macintosh G4 fairly well . Perhaps because the maximum error can 400-MHz desktop computer. The 205 slices used in the ex- be assessed more objectively, the observers tend to agree more periment were divided into four sets, each of which required in this error category . Intraobserver consistency 1.5 h for an observer to assess. The observers scored the accu- was measured from the assessments of 21 duplicate slice racy of the registration by selecting the appropriate error range presentations. Observer 1 ( for “overall,” (Table II) based on the on-screen error bars, and the observers’ for “maximum”) tended to be more consistent than observer selections were recorded by the validation software. At the time 2( for “overall”, for “maximum”). Given the validation experiments were conducted, we did not feel it the nature of the validation experiment, only translation errors MATTES et al.: PET-CT IMAGE REGISTRATION IN THE CHEST USING FREE-FORM DEFORMATIONS 127

Fig. 5. Sample images from the seven anatomic locations considered in the validation experiment, as presented to the observers with the split window viewer. Images are rendered using lung window/level unless noted otherwise. (top row from left to right) Lower lung, mid-lower lung, mid-upper lung, and upper lung. (bottom row from left to right) Abdomen (mediastinal window), carina, midline (mediastinal window), and midline (repeat in lung window).

TABLE II TABLE III ERROR INDEX RECORDED BY THE VALIDATION USER INTERFACE CORRESPOND GLOBAL STATISTICS FOR VALIDATION EXPERIMENT TO ERROR RANGES IN PIXELS AND MILLIMETERS

are measurable and we can make no claims about the rotational accuracy of the results.

VI. DISCUSSION AND CONCLUSION The results of the validation study indicate that the algorithm is capable of accurate registrations in the thorax, and the radi- ologists who validated the results felt the errors were generally within clinically acceptable ranges. We have identified regions in the anatomy for which the algorithm succeeds in varying de- grees. According to visual assessments of 27 patient datasets by Fig. 6. Mean overall and maximum assessed errors by anatomy for both two observers, the overall error in the considered anatomy has observers for the 27 patient datasets. The average performance for all the an error index of 0.54, which is in the 0- to 6-mm error range images is given in the last data series. (recall that the original PET voxel size is 4.3 mm on a side). The mid to upper lung regions are registered the most accurately lung and soft tissue regions are perhaps driving the deforma- with a mean overall assessed error index of 0.24, which is also tion, and the smaller airway anatomy is moved along with them. in the 0–6 mm error range. The assessed performance in the ab- Increasing the deformation resolution in these dimensions may domen is the worst in general with a mean overall assessed error yield better results. Using a mathematical phantom such as the index of 1.37, which is in the 6- to 11-mm error range. MCAT phantom would allow us to characterize the algorithm The algorithm produces registrations with relatively large as- when a respiratory model is included in the image data [29]. sessed error in the coronal and sagittal planes. This is partic- The registration accuracy in the smaller anatomic regions (such ularly true at the lung-diaphragm interface since more surface as along the trachea) can likely benefit from some form of clus- area of the diaphragm is imaged on coronal or sagital slices tering of deformation control points. This highlights the prin- compared to the axial plane, and the diaphragm is not expected cipal limitation of the algorithm: the inability to use a nonuni- to be in the same location due to the differences in image ac- form grid of sampling points in the test image. Future work will quisition (breath-hold vs non breath-hold). Additionally, larger address this limitation. 128 IEEE TRANSACTIONS ON MEDICAL IMAGING, VOL. 22, NO. 1, JANUARY 2003

From the validation study, we notice a poor assessment in the [7] J. Kybic, P. Thevenaz, A. Nirkko, and M. Unser, “Unwarping of unidi- abdomen. At 511 keV, the TR scan shows little intensity differ- rectionally distorted EPI images,” IEEE Trans. Med. Imag., vol. 19, pp. 80–93, Feb. 2000. ences between fat, muscle, and soft tissue, resulting in a uniform [8] Y. Tai, K. P. Lin, C. K. Hoh, S. C. H. Huang, and E. J. Hoffman, “Utiliza- image in the abdomen (except for gas bubbles). At 140 keV, the tion of 3-D elastic transformation in the registration of chest X-ray CT CT scan of the abdomen, on the other hand, shows clear delin- and whole body PET,” IEEE Trans. Nucl. Sci., vol. 44, pp. 1606–1612, Aug. 1997. eation of the anatomy. Since there are few shared structures in [9] J. West, J. M. Fitzpatrick, M. Y. Wang, B. M. Dawant, C. R. Maurer the TR and CT scan of the abdomen on which the algorithm can Jr., R. M. Kessler, and R. J. Maciunas, “Comparison and evaluation of anchor, a poor registration results. In this case, the PET emission retrospective intermodality image registration techniques,” J. Comput. Assist. Tomogr., vol. 21, pp. 554–566, July/Aug. 1997. or SUV scan may be a better candidate for alignment with CT. [10] J. M. Fitzpatrick, D. L. G. Hill, Y. Shyr, J. B. West, C. Studholme, and C. An alternative approach includes using attenuation and emission R. Maurer Jr., “Visual assessment of the accuracy of retrospective reg- values simultaneously, which may improve the overall accuracy istration of MR and CT images of the brain,” IEEE Trans. Med. Imag., vol. 17, pp. 571–585, Aug. 1998. in the torso. [11] J. B. West, J. M. Fitzpatrick, M. Y. Wang, B. M. Dawant, C. R. Maurer In addition to the question of how fine a grid do we need Jr., R. M. Kessler, and R. J. Maciunas, “Retrospective intermodality reg- to represent a given set of deformations, we must also decide istration techniques: surface-based versus volume-based,” IEEE Trans. Med. Imag., vol. 18, pp. 144–150, Feb. 1999. how fine a deformation grid we can effectively use. The coarse- [12] R. Szeliski and S. Lavallee, “Matching 3-D anatomical surfaces with ness of the grid effectively regularizes the deformation and pre- nonrigid deformations using octree-splines,” Int. J. Comput. Vis., vol. vents violent local deformations. This may impose an upper 18, no. 2, pp. 171–186, 1996. [13] H. Lester and S. R. Arridge, “A survey of hierarchical nonlinear medical limit on grid fineness, unless additional regularization terms are image registration,” Pattern Recogn., vol. 32, pp. 129–149, 1999. included in the cost function. [14] M. Unser, A. Aldroubi, and M. Eden, “B-spline signal processing: part We have validated the algorithm only for the TR to CT I—theory,” IEEE Trans.Signal Processing, vol. 41, pp. 821–833, Feb. 1993. matching problem. It is natural to question if it will work well [15] , “B-spline signal processing: part II—efficient design and applica- for other inter- or intra-modality image pairs. PET images tions,” IEEE Trans.Signal Processing, vol. 41, pp. 834–848, Feb. 1993. are among the worst (after single photon emission computed [16] P. J. Kostelec, J. B. Weaver, and D. M. Healy Jr., “Multiresolution elastic image registration,” Med. Phys., vol. 25, no. 9, pp. 1593–1604, Sept. tomography) of the 3-D imaging modalities in terms of tissue 1998. delineation and signal to noise ratio. We believe that given the [17] B. C. Vemuri, J. Ye, Y. Chen, and C. M. Leonard, “A level-set based success of the TR-CT registration problem, the other modality approach to image registration,” in Proc. IEEE Workshop Mathematical Methods in Biomedical Image Analysis, June 2000, pp. 86–93. combinations will be no more difficult or significantly easier. [18] J. P. W. Pluim, J. B. A. Maintz, and M. A. Viergever, “Image registration Clinical studies are currently underway for assessing the by maximization of combined mutual information and gradient informa- utility of the registration method described here for improved tion,” IEEE Trans. Med. Imag., vol. 19, pp. 809–814, Aug. 1999. [19] M. Holden, D. L. G. Hill, E. R. E. Denton, J. M. Jarosz, T. C. S. Cox, T. PET image interpretation, assessment of response to therapy, Rohlfing, J. Goodey, and D. J. Hawkes, “Voxel similarity measures for studying tumor biology, and image-guided therapy applications. 3-D serial MR brain image registration,” IEEE Trans. Med. Imag., vol. 19, pp. 94–102, Feb. 2000. [20] “Tech. Rep.,” EECS Dept., Northwestern Univ., Evanston, IL, 1994. [21] P. Viola and W. M. Wells III, “Alignment by maximization of mutual information,” Int. J. Comput. Vis., vol. 24, no. 2, pp. 137–154, 1997. ACKNOWLEDGMENT [22] C. Studholme, “Measures of 3D image alignment,” Ph.D. dissertation, Univ. London, London, U.K., Aug. 1997. The authors wish to thank S. Kohlmyer for his comments [23] P. Thevenaz and M. Unser, “A pyramid approach to sub-pixel image fu- throughout this research project, and B. Lewellen and S. Santoni sion based on mutual information,” in Proc. 1996 IEEE Int. Conf. Image Processing (ICIP’96), vol. I, Lausanne, Switzerland, Sept. 16–19, 1996, for their assistance with acquiring the images. pp. 265–268. [24] L. G. Brown, “A survey of image registration techniques,” ACM Com- puting Surveys, vol. 24, no. 4, pp. 325–376, Dec. 1992. [25] A. Roche, G. Malandain, X. Pennec, and N. Ayache, “Multimodal image REFERENCES registration by maximization of the correlation ratio,”, Paris, France, INRIA Res. Rep. 3378, 1998. [1] C. Studholme, D. L. G. Hill, and D. J. Hawkes, “An overlap invariant [26] G. Golub and C. van Loan, Matrix Computation. Baltimore, MD: entropy measure of 3D medical image alignment,” Pattern Recogn., vol. Johns Hopkins Univ. Press, 1983. 32, pp. 71–86, 1999. [27] R. P. Woods, S. T. Grafton, J. D. G. Watson, N. L. Sicotte, and J. C. [2] F. Maes, A. Collignon, D. Vandermeulen, G. Marchal, and P. Suetens, Mazziotta, “Automated image registration: II. Intersubject validation of “Multimodality image registration by maximization of mutual informa- linear and nonlinear methods,” J. Comput. Assist. Tomogr., vol. 22, no. tion,” IEEE Trans. Med. Imag., vol. 16, pp. 187–198, Apr. 1997. 1, pp. 153–165, 1998. [3] P. Thevenaz and M. Unser, “Spline pyramids for inter-modal image reg- [28] M. van Herk, J. C. de Munck, J. V.Lebesque, S. Muller, C. Rasch, and A. istration using mutual information,” in Proc. SPIE, vol. 3169, 1997, pp. Touw, “Automatic registration of pelvic computed tomography data and 236–247. magnetic resonance scans including a full circle method for quantitative [4] D. Rueckert, L. I. Sonada, C. Hayes, D. L. G. Hill, M. O. Leach, and accuracy evaluation,” Med. Phys., vol. 25, no. 10, pp. 2054–2066, 1998. D. J. Hawkes, “Nonrigidregistrationusing free-form deformations: ap- [29] W. P. Segars, D. S. Lalush, and B. W. Tsui, “Modeling respiratory me- plication to breast MR images,” IEEE Trans. Med. Imag., vol. 18, pp. chanics in the MCAT and spline-based MCAT phantoms,” IEEE Trans. 712–722, Aug. 1999. Nucl. Sci., vol. 48, pp. 89–97, Jan. 2001. [5] M. Unser, A. Aldroubi, and M. Eden, “Fast B-spline transforms for con- [30] P. Thevenaz and M. Unser, “Optimization of mutual information for tinuous image representation and interpo-lation,” IEEE Trans. Pattern multiresolution image registration,” IEEE Trans. Image Processing, vol. Anal. Machine Intell., vol. 13, pp. 277–285, Mar. 1991. 9, Dec. 2000. [6] J. Nocedal and S. J. Wright, Numerical Optimization. New York: [31] B. Bendriem and D. W. Townsend, The theory and practice of 3D Springer-Verlag, 2000, ch. 8–9. PET. Dordrecht, The Netherlands: Kluwer Academic, 1998.