
Master of Science Thesis in Electrical Engineering Department of Electrical Engineering, Linköping University, 2017

Evaluation of 3D MRI image registration methods

Magnus Ivarsson

LiTH-ISY-EX--17/5037--SE

Supervisors: Andreas Robinson, CVL, Linköping University
             Thobias Romu, AMRA
             Magnus Borga, AMRA
Examiner:    Maria Magnusson, CVL, Linköping University

Computer Vision Laboratory Department of Electrical Engineering Linköping University SE-581 83 Linköping, Sweden

Copyright © 2017 Magnus Ivarsson

Abstract

Image registration is the process of geometrically deforming a template image into a reference image. This technique is important and widely used within the field of medical IT. The purpose could be to detect image variations or pathological development, or, in the company AMRA's case, to quantify fat tissue in various parts of the human body.

From an MRI (Magnetic Resonance Imaging) scan, a water and a fat tissue image are obtained. Currently, AMRA is using the Morphon algorithm to register and segment the water image in order to quantify fat and muscle tissue. During the first part of this master thesis, two alternative registration methods were evaluated. The first algorithm was Free Form Deformation, which is a non-linear parametric based method. The second algorithm was a non-parametric optical flow based method known as the Demon algorithm. During the second part of the thesis, the Demon algorithm was used to evaluate the effect of using the fat images for registrations.


Acknowledgments

I would like to thank my supervisor Andreas Robinson and examiner Maria Magnusson at CVL for your great support during my master thesis. I would also like to thank my supervisors Thobias Romu and Magnus Borga for giving me the opportunity to complete my master thesis at AMRA and for your continuous support throughout the project.

Linköping, January 2017 Magnus Ivarsson


Contents

Notation ix

1 Introduction 1
  1.1 Background 1
    1.1.1 Aim 4
  1.2 Problem Formulation 4
  1.3 Limitations 4

2 Theory 7
  2.1 Image registration - an overview 7
    2.1.1 Parametric methods 8
    2.1.2 Non-parametric methods 10
    2.1.3 Landmark based registration 12
  2.2 Distance measures 13
    2.2.1 SSD 14
    2.2.2 Normalized Cross Correlation 15
    2.2.3 Normalized Gradient Field 15
    2.2.4 Mutual Information 16
  2.3 Image registration as an optimization problem 16
    2.3.1 Regularization methods 16
  2.4 Methods 17
    2.4.1 Morphon 18
    2.4.2 Demon 20
    2.4.3 Free Form Deformation 21
  2.5 Evaluation 22
    2.5.1 Body Composition Measurements 22
    2.5.2 Segmentation metrics 24

3 Method 29
  3.1 Algorithm comparison 29
    3.1.1 Morphon 32
    3.1.2 Free form deformation 32
    3.1.3 Demon 33


  3.2 Further investigation 34
  3.3 Region definitions 35
  3.4 Dataset 36
    3.4.1 Region masks 37
  3.5 Deformation and segmentation 39
    3.5.1 Prototype deformations 40
    3.5.2 Prototype selection 40
    3.5.3 Probability field 41
    3.5.4 Segmentation 42
    3.5.5 Disjoint regions 42
  3.6 Evaluation 42
    3.6.1 Body composition measurements 42
    3.6.2 Weighted segmentation 44
    3.6.3 Statistics 45

4 Results 47
  4.1 Algorithm comparison 47
    4.1.1 Visceral adipose tissue 48
    4.1.2 Abdominal subcutaneous adipose tissue 50
    4.1.3 Muscles 52
  4.2 Further investigation 54
    4.2.1 Visceral adipose tissue 55
    4.2.2 Abdominal subcutaneous adipose tissue 57
    4.2.3 Muscles 59

5 Discussion 63
  5.1 Results 63
    5.1.1 Algorithm comparison 64
    5.1.2 Further investigation 64
  5.2 Method 66
    5.2.1 Flawed and biased ground truth 66
    5.2.2 Incomplete evaluation of registrations 66
    5.2.3 Unequal tuning of algorithms 67
    5.2.4 Limited set of prototypes 67
    5.2.5 Training and evaluation set 67
  5.3 Future work 68
    5.3.1 Refine ground truth 68
    5.3.2 Evaluating registrations differently 68
    5.3.3 Detecting and handling outliers 68
    5.3.4 Extend prototype bank 68

6 Conclusions 71

Bibliography 73

Notation

Notation Meaning

IT        Template image
IR        Reference image
Ω         Domain where an image I is defined
u         Displacement field
δu        Incremental displacement field
c         Certainty field
∇I        Image gradient
D         Distance/similarity measure
T         Transformation (unless something else is specified)
R         Regularization
x         Bold lowercase characters indicate vectors
A         Bold uppercase characters indicate matrices
⟨I, I⟩    Scalar product between two images
∇ · u     Divergence of a displacement field
T ∗ I     Convolution
T ◦ I     Transformation applied to an image

Abbreviation Meaning

MRI    Magnetic resonance imaging
VAT    Visceral adipose tissue
ASAT   Abdominal subcutaneous adipose tissue
SAT    Subcutaneous adipose tissue
LULB   Left upper leg back (muscle)
LULF   Left upper leg front (muscle)
RULB   Right upper leg back (muscle)
RULF   Right upper leg front (muscle)
IP     In phase (water + fat image)


1 Introduction

This chapter gives the reader the necessary background, explains why image registration of MR images is useful, and states the aim of the master thesis. The problem formulation and the limitations of this thesis are also described.

1.1 Background

With an aging population, efficient and improved health care is essential to meet the demands we put on our health care system, and image registration of medical images will play an important role in the future. In today's society, the health and metabolic status of individuals is often assessed by indirect measurements such as BMI and waist circumference. However, alternative methods exist: AMRA is a company that specializes in precise body composition measurements that can improve and customize treatments for individuals with high metabolic risk. This is done by generating a water and a fat image from an MRI scan, followed by a quantification of fat and muscle tissue in different regions of the body. This quantification is possible for two reasons. First, the water image indicates which voxels contain muscle tissue while the fat image indicates which voxels contain fat tissue. Second, the images are normalized so that the voxel intensity corresponds to the concentration of fat or muscle tissue within that particular voxel. An example of a water and a fat image can be seen in figure 1.1, and images of segmented adipose and muscle tissue can be seen in figure 1.2.


Figure 1.1: The image to the left illustrates a water image, the image in the middle illustrates a fat image and the image to the right illustrates an IP image (water + fat) in the coronal plane. Note that the water signal is strong where the fat signal is weak and vice versa. Image source: AMRA.

Figure 1.2: The images to the left illustrate the upper leg muscle regions. Blue pixels represent left upper leg back, yellow represents left upper leg front, magenta represents right upper leg back and cyan represents right upper leg front. In the image to the right, red pixels represent VAT and blue pixels represent ASAT. Image source: AMRA.

Examples of measurements AMRA provides are

• VAT - Visceral Adipose Tissue (intra-abdominal fat)
• ASAT - Abdominal Subcutaneous Adipose Tissue (pinchable fat)
• Thigh Muscle Volume
• Lean Muscle Tissue Volume
• Total Adipose Tissue
• Additional Muscle Group Volumes

The reason why the body is segmented into different regions is that it often matters where fat tissue is located. For instance, research has shown that large volumes of visceral adipose tissue are connected to diabetes, liver disease, and cancer [15]. On the other hand, subcutaneous fat is much less likely to have adverse health effects [16]. Tools that are used today, like BMI and waist circumference, are well suited to determine health and metabolic status on a population basis but can sometimes be crude when used to determine the body composition of an individual. An image that illustrates the difference between BMI and VAT can be seen in figure 1.3.

Figure 1.3: Example of six men with a BMI of 21 whose body composition and metabolic risk vary greatly. The blue color represents abdominal subcutaneous adipose tissue and the red color represents visceral adipose tissue. Image source: AMRA.

So far, the reasons why precise body composition measurements are necessary have been explained. However, the method by which the regions of interest are found, i.e. how to segment the image, has not yet been described. A simple but time-consuming way to segment an image is manual labelling of each voxel. Another approach, which is more cost effective, is to use image registration methods to automatically segment an image with the aid of a computer. Note that manual segmentation is not guaranteed to be more precise than automatic segmentation. Today, AMRA is using the Morphon algorithm in order to register and segment an MR image.

1.1.1 Aim

The aim of the master thesis is to evaluate different registration algorithms and to determine whether the results given by the Morphon can be improved. The evaluation was carried out as explained in sections 2.5.1 and 3.6.

1.2 Problem Formulation

The master thesis has been divided into two parts. The first part was devoted to answering the following questions:

• Is it possible to obtain better segmentation with the parametric method Free Form Deformation (explained in section 2.4.3) compared to the Morphon?

• Is it possible to obtain better segmentation with the Demon algorithm (ex- plained in section 2.4.2) compared to the Morphon?

In the second part, the Demon algorithm was used to answer the following ques- tions:

• Is it possible to obtain better segmentation if the fat image is used as input to the registration algorithm?

• Is it possible to obtain better segmentation if a combination of the fat and water image is used as input to the registration algorithm?

1.3 Limitations

Because of the limited time frame of the master thesis, the methods chosen for evaluation are within the group of parametric and non-parametric methods.

Due to the high resolution of the 3D images and the computational demands of the algorithms, only a modest number of images were used for evaluation. However, the number of samples is still large enough to indicate whether an algorithm has a lower, higher or the same potential as the Morphon algorithm.

Since the focus of this thesis is image registration, the theory behind MR imaging and how the water and fat images are obtained is not within the scope of this thesis. Neither is the theory behind the normalization of the images that enables a quantitative measurement of fat and muscle tissue. The interested reader is referred to [11], [10] and [17].

2 Theory

This chapter primarily explains the theory behind image registration but also describes some metrics that can be used to evaluate a registration.

2.1 Image registration - an overview

This section provides an overview of the most common image registration techniques used today. The content given in this section is based on [3].

Before presenting different types and groups of algorithms, it is important to understand the main concept. The idea is to find a transformation T that deforms a template image IT so that the deformed template image T ◦ IT is similar to a reference image IR. The transformation T can be a transform applied to all voxels in the image IT, which is referred to as a parametric transform. It can also be a non-parametric transform where all voxels are allowed to be displaced separately. In the latter case, the transformation is seen as a displacement field which is denoted by u instead of T. These types of transformations are further explained in sections 2.1.1 and 2.1.2. An example of a template image before and after it is deformed into a reference image can be seen in figure 2.1.

All registration algorithms, no matter which group they belong to, aim to minimize some kind of cost function described by

ε(T) = D(IR, IT, T) + R(T),    (2.1)

where D is a distance measure and R is a regularization term. The term D is used to determine how different the reference image IR and the template image IT are after the transformation T has been applied to IT. The term R is added

to regularize the solution so that the displacement field becomes homogeneous and to penalize unreasonable transformations. Some methods, however, are not constructed as a pure optimization problem. Instead the problem is divided into two parts:

1. Finding the optimal transform T by a distance measure.

2. Regularizing the transform.

The main reason for dividing the problem into two parts is to simplify it and facilitate the parameter tuning. Non-parametric image registration methods, which are explained in section 2.1.2, typically make this separation.
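As a concrete illustration of the generic cost (2.1), the sketch below combines an SSD-style distance term with a diffusion-style regularization term. The function name and the trade-off weight `alpha` are illustrative additions, not taken from the thesis.

```python
import numpy as np

# A minimal sketch of the generic cost (2.1), epsilon(T) = D + R: an SSD
# distance term plus a diffusion-style regularization term. The helper
# name and the weight alpha are illustrative, not from the thesis.

def registration_cost(ref, warped_tpl, u, alpha=0.1):
    distance = np.sum((ref - warped_tpl) ** 2)
    regularization = sum(np.sum(g ** 2)
                         for ui in u for g in np.gradient(ui))
    return distance + alpha * regularization

ref = np.ones((8, 8))
u = np.zeros((2, 8, 8))                       # identity displacement field
perfect = registration_cost(ref, ref, u)      # perfectly aligned: cost 0
worse = registration_cost(ref, np.zeros_like(ref), u)
```

A perfectly aligned pair with an identity field has zero cost, while a mismatched pair is penalized by the distance term.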

Figure 2.1: This figure illustrates the effects of an image registration with the water images in the axial plane of a body. The image to the left is a template image IT, the image in the middle is a deformed template image T ◦ IT and the image to the right is a reference image IR. Note that the template image is more similar to the reference image after the deformation than before. Image source: AMRA.

2.1.1 Parametric methods

In parametric methods, the aim is to find a set of parameters p that minimizes a cost function under the constraint of a certain type of transformation. The idea of using a parametric instead of a non-parametric transform is that the parametrization explicitly regularizes the solution [13]. First, linear transformations are explained based on [3]. Second, non-linear transformations are briefly covered based on [13].

Linear transformation

Examples of linear transformation types are

• Rigid - translation and rotation.

• Similarity - rigid and scaling.

• Affine - similarity and skewing.

The general form of these transformations can be described as

Tp(x) = x + Σk pk Bk(x),    (2.2)

where pk is a scalar, x is the coordinate of a voxel, Bk is a multivariate basis function and Tp(x) is the transformed coordinate. The generic cost function described by (2.1) can then be modified into

ε(p) = D(IR, IT, T(p)) + R(p).    (2.3)

In general, the regularizing term R is omitted since the idea behind a parametric transform is that the constraints of the parameters themselves will regularize the solution.

A rigid transform in 2D, for instance, which consists of a rotation R and a translation t, can be written as

" #" # " # p1 p2 x p3 Tp(x) = Rx + t = − + p2 p1 y p4   p1 (2.4) " #   x y 1 0 p2 = −   = B(x)p y x 0 1 p3   p4 where the columns of B(x) are the basis functions Bk(x).

Non-linear transformation

In order to allow for more complex deformations, non-linear transformations are the natural step after the affine transformations. Note that non-linear transformations should not be confused with non-parametric methods, which are explained in section 2.1.2. Non-linear transformations are still governed by a set of parameters while non-parametric transformations allow each voxel to be displaced separately.

Thin-plate spline (TPS) is a well known non-linear and grid-based registration algorithm. The idea is to place an evenly spaced grid of control points in the template image and find an affine transformation for these points. The transformation for the remaining voxels is then given by

T(x) = Ax + t + Σi=1..n wi ||x − xi||,    (2.5)

where A is an affine transformation matrix, t a translation vector, n the number of control points, xi is the coordinate of the ith control point and wi is the weight of the ith control point. In TPS, all control points are included in each transformation, i.e. they have global support. This makes the transformation more robust but does not allow as complex local deformations as Free Form Deformation, which is explained in section 2.4.3.
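A direct sketch of evaluating (2.5) with the ||x − xi|| kernel as written in the text; the affine part, control points and weights below are toy values, not fitted to data.

```python
import numpy as np

# Sketch of the TPS-style transform (2.5). A, t, the control points and
# the weights are illustrative values, not fitted from landmark data.

def tps_transform(x, A, t, ctrl, w):
    out = A @ x + t
    for xi, wi in zip(ctrl, w):
        out = out + wi * np.linalg.norm(x - xi)
    return out

A = np.eye(2)
t = np.array([1.0, 0.0])
ctrl = np.array([[0.0, 0.0], [1.0, 1.0]])
w = np.zeros((2, 2))                       # zero weights: purely affine
p = tps_transform(np.array([2.0, 3.0]), A, t, ctrl, w)
```

With all weights zero, the transform reduces to its affine part, which is a useful sanity check.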

2.1.2 Non-parametric methods

As described in section 2.1.1, parametric methods aim to find a transformation T that deforms the template image given certain constraints. For non-parametric methods, the deformation is no longer bound to a certain type of transformation. Instead, all voxels of the template image are displaced separately, which allows for very complex deformations. Since the deformation is no longer a product of a transformation T applied to all voxels, the transformation is replaced by a displacement field u. This displacement field u contains the displacement of all voxels within the image.

Two ways of finding the displacement field for a non-parametric method are covered in section 2.4, where the Morphon and Demon algorithms are explained. These are two well known non-parametric image registration algorithms, where the Morphon makes use of the local phase to find the displacement while the Demon is based on optical flow.

The content in this section is based on [3] unless anything else is specified.

Field Accumulation

To be able to find a large and accurate deformation, an iterative approach through scale-space where the displacement field is accumulated is often preferred. There are many ways to accumulate the displacement field and this thesis explains three alternatives, see (2.6), (2.7) and (2.8).

The simplest way is to add the incremental displacement δu to the current accumulated displacement u by

u ← u + δu.    (2.6)

However, this approach can cause unwanted tears and folding, which is not physically possible for a deformed body. The usage of composite accumulation is preferred in this situation,

u ← u ◦ δu = u(x + δu(x)) + δu(x).    (2.7)

Another way to avoid tears and foldings of the displacement field is by accumulating u as

u ← u ◦ exp(δu),    (2.8)

which ensures that the displacement field stays diffeomorphic. This means that the field is invertible and that both the field and its inverse are smooth or, to put it differently, that its Jacobian determinant is positive.
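The composite accumulation in (2.7) can be sketched in 2D with one array per field component; the resampling of u at the displaced positions is the step that distinguishes it from plain addition. Function and variable names are illustrative.

```python
import numpy as np
from scipy.ndimage import map_coordinates

# Sketch of composite accumulation (2.7): the accumulated field u is
# resampled at the incrementally displaced positions x + du(x) before
# the increment du is added. 2D fields, one component per axis.

def compose(u_y, u_x, du_y, du_x):
    yy, xx = np.meshgrid(np.arange(u_y.shape[0]),
                         np.arange(u_y.shape[1]), indexing="ij")
    coords = [yy + du_y, xx + du_x]
    u_y_new = map_coordinates(u_y, coords, order=1, mode="nearest") + du_y
    u_x_new = map_coordinates(u_x, coords, order=1, mode="nearest") + du_x
    return u_y_new, u_x_new

# Two constant (pure translation) fields simply add under composition.
u_y = np.full((8, 8), 1.5); u_x = np.zeros((8, 8))
du_y = np.full((8, 8), 0.5); du_x = np.zeros((8, 8))
u_y2, u_x2 = compose(u_y, u_x, du_y, du_x)
```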

Field Regularization

An important concept in image registration is displacement field regularization. In section 2.3.1, the regularization of the displacement field is explicitly part of the error function. In non-parametric image registration methods, however, the incremental and/or the accumulated displacement fields need to be regularized to make the problem well-posed. The simplest and most straightforward way is to convolve the field with an isotropic low pass filter

u ← u ∗ giso.    (2.9)

This filter will efficiently smooth the displacement field in homogeneous regions, but it will also smear out edges and lines, which is not preferred. This can be mitigated by adaptive filtering where the filter size is modified depending on the local region.
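The isotropic smoothing in (2.9) amounts to low pass filtering each component of the field; the noisy field below is purely for illustration.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# Sketch of (2.9): regularize a displacement field by convolving each
# component with an isotropic low pass (here Gaussian) filter. The field
# is random noise purely for illustration.

rng = np.random.default_rng(0)
u = rng.normal(size=(2, 64, 64))          # noisy 2D field, y/x components
u_smooth = np.stack([gaussian_filter(ui, sigma=3.0) for ui in u])
```

Smoothing reduces the field's variance, i.e. it suppresses the inhomogeneities that make the problem ill-posed.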

The following content describes a method to perform anisotropic filtering suggested by [4]. The examples and equations are given in 2D but can be extended to 3D. The first step is to compute the local structure tensor T of the displacement field and do an eigenvalue decomposition

T = Σk=1..2 λk êk êk^T.    (2.10)

A large eigenvalue λk indicates that there is a distinct structure orthogonal to its corresponding eigenvector êk. Therefore, low pass filtering should only be executed in the directions of the eigenvectors that correspond to small eigenvalues. There are basically three different cases:

1. No apparent structure exists, both λ1 and λ2 small.

2. Distinct structure exists in one direction, λ1 large and λ2 small.

3. Distinct structure exists in two orthogonal directions, λ1 and λ2 large.

The next step is to create the control tensor C

C = Σk=1..3 γk êk êk^T,    (2.11)

where γk is mapped from λk such that 0 ≤ γk ≤ 1. Further, an adaptive filter gadap can be created by a linear combination of three filters:

• glarge - isotropic filter which suits case 1.

• ganiso - anisotropic filter where the orientation is governed by eˆ1 which suits case 2.

• gsmall - small isotropic filter which suits case 3.

The adaptive filter is given by

gadap = ||C|| · ((1 − γ2/γ1) · ganiso + (γ2/γ1) · gsmall) + (1 − ||C||) · glarge.    (2.12)

The parameters γk can be found by

Tlp = T ∗ g,    (2.13)
γ1 = m(||Tlp||, σ, α, β),    (2.14)
γk = γ1 ∏l=2..k µ(λl/λl−1, αl, βl),  k = 2, ..., N,    (2.15)

where g is a Gaussian low pass filter and N is the number of dimensions. The m and µ functions are used to control the transition between cases 1-2 and between cases 2-3. Examples of the functions m and µ and the purpose of the parameters σ, α and β are not given in [4].

2.1.3 Landmark based registration

Landmark based image registration is a sparse method that is based on anatomical landmarks, which are a set of interest points that can be found in most human beings. An image that illustrates a set of landmarks and their correspondence between two images can be seen in figure 2.2. The content in this section is primarily based on [12] unless anything else is specified.

The idea is to find a transformation T that minimizes the distance between the landmarks in the template and reference image. Since the correspondence between one point in the template image and one point in the reference image is known, there is no need to introduce a similarity measure. Which transformation minimizes the distance between the landmarks depends on the constraints of the transformation. As explained in section 2.1.1, such constraints could be to allow a translation of the points or a rigid, similarity, affine or non-linear transformation.

An important detail when describing landmark based registration is how to actually find the coordinates of the landmarks. The simplest method is to manually locate each landmark in the images. This can be an accurate but time consuming approach that is not easily scalable. To overcome the problem of manual placement, a classifier can be trained to find each landmark, which would make this approach scalable. An example of how a convolutional neural network can be trained to detect landmarks on the distal femur in 3D MR images is presented in [23].
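With landmark correspondences given, fitting an affine transform that minimizes the squared landmark distances is an ordinary least squares problem. The sketch below uses synthetic landmarks; the helper name is illustrative.

```python
import numpy as np

# Sketch: with known landmark correspondences, the affine transform that
# minimizes the squared landmark distances follows from least squares.
# The landmark coordinates below are synthetic.

def fit_affine(src, dst):
    # src, dst: (n, 2) corresponding landmark coordinates, n >= 3.
    n = src.shape[0]
    X = np.hstack([src, np.ones((n, 1))])   # homogeneous coordinates
    # Solve X @ M ~= dst for the (3, 2) matrix M in least squares.
    M, *_ = np.linalg.lstsq(X, dst, rcond=None)
    return M

src = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
dst = src * 2.0 + np.array([1.0, -1.0])     # known scaling + translation
M = fit_affine(src, dst)
mapped = np.hstack([src, np.ones((4, 1))]) @ M
```

Since the correspondences here are generated by an exact affine map, the fit recovers it and maps the source landmarks onto the targets.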

Figure 2.2: Example of correspondences between landmarks in two images. The green circles indicate the landmarks and the red lines in-between them determine their correspondence.

2.2 Distance measures

Whether a transformation applied to IT makes it similar to IR depends primarily on which distance measure is used. This section gives an introduction to the most common distance measures used in image registration based on [12], [3] and [13] unless anything else is specified. Some measures are simple and have a low computational cost, while other methods can handle more complex contexts but come at a higher computational cost. The transformation in this section will be regarded as a displacement field, u.

Before going through the different methods, the topic of multi-modality has to be mentioned, since some distance measures can handle this situation better than others. Multi-modality refers to images generated by different settings of an MRI scan or by different scanning methods (e.g. MRI and CT), which vary in intensity and tissue contrast. To name one example, there can be an additive or a multiplicative intensity offset between the images. This makes image registration a much more complex problem since simple methods, which assume that similar structures have similar intensity values, won't work in this case. In figure 2.3, a T1 and a T2 weighted MRI scan can be seen. Note how they have similar shapes and structures, but completely different intensity levels in some areas. Since the focus of this master thesis is on image registration, T1 and T2 weighted MRI scanning will not be explained. It should also be mentioned that this thesis only considers mono-modal image registration.

Figure 2.3: The image to the left is a T1 weighted MR image and the image to the right is a T2 weighted MR image. Image source: Wikipedia [7].

2.2.1 SSD

One well known intensity based method is the sum of squared differences (SSD),

DSSD(IR, IT, u) = ∫Ω (IR(x) − IT(x + u(x)))² dx,    (2.16)

where Ω is the image domain, x is a voxel coordinate and u(x) is the displacement for that particular voxel.

This distance measure assumes that the intensities match when the correct transformation is applied to the template image. It works well for mono-modal image registration but not for multi-modal registration since it is intensity based.
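A discrete sketch of (2.16): warp the template with a displacement field and sum the squared voxel differences. Names are illustrative, and the voxel sum stands in for the integral over Ω.

```python
import numpy as np
from scipy.ndimage import map_coordinates

# Sketch of (2.16): SSD between a reference image and a template warped
# by a displacement field u (one component per axis, 2D).

def ssd(ref, tpl, u_y, u_x):
    yy, xx = np.meshgrid(np.arange(ref.shape[0]),
                         np.arange(ref.shape[1]), indexing="ij")
    warped = map_coordinates(tpl, [yy + u_y, xx + u_x],
                             order=1, mode="nearest")
    return np.sum((ref - warped) ** 2)

ref = np.zeros((8, 8)); ref[3, 3] = 1.0
tpl = np.zeros((8, 8)); tpl[4, 4] = 1.0        # same point, shifted (1, 1)
zero = np.zeros_like(ref)
d_identity = ssd(ref, tpl, zero, zero)              # misaligned
d_shifted = ssd(ref, tpl, zero + 1.0, zero + 1.0)   # correcting shift
```

The distance drops to zero when the field compensates for the shift between the images.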

2.2.2 Normalized Cross Correlation

The cross-correlation CC between a template image IT and reference image IR is defined as

CC(IR, IT, u) = ⟨IR, IT(u)⟩ = ∫Ω IR(x) IT(x + u(x)) dx.    (2.17)

Normalized cross-correlation is given by

NCC(IR, IT, u) = ⟨IR, IT(u)⟩ / √(⟨IR, IR⟩ ⟨IT(u), IT(u)⟩),    (2.18)

and the corresponding distance measure is defined as

DNCC(IR, IT, u) = 1 − NCC(IR, IT, u)².    (2.19)

This approach can handle images where the intensities are related by a multiplicative factor because of the normalization in (2.18).
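The invariance to a multiplicative intensity factor can be verified numerically with a small sketch of (2.18)-(2.19), with the warp already applied; the square root reconstruction of the denominator normalizes the correlation to [−1, 1].

```python
import numpy as np

# Sketch of (2.18)-(2.19): normalized cross-correlation and the distance
# 1 - NCC^2 for two already-warped images.

def ncc(a, b):
    return np.sum(a * b) / np.sqrt(np.sum(a * a) * np.sum(b * b))

def d_ncc(a, b):
    return 1.0 - ncc(a, b) ** 2

rng = np.random.default_rng(0)
img = rng.random((16, 16))
d_same = d_ncc(img, 3.0 * img)        # multiplicative offset: distance 0
d_other = d_ncc(img, rng.random((16, 16)))
```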

2.2.3 Normalized Gradient Field

The normalized gradient field, NGF, was first presented in [6] and the following content is based on that article. The equation for NGF is given by

NGF(I) = ∇I / √(∇I^T ∇I + ε²),    (2.20)

where ∇I is the image gradient and ε is a parameter which makes the distance measure robust against noise. It is included to indicate the smallest change in the image I that should be interpreted as a distinct edge. This parameter is given by

ε = (η / V) ∫Ω |∇I(x)| dx,    (2.21)

where η is the estimated noise level in the image and V is the volume of the image domain. The corresponding distance measure to NGF is given by

DNGF(IR, IT, u) = ∫Ω 1 − (NGF(IR(x))^T NGF(IT(x + u(x))))² dx.    (2.22)

There are two reasons why this is a powerful distance measure. First, the gradient image is normalized, which solves the problem of a multiplicative intensity difference between the images. Second, since gradients are computed by voxel differences, any additive intensity difference is canceled by that operation.
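The sketch below implements (2.20)-(2.22) for two already-aligned 2D images, using a fixed `eps` rather than the noise estimate (2.21). Two images that share an edge, even under an affine intensity change, score a lower distance than structurally unrelated images.

```python
import numpy as np

# Sketch of (2.20)-(2.22): normalized gradient fields and the NGF
# distance for two already-aligned images (the warp is omitted).
# eps plays the role of the noise parameter epsilon.

def ngf(img, eps=1e-2):
    g = np.stack(np.gradient(img))                 # (2, H, W) gradients
    return g / np.sqrt(np.sum(g * g, axis=0) + eps ** 2)

def d_ngf(ref, tpl, eps=1e-2):
    inner = np.sum(ngf(ref, eps) * ngf(tpl, eps), axis=0)
    return np.sum(1.0 - inner ** 2)

yy, xx = np.meshgrid(np.arange(32), np.arange(32), indexing="ij")
ref = (xx > 16).astype(float)             # a vertical edge
tpl = 5.0 * ref + 2.0                     # same edge, affine intensities
d_related = d_ngf(ref, tpl)
d_flat = d_ngf(ref, np.zeros_like(ref))   # no shared structure
```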

2.2.4 Mutual Information

Mutual information, MI, is a distance measure based on statistical information rather than spatial information [5]. The distance measure is given by

DMI(IR, IT, T) = ∫R pR log(pR) dr + ∫R pT log(pT) dt − ∫R² pR,T log(pR,T) dr dt,    (2.23)

where pR and pT are the marginal densities and pR,T is the joint density over intensity values. This is the only distance measure presented in this thesis which can handle a non-linear mapping of intensities.
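In practice, the densities in (2.23) are often estimated from a joint histogram. The sketch below computes mutual information this way and illustrates the point about non-linear intensity mappings; the function name and bin count are illustrative.

```python
import numpy as np

# Sketch: mutual information from a joint histogram, a discrete stand-in
# for the densities in (2.23). MI stays high under a non-linear but
# deterministic intensity mapping.

def mutual_information(a, b, bins=16):
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    p_ab = joint / joint.sum()
    p_a = p_ab.sum(axis=1, keepdims=True)     # marginal of a
    p_b = p_ab.sum(axis=0, keepdims=True)     # marginal of b
    nz = p_ab > 0
    return np.sum(p_ab[nz] * np.log(p_ab[nz] / (p_a @ p_b)[nz]))

rng = np.random.default_rng(0)
img = rng.random((64, 64))
mi_nonlinear = mutual_information(img, img ** 3)       # related images
mi_unrelated = mutual_information(img, rng.random((64, 64)))
```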

2.3 Image registration as an optimization problem

As mentioned in section 2.1, image registration aims to minimize a cost function under certain constraints. How this cost function and constraints are formulated may vary between applications and will heavily affect the end result. A naive approach would be to let all voxels be displaced separately and to use SSD from (2.16) as a distance measure

ε(u) = DSSD(IR, IT, u) = ∫Ω (IR(x) − IT(x + u(x)))² dx.    (2.24)

In this case, the error function ε would be easy to minimize and the cost would probably be low. However, the displacement field would not be particularly accurate or unique.

2.3.1 Regularization methods In order to improve the probability of finding an accurate and plausible displace- ment field, i.e. making the problem well-posed, a regularization term R can be added to the cost function

ε(u) = DSSD(IR, IT, u) + R(u).    (2.25)

This section will briefly explain the most common regularization terms, i.e. diffusion, curvature, elastic and fluid, based on [3], [13], [2] and [12].

Diffusion

Diffusion regularization aims to penalize inhomogeneities in the displacement field by, in each dimension, taking the sum of the norm of the gradient of u. The equation is given by

R(u) = (1/2) ∫Ω Σi=1..d ||∇ui||² dx,    (2.26)

where i indicates the dimension of u and d is the number of dimensions. Note that (2.26) is equal to zero if the transformation corresponds to a translation.
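A discrete sketch of (2.26), summing over voxels: a pure translation yields a constant field, zero gradients, and hence zero penalty, while an inhomogeneous field is penalized. Names are illustrative.

```python
import numpy as np

# Sketch of (2.26): the diffusion regularizer as a sum over voxels.

def diffusion_penalty(u):
    # u: (d, H, W) displacement field with d components.
    total = 0.0
    for ui in u:
        total += 0.5 * sum(np.sum(g ** 2) for g in np.gradient(ui))
    return total

translation = np.full((2, 16, 16), 3.0)    # constant field: a translation
rng = np.random.default_rng(0)
noisy = rng.normal(size=(2, 16, 16))       # inhomogeneous field
r_translation = diffusion_penalty(translation)
r_noisy = diffusion_penalty(noisy)
```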

Curvature

Contrary to diffusion regularization, curvature regularization allows more complex deformations by taking the Laplacian instead of the gradient of the displacement field. As long as the deformation is affine, the regularization term will be equal to zero. The equation is given by

R(u) = (1/2) ∫Ω Σi=1..d (∆ui)² dx,    (2.27)

where ∆ is the Laplace operator.

Elastic

Elastic regularization has a physical interpretation where an elastic material’s energy potential is minimized. The equation is given by

R(u) = (1/2) ∫Ω µ ⟨∇u, ∇u⟩ + (λ + µ)(∇ · u)² dx,    (2.28)

where λ and µ are the so-called Lamé constants.

Fluid

The equation for fluid regularization is the same as for elastic regularization except that it is applied to the velocity field, e.g. the incremental displacement field δu.

2.4 Methods

This section describes the general theory behind the methods that were evaluated in this thesis. The first two methods, the Morphon and the Demon, are both well known non-parametric methods. The last method, Free Form Deformation, is a typical non-linear and grid based parametric method.

2.4.1 Morphon

The Morphon algorithm was first introduced in [9] and the content in this section is based on that article and [3]. One of the ideas is to estimate the local displacement u by computing the phase difference between the template and reference image. This can be done by convolving both images with a set of N quadrature filters {fk}k=1..N in N different directions {n̂k}k=1..N. Consequently the filter responses, {qTk}k=1..N and {qRk}k=1..N, for the template and reference image are given by

qTk = IT ∗ fk,    (2.29)
qRk = IR ∗ fk,    (2.30)

where fk is a quadrature filter. In practice, fk is represented by two filters, one that represents the real part and one that represents the imaginary part of the complex filter. An illustration of how the phase of a quadrature response is related to different types of edges can be seen in figure 2.4.

Figure 2.4: Illustration of the phase ϕ from a quadrature response. Image source: [1]

The conjugate product of the filter responses can be denoted as

Qk = qTk q*Rk,    (2.31)

and the phase differences pk can then be computed by taking the argument of the conjugate product of the filter responses

pk = arg(Qk).    (2.32)

In order to increase the robustness against noise, the local phase differences can be weighted by the magnitude of the conjugate product

ck = |Qk|,    (2.33)

where c stands for certainty. A clear edge, for instance, will result in a large Qk while noise will give a smaller response. The certainties are used to weight the displacements u when estimating the displacement field (2.35). They can also be further regulated by incorporating the phase difference,

ck = |Qk|^(1/2) cos²(pk/2),    (2.34)

where small differences in phase will give a higher certainty. The simplest way to estimate the displacement, which was presented in [19], is by a weighted sum given by

u = (Σk=1..N ck pk n̂k) / (Σk=1..N ck),    (2.35)

where ck is the certainty and pk is the phase difference in the direction of n̂k.

As explained in section 2.1.2 and suggested in [9], the displacement field should be accumulated incrementally while iterating from coarse to fine scale. The accumulated displacement u and certainties c are then updated with

u = (c u + δc (u + δu)) / (c + δc),    (2.36)
c = (c² + δc²) / (c + δc),    (2.37)

where δc is the incremental certainty and δu is the incremental displacement on a specific scale. Updating the displacement field with the help of the certainty map characterizes the Morphon algorithm and is one of the benefits of using the local phase as a distance measure. Another advantage of the local phase difference is its invariance to any intensity differences between the template and reference images.
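The certainty-weighted updates (2.36)-(2.37) are element-wise and can be sketched on toy arrays; the function name and the small `eps` guard against division by zero are illustrative additions.

```python
import numpy as np

# Sketch of the certainty-weighted updates (2.36)-(2.37), element-wise.

def morphon_update(u, c, du, dc, eps=1e-12):
    u_new = (c * u + dc * (u + du)) / (c + dc + eps)
    c_new = (c ** 2 + dc ** 2) / (c + dc + eps)
    return u_new, c_new

u = np.array([1.0, 1.0])
c = np.array([1.0, 0.0])       # second voxel: no prior certainty
du = np.array([0.5, 0.5])
dc = np.array([1.0, 1.0])
u_new, c_new = morphon_update(u, c, du, dc)
# Where c = 0 the increment is fully trusted: u_new[1] = u + du = 1.5.
```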

Since the registration problem is ill-posed and there exist multiple possible displacements, the field can be regularized by normalized convolution, where the certainties are incorporated,

δu = ((δu δc) ∗ g) / (δc ∗ g), (2.38)

where g is a Gaussian low-pass filter.

2.4.2 Demon

The Demon algorithm, first described in [21] and [20], was inspired by the concept of optical flow to find a displacement u of the template image. The content in this section is based on those articles and [3].

Optical flow refers to the motion between two images taken with a small time difference ∆t apart. Assuming that the brightness in the two images is constant, the following equation must hold,

I(x, t) = I(x + ∆x, t + ∆t). (2.39)

A first order Taylor expansion of (2.39) yields

∇I^T ∆x = I_t, (2.40)

where I_t is the derivative of I with respect to t and ∇I is the spatial gradient of the image. Since the solution to (2.40) is not unique, there exist many solutions, where the simplest is given by

∆x = (I_t / ||∇I||²) ∇I, (2.41)

which also served as an inspiration to the Demon algorithm. Since I_t denotes the change over time and I_R and I_T can be seen as two images taken one time step apart, I_t can be replaced by I_R − I_T and ∆x by u in (2.41), which gives

u = ((I_R − I_T) / ||∇I_R||²) ∇I_R. (2.42)

A modified version, which is more stable for small values of ∇I_R, is given by

u = ((I_R − I_T) / (||∇I_R||² + (I_R − I_T)²)) ∇I_R. (2.43)

Ref. [20] suggests the use of a multiscale implementation, similar to the Morphon algorithm, for several reasons. First of all it speeds up the implementation; the additional computational cost of performing the computations on all coarser scales is 1/3 of the computational cost of the finest scale. In addition, the convergence is faster. Finally, macroscopic features of human anatomy are more stable.

Since (2.43) is not a unique solution, it is necessary to regularize the incremental displacement field δu with a Gaussian low pass filter,

δu = δu ∗ g, (2.44)

for each iteration on each scale [3].
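One incremental Demon step, combining the stabilized force (2.43) with the Gaussian regularization (2.44), can be sketched as below. This is a simplified illustration and not the thesis implementation; the function name, the `sigma` parameter and the division-by-zero guard are assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def demon_increment(I_R, I_T, sigma=1.0):
    """One incremental Demon displacement estimate, following (2.43)
    regularized as in (2.44). Returns one displacement component per
    image dimension. The (I_R - I_T)^2 term in the denominator keeps
    the update bounded where the gradient of I_R is small."""
    diff = I_R - I_T
    grads = np.gradient(I_R)                    # spatial gradient of the reference
    denom = sum(g**2 for g in grads) + diff**2  # ||grad I_R||^2 + (I_R - I_T)^2
    denom[denom == 0] = 1e-12                   # avoid division by zero
    # force field (2.43), smoothed with a Gaussian low-pass filter (2.44)
    return [gaussian_filter(diff * g / denom, sigma) for g in grads]
```

When the two images are identical, the image difference vanishes and the increment is zero everywhere, as expected.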

2.4.3 Free Form Deformation

Free Form Deformation (FFD) is a well-known non-linear, grid-based method similar to the thin-plate splines described in section 2.1.1. The following content is based on [18] and [13].

The name Free Form Deformation is slightly misleading, since the deformation is still governed by a parametric transformation under a few constraints, as opposed to non-parametric deformations. The idea is to place an evenly spaced mesh (n_x × n_y × n_z) of control points in the template image and find an affine transformation for these points. The deformations in between these points are found by non-linear interpolation using 1D cubic B-splines. The deformation of a single voxel is given by

T(x, y, z; p) = Σ_{k=0}^3 Σ_{m=0}^3 Σ_{n=0}^3 B_k(u_x) B_m(v_y) B_n(w_z) p_{i+k, j+m, l+n}, (2.45)

where x, y and z are the coordinates of a voxel, i = ⌊x/n_x⌋ − 1, j = ⌊y/n_y⌋ − 1, l = ⌊z/n_z⌋ − 1 and p are the control points. The local B-spline coordinates (u_x, v_y, w_z) are given by

u_x = x/n_x − ⌊x/n_x⌋, (2.46)
v_y = y/n_y − ⌊y/n_y⌋, (2.47)
w_z = z/n_z − ⌊z/n_z⌋, (2.48)

and the B-spline basis functions are given by

B_0(u) = (1 − u)³/6, (2.49)
B_1(u) = (3u³ − 6u² + 4)/6, (2.50)
B_2(u) = (−3u³ + 3u² + 3u + 1)/6, (2.51)
B_3(u) = u³/6. (2.52)
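The basis functions (2.49)-(2.52) translate directly into code; the function name below is a hypothetical choice for illustration.

```python
def bspline_basis(u):
    """Cubic B-spline basis functions B_0..B_3 from (2.49)-(2.52),
    evaluated at a local coordinate u in [0, 1)."""
    return (
        (1 - u) ** 3 / 6,                       # B_0, (2.49)
        (3 * u**3 - 6 * u**2 + 4) / 6,          # B_1, (2.50)
        (-3 * u**3 + 3 * u**2 + 3 * u + 1) / 6, # B_2, (2.51)
        u**3 / 6,                               # B_3, (2.52)
    )
```

The four values always sum to one (partition of unity), so the interpolated displacement in (2.45) is a convex blend of the 4 × 4 × 4 nearest control points.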

There are some similarities but also some differences between FFD and TPS. Both methods aim to find an affine transformation of the control points. The difference, however, lies in which method is used to interpolate the displacements between the control points and in how many of the control points affect each interpolation. In TPS, the deformation of all control points impacts the deformation of all other voxels, which means that they have global support. In FFD, the deformation of a voxel is only affected by the nearest control points, which means that they have local support. The consequence is that FFD allows for more complex local deformations, while TPS tends to be more robust.

2.5 Evaluation

In order to determine how well an image registration method works, some kind of similarity metric has to be used. The first type of metric, body composition measurements, evaluates the effect of the image registration. The second type, segmentation metrics, evaluates how well a deformed mask pertaining to a template image overlaps with the ground truth mask pertaining to the reference image.

2.5.1 Body Composition Measurements

Given a registered and segmented image, it is possible to compute the body composition measurements mentioned in section 1.1. The muscle measurements below are given by [8]. The muscle volume is given by

MV = Σ_{∀voxels} M_muscle · M_body · V_voxel, (2.53)

where M_muscle is a deformed binary muscle mask pertaining to the template image, M_body is a binary body mask and V_voxel is the volume of a voxel connected to the reference image. The fat-free muscle volume MV_FF is given by

MV_FF = Σ_{∀voxels} (f < 0.5) · M_muscle · M_body · V_voxel, (2.54)

where f is the fat concentration in a particular voxel connected to the reference image. This means that each voxel within M_muscle with a fat concentration less than 0.5 is considered to be a fat-free muscle voxel.

Estimation of VAT and ASAT can be done in a similar manner,

VAT = Σ_{∀voxels} f · M_VAT · M_body · V_voxel, (2.55)
ASAT = Σ_{∀voxels} f · M_ASAT · M_body · V_voxel, (2.56)

where M_VAT and M_ASAT are the binary VAT and ASAT masks.
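The volume sums (2.54) and (2.55) reduce to a masked sum scaled by the voxel volume, as the following sketch shows. The function names are assumptions; the masks are binary arrays of the same shape as the fat-concentration image f.

```python
import numpy as np

def vat_volume(f, M_vat, M_body, V_voxel):
    """Fat volume inside the VAT region, following (2.55): the fat
    concentration f is summed over voxels inside both masks and
    scaled by the voxel volume."""
    return np.sum(f * M_vat * M_body) * V_voxel

def fat_free_muscle_volume(f, M_muscle, M_body, V_voxel):
    """Fat-free muscle volume, following (2.54): only voxels with a
    fat concentration below 0.5 count as muscle."""
    return np.sum((f < 0.5) * M_muscle * M_body) * V_voxel
```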

Bland-Altman plot

As a complement to the standard statistics like the mean µ, standard deviation σ, median x_med, minimum x_min, maximum x_max and range x_range of a sample set, a Bland-Altman plot can be used to determine the relation between two different methods. Consider two sets of n samples (e.g. n different image volumes), {s_1^1, ..., s_1^n} and {s_2^1, ..., s_2^n}, where the first set contains the measurements of the first method and the second set contains the measurements of the second method. For the ith measurement, the average s_av^i and difference s_diff^i between the two methods are given by

s_av^i = (s_1^i + s_2^i) / 2, (2.57)
s_diff^i = s_1^i − s_2^i. (2.58)

The sample mean difference, also known as the bias µ̂_diff, and the sample standard deviation of the difference σ̂_diff are given by

µ̂_diff = (1/n) Σ_{i=1}^n s_diff^i, (2.59)
σ̂_diff = sqrt( (1/(n−1)) Σ_{i=1}^n (s_diff^i − µ̂_diff)² ). (2.60)

In the Bland-Altman plot, the differences {s_diff^1, ..., s_diff^n} are plotted against the averages {s_av^1, ..., s_av^n} as a scatter plot for the n measurements. In addition, the mean difference µ̂_diff is visualized as a straight line and the standard deviation of the difference σ̂_diff is used to visualize the 95% confidence interval, assuming that the differences follow a Gaussian distribution. An example of a Bland-Altman plot can be seen in figure 2.5.

Figure 2.5: The green dots represent each sample in a dataset. The samples are located according to the average (2.57) and difference (2.58) between the two methods. The blue line (bias) is given by (2.59). The red dotted lines are given by µ̂_diff ± 1.96σ̂_diff, which represents the 95% confidence interval for a Gaussian distribution.
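The quantities (2.57)-(2.60) and the 1.96σ limits can be computed in a few lines; the function name is a hypothetical choice, and plotting is left out.

```python
import numpy as np

def bland_altman(s1, s2):
    """Bland-Altman statistics for two measurement series: per-sample
    averages (2.57) and differences (2.58), the bias (2.59), and the
    95% limits mu +/- 1.96*sigma using the sample standard deviation
    of the differences (2.60)."""
    s1, s2 = np.asarray(s1, float), np.asarray(s2, float)
    avg = (s1 + s2) / 2
    diff = s1 - s2
    mu = diff.mean()
    sigma = diff.std(ddof=1)  # (n - 1) in the denominator, as in (2.60)
    return avg, diff, mu, (mu - 1.96 * sigma, mu + 1.96 * sigma)
```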

2.5.2 Segmentation metrics

To determine how accurately the algorithms register a reference image, a few well-known similarity metrics are presented in this section. These metrics are not, by definition, specific to image registration and segmentation, but are applicable in this context. To understand the similarity metrics, the concepts of true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN) have to be introduced. In the field of 3D image segmentation they can be defined in the following ways:

• True positives, TP, refer to the sum of voxels which are correctly labelled as foreground.

• False positives, FP, refer to the sum of voxels which are falsely labelled as foreground.

• True negatives, TN, refer to the sum of voxels which are correctly labelled as background.

• False negatives, FN, refer to the sum of voxels which are falsely labelled as background.

The number of true positives or false negatives in an image provides, by itself, very little information. However, the relationship between these measurements does. As an example, if the number of true positives in an image is 500, is that a good or a bad result? If the number of false positives is 4 and the number of false negatives is 6, it is reasonable to say that the answer is yes. On the other hand, if the number of false positives is 400 and the number of false negatives is 600, it is likely that the answer is no.

There are many ways to relate these labels to each other, and the following content introduces a few ways to determine the similarity between a deformed template image and a reference image. In order to fully understand the concepts of TP, FP, TN and FN and the similarity measurements described below, an image that illustrates these concepts is provided in figure 2.6.

Precision

Precision is a measure which describes how accurately a method determines true positives within the set of all attempts to do so. When a method gives a positive answer, the precision measurement informs us of how likely the answer is to be correct. The equation is given by

Precision = TP / (TP + FP). (2.61)

Images that illustrate how the precision measurement is affected in different situations can be seen in figure 2.6.

Recall

Recall is a measure which describes how likely a method is to determine all true positives, regardless of false positives. The equation is given by

Recall = TP / (TP + FN). (2.62)

Images that illustrate how the recall measurement is affected in different situations can be seen in figure 2.6.

Figure 2.6: This image illustrates the back and front leg muscles in the coronal plane. The yellow pixels represent true positives, magenta represents false positives and cyan represents false negatives. In the left image, Precision < 1.0 and Recall = 1.0. In the right image, Precision = 1.0 and Recall < 1.0. Image source: AMRA.

Specificity

Specificity is a measure which describes how accurately a method determines true negatives within the set of all attempts to do so. When a method gives a negative answer, the specificity measurement informs us of how likely the answer is to be correct. The equation is given by

Specificity = TN / (TN + FP). (2.63)

Dice

A common similarity metric, explained in [22], is the Dice Similarity Coefficient, given by

DSC = 2TP / (2TP + FP + FN) = 2 · ||Ω_IR ∩ Ω_IT|| / (||Ω_IR|| + ||Ω_IT||), (2.64)

where Ω_IR and Ω_IT are the regions of the reference and template images. The advantage of Dice compared to precision and recall is that both false positives and false negatives are included in the metric. In the images illustrated in figure 2.6, the Dice Similarity Coefficient will accurately describe the relation between true positives and the sum of false positives and false negatives. The downside of Dice is that it does not describe which type of error, false positives or false negatives, occurs the most.
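The four metrics (2.61)-(2.64) can be computed directly from the voxel counts; the sketch below uses the worked example from the text (500 true positives, 4 false positives, 6 false negatives), and the function name is an assumption.

```python
def segmentation_metrics(tp, fp, tn, fn):
    """Precision (2.61), recall (2.62), specificity (2.63) and the Dice
    Similarity Coefficient (2.64) from the four voxel counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    dice = 2 * tp / (2 * tp + fp + fn)
    return precision, recall, specificity, dice
```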

3 Method

This chapter describes the methods used to answer the problems formulated in section 1.2. First, the Free Form Deformation (FFD) and Demon algorithms were compared to the Morphon algorithm. Second, the Demon algorithm was further investigated.

Section 3.1 describes the exact variants chosen for the Morphon, FFD and Demon algorithms. Section 3.2 describes the further investigations with the Demon algorithm. Section 3.3 describes the regions that are of interest to segment. Section 3.4 describes the datasets and the properties of the templates (prototypes) and reference images. Section 3.5 describes how the prototypes were registered to the reference image, how the most appropriate prototypes are selected and how the segmentation of different tissues is performed. Section 3.6 explains the evaluation and its corresponding measurements.

3.1 Algorithm comparison

Before presenting the algorithms, it is important to mention that FFD and Demon both measure voxel intensity differences. It was possible to use this distance measure because the images were normalized, as mentioned in section 1.1.

All algorithms evaluated in this thesis used a multiscale approach where the final displacement was found by iterating the algorithms several times on several scales and accumulating the displacement field accordingly. When describing an algorithm, a table is included which contains the parameters for the particular implementation used in this thesis. The parameter downsampling factor is a scalar which determines how much the images were downsampled for each scale. The start scale determines how many times the images were downsampled at the scale where the first iterations were made, and the end scale determines on which scale the last iterations were made. The finest scale, which is the original size of the image, is equal to 0. In order to avoid any confusion, an example is provided:

Assume that the start scale is 3, the end scale is 0 and the downsampling factor between one scale and the next is 2. The scale where the first iterations are made is generated by downsampling the original image by a factor 2³ = 8, the second scale by a factor 2² = 4, and so on. The final scale is generated by downsampling the original image by 2⁰ = 1, which is the finest scale, i.e. the original image itself.
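The example above can be sketched as a one-line helper that lists the coarse-to-fine downsampling factors; the function name is a hypothetical choice.

```python
def scale_factors(start_scale, end_scale, downsampling_factor=2):
    """Downsampling factors for a coarse-to-fine multiscale scheme, as
    in the example: start scale 3, end scale 0 and factor 2 give
    8, 4, 2, 1 (the last factor 1 is the original image)."""
    return [downsampling_factor**s for s in range(start_scale, end_scale - 1, -1)]
```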

A flowchart which describes the different steps in the registration process can be seen in figure 3.1. The results from this part are found in section 4.1.

Figure 3.1: A flowchart that describes the different steps in the registration process.

3.1.1 Morphon

The Morphon was regarded as the baseline method of this thesis and was included to evaluate the results of FFD and Demon. The Morphon algorithm is implemented by AMRA in Python and constructed as described in section 2.4.1, where the certainties were computed as in (2.34). The specific parameters of their implementation are not specified in this thesis.

3.1.2 Free form deformation

Section 2.4.3 explains the general idea of FFD and how to find the transformation in-between the control points. This section describes the method used to find the displacements of the control points.

As mentioned in section 2.4.3, the idea of the FFD algorithm is to find an affine transformation of the control points and then interpolate the displacement for the remaining voxels with cubic B-splines. In order to find an accurate transformation which remains affine for the control points, the problem was formulated as a minimization problem with SSD (2.16) as a distance measure and curvature regularization (2.27) of the control points.

SSD was used since the images are normalized and any other distance measure would only cost more in terms of computational speed. As explained in section 2.3.1, curvature regularization implicitly adds a cost for non-affine transformations of the control points. This does not necessarily mean that the transformation will be affine, only that the regularization term penalizes non-affine transformations.

The cost function is given by

min_{u_1,...,u_n} (1/2) ∫_Ω (I_R(x) − I_T(x + u(x)))² dx + (λ/2) Σ_{i=1}^n (∆u_i)², (3.1)

where u(x) is the displacement of a voxel, λ is a weight that controls the regularization, u_i is the displacement of the ith control point and n is the number of control points.

The optimal solution to (3.1) was found with the implicit Euler method. A list of the parameters used for the FFD algorithm in this thesis is given in table 3.1. The implementation used for the FFD algorithm is a part of the Medical Image Registration Toolbox (MIRT), created by Andriy Myronenko in Matlab [14].

Table 3.1: Parameters for the FFD algorithm

Parameter               Value
Downsampling factor     2
Start scale             2
End scale               0
Iterations per scale    [100 100 100]
Control point distance  8 voxels
λ                       0.01

3.1.3 Demon

The Demon algorithm that was evaluated had the following specifics. The incremental displacement field estimate δu was given by

δu = ((I_R − I_T) / (||∇I_R||² + (I_R − I_T)²)) ∇I_R, (3.2)

as explained in section 2.4.2. The gradient was computed with the MATLAB® function gradient, which gives the numerical gradient. The displacement field u and the incremental displacement field δu were accumulated with a simple sum,

u ← u + δu, (3.3)

as explained in section 2.1.2. The displacement field u and the incremental displacement field δu were regularized with normalized convolution,

u = ((u c) ∗ g) / (c ∗ g), (3.4)
δu = ((δu c) ∗ g) / (c ∗ g), (3.5)

where g is a Gaussian low-pass filter. The certainty field is given by the gradient magnitude of the reference image,

c = |∇I_R|, (3.6)

which is constant for each scale. A list of the parameters used for the Demon algorithm is given in table 3.2. The implementation of the Demon algorithm is a part of the Fordanic/Image-registration Toolbox created by Daniel Forsberg in Matlab [3].

Table 3.2: Parameters used for the Demon algorithm. Note that the last value of the parameter iterations per level is the number of iterations for the end scale.

Parameter              Value
Downsampling factor    2
Start scale            3
End scale              0
Iterations per scale   [50 50 50 25]
Gaussian filter size   5 x 5 x 5
Gaussian filter sigma  1 voxel
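The certainty-weighted regularization (3.4)-(3.5) can be sketched with a generic normalized-convolution helper; this is an illustration rather than the Fordanic toolbox code, and the function name, the `sigma` parameter and the zero-certainty guard are assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def normalized_convolution(field, certainty, sigma=1.0):
    """Certainty-weighted smoothing of one displacement component,
    following (3.4)-(3.5): the certainty-weighted field and the
    certainty itself are both low-pass filtered, and their ratio is
    taken. The certainty c could be |grad I_R| as in (3.6)."""
    num = gaussian_filter(field * certainty, sigma)
    den = gaussian_filter(certainty, sigma)
    return num / np.maximum(den, 1e-12)  # guard against zero certainty
```

A useful sanity check: a constant field passes through unchanged for any positive certainty, since the weighting cancels in the ratio.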

3.2 Further investigation

In the first part of the thesis, the water images were used to register the prototypes to the reference images for each algorithm. In this part, the effect of including the fat images in the registrations was evaluated. This modification is applicable to all the algorithms presented so far, and the Demon algorithm (with the water images as input) was chosen as the basis algorithm since it is the fastest and easiest to modify.

The first change was to simply register the prototypes to the reference images by using the fat instead of the water images. The parameters used were the same as for the Demon algorithm with the water images as input which can be seen in table 3.2.

The second change was to combine the water and fat images in the registrations. This was done by first using the fat images on scales 3, 2 and 1, followed by the water images on scales 1 and 0. A list of the number of iterations done on each scale is given in table 3.3. The remaining parameters remained the same as in table 3.2.

The results from this part are found in section 4.2.

Table 3.3: Parameters used for the Demon algorithm when both the fat and water images were used in the registrations.

Parameter                   Value
Start scale fat             3
End scale fat               1
Iterations per scale fat    [50 50 50]
Start scale water           1
End scale water             0
Iterations per scale water  [50 25]

3.3 Region definitions

In order to understand the method by which the prototypes are registered to a reference image, and how this leads to a segmentation of the reference image, it is important to describe which regions are of interest to segment and how they are defined. Why these regions are of particular interest and why the muscles are grouped in a particular way are not relevant for this thesis; the definitions of these regions, however, are. The definitions are given below, and images that illustrate these regions can be seen in figure 3.2. Note that there is a difference between VAT/ASAT measurements and VAT/ASAT regions. The measurements determine the total fat volume within a particular region.

VAT region

The region that determines where the visceral adipose tissue is located is defined as the region inside the peritoneum where the upper boundary is the diaphragm and the lower boundary is determined by the hip bone and sacrum.

ASAT region

The region that determines where the abdominal subcutaneous adipose tissue is located is defined by all the tissue outside the abdominal muscles, where the upper boundary is the 9th thoracic vertebra and the lower boundary is the top of the femoral head.

Left/Right Upper Leg Front

These regions, LULF and RULF, are defined as the union of the muscles sartorius, tensor fasciae latae and quadriceps femoris in the left and right leg.

Left/Right Upper Leg Back

These regions, LULB and RULB, are defined as the union of the muscles gluteus minimus, medius and maximus, semimembranosus, semitendinosus and biceps femoris in the left and right leg.

Figure 3.2: The image to the left illustrates the upper leg masks. Blue pixels represent left upper leg back, yellow represents left upper leg front, magenta represents right upper leg back and cyan represents right upper leg front. The image to the right illustrates the VAT and ASAT regions, where red pixels represent VAT and blue pixels represent ASAT. Image source: AMRA.

3.4 Dataset

This section describes the datasets used as template and reference images in this thesis. From now on, the template images are referred to as prototypes, since that is the naming convention in the context of full body image registration.

The prototypes and reference images were taken from the UK bio-bank which is a major international health resource. These images were obtained from an MRI scan and as mentioned in section 1.1, an MRI scan generates both a water and a fat image. The prototypes and reference images could either refer to the water or the fat images depending on the context.

A set of 29 prototypes was used to register each reference image. This was needed in order to represent many different body types and to give a more accurate segmentation. The prototype set was fixed to ensure a fair comparison between the algorithms and to have one less parameter to tune. The set of reference images used for evaluation consisted of 67 images with varying body composition.

The size of the images is the same along the x and y axes, while the size along the z axis differs. The varying sizes and the resolution of the images can be seen in table 3.4.

Table 3.4: Size and resolution of UK bio-bank images

Property      UK bio-bank images
x-size        224 voxels
y-size        174 voxels
z-size        320 - 370 voxels
x-resolution  2.232 mm
y-resolution  2.232 mm
z-resolution  2.996 mm

Even though datasets with images of larger sizes and higher resolution were available, the lack of computational power made it very time consuming to evaluate the algorithms on those datasets.

3.4.1 Region masks

Each prototype and reference image has a set of six binary masks pertaining to them that indicate which region each voxel belongs to. The binary masks correspond to the regions described in section 3.3. An example of a prototype and a pertaining binary mask can be seen in figure 3.3.

Figure 3.3: The image to the left illustrates the water image from a prototype and the image to the right illustrates the binary mask that indicates the left upper leg front region. Image source: AMRA.

The region masks pertaining to the prototypes are accurate and well segmented. It is however paramount to mention the precision of the ground truth masks pertaining to the reference images and their flaws. The procedure of obtaining a ground truth mask is divided into two steps:

1. The Morphon algorithm autonomously generates suggestions, i.e. binary masks for each region. 2. The suggestions are manually inspected and corrected (if needed).

Since AMRA sells body composition measurements, the suggested masks are only corrected if they contain regions that will add error to the measurement. As an example, if the ASAT mask leaks into the VAT region (which contains a fat signal), the mask will be corrected. However, if the ASAT mask leaks outside of the body (which contains no fat signal), the mask is not corrected. An image which illustrates this effect for the muscle regions can be seen in figure 3.4. Note that the ground truth masks are corrected where the image contains a signal that would add to the muscle volume.

With this in mind, an evaluation metric which only considers the overlap between the estimated region mask generated by an algorithm and the ground truth mask pertaining to the reference image will not only be inaccurate, but also biased toward the Morphon. How this was avoided is explained in section 3.6.

Figure 3.4: An example of flawed ground truth masks. The images to the left illustrate the front and back leg ground truth masks. The images to the right illustrate the masks after deformation with the Morphon algorithm. Blue pixels represent left upper leg back, yellow represents left upper leg front, magenta represents right upper leg back and cyan represents right upper leg front. Image source: AMRA.

3.5 Deformation and segmentation

This section explains the method by which the prototypes are registered to a reference image and how this leads to a segmentation of the reference image. To make the content easier to digest, a flowchart which describes the different steps of the method can be seen in figure 3.5.

Figure 3.5: A flowchart that describes the different steps in segmenting a reference image.

3.5.1 Prototype deformations

In order to segment a reference image, the first step was to register each prototype to the reference image. This means that, for each prototype, a displacement field was found in order to map the prototype into the reference image. The field was obtained from one of the algorithms explained in section 3.1. An example of a prototype before and after it is deformed into a reference image can be seen in figure 3.6.

Figure 3.6: This figure illustrates the effects of an image registration with the fat images in the coronal plane of a body. The image to the left is a prototype, the image in the middle is a deformed prototype and the image to the right is a reference image. Note that the prototype is more similar to the reference image after the deformation than before. Image source: AMRA.

3.5.2 Prototype selection

After the mapping between each prototype and a reference image has been obtained, only a subset of prototypes is allowed to impact the segmentation. The idea is to determine which prototypes, after deformation, are most similar to the reference image in each of the regions described in section 3.3. This is done in three steps. First, the upper and lower boundaries of the different regions in the reference image are found. Second, the normalized cross correlation (2.18) is computed, for each region, between the deformed prototypes and the reference image. Third, the prototypes with the highest NCC values are selected to create a probability field for each region. Note that the selected prototypes might differ from region to region.

In order to find the regions of interest in the reference image, the displacement fields obtained by the registrations are applied to the region masks pertaining to each prototype. Then, the min and max z values of the deformed masks are used to determine the upper and lower boundaries of these regions. The normalized cross correlation is then computed between the deformed prototypes and the reference image in the area determined by these boundaries.
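The ranking step can be sketched as below. The `ncc` helper is a simplified stand-in for the NCC of (2.18) applied to one region crop, and both function names are assumptions for illustration.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross correlation between two equally sized regions;
    a simplified stand-in for (2.18)."""
    a = a - a.mean()
    b = b - b.mean()
    return float(np.sum(a * b) / np.sqrt(np.sum(a**2) * np.sum(b**2)))

def select_prototypes(deformed, reference, n_best):
    """Indices of the n_best deformed prototypes most similar to the
    reference image region, ranked by NCC."""
    scores = [ncc(d, reference) for d in deformed]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:n_best]
```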

Muscle masks pertaining to a prototype before and after a registration, shown on top of a reference image, can be seen in figure 3.7.

Figure 3.7: These images illustrate the front and back leg masks pertaining to a prototype before and after deformation on top of a reference image. The images to the left illustrate the masks before the deformation and the images to the right illustrate the masks after the deformation. Note how the masks fit the reference image much better after the deformation. Blue pixels represent left upper leg back, yellow represents left upper leg front, magenta represents right upper leg back and cyan represents right upper leg front. Image source: AMRA.

3.5.3 Probability field

In order to determine whether a voxel in the reference image is part of a particular region, a probability field PF_region is created from the deformed masks pertaining to the selected prototypes with the highest NCC values. The probability that a specific voxel belongs to a certain region is given by

PF_region(v) = (1/N_region) Σ_{p=1}^{N_region} M_region,p(v), (3.7)

where M_region,p is a deformed mask for a specific region of the pth prototype, v is a particular voxel and N_region is the number of prototypes used to create the probability field for a specific region.

3.5.4 Segmentation

The final segmentations, i.e. the binary masks of each region M_region in the reference image, are obtained by thresholding the probability fields,

M_region(v) = (PF_region(v) > T_region), (3.8)

where T_region is the threshold for a specific region.
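Equations (3.7) and (3.8) amount to a voxel-wise mean followed by a threshold, as the following sketch shows; the function name is a hypothetical choice, and the deformed masks are stacked along the first axis.

```python
import numpy as np

def segment_region(deformed_masks, threshold):
    """Probability field (3.7) as the voxel-wise mean of the deformed
    binary masks of the selected prototypes, thresholded into a binary
    region mask as in (3.8)."""
    pf = np.mean(deformed_masks, axis=0)
    return pf > threshold
```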

In this thesis, the optimal combination of N_region and T_region was found with a semi-exhaustive search for the parameters that resulted in the highest mean precision and recall. The term semi-exhaustive search is used since not all values of N_region were evaluated, only {7, 9, ..., 23}. All thresholds, however, were evaluated for each value of N_region.

A list of the optimal numbers of prototypes used to create the probability fields and the optimal thresholds used to segment the probability fields, for each algorithm and region, can be seen in tables 4.1 and 4.5.

3.5.5 Disjoint regions

After the regions are segmented, there is a possibility that the binary masks overlap. This overlap has to be dealt with in order to create disjoint regions and compute accurate body composition measurements. If a voxel is part of two regions, the probability field with the highest probability for that particular voxel determines which region it will ultimately belong to.

3.6 Evaluation

After the regions of interest had been segmented, the registration was evaluated and compared to the ground truth. Two types of evaluation measurements, the volume difference and the Bland-Altman plot, were included to evaluate how the registration affects the body composition measurements. The other evaluation measurements, weighted precision and recall, were used to evaluate how the registration affects the segmentation.

3.6.1 Body composition measurements

The body composition measurements used were VAT, ASAT and the fat-free muscle volumes of the upper leg regions, all of which are described in section 2.5.1 but repeated in this section with a slightly different notation. The estimated body composition measurements are given by

VAT_est = Σ_{∀voxels} f · M_VAT,est · M_body · V_voxel, (3.9)
ASAT_est = Σ_{∀voxels} f · M_ASAT,est · M_body · V_voxel, (3.10)
LULF_est = Σ_{∀voxels} (f < 0.5) · M_LULF,est · M_body · V_voxel, (3.11)
RULF_est = Σ_{∀voxels} (f < 0.5) · M_RULF,est · M_body · V_voxel, (3.12)
LULB_est = Σ_{∀voxels} (f < 0.5) · M_LULB,est · M_body · V_voxel, (3.13)
RULB_est = Σ_{∀voxels} (f < 0.5) · M_RULB,est · M_body · V_voxel. (3.14)

The fat concentration f is the intensity in a particular voxel of the reference image. M_VAT,est, M_ASAT,est, M_LULF,est, M_RULF,est, M_LULB,est and M_RULB,est are the estimated binary masks generated by the method explained in section 3.5 and correspond to (3.8). M_body is a binary body mask of the reference image. Note that each voxel within any of the muscle regions with a fat concentration less than 0.5 is considered to be a fat-free muscle voxel.

To determine how accurately each algorithm computed these measurements, volume differences and Bland-Altman plots were used. These measurements tell how accurately the algorithm estimates the correct volume, but not necessarily how well the algorithm determines the correct segmentation.

The total volume difference ε_vol,tot is given by the difference between the estimated volume and the ground truth volume in a particular region. The equation is given by

ε_vol,tot = bcm_est − bcm_gt, (3.15)

where bcm_est is the estimated measurement and bcm_gt is the ground truth measurement. bcm_est can refer to any of VAT_est, ASAT_est, LULF_est, RULF_est, LULB_est or RULB_est, given by (3.9)-(3.14). bcm_gt is generated by the same equations, where the estimated binary mask has been substituted by the ground truth binary mask for each region.

The Bland-Altman plot explained in section 2.5.1 was used to give a visual representation of the difference between an estimated and a ground truth body composition measurement for each reference image in the evaluation set.

3.6.2 Weighted segmentation

As mentioned in section 3.4.1, the ground truth masks are sometimes flawed and biased toward the Morphon algorithm. In order to evaluate the segmentation, weighted precision and recall were used to cancel the bias toward the Morphon algorithm. Weighted precision and recall are modified versions of the definitions described in section 2.5.2. One way of describing them is that the segmentation measurements are weighted with the density of the volume in each voxel. For VAT and ASAT, TP_v, FP_v and FN_v are given by

TP_v = \sum_{\forall\, voxels} f \cdot M_{intersection}, \qquad (3.16)

FP_v = \sum_{\forall\, voxels} f \cdot (M_{est} - M_{intersection}), \qquad (3.17)

FN_v = \sum_{\forall\, voxels} f \cdot (M_{gt} - M_{intersection}), \qquad (3.18)

where M_{intersection} is the intersection between the estimated binary region mask M_{est} and the ground truth binary region mask M_{gt}, and f is the fat concentration in a particular voxel. For muscles, TP_v, FP_v and FN_v are given by

TP_v = \sum_{\forall\, voxels} (f < 0.5) \cdot M_{intersection}, \qquad (3.19)

FP_v = \sum_{\forall\, voxels} (f < 0.5) \cdot (M_{est} - M_{intersection}), \qquad (3.20)

FN_v = \sum_{\forall\, voxels} (f < 0.5) \cdot (M_{gt} - M_{intersection}), \qquad (3.21)

where (f < 0.5) is equal to 1 when the fat concentration is less than 0.5 and 0 otherwise. Images that illustrate the difference between these weighted measures and their unweighted counterparts from section 2.5.2 can be seen in figure 3.8. Precision and recall were then computed as usual,

Precision_v = \frac{TP_v}{TP_v + FP_v}, \qquad (3.22)

Recall_v = \frac{TP_v}{TP_v + FN_v}. \qquad (3.23)
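
The weighted measures can be sketched in a few lines of numpy. This is an illustrative reimplementation of (3.16)-(3.23) with hypothetical names, not the code used in the thesis:

```python
import numpy as np

def weighted_precision_recall(f, mask_est, mask_gt, muscle=False):
    """Volume-weighted precision and recall, following (3.16)-(3.23).

    For fat regions the weight is the fat concentration f; for muscle
    regions every fat-free voxel (f < 0.5) contributes a weight of 1.
    """
    weight = (f < 0.5).astype(float) if muscle else f
    intersection = mask_est * mask_gt                # M_intersection
    tp = np.sum(weight * intersection)               # (3.16) / (3.19)
    fp = np.sum(weight * (mask_est - intersection))  # (3.17) / (3.20)
    fn = np.sum(weight * (mask_gt - intersection))   # (3.18) / (3.21)
    return tp / (tp + fp), tp / (tp + fn)            # (3.22), (3.23)
```

Note that a misclassified voxel with a high fat concentration is penalized more than one with a low fat concentration, which is exactly what cancels the volume bias in the flawed ground truth masks.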

Figure 3.8: The images to the left illustrate the overlap between the flawed ground truth masks and masks deformed with the FFD algorithm. Here, yellow represents TP, magenta represents FP and cyan represents FN, explained in section 2.5.2. The images to the right illustrate the overlap between ground truth masks and deformed masks with the FFD algorithm, where the masks are weighted by the volume density. Here, yellow represents TP_v, magenta represents FP_v and cyan represents FN_v, corresponding to (3.19), (3.20) and (3.21). Image source: AMRA.

3.6.3 Statistics

To be able to draw as accurate conclusions as possible from the body composition and segmentation measurements, the statistics sample mean µ, standard deviation σ, median x_med, minimum x_min, maximum x_max and range x_range were used.
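
As a sketch, these statistics map directly onto numpy; the dictionary layout is a hypothetical choice for illustration:

```python
import numpy as np

def summary_statistics(x):
    """The statistics used in the evaluation, collected in a dictionary."""
    x = np.asarray(x, dtype=float)
    return {
        "mean": x.mean(),
        "std": x.std(ddof=1),   # sample standard deviation
        "median": np.median(x),
        "min": x.min(),
        "max": x.max(),
        "range": x.max() - x.min(),
    }
```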

4 Results

This chapter presents the results and statistics of the body composition measurements, weighted precision and recall explained in sections 3.6.1 and 3.6.2. To reduce the number of plots and tables and to make the content easier to digest, the statistics from the muscle regions LULB, LULF, RULB and RULF are grouped together and presented as one region.

For every region, a Bland-Altman plot is included that describes the difference between the ground truth volume and the estimated volume for every sample in the dataset. In addition, a table with the statistics of the estimated volume, weighted precision and recall is included as well. Note that the volume is always given in liters, and keep in mind that the bias line (blue line) in the Bland-Altman plots corresponds to the volume difference mean in the tables. The red dotted lines, which represent the 95% confidence interval, are given by bias ± 1.96σ_diff, where σ_diff is the standard deviation of the volume difference, which is also given in the tables.
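
The bias line and the red dotted lines can be computed as follows. This is a generic Bland-Altman sketch with hypothetical names, not the code used to produce the figures:

```python
import numpy as np

def bland_altman_limits(estimated, ground_truth):
    """Bias line and the bias +/- 1.96 sigma_diff lines of a
    Bland-Altman plot, as drawn in the figures of this chapter."""
    diff = np.asarray(estimated, dtype=float) - np.asarray(ground_truth, dtype=float)
    bias = diff.mean()
    sigma_diff = diff.std(ddof=1)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sigma_diff, bias + 1.96 * sigma_diff
```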

To make the text easier to read, the 95% confidence interval will be referred to as the confidence interval from now on. Weighted precision and recall will be referred to as just precision and recall.

4.1 Algorithm comparison

This section presents the results of the Morphon, FFD and Demon algorithms when using the water images as input to the registration algorithms. A table with the optimal number of prototypes used to create the probability field and the optimal threshold for each region can be seen in table 4.1. These values were found via the methods explained in sections 3.5.2 and 3.5.3.


Table 4.1: Table with optimal number of prototypes used to create the prob- ability field and the optimal threshold for each region and algorithm. Total number of prototypes: 29.

Region              Morphon   FFD    Demon
VAT Prototypes      9         17     23
VAT Thresholds      0.5       0.47   0.47
ASAT Prototypes     9         19     23
ASAT Thresholds     0.6       0.31   0.34
Muscle Prototypes   9         19     21
Muscle Thresholds   0.6       0.84   0.52

4.1.1 Visceral adipose tissue

By looking at figure 4.1, it is clear that the Morphon and Demon algorithms have the narrowest confidence intervals, while the FFD algorithm has one or two outliers which make its confidence interval a bit wider. This difference is also reflected by the standard deviation of the volume difference in table 4.2. Even though the Morphon and Demon algorithms have similar standard deviations, the Morphon has a mean and median volume difference closer to zero and a smaller range. Also, note that the mean precision and recall for the Morphon algorithm are well balanced, while the FFD and Demon algorithms have a higher mean recall than mean precision.

Another result worth highlighting is that the Demon algorithm is the most unbal- anced since it has the highest mean recall but the lowest mean precision of them all.

An example where the Morphon algorithm manages to segment the correct VAT region while the FFD algorithm underestimates the VAT region for one of the samples can be seen in figure 4.2.

Figure 4.1: Bland-Altman plot of the VAT measurements for the Morphon, FFD and Demon algorithms. The green dots represent each sample in the set, the blue line represents the estimated bias and the red dotted lines represent the upper and lower confidence interval.

Figure 4.2: The image to the left illustrates the VAT mask deformed with the Morphon algorithm. The image to the right illustrates the VAT mask deformed with the FFD algorithm. Yellow pixels represent true positives, magenta represents false positives and cyan represents false negatives. Note that the FFD algorithm underestimates the size of the VAT region for this image. Image source: AMRA.

Table 4.2: This table contains the statistics from the volume estimation, pre- cision and recall for VAT. VD stands for volume difference which is the dif- ference between the ground truth volume and the estimated volume. The volume measurements are given in liters.

Measurements          Morphon          Free form deformation   Demon
VD Mean ± σ           -0.029 ± 0.162   0.023 ± 0.243           0.053 ± 0.181
VD Median             0.017            0.088                   0.072
VD Min                -0.579           -1.377                  -0.852
VD Max                0.422            0.305                   0.552
VD Range              1.001            1.682                   1.404
Precision Mean ± σ    0.969 ± 0.023    0.951 ± 0.024           0.943 ± 0.050
Precision Median      0.975            0.957                   0.962
Precision Min         0.887            0.865                   0.767
Precision Max         0.996            0.983                   0.991
Precision Range       0.108            0.118                   0.224
Recall Mean ± σ       0.971 ± 0.030    0.971 ± 0.038           0.977 ± 0.020
Recall Median         0.984            0.987                   0.985
Recall Min            0.849            0.801                   0.909
Recall Max            0.998            0.997                   0.995
Recall Range          0.149            0.197                   0.087

4.1.2 Abdominal subcutaneous adipose tissue

In the Bland-Altman plot presented in figure 4.3, the measurements from the Morphon algorithm contain an outlier which underestimates the ASAT volume by 4 liters. This outlier heavily affects the standard deviation and range of the volume difference, which can be seen in table 4.3. Note that the Demon algorithm has the lowest standard deviation of the volume difference, a mean and median volume difference closest to zero and the smallest range compared to the other algorithms. The Demon algorithm also has the best balanced mean precision and recall, while the Morphon algorithm has a higher mean precision and the FFD algorithm has a higher mean recall.

An example where the Demon algorithm manages to segment the correct ASAT region while the Morphon algorithm underestimates the ASAT region for one of the samples can be seen in figure 4.4.

Figure 4.3: Bland-Altman plot of the ASAT measurements for the Morphon, FFD and Demon algorithms. The green dots represent each sample in the set, the blue line represents the estimated bias and the red dotted lines represent the upper and lower confidence interval.

Figure 4.4: The image to the left illustrates the ASAT mask deformed with the Morphon algorithm. The image to the right illustrates the ASAT mask deformed with the Demon algorithm. Yellow pixels represent true positives, magenta represents false positives and cyan represents false negatives. Note that the Morphon algorithm underestimates the size of the ASAT region for this image. Image source: AMRA.

Table 4.3: This table contains the statistics from the volume estimation, pre- cision and recall for ASAT. VD stands for volume difference which is the difference between the ground truth volume and the estimated volume. The volume measurements are given in liters.

Measurements          Morphon          Free form deformation   Demon
VD Mean ± σ           -0.175 ± 0.588   0.177 ± 0.336           0.045 ± 0.196
VD Median             -0.053           0.097                   0.048
VD Min                -4.275           -0.612                  -0.844
VD Max                0.380            1.683                   0.849
VD Range              4.655            2.295                   1.693
Precision Mean ± σ    0.991 ± 0.012    0.970 ± 0.029           0.983 ± 0.022
Precision Median      0.996            0.981                   0.989
Precision Min         0.942            0.853                   0.842
Precision Max         1.000            0.994                   0.999
Precision Range       0.058            0.140                   0.157
Recall Mean ± σ       0.976 ± 0.037    0.994 ± 0.014           0.989 ± 0.023
Recall Median         0.984            0.999                   0.997
Recall Min            0.745            0.930                   0.850
Recall Max            0.999            1.000                   1.000
Recall Range          0.255            0.070                   0.150

4.1.3 Muscles

In the Bland-Altman plot presented in figure 4.5, it is quite clear that the measurements from the FFD algorithm correspond well to the ground truth measurements except for a single outlier. For the Morphon algorithm, the RULF and LULF measurements tend to be overestimated while RULB and LULB tend to be underestimated. The measurements from the Demon algorithm tend to be quite inaccurate compared to the other algorithms.

The narrow confidence interval for the FFD algorithm is reflected in the low standard deviation of the volume difference, which can be seen in table 4.4. It is interesting that, despite the low standard deviation, the mean precision and recall are lower than for the Morphon algorithm.

An example where the FFD algorithm manages to segment the correct muscle regions while the Morphon algorithm overestimates the muscle regions for one of the samples can be seen in figure 4.6.

Figure 4.5: Bland-Altman plot of the muscle measurements for the Morphon, FFD and Demon algorithms. The cyan dots represent LULB, the red dots represent LULF, the yellow dots represent RULB and the blue dots represent RULF for each sample in the set. The blue line represents the estimated bias and the red dotted lines represent the upper and lower confidence interval.

Figure 4.6: The image to the left illustrates the upper leg back masks pertaining to a prototype after deformation with the Morphon algorithm. The image to the right illustrates the masks after deformation with the FFD algorithm. Yellow pixels represent true positives, magenta represents false positives and cyan represents false negatives. Note that the Morphon algorithm has overestimated the size of the muscle regions in this image. Image source: AMRA.

Table 4.4: This table contains the statistics from the volume estimation, pre- cision and recall for the muscle regions. VD stands for volume difference which is the difference between the ground truth volume and the estimated volume. The volume measurements are given in liters.

Measurements          Morphon          Free form deformation   Demon
VD Mean ± σ           -0.003 ± 0.049   -0.011 ± 0.032          -0.022 ± 0.063
VD Median             -0.002           -0.014                  -0.011
VD Min                -0.168           -0.084                  -0.272
VD Max                0.179            0.277                   0.128
VD Range              0.347            0.361                   0.400
Precision Mean ± σ    0.983 ± 0.018    0.976 ± 0.015           0.971 ± 0.013
Precision Median      0.989            0.978                   0.974
Precision Min         0.830            0.783                   0.883
Precision Max         1.000            0.993                   0.990
Precision Range       0.170            0.211                   0.107
Recall Mean ± σ       0.987 ± 0.009    0.970 ± 0.013           0.963 ± 0.024
Recall Median         0.989            0.974                   0.970
Recall Min            0.943            0.924                   0.844
Recall Max            0.999            0.993                   0.994
Recall Range          0.057            0.069                   0.149

4.2 Further investigation

This section presents the results from the modifications of the Demon algorithm. To avoid any confusion, note that Demon Water in this section refers to Demon in the previous section. Demon Fat refers to the Demon algorithm where the fat images are used for registration and to determine similarity between the deformed prototypes and the reference images.

In Demon Fat Water, the fat images were first used on scales 3, 2 and 1, and the water images were then used on scales 1 and 0 in the registrations. The water images were used to determine similarity between the deformed prototypes and the reference images.

A table with the optimal number of prototypes used to create the probability field and the optimal threshold for each region and algorithm can be seen in table 4.5. These values were found via the methods explained in sections 3.5.2 and 3.6.2.

Table 4.5: Table with optimal number of prototypes used to create the prob- ability field and the optimal threshold for each region and algorithm. Total number of prototypes: 29.

Region               Demon Water   Demon Fat   Demon Fat Water
VAT Prototypes       23            7           17
VAT Thresholds       0.52          0.57        0.52
ASAT Prototypes      23            7           15
ASAT Thresholds      0.34          0.57        0.53
Muscles Prototypes   23            13          19
Muscles Thresholds   0.56          0.46        0.57

4.2.1 Visceral adipose tissue

The first observation that can be made is that the confidence interval in figure 4.7 is narrower when using the fat images or the combination of the fat and water images. The samples in Demon fat and Demon fat water are in general closer to the bias line and there are no significant outliers. This is also reflected in table 4.6, where the range of the volume difference has dropped from 1.404 to 0.650 or 0.621 when the fat images are included in the registrations. The standard deviation of the volume difference is almost reduced by a factor of 2. An example where the outcome of the fat registration is much more accurate than the water registration can be seen in figure 4.8.

Another result worth highlighting, which can be seen in table 4.6, is that the combination of the fat and water images has the lowest standard deviation and range of the volume difference. It also has the highest mean and lowest standard deviation for precision and recall.

Figure 4.7: Bland-Altman plot of the VAT measurements for each variant of the Demon algorithm. The green dots represent each sample in the set, the blue line represents the estimated bias and the red dotted lines represent the upper and lower confidence interval.

Figure 4.8: The image to the left illustrates the VAT mask deformed with water images and the Demon algorithm. The image to the right illustrates the VAT masks deformed with fat images and the Demon algorithm. Yellow pixels represent true positive, magenta represents false positives and cyan represents false negative. Image source: AMRA.

Table 4.6: This table contains the statistics from the volume estimation, pre- cision and recall for VAT. VD stands for volume difference which is the dif- ference between the ground truth volume and the estimated volume. The volume measurements are given in liters.

Measurements          Demon water      Demon fat       Demon fat water
VD Mean ± σ           0.053 ± 0.181    0.066 ± 0.110   0.056 ± 0.105
VD Median             0.072            0.068           0.070
VD Min                -0.852           -0.294          -0.363
VD Max                0.552            0.356           0.258
VD Range              1.404            0.650           0.621
Precision Mean ± σ    0.943 ± 0.050    0.945 ± 0.046   0.950 ± 0.038
Precision Median      0.962            0.960           0.962
Precision Min         0.767            0.774           0.819
Precision Max         0.991            0.995           0.991
Precision Range       0.224            0.221           0.172
Recall Mean ± σ       0.977 ± 0.020    0.975 ± 0.026   0.977 ± 0.024
Recall Median         0.985            0.983           0.985
Recall Min            0.909            0.823           0.854
Recall Max            0.995            0.996           0.995
Recall Range          0.087            0.173           0.141

4.2.2 Abdominal subcutaneous adipose tissue

Even here, the observation is that the confidence interval in figure 4.9 is narrower when using only the fat images or the combination of the fat and water images in the registrations.

In table 4.7, note that the standard deviation of the volume difference is almost reduced by a factor of 2 when including the fat images. Demon fat water has a slightly lower standard deviation, but the range of the volume difference is lower for Demon fat. Also note that Demon fat water has a better mean precision and recall compared to the other algorithms.

An example where the outcome of the fat registration is much more accurate than the water registration can be seen in figure 4.10.

Figure 4.9: Bland-Altman plot of the ASAT measurements for each variant of the Demon algorithm. The green dots represent each sample in the set, the blue line represents the estimated bias and the red dotted lines represent the upper and lower confidence interval.

Figure 4.10: The image to the left illustrates the ASAT mask deformed with water images and the Demon algorithm. The image to the right illustrates the ASAT mask deformed with fat images and the Demon algorithm. Yellow pixels represent true positives, magenta represents false positives and cyan represents false negatives. Image source: AMRA.

Table 4.7: This table contains the statistics from the volume estimation, pre- cision and recall for ASAT. VD stands for volume difference which is the difference between the ground truth volume and the estimated volume. The volume measurements are given in liters.

Measurements          Demon water      Demon fat       Demon fat water
VD Mean ± σ           0.045 ± 0.196    0.021 ± 0.106   0.024 ± 0.104
VD Median             0.048            -0.004          0.018
VD Min                -0.844           -0.132          -0.226
VD Max                0.849            0.465           0.574
VD Range              1.693            0.597           0.801
Precision Mean ± σ    0.983 ± 0.022    0.984 ± 0.028   0.989 ± 0.013
Precision Median      0.989            0.995           0.994
Precision Min         0.842            0.822           0.920
Precision Max         0.999            0.999           0.999
Precision Range       0.157            0.176           0.078
Recall Mean ± σ       0.989 ± 0.023    0.990 ± 0.013   0.993 ± 0.015
Recall Median         0.997            0.993           0.997
Recall Min            0.850            0.923           0.883
Recall Max            1.000            0.998           1.000
Recall Range          0.150            0.075           0.117

4.2.3 Muscles

Even though it is hard to notice in figure 4.11, the algorithms that include the fat images have a smaller confidence interval than Demon water. The difference is not as clear as for VAT or ASAT, but as can be seen in table 4.8, the standard deviation of the volume difference for Demon fat and Demon fat water is slightly smaller than for Demon water.

An example where the outcome of the fat registration is much more accurate than the water registration can be seen in figure 4.12.

Figure 4.11: Bland-Altman plot of the muscle measurements for each variant of the Demon algorithm. The cyan dots represent LULB, the red dots represent LULF, the yellow dots represent RULB and the blue dots represent RULF for each sample in the set. The blue line represents the estimated bias and the red dotted lines represent the upper and lower confidence interval.

Figure 4.12: The image to the left illustrates the LULF mask deformed with water images and the Demon algorithm. The image to the right illustrates the LULF mask deformed with fat images and the Demon algorithm. Yellow pixels represent true positives, magenta represents false positives and cyan represents false negatives. Image source: AMRA.

Table 4.8: This table contains the statistics from the volume estimation, pre- cision and recall for the muscle regions. VD stands for volume difference which is the difference between the ground truth volume and the estimated volume. The volume measurements are given in liters.

Measurements          Demon water       Demon fat       Demon fat water
VD Mean ± σ           -0.022 ± 0.063    0.016 ± 0.048   -0.011 ± 0.051
VD Median             -0.011            0.017           -0.004
VD Min                -0.272            -0.228          -0.299
VD Max                0.128             0.131           0.124
VD Range              0.400             0.359           0.423
Precision Mean ± σ    0.971 ± 0.013     0.966 ± 0.015   0.971 ± 0.016
Precision Median      0.974             0.969           0.974
Precision Min         0.883             0.845           0.819
Precision Max         0.990             0.985           0.990
Precision Range       0.107             0.140           0.170
Recall Mean ± σ       0.963 ± 0.024     0.973 ± 0.019   0.968 ± 0.020
Recall Median         0.970             0.978           0.973
Recall Min            0.844             0.853           0.843
Recall Max            0.994             0.995           0.994
Recall Range          0.149             0.141           0.151

5 Discussion

In the first part of this chapter, the methods and results described in chapters 3 and 4 are discussed. This is followed by section 5.3, where suggestions regarding future work are proposed.

5.1 Results

Two sections are included: the first discusses the results from the first part of the thesis, where different algorithms were evaluated. The second presents reflections on the second part of the thesis, where the effect of including the fat images in the registrations was investigated.

Before discussing the results of the body composition measurements and the weighted segmentation, it is important to explain the meaning of the optimal number of prototypes used to create the probability fields and the optimal thresholds used to segment the probability fields for each algorithm, presented in section 4.1.

When a mask pertaining to a prototype is deformed, it will sometimes underestimate and sometimes overestimate the size of a region compared to what is considered ground truth. It is reasonable to assume, for an unbiased algorithm, that the mean error after a deformation is zero and that noise in the images causes the under- or overestimation. With this in mind, one could argue that the optimal threshold of a probability field is 0.5 if the algorithm is unbiased. A high threshold (> 0.5) could indicate that the prototypes in general tend to overestimate the region of interest, while a low threshold (< 0.5) could indicate that they tend to underestimate it.


The optimal numbers of prototypes used to create the probability fields indicate how robust the registration is, as long as the optimal thresholds remain close to 0.5. For example, a large number of prototypes in combination with a high or low threshold indicates that the many prototypes compensate for the bias. However, if the thresholds remain close to 0.5 and many prototypes are used to create the probability fields, it is an indication that the registration method is robust.
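
The voting scheme under discussion (described in section 3.5, outside this chapter) can be sketched as follows, assuming the probability field is the voxel-wise mean of the deformed binary masks; the function name is hypothetical:

```python
import numpy as np

def segment_by_voting(deformed_masks, threshold):
    """Fuse deformed binary prototype masks into a segmentation.

    The probability field is the voxel-wise fraction of prototypes that
    vote for the region. Thresholding at 0.5 is a plain majority vote;
    a higher (lower) threshold compensates for prototypes that tend to
    overestimate (underestimate) the region.
    """
    probability_field = np.mean(deformed_masks, axis=0)
    return probability_field > threshold
```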

5.1.1 Algorithm comparison

The first thing to note is that both the FFD and Demon algorithms allow more prototypes to vote compared to the Morphon algorithm, as presented in table 4.1. However, the muscle threshold for the FFD algorithm is likely to compensate for an overestimation, while the ASAT threshold compensates for an underestimation. The ASAT threshold is also low for the Demon algorithm and the VAT threshold is a bit high for the Morphon algorithm.

By comparing the Bland-Altman plots, volume difference, precision and recall statistics for the different algorithms, it is quite clear that all algorithms perform approximately the same and the errors are of the same order of magnitude. As an example, table 4.3 contains the precision and recall statistics for ASAT. Since the mean precision and recall are equal to or higher than 97%, it is reasonable to say that all algorithms perform quite well. However, there are some differences between the algorithms. The Morphon algorithm gave the best results for VAT, the Demon algorithm was most accurate for ASAT and the FFD algorithm was the most robust algorithm for the muscle regions.

The purpose of this thesis was to evaluate other algorithms and determine whether the results of the Morphon algorithm could be improved upon. Since all the algorithms performed equally well, it is important to mention their computational complexity even though it was not evaluated explicitly. By comparing the equations in sections 2.4.1 and 3.1.3, it is clear that the Demon algorithm has a lower computational complexity than the Morphon algorithm. The biggest advantage is that the template image is not convolved with any filters in the Demon algorithm, while six quadrature filters are convolved with the template image in the Morphon algorithm in each iteration. There are also other details that make the Morphon algorithm more complex which will not be discussed here.

5.1.2 Further investigation

In table 4.5, note that Demon water allows the most prototypes to vote, followed by Demon fat water and finally Demon fat. What is interesting is that all thresholds are close to 0.5, which indicates that the registrations are unbiased, except for the ASAT region for Demon water.

By comparing the results for VAT, ASAT and the muscle regions, one conclusion that can be drawn is that using the fat images for the registrations makes the algorithms more robust. In general, the fat images tend to result in more accurate registrations, but the biggest impact is the reduction of outliers.

It can reasonably be assumed that including the fat images would result in more accurate registrations in the abdominal area and therefore more accurate VAT and ASAT measurements. By looking at the images in figure 5.1, it is clear that the outer line that represents the skin in the water image is thin and quite weak in some areas. The outer line that represents the ASAT region in the fat image is, on the other hand, much broader and its edges are apparent. Since the algorithms presented in this thesis aim to match apparent structures, lines and edges, it is intuitive that the fat images result in more accurate registrations.

Perhaps it is less intuitive that the fat images would result in more accurate registrations in the upper leg area. By looking at the images in figure 5.2, it is evident that the same type of reasoning can be applied here as for VAT and ASAT. Note that the structures that separate the back and front upper leg muscles in the water image are also apparent in the fat image.

It is, however, not certain whether only the fat images should be included in the registrations or if a combination of fat and water images is preferable. The differences between the two methods are too small to indicate whether one method is better than the other.

Figure 5.1: Illustration of the difference between the water and fat image seen in the axial plane in the abdominal area. The water image to the left has a thin outer line that represents the skin, note that the line is quite weak on the right side of the image. The fat image to the right has, on the other hand, a much broader outer line that represents the ASAT region. Image source: AMRA.

Figure 5.2: Illustration of the difference between the water and fat image seen in the axial plane in the upper leg area. The water image to the left has a thin outer line that represents the skin, note that the line is quite weak on the right side of the image. The fat image to the right has, on the other hand, a much broader outer line that represents the SAT region. Image source: AMRA.

5.2 Method

This section presents some reflections regarding potential flaws of the method presented in chapter 3.

5.2.1 Flawed and biased ground truth

As explained in section 3.4.1, the ground truth masks were not perfectly segmented for the prototypes and reference images. Even though the evaluation metrics were chosen to take this into consideration, it was difficult to detect subtle differences in the registrations since the results were disguised behind flawed masks.

It is also possible that the ground truth masks are biased toward the Morphon algorithm, since they are determined by a suggestion from the Morphon algorithm followed by manual correction.

5.2.2 Incomplete evaluation of registrations

One problem with the method used in this thesis is that the registrations themselves were never evaluated. As explained in section 3.5, the registered prototypes were used to create a probability field. This probability field was then thresholded to find the final segmentation. The results evaluated and presented in chapter 4 only considered the segmentation and the body composition measurements that came about through the segmentation. In other words, the outcome of the registrations was evaluated, but not the registrations themselves.

In order to give an accurate and fair description of how well each algorithm registered the prototypes to the reference images, at least one measurement that evaluates the registrations should have been included. Since the images are normalized, a simple measurement like the sum of squared differences (2.16) between the deformed prototypes and the reference images, computed on the water and fat images, would work. Another measurement that could have been used to evaluate the registrations is the normalized cross correlation (2.18).
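
Both suggested measures are a few lines of numpy. The sketch below follows the usual definitions of SSD and NCC (the document's equations (2.16) and (2.18) are not reproduced here, so the exact normalization may differ):

```python
import numpy as np

def ssd(deformed, reference):
    """Sum of squared differences; lower is better, 0 is a perfect match."""
    return np.sum((deformed - reference) ** 2)

def ncc(deformed, reference):
    """Normalized cross correlation; 1.0 is a perfect (linear) match."""
    a = deformed - deformed.mean()
    b = reference - reference.mean()
    return np.sum(a * b) / np.sqrt(np.sum(a ** 2) * np.sum(b ** 2))
```

SSD is only meaningful here because the images are normalized; NCC has the advantage of being invariant to linear intensity changes between the deformed prototype and the reference.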

5.2.3 Unequal tuning of algorithms

In this thesis, the Morphon was considered to be the baseline algorithm and the other algorithms were evaluated in order to determine their potential. This comparison was not entirely fair, since the parameters of the Morphon have been optimized and refined for many years at AMRA while the parameters of the other algorithms were examined and refined during a much shorter time frame. It is likely that the results presented in this thesis do not represent the global maximum of each algorithm; another possibility is that, while refining the parameters, the algorithms have reached a local maximum.

5.2.4 Limited set of prototypes

As explained in chapter 3, the set of prototypes was constructed so that it would represent the spectrum of different body types and enable accurate registrations. However, the downside of having a fixed pool of 29 very different body types is that when registering a reference image, most prototypes require large deformations in order to resemble the reference image. It is possible that the algorithms had to be tuned so that these large deformations were possible, and thus the true potential of each algorithm might not have been reached. It is quite problematic to create and tune an algorithm so that it can handle all types of varying reference images while also being able to achieve precise deformations.

5.2.5 Training and evaluation set

As explained in sections 3.5.3 and 3.5.4, the optimal number of prototypes used to create the probability field and the optimal threshold used to threshold the probability field were found by an exhaustive search for the combination of parameters that resulted in the best mean precision and recall. These parameters should have been found with the aid of a training set and then evaluated on a different set of reference images. There is a possibility that the results of the FFD and Demon algorithms are the outcome of overtraining the parameters. It was, however, not possible to run the registrations on both a training set and an evaluation set within the limited time frame of this thesis.

5.3 Future work

With the discussion of the results and method presented in the previous sections in mind, some thoughts regarding how to continue with this problem are presented in this section.

5.3.1 Refine ground truth

As discussed in section 5.1.1, the results of the baseline method (the Morphon) and the other algorithms can be considered very accurate. To make sure that the algorithms have reached their full potential, it is necessary to create perfectly segmented ground truth masks for the prototypes and the reference images, as discussed in section 5.2.1.

5.3.2 Evaluating registrations differently

As discussed in section 5.2.2, the evaluation did not cover the registrations themselves but rather the outcome of the registrations. In the future, metrics like normalized cross correlation (2.18) or the sum of squared differences (2.16) on the entire water and/or fat image should be included in the evaluation.

5.3.3 Detecting and handling outliers

In most cases, the automatic suggestion by any of the algorithms presented in this thesis is good enough to make a fully automatic body composition measurement. The real problem, however, is how to deal with the outliers, which can sometimes be very inaccurate. An example of an outlier can be seen in the Bland-Altman plot in figure 4.3 and illustrated as an image in figure 4.4.

One potential solution to this problem could be to develop an automatic post-registration outlier detector. When an outlier has been detected, anatomical landmarks can be placed in the reference and prototype images, and an affine parametric transformation can be found which minimizes the distance between the landmarks. This transformation can then be used as an initial solution for the Morphon algorithm (or any of the other algorithms).
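As an illustration of the landmark-fitting step, a 3D affine transformation minimizing the squared distances between corresponding landmarks can be found in closed form with least squares. This is a generic sketch under assumed names and array layout, not the proposed detector itself:

```python
import numpy as np

def fit_affine_3d(src, dst):
    """Least-squares affine transform mapping src landmarks onto dst.

    src, dst: (N, 3) arrays of corresponding landmark coordinates.
    Returns A (3x3) and t (3,) such that dst ~ src @ A.T + t.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    # Homogeneous coordinates: solve [x y z 1] @ M = [x' y' z'].
    ones = np.ones((src.shape[0], 1))
    M, *_ = np.linalg.lstsq(np.hstack([src, ones]), dst, rcond=None)
    A = M[:3].T   # linear part
    t = M[3]      # translation part
    return A, t
```

At least four landmark pairs in general position are needed to determine the 12 affine parameters; more pairs give a least-squares compromise that is robust to small placement errors.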

5.3.4 Extend prototype bank

As mentioned in section 5.2.4, the set of prototypes used in this thesis was constructed to span the spectrum of different body types. When registering a reference image, many of the prototypes will not be similar to that particular image from the start and require large deformations in order to achieve some resemblance. Algorithms in general, not just those for image registration, depend highly on the initial solution in order to generate an accurate output. If the input data (the initial solution or starting guess) is close to the truth, it is more likely that the output is accurate.

One solution to this problem is to create a larger prototype bank and automatically select a subset of prototypes for each reference image. Instead of having 29 prototypes where 5 of them are similar to a reference image, a set of 29 prototypes can be automatically selected where all of them are similar from the start. A simple method to determine similarity between a reference image and a prototype would be to use the Dice coefficient (2.64) of the body masks. Another idea is to use normalized cross correlation (2.18) on the water and/or fat images.
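A minimal sketch of such a selection step, using the standard Dice coefficient of the body masks; the function names are hypothetical and the masks in practice would be 3D volumes rather than the toy 2D arrays used here:

```python
import numpy as np

def dice(mask_a, mask_b):
    """Dice coefficient of two boolean body masks; 1 means identical."""
    intersection = np.logical_and(mask_a, mask_b).sum()
    return 2.0 * intersection / (mask_a.sum() + mask_b.sum())

def select_prototypes(reference_mask, prototype_masks, n_select=29):
    """Indices of the n_select prototypes whose body masks are most
    similar to the reference mask, ranked by Dice coefficient."""
    scores = [dice(reference_mask, p) for p in prototype_masks]
    ranked = np.argsort(scores)[::-1]  # descending similarity
    return [int(i) for i in ranked[:n_select]]
```

Ranking by normalized cross correlation on the water and/or fat images would follow the same pattern, with the Dice scores replaced accordingly.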

Since outliers are likely to be the outcome of poor input data, this approach would probably reduce the number of outliers considerably. It could be either an alternative or a complementary approach to the one suggested in section 5.3.3 for solving the outlier problem.

6 Conclusions

The first conclusion that can be made is that the Demon and FFD algorithms are not better image registration methods than the Morphon algorithm. In this context, all three algorithms performed approximately the same. The differences between the estimated body composition measurements and the ground truth were of the same order of magnitude, and precision and recall were above 93% for all regions. The Morphon algorithm had the best results for VAT, the Demon algorithm had the best ASAT results, and the FFD algorithm showed good robustness for the muscle regions. These differences are either the outcome of the algorithms themselves or of the tuning of the parameters.

The second conclusion that can be made is that including the fat images in the registrations improves the performance of the Demon algorithm, especially for measurements of VAT and ASAT but also for the muscles. It is likely that including the fat images in the Morphon and FFD algorithms would improve their results as well.


Bibliography

[1] A. Eklund, P. Dufort, M. Villani, and S. LaConte. BROCCOLI: Software for fast fMRI analysis on many-core CPUs and GPUs. Frontiers in Neuroinformatics, 8:24, 2014. Cited on page 18.

[2] Bernd Fischer and Jan Modersitzki. A unified approach to fast image registration and a new curvature based registration technique. Linear Algebra and its Applications, 380(15):107–124, 2004. Cited on page 16.

[3] D. Forsberg. Robust Image Registration for Improved Clinical Efficiency: Using Local Structure Analysis and Model-Based Processing. PhD thesis, Linköping University, Medical Informatics, The Institute of Technology, Center for Medical Image Science and Visualization (CMIV), 2013. Cited on pages 7, 8, 10, 13, 16, 18, 20, and 33.

[4] D. Forsberg, M. Andersson, and H. Knutsson. Adaptive anisotropic regularization of deformation fields for non-rigid registration using the morphon framework. In 2010 IEEE International Conference on Acoustics, Speech and Signal Processing, pages 473–476, March 2010. doi: 10.1109/ICASSP.2010.5495704. Cited on pages 11 and 12.

[5] Yujun Guo and Cheng-Chang Lu. Multi-modality image registration using mutual information based on gradient vector flow. In 18th International Conference on Pattern Recognition (ICPR’06), volume 3, pages 697–700, Aug 2006. doi: 10.1109/ICPR.2006.826. Cited on page 16.

[6] Eldad Haber and Jan Modersitzki. Intensity gradient based registration and fusion of multi-modal images. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 726–733. Springer Berlin Heidelberg, 2006. Cited on page 15.

[7] Hellerhoff. Cavum septi pellucidi et vergae 43jw - mrt t1 axial und t2 axial - 001, 2015. URL https://commons.wikimedia.org/wiki/File:Cavum_septi_pellucidi_et_vergae_43jw_-_MRT_T1_axial_und_T2_axial_-_001.jpg. CC BY-SA 3.0. Cited on page 14.


[8] A. Karlsson, J. Rosander, T. Romu, J. Tallberg, A. Grönqvist, M. Borga, and O. Dahlqvist Leinhard. Automatic and quantitative assessment of regional muscle volume by multi-atlas segmentation using whole-body water–fat MRI. Journal of Magnetic Resonance Imaging, 41(6):1558–1569, 2015. Cited on page 22.

[9] H. Knutsson and M. Andersson. Morphons: segmentation using elastic canvas and paint on priors. In IEEE International Conference on Image Processing 2005, volume 2, pages II–1226–9, Sept 2005. doi: 10.1109/ICIP.2005.1530283. Cited on pages 18 and 19.

[10] O. D. Leinhard, A. Johansson, J. Rydell, O. Smedby, F. Nystrom, P. Lundberg, and M. Borga. Quantitative abdominal fat estimation using MRI. In 2008 19th International Conference on Pattern Recognition, pages 1–4, Dec 2008. doi: 10.1109/ICPR.2008.4761764. Cited on page 5.

[11] Jingfei Ma. Dixon techniques for water and fat imaging. Journal of Magnetic Resonance Imaging, 28(3):543–558, 2008. ISSN 1522-2586. doi: 10.1002/jmri.21492. URL http://dx.doi.org/10.1002/jmri.21492. Cited on page 5.

[12] J. Modersitzki. FAIR: flexible algorithms for image registration. Society for Industrial and Applied Mathematics, 2009. Cited on pages 12, 13, and 16.

[13] A. Myronenko. Non-rigid Image Registration: Regularization, Algorithms and Applications. PhD thesis, Oregon Health & Science University, 2010. URL https://books.google.se/books?id=uS3pZwEACAAJ. Cited on pages 8, 13, 16, and 21.

[14] Andriy Myronenko. Medical Image Registration Toolbox (MIRT) for MATLAB. URL https://sites.google.com/site/myronenko/research/mirt. Cited on page 32.

[15] Ian J. Neeland, Aslan T. Turer, Colby R. Ayers, Jarett D. Berry, Anand Rohatgi, Sandeep R. Das, Amit Khera, Gloria L. Vega, Darren K. McGuire, Scott M. Grundy, and James A. de Lemos. Body fat distribution and incident cardiovascular disease in obese adults. Journal of the American College of Cardiology, 65(19):2150–2151, 2015. ISSN 0735-1097. doi: 10.1016/j.jacc.2015.01.061. URL http://www.onlinejacc.org/content/65/19/2150. Cited on page 3.

[16] Stacy A. Porter, Joseph M. Massaro, Udo Hoffmann, Ramachandran S. Vasan, Christopher J. O'Donnel, and Caroline S. Fox. Abdominal subcutaneous adipose tissue: A protective fat depot? Diabetes Care, 32(6):1068–1075, 2009. ISSN 0149-5992. doi: 10.2337/dc08-2280. URL http://care.diabetesjournals.org/content/32/6/1068. Cited on page 3.

[17] T. Romu, M. Borga, and O. Dahlqvist. MANA - multi scale adaptive normalized averaging. In 2011 IEEE International Symposium on Biomedical Imaging: From Nano to Macro, pages 361–364, March 2011. doi: 10.1109/ISBI.2011.5872424. Cited on page 5.

[18] D. Rueckert, L. I. Sonoda, C. Hayes, D. L. G. Hill, M. O. Leach, and D. J. Hawkes. Nonrigid registration using free-form deformations: application to breast MR images. IEEE Transactions on Medical Imaging, 18(8):712–721, Aug 1999. ISSN 0278-0062. doi: 10.1109/42.796284. Cited on page 21.

[19] L. Tautz, A. Hennemuth, M. Andersson, A. Seeger, H. Knutsson, and O. Friman. Phase-based non-rigid registration of myocardial perfusion MRI image sequences. In 2010 IEEE International Symposium on Biomedical Imaging: From Nano to Macro, pages 516–519, April 2010. doi: 10.1109/ISBI.2010.5490297. Cited on page 19.

[20] J.-P. Thirion. Fast non-rigid matching of 3D medical images. In Proceedings of the Conference on Medical Robotics and Computer Assisted Surgery (MRCAS'95), Baltimore, November 1995. Cited on page 20.

[21] J.-P. Thirion. Image matching as a diffusion process: an analogy with Maxwell's demons. Medical Image Analysis, 2(3):243–260, 1998. ISSN 1361-8415. doi: 10.1016/S1361-8415(98)80022-4. URL http://www.sciencedirect.com/science/article/pii/S1361841598800224. Cited on page 20.

[22] Y. Yaegashi, K. Tateoka, K. Fujimoto, T. Nakazawa, A. Nakata, et al. Assessment of similarity measures for accurate deformable image registration. 2012. Cited on page 26.

[23] D. Yang, S. Zhang, Z. Yan, C. Tan, K. Li, and D. Metaxas. Automated anatomical landmark detection on distal femur surface using convolutional neural network. In 2015 IEEE 12th International Symposium on Biomedical Imaging (ISBI), pages 17–21, April 2015. doi: 10.1109/ISBI.2015.7163806. Cited on page 13.