A High Performance Computing Approach to Registration in Medical Imaging

by Gabriel Mañana Guichón

Mentor: Dr. Eduardo Romero Castro

A dissertation presented to the National University of Colombia in fulfilment of the thesis requirement for the degree of Doctor of Philosophy in Electrical Engineering

Bogotá, D.C., Colombia, 2010

AUTHOR’S DECLARATION FOR ELECTRONIC SUBMISSION OF A THESIS

I hereby declare that I am the sole author of this thesis. This is a true copy of the thesis, including any required final revisions, as accepted by my examiners.

I understand that my thesis may be made electronically available to the public.

Gabriel Mañana Guichón 2010


Abstract

This research has been devoted to the study of the performance-related issues associated with automatic registration systems in medical imaging. Registration, in this context, is the determination of a spatial transformation that aligns points in one view of a region of the anatomy with corresponding points of the same region in another view. One major issue with most image registration techniques is their high computational cost. Because of this, these methods have found limited application in clinical situations where fast execution is required, e.g., intra-operative imaging. High performance can be achieved by reduction of the data space, reduction of the solution search space, and parallel processing. This research has aimed to obtain high performance by taking advantage of grid computing architectures and by exploiting inherently parallel techniques such as those found in evolutionary computation.


Acknowledgements

I want to especially thank Professor Eduardo Romero, my advisor, for his knowledgeable counseling and for always being inspiring. His friendship has been invaluable to me during these years of hard work, as it will be in the years to come. All my love to my partners, Natalia, Juan, and Manuel, for their unwavering support and patience. Finally, I also want to thank the National University of Colombia, my home institution, for the scholarship I was awarded and without which the present work could not have been accomplished.


To my parents, Mabel and Pepe, who passed away recently.

Life is a very exceptional situation. – Wim Wenders


Table of Contents

Author’s Declaration

Abstract

Acknowledgements

Dedication

Table of Contents

List of Tables

List of Figures

1 Introduction
  1.1 The Image Registration Process
  1.2 High Performance Computing
  1.3 Cluster Computing
  1.4 Grid Computing
  1.5 Cloud Computing
  1.6 Problem Definition
  1.7 Document Organization

2 A Grid Computing Framework for Medical Imaging
  2.1 Distributed Systems
  2.2 Space-based Systems
  2.3 Data Privacy

3 A Distributed Evolutionary Approach to Subtraction Radiography
  3.1 Problem Statement
  3.2 Parametric Transformations

  3.3 Similarity Measure
  3.4 Optimization Problem
  3.5 Interpolation Approach
  3.6 Search Strategy
  3.7 Algorithm Distribution
  3.8 Algorithm Validation
  3.9 The Subtraction Service

4 Automatic Registration for Evaluation of PSBM
  4.1 Introduction
  4.2 Image Registration in the Context of PSBM
    4.2.1 Problem Formulation
  4.3 Materials and Methods
    4.3.1 Image acquisition process
    4.3.2 Preprocessing
    4.3.3 Geometrical transformations
    4.3.4 Interpolation criterion
    4.3.5 Similarity measure
    4.3.6 Optimization method
    4.3.7 Computational Framework
  4.4 Results
    4.4.1 Validation of the Implemented Algorithms
  4.5 Discussion

5 Curvature-based 3D Non-rigid Image Registration
  5.1 Introduction
  5.2 Problem Statement
  5.3 Solution Strategy
  5.4 Algorithm Distribution
  5.5 Algorithm Validation
  5.6 The Registration Service

6 Atlas-based Segmentation Service

7 Discussion and Conclusions
  7.1 Contributions
  7.2 Discussion
  7.3 Further Work

Appendices

A Characterization of Tier 3 Sites

B National University’s Tier-3 Computing Cluster
  B.1 Hardware Specification
  B.2 Software Specification
    B.2.1 Basic Configuration
    B.2.2 OSG middleware
    B.2.3 gLite middleware
  B.3 Cluster Performance

C Published Material
  C.1 Distributed Genetic Algorithm for Subtraction Radiography, GECCO 2006, ISBN: 1-59593-186-4
  C.2 Grid Computing Based Subtraction Radiography, ICIP 2007, ISBN: 978-1-4244-1437-6
  C.3 Characterization of Tier 3 Sites, CERN 2008, ISBN: 978-92-2083-321-5
  C.4 A Distributed Evolutionary Approach to Subtraction Radiography, Springer-Verlag 2009, ISBN: 978-3-642-10700-9
  C.5 Automatic Registration Method for the Evaluation of Post Surgical Bone Mineralization, submitted to the International Journal of Computer Assisted Radiology and Surgery, Springer-Verlag 2010
  C.6 A Service-Oriented Approach to High-Performance Medical Image Processing, submitted to the International Journal of Medical Informatics, Elsevier 2010
  C.7 Distributed Curvature-based Non-rigid Image Registration, submitted to Transactions on Medical Imaging, IEEE 2010

Bibliography


List of Tables

3.1 Some combinations of rotation, scaling, and translation applied to the set of synthetic images
3.2 Values found by the Downhill Simplex algorithm
3.3 Values found by the Genetic Algorithm
3.4 Values found by Differential Evolution
3.5 DS-GA-DE performance comparison

4.1 Some combinations of rotation, scaling, and translation applied to the set of synthetic images
4.2 Values found by the Downhill Simplex algorithm
4.3 Values found by the Genetic Algorithm
4.4 Values found by the Differential Evolution algorithm
4.5 DS-GA-DE performance comparison

5.1 Comparison of curvature-based registration algorithms


List of Figures

1.1 PET image slice, CT image slice, PET/CT fusion
1.2 Archetypal grid architecture. The easiest way to integrate heterogeneous computing resources is not to recreate them as homogeneous elements, but to provide a layer that allows them to communicate despite their differences. This software layer is commonly known as middleware
1.3 The architecture currently envisaged attempts to bring together the (upward) refinement of data to information and knowledge and the (downward) application of knowledge to information handling and data collection through feedback loop control
1.4 Cloud Computing taxonomy map

2.1 Space-based coordination by means of a minimal programming interface: write, read, take
2.2 Space-based computing grid using the replicated-worker design pattern
2.3 Overall architecture of the medical imaging framework, showing the technologies used for communication between neighbouring components: (a) plain Java objects (POJO), (b) HTTP, HTTPS, TCP/UDP sockets, (c) Job Submission Description Language (XML) over TCP sockets, (d) JavaSpaces API, (e) Java Database Connectivity (JDBC), (f) TCP sockets, (g) JDBC
2.4 Data privacy in the Computing Grid layer is attained using the Proxy design pattern

3.1 Timing profile for the parallel iterative algorithm showing the percentage of time required for each operation
3.2 The upper row shows the two images to subtract. The bottom row shows the subtracted images: left, without geometrical correction; right, after automatic correction

3.3 Overall architecture of the subtraction radiography service, showing the protocols used for communication between neighbouring components
3.4 Graphical user interface for the radiography subtraction service

4.1 An example of one-point crossover when using binary encoding. Once a crossover point is randomly chosen, the offspring chromosome will consist of a combination of the two parent substrings. This mechanism can be used to produce one or two offspring
4.2 An example of mutation when using binary encoding. A mutation point is randomly chosen and then the allele in that position is inverted
4.3 Real-number encoding of the chromosome used in both evolutionary algorithms
4.4 Overall architecture of the subtraction radiography service
4.5 Space-based coordination by means of a simple programming interface
4.6 The upper row shows the two images to subtract. The bottom row shows the subtracted images: left, without geometrical correction; right, after automatic correction

5.1 The hypothesis behind the proposed method: computing the 2D displacement fields in the slices of the three space directions independently, and then merging them into a 3D field, yields an equivalent result to applying the 3D version of the algorithm
5.2 Graphical user interface of the 3D registration service

6.1 Graphical user interface of the manual segmentation service
6.2 Diagram of the atlas-based segmentation process. The segmented atlas image (template) is registered to the image to be segmented (reference). The displacement vector field found in this step is then applied to the surface meshes of the segmented atlas, thereby providing a segmentation of the latter

Chapter 1

Introduction

Technological advances in medical imaging over the past two decades have enabled doctors to create images of the human body and its internal structures with unprecedented resolution and realism. Conventional radiography, state-of-the-art Computed Tomography (CT), Magnetic Resonance Imaging (MRI), Single Photon Emission Computed Tomography (SPECT), Positron Emission Tomography (PET), and other imaging devices can quickly acquire two- and three-dimensional images, and these images can then be combined into a single image, volume, or sequence that carries the information of all modalities. This combination or fusion of images requires the images to be previously aligned, or registered. By registering two images, the fusion of multimodality information becomes possible, changes in the body can be detected, and clinical activities such as diagnosis, treatment planning, and follow-up can therefore be enhanced.

Figure 1.1: PET image slice, CT image slice, PET/CT fusion.

In image processing, the interest often lies not only in analyzing one image but also in comparing or combining the information present in different images. In this context, image registration can be defined as the process of aligning images

so that corresponding features can be related. The term image registration is also used to refer to the alignment of images with a computer model, or the alignment of features in an image with locations in physical space. For this reason image registration is one of the fundamental tasks within medical image processing: by determining the transformation required to align two images, registration enables specialists to make quantitative comparisons.

From an operational point of view, image registration is an optimization problem whose goal is to produce, as output, an optimal geometrical transformation that aligns corresponding points of the two given views. Image registration has applications in many fields, including remote sensing, astro- and geophysics, computer vision, and medical imaging. The field of application addressed in this chapter is medical imaging, and in this field the transformation is generally used as input to another system that can be, for instance, a fusion system or a subtraction system. For a complete overview of the different image acquisition systems and the relevance of registration in medical image interpretation and analysis, the reader may refer to Hajnal et al. [77] and references therein.

In many clinical scenarios, images of the same or different modalities may be acquired, and it is the responsibility of the diagnostician to combine or fuse the image information to draw useful clinical conclusions. Without a registration system this generally requires mental compensation for changes in patient position, the sensors used, or even the chemicals involved. An image registration system aligns the images and so establishes correspondence between features present in different images, allowing the monitoring of subtle changes in size or intensity over time or across a population. It also allows correspondence to be established between images and physical space in image-guided interventions. In many applications a rigid transformation, i.e., translations and rotations only, is enough to describe the spatial relationship between two images. However, there are many other applications where non-rigid transformations are required to describe this spatial relationship adequately.

In terms of the algorithms used, the current tendency is to use automatic algorithms (i.e., no user interaction) [34], which require the application of advanced image registration techniques, all characterized by their high computational cost. Due to this constraint, these methods have found limited application in clinical situations where real-time or near-real-time execution is required, e.g., intra-operative imaging or image-guided surgery. High performance in image registration can be achieved by reduction of the data space, as well as reduction of the solution search space. These techniques can significantly decrease the registration time without compromising registration accuracy. Nonetheless, to obtain a significant increase in performance, these approaches must be complemented with parallel processing. The problem is that parallel processing has always been associated with extremely expensive supercomputers, unaffordable for most medical institutions in developing countries. This chapter will describe our experience in achieving high performance in an affordable way, i.e., by taking advantage of an existing computational infrastructure.
More specifically, it will outline how this can be done by using open source software tools that are readily available. This will be illustrated with a real case study: an online subtraction radiography service that employs distributed evolutionary algorithms for automatic registration. This research has dealt with the high computational cost of image registration and the relevance of this constraint in medical imaging. The following section reviews the overall registration process and introduces grid computing as a cost-effective alternative to overcome this difficulty.

1.1 The Image Registration Process

Any image registration technique can be described by three main components [29]:

1. a transformation which relates the reference and floating images,

2. a similarity measure which quantifies the degree of similarity between the reference and the transformed image,

3. an optimization scheme which determines the optimal transformation as a function of the similarity measure.

The geometrical transformation refers to the mathematical form of the geometrical mapping used in the registration process and can be classified by complexity into “rigid transformations”, where all distances are preserved, and deformable or “non-rigid transformations”, where images are stretched or deformed. While the former is ideal for most fusion applications and accounts for differences such as patient positioning, non-rigid transformations are used to take into account more complex motions, such as breathing or the beating heart.

The similarity measure is the driving force behind the registration process, and the aim is to maximize the similarity between both images. From a probabilistic point of view, it can be viewed as a likelihood term that expresses the probability of a match between the reference and the transformed image [52]. The main similarity measures used for image registration will be briefly reviewed below.

Like many other problems in computer vision and image analysis, registration can be formulated as an optimization problem whose goal is to minimize an associated energy or cost function [118]:

C = −C_similarity + C_transformation    (1.1)

where the first term characterizes the similarity between the images and the second term characterizes the cost associated with particular deformations. From a probabilistic point of view, the cost function in Equation 1.1 can be explained in a Bayesian context. In this framework, the similarity measure can be viewed as a likelihood term which expresses the probability of a match between the two images, and the second term can be interpreted as a prior which represents a priori knowledge about the expected deformation. In the case of rigid or affine registration this term is normally ignored; it only plays a role in non-rigid registration.

Several approaches can be used to optimize this function, ranging from standard numerical methods to evolutionary methods, including some hybrid approaches. No matter what method is used, this always implies an iterative process whose computational cost is so high that it prevents most applications from performing appropriately in real-time situations. One possible way to address this issue is to devise faster algorithms. Another is to exploit the intrinsic parallelism that most methods convey.

Medical image registration spans numerous applications and there is a large number of different techniques reported in the literature. What follows is an attempt to classify the different techniques and categorize them based upon some criteria; for a complete analysis, please refer to, e.g., [10]. Maintz and Viergever [71] originally proposed a nine-dimensional scheme that can be condensed into the following eight criteria [64]: image dimensionality, registration basis, geometrical transformation, degree of interaction, optimization procedure, image acquisition modalities, subject, and object.

Image dimensionality refers to the number of geometrical dimensions of the image spaces involved, which in medical applications are typically two and three-dimensional, but may include time as a fourth dimension. For spatial registration, there are 2D/2D registration, 3D/3D registration, as in our case, and the more complex 2D/3D registration (e.g., CT/X-ray).

The registration basis is the aspect of the two images used to perform the registration. In this category, registration can be classified into extrinsic and intrinsic methods. Registration methods that are based upon the attachment of markers are termed extrinsic methods; in contrast, those which rely on anatomic features only are termed intrinsic. When there are no known correspondences as input, intensity patterns in the two views are used for alignment. This basis, known as intensity- or voxel-based, has become in recent years the most widely used registration basis in medical imaging. Here there are two distinct approaches: the first reduces the image gray value content to a representative set of scalars and orientations (e.g., principal axes and moments based methods), while the second uses the full image pixel content throughout the registration process. In general, intensity-based methods are more complex, yet more flexible, and this is the registration basis used by the method presented in this work.

The category geometrical transformation refers to the mathematical forms of the geometrical mapping used to align points in one space with those in the other. These include rigid transformations, which preserve all distances, i.e., transformations that preserve the straightness of lines, and hence the planarity of surfaces, and all angles between straight lines.
Images are rotated and translated in two or three dimensions in the matching process, but not deformed in any way. This is ideal for most fusion applications and accounts for differences such as patient positioning. Registration problems that are limited to rigid transformations are called rigid registration problems. In deformable or non-rigid registration, images are stretched to take into account complex motions, such as breathing, and any changes in the shape of the body or organs, which may occur following surgery, for example. Non-rigid transformations are important not only for applications to non-rigid anatomy, but also for inter-patient and intra-patient registration of rigid anatomy in those cases where there are non-rigid distortions caused by the image acquisition procedure. These transformations include scaling transformations, with a special case when the scaling is isotropic, known as similarity transformations; the more general affine transformations, which preserve the straightness of lines and the planarity of surfaces, as well as parallelism, but change the angles between lines; the even more general projective transformations, which preserve the straightness of lines and planarity of surfaces but not parallelism; perspective transformations, a subset of the projective transformations, required for images obtained by techniques such as X-ray, endoscopy or microscopy; and finally curved or non-rigid transformations, which do not preserve the straightness of lines. Each type of transformation contains as special cases the ones described before it, e.g., rigid transformations are a special type of non-rigid transformations, and so on. Transformations that are applied to the whole image are called global, while transformations that are applied to subsections of the image are called local. Rigid, affine and projective transformations are generally global, while curved transformations are more or less local, depending upon the underlying physical model used. In particular, in Chapter 5 we will show a curvature-based algorithm that does not rely upon an underlying physical model.

Degree of interaction refers to the degree of intervention of a human operator in the registration algorithm. The fully automatic algorithm, which requires no user interaction and represents the ideal situation, is a central focus of the registration service presented in this work.

The optimization procedure is the method by which the function that measures the alignment of the images is maximized. Depending upon the mathematical approach to registration used, i.e., parametric or non-parametric, the optimization method will try to find an optimum of some function defined on the parameter space (parametric registration), or will try to come up with an appropriate measure, both for the similarity of the images as well as for the likelihood of a non-parametric transformation (non-parametric registration). The more common situation is that in which a global extremum is sought among many local ones by means of iterative search. In parametric registration, popular techniques include traditional numerical methods like Powell’s method [92], Downhill Simplex [66] and gradient descent methods, as well as evolutionary methods like genetic algorithms [73], simulated annealing [115], and differential evolution [80]; a minimal code sketch of such a parametric, evolutionary approach is given at the end of this section.
By contrast, the idea behind non-parametric registration is to come up with an appropriate measure, both for the similarity of the images as well as for the likelihood of a non-parametric transformation, as shown in the algorithm presented in Chapter 5.

Modalities refers to the means by which the images to be registered are acquired. Two-dimensional images are acquired, e.g., by X-ray projections captured on film or digitally, and three-dimensional images are typically acquired by tomographic modalities such as computed tomography (CT), magnetic resonance imaging (MRI), or positron emission tomography (PET). In medical applications the object in each view is some anatomical region of the body. In all cases we are concerned primarily with digital images stored as discrete arrays of intensity values. Registration methods used for like modalities are typically distinct from those used for differing modalities; they are called mono-modal and multi-modal registration, respectively. We will present a mono-modal registration method in this work.

Subject refers to patient involvement: there can be intra-patient registration, involving images of the same patient, inter-patient registration, involving images of different patients, and atlas registration. Atlas registration refers to registration between an image acquired from a single patient and an image constructed from an image database of many patients. Finally, object refers to the particular region of anatomy to be registered, e.g., the mandible. To show the distributed method implemented, in this work we will use 3D images of the human brain.
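To make the parametric formulation concrete, the following fragment sketches how a rigid 2D registration could be driven by one of the evolutionary optimizers mentioned above, differential evolution (DE/rand/1/bin). It is an illustrative outline only, not the implementation developed in this work: the similarity measure is a negated sum of squared differences with nearest-neighbour interpolation, the transformation is a rotation plus translation, and the class and method names (RigidRegistrationSketch, similarity, register) are hypothetical.

import java.util.Random;

/**
 * Minimal sketch: rigid 2D registration (rotation + translation) driven by
 * differential evolution (DE/rand/1/bin). The similarity measure is a negated
 * sum of squared differences, so higher values mean a better alignment.
 */
public class RigidRegistrationSketch {

    private static final Random RNG = new Random(42);

    /** Applies the rigid transform p = (theta, tx, ty) to the floating image
     *  with nearest-neighbour interpolation and returns -SSD w.r.t. the reference. */
    static double similarity(double[][] ref, double[][] flo, double[] p) {
        double cos = Math.cos(p[0]), sin = Math.sin(p[0]), ssd = 0;
        int h = ref.length, w = ref[0].length;
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int xs = (int) Math.round(cos * x - sin * y + p[1]);
                int ys = (int) Math.round(sin * x + cos * y + p[2]);
                double v = (xs >= 0 && xs < w && ys >= 0 && ys < h) ? flo[ys][xs] : 0;
                double d = ref[y][x] - v;
                ssd += d * d;
            }
        }
        return -ssd;
    }

    /** DE/rand/1/bin search over the parameter vector (theta, tx, ty). */
    static double[] register(double[][] ref, double[][] flo) {
        int np = 30, dim = 3, generations = 200;
        double f = 0.7, cr = 0.9;
        double[][] pop = new double[np][dim];
        double[] fit = new double[np];
        for (int i = 0; i < np; i++) {                       // random initial population
            pop[i][0] = (RNG.nextDouble() - 0.5) * Math.PI / 2;
            pop[i][1] = (RNG.nextDouble() - 0.5) * 20;
            pop[i][2] = (RNG.nextDouble() - 0.5) * 20;
            fit[i] = similarity(ref, flo, pop[i]);
        }
        for (int g = 0; g < generations; g++) {
            for (int i = 0; i < np; i++) {
                int r1, r2, r3;                              // three distinct mates
                do { r1 = RNG.nextInt(np); } while (r1 == i);
                do { r2 = RNG.nextInt(np); } while (r2 == i || r2 == r1);
                do { r3 = RNG.nextInt(np); } while (r3 == i || r3 == r1 || r3 == r2);
                double[] trial = pop[i].clone();
                int jRand = RNG.nextInt(dim);                // at least one mutated gene
                for (int j = 0; j < dim; j++) {
                    if (RNG.nextDouble() < cr || j == jRand) {
                        trial[j] = pop[r1][j] + f * (pop[r2][j] - pop[r3][j]);
                    }
                }
                double trialFit = similarity(ref, flo, trial);
                if (trialFit >= fit[i]) {                    // greedy selection
                    pop[i] = trial;
                    fit[i] = trialFit;
                }
            }
        }
        int best = 0;
        for (int i = 1; i < np; i++) if (fit[i] > fit[best]) best = i;
        return pop[best];                                    // (theta, tx, ty)
    }
}

The same loop structure accommodates any of the optimizers listed above; only the way candidate parameter vectors are generated and selected changes.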

1.2 High Performance Computing

High performance computing (HPC) is the use of parallel processing for running advanced application programs efficiently, reliably, and quickly. Technology plays a critical role today in helping academics do research more effectively, and most research relies on HPC for data-intensive applications, information access, visualization, and communications. The last decade has seen a considerable increase in the performance of computers and communication networks, mainly due to faster development of hardware and more elaborate software. However, there are still many problems associated with algorithms in the fields of science, engineering, and business which cannot be managed efficiently with the current generation of supercomputers. This inefficiency is reflected in several negative factors associated with supercomputers: high cost1, complex maintenance and administration, limited scalability, and rapid obsolescence.

Another option is to link together a homogeneous set of high performance servers or personal computers [119] by means of a fast local area network (e.g., fiber-optic systems). These are known as computer clusters and can provide a computing capacity similar to that provided by supercomputers, at a fraction of their cost.

A number of teams have conducted studies on the cooperative use of geographically distributed resources conceived as a single powerful virtual computer [41]. This alternative approach is known by several names, such as meta-computing, global computing, and more recently grid computing. Internet and grid-based systems, whether their purpose is computation, collaboration or information sharing, are all instances of systems based on the application of fundamental principles of distributed computing. Grid computing is a set of standards and technologies that academics, researchers, and scientists around the world are developing to help organizations take collective advantage of improvements in microprocessor speeds, optical communications, raw storage capacity, and the Internet. By using the technique to disaggregate their computer platforms and distribute them as network resources, researchers can vastly increase their computing capacity. Linking geographically dispersed and heterogeneous computer systems can lead to important gains in computing power, speed, and productivity.

The following sections review the current state of the art in economically viable high performance computing, meaning by this cluster computing, grid computing, and an emerging and revolutionary trend known as cloud computing.

1.3 Cluster Computing

A computer cluster is a group of linked computers working together closely so that in many respects they form a single computer. The components of a cluster are commonly, but not always, connected to each other through fast local area networks. Clusters are usually deployed to improve performance and/or availability over that of a single computer, while typically being much more cost-effective than single computers of comparable speed or availability [30].

1 The IBM Roadrunner, the world’s second fastest computer as of the end of 2009: about 133 million dollars (http://en.wikipedia.org/wiki/IBM_Roadrunner).

1.4 Grid Computing

The last decade has seen a considerable increase in commodity computer and network performance, mainly as a result of faster hardware and more sophisticated software. Nevertheless, there are still algorithms associated with problems in the fields of science, engineering, and business which cannot be dealt with effectively by the current generation of supercomputers [87]. In fact, due to their size and complexity, these problems are often numerically and/or data intensive and require a variety of heterogeneous resources that cannot be provided by a single machine. A number of teams have conducted studies on the cooperative use of geographically distributed resources conceived as a single powerful virtual computer [41]. This alternative approach is known by several names, such as metacomputing, global computing, and more recently Grid Computing.

Grid computing uses distributed interconnected computers and resources collectively to achieve higher performance computing and resource sharing. It was developed in the mid-1990s with the growth of high-speed networks and the Internet, which allowed distributed computer systems to be readily interconnected. Grid computing has become one of the most important techniques in high performance computing, above all by providing resource sharing in science. By taking advantage of the Internet and high-speed networks like Internet2, Géant and CLARA2, geographically distributed computers can be used collectively for collaborative problem solving. In Grid Computing, different organizations can supply the resources and personnel, and the Grid infrastructure can cross organizational and institutional boundaries. This concept has many benefits:

• Problems that could not be solved previously because of limited computing resources can now be tackled successfully. Examples include understanding the human genome, searching for new drugs, and the study of the birth of our Universe.

• Interdisciplinary teams can be formed across different institutions and organizations to tackle problems that require the expertise of multiple disciplines.

2 Internet2 is a networking consortium that operates the Internet2 Network. This network connects over 60,000 U.S. educational, research, government and “community anchor” institutions (http://www.internet2.edu/). Géant is a European multi-gigabit computer network dedicated to serving Europe’s research and education community and connects many educational networks from Europe and other countries in the world (http://www.geant.net/). CLARA stands for Cooperación Latino Americana de Redes Avanzadas (http://www.redclara.net/).

• Specialized experimental equipment can be accessed remotely and collectively within a Grid infrastructure, e.g., the Large Hadron Collider at CERN3.

• Large collective databases can be created to hold vast amounts of data.

• Unused compute cycles can be harnessed at remote sites, achieving high performance and more efficient use of computers.

• Business processes can be re-implemented using Grid technologies for cost saving.

Perhaps the most important and differentiating feature of Grid computing is the ability to conduct collaborative computing. Grid computing is about collaboration and resource sharing as much as it is about high performance computing. Distributed computing certainly existed before Grid computing, but the easy prospect of developing teams of geographically distributed researchers, a hallmark of Grid computing, became a reality with the development of the Internet. It is common practice to use the word Grid as a proper noun, although that does not refer to a universal Grid. Without qualification, the word Grid refers to Grid computing infrastructures in general. Although very high performance Grid projects employ their own dedicated high-speed interconnection networks, using the Internet to interconnect the distributed computers is what really made Grid computing accessible to all. Grid computing came from the recognition that the Internet and Internet-type interconnections provide a unique opportunity for implementing a geographically distributed computing system. Some Grid projects involve computers spread across the globe [103], while others are more localized [94], depending upon the goals.

3 Why the LHC, http://public.web.cern.ch/public/en/LHC/WhyLHC-en.html.

Grid computing is driven by five big areas:

• Resource sharing: Global sharing is the very essence of grid computing.

• Secure access: Trust between resource providers and users is essential, especially when they don’t know each other. Sharing resources conflicts with security policies in many individual computer centers, and on individual PCs, so getting grid security right is crucial.

• Resource use: Efficient, balanced use of computing resources is essential.

• The death of distance: Distance should make no difference: you should be able to access computer resources from wherever you are.

• Open standards: Interoperability between different grids is a big goal, and is driven forward by the adoption of open standards for grid development, making it possible for everyone to contribute constructively to grid development. Standardization also encourages industry to invest in developing commercial grid services and infrastructure.

A grid’s architecture is often described in terms of “layers”, where each layer has a specific function. The higher layers are generally user-centric, whereas lower layers are more hardware-centric, focused on computers and networks.

• The lowest layer is the network, which connects grid resources. Nowadays, the de facto network is the Internet.

• Above the network layer lies the resource layer: actual grid resources, such as computers, storage systems, electronic data catalogues, sensors and even telescopes that are connected to the network.

• The middleware layer provides the tools that enable the various elements (servers, storage, networks, etc.) to participate in a grid. The middleware layer is sometimes called the “brains” behind a computing grid, and is responsible for hiding distribution and the heterogeneity of the various hardware components, operating systems and communication protocols. At its most basic level, middleware is nothing but a way of abstracting access to a resource through the use of an Application Programming Interface (API).

• The highest layer of the structure is the application layer, which includes applications in science, engineering, business, finance and more, as well as portals and development toolkits to support the applications. This is the layer that grid users “see” and interact with. The application layer often includes the so-called serviceware, which performs general management functions like tracking who is providing grid resources and who is using them.

Despite their benefits, distributed systems can be notoriously difficult to build. Perhaps the most obvious complexity is the variety of machine architectures and software platforms over which a distributed application must function. In the past, developing a distributed application entailed porting it to every platform it would run on, as well as managing the distribution of platform-specific code to each machine. Most of the computers on the University campus are used for academic tasks and run a variety of operating systems, a fact that clearly indicated the necessity of a platform-independent middleware.
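To illustrate what such an abstraction might look like at the programming level, the following Java interface is a purely hypothetical sketch; the names ComputeResource, Task and submit are invented for this example and are not taken from Globus or any other middleware. The point is simply that a client programs against a uniform contract, and the middleware maps calls onto heterogeneous back ends.

import java.io.Serializable;
import java.util.concurrent.Future;

/**
 * Hypothetical middleware abstraction: a client submits work against this
 * contract and never sees the operating system, scheduler or network protocol
 * of the machine that executes it.
 */
public interface ComputeResource {

    /** A unit of work that can be shipped to any node and executed there. */
    interface Task<R extends Serializable> extends Serializable {
        R execute() throws Exception;
    }

    /** Submits a task for remote execution and returns a handle to its result. */
    <R extends Serializable> Future<R> submit(Task<R> task);

    /** Static description of the node (CPU count, memory, platform, ...). */
    String describe();
}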


Figure 1.2: Archetypal grid architecture. The easiest way to integrate heteroge- neous computing resources is not to recreate them as homogeneous elements, but to provide a layer that allows them to communicate despite their differences. This software layer is commonly known as middleware.

The Globus Toolkit is a popular example of grid middleware. It is a set of tools for constructing a grid, covering security measures, resource location, resource management, communications and so on. Many major grid projects use the Globus Toolkit, which is being developed by the Globus Alliance4, a team primarily involving Ian Foster’s team at Argonne National Laboratory and Carl Kesselman’s team at the University of Southern California in Los Angeles.

4 Globus has become the de facto open source toolkit for building computing grids, http://www.globus.org/.

Many of the protocols and functions defined by the Globus Toolkit are similar to those in networking and storage today, but have been optimized for grid-specific deployments. Globus includes services such as:

• GRAM (Globus Resource Allocation Manager): figures out how to convert a request for resources into commands that local computers can understand.

• GSI (Grid Security Infrastructure): authenticates users and determines their access rights.

• MDS (Monitoring and Discovery Service): collects information about resources such as processing capacity, bandwidth capacity, type of storage, and so on.

• GRIS (Grid Resource Information Service): queries resources for their current configuration, capabilities, and status.

• GIIS (Grid Index Information Service): coordinates arbitrary GRIS services.

• GridFTP (Grid File Transfer Protocol): provides a high-performance, secure, and robust data transfer mechanism.

• Replica Catalog: provides the location of replicas of a given dataset on a grid.

• The Replica Management system: manages the Replica Catalog and GridFTP, allowing applications to create and manage replicas of large datasets.

There are two main reasons for the strength and popularity of the Globus toolkit:

1. Grids need to support a wide variety of applications created according to different programming paradigms. Rather than providing a uniform programming model for grid applications, the Globus Toolkit has an “object-oriented approach”, providing a bag of services so that developers can choose the services that best meet their needs. The tools can also be introduced one at a time. For example, an application can use GRAM or GRIS without necessarily having to use the Globus security or replica management systems.

2. The Globus Toolkit is available under an “open-source” licensing agreement, which means anyone is free to use or improve the software. This is similar to the World Wide Web and the Linux operating system.

There are many other layers within the middleware layer. For example, middleware includes a layer of “resource and connectivity protocols”, and a higher layer of “collective services”.

Resource and connectivity protocols handle all grid-specific network transactions between different computers and grid resources. For example, computers contributing to a particular grid must recognize grid-relevant messages and ignore the rest. This is done with communication protocols, which allow the resources to communicate with each other, enabling exchange of data, and authentication protocols, which provide secure mechanisms for verifying the identity of both users and resources. The collective services are also based on protocols: information protocols, which obtain information about the structure and state of the resources on a grid, and management protocols, which negotiate uniform access to the resources. Collective services include:

• updating directories of available resources,

• brokering resources (which, like stockbroking, is about negotiating between those who want to “buy” resources and those who want to “sell”),

• monitoring and diagnosing problems,

• replicating data so that multiple copies are available at different locations for ease of use,

• providing membership/policy services for tracking who is allowed to do what and when.

From a service point of view, the architecture of a grid computing infrastructure can also be described as a layered model. At the bottom, the Data and Computational grids provide the basic services: data availability, privacy, security, and computing power. The main functions of a computational-data grid include compute load sharing, algorithm partitioning, resolution of data source addresses, security, replication, and message rerouting. Based upon these two, an Information Grid can be built. The Information Grid resolves homogeneous access to heterogeneous information sources. Based upon the services provided by an Information Grid, and by using data mining and other related techniques such as BI (Business Intelligence) and OLAP (Online Analytical Processing), a Knowledge Grid can be implemented. A Knowledge Grid incorporates epistemology and ontology to reflect human cognitive characteristics; it exploits social, ecological and economic principles to provide appropriate on-demand services to support scientific research, technological innovation, cooperative teamwork, problem solving, and decision making [55]. The Knowledge Grid utilizes knowledge discovery in databases technology to generate knowledge, and also allows for the representation of knowledge through scholarly works, peer-reviewed publications and grey literature, the latter especially hyperlinked to information and data to sustain the assertions in the knowledge. As shown below, the aim and scope of this work has been centered on the design and development of a high performance computing facility, i.e., a Computational Grid, built from a large set of (shared) commodity computers.


Figure 1.3: The architecture currently envisaged attempts to bring together the (upward) refinement of data to information and knowledge and the (downward) application of knowledge to information handling and data collection through feedback loop control.

For the last seven years, the author of this work has been working on the development of a computing grid infrastructure that profits from idle CPU cycles (cycle scavenging) of the workstations that are part of the National University campus at Bogotá, Colombia (http://ungrid.unal.edu.co/). Given that several thousand computers with high heterogeneity operate on the campus, Jini™ [95] was chosen as the architectural foundation of the computational grid. Jini is a platform-agnostic network architecture for the construction of distributed systems based on network-centric services that are highly adaptive to change. The features and properties of this technology, such as spontaneous networking and service discovery, leasing, distributed events and transactions, security, and service-oriented programming models, make it a very suitable base for creating dynamic, reliable grid systems. Currently, the grid is in operational status and has been used to work on multiple scientific problems whose computational cost would have been prohibitive otherwise. Detailed information, including the replicated-worker design pattern used to implement the computing grid, will be presented in Chapter 2, A Grid Computing Framework for Medical Imaging. As expected, unGrid uses Globus in the middleware layer.
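As an illustration of the spontaneous discovery just mentioned, the fragment below outlines the standard Jini idiom a grid client could use to locate the shared JavaSpace. It is a generic sketch rather than the unGrid code itself, and it assumes that a Jini lookup service is reachable and that a suitable security policy and configuration are in place.

import net.jini.core.lookup.ServiceItem;
import net.jini.core.lookup.ServiceTemplate;
import net.jini.discovery.LookupDiscovery;
import net.jini.discovery.LookupDiscoveryManager;
import net.jini.lease.LeaseRenewalManager;
import net.jini.lookup.ServiceDiscoveryManager;
import net.jini.space.JavaSpace;

/** Sketch of the standard Jini idiom for finding a JavaSpace on the network. */
public class SpaceLocator {

    public static JavaSpace findSpace() throws Exception {
        // Multicast discovery of lookup services in all groups.
        LookupDiscoveryManager discovery =
                new LookupDiscoveryManager(LookupDiscovery.ALL_GROUPS, null, null);
        ServiceDiscoveryManager sdm =
                new ServiceDiscoveryManager(discovery, new LeaseRenewalManager());

        // Match any registered service that implements the JavaSpace interface.
        ServiceTemplate template =
                new ServiceTemplate(null, new Class[] { JavaSpace.class }, null);

        // Block for up to ten seconds waiting for a matching service to appear.
        ServiceItem item = sdm.lookup(template, null, 10000);
        return (item == null) ? null : (JavaSpace) item.service;
    }
}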

1.5 Cloud Computing

Cloud computing is a general term for anything that involves delivering hosted services over the Internet. These services are broadly divided into three categories: Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS) and Software-as-a-Service (SaaS). The name cloud computing was inspired by the cloud symbol that’s often used to represent the Internet in flow charts and diagrams. Figure 1.4 shows a taxonomy map of Cloud Computing where it can be seen that SaaS is the base service upon which all others are built.


Figure 1.4: Cloud Computing taxonomy map.

A cloud service has three distinct characteristics that differentiate it from traditional hosting. It is sold on demand, typically by the minute or the hour; it is elastic, in that a user can have as much or as little of a service as they want at any given time; and the service is fully managed by the provider (the consumer needs nothing but a personal computer and Internet access). Significant innovations in virtualization and distributed computing, as well as improved access to high-speed Internet and a weak economy, have accelerated interest in cloud computing.

The common thread between grid computing and cloud computing is the use of the Internet to access the resources. Cloud computing is driven by the widespread access that Internet technologies provide. However, cloud computing is quite distinct from the original purpose of grid computing. Whereas Grid computing focuses on collaborative and distributed shared resources, cloud computing concentrates upon placing resources for paying users to access. The technologies for cloud computing emphasize the use of services (software as a service, or SaaS) and eventually the use of virtualization (the process of separating the particular user’s software environment from the underlying hardware).

A number of companies entered the cloud computing space in the mid-to-late 2000s. IBM5 was an early promoter of on-demand Grid computing in the early 2000s and moved into cloud computing in a significant way, opening several cloud computing centers in 2008, first in Ireland (Dublin), and subsequently in the Netherlands (Amsterdam), China (Beijing), and South Africa (Johannesburg). Other major cloud computing players include Amazon6 and Google7, who utilize their massive server infrastructure. The cloud computing business model goes one step further than hosting companies simply renting out the servers they provide at their locations, which became popular in the early-to-mid 2000s and continues to date.

5 While the site name is IBM - Cloud Computing, the URL still refers to “grid”: http://www.ibm.com/grid/.
6 Amazon Elastic Compute Cloud EC2, http://aws.amazon.com/ec2/.
7 Google App Engine, http://code.google.com/appengine/.

1.6 Problem Definition

As mentioned, one major problem with advanced image registration techniques is their high computational cost. Because of this constraint, these methods have found limited application in clinical situations where real-time or near-real-time execution is required, e.g., intra-operative imaging or image-guided surgery. High performance in image registration can be achieved by reduction of the data space, reduction of the solution search space, and parallel processing. Reduction of the data space can be achieved, for instance, by performing registration using only sub-images or by exploiting multi-resolution representations such as those provided by the wavelet transform. Reduction of the solution search space can be achieved by using an iterative refinement search such as hill-climbing or simulated annealing. These techniques can significantly reduce the registration time without compromising registration accuracy. However, to obtain a significant increase in performance, these approaches must be complemented with parallel processing. At the same time, parallel processing has always been associated with extremely expensive supercomputers or high performance clusters, almost equally expensive and unaffordable for most medical institutions.
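As a sketch of the data-space reduction idea, the following fragment outlines a coarse-to-fine (pyramid) strategy: the images are repeatedly downsampled, registration is run on the coarsest pair, and the estimate found there seeds the search at each finer level. The method names (downsampleByTwo, registerAtLevel) are placeholders rather than part of any library, and any per-level optimizer, such as the evolutionary one sketched in Section 1.1, could be plugged in.

/**
 * Coarse-to-fine (pyramid) registration sketch. Each level halves the image
 * size; the transform found at a coarse level seeds the search at the next
 * finer level, shrinking both the data and the search space.
 */
public class PyramidRegistrationSketch {

    /** Simple 2x2 block-average downsampling. */
    static double[][] downsampleByTwo(double[][] img) {
        int h = img.length / 2, w = img[0].length / 2;
        double[][] out = new double[h][w];
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                out[y][x] = (img[2 * y][2 * x] + img[2 * y + 1][2 * x]
                           + img[2 * y][2 * x + 1] + img[2 * y + 1][2 * x + 1]) / 4.0;
        return out;
    }

    static double[] register(double[][] reference, double[][] floating, int levels) {
        double[][][] refPyr = new double[levels][][];
        double[][][] floPyr = new double[levels][][];
        refPyr[0] = reference;
        floPyr[0] = floating;
        for (int l = 1; l < levels; l++) {                   // build the pyramids
            refPyr[l] = downsampleByTwo(refPyr[l - 1]);
            floPyr[l] = downsampleByTwo(floPyr[l - 1]);
        }
        double[] p = { 0.0, 0.0, 0.0 };                      // (theta, tx, ty)
        for (int l = levels - 1; l >= 0; l--) {              // coarse to fine
            if (l < levels - 1) {                            // translations scale with resolution
                p[1] *= 2;
                p[2] *= 2;
            }
            p = registerAtLevel(refPyr[l], floPyr[l], p);    // refine the estimate at this level
        }
        return p;
    }

    /** Placeholder for any per-level optimizer seeded with an initial estimate. */
    static double[] registerAtLevel(double[][] ref, double[][] flo, double[] initial) {
        return initial;
    }
}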

1.7 Document Organization

This dissertation has been prepared following the recommendations given by Davis and Parker in “Writing the Doctoral Dissertation” [45]. In particular, special attention has been paid to Chapter 12, The Defense and Publishing the Results, in which the authors emphasize the significance of the candidate and advisor partnership, as well as the importance of careful planning and a systematic approach.

This dissertation is organized as follows. Chapter 2, A Grid Computing Framework for Medical Imaging, presents my experience in building a scalable computing framework for medical imaging, using a service-oriented architecture and open source software tools. To show the use of the framework in some real working situations, three case studies are presented: Chapter 3, Automatic Subtraction Radiography using Distributed Evolutionary Algorithms, reviews the algorithms behind the implementation of an online rigid registration system; Chapter 5, Curvature-based 3D Non-rigid Image Registration Service, analyzes a novel non-rigid registration service that uses a second-derivative kernel to achieve a rigid and non-rigid registration in one execution; and Chapter 6, Atlas-based Segmentation Service, describes an automatic segmentation service based upon 3D atlases and the previous non-rigid registration service. Finally, Chapter 7, Discussion and Conclusions, summarizes the contributions of this work, draws some pertinent conclusions, and discusses my experience adopting a service-oriented approach and an open source development model.

Appendix A includes the document “Characterization of Tier 3 Sites”, the result of my internship at CERN (Conseil Européen pour la Recherche Nucléaire). Appendix B shows the process of implementation of a high-performance computing cluster at the National University of Colombia main campus in Bogotá. Finally, Appendix C brings together the book chapter and papers published during the doctoral period.

Chapter 2

A Grid Computing Framework for Medical Imaging

The following text is taken from the article “Space-based Computing Grid for Medical Imaging”, G. Mañana and E. Romero, submitted to Computer Methods and Programs in Biomedicine, Elsevier, July 2010.

As introduced in Section 1.2, high performance computing (HPC) is the use of parallel processing for running advanced application programs efficiently. There are a few ways to implement parallel processing, and the choice ultimately depends upon the available hardware infrastructure. Supercomputers, despite their cost, are still a valid option. However, as previously noted, there are many problems associated with algorithms in the fields of science, engineering, and business which cannot be managed efficiently with the current generation of supercomputers. This inefficiency is reflected in several negative factors associated with supercomputers: extremely complex maintenance and administration, limited scalability, and rapid obsolescence. The other option is to link together a substantial number of CPUs (or GPUs) by means of a fast network. These are known as computer clusters and can provide a computing capacity similar to that provided by supercomputers, at a fraction of their cost.

As we will see next, these systems can be classified as tightly-coupled (clusters) or loosely-coupled (grids); however, they are all instances of systems based on the application of fundamental principles of distributed computing. The next section will review these principles, as well as the challenges faced when implementing a system of this class. This chapter presents a grid computing framework for medical imaging whose architecture rests upon a shared and distributed memory space, on top of which a space-based computing grid has been built.


2.1 Distributed Systems

A distributed system consists of multiple autonomous computers that communicate through a computer network. The computers interact with each other in order to achieve a common goal. The word distributed in terms such as “distributed system”, “distributed programming”, and “distributed algorithm” originally referred to computer networks where individual computers were physically distributed within some geographical area [100]. These terms are nowadays used in a much wider sense, even when referring to autonomous processes that run on the same physical computer and interact with each other by message passing [47].

Each computer, or node, in a distributed system may have its own user with individual needs, and the purpose of the distributed system is to coordinate the use of shared resources or provide communication services to the users. Alternatively, a distributed system may have a common goal, such as solving a large computational problem [114]. The terms “concurrent computing”, “parallel computing”, and “distributed computing” have a lot of overlap, and no clear distinction exists between them. The same system may be characterized both as “parallel” and “distributed”; the processors in a typical distributed system run concurrently in parallel [28]. Parallel computing may be seen as a particular tightly-coupled form of distributed computing, and distributed computing may be seen as a loosely-coupled form of parallel computing. Nevertheless, it is possible to roughly classify concurrent systems as “parallel” or “distributed” using the following criteria:

• In parallel computing, all processors have access to a shared memory. Shared memory can be used to exchange information between processors [23, 60].

• In distributed computing, each processor has its own private memory (distributed memory). Information is exchanged by passing messages between the processors [100, 47, 114].

As shown below, the computing grid presented in this dissertation clearly falls into the latter category, in the sense that each processor has its own memory, but it also uses a shared and network-accessible memory space for the exchange of messages.

Various hardware and software architectures are used for distributed computing. At a lower level, it is necessary to interconnect multiple CPUs with some sort of network, regardless of whether that network is printed onto a circuit board or made up of loosely-coupled devices and cables. At a higher level, it is necessary to interconnect processes running on those CPUs with some sort of communication system. Distributed systems typically fall into one of several basic and well known architectures: client-server and its many variations (e.g., N-tier architectures) [90, 39], peer-to-peer [111, 130], and space-based [68, 106]. Another basic aspect of a distributed computing architecture is the method of communicating and coordinating work among concurrent processes. In this regard, distributed systems can be classified as tightly-coupled or loosely-coupled. Computing clusters are an example of tightly-coupled systems, while space-based systems generally exhibit decoupling in time, space, and reference.

Distributed computing has many advantages over the stand-alone application model; however, distributed applications can be notoriously difficult to design, build, and debug. Perhaps the most obvious complexity is the variety of machine architectures and software platforms over which a distributed application must commonly execute. In the past, this heterogeneity problem has thwarted the development and proliferation of distributed applications: developing an application entailed porting it to every platform it would run on, as well as managing the distribution of platform-specific code to each machine. More recently, virtual machines (e.g., Java) have eased this burden by providing automatic loading of class files across a network, along with a common runtime that runs on most platforms. Nevertheless, a networked environment presents many challenges beyond heterogeneity. By their very nature, distributed applications are built from multiple (potentially faulty) components that communicate over (potentially slow and unreliable) network links. These characteristics force us to deal with issues such as latency, synchronization, and partial failure, which do not occur in standalone applications. These issues have a significant impact on distributed application design and development.

2.2 Space-based Systems

A space, in this context, is a shared and network-accessible repository for objects that processes can use as persistent storage and exchange mechanism: instead of communicating directly, they coordinate by exchanging objects through spaces, as shown in Figure 2.1. The space-based model of distributed computing has its roots in the Linda coordination language developed by Dr. Gelernter at Yale University [27]. Processes perform simple operations to write new objects into a space, take (remove) objects from a space, or read (make a copy of) objects in a space.

The construction of space-based applications requires the design of distributed data structures, and of distributed protocols that operate over them. A distributed data structure is made up of multiple objects that are stored in one or more spaces. Representing data as a collection of objects in a shared space allows multiple processes to concurrently access and modify the data structure. Distributed protocols, on the other hand, define the way participants in an application share and modify these data structures in a coordinated way. Distributed protocols written using spaces have the advantage of being loosely-coupled: because processes interact indirectly through a space, data senders and receivers are not required to know each other’s identities or even to be active at the same time. Conventional network tools (e.g., MPI) require that all messages be sent to a particular process (who), on a particular machine (where), and at a particular time (when). In contrast, using a space-based system, a process can write an object into a space and expect that some other process, somewhere, at some time, will take the object and make use of it according to the distributed protocol in use. Uncoupling senders and receivers leads to protocols that are simple, flexible, and reliable. Despite its minimal programming interface, space-based technologies provide a unique set of features that allows for the construction of loosely coupled and transactionally secure distributed applications and, as we will show next, for the implementation of computing grids with automatic load balancing. Some relevant features provided by a space are:

Spaces are shared: Spaces are “shared memories” where remote processes can create and manipulate distributed data structures in a concurrent manner. The onus is on the space service to handle this concurrency consistently.

Spaces are persistent: Spaces provide reliable storage (temporary or permanent), meaning that objects written into a space may outlive the processes that created them. Processes can also specify a “lease” time for an object, after which it will automatically be removed from the space.

Spaces are associative: Objects in a space are located via associative lookup, rather than by memory location. Associative lookup provides a simple means of finding objects according to their content.

Spaces are transactionally secure: Spaces provide a transaction mechanism that guarantees that all operations are atomic. Transactions are supported for single or multiple operations, over one or more spaces.

As said before, spaces allow the exchange of objects, that is, self-contained entities that include data and related code. While in the space, objects are just passive data, but when read or taken out of the space they transform themselves into standalone applications, i.e., spaces allow for the exchange of executable content. This not only solves the problem of code distribution, but also gives us, along with the leasing model and distributed transactions, a powerful mechanism for building parallel computing servers. The application pattern used for this is known as the Replicated-Worker pattern [37], and involves a manager process that divides a problem into smaller tasks and puts them into a space. The workers take and execute these tasks, and write the results back into the space. It is then the responsibility of the manager to collect the task results and combine them into a meaningful overall solution (Figure 2.2). In this respect, the space acts as a mediator between the manager and the worker nodes, another commonly used design pattern in object-oriented programming.

It is worth pointing out a couple of additional and important characteristics of this pattern. First, each worker process may execute many tasks: as soon as one task is computed, a worker can take another task from the space and execute it. In this way, the replicated-worker pattern automatically balances the load: workers compute tasks in direct relation to their availability and capacity to do the work. Second, the type of applications that fit into the replicated-worker pattern scales naturally: more workers can be added and the computation speeds up, without rewriting the code. The Appendix shows how the JavaSpaces API¹ and the replicated-worker pattern can be used to build a generic worker node.

Based upon the previous considerations, and a thorough analysis of the middleware technologies available at that moment, we decided to use the Java-based Jini² platform on top of which to construct our computing grid. Jini, an open source technology that is now part of the Apache River project [14], is a service-oriented technology that provides a platform- and protocol-agnostic infrastructure to build distributed systems. This includes services for the registration and discovery of other services, distributed events, transactional support like that provided by relational database engines, and, most important for us, the JavaSpaces service. This technology is a high-level coordination tool for gluing processes together into a distributed application by means of a minimal, yet powerful, programming interface. It is a departure from conventional distributed tools, which rely on passing messages between processes (MPI) or invoking methods on remote objects (RPC), and produce inherently tightly coupled systems. The JavaSpaces technology uses a fundamentally different approach that views a distributed application as a collection of processes cooperating via the flow of objects into and out of one or more distributed shared-memory spaces.
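To make the coordination primitives concrete, the following is a minimal sketch of a worker built directly on the JavaSpaces interface (write, take, read). The entry types and field names are illustrative assumptions, not the ones used in the generic worker listed in the Appendix.

import net.jini.core.entry.Entry;
import net.jini.space.JavaSpace;

// Hypothetical task and result entries exchanged through the space.
// JavaSpaces entries expose public object fields and a public no-arg constructor.
class TaskEntry implements Entry {
    public Integer taskId;
    public double[] chromosome;   // e.g., candidate transformation parameters
    public TaskEntry() {}
}

class ResultEntry implements Entry {
    public Integer taskId;
    public Double fitness;        // e.g., the correlation ratio of the candidate
    public ResultEntry() {}
}

public class GenericWorker {
    private final JavaSpace space;

    public GenericWorker(JavaSpace space) { this.space = space; }

    // Replicated-worker loop: take a task, compute it, write the result back.
    public void run() throws Exception {
        TaskEntry template = new TaskEntry();   // null fields match any task in the space
        while (!Thread.currentThread().isInterrupted()) {
            TaskEntry task = (TaskEntry) space.take(template, null, Long.MAX_VALUE);
            ResultEntry result = new ResultEntry();
            result.taskId = task.taskId;
            result.fitness = evaluate(task.chromosome);
            space.write(result, null, 60_000);  // one-minute lease, purely illustrative
        }
    }

    private double evaluate(double[] chromosome) {
        return 0.0;                             // placeholder for the application-specific work
    }
}

A worker of this kind never needs to know which manager produced a task or on which machine that manager runs; all coordination is mediated by the space, which is what makes the load balancing automatic.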

¹ ² Java, Jini and JavaSpaces are trademarks of Sun Microsystems Inc.

Figure 2.1: Space-based coordination by means of a minimal programming interface: write, read, take.

Figure 2.2: Space-based computing grid using the replicated-worker design pattern.

Another important issue that has to be addressed when implementing a distributed system is fault tolerance on the server side. So far, we have discussed several tools that Jini provides to obtain fault tolerance in the worker nodes. However, the set of Jini services runs on a single server computer, a situation known as a single point of failure (SPOF). This means that if this single server fails for some reason, the whole computing grid goes down. To avoid this situation, we have embedded our computing grid service, along with Jini, in a layered application (JEE). The service is then run in a cluster of six application servers using two open source frameworks: the JBoss application server [75] and the Terracotta clustering tool [120].

Figure 2.3 shows a diagram of the overall architecture of the medical imaging framework implemented. The user interacts with the framework services using a standard browser, independently of the operating system or hardware used. In the browser resides an “applet” that uses the Java binding for the OpenGL API [97] to provide the tools for visualization, manipulation and lightweight processing tasks such as drawing and rigid geometric transformations (e.g., translation, rotation, scaling). At this point, there are two possibilities or working modes of the system depending upon the connectivity status of the user.


Figure 2.3: Overall architecture of the medical imaging framework, showing the technologies used for communication between neighbouring components: (a) plain Java objects (POJO), (b) HTTP, HTTPS, sockets TCP/UDP, (c) Job Submission Description Language (XML) over sockets TCP, (d) JavaSpaces API, (e) Java Database Connectivity (JDBC), (f) sockets TCP, (g) JDBC.

If there is an active Internet connection, then all user database or server-related actions are sent to a web application that is responsible for performing some administrative tasks and then redirecting them to the HA cluster. In this case, the proxy component works transparently and merely acts as a simple cache for the client applet. Otherwise, the proxy communicates with a local server that in turn uses a local lightweight database to store all relevant user data. Either way, the applet only communicates with the proxy and always follows the same protocols. When there is again an active Internet connection, the proxy takes care of synchronizing all data (to and from) with the server side.

For detailed information about the framework technical specifications, the hardware and software components, the installation and configuration process followed, as well as the communications scheme deployed, please refer to Section B.1, Appendix B.


Figure 2.4: Data privacy in the Computing Grid layer is attained using the Proxy design pattern.

2.3 Data Privacy

Data security is the means of ensuring that data is kept safe from corruption and that access to it is suitably controlled. Thus data security helps to ensure privacy. Information privacy, or data privacy, on the other hand, is the relationship between the collection and dissemination of data, technology, the public expectation of privacy, and the legal and political issues surrounding them. Privacy concerns exist wherever personally identifiable information is collected and stored, in digital form or otherwise. Improper or non-existent disclosure control can be the root cause of privacy issues.

As shown in Figure 1.3, the fundamental components of a grid computing infrastructure are the Data and Computational grid layers. A clear separation of responsibilities between the two layers is a critical factor when designing such an infrastructure. Security, one of the most challenging aspects of distributed medical imaging, is a crucial responsibility of the data grid component and is therefore part of the Globus Toolkit base grid infrastructure used. Authorization and user mapping are performed by the Grid User Management System (GUMS [48]) component, which uses the standard digital certificate mechanism. GUMS is particularly well suited to a heterogeneous environment with multiple gatekeepers; it allows the implementation of a single site-wide usage policy, thereby providing better control and security for access to the site’s grid resources. Using GUMS, individual resource administrators are able to assign different mapping policies to different groups of users and define groups of hosts on which the mappings will be used.

However, it is worth noting that all images that are processed in the computing grid have been previously anonymized, i.e., all patient personal data is removed. This is achieved by placing a proxy component between the data and computing grids. The task of the proxy component is to scan the header of the images that are going to be processed, remove all patient private information, and generate a universally unique identifier (UUID) that is going to be used by the client software to track the images in the system.
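As a sketch of this anonymization step, the code below shows one way the proxy could strip identifying attributes and attach a tracking UUID. The ImageHeader abstraction, the tag names and the tracking attribute are hypothetical placeholders, not the actual header-handling code of the framework.

import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical minimal view of an image header; the real proxy reads DICOM attributes.
interface ImageHeader {
    String get(String tag);
    void remove(String tag);
    void set(String tag, String value);
}

public class AnonymizingProxy {
    // Illustrative subset of attributes assumed to carry patient-identifying data.
    private static final String[] PRIVATE_TAGS = {
        "PatientName", "PatientID", "PatientBirthDate", "PatientAddress"
    };

    // Kept on the proxy/client side only, so grid results can be matched back to the patient.
    private final Map<String, String> uuidToPatientId = new ConcurrentHashMap<>();

    public String anonymize(ImageHeader header) {
        String originalId = header.get("PatientID");
        for (String tag : PRIVATE_TAGS) {
            header.remove(tag);                    // strip personal data before upload
        }
        String uuid = UUID.randomUUID().toString();
        header.set("StudyTrackingUUID", uuid);     // hypothetical attribute used for tracking
        uuidToPatientId.put(uuid, originalId);
        return uuid;
    }
}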

Chapter 3

A Distributed Evolutionary Approach to Subtraction Radiography

The following text is taken from the book “Computational Intelligence in Expensive Optimization Problems”, Chapter 27, A Distributed Evolutionary Approach to Subtraction Radiography, G. Mañana and E. Romero, Springer-Verlag, ISBN 978-3-642-10700-9, 2010.

Digital subtraction radiography detects tissue mass changes by subtracting two digital radiographs. This method has been shown to be very useful in early diagnosis of disease and follow-up examination [3]. When subtracting two radiographs taken over time, the image features which are coincident in both images can be removed and the small changes can be amplified to highlight their presence. For many years, digital subtraction radiography in dentistry has been used to qualitatively assess changes in radiographic density. Numerous authors have demonstrated the ability of this method to improve diagnostic performance for the detection of approximal dental caries, periapical pathology and periodontal disease, e.g. [50]. A large variety of odontological diseases result in destruction of mineralized tissues, which are relatively small in the initial progression of the disease. A reliable detection and follow-up examination necessarily requires a precise alignment of the two images for the tissue changes to be detectable.

Multiple works in the literature address the problem of image registration by means of evolutionary algorithms. The 2D intensity-based method proposed by Gómez García et al. in [57], for instance, uses a (µ + λ) evolutionary strategy (µ = 250, λ = 50) for optimization and a multiscale representation of the images to reduce the search space. For the same problem, Yuan et al. [131] propose a feature-based method that uses a (µ, λ) selection scheme (µ = 50, λ = 300). Cordón et al. [56] extend the binary-coded CHC algorithm [85] to work with real-coded chromosomes and successfully apply it to 3D registration. Particularly,


De Falco et al. [59] show the ability of Differential Evolution to perform well in satellite image registration and raise its possible use in medical image registration. In this section we will evaluate a standard numerical technique, the Downhill Simplex method, and two evolutionary strategies: Genetic Algorithms and Differential Evolution.

3.1 Problem Statement

Different approaches have been proposed for correcting such geometrical distortions. They range from manual correction to different devices used to ensure a consistent geometric projection which can be reliably reproduced over time. In daily medical practice, however, devices for adequate patient fixation are not available to clinicians, a drawback that has not allowed the application of this method to the series of routine examinations needed for progression estimation of lesions or treatments. In fact, since most clinicians do not pay attention to this issue, radiographic examinations generally produce strong geometrical distortions, which makes it inappropriate to apply conventional correction approaches. Under these circumstances, standard numerical techniques for extrema searching, like Powell's method [92] or the Downhill Simplex method [66], usually yield irrelevant results.

In this section, an entirely automatic method is proposed for spatial radiographic alignment in those cases where a considerable amount of distortion is present. The process starts by selecting one of the two images as the reference while the other is considered to be the template image. Afterwards, illumination differences are eliminated by means of an equalization algorithm explained below. Consecutive geometrical transformations are then performed on the template image, and the outcome is compared to the reference image using the correlation ratio as the similarity measure.

Conventional registration approaches have been successfully used in those situations where the patient's head has been appropriately fixated, therefore producing images with little distortion. However, anatomical variations, either from patient to patient or for the same patient at two different moments, have been a major inconvenience for radiographic subtraction to become an applicable method in routine evaluations. Our problem can be defined, therefore, as a multi-parametric search in a highly irregular space of possible transformations, for which conventional approaches have a high probability of remaining trapped in local extrema.

3.2 Parametric Transformations

Small tissue deformations are conveniently modeled using affine or projective transformations. The genetic algorithm presented below is based on previous work by the author [44]. In that work, affine transformations were used to register the images, i.e., only translation in the x and y axes, rotation about the z axis, and scaling were considered. Experimental results obtained at that time showed that the capacity of the algorithm to correctly register images deteriorated significantly in the presence of very strong misalignments. Further studies allowed us to determine that affine transformations were not enough to properly model the acquisition geometry, and that rotations about the x and y axes should also be taken into account. The projective transformations applied in this work can be defined using homogeneous coordinates as follows:

$$
\begin{pmatrix} u \\ v \\ w \end{pmatrix}
=
\begin{pmatrix} a_1 & a_2 & d_x \\ a_3 & a_4 & d_y \\ a_5 & a_6 & 1 \end{pmatrix}
\begin{pmatrix} x \\ y \\ 1 \end{pmatrix},
\qquad
\begin{pmatrix} x' \\ y' \end{pmatrix}
=
\begin{pmatrix} u/w \\ v/w \end{pmatrix}
\tag{3.1}
$$

Therefore, the new coordinates $(x', y')$ of a pixel $(x, y)$ in the template image are given by $x' = (a_1 x + a_2 y + d_x)/(a_5 x + a_6 y + 1)$ and $y' = (a_3 x + a_4 y + d_y)/(a_5 x + a_6 y + 1)$.
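For reference, a direct translation of these formulas into code could look as follows; the class and the parameter ordering are illustrative assumptions, not taken from the thesis sources.

// Applies the projective transformation of Eq. (3.1) to a pixel coordinate.
// The eight parameters (a1..a6, dx, dy) correspond to one chromosome of the
// evolutionary algorithms described below.
public final class ProjectiveTransform {
    private final double a1, a2, a3, a4, a5, a6, dx, dy;

    public ProjectiveTransform(double[] p) {
        a1 = p[0]; a2 = p[1]; a3 = p[2]; a4 = p[3];
        a5 = p[4]; a6 = p[5]; dx = p[6]; dy = p[7];
    }

    /** Returns the transformed coordinates (x', y') of the template pixel (x, y). */
    public double[] apply(double x, double y) {
        double w = a5 * x + a6 * y + 1.0;          // homogeneous scale factor
        double xp = (a1 * x + a2 * y + dx) / w;
        double yp = (a3 * x + a4 * y + dy) / w;
        return new double[] { xp, yp };
    }
}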

3.3 Similarity Measure

The mutual information measure, successfully applied to multimodal image registration [71, 107], assumes only statistical dependence between image intensities. It treats intensity values in a purely qualitative way, without considering any correlation or spatial information conveyed by nearby intensities. Mutual information tries to reduce entropy and this can be observed as a trend to form intensity clusters in the joint histogram. In the problem of radiography subtraction, since it deals with mono-modal images of natural tissue, the mutual information measure is under-constrained and a functional correlation can be assumed.

The concept of functional dependence, fundamental in statistics, provided us with the framework for the computation of similarity between the two images. To use this concept we consider images as random variables and interpret an image histogram as its probability density function. Furthermore, we consider the 2D histogram of a pair of images as their joint probability density function, as proposed in [129]. Thus, when a pixel is randomly selected from an image $X$ having $N$ pixels, the probability of getting an intensity $i$ is proportional to the number of pixels $N_i$ in $X$ having intensity $i$, i.e.,

$$P(i) = \frac{N_i}{N}. \tag{3.2}$$

In order to define the joint probability density function of an image pair, we consider two images $(X, Y)$ and a spatial transformation $T$ that maps the set of pixels of $Y$, $\Omega_y$, to the set of pixels of $X$, $\Omega_x$. Since we are working with digitized radiographs, we can also assume that images $X$ and $Y$ take their intensity values from a known finite set $A = \{0, \ldots, 255\}$:

$$X : \Omega_x \to A, \qquad Y : \Omega_y \to A.$$

Now, by applying transformation $T$ to image $Y$, a new mapping is defined from the transformed positions of $Y$ to $A$:

$$Y_T : T(\Omega_y) \to A, \qquad \omega \mapsto Y[T^{-1}(\omega)].$$

We now have to find the intensities that a given point of $T(\Omega_y)$ simultaneously takes in $X$ and $Y_T$. Since we are dealing with continuous spatial transformations, points of the grid $T(\Omega_y)$ do not, in general, transform to points of the grid $\Omega_x$. So, in order to define the joint probability density function of the images, we used the interpolation approach explained below, discarding the points of $T(\Omega_y)$ that do not have eight neighbors in $\Omega_x$. If we denote by $T(\Omega_y)^*$ the subset of accepted points and by $\tilde{X}$ the interpolation of $X$, we can define the image pair as the following couple:

$$Z_T : T(\Omega_y)^* \to A^2, \qquad \omega \mapsto \bigl(\tilde{X}(\omega),\, Y[T^{-1}(\omega)]\bigr),$$

and, in a similar way as we did for a single image in Eq. (3.2), their joint probability density function as:

$$P_T(i, j) = \frac{\mathrm{Card}\{x \mid Z_T(x) = (i, j)\}}{\mathrm{Card}\, T(\Omega_y)^*}. \tag{3.3}$$

On the other hand, the total variance theorem:

$$\mathrm{Var}(Y) = \mathrm{Var}\bigl[E(Y \mid X)\bigr] + E_X\bigl[\mathrm{Var}(Y \mid X = x)\bigr], \tag{3.4}$$

expresses the fact that the variance can be decomposed as a sum of two energy terms: a first term $\mathrm{Var}[E(Y \mid X)]$ that is the variance of the conditional expectation and measures the part of $Y$ which is predicted by $X$, and a second term $E_X[\mathrm{Var}(Y \mid X = x)]$ which is the conditional variance and stands for the part of $Y$ which is functionally independent of $X$. Now, based on the previous equation, which can be seen as an energy conservation equation, we can define the correlation ratio as the measure of the functional dependence between two random variables:

$$\eta(Y \mid X) = \frac{\mathrm{Var}\bigl[E(Y \mid X)\bigr]}{\mathrm{Var}(Y)}.$$

Unlike the correlation coefficient, which measures the linear dependence between two variables, the correlation ratio measures the functional dependence. The correlation ratio takes on values between 0 and 1, where a value near 1 indicates high functional dependence. Then, for a given transformation $T$, in order to compute $\eta(Y_T \mid X)$ we can use the following equation:

$$1 - \eta(Y_T \mid X) = \frac{E_X\bigl[\mathrm{Var}(Y_T \mid X = x)\bigr]}{\mathrm{Var}(Y_T)},$$

that by means of Eq. (3.3) and Eq. (3.4) can be expressed as:

$$1 - \eta(Y_T \mid X) = \frac{1}{\sigma^2} \sum_i \sigma_i^2\, P_{x,T}(i),$$

where

$$\sigma^2 = \sum_j j^2 P_y(j) - m^2, \qquad m = \sum_j j P_y(j),$$

and

$$\sigma_i^2 = \frac{1}{P_x(i)} \sum_j j^2 P_T(i, j) - m_i^2, \qquad m_i = \frac{1}{P_x(i)} \sum_j j P_T(i, j).$$

The correlation ratio measures the similarity between two images, and since it is assumed to be maximal when the images are correctly aligned, it will be used to compute the fitness of the individuals that make up the algorithm population.
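The sketch below shows one straightforward way of computing the correlation ratio from co-located intensity samples, i.e., assuming that the interpolation step has already produced, for every accepted grid point of $T(\Omega_y)^*$, the reference intensity and the transformed template intensity. It illustrates Eqs. (3.2)-(3.4); it is not the thesis implementation.

// Computes the correlation ratio η(Y_T | X) from two intensity arrays of equal length.
// x[k], y[k] are the co-located intensities (0..255) of the reference image and the
// transformed template at the accepted grid points.
public final class CorrelationRatio {

    public static double compute(int[] x, int[] y) {
        final int bins = 256;
        int n = x.length;

        long[] countX = new long[bins];
        double[] sumY = new double[bins];
        double[] sumY2 = new double[bins];
        double totalY = 0.0, totalY2 = 0.0;

        for (int k = 0; k < n; k++) {
            countX[x[k]]++;
            sumY[x[k]] += y[k];
            sumY2[x[k]] += (double) y[k] * y[k];
            totalY += y[k];
            totalY2 += (double) y[k] * y[k];
        }

        double mean = totalY / n;
        double var = totalY2 / n - mean * mean;           // Var(Y_T)
        if (var == 0.0) return 0.0;                        // degenerate case: constant image

        // E_X[Var(Y_T | X = x)] accumulated over the intensity classes of X
        double condVar = 0.0;
        for (int i = 0; i < bins; i++) {
            if (countX[i] == 0) continue;
            double mi = sumY[i] / countX[i];
            double vi = sumY2[i] / countX[i] - mi * mi;    // σ_i²
            condVar += vi * ((double) countX[i] / n);      // weighted by P_x(i)
        }
        return 1.0 - condVar / var;                        // η = 1 − E_X[Var(Y|X=x)] / Var(Y)
    }
}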

3.4 Optimization Problem

The problem faced is to find the transformation that maximizes the correlation ratio between a pair of images. The parameters to be found are the eight parameters that define the required projective transformation.

3.5 Interpolation Approach

In terms of linear interpolation, the reconstructed signal is obtained by convolution of the discrete signal (defined as a sum of Dirac functions) with a conveniently selected kernel. We used spline interpolation due to its accuracy and acceptable computing speed. Spline interpolation of order n is uniquely characterized in terms of a B-spline expansion:

$$s(x) = \sum_{\kappa \in \mathbb{Z}} c(\kappa)\, \beta^n(x - \kappa),$$

which involves integer shifts of the central B-spline. The parameters of the spline are the coefficients $c$. In the case of images with regular grids, they are calculated at the beginning of the procedure by recursive filtering. A third-order approximation was used in the present work.
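As an illustration, the cubic (third-order) B-spline kernel and a one-dimensional evaluation of the expansion above could be coded as follows. The recursive prefiltering that produces the coefficients c(κ) from the image samples is assumed to have been done beforehand and is not shown; the boundary handling is a simplification.

// Cubic B-spline interpolation at a real-valued position, given the spline coefficients c(κ).
public final class CubicBSpline {

    /** Central cubic B-spline β³(t). */
    static double beta3(double t) {
        t = Math.abs(t);
        if (t < 1.0) return 2.0 / 3.0 - t * t + 0.5 * t * t * t;
        if (t < 2.0) { double u = 2.0 - t; return u * u * u / 6.0; }
        return 0.0;
    }

    /** s(x) = Σ_κ c(κ) β³(x − κ), restricted to the four coefficients that overlap x. */
    static double interpolate(double[] c, double x) {
        int k0 = (int) Math.floor(x) - 1;          // leftmost contributing coefficient
        double s = 0.0;
        for (int k = k0; k <= k0 + 3; k++) {
            if (k < 0 || k >= c.length) continue;  // simple boundary handling (illustrative)
            s += c[k] * beta3(x - k);
        }
        return s;
    }
}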

3.6 Search Strategy

Evolutionary algorithms (EA) represent a subset of evolutionary computation and use mechanisms inspired by biological evolution: recombination, mutation, and selection. By simulating the natural selection process, where the fittest individuals are more likely to survive, these algorithms can be used to find approximate or even exact solutions to optimization problems. Candidate solutions to the optimization problem play the role of individuals in a population, and the fitness function (also known as the cost function) determines the environment within which the solutions live.

Evolutionary algorithms are implemented as a computer simulation in which a population of abstract representations of candidate solutions evolves towards better solutions. These representations are called chromosomes (or the genotype of the genome), and the candidates are called individuals or phenotypes. Traditionally, individuals are represented as binary strings, but as we shall see, real number encoding is also possible. The evolution usually starts from a population of randomly generated individuals and occurs in generations. In each generation, the fitness of every individual in the population is evaluated, and multiple individuals are stochastically selected from the current population, recombined and mutated to form a new population. The new population is then used in the next iteration of the algorithm. Commonly, the algorithm terminates when either an adequate fitness level has been reached, a maximum number of iterations has been reached, or, as in our case, the available computational time is exhausted.

Despite their computational cost, evolutionary algorithms have been chosen over standard numerical methods because of their strong immunity to local extrema, their intrinsic parallelism and robustness, as well as their ability to cope with large and irregular search spaces. In this section we compare two simple evolutionary algorithms categorized as parallel iterative [74]: a Genetic Algorithm (GA) and Differential Evolution (DE). Genetic algorithms are attributed to Holland (1975) [73] and Goldberg (1989) [31], while evolution strategies were developed by Rechenberg (1973) [61] and Schwefel (1995) [58]. A good and diverse set of GA examples is synthesized in Chambers [81], while a practical approach to Differential Evolution can be found in [80]. Both approaches mimic Darwinian evolution and attempt to evolve better solutions through recombination, mutation, and selection. However, some distinctions do exist. DEs are very effective in problems of continuous function optimization, in part because they use real encoding and arithmetic operators. Since GAs generally encode parameters as binary strings and manipulate them with logical operators, they are more suited to combinatorial optimization.

Upon analyzing the most relevant works in this area, it can be concluded that the most crucial aspects refer to the selection of the coding scheme and the design of the fitness function. All seem to agree that for this kind of optimization problem, real-number encoding performs better than both binary and Gray encoding [89]. Accordingly, for the problem at hand, in both evolutionary algorithms the chromosome has been coded as eight floating point numbers representing the set of parameters used in the projective transformation. The initial population includes an individual that is either the null transformation or the center of mass transformation, according to their respective fitness.
The rest of the population is generated randomly within the search space. The fitness of each individual, indicating the similarity between the transformed image and the reference image, is then computed using the correlation ratio previously described.

Selection in the GA is performed as follows. The fittest ten percent of the population is selected to be part of the next generation, a facet known as exploitation. The rest of the individuals are the result of either crossover (pc = 0.85 in our implementation) or random selection. In the case of crossover, the parents of each new offspring are selected by tournament (5% of the population size) from the current population. Finally, leaving unmodified the individuals selected by elitism (the evolution history), new candidate individuals are mutated according to a predetermined probability (pm = 0.21), known as the exploration characteristic. Crossover in the GA is performed by applying a convex operator as suggested by Davis in [82]. The genes of an offspring chromosome are then the result of a convex interpolation of the parameters of the two mates. A mutation operator is applied to guarantee that the probability of searching a particular subspace of the problem space is never zero. This prevents the algorithm from becoming trapped in local extrema [31]. The mutation operator used, known as real number creep, sweeps the individual adding or subtracting Gaussian distributed random noise to each parameter [82]. The creep operator implemented is a neighborhood search that looks in the vicinity of a good solution to see if a better one exists.

By contrast, in DE all individuals undergo mutation, recombination, and selection, as sketched below. Mutation starts by randomly selecting three individuals (vectors in DE terminology) and adding the weighted difference of two of the vectors to the third, hence the name differential mutation. The resulting vector is called the donor vector. For recombination, a trial vector is developed from the elements of the target vector and the elements of the donor vector. The elements of the donor vector enter the trial vector with a given probability (Cr = 0.5). In this step, to ensure that the trial vector is effectively different from the target one, one of the elements of the donor vector is selected at random and entered directly into the trial vector. Finally, the target and trial vectors are compared and the one with the higher similarity measure is selected to be part of the next generation.

This process is repeated in both algorithms until some stopping criterion is reached. In our case, given that we receive a large variety of cases, a predetermined similarity measure is ineffective as the only stopping criterion. For this reason, the actual stopping criterion is given by a maximum number of allowed iterations that is computed as follows. The available processing time span is about 12 hours, and since each iteration takes an average of 250 ms, we can make an estimate of the overall number of iterations that can be performed from one day to the next. Also, according to tests carried out with synthetic images, it has been determined that at least 200 iterations are required to obtain acceptable results. Based on these considerations and the number of images to be processed, we precompute the number of times the algorithm can be executed for each pair of radiographs in the daily batch.
The algorithm is then executed the maximum number of times possible and the best result obtained is the one used for the subtraction process.
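A minimal sketch of one generation of the DE scheme just described (commonly written DE/rand/1/bin) is given below; it is an illustration, not the production code executed on the grid.

import java.util.Random;

// One generation of DE/rand/1/bin applied to a population of real-coded chromosomes
// (the eight projective parameters). The fitness is assumed to be the correlation ratio,
// so higher values are better.
public final class DifferentialEvolutionStep {

    public interface Fitness { double of(double[] candidate); }

    public static void evolve(double[][] population, double[] fitness,
                              double F, double Cr, Fitness eval, Random rnd) {
        int np = population.length;
        int dim = population[0].length;

        for (int i = 0; i < np; i++) {
            // Differential mutation: pick three distinct vectors r1, r2, r3 different from i.
            int r1, r2, r3;
            do { r1 = rnd.nextInt(np); } while (r1 == i);
            do { r2 = rnd.nextInt(np); } while (r2 == i || r2 == r1);
            do { r3 = rnd.nextInt(np); } while (r3 == i || r3 == r1 || r3 == r2);

            // Uniform (binomial) crossover between target and donor vectors; index jRand is
            // always taken from the donor so the trial vector differs from the target.
            double[] trial = new double[dim];
            int jRand = rnd.nextInt(dim);
            for (int j = 0; j < dim; j++) {
                double donor = population[r1][j] + F * (population[r2][j] - population[r3][j]);
                trial[j] = (j == jRand || rnd.nextDouble() < Cr) ? donor : population[i][j];
            }

            // Greedy selection: keep whichever of target and trial has the higher fitness.
            double trialFitness = eval.of(trial);
            if (trialFitness > fitness[i]) {
                population[i] = trial;
                fitness[i] = trialFitness;
            }
        }
    }
}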

3.7 Algorithm Distribution

The evaluation of the fitness function consists in applying a projective transformation and then computing the corresponding correlation ratio. This computationally intensive operation (see Figure 3.1) is required for each individual of the population. Since the operation can be computed independently for each individual, this part of the algorithm was parallelized and executed on the computational grid previously described. The execution of the evolutionary algorithms uses 120 worker nodes: general purpose workstations, with a 1 GHz processor on average and memory ranging from 256 to 512 Mbytes. The source code of the implemented distributed algorithms, as well as additional documentation regarding the computational grid, can be found on the unGrid project site (http://ungrid.unal.edu.co/).

Figure 3.1: Timing profile for the parallel iterative algorithm showing the percentage of time required for each operation.
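For completeness, the manager side of the replicated-worker pattern can be sketched as follows; it reuses the hypothetical TaskEntry and ResultEntry types sketched in Chapter 2 and is not the actual unGrid code.

import net.jini.space.JavaSpace;

// Schematic manager-side loop for one generation: write one evaluation task per
// individual into the space, then collect the corresponding results.
public final class GenerationManager {
    private final JavaSpace space;

    public GenerationManager(JavaSpace space) { this.space = space; }

    public double[] evaluate(double[][] population) throws Exception {
        // Scatter: one task per individual; idle workers on the grid pick them up.
        for (int i = 0; i < population.length; i++) {
            TaskEntry task = new TaskEntry();
            task.taskId = i;
            task.chromosome = population[i];
            space.write(task, null, 10 * 60_000);          // ten-minute lease (illustrative)
        }
        // Gather: block until every result has been taken from the space.
        double[] fitness = new double[population.length];
        for (int k = 0; k < population.length; k++) {
            ResultEntry r = (ResultEntry) space.take(new ResultEntry(), null, Long.MAX_VALUE);
            fitness[r.taskId] = r.fitness;
        }
        return fitness;
    }
}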

3.8 Algorithm Validation

To validate the correctness of the evolutionary algorithms implemented, two sets of experiments were conducted. In both cases, the algorithms were compared to a standard numerical method, the Downhill Simplex method devised by Nelder and Mead. This method was chosen because of its ease of implementation and because, amid the standard numerical optimization methods, it is the least sensitive to initial conditions.

First, a series of synthetic images was created by applying a set of known transformations to ten reference radiographs. Then the transformed images were registered to the original ones to verify the ability of each algorithm to find the original values used in the transformation. In the second batch of experiments the algorithms were evaluated with pairs of images obtained from real radiographs. The set of synthetic images was created by applying the transformations shown in Table 3.1, using a reliable image processing program. The algorithms were executed ten times for each pair of reference and template images. The GA and DE algorithms were executed on the computational grid, while the Downhill Simplex implementation was executed on a single machine of the same grid, using its full processor capacity.

For the set of synthetic images, the transformation values and correlation ratio obtained by the Downhill Simplex method, the Genetic Algorithm, and Differential Evolution are presented in Table 3.2, Table 3.3, and Table 3.4, respectively. From these results, it can be concluded that for deformations in the expected range, both evolutionary algorithms outperform the Downhill Simplex algorithm and provide clinically acceptable registration accuracy.

Table 3.1: Some combinations of rotation, scaling, and translation applied to the set of synthetic images.

      θx   θy   θz    Tx    Ty   SF
  1    1    1    1    10    10   0.8
  2   10    1    1    10    10   0.8
  3   10   10    1    10    10   0.8
  4   10   10   10    10    10   0.8
  5   10   10   10   100    10   0.8
  6   10   10   10   100   100   0.8
  7    1    1    1    10    10   1.2
  8   10    1    1    10    10   1.2
  9   10   10    1    10    10   1.2
 10   10   10   10    10    10   1.2
 11   10   10   10   100    10   1.2
 12   10   10   10   100   100   1.2

SF: scale factor; angles expressed in degrees and translations in pixels.

Table 3.2: Values found by the Downhill Simplex algorithm.

        θx     θy     θz      Tx      Ty     SF    CR   Error
  1   0.91   0.88   1.17    8.55    8.78   0.88   0.67   12.5
  2  12.32   1.15   0.76    8.75   11.74   0.81   0.61   15.6
  3  11.15   8.16   0.91   13.28   10.98   0.83   0.64   14.2
  4  13.98   7.75  12.24    7.34    8.68   0.79   0.50   21.0
  5   8.11   8.12  10.83   76.16    7.74   0.78   0.60   15.8
  6   7.98  10.71  12.77   91.21  129.05   0.67   0.56   18.2
  7   1.11   0.96   0.87    8.15   10.78   1.08   0.71   10.7
  8   7.12   0.95   0.82    9.05   10.54   1.04   0.65   13.3
  9   7.67  12.98   1.31    6.55    8.13   1.42   0.40   25.9
 10   8.73   9.01  11.96   13.34   12.26   1.41   0.53   19.3
 11  10.98  12.85   7.77   87.14    7.76   1.01   0.55   18.6
 12  11.96   6.72   6.11  111.17  132.32   1.39   0.42   25.1

CR: correlation ratio.

Table 3.3: Values found by the Genetic Algorithm.

        θx     θy     θz      Tx      Ty     SF    CR   Error
  1   0.98   1.03   0.96    9.65   10.15   0.81   0.87    2.5
  2  11.01   0.99   1.01    9.43    9.96   0.82   0.85    3.5
  3   9.47  10.03   1.03    9.15   11.37   0.82   0.81    5.6
  4   8.69   8.98   9.21    9.99   10.50   0.79   0.79    6.3
  5  10.1   10.93  11.10   96.04   13.34   0.78   0.72   10.2
  6   8.76  11.23  11.52  115.88   96.13   0.82   0.71   10.4
  7   1.03   1.09   0.99   10.12   10.35   1.21   0.86    3.1
  8   9.44   0.95   0.89    9.53    9.66   1.16   0.81    5.5
  9  10.91   9.78   1.11    9.01   11.00   1.23   0.77    7.7
 10   9.62  10.12  10.52   11.08   11.10   1.22   0.79    6.4
 11   9.63   9.21  10.09   92.77   12.96   1.32   0.73    9.5
 12  12.08  10.23   8.91   93.49  108.65   1.34   0.72   10.0

Table 3.4: Values found by Differential Evolution.

        θx     θy     θz      Tx      Ty     SF    CR   Error
  1   0.99   0.99   1.02    9.66    9.78   0.82   0.88    2.0
  2  10.05   0.98   0.99    9.55    9.36   0.81   0.87    2.6
  3   9.77  10.02   1.03   10.1    10.99   0.83   0.85    3.4
  4   9.01   9.57   9.52    9.98   10.02   0.80   0.85    3.5
  5  10.2    9.78  11.1    95.99   11.34   0.81   0.81    5.6
  6   9.06  10.55   9.43   92.77  117.31   0.82   0.76    7.9
  7   0.98   0.97   1.01    9.88   10.02   1.10   0.87    2.6
  8  10.2    0.97   0.91   10.53    9.87   1.15   0.84    4.1
  9   9.85  10.15   0.97   10.9    10.7    1.19   0.84    3.8
 10  10.34   9.88   9.68   10.95    9.15   1.18   0.83    4.6
 11  10.65  10.37   9.76  109.01    9.13   1.19   0.82    5.2
 12   9.19   9.39  10.91  110.56   92.28   1.17   0.77    7.4

Moreover, Table 3.4 shows that the DE algorithm produces the most accurate results.

For the second experiment, a group of ten intra-oral radiograph pairs, taken on different occasions, was randomly selected from an unrelated study of periodontal therapy. No film holders or any other fixation device were mechanically coupled to the cone of the X-ray machine. Radiographs were digitized in an HP 3570 scanner using a transparent material adapter at a resolution of 600 × 600 DPI, producing 724 × 930 pixel images. Even though acquisition conditions are standardized as much as possible, illumination differences are inevitable. Thus, the histogram of the template image is equalized by using the reference image luminances. This transformation first computes the histogram of each image and then luminances are homogeneously distributed in the template image according to the levels found in the reference image.

The properties compared in this experiment were accuracy, in terms of the similarity measure obtained, and efficiency, in terms of execution time and use of resources. All algorithms were coded in the same programming language and use the same routine to compute the correlation ratio between the transformed and reference images. For this comparison, the three algorithms were also executed ten times for each pair of radiographs. A summary of the results obtained is presented in Table 3.5.

Table 3.5: DS-GA-DE performance comparison.

  Property                         DS     GA     DE
  Average Correlation Ratio        0.63   0.81   0.83
  Average Execution Time (secs)    52     50     48
  Number of CPUs                   1      120    120

As expected, the Downhill Simplex method appeared to be very sensitive to the initial parameters and did not always converge to the global optimum. While in some executions it obtained better results than the EAs, in other executions it produced meaningless values, and this is reflected in the low overall accuracy shown in Table 3.5. Again, the DE algorithm consistently outperformed the GA and for that reason it is the algorithm currently used in production. It is also important to note that the computational grid, used to run the EAs, only uses the free CPU cycles of the computers that make it up.

Fig. 3.2 shows a pair of radiographs to be subtracted (top row). The bottom row displays subtraction without geometric correction on the left and with correction on the right. The null intensity level is shifted to 128 in order to make tissue changes easily observed. In this particular example, it can be appreciated that the match is precise enough to make objective measurements despite the fact that in the second radiograph, the fifth tooth (from left to right) is nearly hidden. The small spot, possibly an artifact, that appears in both images is observed in the resulting image in white, indicating that new tissue developed. In this image it can also be observed that a difference appears at the root of the third tooth, which corresponds to new tissue developed after treatment. These changes are impossible to observe in the raw difference image (bottom left). Similarly, in this image the bone pattern is blurred and impossible to recognize, while in the resulting image the trabecular bone pattern is clear. For the entire set of test images, matching has been visually assessed by two experts in the field. They judged that the alignment was sufficiently accurate to get objective measurements while maintaining acceptable computation times.
For the GA, 4580 experiments were performed in order to guarantee a complete analysis of the parameter space. An experiment is the execution of the algorithm with a particular set of images and parameters, i.e., population size, tournament size and genetic operator probabilities. In this task the grid became an essential tool and allowed us to achieve a second level of parallelism. The first analysis was conducted to determine two basic parameters of the algorithm: the population size and the selection scheme used to choose the parents for crossover. The experiments showed that the optimum population size for this problem is 120. Two common selection options are tournament selection and elitism. In tournament selection of size N, N individuals are selected at random and the fittest is chosen. Elitism is a particular case of tournament selection where the size of the tournament equals the size of the population, so the best individual is always preserved. For this problem, tournament selection of size 12 is the best option for selecting the parents for a new generation. The other parameters analyzed were the crossover and mutation probabilities. The combination of probabilities that yielded the best results was 0.76 and 0.18, respectively.

Another advantage of the DE algorithm over the GA is that it only uses two parameters: the scale factor F, which controls the rate at which the population evolves, and the uniform crossover probability Cr. This makes the analysis of the parameter space simpler and therefore tuning of the algorithm becomes easier. The values found for the DE algorithm are F = 0.5 and Cr = 0.5.

3.9 The Subtraction Service

The medical imaging community has a growing need for Internet-aware tools that facilitate interaction with remote data in a collaborative way. Such medical imaging data typically requires special-purpose tools that come in the form of stand-alone and non-portable applications, i.e., software to be manually installed and maintained on a local computer. This is possible provided there is an available binary version for the particular platform in use, or the source code is publicly obtainable and the user is responsible for gathering the required libraries and tools, and compiling the source code. Research projects and clinical studies require a medium for a geographically dispersed scientific and clinical community to interact and examine medical imaging data via the Internet. Additionally, health care and medical research rely increasingly on the use of computationally intensive algorithms.

The service-oriented model proposed has many advantages over the stand-alone application model. First and foremost, it avoids the user having to manually install the software. All that is needed is a standard web browser and an Internet connection: the platform dependency is no longer an issue. In addition, updates are made on the server side and automatically propagate to all service users. By having redundant and clustered servers it is possible to attain high data availability, something difficult, if indeed possible, with a personal workstation. Finally, and most important, the service-oriented paradigm leverages interdisciplinary and collaborative work, a critical success factor in biomedical practice and research.

This section presents a service-oriented model for medical imaging via the Internet. Services are accessed via a standard web browser; however, the essential tools also work offline. This is accomplished by using a local server and a local database, and synchronizing data when the user goes back online. The proposed architecture for this model is basically an enhanced version of a standard client-server architecture (see Figure 3.3). The difference lies in the addition of a local server that allows the basic services to keep functioning without an active Internet connection. The proxy component is responsible for providing the essential functionality when the user is disconnected from the Internet, and for synchronizing data with the server when the connection is active again. To accomplish this, the local server is connected to a lightweight relational database engine that allows the client application to store, search and recover data using the structured query language (SQL).

On the client side, services are accessed using a standard web browser. In our implementation, all services use a digitally-signed Java browser extension (“applet”) that takes care of the installation of the local server and other required tools such as the database engine (e.g., Apache Derby) and OpenGL¹ libraries. Additionally, the applet provides the tools for visualization, manipulation and lightweight processing tasks such as image reconstruction and rigid geometric transformations. On the server side, and behind a pool of web servers, there is a

high-availability (HA) cluster that provides access to the computing grid (HPC). The cluster is based on the Rocks cluster distribution [49] and Sun Grid Engine [96], and uses peer-to-peer technologies - a replicated, distributed, transactional tree-structured cache - to avoid the appearance of a single point of failure.

The subtraction service provides two modes of operation: an interactive mode and a batch mode. In the first mode, the user loads the images to register, and interactively drags, rotates and scales the template image to align it manually. Once registered, the images can be subtracted and the difference visualized. Since this is a lightweight operation, it is carried out completely on the client side. However, the user can choose to register the images automatically. In this case, the images are uploaded to the server (if not already there), and then registered by the aforementioned distributed algorithm. The parameters of the projective transformation are then sent back to the client application for visualization. With the help of the local server and database, the interactive mode keeps working even without an active Internet connection, provided the images reside on the local machine. This is what happens most of the time, either because the images were produced on the local machine, or because they were previously downloaded from the server. The synchronization process is the responsibility of the so-called proxy component that the client application actually communicates with: it uploads the locally digitized images to the server, and downloads the images stored on the server to the corresponding client computers. Figure 3.4 shows the graphical user interface of the service.

In practice, the service is mostly used in the second or batch mode. In this mode the set of radiographs taken daily are digitized and uploaded to the service server, where they are registered automatically by the same evolutionary algorithm. The job of the master process, executed in the cluster, is to generate the initial populations and send them to the computing grid for evaluation. Since the fitness of each individual can be evaluated independently from the others, this task is performed in parallel on the grid. Once evaluated, each population is collected by the corresponding master process that applies the genetic operators (mutation, recombination, selection), produces a new population and sends it again to the grid for evaluation. The process repeats until the stop conditions are met. The optimal transformations are then stored in the server database and sent to the client applications for visualization.

The use of Java and Internet technologies in medical imaging is not new. These technologies have been used in radiology teaching files, to access information in multimedia integrated picture archiving and communication systems (PACS), and for teleradiology purposes. However, all known approaches seem to assume the existence of a reliable and stable Internet connection, and this is not always possible.

¹OpenGL is a registered trademark of Silicon Graphics Inc.

Figure 3.2: The upper row shows the two images to subtract. Bottom row shows the subtracted images: left without geometrical correction and right after automatic correction.

Figure 3.3: Overall architecture of the subtraction radiography service, showing the protocols used for communication between neighbouring components.

Figure 3.4: Graphical user interface for the radiography subtraction service.

Chapter 4

Curvature-based 3D Non-rigid Image Registration

The following text is taken from the article “Distributed Curvature-based Non-rigid Image Registration”, G. Mañana and E. Romero, submitted to Transactions on Medical Imaging, IEEE, September 2010.

4.1 Introduction

As introduced in the previous chapter, the task of image registration is to find an optimal geometric transformation between corresponding image data. The image registration problem can be stated in just a few words: given a reference and a template image, find an appropriate geometric transformation such that the transformed template becomes similar to the reference. However, though the problem is easy to express, it is hard to solve. In practice, the concrete types of the geometric transformation, as well as the notions of optimal and corresponding, depend on the specific application. In many applications a rigid transformation, i.e., translations and rotations only, is enough to describe the spatial relationship between two images. However, there are many other applications where non-rigid transformations are required to describe this spatial relationship adequately. Like many other problems in computer vision and image analysis, registration can be formulated as an optimization problem whose goal is to minimize an associated cost function [29]:

$$\mathcal{C} = -\,\mathcal{C}_{similarity} + \mathcal{C}_{transformation}, \tag{4.1}$$

where the first term characterizes the similarity between the images and the second term characterizes the cost associated with particular deformations. From

a probabilistic point of view, the cost function in Eq. (4.1) can be explained in a Bayesian context. In this framework, the similarity measure can be viewed as a likelihood term which expresses the probability of a match between the two images, and the second term can be interpreted as a prior which represents a priori knowledge about the expected deformation. This term only plays a role in non-rigid registration, as a regularizer, and in the case of rigid registration it is usually ignored.

4.2 Problem Statement

The main difference with respect to the parametric case, where one is looking for a set of optimal parameters of an expansion of the transformation, is that we are now seeking a smooth transformation. That is, the idea behind non-parametric registration is to come up with an appropriate measure, both for the similarity of the images as well as for the likelihood of a non-parametric transformation. Since image registration in general is an ill-posed problem, regularization is essential and inevitable. Moreover, regularization can be used to supply additional prior knowledge, and is what distinguishes the different non-rigid registration methods.

4.3 Solution Strategy

The 3D non-rigid image registration service presented in this section employs a distributed version of the novel curvature-based algorithm proposed by Fischer et al. in [15]. Unlike other common non-rigid registration methods, e.g., elastic [18] or fluid registration [46, 88], which are physically motivated, in the curvature-based method the regularizer is related to curvature. Since the regularizing term is based upon second order derivatives, the main consequence is that the affine linear pre-registration step, unavoidable in other methods, here becomes redundant.

Unfortunately, the computation of a numerical solution for a non-parametric registration is not straightforward. The mathematical framework proposed relies on a variational formulation of the registration problem, and the numerical scheme used is based upon the Euler-Lagrange equations which characterize a minimizer (a detailed explanation can be found in [65]). While this approach allows the affine pre-registration step to be avoided in many cases, it also leads to a high-dimensional system of non-linear partial differential equations. After an appropriate discretization by means of finite differences, the algorithm ends up with an iterative scheme, where in each step a large system of linear equations has to be solved numerically. This results in an algorithm that requires 16 CPU minutes to process a pair of 1024² pixel images, and more than 10 CPU hours to register a couple of 512³ voxel images, executed on a single server. A multi-resolution approach can significantly reduce the computation time and serve as an additional regularizer. However, since most of the images to register come from CT/MR scanners, the processing time for 3D images renders the algorithm useless. The solution devised was a combination of a multi-resolution approach and a distributed implementation of the algorithm. The multi-scale representation of the images is achieved through the construction of a 4- or 5-level Gaussian pyramid, until 64³ voxel images are obtained. In this step of the algorithm, a low-order recursive filter is used, as proposed by Young et al. in [62].
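For reference, the curvature regularizer introduced by Fischer and Modersitzki penalizes second-order variation of the displacement field $u = (u_1, \ldots, u_d)$ defined on the image domain $\Omega$ (this expression is reproduced from the cited literature for convenience; the exact notation of [15] may differ):

$$\mathcal{S}^{\mathrm{curv}}[u] = \frac{1}{2} \sum_{l=1}^{d} \int_{\Omega} \bigl(\Delta u_l(x)\bigr)^2\, dx,$$

and the registration functional minimized at each resolution level combines it with a similarity term $\mathcal{D}$ (e.g., the sum of squared differences between the reference $R$ and the deformed template $T$) through a weighting parameter $\alpha$:

$$\mathcal{J}[u] = \mathcal{D}[R, T; u] + \alpha\, \mathcal{S}^{\mathrm{curv}}[u].$$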

Figure 4.1: The hypothesis behind the proposed method: computing the 2D displacement fields in the slices of the three space directions independently, and then merging them into a 3D field, yields an equivalent result to applying the 3D version of the algorithm.

4.4 Algorithm Distribution

The hypothesis assumed for the implementation of the distributed version of the algorithm is based upon the superposition principle, and is justified by the fact that the regularizer constrains the displacements to small quantities at each iteration, whereby the actual displacement field can be decomposed as a linear combination of its projections on each 2D plane. That is, for each level of the pyramid, both 3D images are decomposed into three sets of slices as shown in Figure 4.1. Then, the 2D displacement fields are computed in parallel for all pairs of corresponding slices. Finally, the 2D fields obtained are linearly combined into a 3D field that is used as the starting point for the next level in the pyramid.

Table 4.1: Comparison of curvature-based registration algorithms.

  Algorithm    Image Size  MRL*  Iterations      SSD**     CPU Cores  CPU Time
  Standalone   256³        1     100             1.56E−4   1          34.7m
  Standalone   256³        3     500, 25, 5      1.57E−4   1          3.8m
  Distributed  256³        1     100             1.96E−4   192        2.2m
  Distributed  256³        3     500, 25, 5      1.85E−4   192        1.6m
  Standalone   512³        1     100             3.41E−5   1          10.3h
  Standalone   512³        4     200, 20, 5, 1   3.16E−5   1          31.7m
  Distributed  512³        1     100             4.76E−5   384        4.1m
  Distributed  512³        4     200, 20, 5, 1   3.78E−5   384        3.6m

*Multi-Resolution Levels    **Sum of Squared Differences

During the phase of experimentation, the maximum difference between the 2D solutions was computed and was below $10^{-3}$ mm. In this way, by distributing all pairs of slices and processing them concurrently, the overall execution time for each pyramid level is reduced to that of processing one single pair of 2D images for that level. This approach results in a significant speedup of the algorithm while producing analogous similarity measures, as shown in Table 4.1.
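The text does not spell out the exact weights of this linear combination. One plausible merging rule, consistent with the superposition hypothesis, is to average, for each displacement component, the two slice orientations that observe it; the sketch below makes that assumption explicit (the array layout and the equal weights are assumptions made for illustration only).

// Merges three stacks of 2D displacement fields into one 3D field.
// uxy[z][y][x][2] holds (u_x, u_y) from the axial slices,
// uxz[y][z][x][2] holds (u_x, u_z) from the coronal slices,
// uyz[x][z][y][2] holds (u_y, u_z) from the sagittal slices.
public final class FieldMerger {

    /** Returns u3d[z][y][x][3] = (u_x, u_y, u_z). */
    public static double[][][][] merge(double[][][][] uxy, double[][][][] uxz,
                                       double[][][][] uyz, int nx, int ny, int nz) {
        double[][][][] u3d = new double[nz][ny][nx][3];
        for (int z = 0; z < nz; z++)
            for (int y = 0; y < ny; y++)
                for (int x = 0; x < nx; x++) {
                    // Each component is seen by exactly two slice orientations; average them.
                    u3d[z][y][x][0] = 0.5 * (uxy[z][y][x][0] + uxz[y][z][x][0]); // u_x
                    u3d[z][y][x][1] = 0.5 * (uxy[z][y][x][1] + uyz[x][z][y][0]); // u_y
                    u3d[z][y][x][2] = 0.5 * (uxz[y][z][x][1] + uyz[x][z][y][1]); // u_z
                }
        return u3d;
    }
}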

4.5 Algorithm Validation

In order to validate the correctness of the distributed version of the algorithm, a sequential version of the original algorithm was implemented, and the results of the execution of both algorithms were then compared. Both versions of the algorithm use 4 execution threads per CPU core. The results shown were obtained by executing a fixed number of iterations of both versions of the algorithm, non-distributed and distributed, regardless of the similarity measure obtained. For this analysis, ten pairs of MR images were processed to compute the average values shown. From these results, it can be concluded that the distributed version of the algorithm produces appropriate values while reducing the execution time significantly. In the common situation where a pair of 512³ voxel images is to be registered, the speedup factor obtained is about 170, showing the important benefits of the distributed approach.

4.6 The Registration Service

Based upon the distributed algorithm for non-rigid registration, a 3D registration service has been developed. The service is currently being used in clinical follow-up studies, as well as in activities related to teaching and training in the School of Medicine. The graphical user interface for this service is shown in Figure 4.2, and it uses the same disconnected model presented in Chapter 3.

Figure 4.2: Graphical user interface of the 3D registration service.

The implemented algorithm also plays an important role in the atlas-based automatic segmentation service, described in the following chapter. In this case, the algorithm is used to register the image to be automatically segmented with a manually segmented atlas image. As shown below (Chapter 5), the idea is that, given an accurate coordinate mapping from the image to the atlas, the label for each image voxel can be determined by looking up the structure at the corresponding location in the atlas under that mapping.
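Stated as a formula (the mapping symbol $T$ is introduced here only for illustration and denotes the image-to-atlas coordinate mapping produced by the non-rigid registration), this lookup reads
\[
L_{\text{image}}(\mathbf{x}) \;=\; L_{\text{atlas}}\bigl(T(\mathbf{x})\bigr) \qquad \text{for every voxel } \mathbf{x},
\]
where $L_{\text{atlas}}$ is the label map of the manually segmented atlas.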

Chapter 5

Atlas-based Segmentation Service

Segmentation of medical images is the task of partitioning the data into contiguous regions representing individual anatomical objects. It is a prerequisite for further investigations in many computer-assisted medical applications, e.g., individual therapy planning and evaluation, diagnosis, simulation and image-guided surgery. Due to the nature of medical images, the task of segmentation can be tedious, time-consuming and may involve manual guidance. Clinical routine, however, requires efficient, robust and automatic methods.

Segmentation is a difficult task because in most cases it is very hard to separate the object from the image background. This is due to the characteristics of the imaging process as well as the grey-value mappings of the objects themselves. The most common medical image acquisition modalities include computed tomography (CT), magnetic resonance tomography (MRT), and ultrasound (US). As a consequence of the nature of the image acquisition process, noise is inherent in all medical data. The resolution of every acquisition device is limited, such that the value of each voxel of the image represents a value averaged over some neighboring region (known as the partial volume effect). Moreover, inhomogeneities in the data might lead to undesired boundaries within the object to be segmented, while homogeneous regions might conceal true boundaries between organs. In general, segmentation is an application-specific task. Anatomy experts can overcome these problems and identify objects in the data thanks to their knowledge of typical shape and image data characteristics.

Manual segmentation, however, is a very time-consuming process for large 3D image stacks, since it usually has to be done in a slice-by-slice fashion. The manual segmentation step is followed by surface mesh generation and simplification. In a clinical environment this amount of interaction is often not acceptable. Hence, reliable automatic methods for image segmentation have to be devised.

Intensity-based segmentation methods [98, 128, 99] work locally, typically
one voxel at a time, by clustering the space of voxel values, i.e., image intensities. The clusters are often determined by an unsupervised learning method, for example k-means clustering, or derived from example segmentations [53]. There are, however, many applications where there is no well-defined relationship between the value of a voxel and the label that should be assigned to it. This observation is fairly obvious when we are seeking to label anatomical structures rather than tissue types. What distinguishes these structures instead is their location and their spatial relationship to other structures. In such cases, spatial information (e.g., neighborhood relationships) needs to be taken into consideration and included in the segmentation process. Level set methods [132, 4, 6] simultaneously segment all voxels that belong to a given anatomical structure. Starting from a seed location, a discrete set of labeled voxels is evolved according to image information (e.g., image gradient) and internal constraints (e.g., smoothness of the resulting segmented surface). Snakes or active contours [8] use an analytical description of the segmented geometry rather than a discrete set of voxels.

In addition to geometrical constraints, one can take into account neighborhood relationships between several different structures [5, 69]. A complete description of such relationships is an atlas. In general, an atlas incorporates the locations and shapes of anatomical structures, and the spatial relationships between them. An atlas can, for example, be generated by manually segmenting a selected image. It can also be obtained by integrating information from multiple segmented images, for example from different individuals. Given an atlas, an image can be segmented by mapping its coordinate space to that of the atlas in an anatomically correct way, i.e., by means of registration. Labeling an image by mapping it to an atlas is consequently known as atlas-based segmentation, or registration-based segmentation. The idea is that, given an accurate coordinate mapping from the image to the atlas, the label for each image voxel can be determined by looking up the structure at the corresponding location in the atlas under that mapping. Obviously, computing the coordinate mapping between the image and atlas is the critical step in any such method. As there are typically substantial shape differences between different individuals, and therefore between an individual and an atlas, the registration must yield a non-rigid transformation capable of describing those inter-subject deformations.

The service for atlas-based segmentation actually comprises two separate tools: one for the creation of atlases through manual segmentation, and one for automatic segmentation that uses the non-rigid registration service previously described, as explained next. The service for atlas generation provides the necessary interactive tools for the human expert to draw the limits of the organ(s) being segmented, in a slice-by-slice manner.

Figure 5.1: Graphical user interface of the manual segmentation service.

First, the service allows the user to load a set of 2D images, usually files in DICOM [101] format. Then patient information is removed from the images, a process known as image anonymization. Based upon the geometric information contained in the images (coordinate space, voxel size, etc.), the 3D volume is then reconstructed and visualized (see Figure 5.1). The user can then draw points over the image slices, in any of the 2D views (coronal, sagittal, axial), and the client application automatically interpolates a closed curve (polyline, Bézier, cubic spline, or B-spline) based upon the points drawn. As the expert draws the contour of the anatomical structure, the resulting segmentation mesh is gradually visualized in the 3D view. The triangles of the segmentation mesh are stored in the server database if the user is online; otherwise they are stored in the local database for later synchronization.

Once there are enough manual segmentations, a minimum of ten, the software on the server side runs a multi-scale instantiation of the STAPLE algorithm [116] to remove the variability introduced by the different human experts, and in this way find the “true” segmentation for the given image. The algorithm considers the collection of segmentations and computes a probabilistic estimate of the true segmentation and a measure of the performance level represented by each manual segmentation. The probabilistic estimate of the true segmentation is formed by estimating an optimal combination of the segmentations, by means of an expectation-maximization (EM) algorithm, and then incorporating a prior model for the spatial distribution of the structures being segmented, by means of a Markov random field (MRF) model (the basic EM updates are summarized below).
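For reference, the core updates iterated by such an EM scheme can be summarized as follows for the binary, single-scale case without the MRF prior. The notation (expert decisions $D_{ij}$, sensitivities $p_j$, specificities $q_j$, prior $\gamma_i$) is introduced here only for illustration and follows the usual presentation of STAPLE rather than this particular implementation.
\[
W_i \;=\; \frac{\gamma_i \prod_j p_j^{D_{ij}} (1-p_j)^{1-D_{ij}}}
{\gamma_i \prod_j p_j^{D_{ij}} (1-p_j)^{1-D_{ij}} \;+\; (1-\gamma_i) \prod_j q_j^{1-D_{ij}} (1-q_j)^{D_{ij}}}
\qquad \text{(E-step)}
\]
\[
p_j \;=\; \frac{\sum_i W_i D_{ij}}{\sum_i W_i},
\qquad
q_j \;=\; \frac{\sum_i (1-W_i)(1-D_{ij})}{\sum_i (1-W_i)}
\qquad \text{(M-step)}
\]
Here $W_i$ is the current probabilistic estimate that voxel $i$ belongs to the structure, and the two steps are alternated until convergence.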


Figure 5.2: Diagram of the atlas-based segmentation process. The segmented atlas image (template) is registered to the image to be segmented (reference). The displacement vector field found in this step is then applied to the surface meshes of the segmented atlas, thereby providing a segmentation of the reference image.
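That last step, moving the atlas meshes with the recovered field, can be sketched as follows. Mesh, Vector3 and DisplacementField are hypothetical types, and trilinear sampling of the field at each vertex position is assumed.

```java
// Sketch: apply the displacement vector field to every vertex of the atlas surface mesh.
static Mesh warpMesh(Mesh atlasMesh, DisplacementField field) {
    Mesh warped = atlasMesh.copy();
    for (int i = 0; i < warped.vertexCount(); i++) {
        Vector3 v = warped.vertex(i);
        Vector3 u = field.sampleTrilinear(v);   // displacement at the vertex position
        warped.setVertex(i, v.add(u));          // the moved vertex lies on the reference anatomy
    }
    return warped;
}
```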

The second tool provided by the service also allows the user to load a 3D image, but this time with the purpose of automatically segmenting it. In order to do this, the user selects the appropriate atlas image from the server database and then, by means of the non-rigid registration service described in Chapter 4, automatically obtains a segmentation of the loaded image a couple of minutes later.

Chapter 6

Discussion and Conclusions

Recent advances in image registration software have produced a large set of cutting-edge algorithms, most of them intensity-based, automatic methods, all characterized by their high complexity and computational cost. Similar advances have occurred on the hardware side, with devices capable of producing better and bigger image data sets. The combination results in algorithms that are not actually used in daily clinical practice, not only because of their long execution times, sometimes many hours as in the case of elastic registration of 1024 × 1024 × 1024 voxel images, but also because of the lack of usability of the services provided to the end user.

The fundamental issue addressed in this work is therefore twofold: the computational cost of advanced automatic registration algorithms, and, additionally, the predominant working model that requires complex processes of installation, configuration, and consequently integration of a set of (usually) stand-alone and non-collaborative tools. To leverage the use of advanced registration algorithms in clinical practice, i.e., in time periods of seconds or a few minutes, an infrastructure of very high computational power is required. At the software level, high performance can be achieved by reduction in data space (not likely nowadays), reduction in solution search space (e.g., by using evolutionary algorithms), and parallel processing.

In particular, the ability to register structures is crucial in many medical applications, including follow-up studies of several treatments in diverse organs such as the brain, liver or kidneys. Evidence-based medicine attempts to identify objective sources of information that help optimize the type and the moment of the medical intervention. Modern medical imaging has improved the quality of diagnosis, but with very little quantitative information that effectively contributes to the quality of the intervention. Registering structures would not only allow organs to be matched before and after a treatment, but would also make it possible
to estimate a quantitative degree of a particular lesion. We demonstrated the utility of such applications when comparing bone mineralization before and after a surgical intervention; this could obviously be extended to improve fracture treatments in much more complicated clinical scenarios. However, for this to be useful it has to work in clinical time, i.e., in situations like intra-operative imaging or image-guided surgery of tumors, for instance. There is still much to do in this field, mainly because the use of automatic registration has remained limited due to its high computational cost. To the best of our knowledge, this is the first coherent attempt to solve this problem as well as to extend the solution to many clinical problems that have never been addressed in actual practice.

The first aspect of the problem was addressed by designing and implementing a computing grid that takes advantage of idle CPU cycles, as the computational muscle behind a small HA computing cluster responsible for providing the end-user services. For the second aspect of the problem, a service-oriented approach has been proposed. In this model, medical imaging services are accessed via the Internet, by means of a standard web browser. Since browsers come included in all modern operating systems, no installation or update process is required.

To demonstrate the applicability of the computing grid in a real situation, we have presented two case studies: automatic digital subtraction and 3D non-rigid registration. In the first case, we evaluated two evolutionary algorithms, a genetic algorithm and differential evolution, as the search strategy to solve an expensive optimization problem. In this evaluation, differential evolution showed better performance and reliability than the genetic algorithm. The high computational cost of the evolutionary algorithm was addressed by developing a distributed implementation, and a speedup factor of about 30 was obtained. In the second case study, a sequential algorithm for non-rigid registration was also parallelized through distribution. The hypothesis assumed for the implementation of the distributed version of the algorithm is based upon the superposition principle, and justified by the fact that the regularizer applied constrains the displacements to small quantities at each iteration, whereby the actual displacement field can be decomposed as a linear combination of its projections on each 2D plane. For the aforementioned 3D images, the speedup obtained was about 170, reducing the execution time from more than 10 hours to a few minutes. The proposed service-oriented model for medical imaging is feasible and useful in research and clinical scenarios, and is in fact used daily in the School of Dental Medicine and the University Hospital.

The following sections present a general discussion of the medical image registration problem and the main contributions of this work in that field. Particular discussions have already been presented in the corresponding chapters and articles. Finally, further work related to medical image registration algorithms and service-oriented models is discussed.

6.1 Contributions

The main contributions of this work fall into three categories: 1) new algorithms and developments, 2) hardware infrastructure design and implementation, and 3) academic contributions.

New algorithms In the first category, a couple of new distributed algorithms have been devised. The first algorithm follows an evolutionary approach to solve the optimization problem of automatically registering two digitized radiographs. While the use of evolutionary algorithms (EA) in optimization problems has already been proposed, the hybrid algorithm devised is novel in that it takes advantage of the EA to rapidly approximate the region of the global maximum, and then switches to a standard numerical method to find it. The algorithm is described in Chapter 3 and has been published as part of the book “Computational Intelligence in Expensive Optimization Problems”, Chapter 27, A Distributed Evolutionary Approach to Subtraction Radiography, G. Mañana and E. Romero, Springer-Verlag, ISBN 978-3-642-10700-9, 2010. The service implemented using this algorithm is useful in research and clinical scenarios, and is used daily in the School of Dental Medicine.

To the best of our knowledge, the second algorithm is the first distributed (and therefore parallel) version of the novel algorithm for non-rigid 3D registration proposed by Fischer et al. in [15]. The numerical scheme used in the original algorithm is based upon the concept of curvature and the Euler-Lagrange equations which characterize a minimizer, which leads to a high-dimensional system of non-linear partial differential equations. This results in an algorithm that requires more than 10 CPU hours to register a couple of 512³ voxel images when executed on a single server. The distributed version devised uses 1536 CPU cores and shortens this time to 3.6 minutes, equivalent to a speedup factor of about 170. The algorithm is described in Chapter 4 and has been submitted for publication as “Distributed Curvature-based Non-rigid Image Registration”, G. Mañana and E. Romero, to Transactions on Medical Imaging, IEEE, in September 2010.

In addition to the algorithms devised, the main contribution in this category is the development of a “disconnected” model and user interface for the medical imaging services implemented. Most online services, except for some Google services like Calendar, require an active Internet connection to work properly. However, mainly due to the high mobility of doctors in developing countries, it is not unusual for them to have to work with intermittent connections or no connection at all, i.e., offline. To solve this, the proposed disconnected model is based upon the well-known proxy design pattern. The proxy component comprises two local elements: a specialized web server and a small-footprint database. When the user is online, the proxy takes advantage of the active connection to synchronize data (images and other data produced by the user) to and from the server. When the connection is lost, the proxy uses the local server and database to let the user keep working in a transparent manner (a minimal sketch of this idea is given below). In conjunction with an OpenGL-based graphical user interface, this novel combination shows the feasibility of having a GUI with advanced capabilities, something that used to be restricted to standalone applications, now in a web service that is accessed using a standard web browser.
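The following is a minimal sketch of that proxy idea. The ImagingStore interface and the reachability check are illustrative assumptions; the actual implementation relies on the local web server and embedded database described above.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Minimal sketch of the disconnected-proxy idea (illustrative interfaces only).
interface ImagingStore {
    void save(String key, byte[] data);
    byte[] load(String key);
}

final class DisconnectedProxy implements ImagingStore {
    private final ImagingStore remote;   // server-side store, reachable only when online
    private final ImagingStore local;    // small-footprint local database
    private final Deque<String> pendingSync = new ArrayDeque<String>();

    DisconnectedProxy(ImagingStore remote, ImagingStore local) {
        this.remote = remote;
        this.local = local;
    }

    public void save(String key, byte[] data) {
        local.save(key, data);                 // always persist locally first
        if (isOnline()) {
            remote.save(key, data);            // push through while the connection is up
        } else {
            pendingSync.add(key);              // defer until connectivity returns
        }
    }

    public byte[] load(String key) {
        byte[] data = local.load(key);
        return (data != null || !isOnline()) ? data : remote.load(key);
    }

    /** Called when connectivity is regained: replays the deferred writes. */
    void synchronize() {
        while (isOnline() && !pendingSync.isEmpty()) {
            String key = pendingSync.poll();
            remote.save(key, local.load(key));
        }
    }

    private boolean isOnline() {
        return false;                          // placeholder for a cheap reachability check
    }
}
```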

Infrastructure design and implementation The contributions that fall into the second category are related to the design and development of a high-performance Tier-3 class computing cluster. The cluster was designed following the characterization presented in Appendix A. In turn, the characterization was the result of the charge received during my internship at CERN1, while working as part of the Tier 3 Task Force team of the ATLAS experiment2. These guidelines have proven very useful to several research groups at universities in Colombia and other developing countries. The computing cluster is now part of Grid Colombia3, an inter-university research project sponsored by Colciencias4 that brings together fourteen institutions in a nationwide computational grid.

Academic contributions Finally, during the years this study took, several academic contributions were made. These contributions were made mainly through workshops on image registration and related topics, whose associated material can be found at the unGrid Research Project site5.

1 Conseil Européen pour la Recherche Nucléaire, http://cern.ch/.
2 Tier-3 Task Force, https://twiki.cern.ch/twiki/bin/view/Atlas/Tier3TaskForce.
3 Grid Colombia, http://gridcolombia.org/.
4 Colciencias, http://colciencias.gov.co/.
5 unGrid, http://ungrid.unal.edu.co/doctorate/presentations.htm.

6.2 Discussion

This research has introduced some relevant contributions to the field of medical imaging, in particular the use of grid computing techniques in routine clinical applications such as image registration and segmentation, with minimal computational requirements at the client side and modern interaction paradigms. Moreover, this work proposes an online service-oriented model; while such models are of common use in many other disciplines, their application to daily medical practice is mostly limited to management-related tasks. The underlying idea behind this work is that medical imaging should no longer be constrained to the use of stand-alone applications, but that it can also be efficiently deployed as a service on the Internet and therefore made available to a wider group of clinicians.

From a user standpoint, the service-oriented model has many advantages over the stand-alone application model. First and foremost, it avoids the user having to manually install the software. Clinicians, as a rule, have neither the time nor the expertise to deal with the common issues that arise when installing complex software tools. The application can be accessed from any machine with a browser and can easily support multiple platforms consistently. Additionally, upgrades need only be performed on the server side and are immediately available to all service users. Data backup, as well as other availability-related tasks, can be performed on a single machine, as data will not be spread out across multiple clients. Finally, and most importantly, the service-oriented paradigm leverages interdisciplinary and collaborative work, a critical success factor in current biomedical practice and research.

It may be argued that for all this to work a network connection is strictly required. However, network access is not only becoming ubiquitous, but we have also seen that, with the help of a software proxy and a small database, the interactive tools can continue working appropriately. The implemented framework allows doctors to use up-to-date medical imaging techniques and high-performance computing power in routine clinical studies, by means of a standard web browser and without specialized training.

6.3 Further Work

General-Purpose computing on Graphics Processing Units (GPGPU) is an emerging field of research which allows software developers to utilize the significant amount of computing resources GPUs provide for a wider range of applications. While traditional high performance computing environments such as clusters, grids and supercomputers require significant architectural modifications to incorporate GPUs, volunteer computing grids already have these resources available, as most personal computers have GPUs available for recreational use. GPUs are chosen because of their availability and large increase in computational power over conventional CPUs [36]. While developing GPGPU applications still requires significant technical knowledge, the process is becoming easier. At JavaOne 2010, for instance, AMD announced the release of an alpha version of Aparapi [11], an API that enables Java developers to express data-parallel workloads, plus a runtime capable of running compatible workloads on a GPU. By having the code run either on the GPU or falling back to Java execution, Aparapi helps preserve Java's “write once” principle (a minimal sketch of this programming style is given at the end of this section). Additionally, the large amount of computing resources this type of hardware provides makes utilizing GPUs a highly desirable prospect, especially in the area of volunteer computing. As the hardware and software mature, we expect GPGPU applications to become more mainstream. Therefore, future work in this regard is related to the modification of the unGrid infrastructure, to take advantage not only of idle CPU cycles but also of idle GPU cycles.

The replicated-worker pattern is the key enabler in grid computing and one of the most common patterns for parallelizing work. The use of a clustering tool [120] to solve the problem of having a single point of failure in the shared memory service has proved to be greatly effective. With this in mind, further work is related to the implementation of the same computing services using this clustering tool, and to comparing the resulting system with the current one, implemented using Jini and JavaSpaces. Our research group is currently working on evolving the architecture in use towards a cloud computing model, in which the common theme relies on integrated services over the Internet to satisfy the clinician's computing and visualization needs.

Regarding the algorithms used in registration, current and future work is related to further exploring hybrid evolutionary algorithms such as those presented in [54, 20, 91], their possible application to 3D curvature-based registration [65], as well as their distribution and parallelization. Although a lot of work has been done in the field, automatic image registration still remains an open problem. Registration of images with complex non-linear and local distortions, multimodal registration, and registration of N-D images (where N > 2) are among the most challenging tasks at this moment. The major difficulty of N-D image registration resides in its computational complexity. Although the speed of computers has been growing, the need to decrease the computational time of these methods persists. The complexity of the methods, as well as the size of the data, keeps growing (higher resolution, higher dimensionality, larger size of scanned areas).
In the future, the idea of an ultimate registration method, able to recognize the type of a given task and to decide by itself about the most appropriate solution, will motivate the development of expert systems and other tools using techniques from the field of artificial intelligence. As awareness of the problems of patient motion and anatomic changes increases, further research on 4D imaging and non-rigid registration will be stimulated to meet the clinical demands. With advances in hybrid registration algorithms and parallel computing, more progress is expected, resulting in improved accuracy and performance.
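As an illustration of the data-parallel style mentioned above, the sketch below computes per-voxel squared differences between two images with an Aparapi kernel, falling back to ordinary Java execution when no GPU is available. The Kernel/getGlobalId/execute entry points follow Aparapi's publicly documented API, but the exact package and method signatures should be checked against the release actually used; the surrounding code is illustrative only.

```java
import com.amd.aparapi.Kernel;

// Illustrative GPGPU sketch: per-voxel squared difference between two images.
public class SsdKernelSketch {
    public static void main(String[] args) {
        final int n = 1024 * 1024;
        final float[] reference = new float[n];
        final float[] template = new float[n];
        final float[] diff = new float[n];

        Kernel kernel = new Kernel() {
            @Override
            public void run() {
                int i = getGlobalId();            // one work-item per voxel
                float d = reference[i] - template[i];
                diff[i] = d * d;
            }
        };
        kernel.execute(n);                         // runs on the GPU when possible
        kernel.dispose();

        double ssd = 0.0;
        for (float v : diff) {
            ssd += v;                              // reduction performed on the host
        }
        System.out.println("SSD = " + ssd);
    }
}
```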

Appendices


Appendix A

Characterization of Tier 3 Sites

The documentation included in this appendix is the result of the charge received during my internship at CERN1, while working as part of the Tier 3 Task Force team of the ATLAS experiment2. The internship was sponsored by the HELEN grant3 and entailed three periods of three months each (years 2006, 2007 and 2008) at CERN headquarters on the Swiss/French border near Geneva, Switzerland. Section A.1 shows the contents of the charge, and the following sections present the report produced, including observations and recommendations. The characterization document can be accessed at https://twiki.cern.ch/twiki/bin/view/Atlas/Tier3TaskForce.

1 Conseil Européen pour la Recherche Nucléaire, http://cern.ch.
2 Tier-3 Task Force, https://twiki.cern.ch/twiki/bin/view/Atlas/Tier3TaskForce.
3 High Energy Latinamerican-European Network, http://www.roma1.infn.it/exp/helen/.


Appendix B

National University’s Tier-3 Computing Cluster

The documentation included in this appendix is the result of the implementation of a high-performance computing cluster at the National University of Colombia main campus in Bogotá. The cluster has been designed following the guidelines presented in Appendix A. The first section (Section B.1) presents detailed information about the cluster's technical specifications and the hardware installation and configuration process, including a diagram of the communications scheme deployed. The second section (Section B.2) presents a description of the software stack deployed, from the operating system to the grid computing middleware selected. Finally, Section B.3 presents the performance evaluation process carried out as well as the results obtained.

B.1 Hardware Specification

When this research project started six years ago, there were already several thousand desktop computers on the main university campus, interconnected by a backbone of high speed fiber optics. Studies carried out on a random sample of more than two hundred of these computers during the day showed that, on average, 90% of the CPU cycles were wasted. Our research group [104] works on mathematical modeling and machine learning techniques, and provides multiple services for medical imaging, including simulation, visualization, and analysis, as well as technological developments. Like most research groups at the university, our group requires a significant computing capacity for its everyday research and production related activities. Since at that time there were no other high-performance computing facilities, we started the development of a general-purpose computing grid that would allow us to benefit from the idle CPU cycles. These computers have, on average, 1.5 GHz processors and 2 GB of RAM, and they are the basis on which the computational grid has been constructed. While the backbone is FDDI, most computers have 10/100 Mbps network interface cards, which becomes a major bottleneck. The operating systems used on these computers range from different MS Windows versions (60%) to several flavors of Linux (40%).

The HA cluster is composed of 15 Dell M600 blade servers, each with two quad-core Intel Xeon 5430 2.66 GHz processors and 16 GB of RAM. The operating system used on these servers is Scientific Linux 5.4 [22]. The front end computer for this cluster is a Dell 1950 with two quad-core Intel Xeon 5570 2.66 GHz processors and 32 GB of RAM.

The framework for medical imaging services is an all-Java implementation. As mentioned before, the middleware used in the computing grid is based upon the Jini technology, and the code that runs on the worker nodes was also written in Java; a minimal sketch of the master/worker interaction over JavaSpaces on which this middleware relies is given at the end of this section. The HA cluster uses the JBoss application server [75] and the Terracotta [9] clustering software for replication. This tool delivers clustering as a runtime infrastructure service, which simplifies the task of clustering a Java application immensely. This is done by clustering the JVM underneath the application, through bytecode manipulation, instead of clustering the application itself. The database engine used is MySQL Community Server 5.1 [105]. On the client side, the GUI has been implemented as a Java applet that uses the Java binding for the OpenGL API [97].
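As a reference for how the replicated-worker pattern looks on top of JavaSpaces, the sketch below shows a worker that repeatedly takes a slice-pair task from the space, processes it, and writes the result back; the master simply writes tasks and collects results. The SliceTask and SliceResult entries, and the registration placeholder, are illustrative only; the JavaSpace calls follow the standard Jini API.

```java
import net.jini.core.entry.Entry;
import net.jini.core.lease.Lease;
import net.jini.space.JavaSpace;

// Illustrative replicated-worker sketch over a JavaSpace.
public class ReplicatedWorkerSketch {

    /** A unit of work: register one pair of corresponding 2D slices. */
    public static class SliceTask implements Entry {
        public Integer sliceIndex;          // Entry fields must be public objects
        public byte[] referenceSlice;
        public byte[] templateSlice;
        public SliceTask() { }              // public no-argument constructor required
    }

    /** The computed 2D displacement field for one slice pair. */
    public static class SliceResult implements Entry {
        public Integer sliceIndex;
        public byte[] displacementField;
        public SliceResult() { }
    }

    /** Worker loop: take a task, process it, write the result back into the space. */
    public static void workerLoop(JavaSpace space) throws Exception {
        SliceTask template = new SliceTask();       // null fields match any pending task
        while (true) {
            SliceTask task = (SliceTask) space.take(template, null, Long.MAX_VALUE);
            SliceResult result = new SliceResult();
            result.sliceIndex = task.sliceIndex;
            result.displacementField =
                    registerSlicePair(task.referenceSlice, task.templateSlice);
            space.write(result, null, Lease.FOREVER);
        }
    }

    static byte[] registerSlicePair(byte[] referenceSlice, byte[] templateSlice) {
        return new byte[0];                          // placeholder for the 2D registration
    }
}
```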

B.2 Software Specification

B.2.1 Basic Configuration

The basic configuration of the cluster is based upon the Sun HPC Software, Linux Edition 2.0.1 ready-made framework. This simplifies the deployment of computing clusters by providing a set of software components to provision, manage, and operate large scale Linux clusters. It also serves as a foundation for optional add-ons, such as schedulers like Sun Grid Engine, and other components not included with the original solution. The following components have been installed at the time of this writing:

Lustre 1.8.0.1 Lustre is an object-based, distributed file system, generally used for large scale cluster computing. The name Lustre is a blend of the words Linux and cluster. Available under the GNU GPL, the project aims to provide a file system for clusters of tens of thousands of nodes with
petabytes of storage capacity, without compromising speed, security or availability through the decades.

perfctr 2.6.39 In computer science, the Performance Application Programming Interface (PAPI) is a portable interface (in the form of a library) to hardware performance counters on modern microprocessors. It is widely used to collect low level performance metrics (e.g., instruction counts, clock cycles, cache misses) of computer systems running UNIX/Linux operating systems. A Linux/x86 kernel has to be patched with a performance monitoring counters driver (perfctr) to support PAPI.

Env-switcher 1.0.13 The env-switcher package provides a convenient method for users to switch between “similar” packages. System- and user-level defaults are maintained in data files and are examined at shell invocation time to determine how the user's environment should be set up. The canonical example of where this is helpful is using multiple implementations of the Message Passing Interface (MPI).

genders 1.11 Genders is a static cluster configuration database used for cluster configuration management. It is used by a variety of tools and scripts for the management of large clusters. The genders database is typically replicated on every node of the cluster. It describes the layout and configuration of the cluster so that tools and scripts can sense the variations between cluster nodes. By abstracting this information into a plain text file, it becomes possible to change the configuration of a cluster by modifying only one file.

git 1.6.1.3 Git is a free distributed revision control, or software source code management, project with an emphasis on being fast. Git was initially designed and developed by Linus Torvalds for Linux kernel development. Every Git working directory is a full-fledged repository with complete history and full revision tracking capabilities, not dependent on network access or a central server.

Heartbeat 2.1.4-2.1 The Linux-HA (High-Availability Linux) project provides a high-availability (clustering) solution for Linux, FreeBSD, OpenBSD, Solaris and Mac OS X which promotes reliability, availability, and serviceability (RAS). The project’s main software product is Heartbeat, a GPL-licensed portable cluster management program for high-availability clustering.

Mellanox Firmware Tools 2.5.0 The Mellanox Firmware Tools (MFT) package is a set of firmware management tools for a single InfiniBand node. MFT can be used for generating a standard or customized Mellanox firmware image, querying for firmware information, and burning a firmware image to a single InfiniBand node.

Modules 3.2.6 Modules is a package that lets users add or remove changes to their environment in order to use various applications or libraries. It also allows a user to look up what applications are available on a cluster without doing an “ls” exploration.

MVAPICH 1.1 / MVAPICH2 1.2p1 The Network Based Computing Lab (NowLab) is part of the Computer Science & Engineering Department at The Ohio State University. One of the notable projects of the group is MVAPICH, which deals with investigating the potential of InfiniBand and other emerging RDMA-enabled networking technologies for designing high performance and scalable communication and I/O subsystems for clusters with multi-thousand nodes.

OFED 1.4.1 The Sockets Direct Protocol (SDP) is a networking protocol originally defined by the Software Working Group (SWG) of the InfiniBand Trade Association. Originally designed for InfiniBand, SDP now has been redefined as a transport agnostic protocol for Remote Direct Memory Access (RDMA) network fabrics. SDP defines a standard wire protocol over an RDMA fabric to support stream sockets (SOCK_STREAM) network. SDP utilizes various RDMA network features for high-performance zero-copy data transfers. SDP is a pure wire-protocol level specification and does not go into any socket API or implementation specifics. The purpose of the Sockets Direct Protocol is to provide an RDMA accelerated alternative to the TCP protocol on IP. The goal is to do this in a manner which is transparent to the application. Today, Sockets Direct Protocol for the Linux operating system is part of the OpenFabrics Enterprise Distribution (OFED), a collection of RDMA networking protocols for the Linux operating system. OFED is managed by the OpenFabrics Alliance. Many standard Linux distributions include the current OFED.

RRDTool 1.2.30 The round-robin database tool RRDtool aims to handle time-series data like network bandwidth, temperatures, CPU load, etc. The data is stored in a round-robin database, thus the system storage footprint remains constant over time. It also includes tools to extract RRD data in a graphical format.

HPCC Bench Suite 1.2.0 The HPC Challenge benchmark consists at this time of 7 benchmarks: HPL, STREAM, RandomAccess, PTRANS, FFTE, DGEMM and b_eff Latency/Bandwidth. HPL is the Linpack TPP benchmark. The test stresses the floating point performance of a system. STREAM is a benchmark that measures sustainable memory bandwidth (in GB/s), RandomAccess measures the rate of random updates of memory. PTRANS measures the rate of transfer for large arrays of data from a multiprocessor's memory. Latency/Bandwidth measures (as the name suggests) latency and bandwidth of communication patterns of increasing complexity between as many nodes as is time-wise feasible.

Lustre IOKit The Lustre I/O kit is a collection of benchmark tools for a Lustre cluster. The I/O kit can be used to validate the performance of the various hardware and software layers in the cluster and also as a way to find and troubleshoot I/O issues. The I/O kit contains three tests. The first surveys basic performance of the device and bypasses the kernel block device layers, buffer cache and file system. The subsequent tests survey progressively higher layers of the Lustre stack. Typically with these tests, Lustre should deliver 85-90% of the raw device performance.

Sun HPC ClusterTools 8.2 Sun HPC ClusterTools software is an integrated toolkit that allows developers to create and tune Message-passing Interface (MPI) applications that run on high performance clusters and SMPs. Sun HPC ClusterTools software offers a comprehensive set of capabilities for parallel computing.

Sun Grid Engine 6.2u3 The Grid Engine project is an open source community effort to facilitate the adoption of distributed computing solutions. Spon- sored by Sun Microsystems and hosted by CollabNet, the Grid Engine project provides enabling distributed resource management software for wide ranging requirements from compute farms to grid computing.

IOR 2.10.1 IOR is an I/O benchmark which is particularly useful for characterizing the performance of parallel / cluster file systems. In particular, it can perform parallel reads and writes to/from either a single file, or multiple files, using MPI-IO.

LNET self test In a cluster with a Lustre file system, the system network connecting the servers and the clients is implemented using Lustre Networking (LNET), which provides the communication infrastructure required by the Lustre file system. Disk storage is connected to the Lustre file system MDSs and OSSs using traditional storage area network (SAN) technologies. LNET supports many commonly-used network types, such as InfiniBand and IP networks, and allows simultaneous availability across multiple network types with routing between them.

NetPIPE 3.7.1 NetPIPE is a protocol independent performance tool that visually represents the network performance under a variety of conditions. It performs simple ping-pong tests, bouncing messages of increasing size between two processes, whether across a network or within an SMP system. Message sizes are chosen at regular intervals, and with slight perturbations, to provide a complete test of the communication system. Each data point involves many ping-pong tests to provide an accurate timing. Latencies are calculated by dividing the round trip time in half for small messages (< 64 Bytes).

Slurm 1.3.13 SLURM is an open-source resource manager designed for Linux clusters of all sizes. It provides three key functions. First, it allocates exclusive and/or non-exclusive access to resources (computer nodes) to users for some duration of time so they can perform work. Second, it provides a framework for starting, executing, and monitoring work (typically a parallel job) on a set of allocated nodes. Finally, it arbitrates contention for resources by managing a queue of pending work.

MUNGE 0.5.8 MUNGE (MUNGE Uid 'N' Gid Emporium) is an authentication service for creating and validating credentials. It allows a process to authenticate the UID and GID of another local or remote process within a group of hosts having common users and groups, and it is designed to be scalable for use in HPC cluster environments, where it is employed by resource managers such as SLURM.

Ganglia 3.1.01 Ganglia is a scalable distributed monitoring system for high- performance computing systems such as clusters and Grids. It is based on a hierarchical design targeted at federations of clusters. It leverages widely used technologies such as XML for data representation, XDR for compact, portable data transport, and RRDtool for data storage and visualization. oneSIS 2.0.1 oneSIS is an open-source software package aimed at simplifying diskless cluster management. It is a simple and highly flexible method for deploying and managing a system image for diskless systems that can turn any supported Linux distribution into a master image capable of being used in a diskless environment.

Cobbler 1.4.1 Cobbler is a Linux provisioning server that centralizes and simplifies control of services including DHCP, TFTP, and DNS for the purpose of performing network-based operating system installs. It can be configured for PXE, reinstallations, and virtualized guests using Xen, KVM or VMware. Cobbler interacts with a program called Koan for reinstallation and virtualization support. Koan and Cobbler use libvirt to integrate with different virtualization software.

CFEngine 2.2.6 Many datacentre managers feel that there must be a better way of managing systems and that the way must be through automation, but do not see how or where to begin. Using a tool like CFEngine can be the answer, but getting started is a daunting prospect and it takes time to build up trust. What does configuration mean? The configuration of resources (like disk contents and process tables) is the “state of repair” of the system. The core of life-cycle management is the Build-Deploy-Manage-Audit cycle. Using this cycle, the lifetime of the system can be assured with a minimum of human intervention. CFEngine is unique in its ability to perform real-time repair, providing hands-free management of systems.

Conman 0.2.3 ConMan is a serial console management program designed to support a large number of console devices and simultaneous users. It supports local serial devices, remote terminal servers (via the telnet protocol), Unix domain sockets, and external processes (e.g., using Expect to control connections over telnet, ssh, or IPMI Serial-Over-LAN). Its features include logging (and optionally timestamping) console device output to file, connecting to consoles in monitor (R/O) or interactive (R/W) mode, allowing clients to share or steal console write privileges, and broadcasting client output to multiple consoles.

FreeIPMI 0.7.5 FreeIPMI provides in-band and out-of-band IPMI software based on the IPMI v1.5/2.0 specification. The IPMI specification defines a set of interfaces for platform management and is implemented by a number of vendors for system management. The features of IPMI that most users will be interested in are sensor monitoring, system event monitoring, power control, and serial-over-LAN (SOL). The FreeIPMI tools and libraries should provide users with the ability to access and utilize these and many other features. A number of useful features for large HPC or cluster environments have also been implemented in FreeIPMI.

IPMItool 1.8.10 IPMItool is a utility for managing and configuring devices that support the Intelligent Platform Management Interface (IPMI) version 1.5 and version 2.0 specifications. IPMI is an open standard for monitoring, logging, recovery, inventory, and control of hardware that is implemented independently of the main CPU, BIOS, and OS. The service processor (or Baseboard Management Controller, BMC) is the brain behind platform management and its primary purpose is to handle the autonomous sensor monitoring and event logging features.

lshw B.02.14 lshw (Hardware Lister) is a small tool that provides detailed information on the hardware configuration of the machine. It can report exact memory configuration, firmware version, mainboard configuration, CPU version and speed, cache configuration, bus speed, etc., on DMI-capable x86 or EFI (IA-64) systems and on some PowerPC machines.

OpenSM 3.1.1 In order to bring up and manage an IB subnet, a subnet manager (SM) is required. Closely associated with the SM is subnet administration (SA), which answers queries from the end port that are needed to obtain paths, set up and query multicast, and deal with services and events. The OpenIB SM is OpenSM.

pdsh 2.18 Pdsh is an efficient, multithreaded remote shell client which executes commands on multiple remote hosts in parallel. Pdsh implements dynamically loadable modules for extended functionality such as new remote shell services and remote host selection.

PowerMan 2.3.5 PowerMan is a tool for manipulating remote power control (RPC) devices from a central location. Several RPC varieties are supported natively by PowerMan and Expect-like configurability simplifies the addition of new devices.

B.2.2 OSG middleware

Grid Colombia is an academic organization that currently gathers 14 universities [25], including the National University of Colombia, with the aim of constructing a nationwide computing grid. The grid uses the academic high-speed network RENATA [112] and will help to solve relevant computational problems in the Colombian context. For this purpose, the Open Science Grid (OSG) [26] middleware has been installed on a set of virtual machines using the Xen [24] hypervisor. Interestingly enough, Apache Hadoop [76], an open-source data processing framework that uses the same replicated-worker pattern (“map-reduce” in their parlance) applied in this work, is now included in the OSG Virtual Data Toolkit (VDT) [43].

B.2.3 gLite middleware


B.3 Cluster Performance

SPEC CPU2006 Based upon the SPECint2006 values obtained [32], the estimated performance of the cluster is 360k SPECint2006. The SPEC CPU2006 benchmark measures processor, chipset, and compiler speed (SPECint and SPECfp) and throughput (SPECint_rate and SPECfp_rate). The SPECint (single-task) and SPECint_rate (multi-task) benchmarks measure compute-intensive integer performance, while SPECfp (single-task) and SPECfp_rate (multi-task) measure compute-intensive floating point performance. The integer benchmarks are representative of most real-world workloads, while the floating point benchmarks are more specialized (e.g., crash simulations, ocean modeling).

Appendix C

Published Material

This appendix lists the published material, papers and book chapters, produced during the development of the doctoral work. This includes:

C.1 Distributed Genetic Algorithm for Subtraction Radiography, GECCO 2006, ISBN: 1-59593-186-4

Abstract: Digital subtraction is a promising technique used in radiographic studies of periapical lesions and other dental disorders for which the treatment must be evaluated over time. This paper presents a fast and reliable automated image registration method for subtracting two radiographs where an unpredicted mismatch is present. An optimal affine transformation is found using an adaptive Genetic Algorithm (GA) as the optimization strategy and a correlation ratio as the similarity measure. The parallel GA implemented takes advantage of the CPU idle cycles of a computational grid, resulting in an application with a computational time of less than three minutes when processing standard intra-oral radiographs pairs.

C.2 Grid Computing Based Subtraction Radiography, ICIP 2007, ISBN: 978-1-4244-1437-6

Abstract: Digital subtraction is a common technique used in radiographic studies of periapical lesions and other dental disorders for which the treatment must be evaluated over time. This paper presents a fast and reliable parallel image registration method for subtracting two digitized radiographs where an unpredicted mismatch is present. An optimal affine transformation is found using an adaptive Genetic Algorithm (GA) as the optimization strategy and a correlation ratio as the similarity measure. The parallel GA implemented takes advantage of the CPU idle cycles of a computational grid, resulting in an application with a computational time of about fifty seconds when processing pairs of standard intra-oral radiographs.

C.3 Characterization of Tier 3 Sites, CERN 2008, ISBN: 978-92-2083-321-5

Abstract: The documentation included in this appendix is the result of the charge received during my internship at CERN1, while working as part of the Tier 3 Task Force team of the ATLAS2 experiment. The internship was sponsored by the HELEN grant3 and entailed three periods of three months each (years 2006, 2007 and 2008) at CERN headquarters on the Swiss/French border near Geneva, Switzerland. Section A.1 shows the contents of the charge, and the following sections present the report produced, including observations and recommendations.

C.4 A Distributed Evolutionary Approach to Subtraction Radiography, Springer-Verlag 2009, ISBN: 978-3-642-10700-9

Abstract: Automatic image registration is a fundamental task in medical image processing, and significant advances have occurred in the last decade. However, one major problem with advanced registration techniques is their high computational cost. Due to this restraint, these methods have found limited application to clinical situations where real time or near real time execution is required, e.g., intra-operative imaging, or high volumes of data need to be processed periodically. High performance in image registration can be achieved by reduction in data and search spaces. However, to obtain a significant increase in performance, these approaches must be complemented with parallel processing. Parallel processing is associated with expensive supercomputers and computer clusters that are unaffordable for most public medical institutions. This chapter will describe how to take advantage of an existing computational infrastructure and achieve high performance image registration in a practical and affordable way. More specifically, it will outline the implementation of a fast and robust Internet subtraction service, using a distributed evolutionary algorithm and a service-oriented architecture.

1 Conseil Européen pour la Recherche Nucléaire, http://cern.ch.
2 http://pcatdwww.cern.ch/atlas-point1/ATLASview/ACR.htm.
3 High Energy Latinamerican-European Network, http://www.roma1.infn.it/exp/helen/.

C.5 Automatic Registration Method for the Evaluation of Post Surgical Bone Mineralization, submitted to International Journal of Computer Assisted Radiology and Surgery, Springer Verlag 2010

Abstract: Digital subtraction is a common technique used in radiographic studies of periapical lesions and other dental disorders for which the treatment must be evaluated over time. This paper presents the design, implementation and validation of a fast and reliable image registration Internet service for subtracting two digitized radiographs when an unpredicted mismatch is present. Subtraction is performed after optimal alignment of the radiographs, and this is achieved using a multiresolution optimization strategy and a distributed evolutionary algorithm. The algorithm uses the free CPU cycles of a computational grid, resulting in an application that can process pairs of standard intra-oral radiographs in less than a minute on average. For a set of ten test image pairs, registration results obtained by four experts in the field were compared to those obtained by the automatic system. The validation process shows that no significant differences appear between them (p < 0.01).

C.6 A Service-Oriented Approach to High-Performance Medical Image Processing, submitted to the International Journal of Medical Informatics, Elsevier 2010

Current medical image processing has become a complex mixture of many scientific disciplines including mathematics, statistics, physics, and algorithmics, to perform tasks such as registration, segmentation, and visualization, with the ultimate purpose of helping clinicians in their daily routine. This requires high performance computing capabilities that can be achieved in several ways. This paper presents a space-based computational grid that uses the otherwise wasted CPU cycles of a set of personal computers to provide high-performance medical imaging services over the Internet. By using an existing hardware infrastructure and software of free distribution, the proposed approach is apt for university hospitals and other low-budget institutions. This will be illustrated by the use of two real case studies of services where an important speedup factor has been obtained and whose performance has become suitable for use in real clinical scenarios.

C.7 Distributed Curvature-based Non-rigid Image Registration, submitted to Transactions on Medical Imaging, IEEE 2010

Abstract: The task of image registration is to find an optimal geometric transformation between corresponding image data. The image registration problem can be stated in just a few words: given a reference and a template image, find an appropriate geometric transformation such that the transformed template becomes similar to the reference. In practice, the concrete types of the geometric transformation, as well as the notions of optimal and corresponding, depend on the specific application. In many applications a rigid transformation, i.e., translations and rotations only, is enough to describe the spatial relationship between two images. However, there are many other applications where non-rigid transformations are required to describe this spatial relationship adequately. This paper introduces a novel approach for the parallelization of the non-rigid curvature-based registration algorithm proposed by Fischer and Modersitzki [65] in 2004. The hypothesis assumed for the implementation of the distributed version of the algorithm is based upon the superposition principle, and justified by the fact that the regularizer constrains the displacements to small quantities at each iteration, whereby the actual displacement field can be decomposed as a linear combination of its projections on each 2D plane.

Bibliography

The numbers at the end of each entry list the pages where the reference was cited. In the electronic version, they are clickable links to the pages.

[1] Fidler A, Likar B, Pernus F, and Skaleric U. Influence of developer exhaustion on accuracy of quantitative digital subtraction radiography. Oral Surgery, Oral Medicine, Oral Pathology, Oral Radiology, and Endodontology, 90(2):233–239, 2000. 48

[2] Parashis A. Comparison of two regenerative procedures-guided tissue regeneration and demineralized freeze-dried bone allograft-in the treatment of intrabony defects: A clinical and radiographic study. Journal of Periodontology, 69(7):751–758, 1998. 47, 48

[3] Petersson A, Ekberg EC, and Nilner M. An evaluation of digital subtraction radiography for assessment of changes in position of the mandibular condyle. Dentomaxillofacial Radiology, 27:230–235, 1998. 29

[4] Sarti A, de Solórzano CO, Locket S, and Malladi R. A geometric model for 3-d confocal image analysis. IEEE Transactions on Biomedical Engineering, 47(12):1600–1609, 2000. 82

[5] Tsai A, Yezzi Jr. A, Wells W, Tempany C, Grimson WE, and Willsky A. Coupled multishape model and mutual information for medical image segmentation. Lecture Notes in Computer Science, 2732:185–197, 2003. 82

[6] Tsai A, Yezzi Jr. A, Wells W, Tempany C, Tucker D, Fan A, Grimson WE, and Willsky A. A shape-based approach to the segmentation of medical imagery using level sets. IEEE Transactions on Medical Imaging, 22(2):137–154, 2003. 82

[7] Wenzel A. Effect of manual compared with reference point superimposition on image quality in digital subtraction radiography. Dentomaxillofacial Radiology, 18:145–150, 1989. 48


[8] Yezzi Jr. A, Kichenassamy S, Kumar A, Olver P, and Tannenbaum A. A geometric snake model for segmentation of medical imagery. IEEE Transactions on Medical Imaging, 16(2):199–209, 1997. 82

[9] Zilka A. The Definitive Guide to Terracotta: Cluster the JVM for Spring, Hibernate, and POJO Scalability. Apress Berkeley, 2008. 98

[10] Farag AA, Yamany SM, Nett J, Moriarty T, El-Baz A, Hushek S, and Falk R. Medical Image Registration: Theory, Algorithm, and Case Studies in Surgical Simulation, Chest Cancer, and Multiple Sclerosis, volume 3, chapter 1, pages 1–46. Kluwer Academic/Plenum Publishers, NY, 2005. 4, 51

[11] Advanced Micro Devices, Inc. Aparapi. http://developer.amd.com/zones/java/aparapi/Pages/default.aspx, 2010. 90

[12] Eiben AE and Smith JE. Introduction to Evolutionary Computing. Natural Computing Series. Springer, 2nd. edition, 2007. 57

[13] Gartner AH and Dorn SO. Avances en cirugía endodóntica. Dental Clinics of North America, 36(2):357–378, 1992. 47

[14] Apache. Apache River. http://incubator.apache.org/river, 2007. 23, 62

[15] Fischer B and Modersitzki J. Curvature-based image registration. Journal of Mathematical Imaging and Vision, 18(1):81–85, 2003. 73, 76, 87

[16] Fischer B and Modersitzki J. A unified approach to fast image registration and a new curvature based registration technique. Linear Algebra and its Applications, pages 107–124, 2004. 73

[17] Ghaffary BK and Sawchuk AA. A survey of new techniques for image registration and mapping. In Proceedings of the SPIE: Applications of Digital Image Processing, volume 432, pages 222–239, 1983. 71

[18] Broit C. Optimal registration of deformed images. Ph.D. thesis, Computer and Information Science, University of Pennsylvania, 1981. 76

[19] Canalda C and Brau E. Endodoncia, técnica clínica y bases científicas. Editorial Masson España, 2nd. edition, 2006. 52

[20] Grosan C, Abraham A, and Ishibuchi H. Hybrid Evolutionary Algorithms. Springer, 2007. 73, 90

[21] Studholme C, Hill DLG, and Hawkes DJ. An overlap invariant entropy measure of 3d medical image alignment. Pattern Recognition, 32:71–86, 1999. 72

[22] CERN. Scientific Linux. https://www.scientificlinux.org/, 2010. 98

[23] Papadimitriou CH. Computational Complexity. Addison-Wesley, 1994. 20

[24] Citrix Systems, Inc. Xen Hypervisor. http://www.xen.org/, 2010. 104, 105

[25] Grid Colombia. Grid Colombia: nationwide computing grid using RENATA. http://www.gridcolombia.org/index.php?option=com_content&view=article&id=60&Itemid=77, 2010. 104

[26] Open Science Grid Consortium. Distributed computing grid for data-intensive research. http://www.opensciencegrid.org/, 2010. 104, 105

[27] Gelernter D. Generative communication in Linda. ACM Transactions on Programming Languages and Systems, 7(1):80–112, 1985. 21, 23, 63

[28] Peleg D. Distributed Computing: A Locality-Sensitive Approach. SIAM, 2000. 20

[29] Rueckert D. Non-rigid Registration: Concepts, Algorithms and Applications, chapter 13, pages 281–301. Biomedical Engineering. CRC Press, Florida, FL, 2001. 3, 50, 75

[30] Bader DA and Pennington R. Cluster computing: Applications. The International Journal of High Performance Computing, 15(2):181–185, 2001. 8

[31] Goldberg DA. Genetic algorithms in search, optimization and machine learning. Addison-Wesley Professional, 1989. 35, 36, 57, 59

[32] Dell. Dell PowerEdge Server Benchmarks. http://www.dell.com/ content/topics/topic.aspx/global/products/pedge/topics/ en/benchmarks?c=us&l=en&cs=555, 2010. 105

[33] Van den Elsen PA, Pol EJD, and Viergever MA. Medical image matching - a review with classification. IEEE Engineering in Medicine and Biology, 12:26–39, 1993. 71

[34] Hawkes DJ. Registration Methodology: Introduction, chapter 2, pages 11–38. Biomedical Engineering. CRC Press, Florida, FL, 2001. 2

[35] Hill DLG, Batchelor PG, Holden M, and Hawkes DJ. Medical image registration. Physics in Medicine and Biology, 46:1–45, 2001. 71

[36] Elsen E, Houston M, Vishal V, Darve E, Hanrahan P, and Pande VS. N-body simulation on gpus. SC’06: Proceedings of the 2006 ACM/IEEE conference on Supercomputing, page 188, 2006. 90

[37] Freeman E, Hupfer S, and Arnold K. JavaSpaces Principles, Patterns, and Practice. Prentice Hall PTR, 1999. 23, 24, 63

[38] Ekberg EC, Petersson A, and Nilner M. An evaluation of digital subtraction radiography for assessment of changes in position of the mandibular condyle. Dentomaxillofacial Radiology, 27:230–235, 1998. 48

[39] Johnson EJ. The Complete Guide to Client/Server Computing. Prentice Hall, 2001. 21

[40] Delano EO, Tyndall D, Ludlow JB, Trope M, and Lost C. Quantitative radiographic follow-up of apical surgery: a radiometric and histologic correlation. Journal of Endodontics, 69(7):420–426, 1998. 48

[41] Berman F, Fox G, and Hey AJG. Grid Computing: Making The Global Infrastructure a Reality. Wiley, 2003. 7, 8

[42] Carvalho FB, Gonçalves M, and Tanomaru-Filho M. Evaluation of chronic periapical lesions by digital subtraction radiography by using Adobe Photoshop CS: A technical report. Journal of Endodontics, 33(4):493–497, 2007. 71

[43] Apache Software Foundation. Apache Hadoop. http://opensciencegrid.org/Hadoop_Announcement, 2010. 104, 105

[44] Mañana G, Romero E, and González F. A grid computing approach to subtraction radiography. In IEEE International Conference on Image Processing, pages 3225–3228, 2006. 31, 52

[45] Davis GB and Parker CA. Writing the Doctoral Dissertation, A systematic approach. Barron’s Educational Services, 2nd. edition, 1997. 17

[46] Christensen GE. Deformable shape models for anatomy. Ph.D. thesis, Sever Institute of Technology, University of Washington, 1994. 76

[47] Andrews GR. Foundations of Multithreaded, Parallel, and Distributed Pro- gramming. Addison-Wesley, 2000. 20

[48] Open Science Grid. Grid user management system. https://twiki.grid.iu.edu/bin/view/ReleaseDocumentation/InstallConfigureAndManageGUMS, 2010. 27

[49] Rocks Group. Rocks Clusters. http://www.rocksclusters.org/, 2008. 43

[50] Gröndahl H and Gröndahl K. Subtraction radiography for the diagnosis of periodontal bone lesions. Oral Surgery, Oral Medicine, Oral Pathology, Oral Radiology, and Endodontology, 55:208–213, 1983. 29, 47, 48, 52

[51] Gröndahl H, Gröndahl K, Okano T, and Webber RL. Statistical contrast enhancement of subtraction images for radiographic caries diagnosis. Oral Surgery, Oral Medicine, Oral Pathology, Oral Radiology, and Endodontology, 53:219–223, 1982. 48

[52] Lester H and Arridge SR. A survey of hierarchical non linear medical image registration. Pattern Recognition, 32(1):129–149, 1999. 3, 50, 71

[53] Park H, Bland PH, and Meyer CR. Construction of an abdominal probabilistic atlas and its application in segmentation. IEEE Transactions on Medical Imaging, 22(4):483–492, 2003. 82

[54] Talbi H and Batouche M. Hybrid particle swarm with differential evolution for multimodal image registration. IEEE International Conference on Industrial Technology, pages 1567–1572, 2004. 73, 90

[55] Zhuge H. The Knowledge Grid Methodology, chapter 1, pages 2–32. World Scientific Publishing Co. Pte. Ltd., Bellingham, WA, 2004. 14

[56] Cordón HF, Damas S, and Santamaría J. A chc evolutionary algorithm for 3d image registration. Lecture Notes in Artificial Intelligence, 2715:440–411, 2003. 29

[57] Gómez García HF, González Vega A, Hernández Aguirre A, Marroquín Zaleta JL, and Coello Coello C. Robust multiscale affine 2d-image registration through evolutionary strategies. Lecture Notes in Computer Science, 2439:740–748, 2002. 29

[58] Schwefel HP. Evolution and Optimum Seeking: The Sixth Generation. Wiley-Interscience, New York, NY, 1995. 35, 57

[59] De Falco I, Della Cioppa A, Maisto D, and Tarantino E. Differential evolution as a viable tool for satellite image registration. Applied Soft Computing, 8:1453–1462, 2008. 30

[60] Keidar I. Distributed computing column 32. ACM SIGACT News, 39(4):53–54, 2008. 20

[61] Rechenberg I. Evolutionsstrategie - Optimierung technischer Systeme nach Prinzipien der biologischen Evolution. Frommann-Holzboog, Stuttgart, 1973. 35, 57

[62] Young IT and van Vliet LJ. Recursive implementation of the gaussian filter. Signal Processing, 44:139–151, 1995. 77

[63] Ashburner J, Neelin P, Collins DL, Evans A, and Friston K. Incorporating prior knowledge into image registration. Neuroimage, 6:344–352, 1997. 72

[64] Beutel J, Sonka M, Kundel HL, Fitzpatrick JM, and Van Metter RL. Medical Image Processing and Analysis, volume 2, pages 447–513. SPIE Press, Bellingham, WA, 2000. 4, 51

[65] Modersitzki J. Numerical Methods for Image Registration. Oxford University Press, 2004. 76, 90, 110

[66] Nelder J and Mead RA. A simplex method for function minimization. The Computer Journal, 7(4):308–313, 1965. 6, 30, 56

[67] Samarabandu J, Allen KM, Hausmann E, and Acharya R. Algorithm for the automated alignment of radiographs for image subtraction. Theoretical Computer Science, 77:75–79, 1994. 71

[68] Wu J. Distributed System Design. CRC Press, 1999. 21

[69] Yang J, Staib LH, and Duncan JS. Neighbor-constrained segmentation with a 3d deformable model. Lecture Notes in Computer Science, 2732:198–209, 2003. 82

[70] Ludlow JB and Peleaux CP. Comparison of stent versus laser- and cephalostat-aligned periapical film-positioning techniques for use in digital subtraction radiography. Oral Surgery, Oral Medicine, Oral Pathology, Oral Radiology, and Endodontology, 77(2):208–215, 1994. 48

[71] Maintz JBA and Viergever MA. An overview of medical image registration methods. Symposium of the Belgian Hospital Physicists Association (SBPH/BVZF), 1997. 4, 31

[72] Maintz JBA and Viergever MA. A survey of medical image registration methods. Medical Image Analysis, 2:1–36, 1998. 51, 53, 71

[73] Holland JH. Adaptation in natural and artificial systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence. The MIT Press, Massachusetts, MA, 1992. 6, 35, 57

[74] Bahi JM, Contassot-Vivier S, and Couturier R. Parallel Iterative Algorithms: From Sequential to Grid Computing. Chapman & Hall/CRC, 2008. 35, 57

[75] JBoss. JBoss Application Server. http://www.jboss.org/jbossas, 2008. 25, 98

[76] JBoss. JBoss Application Server. http://www.jboss.org/jbossas, 2008. 104, 105

[77] Hajnal JV. Introduction, chapter 1, pages 1–8. Biomedical Engineering. CRC Press, Florida, FL, 2001. 2

[78] Harrison JW and Jurosky KA. Wound healing in the tissues of the periodontium following periradicular surgery. Journal of Endodontics, 17(9):425–435, 1991. 47

[79] Eastman Kodak Company. Los rayos X en odontología, 1964. 47

[80] Price KV, Storn RM, and Lampinen JA. Differential Evolution: a practical approach to global optimization. Springer Verlag, 2005. 6, 35, 57

[81] Chambers L. The practical handbook of genetic algorithms: Applications. Chapman & Hall/CRC, 2nd. edition, 2000. 35, 57

[82] Davis L. Handbook of genetic algorithms. Chapman & Hall/CRC, 2nd. edition, 2000. 35, 36, 58, 59

[83] Lin L, Gaengler P, and Langeland K. Perirradicular curetaje. International Endodontic Journal, 29(1):220–227, 1996. 47

[84] Brown LG. A survey of image registration techniques. ACM Computing Surveys, 24:326–376, 1992. 71

[85] Eshelman LJ. Real-coded genetic algorithms and interval schemata, volume 2, pages 187–202. Morgan Kaufmann Publishers, Bellingham, WA, 1993. 29

[86] Schmitt LM. Theory of genetic algorithms ii: models for genetic operators over the string-tensor representation of populations and convergence to global optima for arbitrary fitness function under scaling. Theoretical Computer Science, 310:181–231, 2004. 57

[87] Baker M, Buyya R, and Laforenza D. Grids and Grid Technologies for Wide-Area Distributed Computing, volume 1, chapter 20, pages 487–502. John Wiley and Sons, Bellingham, WA, 2002. 8

[88] Bro-Nielsen M and Gramkow C. Fast fluid registration of medical images. Lecture Notes in Computer Science, 1131:267–276, 1996. 76

[89] Gen M and Cheng R. Genetic Algorithms and Engineering Optimization. Wiley-Interscience, 2000. 35, 58

[90] Goodyear M. Enterprise System Architectures: Building Client Server and Web Based Systems. CRC Press, 1999. 21

[91] Lozano M and García-Martínez C. Hybrid metaheuristics with evolutionary algorithms specializing in intensification and diversification: Overview and progress report. Computers & Operations Research, In press, 2009. 73, 90

[92] Powell M. An efficient method for finding the minimum of a function of several variables without calculating derivatives. The Computer Journal, 7(2):155–162, 1964. 6, 30, 56

[93] Unser M. Sampling - 50 years after Shannon. Proceedings IEEE, 88(4):569–587, 2000. 53

[94] Mañana G. unGrid Research Project. http://ungrid.unal.edu.co/, 2003. 9

[95] Sun Microsystems. Jini. http://www.jini.org/, 1998. 14

[96] Sun Microsystems. Sun Grid Engine. http://gridengine.sunsource.net/, 2008. 43

[97] Sun Microsystems. Java binding for the OpenGL API. http://kenai.com/projects/jogl/pages/Home, 2010. 25, 98

[98] Vannier MW, Pilgram TK, Speidel CM, Neumann LR, Rickman DL, and Schertz LD. Validation of magnetic resonance imaging (MRI) multispectral tissue classification. Computerized Medical Imaging and Graphics, 15(4):217–223, 1991. 81

[99] Kovacevic N, Lobaugh NJ, Bronskill MJ, Bronskill B, Bronskill L, Feinstein A, and Black SE. A robust method for extraction and automatic segmentation of brain images. NeuroImage, 17(3):1087–1100, 2002. 81

[100] Lynch NA. Distributed Algorithms. Morgan Kaufmann, 1996. 20

[101] NEMA. Digital Imaging and Communications in Medicine. http://medical.nema.org/, 2010. 83

[102] Langland OE, Langlais RP, and Preece JW. Principles of dental imaging. Lippincott Williams & Wilkins, 2002. 52

[103] University of California. SETI@home. http://setiathome.berkeley.edu/, 1999. 9

[104] National University of Colombia. Bioingenium Research Group. http://www.bioingenium.unal.edu.co, 2010. 97

[105] Oracle. MySQL Community Server 5.1. http://dev.mysql.com/downloads/, 2010. 98

[106] Bishop P and Warren N. JavaSpaces in Practice. Pearson Education, 2002. 21

[107] Viola P and Wells III WM. Alignment by maximization of mutual information. International Journal of Computer Vision, 24:137–154, 1997. 31, 53, 72

[108] Hingston PF, Barone LC, and Michalewicz Z. Design by Evolution, Advances in Evolutionary Design. Natural Computing Series. Springer, 2008. 57

[109] Abramowitz PN, Rankow H, and Trope M. Multidisciplinary approach to apical surgery in conjunction with the loss of buccal cortical plate. Oral Surgery, Oral Medicine, Oral Pathology, Oral Radiology, and Endodontology, 77(5):502–506, 1994. 47

[110] Nummikoski PV, Martínez TS, Matteson SR, McDavid WD, and Dove SB. Digital subtraction radiography in artificial recurrent caries detection. Dentomaxillofacial Radiology, 29(2):59–64, 1992. 48

[111] Steinmetz R and Wehrle K. Peer-to-Peer Systems and Applications. Springer Verlag, 2005. 21

[112] RENATA. Red Nacional de Tecnología Avanzada. http://www.renata.edu.co/, 2010. 104, 105

[113] Woods RP, Cherry SR, and Mazziotta JC. Rapid automated algorithm for aligning and reslicing PET images. Journal of Computer Assisted Tomography, 16:620–633, 1992. 72

[114] Ghosh S. Distributed Systems - An Algorithmic Approach. Chapman & Hall/CRC, 2007. 20

[115] Kirkpatrick S, Gelatt Jr. CD, and Vecchi MP. Optimization by simulated annealing. Science, 220(4598):671–680, 1983. 6

[116] Warfield SK, Zou KH, and Wells WM. Simultaneous truth and performance level estimation (STAPLE): An algorithm for the validation of image segmentation. IEEE Transactions on Medical Imaging, 23(7):903–921, 2004. 83

[117] Lee SS, Huh YJ, Kim KY, Heo MS, Choi SC, Koak JY, Heo SJ, Han CH, and Yi WJ. Development and evaluation of digital subtraction radiography computer program. Oral Surgery, Oral Medicine, Oral Pathology, Oral Radiology, and Endodontology, 98:471–475, 2004. 71

[118] Rohlfing T, Maurer Jr. CR, Bluemke DA, and Jacobs MA. Volume-preserving non-rigid registration of MR breast images using free-form deformation with an incompressibility constraint. IEEE Transactions on Medical Imaging, 22:730–741, 2003. 3

[119] Sterling T and Becker D. Beowulf. http://www.beowulf.org/, 2008. 7

[120] Terracotta. Terracotta. http://www.terracotta.org/, 2010. 25, 90

[121] Bağcı U, Udupa JK, and Bai L. The role of intensity standardization in medical image registration. Pattern Recognition Letters, 31:315–323, 2009. 72

[122] Ruttimann UE, Webber RL, and Schmidt E. A robust digital method for film contrast correction in subtraction radiography. Journal of Periodontal Research, 21:486–495, 1986. 52

[123] Vanderbilt University. Retrospective image registration evaluation project. http://www.insight-journal.org/rire/, 2008. 71

[124] Byrd V, Mayfield-Donahoo T, Reddy M, and Jeffcoat M. Semiautomated image registration for digital subtraction radiography. Oral Surgery, Oral Medicine, Oral Pathology, Oral Radiology, and Endodontology, 85(4):473–478, 1998. 48, 71

[125] Markaki VE, Asvestas PA, and Matsopoulos GK. An iterative point correspondence algorithm for automatic image registration: An application to dental subtraction radiography. Computer Methods and Programs in Biomedicine, 93:61–72, 2009. 71

[126] Press WH, Teukolsky SA, Vetterling WT, and Flannery BP. Numerical Recipes in C, The Art of Scientific Computing. Cambridge University Press, 2nd. edition, 2002. 56

[127] Yi WJ, Heo MS, Lee SS, Choi SC, and Huh KH. ROI-based image registra- tion for digital subtraction radiography. Oral Surgery, Oral Medicine, Oral Pathology, Oral Radiology, and Endodontology, 101:523–529, 2006. 71

[128] Wells III WM, Grimson WEL, Kikinis R, and Jolesz FA. Adaptive segmentation of MRI data. IEEE Transactions on Medical Imaging, 15(4):429–442, 1996. 81

[129] Pennec X, Roche A, Malandain G, and Ayache N. Multimodal image registration by maximization of the correlation ratio. http://hal.archives-ouvertes.fr/, 1998. 31, 54

[130] Shen X, Yu H, Buford J, and Akon M. Handbook of Peer-to-Peer Networking. Springer Verlag, 2009. 21

[131] Yuan X, Zhang J, and Buckles BP. Evolution strategies based image registration via feature matching. Information Fusion, 5:269–282, 2004. 29

[132] Zeng X, Staib LH, Schultz RT, and Duncan JS. Segmentation and measurement of the cortex from 3-d MR images using coupled-surfaces propagation. IEEE Transactions on Medical Imaging, 18(10):927–937, 1999. 82