MRI-BASED IMAGE SEGMENTATION FOR GPU-ACCELERATED FUZZY

METHODS ON GRAPHICS PROCESSING UNITS BY CUDA

A dissertation submitted

to Kent State University in partial

fulfillment of the requirements for the

degree of Doctor of Philosophy

by

Wei-Hung Cheng

December 2018

Dissertation written by

Wei-Hung Cheng

B.B.A., Baylor University, USA, 1998

M.S., Texas A&M University-Commerce, USA, 2002

M.S., Texas A&M University-Commerce, USA, 2003

Ph.D., Kent State University, USA, 2018

Approved by

Dr. Cheng-Chang Lu, Chair, Doctoral Dissertation Committee

Dr. Austin Melton, Jr., Member, Doctoral Dissertation Committee

Dr. Angela Guercio, Member, Doctoral Dissertation Committee

Dr. Jun Li, Member, Doctoral Dissertation Committee

Dr. Robert J. Clements, Member, Doctoral Dissertation Committee

Accepted by

Dr. Javed I. Khan, Chair, Department of Computer Science

Dr. James L. Blank, Dean, College of Arts and Sciences



TABLE OF CONTENTS

TABLE OF CONTENTS ........ iii
LIST OF FIGURES ........ vi
LIST OF TABLES ........ vii
ACKNOWLEDGEMENTS ........ viii

INTRODUCTION
    Image Segmentation
    Parallel Processing
    Research Methodology
    Road Map

BACKGROUND
    Magnetic Resonance Imaging
    Image Segmentation
        The Classical Methods
        The Statistical Methods
        The Neural Networks Methods
        The Fuzzy Clustering Methods

PREVIOUS WORK
    2D Image Segmentation
        Robust Fuzzy Local Information C-means Algorithm
        Fast Generalized FCM Algorithm
        Level Set Method Algorithm
        K-Means Algorithm
    3D Image Segmentation
        Adaptive Spatial FCM Algorithm
        Precision FCM Algorithm
        The Flood Filling Technique Algorithm
        Markov Random Fields Based (MRF) and HMMER's Viterbi Algorithms
        The Quick Shift Image Segmentation Algorithm
        The Convex Relaxation Approach Algorithm
    Summary

MEDICAL IMAGE PROCESSING ON GPU PLATFORM
    Parallel Processing
    Graphic Processor Units
    CUDA Architecture

RESEARCH METHODOLOGY
    The Process of Creating a 3D Model
    Image Segmentation Algorithms
        Fuzzy C-Means Algorithm
        Possibilistic C-Means Algorithm
        Combine Two Algorithms
    Proposed Parallel FCM-based Algorithm
    Discussion

IMPLEMENTATION AND PERFORMANCE EVALUATION
    Implementation for Parallel Segmentation
    Experiments
        Machine Environment
        Dataset Description
        Cluster Validity Functions
    Results
    This Research

CONCLUSIONS AND FUTURE WORK
    Conclusion
    Future Work
        Deep Learning in Medical Image Segmentation
        Convolutional Neural Networks for Volumetric Structure Segmentation
        DCNNs Medical Images Segmentation Challenges
        Approach to the Challenges

LIST OF FIGURES

Figure 4.1: Floating-Point Operations per Second for the CPU and GPU. Data of NVIDIA ...... 28
Figure 4.2: Memory Bandwidth for the CPU and GPU. Data of NVIDIA ...... 29
Figure 4.3: Diagram of Block and Thread Organization in CUDA [41] ...... 32
Figure 4.4: Diagram of Memory Hierarchy in CUDA [41] ...... 32
Figure 5.1: 3D image model for DICOM files ...... 34
Figure 5.2: Loading data from DICOM Flowchart ...... 35
Figure 5.3: Cube shape cases ...... 35
Figure 5.4: Grid size effect on 3D model resolution ...... 35
Figure 6.1: Proof of segmentation accuracy on the side brain image ...... 56
Figure 6.2: CPU and GPU Execution Time of FCM-based Algorithms on the Side Brain Image ...... 57
Figure 6.3: Accuracy of the IT2FCM algorithm ...... 57
Figure 6.4: CPU and GPU Execution Time of FCM-based Algorithms on the Top Brain Image ...... 58
Figure 6.5: Execution Times of IT2FCM on CPU and GPU ...... 59

LIST OF TABLES

Table 6.1: GPU utilization for different numbers of threads, in % [54] ...... 51
Table 6.2: Access times of different memory types [54] ...... 51
Table 6.3: Configuration of CPU and GPU Hardware ...... 53
Table 6.4: Speed-up for GPU Implementations ...... 56
Table 6.5: The performance of FCM-based algorithm comparison on the side brain image ...... 56
Table 6.6: The performance of FCM-based algorithm comparison on the top brain image ...... 57

ACKNOWLEDGEMENTS

I would like to thank my advisor, Professor Dr. Cheng-Chang Lu, for his continuous guidance, support, and encouragement throughout my Ph.D. study. He has provided me not only a collaborative environment to do research, but also the technical knowledge and a rigorous attitude towards that research. I would like to give sincere thanks to Professor

Dr. Austin Melton, Jr., Professor Dr. Angela Guercio, Professor Dr. Jun Li, and Professor

Dr. Robert J. Clements for participating in the committee. Thanks for their constructive suggestions and comments on my research and thesis.

Meanwhile, I want to thank the wonderful members of our research group at the image processing and computer vision lab. They are Yujun Guo, Chi-Hsiang Lo, Fan

Chen, Yufan Liu, and Xinyu Chang. Because of them, the laboratory life became more pleasant and enjoyable. Special thanks for their helpful discussions and friendship.

Most of all I am grateful to my family. I thank my sister and my brother for their love and support. I would like to express my deepest thanks to my mother, for her unwavering support and encouragement. Without their support, this thesis could not be done. I dedicate this thesis to them.

Wei-Hung Cheng

November 16, 2018, Kent, Ohio






Introduction

Medical Image Processing (MIP) has been developing rapidly over the past decades, significantly due to its direct impact on the diagnosis and the treatment of many diseases. Applications used in image acquisition systems such as Magnetic Resonance Imaging (MRI), X-ray Computed Tomography (CT), Ultrasound, and X-ray mammography now scan with higher resolution and improved quality. Applications are also used in nuclear medicine imaging techniques that include Positron Emission Tomography (PET), Single Photon Emission Computed Tomography (SPECT), cardiovascular imaging, and bone scanning. Some of these technologies use magnetic radio waves to capture images [1]. The research of medical image analysis methodologies for image acquisition equipment includes image segmentation, image registration, motion tracking, change detection from image sequences, and the measurement of anatomical and physiological parameters from images.

1.1 Image Segmentation

Image segmentation is the process of segmenting or partitioning a digital image into a set of regions, which is a very important step in many applications of computer vision [2]. In medical image analysis, image segmentation is typically used as an early pre-processing step to identify the different regions representing different types of tissues. It is very useful in extracting the Region of Interest (ROI), which helps in directing the medical



physician’s attention to any abnormalities in the body tissues such as tumors [3].

Moreover, efficient and effective techniques are required to identify the important parts of medical images, such as the Regions of Interest (ROI), which help medical experts to identify the disease. This is achieved via a segmentation operation [4]. Many forms of Computer-Aided Diagnosis (CAD) are based on separating the ROI in order to study it independently [5]. Clustering, region growing, and many other methods can be applied to improve or enhance the result of the segmentation process.

A system for extracting the ROI can be built on two different concepts: supervised machine learning and unsupervised machine learning. In supervised machine learning, the machine is provided with a set of solved instances of the problem at hand, which may for example be classification or regression; both are supervised learning methods, and they differ in whether the set of potential solutions for each instance is discrete or continuous. In unsupervised machine learning, on the other hand, no correct solutions are provided. In many cases, researchers prefer unsupervised learning methods over supervised ones because the supervised methods need time for training. The unsupervised methods segment the ROI directly from the image without inspecting other images [6]. In this research, we are interested in unsupervised segmentation algorithms.

We focus on the Fuzzy C-means (FCM) clustering algorithm, which is one of the most popular 2D image segmentation algorithms for ROI extraction. However, applying the FCM clustering algorithm to segment 3D medical volumes has a major efficiency problem because of its expensive computation time. In many medical


applications, the source of the 3D dataset is an acquisition system such as PET, CT, or MRI. A stack of slices is produced, each of which is a 2D medical image that covers a specific section of the human body scan. All slices are grouped together using MATLAB code to form a 3D matrix which represents the 3D medical volume [7]. 3D volumes can be evaluated either by the volume they occupy or by their contouring edges.

This research develops an algorithm that handles expensive computations such as 2D and 3D medical image segmentation in parallel, in order to improve performance.
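As a concrete reference for the FCM algorithm discussed above, here is a minimal sequential sketch on pixel intensities in Python/NumPy. The function and parameter names are illustrative and are not the dissertation's implementation:

```python
import numpy as np

def fcm(pixels, c=3, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Standard Fuzzy C-means on a flat array of pixel intensities.

    Returns (centers, memberships), where memberships[k, i] is the
    degree to which pixel i belongs to cluster k.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(pixels, dtype=float).ravel()
    u = rng.random((c, x.size))
    u /= u.sum(axis=0)                        # memberships sum to 1 per pixel
    for _ in range(max_iter):
        um = u ** m
        centers = um @ x / um.sum(axis=1)     # fuzzily weighted cluster means
        d = np.abs(x[None, :] - centers[:, None]) + 1e-12
        u_new = d ** (-2.0 / (m - 1))         # u_ki ∝ d_ki^(-2/(m-1))
        u_new /= u_new.sum(axis=0)
        if np.abs(u_new - u).max() < tol:
            u = u_new
            break
        u = u_new
    return centers, u
```

The per-pixel membership update is independent across pixels, which is what later makes the algorithm a good candidate for parallel execution on the GPU.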

1.2 Parallel Processing

The performance of the segmentation step can be the main bottleneck in many applications of medical image analysis. Researchers have invested great effort in studying and enhancing the performance of segmentation algorithms. These efforts range from modifying how the segmentation works to employing different parallel programming technologies [8] [9] [10] [11]. In the field of High-Performance Computing (HPC), the use of the Graphics Processing Unit (GPU) has emerged as a competitive solution for computing massively parallel problems. Many segmentation algorithms for extracting the ROI from medical images have been enhanced using a GPU [12]. The simultaneous execution of the same task on multiple processors to obtain results quickly is known as parallel computing. The main advantage of parallel processing is that tasks can be divided into many sub-tasks and processed by more than one processor simultaneously, which results in a large reduction in execution time. Parallel computing plays an essential

role in many applications involving large amounts of data, such as weather forecasting, data visualization, biology, and engineering [13]. A parallel computing architecture must connect all the tasks involved using distributed or shared memory.

Communication between processors forms the basic foundation of parallel computation.

Moreover, many message-passing libraries have been developed that can initiate and configure the messaging environment as well as send and receive packets of data between processors [14]. One of the parallel architectures is Compute Unified Device Architecture

(CUDA), which was introduced by NVIDIA in 2006. CUDA has its own massively parallel architecture and has contributed to the evolution of the GPU programming model. CUDA is a parallel programming model and instruction set architecture that uses the parallel compute engine in an NVIDIA GPU to handle large computational problems.
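The divide-into-sub-tasks idea can be sketched with a thread pool: the image is split into row chunks and the same "kernel" is applied to every chunk simultaneously. The thresholding kernel and names below are illustrative stand-ins for a real segmentation step (a GPU would instead launch one thread per pixel):

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def segment_chunk(chunk):
    """Per-pixel sub-task: label each intensity against a fixed threshold."""
    return (chunk > 128).astype(np.uint8)

def parallel_segment(image, workers=4):
    """Divide the image into row chunks, process the chunks simultaneously,
    and reassemble the labeled results in order."""
    chunks = np.array_split(image, workers, axis=0)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(segment_chunk, chunks))
    return np.concatenate(parts, axis=0)
```

Because each chunk is independent, the result is identical to the sequential version; only the wall-clock time changes with the number of workers.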

1.3 Research Methodology

Here we present the proposed methodology of our research on a parallel implementation of 3D medical image segmentation. The main approach of our research is as follows: First, the process of creating 3D models for medical imaging is explained. Second, the main algorithms for the FCM-based segmentation process, performed as sequential steps on the volume object, are explained. Then the proposed parallel version of the algorithm for 2D and 3D medical images is presented.

1.4 Road Map

The overall organization of this research is:


In chapter 2: background knowledge of medical image segmentation. Many segmentation methods are computationally expensive when running on the large datasets produced by the medical modalities. Segmentation of image data before or during an operation must be fast and accurate in the clinical environment. Image segmentation in medical imaging is used to segment brain structures, blood vessels, tumors, and bones. This chapter describes the segmentation methods that have been used in the areas of the classical methods, the statistical methods, the neural network methods, and the fuzzy clustering methods.

In chapter 3: a literature survey of previous methods used for medical image segmentation for 2D and 3D, from both theoretical as well as practical aspects.

In chapter 4: medical image processing on GPU platform.

In chapter 5: research methodology. Our research is described as follows: First, the process of creating a 3D model for a medical image is explained. Second, the main algorithms for the segmentation process, as sequential steps on the volume object, are explained. Then the proposed parallel version of the algorithm for medical images is presented.

In chapter 6: implementation details of the parallel algorithms for medical image segmentation, and a discussion of the performance evaluation of the obtained results.

In chapter 7: conclusion and future work.










Background

This chapter provides background knowledge of medical image segmentation. Many segmentation methods are computationally expensive when running on the large datasets produced by the medical modalities. Segmentation of image data before or during an operation must be fast and accurate in the clinical environment. Image segmentation in medical imaging is used to segment brain structures, blood vessels, tumors, and bones. This section describes the segmentation methods that have been used in the areas of thresholding, region growing, morphology, and watershed.

2.1 Magnetic Resonance Imaging

MRI is Magnetic Resonance Imaging, an imaging technique used primarily in medical image processing to produce high-quality images of the inside of the human body [15]. An MR image is a map of the local transverse magnetization of the hydrogen nuclei, which in turn depends on several intrinsic properties of the tissue. Moreover, MRI is based on the fundamental property that protons and neutrons, which make up a nucleus, possess an intrinsic angular momentum called spin. When protons and neutrons combine to form a nucleus, they combine with oppositely oriented spins. Therefore, nuclei with an even number of protons and neutrons have no net spin, whereas nuclei with an odd number of protons or neutrons possess a net spin. Hydrogen nuclei produce a signal since the nucleus is



made up of only a single proton and therefore possesses a net spin. The human body is made up of fat and water, which contain many hydrogen atoms, so medical MR images primarily show the magnetic resonance signal from the hydrogen nuclei in body tissues.

The net spin of the nucleus around its axis gives it an angular momentum. A current loop perpendicular to the rotation axis is also created, and as a result the proton generates a magnetic field, since the proton has a positive charge. The joint effect of the angular momentum and the self-generated magnetic field gives the proton a magnetic dipole moment parallel to the rotation axis. Under normal conditions, no net magnetic field is observed from a volume of tissue, since the magnetic dipole moments are oriented randomly and, on average, cancel one another. A proton, with its magnetic dipole moment, precesses around the field axis when placed in a magnetic field. The frequency of this precession, f0, is the resonant frequency and is called the Larmor frequency. The precession frequency is directly proportional to the strength of the magnetic field.

Equation 2.1 f0 = gB0

B0 is the primary magnetic field strength, and g is a constant called the gyromagnetic ratio, which is different for each nucleus (42.56 MHz/Tesla for protons). The application of the magnetic field B0 creates a net equilibrium magnetization M0 per cubic centimeter, aligned with the B0 field. M0 is the net sum of the magnetic fields due to each of the H nuclei, and it is directly proportional to the local proton density. Moreover, M0 is many orders of magnitude weaker than B0 and cannot be observed directly. By moving M0 away from the B0-field axis with a properly chosen RF pulse having a


frequency equal to the Larmor frequency, a longitudinal magnetization component ML and a transverse magnetization component MT are produced. After the pulse, the longitudinal magnetization component ML recovers to M0 with a relaxation time T1, and the transverse magnetization component MT decays to zero with a relaxation time T2. The protons lose energy by emitting their own RF signal, with amplitude proportional to MT, during relaxation. This signal is the free-induction decay (FID) response signal. T2 indicates the time constant required for the FID response signal from a given tissue type to decay. The FID response signal is measured by an RF coil placed around the object being imaged. The RF pulse is repeated at a predetermined rate when MRI images are acquired. TR, the repetition time, is the period of the RF pulse sequence. The FID response signals can be measured at various times within the TR interval. TE, the echo delay time, is the time between the RF pulse and the measurement of the response signal; it is when the spin echo occurs, due to the refocusing effect of the 180-degree refocusing pulse applied after a delay of TE/2 from the RF pulse. TR and TE control how much the local tissue relaxation times, T1 and T2, affect the signal. The acquired MR image can be made to contrast different tissue types by adjusting TR and TE.
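Equation 2.1 and the T1/T2 relaxation behavior described above can be checked numerically. The sketch below uses the gyromagnetic ratio quoted in the text; the T1 and T2 values are illustrative placeholders, not measurements from this work:

```python
import numpy as np

GAMMA = 42.56  # gyromagnetic ratio for protons, MHz/Tesla (as quoted above)

def larmor_frequency(b0_tesla):
    """Equation 2.1: f0 = g * B0, returned in MHz."""
    return GAMMA * b0_tesla

def longitudinal(t_ms, m0=1.0, t1_ms=900.0):
    """ML recovery toward M0 after the RF pulse: ML(t) = M0 * (1 - exp(-t/T1))."""
    return m0 * (1.0 - np.exp(-t_ms / t1_ms))

def transverse(t_ms, mt0=1.0, t2_ms=100.0):
    """MT decay toward zero: MT(t) = MT(0) * exp(-t/T2)."""
    return mt0 * np.exp(-t_ms / t2_ms)
```

For example, at B0 = 1.5 T this gives f0 = 42.56 × 1.5 ≈ 63.8 MHz; choosing TR relative to T1 and TE relative to T2 determines how strongly each tissue's relaxation weights the measured signal.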

2.2 Image Segmentation

Medical image segmentation has relied on human graphical interaction to define regions, using methods such as manual slice editing, region painting, and interactive thresholding. The different methods of image segmentation are classified into four main categories by Rajapakse as follows [16]:

(1) The classical methods: thresholding, region growing, and edge-based techniques.

(2) The statistical methods (e.g., the maximum-likelihood classifier (MLC)): these methods are basically supervised and depend on a prior model and its parameters. Moreover, this category introduced the use of cues to guide the segmentation; those cues, marked by the user, have the mean and standard deviation as description parameters.

(3) The neural networks methods: the work of Ahmed et al. [17] uses a two-neural-network system for CT/MRI image segmentation. The first stage is a self-organized principal component analysis network, and the second stage consists of a self-organizing feature map. The outcomes obtained compare favorably with those of the classical and statistical methods.

(4) The fuzzy clustering methods: methods using the unsupervised fuzzy c-means algorithm. One paper contains a comparison between the fuzzy clustering and neural network techniques in segmenting MRI images of the brain, and a discussion of the need for unsupervised techniques in segmentation [18].

2.2.1 The Classical Methods

One of the classical methods for medical image segmentation is thresholding.

Thresholding methods have been demonstrated to be effective in constrained processing environments with predictable images. The fundamental principle of thresholding is based on the characteristics of the image: for image segmentation, each pixel or voxel is classified against different threshold values for different objects. The threshold technique can be expressed as:

Equation 2.2: T = T[x, y, p(x, y), f(x, y)]

Equation 2.3: g(x, y) = 1 if f(x, y) > T; g(x, y) = 0 if f(x, y) ≤ T

where T is the threshold value, x and y are the coordinates of the threshold point, and p(x, y) and f(x, y) are the gray-level values of the image pixels. The thresholded image is defined as g(x, y) in Equation 2.3 [19].

Thresholding techniques are classified into two groups: global thresholding and local thresholding. Global thresholding can be categorized as traditional, iterative, or multistage. It is used when the differences between foreground and background are so distinct that a single threshold value can differentiate both objects. This differentiation depends on the properties of the pixels and the gray-level values of the image. One popular global thresholding technique is Otsu's method, a traditional thresholding technique. Otsu's method is used as a pre-processing step to segment an image for feature analysis and quantification [20].
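As an illustration of global thresholding, the following is a compact NumPy sketch of Otsu's method for 8-bit gray levels (an assumed implementation, not code from [20]):

```python
import numpy as np

def otsu_threshold(image, levels=256):
    """Find the global threshold that maximizes between-class variance."""
    hist, _ = np.histogram(image, bins=levels, range=(0, levels))
    p = hist / hist.sum()                      # gray-level probabilities
    omega = np.cumsum(p)                       # class-0 probability per candidate t
    mu = np.cumsum(p * np.arange(levels))      # cumulative first moment
    mu_t = mu[-1]                              # global mean gray level
    # Between-class variance for every candidate threshold t.
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1 - omega))
    sigma_b = np.nan_to_num(sigma_b)           # undefined candidates score 0
    return int(np.argmax(sigma_b))

def threshold(image, t):
    """g(x, y) = 1 if f(x, y) > t, else 0 (Equation 2.3)."""
    return (image > t).astype(np.uint8)
```

On a bimodal image the returned threshold falls between the two intensity peaks, so `threshold` separates foreground from background with a single value, exactly the global-thresholding scenario described above.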

2.2.2 The Statistical Methods

The statistical methods for medical image segmentation have two major approaches: edge-based and region-based. Edge-based approaches process an image by detecting and linking edge pixels to form contours, while region-based approaches perform a similar function based on a desired property within a neighborhood of pixels. The edge-based approach is also called edge detection or boundary detection. Edge-based methods can be preferable because the algorithms are usually less complex, and edges are important features for separating regions in an image [21]. Region-based approaches process an image by looking for uniform properties, such as intensity, color, and texture, inside a sub-region. Region-based methods cover more pixels than edge-based methods and thus have more information available to characterize a region. For instance, a texture property can be used for detecting a region, which is not easy when dealing with edges. Furthermore, region growing methods are generally better in noisy images where edges are difficult to detect. Therefore, region-based methods are more robust in the segmentation process than edge-based methods [21].

2.2.3 The Neural Networks Methods

In recent years, the neural network methods have become more popular as a different approach to medical image segmentation. Neural network image segmentation methods can be divided into two groups: supervised and unsupervised methods [22].

A supervised method requires input from a human expert, who selects the training data used to segment the images. The general procedure of a supervised method is [23]: First, the expert selects training images and manually segments each training image into a number of proper sub-regions. Second, the method uses the training image data to build up the proposed architecture. Third, the method labels those sub-regions according to the training image data to segment the image, then completes the segmentation process. The deep learning neural network is one of the supervised methods for segmentation and a growing trend in medical image segmentation, for example the convolutional neural network [24].

An unsupervised method automatically partitions the image without human intervention, although user intervention might still be necessary at some point in the process to improve the performance of the segmentation method. The general procedure of an unsupervised method is [23]: First, the method segments the input image data into a number of proper sub-regions automatically. Second, the method assigns labels to those sub-regions, then completes the segmentation process. The unsupervised methods for medical image segmentation are cluster-based, which do not depend on training data or a training process. The commonly used cluster-based algorithms are K-means, C-means, and Fuzzy C-means [22].

2.2.4 The Fuzzy Clustering Methods

Most fuzzy clustering methods are unsupervised algorithms, which have been successfully applied to medical image segmentation [3]. An image has various feature spaces, and clustering is the process of classifying the image by finding the natural groups of data points in the multidimensional feature space. The standard fuzzy clustering method has some limitations: sensitivity to the initial matrix, wrongly classified noisy pixels, and solutions that get stuck at local minima [25]. In the past two decades, researchers have applied the fuzzy clustering methods to medical image segmentation and contributed improvements such as fuzzy c-means algorithms with different modifications of the membership functions. Because the standard FCM algorithm does not fully utilize spatial information, researchers took advantage of classified information to improve the optimization procedure of the conventional FCM algorithm [26]. Other researchers modified the classical FCM algorithm to improve its results on noisy images [27].










Previous Work

Many researchers focus on improving the accuracy of the segmentation process because of its importance in computer-aided diagnosis (CAD) systems. However, the improvement in accuracy might come at the cost of increased execution time. Furthermore, other researchers work to reduce this effect by utilizing parallel programming technologies. This chapter discusses the prior work on medical image segmentation, both 2D and 3D, from theoretical as well as practical aspects, and on improving the performance of segmentation techniques by utilizing the GPU.

3.1 2D Image Segmentation

In many medical image processing applications, segmentation is one of the early processing steps executed after acquiring images from medical scanners. It is the process of segmenting or partitioning a digital image into a set of classes or regions, which is a vital step for many computer vision tasks [28]. Image segmentation groups the image points into separate regions corresponding to logically connected objects within the image [2]. It is typically defined as identifying the set of voxels that make up either the contour or the interior of the object of interest. In medical images, it is used as a pre-processing step to separate the different regions representing different types of tissues. It can be useful at the beginning of extracting the region of interest (ROI), which helps in



directing the physician's attention to any abnormalities in the body tissues, such as tumors [1] [29]. The Fuzzy C-Means segmentation algorithm is one of the most popular medical image segmentation algorithms [30]. It has been shown to achieve high accuracy on medical image segmentation; however, its main drawback is a long computation time. Many researchers have worked on improving the accuracy of FCM-based segmentation. We will discuss some of the efforts on FCM-based segmentation of 2D images and highlight some of the research on improving the performance of FCM-based algorithms.

3.1.1 Robust Fuzzy Local Information C-means algorithm

In 2010, researchers improved the accuracy of the FCM algorithm for image segmentation so it could handle images containing noise [31]. This improvement was made by adding a spatial membership property to the original FCM for image clustering; the result was called the robust Fuzzy Local Information C-means (FLICM) algorithm. The major characteristic of FLICM is the use of a fuzzy local similarity measure (based on spatial and gray-level information), aiming to guarantee noise insensitiveness and image detail preservation. This is achieved by incorporating local spatial and gray-level information.

FLICM introduces a new factor Gki as a local (spatial and gray-level) similarity measure, which aims to guarantee robustness to both noise and outliers. The algorithm is relatively insensitive to the type of the added noise, and therefore, without prior knowledge of the noise, FLICM is a good choice for clustering. This is also enforced by the way spatial and gray-level image information are combined in the algorithm: the factor Gki

combines in a fuzzy manner the spatial and gray-level information, rendering the algorithm more robust to all kinds of noise and outliers. Moreover, the other fuzzy c-means algorithms for image clustering exploit, in their objective functions, a crucial parameter a (or λ), which is used to balance robustness against the effectiveness of ignoring the added noise; this parameter is mainly determined empirically or by trial and error. FLICM is completely free of any parameter determination: the balance between noise suppression and image detail is achieved automatically by the fuzzy local constraints, concurrently enhancing the clustering performance. FLICM is also applied directly to the original image, whereas the other methods perform the clustering on a precomputed image.

Their experiments, performed on synthetic and real-world images, show that the FLICM algorithm is effective and efficient, providing robustness to noisy images. The experiments tested color and gray images with Gaussian noise of 30%, and the results showed that FLICM performs better than FCM.

3.1.2 Fast Generalized FCM algorithm

Another proposed improvement of the FCM algorithm reduces the effect of noise in an image; it is called the Fast Generalized FCM (FGFCM) algorithm [8]. It focuses on the objective function calculations of the FCM with spatial local information (FCM_S) algorithm, which mitigates the effects of noise. Moreover, it targets execution time by trying to run with the same complexity as the FCM_S algorithm. The major characteristics of

FGFCM are: (1) the use of a new factor Sij as a local (both spatial and gray-level) similarity measure, aiming to guarantee both noise immunity and detail preservation for an image, while removing the empirically adjusted parameter α; and (2) fast clustering or segmentation of the image, with the segmentation time dependent only on the number of gray levels q rather than the size N (⪢ q) of the image; consequently, its computational complexity is reduced from O(NcI1) to O(qcI2), where c is the number of clusters, and I1 and I2 are the numbers of iterations in the standard FCM and in this fast segmentation method, respectively.

The proposed algorithm was tested with many types of noise, such as Gaussian and salt & pepper, with mixed noise at different noise levels for each type. The experiments showed that FGFCM has almost perfect outcomes, with around 99% accuracy for the segmentation process.
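The O(NcI1)-to-O(qcI2) reduction can be illustrated by computing a center update over the q gray levels of the histogram instead of the N raw pixels, with each level weighted by its pixel count. This is a sketch of the idea only, not the published FGFCM code:

```python
import numpy as np

def histogram_center_update(image, u_levels, m=2.0, levels=256):
    """One FCM-style center update over q gray levels instead of N pixels.

    u_levels has shape (c, levels): one membership value per gray level,
    so the cost of the update depends on q = levels, not on the image size N.
    """
    hist, _ = np.histogram(image, bins=levels, range=(0, levels))
    gray = np.arange(levels, dtype=float)
    um = (u_levels ** m) * hist          # weight each level by its pixel count
    return um @ gray / um.sum(axis=1)    # weighted mean per cluster
```

With memberships defined per gray level, doubling the image size changes only the histogram counts, not the amount of work per update.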

3.1.3 Level Set Method Algorithm

A level set method for image segmentation is a technique used in a region-based method with intensity inhomogeneity to segment the ROI [32]. The most widely used image segmentation algorithms are region-based and typically rely on the homogeneity of the image intensities in the regions of interest, so they often fail to provide accurate segmentation results when intensity inhomogeneity is present. This algorithm deals with intensity inhomogeneities in the segmentation. First, based on a model of images with intensity inhomogeneities, the authors derive a local intensity clustering property of the image intensities and define a local clustering criterion function for the image intensities in a neighborhood of each point. This local clustering criterion function is then integrated with respect to the neighborhood center to give a global criterion of image segmentation. In a

level set formulation, this criterion defines an energy in terms of the level set functions that represent a partition of the image domain and a bias field that accounts for the intensity inhomogeneity of the image. Therefore, by minimizing this energy, their method can simultaneously segment the image and estimate the bias field, and the estimated bias field can be used for intensity inhomogeneity correction (or bias correction).

Their method has been validated on synthetic images and real images of various modalities, with desirable performance in the presence of intensity inhomogeneities. Their experiment results have demonstrated the superior performance of their method in terms of accuracy, efficiency, and robustness. As an application, their method has been applied to MR image segmentation and bias correction with promising results.

3.1.4 K-Means Algorithm

An improved image segmentation technique for medical images uses the K-Means algorithm [33]. The conventional Watershed algorithm is widely used for medical image analysis because of its advantages, such as always producing a complete division of the image; however, its drawbacks include over-segmentation and sensitivity to false edges. This algorithm addresses the drawbacks of the conventional Watershed algorithm on medical images by using K-means clustering to produce a primary segmentation of the image before applying the improved Watershed segmentation algorithm.

K-means clustering is an unsupervised learning algorithm, while the improved Watershed segmentation algorithm makes use of automated thresholding on the gradient magnitude map and post-segmentation merging of the initial partitions to reduce the number of false edges and the over-segmentation.

The experiment compared the number of partitions in the segmentation maps of 50 images. The results showed that the proposed methodology produced segmentation maps with 92% fewer partitions than those produced by the conventional Watershed algorithm. By reducing the amount of over-segmentation, the algorithm obtains a segmentation map that is more representative of the various anatomies in the medical image.
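As a concrete illustration of the pre-segmentation step, the sketch below clusters grayscale intensities with a minimal 1D k-means (pure Python; the intensity values and the choice of three clusters are invented for the example and are not taken from [33]):

```python
# Minimal 1D k-means over grayscale intensities (illustrative sketch only;
# the cited work applies k-means before an improved Watershed transform).
def kmeans_1d(values, centers, iters=50):
    for _ in range(iters):
        # Assignment step: each value joins its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            j = min(range(len(centers)), key=lambda c: abs(v - centers[c]))
            clusters[j].append(v)
        # Update step: each center moves to the mean of its cluster.
        new_centers = [sum(c) / len(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:      # converged
            break
        centers = new_centers
    return centers, clusters

# Hypothetical intensities: dark background, mid-gray tissue, bright lesion.
pixels = [12, 15, 10, 120, 130, 125, 240, 250, 245]
centers, clusters = kmeans_1d(pixels, centers=[0.0, 128.0, 255.0])
print(centers)  # three intensity classes usable as a primary segmentation
```

The three resulting cluster labels play the role of the primary segmentation that the improved Watershed algorithm then refines.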

3.2 3D Image Segmentation

The segmentation of organs and other substructures in medical images allows quantitative analysis of clinical parameters related to volume and shape such as in cardiac or brain analysis. It is often an important initial step in computer-aided detection pipelines.

Medical image segmentation is typically defined as identifying the set of voxels that make up either the contour or the interior of the objects of interest. Most current research on segmentation applies deep learning to medical images, and deep learning methods have also seen the widest variety in methodology [34]. The detection of objects of interest or lesions in medical images is a major part of diagnosis and one of the most labor-intensive tasks for clinicians. Commonly, the task consists of the localization and identification of small lesions in the full 3D image space. CAD system research has a long tradition of automatically detecting lesions, improving detection accuracy, and decreasing the reading time of human experts [34]. In medical applications, the source of a 3D dataset is an acquisition system such as PET, CT, or MRI, which produces a stack of 2D image slices covering a specific section of the human body. All slices are grouped together, for example using MATLAB code [7], to build a 3D matrix that represents the 3D medical volume. This volume can be evaluated either by the space the objects occupy or by their contouring edges. Several 3D medical image segmentation algorithms from the literature are discussed below.
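The slice-stacking step can be sketched as follows (a minimal pure-Python illustration of building a volume indexed by slice, row, and column; the tiny 2x2 "slices" are invented for the example, whereas real pixel data would come from a DICOM reader):

```python
# Stack 2D slices (row-major lists of lists) into a 3D volume indexed as
# volume[z][y][x]: z = slice number, y = row (height), x = column (width).
def build_volume(slices):
    height = len(slices[0])
    width = len(slices[0][0])
    for s in slices:                       # all slices must share one shape
        assert len(s) == height and all(len(row) == width for row in s)
    return [[[s[y][x] for x in range(width)]
             for y in range(height)]
            for s in slices]

# Two hypothetical 2x2 grayscale slices standing in for DICOM images.
slice0 = [[10, 20],
          [30, 40]]
slice1 = [[50, 60],
          [70, 80]]
volume = build_volume([slice0, slice1])
print(volume[1][0][1])  # slice 1, row 0, column 1 -> 60
```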

3.2.1 Adaptive Spatial FCM Algorithm

An adaptive spatial FCM clustering algorithm for 3D MR images was introduced in [35]. This algorithm incorporates a local spatial continuity constraint as well as suppression of the INU artifact in 3D MR images. It uses a novel dissimilarity index that considers the local influence of neighboring voxels in an adaptive manner: if the neighborhood window is in a nonhomogeneous region, the influence of the neighboring voxels on the center voxel is suppressed; otherwise, the center voxel is smoothed by its neighbors during membership and cluster centroid computation. To suppress the INU artifact, a multiplicative MR image formation model is used. The estimation of the 3D multiplicative bias field is formulated as the estimation of a stack of 2D smoothing spline surfaces, with continuity enforced across slices, minimizing the 3D residual signal between the actual data and a piecewise constant FCM solution. The spline coefficients are obtained by a computationally efficient two-stage algorithm.


Their experimental results and comparative studies with several existing methods, on the segmentation of both simulated and real MR brain images, illustrate the effectiveness and robustness of the algorithm.

3.2.2 Precision FCM Algorithm

The authors of [6] chose a hybrid segmentation algorithm based on Precision FCM and neural network algorithms. This approach has been shown to act as a high-performance neural network classifier; however, FCM is difficult to apply to a large data set, which restricts its practical use. In their study, a parallel Precision FCM method is proposed to speed up FCM using the Compute Unified Device Architecture, especially for large data sets such as the ROI from 3D medical images. To improve performance, they implemented the code in the CUDA programming language, which is designed to take advantage of NVIDIA GPU hardware.

This method performs all stages as a batch in one block. Blocks and threads are responsible for evaluating classifiers and performing subtasks, respectively.

Their experimental results indicate that the speed and accuracy are improved by employing this novel approach. The improvement ratio in the worst case is 47 while in the best case it is 94.

3.2.3 The Flood Filling Technique Algorithm

The researchers used the GPU to improve the performance of a segmentation algorithm based on the flood filling technique [36]. While level sets have demonstrated great potential for 3D medical image segmentation, their usefulness has been limited by two problems. First, 3D level sets are relatively slow to compute. Second, their formulation usually entails several free parameters that can be very difficult to tune correctly for specific applications. This work presents a tool for 3D segmentation that relies on level-set surface models computed at interactive rates on commodity graphics cards

(GPUs). The interactive rates for solving the level-set PDE give the user immediate feedback on the parameter settings, and thus users can tune three separate parameters and control the shape of the model in real time. They have found that this interactivity enables users to produce good, reliable segmentation, as supported by qualitative and quantitative results.

A careful implementation of a sparse level-set solver on a GPU provides a new tool for interactive 3D segmentation. Users can manipulate several parameters to find a set of values that are appropriate for a segmentation task. In their experiments, the quantitative results of using this tool for brain tumor segmentation suggest that it compares well with hand contouring and state-of-the-art automated methods. However, the tool as built and tested is quite general, and it has no hidden parameters. Thus, the same tool can be used to segment other anatomy. The current limitations are mostly in the speed function and the interface. The speed function used in this algorithm is quite simple and easily extended, within the current framework, to include image edges, more complicated greyscale profiles, and vector-valued data.


3.2.4 Markov Random Fields Based (MRF) and HMMER’s Viterbi Algorithms

The authors of [37] improve the performance of two algorithms, Markov random fields (MRF) based segmentation and HMMER's Viterbi algorithm, by parallelizing them on the GPU. A comparison of the speedup from their GPU implementations of MRF and P7Viterbi/hmmsearch shows that significantly higher speedup is achieved in the MRF implementation, and they examine the major factors behind the difference. MRF's major advantage over hmmsearch is that it makes very few reads from the GPU's global memory: at each iteration, MRF accesses global memory only twice. While hmmsearch also reads global/texture memory at least twice per iteration, it additionally writes three values to global memory per iteration. Since this loop is repeated over the entire length of the sequence, P7Viterbi, and hence hmmsearch, spends a large portion of its run time accessing global memory. Memory coalescing and the use of constant memory proved more effective in HMMER than in the MRF implementation because of hmmsearch's repeated global memory accesses; this was unsurprising, considering MRF's limited use of global memory. Loop unrolling proved more effective in P7Viterbi than in MRF: the Viterbi algorithm has a limited number of variables in its core loop and lends itself nicely to unrolling the inner loop, whereas MRF requires more variables in its inner loop, so unrolling those iterations increases register usage for temporary variables and reduces performance. The MRF code ultimately proved well suited for acceleration on GPUs. Due to the architectural requirements of the NVIDIA GPU, any thread participating in a warp executes the same instruction simultaneously, which essentially turns the GPU into a large SIMD processor. MRF is a natural fit for such architectures because its inner loop is relatively free of branches, with each thread operating on the same set of images at the same time. While HMMER also demonstrated exceptional speedup, it includes more branching than the MRF code and requires more I/O from global memory, which ultimately prevents it from reaching the level of speedup achieved by MRF.

In their experiment, they compare the performance of two statistics-based applications: MRF-based liver segmentation and HMMER's hmmsearch database search tool. The unique characteristics of both algorithms are demonstrated by implementations on an NVIDIA 8800 GTX Ultra using the CUDA programming environment. The results demonstrated excellent performance improvement on the GPU, with MRF exhibiting a speedup of over 130x compared to serial execution and hmmsearch outperforming all known HMMER GPU implementations with a speedup of 38.6x. As they show, significant effort is required to properly leverage a GPU for general-purpose computing: algorithms must properly target the GPU to achieve performance improvements, with attention to the occupancy of the GPU kernel, loop unrolling, proper shared and constant memory usage, and, most importantly, memory coalescing.

3.2.5 The Quick Shift Image Segmentation Algorithm

This work uses GPU capability to decrease the execution time of quick shift image segmentation [38]. The algorithm uses a Gaussian kernel to calculate the similarity between pixels and spatial memberships. Variants of the implementation that use global memory and texture caching are presented, and the results show that a method backed by texture caching can produce a 10 to 50 times speedup for practical images, making computation of super-pixels possible at 5-10 Hz on modestly sized (256x256) images.

In their experiments, they show a GPU implementation of quick shift that provides a 10 to 50 times speedup over the CPU implementation, resulting in a super-pixelization algorithm that can run at 10 Hz on 256x256 images. The implementation is an exact copy of quick shift and could be sped up further by approximating the density, via subsampling or other methods. The same implementation strategy would likely provide similar speedups for exact mean shift.

3.2.6 The Convex Relaxation Approach Algorithm

An improved convex relaxation approach for image segmentation utilizing the GPU is presented in [39]. Convex relaxation techniques have become a popular approach to image segmentation because they allow solutions to a variety of segmentation problems to be computed independent of initialization. This algorithm uses moment constraints in a convex shape optimization framework. The researchers showed that for an entire family of constraints on the area, the centroid, the covariance structure of the shape, and respective higher-order moments, the feasible constraint sets are all convex. While they cannot guarantee global optimality of the resulting segmentations, all computed solutions are independent of initialization and within a known bound of the optimum.


In both qualitative and quantitative experiments on interactive image segmentation, they demonstrated that the respective moment constraints are easily imposed by the user and lead to drastic improvements in the segmentation results, reducing the average segmentation error from 12% to 0.35%. In contrast to existing work on shape priors in segmentation, the use of low-order moment constraints does not require shape learning and is easily applied to arbitrary shapes, since the recovery of fine-scale shape details is not affected by the moment constraints. Efficient GPU-accelerated PDE solvers allow computation times of about one second for images of size 300×400, making this a practical tool for interactive image segmentation.

3.3 Summary

This chapter has presented previous work on 2D and 3D medical image segmentation aimed at improving accuracy and performance. Research that focuses on the accuracy of segmentation usually incurs an increase in execution time. Other researchers focus on improving performance by reducing running time without affecting the accuracy of the original algorithm, especially for 3D medical images. Some of these algorithms are implemented with parallel technologies to decrease the execution time of 2D and 3D segmentation in the GPU environment while preserving the accuracy of the results.










Medical Image Processing on GPU Platform

4.1 Parallel Processing

Parallel computing is the simultaneous execution of the same task on multiple processors in order to obtain results quickly [ ]. The main advantage is that parallel tasks can be divided into many subtasks and processed by more than one processor simultaneously, resulting in a significant reduction in execution time. Parallel computing plays an essential role in many applications involving large amounts of data, such as weather forecasting, data visualization, biology, and engineering [ ]. The architecture of a parallel environment is required to connect all the nodes involved, using shared or distributed memory. Communication between processors forms the basis of parallel computation, and many message-passing libraries have been developed that can initiate and configure the messaging environment as well as send and receive packets of data between processors [ ]. One of the parallel architectures is the Compute Unified Device Architecture (CUDA), introduced by NVIDIA in 2006. CUDA provides its own massively parallel architecture and marked an evolution in the GPU programming model: it is a parallel programming model whose instruction set architecture uses the parallel computing engine of the NVIDIA GPU to handle large computational problems.
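The divide-into-subtasks idea can be illustrated with a small sketch (pure Python; threads are used here only to show the decomposition pattern, since the speedups discussed in this chapter come from GPU hardware rather than from this toy example):

```python
import threading

# Split one task (summing a large list) into subtasks that run concurrently,
# then combine the partial results - the basic pattern of parallel computing.
def parallel_sum(data, workers=4):
    chunk = (len(data) + workers - 1) // workers
    partials = [0] * workers

    def worker(i):
        partials[i] = sum(data[i * chunk:(i + 1) * chunk])

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()          # wait for every subtask before combining
    return sum(partials)

data = list(range(1_000_000))
print(parallel_sum(data) == sum(data))  # True: same result as the serial sum
```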





4.2 Graphic Processor Units

Over the last decade, the computing capacity of graphics processing units (GPUs) has improved exponentially in several dimensions, making GPU-accelerated computation a viable option for many applications. Figure 4.1 and Figure 4.2 show the rapid growth in GPU processing power compared to the CPU [41]. The design of the GPU architecture provides higher performance for many vision algorithms than the CPU. When NVIDIA released the CUDA programming model for the GPU, it gave the medical image research community new tools to improve their research, and many studies of medical image processing on the GPU platform with CUDA show a bright future for medical image research.

Figure 4.1: Floating-Point Operations per Second for the CPU and GPU (theoretical GFLOP/s at base clock; NVIDIA GPU vs. Intel CPU, single and double precision, 2003-2016). Data from NVIDIA.


Figure 4.2: Memory Bandwidth for the CPU and GPU (theoretical peak GB/s; GeForce and Tesla GPUs vs. Intel CPUs, 2003-2015). Data from NVIDIA.

4.3 CUDA Architecture

The GPU platform is gaining popularity for computation-intensive tasks and is well suited to data-parallel applications. CUDA (Compute Unified Device Architecture), developed by NVIDIA, is a parallel programming model and software environment designed to handle parallel computing tasks. Its major abstractions are a hierarchy of thread groups, shared memories, and barrier synchronization. These abstractions provide a programming model for data parallelism, thread parallelism, and task parallelism. CUDA presents the GPU programmer with a structure consisting of a collection of threads that run in parallel, similar to the traditional single instruction, multiple data (SIMD) parallel model. The computation paradigm of CUDA is described below in terms of its programming hierarchy and memory hierarchy.


• Programming hierarchical structure:

The CUDA programming hierarchy has three basic parts that help the programmer effectively utilize the full computational capability of the graphics card in the system [41]. CUDA splits the hierarchy into grids, blocks, and threads, as shown in Figure 4.3. A CUDA program is composed of a number of grids in one GPU, a number of blocks in one grid, and a number of threads in one block; this hierarchical architecture provides the parallelism for the programmer to execute large numbers of parallel threads. A grid is a set of blocks, each containing a number of threads all running the same kernel. A block is a group of threads mapped to a single multiprocessor so they can share its memory; blocks are not shared between multiprocessors. The built-in variable blockIdx identifies the block, and depending on the grid dimension, block IDs can be 1D or 2D; a grid can contain up to 65,535 blocks per dimension. Threads run on the individual cores of the multiprocessors, but grids and blocks are not restricted to a single core. Each thread has an ID, threadIdx, which can be 1D, 2D, or 3D depending on the block dimension, and each thread has its own register memory. A block can contain up to 1,024 threads.
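In a CUDA kernel, each thread typically combines these IDs into a unique global index (for example, blockIdx.x * blockDim.x + threadIdx.x in a 1D launch). The snippet below simulates that computation in plain Python to show how the hierarchy covers a data array; the grid and block sizes are arbitrary example values:

```python
# Simulate CUDA's 1D global thread indexing on the host:
# global_id = blockIdx.x * blockDim.x + threadIdx.x
def global_ids(grid_dim, block_dim):
    ids = []
    for block_idx in range(grid_dim):        # one entry per block in the grid
        for thread_idx in range(block_dim):  # one entry per thread in the block
            ids.append(block_idx * block_dim + thread_idx)
    return ids

n = 10                                        # elements to process
block_dim = 4                                 # threads per block (up to 1,024)
grid_dim = (n + block_dim - 1) // block_dim   # blocks needed to cover n
ids = global_ids(grid_dim, block_dim)
active = [i for i in ids if i < n]            # threads beyond n are masked off
print(grid_dim, active)                       # 3 blocks; ids 0..9 are active
```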

• Memory hierarchical structure:

CUDA device memory is organized into multiple memories for thread execution, as shown in Figure 4.4. The memory hierarchy in the CUDA programming environment consists of registers, local memory, shared memory, constant memory, texture memory, and global memory.


Register memory: the fastest level in the hierarchy, but with only a limited amount of space. Each thread is given its own set of registers, used for fast storage and retrieval of frequently accessed data such as loop counters.

Local memory: slow and uncached, though it allows automatically coalesced reads and writes. It holds any per-thread data that does not fit into registers.

Shared memory: commonly used by all threads in a block for read and write operations. Its size is smaller than that of global memory, and the amount of shared memory used determines how many threads can execute simultaneously in a block, i.e., the occupancy within a block.

Constant memory: a subset of device memory that is shared by the processors and cannot be modified at run time by the device. It provides read-only storage for constants and kernel arguments; the underlying memory is slow, but accesses are cached.

Texture memory: a subset of device memory that is read-only on the device; it provides faster cached reads and allows addressing through a specialized texture unit. Its cache is optimized for 2D spatial access patterns, and it is used by applications such as visualization.

Global memory: permits read and write operations from all threads, but it is uncached and has long latencies; it requires sequential, aligned 16-byte reads and writes to be fast.


Figure 4.3: Diagram of Block and Thread Organization in CUDA [41]

Figure 4.4: Diagram of Memory Hierarchy in CUDA [41]










Research Methodology

In this chapter, I present the proposed methodology of our research on a parallel implementation of 2D and 3D medical image segmentation. The main approach of our research is as follows. First, the process of creating a 3D model for medical images is explained. Second, the main algorithms of the segmentation process, as sequential steps over the volume object, are explained. Then the proposed parallel version of the algorithm for the medical image is presented.

5.1 The process of creating a 3D model

Many works on FCM used 2D image data [ ]. Medical data are used as input data files, represented as DICOM files. The image data is extracted using a DICOM reader from the Fellow Oak DICOM library [ ]. The data points here are the pixel values (red, green, blue, and alpha). The type of medical image is CT, so the image is gray; this reduced the data size in our experiment. We work in 3D, where the X axis is the width of the image, the Y axis is the height of the image, and the Z axis is the number of slices; see Figure 5.1. The structure of loading data from DICOM files can be seen in Figure 5.2. The loading process begins by loading all DICOM files and ordering them in ascending order by their names. Each DICOM file carries image data, and the reader retrieves the image pixels and represents them as a 2D image. The second image is located behind the first one, and so on for the rest of the files. After all of the image data is processed, the 3D model is built from all those slices using an image reconstruction process. Image reconstruction is an image processing technique used to build, from 2D images taken at different angles, a 3D model of a single object. Next, the marching cubes algorithm is used to build the 3D model of the ROI. Each pixel in the algorithm is represented as a cube with vertex indexes ranging from 0 to 7, while each edge has indexes ranging from 0 to 11. The Marching Cubes algorithm uses the divide-and-conquer technique to check the cube shape [ ]. In this algorithm, the cube represents the pixel unit in a 3D system, which takes one of the characteristic cube cases; see Figure 5.3. These cases depend on the pixel location and state. All cases of a cube can be drawn as a 3D unit for each pixel in 3D. The resolution of the image is set by the grid size: when the grid size is smaller, the resolution is smoother, as in the image shown in Figure 5.4.
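The per-cube case lookup at the heart of marching cubes can be sketched as follows (pure Python; the corner values and threshold are invented for illustration, and a real implementation would map the resulting index into the edge/triangle lookup tables):

```python
# Marching cubes case index: each of the 8 cube vertices (indexes 0-7) is
# tested against an iso-threshold, and the 8 inside/outside bits form one
# configuration index in [0, 255] used to look up the triangulation case.
def cube_case_index(corner_values, threshold):
    index = 0
    for vertex in range(8):                    # vertex indexes 0..7
        if corner_values[vertex] >= threshold: # vertex is "inside" the surface
            index |= 1 << vertex               # set that vertex's bit
    return index

# Hypothetical voxel intensities at the 8 corners of one cube.
corners = [10, 200, 15, 20, 180, 30, 25, 220]
case = cube_case_index(corners, threshold=100)
print(case)  # bits 1, 4, 7 set -> 0b10010010 = 146
```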

Figure 5.1: 3D image model for DICOM files (X axis: image width; Y axis: image height; Z axis: slice number).





 



Figure 5.2: Flowchart of loading data from DICOM files (DICOM files → loading files → extract image pixels → build 3D model with marching cubes → segmentation process with Fuzzy C-Means → output images).

Figure 5.3: Cube shape cases.



Figure 5.4: Effect of grid size on 3D model resolution.





5.2 Image Segmentation algorithms

Clustering algorithms are used widely in different areas of research, such as pattern recognition, data mining, classification, image segmentation, data analysis, and modeling. Clustering algorithms find groups of data that share similar characteristics in a given data set. Several fuzzy clustering algorithms have been developed to improve the performance of image segmentation. Our research focuses on a combination of algorithms, including Fuzzy C-Means and Possibilistic C-Means: because Fuzzy C-Means is inadequate for handling noise in image data sets, Possibilistic C-Means is used to compensate for this weakness of FCM.

5.2.1 Fuzzy C-Means Algorithm

Fuzzy C-Means (FCM) is a clustering technique used to separate data into several groups. Each group has several data points, and each point has a membership value in each group; the strongest membership determines the best group for the point. The membership is calculated from the Euclidean distance.

FCM works in three main steps, as follows [30]:

Step 1. Initialize the cluster centroids with random numbers. The centroids are updated on each iteration according to the following equation:

Equation 5.1:  v_j = \frac{\sum_{i=1}^{N} \mu_{ij}^{m} x_i}{\sum_{i=1}^{N} \mu_{ij}^{m}}

where m is the fuzziness factor, N is the number of data points, and v_j is the center of cluster j.


Step 2. Calculate the membership of each data point using the following equation:

Equation 5.2:  \mu_{ij} = \frac{1}{\sum_{L=1}^{C} \left( \frac{\| x_i - v_j \|}{\| x_i - v_L \|} \right)^{\frac{2}{m-1}}}

where C is the number of clusters, i is the index of the data point, j is the index of the cluster, \mu_{ij} is the value of the relationship between data point i and cluster j in the membership matrix U, and x_i is the pixel value at index i after converting the image from a two-dimensional to a one-dimensional matrix using the formula (index = the pixel row number * the image width in pixels + the pixel column number). In this step each data point receives a percentage of correlation with every cluster. At this point a data point cannot yet be assigned to a single cluster, so this is called the fuzzification state, in which every data point is related to more than one cluster. The sum of all membership values for a single data point is equal to one, and each membership value lies in the range [0, 1].

Step 3. Calculate the objective function:

Equation 5.3:  J = \sum_{i=1}^{N} \sum_{j=1}^{C} \mu_{ij}^{m} \, \| x_i - v_j \|^{2}

FCM uses this function to decide when to stop. In each iteration, the comparison between the current objective function value and the previous one is used as the termination condition: if the difference between these two values is smaller than a given threshold, the algorithm stops and outputs the segmented data. Each data point must then belong to a single cluster. This is called the defuzzification state, which is reached by choosing, for each data point, the maximum of its membership values over all clusters.

The following pseudo code shows the steps of the FCM algorithm and the segmentation function:

Algorithm: Fuzzy C-Means Algorithm

1: Set the parameters: C for the number of clusters, m for the fuzziness parameter, and ε for the termination criterion; set k = 0.
2: Initialize random cluster centers.
3: Initialize the membership matrix U = [μij] using Equation 5.2.
4: Compute the objective function Jk using Equation 5.3.
5: Loop: increment the loop counter, k = k + 1.
6: Compute the cluster center vectors V(k) = [vj] using Equation 5.1.
7: Compute the membership matrix U = [μij] using Equation 5.2.
8: Compute the objective function Jk using Equation 5.3.
9: If ||Jk+1 − Jk|| < ε then end the loop; otherwise repeat from step 5.
10: Call the segmentation function, which performs the actual distribution of pixels over the needed number of clusters according to the last computed membership matrix U.

Algorithm: Segmentation Function

1: Pass the membership matrix U into the function (in U, row r represents a pixel, column c represents a cluster, and the value at (r, c) is the membership relation between pixel r and cluster c).
2: Create an image of the same size as the original image for each cluster.
3: For each row r in U, do steps 4-6.
4: Set j to the index of the maximum value in row r.
5: Convert index r to the original pixel location (x, y) with l = r / W (l is the pixel row number, W is the image width in pixels).
6: Put the value of pixel r into the image that represents cluster j at location (x, y).
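The three FCM steps and the termination test can be sketched in pure Python as below (a minimal 1D version with made-up intensities and a deterministic center initialization for reproducibility; a real implementation would run over all image pixels and then apply the segmentation function):

```python
# Minimal 1D Fuzzy C-Means following Equations 5.1-5.3: membership update,
# center update, and the objective-function termination test.
def fcm(xs, C=2, m=2.0, eps=1e-6, max_iter=100):
    # Deterministic init for the demo (the text initializes centers randomly).
    v = [min(xs) + (max(xs) - min(xs)) * (j + 1) / (C + 1) for j in range(C)]

    def memberships():
        U = []
        for x in xs:
            d = [abs(x - vj) for vj in v]
            if 0.0 in d:                        # point coincides with a center
                U.append([1.0 if dj == 0.0 else 0.0 for dj in d])
            else:                               # Equation 5.2
                U.append([1.0 / sum((d[j] / d[L]) ** (2.0 / (m - 1.0))
                                    for L in range(C)) for j in range(C)])
        return U

    def objective(U):                           # Equation 5.3
        return sum(U[i][j] ** m * (xs[i] - v[j]) ** 2
                   for i in range(len(xs)) for j in range(C))

    U = memberships()
    J = objective(U)
    for _ in range(max_iter):
        v = [sum(U[i][j] ** m * xs[i] for i in range(len(xs))) /
             sum(U[i][j] ** m for i in range(len(xs)))          # Equation 5.1
             for j in range(C)]
        U = memberships()
        J_new = objective(U)
        if abs(J_new - J) < eps:                # termination criterion
            break
        J = J_new
    return v, U

xs = [10, 12, 11, 90, 95, 92]                   # two obvious intensity groups
centers, U = fcm(xs)
labels = [max(range(2), key=lambda j: row[j]) for row in U]  # defuzzification
print(sorted(centers), labels)
```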

To improve the accuracy of FCM, a new equation for calculating the membership degree was proposed, which is called the T2FCM algorithm [50]. The new membership function, Equation 5.4, starts from the old membership function of the FCM algorithm and extends it:

Equation 5.4:  a_{ij} = \mu_{ij} - \frac{1 - \mu_{ij}}{2}

The membership values express degrees of membership and depend not only on the distance of a point to its own cluster center but also on the distances to the other cluster centers. In addition, when the norm used in the FCM method is different from the Euclidean norm, introducing restrictions is necessary. Another researcher proposed an improvement of T2FCM called Interval Type-2 Fuzzy C-Means [51]. This algorithm uses two fuzziness factors, m1 and m2, to compute upper and lower membership matrices, given by Equation 5.5 and Equation 5.6. The value of C is calculated using the value of the current cluster normalized by the maximum pixel value.

Equation 5.5:  \overline{\mu}_{ij} = \begin{cases} \left( \sum_{k=1}^{C} \left( \frac{\| d_{ij} \|}{\| d_{ik} \|} \right)^{\frac{2}{m_1 - 1}} \right)^{-1}, & \text{if } \sum_{k=1}^{C} \frac{\| d_{ij} \|}{\| d_{ik} \|} \le C \\ \left( \sum_{k=1}^{C} \left( \frac{\| d_{ij} \|}{\| d_{ik} \|} \right)^{\frac{2}{m_2 - 1}} \right)^{-1}, & \text{otherwise} \end{cases}

Equation 5.6:  \underline{\mu}_{ij} = \begin{cases} \left( \sum_{k=1}^{C} \left( \frac{\| d_{ij} \|}{\| d_{ik} \|} \right)^{\frac{2}{m_1 - 1}} \right)^{-1}, & \text{if } \sum_{k=1}^{C} \frac{\| d_{ij} \|}{\| d_{ik} \|} \ge C \\ \left( \sum_{k=1}^{C} \left( \frac{\| d_{ij} \|}{\| d_{ik} \|} \right)^{\frac{2}{m_2 - 1}} \right)^{-1}, & \text{otherwise} \end{cases}


The pixel values are sorted in ascending order. Two new centers are then computed for each cluster, using Equation 5.7 (v_j^{Left}) and Equation 5.8 (v_j^{Right}); the actual cluster center is computed as the average of these two centers using Equation 5.9.

Equation 5.7:  v_j^{Left} = \frac{\sum_{i=1}^{N} \mu_{ij}^{m} x_i}{\sum_{i=1}^{N} \mu_{ij}^{m}}, \quad \text{where } \mu_{ij} = \overline{\mu}_{ij} \text{ if } i \le K \text{, else } \mu_{ij} = \underline{\mu}_{ij} \text{, and } m = 2

Equation 5.8:  v_j^{Right} = \frac{\sum_{i=1}^{N} \mu_{ij}^{m} x_i}{\sum_{i=1}^{N} \mu_{ij}^{m}}, \quad \text{where } \mu_{ij} = \underline{\mu}_{ij} \text{ if } i \le K \text{, else } \mu_{ij} = \overline{\mu}_{ij} \text{, and } m = 2

Equation 5.9:  v_j = \frac{v_j^{Left} + v_j^{Right}}{2}

The objective function for this algorithm is computed by Equation 5.10.

Equation 5.10:  J = \sum_{i=1}^{C} \sum_{j=1}^{N} \left( \overline{\mu}_{ij} \, \| x_j - v_i \|^{2} + \underline{\mu}_{ij} \, \| x_j - v_i \|^{2} \right)

The following pseudo code shows the steps of the updated FCM algorithm with IT2:

Algorithm: IT2 Fuzzy C-Means Algorithm

1: Set the parameters: C for the number of clusters, m1 and m2 for the fuzziness parameters, and ε for the termination criterion; set k = 0.
2: Sort the pixels in ascending order, x_{i-1} < x_i < x_{i+1}.
3: Initialize random cluster centers V = [vi].
4: Compute the upper and lower membership matrices using Equations 5.5 and 5.6.
5: Compute the objective function Jk using Equation 5.10.
6: Loop: increment the loop counter, k = k + 1.
7: Get the index K of the cluster center in the sorted list.
8: Compute the new cluster center vectors V = [vj] using Equation 5.9.
9: Compute the upper and lower membership values using Equations 5.5 and 5.6.
10: Compute the objective function Jk+1 using Equation 5.10.
11: If ||Jk+1 − Jk|| < ε then stop the loop; otherwise repeat from step 6.
12: Call the segmentation function, which performs the actual distribution of pixels over the needed number of clusters according to the last computed membership matrix U.
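The interval membership construction can be sketched as below (pure Python, following the common equivalent formulation in which the upper membership is the larger of the two fuzzifier memberships and the lower the smaller; the distances and fuzzifier values are invented example values):

```python
# Interval Type-2 membership: compute the FCM membership under two
# fuzzifiers m1 and m2, then take max as the upper and min as the lower
# membership (a common equivalent reading of the upper/lower equations).
def fcm_membership(dists, j, m):
    return 1.0 / sum((dists[j] / dists[k]) ** (2 / (m - 1))
                     for k in range(len(dists)))

def it2_membership(dists, j, m1=1.5, m2=3.0):
    u1 = fcm_membership(dists, j, m1)
    u2 = fcm_membership(dists, j, m2)
    return max(u1, u2), min(u1, u2)    # (upper, lower)

# Hypothetical distances from one pixel to C = 2 cluster centers.
dists = [2.0, 6.0]
upper, lower = it2_membership(dists, j=0)
print(round(upper, 4), round(lower, 4))
```

The pixel is close to center 0, so its interval membership for that cluster is wide and near 1 at the top; the gap between upper and lower reflects the uncertainty introduced by using two fuzziness factors.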


5.2.2 Possibilistic C-Means Algorithm

Although FCM is a good clustering method, it has some disadvantages: the obtained solution may not be the desired one, and FCM performance can be inadequate, especially when the data set is contaminated by noise. In FCM, the user determines the number of clusters. In many cases the clustering operation achieves better results if the user sets the number of clusters smaller; however, if the number of clusters chosen by the user is smaller than it should be, some data points are assigned to clusters to which they do not belong. The Possibilistic C-Means (PCM) algorithm provides a solution to this problem by automatically determining the number of clusters C based on the density of the data points. PCM takes a possibilistic approach, using a possibilistic type of membership function to describe the degree of belonging [52]. It can be used to remedy the weakness of FCM in handling image noise. It is desirable that the memberships of representative feature points be as high as possible, while unrepresentative points have low membership. The objective function is formulated as follows in Equation 5.11:

Equation 5.11:  J = \sum_{j=1}^{C} \sum_{i=1}^{N} t_j(x_i)^{m} \, \| x_i - v_j \|^{2} + \sum_{j=1}^{C} \sum_{i=1}^{N} n_j \left( 1 - t_j(x_i) \right)^{m}

where t_j(x_i) is computed using Equation 5.12 and n_j is computed using Equation 5.13:

Equation 5.12:  t_j(x_i) = \frac{1}{1 + \left( \frac{\| x_i - v_j \|^{2}}{n_j} \right)^{\frac{1}{m-1}}}

Equation 5.13:  n_j = \frac{\sum_{i=1}^{N} \mu_{ij}^{m} \, \| x_i - v_j \|^{2}}{\sum_{i=1}^{N} \mu_{ij}^{m}}

where \mu_{ij} is computed exactly as in FCM (Equation 5.2). The algorithm has the same steps and the same calculation operations as FCM. PCM is more robust in the presence of noise, in finding valid clusters, and in giving a robust estimate of the centers.

The following pseudo code shows the steps of the PCM algorithm:

Algorithm: Possibilistic C-Means Algorithm

1: Set the parameters: C for the number of clusters, m for the fuzziness parameter, and ε for the termination criterion; set k = 0.
2: Initialize random cluster centers V = [vi].
3: Compute the membership matrix U = [μij] and t using Equations 5.2 and 5.12.
4: Compute the objective function Jk using Equation 5.11.
5: Loop: increment the loop counter, k = k + 1.
6: Compute the cluster center vectors V = [vj] using Equation 5.1.
7: Compute the membership matrix U = [μij] and t using Equations 5.2 and 5.12.
8: Compute the objective function Jk+1 using Equation 5.11.
9: If ||Jk+1 − Jk|| < ε then end the loop; otherwise repeat from step 5.
10: Call the segmentation function, which performs the actual distribution of pixels over the needed number of clusters according to the last computed membership matrix U.
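The typicality and scale computations of Equations 5.12 and 5.13 can be sketched numerically as follows (pure Python; the points, center, and memberships are invented example values):

```python
# PCM scale parameter n_j (Equation 5.13) and typicality (Equation 5.12).
def scale_nj(xs, us, v, m=2.0):
    # n_j = sum(u^m * ||x - v||^2) / sum(u^m), from FCM-style memberships us.
    num = sum(u ** m * (x - v) ** 2 for x, u in zip(xs, us))
    den = sum(u ** m for u in us)
    return num / den

def typicality(x, v, nj, m=2.0):
    # t_j(x) = 1 / (1 + (||x - v||^2 / n_j)^(1/(m-1)))
    return 1.0 / (1.0 + ((x - v) ** 2 / nj) ** (1.0 / (m - 1.0)))

xs = [1.0, 2.0, 3.0, 50.0]           # 50.0 plays the role of a noise point
us = [0.9, 1.0, 0.9, 0.1]            # hypothetical memberships to cluster j
v = 2.0                              # hypothetical cluster center
nj = scale_nj(xs, us, v)
print(round(typicality(2.0, v, nj), 3),   # on-center point: typicality 1.0
      round(typicality(50.0, v, nj), 3))  # noise point: very low typicality
```

Unlike the FCM membership, the typicality of a point depends only on its distance to one center, which is why the noise point gets a near-zero value instead of being forced to share a full unit of membership across clusters.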

5.2.3 Combine Two Algorithms

This combined algorithm was proposed in 2014 [53]. It combines versions of FCM and PCM to create a new fuzzy clustering algorithm. The algorithm produces both memberships and possibilities, using weighting exponents m for fuzziness and n for possibility, each represented by a range rather than a precise value: m = [m1, m2], where m1 and m2 are the lower and upper limits of the fuzziness exponent, and n = [n1, n2], where n1 and n2 are the lower and upper limits of the possibility exponent. The membership calculation is split into an upper and a lower membership for each point in the dataset, as shown in Equations 5.5 and 5.6 of the T2FCM algorithm. The mixed FCM and PCM version replaces that membership calculation with Equations 5.14 and 5.15.

Equation 5.14 (lower membership):

μ̲_j(x_i) = min{ [ Σ_{L=1}^{C} (‖x_i − v_j‖ / ‖x_i − v_L‖)^{2/(m−1)} ]^{−1}, [ 1 + (‖x_i − v_j‖² / n_j)^{1/(m−1)} ]^{−1} }

Equation 5.15 (upper membership):

μ̄_j(x_i) = max{ [ Σ_{L=1}^{C} (‖x_i − v_j‖ / ‖x_i − v_L‖)^{2/(m−1)} ]^{−1}, [ 1 + (‖x_i − v_j‖² / n_j)^{1/(m−1)} ]^{−1} }

The following pseudo code shows the steps of the mixed FCM and PCM algorithm with IT2:

Algorithm: IT2 Fuzzy and Possibilistic C-Means Algorithm

1: Set the parameters: C for the number of clusters, m1 and m2 for the fuzziness parameters, and ε for the termination criterion; set k = 0.
2: Sort the pixels in ascending order, x_{i−1} < x_i < x_{i+1}
3: Initialize random cluster centers V = [v_j]
4: Compute the upper and lower membership matrices U = [μ̲_ij, μ̄_ij] using Equations 5.14 and 5.15
5: Compute the objective function J_k using Equation 5.10
6: Loop: increment the loop counter, k = k + 1
7: Get the index of the cluster center from the sorted list (K)
8: Compute the new cluster center vectors V = [v_j] using Equation 5.9
9: Compute the upper and lower membership matrices U = [μ̲_ij, μ̄_ij] using Equations 5.14 and 5.15
10: Compute the objective function J_{k+1} using Equation 5.10
11: If ||J_{k+1} − J_k|| < ε, stop the loop; otherwise repeat from step 6
12: Call the segmentation function, which performs the actual pixel distribution over the needed number of clusters according to the last calculated membership matrix U.
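The min/max selection in Equations 5.14 and 5.15 can be sketched in C++ for scalar pixel values: for each cluster j, an FCM-type term and a PCM-type term are evaluated, and the lower and upper memberships are their min and max. This is our own illustrative sketch (names included), assuming x_i does not coincide exactly with any center v_L.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

struct IntervalMembership { double lower, upper; };

// Equations 5.14/5.15 for scalar data: lower = min, upper = max of the
// FCM-type and PCM-type candidate memberships of pixel x_i in cluster j.
IntervalMembership it2fpcmMembership(double x_i, const std::vector<double>& v,
                                     std::size_t j, double n_j, double m) {
    double d_j = std::fabs(x_i - v[j]);
    // FCM-type term: ( sum_L (||x_i - v_j|| / ||x_i - v_L||)^(2/(m-1)) )^-1
    double s = 0.0;
    for (double v_L : v)
        s += std::pow(d_j / std::fabs(x_i - v_L), 2.0 / (m - 1.0));
    double fcmTerm = 1.0 / s;
    // PCM-type term: ( 1 + (||x_i - v_j||^2 / n_j)^(1/(m-1)) )^-1
    double pcmTerm = 1.0 / (1.0 + std::pow(d_j * d_j / n_j, 1.0 / (m - 1.0)));
    return { std::min(fcmTerm, pcmTerm), std::max(fcmTerm, pcmTerm) };
}
```

In the full IT2 algorithm these two values would be evaluated under the exponent range [m1, m2]; the sketch uses a single m to keep the min/max structure of the equations visible.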

5.3 Proposed Parallel FCM based Algorithm

The Fuzzy C-Means based algorithms described in section 5.1 are covered here. The sequential versions of the FCM-based algorithms take a long execution time. This expensive computation time can be reduced by utilizing the capabilities of GPUs. We start by identifying the parallelizable sections of the FCM-based algorithms; in each algorithm we identified four main steps:

a) Compute the new cluster centroids:
Can this operation be parallelized? Yes. To compute a single cluster centroid, all the pixel values in the image and all the values in that cluster's column of the membership matrix are required. To parallelize this operation, a thread can be created for each cluster and made to iterate through the cluster's column in the membership matrix. The maximum number of threads that can then run in parallel equals the number of clusters in the image, which is usually small.

b) Compute the membership matrix:
Can this operation be parallelized? Yes. This operation requires all the pixel values and all the cluster centroids. To compute a single value in the membership matrix, only the pixel value for which the membership is being computed and the current centroids of all clusters are required. To parallelize this operation, a thread can be created to compute each value, which means a very large number of threads can work together without any synchronization problems.

c) Compute the objective function:
Can this operation be parallelized? It is not well suited. This operation requires the pixel values, the cluster centroids, and the membership matrix. To parallelize it, the summations would have to be divided over smaller portions of the data, which usually requires a high degree of synchronization. The resulting overhead outweighs any gain in execution time, so this operation is not suitable for parallelization.

d) Segmentation function:
Can this operation be parallelized? Yes. This operation determines where each pixel belongs and requires only the membership matrix. To process a single pixel, only that pixel's row of the membership matrix is needed. A thread can therefore be created to perform the required computations for each pixel, and a high level of parallelism can be achieved.

Based on these four steps, we developed the parallel versions of the FCM-based algorithms. The segmentation function is the same in all FCM-based algorithms and can be parallelized. The parallel segmentation function algorithm is shown below:

Algorithm: Parallel Segmentation Function

1: Pass the membership matrix U into the function
2: Set an image the same size as the original image for each cluster (in shared memory)
3: Set a thread for each row r in the membership matrix U
4: For each row r in U, do steps 5 to 7
5: Set j to the index of the maximum value in row r
6: Convert the index r to the original pixel location (x, y), where y = ⌊r / W⌋ and x = r mod W (W is the image width in pixels)
7: Put the value of pixel r in the image that represents cluster j at location (x, y)
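The per-thread work in steps 5 to 7 can be modeled on the CPU as a plain C++ function: each "thread" takes one row r of the membership matrix, picks the cluster with the maximum membership, and maps the flat index back to image coordinates. This is an illustrative sketch of ours (the actual implementation runs as a GPU kernel); `U` is stored row-major, numPixels × numClusters.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One thread's work in the parallel segmentation function: argmax over the
// row r of U (steps 4-5), then the flat-index to (x, y) conversion (step 6).
void segmentPixel(const std::vector<double>& U, std::size_t numClusters,
                  std::size_t r, std::size_t width,
                  std::vector<int>& labels, std::size_t& x, std::size_t& y) {
    std::size_t best = 0;
    for (std::size_t j = 1; j < numClusters; ++j)
        if (U[r * numClusters + j] > U[r * numClusters + best]) best = j;
    labels[r] = static_cast<int>(best);  // cluster this pixel belongs to
    y = r / width;                       // original pixel row
    x = r % width;                       // original pixel column
}
```

Because each call touches only row r of U and writes only labels[r], the calls are independent, which is exactly why one GPU thread per pixel needs no synchronization here.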

In the FCM-based algorithms, the membership computation is done independently for each value in the membership matrix. For each value, a thread is created to perform the required computations in parallel, as in T2FCM. The same parallelization is applied to the FCM algorithm, which computes the new membership matrix U. We present a general pseudo code for the parallel FCM-based algorithms, which covers FCM, T2FCM, and PCM:

Algorithm: Parallel FCM based algorithms

1: Set the parameters: C for the number of clusters, m for the fuzziness parameter, and ε for the termination criterion; set k = 0.
2: Initialize random cluster centers V = [v_j]
3: Compute the elements of the membership matrix U on the CPU
4: If the segmentation method is PCM, then
5:   Compute n_j on the CPU
6:   Compute t on the CPU
7: Allocate memory for the image data on the GPU (pixels, clusters, and memberships)
8: Compute the objective function J_k on the CPU
9: Transfer the pixel data to the GPU memory
10: Loop: increment the loop counter, k = k + 1
11: Compute the cluster center vectors V = [v_j] on the CPU
12: Transfer the cluster data to the GPU memory
13: Compute the membership matrix U using multithreading on the GPU
14: Transfer the membership matrix data from the GPU to the CPU memory
15: If the segmentation method is PCM, then
16:   Compute n_j on the CPU
17:   Compute t on the CPU
18: Compute the objective function J_{k+1}
19: If ||J_{k+1} − J_k|| < ε, end the loop; otherwise repeat from step 10
20: Call the parallel segmentation function, which performs the actual pixel distribution over the needed number of clusters according to the last calculated membership matrix U.


In the other two FCM-based algorithms, IT2FCM and IT2FPCM, the computation of the upper and lower membership matrices can be converted to a parallel operation using multithreading. These algorithms need a sorting step, for which we chose a built-in sorting function on the CPU side for two reasons. First, the sorting operation is needed only once, at the beginning of the algorithm, not in each iteration. Second, moving all the pixel data to GPU memory before sorting causes a lot of overhead: each pixel point is stored in a struct data type containing many attributes (x-axis, y-axis, Red, Green, Blue, and α value), and the CPU works better than the GPU with struct data types. In both algorithms, the index value of the cluster center K must be found in each iteration; we use the binary search algorithm here because it searches sorted data quickly. We present a general pseudo code for the parallel FCM-based algorithms IT2FCM and IT2FPCM:

Algorithm: Parallel IT2FCM and IT2PFCM Algorithms

1: Set the parameters: C for the number of clusters, m1 and m2 for the fuzziness parameters, and ε for the termination criterion; set k = 0.
2: Sort the pixels in ascending order, x_{i−1} < x_i < x_{i+1}
3: Initialize random cluster centers V = [v_j]
4: Compute the upper and lower membership matrices U on the CPU
5: Allocate memory for the image data on the GPU (pixels, clusters, and memberships)
6: Compute the objective function J_k using Equation 5.10 on the CPU
7: Transfer the pixel data to the GPU memory
8: Loop: increment the loop counter, k = k + 1
9: Get the index of the cluster center in the sorted list (K)
10: Compute the new cluster center vectors V = [v_j] using Equation 5.11
11: Transfer the cluster data to the GPU memory
12: Compute the upper and lower membership matrices U on the GPU
13: Transfer the membership matrices from the GPU memory to the CPU memory
14: Compute the objective function J_{k+1} using Equation 5.10 on the CPU
15: If ||J_{k+1} − J_k|| < ε, stop the loop; otherwise repeat from step 8
16: Call the parallel segmentation function, which performs the actual pixel distribution over the needed number of clusters according to the last calculated membership matrix U.
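The per-iteration lookup of the cluster-center index K in the sorted pixel list (step 9) can be sketched with the standard library's binary search. This is our own stand-in, using `std::lower_bound` rather than the dissertation's exact search routine; the function name `centerIndex` is illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Binary search over the once-sorted pixel values: returns the index K of the
// first pixel value that is not less than the given cluster-center value.
std::size_t centerIndex(const std::vector<double>& sortedPixels, double center) {
    return static_cast<std::size_t>(
        std::lower_bound(sortedPixels.begin(), sortedPixels.end(), center)
        - sortedPixels.begin());
}
```

Since the pixels are sorted only once at the start, each of these per-iteration lookups costs O(log N) instead of a linear scan.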

5.4 Discussion

In this chapter, we presented all the FCM-based algorithms (FCM, T2FCM, PCM, IT2FCM, and IT2FPCM) in detail and developed a new parallel version of each using CUDA programming. All the algorithms can be used for 2D and 3D medical image segmentation, especially for 3D models, which have long processing times. Because a 3D image is built as a stack of 2D images, the presented algorithms keep their segmentation accuracy while improving execution speed in our experiments.










Implementation and Performance Evaluation

6.1 Implementation for Parallel Segmentation

The sequential implementations of the FCM-based algorithms take a long execution time. This expensive computation time can be reduced by utilizing the capabilities of GPUs. First, we need to find the heaviest functions in the sequential implementation, which are the most suitable to be executed on the GPU side, by running it with the Microsoft Visual Studio profiler. Calculating the membership values is identified as the heaviest function for the CPU. This function is appropriate to run in parallel because computing each membership value of the membership matrix only needs to read the values of the data points and cluster centroids, and each membership value is then written to its own memory location. There are no conflicting operations: each thread can read the same memory location without affecting the others. Updating the cluster centroids and calculating the objective function also need to read values from other memory locations, but their summation operations must write to a single memory location. In that situation, the threads must actively synchronize to avoid inconsistent reads and maintain the consistency of the summation value; otherwise, delays are introduced into the execution time. In our parallel implementation, the membership computation function is transferred to the GPU side after the profiling process because it is the heaviest

function on the CPU side. The other functions, updating the centroids and calculating the objective function, run on the CPU side because they are not easily parallelizable, as they require summation operations.

Parallel programming faces many challenges in achieving high performance improvements. The first challenge is memory management. Transferring data between the CPU side and the GPU side causes long delays. To handle this problem, we must track and eliminate any useless transactions in the data traffic between CPU and GPU. In this parallel implementation, more than one variable must be transferred between CPU and GPU: about 45 million data points must be transferred in every single iteration. This big data transfer creates lengthy delay times. The problem can be reduced by using the one-direction transfer technique [54].

Therefore, only the cluster index values and cluster centroids are transferred to the GPU side, and the membership matrix is transferred from the GPU side to the CPU side so that the objective function can be calculated on the CPU. The second challenge is how to utilize the GPU well. In our implementation, the best arrangement of threads in each block must be determined. We chose 256 threads per block, a number that gives 100% utilization of the GPU, as shown in Table 6.1. The number of blocks needed to launch the CUDA kernel is calculated using Equation 6.1, and the utilization of threads inside each single block must also be increased. Thus, we set the warp layout within a block using the dim3 data type in the CUDA programming language: the 256 threads are represented as 32 threads along the x-axis and 8 threads along the y-axis.

This will increase the utilization inside the single block of threads.

Equation 6.1:   number of blocks = data size / number of threads (rounded up)
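The launch configuration just described can be sketched in a few lines of C++: the block count from Equation 6.1 uses ceiling division so every data element is covered, and the 256 threads are arranged 32 × 8, as with CUDA's dim3. This is an illustrative sketch of ours; `Dim3` mimics the CUDA type so the snippet stays compilable off the GPU.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for CUDA's dim3, so the layout arithmetic can be shown host-side.
struct Dim3 { std::size_t x, y, z; };

// Equation 6.1 with ceiling division: enough blocks to cover every element.
std::size_t numBlocks(std::size_t dataSize, std::size_t threadsPerBlock) {
    return (dataSize + threadsPerBlock - 1) / threadsPerBlock;
}

// 32 threads along x and 8 along y: 256 threads per block, as in the text.
const Dim3 blockShape = {32, 8, 1};
```

In the actual CUDA source, the equivalent would be `dim3 block(32, 8); kernel<<<numBlocks(n, 256), block>>>(...)`.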

Table 6.1: GPU utilization (%) for different numbers of threads per block [54]

Threads per block | 1.0 | 1.1 | 1.2 | 1.3 | 2.0 | 2.1 | 3.0
64                | 67  | 67  | 50  | 50  | 33  | 33  | 50
96                | 100 | 100 | 75  | 75  | 50  | 50  | 75
128               | 100 | 100 | 100 | 100 | 67  | 67  | 100
192               | 100 | 100 | 94  | 94  | 100 | 100 | 91
256               | 100 | 100 | 100 | 100 | 100 | 100 | 100
384               | 100 | 100 | 75  | 75  | 100 | 100 | 94
512               | 67  | 67  | 100 | 100 | 100 | 100 | 100
768               | N/A | N/A | N/A | N/A | 100 | 100 | 75
1024              | N/A | N/A | N/A | N/A | 67  | 67  | 100

(columns are CUDA compute capabilities)

Table 6.2: Access times of different memory types [54]

Storage Type    | Bandwidth | Latency
Register        | 8 TB/s    | 1 cycle
Shared Memory   | 1.5 TB/s  | 1 to 32 cycles
Texture Memory  | 200 MB/s  | 400 to 600 cycles
Constant Memory | 200 MB/s  | 400 to 600 cycles
Global Memory   | 200 MB/s  | 400 to 600 cycles


The third challenge is choosing the most suitable memory type. As shown in Table 6.2, the fastest memory type is local memory such as the registers, which can be accessed at 8 TB/s with a one-cycle latency on the GPU. The slowest is global memory, at 200 MB/s with a latency of 400 cycles or more. The registers are therefore multiple orders of magnitude faster than global memory. Data dependencies must be reduced to avoid using global memory. In our implementation, the membership value computation has no dependencies between its operations, so it is well suited to the GPU side and does not need the slow global memory. This is one of the advantages of the parallel implementation.

6.2 Experiments

In this section, we present the results of comparing the different implementations discussed in Chapter 5. First, the hardware and software used in the experiments are described, and we discuss calculating the machine epsilon of the machine used in order to choose the precision of rounding operations. Second, we describe the dataset used in the experiments. Third, we display the segmentation results after reconstructing the 3D model.

6.2.1 Machine Environment

Table 6.3 shows the specification of the machine used to conduct the experiments:


Hardware: we use a desktop PC with an Intel® Core™ i5-3470 quad-core 3.20 GHz CPU and 16 GB of DDR3 memory. For the GPU, we use an NVIDIA GeForce GT 740M with 2 GB of DDR3 memory. The memory bandwidth of the GPU is 29 GB/s, and the bus is PCI Express x8 Gen3.

Software: we use 64-bit Windows 8 as the operating system with CUDA toolkit version 7.5 and the NVIDIA hardware drivers. For the development environment, we use Visual Studio 2013.

Table 6.3: Configuration of CPU and GPU Hardware

Feature                        | CPU                | GPU
Name                           | i5-3470            | GeForce GT 740M
Frequency                      | 2.40 GHz           | 810 MHz
Number of cores                | 4                  | 384
Installed memory               | 16 GB DDR3         | 2 GB DDR3
Multiprocessor count           | 1                  | 2
Operating system               | Windows 8, 64 bits | Windows 8, 64 bits
Programming language           | C++                | CUDA 7.5
Memory bus width               | 21 GB/s            | 29 GB/s
Max threads per block          | 4                  | 1024
Memory clock rate              | 21 GB/s            | 1800 MHz
Max threads per multiprocessor | 4                  | 1024

6.2.2 Dataset Description

The NEMA dataset consists of 35 DICOM files captured using CT technology. Each file contains a CT image of size 512 × 512. We build the 3D model from these files using the Fellow Oak DICOM 2.0.2 library, which creates around 9,175,040 pixels in the 3D volume object. Moreover, each pixel is represented by four values, which generates 36,700,160 data elements in memory.

6.2.3 Cluster Validity Functions

Two types of cluster validity functions, based on the fuzzy partition and on the feature structure, are often used to evaluate the clustering performance of different clustering methods. The representative functions for the fuzzy partition are the partition coefficient Vpc [55] and the partition entropy Vpe [56]. They are defined as follows:

Equation 6.2:   Vpc = ( Σ_{j=1}^{N} Σ_{i=1}^{c} u_ij² ) / N

And

Equation 6.3:   Vpe = − ( Σ_{j=1}^{N} Σ_{i=1}^{c} u_ij log u_ij ) / N

The idea behind these validity functions is that a partition with less fuzziness means better performance. As a result, the best clustering is achieved when Vpc is maximal or Vpe is minimal. We use these cluster validity functions to evaluate our algorithms' performance in the experiments.
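The two indices can be sketched directly from Equations 6.2 and 6.3 over a flattened membership matrix. This is an illustrative C++ sketch of ours (function names included), with `U` stored row-major over N pixels; a crisp partition drives Vpc toward 1 and Vpe toward 0, matching the "less fuzziness is better" reading above.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Equation 6.2: Vpc = (sum over all entries of u^2) / N.
double partitionCoefficient(const std::vector<double>& U, std::size_t N) {
    double s = 0.0;
    for (double u : U) s += u * u;
    return s / static_cast<double>(N);
}

// Equation 6.3: Vpe = -(sum over all entries of u * log u) / N,
// with the usual convention that 0 * log 0 contributes 0.
double partitionEntropy(const std::vector<double>& U, std::size_t N) {
    double s = 0.0;
    for (double u : U)
        if (u > 0.0) s += u * std::log(u);
    return -s / static_cast<double>(N);
}
```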

6.2.4 Results

In our experiments, we run the implementation of each FCM-based algorithm on 2D medical images, a side brain image and a top brain image, in both the sequential and the parallel version. We measure the execution time of each algorithm by averaging the execution times of five runs of each experiment. We use Equation 6.4 to compute the speedup achieved by parallelizing each implementation.

Equation 6.4:   Speedup = Sequential Execution Time / Parallel Execution Time

Before running the experiments, we need to compute the machine epsilon. The CPU's machine epsilon is 1 × 10^-45, whereas the GPU's machine epsilon is 1 × 10^-36; there is a big difference in precision between the CPU and the GPU. The threshold value we use for the experiments is 1 × 10^-5, because it avoids overflow errors when transferring data between the CPU and the GPU. The maximum number of iterations for the FCM-based algorithms is 100, and each test is repeated five times.
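The quoted values (about 1 × 10^-45 on the CPU and 1 × 10^-36 on the GPU) are on the order of the smallest positive float magnitudes rather than the classic unit roundoff, so one way to reproduce such a figure is to probe for the smallest value that still compares greater than zero. This is our interpretation and our own sketch, not necessarily the dissertation's exact procedure.

```cpp
#include <cassert>

// Halve a float until it underflows to zero; the last nonzero value is the
// smallest positive representable float on this path (~1.4e-45 with IEEE-754
// denormals enabled; larger if the hardware flushes denormals to zero, as
// some GPU paths do, which would explain the larger GPU figure).
float smallestPositive() {
    float x = 1.0f;
    while (x / 2.0f > 0.0f) x /= 2.0f;
    return x;
}
```

Running the same probe on the CPU and inside a GPU kernel would expose exactly the kind of precision gap the text reports, motivating the conservative 1 × 10^-5 threshold.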

The execution times of all algorithms, for both the sequential and the parallel implementations, on the side brain and top brain images are shown in Figures 6.2 and 6.4. The speedup results are shown in Table 6.4. From the experiments, the highest improvements in execution time are 7.17x and 9.10x, in the case of the IT2FCM algorithm. The improvement in performance does not make IT2FPCM the better choice, because the accuracy of IT2FCM is the best among all the FCM-based algorithms, as shown in Tables 6.5 and 6.6 and Figure 6.3. Moreover, the segmentation results from the parallel versions of all FCM-based methods on the side brain image are shown in Figure 6.1. We can clearly see from Tables 6.4 and 6.5 that the execution speed of the algorithms increased while the accuracy stayed the same.


Table 6.4: Speed-up for GPU Implementations

Algorithm               | FCM   | T2FCM | PCM   | IT2FCM | IT2FPCM
On the side brain image | 4.73x | 4.49x | 4.00x | 7.17x  | 5.68x
On the top brain image  | 5.56x | 4.56x | 4.80x | 9.10x  | 8.25x

Table 6.5: The performance of FCM-based algorithm comparison on the side brain image

Side brain image | FCM   | T2FCM | PCM   | IT2FCM | IT2FPCM
CPU  Vpc         | 0.888 | 0.899 | 0.915 | 0.960  | 0.951
CPU  Vpe         | 0.233 | 0.223 | 0.133 | 0.032  | 0.044
GPU  Vpc         | 0.889 | 0.900 | 0.914 | 0.961  | 0.952
GPU  Vpe         | 0.234 | 0.221 | 0.131 | 0.030  | 0.042

Figure 6.1: Proof of segmentation accuracy on the side brain image (panels: Original; FCM, T2FCM, PCM; IT2FCM, IT2FPCM)






Figure 6.2: CPU and GPU Execution Time of FCM-based Algorithms on the Side Brain Image

Table 6.6: The performance of FCM-based algorithm comparison on the top brain image

Top brain image | FCM   | T2FCM | PCM   | IT2FCM | IT2FPCM
CPU  Vpc        | 0.884 | 0.899 | 0.930 | 0.961  | 0.953
CPU  Vpe        | 0.270 | 0.189 | 0.125 | 0.029  | 0.037
GPU  Vpc        | 0.283 | 0.898 | 0.932 | 0.962  | 0.955
GPU  Vpe        | 0.271 | 0.188 | 0.123 | 0.028  | 0.035

Figure 6.3: Accuracy of the IT2FCM algorithm (CPU cluster vs. GPU cluster)



 



Figure 6.4: CPU and GPU Execution Time of FCM-based Algorithms on the Top Brain Image

From the first phase of the experiments, we can conclude that IT2FCM seems to be the best option among the FCM-based algorithms for segmentation. The next phase of the experiments applies the parallel version of the IT2FCM algorithm to a 3D object for segmentation. The 3D volume was constructed from 2D image slices of 512 × 512 pixels and contains around 9,175,040 pixels. Each pixel has four property values, so the input image has 36,700,160 data elements in memory. In our experiments, we compare the sequential version and the parallel version. As shown in Figure 6.5, the parallel version of the IT2FCM segmentation algorithm on the GPU achieves an average execution time almost 11.8x faster than the sequential version on the CPU.

 

 



Figure 6.5: Execution Times of IT2FCM on CPU and GPU

6.3 This Research

This research presents a novel parallel algorithm for FCM-based segmentation of 2D and 3D medical images. The parallel algorithm improves execution time on the GPU while maintaining output accuracy. The research aimed at improving the performance of the segmentation process. We focused on the Fuzzy C-Means (FCM) based clustering algorithms, which include FCM, T2FCM, PCM, IT2FCM, and IT2FPCM, and employed GPU parallelism capabilities to improve their performance. From our experiments, the best parallel implementation we devised in our research was 9x faster than the sequential implementation on 2D segmentation. In addition, the initial improvement of the parallel implementation on 3D volume segmentation was 11.8x faster than the sequential implementation. Therefore, we need more research on 3D segmentation algorithms derived from FCM-based algorithms.

 







Conclusions and Future Work

This chapter concludes the work described in this dissertation and the improvements to the technique, and gives some insights about future work and future research directions for the current technique.

7.1 Conclusion

Medical image processing is one of the most important fields in this era because of its contribution to reducing the time needed to diagnose illnesses. Many techniques in image processing have been proposed to improve the quality of medical images, such as reconstruction, segmentation, etc. In the AI field, the denoising and segmentation operations are used as preprocessing steps before classification. Furthermore, those operations take extremely long times to finish before classification. The developments in High Performance Computing, like Graphics Processing Unit (GPU) technologies, present parallelism as a solution to this problem.

This dissertation proposes parallel algorithms for Fuzzy C-Means based segmentation processes on 2D and 3D medical images on the GPU with CUDA. We aimed at improving the performance of the segmentation process with the GPU. The research began by analyzing the different techniques and challenges of all the FCM-based segmentation algorithms. The challenge of each accuracy improvement on FCM-based



segmentation algorithms is increased computation time, especially for the IT2FCM algorithm. To handle this optimization challenge, GPU parallelism is well suited to improving execution time because it can handle huge numbers of data elements in computation. A large performance gap exists between the GPU and a general-purpose multi-core CPU: the GPU is a highly parallel, multithreaded, many-core processor whose higher memory bandwidth offers a solution to computational problems such as medical image segmentation. NVIDIA introduced its massively parallel architecture, the Compute Unified Device Architecture (CUDA), in 2006, contributing to the evolution of the GPU programming model. Our proposed parallel FCM-based algorithms were developed with CUDA programming. We developed novel parallel algorithms for IT2FCM and IT2FPCM, which improved performance on the medical image segmentation process. From our experiments, the best parallel implementation we devised was 9x faster than the sequential implementation on 2D segmentation. In addition, the initial improvement on 3D volume segmentation with the parallel implementation was 11.8x faster than the sequential implementation. Therefore, we need more research on 3D segmentation algorithms derived from FCM-based algorithms.

7.2 Future Work

The most important challenge in the future of medical image segmentation is 3D volume segmentation. Even though this research shows some initial improvement for 3D volume segmentation, there is still room for improvement. We will continue working on new 3D segmentation algorithms derived from FCM-based algorithms. In the future, we can expand this research to consider the other steps in the segmentation process of medical images, such as region growing, feature selection, and classification, with the GPU and CUDA programming. Another interesting research idea is the challenge of extracting oblique slices from 3D volumetric data. Orthographic views are sometimes too restrictive, and breaking this limitation on orientation to offer arbitrary oblique views is a natural extension. The suggested oblique views would be obtained by computer reconstruction from orthographic slices. Moreover, future research could focus on deep learning in medical image segmentation.

7.2.1 Deep Learning in Medical Image Segmentation

Currently, the recognized gold-standard segmentation results are obtained from experienced physicians and radiologists via visual inspection and manual delineation. However, annotating volumetric images with hundreds of slices in a slice-by-slice manner is tedious, time-consuming, and very expensive. Moreover, manual labeling is subjective, suffers from low reproducibility, and introduces high inter-observer variability, as the quality of the segmentation results can be significantly influenced by the operator's experience and knowledge. Consequently, automated segmentation algorithms are in high demand, especially if we would like to efficiently obtain accurate and reproducible segmentation results in day-to-day clinical practice.

Automated volumetric medical image segmentation is indeed a challenging task. The appearance and shape variations of the target objects are often significant among patients. The boundary between the target organs or structures and the neighboring tissues is usually ambiguous, with limited contrast, which is in essence caused by their similar imaging-related physical properties, e.g., attenuation coefficients in CT imaging and relaxation times in MR imaging [57]. Various algorithms have been extensively studied in past decades to meet these challenges. Those algorithms mainly utilized statistical shape modeling, level sets, active contours, multi-atlas methods, and graphical models with handcrafted features. Such handcrafted features usually have too limited a representation capability to deal with the large variations of appearance and shape. In recent years, deep learning-based methods have been explored to seek more powerful features, but it is still difficult for those methods to take full advantage of the 3D spatial information in volumetric medical images to obtain satisfactory segmentation results.

7.2.2 Convolutional Neural Networks for volumetric structure segmentation

Convolutional neural networks have demonstrated state-of-the-art performance on many challenging medical image analysis tasks in recent years, including classification, detection, and segmentation. CNNs are designed to better utilize spatial and configural information by taking a 2D or 3D image as input [58]. Structurally, CNNs have convolutional layers interspersed with pooling layers, followed by fully connected layers as in a standard multilayer neural network. Unlike a generic deep neural network, a CNN exploits three mechanisms, the local receptive field, weight sharing, and subsampling, that greatly reduce the degrees of freedom in the model. Due to the mechanisms of weight sharing and the local receptive field, when the input feature map is slightly shifted, the activations of the units in the feature maps shift by the same amount. A pooling layer follows a convolutional layer to downsample the feature maps of the preceding convolutional layer. Each feature map in a pooling layer is linked to a feature map in the convolutional layer; each unit in a feature map of the pooling layer is computed from a subset of units within a local receptive field in the corresponding convolutional feature map. Similar to the convolutional layer, the receptive field finds a representative value among the units in its field. Normally, the shift of the receptive field during convolution is set equal to the size of the receptive field for subsampling, helping the CNN to be translation invariant. Theoretically, the gradient descent method combined with a back-propagation algorithm can also be applied to learning the parameters of a CNN. However, due to the special mechanisms of weight sharing, the local receptive field, and pooling, slight changes need to be made: one needs to sum the gradient of a given weight over all the connections using the kernel weights, determine which patch in the layer's feature map corresponds to a unit in the next layer's feature map, and upsample the feature maps of the pooling layer to recover the reduced size of the maps. CNN-based volumetric medical image segmentation algorithms can be categorized into two groups: 2D CNN-based and 3D CNN-based methods.

The 2D CNN-based methods usually segment the volumetric CT or MR data in a slice-by-slice manner. For example, one study proposed two-pathway shallow networks with various cascaded architectures for low/high grade glioblastoma segmentation in brain MR images [59]. Another study proposed spatial aggregation of holistically-nested networks for pancreas segmentation in CT scans [60]. Other representative work was U-Net, which formed a 2D fully convolutional architecture and was efficient for dense medical image segmentation [61] [62]. Even though these 2D CNN-based methods have greatly improved segmentation accuracy over traditional hand-crafted feature-based methods, they are not optimal for volumetric medical image analysis, as they cannot take full advantage of the spatial information encoded in the volumetric data. The 2.5D methods have been proposed to incorporate richer spatial information but are still limited to 2D kernels [63]. To overcome this difficulty, 3D CNN-based algorithms have recently been proposed, aiming at extracting more powerful volumetric representations across all three spatial dimensions [64] [65]. For example, one study proposed a 3D network consisting of dual pathways, with a training strategy that exploited a dense inference technique on image segments to overcome the computational burden. Moreover, using a 3D conditional random field model, this framework demonstrated greater performance on lesion segmentation from multi-channel MR volumes with traumatic brain injuries, brain tumors, and ischemic stroke.

Several 3D volume-to-volume segmentation networks have been proposed. The 3D U-Net extended the 2D U-Net into a 3D version, with an analysis path to abstract features and a synthesis path to produce a full-resolution segmentation [66]. Shortcut connections were established between layers of equal resolution in the analysis and synthesis paths. The V-Net divided the architecture into stages and incorporated residual connections [67]. The V-Net was trained with a novel Dice coefficient-based objective function that focused on dealing with class imbalance. The VoxResNet borrowed deeply from the spirit of 2D deep residual learning [68] and constructed a very deep 3D network [69]. Multi-modality input and multi-level contextual information were further leveraged to produce better brain segmentation results. The I2I-3D was proposed for vascular boundary detection [70]. To localize small vascular structures, complex multi-scale interactions were designed via a mixing layer which concatenated features in upper and lower layers, followed by a 1 × 1 × 1 convolution. The I2I-3D included auxiliary supervision via side outputs in a holistic and dense manner.

7.2.3 Challenges of 3D CNN-Based Medical Image Segmentation

Recently, convolutional neural networks, leveraging their hierarchically learned and highly representative features, have revolutionized natural image processing [68] [62] and have also been applied successfully in the medical image analysis domain [71] [72] [73].

Deep learning-based methods have emerged as a competitive and important branch of alternatives for traditional medical image segmentation tasks. Deep CNNs have shown remarkable success in 2D medical image segmentation [74] [64] [75], but it remains difficult for CNNs to segment objects from 3D medical images. This is a promising area for our next research: we plan to study deep learning-based methods such as deep CNNs for 2D medical image segmentation, explore the possibility of parallelizing segmentation with deep CNNs, and develop a parallel algorithm to improve performance. Using deep CNNs to segment objects from 3D medical images poses two challenges. First, 3D medical images have a much more complicated anatomical environment than 2D images, so 3D variants of CNNs with many more parameters are usually required to capture more representative features.

Second, training such a 3D CNN often confronts various optimization difficulties, such as over-fitting, vanishing or exploding gradients, and slow convergence.

7.2.4 Approach to the Challenges

Our first approach to these challenges begins with research on 3D convolutional networks for volumetric medical image segmentation. A 3D CNN is capable of encoding representations from volumetric receptive fields and extracting more discriminative features via 3D spatial information. The main components of a 3D CNN are 3D convolutional layers and 3D sub-sampling layers, which are successively stacked into a hierarchical architecture. The feature maps, a set of 3D tensors, contain the neuron activations in each layer. To generate a new feature volume in a convolutional layer, a set of 3D kernels sweeps over the inputs; the activations from these kernels are summed, a bias term is added, and finally a non-linear activation function is applied. The neurons have sparse interactions, and the kernel weights are spatially shared, which greatly reduces the number of parameters and hence alleviates the computational workload of the model. The 3D kernels are learned via stochastic gradient descent in a data-driven manner, a key advancement of convolutional networks over the pre-defined transformations of traditional hand-crafted features. In a sub-sampling layer, the output responses from a convolutional layer are further modified by computing a summary statistic of nearby neurons. Using 3D max-pooling functions, the maximum response within a small cubic neighborhood is selected and passed to subsequent computations. After the pooling operation, the resolution of the feature volumes is reduced according to the pooling kernel size. Theoretically, pooling helps the learned features become invariant to local translations in 3D space, which is a very useful characteristic for image processing [76]. Brosch developed a 3D deep convolutional encoder network that combined interconnected convolutional and deconvolutional pathways [77]. The convolutional pathway learned higher-level features, and the deconvolutional pathway predicted the voxel-level segmentation. This dual-pathway design and its training strategy exploit a dense inference technique on image segments to reduce the computational burden. This approach could guide the development of a 3D fully convolutional architecture with 3D deconvolutional layers bridging the coarse feature volumes to dense probability predictions for voxel-level segmentation, addressing the first challenge of volumetric medical image segmentation.
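The 3D convolution and max-pooling operations described above can be sketched in plain NumPy. This is a minimal single-channel, valid-padding illustration under our own assumptions (function and parameter names are ours); in practice these loops would be executed as massively parallel GPU kernels:

```python
import numpy as np

def conv3d(volume, kernels, bias):
    """Naive valid 3D convolution: each of the K kernels sweeps the
    input volume, activations are summed, a per-kernel bias is added,
    and a ReLU non-linearity is applied."""
    D, H, W = volume.shape
    K, kd, kh, kw = kernels.shape
    out = np.empty((K, D - kd + 1, H - kh + 1, W - kw + 1))
    for k in range(K):
        for z in range(out.shape[1]):
            for y in range(out.shape[2]):
                for x in range(out.shape[3]):
                    patch = volume[z:z + kd, y:y + kh, x:x + kw]
                    out[k, z, y, x] = np.sum(patch * kernels[k]) + bias[k]
    return np.maximum(out, 0.0)  # non-linear activation (ReLU)

def max_pool3d(feat, s=2):
    """3D max-pooling: keep the maximum response in each s*s*s cube,
    reducing the resolution by the pooling kernel size."""
    K, D, H, W = feat.shape
    out = np.empty((K, D // s, H // s, W // s))
    for k in range(K):
        for z in range(D // s):
            for y in range(H // s):
                for x in range(W // s):
                    cube = feat[k, z*s:(z+1)*s, y*s:(y+1)*s, x*s:(x+1)*s]
                    out[k, z, y, x] = cube.max()
    return out
```

Note how the same K x kd x kh x kw kernel weights are reused at every spatial position: this spatial sharing is exactly what keeps the parameter count independent of the volume size.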

Our second approach begins with research on a 3D deep supervision mechanism. To segment structures from the complicated anatomical environment in volumetric medical images, we usually need relatively deep models to encode highly representative features. On the other hand, training a deep network is widely recognized as a difficult task. One notorious problem is vanishing or exploding gradients, which make loss back-propagation ineffective and hamper the convergence of the training process [78]. Bradley found that the back-propagated gradients become


smaller as they move from the output layer towards the input layer during training [79].

This makes different layers in the network receive gradients of very different magnitudes, leading to poor conditioning and slower training. These training challenges can be critical in volumetric medical image segmentation tasks due to the low inter-class voxel variation in medical images, the larger number of parameters in 3D networks compared with their 2D counterparts, and the limited training data in many medical applications.

Therefore, we could develop a 3D deep supervision mechanism by formulating an objective function that directly guides the training of both upper and lower layers, reinforcing the propagation of gradient flows within the network and thereby learning more powerful and representative features. This mechanism should simultaneously speed up the optimization process and improve the discrimination capability of the model, addressing the second challenge of volumetric medical image segmentation.
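One common way to realize such a deeply supervised objective is to add weighted auxiliary losses computed from side outputs of lower layers to the main loss, so gradients are injected directly into early layers instead of arriving only from the top. The sketch below is our own illustration of this idea (the choice of voxel-wise cross-entropy and fixed auxiliary weights is an assumption, not a prescribed design):

```python
import numpy as np

def cross_entropy(prob, target, eps=1e-12):
    """Voxel-wise binary cross-entropy over a probability volume."""
    p = np.clip(prob, eps, 1.0 - eps)
    return float(-np.mean(target * np.log(p) + (1 - target) * np.log(1 - p)))

def deeply_supervised_loss(main_prob, side_probs, target, weights):
    """Total objective = main loss + weighted auxiliary losses from
    side outputs, which guides the training of both upper and lower
    layers and reinforces gradient flow within the network."""
    loss = cross_entropy(main_prob, target)
    for w, side in zip(weights, side_probs):
        loss += w * cross_entropy(side, target)
    return loss
```

During training, the auxiliary weights are typically decayed so that, near convergence, the objective is dominated by the main prediction.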


REFERENCES

[1] S. AlZubi, M. Sharif and M. Abbod, "Multiresolution analysis using wavelet, ridgelet, and curvelet transforms for medical image segmentation," Biomed Imaging, vol. 4, 2011.

[2] C. Chen, L. F. Pau and P. S. Wang, Handbook of Pattern Recognition and Computer Vision, World Scientific, vol. 27, 2010.

[3] K. S. Chuang, H. L. Tzeng, S. Chen, J. Wu and T. J. Chen, "Fuzzy c-means clustering with spatial information for image segmentation," Computerized Medical Imaging and Graphics, vol. 30, no. 1, pp. 9-15, 2006.

[4] A. Eklund, P. Dufort, D. Forsberg and S. LaConte, "Medical image processing on the GPU - past, present and future," Medical Image Analysis, vol. 17, no. 8, pp. 1073-1094, December 2013.

[5] L. G. Ugarriza, E. Saber, S. R. Vantaram, V. Amuso, M. Shaw and R. Bhaskar, "Automatic image segmentation by dynamic region growth and multiresolution merging," IEEE Trans. Image Process., vol. 18, no. 10, pp. 2275-2288, 2009.


[6] L. Wang, B. Yang, Y. Chen, Z. Chen and H. Sun, "Accelerating FCM neural network classifier using graphics processing units with CUDA," Appl. Intell., vol. 40, no. 1, pp. 143-153, 2014.

[7] C. Moler, "MATLAB incorporates LAPACK," MathWorks Technical Articles and Newsletters, 2000.

[8] W. Cai, S. Chen and D. Zhang, "Fast and robust fuzzy c-means clustering algorithms incorporating local information for image segmentation," Pattern Recognition, vol. 40, no. 3, pp. 825-838, 2007.

[9] L. O. Hall and D. B. Goldgof, "Convergence of the single-pass and online fuzzy c-means algorithms," IEEE Trans. Fuzzy Systems, vol. 19, no. 4, pp. 792-794, 2011.

[10] S. Murugavalli and V. Rajamani, "A high speed parallel fuzzy c-means algorithm for brain tumor segmentation," BIME Journal, vol. 6, no. 1, pp. 29-33, 2006.

[11] S. Rahimi, M. Zargham, A. Thankre and D. Chhillar, "A parallel fuzzy c-means algorithm for image segmentation," IEEE Annual Meeting of the Fuzzy Information, vol. 1, pp. 234-237, 2004.

[12] G. Pratx and L. Xing, "GPU computing in medical physics: a review," Medical Physics, vol. 38, no. 5, pp. 2685-2697, 2011.


[13] K. M. Jaber, R. Abdulah and N. A. Rashid, "Fast decision tree-based method to index large DNA-protein sequence databases using hybrid distributed shared memory programming model," International Journal of Bioinformatics Research and Applications, vol. 10, pp. 321-340, 2014.

[14] C. Leopold, Parallel and Distributed Computing: A Survey of Models, Paradigms and Approaches, A Wiley-Interscience publication, 2001.

[15] A. Wee, C. Liew and H. Yan, "Current methods in the automatic tissue segmentation of 3D magnetic resonance brain images," Current Medical Imaging Reviews, vol. 2, pp. 1-13, 1 November 2006.

[16] J. Rajapakse, J. Giedd and J. Rapoport, "Statistical approach to segmentation of single channel cerebral MR images," in IEEE Trans. on Medical Imaging, April 1997.

[17] M. N. Ahmed and A. A. Farag, "Two-stage neural network for medical volume segmentation," Journal of Pattern Recognition Letters, 1998.

[18] L. O. Hall, A. M. Bensaid, L. P. Clarke, R. P. Velthuizen, M. S. Silbiger and J. C. Bezdek, "A comparison of neural network and fuzzy clustering techniques in segmenting magnetic resonance images of the brain," IEEE Transactions on Neural Networks, vol. 3, no. 5, pp. 672-681, 1992.

[19] P. G. Kumbhar and S. N. Holambe, "A review of image thresholding techniques," International Journal of Advanced Research in Computer Science and Software Engineering, vol. 5, no. 6, 2015.

[20] N. Otsu, "A threshold selection method from gray-level histograms," IEEE Transactions on Systems, Man, and Cybernetics, vol. 9, no. 1, pp. 62-66, 1979.

[21] R. Nagaraju and M. J. Reddy, "Review of medical image segmentation with statistical approach - state of the art and analysis," International Journal of Computer Engineering & Technology, vol. 8, no. 3, pp. 36-55, 2017.

[22] N. Sharma and L. Aggarwal, "Automated medical image segmentation techniques," Journal of Medical Physics, vol. 35, no. 1, pp. 3-14, 2010.

[23] C. Amza, "A review on neural network-based image segmentation techniques," De Montfort University Mechanical and Manufacturing Engineering, Leicester, 200.


[24] F. Milletari, N. Navab and S.-A. Ahmadi, "V-Net: fully convolutional neural networks for volumetric medical image segmentation," in Fourth International Conference on 3D Vision, 2016.

[25] J. C. Bezdek, L. O. Hall and L. P. Clarke, "Review of MR image segmentation techniques using pattern recognition," Medical Physics, vol. 20, no. 4, pp. 1033-1048, 1993.

[26] W. Pedrycz and J. Waletzky, "Fuzzy clustering with partial supervision," IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 27, no. 5, pp. 787-795, 1997.

[27] M. Ahmed, S. Yamany, N. Mohamed, A. Farag and T. Moriarty, "A modified fuzzy c-means algorithm for bias field estimation and segmentation of MRI data," IEEE Transactions on Medical Imaging, vol. 21, no. 3, pp. 193-199, 2002.

[28] R. C. Gonzalez and R. E. Woods, Digital Image Processing, 3rd ed., Upper Saddle River, NJ: Prentice-Hall, Inc., 2006.

[29] E. A. El-Dahshan, H. M. Mohsen, K. Revett and A. M. Salem, "Computer-aided diagnosis of human brain tumor through MRI: a survey and a new algorithm," Expert Systems with Applications, vol. 41, pp. 5526-5545, 2014.


[30] J. C. Bezdek, R. Ehrlich and W. Full, "The fuzzy c-means clustering algorithm," Computers & Geosciences, vol. 10, no. 2-3, pp. 191-203, 1984.

[31] S. Krinidis and V. Chatzis, "A robust fuzzy local information c-means clustering algorithm," IEEE Transactions on Image Processing, vol. 19, no. 5, pp. 1328-1337, 2010.

[32] C. Li, R. Huang, Z. Ding, J. C. Gatenby, D. N. Metaxas and J. C. Gore, "A level set method for image segmentation in the presence of intensity inhomogeneities with application to MRI," IEEE Transactions on Image Processing, vol. 20, no. 7, pp. 2007-2016, 2011.

[33] H. Ng, S. Ong, K. Foong, P. Goh and W. Nowinski, "Medical image segmentation using k-means clustering and improved watershed algorithm," in IEEE Southwest Symposium on Image Analysis and Interpretation, 2006.

[34] G. Litjens, T. Kooi, B. E. Bejnordi, A. A. A. Setio, F. Ciompi, M. Ghafoorian, J. A. van der Laak, B. van Ginneken and C. I. Sánchez, "A survey on deep learning in medical image analysis," Medical Image Analysis, vol. 42, pp. 60-88, 2017.


[35] A. C. Liew and H. Yan, "An adaptive spatial fuzzy clustering algorithm for 3D MR image segmentation," IEEE Transactions on Medical Imaging, vol. 22, no. 9, pp. 1063-1075, 2003.

[36] A. E. Lefohn, J. E. Cates and R. T. Whitaker, "Interactive, GPU-based level sets for 3D segmentation," in Medical Image Computing and Computer-Assisted Intervention, Springer, 2003.

[37] J. P. Walters, V. Balu, S. Kompalli and V. Chaudhary, "Evaluating the use of GPUs in liver image segmentation and HMMER database searches," in IEEE International Symposium on Parallel & Distributed Processing, 2009.

[38] B. Fulkerson and S. Soatto, "Really quick shift: image segmentation on a GPU," in Trends and Topics in Computer Vision, Springer, 2010.

[39] M. Klodt and D. Cremers, "A convex framework for image segmentation with moment constraints," in IEEE International Conference on Computer Vision, 2011.

[40] H. Jordan and G. Alaghband, Fundamentals of Parallel Processing, Prentice Hall, 2003.

[41] NVIDIA, "CUDA C Programming Guide," NVIDIA, 2017.

[42] [Online]. Available: https://www.nuget.org/packages/fo-dicom/.

[43] W. E. Lorensen and H. E. Cline, "Marching cubes: a high resolution 3D surface construction algorithm," in ACM SIGGRAPH, 1987.

[44] L. Yao and K. S. Weng, "On a type-2 fuzzy clustering algorithm," in The Fourth International Conferences on Pervasive Patterns and Applications, 2012.

[45] C. Hwang and F. Rhee, "Uncertain fuzzy clustering: interval type-2 fuzzy approach to c-means," IEEE Trans. Fuzzy Systems, vol. 15, no. 1, pp. 107-120, 2007.

[46] R. Krishnapuram and J. M. Keller, "The possibilistic c-means algorithm: insights and recommendations," IEEE Transactions on Fuzzy Systems, vol. 4, no. 3, pp. 385-393, 1996.

[47] E. Rubio and O. Castillo, "Interval type-2 fuzzy clustering algorithm using the combination of the fuzzy and possibilistic c-means algorithms," in IEEE Conference on Norbert Wiener in the 21st Century, 2014.

[48] S. Cook, CUDA Programming: A Developer's Guide to Parallel Computing with GPUs, 2012.


[49] J. C. Bezdek, "Cluster validity with fuzzy sets," Journal of Cybernetics, vol. 3, pp. 58-73, 1974.

[50] J. C. Bezdek, "Mathematical models for systematics and taxonomy," in Eighth International Conference on Numerical Taxonomy, San Francisco, 1975.

[51] A. Kronman and L. Joskowicz, "A geometric method for the detection and correction of segmentation leaks of anatomical structures in volumetric medical images," Int. J. Comput. Assist. Radiol. Surg., vol. 11, no. 3, pp. 369-380, 2016.

[52] Y. Lecun, L. Bottou, Y. Bengio and P. Haffner, "Gradient-based learning applied to document recognition," Proc. IEEE, vol. 86, no. 2, pp. 278-324, 1998.

[53] M. Havaei, A. Davy, D. Warde-Farley, A. Biard, A. Courville, Y. Bengio, C. Pal, P.-M. Jodoin and H. Larochelle, "Brain tumor segmentation with deep neural networks," Medical Image Analysis, vol. 35, pp. 18-31, 2017.

[54] H. R. Roth, L. Lu, A. Farag, A. Sohn and R. M. Summers, "Spatial aggregation of holistically-nested networks for automated pancreas segmentation," in Springer International Publishing, Cham, 2016.


[55] O. Ronneberger, P. Fischer and T. Brox, "U-net: convolutional networks for biomedical image segmentation," in Medical Image Computing and Computer-Assisted Intervention, Springer, 2015.

[56] J. Long, E. Shelhamer and T. Darrell, "Fully convolutional networks for semantic segmentation," in Computer Vision and Pattern Recognition, 2015.

[57] H. R. Roth, L. Lu, A. Seff, K. M. Cherry, J. Hoffman, S. Wang, J. Lu, E. Turkbey and R. M. Summers, "A new 2.5D representation for lymph node detection using random sets of deep convolutional neural network observations," in The International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2014.

[58] H. Chen, Y. Zheng, J.-H. Park, P.-A. Heng and S. K. Zhou, "Iterative multi-domain regularized deep learning for anatomical structure detection and segmentation from ultrasound images," in International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2016.

[59] K. Kamnitsas, C. Ledig, V. J. Newcombe, J. P. Simpson, A. D. Kane, D. K. Menon, D. Rueckert and B. Glocker, "Efficient multi-scale 3D CNN with fully connected CRF for accurate brain lesion segmentation," Medical Image Analysis, vol. 36, pp. 61-78, 2017.


[60] O. Cicek, A. Abdulkadir, S. S. Lienkamp, T. Brox and O. Ronneberger, "3D U-Net: learning dense volumetric segmentation from sparse annotation," Springer International Publishing, Cham, 2016.

[61] K. He, X. Zhang, S. Ren and J. Sun, "Deep residual learning for image recognition," in Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 2016.

[62] H. Chen, Q. Dou, L. Yu, J. Qin and P.-A. Heng, "VoxResNet: deep voxelwise residual networks for brain segmentation from 3D MR images," NeuroImage, vol. 4, p. 41, 2017.

[63] J. Merkow, A. Marsden, D. Kriegman and Z. Tu, "Dense volume-to-volume vascular boundary detection," in 19th International Conference: Medical Image Computing and Computer-Assisted Intervention, Athens, Greece, 2016.

[64] S. Albarqouni, C. Baur, F. Achilles, V. Belagiannis, S. Demirci and N. Navab, "AggNet: deep learning from crowds for mitosis detection in breast cancer histology images," IEEE Trans. Medical Imaging, vol. 35, no. 5, pp. 1313-1321, 2016.

[65] H. Chen, X. Qi, L. Yu, Q. Dou, J. Qin and P.-A. Heng, "DCAN: deep contour-aware networks for object instance segmentation from histology images," Medical Image Analysis, vol. 36, pp. 135-146, 2017.


[66] P. Moeskops, M. A. Viergever, A. M. Mendrik, L. S. de Vries, M. J. Benders and I. Isgum, "Automatic segmentation of MR brain images with a convolutional neural network," IEEE Trans. Medical Imaging, vol. 35, no. 5, pp. 1252-1261, 2016.

[67] F. Xing, Y. Xie and L. Yang, "An automatic learning-based framework for robust nucleus segmentation," IEEE Trans. Medical Imaging, vol. 35, no. 2, pp. 550-566, 2016.

[68] N. Dhungel, G. Carneiro and A. P. Bradley, "Deep learning and structured prediction for the segmentation of mass in mammograms," in The International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2015.

[69] I. Goodfellow, Y. Bengio and A. Courville, Deep Learning, MIT Press, 2016.

[70] T. Brosch et al., "Deep 3D convolutional encoder network with shortcuts for multiscale feature integration applied to multiple sclerosis lesion segmentation," IEEE Trans. Medical Imaging, vol. 35, no. 5, pp. 1229-1261, 2016.

[71] X. Glorot and Y. Bengio, "Understanding the difficulty of training deep feedforward neural networks," in Artificial Intelligence and Statistics, 2010.

[72] D. M. Bradley, "Learning in Modular Systems," DTIC Document, 2010.


[73] E. Olmedo, J. Calleja, A. Benitez and M. A. Medina, "Point to point processing of digital images using parallel computing," International Journal of Computer Science Issues, vol. 9, no. 3, pp. 1-10, 2012.

[74] W. K. Pratt, Digital Image Processing, 4th ed., Los Altos, California: John Wiley & Sons, Inc., 2007.

[75] S. Park, J. Lee, J. Shin, J. Seo and K. H. Lee, "Parallelized seeded region growing using CUDA," pp. 1-10, 2014.

[76] S. Ravi and A. M. Khan, "Morphological operations for image processing: understanding and its application," in Signal Processing & Communications, 2013.

[77] T. Kalaiselvi, P. Sriramakrishnan and K. Somasundaram, "Performance analysis of morphological operations in CPU and GPU for accelerating digital image applications," International Journal of Computer Science and Information Technology, vol. 4, no. 1, pp. 15-27, 2016.

[78] J. M. Koay, Y. C. Chang, S. M. Tahir and S. Sreeramula, "Parallel implementation of morphological operation on binary image using CUDA," Advance Machine Learning and Signal Processing, vol. 387, pp. 163-173, 2016.


[79] L. Vincent and P. Soille, "Watersheds in digital spaces: an efficient algorithm based on immersion simulation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 13, no. 6, pp. 583-598, 1991.

[80] L. Pan, I. Gu and J. Xu, "Implementation of medical image segmentation in CUDA," in IEEE Proceedings of the International Conference on Technology and Applications in Biomedicine, 2008.

[81] G. Vitor, J. Ferreira and A. Korbes, "Fast image segmentation by watershed transform on graphical hardware," in Proceedings of 30th CILAMCE, 2009.

[82] E. Olmedo, J. Calleja, A. Benitez and M. A. Medina, "Point to point processing of digital images using parallel computing," IJCSI International Journal of Computer Science, vol. 9, no. 3, pp. 1-10, 2012.

[83] S. Park, J. Lee, H. Lee, J. Shin, J. Seo and K. Lee, "Parallelized seeded region growing using CUDA," pp. 1-10, 2014.

[84] J. Serra, "Introduction to mathematical morphology," Computer Vision, Graphics, and Image Processing, vol. 35, no. 3, pp. 283-305, 1986.

[85] T. Kalaiselvi, P. Sriramakrishnan and K. Somasundaram, "Performance analysis of morphological operations in CPU and GPU for accelerating digital image applications," International Journal of Computer Science and Information Technology, vol. 4, no. 1, pp. 15-27, 2016.

[86] G. Vitor, J. Ferreira and A. Korbes, "Fast image segmentation by watershed transform on graphical hardware," in 30th CILAMCE, 2009.

[87] L. Pan, L. Gu and J. Xu, "Implementation of medical image segmentation in CUDA," in The International Conference on Technology and Applications in Biomedicine, 2008.
