Journal of Imaging
Article

Automatic Gleason Grading of Prostate Cancer Using Shearlet Transform and Multiple Kernel Learning
Hadi Rezaeilouyeh * and Mohammad H. Mahoor
Department of Electrical and Computer Engineering, University of Denver, Denver, CO 80208, USA; [email protected]
* Correspondence: [email protected]
Academic Editors: Gonzalo Pajares Martinsanz, Philip Morrow and Kenji Suzuki
Received: 11 May 2016; Accepted: 6 September 2016; Published: 9 September 2016
Abstract: The Gleason grading system is generally used for histological grading of prostate cancer. In this paper, we first introduce the use of the Shearlet transform and its coefficients as texture features for automatic Gleason grading. The Shearlet transform is a mathematical tool defined based on affine systems; it can analyze signals at various orientations and scales and detect singularities, such as image edges. These properties make the Shearlet transform more suitable for Gleason grading than other transform-based feature extraction methods, such as the Fourier transform and the wavelet transform. We also extract color channel histograms and morphological features. These features are the essential building blocks of what pathologists consider when they perform Gleason grading. We then use the multiple kernel learning (MKL) algorithm to fuse all three types of extracted features, and we use support vector machines (SVM) equipped with MKL to classify prostate slides with different Gleason grades. Using the proposed method, we achieved high classification accuracy on a dataset containing 100 prostate cancer sample images of Gleason Grades 2–5.
Keywords: Gleason grading; multiple kernel learning; prostate cancer; Shearlet transform; texture analysis
1. Introduction
1.1. Motivation

Prostate cancer is diagnosed more often than any other cancer in men with the exception of skin cancer [1]. It is estimated that 180,890 new cases of prostate cancer will occur in the U.S. in 2016 [1]. Prostate cancer cells can transfer to other tissues and develop new damaging tumors. Therefore, it is vital to diagnose prostate cancer in the early stages and to provide necessary treatment. There are several screening methodologies for prostate cancer diagnosis. Histological grading is an essential key in prostate cancer diagnosis and treatment methodology [2]. Cancer aggressiveness is quantified using histological grading of biopsy specimens of the tissue. The tissue is usually stained by the hematoxylin and eosin (H&E) technique [2]. The Gleason grading system is widely used for prostate grading [3]; it classifies prostate cancer into Grades 1–5, as shown in Figure 1. Higher Gleason grades indicate a higher malignancy level. The Gleason score is calculated as the sum of the two most dominant Gleason grades inside a tissue and ranges from 2 to 10. Patients with a combined score between two and four have a high chance of survival, while patients with a score of 8–10 have a higher mortality rate [2]. To better understand the difference between normal and malignant tissues, we illustrate sample images in Figure 2. Figure 2a shows normal prostate tissue containing gland units enclosed by stroma.
J. Imaging 2016, 2, 25; doi:10.3390/jimaging2030025 www.mdpi.com/journal/jimaging

Gland units contain epithelial cells around the lumen. When cancer happens, epithelial cells randomly duplicate, disturbing the normal structure of glands, as shown in Figure 2b,c.
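As a toy illustration of the scoring rule described above (not part of the authors' pipeline), the combined Gleason score is simply the sum of the two most dominant grades:

```python
def gleason_score(primary: int, secondary: int) -> int:
    """Combined Gleason score: the sum of the two most dominant Gleason
    grades (each 1-5) found in the tissue, so the score ranges from 2-10."""
    if not (1 <= primary <= 5 and 1 <= secondary <= 5):
        raise ValueError("Gleason grades must be in the range 1-5")
    return primary + secondary
```

A primary grade of 3 with a secondary grade of 4 thus yields a combined score of 7.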
Figure 1. Gleason grading system depiction [4]. Higher Gleason grades indicate higher malignancy.
Figure 2. Prostate tissue samples. (a) Benign; (b) Grade 2; (c) Grade 5 [5].

Furthermore, the shape of normal and benign cells has a recognizable pattern, while malignant cells are described by irregular morphology in nuclei [6]. Malignant cells have larger nuclei and less cytoplasm than benign cells. Figure 2b,c shows Gleason Grade 2 and 5 malignant tissues, respectively.

Automatic medical image classification is an important field mainly based on image processing methods. Automatic processing of cancer cell images can be less time consuming and more efficient than pathologist examination.
Furthermore, computerized analysis is not prone to the inter-observer and intra-observer variations resulting from human grading [7]. In this paper, we extract different features, including a new feature based on the Shearlet transform, combine them using multiple kernel learning (MKL) and utilize them for the automatic Gleason grading of prostate tissue.
1.2. Literature Review

Several methods have been proposed for automatic Gleason grading [7–16]. We categorize those papers based on the features they use and review them in the following. The features are: (1) structural; (2) texture; and (3) color.

Structural techniques are based on cell segmentation and then feature extraction [7–11]. Farjam et al. [7] extracted structural features of the glands and used them in a tree-structured algorithm to classify the images into five Gleason grades of 1–5. First, the gland units were located using texture features. Then, the authors used a multistage classification method based on morphometric and texture features extracted from gland units to classify the image into Grades 1–5. They reported classification accuracies of 95% and 85% on two different datasets using the leave-one-out technique. Salman et al. [8] extracted joint intensity histograms of hematoxylin and eosin components from the histological images and used k-nearest neighbor supervised classification to delineate the benign and malignant glands. They compared their results with the histogram of oriented gradients and manual annotation by a pathologist and showed that their method outperformed the other methods. Gertych et al. [9] extracted intensity histograms and joint distributions of the local binary patterns and local variance features and used a machine learning approach containing a support vector machine followed by a random forest classifier to classify prostate tissue into stroma, benign, and prostate cancer regions. Khurd et al. [10] used random forests to cluster the filter responses from a rotationally-invariant filter bank at each pixel of the histological image into textons. Then, they used the spatial pyramid match kernel along with an SVM classifier to classify different Gleason grade images. A microscopic image segmentation method was proposed in [11].
The authors transformed the images to the HSV domain and used mean shift clustering for segmentation. They extracted texture features from the cells and used an SVM classifier to classify lymphocytes.

Some other techniques are based on texture features or a combination of texture, color and morphological features [12–16]. Jafari and Soltanian-Zadeh [5] used multiwavelets for the task of automatic Gleason grading of prostate tissue. The authors compared the results from features based on multiwavelets with co-occurrence matrices and wavelet packets. They used a k-nearest neighbor classifier to classify images. They assigned weights to features and optimized those weights using simulated annealing. They reported an accuracy of 97% on a set of 100 images using the leave-one-out technique. Tabesh et al. [12] proposed a multi-feature system for the task of cancer diagnosis and Gleason grading. The authors combined color, texture and morphometric features for classification. They used 367 images for cancer detection and reported 96% accuracy. They also used 268 images for Gleason grading and reported 81% classification accuracy.

In a recent study, we used Shearlet coefficients for the automatic diagnosis of breast cancer [13]. We extracted features from multiple decomposition levels of Shearlet filters and used the histogram of Shearlet coefficients for the task of classification of benign and malignant breast slides using SVM. We applied our proposed method on a publicly-available dataset [14] containing 58 slides of breast tissue and achieved 75% classification accuracy. We also used the Shearlet transform for prostate cancer detection in histological images [15] and MR images [16], and we achieved 100% and 97% classification accuracy, respectively. Compared to our previous work [13–16], we present two novelties in this paper.
First, in contrast to our previous studies, where we used the histogram of Shearlet coefficients, here we extract additional statistics from the Shearlet coefficients using a co-occurrence matrix. This provides more information on the texture of the cancerous tissues, which in turn results in higher classification rates for the task of Gleason grading, as presented in our Experimental Results section. Second, we utilize other types of features (color and morphological) and combine them using MKL, which makes the method proposed in this paper more comprehensive.
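Although the exact implementation is not published, co-occurrence statistics of the kind described above can be sketched as follows. The helper name `cooccurrence_features` and all parameter choices (8 quantization levels, a single horizontal pixel offset, the particular Haralick-style statistics) are illustrative assumptions, not the paper's exact settings:

```python
import numpy as np

def cooccurrence_features(coeffs, levels=8, offset=(0, 1)):
    """Quantize a 2-D map of Shearlet coefficient magnitudes and compute a
    co-occurrence matrix for one pixel offset, then derive Haralick-style
    statistics (contrast, energy, homogeneity) from it."""
    mag = np.abs(np.asarray(coeffs, dtype=float))
    # Quantize magnitudes into `levels` bins (a constant map falls into bin 0).
    span = np.ptp(mag)
    q = np.zeros_like(mag, dtype=int) if span == 0 else np.clip(
        (levels * (mag - mag.min()) / span).astype(int), 0, levels - 1)

    # Accumulate co-occurrence counts of quantized values for the offset.
    dr, dc = offset
    glcm = np.zeros((levels, levels))
    rows, cols = q.shape
    for r in range(max(0, -dr), rows - max(0, dr)):
        for c in range(max(0, -dc), cols - max(0, dc)):
            glcm[q[r, c], q[r + dr, c + dc]] += 1
    glcm /= glcm.sum()  # normalize to a joint probability distribution

    i, j = np.indices(glcm.shape)
    contrast = float(np.sum(glcm * (i - j) ** 2))
    energy = float(np.sum(glcm ** 2))
    homogeneity = float(np.sum(glcm / (1.0 + np.abs(i - j))))
    return contrast, energy, homogeneity
```

In practice, such statistics would be computed per Shearlet subband (scale and orientation) and concatenated into one texture feature vector.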
1.3. Proposed System Overview

Image classification problems rely on feature descriptors that point out the differences and similarities of texture while remaining robust in the presence of noise [17]. However, different applications need different feature descriptors. Therefore, there is an extensive amount of research carried out on finding the best feature extraction methods. In this paper, we propose to use three different types of features and fuse them using MKL [18].

Our main contribution in this paper is two-fold. First, we propose using the Shearlet transform and its coefficients as texture features for the automatic Gleason grading of prostate cancer. Traditional frequency-based methods cannot detect directional features [19]. On the other hand, the Shearlet transform [20] can detect features at different scales and orientations and provides efficient mathematical methods to find 2D singularities, such as image edges [21–23]. The Gleason grading images are typically governed by curvilinear features, which justifies the choice of the Shearlet transform as our main texture feature.

Second, we utilize the multiple kernel learning algorithm [18] for fusing the three different types of features used in our proposed approach. In multiple kernel learning, a kernel model is constructed using a linear combination of some fixed kernels. The advantage of using MKL lies in the nature of the classification problem we are trying to solve. Since we have different types of features extracted from images, it is possible to define a kernel for each of these feature types and to optimize their linear combination. This eliminates the need for feature selection.

After we have extracted the features, we use support vector machines (SVM) equipped with MKL for the classification of prostate slides with different Gleason grades. Then, we compare our results with the state of the art. The overall system is depicted in Figure 3.
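As an illustration of the fixed-kernel side of this idea, the sketch below combines one RBF Gram matrix per feature type with non-negative weights. The helper names and the RBF choice are assumptions for illustration; a real MKL solver would learn the weights β_m jointly with the SVM rather than fixing them:

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    """Gram matrix of the RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-gamma * d2)

def combine_kernels(feature_blocks, weights):
    """Linear combination K = sum_m beta_m K_m, one base kernel per feature
    type, with beta_m >= 0 normalized to sum to 1; these beta_m are exactly
    the quantities an MKL solver optimizes."""
    beta = np.asarray(weights, dtype=float)
    beta = beta / beta.sum()
    return sum(b * rbf_kernel(X) for b, X in zip(beta, feature_blocks))
```

The combined Gram matrix can then be fed to any SVM implementation that accepts precomputed kernels (e.g. scikit-learn's `SVC(kernel="precomputed")`).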
Figure 3. Overall system. MKL, multiple kernel learning.
The remainder of this paper is organized as follows. In the next section, three different types of image features are presented. In Section 3, feature fusion using the MKL algorithm and SVM-based classification is reported. In Section 4, we report the classification results. Finally, the conclusions are presented in Section 5.
2. Feature Analysis of Microscopic Images

As described in the previous section, Gleason grading is based on different properties of the tissue image. Therefore, we need a robust feature extraction method that considers all of the possible changes in the tissue due to cancer. In this section, we present features representing the color, texture and morphological properties of the cancerous tissues. These features are the essential building blocks of what pathologists consider when they perform Gleason grading. Some of the parameters in the following section, such as the number of histogram bins in color channel histograms, were selected based on our preliminary results.
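For instance, the color channel histograms mentioned above can be computed per RGB channel. The helper below is a hypothetical sketch; the bin count is a free parameter of the kind the authors tuned on preliminary results, and 16 bins is an assumption, not the paper's value:

```python
import numpy as np

def color_histogram_features(rgb, bins=16):
    """Concatenate per-channel intensity histograms of an RGB (e.g.
    H&E-stained) image into one normalized color feature vector."""
    feats = []
    for ch in range(3):
        hist, _ = np.histogram(rgb[..., ch], bins=bins, range=(0, 255))
        feats.append(hist / hist.sum())  # normalize each channel's histogram
    return np.concatenate(feats)
```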
2.1. Texture Features

Gleason grading is mainly based on texture features and the characteristics of the cancerous tissues. Taking another look at Figure 1 highlights the changes in texture due to malignancy: the texture becomes more detailed as the epithelial cell nuclei grow in a random manner and spread across the tissue. Therefore, we need an accurate and robust texture analysis tool. For this purpose, we propose a novel feature representation method using statistics extracted from discrete Shearlet coefficients.
2.1.1. Shearlet Transform

We utilize the Shearlet transform [19,20] as our main texture detection method. The Shearlet transform is a relatively new method developed to efficiently detect 1D and 2D directional features in images. The curvelet transform was the pioneer of this new family of transforms [24]. Curvelets are defined by three parameters: scale, location and rotation. However, the curvelet is sensitive to rotation, and a rotation breaks its structure. Furthermore, curvelets are not defined in the discrete domain, which makes them impractical for real-life applications. Therefore, we need a directional representation system that can detect anisotropic features in the discrete domain. To overcome these limitations, the Shearlet transform was developed based on the theory of affine systems. The Shearlet transform uses shearing instead of rotation, which in turn allows for discretization. The continuous Shearlet transform [19] is defined as the following mapping for $f \in L^2(\mathbb{R}^2)$:
$SH_{\Psi} f(a, s, t) = \langle f, \Psi_{a,s,t} \rangle$    (1)
Then, the Shearlets are given by:
$\Psi_{a,s,t}(x) = |\det M_{a,s}|^{-1/2} \, \Psi(M_{a,s}^{-1} x - t)$    (2)
where $M_{a,s} = \begin{pmatrix} a & s\sqrt{a} \\ 0 & \sqrt{a} \end{pmatrix}$ for $a > 0$, $s \in \mathbb{R}$, $t \in \mathbb{R}^2$. We can observe that $M_{a,s} = B_s A_a$, where $A_a = \begin{pmatrix} a & 0 \\ 0 & \sqrt{a} \end{pmatrix}$ and $B_s = \begin{pmatrix} 1 & s \\ 0 & 1 \end{pmatrix}$. Hence, $M_{a,s}$ consists of an anisotropic dilation produced by the matrix $A_a$ and a shearing produced by the matrix $B_s$. As a result, the Shearlets are well-localized waveforms. By sampling the continuous Shearlet transform in the discrete domain, we can acquire the discrete Shearlet transform [20]. Choosing $a = 2^{-j}$ and $s = -l$ with $j, l \in \mathbb{Z}$, one obtains the collection of matrices $M_{2^{-j},-l}$. By observing that $M_{2^{-j},-l}^{-1} = M_{2^{j},l} = \begin{pmatrix} 2^{j} & l\,2^{j/2} \\ 0 & 2^{j/2} \end{pmatrix} = B_0^{l} A_0^{j}$, where $A_0 = \begin{pmatrix} 2 & 0 \\ 0 & \sqrt{2} \end{pmatrix}$ and $B_0 = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$, the discrete system of Shearlets can be represented as:
$\Psi_{j,l,k}(x) = |\det A_0|^{j/2} \, \Psi(B_0^{l} A_0^{j} x - k)$    (3)

where $j, l \in \mathbb{Z}$, $k \in \mathbb{Z}^2$. The discrete Shearlets can deal with multidimensional functions. Due to the simplicity of the mathematical structure of Shearlets, their extensions to higher dimensions are very natural [25]. The discretized Shearlet system often forms a frame, which allows for a stable representation of images. This justifies this way of discretization. Furthermore, Shearlets are scaled based on parabolic scaling, which makes them appear as elongated objects in the spatial domain, as shown in Figure 4. To better understand the advantage of the Shearlet transform over wavelet transforms, we further discuss their scaling properties. Wavelets are based on isotropic scaling, which makes them isotropic transforms and, therefore, not suitable for detecting non-isotropic objects (e.g., edges, corners, etc.) [26–28]. On the other hand, since the Shearlet transform defines the scaling function using parabolic scaling, it can detect curvilinear structures, as shown in Figure 5. We refer our readers to [25] for more details on the Shearlet transform and its properties.
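The matrix factorizations above can be checked numerically. The short sketch below (illustrative only, not the authors' code; the single-letter function names mirror the mathematical notation) verifies that $M_{a,s} = B_s A_a$ and that $B_0^{l} A_0^{j}$ reproduces $M_{2^{j},l}$:

```python
import numpy as np

def A(a):
    """Anisotropic (parabolic) dilation matrix A_a = diag(a, sqrt(a))."""
    return np.array([[a, 0.0], [0.0, np.sqrt(a)]])

def B(s):
    """Shearing matrix B_s."""
    return np.array([[1.0, s], [0.0, 1.0]])

def M(a, s):
    """Composite Shearlet matrix M_{a,s} = B_s A_a."""
    return np.array([[a, s * np.sqrt(a)], [0.0, np.sqrt(a)]])

# M_{a,s} factors into a shear times an anisotropic dilation.
assert np.allclose(M(0.25, -1.5), B(-1.5) @ A(0.25))

# Discrete sampling: B_0^l A_0^j, with A_0 = A(2) and B_0 = B(1),
# reproduces M_{2^j, l}.
j, l = 3, 2
D = np.linalg.matrix_power(B(1.0), l) @ np.linalg.matrix_power(A(2.0), j)
assert np.allclose(D, M(2.0 ** j, float(l)))
```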