Paper Title (Use Style: Paper Title) s65

Granular Computing Approach for the Design of Medical Data Classification Systems

Mohammed Al-Shammaa*, Maysam F. Abbod Department of Electronic and Computer Engineering Brunel University London, United Kingdom [email protected], [email protected]

Abstract— Granular computing is a computation theory that decision making in diagnosing cardiovascular disease by imitates human thinking and reasoning by dealing with determining the risk of a patient suffering from the disease information at different levels of abstraction/precision. The within the next 10 years. A method for automatic construction adoption of granular computing approach in the design of data of fuzzy rule-based system for medical data classification is classification systems improves their performance in dealing with presented in [5]. The proposed method uses genetic algorithm data uncertainty and facilitates handling large volumes of data. to optimize subtractive clustering method for fuzzy rules In this paper, a granular computing approach for the design of generation from data. medical data classification systems is proposed. The proposed approach makes use of data granulation in training the classifier. In addition to fuzzy systems, the use of artificial neural Training data is granulated at different levels and data from each networks in data classification has been proposed by many level is used for constructing the classification system. To researches. A neural network model that is used as a biomarker evaluate performance of the proposed approach, a classification association network is proposed in [6]. The proposed model system based on neural network is implemented. Four medical captures the associations between the biomarkers by datasets are used to compare performance of the proposed minimizing an energy function. Based on the proposed model, approach to other classifiers: neural network classifier, ANFIS a cancer classification approach is developed. In [7] a spiking classifier and SVM classifier. Results show that the proposed wavelet radial basis neural network that can be used for the approach improves classification performance of neural network classifier and produces better accuracy and area under curve classification of gene expression data is presented. than other classifiers for most of the datasets used. Several other computational intelligence methods have been used in medical data classification including: fuzzy Keywords—Artificial neural network; granular computing; decision trees [8] [9], support vector machines [10] [11] and data clustering; data classification. adaptive neuro-fuzzy inference system [12] [13].

I. INTRODUCTION When designing data-driven systems, the concept of granular computing arises. Granular computing is a The task of data classification plays a major role in many computation theory that inspired by the way human process medical diagnosis and decision making procedures [1]. Data information. In this theory, information is handled in groups or classification is a supervised learning method that divides data entities called information granules. The use of granular into homogeneous (in terms of some similarity measure) computing approach in system design provides the flexibility groups (classes). There are two types of classification with of handling information at different levels of regard to the number of classes: binary classification (grouping abstraction/precision. the data into only two classes, and multiclass classification (classifying the data into more than two classes). In this paper, In this paper, a granular computing approach for the design only binary classification is considered since any multiclass of medical data classification systems is proposed. The classification task can be divided into multiples binary proposed approach makes use of multi-level data granulation in classification tasks. This can be achieved by classifying data training and constructing the classification system. into one class against all other classes [2]. This paper is arranged as follows. A brief description of Soft computing techniques have been applied to address granular computing is given in section II. The proposed classification problem. Many researchers have presented the approach is presented in section III. Section IV shows use of fuzzy rule-based classification systems. [3] presents a experimental results and comparison of the proposed method to fuzzy classification framework to extract fuzzy rules from other classification methods. Conclusions are given in section numerical data to construct medical diagnosis systems. In [4], a V. system that combines fuzzy rule-based classification systems with interval-valued fuzzy sets is proposed. The system assists II. GRANULAR COMPUTING To classify data, scores of the input data are computed Granular computing (GrC) is a computation theory that using second stage classifiers. The final stage classifier uses imitates human thinking and reasoning by dealing with similar these scores to compute the final output. items or elements as groups or entities called information The first stage provides different versions of training data granules. The size and number of information granules depends that differs in level of specificity / generality. The classifiers at on the granulation level. Granulation levels represent the levels the next stage produce scoring values for each version of each of abstraction/precision at which knowledge is extracted from data point and the final stage classifier is modelled using these data [14] [15]. values. In other words, original training data is replaced by GrC provides flexibility and adaptability in the level of scoring values of the multi-level granulated data. knowledge extraction and introduces a framework that includes Data granulation is achieved using a clustering algorithm. all methods of data cluster analysis, fuzzy sets, rough sets…etc. The clustering algorithm creates granules of data points whose [16] distance is less than a specific value that represents the Adoption of granular computing approach in the design of granulation level. Euclidean distance function is used to data classification systems improves their performance in computes distance. Class label of each granule is the label that dealing with data uncertainty and facilitates handling large has a number of data points more than CR within the granule. volumes of data. Many computational intelligence methods CR equals 0.5 for balanced data and n/N for imbalanced data, have been modified to incorporate the concept of granular where n=total number of data points that belong to the given computing. class in the whole training data and N is the total number of training data points. One example is granular neural networks. Artificial neural networks (ANNs) are based on the basic principle of biological Different granulation levels are obtained by increasing the neural networks in human brain. ANNs consist of a number of maximum distance between points that belong to the same artificial neurons that are connected together by weighted granule. When the total number of resulted granules is less than connections. Each neuron is a nonlinear mapping function that that of the previous level by a significant number (e.g. 10), a maps the weighted inputs to the output. Granular neural new level is obtained. networks combine ANNs with methods that belong to GrC like A granule is represented by its center and number of points fuzzy sets and rough sets [17] [18]. Fuzzy neural networks are belonging to it. A granule center is the arithmetic mean of all the result of the fuzzification of ANNs or applying fuzzy logic points in the granule. To simplify the process of training the reasoning to ANNs. On the other hand, rough neural networks classifiers, a granule is used multiple times equal to the number are produced by integrating ANNs and principles of rough sets. of its points so that the total number of training samples is One way of performing data granulation is by data always the same regardless of the number of granules – i.e. the clustering. There are several methods that cluster data points total number of original training data points. with a certain level of similarity into clusters. One clustering Any type of classifier can be used given that it can provide method is data granulation algorithm presented in [16]. scoring output – i.e. output ranging from 0 to 1. Neural In this paper, a granular computing approach for the design network classifiers are used in our study. of medical data classification systems is proposed. Instead of A schematic diagram of the proposed approach is shown in incorporating granular computing in the architectural design of Fig. 1 and the pseudo-code of the proposed approach is given the classifier, the proposed approach makes use of data in Algorithm 1. granulation in training the classifier. Training data is granulated at different levels and data from each level is used for Algorithm 1. Classifier Design Using Granular Computing constructing the classification system. Input: Training data: N×I matrix, N is the number of samples, I is the number of input variables; Class Labels: N×1

Output: Classification system CF, Cr (r=1….R) III. GRANULAR COMPUTING APPROACH FOR CLASSIFIER 1: For each selected level of granulation Lr (r=1….R) DESIGN 2: Perform data granulation Gr In this study, a data granulation approach is used to 3: Compute granules centers construct a binary classification system. The proposed approach consists of three stages. In the first stage, data 4: Train classifier Cr using granules’ centers, use each granule granulation is performed on the training data at different levels, Ng times (Ng= no. of points in granule g) i.e. from fine granulation to coarse granulation. Then, at the 5: Compute scores Sr using classifier Cr and granulated second stage, the resulted granulated data is used to train a set data Gr as input of classifiers that compute classification scores for each granule using its corresponding class label. This is performed 6: Repeat steps 1-5 for all selected granulation levels for each selected granulation level. In the final stage, resulted 7: Use scores S to train final stage classifier CF (S is the scores are used to train a classifier which performs the final concatenation of Sr, r=1….R) classification.

* PhD student, sponsored by the Higher Committee for Education Development (HCED), Iraq. (ANFIS) with different number of rules. Table 1 lists brief descriptions for each classifier used. 5-fold cross validation was used in the evaluation of all classifiers. Each data set is divided into five equal-sized partitions. Each classifier was tested five times with a different partition (20% of training samples). The remaining partitions (80% of training samples) were used for training.

TABLE I. DESCRIPTION OF CLASSIFIERS

Classifier Description Classifier designed using the granular computing GNN approach with 2-layer neural network classifiers in the second and final stages. Single layer neural network classifier with 10 1-layer NN neurons. 2-layer neural network classifier with 10 and 5 2-layer NN neurons in the first and second hidden layers respectively. ANFIS classifier with 2 rules fuzzy system 2-rule ANFIS constructed using fuzzy c-mean method (FCM) ANFIS classifier with 3 rules fuzzy system 3-rule ANFIS constructed using fuzzy c-mean method (FCM) ANFIS classifier with 5 rules fuzzy system 5-rule ANFIS constructed using fuzzy c-mean method (FCM) SVM Linear SVM classifier with linear kernel. SVM classifier with gaussian radial basis function SVM RBF kernel (sigma = 1). SVM Polynomial SVM classifier with polynomial kernel (order = 3).

TABLE II. DESCRIPTION OF DATASETS

No. of No. of Class Dataset record input Description ratio s variables Records of female Pima Indian Pima 768 8 268/500 patients (at least 21 years old) from USA tested for diabetes Liver patients and non liver patients records (441 male and ILPD 579 10 165/414 142 female) collected from north east of Andhra Pradesh, Fig. 1. Flowchart of the proposed approach. India. Heart Heart disease patients and non 270 13 120/150 IV. EXPERIMENTAL STUDY disease heart disease patients records. To implement and evaluate the proposed approach, a Records of blood tests which classification system is designed using granular computing are thought to be sensitive to Bupa 341 6 199/142 liver disorders that might arise approach with 2-layer neural network classifiers used in second from alcohol consumption in and last stage. A neural network of 10 neurons in the first males. hidden layer and 5 neurons in the second hidden layer is used for each classifier. Classification results are compared to those of other classifiers: single layer neural network, double layer neural network, support vector machine (SVM) with different kernels, and adaptive network-based fuzzy inference system Classification performance was evaluated using two TABLE IV. CLASSIFICATION RESULTS FOR ILPD DATASET measures: accuracy and area under curve (AUC). Four Classifier Accuracy Area Under Curve different datasets were used for performance evaluation: Pima Indians diabetes dataset, Indian liver patient dataset (ILPD), GNN 75.14% 71.76% heart disease dataset and Bupa liver disorders dataset. Table II 1-layer NN 72.72% 66.45% lists some characteristics and descriptions of these datasets. All datasets can be found at University of California in Irvine 2-layer NN 72.54% 64.93% (UCI) Machine Learning Repository [19]. Data samples 2-rule ANFIS 73.07% 69.65% containing missing values were deleted and all data were 3-rule ANFIS 71.85% 54.18% normalized before being used in training or testing. 5-rule ANFIS 71.51% 50.00% Classification results for the four datasets are given in Tables III-VI. For the first three datasets, results show that the SVM Linear 71.51% 50.00% granular computing approach has improved the classification SVM RBF 71.51% 50.00% accuracy and AUC of the neural network classifier (2-layer NN). In addition, GNN has better classification accuracy and SVM Polynomial 71.51% 50.00% AUC than all other tested classifiers. However, for the fourth dataset (Bupa), GNN does not have the best classification accuracy and AUC although it has improved performance compared to that of the neural network TABLE V. CLASSIFICATION RESULTS FOR HEART DISEASE DATASET classifiers. This suggests that granular computing approach Classifier Accuracy Area Under Curve might be more advantageous for datasets with certain characteristics. GNN 86.67% 88.56% In addition, the number of granulation levels used in 1-layer NN 84.07% 87.13% designing the classifier and which levels are selected play a 2-layer NN 82.59% 87.74% crucial role in achieving the best classification performance. 2-rule ANFIS 84.07% 84.70% For example, Table VII shows classification accuracy and AUC of GNN for different set of granulation levels of Pima 3-rule ANFIS 81.85% 80.22% dataset. 5-rule ANFIS 82.22% 79.67% In this study, granulation levels were selected empirically. SVM Linear 84.07% 83.58% However, to further improve the classification performance, methods that dynamically select the granulation levels that SVM RBF 81.85% 81.49% result in optimal performance have to be investigated. For SVM Polynomial 79.63% 79.62% example, for Pima dataset, selecting best granulation levels in each testing fold of GNN results in better accuracy and AUC as shown in Table VIII. One suggestion is the use of optimization methods (e.g. particle swam optimization or genetic algorithms) that use entropy measure as an objective function. TABLE VI. CLASSIFICATION RESULTS FOR BUPA DATASET Classifier Accuracy Area Under Curve GNN 74.47% 70.71%

TABLE III. CLASSIFICATION RESULTS FOR PIMA DATASET 1-layer NN 69.47% 63.74%

Classifier Accuracy Area Under Curve 2-layer NN 68.93% 64.55% GNN 80.86% 83.17% 2-rule ANFIS 75.95% 74.52% 1-layer NN 78.91% 82.67% 3-rule ANFIS 70.95% 70.81% 2-layer NN 78.52% 82.44% 5-rule ANFIS 73.31% 72.24% 2-rule ANFIS 78.78% 81.15% SVM Linear 48.10 % 50.43% 3-rule ANFIS 77.22% 80.51% SVM RBF 55.16% 52.78% 5-rule ANFIS 76.44% 76.90% SVM Polynomial 68.92% 65.26% SVM Linear 76.57% 71.38% SVM RBF 76.96% 71.83% Fig. 2-9 show the receiver operating characteristic (ROC) SVM Polynomial 76.44% 70.86% curves of GNN and a selected classifier for the four datasets. TABLE VII. CLASSIFICATION RESULTS FOR PIMA DATASET USING GNN WITH DIFFERENT GRANULATION LEVELS

Selected granulation levels Accuracy Area Under Curve 2, 9, 16, 23, 30, 37, 44, 51, 58 78.13% 81.75% 3, 11, 19, 27, 35, 43, 51 80.86% 83.17% 3, 9, 15, 21, 27, 33, 39, 45, 51, 57 78.52% 82.11% 4, 11, 18, 25, 32, 39, 46, 53 79.43% 82.99% 1, 9, 17, 25, 33, 41, 49, 57 78.91% 82.35%

TABLE VIII. CLASSIFICATION RESULTS FOR PIMA DATASET USING GNN WITH BEST GRANULATION LEVELS FOR EACH FOLD

Fold Accuracy Area Under Curve 1 81.82% 85.90% 2 85.06% 85.10%

3 83.12% 85.88% Fig. 2. GNN ROC curve for Pima Dataset. 4 77.27% 78.45% 5 82.24% 84.76% Average 81.90% 84.02%

V. CONCLUSION Granular computing is a promising approach for the design of data classification systems, especially when dealing with data uncertainty and large volumes of data. In this paper, a granular computing approach for the design of medical data classification systems is proposed. The proposed approach performs data granulation to the training data at different levels and uses the resulted granulated data to train a set of classifiers that compute classification scores for each granule. Resulted scores are used to train a classifier which performs the final classification. To evaluate performance of the proposed approach, a classification system based on neural network is implemented and its performance is compared to other classifiers: neural network classifier, ANFIS classifier and SVM classifier. Results from four datasets show that the Fig. 3. 1-layer NN ROC curve for Pima Dataset proposed approach improves classification performance of neural network classifier and produces better accuracy and AUC than other classifiers for most of the datasets used. To further improve the classification performance, we suggest investigating methods that dynamically select the best set of granulation levels and the optimal number of levels. One suggestion is the use of optimization methods (e.g. particle swam optimization or genetic algorithms) that use entropy measure as an objective function. Fig. 4. GNN ROC curve for ILPD Dataset Fig. 6. GNN ROC curve for Heart Disease Dataset

Fig. 5. 2-rule ANFIS ROC curve for ILPD Dataset Fig. 7. 2-rule ANFIS ROC curve for Heart Disease Dataset REFERENCES [1] Paul R. Harper, A review and comparison of classification algorithms for medical decision making, Health Policy, Volume 71, Issue 3, March 2005, Pages 315-331, ISSN 0168-8510. [2] Tax, D.M.J.; Duin, R.P.W., "Using two-class classifiers for multiclass classification," Pattern Recognition, 2002. Proceedings. 16th International Conference on , vol.2, pp.124,127. [3] Ioannis Gadaras, Ludmil Mikhailov, An interpretable fuzzy rule-based classification methodology for medical diagnosis, Artificial Intelligence in Medicine, Volume 47, Issue 1, September 2009, Pages 25-41, ISSN 0933-3657. [4] S., J. Antonio, et al. "Medical diagnosis of cardiovascular diseases using an interval-valued fuzzy rule-based classification system." Applied Soft Computing 20 (2014): 103-111. [5] M., Al-Shammaa, and M. F. Abbod, "Automatic Generation of Fuzzy Classification Rules from Data," in Recent Advances in Neural Networks and Fuzzy Systems, Venice, Italy, March 2013, pp. 74-79. [6] Hong-Qiang Wang, Hau-San Wong, Hailong Zhu, Timothy T.C. Yip, A neural network-based biomarker association information extraction approach for cancer classification, Journal of Biomedical Informatics, Volume 42, Issue 4, August 2009, Pages 654-666, ISSN 1532-0464 [7] B. Chandra, K.V. Naresh Babu, Classification of gene expression data using Spiking Wavelet Radial Basis Neural Network, Expert Systems Fig. 8. GNN ROC curve for Bupa Dataset with Applications, Volume 41, Issue 4, Part 1, March 2014, Pages 1326- 1330, ISSN 0957-4174. [8] Chin-Yuan Fan, Pei-Chann Chang, Jyun-Jie Lin, J.C. Hsieh, A hybrid model combining case-based reasoning and fuzzy decision tree for medical data classification, Applied Soft Computing, Volume 11, Issue 1, January 2011, Pages 632-644, ISSN 1568-4946. [9] Levashenko, V.; Zaitseva, E., "Fuzzy Decision Trees in medical decision Making Support System," Computer Science and Information Systems (FedCSIS), 2012 Federated Conference on , vol., no., pp.213,219, 9-12 Sept. 2012. [10] Kemal Polat, Salih Güneş, Ahmet Arslan, A cascade learning system for classification of diabetes disease: Generalized Discriminant Analysis and Least Square Support Vector Machine, Expert Systems with Applications, Volume 34, Issue 1, January 2008, Pages 482-487, ISSN 0957-4174. [11] Azar, Ahmad Taher, and Shaimaa Ahmed El-Said. "Performance analysis of support vector machines classifiers in breast cancer mammography recognition." Neural Computing and Applications 24.5 (2014): 1163-1177.. [12] Kalaiselvi, C.; Nasira, G.M., "A New Approach for Diagnosis of Diabetes and Prediction of Cancer Using ANFIS," Computing and Communication Technologies (WCCCT), 2014 World Congress on, pp.188,190, Feb. 27 2014-March 1 2014. [13] Huang, Mei-Ling, et al. "Usage of case-based reasoning, neural network and adaptive neuro-fuzzy inference system classification techniques in breast cancer dataset classification diagnosis." Journal of medical Fig. 9. 2-rule ANFIS ROC curve for Bupa Dataset systems 36.2 (2012): 407-414. [14] L. A. Zadeh, “Towards a Theory of Fuzzy Information Granulation and its Centrality in Human Reasoning and Fuzzy Logic,” Fuzzy Sets and Systems, Vol. 19, 1997, pp. 111-127. [15] Zadeh L. A., Fuzzy sets and information granularity //GUPTAM, RAGADE R,YAGER R. Advances in Fuzzy Set Theory and Applications. Amsterdam: North-Holland Publishing Co, 1979:3-18. [16] G. Panoutsos, M. Mahfouf, A neural-fuzzy modelling framework based on granular computing: concepts and applications, Fuzzy Set. Syst. 161 (21) (2010)2808–2830. [17] Ding, Shifei, et al. "Granular neural networks." Artificial Intelligence Review 41.3 (2014): 373-384. [18] Li, Hui, and S. F. Ding. "Research and Development of Granular Neural Networks." Applied Mathematics & Information Sciences 7.3 (2013): 1251-1261. [19] A. Frank and A. Asuncion, UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science, 2010.