
Granul. Comput. (2017) 2:131–139
DOI 10.1007/s41066-016-0034-1

ORIGINAL PAPER

Granular computing-based approach for classification towards reduction of bias in ensemble learning

Han Liu · Mihaela Cocea

Received: 25 August 2016 / Accepted: 2 November 2016 / Published online: 11 November 2016
© The Author(s) 2016. This article is published with open access at Springerlink.com

Han Liu: [email protected]
Mihaela Cocea: [email protected]
School of Computing, University of Portsmouth, Buckingham Building, Lion Terrace, Portsmouth PO1 3HE, UK

Abstract  Machine learning has become a powerful approach in practical applications, such as decision making, sentiment analysis and ontology engineering. To improve the overall performance in machine learning tasks, ensemble learning has become increasingly popular by combining different learning algorithms or models. Popular approaches of ensemble learning include Bagging and Boosting, which involve voting towards the final classification. The voting in both Bagging and Boosting could result in incorrect classification due to the bias in the way voting takes place. To reduce the bias in voting, this paper proposes a probabilistic approach of voting in the context of granular computing towards improvement of overall accuracy of classification. An experimental study is reported to validate the proposed approach of voting using 15 data sets from the UCI repository. The results show that probabilistic voting is effective in increasing the accuracy through reduction of the bias in voting. This paper contributes to the theoretical and empirical analysis of causes of bias in voting, towards advancing ensemble learning approaches through the use of probabilistic voting.

Keywords  Granular computing · Machine learning · Ensemble learning · Bagging · Boosting · Probabilistic voting

1 Introduction

Machine learning has become an increasingly powerful approach in real applications, such as decision making (Das et al. 2016; Xu and Wang 2016), sentiment analysis (Liu 2012; Pedrycz and Chen 2016) and ontology engineering (Pedrycz and Chen 2016; Roussey et al. 2011). In practice, machine learning can be involved in classification and regression, which are considered as supervised learning tasks. In other words, training data used in classification and regression are labelled. Also, machine learning can be involved in association and clustering, which are considered as unsupervised learning tasks. In other words, training data used in association and clustering are unlabelled. This paper focuses on classification tasks.

In the context of classification, popular machine learning methods include decision tree learning (Quinlan 1986; Chen et al. 2016), rule learning (Liu and Gegov 2016b; Du et al. 2011; Rodríguez-Fdez et al. 2016), Bayesian learning (Zhang et al. 2009; Yager 2006), instance-based learning (Tu et al. 2016; González et al. 2016; Langone and Suykens 2017) and perceptron learning (Shi et al. 2016; da Silva and de Oliveira 2016). Both decision tree learning and rule learning aim to learn a set of rules. The difference between these two types of learning is that the former is aimed at learning of rules in the form of a decision tree, e.g. ID3 (Quinlan 1986) and C4.5 (Quinlan 1993), whereas the latter aims to learn if-then rules directly from training instances, e.g. Prism (Cendrowska 1987) and IEBRG (Liu and Gegov 2016a). In particular, decision tree learning typically follows the divide and conquer approach, whereas rule learning mainly follows the separate and conquer approach. Bayesian learning works based on the assumption that all the input attributes are totally independent of each other, e.g. Naive Bayes (Barber 2012). In this context, each attribute–value pair would be independently correlated to each of the possible classes, which means that a posterior probability is provided between the attribute–value pair and the class. Instance-based learning generally involves predicting test instances on the basis of their similarity to the training instances, e.g. K nearest neighbour (Liu et al. 2016a). In other words, this type of learning does not involve learning models in the training stage, but just aims to classify each instance to the category to which the majority of its nearest neighbours (the instances most similar to it) belong. Perceptron learning aims to build a neural network topology that consists of a number of layers, each of which has a number of nodes and represents a perceptron. Some popular methods of neural network learning include backpropagation and probabilistic neural networks (Kononenko and Kukar 2007).

In general, each machine learning algorithm has its advantages and disadvantages. To improve the overall accuracy of classification, ensemble learning has been adopted. Popular ensemble learning approaches include Bagging (Breiman 1996) and Boosting (Freund and Schapire 1996). Both approaches involve voting in the testing stage towards the final classification. In particular, Bagging employs majority voting (Kononenko and Kukar 2007; Li and Wong 2004) by means of selecting the class with the highest frequency towards classifying an unseen instance, and Boosting employs weighted voting (Kononenko and Kukar 2007; Li and Wong 2004) by means of selecting the class with the highest weight for the same purpose. Both majority voting and weighted voting are considered to be biased to always select the class with the highest frequency or weight, which may result in overfitting of training data (Barber 2012). The aim of this paper is to provide theoretical and empirical analysis of bias in voting and to contribute towards reduction of the bias in voting through the use of granular computing concepts. In particular, the probabilistic voting approach, which has been proposed in Liu et al. (2016a) for advancing individual learning algorithms that involve voting, is used in this paper to advance ensemble learning approaches. More details on probabilistic voting are presented in Sect. 3. How this voting approach is linked to granular computing concepts is also justified in Sect. 3.

The rest of this paper is organised as follows: Sect. 2 presents ensemble learning concepts and the two popular approaches, namely Bagging and Boosting; Sect. 3 presents a probabilistic approach of voting in the context of granular computing and argues that this approach can effectively reduce the bias in voting towards classifying an unseen instance; Sect. 4 reports an experimental study to validate the proposed approach of voting, and the results are also discussed to show the extent to which the accuracy of classification is improved through the reduction of the bias in voting. Section 5 summarises the contributions of this paper and suggests further directions for this research area towards further advances in ensemble learning.

2 Related work

This section describes in depth the concepts of ensemble learning and reviews two popular approaches, namely Bagging and Boosting. It also highlights how the voting involved in these two approaches can lead to incorrect classification.

2.1 Ensemble learning concepts

The concepts of ensemble learning are usually used to improve overall accuracy, i.e. in order to overcome the limitations that each single learning algorithm has its own disadvantages and that the quality of the original data may not be good enough. In particular, this purpose can be achieved through scaling up algorithms (Kononenko and Kukar 2007) or scaling down data (Kononenko and Kukar 2007). The former means a combination of different learning algorithms which are complementary to each other. The latter means pre-processing of data towards the improvement of data quality. In practice, ensemble learning can be done both in parallel and sequentially.

In the context of classification, the parallel ensemble learning approach works by combining different learning algorithms, each of which generates a model independently on the same training set. In this way, the predictions of the models learned by these algorithms are combined towards classifying unseen instances. This way belongs to scaling up algorithms, because different algorithms are combined to generate a stronger hypothesis. In addition, the parallel ensemble learning approach can also be achieved using a single learning algorithm to generate models independently on different sample sets of training instances. In this context, the sample set of training instances can be provided by horizontally selecting the instances with replacement or vertically selecting the attributes without replacement. This way belongs to scaling down data, because it is aimed at pre-processing data towards reducing the variability of the data, leading to the reduction of the variance in classification results.

In the sequential ensemble learning approach, accuracy can also be improved through scaling up algorithms or scaling down data. In the former way, different algorithms are combined in such a way that the first algorithm learns to build a model and then the second algorithm learns to correct the model, and so on. In this way, there are no changes made to the training data. In the latter way, in contrast, the same algorithm is used iteratively on different versions of the training data. At each iteration, there is a model learned, which is then evaluated on the basis of the validation data. According to the estimated quality of the model, the training instances are weighted to different extents and then used for the next iteration. In the testing stage, the models learned at different iterations make predictions independently and their predictions are then combined towards classifying unseen instances.

For both parallel and sequential ensemble learning approaches, voting is involved in the testing stage to combine the independent predictions of different learning algorithms or models towards classifying an unseen instance. Some popular methods of voting include majority voting and weighted voting. As mentioned in Sect. 1, the former is typically used for the Bagging approach and the latter is typically used for the Boosting approach (Kononenko and Kukar 2007; Li and Wong 2004). More details on these approaches of ensemble learning and voting are presented in the following subsections.

2.2 Bagging

The term Bagging stands for bootstrap aggregating. It is a popular method developed by Breiman (1996) and follows the parallel ensemble learning approach. Bagging involves sampling of data with replacement. In particular, the Bagging method typically takes n samples, each of size m, where m is the size of the training set, in which the instances from the training set are randomly selected into each sample set. This indicates that some instances in the training set may appear more than once in one sample set and some other instances may never appear in that sample set. On average, a sample is expected to contain 63.2% of the training instances (Kononenko and Kukar 2007; Li and Wong 2004). In the training stage, the classifiers, each resulting from a particular sample set mentioned above, are parallel to each other. In the testing stage, their independent predictions are combined towards predicting the final classification through majority voting (also known as equal voting).

The detailed procedure of Bagging is illustrated in Fig. 1. As concluded in Kononenko and Kukar (2007), Li and Wong (2004), Bagging is robust and does not lead to overfitting due to the increase of the number of generated models. Therefore, it is useful especially for those non-stable learning methods with high variance. A popular example of Bagging is Random Forests (Breiman 2001), which is illustrated in Fig. 2.

Although the Bagging approach has the advantages mentioned above, it still has bias in voting, which may result in incorrect classification. In particular, the majority voting involved in Bagging works based on the assumption that the training data is complete and, thus, the class most frequently predicted by base classifiers is the most accurate one. However, it is fairly difficult to guarantee that the above assumption is reliable. Section 3 will present how this problem can be addressed using probabilistic voting.
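For concreteness, the bootstrap sampling and majority (equal) voting described at the beginning of this subsection can be sketched as follows. This is an illustrative sketch rather than any implementation used in the reported experiments; it assumes NumPy arrays and a scikit-learn-style base learner with fit and predict methods, and the name base_learner_factory is a placeholder.

```python
import numpy as np

def bagging_train(X, y, base_learner_factory, n_models=10, rng=None):
    """Train an ensemble on n bootstrap samples (sampling with replacement).

    Each sample has the same size m as the training set, so on average it
    contains about 63.2% of the distinct training instances.
    """
    rng = np.random.default_rng() if rng is None else rng
    m = len(X)
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, m, size=m)   # draw m indices with replacement
        model = base_learner_factory()     # assumed fit/predict interface
        model.fit(X[idx], y[idx])
        models.append(model)
    return models

def bagging_predict(models, x):
    """Classify one unseen instance (a NumPy feature vector) by majority voting."""
    votes = [m.predict(x.reshape(1, -1))[0] for m in models]
    classes, counts = np.unique(votes, return_counts=True)
    return classes[np.argmax(counts)]      # class with the highest frequency
```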

Fig. 1 Bagging approach (Liu et al. 2016c)

Fig. 2 Random forests (Liu et al. 2016c)

2.3 Boosting

Boosting follows the sequential learning approach, which is introduced in Freund and Schapire (1996), Kononenko and Kukar (2007), Li and Wong (2004). In other words, the generation of a single classifier depends on the experience gained from its former classifier (Li and Wong 2004). Each single classifier is assigned a weight depending on its accuracy estimated using validation data. The stopping criteria are satisfied when the error is equal to 0 or greater than 0.5 (Li and Wong 2004). In the testing stage, each single classifier makes an independent prediction in a similar way to Bagging, but the final prediction is made based on weighted voting among these independent predictions.

As concluded in Kononenko and Kukar (2007), Boosting frequently outperforms Bagging, and can also be applied with those stable learning algorithms with low variance in addition to unstable ones, in contrast to Bagging. However, Boosting may generate an ensemble learner that overfits training data. In this case, the performance of the ensemble learner is worse than that of a single learner. A popular example of Boosting is referred to as Adaboost, which is illustrated below (Freund and Schapire 1996):

Given: (x1, y1), ..., (xm, ym), where xi ∈ X and yi ∈ Y = {−1, +1}.

Initialise D1(i) = 1/m.

For t = 1, ..., T:

• Train a weak learner using distribution Dt.
• Get a weak hypothesis ht: X → {−1, +1} with error εt = Pr_{i∼Dt}[ht(xi) ≠ yi].
• Choose αt = (1/2) ln((1 − εt) / εt).
• Update:
  Dt+1(i) = (Dt(i) / Zt) × e^(−αt) if ht(xi) = yi, and Dt+1(i) = (Dt(i) / Zt) × e^(αt) if ht(xi) ≠ yi,
  which is equivalent to Dt+1(i) = Dt(i) exp(−αt yi ht(xi)) / Zt,
  where Zt is a normalisation factor (chosen so that Dt+1 will be a distribution).

Output the final hypothesis: H(x) = sign(Σt αt ht(x)).

In the above illustration, xi indicates an input vector and yi indicates the class label assigned to xi, where i is the index of an instance. Also, X and Y represent the domain and range of the given data set, respectively. In addition, the distribution Dt reflects how each instance is weighted at each particular iteration of the Adaboost procedure. The symbol t represents the number of the current iteration and αt represents the weight of the classifier learned at iteration t.

However, similar to Bagging, Boosting also has bias in voting, although it has the advantages mentioned above. In particular, the weighted voting involved in Boosting also works based on the assumption that the training data are complete and, thus, the most highly weighted class is the most accurate one. However, it is fairly difficult to guarantee that the above assumption is reliable. This problem can be addressed using probabilistic voting and the details are given in Sect. 3.

3 Granular computing-based approach for voting

As pointed out in Sects. 2.2 and 2.3, both majority voting and weighted voting are considered to be biased, leading to incorrect classifications. In other words, the above two types of voting can be seen as deterministic voting, since they both work in the context of deterministic logic by assuming that there is no uncertainty for classifying an unseen instance. This section describes the concepts of granular computing, including the link to probabilistic logic, and then proposes to use probabilistic voting as a granular computing-based approach towards reduction of the bias in voting. The significance of probabilistic voting is also outlined by analysing the advantages of granular computing.

3.1 Overview of granular computing

Granular computing is an emerging approach of information processing. It is applied with two main aims, as stressed in Yao (2005). One aim is to adopt structured thinking at the philosophical level and the other one is to conduct structured problem solving at the practical level. As described in Hu and Shi (2009), granular computing generally involves decomposition of a whole into several parts. In practice, this means to decompose a complex problem into several simpler sub-problems.

The fundamentals of granular computing generally involve information granulation, which includes probabilistic sets, fuzzy sets and rough sets. Deterministic sets can be seen as special cases of all the three above types of sets. In particular, a probabilistic set is judged as a deterministic set while each element has a 100% chance to belong to the set. Also, a fuzzy set is judged as a deterministic set while each element has a full membership to the set, i.e. the fuzzy membership degree is 100%. Similarly, a rough set is judged as a deterministic set while each element unconditionally belongs to the set. The above description indicates that deterministic sets are used in the context of deterministic logic, whereas the other three types of sets are used in the context of non-deterministic logic.

In the context of probabilistic sets, as described in Liu et al. (2016b), each set employs a chance space which can be partitioned into a number of subspaces. Each of these subspaces can be viewed as a granule that can be randomly selected towards enabling an event to occur. In this context, all these granules make up the whole chance space. As also described in Liu et al. (2016b), each element in a probabilistic set is granted a probability towards getting a full membership to the set. In the context of granular computing, the probability can be viewed as a percentage of the granules that make up the chance space. For example, if an element is given a probability of 80% towards getting a full membership to a set, then the element would be assigned 80% of the granules that enable the full membership to be granted.

In the context of fuzzy sets, as described in Liu et al. (2016b), each element in a fuzzy set has a certain degree of membership to the set, i.e. an element belongs to a set to a certain extent. In the context of granular computing, the membership can be partitioned into a number of parts. Each part of the membership can be viewed as a granule. For example, if an element is given a membership degree of 80% to a set, then the element would be assigned 80% of the granules that certainly belong to the set. This is very similar to the example that a club offers different levels of membership, which provide the members with different levels of access to the resources and the facilities.

In the context of rough sets, as described in Liu et al. (2016b), each rough set employs a boundary region to allow some elements to be restored conditionally on the basis of the insufficient information. In other words, all the elements within the boundary region are only given conditional memberships to the set. This is because these elements have only partially met the conditions towards being members of the set. Once the conditions have been fully met, these elements would be granted full memberships to the set. In the context of granular computing, the condition for an element to belong to the set can be partitioned into a number of subconditions. Each of these subconditions can be viewed as a granule. As defined in Liu et al. (2016b), possibility is aimed to measure the extent to which the condition is met. For example, if the possibility that an element belongs to a set is 80%, then the element would be assigned 80% of the granules, each of which leads to the partial fulfilment towards getting the membership.

In practice, the concepts of granular computing have been applied broadly in many areas such as artificial intelligence (Wilke and Portmann 2016; Skowron et al. 2016), computational intelligence (Dubois and Prade 2016; Kreinovich 2016; Livi and Sadeghian 2016), and machine learning (Min and Xu 2016; Peters and Weber 2016; Antonelli et al. 2016). In addition, ensemble learning is also considered as a granular computing approach, since it involves decomposition of a data set into a number of overlapping samples in the training stage and a combination of predictions by different classifiers towards classifying a test instance. A similar perspective has also been pointed out in Hu and Shi (2009). The next section presents in detail how the concepts of granular computing can be used towards reduction of bias in voting in the context of ensemble learning.

3.2 Probabilistic voting

Probabilistic voting (Liu et al. 2016a) is considered to be inspired by nature and biology in the context of granular computing, since the voting is made on the basis of the hypothesis that the class with the highest frequency or weight only has the best chance of being selected towards classifying an unseen instance. In other words, it is not guaranteed that the class with the highest frequency or weight will definitely be selected to be assigned to the unseen instance. In this paper, probabilistic voting is used for both Bagging and Boosting towards improving the overall classification accuracy. In particular, majority voting (involved in Bagging) and weighted voting (involved in Boosting) are both replaced with probabilistic voting. The procedure of probabilistic voting is illustrated below:

Step 1: calculating the weight Wi for each single class i.
Step 2: calculating the total weight W over all classes.
Step 3: calculating the percentage of weight Pi for each single class i, i.e. Pi = Wi ÷ W.
Step 4: randomly selecting a single class i with the probability Pi towards classifying an unseen instance.

The following example relating to Bayes Theorem is used for the illustration of the above procedure:

Inputs (binary): x1, x2, x3
Output (binary): y
Probabilistic correlation:

P(y = 0 | x1 = 0) = 0.4; P(y = 1 | x1 = 0) = 0.6;
P(y = 0 | x1 = 1) = 0.5; P(y = 1 | x1 = 1) = 0.5;
P(y = 0 | x2 = 0) = 0.7; P(y = 1 | x2 = 0) = 0.3;
P(y = 0 | x2 = 1) = 0.6; P(y = 1 | x2 = 1) = 0.4;
P(y = 0 | x3 = 0) = 0.5; P(y = 1 | x3 = 0) = 0.5;
P(y = 0 | x3 = 1) = 0.8; P(y = 1 | x3 = 1) = 0.2;

While x1 = 0, x2 = 1, x3 = 1, y = ?

Following Step 1, the weight Wi for each single value of y is:

P(y = 0 | x1 = 0, x2 = 1, x3 = 1) = P(y = 0 | x1 = 0) × P(y = 0 | x2 = 1) × P(y = 0 | x3 = 1) = 0.4 × 0.6 × 0.8 = 0.192
P(y = 1 | x1 = 0, x2 = 1, x3 = 1) = P(y = 1 | x1 = 0) × P(y = 1 | x2 = 1) × P(y = 1 | x3 = 1) = 0.6 × 0.4 × 0.2 = 0.048

Following Step 2, the total weight W is 0.192 + 0.048 = 0.24.

Following Step 3, the percentage of weight Pi for each single value of y is:

Percentage for y = 0: P0 = 0.192 ÷ 0.24 = 80%
Percentage for y = 1: P1 = 0.048 ÷ 0.24 = 20%

Following Step 4, y = 0 (80% chance) or y = 1 (20% chance).

In the above illustration, both majority voting and weighted voting would result in 0 being assigned to y due to its higher frequency or weight shown in Step 4. In particular, in the context of majority voting, Step 4 would indicate that the frequency for y to equal 0 is 80% and the one for y to equal 1 is 20%. Also, in the context of weighted voting, Step 4 would indicate that, over the total weight, the percentage of the weight for y to equal 0 is 80% and the percentage of the weight for y to equal 1 is 20%. Therefore, both types of voting would choose to assign y the value of 0. However, in the context of probabilistic voting, Step 4 would indicate that y could be assigned either 0 (with 80% chance) or 1 (with 20% chance). In this way, the bias in voting can be effectively reduced towards improvement of the overall accuracy of classification in ensemble learning.

The probabilistic voting approach illustrated above is very similar to natural selection, which is one step of the procedure of genetic algorithms (Man et al. 1996), i.e. the probability of a class being selected is very similar to the fitness of an individual involved in natural selection. In particular, the way of selecting a class involved in Step 4 of the above procedure is inspired by Roulette Wheel Selection (Lipowski and Lipowska 2012).

In the context of granular computing, the frequency of a class can be viewed as an information granule that enables the class to be selected for being assigned to a test instance. Similarly, the weight of a class can be viewed as a part of the information granules that enable the class to be selected towards classifying an unseen instance. From this point of view, the class with the highest frequency of being predicted by base classifiers has been assigned the most information granules that enable the class to be selected for being assigned to a test instance. Similarly, the class with the highest weight has been assigned the highest percentage of the information granules that enable the class to be selected towards classifying an unseen instance. More details on information granules can be found in Pedrycz and Chen (2011, 2015a, b).

As mentioned in Sect. 1, for classifying test instances, the Bagging method is biased to always select the most frequently occurring class and the Boosting method is biased to always select the most highly weighted class. This is due to the assumption that all the independent predictions by the base classifiers provide a complete and highly trusted set of information granules, each of which votes towards one class and against all the other classes. However, it is fairly difficult to guarantee that a set of granules is complete, due to the fact that the training and validation sets are very likely to be incomplete in practice. Also, it is commonly known that a training set may be imbalanced, due to insufficient collection of data, which is likely to result in a class being assigned many more information granules than the other classes. In addition, a learning algorithm may not be suitable to learn a model on a particular sample set. In this case, the information granules, which are provided from the predictions by the models learned by that algorithm, would be much less trusted.

In the context of machine learning, it has been argued in Liu et al. (2016a) that voting based on heuristics such as frequency or weight is biased. In particular, as mentioned in Sect. 1, the probabilistic voting approach has been applied to two popular single learning algorithms (Naive Bayes and K Nearest Neighbour) for reduction of bias in voting and the experimental results were encouraging. Since this type of voting is involved in ensemble learning as well, probabilistic voting could also lead to improved results in this context.

In ensemble learning, Bagging needs to draw a number of samples of the original data on a random basis and Boosting needs to iteratively evaluate the weights of training instances. The nature of the Bagging method may result in poor samples of training data being drawn in terms of incompleteness and imbalance. The nature of the Boosting method may result in poor evaluation of training instances in terms of their weights. If the employed learning algorithms are not suitable to the sampled data for Bagging or the weighted data for Boosting, then the frequency or the weight of classes would be much less trusted for classifying test instances. Therefore, the majority voting involved in Bagging and the weighted voting involved in Boosting are considered to be biased. This is very similar to the human reasoning approach that people generally make decisions and judgements based on their previous experience, without the guarantee that the decisions and judgements are absolutely right (Liu et al. 2016a). However, the frequency or the weight of a class can fairly be used to reflect the chance of the class being selected towards classifying a test instance, especially when the above conjecture concerning low-quality training data cannot be proved in a reasonable way. The impact of probabilistic voting on the Bagging and Boosting approaches is investigated experimentally in the following section.
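To make the link to the two ensemble approaches concrete, the following sketch (an illustration under the same assumptions as the earlier snippets, not the paper's code) applies the selection rule of Sect. 3.2 to the predictions of an ensemble's base classifiers: with no weights it replaces the majority voting of Bagging, and with per-classifier weights αt it replaces the weighted voting of Adaboost.

```python
import random
from collections import defaultdict

def ensemble_probabilistic_predict(predictions, alphas=None, rng=random):
    """Probabilistic voting over base classifiers' predictions.

    predictions: one predicted class label per base classifier.
    alphas: optional per-classifier weights (as in Adaboost); if None,
            every prediction counts equally, as in Bagging.
    """
    class_weights = defaultdict(float)
    for i, label in enumerate(predictions):
        class_weights[label] += 1.0 if alphas is None else alphas[i]  # Step 1
    labels = list(class_weights)
    total = sum(class_weights.values())                               # Step 2
    probs = [class_weights[c] / total for c in labels]                # Step 3
    return rng.choices(labels, weights=probs, k=1)[0]                 # Step 4 (roulette wheel)
```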

4 Experimental results

The probabilistic voting illustrated in Sect. 3.2 is validated in an experimental study to investigate its impact on the Bagging and Boosting approaches, by comparing the results with traditional Bagging (with majority voting) and Boosting (with weighted voting) in terms of classification accuracy. In particular, the Random Forests and Adaboost methods are used for this experimental study due to the fact that they are popular examples of Bagging and Boosting, respectively, in practical applications.

The experiments are conducted on 15 data sets retrieved from the UCI repository (Lichman 2013). The characteristics of these data sets are shown in Table 1. In general, all the chosen data sets have lower dimensionality (less than 100) and a smaller number of instances (less than 1000), except for the hypothyroid data set. The choice of these data sets was made on the basis of the computational complexity of the two ensemble learning methods used, namely Random Forests and Adaboost. In particular, as mentioned in Sect. 2.2, the Bagging approach needs to draw n samples with the same size of m, where m is the size of the training set. Therefore, the computational complexity of Random Forests can be considered to be n times the complexity of a single learning algorithm such as decision tree learning, if no parallelisation is adopted. The same also applies to Adaboost, especially when considering that the nature of the Boosting approach is not parallelism. In addition, these data sets contain both discrete and continuous attributes, as shown in Table 1. This is to investigate the impact of probabilistic voting in the case of both types of attributes.

Table 1 Data sets

Name           Attribute types  Attributes  Instances  Classes
breast-cancer  Discrete         9           286        2
breast-w       Continuous       10          699        2
ecoli          Continuous       8           336        8
glass          Continuous       10          214        6
haberman       Mixed            4           306        2
heart-c        Mixed            76          920        4
heart-h        Mixed            76          920        4
heart-statlog  Continuous       13          270        2
hypothyroid    Mixed            30          3772       4
ionosphere     Continuous       34          351        2
iris           Continuous       5           150        3
labor          Mixed            17          57         2
sonar          Continuous       61          208        4
vote           Discrete         17          435        2
wine           Continuous       14          178        3

The experiments are conducted by splitting a data set into a training set and a test set in the ratio of 70:30. For each data set, the experiment is repeated 10 times and the average of the accuracies is taken for comparative validation. As mentioned above, due to the higher computational complexity of ensemble learning approaches, cross validation (Kononenko and Kukar 2007) is not used in this experimental study. The results are shown in Table 2.
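The evaluation protocol just described (a 70:30 split repeated 10 times, with accuracies averaged) can be sketched as below, assuming scikit-learn. Note that scikit-learn's RandomForestClassifier and AdaBoostClassifier provide only the standard majority and weighted voting (the columns labelled I in Table 2); the probabilistic-voting variants would additionally require the custom voting step sketched in Sect. 3.2.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier

def average_accuracy(X, y, make_model, repeats=10, test_size=0.3):
    """Mean test accuracy over repeated 70:30 train/test splits."""
    scores = []
    for seed in range(repeats):
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=test_size, random_state=seed)
        model = make_model().fit(X_train, y_train)
        scores.append(model.score(X_test, y_test))
    return np.mean(scores)

# Example usage on a loaded data set (X, y); baseline (column I) variants only:
# acc_rf = average_accuracy(X, y, RandomForestClassifier)
# acc_ada = average_accuracy(X, y, AdaBoostClassifier)
```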
Table 2 Classification accuracy

Data set       Random forest I (%)  Random forest II (%)  Adaboost I (%)  Adaboost II (%)
breast-cancer  70                   78                    74              77
breast-w       95                   96                    94              96
ecoli          83                   85                    65              68
glass          69                   80                    45              52
haberman       68                   74                    72              78
heart-c        78                   81                    82              84
heart-h        82                   87                    79              80
heart-statlog  77                   84                    82              88
hypothyroid    98                   98                    95              95
ionosphere     89                   94                    89              90
iris           94                   96                    93              97
labor          90                   95                    91              95
sonar          76                   83                    75              83
vote           95                   97                    95              98
wine           94                   98                    88              91

In Table 2, the second and third columns (Random forest I and II) indicate the results for Random Forests with majority voting (Breiman 2001) and Random Forests with probabilistic voting (proposed in Sect. 3.2), respectively. Similarly, the fourth and fifth columns (Adaboost I and II) indicate the results for Adaboost with weighted voting (Freund and Schapire 1996) and Adaboost with probabilistic voting (proposed in Sect. 3.2), respectively.

The results show that, except for the hypothyroid data set, probabilistic voting can help both Random Forest and Adaboost to effectively improve the overall accuracy of classification. Regarding the case of the hypothyroid data set, it is the only data set that has a large number of instances (higher than 1000), as shown in Table 1, but the classification accuracy stays the same when using probabilistic voting for both Random Forest and Adaboost. A possible explanation for this phenomenon may be that larger data would usually be of higher completeness and the classification result is, thus, less impacted by the bias in voting compared with the use of smaller data sets. A similar point has been given in Liu et al. (2016a) in terms of the likelihood of overfitting.

In addition, while the UCI data sets are a good benchmark for judging new approaches, they are known to be cleaner (i.e. contain fewer errors in the data) and more complete than data used in real-life applications, especially when considering the current vast volumes of data and the need to analyse data streams. Consequently, the benefits of probabilistic voting could be higher on this type of data, where the assumptions of completeness and sample representativeness are rarely met; however, further experimentation is required to assess the benefits of probabilistic voting in this context.

5 Conclusion

In this paper, we have discussed, in the context of granular computing, how the current deterministic ways of voting in ensemble learning methods are biased through the assumptions of completeness of data and sample representativeness, which are rarely met, especially in the context of big data.


To address this issue, we proposed probabilistic voting, which is conceptually close to the idea of natural selection in genetic algorithms. To validate the proposed voting approach and its impact on classification accuracy, we experimented with 15 UCI data sets and two popular ensemble approaches, i.e. Bagging and Boosting. More specifically, the Random Forest and Adaboost algorithms were used. In both cases, the results show an increase in accuracy with probabilistic voting.

In this paper, we addressed the bias involved in the testing stage, i.e. in voting. However, as argued in Liu and Gegov (2015), it is also significant to effectively employ learning algorithms that are combined on a competitive basis and used in the training stage of ensemble learning, towards improvements in the overall accuracy of classification. From this point of view, a further direction for this work is to propose a probabilistic approach, similar to probabilistic voting, towards natural selection of learning algorithms on the basis of their fitness, and to investigate further how this probabilistic approach impacts on the performance of ensemble learning.

Acknowledgments The authors acknowledge support for the research reported in this paper through the Research Development Fund at the University of Portsmouth.

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

References

Antonelli M, Ducange P, Lazzerini B, Marcelloni F (2016) Multi-objective evolutionary design of granular rule-based classifiers. Granul Comput 1(1):37–58
Barber D (2012) Bayesian reasoning and machine learning. Cambridge University Press, Cambridge
Breiman L (1996) Bagging predictors. Mach Learn 24(2):123–140
Breiman L (2001) Random forests. Mach Learn 45(1):5–32
Cendrowska J (1987) Prism: an algorithm for inducing modular rules. Int J Man-Mach Stud 27:349–370
Chen Y-L, Wu C-C, Tang K (2016) Time-constrained cost-sensitive decision tree induction. Inf Sci 354:140–152
da Silva AJ, de Oliveira WR (2016) Comments on quantum artificial neural networks with applications. Inf Sci 370:120–122
Das S, Kar S, Pal T (2016) Robust decision making using intuitionistic fuzzy numbers. Granul Comput 1:1–14
Du Y, Hu Q, Zhu P, Ma P (2011) Rule learning for classification based on neighborhood covering reduction. Inf Sci 181(24):5457–5467
Dubois D, Prade H (2016) Bridging gaps between several forms of granular computing. Granul Comput 1(2):115–126
Freund Y, Schapire RE (1996) Experiments with a new boosting algorithm. In: Machine learning: proceedings of the 13th international conference. Bari, Italy, pp 148–156
González M, Bergmeir C, Triguero I, Rodríguez Y, Benítez JM (2016) On the stopping criteria for k-nearest neighbor in positive unlabeled time series classification problems. Inf Sci 328:42–59
Hu H, Shi Z (2009) Machine learning as granular computing. In: IEEE international conference on granular computing. Nanchang, China, pp 229–234
Kononenko I, Kukar M (2007) Machine learning and data mining: introduction to principles and algorithms. Horwood Publishing Limited, Chichester
Kreinovich V (2016) Solving equations (and systems of equations) under uncertainty: how different practical problems lead to different mathematical and computational formulations. Granul Comput 1(3):171–179
Langone R, Suykens JA (2017) Supervised aggregated feature learning for multiple instance classification. Inf Sci 375(1):234–245
Li J, Wong L (2004) Rule-based data mining methods for classification problems in biomedical domains. In: A tutorial note for the 15th European conference on machine learning (ECML) and the 8th European conference on principles and practice for knowledge discovery in databases (PKDD), Pisa, Italy
Lichman M (2013) UCI machine learning repository. http://archive.ics.uci.edu/ml
Lipowski A, Lipowska D (2012) Roulette-wheel selection via stochastic acceptance. Phys A Stat Mech Appl 391(6):2193–2196
Liu B (2012) Sentiment analysis and opinion mining. Morgan and Claypool Publishers, San Rafael
Liu H, Gegov A (2015) Collaborative decision making by ensemble rule based classification systems. Springer, Switzerland, pp 245–264
Liu H, Gegov A (2016a) Induction of modular classification rules by information entropy based rule generation. Springer, Switzerland, pp 217–230
Liu H, Gegov A (2016b) Rule based systems and networks: deterministic and fuzzy approaches. In: IEEE international conference on intelligent systems. Sofia, Bulgaria, pp 316–321
Liu H, Gegov A, Cocea M (2016a) Nature and biology inspired approach of classification towards reduction of bias in machine learning. In: International conference on machine learning and cybernetics. Jeju Island, South Korea, pp 588–593
Liu H, Gegov A, Cocea M (2016b) Rule based systems: a granular computing perspective. Granul Comput 1(4):259–274
Liu H, Gegov A, Cocea M (2016c) Rule based systems for big data: a machine learning approach. Springer, Switzerland
Livi L, Sadeghian A (2016) Granular computing, computational intelligence, and the analysis of non-geometric input spaces. Granul Comput 1(1):13–20
Man KF, Tang KS, Kwong S (1996) Genetic algorithms: concepts and applications. IEEE Trans Ind Electron 43(5):519–534
Min F, Xu J (2016) Semi-greedy heuristics for feature selection with test cost constraints. Granul Comput 1(3):199–211
Pedrycz W, Chen S-M (2011) Granular computing and intelligent systems: design with information granules of higher order and higher type. Springer, Heidelberg
Pedrycz W, Chen S-M (2015a) Granular computing and decision-making: interactive and iterative approaches. Springer, Heidelberg
Pedrycz W, Chen S-M (2015b) Information granularity, big data, and computational intelligence. Springer, Heidelberg
Pedrycz W, Chen S-M (2016) Sentiment analysis and ontology engineering: an environment of computational intelligence. Springer, Heidelberg
Peters G, Weber R (2016) Dcc: a framework for dynamic granular clustering. Granul Comput 1(1):1–11
Quinlan JR (1986) Induction of decision trees. Mach Learn 1:81–106


Quinlan JR (1993) C4.5: programs for machine learning. Morgan Kaufmann Publishers Inc, San Francisco
Rodríguez-Fdez I, Mucientes M, Bugarín A (2016) Fruler: fuzzy rule learning through evolution for regression. Inf Sci 354:1–18
Roussey C, Pinet F, Kang MA, Corcho O (2011) An introduction to ontologies and ontology engineering. Springer, pp 9–38
Shi K, Liu X, Tang Y, Zhu H, Zhong S (2016) Some novel approaches on state estimation of delayed neural networks. Inf Sci 372:313–331
Skowron A, Jankowski A, Dutta S (2016) Interactive granular computing. Granul Comput 1(2):95–113
Tu E, Zhang Y, Zhu L, Yang J, Kasabov N (2016) A graph-based semi-supervised k nearest-neighbor method for nonlinear manifold distributed data classification. Inf Sci 367:673–688
Wilke G, Portmann E (2016) Granular computing as a basis of human data interaction: a cognitive cities use case. Granul Comput 1(3):181–197
Xu Z, Wang H (2016) Managing multi-granularity linguistic information in qualitative group decision making: an overview. Granul Comput 1(1):21–35
Yager RR (2006) An extension of the naive bayesian classifier. Inf Sci 176(5):577–588
Yao Y (2005) Perspectives of granular computing. In: Proceedings of 2005 IEEE international conference on granular computing. Beijing, China, pp 85–90
Zhang M-L, Peña JM, Robles V (2009) Feature selection for multi-label naive bayes classification. Inf Sci 179(19):3218–3229
