A Novel Mapreduce Classification Rule Discovery with Ant Colony

https://doi.org/10.20965/jaciii.2019.p0928 Kong, Y. et al. Paper: MR-AntMiner: A Novel MapReduce Classification Rule Discovery with Ant Colony Intelligence Yun Kong∗1,∗2, Junsan Zhao∗1,†,NaDong∗1, Yilin Lin∗1, Lei Yuan∗3, and Guoping Chen∗4 ∗1Faculty of Land Resource Engineering, Kunming University of Science and Technology No. 68 Wenchang Road, 121 Avenue, Wuhua District, Kunming, Yunnan 650093, China E-mail: [email protected] ∗2Library of Kunming University of Science and Technology No. 727 Jingming Nan Road, Chenggong District, Kunming, Yunnan 650504, China ∗3School of Information Science and Technology, Yunnan Normal University No. 1 Yuhua District, Chenggong New District, Kunming, Yunnan 650500, China ∗4Geomatics Engineering Faculty, Kunming Metallurgy College No. 388 Xuefu Road, Wuhua District, Kunming, Yunnan 650028, China †Corresponding author [Received September 30, 2018; accepted May 1, 2019] Ant colony optimization (ACO) algorithms have been that of the compared targets. Furthermore, experi- successfully applied to data classification problems mental studies show the feasibility and the good per- that aim at discovering a list of classification rules. formance of the proposed parallelized MR-AntMiner However, on the one hand, the ACO algorithm has de- algorithm. fects including long search times and convergence issues with non-optimal solutions. On the other hand, given bottlenecks such as memory restrictions, time Keywords: ant colony optimization (ACO), MapReduce complexity, or data complexity, it is too hard to solve Model, classification rule a problem when its scale becomes too large. One solution for this issue is to design a highly parallelized learning algorithm. The MapReduce program- 1. Introduction ming model has quickly emerged as the most com- mon model for executing simple algorithmic tasks over The ant colony algorithm is a heuristic intelligent huge volumes of data, since it is simple, highly ab- optimization algorithm originally proposed by Colorni stract, and efficient. Therefore, MapReduce-based et al. [1], which has been successfully applied to solve ACO has been researched extensively. However, due to many NP combinatorial optimization problems. Clas- its unidirectional communication model and the inher- sification rule mining based on ant colony optimization ent lack of support for iterative execution, ACO algo- (ACO) was first proposed by Parpinelli et al. [2]. The rithms cannot easily be implemented on MapReduce. basic problem can be depicted as follows: the definition In this paper, a novel classification rule discovery algo- of an ant search path is the connection of attribute nodes rithm is proposed, namely MR-AntMiner, which can and class nodes, in which the attribute nodes only appear capitalize on the benefits of the MapReduce model. In once and must have class nodes, each path corresponds order to construct quality rules with fewer iterations to a classification rule, and the mining of the rules can be as well as less communication between different nodes regarded as the search for the optimal path. Rule min- to share the parameters used by each ant, our algo- ing consists of three stages: rule construction, rule prun- rithm splits the training data into some subsets that ing, and pheromone path updating. The form of a rule is are randomly mapped to different mappers; then the shown in Eq. (1), where term1 is a conditional item and traditional ACO algorithm is run on each mapper to the rule conclusion (THEN) defines the prediction cate- gain the local best rule set, and the global best rule list gory of the sample (class). is produced in the reducer phase according to a voting IF term AND term AND THEN class . (1) mechanism. The performance of our algorithm was 1 2 studied experimentally on 14 publicly available data With the advent of the era of big data, the scale of sets and further compared to several state-of-the-art data is increasing exponentially. The traditional data min- classification approaches in terms of accuracy. The ex- ing algorithms are mainly suitable for small and medium- perimental results show that the predictive accuracy sized data sets, but are difficult to apply to the analysis obtained by our algorithm is statistically higher than of large-scale data sets. Traditional data mining algo- 928 Journal of Advanced Computational Intelligence Vol.23 No.5, 2019 and Intelligent Informatics © Fuji Technology Press Ltd. Creative Commons CC BY-ND: This is an Open Access article distributed under the terms of the Creative Commons Attribution-NoDerivatives 4.0 International License (http://creativecommons.org/licenses/by-nd/4.0/). MR-AntMiner rithms are challenged by memory constraints, high time that the rules found in the example data will not be re- complexity, and data intensive as well as complex struc- peated. This process continues iteratively until the train- tures [3]. The ACO algorithm is also faced with the same ing data set is empty or the termination condition is met. problems. When the data set is increased to a certain ex- The heuristic value is calculated from the entropy of terms tent, the space and time costs of the traditional single ma- and their normalized information gain. Thereafter, a ma- chine solution become huge, so it is difficult to meet cur- jor voting mechanism method is employed in AntMiner rent computing requirements. The emerging cloud com- to prune the irrelevant terms in order to raise the accu- puting model [4], with utilities such as Hadoop [5], as a racy. Unlike AntMiner, a new heuristic function calcu- new parallel processing technology, has excellent perfor- lation method based on density estimation was adopted mance in dealing with large data sets and massive storage, in AntMiner2 [7] and AntMiner3 [8]. Besides that, the so using optimization algorithms on a cloud computing distinct feature of AntMiner3 is a new pheromone up- platform has become a feasible and reliable solution. date method, in which the pheromones are updated and The main goal of this paper is to combine the ACO evaporated only for those predefined conditions occur- algorithm and the MapReduce model to realize an ant ring in the rule. In this way, exploration behavior is en- colony classification rule mining algorithm in a large- couraged. AntMiner+ [9], which is an enhanced version scale environment. Our main work is as follows: firstly, in of AntMiner, designs a class specific heuristic function, order to solve the problem of one training data set being which enables the ants to know the class of an extracted insufficient to achieve high classification accuracy, this rule. The class label is chosen in AntMiner+ before the study employed data segmentation and sampling tech- ants construct their rules. AntMiner-CC [10] adopts a niques to divide the training data set into N subsets in a new heuristic function calculation method based on the uniform distribution mode. Secondly, considering that the correlation of data attributes, which takes full account of time complexity and search space of ACO are unaccept- the relationships between the selected nodes and the can- able when applied to large-scale datasets, in this work, didate nodes and uses a disordered search space instead we adopted a strategy that randomly casts N subsets to of a determined search space. Generally, the continuous N Mappers in a MapReduce cluster. Thirdly, in view of attributes are preprocessed by means of discretization be- the time overhead and lack of iterative execution in the fore the ACO algorithm is applied, so cAnt-Miner [11] MapReduce framework, our method takes K ants into a utilizes information entropy to discretize the continuous Map to produce the local best rule list in a certain sub- attributes. AntMinermbc [12] proposes a novel classifica- set, which can effectively reduce the time overhead of the tion rule discovery algorithm based on ACO, in which a framework and solve the problem of sharing the global new model of multiple rule sets is presented to produce pheromones when running the ACO algorithm in a map- multiple lists of rules. Multiple base classifiers are built per. Finally, a voting selection mechanism is applied to into AntMinermbc, and each base classifier is expected to generate the global best rules as the result of the final clas- remedy the weaknesses of other base classifiers, which sification rule sets at the reducer phase in the MapReduce can improve the predictive accuracy by exploiting the use- model. ful information from various base classifiers. Neverthe- The remainder of this paper is organized as follows. less, as it constructs multiple base classifiers instead of Section 2 presents the background of ACO and gives an one base classifier, it will take more execution time to overview of existing ACO algorithms for classification build a solution from each ant as well as to complete ten- rules discovery. Section 3 reviews the ACO algorithms fold cross validation in a serial computing environment. based on MapReduce and analyzes the issues in iterative To sum up, since the early 1990s, many ACO algo- execution of the MapReduce framework. Our novel al- rithms have been reported, most of which were designed gorithm is described in Section 4, which presents a new by using different probability calculation and pheromone model of MapReduce with ACO. Section 5 gives the ex- update methods. In our method, each ant builds a classifi- perimental results of our algorithm a some publicly avail- cation rule following the traditional flow [2], in which the able data sets in comparison with other classification al- probability transfer formula plays a significant role in the gorithms. At last, Section 6 summarizes our conclusions ant selecting a node, as shown in Eq. (2): and future work. τα (t)ηβ (t) P (t)= ij j , ij total next values ..... (2) τα (t)ηβ (t) 2. ACO with Classification Rule Discovery ∑ ik k k=1 The principle of ACO classification rule mining is to where τij is the concentration of pheromone between imitate ants finding the shortest path from food to nest.

A Novel Mapreduce Classification Rule Discovery with Ant Colony

The Hydrochemical Response of Heilongtan Springs to the 2010

I Am Thinking of Having an Hiv Test

Table of Codes for Each Court of Each Level

Kunming South HSR Integrated Development Next to Kunming South HSR Station, a Key HSR Station in China

Affectionate Ballads” in Wuhua District, Kunming City

Delineation of the Urban-Rural Boundary Through Data Fusion: Applications to Improve Urban and Rural Environments and Promote Intensive and Healthy Urban Development

Detailed Conference Program Day1

International Journal of Business Anthropology

Yunnan WLAN Hotspots 1/15

Evaluation and Monitoring of Urban Public Greenspace Planning Using Landscape Metrics in Kunming

Download Article (PDF)

Analysis of Internal Transcribed Spacer Regions II Gene and Morphology of Paragonimus from Yunnan Province, China