Revisiting Evolutionary Algorithms in Feature Selection and Nonfuzzy/Fuzzy Rule Based Classification
Satchidananda Dehuri1 and Ashish Ghosh2∗
Advanced Review

This paper discusses the relevance and possible applications of evolutionary algorithms, particularly genetic algorithms, in the domain of knowledge discovery in databases. Knowledge discovery in databases is a process of discovering knowledge along with its validity, novelty, and potentiality. Various genetic-based feature selection algorithms with their pros and cons are discussed in this article. Rule (a kind of high-level representation of knowledge) discovery from databases, posed as single and multiobjective problems, is a difficult optimization problem. Here, we present a review of some of the genetic-based classification rule discovery methods based on fidelity criterion. The intractable nature of fuzzy rule mining using single and multiobjective genetic algorithms reported in the literature is reviewed. An extensive list of relevant and useful references is given for further research. © 2013 Wiley Periodicals, Inc.

How to cite this article: WIREs Data Mining Knowl Discov 2013. doi: 10.1002/widm.1087

∗Correspondence to: [email protected]
1Department of Systems Engineering, Ajou University, Suwon, South Korea
2Center for Soft Computing Research, Indian Statistical Institute Kolkata, Kolkata, India

WIREs Data Mining and Knowledge Discovery, Volume 00, January/February 2013. © 2013 John Wiley & Sons, Inc. wires.wiley.com/widm

INTRODUCTION

The current information era is characterized by a great expansion in the volume of data that are being generated by low-cost devices (e.g., scanners, bar code readers, sensors) and stored. Intuitively, this large amount of stored data contains valuable hidden knowledge, which could be used to improve the decision-making process of an organization. Piatetsky-Shapiro reported1 that it is an urgent requirement to develop a semiautomatic tool to discover hidden knowledge. However, discovering knowledge from such a volume of complex data can be characterized as a problem of intractability.2 Therefore, the development of efficient and effective tools for revealing valuable knowledge hidden in these databases becomes more critical for enterprise decision making. One of the possible approaches to this problem is by means of data mining or knowledge discovery from databases (KDD).3–6 Through data mining, interesting knowledge7 can be extracted, and the discovered knowledge can be applied in the target field to increase working efficiency and to improve the quality of decision making. Some knowledge discovery and data mining tools, e.g., DBMiner, DeltaMiner, and CN2, which are aimed at mainstream business users, provide up-to-date solutions.8,9 Interested readers and practitioners can find a range of existing state-of-the-art data mining and related tools discussed in Ref 10.

Over the last one and a half decades, most data mining techniques have been developed from a database perspective. In comparison, little effort has been made from the machine learning and soft computing perspectives.11 Recently, however, researchers in evolutionary algorithms and multiobjective evolutionary algorithms have shown growing interest in data mining applications and are reporting their own findings. Some of the findings in this direction can be obtained from Refs 12–14. Alcala-Fdez et al.15 have developed a software tool known as knowledge extraction based on evolutionary learning (KEEL) to assess evolutionary algorithms16 for data mining problems of various kinds, including regression, classification, unsupervised learning (clustering), and so on. It includes evolutionary learning algorithms based on different approaches: Pittsburgh,17,18 Michigan,19,20 iterative rule learning (IRL),21 and genetic cooperative competitive learning (GCCL).22 Along with the integration of evolutionary learning with different preprocessing techniques, it allows a complete analysis of any learning model in comparison with existing software tools. Similarly, in recent years,23,24 the development of methods for data mining has attracted increasing attention in the fuzzy set community. A systematic discussion of possible benefits of fuzzy methods in data mining is presented in Ref 25. To this end, this paper presents a well-balanced review of the literature on evolutionary algorithms and hybrid fuzzy genetic rule based systems in data mining and KDD.

Nowadays, the application domain of data mining is becoming increasingly complex, i.e., it has shifted from traditional scientific26 and market basket database mining27 to biological,28,29 health care,30,31 agriculture,32 process monitoring and control,33 intrusion detection,34–36 and social network analysis.37 For example, detecting unauthorized use, misuse, and attacks that have no previously described patterns on information systems is usually a very complex task for traditional methods. Similarly, data about a hospital's patients might contain interesting knowledge about which kind of patient is more likely to develop a given disease. Hence, in view of the complexity of these domains and the limitations of statistical classifiers, neural network based classifiers, decision tree based classifiers, and some nonparametric classifiers, there is motivation to develop intelligent systems using various evolutionary algorithm and fuzzy system based approaches. This paper reviews some representative data preprocessing methods using genetic algorithms and classification rule generation using genetic and fuzzy systems.

It should be noted that the quality of the discovered knowledge (whether it is a classification rule, association rule, prediction rule, or clusters) strongly depends on the quality of the data being mined. This has motivated the improvement and development of several data preprocessing techniques such as attribute selection, attribute construction, and training set selection.38 The requirements of data preprocessing in KDD and its success are reported in the literature.39–43 In the Preliminaries section, we present some of the preliminary concepts and definitions of KDD, evolutionary algorithms (EAs), multiobjective evolutionary algorithms (MOEAs), and fuzzy systems. The use of EAs, particularly genetic algorithms (GAs), for attribute selection is discussed in the section Data Mining Using Genetic Algorithms: Attribute Selection.

Data mining involves various tasks such as classification, clustering, association rule mining, regression, and change detection. Each task can be considered as a problem to be solved by data mining algorithms. This paper deals primarily with the utility of GAs for the classification task. However, readers interested in other data mining tasks can refer to Refs 44,45 for association rule mining based on evolutionary algorithms, Refs 46–49 for evolutionary algorithm based clustering, Ref 50 for genetic-based regression, and Ref 51 for genetic-based change detection. Various issues intertwined with classification rule mining (CRM) using GAs are discussed in the Data Mining Using Genetic Algorithms.

Later on, fuzzy classification rule mining (FCRM) using genetic and multiobjective genetic algorithms (MOGA) is presented. The aim is to generate a compact set of classification rules by simultaneous optimization of rule accuracy, length of the rules, and number of rules. The last section presents the summary and future research directions.

PRELIMINARIES

In this section, some preliminaries are discussed.

Knowledge Discovery in Databases

The subject of KDD has evolved, and continues to evolve, from the intersection of research in various fields such as databases, machine learning,52 pattern recognition, statistics, artificial intelligence, reasoning with uncertainties, knowledge acquisition for expert systems,53 data visualization, machine discovery, high-performance computing, evolutionary computation, multiobjective evolutionary computation, and swarm intelligence.12,54 This paper focuses on EAs, particularly GAs, for KDD. Hence, it is important to discuss the definitions and concepts of data mining and the process of KDD.

Definition: KDD is defined as the nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data.

Data Mining: KDD refers to the overall process of turning low-level data into high-level knowledge. An important step in the KDD process is data mining. Data mining is an interdisciplinary field with a general goal of predicting outcomes and uncovering relationships in data. It uses automated tools employing sophisticated algorithms to discover hidden patterns, associations, anomalies, and/or structures from a large amount of data stored in data warehouses or other information repositories.

KDD Process: The overall KDD process is interactive and iterative, involving four steps: (i) data acquisition and integration, (ii) data preprocessing, (iii) data mining, and (iv) postprocessing. Specifically, the following steps are intertwined in any practical implementation:

Domain specific knowledge includes relevant prior knowledge and goals of the application.
Extracting/selecting the target data set includes extracting/selecting a data set or focusing on a subset of data instances.
Data cleansing includes basic operations, such