Grammar-based Fuzzy Pattern Trees for Classification Problems

Aidan Murphy1 a, Muhammad Sarmad Ali1 b, Douglas Mota Dias1,2 c, Jorge Amaral2 d, Enrique Naredo1 e and Conor Ryan1 f 1University of Limerick, Limerick, Ireland 2Rio de Janeiro State University, Rio de Janeiro, Brazil

Keywords: , Pattern Trees, Fuzzy Logic.

Abstract: This paper introduces a novel approach to induce Fuzzy Pattern Trees (FPT) using Grammatical Evolution (GE), called FGE, and applies it to a set of benchmark classification problems. While conventionally a set of FPTs is needed for a classifier, one for each class, FGE needs just a single tree. This is the case for both binary and multi-classification problems. Experimental results show that FGE achieves competitive and frequently better results than state of the art FPT-related methods, such as FPTs evolved using Cartesian GP (FCGP), on a set of benchmark problems. While FCGP produces smaller trees, FGE reaches a better classification performance. FGE also benefits from a reduction in the number of necessary user-selectable parameters. Furthermore, in order to tackle bloat, or solutions growing too large, another version of FGE using parsimony pressure was tested. The experimental results show that FGE with this addition is able to produce smaller trees than those using FCGP, frequently without compromising the classification performance.

1 INTRODUCTION

Machine Learning (ML) has great potential to solve real-world problems and to contribute to the improvement of processes, products, and research. In the last two decades, the number of applications of machine learning has been increasing due to the availability of vast collections of data and massive computer power, thanks to the development of new training algorithms, the emergence of new hardware platforms based on graphics cards with GPUs, and the availability of open-source libraries (Došilović et al., 2018). Such conditions provide ML systems with the ability to solve highly complex problems with performance superior to that obtained by techniques that, until then, represented the state of the art. Moreover, in some specific fields of application, such as image classification, ML systems have surpassed human performance (He et al., 2015).

Although ML algorithms are successful in terms of results and predictions, they have their shortcomings. The most compelling is the absence of transparency, which identifies the so-called black-box models. In such models, it is very difficult or even impossible to understand how the ML system makes its decision or to extract the knowledge of how the decision is made. As a result, it does not allow a human being, expert or not, to check, interpret, and understand how the model reaches its conclusions.

In order to address these issues, Explainable Artificial Intelligence (XAI) (Adadi and Berrada, 2018; Arrieta et al., 2020) has appeared as a field of research focused on the interpretability of ML. The main purpose is to create a set of models and interpretable methods that are more explainable while preserving high levels of predictive performance (Carvalho et al., 2019).
a https://orcid.org/0000-0002-6209-4642
b https://orcid.org/0000-0002-7223-5322
c https://orcid.org/0000-0002-1783-6352
d https://orcid.org/0000-0001-6580-5668
e https://orcid.org/0000-0001-9818-911X
f https://orcid.org/0000-0002-7002-5815

Fuzzy Set theory has provided a framework to develop interpretable models (Cordón, 2011; Herrera, 2008) because it allows the knowledge acquired from data to be expressed in a comprehensible form, close to natural language, which gives the model a higher degree of interpretability (Hüllermeier, 2005). Most developed fuzzy models are rule-based fuzzy systems (FRBS) that can represent both classification


Murphy, A., Ali, M., Dias, D., Amaral, J., Naredo, E. and Ryan, C.
Grammar-based Fuzzy Pattern Trees for Classification Problems.
DOI: 10.5220/0010111900710080
In Proceedings of the 12th International Joint Conference on Computational Intelligence (IJCCI 2020), pages 71-80
ISBN: 978-989-758-475-6
Copyright © 2020 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved

ECTA 2020 - 12th International Conference on Evolutionary Computation Theory and Applications

and regression functions, and for which there are many strategies developed for the synthesis of these models (Cordón, 2011). Obtaining fuzzy models based on easily interpretable rules may not be an easy task because, depending on the application, many rules may be necessary, with many antecedents, which makes it difficult to understand the model. On the other hand, a system with relatively few rules can be easily interpreted, but may have its predictive accuracy compromised.

In this work, a novel approach to automatically induce models applied to classification problems is introduced. It uses a method based on the theory of fuzzy sets, Fuzzy Pattern Trees (FPT), which is not based on rules but on a hierarchical method. This work replaces the FPT learning method with Grammatical Evolution (GE). GE is flexible enough to derive feasible models such as FPTs, and it can efficiently address different problems by changing the grammar and the evaluation function. As a result, it is possible to obtain models that can solve a classification problem and to get explainable solutions at the same time. Moreover, the combination of GE and Fuzzy Logic gives a valuable opportunity to address the new research lines in XAI. Experimental results show that GE can evolve fuzzy pattern trees to solve benchmark classification problems with competitive results against state of the art methods, with better results in three of them.

The remainder of this paper is organized as follows: Section 2 reviews the main background concepts, including FPTs, Genetic Programming (GP), Cartesian GP (CGP) and GE. Section 3 explains the proposal and contributions of this work in more detail. Next, Section 4 presents the experimental setup, outlining all of the considered variants and performance measures. Section 5 presents and discusses the main experimental results of the described research. Finally, Section 6 presents the conclusions and future work derived from this research.

2 BACKGROUND

2.1 Fuzzy Pattern Trees

FPTs have been independently introduced by Huang et al. (Huang et al., 2008) and Yi et al. (Yi et al., 2009), who called this type of model Fuzzy Operator Trees. The FPT model class is related to several other model classes, including fuzzy rule-based systems (FRBS) and fuzzy decision trees (FDT).

An FPT is a hierarchical, tree-like structure whose inner nodes are marked with generalized (fuzzy) logical and arithmetic operators, and whose leaf nodes are associated with fuzzy predicates on input variables. It propagates information from the bottom to the top: a node takes the values of its descendants as input, aggregates them using the respective operator, and submits the result to its predecessor. Thus, an FPT implements a recursive mapping producing outputs in the [0,1] interval.

The following operators are used, where a and b are the inputs to the operator:

WTA = IF{}() .. ELSE()                       (1)
MAX = max(a, b)                              (2)
MIN = min(a, b)                              (3)
WA(k) = k·a + (1 − k)·b                      (4)
OWA(k) = k·max(a, b) + (1 − k)·min(a, b)     (5)
CONCENTRATE = a²                             (6)
DILATE = a^(1/2)                             (7)
COMPLEMENT = 1 − a                           (8)

where WTA, WA and OWA denote Winner Takes All, Weighted Average and Ordered Weighted Average, respectively.

Figure 1 shows an example of an FPT, which was trained on a (wine) quality dataset. It represents – as a fuzzy criterion – wine with a high quality.

Figure 1: Tree representing the interpretable class "Good Quality Wine", showing each variable with a different color.

The node labels of the tree illustrate their interpretation and not yet their implementation. In order to interpret the whole tree and grasp the fuzzy pattern it depicts, we start at the root node. It represents the final aggregation (a simple average in this case) and outputs the overall evaluation of the tree for a given instance (a wine). Then, we proceed to its children and so forth. The interpretation could be like this: a high quality wine fulfills two criteria. We call these two criteria – the left and right subtrees of the root node – criterion I and criterion II. Criterion I is fulfilled if the alcohol concentration of the wine is high or its density is high. Criterion II is fulfilled if the wine has a high concentration of sulfates or a


third criterion (III) is met. This is the case if both the alcohol concentration and the wine's acidity are low.

The FPTs were created focusing on the representation of knowledge through a tree-shaped expression rather than representing it in the form of rules. The first FPT induction method was created by Huang, Gedeon and Nikravesh (Huang et al., 2008), and refined in (Senge and Hüllermeier, 2011).

Hierarchical representation minimizes existing problems in rule-based systems, such as the exponential increase in the number of rules with increasing entries, and the loss of interpretability when a large number of rules is required to achieve accuracy requirements. The tree is represented as a graph, favoring the human ability to recognize visual patterns and allowing the discovery of connections between the input variables and a class. These connections can be complicated to make when using models with a fixed set of rules.

To obtain a classifier, one tree is created for each class; the classifier decision occurs in favor of the tree (class) that has the highest output value. Also, since each tree is considered a "logical description" of the class, it allows a more specific interpretation of the learning problem (Senge and Hüllermeier, 2011). The FPT provides an alternative for the construction of accurate and interpretable fuzzy models.

2.2 Cartesian GP

GP concerns the automatic generation of programs inspired by the theory of evolution. John Koza pioneered a tree-based GP form to represent computer programs using, at that time, LISP, an artificial intelligence computer language (Koza, 1992). CGP (Miller, 1999) is a flavor of GP with approximately 20 years of interesting and varied research works addressing a wide range of problem domains.

CGP uses graphs to represent solutions, and its key feature is the ability to encode computational structures as directed graphs using redundant genes. This redundancy gives CGP a very adaptable representation by allowing the output nodes to either connect or disconnect to previous nodes in the directed graph.

The synthesis of FPTs by CGP was proposed in (dos Santos and do Amaral, 2015). The authors replaced the learning strategy proposed in (Senge and Hüllermeier, 2011), called Beam Search, with CGP. The former learning strategy has a "greedy" characteristic which prevents a better exploration of the search space, increasing the possibility of the algorithm being trapped in a sub-optimal solution. It also suffered from the "curse of dimensionality": if the number of input features and the width of the beam are large, the algorithm will take a long time to evaluate all the possibilities; as a result, there will be an explosion in the number of possible combinations.

The results reported in (dos Santos and do Amaral, 2015) indicated that FPTs synthesized by CGP are competitive with other classifier algorithms, and they are smaller than those obtained in (Senge and Hüllermeier, 2011).

Another example of the synthesis of FPTs by CGP can be found in (dos Santos et al., 2018). In that paper, the authors implemented the improvements in CGP suggested by (Goldman and Punch, 2014) and an NSGA-II strategy to deal with two conflicting objectives: the accuracy and the size of the tree.

The authors in (Wilson and Banzhaf, 2008) investigated the fundamental difference between traditional forms of Linear GP (LGP) and CGP, and their restrictions in connectivity. The difference between graph-based LGP and CGP is the means with which they restrict the feed-forward connectivity of their directed acyclic graphs. In particular, CGP restricts connectivity based on the levels-back parameter, while LGP's connectivity is implicit and is under evolutionary control as a component of the genotype. Experimental results show that CGP does not exhibit program bloat (Turner and Miller, 2014). However, using CGP to evolve programs in an arbitrary language can be tricky.

2.3 Grammatical Evolution

GE is a variant of GP which differs in that the space of legal programs it can explore is described by a Backus-Naur Form (BNF) grammar (Ryan et al., 1998; O'Neill and Ryan, 2001) or an Attribute Grammar (AG) (Patten and Ryan, 2015; Karim and Ryan, 2014; Karim and Ryan, 2011b; Karim and Ryan, 2011a), and it can evolve computer programs or arbitrary structures that can be defined in this way.

Figure 2: The GE system uses a search engine (typically a GA) to generate solutions for a given problem by recombining the genetic material (genotype), which is mapped onto programs (phenotype) according to a language specification (interpreter/compiler).


The modular design behind GE, as shown in Figure 2, means that any search engine can be used, although typically a variable-length Genetic Algorithm (GA) is employed to evolve a population of binary strings. After mapping each individual onto a program using GE, any program/algorithm can be used to evaluate those individuals.

The linear representation of the genome allows the application of genetic operators such as crossover and mutation in the manner of a typical GA, unlike tree-based GP. Starting with the start symbol of the grammar, each individual's chromosome contains in its codons (typically groups of 8 bits) the information necessary to select and apply the grammar production rules, in this way constructing the final program. The mapping process is illustrated with an example in Figure 3.

Production rules for each non-terminal are indexed starting from 0 and, when selecting a production rule (starting with the left-most non-terminal of the developing program), the next codon value in the genome is read and interpreted using the formula: p = c % r, where c represents the current codon value, % represents the modulus operator, and r is the number of production rules for the left-most non-terminal.

If, while reading codons, the algorithm reaches the end of the genome, a wrapping operator is invoked and the process continues reading from the beginning of the genome. The process stops when all of the non-terminal symbols have been replaced, resulting in a valid program. If it fails to replace all of the non-terminal symbols after a maximum number of iterations, the individual is considered invalid and penalized with the lowest possible fitness.

Genotype: 110111000101000100101110 101111010000111100100000 0100111100111010
Transcription: 220 81 46 189 15 32 79 58

<exp> ::= max(<exp>,<exp>)          [0]
        | min(<exp>,<exp>)          [1]
        | WA(<const>,<exp>,<exp>)   [2]
        | OWA(<const>,<exp>,<exp>)  [3]
        | concentrate(<exp>)        [4]
        | dilation(<exp>)           [5]
        | complement(<exp>)         [6]
        | x[<var>]                  [7]
<const> ::= 0.<var>                 [0]
<var> ::= 0 [0] | 1 [1] | 2 [2] | 3 [3] | 4 [4] | 5 [5] | 6 [6] | 7 [7] | 8 [8] | 9 [9]

Translation:
Start --> <exp>
220 % 8 = 4     concentrate(<exp>)
 81 % 8 = 1     concentrate(min(<exp>,<exp>))
 46 % 8 = 6     concentrate(min(complement(<exp>),<exp>))
189 % 8 = 5     concentrate(min(complement(dilation(<exp>)),<exp>))
 15 % 8 = 7     concentrate(min(complement(dilation(x[<var>])),<exp>))
 32 % 10 = 2    concentrate(min(complement(dilation(x[2])),<exp>))
 79 % 8 = 7     concentrate(min(complement(dilation(x[2])),x[<var>]))
 58 % 10 = 8    concentrate(min(complement(dilation(x[2])),x[8]))

Figure 3: Example of a GE genotype-phenotype mapping process for the Iris dataset, where the binary genotype is grouped into codons (e.g. 8 bits), transcribed into an integer string, then used to select production rules from a predefined grammar (BNF-Grammar), and finally translated into a sequence of rules to build a classifier (phenotype).

3 FUZZY GE

This section introduces Fuzzy GE, an evolutionary approach to generate classifiers with linguistic labels. The aim is to create meaningful models applied to multi-classification problems.

The schematic of a fuzzy system is shown in Figure 4. The acquisition of a fuzzy rule base and the associated fuzzy set parameters from a set of input/output data is important in the development of fuzzy expert systems (Zadeh, 1965), with or without the aid of human expertise. Several automated techniques have been proposed for the solution of this knowledge acquisition problem. An advantage of using GE in the context of evolving structures for a fuzzy rule base is the flexibility it gives in defining different partitioning geometries based on a chosen grammar (Wilson and Kaur, 2006).

Figure 4: Fuzzy system (input variables -> fuzzifier -> fuzzy inference engine, driven by the fuzzy rule base -> defuzzifier -> output variables).

3.1 Fuzzy Classification

There are many approaches to using GP to evolve classifiers (Espejo et al., 2009), and GE has been shown to be well suited for such a task (Nyathi and Pillay, 2018). One of the most popular methods for evolving a GP binary classifier is Static Range Selection, described by Zhang and Smart, who later proposed Centred Dynamic Class Boundary Determination (CDCBD) for multi-class classification (Zhang and Smart, 2004).

In binary classification, an input x ∈ ℜ^n has to be classified as belonging to one of two classes, ω1 or ω2. In this method, the goal is to evolve a mapping g(x) : ℜ^n → ℜ. The classification rule R states that pattern x is labelled as belonging to class ω1 if g(x) > r, and belongs to ω2 otherwise, where r is the decision boundary value.
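The rule R just described can be sketched as runnable code; the mapping g below is an illustrative stand-in for an evolved tree, not a model from the paper.

```python
# Sketch of the binary classification rule R: label a pattern as class
# omega_1 when g(x) > r, and omega_2 otherwise (decision boundary r = 0).

def g(x):
    # Toy evolved mapping g : R^n -> R (hypothetical, for illustration only).
    return x[0] - x[1]

def classify_R(x, r=0.0):
    # The rule R from the text, applied to the score produced by g.
    return "omega_1" if g(x) > r else "omega_2"

print(classify_R([0.8, 0.3]), classify_R([0.2, 0.9]))  # omega_1 omega_2
```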


The fitness function is defined to maximize the total classification accuracy after R is applied, normally setting the decision boundary to r = 0. A data sample is passed to the tree, which yields a score. If the score is below the boundary it is labelled a particular class, and likewise it is labelled the other class if it is above the boundary. The process for CDCBD is similar, with n−1 boundaries existing, which can dynamically change to class each individual.

Both approaches only evolve one tree (or mapping) regardless of the number of classes, and attempt to classify the individual based on its output from that tree. There are drawbacks to this approach, as much effort needs to be expended in designing or hand-crafting class boundaries, or in creating systems to optimise them for each individual (Fitzgerald and Ryan, 2012), which becomes increasingly more difficult as the number of classes increases.

An FPT classifier requires that one FPT be evolved per class in the problem. Evolving multiple trees simultaneously adds a great deal of complexity to the problem. In general, care must be taken, and special operators, particularly when using crossover, must be created (Ain et al., 2018; Lee et al., 2015). However, due to the separation between the search space and program space in GE, it is not necessary to create any special operators in FGE.

The novel method involves evolving only one large solution. This solution comprises FPTs, with each class having its own FPT, and a decision node at its root. Each FPT can therefore be thought of as a subtree of a larger classifier tree, as seen in Figure 6, with the root node assigning the label to each individual. More formally, i mappings f_i(x) : ℜ^n → [0,1], where i is the number of classes in the problem, are evolved. The FPT, or subtree, (f_1(x) ... f_i(x)) which confers the largest score to the individual is deemed the winner, and the individual is labeled with the class that FPT represents.

Figure 6: Pictorial representation of a multi-classifier evolved by FGE, where FT_c is the fuzzy tree for each available class, and at the root the winner takes all (WTA).

For example, if f_1(x) yielded the largest score, the individual would be assigned to class 1. This is highlighted further in Figure 7, where the second tree yields the better score, S_c, the hollow star. The individual is therefore assigned class c. This is in contrast to the methods described above, which only produce one score per individual and assign it a label based on the score's position relative to a boundary (or boundaries).

Figure 7: Graphical depiction of the mapping process from the feature space to a 1-dimensional space [0,1] using a set of fuzzy trees FT_1 to FT_c.

FGE does not require the use of any protected operators when evolving multiple trees, due to the unique separation between genotype and phenotype, and only needs grammar augmentation to address different problem types.

3.2 Fuzzy Representation

In order to give better interpretability to the evolved models, fuzzy logic is used to build more meaningful trees. To this end, the following five linguistic terms are used as fuzzy labels: low, medium-low, medium, medium-high, and high (see Figure 5). A uniform distribution of the input variables was considered for these fuzzy terms to partition the space.

Figure 5: Fuzzy sets (degree of membership of each of the five linguistic terms over an input variable).

The fuzzy operators used are described in Section 2.1. The values of the inputs of the operated nodes are a and b. In the case of the Weighted Average (WA) and Ordered Weighted Average (OWA) operators, k will be a value created randomly within the range [0,1]. Only one input will be provided in the case of the concentration, dilation, and complement. Winner Takes All (WTA) will be the root node of every fuzzy tree. This function receives the score from each FPT and labels the individual corresponding to the highest scoring tree.

The genotype represents all the operators and fuzzy terms, and different trees can be obtained depending on the grammar used in GE.
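A minimal sketch (not the authors' implementation) tying these pieces together: leaves fuzzify an input with one of the five linguistic terms, modelled here as triangular sets on the assumed uniform partition of [0,1]; inner nodes apply operators from Section 2.1; and the WTA root assigns the class of the highest-scoring subtree. The triangular membership shape and the two toy trees are assumptions for illustration.

```python
# Illustrative sketch of an FGE individual: fuzzified leaves, the fuzzy
# operators of Section 2.1 at inner nodes, and a WTA root.

def membership(x, centre, width=0.25):
    # One of the five linguistic terms (low .. high) as a triangular fuzzy
    # set centred on 0, 0.25, 0.5, 0.75 or 1 (assumed uniform partition).
    return max(0.0, 1.0 - abs(x - centre) / width)

def wa(k, a, b):        # Weighted Average, Eq. (4)
    return k * a + (1.0 - k) * b

def complement(a):      # Eq. (8)
    return 1.0 - a

def wta(scores):
    # Winner Takes All root: the index of the highest-scoring FPT is the label.
    return max(range(len(scores)), key=scores.__getitem__)

def classify(x):
    # Two toy single-feature FPTs for a binary problem:
    # class 0: WA(0.7, low(x), complement(high(x)));  class 1: high(x).
    f0 = wa(0.7, membership(x, 0.0), complement(membership(x, 1.0)))
    f1 = membership(x, 1.0)
    return wta([f0, f1])

print(classify(0.05), classify(0.97))  # 0 1
```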


3.3 Lexicographic Parsimony Pressure

Accuracy, efficiency in training, and interpretability are often the dominant considerations when evolving a classifier. Maximising accuracy, or similarly minimising error, has traditionally been the main focus of research, but interpretability has continued to grow in significance, with many papers and conferences now dedicated to the area (Adadi and Berrada, 2018). For an FPT to be interpretable or comprehensible, and therefore serve as a class descriptor, it is important that the evolved solutions remain as small as possible and that bloat is mitigated (Espejo et al., 2009). While GP's ability to find high-dimensional, non-linear solutions is lauded, it can result in a significant loss of interpretability. Indeed, one of CGP's main advantages over standard GP and GP variants is its inherent lack of bloat (Turner and Miller, 2014).

Parsimony pressure is not GP-specific and has been used whenever arbitrarily-sized representations tended to get out of control. Such usage to date can be divided into two broad categories: parametric and Pareto parsimony pressure. Parametric parsimony pressure uses size as a direct numerical factor in fitness, while Pareto parsimony pressure uses size as a separate objective in a Pareto-optimization procedure.

In this work two sets of experiments were run. The first uses standard GE and the second is identical to the first, but implements a slightly modified lexicographic parsimony pressure (Luke and Panait, 2002) to bias the selection to prefer a smaller solution size. The size is defined as the maximum depth of any of the n FPTs GE evolves for a particular solution.

4 EXPERIMENTAL SETUP

This section presents the experimental setup used. The approach is compared with several state of the art classification algorithms and one other FPT-related method, FPTs evolved using CGP (FCGP) (dos Santos et al., 2018; dos Santos, 2014). The same benchmark classification problems are used as previously for a fair comparison between the two approaches. The full experimental setup for CGP and each of the other benchmark classification techniques which FGE is compared against can be found in (dos Santos et al., 2018). The results are seen in Table 3.

4.1 Datasets

The experiments are run on eight benchmark datasets, all of which can be found online in the UCI and CMU repositories (Dua and Graff, 2017; StatLib, 2020). They include six binary classification problems and two multi-class problems. The size of the eight datasets, in addition to the number of classes and variables for each, are shown in Table 1.

Table 1: Benchmark datasets for binary and multiclass classification problems, taken from the UCI repository† and the CMU repository§.

Datasets       Short   Class  Vars  Instances
Binary
Lupus§         Lupus   2      3     87
Haberman†      Haber   2      3     306
Lawsuit§       Law     2      4     264
Transfusion†   Transf  2      4     748
Pima†          Pima    2      8     768
Australian†    Austr   2      14    690
Multiclass
Iris†          Iris    3      4     150
Wine†          Wine    3      13    178

4.2 GE Parameters

The experiments were run for 50 generations with a population size of 500. Sensible Initialisation and effective crossover were used (Ryan and Azad, 2003). 5-fold cross-validation was used; this was repeated 5 times for a total of 25 runs. These values result in a higher computational cost than those of the CGP experiments (dos Santos, 2014). The full experimental setup can be seen in Table 2.

Table 2: List of the main parameters used to run GE.

Parameter          Value
Folds              5
Runs               25 (5 per fold)
Total Generations  50
Population         500
Replacement        Tournament
Crossover          0.9 (Effective)
Mutation           0.01
Initialisation     Sensible

The grammar used for binary classification can be seen in Figure 8. The WTA node contains two <exp> non-terminals which need to be expanded. When fully expanded, these will be the FPTs for each class. For binary classification two FPTs are required. For multi-class classification the grammar simply needs to be augmented by adding more <exp> symbols to the expression. Three classes require three <exp> symbols, and so on. Constants were created using the standard GE approach of digit concatenation (Azad and Ryan, 2014).

<start> ::= WTA(<exp>,<exp>)
<exp>   ::= max(<exp>,<exp>) |
            min(<exp>,<exp>) |
            WA(<const>,<exp>,<exp>) |
            OWA(<const>,<exp>,<exp>) |
            concentrate(<exp>) |
            dilation(<exp>) |
            complement(<exp>) |
            x1 | x2 | x3 | ...
<const> ::= 0.<digit><digit><digit>
<digit> ::= 0 | 1 | 2 | ...

Figure 8: Grammar used to evolve a Fuzzy Pattern Tree for a binary dataset. The WTA node can be augmented by adding extra <exp> to include as many subtrees as necessary, making it a multi-class grammar.

4.3 Fitness Function

The fitness function used for FGE, shown in Eq. 10, seeks to minimise the RMSE for each individual. The benchmark datasets used are reasonably balanced and therefore a non-standard fitness function, such as cross entropy, was deemed not to be needed. However, future experiments run on very unbalanced data may need to modify this.

RMSE = √( (1/n) ∑_{i=1}^{n} (ŷ_i − y_i)² )    (9)

F = 1 − RMSE    (10)

The fitness function for FGE with lexicographic parsimony pressure, seeking interpretability in solutions, is calculated by penalising the solution by its size. It is computed as follows:

F_L = 1 − RMSE × 0.99 − MaxDepth × 0.01    (11)

Experiments using fitness function F are denoted as FGE in the results section, while experiments using F_L are identified by FGE-L.

The mean depth of a solution is the average max depth of each FPT evolved. For a binary classifier C with FPT_1 and FPT_2, the mean depth would be:

MeanDepth(C) = (1/2) [ MaxDepth(FPT_1) + MaxDepth(FPT_2) ]    (12)

Note that an FPT containing just a leaf node would have a depth of 0.

5 RESULTS

The experimental results are summarized in Table 3, showing the best performance from 25 runs of FGE. Other methods' performances are taken from (dos Santos et al., 2018). The best result for each problem is highlighted in bold. Datasets are sorted to first show the binary problems (1-6), followed by the multi-classification problems (7-8). The first two columns show the results for FGE and FGE with lexicographic pressure applied, respectively. The third column shows the results for FCGP, the fourth Support Vector Machine with Linear Kernel (SVM-L) and the fifth Random Forest (RF). The sixth column shows Support Vector Machine with Radial Basis Function Kernel (SVM-R) and, lastly, column seven shows Pattern Tree Top-Down Epsilon (PTTDE), the original technique for creating FPTs (Senge and Hüllermeier, 2011). In this paper epsilon is set to 0.25%, where epsilon controls the amount of improvement required to continue to grow the tree.

A Friedman test was carried out on the data to compare the performance of the classifiers. This test showed no evidence that any one classifier was statistically significantly better than all others.

FGE achieves very competitive results with the previous experiments, only being glaringly outperformed in one benchmark, Wine, in which it was the worst performing classifier. FGE attained a best performance of 83% on this benchmark, compared to 98% found by SVM, RF and PTTDE. It was also noticeably worse than the result achieved by FCGP, which reached 90%.

FGE accomplished the best performance in 3 problems: (i) Haberman - FGE achieves 74%, outperforming all and equalling PTTDE as the best result; (ii) Australian - FGE and FGE-L score 86% for a tie as the best performing classifier; and (iii) Iris - FGE and FGE-L again match the best performing classifier, attaining 96%. Interestingly, FGE-L achieves best in class performance, 77%, on the Transfusion problem.

As well as these results, FGE and FGE-L attain competitive results on the rest of the classification problems, with the exception of the Lupus dataset. FGE reaches similar performance to FCGP, 73% and 74% respectively, but PTTDE produced the best accuracy, 77%. FGE-L performs identically, finding 73% accuracy.

FGE-L can be seen to find substantially smaller solutions than those found by FGE, as shown in Table 4, where SizeReduc denotes the reduction in the averaged tree size achieved by FGE-L against FGE.
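To make the size/accuracy trade-off behind FGE-L concrete, the fitness computations of Eqs. (9)-(12) can be sketched as follows. This is a literal reading of the formulas, assuming the square root conventionally present in RMSE; the prediction vector and depth values are illustrative, not experimental data.

```python
# Sketch of the FGE fitness functions, read literally from Eqs. (9)-(12);
# the target/prediction vectors and tree depths are illustrative only.
import math

def rmse(y_true, y_pred):                    # Eq. (9)
    n = len(y_true)
    return math.sqrt(sum((yh - y) ** 2 for yh, y in zip(y_pred, y_true)) / n)

def fitness(y_true, y_pred):                 # Eq. (10): F = 1 - RMSE
    return 1.0 - rmse(y_true, y_pred)

def fitness_lex(y_true, y_pred, max_depth):  # Eq. (11): size-penalised F_L
    return 1.0 - rmse(y_true, y_pred) * 0.99 - max_depth * 0.01

def mean_depth(depths):                      # Eq. (12), generalised to n FPTs
    return sum(depths) / len(depths)

y, yhat = [1, 0, 1, 1], [0.9, 0.2, 0.8, 1.0]
print(round(fitness(y, yhat), 3))        # accuracy-only fitness
print(round(fitness_lex(y, yhat, 6), 3)) # the same tree, penalised at depth 6
```

Under F the two variants score a deep and a shallow tree identically; under F_L every extra level of depth costs 0.01, biasing selection toward smaller trees.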


Table 3: Classification performance comparison of FGE versions against previous related work, showing the classification accuracy on the test data for the best solution found. Bold indicates the best performance.

Dataset  FGE   FGE-L  FCGP  SVM-L  RF    SVM-R  PTTDE
Lupus    0.73  0.73   0.74  0.74   0.62  0.73   0.77
Haber    0.74  0.72   0.73  0.72   0.65  0.71   0.74
Law      0.96  0.94   0.93  0.99   0.97  0.96   0.94
Transf   0.76  0.77   0.76  0.76   0.70  0.73   0.77
Pima     0.74  0.74   0.72  0.77   0.77  0.71   0.76
Austr    0.86  0.86   0.85  0.86   0.79  0.85   0.85
Iris     0.96  0.96   0.95  0.96   0.95  0.95   0.95
Wine     0.83  0.83   0.90  0.98   0.98  0.98   0.98

Across all the problems tested, remarkably smaller trees were found by FGE-L. In particular, the Haber, Pima and Australian problems were all seen to reduce in size by over 80%. Parsimony pressure does not appear to affect the performance, however, with a decrease in accuracy seen in only two problems: Haber and Lawsuit. Strikingly, there was an increase in the performance on the Transfusion problem by 1%. The major reduction seen in the size of the final solutions found in every experiment may hint at bloat being a problem in FGE. A pressure of 1% of size was applied in these experiments, but tuning the pressure applied is an avenue for future research.

FGE-L was the best performing classifier on the Transfusion, Iris and Australian problems. On the problems studied there appears to be very little, or sometimes no, trade-off in performance associated with evolving smaller trees. It is possibly the case that the global optimums for these problems were smaller trees, but this requires further study on larger, more complicated benchmarks. These results do seem to strongly suggest that bloat may be an issue in FGE.

The trees found using CGP are much smaller than those found using FGE, but larger than those using FGE-L. When the search is biased towards smaller-sized individuals, FGE-L finds smaller solutions in 7 problems. Due to these small sizes, FPTs found using FGE-L should lead to very interpretable results. The mean size of the final trees found by FGE, FGE-L with parsimony pressure and FCGP are shown in Table 4; the best results are in bold.

Table 4: Average size comparison in terms of the tree depth between fuzzy pattern tree approaches: FCGP, FGE, and FGE-L. Best results are in bold. SizeReduc is the average size reduction seen in FGE-L vs FGE.

Dataset  FCGP  FGE   FGE-L  SizeReduc
Lupus    1.65  7.84  2.38   70%
Haber    1.85  9.42  0.2    98%
Law      1.05  5.02  0.98   79%
Transf   2     6.76  1.54   77%
Pima     1     6.7   1      85%
Austr    1.5   5.12  0.92   82%
Iris     1.24  1.8   0.64   64%
Wine     1     2.47  0.68   72%

Overall, the best performing method was SVM-L, achieving the best performance on 5 of the benchmark problems. SVM-L does not, however, allow any interpretability of its solutions. FGE was best performing on 3 problems, FGE-L was best performing on 3 problems, and FCGP was not best on any. FGE beat or equalled FCGP in 6/8 problems studied, and FGE-L evolved the smallest trees in all but one problem, Lupus, and was able to beat FCGP in 5/8 problems.

6 CONCLUSIONS

This paper proposes a new way of inducing Fuzzy Pattern Trees using Grammatical Evolution as the learning algorithm, FGE. FPT is a viable alternative to the classic rule-based fuzzy models, since their hierarchical structure allows a more compact representation and a compromise between the accuracy and the simplicity of the model. The experimental results showed that FGE has a competitive performance in the task of classification with respect to some of the best classifiers available. Crucially, it also provides an interpretable model, that is, the knowledge obtained in the learning process can be extracted from the model and presented to a user in comprehensible terms.

A promising aspect of the present work is that several future lines of research can be explored. The proposed algorithms should be evaluated on other machine learning problems, such as unsupervised clustering. One interesting possibility is to try different approaches to reduce the tree size, including regularization or encapsulation (Murphy and Ryan, 2020). Furthermore, the use of different sets of grammars for GE could be explored. A final avenue for future research is to empirically


examine if smaller tree size does offer more user interpretability. It is possible that another metric, such as the number of variables used or the presence of particular subtrees, may grant better interpretability, and this must be investigated.

ACKNOWLEDGEMENTS

The authors thank the anonymous reviewers for their time, comments and helpful suggestions. The authors are supported by Research Grants 13/RC/2094 and 16/IA/4605 from the Science Foundation Ireland and by Lero, the Irish Software Engineering Research Centre (www.lero.ie). The third and fourth authors are partially financed by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001.

REFERENCES

Adadi, A. and Berrada, M. (2018). Peeking inside the black-box: A survey on explainable artificial intelligence (XAI). IEEE Access, 6:52138–52160.

Ain, Q. U., Al-Sahaf, H., Xue, B., and Zhang, M. (2018). A multi-tree genetic programming representation for melanoma detection using local and global features. In Mitrovic, T., Xue, B., and Li, X., editors, AI 2018: Advances in Artificial Intelligence, pages 111–123, Cham. Springer International Publishing.

Arrieta, A. B., Díaz-Rodríguez, N., Ser, J. D., Bennetot, A., Tabik, S., Barbado, A., Garcia, S., Gil-Lopez, S., Molina, D., Benjamins, R., Chatila, R., and Herrera, F. (2020). Explainable artificial intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. Information Fusion, 58:82–115.

Azad, R. M. A. and Ryan, C. (2014). The best things don't always come in small packages: Constant creation in grammatical evolution. In Nicolau, M., Krawiec, K., Heywood, M. I., Castelli, M., García-Sánchez, P., Merelo, J. J., Rivas Santos, V. M., and Sim, K., editors, Genetic Programming, pages 186–197, Berlin, Heidelberg. Springer Berlin Heidelberg.

Carvalho, D. V., Pereira, E. M., and Cardoso, J. S. (2019). Machine learning interpretability: A survey on methods and metrics. Electronics, 8(8):832.

Cordón, O. (2011). A historical review of evolutionary learning methods for Mamdani-type fuzzy rule-based systems: Designing interpretable genetic fuzzy systems. International Journal of Approximate Reasoning, 52(6):894–913.

Došilović, F. K., Brčić, M., and Hlupić, N. (2018). Explainable artificial intelligence: A survey. In 2018 41st International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), pages 0210–0215.

dos Santos, A. R. (2014). Síntese de Árvores de Padrões Fuzzy através de Programação Genética Cartesiana. Master's dissertation, Universidade do Estado do Rio de Janeiro.

dos Santos, A. R. and do Amaral, J. L. M. (2015). Synthesis of fuzzy pattern trees by Cartesian genetic programming. Mathware & Soft Computing, 22(1):52–56.

dos Santos, A. R., do Amaral, J. L. M., Soares, C. A. R., and de Barros, A. V. (2018). Multi-objective fuzzy pattern trees. In 2018 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), pages 1–6.

Dua, D. and Graff, C. (2017). UCI machine learning repository.

Espejo, P. G., Ventura, S., and Herrera, F. (2009). A survey on the application of genetic programming to classification. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 40(2):121–144.

Fitzgerald, J. and Ryan, C. (2012). Exploring boundaries: optimising individual class boundaries for binary classification problem. In Proceedings of the 14th Annual Conference on Genetic and Evolutionary Computation, pages 743–750.

Goldman, B. W. and Punch, W. F. (2014). Analysis of Cartesian genetic programming's evolutionary mechanisms. IEEE Transactions on Evolutionary Computation, 19(3):359–373.

He, K., Zhang, X., Ren, S., and Sun, J. (2015). Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. arXiv:1502.01852 [cs].

Herrera, F. (2008). Genetic fuzzy systems: taxonomy, current research trends and prospects. Evolutionary Intelligence, 1(1):27–46.

Hüllermeier, E. (2005). Fuzzy methods in machine learning and data mining: Status and prospects. Fuzzy Sets and Systems, 156(3):387–406.

Huang, Z., Gedeon, T. D., and Nikravesh, M. (2008). Pattern trees induction: A new machine learning method. IEEE Transactions on Fuzzy Systems, 16(4):958–970.

Karim, M. R. and Ryan, C. (2011a). Degeneracy reduction or duplicate elimination? An analysis on the performance of attributed grammatical evolution with lookahead to solve the multiple knapsack problem. In Nature Inspired Cooperative Strategies for Optimization, NICSO 2011, Cluj-Napoca, Romania, October 20-22, 2011, pages 247–266.

Karim, M. R. and Ryan, C. (2011b). A new approach to solving 0-1 multiconstraint knapsack problems using attribute grammar with lookahead. In Genetic Programming - 14th European Conference, EuroGP 2011, Torino, Italy, April 27-29, 2011, Proceedings, pages 250–261.

Karim, M. R. and Ryan, C. (2014). On improving grammatical evolution performance in symbolic regression with attribute grammar. In Genetic and Evolutionary Computation Conference, GECCO '14, Vancouver, BC, Canada, July 12-16, 2014, Companion Material Proceedings, pages 139–140.

ECTA 2020 - 12th International Conference on Evolutionary Computation Theory and Applications

Koza, J. R. (1992). Genetic Programming - On the Programming of Computers by Means of Natural Selection. Complex Adaptive Systems. MIT Press.

Lee, J.-H., Anaraki, J. R., Ahn, C. W., and An, J. (2015). Efficient classification system based on fuzzy-rough feature selection and multitree genetic programming for intension pattern recognition using brain signal. Expert Systems with Applications, 42(3):1644–1651.

Luke, S. and Panait, L. (2002). Lexicographic parsimony pressure. In Proceedings of the 4th Annual Conference on Genetic and Evolutionary Computation, GECCO'02, pages 829–836, San Francisco, CA, USA. Morgan Kaufmann Publishers Inc.

Miller, J. F. (1999). An empirical study of the efficiency of learning boolean functions using a Cartesian genetic programming approach. In Proceedings of the 1st Annual Conference on Genetic and Evolutionary Computation - Volume 2, GECCO'99, pages 1135–1142, San Francisco, CA, USA. Morgan Kaufmann Publishers Inc.

Murphy, A. and Ryan, C. (2020). Improving module identification and use in grammatical evolution. In Jin, Y., editor, 2020 IEEE Congress on Evolutionary Computation, CEC 2020. IEEE Computational Intelligence Society, IEEE Press.

Nyathi, T. and Pillay, N. (2018). Comparison of a genetic algorithm to grammatical evolution for automated design of genetic programming classification algorithms. Expert Systems with Applications, 104:213–234.

O'Neill, M. and Ryan, C. (2001). Grammatical evolution. IEEE Transactions on Evolutionary Computation, 5(4):349–358.

Patten, J. V. and Ryan, C. (2015). Attributed grammatical evolution using shared memory spaces and dynamically typed semantic function specification. In Genetic Programming - 18th European Conference, EuroGP 2015, Copenhagen, Denmark, April 8-10, 2015, Proceedings, pages 105–112.

Ryan, C. and Azad, R. M. A. (2003). Sensible initialisation in grammatical evolution. In GECCO, pages 142–145. AAAI.

Ryan, C., Collins, J. J., and O'Neill, M. (1998). Grammatical evolution: Evolving programs for an arbitrary language. In Banzhaf, W., Poli, R., Schoenauer, M., and Fogarty, T. C., editors, EuroGP, volume 1391 of Lecture Notes in Computer Science, pages 83–96. Springer.

Senge, R. and Hüllermeier, E. (2011). Top-down induction of fuzzy pattern trees. IEEE Transactions on Fuzzy Systems, 19(2):241–252.

StatLib (2020). StatLib - Datasets Archive.

Turner, A. J. and Miller, J. F. (2014). Cartesian genetic programming: Why no bloat? In Nicolau, M., Krawiec, K., Heywood, M. I., Castelli, M., García-Sánchez, P., Merelo, J. J., Rivas Santos, V. M., and Sim, K., editors, Genetic Programming, pages 222–233, Berlin, Heidelberg. Springer Berlin Heidelberg.

Wilson, D. and Kaur, D. (2006). Fuzzy classification using grammatical evolution for structure identification. pages 80–84.

Wilson, G. and Banzhaf, W. (2008). A comparison of Cartesian genetic programming and linear genetic programming. pages 182–193.

Yi, Y., Fober, T., and Hüllermeier, E. (2009). Fuzzy operator trees for modeling rating functions. International Journal of Computational Intelligence and Applications, 8:413–428.

Zadeh, L. (1965). Fuzzy sets. Information and Control, 8(3):338–353.

Zhang, M. and Smart, W. (2004). Multiclass object classification using genetic programming. In Workshops on Applications of Evolutionary Computation, pages 369–378. Springer.
