Generating Fuzzy Queries from Weighted Fuzzy Classifier Rules
Total Page:16
File Type:pdf, Size:1020Kb
Generating Fuzzy Queries from Weighted Fuzzy Classifier Rules Antonio C. S. Branco Alexandre G. Evsukoff Nelson F. F. Ebecken COPPE/Federal University of COPPE/Federal University of COPPE/Federal University of Rio de Janeiro, Brazil Rio de Janeiro, Brazil Rio de Janeiro, Brazil [email protected] [email protected] [email protected] Abstract will return all the records that match the query sentence. However, it is frequently desirable to rank those records, This work presents a methodology to generate such that the manager is able to define priorities. automatically a set of fuzzy queries that are translated Neural networks and Bayesian classifiers are also from a set of fuzzy rules learned from a data set. A rule frequently found in most of DM suites. Such models may simplification approach is proposed, which allows a be coded into DBMS, via PMML, to retrieve a ranked list better interpretability of the fuzzy query sentence. The of records. Nevertheless, neural networks and Bayesian fuzzy query sentences are generated in such a way that classifiers models are not linguistically understandable, they can be processed by a standard fuzzy query system. such that managers cannot directly validate the knowledge The methodology is applied to a benchmark data set extracted by the DM algorithm. considering a scenario of target selection in a direct Fuzzy queries have emerged in the last 25 years to deal market application. The results obtained by the current with the necessity to soften the Boolean logic in relational approach are compared with the results of a standard a databases. A fuzzy query system is an interface to human decision tree induction algorithm. users to get information from database using (quasi) natural language sentences [1][2][3]. Many fuzzy queries implementations have been proposed, resulting in slightly 1. Introduction different languages such as SQLf [4], FQUERY [5] and Summary SQL [6][7], among others. Although some The integration of Data Mining (DM) tools with Data variations according to the particularities of different Base Management Systems (DBMS) is now more than implementations, the answer to a fuzzy query sentence is trend, it is a reality. The major DBMS vendors have a generally a list of records, ranked by the degree of already integrated DM solutions within their products. On matching. the other hand, the main DM suites have also provided the The focus of attention of the fuzzy query research has integration of DM models into DBMS trough modeling been on the expressive power of the fuzzy query languages such as the Predictive Model Markup Language language, since the query is usually provided by the user. (PMML). It is thus a fact that future research on new DM Nevertheless, the definition of a query to retrieve the tools and methods must consider their integration with records according to their features’ values is a very DBMS. difficult task. Not due to the lack of expressive power of On the application side, data intensive industries, such the query language, but due to the difficulty on defining as insurance, accounting and telecommunications, among the concept itself. others, need frequently to retrieve their costumers for Fuzzy queries sentences are structured definitions of different marketing relationship actions. Defining a query fuzzy concepts. Under this assumption, fuzzy queries can sentence that captures the main features of a subset of be automatically generated by fuzzy rule based classifiers. records is a non trivial task. Generally, however, past Fuzzy rule based classifiers are a very active research experiences can be used to guide the new actions, such field [8][9][10]. The weighted fuzzy rule based approach that queries generation can thus be regarded as a DM task. allows better classification results since it allows to define Structured Query Languages (SQL) provides a more precisely the decision boundary among the classes structured description of the retrieved records. A standard [11][12]. The translation of weighted fuzzy rules into SQL sentence representing a concept can be translated fuzzy queries is not straightforward since most of fuzzy from rules generated by an algorithm for decision-tree or queries languages do not support weights. rule induction. The translation of machine learning rules In this work, it is proposed a methodology to translate into SQL sentences is straightforward: for each class, each a set of rules computed by a weighted fuzzy classifier into rule corresponds to a sentence (in the WHERE clause), fuzzy queries, such that they can be coded into most of and all rules related to the same class are aggregated with available fuzzy query languages. The whole procedure is a disjunctive (OR) operator. A standard query, however, sketched in Figure 1. The left side of the figure shows an existing fuzzy query system (denoted Fuzzy SQL) used to matching approach [8]. Under this approach, the output of process the fuzzy queries connected to a DBMS. The the fuzzy classifiers is computed in two steps: proposed methodology is showed in the right side of the 1. for each input, compute partial outputs as the figure. It is considered that a labeled database exists from degree of matching of the observed input value to which a training set may be selected to generate a set of each class, fuzzy rules by the fuzzy classifier. The set of rules 2. compute the final output by the aggregation of all generated by the fuzzy classifier may be very large, partial outputs. containing many useless rules. The fuzzy rule base is thus The following subsections describe the main steps of pruned and then translated into a set of fuzzy query the fuzzy rule-based classifier approach. sentences. The fuzzy query system is used to retrieve an ordered set of records from a new and unlabelled 2.1. Fuzzification database, corresponding to a given class. In a general application, the input variables may be numeric (discrete or continuous) or nominal. Fuzzy sets Fuzzy SQL Rule Base Translation allow a unified representation for nominal and numeric variables as fuzzy sets. Fuzzification is thus an important Fuzzy Queries Fuzzy Rule issue in fuzzy query generation since it provides the Base Unlabelled numeric-to-linguistic interface that allows dealing with Data Base Training Set numeric values as linguistic terms. X X X Y 1 2 K n 1 Generally, each input variable x can be described x (1) x (1) x (1) y (1) i 1 2 K n 1 Fuzzy x (2) x (2) x (2) y (2) 1 2 K n 1 Classifier using ordered linguistic terms in a descriptor set M M O M M x ( N ) x ( N ) x ( N ) y ( N ) Labeled 1 2 K n 1 Ai = {Ai1,K, Ain } . When the variable is nominal, the Data Base i descriptor set is the set of possible values for the variable Figure 1. The proposed approach. (or a combination of them). When the variable is numeric, The remainder of this paper is organized as follows. the meaning of each term Aij ∈ A i is given by a fuzzy Next section presents the weighted fuzzy rule-based set defined on the variable domain. classifier and the translation of fuzzy rules into fuzzy For a single variable input x (t) , the fuzzification queries. Section 3 describes the learning algorithm of i vector u (t) = u (t), ,u (t) is computed by the fuzzy fuzzy rules and the pruning procedure. Section 4 presents i ( i1 K ini ) the results of the application of the proposed methodology sets in the fuzzy partition of the input variable domain as: for a set of benchmark data sets and a case study using the (1) ui (t) = (µ A (xi (t)),K, µ A (xi (t))) . Credit Card data set. The fuzzy queries results are i1 ini compared to standard queries generated by a standard An easy way to parameterize fuzzy sets is to use decision-tree induction algorithm. Finally some triangular membership functions that are completely conclusions are drawn. determined by the centers of triangles, which may be considered as prototypes values for the corresponding 2. Fuzzy rule-based classifier fuzzy sets (see Figure 2). µ Consider the standard classification problem where (x) A1 A2 A3 A4 A5 input variables are presented as a p-dimensional vector x p in the input variable domain X1 × X 2 ×K× X p = X and the output variables represented by the classes’ set C = {}C1,K,Cm . The solution to the classification p1 p2 p3 p4 p5 problem is to assign a class label Ck ∈ C to an observation x(t)∈ X p , where t represents a record in the Figure 2: Example of fuzzy partition. database. The fuzzification vector generalizes the information Fuzzy classifiers attempt to compute, for each class, a contained in the input variable and is computed in the fuzzy membership value µC (x(t)) that correspond to the same way if the variable is numeric or nominal. For k nominal variables, there is no fuzziness and the degree of matching of the observation x(t) to the class fuzzification vector is a binary vector, indicating which Ck . nominal value is present on the record. This section describes the weighted fuzzy rule-based In a general-purpose fuzzy classifier, multidimensional classifier approach, which is based on the fuzzy pattern fuzzification may also be considered and computed by fuzzy clustering methods. Nevertheless, multidimensional A rule base is defined for each input variable and used fuzzy sets often do not represent linguistic concepts and to compute partial outputs by fuzzy inference. Fuzzy rules are not supported by some fuzzy query languages. with more than one variable in the antecedent allow Multidimensional fuzzy sets are thus out of the scope of representing interactions between input variables but the this paper.