Processing Fuzzy Relational Queries Using Fuzzy Views Emmanuel Doumard, Olivier Pivert, Grégory Smits, Virginie Thion
Total Page:16
File Type:pdf, Size:1020Kb
Processing Fuzzy Relational Queries Using Fuzzy Views Emmanuel Doumard, Olivier Pivert, Grégory Smits, Virginie Thion To cite this version: Emmanuel Doumard, Olivier Pivert, Grégory Smits, Virginie Thion. Processing Fuzzy Relational Queries Using Fuzzy Views. IEEE International Conference on Fuzzy Systems, Jun 2019, New-Orleans, United States. hal-02116133 HAL Id: hal-02116133 https://hal.inria.fr/hal-02116133 Submitted on 30 Apr 2019 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Processing Fuzzy Relational Queries Using Fuzzy Views Emmanuel Doumard, Olivier Pivert, Gregory´ Smits, and Virginie Thion Univ Rennes, IRISA - UMR 6074, F-22305 Lannion, France Email: folivier.pivert,gregory.smits,[email protected] Abstract—This paper proposes two original approaches to the experimentation aimed to compare the performances of the processing of fuzzy queries in a relational database context. The two solutions described before. The experimental results are general idea is to use views, either materialized or not. In the first discussed and the pros and cons of each method are pointed case, materialized views are used to store the satisfaction degrees related to user-defined fuzzy predicates, instead of calculating out. Finally, Section VI recalls the main contributions of the them at runtime by means of user functions embedded in the paper and outlines perspectives for future research. query (which induces an important overhead). In the second case, abstract views are used to efficiently access the tuples that belong II. CONTEXT AND RELATED WORK to the α-cut of the query result, by means of a derived Boolean In this section, we recall some useful notions concerning selection condition. fuzzy queries in a relational database context, with a par- ticular focus on the SQLf language [5] which constitutes I. INTRODUCTION the framework considered in this work. We first deal with The idea of making database management systems more syntactic aspects, then we briefly discuss query-processing- flexible by switching from Boolean logic to fuzzy logic for related issues. interpreting queries is already quite old, as the first work on A. Refresher about Fuzzy Relational Queries this topic dates back to the mid 70’s. Indeed, the first paper considering fuzzy relational queries was authored by V. Tahani Dealing with fuzzy queries in a relational database context — then a Ph.D. student supervised by L.A. Zadeh — in 1977 first implies to extend the classical notion of a relation into [16]. Since that time, a great amount of work has of course that of a fuzzy relation. A fuzzy relation is associated with a been done, both for defining highly expressive fuzzy query fuzzy concept and is simply a relation where every tuple is languages (see for instance [12], [5], [1], [14]) and for devising assigned a membership degree in the unit interval, that reflects efficient processing techniques suitable for such queries (see the extent to which the tuple satisfies . The operations from e.g. [4], [6], [17], [15]). As a matter of fact, it is of prime the relational algebra can be extended to fuzzy relations along importance to have available efficient evaluation algorithms so two lines: first, by considering fuzzy relations as fuzzy sets; that the gain in terms of expressiveness be not counterbalanced second by introducing gradual predicates in the appropriate by a severe performance degradation. So far, fuzzy query operations (selections and joins especially). The definitions processing relies on a technique first introduced by Bosc and of the extended relational operators can be found in [1]. As Pivert in [2], called derivation, whose basic principle is to an illustration, we give the definition of the fuzzy selection compute a Boolean query from the fuzzy one to be evaluated. operator hereafter: The derived Boolean query, which corresponds to the desired µ (t) = >(µ (t); µ (t)) α-cut of the fuzzy query to be evaluated (or a superset thereof), select(r; ') r ' can then be efficiently processed by a regular DBMS, and the where r denote a fuzzy relation, ' is a fuzzy predicate and > satisfaction degree of each answer can be computed either by is a triangular norm (most usually, min is used). means of an external program or by a call to user functions Relational algebra is obviously too abstruse for non-experts in the derived query (which proves rather costful). In the to use, this is why no commercial DBMS offers it as a query present paper, we introduce an alternative processing strategy, language. More user-oriented languages (based on relational based on the use of fuzzy views. Two variants are studied: one algebra) had to be defined, and the most famous one is of where the views are materialized, the other where they are course SQL, in which queries are expressed by means of kept abstract. “blocks” involving clauses such as select, from, where, etc. The The remainder of the paper is organized as follows. In language called SQLf, whose initial version was presented in Section II, we recall some useful notions about fuzzy rela- [5], extends SQL so as to support fuzzy queries. The general tional queries and their evaluation. We also provide a brief principle consists in introducing gradual predicates wherever survey of related work. Sections III and IV are devoted to it makes sense. The three clauses select, from and where of the the description of the two approaches involving materialized base block of SQL are kept in SQLf and the from clause re- and abstract views respectively. In Section V, we present an mains unchanged (except when fuzzy joins are used, in which case the join condition involves a fuzzy comparison operator). RDBMS, which entails better performances. Above all, The principal differences concern mainly two points: fuzzy queries are directly submitted to the DBMS. • the filtering of the result, which can be achieved through • tight coupling: the new features are incorporated into the a number of desired answers (k), a minimal level of RDBMS inner core. This solution, which is of course satisfaction (α), or both, and the most efficient in terms of query evaluation, implies • the nature of the authorized conditions as mentioned to entirely rewrite the query engine (including the parser previously. and query optimizer), which is a quite heavy task. Therefore, the SQLf base block is expressed as: In [15], we presented an implementation at the junction select [distinct][k j α j k, α] attributes between mild coupling and tight coupling where i) the mem- from relations where fuzzy-cond bership functions corresponding to the user-specified fuzzy predicates are defined as stored procedures and ii) the gradual where fuzzy-cond may involve both Boolean and fuzzy pred- connectives (fuzzy conjunction, disjunction, and quantifiers) icates. From a conceptual point of view, this expression is are implemented in C and integrated in the query processing interpreted as: engine of the RDBMS PostgreSQL. • the fuzzy selection (by fuzzy-cond) of the Cartesian Since commercial RDBMSs are not able to natively interpret product of the relations appearing in the from clause, fuzzy queries, some previous works (see [2], [6]) suggested • a projection over the attributes of the select clause (du- to perform a derivation step in order to generate a regular plicates are kept by default, and if distinct is specified Boolean query used to prefilter the relation (or Cartesian the maximal degree among the duplicates is retained), product of relations) concerned. The idea is to restrict the • the filtering of the result (top k elements and/or those degree computation phase only to those tuples that belong whose score is over the threshold α). to the α-cut of the result of the fuzzy query (assuming that Besides, SQLf also preserves (and extends) the constructs α is a qualitative threshold specified by the user; 0+ is used specific to SQL, e.g. nesting operators, relation partitioning, by default). With this type of evaluation strategy, fuzzy query etc., see [14] for more detail. In the following, we only processing involves three steps: consider single-block Selection-Projection-Join SQLf queries. 1) derivation of a Boolean query from the fuzzy one, using Any fuzzy querying system must provide users with a the threshold α and the membership functions of the convenient way to define the fuzzy terms that they wish to fuzzy predicates involved in the where clause; include in their queries. In practice, the membership function 2) processing of the derived query, which produces a clas- F associated with a fuzzy set is often chosen to be of a sical relation; F trapezoidal shape. Then, may be expressed by a quadruplet 3) computation of the satisfaction degree attached to each (a; b; c; d) core(F ) = [b; c] support(F ) = (a; d) where and . tuple (followed by a tuple filtering step if necessary1), In a previous work [15], we described a graphical interface which yields the fuzzy relation that constitutes the final aimed at helping the user express his/her fuzzy queries. result. B. About Fuzzy Query Processing: Related Work In terms of performances, the interest of using a mild coupling Fuzzy query processing [8], [12], [9], [3] raises several architecture lies in the fact that the resulting fuzzy relation is issues, among which the main two ones are listed hereafter: directly computed during the tuple selection phase (no external program needs to be called to perform step 3).