Exploiting Fuzzy-SQL in Case-Based Reasoning

Luigi Portinale and Andrea Verrua Dipartimentodi Scienze e Tecnoiogie Avanzate(DISTA) Universita’ del PiemonteOrientale "AmedeoAvogadro" C.so Borsalino 54 - 15100Alessandria (ITALY) e-mail: portinal @mfn.unipmn.it

Abstract similarity-basedretrieval is the fundamentalstep that allows one to start with a set of relevant cases (e.g. the mostrele- The use of database technologies for implementingCBR techniquesis attractinga lot of attentionfor severalreasons. vant products in e-commerce),in order to apply any needed First, the possibility of usingstandard DBMS for storing and revision and/or refinement. representingcases significantly reduces the effort neededto Case retrieval algorithms usually focus on implement- developa CBRsystem; in fact, data of interest are usually ing Nearest-Neighbor(NN) techniques, local simi- alreadystored into relational databasesand used for differ- larity metrics relative to single features are combinedin a ent purposesas well. Finally, the use of standardquery lan- weightedway to get a global similarity betweena retrieved guages,like SQL,may facilitate the introductionof a case- and a target case. In (Burkhard1998), it is arguedthat the basedsystem into the real-world,by puttingretrieval on the notion of acceptancemay represent the needs of a flexible sameground of normaldatabase queries. Unfortunately,SQL case retrieval methodologybetter than distance (or similar- is not able to deal with queries like those neededin a CBR ity). Asfor distance, local acceptancefunctions can be com- system,so different approacheshave been tried, in orderto buildretrieval engines able to exploit,at thelower level, stan- bined into global acceptancefunctions to determinewhether dard SQL.In this paper, wepresent a proposalwhere case a target case is acceptable(i.e. it is retrieved) with respect retrieval is implementedby using a fuzzyversion of SQL.In to a given query. In particular, very often local acceptance the presentedapproach a fuzzynotion of similarityis adopted functions take the formof fuzzy distributions; in this way, and an extendedversion of SQL,dealing with fuzzypredi- the definition of a fuzzy linguistic term over the domainof cates, is introduced.A case-based system prototype exploit- a given attribute of a case can be exploited to characterize ing Fuzzy-SQLas a retrieval engineis then presented. the acceptanceof cases similar (in the fuzzy sense) valuesfor that particularattribute or feature. Introduction In the present paper we will present an approachwhere local acceptancerelative to a feature can be actually ex- The use of database techniques for supporting the construc- pressed throughfuzzy distributions on its domain,abstract- tion of case-basedsystems is attracting serious attention; the ing the actual values to linguistic terms. Thepeculiarity of reasonsare twofold:i) if data of interest are alreadystored the approachis that global acceptanceis completeddefined in a database, the databaseitself can be used as a case base; in fuzzy terms, by meansof the usual combinationsof lo- ii) part of the technological facilities of a DBMSmay be cal distributions throughspecific defined norms(Liu & Lee exploitedin the CBRcycle, in particular for case retrieval. 1996). Moreover,an extendedversion of SQL,able to deal This is particularly true in e-commerceapplications, with fuzzypredicates andconditions, is introducedas a suit- wherethe productdatabase represents the main data source, able wayto directly query a case base stored on a relational whoserecords are easily interpretable as "cases" for a case- DBMS;this approach is based on the SQLflanguage pro- based recommendation system (Schumacher & Bergmann posed in (Bose & Pivert 1995), extending standard SQL 2000; Schumacher& Roth-Berghofer 1999; Wilke, Lenz, order to deal with fuzzy queries. A systemprototype able to & Wess1998), as well as in the broad context of knowl- generate a Fuzzy-SQLquery a case specification and edge management,for the implementation of corporate- workingon top of a standard relational DBMShas been im- wide case-based systems (Kitano & Shimazu 1996; Shi- plemented;its mainfeatures will be presentedin the paper. mazu,Kitano, & Shibata 1993). All these database driven proposalsfocus on the retrieval step of the CBRcycle l, since Fuzzy Queryingin Databases Copyright(~) 2001,American Association for Artificial Intelli- gence(www.aaai.org). All rights reserved. It is well-knownthat standard relational databases can only ] Actually,in (Wilke,Lenz, & Wess 1998) a revisionof the clas- deal with precise informationand standard query languages, sical CBRcycle presentedin (Aamodt& Plaza 1994)is proposed, like SQL,only support boolean queries. Fuzzylogic pro- by specifically taking into accountthe e-commerceapplication; videsa natural wayof generalizingthe strict satisfiability of howeverno particular databasetechnique is specificallyproposed booleanlogic to a notion of satisfiability with different de- for implementingthis newcycle. grees; this is the reason whyconsiderable efforts has been

CASEBASED REASONING 103 dedicated inside the database communitytoward the possi- ¯ discrete operators:defining a similarity relation charac- bility of dealing with fuzzy informationin a database. terized by a symmetricmatrix of similarity degrees (e.g. There are at least two mainapproaches to the problem:i) the operator compatible over the attribute job defin- explicit representation of vague and imprecise information ing to whatdegree a pair of different jobs are compatible). inside the database (Buckles & Petry 1982; Medina,Pons, By allowing fuzzy predicates and operators to form the con- & Vila 1994); ii) vague and imprecise querying on a regu- dition of the WHEREclause, the result of the SELECTis lar database (Bose & Pivert 1995; Kacprzyck& Ziolowski actually a fuzzyrelation, i.e. a set of tuples with associated 1986). Evenif the first class of approachesis moregeneral the degree to whichthey satisfy the WHEREclause. Such a that the secondone, wewill concentrateon the latter, since degreecan be characterizedas follows: let the interest in this paper is to showhow flexible querying SELECT A FROM R WHERE fc in a standard database can be used to implementcase re- be a querywith fuzzycondition fc; the result will be a fuzzy trieval. In particular, in (Bose& Pivert 1995)standard SQL relation Rf with membershipfunction is adoptedas the starting point for a set of extensionsable to improvequery capabilities from boolean to fuzzy ones. pRf(a) = sup l~/c(z) Whilethe long term goal of the approachis to integrate such (zER)A(a~.A=a) extensions directly at the DBMSlevel (directly providing Thefuzzy distribution p.cc(z) relative to fc mustthen to the user with specific DBMSalgorithms for flexible query computedby taking into accountthe logical connectivesin- processing), in the short term the implementationof the SQL volvedand their fuzzy interpretation. It is well knownthat extensionscan be actually providedon top of a standard re- the generalway to give a fuzzyinterpretation to logical con- lational DBMS,by meansof a suitable moduleable to trans- nectives is to associate negation with complementto one, forma fuzzy query in a regular one (derivation principle). conjunctionwith a suitable t-normand disjunction with the In the following, we concentrate on this methodology,by correspondingt-conorm (Liu & Lee 1996). In the following, defining a set of fuzzy extensionsto SQLthat maybe of in- we will concentrate on the simplest t-norm and t-eonorm, terest for CBRand by showinghow they can be processed in namelythe rain and maxoperators such that order to transformthem in standard queries; we finally show howthis can be actually implementedfor case retrieval pur- I~AA.(Z) = min(pA(X), IJBCz)) poses. Weare interested in simple SQLstatements with no #AvB(X) ma x(/zA (x),/~B (X nesting (i.e. we consider the WHEREclause to be a ref- We willdiscuss the implementative advantages of this erence to an actual and not to nested SQLstate- choiceand what problems have to be addressedwith other ments); in our Fuzzy-SQLlanguage, the condition can be norms. a compositefuzzy formula involving both crisp and fuzzy predicates and operators. Deriving Standard SQLfrom Fuzzy-SQL In order to process a query using a standard DBMS,we have Fuzzy Predicates to devise a wayof translating the Fuzzy-SQLstatement into A set of linguistic values can be defined over the domains a standard one. Wehave noticed that the result of a fuzzy of the attributes of interest; three different types of fuzzy query is alwaysa fuzzy relation, while the result of a stan- predicates are considered: dard queryis a booleanrelation; this meansthat, if weare ¯ fuzzy predicates on continuousdomain: a set of linguistic able to translate a fuzzy query into a standard one, the re- values defined on a continuous domainand througha con- suiting booleanrelation has to satisfy certain criteria related tinuous distribution (e.g. a trapezoidal or a gaussian-like to the original fuzzy query. The most simple wayis to re- possibility distribution); quire the fuzzy query to return a booleanrelation Rb whose tuples are extracted fromthe fuzzyrelation Ry, by consider- ¯ fuzzy predicates on discrete domains:a set of linguistic values defined on a discrete domain,but with two differ- ing a suitable threshold on the fuzzy distribution of Rf. We ent possible types of distribution: continuousfor scalar consider, as in (Bose& Pivert 1995), the followingsyntax SELECT (A) FROM R WHERE discrete attributes (i.e. with orderedvalues) (e.g. a trape- A fe zoidal distribution for the linguistic value youngdefined whosemeaning is that a set of tuples with attribute set A, from relation set R, satisfying the condition fc with degree over the discrete domainof integers of the attribute age) p > A is returned (in fuzzy terms, the A-cut of the fuzzy or discrete for nominalattributes (with unorderedval- ues) (e.g. a vector distribution for the linguistic term relation R.¢ resulting fromthe queryis returned). bright defined over the attribute color, associating to Theidea is then to transformthe fuzzy query into an SQL each color the membershipfunction to the fuzzy set). queryreturning a superset of the A-cutof the result relation. Theinteresting point is that, if werestrict attention to the Furthermore,a set of fuzzy operatorscan be definedto relate kind of queries previouslydiscussed (whichare suitable to fuzzyor crisp predicates; also in this case wehave different modelstandard case retrieval) and if we adopt rain, maxas types of operators: norms,then it is possible to derive froma Fuzzy-SQLquery, ¯ continuous operators: characterized by a continuousdis- an SQLquery returning exactly the A-cut required. This can tribution function (e.g. the operator nearover scalar at- be easily verified as follows: let P be a fuzzy predicate; tributes, characterized by a gaussian-like curve centered wewrite P _> A to indicate that P is satisfied with degree at 0 and workingon the difference of the operands); greater or equal than A; let be DNC(P,>_, A) the derived

104 FLAIRS-2001 in the querylanguage, to use this facility to compute,dur- ing query processing, the actual degree of the tuple, by h£gh 1 accepting only those belongingto the A-cut. 0.8 In the next section, wewill showhow these strategies maybe implementedin a system able to create a Fuzzy-SQLquery froma target case, in order to retrieve a set of similar cases. An Architecture for Fuzzy

price 1100 1800 Case Retrieval Followingthe approach outlined in the previous sections, we have implementeda system able to process Fuzzy-SQL 1, queries on top of a relational DBMS;since emphasisis given

0.8- to case retrieval, the systemprovides a wayof generatingthe relevant fuzzyquery from a target case specification. Thear- chitecture of the systemis shownin Figure 2. A case base is implementedusing a standard , where cases are stored in a suitable set of tables; in additionto the object database, a meta-databasestoring all fuzzy informa- 100 125 tion andknowledge is also required. At the current stage, all the meta-data(definitions of fuzzy predicates and operators) Figure 1 : FuzzyDistributions are stored in a relational format as well, in such a waythat standardqueries can be used to process this kind of informa- tion (we have chosen POSTGRESas DBMSfor the current necessary condition for P > A, i.e. a boolean condition implementation).A Fuzzy-SQLServer is devoted to the pro- such that P _> A --+ DNC(P,>_, A). It is trivial to verify cessing of the fuzzy query, by meansof a Parser/Translator, that implementedusing Lex and Yacc; the syntax of the fuzzy AND(P1,...,P,) > A ~ ,nin(Pt .... ,Pn) > A query is checked and a standard SQLquery is then gener- DNC(Pa, >, A)..., DNC(PI, >, A) ated, by deriving the DNCsusing the fuzzy knowledgein OR(P],...,Pn) > A ++ max(P1 .... ,Pn) > A the meta-database. DNC(P1, >, A)..., DNC(P1, >, A) Finally, a Web-BasedGraphical User Interface has been This implies that, using rain and maxoperators, each de- implementedin .lava, exploiting standard browsers’capabil- rived necessary condition is also a sufficient one and thus ities. TheGUI fulfills the followingrequirements: the obtainedboolean condition can be used to return exactly ¯ 2. automatic Fuzzy-SQLquery generation from a target case the required A-cut Each DNCcan then be obtained from template; the fuzzy distribution associatedto the involvedpredicate. Example. Consider a relation product containing the ¯ manual Fuzzy-SQLquery construction; attribute price over whichthe linguistic term high is de- ¯ meta-data managementoperations (definition of fuzzy fined. Figure I shows the fuzzy distribution of high as knowledge)and database administration. well as the distribution of a fuzzy operator > > (muchhigher than). The condition (price = highAprice >> 1000) > Whilethe user can alwaysrequire a suitable retrieval by con- 0.8 will hold iff rain(price = high, price >> 1000) > structing a fuzzy query by hand, the system also provides a 0.8,iff (price = high)>_ 0.8A (price>> I000)_> morecase-based interface able to construct a fuzzy query iff (1100_< price_< 1800)A(price- 1000) _> 120. from a target case specification. Werestricted our atten- Thelatter condition can be easily translated in a standard tion to flat representationof cases through(feature, value) WHEREclause of SQL. pairs. However,while cases are stored in the database by Unfortunately, there are situations (for instance when using, for each attribute, only values fromthe corresponding norms different that mi,, ,nax are used) where the exact domain,target cases mayalso be specified by using linguis- derivation of the A-cut is not possible; in these cases a su- tic (fuzzy) abstractions on such values. This meansthat, for perset of the actual A-cut will be returned. There are two eachfeature (attribute) to be specifiedin a target case, three possible solutionsto that: different possibilities can be used: 1. to implementan externalfilter receivingin input the result I. specificationof a linguistic value; set of the query, producingin output the actual A-cut, by 2. specificationof a crisp value, to be usedas is; filtering out those tuples havingdegree less than A; 3. specificationof a crisp value to befuzzified. 2. if the DBMSsupport the inclusion of external functions In the last case, fuzzification maytake place in different "-Similarresults holdfor (P, <; A), to be usedwhen negation ways;for instance, in case of a scalar attribute A whosein- involved. put value is a, the fuzzy expression A near a can be used

CASERASED REASONING lOS r AUTOMATI C TARGET CASE RUS~E _~QUERY CONSTRUCTION~

t NORM FILTER Fuzzy Data Fuzzy-SQL M~nagement Operations 4 I Query Result Set ] ! I II

Postgres SERVER SQL Query I (RDBMS Fuzzy Information PARSER/ TRANSLATOR

Datmbase Meta-Database

Figure 2: System Architecture

in case the input value belongs to more than one fuzzy set (i.e. if more linguistic terms are applicable to the input); 1 h:Lgh that case the fuzzification process must take into account a suitable t-conorm (usually the max operator) to combine the ~,-o. 75 iiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii... results from each fuzzy set. In addition to the specification of a crisp or linguistic value, in the target case template the user can also specify additional conditions involving specific (possibly fuzzy) op- erators (e.g. the user can specify that it requires cases with 1000 prt.ae price=high and price >> 1000). Finally, a global re- trieval threshold A is specified. A Fuzzy-SQLquery can then Figure 3: Fuzzification Example be generated as follows: i) one or more tables (R1, .. ¯ Rk) whose rows contain the cases of interest for retrieval are se- lected; ii) a fuzzy condition/c is created by taking the AND instead of A : u3. More sophisticated fuzzification strate- of all the expressions (crisp, fuzzy and fuzzified) specified gies can also be defined, by constructing a fuzzy distribution the case template; iii) a query of the form SELECT (A) * FROM WHERE from those defined over the domain of the attribute and from R1,...R k fc the input value. An example is provided in figure 3 where is generated. Figure 4 shows a screen shot of part of a fuzzification threshold 4’ = 0.1 is specified on the fuzzy the interface for constructing the query (the example uses distributions for the linguistic term high of the attribute a slight variation of the travel database available from price with input value 1000; the result is the fuzzification http://www.ai-cbr.org). The execution of this of price= 1000 in terms of the fuzzy distribution shown query will then return a set of cases (in the form of table as a bold line in figure 3; the slopes of the distribution can be tuples) having the required degree of similarity (i.e. at least determined in dependenceon the original fuzzy distribution, A) with the target case. by setting a spread factor similar to that used in the HCV From the architectural point of view, the module of the algorithm (Wu 1999). Fuzzification may be more complex systems have been designed to run on the Internet platform, so communication among module is realized by means of 3Amore general solution could be to provide specific operators TCP/IP protocol. The only constraint, in the current ver- of nearness near~for each individual scalar attribute A. sion, is that the Fuzzy-SQLServer must run on the same ma-

106 FLAIRS-2001 Figure 4: WHEREClause Construction chine where the WWWserver is running, because of the use of data for relational databases. Fuzzy Sets and Systems of Java applets in the implementation of the GUI; no con- 7:213-226. straint is put on the location of the DBMSServer. Since the Burkhard, H.-D. 1998. Extending some concepts CBR: user interface is realized via standard browserfacilities, the foundations of case retrieval nets. In Lenz, M.; Bartsch- client browser, the DBMSand the Fuzzy-SQLServer can Spoerl, B.; Burkhard, H.-D.; and Wess, S., eds., Case in principle be on different machines. Finally, it is worth Based reasoning Technology: from Foundations to Appli- noting that in figure 2, a specific modulecalled NormFil- cations. LNAI1400, Springer. 17-50. ter is shown; this maybe needed in case norms different Kacprzyck, J., and Ziolowski, A. 1986. Database queries than min and maxare adopted, in order to reduce the re- with fuzzy linguistic quantifiers. IEEETransactions on sult set to the desired/-cut. As mentionedbefore, this filter Systems, Manand Cybernetics 16(3). would not be necessary if the Fuzzy-SQLServer includes in the translation process external function calls able to verify, Kitano, H., and Shimazu, H. 1996. The experience- during query processing, the degree of satisfiability of the sharing architecute: a case study in corporate-wide case- tuples; this capability must be supported by the DBMSand based software quality control. In Leake, D., ed., Case the Fuzzy-SQLServer has to use the meta-databasefor func- Based Reasoning: Experiences, Lessons attd Future Di- tion implementation.The different data flow of the result set rections. AAAIPress. in the different situations (use of the filter or not) are shown Liu, C., and Lee, C. G. 1996. Ne,ral Fuzzy Systems. Pren- in figure 2 with dashedlines. At the current stage, the use of tice Hall. external functions, even if supported by POSTGRES,has not Medina, M.; Pons, O.; and Vila, M. 1994. GEFRED,a been implementedyet. GEnerilized Fuzzy model for REletional Databases. h~for- mation Sciences 76( 1-2):87-109. Conclusions Schumacher, J., and Bergmann, R. 2000. An efficient In the present paper a case retrieval system, exploiting a approachto similarity-based retrieval on top of relational Fuzzy-SQLquery language has been presented. The sys- databases. In Blanzieri, E., and Portinale, L., eds., Proc. tem is able to automatically produce a fuzzy query from a 5th EWCBR,273-284. Trento: Lecture Notes in Artificial target case template and can exploit the support of a stan- Intelligence 1898. dard relational DBMSfor storing and retrieving cases. Un- Schumacher, M., and Roth-Berghofer, T. 1999. Archi- like similar approaches, the emphasisis given to acceptance tectures for integration of CBRsystems with databases criteria defined through fuzzy distribution, instead that to for e-commerce. In Proc. 7th GermanWorkshop on CBR distance functions as in NNapproaches. Future work will (GWCBR’99). concentrate on the possibility of creating moreparametric Shimazu, H.; Kitano, H.; and Shibata, A. 1993. Re- queries, also supporting the different importancea given at- trieving cases from relational databases: another strike to- tribute mayhave in specific retrievals. ward corporate-wide case-based systems. In Proc. 13th bTtern. Joint Conferenceon Artificial hltelligence (I J- References CA1"93), 909-914. Aamodt,A., and Plaza, E. 1994. Case-based reasoning: Wilke, W.; Lenz, M.; and Wess, S. 1998. Intelligent Foundational issues, methodologicalvariations and system sales support with cbr. In Lenz, M.; Bartsch-Spoerl, B.; approaches. Al Communications7(1):39-59. Burkhard, H.-D.; and Wess, S., eds., Case Based reason- ing Technology: from Foundations to Applications. LNAI Bosc, P., and Pivert, O. 1995. SQLf:a relational database 1400, Springer. 91-113. language for fuzzy querying. IEEE Transactions on Fuzzy Systems 3(1). Wu,X. 1999. Fuzzyinterpretation of discretized intervals. IEEE Transactions on Fuzzy Systems 7(6):753-759. Buckles, B., and Petry, E 1982. A fuzzy representation

CASEBASED REASONING 107