Exploiting Fuzzy-SQL in Case-Based Reasoning
Total Page:16
File Type:pdf, Size:1020Kb
Exploiting Fuzzy-SQL in Case-Based Reasoning Luigi Portinale and Andrea Verrua Dipartimentodi Scienze e Tecnoiogie Avanzate(DISTA) Universita’ del PiemonteOrientale "AmedeoAvogadro" C.so Borsalino 54 - 15100Alessandria (ITALY) e-mail: portinal @mfn.unipmn.it Abstract similarity-basedretrieval is the fundamentalstep that allows one to start with a set of relevant cases (e.g. the mostrele- The use of database technologies for implementingCBR techniquesis attractinga lot of attentionfor severalreasons. vant products in e-commerce),in order to apply any needed First, the possibility of usingstandard DBMS for storing and revision and/or refinement. representingcases significantly reduces the effort neededto Case retrieval algorithms usually focus on implement- developa CBRsystem; in fact, data of interest are usually ing Nearest-Neighbor(NN) techniques, where local simi- alreadystored into relational databasesand used for differ- larity metrics relative to single features are combinedin a ent purposesas well. Finally, the use of standardquery lan- weightedway to get a global similarity betweena retrieved guages,like SQL,may facilitate the introductionof a case- and a target case. In (Burkhard1998), it is arguedthat the basedsystem into the real-world,by puttingretrieval on the notion of acceptancemay represent the needs of a flexible sameground of normaldatabase queries. Unfortunately,SQL case retrieval methodologybetter than distance (or similar- is not able to deal with queries like those neededin a CBR ity). Asfor distance, local acceptancefunctions can be com- system,so different approacheshave been tried, in orderto buildretrieval engines able to exploit,at thelower level, stan- bined into global acceptancefunctions to determinewhether dard SQL.In this paper, wepresent a proposalwhere case a target case is acceptable(i.e. it is retrieved) with respect retrieval is implementedby using a fuzzyversion of SQL.In to a given query. In particular, very often local acceptance the presentedapproach a fuzzynotion of similarityis adopted functions take the formof fuzzy distributions; in this way, and an extendedversion of SQL,dealing with fuzzypredi- the definition of a fuzzy linguistic term over the domainof cates, is introduced.A case-based system prototype exploit- a given attribute of a case can be exploited to characterize ing Fuzzy-SQLas a retrieval engineis then presented. the acceptanceof cases having similar (in the fuzzy sense) valuesfor that particularattribute or feature. Introduction In the present paper we will present an approachwhere local acceptancerelative to a feature can be actually ex- The use of database techniques for supporting the construc- pressed throughfuzzy distributions on its domain,abstract- tion of case-basedsystems is attracting serious attention; the ing the actual values to linguistic terms. Thepeculiarity of reasonsare twofold:i) if data of interest are alreadystored the approachis that global acceptanceis completeddefined in a database, the databaseitself can be used as a case base; in fuzzy terms, by meansof the usual combinationsof lo- ii) part of the technological facilities of a DBMSmay be cal distributions throughspecific defined norms(Liu & Lee exploitedin the CBRcycle, in particular for case retrieval. 1996). Moreover,an extendedversion of SQL,able to deal This is particularly true in e-commerceapplications, with fuzzypredicates andconditions, is introducedas a suit- wherethe productdatabase represents the main data source, able wayto directly query a case base stored on a relational whoserecords are easily interpretable as "cases" for a case- DBMS;this approach is based on the SQLflanguage pro- based recommendation system (Schumacher & Bergmann posed in (Bose & Pivert 1995), extending standard SQL 2000; Schumacher& Roth-Berghofer 1999; Wilke, Lenz, order to deal with fuzzy queries. A systemprototype able to & Wess1998), as well as in the broad context of knowl- generate a Fuzzy-SQLquery from a case specification and edge management,for the implementation of corporate- workingon top of a standard relational DBMShas been im- wide case-based systems (Kitano & Shimazu 1996; Shi- plemented;its mainfeatures will be presentedin the paper. mazu,Kitano, & Shibata 1993). All these database driven proposalsfocus on the retrieval step of the CBRcycle l, since Fuzzy Queryingin Databases Copyright(~) 2001,American Association for Artificial Intelli- gence(www.aaai.org). All rights reserved. It is well-knownthat standard relational databases can only ] Actually,in (Wilke,Lenz, & Wess 1998) a revisionof the clas- deal with precise informationand standard query languages, sical CBRcycle presentedin (Aamodt& Plaza 1994)is proposed, like SQL,only support boolean queries. Fuzzylogic pro- by specifically taking into accountthe e-commerceapplication; videsa natural wayof generalizingthe strict satisfiability of howeverno particular databasetechnique is specificallyproposed booleanlogic to a notion of satisfiability with different de- for implementingthis newcycle. grees; this is the reason whyconsiderable efforts has been CASEBASED REASONING 103 dedicated inside the database communitytoward the possi- ¯ discrete operators:defining a similarity relation charac- bility of dealing with fuzzy informationin a database. terized by a symmetricmatrix of similarity degrees (e.g. There are at least two mainapproaches to the problem:i) the operator compatible over the attribute job defin- explicit representation of vague and imprecise information ing to whatdegree a pair of different jobs are compatible). inside the database (Buckles & Petry 1982; Medina,Pons, By allowing fuzzy predicates and operators to form the con- & Vila 1994); ii) vague and imprecise querying on a regu- dition of the WHEREclause, the result of the SELECTis lar database (Bose & Pivert 1995; Kacprzyck& Ziolowski actually a fuzzyrelation, i.e. a set of tuples with associated 1986). Evenif the first class of approachesis moregeneral the degree to whichthey satisfy the WHEREclause. Such a that the secondone, wewill concentrateon the latter, since degreecan be characterizedas follows: let the interest in this paper is to showhow flexible querying SELECT A FROM R WHERE fc in a standard database can be used to implementcase re- be a querywith fuzzycondition fc; the result will be a fuzzy trieval. In particular, in (Bose& Pivert 1995)standard SQL relation Rf with membershipfunction is adoptedas the starting point for a set of extensionsable to improvequery capabilities from boolean to fuzzy ones. pRf(a) = sup l~/c(z) Whilethe long term goal of the approachis to integrate such (zER)A(a~.A=a) extensions directly at the DBMSlevel (directly providing Thefuzzy distribution p.cc(z) relative to fc mustthen to the user with specific DBMSalgorithms for flexible query computedby taking into accountthe logical connectivesin- processing), in the short term the implementationof the SQL volvedand their fuzzy interpretation. It is well knownthat extensionscan be actually providedon top of a standard re- the generalway to give a fuzzyinterpretation to logical con- lational DBMS,by meansof a suitable moduleable to trans- nectives is to associate negation with complementto one, forma fuzzy query in a regular one (derivation principle). conjunctionwith a suitable t-normand disjunction with the In the following, we concentrate on this methodology,by correspondingt-conorm (Liu & Lee 1996). In the following, defining a set of fuzzy extensionsto SQLthat maybe of in- we will concentrate on the simplest t-norm and t-eonorm, terest for CBRand by showinghow they can be processed in namelythe rain and maxoperators such that order to transformthem in standard queries; we finally show howthis can be actually implementedfor case retrieval pur- I~AA.(Z) = min(pA(X), IJBCz)) poses. Weare interested in simple SQLstatements with no #AvB(X) ma x(/zA (x),/~B (X nesting (i.e. we consider the WHEREclause to be a ref- We willdiscuss the implementative advantages of this erence to an actual condition and not to nested SQLstate- choiceand what problems have to be addressedwith other ments); in our Fuzzy-SQLlanguage, the condition can be norms. a compositefuzzy formula involving both crisp and fuzzy predicates and operators. Deriving Standard SQLfrom Fuzzy-SQL In order to process a query using a standard DBMS,we have Fuzzy Predicates to devise a wayof translating the Fuzzy-SQLstatement into A set of linguistic values can be defined over the domains a standard one. Wehave noticed that the result of a fuzzy of the attributes of interest; three different types of fuzzy query is alwaysa fuzzy relation, while the result of a stan- predicates are considered: dard queryis a booleanrelation; this meansthat, if weare ¯ fuzzy predicates on continuousdomain: a set of linguistic able to translate a fuzzy query into a standard one, the re- values defined on a continuous domainand througha con- suiting booleanrelation has to satisfy certain criteria related tinuous distribution (e.g. a trapezoidal or a gaussian-like to the original fuzzy query. The most simple wayis to re- possibility distribution); quire the fuzzy query to return a booleanrelation Rb whose tuples are extracted fromthe fuzzyrelation Ry, by consider- ¯ fuzzy predicates on discrete domains:a set of linguistic values defined on a discrete domain,but with two differ- ing a suitable threshold on the fuzzy distribution of Rf. We ent possible types of distribution: continuousfor scalar consider, as in (Bose& Pivert 1995), the followingsyntax SELECT