International Journal of Fuzzy Mathematics and Systems. ISSN 2248-9940 Volume 1, Number 2 (2011), pp. 173-180 © India Publications http://www.ripublication.com

On Effective Data Retrieval from SQL by use of Fuzzy Logic

1Y.S. Pawar, 2R.G. Sapre and 3Sayali R. Sapre

1Rukmai Technical Institute, Solapur, Maharashtra, India 2Department of Mathematics, Gogate Jogalekar College, Ratnagiri, Maharashtra, India 3Gogate Jogalekar College, Ratnagiri, Maharashtra, India

Abstract

Although the Structured Query Language (SQL) is a very powerful tool, it cannot satisfy the need for data selection based on linguistic expressions and degrees of truth. The goal of the research presented in this paper is to capture these expressions and make them suitable for queries. For this purpose, a fuzzy generalised logical condition for the WHERE part of SQL was developed. In this way, queries based on linguistic expressions are supported and access relational databases in the same way as SQL does. A fuzzy query is not only a querying tool; it improves the meaning of a query and extracts additional valuable information. Statistical data about districts of the Slovak Republic are used in the case study. The fuzzy approach has some limitations that can appear in the querying process. These limitations, and ideas for solving them, are outlined in this paper.

Keywords: SQL, fuzzy queries, fuzzy logical condition.

AMS Classification Number: 03B52

Introduction

This paper examines situations in which querying by the two-valued realisation of SQL is not adequate, and offers a solution based on fuzzy logic. Fuzzy logic is an approach to computing based on "degrees of truth" rather than the usual "true or false" logic. It deals with reasoning that is approximate rather than precise, in order to solve problems in a way that more closely resembles human logic. Fuzzy logic allows us to form queries that express our semantic intent. This area of research is not new, but there are still many possibilities for improving existing approaches and for creating new ones.

Fuzzy queries have emerged in the last 25 years to deal with the need to soften the two-valued Boolean logic of relational databases. A fuzzy query system is an interface that lets users get information from a database using sentences in ordinary language. Many fuzzy query implementations have been proposed, resulting in slightly different languages. Although there are some variations according to the particularities of the different implementations, the answer to a fuzzy query is generally a list of records ranked by their degree of matching.

SQL and its limitations

Users search databases in order to obtain data needed for analysis or decision making, or to satisfy their curiosity. SQL is the standard query language for relational databases. A simple SQL query has the following form:

    select attribute_1, …, attribute_n
    from T
    where attribute_p > P and attribute_r < R

Figure 1: The result of the classical query.

The result of the query is shown graphically in Figure 1. The values P and R delimit the space of interesting data, and the small squares in the graph represent database records. The graph shows that three records come very close to meeting the query criterion. These records have the potential to satisfy the query, but because of the rigidity of the SQL query they are not included in the output.
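The near-miss effect described above can be reproduced with a small sketch. It uses Python's standard `sqlite3` module; the table contents and the bounds `P = 10.0` and `R = 50.0` are made up for illustration and do not come from the paper.

```python
import sqlite3

# Hypothetical crisp bounds P and R (illustrative values only).
P, R = 10.0, 50.0

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (id INTEGER, attribute_p REAL, attribute_r REAL)")
conn.executemany(
    "INSERT INTO T VALUES (?, ?, ?)",
    [
        (1, 15.0, 30.0),  # clearly inside the region of interest
        (2, 12.0, 45.0),  # clearly inside the region of interest
        (3, 9.9, 30.0),   # just below P: a near miss
        (4, 20.0, 50.2),  # just above R: a near miss
        (5, 2.0, 90.0),   # far outside
    ],
)

# The crisp WHERE clause keeps or drops each record; there is no middle ground.
crisp_ids = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM T WHERE attribute_p > ? AND attribute_r < ? ORDER BY id",
        (P, R),
    )
]
print(crisp_ids)
```

Records 3 and 4 are treated exactly like record 5, although they miss the bounds only by a small margin.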


SQL uses crisp logic in the querying process, which leads to crisp selection. This means that a record is not selected even if it is extremely close to the intent of the query criterion. As the criterion becomes more and more complex, the set of records selected by the WHERE clause becomes more and more crisp. SQL has been used in many relational databases and information systems for data selection, and its use may be regarded as one of the main reasons for the success of relational databases in the commercial world. In this research the core of SQL remains intact, and the extension is made to improve the selection process. Adding some flexibility to SQL meets the above-mentioned requirements and increases the effectiveness and comprehensibility of the whole querying process.

Fuzzy Logic Combined with Classical SQL

• When we use SQL to retrieve data, we have to fire queries.
• The queries are based on certain relational conditions.
• The relational conditions are connected to each other by logical operators.
• After a query is fired, SQL displays the output that satisfies the given conditions.
• Some concepts and ideas that we come across in daily life cannot be expressed properly by rigid conditions.
• In such cases the output given by the SQL query analyzer does not fit the human understanding of the situation.
• We intend to use fuzzy logic in all such situations, so that the output reflects the semantic intent of the query.

Case Study

Suppose a school administrator has to search a student database for all "good" students. She first has to make some arbitrary decisions about what exactly a "good" student is. One might say that "good" students are those who have a Grade Point Average (GPA) of 3.5 or higher, and so the following SQL query is devised:

    SELECT * FROM STUDENTS WHERE GPA >= 3.5;

A list of students whose grade point averages are at or above 3.5 is displayed immediately. SQL returns exactly what is asked for, and only that: it works with Boolean logic and so cannot reflect the semantic intent of the query.

Suppose we also have to consider other parameters:

1. Academic achievement of the student (GPA).
2. Absence of the student (Absenty).
3. Participation in sports activities (SP).
4. Participation in cultural activities (CU).

Suppose the goodness of a student depends on these four parameters, and that goodness is defined as GPA >= 3.5, Absenty < 10, SP up to at least state level and CU up to at least university level. The following query is then fired:

    SELECT * FROM STUDENTS
    WHERE (GPA >= 3.5) AND (Absenty < 10)
      AND SP IN ('State', 'National', 'International')
      AND CU IN ('University', 'State', 'National');

SQL returns a list of all students who satisfy the conditions of the query.

Name               GPA    Absenty   SP              CU
Juili Khatu        4.00   3.00      University      State
Govind Date        3.50   9.80      State           State
Ranjan Wakde       3.80   4.50      International   University
Tejas Kadam        4.00   1.00      State           State
Kaustubh Bhagwat   3.80   1.00      State           National
Atharwa Jog        3.80   4.00      State           State
Kedar Joshi        3.90   2.00      National        University
Leena Gore         4.00   9.80      National        University
Sai Bhat           4.00   7.00      State           National

It is difficult to see from this list who the best students are. The output can be sorted using GPA as the key field. The query fired is:

    SELECT * FROM STUDENTS
    WHERE (GPA >= 3.5) AND (Absenty < 10)
    ORDER BY GPA DESC, ABSENTY ASC;

Name               GPA    Absenty   SP              CU
Tejas Kadam        4.00   1.00      State           State
Juili Khatu        4.00   3.00      University      State
Sai Bhat           4.00   7.00      State           National
Leena Gore         4.00   9.80      National        University
Kedar Joshi        3.90   2.00      National        University
Kaustubh Bhagwat   3.80   1.00      State           National
Atharwa Jog        3.80   4.00      State           State
Ranjan Wakde       3.80   4.50      International   University
Govind Date        3.50   9.80      State           State


We can see that the semantic intent of the query is not reflected in its output. The output does not include the names of students who should surely be on the list, for example:

• Students having GPA = 3.49 and SP = "National"
• Students having ABSENTY = 10.1 and CU = "National"

Students who fit the definition of "Good Students" are not listed because of the rigidity of the SQL query. The order of the output also does not properly reflect the order of GOODness. This is where we use fuzzy logic.

Fuzzy logic is well suited to expressing the intent of a database query when the semantics of the query are rather vague. What is needed is a way to express the ideas of "good grades" and "good attendance", as well as the sports and cultural grades, on a sliding scale. This would allow these ideas to be expressed in terms of degree rather than in absolute terms.

Suppose student A has a 4.0 GPA and student B has a 3.5 GPA. Both have good grades. However, it is plain to see that student A fits the idea of having good grades more fully than student B. In fuzzy logic, we would say that student A has complete membership in the set of students with good grades, while student B has only partial membership in that set. Under certain circumstances, students with grades as low as 3.0 might be considered to have good grades.

This idea becomes especially important in situations where more than one column of data is used to define a semantic notion. In this case, data from the GPA column and the ABSENTY column are combined to form the semantic notion "GOOD STUDENT".

GPA Gradations

Range       Linguistic term   Grade
0 – 0.5     Bad               0
0.5 – 1     Very poor         0.1
1 – 1.5     Poor              0.2
1.5 – 2     Less than avg.    0.3
2 – 2.5     Above avg.        0.4
2.5 – 3     Average           0.5
3 – 3.5     Good              0.6
3.5 – 4     Very good         0.7
4 – 4.5     Excellent         0.8
4.5 – 4.8   Superb            0.9
4.8 – 5     Outstanding       1


Absenty Gradations

Range         Linguistic term   Grade
10 or above   Very bad          0
9 – 10        Bad               0.1
8 – 9         Very poor         0.2
7 – 8         Poor              0.3
6 – 7         Less avg.         0.4
5 – 6         Average           0.5
4 – 5         Above avg.        0.6
3 – 4         Good              0.7
2 – 3         Very good         0.8
1 – 2         Excellent         0.9
0 – 1         Outstanding       1

Sports Level Gradations

Level              Grade
International      1.0
National           0.852
State              0.710
University         0.568
District           0.426
Others             0.284
School / College   0.142
No participation   0

Cultural level participation

Level              Grade
National           1
State              0.8
University         0.6
District           0.4
School / College   0.2
No participation   0
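The four gradation tables can be read as lookup functions from an attribute value to a membership grade. The sketch below is one possible encoding, not the authors' implementation. The names `step_grade`, `GPA_BOUNDS`, `ABS_BOUNDS`, `SP_GRADES` and `CU_GRADES` are hypothetical. The tables do not say which range a boundary value belongs to, so the code assumes that the lower bound of each range is inclusive, which agrees with the worked example later in the paper (GPA 4.00 graded 0.8, Absenty 2.00 graded 0.8).

```python
from bisect import bisect_right

# Lower bounds of the GPA ranges and the grade of each range (GPA Gradations).
GPA_BOUNDS = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 4.8]
GPA_GRADES = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]

# Lower bounds of the Absenty ranges; the grade falls as absence rises.
ABS_BOUNDS = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ABS_GRADES = [1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]

# Level-based attributes map directly from level name to grade.
SP_GRADES = {
    "International": 1.0, "National": 0.852, "State": 0.710,
    "University": 0.568, "District": 0.426, "Others": 0.284,
    "School / College": 0.142, "No participation": 0.0,
}
CU_GRADES = {
    "National": 1.0, "State": 0.8, "University": 0.6,
    "District": 0.4, "School / College": 0.2, "No participation": 0.0,
}


def step_grade(value, bounds, grades):
    """Return the grade of the range containing value.

    Assumption: each range includes its lower bound and excludes its
    upper bound; values at or above the last bound get the last grade.
    """
    if value < bounds[0]:
        raise ValueError(f"value {value} is below the lowest range")
    return grades[bisect_right(bounds, value) - 1]


print(step_grade(3.9, GPA_BOUNDS, GPA_GRADES))   # GPA 3.90
print(step_grade(9.8, ABS_BOUNDS, ABS_GRADES))   # Absenty 9.80
print(SP_GRADES["National"], CU_GRADES["University"])
```

A step function keeps the grades exactly as tabulated. A continuous membership function would be a different design and is not what the tables describe.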

Algorithm

Consider the attributes that cannot be expressed rigidly if the proper semantic intent is to be captured.

1. Assign membership grades to the various levels of these attributes.
2. Add new columns containing the membership grades of these attributes. Prefix the letter M to the names of these columns.
3. Calculate the mean of the values in the M columns and name it IG (Index of Goodness).
4. Fire an SQL query targeting this new column.

The SQL query can be:

    SELECT * FROM students WHERE IG > 0.4;

We can also categorize students into different levels of GOODness depending on their IG values.
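The four steps of the algorithm can be sketched end to end with Python's standard `sqlite3` module. This is a minimal illustration under stated assumptions, not the authors' implementation. The gradation tables are stored as relational tables keyed by the lower bound of each range (assumed inclusive). The M columns and IG are exposed through views rather than physically added columns. The table and view names (`gpa_grade`, `abs_grade`, `sp_grade`, `cu_grade`, `graded`, `students_ig`) and the third student, "Student X", are hypothetical; the other two rows are taken from the paper's student table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE students (name TEXT, gpa REAL, absenty REAL, sp TEXT, cu TEXT);
-- Step 1: gradation tables; numeric ranges are stored by their lower bound.
CREATE TABLE gpa_grade (lo REAL, grade REAL);
CREATE TABLE abs_grade (lo REAL, grade REAL);
CREATE TABLE sp_grade (level TEXT, grade REAL);
CREATE TABLE cu_grade (level TEXT, grade REAL);
""")

gpa_lows = [0, 0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 4.8]
conn.executemany("INSERT INTO gpa_grade VALUES (?, ?)",
                 [(lo, i / 10) for i, lo in enumerate(gpa_lows)])
conn.executemany("INSERT INTO abs_grade VALUES (?, ?)",
                 [(lo, (10 - lo) / 10) for lo in range(11)])
conn.executemany("INSERT INTO sp_grade VALUES (?, ?)", [
    ("International", 1.0), ("National", 0.852), ("State", 0.710),
    ("University", 0.568), ("District", 0.426), ("Others", 0.284),
    ("School / College", 0.142), ("No participation", 0.0)])
conn.executemany("INSERT INTO cu_grade VALUES (?, ?)", [
    ("National", 1.0), ("State", 0.8), ("University", 0.6),
    ("District", 0.4), ("School / College", 0.2), ("No participation", 0.0)])

conn.executemany("INSERT INTO students VALUES (?, ?, ?, ?, ?)", [
    ("Leena Gore", 4.00, 9.80, "National", "University"),
    ("Kedar Joshi", 3.90, 2.00, "National", "University"),
    # Hypothetical weak student, added only to show that the filter excludes.
    ("Student X", 1.20, 9.50, "No participation", "No participation"),
])

conn.executescript("""
-- Step 2: M-prefixed columns hold the membership grade of each attribute.
CREATE VIEW graded AS
SELECT s.*,
  (SELECT grade FROM gpa_grade g WHERE g.lo <= s.gpa
     ORDER BY g.lo DESC LIMIT 1) AS MGPA,
  (SELECT grade FROM abs_grade a WHERE a.lo <= s.absenty
     ORDER BY a.lo DESC LIMIT 1) AS MAbsenty,
  (SELECT grade FROM sp_grade p WHERE p.level = s.sp) AS MSP,
  (SELECT grade FROM cu_grade c WHERE c.level = s.cu) AS MCU
FROM students s;

-- Step 3: IG is the mean of the four M columns.
CREATE VIEW students_ig AS
SELECT *, (MGPA + MAbsenty + MSP + MCU) / 4.0 AS IG FROM graded;
""")

# Step 4: query the new column; the answer is ranked by degree of goodness.
good = [(name, round(ig, 3)) for name, ig in conn.execute(
    "SELECT name, IG FROM students_ig WHERE IG > 0.4 ORDER BY IG DESC")]
print(good)
```

Keeping the gradations in tables means a threshold or grade can be changed with an ordinary UPDATE, without rewriting the query.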

What actually happens?

Student                GPA    Absenty   SP         CU           Calculation
Leena Gore    Values   4.00   9.80      National   University
              Grades   0.8    0.1       0.852      0.6          2.352 / 4 = 0.588
Kedar Joshi   Values   3.90   2.00      National   University
              Grades   0.7    0.8       0.852      0.6          2.952 / 4 = 0.738

Fuzzy logic therefore ranks Kedar Joshi as a better student than Leena Gore, although the GPA-sorted SQL output lists Leena Gore first. This ranking is more accurate, and it cannot be obtained directly in SQL.
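The two index values can be checked with a few lines of arithmetic. The sketch below takes the membership grades read off the gradation tables and averages them; the column labels `MGPA`, `MAbsenty`, `MSP` and `MCU` are assumed names that follow the "prefix M" rule of the algorithm.

```python
from statistics import mean

# Membership grades of the two students, as read off the gradation tables.
grades = {
    "Leena Gore": {"MGPA": 0.8, "MAbsenty": 0.1, "MSP": 0.852, "MCU": 0.6},
    "Kedar Joshi": {"MGPA": 0.7, "MAbsenty": 0.8, "MSP": 0.852, "MCU": 0.6},
}

# IG is the mean of the four membership grades, rounded to three decimals.
ig = {name: round(mean(g.values()), 3) for name, g in grades.items()}

# Rank the students by IG, best first.
ranking = sorted(ig, key=ig.get, reverse=True)
print(ig, ranking)
```

The high GPA of Leena Gore is outweighed by the low attendance grade, which is why the ordering differs from the one produced by ORDER BY GPA DESC.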

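Similarly, consider two students A and B who are excluded by the crisp query. Their grades below are read from the gradation tables above:

Student            GPA    Absenty   SP         CU         Calculation
A         Values   3.00   0.00      National   National
          Grades   0.6    1         0.852      1          3.452 / 4 = 0.863
B         Values   5.00   10.00     National   National
          Grades   1      0         0.852      1          2.852 / 4 = 0.713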

Students who are good but could not appear in the list of GOOD STUDENTS can therefore now be included in it. One can see from this case study that when fuzzy logic is combined with standard data retrieval methods, the classical barriers to producing semantically meaningful information tend to evaporate. What is left is a tool for extracting information from a database in a way that more closely matches the way we experience the world.

Conclusion

The technique is very useful for overcoming the limitations caused by the rigidity of SQL queries. There are many situations in daily life where the rigidity of SQL commands limits our search, for example when we are required to measure a) the efficiency of an employee, b) the intelligence of a student, c) the honesty of a person, d) the leadership quality of a person, or e) one person's influence on another. None of these can be measured rigidly. These qualities depend on various parameters, and we have to consider all of these parameters before reaching a final conclusion. If we have to retrieve such data from a database, we can use the method described in the case study, and the result will be closer to our expectations. The output will reflect the semantic intent behind our query.
