Natural Language to SQL Generation for Semantic Knowledge Extraction in Social Web Sources
Total Page:16
File Type:pdf, Size:1020Kb
ISSN (Online) : 0974-5645 ISSN (Print) : 0974-6846 Indian Journal of Science and Technology, Vol 8(1), 01–10, January 2015 DOI: 10.17485/ijst/2015/v8i1/54123 Natural language to SQL Generation for Semantic Knowledge Extraction in Social Web Sources K. Javubar Sathick* and A. Jaya Department of Computer Applications, B.S. Abdur Rahman University, Chennai, India; [email protected], [email protected] Abstract Enormous evolution of web data creates a peculiar myth in the field of computer and information technology for extracting the meaningful content from the web. Many organizations and social networks use databases for storing information and the data will be fetched from the specified data store. Data can be retrieved or accessed by SQL queries whereas the query is in the form of natural lingual statement which has to be processed. So, the primary objective of this research article is to find the suitable way to convert natural language query to SQL and make the data apt for semantic extraction. This Research paper also aims to derive an automatic query translator for Natural Language based questions into their associated SQL queries and provides an user friendly interface between end user and the database for easy access of social web data from different web sources such as facebook, twitter and linkedIn etc.,. This paper is implemented using java as the front end, SQL server as the back end and R-tool is used to collect the data from social web sources. This research article provides an oKpetiymwizoerdd SsQ:L query generation for the Natural Language question provided by the end user. Natural Language Interface for Databases (NLIDB), Natural Language Processing (NLP), R-tool, Semantic Knowledge Extraction (SKE), Structured Query Language (SQL), Social Web Data 1. Introduction of social media the query conversion is quite important in terms of bringing out the exact data which is requested Data Storage plays an important role in today’s commercial by the web users who surf on the net. The query /request system especially with progression of social media, the size will be of natural statement such as blog, comment, tweets of the data in database and data accessing pace become etc., these statement must be converted into a most reli- more crucial part in the recent research world. Plenty of able and acceptable form of typical query which system new database tools and emerging technologies are growing can understand and crack the exact data from the data- in a wide spectrum, therefore the provision for storing the base. So these factors are acting as a precious evidence for huge set of data is available, but the fact that the technol- implementing the proposed work through this article. The ogy or an interface which can process the user request objective of NLP is to facilitate communication between and pulls the exact data as per the request from the huge human and computers without memorization of multi- database is not familiarized as that of storage provisions. faceted instructions and procedures. In other words, NLP Most of the businesses and social sites need these types of is the technique that can make the workstation to under- applications by using the SQL language. Natural Language stand the natural languages used by users. In the current Processing (NLP) is becoming one of the most active world the basic requirement of commercial and social techniques used in Human-computer Interaction which based system is to extract the data from a database such plays a vital role since the social media started playing its as MS Access, Oracle and Hadoop where the large col- part to a large extent in the current trend. In the context lection of data is stored. An end user or layman without *Author for correspondence Natural language to SQL Generation for Semantic Knowledge Extraction in Social Web Sources knowledge of SQL may find quite difficult to extract the users especially for online applications, search engines data with the complicated query in corresponding with and many other different databases, where accuracy and the database. Therefore in this work the development of efficiency are most important terms required. Various system for people to interact with the database in simple analyses shows user is not restricted to formulate any kind English language is implemented and analyzed for the of query so this system provides result to users any type accuracy. This enables a user to input their queries in sim- of query he fires to the system accurately and efficiently. ple English and get the answer in same language which The semantic search is not enabled in the system which is is referred as Natural Language Interface to a Database enabled in this research work. (NLIDB).The knowledge extraction is enabled with the Gaganpreet Kaur in2 emphasize about a usage successful implementation of SQL generation from the of regular expressions in NLP to search text is well natural language statement. Once the query is processed, known and understood as a useful technique. Regular then extracting the semantic knowledge from the social Expressions are generic representations for a string or a web sources done by analyzing the data collected in the collection of strings. Regular expressions (regexps) are R-tool interface. one of the most useful tools in computer science. NLP, as The paper is organized as follows. Section 2 provides an area of computer science, has greatly benefitted from background and related work.Section 3 describes regexps: they are used in phonology, morphology, text the architectural layout of the proposed work which analysis, information extraction, & speech recognition. It projects the actual flow of NLP.Section 4 discusses the helps a reader to give a general review on usage of regular Implementation and results obtained out of sample expressions from natural language processing, whereas query processing experiment conducted. Section 5 briefly the clear collection of expression in sentence is not clearly analyse the generated the SQL with Precision and Recall specified which is acting as a issue which will be addressed Threshold measure.Section 6 summarizes the most in this research article. important conclusions of research work and draws the Mahesh P. Gaikwad in3 discussed about the need future lines of research for natural language interfaces to database has become increasingly acute as more and more people access infor- 2. Background and Related Works mation from web browsers. Yet NLI (Natural Language Interface) is only usable if they map natural language The various attempts have been made so far for conversion questions to SQL queries correctly. Natural Language of natural language to dedicate the SQL query for easy processing is becoming one of the most active areas accessing of database. NLP database interfaces are just as in Human-Computer Interaction. The goal of NLP old as any other NLP research. Posting query to databases (Natural Language Processing) is to enable communica- in natural language is a exceptionally fitting and uncom- tion between people and computers without requiring to plicated method of data access, especially for unfussy memorization of complex commands and procedures. users who do not recognize the complex database query The main purpose of natural Language Query Processing languages such as SQL. The accomplishment in this area is for an English sentence to be interpreted by the com- is partly because of the real-world remuneration that can puter and appropriate action taken. The use NLI in NLP come from database NLP systems, and to a certain extent is elaborated in this article to a large extent to attain the because NLP works very well in a single-database domain, actual purpose of query processing system. but the fact that the social web sources is the combination Jasmeen Kaur et al in4 describes about the purpose of various aspects, therefore the flexible interface is need. of Natural Language Query Processing which is used to Some of the contributions made by several authors were interpret an English sentence and hence a complementary highlighted in this section. Databases usually provide action is taken. Querying to databases in natural lan- small enough domains that ambiguity problems in natu- guage is a convenient method for data access, especially ral language can be resolved successfully. Akshay G. Satav for newbie’s who have less knowledge about complicated et al1 proposes a system that will provide search interface/ database query languages such as SQL. The author empha- NLP System for users without knowing any specific syntax size on the structural designing methods for translating or knowledge of a database language. Hence the author English Query into SQL using automata. A system that present a system that will provide the search interface for is capable of handling simple queries with standard join 2 Vol 8 (1) | January 2015 | www.indjst.org Indian Journal of Science and Technology K. Javubar Sathick and A. Jaya conditions is introduced here but because not all forms of Arati K. Deshpande and Prakash. R. Devale in8, SQL queries are supported, further development would proposed “Natural Language Processing using probabilis- be required which is carried out in the proposed research tic context free grammar”, the author discussed a method through this article. to create new NLDBI system using Probabilistic Context As database plays a vital role, here are some examples Free Grammar (PCFG). This paper highlights, Natural of database NLP systems proposed by Anil M. Bhadgale language statement is converted into internal represen- et al5 LUNAR (Woods, 1973) involved a system that tation based on the syntactic and semantic knowledge answered questions about rock samples brought back from of the natural language. This representation is then the moon. Two databases were used, the chemical analy- converted into queries using a representation converter, ses and the literature references.