International Journal of Scientific & Engineering Research Volume 9, Issue 11, November-2018 470 ISSN 2229-5518 Review on Semantic Query Formulation Approaches Rufai AliyuYauri, Haruna A Yelwa Information and C Communication Technology Department Kebbi State University of Science and Technology, Nigeria. Multi Media Uinversit, Malaysia [email protected] , [email protected]

Abstract-Semantic Web plays important role in the access of Web data based on the meaning instead of traditional keyword search. However, the main challenges of systems is complex structured queries such as SPARQL are needed in order to retrieve data semantically. To address this challenge, several efforts have been put in place by researchers to enable user use natural language query for retrieval instead of using complex structured query syntax. Natural language query is transformed into structured query for retrieving semantically which is refers to as semantic query formulation. This study presents a review on semantic query formulation researches that has been reported by researchers. The main motivation of the paper is because of the growing interest semantic search systems globally. The study will give an up to date works on the area of semantic query formulation which will assist in furthering up researches in the area

Keywords: , Ontology, Semantic Query Formulation, Retrieval

1.0 INTRODUCTION words Web Ontology Language (OWL), is commonly defined as formal, and explicit The semantic web is described as a web of specifications of shared conceptualization. . (Shadbolt, Berners-Lee & Hall 2006) Formal signifies ontology as a machine-readable define the semantic web as “An extension of the format. Whereas, the concepts or entities used current version of web where information is given are explicitly described, shared, and displayed, well-defined meaning, enabling computers and ontology is concept that captures knowledge in a people to work in co-corporation”. It is designed widely acceptable standard, and its to overcome the challenges of the current web conceptualization reflects ontology as a notion search systems, which are mainly designed for that identifies entities in the real world (Hu, presenting, organising and linking data in the 2004). In the semantic web, data is represented form of text, videos, audio and images. The in formal ontology format. Where ontology model structure of data on the current World Wide Web data is in the form of concepts and relationships is published in suchIJSER a way that the data can only between concepts. In the semantic web these be understood by humans, computer programs concepts and their corresponding relationships cannot understand its meaning. The web search are represented in RDF graphical format. RDF is systems struggle with the aggregation and World Wide Web Consortium’s (W3C) standard querying of information without having a syntax for representing concepts and consistent way of achieving such tasks, whereas relationships between concepts in graphical the semantic web concept enables linked format. RDF shows or describes the relationship documents on the web and assigning better between a concept and its literal. (Patel- meaning for both human and computer Schneider, 2005). It is presented in triple as understanding. In other words it improves the below.< Predicate> current world wide web structure from that of Subject and object represents an ontology interconnected documents to semantically driven concept, while the predicate represents the documents that allow better aggregation of relationship between these concepts. RDF triple information, storage, manipulation and retrieval format was mainly a practical rule language for (Kück, 2004). This provides a better enabling computers to understand, manipulate and share environment for promoting good working data ( Decker, Sintek, Andreas, Nicola, Andreas, relationships between humans and computers. Leicher, Susanne, Weathers , Gustaf The main building block of semantic web is &Zolt,2007.). Therefore, an RDF graph shows ontology. data represented in a format by which computers The main building block of the semantic web is could understand the meaning and processes. ontology, which transforms web content into a RDF triple formats are stored in a machine-readable format that can be knowledgebase, which enables computers to manipulated (Ahmed & Gerhard, 2007). Ontology access the data and manipulate it semantically. is the main building block of the semantic web Semantic searches are seen as a semantic web which transforms web content into a machine- technology approach to interpreting search readable and format that can be manipulated queries and resources based on underlying (Ahmed & Gerhard, 2007). Ontology, in other ontologies, labelling some contextual domain

IJSER © 2018 http://www.ijser.org International Journal of Scientific & Engineering Research Volume 9, Issue 11, November-2018 471 ISSN 2229-5518

knowledge, by connecting web resources to semantic search engines, so as to enable users semantic annotations (Fazzinga & Lukasiewicz to have easy access to structured datasuch as 2010). A semantic search is a data retrieval RDF data representation in the knowledgebase. mechanism that integrates the capabilities of the The semantic search approaches mentioned semantic web and search engines in order to get earlier have been adopted by different semantic more precise results than the current search query formulations systems. We will examine the engine. The main basic concept of a semantic various research that has reported on semantic search is that it semantically manipulates and query formulation in the next section. transforms a natural language query to a structured formal query such Protocol and RDF Query Language (SPARQL), 2.0 PREVIOUS RESEARCH ON RDF Data Query Language SEMANTIC QUERY (RDQL),Sesame RDF Query Language(SeRQL), Triple pattern matching among others (Tablan et FORMULATION al.,2008.; Esmaili & Abolhassani, 2006). These structured queries enable users to retrieve data In recent years much more information has from knowledgebase and other knowledge- become available on the web, in databases, related sources. knowledgebase and other related document A structured query involves the use of formal storage and manipulation mechanisms. Quite a structured rules to generate a query in order to number of organisations use automated use it for a retrieval process (Tannier, Girardot, information systems, mostly for storage of their Mathieu, & Saint-étienne, 2002). A structured information. Many important decisions are made query involves complex syntax and therefore based on adequate support for information, users need to be familiar with this before it can which remains a problem, as retrieval is based be used. Figure 1 shows an example of a on complex schemata. This has led to a number structured query in SPARQL query language. of research projects that focus on the area of query formulation (Hofstede, Proper, & Nijmegen, 1995). Several research projects have PREFIX foaf: been proposed that attempt to semantically search for knowledge stored in a SELECT ?x ?name knowledgebase. Currently, there are couple of semantic search research projects that are WHERE { ?x foaf:name ?name } designed for retrieval of a knowledgebase from Figure 1 Example of structured language different domains. These semantic search (SPARQL) systems attempt to semantically search knowledge represented in the knowledgebase by The query above is a structured query in semantically formulating user queries in various SPARQL syntax, andIJSER in natural language means distinct ways (Blanco, Halpin, Herzig, Mika, “select all names that are in friend of friend data Pound, Thompson, Tran &Thanh, 2013; Madhu, (foaf). The user therefore needs to have prior Govardhan, & Rajinikanth, 2011). Previously knowledge of what they should query, which is reported systems on semantic query formulation too complex for users and requires them to learn can be categorised into three approaches: mainly the syntax before they can query the manual, semi-automatic and automatic semantic knowledgebase using structured formal query query formulation systems. Most of these language. To simplify access to data in the semantic query formulation approaches knowledgebase, users should therefore be formulates natural language queries into triple allowed to use their favourite natural language format and then to corresponding SPARQL, query, so that they can pose their query using a because SPARQL is the most powerful formal natural language and system which assists the language for retrieval from a knowledgebase, as user to formulate the query into a structured recommended by the W3C group. query which is what is refers to as semantic query formulation. 2.1 MANUAL SEMANTIC QUERY FORMULATION. Semantic query formulation is the transformation of a natural language query into a formal The manual semantic query formulation process structured query such as SQL, SPARQL, and is a semantic query formulation approach where SeRQL, among others. The goal of query a user semantically manually constructs a formulation is to assist users by formulating their structured query language such as SPARQL. natural language query into a formal structured The user manually writes the syntax or query in order to enable them retrieve relevant represents their query in the same format as the information from a knowledgebase. knowledgebase data, such as triple. The In recent years, there has been an increase in manually constructed semantic query is then research interest about the semantic web and used against the knowledgebase for retrieval.

IJSER © 2018 http://www.ijser.org International Journal of Scientific & Engineering Research Volume 9, Issue 11, November-2018 472 ISSN 2229-5518

Ontology editors such as Protégée and some query, which is used to retrieve knowledge from query editors like Virtuoso SPARQL, Flint the knowledgebase. Most of the systems using SPARQL Editor, and Drupal SPARQL Query the semantic query formulation approach are Builder among others are systems that enable based on the semi-automatic semantic query users to manually formulate a formal query formulation approach. The semi-automatic language and retrieve knowledge from the semantic query formulation approach comprises knowledgebase. Protégée enables users to of a combination of the automatic and manual manually construct a SPARQL query in order to approach where computers and humans retrieve knowledge stored in Protégée. In collaborate to semantically formulate a formal Protégée, a user can use the SPARQL query tab query. In this approach, users don’t need to be provided to construct a SPARQL query, and fully familiar with the complex syntax of a formal execute the query against the knowledge stored query language or know exactly how information in Protégée. The result of the corresponding is represented in the knowledge before they can constructed query is a return in the form of triples pose their query and get answer. or concepts to the user depending of what they Some of the semi-automatic semantic query are looking for. formulation systems are template-based, where Virtuoso SPARQL Query Editor is another the user is presented with menu-based system that enables users to construct SPARQL information from which they choose variables queries in order to retrieve information. Virtuoso that are used for semantic query formulation. The presents an interface to the user where the user template base query formulation approach is formulate the SPARQL query, and the system presented in the KIM system (Popov. et al, 2003) then formulate the SPARQL query to the It was designed to go a step further than the equivalent SQL query to retrieve data from triple manual semantic query formulation system. The store tables. In Virtuoso, all graphs are stored in KIM approach was mainly to combine automatic one large table containing annotated ontology and manual semantic query formulation concepts. approach. The system saves the user from going Another query editor that allows users to through the complexity of a structured query and manually construct formal queries was presented the effort of knowing the structure of the in the work of ( Kharlamov, Giese, Soylu, knowledgebase, before querying and retrieving a AZheleznyakov, Bagosi, Console, Haase, result. In this system, the user is presented with Horrocks, Marciuska, Pinkel, Ruzzi, Santarelli, predefined query templates from where they Savo, Sengupta, Schmidt, Thorstensen, Trame & choose to semantically formulate the structured Waaler, 2013). The system is a visual semantic query SeRQL that is used for retrieval from the query formulation system that poses queries via knowledgebase. The SeRQL translation is then a visual query formulation (VQF) interface. VQF used to match data in the knowledgebase for is a SPARQL query editor that allows a user to retrieval. manually construct a semantic query. The Another semi-automatic approach for semantic semantically formulatedIJSER queries are then query formulation is (Donderler, Saykol, Arslan, executed by the query answering module against Ulusoy & Gudukbay 2003). Here information in a knowledgebase for retrieval of the answer. The the knowledgebase is presented to the user in process of manual construction of the semantic graphical format where the user sketches to query is complex because it requires users to be formulate semantic query. In this approach, a familiar with the complex syntax of the query visual query is formulated by a gathering of language before retrieval from the objects with some conditions. Here the system knowledgebase. Work in (Popov, Kiryakov, provides the user with a visual interface which Kirilov, Manov&Dimitar, 2003; Damljanovi, 2011) provides a graphical representation of the improve from manually semantic query knowledge from which the user chooses formulation to semi-automatic semantic query variables for query formulation in order to retrieve formulation were proposed. In these system important knowledge. Another work that focuses computer and human work together in order to on query formulation in database was presented semantic formulate structured query. by (Chen & Zhu, 1998). The conceptual modelling approach allows users to express the 2.2 Semi-Automatic Semantic Query way they intend to discover knowledge from a Formulation database on a constructed network. The user chooses variables that form a network, which Unlike the manual semantic query formulation provides the system with a hint of what the approach, where users are required to manually intended query should look like to form a casual formulate their query without assistance from the network. If a causal network is formed, it could system, in the semi-automatic semantic query indicate possible relationships between some approach, the system and user work together in concepts. When the user send a query, the the query formulation process. In this approach, relevant stored knowledge is presented, from the user is given some sort of assistance by the which they choose and guide the knowledge system in order to semantically formulate their search. Another work where users select query

IJSER © 2018 http://www.ijser.org International Journal of Scientific & Engineering Research Volume 9, Issue 11, November-2018 473 ISSN 2229-5518

variables for query formulation in a database is choose in order to semantically formulate their (Bellavia, Maio, & Rizzi, 1992)This provides a query for retrieval of the desired information. In query formulation system which supports the template-based approach, the user needs to inference about problems by optimizing query browse a lot of information provided by the formulation cost. The system is based on the system before they can formulate the query that concept of the user viewpoint (UV), where a user is used for retrieving important information. chooses their viewpoint by defining criteria for TAP (Guha, McCool & Miller, accessing data. The user selects a relation from 2003)AmiGO(GeneInfoViz, 2007) and Gauch, the database schema. The query is formulated Chaffee, & Pretschner (2003) proposed some based on this relationship by using the semantic improvements to the approaches mentioned aspect of the selected relationships. The above. TAP goes beyond just providing a relationship chosen by the user is used to template from which the user chooses the formulate a query, taking into consideration variables that are used for query formulation, by various graphical links between concepts providing the users with a search and browse according to the selected relationship in the mechanism. The main concept of TAP is to database. (Dongilli, Franconi, & Tessaris, 2000) enable users to either use the browsing present another ontology based query capability provided by the system or search for processing system that semantically formulate a the information they need. The search user’s natural language into a structured query. mechanism accepts user input in textual form This is a project by Semantic Webs and Agent and returns all resources whose title properties Integrated Economies (SEWASIE). The project contain the text. focuses on building an intelligent natural Although the above-mentioned approaches offer language query interface that supports users in browsing mechanisms for knowledge in the formulating their natural language query into a knowledgebase, determining the right concept for precise query. The system is based on user/ the systems using the posed search query is not computer interaction where the user is presented straightforward (Damljanovi, 2011). There is with a visual interface to query ontology. The therefore an ambiguity problem as users may be system uses the user’s query to provide various misled by the system to assume their intended related concepts and relationships for the user to query does not exist in the system, when in fact a choose from in order to retrieve the desired different vocabulary is used by the system, such information. This system also requires the user to as synonyms when a user is using the word be involved in an initial query formulation task, by “Allah” but this is represented as “God” in the selecting concepts that should be used for query underlying knowledgebase. formulation. Here the user is also restricted to CINDI improves further from the previous words in the provided in the knowledgebase. The systems by looking at the problem of ambiguity in SEWASIE system supports users in formulating user queries. CINDI proposed a form-base query a semantic query based on the refinement formulation approach where a user poses their process supportedIJSER by ontology navigation. Users query through filling in a form presented by the specify a query using generic terms, are able to system. CINDI formulate user natural language refine some terms, can also introduce new terms queries to structured language SQL using and can iterate the process if required by the semantic templates (Stratica, Kosseim, & Desai, query. 2005). The system provides the user with a Barzdins, Liepins, Veilande, & Zviedris (2008) template that enables them to formulate a present an ontology-assisted query formulation structured SQL query. The query is syntactically based on concepts annotated in the database. parsed by Link Parser, and semantically The main concept of this approach is to use analysed based on domain-specific templates. ontology as the main guide to generating a These templates are connected to a conceptual SPARQL query. The user query is formulated knowledgebase from a database schema using with the help of an assisted graphical user WordNet. The semantically stored information is interface that enables them to construct the used to guide the user in formulating an SQL semantic query. The user first chooses concepts query by selecting the concept and relationship. in the ontology that are related to their query and The system was tested on the CINDI database relate with available relationship that connects containing information about virtual libraries. In the selected concept. The system uses the this system they incorporated lexical dictionary shortest path algorithm to find available WordNet to create a list of hyponyms and relationships that exist between the selected synonyms for each relationship and attribute concepts found in the ontology. name. This enables the user to query the Although the above-mentioned systems knowledgebase with flexible words without being simplified by the effort of semantic query restricted to vocabularies provided by the formulation compared to the manual approach, system. however this system has some limitations QUICK (Zenz, Zhou, Minack, Siberski, & Nejdl, because users are restricted to information 2009) is a work on query formulation that presented to them by the system from which they supports users to construct structured queries.

IJSER © 2018 http://www.ijser.org International Journal of Scientific & Engineering Research Volume 9, Issue 11, November-2018 474 ISSN 2229-5518

Their system is based on user interaction, where approach. The user’s natural the user is guided through constructing a language is processed and formulated according structured query based on the underlying to the underlying knowledgebase.The system ontology backend. In this system the user makes identify concepts from user query work and user the query, the system uses the semantically to manually associate the concepts with annotated information in the knowledgebase and relationships in the underlying knowledgebase. key works in the user query to provide the user The system is based on two fundamental roles with a clarification dialogue where they choose comprising of the end user who uses the system their intended semantic query. This system also by querying the system, and the domain experts requires user participation in selecting query who are familiar with the underlying variables for the system to use in formulating the knowledgebase and play the role of lexicon query. In this case the user queries the system, engineers who interact with the system in lexicon and the system processes and provides the user acquisition mode. The lexical engineer creates with a clarification dialogue, to which the user domain-specific lexicons to adapt the system to provides clarification for the intend query. The an exact domain. In this case, users ask system then uses the clarification provided by the questions which are semantically interpreted by user in formulating the user’s natural language the query. A user’s query is formulated by taking query for information retrieval. This system also into consideration the user’s query where uses WordNet to enables flexible vocabulary concepts are identified by the system and the without restriction, but the system is based on a user manually chooses the relationship with keyword search, which restricts the keywords respect to domain-specific predicates. The that can be used since a keyword may not system simplifies formulating structured query in necessarily be in existence even if the synonym order to retrieve from the knowledgebase. of the word is checked. PANTO is a natural language interface approach LANLI is a natural language interface system for where the system accepts a natural language relational database query formulation query and transforms it into a structured query (Enikuomehin & Okwufulueze, 2012). The work (Wang, Xiong, Zhou, & Yu, 2007). The system semantically assists user’s natural language accepts generic natural language queries, queries to retrieve information from databases. In transforms the queries and output it inform of this work, the system allows users to ask queries structured query language SPARQL. The system in natural language, and they are transformed to is a triple-based query formulation approach that structured query language SQL. The SQL is is designed to cope with various natural executed over a relational database for retrieval language issues such as negation, superlatives of important information. In LANLI they proposed and comparatives. It involves use of the the use of an Unguided Loose Search where statistical Stanford parser, WordNet, and various users can simply send their request in terms of metric algorithms that transform natural language natural language or present some set of the queries to triple-based representation. The triple keywords that describeIJSER their desired information representation enables the construction of a without worrying about the database structure or SPARQL query language based on user syntax. In this system WordNet is used to interaction. expand the user’s words during query Aksac, Ozturk, and Dogdu (2012) present formulation. PERSON, which is a system design to Quite a number of works have been presented semantically search for relevant information from that offers users a Google-like search the web. The approach is to extend the mechanism where users pose natural language functionality of the Mozilla firewall browser by queries to retrieve information from the incorporating a plug-in that identifies named knowledgebase. These systems accept user to entities navigated by users, annotates such query using natural language and the systems entities and formulate the queries that retrieve semantically assist the user in formulating the relevant information based on associating newly query as a formal structure query which is then annotated data with the existing data for used against the knowledgebase to retrieve retrieving relevant information. Users are guided relevant information. ORAKEL is a natural to a query by clicking ontology entities, and then language interface system that semantically a SPARQL query is executed and related formulate a user’s natural language query into a knowledge from the ontology is retrieved. structured query (Cimiano, Haase, Heizmann, Damova and Dann (2013) present a work that Mantel, & Studer, 2008).ORAKEL is a system transforms natural language queries into that accepts user input as natural language structured queries based on the user interaction queries to the system and the system formulates approach. The system offers a mechanism that the queries into formal structured queries in order allows users to semantically query the to be matched against the knowledgebase for knowledgebase using natural language. A user’s retrieval. ORAKEL approach formulates factoid natural language is transform into SPARQL in questions such as what, who, where, and which, order to retrieve answer mainly yes/no answers. using full syntax parsing and a compositional In this system the user needs to search

IJSER © 2018 http://www.ijser.org International Journal of Scientific & Engineering Research Volume 9, Issue 11, November-2018 475 ISSN 2229-5518

knowledge base to get the best desired answer. selecting a template from which they choose There is no proper ranking mechanism to get the from natural language queries that are already best-formulated query to be transformed into loaded, or users make a natural language query SPARQL, since the system returns many and the system guides the user with suggestions possible options. for how to make a suitable query that could be Furthermore another interactive query answered by the system. formulation system that focuses on transforming ONLIstands for ‘ontology natural language a user’s natural language query to a formal interaction’ and is another natural language structured format was presented in (Jarrar et al., question answering system that is mainly 2012). In this work, they propose an interactive designed to transform user’s natural language query formulation language (MashQL) to enable queries into nRQL. The system is based on users to access data on the web easily, with users being familiar with the ontology concept in more precise answers to queries. The language order to transform their query into a structured is represented in the form of a tree, where the query. User interaction is needed for query roots are seen as the query subject and each transformation to nRQL. Unger et al., (2012) branch is comprised of properties and describe a template-based question answering restrictions. The system allows users to navigate system for querying RDF graphs. The system’s through and select various concepts and main purpose is to assist users in querying RDF relationships from an underlying dataset. Users graphs by transforming natural language user interact with the system to form various queries to SPARQL. It reflects the internal questions by selecting concept relationships and structure of the question using statistical entity restrictions to form simple and complex queries. identification and predicate detection. The FFQI presented a query formulation approach for system attempts to identify concepts in a user’s retrieving structured data from database query token and detect possible relationships (Shobana and Venkatesan, 2012). The system is that may exist between the identified concepts designed to accept natural language queries based on deep linguistic analysis by generating based on a semantic graph model. A user is SPARQL templates with slots that need to be presented with an interface that enables them to filled with URIs. To fill those slots, the likely make some selections that the system uses in concept is identified using string similarity and formulating a query by using probabilistic natural language patterns extracted from popularity measures. In this system, the structured data and text documents. disambiguation of the user query is done based QASYO is another system based on a question on ranking technique. The semantic graph is a answering approach where a user’s natural model for a relational database that is comprised language query is transformed to a structured of nodes as relationships, and links are query (Moussa & Abdel-kader, 2011). The represented as the joins between nodes. When system accepts natural language queries and users input a natural language query, the YAGO ontology as input and retrieves popularity of the nodesIJSER and their link is used for information from the semantically annotated the formulation and ranking of the query. data. The system analyses natural language SemSearchis a keyword-based search system queries by looking at the keywords in the query, that semantically transforms user keyword mapping the keywords against the semantically queries to formal structured queries (Lei, Uren, & annotated data, and retrieving answers relevant Motta, 2006). The system accepts a natural to the user query. The system is based on triple language query and transform it into structured mapping, where user queries are transformed SQL query language. The system is based on a into triple form and matched against the triple concept search where users send queries as represented data in the knowledgebase. concepts in order to get result based on that Although the semi-structured semantic query concept. In SemSearch the user conveys to the formulation system eases the processes involved search engine the type of search result. In this for user to retrieve data from a knowledgebase case the user is required to have knowledge of using natural language, the time and the the concepts that exist in the knowledgebase processes involved still require a lot from users. before they can query and retrieve important Users need to interact with the system before information. NaLIX is another semantic search they can retrieve important knowledge from the system that is based on user interaction for knowledgebase, which is hectic and time retrieval from XML data (Li, Ave, & Arbor, 2005). consuming. Users need a more simple method The system accepts a natural language query such as a Google-like search mechanism where and formulates the query into an XQuery they can easily make a query and receive an expression, which is used against XML data for answer without participating in the retrieval retrieving important information. The system process. maps the grammatical closeness of natural language parsed tokens to the closeness of corresponding elements in the result XML. In this system users interact with the system by

IJSER © 2018 http://www.ijser.org International Journal of Scientific & Engineering Research Volume 9, Issue 11, November-2018 476 ISSN 2229-5518

2.3 Automatic Semantic Query formulation user’s natural language query into structured Approach form. It is an extension of the previously discussed Aqualog, mainly designed to cope with In order to fully automate the process of problems in the Aqualog system. Power Aqua semantic query formulation, various systems uses huge amounts of available heterogeneous attempt to automate the process of semantic semantic data in order to interpret a natural query formulation. Although the proposed language query, without making any automatic semantic query formulation systems assumptions about the particular ontologies of a have shown significant improvement over the particular query. Power Aqua has the advantage semi-automatic query formulation approach, of being domain independent, where user most of these systems are still not fully queries don’t have to target specific domains. automated. The systems still involve users during User queries can be formulated to retrieve the semantic query formulation process. There is information from semantically structured data on still not a fully automated system that the web. automatically formulate a user’s natural language QACID is a semantic query formulation system query to a structured query without human for querying database in a natural language intervention. question answering approach (Ferrández, AquaLog(Garcia, Hall, Keynes, Motta, & Uren, Izquierdo, Ferrández, & Vicedo, 2009). The 2006) presented a system based on a question system transforms collections of given domain answering approach. The system is a portable queries, analyses such queries, and groups the Natural language interface that enables users to queries in clusters. Each query is used for query the knowledgebase using natural mapping natural language query terms with language. A user is able to make a query using knowledge in the knowledgebase by using string natural language, the system analyses the query, distance metrics. Each cluster contains formulate the query into a structured query and representations of a certain group of queries and matches the query against the knowledgebase in has a characteristic query pattern derived from order to make an inference and return an answer training set data. to the user. This system attempts to semantically AutoSPARQL is a query formulation interface formulate structures query from user’s imputed that formulates user queries to SPARQL query natural language query. When a user send their language (Lehmann. et al, 2011). The system is query, the system starts by automatically based on a supervised machine learning transforming the user query into possible approach that learns about a user’s intended linguistic triples. Linguistic triples are candidate query based on user interaction. Users can either triples generated from a user query after some ask a question directly, as in a question linguistic processing, which are used to generate answering system, search for relevant resources, the best triple representation of the user query or select a search result. Although the system through the Relation Similarity Service module claims to be an automatic SPARQL generation (RSS). Aqualog IJSERis potable because it takes system it relies heavily on learning from what natural language queries and ontology as an users describe as answer or not answers, i.e. input, and then returns answers retrieved from positive and negative answers. The system first one or more knowledgebase. Users can ask allows users to search for a concept, for example queries and customise their queries by animal, then the system will ask the user if the associating some keywords with the concept in answer to the search term is “Donkey” if they say the ontology. yes, then it is used as a positive answer, and if NLP-Reduce is an approach based on automatic they say no it is used as a negative answer. The semantic query formulation that transforms a system learns answers to certain queries, which user’s natural language query into a structured can be used to show the next user asking a query (Kaufmann, Bernstein, & Fischer, 2007). similar query. The system presents a natural language Deines & Krechel (2013) describe another interface that transforms natural language semantic query formulation approach that queries into structured queries. The core part of formulates a user’s natural language query in the system is the query generator which is German to SPARQL query language. The accountable for creating SPARQL queries given system uses natural language queries in the the words and the lexicon extracted from the German language and matches them against an knowledgebase, where users are able to RDF graph that is labelled in German. The enter keywords or full sentences for querying the system is based on identification of various knowledgebase. The system uses a set of resources from user queries, which shows path- natural language processing techniques, such as based identification of similar semantic stemming and synonym expansion to reduce a resources. In this system users pose questions in query to a structured query. the German language, the system then PowerAqua(Lopez, Fernández, Motta, &Stieler, automatically expands the domain ontology used 2011) is an ontology-based natural language by populating the ontology with the interface that supports the transformation of a

IJSER © 2018 http://www.ijser.org International Journal of Scientific & Engineering Research Volume 9, Issue 11, November-2018 477 ISSN 2229-5518

corresponding German vocabulary using fails to answer the query. The system uses GermanNet. semantically annotated ontology with syntactic DENNA is another semantic query formulation parsing in order to formulate user natural system mainly designed to transform natural language queries into structured SPARQL query language queries into structured queries (Yahya, language. Berberich, & Elbassuoni, 2011). The system is SWSNL is a work in (Habernal & Konopík, 2013) based on integer linear programming for solving was proposed to go beyond phrase or single several natural disambiguation problems. The sentence query. SWSNL works on semantic system transforms a user’s natural language query formulation that enables users to query the query into a structured SPARQL query, where knowledgebase using natural language in a the focus is on entities, classes, and the phrase, single sentence or multiple sentences. relationships between them. The system utilizes The system processes queries using semantic the richness of the knowledge base’s analyses, and semantic interpretation to semantically annotated data which enables users transform the user’s natural language to to make a query in natural languagThe SPARQL, and then retrieves result from MyAutoSPARQL system (Sharef, Noah, 2003) is knowledgebase. The system allow users to enter another work that attempt to automatically queries using keywords, complete sentences, or formulate a user’s natural language to a even paragraphs. The system was evaluated structured query based on the technique of using two languages, mainly English and Czech. rewriting a NL query. MyAutoSPARQL started This system analyses user queries using natural by using linguistic processing to transform user language components, which are comprised of queries into linguistic triples, as in Aqualog. The query pre-processing, name entity recognition system then tries to identify the namespace of and semantic analyses. The queries are the concepts and obtain the closest property formulated into SPARQL query language and a name based on the linguistic triples, and then the result is retrieved. triples are formed for SPARQL construction. An Automated Semantic Query formulation Although the system attempts to automatically approach was presented in the work of (R. transform natural language queries into Abdulkadir, R.A Yauri, 2017). There work structured SPARQL queries, it possesses some automatically formulates user’s natural language limitations. First the system relies on linguistic query to structured query based on using triples formulated by the system in order to statistical machine learning approach. What process to SPARQL construction, which is based differentiate their work with previous works is that on heuristic linguistic processing. there systems can translate paragraph length Systems were developed in order to deal with natural language query to structured query as this problem by attempting to automatically against previous works that are mostly small formulate a user’s natural language query to fragment query. Additionally their disambiguation formal query language but, if the system is not approach is capable of disambiguating words able to semanticallyIJSER formulate user queries, they that are not found in WordNet. They used don’t just fail. These systems try to assist users equivalent assertion to disambiguate non English to avoid having to re-write their queries from the words. scratch. This gives the user some sort of support in case their queries cannot be answered directly 3.0 CONCLUSION by the system. The Querix system employs In Conclusion, semantic query formulation semantic query formulation of users’ natural systems have been developed, ranging from language queries (Kaufmann, Bernstein, & manual semantic query formulation, through Zumstein, 2006). The system is in the form of a semi-automatic semantic query formulation, to natural language interface, where a user is able automatic query formulation. The manual to make queries using natural language, and the process requires a user to be familiar with the system transforms the queries into structured syntax of formal language or know what SPARQL queries. The system attempts to information is in the knowledgebase, which formulate user queries but when there is creates limitations in terms of how to make encounters ambiguity, it uses clarification dialog queries and what can be queried. Although the to ask the user to manually disambiguate their semi-automatic process provides some query. The system also uses clarification assistance to help the user to know what can be feedback to formulate SPARQL queries, which is queried and how to make queries, it still requires executed against the knowledgebase. a user to go through a lot by taking time during FREyA(Damljanovic, Agatonovic, & the query formulation process. Automatic Cunningham, 2010) is a natural language semantic query formulation provides users with interface for querying ontologies where the their favourite google-like search system, and in system attempts to automatically semantically this case the system does not require heavy formulate natural language queries into intervention by the user in the query formulation structured queries. The system provides the user process. The automatic semantic query with a clarification dialogue in case the system formulation approach saves users from having to

IJSER © 2018 http://www.ijser.org International Journal of Scientific & Engineering Research Volume 9, Issue 11, November-2018 478 ISSN 2229-5518

be familiar with the complex syntax of formal Blanco, RoiHalpin, HarryHerzig, Daniel M.Mika, query language or knowing the structure of the PeterPound, JeffreyThompson, Henry knowledgebase information, as in the case of S.Tran, Thanh. (2013). Web Semantics: manual semantic query formulation. Automatic Science, Services and Agents on the World query formulation also reduces the customisation Wide Web. Journal of Web Semantics. and interaction processes a user must go Madhu, G., Govardhan, a, & Rajinikanth, T. K. V. through, as in semi-automatic query formulation. (2011). Intelligent semantic web search In automatic semantic query formulation a user’s engines: A brief survey. International natural language query is transformed Journal of Web & Semantic Technology, automatically into a structured query language 2(1), 34–42. doi:10.5121/ijwest.2011.2103 such as SPARQL, which is then matched against Kharlamov, E., Giese, M., Soylu, A., the knowledgebase by several retrieval Zheleznyakov, D., Bagosi, T., Console, mechanisms in order to retrieve relevant M.,Waaler, A. (2013). Optique 1. 0 : information from the knowledgebase. Semantic Access to Big Data the Case of Norwegian Petroleum Directorate’s Fact REFERENCE Pages. The 12th International Semantic Web Conference, 21-25 October 2013, Shadbolt, N., Berners-Lee, T., & Hall, W. (2006). Sydney, Australia (1), 1–4. The semantic web revisited. IEEE Popov, B., Kiryakov, A., Kirilov, A., Manov, D., Intelligent Systems, 21(3), 96–101. Ognyanoff, D. and Goranov, M. (2003). doi:10.1109/MIS.2006.62 KIM - Semantic Annotation Platform. In 2nd G Kück, (2004), "Tim Berners-Lee's Semantic International Semantic Web Conference, Web”. South African Journal of Information Florida, USA, 2003 834-849. Management. Bellavia, G., Maio, D., & Rizzi, S. (1992). Ahmed, Z., & Gerhard, D. (2007). Web to Resolving the query inference problem by Semantic Web & Role of Ontology. optimizing query formulation cost, 1–30. In the proceedings of National Technical Report CIOC-C.N.R. n. 85. Conference on Information and Dongilli, P., Franconi, E. (2006). An Intelligent Communication Technologies, (NCICT- Query Interface with Natural Language 2007), Pakistan, 9th May 2007. Support. In: Proceedings of the 19th Hu, Y. (2004). The Semantic Web : Current International FLAIRS Conference FLAIRS- Status and Future Directions. [Power Point 2006, Melbourne Beach, Florida, May slides]. Retrieved 2006. from http://www.cs.nccu.edu.tw/~jong/pub/ Barzdins, G., Liepins, E., Veilande, M., & mis0601talk.pdf. Zviedris, M. (2009). Ontology Enabled Patel-Schneider, P. F. (2005): A Revised Graphical Database Query Tool for End- Architecture for the Semantic Web Users. In Eighth International Baltic Reasoning. In:IJSER Proceedings of PPSWR’05, Conference on Databases and Information Dagstuhl, Germany. Systems (DB&IS 2008), 105–116. Decker, S., Sintek, M., Billig, A., Henze, N., Guha, R., McCool, R., and Miller, E. (2003). Harth, A., Leicher, A.Neumann, G. (2006). Semantic Search. Proceedings of the TRIPLE - an RDF Rule Language with WWW2003, Budapest, 2003. Context and Use Cases. Proceedings of GeneInfoViz. Gene Ontology. University the W3C Workshop on Rule Languages for Montpellier. Retrieved 6 October 2011 Interoperability. Washington DC. USA. from http://irb.chu- Bettina Fazzinga and Thomas Lukasiewicz. montpellier.fr/fr/PDF/Bioinfo2007. (2010). Semantic search on the Web. Gauch Groppe, J., Groppe, S., Ebers, S., & Semantic Web — Interoperability, Usability, Linnemann, V. (2009). Efficient processing Applicability. 89–96. of SPARQL joins in memory by dynamically Tablan, V., Damljanovic, D., & Bontcheva, K. restricting triple patterns. Proceedings of (2010). A Natural Language Query the 2009 ACM Symposium on Applied Interface to Structured Information, In Computing - SAC ’09, 1231. ESWC 2010, volume 6088 of LNCS, pages Tablan, V., Damljanovic, D., & Bontcheva, K. 106{120. Springer, 2010 . (2010). A Natural Language Query A.H.M. ter Hofstede, H.A. Proper and Th.P. van Interface to Structured Information, In der Weide (1995) .Computer supported ESWC 2010, volume 6088 of LNCS, pages query formulation in an evolving context. In 106{120. Springer, 2010 . Proceeding Sixth Australasian Database Stratica, N., Kosseim, L., & Desai, B. C. (2005). Conference, ADC'95, Volume 17(2) of Using semantic templates for a natural Australian Computer Science language interface to the CINDI virtual Communications, Adelaide, Australia library. Data & Knowledge Engineering, (January 1995) 55(1), 4–19. doi:10.1016/j.datak.2004.12.002

IJSER © 2018 http://www.ijser.org International Journal of Scientific & Engineering Research Volume 9, Issue 11, November-2018 479 ISSN 2229-5518

Zenz, G., Zhou, X., Minack, E., Siberski, W., & Kaufmann, E., Bernstein, A., Zumstein, R. Nejdl, W. (2009). From keywords to (2006). Querix: A natural language semantic queries—Incremental query interface to query ontologies based on construction on the semantic web. Web clarification dialogs. In: ISWC 2006.LNCS, Semantics: Science, Services and Agents 980–981. Springer, Heidelberg. on the World Wide Web, 166–176. Lopez, V., Fernández, M., Stieler, N., Motta, E., Enikuomehin A.O., Okwefulueze D.O. (20120. Hall, W., Mkaa, M. K., & Kingdom, U. An Algorithm for Solving Natural Language (2011). PowerAqua : supporting users in Query Execution Problems on Relational querying and exploring the Semantic Web Databases. International Journal of content. Semantic Web Journal. Advanced Computer Science and Ferrández, Ó., Izquierdo, R., Ferrández, S., & Applications 169-175 Vicedo, J. L. (2009). Addressing ontology- Unger, C., Bühmann, L., Lehmann, J., Ngonga based question answering with collections Ngomo, A.-C., Gerber, D., & Cimiano, P. of user queries. Information Processing & (2012). Template-based question Management 45(2), 175–188. answering over RDF data. Proceedings of Lehmann, J., & Lorenz, B. (2011). AutoSPARQL : the 21st International Conference on World Let Users Query Your Knowledge Base. In Wide Web - WWW ’12, 639. Proceedings of ESWC 1–15. Wang, C., Xiong, M., Zhou, Q., & Yu, Y. (2007). Deines, I., & Krechel, D. (2013). A German PANTO : A Portable Natural Language Natural Language Interface for Semantic Interface to Ontologies. Proceedings of the Search, Semantic Technology. In Fourth European Semantic Web proceeding of Second Joint International Conference 473-487. Conference, JIST 2012Nara, Japan, Aksac, A., Ozturk, O., & Dogdu, E. (2012). A December 2012. novel semantic web browser for user Yahya, M., Berberich, K., & Elbassuoni, centric information retrieval: PERSON. S. (2011). Natural Language Questions for Expert Systems with Applications, 39(15), the Web of Data. In Proceedings of the 12001–12013. 2012 joint conference for Empirical Mustafa Jarrar, Marios D. Dikaiakos. (2012). A methods of Natural Language Processing Query Formulation Language for the Data and Computational Natural Language Web," IEEE Transactions on Knowledge Learning and Data Engineering 783-798. Damljanovic, D.; Agatonovic, M.; and R. Shobana, D.Venkatesan (2012). FFQI-Fast Cunningham, H. 2012. FREyA: an Formulation Query Interface For. Journal of Interactive Way of Querying Linked Data Theoretical and Applied Information using Natural Language. The Semantic Technology 37(1), 125–131. Web.125–138. Springer Lei, Y., Uren, V., Motta, E. (2006).Semsearch: A Habernal, I., & Konopík, M. (2013). SWSNL : search engineIJSER for the semantic web. In: Semantic Web Search Using Natural Staab, S.,Svátek, V. (eds.) EKAW 2006. Language. Expert Systems with LNCS (LNAI), vol. 4248, pp. 238–245. Applications 40, 3649–3664. Springer, Heidelberg. Rabiah Abdul Kadir, Rufai Aliyu Yauri and Li, Y., Yang, H., Jagadish, H. (2005).NaLIX: An Azreen Azman.(2018) Automated Semantic interactive natural language interface for Document Retrieval. International querying xml. In: Proceedings of the 2005 Conference on Information Retrieval and ACM SIGMOD International Conference on Knowledge Management Management of Data, pp. 900–902. ACM Press, New York Unger, C., Bühmann, L., Lehmann, J., Ngonga Ngomo, A.-C., Gerber, D., & Cimiano, P. (2012). Template-based question answering over RDF data. Proceedings of the 21st International Conference on World Wide Web - WWW ’12, 639. M. Moussa and R. F. Abdel-Kader. (2011).QASYO: A Question Answering System for YAGO Ontology. International Journal of Database Theory and Application Vol. 4, No. 2, June, 2011 Lopez, V., Motta, E., Uren, V. and Pasin, M. (2007). AquaLog: An ontology-driven Question Answering System for Semantic intranets. Journal of Web Semantics 5 (2).

IJSER © 2018 http://www.ijser.org