Ontop: Answering SPARQL Queries Over Relational Databases


Semantic Web 0 (0) 1, IOS Press

Editor(s): Óscar Corcho, Universidad Politécnica de Madrid, Spain
Solicited review(s): Jean Paul Calbimonte, École Polytechnique Fédérale de Lausanne, Switzerland; José Luis Ambite, University of Southern California, USA; one anonymous reviewer

Diego Calvanese a, Benjamin Cogrel a, Sarah Komla-Ebri a, Roman Kontchakov b, Davide Lanti a, Martin Rezk a, Mariano Rodriguez-Muro c, and Guohui Xiao a
a Free University of Bozen-Bolzano, Italy, {calvanese,bcogrel,sakomlaebri,dlanti,mrezk,xiao}@inf.unibz.it
b Birkbeck, University of London, UK, [email protected]
c IBM TJ Watson, US, [email protected]

Abstract. We present Ontop, an open-source Ontology-Based Data Access (OBDA) system that allows for querying relational data sources through a conceptual representation of the domain of interest, provided in terms of an ontology, to which the data sources are mapped. Key features of Ontop are its solid theoretical foundations, a virtual approach to OBDA, which avoids materializing triples and is implemented through the query rewriting technique, extensive optimizations exploiting all elements of the OBDA architecture, its compliance to all relevant W3C recommendations (including SPARQL queries, R2RML mappings, and OWL 2 QL and RDFS ontologies), and its support for all major relational databases.

Keywords: Ontop, OBDA, Databases, RDF, SPARQL, Ontologies, R2RML, OWL

1570-0844/0-1900/$27.50 © 0 – IOS Press and the authors. All rights reserved

1. Introduction

Over the past 20 years we have moved from a world where most companies operated with a single all-knowing, self-contained, central database to a world where companies buy and sell their data, interact with several data sources, and analyze patterns and statistics coming from them. The focus is shifting from obtaining information to finding the right information. It has always been the case that information is power, but today attention rather than information becomes the scarce resource, and those who can distinguish valuable information from background clutter gain power [28]. To separate the wheat from the chaff, companies need a comprehensive understanding of their data and the ability to cope with diversity in the data.

Since the mid 2000s, Ontology-Based Data Access (OBDA) has become a popular approach for tackling this challenge [42]. In OBDA, a conceptual layer is provided in the form of an ontology that defines a shared vocabulary, models the domain, hides the structure of the data sources, and enriches incomplete data with background knowledge. Then, queries are posed over this high-level conceptual view, and the users no longer need an understanding of the data sources, the relation between them, or the encoding of the data. Queries are translated by the OBDA system into queries over potentially very large (usually relational and federated) data sources. The ontology is connected to the data sources through a declarative specification given in terms of mappings that relate symbols in the ontology (classes and properties) to (SQL) views over the data. The W3C standard R2RML [17] was created with the goal of providing a language for specifying mappings in the OBDA setting. The ontology together with the mappings exposes a virtual RDF graph, which can be queried using SPARQL, the standard query language in the Semantic Web community. This virtual RDF graph can be materialized, generating RDF triples to be used with RDF triplestores, or alternatively it can be kept virtual and queried only during query execution. The virtual approach avoids the cost of materialization and can profit from more than 30 years' maturity of relational database systems (efficient query answering, security, robust transaction support, etc.).

To illustrate these concepts and different notions in this article, we will use the following running example. All the material required to run this example in the OBDA system Ontop (and a complementary tutorial) can be found online¹.

Example 1.1 (Hospital Database). We consider a hospital database with a single table tbl_patient that contains information on lung cancer patients. The table has 4 attributes: the patient identifier (pid), his/her name, the type of cancer (tumor), and its stage. The lung cancer can be of two types: Non-Small Cell Lung Carcinoma (NSCLC) and Small Cell Lung Carcinoma (SCLC), which are encoded in the table by a boolean value type as follows:

– false for NSCLC and true for SCLC.

The stage of the cancer is encoded by a positive integer value stage as follows:

– NSCLC: 1–6 for stages I, II, III, IIIa, IIIb, and IV, respectively;
– SCLC: 1 and 2 for stages Limited and Extensive, respectively.

Our sample table contains the following data:

    pid   name     type    stage
    1     'Mary'   false   4
    2     'John'   true    1

Suppose we need a simple piece of information from this database: "Give me the names of patients with a tumor of stage IIIa". Even this simple query in this tiny database already presents some challenges, because in order to formulate the query and to understand and analyze the results we need to know how the information is encoded in the data. In this paper, we describe how to use the Ontop system to address this challenge by enhancing the database with a semantic layer.

We present the OBDA system Ontop², a mature open-source system, which is currently being used in a number of projects. Ontop supports all the W3C recommendations related to OBDA: OWL 2 QL, R2RML, SPARQL, SWRL, and the OWL 2 QL entailment regime in SPARQL. The system is available as a Protégé plugin, a SPARQL endpoint through Sesame Workbench, and a Java library supporting OWL API and Sesame API.

The structure of the article is as follows. Section 2 presents a high-level overview of the architecture of Ontop. Section 3 surveys additional tools that can be used with Ontop for creating, deploying, and querying OBDA systems. Section 4 describes the SPARQL query answering techniques implemented in Ontop. Section 5 outlines the applications of Ontop, in particular the Statoil and Siemens use cases in the context of the Optique EU project. Related SPARQL query answering systems are surveyed in Section 6. Section 7 is a retrospective on the development of Ontop over the past five years. Finally, Section 8 concludes the article.

¹ https://github.com/ontop/ontop-examples/tree/master/swj-2015
² http://ontop.inf.unibz.it

2. Architecture of Ontop

Ontop is an open-source³ OBDA system released under the Apache license, developed at the Free University of Bozen-Bolzano. The Ontop system exposes relational databases as virtual RDF graphs by linking the terms (classes and properties) in the ontology to the data sources through mappings. This virtual RDF graph can then be queried using SPARQL by translating the SPARQL queries into SQL queries over the relational databases. This translation process is transparent to the user.

The architecture of Ontop, which is illustrated in Fig. 1, can be divided in four layers: (i) the inputs, i.e., the domain-specific artifacts such as the ontology, mappings, database, and queries; (ii) the core of the system in charge of query translation, optimization, and execution; (iii) the APIs exposing standard Java interfaces to users of the system; and (iv) the applications that allow end-users to execute SPARQL queries over databases. We explore each of these components in turn.

³ http://github.com/ontop/ontop

[Fig. 1. Architecture of the Ontop system. Layers shown: Application Layer (Protégé, Sesame Workbench & SPARQL Endpoint, Optique Platform); API Layer (OWL-API, Sesame Storage And Inference Layer (SAIL) API); Ontop Core (Ontop SPARQL Query Answering Engine (Quest), with OWL-API (OWL Parser), Sesame API (SPARQL Parser), R2RML API, JDBC); Inputs (OWL 2 QL Ontologies, R2RML Mappings, Relational Databases, SPARQL Queries).]

2.1. Inputs: Ontology, Mappings, Queries, and Databases

To the best of our knowledge, Ontop is the first OBDA system that supports all the W3C recommendations related to OBDA: OWL 2 QL, R2RML, SPARQL, SWRL, and the OWL 2 QL entailment regime in SPARQL. In addition, it supports all major commercial and open-source relational databases.

Ontology. Ontop uses RDFS [8] and OWL 2 QL [39] as ontology languages. OWL 2 QL is based on the DL-Lite family of lightweight description logics [11,3], which guarantees that queries over the ontology can be rewritten into equivalent queries over the data alone. Recently Ontop has been extended to support also a fragment of SWRL [61].

Example 2.1. The following ontology captures the domain knowledge of our running example. It describes the concepts of cancer and cancer patient with the following OWL axioms:

    :NSCLC rdfs:subClassOf :LungCancer .
    :SCLC rdfs:subClassOf :LungCancer .
    :LungCancer rdfs:subClassOf :Neoplasm .
    :hasNeoplasm rdfs:domain :Patient .
    :hasNeoplasm rdfs:range :Neoplasm .
    :hasName a owl:DatatypeProperty .

:NSCLC and :SCLC are subclasses of :LungCancer (that is, both are types of lung cancer), which in turn is a subclass of :Neoplasm. The object property :hasNeoplasm has class :Patient as its domain and :Neoplasm as its range (in other words, it relates patients to neoplasms). We also have a datatype property :hasName and an object property :hasStage.

Mappings. Ontop supports two mapping languages: the W3C RDB2RDF Mapping Language (R2RML), which is a widely used standard; and the native Ontop mapping language, which is easier to learn and use. Ontop includes tools for converting native mappings into R2RML mappings and vice-versa. Intuitively, a mapping assertion consists of a source, which is an SQL query retrieving values from the database, and a target, which constructs RDF triples with values from the source.
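
The source/target idea of a mapping assertion can be made concrete with a small sketch over the running example. The Python code below is an illustration only and is not how Ontop works internally: it materializes the triples and applies the subclass axioms in memory, whereas Ontop keeps the graph virtual and rewrites SPARQL into SQL. The IRI templates (":db1/{pid}", ":db1/neoplasm/{pid}", ":stage-IIIa"), the SQLite storage, and all function names are assumptions introduced for this example.

```python
import sqlite3

# In-memory copy of tbl_patient from the running example (SQLite is an assumption).
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute(
    "CREATE TABLE tbl_patient (pid INTEGER PRIMARY KEY, name TEXT, type BOOLEAN, stage INTEGER)"
)
conn.executemany(
    "INSERT INTO tbl_patient VALUES (?, ?, ?, ?)",
    [(1, "Mary", False, 4), (2, "John", True, 1)],
)

# Hypothetical mapping assertions: (source SQL query, target triple templates).
MAPPINGS = [
    ("SELECT pid, name FROM tbl_patient",
     [(":db1/{pid}", "rdf:type", ":Patient"),
      (":db1/{pid}", ":hasName", "{name}")]),
    # type = false (stored as 0) encodes NSCLC, type = true (1) encodes SCLC.
    ("SELECT pid FROM tbl_patient WHERE type = 0",
     [(":db1/neoplasm/{pid}", "rdf:type", ":NSCLC"),
      (":db1/{pid}", ":hasNeoplasm", ":db1/neoplasm/{pid}")]),
    ("SELECT pid FROM tbl_patient WHERE type = 1",
     [(":db1/neoplasm/{pid}", "rdf:type", ":SCLC"),
      (":db1/{pid}", ":hasNeoplasm", ":db1/neoplasm/{pid}")]),
    # For NSCLC, stage code 4 stands for stage IIIa.
    ("SELECT pid FROM tbl_patient WHERE type = 0 AND stage = 4",
     [(":db1/neoplasm/{pid}", ":hasStage", ":stage-IIIa")]),
]

# Subclass axioms taken from the ontology of the running example.
SUBCLASS_OF = {":NSCLC": ":LungCancer", ":SCLC": ":LungCancer", ":LungCancer": ":Neoplasm"}


def materialize(connection, mappings):
    """Run each source query and instantiate its target templates as triples."""
    triples = set()
    for source_sql, templates in mappings:
        for row in connection.execute(source_sql):
            values = dict(row)
            for s, p, o in templates:
                triples.add((s.format(**values), p, o.format(**values)))
    return triples


def saturate(triples):
    """Add rdf:type triples implied by the subclass axioms until nothing changes."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(inferred):
            if p == "rdf:type" and o in SUBCLASS_OF:
                new_triple = (s, "rdf:type", SUBCLASS_OF[o])
                if new_triple not in inferred:
                    inferred.add(new_triple)
                    changed = True
    return inferred


def names_with_stage(triples, stage):
    """Names of patients that have a neoplasm at the given stage."""
    neoplasms = {s for s, p, o in triples if p == ":hasStage" and o == stage}
    patients = {s for s, p, o in triples if p == ":hasNeoplasm" and o in neoplasms}
    return sorted(o for s, p, o in triples if p == ":hasName" and s in patients)


graph = saturate(materialize(conn, MAPPINGS))
print(names_with_stage(graph, ":stage-IIIa"))
```

With the sample data, the query for stage IIIa is answered without the caller knowing that NSCLC is stored as false or that IIIa is stored as 4; that knowledge lives only in the mapping sources.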