A Case Study of Data Integration for Aquatic Resources Using Semantic Web Technologies

Total Page:16

File Type:pdf, Size:1020Kb

A Case Study of Data Integration for Aquatic Resources Using Semantic Web Technologies A Case Study of Data Integration for Aquatic Resources Using Semantic Web Technologies By Janice Gordon, Nina Chkhenkeli, David Govoni, Frances Lightsom, Andrea Ostroff, Peter Schweitzer, Phethala Thongsavanh, Dalia Varanka, and Stephan Zednik Open-File Report 2015–1004 U.S. Department of the Interior U.S. Geological Survey U.S. Department of the Interior SALLY JEWELL, Secretary U.S. Geological Survey Suzette M. Kimball, Acting Director U.S. Geological Survey, Reston, Virginia: 2015 For more information on the USGS—the Federal source for science about the Earth, its natural and living resources, natural hazards, and the environment—visit http://www.usgs.gov or call 1–888–ASK–USGS For an overview of USGS information products, including maps, imagery, and publications, visit http://www.usgs.gov/pubprod To order this and other USGS information products, visit http://store.usgs.gov Suggested citation: Gordon, Janice, Chkhenkeli, Nina, Govoni, David, Lightsom, Frances, Ostroff, Andrea, Schweitzer, Peter, Thongsavanh, Phethala, Varanka, Dalia, and Zednik, Stephan, 2015, A case study of data integration for aquatic resources using semantic web technologies: U.S. Geological Survey Open-File Report 2015–1004, 55 p., http://dx.doi.org/10.3133/ofr20151004. ISSN 2331-1258 (online) Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the U.S. Government. Although this information product, for the most part, is in the public domain, it also may contain copyrighted materials as noted in the text. Permission to reproduce copyrighted items must be secured from the copyright owner. ii Contents Abstract ...................................................................................................................................................................... 1 Introduction ................................................................................................................................................................. 1 Challenges of Scientific Data Integration ................................................................................................................ 1 The Potential of Semantic Web Technologies......................................................................................................... 2 Demonstration Project ............................................................................................................................................. 3 Purpose and Scope ............................................................................................................................................. 3 Use Case: Data Integration for Freshwater Fish Habitat Modeling ...................................................................... 3 Data Sources ....................................................................................................................................................... 4 Methods ...................................................................................................................................................................... 4 Application of the Methodology ................................................................................................................................... 6 Use Case Development and Iteration ..................................................................................................................... 6 Information Modeling ............................................................................................................................................... 7 Technical Approach ................................................................................................................................................ 9 Architecture Design Plan ..................................................................................................................................... 9 Data Preparation ............................................................................................................................................... 10 BioData .......................................................................................................................................................... 10 Multistate Aquatic Resource Information System .......................................................................................... 11 National Geochemical Survey ........................................................................................................................ 11 National Hydrography Data set ...................................................................................................................... 13 Prototype Development Approach ..................................................................................................................... 13 Results ...................................................................................................................................................................... 14 User Interface Design ........................................................................................................................................... 14 Prototype Architecture ........................................................................................................................................... 15 TDB Triple-store ................................................................................................................................................ 16 Fuseki SPARQL Endpoint ................................................................................................................................. 16 API Development .................................................................................................................................................. 17 Integrated Data .................................................................................................................................................. 18 Fish ................................................................................................................................................................ 20 Water ............................................................................................................................................................. 22 RDF Serialization in Turtle ......................................................................................................................... 23 Sediment........................................................................................................................................................ 24 RDF Serialization in Turtle ......................................................................................................................... 24 Geospatial Data Integration ............................................................................................................................... 24 Discussion and Conclusions ..................................................................................................................................... 25 Evaluation of the Prototype ................................................................................................................................... 25 Evaluation of the Use Case ............................................................................................................................... 26 Evaluation of the Methodology .............................................................................................................................. 27 Future Goals ......................................................................................................................................................... 27 Provenance and Data Quality ............................................................................................................................ 28 Technology Performance .................................................................................................................................. 28 Geospatial Semantic Integration ....................................................................................................................... 28 iii Use(s) of Linked Data ........................................................................................................................................ 29 Acknowledgments ..................................................................................................................................................... 29 References Cited ...................................................................................................................................................... 30 Glossary ................................................................................................................................................................... 32 Appendix 1: Semantic Web Technologies (Overview) .............................................................................................. 34 Data Interchange................................................................................................................................................... 34 Resource Description Framework ..................................................................................................................... 34 Resource Description Framework in Attributues (RDFa) ................................................................................... 34 Linked Data ......................................................................................................................................................
Recommended publications
  • Basic Querying with SPARQL
    Basic Querying with SPARQL Andrew Weidner Metadata Services Coordinator SPARQL SPARQL SPARQL SPARQL SPARQL Protocol SPARQL SPARQL Protocol And SPARQL SPARQL Protocol And RDF SPARQL SPARQL Protocol And RDF Query SPARQL SPARQL Protocol And RDF Query Language SPARQL SPARQL Protocol And RDF Query Language SPARQL Query SPARQL Query Update SPARQL Query Update - Insert triples SPARQL Query Update - Insert triples - Delete triples SPARQL Query Update - Insert triples - Delete triples - Create graphs SPARQL Query Update - Insert triples - Delete triples - Create graphs - Drop graphs Query RDF Triples Query RDF Triples Triple Store Query RDF Triples Triple Store Data Dump Query RDF Triples Triple Store Data Dump Subject – Predicate – Object Query Subject – Predicate – Object ?s ?p ?o Query Dog HasName Cocoa Subject – Predicate – Object ?s ?p ?o Query Cocoa HasPhoto ------- Subject – Predicate – Object ?s ?p ?o Query HasURL http://bit.ly/1GYVyIX Subject – Predicate – Object ?s ?p ?o Query ?s ?p ?o Query SELECT ?s ?p ?o Query SELECT * ?s ?p ?o Query SELECT * WHERE ?s ?p ?o Query SELECT * WHERE { ?s ?p ?o } Query select * WhErE { ?spiderman ?plays ?oboe } http://deathbattle.wikia.com/wiki/Spider-Man http://www.mmimports.com/wp-content/uploads/2014/04/used-oboe.jpg Query SELECT * WHERE { ?s ?p ?o } Query SELECT * WHERE { ?s ?p ?o } Query SELECT * WHERE { ?s ?p ?o } Query SELECT * WHERE { ?s ?p ?o } LIMIT 10 SELECT * WHERE { ?s ?p ?o } LIMIT 10 http://dbpedia.org/snorql SELECT * WHERE { ?s ?p ?o } LIMIT 10 http://dbpedia.org/snorql SELECT * WHERE { ?s
    [Show full text]
  • A Hierarchical SVG Image Abstraction Layer for Medical Imaging
    A Hierarchical SVG Image Abstraction Layer for Medical Imaging Edward Kim1, Xiaolei Huang1, Gang Tan1, L. Rodney Long2, Sameer Antani2 1Department of Computer Science and Engineering, Lehigh University, Bethlehem, PA; 2Communications Engineering Branch, National Library of Medicine, Bethesda, MD ABSTRACT As medical imaging rapidly expands, there is an increasing need to structure and organize image data for efficient analysis, storage and retrieval. In response, a large fraction of research in the areas of content-based image retrieval (CBIR) and picture archiving and communication systems (PACS) has focused on structuring information to bridge the “semantic gap”, a disparity between machine and human image understanding. An additional consideration in medical images is the organization and integration of clinical diagnostic information. As a step towards bridging the semantic gap, we design and implement a hierarchical image abstraction layer using an XML based language, Scalable Vector Graphics (SVG). Our method encodes features from the raw image and clinical information into an extensible “layer” that can be stored in a SVG document and efficiently searched. Any feature extracted from the raw image including, color, texture, orientation, size, neighbor information, etc., can be combined in our abstraction with high level descriptions or classifications. And our representation can natively characterize an image in a hierarchical tree structure to support multiple levels of segmentation. Furthermore, being a world wide web consortium (W3C) standard, SVG is able to be displayed by most web browsers, interacted with by ECMAScript (standardized scripting language, e.g. JavaScript, JScript), and indexed and retrieved by XML databases and XQuery. Using these open source technologies enables straightforward integration into existing systems.
    [Show full text]
  • Mapping Spatiotemporal Data to RDF: a SPARQL Endpoint for Brussels
    International Journal of Geo-Information Article Mapping Spatiotemporal Data to RDF: A SPARQL Endpoint for Brussels Alejandro Vaisman 1, * and Kevin Chentout 2 1 Instituto Tecnológico de Buenos Aires, Buenos Aires 1424, Argentina 2 Sopra Banking Software, Avenue de Tevuren 226, B-1150 Brussels, Belgium * Correspondence: [email protected]; Tel.: +54-11-3457-4864 Received: 20 June 2019; Accepted: 7 August 2019; Published: 10 August 2019 Abstract: This paper describes how a platform for publishing and querying linked open data for the Brussels Capital region in Belgium is built. Data are provided as relational tables or XML documents and are mapped into the RDF data model using R2RML, a standard language that allows defining customized mappings from relational databases to RDF datasets. In this work, data are spatiotemporal in nature; therefore, R2RML must be adapted to allow producing spatiotemporal Linked Open Data.Data generated in this way are used to populate a SPARQL endpoint, where queries are submitted and the result can be displayed on a map. This endpoint is implemented using Strabon, a spatiotemporal RDF triple store built by extending the RDF store Sesame. The first part of the paper describes how R2RML is adapted to allow producing spatial RDF data and to support XML data sources. These techniques are then used to map data about cultural events and public transport in Brussels into RDF. Spatial data are stored in the form of stRDF triples, the format required by Strabon. In addition, the endpoint is enriched with external data obtained from the Linked Open Data Cloud, from sites like DBpedia, Geonames, and LinkedGeoData, to provide context for analysis.
    [Show full text]
  • Validating RDF Data Using Shapes
    83 Validating RDF data using Shapes a Jose Emilio Labra Gayo ​ ​ a University​ of Oviedo, Spain Abstract RDF forms the keystone of the Semantic Web as it enables a simple and powerful knowledge representation graph based data model that can also facilitate integration between heterogeneous sources of information. RDF based applications are usually accompanied with SPARQL stores which enable to efficiently manage and query RDF data. In spite of its well known benefits, the adoption of RDF in practice by web programmers is still lacking and SPARQL stores are usually deployed without proper documentation and quality assurance. However, the producers of RDF data usually know the implicit schema of the data they are generating, but they don't do it traditionally. In the last years, two technologies have been developed to describe and validate RDF content using the term shape: Shape Expressions (ShEx) and Shapes Constraint Language (SHACL). We will present a motivation for their appearance and compare them, as well as some applications and tools that have been developed. Keywords RDF, ShEx, SHACL, Validating, Data quality, Semantic web 1. Introduction In the tutorial we will present an overview of both RDF is a flexible knowledge and describe some challenges and future work [4] representation language based of graphs which has been successfully adopted in semantic web 2. Acknowledgements applications. In this tutorial we will describe two languages that have recently been proposed for This work has been partially funded by the RDF validation: Shape Expressions (ShEx) and Spanish Ministry of Economy, Industry and Shapes Constraint Language (SHACL).ShEx was Competitiveness, project: TIN2017-88877-R proposed as a concise and intuitive language for describing RDF data in 2014 [1].
    [Show full text]
  • Towards Ontology Based BPMN Implementation. Sophea Chhun, Néjib Moalla, Yacine Ouzrout
    Towards ontology based BPMN Implementation. Sophea Chhun, Néjib Moalla, Yacine Ouzrout To cite this version: Sophea Chhun, Néjib Moalla, Yacine Ouzrout. Towards ontology based BPMN Implementation.. SKIMA, 6th Conference on Software Knowledge Information Management and Applications., Jan 2012, Chengdu, China. 8 p. hal-01551452 HAL Id: hal-01551452 https://hal.archives-ouvertes.fr/hal-01551452 Submitted on 6 Nov 2018 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. 1 Towards ontology based BPMN implementation CHHUN Sophea, MOALLA Néjib and OUZROUT Yacine University of Lumiere Lyon2, laboratory DISP, France Natural language is understandable by human and not machine. None technical persons can only use natural language to specify their business requirements. However, the current version of Business process management and notation (BPMN) tools do not allow business analysts to implement their business processes without having technical skills. BPMN tool is a tool that allows users to design and implement the business processes by connecting different business tasks and rules together. The tools do not provide automatic implementation of business tasks from users’ specifications in natural language (NL). Therefore, this research aims to propose a framework to automatically implement the business processes that are expressed in NL requirements.
    [Show full text]
  • Semantic Integration and Knowledge Discovery for Environmental Research
    Journal of Database Management, 18(1), 43-67, January-March 2007 43 Semantic Integration and Knowledge Discovery for Environmental Research Zhiyuan Chen, University of Maryland, Baltimore County (UMBC), USA Aryya Gangopadhyay, University of Maryland, Baltimore County (UMBC), USA George Karabatis, University of Maryland, Baltimore County (UMBC), USA Michael McGuire, University of Maryland, Baltimore County (UMBC), USA Claire Welty, University of Maryland, Baltimore County (UMBC), USA ABSTRACT Environmental research and knowledge discovery both require extensive use of data stored in various sources and created in different ways for diverse purposes. We describe a new metadata approach to elicit semantic information from environmental data and implement semantics- based techniques to assist users in integrating, navigating, and mining multiple environmental data sources. Our system contains specifications of various environmental data sources and the relationships that are formed among them. User requests are augmented with semantically related data sources and automatically presented as a visual semantic network. In addition, we present a methodology for data navigation and pattern discovery using multi-resolution brows- ing and data mining. The data semantics are captured and utilized in terms of their patterns and trends at multiple levels of resolution. We present the efficacy of our methodology through experimental results. Keywords: environmental research, knowledge discovery and navigation, semantic integra- tion, semantic networks,
    [Show full text]
  • A State-Of-The-Art Survey on Semantic Web Mining
    Intelligent Information Management, 2013, 5, 10-17 http://dx.doi.org/10.4236/iim.2013.51002 Published Online January 2013 (http://www.scirp.org/journal/iim) A State-of-the-Art Survey on Semantic Web Mining Qudamah K. Quboa, Mohamad Saraee School of Computing, Science, and Engineering, University of Salford, Salford, UK Email: [email protected], [email protected] Received November 9, 2012; revised December 9, 2012; accepted December 16, 2012 ABSTRACT The integration of the two fast-developing scientific research areas Semantic Web and Web Mining is known as Se- mantic Web Mining. The huge increase in the amount of Semantic Web data became a perfect target for many re- searchers to apply Data Mining techniques on it. This paper gives a detailed state-of-the-art survey of on-going research in this new area. It shows the positive effects of Semantic Web Mining, the obstacles faced by researchers and propose number of approaches to deal with the very complex and heterogeneous information and knowledge which are pro- duced by the technologies of Semantic Web. Keywords: Web Mining; Semantic Web; Data Mining; Semantic Web Mining 1. Introduction precisely answer and satisfy the web users’ requests [1,5, 6]. Semantic Web is a part of the second generation web Semantic Web Mining is an integration of two important (Web2.0) and its original idea derived from the vision scientific areas: Semantic Web and Data Mining [1]. Se- W3C’s director and the WWW founder, Sir Tim Berners- mantic Web is used to give a meaning to data, creating complex and heterogeneous data structure, while Data Lee.
    [Show full text]
  • Poolparty Semantic Suite Functional Overview
    PoolParty Semantic Suite Functional Overview Andreas Blumauer CEO & Managing Partner Semantic Web Company / PoolParty Semantic Suite 2 Semantic Web Company (SWC) ▸ Founded in 2004, based in Vienna ▸ Privately held 3 ▸ 50 FTE ▸ Software Engineers & Consultants for Introducing NLP, Semantics and Machine learning Semantic Semantic Web AI ▸ Developer & Vendor of Company PoolParty Semantic Suite ▸ Participating in projects with €2.5 million funding for R&D ▸ ~30% revenue growth/year ▸ SWC named to KMWorld’s ‘100 Companies That Matter in Knowledge Management’ in 2016, 2017 and 2018 ▸ www.semantic-web.com 2017 2016 PoolParty Semantic Suite ▸ Most complete Semantic Middleware on the Global Market ▸ Semantic AI: Fusion of Knowledge 4 Graphs, NLP, and Machine Learning ▸ W3C standards compliant Fact sheet: PoolParty ▸ First release in 2009 ▸ Current version 7.0 ▸ On-premises or cloud-based ▸ Over 200 installations world-wide ▸ Named as Sample Vendor in Gartner’s Hype Cycle for AI 2018 ▸ KMWorld listed PoolParty as Trend-Setting Product 2015, 2016, 2017, and 2018 ▸ www.poolparty.biz We work with Global Fortune Companies, and with some of the largest GOs and NGOs from over 20 countries. SWC head- US UK quarters West US Selected Customer References East 5 ● Credit Suisse ● Boehringer Ingelheim Selected ● Roche ● adidas Customer ● The Pokémon Company ● Fluor AUS/ References ● Harvard Business School NZL ● Wolters Kluwer and Partners ● Philips ● Nestlé ● Electronic Arts Selected Partners ● Springer Nature ● Pearson - Always Learning ● Enterprise Knowledge ● Healthdirect Australia ● Mekon Intelligent Content Solutions ● World Bank Group ● Soitron ● Canadian Broadcasting Corporation ● Accenture ● Oxford University Press ● EPAM Systems ● International Atomic Energy Agency ● BAON Enterprises ● Siemens ● Findwise ● Singapore Academy of Law ● Tellura Semantics ● Inter-American Development Bank ● HPC ● Council of the E.U.
    [Show full text]
  • Semantic Technologies for Systems Engineering (ST4SE)
    Semantic Technologies for Systems Engineering (ST4SE) Update at INCOSE IW, 27 Jan 2019, Torrance, CA, USA Hans Peter de Koning European Space Agency, ESA/ESTEC, Noordwijk, The Netherlands Based on earlier presentations and contributions by other ST4SE Core Team Members Objectives of the ST4SE Foundation To promote and champion the open-source development and utilization of ontologies and semantic technologies to support system engineering practice, education, and research 1. Provide a semantically rich language to communicate among systems engineers and other stakeholders 2. Define patterns that can be used to check for consistency and completeness 3. Support querying of information from model 4. Focus on adding value by balancing the expected benefits from being formal and the cost of being formal ST4SE Update | INCOSE IW | 2019-01-27, Torrance, CA, USA 2 MBSE Challenge – 3 Kinds of Communication . Person ↔ Person . Machine ↔ Machine . Person ↔ Machine Person . All bi-directional (of course) . All need to work flawlessly Machine ST4SE Update | INCOSE IW | 2019-01-27, Torrance, CA, USA 3 Outline . Background on Semantic Technologies • Knowledge representation, reasoning, querying . Semantic Technologies for System Engineering • Motivation • Scope and focus • Relationship between ST4SE and SysML 2.0 . ST4SE Approach • Open-source foundation • Bootstrapping: (best) practices for defining, demonstrating and documenting patterns ST4SE Update | INCOSE IW | 2019-01-27, Torrance, CA, USA 4 Increasing levels of semantic precision (and understanding
    [Show full text]
  • Rdfa in XHTML: Syntax and Processing Rdfa in XHTML: Syntax and Processing
    RDFa in XHTML: Syntax and Processing RDFa in XHTML: Syntax and Processing RDFa in XHTML: Syntax and Processing A collection of attributes and processing rules for extending XHTML to support RDF W3C Recommendation 14 October 2008 This version: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 Latest version: http://www.w3.org/TR/rdfa-syntax Previous version: http://www.w3.org/TR/2008/PR-rdfa-syntax-20080904 Diff from previous version: rdfa-syntax-diff.html Editors: Ben Adida, Creative Commons [email protected] Mark Birbeck, webBackplane [email protected] Shane McCarron, Applied Testing and Technology, Inc. [email protected] Steven Pemberton, CWI Please refer to the errata for this document, which may include some normative corrections. This document is also available in these non-normative formats: PostScript version, PDF version, ZIP archive, and Gzip’d TAR archive. The English version of this specification is the only normative version. Non-normative translations may also be available. Copyright © 2007-2008 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply. Abstract The current Web is primarily made up of an enormous number of documents that have been created using HTML. These documents contain significant amounts of structured data, which is largely unavailable to tools and applications. When publishers can express this data more completely, and when tools can read it, a new world of user functionality becomes available, letting users transfer structured data between applications and web sites, and allowing browsing applications to improve the user experience: an event on a web page can be directly imported - 1 - How to Read this Document RDFa in XHTML: Syntax and Processing into a user’s desktop calendar; a license on a document can be detected so that users can be informed of their rights automatically; a photo’s creator, camera setting information, resolution, location and topic can be published as easily as the original photo itself, enabling structured search and sharing.
    [Show full text]
  • Using Rule-Based Reasoning for RDF Validation
    Using Rule-Based Reasoning for RDF Validation Dörthe Arndt, Ben De Meester, Anastasia Dimou, Ruben Verborgh, and Erik Mannens Ghent University - imec - IDLab Sint-Pietersnieuwstraat 41, B-9000 Ghent, Belgium [email protected] Abstract. The success of the Semantic Web highly depends on its in- gredients. If we want to fully realize the vision of a machine-readable Web, it is crucial that Linked Data are actually useful for machines con- suming them. On this background it is not surprising that (Linked) Data validation is an ongoing research topic in the community. However, most approaches so far either do not consider reasoning, and thereby miss the chance of detecting implicit constraint violations, or they base them- selves on a combination of dierent formalisms, eg Description Logics combined with SPARQL. In this paper, we propose using Rule-Based Web Logics for RDF validation focusing on the concepts needed to sup- port the most common validation constraints, such as Scoped Negation As Failure (SNAF), and the predicates dened in the Rule Interchange Format (RIF). We prove the feasibility of the approach by providing an implementation in Notation3 Logic. As such, we show that rule logic can cover both validation and reasoning if it is expressive enough. Keywords: N3, RDF Validation, Rule-Based Reasoning 1 Introduction The amount of publicly available Linked Open Data (LOD) sets is constantly growing1, however, the diversity of the data employed in applications is mostly very limited: only a handful of RDF data is used frequently [27]. One of the reasons for this is that the datasets' quality and consistency varies signicantly, ranging from expensively curated to relatively low quality data [33], and thus need to be validated carefully before use.
    [Show full text]
  • Semantic Integration Across Heterogeneous Databases Finding Data Correspondences Using Agglomerative Hierarchical Clustering and Artificial Neural Networks
    DEGREE PROJECT IN COMPUTER SCIENCE AND ENGINEERING, SECOND CYCLE, 30 CREDITS STOCKHOLM, SWEDEN 2018 Semantic Integration across Heterogeneous Databases Finding Data Correspondences using Agglomerative Hierarchical Clustering and Artificial Neural Networks MARK HOBRO KTH ROYAL INSTITUTE OF TECHNOLOGY SCHOOL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE Semantic Integration across Heterogeneous Databases Finding Data Correspondences using Agglomerative Hierarchical Clustering and Artificial Neural Networks MARK HOBRO Master in Computer Science Date: April 11, 2018 Supervisor: John Folkesson Examiner: Hedvig Kjellström Swedish title: Semantisk integrering mellan heterogena databaser: Hitta datakopplingar med hjälp av hierarkisk klustring och artificiella neuronnät School of Computer Science and Communication iii Abstract The process of data integration is an important part of the database field when it comes to database migrations and the merging of data. The research in the area has grown with the addition of machine learn- ing approaches in the last 20 years. Due to the complexity of the re- search field, no go-to solutions have appeared. Instead, a wide variety of ways of enhancing database migrations have emerged. This thesis examines how well a learning-based solution performs for the seman- tic integration problem in database migrations. Two algorithms are implemented. One that is based on informa- tion retrieval theory, with the goal of yielding a matching result that can be used as a benchmark for measuring the performance of the machine learning algorithm. The machine learning approach is based on grouping data with agglomerative hierarchical clustering and then training a neural network to recognize patterns in the data. This al- lows making predictions about potential data correspondences across two databases.
    [Show full text]