Xmlschema2shex: Converting XML Validation to RDF Validation

Total Page:16

File Type:pdf, Size:1020Kb

Xmlschema2shex: Converting XML Validation to RDF Validation Semantic Web 0 (0) 1 1 IOS Press XMLSchema2ShEx: Converting XML validation to RDF validation Editor(s): Axel Polleres, Vienna University of Economics and Business (WU Wien), Austria Solicited review(s): Felix Sasaki, Cornelsen Verlag GmbH, Germany; Emir Muñoz, National University of Ireland Galway, Ireland; Simon Steyskal, Vienna University of Economics and Business (WU Wien), Austria Herminio Garcia-Gonzalez a;∗, Jose Emilio Labra-Gayo a a Department of Computer Science, University of Oviedo, Oviedo, Asturias, Spain Email: [email protected], [email protected] Abstract. RDF validation is a field where the Semantic Web community is currently focusing attention. Besides, there is a recent trend to migrate data from different sources to semantic web formats. Therefore, in order to facilitate this transformation, we propose: a set of mappings that can be used to convert from XML Schema to Shape Expressions (ShEx), a prototype that implements a subset of the proposed mappings, an example application to obtain a ShEx schema from an XML Schema and a discussion on conversion implications of non-deterministic schemata. We demonstrate that an XML and its corresponding XML Schema are still valid when converted to their RDF and ShEx counterparts. This conversion, along with the development of other format mappings, could drive to an improvement of data interoperability due to the reduction of the technological gap. Keywords: ShEx, XML Schema, Shape Expressions, formats mapping, data validation 1. Introduction XML Schema [5] was designed as a language to make XML validation possible with more expressive- Data validation is a key area when normalisation ness than DTDs [4]. Using XML Schema developers and confidence are desired. Normalisation—which can can define the structure, constraints and documenta- be defined, in this context, as using an homogeneous tion of an XML vocabulary. Besides DTD and XML schema or structure across different sources of similar Schema, other alternatives for XML validation (such information—is desired as a way of making a dataset as Relax NG [11] and Schematron [18]) were pro- more reliable and even more useful to possible con- posed. sumers because of its standardised schema. Validation In the Semantic Web, RDF was missing a stan- can excel data cleansing, querying and standardisation dard constraints validation language which covers the of datasets. In words of P.N. Fox et al. [16]: “Proce- same features that XML Schema does for XML. Some dures for data validation increase the value of data alternatives were OWL [17] and RDF Schema [10]; and the users’ confidence in predictions made from however, they do not cover completely what XML them. Well-designed data management systems may Schema does for XML [38]. For this purpose, Shape strengthen data validation itself, by providing better Expressions (ShEx) [32,33] was proposed to fulfill the estimates of expected values than were available pre- requirement of a constraints validation language for viously.”. Therefore, validation is a key field of data RDF, and SHACL [20] (another proposed language for management. RDF validation) has recently become a W3C recom- mendation. As many documents and data are persisted in XML, *Corresponding Author. Email: [email protected] the need for migration and interoperability to more 1570-0844/0-1900/$35.00 c 0 – IOS Press and the authors. All rights reserved 2 H. Garcia-Gonzalez & J. E. Labra-Gayo / XMLSchema2ShEx flexible data is nowadays more pressing than ever, introduction to ShEx; Section 4 describes a possible set many authors have proposed conversions from XML to of mappings between XML Schema and ShEx; Sec- RDF [27,12,2,6], with the goal of transforming XML tion 5 presents a prototype used to validate a subset data to Semantic Web formats. of previously presented mappings and how this con- Although these conversions enable users to migrate version works against existing RDF validators; Sec- their data to Semantic Web, means for validating the tion 6 discusses the implications of Non-Deterministic output data after converting XML to RDF are missing. schemata on our work. Finally, Section 7 draws some Therefore, we should ensure that the conversion has conclusions and future lines of work and improvement. been done correctly and that both versions—in differ- ent languages—are defining the same meaning. Conversions between XML and RDF, and between 2. Background XML Schema and ShEx are necessary to alleviate the gap between semantic technologies and more The related work of XML ecosystem conversion can traditional ones (e.g., XML, JSON, CSV, relational be divided in three main categories: conversions from databases). With that in mind, providing generic trans- XML to Semantic Web formats, conversions from formation tools from non-semantic technologies to se- XML schemata to non Semantic Web schemata and mantic technologies can enhance the migration pos- conversions from XML schemata to RDF schemata. sibilities; in other words, if we can create tools that ease the transformation and adaptation among tech- 2.1. From XML to Semantic Web formats nologies we will encourage future migrations. Taking Text Encoding Initiative (TEI) [14] as an example, dig- Along with schemata conversions, data transforma- ital humanities can take benefit from Semantic Web tion has to be tackled. Therefore many authors have approaches [37,35]. There are many manuscripts tran- worked on this topic of converting from XML to Se- scribed to XML—using TEI—that can be converted mantic Web formats and more specifically to RDF. For to RDF. But transcribers are hesitant to deal with the this conversions there are plenty of strategies that have underlying technology although they can benefit from been proposed and followed by other authors. it [26]. Those are the cases where generic approaches, In [27], authors describe their experience on devel- as the one introduced here, can offer a solution and oping this transformation for business to business in- where automatic conversion of schemata has its place dustry in the case of the Semantic Mediation tools. An when transformations are to be checked. XML Schema to RDF Schema transformation is per- Taking into account what we previously presented, formed as part of the requirement of the Semantic Me- the questions that we want to address in the present diation tool. work are the following: In [12], a transformation between XML and RDF depending on an ontology is described. This transfor- – RQ1: What components should have a mapping mation takes an XML document, a mapping document from XML Schema to ShEx? and an ontology document and makes the transforma- – RQ2: How to ensure that both schemata are tions to RDF instances compliant with the input ontol- equivalent? ogy. Using the mapping file, conversions between the – RQ3: Is it possible to ensure a backwards conver- XML Schema and the ontology are established. sion in all cases? In [1], the author explains how XML can be con- – RQ4: Are non-deterministic schemata (i.e., am- verted to RDF—and vice versa—using XML Schema biguous schemata) possible to translate and vali- as the base for the mappings. This work is then ex- date? panded in [2] where the author tries to solve the lift In this paper, we describe a solution on how to make problem (the problem of how to map heterogeneous the conversion from XML Schema to ShEx. We de- data sources in the same representational framework) scribe how each element in XML Schema can be trans- from XML to RDF and backwards by using the Gloze lated into ShEx. Moreover, we present a prototype that mapping approach on top of Apache Jena. can convert a subset of what is defined in the following In [40], the authors present a mechanism to query sections. XML data as RDF. Firstly, a matching from XML The rest of the paper is structured as follows: Sec- Schema to RDF Schema class hierarchy is performed. tion 2 presents the background; Section 3 gives a brief Then XML elements can be interpreted as RDF triples. H. Garcia-Gonzalez & J. E. Labra-Gayo / XMLSchema2ShEx 3 The same procedure but using DTDs is described in to RDF Schema [27]. Moreover, when no schema is [39]. available the transformation can be performed from In [9], the author presents a technique for making XML to OWL [7,21,30,23]. standard transformations between XML and RDF us- However, RDF Schema and OWL were not de- ing XSLT. A case study in the field of astronomy is signed as RDF validation languages. Their use of Open used to illustrate the solution. World and Non-Unique Name Assumptions can pose Another approach using XSLT is [36] where authors some difficulties to define the integrity constraints that describe a mapping mechanism using XSLT that can RDF validation languages require [38]. be attached to schemata definition. In [3], a transformation from RDF to other kind of 2.4. FHIR approach formats, including XML, is proposed using in XSLT stylesheets embedded SPARQL which by means of Another approach for transformation between sche- these extensions, could query, merge and transform mas is to take a domain model as the main represen- data from the Semantic Web. tation of data structure and constraints and then trans- In [6], authors describe XSPARQL which is a form between that model and other schema formats framework that enables the transformation between like XML Schema, JSON Schema or ShEx. This has XML and RDF based on XQuery and SPARQL and been the approach followed by FHIR1. However, this solves the disadvantages of using XSLT for these technique needs the creation of a domain model as an transformations. abstract representation which is not the goal of our However, these works (except [27]) are not covering work. the schemata mapping problem. 2.5. RDF validation languages and its conversions 2.2.
Recommended publications
  • Validating RDF Data Using Shapes
    83 Validating RDF data using Shapes a Jose Emilio Labra Gayo ​ ​ a University​ of Oviedo, Spain Abstract RDF forms the keystone of the Semantic Web as it enables a simple and powerful knowledge representation graph based data model that can also facilitate integration between heterogeneous sources of information. RDF based applications are usually accompanied with SPARQL stores which enable to efficiently manage and query RDF data. In spite of its well known benefits, the adoption of RDF in practice by web programmers is still lacking and SPARQL stores are usually deployed without proper documentation and quality assurance. However, the producers of RDF data usually know the implicit schema of the data they are generating, but they don't do it traditionally. In the last years, two technologies have been developed to describe and validate RDF content using the term shape: Shape Expressions (ShEx) and Shapes Constraint Language (SHACL). We will present a motivation for their appearance and compare them, as well as some applications and tools that have been developed. Keywords RDF, ShEx, SHACL, Validating, Data quality, Semantic web 1. Introduction In the tutorial we will present an overview of both RDF is a flexible knowledge and describe some challenges and future work [4] representation language based of graphs which has been successfully adopted in semantic web 2. Acknowledgements applications. In this tutorial we will describe two languages that have recently been proposed for This work has been partially funded by the RDF validation: Shape Expressions (ShEx) and Spanish Ministry of Economy, Industry and Shapes Constraint Language (SHACL).ShEx was Competitiveness, project: TIN2017-88877-R proposed as a concise and intuitive language for describing RDF data in 2014 [1].
    [Show full text]
  • V a Lida T in G R D F Da
    Series ISSN: 2160-4711 LABRA GAYO • ET AL GAYO LABRA Series Editors: Ying Ding, Indiana University Paul Groth, Elsevier Labs Validating RDF Data Jose Emilio Labra Gayo, University of Oviedo Eric Prud’hommeaux, W3C/MIT and Micelio Iovka Boneva, University of Lille Dimitris Kontokostas, University of Leipzig VALIDATING RDF DATA This book describes two technologies for RDF validation: Shape Expressions (ShEx) and Shapes Constraint Language (SHACL), the rationales for their designs, a comparison of the two, and some example applications. RDF and Linked Data have broad applicability across many fields, from aircraft manufacturing to zoology. Requirements for detecting bad data differ across communities, fields, and tasks, but nearly all involve some form of data validation. This book introduces data validation and describes its practical use in day-to-day data exchange. The Semantic Web offers a bold, new take on how to organize, distribute, index, and share data. Using Web addresses (URIs) as identifiers for data elements enables the construction of distributed databases on a global scale. Like the Web, the Semantic Web is heralded as an information revolution, and also like the Web, it is encumbered by data quality issues. The quality of Semantic Web data is compromised by the lack of resources for data curation, for maintenance, and for developing globally applicable data models. At the enterprise scale, these problems have conventional solutions. Master data management provides an enterprise-wide vocabulary, while constraint languages capture and enforce data structures. Filling a need long recognized by Semantic Web users, shapes languages provide models and vocabularies for expressing such structural constraints.
    [Show full text]
  • Semantics Developer's Guide
    MarkLogic Server Semantic Graph Developer’s Guide 2 MarkLogic 10 May, 2019 Last Revised: 10.0-8, October, 2021 Copyright © 2021 MarkLogic Corporation. All rights reserved. MarkLogic Server MarkLogic 10—May, 2019 Semantic Graph Developer’s Guide—Page 2 MarkLogic Server Table of Contents Table of Contents Semantic Graph Developer’s Guide 1.0 Introduction to Semantic Graphs in MarkLogic ..........................................11 1.1 Terminology ..........................................................................................................12 1.2 Linked Open Data .................................................................................................13 1.3 RDF Implementation in MarkLogic .....................................................................14 1.3.1 Using RDF in MarkLogic .........................................................................15 1.3.1.1 Storing RDF Triples in MarkLogic ...........................................17 1.3.1.2 Querying Triples .......................................................................18 1.3.2 RDF Data Model .......................................................................................20 1.3.3 Blank Node Identifiers ..............................................................................21 1.3.4 RDF Datatypes ..........................................................................................21 1.3.5 IRIs and Prefixes .......................................................................................22 1.3.5.1 IRIs ............................................................................................22
    [Show full text]
  • Deciding SHACL Shape Containment Through Description Logics Reasoning
    Deciding SHACL Shape Containment through Description Logics Reasoning Martin Leinberger1, Philipp Seifer2, Tjitze Rienstra1, Ralf Lämmel2, and Steffen Staab3;4 1 Inst. for Web Science and Technologies, University of Koblenz-Landau, Germany 2 The Software Languages Team, University of Koblenz-Landau, Germany 3 Institute for Parallel and Distributed Systems, University of Stuttgart, Germany 4 Web and Internet Science Research Group, University of Southampton, England Abstract. The Shapes Constraint Language (SHACL) allows for for- malizing constraints over RDF data graphs. A shape groups a set of constraints that may be fulfilled by nodes in the RDF graph. We investi- gate the problem of containment between SHACL shapes. One shape is contained in a second shape if every graph node meeting the constraints of the first shape also meets the constraints of the second. Todecide shape containment, we map SHACL shape graphs into description logic axioms such that shape containment can be answered by description logic reasoning. We identify several, increasingly tight syntactic restrictions of SHACL for which this approach becomes sound and complete. 1 Introduction RDF has been designed as a flexible, semi-structured data format. To ensure data quality and to allow for restricting its large flexibility in specific domains, the W3C has standardized the Shapes Constraint Language (SHACL)5. A set of SHACL shapes are represented in a shape graph. A shape graph represents constraints that only a subset of all possible RDF data graphs conform to. A SHACL processor may validate whether a given RDF data graph conforms to a given SHACL shape graph. A shape graph and a data graph that act as a running example are pre- sented in Fig.
    [Show full text]
  • Rdfa in XHTML: Syntax and Processing Rdfa in XHTML: Syntax and Processing
    RDFa in XHTML: Syntax and Processing RDFa in XHTML: Syntax and Processing RDFa in XHTML: Syntax and Processing A collection of attributes and processing rules for extending XHTML to support RDF W3C Recommendation 14 October 2008 This version: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 Latest version: http://www.w3.org/TR/rdfa-syntax Previous version: http://www.w3.org/TR/2008/PR-rdfa-syntax-20080904 Diff from previous version: rdfa-syntax-diff.html Editors: Ben Adida, Creative Commons [email protected] Mark Birbeck, webBackplane [email protected] Shane McCarron, Applied Testing and Technology, Inc. [email protected] Steven Pemberton, CWI Please refer to the errata for this document, which may include some normative corrections. This document is also available in these non-normative formats: PostScript version, PDF version, ZIP archive, and Gzip’d TAR archive. The English version of this specification is the only normative version. Non-normative translations may also be available. Copyright © 2007-2008 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply. Abstract The current Web is primarily made up of an enormous number of documents that have been created using HTML. These documents contain significant amounts of structured data, which is largely unavailable to tools and applications. When publishers can express this data more completely, and when tools can read it, a new world of user functionality becomes available, letting users transfer structured data between applications and web sites, and allowing browsing applications to improve the user experience: an event on a web page can be directly imported - 1 - How to Read this Document RDFa in XHTML: Syntax and Processing into a user’s desktop calendar; a license on a document can be detected so that users can be informed of their rights automatically; a photo’s creator, camera setting information, resolution, location and topic can be published as easily as the original photo itself, enabling structured search and sharing.
    [Show full text]
  • Using Rule-Based Reasoning for RDF Validation
    Using Rule-Based Reasoning for RDF Validation Dörthe Arndt, Ben De Meester, Anastasia Dimou, Ruben Verborgh, and Erik Mannens Ghent University - imec - IDLab Sint-Pietersnieuwstraat 41, B-9000 Ghent, Belgium [email protected] Abstract. The success of the Semantic Web highly depends on its in- gredients. If we want to fully realize the vision of a machine-readable Web, it is crucial that Linked Data are actually useful for machines con- suming them. On this background it is not surprising that (Linked) Data validation is an ongoing research topic in the community. However, most approaches so far either do not consider reasoning, and thereby miss the chance of detecting implicit constraint violations, or they base them- selves on a combination of dierent formalisms, eg Description Logics combined with SPARQL. In this paper, we propose using Rule-Based Web Logics for RDF validation focusing on the concepts needed to sup- port the most common validation constraints, such as Scoped Negation As Failure (SNAF), and the predicates dened in the Rule Interchange Format (RIF). We prove the feasibility of the approach by providing an implementation in Notation3 Logic. As such, we show that rule logic can cover both validation and reasoning if it is expressive enough. Keywords: N3, RDF Validation, Rule-Based Reasoning 1 Introduction The amount of publicly available Linked Open Data (LOD) sets is constantly growing1, however, the diversity of the data employed in applications is mostly very limited: only a handful of RDF data is used frequently [27]. One of the reasons for this is that the datasets' quality and consistency varies signicantly, ranging from expensively curated to relatively low quality data [33], and thus need to be validated carefully before use.
    [Show full text]
  • Bibliography of Erik Wilde
    dretbiblio dretbiblio Erik Wilde's Bibliography References [1] AFIPS Fall Joint Computer Conference, San Francisco, California, December 1968. [2] Seventeenth IEEE Conference on Computer Communication Networks, Washington, D.C., 1978. [3] ACM SIGACT-SIGMOD Symposium on Principles of Database Systems, Los Angeles, Cal- ifornia, March 1982. ACM Press. [4] First Conference on Computer-Supported Cooperative Work, 1986. [5] 1987 ACM Conference on Hypertext, Chapel Hill, North Carolina, November 1987. ACM Press. [6] 18th IEEE International Symposium on Fault-Tolerant Computing, Tokyo, Japan, 1988. IEEE Computer Society Press. [7] Conference on Computer-Supported Cooperative Work, Portland, Oregon, 1988. ACM Press. [8] Conference on Office Information Systems, Palo Alto, California, March 1988. [9] 1989 ACM Conference on Hypertext, Pittsburgh, Pennsylvania, November 1989. ACM Press. [10] UNIX | The Legend Evolves. Summer 1990 UKUUG Conference, Buntingford, UK, 1990. UKUUG. [11] Fourth ACM Symposium on User Interface Software and Technology, Hilton Head, South Carolina, November 1991. [12] GLOBECOM'91 Conference, Phoenix, Arizona, 1991. IEEE Computer Society Press. [13] IEEE INFOCOM '91 Conference on Computer Communications, Bal Harbour, Florida, 1991. IEEE Computer Society Press. [14] IEEE International Conference on Communications, Denver, Colorado, June 1991. [15] International Workshop on CSCW, Berlin, Germany, April 1991. [16] Third ACM Conference on Hypertext, San Antonio, Texas, December 1991. ACM Press. [17] 11th Symposium on Reliable Distributed Systems, Houston, Texas, 1992. IEEE Computer Society Press. [18] 3rd Joint European Networking Conference, Innsbruck, Austria, May 1992. [19] Fourth ACM Conference on Hypertext, Milano, Italy, November 1992. ACM Press. [20] GLOBECOM'92 Conference, Orlando, Florida, December 1992. IEEE Computer Society Press. http://github.com/dret/biblio (August 29, 2018) 1 dretbiblio [21] IEEE INFOCOM '92 Conference on Computer Communications, Florence, Italy, 1992.
    [Show full text]
  • Rdfs:Frbr– Towards an Implementation Model for Library Catalogs Using Semantic Web Technology
    rdfs:frbr– Towards an Implementation Model for Library Catalogs Using Semantic Web Technology Stefan Gradmann SUMMARY. The paper sets out from a few basic observations (biblio- graphic information is still mostly part of the ‘hidden Web,’ library au- tomation methods still have a low WWW-transparency, and take-up of FRBR has been rather slow) and continues taking a closer look at Se- mantic Web technology components. This results in a proposal for im- plementing FRBR as RDF-Schema and of RDF-based library catalogues built on such an approach. The contribution concludes with a discussion of selected strategic benefits resulting from such an approach. [Article copies available for a fee from The Haworth Document Delivery Service: 1-800-HAWORTH. E-mail address: <[email protected]> Web- site: <http://www.HaworthPress.com> © 2005 by The Haworth Press, Inc. All rights reserved.] Stefan Gradmann, PhD, is Head, Hamburg University “Virtual Campus Library” Unit, which is part of the computing center and has a mission of providing information management services to the university as a whole, including e-publication services and open access to electronic scientific information resources. Address correspondence to: Stefan Gradmann, Virtuelle Campusbibliothek Regionales Rechenzentrum der Universität Hamburg, Schlüterstrasse 70, D-20146 Hamburg, Germany (E-mail: [email protected]). [Haworth co-indexing entry note]: “rdfs:frbr–Towards an Implementation Model for Library Cata- logs Using Semantic Web Technology.” Gradmann, Stefan. Co-published simultaneously in Cataloging & Classification Quarterly (The Haworth Information Press, an imprint of The Haworth Press, Inc.) Vol. 39, No. 3/4, 2005, pp. 63-75; and: Functional Requirements for Bibliographic Records (FRBR): Hype or Cure-All? (ed: Patrick Le Boeuf) The Haworth Information Press, an imprint of The Haworth Press, Inc., 2005, pp.
    [Show full text]
  • SHACL Satisfiability and Containment
    SHACL Satisfiability and Containment Paolo Pareti1 , George Konstantinidis1 , Fabio Mogavero2 , and Timothy J. Norman1 1 University of Southampton, Southampton, United Kingdom {pp1v17,g.konstantinidis,t.j.norman}@soton.ac.uk 2 Università degli Studi di Napoli Federico II, Napoli, Italy [email protected] Abstract. The Shapes Constraint Language (SHACL) is a recent W3C recom- mendation language for validating RDF data. Specifically, SHACL documents are collections of constraints that enforce particular shapes on an RDF graph. Previous work on the topic has provided theoretical and practical results for the validation problem, but did not consider the standard decision problems of satisfiability and containment, which are crucial for verifying the feasibility of the constraints and important for design and optimization purposes. In this paper, we undertake a thorough study of different features of non-recursive SHACL by providing a translation to a new first-order language, called SCL, that precisely captures the semantics of SHACL w.r.t. satisfiability and containment. We study the interaction of SHACL features in this logic and provide the detailed map of decidability and complexity results of the aforementioned decision problems for different SHACL sublanguages. Notably, we prove that both problems are undecidable for the full language, but we present decidable combinations of interesting features. 1 Introduction The Shapes Constraint Language (SHACL) has been recently introduced as a W3C recommendation language for the validation of RDF graphs, and it has already been adopted by mainstream tools and triplestores. A SHACL document is a collection of shapes which define particular constraints and specify which nodes in a graph should be validated against these constraints.
    [Show full text]
  • The Opencitations Data Model
    The OpenCitations Data Model Marilena Daquino1;2[0000−0002−1113−7550], Silvio Peroni1;2[0000−0003−0530−4305], David Shotton2;3[0000−0001−5506−523X], Giovanni Colavizza4[0000−0002−9806−084X], Behnam Ghavimi5[0000−0002−4627−5371], Anne Lauscher6[0000−0001−8590−9827], Philipp Mayr5[0000−0002−6656−1658], Matteo Romanello7[0000−0002−7406−6286], and Philipp Zumstein8[0000−0002−6485−9434]? 1 Digital Humanities Advanced research Centre (/DH.arc), Department of Classical Philology and Italian Studies, University of Bologna fmarilena.daquino2,[email protected] 2 Research Centre for Open Scholarly Metadata, Department of Classical Philology and Italian Studies, University of Bologna 3 Oxford e-Research Centre, University of Oxford [email protected] 4 Institute for Logic, Language and Computation (ILLC), University of Amsterdam [email protected] 5 Department of Knowledge Technologies for the Social Sciences, GESIS - Leibniz-Institute for the Social Sciences [email protected], [email protected] 6 Data and Web Science Group, University of Mannheim [email protected] 7 cole Polytechnique Fdrale de Lausanne [email protected] 8 Mannheim University Library, University of Mannheim [email protected] Abstract. A variety of schemas and ontologies are currently used for the machine-readable description of bibliographic entities and citations. This diversity, and the reuse of the same ontology terms with differ- ent nuances, generates inconsistencies in data. Adoption of a single data model would facilitate data integration tasks regardless of the data sup- plier or context application. In this paper we present the OpenCitations Data Model (OCDM), a generic data model for describing bibliographic entities and citations, developed using Semantic Web technologies.
    [Show full text]
  • Using Shape Expressions (Shex) to Share RDF Data Models and to Guide Curation with Rigorous Validation B Katherine Thornton1( ), Harold Solbrig2, Gregory S
    View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by Repositorio Institucional de la Universidad de Oviedo Using Shape Expressions (ShEx) to Share RDF Data Models and to Guide Curation with Rigorous Validation B Katherine Thornton1( ), Harold Solbrig2, Gregory S. Stupp3, Jose Emilio Labra Gayo4, Daniel Mietchen5, Eric Prud’hommeaux6, and Andra Waagmeester7 1 Yale University, New Haven, CT, USA [email protected] 2 Johns Hopkins University, Baltimore, MD, USA [email protected] 3 The Scripps Research Institute, San Diego, CA, USA [email protected] 4 University of Oviedo, Oviedo, Spain [email protected] 5 Data Science Institute, University of Virginia, Charlottesville, VA, USA [email protected] 6 World Wide Web Consortium (W3C), MIT, Cambridge, MA, USA [email protected] 7 Micelio, Antwerpen, Belgium [email protected] Abstract. We discuss Shape Expressions (ShEx), a concise, formal, modeling and validation language for RDF structures. For instance, a Shape Expression could prescribe that subjects in a given RDF graph that fall into the shape “Paper” are expected to have a section called “Abstract”, and any ShEx implementation can confirm whether that is indeed the case for all such subjects within a given graph or subgraph. There are currently five actively maintained ShEx implementations. We discuss how we use the JavaScript, Scala and Python implementa- tions in RDF data validation workflows in distinct, applied contexts. We present examples of how ShEx can be used to model and validate data from two different sources, the domain-specific Fast Healthcare Interop- erability Resources (FHIR) and the domain-generic Wikidata knowledge base, which is the linked database built and maintained by the Wikimedia Foundation as a sister project to Wikipedia.
    [Show full text]
  • Using Rule-Based Reasoning for RDF Validation
    View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by Ghent University Academic Bibliography Using Rule-Based Reasoning for RDF Validation Dörthe Arndt, Ben De Meester, Anastasia Dimou, Ruben Verborgh, and Erik Mannens Ghent University - imec - IDLab Sint-Pietersnieuwstraat 41, B-9000 Ghent, Belgium [email protected] Abstract. The success of the Semantic Web highly depends on its in- gredients. If we want to fully realize the vision of a machine-readable Web, it is crucial that Linked Data are actually useful for machines con- suming them. On this background it is not surprising that (Linked) Data validation is an ongoing research topic in the community. However, most approaches so far either do not consider reasoning, and thereby miss the chance of detecting implicit constraint violations, or they base them- selves on a combination of dierent formalisms, eg Description Logics combined with SPARQL. In this paper, we propose using Rule-Based Web Logics for RDF validation focusing on the concepts needed to sup- port the most common validation constraints, such as Scoped Negation As Failure (SNAF), and the predicates dened in the Rule Interchange Format (RIF). We prove the feasibility of the approach by providing an implementation in Notation3 Logic. As such, we show that rule logic can cover both validation and reasoning if it is expressive enough. Keywords: N3, RDF Validation, Rule-Based Reasoning 1 Introduction The amount of publicly available Linked Open Data (LOD) sets is constantly growing1, however, the diversity of the data employed in applications is mostly very limited: only a handful of RDF data is used frequently [27].
    [Show full text]