
Semantic Web 0 (0) 1 1 IOS Press

XMLSchema2ShEx: Converting XML validation to RDF validation

Editor(s): Axel Polleres, Vienna University of Economics and Business (WU Wien), Austria
Solicited review(s): Felix Sasaki, Cornelsen Verlag GmbH, Germany; Emir Muñoz, National University of Ireland Galway, Ireland; Simon Steyskal, Vienna University of Economics and Business (WU Wien), Austria

Herminio Garcia-Gonzalez (a, *), Jose Emilio Labra-Gayo (a)
(a) Department of Computer Science, University of Oviedo, Oviedo, Asturias, Spain
Email: [email protected], [email protected]

Abstract. RDF validation is a field on which the Semantic Web community is currently focusing its attention. In addition, there is a recent trend to migrate data from different sources to Semantic Web formats. To facilitate this transformation, we propose: a set of mappings that can be used to convert from XML Schema to Shape Expressions (ShEx); a prototype that implements a subset of the proposed mappings; an example application that obtains a ShEx schema from an XML Schema; and a discussion of the conversion implications of non-deterministic schemata. We demonstrate that an XML document and its corresponding XML Schema remain valid when converted to their RDF and ShEx counterparts. This conversion, along with the development of mappings for other formats, could lead to improved data interoperability by reducing the technological gap.

Keywords: ShEx, XML Schema, Shape Expressions, formats mapping, data validation

* Corresponding Author. Email: [email protected]

1570-0844/0-1900/$35.00 © 0 – IOS Press and the authors. All rights reserved

1. Introduction

Data validation is a key area when normalisation and confidence are desired. Normalisation, which can be defined in this context as using a homogeneous schema or structure across different sources of similar information, is desired as a way of making a dataset more reliable and more useful to potential consumers because of its standardised schema. Validation can improve the cleansing, querying and standardisation of datasets. In the words of P. N. Fox et al. [16]: "Procedures for data validation increase the value of data and the users' confidence in predictions made from them. Well-designed data management systems may strengthen data validation itself, by providing better estimates of expected values than were available previously." Validation is therefore a key field of data management.

XML Schema [5] was designed as a language to make XML validation possible with more expressiveness than DTDs [4]. Using XML Schema, developers can define the structure, constraints and documentation of an XML vocabulary. Besides DTD and XML Schema, other alternatives for XML validation (such as Relax NG [11] and Schematron [18]) were proposed.

In the Semantic Web, RDF lacked a standard constraint validation language covering the same features that XML Schema covers for XML. Some alternatives were OWL [17] and RDF Schema [10]; however, they do not completely cover what XML Schema does for XML [38]. For this purpose, Shape Expressions (ShEx) [32,33] was proposed to fulfil the requirement of a constraint validation language for RDF, and SHACL [20] (another proposed language for RDF validation) has recently become a W3C recommendation.

As many documents and data are persisted in XML, and the need for migration and interoperability towards more flexible data formats is more pressing than ever, many authors have proposed conversions from XML to RDF [27,12,2,6] with the goal of transforming XML data into Semantic Web formats. Although these conversions enable users to migrate their data to the Semantic Web, there are no means for validating the output data after converting XML to RDF. Such means are needed to ensure that the conversion has been done correctly and that both versions, written in different languages, define the same meaning.

Conversions between XML and RDF, and between XML Schema and ShEx, are necessary to narrow the gap between semantic technologies and more traditional ones (e.g., XML, JSON, CSV, relational databases). With that in mind, providing generic transformation tools from non-semantic to semantic technologies can widen the migration possibilities; in other words, if we create tools that ease the transformation and adaptation among technologies, we will encourage future migrations. Taking the Text Encoding Initiative (TEI) [14] as an example, digital humanities can benefit from Semantic Web approaches [37,35]. Many manuscripts have been transcribed to XML using TEI, and these transcriptions can be converted to RDF. However, transcribers are hesitant to deal with the underlying technology even though they could benefit from it [26]. These are the cases where generic approaches, such as the one introduced here, can offer a solution, and where automatic conversion of schemata has its place when transformations need to be checked.

Taking into account what we have presented, the questions that we want to address in the present work are the following:

– RQ1: What components should a mapping from XML Schema to ShEx have?
– RQ2: How can we ensure that both schemata are equivalent?
– RQ3: Is it possible to ensure a backwards conversion in all cases?
– RQ4: Is it possible to translate and validate non-deterministic schemata (i.e., ambiguous schemata)?

In this paper, we describe how to perform the conversion from XML Schema to ShEx. We describe how each element in XML Schema can be translated into ShEx. Moreover, we present a prototype that can convert a subset of what is defined in the following sections.

The rest of the paper is structured as follows: Section 2 presents the background; Section 3 gives a brief introduction to ShEx; Section 4 describes a possible set of mappings between XML Schema and ShEx; Section 5 presents a prototype used to validate a subset of the previously presented mappings and shows how this conversion works against existing RDF validators; Section 6 discusses the implications of non-deterministic schemata for our work. Finally, Section 7 draws some conclusions and outlines future lines of work and improvement.

2. Background

Related work on conversion within the XML ecosystem can be divided into three main categories: conversions from XML to Semantic Web formats, conversions from XML schemata to non-Semantic Web schemata, and conversions from XML schemata to RDF schemata.

2.1. From XML to Semantic Web formats

Along with schema conversions, data transformation has to be tackled. Many authors have therefore worked on converting XML to Semantic Web formats, and more specifically to RDF, and a wide range of strategies has been proposed for this conversion.

In [27], the authors describe their experience developing this transformation for the business-to-business industry in the context of the Semantic Mediation tools. An XML Schema to RDF Schema transformation is performed as one of the requirements of the Semantic Mediation tool.

In [12], an ontology-driven transformation between XML and RDF is described. It takes an XML document, a mapping document and an ontology document, and produces RDF instances that comply with the input ontology. The mapping file establishes the correspondences between the XML Schema and the ontology.

In [1], the author explains how XML can be converted to RDF, and vice versa, using XML Schema as the basis for the mappings. This work is then extended in [2], where the author addresses the lift problem (the problem of how to map heterogeneous data sources into the same representational framework) from XML to RDF and backwards by using the Gloze mapping approach on top of Apache Jena.

In [40], the authors present a mechanism to query XML data as RDF. First, the XML Schema is matched to an RDF Schema class hierarchy. Then, XML elements can be interpreted as RDF triples. The same procedure, but using DTDs, is described in [39].

In [9], the author presents a technique for performing standard transformations between XML and RDF using XSLT. A case study in the field of astronomy is used to illustrate the solution.

Another XSLT-based approach is [36], where the authors describe a mapping mechanism that can be attached to schema definitions.

In [3], a transformation from RDF to other kinds of formats, including XML, is proposed using SPARQL embedded in XSLT stylesheets; by means of these extensions, stylesheets can query, merge and transform data from the Semantic Web.

In [6], the authors describe XSPARQL, a framework based on XQuery and SPARQL that enables transformations between XML and RDF and overcomes the disadvantages of using XSLT for these transformations.

However, these works (except [27]) do not cover the schema mapping problem.

2.2.

[…] to RDF Schema [27]. Moreover, when no schema is available, the transformation can be performed from XML to OWL [7,21,30,23].

However, RDF Schema and OWL were not designed as RDF validation languages. Their use of the Open World and Non-Unique Name Assumptions can make it difficult to define the integrity constraints that RDF validation languages require [38].

2.4. FHIR approach

Another approach to transformation between schemas is to take a domain model as the main representation of data structure and constraints, and then transform between that model and other schema formats such as XML Schema, JSON Schema or ShEx. This is the approach followed by FHIR¹. However, this technique requires the creation of a domain model as an abstract representation, which is not the goal of our work.

2.5. RDF validation languages and their conversions