Shapeness: a SHACL-Driven RDF Graph Editor
Total Page:16
File Type:pdf, Size:1020Kb
SHAPEness: a SHACL-driven RDF Graph Editor Rossana Pacielloa,c,*, Daniele Bailoa,c, Luca Tranib, Valerio Vinciarellia, Manuela Sbarrac, Lorenzo Fen- oglioc, Sara Capotostic a European Plate Observing System, EPOS-ERIC, Rome, Italy b Department of R&D Seismology and Acoustics, Royal Netherlands Meteorological Institute (KNMI), Utrechtseweg 297, 3731 GA, De Bilt, The Netherlands c Istituto Nazionale di Geofisica e Vulcanologia (INGV), Rome, Italy E-mails: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected] Abstract. The Shapes Constraint Language (SHACL) has been recently introduced as a W3C recommendation to define con- straints (called shapes) for validating RDF graphs. In this paper a novel SHACL-driven multi-view editor is presented: SHAPEness. It empowers users by offering them a rich interface for assessing and improving the quality of RDF graphs. SHAPEness has been developed and tested in the framework of the European Plate Observing System (EPOS). In this context, the SHAPEness features have proven to be a valuable solution to easily create and maintain valid graphs according to the EPOS data model. The SHACL-driven approach underpinning SHAPEness, makes this tool suitable for a broad range of do- mains, or use cases, which structure their knowledge by means of SHACL constraints. Keywords: SHACL shapes, RDF validation, knowledge graph quality, graphs visualization, metadata editor. 1. Introduction FAIR principles, metadata can facilitate data ex- change, data findability [14] and data access [5]. Over the past decade, the use of standard semantic Resource Description Framework1 (RDF) is the technologies and linked data principles have become most popular format for exchanging semantic (me- general practice in structuring metadata and related ta)data, structured as dynamic and schemaless graphs. data. For instance, semantic representations such as However, the RDF flexibility turns out to be the ontologies are applied to address FAIR data princi- source of many data quality and knowledge represen- ples [22, 10], that require rich metadata descriptions tation issues. Furthermore, since the use of schema, for scientific datasets. These principles have gained ontology and constraint languages is not mandatory, consensus within scientific communities, fostered by there is often room for misunderstandings related to pan-European initiatives such as the European Open the structure of the data [15]. Science Cloud (EOSC) [16], where they are used as The validation of the structure of RDF graphs driving concepts to support data interoperability against a set of constraints is a requirement shared by among standardized repositories compliant to a many users with different expertise levels, and repre- shared set of requirements [14]. According to the sents an actual and relevant research topic [12, 4]. 1 https://www.w3.org/RDF/ * Corresponding author. Email: [email protected] Constraint languages, such as SHACL 2 or The SHACL-driven mechanism, implemented by ShEx3, meet this need by providing models and vo- SHAPEness, aims at providing an innovative visual cabularies for expressing structural and semantic tool in creating and validating metadata as RDF relationships [11] — they typically support different graphs, suitable for any knowledge domain described types and multiple levels of severity. by means of SHACL constraints. In this paper we The Shapes Constraint Language (SHACL) is a prove its added value in the context of solid-Earth recent W3C recommendation and a powerful formal- sciences. ism to define a set of constraints (called shapes) on The remainder of this paper is organized as fol- the content of RDF data graphs (called nodes) there- lows: section 2 provides an overview of the existing by improving their quality. SHACL addresses vari- tools for creating and validating RDF data, against a ous use cases and application scenarios, for instance, set of constraints. Section 3 describes the main fea- it can be adopted to validate RDF data structures, as a tures of SHAPEness. Section 4 gives a complete template to model RDF and as an RDF structure que- overview of the SHAPEness architecture. Section 5 ry mechanism. describes the application of SHAPEness in a real The modelling of such constraints is usually the case scenario. Section 6 is devoted to discussing the realm of experts with a deep understanding of the approach and results of user testing. Finally, Section target domain concepts (e.g. environmental science) 7 summarizes the paper and provides the directions who are also knowledgeable in the language’s syntax. for future developments. In this way they are able to describe the desired data structure whilst considering the domain requirements. Due to the potential size of the data structure and 2. Related tools the complexity of the restrictions needed to model a knowledge domain, several approaches and tools At present a limited set of visual tools is available have been proposed for supporting experts in the for supporting users in the creation and validation of above-mentioned modelling task [9, 3, 7, 8]. RDF data by means of user-friendly user interfaces. Nevertheless, defining SHACL constraints is just This might be due to the recent standardization of the first step towards assessment and improvement of constraint languages for Linked Data such as SHACL. knowledge graphs quality [18]. It is equally im- This section introduces some of those tools and portant to provide visual tools to create, edit and provides considerations about their supported fea- maintain graphs compliant to such constraints, even tures. when users have a limited knowledge of the domain, Schimatos [23] is a form-based web application of its main concepts and data structures. that helps users to create and edit RDF data, validat- In this paper we present SHAPEness, a SHACL- ed against SHACL constraints. The web forms are driven desktop application that provides users with a automatically generated by using the SHACL content rich interactive and user-friendly environment where and the acquired data are stored in a SPARQL data- they can generate, visualize and work with RDF data base. It offers also a client-side validation which al- and structures complying to defined SHACL con- lows to minimize the input of erroneous data. Schi- straints (shapes). matos serializes the created RDF data to RDF/Turtle RDF data graphs can be easily created by loading format, but only for storing data in a SPARQL data- a set of shapes into the SHAPEness application; base. The software is available to download as an graph properties can be edited and visualized using HTML+CSS/JavaScript package. simple forms; and the resulting graphs can be in- ActiveRaUL [6] is a web forms generator that spected and serialized to RDF/Turtle4 format. helps users, with little or no knowledge about the SHAPEness’ features are provided by combining Semantic Web, to create and maintain RDF data. It three different types of views in a single user inter- operates on a model defined according to the RDFa face: graph-based; form-based; tree-based. This User Interface Language5 that consists of two parts: approach enables users to easily perform their tasks 1) a form model for describing the structure of a web by hiding data structure complexity, and thus over- form with different types of forms controls their as- coming the lack of technical expertise. sociated operations; and 2) a data model for defining the structure of the exchanged data as RDF state- 2 https://www.w3.org/TR/shacl/ ments which are referenced from the form model via 3 https://github.com/shexSpec/shex/wiki/ShEx 4 https://www.w3.org/TR/turtle/ 5 http://purl.org/NET/raul a data binding mechanism. ActiveRaUL is able to SPARQL database, VitroLib provides a serialization create input forms directly using an arbitrary RDF of data by querying the database and retrieving the ontology or an RaUL RDF model in order to acquire result also in JSON format. data from users and store them in a SPARQL data- To summarize, all the presented tools allow users base. However, ActiveRaUL does not provide users to edit RDF data by means of input forms. However, with a serialization of RDF data and does not support none of these tools combines the form-based visuali- any types of validation of the edited metadata. zation with a graph-based representation of da- RDForms6 is a collection of javascript libraries for ta. This feature is beneficial as it offers users a com- supporting users in the creation of a form-based RDF plete vision of the data structure. Most of the evalu- editor in a web environment. It is the sixth version of ated tools support the serialization of RDF data as a a project started in 2001 previously known as database storage mechanism. All the analyzed tools SHAME7. Developers can use one or a combination are distributed as web applications or software librar- of such libraries to build up a web application which ies that need to be installed and configured thus re- includes: the editor, the validator, the linked data quiring users’ technical skills. Moreover, although browser and the template editor. The template editor the web-based approach enables users to work in a helps domain experts to define RDF templates by collaborative environment, it requires an active inter- using a graphic interface. It also provides a converter net connection along the entire working process. to generate templates from RDF schemes or Descrip- Finally, some of these tools support a template- tion Set Profile (DSP). The editor generates web driven approach, which allows to customize the fea- forms according to the defined templates and enables tures offered, including the data validation, on the users to fill in the RDF data. The serialization format basis of a configured template (like SHACL) at in- is RDF/XML or RDF/JSON and the RDF statements stallation stage.