Semantic Web 0 (0) 1 1 IOS Press

1 1 2 2 3 3 4 A Shape Expression approach for assessing 4 5 5 6 the quality of Linked Open Data in digital 6 7 7 8 8 9 libraries 9 10 10 Gustavo Candela a,*, Pilar Escobar a and María Dolores Sáez a and Manuel Marco-Such a 11 11 a 12 Departmento de Lenguajes y Sistemas Informáticos, Universidad de Alicante, Spain 12 13 E-mail: [email protected] 13 14 14 15 15 16 16 17 17 Abstract. Cultural heritage institutions are exploring Semantic Web technologies to publish and enrich their catalogs. Several 18 initiatives such as Labs are based on the reuse of the materials published by cultural heritage institutions in innovative and 18 19 creative ways. In this sense, quality has become a crucial aspect when identifying and reusing a dataset for research. In this 19 20 article, we propose a methodology to create Shape Expressions definitions to validate LOD datasets published by digital libraries. 20 21 This methodology is then applied to two use cases based on datasets published by relevant GLAM institutions. It intends to 21 22 encourage institutions to use ShEx to validate LOD datasets as well as to promote the reuse of LOD made openly available by 22 23 digital libraries. 23 24 24 Keywords: Linked Open Data, Data Quality, Digital libraries, Cultural Heritage 25 25 26 26 27 27 28 28 1. Introduction read by computers [5]. The Semantic Web is based on 29 29 a set of technologies that enable the connection of re- 30 30 Galleries, Libraries, Archives and Museums (GLAM) sources known as Linked Open Data (LOD). 31 31 institutions have traditionally provided access to digi- The application of the LOD concepts to the digi- 32 32 tal collections. The diversity of materials include text, 33 tal collections provided by GLAM institutions has be- 33 image, video, audio or maps. 34 come very popular in the research community. Many 34 As technologies have evolved over the years, GLAM 35 institutions are exploring the adoption of Resource 35 organisations have adapted to the new environments in 36 Description Framework (RDF) to describe their con- 36 terms of new skills, service design or digital research 37 tent. In addition, the adoption of collaborative edi- 37 38 [1]. Institutions have started to make their collections tion approaches has been explored using and 38 39 accessible for computational use such as data science, Wikibase [6, 7] to highlight opportunities for research 39 40 Machine Learning, and Artificial Intelligence [2, 3]. in community-based collections and collective collec- 40 41 Recently, the concept of Lab has appeared as a means tion. The use of LOD enhances the discoverability and 41 42 to explore innovate and creative ways of reusing and impact of digital collections by transforming isolated 42 43 exploiting digital collections published by GLAM in- repositories (data silos) into valuable datasets that are 43 44 stitutions [4]. GLAM institutions are exploring user connected to external repositories. Moreover, LOD has 44 45 engagement as a means to conduct research with the become crucial for search engines in which meaning- 45 46 digital collections. ful results are the key to unlocking a successful and 46 47 The Semantic Web was presented by Tim Berners- rich user experience [8]. 47 48 Lee in 2001 as an extension of the current Web in However, the use and exploitation of the Seman- 48 49 which information is structured in a way that it can be tic Web technologies requires complex technical skills 49 50 and professional knowledge in different areas hinder- 50 51 *Corresponding author. E-mail: [email protected]. ing its adoption. Many aspects must be taken into ac- 51

1570-0844/0-1900/$35.00 c 0 – IOS Press and the authors. All rights reserved 2 G. Candela et al. / A ShEx approach for assessing the quality of LOD in DLs

1 count such as the vocabulary to describe the resources, mans using a browser but also for machines that may 1 2 the identification of external repositories to create the automatically process that data. was in- 2 3 links and the system to store the final dataset. Simi- troduced by Tim Berners-Lee [14] as a essential com- 3 4 lar to other types of structured data, LOD suffers from ponent of the Semantic Web to create relationships 4 5 quality problems such as inaccuracy, inconsistency, among datasets. In this sense, Resource Description 5 6 and incompleteness which impedes its reuse and ex- Framework (RDF) [15] lies at the heart of Semantic 6 7 ploitation to its full potential. Web providing a standard model for data interchange 7 8 Some approaches have assessed the quality of LOD on the Web and extending the linking structure of the 8 9 using several methods and techniques [9, 10]. Shape Web by means of URIs. In addition, SPARQL provides 9 10 Expressions (ShEx) have emerged as a concise, for- a query language for RDF providing a list of instruc- 10 11 mal, modeling and validation language for RDF struc- tions [16]. 11 12 tures, addressing the need of the Semantic Web com- Libraries have traditionally provided the descriptive 12 13 munity to ensure data quality for RDF graphs [11, 12]. of bibliographic records using standards such 13 14 A preliminary query-based approach assesses the qual- as MARC.1 While MARC is the most common for- 14 15 ity of the LOD published by four relevant DLs [13]. 15 mats used in libraries to publish bibliographic infor- 16 However, none of them provides a systematic and re- 16 mation, it has limitations for use as RDF, since MARC 17 producible approach to assess the quality of the LOD 17 was not defined for a Web environment [17]. 18 published by DLs based on ShEx as a main compo- 18 In this sense, several initiatives provide a more ex- 19 nent. 19 pressive and modern framework for bibliographic in- 20 The purpose of this paper is to introduce a system- 20 formation based on Semantic Web technologies. Some 21 atic and reproducible approach with which to analyze 21 examples are Functional Requirements for Biblio- 22 the data-quality of LOD published by libraries. The 22 graphic Records (FRBR) family of conceptual mod- 23 methodology is then applied to two LOD repositories 23 24 published by relevant institutions. The results of this els [18] and Resource Description and Access (RDA) 24 25 study could then be used to reproduce and extend the specification [19], IFLA Library Reference Model 25 26 methodology and the ShEx definitions, as well as to (LRM) [20], Bibliographic Ontology (BIBO) [21], 26 27 identify candidate datasets for reuse in innovative and Bibliographic Framework Initiative (BIBFRAME) [22] 27 28 creative ways. and CIDOC Conceptual Reference Model (CRM) 28 29 The main contributions of this paper are the follow- [23]. However, the translation of the old records into 29 30 ing: (a) the methodology to assess they quality of the the new format is not an easy task [24], since libraries 30 31 LOD published by digital libraries (DLs) using ShEx; usually host large catalogs including many types of re- 31 32 (b) the results obtained after the quality assessment; sources that in many occasions requires a manual revi- 32 33 and (c) the ShEx definitions to assess LOD published sion for an accurate transformation of the data. 33 34 by libraries. Several major libraries (e.g., OCLC, British Library, 34 35 The paper is organized as follows: after a brief de- National Library of France, publishers, and library cat- 35 36 scription of the state of the art in Section 2, Section 3 alog vendors) have applied LOD to their catalogs in 36 37 describes the methodology employed to evaluate LOD an effort to make these records more useful to users. 37 38 in DLs using ShEx. Section 4 shows the results of the For instance, the Library of Congress Linked Data 38 39 application of the methodology. The paper concludes Service (id.loc.gov) provides access to authority data. 39 40 with an outline of the methodology adopted, general The Bibliothèque nationale de France (BnF) has pub- 40 41 guidelines for the use of the results and future work. lished data.bnf.fr by aggregating information concern- 41 42 ing works, authors and subjects. The Biblioteca Na- 42 43 cional de España (BNE) has transformed its catalog 43 44 2. Related Work to RDF and is available at datos.bne.es [25]. The Bib- 44 45 lioteca Virtual Miguel de Cervantes (BVMC) catalog 45 46 2.1. Background has been transformed to RDF based on the RDA vo- 46 47 cabulary to describe the resources [26]. The British 47 48 The Semantic Web is a web of data that is machine- National Bibliography (BNB) LOD platform provides 48 49 readable and includes a collection of technologies to 49 50 describe and query the data, and define standard vocab- 50 51 ularies. The goal is that information is useful for hu- 1https://www.loc.gov/marc/ 51 G. Candela et al. / A ShEx approach for assessing the quality of LOD in DLs 3

1 access to the British National Bibliography published node constraints to define the set of allowed values of 1 2 as LOD and made available through SPARQL.2 a node. For instance, the type of a RDF node, literal 2 3 LOD enhance discovery of and access to cultural datatype, XML String and numeric facets and enumer- 3 4 heritage providing context about resources by link- ation of value sets. ShEx also enables the definition of 4 5 ing bibliographic catalog records to external reposito- constraints on the allowed neighbourhood of a node 5 6 ries such as Wikidata, GeoNames and Virtual Interna- called Shape, in terms of the allowed triples that con- 6 7 tional Authority File (VIAF). In this sense, GLAM in- tain this node as subject or object. Figure 2 shows an 7 8 stitutions are increasingly embracing the value of con- example of ShEx to validate entities of type person de- 8 9 tributing information to open knowledge and collabo- scribed using FOAF. 9 10 rative projects such as Wikidata. In this sense, many in- There are several implementation of ShEx includ- 10 11 stitutions have linked their collections to Wikidata by ing shex.js for Javascript,3 Shaclex for Scala4 and Java 11 12 means of dedicated properties. For instance, the prop- ShEx for Java.5 In particular, shex.js include a Simple 12 13 erty BNB person ID (P5361) at Wikidata links to the Online Validator6 to provide a configuration file called 13 14 BNB LOD platform. The linking and enrichment of manifest that can load a schema, load and execute a 14 15 entities enables the combination of information from query against a particular SPARQL endpoint, and val- 15 16 datasets that are stored in different places and have dif- idate the nodes selected by the query. The combina- 16 17 ferent SPARQL endpoints [27]. Figure 1 shows an ex- tion of the validation tool, the ShEx definitions and 17 18 ample of federated SPARQL query run in the Wikidata the manifests enables a reproducible environment with 18 19 query browser to retrieve the works of William Shake- which to reproduce the research result. A collection of 19 20 speare from the BNB LOD platform. ShEx schemas has also been started for several vocab- 20 21 ularies.7 21 22 2.2. Validating LOD The use and application of ShEx for validating RDF 22 23 data has gained international interest in the research 23 24 Data quality is a crucial aspect for data researchers community. For instance, the description and vali- 24 25 when making the choice of a dataset for reuse [10, 28]. dation of Fast Healthcare Interoperability Resources 25 26 In this sense, several methods and tools have recently (FHIR) to RDF transformations by means of ShEx 26 27 appeared in order to assess the quality of datasets built [33]. In addition, ShEx is used in several Wikidata 27 28 upon Semantic Web technologies. In addition, the re- projects to ensure data-quality by developing quality- 28 29 search community have highlighted the need for repro- control pipelines [34]. ShEx is also used to facilitate 29 30 ducible research by providing articles, as well as, data the creation of RDF resources that are validated upon 30 31 31 and code [29]. creation [35]. Another approach proposes a set of map- 32 32 Stardog Integrity Constraint Validation (ICV) al- pings that can be used to convert from XML Schema 33 33 lows to write constraints that are translated to SPARQL to Shape Expressions (ShEx) [36]. 34 34 in order to assess RDF triples in a repository [26]. Sev- With regard to DLs, previous works are focused on 35 35 eral approaches provide data quality criteria accord- the identification of LOD repositories published by 36 36 ing to which LOD repositories can be analyzed. They DLs and their assessment by means of a data-quality 37 37 have contributed to understand and specify data qual- criteria [13]. Another approach is based on Europeana 38 38 ity on several dimensions regarding data quality (e.g., and multilinguality describing the measures defined 39 39 accuracy, completeness, licensing) [10, 30, 31]. These and providing initial interpretations of the results [37]. 40 40 efforts are mostly concentrated on the evaluation of A new method and results of validation of several cat- 41 41 repositories focused on general knowledge rather than alogs using MARC as metadata format identifies the 42 42 specific domains such as cultural heritage or literature. structural features of the records and most frequent is- 43 43 ShEx enables RDF validation through the declara- sues [38]. Moreover, an extensible quality assessment 44 44 tion of constraints on the RDF model [32]. ShEx are framework which supports multiple metadata schemas 45 45 46 defined using terms from RDF semantics such as node 46 47 which corresponds to one IRI, a blank node or a lit- 3https://github.com/shexSpec/shex.js/ 47 4 48 eral, and graph as a set of triples described as (sub- https://github.com/labra/shaclex/ 48 5https://gforge.inria.fr/projects/shex-impl/ 49 ject, predicate, object). ShEx enables the definition of 49 6https://rawgit.com/shexSpec/shex.js/master/packages/ 50 shex-webapp/doc/shex-simple.html 50 51 2https://bnb.data.bl.uk/ 7https://github.com/shexSpec/schemas 51 4 G. Candela et al. / A ShEx approach for assessing the quality of LOD in DLs

1 SELECT ?work ?workLabel WHERE { 1 2 wd:Q692 wdt:P5361 ?id 2 3 BIND(uri(concat("http://bnb.data.bl.uk/id/person/", ?id)) as ?bnbID) 3 4 SERVICE { 4 5 ?work dct:creator ?bnbID. 5 6 ?work dct:title ?workLabel 6 7 } 7 8 } 8 9 LIMIT 20 9 10 10 11 Fig. 1. A federated SPARQL query run in Wikidata query browser to retrieve the works of William Shakespeare from the BNB LOD platform. 11 12 12 13 { # An has: 13 14 :givenName xsd:string+, # at least one givenName. 14 15 foaf:familyName xsd:string, # one familyName. 15 16 foaf:phone IRI*, # any number of phone numbers. 16 17 foaf:mbox IRI # one FOAF mbox. 17 18 } 18 19 19 20 # Person1 matches PersonShape 20 21 21 22 foaf:givenName "Gustave" ; 22 23 foaf:familyName "Eiffel" ; 23 24 # no phone number needed 24 25 foaf:mbox 25 26 . 26 27 27 28 28 29 Fig. 2. A ShEx Shape to validate a person described using the FOAF ontology. Person1 matches PersonShape including the required properties. 29 30 30 31 describes the requirements that need to be considered tion of resources, (iii) definition of ShEx rules and (iv) 31 32 during the design of such a software [39]. However, validation. The output of the validation step is a report 32 33 to the best of our knowledge, none has been carried describing the results of the evaluation. 33 34 out an evaluation of the LOD published by DLs using We have selected ShEx in this methodology since it 34 35 ShEx. has become very popular in the research community. 35 36 This paper is based on the previously published In addition, ShEx enables reproducibility allowing re- 36 37 benchmarking and data-quality criteria applied to the searchers to improve the definitions. Moreover, several 37 38 38 LOD made openly available by four relevant libraries: tools have recently been developed to automate the val- 39 39 the Biblioteca Nacional de España (BNE), the Biblio- idation process. 40 thèque nationale de France (BnF), the British National 40 Although the publication of LOD repositories has 41 Bibliography (BNB) and the Biblioteca Virtual Miguel 41 recently increased, there are some cases in which the 42 de Cervantes (BVMC) [13] . 42 43 URL is no longer available due to several reasons and 43 44 making difficult its reuse. In this sense, their exploita- 44 45 3. Methodology tion and analysis requires specific knowledge about 45 46 Semantic Web technologies. However, their promotion 46 47 This section introduces the methodology to assess by means of prototypes and examples of reuse can help 47 48 the data-quality of LOD published by libraries using easing entry barriers. 48 49 ShEx. The procedure is described in Figure 3 and is In addition to the publication of the LOD repository, 49 50 based on 4 steps, which are detailed in the following metadata can be enriched using external repositories. 50 51 subsections: (i) selection of a repository, (ii) identifica- This information can also be assessed in order to iden- 51 G. Candela et al. / A ShEx approach for assessing the quality of LOD in DLs 5

1 tify duplicates as well as to validate the number of ex- with different levels of granularity depending on the 1 2 ternal links. vocabulary. For instance, DC includes the properties 2 3 dc:author and dc:contributor while RDA 3 4 3.1. Selection provides a much expressive vocabulary to describe the 4 5 roles with works, expressions and manifestations. 5 6 The selection of a LOD repository is a critical fac- 6 tor as well as a complicated task since many institu- 7 3.3. Definition of ShEx rules 7 tions are publishing their metadata as LOD. Choosing 8 8 the right subject ensures the possibility of replicating 9 A collection of RDF triples can be assessed by 9 existing results as well as to address new challenges by 10 means of a ShEx definition to determine whether 10 researchers. 11 the collection meets the requirements defined in the 11 In this sense, benchmarks provide an experimental 12 schema. 12 basis for evaluating and comparing the performance of 13 According to the entities and its properties identified 13 computer systems [40, 41] as well as the possibility 14 in the previous step, ShEx rules are defined to assess 14 of replicating existing results [42]. Previous research 15 RDF data. ShEx can be represented in JSON structures 15 are based on four LOD repositories published by li- 16 (ShExJ) –intended for human consumption– or a com- 16 braries –BVMC, BnF, BNB and BNE– that serve as 17 pact syntax (ShExC) –for machine processing– [44]. 17 benchmark and discusses the methodology employed 18 ShEx has several serialization formats [45]: 18 to evaluate linked data in DLs [13]. 19 19 There are many aspects to consider when using a – a concise, human-readable compact syntax (ShExC); 20 20 LOD repository. For instance, open licenses and clear – a JSON-LD syntax (ShExJ) which serves as an 21 21 terms of use and conditions are key when reusing abstract syntax; 22 22 datasets. Depending on the requirements, a SPARQL – an RDF representation (ShExR) derived from the 23 23 endpoint may be necessary in order to assess the infor- JSON-LD syntax. 24 24 mation provided by the repository. Table 1 shows an 25 Following other approaches [12], the ShEx-based 25 overview of LOD repositories published by libraries 26 validation workflow for DLs consists of: 26 and the vocabulary used. 27 27 1. writing a schema for the data type in question; 28 28 3.2. Identification of resources 2. transferring that schema into the DL model of 29 29 items, statements, qualifiers and references; 30 The identification of resources is a crucial step in 30 3. writing a ShEx manifest for the DL-based schema; 31 order to analyse the elements and properties to be as- 31 32 sessed by means of ShEx. A manifest file include several properties: (i) a label 32 33 Publication workflows in libraries are increasingly for the schema; (ii) a ShEx schema; (iii) a data label 33 34 complicated since maintenance of the metadata is a describing the dataset; (iv) a data property including a 34 35 dynamic and evolving process [43]. In this sense, SPARQL endpoint; (v) the SPARQL query to retrieve 35 36 bibliographic information is stored as metadata us- the data; and (vi) a status property with the value con- 36 37 ing common resources (e.g. author, work, date). Meta- formant. Figure 4 shows an example of manifest file 37 38 data is available in an increasing number of options to test entities typed as Person at the LOD repository 38 39 including FRBR, BIBFRAME, RDA, published by the BNE. ShEx manifests can be hosted 39 40 (DC), schema.org, Europeana Data Model (EDM) and on GitHub so it can be used by online services. 40 41 CIDOC-CRM. In addition, the vocabulary used to de- In addition, the definition of ShEx constraints for an 41 42 scribe the contents can be complex such as the partic- existing dataset and its validation can be performed by 42 43 ular case of FRBR based vocabularies in which enti- means of graphical tools aimed at novices and experts 43 44 ties typed as Work follow a hierarchical organization that enable combination and modification functionali- 44 45 including several layers. ties in order to build a complex ShEx schema [46]. 45 46 Table 2 shows an overview of the main entities in 46 47 LOD vocabularies used by libraries to publish their 3.4. Validation results 47 8 48 catalogs. Each entity may include several properties 48 49 The last step consists on testing entity data from the 49 50 8The prefixes used to abbreviate RDF vocabularies can be found DL for conformance to the ShEx manifest defined in 50 51 in the appendix (Table 4). the previous step. 51 6 G. Candela et al. / A ShEx approach for assessing the quality of LOD in DLs

1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 10 10 11 11 12 12 13 13 14 14 15 15 16 16 17 17 18 18 19 Fig. 3. Methodology for assessing the data-quality of LOD repositories published by digital libraries using ShEx. 19 20 20 21 21 22 22 23 23 24 Table 1 24 25 Overview of LOD repositories published by libraries. 25 26 26 Institution Vocabulary URL 27 27 28 Biblioteca Nacional de España FRBR http://datos.bne.es 28 Biblioteca Virtual Miguel de Cervantes RDA data.cervantesvirtual.com 29 29 Bibliothèque nationale de France FRBR data.bnf.fr 30 30 BNB Linked Data Platform BIBO https://bnb.data.bl.uk/ 31 31 Europeana EDM https://pro.europeana.eu/page/sparql 32 32 Library of Congress BIBFRAME https://id.loc.gov/ 33 National Library of Netherlands LRM https://data.bibliotheken.nl 33 34 34 35 35 36 36 37 37 38 38 39 { 39 40 "schemaLabel": "BNE person entities", 40 41 "schemaURL": "https://raw.githubusercontent.com/hibernator11/ 41 42 ShEx-DLs/master/bne-person.shex", 42 43 "dataLabel": "Get 20 items typed as person from datos.bne.es" , 43 44 "data": "Endpoint: http://datos.bne.es/sparql", 44 45 "queryMap": "SPARQL ’’’select ?item where 45 46 {\r\n ?item rdf:type ns2:C1005\r\n}\r\n limit 20’’’@START", 46 47 "status": "conformant" 47 48 } 48 49 49 50 Fig. 4. An example of manifest file to test entities typed as Person – that corresponds to the class ns2:C10005– at the LOD repository published 50 51 by the BNE. 51 G. Candela et al. / A ShEx approach for assessing the quality of LOD in DLs 7

Table 2 1 bibliographic information, in particular the entities and 1 Main entities described by LOD vocabularies used by digital li- 2 properties. 2 braries to publish bibliographic information. 3 Both datasets are linked to Wikidata by means of 3 4 Vocabulary Entities specific properties. In this sense, and in addition to the 4 5 5 BIBFRAME Work, Person ShEx definitions created according to the vocabular- 6 ies used by the DLs, we have created a ShEx schema 6 BIBO Event, Agent, Book, Newspaper 7 per DL to assess that the resources linked to Wiki- 7 CIDOC-CRM Creation, Person, Event, Place 8 data are typed as human (wd:Q5) via the public Wiki- 8 Dublin Core BibliographicResource, Agent, Location 9 data SPARQL endpoint provided by the Wikidata in- 9 EDM ProvidedCHO, WebResource, Aggregation, 10 10 Agent, Concept frastructure. The ShEx definitions have been made 11 11 FRBR Work, Expression, Manifestation, Agent grouped by DL in a manifest file (see Figure 3). The 12 project is available on GitHub10 and has been made 12 LRM Work, Expression, Manifestation, Agent 13 citable via Zenodo.11 13 RDA Work, Expression, Manifestation, Agent 14 14 schema.org CreativeWork, Book, Event, Person, Organi- 15 15 zation 4.1. The BNB Linked Data Platform 16 16 17 The BNB Linked Data Platform provides access to 17 9 18 The ShEx2 Simple Online Validator can be used the British National Bibliography12 published as LOD 18 19 to test and experiment with ShEx. The prototype pro- and made available through SPARQL services. The 19 20 vides several examples to show how to use ShEx. The Linked Open BNB is a subset of the full BNB in- 20 21 validator requires a ShEx expression and a SPARQL cluding published books, serial publications and new 21 22 endpoint and query to retrieve the entities in order to and forthcoming books, representing approximately 22 23 assess them. The results are shown item by item allow- 4.4 million records. The dataset is available under a 23 24 ing to fix possible issues in the definition of the ShEx Creative Commons CC0 licence.13 24 25 rules. The dataset is accessible through different inter- 25 26 Prototypes and tools like this example enable the faces: (i) a SPARQL online editor; (ii) a SPARQL end- 26 27 reproducibility of the results allowing researchers to point for remote access; and (iii) a web interfaces pro- 27 28 replicate, reuse and extend findings, and thereby driv- viding a search box to enter a plain text term. 28 29 ing scientific progress. However, There are some as- The BNB dataset has been modelled and repre- 29 30 pects to consider when using a LOD dataset published sented in RDF using a number of standard schemas in- 30 31 by a DL for assessment. For instance, in order to use cluding the British Library Terms,14 BIBO, the uncon- 31 32 the ShEx2 Simple Online Validator, the DL must pro- strained version of the RDA element sets, schema.org 32 33 vide a SPARQL endpoint via the secure HTTPS proto- and DC, amongst others. In addition, the BNB dataset 33 34 col. has been enriched by means of the creation of links to 34 35 several external datasets such as Wikidata, GeoNames 35 36 and VIAF. The Book Data model provides an overview 36 37 4. Assessing the quality of LOD published by DLs of the main classes and properties involved in the data 37 38 model.15 38 39 According to the previously published benchmark- 39 4.1.1. Validating the dataset 40 ing applied to the LOD made openly available by 40 41 DLs [13], this section introduces the application of the The identification of resources is performed by 41 42 methodology introduced in Section 3 to two uses cases means of the SPARQL query in Figure 5 that shows an 42 43 based on relevant libraries. In the first use case, the example of how to retrieve the different classes stored 43 44 BNB Linked Data platform is used by means of its in a RDF repository. However, several classes may be 44 45 SPARQL endpoint. In the second use case, the val- used to type the same resource. For instance, a book 45 46 idation is performed using the LOD repository pub- 46 47 lished by the BnF. The main difference between the 10https://github.com/hibernator11/ShEx-DLs 47 11 48 two repositories is the vocabulary used to describe the https://doi.org/10.5281/zenodo.4022755 48 12https://www.bl.uk/collection-metadata/metadata-services 49 49 13http://creativecommons.org/publicdomain/zero/1.0/ 50 9https://rawgit.com/shexSpec/shex.js/wikidata/packages/ 14http://www.bl.uk/schemas/bibliographic/blterms 50 51 shex-webapp/doc/shex-simple.html 15http://www.bl.uk/bibliographic/pdfs/bldatamodelbook.pdf 51 8 G. Candela et al. / A ShEx approach for assessing the quality of LOD in DLs

SELECT DISTINCT ?type 1 4.2.1. Validating the dataset 1 WHERE { 2 An overview of the main classes stored in the LOD 2 ?s a ?type. 3 repository has been extracted and is shown in Figure 3 } 4 10. A new vocabulary has been defined to describe 4 5 roles in which resources are linked to the Library of 5 6 Fig. 5. A SPARQL query to retrieve the different classes stored in a Congress subject headings (LCSH).19 6 7 RDF repository. Once we extracted the main resources described 7 8 in the repository and identified their type, we ex- 8 9 can be typed as bibo:Book, schema:Book and tracted the properties per each class using SPARQL 9 10 dc-terms:BibliographicResource. As a re- queries. For instance, Figure 11 shows a SPARQL 10 11 sult, Figure 6 shows a summary of the main classes query to retrieve the different properties used by the 11 12 12 used in the BNB repository. class frbr-rda:Work. In this case, 394 unique 13 13 Once we extracted the main resources described properties where identified to define the ShEx schema 14 14 in the repository and identified their type, we ex- including a long list of roles. 15 A ShEx schema was defined per each class to per- 15 tracted the properties per each class using SPARQL 16 form the validation as is shown in Figure 12. Similar 16 queries. For instance, Figure 7 shows a SPARQL query 17 to the previous use case, all the ShEx schemas were in- 17 to retrieve the different properties used by the class 18 cluded in a manifest file that is used by the validation 18 bibo:Book. 19 tool as an input.20 19 20 A ShEx definition was created per each class to per- In addition, we have created an additional ShEx 20 21 form the assessment. As an example, the correspond- schema to assess that the resources linked from Wiki- 21 22 ing definition for the bibo:Book can be found in data to data.bnf.fr by means of the property wdt:P268 22 23 Figure 8. All the definitions were included in a mani- are typed as human (wd:Q5). 23 24 fest file that can be consumed by the ShEx validation 24 16 25 tool [47]. 4.3. Results and discussion 25 26 In addition, we have created an additional ShEx 26 27 schema to asses the resources linked to the BNB In order to assess the datasets, the main resources 27 28 Linked Data platform from Wikitada by means of the have been identified and validated using a random 28 29 property wdt:P5361 are typed as human (wd:Q5) sample of 1000 items retrieved per entity and DL from 29 30 their SPARQL endpoints. In total, 15 ShEx definitions 30 31 4.2. Bibliothèque nationale de France as LOD: have been created to validate the LOD published by 31 32 data.bnf.fr digital libraries. Table 3 shows the description of the 32 33 classes, ShEx and manifest files used to assess each 33 34 DL. Figure 9 shows the ShEx validation interface con- 34 35 The data.bnf.fr project endeavours to make the data 35 produced by Bibliothèque nationale de France more suming the manifest file and presenting the evaluation 36 results for the BNB Linked Data platform. 36 useful on the Web using Semantic Web technologies. 37 When creating the ShEx definitions, preliminary 37 The dataset integrates several including 38 tests were performed to pass the validation and after 38 the BnF main catalogue, BnF archives et manuscripts, 39 several iterations we were able to address all the is- 39 40 and Gallica. The data model is based on FRBR, foaf 40 sues. For some classes including a large number of 41 and SKOS as main vocabularies and provides links resources, the properties have been extracted manu- 41 42 to external repositories such as GeoNames, Library of ally since the SPARQL endpoint produced some errors 42 17 43 Congress and VIAF. The dataset can be used via a due to the complexity and time of the query. For in- 43 18 44 public SPARQL endpoint or as a dump file. stance, when using many properties in a ShEx defini- 44 45 tion, we may receive a 404 HTTP error (Request-URI 45 46 46 47 16See, for instance, https://rawgit.com/shexSpec/shex.js/wikidata/ 47 19 48 packages/shex-webapp/doc/shex-simple.html?manifestURL=https: https://data.bnf.fr/vocabulary/roles/ 48 //raw.githubusercontent.com/hibernator11/ShEx-DLs/master/bnb. 20https://rawgit.com/shexSpec/shex.js/wikidata/packages/ 49 49 manifest.json shex-webapp/doc/shex-simple.html?manifestURL=https: 50 17https://data.bnf.fr/en/opendata //raw.githubusercontent.com/hibernator11/ShEx-DLs/master/bnf. 50 51 18http://api.bnf.fr/dumps-de-databnffr manifest.json 51 G. Candela et al. / A ShEx approach for assessing the quality of LOD in DLs 9

1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 10 10 11 11 12 12 13 13 14 14 15 15 16 16 17 17 18 18 19 19 20 Fig. 6. Main classes retrieved from BNB LOD platform based on BIBO, SKOS and FOAF controlled vocabularies, and how they interact to 20 21 create meaning. 21 22 22 23 Table 3 23 24 Description of the classes, ShEx definitions and manifest files used to assess each DL and provided in the Github project. 24 25 DL Manifest file Class ShEx file 25 26 26 27 BNB bnb.manifest.json foaf:Agent bnb-agent.shex 27 28 bibo:Book bnb-book.shex 28 29 skos:Concept bnb-concept.shex 29 30 blterms:Publication bnb-publication.shex 30 31 wd:Q5 bnb-wikidata.shex 31 32 BnF bnf.manifest.json foaf:Agent bnf-agent.shex 32 33 frbr:Work bnf-work1.shex 33 34 frbr-rda:Work bnf-work2.shex 34 35 frbr-rda:Expression bnf-expression.shex 35 36 frbr-rda:Manifestation bnf-manifestation.shex 36 37 skos:Concept bnf-concept.shex 37 38 foaf:Organization bnf-organization.shex 38 39 foaf:Person bnf-person.shex 39 40 geo:SpatialThing bnf-place.shex 40 41 wd:Q5 bnf-wikidata.shex 41 42 42 43 Too Large). In this sense, the schemas could be further and use a SPARQL engine to create custom queries 43 44 such as RDF4J.21 44 refined with additional properties and rules to obtain 45 Moreover, some resources may not include enough 45 46 better performance in the results. In some cases, orga- information to be assessed. For instance, the re- 46 47 sources typed as skos:Concept only includes a 47 nizations provide a dump file instead of having a public 48 rdfs:label in the BNB Linked Data platform. In 48 49 49 SPARQL endpoint available. For instance, the Library 50 50 51 of Congress suggests to download the bulk metadata 21https://id.loc.gov/techcenter/searching.html 51 10 G. Candela et al. / A ShEx approach for assessing the quality of LOD in DLs

1 PREFIX bibo: common classes and properties used by DLs will be 1 2 analysed. 2 3 3 4 SELECT distinct ?p WHERE { 4 5 ?s a bibo:Book. Appendix A. List of prefixes 5 6 ?s ?p ?o 6 7 } The prefixes in Table 4 are used to abbreviate 7 8 LIMIT 10 namespaces throughout this paper. 8 9 9 10 Fig. 7. A SPARQL query to retrieve the different properties used by 10 11 the class bibo:Book. Appendix. References 11 12 12 13 some cases the same schema can be used for different [1] Research Libraries UK, A manifesto for the digital shift in re- 13 search libraries, 2020, [Online; accessed 20-October-2020]. 14 classes such as blterms:PublicationEvent 14 and blterms:ProjectedPublicationEvent [2] T. Padilla, Responsible Operations: Data Science, Machine 15 Learning, and AI in Libraries, OCLC Research, 2019. https: 15 16 since they are based on the same properties and vocab- //doi.org/10.25333/xk7z-9g97. 16 17 ularies. The ShEx definitions defined for the Bnf are [3] T. Padilla, L. Allen, H. Frost, S. Potvin, E. Russey Roke 17 18 more detailed since the FRBR model provides addi- and S. Varner, Final Report — Always Already 18 Computational: Collections as Data, Zenodo, 2019. 19 tional classes to describe the resources instead of the 19 BIBO vocabulary. doi:10.5281/zenodo.3152935. 20 [4] M. Mahey, A. Al-Abdulla, S. Ames, P. Bray, G. Candela, 20 21 The ShEx Online Validator requires a public SPARQL C. Derven, M. Dobreva-McPherson, K. Gasser, S. Chambers, 21 22 endpoint that uses HTTPS to test the entities. How- S. Karner, K. Kokegei, D. Laursen, A. Potter, A. Straube, S.- 22 23 ever, some organizations do not provide this protocol C. Wagner and L. Wilms, Open a GLAM lab, 2019. ISBN 978- 23 9927-139-07-9. 24 in their services such as the BNE and BVMC. 24 [5] T. Berners-Lee, J. Hendler and O. Lassila, The Semantic Web 25 in Scientific American, Scientific American Magazine 284 25 26 (2001). 26 27 5. Conclusions [6] J. Godby, S.-Y. Karen, W. Bruce, D. Kalan, D. Karen, 27 28 F.E. Christine, S. Folsom, X. Li, M. McGee, K. Miller, 28 H. Moody, H. Tomren, and C. Thomas, Creating Li- 29 Cultural heritage institutions are exploring Semantic 29 brary Linked Data with Wikibase: Lessons Learned from 30 Web technologies to publish and enrich their catalogs. Project Passage, OCLC Research, 2019. https://doi.org/10. 30 31 While LOD repositories can be reused in innovative 25333/faq3-ax08. 31 32 and creative ways, data quality has become a crucial [7] Association of Research Libraries (ed.), ARL Task Force 32 33 aspect when identifying a dataset for reuse. on Wikimedia and Linked Open Data. ARL White Paper 33 on Wikidata: Opportunities and Recommendations, 2019. 34 Based on previous research, we defined a methodol- 34 https://www.arl.org/storage/documents/publications/2019.04. 35 ogy based on ShEx described in Section 3 for assess- 18-ARL-white-paper-on-Wikidata.pdf. 35 36 ing the quality of LOD repositories published by DLs. [8] P. Mika, T. Tudorache, A. Bernstein, C. Welty, C.A. Knoblock, 36 37 The methodology have been applied to two use cases. D. Vrandecic, P.T. Groth, N.F. Noy, K. Janowicz and 37 38 The expressivity of ShEx enables the identification C.A. Goble (eds), The Semantic Web - ISWC 2014 - 13th In- 38 ternational Semantic Web Conference, Riva del Garda, Italy, 39 of inconsistencies and the validation in large RDF 39 October 19-23, 2014. Proceedings, Part I, in Lecture Notes in 40 datasets. In this sense, ShEx addresses the need of the Computer Science, Vol. 8796, Springer, 2014. ISBN 978-3- 40 41 Semantic Web community to ensure data quality for 319-11963-2. doi:10.1007/978-3-319-11964-9. 41 42 RDF graphs. ShEx is useful as documentation of the [9] A. Zaveri, A. Rula, A. Maurino, R. Pietrobon, 42 43 LOD datasets published by DLs since it provides a J. Lehmann and S. Auer, Quality Assessment for 43 Linked Data: A Survey, Semantic Web Journal 44 human-readable representation that helps researchers 44 (2015). http://www.semantic-web-journal.net/content/ 45 to understand the data model and evaluate their suit- quality-assessment-linked-data-survey. 45 46 ability for reuse in creative and innovative ways pro- [10] M. Färber, F. Bartscherer, C. Menne and A. Rettinger, Linked 46 47 moted by Labs. data quality of DBpedia, Freebase, OpenCyc, Wikidata, and 47 48 Future work to be explored includes the improve- YAGO, Semantic Web 9(1) (2018), 77–129. doi:10.3233/SW- 48 170275. 49 49 ment of the ShEx definitions and the inclusion of addi- [11] E. Prud’hommeaux, I. Boneva, J.E.L. Gayo and G. Kel- 50 tional use cases. Moreover, the extension of the ShEx logg, Shape Expressions Language 2.1, 2019. http://shex.io/ 50 51 validation tool in terms of DLs requirements such as shex-semantics/index.html. 51 G. Candela et al. / A ShEx approach for assessing the quality of LOD in DLs 11

1 start = @ 1 2 { 2 3 rdfs:label xsd:string ; 3 4 dct:title xsd:string ; 4 5 blterms:bnb xsd:string ; 5 6 dct:language IRI* ; 6 7 dct:subject IRI* ; 7 8 dct:alternative xsd:string* ; 8 9 dct:description rdf:langString* ; 9 10 dct:abstract rdf:langString* ; 10 11 dct:spatial IRI* ; 11 12 bibo:isbn13 xsd:string* ; 12 13 bibo:isbn10 xsd:string* ; 13 14 schema:isbn xsd:string* ; 14 15 schema:identifier IRI+ ; 15 16 schema:datePublished xsd:string ; 16 17 schema:name xsd:string ; 17 18 schema:author IRI* ; 18 19 schema:contributor IRI* ; 19 20 blterms:projectedPublication IRI* ; 20 21 blterms:publication IRI* ; 21 22 owl:sameAs IRI* ; 22 23 isbd:P1053 rdf:langString* ; 23 24 isbd:P1042 rdf:langString* ; 24 25 isbd:P1073 rdf:langString* ; 25 26 rdau:P60048 IRI* ; 26 27 rdau:P60049 IRI* ; 27 28 rdau:P60050 IRI* ; 28 29 } 29 30 30 31 31 32 Fig. 8. A ShEx Shape to validate the resources typed as bibo:Book at BNB Linked Data platform. Each line corresponds to a property based 32 33 on a particular vocabulary used to describe the resources. 33 34 34 35 [12] K. Thornton, H. Solbrig, G.S. Stupp, J.E. Labra Gayo, D. Mi- [17] T.W. Cole, M.-J. Han, W.F. Weathers and E. Joyner, Library 35 36 etchen, E. Prud’hommeaux and A. Waagmeester, Using Shape Marc Records Into Linked Open Data: Challenges and Oppor- 36 Expressions (ShEx) to Share RDF Data Models and to Guide tunities, Journal of Library Metadata 13(2–3) (2013), 163– 37 37 Curation with Rigorous Validation, in: The Semantic Web, 196. doi:10.1080/19386389.2013.826074. 38 38 P. Hitzler, M. Fernández, K. Janowicz, A. Zaveri, A.J.G. Gray, [18] IFLA, IFLA Study Group on the FRBR. Functional Require- 39 V. Lopez, A. Haller and K. Hammar, eds, Springer Interna- ments for Bibliographic Records, IFLA Series on Biblio- 39 40 tional Publishing, Cham, 2019, pp. 606–620. ISBN 978-3-030- graphic Control, München, 1998. 40 41 21348-0. [19] RDA Steering Committee, RDA Toolkit: Resource Description 41 and Access, 2012, [Online; accessed 19-November-2018]. 42 [13] G. Candela, P. Escobar, R.C. Carrasco and M. Marco-Such, 42 [20] IFLA, IFLA Library Reference Model (LRM), 2017. 43 Evaluating the quality of linked open data in digital libraries, 43 https://www.ifla.org/files/assets/cataloguing/frbr-lrm/ Journal of Information Science 0(0) (0), 0165551520930951. 44 ifla-lrm-august-2017_rev201712.pdf. 44 doi:10.1177/0165551520930951. 45 [21] B. D’Arcus and F. Giasson, Bibliographic Ontology, 2009. 45 [14] Tim Berners-Lee, Linked-data design issues. W3C de- 46 http://bibliontology.com/. 46 sign issue document, 2006. http://www.w3.org/DesignIssues/ 47 [22] Library of Congress, Bibliographic Framework Initiative. 47 LinkedData.html. https://www.loc.gov/bibframe/. 48 48 [15] Consortium, RDF 1.1 Concepts and Abstract [23] CIDOC Documentation Standards Working Group, CIDOC 49 49 Syntax, 2014. https://www.w3.org/TR/rdf11-concepts/. Conceptual Reference Model. http://www.cidoc-crm.org/. 50 [16] World Wide Web Consortium, SPARQL 1.1 Query Language, [24] T. Aalberg and M. Zumer, Looking for Entities in Biblio- 50 51 2013. https://www.w3.org/TR/sparql11-query/. graphic Records, in: Digital Libraries: Universal and Ubiq- 51 12 G. Candela et al. / A ShEx approach for assessing the quality of LOD in DLs

1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 10 10 11 11 12 12 13 13 14 14 15 15 16 16 17 17 18 Fig. 9. The ShEx validator interface that uses the manifest file provided for the BNB Linked Data platform to assess each of the ShEx definitions 18 19 showing the results. 19 20 20 21 21 22 22 23 23 24 24 25 25 26 26 27 27 28 28 29 29 30 30 31 31 32 32 33 33 34 34 35 35 36 36 37 37 38 Fig. 10. Overview of the main classes based on FRBR, FOAF and SKOS retrieved from data.bnf.fr and how they interact to create meaning. 38 39 39 40 uitous Access to Information, 11th International Conference [27] K. Coombs, Federated Queries with SPARQL, 40 41 on Asian Digital Libraries, ICADL 2008, Bali, Indonesia, De- 2016. https://www.oclc.org/developer/news/2016/ 41 42 cember 2-5, 2008. Proceedings, G. Buchanan, M. Masoodian federated-queries-with-sparql.en.html. 42 43 and S.J. Cunningham, eds, Lecture Notes in Computer Science, [28] J. Debattista, C. Lange, S. Auer and D. Cortis, Evaluating the 43 44 Vol. 5362, Springer, 2008, pp. 327–330. doi:10.1007/978-3- quality of the LOD cloud: An empirical investigation, Semantic 44 45 540-89533-6_36. Web 9(6) (2018), 859–901. doi:10.3233/SW-180306. 45 46 [25] D. Vila-Suero, B. Villazón-Terrazas and A. Gómez-Pérez, [29] B. Baillieul John, O. Hall Larry, M.F. Moura Jose, S. Hemami 46 47 datos.bne.es: A library linked dataset, Semantic Web 4(3) Sheila, G. Setti, G. Grenier, B. Forster Michael, F. Zappulla, 47 48 (2013), 307–313. doi:10.3233/SW-120094. J. Keaton, D. McCormick and L. Moore Kenneth, The first 48 49 [26] G. Candela, P. Escobar, R.C. Carrasco and M. Marco-Such, IEEE workshop on the Future of Research Curation and Re- 49 50 Migration of a library catalogue into RDA linked open data, search Reproducibility, OpenBU, 2017. https://open.bu.edu/ 50 51 Semantic Web 9(4) (2018), 481–491. doi:10.3233/SW-170274. handle/2144/39028. 51 G. Candela et al. / A ShEx approach for assessing the quality of LOD in DLs 13

1 Table 4 1 2 Common prefixes used to designate RDF vocabularies. 2 3 prefix URI 3 4 bibo http://purl.org/ontology/bibo/ 4 5 blterms http://www.bl.uk/schemas/bibliographic/blterms# 5 6 bneonto http://datos.bne.es/def/ 6 7 bnf-onto http://data.bnf.fr/ontology/bnf-onto/ 7 8 bnfroles http://data.bnf.fr/vocabulary/roles/ 8 9 dcmitype http://purl.org/dc/dcmitype/ 9 10 dcterms http://purl.org/dc/terms/ 10 11 foaf http://xmlns.com/foaf/0.1/ 11 12 frbr http://iflastandards.info/ns/fr/frbr/frbrer/ 12 13 frbr-rda http://rdvocab.info/uri/schema/FRBRentitiesRDA 13 14 isbd http://iflastandards.info/ns/isbd/elements/ 14 15 owl http://www.w3.org/2002/07/owl# 15 16 prov http://www.w3.org/ns/prov# 16 17 rdaa http://rdaregistry.info/Elements/a/ 17 18 rdac http://rdaregistry.info/Elements/c/ 18 19 rdae http://rdaregistry.info/Elements/e/ 19 20 rdam http://rdaregistry.info/Elements/m/ 20 21 rdaw http://rdaregistry.info/Elements/w/ 21 22 rdf http://www.w3.org/1999/02/22-rdf-syntax-ns# 22 23 rdfs http://www.w3.org/2000/01/rdf-schema# 23 24 rdau http://rdaregistry.info/Elements/u/> 24 25 schema http://schema.org/ 25 26 skos http://www.w3.org/2004/02/skos/core# 26 27 wdt http://www.wikidata.org/entity/ 27 28 wd http://www.wikidata.org/entity/ 28 29 wemi http://rdvocab.info/RDARelationshipsWEMI/ 29 30 xsd http://www.w3.org/2001/XMLSchema# 30 31 31 32 PREFIX frbr-rda: Data, Semantic Web 9(1) (2018), 3–24. doi:10.3233/SW- 32 33 [32] E. Prud’hommeaux, J.E. Labra Gayo and H. Solbrig, Shape 34 35 Expressions: An RDF Validation and Transformation Lan- 35 guage, in: Proceedings of the 10th International Conference 36 SELECT distinct ?p 36 WHERE { on Semantic Systems, SEM ’14, Association for Computing 37 Machinery, New York, NY, USA, 2014, pp. 32–40–. ISBN 37 ?s a frbr-rda:Work . 38 9781450329279. doi:10.1145/2660517.2660523. 38 39 ?s ?p ?o [33] H.R. Solbrig, E. Prud’hommeaux, G. Grieve, L. McKenzie, 39 40 FILTER (!regex(?p, "roles")) . J.C. Mandel, D.K. Sharma and G. Jiang, Modeling and validat- 40 41 FILTER (!regex(?p, "relators")) . ing HL7 FHIR profiles using semantic web Shape Expressions 41 (ShEx), Journal of Biomedical Informatics 67 (2017), 90– 42 } LIMIT 100 42 100. doi:https://doi.org/10.1016/j.jbi.2017.02.009. http://www. 43 sciencedirect.com/science/article/pii/S1532046417300345. 43 44 Fig. 11. A SPARQL query to retrieve the different properties used [34] K. Thornton, H. Solbrig, G.S. Stupp, J.E.L. Gayo, D. Mi- 44 45 by the class frbr-rda:Work. The FILTER instructions exclude etchen, E. Prud’hommeaux and A. Waagmeester, Using Shape 45 46 the roles and relators properties. Expressions (ShEx) to Share RDF Data Models and to 46 47 Guide Curation with Rigorous Validation, in: The Semantic 47 Web - 16th International Conference, ESWC 2019, Portorož, 48 [30] A. Piscopo, Wikidata:Requests for comment/Data quality 48 framework for Wikidata, 2016, [Online; accessed 11-February- Slovenia, June 2-6, 2019, Proceedings, 2019, pp. 606–620. 49 49 2018]. doi:10.1007/978-3-030-21348-0_39. 50 [31] F. Radulovic, N. Mihindukulasooriya, R. García-Castro and [35] J.C.J. van Dam, J.J. Koehorst, P.J. Schaap and M. Suárez- 50 51 A. Gómez-Pérez, A comprehensive quality model for Linked Diez, The Empusa code generator: bridging the gap between 51 14 G. Candela et al. / A ShEx approach for assessing the quality of LOD in DLs

1 start = @ 1 2 { 2 3 rdfs:label xsd:string ; 3 4 owl:sameAs IRI* ; 4 5 dct:title rdf:langString ; 5 6 dct:subject IRI* ; 6 7 dct:creator IRI* ; 7 8 dct:contributor IRI* ; 8 9 dct:language IRI* ; 9 10 dct:description rdf:langString* ; 10 11 dct:created xsd:string* ; 11 12 dct:publisher xsd:string* ; 12 13 dct:frequency xsd:string* ; 13 14 bnf-onto:subject IRI* ; 14 15 bnf-onto:translation IRI* ; 15 16 bnf-onto:electronicEdition IRI* ; 16 17 bibo:issn xsd:string* ; 17 18 rdam:P30135 IRI* ; 18 19 rdam:P30086 IRI* ; 19 20 rdam:P30016 IRI* ; 20 21 rdam:P30088 IRI* ; 21 22 rdam:P30176 IRI* ; 22 23 wemi:workManifested IRI* ; 23 24 wemi:expressionOfWork IRI* ; 24 25 wemi:electronicReproduction IRI* ; 25 26 } 26 27 27 28 28 29 Fig. 12. A ShEx to validate the resources typed as rda-frbr:Work at data.bnf.fr. Each line corresponds to a property used to describe the 29 30 resources. 30 31 31 32 the intended and the actual content of RDF resources, CoRR York, NY, USA, 2017, pp. 111–115–. ISBN 9781450352659. 32 abs/1812.04386 (2018). http://arxiv.org/abs/1812.04386. doi:10.1145/3078081.3078109. 33 33 [36] H. García-González and J.E.L. Gayo, XMLSchema2ShEx: [40] S.E. Sim, S.M. Easterbrook and R.C. Holt, Using Benchmark- 34 34 Converting XML validation to RDF validation, Semantic Web ing to Advance Research: A Challenge to Software Engi- 35 11(2) (2020), 235–253. doi:10.3233/SW-180329. neering, in: Proceedings of the 25th International Conference 35 36 [37] V. Charles, J. Stiller, P. Király, W. Bailer and N. Freire, Data on Software Engineering, May 3-10, 2003, Portland, Oregon, 36 37 Quality Assessment in Europeana: Metrics for Multilinguality, USA, 2003, pp. 74–83. doi:10.1109/ICSE.2003.1201189. 37 in: Joint Proceedings of the 1st Workshop on Temporal Dynam- 38 [41] S.S. Heckman and L. Williams, On establishing a benchmark 38 ics in Digital Libraries (TDDL 2017), the (Meta)-Data Qual- for evaluating static analysis alert prioritization and classifica- 39 ity Workshop (MDQual 2017) and the Workshop on Model- 39 tion techniques, in: Proceedings of the Second International 40 ing Societal Future (Futurity 2017) co-located with 21st In- 40 Symposium on Empirical Software Engineering and Measure- 41 ternational Conference on Theory and Practice of Digital Li- 41 ment, ESEM 2008, October 9-10, 2008, Kaiserslautern, Ger- braries (TPLD 2017), Thessaloniki, Greece, September 21, 42 many, 2008, pp. 41–50. doi:10.1145/1414004.1414013. 42 2017, 2017. http://ceur-ws.org/Vol-2038/paper6.pdf. 43 [42] B. Spahiu, A. Maurino and R. Meusel, Topic profiling bench- 43 [38] P. Király, Validating 126 Million MARC Records, in: 44 Proceedings of the 3rd International Conference on Dig- marks in the linked open data cloud: Issues and lessons learned, 44 45 ital Access to Textual Cultural Heritage, DATeCH2019, Semantic Web 10(2) (2019), 329–348. doi:10.3233/SW- 45 46 Association for Computing Machinery, New York, NY, 180323. 46 [43] J. Baxmeyer, K. Coyle, J. Dyla, M. Han, S. Folsom, P. Schreur 47 USA, 2019, pp. 161–168–. ISBN 9781450371940. 47 and T. Thompson, Linked Data Infrastructure Models: Areas of 48 doi:10.1145/3322905.3322929. 48 [39] P. Király, Towards an Extensible Measurement of Meta- Focus for PCC Strategies, 2017. https://www.loc.gov/aba/pcc/ 49 49 data Quality, in: Proceedings of the 2nd International Con- documents/LinkedDataInfrastructureModels.pdf. 50 ference on Digital Access to Textual Cultural Heritage, [44] World Wide Web Consortium, Shape Expressions Language 50 51 DATeCH2017, Association for Computing Machinery, New 2.1, 2019. https://shex.io/shex-semantics/. 51 G. Candela et al. / A ShEx approach for assessing the quality of LOD in DLs 15

1 [45] J.E. Labra Gayo, E. Prud’hommeaux, I. Boneva and [49] G. Candela, P. Escobar, R.C. Carrasco and M. Marco-Such, A 1 2 D. Kontokostas, Validating RDF Data, Synthesis Lectures linked open data framework to enhance the discoverability and 2 3 on the Semantic Web: Theory and Technology, Vol. 7, impact of culture heritage, Journal of Information Science 0(0) 3 Morgan & Claypool Publishers LLC, 2017, pp. 1–328. 4 (0), 0165551518812658. doi:10.1177/0165551518812658. 4 doi:10.2200/s00786ed1v01y201707wbe016. [50] J. Jett, T.W. Cole, M.K. Han and C. Szylowicz, Linked 5 5 [46] I. Boneva, J. Dusart, D. Fernández Alvarez and J.E.L. Gayo, Open Data (LOD) for Library Special Collections, in: 2017 6 6 Shape Designer for ShEx and SHACL Constraints, 2019, ACM/IEEE Joint Conference on Digital Libraries, JCDL 2017, Poster. https://hal.archives-ouvertes.fr/hal-02268667. 7 Toronto, ON, Canada, June 19-23, 2017, 2017, pp. 309–310. 7 [47] E. Prud’hommeaux, T. Baker, Glenna, J.E.L. Gayo, 8 doi:10.1109/JCDL.2017.7991604. 8 mrolympia, andrawaag, L. Werkmeister and D. Booth, shex.js 9 9 - Javascript implementation of Shape Expressions, Zenodo, [51] M.H. Carrasco, S. Luján-Mora, A. Maté and J. Trujillo, Current 10 2018. doi:10.5281/zenodo.1213693. state of Linked Data in digital libraries, J. Information Science 10 11 [48] R.Y. Wang and D.M. Strong, Beyond Accuracy: What Data 42(2) (2016), 117–127. doi:10.1177/0165551515594729. 11 12 Quality Means to Data Consumers, J. of Management Infor- [52] E.T. Mitchell, Library Linked Data: Early Activity and De- 12 Library Technology Reports 13 mation Systems 12(4) (1996), 5–33. http://www.jmis-web.org/ velopment, 52(1) (2016), 5–13. 13 articles/1002. doi:10.5860/ltr.52n1. 14 14 15 15 16 16 17 17 18 18 19 19 20 20 21 21 22 22 23 23 24 24 25 25 26 26 27 27 28 28 29 29 30 30 31 31 32 32 33 33 34 34 35 35 36 36 37 37 38 38 39 39 40 40 41 41 42 42 43 43 44 44 45 45 46 46 47 47 48 48 49 49 50 50 51 51