Survey of Tools for Linked Data Consumption
Semantic Web 0 (2018) 1–58
IOS Press

Jakub Klímek a,*, Petr Škoda a and Martin Nečaský a
a Faculty of Mathematics and Physics, Charles University in Prague, Malostranské náměstí 25, 118 00 Praha 1, Czech Republic
E-mails: [email protected], [email protected], [email protected]

Editor: Ruben Verborgh, Ghent University – iMinds, Belgium
Solicited reviews: Elena Demidova, L3S Research Center Hannover, Germany; Ruben Taelman, Ghent University, Belgium; Daniel Garijo, University of Southern California, USA; Stefan Dietze, Heinrich-Heine-University Düsseldorf, Germany

Abstract. There is a large number of datasets published as Linked (Open) Data (LOD/LD). At the same time, there is also a multitude of tools for publication of LD. However, potential LD consumers still have difficulty discovering, accessing and exploiting LD. This is because, compared to consumption of traditional data formats such as XML and CSV files, there is a distinct lack of tools for consumption of LD. The promoters of LD use the well-known 5-star Open Data deployment scheme to suggest that consumption of LD is a better experience once the consumer knows RDF and related technologies. This suggestion, however, falls short when the consumers search for appropriate tooling support for LD consumption. In this paper we define an LD consumption process. Based on this process and current literature, we define a set of 34 requirements a hypothetical Linked Data Consumption Platform (LDCP) should ideally fulfill. We cover those requirements with a set of 94 evaluation criteria. We survey 110 tools identified as potential candidates for an LDCP, eliminating them in 3 rounds until 16 candidates remain. We evaluate the 16 candidates using our 94 criteria.
Based on this evaluation we show which parts of the LD consumption process are covered by the 16 candidates. Finally, we identify 8 tools which satisfy our requirements for an LDCP. We also show that there are important LD consumption steps which are not sufficiently covered by existing tools. The authors of LDCP implementations may use this survey to decide about directions of future development of their tools. LD experts may use it to see the level of support of the state-of-the-art technologies in existing tools. Non-LD experts may use it to choose a tool which supports their LD processing needs without requiring them to have expert knowledge of the technologies. The paper can also be used as an introductory text to LD consumption.

Keywords: Linked Data, dataset, consumption, discovery, visualization, tool, survey

* Corresponding author. E-mail: [email protected].
1570-0844/18/$35.00 © 2018 – IOS Press and the authors. All rights reserved

1. Introduction

The openness of data is often measured using the now widely accepted and promoted 5-star Open Data deployment scheme introduced by Sir Tim Berners-Lee¹. The individual stars are awarded for compliance with the following rules (citing the web):

1* make your stuff available on the Web (whatever format) under an open license
2* make it available as structured data (e.g., Excel instead of image scan of a table)
3* make it available in a non-proprietary open format (e.g., CSV instead of Excel)
4* use URIs to denote things, so that people can point at your stuff
5* link your data to other data to provide context

The last star is also denoted as Linked Open Data (LD/LOD). More and more data publishers are convinced that the additional effort put into the last 2 stars, i.e. LD in RDF as opposed to CSV files, will bring the promised benefits to the users of their data. The publishers rightfully expect that the users will appreciate the 5-star data much more than, e.g., the 3-star CSV files, given that the proper publication of 5-star data is considerably harder and more costly to achieve. In particular, they expect that the users will appreciate benefits such as better described and understandable data and its schema, safer data integration thanks to the global character of the IRIs and shared vocabularies, better reuse of tools, more context thanks to the links, and the ability to (re)use only parts of the data.

¹ http://5stardata.info

Open Data users are familiar with 3-star publication formats and principles, but it is hard to encourage them to use 5-star data. Often they do not know LD technologies, most notably the RDF data model, its serialization formats and the SPARQL query language. They refuse to learn them because they do not see the difference between 3-star and 5-star representations. For example, our initiative OpenData.cz has re-published 77 2-star and 3-star datasets published by Czech public bodies in 5-star representation². Moreover, as a part of the EU-funded project COMSODE, we participated in re-publishing other datasets published by Czech, Slovak, Italian and Dutch public bodies and several pan-European datasets in 5-star representation³. We also helped two Czech governmental bodies to publish their Open Data as 5-star data (the Czech Social Security Administration⁴ [1] and the Ministry of Finance⁵). We presented the publication results to Open Data consumers at several community events. Their feedback was that they would like to use the data but they need a consumption tool which does not require them to know the LD technologies. Only after being able to work with LD in this way and seeing the benefits on real data of their interest will they consider learning the LD technologies to use the data in an advanced way.

There are also other resources reporting that learning LD technologies is a problem. For example, [2] describes a toolset for accessing LD by users without the knowledge of LD technologies. A recent blog post "Why should you learn SPARQL? Wikidata!"⁶ shows an interesting view of learning SPARQL presented by one of the Open Data users. This blog post also mentions a widely known problem of quality of LD and availability and performance of services providing LD. Such problems are not specific to LD and are described, e.g., in [3]. However, they seem to be even more serious in the case of LD. For example, the SPARQLES service⁷ shows that only 38 % of known services providing LD are available⁸.

In this paper we do not aim at surveying and evaluating the problems of data availability, performance and quality, as these are separate topics on their own. We are rather interested in the question of whether there are software tools or platforms available which would support Open Data users, both with and without expert LD knowledge, in consuming LD. It is not so important whether such a platform is a single software tool or a set of compatible and reusable software tools integrated together. The need for building such a platform is also identified in [2].

We call such a hypothetical platform a Linked Data Consumption Platform (LDCP). In the broadest sense, an LDCP should support users in discovering, loading and processing LD datasets with the aim of getting some output useful for users without the knowledge of LD technologies. However, it would not be practical to require each LDCP to support all these tasks. In the minimal sense, an LDCP should have the following properties:

1. It is able to load a given LD dataset or its part.
2. Its user interface does not presume the knowledge of LD technologies so that users without such knowledge can use the tool. Most notably this includes the knowledge of the RDF data model, its serializations and the SPARQL query language.
3. It provides some non-RDF output the users can further work with. The output can be a visualization or a machine-readable output in a non-RDF format, e.g. a CSV file.

The contribution of this paper is a survey of existing software tools presented in recent scientific literature which can serve as an LDCP at least in this minimal sense. The tools are evaluated using a set of requirements described in the paper. The results of our evaluation show that despite many years of work on tools for Linked Data consumption, there is still a large gap between state-of-the-art technologies and the level of their implementation in tools for actual users. In fact,