Semantic Web 0 (2017) 1–42, IOS Press

Survey of Tools for Linked Data Consumption

Jakub Klímek a,∗, Petr Škoda a and Martin Nečaský a
a Faculty of Mathematics and Physics, Charles University in Prague, Malostranské náměstí 25, 118 00 Praha 1, Czech Republic
E-mail: [email protected], [email protected], [email protected]

Abstract. There is a lot of data published as Linked (Open) Data (LOD/LD). At the same time, there is also a multitude of tools for publication of LD. However, potential LD consumers still have difficulty discovering, accessing and exploiting LD. This is because, compared to consumption of traditional data formats such as XML and CSV files, there is a distinct lack of tools for consumption of LD. The promoters of LD use the well-known 5-star Open Data deployment scheme to suggest that consumption of LD is a better experience once the consumer knows RDF and related technologies. This suggestion, however, falls short when consumers search for appropriate tooling support for LD consumption. In this paper we define a LD consumption process. Based on this process and current literature, we define a set of 36 requirements a hypothetical Linked Data Consumption Platform (LDCP) should ideally fulfill. We cover those requirements with a set of 93 evaluation criteria. We survey 110 tools identified as potential candidates for LDCP, eliminating them in 4 rounds until 9 candidates for LDCP remain. We evaluate the 9 candidates using our 93 criteria. Based on this evaluation we show which parts of the LD consumption process are covered by the 9 candidates. We also show that there are important LD consumption steps which are not sufficiently covered by existing tools. The authors of LDCP implementations may use our paper to decide about directions of future development of their tools. The paper can also be used as an introductory text to LD consumption.

Keywords: Linked Data, RDF, dataset, consumption, discovery, visualization, tool, survey

1. Introduction

A considerable amount of data represented in the form of Linked (Open) Data (LOD/LD) is now available on the Web, or in enterprise knowledge graphs. More and more data publishers are convinced that the additional effort put into the last 2 stars of the now widely accepted and promoted 5-star Open Data deployment scheme1 will bring the promised benefits to the users of their data. The publishers rightfully expect that the users will appreciate the 5-star data much more than, e.g., the 3-star CSV files, given that the proper publication of 5-star data is considerably harder and more costly to achieve. Naturally, the expectations of users of LD are quite high, as the promoters of LD promise that the non-trivial effort put into publishing the data as LD will bring benefits mainly to them, the users of the data. These benefits, compared to 3-star Open Data, include better described and understandable data and its schema, safer data integration thanks to the global character of the IRIs and shared vocabularies, better reuse of tools, more context thanks to the links, and the ability to (re)use only parts of the data.

To encourage Open Data users to consume LD, there is a need for a platform which will help them with such consumption and which will exploit all the benefits of LD. It is not so important whether such a platform is a single software tool or a set of compatible and reusable software tools which can be combined as needed. The necessity of building such a platform is also identified in [107], where the authors try to build a toolset usable by users to access LD. In this paper we call such a hypothetical platform a Linked Data Consumption Platform (LDCP).

*Corresponding author. E-mail: [email protected].
1 http://5stardata.info/

In the broadest sense, LDCP should support users in discovering, loading and processing LD datasets with the aim of either getting some useful visual output, which can be presented to lay users, or getting a machine-readable output, which can then be used in many existing non-RDF processing tools available today for data analysis and visualization. In the minimal sense, LDCP should be able to load a given LD dataset or its part and provide some non-RDF output, either a visualization or a machine-readable output in a non-RDF format, such as a CSV file.

The aim of this paper is to identify existing usable software tools which could serve as LDCP and assess the extent of support they could provide to the users regarding LD consumption.

In the first part of the paper, we describe the LD consumption process and we identify requirements which specify in detail how individual activities in the LD consumption process should be supported by LDCP. The requirements are identified based on a motivating scenario, existing W3C Recommendations and research areas and approaches published in recent scientific literature. This paper is not a detailed and complete survey of existing research literature related to LD consumption. We use the existing literature to support the proposed requirements. In other words, the individual proposed requirements are not novel. Each requirement is either backed by an existing W3C Recommendation and/or the possibility of fulfilling the requirement is described in the literature. We compose the requirements into the LD consumption process. Then, for each requirement we define a set of criteria to be used to decide whether a given software tool fulfills the requirement or not. In total, the paper presents 36 requirements covered by 93 evaluation criteria.

In the second part of the paper, we identify existing usable software tools which we classify as LDCP candidates. We did a systematic review to identify these software tools. Then we filtered these software tools only to those which fulfill the following conditions which define a minimalistic LDCP described above:

– It is possible to install the tool locally or its public instance is available on the web.
– It is possible to load RDF data into the tool.
– It is possible to get some output from the tool which can be used by a non-RDF expert, such as a visualization or an output in a 3-star data representation.

As a result we identify 9 usable software tools which we classify as minimal LDCP and evaluate them further using the whole set of criteria. The evaluation is not a user based study and, therefore, it does not evaluate the usability of the tools.

This paper is based on our introductory study of the problem [72]. Since then, we have better defined requirements, better defined survey methodology and we survey considerably more tools which are also more recent, which we present here.

This paper is structured as follows. In Section 2 we provide a motivating example of a user scenario in which a journalist wants to consume LD and expects a platform that will enable him to enjoy the promised LD benefits. In Section 3 we describe our evaluation methodology. In Section 4 we define the LD consumption process, describe the identified requirements and cover them with evaluation criteria. In Section 5 we identify existing tools and evaluate them using the evaluation criteria. In Section 6 we survey related studies of LD consumption and in Section 7 we conclude.

2. Motivating Example

To motivate our work, let us suppose a data journalist who is used to working with 3-star Open Data, i.e. non-RDF Open Data available on the web such as HTML pages, CSV tables, etc. He needs to collect data about the population of cities in Europe and display an informative map in an article he is working on. The intended audience of the article is also statisticians who are used to CSVs, so the journalist also wants to publish the underlying data as a CSV file attached to the article. He is used to working with 3-star open data, which means that he has to know where to find the necessary data sources, e.g. population statistics and descriptive data about cities including their location, he has to download them, integrate them and process them to get the required output, i.e. a visualization and a CSV file.

The journalist now wants to use LD because he heard that it is the highest possible level of openness according to the 5-star Open Data deployment scheme. He expects the experience of working with LD will be somehow better than the experience of working with 3-star data once he learns RDF and related technologies. Let us now suppose that he has learned them. Now he expects to get better access to data and better experience working with the data.

Naturally, the user expects a tool that can be used to work with such data and which will bring the promised benefits of LD. Ideally, the tool would be an integrated platform that supports the whole process of LD consumption, starting from identification of data sources and ending with the resulting visualization or processed data ready to be used by other non-RDF tools, while using the benefits of LD where applicable.

Intuitively, the user needs to find data about cities in Europe, i.e. their description and location so that they can be placed on a map, and data about their population. Here, LD can help already. Recently, a number of national open data catalogs emerged and are being harvested by the European data portal2. It utilizes the DCAT-AP v1.1 vocabulary3 for metadata about datasets and exposes it in a SPARQL endpoint. The endpoint can be queried for, e.g., keywords (city) and formats (JSON-LD, etc.). LDCP could therefore support loading and querying of dataset metadata from DCAT-AP enabled catalogs and even contain well-known catalogs, such as the European data portal, preset so that the user can search for datasets immediately.

Let us suppose that the user has identified candidate datasets that could be of use to him and needs to choose the ones that contain information needed for his goal. Let us also suppose that the metadata loaded from a catalog contains correct access information for each candidate dataset. A helpful functionality of LDCP could be various kinds of summaries of the candidate datasets and characterizations based on both the metadata of the dataset and the actual data in the dataset. For example, for each candidate dataset, LDCP could offer the vocabularies used, numbers of instances of individual classes and their interlinks, previews tailored to the vocabularies used or to the current use case, datasets linked from the candidate, datasets linking to the candidate, spatial and time coverage, etc. Using this information, the user would be able to choose a set of datasets containing the required information more easily.

Another feature that could help the user at this time is recommendation of related datasets. Based on information about datasets already selected, such as vocabularies used, entities present or their metadata, LDCP could suggest similar datasets to the user. Given a set of datasets to be integrated, a useful feature would be the ability to analyze those datasets in order to find out whether they have a non-empty intersection, e.g. whether the population information specified in one dataset actually links to the cities and their locations present in another dataset.

Let us now suppose that the user has all the datasets needed for his goal. There could be an interoperability issue caused by the existence of multiple vocabularies describing the same domain. In our example, it could be that the dataset containing locations of the cities could have the geocoordinates described using the schema:GeoCoordinates4 class from the Schema.org vocabulary but the user needs another representation, e.g., the geo:Point5 class of the WGS84 Geo Positioning vocabulary. Since both of these vocabularies are well-known and registered in Linked Open Vocabularies (LOV)6, LDCP could contain components for data transformation between them and offer them to the user automatically.

Our potential user now has all the datasets needed using the required vocabularies. What is left to do is to choose the entities and properties from those datasets that are needed for the goal. This could be done in a graphical way, e.g. by graphical SPARQL query builders such as SPARQLGraph [111], but not necessarily complicated by the full expressiveness of SPARQL.

Finally, the data should be prepared for further processing. This could mean either RDF enabled visualizations such as in LinkedPipes Visualization [69] or further processing outside of LDCP. This would require assisted export to tabular data in CSV, tree data in XML, or perhaps generic graph data, e.g. in CSV for Gephi7.

LD significantly facilitates achieving the features suggested above. However, users such as our journalist will never be able to exploit these features only using basic technologies like IRI dereferencing and SPARQL. They need a software tool, a Linked Data Consumption Platform, which will help them to use and to benefit from these features.

3. Survey methodology

The goal of the paper is to evaluate tools classified as LDCP candidates using a predefined list of requirements. In this section we describe our methodology for identification of the requirements, identification of the tools to be evaluated, for conducting their evaluation and for aggregation of their scores.

2 https://www.europeandataportal.eu/
3 https://joinup.ec.europa.eu/asset/dcat_application_profile/asset_release/dcat-ap-v11
4 http://schema.org/GeoCoordinates
5 http://www.w3.org/2003/01/geo/wgs84_pos#Point
6 http://lov.okfn.org/
7 https://gephi.org/

3.1. Requirements identification

To identify requirements to be used, we first define the LD consumption process based on the motivating example presented in Section 2 and based on literature which also introduces a similar process. For each step of the process we identify requirements that we see as crucial for LDCP. We focus on requirements which can be satisfied by exploiting the benefits of consuming LD compared to consuming 3-star data. We then cover each requirement with a set of evaluation criteria which we later use to evaluate existing tools and decide if and to which extent they fulfill the requirement.

For each identified requirement we searched for evidence which would support its relevance for the LD consumption process. We considered 4 types of evidence:

1. An existing software tool which satisfies the requirement.
2. A technical specification (e.g., W3C Recommendation) which defines the requirement.
3. A published research paper which at least theoretically proposes a method fulfilling the requirement.
4. A published research paper which expresses the need for such a requirement.

For each of the introduced requirements it must hold that it is supported by at least one of these types of evidence.

3.2. Tools selection and evaluation

In this section, we describe the publication venues investigated, the search engines used and the method used to identify tools which we classify as LDCP candidates, i.e. tools described in recent scientific literature as potentially relevant to LD consumption. Then we describe the method for selecting final candidates from the potential ones and the tool evaluation methodology.

3.2.1. Publication venues investigated

We have identified the most relevant venues where publications about relevant tools could appear. For each venue we investigated its main sessions (not workshops) and especially the posters and demo tracks. We investigated publications since 2013 (published till July 13th 2017). The relevant venues were:

– World Wide Web Conference (WWW) 2013 - 2017
– International Semantic Web Conference (ISWC) 2013 - 2016
– Extended Semantic Web Conference (ESWC) 2013 - 2017
– SEMANTiCS 2013 - 2016
– OnTheMove Federated Conferences (OTM) 2013 - 2016
– International Conference on Web Engineering (ICWE) 2013 - 2016
– Linked Data on the Web workshop (LDOW) 2013 - 2017
– International Workshop on Consuming Linked Data (COLD) 2013 - 2016
– International Conference on Information Integration and Web-based Applications & Services (iiWAS) 2013 - 2016

In addition, we investigated the following journals for papers no older than 2013, the newest ones being accepted for publication on July 13th 2017:

– Semantic Web journal (SWJ)
– Journal of Web Semantics (JWS)

Based on titles and abstracts, two of the authors independently went through the investigated venues and picked potentially interesting candidate papers. Then they read the full text of the candidate papers. In the full text they searched for any mentions of an existing tool which could be used for any of the identified requirements. This resulted in 82 tools identified as potential candidates.

3.2.2. Search engines used

In addition to going through the above mentioned publication venues, we used the Scopus search engine to query for more potential candidates using the following query. For keywords, we used TITLE-ABS-KEY((linked data OR semantic data OR rdf OR software OR tool OR toolkit OR platform OR application) AND (consumption OR visualization OR discovery OR exploration OR analysis OR processing)). For limiting the publication year, we used PUBYEAR > 2012 AND PUBYEAR < 2018, and for limiting the subject we used LIMIT-TO(SUBJAREA, "COMP").

The query returned 1416 candidate papers. Two of the authors independently went through the search result and identified additional potentially interesting candidate papers based on their title and abstract. In the full text of the candidate papers they searched for mentions of relevant tools. This resulted in 28 additional tools.

3.2.3. Selection of candidates

The list of potential candidates was iteratively filtered in 4 elimination rounds using preliminary criteria described below, to limit the number of tools to be later evaluated according to all our 93 criteria from Section 4. The list of all tools which passed round 1, and their progress through rounds 2, 3 and 4, can be seen in Table 6, Table 7 and Table 8 in Appendix B.

First elimination round In the first elimination round, we eliminated potential candidates based on the following exclusion criteria, using the formula C1 OR (C2 AND C3):

1. C1: based on its available description, it does not use LD
2. C2: source code is not available
3. C3: does not have a freely accessible demo, i.e. it is online or downloadable without registration or request

Out of the 110 potential candidates, 65 were left after the first elimination round.

Second elimination round In the second elimination round, we eliminated potential candidates based on the following exclusion criteria, using the formula C4 OR C5:

1. C4: despite initial assessment based on titles and abstracts, it is clear from the text of the paper or the description on the web that the tool is not related to LD consumption
2. C5: cannot be installed or the demo does not work

Out of the 65 potential candidates, 41 were left after the second elimination round. For each of the 41 candidates we had its running instance available, which was tested in the following elimination rounds.

Third elimination round In the third elimination round, we eliminated potential candidates based on the following exclusion criteria, using the formula C6 AND C7 AND C8:

1. C6: cannot load data from provided RDF file on the Web
2. C7: cannot load data from provided SPARQL endpoint
3. C8: cannot load data from provided RDF file using file upload

For testing C6, we used two prepared RDF files8,9. Content of them both is the same, they differ only in the used RDF serialization format. The content consists of approx. 1 million RDF triples. For testing C7, we used our SPARQL endpoint10. It contains 260 RDF graphs with over 610 million RDF triples. For testing C8, we used the same files as for C6.

Out of the 41 potential candidates, 14 were left after the third elimination round.

Fourth elimination round In the fourth elimination round, we eliminated potential candidates based on the following exclusion criteria, using the formula C9 AND C10 AND C11:

1. C9: cannot provide visualization of the data based on manual definition
2. C10: cannot provide visualization based on vocabularies used in the data
3. C11: cannot provide data export in non-RDF data format

Out of the 14 potential candidates, 9 were left after the fourth elimination round.

3.3. Candidate evaluation methodology

The 9 candidates were evaluated thoroughly based on all 93 of our criteria from Section 4. The evaluation results are presented in Section 5.

In cases where a certain criterion should be satisfied based on the tool description on the web or in literature, but we were unable to verify it using our installation or an online demo, we consider that criterion satisfied.

3.4. Tool score aggregation and interpretation

Once a tool is evaluated, it has a pass or fail mark assigned for each criterion. These detailed results can be seen in Table 3, Table 4 and Table 5 in Appendix A. Next, its requirement score is computed as the percentage of passed criteria of each requirement. These can be seen in Table 2 in Section 5. Finally, its requirement group score for each group of requirements is computed as an average of scores of requirements in the group. These can be seen in Table 1 in Section 5. The purpose of the score aggregation is to show the focus of each particular tool regarding the steps of the LD consumption process.
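To make the aggregation concrete, the following short Python sketch computes requirement and group scores from pass/fail marks. The criterion, requirement and group names used here are illustrative placeholders, not the real mapping from Appendix A.

criteria_marks = {"C1.1": True, "C1.2": False, "C1.3": True, "C2.1": False}   # pass/fail mark per criterion
requirements = {"Req1": ["C1.1", "C1.2", "C1.3"], "Req2": ["C2.1"]}           # criteria of each requirement
groups = {"Dataset discovery": ["Req1", "Req2"]}                              # requirements of each group

def requirement_score(criteria):
    # Requirement score: percentage of passed criteria of the requirement.
    return 100.0 * sum(1 for c in criteria if criteria_marks[c]) / len(criteria)

req_scores = {r: requirement_score(cs) for r, cs in requirements.items()}

def group_score(reqs):
    # Group score: average of the scores of the requirements in the group.
    return sum(req_scores[r] for r in reqs) / len(reqs)

group_scores = {g: group_score(rs) for g, rs in groups.items()}
print(req_scores, group_scores)   # approx. {'Req1': 66.7, 'Req2': 0.0} {'Dataset discovery': 33.3}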

8https://linked.opendata.cz/soubor/ovm/ovm.trig 9https://linked.opendata.cz/soubor/ovm/ovm.ttl 10https://linked.opendata.cz/sparql 4. Requirements and evaluation criteria Step (c) is a part of our step 3 (see Section 4.2.4). In step (d) the user identifies and selects relevant data from Based on the motivation scenario presented in Sec- the datasets which corresponds to our step 2. Step (e) tion 2, we define the following steps of LD consumption is a part of our step 3. Step (f) contains our step 4 and process: extends it with the actual creation of a service on top 1. discovery of candidate datasets available on the of the data. We expect that the user works with the data web on the base of a given user’s intent and selec- created in step 4 of our process in the same way as tion of datasets needed to fulfill the intent, in step (f) from [13] but it is out of the scope of our 2. extraction of data from the relevant datasets, process. 3. manipulation of the extracted data, e.g. cleansing, Our described process is generic and any data con- linking, fusion, and structural and semantic trans- sumption process may be described in these steps, not formation, only the LD consumption process. However, LDCP 4. output of the resulting data to a required format. may benefit from the LD principles in each step, it may Similar processes are considered in the related lit- offer its users a better experience and make them more erature. For example, in [32] the authors describe the effective. following process: In the rest of this section, we describe the require- ments for each of the described steps. For each require- (a) correctly identify relevant resources, ment, we further specify evaluation criteria. The re- (b) extract data and metadata that meet the user’s con- quirements are split into 5 requirement groups. text and the requirements of their task or end goal, (c) pre-process this data to feed into the visualization 1. Dataset discovery and selection - techniques allow- tool available and the analysis required, (d) carry out initial exploratory analysis, using analyt- ing the user to actually find LD needed to fulfill ical and/or mining models. his intent are detailed in Section 4.1. 2. Data manipulation - techniques supporting the user Compared to our definition of the data consumption in manipulating the found LD are described in process, the process proposed in [32] concentrates more Section 4.2. on data visualization and analysis. Nevertheless, it is 3. Data visualization - techniques allowing the user comparable to ours. Step (a) corresponds to our step to visualize LD are identified in Section 4.3 1. Step (b) corresponds to our steps 1 and 2. Step (c) corresponds to our steps 3 and 4. Finally, step (d) is 4. Data output - techniques allowing the user to use not covered by our process. We consider that the user LD in other tools suited for specific tasks are iden- works with the data as described in (d). However, this tified in Section 4.4 is outside of LDCP as it is performed by the user in an 5. Developer and community support - properties al- external tool which is not part of LDCP. Therefore, we lowing users to share plugins and projects created do not evaluate this step. in the tools and properties allowing developers There is also non-scientific literature describing the to integrate and reuse the tools are specified in LD consumption process. In [13], the authors specify Section 4.5. 
the following process: (a) specify concrete use cases, We identified 36 requirements based on the existing (b) evaluate relevant data sets, tools and literature, split further into 93 fine grained (c) check the respective licenses, criteria. (d) create consumption patterns, (e) manage alignment, caching and updating mecha- 4.1. Dataset discovery nisms, (f) create mash ups, GUIs, services and applications The consumption process starts with the user’s intent on top of the data to discover datasets published on the web which contain This process can also be aligned with ours. Steps (a) the required data. Therefore, LDCP should be able to and (b) correspond to our step 1 - in (a) the user formu- gather metadata about available datasets and provide a lates his intent, in (b) he searches for relevant datasets. search user interface on top of the metadata. 4.1.1. Gathering metadata CKAN and from standardized metadata in DCAT, The primary goal of dataset discovery is to provide DCAT-AP and VoID using dereferencable dataset IRIs, the user with information about existence of datasets data dumps and SPARQL endpoints which provide the which correspond to his intent. The provided infor- metadata. mation should consist of metadata characterizing the datasets, such as title, description, temporal and spatial Example 1.1 (CKAN API) Given a URL of a CKAN information, URL where it can be accessed, etc. In this instance, e.g. https://datahub.io, one can ac- section, we define requirements on how LDCP should cess the list of datasets and then the details about each gather such metadata. of the found datasets by using the package_list and The most natural way to gather such information is package_show functions of the CKAN Action API18, to load dataset metadata from existing data catalogs. respectively. They usually provide a machine readable API, which { LDCP can use. Currently, the most wide-spread open "success": true, data catalog API is the proprietary non-RDF CKAN "result": [ API11. For example, open data catalogs implemented "0000-0003-4469-8298", by CKAN12 or DKAN13 open-source platforms support "0000-0003-4469-8298-1", this API. "0000-0003-4469-8298-2" Besides the proprietary CKAN API metadata repre- ] sentation, more or less standardized RDF vocabularies } for representing dataset metadata exist. First, there is the DCAT14 vocabulary, a W3C Recommendation, and The detailed information about a dataset typically con- DCAT-AP v1.1, the DCAT application profile for Euro- tains, among others, a link for downloading the data pean data catalogs recommended by European Commis- and information about its data format, i.e. a MIME-type, sion, providing the vocabulary support for dataset meta- in case of downloadable files. data. There are existing data catalogs utilizing these vocabularies for providing access to dataset metadata. { Examples of such data catalogs are the European Data "description": "Download", Portal15, integrating dataset metadata from various na- "format": "application/n-triples", tional data catalogs and the European Union Open Data "": "http.../all_languages.tar", Portal16 providing information about the datasets from "resource_type": "file" the institutions and other bodies of the European Union. } Besides DCAT, there is also the VoID17 vocabulary, a W3C interest group note for representation of metadata Therefore, LDCP can consume such API in a way about LD datasets. 
The metadata using these vocab- which allows it to directly access the cataloged data ularies is then accessible by the usual LD ways, i.e. without unnecessary user interaction. SPARQL end- in a data dump, in a SPARQL endpoint or using IRI points can be discovered in a similar fashion. dereferencing. A LDCP should therefore support loading dataset Example 1.2 (DCAT-AP) In the European Union, the metadata using a data catalog API and/or the LD princi- DCAT-AP standard for data portals prescribes that ples and vocabularies. This leads us to our first require- dataset metadata should be represented in RDF, and ment, Req. 1. lists classes and properties to be used. Therefore, LDCP can consume an RDF metadata dump or access a Requirement 1 (Catalog support) Support for load- SPARQL endpoint, e.g. the European Data Portal end- ing dataset metadata from wide-spread catalogs like point19, to get the data access information in a stan- dardized way.
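Both access paths can be exercised with a few lines of client code. The following Python sketch is only an illustration of the idea, not code from any evaluated tool: it lists datasets via the CKAN Action API and then queries a DCAT-AP SPARQL endpoint for datasets matching a keyword. The query shape is our assumption, and individual endpoints may differ in the properties they expose and in how they accept queries.

import requests

# CKAN Action API: list datasets and fetch the metadata record of the first one.
ckan = "https://datahub.io"
ids = requests.get(ckan + "/api/3/action/package_list").json()["result"]
detail = requests.get(ckan + "/api/3/action/package_show", params={"id": ids[0]}).json()["result"]
for res in detail.get("resources", []):
    print(res.get("format"), res.get("url"))        # data format and download URL

# DCAT-AP metadata in a SPARQL endpoint: datasets whose title mentions "city".
endpoint = "https://www.europeandataportal.eu/sparql"
query = """
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct:  <http://purl.org/dc/terms/>
SELECT ?dataset ?title ?accessURL WHERE {
  ?dataset a dcat:Dataset ; dct:title ?title ;
           dcat:distribution [ dcat:accessURL ?accessURL ] .
  FILTER(CONTAINS(LCASE(STR(?title)), "city"))
} LIMIT 10"""
resp = requests.get(endpoint, params={"query": query},
                    headers={"Accept": "application/sparql-results+json"})
for b in resp.json()["results"]["bindings"]:
    print(b["title"]["value"], b["accessURL"]["value"])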

11http://docs.ckan.org/en/latest/api/ 12https://ckan.org/ Criterion 1.1 (CKAN API) The evaluated tool is able 13http://getdkan.com/ to load dataset metadata from a CKAN API. 14https://www.w3.org/TR/vocab-dcat/ 15https://www.europeandataportal.eu 16https://open-data.europa.eu 18http://docs.ckan.org/en/latest/api/ 17https://www.w3.org/TR/void/ 19https://www.europeandataportal.eu/sparql Criterion 1.2 (RDF metadata in dump) The evalu- Sindice [93] or LODStats [41]. The resulting Req. 2 can ated tool can exploit VoID, DCAT or DCAT-AP dataset be viewed as a parallel to the development witnessed metadata stored in RDF dumps. in the Web of Documents where first, there were web page catalogs and later came full-text search engines Criterion 1.3 (Metadata in SPARQL endpoint) The such as Google. evaluated tool can exploit VoID, DCAT or DCAT-AP dataset metadata stored in SPARQL endpoints. Requirement 2 (Catalog-free discovery) Support for automatic computation of dataset profiles based on the Criterion 1.4 (Metadata from IRI dereferencing) dataset content, without using a catalog. The evaluated tool can dereference a dcat:Dataset or :Dataset IRI to get the corresponding VoID, Example 2.1 (Sindice) Sindice [93] was a LD index- DCAT or DCAT-AP dataset metadata. ing service which then allowed to search the index data using the Sig.MA tool, which facilitated discovering The previous requirement assumes that metadata the data sources of the search results. The service is, about datasets already exists provided through APIs however, currently unavailable. of existing data catalogs. However, there is a more ad- vanced way of gathering metadata. It is through imple- Criterion 2.1 (Dataset profiling) The evaluated tool mentation or support of a custom crawling and indexing implements a dataset profiling technique. service not necessarily relying solely on metadata found in data catalogs. Such a service can build its own index 4.1.2. Search user interface of datasets comprising automatically computed meta- The ability of the user to precisely express his intent data based on the content of the datasets using so called strongly depends on the provided user interface (UI). dataset profiling techniques [39]. The computed meta- In this section, we define requirements on the query data of a dataset is then called a dataset profile. Profiles language the user may use to express his intent and may encompass structure of the datasets, i.e. the used the assistance provided by LDCP to the user when classes and predicates [27], labels of classes and prop- writing search queries. The way how search queries are erties present in the dataset and their frequency [134], evaluated by LDCP is discussed later in Section 4.1.3. semantically related labels [103] or topics which char- Most users are familiar with UIs provided by domi- acterize the content [1] where a topic is a resource from nating dataset discovery services - open data catalogs a well-known and highly reused LD data source, e.g. (e.g., CKAN). They enable the user to express his in- DBPedia [77] or Wikidata20. There are also approaches tent as a list of keywords or key phrases. The user may to clustering resources in a dataset. A cluster is formed also restrict the initial list of datasets discovered us- by similar resources and characterization of these clus- ing keywords with facets. A facet is associated with a ters may be included in the profile. For example, the au- given metadata property, e.g. 
publisher or date of pub- thors of [44] first identify properties in a dataset which lication, and enables the user to restrict the list only belong to the same domain and they compute different to datasets having given values of this property. The clustering for each of the domains. For instance, in a provided facets are determined by the structure of the dataset of scientists, there can be properties related to collected metadata. Alternatively, a UI may provide geographical location of the scientists and properties a search form structured according to the structure of related to their research topics. As the authors propose collected metadata about datasets. in [44], a keyword-based summary associated with each These features are independent of the methods of cluster may be a part of the dataset profile. The differ- gathering the metadata described in Section 4.1.1 and ence from the previous methods is that such keyword- they can be offered without building on top of the LD based summary does not characterize only the structure principles. However, there exist many approaches in of the dataset but also its instances and values of their the literature which show how the UI may benefit from properties. [39] provides a detailed survey of existing LD principles to better support the user. Even though RDF dataset profiling methods. they are primarily intended for querying the content Examples of services which build dataset profiles of datasets, they could be directly used by LDCP for automatically from the content of the datasets are dataset discovery. We show examples of these works in the rest of this section and formulate requirements on 20https://www.wikidata.org LDCP based on them. Supported query language. As argued in [116], ex- pressing the user’s intent with a keyword-based query language may be inaccurate when searching for data. It shows that building the user search interface on top of LD principles may help with increasing accuracy of the queries. Theoretically, it would be possible to offer SPARQL, a query language for RDF data, as a search query language. However, this language is too complex and hard to use for common users [20], including our intended users such as data journalists. There are two approaches to this problem in the lit- erature. The first approach tries to provide the user with the power of SPARQL while hiding its complex- ity and enables him to express his intent in a natural language. The query is then translated to a SPARQL expression. For example, in [123], the authors parse Fig. 1. SPARKLIS21 user interface for formulating SPARQL queries a natural language expression to a generic domain in- dependent SPARQL template which captures its se- ture which represents a graph pattern. Therefore, we mantic structure and then they identify domain specific may call them graph pattern query languages. In com- concepts, i.e. classes, predicates, resources, combining parison to keyword-based query languages, which en- NLP methods and statistics. A similar approach is de- able users to express the list of concepts without any scribed in [55], applied to the biomedical domain. In relationships between them, graph pattern query lan- [37], the authors propose a natural language syntacti- guages enable users to additionally express these rela- cal structure called Normalized Query Structure and tionships. 
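As a minimal illustration of the content-based profiling discussed above (Criterion 2.1), the Python sketch below derives a tiny structural profile, the classes and predicates used and their frequencies, directly from a downloaded dump using the rdflib library. Real profiling services add labels, topics and clusters on top of such statistics; the file name is a placeholder.

from collections import Counter
from rdflib import Graph, RDF

g = Graph()
g.parse("dataset.ttl", format="turtle")                  # any local or downloaded RDF dump

class_counts = Counter(g.objects(predicate=RDF.type))    # classes used, via rdf:type
predicate_counts = Counter(g.predicates())               # predicate usage

profile = {
    "triples": len(g),
    "top_classes": class_counts.most_common(5),
    "top_predicates": predicate_counts.most_common(5),
}
print(profile)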
It may be a complex graph pattern expressed show how a natural language query can be translated by the user in a natural language, or a navigation path to this normalized structure and from here to SPARQL. expressed in NautiLOD or a triple pattern expressed as An alternative to translating natural language expres- a predicate-keyword pair. sions of any form is to support only a restricted natural language which may be unambiguously translated to Requirement 3 (Search query language) Provide a SPARQL [47] while not restricting its expressive power. user friendly search interface. In SimplePARQL [35], parts of the SPARQL query can Example 3.1 (Sparklis) Sparklis[46]22 provides a be written as keywords, which are later rewritten into user interface where a graph pattern query is expressed pure SPARQL. in a restricted english. A screenshot is displayed in The second approach offers search query languages Figure 1 with the query Give me every food that is the which do not aim to be as expressive as SPARQL, but industry of Coffee Republic. The query expression must they somehow improve the expressive power of the key- use labels of classes, predicates and individuals present words based languages. They achieve this by extend- in the queried dataset. ing them with additional concepts. For example, [116] proposes a search query language which supports so Criterion 3.1 (Structured search form) The evalu- called predicate-keyword pairs where a keyword may ated tool provides a structured form for expression of be complemented with RDF predicates which are re- user’s intent. lated to the keywords in the searched datasets. In [48], the authors propose a novel query language NautiLOD Criterion 3.2 (Fulltext search) The evaluated tool which enables to specify navigation paths which are supports expressing user’s intent as a full-text search then searched in the datasets. A navigation path expres- query. sion is simpler and easier to specify than a SPARQL query even though NautiLOD navigation paths are more Criterion 3.3 (Natural language based intent ex- powerful then SPARQL navigation paths. pression) The evaluated tool supports expressing user’s In general, these proposed query languages have in intent in a natural language. common that they somehow enable users to express not only the list of keywords but a more complex struc- 22http://www.irisa.fr/LIS/ferre/sparklis/ Criterion 3.4 (Formal language based intent ex- 4.1.3. Search query evaluation pression) The evaluated tool supports expressing user’s In this section we define requirements on how LDCP intent in a formal language less complex than SPARQL. evaluates a user’s intent expressed using the provided Assistance provided to the user when writing the search user interface. Evaluation means that LDCP iden- query. Most users are familiar with behavior of common tifies a list of datasets corresponding to the intent. The search engines which auto-suggest parts of queries to simplest solution is provided by the existing open data assist the user when expressing his intent. Also, facets catalogs. Keywords or other metadata values filled in may be considered as a way of assisting a user when a metadata search form are matched against metadata formulating his intent. Similar assistance can also be records. Further restrictions expressed by the user using provided in the scenario of dataset discovery. For ex- facets are evaluated by simple filtering of the metadata ample, [116] auto-suggests predicates relevant for a records of the datasets. 
given keyword in the searched datasets. In [114], the This basic evaluation does not build on top of the LD authors show how it is possible to assist a user when principles. However, there are approaches described formulating a natural language query by auto suggest- in the literature which enrich evaluation of the user’s ing the query not based on query logs (which is a intent with features which benefit from the LD prin- method used by conventional web search engines) but ciples. A basic feature is translation of the user’s in- based on the predicates and classes in the searched RDF tent to a SPARQL expression, which reduces the prob- datasets. [46] shows a similar approach. However, the lem of user query evaluation to simple evaluation user does not write a natural language query manually of SPARQL queries. The problem of translation is but a query builder tool is provided which combines solved by approaches discussed in Section 4.1.2, e.g. auto-suggesting parts of a query with auto-generated [123][37][123]. For the problem of dataset discovery, facets. The auto-suggestions and facets are provided this means evaluating SPARQL queries on top of the ex- by the tool automatically, based on a chosen SPARQL tracted dataset profiles or metadata records (discussed endpoint with a . Faceted search over in Section 4.1.1). However, there are also other possibil- LD is studied in [6] where the authors present a tool ities of how to exploit the benefits of LD for evaluation which generates facets automatically based on the con- – not in using SPARQL query language but in enrich- tent of indexed datasets. A detailed survey of faceted ing the expressed user’s intent with a context which exploration of LD is provided in [122]. may be extracted for the intent from available LD. We Requirement 4 (Search query formulation assis- show representatives of these approaches in this sub- tance) Assist the user when formulating search query section and based on them we formulate corresponding with auto-suggestion of parts of the query or with requirements on LDCP. facets derived automatically from the content of indexed The first possibility is to interpret the semantics of datasets. keywords or key phrases, which the user used to express Example 4.1 (Search query formulation assistance) his intent, against given knowledge bases. The result The user interface provided by Sparklis (see Figure 1) of the interpretation is a formal expression of a search assists the user with formulating graph pattern ex- query in SPARQL, similarly to the approaches trans- pressed in a restricted natural english by providing la- lating natural language expressions to SPARQL. How- bels of classes, labels of their individuals as well as ever, the keywords in the expression are represented as their predicates. semantically corresponding resources, i.e. individuals, concepts, classes or properties, from the given knowl- Criterion 4.1 (Keyword autocomplete) The evalu- edge bases, possibly including their context. In [20], the ated tool assists the user when formulating search query with keyword autocompletion. authors first discover the semantics of entered keywords by consulting a given pool of ontologies to find a syn- Criterion 4.2 (Vocabulary based autocomplete) The tactical match of keywords and labels of resources in evaluated tool assists the user when formulating search these ontologies. Then they automatically formulate a query with autocompletion of terms based on used vo- formal query based on the semantics. 
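One simple way to offer vocabulary-based suggestions (Criterion 4.2) is to look up labels of classes and properties in the queried data itself. The Python sketch below is one possible approach under the assumption that the endpoint contains rdfs:label literals for its terms; it is not taken from any of the surveyed tools.

import requests

def suggest_terms(endpoint, typed, limit=10):
    # Suggest class and property labels from the queried data that start with `typed`.
    query = """
    PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX owl:  <http://www.w3.org/2002/07/owl#>
    SELECT DISTINCT ?term ?label WHERE {
      { ?term a owl:Class } UNION { ?term a rdfs:Class } UNION { ?term a rdf:Property }
      ?term rdfs:label ?label .
      FILTER(STRSTARTS(LCASE(STR(?label)), LCASE("%s")))
    } LIMIT %d""" % (typed, limit)
    resp = requests.get(endpoint, params={"query": query},
                        headers={"Accept": "application/sparql-results+json"})
    return [(b["term"]["value"], b["label"]["value"])
            for b in resp.json()["results"]["bindings"]]

# e.g. suggest_terms("https://dbpedia.org/sparql", "popul") may suggest dbo:populationTotal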
In [113], a similar cabularies. approach is proposed, using hidden Markov models for Criterion 4.3 (Faceted search) The evaluated tool as- discovering keyword semantics. In [136], the authors sists the user when formulating search query with discover semantics of keywords by matching them to facets. extraction ontologies, i.e. ontologies which are aug- mented linguistically to enable information extraction The second possibility is to use the context for en- from texts. riching the discovered list of datasets, i.e. LDCP may LDCP may use the gained context to extend the search for additional semantically related datasets. By user’s intent in two ways - it may add semantically sim- the term “semantically related” we mean various kinds ilar concepts or it may add semantically related ones of semantic relationships among datasets. A simpler as discussed in [135]. Both extensions may increase solution is to search for additional datasets which pro- recall as more datasets may be discovered thanks to vide statements about the same resources, use the same additional concepts. Semantically related concepts can vocabularies or link their resources to the resources in also increase precision when combined with a careful the already discovered datasets. These relationships are selection of top-k search results as they can be used by explicitly specified as RDF statements in the datasets. the discovery algorithm as a more detailed specification A more complex solution is to derive semantic relation- of the semantics of the concepts provided in the user ships among datasets based on the similarity of their intent. The impact of additional context to precision content, as in [131]. The similarity of two resources and recall was studied in [76]. from two datasets is measured based on common sub- For extending the user’s intent in this way, LDCP graphs associated with the resources. In [103], the au- may apply approaches for computing semantic relat- thors measure similarity of profiles of datasets which edness of resources. For example, [34] shows how a they express as a set of topics identified in the content semantic relatedness may be computed in a scalable of the datasets (already discussed in Section 4.1.1). way based on graph distance between two resources LDCP may also profit from approaches which rec- (average length of paths connecting the resources in ommend datasets for linking based on dataset similarity. a graph). [135] combines graph distance with graph For example, [40] introduces a dataset recommendation based Information Content (IC) measure of concepts method based on cosine similarity of sets of concepts which is based on the distribution of concepts across a present in datasets. Similarly, the authors of [82] recom- given knowledge base (e.g., DBPedia). mend datasets based on the similarity of labels present in datasets. Then they rank the recommended datasets Requirement 5 (Context-based intent enhancement) based on their TFIDF score and their coverage of la- Support for enhancement of the user’s intent with a con- bels present in the original dataset. Having discovered text formed by resources (individuals, concepts, classes a dataset, LDCP may use such approach to recommend and properties) related to the resources specified in the other semantically similar datasets. Instead of recom- intent. mending them for linking, they could be recommended as additional datasets possibly interesting for the user. 
Example 5.1 (Context-based intent enhancement) We may also get back to simpler solutions and con- Using a knowledge base such as Wikidata23, a LDCP sider metadata profiles which may also contain some may recognize terms in the intent as concepts from the explicitly expressed semantic relationships between knowledge base. For instance, it may recognize the term datasets. E.g., DCAT-AP discussed in Section 4.1.1 en- Czech Republic as a country24. It may have several ables data publishers to specify semantic relationships query expansion rules encoded, such as when a country between datasets in their metadata descriptions. is identified in the intent it may be expanded with con- The discovery of related datasets is understood as tained administrative territorial entities25. When a user one of the key features LDCP should have for example searches for datasets with demography of the Czech in [75], where the authors show how a possibility of Republic, such query expansion would mean that also recommending additional semantically related datasets datasets with demography of administrative entities of may improve the quality of scientific research in do- the Czech Republic would be searched for. mains such as medicine or environmental research by supporting interdisciplinary research. Criterion 5.1 (Context-based intent enhancement) Requirement 6 (Context-based discovery result en- The evaluated tool is able to enhance the user’s intent hancement) Support for recommendation of additional with a context. datasets semantically related to already discovered datasets. 23http://www.wikidata.org 24https://www.wikidata.org/wiki/Q213 Example 6.1 (Context-based discovery result en- 25https://www.wikidata.org/wiki/Property:P150 hancement) DCAT-AP considers the following pred- icates from the Vocabulary which can Requirement 7 (Dataset ranking) Support for dataset be used to express semantic relationships between ranking based on their content, including quality, con- datasets: hasVersion (a dataset is a version of an- text such as provenance and links to other datasets, and other dataset), source (a dataset is a source of data the intent of the user. for another dataset), and relation (a dataset is somehow related to another dataset). These relation- Example 7.1 (Dataset ranking) For example, a LDCP ships define a context which can be used to extend the may combine a PageRank score of a dataset with the set of discovered datasets. A LDCP may provide addi- relevance of publishers of datasets where datasets pub- tional datasets to a discovered one which contain other lished by public authorities are ranked higher than datasets published by other organizations or individu- versions of the discovered dataset, other datasets with als. the content derived from the discovered one, etc. Criterion 7.1 (Content-based dataset ranking) The Criterion 6.1 (Context-based discovery result en- evaluated tool supports ranking of datasets based on hancement) The evaluated tool is able to enhance the their content. discovery result with a context. Criterion 7.2 (Context-based dataset ranking) The 4.1.4. Displaying search results evaluated tool supports ranking of datasets based on LDCP needs to display the discovered list of datasets their context. to the user. For this, it needs to order the discovered list and display some information about each dataset. The Criterion 7.3 (Intent-based dataset ranking) The current dataset discovery solutions, i.e. 
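Example 5.1 can be made concrete with a short Python sketch: given the recognized country (Q213), its contained administrative territorial entities (P150) are fetched from the public Wikidata SPARQL endpoint and added to the intent as additional search terms. The endpoint and the exact query are our illustrative assumptions, not a prescribed LDCP interface.

import requests

WIKIDATA = "https://query.wikidata.org/sparql"

def contained_admin_entities(country_qid):
    # Labels of administrative territorial entities contained in the country (P150).
    query = """
    SELECT ?entity ?entityLabel WHERE {
      wd:%s wdt:P150 ?entity .
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
    }""" % country_qid
    resp = requests.get(WIKIDATA, params={"query": query, "format": "json"},
                        headers={"User-Agent": "ldcp-example/0.1"})
    return [b["entityLabel"]["value"] for b in resp.json()["results"]["bindings"]]

# The intent "demography of the Czech Republic" (Q213) is expanded with its regions,
# so datasets about the individual administrative entities are searched for as well.
extra_terms = contained_admin_entities("Q213")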
data catalogs, evaluated tool supports ranking of datasets based on order the discovered list according to the keywords the intent of the user. based similarity of the intent of the user, and metadata Second, it is possible to show a (visual) preview or descriptions of the discovered datasets. Usually, they summary of the discovered datasets [16]. For example, display an ordered list of datasets with basic metadata a timeline showing the datasets according to their cre- such as the name, description, publication date, and ation dates, the dates of their last modification, their ac- formats in which the dataset is available. crual periodicities or their combination. A whole other However, there are more advanced ways of display- type of a timeline can show the datasets according to ing the search results which can be built on top of LD. their specified temporal coverage. Other types of pre- First, discovered datasets may be ranked and their views are a map showing their spatial coverage, a word list may be ordered according to this ranking. For exam- cloud diagram from their keywords, etc. ple, in [33], the authors modify a well-known PageR- The techniques which would allow a LDCP to create ank algorithm for ranking datasets. The rank score of a such previews of the datasets include support for the ap- dataset is based on the linksets targeting the dataset and propriate RDF vocabularies. Some of the LD vocabular- rank scores of the linking datasets. Moreover, a linkset ies recently became W3C Recommendations26, which is weighted based on the relevance of its properties. It are de facto standards, and as such should be supported is also possible to rank datasets based on their quality in LDCP. These include the Simple Knowledge Organi- scores surveyed in [133] or based on their provenance zation System (SKOS)27, The Organization Ontology metadata. In [76], the authors also propose an algorithm (ORG)28, the Data Catalog Vocabulary (DCAT) and based on PageRank but they moreover consider the rele- The RDF Data Cube Vocabulary (DCV)29. Having the vance to the intent of the user. In particular, they expect knowledge of such well-known vocabularies, LDCP a set of keywords as a part of the user query besides the can generate vocabulary specific preview of each of the desired type of the discovered resources. The discov- candidate datasets, giving the user more information for ered resources are ranked also based on the weights of his decision making. There are also many well-known paths through which resources with values containing vocabularies which are not W3C Recommendations. the required keywords can be reached. The weight of a path is based on its specificity which is derived from 26https://www.w3.org/standards/techs/rdfvocabs# parameters such as specificity of the properties on the w3c_all 27https://www.w3.org/TR/skos-reference/ path or its length. A review of ranking approaches is 28https://www.w3.org/TR/vocab-org/ available in [61]. 29https://www.w3.org/TR/vocab-data-cube/ Some of them are on the W3C track as Group notes, stances of different classes) and schema. It it important e.g. VoID30 used to describe RDF datasets and vCard to note that schemas are often very complex and their Ontology31 for contact information. Others are regis- summaries are more useful for users [95]. [121,95] pro- tered in Linked Open Vocabularies [125], e.g. Dublin pose techniques for schema summarization which cre- Core32, FOAF33, GoodRelations34 and Schema.org35. 
ate a subset of a given dataset schema which contains These vocabularies are also popular as shown by the only the most important nodes and edges. Therefore, it LODStats service [41]36. Previews may be also gener- is a reasonable requirement that LDCP should be able to ated based on vocabularies for representing metadata generate metadata independently of what is contained about datasets (see Section 4.1.1) in the manually created metadata records.

Requirement 8 (Preview based on vocabularies) Requirement 9 (Data preview) Provide dataset de- Support for dataset preview based on well-known vo- scription, statistics and schema based on automatic cabularies used to represent the dataset content or querying the actual content of the dataset. metadata. Example 9.1 (Data preview) A preview of data based Example 8.1 (Preview using the RDF Data Cube on the data itself instead of relying on the metadata Vocabulary) Given a statistical dataset modeled us- can be seen in the LODStats [41] dataset. From statis- ing the RDF Data Cube Vocabulary, a dataset preview tics such as the number of triples, entities, classes, lan- (Req. 8) could contain a list of dimensions, measures guages and class hierarchy depth, one can get a better and attributes or a list of concepts represented by them, information contributing to the decision on whether or based on its Data Structure Definition (DSD). In ad- not to use such a dataset. dition, the number of observations could be displayed and, in the case of simpler data cubes such as time se- Example 9.2 (Schema extraction using SPARQL) ries, a graphical line graph or bar chart can be shown For RDF datasets, the list of used vocabularies can be automatically, as in LinkedPipes Visualization [68]. attached using the void:vocabulary37 predicate. Their specifications can also be attached to RDF distri- Criterion 8.1 (Preview – W3C vocabularies) The butions of datasets using the dcterms:conformsTo evaluated tool provides a dataset preview based on used predicate as specified in DCAT-AP. Either way, even vocabularies that are W3C Recommendations. if the vocabularies are indeed specified in the meta- Criterion 8.2 (Preview – LOV vocabularies) The data, the dataset itself may use only few properties and evaluated tool provides a dataset preview based on used classes from them, not all covered by the vocabularies. well-known vocabularies registered at Linked Open Therefore, a useful feature of a LDCP would be extrac- Vocabularies. tion of the actual schema of the data, i.e. used classes and predicates connecting their instances shown in a Criterion 8.3 (Preview metadata) The evaluated tool diagram. This can be done by querying the data us- previews dataset description and statistics metadata ing quite simple SPARQL queries such as this one for recorded in DCAT, DCAT-AP and/or VoID. classes and numbers of their instances:
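One concrete instance of such a vocabulary-specific preview (Example 8.1) is reading the Data Structure Definition of an RDF Data Cube dataset. The Python sketch below lists its dimensions and measures and counts the observations with rdflib; the file name is a placeholder and the query is a straightforward reading of the QB vocabulary, not code from any surveyed tool.

from rdflib import Graph

g = Graph()
g.parse("statistical-dataset.ttl", format="turtle")    # a dump using the RDF Data Cube Vocabulary

# Dimensions and measures declared in the Data Structure Definition(s).
components = g.query("""
    PREFIX qb: <http://purl.org/linked-data/cube#>
    SELECT ?dsd ?kind ?component WHERE {
      ?dsd a qb:DataStructureDefinition ; qb:component ?spec .
      { ?spec qb:dimension ?component . BIND("dimension" AS ?kind) }
      UNION
      { ?spec qb:measure ?component . BIND("measure" AS ?kind) }
    }""")
for dsd, kind, component in components:
    print(dsd, kind, component)

# The number of observations gives a feel for the size of the cube.
count = g.query("""
    PREFIX qb: <http://purl.org/linked-data/cube#>
    SELECT (COUNT(?o) AS ?n) WHERE { ?o a qb:Observation }""")
print(list(count)[0][0])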

The user may also require a preview compiled not SELECT ?class (COUNT(?s) as ?size) based on descriptive and statistical metadata provided WHERE {?s a ?class} by the publisher about the dataset but based on metadata ORDER BY DESC(?size) automatically computed from the dataset content. This and this one for predicates among classes with number metadata include descriptive metadata (e.g., important of their occurences: keywords or spatial and temporal coverage), statistical metadata (e.g., number of resources or numbers of in- SELECT ?c1 ?c2 (MIN(?p) as ?predicate) (COUNT(?p) AS ?num) 30https://www.w3.org/TR/void/ 31https://www.w3.org/TR/vcard-rdf/ WHERE { 32http://dublincore.org/documents/dcmi-terms/ ?s1 a ?c1 . 33http://xmlns.com/foaf/spec/ ?s2 a ?c2 . 34http://www.heppnetz.de/projects/goodrelations/ 35http://schema.org 36http://stats.lod2.eu/ 37https://www.w3.org/TR/void/#vocabularies ?s1 ?p ?s2 . 4.2.1. Data input } LDCP should support all the standard ways of ac- GROUP BY ?c1 ?c2 cessing LD so that there are no unnecessary constraints ORDER BY DESC(?num) on the data to be consumed. These include IRI derefer- encing (Req. 11), SPARQL endpoint querying (Req. 12 Criterion 9.1 (Preview data) The evaluated tool pro- and the ability to load RDF dumps in all standard RDF vides dataset description and statistics based on auto- 1.1 serializations39 (Req. 13), i.e. RDF/XML, Turtle, matic querying the actual data. TriG, N-Triples, N-Quads, RDFa and JSON-LD. This functionality can be achieved relatively easily, e.g. by 40 41 Criterion 9.2 (Schema extraction) The evaluated integrating Eclipse RDF4J or Apache Jena libraries. tool provides schema extracted automatically from the There are at least three ways of how LDCP should given dataset. support IRI dereferencing. First, LDCP should be able to simply access a URL and download a file in any of the standard RDF serializations, which we deal with Besides metadata, statistics and used vocabularies later in Req. 13. Second, LDCP should be able to use there are many other criteria regarding data quality HTTP Content Negotiation42 to specify the request for that should also be presented to the user to support RDF data even though the URL yields an HTML page his decision making. There is a recent survey of such for web browsers, and be able to work with the result, techniques [133]. e.g. to dereference related entities, a.k.a. the "follow 43 Requirement 10 (Quality indicators) Provide qual- your nose" principle . ity measurements based on LD quality indicators. Requirement 11 (IRI dereferencing) Ability to load RDF data by dereferencing IRIs using HTTP content Example 10.1 (Checking of availability) One of the negotiation and accepting all RDF 1.1 serializations. quality dimensions surveyed in [133] is availability.A LDCP could support its users by periodically checking Example 11.1 (Usage of IRI dereferencing) One of the availability of known datasets by trying to access the possible data inputs can be a single RDF resource, dumps or by querying SPARQL endpoints and derefer- e.g. the representation of Prague in DBpedia, for which encing IRIs. Data gathered this way could be presented the user might want to monitor, some property, e.g. e.g. as availability percentage. The SPARQLES ser- population. The easiest way to do that is to simply vice [126] 38 performs such measurements periodically access the resource IRI, to support following HTTP for known SPARQL endpoints. redirects and to ask for RDF data in our favorite RDF serialization (Req. 11). 
This process can be then re- Criterion 10.1 (Quality indicators) Provide quality peated as necessary. It is an equivalent of running, measurements based on LD quality indicators. e.g.: curl -L -H "Accept: text/turtle" http://dbpedia.org/resource/Prague 4.2. Data manipulation Criterion 11.1 (IRI dereferencing) Ability to load RDF data by dereferencing IRIs using HTTP content A common problem of LD tools out there is their negotiation. limited support for standards of representation and ac- cess to LD. LDCP should support existing serialization Criterion 11.2 (Crawling) Ability to apply the follow standards on its input such as Turtle and support only your nose principle loading the input in different forms (e.g., dump files, SPARQL endpoints, etc.). In this section, we motivate and define requirements on data input, its transforma- 39https://www.w3.org/TR/rdf11-new/ #section-serializations tions, provenance and licensing support, with emphasis 40http://rdf4j.org/ on existing web standards. 41https://jena.apache.org/ 42https://www.w3.org/Protocols/rfc2616/ rfc2616-sec12.html 43http://patterns.dataincubator.org/book/ 38http://sparqles.ai.wu.ac.at/ follow-your-nose. Publishing data via a SPARQL endpoint is consid- The previous techniques such as SPARQL querying ered the most valuable way of publishing LD for the and IRI dereferencing relied on the fact that the target consumers. It provides instant access to the computing service was able to provide the desired RDF data se- power of the publishers infrastructure and the consumer rialization. However, when there are RDF dumps al- does not have to work with the data at all to be able ready serialized, to be downloaded from the web, LDCP to query it. Publishing data this way also enables users needs to support all standardized serializations itself, as to do federated queries44. Therefore, this should be the it cannot rely on any service for translation. easiest way of getting data using a LDCP. On the other hand, public SPARQL endpoints typically implement Example 13.1 (RDF quads dump load) One of the various limitations on the number of returned results or use cases for named graphs in RDF already adopted by query execution time, which make it hard to get, e.g. a LD publishers is to separate different datasets in one whole dataset this way. These limits, however, can be SPARQL endpoint. It is then quite natural for them to tackled using various techniques such as paging. provide data dumps in an RDF quads enabled format such as N-Quads or TriG, e.g. statistical datasets from Requirement 12 (SPARQL querying) Ability to load the Czech Social Security Administration45. This also data from a SPARQL endpoint. eases the loading of such data to a quad-store - the user does not have to specify the target graph to load to. Example 12.1 (Accessing the DBpedia SPARQL Since there are multiple open-source libraries and quad- endpoint) Since DBpedia is the focal point of the whole stores supporting RDF 1.1 serializations, this should LOD cloud, it is often used as a linking target to pro- not be a problem and LDCP should support all of them. vide more context to published data. In addition to IRI dereferencing, a LDCP can use the DBpedia SPARQL Criterion 13.1 (RDF/XML file load) Ability to load endpoint to, e.g., suggest more related data of the same data from an RDF/XML file, both from a URL and type, etc. locally.

Criterion 12.1 (SPARQL: built-in querying) Ability Criterion 13.2 (Turtle file load) Ability to load data to query a given SPARQL endpoint using a built-in from a Turtle file, both from a URL and locally. SPARQL query. Criterion 13.3 (N-Triples file load) Ability to load Criterion 12.2 (SPARQL: named graphs) Aware- data from an N-Triples file, both from a URL and lo- ness of named graphs in a SPARQL endpoint and ability cally. to work with them. Criterion 13.4 (JSON-LD file load) Ability to load data from a JSON-LD file, both from a URL and locally. Criterion 12.3 (SPARQL: custom query) Ability to extract data from a SPARQL endpoint using a custom Criterion 13.5 (RDFa file load) Ability to load data SPARQL query. from an RDFa file, both from a URL and locally. Criterion 12.4 (SPARQL: authentication) Ability to Criterion 13.6 (N-Quads file load) Ability to load access SPARQL endpoints which require authentica- data from an N-Quads file, both from a URL and locally. tion. Criterion 13.7 (TriG file load) Ability to load data Criterion 12.5 (SPARQL: advanced download) Abil- from a TriG file, both from a URL and locally. ity to download a dataset from a SPARQL endpoint even though it is too big to download using the standard Criterion 13.8 (Handling of bigger files) Ability to CONSTRUCT WHERE {?s ?p ?o}, e.g. by paging process bigger RDF dumps without loading them as or similar techniques. RDF models. These techniques can be found in RDF- pro [29] and RDFSlice [83]. Requirement 13 (RDF dump load) Ability to load Criterion 13.9 (CSV on the Web load) Ability to load data from an RDF dump in all standard RDF serializa- data from a CSV file described by the CSV on the Web tions, including tabular data with CSVW mapping and (CSVW) metadata, both from a URL and locally. relational with R2RML mapping.
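As a concrete illustration of the paging technique mentioned in Criterion 12.5, the following sketch shows one common, though endpoint-dependent, approach: the client repeats a CONSTRUCT query with a stable ORDER BY and an increasing OFFSET until fewer than LIMIT triples are returned. The page size of 10000 is only an illustrative value; some endpoints also cap OFFSET, in which case other techniques (e.g. slicing by subject or by named graph) are needed.

CONSTRUCT { ?s ?p ?o }
WHERE { ?s ?p ?o }
ORDER BY ?s ?p ?o
LIMIT 10000
OFFSET 0      # increase by the page size on each repetition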

45https://data.cssz.cz/dump/ 44https://www.w3.org/TR/sparql11-federated-query/ nove-priznane-duchody-dle-vyse-duchodu.trig Criterion 13.10 (Direct mapping load) Ability to Example 15.1 (Keeping data up to date) The EU load data from a relational using Direct map- Metadata Registry (MDR)51 publishes various codelists ping46. to be reused across the EU and they are updated almost weekly. A LDCP should allow the users to register a Criterion 13.11 (R2RML load) Ability to load data dataset to be downloaded regularly (or on demand). from a relational database using R2RML mapping. The interval can be set manually, or automatically from Besides these traditional ways of getting LD, LDCP the dcterms:accrualPeriodicity property in should support the Linked Data Platform47 specifica- DCAT metadata. Or, LDCP may be notified by MDR tion for getting resources from containers. These can when a change occurs according to the Linked Data be, e.g. notifications according to the Linked Data Noti- Notification protocol. fications48 W3C Recommendation [23]. Criterion 15.1 (Monitoring of input changes: polling) Requirement 14 (Linked Data Platform input) Abil- Ability to periodically monitor a data source and trigger ity to load data from the Linked Data Platform compli- actions when the data source changes. ant servers. Example 14.1 (Loading data from Linked Data Criterion 15.2 (Monitoring of input changes: sub- Platform) The Linked Data Platform (LDP) specifica- scription) Ability to subscribe to a data source, e.g. tions defines (among others) containers49 as lists of by WebSub52, formerly known as PubSubHubbub, and entities. Therefore, a LDCP should be aware of this trigger actions when the data source changes. and support the handling of LDP containers, e.g. by listing them and their entities to be used in the data When working with real world data sources, it may consumption process. An example of a LDP compliant be necessary to keep track of versions of the input data implementation is Apache Marmotta50. which can change in time or even become unavailable. When more complex changes happen in the data, the Criterion 14.1 (Linked Data Platform containers) whole consumption pipeline could be broken. There- Ability to load items from the Linked Data Platform fore, a version management mechanism should be avail- containers. able in LDCP so that the user can work with a fixed In addition to be able to load data in all standardized snapshot of the input data and see what has changed ways, LDCP should also provide a way of saying that and when. Such a mechanism could also be automated. the data source should be periodically monitored for For example, if a consuming client holds a replica of changes, e.g. like in the Dynamic Linked Data Obser- a source dataset, changes in the source may be prop- vatory [62]. This includes a specification of the period- agated to the replica as shown in [43], creating a new icity of the checks and the specification of actions to be version. Updates may also be triggered by events like taken when a change is detected. The actions could vary Git commits, as can be seen in [7]. from a simple user notification to triggering a whole pipeline of automated data transformations. This way, Requirement 16 (Version management) Ability to the user can see what is new in the data used for his maintain multiple versions of data from selected data goals and whether the data is simply updated or needs sources. his attention due to more complex changes. 
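To make Example 15.1 more concrete, a LDCP could derive the polling interval from the dataset's DCAT record with a query along the following lines. This is a minimal sketch and assumes the catalog record is available in the queried store; datasets with an unbound ?periodicity would fall back to a manually configured interval.

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?dataset ?periodicity
WHERE {
  ?dataset a dcat:Dataset .
  OPTIONAL { ?dataset dcterms:accrualPeriodicity ?periodicity }
}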
A short survey of existing approaches to change monitoring is Criterion 16.1 (Manual versioning support) Ability provided in [39], Section 3.7. to keep multiple versions of input data, the versions are Requirement 15 (Monitoring of changes in input created, deleted and switched manually. data) Ability to periodically monitor a data source and trigger actions when the data source changes. Criterion 16.2 (Automated versioning support) Abil- ity to keep multiple versions of input data, with support for automated versioning actions such as automatic 46https://www.w3.org/TR/rdb-direct-mapping/ 47https://www.w3.org/TR/ldp/ synchronization of remote datasets. 48https://www.w3.org/TR/ldn/ 49https://www.w3.org/TR/ldp/#ldpc 50http://marmotta.apache.org/platform/ldp-module. 51http://publications.europa.eu/mdr/ html 52https://www.w3.org/TR/websub/ 4.2.2. Analysis of semantic relationships ships among datasets which are important for the user The requirements discussed in the previous sections to be able to decide about their possible integration. support the user in discovering individual datasets iso- lated from one another. However, the user usually needs Example 17.1 (LODVader) LODVader [90] shows re- to work with them as with one integrated graph of RDF lationships among given datasets, demonstrated on the LOD cloud. Instead of relying on existing metadata in data. Therefore, he expects that the datasets are seman- VoID, it generates its own based on the actual data. It is tically related with each other and that he can use these able to present relationships based on links between re- semantic relationships for the integration. If a dataset sources as well as shared vocabularies. Unfortunately, is not in a required relationship with other discovered the software seems abandoned and the demo works no datasets the user needs to know this so that he can longer. omit the dataset from the further processing. We have briefly discussed possible kinds of semantic relation- Criterion 17.1 (Shared resources analysis) Provide ships in Req. 6. However, while Req. 6 only required characteristics of existing semantic relationships among LDCP to show datasets which are somehow semanti- datasets based on resources shared by the datasets. cally related to the selected one, now the user expects a deeper analysis of the relationships. Criterion 17.2 (Link analysis) Provide characteris- Each of the existing semantic relationships among tics of existing semantic relationships among datasets discovered datasets should be presented to the user with based on links among resources. a description of the relationship, i.e. the kind of the rela- Criterion 17.3 (Shared vocabularies analysis) Pro- tionship and its deeper characteristics. The descriptions vide characteristics of existing semantic relationships will help the user to understand the relationships and among datasets based on shared vocabularies or their decide whether they are relevant for the integration of parts. the datasets or not. When the datasets share resources or their resources As a supplemental requirement to the previous one are linked, the deeper characteristics may involve the is to support methods for automated or semi-automated information about the classes of the resources, the ra- deduction of semantic relationships among datasets. 
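A simple way to quantify the link-based relationships targeted by Criterion 17.2 is to count the linking statements between two datasets stored in separate named graphs. The graph IRIs below are hypothetical, and the sketch assumes the owl:sameAs links are stored alongside the first dataset:

PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT (COUNT(DISTINCT ?a) AS ?linkedResources)
WHERE {
  GRAPH <http://example.org/dataset/A> { ?a owl:sameAs ?b . }
  GRAPH <http://example.org/dataset/B> { ?b ?p ?o . }
}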
tio of the number of the shared or linked resources to These methods may be useful when semantic relation- the total number of the resources belonging to these ships cannot be directly discovered based on statements classes, compliance of the datasets in the statements present in the datasets but new statements must be provided about the shared resources, etc. There are ap- deduced from the existing ones. There are tools like proaches which provide explanations of such relation- SILK [129], SLINT [91] or SERIMI [5] which support ships to users. For example, [97][98] propose a method so called link discovery which means deducing new for generating explanations of relationships between statements linking resources of, usually, two datasets. entities in a form of most informative paths between In theory, deduction of new linking statements is usu- entities. The approach can be extended to explaining ally done using various similarity measures [119][86]. relationships among datasets which contain related en- In [89] is a recent comprehensive survey of link dis- tities. In [112], the authors present another approach to covery frameworks. Another family of tools supports explaining relationships between datasets on the base of so called ontology matching [94] which is a process of so called relatedness cores which are dense sub-graphs deducing mappings between two vocabularies. These that have strong relationships with resources from re- methods may help LDCP to provide the user with more lated datasets. semantic relationships. When the datasets neither share the resources nor Requirement 18 (Semantic relationship deduction) there are links among them, the semantic relationship Provide support for automated or semi-automated de- may be given by the fact that they share some parts duction of semantic relationships among datasets. of vocabularies, i.e. classes or predicates. Such rela- tionships may be deduced from metadata collected by Example 18.1 (Link discovery using SILK) A clas- services such as LODStats [41]. sic example of adding semantic relations to existing data is link discovery. Let us, for instance, consider Requirement 17 (Semantic relationship analysis) a dataset with useful information about all Czech ad- Provide characteristics of existing semantic relation- dresses, towns, etc. called RÚIAN. Data on the web, however, typically links to DBPedia representations of Requirement 19 (Transformations based on vocab- Czech cities and regions. In order to be able to use ularies) Provide transformations between well-known information both from the LOD cloud and from RÚIAN, semantically overlapping vocabularies. the data consumer needs to map, e.g., the Czech towns in the Czech dataset to the Czech towns in DBpedia, Example 19.1 (GoodRelations to Schema.org trans- based on their names and based on regions in which formation) In the process of development of the they are located. This is a task well suited for a link Schema.org vocabulary, a decision was made to import discovery tool such as SILK, and a LDCP should be classes and properties of the GoodRelations vocabulary able to support this process. used for e-commerce into the Schema.org namespace. This decision is questionable and controversial53 in the Criterion 18.1 (Link discovery) Provide support for light of best practices for vocabulary creation. These automated or semi-automated deduction of links among suggest reusing existing parts of vocabularies instead of datasets. 
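Dedicated link discovery frameworks such as SILK (Example 18.1) use configurable similarity measures, but the basic idea can be sketched with a deliberately naive SPARQL query that proposes owl:sameAs candidates by exact label match. The graph IRIs are hypothetical, and real matching would also compare additional properties such as the containing region:

PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
CONSTRUCT { ?town owl:sameAs ?dbpediaTown }
WHERE {
  GRAPH <http://example.org/dataset/ruian>   { ?town rdfs:label ?name . }
  GRAPH <http://example.org/dataset/dbpedia> { ?dbpediaTown rdfs:label ?dbpediaName . }
  FILTER (LCASE(STR(?name)) = LCASE(STR(?dbpediaName)))
}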
creating identical classes and properties or importing them into another IRI namespace. Nevertheless, the fact Criterion 18.2 (Ontology alignment) Provide sup- remains that some datasets might be using GoodRela- 54 55 port for automated or semi-automated ontology align- tions and other datasets might be using Schema.org ment. for, e.g., the specification of opening hours of a point of sales. This is a perfect opportunity for a transformation 4.2.3. Semantic transformations script or component which would transform data mod- The user now needs to process the discovered eled according to one of those ways to the other kind, if datasets. As we discuss later in Section 4.4, this means necessary (Req. 19). either visualizing the datasets or exporting them with Criterion 19.1 (Vocabulary mapping based trans- the purpose of processing in an external tool. In gen- formations) Provide transformations between well- eral, it means to transform the datasets from their orig- known semantically overlapping vocabularies based on inal form to another one, which is necessary for such their mapping, e.g. using OWL. processing. We can consider transformations, which transform RDF representation of the datasets to another Criterion 19.2 (Vocabulary-based SPARQL trans- RDF representation or to other data models such as formations) Provide transformations between well- relational, tree, etc. In this section, we discuss the RDF known semantically overlapping vocabularies, based to RDF transformations. The transformations to other on predefined SPARQL transformations. data models are considered as outputs of the process and are covered in Section 4.4. A transformation in- It is also frequent that datasets do not re-use well- volves various data manipulation steps which should known vocabularies but use specific vocabularies cre- be supported by LDCP. ated ad-hoc by their publishers. Even though transfor- First, LDCP must deal with the fact that there exist mation support for these vocabularies in terms of the different vocabularies which model a similar part of previous requirement is not realistic simply because reality. We say that such datasets are semantically over- there are too many ad-hoc vocabularies, the user can lapping. Let us note that a semantic overlap is the kind still expect LDCP to be able to provide some kind of of semantic relationship discussed in Req. 17. There- support when such vocabularies appear among the dis- fore, it often happens that the vocabulary used in the covered datasets. In particular, the users can expect that dataset is different from the one required for the further LDCP will at least discover that there exist possible processing. In such situations it is necessary to trans- semantic overlaps between the vocabularies. Discover- form the dataset so that it corresponds to the required ing these semantic overlaps may be achieved by ontol- vocabulary. The transformation can be based on an on- ogy matching techniques which are already involved tological mapping between both vocabularies or it can in Req. 18. be based on a transformation script specifically created for these two vocabularies. This requirement is realis- 53http://wiki.goodrelations-vocabulary.org/ tic when we consider well-known vocabularies such as GoodRelations_and_schema.org 54http://purl.org/goodrelations/v1# FOAF. Schema.org, GoodRelations, WGS84_pos, and OpeningHoursSpecification others registered in LOV. 
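A vocabulary-based transformation of the kind required by Criterion 19.2 can be captured as a predefined SPARQL CONSTRUCT. The following simplified sketch rewrites GoodRelations opening hours to their Schema.org counterparts and intentionally ignores day-of-week and validity information:

PREFIX gr: <http://purl.org/goodrelations/v1#>
PREFIX schema: <http://schema.org/>
CONSTRUCT {
  ?spec a schema:OpeningHoursSpecification ;
        schema:opens ?opens ;
        schema:closes ?closes .
}
WHERE {
  ?spec a gr:OpeningHoursSpecification ;
        gr:opens ?opens ;
        gr:closes ?closes .
}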
55http://schema.org/OpeningHoursSpecification Having the possible semantic overlaps, the user then may contain triples such as ex:a skos:broader needs to specify the particular transformation on his ex:b, whereas, the output is required to contain ex:b own. Also, he can expect LDCP to be able to automati- skos:narrower ex:a because of limitations of cally or semi-automatically discover the transformation another component. These inverse triples can be in- itself. Such discovery can be achieved by exploiting ferred and materialized by LDCP, based on the knowl- various techniques of ontology alignment [59][26]. edge of SKOS. Requirement 20 (Ontology alignment) Provide auto- Example 21.2( owl:sameAs fusion) Let us have mated or semi-automated discovery of transformations two datasets we already mentioned in Example 18.1, between semantically overlapping vocabularies. Czech towns from DBPedia and the Czech registry con- Example 20.1 (Ontology alignment using SILK) taining information about all addresses and, among Since RDF vocabularies are also RDF datasets, the others, cities of the Czech Republic and let us con- same techniques for matching RDF data can be used for sider we linked the corresponding entities using the matching RDF vocabularies. Specifically, SILK can be owl:sameAs links. Now if we are preparing a dataset used to match classes and properties based on various for, e.g. visualization to a user, it might be more fitting to criteria such as their labels and descriptions, hierar- fuse the data about each city from both datasets than to chies, etc. The resulting linkset is then the ontology map- leave the data with the linking. Disjunctive facts about ping, e.g. using the owl:equivalentProperty one city are easy to fuse. We choose the IRI of one or the or owl:equivalentClass predicates. other as the target subject and copy all predicates and objects (including blank nodes) using this subject. What Criterion 20.1 (Ontology alignment) Provide auto- may require assistance are resulting conflicting state- mated or semi-automated discovery of transformations ments such as different city populations. LDCP should between semantically overlapping vocabularies and support the user in this process by, at least, providing a ontologies. list of conflicts and options of solving them. Another often required kind of data manipulation is Criterion 21.1 (RDFS inference) Support inference inference. Inference can be used when additional knowl- based on inference rules encoded in the RDFS vocabu- edge about the concepts defined by the vocabulary is lary. provided which can be used to infer new statements. Such knowledge is expressed in a form of so called Criterion 21.2 (Resource fusion) Support fusion of inference rules. Semantic mappings between vocabular- statements about the same resources from different ies mentioned above also form a kind of inference rules, datasets. but here we consider them in their broader sense. A spe- Criterion 21.3 (OWL inference) Support inference cific kind of inference rules allows one to specify that based on inference rules encoded OWL ontologies. two resources are the same real-world entity, i.e. the owl:sameAs predicate. Having such inference rule Besides transformations, the user will typically need means that statements about both resources need to be the standard operations for data selection and projec- fused together. Fusion does not mean simply putting the tion. 
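The SKOS inference described in Example 21.1 can be materialized with a one-line SPARQL CONSTRUCT; a LDCP could ship such rules for the vocabularies it knows:

PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
CONSTRUCT { ?broader skos:narrower ?narrower }
WHERE { ?narrower skos:broader ?broader }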
LDCP can assist the user with specification of statements together but also identification of conflicting these operations by providing graphical representation statements and their resolution [85][73]. of the required data subset, which may correspond to Requirement 21 (Inference and resource fusion) a graphical representation of a SPARQL query like in Support inference based on inference rules encoded in SPARQLGraph [111] and similar approaches. vocabularies or ontologies of discovered datasets. Requirement 22 (Selection and projection) Support Example 21.1 (Inference) Many RDF vocabularies assisted graphical selection and projection of data. are defined with some inference rules in place. An ex- Example 22.1 (Data language selection) One type of ample of inference which would be beneficial to sup- selection may be limiting the dataset to a single lan- port in an LDCP can be the SKOS vocabulary with guage. For example, when working with the EU Meta- its skos:broader and skos:narrower proper- data Registry (MDR), codelists are usually provided ties. According to the SKOS reference, these proper- in all EU languages. Having the data in multiple lan- 56 ties are inverse properties . For instance, a dataset guages may, however, not be necessary. A LDCP may support common selection techniques without requiring 56https://www.w3.org/TR/skos-reference/#L2055 the user to write SPARQL queries. Criterion 22.1 (Assisted selection) Support assisted 4.2.4. Provenance and license management graphical selection of data. For the output data to be credible and reusable, de- Example 22.2 (Data projection) An example of a pro- tailed provenance information should be available cap- jection may be picking only data which is relevant for turing every step of the data processing task, starting a given use case, while omitting other properties of from the origin of the data to the final transformation represented resources. The user may again be assisted step and data export or visualization. There is The 57 by a graphical overview of the dataset with possibility PROV Ontology , a W3C Recommendation that can to select relevant entities, their properties and links. be used to capture provenance information in RDF. Criterion 22.2 (Assisted projection) Support assisted Requirement 25 (Provenance) Provide provenance graphical projection of data. information throughout the data processing pipeline using a standardized vocabulary. Besides the above described specific data manipu- lation requirements, the user may need to define own Example 25.1 (Provenance) When the goal of the LD specific transformations expressed in SPARQL. consumption process is to gather data, combine it and Requirement 23 (Custom transformations) Support pass it to another tool for processing, it might be useful custom transformations expressed in SPARQL. to store information about this process. Specifically, the consumers of the transformed data might be interested Criterion 23.1 (Custom transformations) Support in where the data comes from, how it was combined, custom transformations expressed in SPARQL. what selection and projection was done, etc. This in- Having the support for all kinds of data manipu- formation can be stored as an accompanying manifest lations described above it is necessary to be able to using the PROV Ontology and may look like this (taken combine them together so that the discovered datasets directly from the W3C Recommendation): are transformed to the required form. 
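The language-based selection from Example 22.1 is easy to express in SPARQL, and a LDCP could generate such a filter from a graphical language picker instead of asking the user to write it. The sketch below assumes SKOS labels, as commonly used in code lists:

PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
CONSTRUCT { ?concept skos:prefLabel ?label }
WHERE {
  ?concept skos:prefLabel ?label .
  FILTER (LANGMATCHES(LANG(?label), "en"))
}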
The user may ex:aggregationActivity be required to create such data manipulation pipeline a prov:Activity; manually. However, we might also expect that LDCP prov:wasAssociatedWith ex:derek; compiles such pipeline automatically and the user only prov:used ex:crimeData; checks the result and adjusts it when necessary. prov:used ex:nationalRegionsList; Requirement 24 (Automated data manipulation) . Support automated compilation of data manipulation pipelines and enable their validation and manual ad- Criterion 25.1 (Provenance input) Ability to exploit justment by users. structured provenance information accompanying input data. Example 24.1 (Automated data manipulation) A LDCP may be aware of what the data in each step of Criterion 25.2 (Provenance output) Provide prove- the consumption process looks like even as the pro- nance information throughout the data processing cess is being designed, e.g. by performing background pipeline using a standardized vocabulary. operations on data samples, and it may recommend possible or probable next steps. Using the same prin- Part of metadata of a dataset should be, at least in ciples, the data transformation pipelines may be even case of open data, the information about the license generated automatically, given that the requirements under which the data is published. A common prob- on the final data structure are known and the source lem when integrating data from various data sources data is accessible. This can be seen in e.g. LinkedPipes is license management, i.e. determining whether the Visualization [69], where given an RDF data source, licenses of two data sources are compatible and what a list of possible visualizations formed by data trans- license to apply to the resulting data. This problem gets formation pipelines, is offered automatically, based on even more complicated when dealing with data pub- known vocabularies and input and output descriptions lished in different countries. A nice illustration of the of the data processing components. problem is given as Table 3 in ODI’s Licence Compat- ibility58 page. Also, the European Data Portal has its Criterion 24.1 (Automated data manipulation) Sup- port automated compilation of data manipulation 57https://www.w3.org/TR/prov-o/ pipelines and enable their validation and manual ad- 58https://github.com/theodi/open-data-licensing/ justment by users. blob/master/guides/licence-compatibility.md Licence Assistant59 which encompasses an even larger Requirement 27 (Manual visualization) Offer visu- list of open data licenses. Issues with (linked) open data alizations of LD based on manual mapping from the licensing are also identified in [42], where the authors data to the visualizations. also present the License Compositor tool. Finally, LIVE [52] is a tool for verification of license compatibility of Example 27.1 (Manual visualization in LinkDaViz) two LD datasets. An example of manual visualization is LinkDaViz. The user first selects the dataset to be visualized and then Requirement 26 (License management) Support li- selects specific classes and properties from the dataset. cense management by tracking licenses of original data Based on the selected properties and their data types, sources, checking their compatibility when data is in- visualization types are recommended and more can be tegrated and helping with determining the resulting configured manually to get a visualization such as a license of republished data. map, a bar chart, etc.
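Provenance recorded with the PROV Ontology (Req. 25) is itself RDF and can be inspected with a simple query. A minimal sketch that lists the inputs and responsible agents of each recorded activity:

PREFIX prov: <http://www.w3.org/ns/prov#>
SELECT ?activity ?input ?agent
WHERE {
  ?activity a prov:Activity ;
            prov:used ?input .
  OPTIONAL { ?activity prov:wasAssociatedWith ?agent }
}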

Example 26.1 (License management) Based on open Criterion 27.1 (Manual visualization) Offer visual- data license compatibility information such as the izations of LD based on manual mapping from the data ODI’s Licence Compatibility table, a LDCP should ad- to the visualizations. vise its user on whether two data sources can be com- Requirement 28 (Vocabulary-based visualization) bined or not and what is the license of the result. The Offer automated visualizations of LD based on well- licenses of individual datasets can be obtained from its known vocabularies. DCAT metadata. For datasets without a specific license, a warning should be displayed. For know licenses, for Example 28.1 (Automated visualization in LP-VIZ) instance a dataset published using CC-BY and another In contrast to manual visualization in Example 27.1, in one published using CC-BY-SA, the CC-BY-SA mini- LinkedPipes Visualization the user points to the dataset mum license should be suggested. The legal environ- and the tool by itself, automatically, takes a look at ment of open data is quite complex and obscure even to what data is present in the dataset. It then offers pos- developers who are LD experts, so a support like this, sible visualizations, based on the found, well-known preferably even region specific, is necessary. vocabularies supported by components registered in the tool. Criterion 26.1 (License awareness) Awareness of dataset licenses, e.g. gathered from metadata, and their Criterion 28.1 (Vocabulary-based visualization) Of- presentation to the user. fer automated visualizations of LD based on well- known vocabularies. Criterion 26.2 (License management) Awareness of the license combination problem and assistance in 4.4. Data output determining the resulting license when combining datasets. The other possible goal of our user is to output raw data for further processing. The output data can be in 4.3. Data visualization various formats, the most popular may include RDF for LD, which can be exported in various standardized The goal of using LDCP is either to produce data ways. The first means of RDF data output is to a dump for processing in other software or to produce visual- file using standardized serializations. The inability to izations of discovered data sets. Visualizations can be create a dump in all standardized formats (only some) either automated, based on correct usage of vocabular- will be classified as partial coverage of the requirement. ies and a library of visualization tools as in the Linked- Requirement 29 (RDF dump output) Ability to ex- Pipes Visualization (LP-VIZ) [68], which offers them port RDF data to a dump file in all standard RDF seri- for DCV, SKOS hierarchies and Schema.org GeoCoor- alizations. dinates on Google Maps, or it can be a manually speci- fied mapping of the RDF data as in LinkDaViz [118]. Example 29.1 (RDF dump output) Various tools may need various RDF serializations to be able to pro-
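The license awareness of Criterion 26.1 can start with a query over the DCAT metadata of the integrated datasets; distributions with no bound ?license are the ones for which Example 26.1 suggests showing a warning. This is a sketch assuming DCAT-style catalog records:

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?dataset ?distribution ?license
WHERE {
  ?dataset a dcat:Dataset ;
           dcat:distribution ?distribution .
  OPTIONAL { ?distribution dcterms:license ?license }
}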

59https://www.europeandataportal.eu/en/ cess the output data. The main problem which can be licence-assistant seen quite often is that many applications consuming RDF only support triple formats and not quad formats. Criterion 30.1 (SPARQL Update support) Ability to Therefore, a LDCP should support output in all stan- load RDF data to a SPARQL endpoint using SPARQL dard RDF 1.1 serializations, which can be easily sup- Update. ported by using an existing library such as Apache Jena or Eclipse RDF4J. Criterion 30.2 (SPARQL: named graphs support) Awareness of named graphs in a SPARQL endpoint and Criterion 29.1 (RDF/XML file dump) Ability to ability to load data to them. dump data to an RDF/XML file. Criterion 30.3 (SPARQL: authentication support) Criterion 29.2 (Turtle file dump) Ability to dump Ability to access SPARQL Update endpoints which re- data to a Turtle file. quire authentication.
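For Criteria 30.1 and 30.2, loading data into a specific named graph can be done with SPARQL 1.1 Update. The graph IRI and the triple below are purely illustrative; for data already published as a dump, the LOAD operation can be used instead of INSERT DATA:

PREFIX ex: <http://example.org/>
INSERT DATA {
  GRAPH <http://example.org/graph/cities> {
    ex:Prague ex:populationTotal 1280000 .
  }
}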

Criterion 29.3 (N-Triples file dump) Ability to dump Criterion 30.4 (SPARQL Graph Store HTTP Pro- data to an N-Triples file. tocol output) Ability to load RDF data to a SPARQL endpoint using the SPARQL Graph Store HTTP Proto- Criterion 29.4 (JSON-LD file dump) Ability to dump col. data to a JSON-LD file. Lastly, LDCP should be able to write RDF data to Criterion 29.5 (RDFa file dump) Ability to dump Linked Data Platform compliant servers. This may also data to an RDFa file. include support for W3C Recommendations building on top of the LDP, such as Linked Data Notifications [23] Criterion 29.6 (N-Quads file dump) Ability to dump for sending notifications to inboxes on the web. data to an N-Quads file. Requirement 31 (Linked Data Platform output) Criterion 29.7 (TriG file dump) Ability to dump data Ability to write data to Linked Data Platform compliant to a TriG file. servers.

Another way of outputting RDF data is directly into Example 31.1 (Loading data to Linked Data Plat- a triplestore, using one of two standard ways, either the form) The Linked Data Platform (LDP) specifications SPARQL Update query60 or using the SPARQL Graph 61 defines (among others) how new entities should be Store HTTP Protocol . added to existing containers63. Therefore, a LDCP Requirement 30 (SPARQL output) Ability to load should be aware of this and support the handling of RDF data to a SPARQL endpoint using SPARQL Up- LDP containers. A specific use case may be sending date or SPARQL Graph Store HTTP protocol. messages according to the Linked Data Notifications W3C Recommendation. Example 30.1 (Loading data using the SPARQL Graph Store HTTP Protocol) For larger datasets, Criterion 31.1 (Linked Data Platform Containers where whole named graphs are manipulated, it is sim- Output) Ability to upload data by posting new entities pler to use the SPARQL Graph Store HTTP Protocol to to Linked Data Platform Containers. load data. Basically, a data dump is sent over HTTP Besides being able to output RDF, a perhaps even to the server instead of being inserted by a SPARQL more common use case of LDCP will be to prepare LD query. However, despite being a W3C Recommendation, to be processed locally by other tools, which do not the implementation of the protocol differs among quad know LD. One choice for non-LD data export is a CSV stores. Therefore, besides the endpoint URL, authenti- for tabular data. The usual way of getting CSV files out cation options and the target , a LDCP of RDF data is by using the SPARQL SELECT queries. should be also aware of the specific implementation Such CSV files can now be supplied by additional meta- of the endpoint. This can be seen e.g. in LinkedPipes data in JSON-LD according to the Model for Tabular 62 ETL’s Graph Store Protocol component . Data and Metadata on the Web (CSVW)64 W3C Recom- mendation. Besides tabular data, tree-like data are also 60https://www.w3.org/TR/sparql11-update/ 61https://www.w3.org/TR/sparql11-http-rdf-update/ 62https://etl.linkedpipes.com/components/ 63https://www.w3.org/TR/ldp/#ldpc l-graphstoreprotocol 64https://www.w3.org/TR/tabular-data-model/ popular among 3-star data formats. These include XML Criterion 32.4 (XML file output) Ability to export and JSON, both of which the user can get, when a stan- data as arbitrarily shaped XML files. dardized RDF serialization in one of those formats, i.e. RDF/XML and JSON-LD, is enough. This is already Criterion 32.5 (JSON file output) Ability to export covered by Req. 29. However, for such data to be usable data as arbitrarily shaped JSON files. by other tools, it will probably have to be transformed Criterion 32.6 (Output for specific third-party tools) to the JSON or XML format accepted by the tools and Ability to export data for visualization in specific third- this is something that LDCP should also support. For party tools. advanced graph visualizations it may be useful to ex- port the data as graph data e.g. for Gephi [12]. Majority 4.5. Developer and community support of people today do not know SPARQL and do not want to learn it. However, LDCP should still allow them to For most of our requirements there already is a tool work with LD and use the data in their own tools. There- that satisfies them. 
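As noted above, the usual way of producing a CSV file from RDF is a SPARQL SELECT query whose result table maps directly to CSV columns. A sketch against DBpedia-style data (the ontology terms are those used by DBpedia; any similar vocabulary would do):

PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?name ?population
WHERE {
  ?city a dbo:City ;
        rdfs:label ?name ;
        dbo:populationTotal ?population .
  FILTER (LANGMATCHES(LANG(?name), "en"))
}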
The problem why there is no plat- fore, LDCP should not only allow to export 3* data, form using these tools is caused by their incompatibility but it should support doing so in a user friendly way, and the large effort needed to do so. LDCP does not with predefined output formats for external tools and, have to be a monolithic platform and can consist of ideally, predefined transformations from vocabularies multitude of integrated tools. However, the tools need used in the tool’s domain. This observation comes from to support easy integration, which motivates the next re- a recent publication of a Czech registry of subsidies, quirements. Each part of LDCP and LDCP itself should which was published as LD only, with N-Triples dumps, use the same API and configuration which it exposes dereferencable IRIs and a public SPARQL endpoint65. for others to use. The API should again be standardized, This step, while being extremely interesting to semantic i.e. REST or SOAP based, and the configuration should web researchers, was received negatively by the Czech be consistent with the rest of the data processing, which open data community, which demanded CSV or JSON means in RDF with a defined vocabulary. files and was not willing to learn RDF and SPARQL. This motivates Req. 32 and especially C 32.6. Requirement 33 (API) Offer an easy to use, well doc- umented API (REST-like or SOAP based) for all impor- Ability to Requirement 32 (Non-RDF data output) tant operations. export non-RDF data such as CSV with CSVW meta- data, XML, JSON or Graph data for visualizations or Example 33.1 (API) For instance, LinkedPipes ETL further processing in external tools. offers REST-like API used by its own frontend to per- form operations like running a transformation, creating Example 32.1 (Gephi support) To support graph vi- a transformation pipeline, etc. The API is documented sualization in Gephi, which can effectively visualize by OpenAPI66 (f.k.a. Swagger) to help developers with large data graphs, LDCP should support output con- its reuse. A LDCP should also have an API for all op- sumable by Gephi. Technically, this is a pair of CSV erations so that its parts are easily reusable and easily files with a specific structure. LDCP could include a integrated with other software. wizard for creation of such files. Criterion 33.1 (Existence of API) Ability to accept Criterion 32.1 (Non-expert data output) Ability to commands via an open API. guide a non-expert through the data export process. Criterion 33.2 (API documentation) Existence of a Criterion 32.2 (CSV file output) Ability to export documentation of the API of the tool, such as OpenAPI. data as CSV files. Criterion 33.3 (Full API coverage) Ability to accept Criterion 32.3 (CSV on the Web output) Ability to all commands via an open API. export data as CSV files accompanied by the CSV on the Web JSON-LD descriptor. Requirement 34 (RDF configuration) Offer RDF configuration where applicable. 65http://cedr.mfcr.cz/cedr3internetv419/ OpenData/DocumentationPage.aspx (in Czech only) 66https://github.com/OAI/OpenAPI-Specification Example 34.1 (RDF configuration) All configuration When the whole process of data gathering, process- properties and projects of a LDCP should be also stored, ing and output is described in LDCP, it may be useful or at least accessible, in RDF. This is so that this data to expose the resulting transformation process as a web can be read, managed and created by third-party RDF service, which can be directly consumed by e.g. 
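The Gephi export from Example 32.1 boils down to two SELECT queries whose results are saved as the node and edge CSV files. The column names below follow Gephi's spreadsheet import conventions, and the queries are only a minimal sketch that exports every IRI-to-IRI link:

# nodes.csv
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?Id ?Label
WHERE { ?Id rdfs:label ?Label }

# edges.csv
SELECT DISTINCT ?Source ?Target
WHERE { ?Source ?p ?Target . FILTER (isIRI(?Target)) }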
web tools and also shared easily on the Web, as any other applications. This could facilitate live data views or data. For instance, LinkedPipes ETL stores its transfor- customizable caching and update strategies. mation pipelines in RDF and makes their IRIs deref- erencable, which eases their sharing and offers wider Requirement 36 (Deployment of services) Offer de- options for their editing. ployment of the data transformation process as a web service. Criterion 34.1 (Configuration in open format) Con- figuration stored and accepted in an open format such Example 36.1 (Encapsulated SPARQL queries) A as JSON, XML or a properties file. very simple example of Req. 36 could be a SPARQL query, possibly with parameters, encapsulated as a web Criterion 34.2 (Configuration in RDF) Configura- service. This is a common way of helping non-RDF ap- tion stored and accepted in RDF. plications consume live RDF data without even know- Since one of the ideas of LD is distribution of effort, ing it. However, the process behind the web service the data processes defined in LDCP should themselves can be much more complex, and a LDCP could offer be shareable and described by an RDF vocabulary as creation of such services to its users. it is with, e.g. the Linked Data Visualization Model Criterion 36.1 (Deployment of services) Ability to (LDVM) pipelines [66]. For this, a community repos- deploy the data transformation process as a web ser- itory for sharing of LDCP plugins like visualization vice. components from Req. 28, vocabulary-based transform- ers from Req. 19 or even whole data processing projects should be available. 5. Tools evaluation and description Requirement 35 (Repositories for sharing) Offer support for plugin and project sharing repositories. In this section, we provide the evaluation results of 9 tools classified as LDCP according to the methodology Example 35.1 (Sharing on GitHub) Git is a very pop- described in Section 3.2. We evaluated all 93 criteria for ular tool for sharing of and collaboration on software each of the 9 tools. The detailed results of the evaluation and data, with GitHub as one of its most popular of each of the 9 tools can be seen in Table 3, Table 4 and repository implementations. To spark the network ef- Table 5 in Appendix A. Next, as detailed in Section 3.4, fect, a LDCP should have a catalog of plugins and its requirement score is computed as a percentage of reusable projects, on which the community can collabo- passed criteria of each requirement. These can be seen rate. Since projects themselves should be stored as RDF in Table 2. Finally, its requirement group score for each (Req. 34), they can be easily shared on, e.g. GitHub. group of requirements defined in Section 4 is computed This can be illustrated by, e.g. LinkedPipes ETL, which as an average of scores of requirements in the group. 67 has its repository of plugins and reusable pipeline These can be seen in Table 1. fragments68 which accompany component documenta- tion. 5.1. LinkedPipes Visualization Criterion 35.1 (Repository for plugins) Existence of an openly accessible repository of plugins such as LinkedPipes Visualization (LP-VIZ) [69] is a tool GitHub. providing automated visualizations of LD, fulfilling Req. 8 Preview based on vocabularies and Req. 28 Criterion 35.2 (Repository for projects) Existence of Vocabulary-based visualization. It queries the input data an openly accessible repository of projects such as given as an RDF dump or as a SPARQL endpoint for GitHub. 
known vocabularies, and based on the result, it offers appropriate visualizations. These include an interactive RDF data cube visualization, various types of SKOS hierarchy visualizations and a map based visualization (see Figure 2), which can be further filtered based on

67https://github.com/linkedpipes/etl/tree/develop/plugins
68https://github.com/linkedpipes/etl/tree/gh-pages/assets/pipelines

Table 1
Requirement group scores of evaluated candidates (columns: LP-VIZ, V2, VISU, Sp, YAS, UV, LP-ETL, OC, DL). Empty cells represent 0%.
Dataset discovery 7% 5% 11% 15%
Data manipulation 9% 3% 4% 1% 3% 20% 20% 20% 22%
Data visualization 50% 50% 50% 50% 50% 50%
Data output 4% 4% 48% 63% 38% 26%
Developer and community support 54% 75%

Fig. 2. LinkedPipes Visualization - Points on a map concepts connected to the points on the map. In ad- Fig. 3. V2 visualization dition, in the RDF data cube visualization it also em- ploys Req. 11 IRI dereferencing to search for labels of predefined or user provided SPARQL SELECT queries entities. to retrieve data from given SPARQL endpoints. Then As to the other parts of the LD consumption process, it uses Google Charts Editor to create a wide range of no requirements were satisfied in the Data output and visualizations (see Figure 4) based on the output table, Developer and community support groups. including line charts, tree charts or map visualizations. 5.2. V2

V2 [2] is an on-line demo for visualization of vocab- ularies used in RDF data. In the first step the user pro- vides a list of SPARQL endpoints. V2 executes a set of predefined SPARQL queries to analyze the vocabularies used in the data and creates the dataset preview. On the top level of the preview, the used classes are presented (see Figure 3). The user can select a class to see the information about the number of instances and about predicates used. V2 is simple, nevertheless it can be used to address the Dataset discovery part of the Fig. 4. VISU visualization LD consumption process, specifically, it covers Data preview. The main focus of VISU is at Data visualization.

5.3. VISU 5.4. Sparklis

VISUalization Playground [2] (VISU) is an on-line Sparklis [46] (Sp) is an on-line Linked Open Data service for visualization of SPARQL results. It uses exploration tool that does not require the user to Requirement LP-VIZ V2 VISU Sp YAS UV LP-ETL OC DL Catalog support 25% Catalog-free discovery Search query language 25% 25% Search query formulation assistance 33% Context-based intent enhancement Context-based discovery result enhancement Dataset ranking Preview based on vocabularies 67% Data preview 50% 50% 100% Quality indicators IRI dereferencing 50% 50% SPARQL querying 40% 20% 60% 20% 40% 100% 100% 60% 80% RDF dump load 9% 73% 64% 64% 73% Linked Data Platform input Monitoring of changes in input data 50% 50% Version management Semantic relationship analysis 33% Semantic relationship deduction 50% 50% Transformations based on vocabularies Ontology alignment Inference and resource fusion Selection and projection 50% Custom transformations 100% 100% 100% 100% Automated data manipulation Provenance License management 50% Manual visualization 100% 100% 100% 100% 100% Vocabulary-based visualization 100% RDF dump output 100% 100% 86% 71% SPARQL output 75% 100% Linked Data Platform output Non-RDF data output 17% 17% 17% 50% 67% 33% API 67% 100% RDF configuration 50% 100% Repositories for sharing Deployment of services 100% 100% Table 2 Requirement scores of evaluated candidates. Empty cells represent 0%.

have knowledge of SPARQL. The user can specify a In each step of the focus refinement process, the user SPARQL endpoint and explore its contents using so is guided by suggestions for all three types of constrains. called focus, which represents the user’s intent. Its iter- ative refinement corresponds to the exploration process. Sparklis offers an on–the–fly data visualization in a The focus is defined using three types of constrains: form of a table or, in case of geolocalized data, a map. A concept, entity and modifier. The concept constraint query corresponding to the current focus is accessible in allows the user to define subject type(s). The entity con- strain restricts the focus to a single entity. Modifiers the integrated YASGUI tool, which provides additional represent logical operations and filters. options for visualization and data export. covers several areas of the LD consumption process, mainly Data manipulation, Data visualization and Data output. It is currently used in multiple LD based tools69, e.g. Apache Jena Fuseki, Eclipse RDF4J and OntoText GraphDB triplestores.

5.6. UnifiedViews

UnifiedViews [74] (UV) is a predecessor of Linked- Pipes ETL, which means that it is also an open–source tool for design and execution of ETL pipelines (see Figure 7), and it is also focused on RDF and linked data and features a library of reusable components. Its requirements coverage is therefore a subset of the cov- Fig. 5. Sparklis user interface for data exploration erage of LinkedPipes ETL.

From the point of view of the LD Consumption pro- cess, Sparklis covers mainly Dataset discovery and Data manipulation.

5.5. YASGUI

Yet Another SPARQL GUI (YASGUI) [101] is a web-based SPARQL query editor. As such it offers the ability for the user to specify a SPARQL query without knowing the precise syntax. Once the query is executed YASGUI provides different visualizations, such as table, pivot table, Google Chart and Map visualizations, and export of SPARQL SELECT results to a CSV file. Fig. 7. UnifiedViews transformation pipeline

While it can be successfully used for LD publication, which is its primary purpose, its usage for LD consump- tion is limited to expert users only, due to its lack of proper documentation, incomplete API coverage and proprietary configuration formats.

5.7. LinkedPipes ETL

LinkedPipes ETL (LP-ETL) [71] is an open–source tool for design and execution of ETL (Extract, Trans- form, Load) pipelines (see Figure 8), with a library of more than 60 reusable components which can be used in the pipelines and which very well support the Data manipulation and Data output requirement groups. The tool is primarily focused on working with RDF data and features a thorough documentation, including how-tos and tutorials for typical LD ETL scenarios. LP-ETL Fig. 6. YASGUI SPARQL query builder interface also helps the user, who still has to be a LD expert The aim of YASGUI is not to be used as a standalone tool, but rather be integrated as a component. Still, it 69https://github.com/OpenTriply/YASGUI.YASQE though, to build a data transformation pipeline using 5.9. Datalift smart suggestions of components to use. LinkedPipes ETL is developed as a replacement for UnifiedViews. Datalift [45] (DL) aims at semantic lifting of non- RDF data, but it can be also used to consume RDF data. It uses projects to group different data sources together and it allows users to perform transformations such as SPARQL CONSTUCT, String to URI, IRI translation and Silk link discovery, supporting Data manipulation. Finally, it provides visualizations and RDF and CSV data export, fulfilling Req. 27 Manual visualization, Req. 29 RDF dump output, and C 32.2 CSV file output, respectively.

Fig. 8. LinkedPipes ETL transformation pipeline

Regarding the LD Consumption process, the most notable features are its capability of working with larger data using so called data chunking approach [70] sat- isfying C 12.5 SPARQL: advanced download. In the Developer and community support requirements group, the satisfaction of C 33.3 Full API coverage and C 34.2 Configuration in RDF means that the tool can be easily integrated with other systems. LinkedPipes ETL does not support the user in Dataset discovery or Data visualization.

5.8. OpenCube

OpenCube [63] (OC) is an open source project, Fig. 10. Datalift user interface which supports publication of statistical data using the RDF Data Cube Vocabulary and also provides visu- The web-based user interface is simple and user alizations and data analytics on top of the statistical friendly (see Figure 10). Datalift provides out of the cubes. OpenCube wraps another tool, the Information box manual visualization with support for several chart Workbench [53], and thus it has all the functionality types like line chart, table chart, bar chart, timeline, tree provided by Information Workbench (see Figure 9). map, geographical map, etc.

5.10. Discussion

Based on the conducted survey we determined that currently, there is no single tool sufficiently covering the LD consumption process. There are tools covering parts of it, providing automated visualization of pub- lished data (LinkedPipes Visualization), manual visu- alization of published data (V2, VISU), exploration of given data (Sparklis), SPARQL query building capabil- ities (YASGUI) and then there are tools mainly focused Fig. 9. OpenCube user interface in Information Workbench on publishing LD, which can also be used for LD con- sumption (UnifiedViews, LinkedPipes ETL, OpenCube, OpenCube also integrates several other tools such as Datalift). R and Grafter. The created visualization can be embed- One thing that the surveyed tools have in common is ded into an HTML page as an interactive script. that their potential users still need to know about LD and the semantic web technologies (RDF, SPARQL) at ommenders, which should be considered for integration least to some degree to be able to use them. Moreover, into LDCP regarding e.g. our Req. 6 Context-based even for users who know said technologies, it is neces- discovery result enhancement. The third category repre- sary to pick multiple tools to consume LD. Even then, sents the exploratory search systems (ESS) which could there are gaps in the LD consumption process mainly be considered for LDCP regarding Req. 2 Catalog-free in the area of Dataset discovery and Developer and discovery. A recent survey on RDF Dataset Profiling community support. No tools cover them sufficiently. [39] surveys methods and tools for LOD dataset pro- While we do not expect a tool covering the whole LD filing. The survey identifies and compares 20 existing consumption process to emerge, it is still unnecessarily profiling tools which could be considered for LDCP difficult to discover and process LD even using a com- regarding Req. 2 Catalog-free discovery. The surveyed bination of tools, since even when they exist for a given systems help the user to find the data he is looking for. area, they are not easily integrable (see Req. 33 API and However, in the scope of LDCP, this is only the begin- Req. 34 RDF configuration). In addition, the existing ning of the consumption, which is typically followed tools do not exploit the benefits given by publishing by processing of the found data leading to a custom data as LD to their full potential yet. Most of the exist- visualization or data in a format for further processing. ing tools work only at the RDF format level, e.g. show- In [115] authors introduce a system which provides ing triples, predicates and classes, not interpreting the a visual interface to assist users in defining SPARQL meaning of data modeled using the commonly reused queries. The paper compares 9 existing software tools vocabularies (e.g. 8.1 Preview – W3C vocabularies, 8.2 which provide such assistance to the users. These tools Preview – LOV vocabularies). They also do not work could be integrated into LDCP to cover Req. 3 Search much with techniques such as IRI dereferencing. query language and Req. 4 Search query formulation In the category of Dataset discovery, there is a lack assistance. of tools which would provide dataset search capabilities A recent survey on Ontology Matching [94] indicates enhanced with support for semantic relations, which there is a growing interest in the field, which could can be obtained from the existing data or existing data prove very useful when integrated into LDCP, covering catalogs. 
The existing tools are limited to requiring a Req. 17 Semantic relationship analysis and Req. 18 link to an RDF dump or a SPARQL endpoint from their Semantic relationship deduction. users as an input, which could, ideally, be provided by A survey on Quality Assessment for Linked Data the dataset discovery techniques. [133] defines 18 quality dimensions such as availability, In the category of Developer and community support consistency and completeness and 69 finer-grained met- we have seen two kinds of tools, commercial and aca- rics. The authors also analyze 30 core approaches to LD demic. The commercial tools such as fluidOps Informa- quality and 12 tools, which creates a comprehensive tion Workbench, on which OpenCube is built as a plu- base of quality indicators to be used to cover Req. 10 gin, seem to be maintained better. However, these are Quality indicators. One of the quality dimensions deals not friendly to users who want to use a tool for free. The with licensing, which can be a good base for Req. 26 academic tools usually serve as a demonstration of an License management, which is, so far, left uncovered. approach described in a paper, and are not maintained A survey on Linked Data Visualization [31] defines at all (V2), or they are supported by research projects 18 criteria and surveys 15 LD browsers, which could LinkedPipes ETL, OpenCube and their maintenance is be used to cover the discovery Req. 2 Catalog-free limited after the project ends. discovery and the visualization requirements (Req. 27 Manual visualization and Req. 28 Vocabulary-based visualization). 6. Related work In Exploring User and System Requirements of Linked Data Visualization through a Visual Dashboard In this section we go over related work in the sense Approach [84] the authors perform a focus group study. of similar studies of LD consumption possibilities. They aim for determining the users’ needs and sys- The Survey on Linked Data Exploration Systems tem requirements for visualizing LD using the dash- [80] provides an overview of 16 existing tools in three board approach. The authors utilize their Points of View categories, classified according to 16 criteria. The first (.views.) framework for various parallel visualizations category represents the LD browsers, the second cate- of LD. They also emphasize the heterogeneity of LD gory represents domain-specific and cross-domain rec- and the need for highly customized visualizations for different kinds of LD, which supports our Req. 8 Pre- 7. Conclusions view based on vocabularies, and Req. 28 Vocabulary- based visualization. In this paper, we identified the need for a software Close to our presented survey is a recent survey on which would provide access to Linked Data to con- The Use of Software Tools in Linked Data Publication sumers. We call such software Linked Data Consump- tion Platform (LDCP). We motivated our work by a task and Consumption [11]. The authors assume a LOD pub- given to a data journalist who needs to write an article lication process as well as a LOD consumption process about population of European cities and who wants to (both taken from existing literature). They do a system- back the article by data in a CSV file. Based on the atic literature review to identify existing software tools needs of the journalist we described the LD consump- for LOD publication and consumption. They identify 9 tion process, and in turn we defined 36 requirements on software tools for publication and consumption of LOD. 
In the category of Developer and community support we have seen two kinds of tools, commercial and academic. The commercial tools, such as the fluidOps Information Workbench on which OpenCube is built as a plugin, seem to be maintained better; however, they are not friendly to users who want to use a tool for free. The academic tools usually serve as a demonstration of an approach described in a paper and are either not maintained at all (V2), or they are supported by research projects (LinkedPipes ETL, OpenCube) and their maintenance is limited after the project ends.

6. Related work

In this section we go over related work in the sense of similar studies of LD consumption possibilities.

The Survey on Linked Data Exploration Systems [80] provides an overview of 16 existing tools in three categories, classified according to 16 criteria. The first category represents LD browsers. The second category represents domain-specific and cross-domain recommenders, which should be considered for integration into LDCP regarding, e.g., our Req. 6 Context-based discovery result enhancement. The third category represents exploratory search systems (ESS), which could be considered for LDCP regarding Req. 2 Catalog-free discovery. A recent survey on RDF Dataset Profiling [39] surveys methods and tools for LOD dataset profiling. It identifies and compares 20 existing profiling tools which could be considered for LDCP regarding Req. 2 Catalog-free discovery. The surveyed systems help users find the data they are looking for. However, in the scope of LDCP this is only the beginning of the consumption process, which is typically followed by processing of the found data, leading to a custom visualization or to data in a format suitable for further processing.

In [115] the authors introduce a system which provides a visual interface to assist users in defining SPARQL queries. The paper compares 9 existing software tools which provide such assistance. These tools could be integrated into LDCP to cover Req. 3 Search query language and Req. 4 Search query formulation assistance.

A recent survey on Ontology Matching [94] indicates that there is a growing interest in the field, which could prove very useful when integrated into LDCP, covering Req. 17 Semantic relationship analysis and Req. 18 Semantic relationship deduction.

A survey on Quality Assessment for Linked Data [133] defines 18 quality dimensions, such as availability, consistency and completeness, and 69 finer-grained metrics. The authors also analyze 30 core approaches to LD quality and 12 tools, which creates a comprehensive base of quality indicators to be used to cover Req. 10 Quality indicators. One of the quality dimensions deals with licensing, which can be a good basis for Req. 26 License management, which is, so far, left uncovered.

A survey on Linked Data Visualization [31] defines 18 criteria and surveys 15 LD browsers, which could be used to cover the discovery Req. 2 Catalog-free discovery and the visualization requirements (Req. 27 Manual visualization and Req. 28 Vocabulary-based visualization).

In Exploring User and System Requirements of Linked Data Visualization through a Visual Dashboard Approach [84] the authors perform a focus group study aimed at determining users' needs and system requirements for visualizing LD using the dashboard approach. The authors utilize their Points of View (.views.) framework for various parallel visualizations of LD. They also emphasize the heterogeneity of LD and the need for highly customized visualizations for different kinds of LD, which supports our Req. 8 Preview based on vocabularies and Req. 28 Vocabulary-based visualization.

Close to our survey is a recent survey on The Use of Software Tools in Linked Data Publication and Consumption [11]. The authors assume a LOD publication process as well as a LOD consumption process, both taken from existing literature. They perform a systematic literature review to identify existing software tools for LOD publication and consumption, and they identify 9 such tools. Their survey differs from ours in several aspects. First, the 9 evaluated tools are not recent: they were published between 2007 and 2012, and only one is from 2014, whereas we evaluate tools from 2013 or newer; our study is therefore more recent. Second, the authors evaluate how each tool supports the individual steps of both processes, but the evaluation criteria are not clear. The paper presents neither requirements nor criteria, only the steps of the processes, and it is not clear how the authors evaluated whether a given tool supports a given step; the only detail present is the evaluation of supported RDF serialization formats. Our study is therefore more detailed and precise. Third, they evaluate the tools only on the basis of the papers in which the tools are described, and it is not stated whether the tools were installed and used; in our study, this is one of the selection criteria. Our work is therefore more useful for the community not only because it is more recent, more detailed and more precise, but also because it informs readers about the tools which can actually be used and about the requirements they really cover at run time.

Another close paper is our own study [72], conducted at the end of 2015, which contains a brief survey on the same topic. The survey presented in this paper is its significant extension. We describe the LD consumption process and compare it to the existing literature. We revise the evaluation requirements, put them into the context of the LD consumption process and make their evaluation more precise and repeatable by introducing exact evaluation criteria. Both the requirements and the criteria are now supported by existing literature and W3C Recommendations. In [72] the evaluated tools were selected based on a quick literature survey, while in this paper we selected the tools based on a systematic, deep and repeatable survey. The evaluation results are presented in greater detail. Last but not least, the tools evaluated in this paper are more recent and we recently (November 2017) checked their availability.

7. Conclusions

In this paper, we identified the need for software which would provide access to Linked Data to consumers. We call such software a Linked Data Consumption Platform (LDCP). We motivated our work by a task given to a data journalist who needs to write an article about the population of European cities and who wants to back the article by data in a CSV file. Based on the needs of the journalist we described the LD consumption process and, in turn, defined 36 requirements on tools implementing it, which were further split into 93 evaluation criteria. The requirements were based on and supported by standards and existing literature. We then performed a survey of 110 existing tools, which after 4 elimination rounds ended with 9 tools which we consider minimalistic LDCPs and which are usable for various parts of the consumption process.

Based on the survey, we come to the conclusion that there is currently no usable tool which could support the whole LD consumption process. This poses a problem for LD adoption, as consumers often still find consuming 3-star data such as CSV files more comfortable than working with LD. At the same time, potential LD publishers have a hard time convincing their superiors to sanction the additional effort needed to achieve LD publication when there are no immediately appreciable benefits of doing so.

Our survey may serve as an overview of the expected functionality for anyone who would like to build a tool (or integrate existing tools) for LD consumption or for selected parts of the consumption process, as well as a starting point for anyone interested in what benefits to expect when publishing data as LD.

As to future research directions, the possibilities of integrating individual tools into bigger platforms should be investigated. Our survey shows that there are no technical barriers to making LD as usable to non-experts as other open data formats such as CSV files; what is missing is proper tooling.
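To make the last point concrete for the motivating scenario, the sketch below retrieves city populations from a public SPARQL endpoint directly as a CSV file, i.e. the 3-star output the data journalist asked for. The query is deliberately simplified (the restriction to European cities is omitted), and the endpoint's support for the text/csv Accept header is an assumption about its configuration.

```python
# Sketch of the final step of the consumption process in the journalist
# scenario: run a SPARQL query and store the result as a CSV file.
import requests

ENDPOINT = "https://dbpedia.org/sparql"  # assumed public SPARQL endpoint

QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?city ?population WHERE {
  ?city a dbo:City ;
        dbo:populationTotal ?population .
}
ORDER BY DESC(?population)
LIMIT 100
"""

response = requests.get(ENDPOINT, params={"query": QUERY},
                        headers={"Accept": "text/csv"}, timeout=60)
response.raise_for_status()

# The journalist gets ordinary CSV that any spreadsheet tool can open.
with open("cities.csv", "w", encoding="utf-8") as f:
    f.write(response.text)
```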
Acknowledgments

This work was supported in part by the Czech Science Foundation (GAČR), grant number 16-09713S, and in part by the project SVV 260451.

References

[1] A. Abele. Linked data profiling: Identifying the domain of datasets based on data content and metadata. In J. Bourdeau, J. Hendler, R. Nkambou, I. Horrocks, and B. Y. Zhao, editors, Proceedings of the 25th International Conference on World Wide Web, WWW 2016, Montreal, Canada, April 11-15, 2016, Companion Volume, pages 287–291. ACM, 2016.
[2] M. Alonen, T. Kauppinen, O. Suominen, and E. Hyvönen. Exploring the Linked University Data with Visualization Tools. In P. Cimiano, M. Fernández, V. López, S. Schlobach, and J. Völker, editors, The Semantic Web: ESWC 2013 Satellite Events, Montpellier, France, May 26-30, 2013, Revised Selected Papers, volume 7955 of Lecture Notes in Computer Science, pages 204–208. Springer, 2013.
[3] A. Steigmiller, T. Liebig, and B. Glimm. Konclude: System description. Web Semantics: Science, Services and Agents on the World Wide Web, 27:78–85, 2014. Semantic Web Challenge 2013.
[4] P. Arapov, M. Buffa, and A. Ben Othmane. Semantic Mashup with the Online IDE WikiNEXT. In Proceedings of the 23rd International Conference on World Wide Web, WWW '14 Companion, pages 119–122, New York, NY, USA, 2014. ACM.
[5] S. Araújo, J. Hidders, D. Schwabe, and A. P. de Vries. SERIMI - resource description similarity, RDF instance matching and interlinking. In P. Shvaiko, J. Euzenat, T. Heath, C. Quix, M. Mao, and I. F. Cruz, editors, Proceedings of the 6th International Workshop on Ontology Matching, Bonn, Germany, October 24, 2011, volume 814 of CEUR Workshop Proceedings. CEUR-WS.org, 2011.
[6] M. Arenas, B. Cuenca Grau, E. Kharlamov, S. Marciuska, D. Zheleznyakov, and E. Jimenez-Ruiz. SemFacet: Semantic Faceted Search over . In Proceedings of the 23rd International Conference on World Wide Web, WWW '14 Companion, pages 123–126, New York, NY, USA, 2014. ACM.
[7] N. Arndt, N. Radtke, and M. Martin. Distributed Collaboration on RDF Datasets Using Git: Towards the Quit Store. In 12th International Conference on Semantic Systems Proceedings (SEMANTiCS 2016), SEMANTiCS '16, Leipzig, Germany, Sept. 2016.
[8] A. Assaf, R. Troncy, and A. Senart. Roomba: An Extensible Framework to Validate and Build Dataset Profiles. In F. Gandon, C. Guéret, S. Villata, J. Breslin, C. Faron-Zucker, and A. Zimmermann, editors, The Semantic Web: ESWC 2015 Satellite Events, Portorož, Slovenia, May 31 - June 4, 2015, Revised Selected Papers, pages 325–339, Cham, 2015. Springer International Publishing.
[9] J. Attard, F. Orlandi, and S. Auer. ExConQuer Framework - Softening RDF Data to Enhance Linked Data Reuse. In Villata et al. [128].
[10] B. Balis, T. Grabiec, and M. Bubak. Domain-Driven Visual Query Formulation over RDF Data Sets. In R. Wyrzykowski, J. J. Dongarra, K. Karczewski, and J. Wasniewski, editors, Parallel Processing and Applied Mathematics - 10th International Conference, PPAM 2013, Warsaw, Poland, September 8-11, 2013, Revised Selected Papers, Part I, volume 8384 of Lecture Notes in Computer Science, pages 293–301. Springer, 2013.
[11] A. Barbosa, I. I. Bittencourt, S. W. M. Siqueira, R. de Amorim Silva, and I. Calado. The use of software tools in linked data publication and consumption: A systematic literature review. Int. J. Semantic Web Inf. Syst., 13(4):68–88, 2017.
[12] M. Bastian, S. Heymann, and M. Jacomy. Gephi: An Open Source Software for Exploring and Manipulating Networks. 2009.
[13] F. Bauer and M. Kaltenböck. Linked open data: The essentials. Edition mono/monochrom, Vienna, 2011.
[14] N. Beck, S. Scheglmann, and T. Gottron. LinDA: A Service Infrastructure for Linked Data Analysis and Provision of Data Statistics, pages 225–230. Springer Berlin Heidelberg, Berlin, Heidelberg, 2013.
[15] W. Beek, L. Rietveld, H. R. Bazoobandi, J. Wielemaker, and S. Schlobach. LOD Laundromat: A Uniform Way of Publishing Other People's Dirty Data. In P. Mika, T. Tudorache, A. Bernstein, C. Welty, C. A. Knoblock, D. Vrandecic, P. T. Groth, N. F. Noy, K. Janowicz, and C. A. Goble, editors, The Semantic Web - ISWC 2014 - 13th International Semantic Web Conference, Riva del Garda, Italy, October 19-23, 2014. Proceedings, Part I, volume 8796 of Lecture Notes in Computer Science, pages 213–228. Springer, 2014.
[16] F. Benedetti, S. Bergamaschi, and L. Po. A visual summary for linked open data sources. In Proceedings of the 2014 International Conference on Posters & Demonstrations Track - Volume 1272, ISWC-PD'14, pages 173–176, Aachen, Germany, 2014. CEUR-WS.org.
[17] F. Benedetti, S. Bergamaschi, and L. Po. LODeX: A Tool for Visual Querying Linked Open Data. In Villata et al. [128].
[18] N. Bikakis, G. Papastefanatos, M. Skourla, and T. Sellis. A hierarchical aggregation framework for efficient multilevel visual exploration and analysis. Semantic Web, 8(1):139–179, 2017.
[19] C. Bizer, S. Auer, T. Berners-Lee, and T. Heath, editors. Proceedings of the Workshop on Linked Data on the Web, LDOW 2015, co-located with the 24th International World Wide Web Conference (WWW 2015), Florence, Italy, May 19th, 2015, volume 1409 of CEUR Workshop Proceedings. CEUR-WS.org, 2015.
[20] C. Bobed and E. Mena. Querygen: Semantic interpretation of keyword queries over heterogeneous information systems. Information Sciences, 329:412–433, 2016. Special issue on Discovery Science.
[21] P. Bottoni and M. Ceriani. SWOWS and dynamic queries to build browsing applications on linked data. In Proceedings: DMS 2014 - 20th International Conference on Distributed Multimedia Systems, pages 121–127, 2014.
[22] A. Bozzon, P. Cudré-Mauroux, and C. Pautasso, editors. Web Engineering - 16th International Conference, ICWE 2016, Lugano, Switzerland, June 6-9, 2016. Proceedings, volume 9671 of Lecture Notes in Computer Science. Springer, 2016.
[23] S. Capadisli, A. Guy, C. Lange, S. Auer, A. V. Sambra, and T. Berners-Lee. Linked Data Notifications: A Resource-Centric Communication Protocol. In E. Blomqvist, D. Maynard, A. Gangemi, R. Hoekstra, P. Hitzler, and O. Hartig, editors, The Semantic Web - 14th International Conference, ESWC 2017, Portorož, Slovenia, May 28 - June 1, 2017, Proceedings, Part I, volume 10249 of Lecture Notes in Computer Science, pages 537–553, 2017.
[24] C. Nikolaou, K. Dogani, K. Bereta, G. Garbis, M. Karpathiotakis, K. Kyzirakos, and M. Koubarakis. Sextant: Visualizing time-evolving linked geospatial data. Web Semantics: Science, Services and Agents on the World Wide Web, 35:35–52, 2015. Geospatial Semantics.
[25] R. Chawuthai and H. Takeda. RDF Graph Visualization by Interpreting Linked Data as Knowledge. In G. Qi, K. Kozaki, J. Z. Pan, and S. Yu, editors, Semantic Technology - 5th Joint International Conference, JIST 2015, Yichang, China, November 11-13, 2015, Revised Selected Papers, volume 9544 of Lecture Notes in Computer Science, pages 23–39. Springer, 2015.
[26] M. Cheatham, Z. Dragisic, J. Euzenat, D. Faria, A. Ferrara, G. Flouris, I. Fundulaki, R. Granada, V. Ivanova, E. Jiménez-Ruiz, et al. Results of the ontology alignment evaluation initiative 2015. In 10th ISWC workshop on ontology matching (OM), pages 60–115. No commercial editor., 2015.
[27] K. Christodoulou, N. W. Paton, and A. A. A. Fernandes. Structure inference for linked data sources using clustering. In Trans. Large-Scale Data- and Knowledge-Centered Systems [54], pages 1–25.
[28] D. Collarana, C. Lange, and S. Auer. Fuhsen: A platform for federated, rdf-based hybrid search. In Proceedings of the 25th International Conference Companion on World Wide Web, WWW '16 Companion, pages 171–174, Republic and Canton of Geneva, Switzerland, 2016. International World Wide Web Conferences Steering Committee.
[29] F. Corcoglioniti, A. P. Aprosio, and M. Rospocher. Demonstrating the Power of Streaming and Sorting for Non-distributed RDF Processing: RDFpro. In Proceedings of the ISWC 2015 Posters & Demonstrations Track co-located with the 14th International Semantic Web Conference (ISWC-2015), Bethlehem, PA, USA, October 11, 2015., volume 1486 of CEUR Workshop Proceedings. CEUR-WS.org, 2015.
[30] L. Costabello and F. Gandon. Adaptive Presentation of Linked Data on Mobile. In Proceedings of the 23rd International Conference on World Wide Web, WWW '14 Companion, pages 243–244, New York, NY, USA, 2014. ACM.
[31] A. Dadzie and M. Rowe. Approaches to visualising linked data: A survey. Semantic Web, 2(2):89–124, 2011.
[32] A.-S. Dadzie and E. Pietriga. Visualisation of linked data – reprise. Semantic Web, (Preprint):1–21, 2016.
[33] R. Delbru, N. Toupikov, M. Catasta, G. Tummarello, and S. Decker. Hierarchical Link Analysis for Ranking Web Data, pages 225–239. Springer Berlin Heidelberg, Berlin, Heidelberg, 2010.
[34] D. Diefenbach, R. Usbeck, K. D. Singh, and P. Maret. A Scalable Approach for Computing Semantic Relatedness Using Semantic Web Data. In Proceedings of the 6th International Conference on Web Intelligence, Mining and Semantics, WIMS '16, pages 20:1–20:9, New York, NY, USA, 2016. ACM.
[35] S. Djebali and T. Raimbault. SimplePARQL: A New Approach Using Keywords over SPARQL to Query the Web of Data. In Proceedings of the 11th International Conference on Semantic Systems, SEMANTICS '15, pages 188–191, New York, NY, USA, 2015. ACM.
[36] B. Do, P. Wetz, E. Kiesling, P. R. Aryan, T. Trinh, and A. M. Tjoa. StatSpace: A Unified Platform for Statistical Data Exploration. In C. Debruyne, H. Panetto, R. Meersman, T. S. Dillon, eva Kühn, D. O'Sullivan, and C. A. Ardagna, editors, On the Move to Meaningful Internet Systems: OTM 2016 Conferences - Confederated International Conferences: CoopIS, C&TC, and ODBASE 2016, Rhodes, Greece, October 24-28, 2016, Proceedings, volume 10033 of Lecture Notes in Computer Science, pages 792–809, 2016.
[37] M. Dubey, S. Dasgupta, A. Sharma, K. Höffner, and J. Lehmann. AskNow: A Framework for Natural Language Query Formalization in SPARQL, pages 300–316. Springer International Publishing, Cham, 2016.
[38] M. Dudáš, V. Svátek, and J. Mynarz. Dataset Summary Visualization with LODSight. In Gandon et al. [50], pages 36–40.
[39] M. B. Ellefi, Z. Bellahsene, J. Breslin, E. Demidova, S. Dietze, J. Szymanski, and K. Todorov. RDF Dataset Profiling - a Survey of Features, Methods, Vocabularies and Applications. To appear in The Semantic Web Journal.
[40] M. B. Ellefi, Z. Bellahsene, S. Dietze, and K. Todorov. Dataset Recommendation for Data Linking: An Intensional Approach. In Sack et al. [108], pages 36–51.
[41] I. Ermilov, M. Martin, J. Lehmann, and S. Auer. Linked Open Data Statistics: Collection and Exploitation. In P. Klinov and D. Mouromtsev, editors, Knowledge Engineering and the Semantic Web, volume 394 of Communications in Computer and Information Science, pages 242–249. Springer Berlin Heidelberg, 2013.
[42] I. Ermilov and T. Pellegrini. Data licensing on the cloud: empirical insights and implications for linked data. In Proceedings of the 11th International Conference on Semantic Systems, SEMANTICS 2015, Vienna, Austria, September 15-17, 2015, pages 153–156, 2015.
[43] S. Faisal, K. M. Endris, S. Shekarpour, S. Auer, and M. Vidal. Co-evolution of RDF Datasets. In Bozzon et al. [22], pages 225–243.
[44] A. Ferrara, L. Genta, S. Montanelli, and S. Castano. Dimensional clustering of linked data: Techniques and applications. In Trans. Large-Scale Data- and Knowledge-Centered Systems [54], pages 55–86.
[45] A. Ferrara, A. Nikolov, and F. Scharffe. Data linking for the semantic web. Semantic Web: Ontology and Knowledge Base Enabled Tools, Services, and Applications, 169:326, 2013.
[46] S. Ferré. Sparklis: An expressive query builder for SPARQL endpoints with guidance in natural language. Semantic Web, 8(3):405–418, 2017.
[47] S. Ferré. Squall: The expressiveness of SPARQL 1.1 made available as a controlled natural language. Data & Knowledge Engineering, 94, Part B:163–188, 2014. Special issue following the 18th International Conference on Applications of Natural Language Processing to Information Systems (NLDB'13).
[48] V. Fionda, G. Pirrò, and C. Gutierrez. NautiLOD: A Formal Language for the Web of Data Graph. ACM Trans. Web, 9(1):5:1–5:43, Jan. 2015.
[49] A. Flores, M. Vidal, and G. Palma. Graphium Chrysalis: Exploiting Graph Database Engines to Analyze RDF Graphs. In Presutti et al. [100], pages 326–331.
[50] F. Gandon, C. Guéret, S. Villata, J. G. Breslin, C. Faron-Zucker, and A. Zimmermann, editors. The Semantic Web: ESWC 2015 Satellite Events, Portorož, Slovenia, May 31 - June 4, 2015, Revised Selected Papers, volume 9341 of Lecture Notes in Computer Science. Springer, 2015.
[51] G. Giannopoulos, N. Vitsas, N. Karagiannakis, D. Skoutas, and S. Athanasiou. FAGI-gis: A Tool for Fusing Geospatial RDF Data. In Gandon et al. [50], pages 51–57.
[52] G. Governatori, H. Lam, A. Rotolo, S. Villata, G. A. Atemezing, and F. L. Gandon. LIVE: a tool for checking licenses compatibility between vocabularies and data. In Proceedings of the ISWC 2014 Posters & Demonstrations Track a track within the 13th International Semantic Web Conference, ISWC 2014, Riva del Garda, Italy, October 21, 2014., pages 77–80, 2014.
[53] P. Haase, M. Schmidt, and A. Schwarte. The information workbench as a self-service platform for linked data applications. In Proceedings of the Second International Conference on Consuming Linked Data - Volume 782, COLD'11, pages 119–124, Aachen, Germany, 2010. CEUR-WS.org.
[54] A. Hameurlain, J. Küng, R. Wagner, D. Bianchini, V. D. Antonellis, and R. D. Virgilio, editors. Transactions on Large-Scale Data- and Knowledge-Centered Systems XIX - Special Issue on Big Data and Open Data, volume 8990 of Lecture Notes in Computer Science. Springer, 2015.
[55] T. Hamon, N. Grabar, and F. Mougin. Querying biomedical Linked Data with natural language questions. Semantic Web, 8(4):581–599, 2017.
[56] A. Hasnain, Q. Mehmood, S. S. e Zainab, and A. Hogan. SPORTAL: searching for public SPARQL endpoints. In Proceedings of the ISWC 2016 Posters & Demonstrations Track co-located with 15th International Semantic Web Conference (ISWC 2016), Kobe, Japan, October 19, 2016., 2016.
[57] P. Heyvaert, P. Colpaert, R. Verborgh, E. Mannens, and R. V. de Walle. Merging and Enriching DCAT Feeds to Improve Discoverability of Datasets. In Gandon et al. [50], pages 67–71.
[58] F. Ilievski, W. Beek, M. van Erp, L. Rietveld, and S. Schlobach. LOTUS: Linked Open Text UnleaShed. In O. Hartig, J. F. Sequeda, and A. Hogan, editors, Proceedings of the 6th International Workshop on Consuming Linked Data co-located with 14th International Semantic Web Conference (ISWC 2015), Bethlehem, Pennsylvania, US, October 12th, 2015., volume 1426 of CEUR Workshop Proceedings. CEUR-WS.org, 2015.
[59] V. Ivanova, P. Lambrix, and J. Åberg. Requirements for and evaluation of user support for large-scale ontology alignment. In F. Gandon, M. Sabou, H. Sack, C. d'Amato, P. Cudré-Mauroux, and A. Zimmermann, editors, The Semantic Web. Latest Advances and New Domains, volume 9088 of Lecture Notes in Computer Science, pages 3–20. Springer International Publishing, 2015.
[60] A. Jentzsch, C. Dullweber, P. Troiano, and F. Naumann. Exploring linked data graph structures. In Proceedings of the ISWC 2015 Posters & Demonstrations Track co-located with the 14th International Semantic Web Conference (ISWC-2015), Bethlehem, PA, USA, October 11, 2015., 2015.
[61] V. Jindal, S. Bawa, and S. Batra. A review of ranking approaches for on web. Information Processing & Management, 50(2):416–425, 2014.
[62] T. Käfer, J. Umbrich, A. Hogan, and A. Polleres. DyLDO: Towards a Dynamic Linked Data Observatory. In C. Bizer, T. Heath, T. Berners-Lee, and M. Hausenblas, editors, WWW2012 Workshop on Linked Data on the Web, Lyon, France, 16 April, 2012, volume 937 of CEUR Workshop Proceedings. CEUR-WS.org, 2012.
[63] E. Kalampokis, A. Nikolov, P. Haase, R. Cyganiak, A. Stasiewicz, A. Karamanou, M. Zotou, D. Zeginis, E. Tambouris, and K. A. Tarabanis. Exploiting Linked Data Cubes with OpenCube Toolkit. In Proceedings of the ISWC 2014 Posters & Demonstrations Track a track within the 13th International Semantic Web Conference, ISWC 2014, Riva del Garda, Italy, October 21, 2014., pages 137–140, 2014.
[64] M. Kaschesky and L. Selmi. Fusepool R5 Linked Data Framework: Concepts, Methodologies, and Tools for Linked Data. In Proceedings of the 14th Annual International Conference on Digital Government Research, dg.o '13, pages 156–165, New York, NY, USA, 2013. ACM.
[65] A. Khalili, A. Loizou, and F. van Harmelen. Adaptive linked data-driven web components: Building flexible and reusable semantic web interfaces. In Sack et al. [108], pages 677–692.
[66] J. Klímek and J. Helmich. Vocabulary for Linked Data Visualization Model. In Proceedings of the Dateso 2015 Annual International Workshop on DAtabases, TExts, Specifications and Objects, Nepřívěc u Sobotky, Jičín, Czech Republic, April 14, 2015., pages 28–39, 2015.
[67] J. Klímek, J. Helmich, and M. Nečaský. Payola: Collaborative Linked Data Analysis and Visualization Framework. In 10th Extended Semantic Web Conference (ESWC 2013), pages 147–151. Springer, 2013.
[68] J. Klímek, J. Helmich, and M. Nečaský. Use Cases for Linked Data Visualization Model. In Bizer et al. [19].
[69] J. Klímek, J. Helmich, and M. Nečaský. LinkedPipes Visualization: Simple Useful Linked Data Visualization Use Cases. In The Semantic Web - ESWC 2016 Satellite Events, Heraklion, Crete, Greece, May 29 - June 2, 2016, Revised Selected Papers, pages 112–117, 2016.
[70] J. Klímek and P. Škoda. Speeding up Publication of Linked Data Using Data Chunking in LinkedPipes ETL. In H. Panetto, C. Debruyne, W. Gaaloul, M. P. Papazoglou, A. Paschke, C. A. Ardagna, and R. Meersman, editors, On the Move to Meaningful Internet Systems. OTM 2017 Conferences - Confederated International Conferences: CoopIS, C&TC, and ODBASE 2017, Rhodes, Greece, October 23-27, 2017, Proceedings, Part II, volume 10574 of Lecture Notes in Computer Science, pages 144–160. Springer, 2017.
[71] J. Klímek, P. Škoda, and M. Nečaský. LinkedPipes ETL: Evolved Linked Data Preparation. In The Semantic Web - ESWC 2016 Satellite Events, Heraklion, Crete, Greece, May 29 - June 2, 2016, Revised Selected Papers, pages 95–100, 2016.
[72] J. Klímek, P. Škoda, and M. Nečaský. Requirements on Linked Data Consumption Platform. In S. Auer, T. Berners-Lee, C. Bizer, and T. Heath, editors, Proceedings of the Workshop on Linked Data on the Web, LDOW 2016, co-located with 25th International World Wide Web Conference (WWW 2016), volume 1593 of CEUR Workshop Proceedings. CEUR-WS.org, 2016.
[73] T. Knap, J. Michelfeit, and M. Nečaský. Linked Open Data Aggregation: Conflict Resolution and Aggregate Quality. In X. Bai, F. Belli, E. Bertino, C. K. Chang, A. Elçi, C. C. Seceleanu, H. Xie, and M. Zulkernine, editors, 36th Annual IEEE Computer Software and Applications Conference Workshops, COMPSAC 2012, Izmir, Turkey, July 16-20, 2012, pages 106–111. IEEE Computer Society, 2012.
[74] T. Knap, P. Škoda, J. Klímek, and M. Nečaský. UnifiedViews: Towards ETL Tool for Simple yet Powerfull RDF Data Management. In Proceedings of the Dateso 2015 Annual International Workshop on DAtabases, TExts, Specifications and Objects, Nepřívěc u Sobotky, Jičín, Czech Republic, April 14, 2015., pages 111–120, 2015.
[75] A. Lausch, A. Schmidt, and L. Tischendorf. Data mining and linked open data – new perspectives for data analysis in environmental research. Ecological Modelling, 295:5–17, 2015. Use of ecological indicators in models.
[76] J. Lee, J.-K. Min, A. Oh, and C.-W. Chung. Effective ranking and search techniques for Web resources considering semantic relationships. Information Processing & Management, 50(1):132–155, 2014.
[77] J. Lehmann, R. Isele, M. Jakob, A. Jentzsch, D. Kontokostas, P. N. Mendes, S. Hellmann, M. Morsey, P. van Kleef, S. Auer, and C. Bizer. DBpedia - A large-scale, multilingual knowledge base extracted from Wikipedia. Semantic Web, 6(2):167–195, 2015.
[78] M. Luggen, A. Gschwend, B. Anrig, and P. Cudré-Mauroux. Uduvudu: a Graph-Aware and Adaptive UI Engine for Linked Data. In Bizer et al. [19].
[79] E. Mäkelä. Aether - Generating and Viewing Extended VoID Statistical Descriptions of RDF Datasets. In Presutti et al. [100], pages 429–433.
[80] N. Marie and F. L. Gandon. Survey of Linked Data Based Exploration Systems. In D. Thakker, D. Schwabe, K. Kozaki, R. Garcia, C. Dijkshoorn, and R. Mizoguchi, editors, Proceedings of the 3rd International Workshop on Intelligent Exploration of Semantic Data (IESD 2014) co-located with the 13th International Semantic Web Conference (ISWC 2014), Riva del Garda, Italy, October 20, 2014., volume 1279 of CEUR Workshop Proceedings. CEUR-WS.org, 2014.
[81] M. Martin, K. Abicht, C. Stadler, A.-C. Ngonga Ngomo, T. Soru, and S. Auer. CubeViz: Exploration and Visualization of Statistical Linked Data. In Proceedings of the 24th International Conference on World Wide Web, WWW '15 Companion, pages 219–222, New York, NY, USA, 2015. ACM.
[82] Y. C. Martins, F. F. da Mota, and M. C. Cavalcanti. DSCrank: A Method for Selection and Ranking of Datasets, pages 333–344. Springer International Publishing, Cham, 2016.
[83] E. Marx, S. Shekarpour, S. Auer, and A.-C. N. Ngomo. Large-Scale RDF Dataset Slicing. In Proceedings of the 2013 IEEE Seventh International Conference on Semantic Computing, ICSC '13, pages 228–235, Washington, DC, USA, 2013. IEEE Computer Society.
[84] S. Mazumdar, D. Petrelli, and F. Ciravegna. Exploring User and System Requirements of Linked Data Visualization Through a Visual Dashboard Approach. Semantic Web, 5(3):203–220, July 2014.
[85] P. N. Mendes, H. Mühleisen, and C. Bizer. Sieve: Linked Data Quality Assessment and Fusion. In 2nd International Workshop on Linked Web Data Management (LWDM 2012) at the 15th International Conference on Extending Database Technology, EDBT 2012, page to appear, March 2012.
[86] R. Meymandpour and J. G. Davis. A semantic similarity measure for linked data: An information content-based approach. Knowledge-Based Systems, 109:276–293, 2016.
[87] A. Micsik, Z. Tóth, and S. Turbucz. LODmilla: Shared Visualization of Linked Open Data. In L. Bolikowski, V. Casarosa, P. Goodale, N. Houssos, P. Manghi, and J. Schirrwagen, editors, Theory and Practice of Digital Libraries - TPDL 2013 Selected Workshops - LCPD 2013, SUEDL 2013, DataCur 2013, Held in Valletta, Malta, September 22-26, 2013. Revised Selected Papers, volume 416 of Communications in Computer and Information Science, pages 89–100. Springer, 2013.
[88] M. Taheriyan, C. A. Knoblock, P. Szekely, and J. L. Ambite. Learning the semantics of structured data sources. Web Semantics: Science, Services and Agents on the World Wide Web, 37–38:152–169, 2016.
[89] M. Nentwig, M. Hartung, A. N. Ngomo, and E. Rahm. A survey of current Link Discovery frameworks. Semantic Web, 8(3):419–436, 2017.
[90] C. B. Neto, K. Müller, M. Brümmer, D. Kontokostas, and S. Hellmann. LODVader: An Interface to LOD Visualization, Analytics and DiscovERy in Real-time. In Proceedings of the 25th International Conference on World Wide Web, WWW 2016, Montreal, Canada, April 11-15, 2016, Companion Volume, pages 163–166, 2016.
[91] K. Nguyen, R. Ichise, and B. Le. Slint: a schema-independent linked data interlinking system. Ontology Matching, page 1, 2012.
[92] A. G. Nuzzolese, V. Presutti, A. Gangemi, S. Peroni, and P. Ciancarini. Aemoo: Linked Data exploration based on Knowledge Patterns. Semantic Web, 8(1):87–112, 2017.
[93] E. Oren, R. Delbru, M. Catasta, R. Cyganiak, H. Stenzhorn, and G. Tummarello. Sindice.com: a document-oriented lookup index for open linked data. IJMSO, 3(1):37–52, 2008.
[94] L. Otero-Cerdeira, F. J. Rodríguez-Martínez, and A. Gómez-Rodríguez. Ontology matching: A literature review. Expert Systems with Applications, 42(2):949–971, 2015.
[95] A. Pappas, G. Troullinou, G. Roussakis, H. Kondylakis, and D. Plexousakis. Exploring Importance Measures for Summarizing RDF/S KBs, pages 387–403. Springer International Publishing, Cham, 2017.
[96] P. Bellini, P. Nesi, and A. Venturi. Linked open graph: Browsing multiple SPARQL entry points to build your own LOD views. Journal of Visual Languages & Computing, 25(6):703–716, 2014. Distributed Multimedia Systems DMS2014 Part I.
[97] G. Pirrò. Explaining and suggesting relatedness in knowledge graphs. In M. Arenas, Ó. Corcho, E. Simperl, M. Strohmaier, M. d'Aquin, K. Srinivas, P. T. Groth, M. Dumontier, J. Heflin, K. Thirunarayan, and S. Staab, editors, The Semantic Web - ISWC 2015 - 14th International Semantic Web Conference, Bethlehem, PA, USA, October 11-15, 2015, Proceedings, Part I, volume 9366 of Lecture Notes in Computer Science, pages 622–639. Springer, 2015.
[98] G. Pirrò and A. Cuzzocrea. RECAP: Building Relatedness Explanations on the Web. In Proceedings of the 25th International Conference Companion on World Wide Web, WWW '16 Companion, pages 235–238, Republic and Canton of Geneva, Switzerland, 2016. International World Wide Web Conferences Steering Committee.
[99] J. Polowinski. Towards RVL: A Declarative Language for Visualizing RDFS/OWL Data. In Proceedings of the 3rd International Conference on Web Intelligence, Mining and Semantics, WIMS '13, pages 38:1–38:11, New York, NY, USA, 2013. ACM.
[100] V. Presutti, E. Blomqvist, R. Troncy, H. Sack, I. Papadakis, and A. Tordai, editors. The Semantic Web: ESWC 2014 Satellite Events, Anissaras, Crete, Greece, May 25-29, 2014, Revised Selected Papers, volume 8798 of Lecture Notes in Computer Science. Springer, 2014.
[101] L. Rietveld and R. Hoekstra. The YASGUI family of SPARQL clients. Semantic Web, 8(3):373–383, 2017.
[102] P. Ristoski and H. Paulheim. Visual Analysis of Statistical Data on Maps Using Linked Open Data. In Gandon et al. [50], pages 138–143.
[103] M. Röder, A. N. Ngomo, I. Ermilov, and A. Both. Detecting similar linked datasets using topic modelling. In Sack et al. [108], pages 3–19.
[104] M. Röder, A. N. Ngomo, I. Ermilov, and A. Both. Detecting Similar Linked Datasets Using Topic Modelling. In Sack et al. [108], pages 3–19.
[105] D. Roman, N. Nikolov, A. Pultier, D. Sukhobok, B. Elvesæter, A. Berre, X. Ye, M. Dimitrov, A. Simov, M. Zarev, R. Moynihan, B. Roberts, I. Berlocher, S.-H. Kim, T. Lee, A. Smith, and T. H. DataGraft: One-Stop-Shop for Open Data Management. Semantic Web.
[106] Y. Roussakis, I. Chrysakis, K. Stefanidis, and G. Flouris. D2V: A tool for defining, detecting and visualizing changes on the data web. In Villata et al. [128].
[107] V. Sabol, G. Tschinkel, E. E. Veas, P. Höfler, B. Mutlu, and M. Granitzer. Discovery and Visual Analysis of Linked Data for Humans. In The Semantic Web - ISWC 2014 - 13th International Semantic Web Conference, Riva del Garda, Italy, October 19-23, 2014. Proceedings, Part I, pages 309–324, 2014.
[108] H. Sack, E. Blomqvist, M. d'Aquin, C. Ghidini, S. P. Ponzetto, and C. Lange, editors. The Semantic Web. Latest Advances and New Domains - 13th International Conference, ESWC 2016, Heraklion, Crete, Greece, May 29 - June 2, 2016, Proceedings, volume 9678 of Lecture Notes in Computer Science. Springer, 2016.
[109] S. Scheider, A. Degbelo, R. Lemmens, C. van Elzakker, P. Zimmerhof, N. Kostic, J. Jones, and G. Banhatti. Exploratory querying of SPARQL endpoints in space and time. Semantic Web, 8(1):65–86, 2017.
[110] K. Schlegel, T. Weißgerber, F. Stegmaier, C. Seifert, M. Granitzer, and H. Kosch. Balloon Synopsis: A Modern Node-Centric RDF Viewer and Browser for the Web. In Presutti et al. [100], pages 249–253.
[111] D. Schweiger, Z. Trajanoski, and S. Pabinger. SPARQLGraph: a web-based platform for graphically querying biological Semantic Web databases. BMC Bioinformatics, 15(1):1–5, 2014.
[112] S. Seufert, K. Berberich, S. J. Bedathur, S. K. Kondreddi, P. Ernst, and G. Weikum. ESPRESSO: explaining relationships between entity sets. In S. Mukhopadhyay, C. Zhai, E. Bertino, F. Crestani, J. Mostafa, J. Tang, L. Si, X. Zhou, Y. Chang, Y. Li, and P. Sondhi, editors, Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, CIKM 2016, Indianapolis, IN, USA, October 24-28, 2016, pages 1311–1320. ACM, 2016.
[113] S. Shekarpour, E. Marx, A.-C. N. Ngomo, and S. Auer. SINA: Semantic interpretation of user queries for question answering on interlinked data. Web Semantics: Science, Services and Agents on the World Wide Web, 30:39–51, 2015. Semantic Search.
[114] D. Song, F. Schilder, C. Smiley, C. Brew, T. Zielund, H. Bretz, R. Martin, C. Dale, J. Duprey, T. Miller, and J. Harrison. TR Discover: A Natural Language Interface for Querying and Analyzing Interlinked Datasets, pages 21–37. Springer International Publishing, Cham, 2015.
[115] A. Soylu, E. Kharlamov, D. Zheleznyakov, E. J. Ruiz, M. Giese, M. G. Skjæveland, D. Hovland, R. Schlatte, S. Brandt, H. Lie, et al. OptiqueVQS: a Visual query system over ontologies for industry. To appear in The Semantic Web Journal, 2017.
[116] M. Teng and G. Zhu. Interactive search over web scale RDF data using predicates as constraints. Journal of Intelligent Information Systems, 44(3):381–395, 2015.
[117] A. Thalhammer, N. Lasierra, and A. Rettinger. LinkSUM: Using Link Analysis to Summarize Entity Data. In Bozzon et al. [22], pages 244–261.
[118] K. Thellmann, M. Galkin, F. Orlandi, and S. Auer. LinkDaViz – Automatic Binding of Linked Data to Visualizations. In The Semantic Web - ISWC 2015, volume 9366 of Lecture Notes in Computer Science, pages 147–162. Springer International Publishing, 2015.
[119] I. Traverso-Ribón, G. Palma, A. Flores, and M.-E. Vidal. Considering Semantics on the Discovery of Relations in Knowledge Graphs, pages 666–680. Springer International Publishing, Cham, 2016.
[120] T. Trinh, B. Do, P. Wetz, A. Anjomshoaa, E. Kiesling, and A. M. Tjoa. Open Mashup Platform - A Smart Data Exploration Environment. In Proceedings of the ISWC 2014 Posters & Demonstrations Track a track within the 13th International Semantic Web Conference, ISWC 2014, Riva del Garda, Italy, October 21, 2014., pages 53–56, 2014.
[121] G. Troullinou, H. Kondylakis, E. Daskalaki, and D. Plexousakis. RDF Digest: Efficient Summarization of RDF/S KBs, pages 119–134. Springer International Publishing, Cham, 2015.
[122] Y. Tzitzikas, N. Manolis, and P. Papadakos. Faceted exploration of RDF/S datasets: a survey. Journal of Intelligent Information Systems, pages 1–36, 2016.
[123] C. Unger, L. Bühmann, J. Lehmann, A.-C. Ngonga Ngomo, D. Gerber, and P. Cimiano. Template-based question answering over RDF data. In Proceedings of the 21st International Conference on World Wide Web, WWW '12, pages 639–648, New York, NY, USA, 2012. ACM.
[124] V. Fionda, C. Gutierrez, and G. Pirrò. The swget portal: Navigating and acting on the web of linked data. Web Semantics: Science, Services and Agents on the World Wide Web, 26:29–35, 2014.
[125] P. Vandenbussche and B. Vatant. Linked Open Vocabularies. ERCIM News, 2014(96), 2014.
[126] P.-Y. Vandenbussche, J. Umbrich, L. Matteis, A. Hogan, and C. Buil-Aranda. SPARQLES: Monitoring public SPARQL endpoints. Semantic Web, 8(6):1049–1065, 2017.
[127] G. Vega-Gorgojo, L. Slaughter, M. Giese, S. Heggestøyl, J. W. Klüwer, and A. Waaler. PepeSearch: Easy to Use and Easy to Install Semantic Data Search. In H. Sack, G. Rizzo, N. Steinmetz, D. Mladenic, S. Auer, and C. Lange, editors, The Semantic Web - ESWC 2016 Satellite Events, Heraklion, Crete, Greece, May 29 - June 2, 2016, Revised Selected Papers, volume 9989 of Lecture Notes in Computer Science, pages 146–150, 2016.
[128] S. Villata, J. Z. Pan, and M. Dragoni, editors. Proceedings of the ISWC 2015 Posters & Demonstrations Track co-located with the 14th International Semantic Web Conference (ISWC-2015), Bethlehem, PA, USA, October 11, 2015, volume 1486 of CEUR Workshop Proceedings. CEUR-WS.org, 2015.
[129] J. Volz, C. Bizer, M. Gaedke, and G. Kobilarov. Discovering and maintaining links on the web of data. In A. Bernstein, D. Karger, T. Heath, L. Feigenbaum, D. Maynard, E. Motta, and K. Thirunarayan, editors, The Semantic Web - ISWC 2009, volume 5823 of Lecture Notes in Computer Science, pages 650–665. Springer Berlin Heidelberg, 2009.
[130] V. Mijović, V. Janev, D. Paunović, and S. Vraneš. Exploratory spatio-temporal analysis of linked statistical data. Web Semantics: Science, Services and Agents on the World Wide Web, 41:1–8, 2016.
[131] A. Wagner, P. Haase, A. Rettinger, and H. Lamm. Discovering Related Data Sources in Data-Portals. In Proceedings of the 1st International Workshop on Semantic Statistics (SemStats), number 1549 in CEUR Workshop Proceedings, Aachen, 2013.
[132] S. Yumusak, E. Dogdu, H. Kodaz, A. Kamilaris, and P.-Y. Vandenbussche. SpEnD: Linked data SPARQL endpoints discovery using search engines. IEICE Transactions on Information and Systems, E100D(4):758–767, 2017.
[133] A. Zaveri, A. Rula, A. Maurino, R. Pietrobon, J. Lehmann, and S. Auer. Quality assessment for Linked Data: A Survey. Semantic Web, 7(1):63–93, 2016.
[134] X. Zhang, D. Song, S. Priya, Z. Daniels, K. Reynolds, and J. Heflin. Exploring linked data with contextual tag clouds. Web Semantics: Science, Services and Agents on the World Wide Web, 24:33–39, 2014. The Semantic Web Challenge 2012.
[135] G. Zhu and C. A. Iglesias. Computing Semantic Similarity of Concepts in Knowledge Graphs. IEEE Transactions on Knowledge and Data Engineering, PP(99):1–1, 2016.
[136] A. J. Zitzelberger, D. W. Embley, S. W. Liddle, and D. T. Scott. Hykss: Hybrid keyword and semantic search. Journal on Data Semantics, 4(4):213–229, 2015.

Appendix

A. Tools evaluated using all criteria Tool/Criterion LP-VIZ V2 VISU Sp YAS UV LP-ETL OC DL 1.1 CKAN API – – – – – – –  – 1.2 RDF metadata in dump – – – – – – – – – 1.3 Metadata in SPARQL endpoint – – – – – – – – – 1.4 Metadata from IRI dereferencing – – – – – – – – – 2.1 Dataset profiling – – – – – – – – – 3.1 Structured search form – – –  – – – – – 3.2 Fulltext search – – – – – – –  – 3.3 Natural language based intent expression – – – – – – – – – 3.4 Formal language based intent expression – – – – – – – – – 4.1 Keyword autocomplete – – –  – – – – – 4.2 Vocabulary based autocomplete – – – – – – – – – 4.3 Faceted search – – – – – – – – – 5.1 Context-based intent enhancement – – – – – – – – – 6.1 Context-based discovery result enhancement – – – – – – – – – 7.1 Content-based dataset ranking – – – – – – – – – 7.2 Context-based dataset ranking – – – – – – – – – 7.3 Intent-based dataset ranking – – – – – – – – – 8.1 Preview – W3C vocabularies  – – – – – – – – 8.2 Preview – LOV vocabularies  – – – – – – – – 8.3 Preview metadata – – – – – – – – – 9.1 Preview data – – – – – – –  – 9.2 Schema extraction –  –  – – –  – 10.1 Quality indicators – – – – – – – – – 11.1 IRI dereferencing  – – – – –  – – 11.2 Crawling – – – – – – – – – 12.1 SPARQL: built-in querying          12.2 SPARQL: named graphs  –  – –   –  12.3 SPARQL: custom query – –  –      12.4 SPARQL: authentication – – – – –     12.5 SPARQL: advanced download – – – – –   – – Table 3 Tools evaluated using all criteria - part 1 - Req. 1 Catalog support – Req. 12 SPARQL querying Tool/Criterion LP-VIZ V2 VISU Sp YAS UV LP-ETL OC DL 13.1 RDF/XML file load – – – – –     13.2 Turtle file load  – – – –     13.3 N-Triples file load – – – – –     13.4 JSON-LD file load – – – – –   –  13.5 RDFa file load – – – – –     13.6 N-Quads file load – – – – –     13.7 TriG file load – – – – –     13.8 Handling of bigger files – – – – – – – – – 13.9 CSV on the Web load – – – – – – – – – 13.10 Direct mapping load – – – – –  – –  13.11 R2RML load – – – – – – –  – 14.1 Linked Data Platform containers – – – – – – – – – 15.1 Monitoring of input changes: polling – – – – –  –  – 15.2 Monitoring of input changes: subscription – – – – – – – – – 16.1 Manual versioning support – – – – – – – – – 16.2 Automated versioning support – – – – – – – – – 17.1 Shared resources analysis – – – – – – – – – 17.2 Link analysis – – – – – – – – – 17.3 Shared vocabularies analysis –  – – – – – – – 18.1 Link discovery – – – – – – –   18.2 Ontology alignment – – – – – – – – – 19.1 Vocabulary mapping based transformations – – – – – – – – – 19.2 Vocabulary-based SPARQL transformations – – – – – – – – – 20.1 Ontology alignment – – – – – – – – – 21.1 RDFS inference – – – – – – – – – 21.2 Resource fusion – – – – – – – – – 21.3 OWL inference – – – – – – – – – 22.1 Assisted selection  – – – – – – – – 22.2 Assisted projection – – – – – – – – – 23.1 Custom transformations – – – – –     24.1 Automated data manipulation – – – – – – – – – 25.1 Provenance input – – – – – – – – – 25.2 Provenance output – – – – – – – – – 26.1 License awareness – – – – – – – –  26.2 License management – – – – – – – – – Table 4 Tools evaluated using all criteria - part 2 - Req. 13 RDF dump load – Req. 
26 License management Tool/Criterion LP-VIZ V2 VISU Sp YAS UV LP-ETL OC DL 27.1 Manual visualization – –    – –   28.1 Vocabulary-based visualization  – – – – – – – – 29.1 RDF/XML file dump – – – – –     29.2 Turtle file dump – – – – –     29.3 N-Triples file dump – – – – –     29.4 JSON-LD file dump – – – – –   – – 29.5 RDFa file dump – – – – –    – 29.6 N-Quads file dump – – – – –     29.7 TriG file dump – – – – –     30.1 SPARQL Update support – – – – –   – – 30.2 SPARQL: named graphs support – – – – –   – – 30.3 SPARQL: authentication support – – – – –   – – 30.4 SPARQL Graph Store HTTP Protocol output – – – – – –  – – 31.1 Linked Data Platform Containers Output – – – – – – – – – 32.1 Non-expert data output – – – – – – – –  32.2 CSV file output – – – –      32.3 CSV on the Web output – – – – – – – – – 32.4 XML file output – – – – – –   – 32.5 JSON file output – – – – – –   – 32.6 Output for specific third-party tools – –  – – – –  – 33.1 Existence of API – – – – –   – – 33.2 API documentation – – – – –   – – 33.3 Full API coverage – – – – – –  – – 34.1 Configuration in open format – – – – –   – – 34.2 Configuration in RDF – – – – – –  – – 35.1 Repository for plugins – – – – – – – – – 35.2 Repository for projects – – – – – – – – – 36.1 Deployment of services – – – – –   – – Total 8 3 5 5 4 30 34 28 23 Table 5 Tools evaluated using all criteria - part 3 - Req. 27 Manual visualization – Req. 36 Deployment of services B. Elimination rounds for potential candidates Tools/Round 2 3 4 [120] LinkedWidgets 92   [117] LinkSUM 93   Tools/Round 2 3 4 [17] LODeX 94  [92] Aemoo 70   [15] LODLaundromat 95    [79] Aether 71   [87] LODmilla 96  [110] CODE Balloon 72   [38] LODSight 97   [30] Prissma Browser 73  [90] LODVader 98  [81] CubeViz 74   [96] LOG (linked-open-graph) 99   [106] D2V 75  [58] LOTUS 100   [105] DataGraft 76  [71] LP-ETL 101    [45] Datalift 77    [69] LP-VIZ 102    [57] DCAT merger 78   [63] OpenCube 103    [107] CODE DoSeR 79  [67] Payola 104  [107] CODE Enrichment Service  [127] PepeSearch 105   80 and Annotator Tool [60] ProLOD++ 106  81   [130] ESTA-LD [10] QUaTRO 107   [9] ExConQuer [107] CODE Query Wizard 108  82    [51] FAGI-gis [18] rdf:SynopsViz 109   83   [28] FuhSen [25] RDF4U 110   84   [64] Fusepool [29] RDFpro 111    85  [49] Graphium [8] Roomba 112  86  [88] Karma [99] RVL 113   87  [3] Konclude [6] SemFacet 114    88    [65] LD-R [24] Sextant 115   89   [68] LDVMi [46] Sparklis 116    [14] LinDA 90   Table 7 [118] LinDA LinkDaViz 91  All evaluated tools which passed elimination round 1 - part 2 Table 6 All evaluated tools which passed elimination round 1 - part 1

70 http://wit.istc.cnr.it/aemoo/
71 http://demo.seco.tkk.fi/aether/
72 http://schlegel.github.io/balloon/index.html
73 http://wimmics.inria.fr/projects/prissma/
74 http://cubeviz.aksw.org/
75 http://www.ics.forth.gr/isl/D2VSystem
76 https://github.com/datagraft
77 http://datalift.org/
78 https://github.com/pheyvaer/dcat-merger
79 https://code.know-center.tugraz.at/search
80 https://code.know-center.tugraz.at/search
81 https://github.com/GeoKnow/ESTA-LD
82 https://github.com/SLIPO-EU/FAGI-gis
83 https://github.com/LiDaKrA/FuhSen
84 https://github.com/fusepoolP3
85 https://github.com/afloresv/Graphium
86 http://usc-isi-i2.github.io/karma/
87 http://www.derivo.de/en/produkte/konclude.html
88 http://ld-r.org/
89 https://github.com/ldvm/LDVMi
90 http://linda.west.uni-koblenz.de/
91 http://eis.iai.uni-bonn.de/Projects/LinkDaViz.html
92 http://linkedwidgets.org/
93 http://km.aifb.kit.edu/services/link/
94 http://dbgroup.unimo.it/lodex
95 https://github.com/LOD-Laundromat
96 http://lodmilla.sztaki.hu/lodmilla/
97 http://lod2-dev.vse.cz/lodsight-v2
98 https://github.com/AKSW/LODVader/
99 http://log.disit.org/service/
100 https://github.com/filipdbrsk/LOTUS_Search
101 https://github.com/linkedpipes/etl
102 https://github.com/ldvm/LDVMi
103 http://opencube-toolkit.eu/
104 https://github.com/payola/Payola
105 https://github.com/guiveg/pepesearch
106 https://www.hpi.uni-potsdam.de/naumann/sites/prolod++
107 http://dice.cyfronet.pl/products/quatro
108 https://code.know-center.tugraz.at/search
109 http://synopsviz.imis.athena-innovation.gr/
110 http://rc.lodac.nii.ac.jp/rdf4u/
111 http://rdfpro.fbk.eu/
112 https://github.com/ahmadassaf/OpenData-Checker
113 https://github.com/janpolowinski/rvl
114 https://github.com/semfacet
115 http://sextant.di.uoa.gr/

116 http://www.irisa.fr/LIS/ferre/sparklis/
117 https://github.com/semihyumusak/SpEnD
118 https://github.com/lodum/SPEX
119 https://github.com/SAliHasnain/sparqlautodescription
120 http://statspace.linkedwidgets.org/
121 http://wimmics.inria.fr/projects/prissma/
122 https://swget.wordpress.com/
123 http://www.swows.org
124 http://aksw.org/Projects/Tapioca.html
125 https://github.com/uduvudu/uduvudu/
126 https://github.com/unifiedviews
127 http://data.aalto.fi/V2
128 http://vicomap.informatik.uni-mannheim.de/
129 http://data.aalto.fi/visu
130 https://code.know-center.tugraz.at/search
131 https://github.com/pavel-arapov/wikinext
132 http://linda.west.uni-koblenz.de/
133 https://github.com/OpenTriply/YASGUI