Semantic Web 0 (2018) 1–58
IOS Press

Survey of Tools for Linked Data Consumption

Jakub Klímek a,*, Petr Škoda a and Martin Nečaský a
a Faculty of Mathematics and Physics, Charles University in Prague, Malostranské náměstí 25, 118 00 Praha 1, Czech Republic
E-mails: [email protected], [email protected], [email protected]
* Corresponding author. E-mail: [email protected].

Editor: Ruben Verborgh, Ghent University – iMinds, Belgium
Solicited reviews: Elena Demidova, L3S Research Center Hannover, Germany; Ruben Taelman, Ghent University, Belgium; Daniel Garijo, University of Southern California, USA; Stefan Dietze, Heinrich-Heine-University Düsseldorf, Germany

Abstract. There is a large number of datasets published as Linked (Open) Data (LOD/LD). At the same time, there is also a multitude of tools for publication of LD. However, potential LD consumers still have difficulty discovering, accessing and exploiting LD. This is because, compared to consumption of traditional data formats such as XML and CSV files, there is a distinct lack of tools for consumption of LD. The promoters of LD use the well-known 5-star Open Data deployment scheme to suggest that consumption of LD is a better experience once the consumer knows RDF and related technologies. This suggestion, however, falls short when the consumers search for appropriate tooling support for LD consumption. In this paper we define an LD consumption process. Based on this process and current literature, we define a set of 34 requirements a hypothetical Linked Data Consumption Platform (LDCP) should ideally fulfill. We cover these requirements with a set of 94 evaluation criteria. We survey 110 tools identified as potential candidates for an LDCP, eliminating them in 3 rounds until 16 candidates remain. We evaluate the 16 candidates using our 94 criteria. Based on this evaluation we show which parts of the LD consumption process are covered by the 16 candidates. Finally, we identify 8 tools which satisfy our requirements on being an LDCP. We also show that there are important LD consumption steps which are not sufficiently covered by existing tools. The authors of LDCP implementations may use this survey to decide about directions of future development of their tools. LD experts may use it to see the level of support of the state of the art technologies in existing tools. Non-LD experts may use it to choose a tool which supports their LD processing needs without requiring them to have expert knowledge of the technologies. The paper can also be used as an introductory text to LD consumption.

Keywords: Linked Data, dataset, consumption, discovery, visualization, tool, survey

1. Introduction

The openness of data is often measured using the now widely accepted and promoted 5-star Open Data deployment scheme introduced by Sir Tim Berners-Lee (http://5stardata.info). The individual stars are awarded for compliance with the following rules (citing the web):

1* make your stuff available on the Web (whatever format) under an open license
2* make it available as structured data (e.g., Excel instead of image scan of a table)
3* make it available in a non-proprietary open format (e.g., CSV instead of Excel)
4* use URIs to denote things, so that people can point at your stuff
5* link your data to other data to provide context

The last star is also denoted as Linked Open Data (LD/LOD). More and more data publishers are convinced that the additional effort put into the last 2 stars, i.e. LD in RDF as opposed to CSV files, will bring the promised benefits to the users of their data.


The publishers rightfully expect that the users will appreciate the 5-star data much more than, e.g., the 3-star CSV files, given that the proper publication of 5-star data is considerably harder and more costly to achieve. In particular, they expect that the users will appreciate benefits such as better described and understandable data and its schema, safer data integration thanks to the global character of the IRIs and shared vocabularies, better reuse of tools, more context thanks to the links, and the ability to (re)use only parts of the data.

Open Data users are familiar with 3-star publication formats and principles, but it is hard to encourage them to use 5-star data. Often they do not know LD technologies, most notably the RDF data model, its serialization formats and the SPARQL query language. They refuse to learn them because they do not see the difference between 3-star and 5-star representations. For example, our initiative OpenData.cz has re-published 77 2-star and 3-star datasets published by Czech public bodies in 5-star representation (https://linked.opendata.cz). Moreover, as a part of the EU funded project COMSODE, we participated in re-publishing other datasets published by Czech, Slovak, Italian and Dutch public bodies and several pan-European datasets in 5-star representation (http://data.comsode.eu/). We also helped two Czech governmental bodies to publish their Open Data as 5-star data: the Czech Social Security Administration [1] (https://data.cssz.cz/sparql) and the Ministry of Finance (http://cedropendata.mfcr.cz/c3lod/cedr/sparql). We presented the publication results to Open Data consumers at several community events. Their feedback was that they would like to use the data, but they need a consumption tool which does not require them to know the LD technologies. Only after being able to work with LD in this way and seeing the benefits on real data of their interest will they consider learning the LD technologies to use the data in an advanced way.

There are also other resources reporting that learning LD technologies is a problem. For example, [2] describes a toolset for accessing LD by users without the knowledge of LD technologies. A recent blog post "Why should you learn SPARQL? Wikidata!" (https://longair.net/blog/2017/11/29/sparql-wikidata/) shows an interesting view of learning SPARQL presented by one of the Open Data users. This blog post also mentions a widely known problem of quality of LD and availability and performance of services providing LD. These problems are problems of Open Data in general, as reported, e.g., in [3]. However, they seem to be even more serious in the case of LD. For example, the SPARQLES service (http://sparqles.ai.wu.ac.at/) shows that only 38% of known services providing LD are available (checked on 09.03.2018).

In this paper we do not aim at surveying and evaluating the problems of data availability, performance and quality, as these are separate topics on their own. We are rather interested in the question of whether there are software tools or platforms available which would support Open Data users, both with and without expert LD knowledge, in consuming LD. It is not so important whether such a platform is a single software tool or a set of compatible and reusable software tools integrated together. The need for building such a platform is also identified in [2].

We call such a hypothetical platform a Linked Data Consumption Platform (LDCP). In the broadest sense, an LDCP should support users in discovering, loading and processing LD datasets with the aim of getting some output useful for users without the knowledge of LD technologies. However, it would not be practical to require each LDCP to support all these tasks. In the minimal sense, an LDCP should have the following properties:

1. It is able to load a given LD dataset or its part.
2. Its user interface does not presume the knowledge of LD technologies, so that users without such knowledge can use the tool. Most notably this includes the knowledge of the RDF data model, its serializations and the SPARQL query language.
3. It provides some non-RDF output the users can further work with. The output can be a visualization or a machine-readable output in a non-RDF format, e.g. a CSV file.

The contribution of this paper is a survey of existing software tools presented in recent scientific literature which can serve as an LDCP at least in this minimal sense. The tools are evaluated using a set of requirements described in the paper. The results of our evaluation show that despite many years of work on tools for Linked Data consumption, there is still a large gap between state of the art technologies and the level of their implementation in tools for actual users. In fact, there are entire areas of Linked Data consumption, such as dataset search and analysis of semantic relations of datasets, in which there are no tools available for non-expert users at all.

In the first part of the paper, we describe the LD consumption process and we identify requirements which specify how individual activities in the LD consumption process could be supported by an LDCP. The individual requirements are identified either based on existing W3C Recommendations, or based on research areas and approaches published in recent scientific literature related to some parts of the LD consumption process. Therefore, the individual proposed requirements are not novel. For each requirement we define a set of criteria to be used to decide whether a given software tool fulfills the requirement or not. In total, the paper presents 34 requirements covered by 94 evaluation criteria. The requirements and their evaluation criteria constitute a framework we use to evaluate the tools.

In the second part of the paper we present our contribution – a survey of existing software tools which can be used as LDCPs. To identify LDCP candidates we do a systematic review of recent scientific literature. We search for papers which describe software tools for LD consumption, visualization, discovery, exploration, analysis or processing. For each candidate we search for its on-line instance or demo, or we install our own instance, and we try to load our testing LD datasets into the instance. If we are successful, the candidate is selected for further evaluation. As a result, we select 16 software tools. Each tool is evaluated using the whole set of criteria to determine which of the requirements introduced in the first part of the paper it fulfills. The individual requirements are neither disqualifying nor intended to select the best tool for LD consumption. They represent different LD consumption features. We show how each considered tool supports these features.

We distinguish two kinds of users in the evaluation. First, there are 3-star data consumers who are not familiar with LD technologies. These users initially motivated our work. We call them non-LD experts. Second, there are experts in LD principles and related technologies. We call them LD experts. When evaluating a tool we therefore distinguish whether the tool fulfills a requirement for LD experts only, or whether even a non-LD expert can use it. We consider a tool suitable for a non-LD expert when its user interface does not presume the knowledge of LD technologies. The evaluation is not a user based study and therefore it does not evaluate the usability of the tools beyond distinguishing the LD expert and non-LD expert usage.

Based on the evaluation, we show tools which are LDCPs – either in the minimal sense specified above or with some additional features represented by our requirements. Moreover, we show tools which offer something relevant for LD consumption in terms of our requirements even though they provide only RDF output. These tools do not fulfill condition 3 of the minimal LDCP defined above. However, their RDF output can be used as input for any LDCP and, therefore, they can be integrated with any LDCP and extend its LD consumption features.

Let us also note that the evaluation requirements are not defined by non-LD experts. They are LD expert requirements based on the current state of the art in the field of LD consumption. Our evaluation shows how the current state of the art in different parts of the LD consumption process is reflected in the 16 evaluated tools in a way usable by non-LD experts, or at least by LD experts. LD experts will find interesting the details of the evaluation of the individual requirements or even the individual criteria. Moreover, we also aggregate the evaluation results to show to which extent the evaluated tools cover different parts of the LD consumption process. This result will help non-LD experts in selecting the tool they can use to consume LD depending on their specific needs.

This paper is based on our introductory study of the problem [4]. Since then, we have better defined the requirements, better defined the survey methodology, and we survey considerably more tools, which are also more recent.

This paper is structured as follows. In Section 2 we provide a motivating example of a user scenario in which journalists want to consume LD and expect to have a platform that will enable them to enjoy the promised LD benefits. In Section 3 we describe our evaluation methodology. In Section 4 we define the LD consumption process, describe the identified requirements and cover them with evaluation criteria. In Section 5 we identify existing tools and evaluate them using the evaluation criteria. In Section 6 we survey related studies of LD consumption and in Section 7 we conclude.

2. Motivating Example

To motivate our work, let us suppose a team of data journalists. For their article they need to collect data about the population of cities in Europe and display an informative map. The intended audience are statisticians, who are used to working with CSV files, so the journalists also want to publish the underlying data for them as CSV files attached to the articles.

The journalists are used to working with 3-star open data, i.e. they have to know where to find the necessary data sources, e.g. population statistics and descriptive data about cities including their location; they have to download them, integrate them and process them to get the output, i.e. a visualization and a CSV file.

There are several examples of such articles available on-line. For instance, there is the pan-European European Data Journalism Network (https://www.europeandatajournalism.eu/) where data journalists from different publishing houses in Europe publish their articles based on different data sources. This can be seen in the article titled "Xenophobia in European cities" (https://www.europeandatajournalism.eu/eng/News/Data-news/Xenophobia-in-European-cities) exploiting Open Data about European cities.

The journalists now want to use LD because they heard that it is the highest possible level of openness according to the 5-star Open Data deployment scheme. They expect that the experience of working with LD will be somehow better than the experience of working with 3-star data. Naturally, they expect a tool they can use to work with such data and benefit from LD principles. On the other hand, they expect that they will be able to use the tool even without the knowledge of LD technologies such as RDF serialization formats or SPARQL. Such user expectations are described also in the literature. For example, [2] presents a scenario where non-LD experts search and analyze LD indexed in Open Data portals such as the Open Data portal of the EU commission (https://open-data.europa.eu) or similar. In our recent work [5] we present similar scenarios of non-LD experts searching, analyzing and visualizing datasets published by Czech public bodies. A user based study [6] presents how data consumers from the tourism domain, both LD and non-LD experts, can benefit from LD when working with different cross-domain datasets.

In the rest of this section, we show how an LD consumption tool may enable our journalists to benefit from LD in different steps of the consumption process without having to know the LD technologies.

First, the journalists need to find data about cities in Europe, i.e. their description and location so that they can be placed on a map, and data about their population. Here, LD can help already. Recently, a number of national open data catalogs emerged and are being harvested by the European Data Portal (EDP, https://www.europeandataportal.eu/), which utilizes the DCAT-AP v1.1 vocabulary (https://joinup.ec.europa.eu/asset/dcat_application_profile/asset_release/dcat-ap-v11) for metadata about datasets and exposes it in a SPARQL endpoint. The endpoint can be queried for, e.g., keywords (city) and formats (Turtle, JSON-LD, etc.). An LDCP could therefore support loading and querying of dataset metadata from DCAT-AP enabled catalogs. It could even contain well-known catalogs such as the EDP preset so that the user can search for datasets immediately. It could then provide a search user interface for non-LD experts on top of the collected DCAT-AP metadata.

After identifying candidate datasets, the journalists need to choose the ones that contain information needed for their goal. A helpful functionality of an LDCP could be various kinds of summaries and characterizations of the candidate datasets based on both the metadata of the dataset and the actual data in the dataset. Here we suppose that the metadata loaded from a catalog contains correct access information for each candidate dataset. For example, for each candidate dataset, the LDCP could offer human-readable labels and descriptions, vocabularies used, numbers of instances of individual classes and their interlinks, previews tailored to the vocabularies used or to the current use case (visual or textual, readable to non-LD experts), datasets linked from the candidate, datasets linking to the candidate, spatial and time coverage, etc. Using this information, the journalists would be able to choose a set of datasets containing the required information more easily.

Another feature that could help at this time is recommendation of related datasets. Based on available information about datasets already selected, the LDCP could suggest similar datasets. Given a set of datasets to be integrated, they can be further analyzed in order to find out whether they have a non-empty intersection, e.g. whether the population information specified in one dataset actually links to the cities and their locations present in another dataset.

There could be an interoperability issue between selected datasets caused by the existence of multiple vocabularies describing the same domain. In our example, the dataset containing locations of the cities could have the geocoordinates described using the schema:GeoCoordinates class (http://schema.org/GeoCoordinates) from the Schema.org vocabulary, but the user needs another representation, e.g., the geo:Point class (http://www.w3.org/2003/01/geo/wgs84_pos#Point) of the WGS84 Geo Positioning vocabulary. Since both of these vocabularies are well-known and registered in Linked Open Vocabularies (LOV) [7] (https://lov.okfn.org/), the LDCP could contain components for data transformation between them and offer them to the user automatically, as sketched below.
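To illustrate what such a transformation component boils down to, the following minimal sketch (ours, not taken from any evaluated tool; it uses the Python rdflib library and made-up sample data) converts Schema.org geocoordinates into the WGS84 representation with a single SPARQL CONSTRUCT query:

from rdflib import Graph

# Made-up input data describing one city with schema:GeoCoordinates.
DATA = """
@prefix schema: <http://schema.org/> .
<http://example.org/city/prague> schema:geo [
    a schema:GeoCoordinates ;
    schema:latitude 50.087 ;
    schema:longitude 14.421 ] .
"""

# The mapping between the two vocabularies as a SPARQL CONSTRUCT query.
MAPPING = """
PREFIX schema: <http://schema.org/>
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
CONSTRUCT {
    ?city geo:location [ a geo:Point ; geo:lat ?lat ; geo:long ?long ] .
}
WHERE {
    ?city schema:geo [ schema:latitude ?lat ; schema:longitude ?long ] .
}
"""

graph = Graph()
graph.parse(data=DATA, format="turtle")
for triple in graph.query(MAPPING):
    print(triple)  # the same coordinates, now expressed as a geo:Point

An LDCP would ship such mappings as reusable components and offer them automatically whenever it detects the source vocabulary in a selected dataset.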

Our journalists now have to choose the entities and properties from those datasets that are needed for their goal. This could be done in a graphical way as used, e.g., by graphical SPARQL query builders such as SPARQLGraph [8], but not necessarily complicated by the full expressiveness of SPARQL.

Finally, the data should be prepared for further processing, and the journalists need to decide in which data formats they will publish the results of their work. Since their readers, given the focus of the article, will typically be non-LD experts, a suitable format is one of the 3-star data ones, i.e. XML or CSV, so that readers can open the data for instance in their favorite spreadsheet editor. This format is also suitable for people simply used to using legacy tools. This would require assisted export to tabular data in CSV, tree data in XML, or perhaps generic graph data, e.g. in CSV for Gephi [9]. Another option is to publish the data as-is in the platform, i.e. in 5-star RDF. This would be the preferred way for experts, who could then use their favorite RDF processing tools, and for users with access to an LDCP, which, as we show in this survey, is still far from available. This could enable the readers to exploit for example some RDF enabled visualizations such as LinkedPipes Visualization [10]. Naturally, the journalists may decide to publish both versions of their data to satisfy the widest possible audience.

Users such as our journalists will never be able to exploit these features using only basic technologies like IRI dereferencing and SPARQL. They need a software tool, an LDCP, which will help them to use and to benefit from these features even without the knowledge of LD technologies.

3. Survey methodology

The goal of the paper is to evaluate tools classified as LDCP candidates using a predefined list of requirements. In this section we describe our methodology for identification of the requirements, for identification of the tools to be evaluated, for conducting their evaluation and for aggregation of their scores.

3.1. Requirements identification

To identify the requirements to be used, we first define the LD consumption process based on existing literature which also introduces a similar process. For each step of the process we identify requirements that we see as crucial for an LDCP. We focus on requirements which can be satisfied by exploiting the benefits of consuming LD compared to consuming 3-star data. We then cover each requirement with a set of evaluation criteria which we later use to evaluate existing tools and decide if and to which extent they fulfill the requirement.

We identified each requirement based on the following existing resources:

1. An existing software tool which satisfies the requirement.
2. A technical specification (e.g., W3C Recommendation) which defines the requirement.
3. A published research paper which at least theoretically proposes a method fulfilling the requirement.
4. A published research paper which expresses the need for such a requirement.

These resources are evidence that the requirements are meaningful and relevant. For each of the introduced requirements it must hold that it is supported by at least one of these types of evidence. The evidence identified to support a requirement by one author was then verified by another author, and possible conflicts were addressed in a discussion leading to a consensus.

In Section 4, we describe the evidence for each identified requirement. The evidence is not a complete survey of software tools, technical specifications and literature supporting the requirement. It only describes some relevant resources which support the requirement. If there is a research paper available surveying resources relevant to the requirement, we refer to it in the evidence.

3.2. Tools selection and evaluation

In this section, we describe the publication venues investigated, the search engines used and the method used to identify tools which we classify as LDCP candidates, i.e. tools described in recent scientific literature as potentially relevant to LD consumption. Then we describe the method for selecting final candidates from the potential ones and the tool evaluation methodology.

3.2.1. Publication venues investigated
We have identified the most important venues where publications about relevant tools could appear. For each venue we investigated its main sessions and especially the posters and demo tracks. In addition, the workshops listed below have been considered. We investigated publications since 2013 (published till July 13th 2017). The relevant venues were:

– World Wide Web Conference (WWW) 2013 - 2017
– International Semantic Web Conference (ISWC) 2013 - 2016
– Extended Semantic Web Conference (ESWC) 2013 - 2017
– SEMANTiCS 2013 - 2016
– OnTheMove Federated Conferences (OTM) 2013 - 2016
– International Conference on Web Engineering (ICWE) 2013 - 2016
– Linked Data on the Web workshop (LDOW) 2013 - 2017
– International Workshop on Consuming Linked Data (COLD) 2013 - 2016
– International Conference on Information Integration and Web-based Applications & Services (iiWAS) 2013 - 2016

In addition, we investigated the following journals for papers no older than 2013, the newest ones being accepted for publication on July 13th 2017:

– Semantic Web journal (SWJ)
– Journal of Web Semantics (JWS)

Based on titles and abstracts, two of the authors independently went through the investigated venues and picked potentially interesting candidate papers. Then they read the full text of the candidate papers. In the full text they searched for any mentions of an existing tool which could be used for any of the identified requirements. This resulted in 82 tools identified as potential candidates.

3.2.2. Search engines used
In addition to going through the above mentioned publication venues, we used the Scopus search engine (https://www.scopus.com) to query for more potential candidates. For keywords, we used TITLE-ABS-KEY((linked data OR semantic data OR rdf OR software OR tool OR toolkit OR platform OR application) AND (consumption OR visualization OR discovery OR exploration OR analysis OR processing)). For limiting the publication year, we used PUBYEAR > 2012 AND PUBYEAR < 2018, and for limiting the subject we used LIMIT-TO(SUBJAREA, "COMP").

The query returned 1416 candidate papers. Two of the authors independently went through the search results and identified additional potentially interesting candidate papers based on their title and abstract. In the full text of the candidate papers they searched for mentions of relevant tools. This resulted in 28 additional tools. All conflicts between the authors in this phase were resolved by a discussion leading to a consensus.

3.2.3. Selection of candidates
The list of potential candidates was iteratively filtered in 3 elimination rounds using the preliminary criteria described below, to limit the number of tools to be later evaluated according to all our 94 criteria from Section 4. The list of all tools which passed round 1, and their progress through rounds 2 and 3, can be seen in Table 9, Table 10 and Table 11 in Section B.

First elimination round. In the first elimination round, we eliminated potential candidates based on the following exclusion criteria, using the formula C1 OR (C2 AND C3):

1. C1: based on its available description, it does not use LD
2. C2: source code is not available
3. C3: does not have a freely accessible demo, i.e. one that is online or downloadable without registration or request

Out of the 110 potential candidates, 65 were left after the first elimination round.

Second elimination round. In the second elimination round, we eliminated potential candidates based on the following exclusion criteria, using the formula C4 OR C5:

1. C4: despite the initial assessment based on titles and abstracts, it is clear from the text of the paper or the description on the web that the tool is not related to LD consumption
2. C5: cannot be installed or the demo does not work

Out of the 65 potential candidates, 40 were left after the second elimination round. For each of the 40 candidates we had its running instance available, which was tested in the following elimination round.

These criteria may seem too restrictive; however, it is crucial for the evaluated tools to be easily and freely usable or installable, e.g. without registration, so that they can be assessed and used without the need for interaction with their maintainers.

Third elimination round. In the third elimination round, we eliminated potential candidates based on the following exclusion criteria, using the formula C6 AND C7 AND C8:

1. C6: cannot load data from a provided RDF file on the Web
2. C7: cannot load data from a provided SPARQL endpoint
3. C8: cannot load data from a provided RDF file using file upload

For testing C6, we used two prepared RDF files (https://linked.opendata.cz/soubor/ovm/ovm.trig and https://linked.opendata.cz/soubor/ovm/ovm.ttl). Their content is identical; they differ only in the used RDF serialization format. The content consists of approx. 1 million RDF triples. For testing C7, we used our SPARQL endpoint (https://linked.opendata.cz/sparql). It contains 260 RDF graphs with over 610 million RDF triples. Note that we do not assume that the tool will try to copy all the data in the dataset. Instead, the tool can query the endpoint or load the dataset only partly, avoiding issues which may be caused by the amount of data. For testing C8, we used the same files as for C6. Two of the evaluated tools were not able to load our provided files, but were able to load different files. For the purpose of our survey we consider those able to load RDF.

Out of the 40 potential candidates, 16 were left after the third elimination round.

3.3. Candidate evaluation methodology

The 16 candidates were evaluated thoroughly based on all 94 of our criteria from Section 4. The evaluation results are presented in Section 5.

Each criterion was evaluated from the point of view of a non-LD expert and then from the point of view of an LD expert, according to the scenario which can be seen in Section C. When the scenario was passed without expert knowledge of LD technologies, we assigned a non-LD expert pass mark to the tool for this criterion. If the scenario could be passed only with expert knowledge of LD technologies, we assigned an LD expert pass mark to the tool for this criterion. Finally, if the scenario could not be done in any way, we assigned a fail mark "–" to the tool for this criterion.

When evaluating a tool given a criterion and its scenario, we first studied the user interface of the tool for any indication of support for that scenario in that tool. Next, we searched for evidence of support for that criterion in the available tool documentation, i.e. articles and web pages. Only after we failed to find any of the desired functionality this way did we say that the tool failed the criterion.

For criteria dealing with dataset search, when the tool allowed us to search for individual RDF resources and we were able to identify the datasets the found resources belonged to, we considered the criteria passed.

3.4. Tool score aggregation and interpretation

Once a tool is evaluated, it has a non-LD expert pass, an LD expert pass, or a fail "–" mark assigned for each criterion. These detailed results can be seen in Table 6, Table 7 and Table 8 in Section A. Next, its requirement score is computed as the percentage of passed criteria of each requirement. These scores can be seen in Table 3 for non-LD experts and in Table 4 for LD experts in Section 5. Finally, its requirement group score for each group of requirements is computed as an average of the scores of the requirements in the group. These can be seen in Table 1 for non-LD experts and in Table 2 for LD experts in Section 5. The purpose of the score aggregation is to show the focus of each particular tool regarding the steps of the LD consumption process, both for non-LD and LD experts.
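The aggregation is simple enough to state precisely; the following Python sketch (our illustration, with made-up marks and requirement names) computes both scores:

# Aggregate criterion marks into requirement and group scores.
# Marks: "pass" (non-LD expert), "expert" (LD expert only), "-" (fail).

def requirement_score(marks, expert=False):
    """Percentage of passed criteria; an LD expert also passes 'expert' marks."""
    passed = {"pass", "expert"} if expert else {"pass"}
    return 100 * sum(mark in passed for mark in marks) / len(marks)

def group_score(requirements, expert=False):
    """Average of the requirement scores in a group."""
    scores = [requirement_score(marks, expert) for marks in requirements.values()]
    return sum(scores) / len(scores)

# Example: a group with two requirements, each covered by some criteria.
group = {"Req. 1": ["pass", "expert", "-", "-"], "Req. 2": ["expert", "-"]}
print(group_score(group, expert=False))  # 12.5 (only plain passes count)
print(group_score(group, expert=True))   # 50.0 (expert passes count too)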

4. Requirements and evaluation criteria

We define the following steps of the LD consumption process:

1. discovery of candidate datasets available on the web based on a given user's intent, and selection of the relevant datasets needed to fulfill the intent,
2. extraction of data from the relevant datasets,
3. manipulation of the extracted data, e.g. cleansing, linking, fusion, and structural and semantic transformation,
4. output of the resulting data into a required format.

Similar processes are considered in the related literature. For example, in [11] the authors describe the following process:

(a) correctly identify relevant resources,
(b) extract data and metadata that meet the user's context and the requirements of their task or end goal,
(c) pre-process this data to feed into the visualization tool available and the analysis required,
(d) carry out initial exploratory analysis, using analytical and/or mining models.

Compared to our definition of the data consumption process, the process proposed in [11] concentrates more on data visualization and analysis. Nevertheless, it is comparable to ours. Step (a) corresponds to our step 1. Step (b) corresponds to our steps 1 and 2. Step (c) corresponds to our steps 3 and 4. Finally, step (d) is not covered by our process. We consider that the user works with the data as described in (d). However, this is out of the scope of an LDCP as it is performed in an external tool. Therefore, we do not evaluate this step.

There is also non-scientific literature describing the LD consumption process. In [12], the authors specify the following process:

(a) specify concrete use cases,
(b) evaluate relevant data sets,
(c) check the respective licenses,
(d) create consumption patterns,
(e) manage alignment, caching and updating mechanisms,
(f) create mash ups, GUIs, services and applications on top of the data

This process can also be aligned with ours. Steps (a) and (b) correspond to our step 1 - in (a) the users formulate their intent, in (b) they search for relevant datasets. Step (c) is a part of our step 3 (see Section 4.2.4). In step (d) the users identify and select relevant data from the datasets, which corresponds to our step 2. Step (e) is a part of our step 3. Step (f) contains our step 4 and extends it with the actual creation of a service on top of the data. We expect that the users work with the data created in step 4 of our process in the same way as in step (f) from [12], but it is out of the scope of our process.

Our described process is generic and any data consumption process may be described in these steps, not only the LD consumption process. However, an LDCP may benefit from the LD principles in each step.

In the rest of this section, we describe the requirements for each of the described steps. In this paper, we aim explicitly at the requirements which may somehow benefit from LD principles and related technologies. For each requirement, we further specify evaluation criteria. The requirements are split into 5 requirement groups:

1. Dataset discovery and selection - techniques allowing users to actually find the LD needed to fulfill their intent are detailed in Section 4.1.
2. Data manipulation - techniques supporting users in manipulating the found LD are described in Section 4.2.
3. Data visualization - techniques allowing users to visualize LD are identified in Section 4.3.
4. Data output - techniques allowing users to output the result of the consumption process to other tools, both LD and non-LD, are identified in Section 4.4.
5. Developer and community support - properties allowing users to share plugins and projects created in the tools, and properties allowing developers to integrate and reuse the tools, are specified in Section 4.5.

Note that there are no efficiency or performance requirements specified here. Once enough tools implement at least the minimal requirements identified here, their efficiency and performance may be evaluated. We also do not evaluate how well a certain criterion is implemented regarding the user interface of the tool. What we evaluate is simply whether or not the tool implements at least some approach to address the criterion, and whether LD expert knowledge is required or not.

We identified 34 requirements based on the existing tools and literature, split further into 94 fine grained criteria.

4.1. Dataset discovery

The consumption process starts with the user's intent to discover datasets published on the web which contain the required data. Therefore, an LDCP should be able to gather metadata about available datasets and provide a search user interface on top of the metadata.

4.1.1. Gathering metadata

The primary goal of dataset discovery is to provide the users with information about the existence of datasets which correspond to their intent. The provided information should consist of metadata characterizing the datasets, such as title, description, temporal and spatial information, URL where the dataset can be accessed, etc. In this section, we define requirements on how an LDCP should gather such metadata.

The most natural way of gathering such information is loading dataset metadata from existing data catalogs. They usually provide a machine readable API, which an LDCP can use. Currently, the most wide-spread open data catalog API is the proprietary non-RDF CKAN API (https://docs.ckan.org/en/latest/api/).

For example, open data catalogs implemented by the CKAN (https://ckan.org) or DKAN (https://getdkan.org) open-source platforms support this API.

Besides the proprietary CKAN API metadata representation, more or less standardized RDF vocabularies for representing dataset metadata exist. First, there is the DCAT [13] vocabulary, a W3C Recommendation, and DCAT-AP v1.1, the DCAT application profile for European data catalogs recommended by the European Commission, providing the vocabulary support for dataset metadata. There are existing data catalogs utilizing these vocabularies for providing access to dataset metadata. Examples of such data catalogs are the European Data Portal, integrating dataset metadata from various national data catalogs, and the European Union Open Data Portal, providing information about the datasets from the institutions and other bodies of the European Union. Besides DCAT, there is also the VoID [14] vocabulary, a W3C interest group note for representation of metadata about LD datasets. The metadata using these vocabularies is then accessible by the usual LD ways, i.e. in a data dump, in a SPARQL endpoint or using IRI dereferencing.

An LDCP should therefore support loading dataset metadata using a data catalog API and/or the LD principles and vocabularies. This leads us to our first requirement, Req. 1.

Requirement 1 (Catalog support). Support for loading dataset metadata from wide-spread catalogs like CKAN and from standardized metadata in DCAT, DCAT-AP and VoID, using dereferencable dataset IRIs, data dumps and SPARQL endpoints which provide the metadata.

Example 1.1 (CKAN API). Given a URL of a CKAN instance, e.g. https://old.datahub.io, one can access the list of datasets and then the details about each of the found datasets by using the package_list and package_show functions of the CKAN Action API, respectively.

{
  "success": true,
  "result": [
    "0000-0003-4469-8298",
    "0000-0003-4469-8298-1",
    "0000-0003-4469-8298-2"
  ]
}

The detailed information about a dataset typically contains, among others, a link for downloading the data and information about its data format, i.e. a MIME-type, in the case of downloadable files.

{
  "description": "Download",
  "format": "application/n-triples",
  "url": "http.../all_languages.tar",
  "resource_type": "file"
}

Therefore, an LDCP can consume such an API in a way which allows it to directly access the cataloged data without unnecessary user interaction. SPARQL endpoints can be discovered in a similar fashion.

Example 1.2 (DCAT-AP). In the European Union, the DCAT-AP standard for data portals prescribes that dataset metadata should be represented in RDF, and lists the classes and properties to be used. Therefore, an LDCP can consume an RDF metadata dump or access a SPARQL endpoint, e.g. the European Data Portal endpoint (https://www.europeandataportal.eu/sparql), to get the data access information in a standardized way.
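The two routes from Examples 1.1 and 1.2 can be sketched in a few lines of Python using the requests library (our illustration, not code from any evaluated tool; the endpoints are the ones mentioned above):

import requests

# Route 1: CKAN Action API (Example 1.1).
ckan = "https://old.datahub.io/api/3/action"
datasets = requests.get(f"{ckan}/package_list").json()["result"]
detail = requests.get(f"{ckan}/package_show", params={"id": datasets[0]}).json()
for resource in detail["result"]["resources"]:
    print(resource.get("format"), resource.get("url"))

# Route 2: DCAT-AP metadata in a SPARQL endpoint (Example 1.2).
query = """
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset ?title ?accessURL WHERE {
    ?dataset a dcat:Dataset ;
             dct:title ?title ;
             dcat:distribution [ dcat:accessURL ?accessURL ] .
} LIMIT 10
"""
response = requests.get(
    "https://www.europeandataportal.eu/sparql",
    params={"query": query},
    headers={"Accept": "application/sparql-results+json"})
for row in response.json()["results"]["bindings"]:
    print(row["title"]["value"], row["accessURL"]["value"])

In both cases the access information (download URL, format) is obtained without any user interaction, which is exactly what Criteria 1.1 and 1.3 below ask for.

Criterion 1.1 (CKAN API). The evaluated tool is able to load dataset metadata from a CKAN API.

Criterion 1.2 (RDF metadata in dump). The evaluated tool can exploit VoID, DCAT, DCAT-AP and/or Schema.org dataset metadata stored in RDF dumps.

Criterion 1.3 (Metadata in SPARQL endpoint). The evaluated tool can exploit VoID, DCAT, DCAT-AP and/or Schema.org dataset metadata stored in SPARQL endpoints.

Criterion 1.4 (Metadata from IRI dereferencing). The evaluated tool can dereference a dcat:Dataset or void:Dataset IRI to get the corresponding VoID, DCAT, DCAT-AP and/or Schema.org dataset metadata.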

The previous requirement assumes that metadata about datasets already exists, provided through the APIs of existing data catalogs. However, there is a more advanced way of gathering metadata: through implementation or support of a custom crawling and indexing service not necessarily relying solely on metadata found in data catalogs. Such a service can build its own index of datasets comprising automatically computed metadata based on the content of the datasets, using so called dataset profiling techniques [15]. The computed metadata of a dataset is then called a dataset profile. Profiles may encompass the structure of the datasets, i.e. the used classes and predicates [16], semantically related labels [17], or topics which characterize the content [18], where a topic is a resource from a well-known and highly reused LD data source, e.g. DBpedia [19] or Wikidata (https://www.wikidata.org). There are also approaches to clustering resources in a dataset. A cluster is formed by similar resources, and a characterization of these clusters may be included in the profile. [15] provides a detailed survey of existing RDF dataset profiling methods.

Examples of services which build dataset profiles automatically from the content of the datasets are Sindice [20] or LODStats [21]. The resulting Req. 2 can be viewed as a parallel to the development witnessed in the Web of Documents where first there were web page catalogs, and later came full-text search engines such as Google.

Requirement 2 (Catalog-free discovery). Support for automated extraction of dataset profiles from the web based on the dataset content, without using a catalog.

Example 2.1 (Sindice). Sindice [20] was an LD indexing service which allowed searching the indexed data using the Sig.MA tool, which facilitated discovering the data sources of the search results. The service is, however, currently unavailable.

Criterion 2.1 (Dataset profiling). The evaluated tool is able to discover datasets without a catalog and compute their profile of any kind.

4.1.2. Search user interface

A user interface (UI) enables users to express their intent to search for datasets in the form of dataset search queries. In this section, we define requirements on the query language the users may use to express their dataset search queries and on the assistance provided by an LDCP to the users when writing such search queries. The way search queries may be evaluated by an LDCP is discussed later in Section 4.1.3.

Most users are familiar with UIs provided by the dominating dataset discovery services - open data catalogs (e.g., CKAN). They enable the users to express their intent as a list of keywords or key phrases. The users may also restrict the initial list of datasets discovered using keywords with facets. A facet is associated with a given metadata property, e.g. publisher or date of publication, and enables the users to restrict the list only to datasets having given values of this property. The provided facets are determined by the structure of the collected metadata. Alternatively, a UI may provide a search form structured according to the structure of the collected metadata about datasets.

These features are independent of the methods of gathering the metadata described in Section 4.1.1 and they can be offered without building on top of the LD principles. However, there exist many approaches in the literature which show how the UI may benefit from the LD principles to better support the user. Even though they are primarily intended for querying the content of datasets, they could be directly used by an LDCP for dataset discovery. We show examples of these works in the rest of this section and formulate requirements on the LDCP based on them.

Supported query language. As argued in [22], expressing the user's intent with a keyword-based query language may be inaccurate when searching for data. It shows that building the search user interface on top of the LD principles may help with increasing the accuracy of the queries. Theoretically, it would be possible to offer SPARQL, a query language for RDF data, as a search query language. However, this language is too complex and hard to use for common users [23], including our intended users such as data journalists.

There are two approaches to this problem in the literature. The first approach tries to provide the users with the power of SPARQL while hiding its complexity, and enables them to express their intent in a natural language. The query is then translated to a SPARQL expression. For example, in [24], the authors parse a natural language expression into a generic, domain independent SPARQL template which captures its semantic structure, and then they identify domain specific concepts, i.e. classes, predicates and resources, combining NLP methods and statistics. In [25], the authors propose a natural language syntactical structure called Normalized Query Structure and show how a natural language query can be translated to this normalized structure and from there to SPARQL. In SimplePARQL [26], parts of the SPARQL query can be written as keywords, which are later rewritten into pure SPARQL.
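The rewriting idea can be illustrated with a toy translator (a deliberately oversimplified sketch of ours, not the algorithm of any of the systems cited above): the user's keywords are embedded into a fixed, domain independent SPARQL template that matches them against resource labels.

# Toy keyword-to-SPARQL rewriting: every keyword must occur in the label
# of a matched resource. Real systems build far richer templates.
def keywords_to_sparql(keywords):
    filters = "\n    ".join(
        f'FILTER(CONTAINS(LCASE(STR(?label)), "{kw.lower()}"))'
        for kw in keywords)
    return f"""PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?resource ?label WHERE {{
    ?resource rdfs:label ?label .
    {filters}
}}"""

print(keywords_to_sparql(["city", "population"]))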

The second approach offers search query languages which do not aim to be as expressive as SPARQL, but somehow improve the expressive power of keyword-based languages. They achieve this by extending them with additional concepts. For example, [22] proposes a search query language which supports so called predicate-keyword pairs, where a keyword may be complemented by RDF predicates which are related to the keywords in the searched datasets. In [27], the authors propose a novel query language, NautiLOD, which enables specifying navigation paths which are then searched in the datasets. A navigation expression is simpler and easier to specify than a SPARQL query, even though NautiLOD navigation paths are more powerful than SPARQL navigation paths.

Requirement 3 (Search query language). Provide a user friendly search interface.

Example 3.1 (Sparklis). Sparklis [28] (http://www.irisa.fr/LIS/ferre/sparklis/) provides a user interface which supports a restricted English as a search query language. A screenshot is displayed in Figure 1 with the query "Give me every food that is the industry of Coffee Republic". The query expression must use labels of classes, predicates and individuals present in the queried dataset.

[Fig. 1. SPARKLIS user interface for formulating SPARQL queries]

Criterion 3.1 (Structured search form). The evaluated tool provides a form structured according to the structure of metadata considered by the tool, where the user's intent can be expressed by restricting possible values in metadata profiles of searched datasets.

Criterion 3.2 (Fulltext search). The evaluated tool provides an input field where the user's intent can be expressed as a set of keywords or key phrases.

Criterion 3.3 (Natural language based intent expression). The evaluated tool supports expressing the user's intent in a natural language.

Criterion 3.4 (Formal language based intent expression). The evaluated tool supports expressing the user's intent in a formal language less complex than SPARQL.

Assistance provided to the user when writing the query. Most users are familiar with the behavior of common search engines which auto-suggest parts of queries to assist the users when expressing their intent. Also, facets may be considered as a way of assisting the users when formulating their intent. Similar assistance can also be provided in the scenario of dataset discovery. For example, [22] auto-suggests predicates relevant for a given keyword in the searched datasets. In [29], the authors show how it is possible to assist users when formulating a natural language query by auto-suggesting the query not based on query logs (which is a method used by conventional web search engines) but based on the predicates and classes in the searched RDF datasets. In [28], auto-suggestions are provided by the tool automatically, based on the chosen SPARQL endpoint with a knowledge base. Faceted search over LD is studied in [30], where the authors present a tool which generates facets automatically based on the contents of indexed datasets. A detailed survey of faceted exploration of LD is provided in [31].

Requirement 4 (Search query formulation assistance). Assist the user when formulating a search query with auto-suggestion of parts of the query or with facets derived automatically from the contents of indexed datasets.

Example 4.1 (Search query formulation assistance). The user interface provided by Sparklis (see Figure 1) assists the user with formulating a graph pattern expressed in restricted natural English by providing labels of classes, labels of their individuals, as well as their predicates.
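To make the content-based assistance concrete, here is a minimal sketch of ours (the endpoint and limit are arbitrary choices, not from any cited system) that derives completion candidates from the classes actually used in a SPARQL endpoint rather than from query logs:

import requests

def suggest_classes(endpoint, typed_prefix):
    """Suggest classes whose labels start with what the user has typed."""
    query = f"""
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?class ?label WHERE {{
    [] a ?class .
    ?class rdfs:label ?label .
    FILTER(STRSTARTS(LCASE(STR(?label)), "{typed_prefix.lower()}"))
}} LIMIT 10"""
    rows = requests.get(
        endpoint, params={"query": query},
        headers={"Accept": "application/sparql-results+json"}
    ).json()["results"]["bindings"]
    return [(row["class"]["value"], row["label"]["value"]) for row in rows]

# Typing "cit" against DBpedia's endpoint suggests, e.g., dbo:City.
print(suggest_classes("https://dbpedia.org/sparql", "cit"))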

Criterion 4.1 (Search query automated suggestions). The evaluated tool assists the user when formulating a search query with automated suggestions of parts of the query.

Criterion 4.2 (Faceted search). The evaluated tool assists the user when formulating a search query with facets.

4.1.3. Search query evaluation

In this section we define requirements on how an LDCP can evaluate the user's intent expressed as a search query using the provided search user interface. The evaluation means that the LDCP identifies a list of datasets corresponding to the intent. The simplest solution is provided by the existing open data catalogs. Keywords or other metadata values filled in a search form are matched against metadata records. Further restrictions expressed by the user using facets are evaluated by filtering the metadata records of the datasets.

The current literature shows how this basic evaluation can be extended based on LD. The first possibility is to interpret the semantics of keywords or key phrases in the search query against given knowledge bases. The keywords are represented with semantically corresponding resources, i.e. individuals, concepts, classes or properties, from the given knowledge bases, possibly including their context. In [23], the authors first discover the semantics of entered keywords by consulting a given pool of ontologies to find a syntactical match between keywords and labels of resources in these ontologies. Then they automatically formulate a formal query based on the semantics. In [32], the authors discover the semantics of keywords by matching them to extraction ontologies, i.e. linguistically augmented ontologies, to enable the system to extract information from texts. An LDCP may use these approaches in two ways - it may add semantically similar concepts to the search query, or it may add semantically related ones, as discussed in [33]. Both possibilities mean that the query provided by the user is expanded. Therefore, we speak about query expansion. A query expansion may improve precision and recall if considered carefully, which is studied in [34].

An LDCP may also benefit from approaches for computing the semantic relatedness of resources. For example, [35] shows how semantic relatedness may be computed in a scalable way based on the graph distance between two resources, i.e. the average length of the paths connecting the resources in a graph.

Requirement 5 (Search query expansion). Support for expanding search queries with additional semantically related resources.

Example 5.1 (Search query expansion). Using a knowledge base such as Wikidata, an LDCP may recognize terms in the search query as concepts from the knowledge base. For instance, it may recognize that the term Czech Republic represents a country (https://www.wikidata.org/wiki/Q213). It may have several query expansion rules encoded, such as: when a country is identified in the intent, it may be expanded with the contained administrative territorial entities (https://www.wikidata.org/wiki/Property:P150). When a user searches for datasets with the demography of the Czech Republic, such a query expansion would mean that datasets with the demography of administrative entities of the Czech Republic would also be searched for.
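A sketch of the expansion rule from Example 5.1 (ours; it relies only on the public Wikidata query service and the entity and property identifiers cited above):

import requests

def expand_country(country_iri):
    """Labels of administrative territorial entities (P150) of a country."""
    query = f"""
SELECT ?entityLabel WHERE {{
    <{country_iri}> wdt:P150 ?entity .
    SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}}"""
    rows = requests.get(
        "https://query.wikidata.org/sparql",
        params={"query": query, "format": "json"}
    ).json()["results"]["bindings"]
    return [row["entityLabel"]["value"] for row in rows]

# "Czech Republic" (wd:Q213) is expanded with its regions before searching.
terms = ["Czech Republic", "demography"] + expand_country(
    "http://www.wikidata.org/entity/Q213")
print(terms)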
Another possibility is to extend the discovered list of datasets. It means that after the search query is evaluated, and either expanded or not, the resulting list of discovered datasets is extended with additional semantically related datasets. By the term "semantically related" we mean various kinds of semantic relationships among datasets. A simpler solution is to search for additional datasets which provide statements about the same resources, use the same vocabularies, or link their resources to the resources in the already discovered datasets. These relationships are explicitly specified as RDF statements in the datasets. A more complex solution is to derive semantic relationships among datasets based on the similarity of their content [36].

An LDCP may also profit from approaches which recommend datasets for linking based on dataset similarity. For example, [37] introduces a dataset recommendation method based on cosine similarity of the sets of concepts present in datasets. Similarly, the authors of [38] recommend datasets based on the similarity of resource labels present in the datasets. Having discovered a dataset, an LDCP may use such an approach to recommend other semantically similar datasets. Instead of recommending them for linking, they could be recommended as additional datasets possibly interesting to the user. We may also get back to simpler solutions and consider metadata profiles, which may also contain some explicitly expressed semantic relationships between datasets. E.g., DCAT-AP discussed in Section 4.1.1 enables data publishers to specify semantic relationships between datasets in their metadata descriptions.

The discovery of related datasets is understood as one of the key features an LDCP should have, for example in [39], where the authors show how the possibility of recommending additional semantically related datasets may improve the quality of scientific research in domains such as medicine or environmental research by supporting interdisciplinary research.

Requirement 6 (Search results extension). Support for recommendation of additional datasets semantically related to already discovered datasets.

Example 6.1 (Search results extension). DCAT-AP considers the following predicates from the Dublin Core Vocabulary which can be used to express semantic relationships between datasets: hasVersion (a dataset is a version of another dataset), source (a dataset is a source of data for another dataset), and relation (a dataset is somehow related to another dataset). These relationships define a context which can be used to extend the set of discovered datasets. An LDCP may provide datasets related to the discovered one which contain other versions of the dataset, other datasets with content derived from the discovered one, etc.

Criterion 6.1 (Search results extension). The tool is able to extend the list of datasets discovered by evaluating the search query provided by a user with additional semantically related datasets.

4.1.4. Displaying search results

An LDCP needs to display the discovered list of datasets to the user. For this, it needs to order the discovered list and display some information about each dataset. The current dataset discovery solutions, i.e. data catalogs, order the discovered list according to the keyword-based similarity of the intent of the user and the metadata descriptions of the discovered datasets. Usually, they display an ordered list of datasets with basic metadata such as the name, description, publication date, and formats in which the dataset is available. However, there are more advanced ways of displaying the search results which can be built on top of LD.

First, the discovered datasets may be ranked and their list may be ordered according to this ranking. For example, in [40], the authors modify the well-known PageRank algorithm for ranking datasets. Dataset ranking can also be based on their quality scores surveyed in [41]. In [34], the authors also propose an algorithm based on PageRank, but they moreover consider the relevance to the intent of the user. A review of ranking approaches is available in [42].

Requirement 7 (Dataset ranking). Support for dataset ranking based on their content.

Example 7.1 (Dataset ranking). For example, an LDCP may combine a PageRank score of a dataset with the relevance of publishers of datasets, where datasets published by public authorities are ranked higher than datasets published by other organizations.

Criterion 7.1 (Content-based dataset ranking). The evaluated tool supports ranking of datasets based on their content.

Criterion 7.2 (Context-based dataset ranking). The evaluated tool supports ranking of datasets based on their context.

Criterion 7.3 (Intent-based dataset ranking). The evaluated tool supports ranking of datasets based on the intent of the user.

Second, it is possible to show a (visual) preview or summary of the discovered datasets [43] to help the user select the appropriate datasets. For example, a timeline showing the datasets according to their creation dates, the dates of their last modification, their accrual periodicities or their combination. Other types of previews are a map showing their spatial coverage, a word cloud diagram built from their keywords, etc.

The techniques which would allow an LDCP to create such previews of the datasets include support for the appropriate RDF vocabularies. Some of the LD vocabularies recently became W3C Recommendations (https://www.w3.org/standards/techs/rdfvocabs#w3c_all), which are de facto standards, and as such should be supported in LDCPs. These include the Simple Knowledge Organization System (SKOS) [44], The Organization Ontology (ORG) [45], the Data Catalog Vocabulary (DCAT) and the RDF Data Cube Vocabulary (DCV) [46]. Having the knowledge of such well-known vocabularies, an LDCP can generate a vocabulary-specific preview of each of the candidate datasets, giving the user more information for their decision making. There are also many well-known vocabularies which are not W3C Recommendations. Some of them are W3C Group Notes, e.g. VoID [14] used to describe RDF datasets and the vCard Ontology [47] for contact information. Others are registered in Linked Open Vocabularies [7], e.g. Dublin Core [48], FOAF [49], GoodRelations [50] and Schema.org (http://schema.org). These vocabularies are also popular, as shown by the LODStats service [21].

Requirement 8 (Preview based on vocabularies). Support for dataset preview based on well-known vocabularies used to represent the dataset content or metadata.

Example 8.1 (Preview using the RDF Data Cube Vocabulary). Given a statistical dataset modeled using the RDF Data Cube Vocabulary, a dataset preview (Req. 8) could contain a list of dimensions, measures and attributes, or a list of concepts represented by them, based on its Data Structure Definition (DSD). In addition, the number of observations could be displayed and, in the case of simpler data cubes such as time series, a line graph or bar chart can be shown automatically, as in LinkedPipes Visualization [51].
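As a sketch of how such a vocabulary-specific preview could be compiled, the following query lists the components declared in the Data Structure Definitions of a Data Cube dataset; only the standard terms of the vocabulary are assumed:

PREFIX qb: <http://purl.org/linked-data/cube#>

# List the dimensions, measures and attributes declared in the Data
# Structure Definitions present in the dataset, as input for a preview.
SELECT ?dsd ?componentProperty ?componentType
WHERE {
  ?dsd a qb:DataStructureDefinition ;
       qb:component ?componentSpec .
  ?componentSpec ?componentType ?componentProperty .
  FILTER(?componentType IN (qb:dimension, qb:measure, qb:attribute))
}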

Criterion 8.1 (Preview – W3C vocabularies). The evaluated tool provides a dataset preview based on used vocabularies that are W3C Recommendations.

Criterion 8.2 (Preview – LOV vocabularies). The evaluated tool provides a dataset preview based on used well-known vocabularies registered at Linked Open Vocabularies.

Criterion 8.3 (Preview metadata). The evaluated tool previews dataset description and statistics metadata recorded in DCAT, DCAT-AP, Schema.org and/or VoID.

The user may also require a dataset preview compiled not from descriptive and statistical metadata provided by the publisher, but from metadata automatically computed from the dataset contents. This metadata includes descriptive metadata (e.g., important keywords or spatial and temporal coverage), statistical metadata (e.g., the number of resources or the numbers of instances of different classes) and a schema. It is important to note that schemas are often very complex and their summaries are more useful for users. [52, 53] propose techniques for schema summarization which create a subset of a given dataset schema containing only the most important nodes and edges. Therefore, it is a reasonable requirement that an LDCP should be able to generate metadata independently of what is contained in the manually created metadata records.

Requirement 9 (Data preview). Provide dataset description, statistics and schema based on automatic querying of the actual contents of the dataset.

Example 9.1 (Data preview). A preview of data based on the data itself instead of relying on the metadata can be seen in the LODStats [21] dataset. From statistics such as the number of triples, entities, classes and languages, and the class hierarchy depth, one can get better information contributing to the decision on whether or not to use such a dataset.

Example 9.2 (Schema extraction using SPARQL). For RDF datasets, the list of used vocabularies can be attached using the void:vocabulary predicate (https://www.w3.org/TR/void/#vocabularies). Their specifications can also be attached to RDF distributions of datasets using the dcterms:conformsTo predicate as specified in DCAT-AP. Either way, even if the vocabularies are indeed specified in the metadata, the dataset itself may use only a few properties and classes from them, not all covered by the vocabularies. Therefore, a useful feature of an LDCP would be extraction of the actual schema of the data, i.e. the used classes and the predicates connecting their instances, shown in a diagram. This can be done by querying the data using quite simple SPARQL queries, such as this one for classes and the numbers of their instances:

SELECT ?class (COUNT(?s) AS ?size)
WHERE { ?s a ?class }
GROUP BY ?class
ORDER BY DESC(?size)

and this one for predicates among classes with the numbers of their occurrences:

SELECT ?c1 ?c2 (MIN(?p) AS ?predicate) (COUNT(?p) AS ?num)
WHERE {
  ?s1 a ?c1 .
  ?s2 a ?c2 .
  ?s1 ?p ?s2 .
}
GROUP BY ?c1 ?c2
ORDER BY DESC(?num)

Criterion 9.1 (Preview data). The evaluated tool provides dataset description and statistics based on automatic querying of the actual dataset.

Criterion 9.2 (Schema extraction). The evaluated tool provides a schema automatically extracted from the given dataset.

Besides metadata, statistics and used vocabularies, there are many other criteria regarding data quality that should also be presented to the user to support their decision making. We do not aim at specific quality indicators. The following requirement related to data quality is generic, and it considers any meaningful LD quality indicator. A recent survey of such indicators and techniques for their computation is available in [41].

Requirement 10 (Quality indicators). Provide quality measurements based on LD quality indicators.

Example 10.1 (Checking of availability). One of the quality dimensions surveyed in [41] is availability. An LDCP could support its users by periodically checking the availability of known datasets by trying to access dumps or by querying SPARQL endpoints and dereferencing IRIs. The data gathered this way could be presented, e.g., as an availability percentage. The SPARQLES service (http://sparqles.ai.wu.ac.at/) [54] performs such measurements periodically for known SPARQL endpoints.
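For SPARQL endpoints, the periodic check can be a trivial probe query such as the following sketch; the availability percentage is then the success ratio of such probes over time (the actual queries issued by SPARQLES differ):

# A minimal liveness probe: succeeds on any endpoint that is up
# and willing to answer queries.
ASK { ?s ?p ?o }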

Criterion 10.1 (Quality indicators). Provide quality measurements based on LD quality indicators.

4.2. Data manipulation

A common problem of existing LD tools is their limited support for the standards for representation of and access to LD. An LDCP should support all existing serialization standards such as Turtle, JSON-LD, etc., and it should support loading the input in different forms such as dump files, SPARQL endpoints, etc. In this section, we motivate and define requirements on data input, its transformations, and provenance and licensing support, and we emphasize existing web standards.

4.2.1. Data input

An LDCP should support all the standard ways of accessing LD so that there are no unnecessary constraints on the data to be consumed. These include IRI dereferencing (Req. 11), SPARQL endpoint querying (Req. 12) and the ability to load RDF dumps in all standard RDF 1.1 serializations (https://www.w3.org/TR/rdf11-new/#section-serializations) (Req. 13), i.e. RDF/XML, Turtle, TriG, N-Triples, N-Quads, RDFa and JSON-LD. This functionality can be achieved relatively easily, e.g. by integrating the Eclipse RDF4J (http://rdf4j.org/), Apache Jena (https://jena.apache.org/) or RDF JavaScript (https://github.com/rdfjs) libraries.

There are at least two ways in which an LDCP should support IRI dereferencing. First, an LDCP should be able to simply access a URL and download a file in any of the standard RDF serializations, which we deal with later in Req. 13. Second, an LDCP should be able to use HTTP Content Negotiation (https://www.w3.org/Protocols/rfc2616/rfc2616-sec12.html) to request RDF data even though the URL yields an HTML page for web browsers, and be able to work with the result, e.g. to dereference related entities, a.k.a. the "follow your nose" principle (http://patterns.dataincubator.org/book/follow-your-nose.html).

Requirement 11 (IRI dereferencing). Ability to load RDF data by dereferencing IRIs using HTTP content negotiation and accepting all RDF 1.1 serializations.

Example 11.1 (Usage of IRI dereferencing). One of the possible data inputs can be a single RDF resource, e.g. the representation of Prague in DBpedia, for which the user might want to monitor some property, e.g. population. The easiest way to do that is to simply access the resource IRI, follow HTTP redirects and ask for RDF data in our favorite RDF serialization (Req. 11). This process can then be repeated as necessary. It is equivalent to running, e.g.:

curl -L -H "Accept: text/turtle" http://dbpedia.org/resource/Prague

Criterion 11.1 (IRI dereferencing). Ability to load RDF data by dereferencing IRIs using HTTP content negotiation.

Criterion 11.2 (Crawling). Ability to apply the follow your nose principle.

Publishing data via a SPARQL endpoint is considered the most valuable way of publishing LD for consumers. It provides instant access to the computing power of the publisher's infrastructure, and the consumer does not have to work with the data at all to be able to query it. Publishing data this way also enables users to run federated queries [55]. Therefore, this should be the easiest way of getting data using an LDCP. On the other hand, public SPARQL endpoints typically implement various limitations on the number of returned results or on query execution time, which make it hard to get, e.g., a whole dataset this way. These limits, however, can be tackled using various techniques such as paging.

Requirement 12 (RDF input using SPARQL). Ability to load data from a SPARQL endpoint.

Example 12.1 (Accessing the DBpedia SPARQL endpoint). Since DBpedia is the focal point of the whole LOD cloud, it is often used as a linking target to provide more context to published data. In addition to IRI dereferencing, an LDCP can use the DBpedia SPARQL endpoint to, e.g., suggest more related data of the same type.
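A sketch of such a suggestion query against the DBpedia endpoint follows; taking a shared rdf:type as the relatedness signal is an illustrative choice:

PREFIX dbr: <http://dbpedia.org/resource/>

# Suggest other resources sharing a type with dbr:Prague as candidates
# for "more related data of the same type".
SELECT DISTINCT ?related
WHERE {
  dbr:Prague a ?type .
  ?related a ?type .
  FILTER(?related != dbr:Prague)
}
LIMIT 100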
Criterion 12.1 (SPARQL: querying). Ability to query a given SPARQL endpoint using a SPARQL query or a set of queries provided by the tool.

Criterion 12.2 (SPARQL: named graphs). Awareness of named graphs in a SPARQL endpoint and the ability to work with them.

Criterion 12.3 (SPARQL: custom query). Ability to extract data from a SPARQL endpoint using a custom SPARQL query.

Criterion 12.4 (SPARQL: authentication). Ability to access SPARQL endpoints which require authentication.

Criterion 12.5 (SPARQL: advanced download). Ability to download a dataset from a SPARQL endpoint even though it is too big to download using the standard CONSTRUCT WHERE {?s ?p ?o} query, e.g. by paging or similar techniques.
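A common paging workaround, sketched below, repeats a deterministically ordered query with a growing OFFSET until an empty page is returned; the page size of 10000 is an arbitrary choice that has to stay below the endpoint's result limit:

# One page of a full dataset download; the client re-issues the query
# with OFFSET 10000, 20000, ... until no triples are returned.
CONSTRUCT { ?s ?p ?o }
WHERE { ?s ?p ?o }
ORDER BY ?s ?p ?o
LIMIT 10000
OFFSET 0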
Requirement 13 (RDF dump load). Ability to load data from an RDF dump in all standard RDF serializations, including tabular data with a CSVW mapping and relational databases with an R2RML mapping.

The previous techniques, such as SPARQL querying and IRI dereferencing, relied on the fact that the target service was able to provide the desired RDF data serialization. However, when there are RDF dumps already serialized, to be downloaded from the web, an LDCP needs to support all standardized serializations itself, as it cannot rely on any service for translation.

Example 13.1 (RDF quads dump load). One of the use cases for named graphs in RDF already adopted by LD publishers is to separate different datasets in one SPARQL endpoint. It is then quite natural for them to provide data dumps in an RDF quads enabled format such as N-Quads or TriG, e.g. the statistical datasets from the Czech Social Security Administration (https://data.cssz.cz/dump/nove-priznane-duchody-dle-vyse-duchodu.trig). This also eases the loading of such data into a quad-store, as the user does not have to specify the target graph to load into. Since there are multiple libraries and quad-stores supporting RDF 1.1 serializations, this should not be a problem, and an LDCP should support all of them.

Criterion 13.1 (RDF/XML file load). Ability to load data from an RDF/XML file, both from a URL and locally.

Criterion 13.2 (Turtle file load). Ability to load data from a Turtle file, both from a URL and locally.

Criterion 13.3 (N-Triples file load). Ability to load data from an N-Triples file, both from a URL and locally.

Criterion 13.4 (JSON-LD file load). Ability to load data from a JSON-LD file, both from a URL and locally.

Criterion 13.5 (RDFa file load). Ability to load RDF data from an RDFa annotated file, both from a URL and locally.

Criterion 13.6 (N-Quads file load). Ability to load data from an N-Quads file, both from a URL and locally.

Criterion 13.7 (TriG file load). Ability to load data from a TriG file, both from a URL and locally.

Criterion 13.8 (Handling of bigger files). Ability to process bigger RDF dumps without loading them as in-memory RDF models. Such techniques can be found in RDFpro [56] and RDFSlice [57].

Criterion 13.9 (CSV on the Web load). Ability to load data from a CSV file described by CSV on the Web (CSVW) metadata, both from a URL and locally.

Criterion 13.10 (Direct mapping load). Ability to load data from a relational database using Direct mapping [58].

Criterion 13.11 (R2RML load). Ability to load data from a relational database using an R2RML mapping.

Besides these traditional ways of getting LD, an LDCP should support the Linked Data Platform [59] specification for getting resources from containers. These can be, e.g., notifications according to the Linked Data Notifications W3C Recommendation [60, 61].

Requirement 14 (Linked Data Platform input). Ability to load data from Linked Data Platform compliant servers.

Example 14.1 (Loading data from the Linked Data Platform). The Linked Data Platform (LDP) specification defines (among others) containers (https://www.w3.org/TR/ldp/#ldpc) as lists of entities. Therefore, an LDCP should be aware of this and support the handling of LDP containers, e.g. by listing them and their entities to be used in the data consumption process. An example of an LDP compliant implementation is Apache Marmotta (http://marmotta.apache.org/platform/ldp-module.html).

Criterion 14.1 (Linked Data Platform containers). Ability to load items from Linked Data Platform containers.

In addition to being able to load data in all standardized ways, an LDCP should also provide a way of specifying that a data source should be periodically monitored for changes, e.g. as in the Dynamic Linked Data Observatory [62]. This includes the specification of the periodicity of the checks and the specification of actions to be taken when a change is detected. The actions could vary from a simple user notification to triggering a whole pipeline of automated data transformations. This way, the users can see what is new in the data used for their goals and whether the data is simply updated or needs their attention due to more complex changes. A short survey of existing approaches to change monitoring is provided in [15], Section 3.7.

Requirement 15 (Monitoring of changes in input data). Ability to periodically monitor a data source and trigger actions when the data source changes.

Example 15.1 (Keeping data up to date). The EU Metadata Registry (MDR) (http://publications.europa.eu/mdr/) publishes various codelists to be reused across the EU, and they are updated almost weekly. An LDCP should allow the users to register a dataset to be downloaded regularly (or on demand). The interval can be set manually, or automatically from the dcterms:accrualPeriodicity property in the DCAT metadata. Or, the LDCP may be notified by MDR when a change occurs, according to the Linked Data Notifications protocol.

Criterion 15.1 (Monitoring of input changes: polling). Ability to periodically monitor a data source and trigger actions when the data source changes.

Criterion 15.2 (Monitoring of input changes: subscription). Ability to subscribe to a data source, e.g. by WebSub [63], formerly known as PubSubHubbub, or ActivityPub [64], and trigger actions when the data source changes.

When working with real world data sources, it may be necessary to keep track of versions of the input data, which can change in time or even become unavailable. When more complex changes happen in the data, the whole consumption pipeline could be broken. Therefore, a version management mechanism should be available in an LDCP so that the user can work with a fixed snapshot of the input data and see what has changed and when. Such a mechanism could also be automated. For example, if a consuming client holds a replica of a source dataset, changes in the source may be propagated to the replica as shown in [65], creating a new version. Updates may also be triggered by events like Git commits, as can be seen in [66].

Another type of version management is support for handling a data source which itself is available in multiple versions at one time, e.g. using the Memento [67] protocol.

Requirement 16 (Version management). Ability to maintain multiple versions of data from selected data sources.

Criterion 16.1 (Manual versioning support). Ability to keep multiple versions of input data, where the versions are created, deleted and switched manually.

Criterion 16.2 (Automated versioning support). Ability to keep multiple versions of input data, with support for automated versioning actions such as automatic synchronization of remote datasets.

Criterion 16.3 (Source versioning support). Ability to exploit the existence of multiple versions of a dataset at its source, e.g. by supporting the Memento protocol.

4.2.2. Analysis of semantic relationships

The requirements discussed in the previous sections support the users in discovering individual datasets isolated from one another. However, the users usually need to work with them as with one integrated graph of RDF data. Therefore, the users expect that the datasets are semantically related to each other and that they can use these semantic relationships for their integration. If a dataset is not in a required relationship with the other discovered datasets, the users need to know this so that they can omit the dataset from further processing. We have briefly discussed possible kinds of semantic relationships in Req. 6. However, while Req. 6 only required an LDCP to show datasets which are somehow semantically related to the selected one, now the users expect a deeper analysis of the relationships.
Each of the existing semantic relationships among the discovered datasets should be presented to the users with a description of the relationship, i.e. the kind of the relationship and its deeper characteristics. The descriptions will help the users to understand the relationships and decide whether or not they are relevant for the integration of the datasets.

When the datasets share resources or their resources are linked, the deeper characteristics may involve information about the classes of the resources, the ratio of the number of the shared or linked resources to the total number of the resources belonging to these classes, the compliance of the datasets in the statements provided about the shared resources, etc. There are approaches which provide explanations of such relationships to users. For example, [68, 69] propose a method for generating explanations of relationships between entities in the form of the most informative paths between the entities. The approach can be extended to explaining relationships among datasets which contain related entities. In [70], the authors present another approach to explaining relationships between datasets, based on so-called relatedness cores, which are dense sub-graphs that have strong relationships with resources from the related datasets.

When the datasets neither share resources nor are there links among them, the semantic relationship may be given by the fact that they share some parts of vocabularies, i.e. classes or predicates. Such relationships may be deduced from metadata collected by services such as LODStats [21].

Requirement 17 (Semantic relationship analysis). Provide characteristics of existing semantic relationships among datasets which are important for the user to be able to decide about their possible integration.

Example 17.1 (LODVader). LODVader [71] shows relationships among given datasets, demonstrated on the LOD cloud. Instead of relying on existing metadata in VoID, it generates its own based on the actual data. It is able to present relationships based on links between resources as well as on shared vocabularies. Unfortunately, the software seems abandoned and the demo no longer works.

Criterion 17.1 (Shared resources analysis). Provide characteristics of existing semantic relationships among datasets based on resources shared by the datasets.

Criterion 17.2 (Link analysis). Provide characteristics of existing semantic relationships among datasets based on links among resources.

Criterion 17.3 (Shared vocabularies analysis). Provide characteristics of existing semantic relationships among datasets based on shared vocabularies or their parts.

A supplement to the previous requirement is the support of methods for automated or semi-automated deduction of semantic relationships among datasets. These methods may be useful when semantic relationships cannot be directly discovered based on statements present in the datasets, but new statements must be deduced from the existing ones. There are tools like SILK [72], SLINT [73] or SERIMI [74] which support so-called link discovery, i.e. deducing new statements linking the resources of, usually, two datasets. Deduction of new linking statements is usually done using various similarity measures [75, 76]. A recent comprehensive survey of link discovery frameworks is provided in [77]. Another family of tools supports so-called ontology matching [78], which is a process of deducing mappings between two vocabularies. These methods may help an LDCP to provide the user with more semantic relationships.

Requirement 18 (Semantic relationship deduction). Provide support for automated or semi-automated deduction of semantic relationships among datasets.

Example 18.1 (Link discovery using SILK). A classic example of adding semantic relations to existing data is link discovery. Let us, for instance, consider a dataset with useful information about all Czech addresses, towns, etc. called RÚIAN. Data on the web, however, typically links to the DBpedia representations of Czech cities and regions. In order to be able to use information both from the LOD cloud and from RÚIAN, the data consumer needs to map, e.g., the Czech towns in the Czech dataset to the Czech towns in DBpedia, based on their names and on the regions in which they are located. This is a task well suited for a link discovery tool such as SILK, and an LDCP should be able to support this process.

Criterion 18.1 (Link discovery). Provide support for automated or semi-automated deduction of links among datasets.

Criterion 18.2 (Ontology alignment). Provide support for automated or semi-automated ontology alignment.
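Link discovery tools express such linkage rules in their own configuration languages; purely to illustrate the idea, a naive equivalent of the simplest possible rule can be sketched in SPARQL, with the two datasets assumed to reside in the hypothetical named graphs <urn:graph:ruian> and <urn:graph:dbpedia>:

PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# Naive link discovery: link towns whose labels match exactly.
# Real tools such as SILK combine several properties and use string
# similarity measures and thresholds instead of exact equality.
CONSTRUCT { ?town owl:sameAs ?dbpediaTown }
WHERE {
  GRAPH <urn:graph:ruian>   { ?town rdfs:label ?label . }
  GRAPH <urn:graph:dbpedia> { ?dbpediaTown rdfs:label ?label . }
  FILTER(?town != ?dbpediaTown)
}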
4.2.3. Semantic transformations

The user now needs to process the discovered datasets. As we discuss later in Section 4.4, this means either visualizing the datasets or exporting them for processing in an external tool. In general, it means transforming the datasets from their original form to another one, which is necessary for such processing. We can consider transformations which transform one RDF representation of the datasets to another RDF representation, or to other data models such as relational, tree, etc. In this section, we discuss the RDF to RDF transformations. The transformations to other data models are considered outputs of the process and are covered in Section 4.4. A transformation involves various data manipulation steps which should be supported by an LDCP.

First, an LDCP must deal with the fact that there exist different vocabularies for the same domain. We say that such datasets are semantically overlapping. Let us note that a semantic overlap is the kind of semantic relationship discussed in Req. 17. Therefore, it often happens that the vocabulary used in the dataset is different from the one required for the further processing. In such situations it is necessary to transform the dataset so that it corresponds to the required vocabulary. The transformation can be based on an ontological mapping between both vocabularies, or it can be based on a transformation script created specifically for these two vocabularies. This requirement is realistic when we consider well-known vocabularies such as FOAF, Schema.org, GoodRelations, WGS84_pos, and others registered in LOV.

Requirement 19 (Transformations based on vocabularies). Provide transformations between semantically overlapping vocabularies.

Example 19.1 (GoodRelations to Schema.org transformation). In the process of development of the Schema.org vocabulary, a decision was made to import classes and properties of the GoodRelations vocabulary used for e-commerce into the Schema.org namespace. This decision is questionable and controversial (http://wiki.goodrelations-vocabulary.org/GoodRelations_and_schema.org) in the light of best practices for vocabulary creation. These suggest reusing existing parts of vocabularies instead of creating identical classes and properties or importing them into another IRI namespace. Nevertheless, the fact remains that some datasets might be using GoodRelations (http://purl.org/goodrelations/v1#OpeningHoursSpecification) and other datasets might be using Schema.org (http://schema.org/OpeningHoursSpecification) for, e.g., the specification of the opening hours of a point of sales. This is a perfect opportunity for a transformation script or component which would transform data modeled according to one of those ways to the other kind, if necessary (Req. 19).

Criterion 19.1 (Vocabulary mapping based transformations). Provide transformations between semantically overlapping vocabularies based on their mapping, e.g. using OWL.

Criterion 19.2 (Vocabulary-based SPARQL transformations). Provide transformations between semantically overlapping vocabularies, based on predefined SPARQL transformations.
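One direction of such a transformation can be sketched as a SPARQL CONSTRUCT; only the class IRI of opening hours specifications is rewritten here, and analogous rules would be needed for the involved properties:

PREFIX gr: <http://purl.org/goodrelations/v1#>
PREFIX schema: <http://schema.org/>

# Rewrite GoodRelations opening hours specifications to their
# Schema.org counterparts.
CONSTRUCT { ?spec a schema:OpeningHoursSpecification . }
WHERE { ?spec a gr:OpeningHoursSpecification . }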
Another often required kind of data manipulation is inference. Inference can be used when additional knowledge about the concepts defined by the vocabulary is provided, which can be used to infer new statements. Such knowledge is expressed in the form of so-called inference rules. The semantic mappings between vocabularies mentioned above also form a kind of inference rules, but here we consider them in their broader sense. A specific kind of inference rule allows one to specify that two resources are the same real-world entity, i.e. the owl:sameAs predicate. Having such an inference rule means that statements about both resources need to be fused together. The fusion does not mean simply putting the statements together, but also the identification of conflicting statements and their resolution [79, 80].

Requirement 20 (Inference and resource fusion). Support inference based on inference rules encoded in vocabularies or ontologies of discovered datasets.

Example 20.1 (Inference). Many RDF vocabularies are defined with some inference rules in place. An example of inference which would be beneficial to support in an LDCP can be the SKOS vocabulary with its skos:broader and skos:narrower properties. According to the SKOS reference, these properties are inverse properties (https://www.w3.org/TR/skos-reference/#L2055). For instance, a dataset may contain triples such as ex:a skos:broader ex:b, whereas the output is required to contain ex:b skos:narrower ex:a because of the limitations of another component. These inverse triples can be inferred and materialized by an LDCP, based on its knowledge of SKOS.
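This particular rule is simple enough to be materialized with a single SPARQL CONSTRUCT (or a corresponding INSERT), sketched below:

PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

# Materialize the inverse skos:narrower triples implied by
# skos:broader, which the SKOS reference defines as its inverse.
CONSTRUCT { ?b skos:narrower ?a }
WHERE { ?a skos:broader ?b }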

Example 20.2 (owl:sameAs fusion). Let us have the two datasets we already mentioned in Example 18.1, the Czech towns from DBpedia and the Czech registry containing information about all addresses and, among others, the cities of the Czech Republic, and let us consider that we linked the corresponding entities using owl:sameAs links. Now, if we are preparing a dataset for, e.g., visualization to a user, it might be more fitting to fuse the data about each city from both datasets than to leave the data with the linking. Non-conflicting facts about one city are easy to fuse. We choose the IRI of one or the other as the target subject and copy all predicates and objects (including blank nodes) to this subject. What may require assistance are the resulting conflicting statements, such as different city populations. An LDCP should support the user in this process by, at least, providing a list of conflicts and options for solving them.

Criterion 20.1 (RDFS inference). Support inference based on inference rules encoded in the RDFS vocabulary.

Criterion 20.2 (Resource fusion). Support fusion of statements about the same resources from different datasets.

Criterion 20.3 (OWL inference). Support inference based on inference rules encoded in OWL ontologies.

Besides transformations, the user will typically need the standard operations for data selection and projection. An LDCP can assist the user with the specification of these operations by providing a graphical representation of the required data subset, which may correspond to a graphical representation of a SPARQL query, as in SPARQLGraph [8] and similar approaches.

Requirement 21 (Selection and projection). Support assisted graphical selection and projection of data.

Example 21.1 (Data language selection). One type of selection may be limiting the dataset to a single language. For example, when working with the EU Metadata Registry (MDR), codelists are usually provided in all EU languages. Having the data in multiple languages may, however, not be necessary. An LDCP may support common selection techniques without requiring the user to write SPARQL queries.

Criterion 21.1 (Assisted selection). Support assisted graphical selection of data.
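Behind such a graphical selection, the tool could generate a query along the following lines; restricting literals to English is an illustrative choice:

# Keep all statements with non-literal or language-less objects,
# and only the English variants of language-tagged literals.
CONSTRUCT { ?s ?p ?o }
WHERE {
  ?s ?p ?o .
  FILTER(!isLiteral(?o) || lang(?o) = "" || lang(?o) = "en")
}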
Example 21.2 (Data projection). An example of a projection may be picking only the data which is relevant for a given use case, while omitting other properties of the represented resources. The user may again be assisted by a graphical overview of the dataset with the possibility to select the relevant entities, their properties and links.

Criterion 21.2 (Assisted projection). Support assisted graphical projection of data.

Besides the specific data manipulation requirements described above, the user may need to define their own specific transformations expressed in SPARQL.

Requirement 22 (Custom transformations). Support custom transformations expressed in SPARQL.

Criterion 22.1 (Custom transformations). Support custom transformations expressed in SPARQL.

Having the support for all kinds of data manipulations described above, it is necessary to be able to combine them together so that the discovered datasets are transformed to the required form. The user may be required to create such a data manipulation pipeline manually. However, we might also expect that an LDCP compiles such a pipeline automatically and the user only checks the result and adjusts it when necessary.

Requirement 23 (Automated data manipulation). Support automated compilation of data manipulation pipelines and enable their validation and manual adjustment by users.

Example 23.1 (Automated data manipulation). An LDCP may be aware of what the data in each step of the consumption process looks like even as the process is being designed, e.g. by performing background operations on data samples, and it may recommend possible or probable next steps. Using the same principles, the data transformation pipelines may even be generated automatically, given that the requirements on the final data structure are known and the source data is accessible. This can be seen in, e.g., LinkedPipes Visualization [10], where, given an RDF data source, a list of possible visualizations formed by data transformation pipelines is offered automatically, based on known vocabularies and the input and output descriptions of the data processing components.

Criterion 23.1 (Automated data manipulation). Support automated compilation of data manipulation pipelines and enable their validation and manual adjustment by users.

4.2.4. Provenance and license management

For the output data to be credible and reusable, detailed provenance information should be available, capturing every step of the data processing task, starting from the origin of the data to the final transformation step and data export or visualization. There is the PROV Ontology [81], a W3C Recommendation that can be used to capture provenance information in RDF.

Requirement 24 (Provenance). Support for processing provenance information expressed using a standardized vocabulary.

Example 24.1 (Provenance). When the goal of the LD consumption process is to gather data, combine it and pass it to another tool for processing, it might be useful to store information about this process. Specifically, the consumers of the transformed data might be interested in where the data comes from, how it was combined, what selection and projection was done, etc. This information can be stored as an accompanying manifest using the PROV Ontology [81] and may look like this (taken directly from the W3C Recommendation):

ex:aggregationActivity
    a prov:Activity;
    prov:wasAssociatedWith ex:derek;
    prov:used ex:crimeData;
    prov:used ex:nationalRegionsList;
    .

Criterion 24.1 (Provenance input). Ability to exploit structured provenance information accompanying input data.

Criterion 24.2 (Provenance output). Provide provenance information throughout the data processing pipeline using a standardized vocabulary.

Part of the metadata of a dataset should be, at least in the case of open data, the information about the license under which the data is published, since that is what the first star of open data is for. A common problem when integrating data from various data sources is license management, i.e. determining whether the licenses of two data sources are compatible and what license to apply to the resulting data. This problem gets even more complicated when dealing with data published in different countries. A nice illustration of the problem is given as Table 3 on ODI's Licence Compatibility page (https://github.com/theodi/open-data-licensing/blob/master/guides/licence-compatibility.md). Also, the European Data Portal has its Licence Assistant (https://www.europeandataportal.eu/en/licence-assistant), which encompasses an even larger list of open data licenses. Issues with (linked) open data licensing are also identified in [82], where the authors also present the License Compositor tool. Finally, LIVE [83] is a tool for the verification of the license compatibility of two LD datasets.

Requirement 25 (License management). Support license management by tracking the licenses of the original data sources, checking their compatibility when data is integrated, and helping with determining the resulting license of republished data.

Example 25.1 (License management). Based on open data license compatibility information such as ODI's Licence Compatibility table, an LDCP should advise its user on whether two data sources can be combined or not and what the license of the result is. The licenses of the individual datasets can be obtained from their DCAT metadata, as in, e.g., the Czech National Open Data catalog (https://data.gov.cz/sparql). For datasets without a specific license, a warning should be displayed. For datasets with known licenses, e.g. a dataset published using CC-BY and another one published using CC-BY-SA, the CC-BY-SA minimum license should be suggested. The legal environment of open data is quite complex and obscure even to developers who are LD experts, so support like this, preferably even region specific, is necessary.

Criterion 25.1 (License awareness). Awareness of dataset licenses, e.g. gathered from metadata, and their presentation to the user.

Criterion 25.2 (License management). Awareness of the license combination problem and assistance in determining the resulting license when combining datasets.

4.3. Data visualization

The goal of using an LDCP is either to produce data for processing in other software or to produce visualizations of discovered datasets. Given the graph nature of RDF, the simplest visualization is a graph visualization of the data, reflecting its structure in RDF. Another type of visualization is specified manually using a mapping of the RDF data to the inputs of visualization plugins, as in LinkDaViz [84]. Finally, visualizations can be automated, based on the correct usage of vocabularies and a library of visualization tools, as in LinkedPipes Visualization (LP-VIZ) [51], which offers them for DCV, SKOS hierarchies and Schema.org GeoCoordinates on Google Maps.

Requirement 26 (Visualization). Offer visualizations of LD based on its graph structure, manual mapping from the data to the visualizations, or using automated methods for visualization.

Example 26.1 (Manual visualization in LinkDaViz). An example of manual visualization is LinkDaViz. The user first selects the dataset to be visualized and then selects specific classes and properties from the dataset. Based on the selected properties and their data types, visualization types are recommended, and more can be configured manually to get a visualization such as a map, a bar chart, etc.

Example 26.2 (Automated visualization in LP-VIZ). In contrast to the manual visualization in Example 26.1, in LinkedPipes Visualization the user points to the dataset and the tool, by itself, automatically takes a look at what data is present in the dataset. It then offers possible visualizations, based on the found well-known vocabularies supported by the components registered in the tool.

Criterion 26.1 (Graph visualization). Offer visualizations of LD based on its graph structure.

Criterion 26.2 (Manual visualization). Offer visualizations of LD based on manual mapping from the data to the visualizations.

Criterion 26.3 (Vocabulary-based visualization). Offer automated visualizations of LD based on well-known vocabularies.

4.4. Data output

The other possible goal of our user is to output raw data for further processing. The output data can be in various formats; the most popular may include RDF for LD, which can be exported in various standardized ways. The first means of RDF data output is to a dump file using standardized serializations. The inability to create a dump in all standardized formats (only some) will be classified as partial coverage of the requirement.

Requirement 27 (RDF dump output). Ability to export RDF data to a dump file in all standard RDF serializations.

Example 27.1 (RDF dump output). Various tools may need various RDF serializations to be able to process the output data. The main problem, which can be seen quite often, is that many applications consuming RDF only support triple formats and not quad formats. Therefore, an LDCP should support output in all standard RDF 1.1 serializations, which can be easily achieved by using an existing library such as Apache Jena or Eclipse RDF4J.

Criterion 27.1 (RDF/XML file dump). Ability to dump data to an RDF/XML file.

Criterion 27.2 (Turtle file dump). Ability to dump data to a Turtle file.

Criterion 27.3 (N-Triples file dump). Ability to dump data to an N-Triples file.

Criterion 27.4 (JSON-LD file dump). Ability to dump data to a JSON-LD file.

Criterion 27.5 (RDFa file dump). Ability to dump RDF data to an RDFa annotated file.

Criterion 27.6 (N-Quads file dump). Ability to dump data to an N-Quads file.

Criterion 27.7 (TriG file dump). Ability to dump data to a TriG file.

Another way of outputting RDF data is directly into a triplestore, using one of two standard ways: either the SPARQL Update query [85] or the SPARQL Graph Store HTTP Protocol [86].

Requirement 28 (RDF output using SPARQL). Ability to load RDF data to a SPARQL endpoint using SPARQL Update or the SPARQL Graph Store HTTP Protocol.

Example 28.1 (Loading data using the SPARQL Graph Store HTTP Protocol). For larger datasets, where whole named graphs are manipulated, it is simpler to use the SPARQL Graph Store HTTP Protocol to load data. Basically, a data dump is sent over HTTP to the server instead of being inserted by a SPARQL query. However, despite it being a W3C Recommendation, the implementation of the protocol differs among quad stores. Therefore, besides the endpoint URL, authentication options and the target named graph, an LDCP should also be aware of the specific implementation of the endpoint. This can be seen, e.g., in LinkedPipes ETL's Graph Store Protocol component (https://etl.linkedpipes.com/components/l-graphstoreprotocol).
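For the SPARQL Update route of Req. 28, the simplest case is the standard LOAD operation, sketched below with placeholder IRIs for the dump and the target named graph:

# Ask the store to fetch a published dump and place it into a named
# graph; INSERT DATA would be used for triples supplied inline.
LOAD <http://example.org/dumps/dataset.ttl>
INTO GRAPH <http://example.org/graphs/dataset>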
Criterion 28.1 (SPARQL Update support). Ability to load RDF data to a SPARQL endpoint using SPARQL Update.

Criterion 28.2 (SPARQL: named graphs support). Awareness of named graphs in a SPARQL endpoint and the ability to load data to them.

Criterion 28.3 (SPARQL: authentication support). Ability to access SPARQL Update endpoints which require authentication.

Criterion 28.4 (SPARQL Graph Store HTTP Protocol output). Ability to load RDF data to a SPARQL endpoint using the SPARQL Graph Store HTTP Protocol.

Lastly, an LDCP should be able to write RDF data to Linked Data Platform compliant servers. This may also include support for W3C Recommendations building on top of the LDP, such as Linked Data Notifications [61] for sending notifications to inboxes on the web.

Requirement 29 (Linked Data Platform output). Ability to write data to Linked Data Platform compliant servers.

Example 29.1 (Loading data to the Linked Data Platform). The Linked Data Platform (LDP) specification defines (among others) how new entities should be added to existing containers. Therefore, an LDCP should be aware of this and support the handling of LDP containers. A specific use case may be sending messages according to the Linked Data Notifications W3C Recommendation.

Criterion 29.1 (Linked Data Platform Containers output). Ability to upload data by posting new entities to Linked Data Platform Containers.

Besides being able to output RDF, a perhaps even more common use case of an LDCP will be to prepare LD to be processed locally by other tools which do not know LD. One choice for non-LD data export is CSV for tabular data. The usual way of getting CSV files out of RDF data is by using SPARQL SELECT queries. Such CSV files can now be supplied with additional metadata in JSON-LD according to the Model for Tabular Data and Metadata on the Web (CSVW) [87] W3C Recommendation. Besides tabular data, tree-like data is also popular among 3-star data formats. This includes XML and JSON, both of which the user can get when a standardized RDF serialization in one of those formats, i.e. RDF/XML and JSON-LD, is enough. This is already covered by Req. 27. However, for such data to be usable by other tools, it will probably have to be transformed to the JSON or XML format accepted by the tools, and this is something that an LDCP should also support. For advanced graph visualizations it may be useful to export the data as graph data, e.g. for Gephi [9].

A majority of people today do not know SPARQL and do not want to learn it. However, an LDCP should still allow them to work with LD and use the data in their own tools. Therefore, an LDCP should not only allow exporting 3-star data, but it should support doing so in a user friendly way, with predefined output formats for external tools and, ideally, predefined transformations from vocabularies used in the tool's domain. This observation comes from a recent publication of a Czech registry of subsidies (http://cedr.mfcr.cz/cedr3internetv419/OpenData/DocumentationPage.aspx, in Czech only), which was published as LD only, with N-Triples dumps, dereferenceable IRIs and a public SPARQL endpoint. This step, while being extremely interesting to semantic web researchers, was received negatively by the Czech open data community, which demanded CSV or JSON files and was not willing to learn RDF and SPARQL. This motivates Req. 30 and especially Criterion 30.5.

Requirement 30 (Non-RDF data output). Ability to export non-RDF data such as CSV with CSVW metadata, XML, JSON or graph data for visualizations or further processing in external tools.

Example 30.1 (Gephi support). To support graph visualization in Gephi, which can effectively visualize large data graphs, an LDCP should support output consumable by Gephi. Technically, this is a pair of CSV files with a specific structure. An LDCP could include a wizard for the creation of such files.
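As an illustration, the edge table of such a pair could be produced by a SELECT query along these lines; Gephi's CSV import expects Source and Target columns, and the node table would be produced analogously:

# Produce a Gephi-style edge list: one row per link between two IRI
# resources, keeping the predicate as an edge label.
SELECT (?s AS ?Source) (?o AS ?Target) (?p AS ?Label)
WHERE {
  ?s ?p ?o .
  FILTER(isIRI(?s) && isIRI(?o))
}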
Criterion 30.1 (CSV file output). Ability to export data as CSV files.

Criterion 30.2 (CSV on the Web output). Ability to export data as CSV files accompanied by the CSV on the Web JSON-LD descriptor.

Criterion 30.3 (XML file output). Ability to export data as arbitrarily shaped XML files.

Criterion 30.4 (JSON file output). Ability to export data as arbitrarily shaped JSON files.

Criterion 30.5 (Output for specific third-party tools). Ability to export data for visualization in specific third-party tools.

4.5. Developer and community support

For most of our requirements there already is a tool that satisfies them. The reason why there is no platform integrating these tools is their incompatibility and the large effort needed to integrate them. An LDCP does not have to be a monolithic platform and can consist of a multitude of integrated tools. However, the tools need to support easy integration, which motivates the next requirements. Each part of an LDCP, and the LDCP itself, should use the same API and configuration which it exposes for others to use. The API should again be based on standards, i.e. REST or SOAP based, and the configuration should be consistent with the rest of the data processing, which means in RDF with a defined vocabulary.

Requirement 31 (API). Offer an easy to use, well documented API (REST-like or SOAP based) for all important operations.

Example 31.1 (API). For instance, LinkedPipes ETL offers a REST-like API used by its own frontend to perform operations like running a transformation, creating a transformation pipeline, etc. The API is documented with OpenAPI (f.k.a. Swagger, https://github.com/OAI/OpenAPI-Specification) to help developers with its reuse. An LDCP should also have an API for all operations so that its parts are easily reusable and easily integrated with other software.

Criterion 31.1 (Existence of API). Ability to accept commands via an open API.

Criterion 31.2 (API documentation). Existence of documentation of the API of the tool, such as OpenAPI.

Criterion 31.3 (Full API coverage). Ability to accept all commands via an open API.

Requirement 32 (RDF configuration). Offer RDF configuration where applicable.

Example 32.1 (RDF configuration). All configuration properties and projects of an LDCP should also be stored, or at least accessible, in RDF. This is so that this data can be read, managed and created by third-party RDF tools and also shared easily on the Web, as any other data. For instance, LinkedPipes ETL stores its transformation pipelines in RDF and makes their IRIs dereferenceable, which eases their sharing and offers wider options for their editing.

Criterion 32.1 (Configuration in open format). Configuration stored and accepted in an open format such as JSON or XML.

Criterion 32.2 (Configuration in RDF). Configuration stored and accepted in RDF.

An important quality of a tool available on the web is the nature of its community. This includes the size of the community formed around the tool, its activities, and the repositories for sharing tool plugins and data processes created in the tool. The community should be active and maintain the tool, or at least share new plugins or new data processes.

Since one of the ideas of LD is the distribution of effort, the data processes defined in an LDCP should themselves be shareable and described by an RDF vocabulary, as is the case with, e.g., the Linked Data Visualization Model (LDVM) pipelines [88]. For this, a community repository for the sharing of LDCP plugins, i.e. reusable pieces of software which can extend the functionality of the LDCP, like the visualization components from Req. 26 and the vocabulary-based transformers from Req. 19, should be present and identified for the LDCP. In addition, the data transformation processes defined using the LDCP, i.e. the LDCP process configuration, should also be easily shareable with the community, facilitating effort distribution. An example of such a shared project could be a configuration allowing users to extract subsets of DBpedia as CSV files, etc.

Requirement 33 (Tool community). An active community should exist around the tool in the form of repositories and contributors.

Example 33.1 (Sharing on GitHub). Git is a very popular tool for sharing of and collaboration on software and data, with GitHub as one of its most popular repository implementations. To spark the network effect, an LDCP should have a catalog of plugins and reusable projects on which the community can collaborate. Since the projects themselves should be stored as RDF (Req. 32), they can be easily shared on, e.g., GitHub. This can be illustrated by, e.g., LinkedPipes ETL, which has its repository of plugins (https://github.com/linkedpipes/etl/tree/develop/plugins) and reusable pipeline fragments (https://github.com/linkedpipes/etl/tree/gh-pages/assets/pipelines) which accompany the component documentation.

Criterion 33.1 (Repository for source code). Existence of an openly accessible repository where the source code of the tool is developed.

Criterion 33.2 (Repository for plugins). Existence of an openly accessible repository of plugins, such as GitHub.

Criterion 33.3 (Repository for data processes). Existence of an openly accessible repository of data processes, such as GitHub.

Criterion 33.4 (Community activity). There are at least 2 commits from 2 distinct contributors in the last year in any of the tool related repositories.

When the whole process of data gathering, processing and output is described in an LDCP, it may be useful to expose the resulting transformation process as a web service which can be directly consumed by, e.g., web applications. This could facilitate live data views or customizable caching and update strategies.

Requirement 34 (Deployment of services). Offer deployment of the data transformation process as a web service.

Example 34.1 (Encapsulated SPARQL queries). A very simple example of Req. 34 could be a SPARQL query, possibly with parameters, encapsulated as a web service. This is a common way of helping non-RDF applications consume live RDF data without them even knowing it. However, the process behind the web service can be much more complex, and an LDCP could offer the creation of such services to its users.
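As a sketch, such a service could wrap a query template like the following, binding the $country parameter from the HTTP request before execution; the DBpedia properties and the service itself are illustrative:

PREFIX dbo: <http://dbpedia.org/ontology/>

# Template behind a hypothetical /cities?country=... web service:
# the service substitutes $country and returns the results as CSV
# or JSON to the non-RDF client application.
SELECT ?city ?population
WHERE {
  ?city dbo:country $country ;
        dbo:populationTotal ?population .
}
ORDER BY DESC(?population)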
An important quality of a tool available on the web is the nature of its community. This includes the size of the community formed around the tool, its activities, and repositories for sharing tool plugins and data processes created in the tool. The community should be active and maintain the tool, or at least share new plugins or new data processes.

Since one of the ideas of LD is distribution of effort, the data processes defined in an LDCP should themselves be shareable and described by an RDF vocabulary, as is the case with, e.g., the Linked Data Visualization Model (LDVM) pipelines [88]. For this, a community repository for sharing of LDCP plugins, i.e. reusable pieces of software which can extend the functionality of the LDCP, like visualization components from Req. 26 and vocabulary-based transformers from Req. 19, should be present and identified for the LDCP. In addition, the data transformation processes defined using the LDCP, i.e. the LDCP process configuration, should also be easily shareable with the community, facilitating effort distribution. An example of such a shared project could be a configuration allowing users to extract subsets of DBpedia as CSV files, etc.

Requirement 33 (Tool community). An active community should exist around the tool in the form of repositories and contributors.

Example 33.1 (Sharing on GitHub). Git is a very popular tool for sharing of and collaboration on software and data, with GitHub as one of its most popular repository implementations. To spark the network effect, an LDCP should have a catalog of plugins and reusable projects on which the community can collaborate. Since projects themselves should be stored as RDF (Req. 32), they can be easily shared on, e.g., GitHub. This can be illustrated by, e.g., LinkedPipes ETL, which has its repository of plugins (https://github.com/linkedpipes/etl/tree/develop/plugins) and reusable pipeline fragments (https://github.com/linkedpipes/etl/tree/gh-pages/assets/pipelines) which accompany the component documentation.

Criterion 33.1 (Repository for source code). Existence of an openly accessible repository where the source code of the tool is developed.

Criterion 33.2 (Repository for plugins). Existence of an openly accessible repository of plugins such as GitHub.

Criterion 33.3 (Repository for data processes). Existence of an openly accessible repository of data processes such as GitHub.

Criterion 33.4 (Community activity). There are at least 2 commits from 2 distinct contributors in the last year to any of the tool related repositories.

When the whole process of data gathering, processing and output is described in an LDCP, it may be useful to expose the resulting transformation process as a web service, which can be directly consumed by, e.g., web applications. This could facilitate live data views or customizable caching and update strategies.

Requirement 34 (Deployment of services). Offer deployment of the data transformation process as a web service.

Example 34.1 (Encapsulated SPARQL queries). A very simple example of Req. 34 could be a SPARQL query, possibly with parameters, encapsulated as a web service. This is a common way of helping non-RDF applications consume live RDF data without even knowing it. However, the process behind the web service can be much more complex, and an LDCP could offer creation of such services to its users.

Criterion 34.1 (Deployment of services). Ability to deploy the data transformation process as a web service.
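A minimal sketch of Example 34.1, assuming Python with Flask and the public DBpedia SPARQL endpoint; the service path, parameter name and query are illustrative choices, not a prescribed design. A client simply calls /cities?country=Germany and receives JSON, never seeing the SPARQL behind it.

    # A sketch of a parameterized SPARQL query wrapped as a web service
    # (Example 34.1). The endpoint, route and parameter are illustrative.
    import requests
    from flask import Flask, jsonify, request

    app = Flask(__name__)
    ENDPOINT = "https://dbpedia.org/sparql"  # any SPARQL 1.1 endpoint

    QUERY = """
    PREFIX dbo:  <http://dbpedia.org/ontology/>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?city ?population WHERE {
      ?city a dbo:City ;
            dbo:country ?country ;
            dbo:populationTotal ?population .
      ?country rdfs:label "%s"@en .
    } LIMIT 100
    """

    @app.route("/cities")
    def cities():
        # A production service would sanitize the parameter value.
        country = request.args.get("country", "Germany")
        # Standard SPARQL Protocol: the query travels as an HTTP parameter
        # and the JSON results format is requested via the Accept header.
        r = requests.get(ENDPOINT, params={"query": QUERY % country},
                         headers={"Accept": "application/sparql-results+json"})
        r.raise_for_status()
        rows = r.json()["results"]["bindings"]
        return jsonify([{"city": b["city"]["value"],
                         "population": b["population"]["value"]} for b in rows])

    if __name__ == "__main__":
        app.run()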

5. Tools evaluation and description

In this section, we provide the evaluation results of 16 tools classified as an LDCP according to the methodology described in Section 3.2. We evaluated all 94 criteria for each of the 16 tools. The detailed results of the evaluation of each of the 16 tools can be seen in Table 6, Table 7 and Table 8 in Section A. Next, as detailed in Section 3.4, each tool's requirement score is computed as the percentage of passed criteria of each requirement. These scores can be seen in Table 3 for non-LD experts and in Table 4 for LD experts. Finally, each tool's requirement group score for each group of requirements defined in Section 4 is computed as the average of the scores of the requirements in the group. These can be seen in Table 1 for non-LD experts and in Table 2 for LD experts.

In this section, we first briefly describe each evaluated tool and point out the fulfilled requirements. Then we discuss the evaluation results and identify the tools which can be considered as an LDCP because they satisfy at least the minimal conditions described in the introduction. Not all evaluated tools meet even the minimal conditions. However, this does not mean that they are useless for LD consumption. We also discuss the evaluation results for these tools in the text.
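As a worked illustration of the scoring described above, the following sketch uses invented numbers, not values from the actual evaluation:

    # A sketch of the scoring described above, using invented numbers.
    # A requirement score is the percentage of the requirement's criteria
    # that a tool passes; a group score averages its requirement scores.

    def requirement_score(passed: int, total: int) -> float:
        return 100.0 * passed / total

    def group_score(scores: list) -> float:
        return sum(scores) / len(scores)

    # A tool passing 3 of 4 criteria of one requirement (75%) and 0 of 2
    # criteria of another (0%) scores 37.5% for the group of the two.
    print(group_score([requirement_score(3, 4), requirement_score(0, 2)]))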

Table 1
Requirement group scores of evaluated candidates for non-LD experts. Groups, in order: Dataset discovery, Data manipulation, Data visualization, Data output, Developer and community support; the last value in each row is the Average over the groups. Empty cells (0%) are omitted.
LP-VIZ 9% 33% 13% 11%
V2 1% 33% 7%
VISU 0%
Sp 8% 1% 67% 15%
YAS 5% 67% 13% 17%
UV 5% 31% 48% 17%
LP-ETL 4% 31% 52% 18%
OC 8% 4% 33% 25% 6% 15%
DL 5% 23% 13% 8%
LAUN 4% 4% 6% 3%
LDR 1% 67% 6% 15%
SF 10% 2% 13% 5%
ELD 1% 33% 6% 8%
PEPE 1% 6% 2%
FAGI 2% 33% 6% 13% 11%
RDFPRO 4% 28% 13% 9%

Table 2
Requirement group scores of evaluated candidates for LD experts. Same layout as Table 1; empty cells (0%) are omitted.
LP-VIZ 10% 33% 13% 11%
V2 5% 4% 33% 8%
VISU 5% 3% 33% 10% 10%
Sp 13% 1% 67% 16%
YAS 5% 1% 67% 5% 13% 18%
UV 17% 55% 48% 24%
LP-ETL 17% 65% 94% 35%
OC 23% 17% 33% 30% 6% 22%
DL 22% 33% 23% 13% 18%
LAUN 5% 4% 4% 6% 4%
LDR 3% 67% 6% 15%
SF 10% 2% 13% 5%
ELD 3% 33% 6% 8%
PEPE 5% 1% 6% 3%
FAGI 5% 33% 13% 13% 13%
RDFPRO 5% 17% 28% 25% 15%

5.1. Description of evaluated tools

In this section, we introduce the 16 individual tools which we evaluated using our 94 criteria.


Table 3
Requirement scores of evaluated candidates from the point of view of non-LD experts. Requirements are grouped according to the sub-sections of Section 4. Scores are listed in the tool order LP-VIZ, V2, VISU, Sp, YAS, UV, LP-ETL, OC, DL, LAUN, LDR, SF, ELD, PEPE, FAGI, RDFPRO, with empty cells (0%) omitted; – denotes a row with no scores.

Dataset discovery: Gathering metadata
Catalog support: 25%
Catalog-free discovery: –
Dataset discovery: Search user interface
Search query language: 25% 50% 50%
Search query formulation assistance: 50% 50% 50%
Dataset discovery: Search query evaluation
Search query expansion: –
Search results extension: –
Dataset discovery: Displaying search results
Dataset ranking: –
Preview based on vocabularies: –
Data preview: –
Quality indicators: –
Data manipulation: Data input
IRI dereferencing: –
RDF input using SPARQL: 20% 20% 20% 20% 20% 20% 20%
RDF dump load: 9% 73% 64% 55% 73% 55% 36% 9% 64%
Linked Data Platform input: –
Monitoring of changes in input data: –
Version management: –
Data manipulation: Analysis of semantic relationships
Semantic relationship analysis: –
Semantic relationship deduction: –
Data manipulation: Semantic transformations
Transformations based on vocabularies: –
Inference and resource fusion: –
Selection and projection: –
Custom transformations: –
Automated data manipulation: 100%
Data manipulation: Provenance and license management
Provenance: –
License management: –
Data visualization
Visualization: 33% 33% 67% 67% 33% 67% 33% 33%
Data output
RDF dump output: 100% 100% 100% 71% 14% 86%
RDF output using SPARQL: 25% 25% 25% 25%
Linked Data Platform output: –
Non-RDF data output: 20%
Developer and community support
API: 67% 33%
RDF configuration: 50%
Tool community: 50% 50% 75% 75% 25% 50% 25% 25% 50% 25% 25% 50% 50%
Deployment of services: 100%
Average: 6% 2% 0% 5% 5% 11% 12% 8% 6% 3% 3% 5% 2% 1% 4% 7%

5.1.1. LinkedPipes Visualization
LinkedPipes Visualization [10] (LP-VIZ) is a tool providing automated visualizations of LD, fulfilling Req. 8 Preview based on vocabularies and Req. 26 Visualization. It queries the input data, given as an RDF dump or as a SPARQL endpoint, for known vocabularies, and based on the result it offers appropriate visualizations. These include an interactive RDF data cube visualization, various types of SKOS hierarchy visualizations and a map based visualization (see Figure 2), which can be further filtered based on concepts connected to the points on the map. In addition, in the RDF data cube visualization it also employs Req. 11 IRI dereferencing to search for labels of entities.
As to the other parts of the LD consumption process, no requirements were satisfied in the Data output and Developer and community support groups.

Fig. 2. LinkedPipes Visualization - Points on a map

5.1.2. V2
V2 [89] is an on-line demo for visualization of vocabularies used in RDF data. In the first step the user provides a list of SPARQL endpoints. V2 executes a set of predefined SPARQL queries to analyze the vocabularies used in the data and creates the dataset preview. On the top level of the preview, the used classes are presented (see Figure 3). The user can select a class to see information about the number of instances and about the predicates used. V2 is simple; nevertheless, it can be used to address the Dataset discovery part of the LD consumption process; specifically, it covers Data preview.

Fig. 3. V2 visualization

5.1.3. VISU
VISUalization Playground [89] (VISU) is an on-line service for visualization of SPARQL results. It uses predefined or user provided SPARQL SELECT queries to retrieve data from given SPARQL endpoints. Then it uses the Google Charts Editor to create a wide range of visualizations (see Figure 4) based on the output table, including line charts, tree charts or map visualizations. The main focus of VISU is on Data visualization.

Fig. 4. VISU visualization

5.1.4. Sparklis
Sparklis [28] (Sp) is an on-line Linked Open Data exploration tool that does not require the user to have knowledge of SPARQL. The user can specify a SPARQL endpoint and explore its contents using a so-called focus, which represents the user's intent. Its iterative refinement corresponds to the exploration process. The focus is defined using three types of constraints: concept, entity and modifier. The concept constraint allows the user to define subject type(s). The entity constraint restricts the focus to a single entity. Modifiers represent logical operations and filters.
In each step of the focus refinement process, the user is guided by suggestions for all three types of constraints. Sparklis offers an on-the-fly data visualization in the form of a table or, in the case of geolocalized data, a map. A query corresponding to the current focus is accessible in the integrated YASGUI tool, which provides additional options for visualization and data export.

Fig. 5. Sparklis user interface for data exploration

From the point of view of the LD consumption process, Sparklis covers mainly Dataset discovery and Data manipulation.


Table 4
Requirement scores of evaluated candidates from the point of view of LD experts. Requirements are grouped according to the sub-sections of Section 4. Scores are listed in the tool order LP-VIZ, V2, VISU, Sp, YAS, UV, LP-ETL, OC, DL, LAUN, LDR, SF, ELD, PEPE, FAGI, RDFPRO, with empty cells (0%) omitted; – denotes a row with no scores.

Dataset discovery: Gathering metadata
Catalog support: 25%
Catalog-free discovery: –
Dataset discovery: Search user interface
Search query language: 25% 50% 50%
Search query formulation assistance: 50% 50% 50% 50%
Dataset discovery: Search query evaluation
Search query expansion: –
Search results extension: –
Dataset discovery: Displaying search results
Dataset ranking: –
Preview based on vocabularies: –
Data preview: 50% 50% 50% 100% 50% 50% 50%
Quality indicators: –
Data manipulation: Data input
IRI dereferencing: –
RDF input using SPARQL: 40% 20% 40% 20% 20% 80% 80% 40% 60% 40% 40% 20% 40% 20%
RDF dump load: 9% 73% 73% 64% 73% 55% 36% 9% 64%
Linked Data Platform input: –
Monitoring of changes in input data: –
Version management: –
Data manipulation: Analysis of semantic relationships
Semantic relationship analysis: 33%
Semantic relationship deduction: 50% 50%
Data manipulation: Semantic transformations
Transformations based on vocabularies: 50%
Inference and resource fusion: 33% 67%
Selection and projection: –
Custom transformations: 100% 100% 100% 100% 100%
Automated data manipulation: 100%
Data manipulation: Provenance and license management
Provenance: –
License management: –
Data visualization
Visualization: 33% 33% 33% 67% 67% 33% 33% 67% 33% 33%
Data output
RDF dump output: 100% 100% 100% 71% 14% 86%
RDF output using SPARQL: 100% 100% 50% 25%
Linked Data Platform output: –
Non-RDF data output: 40% 20% 20% 60% 20% 20%
Developer and community support
API: 67% 100%
RDF configuration: 50% 100% 50%
Tool community: 50% 50% 75% 75% 25% 50% 25% 25% 50% 25% 25% 50% 50%
Deployment of services: 100%
Average: 7% 4% 5% 6% 6% 20% 26% 19% 15% 4% 4% 5% 3% 3% 6% 15%

5.1.5. YASGUI
Yet Another SPARQL GUI [90] (YAS) is a web-based SPARQL query editor. As such it offers the ability for the user to specify a SPARQL query without knowing the precise syntax. Once the query is executed, YASGUI provides different visualizations, such as table, pivot table, Google Chart and map visualizations, and export of SPARQL SELECT results to a CSV file.

Fig. 6. YASGUI SPARQL query builder interface

The aim of YASGUI is not to be used as a standalone tool, but rather to be integrated as a component. Still, it covers several areas of the LD consumption process, mainly Data manipulation, Data visualization and Data output. It is currently used in multiple LD based tools (https://github.com/OpenTriply/YASGUI.YASQE), e.g. the Apache Jena Fuseki, Eclipse RDF4J and OntoText GraphDB triplestores.
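The CSV export YASGUI offers can be reproduced in a few lines against any endpoint speaking the standard SPARQL Protocol; the endpoint and query below are illustrative. Note that SPARQL 1.1 also standardizes a text/csv results serialization (SPARQL 1.1 Query Results CSV and TSV Formats), so some endpoints can return CSV directly.

    # A sketch of exporting SPARQL SELECT results to CSV, the kind of
    # non-RDF output YASGUI provides. Endpoint and query are examples.
    import csv
    import requests

    ENDPOINT = "https://dbpedia.org/sparql"
    QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"

    r = requests.get(ENDPOINT, params={"query": QUERY},
                     headers={"Accept": "application/sparql-results+json"})
    r.raise_for_status()
    data = r.json()
    variables = data["head"]["vars"]  # the SELECT variables become columns

    with open("results.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(variables)
        for binding in data["results"]["bindings"]:
            writer.writerow([binding.get(v, {}).get("value", "")
                             for v in variables])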

5.1.6. UnifiedViews
UnifiedViews [91] (UV) is a predecessor of LinkedPipes ETL, which means that it is also an open-source tool for design and execution of ETL pipelines (see Figure 7), and it is also focused on RDF and linked data and features a library of reusable components. Its requirements coverage is therefore a subset of the coverage of LinkedPipes ETL.

Fig. 7. UnifiedViews transformation pipeline

While it can be successfully used for LD publication, which is its primary purpose, its usage for LD consumption is limited to expert users only, due to its lack of proper documentation, incomplete API coverage and proprietary configuration formats.

5.1.7. LinkedPipes ETL
LinkedPipes ETL [92] (LP-ETL) is an open-source tool for design and execution of ETL (Extract, Transform, Load) pipelines (see Figure 8), with a library of more than 60 reusable components which can be used in the pipelines and which very well support the Data manipulation and Data output requirement groups. The tool is primarily focused on working with RDF data and features thorough documentation, including how-tos and tutorials for typical LD ETL scenarios. LP-ETL also helps the user, who still has to be an LD expert though, to build a data transformation pipeline using smart suggestions of components to use. LinkedPipes ETL is developed as a replacement for UnifiedViews.

Fig. 8. LinkedPipes ETL transformation pipeline

Regarding the LD consumption process, the most notable features are its capability of working with larger data using the so-called data chunking approach [93], satisfying C 12.5 SPARQL: advanced download. In the Developer and community support requirements group, the satisfaction of C 31.3 Full API coverage and C 32.2 Configuration in RDF means that the tool can be easily integrated with other systems.
LinkedPipes ETL does not support the user in Dataset discovery or Data visualization.

5.1.8. OpenCube
OpenCube [94] (OC) is an open source project which supports publication of statistical data using the RDF Data Cube Vocabulary and also provides visualizations and data analytics on top of the statistical cubes. OpenCube wraps another tool, the Information Workbench [95], and thus it has all the functionality provided by the Information Workbench (see Figure 9).

Fig. 9. OpenCube user interface in Information Workbench

OpenCube also integrates several other tools such as R and Grafter. The created visualization can be embedded into an HTML page as an interactive script.

5.1.9. Datalift
Datalift [96] (DL) aims at semantic lifting of non-RDF data, but it can also be used to consume RDF data. It uses projects to group different data sources together, and it allows users to perform transformations such as SPARQL CONSTRUCT, String to URI, IRI translation and Silk link discovery, supporting Data manipulation. Finally, it provides visualizations and RDF and CSV data export, fulfilling Req. 26 Visualization, Req. 27 RDF dump output, and C 30.1 CSV file output, respectively.

Fig. 10. Datalift user interface

The web-based user interface is simple and user friendly (see Figure 10). Datalift provides out of the box manual visualization with support for several chart types like line chart, table chart, bar chart, timeline, tree map, geographical map, etc.

5.1.10. LODLaundromat
LODLaundromat [97] (LAUN) is an open-source service, with a web user interface (see Figure 11), designed to provide access to Linked Open Data. It aims to tackle the problem of data quality by converting the data into a single format, removing duplicates and blank nodes, and fixing syntax errors.

Fig. 11. LOD Laundromat data input user interface

Users can submit data via a custom form for processing and cleaning. Once the data is clean, it can be downloaded or queried using a linked data fragments interface. LODLaundromat has strong support in the Data input functionality. As data output, it only supports C 27.3 N-Triples file dump.

5.1.11. LD-R
Linked Data Reactor [98] (LDR) is a framework consisting of user interface components for Linked Data applications. It is built to follow the Flux architecture (https://facebook.github.io/flux/) using ReactJS (https://reactjs.org/). When installed, LD-R can also be used as a web-based application (see Figure 12). It allows users to list resources, filter them, edit them and visualize them. Most of this functionality, however, requires changing the configuration and thus requires the user to have expert knowledge of LD.

Fig. 12. LD-R user interface

Nevertheless, once set up, most of the functionality is non-LD expert friendly. LD-R mainly covers requirements in the Data input and Data visualization sections.

5.1.12. SemFacet
SemFacet [30] (SF) is an open-source tool implementing a semantic facet based search system. The tool has a web-based user interface (see Figure 13) and is designed solely for the purpose of searching, with a focus on faceted search. Users can import custom data and an OWL ontology. SemFacet then utilizes the provided ontology to enhance the facet based search.

Fig. 13. SemFacet user interface

As the main focus of SemFacet is search, it offers functionality mainly in the area of Data input and it fulfills C 4.2 Faceted search.

5.1.13. ESTA-LD
ESTA-LD [99] (ELD) is a web-based application for viewing RDF data cubes with spatial and temporal data. ESTA-LD automatically searches the user provided SPARQL endpoint and provides the user with a selection of graphs and datasets found in the endpoint. The user can also select measures and temporal and spatial information that is visualized (see Figure 14).

Fig. 14. ESTA-LD user interface

Most of the ESTA-LD functionality is domain specific, focused on data cubes and spatial and temporal data. From the point of view of an LDCP, it supports the user in the area of Data input and it fulfills C 26.3 Vocabulary-based visualization.

5.1.14. PepeSearch
PepeSearch [100] (PEPE) is an open-source tool for LD exploration. It consists of two main components: a tool for preprocessing data in a given SPARQL endpoint and a web-based application (see Figure 15). After preprocessing the data with the tool, the user can add it to the web-based application and explore it. In the exploration phase the user can select classes, properties and relations that are visualized in the form of an HTML table.

Fig. 15. PepeSearch user interface

The web application mostly fulfills C 9.2 Schema extraction. The schema is created by the preprocessing tool, which provides the application with the Data input functionality.

5.1.15. FAGI-gis
FAGI-gis [101] (FAGI) is an open-source tool for fusion of RDF spatial data, supported by a W3C community and business group (https://www.w3.org/community/geosemweb). The tool can import data from SPARQL endpoints, using graphs to distinguish input datasets (see Figure 16). It relies on the user to provide a mapping (linking) according to which the data fusion will be performed. After the data and links are loaded into the tool, the user can manipulate the spatial information of entities on a map and specify the fusion conditions on RDF properties. The resulting fused data can be uploaded to a SPARQL endpoint.

Fig. 16. FAGI-gis user interface

In terms of the identified LDCP requirements, FAGI-gis supports the areas of Data input, Data visualization and Data output, and implements C 20.2 Resource fusion.

5.1.16. RDFpro
RDFpro [56] (RDFPRO) is an open-source tool for transformation of LD. The transformation process is described in a script and can be executed using a command line or web-based interface (see Figure 17).

Fig. 17. RDFpro user interface

The main advantage of RDFpro is the performance which can be achieved during the LD transformation, satisfying C 13.8 Handling of bigger files. RDFpro also supports the user mainly in Data input, Semantic transformations and Data output.

5.2. Discussion

As we have said earlier in the introduction, we show in this section which of the evaluated tools are LDCPs, either in the minimal sense specified in the introduction or with some additional features relevant for the consumption process. We introduced three conditions each LDCP must fulfill.

First, an LDCP must support loading LD either from a SPARQL endpoint or a dump file. In terms of our requirements, this means that the tool must fulfill the requirement RDF input using SPARQL (i.e. loading LD from a SPARQL endpoint) or the requirement RDF dump load (loading LD from a dump file). We do not consider the other requirements describing possible ways of providing input data, i.e. the requirements IRI dereferencing and Linked Data Platform input, because Table 3 and Table 4 show that none of the evaluated tools fulfills them.

Second, the tool must be able to provide the loaded data on output either in the form of a visualization or represented in some non-RDF data format.


In terms of our requirements, this means that the tool must fulfill the requirement Visualization (i.e. a visualization of the data) or the requirement Non-RDF data output (i.e. output of data in a non-RDF format).

Table 5
Requirement fulfillment for the LDCP specific requirements (RDF input using SPARQL, RDF dump load, Visualization, Non-RDF data output, RDF dump output and RDF output using SPARQL) for each of the 16 tools. ✓ means fulfillment for non-LD experts, (✓) means fulfillment for LD experts only, and – means no fulfillment.

Third, non-LD experts must be able to use both features without any knowledge of LD technologies. In the introduction, we have explained that this means that the tool provides a user interface which does not require the users to know the RDF model, its serializations or the SPARQL query language.

We denote tools which meet the three conditions as LDCPs. However, the tools which fulfill only the first and the second condition are also important. They support the basic LD consumption activities, but only LD experts can use them because the tools assume knowledge of the RDF data model, its serializations or the SPARQL query language.

Table 5 shows the evaluation of the above mentioned requirements important for identifying LDCPs. For each requirement and evaluated tool, it shows the maximum result achieved for any of the criteria defined for the requirement. The detailed evaluation of individual criteria is presented in Table 6, Table 7 and Table 8 in Section A. The value ✓ means that at least one criterion defined for the requirement is satisfied by the tool in a way usable for non-LD experts. The value (✓) means that at least one criterion defined for the requirement is satisfied by the tool in a way usable for LD experts, but there is no criterion satisfied in a way usable for non-LD experts. Therefore, the evaluation presented in Table 5 shows that

– the tools LinkedPipes Visualization, V2, Sparklis, OpenCube, LD-R, ESTA-LD and FAGI-gis are LDCPs because they enable non-LD experts to load LD and visualize them,
– the tool Datalift is an LDCP because it enables non-LD experts to load LD and provides an output in a non-RDF data format, and
– the tools VISU, YASGUI, UnifiedViews and LinkedPipes ETL are not LDCPs, but they can be used for LD consumption by LD experts only. These tools support the minimal features for LD consumption; however, they provide these features only for LD experts.

Table 3 and Table 4 show that some of these tools also support additional features represented by the other requirements. Table 3 shows which additional requirements are fulfilled, and to which extent, by the tools from the point of view of non-LD experts. Table 4 shows this from the point of view of LD experts.

Table 5 shows that the rest of the evaluated tools, i.e. LODLaundromat, SemFacet, PepeSearch and RDFpro, cannot be considered as LDCPs because they provide neither a visualization nor a non-RDF representation on the output. However, Table 5 shows that LODLaundromat and RDFpro provide an RDF output in the form of a dump. Moreover, RDFpro also provides its RDF output in a SPARQL endpoint. Therefore, both tools can be combined with the LDCPs mentioned above because their RDF output may serve as input for the LDCPs. Also the tools UnifiedViews and LinkedPipes ETL, which are classified above as LD consumption tools for LD experts, provide an RDF output and can therefore be combined with the LDCPs. And also the LDCPs OpenCube, Datalift and FAGI-gis provide an RDF output, so it is possible to combine them with other LDCPs.

Figure 18 summarizes the above discussion.
The tools displayed in green boxes were classified as LDCPs. The tools in blue boxes were classified as tools which support LD experts in LD consumption but are not suitable for non-LD experts and are therefore not LDCPs. The tools in gray boxes provide some additional features which are relevant for LD consumption. As shown in Table 5, they provide some features suitable even for non-LD experts. The tools on the left hand side of Figure 18 provide an RDF output which can be used as an RDF input for the other tools in the figure. This is depicted by the connecting arrows with the label extends.

Fig. 18. Classification of evaluated tools: (green) tools classified as LDCPs, (blue) tools supporting LD experts in LD consumption and (gray) tools providing additional LD consumption features. The arrows show that tools at their source provide an RDF output which can be consumed as an input of tools at their target.

In total, we identified 8 LDCPs, which is not a small number. Moreover, all evaluated tools have significant added value for LD consumers. In summary, all evaluated tools enable users to load LD and perform some useful actions, at least to preview the loaded LD in the case of PepeSearch. There are tools providing automated visualization of published data (LinkedPipes Visualization), manual visualization of published data (V2, VISU), exploration of given data (Sparklis), SPARQL query building capabilities (YASGUI), and then there are tools mainly focused on publishing LD which can also be used for LD consumption (UnifiedViews, LinkedPipes ETL, OpenCube, Datalift).

However, the evaluation presented in Table 3 and Table 4 also shows that there is a lack of tools covering certain steps of the LD consumption process, even for LD experts. In the category of Dataset discovery, there are no tools providing dataset search capabilities built on top of LD principles, which is demonstrated by the fact that no tool fulfills the requirements Catalog-free discovery, Search query expansion and Search results extension. The existing tools are limited to requiring a link to an RDF dump or a SPARQL endpoint from their users as an input, which could, ideally, be provided by dataset discovery techniques. OpenCube supports dataset discovery, but only on top of CKAN APIs, which is a proprietary non-LD API.

The evaluated tools do not inform users about the quality of LD, as there are no tools fulfilling the requirement Quality indicators. No tools support users in management of provenance information and licensing, which can be seen from the evaluation of the requirements Provenance and License management. The tools do not support users in change and version management, which is shown by the requirements Monitoring of changes in input data and Version management not being fulfilled by any of the tools. These parts of the LD consumption process are well covered by existing W3C standards and literature, as we have shown in Section 4, but this theory is not reflected by the tools.

In the category of Developer and community support, 8 of the 16 evaluated tools passed C 33.4 Community activity; the rest are not actively maintained. However, most of the tools, 13, have their source code in a publicly accessible repository. What could be improved is the presence of well-known repositories for plug-ins and projects, which most of the evaluated tools lack and which could boost community adoption.

If we restrict ourselves only to the non-LD expert view presented in Table 3, we must conclude that the current tools support non-LD experts insufficiently in the LD consumption activities, with only basic search interfaces and visualization capabilities. This enables non-LD experts to consume LD in a limited way, i.e. load given LD and visualize it somehow. However, the tools do not provide non-LD experts with features which would enable them to exploit the full range of benefits of LD. Their only remaining option is to learn LD technologies.

6. Related work

In this section we go over related work in the sense of similar studies of LD consumption possibilities. We structure related work into two parts. First, there are existing surveys of research areas which correspond to some parts of our LD consumption process. We summarize them in Section 6.1.
We recommend them to our readers as more complete overviews of the existing research literature in these concrete areas. For each survey we show the related requirements from Section 4. However, not all areas related to LD consumption are covered by surveys. In this paper we do not aim to provide a complete survey of the research literature for these areas. We only describe selected works in these areas or W3C Recommendations to support our introduced requirements.

Second, there are works which are surveys of software tools which can be used for LD consumption. These tools are close to our survey presented in this paper. We summarize them in Section 6.2 and compare them to ours.

6.1. Complementary surveys

The Survey on Linked Data Exploration Systems [102] provides an overview of 16 existing tools in three categories, classified according to 16 criteria. The first category represents the LD browsers. The second category represents domain-specific and cross-domain recommenders, which should be considered for integration into an LDCP regarding, e.g., our Req. 6 Search results extension. The third category represents the exploratory search systems (ESS), which could be considered for inclusion into an LDCP regarding Req. 2 Catalog-free discovery.

A recent survey on RDF Dataset Profiling [15] surveys methods and tools for LOD dataset profiling. The survey identifies and compares 20 existing profiling tools which could be considered for inclusion into an LDCP regarding Req. 2 Catalog-free discovery. The surveyed systems help to find the data the user is looking for. However, in the scope of an LDCP, this is only the beginning of the consumption, which is typically followed by processing of the found data, leading to a custom visualization or data in a format for further processing.

In [103] the authors introduce a system which provides a visual interface to assist users in defining SPARQL queries. The paper compares 9 existing software tools which provide such assistance to the users. These tools could be integrated into an LDCP to cover Req. 3 Search query language and Req. 4 Search query formulation assistance.

A recent survey on Ontology Matching [78] indicates there is a growing interest in the field, which could prove very useful when integrated into an LDCP, covering Req. 17 Semantic relationship analysis and Req. 18 Semantic relationship deduction.

A survey on Quality Assessment for Linked Data [41] defines 18 quality dimensions, such as availability, consistency and completeness, and 69 finer-grained metrics. The authors also analyze 30 core approaches to LD quality and 12 tools, which creates a comprehensive base of quality indicators to be used to cover Req. 10 Quality indicators. One of the quality dimensions deals with licensing, which can be a good base for Req. 25 License management, which is, so far, left uncovered.
35 36 Search query language and Req. 4 Search query formu- The only detail present is evaluation of supported RDF 36 37 lation assistance. serialization formats. Therefore, our study is more de- 37 38 A recent survey on Ontology Matching [78] indicates tailed and precise. Third, they evaluate the tools only 38 39 there is a growing interest in the field, which could based on the papers where the tools are described. It is 39 40 prove very useful when integrated into an LDCP, cover- not described whether tools were installed and used. In 40 41 ing Req. 17 Semantic relationship analysis and Req. 18 our study, this is one of the selection criteria. Therefore, 41 42 Semantic relationship deduction. our work is more useful for the community not only 42 43 A survey on Quality Assessment for Linked Data because it is more recent, detailed and precise but also 43 44 [41] defines 18 quality dimensions such as availability, because it informs the readers about the tools which 44 45 consistency and completeness and 69 finer-grained met- can be actually used and which requirements they really 45 46 rics. The authors also analyze 30 core approaches to LD cover in their run-time. 46 47 quality and 12 tools, which creates a comprehensive Another close paper is our own study [4] conducted 47 48 base of quality indicators to be used to cover Req. 10 at the end of 2015 which contains a brief survey on the 48 49 Quality indicators. One of the quality dimensions deals same topic. The novel survey presented in this paper is 49 50 with licensing, which can be a good base for Req. 25 its significant extension. We describe the LD consump- 50 51 License management, which is, so far, left uncovered. tion process and compare it to the existing literature. 51 36 Klímek J., Škoda P., NeˇcaskýM. / Survey of Tools for Linked Data Consumption

We revise the evaluation requirements, put them into the context of the LD consumption process and make their evaluation more precise and repeatable by introducing exact evaluation criteria. Both the requirements and the criteria are now supported by the existing literature and W3C Recommendations. In [4] the evaluated tools were selected based on a quick literature survey, while in this paper we selected the tools based on a systematic, deep and repeatable survey. The evaluation results are presented in greater detail. Last but not least, the evaluated tools in this paper are more recent, and we checked their availability recently (November 2017).

7. Conclusions

In this paper, we identified the need for a software tool which would provide access to Linked Data to consumers who are experts in data consumption or processing but not experts on LD technologies. We call such a software tool a Linked Data Consumption Platform (LDCP) and the target consumers non-LD experts. We motivated our work by a task given to data journalists who need to write an article about the population of European cities and who want to attach data in a CSV file to the article. Based on the current literature and W3C standards related to LD consumption, we described the LD consumption process and in turn defined 34 requirements on tools implementing it, which were further split into 94 evaluation criteria. Then we performed a survey of 110 existing tools, which after 3 elimination rounds ended in 16 tools which we evaluated using all criteria. We evaluated the tools from the point of view of non-LD experts as well as LD experts.

We identified 8 software tools which can be classified as an LDCP, i.e. a tool which enables non-LD experts to consume LD at least in a minimal sense. This minimal sense means that the tool has the ability to load selected LD and present it in the form of a data visualization or a representation in a non-RDF format, e.g., CSV. Moreover, we have shown that the rest of the evaluated tools, even when not classified as an LDCP, support either LD experts in LD consumption or both kinds of users in certain parts of the LD consumption process. We have also shown that the tools can be combined together, because many of them provide RDF output which can be used as input by other tools.

On the other hand, we have also shown that there are still significant steps in LD consumption which are covered by the existing tools insufficiently (see Table 3 and Table 4). Basic LD consumption activities are already supported by the existing tools. However, non-LD experts still cannot exploit many benefits of the LD principles and technologies. This poses a problem for LD adoption, as consumers often still find consuming 3-star data such as CSV files more comfortable than working with LD. At the same time, potential LD publishers are having a hard time convincing their superiors to invest the additional effort into LD publication when there are no immediately appreciable benefits of doing so.

Without a proper LDCP, even experts willing to learn LD technologies are having a hard time seeing the benefits in processing RDF files using SPARQL. Even for them, it is still easier to process CSV files using their own scripts or their favorite tabular data processor. Based on our evaluation, we therefore conclude that there are large gaps in multiple areas of LD consumption, especially for non-LD experts, which still need to be bridged by non-LD expert friendly tools. These would enable data consumers to exploit the advantages of working with LD, which would in turn motivate more data publishers to publish LD, as their consumers would immediately get a better experience. A suitable first step would be to implement a generic extensible open-source framework to provide support for more specific LDCPs. The framework would have to provide a quality implementation of the minimal LDCP properties. These include loading LD in the various ways currently available, providing non-RDF output and providing a user interface suitable for non-LD experts. This framework could then be extended by the community in a modular "pay as you go" fashion, implementing the requirements identified in Section 4.
Our survey may serve as an overview of expected functionality for anyone who would like to build a tool, or integrate existing tools, for LD consumption or selected parts of the consumption process. It may also serve as a starting point for anyone interested in what benefits to expect when publishing data as LD.

As to future research directions, the possibilities of integration of individual tools into bigger platforms should be investigated. Our survey shows that, on the one hand, there is enough research and there are enough specifications to make LD at least as usable to non-LD experts as, e.g., CSV files, but on the other hand, present tools lack implementations of these features. Future tools should implement the advanced features exploiting the LD ecosystem while shielding non-LD experts from the technical details behind a suitable user interface.

Acknowledgements

This work was supported in part by the Czech Science Foundation (GAČR), grant number 16-09713S, and in part by the project SVV 260451.

References

[1] J. Klímek, J. Kučera, M. Nečaský and D. Chlapek, Publication and usage of official Czech pension statistics Linked Open Data, Journal of Web Semantics 48 (2018), 1–21, ISSN 1570-8268. doi:10.1016/j.websem.2017.09.002. http://www.sciencedirect.com/science/article/pii/S1570826817300343.
[2] V. Sabol, G. Tschinkel, E.E. Veas, P. Höfler, B. Mutlu and M. Granitzer, Discovery and Visual Analysis of Linked Data for Humans, in: The Semantic Web - ISWC 2014 - 13th International Semantic Web Conference, Riva del Garda, Italy, October 19-23, 2014. Proceedings, Part I, 2014, pp. 309–324. doi:10.1007/978-3-319-11964-9_20.
[3] D. Lämmerhirt, M. Rubinstein and O. Montiel, The State of Open Government Data in 2017, Open Knowledge International (2017). https://blog.okfn.org/files/2017/06/FinalreportTheStateofOpenGovernmentDatain2017.pdf.
[4] J. Klímek, P. Škoda and M. Nečaský, Requirements on Linked Data Consumption Platform, in: Proceedings of the Workshop on Linked Data on the Web, LDOW 2016, co-located with the 25th International World Wide Web Conference (WWW 2016), S. Auer, T. Berners-Lee, C. Bizer and T. Heath, eds, CEUR Workshop Proceedings, Vol. 1593, CEUR-WS.org, 2016. http://ceur-ws.org/Vol-1593/article-01.pdf.
[5] M. Nečaský, J. Klímek and P. Škoda, Practical Use Cases for Linked Open Data in eGovernment Demonstrated on the Czech Republic, in: Electronic Government and the Information Systems Perspective, A. Kő and E. Francesconi, eds, Springer International Publishing, Cham, 2017, pp. 80–96. ISBN 978-3-319-64248-2.
[6] M. Sabou, I. Onder, A.M.P. Brasoveanu and A. Scharl, Towards cross-domain data analytics in tourism: a linked data based approach, Information Technology & Tourism 16(1) (2016), 71–101, ISSN 1943-4294. doi:10.1007/s40558-015-0049-5.
[7] P. Vandenbussche and B. Vatant, Linked Open Vocabularies, ERCIM News 2014(96) (2014). http://ercim-news.ercim.eu/en96/special/linked-open-vocabularies.
[8] D. Schweiger, Z. Trajanoski and S. Pabinger, SPARQLGraph: a web-based platform for graphically querying biological Semantic Web databases, BMC Bioinformatics 15(1) (2014), 1–5, ISSN 1471-2105. doi:10.1186/1471-2105-15-279.
[9] M. Bastian, S. Heymann and M. Jacomy, Gephi: An Open Source Software for Exploring and Manipulating Networks, in: Proceedings of the Third International Conference on Weblogs and Social Media, ICWSM 2009, San Jose, California, USA, May 17-20, 2009, E. Adar, M. Hurst, T. Finin, N.S. Glance, N. Nicolov and B.L. Tseng, eds, The AAAI Press, 2009. ISBN 978-1-57735-421-5. http://aaai.org/ocs/index.php/ICWSM/09/paper/view/154.
[10] J. Klímek, J. Helmich and M. Nečaský, LinkedPipes Visualization: Simple Useful Linked Data Visualization Use Cases, in: The Semantic Web - ESWC 2016 Satellite Events, Heraklion, Crete, Greece, May 29 - June 2, 2016, Revised Selected Papers, 2016, pp. 112–117. doi:10.1007/978-3-319-47602-5_23.
[11] A. Dadzie and E. Pietriga, Visualisation of Linked Data - Reprise, Semantic Web 8(1) (2017), 1–21. doi:10.3233/SW-160249.
[12] F. Bauer and M. Kaltenböck, Linked open data: The essentials, Edition mono/monochrom, Vienna (2011). ISBN 978-3-902796-05-9.
[13] J. Erickson and F. Maali, Data Catalog Vocabulary (DCAT), W3C Recommendation, W3C, 2014. https://www.w3.org/TR/2014/REC-vocab-dcat-20140116/.
[14] K. Alexander, J. Zhao, R. Cyganiak and M. Hausenblas, Describing Linked Datasets with the VoID Vocabulary, W3C Note, W3C, 2011. http://www.w3.org/TR/2011/NOTE-void-20110303/.
[15] M.B. Ellefi, Z. Bellahsene, J. Breslin, E. Demidova, S. Dietze, J. Szymanski and K. Todorov, RDF Dataset Profiling - a Survey of Features, Methods, Vocabularies and Applications, to appear in the Semantic Web Journal. http://semantic-web-journal.net/content/rdf-dataset-profiling-survey-features-methods-vocabularies-and-applications.
[16] K. Christodoulou, N.W. Paton and A.A.A. Fernandes, Structure Inference for Linked Data Sources Using Clustering, Trans. Large-Scale Data- and Knowledge-Centered Systems 19 (2015), 1–25. ISBN 978-3-662-46561-5. doi:10.1007/978-3-662-46562-2_1.
[17] M. Röder, A.N. Ngomo, I. Ermilov and A. Both, Detecting Similar Linked Datasets Using Topic Modelling, in: The Semantic Web. Latest Advances and New Domains - 13th International Conference, ESWC 2016, Heraklion, Crete, Greece, May 29 - June 2, 2016, Proceedings, H. Sack, E. Blomqvist, M. d'Aquin, C. Ghidini, S.P. Ponzetto and C. Lange, eds, Lecture Notes in Computer Science, Vol. 9678, Springer, 2016, pp. 3–19. ISBN 978-3-319-34128-6. doi:10.1007/978-3-319-34129-3_1.
[18] A. Abele, Linked Data Profiling: Identifying the Domain of Datasets Based on Data Content and Metadata, in: Proceedings of the 25th International Conference on World Wide Web, WWW 2016, Montreal, Canada, April 11-15, 2016, Companion Volume, J. Bourdeau, J. Hendler, R. Nkambou, I. Horrocks and B.Y. Zhao, eds, ACM, 2016, pp. 287–291. ISBN 978-1-4503-4144-8. doi:10.1145/2872518.2888603.
[19] J. Lehmann, R. Isele, M. Jakob, A. Jentzsch, D. Kontokostas, P.N. Mendes, S. Hellmann, M. Morsey, P. van Kleef, S. Auer and C. Bizer, DBpedia - A large-scale, multilingual knowledge base extracted from Wikipedia, Semantic Web 6(2) (2015), 167–195. doi:10.3233/SW-140134.
[20] E. Oren, R. Delbru, M. Catasta, R. Cyganiak, H. Stenzhorn and G. Tummarello, Sindice.com: a document-oriented lookup index for open linked data, IJMSO 3(1) (2008), 37–52. doi:10.1504/IJMSO.2008.021204.

[21] S. Auer, J. Demter, M. Martin and J. Lehmann, LODStats - An Extensible Framework for High-Performance Dataset Analytics, in: Knowledge Engineering and Knowledge Management - 18th International Conference, EKAW 2012, Galway City, Ireland, October 8-12, 2012. Proceedings, A. ten Teije, J. Völker, S. Handschuh, H. Stuckenschmidt, M. d'Aquin, A. Nikolov, N. Aussenac-Gilles and N. Hernandez, eds, Lecture Notes in Computer Science, Vol. 7603, Springer, 2012, pp. 353–362. ISBN 978-3-642-33875-5. doi:10.1007/978-3-642-33876-2_31.
[22] M. Teng and G. Zhu, Interactive search over Web scale RDF data using predicates as constraints, Journal of Intelligent Information Systems 44(3) (2015), 381–395, ISSN 1573-7675. doi:10.1007/s10844-014-0336-1.
[23] C. Bobed and E. Mena, QueryGen: Semantic interpretation of keyword queries over heterogeneous information systems, Information Sciences 329 (2016), 412–433, Special issue on Discovery Science, ISSN 0020-0255. doi:10.1016/j.ins.2015.09.013.
[24] C. Unger, L. Bühmann, J. Lehmann, A.-C. Ngonga Ngomo, D. Gerber and P. Cimiano, Template-based Question Answering over RDF Data, in: Proceedings of the 21st International Conference on World Wide Web, WWW '12, ACM, New York, NY, USA, 2012, pp. 639–648. ISBN 978-1-4503-1229-5. doi:10.1145/2187836.2187923.
[25] M. Dubey, S. Dasgupta, A. Sharma, K. Höffner and J. Lehmann, AskNow: A Framework for Natural Language Query Formalization in SPARQL, in: The Semantic Web. Latest Advances and New Domains: 13th International Conference, ESWC 2016, Heraklion, Crete, Greece, May 29 - June 2, 2016, Proceedings, Springer International Publishing, Cham, 2016, pp. 300–316. ISBN 978-3-319-34129-3. doi:10.1007/978-3-319-34129-3_19.
[26] S. Djebali and T. Raimbault, SimplePARQL: A New Approach Using Keywords over SPARQL to Query the Web of Data, in: Proceedings of the 11th International Conference on Semantic Systems, SEMANTICS '15, ACM, New York, NY, USA, 2015, pp. 188–191. ISBN 978-1-4503-3462-4. doi:10.1145/2814864.2814890.
[27] V. Fionda, G. Pirrò and C. Gutierrez, NautiLOD: A Formal Language for the Web of Data Graph, ACM Trans. Web 9(1) (2015), 5:1–5:43, ISSN 1559-1131. doi:10.1145/2697393.
[28] S. Ferré, Sparklis: An expressive query builder for SPARQL endpoints with guidance in natural language, Semantic Web 8(3) (2017), 405–418. doi:10.3233/SW-150208.
[29] D. Song, F. Schilder, C. Smiley, C. Brew, T. Zielund, H. Bretz, R. Martin, C. Dale, J. Duprey, T. Miller and J. Harrison, TR Discover: A Natural Language Interface for Querying and Analyzing Interlinked Datasets, in: The Semantic Web - ISWC 2015: 14th International Semantic Web Conference, Bethlehem, PA, USA, October 11-15, 2015, Proceedings, Part II, Springer International Publishing, Cham, 2015, pp. 21–37. ISBN 978-3-319-25010-6. doi:10.1007/978-3-319-25010-6_2.
[30] M. Arenas, B. Cuenca Grau, E. Kharlamov, S. Marciuska, D. Zheleznyakov and E. Jimenez-Ruiz, SemFacet: Semantic Faceted Search over Yago, in: Proceedings of the 23rd International Conference on World Wide Web, WWW '14 Companion, ACM, New York, NY, USA, 2014, pp. 123–126. ISBN 978-1-4503-2745-9. doi:10.1145/2567948.2577011.
[31] Y. Tzitzikas, N. Manolis and P. Papadakos, Faceted exploration of RDF/S datasets: a survey, Journal of Intelligent Information Systems (2016), 1–36, ISSN 1573-7675. doi:10.1007/s10844-016-0413-8.
[32] A.J. Zitzelberger, D.W. Embley, S.W. Liddle and D.T. Scott, HyKSS: Hybrid Keyword and Semantic Search, Journal on Data Semantics 4(4) (2015), 213–229, ISSN 1861-2040. doi:10.1007/s13740-014-0046-4.
[33] G. Zhu and C.A. Iglesias, Computing Semantic Similarity of Concepts in Knowledge Graphs, IEEE Transactions on Knowledge and Data Engineering PP(99) (2016), 1–1, ISSN 1041-4347. doi:10.1109/TKDE.2016.2610428.
[34] J. Lee, J.-K. Min, A. Oh and C.-W. Chung, Effective ranking and search techniques for Web resources considering semantic relationships, Information Processing & Management 50(1) (2014), 132–155, ISSN 0306-4573. doi:10.1016/j.ipm.2013.08.007.
[35] D. Diefenbach, R. Usbeck, K.D. Singh and P. Maret, A Scalable Approach for Computing Semantic Relatedness Using Semantic Web Data, in: Proceedings of the 6th International Conference on Web Intelligence, Mining and Semantics, WIMS '16, ACM, New York, NY, USA, 2016, pp. 20:1–20:9. ISBN 978-1-4503-4056-4. doi:10.1145/2912845.2912864.
[36] A. Wagner, P. Haase, A. Rettinger and H. Lamm, Discovering Related Data Sources in Data-Portals, in: Proceedings of the 1st International Workshop on Semantic Statistics (SemStats), CEUR Workshop Proceedings, Aachen, 2013, ISSN 1613-0073. http://ceur-ws.org/Vol-1549/#article-07.
[37] M.B. Ellefi, Z. Bellahsene, S. Dietze and K. Todorov, Dataset Recommendation for Data Linking: An Intensional Approach, in: The Semantic Web. Latest Advances and New Domains - 13th International Conference, ESWC 2016, Heraklion, Crete, Greece, May 29 - June 2, 2016, Proceedings, H. Sack, E. Blomqvist, M. d'Aquin, C. Ghidini, S.P. Ponzetto and C. Lange, eds, Lecture Notes in Computer Science, Vol. 9678, Springer, 2016, pp. 36–51. ISBN 978-3-319-34128-6. doi:10.1007/978-3-319-34129-3_3.
[38] Y.C. Martins, F.F. da Mota and M.C. Cavalcanti, DSCrank: A Method for Selection and Ranking of Datasets, in: Metadata and Semantics Research: 10th International Conference, MTSR 2016, Göttingen, Germany, November 22-25, 2016, Proceedings, Springer International Publishing, Cham, 2016, pp. 333–344. ISBN 978-3-319-49157-8. doi:10.1007/978-3-319-49157-8_29.
[39] A. Lausch, A. Schmidt and L. Tischendorf, Data mining and linked open data - New perspectives for data analysis in environmental research, Ecological Modelling 295 (2015), 5–17, Use of ecological indicators in models, ISSN 0304-3800. doi:10.1016/j.ecolmodel.2014.09.018.
[40] R. Delbru, N. Toupikov, M. Catasta, G. Tummarello and S. Decker, Hierarchical Link Analysis for Ranking Web Data, in: The Semantic Web: Research and Applications: 7th Extended Semantic Web Conference, ESWC 2010, Heraklion, Crete, Greece, May 30 - June 3, 2010, Proceedings, Part II, Springer Berlin Heidelberg, Berlin, Heidelberg, 2010, pp. 225–239. ISBN 978-3-642-13489-0. doi:10.1007/978-3-642-13489-0_16.

[40] R. Delbru, N. Toupikov, M. Catasta, G. Tummarello and S. Decker, Hierarchical Link Analysis for Ranking Web Data, in: The Semantic Web: Research and Applications: 7th Extended Semantic Web Conference, ESWC 2010, Heraklion, Crete, Greece, May 30 - June 3, 2010, Proceedings, Part II, Springer Berlin Heidelberg, Berlin, Heidelberg, 2010, pp. 225–239. ISBN 978-3-642-13489-0. doi:10.1007/978-3-642-13489-0_16.
[41] A. Zaveri, A. Rula, A. Maurino, R. Pietrobon, J. Lehmann and S. Auer, Quality assessment for Linked Data: A Survey, Semantic Web 7(1) (2016), 63–93. doi:10.3233/SW-150175.
[42] V. Jindal, S. Bawa and S. Batra, A review of ranking approaches for semantic search on Web, Inf. Process. Manage. 50(2) (2014), 416–425. doi:10.1016/j.ipm.2013.10.004.
[43] F. Benedetti, S. Bergamaschi and L. Po, A Visual Summary for Linked Open Data Sources, in: Proceedings of the 2014 International Conference on Posters & Demonstrations Track - Volume 1272, ISWC-PD'14, CEUR-WS.org, Aachen, Germany, 2014, pp. 173–176. http://dl.acm.org/citation.cfm?id=2878453.2878497.
[44] S. Bechhofer and A. Miles, SKOS Simple Knowledge Organization System Reference, W3C Recommendation, W3C, 2009, https://www.w3.org/TR/2009/REC-skos-reference-20090818/.
[45] D. Reynolds, The Organization Ontology, W3C Recommendation, W3C, 2014, https://www.w3.org/TR/2014/REC-vocab-org-20140116/.
[46] D. Reynolds and R. Cyganiak, The RDF Data Cube Vocabulary, W3C Recommendation, W3C, 2014, https://www.w3.org/TR/2014/REC-vocab-data-cube-20140116/.
[47] J. McKinney and R. Iannella, vCard Ontology - for describing People and Organizations, W3C Note, W3C, 2014, https://www.w3.org/TR/2014/NOTE-vcard-rdf-20140522/.
[48] DCMI Usage Board, DCMI Metadata Terms, DCMI Recommendation, Dublin Core Metadata Initiative, 2006. http://dublincore.org/documents/2006/12/18/dcmi-terms/.
[49] D. Brickley and L. Miller, The Friend Of A Friend (FOAF) Vocabulary Specification, 2007. http://xmlns.com/foaf/spec/.
[50] M. Hepp, GoodRelations: An Ontology for Describing Products and Services Offers on the Web, in: Proceedings of the 16th International Conference on Knowledge Engineering: Practice and Patterns, EKAW '08, Springer-Verlag, Berlin, Heidelberg, 2008, pp. 329–346. ISBN 978-3-540-87695-3. doi:10.1007/978-3-540-87696-0_29.
[51] J. Klímek, J. Helmich and M. Nečaský, Use Cases for Linked Data Visualization Model, in: Proceedings of the Workshop on Linked Data on the Web, LDOW 2015, co-located with the 24th International World Wide Web Conference (WWW 2015), Florence, Italy, May 19th, 2015, C. Bizer, S. Auer, T. Berners-Lee and T. Heath, eds, CEUR Workshop Proceedings, Vol. 1409, CEUR-WS.org, 2015. http://ceur-ws.org/Vol-1409/paper-08.pdf.
[52] G. Troullinou, H. Kondylakis, E. Daskalaki and D. Plexousakis, RDF Digest: Efficient Summarization of RDF/S KBs, in: The Semantic Web. Latest Advances and New Domains: 12th European Semantic Web Conference, ESWC 2015, Portoroz, Slovenia, May 31 - June 4, 2015, Proceedings, Springer International Publishing, Cham, 2015, pp. 119–134. ISBN 978-3-319-18818-8. doi:10.1007/978-3-319-18818-8_8.
[53] A. Pappas, G. Troullinou, G. Roussakis, H. Kondylakis and D. Plexousakis, Exploring Importance Measures for Summarizing RDF/S KBs, in: The Semantic Web: 14th International Conference, ESWC 2017, Portorož, Slovenia, May 28 - June 1, 2017, Proceedings, Part I, Springer International Publishing, Cham, 2017, pp. 387–403. ISBN 978-3-319-58068-5. doi:10.1007/978-3-319-58068-5_24.
[54] P. Vandenbussche, J. Umbrich, L. Matteis, A. Hogan and C.B. Aranda, SPARQLES: Monitoring public SPARQL endpoints, Semantic Web 8(6) (2017), 1049–1065. doi:10.3233/SW-170254.
[55] E. Prud'hommeaux and C.B. Aranda, SPARQL 1.1 Federated Query, W3C Recommendation, W3C, 2013, https://www.w3.org/TR/2013/REC-sparql11-federated-query-20130321/.
[56] F. Corcoglioniti, A.P. Aprosio and M. Rospocher, Demonstrating the Power of Streaming and Sorting for Non-distributed RDF Processing: RDFpro, in: Proceedings of the ISWC 2015 Posters & Demonstrations Track co-located with the 14th International Semantic Web Conference (ISWC-2015), Bethlehem, PA, USA, October 11, 2015, CEUR Workshop Proceedings, Vol. 1486, CEUR-WS.org, 2015. http://ceur-ws.org/Vol-1486/paper_52.pdf.
[57] E. Marx, S. Shekarpour, S. Auer and A.-C.N. Ngomo, Large-Scale RDF Dataset Slicing, in: Proceedings of the 2013 IEEE Seventh International Conference on Semantic Computing, ICSC '13, IEEE Computer Society, Washington, DC, USA, 2013, pp. 228–235. ISBN 978-0-7695-5119-7. doi:10.1109/ICSC.2013.47.
[58] A. Bertails, M. Arenas, E. Prud'hommeaux and J. Sequeda, A Direct Mapping of Relational Data to RDF, W3C Recommendation, W3C, 2012, https://www.w3.org/TR/2012/REC-rdb-direct-mapping-20120927/.
[59] J. Arwe, S. Speicher and A. Malhotra, Linked Data Platform 1.0, W3C Recommendation, W3C, 2015, https://www.w3.org/TR/2015/REC-ldp-20150226/.
[60] S. Capadisli and A. Guy, Linked Data Notifications, W3C Recommendation, W3C, 2017, https://www.w3.org/TR/2017/REC-ldn-20170502/.
[61] S. Capadisli, A. Guy, C. Lange, S. Auer, A.V. Sambra and T. Berners-Lee, Linked Data Notifications: A Resource-Centric Communication Protocol, in: The Semantic Web - 14th International Conference, ESWC 2017, Portorož, Slovenia, May 28 - June 1, 2017, Proceedings, Part I, E. Blomqvist, D. Maynard, A. Gangemi, R. Hoekstra, P. Hitzler and O. Hartig, eds, Lecture Notes in Computer Science, Vol. 10249, 2017, pp. 537–553. ISBN 978-3-319-58067-8. doi:10.1007/978-3-319-58068-5_33.

[62] T. Käfer, J. Umbrich, A. Hogan and A. Polleres, DyLDO: Towards a Dynamic Linked Data Observatory, in: WWW2012 Workshop on Linked Data on the Web, Lyon, France, 16 April, 2012, C. Bizer, T. Heath, T. Berners-Lee and M. Hausenblas, eds, CEUR Workshop Proceedings, Vol. 937, CEUR-WS.org, 2012. http://ceur-ws.org/Vol-937/ldow2012-paper-14.pdf.
[63] J. Genestoux and A. Parecki, WebSub, W3C Recommendation, W3C, 2018, https://www.w3.org/TR/2018/REC-websub-20180123/.
[64] C. Webber and J. Tallon, ActivityPub, W3C Recommendation, W3C, 2018, https://www.w3.org/TR/2018/REC-activitypub-20180123/.
[65] S. Faisal, K.M. Endris, S. Shekarpour, S. Auer and M. Vidal, Co-evolution of RDF Datasets, in: Web Engineering - 16th International Conference, ICWE 2016, Lugano, Switzerland, June 6-9, 2016, Proceedings, A. Bozzon, P. Cudré-Mauroux and C. Pautasso, eds, Lecture Notes in Computer Science, Vol. 9671, Springer, 2016, pp. 225–243. ISBN 978-3-319-38790-1. doi:10.1007/978-3-319-38791-8_13.
[66] N. Arndt, N. Radtke and M. Martin, Distributed Collaboration on RDF Datasets Using Git: Towards the Quit Store, in: 12th International Conference on Semantic Systems Proceedings (SEMANTiCS 2016), SEMANTiCS '16, Leipzig, Germany, 2016. ISBN 978-1-4503-4752-5. doi:10.1145/2993318.2993328. https://svn.aksw.org/papers/2016/Semantics_Quit/public.pdf.
[67] H.V. de Sompel, M.L. Nelson and R. Sanderson, HTTP Framework for Time-Based Access to Resource States - Memento, RFC 7089 (2013), 1–50. doi:10.17487/RFC7089.
[68] G. Pirrò, Explaining and Suggesting Relatedness in Knowledge Graphs, in: The Semantic Web - ISWC 2015 - 14th International Semantic Web Conference, Bethlehem, PA, USA, October 11-15, 2015, Proceedings, Part I, M. Arenas, Ó. Corcho, E. Simperl, M. Strohmaier, M. d'Aquin, K. Srinivas, P.T. Groth, M. Dumontier, J. Heflin, K. Thirunarayan and S. Staab, eds, Lecture Notes in Computer Science, Vol. 9366, Springer, 2015, pp. 622–639. ISBN 978-3-319-25006-9. doi:10.1007/978-3-319-25007-6_36.
[69] G. Pirrò and A. Cuzzocrea, RECAP: Building Relatedness Explanations on the Web, in: Proceedings of the 25th International Conference Companion on World Wide Web, WWW '16 Companion, International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, Switzerland, 2016, pp. 235–238. ISBN 978-1-4503-4144-8. doi:10.1145/2872518.2890533.
[70] S. Seufert, K. Berberich, S.J. Bedathur, S.K. Kondreddi, P. Ernst and G. Weikum, ESPRESSO: Explaining Relationships between Entity Sets, in: Proceedings of the 25th ACM International Conference on Information and Knowledge Management, CIKM 2016, Indianapolis, IN, USA, October 24-28, 2016, S. Mukhopadhyay, C. Zhai, E. Bertino, F. Crestani, J. Mostafa, J. Tang, L. Si, X. Zhou, Y. Chang, Y. Li and P. Sondhi, eds, ACM, 2016, pp. 1311–1320. ISBN 978-1-4503-4073-1. doi:10.1145/2983323.2983778.
[71] C.B. Neto, K. Müller, M. Brümmer, D. Kontokostas and S. Hellmann, LODVader: An Interface to LOD Visualization, Analytics and DiscovERy in Real-time, in: Proceedings of the 25th International Conference on World Wide Web, WWW 2016, Montreal, Canada, April 11-15, 2016, Companion Volume, 2016, pp. 163–166. doi:10.1145/2872518.2890545.
[72] J. Volz, C. Bizer, M. Gaedke and G. Kobilarov, Discovering and Maintaining Links on the Web of Data, in: The Semantic Web - ISWC 2009, A. Bernstein, D. Karger, T. Heath, L. Feigenbaum, D. Maynard, E. Motta and K. Thirunarayan, eds, Lecture Notes in Computer Science, Vol. 5823, Springer Berlin Heidelberg, 2009, pp. 650–665. ISBN 978-3-642-04929-3. doi:10.1007/978-3-642-04930-9_41.
[73] K. Nguyen, R. Ichise and B. Le, SLINT: a schema-independent linked data interlinking system, in: Proceedings of the 7th International Workshop on Ontology Matching, Boston, MA, USA, November 11, 2012, P. Shvaiko, J. Euzenat, A. Kementsietsidis, M. Mao, N.F. Noy and H. Stuckenschmidt, eds, CEUR Workshop Proceedings, Vol. 946, CEUR-WS.org, 2012. http://ceur-ws.org/Vol-946/om2012_Tpaper1.pdf.
[74] S. Araújo, J. Hidders, D. Schwabe and A.P. de Vries, SERIMI - resource description similarity, RDF instance matching and interlinking, in: Proceedings of the 6th International Workshop on Ontology Matching, Bonn, Germany, October 24, 2011, P. Shvaiko, J. Euzenat, T. Heath, C. Quix, M. Mao and I.F. Cruz, eds, CEUR Workshop Proceedings, Vol. 814, CEUR-WS.org, 2011. http://ceur-ws.org/Vol-814/om2011_poster6.pdf.
[75] R. Meymandpour and J.G. Davis, A semantic similarity measure for linked data: An information content-based approach, Knowledge-Based Systems 109 (2016), 276–293, ISSN 0950-7051. doi:10.1016/j.knosys.2016.07.012.
[76] I. Traverso-Ribón, G. Palma, A. Flores and M.-E. Vidal, Considering Semantics on the Discovery of Relations in Knowledge Graphs, in: Knowledge Engineering and Knowledge Management: 20th International Conference, EKAW 2016, Bologna, Italy, November 19-23, 2016, Proceedings, Springer International Publishing, Cham, 2016, pp. 666–680. ISBN 978-3-319-49004-5. doi:10.1007/978-3-319-49004-5_43.
[77] M. Nentwig, M. Hartung, A.N. Ngomo and E. Rahm, A survey of current Link Discovery frameworks, Semantic Web 8(3) (2017), 419–436. doi:10.3233/SW-150210.
[78] L. Otero-Cerdeira, F.J. Rodríguez-Martínez and A. Gómez-Rodríguez, Ontology matching: A literature review, Expert Systems with Applications 42(2) (2015), 949–971, ISSN 0957-4174. doi:10.1016/j.eswa.2014.08.032.
[79] T. Knap, J. Michelfeit and M. Nečaský, Linked Open Data Aggregation: Conflict Resolution and Aggregate Quality, in: 36th Annual IEEE Computer Software and Applications Conference Workshops, COMPSAC 2012, Izmir, Turkey, July 16-20, 2012, X. Bai, F. Belli, E. Bertino, C.K. Chang, A. Elçi, C.C. Seceleanu, H. Xie and M. Zulkernine, eds, IEEE Computer Society, 2012, pp. 106–111. ISBN 978-1-4673-2714-5. doi:10.1109/COMPSACW.2012.29.

[80] P.N. Mendes, H. Mühleisen and C. Bizer, Sieve: linked data quality assessment and fusion, in: Proceedings of the 2012 Joint EDBT/ICDT Workshops, Berlin, Germany, March 30, 2012, D. Srivastava and I. Ari, eds, ACM, 2012, pp. 116–123. ISBN 978-1-4503-1143-4. doi:10.1145/2320765.2320803.
[81] S. Sahoo, T. Lebo and D. McGuinness, PROV-O: The PROV Ontology, W3C Recommendation, W3C, 2013, https://www.w3.org/TR/2013/REC-prov-o-20130430/.
[82] I. Ermilov and T. Pellegrini, Data licensing on the cloud: empirical insights and implications for linked data, in: Proceedings of the 11th International Conference on Semantic Systems, SEMANTICS 2015, Vienna, Austria, September 15-17, 2015, 2015, pp. 153–156. doi:10.1145/2814864.2814878.
[83] G. Governatori, H. Lam, A. Rotolo, S. Villata, G.A. Atemezing and F.L. Gandon, LIVE: a Tool for Checking Licenses Compatibility between Vocabularies and Data, in: Proceedings of the ISWC 2014 Posters & Demonstrations Track, a track within the 13th International Semantic Web Conference, ISWC 2014, Riva del Garda, Italy, October 21, 2014, M. Horridge, M. Rospocher and J. van Ossenbruggen, eds, CEUR Workshop Proceedings, Vol. 1272, CEUR-WS.org, 2014, pp. 77–80. http://ceur-ws.org/Vol-1272/paper_62.pdf.
[84] K. Thellmann, M. Galkin, F. Orlandi and S. Auer, LinkDaViz – Automatic Binding of Linked Data to Visualizations, in: The Semantic Web - ISWC 2015, Lecture Notes in Computer Science, Vol. 9366, Springer International Publishing, 2015, pp. 147–162. ISBN 978-3-319-25006-9. doi:10.1007/978-3-319-25007-6_9.
[85] A. Polleres, P. Gearon and A. Passant, SPARQL 1.1 Update, W3C Recommendation, W3C, 2013, https://www.w3.org/TR/2013/REC-sparql11-update-20130321/.
[86] C. Ogbuji, SPARQL 1.1 Graph Store HTTP Protocol, W3C Recommendation, W3C, 2013, https://www.w3.org/TR/2013/REC-sparql11-http-rdf-update-20130321/.
[87] J. Tennison and G. Kellogg, Model for Tabular Data and Metadata on the Web, W3C Recommendation, W3C, 2015, https://www.w3.org/TR/2015/REC-tabular-data-model-20151217/.
[88] J. Klímek and J. Helmich, Vocabulary for Linked Data Visualization Model, in: Proceedings of the Dateso 2015 Annual International Workshop on DAtabases, TExts, Specifications and Objects, Nepřívěc u Sobotky, Jičín, Czech Republic, April 14, 2015, 2015, pp. 28–39. http://ceur-ws.org/Vol-1343/paper3.pdf.
[89] M. Alonen, T. Kauppinen, O. Suominen and E. Hyvönen, Exploring the Linked University Data with Visualization Tools, in: The Semantic Web: ESWC 2013 Satellite Events - ESWC 2013 Satellite Events, Montpellier, France, May 26-30, 2013, Revised Selected Papers, P. Cimiano, M. Fernández, V. López, S. Schlobach and J. Völker, eds, Lecture Notes in Computer Science, Vol. 7955, Springer, 2013, pp. 204–208. ISBN 978-3-642-41241-7. doi:10.1007/978-3-642-41242-4_25.
[90] L. Rietveld and R. Hoekstra, The YASGUI family of SPARQL clients, Semantic Web 8(3) (2017), 373–383. doi:10.3233/SW-150197.
[91] T. Knap, P. Škoda, J. Klímek and M. Nečaský, UnifiedViews: Towards ETL Tool for Simple yet Powerfull RDF Data Management, in: Proceedings of the Dateso 2015 Annual International Workshop on DAtabases, TExts, Specifications and Objects, Nepřívěc u Sobotky, Jičín, Czech Republic, April 14, 2015, 2015, pp. 111–120. http://ceur-ws.org/Vol-1343/poster14.pdf.
[92] J. Klímek, P. Škoda and M. Nečaský, LinkedPipes ETL: Evolved Linked Data Preparation, in: The Semantic Web - ESWC 2016 Satellite Events, Heraklion, Crete, Greece, May 29 - June 2, 2016, Revised Selected Papers, 2016, pp. 95–100. doi:10.1007/978-3-319-47602-5_20.
[93] J. Klímek and P. Škoda, Speeding up Publication of Linked Data Using Data Chunking in LinkedPipes ETL, in: On the Move to Meaningful Internet Systems. OTM 2017 Conferences - Confederated International Conferences: CoopIS, C&TC, and ODBASE 2017, Rhodes, Greece, October 23-27, 2017, Proceedings, Part II, H. Panetto, C. Debruyne, W. Gaaloul, M.P. Papazoglou, A. Paschke, C.A. Ardagna and R. Meersman, eds, Lecture Notes in Computer Science, Vol. 10574, Springer, 2017, pp. 144–160. ISBN 978-3-319-69458-0. doi:10.1007/978-3-319-69459-7_10.
[94] E. Kalampokis, A. Nikolov, P. Haase, R. Cyganiak, A. Stasiewicz, A. Karamanou, M. Zotou, D. Zeginis, E. Tambouris and K.A. Tarabanis, Exploiting Linked Data Cubes with OpenCube Toolkit, in: Proceedings of the ISWC 2014 Posters & Demonstrations Track, a track within the 13th International Semantic Web Conference, ISWC 2014, Riva del Garda, Italy, October 21, 2014, 2014, pp. 137–140. http://ceur-ws.org/Vol-1272/paper_109.pdf.
[95] P. Haase, M. Schmidt and A. Schwarte, The Information Workbench as a Self-Service Platform for Linked Data Applications, in: Proceedings of the Second International Workshop on Consuming Linked Data (COLD2011), Bonn, Germany, October 23, 2011, O. Hartig, A. Harth and J.F. Sequeda, eds, CEUR Workshop Proceedings, Vol. 782, CEUR-WS.org, 2011. http://ceur-ws.org/Vol-782/HaaseEtAl_COLD2011.pdf.
[96] A. Ferrara, A. Nikolov and F. Scharffe, Data Linking for the Semantic Web, Int. J. Semantic Web Inf. Syst. 7(3) (2011), 46–76. doi:10.4018/jswis.2011070103.
[97] W. Beek, L. Rietveld, H.R. Bazoobandi, J. Wielemaker and S. Schlobach, LOD Laundromat: A Uniform Way of Publishing Other People's Dirty Data, in: The Semantic Web - ISWC 2014 - 13th International Semantic Web Conference, Riva del Garda, Italy, October 19-23, 2014, Proceedings, Part I, P. Mika, T. Tudorache, A. Bernstein, C. Welty, C.A. Knoblock, D. Vrandecic, P.T. Groth, N.F. Noy, K. Janowicz and C.A. Goble, eds, Lecture Notes in Computer Science, Vol. 8796, Springer, 2014, pp. 213–228. ISBN 978-3-319-11963-2. doi:10.1007/978-3-319-11964-9_14.
[98] A. Khalili, A. Loizou and F. van Harmelen, Adaptive Linked Data-Driven Web Components: Building Flexible and Reusable Semantic Web Interfaces, in: The Semantic Web. Latest Advances and New Domains - 13th International Conference, ESWC 2016, Heraklion, Crete, Greece, May 29 - June 2, 2016, Proceedings, H. Sack, E. Blomqvist, M. d'Aquin, C. Ghidini, S.P. Ponzetto and C. Lange, eds, Lecture Notes in Computer Science, Vol. 9678, Springer, 2016, pp. 677–692. ISBN 978-3-319-34128-6. doi:10.1007/978-3-319-34129-3_41.

[99] V. Mijović, V. Janev, D. Paunović and S. Vraneš, Exploratory spatio-temporal analysis of linked statistical data, Web Semantics: Science, Services and Agents on the World Wide Web 41 (2016), 1–8, ISSN 1570-8268. doi:10.1016/j.websem.2016.10.002.
[100] G. Vega-Gorgojo, L. Slaughter, M. Giese, S. Heggestøyl, J.W. Klüwer and A. Waaler, PepeSearch: Easy to Use and Easy to Install Semantic Data Search, in: The Semantic Web - ESWC 2016 Satellite Events, Heraklion, Crete, Greece, May 29 - June 2, 2016, Revised Selected Papers, H. Sack, G. Rizzo, N. Steinmetz, D. Mladenic, S. Auer and C. Lange, eds, Lecture Notes in Computer Science, Vol. 9989, 2016, pp. 146–150. ISBN 978-3-319-47601-8. doi:10.1007/978-3-319-47602-5_29.
[101] G. Giannopoulos, N. Vitsas, N. Karagiannakis, D. Skoutas and S. Athanasiou, FAGI-gis: A Tool for Fusing Geospatial RDF Data, in: The Semantic Web: ESWC 2015 Satellite Events - ESWC 2015 Satellite Events, Portorož, Slovenia, May 31 - June 4, 2015, Revised Selected Papers, F. Gandon, C. Guéret, S. Villata, J.G. Breslin, C. Faron-Zucker and A. Zimmermann, eds, Lecture Notes in Computer Science, Vol. 9341, Springer, 2015, pp. 51–57. ISBN 978-3-319-25638-2. doi:10.1007/978-3-319-25639-9_10.
[102] N. Marie and F.L. Gandon, Survey of Linked Data Based Exploration Systems, in: Proceedings of the 3rd International Workshop on Intelligent Exploration of Semantic Data (IESD 2014) co-located with the 13th International Semantic Web Conference (ISWC 2014), Riva del Garda, Italy, October 20, 2014, D. Thakker, D. Schwabe, K. Kozaki, R. Garcia, C. Dijkshoorn and R. Mizoguchi, eds, CEUR Workshop Proceedings, Vol. 1279, CEUR-WS.org, 2014. http://ceur-ws.org/Vol-1279/iesd14_8.pdf.
[103] A. Soylu, E. Kharlamov, D. Zheleznyakov, E. Jimenez-Ruiz, M. Giese, M.G. Skjæveland, D. Hovland, R. Schlatte, S. Brandt, H. Lie and I. Horrocks, OptiqueVQS: a Visual query system over ontologies for industry, Semantic Web (2018), 1–34. doi:10.3233/SW-180293.
[104] A. Dadzie and M. Rowe, Approaches to visualising Linked Data: A survey, Semantic Web 2(2) (2011), 89–124. doi:10.3233/SW-2011-0037.
[105] S. Mazumdar, D. Petrelli and F. Ciravegna, Exploring user and system requirements of linked data visualization through a visual dashboard approach, Semantic Web 5(3) (2014), 203–220. doi:10.3233/SW-2012-0072.
[106] A. Barbosa, I.I. Bittencourt, S.W.M. Siqueira, R. de Amorim Silva and I. Calado, The Use of Software Tools in Linked Data Publication and Consumption: A Systematic Literature Review, Int. J. Semantic Web Inf. Syst. 13(4) (2017), 68–88. doi:10.4018/IJSWIS.2017100104.
[107] A.G. Nuzzolese, V. Presutti, A. Gangemi, S. Peroni and P. Ciancarini, Aemoo: Linked Data exploration based on Knowledge Patterns, Semantic Web 8(1) (2017), 87–112. doi:10.3233/SW-160222.
[108] E. Mäkelä, Aether - Generating and Viewing Extended VoID Statistical Descriptions of RDF Datasets, in: The Semantic Web: ESWC 2014 Satellite Events - ESWC 2014 Satellite Events, Anissaras, Crete, Greece, May 25-29, 2014, Revised Selected Papers, V. Presutti, E. Blomqvist, R. Troncy, H. Sack, I. Papadakis and A. Tordai, eds, Lecture Notes in Computer Science, Vol. 8798, Springer, 2014, pp. 429–433. ISBN 978-3-319-11954-0. doi:10.1007/978-3-319-11955-7_61.
[109] K. Schlegel, T. Weißgerber, F. Stegmaier, C. Seifert, M. Granitzer and H. Kosch, Balloon Synopsis: A Modern Node-Centric RDF Viewer and Browser for the Web, in: The Semantic Web: ESWC 2014 Satellite Events - ESWC 2014 Satellite Events, Anissaras, Crete, Greece, May 25-29, 2014, Revised Selected Papers, V. Presutti, E. Blomqvist, R. Troncy, H. Sack, I. Papadakis and A. Tordai, eds, Lecture Notes in Computer Science, Vol. 8798, Springer, 2014, pp. 249–253. ISBN 978-3-319-11954-0. doi:10.1007/978-3-319-11955-7_29.
[110] L. Costabello and F. Gandon, Adaptive Presentation of Linked Data on Mobile, in: Proceedings of the 23rd International Conference on World Wide Web, WWW '14 Companion, ACM, New York, NY, USA, 2014, pp. 243–244. ISBN 978-1-4503-2745-9. doi:10.1145/2567948.2577314.
[111] M. Martin, K. Abicht, C. Stadler, A.-C. Ngonga Ngomo, T. Soru and S. Auer, CubeViz: Exploration and Visualization of Statistical Linked Data, in: Proceedings of the 24th International Conference on World Wide Web, WWW '15 Companion, ACM, New York, NY, USA, 2015, pp. 219–222. ISBN 978-1-4503-3473-0. doi:10.1145/2740908.2742848.
[112] Y. Roussakis, I. Chrysakis, K. Stefanidis and G. Flouris, D2V: A Tool for Defining, Detecting and Visualizing Changes on the Data Web, in: Proceedings of the ISWC 2015 Posters & Demonstrations Track co-located with the 14th International Semantic Web Conference (ISWC-2015), Bethlehem, PA, USA, October 11, 2015, S. Villata, J.Z. Pan and M. Dragoni, eds, CEUR Workshop Proceedings, Vol. 1486, CEUR-WS.org, 2015. http://ceur-ws.org/Vol-1486/paper_26.pdf.
[113] D. Roman, N. Nikolov, A. Pultier, D. Sukhobok, B. Elvesæter, A. Berre, X. Ye, M. Dimitrov, A. Simov, M. Zarev, R. Moynihan, B. Roberts, I. Berlocher, S.-H. Kim, T. Lee, A. Smith and T. Heath, DataGraft: One-Stop-Shop for Open Data Management, Semantic Web. http://semantic-web-journal.net/content/datagraft-one-stop-shop-open-data-management-0.
[114] P. Heyvaert, P. Colpaert, R. Verborgh, E. Mannens and R.V. de Walle, Merging and Enriching DCAT Feeds to Improve Discoverability of Datasets, in: The Semantic Web: ESWC 2015 Satellite Events - ESWC 2015 Satellite Events, Portorož, Slovenia, May 31 - June 4, 2015, Revised Selected Papers, F. Gandon, C. Guéret, S. Villata, J.G. Breslin, C. Faron-Zucker and A. Zimmermann, eds, Lecture Notes in Computer Science, Vol. 9341, Springer, 2015, pp. 67–71. ISBN 978-3-319-25638-2. doi:10.1007/978-3-319-25639-9_13.
[115] J. Attard, F. Orlandi and S. Auer, ExConQuer Framework - Softening RDF Data to Enhance Linked Data Reuse, in: Proceedings of the ISWC 2015 Posters & Demonstrations Track co-located with the 14th International Semantic Web Conference (ISWC-2015), Bethlehem, PA, USA, October 11, 2015, S. Villata, J.Z. Pan and M. Dragoni, eds, CEUR Workshop Proceedings, Vol. 1486, CEUR-WS.org, 2015. http://ceur-ws.org/Vol-1486/paper_39.pdf.

[116] D. Collarana, C. Lange and S. Auer, FuhSen: A Platform for Federated, RDF-based Hybrid Search, in: Proceedings of the 25th International Conference Companion on World Wide Web, WWW '16 Companion, International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, Switzerland, 2016, pp. 171–174. ISBN 978-1-4503-4144-8. doi:10.1145/2872518.2890535.
[117] M. Kaschesky and L. Selmi, Fusepool R5 Linked Data Framework: Concepts, Methodologies, and Tools for Linked Data, in: Proceedings of the 14th Annual International Conference on Digital Government Research, dg.o '13, ACM, New York, NY, USA, 2013, pp. 156–165. ISBN 978-1-4503-2057-3. doi:10.1145/2479724.2479748.
[118] A. Flores, M. Vidal and G. Palma, Graphium Chrysalis: Exploiting Graph Database Engines to Analyze RDF Graphs, in: The Semantic Web: ESWC 2014 Satellite Events - ESWC 2014 Satellite Events, Anissaras, Crete, Greece, May 25-29, 2014, Revised Selected Papers, V. Presutti, E. Blomqvist, R. Troncy, H. Sack, I. Papadakis and A. Tordai, eds, Lecture Notes in Computer Science, Vol. 8798, Springer, 2014, pp. 326–331. ISBN 978-3-319-11954-0. doi:10.1007/978-3-319-11955-7_43.
[119] M. Taheriyan, C.A. Knoblock, P.A. Szekely and J.L. Ambite, Learning the semantics of structured data sources, J. Web Sem. 37-38 (2016), 152–169. doi:10.1016/j.websem.2015.12.003.
[120] A. Steigmiller, T. Liebig and B. Glimm, Konclude: System description, J. Web Sem. 27 (2014), 78–85. doi:10.1016/j.websem.2014.06.003.
[121] N. Beck, S. Scheglmann and T. Gottron, LinDA: A Service Infrastructure for Linked Data Analysis and Provision of Data Statistics, in: The Semantic Web: ESWC 2013 Satellite Events - ESWC 2013 Satellite Events, Montpellier, France, May 26-30, 2013, Revised Selected Papers, P. Cimiano, M. Fernández, V. López, S. Schlobach and J. Völker, eds, Springer Berlin Heidelberg, Berlin, Heidelberg, 2013, pp. 225–230. ISBN 978-3-642-41242-4. doi:10.1007/978-3-642-41242-4_29.
[122] T. Trinh, B. Do, P. Wetz, A. Anjomshoaa, E. Kiesling and A.M. Tjoa, Open Mashup Platform - A Smart Data Exploration Environment, in: Proceedings of the ISWC 2014 Posters & Demonstrations Track, a track within the 13th International Semantic Web Conference, ISWC 2014, Riva del Garda, Italy, October 21, 2014, 2014, pp. 53–56. http://ceur-ws.org/Vol-1272/paper_45.pdf.
[123] A. Thalhammer, N. Lasierra and A. Rettinger, LinkSUM: Using Link Analysis to Summarize Entity Data, in: Web Engineering - 16th International Conference, ICWE 2016, Lugano, Switzerland, June 6-9, 2016, Proceedings, A. Bozzon, P. Cudré-Mauroux and C. Pautasso, eds, Lecture Notes in Computer Science, Vol. 9671, Springer, 2016, pp. 244–261. ISBN 978-3-319-38790-1. doi:10.1007/978-3-319-38791-8_14.
[124] F. Benedetti, S. Bergamaschi and L. Po, LODeX: A Tool for Visual Querying Linked Open Data, in: Proceedings of the ISWC 2015 Posters & Demonstrations Track co-located with the 14th International Semantic Web Conference (ISWC-2015), Bethlehem, PA, USA, October 11, 2015, S. Villata, J.Z. Pan and M. Dragoni, eds, CEUR Workshop Proceedings, Vol. 1486, CEUR-WS.org, 2015. http://ceur-ws.org/Vol-1486/paper_62.pdf.
[125] A. Micsik, Z. Tóth and S. Turbucz, LODmilla: Shared Visualization of Linked Open Data, in: Theory and Practice of Digital Libraries - TPDL 2013 Selected Workshops - LCPD 2013, SUEDL 2013, DataCur 2013, Held in Valletta, Malta, September 22-26, 2013, Revised Selected Papers, L. Bolikowski, V. Casarosa, P. Goodale, N. Houssos, P. Manghi and J. Schirrwagen, eds, Communications in Computer and Information Science, Vol. 416, Springer, 2013, pp. 89–100. ISBN 978-3-319-08424-4. doi:10.1007/978-3-319-08425-1_9.
[126] M. Dudáš, V. Svátek and J. Mynarz, Dataset Summary Visualization with LODSight, in: The Semantic Web: ESWC 2015 Satellite Events - ESWC 2015 Satellite Events, Portorož, Slovenia, May 31 - June 4, 2015, Revised Selected Papers, F. Gandon, C. Guéret, S. Villata, J.G. Breslin, C. Faron-Zucker and A. Zimmermann, eds, Lecture Notes in Computer Science, Vol. 9341, Springer, 2015, pp. 36–40. ISBN 978-3-319-25638-2. doi:10.1007/978-3-319-25639-9_7.
[127] P. Bellini, P. Nesi and A. Venturi, Linked open graph: Browsing multiple SPARQL entry points to build your own LOD views, J. Vis. Lang. Comput. 25(6) (2014), 703–716. doi:10.1016/j.jvlc.2014.10.003.
[128] F. Ilievski, W. Beek, M. van Erp, L. Rietveld and S. Schlobach, LOTUS: Linked Open Text UnleaShed, in: Proceedings of the 6th International Workshop on Consuming Linked Data co-located with the 14th International Semantic Web Conference (ISWC 2015), Bethlehem, Pennsylvania, US, October 12th, 2015, O. Hartig, J.F. Sequeda and A. Hogan, eds, CEUR Workshop Proceedings, Vol. 1426, CEUR-WS.org, 2015. http://ceur-ws.org/Vol-1426/paper-06.pdf.
[129] J. Klímek, J. Helmich and M. Nečaský, Payola: Collaborative Linked Data Analysis and Visualization Framework, in: The Semantic Web: ESWC 2013 Satellite Events - ESWC 2013 Satellite Events, Montpellier, France, May 26-30, 2013, Revised Selected Papers, P. Cimiano, M. Fernández, V. López, S. Schlobach and J. Völker, eds, Lecture Notes in Computer Science, Vol. 7955, Springer, 2013, pp. 147–151. ISBN 978-3-642-41241-7. doi:10.1007/978-3-642-41242-4_14.
[130] A. Jentzsch, C. Dullweber, P. Troiano and F. Naumann, Exploring Linked Data Graph Structures, in: Proceedings of the ISWC 2015 Posters & Demonstrations Track co-located with the 14th International Semantic Web Conference (ISWC-2015), Bethlehem, PA, USA, October 11, 2015, 2015. http://ceur-ws.org/Vol-1486/paper_60.pdf.

[131] B. Balis, T. Grabiec and M. Bubak, Domain-Driven Visual Query Formulation over RDF Data Sets, in: Parallel Processing and Applied Mathematics - 10th International Conference, PPAM 2013, Warsaw, Poland, September 8-11, 2013, Revised Selected Papers, Part I, R. Wyrzykowski, J.J. Dongarra, K. Karczewski and J. Wasniewski, eds, Lecture Notes in Computer Science, Vol. 8384, Springer, 2013, pp. 293–301. ISBN 978-3-642-55223-6. doi:10.1007/978-3-642-55224-3_28.
[132] N. Bikakis, G. Papastefanatos, M. Skourla and T. Sellis, A hierarchical aggregation framework for efficient multilevel visual exploration and analysis, Semantic Web 8(1) (2017), 139–179. doi:10.3233/SW-160226.
[133] R. Chawuthai and H. Takeda, RDF Graph Visualization by Interpreting Linked Data as Knowledge, in: Semantic Technology - 5th Joint International Conference, JIST 2015, Yichang, China, November 11-13, 2015, Revised Selected Papers, G. Qi, K. Kozaki, J.Z. Pan and S. Yu, eds, Lecture Notes in Computer Science, Vol. 9544, Springer, 2015, pp. 23–39. ISBN 978-3-319-31675-8. doi:10.1007/978-3-319-31676-5_2.
[134] A. Assaf, R. Troncy and A. Senart, Roomba: An Extensible Framework to Validate and Build Dataset Profiles, in: The Semantic Web: ESWC 2015 Satellite Events - ESWC 2015 Satellite Events, Portorož, Slovenia, May 31 - June 4, 2015, Revised Selected Papers, F. Gandon, C. Guéret, S. Villata, J.G. Breslin, C. Faron-Zucker and A. Zimmermann, eds, Springer International Publishing, Cham, 2015, pp. 325–339. ISBN 978-3-319-25639-9. doi:10.1007/978-3-319-25639-9_46.
[135] J. Polowinski, Towards RVL: A Declarative Language for Visualizing RDFS/OWL Data, in: Proceedings of the 3rd International Conference on Web Intelligence, Mining and Semantics, WIMS '13, ACM, New York, NY, USA, 2013, pp. 38:1–38:11. ISBN 978-1-4503-1850-1. doi:10.1145/2479787.2479825.
[136] C. Nikolaou, K. Dogani, K. Bereta, G. Garbis, M. Karpathiotakis, K. Kyzirakos and M. Koubarakis, Sextant: Visualizing time-evolving linked geospatial data, J. Web Sem. 35 (2015), 35–52. doi:10.1016/j.websem.2015.09.004.
[137] S. Yumusak, E. Dogdu, H. Kodaz, A. Kamilaris and P. Vandenbussche, SpEnD: Linked Data SPARQL Endpoints Discovery Using Search Engines, IEICE Transactions 100-D(4) (2017), 758–767. https://search.ieice.org/bin/summary.php?id=e100-d_4_758.
[138] S. Scheider, A. Degbelo, R. Lemmens, C. van Elzakker, P. Zimmerhof, N. Kostic, J. Jones and G. Banhatti, Exploratory querying of SPARQL endpoints in space and time, Semantic Web 8(1) (2017), 65–86. doi:10.3233/SW-150211.
[139] A. Hasnain, Q. Mehmood, S.S. e Zainab and A. Hogan, SPORTAL: Searching for Public SPARQL Endpoints, in: Proceedings of the ISWC 2016 Posters & Demonstrations Track co-located with the 15th International Semantic Web Conference (ISWC 2016), Kobe, Japan, October 19, 2016, 2016. http://ceur-ws.org/Vol-1690/paper78.pdf.
[140] B. Do, P. Wetz, E. Kiesling, P.R. Aryan, T. Trinh and A.M. Tjoa, StatSpace: A Unified Platform for Statistical Data Exploration, in: On the Move to Meaningful Internet Systems: OTM 2016 Conferences - Confederated International Conferences: CoopIS, C&TC, and ODBASE 2016, Rhodes, Greece, October 24-28, 2016, Proceedings, C. Debruyne, H. Panetto, R. Meersman, T.S. Dillon, eva Kühn, D. O'Sullivan and C.A. Ardagna, eds, Lecture Notes in Computer Science, Vol. 10033, 2016, pp. 792–809. ISBN 978-3-319-48471-6. doi:10.1007/978-3-319-48472-3_50.
[141] V. Fionda, C. Gutiérrez and G. Pirrò, The swget portal: Navigating and acting on the web of linked data, J. Web Sem. 26 (2014), 29–35. doi:10.1016/j.websem.2014.04.003.
[142] P. Bottoni and M. Ceriani, SWOWS and dynamic queries to build browsing applications on linked data, J. Vis. Lang. Comput. 25(6) (2014), 738–744. doi:10.1016/j.jvlc.2014.10.027.
[143] M. Röder, A.N. Ngomo, I. Ermilov and A. Both, Detecting Similar Linked Datasets Using Topic Modelling, in: The Semantic Web. Latest Advances and New Domains - 13th International Conference, ESWC 2016, Heraklion, Crete, Greece, May 29 - June 2, 2016, Proceedings, H. Sack, E. Blomqvist, M. d'Aquin, C. Ghidini, S.P. Ponzetto and C. Lange, eds, Lecture Notes in Computer Science, Vol. 9678, Springer, 2016, pp. 3–19. ISBN 978-3-319-34128-6. doi:10.1007/978-3-319-34129-3_1.
[144] M. Luggen, A. Gschwend, B. Anrig and P. Cudré-Mauroux, Uduvudu: a Graph-Aware and Adaptive UI Engine for Linked Data, in: Proceedings of the Workshop on Linked Data on the Web, LDOW 2015, co-located with the 24th International World Wide Web Conference (WWW 2015), Florence, Italy, May 19th, 2015, C. Bizer, S. Auer, T. Berners-Lee and T. Heath, eds, CEUR Workshop Proceedings, Vol. 1409, CEUR-WS.org, 2015. http://ceur-ws.org/Vol-1409/paper-07.pdf.
[145] P. Ristoski and H. Paulheim, Visual Analysis of Statistical Data on Maps Using Linked Open Data, in: The Semantic Web: ESWC 2015 Satellite Events - ESWC 2015 Satellite Events, Portorož, Slovenia, May 31 - June 4, 2015, Revised Selected Papers, F. Gandon, C. Guéret, S. Villata, J.G. Breslin, C. Faron-Zucker and A. Zimmermann, eds, Lecture Notes in Computer Science, Vol. 9341, Springer, 2015, pp. 138–143. ISBN 978-3-319-25638-2. doi:10.1007/978-3-319-25639-9_27.
[146] P. Arapov, M. Buffa and A. Ben Othmane, Semantic Mashup with the Online IDE WikiNEXT, in: Proceedings of the 23rd International Conference on World Wide Web, WWW '14 Companion, ACM, New York, NY, USA, 2014, pp. 119–122. ISBN 978-1-4503-2745-9. doi:10.1145/2567948.2577010.
[147] S. Villata, J.Z. Pan and M. Dragoni (eds), Proceedings of the ISWC 2015 Posters & Demonstrations Track co-located with the 14th International Semantic Web Conference (ISWC-2015), Bethlehem, PA, USA, October 11, 2015, CEUR Workshop Proceedings, Vol. 1486, CEUR-WS.org, 2015. http://ceur-ws.org/Vol-1486.
[148] P. Cimiano, M. Fernández, V. López, S. Schlobach and J. Völker (eds), The Semantic Web: ESWC 2013 Satellite Events - ESWC 2013 Satellite Events, Montpellier, France, May 26-30, 2013, Revised Selected Papers, Lecture Notes in Computer Science, Vol. 7955, Springer, 2013. ISBN 978-3-642-41241-7. doi:10.1007/978-3-642-41242-4.

[149] C. Bizer, S. Auer, T. Berners-Lee and T. Heath (eds), Proceedings of the Workshop on Linked Data on the Web, LDOW 2015, co-located with the 24th International World Wide Web Conference (WWW 2015), Florence, Italy, May 19th, 2015, CEUR Workshop Proceedings, Vol. 1409, CEUR-WS.org, 2015. http://ceur-ws.org/Vol-1409.
[150] A. Bozzon, P. Cudré-Mauroux and C. Pautasso (eds), Web Engineering - 16th International Conference, ICWE 2016, Lugano, Switzerland, June 6-9, 2016, Proceedings, Lecture Notes in Computer Science, Vol. 9671, Springer, 2016. ISBN 978-3-319-38790-1. doi:10.1007/978-3-319-38791-8.
[151] V. Presutti, E. Blomqvist, R. Troncy, H. Sack, I. Papadakis and A. Tordai (eds), The Semantic Web: ESWC 2014 Satellite Events - ESWC 2014 Satellite Events, Anissaras, Crete, Greece, May 25-29, 2014, Revised Selected Papers, Lecture Notes in Computer Science, Vol. 8798, Springer, 2014. ISBN 978-3-319-11954-0. doi:10.1007/978-3-319-11955-7.
[152] F. Gandon, C. Guéret, S. Villata, J.G. Breslin, C. Faron-Zucker and A. Zimmermann (eds), The Semantic Web: ESWC 2015 Satellite Events - ESWC 2015 Satellite Events, Portorož, Slovenia, May 31 - June 4, 2015, Revised Selected Papers, Lecture Notes in Computer Science, Vol. 9341, Springer, 2015. ISBN 978-3-319-25638-2. doi:10.1007/978-3-319-25639-9.
[153] H. Sack, E. Blomqvist, M. d'Aquin, C. Ghidini, S.P. Ponzetto and C. Lange (eds), The Semantic Web. Latest Advances and New Domains - 13th International Conference, ESWC 2016, Heraklion, Crete, Greece, May 29 - June 2, 2016, Proceedings, Lecture Notes in Computer Science, Vol. 9678, Springer, 2016. ISBN 978-3-319-34128-6. doi:10.1007/978-3-319-34129-3.

Appendix A. Tools evaluated using all criteria


Tool/Criterion | LP-VIZ | V2 | VISU | Sp | YAS | UV | LP-ETL | OC | DL | LAUN | LDR | SF | ELD | PEPE | FAGI | RDFPRO
1.1 CKAN API | – | – | – | – | – | – | – | ✓ | – | – | – | – | – | – | – | –
1.2 RDF metadata in dump | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | –
1.3 Metadata in SPARQL endpoint | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | –
1.4 Metadata from IRI dereferencing | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | –
2.1 Dataset profiling | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | –
3.1 Structured search form | – | – | – | ✓ | – | – | – | ✓ | – | – | – | ✓ | – | – | – | –
3.2 Fulltext search | – | – | – | – | – | – | – | ✓ | – | – | – | ✓ | – | – | – | –
3.3 Natural language based intent expression | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | –
3.4 Formal language based intent expression | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | –
4.1 Search query automated suggestions | – | – | – | ✓ | ✓ | – | – | ✓ | – | – | – | – | – | – | – | –
4.2 Faceted search | – | – | – | – | – | – | – | – | – | – | – | ✓ | – | – | – | –
5.1 Search query expansion | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | –
6.1 Search results extension | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | –
7.1 Content-based dataset ranking | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | –
7.2 Context-based dataset ranking | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | –
7.3 Intent-based dataset ranking | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | –
8.1 Preview – W3C vocabularies | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | –
8.2 Preview – LOV vocabularies | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | –
8.3 Preview metadata | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | –
9.1 Preview data | – | – | ✓ | – | – | – | – | ✓ | – | ✓ | – | – | – | – | – | ✓
9.2 Schema extraction | – | ✓ | – | ✓ | – | – | – | ✓ | – | – | – | – | – | ✓ | – | –
10.1 Quality indicators | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | –
11.1 IRI dereferencing | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | –
11.2 Crawling | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | –
12.1 SPARQL: querying | ✓ | ✓ | ✓ | ✓ | – | – | – | – | – | – | ✓ | – | ✓ | ✓ | ✓ | –
12.2 SPARQL: named graphs | ✓ | – | – | – | – | ✓ | ✓ | – | ✓ | – | – | – | ✓ | – | ✓ | –
12.3 SPARQL: custom query | – | – | ✓ | – | ✓ | ✓ | ✓ | ✓ | ✓ | – | ✓ | – | – | – | – | ✓
12.4 SPARQL: authentication | – | – | – | – | – | ✓ | ✓ | ✓ | ✓ | – | – | – | – | – | – | –
12.5 SPARQL: advanced download | – | – | – | – | – | ✓ | ✓ | – | – | – | – | – | – | – | – | –
Table 6. Tools evaluated using all criteria - part 1 - Req. 1 Catalog support – Req. 12 RDF input using SPARQL. ✓ means the criterion is fulfilled for LD experts; the totals below Table 8 also report how many criteria each tool fulfills for non-LD experts.

Tool/Criterion | LP-VIZ | V2 | VISU | Sp | YAS | UV | LP-ETL | OC | DL | LAUN | LDR | SF | ELD | PEPE | FAGI | RDFPRO
13.1 RDF/XML file load | – | – | – | – | – | ✓ | ✓ | ✓ | ✓ | ✓ | – | – | – | – | – | ✓
13.2 Turtle file load | ✓ | – | – | – | – | ✓ | ✓ | ✓ | ✓ | ✓ | – | ✓ | – | – | – | ✓
13.3 N-Triples file load | – | – | – | – | – | ✓ | ✓ | ✓ | ✓ | ✓ | – | ✓ | – | – | ✓ | ✓
13.4 JSON-LD file load | – | – | – | – | – | ✓ | ✓ | – | ✓ | – | – | – | – | – | – | ✓
13.5 RDFa file load | – | – | – | – | – | ✓ | ✓ | ✓ | ✓ | ✓ | – | – | – | – | – | –
13.6 N-Quads file load | – | – | – | – | – | ✓ | ✓ | ✓ | ✓ | ✓ | – | ✓ | – | – | – | ✓
13.7 TriG file load | – | – | – | – | – | ✓ | ✓ | ✓ | ✓ | ✓ | – | ✓ | – | – | – | ✓
13.8 Handling of bigger files | – | – | – | – | – | – | ✓ | – | – | – | – | – | – | – | – | ✓
13.9 CSV on the Web load | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | –
13.10 Direct mapping load | – | – | – | – | – | ✓ | – | – | ✓ | – | – | – | – | – | – | –
13.11 R2RML load | – | – | – | – | – | – | – | ✓ | – | – | – | – | – | – | – | –
14.1 Linked Data Platform containers | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | –
15.1 Monitoring of input changes: polling | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | –
15.2 Monitoring of input changes: subscription | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | –
16.1 Manual versioning support | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | –
16.2 Automated versioning support | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | –
16.3 Source versioning support | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | –
17.1 Shared resources analysis | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | –
17.2 Link analysis | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | –
17.3 Shared vocabularies analysis | – | ✓ | – | – | – | – | – | – | – | – | – | – | – | – | – | –
18.1 Link discovery | – | – | – | – | – | – | – | ✓ | ✓ | – | – | – | – | – | – | –
18.2 Ontology alignment | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | –
19.1 Vocabulary mapping based transformations | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | –
19.2 Vocabulary-based SPARQL transformations | – | – | – | – | – | – | – | – | ✓ | – | – | – | – | – | – | –
20.1 RDFS inference | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | ✓
20.2 Resource fusion | – | – | – | – | – | – | – | – | – | – | – | – | – | – | ✓ | –
20.3 OWL inference | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | ✓
21.1 Assisted selection | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | –
21.2 Assisted projection | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | –
22.1 Custom transformations | – | – | – | – | – | ✓ | ✓ | ✓ | ✓ | – | – | – | – | – | – | ✓
23.1 Automated data manipulation | ✓ | – | – | – | – | – | – | – | – | – | – | – | – | – | – | –
24.1 Provenance input | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | –
24.2 Provenance output | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | –
25.1 License awareness | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | –
25.2 License management | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | –
Table 7. Tools evaluated using all criteria - part 2 - Req. 13 RDF dump load – Req. 25 License management. ✓ means the criterion is fulfilled for LD experts; the totals below Table 8 also report how many criteria each tool fulfills for non-LD experts.


Tool/Criterion | LP-VIZ | V2 | VISU | Sp | YAS | UV | LP-ETL | OC | DL | LAUN | LDR | SF | ELD | PEPE | FAGI | RDFPRO
26.1 Graph visualization | – | ✓ | – | – | – | – | – | – | – | – | ✓ | – | – | – | – | –
26.2 Manual visualization | – | – | ✓ | ✓ | ✓ | – | – | ✓ | ✓ | – | – | – | – | – | – | –
26.3 Vocabulary-based visualization | ✓ | – | – | ✓ | ✓ | – | – | – | – | – | ✓ | – | ✓ | – | ✓ | –
27.1 RDF/XML file dump | – | – | – | – | – | ✓ | ✓ | ✓ | ✓ | – | – | – | – | – | – | ✓
27.2 Turtle file dump | – | – | – | – | – | ✓ | ✓ | ✓ | ✓ | – | – | – | – | – | – | ✓
27.3 N-Triples file dump | – | – | – | – | – | ✓ | ✓ | ✓ | ✓ | ✓ | – | – | – | – | – | ✓
27.4 JSON-LD file dump | – | – | – | – | – | ✓ | ✓ | ✓ | – | – | – | – | – | – | – | ✓
27.5 RDFa file dump | – | – | – | – | – | ✓ | ✓ | ✓ | – | – | – | – | – | – | – | –
27.6 N-Quads file dump | – | – | – | – | – | ✓ | ✓ | ✓ | ✓ | – | – | – | – | – | – | ✓
27.7 TriG file dump | – | – | – | – | – | ✓ | ✓ | ✓ | ✓ | – | – | – | – | – | – | ✓
28.1 SPARQL Update support | – | – | – | – | – | ✓ | ✓ | – | – | – | – | – | – | – | ✓ | ✓
28.2 SPARQL: named graphs support | – | – | – | – | – | ✓ | ✓ | – | – | – | – | – | – | – | ✓ | –
28.3 SPARQL: authentication support | – | – | – | – | – | ✓ | ✓ | – | – | – | – | – | – | – | – | –
28.4 SPARQL Graph Store HTTP Protocol output | – | – | – | – | – | ✓ | ✓ | – | – | – | – | – | – | – | – | –
29.1 Linked Data Platform Containers output | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | –
30.1 CSV file output | – | – | ✓ | – | ✓ | ✓ | ✓ | ✓ | ✓ | – | – | – | – | – | – | –
30.2 CSV on the Web output | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | –
30.3 XML file output | – | – | – | – | – | – | ✓ | – | – | – | – | – | – | – | – | –
30.4 JSON file output | – | – | – | – | – | – | ✓ | – | – | – | – | – | – | – | – | –
30.5 Output for specific third-party tools | – | – | ✓ | – | – | – | – | – | – | – | – | – | – | – | – | –
31.1 Existence of API | – | – | – | – | – | ✓ | ✓ | – | – | – | – | – | – | – | – | –
31.2 API documentation | – | – | – | – | – | ✓ | ✓ | – | – | – | – | – | – | – | – | –
31.3 Full API coverage | – | – | – | – | – | – | ✓ | – | – | – | – | – | – | – | – | –
32.1 Configuration in open format | – | – | – | – | – | ✓ | ✓ | – | – | – | – | – | – | – | – | ✓
32.2 Configuration in RDF | – | – | – | – | – | – | ✓ | – | – | – | – | – | – | – | – | –
33.1 Repository for source code | ✓ | – | – | – | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓
33.2 Repository for plugins | – | – | – | – | – | ✓ | ✓ | – | – | – | – | – | – | – | – | –
33.3 Repository for data processes | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | –
33.4 Community activity | ✓ | – | – | – | ✓ | ✓ | ✓ | – | ✓ | – | – | ✓ | – | – | ✓ | ✓
34.1 Deployment of services | – | – | – | – | – | – | ✓ | – | – | – | – | – | – | – | – | –
Total for LD experts | 7 | 4 | 6 | 6 | 7 | 31 | 36 | 27 | 23 | 9 | 5 | 9 | 4 | 3 | 9 | 22
Total for non-LD experts | 6 | 2 | 0 | 5 | 5 | 22 | 20 | 18 | 16 | 8 | 4 | 9 | 3 | 2 | 6 | 16
Table 8. Tools evaluated using all criteria - part 3 - Req. 26 Visualization – Req. 34 Deployment of services. ✓ means the criterion is fulfilled for LD experts; the totals report, per tool, the number of criteria fulfilled for LD experts and for non-LD experts.

Appendix B. Elimination rounds for potential candidates

– U in round 3 means that loading of the test RDF data into the tool is not supported by the tool, e.g. the tool has a fixed set of datasets to work with.
– E in round 3 means that there was an error loading the test RDF data into the tool.

Tool / Round | 2 | 3
[107] Aemoo (http://wit.istc.cnr.it/aemoo/) | ✓ | U
[108] Aether (http://demo.seco.tkk.fi/aether/) | ✓ | E
[109] CODE Balloon (http://schlegel.github.io/balloon/index.html) | ✓ | U
[110] Prissma Browser (http://wimmics.inria.fr/projects/prissma/) | – | –
[111] CubeViz (http://cubeviz.aksw.org/) | ✓ | E
[112] D2V (http://www.ics.forth.gr/isl/D2VSystem) | – | –
[113] DataGraft (https://github.com/datagraft) | – | –
[96] Datalift (http://datalift.org/) | ✓ | ✓
[114] DCAT merger (https://github.com/pheyvaer/dcat-merger) | ✓ | E
[2] CODE DoSeR (https://code.know-center.tugraz.at/search) | – | –
[2] CODE Enrichment Service and Annotator Tool (https://code.know-center.tugraz.at/search) | – | –
[99] ESTA-LD (https://github.com/GeoKnow/ESTA-LD) | ✓ | ✓
[115] ExConQuer | – | –
[101] FAGI-gis (https://github.com/SLIPO-EU/FAGI-gis) | ✓ | ✓
[116] FuhSen (https://github.com/LiDaKrA/FuhSen) | ✓ | U
[117] Fusepool (https://github.com/fusepoolP3) | ✓ | E
[118] Graphium (https://github.com/afloresv/Graphium) | – | –
Table 9. All evaluated tools which passed elimination round 1 - part 1. ✓ means the tool passed the given round; – means it did not.

Tool / Round | 2 | 3
[119] Karma (http://usc-isi-i2.github.io/karma/) | – | –
[120] Konclude (http://www.derivo.de/en/produkte/konclude.html) | – | –
[98] LD-R (http://ld-r.org/) | ✓ | ✓
[51] LDVMi (https://github.com/ldvm/LDVMi) | – | –
[121] LinDA (http://linda.west.uni-koblenz.de/) | ✓ | E
[84] LinkDaViz (http://eis.iai.uni-bonn.de/Projects/LinkDaViz.html) | – | –
[122] LinkedWidgets (http://linkedwidgets.org/) | ✓ | U
[123] LinkSUM (http://km.aifb.kit.edu/services/link/) | ✓ | U
[124] LODeX (http://dbgroup.unimo.it/lodex) | – | –
[97] LODLaundromat (https://github.com/LOD-Laundromat) | ✓ | ✓
[125] LODmilla (http://lodmilla.sztaki.hu/lodmilla/) | – | –
[126] LODSight (http://lod2-dev.vse.cz/lodsight-v2) | ✓ | U
[71] LODVader (https://github.com/AKSW/LODVader/) | – | –
[127] LOG (linked-open-graph) (http://log.disit.org/service/) | ✓ | E
[128] LOTUS (https://github.com/filipdbrsk/LOTUS_Search) | ✓ | E
[92] LP-ETL (https://github.com/linkedpipes/etl) | ✓ | ✓
[10] LP-VIZ (https://github.com/ldvm/LDVMi) | ✓ | ✓
[94] OpenCube (http://opencube-toolkit.eu/) | ✓ | ✓
[129] Payola (https://github.com/payola/Payola) | – | –
[100] PepeSearch (https://github.com/guiveg/pepesearch) | ✓ | ✓
[130] ProLOD++ (https://www.hpi.uni-potsdam.de/naumann/sites/prolod++) | – | –
[131] QUaTRO (http://dice.cyfronet.pl/products/quatro) | – | –
[2] CODE Query Wizard (https://code.know-center.tugraz.at/search) | – | –
[132] rdf:SynopsViz (http://synopsviz.imis.athena-innovation.gr/) | ✓ | E
[133] RDF4U (http://rc.lodac.nii.ac.jp/rdf4u/) | ✓ | U
[56] RDFpro (http://rdfpro.fbk.eu/) | ✓ | ✓
[134] Roomba (https://github.com/ahmadassaf/OpenData-Checker) | – | –
Table 10. All evaluated tools which passed elimination round 1 - part 2. ✓ means the tool passed the given round; – means it did not.

Tool / Round | 2 | 3
[135] RVL (https://github.com/janpolowinski/rvl) | ✓ | U
[30] SemFacet (https://github.com/semfacet) | ✓ | ✓
[136] Sextant (http://sextant.di.uoa.gr/) | ✓ | U
[28] Sparklis (http://www.irisa.fr/LIS/ferre/sparklis/) | ✓ | ✓
[137] SpEnD (https://github.com/semihyumusak/SpEnD) | ✓ | E
[138] SPEX (https://github.com/lodum/SPEX) | ✓ | E
[139] SPORTAL (https://github.com/SAliHasnain/sparqlautodescription) | ✓ | E
[140] StatSpace (http://statspace.linkedwidgets.org/) | ✓ | U
[110] Prissma Studio (http://wimmics.inria.fr/projects/prissma/) | ✓ | U
[141] SWGET (https://swget.wordpress.com/) | – | –
[142] SWOWS (http://www.swows.org) | – | –
[143] Tapioca (http://aksw.org/Projects/Tapioca.html) | – | –
[144] Uduvudu (https://github.com/uduvudu/uduvudu/) | ✓ | U
[91] UnifiedViews (https://github.com/unifiedviews) | ✓ | ✓
[89] V2 (http://data.aalto.fi/V2) | ✓ | ✓
[145] ViCoMap (http://vicomap.informatik.uni-mannheim.de/) | ✓ | E
[89] VISU (http://data.aalto.fi/visu) | ✓ | ✓
[2] CODE Visualisation Wizard (https://code.know-center.tugraz.at/search) | – | –
[146] WikiNEXT (https://github.com/pavel-arapov/wikinext) | – | –
[121] LinDA Workbench (http://linda.west.uni-koblenz.de/) | – | –
[90] YASGUI (https://github.com/OpenTriply/YASGUI) | ✓ | ✓
Table 11. All evaluated tools which passed elimination round 1 - part 3. ✓ means the tool passed the given round; – means it did not.

Appendix C. Criteria evaluation scenarios

In this section we describe in more detail the evaluation scenarios for our 94 tool evaluation criteria. Each scenario is described by one or more Pass conditions, optional Prerequisites, and examples of passing or failing the conditions. It is enough for the tool to pass one Pass condition to pass the criterion.

Criterion 1.1 CKAN API
Pass The tool can load metadata from a CKAN catalog using its API.
Pass example Load metadata from the CKAN catalog with URL https://linked.opendata.cz using its API.

Criterion 1.2 RDF metadata in dump
Pass The tool is able to load VoID, DCAT, DCAT-AP or Schema.org dataset metadata from a file or a URL.
Pass example The tool loads DCAT-AP metadata from a file or a URL, e.g. https://nkod.opendata.cz/soubor/nkod.trig, and lets the user see the datasets in the tool.

Criterion 1.3 Metadata in SPARQL endpoint
Pass The tool is able to load VoID, DCAT, DCAT-AP or Schema.org dataset metadata from a SPARQL endpoint.
Pass example The tool can import DCAT-AP metadata from https://www.europeandataportal.eu/sparql and show the user a list of datasets, which the user can use in the tool.

Criterion 1.4 Metadata from IRI dereferencing
Pass The tool is able to load VoID, DCAT, DCAT-AP or Schema.org dataset metadata by dereferencing an IRI of a metadata resource.
Pass example The tool can dereference the IRI of a dcat:Dataset or void:Dataset resource, e.g. https://nkod.opendata.cz/zdroj/datová-sada/247025777, show the user information about the dataset and allow the user to use the dataset.

Criterion 2.1 Dataset profiling
Pass The tool is able to discover datasets on the web and compute a profile of any kind. The tool informs the user about the discovered datasets.
Pass example Given a URL of an RDF resource, the tool can identify related resources and enable the user to use all of them as a dataset.
Fail example The tool can only load datasets from an external catalog (not necessarily CKAN or DKAN).

Criterion 3.1 Structured search form
Prerequisite The tool offers search functionality.
Pass The tool provides a structured search form to enable the user to restrict metadata values of the searched datasets. The tool presents the result of the search operation to the user.

Criterion 3.2 Fulltext search
Pass The tool provides an input field where the user can specify a full-text query to search for datasets. The tool presents the result of the search operation to the user.

Criterion 3.3 Natural language based intent expression
Pass The tool provides an input field where the user can specify a query in a natural language to search for datasets. The tool presents the result of the search operation to the user.

Criterion 3.4 Formal language based intent expression
Pass The tool provides an input field where the user can specify a query in a formal language to search for datasets. The tool presents the result of the search operation to the user.

Criterion 4.1 Search query automated suggestions
Prerequisite 3.1 Structured search form or 3.2 Fulltext search or 3.3 Natural language based intent expression or 3.4 Formal language based intent expression
Pass When the user types a search query, the tool automatically suggests some parts of the search query.
Pass example The tool can automatically suggest predicates or classes based on prefixes when the user types a search query expressed in a formal query language.
Pass example The tool can automatically suggest parts of sentences when the user types a search query expressed in a natural language.

Criterion 4.2 Faceted search
Prerequisite The tool allows users to search for datasets.
Pass The tool shows filters (facets) the user can use to (re)formulate the search query.
Pass example The user can specify the time/spatial coverage of a dataset.
Pass example The tool provides the user with a list of all values for a certain predicate and lets the user select the required values.
Fail example The tool provides only a visualization with filters that can be used to restrict the values being visualized, outside of the context of dataset search.
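To make the metadata-loading scenarios concrete, the following minimal sketch (in Python) shows the kind of CKAN API interaction Criterion 1.1 expects. It assumes the catalog from the pass example exposes the standard CKAN action API endpoints package_list and package_show; error handling is omitted for brevity.

```python
# A minimal sketch of Criterion 1.1: load dataset metadata from a CKAN
# catalog via its action API. Assumes the standard CKAN API v3 paths.
import requests

CATALOG = "https://linked.opendata.cz"  # catalog from the pass example

def list_datasets():
    # package_list returns the identifiers of all datasets in the catalog
    response = requests.get(f"{CATALOG}/api/3/action/package_list", timeout=30)
    response.raise_for_status()
    return response.json()["result"]

def dataset_metadata(dataset_id):
    # package_show returns the full metadata record of a single dataset
    response = requests.get(f"{CATALOG}/api/3/action/package_show",
                            params={"id": dataset_id}, timeout=30)
    response.raise_for_status()
    return response.json()["result"]

for dataset_id in list_datasets()[:10]:
    print(dataset_metadata(dataset_id)["title"])
```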

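Analogously, a tool satisfying Criterion 1.3 can discover DCAT-described datasets with a single query against a metadata endpoint. This sketch uses the endpoint from the pass example and the SPARQLWrapper library; the query is illustrative, not the only possible one.

```python
# A sketch of Criterion 1.3: list datasets described with DCAT
# in a SPARQL endpoint holding catalog metadata.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://www.europeandataportal.eu/sparql")
sparql.setReturnFormat(JSON)
sparql.setQuery("""
    PREFIX dcat: <http://www.w3.org/ns/dcat#>
    PREFIX dct:  <http://purl.org/dc/terms/>
    SELECT ?dataset ?title WHERE {
      ?dataset a dcat:Dataset ;
               dct:title ?title .
    } LIMIT 10
""")
for binding in sparql.query().convert()["results"]["bindings"]:
    print(binding["dataset"]["value"], "-", binding["title"]["value"])
```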
Criterion 5.1 Search query expansion
Prerequisite 3.1 Structured search form or 3.2 Fulltext search or 3.3 Natural language based intent expression or 3.4 Formal language based intent expression
Pass The tool expands the search query provided by the user with additional resources (e.g., concepts, properties, classes).
Pass example The tool provides a check-box which enables the user to switch on the query expansion.
Pass example Before or after evaluating the search query, the tool shows the expanded version of the query and informs the user that it is an expanded version of the original query.

Criterion 6.1 Search results extension
Prerequisite 3.1 Structured search form or 3.2 Fulltext search or 3.3 Natural language based intent expression or 3.4 Formal language based intent expression
Pass The tool extends the list of datasets which is the result of evaluating the given search query with additional semantically related datasets, and it informs the user about the extension.
Pass example After the result of evaluating the given search query is displayed by the tool, the user can click on a button to see additional semantically related results.
Pass example The tool displays a list of additional semantically related datasets below the list of datasets discovered by evaluating the search query. Both lists are visually separated from each other in some way.
Fail example The tool only implements a "show more" button that shows additional items (pagination) from the result set.

Criterion 7.1 Content-based dataset ranking
Prerequisite The tool allows users to search for datasets.
Pass The tool orders the list of discovered datasets by a metric based on their metadata (e.g., publisher, update periodicity, size) or content (e.g., different content quality metrics).

Criterion 7.2 Context-based dataset ranking
Prerequisite The tool allows users to search for datasets.
Pass The tool orders the list of discovered datasets by a metric based on their context which is formed by linked datasets.

Criterion 7.3 Intent-based dataset ranking
Prerequisite The tool allows users to search for datasets.
Pass The tool orders the list of discovered datasets by a metric based on the structural and/or semantic similarity or graph distance of their content or context (i.e. linked datasets) with resources in the search query.

Criterion 8.1 Preview - W3C vocabularies
Prerequisite The tool allows users to search for datasets.
Pass The tool offers the user additional information in the search result for each discovered dataset which is a visual or textual preview of the dataset. The preview is extracted from a part of the content of the dataset which is represented according to the specification of well-known W3C vocabularies.
Pass example For a dataset containing a SKOS hierarchy, the tool can show information about the SKOS hierarchy.

Criterion 8.2 Preview - LOV vocabularies
Prerequisite The tool allows users to search for datasets.
Pass The tool offers the user additional information in the search result for each discovered dataset which is a visual or textual preview of the dataset. The preview is based on vocabularies listed in LOV, similarly to the previous requirement.
Pass example For a dataset containing Schema.org GeoCoordinates, the user can click on a button and see an image on a map.

Criterion 8.3 Preview metadata
Prerequisite The tool allows users to search for datasets.
Pass In the query results view the user can see additional information for each dataset created based on the dataset metadata.

Criterion 9.1 Preview data
Pass The tool can compute statistics (number of triples, etc.) or a description of the dataset.

Criterion 9.2 Schema extraction
Pass The tool can extract and visualize a schema used in a dataset.
Pass example The tool can visualize a schema extracted from the data.
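As an illustration of the vocabulary-based previews of Criterion 8.1, the sketch below renders the top of a SKOS concept hierarchy found in a dataset. The file name is a hypothetical local dump; a real tool would run the same query against whatever source it loaded.

```python
# A sketch of Criterion 8.1: if the dataset uses SKOS, preview the
# top of the concept hierarchy (concepts without a broader concept).
from rdflib import Graph

g = Graph()
g.parse("dataset.ttl", format="turtle")  # hypothetical local dump

top_concepts = g.query("""
    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
    SELECT ?concept ?label WHERE {
      ?concept a skos:Concept ;
               skos:prefLabel ?label .
      FILTER NOT EXISTS { ?concept skos:broader ?parent }
    }
""")
for row in top_concepts:
    print(row.label)
```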

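The data previews of Criteria 9.1 and 9.2 can likewise be approximated with a few generic queries; a sketch, again over a hypothetical local dump:

```python
# A sketch of Criteria 9.1 and 9.2: simple statistics and a rough
# schema (distinct classes and predicates) of a loaded dataset.
from rdflib import Graph

g = Graph()
g.parse("dataset.ttl", format="turtle")  # hypothetical local dump

print("triples:", len(g))  # the most basic statistic of Criterion 9.1

classes = g.query("SELECT DISTINCT ?c WHERE { ?s a ?c }")
predicates = g.query("SELECT DISTINCT ?p WHERE { ?s ?p ?o }")
print("classes used:", sorted(str(row.c) for row in classes))
print("predicates used:", sorted(str(row.p) for row in predicates))
```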
Criterion 10.1 Quality indicators
Pass The tool is able to provide the user with any quality indicators.
Pass example The tool shows that a dataset is available 80% of the time.

Criterion 11.1 IRI dereferencing
Pass The tool can request content from a given IRI using HTTP content negotiation.
Pass example The tool can download a representation of Prague from http://dbpedia.org/resource/Prague.
Fail example The tool can only download an RDF file from a server using a simple HTTP GET.

Criterion 11.2 Crawling
Prerequisite 11.1 IRI dereferencing
Pass Given an IRI of an RDF resource, the tool can download not only the given resource but also related resources.
Pass example Given the resource http://dbpedia.org/resource/Prague, the tool can find and download additional information about referenced resources.

Criterion 12.1 SPARQL: querying
Pass Given a SPARQL endpoint, the tool can query it using predefined queries and process the results.

Criterion 12.2 SPARQL: named graphs
Pass The tool enables the user to specify a named graph present in the queried SPARQL endpoint by means other than specifying it in the user provided query.
Pass example The tool allows the user to select or provide a named graph that is used to evaluate the query.

Criterion 12.3 SPARQL: custom query
Pass The user can provide custom SPARQL queries.

Criterion 12.4 SPARQL: authentication
Pass The user can provide credentials that are used for authentication/authorization on a SPARQL endpoint.

Criterion 12.5 SPARQL: advanced download
Pass The tool explicitly offers support for dealing with large SPARQL result sets that can not be downloaded in a regular way due to endpoint limitations.
Pass example The tool supports pagination.
Pass example The tool supports the Virtuoso scrollable cursor technique.

Criterion 13.1 RDF/XML file load
Pass The tool can load RDF data from a URL or a local file in the RDF/XML format.

Criterion 13.2 Turtle file load
Pass The tool can load RDF data from a URL or a local file in the Turtle format.

Criterion 13.3 N-Triples file load
Pass The tool can load RDF data from a URL or a local file in the N-Triples format.

Criterion 13.4 JSON-LD file load
Pass The tool can load RDF data from a URL or a local file in the JSON-LD format.

Criterion 13.5 RDFa file load
Pass The tool can load RDF data from a URL or a local file containing RDFa annotations.

Criterion 13.6 N-Quads file load
Pass The tool can load RDF data from a URL or a local file in the N-Quads format.

Criterion 13.7 TriG file load
Pass The tool can load RDF data from a URL or a local file in the TriG format.

Criterion 13.8 Handling of bigger files
Pass The tool implements a strategy for processing of bigger RDF files.
Pass example The tool implements streaming as a form of RDF processing.
Fail example The tool can only load files up to a size limited by the available memory.

Criterion 13.9 CSV on the Web load
Pass The tool can load an uploaded CSV file accompanied by a CSVW JSON-LD descriptor.
Pass The tool can load data from a given URL of a CSVW JSON-LD descriptor that contains a reference to the CSV data file.
Pass The tool can load a CSV file from a given URL, assuming the server also provides the CSVW JSON-LD descriptor in the location described by the CSV on the Web specification.

Criterion 13.10 Direct mapping load
Pass Using the Direct Mapping specification, the tool can load data directly from a relational database.

Criterion 13.11 R2RML load
Pass Using an R2RML mapping, the tool can load data directly from a relational database.
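For Criterion 11.1, the distinguishing feature is HTTP content negotiation, i.e. explicitly asking the server for an RDF serialization instead of the default HTML page. A minimal sketch using the DBpedia resource from the pass example:

```python
# A sketch of Criterion 11.1: dereference an IRI with content
# negotiation and parse the returned RDF.
import requests
from rdflib import Graph

iri = "http://dbpedia.org/resource/Prague"
response = requests.get(iri, headers={"Accept": "text/turtle"},
                        allow_redirects=True, timeout=30)
response.raise_for_status()

g = Graph()
g.parse(data=response.text, format="turtle")
print(len(g), "triples about", iri)
```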

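One common realization of Criterion 12.5 is LIMIT/OFFSET pagination. The sketch below assumes the endpoint tolerates this access pattern; ORDER BY is needed to make the pages stable.

```python
# A sketch of Criterion 12.5: download a large result set page by page.
from SPARQLWrapper import SPARQLWrapper, JSON

PAGE_SIZE = 10000
sparql = SPARQLWrapper("https://dbpedia.org/sparql")  # example endpoint
sparql.setReturnFormat(JSON)

offset, results = 0, []
while True:
    sparql.setQuery(f"""
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        SELECT ?s ?label WHERE {{ ?s rdfs:label ?label }}
        ORDER BY ?s LIMIT {PAGE_SIZE} OFFSET {offset}
    """)
    page = sparql.query().convert()["results"]["bindings"]
    results.extend(page)
    if len(page) < PAGE_SIZE:  # last, possibly incomplete, page
        break
    offset += PAGE_SIZE
print(len(results), "bindings downloaded")
```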
Criterion 14.1 Linked Data Platform containers
Pass The user can provide a URL of an LDP container and the tool can extract data from it. The tool provides the user with loading options related to LDP containers.
Pass example The tool offers an interactive wizard guiding the user through the process of data extraction.
Fail example The tool simply downloads the content of a given URL without knowing that it represents an LDP container.

Criterion 15.1 Monitoring of input changes: polling
Pass The tool can periodically check for changes in the data source. If there are changes, it can trigger an action.
Pass example The tool can periodically issue an HTTP HEAD request to a given URL to detect changes, and download the data if any change occurs.
Fail example The tool only provides support for periodical downloading of the dataset.

Criterion 15.2 Monitoring of input changes: subscription
Pass The tool can subscribe to a service (data source, etc.) to receive notifications about data updates and trigger actions.
Pass example The tool can subscribe to a data source using WebSub, and download new data whenever it is notified.

Criterion 16.1 Manual versioning support
Pass The tool can store multiple versions of the same dataset. The user can work with any of them.

Criterion 16.2 Automated versioning support
Pass The tool can automatically synchronize versions of a dataset with the data source.
Pass example The tool can watch a Git repository containing RDF files. Upon a change in the repository, the tool can pull the new version of the data.
Fail example The tool can only periodically delete and re-download the dataset to keep it updated to the latest version.

Criterion 16.3 Source versioning support
Pass The tool supports a protocol for asking for data from a specified time instant/range.
Pass example The tool supports the Memento protocol.

Criterion 17.1 Shared resources analysis
Pass The tool can perform an analysis of resources shared among datasets.
Pass example The tool can compute the number of shared resources among datasets.

Criterion 17.2 Link analysis
Pass The tool can perform an analysis of links among datasets.
Pass example The tool can compute the number of links among datasets.

Criterion 17.3 Shared vocabularies analysis
Pass The tool can perform an analysis of vocabularies shared among datasets.
Pass example The tool can show the names of shared vocabularies.
Pass example The tool can show the predicates and classes shared among datasets.

Criterion 18.1 Link discovery
Pass The tool provides support for (semi-)automated link discovery.
Pass example Given two datasets, the tool can discover and save new links between them.
Pass example The tool embeds Silk.

Criterion 18.2 Ontology alignment
Pass Given a source and a target vocabulary or ontology, the tool can discover and save a mapping between them.

Criterion 19.1 Vocabulary mapping based transformations
Pass The tool can transform data to another vocabulary using a mapping (for example OWL, R2R).

Criterion 19.2 Vocabulary-based SPARQL transformations
Pass The tool can transform data to another vocabulary using a predefined SPARQL query.

Criterion 20.1 RDFS inference
Pass The tool supports inference based on inference rules encoded in the RDFS vocabulary.

Criterion 20.2 Resource fusion
Pass Given the links between resources, the tool can perform resource fusion and provide support for solving conflicts.
Fail example The tool can only rename resources in one dataset so that they match the ones from the other dataset.
Fail example The tool can only copy all predicates and values from one resource to another.

Criterion 20.3 OWL inference
Pass The tool supports inference based on inference rules described in OWL ontologies.
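A polling monitor in the sense of Criterion 15.1 can be as simple as comparing the HTTP validators returned by a HEAD request between runs. The URL and the processing function below are hypothetical placeholders.

```python
# A sketch of Criterion 15.1: poll a source and trigger an action
# when its ETag or Last-Modified header changes.
import time
import requests

URL = "https://example.org/dataset.ttl"  # hypothetical data source

def process_new_version(url):
    data = requests.get(url, timeout=60).text
    print("new version fetched:", len(data), "characters")

def poll(url=URL, interval_seconds=3600):
    last_seen = None
    while True:
        head = requests.head(url, timeout=30)
        stamp = head.headers.get("ETag") or head.headers.get("Last-Modified")
        if stamp != last_seen:  # changed (or first run): trigger the action
            last_seen = stamp
            process_new_version(url)
        time.sleep(interval_seconds)
```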

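The vocabulary transformations of Criterion 19.2 map naturally to SPARQL CONSTRUCT queries. The sketch below rewrites a hypothetical input from Schema.org names to SKOS preferred labels:

```python
# A sketch of Criterion 19.2: transform data to another vocabulary
# with a predefined SPARQL CONSTRUCT query.
from rdflib import Graph

g = Graph()
g.parse("input.ttl", format="turtle")  # hypothetical input

mapped = g.query("""
    PREFIX schema: <http://schema.org/>
    PREFIX skos:   <http://www.w3.org/2004/02/skos/core#>
    CONSTRUCT { ?s skos:prefLabel ?name }
    WHERE     { ?s schema:name ?name }
""").graph  # the result of a CONSTRUCT query is itself a graph

mapped.serialize(destination="output.ttl", format="turtle")
```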
Criterion 21.1 Assisted selection
Pass The tool provides a graphical interface for selection of data (filtering of resources using values) which the user can use to transform RDF data.
Pass example The tool assists the user with formulating a SPARQL query using a graphical query builder.
Fail example The tool only provides a query editor in the form of a text area.
Fail example The tool provides only a visualization with filters, i.e. the filters can not be used to transform data.

Criterion 21.2 Assisted projection
Pass The tool provides a graphical interface for data projection (selecting properties) which the user can use to transform RDF data.
Pass example The tool assists the user with formulating a SPARQL query using a graphical query builder.
Fail example The tool only provides a query editor in the form of a text area.
Fail example The tool provides only a visualization with filters, i.e. the filters can not be used to transform data.

Criterion 22.1 Custom transformations
Pass The user can transform data using custom transformations.
Pass example The user can transform data using a SPARQL query.
Pass example The user can transform data using a tool-specific language.

Criterion 23.1 Automated data manipulation
Pass The tool provides the user with possible data transformations that were automatically generated or discovered.
Pass example Given input data, the tool can generate transformation pipelines.

Criterion 24.1 Provenance input
Pass The tool is able to load provenance information and employ it in some way.
Pass example The tool can load and utilize provenance information using the PROV-O ontology.

Criterion 24.2 Provenance output
Prerequisite The tool supports data output functionality.
Pass Together with the data, the tool can output provenance information using RDF and the PROV-O ontology or similar.

Criterion 25.1 License awareness
Pass The tool is aware of licensing information present in dataset metadata and allows the user to see it.

Criterion 25.2 License management
Prerequisite 25.1 License awareness
Pass The tool can assist the user with combining datasets with different licenses.
Pass example The tool notifies the user when datasets with different licenses are used together and lets the user select the license of the output.

Criterion 26.1 Graph visualization
Pass The tool can visualize the graph structure of RDF data.

Criterion 26.2 Manual visualization
Pass The tool allows the user to graphically visualize selected datasets. The tool requires user assistance to select properties, data sources, visualization types, etc.
Pass example The tool provides the user with a map visualization; however, the user must provide the tool with the identification of the label, longitude and latitude properties, and only then can the tool visualize the resources.

Criterion 26.3 Vocabulary-based visualization
Pass The tool can explore datasets and offer the user an automatically generated visualization. The user does not have to provide any additional information besides which dataset to visualize.

Criterion 27.1 RDF/XML file dump
Pass The tool allows the user to output data in the RDF/XML format.

Criterion 27.2 Turtle file dump
Pass The tool allows the user to output data in the Turtle format.

Criterion 27.3 N-Triples file dump
Pass The tool allows the user to output data in the N-Triples format.

Criterion 27.4 JSON-LD file dump
Pass The tool allows the user to output data in the JSON-LD format.

Criterion 27.5 RDFa file dump
Pass The tool allows the user to output data as an RDFa annotated file.
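For Criterion 24.2, the output provenance can be a handful of PROV-O triples attached to the produced dataset. All IRIs in the sketch are hypothetical placeholders.

```python
# A sketch of Criterion 24.2: emit basic PROV-O provenance describing
# how an output dataset was derived.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, XSD

PROV = Namespace("http://www.w3.org/ns/prov#")

g = Graph()
output = URIRef("https://example.org/dataset/output")
source = URIRef("https://example.org/dataset/input")
activity = URIRef("https://example.org/activity/transformation-1")

g.add((output, RDF.type, PROV.Entity))
g.add((activity, RDF.type, PROV.Activity))
g.add((output, PROV.wasDerivedFrom, source))
g.add((output, PROV.wasGeneratedBy, activity))
g.add((activity, PROV.endedAtTime,
       Literal("2018-01-01T00:00:00", datatype=XSD.dateTime)))

g.serialize(destination="provenance.ttl", format="turtle")
```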

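The file dumps of Criteria 27.1-27.7 correspond to off-the-shelf RDF serializers; RDFa output (Criterion 27.5) is the exception, since it requires generating annotated HTML. A sketch with rdflib, which needs a context-aware graph for the quad-based formats:

```python
# A sketch of Criteria 27.1-27.4, 27.6 and 27.7: dump one dataset in
# several serializations. A ConjunctiveGraph is used because the
# N-Quads and TriG serializers require a context-aware store.
from rdflib import ConjunctiveGraph

g = ConjunctiveGraph()
g.parse("dataset.trig", format="trig")  # hypothetical input

for fmt, suffix in [("xml", "rdf"), ("turtle", "ttl"), ("nt", "nt"),
                    ("json-ld", "jsonld"), ("nquads", "nq"),
                    ("trig", "trig")]:
    g.serialize(destination=f"dump.{suffix}", format=fmt)
```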
Criterion 27.6 N-Quads file dump
Pass The tool allows the user to output data in the N-Quads format.

Criterion 27.7 TriG file dump
Pass The tool allows the user to output data in the TriG format.

Criterion 28.1 SPARQL Update support
Pass The tool can output a selected dataset to a remote SPARQL endpoint using SPARQL Update.

Criterion 28.2 SPARQL: named graphs support
Prerequisite 28.1 SPARQL Update support
Pass The user can specify the target named graph for SPARQL Update queries.

Criterion 28.3 SPARQL: authentication support
Prerequisite 28.1 SPARQL Update support
Pass The tool allows the user to provide credentials for access to a remote SPARQL endpoint.

Criterion 28.4 SPARQL Graph Store HTTP Protocol output
Pass The tool can output a selected dataset to a remote endpoint using the Graph Store HTTP Protocol.

Criterion 29.1 Linked Data Platform Containers output
Pass The tool can output a selected dataset to an LDP container. The tool must allow the user to utilize the features of LDP when loading the data.

Criterion 30.1 CSV file output
Pass The tool can output data in the CSV format.

Criterion 30.2 CSV on the Web output
Pass The tool can output data in the CSV format along with its CSV on the Web JSON-LD descriptor.

Criterion 30.3 XML file output
Pass The tool can output data in the XML format.

Criterion 30.4 JSON file output
Pass The tool can output data in the JSON format.

Criterion 30.5 Output for specific third-party tools
Pass The tool can output data in a way that it can be loaded by specific third-party software.
Pass example The tool enables the user to export data as a specific CSV file with graph data for Gephi.
Pass example The tool enables the user to export data specifically for Fuseki instead of a generic SPARQL endpoint.

Criterion 31.1 Existence of API
Pass The tool has an API which enables other tools to work with it.
Pass example A human-designed RESTful API.
Fail example A generated HTTP API used for communication with the client part of the tool, e.g. as in Vaadin.

Criterion 31.2 API documentation
Prerequisite 31.1 Existence of API
Pass There is documentation for the API.

Criterion 31.3 Full API coverage
Prerequisite 31.1 Existence of API
Pass All the actions that can be performed by user interaction with the tool can also be performed through API calls.

Criterion 32.1 Configuration in open format
Pass The configuration of the tool, e.g. the projects it saves, is in an open format.

Criterion 32.2 Configuration in RDF
Pass The configuration of the tool, e.g. the projects it saves, can be expressed in RDF.

Criterion 33.1 Repository for source code
Pass There is a publicly available repository with the source code of the tool.

Criterion 33.2 Repository for plugins
Prerequisite The tool supports plug-ins.
Pass There is a publicly available plug-in repository.
Pass example The tool supports a plug-in repository/marketplace from where the user can download and install new plug-ins.
Pass example A Git repository for plug-ins is mentioned in the documentation or referenced from the tool.

Criterion 33.3 Repository for data processes
Prerequisite There are projects, e.g. pipelines or workflows, created in the tool.
Pass There is a publicly available projects repository.
Pass example The tool supports a project repository/marketplace from where the user can download or upload projects.
Pass example A Git repository for projects is mentioned in the documentation or referenced from the tool.

Criterion 33.4 Community activity
Pass There were at least 2 commits from 2 persons in the last year across all the relevant repositories.
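On the output side, Criterion 28.4 corresponds to a single HTTP request per the SPARQL 1.1 Graph Store HTTP Protocol. In the sketch below the store URL and graph IRI are hypothetical; PUT replaces the named graph, while POST would append to it (credentials as in Criterion 28.3).

```python
# A sketch of Criterion 28.4: write a dataset into a named graph of a
# remote store via the SPARQL 1.1 Graph Store HTTP Protocol.
import requests
from rdflib import Graph

g = Graph()
g.parse("dataset.ttl", format="turtle")  # hypothetical input

STORE = "https://example.org/rdf-graph-store"  # hypothetical endpoint
GRAPH = "https://example.org/graph/output"     # hypothetical graph IRI

response = requests.put(
    STORE,
    params={"graph": GRAPH},                   # indirect graph naming
    data=g.serialize(format="turtle").encode("utf-8"),
    headers={"Content-Type": "text/turtle"},
    auth=("user", "password"),                 # cf. Criterion 28.3
    timeout=60,
)
response.raise_for_status()
```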

Criterion 34.1 Deployment of services
Prerequisite There are projects, e.g. pipelines or workflows, created in the tool.
Pass The transformation pipeline or workflow in the tool can be exposed as a web service.
Fail example A simple SPARQL query editor, where the query is part of a URL, does not fulfill this criterion.
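Finally, Criterion 34.1 asks that a pipeline built in the tool can itself be exposed as a web service. A minimal sketch of such an exposure, with a hypothetical pipeline function and Flask as the web layer:

```python
# A sketch of Criterion 34.1: expose a (hypothetical) transformation
# pipeline as an HTTP web service.
from flask import Flask, Response, request

app = Flask(__name__)

def run_pipeline(source_url: str) -> str:
    # Placeholder for a tool-specific pipeline: fetch the source,
    # transform it, and return RDF serialized as Turtle.
    return f"# transformed contents of <{source_url}>\n"

@app.route("/pipeline", methods=["POST"])
def pipeline_endpoint():
    source = request.get_json()["source"]  # e.g. {"source": "https://..."}
    return Response(run_pipeline(source), mimetype="text/turtle")

if __name__ == "__main__":
    app.run(port=8080)
```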