Specification and Implementation of Mapping Rule Visualization and Editing: MapVOWL and the RMLEditor


Pieter Heyvaert a,1,∗, Anastasia Dimou a,1, Ben De Meester a, Tom Seymoens b, Aron-Levi Herregodts c, Ruben Verborgh a, Dimitri Schuurman c, Erik Mannens a

a IDLab, Department of Electronics and Information Systems, Ghent University - imec, Sint-Pietersnieuwstraat 41, 9000 Gent, Belgium
b imec - SMIT, Vrije Universiteit Brussel, Pleinlaan 9, 1050 Etterbeek, Belgium
c imec - Ghent University - MICT, Korte Meer 7, 9000 Gent, Belgium

Abstract

Visual tools are implemented to help users in defining how to generate Linked Data from raw data. This is possible thanks to mapping languages, which enable detaching mapping rules from the implementation that executes them. However, no thorough research has been conducted so far on how to visualize such mapping rules, especially when they become large and require considering multiple heterogeneous raw data sources and transformed data values. In the past, we proposed the RMLEditor, a visual graph-based user interface which allows users to easily create mapping rules for generating Linked Data from raw data. In this paper, we build on top of our existing work: we (i) specify a visual notation for the graph visualizations used to represent mapping rules, (ii) introduce an approach for manipulating rules when large visualizations emerge, and (iii) propose an approach to uniformly visualize data fractions of raw data sources, combined with an interactive interface for uniform data fraction transformations. We perform two additional comparative user studies. The first one compares the use of the visual notation to present mapping rules to the use of a mapping language directly, and reveals that the visual notation is preferred.
The second one compares the use of the graph-based RMLEditor for creating mapping rules to the form-based RMLx Visual Editor, and reveals that graph-based visualizations are preferred for creating mapping rules through the use of our proposed visual notation and the uniform representation of heterogeneous data sources and data values.

Keywords: graph, Linked Data, mapping rule, MapVOWL, RMLEditor

1. Introduction

Nowadays, Linked Data still stems from (semi-)structured formats. A few of the most well-known and larger Linked Data sets [23] are: the DBpedia dataset [6], with approximately 1.39 billion triples derived from Wikipedia, where the data is originally represented in the wikitext syntax; LinkedGeoData [58], with approximately 1.38 billion triples derived from OpenStreetMap planet files loaded in multiple databases; UniProt [12] (UniProtKB, UniRef and UniParc), with approximately 45 billion triples across 3 datasets derived from the UniProt Knowledgebase; and Bio2RDF [2], with approximately 11 billion triples across 35 datasets.

Overall, Linked Data generation includes the following: (i) multiple heterogeneous raw data sources whose data values might need transformation; (ii) ontologies used to annotate the rdf terms [13] that are generated from the data fractions of the different data sources; (iii) the actual mapping rules, which define how data fractions are semantically annotated and used to generate rdf terms and triples; and (iv) the generated Linked Data.

∗ Corresponding author. Email address: [email protected] (Pieter Heyvaert)
1 These authors contributed equally to this work.
2 http://stats.lod2.eu/rdfdocs?sort=triples
3 http://dbpedia.org
4 http://wikipedia.org
5 http://linkedgeodata.org
6 http://www.uniprot.org/help/uniprotkb
7 http://bio2rdf.org/
8 http://pebbie.org/mashup/rml

Preprint submitted to Journal of Web Semantics, February 6, 2018

Several datasets are generated by tools that incorporate directly in their implementation how Linked Data is generated. This means that when new or updated semantic annotations are needed, knowledge of Semantic Web technologies is required, as well as dedicated software development cycles for adjusting and extending the implementations.

To the contrary, mapping rules may also be defined according to a specified mapping language syntax, such as r2rml [16] or rml [20]. Mapping languages define declaratively how terms are generated from the corresponding raw data and annotated with ontology terms to form the desired Linked Data. This way, mapping rules are detached from the implementation that executes them. Nevertheless, knowledge of the underlying mapping language is required to define mapping rules, while manually editing and curating them requires a substantial amount of human effort [31]. Therefore, the creation of mapping rules still remains complicated.

To this end, a significant number of mapping editors were implemented to facilitate mapping rule creation and editing, such as Map-On [56], the RMLEditor [31], and the RMLx Visual Editor. Only a few of them provide graph-based visualizations, although user evaluations suggest that such visualizations are suitable for supporting users to intuitively generate their desired Linked Data [31].

Nevertheless, how such user interfaces should be designed has not been thoroughly investigated so far: (i) a visual notation specification for mapping rules does not exist; such a specification would provide a formal description for mapping rules and allow multiple tools to implement it, improving the accessibility for users across different tools. (ii) Current mapping editors do not uniformly present multiple heterogeneous data sources; therefore, creating mapping rules that define relationships between heterogeneous data sources is not always straightforward. (iii) Data value transformations cannot be defined in current visualization-based mapping editors. (iv) Scalability is not thoroughly addressed: even if graph-based visualizations are used, large graphs cause difficulties for users when editing the corresponding mapping rules.

In this work, we present and extend our ongoing work towards a uniform graphical user interface (gui) to create and edit Linked Data mapping rules. Such a gui is implemented in the RMLEditor, which we presented in the past. The RMLEditor offers a graph-based interface for specifying mapping rules for raw data to generate Linked Data. Its target group of users has knowledge about both Linked Data and the domain of the data. Mapping rule creation and editing is based on graph visualizations, without requiring knowledge of the underlying mapping language. Our novel contributions include in particular:

(i) a rich graph-based visual notation for mapping rule visualization;
(ii) an approach for manipulating rules when large visualizations emerge;
(iii) an approach to uniformly visualize data fractions of data sources, combined with an interactive interface for uniform data fraction transformations;
(iv) an implementation of these three contributions in the RMLEditor; and
(v) additional evaluations that compare
- the visual notation for presenting mapping rules to the use of a mapping language directly, which reveals that the visual notation is preferred; and
- the graph-based RMLEditor for creating mapping rules to the form-based RMLx Visual Editor, which reveals that the RMLEditor is preferred for creating mapping rules through its use of the visual notation and its uniform representation of heterogeneous data sources and data values.

The remainder of the paper is structured as follows. In Section 2, we elaborate on Linked Data, mapping rules, and rdf terms. In Section 3, we discuss related work. In Section 4, we introduce our research questions and hypotheses. We present our proposed visual notation for mapping rules in Section 5, and the RMLEditor in Section 6. In particular, we elaborate in Section 6.5 on the manipulation of large graphs in the RMLEditor, and in Section 6.6 on how the RMLEditor deals with heterogeneous data sources. In Section 7, we present the evaluations and the results for both the visual notation and the new version of the RMLEditor. In Section 8, we summarize this work's conclusions.

2. Preliminary

Linked Data refers to data whose meaning is explicitly defined, that is published on the Web in a machine-interpretable way, and that is linked to other external data sets [5]. Nowadays, RDF [13] is the prevalent framework to represent Linked Data. Most of the time, Linked Data originally stems from (semi-)structured formats (csv, xml, and so on). Its rdf representation is obtained by repetitively applying mapping rules according to an iteration pattern, which specifies the extract of data that is considered during each iteration.

Mapping rules define correspondences between data in different schemas [24]. In the case of Linked Data generation, mapping rules define how rdf terms, i.e., iris, literals or blank nodes [13], are generated from data fractions derived from one or more data sources, which are annotated with ontologies. These rdf terms are used to form rdf triples.

With the term data fractions, we do not only mean raw data values as they are in the original data source, but also transformed data values that result from applying a function to a raw data value. Data value transformations are needed to support changes in the structure, representation, or content of data, such as string transformations.

3.1.1. Linked Data Visualizations

As mapping rules resemble how Linked Data will eventually be generated [31], visualizations that are applicable for Linked Data would be expected to be applicable for mapping rule visualizations too. Efforts to improve Linked Data accessibility resulted in tools offering either (i) text-based presentations or (ii) visualizations. Dadzie and Rowe [15] conducted a survey on both approaches. They concluded that text-based solutions, such as Sig.ma [60], (i) fail to provide an overview of the available information, and (ii) are only suitable for users with an understanding of the underlying technologies, whereas lay users need additional support. Visual presentations led to approaches relying on network maps [35], diagrams [35], geographic maps [53], timelines [53], charts [3], and graphs [27, 62, 19]. The latter was the default in the past according to Dadzie and Pietriga [14], because (i) on-
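The generation process described above, in which a declarative rule is repetitively applied according to an iteration pattern, can be sketched in a few lines. This is a minimal illustration only: the rule is represented as plain data, detached from the code that executes it, but the field names (`subject_template`, `predicate_object`) are assumptions made for this example and are not the actual r2rml/rml vocabulary.

```python
import csv
import io

# A mapping rule as plain data, detached from the engine that executes it.
# Field names here are illustrative, not r2rml/rml terms.
RULE = {
    "subject_template": "http://example.org/person/{id}",
    "predicate_object": [
        # (predicate IRI, source column that provides the data fraction)
        ("http://xmlns.com/foaf/0.1/name", "name"),
    ],
}

def generate_triples(rule, rows):
    """Apply one mapping rule to each row: the iteration pattern."""
    for row in rows:  # one iteration per extract of data
        subject = rule["subject_template"].format(**row)  # IRI from template
        for predicate, column in rule["predicate_object"]:
            yield (subject, predicate, row[column])  # literal object

raw = "id,name\n1,Alice\n2,Bob\n"
rows = csv.DictReader(io.StringIO(raw))
triples = list(generate_triples(RULE, rows))
```

Swapping in a different rule changes the generated triples without touching `generate_triples`, which is the detachment of rules from their execution that mapping languages provide.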
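The notion of data fractions above covers both raw data values and values transformed by a function before term generation. The following is a minimal sketch of that transformation step; the registry of named string transformations is an illustrative assumption, not a feature set of any actual mapping language.

```python
# Illustrative registry of named data value transformations (assumption,
# not an actual mapping-language feature set).
TRANSFORMS = {
    "identity": lambda v: v,
    "uppercase": str.upper,      # a simple string transformation
    "strip_whitespace": str.strip,
}

def data_fraction(raw_value, transform="identity"):
    """Return the (possibly transformed) data value used to generate an rdf term."""
    return TRANSFORMS[transform](raw_value)

# Both the raw value and its transformed result are "data fractions".
label = data_fraction("  ghent  ", "strip_whitespace")
label_upper = data_fraction(label, "uppercase")
```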