THÈSE

Submitted for the degree of DOCTEUR DE L'UNIVERSITÉ DE GRENOBLE — Speciality: Computer Science

Ministerial decree of 1 October 2011

Presented by Mustafa AL BAKRI

Thesis supervised by Marie-Christine Rousset and co-supervised by Manuel Atencia, prepared at the Laboratoire d'Informatique de Grenoble (LIG) and EDMSTII

Uncertainty-Sensitive Reasoning over the Web of Data

Thesis publicly defended on 15 December 2014, subject to the authorization to defend, before a jury composed of:

M. Mohand-Said Hacid, Professor, Université Claude Bernard Lyon 1, Reviewer
M. Andrea Tettamanzi, Professor, Université Nice Sophia Antipolis, Reviewer
M. Jérôme Euzenat, Research Director, INRIA Grenoble Rhône-Alpes, Examiner
Mme. Marie-Laure Mugnier, Professor, Université de Montpellier 2, Examiner
Mme. Marie-Christine Rousset, Professor, Université Joseph Fourier (Grenoble 1), Thesis Supervisor
M. Manuel Atencia, Associate Professor (Maître de conférences), Université Pierre Mendès-France (Grenoble 2), Thesis Co-supervisor

"I have a dream for the Web [in which computers] become capable of analyzing all the data on the Web – the content, links, and transactions between people and computers. A "Semantic Web", which makes this possible, has yet to emerge, but when it does, the day-to-day mechanisms of trade, bureaucracy and our daily lives will be handled by machines talking to machines. The "intelligent agents" people have touted for ages will finally materialize."

— Tim Berners-Lee

Abstract

In this thesis we investigate several approaches that help users find useful and trustworthy information in the Web of Data using Semantic Web technologies. To this end, we tackle two research issues: Data Linkage in Linked Data and Trust in Semantic P2P Networks.

We model the problem of data linkage in Linked Data as a reasoning problem on possibly decentralized data. We describe a novel Import-by-Query algorithm that alternates steps of sub-query rewriting and of tailored querying of the Linked Data cloud in order to import data as specific as possible for inferring or contradicting given target same-as facts. Experiments conducted on real-world datasets have demonstrated the feasibility of this approach and its usefulness in practice for data linkage and disambiguation. Furthermore, we propose an adaptation of this approach to take into account possibly uncertain data and knowledge, with as a result the inference of same-as and different-from links carrying weights. In this adaptation we model uncertainty as probability values. Our experiments have shown that the adapted approach scales to large datasets and produces meaningful probabilistic weights.

Concerning trust, we introduce a trust mechanism for guiding the query-answering process in Semantic P2P Networks. Peers in Semantic P2P Networks organize their information using separate ontologies and rely on alignments between their ontologies for translating queries. Trust in such a setting is subjective and estimates the probability that a peer will provide satisfactory answers for specific queries in future interactions. In order to compute trust, the mechanism exploits the information provided by alignments, along with the information that comes from peers' experiences. The calculated trust values are refined over time using Bayesian inference as more queries are sent and answers received. For the evaluation of our mechanism, we built a semantic P2P bookmarking system (TrustMe) in which we can vary different quantitative and qualitative parameters. The results show the convergence of trust, and highlight the gain in the quality of peers' answers — measured with precision and recall — when the process of query answering is guided by our trust mechanism.

Contents

Abstract

Contents

List of Figures

List of Tables

Abbreviations

1 Introduction
  1.1 Research problems addressed in this thesis
  1.2 Contributions and Outline of the Thesis
  1.3 Publications
      Under Submission

I Reasoning over Linked Data

2 Inferring same-as facts from Linked Data: an iterative import-by-query approach
  2.1 Introduction
  2.2 Illustrative Example
  2.3 Related work
  2.4 Formal background and problem statement
      2.4.1 URIs, URLs and namespaces
      2.4.2 RDF datasets in Linked Data
      2.4.3 Queries over RDF datasets in Linked Data
      2.4.4 Deductive RDF datasets
      2.4.5 Problem statement
  2.5 The iterative Import-by-Query Algorithm
      2.5.1 The QESQ algorithm
      2.5.2 Combining forward and backward chaining
  2.6 Experiments
      2.6.1 Experimental Goals and Set-Up
      2.6.2 Experimental Results
  2.7 Conclusion


3 Reasoning With Uncertainty For Data Linkage
  3.1 Introduction
  3.2 Sources of Uncertainty
  3.3 Modeling Uncertainty in Linked Data
      3.3.1 Illustrative Example
      3.3.2 Formal Background
  3.4 ProbFR: The Probabilistic Forward Reasoner
  3.5 Data complexity and approximation techniques
      3.5.1 Maximum size of an event expression
      3.5.2 Approximating probabilistic weights
  3.6 Setting up the acceptance threshold
  3.7 Experiments
      3.7.1 Experimental Set-Up
      3.7.2 Experimental Results
  3.8 Extending the import by query approach to uncertainty
      3.8.1 The Probabilistic Iterative Import-by-query
      3.8.2 The PQESQ algorithm
  3.9 Related Work and Conclusion

II Trust in the Semantic Web

4 Trust in Networks of Ontologies and Alignments
  4.1 Introduction
      4.1.1 Semantic P2P Networks
      4.1.2 Trust in Semantic P2P Networks
      4.1.3 Summary of our Contribution
  4.2 Preliminaries
      4.2.1 Ontologies and Populated Ontologies
      4.2.2 Alignments
      4.2.3 Peers and Acquaintance Graphs
      4.2.4 Queries and Query Translations
  4.3 The Trust Mechanism
      4.3.1 Definition and Estimation of Trust
      4.3.2 Computation and Refinement of Trust Estimation
            No direct experience: alignment-based trust
            Direct experience: trust refinement
      4.3.3 Update of Populated Ontologies
            Provenance-based trust aggregation
            Discard of instances
      4.3.4 Use of Trust
  4.4 Experimentation and Evaluation
      4.4.1 Experimental Design
            Generation of P2P networks
            Construction of the ontologies
            Construction of the reference populated ontologies
            Construction and peer assignment of initial populated ontologies
            Construction of initial alignments
            Simulation of query answering
      4.4.2 Experimental Results
            Convergence of trust
            Gain in the quality of peer answers when using trust
  4.5 Conclusions and Related Work
      4.5.1 Trust
      4.5.2 Ontology and Schema Matching

5 Demo: TrustMe, I Got What You Mean
  5.1 Introduction
  5.2 Goals of the Demo
  5.3 Setting Up the P2P Bookmarking Scenario
      Taxonomy Generation
      Alignment Generation
      Network Generation
      Taxonomy Population
  5.4 TrustMe

6 Conclusion and Future Work
  6.1 Data Linkage
  6.2 Trust in Semantic P2P Networks

A The set of rules used in the experiments

B The set of uncertain rules used in the experiments

Bibliography

List of Figures

1.1 The Linked Open Data diagram. It shows datasets that have been published in Linked Data format. It is based on metadata collected and curated by contributors to the Data Hub as well as on metadata extracted from a crawl of the Linked Data web conducted in April 2014.

2.1 A sample of the INA RDF facts and an extract of the INA vocabulary
2.2 DBpedia facts and an extract of the DBpedia vocabulary
2.3 The resulting external sub-queries submitted to DBpedia and their returned answers

3.1 A sample of certain facts from the INA and DBpedia datasets
3.2 The distribution of the number of inferred facts over the range of probabilistic weights
3.3 The distribution of the facts confirmed to be wrong by contradiction over the range of probabilistic weights

4.1 Beta density functions for different alignment relations
4.2 Ontology of cinema and music
4.3 Ontology of cities and music
4.4 The number of peers has no significant impact on the speed of convergence
4.5 The lower the thresholds of trust are, the slower the convergence of trust is
4.6 The higher the F-measure an initial alignment has, the faster convergence is
4.7 The number of peers has no significant impact on F_t(Paris)
4.8 The higher the thresholds of trust are, the higher precision_t(Paris) is
4.9 The lower the thresholds of trust are, the higher recall_t(Paris) is
4.10 The higher the F-measure an initial alignment has, the higher F_t(Paris) is

5.1 A snapshot of the main screen of TrustMe
5.2 Snapshot showing the topology of a generated network with 16 peers
5.3 Snapshot showing the articles added to the Paris city class of one peer's taxonomy

6.1 Summary of the contributions

List of Tables

1.1 The number of datasets available in Linked Open Data grouped by topical domain

2.1 R1-R3 are local rules about INA properties; R4-R5 are rules involving mappings from the INA and DBpedia vocabularies; R6 translates transitivity of owl:sameAs, and R7 relates owl:sameAs and owl:differentFrom
2.2 A rule for dealing with similarities between names
2.3 Gain in reduced imported facts
2.4 Gain in runtime

3.1 A set of rules. Rule r1 is uncertain while the other rules are certain
3.2 The event expressions of the inferred facts and their simplified forms
3.3 The number of inferred facts confirmed to be correct in each sample

4.1 Evaluation of the matchers' alignments
4.2 Parameters of our simulations
4.3 Values of the parameters in the experiments on convergence of trust and gain in query answering by using trust

5.1 Comparison of averaged precision and recall values for Paris homonyms

A.1 The set of rules that disambiguate homonymous persons inside the INA dataset
A.2 The set of rules that disambiguate homonymous persons inside the INA dataset
A.3 The set of rules that disambiguate homonymous persons inside the INA dataset
A.4 Rules that disambiguate homonymous persons based on their roles in a video
A.5 Rules used in the renaming techniques for the ground sameAs and differentFrom facts
A.6 Rules that express the semantics of owl:sameAs and owl:differentFrom

B.1 Rules that disambiguate homonymous persons based on birthYear and deathYear
B.2 Rules that disambiguate homonymous persons based on their occupations
B.3 Rules that disambiguate homonymous persons based on their roles in a video
B.4 Rules that disambiguate homonymous persons if they are related to the same video

Abbreviations

CSV     Comma Separated Values
dQSQ    distributed Query Sub Query
HTTP    HyperText Transfer Protocol
HTML    HyperText Markup Language
INA     Institut National de l'Audiovisuel
OWL     Web Ontology Language
P2P     Peer to Peer
PQESQ   Probabilistic Query External Sub Query
QSQ     Query Sub Query
QESQ    Query External Sub Query
RDF     Resource Description Framework
RDFS    Resource Description Framework Schema
SPARQL  SPARQL Protocol and RDF Query Language
URI     Uniform Resource Identifier
URL     Uniform Resource Locator
XML     eXtensible Markup Language

Chapter 1

Introduction

"What is the latest news in a particular country?", "Which is the best store to buy a particular item?". These are examples of questions that may come to mind and that very often we cannot answer based on the information we know. In order to answer such questions we typically search for trustworthy sources of information and query them to complete our knowledge. Then, we aggregate and process what we already know together with the different pieces of information we obtained in order to derive answers.

Searching for information over the Web: The World Wide Web has changed the way we share and search for information by breaking the barriers to publishing and accessing documents. It is an open global information space of linked documents accessible and actively used by more than 40%¹ of the world population. It contains more than 41 billion distinct documents [45] that are linked together using hypertext links. The contents of these documents form a mosaic of information on a wide range of topics. Internet users rely on keyword-based search engines (Google, Bing, Yahoo, ...) to reach the web documents that are likely to contain the information they are looking for.

Traditionally, data related to hypertext documents on the web are provided in various raw formats (e.g. CSV, XML or HTML tables) that miss much of their structure and semantics. Moreover, the hyperlinks between documents are not expressive enough to link the individual entities that appear in these documents (e.g. expressing that two individuals are synonyms or homonyms). This limits the ability of keyword-based search engines to provide all the correct documents, and just the correct documents, that talk about a particular keyword. Also, the enormous amount of data available on the web reveals a growing desire to facilitate direct access to the data themselves and not only to the documents they are related to. Up to now, keyword-based search engines cannot provide a precise answer to a user query of the form: who is the author of the film "Paris"? Instead, they may return millions of hypertext documents that are likely to be related to the keywords that appear in the query.

¹ In 2014, according to statistical studies detailed in http://en.wikipedia.org/wiki/Global_Internet_usage.


Semantic Web: At the turn of the century, the Semantic Web was proposed as an extension of the traditional web "in which information is given well-defined meaning, better enabling computers and people to work in cooperation" [16]. The emphasis is now on interoperability across data sources, both at the data (assertional) level and at the semantic (terminological) level, as a guarantee for the Web to be significantly improved. To achieve such interoperability the Semantic Web offers standard means for representing knowledge in a way that:

• Entities or resources are identified using URIs.

• The data about these entities are given in a standard machine-readable format called RDF. RDF data can be serialized using many formats (e.g. RDF/XML, Turtle, N-Triples, ...).

• Data are attached to a schema (or ontology) that provides the meaning of the data in a declarative way. The schema is defined using standard languages such as RDFS and OWL. Like other formal languages, all these languages have their own well-defined semantics. (A minimal sketch of this data model follows below.)
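To make these ingredients concrete, here is a minimal sketch in plain Python of how both data and schema reduce to machine-processable subject-predicate-object triples; the URIs and prefixes (ex:, foaf:, rdfs:) are illustrative assumptions, not taken from a specific dataset:

```python
# Minimal sketch of the RDF data model: every statement, whether data
# or schema, is a (subject, predicate, object) triple.
data_triples = {
    ("ex:per1", "rdf:type", "foaf:Person"),      # an entity and its class
    ("ex:per1", "foaf:name", "Jacques Martin"),  # a literal-valued property
}
schema_triples = {
    # Declarative meaning: every foaf:Person is also an ex:Agent.
    ("foaf:Person", "rdfs:subClassOf", "ex:Agent"),
}
# A consumer can process data and schema uniformly, as sets of triples.
print(data_triples | schema_triples)
```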

Linked Data: The Semantic Web community has studied for many years the theoretical and technical challenges regarding the diffusion and manipulation of structured data and their semantics. The result is Linked Data: a set of best practices for publishing and connecting structured data using the infrastructure of the Web. Tim Berners-Lee outlined four principles of Linked Data in [14]. They can be paraphrased as follows:

• Use URIs to denote things.

• Use HTTP URIs so that these things can be referred to and looked up (“dereferenced”) by people and user agents.

• Provide useful information about the thing when its URI is dereferenced, leveraging standards such as RDF and SPARQL.

• Include links to other related things (using their URIs) when publishing data on the Web.

Linked Data has successfully attracted various data publishers, including companies (e.g. BestBuy [37, 38]), organisations (e.g. the BBC [43] and the New York Times²), governmental entities (e.g. data.gov.uk, data.gov) and even crowd-sourced communities (e.g. DBpedia [19]). The result is a Web of Data composed of more than 1014 connected datasets³.

² http://data.nytimes.com/
³ According to the "State of the LOD Cloud 2014" webpage: http://linkeddatacatalog.dws.informatik.uni-mannheim.de/state/

FIGURE 1.1: The Linked Open Data diagram. It shows datasets that have been published in Linked Data format. It is based on metadata collected and curated by contributors to the Data Hub as well as on metadata extracted from a crawl of the Linked Data web conducted in April 2014.

These datasets contain more than 32 billion RDF triples on a veritable plethora of topics. Table 1.1 gives an overview of the topical domains of the datasets that were available and accessible in the year of writing this thesis, 2014, while Figure 1.1 gives an overview of the linkage relationships between the datasets.

Topic                   Datasets    %
Government              183         18.05%
Publications            96          9.74%
Life sciences           83          8.19%
User-generated content  48          4.73%
Cross-domain            41          4.04%
Media                   22          2.17%
Geographic              21          2.07%
Social web              520         51.28%
Total                   1014

TABLE 1.1: The number of datasets available in Linked Open Data grouped by topical domain.

Semantic Peer-to-Peer Networks: Peer-to-Peer (P2P) architectures are known for their scalability, since the same data can be offered by many peers and thus the accessibility of these data is increased. Also, the decentralization of these architectures increases their reliability, as the data can be replicated on many peers distributed all around the world. Semantic P2P Networks use P2P as an underlying architecture for the Semantic Web. This has two main advantages compared to other architectures such as the client-server architecture. First, it provides an open environment in which peers are free to publish the information they want, conforming to a chosen ontology. Second, it gives users the ability to control their own data and ontology and to choose who can access their information. This is clearly a huge advantage in terms of privacy on the Semantic Web.

1.1 Research problems addressed in this thesis

As we have shown before, many new technologies have been proposed to help users publish the information they want and to reach the exact information they are looking for using the web infrastructure. However, as on many other occasions, the exercise has proven to be trickier than thought at the outset. There are still many open research issues to be solved before reaching such a truly global open data and knowledge space in which it is easy to find precise and complete answers to possibly complex questions. In this thesis we consider two main research problems:

• Data linkage in Linked Data: Linked Data has offered the standards needed to publish and access highly interoperable data on the Web. However, the freedom it gives to data publishers may turn the web into islands of data. For instance, data publishers are free to identify a real-world object with any URI they want and to publish in Linked

Data descriptions about this object based on the chosen URI. Thus, different datasets may describe the same real-world objects with entities that are identified by different URIs which are not explicitly connected together. Although Linked Data has encouraged publishers to express such links in a standard way using owl:sameAs, the huge size of the data available in the Web of Data makes the task of identifying and linking all these entities hard in practice. Entities with different URIs that describe the same real object can also appear in the same dataset. This occurs frequently in datasets that are created as collaborations of different data authors belonging to the same organization.

In this thesis we address the problem of discovering links in Linked Data, i.e. the data linkage problem, and we answer the following questions:

– How to model the data linkage problem in the setting of Linked Data?
– How to identify the data, possibly distributed over the Web of Data, useful for computing links between two entities without importing the whole published data?
– How to combine the data available in a particular dataset together with the data imported from the Web of Data to compute a set of certain links inside this dataset?
– How to extend the proposed approach for data linkage to compute uncertain links using uncertain knowledge?
– How to determine automatically, among the calculated uncertain links, the correct ones?

• Modeling and computing trust in semantic P2P networks:

Semantic P2P networks provide a partial solution to the privacy problem when exchanging data, thanks to the underlying P2P architecture. The P2P decentralized nature allows users to decide exactly with which peers they want to exchange data. However, the question is: how can users choose, among all available peers, the ones they trust to access and to share data with?

Answering the above question is mandatory in semantic P2P networks, as information returned by other peers can be untrue, incomplete or outdated. In this thesis we address the following research questions about trust:

– How to benefit from Semantic Web technology to model trust in semantic P2P networks?
– How can trust be estimated for a particular peer and refined dynamically over time as more information about that peer becomes available?
– How can trust be used for improving quality in semantic P2P networks?
– How can we measure the gain of using trust in a semantic P2P network?

1.2 Contributions and Outline of the Thesis

The thesis is organized in two parts, each of them consisting of two chapters. A chapter of general conclusions and future work is included, as well as a couple of appendices.

• Part 1 focuses on the problem of data linkage in Linked Data. It models it as a reasoning problem on possibly decentralized data.

– Chapter 2 focuses on proposing an approach for data linkage adapted to the decentralized nature of Linked Data. The approach allows to infer a set of certain same-as and different-from links between homonymous entities in a particular dataset using the certain knowledge attached to the data. For this, it describes a novel import-by-query algorithm that alternates steps of sub-query rewriting and of tailored querying of the Linked Data cloud in order to import data as specific as possible for inferring or contradicting given target same-as facts. It also reports the experiments that were conducted on a real-world dataset. The experiments have demonstrated the feasibility of this approach and its usefulness in practice for data linkage and disambiguation.
– Chapter 3 extends the approach presented in Chapter 2 to take into account possibly uncertain data and knowledge, with as a result the inference of same-as and different-from links carrying probabilistic weights. It also shows how to interpret the uncertainty values calculated for the inferred links and how to benefit from logical inference to return the best results to the user.

• Part 2 focuses on the problem of modeling and calculating trust in semantic P2P networks.

– Chapter 4 introduces a mechanism of trust adapted to semantic P2P networks. It presents the theoretical model of this mechanism as well as the way we used Bayesian inference to compute trust in such a setting. We also report in this chapter the experiments that we performed to evaluate our trust mechanism.
– Chapter 5 presents TrustMe, a system built to demonstrate the trust mechanism proposed in Chapter 4. Specifically, it demonstrates the trust mechanism in the case of a semantic P2P bookmarking system where peers exchange URLs of articles about topics they are interested in. We highlight the gain in the quality of peers' answers, measured with precision and recall, when the process of query answering is guided by our trust mechanism. As a particular case, we show how trust overcomes homonymy. Moreover, a trust-based ranking of articles allows to distinguish the articles relevant to a category from the ones related to its homonymous categories.

• Appendix A provides the set of certain rules that were used during the experiments explained in Chapter 2. It also gives explanations of their meaning and origins.

• Appendix B gives the set of uncertain rules that were used during the experiments explained in Chapter 3. It shows their rankings and their probabilistic weights. Descriptions of the rules are provided as well.

1.3 Publications

The following publications have been derived from this thesis:

• Mustafa Al-Bakri, Manuel Atencia, Marie-Christine Rousset. Approche itérative à base de règles Datalog pour inférer des liens "same-as" à l'aide du Web des données. Presented at the National Conference on Advanced Databases (BDA) 2014.

• Manuel Atencia, Mustafa Al-Bakri, Marie-Christine Rousset. Trust in networks of ontologies and alignments. Knowledge and Information Systems (international journal), December 2013. [ISSN 0219-1377] [eISSN 0219-3116]

• Mustafa Al-Bakri, Manuel Atencia, Marie-Christine Rousset. TrustMe, I Got What You Mean! In Knowledge Engineering and Knowledge Management, Lecture Notes in Computer Science, Volume 7603, 2012, pp. 442-445.

Under Submission

• Mustafa Al-Bakri, Manuel Atencia, Steffen Lalande, Marie-Christine Rousset. Inferring same-as facts from Linked Data: An iterative import-by-query approach.

Part I

Reasoning over Linked Data

Chapter 2

Inferring same-as facts from Linked Data: an iterative import-by-query approach

2.1 Introduction

Linked Data promotes exposing, sharing and connecting data and knowledge on the Semantic Web by using URIs, namespaces and RDF. URIs are used for referencing entities described by RDF facts, but also for referencing namespaces providing access to existing vocabularies that can be reused and shared across data sources. The adoption of the Linked Data principles has led to a Web of Data of several billion RDF triples, interlinked by several million RDF links and connecting thousands of data sources on a wide range of domains.

Links between datasets take the form of RDF triples whose subject is a URI reference in the namespace of one dataset, while the predicate and/or object are URIs pointing into the namespaces of other datasets. A particular case corresponds to same-as facts expressing that two URIs refer to the same real-world object. In an open environment like the Web, different information providers may publish data about URIs representing the same real-world entity. Some datasets serve as linking hubs in the Web of Data, such as DBpedia or GeoNames.

Linked Data has become a global data space accessible through query endpoints using the SPARQL query language, but it is not a global knowledge space yet. The reason is that, even though several existing datasets incorporate ontological OWL or RDFS statements, most of the time these statements are just stored as RDF triples and not fully exploited by automatic reasoning. To date, Semantic Web tools such as Jena, TopBraid Composer or OWLIM only support light-weight inference, mainly implementing RDFS semantics.¹ However, it is up to the data owners who publish their datasets as Linked Open Data to use these tools. At the moment, it is rarely the case for existing data sources (such as DBpedia, GeoNames and many other popular datasets) to offer ontology-based query answering when they are remotely queried through their SPARQL endpoints.


It is possible, though, for a data owner to build his/her own knowledge base as a local RDF dataset enriched with ontological constraints and rules on the target domain that he/she knows, and to equip it with local reasoning capabilities using Semantic Web technologies. Reasoning on schema constraints such as keys or functional properties can be useful for local data management but also for data linkage with external datasets.

Data linkage is a crucial task in Linked Data. In particular, it is very important to correctly decide whether two URIs refer to the same real-world entity. Most of the existing approaches consist in defining or learning metrics to compare entities based on similarities between the values of (some of) their properties (see [28] for a survey). The results returned by such numerical techniques are weighted owl:sameAs links, among which most of the ones having high weights are likely to be correct.

In contrast, like a few other works [40, 63], we propose a rule-based approach equipped with full reasoning to infer all the certain same-as facts that are logically entailed by a given set of domain constraints and facts. Our approach is generic and declarative, since we provide a (sound and complete) inference algorithm that takes as input any set of Datalog-like rules (with built-in arithmetic predicates) that make it possible to express in a uniform way several schema constraints (such as the ones denoted in OWL by owl:hasKey and owl:FunctionalProperty), transitivity and symmetry of owl:sameAs, and domain-specific knowledge (e.g. to express composite or conditional keys).

In the decentralized setting of Linked Data, the main problem is to identify the data, possibly distributed over several datasets, useful for inferring owl:sameAs and owl:differentFrom facts. Compared to the approach reported in [40], which relies on a global import obtained by a breadth-first crawl of the Linked Data cloud, our approach performs a selective import while guaranteeing completeness for the inference of target owl:sameAs (and owl:differentFrom) facts.

Our contribution is twofold. First, we provide a novel Import-by-Query algorithm that alternates steps of sub-query rewriting and of tailored querying of the Linked Data cloud to import data as specific as possible to infer owl:sameAs and owl:differentFrom facts. It is an extension of the well-known Query-Subquery algorithm for answering Datalog queries over deductive databases. Second, we have shown the feasibility and usefulness of our approach for disambiguating person entities in the INA dataset, a real dataset with millions of RDF facts describing the content of videos of French TV programs.

¹ http://jena.apache.org, http://www.topquadrant.com, http://www.ontotext.com/owlim

In Section 2.2, we illustrate our approach with an example. In Section 2.3, we discuss related work. In Section 2.4, we provide the necessary background to formally state the problem studied in this chapter. The Import-by-Query algorithm is presented in Section 2.5. Experimental results are surveyed in Section 2.6, and Section 2.7 concludes the chapter.

2.2 Illustrative Example

The French National Audiovisual Institute (Institut National de l'Audiovisuel, or INA) is devoted to maintaining and updating a repository of French radio and television audiovisual archives. INA data is currently available in the form of RDF facts. Figure 2.1 shows a small sample of these facts and an extract of the INA vocabulary.²

The first group of two RDF facts describes an entity ina:vid1 of the class ina:Video whose title is "Le Petit Rapporteur". The remaining three groups describe three person entities ina:per1, ina:per2 and ina:per3, all named "Jacques Martin". The problem is that we do not know whether these entities represent the same or different persons. Some additional information is provided: ina:per1 is known to be the presenter of a program recorded in the video ina:vid1, while ina:per2 and ina:per3 are known to have different dates of birth, "1933-06-22" and "1921-09-25", respectively.

ina1: ⟨ina:vid1, rdf:type, ina:Video⟩
ina2: ⟨ina:vid1, ina:title, "Le Petit Rapporteur"⟩
ina3: ⟨ina:per1, rdf:type, ina:VideoPerson⟩
ina4: ⟨ina:per1, ina:name, "Jacques Martin"⟩
ina5: ⟨ina:per1, ina:presenter, ina:vid1⟩
ina6: ⟨ina:per2, rdf:type, ina:Person⟩
ina7: ⟨ina:per2, ina:name, "Jacques Martin"⟩
ina8: ⟨ina:per2, ina:birthdate, "1933-06-22"⟩
ina9: ⟨ina:per3, rdf:type, ina:Person⟩
ina10: ⟨ina:per3, ina:name, "Jacques Martin"⟩
ina11: ⟨ina:per3, ina:birthdate, "1921-09-25"⟩

Vocabulary extract: ina:VideoPerson and ina:Person are subclasses (rdfs:subClassOf) of ina:PhysicalPerson; ina:presenter links persons to ina:Video; ina:birthdate has range xsd:date; ina:name and ina:title have range rdfs:Literal.

FIGURE 2.1: A sample of the INA RDF facts and an extract of the INA vocabulary.

In order to disambiguate the INA entities ina:per1, ina:per2 and ina:per3, one can reason with knowledge about the INA properties. For instance, if we agree that the property ina:birthdate is functional, it can be inferred that ina:per2 and ina:per3 are different because they have different birthdates. Also, we can agree that two persons having the same name and date of birth must be the same — or, in other words, that the properties ina:name and ina:birthdate form a key — and that two persons who have the same name and who have presented programs recorded in videos with the same title must be the same. This knowledge could be useful for deciding whether ina:per1 refers to the same person as ina:per2 or ina:per3, but some information is missing: the birthdate of ina:per1 is not known, nor is it known whether ina:per2 or ina:per3 are presenters, and of which programs.

² We have made some modifications in the INA vocabulary — e.g. translating French terms into English — for the sake of readability.

The above missing information can be completed thanks to external data coming from DBpedia. In Figure 2.2, we show DBpedia facts describing the DBpedia person entity db:per1, and a small extract of the DBpedia vocabulary. An alignment between the INA and DBpedia vocabularies — possibly computed by an ontology matching tool and validated by INA experts — may also state that the properties ina:name and ina:birthdate are, respectively, equivalent to the properties foaf:name and foaf:birthdate, and that the composition of ina:presenter and ina:title is equivalent to db:presenter. In this way, it can be inferred that db:per1 is the same as ina:per1 because they have the same name and have presented a program with the same title; and also that db:per1 is the same as ina:per2 because they have the same name and date of birth. Thus, by transitivity of same-as, it can be inferred that ina:per1 is the same as ina:per2, and, since ina:per2 is different from ina:per3, then ina:per1 is different from ina:per3 too.

db1: ⟨db:per1, rdf:type, db:Person⟩
db2: ⟨db:per1, foaf:name, "Jacques Martin"⟩
db3: ⟨db:per1, db:presenter, "Le Petit Rapporteur"⟩
db4: ⟨db:per1, foaf:birthdate, "1933-06-22"⟩

Vocabulary extract: foaf:name, foaf:birthdate and db:presenter are properties of db:Person with range rdfs:Literal.

FIGURE 2.2: DBpedia facts and an extract of the DBpedia vocabulary.

Local knowledge like the knowledge about the INA properties can be expressed as rules, as shown in Table 2.1. Additional knowledge, like mappings between the INA and DBpedia vocabularies, and the transitivity of same-as (both essential for disambiguating the INA person entities), can be translated into rules too (Table 2.1).

R1: IF ⟨?x1, ina:name, ?n⟩, ⟨?x2, ina:name, ?n⟩, ⟨?x1, ina:presenter, ?v1⟩, ⟨?x2, ina:presenter, ?v2⟩, ⟨?v1, ina:title, ?t⟩, ⟨?v2, ina:title, ?t⟩
    THEN ⟨?x1, owl:sameAs, ?x2⟩
R2: IF ⟨?x1, ina:name, ?n⟩, ⟨?x2, ina:name, ?n⟩, ⟨?x2, ina:birthdate, ?b⟩, ⟨?x1, ina:birthdate, ?b⟩
    THEN ⟨?x1, owl:sameAs, ?x2⟩
R3: IF ⟨?x1, ina:birthdate, ?b1⟩, ⟨?x2, ina:birthdate, ?b2⟩, ⟨?b1, notEqualTo, ?b2⟩
    THEN ⟨?x1, owl:differentFrom, ?x2⟩
R4: IF ⟨?x1, ina:name, ?n⟩, ⟨?x2, foaf:name, ?n⟩, ⟨?x1, ina:presenter, ?v⟩, ⟨?v, ina:title, ?t⟩, ⟨?x2, db:presenter, ?t⟩
    THEN ⟨?x1, owl:sameAs, ?x2⟩
R5: IF ⟨?x1, ina:name, ?n⟩, ⟨?x2, foaf:name, ?n⟩, ⟨?x1, ina:birthdate, ?b⟩, ⟨?x2, foaf:birthdate, ?b⟩
    THEN ⟨?x1, owl:sameAs, ?x2⟩
R6: IF ⟨?x1, owl:sameAs, ?x2⟩, ⟨?x2, owl:sameAs, ?x3⟩
    THEN ⟨?x1, owl:sameAs, ?x3⟩
R7: IF ⟨?x1, owl:sameAs, ?x2⟩, ⟨?x2, owl:differentFrom, ?x3⟩
    THEN ⟨?x1, owl:differentFrom, ?x3⟩

TABLE 2.1: R1-R3 are local rules about INA properties; R4-R5 are rules involving mappings from the INA and DBpedia vocabularies; R6 translates transitivity of owl:sameAs, and R7 relates owl:sameAs and owl:differentFrom.
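As an illustration of how such rules can be manipulated programmatically, the sketch below (our encoding, not the thesis's implementation) represents rule R2 as a (condition, conclusion) pair of triple patterns, with variables written as strings starting with '?' as in the table:

```python
# Sketch: rule R2 of Table 2.1 as a (condition, conclusion) pair.
# Each triple pattern is a (subject, predicate, object) tuple and
# variables are strings starting with '?'.
R2 = (
    [   # condition: same name and same birthdate...
        ("?x1", "ina:name", "?n"),
        ("?x2", "ina:name", "?n"),
        ("?x2", "ina:birthdate", "?b"),
        ("?x1", "ina:birthdate", "?b"),
    ],
    # ...conclusion: the two entities denote the same person
    ("?x1", "owl:sameAs", "?x2"),
)
```

This representation is reused in the sketches that follow for query evaluation and rule application.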

FIGURE 2.3: The resulting external sub-queries submitted to DBpedia and their returned answers.

In order to avoid downloading the whole of DBpedia, and, more generally, the whole Linked Open Data cloud (something impossible in practice), our import-by-query approach (described in detail further below) generates, for each target owl:sameAs fact, a sequence of external sub-queries as specific as possible so as to obtain just the missing facts. In our example, the external sub-queries generated by our algorithm for the particular query ⟨ina:per1, owl:sameAs, ina:per2⟩ are shown in Figure 2.3.
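For concreteness, the sketch below gives the flavour of such external sub-queries in SPARQL. It is our reconstruction based on rules R4 and R5, not the exact output shown in Figure 2.3; the endpoint URL is taken from the example in Section 2.4.3, and prefix declarations are omitted:

```python
# Our reconstruction of the kind of external sub-queries produced
# for the target query <ina:per1, owl:sameAs, ina:per2>.
DBPEDIA_ENDPOINT = "http://dbpedia.fr"   # entry point used in the chapter's example
EXTERNAL_SUBQUERIES = [
    # From R4: is there an external entity with the same name that
    # presented a program with the locally known title?
    """SELECT ?x WHERE { ?x foaf:name "Jacques Martin" .
                         ?x db:presenter "Le Petit Rapporteur" . }""",
    # From R5: fetch the birthdate of such an entity, to be compared
    # with the locally known birthdate of ina:per2.
    """SELECT ?x ?b WHERE { ?x foaf:name "Jacques Martin" .
                            ?x foaf:birthdate ?b . }""",
]
```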

2.3 Related work

There exist a considerable number of systems that semi-automatically or automatically perform data interlinking (see [28] for a survey). For most of these systems, whether to link two instances or not boils down to comparing the values of the instances for all, or some, of the properties they have, in such a way that the closer these values are, the more likely it is that the instances will be linked. The quality of the results they produce depends on the similarity and aggregation functions they use, and also on the threshold above which links are asserted. The point that most distinguishes all these numerical approaches from ours is that they are not declarative, with the inherent limitation of being difficult to extend or to adapt to new settings with new properties.

In fact, there are only a few existing declarative approaches, in which the comparison methods can be specified by users in the form of (XML) specifications, as in Silk [69], or of rules, as in LN2R [63] and Hogan et al. [40].

The Silk specifications can be seen as rules with built-in functions for both computing and aggregating degrees of similarity between property values. However, these rules are not fully exploited. In particular, the possible chaining between rules is not handled by Silk, making it possibly incomplete for discovering all the same-as facts that can be logically inferred. Compared to a complete forward reasoner, given a set of rules, Silk performs a single inference step and returns the same-as facts that can be inferred without rule chaining. In particular, the same-as transitivity rule, while being expressible in Silk, is not fully handled automatically: it is not iteratively applied until a fixpoint is reached. As a result, neither Silk nor LIMES [54] would discover that ina:per1 is the same as ina:per2 in the illustrative scenario explained above. Closely related to Silk, the LIMES framework is conceived to optimize the number of comparisons between property values when linking two datasets. This optimization is based on estimating the similarity between a pair of values from the similarity between each of the two values and an exemplar point in a metric space.

LN2R [63] and Hogan et al. [40] come with a complete forward reasoner and thus guarantee to infer all the same-as facts (and, for LN2R, all the different-from facts) that can be logically entailed from the rules and facts given as input.

Unlike LN2R, Silk (and LIMES) and Hogan et al. consider the problem of using remote data sources to discover links. However, in contrast with our approach, the external RDF facts that are loaded to complete local data are obtained either by a global import (of a whole RDF graph of a reference dataset such as DBpedia) or as an incoming data stream produced by a Linked Data crawler. This raises scalability issues that are bypassed either by light-weight incomplete reasoning (as in Silk) or by using intensive computational resources requiring clusters of machines (as in [40]). Instead, our import-by-query algorithm iteratively builds SPARQL queries for importing from external sources in Linked Data the necessary and sufficient data for resolving the link query.

A so-called import-by-query approach has been introduced in [34] for ontology reuse, by specifying the limited access authorized to a remote ontology through a query interface. The associated algorithm, in contrast with ours, does not build conjunctive queries but ABox satisfiability queries used as oracles in tableau-based reasoning.

2.4 Formal background and problem statement

We first recall the ingredients of Linked Data and we define what we call a deductive RDF dataset to capture several ontological constraints and rules for reasoning on data semantics. We end this section by formally stating the problem we consider.

2.4.1 URIs, URLs and namespaces

In Linked Data, anything (an individual, a class, a predicate, a dataset, a namespace) can be identified by a Uniform Resource Identifier (URI for short). URIs enable interaction over the web using specific protocols such as HTTP. URLs are URIs that, in addition to identifying web resources, provide the address to access them.

Namespaces provide a way to avoid term conflicts between the vocabularies used in different datasets. A name in a namespace is a string composed of a prefix, which is a URI of the namespace, and a local name. For example, the name 'http://xmlns.com/foaf/0.1/knows' refers to the predicate with the local name "knows" within the namespace providing, at the URL 'http://xmlns.com/foaf/0.1/', an RDF vocabulary devoted to describing social networks. This vocabulary can be reused in other datasets within Linked Data. A compact form is possible, for instance "foaf:knows", as soon as the prefix "foaf" has been defined as a shortcut for the full URL by a prefix declaration: @PREFIX foaf: <http://xmlns.com/foaf/0.1/>.
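Expanding a compact name against its prefix declaration is plain string concatenation, as this minimal sketch shows (the dictionary of prefixes is our illustrative assumption):

```python
# Sketch: resolving a compact name against a prefix declaration.
PREFIXES = {"foaf": "http://xmlns.com/foaf/0.1/"}

def expand(name: str) -> str:
    prefix, local = name.split(":", 1)
    return PREFIXES[prefix] + local

assert expand("foaf:knows") == "http://xmlns.com/foaf/0.1/knows"
```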

2.4.2 RDF datasets in Linked Data

An RDF dataset in Linked Data is defined by a URL u and a set F of RDF facts that are accessible through a query endpoint at the URL u. We will denote by ds(u) the set F of RDF facts that can be queried at the URL u.

An RDF fact is a triple t = ⟨s, p, o⟩ where the subject s is either a URI or a blank node, the predicate p is a URI, and the object o may be either a URI, a blank node or a literal. We will denote the vocabulary used in ds(u) by voc(u), i.e., the names of the predicates used to declare triples in the dataset accessible at the URL u.

2.4.3 Queries over RDF datasets in Linked Data

Queries over Linked Data are SPARQL conjunctive queries entered through a given query endpoint accessible at a given URL. In this chapter, we use a simplified notation for SPARQL queries, and, without loss of generality, we consider that all variables are distinguished.

A query q(u) asked to an RDF dataset identified by (and accessible at) the URL u is a conjunction of triple patterns denoted by TP_1(v_1), ..., TP_k(v_k), where each triple pattern TP_i(v_i) is a triple ⟨s^v, p^v, o^v⟩ in which the subject s^v, the predicate p^v, or the object o^v can be variables; v_i is the set of variables appearing in the triple pattern. Variables are denoted by strings starting with '?'. TP_i(v_i) is a ground triple pattern if its set of variables v_i is empty (denoted by TP_i()). A ground triple pattern corresponds to an RDF fact. A boolean query is a conjunction of ground triple patterns. For instance, the first SPARQL query of Figure 2.3 in the previous section becomes, in this simplified notation: q(http://dbpedia.fr): ⟨dbpedia:per1, dbpedia:presenter, "Le Petit Rapporteur"⟩.

The evaluation of a query q(u): TP_1(v_1), ..., TP_k(v_k) over the dataset ds(u) consists in finding substitutions θ assigning the variables in $\bigcup_{i \in [1..k]} v_i$ to constants (i.e., identifiers or literals) such that TP_1(θ.v_1), ..., TP_k(θ.v_k) are RDF facts in the dataset.

The corresponding answer is equally defined as the tuple of constants assigned by θ to the variables, or as the set of corresponding RDF facts TP_1(θ.v_1), ..., TP_k(θ.v_k), which will be denoted by θ.q(u). In the remainder of the chapter, we will adopt the latter definition. The answer set of the query q(u) against the dataset ds(u) = F is thus defined as:

$$ Answer(q(u), F) \;=\; \bigcup_{\{\theta \,\mid\, \theta.q(u) \subseteq F\}} \theta.q(u) $$

For a boolean query q(u), either the answer set is not empty and we will say that the query is evaluated to true, or it is empty and we will say that it is evaluated to false.
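The following minimal Python sketch (our illustration of the definitions above, using the rule/pattern encoding introduced earlier) evaluates a conjunctive query over a set of facts by enumerating the substitutions θ such that θ.q(u) ⊆ F:

```python
def _match(pattern, fact, theta):
    """Try to extend substitution theta so that theta applied to
    pattern yields fact; return the extended theta, or None."""
    theta = dict(theta)
    for p, f in zip(pattern, fact):
        if p.startswith("?"):                 # variable position
            if theta.setdefault(p, f) != f:   # conflicting binding
                return None
        elif p != f:                          # constant mismatch
            return None
    return theta

def answers(query, facts):
    """All substitutions theta such that theta.query is a subset of facts."""
    thetas = [{}]
    for pattern in query:
        thetas = [t2 for t in thetas for fact in facts
                  if (t2 := _match(pattern, fact, t)) is not None]
    return thetas

# A boolean (fully ground) query evaluates to true iff answers() is
# non-empty; a query with variables returns one substitution per match.
F = {("ina:per2", "ina:name", "Jacques Martin"),
     ("ina:per3", "ina:name", "Jacques Martin")}
print(answers([("?x", "ina:name", "Jacques Martin")], F))
```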

For a query q(u) to have a chance of getting an answer when evaluated over the dataset ds(u), it must be compatible with the vocabulary used in this dataset, i.e., (a) the predicates appearing in the triple patterns of q(u) must belong to the set voc(u) of predicates known to occur in ds(u), and (b) the URIs appearing as constants in the triple patterns of q(u) must have u as prefix.

In accordance with SPARQL queries allowing several FROM clauses, a conjunctive query can in fact specify several entry points u_1, ..., u_n of datasets over which it is to be evaluated. We will denote such a query q(u_1, ..., u_n). The above definitions of answers and compatibility can be generalized appropriately by replacing the dataset ds(u) by the union $\bigcup_{i \in [1..n]} ds(u_i)$ of the specified datasets.

2.4.4 Deductive RDF datasets

In order to capture in a uniform way semantic constraints that can be declared on top of a given RDF dataset, but also possible mappings between local predicates and external predicates within the vocabulary of other datasets, as well as domain knowledge provided by domain experts, we consider that RDF datasets can be enriched with Datalog rules of the form Cond_r ⇒ Conc_r, in which the condition Cond_r is a conjunction of triple patterns (i.e., a conjunctive query) and the conclusion Conc_r is a triple pattern. We consider safe rules, i.e., rules such that all the variables in the conclusion also appear in the condition. Datalog rules on top of RDF facts capture most of the OWL constraints used in practice, while guaranteeing a polynomial data complexity for reasoning and query answering.

A deductive RDF dataset dds(u) accessible at the URL u is thus a local knowledge base ⟨F, R⟩ made of a set F of RDF facts and a set R of rules. The application of rules makes it possible to infer new facts that are logically entailed from F ∪ R. A rule r can be applied to F if there exists a substitution θ such that θ.Cond_r ⊆ F, and the result of the rule application is F ∪ {θ.Conc_r}. These new facts can in turn trigger rules and infer additional facts. The point is that from a finite set F of facts and a finite set R of safe rules, the set of facts that can be inferred is finite and can be computed as the least fixpoint of the immediate consequence operator T_R, defined as follows:

$$ T_R(F) \;=\; F \,\cup\, \bigcup_{r \in R} \{\theta.Conc_r \mid \theta.Cond_r \subseteq F\} $$

Let F_0 = F, and for every i ≥ 0, let F_{i+1} = T_R(F_i). There exists a unique least fixpoint F_n (which will be denoted by SAT(F, R)) such that for every k ≥ n, F_k = T_R(F_n), i.e., there exists a step in the iterative application of the immediate consequence operator at which no new fact is inferred.

The evaluation of a query q(u): TP_1(v_1), ..., TP_k(v_k) over a deductive dataset dds(u) consists in finding substitutions θ such that the facts TP_1(θ.v_1), ..., TP_k(θ.v_k) can be inferred from the deductive dataset, or equivalently belong to the set SAT(F, R) of facts that can be inferred from F and R:

$$ Answer(q(u), \langle F, R \rangle) \;=\; Answer(q(u), SAT(F, R)) $$

Therefore, a boolean query q(u) is evaluated to true if and only if q(u) ∈ SAT(F, R), i.e., if and only if ⟨F, R⟩ ⊢ q(u), where the symbol ⊢ is the standard notation for logical inference.

Within the vocabulary of a deductive dataset, we distinguish the extensional predicates (EDB predicates for short), which appear in the triples of the dataset F, from the intensional predicates (IDB predicates), which appear in the conclusions of some rules in R. As in deductive databases, and without loss of generality (i.e., by possibly renaming predicates and adding rules), we suppose that these two sets are disjoint. We will call ODB predicates the external predicates (i.e., those defined in a different namespace than that of the considered deductive dataset) that possibly appear in the dataset or in the rules. These predicates are at the core of Linked Data, in which a good practice is to reuse existing reference vocabularies. We suppose (again, without loss of generality) that the set of ODB predicates is disjoint from the set of IDB predicates (but not necessarily from the set of EDB predicates).

2.4.5 Problem statement

Given a deductive dataset dds(u) = ⟨F, R⟩ and a boolean query q(u) whose local evaluation gives an empty answer set (i.e., ⟨F, R⟩ ⊬ q(u)), we aim to construct a set of external queries q_1(u_1), ..., q_k(u_k) for which we can guarantee that the subsets of external facts resulting from their evaluation over the (possibly huge) external datasets are sufficient to answer the initial query, i.e., more formally:

$$ \langle F \cup \bigcup_{i \in [1..k]} Answer(q_i(u_i), ds(u_i)),\ R \rangle \vdash q(u) \quad \text{iff} \quad \langle F \cup \bigcup_{i \in [1..k]} ds(u_i),\ R \rangle \vdash q(u) $$

The more specific the external queries are, the fewer external facts have to be added and stored in the local dataset, and therefore the more interesting a proposed approach is for solving this problem.

2.5 The iterative Import-by-Query Algorithm

We now describe the general algorithm that we have designed and implemented for solving the problem stated above.

Given an input boolean same-as query q, a deductive dataset ⟨F, R⟩, and a set ū of query entry points to external datasets, Import-by-Query iteratively alternates steps of sub-query rewriting based on backward chaining and of external query evaluation.

Each sub-query rewriting step is realized by an adaptation of the Query-Subquery algorithm [3, 68], a backward chaining method used in deductive databases for evaluating Datalog programs. We will describe in detail (see Section 2.5.1) this adaptation, which we have called Query-External-Subquery (QESQ for short). QESQ either succeeds in proving the goal q locally (i.e., using F and R, just like Query-Subquery), in which case the process is stopped and the result returned by Import-by-Query is true, or it produces a set {q_1(ū_1), ..., q_k(ū_k)} of external queries compatible with the vocabulary of the corresponding external datasets, whose evaluation is likely to bring to F the missing facts for proving the goal q using R. This set may be empty: in this case, the process is stopped and the result returned by Import-by-Query is false.

Each evaluation step simply consists in choosing one of the external queries q_i(ū_i) produced by the sub-query rewriting step and submitting it to Linked Data through the specified query entry points. The result is either an empty set (negative result) or a set of external facts (positive result) that can be added to the current local dataset. In both cases, the result is memorized in an associated answer table for the sub-query q_i(ū_i), which will thus be marked as an already processed subgoal for which the (positive or negative) result is known and can be directly exploited later on. If the result is positive, a new iteration of Import-by-Query is started on the same input, except for the set of facts F, which is enriched with the facts obtained as the result of the evaluation of the external query q_i(ū_i). If the result is negative, another external query q_j(ū_j) in the set produced by the current call to QESQ is evaluated. If the evaluation of all the external queries in the set returns 'false', then the process is stopped and the result returned by Import-by-Query on q is false.

We now focus on the QESQ algorithm, which is the core of the sub-query rewriting step, to explain in more detail how it works.

2.5.1 The QESQ algorithm

QESQ is an adaptation of the Query-Subquery algorithm, a set-oriented memoing backward chaining method [39]. Besides being complete, with guaranteed termination even for recursive sets of rules (in contrast with the method underlying Prolog), memoing methods can also dramatically increase efficiency by reusing previously computed answers. For this memoing purpose, QESQ handles two triple tables, answer_p and goal_p (empty at the beginning), for each ODB and IDB predicate p: goal_p stores the different subgoals on predicate p that are under process, and answer_p stores the answers that have been computed for any atomic query involving p.

QESQ is a recursive algorithm that starts with an input boolean atomic query q: ⟨s, p, o⟩ and treats it as the goal to solve using the input dataset F and the input set of rules R. For this, if the goal is not trivially solved by a direct evaluation over the dataset F (i.e., if ⟨s, p, o⟩ ∉ F), QESQ tries to rewrite the current goal into a list of subgoals obtained by unfolding a rule r whose conclusion Conc_r can be matched with the current goal, i.e., such that there exists a substitution θ with θ.Conc_r = ⟨s, p, o⟩. The initial goal is then replaced by the list of subgoals θ.TP_1(v_1), ..., θ.TP_k(v_k) composed of the partial instantiations by θ of the triple patterns TP_1(v_1), ..., TP_k(v_k) in the condition of r. If there is no such rule, QESQ stops and returns 'false'. Otherwise, it is recursively called on the new list of subgoals.

A general recursive call of QESQ then applies to a list SG = [g_1, ..., g_k] where each subgoal g_i is a triple pattern ⟨s_i^v, p_i, o_i^v⟩. The treatment of each subgoal differs depending on whether the predicate p_i is an extensional (EDB), external (ODB) or intensional (IDB) predicate. By a slight abuse of notation, the call returns as output either TRUE or FALSE (if it has enough local information to infer a result to the input boolean query), or a non-empty set of external queries (compatible with the vocabulary of the given external datasets). A recursive call of QESQ can be summarized as follows:

1. IF there exists g_i = ⟨s_i^v, p_i, o_i^v⟩ among the subgoals such that the predicate p_i is an EDB predicate, or such that g_i has been handled before (i.e., g_i can be matched to a triple in goal_pi), THEN evaluate the atomic query g_i = ⟨s_i^v, p_i, o_i^v⟩ over F, or over answer_pi:

• IF there is no answer and p_i is not an ODB predicate THEN return FALSE

• ELSE, for each answer, let θ be the corresponding substitution (it can be empty if the query g_i is boolean), i.e., such that θ.g_i ∈ F or θ.g_i ∈ answer_pi:

– Let NewSG be the list of new subgoals obtained from SG by removing g_i and by replacing the other subgoals g_j by θ.g_j.

– IF NewSG is empty THEN return TRUE ELSE trigger a new recursive call of QESQ on NewSG.

2. IF there exist subgoals not seen before with ODB predicates THEN:

Let q_ext be their conjunction.

IF q_ext is compatible with the vocabulary of ū (the given entry points to external datasets) THEN:

• Let NewSG be the list of new subgoals obtained from SG by removing all the subgoals in q_ext.

• IF NewSG is empty THEN return q_ext ELSE

– for each subgoal ⟨s^v, p, o^v⟩ in q_ext, add it to the table goal_p,

– trigger a new recursive call of QESQ on NewSG:

∗ IF it does not return TRUE or FALSE THEN return Q_ext ∪ q_ext where Q_ext is the result of QESQ(NewSG).

3. ELSE (all the subgoals have IDB predicates and have not been seen before):

• let g = ⟨s^v, p, o^v⟩ be the first subgoal in SG and let R_g be the set of rules whose conclusion can be matched with g,

• IF R_g = ∅ THEN return FALSE

• ELSE:

– success ← false ; SetOfQext ← ∅

– WHILE R_g ≠ ∅ and ¬success:

∗ choose r in R_g and let θ be the substitution s.t. θ.Conc_r = g,

∗ let NewSG be the list of new subgoals obtained from SG by replacing g with θ.TP_1(v̄_1), ..., θ.TP_k(v̄_k), where TP_1(v̄_1), ..., TP_k(v̄_k) are the conditions of r, and by replacing the other subgoals g_j with θ.g_j,

∗ add g to the table goal_p,

∗ remove r from R_g,

∗ trigger a new recursive call of QESQ on NewSG,

∗ IF it returns TRUE THEN success ← true,

∗ ELSE IF it does not return FALSE THEN SetOfQext ← SetOfQext ∪ Q_ext where Q_ext is the result of QESQ(NewSG),

– IF success THEN return TRUE ELSE IF SetOfQext = ∅ THEN return FALSE ELSE return SetOfQext.
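To make the control flow concrete, here is a minimal Python sketch of this three-case dispatch. It is illustrative only (the thesis implementation is in SWI-Prolog): triples are plain 3-tuples, variables are '?'-prefixed strings, a rule is a (conditions, conclusion) pair, unification is reduced to one-way matching, and the memoing tables goal_p / answer_p are collapsed into a single `seen` set.

```python
# Illustrative sketch of QESQ's dispatch; not the thesis's SWI-Prolog code.

def match_fact(goal, fact):
    """Substitution theta with theta.goal == fact, or None (goal side binds)."""
    theta = {}
    for g, v in zip(goal, fact):
        if g.startswith('?'):
            if theta.setdefault(g, v) != v:
                return None
        elif g != v:
            return None
    return theta

def substitute(theta, triple):
    return tuple(theta.get(t, t) for t in triple)

def qesq(subgoals, F, rules, EDB, ODB, seen):
    """Return True, False, or a non-empty set of external (ODB) queries."""
    if not subgoals:
        return True
    # Case 1: evaluate an EDB subgoal against the local facts F.
    for i, g in enumerate(subgoals):
        if g[1] in EDB:
            thetas = [t for t in (match_fact(g, f) for f in F) if t is not None]
            if not thetas and g[1] not in ODB:
                return False                    # branch blocked locally
            ext = set()
            for theta in thetas:
                rest = [substitute(theta, h)
                        for h in subgoals[:i] + subgoals[i + 1:]]
                r = qesq(rest, F, rules, EDB, ODB, seen)
                if r is True:
                    return True
                if r is not False:
                    ext |= r
            return ext if ext else False
    # Case 2: group the unseen ODB subgoals into one external query.
    q_ext = tuple(g for g in subgoals if g[1] in ODB and g not in seen)
    if q_ext:
        seen.update(q_ext)
        rest = [g for g in subgoals if g not in q_ext]
        r = qesq(rest, F, rules, EDB, ODB, seen)
        return False if r is False else ({q_ext} if r is True else r | {q_ext})
    # Case 3: unfold the first (IDB) subgoal with every applicable rule.
    g, rest, ext = subgoals[0], subgoals[1:], set()
    seen.add(g)
    for cond, concl in rules:
        theta = match_fact(concl, g)            # bind rule variables to g
        if theta is None:
            continue
        new_sg = [substitute(theta, c) for c in cond] + \
                 [substitute(theta, h) for h in rest]
        r = qesq(new_sg, F, rules, EDB, ODB, seen)
        if r is True:
            return True
        if r is not False:
            ext |= r
    return ext if ext else False
```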

The termination is guaranteed by the same memoing technique as Query-Subquery (i.e., in the above algorithm, by handling the tables goal_p and answer_p for each ODB and IDB predicate). If the result returned by QESQ is TRUE or FALSE, it did not pass through case 2 and behaved exactly like Query-Subquery. By the soundness and completeness of Query-Subquery [68], if the output is TRUE, the boolean input query is entailed by the local facts and the rules; if the output is FALSE, all the rules likely to infer it are blocked by a lack of local facts to instantiate the EDB predicates in the rule conditions, and thus no external fact would help. If the result returned by QESQ is neither TRUE nor FALSE, it ends up with a non-empty set of external queries that correspond to all the possible branches of reasoning that have not been blocked. Once an answer is returned for a particular external query, the corresponding branch of reasoning will be ended (either by TRUE or FALSE) by the next call of QESQ, which will behave on this branch just like Query-Subquery. If it ends with TRUE, Import-by-Query stops (with success); if it ends with FALSE, Import-by-Query chooses another external query to evaluate in the waiting list. It stops with failure (returning FALSE) when the list of external queries is empty.

This shows that the Import-by-Query algorithm is sound and complete for solving the problem stated in Section 2.4.5.

2.5.2 Combining forward and backward chaining

Like any backward chaining method, Import-by-Query (and its main component QESQ) restarts from scratch for each new goal it tries to solve, even if the facts and the rules remain unchanged. The intermediate subgoals generated and handled by QESQ can be simplified if the input rules are replaced by their (partial) instantiations obtained by propagating the facts into (the conditions of) the rules.

Fact propagation is a forward chaining method used in inference engines such as RETE [29] for rule-based systems. It avoids redundant evaluation of the same conditions appearing in several rules by memorizing, for each fact f, which condition it satisfies in which rule (possibly already partially instantiated by previously propagated facts), together with the corresponding variable substitution, which is then applied to all the remaining conditions of the rule.

In our setting, we perform fact propagation as a pre-processing step of the import-by-query algorithm, by computing at the same time the set SAT(F,R) of facts that can be inferred locally and the set PI(F,R) of partial instantiations of the rules in R. This forward reasoning step can be summarized as follows (a small sketch in code is given after the listing), where SAT(F,R) is initialized as F and PI(F,R) is initialized as R:

• FOR each f in SAT(F,R):

– FOR each rule Cond_r ⇒ Conc_r in PI(F,R) having a condition c that can be matched with f, i.e., such that there exists θ with θ.c = f:

∗ IF c is the only condition in Cond_r THEN add θ.Conc_r to SAT(F,R)

∗ ELSE add to PI(F,R) the rule obtained from θ.Cond_r ⇒ θ.Conc_r by removing the condition θ.c (which is satisfied by the fact f).

• Remove from PI(F,R) those rules whose condition contains EDB predicates that are not ODB predicates (and thus cannot be satisfied by local facts).

• RETURN ⟨SAT(F,R), PI(F,R)⟩
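The following Python sketch illustrates this pre-processing step under the same naive representation assumptions as the QESQ sketch above (triples as tuples, '?'-prefixed variables); it is deliberately simplistic, where a real engine such as RETE would index conditions and re-join partially instantiated rules against all facts.

```python
# Illustrative sketch of fact propagation computing SAT(F,R) and PI(F,R).

def matches(cond, fact):
    """Substitution theta with theta.cond == fact, or None."""
    theta = {}
    for c, v in zip(cond, fact):
        if c.startswith('?'):
            if theta.setdefault(c, v) != v:
                return None
        elif c != v:
            return None
    return theta

def substitute(theta, triple):
    return tuple(theta.get(t, t) for t in triple)

def propagate(F, R, EDB, ODB):
    SAT, PI = list(F), list(R)          # rule = (list of conditions, conclusion)
    i = 0
    while i < len(SAT):                 # SAT grows while we scan it
        f = SAT[i]
        for cond, concl in list(PI):    # snapshot: PI also grows
            for c in cond:
                theta = matches(c, f)
                if theta is None:
                    continue
                if len(cond) == 1:      # last condition satisfied: rule fires
                    new = substitute(theta, concl)
                    if new not in SAT:
                        SAT.append(new)
                else:                   # keep a partially instantiated rule
                    rest = [substitute(theta, d) for d in cond if d is not c]
                    PI.append((rest, substitute(theta, concl)))
        i += 1
    # Drop rules whose conditions contain EDB predicates that are not ODB
    # predicates (they can no longer be satisfied by local facts).
    PI = [(cond, concl) for cond, concl in PI
          if all(c[1] not in EDB or c[1] in ODB for c in cond)]
    return SAT, PI
```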

Each partially instantiated rule r_i returned in PI(F,R) is issued from an input rule r in which some conditions have been matched to facts f_1, ..., f_k inferred before (and added to SAT(F,R)); it therefore enables the same conclusion as the input rule r to be inferred on any set of facts including f_1, ..., f_k.

The result SAT(F,R) ∪ PI(F,R) is then logically equivalent to the input deductive dataset F ∪ R for inferring facts on IDB predicates from the union of F and a set OF of external facts (with ODB predicates), i.e., for every fact f and every external set of facts OF:

⟨F ∪ OF, R⟩ ⊢ f   iff   ⟨SAT(F,R) ∪ OF, PI(F,R)⟩ ⊢ f

Therefore, it can be used equivalently for proving goals by checking whether they belong to SAT(F,R), or for rewriting goals by applying QESQ to PI(F,R) (instead of the original R).

Compared to other forward chaining methods (such as semi-naive bottom-up evaluation in deductive databases), fact propagation may incur an additional cost (in time and space) that is likely to be amortized very quickly, either for answering boolean queries locally (by checking whether they belong to SAT(F,R)) or for rewriting those that cannot be answered locally, by applying QESQ on PI(F,R). Note that since facts and rules are parameters, it is possible to propagate only a subset of the local facts into the set of rules.

2.6 Experiments

We have conducted a series of experiments on a real deductive dataset composed of ∼6 million RDF facts from the INA dataset and 35 rules. The facts describe videos of French TV programs, including information about their duration, title, subject and publishing date, and French personalities (actors, singers, presenters, etc.), including their name, additional (but possibly incomplete) information about their date of birth and nationality, and their roles in the videos (producer, presenter, participant, etc.).

As hinted in our illustrative example of Section 2.2, the INA dataset contains many homonymous persons, i.e., individuals of the class ina:PhysicalPerson that have the same value for the property ina:name (see Figure 2.1 in Section 2.2). INA experts, however, do not know whether these entities represent the same or different persons. There exist two subclasses of ina:PhysicalPerson, namely ina:Person and ina:VideoPerson. The class ina:Person is devoted to representing French personalities, whereas ina:VideoPerson is used for identifying entities of persons that play a role in a video. While ina:Person entities come with additional (but possibly incomplete) information, ina:VideoPerson entities come only with information about their names and the videos in which they play a role. INA experts are particularly interested in disambiguating individuals within ina:Person, as well as linking these individuals to the ones of ina:VideoPerson.

Concerning the second component of the deductive dataset, the 35 rules capture available domain knowledge (e.g., functional properties and keys declared as schema constraints, and domain-specific rules provided by INA experts), mappings between the INA and DBpedia vocabularies, and general properties of the predicates owl:sameAs and owl:differentFrom. One rule expresses the transitivity of owl:sameAs and another states the relation between owl:sameAs and owl:differentFrom (namely, if x1 is the same as x2 and x2 is different from x3, then x1 is different from x3). Interestingly, these last two rules happen to be recursive. Eleven of the 35 rules enable different-from facts to be concluded, some of them requiring built-in predicates such as not-equal, less-or-equal, greater-or-equal, minus, sum, etc. In Section 2.2 we showed 7 of the 35 rules (Table 2.1); the complete set of rules can be found at http://bit.ly/1slxLHj.

It is worth noting that the 35 rules could be extended or modified without changing the algorithmic machinery of our approach.

Built-in functions allow us to tolerate slight differences when comparing literal values. For example, consider a rule like the one shown in Table 2.2, which states that two persons with the same birthdate and deathdate, and "similar" names, must be the same person. Whenever our algorithm encounters a ground triple that matches ⟨?n1, built-in:name-similar, ?n2⟩, it replaces it with the boolean value returned by a built-in function which checks whether the similarity of the two names — for a specific similarity metric — is above a given threshold. In our experiments we used edit distance with a threshold of 0.99 (a sketch of such a built-in is given after Table 2.2).

R8: IF ⟨?x1, ina:name, ?n1⟩, ⟨?x2, foaf:name, ?n2⟩, ⟨?n1, built-in:name-similar, ?n2⟩, ⟨?x1, ina:birthdate, ?b⟩, ⟨?x2, foaf:birthdate, ?b⟩ THEN ⟨?x1, owl:sameAs, ?x2⟩

TABLE 2.2: A rule for dealing with similarities between names.
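As an illustration, here is a minimal Python sketch of such a built-in, assuming normalized edit (Levenshtein) distance as the similarity metric; the 0.99 threshold comes from the text, while the function names are hypothetical.

```python
# Hypothetical sketch of the built-in:name-similar predicate.

def edit_distance(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def name_similar(n1: str, n2: str, threshold: float = 0.99) -> bool:
    """True iff the normalized similarity of the two names is above the threshold."""
    if not n1 and not n2:
        return True
    sim = 1.0 - edit_distance(n1, n2) / max(len(n1), len(n2))
    return sim >= threshold
```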

2.6.1 Experimental Goals and Set-Up

The goal of our experiments was threefold: (1) to demonstrate that external information available in Linked Open Data can be useful for inferring owl:sameAs and owl:differentFrom facts among INA referenced persons, and thus for disambiguating local homonyms; (2) to assess the gain, in terms of reduced imported facts, of our import-by-query approach compared to approaches based on forward reasoning only; and (3) to evaluate the runtime of our import-by-query algorithm and the possible amortized gain when fact propagation is performed beforehand.

The external datasets from Linked Open Data with which the INA vocabulary shares terms are DBpedia.org and DBpedia.fr. The baseline for evaluating our first two goals is a set of 0.5 million external facts obtained by downloading from DBpedia.org and DBpedia.fr (using their SPARQL endpoints) all the facts about entities having the same name as one of the homonyms in the INA dataset. We applied a preprocessing step to the original INA dataset to keep only the facts on predicates appearing in the rule conditions. The resulting dataset, which contains almost 1.15 million RDF facts, is the INA dataset referred to henceforth.
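A sketch of how such a baseline could be collected is given below. The endpoint URLs are real, but the query shape is a simplification of what the text describes (exact, untagged literal match; no paging or error handling).

```python
# Sketch of collecting the baseline: for one homonym name, download from a
# DBpedia SPARQL endpoint all facts about entities bearing that name.
import json
import urllib.parse
import urllib.request

ENDPOINTS = ["https://dbpedia.org/sparql", "https://fr.dbpedia.org/sparql"]

def facts_about_name(name, endpoint):
    query = """
        PREFIX foaf: <http://xmlns.com/foaf/0.1/>
        SELECT ?s ?p ?o WHERE {
          ?s foaf:name %s .
          ?s ?p ?o .
        }""" % json.dumps(name)   # JSON quoting doubles as SPARQL string quoting
    url = endpoint + "?" + urllib.parse.urlencode(
        {"query": query, "format": "application/sparql-results+json"})
    with urllib.request.urlopen(url) as resp:
        rows = json.load(resp)["results"]["bindings"]
    return [(r["s"]["value"], r["p"]["value"], r["o"]["value"]) for r in rows]
```

In practice DBpedia name literals often carry language tags, so a robust version would match on language-tagged or lower-cased labels; this is omitted here for brevity.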

Our algorithms have been implemented in SWI-Prolog. All the evaluations were done on a machine with an Intel i7 Quad-core processor and 6 GB of memory.

2.6.2 Experimental Results

For evaluating our first experimental goal, we applied (using our forward reasoner) the set of 35 rules to (a) the INA dataset only, and (b) the union of the INA dataset with the baseline external facts, and we compared the numbers of owl:sameAs and owl:differentFrom facts obtained on INA homonyms in the two cases.

The rules applied to the INA dataset only allowed 110 facts to be inferred (2 owl:sameAs facts and 108 owl:differentFrom facts), compared to the 14,648 facts (4,884 owl:sameAs facts and 9,764 owl:differentFrom facts) inferred when the external facts were added to the process. This clearly demonstrates the benefit of using external information coming from Linked Open Data for local disambiguation.

The resulting 14,648 facts are guaranteed to be correct under the assumption that both the rules and the data are correct. Although the correctness of the rules was guaranteed by INA experts, the DBpedia and INA datasets may contain noisy data. Thus, we asked INA experts to evaluate a random sample of 500 facts, and all of them turned out to be correct.³

It is worth noting that the rule expressing the transitivity of owl:sameAs is crucial for inferring all the owl:sameAs facts that cannot be inferred locally, and more generally that full reasoning is important for discovering owl:sameAs and owl:differentFrom facts. To demonstrate this, we applied Silk to the same two datasets (the INA dataset only, and the union of the INA dataset with the baseline external facts). For doing so, we first translated our rules into the Silk specification language. It is not possible, however, to translate into Silk our rules concluding on owl:differentFrom atoms. Thus, we focused on the rules leading to owl:sameAs inference. Among the 4,884 owl:sameAs facts discovered by our full forward reasoner, Silk only discovered 88 of them, i.e., less than 2% of the total.

³This sample size ensures with 95% confidence that the proportion of correct facts obtained is greater than 0.99.

For evaluating our second experimental goal, we took as reference boolean queries the above sample of 500 owl:sameAs and owl:differentFrom facts, and we applied our import-by-query algorithm to each of these boolean queries. The number of external facts imported by our algorithm over all boolean queries was 6,417, i.e., 13 imported facts per boolean query on average. In contrast, the total number of baseline external facts needed to conclude the boolean queries with the forward reasoner was much higher (500,000). This demonstrates that our import-by-query algorithm drastically reduces the number of imported facts needed for disambiguating local data. The results of this experiment are summarized in Table 2.3.

                             Import-by-Query                                          Forward Reasoner
Number of imported facts     6,417 facts (13 facts on average per boolean query)      500,000 facts

TABLE 2.3: Gain in reduced imported facts.

Time to answer a boolean query after fact propagation: 7 seconds
Time to answer a boolean query without fact propagation: 186 seconds
Time to propagate facts (done once for all queries): 191 seconds
Gain of doing fact propagation beforehand for answering the 500 reference queries using import-by-query: 96%

TABLE 2.4: Gain in runtime.

We now report on the runtime evaluation. Our import-by-query algorithm requires 3 iterations on average — it successively outputs and evaluates 3 external sub-queries (each of them being produced by calling QESQ) — before termination. It takes on average 186 seconds per boolean query when applied to the initial set of rules and the local dataset. This drops to 7 seconds when it is applied to the partially instantiated rules obtained by fact propagation beforehand.

Concerning fact propagation, we propagated all facts involving properties of the class ina:Person. This took 191 seconds, but it is done once and for all. Its cost is thus amortized very quickly, as shown in Table 2.4, which reports the cumulative runtime required for applying the combined approach to all the reference owl:sameAs and owl:differentFrom facts.

2.7 Conclusion

In this chapter, we have proposed a novel approach for data linkage based on reasoning that is adapted to the decentralized nature of the Linked Data cloud. This approach builds on the formal and algorithmic background of answering Datalog queries over deductive databases, which we have extended to handle external rewriting when local answers cannot be obtained. In contrast with existing rule-based approaches for data linkage [40, 63] based on forward reasoning to infer same-as facts, Import-by-Query is a backward chaining algorithm that imports on demand only the external facts useful for inferring target same-as facts handled as boolean queries. Our experiments have shown that this approach is feasible and reduces the number of facts that need to be imported. Compared to the depth-first approach sketched in [2] for distributed Query-Subquery, our QESQ algorithm generates external rewritings in a breadth-first way.

Performing fact propagation beforehand in order to apply Import-by-Query to a set of more specific rules than the original ones is an optimization close to the ones proposed in QueryPIE [67] for efficient backward reasoning on very large deductive datasets. One important difference, though, is that in the QueryPIE setting, the problem of handling recursive rules can be fully delegated to forward reasoning because all the facts are given and the recursive rules concern a well identified subset of them (so-called terminological facts). In the decentralized setting that we consider, and also because we allow domain-specific rules with no restriction on the facts to which they can apply, Import-by-Query has to handle recursive rules and the termination issues they raise for backward reasoning. Another major difference is that while QueryPIE is designed for answering queries only, Import-by-Query performs query rewriting if no local answer can be obtained from the input deductive dataset.

Compared to the many recent works on ontology-based data access initiated by [24], in which query rewriting is done independently of the data, we have designed a hybrid approach that alternates (external) query rewriting and (local) query answering. To deal with possibly recursive Datalog queries, we distinguish IDB predicates, which can be locally rewritten, from ODB predicates, which can be externally rewritten. With this limitation, our approach is able to iteratively rewrite (boolean) Datalog queries on IDB predicates. We plan to look into this hybrid approach further, and in particular to deal with ontological constraints expressible in Datalog± [23], allowing existential variables in conclusions.

The interest of our rule-based approach is that it is generic and declarative. New rules can be added without changing the algorithmic machinery, for which we not only guarantee soundness and completeness but have also shown experimentally that it is feasible and useful in practice. At the moment the rules that we consider are certain, as they express logical domain constraints and equivalence mappings between classes and properties. As a result, the same-as facts that they allow us to infer are guaranteed to be correct (under the assumption that the input data does not contain erroneous facts). This is crucial for automatically obtaining same-as facts that are certain, in particular when the goal of discovering same-as links is data fusion, i.e., the replacement of two URIs by a single one in all the relevant facts. Another added value of obtaining certain same-as and different-from facts is the ability to detect noisy data thanks to contradictions. However, in many cases domain knowledge is not 100% sure, as with pseudo-keys [11] and probabilistic mappings [65]. Data itself may be uncertain due to trust and reputation judgements towards data sources [10]. Handling uncertain domain knowledge (such as pseudo-keys or constraints with exceptions) should enable the discovery of more same-as facts that may be true even if they are inferred with some uncertainty; they will just have to be validated by experts. In the following chapter, we extend our rule-based approach to model any kind of data and rule uncertainty as probabilities within the framework of Probabilistic Datalog [30].

Chapter 3

Reasoning With Uncertainty For Data Linkage

3.1 Introduction

Considering only certain facts and rules for data linkage, as we did in the previous chapter, limits the number of links that can be inferred. This is because only a limited number of the certain facts and rules needed to infer these links are available. In practice, the majority of the datasets available in Linked Data contain inaccurate data [4, 71]: data produced by annotation-extraction tools, outdated information, or even wrong data resulting from malicious behavior (e.g., injected by data spammers [36]). Even though there exist efforts for data cleansing [20, 44, 70], the huge amount of data available in Linked Data makes verifying and guaranteeing that all of it is certain very hard in practice. Analogously, rules often come with uncertainty weights produced by automatic rule-generation tools, and even the rules of domain experts are frequently given in the form "in many cases, if ... then ...". It is therefore inevitable to take uncertainty into account when solving the problem of data linkage.

At the same time, keeping a connection with the logical reasoning is useful for tracking the derivation of the inferred facts. This allows domain experts to easily determine the facts or rules that led to the inference of incorrect same-as or different-from facts and to update the corresponding uncertainty values if needed.

Our contribution in this chapter is twofold. First, we propose an extension of our rule-based logical approach that accepts facts and rules attached with uncertainty values. This extension is based on Probabilistic Datalog [31], which combines Datalog and probability theory to perform uncertain inference. Second, we show how to interpret the uncertainty values calculated for the inferred facts and how to benefit from the logical inference to return the best results to the user.

We start by explaining how to model uncertainty values for solving the problem of data linkage in Section 3.3. In Section 3.4 we adapt the forward reasoning algorithms presented in Chapter 2 to deal with uncertainty. Then, a discussion of the complexity of the algorithm is presented in Section 3.5. The possible application of the produced uncertainty values to tune the parameters of the reasoning algorithms is presented in Section 3.6. We experiment with and evaluate our approach in Section 3.7. Finally, we restate the import-by-query problem studied in Chapter 2 in this probabilistic setting in Section 3.8, and we present the probabilistic import-by-query algorithm in Section 3.8.1.

3.2 Sources of Uncertainty

In this section we give examples of the different possible sources of uncertain knowledge that exist in practice. These sources may provide uncertain facts or uncertain rules attached with weights that can be used later in our approach:

• On the rule level:

– Pseudo-key generator tools [12, 58]: These tools generate keys that may have some exceptions in a given dataset. For example, in a dataset that describes people, ⟨name, birthYear⟩ can be considered a pseudo-key. Each generated pseudo-key is thus attached with a weight calculated from the number of exceptions for that pseudo-key in the dataset. These weights can be attached to the uncertain rules that translate these pseudo-keys.

– Ontology alignment tools [25]: Most existing ontology alignment tools attach weights to the alignments they generate. These weights can be used as the probabilistic weights of the rules that translate these alignments.

– Domain experts: Domain experts may provide rules of the form "in many cases, IF ... THEN ..." with a qualitative assessment. These qualitative assessments can be translated into numerical ones to obtain rules assigned with weights. An example of this translation is given in our experiments in Section 3.7.

• On the data level:

– Trust and reputation tools [41]: These tools may assign trust values to different data sources. These values can be attached to the data imported from these sources. For example, a trust tool may assign DBpedia the trust value 0.9; then all facts downloaded from DBpedia can be assigned the probabilistic weight 0.9.

– String similarity tools [33]: Many linking rules disambiguate entities only if they share exactly the same values for certain properties (e.g., rules that translate keys or pseudo-keys). In practice, however, an exact match between values is not always guaranteed, especially for string values (e.g., shortened names, or names that differ in letter case or contain special characters). Uncertain facts that unify such different but similar string values can be used to overcome this problem, and the similarity value between the string values can be used as a weight for these facts. An example of such uncertain facts is given in the following illustrative example.

3.3 Modeling Uncertainty in Linked Data

Even though facts and rules may come with uncertainty weights of different natures from various sources, we decided to model uncertainty in Linked Data as probabilistic weights, and we extended our rule-based approach presented so far in Chapter 2 to deal with these probabilistic weights using Probabilistic Datalog. In this section we first give an illustrative example that introduces this model and extension; then we give the formal background for the new probabilistic setting.

3.3.1 Illustrative Example

Let us consider the extract of the INA and DBpedia datasets depicted in Figure 3.1. It describes two person homonyms from the INA dataset that share the same name "Jacques Martin". Only the birth year of ina:per1 is known, and only the death year of ina:per2 is given. The figure also describes a DBpedia entity with the name "Jacques Martin (homme politique)" for which both the birth year and the death year are known. As in the illustrative example presented in Section 2.2, we are interested in disambiguating the two INA entities ina:per1 and ina:per2 with the help of the external DBpedia entity db:per1.

FIGURE 3.1: A sample of certain facts from the INA and DBpedia datasets

To solve this problem, one may apply the set of rules in Table 3.1 over the set of facts. However, the rule r1, which considers two persons to be the same if they share the same name and the same birth year, is not 100% certain. For example, there exist in DBpedia two different athletes with the name "Michael Jackson" who were both born in 1969. We chose to model the uncertainty of the rules by associating probabilistic weights with them (e.g., the probabilistic weight 0.7 can be associated with the rule r1). These probabilistic weights are interpreted as the probability that the conclusion of the corresponding rule is true if its condition is true (i.e., instantiated by facts).

Rules r1, r2 and r3 in Table 3.1 cannot be applied directly over the set of facts to disambiguate the entities: although the name of the two INA homonyms is "similar" to the name of the DBpedia entity, applying these rules requires an exact string match between the two names. To ensure this match, the uncertain fact that attaches the name "Jacques Martin" to the DBpedia entity dbpedia:per1, i.e.:

db4: ⟨dbpedia:per1, dbpedia:name, "Jacques Martin"⟩ (with weight 0.6)

can be added to the set of facts. As for the uncertain rules, the uncertainty of this fact can be modeled by attaching a probabilistic weight to it, giving the probability that the corresponding fact is true. The similarity value 0.6 between the two names, sim("Jacques Martin", "Jacques Martin (homme politique)"), returned by the normalized edit distance algorithm, can be used as the weight of this uncertain fact.

r1 (prob. weight 0.7): IF ⟨?x1, ina:name, ?n⟩, ⟨?x2, dbpedia:name, ?n⟩, ⟨?x1, ina:birthYear, ?y⟩, ⟨?x2, dbpedia:birthYear, ?y⟩ THEN ⟨?x1, owl:sameAs, ?x2⟩

r2 (prob. weight 1.0): IF ⟨?x1, ina:name, ?n⟩, ⟨?x2, dbpedia:name, ?n⟩, ⟨?x1, ina:deathYear, ?y1⟩, ⟨?x2, dbpedia:deathYear, ?y2⟩, ⟨?y1, notEqual, ?y2⟩ THEN ⟨?x1, owl:differentFrom, ?x2⟩

r3 (prob. weight 1.0): IF ⟨?x1, owl:sameAs, ?x2⟩, ⟨?x2, owl:differentFrom, ?x3⟩ THEN ⟨?x1, owl:differentFrom, ?x3⟩

TABLE 3.1: A set of rules. The rule r1 is uncertain while the other rules are certain.

The certainty of the facts (ina1, ina2, db1, db2) and the rules (r2, r3) can be modeled in this probabilistic setting by assigning the probabilistic weight 1 to them.

Applying the rule r1 to the facts (ina1, ina2, db4, db2) leads to the inference that ina:per1 and dbpedia:per1 are the same person (i.e., ⟨ina:per1, owl:sameAs, dbpedia:per1⟩) with some probability. By supposing that the facts and rules are independent, we can compute the probability of the inferred sameAs link as the product of the probabilities of the facts (ina1, ina2, db4, db2) and the rule (r1) that participated in the inference of this link (i.e., 1 × 1 × 0.6 × 1 × 0.7 = 0.42). In the same manner, the link ⟨ina:per2, owl:differentFrom, dbpedia:per1⟩ can be inferred from the facts (ina3, ina4, db4, db3) and the rule r2 with probability 1 × 1 × 0.6 × 1 × 1 = 0.6.

These two examples of uncertain inferred links indicate that combining rule-based reasoning with probabilities yields very powerful link discovery methods. However, if we want to apply probability theory consistently, we soon run into difficulties. Applying the same simple multiplication of the probabilistic weights involved in the inference process would give, for the link ⟨ina:per1, owl:differentFrom, ina:per2⟩: (0.6 × 0.7) × 0.6 ≈ 0.25. This is not correct, since the probability (0.6) of the fact db4 is counted twice; the proper result is 0.42.

To overcome this problem we use the setting of Probabilistic Datalog, in which event keys and event expressions are used to detect duplicates while combining the probabilistic weights, thus avoiding wrong calculations. Each ground fact and rule is attached with an event key, and each event key is assigned a probability equal to the probabilistic weight of the corresponding fact or rule. For example, the event key e_db4 is attached to the fact db4, and the probability assigned to e_db4 is 0.6. Each inferred fact is then attached with an event expression, which is a boolean combination of the event keys of the facts and rules that participated in the inference of this fact. For example, the event expression of the fact ⟨ina:per1, owl:sameAs, dbpedia:per1⟩ is (e_ina1 ∧ e_ina2 ∧ e_db4 ∧ e_db2 ∧ e_r1). The event expressions of the other inferred links are shown in Table 3.2. The event expressions calculated for the inferred facts can be logically simplified, so that duplicates inside them are removed by applying boolean operations, as shown in Table 3.2.

⟨ina:per1, owl:sameAs, db:per1⟩
  Event expression: (e_ina1 ∧ e_ina2 ∧ e_db4 ∧ e_db2 ∧ e_r1)
  Simplified event expression: (e_ina1 ∧ e_ina2 ∧ e_db4 ∧ e_db2 ∧ e_r1)

⟨ina:per2, owl:differentFrom, db:per1⟩
  Event expression: (e_ina3 ∧ e_ina4 ∧ e_db4 ∧ e_db3 ∧ e_r2)
  Simplified event expression: (e_ina3 ∧ e_ina4 ∧ e_db4 ∧ e_db3 ∧ e_r2)

⟨ina:per1, owl:differentFrom, ina:per2⟩
  Event expression: (e_ina1 ∧ e_ina2 ∧ e_db4 ∧ e_db2 ∧ e_r1 ∧ e_ina3 ∧ e_ina4 ∧ e_db4 ∧ e_db3 ∧ e_r2 ∧ e_r3)
  Simplified event expression: (e_ina1 ∧ e_ina2 ∧ e_db4 ∧ e_db2 ∧ e_r1 ∧ e_ina3 ∧ e_ina4 ∧ e_db3 ∧ e_r2 ∧ e_r3)

TABLE 3.2: The event expressions of the inferred facts and their simplified forms.

The probabilistic weight of each inferred fact in this Datalog setting is equal to the probabilistic weight of its simplified event expression. For example, calculating the probabilistic weight of the simplified event expression in Table 3.2 attached to the inferred link ⟨ina:per1, owl:differentFrom, ina:per2⟩ gives the correct probabilistic weight 1 × 1 × 0.6 × 1 × 0.7 × 1 × 1 × 1 × 1 × 1 = 0.42 instead of 0.25.
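A tiny sketch of this computation, assuming independent event keys and representing a conjunctive event expression as a set of keys (so duplicates such as e_db4 vanish automatically); the weights below are those of the running example.

```python
from functools import reduce

# Event-key probabilities from the running example.
Pr = {"e_ina1": 1.0, "e_ina2": 1.0, "e_ina3": 1.0, "e_ina4": 1.0,
      "e_db2": 1.0, "e_db3": 1.0, "e_db4": 0.6, "e_r1": 0.7,
      "e_r2": 1.0, "e_r3": 1.0}

def conj_weight(keys):
    """Probability of a conjunction of independent event keys; a set, so a
    shared key such as e_db4 is counted only once."""
    return reduce(lambda p, k: p * Pr[k], keys, 1.0)

same_as   = {"e_ina1", "e_ina2", "e_db4", "e_db2", "e_r1"}
diff_from = {"e_ina3", "e_ina4", "e_db4", "e_db3", "e_r2"}

naive = conj_weight(same_as) * conj_weight(diff_from)  # 0.252: counts e_db4 twice
exact = conj_weight(same_as | diff_from | {"e_r3"})    # 0.42: deduplicated
print(naive, exact)
```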

In contrast to the case where only certain knowledge is used, the use of uncertain facts and rules may lead to uncertain inferred facts, some of which are wrong. The standard way of filtering the results returned to the user is to keep only the inferred facts with a probabilistic weight greater than a given threshold: facts with a high probabilistic weight are likely to be correct. For this, we have to set a threshold that maximizes the number of inferred facts while minimizing the number of wrong facts. In this illustrative example, if we accept that the 3 inferred links are correct in practice, then the value of the threshold should be lower than 0.42. In Section 3.6 we show an automatic technique that may be used to set the value of this threshold.

3.3.2 Formal Background

In what follows we extend the formal background presented in Section 2.4 to deal with uncertain facts and rules. Our formalism is based on Probabilistic Datalog [31] and its underlying semantics.

Probabilistic Deductive RDF Datasets

A probabilistic deductive RDF dataset pr_dds(u) = ⟨F, R, Pr⟩ extends a deductive RDF dataset dds(u) = ⟨F, R⟩, defined in Section 2.4.4, with a mapping Pr that assigns a probabilistic weight to each fact and rule. The intended meanings of these weights are as follows:

• For a fact f: Pr(f) is the probability that the fact is correct.

• For a rule r: Pr(r) is the probability that the conclusion of the rule r, when instantiated by any substitution θ (i.e., θ.Conc_r), is correct if its instantiated condition θ.Cond_r is correct.

The application of the rules enables the inference of the set of facts SAT(F, R, Pr) together with probabilistic weights, which are attached to the inferred facts using the mapping Pr.

Evaluating a query q(ū) over a probabilistic RDF dataset pr_dds(u) = ⟨F, R, Pr⟩ returns the answers of evaluating the same query over the deductive dataset dds(u) = ⟨F, R⟩, accompanied with their probabilistic weights, i.e.:

Answer(q(ū), ⟨F, R, Pr⟩) = ⋃_{θ | θ.q(ū) ⊆ F} ( θ.q(ū), Pr(θ.q(ū)) )

The probabilistic weight of each answer is calculated from the probabilistic weights of the ground facts and rules needed to infer this answer.

The same applies to the case of a boolean query q(ū): the query is evaluated to true with a probabilistic weight w = Pr(q(ū)) if and only if q(ū) ∈ SAT(F,R), and in this case we write:

⟨F, R, Pr⟩ ⊢_w q(ū)

The query is evaluated to false with probabilistic weight w = 1 if q(ū) ∉ SAT(F,R).

Event keys and event expressions

In Probabilistic Datalog, each ground triple and rule is associated with a unique probabilistic event key that may be either true, if the corresponding fact or rule is correct, or false otherwise. We denote by ε the mapping that maps each ground fact or instantiated rule to an event key:

ε : F ∪ R → EK

where EK denotes the set of event keys.

The probability of these event keys is equal to the probability of the corresponding fact or rule.

Then, each derived fact f is assigned an event expression EE(f) consisting of a boolean combination of the event keys of the ground facts and rules involved in its derivation. More formally, it has the form:

EE(f) = (e_1 ∧ ... ∧ e_m) ∨ ... ∨ (e′_1 ∧ ... ∧ e′_n)

s.t. e_1, ..., e_m, e′_1, ..., e′_n ∈ EK

where:

• The event keys of the ground facts and rules that are necessary and sufficient to infer the fact f, i.e., one reasoning branch, form a conjunction. More formally, if (e_1 ∧ ... ∧ e_m) is a conjunction in the event expression of the fact f and ε⁻¹ is the inverse function of ε that maps each event key to its corresponding ground fact or rule, then:

{ε⁻¹(e_1), ..., ε⁻¹(e_m)} ⊢ f

and, for i = 1, ..., m:

{ε⁻¹(e_1), ..., ε⁻¹(e_m)} \ {ε⁻¹(e_i)} ⊬ f

• The event expressions of the different branches of reasoning that infer the fact f form a disjunction.

These event expressions are simplified to remove duplicates and are stored in disjunctive normal form (DNF). This simplification can be done by applying the boolean operations {∨, ∧} to the expressions. The probability of the simplified event expression of an inferred fact f is used as the probabilistic weight of this fact f.

To calculate the probability values of the event expressions assigned to the inferred facts, the inclusion-exclusion formula [18], which computes the probability of a disjunction of events, must be used. In the simple case where the event expression is a disjunction of only two event keys (e_1 ∨ e_2), the probability of this event expression is Pr(e_1 ∨ e_2) = Pr(e_1) + Pr(e_2) − Pr(e_1 ∧ e_2).

In the general case, the probability of the expression (k_1 ∨ ... ∨ k_n), where each k_i is a conjunction of event keys, can be computed as follows:

Pr(k_1 ∨ ... ∨ k_n) = ∑_{i=1..n} (−1)^{i−1} ( ∑_{1 ≤ j_1 < ... < j_i ≤ n} Pr(k_{j_1} ∧ ... ∧ k_{j_i}) )    (3.1)

To calculate the value of this probability we have two options:

• We suppose that the probability of every conjunction formed in the inclusion-exclusion formula is given.

• We assume that all the event keys are independent, so that the probability of any conjunction of event keys can be calculated as follows:

Pr(k_1 ∧ ... ∧ k_n) = Pr(k_1) × ... × Pr(k_n)

It is clear that the assumption in the first option is very hard to fulfill in practice. Thus, we chose to follow the second option.
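The following sketch implements formula (3.1) under this independence assumption. Representing each conjunction as a set of event keys makes Pr(k_{j1} ∧ ... ∧ k_{ji}) the product over the union of the keys involved, so keys shared across conjunctions are counted only once; the names are illustrative.

```python
from itertools import combinations
from math import prod

def pr_conj(keys, pr):
    """Probability of a conjunction of independent event keys."""
    return prod(pr[k] for k in keys)

def pr_dnf(conjunctions, pr):
    """Inclusion-exclusion (formula 3.1) for a DNF event expression.
    conjunctions: list of frozensets of event keys; pr: key -> probability."""
    total = 0.0
    for i in range(1, len(conjunctions) + 1):
        sign = (-1) ** (i - 1)
        for subset in combinations(conjunctions, i):
            keys = frozenset().union(*subset)   # shared keys counted once
            total += sign * pr_conj(keys, pr)
    return total

# Example: Pr(e1 ∨ e2) = Pr(e1) + Pr(e2) - Pr(e1 ∧ e2)
pr = {"e1": 0.7, "e2": 0.6}
print(pr_dnf([frozenset({"e1"}), frozenset({"e2"})], pr))  # 0.7 + 0.6 - 0.42 = 0.88
```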

3.4 ProbFR: The Probabilistic Forward Reasoner

In this section we present our probabilistic forward reasoner, a fact propagation algorithm that deals with probabilistic weights. It accepts a probabilistic deductive RDF dataset as input and computes not only the set of facts that can be inferred locally but also their probabilistic weights. Since some facts may be inferred with low probabilistic weights, only the inferred facts with probabilistic weights greater than or equal to a certain acceptance threshold, given as input, are returned.

Fact propagation algorithms such as RETE [29] are used to apply a set of rules over a set of facts and are known for their efficiency thanks to their memorizing techniques. They avoid redundant evaluation of the same conditions appearing in different rules by memorizing the facts that are confirmed to match these conditions and the corresponding variable substitutions. To calculate the event expressions of the inferred facts, our probabilistic forward reasoner memorizes for each condition, in addition to the substitutions and facts, the corresponding event keys and event expressions.

At the end of the probabilistic forward reasoner, the event expressions calculated for each inferred fact are simplified and their probabilistic weights are computed using the function Calculate_Pr. Then all inferred facts with a probabilistic weight lower than the acceptance threshold are eliminated from the output set of facts.

The ProbFR algorithm can be summarized as follows:

ProbFR
Input: a probabilistic deductive RDF dataset pr_dds(u) = ⟨F, R, Pr⟩
Input: an acceptance threshold λ_acceptance
Output: the set of inferred facts F′ and their probabilistic weights, attached using the mapping Pr′

Generate_Event_Keys(F, R, Pr, EE)
FOR each condition triple c that appears in the conditions of the rules in R:
• add c to the set of conditions C if there is no condition c′ ∈ C that matches c
F′ ← F
FOR each f in F′ not seen before:
• FOR each condition c ∈ C that can be matched with f, i.e., there exists θ such that θ.c = f:
– IF θ is not in the set of already found substitutions Substitutions_c THEN
∗ add θ to Substitutions_c
∗ EE(θ.c) ← EE(f)
– ELSE EE(θ.c) ← EE(θ.c) ∨ EE(f)
– FOR each rule r : Cond_r ⇒ Conc_r that has a condition c′ ∈ Cond_r matching c:
∗ IF there exists for each condition c_i ∈ Cond_r \ {c} a substitution θ_i ∈ Substitutions_{c_i}, and all these substitutions are compatible with each other and with θ:
· let θ′ be the union of these substitutions and ee′ the conjunction of the event expressions of the corresponding facts θ_i.c_i,
· θ_r ← θ ∪ θ′ ; ee_r ← ee′ ∧ EE(f),
· f_new ← θ_r.Conc_r,
· add f_new to F′,
· EE(f_new) ← EE(f_new) ∨ ee_r
FOR each f in F′:
• simplify the event expression EE(f) if possible by applying the boolean operations {∧, ∨} and convert it to disjunctive normal form (DNF)
• Pr′(f) ← Calculate_Pr(EE(f))
• IF Pr′(f) < λ_acceptance THEN remove f from F′
RETURN ⟨F′, Pr′⟩

The procedure Generate_Event_Keys randomly generates an event key for each input fact and rule and gives it a probability value equal to the probability of the corresponding fact or rule.

Calculate_Pr is a procedure that calculates the probabilistic weight of an event expression ee, given the probability of each event key that appears in it:

Calculate_Pr
Input: an event expression ee in DNF; a mapping Pr that contains the probability of each event key in ee
Output: the probability of the input event expression, a float number n ∈ [0, 1]

• Initialize s to the empty set
• For each conjunction conj in the input event expression ee:
– temp ← 1
– For each event key e in conj:
∗ temp ← temp × Pr(e)
– s ← concat(s, temp)
• return Inclusion-Exclusion(s)

The Inclusion-Exclusion procedure accepts as input a set of probabilistic values and applies the inclusion-exclusion formula (3.1), treating the conjunctions themselves as independent events, as described below:

Inclusion-Exclusion
Input: a set of probabilistic weights s
Output: a float number res ∈ [0, 1]

• res ← 0
• For i = 1 to |s|:
– sign ← (−1)^(i−1)
– For each possible subset ss of size i that can be created from the input set s:
∗ temp ← 1
∗ For each k in ss:
· temp ← temp × k
∗ res ← res + sign × temp
• return res

Instead of implementing our ProbFR algorithm from scratch, we decided to reuse the implementation of the RETE algorithm in the JENA framework. We injected into this implementation:

• The generation of the event keys for the ground facts and rules at the beginning of the algorithm.

• The memorization of the event keys and event expressions of the facts that match the condition patterns.

• The calculation of the probabilistic weights for the inferred facts once the set of inferred facts is calculated.

• The elimination of the facts inferred with probability lower than the input acceptance threshold at the end of the algorithm.

3.5 Data complexity and approximation techniques

In this section we analyze the data complexity of the probabilistic forward chaining algorithm. We show that the maximum size of an event expression calculated for the inferred facts is polynomial in the size of the dataset. In practice, however, even with event expressions of polynomial size, storing these expressions and dealing with them leads to many difficulties. Thus, we also propose approximation techniques to avoid such problems.

3.5.1 Maximum size of an event expression

Our goal is to study the maximum size of an event expression in terms of the size of the dataset. We do not take the number of rules into account in this analysis, since the number of rules in practice is very small compared to the size of the dataset.

The event keys of the facts and rules in each branch of reasoning appear in the event expression as a conjunction. The maximum size of each conjunction is clearly the size of the dataset N: this is the case where all the facts in the dataset are needed to infer a particular fact (very rare in practice).

The different branches of reasoning that lead to the inference of the same fact form a disjunction in the event expression. The maximum number of these branches is also the size of the dataset N: this is the case where each fact in the dataset alone suffices to infer a particular fact (very rare in practice).

The maximum size of an event expression is thus in O(N²) in terms of dataset size.

3.5.2 Approximating probabilistic weights

Although the size of an event expression is polynomial in the size of the dataset, dealing with the full event expressions may lead to difficulties in:

• The needed memory to store these expressions.

• The time needed to estimate their probability values: the inclusion-exclusion algorithm, which calculates the probability of a disjunction of events, generates the power set of the set of these events, and the worst-case time complexity of generating the power set is O(2^N).

For these reasons we propose the following techniques to control the size of the event expressions (a sketch follows the list):

• For each event expression, remove all conjunctions that have a probability less than a given threshold λ ∈ [0, 1]; λ can be set as a parameter.

• For each event expression, order the conjunctions by their probabilities and keep only the first λ_max ∈ ℕ of them; λ_max can be set as a parameter.

3.6 Setting up the acceptance threshold

Tuning the acceptance threshold, used as a parameter of the probabilistic forward reasoner presented in Section 3.4, enables the user to control the number of inferred facts and their correctness. While setting this parameter to a high value (∼1) leads the algorithm to return only the certain inferred facts and to ignore all others, choosing a low value (∼0) increases the size of the resulting set of facts but also increases the number of wrongly inferred facts that may be present in the answer.

The goal is to maximize the number of correct facts returned. A straightforward technique for choosing this threshold is to vary the parameter over many values in [0, 1], run the algorithm for each value, and ask the user to verify a sample of the returned results manually. The best candidate value for this parameter is then the one that maximizes the size of the results while minimizing the number of facts confirmed by the user to be wrong. However, manually verifying many samples is a hard task for the user. In what follows, we propose an automatic way, based on contradictory facts, to set the acceptance threshold without manual assistance.

We consider two facts to be contradictory if they link the same two entities with both owl:sameAs and owl:differentFrom links. For example, the following two facts are contradictory:

⟨per1, owl:sameAs, per2⟩

⟨per1, owl:differentFrom, per2⟩

One direct application of contradictions is to discover errors in the dataset: one of the two contradictory facts is necessarily wrong, and it can be identified thanks to the probabilistic weights attached to the two facts.

To find the best acceptance threshold automatically, we assume that minimizing the number of contradictions in the results minimizes the number of incorrect facts. Finding the acceptance threshold thus turns into the problem of minimizing the number of contradictions in the resulting set of facts while maximizing its size (a sketch of such a selection procedure follows). Our experiments in Section 3.7 validate this assumption experimentally.
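Here is one illustrative way to realize this selection, under the stated assumption; the scoring rule (size of the kept set minus a penalty per contradiction) is a hypothetical trade-off, not the thesis's exact formula.

```python
def contradictions(facts):
    """Pairs linked by both owl:sameAs and owl:differentFrom.
    facts: dict mapping (s, p, o) -> probabilistic weight."""
    same = {(s, o) for (s, p, o) in facts if p == "owl:sameAs"}
    return [(s, o) for (s, p, o) in facts
            if p == "owl:differentFrom" and (s, o) in same]

def choose_threshold(facts, step=0.02):
    """Pick the threshold that keeps many facts but few contradictions."""
    best, best_score = 0.0, float("-inf")
    t = 0.0
    while t <= 1.0:
        kept = {f: w for f, w in facts.items() if w >= t}
        score = len(kept) - 2 * len(contradictions(kept))  # illustrative trade-off
        if score > best_score:
            best, best_score = t, score
        t = round(t + step, 10)
    return best
```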

3.7 Experiments

Our experiments were conducted over real data from both the INA and DBpedia datasets and have the following goals: (1) to show that an important increase in the number of inferred same-as and different-from links is obtained when uncertain facts and rules are taken into account; (2) to show that facts inferred with weights above a particular threshold, set up with manual assistance from the user, are correct; and (3) to show that this acceptance threshold can be set automatically, without manual assistance from the user, using the weighted contradictory facts.

3.7.1 Experimental Set-Up

To meet our experimental goals we need: (a) a deductive RDF dataset that contains a set of certain facts and rules ⟨F_c, R_c⟩; and (b) a probabilistic deductive RDF dataset ⟨F_c ∪ F_uc, R_c ∪ R_uc, Pr⟩ which contains the previous set of certain facts and rules enriched with a set of uncertain facts F_uc and uncertain rules R_uc together with their probabilistic weights.

Our set of certain facts F_c contains 30.7 million facts: 30 million RDF facts from the INA dataset used in our experiments of Chapter 2, and 0.7 million certain facts obtained by downloading from DBpedia.fr and DBpedia.org all the facts about entities having a name similar to the name of one of the homonyms in the INA dataset up to a threshold of 0.9. The total size of this dataset in Turtle format is 2 GB. For the set of certain rules R_c we use the 35 certain rules used in the experiments of Chapter 2, listed in Annex A.

The set of uncertain rules R_uc contains 16 uncertain rules that enable sameAs facts to be concluded. They translate pseudo-keys (e.g., name and birthYear, or name and deathYear), mappings, and other domain knowledge provided by INA experts (e.g., a rule concluding that two person homonyms are the same if they participated in the same video entity). We asked the INA experts to rank the uncertain rules on a scale from 1, for rules that are least likely to be correct, to 3, for rules that are most likely to be correct. Then we attached the probabilistic weight 0.3 to the rules assigned ranking 1 by the experts, 0.6 to rules with ranking 2, and 0.9 to rules with ranking 3. The full set of rules and their probabilistic weights are listed in Annex B.

All the facts available in the INA dataset are considered certain by the INA experts. We therefore generated a set of uncertain facts F_uc that contains 30,400 RDF facts about DBpedia person entities that have names similar to one of the homonyms in the INA dataset up to a threshold of 0.9. To generate these facts, we created for each INA homonym name a query that returns the DBpedia entities having a name similar to it up to a threshold of 0.9, and applied it over DBpedia. Then we generated the facts that attach the INA name to each returned entity using the property ina:altLabel. The normalized edit distance between the two names is used as the probabilistic weight of the generated fact.

During the experiments we set the parameters λ and λ_max to 0.1 and 10, respectively.

For running the experiments we used a machine with an Intel i7 Quad-core processor, 32 GB of memory and a 500 GB SATA 2 hard disk at 7200 rpm.

3.7.2 Experimental Results

For evaluating our first experimental goal, (a) we first ran our forward reasoner over the deductive RDF dataset ⟨F_c, R_c⟩; (b) then we ran our probabilistic forward reasoner over the probabilistic deductive RDF dataset ⟨F_c ∪ F_uc, R_c ∪ R_uc, Pr⟩. We compared the number of owl:sameAs and owl:differentFrom facts obtained on the INA homonyms in each case. We set the acceptance threshold λ_acceptance of the probabilistic forward reasoner to 0.5.

Applying only the set of certain rules over the set of certain facts allowed 23,689 facts to be inferred (8,238 owl:sameAs facts and 15,451 owl:differentFrom facts), compared to the 147,712 facts (130,741 owl:sameAs facts and 16,971 owl:differentFrom facts) inferred with probabilistic weights greater than 0.5 when uncertain facts and rules are taken into account. This clearly shows the important increase (about 623%) in the number of inferred facts when uncertain facts and rules are taken into account. We notice that most of the newly inferred facts are owl:sameAs facts, since all the uncertain rules that we used conclude owl:sameAs facts. The 147,712 facts and their corresponding weights were inferred in 19 minutes and 41 seconds.

The probabilistic weights of the inferred facts are distributed over the range [0.5, 1.0], as depicted in Figure 3.2. It is worth noting that the probabilistic weight attached to the certain facts (i.e., the 23,689 facts inferred when only certain facts and rules are used) is equal to 1 when the probabilistic forward reasoner is launched over the whole set of certain and uncertain facts and rules.

FIGURE 3.2: The distribution of the number of inferred facts over the range of probabilistic weights

To find the best acceptance threshold manually, we randomly took four samples s≥0.5, s≥0.6, s≥0.7 and s≥0.8. Each sample contains 200 facts randomly chosen from the inferred owl:sameAs and owl:differentFrom facts with probabilistic weights greater than 0.5, 0.6, 0.7 and 0.8, respectively, and we asked INA experts to verify these samples manually. Table 3.3 shows the number of facts confirmed to be correct in each sample by the INA experts.

The results show that the higher the probabilistic weight attached to a fact, the lower its probability of being wrong. This confirms the practical usefulness of the probabilistic weights returned by our algorithms. The results also show that setting the acceptance threshold parameter of the probabilistic forward reasoning algorithm to 0.7 in this experimental setting leads to only correctly inferred owl:sameAs and owl:differentFrom facts. It is worth noting that, among the 147,712 newly inferred sameAs and differentFrom facts, 81% have probabilistic weights greater than 0.7, and thus 81% of them are correct.

Sample    Number of correct facts
s≥0.5     196
s≥0.6     199
s≥0.7     200
s≥0.8     200

TABLE 3.3: The number of inferred facts confirmed to be correct in each sample

In our third experiment we want to validate experimentally the automatic technique proposed in Section 3.6 for choosing the acceptance threshold, by checking whether applying it leads to a threshold near 0.7, the value found manually in our second experiment.

To do so:

1. we created the set of facts confirmed to be wrong by contradiction, i.e., each fact f in this set contradicts another fact f′ that has a higher probabilistic weight, Pr(f′) > Pr(f). Clearly, eliminating this set of facts from the resulting set of facts solves the contradiction problem.

2. we studied the distribution of the probabilistic weights attached to these wrong facts.

Figure 3.3 shows that all these wrong facts have probabilistic weights lower than 0.84, and 99.5% of them have probabilistic weights lower than 0.68. Knowing that 83% of the inferred facts have probabilistic weights greater than 0.68, compared to only 41% with probabilistic weights higher than 0.84, the best candidate value for the acceptance threshold is 0.68. This conforms with our second experiment, in which we found manually that the best value for the acceptance threshold should be near 0.7, and thus experimentally validates this automatic technique for finding the acceptance threshold.

FIGURE 3.3: The distribution of the facts confirmed to be wrong by contradiction over the range of probabilistic weights (y-axis: number of facts; x-axis: probabilistic weight)

3.8 Extending the import by query approach to uncertainty

In this section we revisit the import-by-query problem, studied in Chapter 2, in this probabilistic setting. Given a probabilistic deductive RDF dataset pr_dds(u) = ⟨F, R, Pr⟩ and a boolean query q(ū) whose local evaluation gives an empty answer set, our problem statement in this probabilistic setting is to construct a set of external queries q_1(ū_1), ..., q_k(ū_k) for which we can guarantee that the subsets of external facts resulting from their evaluation over the (possibly huge) external datasets are sufficient to answer the initial query with a probabilistic weight greater than a certain threshold t:

⟨F ∪_{i∈[1..k]} Answer(q_i(ū_i), ds(ū_i)), R, Pr⟩ ⊢_w q(ū) and w ≥ t

iff   ⟨F ∪_{i∈[1..k]} ds(ū_i), R, Pr⟩ ⊢_{w′} q(ū) and w′ ≥ t

3.8.1 The Probabilistic Iterative Import-by-query

The Probabilistic Import-by-Query algorithm is an extension of the Import-by-Query algorithm explained in Section 2.5 to deal with probabilistic weights. Like the original algorithm, it accepts as input (1) a boolean same-as query q, (2) a probabilistic deductive RDF dataset ⟨F, R, Pr⟩ and (3) a set ū of query entry points to external datasets, in addition to (4) an acceptance threshold t. It alternates steps of sub-query rewriting based on backward chaining and of external query evaluation, and returns as output not only a (true or false) answer, as the original version does, but also a probabilistic weight greater than or equal to t.

Each query rewriting step is realized using PQESQ (described in detail in Section 3.8.2), an adaptation of the Query-External-Subquery algorithm of Section 2.5.1 to deal with probabilistic weights. PQESQ either succeeds in proving the query locally, i.e., over ⟨F, R, Pr⟩, with a probabilistic weight greater than or equal to the given acceptance threshold t, in which case the process is stopped and the result returned by the Iterative Import-by-Query algorithm is (true, w); or it produces, like the original QESQ, a set of external queries, but attached with probabilistic weights, i.e., {(q_1(ū_1), w_1), ..., (q_k(ū_k), w_k)}. The weight of each external query q_i(ū_i) is the maximum probabilistic weight with which the input query can be inferred if there is an answer to q_i in the external datasets. If the set is empty, the process stops and the answer (false, 1) is returned.

In each evaluation step the algorithm first verifies whether there is still a chance to answer the query q with a true answer attached with a probabilistic weight greater than t. To do this verification, it calculates the maximum probabilistic weight w_max with which q could be answered with true: this is the weight obtained when there is a certain answer for all the external queries. This maximum probabilistic weight can be calculated by applying the inclusion-exclusion algorithm presented in Section 3.4 to the set of weights of the external queries. If w_max < t the process is stopped and the algorithm returns (false, 1). Otherwise the algorithm chooses the external query q_i(ū_i) with the highest weight among the set of external queries and continues exactly as the original version: it submits q_i(ū_i) to the Linked Data cloud through the specified entry points, memorizes the results in the answer table and marks q_i as an already processed query. In the case of positive results, a new iteration of Iterative Import-by-query is started on the same input, except that the set of facts F is enriched with the facts obtained as results of the evaluation of the external query q_i(ū_i). If the result is negative, q_i is removed from the set of external queries and the evaluation step is repeated over the new set of external queries. If the set of external queries becomes empty, the answer (false, 1) is returned by the algorithm. The probability values attached to the imported facts are equal to the trust values the user assigns to the external datasets from which these facts are imported. These trust values are either given in the input for each specified SPARQL endpoint or assumed to be equal to one.
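The following sketch outlines this control loop in Python; prob_qesq (the rewriting step of Section 3.8.2), inclusion_exclusion_weight (the algorithm of Section 3.4) and evaluate_external are assumed helper names, not part of the thesis implementation:

def probabilistic_import_by_query(q, F, R, Pr, entry_points, t):
    # Sketch under assumptions: prob_qesq, inclusion_exclusion_weight and
    # evaluate_external are hypothetical helpers standing for the components
    # described in the text; F and query results are modelled as sets of facts.
    result = prob_qesq(q, F, R, Pr, entry_points, t)
    if result.answered_locally:               # PQESQ proved q with w >= t
        return (True, result.weight)
    external = list(result.external_queries)  # [(q_i, w_i), ...]
    while external:
        # Best case: every remaining external query has a certain answer
        w_max = inclusion_exclusion_weight([w for _, w in external])
        if w_max < t:
            return (False, 1.0)               # no chance to reach the threshold
        q_i, w_i = max(external, key=lambda qw: qw[1])
        facts = evaluate_external(q_i, entry_points)
        if facts:                             # enrich F and start a new iteration
            return probabilistic_import_by_query(q, F | facts, R, Pr,
                                                 entry_points, t)
        external.remove((q_i, w_i))           # negative result: drop q_i
    return (False, 1.0)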

3.8.2 The PQESQ algorithm

The Probabilistic Query-External-Subquery algorithm, PQESQ, extends the recursive QESQ algorithm presented in Section 2.5.1 to handle probabilistic weights. Like QESQ, PQESQ starts with an input boolean atomic query q: ⟨s, p, o⟩ and treats it as the goal to solve using the input probabilistic deductive dataset ⟨F, R, Pr⟩. The answer of PQESQ is either (true, w), if the query can be evaluated locally with a probabilistic weight w greater than a given threshold t, or (false, 1), or a non-empty set of external queries compatible with the vocabulary of the given external datasets, each attached with a probabilistic weight. PQESQ also returns the event expressions of the calculated answers or external queries; these can be omitted in the primary call of the algorithm but are useful when PQESQ is triggered recursively.

At the beginning of the algorithm, each ground fact and rule is attached with a randomly generated event key accessible through the function EE. Then, PQESQ follows the same reasoning steps as QESQ to calculate the answer sets or the external queries. However, during these steps it builds the event expression composed of the event keys of the ground facts and rules that participated in deriving these answer sets or external queries. When a true answer or an external query is returned by the algorithm, the probabilistic weight of the corresponding event expression is calculated using the algorithm presented in Section 3.4. A false answer means that it is impossible to evaluate the query to true with a probabilistic weight greater than t and that no external query can be calculated for it; false answers are therefore always attached with the probabilistic weight 1.

In contrast to QESQ, PQESQ does not stop as soon as an answer is found for the input goal. Instead, it checks whether the probabilistic weight of the event expression calculated while searching for this answer is greater than the input threshold t. If so, the algorithm returns true attached with this probabilistic weight; otherwise it continues exploring the other possible reasoning branches to infer q, thereby increasing its probabilistic weight.

PQESQ
Input: a probabilistic deductive RDF dataset pr_dds(u) = ⟨F, R, Pr⟩
Input: an acceptance threshold t
Output: (true, w, ee) or (false, 1) if the query can be answered locally; otherwise a set of external queries together with their probabilistic weights and their event expressions {(q_1(ū_1), w_1, ee_1), ..., (q_k(ū_k), w_k, ee_k)}.

ee ← ∅

1. IF there exists g_i = ⟨s_i^v, p_i, o_i^v⟩ in the subgoals such that the predicate p_i is an EDB predicate, or such that g_i has been handled before (i.e., g_i can be matched to a triple in goal_{p_i})
THEN evaluate the atomic query g_i = ⟨s_i^v, p_i, o_i^v⟩ over F, or over answer_{p_i}:

• IF there is no answer and p_i is not an ODB predicate THEN return (false, 1)

• ELSE for each answer, let θ be the corresponding substitution (it can be empty if the query g_i is boolean), i.e., such that θ.g_i ∈ F, or θ.g_i ∈ answer_{p_i}:
  – Let NewSG be the list of new subgoals obtained from SG by removing g_i and by replacing the other subgoals g_j by θ.g_j.
  – IF NewSG is empty THEN
    ∗ ee ← ee ∨ EE(θ.g_i), w ← calculate_pweight(ee)
    ∗ IF w > t THEN return (true, w)
  – ELSE trigger a new recursive call of PQESQ on NewSG:
    ∗ IF it returns (true, w′, ee′) THEN success ← true; w ← calculate_pweight(ee ∨ (ee′ ∧ EE(θ.g_i)))
    ∗ ELSE (if it did not return false) add each resulting external query q_ext, with event expression ee′, to SetOfQext with probability w ← calculate_pweight(ee ∨ (ee′ ∧ EE(θ.g_i)))

2. IF there exist subgoals not seen before with ODB predicates THEN:
Let q_ext be their conjunction.
IF q_ext is compatible with the vocabulary of ū (the given entry points to external datasets) THEN:

• Let NewSG be the list of new subgoals obtained from SG by removing all subgoals in q_ext

• IF NewSG is empty THEN w ← calculate_pweight(ee); return (q_ext, w)
  ELSE
  – for each subgoal ⟨s^v, p, o^v⟩ in q_ext, add it to the table goal_p
  – trigger a new recursive call of PQESQ on NewSG and ee:
    ∗ IF it does not return true or false THEN consider (Q_ext, w) to be the result of PQESQ(NewSG, ee) and return (Q_ext ∪ q_ext, w)

3. ELSE (all the subgoals have IDB predicates and they have not been seen before)

• let g = ⟨s^v, p, o^v⟩ be the first subgoal in SG and let R_g be the set of rules whose conclusion can be matched with g

• IF R_g = ∅ THEN return (false, 1)

• ELSE:
  – success ← false; w ← 0; SetOfQext ← ∅; ee ← ∅
  – WHILE R_g ≠ ∅ and ¬success
    ∗ choose r in R_g and let θ be the substitution s.t. θ.Conc_r = g,
    ∗ let NewSG be the list of new subgoals obtained from SG by replacing g with θ.TP_1(v̄_1), ..., θ.TP_k(v̄_k), where TP_1(v̄_1), ..., TP_k(v̄_k) are the conditions of r, and by replacing the other subgoals g_j with θ.g_j,
    ∗ add g to the table goal_p,
    ∗ remove r from R_g,
    ∗ trigger a new recursive call of PQESQ on NewSG,
    ∗ IF it returns (true, w′, ee′) THEN success ← true; w ← calculate_pweight(ee ∨ (ee′ ∧ EE(r)))
    ∗ ELSEIF it does not return false THEN add each resulting external query q_ext, with event expression ee′, to SetOfQext with probability w ← calculate_pweight(ee ∨ (ee′ ∧ EE(r)))
  – IF success THEN return (true, w, ee) ELSEIF SetOfQext = ∅ THEN return (false, 1) ELSE return SetOfQext
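The pseudocode above relies on calculate_pweight, the inclusion-exclusion computation of Section 3.4 over event expressions. A minimal runnable sketch (our naming), for event expressions in disjunctive normal form over independent basic events:

from itertools import combinations

def prob_conj(keys, p):
    # Probability of a conjunction of independent basic events.
    result = 1.0
    for k in set(keys):
        result *= p[k]
    return result

def calculate_pweight(dnf, p):
    # Inclusion-exclusion over a DNF event expression.
    # dnf: list of conjunctions, each a set of event keys;
    # p: dict mapping each event key to its probability.
    total = 0.0
    for r in range(1, len(dnf) + 1):
        sign = 1.0 if r % 2 == 1 else -1.0
        for subset in combinations(dnf, r):
            union = set().union(*subset)  # conjunction of all their keys
            total += sign * prob_conj(union, p)
    return total

# Example: ee = (e1 ∧ e2) ∨ e3 with independent basic events
p = {"e1": 0.9, "e2": 0.8, "e3": 0.5}
print(calculate_pweight([{"e1", "e2"}, {"e3"}], p))  # 0.72 + 0.5 - 0.36 = 0.86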

3.9 Related Work and Conclusion

In this chapter we proposed an adaptation of our rule-based approach for the problem of data linkage in Linked Data that deals with uncertain facts and rules. This adaptation is based on Probabilistic Datalog, which has a solid theoretical background presented in [31, 52]. Although the problem of making decisions over uncertain knowledge has been studied for many years in AI and many other domains (e.g. databases), we found the way the Probabilistic Datalog approach deals with uncertainty very suitable and intuitive for extending our rule-based approach for data linkage: (a) it is built over Datalog settings; (b) unlike the approaches captured by the provenance semirings [35], it deals not only with uncertain facts but also with uncertain rules; (c) it attaches probabilistic point values to the inferred facts, and not probabilistic ranges as in other works built over Datalog settings like [53]; (d) our experiments showed that this approach can scale to large datasets. A detailed comparison between Probabilistic Datalog and other similar works can be found in [31].

Many instance matching tools have been introduced to produce uncertain links between pairs of individuals in Linked Data. Some tools do value matching: they match values of certain properties that describe the two individuals to be compared (e.g. calculating the similarity between strings) and then aggregate the results using weighted aggregation functions (e.g. weighted average) [54]. Others also exploit the schema: PARIS [64] proposes a fully automated probabilistic approach that needs no tuning parameters to align instances, classes and relations. It exploits the ontologies delivered with the data and generates instance alignments that cross-fertilize with alignments at the schema level. LN2R [63] combines logical inference at the schema level (by exploiting functionality, inverse functionality, etc.) and a numerical method based on equations that model the influences between similarities. Most of these approaches fail to discover sophisticated matchings (e.g. those requiring to deal with structural heterogeneity: one entity is the presenter of a title "Dimanche Martin" while another entity is the presenter of a video that belongs to a program "Dimanche Martin").

Our proposed approach for data linkage is general enough to capture facts and rules attached with weights that may come from many different sources. Thus, the sameAs facts produced automatically by existing instance matching tools, together with their weights, can be given as input to our approach. With suitable rules, our approach can then use these sameAs facts to discover further facts that require knowledge not handled by the matching tools, e.g. domain expert rules dealing with structural heterogeneity.

In our experiments, we showed that this approach works well even for large-scale datasets and produces meaningful probabilistic weights. We also showed how to interpret the calculated probabilistic weights to decide which of the inferred facts are correct.

Part II

Trust in Semantic Web

Chapter 4

Trust in Networks of Ontologies and Alignments

4.1 Introduction

Trust is meant to play a primary role in the realization of the semantic web [17]. As a consequence, trust has gained a lot of attention from researchers of this community [9]. Trust aims at assessing the quality of information contained in the semantic web, which can be considered unsatisfactory, for example, because it is untrue, incomplete or outdated. Just as "one man's meat is another man's poison", trust is commonly agreed not to be objective, but rather subjective to the point of view of every user/agent/peer.

Discrepancy between viewpoints may indeed be a reason for information to be considered unsatisfactory. Moreover, unsatisfactory information can be caused intentionally as the result of some sort of malicious behavior from a user/agent/peer when answering a query. There exists another reason, mostly overlooked by current mechanisms of trust: unsatisfactory information can be due to users/agents/peers' incapacity to understand each other. Certainly, with the increasing proliferation of ontologies in the semantic web, the target of a query may provide an unsatisfactory answer simply because the query is specified in an ontology different from the target's ontology.

Today's most widely followed approach to tackle ontology heterogeneity is ontology matching [27]. Ontology matchers can find alignments, i.e. sets of correspondences between entities (classes, properties or instances) of different ontologies. These correspondences can be used later for query translation. However, computed alignments may be limited (unsound and/or incomplete) and yet generate flawed query translations. Our proposal is to profit from alignments in the beginning and then proceed à la trust. Before going into details, we explain our setting and the goal of our work, and we summarize our contribution.

4.1.1 Semantic P2P Networks

In line with Adjiman et al. [6], we will consider semantic peer-to-peer (P2P) networks, i.e. fully decentralized overlay networks of people or machines (peers) sharing and searching for resources (documents, videos, photos, data) based on their semantic annotations using ontologies. As an ontology language we have restricted ourselves to sets of classes equipped with less-general-than and disjointness relations. In a semantic P2P network, every peer is free to organize its local resources as instances of classes of its own ontology serving as a query interface for other peers. Then alignments between ontologies make it possible to reformulate queries from one local peer vocabulary to another.

The answer to a query is a set of resources (e.g. documents) which are instances of classes corresponding, typically via subsumption or equality, to the initial query posed to particular peers. If the alignment correspondence used to translate a query is incorrect, though, the sender peer might receive unsatisfactory instances, i.e. instances it does not consider to be instances of its local class. Think of, for example, a network for exchanging bookmarks, in which a peer Scott organizes his bookmarks according to two main categories: cinema and music. Suppose that Scott is acquainted with Zelda, who is interested in music and cities. Scott's and Zelda's ontologies include the class Paris, but with different meanings: Scott's Paris refers to a film, whereas Zelda's Paris refers to the capital of France. Also, imagine that the two ontologies have been aligned by a matcher which has not been able to overcome this homonymy problem, and has returned a correspondence stating that Scott's Paris and Zelda's Paris are equivalent classes. Thus, if Scott wants to acquire bookmarks of the film "Paris" and queries Zelda, he will be presented with bookmarks related to the capital of France instead.

4.1.2 Trust in Semantic P2P Networks

Our goal is to design a mechanism of trust for guiding the query-answering process in semantic P2P networks. We define the trust of a peer P_1 towards another peer P_2 with respect to a pair ⟨C, D⟩, where C and D are classes of P_1's and P_2's ontologies, respectively, as the probability that an arbitrary instance of D in P_2's ontology is considered satisfactory, i.e. an instance of C, by P_1. Here the class C is meant to be queried by P_1, and P_2's answer, after C is reformulated as D, is the set of instances of D in P_2's ontology. Notice that trust is subject to classes and translations. Going back to our example of bookmarks, imagine that Scott's and Zelda's ontologies include the class Gainsbourg, with which Scott and Zelda both organize bookmarks about the French singer Serge Gainsbourg. Although Scott should not trust Zelda with respect to Paris, he should trust her concerning Gainsbourg.

4.1.3 Summary of our Contribution

This chapter extends preliminary theoretical and experimental results published in [13] and [7], respectively.

The distinguishing point of our approach is to exploit knowledge provided by alignments (possibly unsound and/or incomplete) together with user feedback in order to compute and refine trust over time. More precisely,

• The trust that a peer P_1 has towards another peer P_2 with respect to the query translation ⟨C, D⟩ is modeled as a random variable representing P_1's belief that instances of D returned by P_2 are satisfactory instances for its local class C.

• Alignments between peers’ ontologies are used to construct prior distributions of these beliefs, from which initial values of trust are calculated.

• Bayesian inference is performed to build posterior distributions of peers’ beliefs from their prior distributions and user feedback on the number of satisfactory and unsatisfactory instances included in a sample of answers.

It is worth noting that in our approach we profit from ontologies for limiting the recourse to user feedback by substituting manual assessment with logical inference when possible. This is the case when some instances in the sample to be processed by a given peer for a given class C already belong to its local populated ontology:

• If one of such instances belongs to C or to a subclass of C, it can be logically inferred that it is a satisfactory instance for C.

• If one of such instances belongs to a class C′ that is declared to be disjoint from C or from one of its superclasses, it can be logically inferred that it is not a satisfactory instance for C.

A by-product contribution of trust is the computation of probabilities for the new instances of a class C to be satisfactory, depending on the trusted peers from whom they were obtained. Such instances are added to the populated ontology along with their associated probability values.

These values are computed on the basis of provenance information and with the help of a function which aggregates the trust values towards the peers that provided these instances. Peers' populated ontologies thus evolve over time, so that each class C in peer P_i's ontology includes instances P_i is 100% sure of, and instances it believes belong to C with some probability degree p. In this way, the different instances of a class can be ranked with respect to their associated degrees.

For the evaluation of our mechanism of trust, we have designed a scenario representative of the novel topic of social networks and the semantic web [48]. Based on this scenario, we have built a semantic P2P bookmarking system in which we can vary different quantitative and qualitative parameters so as to measure their influence on (i) the convergence of trust, and (ii) the gain in the quality of peer answers — measured with precision and recall — when query-answering is guided by trust.

The remainder of this chapter is structured as follows. Section 4.2 includes the necessary preliminaries to understand the theoretical model of trust introduced in Section 4.3. An evaluation of this model is presented in Section 4.4. Section 4.5 concludes the chapter and summarizes the related work.

4.2 Preliminaries

The components of a semantic P2P network are presented below: ontologies and populated ontologies, alignments and acquaintance graphs. We also describe the query answering process that we consider in this setting.

4.2.1 Ontologies and Populated Ontologies

We draw a distinction between the ontological structure and the instances used to populate it. We deal with lightweight ontologies: classes linked by means of less-general-than and disjointness relations.

Definition 4.1. An ontology is a tuple O = ⟨C, ≤, ⊥⟩ where C is a non-empty finite set of class symbols; ≤ is a reflexive, antisymmetric and transitive relation (a partial order) on C; ⊥ is an irreflexive and symmetric relation on C; and for all c, c′, d, d′ ∈ C,

if c ⊥ d, c′ ≤ c and d′ ≤ d then c′ ⊥ d′

A populated ontology is the result of adding instances to an ontology in accordance with the intended meaning of the two ontological relations.

Definition 4.2. A populated ontology O is a tuple ⟨O, I, ext⟩, where O is an ontology, I is a set of instances, and ext is a function that maps each class c of O to a subset ext(c) of I called the extension of c, in such a way that the family of class extensions covers I, and for all classes c, d the following hold:

1. if c ≤ d then ext(c) ⊆ ext(d)

2. if c ⊥ d then ext(c) ∩ ext(d) = ∅
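As an illustration, a minimal sketch in Python (our names, not part of the thesis) of the two structures, with the two consistency conditions of Definition 4.2 as a check:

from dataclasses import dataclass, field

@dataclass
class Ontology:
    classes: set[str]
    leq: set[tuple[str, str]]       # pairs (c, d) meaning c ≤ d
    disjoint: set[tuple[str, str]]  # symmetric pairs (c, d) meaning c ⊥ d

@dataclass
class PopulatedOntology:
    onto: Ontology
    ext: dict[str, set[str]] = field(default_factory=dict)

    def check(self) -> bool:
        # Conditions 1 and 2 of Definition 4.2 (coverage of I not checked here).
        for (c, d) in self.onto.leq:
            if not self.ext.get(c, set()) <= self.ext.get(d, set()):
                return False
        for (c, d) in self.onto.disjoint:
            if self.ext.get(c, set()) & self.ext.get(d, set()):
                return False
        return True

o = Ontology({"Paris", "French_films"}, {("Paris", "French_films")}, set())
po = PopulatedOntology(o, {"Paris": {"url1"}, "French_films": {"url1", "url2"}})
print(po.check())  # True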

4.2.2 Alignments

In an open and dynamic environment such as a P2P network, the assumption that peers share the same ontology is not realistic. Nonetheless, if peers fall back on different ontologies, there must be a way to connect ontologies and translate queries so that their addressees are able to process them. Typically this is done by means of alignments, collections of correspondences between semantically related ontological entities, and finding alignments is the aim of ontology matching [27].

We consider correspondences between two classes c and c′ of two distinct ontologies O and O′ as tuples ⟨c, c′, r⟩ with r ∈ {=, ≤, <, ≥, >, G, ⊥}, where c = c′ (more rigorously, ⟨c, c′, =⟩) is read "c is equal to c′", c ≤ c′ (respectively c < c′) is read "c is (strictly) less general than c′", c ≥ c′ (respectively c > c′) is read "c is (strictly) more general than c′", and c ⊥ c′ is read "c is disjoint from c′". The non-standard symbol G expresses overlapping between classes whose extensions share instances but neither is equal to or contained in the other.

Definition 4.3. Let O and O′ be two ontologies, and let c and c′ be two classes of O and O′, respectively. A correspondence between c and c′ is a tuple ⟨c, c′, r⟩ with r ∈ {=, ≤, <, ≥, >, G, ⊥}. An alignment A between O and O′ is a non-empty set of correspondences between classes of O and O′.

4.2.3 Peers and Acquaintance Graphs

We consider a finite set P = {P_i}_{i=1}^N of peers. In this work, P_i will be identified by i. We assume that each peer P_i is associated with one populated ontology O_i = ⟨O_i, I_i, ext_i⟩ (where 1 ≤ i ≤ N).

An acquaintance graph stands for peers' acquaintances (or neighbors) in the network. As usual, a link between two peers reflects the fact that they know the existence of each other. In addition, we assume that there exists one alignment between their respective ontologies.

Definition 4.4. An acquaintance graph is a labeled directed graph ⟨P, ACQ⟩ where P = {P_i}_{i=1}^N is the set of vertices and any edge in ACQ is of the form ⟨i, j⟩ with i ≠ j, and it is labeled with an alignment A_ij between ontologies O_i and O_j. Peer P_j is said to be an acquaintance of peer P_i if ⟨i, j⟩ ∈ ACQ. The set of acquaintances of P_i is denoted by ACQ(P_i).

4.2.4 Queries and Query Translations

Peers pose queries in order to obtain information about other peers' populated ontologies. We deal with a simple query language, as peers can only query class instances: if peer P_j is an acquaintance of peer P_i, it may be asked

Q = c(X)?    (4.1)

by P_i with c ∈ O_i. Now, since we do not assume that all peers share the same ontology, queries may require to be translated for their recipients to be able to process them. Query translations are determined by correspondences of the alignments of the network. Specifically, if peer P_i wants to send Q to P_j, it will first choose one correspondence ⟨c, d, r⟩ ∈ A_ij (typically r is equal to =, ≥ or >) and then send to P_j the translation

Q′ = d(X)?    (4.2)

The answer to (4.1) through its translation (4.2) is the set of instances of class d in P_j's populated ontology. Unlike queries, we assume that no translation of instances is ever required. Nonetheless, since alignments may be unsound and incomplete, this answer may contain unsatisfactory instances, i.e. instances which are not considered instances of c by P_i.

A peer cannot foresee which of its acquainted peers will answer its query with satisfactory instances. Furthermore, even if an answer is received, it might be too big to be manually processed, so the peer is left to accept it with the risk of populating its ontology with unsatisfactory instances. This uncertainty can be estimated with the help of a trust mechanism.

4.3 The Trust Mechanism

In this section we detail the trust mechanism that we have designed to assist peers in selecting the peers in the network better suited for their queries, that is, the peers that will provide answers with more satisfactory instances. For doing so, we adopt an approach based on Bayesian probabilities. The whole trust-based query answering process that we have designed is not simple, and we have structured its presentation as follows. In Section 4.3.1, we present our probabilistic model of trust, defined as beliefs on the quality of answers depending on the sources (classes in peers' populated ontologies) they come from. These beliefs are first modeled by prior distributions (built from the alignments) and then refined into posterior distributions when more data become available, as explained in Section 4.3.2. As a result of trust refinement over time, peers' populated ontologies are updated. Section 4.3.3 sets the principles we have chosen to filter the new instances that can be added by exploiting their provenances. Finally, we explain in Section 4.3.4 our use of trust for guiding query answering in a P2P setting.

4.3.1 Definition and Estimation of Trust

We look at trust as a way to estimate each peer's belief that answers returned by other peers include satisfactory instances for classes of its own ontology. The notion of a satisfactory instance can be faithfully captured by means of reference populated ontologies, which are not known in practice but are useful for the formal definition of our problem.

For each peer P_i, we denote by O*_i = ⟨O*_i, I*_i, ext*_i⟩ its reference populated ontology, with O*_i = O_i. This corresponds to the ideal case in which peer P_i had access to all instances in the network and classified them according to its ontology O_i. This allows us to express that P_i considers an arbitrary instance a as an instance of c ∈ C_i by a ∈ ext*_i(c).

Once an answer is received, it can be (partially) added or not to the extension of the queried class. In order to capture the evolution of class extensions in the network, we consider a time variable t ∈ ℕ, and write O^t_i to denote peer P_i's populated ontology at time t (beginning with O_i):

O_i = O^0_i, O^1_i, ..., O^t_i, ...    (4.3)

It is important to note that the time parameter t is local to each peer and is just a way to model successive queries. It is assumed that the underlying ontology does not ever change: O^t_i = O_i for every t ∈ ℕ. In addition, peers join the network only with satisfactory instances: ext^0_i(c) = ext_i(c) ⊆ ext*_i(c) for every c ∈ C_i. Later, though, peers may accidentally add unsatisfactory instances to their populated ontologies due to inaccurate trust-based estimations. For this reason, we make a distinction between the instances peers are 100% sure of and the rest. Specifically, at any time t ∈ ℕ, ext^t_i(c) is split into two disjoint sets:

ext^t_i(c) = ēxt^t_i(c) ⊎ ẽxt^t_i(c)    (4.4)

where ēxt^t_i(c) is the greatest subset of ext^t_i(c) ∩ ext*_i(c) known by P_i. The axioms of Definition 4.2 only hold for this kind of instances. The way these sets are built is explained in Section 4.3.2. Certainly, at t = 0, we have ext^0_i(c) = ēxt^0_i(c) = ext_i(c), or, equivalently, ẽxt^0_i(c) = ∅.

With this new terminology, P_j's answer to query (4.1) through its translation (4.2) at time t is the extension ext^t_j(d), and an arbitrary instance a ∈ ext^t_j(d) is qualified as satisfactory as long as a ∈ ext*_i(c). The proportion of satisfactory instances in ext^t_j(d) is given by the conditional probability p(ext*_i(c) | ext^t_j(d)).¹ Our standpoint is that the higher this value is, the more P_i trusts P_j with respect to the translation ⟨c, d⟩.

¹The probability space under consideration here is the triple (Ω, A, p(·)) where Ω is the set of instances of the network (a finite set), the σ-algebra A is the power set of Ω, and p(·) is Laplace's definition of probability.

Definition 4.5. Let us consider two peers P_i and P_j (i ≠ j) and classes c and d of O_i and O_j, respectively. The trust that P_i has towards P_j with respect to the translation ⟨c, d⟩ at time t is the conditional probability p(ext*_i(c) | ext^t_j(d)), and it is denoted by trust^t(P_i, P_j, ⟨c, d⟩).

This idea differs from all existing approaches to trust. In our setting unsatisfactory answers are seen as the result of peers’ incapacity to understand each other. Also, trust depends on translations: peers may be trustworthy with respect to some translations but not to others.

The exact value of trust is unknown and can only be estimated. More precisely, trust^t(P_i, P_j, ⟨c, d⟩) is the unknown parameter θ of the Bernoulli distribution of the binary random variable defined on ext^t_j(d) that assigns the value 1 to instances that are satisfactory instances of the class c in P_i. The Bernoulli distribution is the simplest discrete distribution having two possible outcomes, labeled 1 and 0, in which 1 (satisfactory) occurs with probability θ and 0 (unsatisfactory) occurs with probability 1 − θ, where θ is called the parameter of the distribution.

Our approach consists in estimating this parameter by Bayesian inference. For this, we consider a probability distribution T^t(P_i, P_j, ⟨c, d⟩) which represents peer P_i's belief about the parameter θ = trust^t(P_i, P_j, ⟨c, d⟩) = p(ext*_i(c) | ext^t_j(d)). In this way, we can give the estimation

trûst^t(P_i, P_j, ⟨c, d⟩) = E(T^t(P_i, P_j, ⟨c, d⟩))    (4.5)

where E(·) denotes the expected value.

From here on, we focus on the construction of prior and posterior distributions of T^t(P_i, P_j, ⟨c, d⟩).

4.3.2 Computation and Refinement of Trust Estimation

In case P_i has no direct experience interacting with P_j, the alignment A_ij is taken to construct a prior distribution of T^t(P_i, P_j, ⟨c, d⟩) from which a prior belief is computed (as its expected value, according to Equation (4.5)). P_j's later answers are used to compute posterior distributions of T^t(P_i, P_j, ⟨c, d⟩) and to revise beliefs. We use beta distributions Beta(α, β), as they are typically used to estimate the parameter of a Bernoulli distribution. More precisely, it is known that if the prior distribution is a beta distribution of parameters α and β, then the posterior distribution, after n observations o_1, ..., o_n (each o_i is either 1 or 0) of a random sample of answers, is also a beta distribution, whose parameters are α + Σ^n_{i=1} o_i and β + (n − Σ^n_{i=1} o_i) [21].

Thus, (4.5) can be replaced by

trûst^t(P_i, P_j, ⟨c, d⟩) = E(Beta(α, β)) = α / (α + β)    (4.6)

We explain now how the parameters α and β are computed and updated.
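As a small numerical illustration (a sketch of ours, not the thesis implementation), the conjugacy of the beta and Bernoulli distributions makes the update behind Equation (4.6) a matter of two counters:

def trust_estimate(alpha: float, beta: float) -> float:
    # Mean of Beta(alpha, beta), used as the current trust value.
    return alpha / (alpha + beta)

def update(alpha: float, beta: float, observations: list[int]):
    # Bayesian update: each observation is 1 (satisfactory) or 0 (unsatisfactory).
    r = sum(observations)
    return alpha + r, beta + (len(observations) - r)

# Assumed prior Beta(8, 1), mean close to 1 as for an '=' correspondence,
# then a sample with 8 satisfactory and 2 unsatisfactory instances
a, b = update(8.0, 1.0, [1, 1, 1, 1, 0, 1, 1, 0, 1, 1])
print(trust_estimate(a, b))  # Beta(16, 3): 16/19 ≈ 0.84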

No direct experience: alignment-based trust. If T^t(P_i, P_j, ⟨c, d⟩) is not defined (e.g. when t = 0), the alignment A_ij is brought into play in order to set the beta distribution serving as a prior distribution.

Figure 4.1 shows the density functions of the beta distributions associated with the different alignment relations. The construction is based on the intended meaning of these relations. In case A_ij is sound, then

⟨c, d, =⟩ ∈ A_ij iff ext*_i(c) = ext*_j(d)
⟨c, d, >⟩ ∈ A_ij iff ext*_i(c) ⊃ ext*_j(d)
⟨c, d, <⟩ ∈ A_ij iff ext*_i(c) ⊂ ext*_j(d)
⟨c, d, ⊥⟩ ∈ A_ij iff ext*_i(c) ∩ ext*_j(d) = ∅
⟨c, d, G⟩ ∈ A_ij iff none of the above holds

Hence, provided that ext^t_j(d) ⊆ ext*_j(d), we have

if ⟨c, d, =⟩ or ⟨c, d, >⟩ ∈ A_ij then p(ext*_i(c) | ext^t_j(d)) = 1
if ⟨c, d, ⊥⟩ ∈ A_ij then p(ext*_i(c) | ext^t_j(d)) = 0
if ⟨c, d, <⟩ ∈ A_ij then p(ext*_i(c) | ext^t_j(d)) can take any value in [0, 1]

For this reason, in the case of the relations =, > and ≥ we consider a beta distribution whose mean is close to 1, and we take its symmetric for the case of ⊥. For the relations < and G we take the uniform distribution U[0, 1] = Beta(1, 1). Finally, in the case of ≤ we consider a mixture between the two distributions associated with = and <.

[Figure 4.1: beta density curves over [0, 1], one per alignment relation: =, >, {≥, =} (mean close to 1), <, "between" (G), {≤, =} and {⊥}.]

FIGURE 4.1: Beta density functions for different alignment relations.

Direct experience: trust refinement. Imagine that P_i receives B = ext^t_j(d) from P_j as an answer to the query c(X)? (c ∈ C_i). Sampling with replacement is performed over the set B in order to estimate the number of satisfactory instances in B. We assume that each peer can call an oracle (typically the user) to find out whether an instance is satisfactory. More specifically, given a ∈ B, P_i's oracle is able to give a yes/no response to the question

a ∈ ext*_i(c)?

We can lighten the burden on P_i's oracle by exploiting the information contained in P_i's populated ontology. Indeed, the set B can be partitioned into three disjoint subsets:

B = B⁺_aut ⊎ B⁻_aut ⊎ B_aut    (4.7)

where

• B⁺_aut = {a ∈ B : a ∈ ēxt^t_i(c)} = B ∩ ēxt^t_i(c)

• B⁻_aut = {a ∈ B : there exists c′ ∈ C_i with a ∈ ēxt^t_i(c′) and c′ ⊥ c}

• B_aut = B \ (B⁺_aut ∪ B⁻_aut)

Thus, given a ∈ B, there is no need to call the oracle in case a ∈ B⁺_aut or a ∈ B⁻_aut, since a will be considered, respectively, satisfactory or unsatisfactory automatically. Notice that only instances peers are 100% sure of are taken into account in the construction of B⁺_aut and B⁻_aut.

Algorithm 1 shows the process of sampling with replacement to estimate trust at time t. It is assumed that only a maximum number N ≥ 0 of oracle calls is permitted (in practice, this number is given by the user). The outputs of the algorithm are the numbers of satisfactory and unsatisfactory instances included in the sample, r and s, respectively, and the sets of satisfactory and unsatisfactory instances identified by the oracle, B⁺_ora and B⁻_ora, respectively. Notice that we impose the condition N < |B_aut|. Indeed, if N ≥ |B_aut| (which happens when, for instance, B_aut = ∅) the algorithm will not terminate. In this case, however, we do not need to do sampling, as the whole answer can be evaluated with the help of the oracle. This process also results in numbers r and s, and sets B⁺_ora and B⁻_ora, defined similarly as before.

Algorithm 1: Estimation of Trust: Sampling with Replacement
Input: N {maximum number of oracle calls}, N ≥ 0
Input: B = B⁺_aut ⊎ B⁻_aut ⊎ B_aut {answer}, N < |B_aut|
r ← 0, s ← 0, n ← 0
B⁺_ora ← ∅, B⁻_ora ← ∅
while n ≤ N do
  select randomly a ∈ B
  if a ∈ B⁺_aut or a ∈ B⁺_ora then r ← r + 1
  else if a ∈ B⁻_aut or a ∈ B⁻_ora then s ← s + 1
  else if oracle's answer to "a ∈ ext*_i(c)?" is yes then
    r ← r + 1, n ← n + 1
    B⁺_ora ← B⁺_ora ∪ {a}
  else if oracle's answer to "a ∈ ext*_i(c)?" is no then
    s ← s + 1, n ← n + 1
    B⁻_ora ← B⁻_ora ∪ {a}
  end if
end while
Output: r, s, B⁺_ora, B⁻_ora
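A runnable Python sketch of Algorithm 1 (the oracle is passed in as a function; the names are ours, not the thesis implementation):

import random

def sampling_with_replacement(B, B_aut_pos, B_aut_neg, oracle, N):
    # Sketch of Algorithm 1. oracle(a) answers the question "a in ext*_i(c)?";
    # only instances outside the automatic sets are charged against the
    # budget N of oracle calls (precondition: N < |B_aut|).
    assert 0 <= N < len(B) - len(B_aut_pos) - len(B_aut_neg)
    r = s = n = 0
    B_ora_pos, B_ora_neg = set(), set()
    pool = list(B)
    while n <= N:
        a = random.choice(pool)
        if a in B_aut_pos or a in B_ora_pos:
            r += 1
        elif a in B_aut_neg or a in B_ora_neg:
            s += 1
        elif oracle(a):                 # one oracle call, positive
            r, n = r + 1, n + 1
            B_ora_pos.add(a)
        else:                           # one oracle call, negative
            s, n = s + 1, n + 1
            B_ora_neg.add(a)
    return r, s, B_ora_pos, B_ora_neg

# Example: the oracle checks membership in a known reference extension
ref = {"u1", "u3"}
print(sampling_with_replacement({"u1", "u2", "u3"}, set(), set(),
                                lambda a: a in ref, N=1))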

If T^t(P_i, P_j, ⟨c, d⟩) = Beta(α, β), then we construct

T^{t+1}(P_i, P_j, ⟨c, d⟩) = Beta(α + r, β + s)

which represents P_i's posterior belief about θ = p(ext*_i(c) | ext^t_j(d)).

4.3.3 Update of Populated Ontologies

Going back to (4.7), the set B_aut represents the actual new information peer P_i is interested in. If B⁺_ora and B⁻_ora are returned by Algorithm 1, then

B_aut = B⁺_ora ⊎ B⁻_ora ⊎ B_ora

The set B⁻_ora is just discarded, whereas B⁺_ora and B_ora can be added to ext^t_i(c) to form ext^{t+1}_i(c). We define

ēxt^{t+1}_i(c) = ēxt^t_i(c) ∪ B⁺_ora    (4.8)

The set B_ora, in turn, represents the set of new instances P_i is not completely sure about. Whether to add this set or not depends on the newly computed trust value:

ẽxt^{t+1}_i(c) = ẽxt^t_i(c) ∪ B_ora   if trûst^{t+1}(P_i, P_j, ⟨c, d⟩) ≥ λ_trust
ẽxt^{t+1}_i(c) = ẽxt^t_i(c)           otherwise    (4.9)

where λ_trust ∈ [0, 1] is a threshold of trust (for instance, λ_trust = 0.75).

In order for O^{t+1}_i to be a populated ontology, though, any instance added to the extension of c must also be added to the extension of any superclass of c. If c′ ∈ C_i is such that c ≤ c′, then

ēxt^{t+1}_i(c′) = ēxt^t_i(c′) ∪ B⁺_ora

Notice that we proceed to this inference only for the instances that peer P_i is 100% sure of.

By construction, O^{t+1}_i turns out to be a populated ontology. This explains how the sequence advanced in (4.3) is built.

Provenance-based trust aggregation. Some of the instances added to ẽxt^t_i(c) to form ẽxt^{t+1}_i(c) may later be received again (or may have already been received) by P_i as part of answers to queries posed to other peers. These peers can be trusted by P_i to different degrees. Keeping the peer provenance of the instances that are not 100% sure to be satisfactory for the class c in P_i allows us to combine the corresponding trust values to estimate the probability of these instances being satisfactory. The following definition of provenance takes into account that trust values evolve over time, which may lead to discarding instances (as explained below).

Definition 4.6. Let a ∈ I^t_i be an instance included in P_i's populated ontology O^t_i. The provenance of a at time t according to P_i, denoted by prov^t_i(a), is a set of pairs ⟨P_j, d_j⟩, where d_j is a class of P_j's ontology, computed recursively as follows:

• If P_i receives ext^s_j(d_j) from P_j as an answer to a query and a ∈ ext^s_j(d_j), then prov^{s+1}_i(a) = prov^s_i(a) ∪ {⟨P_j, d_j⟩}, unless prov^s_i(a) is not defined, in which case prov^{s+1}_i(a) = {⟨P_j, d_j⟩}.

• If P_i receives ext^s_j(d_j) from P_j as an answer to a query and a ∉ ext^s_j(d_j) but ⟨P_j, d_j⟩ ∈ prov^s_i(a), then prov^{s+1}_i(a) = prov^s_i(a) \ {⟨P_j, d_j⟩}.

The idea behind provenance is that, if

prov^t_i(a) = {⟨P_{j1}, d_1⟩, ⟨P_{j2}, d_2⟩, ..., ⟨P_{jm}, d_m⟩}    (4.10)

then, as far as P_i is aware, a ∈ ext^t_{jk}(d_{jk}) for each k = 1, ..., m.

Now, if a ∈ ẽxt^t_i(c), we can use (4.10) to estimate the probability of a being satisfactory. The idea is to aggregate the values

trûst^t(P_i, P_{j1}, ⟨c, d_{j1}⟩), trûst^t(P_i, P_{j2}, ⟨c, d_{j2}⟩), ..., trûst^t(P_i, P_{jm}, ⟨c, d_{jm}⟩)

One possibility is to consider the mean:

p = (1/m) Σ^m_{k=1} trûst^t(P_i, P_{jk}, ⟨c, d_{jk}⟩)

but other aggregation functions (e.g. the min or max) can be considered.
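A tiny sketch of this aggregation step (names and the peer "Pierre" are illustrative assumptions of ours):

def satisfaction_degree(provenance, trust, aggregate=None):
    # provenance: set of (peer, class) pairs, as in Definition 4.6;
    # trust: dict mapping (peer, class) -> current trust estimate for c.
    values = [trust[pd] for pd in provenance]
    return (sum(values) / len(values)) if aggregate is None else aggregate(values)

# An instance received from two peers for the same local class c
trust = {("Zelda", "Paris"): 0.1, ("Pierre", "Paris"): 0.9}
print(satisfaction_degree(set(trust), trust))       # mean -> 0.5
print(satisfaction_degree(set(trust), trust, min))  # pessimistic -> 0.1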

From here on, we will speak of satisfactory instances when we refer to instances of the subset ēxt^t_i(c), whereas the instances of ẽxt^t_i(c) will be referred to as satisfactory instances with degree p.

Discard of instances. Instances added to populated ontologies due to wrong trust decisions might later be discarded when updating trust: if a is a satisfactory instance in ẽxt^t_i(c) with degree p, and it happens that p is lower than a given threshold of discard λ_discard ∈ [0, 1], then a is removed from ẽxt^t_i(c). This justifies the second bullet of Definition 4.6.

4.3.4 Use of Trust

We use trust for guiding the query answering process as follows. Each time t peer P_i has to choose among the set

P_0 = {⟨P_j, d_j⟩ : P_j ∈ ACQ(P_i) and d_j ∈ O_j}

which acquaintance to query for getting new instances for its class c, P_i will opt for ⟨P_{j0}, d_{j0}⟩ such that

trûst^t(P_i, P_{j0}, ⟨c, d_{j0}⟩) = max_{⟨P_j, d_j⟩ ∈ P_0} {trûst^t(P_i, P_j, ⟨c, d_j⟩)}    (4.11)
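In code, the choice of Equation (4.11) is an argmax over the current trust estimates (hypothetical names, consistent with the sketches above):

def select_acquaintance(c, candidates, trust):
    # Pick the (peer, d) pair with the highest current trust for class c.
    # candidates: iterable of (peer, d) pairs; trust: dict (peer, c, d) -> value.
    return max(candidates, key=lambda pd: trust[(pd[0], c, pd[1])])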

4.4 Experimentation and Evaluation

This section reports on an experimental campaign conducted with the aim of studying (i) the convergence of trust, and (ii) the gain in the quality of peers' answers, measured with precision and recall, when query answering is guided by trust. Specifically, we were interested in assessing the impact of parameters such as (a) the number of peers, (b) the quality of initial alignments and (c) the thresholds of trust. For this, we have designed a P2P semantic bookmarking system representative of the convergence of the semantic web and social networks.

In what follows we first explain the experimental design (Section 4.4.1) and then summarize and analyze the results of the experimentation (Section 4.4.2).

4.4.1 Experimental Design

Some of the components of the semantic P2P bookmarking networks are generated automatically. This is the case of the topologies of the underlying networks of peers (denoted by ⟨P, ACQ⟩ in our model of trust) once the number of peers is specified (Section 4.4.1). However, we have decided to build semi-automatically 5 ontologies of topics (O_i in the model) to be used in any generated network (Section 4.4.1). The construction of these ontologies is based on the choice of a list of homonyms, which are part of the leaves in the ontologies, as homonymy is still a Gordian knot for ontology matchers. The rest of the classes are extracted from the taxonomy of categories of Wikipedia. Each class is populated with all the URLs associated with its corresponding Wikipedia category, and others returned by Google, to form the 5 reference populated ontologies (O*_i) (Section 4.4.1). When a P2P network is generated, each peer is randomly assigned one of the 5 ontologies, which is populated (O_i) by removing instances from its corresponding reference populated ontology (Section 4.4.1). Even if two peers share the same ontology, they will share few instances before the interaction begins. All the 5 ontologies have been aligned by using state-of-the-art matchers (Section 4.4.1). The quality of the resulting alignments (A_ij) is measured on the basis of reference alignments computed manually. At the end of the section we explain how we simulate query answering in the networks (Section 4.4.1).

Generation of P2P networks One of the parameters of our system is the number of peers. Once it is set, a P2P network with the specified number of peers is generated. Social bookmarking networks are a particular case of social networks, and, since social networks are well known to exhibit small-world characteristics [51], we only generate networks with small-world topologies. We do so by running Kleinberg's algorithm included in the JUNG Java library.² This algorithm takes as input an integer (whose square root is required to be an integer too) to be the total number of peers in the network. In our experiments we set this parameter to k² with k = 5, 10, 15, 20, 30.

²http://jung.sourceforge.net

Construction of the ontologies As hinted before, we have built 5 ontologies of topics in a semi-automatic manner. First, we selected a list of homonyms. For instance, "Paris" as the capital of France, a film, the mythological figure, and a genus of flowering plants; or "Interpol" as the police organization and an American indie rock band. Second, we looked up the categories of these homonyms in the taxonomy of categories of Wikipedia and manually extracted some of their super-categories to construct the ontologies in a natural way. The homonyms are, thus, leaves (classes without proper sub-classes) of the ontologies. Two of the resulting ontologies are shown in Figure 4.2 and Figure 4.3. Any peer using the first one is interested in the topics of cinema and music, while the second is used by peers interested in cities and music. Notice that the classes Paris and Rome bring about homonymy between the two ontologies.

[Figure 4.2 shows a class hierarchy rooted at Thing, with a Cinema_by_country branch (European_cinema; Cinema_of_the_United_States, Cinema_of_Germany, Cinema_of_France, Cinema_of_Italy; American_films, German_films, French_films, Italian_films; leaves Magnolia, Paris, My_Uncle, Rome) and an Indie_rock branch (Indie_rock_groups_by_country; American_indie_rock_groups; leaves Interpol, Beirut).]

FIGURE 4.2: Ontology of cinema and music.

[Figure 4.3 shows a class hierarchy rooted at Thing, with a Cities_in_Europe branch (Capitals_in_Europe, Cities_in_France; leaves Rome, Paris, Lyon) and an Indie_rock branch (Indie_rock_groups; American_indie_rock_groups; leaves Interpol, The_National).]

FIGURE 4.3: Ontology of cities and music.

Construction of the reference populated ontologies Each class of the 5 ontologies has been populated with URLs to form the reference populated ontologies. For this, we used the URLs of the web pages associated with the Wikipedia category of each class, and also others returned by Google. These were duplicated (but expressed in a way that they remain syntactically different) so as to ensure that each of the most specific classes (the leaves) has at least 50 instances, and each ontology at least 1000 instances.

Matcher   Precision  Recall  F-measure
TaxoMap   0.7        0.7     0.7
Falcon    0.6        0.7     0.6461
Aflood    0.5        0.9     0.6428
Aroma     0.3        0.8     0.4363
Random    0.03       0.2     0.0521

TABLE 4.1: Evaluation of the matchers' alignments.

Construction and peer assignment of initial populated ontologies Once a network of peers is generated, each peer is randomly assigned one of the 5 ontologies of topics. This is populated by removing instances from its corresponding reference populated ontology. The proportion of instances taken from each class is another parameter of our system. This process results in the construction of the initial populated ontologies. The idea is that, even if two peers share the same ontology of topics, they will share few instances before the interaction begins. In all our experiments this parameter was set to 10%.

Construction of initial alignments For the generation of initial alignments, we have chosen 4 matchers that regularly participate in OAEI [26] and launched them with the ontologies of topics. We have built reference alignments manually to evaluate the quality of the resulting alignments. Remember that reference alignments are taken as the preferred alignments (in practice given by domain experts) and that they are not known by peers. Table 4.1 shows the values of precision, recall and F-measure, on average, of the matchers' alignments with respect to the reference alignments. All of them are unsound and incomplete, which justifies the need for trust. As an example, Paris and Rome as cities happen to be identified with Paris and Rome as films, respectively. The matcher used is a parameter of our system that can be set to TaxoMap, Falcon, Aflood or Aroma. It can also be set to Random, which randomly assigns a class of one ontology to each class of the other, and identifies them via equivalence. The Random matcher serves as a baseline in our experiments.

Simulation of query answering For simulating query answering, we follow the discrete-event simulation approach [61], in which each event occurs at an instant in time and marks a change of state in the system. In our particular case, the state of the system is the list of all trust values, and an event embodies the process of randomly choosing a peer P_i in the network and a class c ∈ C_i of its ontology, querying its acquaintances about this class via reformulation, and processing their answers and updating trust. Without loss of generality, we only select, for both querying and query reformulation, the classes included in the correspondences returned by the ontology matchers. As a parameter of our system, we consider, instead of the number of simulation steps, the minimum number of queries per class and peer. Once this parameter is set to a value, each simulation is interrupted at the first time t at which this value is exceeded.

Parameter            Description
N_peers ∈ ℕ          number of peers in the network
N_queries ∈ ℕ        minimum number of queries per class and peer
P_oracle ∈ [0, 1]    maximum proportion of the sample about which the oracle can be called when estimating trust
P_initial ∈ [0, 1]   proportion of instances of the reference populated ontologies included in the initial populated ontologies
λ_trust ∈ [0, 1]     threshold of trust
λ_discard ∈ [0, 1]   threshold to discard instances
matcher              matcher used to compute initial alignments

TABLE 4.2: Parameters of our simulations.

In order to evaluate the gain in the quality of peer answers when query answering is guided by trust, we compare two different scenarios: a naive scenario in which each peer adds to its populated ontology all the answers to its queries, and a trust-based scenario in which answers are added on the basis of trust. More specifically,

• in the naive scenario, if P_i and c ∈ C_i are selected at time t, and P_j is an acquaintance of P_i for whom there exists d ∈ C_j such that ⟨c, d, =⟩ or ⟨c, d, ≥⟩ is in A_ij, then ext^t_j(d) is added to ext^t_i(c);

• while in the trust-based scenario, ext^t_j(d) is added only if the estimation of trust is above a threshold, as explained in Section 4.3.2: trûst^t(P_i, P_j, ⟨c, d⟩) ≥ λ_trust.

The threshold of trust is another parameter of our simulations.

One of the key components of the estimation of trust is the user feedback, i.e. oracle calls. In Algorithm 1 an oracle is supposed to give a yes/no response to the question "a ∈ ext*_i(c)?", where a is in a sample taken from P_j's answer ext^t_j(d). In our simulations this is done by simply looking into the reference populated ontology O*_i. As a parameter of our system, we also consider the maximum number of oracle calls permitted. Table 4.2 summarises all the parameters described so far and their abbreviations. The list is completed with the threshold to discard instances λ_discard (see Section 4.3.3).

4.4.2 Experimental Results

We have conducted experiments with the aim of studying, first, the convergence of trust, and, second, the gain in the quality of peer answers when query answering is guided by trust. Moreover, we have performed a number of experiments to measure the impact of parameters such as the number of peers, the quality of initial alignments, and the thresholds of trust. In Section 4.4.2 we show and explain the results concerning the convergence of trust, whereas in Section 4.4.2 we present the results regarding the gain in query answering by using trust.

Convergence of trust In order to test the convergence of trust, we have to check that for each P_i, P_j ∈ P such that ⟨P_i, P_j⟩ ∈ ACQ, and c ∈ C_i and d ∈ C_j for which there exists a relation R with ⟨c, d, R⟩ ∈ A_ij, the sequence

{trûst^t(P_i, P_j, ⟨c, d⟩)}_{t∈ℕ}

is convergent. The natural limit is

trust*(P_i, P_j, ⟨c, d⟩) =def p(ext*_i(c) | ext*_j(d)) = |ext*_i(c) ∩ ext*_j(d)| / |ext*_j(d)|

Therefore, alternatively, we can check that the sequence {Δ^t(P_i, P_j, ⟨c, d⟩)}_{t∈ℕ}, where

Δ^t(P_i, P_j, ⟨c, d⟩) = |trûst^t(P_i, P_j, ⟨c, d⟩) − trust*(P_i, P_j, ⟨c, d⟩)|,

converges to 0.

We have checked convergence of trust experimentally. For this, we ran 50 simulations. Each simulation stopped at t_0 + 10, where t_0 ∈ ℕ was the first instant of time (number of queries) for which Δ^t(P_i, P_j, ⟨c, d⟩) < 0.001 for every t ∈ [t_0, t_0 + 10] and P_i, P_j and ⟨c, d⟩ as specified above. The average t_0 over the 50 simulations was t_0 = 40. In all the simulations, matcher was set to TaxoMap, which is the matcher with the best F-measure value on the initial alignments (see Table 4.1), and N_peers = 100, P_oracle = 0.15, P_initial = 0.1 and λ_trust (= λ_discard) = 0.5.

We have performed three experiments to evaluate the impact of the number of peers (Exp. 1), the thresholds of trust (Exp. 2) and the quality of initial alignments (Exp. 3) on the speed of convergence of trust. Table 4.3 includes the values given to the parameters in each experiment. We ran 50 simulations with each parameter configuration. Notice that we let N_peers, λ_trust (= λ_discard) and matcher vary, respectively, in Exp. 1, Exp. 2 and Exp. 3, whereas N_queries = 30, P_oracle = 0.15 and P_initial = 0.1 in all the experiments. In Exp. 1 and Exp. 2, matcher was set to TaxoMap. The thresholds of trust were set to 0.5 in Exp. 1 and Exp. 3, while the number of peers was set to 100 in Exp. 2 and Exp. 3. The same parameter configurations were used in the analogous experiments Exp. 1', Exp. 2' and Exp. 3' concerning the study of the gain in the quality of peer answers when using trust (Section 4.4.2).

Parameter            Exps. 1 & 1'                   Exps. 2 & 2'         Exps. 3 & 3'
N_peers              25, 100, 125, 400, 625, 900    100                  100
N_queries            30                             30                   30
P_oracle             15%                            15%                  15%
P_initial            10%                            10%                  10%
λ_trust = λ_discard  0.5                            0.2, 0.4, 0.6, 0.8   0.5
matcher              TaxoMap                        TaxoMap              Reference, Random, Aroma, TaxoMap, Falcon, Aflood

TABLE 4.3: Values of the parameters in the experiments of convergence of trust and gain in query answering by using trust.

For each experiment, we recorded the average Δ^t(P_i, P_j, ⟨c, d⟩) over the 50 simulations of each parameter configuration. We focus on results concerning pairs of classes for which the speed of convergence is slower, since the impact of the parameters is more evident. This "worst case" is represented by the homonymous classes Paris with five different meanings, as no matcher was able to overcome this homonymy problem. Thus, we look at the average

Δ^t(Paris) = (1/|ACQ_Paris|) Σ_{⟨i,j⟩∈ACQ_Paris} Δ^t(P_i, P_j, ⟨Paris, Paris⟩)

where ACQ_Paris = {⟨i, j⟩ ∈ ACQ : ⟨Paris, Paris⟩ ∈ A_ij}.

Figure 4.4 shows the results of Exp. 1. We can see that the number of peers actually has no significant impact on the speed of convergence. Further, note that only 10 queries are necessary for Δ^t(Paris) < 0.1. This may be explained by the underlying small-world topology of the network.

Results of Exp. 2 are shown in Figure 4.5. We can see that the thresholds of trust have an influence on the speed of convergence of trust: the lower they are, the slower convergence is. The reason behind this is that with low thresholds of trust even trustworthy peers are likely to accept unsatisfactory instances, which require time to be discarded, and until this happens trust values diverge.

Results of Exp. 3 are presented in Figure 4.6. Recall that we concentrate on the homonymous classes Paris. The initial alignments appear in the legend of Figure 4.6 in order of descending F-measure according to Paris. We can see that this order is preserved for convergence of trust: the higher the F-measure an initial alignment has, the faster convergence is. The extreme case is that of the reference alignment with which convergence is reached from the very first query.

We also have performed experiments to assess the impact on convergence of trust of the proportion of instances of peers' reference populated ontologies included in their initial populated ontologies, and the maximum proportion of the samples of peers' answers to be processed by an oracle, denoted by P_initial and P_oracle in Table 4.2. The higher P_initial or P_oracle are, the faster convergence of trust is. This is not surprising since, the higher P_initial or P_oracle are, the more instances peers are 100% sure of, which allows better estimations of trust.

[Figure 4.4: Δ^t(Paris) plotted against t = number of queries, one curve per N_peers ∈ {25, 100, 125, 400, 625, 900}.]

FIGURE 4.4: The number of peers has no significant impact on the speed of convergence.

[Figure 4.5: Δ^t(Paris) plotted against t = number of queries, one curve per λ_trust = λ_discard ∈ {0.2, 0.4, 0.6, 0.8}.]

FIGURE 4.5: The lower the thresholds of trust are, the slower convergence of trust is.

Gain in the quality of peer answers when using trust In order to assess the gain in the quality of peer answers when query answering is guided by trust, we have compared a trust-based scenario with a naive scenario (see Section 4.4.1). This gain is measured with standard precision, recall and F-measure: if c is a class of P_i's ontology, then

precision^t_i(c) = |ext^t_i(c) ∩ ext*_i(c)| / |ext^t_i(c)|        recall^t_i(c) = |ext^t_i(c) ∩ ext*_i(c)| / |ext*_i(c)|

and F-measure is the harmonic mean of precision and recall.

[Figure 4.6: Δ^t(Paris) plotted against t = number of queries, one curve per initial alignment (Reference alignment, Aflood, Falcon, TaxoMap, Aroma, Random).]

FIGURE 4.6: The higher the F-measure an initial alignment has, the faster convergence is.
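For concreteness, these two measures amount to simple set operations on extensions; a minimal sketch (ours, not the simulator's code):

def precision_recall(ext_t: set, ext_star: set):
    # Fraction of the current extension that is correct, and
    # fraction of the reference extension that has been recovered.
    tp = len(ext_t & ext_star)
    return tp / len(ext_t), tp / len(ext_star)

print(precision_recall({"a", "b", "c"}, {"b", "c", "d"}))  # (0.66..., 0.66...)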

We ran 50 simulations of the two scenarios and recorded in each case precision^t_i(c) and recall^t_i(c) for every class c of each peer P_i's ontology. As in the case of convergence of trust, in all the simulations matcher was TaxoMap and N_peers = 100, P_oracle = 0.15, P_initial = 0.1 and λ_trust (= λ_discard) = 0.5. The average precision and recall over all classes and peers in the generated network in the trust-based scenario after 5 queries were, respectively, 0.99 and 0.68, while in the naive scenario they were 0.80 and 0.75. We can confirm that precision is higher in the trust-based scenario. Yet, precision is not low in the naive scenario. The reason is that TaxoMap only made mistakes when matching homonymous classes.

We have also performed experiments with the aim of evaluating the impact of the number of peers (Exp. 1'), the thresholds of trust (Exp. 2'), together with the quality of initial alignments (Exp. 3'), on the quality of peer answers when using trust. We ran 50 simulations of the trust-based scenario with each parameter configuration as specified in Table 4.3. Regarding the experimental results, we focus on the class Paris which, as argued before, is the "worst case" in our setting. Specifically, we show the average of precision, recall and F-measure over all peers whose ontologies contain the class Paris. For instance, in the case of precision we compute

precision^t(Paris) = (1/|P_Paris|) Σ_{i∈P_Paris} precision^t_i(Paris)

where P_Paris = {i ∈ P : Paris ∈ O_i}.

[Figure: "Impact of number of peers on F-measure" — F^t(Paris) plotted against t = number of queries, one curve per value of Npeers (from 25 to 900).]

FIGURE 4.7: The number of peers has no significant impact on F^t(Paris).

Figure 4.7 shows the F-measure values in Exp. 1'. Again, we can see that the number of peers has no significant impact on the gain in query answering.

Figures 4.8 and 4.9 show, respectively, precision and recall in Exp. 2'. We can see that low thresholds of trust ensure high recall, but a high number of queries is needed to obtain high precision. On the contrary, high precision is guaranteed with high thresholds of trust, at the expense of low recall. The extreme case is λtrust = λdiscard = 1, with which peers do not accept any answer: since our approach is based on Bayesian inference, trust might be very close but never equal to 1. In practice, high and low thresholds of trust can be used to model peers with restrictive and open behavior, respectively.

Figure 4.10 shows the F-measure values in Exp. 3'. As in the case of trust convergence, the higher the F-measure an initial alignment has, the higher F^t(Paris) is.

To conclude this section, we have also performed experiments to assess the impact on the quality of peers' answers of the proportion Pinitial of instances of peers' reference populated ontologies included in their initial populated ontologies, and of the maximum proportion Poracle of the samples of answers to be processed by an oracle. As expected, the higher Pinitial or Poracle are, the higher F^t(Paris) is.

4.5 Conclusions and Related Work

In its most general form, trust is seen as a "method for dealing with uncertainty" [56]. In this chapter, we have presented a novel approach to trust in the context of networks of ontologies and alignments.

[Figure: "Impact of thresholds of trust on precision" — precision^t(Paris) plotted against t = number of queries, one curve per λtrust = λdiscard ∈ {0.2, 0.4, 0.6, 0.8, 1.0}.]

FIGURE 4.8: The higher the thresholds of trust are, the higher precision^t(Paris) is.

[Figure: "Impact of thresholds of trust on recall" — recall^t(Paris) plotted against t = number of queries, one curve per λtrust = λdiscard ∈ {0.2, 0.4, 0.6, 0.8, 1.0}.]

FIGURE 4.9: The lower the thresholds of trust are, the higher recall^t(Paris) is.

Since ontology matchers may fail in computing sound and complete alignments between peers' ontologies, query translation based on alignments may lead to answers that are not guaranteed to be satisfactory. Our mechanism of trust can be used by peers to find the peers in the network that are best suited to answer their queries.

This work falls into the intersection of two active research areas in the context of the semantic web, namely, trust and ontology matching. Below we summarize related work in both areas.

[Figure: "Impact of initial alignments on F-measure" — F^t(Paris) plotted against t = number of queries, one curve per initial alignment (the reference alignment and the matcher-computed alignments).]

FIGURE 4.10: The higher the F-measure an initial alignment has, the higher F^t(Paris) is.

4.5.1 Trust

Trust decisions need to be made in many different scenarios of computer science, e.g. concerning the reliability of a computer network, the fulfillment of an e-commerce transaction, or the accuracy of some information on the web. According to Artz and Gil [9], though, there is a unifying theme: "trust is only worth modeling when there is a possibility of deception, that is, when there is a chance of a different outcome than what is expected or has been agreed upon".

In the context of the semantic web, trust is taken as a necessary condition to assimilate knowledge coming from ontologies, rules and proofs [15]. The semantic web, just as the web, is an open environment without norms where the quality of the information a source may offer is not guaranteed. Trust mechanisms are aimed at assisting agents to choose services or information sources when performing a task, and automated reasoners to discriminate between different information sources when answering queries. In this section, given the existence of comprehensive surveys such as [9, 41, 59, 62], we report on a number of these mechanisms and relate them to our approach.

Most research on trust and reputation focuses on decentralized solutions where individuals, peers or agents are empowered to take trust decisions rather than relying on a centralized process. Furthermore, trust is commonly agreed to be subjective, so that trust towards a particular entity may differ from one entity to another. In line with this, Richardson et al. [60] have proposed a generalization of the PageRank algorithm [22] to the semantic web. The aim is to establish degrees of belief in statements asserted by one or more sources, and the assumption is that a user's belief in a statement depends on her trust in the sources providing it. The authors conceive a web of trust in which each user provides personal trust values for a small number of other users. The proposed algorithm finds paths between a user and every user with a personal belief in the statement in question. Then, it concatenates the trust values along each path to obtain recommended beliefs in the statement. All these resulting values are aggregated to obtain a final degree of belief in the statement. Richardson et al. do not go into detail about what statements and trust values represent. In our approach, we specifically address the problem of semantic heterogeneity. Statements are of the form "an instance a is satisfactory", and trust values estimate the quality of query translations modeled as conditional probabilities. Concatenation of trust values cannot be applied as transitivity does not hold for probabilistic subsumptions. For this reason, only paths of length 1 are considered. Our aggregation function, though, resembles that of Richardson et al. Moreover, we also consider the possibility of using different functions (maximum, minimum, mean).

Some existing approaches to trust take into account the context in which trust judgements are to be made. This is the case of Abdul-Rahman and Hailes' approach [1], in which the trustworthiness of agents is determined on the basis of agents' collected statistics on direct experiences (direct trust), as well as recommendations from other agents (recommender trust). Both direct and recommender trusts are given within a certain context. Using the example given by the authors, Alice may trust a specific agent Bob the Mechanic in the specific context of servicing her car but not in the context of babysitting her children. Abdul-Rahman and Hailes model context as an unstructured set. Alice's two trust judgements, for example, can be represented by the predicates t(Bob the Mechanic, Car Repairing, t) and t(Bob the Mechanic, Babysitting, u), in which t and u stand for 'trustworthy' and 'untrustworthy', respectively. In our case, context dependency is realized in terms of query translations. For example, in the setting of the presented experimentation, it may happen that trust^t(Pi, Pj, Interpol) is above the threshold of trust while trust^t(Pi, Pj, Paris) is below the threshold.

There exist many approaches that resort to probability theory to model trust and that propose Bayesian inference for estimating trust values. One of the earliest is Mui et al.'s approach [50]. In this model, trust is defined as a subjective expectation an agent has about another's future behavior based on the history of their encounters, which are of a dyadic nature (either cooperative or not). In mathematical terms, trust is defined as the expected value of the proportion of encounters which are cooperative, modeled as a Beta distribution, and updated whenever new encounters occur. In Mui et al.'s work, as in most of the Bayesian models of trust, the lack of previous encounters is modeled with a uniform distribution, which typically represents ignorance of prior probabilities. However, in our approach the lack of direct experience is compensated with the information provided by alignments, as we build informative prior distributions based on the correspondences of the alignments between peers' ontologies.
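To make the contrast concrete, the following sketch shows a Beta–Bernoulli trust value with either a uniform prior (as in Mui et al.) or an informative prior seeded from the confidence of an alignment correspondence; the seeding rule and the pseudo-count are illustrative assumptions, not the exact construction defined in Chapter 4:

```python
class BetaTrust:
    """Trust as the expected value of a Beta distribution over the
    proportion of satisfactory answers (a Beta-Bernoulli model)."""

    def __init__(self, alpha=1.0, beta=1.0):
        self.alpha, self.beta = alpha, beta  # Beta(1, 1) = uniform prior

    @classmethod
    def from_alignment(cls, confidence, pseudo_count=10):
        # Illustrative assumption: turn a correspondence confidence in
        # [0, 1] into an informative prior with pseudo_count virtual
        # observations.
        return cls(1.0 + confidence * pseudo_count,
                   1.0 + (1.0 - confidence) * pseudo_count)

    def update(self, satisfactory, unsatisfactory):
        # Bayesian update: feedback counts are added to the parameters.
        self.alpha += satisfactory
        self.beta += unsatisfactory

    @property
    def value(self):
        return self.alpha / (self.alpha + self.beta)

t = BetaTrust.from_alignment(confidence=0.8)  # informative prior, value 0.75
t.update(satisfactory=7, unsatisfactory=3)    # feedback on a sample of answers
print(round(t.value, 3))                      # -> 0.727
```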

Provenance refers to the origins of something. For the W3C Provenance Working Group, provenance records "contain descriptions of the entities and activities involved in producing and delivering or otherwise influencing a given object" (http://www.w3.org/2011/prov/wiki/). Golbeck has proposed an approach that uses provenance and trust for filtering semantic web content in social networks [32]. First, an algorithm infers trust relationships between individuals. This algorithm uses the provenance of trust annotations to compute more accurate trust values based on the idea of transitivity of trust. Then, trust values are used in conjunction with the provenance of statements to filter content to users. In our approach the provenance of an instance is not global, but depends on each peer, and represents a peer's knowledge about which other peers' populated ontologies the instance belongs to. Provenance allows us to compute the probability of instances to be satisfactory.

4.5.2 Ontology and Schema Matching

There exists a considerable number of ontology and schema matching systems available today [27, 42]. In what follows, we review systems which share some features with ours, specifically, systems which return probabilistic mappings, systems which exploit user feedback, and instance-based matching systems.

User involvement is a feature in many matching systems. A user, or a community of users, can participate in the process of matching in different ways, for instance, by specifying system parameters, by providing an initial alignment — which can just be a set of expected correspondences — or by validating and enhancing output alignments. In Pereira et al.'s approach, users are asked to identify incorrect correspondences at a final step of the matching process [57]. The confidence of these correspondences is then set to 0.0. This information is taken as true and can be used in future matching operations. In this sense, users are always trusted. McCann et al. have proposed an approach that benefits from user feedback during the matching process [47]. Users are asked simple questions with the aim of improving matching accuracy (e.g. a positive answer to the question "is monthly-fee-rate of type DATE?" suggests the use of a specialized date matcher for this particular attribute). Three types of questions are considered: (i) questions to verify intermediate predictions, (ii) questions to learn domain constraints and (iii) questions to verify final matching predictions. Unlike in Pereira et al.'s approach, users are classified into trusted and untrusted. To figure out whether a user is trustworthy or not, a set of evaluation questions (to which the answers are known) is given to users. In our approach, user feedback is used for estimating trust on peers. Users act as oracles that give a yes/no answer to questions of the form "is a an instance of class c?". Users are trusted and their answers are always considered to be true.

Different probabilistic approaches have been applied to ontology and schema matching, such as Bayesian networks [49], Markov networks [8], or Markov logic networks [55]. Mitra et al. have proposed the use of Bayesian networks to derive new correspondences from a set of initial correspondences (typically given by an expert or an automatic tool) [49]. A Bayesian network is built with the help of meta-rules which express how initial correspondences affect other correspondences. These meta-rules are based on the semantics of the languages of the ontologies to be aligned. An example of a meta-rule is "if two properties and their domains match, then, with some certainty degree, their ranges also match". Initial correspondences are translated into prior probability distributions which, along with meta-rules, make it possible to infer probability distributions for other correspondences. Rather than Bayesian networks, we follow a Bayesian approach to statistical inference in order to estimate trust. Like Mitra et al., we translate correspondences of initial alignments into prior probability distributions, which are later refined when processing instances included in peers' answers. For this, we profit not only from user feedback, but also from local ontology reasoning for limiting the recourse to user feedback. Therefore, any instance added to an ontology as a consequence of an estimation of trust affects later trust estimations, even concerning new queries, since this instance will have an impact on the local ontological reasoning carried out for these estimations.

ProbaMap is an example of an instance-based matching system [66]. This system implements an algorithm which automatically discovers probabilistic inclusion mappings between classes of two populated taxonomies. In order to estimate the probabilities of mappings, the authors follow a Bayesian approach to statistics which exploits content descriptions of instances categorized in each of the taxonomies under consideration. This comes down to determining how the instances of a class are classified in another class on the basis of their content descriptions. For this purpose, different classifiers (like Naive Bayes or decision trees) can be employed. The algorithm then returns mappings whose probability is greater than a given threshold. Monotonicity of the probability function is critical for avoiding the probability estimation of as many mappings as possible. The authors consider two different instance-based ways of modeling probabilistic mappings which guarantee monotonicity. One modeling is based on conditional probabilities, which is the one that we have also chosen to follow. Our approach is more adapted to a peer-to-peer setting than ProbaMap or classical instance-based matching systems. These systems very much depend on the availability of instances to compute sound and complete alignments. Nevertheless, in a semantic P2P network, data is distributed among peers, and instances are gradually acquired by peers as a result of a query-answering process. Therefore, when peers start interacting, initial matcher-computed alignments may be unsound and incomplete, and query translation based on these alignments may lead peers to populate their ontologies with unsatisfactory instances. Moreover, unsatisfactory instances may be due to some sort of malicious behavior by peers. Our approach helps to identify and discard these instances by computing their probability to be satisfactory on the basis of the trust towards the peers that provide them. In this computation we profit from provenance information.

In general, in a highly decentralized and dynamic environment such as a peer-to-peer network, where data is distributed among peers, and peers can join and leave the network at will, it is more adequate to concentrate on the ontology fragments involved in the queries peers are interested in, and to identify the peers that better answer these queries — and classify them as trustworthy peers — for future interactions, in a dynamic approach to ontology matching à la trust. In this way, we reduce the costly operation of matching complete ontologies whenever new data is exchanged or new peers appear.

Chapter 5

Demo: TrustMe, I got what you mean!

5.1 Introduction

Virtual online communities (social networks, wikis…) are becoming the major usage of the web. The freedom they give to publish and access information is attracting many web users. However, this freedom is filling up the web with varied information and viewpoints. This raises important issues that concern privacy and trust. Due to their decentralised nature, peer-to-peer (P2P) systems provide a partial solution to the privacy problem: each user (peer) can keep control over her own data by storing it locally and by deciding the access she wants to give to other peers. We focus on semantic P2P systems in which peers annotate their resources (documents, videos, photos, services) using ontologies.

Indexing resources using the terms of an ontology enables more accurate information retrieval and query answering than indexing by keywords of a textual annotation. In particular, it is a way to overcome the problems raised by homonymy in classical keyword search engines. As an example, the word "Paris" corresponds to (at least) four meanings: a city, a film, a mythological figure, and a genus of flowering plants. Classical keyword search engines such as Google or Yahoo! return for the keyword query "Paris" a mixture of web pages referring to its different meanings. In contrast, using as query interface an ontology in which a class named "Paris" is a subclass of a class named "Capitals" would remove the ambiguity about the interest of a user if she clicked on that particular class.

In semantic P2P systems, every peer is autonomous and free to organise her local resources as instances of classes of her own ontology, which serves as a query interface for other peers. Alignments between ontologies are required to reformulate queries from one local peer's vocabulary to another. Currently there exists a considerable number of matchers available for computing alignments automatically. Nonetheless, automatic matchers may fail to compute sound and complete semantic alignments, thus making query reformulation not fully reliable. Still, a trust mechanism can help peers to find the most reliable sources of information within the network.

This demo builds on the trust mechanism presented in Chapter 4. Specifically, this mechanism allows to estimate the probability that a peer will provide a satisfactory answer to a query posed by another peer. This probability is taken as the trust that the addressing peer has towards the addressed peer, and it is subject to the posed query. Unlike other existing approaches to trust, unsatisfactory answers are seen as the result of peers' incapacity to understand each other. Trust values are refined over time as more queries are sent and answers received, in a Bayesian learning process in which alignments are used to build initial prior distributions. As a by-product, this trust mechanism provides the means to rank, in the scope of a particular class of a peer's ontology, the set of received instances according to their probability to be satisfactory. For this, a function aggregates the trust values of all the peers that returned these instances.

5.2 Goals of the Demo

TrustMe demonstrates the use of our trust mechanism, presented in Chapter 4, for guiding query answering in semantic P2P social communities. Our view is that semantic P2P social communities illustrate the evolution of the semantic web towards a social web. We have decided to reproduce this vision via a semantic P2P bookmarking network in which peers exchange URLs of articles (web pages) about the topics they are interested in. Unlike in current bookmarking systems (e.g. Delicious), in this ideal P2P bookmarking network, information is no longer centralised, and peers need to query other peers to gather new articles. Each peer uses her own taxonomy of categories for tagging articles. These taxonomies can be seen as ontologies, so that each category C in a peer P's taxonomy is a class whose instances are URLs of articles initially indexed by C or by a subcategory (subclass) of C in the taxonomy. Then the set of instances of C can be gradually enriched by adding instances of classes D that belong to the taxonomies of P's acquainted peers and that happen to be aligned with C.

In this demo we highlight the gain in the quality of peers' answers — measured with precision and recall — when the process of query answering is guided by our trust mechanism presented in Chapter 4. In particular, we show how trust overcomes the problem of homonymy, still a constant battle for ontology matchers. Further, a trust-based ranking of instances makes it possible to distinguish those that are relevant to a class from those related to its homonymous classes.

5.3 Setting Up the P2P Bookmarking Scenario

Below we explain the generation of the taxonomies we have used for the semantic P2P book- marking scenario, the alignments between these taxonomies, and the topology of the network.

Taxonomy Generation. First, we selected a number of homonyms (e.g. "Paris" and "Rome") in different topics (e.g. cinema and cities). These homonyms are the names of some of the most specific classes (the leaves) of the peers' taxonomies. We built up the rest of the taxonomies manually by looking up Wikipedia categories organised in super-category and sub-category relations. In total, 5 taxonomies were generated. For example, one of the taxonomies is used by peers interested in cinema and music, and contains a category "Paris" which corresponds to a film. Another taxonomy is used by peers interested in cities and music, and contains a category "Paris" corresponding to the capital of France.

Alignment Generation. For generating alignments, we chose four matchers that regularly participate in OAEI [26] and ran them on the peers' taxonomies. From the resulting alignments, we selected the ones with the best F-measure values according to manually computed reference alignments.

The selected alignments happened to be both unsound and incomplete. In particular, matchers did not get over the homonymy of the taxonomies. For instance, the class Paris corresponding to a film was aligned with the class Paris corresponding to the capital of France.

Network Generation. Social networks are well known to exhibit small-world characteristics. In the demo we generate small-world topologies that contain up to 1000 peers. Each peer is randomly associated with one taxonomy to reproduce the fact that acquainted peers may or may not share the same topics of interest.
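A small-world topology of this kind could be generated, for instance, with the networkx library; the Watts–Strogatz parameters below are illustrative, not the ones used in the demo:

```python
import random
import networkx as nx

def generate_network(n_peers, taxonomies, k=4, p=0.1, seed=42):
    """Generate a small-world acquaintance graph and randomly assign
    one taxonomy to each peer (Watts-Strogatz parameters are illustrative)."""
    graph = nx.watts_strogatz_graph(n_peers, k, p, seed=seed)
    rng = random.Random(seed)
    assignment = {peer: rng.choice(taxonomies) for peer in graph.nodes}
    return graph, assignment

graph, assignment = generate_network(
    16, ["cinema+music", "cities+music", "mythology", "botany", "cities+cinema"])
```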

Taxonomy Population. For each category/class of each taxonomy we first built a reference set made up of the URLs of all the category's associated articles in Wikipedia, plus other related articles returned by Google that were manually checked to be relevant. Then, we populated each peer's "copy" of the category by randomly selecting an initial set containing 10% of the reference URLs. The URLs in this initial set are the only URLs known to the peer; the reference set is not known to the peer and is used only for evaluating the results and for simulating user feedback. In order to ensure that every leaf had at least 50 articles, we had to duplicate the web pages of the starting set (although they were identified with different URLs). This also guaranteed that each peer's taxonomy had at least 1000 articles.

FIGURE 5.1: A snapshot of the Main screen of TrustMe

5.4 TrustMe

In TrustMe we compare two scenarios: a naive scenario in which peers query all their acquaintances through query reformulation based on alignments, and a trust-based scenario in which peers compute trust values using our trust mechanism, presented in Chapter 4, and only accept answers from their trustworthy acquaintances. More specifically, if a peer Pi is interested in gathering new articles about a topic specified by the category C in her taxonomy at time t,

• in the naive scenario, if the correspondence ⟨C, D, =⟩ is in the alignment between Pi's and Pj's taxonomies, Pi will send the query Q = D(X)? to Pj, and Pj's answer will be added to Pi's taxonomy, but

• in the trust-based scenario, Pj's answer will not be added unless the trust that Pi has towards Pj w.r.t. the translation ⟨C, D⟩ at time t is greater than a given trust threshold (see the sketch below).
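A minimal sketch of the decision made in the trust-based scenario (all names are illustrative; the taxonomy is represented as a plain dictionary from classes to sets of URLs):

```python
def accept_answer(trust_value, instances, class_c, taxonomy, threshold):
    """Trust-based scenario: add P_j's instances to class C of P_i's
    taxonomy only if the trust in the translation <C, D> at time t
    is above the trust threshold."""
    if trust_value > threshold:
        taxonomy.setdefault(class_c, set()).update(instances)
        return True
    return False

taxonomy = {}
accept_answer(0.82, {"url1", "url2"}, "Paris", taxonomy, threshold=0.7)
print(taxonomy)  # -> {'Paris': {'url1', 'url2'}}
```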

To compare the two scenarios we use precision and recall. Moreover, we evaluate the probability of articles to be relevant to the categories they belong to. This provides the means for ranking articles within categories.

When TrustMe is launched, the main screen depicted in Figure 5.1 appears to the user. Using this screen the user can set up the values of several parameters: the number of peers, the trust threshold, the size of the samples used when performing Bayesian learning in our trust mechanism (presented in Chapter 4), and the minimum number of generated queries per class and peer during the simulation.

FIGURE 5.2: Snapshot showing the topology of a generated network with 16 peers.

After setting up the required parameters, the user presses the "Generate Network" button to generate the semantic P2P network. If the number of peers is less than 50, TrustMe shows the topology of the generated network to the user, as depicted in Figure 5.2.

Pressing the play button runs the demo until the correspondences of all the alignments in the network have been randomly selected the desired number of times, set by the user in the "NB. Queries per Class" field. Figure 5.3 shows a snapshot of TrustMe when the user chose one peer from the network after running the demo. It shows the ranked results returned when querying the peer's neighbours about a category. More specifically, it presents the articles added to Paris, the capital of France, in the taxonomy of the chosen peer in both scenarios (the trust-based scenario's results are in the upper left while the naive scenario's results are in the lower left). We clicked on one of the trust-based results and on one of the naive ones. The former resulted in the Wikipedia article of Paris, and the latter in the Wikipedia article of the film "Paris". The demo also shows, in the upper-right graph, the trust values the peer has towards her neighbours over time. The graph shows 2 trustworthy neighbours against 3 untrustworthy ones. In the lower right of the screen the demo shows the precision and recall for that particular peer in both the naive and the trust-based scenarios.

FIGURE 5.3: Snapshot showing the articles added to the Paris (city) class of one peer's taxonomy.

Table 5.1 presents the average values of precision and recall for classes corresponding to four meanings of Paris in the two scenarios. These averages were computed over the set of all peers sharing each of the meanings. To compute them we used the following values: number of peers = 100, threshold = 0.7, size of the samples = 15%. We ran the demo until the correspondences of all the alignments in the network had been randomly selected 5 times. The results show that our trust mechanism, presented in Chapter 4, guarantees high precision. On the other hand, as one could expect, recall is higher in the naive scenario.

                       Trust Scenario         Naive Scenario
category/class         Precision   Recall     Precision   Recall
Paris (city)           88%         64%        48%         96%
Paris (film)           85%         80%        27%         100%
Paris (mythology)      100%        66%        14%         98%
Paris (plant)          80%         97%        16%         100%

TABLE 5.1: Comparison of averaged precision and recall values for the Paris homonyms.

Chapter 6

Conclusion and Future Works

In this thesis we have addressed two main research problems in the Semantic Web: the data linkage problem in the setting of Linked Data and the problem of modeling and computing trust in Semantic P2P Networks. In this chapter we give a summary of the primary contributions to each of these research problems and of expected future works. Figure 6.1 gives an overview of these contributions.

FIGURE 6.1: Summary of the contributions


6.1 Data Linkage

Data linkage is a crucial task in Linked Data. In particular, it is very important to correctly decide whether two URIs refer to the same real-world entity or not. These two URIs may appear in two different datasets in the Linked Data cloud or in the same dataset.

In this thesis, we have modeled the data linkage problem in Linked Data as a reasoning problem over possibly decentralized data. This reasoning is done using logical rules that capture in a uniform way the various pieces of certain knowledge that may be available about the data. In particular, they capture ontological constraints (e.g. functional properties, keys, …), ontology alignments and rules on the target domain.

We have described a novel import-by-query algorithm that alternates steps of sub-query rewriting and of tailored querying of the Linked Data cloud in order to import data as specific as possible for inferring or contradicting given target same-as facts. Contrary to other existing approaches, ours does not perform a global import (of the whole RDF graph of a reference dataset in Linked Data, such as DBpedia) to complete the local data. Such full data enrichment is not feasible in practice, as it may be very costly in terms of memory space and processing.

Experiments conducted on a real-world dataset have demonstrated the feasibility of this ap- proach and its usefulness in practice for data linkage and disambiguation.

Furthermore, we have proposed an adaptation of our rule-based approach for data linkage to deal with uncertain facts and rules. In this adaptation we modeled the uncertainty of facts and rules as probability values. We have proposed a fact propagation algorithm (ProbFR) that computes not only the set of facts that can be inferred from a given set of facts and a set of rules, but also their probabilistic weights. Our experiments have shown that our algorithm scales to large data sets and produces meaningful probabilistic weights.
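The exact probabilistic semantics of ProbFR is the one defined in Chapter 2; purely as an illustration of weighted forward chaining, the following naive loop combines weights with a product inside rule bodies (assuming independence) and a max across alternative derivations — a simplification, not ProbFR itself:

```python
def propagate(facts, rules, max_iterations=100):
    """Naive weighted forward chaining.

    facts -- dict mapping a fact to a probabilistic weight
    rules -- list of (body, head, rule_weight), body a tuple of facts

    Weights combine with a product inside a rule body (independence
    assumption) and a max across alternative derivations; this is an
    illustration only, not the actual ProbFR algorithm.
    """
    for _ in range(max_iterations):
        changed = False
        for body, head, rule_weight in rules:
            if all(b in facts for b in body):
                w = rule_weight
                for b in body:
                    w *= facts[b]
                if w > facts.get(head, 0.0):
                    facts[head] = w
                    changed = True
        if not changed:  # fixpoint reached
            break
    return facts

facts = {("x1", "sameName", "x2"): 1.0,
         ("x1", "birthYear", "1950"): 1.0,
         ("x2", "birthYear", "1950"): 1.0}
rules = [((("x1", "sameName", "x2"),
           ("x1", "birthYear", "1950"),
           ("x2", "birthYear", "1950")),
          ("x1", "sameAs", "x2"), 0.3)]
print(propagate(facts, rules)[("x1", "sameAs", "x2")])  # -> 0.3
```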

In addition, we have proposed an extension of our import-by-query algorithm that deals with uncertainty, called the probabilistic import-by-query algorithm. It alternates steps of sub-query rewriting based on backward chaining and of external query evaluation. It returns as output not only a (true or false) answer, as its predecessor does, but also a probabilistic weight greater than or equal to a given threshold.

Future Works

As future work, we plan to conduct more comprehensive experiments with our import-by-query and probabilistic import-by-query algorithms using reference datasets (e.g. MusicBrainz, IMDB, …) in addition to the INA dataset, DBpedia.fr and DBpedia.org used so far.

Also, we plan to investigate the possible heuristics that can be used by our import-by-query algorithms and their impact on the number of imported facts per input user query — in particular, the heuristic that is used to rank the queries produced as output by the sub-query rewriting step before submitting them to Linked Data. The heuristic we used in our experiments is based on the number of triples in an external query. Many other factors can be used for other heuristics (e.g. statistics about the external datasets, the available number of facts about the properties appearing in the external query, …).

Furthermore, we plan to benefit from the weighted contradictory facts discovered by our probabilistic algorithms to adapt the probabilistic weights attached to the uncertain facts and rules. More precisely, it may happen that the probabilistic algorithms infer two contradictory facts about the same two entities: the first fact states that the two entities are the same, while the other states that they are different. This means that one or several of the uncertain facts or uncertain rules that led to inferring these contradictory facts are wrong. In such a case we can suppose that the fact with the lowest probabilistic weight among the two contradictory facts is the wrong one. Then, we apply a "punishment" method that reduces the probabilistic weights of all the uncertain facts and rules that led to inferring this fact.
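A possible sketch of such a punishment step (the decay factor and the provenance bookkeeping are illustrative assumptions):

```python
def punish(weights, support_of, fact_a, fact_b, decay=0.9):
    """Given two contradictory weighted facts, assume the one with the
    lower weight is wrong and decay the weights of the uncertain facts
    and rules that supported its derivation.

    weights    -- dict mapping facts/rules to probabilistic weights
    support_of -- dict mapping a derived fact to the facts/rules used
    """
    wrong = fact_a if weights[fact_a] <= weights[fact_b] else fact_b
    for item in support_of.get(wrong, []):
        weights[item] *= decay  # illustrative decay factor
    return wrong
```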

Another future work we plan to explore is how to use our rule-based approach for Data Fusion. Once sameAs links are computed between URIs in a particular dataset and the entities identified by these URIs are confirmed to represent the same real-world object, the question is how to fuse the information attached to these entities and attach it to one representative entity in a consistent, accurate, and useful manner. In particular, any data fusion technique available in such a setting must be able to deal with conflicts that may appear in the descriptions of these entities, caused by outdated information, inaccurate information or errors. The rule-based approach can be used to solve the Data Fusion problem. An example of a rule that can be used to solve this problem is:

IF (1) two person entities per_new and per_old are identified to be the same using owl:sameAs, (2) per_new has a particular occupation occ1 and per_old has another, different occupation occ2, and (3) the fact that describes the occupation of per_new was created after the fact that describes the occupation of per_old, THEN the entity e_fusion that merges per_new and per_old must have the occupation occ1.
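Operationally, this fusion rule could look as follows (field names and timestamps are hypothetical):

```python
def fuse_occupation(per_new, per_old):
    """Conflict resolution for the occupation of two sameAs entities:
    keep the occupation whose fact was created most recently
    (field names are hypothetical)."""
    newest = max(per_new, per_old, key=lambda p: p["occupation_created"])
    return newest["occupation"]

print(fuse_occupation(
    {"occupation": "journalist", "occupation_created": 2012},
    {"occupation": "teacher", "occupation_created": 2005}))  # -> journalist
```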

6.2 Trust in Semantic P2P Networks

We have proposed an approach to trust that is adapted to a Semantic P2P Network setting. It exploits the knowledge provided by alignments (possibly unsound and/or incomplete) together with user feedback in order to compute and refine trust over time. In our approach, the trust that a peer P1 has towards another peer P2 is subjective and depends on a specific query. It is modeled as a random variable representing P1's belief that the answers returned by P2 are satisfactory.

We have proposed a trust mechanism that uses alignments between peers’ ontologies to construct prior distributions of these beliefs, from which initial values of trust are calculated. Then, it performs Bayesian inference to build posterior distributions of peers’ beliefs from their prior distributions and user feedback on a sample of answers.

For the evaluation of our mechanism of trust, we have designed a scenario representative of the novel topic of social networks and the semantic web. Based on this scenario, we have built a semantic P2P bookmarking system (TrustMe) in which we can vary different quantitative and qualitative parameters so as to measure their influence on (i) the convergence of trust, and (ii) the gain in the quality of peer answers — measured with precision and recall — when query- answering is guided by trust.

Experimental results showed that trust values converge as more queries are sent and answers received. Furthermore, the use of trust improves the quality of answers in terms of both precision and recall.

We plan to extend our model of trust in different ways. In the current version of the model, peers' queries are not propagated. Even so, as our experimental results confirmed (see Section 4.4.2), trust ensures reasonably high recall values. Indeed, among the instances that a peer Pi receives from one of its trustworthy acquaintances Pj, and that Pi adds to its populated ontology, there might be instances that Pj previously received from one of its trustworthy acquaintances Pk, also trustworthy for Pi but not acquainted with it, and that Pj added to its populated ontology. This, however, would not happen if Pi considered Pj to be untrustworthy, as Pi would not accept Pj's instances. Thus, query propagation seems to be necessary to ensure total recall. In the case of answers coming from propagated queries, we plan to use composition of alignments to build prior beliefs of trust [25].

The ontologies considered in our approach are taxonomies of classes with disjointness relationships, and the query language studied only allows peers to request instances of classes. We plan to extend our model in order to deal with more expressive ontology and query languages.

In our model of trust, instances peers are not 100% sure of are attached a probability to be satisfactory, which is computed on the basis of provenance information. The provenance of an instance a at time t with respect to a peer Pi is the set of peers that, in the light of Pi's past experience at time t, have a in their populated ontologies. Modeling provenance as a set already allows us to estimate the probability of instances to be satisfactory. Specifically, the probability of the instance a to be satisfactory is estimated with the mean of the trust values of the peers included in its provenance set, i.e. the trust values of the peers that provided a to Pi and that, as far as Pi is aware, have not discarded it. Using the mean to aggregate trust values is a first solution, but other possible aggregation functions can be studied (e.g. min or max values). Moreover, provenance can be modeled in a sequential way so that it captures the "history" of instances (e.g. which peer provided an instance first). This may allow unsatisfactory instances to be identified more efficiently.

One of the sources of information often used to compute trust is so-called witness information, also known as transitivity of trust [46]. It can be summarized as follows: if Pi trusts Pj, and Pj trusts Pk, then Pi trusts Pk. Transitivity of trust cannot be applied directly in our framework since transitivity of logical subsumptions does not hold when these are modeled as conditional probabilities [5]. Nonetheless, there exist alternative inference rules which could be applied in our case. For these, it seems that Pj should act as a "witness" by providing information about the instances she received from Pk. With the metaphor of trust, we look at semantic interoperability as the result of some sort of "social" computing. Our claim is that ontology matching is a starting point towards semantic interoperability that can be complemented with dynamic techniques which exploit collaboration between users/agents/peers.

Appendix A

The set of rules used in the experiments

In this appendix we present the certain rules that we used during our experiments.

The first set of rules, depicted in Table A.1, is used to distinguish homonymous person entities inside the INA dataset. The name of a person entity in the INA dataset is expressed as a value of the property skos:altLabel or the property skos:prefLabel. Thus, the rules r1, r2 and r3 are used to identify the homonymous persons by linking them together with the property ina:sameName. Slight differences between two names (including differences caused by special characters or letter case) are tolerated using the built-in function Similar. The key is expressed using the rule r4, while the rules r5 and r6 express that both birthYear and deathYear are functional properties. Our goal is to disambiguate only the person entities that share the same name; thus, the entities that can be disambiguated through the rules r5 and r6 are required to be homonymous.
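The built-in Similar(x, y, θ) can be read as a normalized string-similarity test; a possible stand-in using Python's difflib is sketched below (the actual similarity function used in the experiments may differ):

```python
from difflib import SequenceMatcher

def similar(name1, name2, threshold):
    """Stand-in for the built-in Similar: normalise case and surrounding
    whitespace, then compare a normalized similarity ratio against the
    threshold (difflib is an illustrative choice)."""
    a, b = name1.casefold().strip(), name2.casefold().strip()
    return SequenceMatcher(None, a, b).ratio() >= threshold

print(similar("Jacques Martin", "jacques  martin", 0.9))  # -> True
```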

The rules in Table A.2 are used to link, via the property ina:sameNameDBp, the homonymous persons in the INA dataset to the corresponding persons in DBpedia that have similar names. The following alignments between the INA properties and the DBpedia properties are expressed in these rules: ⟨skos:altLabel, foaf:name, =⟩, ⟨skos:prefLabel, foaf:name, =⟩, ⟨skos:altLabel, rdfs:label, =⟩, ⟨skos:prefLabel, rdfs:label, =⟩, ⟨skos:altLabel, prop-fr:nom, =⟩, ⟨skos:prefLabel, prop-fr:nom, =⟩. As with the rules in Table A.1, slight differences between names are tolerated using the built-in function Similar.

r1: IF ⟨?x1, skos:altLabel, ?name1⟩, ⟨?x2, skos:altLabel, ?name2⟩, Similar(?name1, ?name2, 0.99)
    THEN ⟨?x1, ina:sameName, ?x2⟩
r2: IF ⟨?x1, skos:prefLabel, ?name1⟩, ⟨?x2, skos:prefLabel, ?name2⟩, Similar(?name1, ?name2, 0.99)
    THEN ⟨?x1, ina:sameName, ?x2⟩
r3: IF ⟨?x1, skos:altLabel, ?name1⟩, ⟨?x2, skos:prefLabel, ?name2⟩, Similar(?name1, ?name2, 0.99)
    THEN ⟨?x1, ina:sameName, ?x2⟩
r4: IF ⟨?x1, ina:sameName, ?x2⟩, ⟨?x1, ina:birthYear, ?Y1⟩, ⟨?x2, ina:birthYear, ?Y1⟩, ⟨?x1, ina:deathYear, ?Y2⟩, ⟨?x2, ina:deathYear, ?Y2⟩
    THEN ⟨?x1, ina:sameAs, ?x2⟩
r5: IF ⟨?x1, ina:sameName, ?x2⟩, ⟨?x1, ina:birthYear, ?Y1⟩, ⟨?x2, ina:birthYear, ?Y2⟩, notEqual(?Y1, ?Y2)
    THEN ⟨?x1, ina:differentFrom, ?x2⟩
r6: IF ⟨?x1, ina:sameName, ?x2⟩, ⟨?x1, ina:deathYear, ?Y1⟩, ⟨?x2, ina:deathYear, ?Y2⟩, notEqual(?Y1, ?Y2)
    THEN ⟨?x1, ina:differentFrom, ?x2⟩

TABLE A.1: The set of rules that disambiguate homonymous persons inside the INA dataset


r7:  IF ⟨?x1, foaf:name, ?name1⟩, ⟨?x2, skos:altLabel, ?name2⟩, Similar(?name1, ?name2, 0.99) THEN ⟨?x1, ina:sameNameDBp, ?x2⟩
r8:  IF ⟨?x1, foaf:name, ?name1⟩, ⟨?x2, skos:prefLabel, ?name2⟩, Similar(?name1, ?name2, 0.99) THEN ⟨?x1, ina:sameNameDBp, ?x2⟩
r9:  IF ⟨?x1, rdfs:label, ?name1⟩, ⟨?x2, skos:prefLabel, ?name2⟩, Similar(?name1, ?name2, 0.99) THEN ⟨?x1, ina:sameNameDBp, ?x2⟩
r10: IF ⟨?x1, rdfs:label, ?name1⟩, ⟨?x2, skos:altLabel, ?name2⟩, Similar(?name1, ?name2, 0.99) THEN ⟨?x1, ina:sameNameDBp, ?x2⟩
r11: IF ⟨?x1, prop-fr:nom, ?name1⟩, ⟨?x2, skos:prefLabel, ?name2⟩, Similar(?name1, ?name2, 0.99) THEN ⟨?x1, ina:sameNameDBp, ?x2⟩
r12: IF ⟨?x1, prop-fr:nom, ?name1⟩, ⟨?x2, skos:altLabel, ?name2⟩, Similar(?name1, ?name2, 0.99) THEN ⟨?x1, ina:sameNameDBp, ?x2⟩

TABLE A.2: The set of rules that link homonymous persons in the INA dataset to persons with similar names in DBpedia

r13: IF ⟨?x1, ina:sameNameDBp, ?x2⟩, ⟨?x1, dbpedia:birthYear, ?Y1⟩, ⟨?x2, ina:birthYear, ?Y1⟩, ⟨?x1, dbpedia:deathYear, ?Y2⟩, ⟨?x2, ina:deathYear, ?Y2⟩
     THEN ⟨?x1, ina:sameAs, ?x2⟩
r14: IF ⟨?x1, ina:sameNameDBp, ?x2⟩, ⟨?x1, dbpedia:birthYear, ?Y1⟩, ⟨?x2, ina:birthYear, ?Y2⟩, notEqual(?Y1, ?Y2)
     THEN ⟨?x1, ina:differentFrom, ?x2⟩
r15: IF ⟨?x1, ina:sameNameDBp, ?x2⟩, ⟨?x1, dbpedia:deathYear, ?Y1⟩, ⟨?x2, ina:deathYear, ?Y2⟩, notEqual(?Y1, ?Y2)
     THEN ⟨?x1, ina:differentFrom, ?x2⟩

TABLE A.3: The set of rules that infer sameAs and differentFrom links between INA persons and DBpedia persons

Rules that conclude the owl:sameAs and owl:differentFrom links between the entities of type ina:PhyisicalPerson from the INA dataset and entities from the DBpedia dataset are depicted in Table A.3. These rules are obtained by propagating the alignments ⟨ina:birthYear, dbpedia:birthYear, =⟩ and ⟨ina:deathYear, dbpedia:deathYear, =⟩ in the rules r4, r5 and r6.

Inferring the owl:sameAs and owl:differentFrom links between the entities of type ina:VideoPerson from the INA dataset and entities from the DBpedia dataset is done by applying the rules depicted in Table A.4. These rules were provided by the INA experts. They express that two homonymous persons are the same if they played an important role (presenter, director, …) in the same TV program. In DBpedia, persons are linked to the programs in which they played an important role using the property dbpedia:wikiPageWikiLink. Their particular roles are rarely specified. For example, in DBpedia.fr the animator Jacques Martin appears only as the presenter of the program "Le petit rapporteur" although he presented many other programs (e.g. "Dimanche Martin", "Ainsi font, font, font", "Midi-Magazine", …). For this reason, these rules do not specify the role of the DBpedia entities in the programs.

As we explained before in Section 2.4.4, the renaming technique is applied to properties that are both EDB (appearing in the set of ground facts) and IDB (appearing in the conclusion of the ground rules) at the same time, which is the case of owl:sameAs and owl:differentFrom. These properties are renamed in the ground facts to the EDB properties ina:sameAs and ina:differentFrom accordingly. Then, the rule r30 is used to express that ina:sameAs and owl:sameAs are equivalent, and the rule r31 is used to express that ina:differentFrom and owl:differentFrom are equivalent.

Rules r16 to r29 all share the following pattern, instantiated with a role property ROLE and a title property TITLE:

IF ⟨?x1, ina:sameNameDBp, ?x2⟩, ⟨?v, ROLE, ?x2⟩, ⟨?v, TITLE, ?titre⟩, ⟨?x1, dbpedia:wikiPageWikiLink, ?link⟩, ⟨?link, rdfs:label, ?titre⟩
THEN ⟨?x1, ina:sameAs, ?x2⟩

where ROLE is one of ina:aPourPresentateur (r16, r23), ina:aPourRealisateur (r17, r24), ina:aPourProducteur (r18, r25), ina:aPourChefOrchestre (r19, r26), ina:aPourCommentateur (r20, r27), ina:aPourAssistantRealisation (r21, r28) and ina:aPourAuteurOeuvreOriginalTele (r22, r29), and TITLE is ina:aPourTitreCollection for r16–r22 and ina:aPourTitreHorraire for r23–r29.

TABLE A.4: Rules that disambiguate homonymous persons based on their roles in a video

r30: IF ⟨?x1, owl:sameAs, ?x2⟩ THEN ⟨?x1, ina:sameAs, ?x2⟩
r31: IF ⟨?x1, owl:differentFrom, ?x2⟩ THEN ⟨?x1, ina:differentFrom, ?x2⟩

TABLE A.5: Rules used in the renaming technique for the ground sameAs and differentFrom facts

r32: IF ⟨?x1, ina:sameAs, ?x2⟩ THEN ⟨?x2, ina:sameAs, ?x1⟩
r33: IF ⟨?x1, ina:sameAs, ?x2⟩, ⟨?x2, ina:sameAs, ?x3⟩ THEN ⟨?x1, ina:sameAs, ?x3⟩
r34: IF ⟨?x1, ina:differentFrom, ?x2⟩ THEN ⟨?x2, ina:differentFrom, ?x1⟩
r35: IF ⟨?x1, ina:sameAs, ?x2⟩, ⟨?x2, ina:differentFrom, ?x3⟩ THEN ⟨?x1, ina:differentFrom, ?x3⟩

TABLE A.6: Rules that express the semantics of owl:sameAs and owl:differentFrom

Rules that express the semantics of owl:sameAs and owl:differentFrom are depicted in Table A.6.

Appendix B

The set of uncertain rules used in the experiments

In this appendix we list the set of uncertain rules used during our experiments in Chapter 2, their ranking on a scale from 1 to 3 assigned to them by domain experts, and their probabilistic weights.

Rules r1 and r2 in Table B.1 are used to link two INA homonymous persons with a sameAs link if they share the same birthYear or the same deathYear, while rules r3 and r4 are used to link a person from the INA dataset with a person from DBpedia; they are obtained by propagating the two alignments ⟨ina:birthYear, dbpedia:birthYear, =⟩ and ⟨ina:deathYear, dbpedia:deathYear, =⟩ inside r1 and r2.

r1: IF ⟨?x1, ina:sameName, ?x2⟩, ⟨?x1, ina:birthYear, ?year⟩, ⟨?x2, ina:birthYear, ?year⟩ THEN ⟨?x1, ina:sameAs, ?x2⟩   (rank 1, weight 0.3)
r2: IF ⟨?x1, ina:sameName, ?x2⟩, ⟨?x1, ina:deathYear, ?year⟩, ⟨?x2, ina:deathYear, ?year⟩ THEN ⟨?x1, ina:sameAs, ?x2⟩   (rank 1, weight 0.3)
r3: IF ⟨?x1, ina:sameNameDBp, ?x2⟩, ⟨?x1, dbpedia:birthYear, ?year⟩, ⟨?x2, ina:birthYear, ?year⟩ THEN ⟨?x1, ina:sameAs, ?x2⟩   (rank 1, weight 0.3)
r4: IF ⟨?x1, ina:sameNameDBp, ?x2⟩, ⟨?x1, dbpedia:deathYear, ?year⟩, ⟨?x2, ina:deathYear, ?year⟩ THEN ⟨?x1, ina:sameAs, ?x2⟩   (rank 1, weight 0.3)

TABLE B.1: Rules that disambiguate homonymous persons based on birthYear and deathYear

The property ina:hasOccupation is used in the INA dataset to describe the professions (or the situations in general) of a person entity. Thus, rule r5 links two INA homonymous persons with a sameAs link if they have two professions whose labels are similar up to a threshold of 0.9. The rule r6 propagates the alignment ⟨ina:hasOccupation, dbpedia:hasOccupation, =⟩ inside the rule r5.


r5: IF ⟨?x1, ina:sameName, ?x2⟩, ⟨?x1, ina:hasOccupation, ?occ1⟩, ⟨?x2, ina:hasOccupation, ?occ2⟩, ⟨?occ1, rdfs:label, ?label1⟩, ⟨?occ2, rdfs:label, ?label2⟩, Similar(?label1, ?label2, 0.9) THEN ⟨?x1, ina:sameAs, ?x2⟩   (rank 2, weight 0.6)
r6: IF ⟨?x1, ina:sameNameDBp, ?x2⟩, ⟨?x1, dbpedia:occupation, ?occ1⟩, ⟨?x2, ina:hasOccupation, ?occ2⟩, ⟨?occ1, rdfs:label, ?label1⟩, ⟨?occ2, rdfs:label, ?label2⟩, Similar(?label1, ?label2, 0.9) THEN ⟨?x1, ina:sameAs, ?x2⟩   (rank 2, weight 0.6)

TABLE B.2: Rules that disambiguate homonymous persons based on their occupations

The rules in Table B.3 are used to link two homonymous persons, one from the INA dataset and the other from DBpedia, if the INA person has a particular role in a video and this video has a title similar to the label of one of the objects linked to the DBpedia person. The roles used in these rules are less important than the ones used before in the certain rules of Table A.4.

Rule r17 is used to link two homonymous persons if they are linked to the same video entity, whatever their roles inside this video are.

Rules r7 to r16 all share the following pattern (rank 2, weight 0.6), instantiated with a role property ROLE and a title property TITLE:

IF ⟨?x1, ina:sameNameDBp, ?x2⟩, ⟨?v, ROLE, ?x2⟩, ⟨?v, TITLE, ?titre⟩, ⟨?x1, dbpedia:wikiPageWikiLink, ?link⟩, ⟨?link, rdfs:label, ?titre⟩
THEN ⟨?x1, ina:sameAs, ?x2⟩

where ROLE is one of ina:aPourJRI (r7, r12), ina:aPourInterprete (r8, r13), ina:aPourParticipant (r9, r14), ina:aPourMetteurEnSceneTheatre (r10, r15) and ina:aPourMembreOrchetre (r11, r16), and TITLE is ina:aPourTitreCollection for r7–r11 and ina:aPourTitreHorraire for r12–r16.

TABLE B.3: Rules that disambiguate homonymous persons based on their roles in a video

r17: IF ⟨?x1, ina:sameName, ?x2⟩, ⟨?v, rdf:type, ina:video⟩, ⟨?v, ?p1, ?x1⟩, ⟨?v, ?p2, ?x2⟩ THEN ⟨?x1, ina:sameAs, ?x2⟩   (rank 3, weight 0.9)

TABLE B.4: Rules that disambiguate homonymous persons if they are related to the same video

Bibliography

[1] Alfarez Abdul-Rahman and Stephen Hailes. Supporting trust in virtual communities. In Proceedings of the 33rd Annual Hawaii International Conference on System Sciences (HICSS-33 2000), 4-7 January, 2000, Maui, HI, USA. IEEE Computer Society, 2000.

[2] Serge Abiteboul, Zoë Abrams, Stefan Haar, and Tova Milo. Diagnosis of asynchronous discrete event systems: datalog to the rescue! In PODS'05, pages 358–367, 2005.

[3] Serge Abiteboul, Richard Hull, and Victor Vianu. Foundations of Databases. Addison-Wesley, 1995.

[4] Maribel Acosta, Amrapali Zaveri, Elena Simperl, Dimitris Kontokostas, Sören Auer, and Jens Lehmann. Crowdsourcing linked data quality assessment. In Harith Alani, Lalana Kagal, Achille Fokoue, Paul Groth, Chris Biemann, Josiane Xavier Parreira, Lora Aroyo, Natasha Noy, Chris Welty, and Krzysztof Janowicz, editors, The Semantic Web – ISWC 2013, volume 8219 of Lecture Notes in Computer Science, pages 260–276. Springer Berlin Heidelberg, 2013.

[5] Ernest W. Adams. A primer of probability logic. CSLI, 1998.

[6] Philippe Adjiman, Philippe Chatalic, François Goasdoué, Marie-Christine Rousset, and Laurent Simon. Distributed reasoning in a peer-to-peer setting: application to the semantic web. Journal of Artificial Intelligence Research, 25:269–314, 2006.

[7] Mustafa Al-Bakri, Manuel Atencia, and Marie-Christine Rousset. TrustMe, I got what you mean! A trust-based semantic P2P bookmarking system. In Knowledge Engineering and Knowledge Management - 18th International Conference, EKAW 2012, Galway City, Ireland, October 8-12, 2012. Proceedings, volume 7603 of LNCS, pages 442–445. Springer, 2012.

[8] Sivan Albagli, Rachel Ben-Eliyahu-Zohary, and Solomon Eyal Shimony. Markov network based ontology matching. Journal of Computer and System Sciences, 78(1):105–118, 2012.

[9] Donovan Artz and Yolanda Gil. A survey of trust in computer science and the semantic web. Journal of Web Semantics, 5(2):58–71, 2007.

[10] Manuel Atencia, Mustafa Al-Bakri, and Marie-Christine Rousset. Trust in networks of ontologies and alignments. Journal of Knowledge and Information Systems, pages 1–27, 2013.

[11] Manuel Atencia, Jérôme David, and François Scharffe. Keys and pseudo-keys detection for web datasets cleansing and interlinking. In EKAW'12, pages 144–153, 2012.

[12] Manuel Atencia, Jérôme David, and François Scharffe. Keys and pseudo-keys detection for web datasets cleansing and interlinking. In Annette ten Teije, Johanna Völker, Siegfried Handschuh, Heiner Stuckenschmidt, Mathieu d'Aquin, Andriy Nikolov, Nathalie Aussenac-Gilles, and Nathalie Hernandez, editors, Knowledge Engineering and Knowledge Management - 18th International Conference, EKAW 2012, Galway City, Ireland, October 8-12, 2012. Proceedings, volume 7603 of Lecture Notes in Computer Science, pages 144–153. Springer, 2012.

[13] Manuel Atencia, Jérôme Euzenat, Giuseppe Pirrò, and Marie-Christine Rousset. Alignment-based trust for resource finding in semantic P2P networks. In The Semantic Web - ISWC 2011 - 10th International Semantic Web Conference, Bonn, Germany, October 23-27, 2011, Proceedings, Part I, volume 7031 of LNCS, pages 51–66. Springer, 2011.

[14] Tim Berners-Lee. Linked Data. W3C Design Issues, 2006-07-27.

[15] Tim Berners-Lee, Wendy Hall, James A. Hendler, Kieron O’Hara, Nigel Shadbolt, and Daniel J. Weitzner. A framework for web science. Foundations and Trends in Web Science, 1(1):1–130, 2006.

[16] Tim Berners-Lee, James Hendler, and Ora Lassila. The semantic web. Scientific American, 284(5):34–43, May 2001.

[17] Tim Berners-Lee, James Hendler, and Ora Lassila. The semantic web. Scientific American, 284(5):34–43, 2001.

[18] P. Billingsley. Probability and Measure. Wiley Series in Probability and Mathematical Statistics. Wiley, 1979.

[19] Christian Bizer, Jens Lehmann, Georgi Kobilarov, Sören Auer, Christian Becker, Richard Cyganiak, and Sebastian Hellmann. DBpedia – a crystallization point for the web of data. Web Semantics: Science, Services and Agents on the World Wide Web, 7(3):154–165, 2009.

[20] Christoph Böhm, Felix Naumann, Ziawasch Abedjan, Dandy Fenz, Toni Grütze, Daniel Hefenbrock, Matthias Pohl, and David Sonnabend. Profiling linked open data with ProLOD. In Workshops Proceedings of the 26th International Conference on Data Engineering, ICDE 2010, March 1-6, 2010, Long Beach, California, USA, pages 175–178. IEEE, 2010.

[21] William M. Bolstad. Introduction to Bayesian statistics. Wiley, 2007.

[22] Sergey Brin and Lawrence Page. The anatomy of a large-scale hypertextual web search engine. Computer Networks, 30(1-7):107–117, 1998.

[23] Andrea Calì, Georg Gottlob, and Thomas Lukasiewicz. A general Datalog-based framework for tractable query answering over ontologies. JWS, 14:57–83, 2012.

[24] Diego Calvanese, Giuseppe De Giacomo, Domenico Lembo, Maurizio Lenzerini, and Riccardo Rosati. Tractable reasoning and efficient query answering in description logics: The DL-Lite family. JAR, 39(3):385–429, 2007.

[25] Jérôme Euzenat. Algebras of ontology alignment relations. In The Semantic Web - ISWC 2008, 7th International Semantic Web Conference, ISWC 2008, Karlsruhe, Germany, October 26-30, 2008. Proceedings, volume 5318 of LNCS, pages 387–402. Springer, 2008.

[26] Jérôme Euzenat, Christian Meilicke, Heiner Stuckenschmidt, Pavel Shvaiko, and Cássia Trojahn dos Santos. Ontology Alignment Evaluation Initiative: six years of experience. Journal on Data Semantics XV, volume 6720 of LNCS, pages 158–192, 2011.

[27] Jérôme Euzenat and Pavel Shvaiko. Ontology matching. Springer, 2007.

[28] Alfio Ferrara, Andriy Nikolov, and Franc¸ois Scharffe. Data linking for the semantic web. IJSWIS, 7(3):46–76, 2011.

[29] Charles L. Forgy. Expert Systems, chapter Rete: A Fast Algorithm for the Many Pattern/Many Object Pattern Match Problem, pages 324–341. IEEE Computer Society Press, 1990.

[30] Norbert Fuhr. Probabilistic Datalog: Implementing logical information retrieval for advanced applications. Journal of the American Society for Information Science, 51(2):95–110, 2000.

[31] Norbert Fuhr. Probabilistic Datalog: Implementing logical information retrieval for advanced applications. Journal of the American Society for Information Science, 51(2):95–110, 2000.

[32] Jennifer Golbeck. Combining provenance with trust in social networks for semantic web content filtering. In Provenance and Annotation of Data, International Provenance and Annotation Workshop, IPAW 2006, Chicago, IL, USA, May 3-5, 2006, Revised Selected Papers, volume 4145 of LNCS, pages 101–108. Springer, 2006.

[33] Wael H. Gomaa and Aly A. Fahmy. A survey of text similarity approaches. International Journal of Computer Applications, 68(13):13–18, April 2013.

[34] Bernardo Cuenca Grau and Boris Motik. Reasoning over ontologies with hidden content: The import-by-query approach. CoRR, abs/1401.5853, 2014.

[35] Todd J. Green, Gregory Karvounarakis, and Val Tannen. Provenance semirings. In Leonid Libkin, editor, Proceedings of the Twenty-Sixth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, June 11-13, 2007, Beijing, China, pages 31–40. ACM, 2007.

[36] Ali Hasnain, Mustafa Al-Bakri, Luca Costabello, Zijie Cong, Ian Davis, and Tom Heath. Spamming in Linked Data. In Third International Workshop on Consuming Linked Data (COLD2012), Boston, USA, November 2012.

[37] Martin Hepp. GoodRelations: An ontology for describing products and services offers on the web. In Knowledge Engineering: Practice and Patterns, pages 329–346. Springer, 2008.

[38] Martin Hepp. The web of data for e-commerce in brief. In Web Engineering, pages 510–511. Springer, 2012.

[39] Knut Hinkelmann and Helge Hintze. Computing cost estimates for proof strategies. In ELP’93, volume 798, pages 152–170, 1993.

[40] Aidan Hogan, Antoine Zimmermann, Jürgen Umbrich, Axel Polleres, and Stefan Decker. Scalable and distributed methods for entity matching, consolidation and disambiguation over linked data corpora. JWS, 10:76–110, 2012.

[41] Audun Jøsang, Roslan Ismail, and Colin Boyd. A survey of trust and reputation systems for online service provision. Decision Support Systems, 43(2):618–644, 2007.

[42] Yannis Kalfoglou and Marco Schorlemmer. Ontology mapping: the state of the art. The Knowledge Engineering Review, 18(1):1–31, 2003.

[43] Georgi Kobilarov, Tom Scott, Yves Raimond, Silver Oliver, Chris Sizemore, Michael Smethurst, Christian Bizer, and Robert Lee. Media meets Semantic Web – How the BBC uses DBpedia and Linked Data to make connections. In Lora Aroyo, Paolo Traverso, Fabio Ciravegna, Philipp Cimiano, Tom Heath, Eero Hyvönen, Riichiro Mizoguchi, Eyal Oren, Marta Sabou, and Elena Simperl, editors, The Semantic Web: Research and Applications, volume 5554 of Lecture Notes in Computer Science, pages 723–737. Springer Berlin Heidelberg, 2009.

[44] Dimitris Kontokostas, Patrick Westphal, Sören Auer, Sebastian Hellmann, Jens Lehmann, Roland Cornelissen, and Amrapali Zaveri. Test-driven evaluation of linked data quality. In Proceedings of the 23rd International Conference on World Wide Web, WWW '14, pages 747–758, Republic and Canton of Geneva, Switzerland, 2014. International World Wide Web Conferences Steering Committee.

[45] Hsin-Tsang Lee, Derek Leonard, Xiaoming Wang, and Dmitri Loguinov. IRLbot: Scaling to 6 billion pages and beyond. ACM Trans. Web, 3(3):8:1–8:34, July 2009.

[46] Guanfeng Liu, Yan Wang, and Mehmet A. Orgun. Trust transitivity in complex social networks. In Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2011, San Francisco, California, USA, August 7-11, 2011. AAAI Press, 2011.

[47] Robert McCann, Warren Shen, and AnHai Doan. Matching schemas in online communities: A Web 2.0 approach. In Proceedings of the 24th International Conference on Data Engineering, ICDE 2008, April 7-12, 2008, Cancún, México, pages 110–119. IEEE, 2008.

[48] Peter Mika. Social networks and the semantic web. PhD thesis, SIKS, the Dutch Graduate School for Information and Knowledge Systems, 2006.

[49] Prasenjit Mitra, Natasha F. Noy, and Anuj R. Jaiswal. OMEN: A probabilistic ontology mapping tool. In The Semantic Web - ISWC 2005, 4th International Semantic Web Conference, ISWC 2005, Galway, Ireland, November 6-10, 2005, Proceedings, volume 3729 of LNCS, pages 537–547. Springer, 2005.

[50] Lik Mui, Mojdeh Mohtashemi, and Ari Halberstadt. A computational model of trust and reputation. In Proceedings of the 35th Annual Hawaii International Conference on System Sciences (HICSS-35 2002), 7-10 January, 2002, Big Island, HI, USA. IEEE Computer Society, 2002.

[51] M. E. J. Newman. Models of the small world. Journal of Statistical Physics, 101(3-4):819–841, 2000.

[52] Raymond Ng and V.S. Subrahmanian. Stable model semantics for probabilistic deductive databases. In Z.W. Ras and M. Zemankova, editors, Methodologies for Intelligent Systems, volume 542 of Lecture Notes in Computer Science, pages 162–171. Springer Berlin Heidelberg, 1991.

[53] Raymond Ng and V.S. Subrahmanian. A semantical framework for supporting subjective and conditional probabilities in deductive databases. Journal of Automated Reasoning, 10(2):191–235, 1993.

[54] Axel-Cyrille Ngonga Ngomo and Sören Auer. LIMES – a time-efficient approach for large-scale link discovery on the web of data. In Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence - Volume Three, IJCAI'11, pages 2312–2317, 2011.

[55] Mathias Niepert, Christian Meilicke, and Heiner Stuckenschmidt. A probabilistic-logical framework for ontology matching. In Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2010, Atlanta, Georgia, USA, July 11-15, 2010. AAAI Press, 2010.

[56] Kieron O’Hara, Harith Alani, Yannis Kalfoglou, and Nigel Shadbolt. Trust strategies for the semantic web. In Proceedings of the ISWC’04 Workshop on Trust, Security, and Reputation on the Semantic Web, Hiroshima, Japan, November 7, 2004, volume 127 of CEUR Workshop Proceedings. CEUR-WS.org, 2004.

[57] Thiago Pachêco Andrade Pereira, Carlos Eduardo S. Pires, and Ana Carolina Salgado. Exploring web semantic knowledge and user feedback to improve ontology matching. In 2011 Database and Expert Systems Applications, DEXA, International Workshops, Toulouse, France, August 29 - Sept. 2, 2011, pages 234–238. IEEE Computer Society, 2011.

[58] Nathalie Pernelle, Fatiha Saïs, and Danai Symeonidou. An automatic key discovery approach for data linking. Web Semantics: Science, Services and Agents on the World Wide Web, 23:16–30, 2013. Special issue on Data Linking.

[59] Ilung Pranata, Geoff Skinner, and Rukshan Athauda. A holistic review on trust and reputation management systems for digital environments. International Journal of Computer and Information Technology, 1(1):44–53, 2012.

[60] Matthew Richardson, Rakesh Agrawal, and Pedro Domingos. Trust management for the semantic web. In The Semantic Web - ISWC 2003, Second International Semantic Web Conference, Sanibel Island, FL, USA, October 20-23, 2003, Proceedings, volume 2870 of LNCS, pages 351–368. Springer, 2003.

[61] Stewart Robinson. Simulation - The practice of model development and use. Wiley, 2004.

[62] Jordi Sabater and Carles Sierra. Review on computational trust and reputation models. AI Review, 24(1):33–60, 2005.

[63] Fatiha Saïs, Nathalie Pernelle, and Marie-Christine Rousset. Combining a logical and a numerical method for data reconciliation. JoDS, 12:66–94, 2009.

[64] Fabian M. Suchanek, Serge Abiteboul, and Pierre Senellart. PARIS: probabilistic alignment of relations, instances, and schema. PVLDB, 5(3):157–168, 2011.

[65] Rémi Tournaire, Jean-Marc Petit, Marie-Christine Rousset, and Alexandre Termier. Discovery of probabilistic mappings between taxonomies: Principles and experiments. JoDS, 15:66–101, 2011.

[66] Rémi Tournaire, Jean-Marc Petit, Marie-Christine Rousset, and Alexandre Termier. Discovery of probabilistic mappings between taxonomies: principles and experiments. Journal on Data Semantics XV, volume 6720 of LNCS, pages 66–101, 2011.

[67] Jacopo Urbani, Frank van Harmelen, Stefan Schlobach, and Henri E. Bal. QueryPIE: Backward reasoning for OWL Horst over very large knowledge bases. In ISWC'11, volume 7031 of LNCS, pages 730–745, 2011.

[68] Laurent Vieille. Recursive axioms in deductive databases: The query/subquery approach. In Expert Database Conf., pages 253–267, 1986.

[69] Julius Volz, Christian Bizer, Martin Gaedke, and Georgi Kobilarov. Silk - a link discovery framework for the web of data. In Proceedings of the WWW2009 Workshop on Linked Data on the Web, LDOW, 2009.

[70] Aidan Hogan, Andreas Harth, Alexandre Passant, Stefan Decker, and Axel Polleres. Weaving the Pedantic Web. In Proceedings of the WWW2010 Workshop on Linked Data on the Web, LDOW, 2010.

[71] Amrapali Zaveri, Dimitris Kontokostas, Mohamed A. Sherif, Lorenz Bühmann, Mohamed Morsey, Sören Auer, and Jens Lehmann. User-driven quality evaluation of DBpedia. In Proceedings of the 9th International Conference on Semantic Systems, I-SEMANTICS '13, pages 97–104, New York, NY, USA, 2013. ACM.