Detecting Errors in Linked Data Using Learned Ontologies and Outlier Detection

16 pages · PDF · 1,020 KB

Detecting Errors in Linked Data Using Ontology Learning and Outlier Detection. Inaugural dissertation for the attainment of the academic degree of Doctor of Natural Sciences at the Universität Mannheim, submitted by Daniel Fleischhacker from Groß-Gerau. Mannheim, 2015. Dean: Professor Dr. Heinz Jürgen Müller, Universität Mannheim. First examiner: Professor Dr. Heiner Stuckenschmidt, Universität Mannheim. Second examiner: Professor Dr. Felix Naumann, Universität Potsdam. Date of the oral examination: 11 March 2016.

Abstract

Linked Data is one of the most successful implementations of the Semantic Web idea, as demonstrated by the large amount of interlinked data available in the repositories constituting the Linked Open Data cloud. Many of these datasets are not created manually but are extracted automatically from existing datasets. Thus, extraction errors that a human would easily recognize may go unnoticed and can considerably diminish the usability of Linked Data. The sheer amount of data renders manual detection of such errors unrealistic and makes automatic approaches for detecting errors desirable. To address this need, this thesis focuses on error detection at the logical level and at the level of numerical data. The presented methods operate solely on the Linked Data dataset itself, without requiring additional external data.

The first two parts of this work deal with the detection of logical errors in Linked Data. It is argued that first formalizing the knowledge required for error detection into ontologies, and only then applying it in a separate step, has several advantages over approaches that skip the formalization step. Consequently, the first part introduces inductive approaches for learning highly expressive ontologies from existing instance data as a basis for detecting logical errors.
The proposed and evaluated techniques make it possible to learn class disjointness axioms as well as several property-centric axiom types from instance data.

The second part of this thesis operates on the ontologies learned by the approaches proposed in the previous part. First, their quality is improved by detecting errors possibly introduced by the automatic learning process. For this purpose, a pattern-based approach for finding the root causes of ontology errors, tailored to the specifics of the learned ontologies, is proposed and then used in the context of ontology debugging approaches. To conclude the treatment of logical errors, the final chapter of the second part evaluates the use of learned ontologies for finding erroneous statements in Linked Data. This is done by applying a pattern-based error detection approach that employs the learned ontologies to the DBpedia dataset and then manually evaluating the results, which shows the adequacy of learned ontologies for logical error detection.

The final part of this thesis complements the logical error detection with an approach to detect data-level errors in numerical values. The presented method applies outlier detection techniques to datatype property values to find potentially erroneous ones; additional preprocessing steps improve both the quality of the results and the runtime of the detection step. Furthermore, a subsequent cross-checking step is proposed that handles natural outliers, a problem inherent to outlier detection.

In summary, this work introduces a number of approaches for detecting errors in Linked Data without requiring additional, external data. The generated lists of potentially erroneous facts can serve as a first indication of errors, and the intermediate step of learning ontologies makes the full workflow well suited for scenarios that include human interaction.
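The inductive learning of class disjointness described above is, at its core, a matter of checking how class memberships co-occur across instances. The following sketch is a deliberately simplified illustration of that idea, not the thesis's actual correlation-based or association-rule formulas: it proposes `owl:disjointWith` for well-supported class pairs that share no instances. All class and instance names are invented.

```python
from itertools import combinations

def learn_disjointness(instance_classes, min_support=2):
    """Propose owl:disjointWith axioms from instance data.

    instance_classes maps an instance URI to the set of classes it is
    asserted to belong to. A class pair is proposed as disjoint when
    both classes have at least min_support instances and share none.
    """
    members = {}  # class -> set of its instances
    for inst, classes in instance_classes.items():
        for c in classes:
            members.setdefault(c, set()).add(inst)
    axioms = []
    for c1, c2 in combinations(sorted(members), 2):
        if (len(members[c1]) >= min_support
                and len(members[c2]) >= min_support
                and not members[c1] & members[c2]):
            axioms.append((c1, "owl:disjointWith", c2))
    return axioms

# Hypothetical instance data, for illustration only
data = {
    "ex:Mannheim": {"ex:City", "ex:Place"},
    "ex:Berlin":   {"ex:City", "ex:Place"},
    "ex:Rhine":    {"ex:River", "ex:Place"},
    "ex:Neckar":   {"ex:River", "ex:Place"},
}
print(learn_disjointness(data))
# -> [('ex:City', 'owl:disjointWith', 'ex:River')]
```

Real extracted datasets are noisy, so the thesis's approaches rely on statistical measures rather than this strict zero-overlap test.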
Zusammenfassung

Linked Data is one of the most successful realizations of the Semantic Web ideas, as evidenced in particular by the large amounts of data available in the Linked Open Data cloud. Many of these datasets, however, were not created manually but were extracted from existing datasets by automated approaches. As a result, they contain many errors that would have been caught had the data been created manually, but which now limit the usability of the data. Since manual error detection after the fact is impractical given the size of the data, an automated approach to detecting data errors is desirable. This thesis addresses that need by introducing methods for detecting data errors at the logical and the numerical level. A particular focus lies on approaches that can be applied directly to the Linked Data dataset without additional, external data sources.

The first two parts of this thesis deal with the detection of errors at the logical level. Fundamentally, the thesis argues for using ontologies to formalize, in an upstream step, the knowledge employed for error detection. The first part therefore presents inductive approaches for learning expressive ontologies, supporting axioms for class disjointness as well as a number of property-specific axiom types.

The second part then builds on the ontologies learned in this way. To detect errors in the learned ontologies, a pattern-based method for determining the causes of ontology errors is proposed, tailored specifically to the axiom types used in these ontologies and their usage.
This method is then used within several approaches for repairing errors in ontologies, and the results are evaluated. Finally, the detection of logical errors by means of the learned ontologies is demonstrated in experiments on the DBpedia dataset; the subsequent evaluation shows the applicability of learned ontologies for detecting logical errors.

In the third part, a method for detecting errors in numerical values is then developed, based on outlier detection techniques. Here, a preprocessing step improves the accuracy of the approach while reducing the required processing time. An additional postprocessing step allows values linked within the Linked Data dataset to be taken into account in order to handle natural outliers.

In summary, this thesis presents approaches whose combination makes it possible to detect errors in Linked Data at the logical and the numerical level while remaining independent of external data sources. The lists of potential errors produced by these approaches can then be checked manually and, where necessary, corrected. The intermediate step via ontologies additionally opens up further possibilities for interactive use.

Contents

1 Introduction
  1.1 Research Questions
  1.2 Reader's Guide
2 Foundations
  2.1 Description Logics and Ontologies
  2.2 RDF and Linked Data
    2.2.1 Resource Description Framework
    2.2.2 Linked Data
    2.2.3 DBpedia

I Learning Expressive Schemas
3 Preliminaries
  3.1 Learning from Instance Data
  3.2 Association Rule Mining
    3.2.1 Generating Association Rules
    3.2.2 Other Algorithms
  3.3 Statistical Schema Induction
4 Related Work
  4.1 Ontology Learning
  4.2 Inductive Ontology Learning
  4.3 Learning Disjointness Axioms
  4.4 Profiling Linked Data Datasets
5 Inductive Learning of Disjointness Axioms
  5.1 Class Disjointness Gold Standard
    5.1.1 Methodology
    5.1.2 Analysis
  5.2 Approaches
    5.2.1 Correlation-Based Approach
    5.2.2 Association Rule Mining-Based Approach
    5.2.3 Negative Association Rule-Based Approach
  5.3 Evaluation
  5.4 Conclusion
6 Inductive Learning of Property Axioms
  6.1 Approaches
    6.1.1 Terminology Acquisition
    6.1.2 Creation of Transaction Tables
    6.1.3 Association Rule Mining and Axiom Generation
  6.2 Evaluation
    6.2.1 Settings
    6.2.2 Expert Evaluation
    6.2.3 Crowd-Sourced Evaluation
  6.3 Conclusions and Contributions

II Logical Debugging of Linked Data
7 Generating Incoherence Explanations
  7.1 Related Work
  7.2 Approach
    7.2.1 Generation of Explanations
    7.2.2 Implementation
  7.3 Experiments
    7.3.1 Settings
    7.3.2 Results
  7.4 Conclusion
8 Repairing Incoherent Ontologies
  8.1 Related Work
  8.2 Approaches
    8.2.1 Baseline Approach
    8.2.2 Axiom Adding Approach
    8.2.3 MAP Inference-Based Approach
    8.2.4 Pure Markov Logic Approach
  8.3 Evaluation
    8.3.1 Settings
    8.3.2 Results
  8.4 Conclusion
9 Schema-Based Error Detection
  9.1 Related Work
  9.2 Approach
  9.3 Experiments
    9.3.1 Disjointness-Enriched Ontologies
    9.3.2 Property-Enriched Ontology
  9.4 Conclusion

III Detection of Numerical Errors in Linked Data
10 Preliminaries: Outlier Detection
  10.1 Statistical Outlier Detection
  10.2 Nearest-Neighbor-Based Outlier Detection
11 Detecting Numerical Errors
  11.1 Related Work
  11.2 Approach
    11.2.1 Dataset Inspection
    11.2.2 Generation of Possible Constraints
    11.2.3 Finding Subpopulations
    11.2.4 Outlier Detection and Outlier Scores
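The numerical error detection of Part III scores datatype property values by how far they deviate from the values of comparable resources. As a minimal sketch of one standard statistical technique covered in Chapter 10, the following flags values outside the interquartile-range (IQR) fences; the property values are invented (e.g., person heights in metres, where one value is an extraction error):

```python
def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    xs = sorted(values)
    n = len(xs)

    def quantile(q):
        # Linear interpolation between the closest ranks
        pos = q * (n - 1)
        lo, hi = int(pos), min(int(pos) + 1, n - 1)
        return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])

    q1, q3 = quantile(0.25), quantile(0.75)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical values of a height property, in metres;
# 17.3 stands in for a unit-conversion extraction error.
heights = [1.62, 1.75, 1.80, 1.68, 1.71, 17.3]
print(iqr_outliers(heights))  # -> [17.3]
```

The thesis goes further than this: subpopulations are selected first (Section 11.2.3) so that, say, basketball players are not compared with the general population, and a cross-checking step consults linked values to avoid flagging natural outliers as errors.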
Recommended publications
  • A Data-Driven Framework for Assisting Geo-Ontology Engineering Using a Discrepancy Index
    University of California Santa Barbara. A Data-Driven Framework for Assisting Geo-Ontology Engineering Using a Discrepancy Index. A Thesis submitted in partial satisfaction of the requirements for the degree Master of Arts in Geography by Bo Yan. Committee in charge: Professor Krzysztof Janowicz, Chair; Professor Werner Kuhn; Professor Emerita Helen Couclelis. June 2016. The Thesis of Bo Yan is approved. May 2016. Copyright © 2016 by Bo Yan. Acknowledgements: I would like to thank the members of my committee for their guidance and patience in the face of obstacles over the course of my research. I would like to thank my advisor, Krzysztof Janowicz, for his invaluable input on my work. Without his help and encouragement, I would not have been able to find the light at the end of the tunnel during the last stage of the work, because he provided insight that helped me think out of the box. There is no better advisor. I would like to thank Yingjie Hu, who has offered me numerous pieces of feedback, suggestions and inspirations on my thesis topic. I would like to thank all my other intelligent colleagues in the STKO lab and the Geography Department – those who have moved on and started anew, those who are still in the quagmire, and those who have just begun – for their support and friendship. Last, but most importantly, I would like to thank my parents for their unconditional love.
  • A Logical Framework for Modularity of Ontologies∗
    A Logical Framework for Modularity of Ontologies∗. Bernardo Cuenca Grau, Ian Horrocks, Yevgeny Kazakov and Ulrike Sattler. The University of Manchester, School of Computer Science, Manchester, M13 9PL, UK. {bcg, horrocks, ykazakov, sattler}@cs.man.ac.uk. Abstract: Modularity is a key requirement for collaborative ontology engineering and for distributed ontology reuse on the Web. Modern ontology languages, such as OWL, are logic-based, and thus a useful notion of modularity needs to take the semantics of ontologies and their implications into account. We propose a logic-based notion of modularity that allows the modeler to specify the external signature of their ontology, whose symbols are assumed to be defined in some other ontology. We define two restrictions on the usage of the external signature, a syntactic and a slightly less restrictive, semantic one, each of which is decidable and guarantees a certain kind of "black-box" behavior, which enables the controlled merging of ontologies. … An ontology defines its local symbols, possibly reusing the meaning of external symbols; hence, by merging an ontology with external ontologies, we import the meaning of its external symbols to define the meaning of its local symbols. To make this idea work, we need to impose certain constraints on the usage of the external signature: in particular, merging ontologies should be "safe" in the sense that it does not produce unexpected results such as new inconsistencies or subsumptions between imported symbols. To achieve this kind of safety, we use the notion of conservative extensions to define modularity of ontologies, and then prove that a property of some ontologies, called locality, can be used to achieve modularity. More precisely, we define two notions of locality for SHIQ TBoxes: (i) a tractable syntactic one which can be used to provide guidance in ontology editing tools, and (ii) a more general semantic one which can…
  • Semi-Automated Ontology Based Question Answering System
    Journal of Advanced Research in Computing and Applications 17, Issue 1 (2019) 1-5. Journal homepage: www.akademiabaru.com/arca.html. ISSN: 2462-1927. Open Access. Semi-automated Ontology based Question Answering System. Khairul Nurmazianna Ismail, Faculty of Computer Science, Universiti Teknologi MARA Melaka, Malaysia. ABSTRACT: Question answering systems enable users to retrieve exact answers for questions submitted in natural language. Demand for such systems is increasing, since they deliver a precise answer instead of a list of links. This study proposes an ontology-based question answering system, including an explanation of its architecture. The system uses a semi-automatic ontology development (ontology learning) approach to build its ontology. Keywords: question answering system; knowledge base; ontology; online learning. Copyright © 2019 PENERBIT AKADEMIA BARU - All rights reserved. 1. Introduction: In the era of the WWW, people need to search for and obtain information quickly, online or offline, which makes them rely on Question Answering Systems (QAS). One common, traditional kind of QAS is the Frequently Asked Questions site, which contains common and straightforward answers. Currently, QAS have emerged as powerful platforms using various techniques, such as information retrieval, knowledge bases, natural language processing and hybrid approaches, which enable users to retrieve exact answers for questions posed in natural language using either a pre-structured database or a collection of natural language documents [1,4]. The demand for QAS increases day by day, since they deliver short, precise and question-specific answers [10]. QAS using the knowledge-base paradigm perform better in restricted-domain QA settings thanks to their ability to focus [12].
  • Deciding Semantic Matching of Stateless Services∗
    Deciding Semantic Matching of Stateless Services∗. Duncan Hull, Evgeny Zolin, Andrey Bovykin, Ian Horrocks, Ulrike Sattler, and Robert Stevens. School of Computer Science, University of Manchester, UK; Department of Computer Science, University of Liverpool, UK. Abstract: We present a novel approach to describe and reason about stateless information processing services. It can be seen as an extension of standard descriptions which makes explicit the relationship between inputs and outputs and takes into account OWL ontologies to fix the meaning of the terms used in a service description. This allows us to define a notion of matching between services which yields high precision and recall for service location. We explain why matching is decidable, and provide biomedical example services to illustrate the utility of our approach. Introduction: Understanding the data generated from genome sequencing projects like the Human Genome Project is recognised … specifying automated reasoning algorithms for such stateful service descriptions is basically impossible in the presence of any expressive ontology (Baader et al. 2005). Statelessness implies that we do not need to formulate pre- and post-conditions since our services do not change the world. The question we are interested in here is how to help the biologist to find a service he or she is looking for, i.e., a service that works with inputs and outputs the biologist can provide/accept, and that provides the required functionality. The growing number of publicly available biomedical web services, 3000 as of February 2006, required better matching techniques to locate services. Thus, we are concerned with the question of how to describe a service request Q and service advertisements Si such that the notion of a service S matching the request Q can be defined in a "useful" way.
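The input/output matching idea can be made concrete with a toy example. The sketch below is not the paper's description logic encoding: it replaces ontology subsumption reasoning with a hand-coded single-parent class hierarchy, and all service and class names are invented. A service matches when it accepts what the requester can provide and returns something the requester expects.

```python
def is_subclass(sub, sup, hierarchy):
    """True if sub is (reflexively, transitively) a subclass of sup.
    hierarchy maps each class to its single parent (None for roots)."""
    while sub is not None:
        if sub == sup:
            return True
        sub = hierarchy.get(sub)
    return False

def matches(request, service, hierarchy):
    """Every input the service needs must be satisfiable by something the
    requester provides (provided type below the needed type), and the
    service's output must be below what the requester expects."""
    ins_ok = all(any(is_subclass(have, need, hierarchy)
                     for have in request["inputs"])
                 for need in service["inputs"])
    out_ok = is_subclass(service["output"], request["output"], hierarchy)
    return ins_ok and out_ok

# Toy biomedical hierarchy and services, invented for illustration
hierarchy = {"DNASequence": "Sequence", "Sequence": None, "Alignment": None}
request = {"inputs": {"DNASequence"}, "output": "Alignment"}
aligner = {"inputs": {"Sequence"}, "output": "Alignment"}
print(matches(request, aligner, hierarchy))  # -> True
```

The paper's actual contribution is showing that this style of matching remains decidable when the types are classes of an expressive OWL ontology rather than a toy tree.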
  • Ontology-Based Methods for Analyzing Life Science Data
    Habilitation à Diriger des Recherches presented by Olivier Dameron: Ontology-based methods for analyzing life science data. Defended publicly on 11 January 2016 before a jury composed of: Anita Burgun, Professor, Université René Descartes, Paris (examiner); Marie-Dominique Devignes, Chargée de recherches CNRS, LORIA, Nancy (examiner); Michel Dumontier, Associate Professor, Stanford University, USA (reviewer); Christine Froidevaux, Professor, Université Paris Sud (reviewer); Fabien Gandon, Directeur de recherches, Inria Sophia-Antipolis (reviewer); Anne Siegel, Directrice de recherches CNRS, IRISA, Rennes (examiner); Alexandre Termier, Professor, Université de Rennes 1 (examiner). Contents: 1 Introduction (Context; Challenges; Summary of the contributions; Organization of the manuscript). 2 Reasoning based on hierarchies: 2.1 Principle (RDF for describing data; RDFS for describing types; RDFS entailments; Typical uses of RDFS entailments in life science; Synthesis); 2.2 Case study: integrating diseases and pathways (Context; Objective; Linking pathways and diseases using GO, KO and SNOMED-CT; Querying associated diseases and pathways); 2.3 Methodology: Web services composition (Context; Objective; Semantic compatibility of services parameters; Algorithm for pairing services parameters); 2.4 Application: ontology-based query expansion with GO2PUB (Context; Objective; …)
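The RDFS entailments this manuscript builds on can be illustrated with a small fixpoint computation. The sketch below implements just two entailment rules, rdfs9 (type propagation along rdfs:subClassOf) and rdfs11 (subClassOf transitivity), over invented Gene Ontology-style triples; a real system would use an RDF library with a full rule set instead.

```python
def rdfs_entail(triples):
    """Compute the closure of a triple set under two RDFS rules:
    rdfs11: (A subClassOf B), (B subClassOf C) => (A subClassOf C)
    rdfs9:  (x type A), (A subClassOf B)       => (x type B)
    """
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        new = set()
        for (s, p, o) in inferred:
            if p == "rdfs:subClassOf":
                for (s2, p2, o2) in inferred:
                    if p2 == "rdfs:subClassOf" and s2 == o:
                        new.add((s, "rdfs:subClassOf", o2))  # rdfs11
                    if p2 == "rdf:type" and o2 == s:
                        new.add((s2, "rdf:type", o))         # rdfs9
        if not new <= inferred:
            inferred |= new
            changed = True
    return inferred

# Hypothetical life-science triples, for illustration only
kb = {
    ("go:ProteinBinding", "rdfs:subClassOf", "go:Binding"),
    ("go:Binding", "rdfs:subClassOf", "go:MolecularFunction"),
    ("ex:p53Binding1", "rdf:type", "go:ProteinBinding"),
}
closure = rdfs_entail(kb)
print(("ex:p53Binding1", "rdf:type", "go:MolecularFunction") in closure)  # -> True
```

This is exactly the kind of hierarchy-based reasoning that lets a query for annotations with a general GO term also retrieve data annotated with its more specific descendants.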
  • Description Logics
    Description Logics. Franz Baader (Institut für Theoretische Informatik, TU Dresden, Germany, [email protected]); Ian Horrocks and Ulrike Sattler (Department of Computer Science, University of Manchester, UK, {horrocks,sattler}@cs.man.ac.uk). Summary: In this chapter, we explain what description logics are and why they make good ontology languages. In particular, we introduce the description logic SHIQ, which has formed the basis of several well-known ontology languages, including OWL. We argue that, without the last decade of basic research in description logics, this family of knowledge representation languages could not have played such an important rôle in this context. Description logic reasoning can be used both during the design phase, in order to improve the quality of ontologies, and in the deployment phase, in order to exploit the rich structure of ontologies and ontology based information. We discuss the extensions to SHIQ that are required for languages such as OWL and, finally, we sketch how novel reasoning services can support building DL knowledge bases. 1 Introduction: The aim of this section is to give a brief introduction to description logics, and to argue why they are well-suited as ontology languages. In the remainder of the chapter we will put some flesh on this skeleton by providing more technical details with respect to the theory of description logics, and their relationship to state of the art ontology languages. More detail on these and other matters related to description logics can be found in [6]. Ontologies: There have been many attempts to define what constitutes an ontology, perhaps the best known (at least amongst computer scientists) being due to Gruber: "an ontology is an explicit specification of a conceptualisation" [47], later elaborated to "a formal specification of a shared conceptualisation" [21]. In this context, a conceptualisation means an abstract model of some aspect of the world, taking the form of a definition of the properties of important …
  • Proceedings of the 11th International Workshop on Ontology Matching
    Proceedings of the 11th International Workshop on Ontology Matching (OM-2016). Pavel Shvaiko, Jérôme Euzenat, Ernesto Jiménez-Ruiz, Michelle Cheatham, Oktie Hassanzadeh, Ryutaro Ichise. To cite this version: Pavel Shvaiko, Jérôme Euzenat, Ernesto Jiménez-Ruiz, Michelle Cheatham, Oktie Hassanzadeh, et al. Proceedings of the 11th International Workshop on Ontology Matching (OM-2016). Ontology matching workshop, Kobe, Japan. No commercial editor, pp. 1-252, 2016. HAL Id: hal-01421835, https://hal.inria.fr/hal-01421835, submitted on 15 Jul 2019. HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers. Ontology Matching OM-2016, Proceedings of the ISWC Workshop. Introduction: Ontology matching is a key interoperability enabler for the semantic web, as well as a useful tactic in some classical data integration tasks dealing with the semantic heterogeneity problem. It takes ontologies as input and determines as output an alignment, that is, a set of correspondences between the semantically related entities of those ontologies. These correspondences can be used for various tasks, such as ontology merging, data translation, query answering or navigation on the web of data. Thus, matching ontologies enables the knowledge and data expressed in the matched ontologies to interoperate.
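A matcher in the sense defined above takes two ontologies and outputs an alignment, a set of correspondences. As a deliberately simple illustration, the sketch below compares entity labels by character-trigram similarity; the entity names and labels are invented, and real OM systems combine many such techniques (structural, semantic, instance-based) rather than relying on strings alone.

```python
def trigram_sim(a, b):
    """Jaccard similarity of character trigrams, a common string matcher."""
    grams = lambda s: {s[i:i + 3] for i in range(len(s) - 2)}
    ga, gb = grams(a.lower()), grams(b.lower())
    return len(ga & gb) / len(ga | gb) if ga | gb else 0.0

def match(onto1, onto2, threshold=0.5):
    """Return an alignment: (entity1, entity2, score) for similar labels."""
    return [(e1, e2, round(trigram_sim(l1, l2), 2))
            for e1, l1 in onto1.items()
            for e2, l2 in onto2.items()
            if trigram_sim(l1, l2) >= threshold]

# Two tiny invented ontologies, given as entity -> label
onto1 = {"o1:Author": "author", "o1:Paper": "paper"}
onto2 = {"o2:Writer": "writer", "o2:Article": "article", "o2:Authors": "authors"}
print(match(onto1, onto2))
# -> [('o1:Author', 'o2:Authors', 0.8)]
```

Note what the string matcher misses: `author`/`writer` and `paper`/`article` are semantically related but share no trigrams, which is precisely why the workshop's evaluation campaigns compare richer matching strategies.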
  • Semantically Enhanced and Minimally Supervised Models for Ontology Construction, Text Classification, and Document Recommendation Alkhatib, Wael (2020)
    Semantically Enhanced and Minimally Supervised Models for Ontology Construction, Text Classification, and Document Recommendation. Alkhatib, Wael (2020). DOI (TUprints): https://doi.org/10.25534/tuprints-00011890. License: CC-BY-ND 4.0 International (Creative Commons Attribution, No Derivatives). Publication type: Dissertation. Department: 18, Electrical Engineering and Information Technology. Source of the original: https://tuprints.ulb.tu-darmstadt.de/11890. Dissertation approved by the Department of Electrical Engineering and Information Technology of the Technische Universität Darmstadt in fulfilment of the requirements for the degree of Doktor-Ingenieur (Dr.-Ing.), submitted by Wael Alkhatib, M.Sc., born on 15 October 1988 in Hama, Syria. Chair: Prof. Dr.-Ing. Jutta Hanson. First examiner: Prof. Dr.-Ing. Ralf Steinmetz. Second examiner: Prof. Dr.-Ing. Steffen Staab. Date of submission: 28 January 2020. Date of defense: 10 June 2020. Hochschulkennziffer D17. Darmstadt 2020. This document is provided by tuprints, the e-publishing service of the Technical University of Darmstadt (http://tuprints.ulb.tu-darmstadt.de, [email protected]). Please cite this document as: URN: nbn:de:tuda-tuprints-118909, URL: https://tuprints.ulb.tu-darmstadt.de/id/eprint/11890. This publication is licensed under the Creative Commons License Attribution-No Derivatives 4.0 International (https://creativecommons.org/licenses/by-nd/4.0/deed.en). Supervisors: Prof. Dr.-Ing. Ralf Steinmetz, Prof. Dr.-Ing. Steffen Staab. Location: Darmstadt. ABSTRACT: The proliferation of deliverable knowledge on the web, along with the rapidly increasing number of accessible research publications, leaves researchers, students, and educators overwhelmed.
  • Scientific Programming: Techniques and Algorithms for Data-Intensive Engineering Environments (Special Issue)
    Scientific Programming: Techniques and Algorithms for Data-Intensive Engineering Environments. Lead Guest Editor: Giner Alor-Hernández. Guest Editors: Jezreel Mejia-Miranda and José María Álvarez-Rodríguez. Copyright © 2018 Hindawi. All rights reserved. This is a special issue published in "Scientific Programming." All articles are open access articles distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Editorial Board: M. E. Acacio Sanchez, Spain; Marco Aldinucci, Italy; Davide Ancona, Italy; Ferruccio Damiani, Italy; Sergio Di Martino, Italy; Basilio B. Fraguela, Spain; Carmine Gravino, Italy; Gianluigi Greco, Italy; Bormin Huang, USA; Chin-Yu Huang, Taiwan; Christoph Kessler, Sweden; Harald Köstler, Germany; José E. Labra, Spain; Thomas Leich, Germany; Piotr Luszczek, USA; Tomàs Margalef, Spain; Cristian Mateos, Argentina; Roberto Natella, Italy; Francisco Ortin, Spain; Can Özturan, Turkey; Danilo Pianini, Italy; Fabrizio Riguzzi, Italy; Michele Risi, Italy; Damian Rouson, USA; Giuseppe Scanniello, Italy; Ahmet Soylu, Norway; Emiliano Tramontana, Italy; Autilia Vitiello, Italy; Jan Weglarz, Poland.
  • Ontology Learning and Its Application to Automated Terminology Translation
    Natural Language Processing: Ontology Learning and Its Application to Automated Terminology Translation. Roberto Navigli and Paola Velardi, Università di Roma La Sapienza; Aldo Gangemi, Institute of Cognitive Sciences and Technology. Although the IT community widely acknowledges the usefulness of domain ontologies, especially in relation to the Semantic Web [1,2], we must overcome several barriers before they become practical and useful tools. Thus far, only a few specific research environments have ontologies. (The "What Is an Ontology?" sidebar on page 24 provides a definition and some background.) Many in the computational-linguistics research community use WordNet [3], but large-scale IT applications based on it require heavy customization. Thus, a critical issue is ontology construction: identifying, defining, and entering concept definitions. In large, complex application domains, this task can be lengthy, costly, and controversial, because people can have different points of view about the same concept. Two main approaches aid large-scale ontology construction. The first one facilitates manual ontology engineering by providing natural language processing tools, including editors, consistency checkers, mediators to support shared decisions, and … The OntoLearn system for automated ontology learning extracts relevant domain terms from a corpus of text, relates them to appropriate concepts in a general-purpose ontology, and detects … It is part of a more general ontology engineering architecture [4,5]. Here, we describe the system and an experiment in which we used a machine-learned tourism ontology to automatically translate multi-word terms from English to Italian. The method can apply to other domains without manual adaptation. OntoLearn architecture: Figure 1 shows the elements of the architecture. Using the Ariosto language processor [6], OntoLearn extracts terminology from a corpus of domain text, such as specialized Web sites and warehouses or documents exchanged among members of a virtual community.
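OntoLearn's first step, extracting candidate domain terminology from a corpus, can be caricatured by scoring words by their frequency in domain text relative to a contrastive, general-purpose corpus. The sketch below uses that simple ratio, not OntoLearn's actual measures (which handle multi-word terms and use linguistic processing); the corpora snippets are invented.

```python
from collections import Counter

def domain_terms(domain_docs, contrast_docs, top_n=3):
    """Rank words by domain relevance: domain frequency divided by
    (1 + contrastive frequency), a rough terminology filter."""
    dom = Counter(w for d in domain_docs for w in d.lower().split())
    con = Counter(w for d in contrast_docs for w in d.lower().split())
    score = {w: dom[w] / (1 + con[w]) for w in dom}
    return sorted(score, key=score.get, reverse=True)[:top_n]

# Invented tourism-domain snippets vs. general-news snippets
tourism = ["room reservation and hotel transfer",
           "hotel room with sea view",
           "airport transfer service"]
news = ["the government announced a new service",
        "a view shared by many"]
print(domain_terms(tourism, news))
```

Words like "view" and "service" are demoted because the contrastive corpus also uses them, which is the intuition behind separating domain terms from general vocabulary.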
  • Justification Oriented Proofs in OWL
    Justification Oriented Proofs in OWL. Matthew Horridge, Bijan Parsia, and Ulrike Sattler, School of Computer Science, The University of Manchester. Abstract: Justifications, that is, minimal entailing subsets of an ontology, are currently the dominant form of explanation provided by ontology engineering environments, especially those focused on the Web Ontology Language (OWL). Despite this, there are naturally occurring justifications that can be very difficult to understand. In essence, justifications are merely the premises of a proof and, as such, do not articulate the (often non-obvious) reasoning which connects those premises with the conclusion. This paper presents justification oriented proofs as a potential solution to this problem. 1 Introduction and Motivation: Modern ontology development environments such as Protégé-4, the NeOn Toolkit, Swoop, and TopBraid Composer allow users to request explanations for entailments (inferences) that they encounter when editing or browsing ontologies. Indeed, the provision of explanation generating functionality is generally seen as being a vital component in such tools. Over the last few years, justifications have become the dominant form of explanation in these tools. This paper examines justifications as a kind of explanation and highlights some problems with them. It then presents justification lemmatisation as a non-standard reasoning service, which can be used to augment a justification with intermediate inference steps, and gives rise to a structure known as a justification oriented proof. Ultimately, a justification oriented proof could be used as an input into some presentation device to help a person step through a justification that is otherwise too difficult for them to understand.
  • Semi-Automatic Terminology Ontology Learning Based on Topic Modeling
    The paper was accepted at Engineering Applications of Artificial Intelligence on 9 May 2017 and can be cited as: Monika Rani, Amit Kumar Dhar, O. P. Vyas, Semi-automatic terminology ontology learning based on topic modeling, Engineering Applications of Artificial Intelligence, Volume 63, August 2017, Pages 108-125, ISSN 0952-1976, https://doi.org/10.1016/j.engappai.2017.05.006 (http://www.sciencedirect.com/science/article/pii/S0952197617300891). Semi-Automatic Terminology Ontology Learning Based on Topic Modeling. Monika Rani*, Amit Kumar Dhar and O. P. Vyas, Department of Information Technology, Indian Institute of Information Technology, Allahabad, India. [email protected]. Abstract: Ontologies provide features like a common vocabulary, reusability and machine-readable content; they also allow for semantic search, facilitate agent interaction, and support the ordering and structuring of knowledge for Semantic Web (Web 3.0) applications. However, a challenge in ontology engineering is automatic learning: there is still a lack of fully automatic approaches that form an ontology from a text corpus or a dataset of various topics using machine learning techniques. In this paper, two topic modeling algorithms are explored, namely LSI & SVD and Mr.LDA, for learning a topic ontology. The objective is to determine the statistical relationship between documents and terms to build a topic ontology and ontology graph with minimum human intervention. Experimental analysis on building a topic ontology, and on semantically retrieving the corresponding topic ontology for a user's query, demonstrates the effectiveness of the proposed approach. Keywords: Ontology Learning (OL), Latent Semantic Indexing (LSI), Singular Value Decomposition (SVD), Probabilistic Latent Semantic Indexing (pLSI), MapReduce Latent Dirichlet Allocation (Mr.LDA), Correlation Topic Modeling (CTM).
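The LSI/SVD step the paper relies on can be sketched in a few lines: build a term-document matrix, truncate its SVD to k latent "topics", and compare terms in the reduced space. The tiny matrix below is invented for illustration and is not the paper's data or its full pipeline.

```python
import numpy as np

# Term-document matrix: rows are terms, columns are documents (invented counts)
terms = ["ontology", "learning", "topic", "model", "graph"]
docs = np.array([
    [2, 1, 0],   # ontology
    [1, 2, 0],   # learning
    [0, 1, 2],   # topic
    [0, 0, 2],   # model
    [1, 0, 1],   # graph
], dtype=float)

# Latent Semantic Indexing: keep only the k strongest singular components
U, s, Vt = np.linalg.svd(docs, full_matrices=False)
k = 2
topic_space = U[:, :k] * s[:k]   # each term as a k-dimensional topic vector

def cos(a, b):
    """Cosine similarity between two term vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Co-occurring terms ("ontology", "learning") end up closer in the latent
# space than terms that never share a document ("ontology", "model").
print(cos(topic_space[0], topic_space[1]))
print(cos(topic_space[0], topic_space[3]))
```

Clustering terms by such latent-space similarity, and then arranging the clusters hierarchically, is the rough intuition behind building a topic ontology from the decomposition.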