Knowledge Acquisition in a System

Wright State University, CORE Scholar
Theses and Dissertations, 2012

Christopher J. Thomas, Wright State University

Repository Citation
Thomas, Christopher J., "Knowledge Acquisition in a System" (2012). Browse all Theses and Dissertations. 651.
https://corescholar.libraries.wright.edu/etd_all/651

A thesis submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy

by
Christopher J. Thomas
B.S., Universität Koblenz
2012
Department of Computer Science and Engineering
Wright State University

Wright State University
GRADUATE SCHOOL
January 9, 2013

I HEREBY RECOMMEND THAT THE THESIS PREPARED UNDER MY SUPERVISION BY Christopher J. Thomas ENTITLED Knowledge Acquisition in a System BE ACCEPTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF Doctor of Philosophy in Computer Science.

Amit P. Sheth, Ph.D.
Thesis Director

Mateen Rizki, Ph.D.
Chair, Department of Computer Science and Engineering

Committee on Final Examination
Amit P. Sheth, Ph.D.
Pankaj Mehra, Ph.D.
Shaojun Wang, Ph.D.
Pascal Hitzler, Ph.D.
Gerhard Weikum, Ph.D.

Andrew Hsu, Ph.D.
Dean, Graduate School

ABSTRACT

Thomas, Christopher. PhD, Department of Computer Science and Engineering, Wright State University, 2012. Knowledge Acquisition in a System.

I present a method for growing the amount of knowledge available on the Web using a hermeneutic method that involves background knowledge, Information Extraction techniques and validation through discourse and use of the extracted information.

I present the metaphor of the “Circle of Knowledge on the Web”. In this context, knowledge acquisition on the Web is seen as analogous to the way scientific disciplines gradually increase the knowledge available in their field. Here, formal models of interest domains are created automatically or manually and then validated by implicit and explicit validation methods before the statements in the created models can be added to larger knowledge repositories, such as the Linked Open Data cloud. This knowledge is then available for the next iteration of the knowledge acquisition cycle.

I will give both a theoretical underpinning and practical methods for the acquisition of knowledge in collaborative systems. I will cover both the Knowledge Engineering angle and the Information Extraction angle of this problem. Unlike traditional approaches, however, this dissertation will show how Information Extraction can be incorporated into a mostly Knowledge Engineering-based approach as well as how an Information Extraction-based approach can make use of engineered concept repositories. Validation is seen as an integral part of this systemic approach to knowledge acquisition.

The centerpiece of the dissertation is a domain model extraction framework that implements the idea of the “Circle of Knowledge” to automatically create semantic models for domains of interest. It splits the involved Information Extraction tasks into that of Domain Definition, in which pertinent concepts are identified and categorized, and that of Domain Description, in which facts are extracted from free text that describe the extracted concepts.
I then outline a social computing strategy for information validation in order to create knowledge from the extracted models.

This dissertation makes the following contributions:

  • A hermeneutic methodology for knowledge acquisition within a system, involving
      – Human and artificial agents,
      – Formally represented knowledge,
      – Textual information,
      – Information Extraction methods and
      – Information validation techniques
  • Ontology Design
  • Automatic Domain Model creation
      – Top-down Domain hierarchy extraction (Domain Definition)
      – Bottom-up Pattern-based extraction of named relationships (Domain Description)
          ∗ Distantly supervised Relational Targeting Information Extraction
          ∗ Probabilistic positive-only Multi-class classifier
          ∗ Statistical measure for relationship pertinence
          ∗ Recall enhancement using pattern generalization
  • Implicit and Explicit Information validation

Contents

1 Introduction
    1.1 Motivation
    1.2 Hypothesis
    1.3 Scope
2 Overview
    2.1 Terminology
    2.2 Knowledge Engineering oriented knowledge acquisition
    2.3 Information Extraction oriented knowledge acquisition
        2.3.1 Epistemological Considerations
        2.3.2 Automatic Domain Model Creation
3 Epistemological Foundations
    3.1 Introduction
    3.2 Knowledge
        3.2.1 Truth
        3.2.2 Justification
        3.2.3 Belief
        3.2.4 Knowledge in a Group - Social Epistemology
        3.2.5 Knowledge in a System - Systems Epistemology
    3.3 Reference
        3.3.1 Rigid Designators
        3.3.2 Definite Descriptions
        3.3.3 Application
    3.4 The Hermeneutic Circle
    3.5 Knowledge Acquisition in a system
        3.5.1 Practical Considerations
4 Knowledge Engineering - Based Domain Model Creation
    4.1 Introduction
    4.2 Ontology Design
        4.2.1 General Considerations
        4.2.2 Domain Definition - Schema Design
        4.2.3 Archetypal Instances
        4.2.4 Application example
        4.2.5 Instances as Archetypes of Concepts
        4.2.6 Implications
    4.3 Populating the Ontology
        4.3.1 General Considerations
        4.3.2 Populating GlycO from trusted sources
        4.3.3 An Intelligent Population Algorithm
    4.4 Evaluation
    4.5 Conclusion
5 Automatic Domain Model Extraction
    5.1 Background
        5.1.1 Domain Definition
        5.1.2 Domain Description
    5.2 Related Work
        5.2.1 Ontology Learning
        5.2.2 Top-Down Extraction of Knowledge
        5.2.3 Bottom-up Extraction of Knowledge
    5.3 Domain Definition - Hierarchy Creation
        5.3.1 Expansion
        5.3.2 Reduction
        5.3.3 Synonym Acquisition
        5.3.4 Serialization
    5.4 Domain Description - Information Extraction
        5.4.1 Surface Patterns
        5.4.2 Probabilistic Framework
        5.4.3 Vector-Space model
        5.4.4 Pertinence
        5.4.5 Relationship Domain and Range probabilities
        5.4.6 Matrix-Based Fact Extraction
        5.4.7 Pattern Analysis
        5.4.8 Discussion
    5.5 Model Completion - Combining Definition and Description
        5.5.1 Concept Pairing heuristic - Wikipedia
        5.5.2 Concept Pairing heuristic - General Case
        5.5.3 Model-Creation
    5.6 Evaluation
        5.6.1 Hierarchy Creation Evaluation
        5.6.2 Fact Extraction Evaluation
        5.6.3 Evaluation of the full Domain Models
        5.6.4 Discussion
6 Knowledge Verification and Propagation
    6.1 Introduction
        6.1.1 Explicit Validation
        6.1.2 Validation in Use
    6.2 Discussion
        6.2.1 User feedback to focused browsing
        6.2.2 Qualitative Evaluation of browsed facts
    6.3 Propagation of validated statements
    6.4 Conclusion
7 Conclusion
    7.1 Outlook
Bibliography
A Appendix A

List of Figures

1.1 Circle of (Web knowledge) life
2.1 Classification of the work in this dissertation in terms of Knowledge Engineering vs. Information Extraction
2.2 Traditional Ontology Learning Layer Cake
2.3 Doozer++ Ontology Learning Layer Cake
3.1 Circle of (Web knowledge) life
3.2 Nonaka’s Knowledge Spiral
