Querying Description Logic Knowledge Bases
QUERYING DESCRIPTION LOGIC KNOWLEDGE BASES

A thesis submitted to the University of Manchester for the degree of Doctor of Philosophy in the Faculty of Engineering and Physical Sciences

2007

By Birte Glimm
School of Computer Science

Contents

Abstract
Declaration
Copyright
Acknowledgements
1 Introduction
  1.1 Description Logics
    1.1.1 Description Logic Knowledge Bases
    1.1.2 Historical Background of Description Logics
    1.1.3 Application Areas of Description Logics
    1.1.4 Semantics of Description Logics
  1.2 Reasoning Services
    1.2.1 Standard Reasoning Services
    1.2.2 Conjunctive Queries
    1.2.3 Challenges of Query Answering
  1.3 Aims and Objectives
  1.4 A Guide for Readers
2 Foundations of Description Logics
  2.1 Syntax and Semantics
  2.2 Standard Reasoning Tasks
  2.3 Conjunctive Queries
    2.3.1 Query Answering
    2.3.2 Conjunctive Queries in Databases
    2.3.3 Why Conjunctive Queries
  2.4 Combined and Data Complexity
3 Related Work and Alternative Approaches
  3.1 Conjunctive Queries for Expressive Description Logics
  3.2 Conjunctive Queries for Tractable Description Logics
  3.3 Modal Correspondence Theory
  3.4 Query Containment
    3.4.1 The Difficulty of Regular Expressions
  3.5 Rule Formalisms
    3.5.1 The Carin System
    3.5.2 Extensions of the Carin System
  3.6 Hybrid Logics
    3.6.1 Hybrid Logic Binders for Query Answering
  3.7 First-Order Logic
  3.8 Summary
4 Query Entailment for SHIQ
  4.1 Query Rewriting by Example
    4.1.1 Forest Bases and Canonical Interpretations
    4.1.2 The Running Example
    4.1.3 The Rewriting Steps
  4.2 Query Rewriting
    4.2.1 Tree- and Forest-Shaped Queries
    4.2.2 From Graphs to Forests
    4.2.3 From Trees to Concepts
    4.2.4 Query Matches
    4.2.5 Correctness of the Query Rewriting
  4.3 Deciding Query Entailment for SHIQ
    4.3.1 A Deterministic Decision Procedure
    4.3.2 A Non-Deterministic Decision Procedure
    4.3.3 Consequential Results
  4.4 Summary
5 Query Entailment for SHOQ
  5.1 Forest Bases and Canonical Interpretations
  5.2 Query Rewriting
    5.2.1 Query Shapes and Matches
    5.2.2 From Forest-Shaped Queries to Concept Conjuncts
    5.2.3 Correctness of the Rewriting Steps
  5.3 Deciding Query Entailment for SHOQ
    5.3.1 Canonical Models of Bounded Branching Degree
    5.3.2 Eliminating Transitivity
    5.3.3 Alternating Automata
    5.3.4 Tree Relaxations
    5.3.5 Deciding Existence of Tree Relaxations
    5.3.6 Combined Complexity
    5.3.7 Consequential Results
  5.4 Summary
6 Conclusions
  6.1 Thesis Achievements
  6.2 Significance of the Results
  6.3 Future Work
Bibliography
Index

Word Count 72.357

List of Figures

1.1 A graphical representation of a query.
1.2 A graphical representation of a query.
1.3 The query graph after identifying y and y′.
1.4 The query graph for the query from Example 1.3.
3.1 A complete and clash-free completion graph for K.
3.2 A graphical representation of a canonical model I for K.
3.3 A graphical representation of a model that does not satisfy q.
3.4 An abstraction of a completion graph using tree blocking.
3.5 An abstraction of the model for the completion graph in Figure 3.4.
3.6 A completion graph and its canonical model.
3.7 A graphical representation of the query q from Example 3.4.
4.1 A representation of a canonical interpretation I for K.
4.2 A forest base for the interpretation represented by Figure 4.1.
4.3 A graph representation of the query from Example 4.3.
4.4 A match π for the query q.
4.5 A cyclic query and its tree-shaped collapsing.
4.6 A split rewriting qsr for the query shown in Figure 4.3.
4.7 A split match πsr for the query qsr.
4.8 A loop rewriting qℓr and its match.
4.9 A forest rewriting qfr with a forest match πfr.
4.10 A representation of a canonical model.
4.11 The match for a forest rewriting.
5.1 A representation of a canonical interpretation I for K.
5.2 A graphical representation of the query q with its match.
5.3 A graphical representation of a nominal rewriting.
5.4 A graphical representation of a shortcut rewriting.
5.5 A representation of a canonical model I for K.
5.6 A representation of an alternative canonical model I′ for K.
5.7 A representation of a model for K.
5.8 A representation of a canonical model I for elimTrans(K).
5.9 A graphical representation of a relaxation for K.
5.10 A relaxation and the tree relaxation built from it.

Abstract

Knowledge representation systems provide a mechanism for storing facts about some part of the real world in a knowledge base, inferring new knowledge based on the given facts, and querying knowledge bases. The ability to infer new knowledge is one of the distinguishing features compared to databases. Such inference services require the definition of knowledge in a language for which such inference algorithms exist, e.g., a Description Logic (DL). A DL language allows for the specification of concepts, individuals that are instances of these concepts, and roles, which are interpreted as binary relations over the individuals. Description Logics have proved useful in a wide range of applications and form the foundations of the Web Ontology Language (OWL), which is used in the Semantic Web as a means for specifying machine processable information.

Despite their popularity, the query facilities provided by DL systems are still limited. Current algorithms are incomplete or impose restrictions on the types of allowed queries. In this thesis we identify sources of incompleteness in existing algorithms and present extended query procedures that eliminate the deficiencies described above. More precisely, we present query answering algorithms for unrestricted conjunctive queries for the DLs SHIQ and SHOQ, the former of which was a long-standing open problem. Furthermore, the correctness of the presented algorithms is proved formally and an analysis of the theoretical complexity is given. The planned future work is targeted on optimisation techniques to improve the algorithms’ practicality.
The work presented in this thesis should be of value mainly to implementors of Description Logic systems, as the presented algorithms build the theoretical foundation for implementable query answering interfaces. Additionally, the algorithms can also be used in order to extend a DL system with datalog style rules.

Declaration

No portion of the work referred to in this thesis has been submitted in support of an application for another degree or qualification of this or any other university or other institute of learning.

Copyright

i. The author of this thesis (including any appendices and/or schedules to this thesis) owns any copyright in it (the “Copyright”) and s/he has given The University of Manchester the right to use such Copyright for any administrative, promotional, educational and/or teaching purposes.

ii. Copies of this thesis, either in full or in extracts, may be made only in accordance with the regulations of the John Rylands University Library of Manchester. Details of these regulations may be obtained from the Librarian. This page must form part of any such copies made.

iii. The ownership of any patents, designs, trade marks and any and all other intellectual property rights except for the Copyright (the “Intellectual Property Rights”) and any reproductions of copyright works, for example graphs and tables (“Reproductions”), which may be described in this thesis, may not be owned by the author and may be owned by third parties. Such Intellectual Property Rights and Reproductions cannot and must not be made available for use without the prior written permission of the owner(s) of the relevant Intellectual Property Rights and/or Reproductions.

iv. Further information on the conditions under which disclosure, publication and exploitation of this thesis, the Copyright and any Intellectual Property Rights and/or