Practical Reasoning in Probabilistic Description Logic
PRACTICAL REASONING IN PROBABILISTIC DESCRIPTION LOGIC
A thesis submitted to the University of Manchester for the degree of Doctor of Philosophy in the Faculty of Engineering and Physical Sciences
2011
By Pavel Klinov
School of Computer Science

Contents
Abstract
Declaration
Copyright
Acknowledgments
1 Introduction
1.1 Description Logic and Ontologies
1.1.1 Description Logic: From Semantic Nets to OWL 2
1.1.2 Ontologies at Work
1.2 Uncertainty
1.2.1 Uncertainty: Ubiquitous and Versatile
1.2.2 Uncertainty vs. Vagueness
1.3 P-SROIQ and Pronto Reasoner
1.4 Objectives
1.5 Thesis Structure
2 Background and Related Work
2.1 Description Logic
2.1.1 Syntax and Semantics
2.1.2 Reasoning Problems and Complexity
2.1.3 The Web Ontology Language
2.2 Representation and Reasoning About Uncertainty
2.2.1 Knowledge Based Model Construction
2.2.2 Probabilistic Logics
2.3 P-SROIQ: A Probabilistic Description Logic
2.3.1 Syntax and Semantics
2.3.2 Reasoning Problems
2.3.3 Original Algorithms
2.3.4 Note on Probabilistic Coherence
2.4 Related Work
2.4.1 Propositional PSAT Solvers
2.4.2 NMPROBLOG and ContraBovemRufum
3 Applications and Case Studies
3.1 Breast Cancer Risk Assessment Problem
3.1.1 Risk Types and Assessment
3.1.2 Existing Tools and Models
3.1.3 The BCRA Problem and P-SROIQ
3.2 Consistency of CADIAG-2
3.2.1 Background
3.2.2 Probabilistic Formalization
3.2.3 Diagnosis and Results
3.3 Reasoning about Ontology Alignments
3.3.1 Ontology Alignments
3.3.2 Probabilistic Formalization
3.3.3 Probabilistic vs. Classical Validation
4 Understanding P-SROIQ
4.1 P-SROIQ as a Generalization of Propositional Probabilistic Logic
4.1.1 Basic Translation
4.1.2 Strength of Entailments
4.2 P-SROIQ as a Fragment of First-Order Probabilistic Logic
4.2.1 Translation of PTBoxes into FOPL
4.2.2 Translation of PABoxes into FOPL
4.3 Properties and Limitations of P-SROIQ
4.3.1 Interpretation of Probabilistic Statements
4.3.2 Representation of Statistics
4.3.3 Representation of Beliefs
4.3.4 Summary
5 Algorithms for Practical Reasoning in P-SROIQ
5.1 The Probabilistic Satisfiability Algorithm
5.1.1 Column Generation Basics
5.1.2 Column Generation-Based PSAT Algorithm
5.1.3 Possible World Generation
5.1.4 Algorithm Analysis
5.1.5 Main Optimizations
5.1.6 Comparison with Propositional PSAT
5.2 Analysis of Probabilistic Knowledge Bases
5.2.1 Finding Minimal Unsatisfiable Subsets
5.2.2 Finding Maximal Satisfiable Subsets
5.3 Probabilistic Consistency Algorithms
5.3.1 Optimized Original Algorithm
5.3.2 Diagnosis-driven Algorithm
5.4 Tight Lexicographic Entailment Algorithm
6 Synthetic Performance Evaluation
6.1 Generic Evaluation Methodology
6.1.1 Test Data Parameters
6.1.2 Test Data Generation
6.1.3 Performance Measures and Data Gathering
6.2 Evaluation Environment
6.3 Random Propositional Knowledge Bases
6.3.1 Random Clauses
6.3.2 Random Concept Hierarchies
6.4 Random Bayesian Knowledge Bases
6.4.1 Knowledge Base Generation
6.4.2 Results
6.5 Probabilistic Extensions of Real Ontologies
6.5.1 Ontology Selection
6.5.2 Results
6.6 Summary of Results
7 Application Performance Evaluation
7.1 Ontology Alignments Validation
7.1.1 Experimental Setup
7.1.2 Results and Discussion
7.2 Analysis of CADIAG-2
7.2.1 Performance Metrics
7.2.2 Finding Inconsistent Fragments
7.2.3 Probabilistic Consistency Evaluation
7.2.4 Lexicographic Entailment Evaluation
8 Pronto: A Practical P-SROIQ Reasoner
8.1 Pronto Overview
8.1.1 Knowledge Base Syntax
8.1.2 Command Line Usage and API
8.2 Architecture
8.2.1 Linear Program Layer
8.2.2 Monotonic Reasoning Layer
8.2.3 Non-monotonic Reasoning Layer
8.3 Configuring Pronto
9 Conclusion
9.1 Summary of Contributions
9.1.1 Practical Reasoning Algorithms and Evaluation
9.1.2 Analysis of P-SROIQ
9.1.3 Applicability of P-SROIQ
9.2 Challenges and Future Directions
A Proofs of Theorems
Bibliography

List of Tables
3.1 Example of a reported association between alcohol intake and the risk of hormone receptor-specific breast cancer (excerpt from [175])
3.2 Characteristics of CADIAG-2's knowledge base
3.3 The size and the number of minimal unsatisfiable sets of various types in ΦCB under the relaxed interpretation.
4.1 Impact of the strength parameter on results of logical and lexicographic entailment (adapted from [135]). Recall that the result [1; 0] means that the tightest probability interval is undefined due to inconsistency.
4.2 Translation of P-ALC formulae into FOPLII
6.1 PSAT performance on random propositional clauses
6.2 PSAT performance on random concept hierarchies of variable size and fixed subsumption density
6.3 PSAT performance on random concept hierarchies of variable subsumption density and fixed size. The column "MILP size" specifies the number of variables and inequalities in the MILP program (5.4) used to generate columns.
6.4 PSAT performance on translations of Bayesian networks with variable tree width into P-SROIQ
6.5 PSAT times for PTBoxes with probabilistic signatures of 250 concept names and variable size
6.6 PSAT times for PTBoxes with 500 probabilistic statements and variable signature size
6.7 PSAT times for PTBoxes with varying number of statements and signature size
6.8 PSAT evaluation results on PTBoxes with fixed size and variable number of unconditional constraints. U% stands for the proportion of unconditional constraints in the PTBox.
6.9 PSAT times on unsatisfiable PTBoxes. H is the proportion of problem instances for which unsatisfiability has been detected by ECD. F is the average number of false invocations of ECD per problem instance. R is the average time savings (in %) that are due to ECD.
7.1 PSAT performance when validating probabilistic ontology alignments
7.2 Performance of PTBox consistency algorithms on fragments of CADIAG-2
7.3 Performance of the TLexEnt algorithm on PTBox and PABox queries against fragments of CADIAG-2
7.4 TLexEnt performance measures against the size of PABox
8.1 The correspondence between Pronto's command line arguments and methods of the API.

List of Figures
1.1 Ductal and lobular cancers are kinds of breast cancer.
1.2 DL axioms which entail that ductal and lobular cancers are kinds of breast cancer.
1.3 The structure of a P-SROIQ knowledge base (probabilistic ontology).
3.1 Joint confidence region for (µ, σ²)
7.1 The average running time of the diagnosis algorithm against the size of CADIAG-2 fragments.
7.2 The average running time of the diagnosis algorithm against the number of incoherent subsets in CADIAG-2 fragments.
7.3 The average running time of the diagnosis algorithm against the number of repairs in CADIAG-2 fragments.
8.1 The layered architecture of Pronto

Abstract
Description Logics (DLs) form a family of languages which correspond to decidable fragments of First-Order Logic (FOL). They have been overwhelmingly successful for constructing ontologies: conceptual structures describing domain knowledge. Ontologies proved to be valuable in a range of areas, most notably, bioinformatics, chemistry, Health Care and Life Sciences, and the Semantic Web.
One limitation of DLs, as fragments of FOL, is their restricted ability to cope with various forms of uncertainty. For example, medical knowledge often includes statistical relationships, such as findings or results.