
UNIVERSITY OF SOUTHAMPTON

FACULTY OF PHYSICAL SCIENCES AND ENGINEERING

School of Electronics and Computer Science

Incremental Rule-based Reasoning on Semantic Data Streams

by

Rehab Albeladi

Thesis for the degree of Doctor of Philosophy

July 2016

UNIVERSITY OF SOUTHAMPTON

ABSTRACT

FACULTY OF PHYSICAL SCIENCES AND ENGINEERING

School of Electronics and Computer Science

Thesis for the degree of Doctor of Philosophy

INCREMENTAL RULE-BASED REASONING FOR SEMANTIC DATA STREAMS

Rehab Albeladi

This thesis investigates the area of semantic stream processing, in which data streams are combined with semantic reasoning techniques. We have investigated techniques for rule-based reasoning over semantic streams in which reasoning is implemented natively over streams as data flow networks, and have developed an adaptive optimisation method to cope with the changing nature of streams. The contributions of this thesis include R4, a native rule-based reasoner for RDF streams using the Rete algorithm, and a cost-based adaptive plan optimiser designed for RDF streams. We have evaluated the performance of R4 and compared it to both a typical static reasoner and to the state of the art in stream reasoners. The results show that R4 significantly outperforms these reasoners in terms of throughput. We have also evaluated the adaptive optimisation technique, with results that show the ability of the optimiser to devise and adopt better-performing plans at runtime.

Table of Contents

Table of Contents ...... i
List of Tables ...... v
List of Figures ...... vii
List of Listings ...... xi
DECLARATION OF AUTHORSHIP ...... xiii
Acknowledgements ...... xv
Abbreviations ...... xvii
Chapter 1: Introduction ...... 1

1.1 Research Hypotheses ...... 5
1.2 Contributions ...... 6
1.3 Thesis Structure ...... 6
1.4 Publications ...... 8

Chapter 2: Background Research ...... 9

2.1 Data Stream Processing...... 9

2.1.1 Data Stream Management Systems ...... 9
2.1.2 Querying Data Streams ...... 13
2.1.3 Continuous Query Optimisation ...... 17
2.1.4 Distributed Stream Processing ...... 21
2.1.5 Complex Event Processing ...... 24

2.2 The Semantic Web ...... 25

2.2.1 Knowledge Representation and Reasoning Techniques ...... 26
2.2.2 Knowledge Representation on the Semantic Web ...... 31
2.2.3 Existing Semantic Reasoners ...... 38

2.3 Conclusion ...... 40

Chapter 3: Semantic Stream Processing ...... 41

3.1 Processing RDF Streams ...... 41
3.2 Reasoning on Semantic Streams ...... 47

3.2.1 Lightweight stream reasoning ...... 47


3.2.2 Complex stream reasoning ...... 48

3.3 Publishing Semantic Streams ...... 49
3.4 Distributed Semantic Stream Processing ...... 50
3.5 Developed Semantic Streams Environments ...... 51
3.6 Benchmarking ...... 52
3.7 Conclusion ...... 53

Chapter 4: Continuous Reasoning ...... 55

4.1 Requirements ...... 55
4.2 Continuous reasoning framework for RDF streams ...... 59

4.2.1 Data Model ...... 61
4.2.2 Operators ...... 64

4.3 R4: Rule-based Reasoner for RDF streams using Rete ...... 78

4.3.1 Rule Language ...... 79
4.3.2 System architecture ...... 81
4.3.3 Data Processing using Rete ...... 83

4.4 Conclusion ...... 88

Chapter 5: Evaluating R4 ...... 91

5.1 Evaluation scenario ...... 91

5.1.1 Datasets ...... 91
5.1.2 Functionality Tests ...... 94

5.2 Comparative Evaluation ...... 99

5.2.1 Comparing Stream Reasoning to Static Reasoning ...... 99
5.2.2 Comparing to State-of-the-art Stream Reasoning Systems ...... 108

5.3 Conclusion ...... 113

Chapter 6: Optimisation ...... 115

6.1 Initial Rete Network Generation: Static Optimisation ...... 115
6.2 Adaptive Optimisation ...... 116
6.3 Cost Model ...... 119

6.3.1 Constant costs ...... 120


6.3.2 Estimating join’s selectivity (f) ...... 120

6.3.3 Estimating window sizes (Wo) ...... 122

6.3.4 Estimating output rates (λo) ...... 123

6.3.5 Insertion cost (Cinsert) ...... 124

6.3.6 Invalidation cost (Cinvalidate) ...... 124

6.3.7 Probing cost (Cprobe) ...... 125

6.3.8 Result generation cost (Cresult) ...... 125

6.4 Monitoring ...... 125
6.5 Optimisation algorithm ...... 126

6.5.1 Optimal plan algorithm ...... 126
6.5.2 Greedy algorithm ...... 128

6.6 Plan migration ...... 129
6.7 Conclusion ...... 130

Chapter 7: Evaluating R4’s Optimiser ...... 131

7.1 Quality of results vs. window size ...... 131
7.2 Operator Sharing ...... 134
7.3 Evaluating the adaptive optimiser ...... 137

7.3.1 Verifying the cost model ...... 137
7.3.2 Optimisation performance under stable conditions ...... 142
7.3.3 Optimisation performance under unstable conditions ...... 151

7.4 Conclusion ...... 154

Chapter 8: Conclusions and Future Work ...... 157

8.1 Summary ...... 157

8.1.1 Contributions ...... 158

8.2 Future Work ...... 159

8.2.1 Adaptive window size ...... 160
8.2.2 Expressivity ...... 160
8.2.3 Distribution ...... 161

Appendices ...... 167


Appendix A RDFS++ Background Reasoning ...... 169
Appendix B Raw Comparative Evaluation Data ...... 173

B.1 R4 vs. Jena vs. JenaRete ...... 173
B.2 R4 vs. Sparkwave and Etalis ...... 179

Appendix C Raw Optimisation Evaluation Data ...... 181

C.1 Window size vs. completeness ...... 181
C.2 Operator sharing ...... 182
C.3 Adaptive optimisation ...... 184

C.3.1 Cost model experiments ...... 184
C.3.2 Comparing static and adaptive plans ...... 186

Bibliography ...... 191


List of Tables

Table 2.1: Comparison between DBMS and DSMS ...... 11

Table 2.2: Stream query languages features (Golab and Ozsu, 2003) ...... 16

Table 2.3: Comparison of pattern matching algorithms ...... 31

Table 2.4: Comparison of semantic reasoners ...... 40

Table 3.1: Comparison of RDF Stream Processing approaches ...... 46

Table 4.1: Addressing the requirements in the related RDF stream processing systems ...... 58

Table 4.2: Requirements and design decisions ...... 61

Table 5.1: Input datasets ...... 93

Table 5.2: All different settings in experiment 2 ...... 101

Table 5.3: Processing time results for the experiment 1 (pushing the whole dataset) ...... 102

Table 5.4: Average response time for all settings in experiment 2 ...... 107

Table 5.5: Comparing processing times (in seconds) of R4, Sparkwave, and Etalis ...... 111

Table 6.1: Cost model terms ...... 120

Table 7.1: Recall and response time for different window sizes ...... 133


List of Figures

Figure 1.1: Background research areas ...... 4

Figure 2.1: DSMS model ...... 10

Figure 2.2: The Semantic Web layer cake (Domingue et al., 2011) ...... 26

Figure 2.3: Rete network example ...... 29

Figure 2.4: An example RDF triple ...... 32

Figure 2.5: An example RDF graph ...... 32

Figure 4.1: Rete network example (RDFS rules 9 and 11) ...... 79

Figure 4.2: R4 system architecture ...... 82

Figure 5.1: The SSN ontology, key concepts and relations, split by conceptual modules (Compton et al., 2012) ...... 92

Figure 5.2: Rule 1 network ...... 95

Figure 5.3: Rule 2 network ...... 96

Figure 5.4: Rule 3 network ...... 97

Figure 5.5: Rule 4 network ...... 99

Figure 5.6: Processing time results for the experiment 1 (see table 5.3) ...... 102

Figure 5.7: Throughput results for the experiment 1 ...... 103

Figure 5.8: Response time for experiment 2, setting 1 (24 hours worth of data, updated every half an hour, window size 10 hours) ...... 104

Figure 5.9: Response time for experiment 2, setting 2 (24 hours worth of data, updated every half an hour, window size 5 hours) ...... 104

Figure 5.10: Response time for experiment 2, setting 3 (24 hours worth of data, updated every hour, window size 10 hours) ...... 105


Figure 5.11: Response time for experiment 2, setting 4 (24 hours worth of data, updated every hour, window size 5 hours) ...... 105

Figure 5.12: Response time for experiment 2, setting 5 (24 hours worth of data, updated every hour, window size 2 hours) ...... 106

Figure 5.13: Response time for experiment 2, setting 6 (24 hours worth of data, updated every hour, window size 1 hour) ...... 106

Figure 5.14: Response time for experiment 2, setting 7 (5 days worth of data, updated every ~11 hours, window size ~44 hours) ...... 107

Figure 5.15: Processing time of R4, Sparkwave, and Etalis ...... 113

Figure 5.16: Input throughput of R4, Sparkwave, and Etalis ...... 113

Figure 6.1: Adaptive optimisation process ...... 118

Figure 7.1: Window size vs. recall ...... 133

Figure 7.2: Two separate Rete networks for two rules without sharing ...... 135

Figure 7.3: Shared Rete network for two rules ...... 135

Figure 7.4: Total memory sizes for shared and unshared plans ...... 136

Figure 7.5: Individual memories growth over time ...... 136

Figure 7.6: Estimated and measured costs for Rule 1 plans...... 140

Figure 7.7: Measured costs vs. latency for Rule 1 plans ...... 141

Figure 7.8: Measured costs for Rule 2 plans ...... 142

Figure 7.9: Estimated costs for Rule 2 plans ...... 143

Figure 7.10: Average plans’ costs using different window sizes for Rule 1 ...... 145

Figure 7.11: Average plans’ costs using different window sizes for Rule 2 ...... 146

Figure 7.12: Average plans' costs using different input rates for Rule 1 ...... 147

Figure 7.13: Average plans' costs using different input rates for Rule 2 ...... 147

Figure 7.14: Average plans' costs using different selectivities for Rule 1 ...... 148

Figure 7.15: Average plans' costs using different selectivities for Rule 2 ...... 149


Figure 7.16: Average plan’s cost for an increasing number of joins in Rule 1 ...... 151

Figure 7.17: Average plan’s cost for an increasing number of joins for Rule 2 ...... 151

Figure 7.18: Adaptivity while changing input rates ...... 153

Figure 7.19: Adaptivity while changing stream selectivities ...... 154

Figure 8.1: A distributed dataflow network ...... 162

Figure 8.2: Moving operator upstream ...... 165


List of Listings

Listing 2.1: An example RIF rule document ...... 38

Listing 4.1: External-to-internal operator ...... 65

Listing 4.2(a): Example external stream input ...... 66

Listing 4.2(b): Output stream of an external-to-internal operator ...... 66

Listing 4.3: Graph-to-triples operator ...... 66

Listing 4.4(a): Example input stream of a graph-to-triples operator ...... 67

Listing 4.4(b): Example output stream of a graph-to-triples operator ...... 67

Listing 4.5: Filter operator ...... 68

Listing 4.6(a): Example input stream of a filter operator ...... 69

Listing 4.6(b): Output stream of a filter operator (fp=(?x, p2, ?y)) ...... 69

Listing 4.7: Window operator (time-based) ...... 69

Listing 4.8(a): Example input stream of a window operator ...... 70

Listing 4.8(b): Example output stream of a window operator (w=10) ...... 70

Listing 4.9: Auxiliary function 'removeExpired' algorithm ...... 71

Listing 4.10: Auxiliary function 'probe' algorithm ...... 71

Listing 4.11: Join operator (left activation) ...... 71

Listing 4.12(a): Example left input stream of a join operator ...... 74

Listing 4.12(b): Example right input stream of a join operator ...... 74

Listing 4.12(c): Example output stream of a join operator ...... 74

Listing 4.13: Aggregation operator ...... 75

Listing 4.14: max aggregate function ...... 76

Listing 4.15: sum aggregate function (incremental) ...... 77


Listing 4.16(a): Example input stream of a 'max' aggregation operator ...... 77

Listing 4.16(b): Example output stream of a 'max' aggregation operator ...... 77

Listing 4.17: Map operator ...... 77

Listing 4.18(a): Example input stream of a map operator ...... 78

Listing 4.18(b): Example output stream of a map operator ...... 78

Listing 4.19: Extended RIF Core syntax (extensions are shown in bold) ...... 81

Listing 4.20: Alpha memory operations...... 86

Listing 5.1: A CCO observation represented in RDF notation ...... 93

Listing 5.2: A RIF Core rule that entails the offer price for all products of a specific type that are on offer ...... 110

Listing 6.1: Optimal plan algorithm ...... 127

Listing 6.2: Greedy plan algorithm...... 129

Listing 7.1: Completeness of results experiment rule ...... 132

Listing 7.2: Operator sharing experiment rule ...... 134

Listing 7.3: Triple patterns of Rule 1 ...... 139

Listing 7.4: Triple patterns of Rule 2 ...... 139

Listing 7.5: Triple patterns added to Rule 1 ...... 149

Listing 7.6: Rule 2 after joining one more stream ...... 150


DECLARATION OF AUTHORSHIP

I, Rehab Albeladi, declare that this thesis and the work presented in it are my own and have been generated by me as the result of my own original research.

Incremental Rule-based Reasoning on Semantic Data Streams

I confirm that:

1. This work was done wholly or mainly while in candidature for a research degree at this University;

2. Where any part of this thesis has previously been submitted for a degree or any other qualification at this University or any other institution, this has been clearly stated;

3. Where I have consulted the published work of others, this is always clearly attributed;

4. Where I have quoted from the work of others, the source is always given. With the exception of such quotations, this thesis is entirely my own work;

5. I have acknowledged all main sources of help;

6. Where the thesis is based on work done by myself jointly with others, I have made clear exactly what was done by others and what I have contributed myself;

7. Parts of this work have been published as: [please list references below]:

Signed: ......

Date: ......


Acknowledgements

I would like to thank my family for their endless love and support, in particular my mother Saleha and father Awadh, who do everything they can to support me and have encouraged me through my journey. Special thanks also to my beloved husband Abdulrahman for his continuous support and for maintaining a balance between his responsibility as a father to our children and his own PhD project (I congratulate him for passing his viva last week!). I am also thankful to my son Yazeed and my miracle daughter Serene who have had to excuse me when I have been busy and have not had enough time for them.

In the academic sphere, I would like to express my sincerest gratitude to my supervisor, Dr Nicholas Gibbins, who has supported me throughout my doctorate with critical thinking and guidance at every step. I was lucky to have been supervised by him. Many thanks also to my second supervisor, Prof. Kirk Martinez, for his help and support, especially with the streaming dataset from the SemsorGrid4Env project.

I also wish to thank the Government of Saudi Arabia and Taibah University for their Scholarship programme, which provided me with a generous scholarship and continuing financial support.


Abbreviations

IoT Internet of Things

DBMS Data Base Management System

DSMS Data Stream Management System

CEP Complex Event Processing

RDF Resource Description Framework

RDFS RDF Schema

OWL Web Ontology Language

SPARQL SPARQL Protocol And RDF Query Language

RIF Rule Interchange Format

IRI Internationalised Resource Identifier

RSP RDF Stream Processing

SSN Semantic Sensor Network

CCO Channel Coast Observatory



Chapter 1

Chapter 1: Introduction

The Internet and the Web are constantly evolving. On the one hand, the growing use of sensors and embedded devices has given rise to a vision of a future Internet called the Internet of Things (IoT), which aims to interconnect all these devices through a global network (Atzori et al., 2010). By providing Internet connectivity to “smart” embedded devices, computers will be able to automatically identify, monitor, react to, and perform actions with regard to everyday objects. The IoT will use technology such as radio frequency identification (RFID) (Finkenzeller, 2010) to uniquely identify objects and apply different event processing mechanisms to process the data streams generated by devices.

According to Sundmaeker et al. (2010), there were approximately 1.5 billion Internet-connected personal computers (PCs), in addition to over 1.0 billion Internet-connected mobile phones. As the present “Internet of PCs” moves towards the IoT, 50–100 billion devices are expected to be connected to the Internet by 2020 (Sundmaeker et al., 2010). The IoT has received considerable interest, and has been extensively investigated by academia (Atzori et al., 2010), industry (Da Xu et al., 2014), and governments (Vermesan et al., 2011) in order to achieve its objectives.

Middleware architectures proposed for the IoT in recent years have often followed the service-oriented architecture (SOA) approach (Atzori et al., 2010). SOA principles allow complex systems to be decomposed into applications consisting of simpler, well-defined components. Using common interfaces and standard protocols, SOA helps with the integration of enterprise applications. However, SOA solutions are often too heavyweight for devices with limited capabilities (Guinard et al., 2009).

Another suggested approach is to integrate real-world objects into the Web using a resource-oriented architecture, building the Web of Things. Guinard et al. (2010) applied the representational state transfer (REST) architectural style to create loosely coupled services on the Web so that they can be easily reused. In this approach, smart objects become first-class citizens on the Web, as every object is identified by a Uniform Resource Identifier (URI).

On the other hand, as the current Web is becoming the largest medium of information, many researchers are working on the Semantic Web, a vision of the future Web which aims to enable computers to understand the meaning of Web content (Berners-Lee et al., 2001). This will enable software agents to access the Web and carry out intelligent tasks on behalf of the user. Data on the Semantic Web have to be given well-defined meanings to be machine processable. A number of formats have been standardised, such as the Resource Description Framework (RDF) (Cyganiak et al., 2014), RDF Schema (Brickley et al., 2014), and the Web Ontology Language (OWL) (Hitzler et al., 2012). The aim of these formats is to structure and give semantics to Web data, which will then enable automatic reasoning over and processing of this data.

Although the IoT focuses on infrastructure issues and the Semantic Web places more emphasis on knowledge representation (they essentially work at different layers), the two visions both aim to interlink the virtual and physical worlds. The IoT commonly identifies real-world objects using unique RFID tags, while the Semantic Web uses URIs to identify them uniquely. The two visions can therefore complement each other.

In both areas, data are produced continuously, and sometimes massively, by applications that are becoming more and more data driven. For instance, increasingly popular sensing devices are currently used to generate environmental observations, monitor patient conditions, track locations, and observe energy consumption (Fang et al., 2014; Myung et al., 2002; Wei and Li, 2011). On the Web side, microblogging services such as Twitter also deliver real-time streaming data (Sakaki et al., 2010). This class of data differs significantly from the typical, mostly static, data model, with changes being the rule rather than the exception. This static data model (typically a relational model) does not consider the temporal and ordered nature of data streams: data are stored, and ad hoc queries are issued to process them in a pull-based fashion. Streaming data, on the contrary, can arrive at a higher rate than that at which they can be stored and processed, and therefore need to be handled on the fly in a push-based manner, using continuous queries. A special class of management systems, called data stream management systems (DSMSs), has appeared to provide such functionality (Babcock et al., 2002).

While DSMS engines can be used to process the data streams generated by IoT devices, the heterogeneity of the data sources and formats makes interoperability and integration a real challenge. As the Semantic Web’s standard formats (RDF and OWL) and principles (Bizer et al., 2009) facilitate the integration of heterogeneous datasets, there have been efforts to lift stream data to a semantic level (Sheth et al., 2008). Incorporating semantics would also enable inferencing capabilities, a feature that is not available in DSMSs owing to the lack of semantics in their data stream models.


Inference enables the automatic discovery of new information (Lucas and Van Der Gaag, 1991). On the Semantic Web, data are modelled as a set of relationships between resources, so inferencing means the generation of new relationships based on the dataset and some background knowledge in the form of vocabularies or rule sets. A number of reasoning engines for the Semantic Web, with different expressivity levels, have appeared, such as Bossam (Jang and Sohn, 2004), Pellet (Sirin et al., 2007), and FaCT++ (Tsarkov and Horrocks, 2006). However, they mostly assume the knowledge base to be static or slowly changing. Enabling reasoning over rapidly changing semantic streams would require a shift similar to that from database management systems to data stream management systems. An example where reasoning is needed to derive new facts in a streaming setting is described by Cugola and Margara (2012) as follows: in a smart building, there could be a rule that produces a fire event when it observes a very high temperature together with a smoke event within a short time window. This rule needs to be applied continuously, using a time-aware model, to changing streams of events.

The processing of semantically annotated dynamic data gives rise to semantic stream processing, which has received increasing interest from the Semantic Web community over the past few years; a W3C community group for RDF Stream Processing (RSP) was created in 2013 (https://www.w3.org/community/rsp/). We consider semantic stream processing to lie at the intersection of Data Stream Management research and the Semantic Web, because it builds its techniques on both. The Internet of Things is also relevant, as it provides a context in which applications generate and process streaming data (Figure 1.1).


Figure 1.1: Background research areas

Semantic stream processing aims to provide the abstractions, foundations, methods, and tools required to integrate data streams and semantic processing systems (Della Valle et al., 2009). Challenges in this area include, but are not limited to, the following:

• Defining a data model for semantic streams that reflects their ordered and temporal nature

• Supporting continuous queries

• Integrating background data and dynamic streams

• Providing reasoning capabilities

• Dealing with incomplete and noisy streams

• Distributed stream processing

While most of these challenges are similar to those faced by the database community when developing stream processing systems, those efforts cannot be directly applied to the semantic stream processing area because of the fundamental differences between the relational models used in DSMSs and the graph-based RDF model of semantic data. A number of semantic stream processing systems (Barbieri et al., 2010a; Bolles et al., 2008; Calbimonte et al., 2010; Anicic et al., 2011; Le-Phuoc et al., 2011) have been developed in this area, and they mostly address the continuous query challenge through extensions of SPARQL (Harris et al., 2013), the Semantic Web’s query language. They also define an extension of the RDF data model to express the temporal element.

In this research, we mainly focused on the reasoning aspect, with the aim of enabling rule-based reasoning over RDF data streams as continuously running data flow networks. We propose the use of an incremental reasoning framework with low-level operators applied directly over the RDF streams. This RDF-native approach offers maximum optimisation opportunities, which have a major impact on response time, memory consumption, and completeness of results. We also investigated the optimisation problem itself, i.e. translating a set of rules into a set of processing data flow networks. We consider the adaptivity of such network structures to be an essential requirement to cope with the constantly changing nature of the data streams.

That said, we addressed rule-based reasoning on semantic streams under a number of limiting assumptions. Firstly, streams in real-world applications are potentially delayed, incomplete, imprecise, or noisy. This can lead to query results of unknown quality, so methods to clean the data are required, as are algorithms for reasoning under uncertainty. To limit the scope of this research, we assumed that the arriving streams were ordered and complete. Secondly, continuous query languages in the literature usually allow both the addition and the deletion of stream elements, which then requires truth maintenance methods to ensure that the data remain consistent. We assumed that the input streams were append-only and that the generated results were streams too.

1.1 Research Hypotheses

The main focus of this research was on enabling efficient and effective reasoning over semantic streams. Efficiency refers to the way in which the system handles the input data. Data streams usually arrive at high volume and velocity, so input throughput is an important measure of stream processing system performance. The resources within a stream processing system must be managed carefully owing to the high volume of streams and because storing all the incoming data is usually impractical. Effectiveness, on the other hand, is reflected by the quality of the results produced by the system. Precision and recall (the correctness and completeness of the retrieved results) are commonly used metrics in general information retrieval systems. The timeliness of the results is another effectiveness metric of prime importance in streaming applications, especially as results lose their value in the event of long delays. Stream processing systems should be highly responsive.

We formulated our objectives based on the following research question and hypotheses:

How could we efficiently (by using minimal resources and ensuring high throughput) and effectively (by providing timely results with high precision and recall) enable rule-based reasoning over RDF data streams using data flow networks?

• Hypothesis 1: It was anticipated that our continuous reasoning approach would improve throughput and responsiveness, when compared to a traditional static reasoner.

• Hypothesis 2: It was expected that the trade-off between the completeness of the output results and the processing time could be controlled by varying the resource allocation.

• Hypothesis 3: It was anticipated that resource usage would be reduced by our approach of enabling node sharing, where possible, between the rules.


• Hypothesis 4: It was anticipated that system performance would be improved by monitoring the characteristics of the streams in order to re-organise the reasoning networks.

1.2 Contributions

The contributions of this thesis are:

• A native rule-based reasoner for RDF streams using dataflow networks. Our reasoner supports RIF Core (Boley et al., 2013), which corresponds to the language of definite Horn rules without function symbols, equivalent to Datalog. While CQELS (Le-Phuoc et al., 2011) and Sparkwave (Komazec et al., 2012), state-of-the-art RDF stream processing systems, support native processing of RDF streams, CQELS does not support reasoning, and Sparkwave provides lower expressivity, covering only a subset of RDF Schema.

• A cost-based adaptive optimiser designed for RDF streams. In the RDF stream processing area, only CQELS has addressed the issue of adaptive optimisation. CQELS follows a fine-grained routing-based approach, while we follow a more coarse-grained plan-based approach.

• An extension of RIF Core that adds a window construct to the language, enabling users to specify time constraints as part of their rules.

1.3 Thesis Structure

The remainder of this thesis is structured as follows. Chapter 2 provides background material related to our work, first in the area of stream processing and continuous queries, including a review of existing stream processing systems and continuous query languages (Section 2.1), and then with regard to the Semantic Web (Section 2.2), with a focus on the knowledge representation of its data and the associated reasoning techniques.

The literature on semantic stream processing specifically is covered in Chapter 3, which reviews research advances in different aspects of the field, including RDF stream processing, reasoning and inference support, publishing, distributed processing, developed environments, and benchmarking of semantic stream processing engines.

Details of our reasoning framework and implemented system are given in Chapter 4. A number of general requirements for enabling efficient reasoning over data streams are set out in Section 4.1. The adopted reasoning framework is then presented in Section 4.2, starting with a justification of the design decisions that were taken by linking them to the requirements. An RDF stream data model is then defined, followed by an operational description of the different operators needed for the continuous processing of data streams. R4, our rule-based reasoner for RDF streams using the Rete algorithm (Forgy, 1982), is covered in Section 4.3. The rule language supported by the system is discussed first, followed by the system architecture and, finally, several aspects of data processing within the system. The different nodes through which the data passes from input to output are detailed, together with other related issues, such as data structures and garbage collection.

An evaluation of the implemented system is then presented in Chapter 5. An evaluation scenario, including a description of the input datasets and a number of use cases designed to test system functionality, is described first in Section 5.1. Section 5.2 then details a number of experiments that were carried out to comparatively evaluate the system’s performance, first against a static reasoner (to test the first hypothesis), and then against state-of-the-art semantic stream processing systems that support inference.

Chapter 6 then addresses the optimisation problem. We first describe the initial plan generation using a static optimiser in Section 6.1, and then present our adaptive optimisation approach in Section 6.2. We present the cost model used to estimate the cost of executing the plans, and two optimisation algorithms used to generate more cost-effective plans. Finally, we describe the monitoring and plan migration employed in the system.

An evaluation of the optimisation process is presented in Chapter 7. Section 7.1 tests the effects of reducing window sizes on the quality of results. The operator sharing optimisation technique is evaluated in Section 7.2. Section 7.3 is dedicated to evaluating the adaptive optimiser. It first verifies the cost model by comparing the actual and estimated costs of different plans. Then, the costs of the adaptive plans generated by the adaptive optimiser are compared to those of the static plans, under both stable and unstable conditions.

Finally, concluding remarks are provided in Chapter 8, and the contributions of this research, together with suggested future work, are highlighted.


1.4 Publications

Parts of this research have been published, as follows:

• Albeladi, R., Martinez, K. and Gibbins, N. (2015) Incremental rule-based reasoning over RDF streams. In RDF Stream Processing Workshop at the 12th Extended Semantic Web Conference, Portoroz, Slovenia.

• Albeladi, R. (2012) Distributed reasoning on semantic data streams. In International Semantic Web Conference (pp. 433–436). Springer Berlin Heidelberg.

• Albeladi, R., Martinez, K. and Gibbins, N. (2012) Distributed stream reasoning. At the poster track of the 9th Extended Semantic Web Conference, Heraklion, Greece.


Chapter 2: Background Research

The Semantic Stream Processing area builds on knowledge from both the Data Stream Processing and the Semantic Web areas. In this chapter, we present a background review of both. Section 2.1 reviews research in the stream processing area, highlighting the differences between a database management system and a data stream management system, briefly introducing some of the stream processors that have been developed, and discussing techniques including continuous queries, adaptive optimisation, and distributed processing of streaming data. The Semantic Web is then reviewed in Section 2.2, with a focus on semantic data formats and related reasoning techniques.

2.1 Data Stream Processing

The emergence of data-intensive applications such as sensor networks and IoT applications has created a number of requirements that cannot easily be met by traditional database systems, owing to the sheer amount of data with which these applications operate (Stonebraker et al., 2005). Such applications require the ability to capture, analyse, and react to events in real time. Data in these applications need to be modelled as transient data streams rather than as typical persistent relations.

A data stream is a real-time, continuous sequence of items, ordered implicitly by arrival time or explicitly by timestamps (Golab and Ozsu, 2003). Data streams differ from stored relations in several ways: the data elements in the stream arrive online, they are potentially unbounded in size, the system has no control over the arrival order of the data elements, and the data elements may be discarded after being processed. These differences raise a number of challenges, which have been addressed by many researchers in the database community (Arasu et al., 2003; Abadi et al., 2003; Hammad et al., 2004).

2.1.1 Data Stream Management Systems

DBMS vs. DSMS: In order to describe a data stream management system (DSMS), we compare it to a conventional database management system (DBMS). DBMSs are being successfully used in a wide variety of domains and applications. They use the ‘store and query’ model, where they permanently store the data and then evaluate queries against the stored data (Garcia-Molina et al., 2000). Data stream applications do not fit this model, as the stream elements can arrive faster than they can be stored (Golab and Ozsu, 2003).


Therefore, a new specialised class of management systems that can perform real-time analysis over streaming data has appeared. In data stream management systems (DSMSs), data arrive as one or more continuous data streams, over which a special type of query, called a continuous query, can be evaluated. Continuous queries (Terry et al., 1992) are similar to traditional database queries, except that they are issued only once and then run continually over the changing data. When newly arriving data matches a running continuous query, new results are returned to the client. Figure 2.1 depicts the general model of data stream management systems.


Figure 2.1: DSMS model

The operational model of a DSMS differs from that of a DBMS. In DBMSs, humans actively update the dataset and initiate queries, and the DBMS acts as a passive repository, simply executing the queries on the dataset. Abadi et al. (2003) refer to this as a human-active, DBMS-passive (HADP) model. In contrast, DSMSs receive data from various external sources, such as sensors, and not from humans. The DSMS plays an active role, continuously processing this data and alerting humans when anomalies are detected. This is therefore called the DBMS-active, human-passive (DAHP) model (Abadi et al., 2003).

DBMSs and DSMSs differ not only in their basic models, but also in the nature of the data that they handle and the queries that can be performed on this data. Table 2.1 illustrates the major differences between them (Abadi et al., 2003; Babcock et al., 2002).


Table 2.1: Comparison between DBMS and DSMS

           DBMS                                         DSMS

Model      HADP (Human-Active, DBMS-Passive)            DAHP (DBMS-Active, Human-Passive)
           Only the current state of the data           The history of the data is also important
           is important
           Applications require no real-time services   Applications have real-time requirements
           Triggers and alerters are second-class       Trigger-oriented applications
           citizens

Data       Persistent relations                         Transient streams
           Random access                                Sequential access
           Relatively low update rate                   Possibly high arrival rate
           Data elements are synchronised               Data arrives asynchronously

Queries    One-time queries                             Continuous queries
           Queries have exact answers                   Answers computed with incomplete information

Existing DSMSs: A large number of data stream management systems have been developed. Some of them are briefly introduced below, while their continuous query languages are described later.

TelegraphCQ (Chandrasekaran et al., 2003) is an adaptive query engine for processing continuous queries over data streams, such as sensor data. TelegraphCQ is focused on meeting the challenges of handling large numbers of continuous queries over high-volume, highly variable data streams. It differs from other data stream systems in its focus on extreme adaptivity and the novel infrastructure required to support such adaptivity.

Aurora (Abadi et al., 2003) is a data stream management system for monitoring applications. It is essentially a dataflow system in which data elements flow through a loop-free, directed graph of processing operators. Aurora performs compile-time and run-time optimisation. Moreover, it detects resource overload and performs load shedding.

STREAM (Arasu et al., 2003) is a general-purpose data stream management system produced by Stanford University. It translates declarative queries into physical query plans composed of operators, queues, and synopses. STREAM also performs load-shedding, if needed, by introducing approximations. Furthermore, it provides a graphical interface to enable users to monitor and manipulate query plans as they run.


Nile (Hammad et al., 2004) is the query engine of the “STEM” stream database system. Nile performs efficient pipelined execution of sliding window queries over multiple streams. Another feature of Nile is its scalability in terms of the sizes of both queries and data streams. It also provides quality-of-service and quality-of-answer guarantees. Further features include support for approximation and integrated online data mining tools.

After this first generation of stream processing engines was developed in academia, a number of commercial and open-source large-scale streaming data analytics platforms appeared, including Apache Storm (http://storm.apache.org/), Yahoo’s S4 (http://incubator.apache.org/s4/), and Spark Streaming (http://spark.apache.org/streaming/). Storm was the first distributed stream processing system to gain traction throughout research and practice (Wingerath et al., 2016). It was developed by Twitter in late 2010 and eventually became an Apache project in 2014. Storm’s architecture is based on the master-workers paradigm. Optimised for low latency, Storm is able to achieve single-digit millisecond latencies in certain applications (Wingerath et al., 2016).

In contrast to previous stream processing systems, which process one tuple at a time, Spark Streaming (Zaharia et al., 2013) uses a micro-batching approach. As an extension of Spark (https://spark.apache.org/), a fast, general-purpose cluster computing system, Spark Streaming chunks incoming streams into small batches, forming Spark’s Resilient Distributed Datasets (RDDs). An RDD is an immutable, deterministically recomputable, distributed and fault-tolerant dataset (Zaharia et al., 2012). In contrast to Storm, Spark Streaming is optimised for high throughput (Kipf and Kemper, 2016).

PipelineDB (https://www.pipelinedb.com/) is an open-source project that continuously runs SQL queries on streaming data. It extends the PostgreSQL database system (https://www.postgresql.org/) by introducing the concept of continuous views. A continuous view differs from a regular view in that it selects its inputs from a combination of streams and tables and is incrementally updated in real time as new data are written to those inputs. PipelineDB excels at SQL queries that reduce the cardinality of streaming datasets through summarisation and aggregation.

A recent development in the area is the ReStream project (Schleier-Smith et al., 2016), developed at the University of California, Berkeley. ReStream is designed for accelerated replay. It processes stored event logs in parallel, with throughput much higher than the real-time rate. This enables rapid development of applications that require backtesting, as developers can evaluate new functionality using weeks’ or months’ worth of stored data in minutes. ReStream combines streaming semantics with the performance characteristics of batch processing, enabling serially equivalent processing of stored logs using distributed computing resources.

2.1.2 Querying Data Streams

Queries over data streams have much in common with queries in DBMSs. However, there is an important difference in the execution model between one-time queries and continuous queries. One-time queries, as used in traditional database systems, are evaluated once over a point-in-time snapshot of the dataset, with the answer returned to the user after the query evaluation has finished. In contrast, the class of queries used with data streams are called continuous queries. These queries are issued once, operate continuously over a period of time, and incrementally return new results as new data arrives (Babu and Widom, 2001). The different nature of continuous queries raises a number of challenges, such as their unbounded memory requirements, the need for approximate query answering, the problem of blocking operators, and the need to reference past data (Babcock et al., 2002). Some relevant challenges are described below.

As data streams can be of unbounded size, query answers potentially require unbounded memory to store them. Moreover, even if there is no need to store the answers, as they can be provided as data streams themselves, some queries, such as joins, need unbounded memory to compute exact results. Although there are external memory algorithms (Vitter, 1999) that can handle data requiring more space than is available in main memory, these algorithms are too slow to provide real-time responses. Arasu et al. (2003) differentiated between queries that can be answered exactly given bounded memory and those which need unbounded memory. In the latter case, providing approximate answers is a possible solution, which is in itself another challenge.

There are several approaches to approximate query answering. These include using sliding windows, batch processing, sampling, and synopses. Instead of evaluating the query over the whole history of data, the first approach evaluates queries over windows of recent data. According to Babcock et al. (2002), this method has several attractive properties. It is a well-defined, easily understood, deterministic method. More importantly, it emphasises recent data, which is usually more important to users than old data in most real-world applications. For example, to make sense of network traffic patterns or sensor data, insights based on recent data will be more useful than insights based on old data. In this sense, windows can be thought of not as an approximation technique reluctantly applied due to the inability to process queries over all historical data, but rather as part of the desired query semantics explicitly expressed by the user.

As in Arasu et al. (2003), windows can be tuple-based, where an integer parameter specifies the number of most recent tuples (those with the largest timestamps) to return, or time-based, where a time interval parameter is set and the returned tuples should have timestamps falling within this interval. Windows can also be classified based on the movements of their endpoints: two fixed endpoints define a fixed window, two sliding endpoints define a sliding window, while one fixed and one moving endpoint define a landmark window (Golab and Ozsu, 2003).

There are two methods for sliding the window over its elements. The first is to specify a periodic value, either a number of tuples or of time units depending on the window type, at which the window updates its contents and triggers a new evaluation. This value is usually called the slide or step; for example, a sliding window over the last ten minutes with a step of two minutes. The second method is the eager, or data-driven, approach, in which the window is updated automatically whenever a new element arrives on the stream, causing another evaluation of the query. This means that results are generated as soon as a new tuple arrives, whereas in the first approach there is an added delay of up to the step value.

In terms of performing continuous queries over sliding windows, there are also two main approaches: query re-evaluation and incremental evaluation (Ghanem et al., 2007). In the first approach, each time the window is updated, the query is re-evaluated over a snapshot of the stream representing the current window, independently of previous windows. While this approach has clear semantics and is easy to implement, it can result in redundant processing of certain tuples, because they can be part of several consecutive windows. Conversely, in the incremental evaluation approach, only the changes in the window relative to the previous window are processed by the query operators.
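To make these window and evaluation semantics concrete, the following sketch (in Python; all names are hypothetical and it is not taken from the thesis) maintains a time-based sliding window incrementally: each arriving tuple is inserted, expired tuples are invalidated, and the downstream query is notified either eagerly on every arrival or only at step boundaries, receiving just the inserted and expired deltas rather than the whole window.

    from collections import deque

    class TimeBasedSlidingWindow:
        """Time-based sliding window maintained incrementally (insert + expire),
        with either eager (per-tuple) or periodic (step-based) evaluation."""

        def __init__(self, width, step=None, on_update=None):
            self.width = width            # window width in time units
            self.step = step              # None => eager, data-driven evaluation
            self.on_update = on_update    # callback(inserted, expired, contents)
            self.contents = deque()       # (timestamp, tuple) pairs in arrival order
            self.pending_inserted = []    # deltas accumulated since the last evaluation
            self.pending_expired = []
            self.next_eval = None         # next step boundary (periodic mode only)

        def insert(self, timestamp, item):
            self.contents.append((timestamp, item))
            self.pending_inserted.append((timestamp, item))
            self.pending_expired.extend(self._expire(timestamp))
            if self.step is None:
                self._evaluate()                 # eager: evaluate on every arrival
            elif self.next_eval is None:
                self.next_eval = timestamp + self.step
            elif timestamp >= self.next_eval:
                self._evaluate()                 # periodic: evaluate at step boundaries
                while self.next_eval <= timestamp:
                    self.next_eval += self.step

        def _expire(self, now):
            expired = []
            while self.contents and self.contents[0][0] <= now - self.width:
                expired.append(self.contents.popleft())
            return expired

        def _evaluate(self):
            if self.on_update:
                self.on_update(self.pending_inserted, self.pending_expired, list(self.contents))
            self.pending_inserted, self.pending_expired = [], []

    # Count the readings seen in the last 10 time units, re-evaluated every 2 units
    window = TimeBasedSlidingWindow(width=10, step=2,
                                    on_update=lambda ins, exp, now: print(len(now)))
    for t in range(30):
        window.insert(t, {"reading": t})

Passing only the deltas to the callback is what allows an incremental operator to avoid re-processing the tuples shared by consecutive windows.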

Applying windows to streams might not always be enough to handle the unbounded memory requirement, for two reasons. First, window sizes can be large enough that the entire contents of the window cannot be buffered in memory. Second, the stream arrival rate can exceed the rate at which the system can both update the window and process the newly arrived element (Babcock et al., 2002). Therefore, other approximation techniques are needed. In the former case, it is impractical to try to answer a query over all the data. Instead, queries can be evaluated over a sample of the data stream, producing approximate answers that may be sufficient for some applications (e.g. Demaine et al., 2002). In the latter case, eager (or streaming) evaluation of queries, in which windows are updated and queries are re-evaluated upon the arrival of each new stream element, might not be appropriate. The natural solution is to process the incoming data in batches (Babcock et al., 2002), as in the XJoin algorithm (Urhan and Franklin, 2000). Rather than producing a continually up-to-date answer, the data elements are buffered as they arrive, and the answer to the query is computed periodically as time permits. Batch processing is a good approach when streams are bursty; a system that cannot keep up with the peak stream rate may be able to handle the average rate by buffering the streams at peak times and catching up during slow periods. In contrast to sampling, this approach does not cause any uncertainty about the accuracy of the answer, but sacrifices timeliness instead.

A closely related challenge is the problem of blocking operators. A blocking query operator is one that is unable to produce its first output tuple until it has seen its entire input (Babcock et al., 2002). Sort, aggregates, and some implementations of the join operator are blocking. For example, the nested loop join (NLJ) needs to scan the entire inner relation and compare each tuple therein with the current tuple of the outer relation. Since data streams may be infinite, a blocking operator that takes input from a data stream will never see its entire input, and will never be able to produce any output. Blocking operators can be unblocked using the same windowing technique described above, restricting the streaming input to a finite window. To avoid re-scanning the entire window, these operators need to support incremental evaluation. Several unblocked join algorithms have appeared that can process inputs in an incremental, pipelined fashion, such as the symmetric hash join (SHJ) (Wilschut and Apers, 1993) and XJoin (Urhan and Franklin, 2000). The basic scheme of these algorithms is that they build hash tables on the fly for each of their inputs; when a tuple arrives from one input, it is inserted into the corresponding table and the other tables are probed for matches.
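As a minimal illustration of the symmetric hash join scheme just described (a Python sketch under simplifying assumptions, not code from any of the cited systems), each input maintains a hash table on the join key; an arriving tuple is inserted into its own side's table and immediately probes the other side, so results are emitted incrementally and the operator never blocks. A windowed variant would additionally invalidate expired entries from both tables.

    from collections import defaultdict

    class SymmetricHashJoin:
        """Pipelined, non-blocking equi-join over two streams: insert each arriving
        tuple into its own hash table, then probe the opposite table for matches."""

        def __init__(self, left_key, right_key):
            self.left_key, self.right_key = left_key, right_key
            self.left_table = defaultdict(list)    # join value -> left tuples seen so far
            self.right_table = defaultdict(list)   # join value -> right tuples seen so far

        def push_left(self, tup):
            value = tup[self.left_key]
            self.left_table[value].append(tup)              # 1. insert into own table
            for match in self.right_table.get(value, []):   # 2. probe the other table
                yield {**match, **tup}                      # 3. emit result immediately

        def push_right(self, tup):
            value = tup[self.right_key]
            self.right_table[value].append(tup)
            for match in self.left_table.get(value, []):
                yield {**match, **tup}

    # Join sensor readings with sensor metadata as both arrive, in any order
    join = SymmetricHashJoin(left_key="sensor", right_key="sensor")
    list(join.push_right({"sensor": "s1", "location": "pier"}))       # no match yet
    print(list(join.push_left({"sensor": "s1", "temperature": 21})))  # one joined tuple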

Stream query languages: A number of stream query languages have been proposed. They fall into three different paradigms: relation-based languages, object-based languages, and procedural languages (Golab and Ozsu, 2003).

Relation-based languages typically have SQL-like syntax and enhanced support for windows and ordering. CQL (Arasu et al., 2006), StreaQuel (Chandrasekaran, 2002), and AQuery (Lerner and Shasha, 2003) are three relation-based languages. CQL (Continuous Query Language) has been used in the STREAM project (Arasu et al., 2003). It considers streams as timestamp-ordered relations. It defines three types of operators: stream-to-relation, relation-to-relation, and relation-to-stream operators. Stream-to-relation operators produce a relation from a stream using different types of windows, relation-to-relation operators are standard relational algebra operators producing a relation from another relation, while relation-to-stream operators produce a stream from a relation. CQL introduces three relation-to-stream operators: Istream, to indicate an insertion into the output stream, Dstream, to indicate a deletion, and Rstream, which returns all tuples in the relation at a certain time. In contrast, StreaQuel, used in TelegraphCQ (Chandrasekaran et al., 2003), does not have relation-to-stream operators, since it considers all inputs and outputs to be streams.
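The relation-to-stream operators can be read as set differences over successive states of a time-varying relation. The sketch below (Python, purely illustrative; it paraphrases the published CQL definitions rather than quoting them) shows the intent: Istream emits tuples that enter the relation at an instant, Dstream emits tuples that leave it, and Rstream emits the whole relation at every instant.

    def istream(previous, current):
        """Tuples inserted into the relation at this instant."""
        return current - previous

    def dstream(previous, current):
        """Tuples deleted from the relation at this instant."""
        return previous - current

    def rstream(previous, current):
        """All tuples in the relation at this instant."""
        return set(current)

    # A time-varying relation given as a sequence of instantaneous states
    states = [set(), {("s1", 21)}, {("s1", 21), ("s2", 19)}, {("s2", 19)}]
    for prev, curr in zip(states, states[1:]):
        print("Istream:", istream(prev, curr),
              "Dstream:", dstream(prev, curr),
              "Rstream:", rstream(prev, curr))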

Object-based languages also have SQL-like syntax, but add support for streaming abstract data types (ADTs). This approach is used in the COUGAR system (Bonnet et al., 2001), where each type of sensor is modelled as an ADT with special signal-processing methods. Another approach is used by the Tribeca system (Sullivan and Heybey, 1998), where stream contents are classified according to a type hierarchy.

Table 2.2: Stream query languages features (Golab and Ozsu, 2003)

AQuery
  Motivating applications:  stock quotes, network traffic analysis
  Allowed inputs:           stored relations
  Basic operators:          relational, "each", order-dependent (first, next, etc.)
  Supported windows:        type: fixed, landmark, sliding; base: time and count; execution: not discussed
  Custom operators:         via the "each" operator

Aurora
  Motivating applications:  sensor data
  Allowed inputs:           streams only
  Basic operators:          σ, π, ∪, ⋈, group-by, resample, drop, map, sort
  Supported windows:        type: fixed, landmark, sliding window; base: time and count; execution: streaming
  Custom operators:         via the map operator

CQL / STREAM
  Motivating applications:  all-purpose
  Allowed inputs:           streams and relations
  Basic operators:          relational, relation-to-stream, sample
  Supported windows:        type: sliding; base: time and count; execution: streaming
  Custom operators:         allowed

StreaQuel / TelegraphCQ
  Motivating applications:  sensor data
  Allowed inputs:           streams and relations
  Basic operators:          relational
  Supported windows:        type: all types; base: time and count; execution: streaming or periodic
  Custom operators:         allowed

Tribeca
  Motivating applications:  network traffic analysis
  Allowed inputs:           single input stream
  Basic operators:          σ, π, group-by, union, aggregates
  Supported windows:        type: fixed, landmark, sliding; base: time and count; execution: streaming
  Custom operators:         custom aggregates allowed


In a procedural query language, as an alternative to a declarative query language, users specify the flow of the data themselves. The Aurora DSMS (Abadi et al., 2003) has a user interface in which users create query plans by connecting a set of boxes (operators) with arcs to specify the desired data flow. Aurora distinguishes between order-agnostic operators, including filter, map, and union, and order-sensitive operators, such as BSort, aggregate, join, and resample.

Table 2.2, from Golab and Ozsu (2003), summarises the features of stream query languages. In general, relation-based languages appear to be more popular.

2.1.3 Continuous Query Optimisation

In a relational DBMS, a query processor consists of a number of components (Garcia-Molina et al., 2000). First, a parser parses SQL queries into an internal representation, such as a query graph. Then a query optimiser produces a query plan, usually in the form of a tree, which specifies exactly how the query is to be evaluated. The nodes in the tree represent relational algebra operators (e.g. select, join, sort) and the arcs represent data being generated and consumed by these operators. The plan is then fed into an execution engine to be executed at run time.

The query processor performance relies heavily on the optimiser. A query can usually be evaluated using different equivalent query plans that give the same answer but may differ significantly in their response time and consumed memory (Garcia-Molina et al., 2000). Thus, the optimiser’s task of choosing the most efficient plan is crucial. To find the optimal plan, optimisers use statistics about the tables involved in the query (e.g. cardinalities, histograms), estimate operators’ selectivities, and employ a cost model to estimate the total cost of different plans.

This approach does not work well in data stream management systems, for two main reasons. First, statistics about streaming data are usually unknown at compile time, or even impossible to obtain in the case of unbounded streams; therefore, traditional cardinality-based cost metrics are not applicable (Kang et al., 2003). Second, for long-running queries, data characteristics such as stream input rates might change during query execution, which may cause the optimal plan to change. Hence, the optimise-then-execute approach does not suit this continually changing environment (Babu and Bizarro, 2005). For streaming, federated, or any inherently uncertain environment, research has since introduced adaptive query processors, in which runtime feedback is used to adapt query processing (Deshpande et al., 2007).

2.1.3.1 Cost models

Because traditional cost metrics do not apply to continuous queries, Kang et al. (2003) introduced a unit-time basis cost model to estimate the performance of different sliding window join algorithms. The model considers different tasks performed by a window join (i.e. inserting, probing, invalidating) to find the cost of handling an individual input tuple of each input stream separately. This cost is then multiplied by the left (right) stream input rate to obtain the per-unit-time cost of the left (right) part of the join. The left and right costs are simply added together to calculate the join’s total cost. This division between the left and right parts means that the algorithms used to perform them are completely independent, i.e. the left join can be a hash join while the right part uses a nested loop algorithm.

Ayad and Naughton (2004) extended this unit-time join cost to model the cost of conjunctive queries with sliding windows, by estimating the output rates and window sizes of all operators. A similar model was used by Cammert et al. (2008) to adaptively manage resources by adjusting window sizes and time granularities.

Another model, introduced by Viglas and Naughton (2002), also recognised that the inputs to continuous queries are streams with arrival rates, as opposed to relations with known cardinalities as in traditional queries, and shifted from cardinality-based models to a rate-based model. They presented formulas for estimating the output rates of the different operators in a query plan, which can be used either to optimise for the highest output rate or to identify which plan will produce a specified number of results in the shortest amount of time.
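As a rough sketch of the shape these models take (the notation loosely follows the cost-model terms used later in Chapter 6 and is not a verbatim reproduction of the cited formulas), the per-unit-time cost of a sliding window join between streams l and r with arrival rates λ_l and λ_r, windows W_l and W_r, and join selectivity f can be written as:

\[
C_{join} \;\approx\; \lambda_l \bigl(C_{insert}(W_l) + C_{invalidate}(W_l) + C_{probe}(W_r)\bigr)
\;+\; \lambda_r \bigl(C_{insert}(W_r) + C_{invalidate}(W_r) + C_{probe}(W_l)\bigr)
\]
\[
\lambda_o \;\approx\; f \,\lambda_l \,\lambda_r \,(W_l + W_r)
\]

The first expression charges each input, at its arrival rate, for inserting into its own window, invalidating expired tuples, and probing the opposite window; the second estimates the join's output rate, which a rate-based optimiser can propagate through the rest of the plan.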

More recent work on cost models has concentrated on distributed stream processing. For example, Zeitler and Risch (2011) present a cost model for a stream splitting operator that splits streams into parallel substreams, while the model presented by Heinze et al. (2014) can estimate the latency caused by operator movements across multiple sites. However, our work is based on a centralised setting, for which the previously described models represent the state of the art.


2.1.3.2 Adaptive optimisation

Adaptive query optimisation techniques vary significantly along many dimensions. They differ in what they attempt to adapt to (e.g. memory fluctuations, data arrival rates, user preferences), the aim of adaptivity (e.g. minimising response time, maximising throughput), the nature of the feedback they collect, the frequency of feedback collection and plan alteration (inter-query, intra-query, per tuple), and the effect of adaptivity (i.e. what behaviour it can change) (Gounaris et al., 2002). We summarise the main adaptive approaches, organised by increasing frequency of adaptivity, first in the traditional store-then-query context and then in continuous streaming systems.

In early optimisers, such as the System R query optimiser (Selinger et al., 1979), the statistics-gathering scheme was very coarse-grained, running only periodically – typically once a day or once a week – requiring administrative commands, and consuming significant resources (Hellerstein et al., 2000). To enhance this scheme, Adaptive Selectivity Estimation (ASE) was proposed (Chen and Roussopoulos, 1994), in which statistics are gathered on a per-query level. For each query, the system tracks the sizes of intermediate results and uses them to refine its statistical estimates for future optimisation decisions. While still moderately coarse, this inter-query method enables the system to learn more and perform better in subsequent queries.

On an intra-query level, blocking operators offer materialisation points at which actual statistics of intermediate results can be obtained, and the rest of the plan can be adapted. Optimisation and execution are interleaved by dividing a query plan into stages, executing one plan stage to completion, and using the statistics gathered from this stage to optimise the execution of the next one (Deshpande et al., 2007). Mid-query re-optimisation (Kabra and DeWitt, 1998) also operates on an intra-query level, employing progressive optimisation and proactive re-optimisation. Instead of plan staging, it initially optimises the entire plan, inserts statistics-gathering operators at specified checkpoints (usually after blocking operators), and dynamically re-optimises the downstream (i.e. remaining) parts of the plan if the runtime statistics differ significantly from the original estimates. A similar approach is the Corrective Query Processing (CQP) (Ives et al., 2004) used in the Tukwila integration system. However, CQP relies on pipelined operators – such as the XJoin operator (Urhan and Franklin, 2000) – instead of blocking operators, so the execution cost is continuously monitored.


On the same level of frequency, but specifically designed for continuous stream processors, Babu et al. (2004) proposed an adaptive ordering algorithm for pipelined commutative stream filters, called A-Greedy (for Adaptive Greedy). A-Greedy adds two logical components to the query engine: a profiler and a re-optimiser. The profiler continuously collects and maintains statistics about recent input tuples. These statistics are used by the re-optimiser to detect and correct suboptimal orderings of correlated filters. The algorithm can also be applied to a multiway stream join (MJoin) (Viglas et al., 2003), which is a generalisation of symmetric binary joins. However, fully pipelined MJoins do not hold internal states, which causes the performance to suffer due to the excessive re-computation of intermediate results. To tackle this problem, an A-Caching algorithm was introduced (Babu et al., 2005), which places subresult caches adaptively in MJoins to minimise re-computation. A-Greedy and A-Caching are implemented in StreaMon (Babu and Widom, 2004), the adaptive query processing engine for the STREAM DSMS.
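
The core of this profile-and-reorder loop can be sketched as follows (our own simplification in the spirit of A-Greedy; the actual algorithm profiles conditional selectivities over a sliding window of tuples and maintains a greedy invariant):

class AdaptiveFilterPipeline:
    def __init__(self, filters):
        # filters: list of (name, predicate, unit_cost) tuples; all names hypothetical.
        self.filters = list(filters)
        self.seen = {name: 0 for name, _, _ in self.filters}
        self.dropped = {name: 0 for name, _, _ in self.filters}

    def process(self, item):
        # Profiler: record, for each filter, how many tuples it sees and drops.
        for name, predicate, _ in self.filters:
            self.seen[name] += 1
            if not predicate(item):
                self.dropped[name] += 1
                return None          # tuple eliminated; later filters never run
        return item

    def reoptimise(self):
        # Re-optimiser: run cheap, highly selective filters first
        # (smallest cost per unit of drop probability).
        def rank(entry):
            name, _, cost = entry
            drop = self.dropped[name] / self.seen[name] if self.seen[name] else 0.0
            return cost / drop if drop else float('inf')
        self.filters.sort(key=rank)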

The Telegraph stream project (Chandrasekaran et al., 2003) introduced a more radical method that enables adaptive processing at the very fine-grained level of per-tuple planning, i.e. each tuple can be processed using a different plan. The basic idea behind this approach is to treat query execution as a process of routing tuples through pipelined operators, and to adapt by changing the order in which tuples are routed through them, effectively changing plans at the tuple level. They introduced the eddy dataflow operator (Avnur and Hellerstein, 2000), which encapsulates adaptivity. Data flows into the eddy from input streams or relations, and the eddy routes tuples into pipelined operators, which return tuples back to the eddy after processing. The eddy sends a tuple to the output only when all the operators have handled it. It also monitors the execution continuously, to adaptively choose the routing order for each tuple. The eddy operator can implement different routing policies; the initial one proposed by Avnur and Hellerstein (2000) is lottery scheduling. In this policy, the eddy monitors the input and output queues of each operator, crediting an operator with a ticket for every tuple sent to it and debiting a ticket for every tuple returned. The eddy then holds a lottery among the eligible operators, routing the tuple to the winning operator, where an operator's chance of winning is proportional to its ticket count.
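
A minimal sketch of the lottery-scheduling idea is given below (our own illustration; the real eddy also observes operator queue lengths and back-pressure). The eddy credits an operator with a ticket whenever it routes a tuple to it; if the operator drops the tuple it keeps the ticket, and if it returns the tuple the ticket is taken back, so selective operators accumulate tickets and win the lottery more often:

import random

class Eddy:
    def __init__(self, operators):
        self.operators = operators                  # name -> callable(tuple) -> tuple or None
        self.tickets = {name: 1 for name in operators}

    def route(self, item):
        remaining = set(self.operators)
        while remaining:
            names = list(remaining)
            weights = [self.tickets[n] for n in names]
            name = random.choices(names, weights=weights)[0]   # weighted lottery
            self.tickets[name] += 1                 # credit a ticket on send
            item = self.operators[name](item)
            if item is None:
                return None                         # dropped: the operator keeps its ticket
            self.tickets[name] -= 1                 # returned: take the ticket back
            remaining.discard(name)
        return item                                 # every operator has handled the tuple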

While increasing the frequency of adaptivity can lead to better plans, it also increases the overhead, in terms of both the time spent in the optimisation process and the amount of memory needed to store the required statistics. This trade-off needs to be carefully considered when designing an adaptive optimiser: the potential benefits of adaptivity should be weighed against the additional overhead it incurs.


2.1.4 Distributed Stream Processing

The previously presented data stream management systems perform centralised processing of streams, as the early efforts in this domain focused on designing new operators and languages. Researchers then considered extending these systems to support distributed processing of data streams, which are usually physically distributed, in order to achieve better scalability and higher availability (Shah et al., 2004). They have built on and extended work on parallel query processing (Yu et al., 1993) in order to handle challenges that are not usually present in traditional database systems, imposed mainly by the dynamic nature of streaming data.

Distributed query processing can take two forms. First, the query plan operators can be distributed among several machines, so that each machine executes a different sub-tree of the complex query tree. This method is called inter-operator parallelism, vertical parallelism, or simply pipelining. The other method is called intra-operator parallelism, horizontal parallelism, or partitioning, in which data is partitioned across multiple machines rather than the query. In this approach, the same operator is copied on the participating machines, and each of these instances operates on part of the data.

To distribute query processing over multiple machines, stateless operators (e.g. filter, project) can be pipelined relatively easily. Therefore, early work on distributed query processing in the traditional store-then-query model focused on parallelising individual, traditional, stateful operators such as the hybrid-hash join and sort. However, in these efforts, the distribution mechanism needs to be included in every operator implementation. Graefe (1990) then introduced a novel operator to abstract the distribution mechanism, called the exchange operator. This operator encapsulates all parallelism issues and therefore makes implementations of parallel database algorithms significantly easier and more robust. The exchange operator is inserted between the producer and consumer operators to ensure proper routing of data, and can provide vertical or horizontal parallelism. However, the exchange operator uses static partitioning techniques, such as hash partitioning, range partitioning, or round robin, which means that it does not adapt to load variation at runtime.

Inspired by the exchange operator, but for the continuous processing model, two intra-operator adaptive partitioning operators were introduced at U.C. Berkeley: RiverDQ (Arpaci-Dusseau et al., 1999) and Flux (Shah et al., 2003). The distributed queue abstraction RiverDQ addresses the load-balancing problem for a limited set of operators: those for which the partitioning of the input stream can be content-insensitive, such that any tuple can be sent to any instance of the consuming operator. Every input tuple is routed to a randomly chosen consumer instance, weighted by the emptiness of the queue to that instance. As this approach cannot be applied to content-sensitive operators such as joins and group-by aggregates, the Flux operator generalises both Exchange and RiverDQ to encapsulate the logic of online partitioning for a wide range of content-sensitive operators. Following the design of the Exchange operator, each Flux operator is composed of two parts: Flux-Cons (Flux consumer) and Flux-Prod (Flux producer). Flux-Cons is essentially an iterator, while Flux-Prod encapsulates the routing logic. Flux follows a centralised approach in which a central controller decides when to move which load where. Unlike Exchange, these decisions are made online and adaptively, based on statistics collected at runtime. A Flux operator can handle short-term imbalances using a buffering and reordering mechanism, and adapts to long-term imbalances by enabling online repartitioning and the transfer of accumulated state.
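
For the content-insensitive case, the routing decision can be sketched as a weighted random choice in which emptier consumer queues are more likely to receive the next tuple (a simplification of RiverDQ's distributed queue; the queue capacity below is hypothetical):

import random
from collections import deque

class ContentInsensitiveRouter:
    def __init__(self, num_consumers, capacity=1000):
        self.capacity = capacity
        self.queues = [deque() for _ in range(num_consumers)]

    def route(self, item):
        # Weight each consumer instance by the emptiness of its queue, so lightly
        # loaded instances absorb more of the incoming stream.
        weights = [self.capacity - len(q) + 1 for q in self.queues]
        target = random.choices(range(len(self.queues)), weights=weights)[0]
        self.queues[target].append(item)
        return target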

In terms of distributed DSMSs, Borealis (Abadi et al., 2005) is a second-generation distributed stream processing engine. In contrast to Flux, Borealis enables inter-operator parallelism, and works on a fully distributed, peer-to-peer network. Borealis derives its core stream processing functionality from Aurora (Abadi et al., 2003), and its distribution functionality from its predecessor project, Medusa (Cetintemel, 2003). However, Borealis extends both of them with additional features. On the processing side, it supports dynamic revision of query results and dynamic modification of queries. On the distribution side, Borealis provides scalable, Quality of Service (QoS)-based resource allocation and optimisation, as well as fault tolerance. Borealis addresses these issues in the domain of sensor networks, which adds the challenges of simultaneously optimising different QoS metrics, such as processing latency, throughput, or sensor lifetime, and of performing optimisations at different levels of granularity: a node, a sensor network, a cluster of sensors and servers, etc.

The above-described approaches provide the essential parallelism architecture and functionality. However, they still have to employ a load-balancing algorithm. Typically, a dynamic load-distributing algorithm has four components (Shivaratri et al., 1992): a transfer policy, a selection policy, a location policy, and an information policy. A transfer policy determines whether a node is qualified to participate in a load transfer, either as a sender or as a receiver. Once a node is determined to be a sender, a selection policy chooses which task or data partition to transfer. A location policy is then responsible for finding a good receiver for the selected task. In adaptive load-balancing algorithms, this decision should be made at runtime, based on statistics about the current load on other nodes. The type of statistics collected about the states of the system's operators, when they should be collected, and where they are to be collected from, are all specified by the information policy.

The load-balancing policy in Flux proceeds in rounds. Each round consists of two phases: a statistics-collection phase and a move phase. At the beginning of each round, the central controller asks all Flux-Cons instances to start collecting statistics, and specifies a duration parameter which indicates the time that should be spent collecting information before returning it to the controller. The information to be collected is the amount of time spent idle during this phase, and the number of tuples processed per partition. The controller uses this information to pair up operators: the most loaded server is paired with the least loaded server, and so on. The controller then performs threshold tests to ensure that moving a partition from the first operator of a pair to the second would not increase the imbalance between them. After that, the second phase starts by halting the producer operator. Meanwhile, the state is transferred from the first Flux-Cons instance to the second. As soon as this completes, the producer operator is resumed and sends data to the new receiver instance.
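
The pairing and threshold test performed in each round can be sketched as follows (our own simplification; the real controller also accounts for idle time and for the cost of transferring state):

def plan_moves(server_loads, partition_loads, threshold=0.1):
    # server_loads: server -> total load; partition_loads: server -> list of partition loads.
    moves = []
    ordered = sorted(server_loads, key=server_loads.get)      # least to most loaded
    while len(ordered) >= 2:
        light, heavy = ordered.pop(0), ordered.pop(-1)        # pair the extremes
        imbalance = server_loads[heavy] - server_loads[light]
        candidate = min(partition_loads[heavy], default=None) # smallest partition to move
        # Move only if the pair is noticeably imbalanced and the move does not
        # simply invert the imbalance.
        if candidate is not None and imbalance > threshold * server_loads[heavy] \
                and imbalance - 2 * candidate >= 0:
            moves.append((heavy, light, candidate))
    return moves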

In the Borealis system, each site contains a local monitor, a local optimiser, and a neighbourhood optimiser, which together are responsible for continuously optimising the allocation of query network fragments to processing sites. Local monitors maintain various operator- and site-level statistics regarding utilisation and queuing delays for various resources, including CPU, disk, bandwidth and power. These statistics are periodically forwarded to end-point monitors, which run at every site that produces final outputs and are responsible for evaluating QoS for every output message. There is also a global optimiser that reacts to alerts from an end-point monitor indicating a problem with an output's QoS measures, including lifetime, throughput, and latency problems. On a local level, when a monitor detects specific resource bottlenecks, the corresponding optimisers either request the node to shed load (ordered by the local optimiser) or, preferably, identify slack resources at other sites to offload to (determined by the neighbourhood optimiser). The tasks chosen to move are those that improve resource utilisation most while imposing the minimum load-migration overhead. Borealis uses a correlation-based load distribution algorithm (Xing et al., 2005) to minimise average load variation and maximise average load correlation, which accordingly results in a small average end-to-end latency.


2.1.5 Complex Event Processing

As a result of the broad spectrum of application domains that require on-the-fly processing of information, several research communities have addressed this problem, each bringing its own expertise, point of view and vocabulary (Cugola and Margara, 2012). The data stream processing model discussed in the previous subsections was developed by the database research community and can be seen as an evolution of the traditional data processing supported by DBMSs. DSMSs resemble DBMSs in processing incoming data through a sequence of transformations using common SQL operators based on relational algebra, such as selections, aggregates and joins. In contrast, other research communities, including those of distributed information systems, business process automation, control systems and network monitoring, contributed to the complex event processing (CEP) model, which, according to Cugola and Margara (2012), has its roots in the publish-subscribe domain. In complex event processing, data in input streams are viewed as simple events that can be filtered, combined and transformed into composite events of interest to users. Complex event queries can check for the occurrence and non-occurrence of composite events by imposing temporal, logical and value-based constraints over streaming events.

An example of a complex event processing system is SASE (Wu et al., 2006), a system for monitoring streams of RFID readings encoded as events. SASE employs a declarative complex event language that enables filtering and correlating events to match specific patterns. The structure of the language consists of three clauses: the EVENT clause, which specifies the event pattern, i.e. the events that need to be detected and the sequential and logical relations between them; the WHERE clause, which defines value-based constraints; and the WITHIN clause, which specifies the window size. The SEQ operator in this system is an example of a temporal constraint that is at the heart of complex event processing and is not usually supported by the stream processing model, which primarily focuses on producing and continuously updating query answers.
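
The essence of such a temporal SEQ constraint can be illustrated with a few lines of Python (our own sketch, not SASE's evaluation mechanism, which compiles queries into automata): detect an event of one type followed by an event of another type for the same item, with both occurring within a given time window.

def detect_seq(events, first_type, second_type, window):
    # events: iterable of (timestamp, event_type, item_id), ordered by timestamp.
    pending = {}                       # item_id -> timestamp of the first event
    matches = []
    for ts, etype, item in events:
        if etype == first_type:
            pending[item] = ts
        elif etype == second_type and item in pending:
            if ts - pending[item] <= window:
                matches.append((item, pending[item], ts))
            del pending[item]
    return matches

# Hypothetical RFID-style example: a shelf reading followed by an exit reading
# for the same tag within 60 seconds.
print(detect_seq([(0, 'shelf', 'tag1'), (30, 'exit', 'tag1')], 'shelf', 'exit', 60))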

Another example is Esper8, which is considered to be the leading open-source CEP provider (Cugola and Margara, 2012). Esper is distributed as embeddable components written in Java and C#, which makes it suitable for integration into any Java- or .NET-based process. It defines a rule language called the Event Processing Language (EPL), which enables joining, filtering, sorting, aggregating, grouping, merging, and splitting event series or streams, as well as detecting sequences and patterns. In addition, it offers a rich set of temporal windows that can be parameterised, including sliding, tumbling, partitioned and named windows. Esper’s engine processing model is based on dynamic state machines and delta networks, in which only changes to data are communicated across object boundaries.

8 http://www.espertech.com/esper/

2.2 The Semantic Web

Over the past years, vast amounts of information have been published on the Web, making it a universal source of information. However, as current web pages are designed to be read by humans, computers are not capable of analysing this information. The Web has developed rapidly as a medium of documents for people rather than of data that can be processed automatically. Given the current extent of information on the Web, answers to numerous questions are out there; search engines perform the complex task of finding and ranking related web pages based on keyword matching, but a human is still needed to extract the exact answer (Antoniou and Van Harmelen, 2004).

Transferring the available information on the Web into machine processable formats is the aim of the Semantic Web (Berners-Lee et al., 2001). The Semantic Web is defined by Tim Berners-Lee et al. (2001, p. 29) as “an extension of the current Web, in which information is given well-defined meaning, better enabling computers and people to work in cooperation”. Enabling computer programs to understand the meanings of web content will enable the software agents to access the Web and carry out intelligent tasks on behalf of the user. Mechanisms for shared understanding enable machines – or software agents – from different domains to communicate with each other, automating large parts of users’ lives.

Giving meanings – or semantics – to the available information involves structuring and annotating data, and adding logic and inference capabilities. These features had been extensively studied long before the Web was developed, in the area of knowledge representation (Brachman and Levesque, 1985). However, traditional knowledge representation systems have usually been small, limited to questions that can be answered reliably, and centralised, requiring everyone to share exactly the same definitions (Berners-Lee et al., 2001). On the other hand, the Semantic Web should be as big and decentralised as the current Web, which leads to another important aim of the Semantic Web: knowledge interoperability and easy information integration, taking advantage of the Web’s unique naming and universality (Berners-Lee et al., 2001). A number of languages have already been standardised to achieve these goals.

The Semantic Web layer cake, shown in Figure 2.2, represents a number of important technologies that have been developed in order to actualise the Semantic Web vision. These include RDF (Cyganiak et al., 2014) to express data, RDFS (Brickley et al., 2014) and OWL (Hitzler et al., 2012) to enable shared understanding of concepts, SPARQL (Harris et al., 2013) to query the data, and RIF (Kifer and Boley, 2013) to add interchangeable rules support to the Semantic Web.

Figure 2.2: The Semantic Web layer cake (Domingue et al., 2011)

As we aim to enable reasoning on semantic data streams, we first take a broader look at knowledge representation in general and its associated reasoning methods, with a focus on the rule-based approach. Then we describe the above-mentioned Semantic Web standards and technologies. Finally, a number of existing reasoners for Semantic Web data are reviewed.

2.2.1 Knowledge Representation and Reasoning Techniques

Reasoning about knowledge in Artificial Intelligence is used to discover new facts from a knowledge base (Lucas and Van Der Gaag, 1991). Different inferencing algorithms have been developed for different knowledge representation formalisms. These formalisms include logic (Metakides and Nerode, 1996), production rules (Newell, 1973), semantic networks (Woods, 1975), and frames (Minsky, 1975). They have different features and so can serve different systems’ requirements in terms of expressivity and performance.


Logical representation of knowledge includes two basic forms: propositional logic and predicate logic. While propositional logic is limited in representing real-world knowledge, predicate logic has the ability to use variables and functions. Due to its high expressivity, predicate logic is undecidable, meaning that it is not guaranteed that the proof procedure will terminate (Lucas and Van Der Gaag, 1991). Description Logics (Baader, 2003) were then developed as decidable subsets of predicate logic with lower expressivity. Description Logic reasoners provide the following reasoning services: satisfiability, subsumption, and classification (Donini et al., 1996).

Knowledge can also be represented in the form of production rules, which are widely used in expert systems: knowledge-based systems that can offer solutions to specific problems in a given domain at a level comparable to that of an expert in the field (Lucas and Van Der Gaag, 1991). Rules are easy to understand, to maintain, and to draw inferences from. Each rule consists of two parts: a precondition (IF part) and an action (THEN part). In rule-based engines, the condition parts of rules are checked against the current state of the world (the working memory). If a match occurs, the action part of the matched rule is executed.

This inferencing process can be performed through two mechanisms: forward chaining and backward chaining. Forward chaining (Forgy, 1981) is a data-driven algorithm, as it starts with the data and applies the rules to the facts until a goal is reached. While this approach can produce a large number of entailments that will never be queried, queries receive fast responses because all entailments are asserted at insertion time. Backward chaining, on the other hand, is a goal-driven algorithm (Shortliffe, 1976): it starts with a goal and looks for rules that apply to that goal until a conclusion is reached. Backward chaining can often be too expensive to support interactive-time query answering; therefore, forward chaining is more efficient in dynamic situations that require real-time responses (Buchanan and Duda, 1983).
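
A naive forward-chaining loop can be sketched in a few lines of Python (our own minimal illustration, independent of any particular engine): rule conditions are repeatedly matched against the working memory and derived facts are asserted until no new facts are produced.

def forward_chain(facts, rules):
    # facts: set of tuples; rules: list of (condition, derive) pairs, where
    # condition tests a single fact and derive produces the entailed fact.
    working_memory = set(facts)
    changed = True
    while changed:
        changed = False
        for condition, derive in rules:
            for fact in list(working_memory):
                if condition(fact):
                    new_fact = derive(fact)
                    if new_fact not in working_memory:
                        working_memory.add(new_fact)   # assert the entailment
                        changed = True
    return working_memory

# Example rule: IF (x, 'hasParent', y) THEN (x, 'hasAncestor', y).
rules = [(lambda f: f[1] == 'hasParent', lambda f: (f[0], 'hasAncestor', f[2]))]
print(forward_chain({('John', 'hasParent', 'Mary')}, rules))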

Checking conditions of rules with working memory is the core process of any rule engine. A naive implementation would be to check each coming fact against each rule, which usually results in a very slow system when dealing with large numbers of rules or facts (Forgy, 1982). The Rete algorithm was developed by Forgy (1982) to provide a basis for a more efficient implementation. It is a dataflow network-based algorithm designed to speed the pattern matching process.


The Rete algorithm can process large data sets efficiently because it avoids iterating over both the data elements (facts, or working memory) and the production rules. To avoid iteration over data elements, the Rete algorithm stores, with each condition (or pattern), a list of the data elements that it matches. These lists are updated when the working memory changes, in a forward-chaining process. To avoid iteration over the production rules, the Rete algorithm uses a tree-structured network to represent the rules. The network is composed of different types of operators, which are also called nodes. The two main types can be called the “intra-element” operators, such as the filter operator, and the “inter-element” operators, such as the join operator. An important feature of the Rete algorithm is that when two patterns require the same nodes, these nodes are shared rather than duplicated. The tree-like network divides the matching process into multiple steps that perform different checks, so if a data element does not match the first node, it is simply discarded and does not continue its way through the network.

To illustrate with an example, consider the following pattern of a working memory element, written in the OPS5 language: (Expression: ^Name <N>, ^Arg1 0, ^Op +, ^Arg2 <X>)

This arithmetic expression pattern has four attribute-value pairs, with attributes indicated by the symbol ^ and values representing either constants or variables. In OPS5, variables are written between angle brackets, as with <N> and <X> in the example. The intra-element features checked by this pattern are: the element class must be Expression, the Arg1 value must be zero, and the Op value must be +. On the other hand, the value of Name is an inter-element feature, as it needs to be matched with variables from other patterns. For example, the Plus0x rule from Forgy (1982), which aims to simplify algebraic expressions that add zero to a number, can be written as follows:

(Plus0x
  (Goal: ^Type Simplify, ^Object <N>)
  (Expression: ^Name <N>, ^Arg1 0, ^Op +, ^Arg2 <X>)
  => (modify 2 ^Arg1 NIL ^Op NIL))

The rule name is followed by two patterns, representing the left-hand side, and then the symbol =>, which is in turn followed by the action that represents the right-hand side. The latter action assigns NIL to the operation and first argument values of the working memory element that matches the second pattern in the rule, leaving only the second argument as a result. In this case, the value of the Object attribute of the goal must be equal to the value of the Name attribute of the expression. Figure 2.3 shows a Rete network for the Plus0x rule (Forgy, 1982); the blue boxes represent the intra-element operators while the brown one represents the inter-element operator.


Figure 2.3: Rete network example

While different applications can add more node types, the basic ones are the root node, one-input nodes (intra-element, or alpha nodes), two-input nodes (inter-element, or beta nodes), and terminal nodes. Alpha nodes resemble the select (σ) operator of relational algebra; they only propagate statements that match their condition. A beta node, on the other hand, is responsible for joining data elements from its two inputs on their shared variable, resembling the join (⋈) operator of relational algebra. This analogy to database systems is based on considering the working memory elements as tuples of some universal relation in a relational database, so that the LHS of a rule in a production system is analogous to a query in a relational database language (Miranker, 1987).

One difference between database systems and production systems is that queries in traditional database systems are typically computed only once; in other words, they are single-shot queries. In contrast, rules in a production system are longer-lived and may be computed repeatedly. To minimise the re-computation time across production cycles, production system algorithms retain state across cycles. Alpha and beta nodes in Rete networks maintain their own lists of every matched tuple, in alpha/beta memories, to provide incremental reasoning support. The Rete algorithm therefore trades memory space for reasoning performance.
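
The role of these memories can be sketched with a single join (our own minimal illustration, assuming simple predicate functions; a real Rete network chains many such nodes and also handles retractions): each alpha node stores the elements it has matched, and the beta node joins a newly arriving element only against the stored memory of the other side, so only the increment is recomputed.

class AlphaNode:
    def __init__(self, test):
        self.test = test                # intra-element test
        self.memory = []                # alpha memory: all elements matched so far

    def insert(self, element):
        if self.test(element):
            self.memory.append(element)
            return element
        return None

class BetaNode:
    def __init__(self, left, right, join_test):
        self.left, self.right = left, right
        self.join_test = join_test      # inter-element test on a (left, right) pair
        self.memory = []                # beta memory: all partial matches so far

    def left_activate(self, element):
        # A new left match is joined only against the stored right memory.
        new_matches = [(element, r) for r in self.right.memory if self.join_test(element, r)]
        self.memory.extend(new_matches)
        return new_matches

    def right_activate(self, element):
        new_matches = [(l, element) for l in self.left.memory if self.join_test(l, element)]
        self.memory.extend(new_matches)
        return new_matches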


A number of other pattern matching algorithms have appeared to address this memory/speed trade-off. TREAT (Miranker, 1987), for instance, keeps only alpha memories, and its network has only one multi-way join for each rule. This means that there are no saved intermediate results, and every new element added to an alpha memory needs to be joined with all other alpha memories each time. On the continuum between Rete and TREAT lies Rete* (Wright and Marshall, 2003), an extension of Rete with TREAT as a special case. Among other features, Rete* employs a dynamic beta-memory cut mechanism. It allows an upper bound on beta memory consumption to be specified by users, so beta memories are discarded and retained at runtime depending on the current memory consumption. If a beta memory is absent but is needed to process a token, Rete* recalculates the missing memory. If the recalculation itself depends on prior joins with absent memories, Rete* recalculates those as well, working back up the network until it finds a stored beta memory or reaches the alpha memories.

The network structure in the original Rete algorithm follows a strict left-deep linear structure, in which each join node has only two inputs; the right input is always an alpha memory and the left input a previous beta memory. TREAT instead has one join node with any number of inputs, where the join order is statically specified following the lexical order of the condition elements. As in query optimisation – see Section 2.1.3 – the structure of the network (analogous to a query plan) and the join order affect the performance of the system, as they determine the number of partial instantiations, or intermediate results. The structure of the network also affects the possible amount of join-node sharing between different rules (Scales, 1986). Gator (Hanson and Hasan, 1993) then appeared as a generalised discrimination network, in which any rule can have any number of join nodes, each of which can have any number of inputs. Because the Gator structure is very general, an optimiser is essential to pick a good structure for a rule, depending on the environment. Gator implements a dynamic programming optimisation strategy that uses parameters akin to those of DBMS optimisers, such as selectivities and cardinalities, but also takes into consideration additional factors, such as update frequency and memory node size.

Table 2.3 summarises the characteristics of Rete, TREAT, Rete* and Gator, along with some performance remarks obtained from Nayak et al. (1988), Wright and Marshall (2003), and Hanson and Hasan (1993).


Table 2.3: Comparison of pattern matching algorithms

Algorithm | Beta memories? | Network structure | Performance
Rete | Yes | Linear networks of two-input join nodes | Faster in general; faster for addition; better performance for complex rules
TREAT | No | One multiple-input join node per rule | Less memory consumption; faster for deletion; better performance for simple rules (6 or less condition elements)
Rete* | Yes (with dynamic cut) | Linear networks of two-input join nodes | Maximum memory consumption defined by user; Rete*(0) is TREAT; always faster than Rete
Gator | Depends on optimisation | No restrictions on the number of inputs or the tree structure | Optimised networks are faster than Rete and TREAT; near to TREAT memory consumption; optimisation time goes up to one minute for an 11-condition rule

As the Rete algorithm does not provide a time model, Berstel (2002) introduced an extension to the algorithm to enable temporal reasoning. Temporal reasoning involves formalising the notion of time in order to provide a way to represent and reason about the temporal aspects of knowledge (Vila, 1994), which is an important feature of event processing. Berstel’s extension differentiates between facts – which are maintained until explicit retraction – and events – which are maintained until they expire. Each event has a timestamp, and in the presence of temporal constraints, join nodes are responsible for computing expiry dates for events propagated from a parent node, and for retracting them when they have expired.
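
The bookkeeping that this adds at a join node can be sketched as follows (our own simplification of the idea): events carry timestamps, a temporal constraint bounds how far apart joined events may be, and stored events whose possible partners can no longer arrive are retracted from the join memory.

def purge_expired(memory, current_time, max_gap):
    # Retract stored events that can no longer satisfy the temporal constraint:
    # any partner arriving now or later would be more than max_gap away.
    memory[:] = [e for e in memory if current_time - e['timestamp'] <= max_gap]

def temporal_join(left_memory, right_event, max_gap, join_test):
    purge_expired(left_memory, right_event['timestamp'], max_gap)
    return [(left, right_event) for left in left_memory
            if join_test(left, right_event)
            and abs(right_event['timestamp'] - left['timestamp']) <= max_gap]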

2.2.2 Knowledge Representation on the Semantic Web

Data model: In order to enable machine understanding of Web data, the Semantic Web uses formal knowledge representation techniques to describe the data contained on the Web. The Resource Description Framework (RDF) was originally intended as a means for processing metadata about web pages, but it has subsequently been generalised to provide a general-purpose knowledge representation framework for web data (Domingue et al., 2011). It provides interoperability between applications that exchange machine-understandable information on the Web. RDF presents a model for representing entities and relationships. This model encodes the semantics in sets of assertions called statements, made up of three parts: subject, predicate, and object, and so they are also referred to as triples. These three elements resemble the subject, verb, and object of a simple sentence (Berners-Lee et al., 2001). For example, the sentence “John works with Tom” is represented as follows:

Subject: John
Predicate: works with
Object: Tom

This triple can be diagrammed as in Figure 2.4:

John --worksWith--> Tom

Figure 2.4: An example RDF triple

A set of statements of this form naturally forms a directed, labelled graph, in which subjects and objects can be seen as graph nodes, while predicates represent the named edges between these nodes (Cyganiak et al., 2014). An example graph is shown in Figure 2.5. Arrows represent predicates, ovals represent subjects and objects, and boxes represent a specific type of object called a literal. In contrast to XML for example, RDF graphs do not necessarily follow the tree structure, as they usually have no roots, and there is no limit to their structures.


Figure 2.5: An example RDF graph

RDF triple components can be literals or resources. Literals are concrete data values such as strings or integers – for example the surname “Dylan” in the above graph – and can only appear as objects. Resources, on the other hand, represent concepts and can appear as subjects, predicates, or objects. Resource names in RDF take the form of Uniform Resource Identifiers (URIs) (Berners-Lee et al., 1998), which are unique identifiers for concepts. A special case of resource are Blank Nodes, which are implicit concepts that have no URIs or explicit names, and can only occur as subjects or objects.

RDF as a data model provides many attractive features over the traditional relational data model, especially for information sharing and integration (Hebeler et al., 2009). These include the simple structure of its basic units, its unrestricted graph structure, and the global namespace provided by the use of URIs (Hebeler et al., 2009). An RDF triple with a named resource as its subject unambiguously describes that particular resource, regardless of where the triple is asserted. In contrast to the relational model, where a particular row in a database table is identified by a primary key that is unique to one table within one database, a URI is a name that is universally unique and remains valid in any context. RDF triples are completely self-contained assertions of information, and as such they are independent from one another. This independence means that the order in which they occur is insignificant. Two RDF graphs can therefore be merged easily, because their flexible structure does not attach any inherent significance to any one resource as compared to any other. Linking data sources together can be done simply by adding a few triples to specify the relationships between the data sources, which is much simpler than the complicated schema realignment that is usually needed to integrate two data sources in a database system.
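
Because every triple is a self-contained assertion built from globally valid names, merging two RDF graphs amounts to taking the union of two sets of triples. A minimal sketch using bare Python tuples and hypothetical example.org URIs (rather than a full RDF library):

graph_a = {
    ('http://example.org/John', 'http://example.org/worksWith', 'http://example.org/Tom'),
}
graph_b = {
    ('http://example.org/John', 'http://example.org/hasSister', 'http://example.org/Sara'),
    ('http://example.org/John', 'http://example.org/surname', '"Dylan"'),
}

# Merging is simply set union: no schema realignment is needed,
# and the order of the triples is insignificant.
merged = graph_a | graph_b
print(len(merged))   # 3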

RDF is a very flexible data model, as it allows representation of any arbitrary knowledge assertions in the form of triples. There are no requirements for pre-defined data schemas as in RDBMSs, which require the definition of data structures or schemas before actual data can be asserted. This flexibility can be considered as a significant advantage when the structure of the data is not well known in advance (Taylor et al., 2006). However, this comes at a price. Pre-defined schemas offer detailed information to the DBMS on how the data is structured, which informs its decisions about data storage layouts, and query optimisation. While the RDF triple model is in itself a schema, it is still very loose compared to the detailed relational schemas. This model is analogous to a database with very few tables but with a huge number of small records.

Data extraction: RDF data are usually stored in a special type of repository called a triple store. Data from these stores can be retrieved using the W3C-standardised query language SPARQL (Harris et al., 2013), which operates directly on RDF as its fundamental data model. SPARQL enables users to specify a graph pattern with variables that will be matched against a given data source, returning all matched bindings. The graph patterns themselves are composed of independent triple patterns, which makes it possible to state, for example, that some parts of the graph are optional, or to limit any part of the entire query graph pattern to particular RDF data sets.

SPARQL for RDF data is analogous to SQL for relational databases. While the underlying graph structure is very different from the tabular format of RDBMSs, a mapping between SPARQL and SQL is still possible. Cyganiak (2005) provided a transformation from SPARQL into relational algebra and outlined a translation from the relational algebra into SQL statements. For example, a triple pattern in a SPARQL query that specifies a predicate value and has variables as subject and object can be translated into a filter (σ) operation on the predicate and a projection (π) of the subject and object. Two triple patterns in a SPARQL graph pattern can be mapped into an inner join operation (⋈) on their shared variable, and so on. This mapping makes existing work on query planning and optimisation available to SPARQL engines and RDF processors.
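
The mapping can be illustrated directly over a set of triples (our own illustration of the idea rather than Cyganiak's actual translation): a triple pattern with a fixed predicate becomes a selection plus a projection to variable bindings, and two patterns sharing a variable become a join on that variable.

def match_pattern(triples, predicate, subject_var, object_var):
    # Selection on the predicate, projection of subject and object into bindings.
    return [{subject_var: s, object_var: o} for (s, p, o) in triples if p == predicate]

def join(bindings_a, bindings_b, shared_var):
    # Inner join of two binding sets on their shared variable.
    return [dict(a, **b) for a in bindings_a for b in bindings_b
            if a[shared_var] == b[shared_var]]

# ?x worksWith ?y . ?x hasSister ?z   (predicates abbreviated for readability)
triples = {('John', 'worksWith', 'Tom'), ('John', 'hasSister', 'Sara')}
print(join(match_pattern(triples, 'worksWith', 'x', 'y'),
           match_pattern(triples, 'hasSister', 'x', 'z'), 'x'))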

In addition, there is a substantial amount of research on native SPARQL optimisation, which draws both from the semantics of SPARQL (Pérez et al., 2006) and from approaches used in database optimisation. A SPARQL basic graph pattern optimisation using selectivity estimation is presented by Stocker et al. (2008), in which graph patterns are reordered based on a number of heuristics and summary statistics tailored for the RDF data model. The heuristics range from simple triple pattern variable counting to a more sophisticated probabilistic framework that uses pre-computed statistics to estimate the selectivities of individual and joined triple patterns. Another approach to SPARQL optimisation presented by Hartig and Heese (2007) uses syntactic rewriting based on a given SPARQL query graph model (SQGM) that supports all phases of query processing. In the query-rewriting phase, transformation rules are used to transform a SPARQL query into another semantically equivalent query in order to achieve better execution. For example, rules can be used to simplify complexly formulated queries by merging graph patterns, and can eliminate redundant or contradictory restrictions. Other approaches include providing specialised RDF indices (Harth and Decker, 2005) that can be used for selectivity computation of single triple patterns, and enabling semantic query optimisation for SPARQL (Schmidt et al., 2010).

Incorporating semantics: Ontologies are another basic component of the Semantic Web. An ontology can be defined as “a formal, explicit specification of a shared conceptualization” (Gruber, 1993). This enables computers to share not only information, but vocabulary (Patel-Schneider et al., 2004). A typical Web ontology has a taxonomy and a set of inference rules (Berners-Lee et al., 2001). While the taxonomy defines classes of objects and relations among them, the inference rules add the power of inferring new relations. The W3C has standardised RDF Schema (Brickley et al., 2014), a vocabulary for describing properties, classes and their hierarchies, and OWL (the Web Ontology Language) (Hitzler et al., 2012), which provides greater expressivity than RDFS.

RDF Schema is a vocabulary description language for RDF. It presents mechanisms for describing sets of related resources and the relationships between these resources (Brickley et al., 2014). RDFS thus adds a kind of structure, or schema, over otherwise unstructured RDF data. RDFS does not provide a vocabulary of application-specific classes and properties; instead, it provides the facilities needed to describe such classes and properties, and to indicate which classes and properties are expected to be used together. The main classes defined by RDFS are: Resource, Class, Literal, and Datatype. In addition, the main properties defined by RDFS are: domain, range, subClassOf, and subPropertyOf.

The RDF semantics specification defines an informative rule-based axiomatization of the RDFS semantics that can be executed over any RDF graph. Executing these rules generates new facts when the existing ones match rule conditions. All the rules are stated in the form: add a triple to a graph when it contains triples matching a pattern. For example, if class X is a sub-class of Y, and class Y is a sub-class of Z, then a triple stating that class X is also a sub-class of Z is asserted.
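
The sub-class example corresponds to the RDFS entailment rule usually labelled rdfs11, which can be written as a single rule application over a set of triples (a minimal sketch; a complete RDFS reasoner applies the whole rule set repeatedly until a fixpoint is reached):

SUBCLASS = 'rdfs:subClassOf'

def rdfs11(graph):
    # (X rdfs:subClassOf Y) and (Y rdfs:subClassOf Z)  =>  (X rdfs:subClassOf Z)
    derived = {(x, SUBCLASS, z)
               for (x, p1, y1) in graph if p1 == SUBCLASS
               for (y2, p2, z) in graph if p2 == SUBCLASS and y1 == y2}
    return derived - graph

graph = {('Dog', SUBCLASS, 'Mammal'), ('Mammal', SUBCLASS, 'Animal')}
print(rdfs11(graph))   # {('Dog', 'rdfs:subClassOf', 'Animal')}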

The Web Ontology Language (Hitzler et al., 2012) is also a vocabulary description language; however, it is designed for applications that require greater expressivity than that provided by RDFS. OWL is a revision of the DAML+OIL ontology language (Horrocks, 2002). It adds more vocabulary to describe properties and classes, including relations between classes, cardinality, equality, richer typing of properties, and enumerated classes. The original version of OWL (Patel-Schneider et al., 2004) defines three sublanguages with increasing expressivity: OWL Lite, OWL DL, and OWL Full. While being less expressive, OWL Lite also has a lower formal complexity than OWL DL. It does not support some OWL features (e.g. enumerated or disjoint classes) and restricts others (e.g. cardinality). OWL DL provides the maximum expressiveness that can be achieved while maintaining completeness and decidability. It supports all OWL constructs, but with some restrictions. OWL Full provides the maximum expressiveness, with no restrictions, but offers no computational guarantees, as it can be undecidable.


OWL 2 (Hitzler et al., 2012) is a revision of OWL that affords greater expressiveness through the addition of new functionalities, including keys, property chains, richer data types and data ranges. OWL 2 has three sublanguages: OWL 2 EL, OWL 2 QL, and OWL 2 RL. OWL 2 EL enables polynomial time algorithms for all the standard reasoning tasks. It can be used for applications where very large ontologies are needed, and where performance is more important than expressivity. OWL 2 QL uses standard relational database technology to enable conjunctive queries to be answered. It is particularly suitable for applications where small ontologies are used to organise large numbers of instances and where it is useful or necessary to access the data directly via relational queries. OWL 2 RL enables polynomial time reasoning using rule-extended database technologies (e.g., Datalog (Ceri et al., 1990)) operating directly on RDF triples. It can suitably serve applications where relatively small ontologies are used to organise large numbers of instances and where it is useful or necessary to operate directly on data in the form of RDF triples.

While reasoning over RDFS and OWL RL can be performed using efficient, lightweight rule-based reasoning algorithms, other OWL dialects need Description Logic reasoning. Though OWL DL is guaranteed to be computationally decidable, that does not imply that reasoning will complete in a realistic amount of time. The trade-off between expressivity and complexity should be taken into account when choosing the appropriate semantic representation for an application.

To bridge the gap between logic programs and description logics, Grosof et al. (2003) introduced new intermediate knowledge representations, Description Logic Programs (DLP) and Description Horn Logic (DHL), which fall within the expressivity intersection of rules and ontologies. They used RuleML to represent logic programs, and OWL/DAML+OIL to represent description logics, which were the draft standards for rules and ontologies in the Semantic Web context at the time. They described a bidirectional mapping of premises and inferences – called DLP-fusion – from the DLP fragment of DL to LP and from the DLP fragment of LP to DL. This fusion enables rules to be built on top of ontologies and ontologies to be built on top of rules.

Adding rules: Many rule languages for the Semantic Web have appeared since its early days, to represent knowledge that either cannot be expressed in OWL or can be understood more easily with rules (Hebeler et al., 2009). These include RuleML (Boley et al., 2001) and the Semantic Web Rule Language (SWRL) (Horrocks et al., 2004), in addition to rule engines with their own rule syntax such as Jena (McBride, 2002) and Jess (Friedman-Hill, 2002). As no single rule language is likely to satisfy the needs of all different applications, and as one of the Semantic Web goals is to ensure interoperability between different systems, the W3C did not standardise any of them. Instead, it produced RIF, the Rule Interchange Format (Kifer and Boley, 2013), as a recommendation. RIF represents a core rule language plus extensions, which together allow rules to be translated between rule languages and thus transferred between rule systems.

To enable interchange, RIF provides multiple dialects, such as RIF Core (Boley et al., 2013), RIF Basic Logic Dialect (RIF BLD) (Boley and Kifer, 2013), or Production Rule Dialect (RIF PRD) (de Sainte Marie et al., 2013), in addition to a set of standard data types and built-in functions (RIF DTB) (Polleres et al., 2013). RIF Core is the fundamental RIF language, designed to be the common subset of most rule engines, and provides safe positive datalog with builtins. RIF BLD offers the expressive features of Horn rules, while RIF PRD is focused on the condition-response frameworks of forward-chaining rules. Although RIF dialects were designed mainly for interchange, each dialect is a standard rule language and can be used even when portability and interchange are not required.

From a theoretical viewpoint, RIF Core corresponds to the language of definite Horn rules without function symbols (often called ‘Datalog’) with standard first-order semantics (Boley et al., 2013). Therefore, RIF Core is a subset of RIF BLD. It is also a subset of RIF PRD, in which the conclusions of production rules are interpreted as assert actions. Syntactically, RIF Core has a number of Datalog extensions to support features such as objects and frames, similar to F-logic (Kifer et al., 1995), IRIs as identifiers of concepts, and XML Schema datatypes (Biron et al., 2004). An example of a RIF frame is ex:John[ex:worksWith -> ex:Tom], which corresponds to the RDF triple (ex:John, ex:worksWith, ex:Tom). The interoperability of RIF rules with RDF graphs and OWL ontologies is explained in the RIF, RDF and OWL Compatibility document (Bruijn and Welty, 2013), which defines the syntax and semantics of integrated RIF Core/RDF and RIF Core/OWL languages. These features make RIF Core a Web-aware language (Boley et al., 2013). Moreover, RIF Core is based on built-in functions and predicates over selected XML Schema datatypes, as specified in RIF-DTB 1.0 (Polleres et al., 2013). These include functions for comparing values, datatype-checking predicates, and basic numeric functions.

An example of a simple RIF rule document, obtained from Boley et al. (2013), is presented in Listing 2.1. The rule derives ‘buy’ relationships from ‘sell’ relationships that are stored as facts. The document can be read in English as follows: a buyer buys an item from a seller if the seller sells the item to the buyer; John sells LeRif to Mary. The conclusion that Mary buys LeRif from John can be logically derived from this statement.

Document(
  Prefix(cpt <http://example.com/concepts#>)
  Prefix(ppl <http://example.com/people#>)
  Prefix(bks <http://example.com/books#>)
  Group (
    Forall ?Buyer ?Item ?Seller (
      cpt:buy(?Buyer ?Item ?Seller) :- cpt:sell(?Seller ?Item ?Buyer)
    )
    cpt:sell(ppl:John bks:LeRif ppl:Mary)
  )
)

Listing 2.1: An example RIF rule document

Linked Data: While the above-described standards provide a framework for representing and processing semantic data, the Linked Data principles provide a set of best practices for publishing and interlinking such data on the Web in a manner that facilitates data discovery and interoperability (Bizer et al., 2009). These principles were outlined by Berners-Lee (2006) as follows: (1) use URIs to name things, (2) use HTTP URIs so that people can look up those names, (3) return useful information upon looking up URIs, using standards (e.g. RDF), and (4) include links to other URIs to enable further discovery.

Following these principles, data providers add their data to a global data space that allows data to be discovered and used by various applications, creating what can be described as the web of data (Bizer et al., 2009). The Linking Open Data project9 was established to bootstrap the web of data by converting existing open datasets to RDF and publishing them on the Web according to linked data principles. As large organisations and governments also published linked data, the number of datasets increased from 12 in 2007, at the beginning of the project, to 1,146 interlinked datasets in the most recent update of the Linked Open Data cloud (Abele et al., 2017).

2.2.3 Existing Semantic Reasoners

A semantic reasoner, rule engine, reasoning engine, or simply a reasoner, is a piece of software able to infer logical consequences from a set of asserted facts or axioms (Singh and Karwayun, 2010). Several semantic reasoners have been developed to reason over the emerging Semantic Web knowledge. Some of them are reviewed below, with a comparison presented in Table 2.4 based on Singh and Karwayun’s comparative study (2010). The first three are Rete-based rule engines. However, none of them support reasoning over data streams.

9 https://www.w3.org/wiki/SweoIG/TaskForces/CommunityProjects/LinkingOpenData

BaseVISor (Matheus et al., 2006) is a Rete-based, forward-chaining rule engine optimised for processing RDF triples. It supports RuleML and R-Entailment rules. R-Entailment is a language that combines RDF, RDFS and a part of OWL DL with simple Horn-style rules. BaseVISor works similarly to other Rete-based inference engines. However, there is a major difference between BaseVISor and these engines, which is that BaseVISor applies a simple data structure to its facts rather than the arbitrary list structures used by other engines, which enhances the pattern matching efficiency. BaseVISor is implemented in Java and it provides an API to facilitate the addition of user-defined procedural attachments.

Bossam (Jang and Sohn, 2004) is another Rete-based, forward-chaining reasoner for inferencing and querying over RDF(S) and OWL data sets, as well as executing rules such as SWRL. Bossam is based on logic programming and First-Order Logic (FOL). It also provides an API for controlling the engine, loading ontologies and rules, querying RDF(S)/OWL documents and giving explanations about derived facts. However, it does not support SPARQL queries and it does not serialise its knowledge to a persistent store (Papataxiarhis et al., 2009).

Jess (Friedman-Hill, 2002) is a rule engine written in Java that was inspired by the CLIPS project (Wygant, 1989). It uses a LISP-like syntax for its rules. It also contains a scripting environment, which effectively makes it a Java scripting framework as well as a rule engine. Furthermore, being a Java-based system facilitates Jess’s integration with a number of Web programming paradigms, such as Java servlets or applets. Finally, it supports backward chaining and some additional features such as procedural attachments (Papataxiarhis et al., 2009).

Jena (McBride, 2002) – a Java framework for building Semantic Web applications – provides a rule-based inference engine. However, Jena’s RDFS reasoner does not support data types and blank node entailments. Additionally, Jena’s OWL reasoner is very limited, since it is a rule-based implementation of OWL Lite. However, Jena can be connected to most of the available Description Logic reasoners as external reasoners. RacerPro – the commercial extension of Racer – (Haarslev and Muller, 2001), Pellet (Sirin et al., 2007), and FaCT++ (Tsarkov and Horrocks, 2006) support OWL-DL reasoning using similar tableau-based approaches. However, FaCT++ does not support rules, i.e. ABox reasoning. While all of these reasoners perform consistency checking, Pellet can also explain the reasons for inconsistency.

Table 2.4: Comparison of semantic reasoners

Reasoner | OWL-DL reasoning? | Supported expressivity for reasoning | Reasoning method | Rule support?
BaseVISor | No | R-entailment, OWL 2 RL | Rule-based | Yes, SWRL and RuleML supported
Bossam | Yes | LP and negation, incomplete OWL DL | Rule-based | Yes, SWRL supported
Jess | Yes | Not clear | Rule-based | Yes, SWRL supported
Jena | No built-in OWL-DL reasoner | Incomplete for complex description logics | Rule-based | Yes
RacerPro | Yes | SHIQ | Tableau | Yes, SWRL supported
Pellet | Yes | SROIQ | Tableau | Yes, SWRL supported
FaCT++ | Yes | SROIQ | Tableau | No

2.3 Conclusion

In this chapter, we discussed the main technologies that have contributed to the emerging semantic stream processing field. A review of the main techniques for data stream processing was presented first, followed by a review of the Semantic Web and reasoning techniques. The next chapter reviews the literature in the specific area of semantic stream processing that combines features of the fields overviewed here.


Chapter 3: Semantic Stream Processing

The relatively new research area of stream reasoning was first identified by Della Valle et al. (2008). It aims to integrate data streams with reasoning techniques to enable logical reasoning over real-time, semantic data streams. The area has evolved quickly over the last few years and is now often referred to as RDF stream processing. In the Semantic Web, data is represented in RDF, and queries can be performed using SPARQL. However, to express the temporal nature of streaming data, RDF needs to be extended to represent time, which is an important concept in data streams. SPARQL also cannot support queries on streaming data, as it lacks crucial operators found in data stream management systems, such as the window operators; this is a result of SPARQL’s origins as a store-and-query language. Other issues in this area include reasoning, distribution, and publishing. The following subsections review the existing literature that addresses these challenges within the semantic stream processing area.

3.1 Processing RDF Streams

To enable the processing of RDF streams, Semantic Web research has been extended in two dimensions: support for representing time-varying data, to enable time-aware processing of such data, and support for continuous queries, to process streaming linked data on the fly. Several RDF stream processing systems have been developed, including C-SPARQL (Barbieri et al., 2010a), Streaming SPARQL (Bolles et al., 2008), SPARQLStream (Calbimonte et al., 2010), EP-SPARQL (Anicic et al., 2011), CQELS (Le-Phuoc et al., 2011) and INSTANS (Rinne et al., 2012a). They generally extend the RDF data model with time annotations to represent data streams, and extend the SPARQL query language with streaming operators to enable continuous queries. However, because they differ in syntax and operational semantics, Dell’Aglio et al. (2014) proposed RSP-QL as a unifying query model. We briefly describe these RDF and SPARQL extensions before discussing each work in greater detail.

In terms of representing data streams, research in this area has followed the conventional definition of relational data streams as unbounded sequences of data items ordered by timestamps, as used by the DSMS community. All of the above-mentioned systems use RDF statements as the data items of which the stream consists. With the exception of INSTANS, where the time dimension is implicit in the schema, all other systems use explicit time instants associated with each RDF statement. The time instants in RSP-QL, C-SPARQL, SPARQLStream and CQELS are represented as a single timestamp indicating when the associated statement occurs. In contrast, Streaming SPARQL and EP-SPARQL use an interval of two timestamps to represent the validity of the associated statement. While both ways allow time-aware processing of data streams, using intervals can support richer forms of temporal operations. For instance, in a system that assigns validity to events in the form of time intervals, users can query for events that overlap, or for an event that happens during another event. Beyond the basic RDF stream definition, RSP-QL also formally defines an instantaneous RDF graph, which is a snapshot of the input data taken at a specific point in time and which captures the changes of an RDF graph over time.

While different systems propose different extensions of SPARQL to enable continuous querying of data streams, they generally tend to follow the relational CQL model (C-SPARQL, SPARQLStream, CQELS, RSP-QL) reviewed in the previous chapter, Section 2.1.2. They adapt CQL’s three classes of operators to work on RDF streams. The first class, the window operators that transform streams to relations, is adapted to transform RDF streams into solution mappings. Most existing works in this area (e.g. C-SPARQL, EP-SPARQL, SPARQLStream, CQELS) support time-based sliding windows, and some also support triple-based sliding windows. The second class of operators (relation-to-relation) comprises the operators of the SPARQL algebra (Pérez et al., 2006). The main advantage of the CQL approach is therefore that the streaming extension of the query language does not need to redefine the original operators; it only works on their inputs and outputs. The third class (relation-to-stream) transforms the mapping set produced by the previous class into an RDF stream. C-SPARQL supports Rstream, CQELS supports Istream, and SPARQLStream supports Rstream, Istream and Dstream.

Execution strategies vary considerably between systems. For instance, C-SPARQL, Streaming SPARQL and SPARQLStream update their input windows periodically, while EP-SPARQL, CQELS and INSTANS use a data-driven approach. When a window is updated, systems generally re-execute the standing query on the new window content. In terms of implementation, they tend to use existing engines to process data, with the RSP engine acting as a wrapper and orchestrator. For example, C-SPARQL is built on top of the STREAM DSMS (reviewed in Section 2.1.1) and the Sesame SPARQL query engine.

Next, we will discuss the existing works in this area individually and in greater detail, highlighting differences to the above general model.


Streaming SPARQL. The first attempt to extend SPARQL to support the processing of RDF streams was presented by Bolles et al. (2008). They introduced Streaming SPARQL as a SPARQL extension to cope with window queries over RDF streams. The definition of the RDF stream data type is based on Krämer and Seeger (2005), where three types of streams are defined: a raw data stream represents the data received by the engine, a physical data stream can be processed by the system operators, and a logical data stream is the level over which the semantics of the extensions are defined. Instead of performing a window operation on a stream to transform it into a relation, Streaming SPARQL extends the logical SPARQL algebra on the foundation of a temporal relational algebra based on multisets, and provides an algorithm to translate SPARQL queries into the new extended algebra. Streaming SPARQL offers both tuple-based (SlidingTupleWindow) and time-based (SlidingDeltaWindow and FixedWindow) window operators. A window operation can be specified in the FROM clause of the query as well as in the GroupGraphPattern part.

C-SPARQL. Continuous SPARQL (Barbieri et al., 2009; Barbieri et al., 2010a) is usually considered the leading contributor in this area and is often cited as a reference in the field (Margara et al., 2014). It is a SPARQL extension that follows a CQL-like approach. It defines an RDF stream data type and adds support for windows over streams and for aggregation. An RDF stream is simply defined as an ordered sequence of pairs, each of which is made of an RDF triple and a timestamp. As these data streams are possibly unlimited, C-SPARQL defines two types of windows. Physical windows extract a given number of triples starting from the last element, while logical windows are time-based and can be sliding or tumbling. In C-SPARQL syntax, windows are defined as part of the FROM clause of queries. In addition, C-SPARQL offers more expressivity by defining a number of aggregate functions: count, sum, average, min, and max. While it does not add explicit temporal operators, C-SPARQL allows queries to directly access the timestamps of individual triples.

An execution environment for C-SPARQL is also introduced, where each query is translated into two parts: the static part is passed to a SPARQL query engine (Sesame (Broekstra, 2002)), while a relational data stream management system (STREAM (Arasu et al., 2003)) is used to evaluate the streams and aggregates. Although this framework has the advantage of reusing existing technology, splitting queries into static and dynamic parts may prevent optimisation that could be possible in a unified framework.

EP-SPARQL. Event Processing SPARQL (Anicic et al., 2011) is another SPARQL extension, proposed as a new language for event processing and stream reasoning. Unlike other systems, which are inspired by Data Stream Management Systems, EP-SPARQL’s language constructs and processing model are based on CEP systems. EP-SPARQL is defined as SPARQL extended by the binary operators SEQ, EQUALS, OPTIONALSEQ, and EQUALSOPTIONAL. These operators detect RDF triples occurring in a specific temporal order, in order to capture more complex patterns over RDF streams. These expressive temporal operators go beyond C-SPARQL’s simple timestamp() function in terms of detecting temporal relationships between RDF patterns. Moreover, EP-SPARQL provides the function getDURATION() to add selection criteria (windows) over RDF streams, which are defined as sequences of RDF triples in which each triple is associated with a time interval.

EP-SPARQL queries are translated into rules in the Etalis Language for Events (ELE) (Anicic et al., 2012). Etalis is based on logic programming and implemented in Prolog. Etalis rules are compiled into event-driven backward-chaining (EDBC) rules, which enable event-driven, incremental detection of complex events in near real time. By using (recursive) logic rules, a unified execution mechanism for both querying and reasoning is enabled.

SPARQLStream. Calbimonte et al. (2010) designed a service that enables ontology-based access to streaming data in order to integrate heterogeneous data sources. The service receives queries specified in terms of the ontology using SPARQLStream, another extension of SPARQL that supports processing RDF streams. These queries are then transformed into a relational continuous query language (SNEEql) using a set of mappings expressed in S2O, an extension of the R2O mapping language (Barrasa et al., 2004) that supports streaming data. After the query translation phase, the query processing phase starts, using a DSMS engine (SNEE) (Galpin et al., 2011). The results are then transformed from a set of tuples into ontology instances. The SPARQLStream language is inspired by – and very similar to – C-SPARQL. However, it adds support for the window-to-stream operators Rstream, Istream, and Dstream, and it only supports time-based windows.

CQELS. In contrast to C-SPARQL, EP-SPARQL, and SPARQLStream, CQELS (Continuous Query Evaluation over Linked Streams) (Le-Phuoc et al., 2011) uses a “white box” approach, i.e. it defines its own RDF-native processing operators rather than reusing existing technologies. Hence, it integrates the processing of background and streaming data without delegating either to external engines. CQELS uses a point-based timestamp to add the temporal aspect to both streams and linked data. To process its input, CQELS implements three types of operators organised as a data flow: window, relational, and streaming operators, which resemble CQL’s stream-to-relation, relation-to-relation, and relation-to-stream operators. CQELS also features adaptive query processing, in which a mechanism similar to Eddies (Avnur and Hellerstein, 2000) is used to dynamically reorder the operators in the data flow tree. A cost-based routing policy decides the order in which the operators are executed at runtime.

INSTANS. A completely different approach to processing dynamic RDF data is introduced in INSTANS (Incremental eNgine for STANding Sparql) (Rinne et al., 2012a). Instead of extending SPARQL to support continuous queries, it uses the Rete algorithm (Forgy, 1982), an incremental algorithm for the many patterns/many objects match problem, to evaluate a number of interconnected SPARQL 1.1 updates. It also differs from standard SPARQL in its execution mechanism, because it does not execute queries on demand but rather propagates data through a query matching network. Each tuple is processed as soon as it arrives, and output is produced immediately when all conditions match. It differs from the continuous SPARQL extensions in that it has no notion of stream-to-relation or relation-to-stream operators; instead, windows are handled using explicit INSERT and DELETE queries. It is therefore possible to process streams using plain SPARQL without any extensions. However, this comes at the expense of a more complicated way to model the required query semantics. In Rinne et al. (2012b), for example, a sample use case that detects a nearby friend, which can be expressed in C-SPARQL as a single continuous query, requires four SPARQL queries in INSTANS: a window query, a nearby-detection query, a notification query and a query to remove invalidated nearby statuses.

Table 3.1 compares the different approaches to enabling RDF stream processing; the execution part of the comparison is mainly based on Le-Phuoc (2012a). The systems use similar data models and provide a similar level of expressivity, except for EP-SPARQL, which is more expressive; however, they vary in their execution models. Two important points can be observed from the table. First, none of them, except EP-SPARQL, supports background ontological reasoning. Even in EP-SPARQL, ontological reasoning is not supported natively but rather enabled using user-defined recursive production rules. The next section discusses works that focus mainly on the reasoning problem. Second, systems either use static plans or completely externalise the optimisation problem to the underlying systems, mainly because a black-box approach prohibits optimisation opportunities. CQELS, which natively supports RDF streams, is the only engine to provide adaptive optimisation.


Table 3.1: Comparison of RDF Stream Processing approaches

Data model
- Stream elements: triples in all six systems.
- Time model: Streaming SPARQL: interval; C-SPARQL: single point; EP-SPARQL: interval; SPARQLStream: single point; CQELS: single point; INSTANS: implicit.
- Supported windows: Streaming SPARQL: time-based and triple-based; C-SPARQL: time-based and triple-based; EP-SPARQL: time-based; SPARQLStream: time-based and history windows; CQELS: time-based and triple-based; INSTANS: no explicit window operator.

Expressivity
- Temporal operators: EP-SPARQL only.
- Reasoning: EP-SPARQL only (RDFS subset).

Execution
- Input updates: Streaming SPARQL: periodical; C-SPARQL: periodical; EP-SPARQL: data-driven; SPARQLStream: periodical; CQELS: data-driven; INSTANS: data-driven.
- Execution strategy: Streaming SPARQL: re-execution; C-SPARQL: re-execution; EP-SPARQL: incremental; SPARQLStream: re-execution; CQELS: re-execution; INSTANS: incremental.
- Optimisation: Streaming SPARQL: algebraic, static; C-SPARQL: algebraic, static; EP-SPARQL: externalised; SPARQLStream: externalised; CQELS: physical, adaptive; INSTANS: algebraic, static.
- Internal representation: Streaming SPARQL: stream-to-stream operators; C-SPARQL: CQL and SPARQL queries; EP-SPARQL: logic programmes; SPARQLStream: SNEE queries; CQELS: adaptive physical operators; INSTANS: Rete dataflow networks.
- Underlying engine: Streaming SPARQL: none (no implementation); C-SPARQL: Sesame and STREAM; EP-SPARQL: Prolog; SPARQLStream: SNEE; CQELS: none (native); INSTANS: none (native).

3.2 Reasoning on Semantic Streams

While the RSP engines enable the processing of RDF streams, integration with static knowledge bases, and the issuing of standing SPARQL queries, the majority of them do not support RDFS/OWL reasoning. Other works in the area have addressed the reasoning problem in different ways. We classify the proposed approaches to stream reasoning into two categories according to their expressivity. The first supports lightweight reasoning, at the level of RDFS and subsets of OWL 2 RL, over highly dynamic streams. The second enables richer forms of reasoning, including description logics and nonmonotonic reasoning; however, it either does so for less dynamic streams or involves approximation techniques. Stuckenschmidt et al. (2010) introduced a conceptual view of cascading reasoners, in which reasoners are organised in a hierarchy of increasing complexity in order to overcome the trade-off between the complexity of the reasoning method and the frequency of the data stream the reasoner is able to handle.

3.2.1 Lightweight stream reasoning

An early work by Walavalker et al. (2008) presents a subsumption reasoner that can deal with streaming knowledge. Given an RDFS or OWL ontology, the system pre-computes the transitive closure of all classes over the rdfs:subClassOf relationship and stores the class-subclass pairs in a database table. A set of continuous queries is defined, based on the RDFS entailment rules, to be evaluated at run time by the TelegraphCQ stream management system (Chandrasekaran et al., 2003), in order to identify subclass events of the event of concern. Like C-SPARQL, EP-SPARQL, and SPARQLStream, this reasoner uses a black-box approach. Its expressivity is limited to the main RDFS predicates (subClassOf, subPropertyOf, range, and domain) and OWL's inverseOf relationship.
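The pre-computation step can be sketched briefly. The following Python fragment is an illustrative sketch only (the original system stores the resulting pairs in a TelegraphCQ table rather than in memory, and the class names and function name used here are hypothetical): it computes the transitive closure of rdfs:subClassOf assertions so that later continuous queries can match a class against any of its direct or indirect subclasses.

# Illustrative sketch (not the original implementation): materialise the
# transitive closure of rdfs:subClassOf as (subclass, superclass) pairs.
def subclass_closure(asserted_pairs):
    closure = set(asserted_pairs)
    while True:
        derived = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if derived <= closure:          # fixpoint reached: nothing new can be derived
            return closure
        closure |= derived

# Hypothetical mini-ontology: Ambulance is a subclass of EmergencyVehicle,
# which is a subclass of Vehicle.
pairs = {("Ambulance", "EmergencyVehicle"), ("EmergencyVehicle", "Vehicle")}
closure = subclass_closure(pairs)
assert ("Ambulance", "Vehicle") in closure   # inferred by transitivity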

Reasoning over streams needs to operate incrementally, as re-computing all results whenever a change is made to the input streams (insertion or deletion) can result in very poor performance. Barbieri et al. (2010b) propose an approach for incrementally maintaining the ontological entailments, referred to as IMaRS in (Dell’Aglio and Della Valle, 2014). The approach is based on the Delete and Re-derive (DRed) algorithm (Gupta et al., 1993), which overestimates the deletions and then computes re-derivations; however, the resulting incremental algorithm for reasoning over data streams is simpler and more efficient, because the addition and removal of facts from data streams is controlled by windows, which have a clear expiration time. The algorithm requires tagging each RDF triple (both inserted and entailed) with an expiration timestamp. It can then compute a new complete and correct materialisation by dropping RDF triples that are no longer in the window. This can be directly compared to the Rstream operator of CQL (Arasu et al., 2003), which provides all answers that are correct at a certain time as a stream of results. An evaluation shows that the approach is faster than the naïve approach (computing the entire materialisation at each step) when the percentage of change is less than 13% of the background knowledge. Although the results are promising, the experiments were performed for only one query, so it is not clear how the algorithm would perform and scale for multiple queries with different window definitions (Anicic et al., 2011). Furthermore, it is restricted to time-based windows and does not allow deleting triples before their expiration, e.g. in case of inconsistency.
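To make the expiration-based maintenance idea concrete, the following Python sketch illustrates it in a simplified form. It is not the IMaRS algorithm itself; the store, the window size and the single derivation shown are hypothetical, and a derived triple here simply inherits the expiration time of the triple it was derived from.

import heapq

class ExpiringStore:
    """Holds (triple, expiration) entries; maintenance is just dropping expired entries."""
    def __init__(self):
        self._heap = []      # min-heap of (expiration, triple)
        self._expiry = {}    # triple -> latest expiration time

    def insert(self, triple, expires_at):
        if expires_at > self._expiry.get(triple, float("-inf")):
            self._expiry[triple] = expires_at
        heapq.heappush(self._heap, (expires_at, triple))

    def expire(self, now):
        # Drop every triple whose window has ended; no re-derivation is needed.
        while self._heap and self._heap[0][0] <= now:
            expires_at, triple = heapq.heappop(self._heap)
            if self._expiry.get(triple) == expires_at:
                del self._expiry[triple]

    def triples(self):
        return set(self._expiry)

WINDOW = 10                                   # hypothetical window size (time units)
store = ExpiringStore()
store.insert(("s1", "rdf:type", "Ambulance"), expires_at=5 + WINDOW)
# A triple derived from it (e.g. via a subclass rule) expires with its premise:
store.insert(("s1", "rdf:type", "Vehicle"), expires_at=5 + WINDOW)
store.expire(now=16)
print(store.triples())                        # empty: both triples have expired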

The Rete algorithm also works incrementally. Rete networks are used in Sparkwave (Komazec et al., 2012) to enable schema-enhanced pattern detection over RDF data streams. However, Sparkwave presents a fixed approach that can only operate over RDF Schema and a few OWL constructs. A pre-processing epsilon network, which handles the reasoning task, is placed before the Rete network that processes the RDF streams. Sparkwave is a clear example of sacrificing expressivity for performance; its reasoning capabilities are restricted, it has the same limitations as IMaRS, and it does not support disjunction, negation, or temporal or arithmetic operators. Oliya et al. (2011) also use the Rete algorithm to enable incremental OWL reasoning over dynamic contextual information that can be expressed through Description Horn Logic ontologies. However, the paper says nothing about time windows or any other streaming operators.

3.2.2 Complex stream reasoning

Rscale (Liebig and Opitz, 2011) is an OWL 2 RL reasoner that is suitable for a moderate update frequency. The system uses a relational database as secondary storage, storing the ontology in base tables. The applicable rules are translated to SQL queries and executed by the rule engine over those tables. The results are written to delta tables, followed by an alignment and merge phase, which executes in rounds by deleting already inferred facts from the delta tables and adding the remaining facts to the base tables. Dynamic updates (insertions and deletions) are then dealt with incrementally based on Volz et al. (2005). Because the execution mechanism consists of many steps and requires reactivation with every update, Rscale might not be suitable for fast streams. Furthermore, it does not support time-aware reasoning.

On a more expressive level, TrOWL (Ren et al., 2010; Ren and Pan, 2011) supports dynamic management of description logic ontologies. It uses syntactic approximation to reduce reasoning complexity. In contrast to IMaRS, TrOWL requires no fixed time window to manage deletions; instead, it uses a Truth Maintenance System to maintain intermediate results and the deduction relations among them. Because this approach can consume excessive memory, the authors also present an optimisation algorithm that reduces the number of unnecessary intermediate results.

Do et al. (2011) introduced the concept of stream reasoning to Answer Set Programming (ASP) using dlvhex (Leone et al., 2006), handling uncertain data by using disjunctive rules to generate multiple answer sets. They applied an ASP solver (dlvhex) repeatedly on periodically changing windows of OWL objects. In contrast, Gebser et al. (2012) incorporated streaming data into the reasoning methodology of ASP by proposing novel language constructs that enable specifying and reasoning with time-decaying logic programs. Similar to the idea of cascading reasoners (Stuckenschmidt et al., 2010), StreamRule (Mileo et al., 2013) combines CQELS for stream processing and filtering with Oclingo, the ASP engine of Gebser et al. (2012), for nonmonotonic reasoning.

On the theoretical side, LARS (a logic-based framework for analysing reasoning over streams) (Beck et al., 2015) provides a rule-based formalism with different means for time abstraction. A rule language with model-based, nonmonotonic semantics similar to ASP is also introduced.

3.3 Publishing Semantic Streams

Though best practices for linking static semantic data on the Web have been established under the name of Linked Data (Bizer et al., 2009), RDF streams have been neglected. Publishing RDF streams in the Linked Data cloud (Abele et al., 2017) will enable Semantic Web applications to consume this data and facilitate the integration of RDF streams with static data already published in the Linked Data cloud. To enable such publication, the identification, discovery, and access to stream data need to be addressed (Sequeda and Corcho, 2009).


The concept of Linked Stream Data has been introduced by Sequeda and Corcho (2009). They propose a URI-based mechanism to identify and access sensor network streams. Sensors are identified by URIs that return the sensors’ metadata when dereferenced, while data streams emitted by those sensors are also identified by URIs that return the observations contained in the stream. The Linked Stream Data URI scheme also identifies stream data at specific moments in time (by including a timestamp in the stream URI), in specific time windows (by specifying a start time and end time in the stream URI), and at specific locations (by including the coordinates in the stream URI).
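For illustration, a set of hypothetical URIs following this scheme might look as follows (the host, paths and timestamp format are invented here; the original proposal defines the pattern, not these exact URIs):

http://example.org/sensors/river-depth-42                                  sensor metadata
http://example.org/streams/river-depth-42                                  the sensor's data stream
http://example.org/streams/river-depth-42/2016-07-01T12:00                 observations at one instant
http://example.org/streams/river-depth-42/2016-07-01T12:00/2016-07-01T13:00    a time window
http://example.org/streams/50.93,-1.40/river-depth                         observations at a location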

Another approach to publishing data streams as Linked Data was proposed by Barbieri and Della Valle (2010c). In this approach, each RDF stream is represented as one Stream Graph (s-graph) and several Instantaneous Graphs (i-graphs). An s-graph is a metadata graph that describes the current content of the window over the RDF stream. S-graphs use the rdfs:seeAlso property to refer to a number of i-graphs, which in turn represent individual readings or observations. As in the Linked Stream Data approach, these s-graphs and i-graphs are identified by URIs. However, time is represented differently. While Linked Stream Data allows opening a window starting and ending at any moment in time, this approach specifies only the duration (or size) of the window in the URI, forcing the extraction of the last elements of the data stream. This is more in line with the nature of streams, which, being possibly of unbounded size, should not be treated as persistent data to be stored and queried on demand, but rather as transient data to be consumed on the fly by continuous queries.

3.4 Distributed Semantic Stream Processing

Systems that deal with a high velocity or volume of data streams feature several scalability requirements (Shah et al., 2003). These requirements have been addressed in relational DSMSs by enabling parallel and distributed processing of data streams (e.g. Abadi et al., 2005; Shah et al., 2004). In the stream reasoning area, there is an attempt by Hoeksema and Kotoulas (2011) to apply a parallel approach for stream reasoning using the Yahoo S4 framework. They introduce a number of RDFS specialised reasoning Processing Elements (PEs) to distribute triples over multiple streams. Streams are distributed across those PEs according to a given key, so the PEs can be distributed across different nodes for a parallel execution. Continuous query answering is also supported by a number of components that can be combined to translate a subset of C-SPARQL into a parallel execution plan.


While all the RSP engines reviewed in Section 3.1 operate in a centralised setting, the recent implementation of CQELS (Le-Phuoc et al., 2013) supports parallel processing in a cloud environment. The CQELS engine is built on an elastic cluster, where it can adapt to changing processing loads by dynamically adjusting the number of processing nodes at runtime. The network consists of a number of processing nodes and a central coordinator that maps the logical query network to the available nodes. The authors provide parallel algorithms for the window, join and aggregation operators. Their evaluation experiments show that throughput scales linearly with increasing numbers of processing nodes.

3.5 Developed Semantic Streams Environments

SemSorGrid4Env (Gray et al., 2011) is an application that implements a service architecture to provide a semantically integrated information space for sensed and stored data drawn from heterogeneous data sources. The architecture is structured into three tiers. The data tier enables publishing and querying data in its native format. The middleware tier supports the discovery and integration of different data models. Finally, the application tier provides support for web-based applications to interact with other services of the system. This architecture is deployed as a flood emergency response planning system. The paper mainly focuses on the integration process, while they use SPARQLStream (Calbimonte et al., 2010) for continuous query processing.

BOTTARI (Balduini et al., 2012) is a mobile application that continuously analyses social media streams to deliver personalised location-based recommendations. BOTTARI architecture contains three parts: a client (mobile app) which interacts with the user and initiates SPARQL queries, a PUSH segment that continuously analyses streams of tweets using C-SPARQL, and a PULL segment that answers the client’s queries by combining different forms of reasoning. The paper presents an evaluation of a deployment of the system that analyses tweets about points of interest (POIs) such as hotels and restaurants in Insadong, Korea. Their scalability test shows that the system handles a flow of 15,000 tweets/second, though the input rate is at tens of tweets a day. As this rate is considered slow for a stream processing engine, they claim that BOTTARI’s scalability goes largely beyond the actual needs of its deployment in Insadong.

Another platform is the Linked Streams Middleware (LSM) (Le-Phuoc et al., 2012b). It integrates sensor streams with Linked Data by enriching them with semantic descriptions using a wide range of wrappers. It also provides an intuitive Web interface for data annotation and visualisation. Finally, live querying over both types of data is enabled using CQELS and a standard SPARQL processor. LSM uses a cloud-based infrastructure (a Hadoop cluster) for real-time data collection, which enables its current deployment to access over 110,000 sensor data streams. However, the continuous querying process is not distributed.

3.6 Benchmarking

As the number of RDF stream processing engines has increased, the need for an open benchmarking framework has grown. Existing RDF/SPARQL benchmarks such as the Berlin SPARQL Benchmark (Bizer and Schultz, 2009) and LUBM (Guo et al., 2005) are designed for static data. On the other hand, there is also an available benchmark for DSMSs: the Linear Road benchmark (Arasu et al., 2004), which is based on the relational data model, and does not consider reasoning. Therefore, a number of benchmarking frameworks for streaming RDF/SPARQL engines have been proposed.

The Streaming RDF/SPARQL Benchmark “SRBench” (Zhang et al., 2012) is mainly focused on evaluating coverage of SPARQL constructs. It uses a real-world sensor data set linked to static data sets from the LOD cloud, and provides a comprehensive set of queries that cover the important SPARQL operators and the common streaming SPARQL extensions. The authors provide a functional evaluation of three streaming SPARQL engines: SPARQLStream, CQELS, and C-SPARQL. The results show that all three engines support basic SPARQL features over time-based windows of streaming data. None of the engines provides reasoning, and there is very limited support for SPARQL 1.1 features. At the moment, SRBench does not offer performance evaluation.

The second benchmark, the Linked Stream Benchmark “LSBench” (Le-Phuoc et al., 2012c), focuses on evaluating system performance using throughput as an indicator. LSBench implements a data generator that produces streaming social network data. It also defines a set of queries to test the functionality of three streaming engines: CQELS, C-SPARQL, and JTALIS (an EP-SPARQL engine). The correctness of the results of these queries is then tested, which reveals a degree of mismatch due to the different execution strategies used by the engines. LSBench also provides performance tests measuring the throughput of the three engines. As C-SPARQL is not designed for large static data sets, the results show that CQELS and JTALIS achieve throughput higher than C-SPARQL’s by some orders of magnitude. The same test is run again with varying static data sizes and different numbers of simultaneous queries. CQELS outperforms the other engines, but all show linear deterioration of throughput as the number of queries increases. LSBench does not measure other performance metrics, such as memory usage.

An extension of SRBench that is mainly concerned with the correctness problem is CSRBench (Dell’Aglio et al., 2013). When comparing the output of different RSP engines, it is difficult to determine whether differing results for the same query reflect incorrect behaviour or merely different operational semantics. To address this problem, the authors analyse the operational semantics of these engines, focusing on the stream-to-relation and relation-to-stream operators. The stream-to-relation analysis is based on the SECRET model (Botan et al., 2010), which characterises the behaviour of time windows through four functions: scope, content, report and tick. For relation-to-stream semantics, they consider the operator used (Rstream, Istream, or Dstream) and whether or not the engine produces a notification for an empty answer. However, their approach is only applicable to CQL-based systems.

Another framework for benchmarking RSP engines is CityBench (Ali et al., 2015), which focuses on evaluating system performance under a realistic dynamic setting in the smart city domain. It provides a configurable testbed infrastructure, which allows the use of evaluation tests using fine-tuned configuration parameters. These include changes in input streaming rate, variable background data size, number of concurrent queries and number of streams within a single query. They evaluated two RSP engines (C-SPARQL and CQELS) in terms of latency, memory consumption and completeness of results while varying the configurable parameters.

3.7 Conclusion

The processing of dynamic data has gained increasing attention in the Semantic Web community over the past few years. Research efforts have addressed several issues in this area, including defining and querying semantic streams, stream reasoning, publishing linked streams, and benchmarking semantic stream processing systems. Semantic streams are usually defined as sequences of RDF triples associated with a time element. However, there is room for other definitions of RDF streams based on different granularities, e.g. streams of RDF graphs.


The main focus in the area has been to enable on-the-fly processing of semantic streams, which led to the development of a number of stream processing engines that can handle dynamic RDF streams. However, apart from CQELS, none of these engines is natively designed for RDF streams. CQELS defines its own operators that work directly on RDF streams; however, it does not support reasoning.

SRBench (Zhang et al., 2012) found that none of the tested stream processing engines provides reasoning support. The few studies that aim to provide streaming inference support have been reviewed in this chapter. Areas such as real-time optimisation and distributed processing of RDF streams remain very little explored.

In conclusion, there are some RDF stream processing systems (Table 3.1) but the majority of them do not support reasoning. On the other hand, there are some stream reasoning systems (reviewed in Section 3.2) but they do not address the optimisation problem. Therefore, our work aims to close this gap, by providing reasoning support as well as adaptive optimisation in a native, unified approach.


Chapter 4: Continuous Reasoning

While most of the leading works in the semantic stream processing area focus on RDF stream processing without reasoning support (Zhang et al., 2012), our main aim is to enable reasoning over RDF streams. In this chapter, we describe our approach to supporting generic rule-based reasoning over real-time streaming data. First, we define a number of general requirements that informed our design decisions. Second, we introduce our continuous reasoning framework, including a definition of the RDF stream datatype and an operational description of the supported operators. Then, we introduce R4 – Rule-based Reasoner for RDF streams using Rete – our prototype stream reasoning engine. We describe its architecture and how it implements the different operators of the framework, and present an extension to the standard Semantic Web rule language RIF that adds support for temporal operations.

4.1 Requirements

Different scenarios have different requirements, mainly due to different stream characteristics (input rate, volume, bursts, etc.), resource constraints, and the nature and complexity of the rules to be applied. For example, network analysis systems must deal with high-volume input streams (Sullivan, 1996), while sensor networks must tolerate the power constraints of the sensors (Akyildiz et al., 2002). This section presents some common requirements and challenges that should be considered to enable the processing of semantic data streams. Most of these requirements are based on Stonebraker et al. (2005), who defined general requirements for stream processing applications (DSMSs), with the addition of inference support, which is only relevant to semantic streams.

R.1 Integration
In many cases, it is essential to combine stored data with streaming data to provide answers to queries. For example, names and locations of sensors are typically static stored data, while the sensors’ observations are streaming data. Queries asking for the ‘number of cycles hired in the last ten minutes in central London’ require a combination of both stored and streaming data to be evaluated. It is therefore important to have an efficient mechanism for the unified processing of stored and streaming data. Furthermore, many scenarios call for the integration of data from different sources (Margara et al., 2014), which is the main reason for modelling data in the RDF format, which enables interoperability.

R.2 Time management
As streaming data change over time, scenarios in streaming applications are usually concerned with monitoring this change. For example, weather monitoring applications can use changes in meteorological data observed during specific windows of time to predict severe weather conditions. This means that a data model linking each streaming data element to the time domain is needed. In addition to the time-based data model, a time-aware processing model is also needed to express time relations between data.

R.3 Data-driven processing
Some applications – especially in the health and environment domains – are time-critical, demanding fast response times. For example, a home health monitoring application (Paganelli and Giuli, 2007) should alert the hospital when it infers that a critical situation has occurred based on abnormal biomedical parameters observed by its sensors. In general, answers should be provided before they become outdated or useless. To achieve this, latency should be kept to a minimum. In a data-driven environment, stream elements are consumed and results are generated as soon as they are received. Stonebraker et al. (2005) describe data-driven systems as ‘active’ systems, as opposed to ‘passive’ systems, which wait to be told what to do by an application before beginning processing.

R.4 Memory utilisation
As data streams are potentially unbounded in size, complete processing of the whole dataset would also require unbounded memory. To address this challenge, quality and completeness of results can be traded for memory space.

R.5 Inference support
One of the distinctive features of the Semantic Web is its expressive inference capabilities. Applications that model their streaming data using Semantic Web languages should support some form of reasoning, as some queries need implicit knowledge. For example, in the social media domain, a possible scenario (similar to scenarios in Balduini et al. (2013)) is to request recent tweets with hashtags related to a specific city. Reasoning over an ontology that models the different districts of this city can retrieve these tweets. As reasoning over expressive ontologies is considered expensive, there is a trade-off between expressivity and efficiency.

10 By latency, we refer to response time, the time between the arrival of new elements and the generation of output results. We use both terms interchangeably.


R.6 Managing dynamic environments
Response time and memory consumption are highly dependent on the internal processing plans generated by systems. Implementing a good optimisation strategy can improve a system’s performance. However, during the life of a running plan, the environmental conditions (e.g. the stream’s input rate) may change, so the system needs to support adaptive optimisation (Babu and Bizarro, 2005). For example, in a street monitoring application, input rates can vary significantly between busy and quiet periods. Without adaptivity, performance may drop significantly as conditions change over time, because queries and rules in a streaming environment are long-running, so a bad optimisation plan causes long-term damage to performance.

Relation to other requirements in the literature: A recent survey of the stream reasoning area (Margara et al., 2014) analysed a number of scenarios and produced requirements similar to those mentioned above. These include integration (R.1); time management (R.2); efficiency, which is directly related to the low latency requirement (R.3); big data management, which includes the memory utilisation (R.4) and managing dynamic environments (R.6) requirements; expressivity, a more general form of inference support (R.5); and Quality of Service, which is also related to managing dynamic environments and adaptivity (R.6). Other requirements that go beyond the scope of this thesis are distribution, uncertainty management, and historical data management. We consider distribution as future work. While reasoning with uncertainty (Halpern, 2005) is an important requirement, we consider it of secondary importance to the requirements we address.

As stated at the beginning of this section, some of our requirements were inspired by Stonebraker et al.’s (2005) eight rules for stream processing. Their first rule is to ‘keep the data moving’, which is related to our low latency requirement (R.3). Their second rule is to ‘query using SQL on streams’. Although querying using SQL might be irrelevant for linked streaming data, this rule demands support for time-aware data processing, which is related to our time management requirement (R.2). Another rule is to ‘integrate stored and streaming data’, similar to our first requirement (R.1). Furthermore, the ‘process and respond instantaneously’ rule highlights the importance of highly optimised plans for achieving good performance, which relates to our requirement for adaptive optimisation (R.6). Stonebraker et al.’s other rules are mostly related to distribution and managing uncertainty, including automatic partitioning and scaling, guaranteeing data safety and availability, handling stream imperfections, and generating predictable outcomes, which we consider out of the scope of this thesis.

Addressing these requirements in the literature: Table 4.1 shows how the most relevant systems in the literature address the previous requirements. All of these RDF stream processing systems enable the integration of different data sources by processing streams encoded in RDF, and also enable integration with static datasets. However, the majority of them rely on underlying stream or event processors and SPARQL engines, which means that unified processing of stored and streaming RDF data is not enabled (Le-Phuoc et al., 2011). A time-based model for RDF data is supported by most systems by annotating data elements with their time of occurrence or validity. EP-SPARQL also supports temporal reasoning. Data-driven processing is supported by EP-SPARQL, CQELS, INSTANS and Sparkwave. A common approach to memory utilisation, followed by the majority of these systems, is to evaluate queries over sliding windows, based on the assumption that users are mainly interested in recent data. Reasoning over a subset of RDFS is enabled in EP-SPARQL and Sparkwave, while IMaRS also supports transitive properties. Managing dynamic environments is only supported by CQELS, through adaptive optimisation of the continuous SPARQL queries.

Table 4.1: Addressing the requirements in the related RDF stream processing systems

The requirements addressed by each system are:
- Streaming SPARQL (Bolles et al., 2008): integration (R.1), time management (R.2), memory utilisation (R.4).
- C-SPARQL (Barbieri et al., 2009): integration (R.1), time management (R.2), memory utilisation (R.4).
- EP-SPARQL (Anicic et al., 2011): integration (R.1), time management (R.2), data-driven processing (R.3), memory utilisation (R.4), inference support (R.5).
- SPARQLStream (Calbimonte et al., 2010): integration (R.1), time management (R.2), memory utilisation (R.4).
- CQELS (Le-Phuoc et al., 2011): integration (R.1), time management (R.2), data-driven processing (R.3), memory utilisation (R.4), managing dynamic environments (R.6).
- INSTANS (Rinne et al., 2012a): integration (R.1), data-driven processing (R.3), memory utilisation (R.4).
- IMaRS (Barbieri et al., 2010b): integration (R.1), time management (R.2), memory utilisation (R.4), inference support (R.5).
- Sparkwave (Komazec et al., 2012): integration (R.1), time management (R.2), data-driven processing (R.3), memory utilisation (R.4), inference support (R.5).


4.2 Continuous reasoning framework for RDF streams

A framework that supports the requirements set out in the previous section in the context of semantic streams is needed. The main features of this framework and their relation to the above requirements are listed below.

1. Unified and native RDF support
The majority of the systems reviewed in the previous chapter that enable semantic stream processing (Barbieri et al., 2010a; Calbimonte et al., 2010; Anicic et al., 2011; Walavalker et al., 2008) employ a black-box approach, in which they rely on stream processing engines that are not optimised for the RDF data model. This also introduces the overhead of wrapping or translating queries and RDF stream elements to the underlying engine’s query and data model (Le-Phuoc et al., 2011).

While almost all RDF stream processing systems enable the integration of streaming and static RDF data, the processing of these two types of data is not unified, which hinders possible optimisations. In order to offer maximum optimisation opportunities – which have a major impact on response time, memory consumption, and completeness of results – the system should have full control over the low-level processing operators. Our framework is based on a white-box architecture, in which processing operators work directly on RDF data streams and background knowledge in a unified approach. This feature addresses the integration requirement (R.1) and is also related to the low latency requirement (R.3).

2. Continuous reasoning
Our processing framework is not only aimed at the continuous processing of high-throughput RDF streams but also adds inference support. The initial focus of research in this area was to enable the rapid processing of RDF streams; as covered in Section 3.1, only EP-SPARQL adds simple forms of reasoning. As opposed to the approaches covered in Section 3.2.2, which support higher-complexity reasoning over low-throughput streams, our approach enables lower-complexity, general rule-based reasoning at the level of OWL 2 RL on high-throughput streams.

Our reasoning framework includes a time-annotated data model and a time-aware continuous processing model based on time windows. The model is fully streaming, so that even inference results are themselves considered streams and can re-enter the system as input and contribute to deriving further results. The continuous reasoning model addresses the inference requirement (R.5) and the time management requirement (R.2).


3. Incremental, data-driven processing
All operators in our model work in a data-driven manner. Stream elements received from outside streams are pushed directly to the operators, and the operators themselves communicate their results in a push-based approach. This ensures minimum latency, as opposed to the periodic evaluation approach used in C-SPARQL, Streaming SPARQL, and SPARQLStream, which update the content of the windows at the period specified in the query. For example, if the re-evaluation period is specified as five minutes, there can be a latency of up to five minutes between the arrival of a triple and the results it produces.

Furthermore, as the inference process is computationally expensive, the naïve re-computation approach is not adequate in a streaming context. Thus, we use the naturally incremental Rete algorithm (Forgy, 1982) to implement the continuous reasoning networks. As the Rete algorithm is memory-intensive, we employ a time-based extension to control memory consumption. These techniques aim to improve the performance of the system, addressing the data-driven processing requirement (R.3) and memory utilisation (R.4).

4. Adaptive optimisation
As a consequence of the unified white-box approach, optimisation opportunities should be exploited to improve the performance of the system. With the exception of CQELS, RDF stream processing systems do not focus on the optimisation problem, as it is often externalised to the underlying engine; cross-model optimisation of stream and static data is restricted by the black-box approach.

As the traditional RDBMS cost-based optimisation approach is not suited for the streaming context due to the absence of statistics beforehand and the dynamic nature of streaming environments, our framework employs adaptive optimisation techniques based on the research in data stream management systems. The adaptive optimiser responds to changes in the conditions of input streams by re-organising operators in the continuously running dataflow networks. The adaptivity feature allows managing dynamic environments (R.6), enabling lower response time (R.3).

Table 4.2 shows the features supported in our framework and their connections with the requirements.


This continuous reasoning framework is realised using data-flow networks, in which rules are translated into pipelined, non-blocking physical operators (working incrementally in a data-driven fashion) that form the nodes, while the edges represent RDF streams flowing between operators, generating results in a continuous manner. Streams are observed through time windows, which enable the time-based maintenance of both stream elements and inference results. In the following subsections, the notion of RDF streams is formally defined, followed by the operators supported in our framework.

Table 4.2: Requirements and design decisions

- R.1 Integration: native RDF support.
- R.2 Time management: continuous reasoning.
- R.3 Data-driven processing: native RDF support; incremental, data-driven operators; adaptive optimisation.
- R.4 Memory utilisation: incremental, data-driven operators.
- R.5 Inference support: continuous reasoning.
- R.6 Managing dynamic environments: adaptive optimisation.

4.2.1 Data Model

Using RDF as a unified data model for all data streams can solve the heterogeneity problem found in the IoT area. However, the basic RDF model needs to be extended to express the temporal aspect of data streams by adding a time annotation. Therefore, as in the DSMS systems, streams are sequences (possibly unlimited) of pairs, each of which is formed of a data element and a time element.

4.2.1.1 The data element

All RDF stream processing systems reviewed in the previous chapter use the RDF statement as the data element, defining an RDF stream as an ordered sequence of RDF triples associated with a time element. However, RDF streams can be seen as streams of events that occur in real life, and while some of these events can be represented using a single statement, in many cases many more triples are needed to convey their meaning. For example, a person entering a room can be represented in one RDF statement, while 14 triples are used to represent a single sensor observation in the SemSorGrid4Env project using the SSN ontology. The set of triples that describes a single event can be considered an RDF graph.

Defining the stream as a sequence of graphs can also be more meaningful in other situations. First, use cases that require tuple-based windows do not work correctly if events are represented in more than one triple using a triple-based definition of streams. For example, a query asking to observe the last ten tweets of a specific user does not give correct results using a window of size 10 over a triple-based stream, as tweets are usually represented in more than one triple. Second, in systems that apply sampling techniques over input streams for load shedding, sampling at a graph level should produce more results than sampling at a triple level. For example, if a tweet is represented in two triples (e.g. one represents the user; one represents hash tags used), a simple sampling technique that sheds 50% of the load will produce no results if the query concerns both triples of the tweet (e.g. a specific user and a specific hash tag) using the triple-based definition, while sampling on a graph-level can produce 50% of the qualifying tweets.

For these reasons, we define RDF streams as sequences of RDF graphs associated with time annotations. This definition is general enough to capture cases where an event is represented using a single triple, as it can be a special case of a graph containing one statement. This definition represents external streams arriving at the system from a data source. An external stream is an ordered sequence of pairs; each pair consists of an RDF graph and a time element. On the other hand, internal streams represent data that flows between operators inside the system. Our operators work on the fine-grained level of triples. The data element of internal streams depends on the operator that produced it. For example, an output stream of a filter operator (checking a triple pattern) is a sequence of triples, while an output of a join operator is a number of joined triples representing a partial result. These partial results are lists of triples, i.e. graphs. However, we refer to them as tokens in order to avoid confusing the partial results with the external stream graphs. An internal RDF stream is an ordered – possibly unlimited – sequence of tokens associated with a time element.

4.2.1.2 The time element

For the time element, we follow the temporal model of Krämer and Seeger (2005), where external stream elements are associated with timestamps (representing time of occurrence) and internal stream elements are annotated with time intervals (representing validity). If external graphs arrive at the system without timestamps, they should be stamped by the system (system time) prior to any processing. If they come annotated with application time, we expect them to arrive in order.

Upon entering the system, external streams are transformed to internal streams by adding an expiration time to the occurrence (start) time. The expiration time for each element can be calculated using time window operators by adding the specified window size to the element’s start time. This facilitates garbage collection and inference result maintenance, as they will also be assigned a time interval indicating their expiration time. This can further enable coalescing value-equivalent streams with adjacent time intervals.

4.2.1.3 Formal Definitions

Definition 1 (Time): Let T be a discrete time domain with a total order under ≤, and let an element t ∈ T be a timestamp. A time interval is defined as [ts, te), where ts and te are both timestamps and ts ≤ te.

Definition 2 (RDF triple and RDF graph): Following Cyganiak et al. (2014), we define an RDF triple as follows. Let I denote the set of IRI constants, L the set of literals, and B the set of blank nodes; (s,p,o) then denotes an RDF triple and its subject-predicate-object components, where s ∈ IB, p ∈ I and o ∈ ILB. We define an RDF graph, G^RDF, as a set of RDF triples: G^RDF = {(s1, p1, o1), (s2, p2, o2), ..., (sn, pn, on)}.

Definition 3 (External RDF stream): We consider an external RDF stream, Se, to be a sequence of timestamped RDF graphs, and therefore define it as a sequence of pairs <G^RDF, t>, each of which comprises an RDF graph and a timestamp t ∈ T. Like Arasu et al. (2006), we note that there could be zero, one, or multiple graphs with the same timestamp in a stream. However, there must be a finite number of graphs with any given timestamp.

Order and equivalence: Stream elements are ordered by their timestamps. For two stream elements <G1, t1> and <G2, t2>, we say that G1 <t G2 if t1 < t2, which means that G1 precedes G2 in the stream. However, graphs with the same timestamp have no particular order within the stream; they can appear in any order and the stream will remain the same. We say that G1 is timestamp-equivalent to G2 (G1 =t G2) if t1 = t2. Therefore, a stream B is equivalent to a stream A if B contains the same graphs as A and, for each G1, G2 ∈ A, G1 precedes G2 in B if G1 <t G2. For example, stream A = {<G1, 2>, <G2, 2>, <G3, 5>, <G4, 7>} is equivalent to stream B = {<G2, 2>, <G1, 2>, <G3, 5>, <G4, 7>}, since G1 and G2 share the same timestamp.

Definition 4 (Internal RDF stream): We consider an internal RDF stream, Si, to be a sequence of tokens associated with time intervals, and therefore define it as a sequence of pairs <K, [ts, te)>, where K is a token, that is, a set of one or more RDF triples, and ts, te ∈ T are the start and end timestamps forming a time interval. These tokens are used both for representing partial results that are passed around the system and for representing completed results that are emitted by the reasoner. Because each token consists of one or more triples, a token can also be considered a graph, but we call it a token to avoid confusion with the graphs that represent raw input data.

As each token can consist of one or more triples, we differentiate between two types of internal streams. In the first, the data element is a singleton graph containing only one triple, and the stream is called an internal triple stream, denoted Sit; in the second, called an internal graph stream and denoted Sig, tokens represent graphs containing more than one triple. The time element in both types is the same: two timestamps representing a time interval. We note that internal triple streams are a subset of internal graph streams, which means that every internal triple stream is also an internal graph stream.

An internal stream is ordered by the start timestamp of its elements. As in external streams, there could be multiple—but limited—tokens with the same start timestamp in an internal stream. Furthermore, there is no order among tokens with the same start timestamp.

An external stream of RDF graphs can be transformed into an equivalent stream of RDF triples by splitting each graph into its constituent triples. The same timestamp t is assigned as the start time ts to all triples of this graph so that ordering is preserved internally. The end timestamp te is defined as infinity ∞.
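As an illustration of these definitions, the following Python sketch shows one possible in-memory representation of external stream elements and internal tokens, together with the graph-to-triples transformation just described. The class and function names are hypothetical and do not correspond to R4's actual implementation.

from dataclasses import dataclass
from typing import FrozenSet, Iterator, Tuple

Triple = Tuple[str, str, str]            # (subject, predicate, object)
INF = float("inf")

@dataclass(frozen=True)
class ExternalElement:                   # <G, t>: an RDF graph with a single timestamp
    graph: FrozenSet[Triple]
    t: float

@dataclass(frozen=True)
class Token:                             # <K, [ts, te)>: internal stream element
    triples: FrozenSet[Triple]
    ts: float
    te: float

def split_into_triples(element: ExternalElement) -> Iterator[Token]:
    """Transform an external graph into equivalent internal triple-stream elements:
    every triple keeps the graph's timestamp as its start time and is initially
    valid until infinity, so ordering by start timestamp is preserved."""
    for triple in element.graph:
        yield Token(frozenset([triple]), element.t, INF)

g = ExternalElement(frozenset({("s1", "p1", "o1"), ("s1", "p2", "o2")}), t=5)
for token in split_into_triples(g):
    print(token.triples, f"[{token.ts}, {token.te})")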

4.2.2 Operators

All operators in our continuous reasoning framework work on a data-driven basis. Each operator takes one or multiple streams as input and produces one stream as output. The difference between operators is that some of them can be stateless, while others must be stateful to work correctly.


According to Andrade et al. (2014), a stream processing operator can be either stateful (maintaining an internal state across tuples during processing) or stateless (processing tuples independently of each other). Stateless operators, including the filter, map, graph-to-triples, and window operators, can process each stream element they receive without storing or accessing any internal data structure created by processing earlier data (Andrade et al., 2014). If the operation results in an output element, it is immediately appended to the output stream. On the other hand, stateful operators, including the join and aggregation operators, need to access and maintain internal state each time they receive an input. This internal state affects the results produced by the operator (Andrade et al., 2014). In the next subsections, we provide an operational description of these operators, presenting a streaming algorithm for each of them. We start with the operators that convert between the different types of streams defined in the previous section, followed by the data processing operators.

We expect these operators to be composed into networks as follows. First, an external-to-internal operator converts an external stream to an equivalent internal graph stream. This is followed by a graph-to-triples operator, which converts the internal graph stream to its equivalent internal triple stream. The internal triple stream is then passed to any number of filter operators. The resulting internal triple streams from the filters are then joined into internal graph streams by join operators. Window operators can be placed before or after the filters, but necessarily before the joins. The final data processing operator in a network should be the map operator, which generates the results as an internal graph stream. This stream can then be passed to an internal-to-external operator to generate results in the form of an external stream, and can enter the network again for further processing as an internal graph stream through the graph-to-triples operator.
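The push-based composition described above can be sketched as follows. This is an illustrative Python fragment, not R4's implementation: the operator interface, the element representation (a set of triples paired with a time interval) and the operators shown are simplified stand-ins, and joins, windows and the map operator are omitted.

INF = float("inf")

class Operator:
    """Minimal data-driven operator: each input element is processed on arrival
    and any results are immediately pushed to the downstream operators."""
    def __init__(self):
        self.downstream = []

    def connect(self, op):
        self.downstream.append(op)
        return op

    def emit(self, element):
        for op in self.downstream:
            op.push(element)

    def push(self, element):
        raise NotImplementedError

class GraphToTriples(Operator):          # stateless: splits a graph element into triples
    def push(self, element):
        triples, interval = element
        for triple in triples:
            self.emit(({triple}, interval))

class Filter(Operator):                  # stateless: forwards elements satisfying a predicate
    def __init__(self, predicate):
        super().__init__()
        self.predicate = predicate

    def push(self, element):
        if self.predicate(element):
            self.emit(element)

class Echo(Operator):                    # stand-in sink that just prints results
    def push(self, element):
        print(element)

# Wire a tiny network: graph-to-triples -> filter (keep predicate p1) -> sink.
g2t = GraphToTriples()
flt = g2t.connect(Filter(lambda e: next(iter(e[0]))[1] == "p1"))
flt.connect(Echo())
g2t.push(({("s1", "p1", "o1"), ("s1", "p2", "o2")}, (5, INF)))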

4.2.2.1 External-to-internal

Operator: External-to-internal
Input: stream Se^in
Output: stream Sig^out
1  Foreach <G^RDF, t> arriving from Se^in
2      Append <G^RDF, [t, ∞)> to Sig^out

Listing 4.1: External-to-internal operator


This unary operator is responsible for transforming an external stream to its equivalent internal graph stream ready to be consumed by other operators. For each arriving graph, it assigns the timestamp of the graph as the start time and infinity as the end time forming the time interval [ts, te). The output stream elements of this operator (the internal graph stream) are ordered by the start timestamp.

Example. Listing 4.2 (a) represents an example external stream input, and Listing 4.2 (b) represents the output stream of an external-to-internal operator.

Listing 4.2(a): Example external stream input
<{(s1,p1,o1), (s1,p2,o2)}, 5>
<{(s2,p1,o4), (s2,p2,o4), (s2,p3,o5)}, 8>

Listing 4.2(b): Output stream of an external-to-internal operator
<{(s1,p1,o1), (s1,p2,o2)}, [5, ∞)>
<{(s2,p1,o4), (s2,p2,o4), (s2,p3,o5)}, [8, ∞)>
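For illustration, the conversion can be sketched in Java as below. The element types and the use of Long.MAX_VALUE to stand in for ∞ are assumptions made for this sketch and are not part of the R4 code.

import java.util.List;

// Illustrative stream element types; field names are assumptions, not the R4 API.
record Triple(String s, String p, String o) {}
record ExternalElement(List<Triple> graph, long timestamp) {}          // <G, t>
record InternalElement(List<Triple> graph, long ts, long te) {}        // <G, [ts, te)>

public class ExternalToInternalSketch {
    // Assign [t, infinity) to each arriving graph, preserving arrival order.
    static InternalElement toInternal(ExternalElement e) {
        return new InternalElement(e.graph(), e.timestamp(), Long.MAX_VALUE);
    }

    public static void main(String[] args) {
        ExternalElement in = new ExternalElement(
                List.of(new Triple("s1", "p1", "o1"), new Triple("s1", "p2", "o2")), 5);
        System.out.println(toInternal(in));   // <{(s1,p1,o1),(s1,p2,o2)}, [5, MAX)>
    }
}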

4.2.2.2 Graph-to-triples

Operator: Graph-to-triples
Input: stream Sig^in
Output: stream Sit^out
1 Foreach <G^RDF, [ts, te)> arriving from Sig^in
2   Foreach (s,p,o) ∈ G^RDF
3     Append <{(s,p,o)}, [ts, te)> to Sit^out

Listing 4.3: Graph-to-triples operator

This unary operator is responsible for transforming an internal graph stream to its equivalent internal triple stream. It deconstructs the incoming RDF graphs into their individual triples and assigns the time interval of each graph to all of its triples. The output stream elements of this operator are ordered by the start timestamp.

Example. Listing 4.4 (a) represents an example input stream, and Listing 4.4 (b) represents the output stream of a graph-to-triples operator.


Listing 4.4(a): Example input stream of a graph-to-triples operator
<{(s1,p1,o1), (s1,p2,o2)}, [5, ∞)>
<{(s2,p1,o4), (s2,p2,o4), (s2,p3,o5)}, [8, ∞)>

Listing 4.4(b): Example output stream of a graph-to-triples operator
<{(s1,p1,o1)}, [5, ∞)>
<{(s1,p2,o2)}, [5, ∞)>
<{(s2,p1,o4)}, [8, ∞)>
<{(s2,p2,o4)}, [8, ∞)>
<{(s2,p3,o5)}, [8, ∞)>

4.2.2.3 Internal-to-external

Operator: Internal-to-external
Input: stream Sig^in
Output: stream Se^out
1 Foreach <G^RDF, [ts, te)> arriving from Sig^in
2   Append <G^RDF, ts> to Se^out

Listing 4.5: Internal-to-external operator

This operator is responsible for transforming an internal graph stream back to its equivalent external stream. For each arriving graph, it removes the end timestamp in order to get a stream of graphs with single timestamps, matching definition 3 of an external stream.

Example. Listing 4.6 (a) represents an example internal stream input, and Listing 4.6 (b) represents the output stream of an internal-to-external operator.

Listing 4.6(a): Example internal stream input
<{(s1,p1,o1), (s1,p2,o2)}, [5, 15)>
<{(s2,p1,o4), (s2,p2,o4), (s2,p3,o5)}, [8, 18)>

Listing 4.6(b): Output stream of an internal-to-external operator
<{(s1,p1,o1), (s1,p2,o2)}, 5>
<{(s2,p1,o4), (s2,p2,o4), (s2,p3,o5)}, 8>


4.2.2.4 Filter

Operator: Filter
Input: stream Sit^in, filter predicate fp
Output: stream Sit^out
1 Foreach <{(s,p,o)}, [ts, te)> arriving from Sit^in
2   If fp(<{(s,p,o)}, [ts, te)>) is true
3     Append <{(s,p,o)}, [ts, te)> to Sit^out

Listing 4.7: Filter operator

The unary filter operator, in general, evaluates a predicate over each incoming element. If the predicate is satisfied, the element is immediately appended to the output stream; otherwise, it is simply discarded. The predicate is general enough to capture any one-pass, externally defined predicate (including the built-in datatype predicates and functions of RIF Core) and also the common case of triple patterns. When the predicate represents a triple pattern, the filter operator is semantically equivalent to SPARQL’s basic pattern match.

Let V denote the set of variables, I the set of IRI constants, L the set of literals, B the set of blank nodes. Let TPN = {(s,p,o)| s ∈ IBV, p ∈ IV, o ∈ ILBV} be the set of triple patterns (Harris et al., 2013). Given tpn ∈ TPN, the triple pattern match predicate is evaluated as follows:

match((s,p,o,[ts,te)), tpn) = true, if (s = tpn.s ∨ tpn.s ∈ V) ∧ (p = tpn.p ∨ tpn.p ∈ V) ∧ (o = tpn.o ∨ tpn.o ∈ V); false, otherwise

It ensures that the incoming triple's subject is equal to the triple pattern's (tpn) subject or that the triple pattern's subject is a variable (in the following examples, we use the question mark '?' to mark a variable name). The same checks are applied to the predicate and object of the incoming triple. The algorithm presented in Listing 4.7 describes the filtering process.

As the input stream is handled triple by triple in order, and as the time annotation is not changed, the same order is preserved in the output stream.
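The match predicate reduces to a three-way check, as the following Java sketch shows. The Triple type, the class name, and the convention that variables carry a leading '?' are assumptions made for illustration.

// Minimal sketch of the triple pattern match predicate; the Triple type and
// the '?'-prefix convention for variables are assumptions for illustration.
public class TriplePatternMatch {
    record Triple(String s, String p, String o) {}

    static boolean isVariable(String term) {
        return term.startsWith("?");
    }

    // A pattern term matches if it is a variable or equal to the triple's term.
    static boolean matches(Triple t, Triple pattern) {
        return (isVariable(pattern.s()) || pattern.s().equals(t.s()))
            && (isVariable(pattern.p()) || pattern.p().equals(t.p()))
            && (isVariable(pattern.o()) || pattern.o().equals(t.o()));
    }

    public static void main(String[] args) {
        Triple pattern = new Triple("?x", "p2", "?y");
        System.out.println(matches(new Triple("s1", "p2", "o2"), pattern)); // true
        System.out.println(matches(new Triple("s1", "p1", "o1"), pattern)); // false
    }
}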


Example. Listing 4.8 (a) represents an example input stream, and Listing 4.8 (b) represents the output stream of a filter operator, with the triple pattern (?x, p2, ?y) as its filter predicate.

Listing 4.8(a): Example input stream of a filter operator
<{(s1,p1,o1)}, [5, ∞)>
<{(s1,p2,o2)}, [5, ∞)>
<{(s2,p1,o4)}, [8, ∞)>
<{(s2,p2,o4)}, [8, ∞)>
<{(s2,p3,o5)}, [8, ∞)>

Listing 4.8(b): Output stream of a filter operator (fp = (?x, p2, ?y))
<{(s1,p2,o2)}, [5, ∞)>
<{(s2,p2,o4)}, [8, ∞)>

4.2.2.5 Window

Operator: Window
Input: stream Sit^in, window size w
Output: stream Sit^out
1 Foreach <{(s,p,o)}, [ts, te)> arriving from Sit^in
2   Append <{(s,p,o)}, [ts, ts + w)> to Sit^out

Listing 4.9: Window operator (time-based)

The window operator is widely used in stream processing for two main reasons. First, it unblocks stateful operators and constrains the unlimited memory requirement. All the above one-pass operators are easily adapted to process streams in a pipelined fashion, as they do not have to keep states. However, stateful operators need to access the whole state in order to produce results. If streams are infinite, the memory required to save the state is unlimited, and the operator will be blocked indefinitely as it waits for the end of the streams. The window operator restricts the size of the input stream. The second reason for using the window operator is that it can serve as a part of the query semantics, as streaming applications are usually concerned with pattern changes over time. Users, for example, can express that they are interested in a specific event happening in the last hour as part of the query or rule.

Listing 4.9 shows the window operator algorithm, where the time-based window size is denoted as w, which is a natural number (w ∈ N) that represents the number of time instances covered by the window. The window operator, itself, is a stateless one-pass unary

operator. We use a time-based sliding window definition. Its task is to assign the expiration time of each incoming element by adding the specified window size to the element's start time. The window operator can be placed anywhere in the query pipeline but should precede any stateful operator. These stateful operators use the expiration time assigned by the window operator to maintain their windowed states.

We note that this model does not place a limit on the number of stream elements that fit within a windowed state. Referring back to the external stream definition (Definition 3), which states that multiple graphs can have the same timestamp, this means that there may be a very large but finite number of graphs within a windowed state at any given point. However, we do not consider windows that are fixed in terms of the number of stream elements they hold (such as the triple-based windows defined in CQELS (Le-Phuoc et al., 2011)).
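The operator itself reduces to a single timestamp rewrite, as in the Java sketch below; the element type and names are assumptions made for illustration.

// Minimal sketch of the time-based window operator: it only rewrites the end
// timestamp to ts + w; names and types are illustrative, not the R4 API.
public class WindowOperatorSketch {
    record Element(String payload, long ts, long te) {}

    static Element applyWindow(Element e, long windowSize) {
        // Expiration time = start time + window size; the payload is untouched.
        return new Element(e.payload(), e.ts(), e.ts() + windowSize);
    }

    public static void main(String[] args) {
        Element e = new Element("{(s1,p1,o1)}", 5, Long.MAX_VALUE);
        System.out.println(applyWindow(e, 10));   // [5, 15)
    }
}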

Example. Listing 4.10 (a) represents an example input stream, and Listing 4.10 (b) represents the output stream of a window operator, with the size defined as 10 time instances.

Listing 4.10(a): Example input stream of a window operator
<{(s1,p1,o1)}, [5, ∞)>
<{(s2,p1,o4)}, [8, ∞)>
<{(s3,p3,o5)}, [10, ∞)>

Listing 4.10(b): Example output stream of a window operator (w = 10)
<{(s1,p1,o1)}, [5, 15)>
<{(s2,p1,o4)}, [8, 18)>
<{(s3,p3,o5)}, [10, 20)>

4.2.2.6 Join

Join is a binary stateful operator. It works symmetrically by matching arriving elements from one input stream against the elements in the window state of the opposite stream. Unlike traditional joins, joins in streaming applications do not work on full states but rather on windows representing the most recent part of the input stream. As streaming data change over time, data elements continuously enter and exit the valid window part of the stream. Therefore, it is the join operator's responsibility to ensure that its window states are up to date. At any point in time t ∈ T, the window state of a stream i is wsi = {<K, [ts, te)> | <K, [ts, te)> ∈ Si ∧ t ≤ te}, i.e. it includes all elements in stream i that have not yet expired. The join operator uses a join predicate, jp, which is a condition composed of variables and constants, conjunction and disjunction symbols (∧, ∨), and equality and ordering symbols (=, ≠, <, >, ≤, ≥).


Function: removeExpired(window state ws, timestamp t)
1 ex = new list
2 Foreach <K, [ts, te)> ∈ ws
3   if (te < t)
4     add <K, [ts, te)> to ex
5     remove <K, [ts, te)> from ws
6   else return ex

Listing 4.11: Auxiliary function 'removeExpired' algorithm

Function: probe(window state ws, element K, join predicate jp)
1 return all K' ∈ ws where jp(K, K') holds

Listing 4.12: Auxiliary function 'probe' algorithm

Operator: Join (left activation)
Input: streams Sigl^in, Sitr^in, window states wsl, wsr, join predicate jp
Output: stream Sig^out
1 Foreach <K, [ts, te)> arriving from Sigl^in
2   insert <K, [ts, te)> into wsl
3   removeExpired(wsr, ts)
4   matches = probe(wsr, K, jp)
5   Foreach <K', [ts', te')> ∈ matches
6     Append <K ∪ K', [max(ts, ts'), min(te, te'))> to Sig^out

Listing 4.13: Join operator (left activation)

There are four processing steps that are carried out symmetrically by a join operator upon each arrival of a stream element. First, the arriving element is added to the state of the side it has arrived from. Second, the opposite stream state is updated by removing expired elements. This is to ensure that the new element does not match an outdated element. Then, the opposite state is probed for matches. Finally, the results are generated and appended to the output stream. Listing 4.13 shows these steps when a stream element arrives at the left side. Line 2 inserts the new element to the left window state, lines 3 and 4 update and probe the right window state, and lines 5 and 6 generate results. There is an equivalent

right hand join algorithm to handle elements arriving at the right input stream. The full join requires both of these algorithms, so they both append results to the same output stream.

Multiple join operators can be connected to create a left-deep network. Therefore, apart from the first join in the network, all other joins receive an internal graph stream—which is the output of the previous join in the network—as their left input, and an internal triple stream—which is usually an output of a filter operator—as their right input. The first join in the network, as a special case, receives two internal triple streams from two filter operators. However, the join operator signature should still be adequate, as we defined internal triple streams as a subset of internal graph streams. As this first join constructs its results by combining an arriving triple from one side with a matching triple from the window state of the other side, the result is an internal graph stream. This means that the first join operator works also as a triple-to-graph operator.

Deciding which elements to remove from a window state is based on the expiration times attached to these elements, as shown in the ‘removeExpired’ function (Listing 4.11). Elements with expiration times that are smaller than the start times of the newly arriving tuples cannot be joined, as there is no overlap in their validities. As input streams arrive in increasing order of start time, even future elements will have no chance to join with them, so they can be safely removed.

When joining two stream elements, the result is a token that contains both elements. This generated element should be considered invalid as soon as one of the elements that contributed to it expires. Therefore, the generated token’s time interval is assigned as the intersection of the intervals of the contributing elements.
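The four steps of a left activation, including the interval intersection of the result, can be sketched in Java as below. The Token type, the use of plain lists for the window states, and the stubbed join predicate in the usage example are simplifying assumptions, not the actual R4 implementation (which uses indexed memories, described in Section 4.3.3).

import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

// Minimal sketch of left activation of the windowed symmetric join.
public class WindowJoinSketch {
    record Token(List<String> triples, long ts, long te) {}

    static List<Token> leftActivate(Token left,
                                    List<Token> leftState,
                                    List<Token> rightState,
                                    BiPredicate<Token, Token> jp) {
        leftState.add(left);                                   // 1. insert
        rightState.removeIf(r -> r.te() < left.ts());          // 2. prune expired
        List<Token> results = new ArrayList<>();
        for (Token right : rightState) {                       // 3. probe
            if (jp.test(left, right)) {
                List<String> combined = new ArrayList<>(left.triples());
                combined.addAll(right.triples());
                // 4. emit; validity is the intersection of both intervals
                results.add(new Token(combined,
                        Math.max(left.ts(), right.ts()),
                        Math.min(left.te(), right.te())));
            }
        }
        return results;
    }

    public static void main(String[] args) {
        List<Token> leftState = new ArrayList<>();
        List<Token> rightState = new ArrayList<>();
        rightState.add(new Token(List.of("(s2,p2,o1)"), 6, 16));
        // Join on equal subject is stubbed by a predicate over the first triple.
        BiPredicate<Token, Token> sameSubject =
                (l, r) -> l.triples().get(0).startsWith("(s2")
                       && r.triples().get(0).startsWith("(s2");
        Token arriving = new Token(List.of("(s2,p2,o4)"), 8, 18);
        System.out.println(leftActivate(arriving, leftState, rightState, sameSubject));
        // -> [Token[triples=[(s2,p2,o4), (s2,p2,o1)], ts=8, te=16]]
    }
}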

The join algorithm is symmetrical; it handles elements from both sides in the order in which they arrived. However, it is possible for the join operator to receive an element from each side at the same time (thus having the same timestamp). We show here that the result stream will be the same regardless of the order in which the join operator handles its input.

Let el be the new element arriving at the left input stream, er the new element arriving at the right input stream, and wsl and wsr the left and right window states. Now, let us consider joining the left input element first. The algorithm starts by inserting el in the left window state, wsl:

1- wsl’ = wsl  el


The next step is to update the right window by removing expired elements (referred to as ex):

2- wsr’ = wsr - exr Then, the updated right window is probed with the left incoming element to identify matches:

3- matches = el ⋈ wsr’ At this stage, the algorithm finishes by appending the found matches to the result stream S1:

4- S1 = {earlier elements, el ⋈ wsr’} The algorithm is called again, repeating the steps performed on the other side for the element arriving at the right input, er. First, the updated right window state, wsr’ (step 2), is updated again by inserting the new element:

5- wsr’’ = wsr’  er

Then, the left window (which was updated in step 1) is updated by removing expired elements:

6- wsl’’ = wsl’ – exl New matches are found by probing the updated left window with the element arriving at the right input:

7- matches = er ⋈ wsl’’ The result stream, S1, is then appended with the new matches:

8- S1 = {earlier elements, el ⋈ wsr’, er ⋈ wsl’’} The second possibility is that the join algorithm starts processing the element from the right stream first, so let us repeat the same eight steps to produce output stream S2 so we can determine whether S1 and S2 are equivalent.

1- wsr’ = wsr  er

2- wsl’ = wsl – exl

3- matches = er ⋈ wsl’

4- S2 = {earlier elements, er ⋈ wsl’}

5- wsl’’ = wsl’  el

6- wsr’’ = wsr’ – exr

7- matches = el ⋈ wsr’’

8- S2 = {earlier elements, er ⋈ wsl’, el ⋈ wsr’’}


To compare S1 with S2, we disregard the earlier elements, as they are supposed to be the same. Thus, we compare S1 = { el ⋈ wsr’, er ⋈ wsl’’} with S2 = { er ⋈ wsl’, el ⋈ wsr’’}.

From steps 1, 2, and 6 above, S1 = {el ⋈ wsr', er ⋈ wsl''} = {el ⋈ (wsr – exr), er ⋈ ((wsl ∪ {el}) – exl)} = {el ⋈ (wsr – exr), (er ⋈ (wsl – exl)) ∪ (er ⋈ el)}, and S2 = {er ⋈ wsl', el ⋈ wsr''} = {er ⋈ (wsl – exl), el ⋈ ((wsr ∪ {er}) – exr)} = {er ⋈ (wsl – exl), (el ⋈ (wsr – exr)) ∪ (el ⋈ er)}. If we disregard the part that joins el with er, as it is the same in both streams (the join operator is commutative), we find that S1 = {el ⋈ (wsr – exr), er ⋈ (wsl – exl)} and S2 = {er ⋈ (wsl – exl), el ⋈ (wsr – exr)}.

We note that both streams contain the same results, but they appear in a different order. Based on the definition of the join operator, join results are stamped with the largest start timestamp of their parents. As we expect stream elements to arrive in order, el will have a larger start timestamp than those in the right window state. Therefore, elements in (el ⋈ (wsr – exr)) use el's start timestamp as their start timestamp. The same holds on the other side: elements in (er ⋈ (wsl – exl)) use er's start timestamp as their start timestamp. As el and er have the same start timestamp, both streams are considered equivalent; Definition 4 states that there is no order among stream elements with the same start timestamp.

Example. Using a window of 10 time instances, consider this simple join example: (?x, p2, ?y) ∧ (?x, p2, ?z). It joins two triples (with p2 as predicate) if they have the same subject. Listings 4.14 (a) and (b) represent the left and right input streams of a join operator, and Listing 4.14 (c) represents its output stream.

Listing 4.14(a): Example left input stream of a join operator
<{(s1,p2,o1)}, [5, 15)>
<{(s2,p2,o4)}, [8, 18)>
<{(s4,p2,o5)}, [11, 21)>
<{(s3,p2,o5)}, [14, 24)>
<{(s2,p2,o7)}, [17, 27)>

Listing 4.14(b): Example right input stream of a join operator
<{(s2,p2,o1)}, [6, 16)>
<{(s3,p2,o2)}, [10, 20)>
<{(s1,p2,o3)}, [12, 22)>
<{(s4,p2,o2)}, [14, 24)>
<{(s5,p2,o4)}, [15, 25)>

Listing 4.14(c): Example output stream of a join operator
<{(s2,p2,o4), (s2,p2,o1)}, [8, 16)>
<{(s1,p2,o3), (s1,p2,o1)}, [12, 15)>
<{(s3,p2,o5), (s3,p2,o2)}, [14, 20)>
<{(s4,p2,o2), (s4,p2,o5)}, [14, 21)>

When the second element from the left stream (with subject s2) arrives, it generates the first result as it joins with the first element in the right window. The second result is generated when the third element from the right stream (with subject s1) joins the first element in the left window state. Then at time point 14, we simultaneously get two elements from both streams. If we handle the incoming element from the left stream first,

the new element with subject s3 joins with the second element in the right window state, creating the result <{(s3,p2,o5), (s3,p2,o2)}, [14, 20)>; then, by handling the incoming element from the right stream, we get the result <{(s4,p2,o2), (s4,p2,o5)}, [14, 21)>. On the other hand, if we handle the incoming element from the right stream first, the result with subject s4 and the time interval [14, 21) will be generated first, followed by the result with subject s3 and the time interval [14, 20). As they both have the same start timestamp (which defines the order of the stream), we get the same output stream regardless of which input side we handle first. Finally, notice that the last incoming element from the left stream does not join with the first element on the right side, as the latter expires before the new element arrives.

4.2.2.7 Aggregation

Operator: Aggregation
Input: stream Sig^in, window state ws, current aggregate v, aggregation function agf
Output: stream Sig^out
1 Foreach <K, [ts, te)> arriving from Sig^in
2   removeExpired(ws, ts)
3   te' = smallest end time in ws
4   insert <K, [ts, te)> into ws
5   v = agf(ws)
6   Append <v, [ts, te')> to Sig^out

Listing 4.15: Aggregation operator

Aggregate operators provide and maintain statistics about window states. They are widely used in streaming applications to provide compact summaries of the streams. These statistics include count, sum, max, min, and average (which can be derived from sum and count). The operator uses a user-defined aggregate function (agf) that is applied successively to the current window state, ws. In the most general algorithm (presented in Listing 4.15), the aggregate operator needs to maintain one window (the current window state) and keep all its elements in order to re-evaluate the aggregate function with the updated content of the window. Similar to the join operator, upon each arrival of a new element, an aggregate operator needs to first update its window state by removing expired elements. Then, it inserts the arriving element to its window state, re-evaluates its function on the current window, and constructs a new triple around the new result to be appended,

along with the arriving token, to the output stream. An example aggregate function agf is max (presented in Listing 4.16), which finds the maximum value of a specific variable in the current window. We use the notation K(x) to refer to the attribute of interest 'x' in the arriving token 'K'. For example, if K is {(s1,p1,o1), (s2,p2,20)}, and x is the object of the second triple in K, then K(x) = 20.

Function: max(window state ws)
1 // x is the attribute of K that contains the numerical value of interest
2 v = 0
3 Foreach <K, [ts, te)> ∈ ws
4   if (K(x) > v)
5     v = K(x)
6 return v

Listing 4.16: max aggregate function

While some aggregate functions can be maintained incrementally, others require re-evaluation by scanning the whole window in certain cases. Sum and count can be maintained incrementally as follows: for each insertion, the new element's value is added to the current sum, and the count is incremented by one; for each deletion of an expired element, its value is subtracted from the current sum, and the count is decremented by one (the incremental sum algorithm is presented in Listing 4.17). In this case, we send the new element, the expired elements, and the current aggregate to the aggregate function instead of sending the whole window state. Therefore, the aggregation algorithm in Listing 4.15 needs slight changes: in line 2, a list (ex) should receive the expired elements from the 'removeExpired' function (ex = removeExpired(ws, ts)), and in line 5, the new element K, the expired elements ex, and the current aggregate v are sent to the aggregate function instead of ws (v = agf(K, ex, v)). Max and min, on the other hand, require rescanning the window upon each expiration where the expired element is the current max or min.


Function: sum(new element K, expired elements ex, current aggregate v)
1 // x is the attribute of K that contains the numerical value of interest
2 Foreach <K', [ts', te')> ∈ ex
3   v = v – K'(x)
4 v = v + K(x)
5 return v

Listing 4.17: sum aggregate function (incremental)
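For illustration, the incremental variant can be sketched in Java as below; the method and type names are assumptions for this sketch, not the R4 API.

import java.util.List;

// Minimal sketch of incremental sum maintenance: expired values are subtracted
// and the new value is added, so the window never needs to be rescanned.
public class IncrementalSumSketch {
    static double updateSum(double currentSum, double newValue, List<Double> expiredValues) {
        for (double expired : expiredValues) {
            currentSum -= expired;     // remove contributions of expired elements
        }
        return currentSum + newValue;  // add the contribution of the new element
    }

    public static void main(String[] args) {
        double sum = 0;
        sum = updateSum(sum, 20, List.of());        // window = {20}, sum = 20
        sum = updateSum(sum, 30, List.of());        // window = {20, 30}, sum = 50
        sum = updateSum(sum, 17, List.of(20.0));    // 20 expired; window = {30, 17}
        System.out.println(sum);                    // 47.0
    }
}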

Example. Consider a simple aggregate function that finds the element with the maximum object value (e.g. a temperature or a vehicle speed) in a 10 time instances window. Listing 4.18 (a) represents an example input stream, and Listing 4.18 (b) represents the output stream. Notice that when we receive the last element, the previous maximum value of 30 has already expired, so the algorithm finds the maximum value in the valid window state.

Listing 4.18(a): Example input stream of a 'max' aggregation operator
<{(s1,p1,20)}, [5, 15)>
<{(s3,p1,30)}, [10, 20)>
<{(s4,p1,20)}, [16, 26)>
<{(s5,p1,17)}, [21, 31)>

Listing 4.18(b): Example output stream of a 'max' aggregation operator
<{(s1,p1,20)}, [5, 15)>
<{(s3,p1,30)}, [10, 15)>
<{(s3,p1,30)}, [16, 20)>
<{(s4,p1,20)}, [21, 26)>

4.2.2.8 Map

Operator: Map
Input: stream Sig^in, mapping function mf
Output: stream Sig^out
1 Foreach <K, [ts, te)> arriving from Sig^in
2   Append <mf(K), [ts, te)> to Sig^out

Listing 4.19: Map operator

This operator applies a mapping function to each arriving token and appends the result to the output stream. The map function, mf, applies a user-defined graph template composed of elements from the set (I ∪ B ∪ L ∪ V) to transform an incoming token into a new graph.


It is a more general form of the relational algebra projection operator, as it can generate new attributes that are not part of the incoming element. In our rule-based reasoning context, this operator is used to generate the head of the rule when the token satisfies the whole body of this rule. This way, it is also equivalent to SPARQL’s CONSTRUCT clause. The generated result keeps the same time interval of the original token.

Note that Definition 4 states that tokens can represent complete results. Therefore, the result stream produced by the map operator is represented as an internal stream so that it can re-enter the system while holding time intervals that model its validity.
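A minimal Java sketch of the mapping step is given below. Representing the token's variable bindings as a map and marking variables with '?' are simplifying assumptions made for illustration; they are not the R4 data structures.

import java.util.Map;

// Minimal sketch of the map operator: a head template is instantiated from the
// variable bindings carried by the token.
public class MapOperatorSketch {
    record Triple(String s, String p, String o) {}

    static Triple instantiate(Triple template, Map<String, String> bindings) {
        return new Triple(resolve(template.s(), bindings),
                          resolve(template.p(), bindings),
                          resolve(template.o(), bindings));
    }

    static String resolve(String term, Map<String, String> bindings) {
        // Variables (marked with '?') are replaced by their bound value;
        // constants are copied through unchanged.
        return term.startsWith("?") ? bindings.get(term) : term;
    }

    public static void main(String[] args) {
        // Bindings produced by matching {(?x,?p,?y),(?y,?p,?z)} against
        // {(s1,p1,o1),(o1,p1,o2)}.
        Map<String, String> bindings = Map.of("?x", "s1", "?p", "p1", "?y", "o1", "?z", "o2");
        System.out.println(instantiate(new Triple("?x", "?p", "?z"), bindings));
        // -> Triple[s=s1, p=p1, o=o2]
    }
}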

Example. Listing 4.20 (a) represents an example input stream, and Listing 4.20 (b) represents the output stream of a map operator, that generates the head (?x, ?p, ?z) from the input {(?x, ?p, ?y),(?y, ?p, ?z)} which can be used to imply transitivity.

Listing 4.20(a): Example input stream of a map operator
<{(s1,p1,o1), (o1,p1,o2)}, [5, 10)>
<{(s2,p2,o4), (o4,p2,o3)}, [8, 12)>

Listing 4.20(b): Example output stream of a map operator
<{(s1,p1,o2)}, [5, 10)>
<{(s2,p2,o3)}, [8, 12)>

The time interval and expiration model are based on the temporal operator algebra of Krämer and Seeger (2005). However, their stateful operators' algorithms do not release output results until they expire, to ensure correctness of results, as no future input elements can then modify the output. As this does not meet the real-time requirement of our framework, our operators release results as soon as they are produced. This does not affect the correctness of the results, as our framework currently does not include operators that can invalidate previous results before they expire, such as the difference operator.

4.3 R4: Rule-based Reasoner for RDF streams using Rete

We implemented the above framework of continuous reasoning for RDF streams using the Rete algorithm. The Rete algorithm is used for pattern matching in the rule-based reasoning process. Rules are translated into Rete networks of nodes. The nodes represent different operators that can be shared between rules and the data flows between these nodes. The tree-like network divides the matching process into multiple steps that perform different checks, so if a data element does not match the first node, it is simply discarded and does not complete its way through the network.


In the first stage, a discrimination network partitions the input streams by applying filtering conditions (filter nodes), forming Rete's alpha network. These streams are then fed into the beta network, where join nodes combine elements from their input streams that share the same value of a specified variable. Finally, after all the streams are joined, a terminal node is responsible for generating the new inferred statements, which are modelled to form the output stream. This stream then enters the network again as input in order to infer further results. The same process is applied for user-defined rules and background ontological rules such as the RDFS (Brickley et al., 2014) entailment rules. Figure 4.1 illustrates the Rete network operators and data flow used to match rules 9 and 10 of the RDFS entailment rules.

The original Rete algorithm was not designed to deal with streaming data. In fact, Rete trades memory space for faster processing, as it materialises all intermediate results. In streaming applications, this is not a viable option, as streams can be of unlimited size. Therefore, we extend Rete by applying the concept of sliding windows to the working memories in a way equivalent to Berstel (2002). Each join node maintains a sliding window over each stream input, effectively replacing alpha and beta memories with stream window states.

rdfs9: IF uuu rdfs:subClassOf xxx AND vvv rdf:type uuu THEN vvv rdf:type xxx
rdfs10: IF uuu rdf:type rdfs:Class THEN uuu rdfs:subClassOf uuu

[Figure: the data source feeds two filter nodes, one selecting triples with predicate rdfs:subClassOf (stream S1, window win1) and one selecting triples with predicate rdf:type (stream S2, window win2). S1 and S2 are joined on S1.Subject = S2.Object and passed to the terminal node for rule 9, while S2 is further filtered on Object = rdfs:Class and passed to the terminal node for rule 10.]

Figure 4.1: Rete network example (RDFS rules 9 and 10)

The next subsections present the rule language used to declare rules, give the system architecture, and detail the reasoning process carried out by the Rete networks.

4.3.1 Rule Language

For generic rule-based reasoning, we chose RIF (Kifer and Boley, 2013) to express the rules that are going to be continuously matched against data streams. Besides being a W3C standard that fits nicely with other Semantic Web standards (its compatibility with RDF and OWL is defined in de Bruijn and Welty (2013)), RIF was mainly designed to facilitate

rule exchange among other rule systems; however, we only consider it here as a declarative rule language. R4 supports RIF Core, which corresponds to the language of definite Horn rules without function symbols, equivalent to Datalog (Boley et al., 2013).

As data streams can be of unlimited size, rules cannot be evaluated against whole streams. A common approach to address this problem that our continuous reasoning framework follows is to specify a limited subpart – a window – of the stream. These windows convert an unlimited stream into a finite set, which a rule can be matched against. As these windows can form a part of the rule semantics, users should be able to specify them using the rule language. Therefore, we extend RIF to express these windows in the rule documents in order to be applied in streaming contexts.

Listing 4.21 shows the EBNF grammar for the extended RIF Core syntax. RIF rule sets are organised into Documents, Groups, and Rules. At the document level, the 'Import' directive is used to import data from non-RIF Core documents, such as RDF data or OWL ontologies, where the LOCATOR is a URI indicating the location of the imported document and PROFILE is an optional entailment regime. The import directive can be used to import the static background data that is temporally agnostic. To enable the system to differentiate between this data and streaming temporal data, we add the ImportStream directive, which defines streaming data sources. For each stream, users can specify the window size, i.e. the time validity for each element of this stream. For example, users can state that they are only interested in tweets posted during the last hour by a specific user. The window specification is optional, as some rules that do not involve blocking operators can be processed without time restrictions, e.g. simple filtering.

Specifying the window size at the import directive is generally semantically equivalent to specifying it at the FROM clause in SPARQL extensions such as C-SPARQL. They both extract from the stream the most recent elements that occur during the last number of time units, specified as the size of the window. However, in SPARQL extensions, this is specified for each query, while the stream import directive is defined at the document level, which is expected to contain more than one rule. Therefore, the import window serves as a global window across all rules. As different rules can require different window constraints, local window size can also be optionally set at the formula level to enable more flexibility. This is comparable to a Streaming SPARQL window at the query’s GroupGraphPattern level. Local window values (if specified) override global window values in the respective rule.


Rule Language:
Document ::= IRIMETA? 'Document' '(' Base? Prefix* Import* ImportStream* Group? ')'
Base ::= 'Base' '(' ANGLEBRACKIRI ')'
Prefix ::= 'Prefix' '(' Name ANGLEBRACKIRI ')'
Import ::= IRIMETA? 'Import' '(' LOCATOR PROFILE? ')'
ImportStream ::= IRIMETA? 'ImportStream' '(' LOCATOR PROFILE? ')' Window?
Group ::= IRIMETA? 'Group' Strategy? Priority? '(' (RULE | Group)* ')'
RULE ::= (IRIMETA? 'Forall' Var+ '(' CLAUSE ')') | CLAUSE
CLAUSE ::= Implies | ATOMIC
Implies ::= IRIMETA? (ATOMIC | 'And' '(' ATOMIC* ')') ':-' FORMULA
LOCATOR ::= ANGLEBRACKIRI
PROFILE ::= ANGLEBRACKIRI

Condition Language:
FORMULA ::= (IRIMETA? 'And' '(' FORMULA* ')' |
             IRIMETA? 'Or' '(' FORMULA* ')' |
             IRIMETA? 'Exists' Var+ '(' FORMULA ')' |
             ATOMIC |
             IRIMETA? Equal |
             IRIMETA? Member |
             IRIMETA? 'External' '(' Atom ')') Window? ('On' IRIMETA)?
ATOMIC ::= IRIMETA? (Atom | Frame)
Atom ::= UNITERM
UNITERM ::= Const '(' TERM* ')'
GROUNDUNITERM ::= Const '(' GROUNDTERM* ')'
Equal ::= TERM '=' TERM
Member ::= TERM '#' TERM
Frame ::= TERM '[' (TERM '->' TERM)* ']'
TERM ::= IRIMETA? (Const | Var | List | 'External' '(' Expr ')')
GROUNDTERM ::= IRIMETA? (Const | List | 'External' '(' GROUNDUNITERM ')')
Expr ::= UNITERM
List ::= 'List' '(' GROUNDTERM* ')'
Const ::= '"' UNICODESTRING '"^^' SYMSPACE | CONSTSHORT
Var ::= '?' Name
Name ::= NCName
SYMSPACE ::= ANGLEBRACKIRI | CURIE
Window ::= positiveInteger TimeUnit
TimeUnit ::= 'ms' | 's' | 'm' | 'h'

Annotations:
IRIMETA ::= '(*' IRICONST? (Frame | 'And' '(' Frame* ')')? '*)'

We also added another optional construct at the formula level. The ‘On’ construct specifies the ID of an input for this formula to be applied on. Using this option, different parts of the rules can be applied to different input streams, without having to apply all rules to all input streams.


4.3.2 System architecture

[Figure: the R4 system architecture. Rules are submitted to the rule parser; background knowledge and stream sources feed the dynamic Rete engine, which hosts the Rete networks and produces the output results. The adaptive optimiser and the monitor interact with the Rete engine at runtime.]

Figure 4.2: R4 system architecture

R4 is a native rule-based reasoner for RDF streams. It receives rules from users, evaluates them on RDF streams, and continuously provides results. The white-box system architecture is illustrated in Figure 4.2. The heart of the system is the dynamic Rete engine, where the continuous processing of both RDF streams and RDF static data actually happens. The engine instantiates the data flow network operators and connects them as specified by the optimiser. The network operators start processing the data pushed to the engine by the data sources. The engine keeps processing incoming data, generating streams of results, until it is explicitly requested to stop. While the generated results are pushed instantly as output to the user, they are also fed back to the network to be used as streaming input.

The second main component of R4 is the adaptive optimiser. Its main job is to choose an efficient plan for the current conditions to be used by the Rete engine. Apart from communicating the chosen plan to the rule engine, the optimiser communicates with two more components: the rule parser and the monitor. A rule document containing any number of rules in the extended RIF Core language is submitted to the system. The rule parser component translates the rules into abstract syntax trees and passes them to the optimiser. Then, the optimiser uses some basic heuristics to generate an initial plan and conveys it to the rule engine. As this initial plan might not be optimal, and as stream characteristics (e.g. input rates) can change over time, the monitor continuously collects

performance-related statistics about the running operators and sends them to the optimiser periodically. The optimiser uses the collected statistics to check if any changes need to be made to the running plan and instructs the Rete engine to perform such changes.

The following subsection describes the rule engine's implementation of the continuous reasoning framework in more detail, while the adaptive optimisation part is left for the next chapter.

4.3.3 Data Processing using Rete

The Rete-based dataflow networks of R4 consist of nodes that fall into four main categories: source nodes, alpha nodes, beta nodes, and terminal nodes. Each type carries out specific tasks, effectively implementing the operators detailed in Section 4.2.2.

Reading input: In R4, data enters the network through the source nodes. For each import in the RIF file, the rule engine instantiates a source node that is responsible for acquiring the data input from the imported document. As inputs can be static RDF data or streaming data, source nodes have two types: static sources and streaming sources. For static imports, source nodes pull the data (RDF statements) from the specified documents and push them to the nodes in their respective network. Therefore, static data populate the network before dealing with streaming data. For streaming imports, source nodes implement the graph-to-triples operator, converting streams of graphs into streams of triples. Furthermore, if the streaming import statement declares a window size (global window) for this stream, the corresponding source node also implements the window operator, annotating the incoming statements with time intervals. The start time is set to the system time of arrival if the parent graph was not already annotated at its origin; otherwise, the annotation of the graph is passed on to the resulting triples. In both cases, the end time is calculated as the sum of the start time and the window size. Source nodes propagate the annotated triples to their successor nodes, which are the alpha nodes.

Alpha network: Alpha nodes are single-entry nodes that form a discrimination network. In R4, an alpha node can receive RDF triples (streams or static) from any number of nodes through its single input. Each received element is matched against some conditions and is either dropped if there is no match or propagated downstream to its successor node(s) in the event of a match. For triple pattern conditions that contain, for example, two constants (e.g. (?x, p, o)), it is possible to either create two successive alpha nodes – one for each constant – or create one alpha node that checks both constraints. While the first option

increases the possibilities of sharing, we opted for the second option to get fewer nodes and, therefore, less traffic. The optimiser creates one alpha node for each triple pattern in the set of rules (triple patterns are represented as frames in RIF). Alpha nodes, therefore, implement the filter operator defined in Section 4.2.2.4. Alpha nodes treat static RDF triples and RDF streams equally, as they are not concerned with the time element. Each alpha node is followed by an alpha memory. The first alpha memory in the network is followed by a special node called the left input adaptor (described below), while the other alpha memories are followed by beta nodes.

Beta network: The left input adaptor node is responsible for preparing the incoming triples to enter the beta network from the left input of the first beta node in a given join-tree. It creates a new token that represents a partial match (defined as a list of triples) for each rule body for which the incoming triple is a sub-graph match, then adds this triple as the first item in each list. It also annotates each token with the same time interval as the triple for which it was generated. Tokens are then sent to the first node in the beta network.

Beta nodes are two-input nodes that are responsible for joining the branches of the alpha network. In our implementation, as in the original Rete algorithm, beta nodes form a left-deep tree. Each beta node maintains a left memory, which is a beta memory storing tokens received from other beta nodes (or from the left input adaptor node in the case of the first beta node), and a right memory, which is an alpha memory storing triples received from alpha nodes. Join nodes are beta nodes that implement the join operator (defined in Section 4.2.2.6). They match inputs from both sides according to some conditions, e.g. a shared variable binding. As explained earlier, we use window-join operators to avoid storing and operating over all partial results. In this context, each left (beta) and right (alpha) memory is implemented as a valid window state.

When a join node is left-activated (i.e. it receives a token through its left input), it first adds the new token to its left window, then prepares its right window by removing any expired items (items with an end time that is earlier than the start time of the incoming token) so that no outdated results are matched. The right window is then searched according to the join node conditions. When a match is found, a new token is created by duplicating the left token and adding the right triple to the new token's triple list. The new token is annotated with the latest start time and earliest end time of the left token and right triple time annotations (the intersection of their time intervals) and then propagated to the next beta node, or to the terminal node if it is the root node of a join tree. Conversely, when the join node is right-activated by receiving a triple from an alpha node, it adds the new triple

to the right window, prepares the left window, then matches the triple against the remaining tokens. Join nodes that join a streaming input with a static input can only be activated from the streaming side, and the join result is annotated with the same time interval of the arriving stream element.

Generating results: Finally, terminal nodes receive tokens from the root nodes of join-trees and are responsible for producing entailed graphs. The optimiser creates one terminal node for each rule, so a terminal node is an implementation of the map operator (defined in Section 4.2.2.8) using the rule head as the map function. The annotation of the entailed graphs is directly inherited from the completed tokens from which they are produced. The union of the output of all terminal nodes, which is itself the output of the system, is re-entered as an input stream to the source nodes to support iterative inference.

Memories implementation: The data structures used to implement the window states need to support fast update (insertion and removal) and efficient search. Implementing the window states as priority queues, in which tokens or triples are ordered according to their end time annotations, ensures the efficient removal of expired elements, as the algorithm does not have to traverse the whole data structure searching for expired elements. However, this can perform badly with regards to probing, as the size of these memories is expected to be big and the match operation is repeated excessively with fast input rates. Therefore, we implemented each window state using a priority queue and a hash map in order to simultaneously improve the speed of updating (using the priority queue) and probing (using the hash map). Static memories, however, only need hash maps.

Each memory state supports three operations: insert, prune, and probe. Insert and prune work on both structures, while probe only uses the hash map. Listing 4.22 presents the algorithms of these operations for alpha memories. If the rule document only specifies a global window, then the insert operation simply inserts the arriving triple into the hash map and the priority queue. However, if the alpha memory corresponds to a triple pattern whose enclosing formula specifies a local window, the insert operation overrides the end time of the triple to reflect the local window size.


Structure: Alpha memory
H is the memory's hash map; PQ is the priority queue; w is the local window size

Procedure: Insert (Triple e)
1 if e is associated with a timestamp
2   add e to PQ
3   if w ≠ null
4     e.te = e.ts + w
5 add e to H

Procedure: Probe (Token k)
1 return H.get(k.joinAttribute)

Procedure: Prune (int ts)
1 e = head of PQ
2 while e.te < ts
3   remove e from PQ and H
4   e = head of PQ

Listing 4.22: Alpha memory operations
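The following Java sketch illustrates such a dual-structure window state. Keying the hash map on a single join attribute and the class and method names are simplifying assumptions made for illustration; they do not reproduce the R4 implementation.

import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Minimal sketch of a window state backed by both a hash map (fast probing on
// the join attribute) and a priority queue ordered by end time (fast pruning).
public class WindowStateSketch {
    record Entry(String joinKey, String payload, long ts, long te) {}

    private final Map<String, List<Entry>> byKey = new HashMap<>();
    private final PriorityQueue<Entry> byExpiry =
            new PriorityQueue<>(Comparator.comparingLong(Entry::te));

    void insert(Entry e) {
        byExpiry.add(e);
        byKey.computeIfAbsent(e.joinKey(), k -> new ArrayList<>()).add(e);
    }

    List<Entry> probe(String joinKey) {
        return byKey.getOrDefault(joinKey, List.of());
    }

    void prune(long now) {
        // Only the head of the priority queue is inspected, so pruning stops as
        // soon as the earliest-expiring element is still valid.
        while (!byExpiry.isEmpty() && byExpiry.peek().te() < now) {
            Entry expired = byExpiry.poll();
            List<Entry> bucket = byKey.get(expired.joinKey());
            bucket.remove(expired);
            if (bucket.isEmpty()) byKey.remove(expired.joinKey());
        }
    }

    public static void main(String[] args) {
        WindowStateSketch ws = new WindowStateSketch();
        ws.insert(new Entry("s1", "(s1,p2,o1)", 5, 15));
        ws.insert(new Entry("s2", "(s2,p2,o1)", 6, 16));
        ws.prune(16);                        // removes the element expiring at 15
        System.out.println(ws.probe("s2"));  // still present
        System.out.println(ws.probe("s1"));  // []
    }
}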

Sharing memories: The Rete algorithm allows sharing nodes and their memories between different rules to save processing and memory costs. Nodes that have the same conditions and ancestors can be shared instead of duplicating. For example, if the triple pattern (?x, p, ?y), where p is a specific property, appears in two rules, the same alpha node that checks for this pattern and the resulting alpha memory can be shared between the two rules. However, a situation to be considered is if one of the identical triple patterns appears in a formula that specifies a local window, while the other one does not (i.e. it uses the global window), or if both of them appear in formulas with local windows of different sizes. In this case, the alpha memory adopts the bigger window so that no possible results are missed. As this can mean generating false positives for the formula with the smaller window size, the following join node corresponding to the smaller window formula needs to check for an overlap in the time intervals of its two inputs according to its own window definition during the probing process.

While the model allows local windows of different sizes to appear in the same rule, we notice that the join order can then affect the number of results. To illustrate with an example, consider the following rule that has two joins, each specifying its own local window size: join events a and b (a ⋈ b) occurring in a 5 minute window, and join events b and c (b ⋈ c) occurring in a 10 minute window.

First, elements from stream a will pass a window operator, which assigns an end timestamp by adding 5 to the start time (assuming the time unit is minutes). As the alpha memory of stream b is shared between two windows, it will take the bigger one, adding 10 to the start time. Elements from stream c will be assigned end timestamps by adding 10 to the start timestamp.

Consider these inputs: a = <a1, 1>, b = <b1, 2>, c = <c1, 5>, <c2, 8>. We should expect to get two answers from these inputs: the first combines a1, b1, and c1, and the second combines a1, b1, and c2.

By applying the window operators to the inputs, we get: a = <a1, [1, 6)>, b = <b1, [2, 12)>, c = <c1, [5, 15)>, <c2, [8, 18)>. To join all streams in a left-deep plan, we have two options: (a ⋈ b) ⋈ c, or (b ⋈ c) ⋈ a.

In the first case: the first join (a⋈ b) generates a single result: <(a1,b1),[2,6)> and passes it to the second join, then the second join ((a⋈ b) ⋈c) re-assigns the expiration time of this element converting it to <(a1,b1),[2,12)> and matches it against the event stream c generating the results: <(a1,b1,c1),[5,12)> and <(a1,b1,c2),[8,12)>.

In the second case, the first join (b ⋈ c) generates two results, <(b1,c1),[5,12)> and <(b1,c2),[8,12)>, and passes them to the second join; the second join ((b ⋈ c) ⋈ a) re-assigns the expiration time of these elements, converting them to <(b1,c1),[5,10)> and <(b1,c2),[8,13)>, and matches them against the event stream a, generating the result <(a1,b1,c1),[5,6)>. We notice that this join order misses the second result, which combines a1, b1, and c2. While the first plan was able to generate this result by re-assigning the expiration time according to its local window, the second plan missed it. Therefore, we note that when the windows are of the same size, the order of joins does not affect the result. On the other hand, when the windows are of different sizes, the order of joins can affect the result, and hence reasoning may be incomplete.

Garbage collection: The join node algorithm ensures that data elements in one of the input memories that cannot be used to produce results are deleted by checking the expiration time every time a new element arrives at the other input. This works efficiently for join nodes that join two streams with similar input rates. However, if one of the input streams is very slow and the second is fast, the fast stream’s memory can grow big before an element

arrives from the slow stream to trigger the garbage collection process. While this has no functional effect on the behaviour of the join, it does increase the resource consumption of this join. In this case, the join node should periodically check its input stream memories for expired data. Another extreme case is when a join node joins a static memory with a stream; this join is only activated from the streaming side, so garbage collection on the streaming memory is never triggered. For these joins, instead of invalidating expired elements from the opposite input memory, this is performed on the input memory of the same side, as the opposite memory is static and its elements are persistent. We note that this technique is not needed in the general case (where both streams have comparable arrival rates), as pruning the other input (which is always required in order to ensure correctness of results) is sufficient.
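One possible way to realise this periodic check is a scheduled background task, as in the Java sketch below; the WindowState interface and its prune method are assumptions made for illustration rather than the R4 API.

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// One possible way to trigger garbage collection on a rarely-activated input
// memory: a periodic task prunes expired elements against the current time.
public class PeriodicPruneSketch {
    interface WindowState { void prune(long now); }   // assumed interface

    static ScheduledExecutorService schedulePrune(WindowState state, long periodMillis) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(
                () -> state.prune(System.currentTimeMillis()),
                periodMillis, periodMillis, TimeUnit.MILLISECONDS);
        return scheduler;
    }

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService s = schedulePrune(
                now -> System.out.println("pruning expired elements before " + now), 1000);
        Thread.sleep(2500);   // let the task run a couple of times, then stop
        s.shutdown();
    }
}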

Ontology reasoning: In the RIF import statements, users can choose the entailment regime for background reasoning. If one is specified, the system builds a separate Rete network that implements the entailment rules of the specified regime. These networks feed their output entailments directly to the main Rete network that implements the user-defined rules. RDFS ontologies are supported by building a network that implements the 13 RDFS entailment rules. Appendix A shows the RDFS++ rules written in RIF and the corresponding Rete network. RDFS++ rules go beyond the basic RDFS constructs by supporting OWL inverseOf, sameAs, and TransitiveProperty (Allemang and Hendler, 2008). To support OWL ontologies, the RIF document 'OWL 2 RL in RIF' (Reynolds, 2013) describes two approaches to reasoning over the rule-based dialect of OWL 2 using RIF. The first is a direct translation of the OWL 2 RL rules to RIF Core. While this approach is straightforward and easy to implement, it has the disadvantage of creating a large network for the full list of rules, parts of which might never be used (when the corresponding rule's constructs do not appear in the ontology). The second approach avoids this problem by translating the source OWL 2 RL ontology into a specialised RIF Core rule set. An algorithm that handles the instantiation of the RIF rule set for a particular ontology is described in the 'OWL 2 RL in RIF' document. As this algorithm only uses TBox axioms (static knowledge), it can be implemented as-is in our stream reasoner.

4.4 Conclusion

This chapter has presented our main contribution: a continuous reasoning approach, and the implemented stream reasoning engine R4: a rule-based reasoner for RDF streams using Rete. R4 provides continuous inferencing capabilities natively over RDF streams for

generic rules expressed in our extension of RIF, which represents a minor contribution of this thesis. We used Java as a programming language to implement a prototype system of R4. The implemented prototype can successfully read RDF streams, use Rete networks to reason over them, and continuously produce results. In the next chapter, we provide a concrete scenario with a number of use cases, along with a comparative evaluation of the implemented reasoner.



Chapter 5: Evaluating R4

Chapter 4 described R4, a continuous rule-based reasoner for semantic streams. This chapter tests and evaluates the performance of the implemented prototype. Section 5.1 presents an evaluation scenario – describing the input datasets used in the evaluation process, along with a number of rules with different complexities – to test the system functionality. Then, Section 5.2 presents a comparative evaluation; we conduct several experiments to evaluate the system performance by comparing it to other reasoners. Firstly, in Section 5.2.1, we compare R4 performance to a static reasoner to show the advantage of the continuous reasoning approach. Then, in Section 5.2.2, we compare R4 to other state-of-the-art systems that are designed to reason over semantic streams.

5.1 Evaluation scenario

We obtained semantic streams from the SemsorGrid4Env11 project. The main objective of this project is to design, implement, and deploy a service-oriented architecture and middleware that allows application developers to build open, large-scale, semantic-based sensor network applications for environmental management (Gray et al., 2009). It applies Semantic Web techniques to real-world, real-time data coming from heterogeneous sensor networks, so that developers can use these sensors for environmental management purposes beyond those for which they were originally deployed.

5.1.1 Datasets

Meteorological and oceanographic data generated by the Channel Coast Observatory12 (CCO) sensor network are made available in RDF format by SemsorGrid4Env's CCO API (Frazer et al., 2011). Every half hour, the CCO API publishes new semantic sensor observations obtained from 24 sensors deployed around the English Channel. These observations cover several properties, e.g. wave height, wave direction, and wind speed. The API makes these available as collections of observations (i.e. sets of observations sharing some characteristics) (Garcia-Castro et al., 2012) for a given half-hour period. A published collection could comprise only observations, but could also be a collection that

11 http://www.semsorgrid4env.eu/ 12 http://www.channelcoast.org/

aggregates other collections of observations. This data is published on the Web as linked data13 and stored in a triple store with a SPARQL endpoint14.

Figure 5.1: The SSN ontology, key concepts and relations, split by conceptual modules (Compton et al., 2012)

The project reuses a number of ontologies to model the data. However, it mainly relies on the Semantic Sensor Networks (SSN) ontology (Compton et al., 2012). The SSN ontology is organised into ten conceptual modules of related concepts, as shown in Figure 5.1. The core module is the Sensor-Stimulus-Observation (SSO) pattern, which links sensors, what they sense, and the resulting observations. Other modules are used to, for example, represent the deployment of sensors and their platforms, the measuring capabilities of sensors, and the survival conditions of specific environments. The SSN ontology uses the DOLCE+DnS Ultra Lite (DUL)15 foundational ontology to facilitate interoperability. The SemSorGrid4Env project also uses an extension of the SSN ontology to model some aspects that are not covered by the SSN ontology, such as observation collections and measurement properties (Garcia-Castro et al., 2012), the SWEET ontology (Raskin and Pan, 2005) to describe the services provided by the infrastructure, and the Coastal Defence ontology (Garcia-Castro et al., 2012), which represents features of interest and their properties that are specific to the flood emergency planning use case.

13 e.g., http://rdf.api.channelcoast.org/observations/cco/boscombe/Dirp/latest contains the latest wave direction observations by the Boscombe sensor. 14 Available at http://env.ecs.soton.ac.uk:8000/ 15 http://www.ontologydesignpatterns.org/ont/dul/DUL.owl


@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix w3time: <http://www.w3.org/2006/time#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ssn: <http://purl.oclc.org/NET/ssnx/ssn#> .
@prefix ssnExt: <...> .
@prefix coastD: <...> .
@prefix waves: <...> .

<...observation URI...> rdf:type ssn:Observation ;
    ssn:observedBy <...sensor URI...> ;
    ssn:observedProperty waves:Mean_Wave_Direction ;
    ssn:featureOfInterest coastD:PhysicalMetOcean ;
    ssn:observationResultTime _:time_1 ;
    ssn:observationResult _:res_1 .
_:time_1 rdf:type w3time:Interval ;
    w3time:hasBeginning "2013-03-26T00:00:00"^^xsd:dateTime ;
    w3time:hasEnd "2013-03-26T00:30:00"^^xsd:dateTime .
_:res_1 rdf:type ssn:SensorOutput ;
    ssn:hasValue _:val_1 .
_:val_1 rdf:type ssn:ObservationValue ;
    ssnExt:hasQuantityUnitOfMeasure coastD:degree ;
    ssnExt:hasQuantityValue "153.300"^^xsd:double .

Listing 5.1: A CCO observation represented in RDF Turtle notation

An example observation is shown in Listing 5.1. Each observation contains the URI of the sensor that made this measurement, the URI of the property it measures, the URI of the feature that this property is observed of, the time interval across which the observation was made, and finally, the measurement.

Table 5.1: Input datasets

           Date                                   No. of triples   No. of observations
Dataset1   2013-03-26T00:00 / 2013-03-26T00:30    2,366            169
Dataset2   2013-03-26T00:00 / 2013-03-26T03:00    14,098           1,007
Dataset3   2013-03-26T00:00 / 2013-03-27T00:00    110,572          7,898
Dataset4   2013-03-26T00:00 / 2013-03-31T00:00    462,560          33,040
Dataset5   2013-03-26T00:00 / 2013-04-06T00:00    1,121,974        80,141

For many of our experiments, we need to stress the system. The CCO update rate of nearly 2500 triples per half hour (seven observations from each of the 24 sensors) is not sufficient. Therefore, we have performed a number of SPARQL Construct queries over the CCO triple store and stored the results in different files. Table 5.1 shows the statistics of these datasets. One of them – dataset1 – simply contains all observations received over one half hour. Another one – dataset3 – contains observations of all sensors observed over a whole day, comprising approximately 100k triples. The last one contains approximately 1 million triples and represents observations over a period of 11 days.

5.1.2 Functionality Tests

The main objective of these tests is to show that the system operators work accurately and produce the expected results for a range of typical use cases in weather sensor networks. We have defined a number of rules that range from simple pattern matching to more complex ones that require particular functionalities, such as aggregation, dealing with static and dynamic input, and ontology reasoning. For each of them, we describe the rule and motivation behind it, its RIF syntax, a simplified Rete diagram of it, and a sample of the produced output.

A source node at the bottom of the networks gets real-time input streams via HTTP from CCO sensor URIs16. For simplicity, we only run the following use cases using input from five sensors, as we aim to assess the functionality of the system here, not its performance. The source node reads these URIs every half hour to get new observations and sends them to the following nodes in the network.

Note that in the rules described in the following sections, the namespace prefixes are those presented in Listing 5.1.

5.1.2.1 Basic Pattern Matching

Alert when a wave height gets above a certain level.

Motivation: This is a basic but useful rule for monitoring applications. Detecting high waves can be crucial for decision making by coastal flood managers. This rule tests the system's ability to match RDF triple patterns and join their bindings to produce the expected output. As the rule syntax below does not specify a local window size, the system uses the global window specified at the import section of the rule document. If that is also not specified, the system uses its default value.

16 e.g., http://rdf.api.channelcoast.org/observations/cco/boscombe/Hs/latest


Rule:

Prefix( pred <http://www.w3.org/2007/rif-builtin-predicate#> )
Forall ?ob ?v (
  If And( ?ob [rdf:type -> ssn:Observation]
          ?ob [ssn:observedProperty -> waves:Wind_Wave_Height]
          ?ob [ssn:observationResult -> ?result]
          ?result [ssn:hasValue -> ?value]
          ?value [ssnExt:hasQuantityValue -> ?v]
          External (pred:numeric-greater-than(?v 4.00)) )
  Then ?ob [ssnExt2:alert -> ?v] )

Rete network:

Figure 5.2: Rule 1 network (alpha nodes σP=observedProperty, σP=observationResult, σP=hasValue, and σP=hasQuantityValue with constraints σO=Wind_Wave_Height and σ?v>4.00, joined through ⋈?ob, ⋈?result, and ⋈?value)

Example results:

http://id.api.channelcoast.org/observations/cco/perranporth/Hs/20130622#220000 ssnExt2:alert 4.05
http://id.api.channelcoast.org/observations/cco/perranporth/Hs/20130622#230000 ssnExt2:alert 4.29
http://id.api.channelcoast.org/observations/cco/perranporth/Hs/20130622#233000 ssnExt2:alert 4.38

5.1.2.2 Aggregation

Find the half daily average of wave heights for a specific sensor.

Motivation: Aggregates are very important figures, especially in streaming applications. In this use case, we test the system’s ability to calculate averages over a specified time window for a specific sensor.


Rule:

Prefix( func )
Forall ?ob ?v ?avg (
  If (And( ?ob [ssn:observedProperty -> waves:Wind_Wave_Height]
           ?ob [ssn:observedBy -> http://id.api.channelcoast.org/sensors/cco/minehead]
           ?ob [ssn:observationResult -> ?result]
           ?result [ssn:hasValue -> ?value]
           ?value [ssnExt:hasQuantityValue -> ?v]
           ?avg = External (func:numeric-avg(?v)) ) 12 h
  Then ssnExt2:Average [ssnExt:hasQuantityValue -> ?avg])

Rete network:

Figure 5.3: Rule 2 network (alpha nodes σP=observedProperty, σP=observedBy, σP=observationResult, σP=hasValue, and σP=hasQuantityValue with constraints σO=Wind_Wave_Height and σO=minehead, joined through ⋈?ob, ⋈?result, and ⋈?value, topped by an Avg?v aggregation node)

Example results:

ssnExt2:Average ssnExt:hasQuantityValue 0.8200000000000002

Wave height average: 0.7545833333333336

5.1.2.3 Static and Dynamic

Find the location of sensors that observe high waves.

Motivation: This use case is very similar to the first one. However, it also shows the coordinates of the sensors that observed the output. These coordinates can then be further used by other rules to locate nearby places of interest. As the locations of sensors are

typically static, this rule gets input from a static file containing the locations of all sensors and receives dynamic observation updates like the previous rules.

Rule:

Prefix (sw <http://sweet.jpl.nasa.gov/2.1/sweetAll.owl#>)
Forall ?ob ?v ?co1 ?co2 (
  If And( ?ob [ssn:observedProperty -> waves:Wind_Wave_Height]
          ?ob [ssn:observedBy -> ?s]
          ?ob [ssn:observationResult -> ?result]
          ?result [ssn:hasValue -> ?value]
          ?value [ssnExt:hasQuantityValue -> ?v]
          External (pred:numeric-greater-than(?v 4.00))
          ?s [ssn:hasDeployment -> ?dep]
          ?dep [ssn:deployedOnPlatform -> ?plat]
          ?plat [sw:hasLocation -> ?loc]
          ?loc [sw:coordinate1 -> ?co1]
          ?loc [sw:coordinate2 -> ?co2])
  Then And( ?ob [ssnExt2:alert -> ?v]
            ?ob [ssnExt2:atCoordinate1 -> ?co1]
            ?ob [ssnExt2:atCoordinate2 -> ?co2]))

Rete network:

Figure 5.4: Rule 3 network (the output of rule 1 is joined with the static sensor-location data through ⋈?s, ⋈?dep, ⋈?plat, ⋈?co1, and ⋈?co2)


Example results:

http://id.api.channelcoast.org/observations/cco/perranporth/Hs/20130622#220000 ssnExt2:alert 4.05
http://id.api.channelcoast.org/observations/cco/perranporth/Hs/20130622#220000 ssnExt2:atCoordinate1 50.35
http://id.api.channelcoast.org/observations/cco/perranporth/Hs/20130622#220000 ssnExt2:atCoordinate2 -5.18
http://id.api.channelcoast.org/observations/cco/perranporth/Hs/20130622#230000 ssnExt2:alert 4.29
http://id.api.channelcoast.org/observations/cco/perranporth/Hs/20130622#230000 ssnExt2:atCoordinate1 50.35
http://id.api.channelcoast.org/observations/cco/perranporth/Hs/20130622#230000 ssnExt2:atCoordinate2 -5.18

5.1.2.4 Background Reasoning

Find the sensors that observe wind speeds higher than a known hurricane (inspired by a query in Zhang et al. (2012)).

Motivation: Extreme weather conditions can be detected by comparing observed values to historical data. This use case, like the previous one, involves dealing with static and dynamic data but also needs to perform background ontology reasoning. For the historical hurricane data, we prepared a small dataset taken from DBpedia, containing information about a number of hurricanes. In DBpedia, the class yago:Hurricane111467018 is used to describe a hurricane. However, this class has only two instances linked directly to it, while it has 88 subclasses that are used by other hurricane instances. For example, dbpedia:Hurricane_Fabian is of type yago:HurricanesInBermuda, which is one of the yago:Hurricane111467018 subclasses. To be able to find information about all possible hurricane instances, we first need to perform background ontology reasoning. When the background reasoning mode is activated, rule 9 of the RDFS entailment rules asserts these hurricanes as instances of the hurricane superclass. These new assertions are inserted back as input so that the RIF rule can find them.

Rule:

Prefix (yago )
Prefix (dbpprop )
Forall ?ob ?s ?hur (
  If And( ?ob [ssn:observedProperty -> waves:wind_speed]
          ?ob [ssn:observedBy -> ?s]
          ?ob [ssn:observationResult -> ?result]
          ?result [ssn:hasValue -> ?value]
          ?value [ssnExt:hasQuantityValue -> ?v]
          ?hur [rdf:type -> yago:Hurricane111467018]
          ?hur [dbpprop:1MinWinds -> ?hurWind]
          External (pred:numeric-greater-than(?v ?hurWind)))
  Then ?ob [ssnExt2:alert -> ?s])


Rete network:

Figure 5.5: Rule 4 network (alpha nodes σP=observedProperty, σP=observedBy, σP=observationResult, σP=hasValue, σP=hasQuantityValue, σP=rdf:type, and σP=1MinWinds with constraints σO=wind_speed and σO=Hurricane, joined through ⋈?ob, ⋈?result, ⋈?value, ⋈?hur, and the comparison join ⋈(?v>?hurWind))

5.2 Comparative Evaluation

To evaluate the efficiency of our native semantic stream reasoning approach, we compare R4 to other semantic reasoners. First, we compare it with Jena, a static semantic reasoner, in terms of the throughput and processing time needed to process different datasets. Second, we compare R4 to Etalis and Sparkwave – stream processing engines that support background reasoning. All experiments were performed on an Intel Core i5 computer running at 3.2 GHz with 8 GB memory. Each experiment was performed five times, and the results presented in this chapter represent the average. The complete results of selected experiments can be found in Appendix B.

5.2.1 Comparing Stream Reasoning to Static Reasoning

In this experiment, we compare R4 to the Jena generic rule-based reasoner. Jena is an open source Semantic Web framework for Java. Its API enables reading, processing, and writing RDF graphs as ‘Model’ Java objects. It was chosen because of its rich Java library, which made it easy to check and compare results, control input rates, etc.

Jena provides a number of inference engines, including configurable RDFS, OWL reasoners, and a generic rule-based reasoner that supports user-defined rules. The generic reasoner’s default configuration works in a ‘hybrid’ mode, in which it uses a forward

chaining Rete engine and a backward chaining engine in conjunction. It can also be configured to run the forward chaining engine only.

5.2.1.1 Methodology

As one of the most important requirements for any streaming application is to provide timely responses, we compare Jena’s generic rule-based reasoner and R4 in terms of the processing time needed to reason over the input data. Measuring processing time for Jena is straightforward, as it starts and ends processing at specific points in time, and all results are provided at the end once the engine has finished processing all the input. However, R4 performs continuous reasoning, where results are produced incrementally as soon as they are obtained, and the rule has no explicit end time. Therefore, we take the later of two measures: the time when the last output result is produced and the time when the last input triple is processed.

We used a similar rule to the one in 5.1.2.1 but with background RDFS reasoning applied. We prepared a simple synthetic schema of five observation classes (Observation1 to Observation5), where each class is a subclass of the next, and assigned the class Observation as a subclass of the first one. The rule then asked for observations of type Observation1 with high waves.

5.2.1.2 Experiment 1

The naïve way to compare R4 and Jena is to push the whole input dataset to each engine, apply the same rule, and compare the processing times. This way, Jena receives the whole dataset at once and performs static reasoning one time. On the other hand, R4 receives the input data as a finite stream, observes it through a time window, and incrementally performs reasoning. We conducted this experiment using the five datasets described in Table 5.1 and chose 100 milliseconds as R4’s default window size.

Nevertheless, this experiment is unfair for both Jena and R4. It is unfair for Jena, as R4 does not work on the whole dataset at any time instance, while Jena has to consider every triple. On the other hand, it is unfair for R4, as Jena does not perform any retractions, while R4 continuously checks and removes outdated elements. Therefore, we conducted another experiment in which we tried to mimic a streaming situation.


5.2.1.3 Experiment 2

In this second experiment, the input file was partitioned into chunks of a specific size, forming what we call ‘updates’. We periodically sent one update to Jena and R4 until the last chunk. To help Jena remove expired elements, we saved each update in a separate model and forced the reasoner to remove the elements of this model when we considered them expired (as per the time window specified for R4). For example, if we send an update every second and the time window is five seconds, then at the sixth iteration we insert the sixth update and remove the first update from the inference model, and so on. After each update, we measured the delay (or response time), which is the time taken by each engine to process the whole update (including insertion and removal for Jena).
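To make this setup concrete, the following sketch illustrates how such a sliding-window harness around Jena’s generic rule reasoner could look. It is a minimal illustration written against the current Apache Jena packages, not the exact harness used in our experiments; the rule file and update file names are placeholders.

import org.apache.jena.rdf.model.InfModel;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.reasoner.rulesys.GenericRuleReasoner;
import org.apache.jena.reasoner.rulesys.Rule;
import org.apache.jena.riot.RDFDataMgr;

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

public class JenaWindowHarness {

    public static void main(String[] args) {
        // Placeholder file names: one rule file and one RDF file per update chunk.
        List<Rule> rules = Rule.rulesFromURL("file:rules.rules");
        GenericRuleReasoner reasoner = new GenericRuleReasoner(rules);
        reasoner.setMode(GenericRuleReasoner.FORWARD_RETE);   // or HYBRID for 'default Jena'

        InfModel inf = ModelFactory.createInfModel(reasoner, ModelFactory.createDefaultModel());

        int windowUpdates = 10;                   // e.g. a 5-hour window fed half-hourly
        Deque<Model> window = new ArrayDeque<>(); // one Jena Model per update, oldest first

        for (int i = 1; i <= 48; i++) {
            Model update = RDFDataMgr.loadModel("update-" + i + ".ttl");

            long start = System.nanoTime();
            inf.add(update);                      // insert the new update
            window.addLast(update);
            if (window.size() > windowUpdates) {
                inf.remove(window.removeFirst()); // retract the expired update
            }
            inf.size();                           // any read forces the engine to bring entailments up to date
            long millis = (System.nanoTime() - start) / 1_000_000;

            System.out.println("update " + i + ": " + millis + " ms");
        }
    }
}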

We ran this experiment using different update and window sizes. We used dataset 3 from Table 5.1, which contains observations from all sensors over a whole day as the input dataset. We partitioned it into 48 sets; each contains observations of half an hour (2,300 triples). We also partitioned it into 24 sets with observations of a whole hour for each (4,600 triples). Different window sizes are used to reflect different ratios of change. We also used dataset 4 from Table 5.1, partitioned into ten sets. Table 5.2 describes seven settings for the second experiment.

Table 5.2: All different settings in experiment 2

No.   Dataset size                 Update size                  Window size (hours)   Change percentage
1     110,572 triples (24 hours)   2,300 triples (half-hour)    10                    5%
2     110,572 triples (24 hours)   2,300 triples (half-hour)    5                     10%
3     110,572 triples (24 hours)   4,600 triples (1 hour)       10                    10%
4     110,572 triples (24 hours)   4,600 triples (1 hour)       5                     20%
5     110,572 triples (24 hours)   4,600 triples (1 hour)       2                     50%
6     110,572 triples (24 hours)   4,600 triples (1 hour)       1                     100%
7     462,560 triples (5 days)     50,000 triples (~11 hours)   44                    25%

5.2.1.4 Comparative results

We performed the first experiment for Jena with the default configuration (default Jena), Jena with the forward Rete setting (Rete Jena), and R4. The average processing times of reasoning over the different datasets are presented in Table 5.3 and illustrated in Figure 5.6. We also derived the input throughput of each system by dividing the dataset size by the processing time, as depicted in Figure 5.7. R4 outperforms the default Jena in all cases and

outperforms Rete Jena in most of them. The figure shows that, while they perform comparably with small datasets, the difference becomes clear with big datasets. Jena spends considerable time building in-memory graphs and performing full reasoning over them, while R4 continuously matches patterns on the fly.

Table 5.3: Processing time results for experiment 1 (pushing the whole dataset)

                  Processing time (in seconds)
Input dataset     Default Jena   Rete Jena   R4
Dataset1          0.29           0.24        0.13
Dataset2          0.49           0.42        0.37
Dataset3          1.73           1.44        1.68
Dataset4          5.66           4.79        3.98
Dataset5          14.43          12.85       8.03

We note that R4 would be expected to produce fewer or incomplete results compared to Jena, as it does not consider the full dataset at any time instance. However, this was not the case, due to the nature of the applied rule. The rule does not join triples from different graphs (observations) that may potentially arrive in a time interval that is bigger than the time window. All joins in the rule are intra-graph, and as the graph size is small (14 triples) and the input throughput is high, the graph always fits in the same window.

Figure 5.6: Processing time (in seconds) against dataset size (thousand triples) for experiment 1, for default Jena, Rete Jena, and R4 (see Table 5.3)


Figure 5.7: Input throughput (triples/second) against dataset size (thousand triples) for experiment 1, for default Jena, Rete Jena, and R4

We also performed the second experiment for the default Jena, the forward chaining Rete Jena, and R4 using the different update and window sizes described in Table 5.2.

For the first setting, 2,300 triples are fed every half hour; Figure 5.8 shows the response time of each system when the window size is set to ten hours17. We notice that both Jena settings perform similarly during the first ten hours, or 20 updates, because there is only insertion at this stage. At the 21st update, the windows become full, and the systems have to start removing expired elements. Here, the cost of the default Jena suddenly increases, while Rete Jena and R4 maintain steady response rates. This is expected, as the default Jena re-performs the reasoning process for the whole data in the window for every removal. The Rete-based Jena and R4, on the other hand, update their entailments incrementally. R4 still outperforms Rete Jena, as its internal data structures are optimised for the time-based removal of expired data (unlike Jena, which has to search for these data in order to remove them).

In the above experiment, each update only changes 5% of the valid data, which explains the huge difference between the incremental systems (R4 and Rete Jena) and the default Jena, which re-computes entailments of the valid data from scratch. We changed the window size to five hours, so each update now changes 10% of the valid data. The results of this setting are depicted in Figure 5.9. It shows a similar pattern, but the default Jena's results are closer to those of the other engines.

17 To avoid the long running time, we actually insert every thirty seconds; thus, the window size is set to 10 minutes.


Figure 5.8: Response time (in milliseconds) per update for default Jena, Rete Jena, and R4 in experiment 2, setting 1 (24 hours' worth of data, updated every half an hour, window size 10 hours)

Figure 5.9: Response time (in milliseconds) per update for default Jena, Rete Jena, and R4 in experiment 2, setting 2 (24 hours' worth of data, updated every half an hour, window size 5 hours)


We ran the same experiment again with the second partitioning, in which each update contains 4600 triples, representing a whole hour worth of data. We varied the window size between ten hours (10% change), five hours (20% change), two hours (50% change), and one hour (100% change). Figures 5.10–5.13 show the response times of the three engines. The same trend is noticed, and for the last setting, both Jena engines become very close to each other, as the whole dataset is changed at each update.

Figure 5.10: Response time (in milliseconds) per update for default Jena, Rete Jena, and R4 in experiment 2, setting 3 (24 hours' worth of data, updated every hour, window size 10 hours)

Figure 5.11: Response time (in milliseconds) per update for default Jena, Rete Jena, and R4 in experiment 2, setting 4 (24 hours' worth of data, updated every hour, window size 5 hours)


Figure 5.12: Response time (in milliseconds) per update for default Jena, Rete Jena, and R4 in experiment 2, setting 5 (24 hours' worth of data, updated every hour, window size 2 hours)

Figure 5.13: Response time (in milliseconds) per update for default Jena, Rete Jena, and R4 in experiment 2, setting 6 (24 hours' worth of data, updated every hour, window size 1 hour)

In the last setting, we changed the input file to the fourth dataset from Table 5.1 – containing almost half a million triples – to see if these results hold with bigger updates and windows. In this setting, each update inserts 50,000 triples, and the window size is set to include 200,000 triples. Figure 5.14 illustrates the results of this experiment, which shows that R4 still has the lowest response time, followed by the Rete Jena, while the default Jena comes last as in previous settings.


Figure 5.14: Response time (in milliseconds) per update for default Jena, Rete Jena, and R4 in experiment 2, setting 7 (5 days' worth of data, updated every ~11 hours, window size ~44 hours)

For each setting, we calculated the average response time of each engine, considering only the iterations after the window is full. The results are presented in Table 5.4. For all of the above settings, R4’s average response time is almost half that of the Rete-based Jena and varies between 10–33% of the average response time of the default Jena, depending on the percentage of change.

Table 5.4: Average response time for all settings in experiment 2

                                                      Default Jena           Rete Jena              R4
No.  Update size               Window    Change   Average    Standard    Average    Standard    Average    Standard
                               size       %        response   deviation   response   deviation   response   deviation
                               (hours)             time (ms)              time (ms)              time (ms)
1    2,300 triples, half hour  10        5%       188.59     32.01       39.61      9.22        18.89      6.30
2    2,300 triples, half hour  5         10%      110.89     27.08       38.29      9.53        17.83      5.63
3    4,600 triples, 1 hour     10        10%      256.50     72.53       76.36      15.88       39.27      16.47
4    4,600 triples, 1 hour     5         20%      168.25     54.25       76.04      17.50       36.77      14.17
5    4,600 triples, 1 hour     2         50%      114.11     33.13       77.55      13.41       34.29      13.78
6    4,600 triples, 1 hour     1         100%     94.77      24.93       70.59      19.05       31.70      13.59
7    50,000 triples, 11 hours  44        25%      1371.23    232.46      674.60     213.88      424.57     85.62


5.2.2 Comparing to State-of-the-art Stream Reasoning Systems

To evaluate the performance of R4, we compared it with state-of-the-art stream reasoning systems that provide the capability of performing lightweight background reasoning on streamed semantic data (reviewed in Chapter 3, Section 3.2.1). These included Etalis (Anicic et al., 2012), Sparkwave (Komazec et al., 2012), Streaming knowledge bases (Walavalker et al., 2008), and the incremental reasoner presented in Barbieri et al. (2010b). To the best of our knowledge, the latter two implementations were never made public, so we have only compared R4 to Etalis and Sparkwave.

Similar to R4, Sparkwave uses Rete networks that work directly on RDF streams. However, Sparkwave’s reasoning expressivity is limited to specific RDF Schema entailment rules (plus owl:inverseOf and owl:SymmetricProperty), while R4 is built to support general-purpose rules that can be written in RIF Core. Rete networks in Sparkwave are used to process continuous SPARQL queries, while the schema entailment support is provided using an additional network, called the ε-network, that precedes the Rete network. Sparkwave pre-computes the schema closure and uses it to build the ε-network. This network encodes schema-driven property hierarchies with specified domain and range definitions connected to class hierarchies. However, this network is activated by single triples from the stream, as it treats its input in a stateless way. Therefore, as explained in (Dell’Aglio and Della Valle, 2014), Sparkwave cannot be extended to support RDFS+, as it cannot, for example, support the owl:TransitiveProperty construct, which needs to be activated by multiple triples from the stream. R4, on the other hand, processes RDFS+ (encoded in RIF in Appendix A) in the same way it processes general-purpose RIF rules, using Rete networks that can be activated by more than one streaming input. Furthermore, R4 enables re-entrancy, which allows a generated answer to re-enter the network and possibly participate in generating further answers.

In terms of stream models, Sparkwave uses point-in-time semantics for the time model, defining an RDF stream as an unbounded sequence of RDF triples associated with timestamps. In R4, an external RDF stream is an unbounded sequence of RDF graphs associated with timestamps. Internally, both R4 and Sparkwave use time intervals (only for beta memory elements in Sparkwave), but these represent different semantics. In R4, an internal stream element is assigned an expiration timestamp based on the specified window. Each generated partial or full result takes the latest start timestamp and the earliest expiration timestamp of its constituting elements to represent its validity, so that whenever any constituting element expires, its generated result also expires. In Sparkwave, a generated partial or full result takes the earliest start timestamp and the latest start timestamp of its constituting elements as a means to check whether this interval falls in the specified window, while garbage collection is based on start timestamps only.
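The following minimal sketch illustrates the interval arithmetic that R4 applies to derived results, as described above; the class and field names are illustrative and do not correspond to R4’s actual internal API.

/** Illustrative sketch of R4's validity-interval rule for derived results. */
final class Token {
    final long start;       // latest start timestamp of the constituent elements
    final long expiration;  // earliest expiration timestamp of the constituent elements

    Token(long start, long expiration) {
        this.start = start;
        this.expiration = expiration;
    }

    /** A partial or full result is valid only while all of its constituents are valid. */
    static Token join(Token left, Token right) {
        return new Token(Math.max(left.start, right.start),
                         Math.min(left.expiration, right.expiration));
    }

    boolean isExpired(long now) {
        return expiration <= now;
    }
}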

5.2.2.1 Methodological Considerations

Following the development of semantic stream processors, a number of benchmarks have emerged to test and compare their performance: SRBench (Zhang et al., 2012), CSRBench (Dell’Aglio et al., 2013), LSBench (Le-Phuoc et al., 2012c), and CityBench (Ali et al., 2015). While these benchmarks provide rich datasets, they mostly do not consider reasoning; with the exception of SRBench, none of the other benchmarks’ queries require inference capabilities.

One of the most important measures of any stream processing system is the maximum input throughput, defined in Scharrenbach et al. (2013) as the number of data elements in the input stream consumed by the system per time unit; we use this as the key performance indicator in our comparative evaluation of the three systems. In addition, it is necessary to confirm that the data produced by a system is complete and correct according to its semantics.

5.2.2.2 Methodology

We set up Jtalis (the Java wrapper of Etalis) over SWI-Prolog v7.2.1 and installed Sparkwave v0.5.1.

We planned to run the same rule used in the comparison with Jena over the CCO dataset. However, Sparkwave was unable to read the dataset correctly as it does not parse blank nodes; this might be because it uses the hash-join algorithm to improve time efficiency, but builds the hashtables based on the URIs of the incoming data. Sparkwave also does not support data comparisons (needed for the greater-than condition in the rule). Therefore, we followed the experimentation strategy used by Sparkwave (Komazec et al., 2012) as it also requires reasoning over background knowledge to correctly answer the query. In this experiment, we use a synthetic dataset generated with the Berlin SPARQL Benchmark (BSBM) (Bizer and Schultz, 2009) containing 1.1 million triples (representing 100,000 limited time offers made available by an online market place). For background knowledge, we generated a small schema that described 329 product types arranged in a four-level hierarchy.


Prefix(dc )
Prefix(bsbm_voc )
Prefix(bsbm_inst )
Forall ?offer ?product ?vendor ?price ?from ?to ?delivery ?webpage (
  If And( ?offer [rdf:type -> bsbm_voc:Offer]
          ?offer [bsbm_voc:product -> ?product]
          ?offer [bsbm_voc:vendor -> ?vendor]
          ?offer [bsbm_voc:price -> ?price]
          ?offer [bsbm_voc:validFrom -> ?from]
          ?offer [bsbm_voc:validTo -> ?to]
          ?offer [bsbm_voc:deliveryDays -> ?delivery]
          ?offer [bsbm_voc:offerWebpage -> ?webpage]
          ?offer [dc:publisher -> ?publisher]
          ?offer [dc:date -> ?date]
          ?product [rdf:type -> bsbm_inst:ProductType73] )
  Then ?product [bsbm_voc:offerPrice -> ?price] )

Listing 5.2: A RIF Core rule that entails the offer price for all products of a specific type that are on offer

Each system was configured with the generated schema, a rule inspired by a query from the BSBM, and a window size (0.1, 1, 2, 5, and 10 second sliding windows are tested). The rule we used in this experiment, shown in Listing 5.2, is inspired by the Berlin SPARQL benchmark and entails the offer price for all products of a specific type that are on offer. As offers may be associated with product super-types, some background reasoning is needed to determine the offer price for specific product sub-types.

We then measured the time it took for the system to consume all 1.1 million triples for each configuration of each system, from which we can calculate the maximum throughput and average latency of that system for each window size. We also checked the correctness of results and found that all three systems provide the same correct results. This is because the operational semantics of the systems are similar. In terms of the SECRET model – described in (Dell’Aglio et al., 2013) – the report strategy of all systems is content-change, the tick is tuple-driven, and the output operator is Istream. Moreover, the query used in our experiments does not match triples from different graphs, which means the windowing mechanisms do not affect the results.

5.2.2.3 Comparative Results

Firstly, we note that the rule presented in Listing 5.2 takes significantly longer to process in Etalis (over three hours for the 1.1 million dataset with a one-second time window, compared to less than two minutes for the other two systems). In order to further investigate this difference, we tried changing the And into a series of seqs (Anicic et al., 2011), which resulted in the system running about 20 times faster (7.5 minutes for the same test). Despite being semantically different18, we would not expect such a drastic difference in running time unless Etalis were heavily optimised for the seq operator. This is likely, as Etalis is intended for event processing rather than continuous querying/reasoning, making the order of arrival of triples more relevant than their simple coincidence in the system.

In order to present graphics that provide a meaningful comparison between the time efficiency of each system with regard to changing window ranges, as in Figure 5.15, we chose to eliminate the results for Etalis using And from our dataset but include those using the rule modified to use seq when evaluating Etalis. It should be noted that we recognise that this modified rule is not semantically equivalent to that by which we evaluate R4 and Sparkwave, which cannot express the seq operator, but it is sufficient to contrast the effect of the window range on the time efficiency of the three systems. Table 5.5 presents the processing times of both settings in Etalis (‘And’ and ‘seq’), in addition to Sparkwave and R4, for the rule presented in Listing 5.2, with a third-level product type (according to the four-level hierarchy schema). The schema is loaded first. The data is then provided to each system by specifying the file name from which it is supposed to be read, so that each system starts reading and processing the data from that file at the fastest rate at which it is able. We measure the time needed to process the whole dataset, starting from receipt of the first tuple (thus, the time required to process the schema is not included). For each system, we ran the same rule with the same dataset five times. The average processing time is presented. No warm-up period was used to take account of disk caching.

Table 5.5: Comparing processing times (in seconds) of R4, Sparkwave, and Etalis

                 Window size (seconds)
System           0.1        1          2          5          10
R4               8.82       12.25      14.33      15.25      17.30
Sparkwave        27.05      74.48      104.37     163.82     228.81
Etalis (seq)     461.91     446.56     447.98     447.48     449.74
Etalis (And)     11587.32   11310.25   11567.61   11594.24   11496.49

As shown in figures 5.15 and 5.16, R4 is significantly faster than both Etalis and Sparkwave for the tested window sizes. Interestingly, while the maximum throughput

18 For example, a query that asks whether two persons arrived at the same location within a five-minute interval using ‘and’ will produce results regardless of which person arrived earlier. Using ‘seq’ on the other hand, will only produce results if person1 in the query arrived before person2 in the specified time window.

decreases and processing time increases (apparently asymptotically) with window range similarly in both Sparkwave and R4, those of Etalis using seq appear to remain approximately constant for all window ranges tested. It should be noted that this behaviour of Etalis remains in the case where the query uses And. However, despite its constancy, the throughput and processing latency of Etalis are so dramatically lower/higher than the other systems, respectively, that we project that R4 will remain faster than both Sparkwave and Etalis in cases where the window sizes are orders of magnitude larger than those tested.

We note that the aim of this experiment was not to reproduce Sparkwave’s experiment described in (Komazec et al., 2012), but was rather guided by it; we aimed to carry out a like-for-like comparison between the three systems. However, we acknowledge that there is a discrepancy between the performance of Sparkwave in our experiment and its performance in the published experiment (Komazec et al., 2012). While we have not used exactly the same dataset used in their experiment (as it was not provided by Komazec et al.), we generated a dataset in the same way as they described for the Sparkwave experiment. Our generated dataset is smaller than that in the Sparkwave experiment, but of a roughly comparable size (1.1 million triples in our experiment vs. 2.2 million triples in the Sparkwave experiment). We used the same rule as in the Sparkwave experiment, but as they did not specify the target product type they used in the rule, we picked an arbitrary product type. We used the same schema, but with fewer subclasses of the chosen product type (Komazec et al. note that the number of subclasses of the target product type had a negligible effect on throughput).

On a smaller problem, with a faster computer (3.2 GHz vs. 2.66 GHz) with more memory (8 GB vs. 4 GB RAM), Sparkwave performed worse in our experiment than it did in the Sparkwave experiment. For example, using a 5 second window, Sparkwave’s throughput was 7,000 triples/second in our experiment, compared to 60,000 triples/second in their experiment. However, the description of the Sparkwave experiment in (Komazec et al., 2012) is incomplete, as it does not describe the number of results generated by the rule, which might have an effect on throughput. Therefore, it may not be reasonable to strictly compare the results of this experiment with those of the Sparkwave experiment.


Figure 5.15: Processing time of R4, Sparkwave, and Etalis

Figure 5.16: Input throughput of R4, Sparkwave, and Etalis

Finally, we also repeated the same experiment using different schemas with different numbers of subclasses, ranging between 40 and 1,100 subclasses. None of the three systems was affected by the larger schema size, and all showed similar performance to the original setting. Results of this experiment can be found in Appendix B.

5.3 Conclusion

This chapter has presented several evaluations of R4. A number of functionality tests have proved some of its capabilities, including basic pattern matching, aggregation, combining

static and dynamic data, and performing background reasoning. Comparing R4’s performance with Jena’s static reasoner positively supports our first hypothesis:

Hypothesis 1: It was anticipated that our continuous reasoning approach would improve throughput and responsiveness, when compared to a traditional static reasoner.

Furthermore, in terms of input throughput, R4 outperformed two of the state-of-the-art stream reasoning engines, namely Sparkwave and Etalis. However, our experiment is limited to only one rule using one dataset; the performance for other datasets with different characteristics and for other rules might be different. Moreover, if we were to use more than one rule, we might not necessarily be able to compare the results from one rule to the results from another rule. Therefore, a benchmark based on a selection of rules is needed, although it would still be limited to one dataset. While Sparkwave evaluated their system using three different rules, they reported consistent performance across all three rules (Komazec et al., 2012).

Experiments described in this chapter were run in R4 using statically optimised plans. With the addition of the adaptive optimiser, introduced in the next chapter, we expect R4 to outperform the other systems by an even greater margin, especially in dynamic environments.

Chapter 6: Optimisation

As a single rule can usually be evaluated using different equivalent plans, an optimiser is needed to identify and generate the most efficient plan. In this chapter, we address the optimisation problem in the context of semantic stream processing. We first describe how R4 generates the initial plan based on simple heuristics in Section 6.1. We then discuss how the adaptive optimisation paradigm, introduced in the data stream management community (reviewed in Chapter 2, Section 2.1.3), is applied in R4 (Section 6.2). Section 6.3 then introduces the employed cost model, discussing several issues such as estimating selectivities and output rates. Monitoring, optimisation algorithms, and plan migration issues are discussed in Sections 6.4, 6.5, and 6.6, respectively.

6.1 Initial Rete Network Generation: Static Optimisation

In a database management system (DBMS), optimisers use statistics such as data cardinality and operator selectivities to build a cost model to choose the optimal plan at compile time (Garcia-Molina et al., 2000). However, these statistics are usually unavailable before runtime in a streaming context (Viglas and Naughton, 2002). Therefore, we use simple heuristics to generate a basic initial plan that should be refined at runtime after collecting sufficient statistics.

To generate the initial Rete network, the rule document is first parsed to identify individual rules and their condition elements (triple patterns). We use Squall’s RIF Core parser19 and extend it to handle stream and window specifications, using the EBNF given in Fig 4.7 and the code available from Squall.

The parsed ruleset is then passed to the optimiser, which starts by creating a source node for each input. Then, it handles the body – the ‘If’ part – of the ruleset by creating and connecting alpha and beta nodes. It forms the alpha network by creating an alpha node (with its own alpha memory) for each triple pattern. The beta network is then modelled by forming a left input adapter node to follow the first alpha node. For each other alpha node, a join node with a beta memory is created to connect it with the previous node in the beta network, which results in a left-deep beta network. At the end, the head – the ‘Then’ part – of the ruleset is handled by creating a terminal node with a template for every atom.

19 https://github.com/sinjax/squall/tree/master/core/rif-core-parser


As the optimiser traverses the list of triple patterns, attempting to join them in the same order as they appear in the rule document, it enforces a known optimisation heuristic, which is to join alpha nodes with shared variables only (Garcia-Molina et al., 2000), leaving the triple patterns that have unique variables to the end. This optimisation technique aims to avoid Cartesian products, which are known to generate larger intermediate results (Mishra and Eich, 1992).
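The following sketch illustrates this ordering heuristic: triple patterns are taken in rule-document order, but a pattern is only appended to the left-deep chain if it shares a variable with the patterns already joined, deferring disconnected patterns to the end. TriplePattern and getVariables() are illustrative names rather than R4's actual classes.

import java.util.*;

/** Illustrative sketch of the shared-variable join-ordering heuristic (not R4's actual code). */
final class JoinOrdering {

    static List<TriplePattern> leftDeepOrder(List<TriplePattern> patterns) {
        List<TriplePattern> remaining = new ArrayList<>(patterns);
        List<TriplePattern> ordered = new ArrayList<>();
        Set<String> boundVars = new HashSet<>();

        while (!remaining.isEmpty()) {
            TriplePattern next = null;
            for (TriplePattern p : remaining) {
                // keep the rule-document order, but skip patterns that would force a
                // Cartesian product with the patterns joined so far
                if (ordered.isEmpty() || !Collections.disjoint(boundVars, p.getVariables())) {
                    next = p;
                    break;
                }
            }
            if (next == null) {
                next = remaining.get(0);   // only disconnected patterns left: accept the product
            }
            remaining.remove(next);
            ordered.add(next);
            boundVars.addAll(next.getVariables());
        }
        return ordered;
    }

    interface TriplePattern {
        Set<String> getVariables();
    }
}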

Before modelling any alpha or beta node, the set of already modelled nodes is checked. If an equivalent node – one that has the same condition, sources, and local window size if specified – has been already created, the old node (and its memory) is simply shared instead of creating a duplicate operator. This is one of the features of the original Rete algorithm (Forgy, 1982) that can save both memory and computational resources.

After dealing with all triple patterns in the ruleset, we handle the predicates. Predicates (or filters) are pushed as early (i.e., as close to the inputs) as possible in the network, as follows: if the predicate compares a variable to a static value, an alpha constraint is created at the first alpha node in which this variable appears; if the predicate compares two variables, a beta constraint is created at the first beta node where both variables appear in its tokens. Again, pushing filters as early as possible in the network is a known optimisation heuristic in DBMS (Garcia-Molina et al., 2000).

6.2 Adaptive Optimisation

As the initial plan was not constructed based on statistics of the input data, it is not expected to be optimal. Furthermore, the characteristics of data streams may continuously change, so the optimisation process needs to be adaptive. Apart from CQELS (Le-Phuoc et al., 2011), all other processors of semantic streams – reviewed in Chapter 3 – do not consider adaptive optimisation, as they basically depend on their underlying stream systems. CQELS, on the other hand, employs a white-box approach, so it implements its own optimiser. In the relational stream processing community, some systems supported adaptive optimisation using the plan-based approach, where the optimiser changes the running plan for all tuples (Babu and Widom, 2004; Cammert et al., 2008). Other systems used the routing-based approach, in which each tuple can follow a different route (Avnur and Hellerstein, 2000). CQELS supports routing-based adaptive optimisation using the Eddy operator (Avnur and Hellerstein, 2000). While eddies enable a fine-grained per-tuple optimisation, we argue that it would be very expensive for the RDF model. An RDF

document (or stream) consists of a large number of small triples, compared to the relational model, where a smaller number of records or tuples are composed, usually of more than three attributes. Holding statistics and choosing a route order for every small triple would cause a large computational overhead. We opt for more coarse adaptivity at an intra-query level as in (Babu and Widom, 2004; Cammert et al., 2008). In other words, R4’s optimiser is plan-based, as opposed to the routing-based strategy used in CQELS.

The RDF model also causes more joins than the relational model, as a single triple does not hold much information. This makes it more crucial to have a good beta network topology. Therefore, given that we flattened the alpha network into single alpha nodes, we focus on optimising the join node ordering. The order of joins in the Rete network largely determines the performance of the plan, as join nodes are far more expensive than stateless filter nodes. While different equivalent plans should produce the same number of output results, different join orderings will cause different sizes of intermediate results, affecting both processing costs and memory consumption, which ultimately determine throughput. A poorly ordered plan might suffer even more as stream characteristics change at runtime. The optimiser needs to ensure that the order of join nodes in the running plan is efficient for the current conditions.

Placing the more selective operators early in the network usually results in smaller intermediate results, which leads to faster processing and lower memory consumption. For example, assume we want to join three streams (A, B, and C), and each produces 10 triples/sec into a window of 10 seconds. Also, assume that there is a shared variable among all of them and that the join selectivity of A⋈B = 0.1, A⋈C = 0.2, and B⋈C = 0.3. As joins are commutative operators, there are three possible distinct ways to join the three streams: (A⋈B)⋈C, (A⋈C)⋈B, and (B⋈C)⋈A. The first join of the first plan will produce 100 triples every second, which means its beta memory size may grow up to 1,000 triples. For the second and third plans, the first join’s output rates are 200 triples/sec and 300 triples/sec, while its beta memory sizes are 2,000 triples and 3,000 triples, respectively. In this example, it is obvious that the first plan will outperform the other plans.

However, ordering joins based only on their selectivities does not always guarantee an optimal plan, as there are other factors that determine the performance of the network. In the example above, if stream A’s input rate was 100 triples/sec instead of 10, output rates of the first join in the three plans would be 1000, 2000, and 300. This means that the first plan is no longer the optimal plan; instead, the third plan, which is ranked worst using the

lowest-selectivity-first heuristic, is now the best. Another important factor that affects the cost is the window size. Giving stream A a big window would have a similar effect to giving it a higher input rate. Therefore, a cost model that takes all these factors into account is needed. Using a cost model, the optimiser can estimate the costs of different possible orderings of joins and select the cheapest. The ultimate goal of the adaptive optimiser is to maximise throughput.

The adaptive optimisation process for continuous reasoning involves three parts of the system that work together to maintain good performance under changing streaming conditions. First, the monitor periodically tracks stream statistics, uses them to measure the cost of the currently running plan, and reports the cost to the optimiser. The optimiser compares the new cost to previously reported costs; if they significantly differ, it starts the re-optimisation process. Using a dynamic programming algorithm based on the cost model, it tries to find a cheaper plan. If a cheaper plan is found, it is communicated to the rule engine, and a plan migration is ordered. The rule engine then runs the chosen plan. This process is illustrated in Figure 6.1.

Figure 6.1: Adaptive optimisation process. The monitor observes stream statistics and measures the cost of the running plan; the optimiser, given the rules, estimates the costs of alternative plans and chooses the cheapest; the rule engine runs the chosen plan and manages plan migration.

In the next subsections, we first lay out the cost model used to estimate costs of join operators and plans. Then, the adaptation approach and its stages, including monitoring, re- optimisation, and plan migrations, are described in more detail.


6.3 Cost Model

Most optimisers in modern database systems are cost-based (Viglas and Naughton, 2002). Cost models are used to estimate costs of different equivalent plans in order to choose the most efficient. Some stream processing engines also use cost models instead of heuristics (e.g. (Viglas and Naughton, 2002; Cammert et al., 2008)). In streaming contexts, however, costs of operators are generally rate-based as opposed to relational, cardinality-based cost models (Schmidt, 2006).

As we are only interested in finding an efficient join ordering, and because joins are usually the most expensive operators in the network, we model the cost of the network as the summation of its joins’ costs. We follow the research on estimating the cost of a windowed join – as in Kang et al., (2003), Ayad and Naughton (2004), and Cammert et al., (2008) – and use a per-unit-time cost model. In this model:

Cost of an operator = input rate * cost of handling one tuple

To model the cost of an individual join operator, we consider all the tasks that are performed by this operator. As described in Section 4.3.3, a join node that takes input from two windowed streams would perform the following tasks when a tuple is received from either side: the tuple is inserted into the corresponding window; the other window performs a garbage collection, which is then probed for matches; any results are then generated. Using L and R to refer to the left and right input streams to a join node, we represent the cost of a join operator as follows20:

C(L⋈R) = Cinsert + Cinvalidate + Cprobe + Cresult.

In order to find the individual costs of these operations, we use the notions of input and output rates, window sizes, and selectivity. While these can be measured for running plans, we need a way to estimate them without actually creating and running the joins to avoid this unnecessary overhead. The following subsections explain how we calculate these different parameters in order to find each individual cost in the previous formula. Table 6.1 identifies the different parameters used throughout this section, where L and R denote the left and right input streams (as in the previous formula), and o refers to the output stream.

20 Throughout this section, we use L and R to refer to the left and right input streams to a join node, and o to refer to the output stream.


Table 6.1: Cost model terms

λi Arrival rate of tuples from a source input i

λL Arrival rate of tuples for the left input of a join

λR Arrival rate of tuples for the right input of a join

λo Output rate of tuples from the current operator

WL Size of the left input window (number of tuples)

WR Size of the right input window (number of tuples)

Wo Size of the output window (number of tuples)

BL Number of hash buckets in the left input window

BR Number of hash buckets in the right input window
f Selectivity factor
CI Cost of inserting a tuple into a memory
CH Cost of evaluating the hash function of a single tuple
CV Cost of removing a tuple from a memory
CE Cost of re-organising the heap
CP Cost of a single evaluation of the join predicate
CG Cost of generating a new tuple
CT Cost of checking a tuple’s timestamp
M Size of a left tuple (number of triples comprising it)

dS(a) Number of distinct values of attribute a in stream S

6.3.1 Constant costs

First, we estimated the system-dependent costs (CI, CH, CV, CE, CP, CG and CT), used in the insertion, invalidation, probing, and result generation cost formulas, by measuring the actual time used by the CPU to perform each task. For example, we measured the time taken to hash 1000 tuples and divided the result by 1000 to determine the cost of evaluating the hash function for a single tuple (i.e. CH).
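As an illustration of this measurement approach, the sketch below times a batch of hash evaluations and averages the result; Object.hashCode() merely stands in for the engine's tuple hash function, and the class name is a placeholder.

import java.util.List;

/** Illustrative micro-measurement of a per-tuple constant cost, here the hashing cost CH. */
final class ConstantCostProbe {

    /** Returns the average time (in nanoseconds) to hash one tuple from the sample. */
    static double averageHashCostNanos(List<?> sampleTuples) {
        long start = System.nanoTime();
        for (Object t : sampleTuples) {
            t.hashCode();                      // stand-in for hashing one tuple
        }
        return (System.nanoTime() - start) / (double) sampleTuples.size();
    }
}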

6.3.2 Estimating join’s selectivity (f)

One of the most important variables to estimate in the cost model is the join’s selectivity factor. In Babu et al. (2004), a profiler is used to estimate the selectivities of pipelined filters, where a periodic sample of the input is passed along different alternative paths to calculate the operators’ selectivities. However, with the large number of joins expected using the RDF model, this can be quite expensive. We need a cheaper way that finds reasonably good estimates of joins’ selectivities.


Cost models of windowed joins similar to ours – in Cammert et al. (2008) and Gomes and Choi (2008) – use the join selectivity factor without discussing how to estimate its value. In Ayad and Naughton (2004) and Getta and Vossough (2004), a join selectivity is described as a multiplication of the selectivities of all previous nodes in the network. This goes back to the System R optimiser (Selinger, 1979), where the selectivity of (pred1) and (pred2), i.e. f(pred1⋈pred2), equals f(pred1) * f(pred2). However, because this assumes independence of the input values, using it with the RDF model results in severe underestimations of the number of output results of the join. RDF data usually demonstrate a high correlation between triples. For instance, a triple matching (?x, observedBy, ?y) will always co-occur with a triple matching (?x, observedProperty, ?z). If we have 100 unique observations and are asked to join (?x, observedBy, ?y) with (?x, observedProperty, ?z), we expect the cardinality of the output results to be 100, as every observation contains information about the two predicates. However, using the above join selectivity estimation method underestimates the number of results as one: f(L⋈R) = f(L)*f(R) = 1/100 * 1/100 = 1/10000, so Wo = WL * WR * f(L⋈R) = 100 * 100 * 1/10000 = 1. On the other hand, it can be reasonable to assume the independence of observed temperature values from two different streams in a join like (?y, observedValue, ?x)⋈(?z, observedValue, ?x).

To address this problem, we apply a method based on an observation introduced in Neumann and Moerkotte (2011). They observe that many of the correlations between triple patterns stem from the fact that RDF uses multiple triples to describe the same entity, so searching for one of them is practically as selective as searching for all of them. In our graph-based RDF stream model, a stream is comprised of small graphs representing those entities, in the example above, weather observations. Therefore, we differentiate between the selectivities of joining triple patterns from a single graph – self-joins or intra-graph joins – and joining triple pattern from different graphs.

In the first case, the cardinality of joining multiple triple patterns from the same graph is equal to the number of graphs in the windowed stream, as they always co-occur. We consider triple patterns from one graph with bound objects as well as bound predicates. For example, joining (?x type Observation) with (?x observedValue >30) is an intra-graph join, but the selectivity of the first pattern is different than the second. While all 100 observations satisfy the first pattern, assume that only 10 observations satisfy the second.

We make the join as selective as the most selective pattern, i.e. f(L⋈R) = min(f(L),f(R)) = min(1/100, 1/10) = 1/100, so Wo = WL * WR * f(L⋈R) = 100 * 10 * 1/100 = 10, which is the correct number of results.

In the second case, where joins happen between different graphs – either from the same stream or different streams – we assume independence. So, generally, for intra-graph joins, f(L⋈R) = min(f(L),f(R)) and, for inter-graph joins, f(L⋈R) = f(L)*f(R).
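The following sketch summarises this estimation rule; the method names are illustrative only and do not correspond to R4's actual classes.

/** Illustrative sketch of the join-selectivity estimate described above. */
final class SelectivityEstimator {

    /** f(L ⋈ R): min-selectivity for intra-graph (self) joins, independence otherwise. */
    static double joinSelectivity(double fLeft, double fRight, boolean intraGraphJoin) {
        return intraGraphJoin ? Math.min(fLeft, fRight) : fLeft * fRight;
    }
}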

6.3.3 Estimating window sizes (Wo)

A join’s right parent is always an alpha node. Hence, its window size can be measured at any time during the cost estimation phase, as they already exist and are not affected by the adaptive re-optimisation. In addition, it can be calculated using the following formula:

Wo(σa) = WS * f(σa), where WS is the window length for a tuple-based window, or WS = λi * TS, where TS is the time-based window length.

On the other hand, the left parent of a join node is another join node, and its window size needs to be estimated, since it might not exist when the optimiser investigates new ways to join the alpha nodes. If we can estimate the join selectivity factor, then the join’s window size can simply be estimated as:

Wo = WL * WR * f(L⋈R).

While this formula appears exactly the same as the one used for finding the cardinality of a relational join – replacing window sizes with table cardinalities – it does actually take into account the temporal nature of sliding windows. This is explained by Ayad and Naughton (2004) as follows: A join’s output window size is the number of valid tuples resulting from the join. A tuple stays valid as long as none of the tuples that comprise it have expired. If left and right input rates are steady and left and right window sizes at the beginning are WL and WR, the resulting window size for the already existing tuples in the windows will be

WL * WR * f(L⋈R). We consider tuples arriving from the left side; each arriving tuple produces WR * f(L⋈R) new tuples. However, we expect the same number of tuples in the resulting window to be invalidated as the earliest tuple in the left window becomes expired. The same logic applies to tuples arriving at the right side. Hence, the previous formula presents a good average estimate of the join results’ window size.


6.3.4 Estimating output rates (λo)

In R4, as we use left-deep plans, right input rates are always the output rates of alpha nodes, while left input rates are the output rates of previous beta nodes. Output rates of alpha nodes are simply the output rates of raw streams multiplied by the selectivity of the filter predicate, as follows:

λo(σa) = λi * f(σa).

The stream output rate can either be known in advance or averaged at the source node. The selectivity factor of an alpha node can be defined as the percentage of tuples satisfying the filter predicate relative to the number of tuples input from the stream. As in database systems, it can be estimated as:

f(σa) = 1/dS(a).

The number of distinct tuples can be measured online as the number of hash buckets of the corresponding alpha memory. Alpha memories are always available and are not affected by the re-optimisation process.

Finding the output rate of a join node is more complicated, as it has two inputs. We first find the output rate of a Cartesian product. In a Cartesian product node, every incoming tuple from the left parent is joined with every tuple in the right parent’s window (and vice versa). Therefore:

λo(L×R) = λL WR + λR WL.

Then, by defining the selectivity factor of a join node as the percentage of tuples satisfying the join predicate relative to a Cartesian product, we can estimate a join node’s output rate as:

λo(L⋈R) = ( λL WR + λR WL ) f(L⋈R).
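The window-size and rate estimates of this and the previous subsection can be summarised in a few lines, as in the sketch below; the class and method names are placeholders rather than R4's actual code.

/** Illustrative sketch of the estimates in Sections 6.3.3 and 6.3.4. */
final class JoinEstimates {

    /** Output window size of a join: Wo = WL * WR * f(L ⋈ R). */
    static double outputWindowSize(double wL, double wR, double f) {
        return wL * wR * f;
    }

    /** Output rate of a join: λo = (λL * WR + λR * WL) * f(L ⋈ R). */
    static double joinOutputRate(double lambdaL, double wR, double lambdaR, double wL, double f) {
        return (lambdaL * wR + lambdaR * wL) * f;
    }

    /** Output rate of an alpha node: λo = λi * f(σa), with f(σa) = 1 / d(a). */
    static double alphaOutputRate(double lambdaIn, long distinctValues) {
        return lambdaIn * (1.0 / distinctValues);
    }
}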

Now, we discuss each individual cost in the previous formula – first in abstract terms, then in terms of our implementation.


6.3.5 Insertion cost (Cinsert)

Every time unit, the join node receives a number of tuples from both sides, equalling the output rate of its left and right parents. For each incoming tuple, there is a constant cost of adding the tuple to the corresponding memory. Therefore:

Cinsert = ( λL + λR ) * CI.

The constant cost is implementation dependent. As explained in Chapter 4, R4 uses a hash map and a priority queue – implemented as a binary tree – to model an alpha or beta memory. Therefore, CI includes the cost of hashing the tuple, inserting it into a hash bucket, and adding it to the queue. While adding an element to a binary tree usually incurs a cost that is logarithmic in the size of the tree (Mehlhorn and Sanders, 2008), we found it to be constant in our experiments. Java’s implementation of a priority queue adds the new element at the leaves and then uses the comparator to sift it up the tree until it finds its correct location. As the comparator orders tuples by increasing timestamps, and because incoming tuples usually arrive in order, they usually stay at the leaves – hence the constant cost, as follows:

Cinsert = ( λL + λR ) * (CI + CH).

6.3.6 Invalidation cost (Cinvalidate)

Every time unit, the number of expired tuples in the left or right memory that need to be removed can be estimated to be equal to the output rate of the corresponding parent. This is especially accurate for tuple-based windows, in general, and time-based windows in steady state conditions where streams’ output rate does not change often, as follows:

Cinvalidate = ( λL + λR ) * CV.

As opposed to the insertion cost in R4, the cost of deleting a tuple is actually logarithmic in the size of the tree, as it requires re-organisation of the queue. Whenever an element is deleted from the queue, which is usually the root of the tree (as it has the smallest timestamp), it is replaced by the last element in the heap. The tree implementation then uses the comparator to push this tuple down until it finds its correct location, which should be at the leaves, as it has the biggest timestamp. This means it is compared with a number of elements equal to the height of the tree. After deleting the stale tuple from the queue and re-organising it, the tuple must also be removed from the hash map, incurring a constant cost CH. Therefore:


Cinvalidate = ( λL * log2(WL+1) + λR * log2(WR+1) ) * CE + ( λL + λR ) * CH.

6.3.7 Probing cost (Cprobe)

When a new tuple arrives at one side, the join node searches the other side’s window to find matches. The number of comparisons that needs to be done for each input depends on the join algorithm. For nested loop joins, every incoming tuple from the left side needs to be compared with each tuple in the right window (and vice versa), as follows:

Cprobe = ( λL WR + λR WL ) * CP.

For symmetrical hash joins used in R4, there is the constant cost of hashing the incoming tuple; the comparison cost will be a function of the number of tuples in the matching bucket. If we assume uniform distribution of tuples across buckets, each bucket will contain Wi/Bi tuples, as follows:

Cprobe = ( λL * WR/BR + λR * WL/BL ) * CP + ( λL + λR ) * CH.
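The following sketch illustrates the kind of memory structure assumed by the insertion, invalidation, and probing formulas above: a hash map for probing combined with a priority queue ordered by expiration timestamp for time-based invalidation. It is an illustrative simplification, not R4's actual data structures.

import java.util.*;

/** Illustrative sketch of an alpha/beta memory: hash buckets plus an expiration queue. */
final class WindowMemory<T> {

    static final class Entry<T> {
        final T tuple;
        final Object joinKey;
        final long expiration;
        Entry(T tuple, Object joinKey, long expiration) {
            this.tuple = tuple; this.joinKey = joinKey; this.expiration = expiration;
        }
    }

    private final Map<Object, List<Entry<T>>> buckets = new HashMap<>();
    private final PriorityQueue<Entry<T>> byExpiration =
            new PriorityQueue<>(Comparator.comparingLong((Entry<T> e) -> e.expiration));

    /** Insertion: hash into a bucket and enqueue (usually O(1), as arrivals are in order). */
    void insert(Entry<T> e) {
        buckets.computeIfAbsent(e.joinKey, k -> new ArrayList<>()).add(e);
        byExpiration.add(e);
    }

    /** Invalidation: pop expired entries from the queue and delete them from their buckets. */
    void invalidate(long now) {
        while (!byExpiration.isEmpty() && byExpiration.peek().expiration <= now) {
            Entry<T> stale = byExpiration.poll();          // heap re-organisation: O(log W)
            List<Entry<T>> bucket = buckets.get(stale.joinKey);
            if (bucket != null) {
                bucket.remove(stale);                      // constant-cost hash-side removal
                if (bucket.isEmpty()) buckets.remove(stale.joinKey);
            }
        }
    }

    /** Probing: only the matching bucket is scanned (on average W/B entries). */
    List<Entry<T>> probe(Object joinKey) {
        return buckets.getOrDefault(joinKey, Collections.emptyList());
    }
}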

6.3.8 Result generation cost (Cresult)

If the searching process results in a match, there is a cost of generating the new result. While Kang’s cost model (Kang et al., 2003) does not consider this cost, it is explicitly calculated in Cammert (2008) as a constant cost for each generated tuple, as follows:

Cresult = λo * C^G

However, in R4, the cost of creating a result is not simply a constant. There is the constant cost of creating the new tuple, and then there is the cost of assigning its timestamp. This cost is a function of the number of triples in the left tuple, because we need to search them to find the latest start timestamp and the earliest end timestamp to assign to the result. Thus, if M is the size of the left tuple:

Cresult = λo * (C^G + M * C^T)
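As a purely illustrative worked example (the rates, window sizes, and unit costs below are assumptions, not measurements from R4), consider a join with λL = λR = 100 tuples/second, windows WL = WR = 500 tuples split over BL = BR = 50 buckets, an output rate λo = 10 results/second, a left tuple size of M = 4 triples, and every constant cost (C^I, C^H, C^E, C^P, C^G, C^T) set to 1 µs:

Cinsert = (100 + 100) * (1 + 1) µs = 400 µs
Cinvalidate = (100 * log2(501) + 100 * log2(501)) * 1 µs + (100 + 100) * 1 µs ≈ 1,994 µs
Cprobe = (100 * 500/50 + 100 * 500/50) * 1 µs + (100 + 100) * 1 µs = 2,200 µs
Cresult = 10 * (1 + 4 * 1) µs = 50 µs

Summing the four components gives roughly 4.6 ms of work per second of input, i.e. under these assumptions the join comfortably keeps up with its streams.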

6.4 Monitoring

At a set interval, the monitor measures the cost of the currently running plan. It uses the same cost model presented in the previous sections, but as the plan is actually running, it uses measured window sizes and input rates instead of estimates. The monitor works in a bottom-up fashion, starting from the source nodes up to the last join in the network, as every join needs information about its parents' window sizes and output rates. At the source node level, it updates the average output rates of every incoming stream. At the alpha network, it updates the output rates and window sizes of each alpha node. Then, it calculates the cost of every join node, updating its window size and output rate at the same time. Finally, the monitor combines the costs of all joins into the plan cost and reports it to the optimiser.

The monitoring interval can either be set by the user or decided by the system. Choosing a small interval increases the monitoring overhead but gives the optimiser a better chance to respond to quickly changing stream statistics. We can start with a small value and increase it if stream characteristics stay stable; whenever a change is observed, the monitoring interval can be reduced again until the observed statistics stabilise.
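As a sketch of how such a self-adjusting interval could be implemented (the class name, the bounds, and the 20% change threshold below are illustrative assumptions, not R4's actual values), the interval doubles while measured plan costs stay stable and drops back to its minimum as soon as a significant change is observed:

class MonitoringScheduler {
    private static final long MIN_INTERVAL_MS = 1_000;     // smallest monitoring interval
    private static final long MAX_INTERVAL_MS = 60_000;    // largest monitoring interval
    private static final double CHANGE_THRESHOLD = 0.20;   // 20% relative change

    private long intervalMs = MIN_INTERVAL_MS;
    private double lastCost = Double.NaN;

    // Called by the monitor after each measurement of the running plan's cost;
    // returns the interval to wait before the next measurement.
    long nextInterval(double measuredCost) {
        if (!Double.isNaN(lastCost)) {
            double relativeChange =
                    Math.abs(measuredCost - lastCost) / Math.max(lastCost, Double.MIN_NORMAL);
            if (relativeChange > CHANGE_THRESHOLD) {
                intervalMs = MIN_INTERVAL_MS;                             // react quickly to change
            } else {
                intervalMs = Math.min(intervalMs * 2, MAX_INTERVAL_MS);   // back off while stable
            }
        }
        lastCost = measuredCost;
        return intervalMs;
    }
}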

6.5 Optimisation algorithm

The optimiser receives updates from the monitor about the current plan cost. While reacting to every change in the cost would make the system highly adaptive, it also has some side effects (Babu and Bizarro, 2005). Besides the increased overhead, it can lead to a situation called thrashing: an Adaptive Query Processing system is thrashing if most of its resources are spent on adaptivity-related overhead, such as plan switching, rather than on query execution (Babu and Bizarro, 2005). Another reason to avoid adapting to every change is that the change might be transient, meaning that by the time the system has switched to the new plan suited to the change, the change is gone and the new plan is no longer suitable. To avoid these problems, the optimiser keeps a window of plan costs and computes their average. If the new average cost differs significantly from the previous average, it starts the re-optimisation process.
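A minimal sketch of this trigger is shown below, assuming an illustrative window of ten cost samples and a 25% significance threshold (neither value is taken from R4): costs reported by the monitor are averaged over the window, and re-optimisation is started only when the new average differs significantly from the previous one, which avoids thrashing on transient changes.

import java.util.ArrayDeque;
import java.util.Deque;

class ReoptimisationTrigger {
    private static final int WINDOW = 10;          // number of cost samples per average
    private static final double THRESHOLD = 0.25;  // 25% relative change counts as significant

    private final Deque<Double> recentCosts = new ArrayDeque<>();
    private double previousAverage = Double.NaN;

    // Called for every plan cost reported by the monitor; returns true when
    // the optimiser should start the re-optimisation process.
    boolean onCostReported(double cost) {
        recentCosts.addLast(cost);
        if (recentCosts.size() < WINDOW) {
            return false;                           // not enough samples yet
        }
        double average = recentCosts.stream()
                .mapToDouble(Double::doubleValue).average().orElse(0.0);
        recentCosts.clear();                        // start a fresh window of samples

        boolean significantChange = !Double.isNaN(previousAverage)
                && Math.abs(average - previousAverage) / previousAverage > THRESHOLD;
        previousAverage = average;
        return significantChange;
    }
}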

The optimiser now starts trying to find a cheaper plan for the current conditions. Using dynamic programming techniques, we implemented two algorithms that use the cost model to find a new efficient plan.

6.5.1 Optimal plan algorithm

The first algorithm finds the optimal left-deep plan, which is guaranteed by the cost model to have a lower cost than all other possible left-deep plans. Working bottom-up, it starts by finding all possible one-join plans that join only two alpha nodes. It estimates every plan's cost and adds them to the first level table. In the second round, each one-join plan from the first level table is joined with every alpha node except those participating in the current plan. This results in new, two-join plans that join three alpha nodes. Each new plan's cost is incremented with the new join's estimated cost; the plan is then added to the second level table. This process continues until the final round – the number of rounds equals the number of alpha nodes minus one – after which the last table contains plans that join all alpha nodes. The optimiser then simply chooses the cheapest plan from this table.

Procedure: Find optimal plan

Input: alpha nodes in the current plan a1, a2, a3, …, an
Output: New plan optP

1   rightParents = { a1, a2, a3, …, an }
2   leftParents  = { a1, a2, a3, …, an }
3   for (r = 0; r < n-1; r++)                          // one round per join level
4       for (i = 0; i < leftParents.size; i++)
5           if (r == 0)                                // first round: left parents are alpha nodes
6               remove leftParents[i] from rightParents
7           else
8               rightParents = { a1, a2, a3, …, an }
9               remove all alpha nodes in leftParents[i].plan from rightParents
10          for (j = 0; j < rightParents.size; j++)
11              if (leftParents[i] shares a variable with rightParents[j])
12                  newJoin = createJoin(leftParents[i], rightParents[j])
13                  newPlan = copy of leftParents[i].plan
14                  add newJoin to newPlan
15                  newPlan.cost = leftParents[i].plan.cost + estimatedCost(newJoin)
16                  add newPlan to levelTable[r]
17      leftParents = levelTable[r]
18  optP = cheapest plan in levelTable[n-2]
19  return optP

Listing 6.1: Optimal plan algorithm

The pseudo code of the algorithm is presented in Listing 6.1. We have two lists representing potential left parents and right parents, initially containing all alpha nodes of the current rule. We then have three loops: the first represents the rounds, the second loops over potential left parents, and the third loops over potential right parents. In the first round (as a special case), where left parents are alpha nodes, lines 5 and 6 ensure that an alpha node does not get joined with itself. By deleting the current left parent alpha node from the potential right parents, we also prevent subsequent left parent alpha nodes from joining with the current node, which would result in a duplicate join node. For example, if we have three alpha nodes (A, B, and C) in the first round, we first hold A as the left parent. By deleting A from the right parents list, A is joined with B and then with C. In the next iteration of the left parents loop, B is the left parent, and the right parents list no longer contains A; therefore, we do not join B with A (a duplicate of A join B). This is equivalent to having the second and third loops as:

for (i = 0; i < n; i++)
    for (j = i+1; j < n; j++)

However, this logic does not hold for subsequent rounds, so we chose to remove nodes from the list of right parents. Lines 11–16 check if there is a possible join between the current left and right parents, create a join node, add it to a new plan – which is initialised as the left parent's plan – and increment the new plan's cost with the new join's estimated cost. The new partial plan is stored in the current level table.

In the following rounds, the list of left parents is filled with join nodes (line 17). As the list of right parents is now empty, it is re-filled in line 8. Then, line 9 ensures that the left parent join node does not get joined with an already-joined alpha node in the current plan, e.g. node (A&B) does not join A or B. The program continues until the final round, where all plans join all alpha nodes. It then chooses the cheapest plan and returns it.

To get an idea about the overhead of this algorithm, we estimate the number of join nodes it creates. We consider a star-joined plan, in which every alpha node shares a variable with each other alpha node. If we assume there are n alpha nodes in this plan, the first round will create (n-1)+(n-2)+(n-3)+…+1 = n(n-1)/2 partial plans of one-join size. Each will be joined with n-2 alpha nodes in the second round, resulting in n(n-1)/2 * (n-2) partial plans of two-join size. Each of these is joined with n-3 alpha nodes in the third round, i.e. there are n(n-1)/2 * (n-2)* (n-3) partial plans of three-join size. In the final round, each partial plan is joined with one remaining alpha node, so the number of plans is n(n-1)/2 * (n-2) * (n-3) * … * 1 = n(n-1)/2 * (n-2)!. This number serves as a worst-case scenario, as we assumed a star-shaped plan; other shapes result in a smaller number of alternative plans.
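To make the counting concrete, consider an illustrative star-shaped rule with n = 5 alpha nodes:

round 1: n(n-1)/2 = 10 one-join partial plans
round 2: 10 * (n-2) = 30 two-join partial plans
round 3: 30 * (n-3) = 60 three-join partial plans
round 4: 60 * 1 = 60 complete plans, matching n(n-1)/2 * (n-2)! = 10 * 3! = 60.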

6.5.2 Greedy algorithm

The second algorithm uses a greedy approach to minimise the overhead. As in the first algorithm, it starts by finding all possible two-alpha joins and their costs. However, instead of saving and working on all of them in the next round, it only saves the cheapest one. This cheapest join is then joined with all remaining alpha nodes, the cheapest resulting join is saved to be joined in the next round, and so on. Listing 6.2 shows the pseudo code of this algorithm. After joining the last alpha node, we get a new plan that is not guaranteed to be optimal, but the overhead can be significantly lower than for the first algorithm.

Procedure: Find greedy plan

Input: alpha nodes in the current plan a1, a2, a3, …, an
Output: New plan grP

1   rightParents = { a1, a2, a3, …, an }
2   leftParents  = { a1, a2, a3, …, an }
3   for (r = 0; r < n-1; r++)
4       find the cheapest feasible join between a node in leftParents and a node in rightParents
5       add this join to grP and increment grP.cost with its estimated cost
6       remove the joined alpha node(s) from rightParents
7       leftParents = { the new join node }
8   return grP

Listing 6.2: Greedy plan algorithm

While the number of join nodes created in the first round is the same as in the previous algorithm, it is significantly lower in the next rounds. In the second round, instead of joining every partial plan from the first round with the n-2 remaining alphas, only the cheapest partial plan is joined, resulting in n-2 partial plans of two-join size. Ultimately, there will be one complete plan and n(n-1)/2 + (n-2) + (n-3) + … + 1 = n(n-1)/2 + (n-1)(n-2)/2 partial plans. This is also based on a star-shaped rule; other shapes require a smaller number of partial plans.
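For the same illustrative star-shaped rule with n = 5 alpha nodes, the greedy algorithm creates only 10 + 3 + 2 + 1 = 16 join nodes in total, matching n(n-1)/2 + (n-1)(n-2)/2 = 10 + 6 = 16, compared with the 10 + 30 + 60 + 60 = 160 join nodes created by the optimal algorithm in the example above.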

6.6 Plan migration

When the optimiser successfully finds a cheaper plan, it orders the rule engine to start a plan migration. Switching immediately from the old plan to the new plan would cause a loss of possible results, because the new plan's joins have empty memories, while the old joins' windows are full of intermediate results that could still be joined with newly arriving tuples. To avoid this problem, we run both plans simultaneously until the new plan's joins are populated with intermediate results and all the tuples in the old plan's join windows have expired. This period of time is equal to the largest window size of the input streams participating in this plan. To avoid duplicates in the output results, only the old plan's results are emitted until the plan migration stage finishes; then, the new plan emits its results, while the old plan is disconnected.
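The hand-over logic can be sketched as follows (class and method names are illustrative, not R4's actual API): both plans run side by side for one maximum window length; until the hand-over time only the old plan emits results, after which only the new plan emits and the old plan can be disconnected.

// Sketch of the parallel-track migration described above.
class PlanMigration {
    private final long handOverTime;   // start of migration + largest window size

    PlanMigration(long migrationStartTime, long maxWindowMillis) {
        this.handOverTime = migrationStartTime + maxWindowMillis;
    }

    // Only the old plan emits results while its windows still hold live tuples.
    boolean emitFromOldPlan(long now) {
        return now < handOverTime;
    }

    // The new plan takes over once its memories are fully populated ...
    boolean emitFromNewPlan(long now) {
        return now >= handOverTime;
    }

    // ... at which point the old plan's windowed tuples have all expired and
    // the old plan can be disconnected from the sources.
    boolean oldPlanCanBeDisconnected(long now) {
        return now >= handOverTime;
    }
}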

6.7 Conclusion

This chapter presented our second contribution: a cost-based adaptive optimiser for RDF streams. The optimisation process in R4 runs as follows: first, R4 produces an initial plan based on a number of known optimisation heuristics, such as plan sharing and avoiding Cartesian products. After the initial plan starts running, the monitor collects statistics about the performance of the operators, which are then used by the adaptive optimiser to find a cheaper plan. The adaptive optimiser estimates the costs of different plans based on a cost model. When a cheaper plan is found, the optimiser instructs the rule engine to start the migration process. In the next chapter, we validate the cost model and evaluate the adaptive optimiser's performance in different settings.


Chapter 7: Evaluating R4’s Optimiser

Chapter 5 presented a comparative evaluation of the main reasoning engine. As the previous chapter addressed the optimisation problem in semantic stream processors, detailing the optimisation process carried out by R4, we now evaluate the performance of R4’s optimiser. Before we evaluate the adaptive optimiser, which responds to changes in streams’ characteristics by adapting the order of join nodes in its Rete networks, we investigate two known optimisation techniques: namely, minimising window sizes (Motwani et al., 2003) and sharing parts of the Rete networks between different rules (Forgy, 1982).

In Section 7.1, we examine the effect of minimising window size on the processing time and completeness of results. Then, Section 7.2 tests the ability of R4’s optimiser to share operators between different rules and shows how this can improve the performance of the system. Finally, Section 7.3 evaluates the adaptive optimiser: first, by verifying its cost model and, second, by comparing the costs of the plans chosen by the adaptive optimiser with the plans chosen by the static optimiser in both stable and unstable streaming conditions. For all these experiments, we apply the same datasets used in the real-world scenario in the SemSorGrid4Env project described in Chapter 5, Section 5.1.

7.1 Quality of results vs. window size

In the information retrieval domain, in general, the measurements most commonly used to test the quality of results are precision and recall (Manning et al., 2008). The precision of a system is the proportion of retrieved material that is actually relevant. In contrast, the recall of a system is the proportion of relevant material that is actually retrieved (Van Rijsbergen, 1979). This pair is widely used to measure what is known as the effectiveness of the retrieval system. In other words, it is a ‘measure of the ability of the system to retrieve relevant documents while, at the same time, holding back non-relevant ones’ (Manning et al., 2008).

Stream processing systems try to maximise both the precision and recall of the results they produce. However, it is not always possible to produce 100% correct and complete results when the systems are limited to a bounded amount of memory (Babcock et al., 2002). Approximation techniques used to handle the potentially unbounded input streams can affect both the precision and recall of the produced answers. The sampling technique provides approximate answers that are not always correct, while the windowing technique provides incomplete but correct answers. The incompleteness here is with regard to the whole input stream, and applies when windowing is used merely as a resource management technique, i.e. not as part of the intended semantics of the rule. When, instead, windows are used as temporal constraints, results should be both correct and complete with regard to the applied window. As we do not use sampling techniques, we focus on examining the recall measurement21.

To evaluate the quality of R4's results, we observed the completeness of results while applying different window sizes. We ran the system on two datasets – Dataset2 and Dataset3 – from Table 5.1, applying a rule that asks for all available swell period measurements observed by a sensor that has also observed a high wave (Listing 7.1). For every high wave observation, the system should search the windows for all swell period observations by this sensor. We stream the dataset, sending 169 observations (2,366 triples) every second instead of every half hour as in the real-life streaming rate. For the second dataset, which contains all half-hourly observations from a whole day, a 48-second window is therefore equivalent to a 24-hour window at the real-life streaming rate, and so should find all swell period values for every sensor with a high wave reading. Reducing the window size caused the system to find only the swell period readings observed near the high wave observation.

Forall ?s ?v ?v2 (
  If And(
    ?ob [ssn:observedProperty -> waves:Wind_Wave_Height]
    ?ob [ssn:observedBy -> ?s]
    ?ob [ssn:observationResult -> ?result]
    ?result [ssn:hasValue -> ?value]
    ?value [ssnExt:hasQuantityValue -> ?v]
    External (pred:numeric-greater-than(?v 4.00))
    ?ob2 [ssn:observedProperty -> waves:Swell_Period]
    ?ob2 [ssn:observedBy -> ?s]
    ?ob2 [ssn:observationResult -> ?result2]
    ?result2 [ssn:hasValue -> ?value2]
    ?value2 [ssnExt:hasQuantityValue -> ?v2])
  Then ?s [ssnExt2:swellValue -> ?v2])

Listing 7.1: Completeness of results experiment rule

We first used Jena with the same rule and datasets to find the number of complete results for each dataset. To calculate the recall, we simply divided the number of retrieved results by the number of complete results.

21 We compared R4’s output to Jena’s output and ensured that all retrieved results are correct.
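As an illustration using the Dataset 2 figures reported in Table 7.1: recall = retrieved results / complete results, so a 2-second window that retrieves 21 of the 40 complete results has a recall of 21/40 ≈ 0.53.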


Table 7.1 and Figure 7.1 show recall values increasing in correlation with increasing window size. For each run, we also calculated the average response time. The table also shows the trade-off between response time and completeness of results: while a larger window size means more results, it also means the response time will increase, as the system spends more time processing the larger memories.

Figure 7.1: Window size vs. recall (recall vs. window size in seconds; series: Dataset2, Dataset3)

Table 7.1: Recall and response time for different window sizes

Dataset     Window size (s)   No. of results   Recall   Avg. response time (ms)   Std. deviation
Dataset 2   1                 11               0.28     2.52                      0.23
            2                 21               0.53     2.88                      0.39
            4                 36               0.90     3.04                      0.56
            6                 40               1.00     3.20                      0.84
            12                40               1.00     3.20                      0.66
Dataset 3   1                 65               0.10     2.92                      0.80
            2                 163              0.25     3.88                      0.77
            4                 271              0.42     4.67                      0.80
            6                 339              0.52     5.91                      1.01
            12                492              0.75     6.22                      1.40
            24                613              0.94     7.99                      1.08
            36                644              0.99     8.47                      1.24
            48                653              1.00     6.65                      1.16


7.2 Operator Sharing

Forall ?ob ?v ?s (
  If And(
    ?ob [ssn:observedProperty -> waves:Wind_Wave_Height]
    ?ob [ssn:observedBy -> ?s]
    ?ob [ssn:observationResult -> ?result]
    ?result [ssn:hasValue -> ?value]
    ?value [ssnExt:hasQuantityValue -> ?v]
    External (pred:numeric-greater-than(?v 2.00)))
  Then ?s [ssnExt2:highAlert -> ?v])

Forall ?ob ?v ?avg (
  If And(
    ?ob [ssn:observedProperty -> waves:Wind_Wave_Height]
    ?ob [ssn:observedBy -> ?s]
    ?ob [ssn:observationResult -> ?result]
    ?result [ssn:hasValue -> ?value]
    ?value [ssnExt:hasQuantityValue -> ?v]
    (?avg = External (func:numeric-avg(?v))) 30 m)
  Then ?s [ssnExt2:average -> ?avg])

Listing 7.2: Operator sharing experiment rule

R4 enables sharing of operators and their memories between multiple rules. To examine how operator sharing can affect the memory consumption of the system, we created two rules that can share some of their operators. The two rules are then executed in two settings: they share their operators in the first setting, while they run separately in the second. For both settings, we periodically measure the memory size of all stateful operators. Then, we compare the shared network memory consumption to the unshared one.

The implemented rules are presented in RIF in Listing 7.2. The first rule finds wave heights greater than a specified threshold, along with the ID of the sensor that observed them. The second rule finds the average wave height across all 24 sensors every half hour. The unshared Rete networks are illustrated in Figure 7.2, while the shared network is depicted in Figure 7.3, which shows that, for this particular use case, most filter operators are shared, and three out of four joins are also shared.

We ran those rules on Dataset3, which contains nearly 100k triples. We set the global window size to 8 seconds (equivalent to 4 hours, as we use the same input rate as in the previous experiment). All alpha and beta memories report their sizes every second. We calculate the total size of these memories every second in both settings and compare them to show the effect of operator sharing on memory consumption. Full numerical results can be found in Appendix C. Here, we present two charts: one shows the totals of memory sizes for both settings, and the other shows the growth of all window sizes over time.

Figure 7.2: Two separate Rete networks for two rules without sharing

Figure 7.3: Shared Rete network for two rules


Figure 7.4: Total memory sizes for shared and unshared plans (total memory size in triples vs. running time in seconds; series: Shared plan, Unshared plan)

Figure 7.4 simply shows that the shared plan consumes less memory than the unshared one. The extra memory saved by the shared plan can be exploited by enlarging the shared windows in order to maximise coverage and increase the completeness of the results.

Figure 7.5: Individual memories growth over time (memory size in triples vs. running time in seconds; series: a1/b1/b2/b5, a2/a3/a4/a6, a5, b3, b4)


Figure 7.5 shows the behaviour of the windows in the shared plan over time. The windows all start from zero and grow incrementally until they reach their predefined limit (8 seconds), at which point they start dropping expired tuples while receiving new tuples at the same time. The pace at which they evolve, however, differs. The figure shows that the largest memory sizes belong to the alpha memories whose single filter condition matches only the predicate (a2, a3, a4, and a6). In this use case, these predicates are not selective, as there is one triple matching each predicate in every observation. The first alpha memory (a1) is connected to a more selective filter, and its selectivity determines the selectivities of the join nodes that follow it (b1, b2, b5). The lowest memory size is associated with b4, the beta memory of the last join in the first rule, which connects five triple patterns, two of which are selective. Alpha and beta memories of the unshared plan follow the same patterns, but as some nodes and their memories are duplicated, the memory cost is higher.

7.3 Evaluating the adaptive optimiser

R4 has an adaptive optimiser that responds to changing streams’ characteristics during runtime by adapting the order of join nodes in the running networks. The adaptive optimiser communicates with the monitor to receive real-time statistics about the cost of the running plan, tries to find a cheaper plan based on a cost model, and instructs the rule engine to migrate to the new plan if found.

Before evaluating the performance of the adaptive optimiser, we need to validate the employed cost model. The experiments in Section 7.3.1 test the optimiser's ability to correctly estimate the performance of different plans by comparing measured plan costs with estimations based on the cost model. Section 7.3.2 evaluates the decisions taken by the adaptive optimiser under stable conditions. We compare the cost of the plan chosen by the adaptive optimiser (based on the cost model) to the cost of the initial plan chosen by the static optimiser. Finally, Section 7.3.3 repeats the previous experiment under unstable conditions, where stream conditions change during the lifetime of the running rule.

7.3.1 Verifying the cost model

The cost-based optimisation process in R4 uses real-time statistics about the input streams to measure the cost of the running plan and to estimate the costs of possible alternative plans using the cost model presented in Section 6.3. The cost model is a vital part of the optimisation process, as it can lead to wrong decisions if it does not estimate the costs of different plans correctly. While it is not expected to provide accurate measures of the costs, it should be able to correctly rank different plans based on their costs, especially when their costs vary significantly. This section presents experiments that compare the estimated costs of different plans to their actual costs.

7.3.1.1 Methodology

To verify the cost model, we run two experiments for two rules: simple and complex. The simple one (triple patterns are shown in Listing 7.3) takes input from a single stream and has intra-graph joins only, while the complex rule (Listing 7.4) takes input from multiple streams and has both intra- and inter-graph joins. For each experiment, we compare the measured and estimated costs of all feasible plans. Feasible plans are plans that follow the shared variable condition employed by the static optimiser.

Both measured and estimated costs are calculated using the cost model. However, the measured cost is based on actual run-time statistics of input and output rates, beta memories' sizes, and operators' selectivities, whereas the estimated cost is based on estimations of operators' output rates, beta memories' sizes, and operators' selectivities. To further verify the cost model in the first experiment, we compare the measured plan cost – based on the cost model – to the actual performance of the plan in terms of its response time, i.e. the time from receiving new elements to producing the last output of each update.

The first experiment's rule is a basic pattern match with three conditions (or triple patterns) that checks whether an observation's result is above a specified threshold. For this rule, there are only two feasible plans: (A⋈B)⋈C and (B⋈C)⋈A, as plan (A⋈C)⋈B violates the shared variable condition. For clarity, we omit parentheses and the join symbol in the following and use the ordering of the conditions to represent plans, e.g. (A⋈B)⋈C is represented as ABC and (B⋈C)⋈A as BCA. For each plan, we print the latency, measured cost, and estimated cost every second while streaming input at a rate of 700 graphs (9,800 triples) per second. We use a 1.1-million-triple input file that contains observations collected over 10 days. We manipulated the observation results so that they only match the rule condition (greater than 100) after 70 seconds, in order to check the cost model's sensitivity to changing selectivity.


A  ?ob ssn:observationResult ?result
B  ?result ssn:hasValue ?value
C  ?value ssnExt:hasQuantityValue (?v>100)

Listing 7.3: Triple patterns of Rule 1

In the second experiment, the rule checks whether two sensors are observing the same high result. In total, there are 16 different feasible plans to implement this rule. We chose five of them that are expected to have different costs, namely ABCFED, DEFCBA, CBAFED, FEDCBA, and FCBADE. We run each plan individually, printing measured and estimated costs every second. As the plan receives input from two streams, we obtained observations of two sensors and stored them in two files to stream them at a faster rate. In this experiment, the first stream's input rate is 2000 triples/second and the second stream's input rate is 1000 triples/second, whereas the CCO sensors provide data at an output rate of 98 triples (seven graphs or observations) per 30 minutes. The content of the streams is not manipulated in this experiment, and both streams are assigned global window sizes of five seconds.

A  ?ob1 ssn:observationResult ?result1
B  ?result1 ssn:hasValue ?value1
C  ?value1 ssnExt:hasQuantityValue (?v>100)
D  ?ob2 ssn:observationResult ?result2
E  ?result2 ssn:hasValue ?value2
F  ?value2 ssnExt:hasQuantityValue (?v>100)

Listing 7.4: Triple patterns of Rule 2

7.3.1.2 Results

For the first experiment, Figure 7.6 compares the measured and estimated costs for each of the two alternative plans, plotted every second during the lifetime of the rule. The figure shows that the second plan (BCA) outperforms the first one in the first 70 seconds and has the same cost as the first plan after that. This is because triple pattern C is very selective at the beginning, while the selectivities of patterns A and B are equal to one, i.e. every input graph has one triple matching A and one triple matching B. This means that joining pattern C with B first results in small beta memories for the whole network. On the other hand, joining pattern A with B first results in a join node that has an output rate equal to the original stream input rate (700 matches/second), leading to a big beta memory following this join. As we manipulated the input stream so that every graph after second 70 matches triple pattern C, all joins now have the same selectivity, which means that both plans have the same cost.

Figure 7.6: Estimated and measured costs for Rule 1 plans (plan cost in seconds vs. running time in seconds; series: Plan 1 measured cost, Plan 1 estimated cost, Plan 2 measured cost, Plan 2 estimated cost)

In this setting, the figure also shows that the measured and estimated costs of both plans are identical. While this is not usually the case, the cost model can accurately calculate the cost of simple plans where only intra-graph joins are used. The input observations (graphs) represent perfect characteristic sets. In this case, the output rate of the whole plan is based on the output rate of the most selective pattern (pattern C here). As we get actual statistics from the alpha network, the selectivities and output rates of the beta network nodes – and therefore the cost – can be accurately calculated.

The close match between measured and estimated costs shows that the optimiser performs well in estimating the different parameters used to calculate the estimated cost based on the cost model, including output rates, beta memories' sizes, and operators' selectivities. Furthermore, in this setting, we also measure the actual plan latency and compare it to the measured cost to see how well the cost model formulas reflect the real-life cost. Figure 7.7 shows that, while the plan costs produced by the cost model underestimate the actual time taken to process the input, they reflect the same trend. Before 70 seconds, both show that Plan 2 outperforms Plan 1. At 70 seconds, the costs of both plans increase – Plan 1 slightly and Plan 2 significantly. After this turning point, both methods show that both plans perform equally. The underestimation could be due to the constant costs of handling one triple in the cost model formulas being underestimated, in addition to the fact that the cost model only calculates the costs of the join nodes; other costs – such as filtering at the alpha network, generating results at the terminal node, and communication between nodes – are not included.

Figure 7.7: Measured costs vs. latency for Rule 1 plans (plan cost in seconds vs. running time in seconds; series: Plan 1 latency, Plan 1 measured cost, Plan 2 latency, Plan 2 measured cost)

The second experiment also compares measured and estimated costs but for a more complex rule. The chosen plans are as follows: Plan 1 (ABCFED) matches stream 1 patterns (with the most selective pattern – pattern C – at the end) with stream 2 patterns (the most selective pattern F has to precede the other patterns, as it is the only one that shares a variable with stream 1 patterns). Plan 2 (DEFCBA) matches stream 2 patterns (with the most selective pattern – pattern F – at the end) with stream 1 patterns. Plans 3 and 4 (CBAFED and FEDCBA) are similar to the previous two but with the most selective pattern of each stream first. The last plan (FCBADE) is different, as it performs the inter-graph join first. As expected, Figure 7.8 (with measured costs) shows that Plans 3 and 4 outperform Plans 1 and 2, as they have the more selective patterns first. Among these, plans with stream 2 patterns first outperform their equivalent plans where stream 1 patterns are placed first, because stream 2 has a lower input rate than stream 1. Plan 5 performs between Plans 2 and 3.

More importantly, this ranking is preserved in Figure 7.9, which shows estimated costs of the five plans22. While the measured and estimated costs do not exactly match, the cost model was accurate enough to rank different plans correctly.

Figure 7.8: Measured costs for Rule 2 plans (plan cost in seconds vs. running time in seconds; series: measured costs of Plans 1–5)

7.3.2 Optimisation performance under stable conditions

Having ensured that the cost model can reasonably estimate the costs of different plans, we now measure the performance gains obtained by employing the cost-based adaptive optimiser. The absence of stream statistics beforehand means that the initial plan chosen by the static optimiser can be inefficient. However, as soon as the monitor collects some statistics, the adaptive optimiser uses them to find and switch to a more efficient plan. In this section, we compare the measured costs of the initial plan and the new plan chosen by the adaptive optimiser when stream conditions – input rates and selectivities – are stable.

22 We split the results into two figures, as they do not look clear in a single graph; raw results and a single graph for measured and estimated costs can be found in Appendix C.

Figure 7.9: Estimated costs for Rule 2 plans (plan cost in seconds vs. running time in seconds; series: estimated costs of Plans 1–5)

7.3.2.1 Methodology

We apply the same two – simple and complex – rules from the previous section (Listings 7.3 and 7.4). For each, we first run the static optimiser only, identify the chosen plan, ask the monitor to print the measured cost of the running plan every second until the end of the streamed files, and find the average cost. We apply the same process using the adaptive optimiser twice, first using the greedy algorithm and then using the optimal algorithm. We only show the cost of the plan chosen by the optimal algorithm if it differs from the one chosen by the greedy algorithm. Moreover, to show the overhead of adaptivity, we run the adaptive optimiser (using the greedy algorithm) in two settings: the first switches to the newly chosen plan immediately, without employing the migration process (during which both old and new plans are running), and the second applies the plan migration process.

For each rule, we compare the costs of plans chosen by the static and adaptive optimisers using a variety of parameters that affect plan costs. These include different global window sizes for the input streams, different stream input rates, different operators' selectivities, and different numbers of join nodes.

7.3.2.2 Results

Varying window sizes:

For the first rule, Figure 7.10 compares the costs of different plans using window sizes varying from 1 to 15 seconds. The selectivity of operators reflects the real-life sensor data, as we have not manipulated the content of the streams, while the input rate is fixed at 70 graphs/second (almost 1000 triples/second). The initial plan generated by the static optimiser follows the same order in which the rule was written (i.e. ABC), while both the greedy and optimal algorithms of the adaptive optimiser chose the order CBA, putting the most selective pattern first. For all the window sizes used, the adaptive plan's cost is 44% less than that of the initial plan.

We also notice that increasing the window size only slightly increases the plan cost for both static and adaptive plans. A bigger window size means elements stay longer in the memories, affecting mainly the probe cost (and possibly the result generation cost, as there is a bigger chance of matches), but not the insertion or deletion costs. Thanks to the hash-based implementation of alpha and beta memories, bigger windows do not dramatically increase plan costs.

However, increasing window sizes has another side effect, which is a more expensive plan migration. The figure shows this clearly: for the smallest window size, the plan migration stage caused less than a 2% increase in the adaptive plan cost, while the largest window (15 seconds) caused almost a 20% increase. This behaviour is expected, as our plan migration strategy runs both the old and new plans for an amount of time equal to the window size in order to ensure complete results (with regard to the requested window size). The plan costs presented are averaged over a period of 100 seconds. Running the same rule for a longer time under the same stable conditions reduces the cost of migration relative to the static plan, i.e. the static plan cost will remain the same, while the high cost of migration will be averaged over a longer period, resulting in a lower cost.


Figure 7.10: Average plans’ costs using different window sizes for Rule 1 (plan cost in seconds vs. window size in seconds: 1, 3, 5, 7, 11, 15; series: Static, Adaptive, Adaptive with migration)

For the second rule, we chose window sizes of one and five seconds, trying their different combinations for the two streams, as shown in Figure 7.11. Both streams have the same input rate of 1000 triples/second. The static plan puts the first stream's triple patterns in the same order as they appear in the rule (ABC) and then the second stream's patterns in reversed order (the most selective pattern first: FED) in order to follow the shared variable condition. On the other hand, the greedy algorithm finds that joining the most selective patterns of the two streams (C and F) is the cheapest among the two-alpha-node networks. It then joins the remaining patterns of the first stream (A and B) and then the patterns of the second stream (D and E) for the first, second, and last settings, producing the plan CFBAED. For the third setting, as the window of the first stream is bigger, it joins the remaining patterns of the second stream before those of the first stream, producing the plan CFEDBA. The optimal algorithm chooses a different order: CBFEAD for the first, second, and last settings and FECBDA for the third setting.

We notice that the optimal plans only marginally outperform the plans generated by the greedy algorithm. Gains over the static plans are between 15% and 25%, which is less than in the previous experiment. The reason could be that the second half of this rule is already optimised in the initial plan, which places the most selective pattern first in order to follow the shared variable static optimisation technique. Finally, we notice that plan migration costs are similar in the last three settings, as the adaptive optimiser has to run both old and new plans until the bigger window of the two streams finishes.


Figure 7.11: Average plans’ costs using different window sizes for Rule 2 (plan cost in seconds vs. window sizes of the two streams in seconds: 1,1 / 1,5 / 5,1 / 5,5; series: Static, Adaptive-greedy, Adaptive-greedy with migration, Adaptive-optimal)

Varying input rates:

In this experiment, we fix the window sizes at five seconds and vary the streams' input rates. Figures 7.12 and 7.13 compare the costs of static and adaptive plans for different input rates for Rules 1 and 2. The same observations made in the previous experiment apply here: both the greedy and optimal algorithms pick the same plan for the simple rule and different plans for the second rule, but with marginal cost differences, and gains over the static plan are higher for Rule 1. An important difference, however, is the steep increase of plan costs at higher input rates. Unlike increasing window sizes, which only affect the probing cost, increasing input rates affect all join operations: higher rates mean more insertions, leading to more deletions when tuples expire, and also to more probing operations.


Figure 7.12: Average plans' costs using different input rates for Rule 1 (plan cost in seconds vs. input rate in triples/second: 1000, 3000, 5000, 7000, 11000; series: Static, Adaptive, Adaptive with migration)

Figure 7.13: Average plans' costs using different input rates for Rule 2 (plan cost in seconds vs. input rates of the two streams in triples/second; series: Static, Adaptive-greedy, Adaptive-greedy with migration, Adaptive-optimal)

Varying selectivities:

Operators’ selectivities are based on the values contained in the streamed data. Selectivities of operators are measured using the selectivity factor, which takes values between 0 and 1. A low selectivity factor near 0 means that the operator is highly selective, producing only a small fraction of its input as output. On the other hand, a high selectivity factor near 1 means that the operator produces most of its input as output, indicating low selectivity.
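For example (with illustrative numbers), an operator that outputs 50 matching tuples for every 1,000 input tuples has a selectivity factor of 50/1000 = 0.05 and is therefore highly selective, whereas an operator that passes 900 of the 1,000 tuples has a selectivity factor of 0.9 and is barely selective at all.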

In the previous two experiments, we controlled window sizes and input rates but did not manipulate the actual dataset, so the selectivities reflect real-world observation values. In this experiment, we fix window sizes and input rates while varying the input data values to produce different selectivities. We prepared several datasets, changing the observation results (triples matching triple pattern C) in each of them to match a certain selectivity. For example, half of the observations in the ‘0.5’ dataset match the rule conditions, and all of the observations in the ‘1’ dataset make it to the output.

Figure 7.14 shows that the best gains are obtained when the selectivity factor is low. When the selectivity factor goes up to 0.75, the cheaper plan found by the adaptive optimiser (CBA) becomes more expensive than the static plan (ABC) once the migration costs are included. This situation can be avoided by applying a threshold before changing to any cheaper plan, e.g. switching only if the new plan is at least 25% cheaper than the current plan. When the selectivity factor is 1, the adaptive optimiser finds that the cost of the new plan is the same as the cost of the old one (as all triple patterns now have the same selectivity) and therefore does not change plans.

Figure 7.14: Average plans' costs using different selectivities for Rule 1 (plan cost in seconds vs. selectivity factor: 0.01, 0.1, 0.25, 0.5, 0.75, 1; series: Static, Adaptive, Adaptive with migration)

For the second setting, we join the stream of ‘0.1’ selectivity with streams of ‘0.01’, ‘0.05’, and ‘0.1’ selectivities. The results are similar to the previous setting, with better gains at lower selectivity factors. In this case, however, the greedy algorithm failed to find a cheaper plan: it chooses to join patterns C and F first (as described in the varying-window-sizes experiment), which results in a plan that is more expensive than the static plan.


Figure 7.15: Average plans' costs using different selectivities for Rule 2 (plan cost in seconds vs. selectivities of the two streams: 0.1:0.01, 0.1:0.05, 0.1:0.1; series: Static, Adaptive-optimal, Adaptive-optimal with migration)

G  ?ob rdf:type ssn:Observation
H  ?ob ssn:featureOfInterest :PhysicalMetOcean
I  ?ob ssn:observedProperty :Mean_Wave_Direction
J  ?ob ssn:observedBy ?sensor
A  ?ob ssn:observationResult ?result
B  ?result ssn:hasValue ?value
C  ?value ssnExt:hasQuantityValue (?v>100)

Listing 7.5: Triple patterns added to Rule 1

Varying number of joins:

In this experiment, we check how the adaptive optimiser's performance scales as the number of joins increases. For the first setting, we add one more triple pattern to Rule 1 each time, causing one more join, until we reach the seven-triple-pattern rule presented in Listing 7.5.

Figure 7.16 shows that adding triple patterns G and H to the front of the rule causes a big increase in the static plan cost, as these patterns are not selective (every graph in the dataset satisfies them). The adaptive plan cost also increases, but to a lesser extent, as these patterns are placed at the end of the network. Adding the third triple pattern (I) is beneficial to the static plan, as it is a selective pattern. The adaptive plan cost is unchanged, as its first join of C and B is already more selective than this pattern.


A  ?ob1 ssn:observationResult ?result1
B  ?result1 ssn:hasValue ?value1
C  ?value1 ssnExt:hasQuantityValue (?v>100)
D  ?ob2 ssn:observationResult ?result2
E  ?result2 ssn:hasValue ?value2
F  ?value2 ssnExt:hasQuantityValue (?v>100)
K  ?ob3 ssn:observationResult ?result3
L  ?result3 ssn:hasValue ?value3
M  ?value3 ssnExt:hasQuantityValue (?v>100)

Listing 7.6: Rule 2 after joining one more stream

With Rule 2, instead of adding more triple patterns to the two parts of the rule concerning the two input streams, we add more streams, asking for matching observation results across three and four streams. The original Rule 2 matches two streams with five joins; adding one and then two more streams increases the number of joins to eight and 11, respectively. As in the previous experiments varying window sizes and input rates, the optimal algorithm produces plans that only slightly outperform the greedy plans for five and eight joins. With 11 joins, however, the current implementation of the optimal algorithm crashes with an out-of-memory exception. The total number of possible plans (disregarding the shared variable condition) is equal to 10!, i.e. 3,628,800 plans. Even for more efficient implementations of the algorithm, we expect that it would take so long to find a good plan that stream conditions might change again before the new plan is applied, rendering it useless. For big beta networks in a highly fluctuating stream environment, the fast but suboptimal greedy algorithm is preferable.


Figure 7.16: Average plan’s cost for an increasing number of joins in Rule 1 (plan cost in seconds vs. number of joins in the plan: 2–6; series: Static, Adaptive, Adaptive with migration)

Figure 7.17: Average plan’s cost for an increasing number of joins for Rule 2 (plan cost in seconds vs. number of joins: 5, 8, 11; series: Static, Adaptive-greedy, Adaptive-greedy with migration, Adaptive-optimal)

7.3.3 Optimisation performance under unstable conditions

The experiments in the previous section showed that the adaptive optimiser was able to produce a cheaper plan in almost all cases. However, the gain over the static plan depends mainly on the rule itself and on whether there is any room for improvement. For example, if the simple rule (Rule 1) had originally been written with the most selective pattern first, the adaptive optimiser would not have been able to provide any improvement. Yet the adaptive optimiser's main goal is not only to provide a better plan than the initial one but to adapt to a changing environment. In this section, we evaluate the adaptivity of the optimiser by testing it in an unstable environment, where stream conditions change during the lifetime of the rule.

7.3.3.1 Methodology

We observe the costs of the static and adaptive plans at runtime while varying stream conditions. Of the four parameters that affect plan costs tested in the previous section, only input rates and selectivities are properties of the streams; window sizes and the number of joins are properties of the rules set by the user, and are therefore not expected to change rapidly at runtime. We therefore conduct two experiments measuring how the optimiser responds to changes in stream input rates and to changing selectivities. For both experiments, we use the rule presented in Listing 7.6, which joins three streams, checking whether they all observe the same value. All streams in both experiments are observed through a window size of three seconds.

In the first experiment, we run the rule for 60 seconds. During the first 20 seconds, we set the first stream to a high input rate of 3000 triples/second, while the second and third streams have medium, 300 triples/second, and low, 30 triples/second, input rates, respectively. At 20 seconds, we swap the input rates of the first and third streams so that the first has the low rate and the last has the high rate while the second remains the same. After another 20 seconds, we swap the rates of the first and the third streams again so that they return to their original rates.

For the second experiment, a similar setting is used. We prepare three input streams with manipulated observation values to reflect the following selectivities: high (0.1), medium (0.05), and low (0.01) selectivity factors for the first, second, and third streams, respectively. After 45 seconds, the data in the three streams begin to reflect different selectivities: the first stream becomes highly selective (low selectivity factor, 0.01), and the third stream's data become less selective (high selectivity factor, 0.1). At 90 seconds, they revert to their original selectivities.

7.3.3.2 Results

Figures 7.18 and 7.19 show the effect of changing input rates and selectivities on the costs of the static and adaptive plans. A similar pattern is observed in both figures23, showing that the adaptive optimiser immediately responds to the changing conditions by switching to a more efficient plan, maintaining a low cost for most of the running time. The static plan cost, on the other hand, rises and falls with the changing conditions without any control.

23 We note that Figure 7.19 shows neater patterns, because selectivities, input rates, and window sizes are all controlled, while the selectivities in Figure 7.18 reflect unmanipulated real-world data.

During the first part of the experiment, the static plan follows the order of the original rule, joining the first stream with the second and then the third. This results in a high cost, as the first stream has the highest input rate in the first experiment and the highest selectivity factor (i.e. it is the least selective) in the second. The adaptive optimiser chooses to join the third stream with the second and then the first, producing a lower-cost plan. When the first change occurs, the cost of the static plan drops, as the first stream now has the lowest input rate (or is the most selective in the second setting). The adaptive optimiser immediately notices that its current plan (which joins the third stream first) has become inefficient and successfully changes to a plan that joins the first stream first (as in the static plan). However, it goes through a short period of high cost during plan migration, as it has to run both plans simultaneously. After the second change, the static plan returns to its original high cost, while the adaptive optimiser successfully changes plans to join the third stream first again, maintaining its low cost after the end of the migration.

Figure 7.18: Adaptivity while changing input rates (plan cost in seconds vs. running time in seconds; series: Static, Adaptive)


Figure 7.19: Adaptivity while changing stream selectivities (plan cost in seconds vs. running time in seconds; series: Static, Adaptive)

7.4 Conclusion

This chapter has presented experiments that investigate the trade-off between window sizes and completeness of results and the advantages of operator sharing, along with a thorough evaluation of R4's adaptive optimiser.

Observing the increased completeness of results and increased processing time in correlation with increasing window sizes in the first experiment (Section 7.1) positively supports our second hypothesis.

Hypothesis 2: It was expected that the trade-off between the completeness of the output results and the processing time could be controlled by varying the resource allocation.

Second, the higher total memory consumption of the unshared plan compared to the shared one in the second experiment (Section 7.2) positively supports our third hypothesis.

Hypothesis 3: It was anticipated that resource usage would be reduced by our approach of enabling node sharing, where possible, between the rules.

Finally, the different experiments carried out in sections 7.3.2 and 7.3.3 show the effectiveness of the adaptive optimiser in maintaining lower plan costs – even under changing conditions – which positively supports our fourth hypothesis.


Hypothesis 4: It was anticipated that system performance would be improved by monitoring the characteristics of the streams in order to re-organise the reasoning networks.

However, there are two main factors that we have not considered in our evaluation of the adaptive optimiser. First, our evaluation is based on only two rules, which may have introduced a degree of bias into the results; future evaluation could improve on this by considering a wider range of rules corresponding to use cases beyond the one considered here. Second, while we considered the overhead associated with plan migration, the overhead associated with monitoring has not been evaluated. Although it can be controlled by adjusting the monitoring interval (which affects adaptivity), the monitoring overhead should be evaluated and added to the cost of adaptive optimisation.


Chapter 8: Conclusions and Future Work

The work presented in this thesis falls within the semantic stream reasoning domain, which aims to integrate semantic reasoning techniques with data streams. This can potentially fill the gap between the IoT paradigm (where data streams are generated and processed, but with no standard format or semantic reasoning abilities) and the Semantic Web area (where reasoners work on standardised semantic data efficiently, but not at a high change rate, as in streams).

The primary aim of this research was to enable efficient rule-based reasoning over RDF data streams, where reasoning is implemented natively over streams using data flow networks. We also addressed the optimisation issue by enabling the reasoning networks to change adaptively at run time, in order to cope with changing environmental conditions. A summary of the research and contributions is provided in Section 8.1, and possible improvements recommended for future work in Section 8.2.

8.1 Summary

Several requirements were identified that were relevant to our main research question (outlined in Chapter 1):

How could we efficiently (by using minimal resources and ensuring high throughput) and effectively (by providing timely results with high precision and recall) enable rule-based reasoning over RDF data streams using data flow networks?

The requirements included integrating streaming and static data, ensuring low latency, utilising memory efficiently, supporting inference, and managing a dynamic environment. The primary objective of the research for this thesis was to create R4, a rule-based reasoner for RDF streams, using the Rete algorithm. R4 is based on a continuous reasoning framework that addresses the stated requirements as follows. R4 tackles the integration requirement by enabling unified and native processing of static and streaming RDF data. While this feature made it necessary to implement R4 from scratch, rather than reuse existing semantic reasoners and stream processing systems (as in early semantic stream processing systems, such as C-SPARQL), there are two main advantages to this approach. Firstly, it avoids the overheads involved in transforming the RDF streams into the underlying stream engine's model. Secondly, it gives full control over the low-level processing plans, enabling better optimisation opportunities.

R4 supports background reasoning and domain-specific, user-defined rules using Rete networks in a unified way. This helps to address the inference support requirement. The incremental reasoning enabled by the Rete algorithm helps to keep latency to a minimum, avoiding costly re-computation. As intermediate results need to be maintained in this approach, we addressed the memory utilisation problem by using the windowing technique. Every partial or complete result was assigned an expiration time so that resource usage was kept under control as the engine continuously removed expired elements. The resource utilisation requirement was further addressed by allowing intermediate results to be shared between different plans.

R4 supports adaptive optimisation to further improve performance, and to address the requirement of managing a dynamic environment. As stream characteristics change at run time, what was at first an optimal plan may perform increasingly poorly. The running plan in R4 can be adapted to a more efficient one using a cost-based model that is specifically designed for RDF streams.

After testing the implemented R4 system using a number of use cases, we conducted a comparative evaluation of its performance, first comparing it to a static reasoner, proving our first hypothesis that a stream reasoner would outperform a static reasoner in terms of throughput and response time. We then compared R4 to two stream reasoning engines, Sparkwave and Etalis, and discovered that R4 outperformed them both. Lastly, we evaluated the performance of the adaptive optimiser using different settings (stable and changing). The adaptive optimiser generated more cost-effective plans than those generated by a static optimiser for all of these settings.

8.1.1 Contributions

R4, a continuous rule-based reasoner that works natively on RDF streams, was the main contribution of this thesis. We applied the incremental Rete algorithm to the challenge of RDF stream reasoning and tackled issues such as memory management and the sharing of memories with different window sizes. R4 creates Rete networks for background reasoning entailment rules and domain-specific, user-defined rules in the same way and connects the former to the latter. The results from both networks were assigned expiration times and re-entered into the networks to generate further results. This is different to the way in which Sparkwave (Komazec et al., 2012) applies the Rete algorithm to the problem. As Sparkwave does not support reentrancy, it handles background reasoning using a schema pre-processing step and an additional network (ε-network) that produces entailments to be used by the main Rete network. INSTANS (Rinne et al., 2012a) also uses Rete networks to incrementally process RDF streams. However, it does not discuss the challenge of background reasoning. In addition, the insertion and removal of RDF statements is handled using explicit INSERT and DELETE queries, unlike our garbage collection approach embedded in the beta nodes.

Our second contribution is the cost-based adaptive optimiser. We based our cost model on the body of work produced by the stream management community, while addressing RDF-specific issues such as selectivity estimation. Except for CQELS (Le-Phuoc et al., 2011), the semantic stream processing engines reviewed in Chapter 3 do not support adaptive optimisation. CQELS, however, employs an Eddies-based adaptivity approach at the very fine-grained level of individual triples. We believe that our more coarse-grained adaptivity, at the level of plans, is better suited to the RDF model as it avoids the overhead of adding routing information to every triple.

A further, minor contribution of this research is our extension of Rule Interchange Format (RIF) Core with window constructs, which enable users to define the time constraints to be used by the processing engine. These windows can be defined at import level, acting as a global window across all the rules in the document, or at formula level, creating local windows that override global ones.
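
Purely as an illustration of the idea (the concrete grammar is defined as part of R4's rule language in Chapter 4), a document with a global window at import level and a local window at formula level might look roughly as follows; the Window construct, its argument, and the ex: terms are assumptions of this sketch rather than the normative syntax:

Document(
  (* hypothetical global window of 60 seconds, applying to every rule in the document *)
  Import(<http://example.org/sensor-schema> Window(60))

  Group(
    (* hypothetical local window of 10 seconds, overriding the global one for this formula *)
    Forall ?s ?v (Window(10)
      If And( ?s [ex:hasValue -> ?v]
              External(pred:numeric-greater-than(?v 100)) )
      Then ?s [rdf:type -> ex:HighReading] )
  )
)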

8.2 Future Work

This research focused on enabling efficient rule-based reasoning for RDF streams. We support reasoning over rules that can be expressed in the RIF Core rule language. To enhance efficiency, we applied the incremental Rete algorithm and supported adaptive optimisation to keep performance high, even in dynamic conditions. This work could be further improved in three main directions, namely supporting adaptive control of the window size, increasing expressivity by supporting non-monotonic reasoning, and increasing efficiency and scalability by supporting distributed processing.


8.2.1 Adaptive window size

R4’s adaptive optimiser is capable of changing the processing plan at run time to a more efficient plan based on a cost model. However, in some cases (e.g. stream bursts), even the most cost-effective plan does not deliver adequate performance. Since the cost model expresses a plan’s cost as the time needed to process a number of elements equal to the input rate (defined as the number of incoming triples per second), any plan cost of 1 second or more means the plan cannot keep up with the stream. For example, if the stream input rate is 2000 triples/second and the cheapest plan cost is 2 seconds, then by the end of the first second the plan will have processed only half of the triples that arrived at its start, while another 2000 triples arrive during the second second. This accumulation of unprocessed elements means that the system loses its responsiveness.

In this case, maintaining responsiveness may be considered more important than completeness of the results. Therefore, a feasible solution is to apply window reduction. In Section 7.1, we saw how the trade-off between the completeness of results and processing times could be controlled by the window sizes. We suggest that the optimiser should be enabled to adaptively control the window sizes in order to handle situations of data overflow.
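
A minimal sketch of such an overload check is shown below, assuming the optimiser can query the cheapest plan cost (expressed, as above, as the time needed to process one second's worth of input) and exposes a hook for scaling window sizes; the function and parameter names are illustrative rather than part of R4:

def maybe_reduce_windows(cheapest_plan_cost: float, scale_windows) -> None:
    """Trade completeness for responsiveness when even the best plan cannot keep up."""
    # A cost >= 1 second per second of input means the backlog grows without bound.
    if cheapest_plan_cost >= 1.0:
        # Shrink every window proportionally; assumes plan cost scales roughly
        # linearly with window size, so this should bring the cost back below 1.
        scale_windows(factor=1.0 / cheapest_plan_cost)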

Reducing the window sizes can be carried out globally, or at the local operator level. Global window reduction can be compared to other load shedding techniques, such as sampling, whose effectiveness may differ. Cammert et al. (2008) compared the number of results, memory costs, and processing costs of window reduction and sampling, and found that window reduction returned more results than sampling.

Furthermore, window reduction can be performed locally in some nodes. In this case, nodes that contribute little to the final output results should have their window sizes reduced. To apply this technique, the monitor should be extended to continuously measure how far each node’s output propagates through the network and how much it contributes to the final results. This can be especially beneficial with ontological background reasoning, as R4 follows a forward chaining approach in which many entailed results are never actually used.

8.2.2 Expressivity

The expressivity of R4 could be improved by supporting the more expressive dialect of RIF, namely the RIF Production Rule Dialect (PRD) (de Sainte Marie et al., 2013). RIF PRD extends RIF Core mainly by allowing different forms of actions in the head part of the rule, including Assert, Retract, and Modify, thus enabling non-monotonic reasoning. RIF PRD formulas also allow negation, which is not supported by RIF Core. In its current form, our continuous rule-based reasoning framework cannot support negation and non-monotonic reasoning, as retraction is only supported for expired results. Because reasoning in our framework is incremental, retracting a triple or a token before it expires means that all subsequent nodes and memories should be notified to retract any elements that were produced as a result of the presence of that triple. As we use the time-interval and expiration-time approach, each join node independently removes expired elements from its parent memories without communicating with the other nodes. Therefore, invalidating an element before its expiration time would result in incorrect answers (false positives).

The negative tuples approach, found in the stream processing literature, is a different way of removing expired elements that avoids reliance on timestamps (Hammad et al., 2003). According to this approach, a window operator emits a positive tuple for every arriving tuple, and a negative tuple (effectively the same tuple but with a negative sign) when this tuple expires. All subsequent operators should be able to deal with positive and negative tuples. Negative tuples can be used not only to request the removal of expired tuples, but also to retract invalid tuples. However, the main drawback of this approach is that it doubles the number of tuples flowing through the network, which reduces the efficiency of the system. We expect this to affect RDF stream processing to a greater extent than relational streams because of the fine-grained nature of the RDF model: for instance, a simple relational-to-RDF mapping tool would generate five triples for a single relational tuple with five attributes. Therefore, instead of a complete shift to this approach in order to support retraction, we propose a hybrid method in which the current expiration timestamp approach is retained and negative tuples are used only when it is necessary to invalidate an element before it expires. All operator algorithms should be updated with instructions to follow upon receiving a negative tuple.
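
As a sketch of the proposed hybrid, an operator could keep the existing lazy expiration while also honouring explicit retractions; the SignedToken structure, its sign field, and the matching-by-bindings rule are assumptions made for illustration, not part of the current framework:

from dataclasses import dataclass

@dataclass
class SignedToken:
    bindings: dict
    expires_at: float
    sign: int = +1          # +1 = assertion, -1 = explicit retraction (negative tuple)

class HybridOperator:
    def __init__(self):
        self.memory = []

    def receive(self, token: SignedToken, now: float) -> list:
        # Expired elements are still removed lazily, exactly as in the current design.
        self.memory = [t for t in self.memory if t.expires_at > now]
        if token.sign > 0:
            self.memory.append(token)
            return [token]                      # propagate the assertion downstream
        # Negative tuple: invalidate the matching element before it expires and
        # propagate the retraction so that downstream memories can do the same.
        self.memory = [t for t in self.memory if t.bindings != token.bindings]
        return [token]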

8.2.3 Distribution

An important challenge in stream processing in general is distributed processing. Data streams are usually distributed in nature. For some high-volume streams produced by widely distributed sources, simply collecting all the streams to be processed on a single machine can be inefficient (Babcock et al., 2002). Performing some processing on data locally at the source (e.g. filtering) can improve performance by reducing expensive data transfer costs (e.g. SNEE (Galpin et al., 2011)). The ability to distribute processing over multiple machines also enables more scalable systems, as they can scale in two dimensions: the hardware performance of each computing node and the number of nodes (Urbani et al., 2009). A distributed stream reasoner should also be more fault-tolerant than a centralised reasoner, since it avoids a single point of failure and enables the migration of operators away from affected nodes. Furthermore, overloading problems can be avoided by supporting automatic load balancing over the available nodes.

We suggest a distributed system architecture for R4, in which data flow networks can be distributed among multiple machines that can communicate and dynamically share load with each other. Figure 8.1 shows a dataflow network distributed among several hosts. To enable this, we discuss the communication and load management problems.

Figure 8.1: A distributed dataflow network

8.2.3.1 Communication Framework

Communication on the Web is based on the request-response approach using the HTTP protocol. In this approach, a client asks for information from a server and the server responds by submitting the information to the client. This protocol is sufficient for downloading static documents. However, modern Web applications need a more dynamic approach, as HTTP does not support real-time communication in which the server can automatically push updates to clients. To overcome these drawbacks, many protocols have been developed to support the transport of streaming and real-time data. These include RTSP (Schulzrinne et al., 1998), XMPP (Saint-Andre, 2004), PubSubHubbub, and MQTT. Several factors should be considered when determining which protocol to use:

• Distribution style: a push-based protocol will better satisfy the real-time requirements
• Latency: the time it takes for data to propagate between the network nodes should be as low as possible
• The underlying transport protocol: HTTP-based protocols will be more suitable for Web applications
• Implemented libraries and active developer communities

The Real Time Streaming Protocol (RTSP) can be excluded because of its pull-based distribution style and its higher latency, as it requires at least three request-response sequences. While the other three protocols support publish/subscribe communication, MQTT can also be excluded because it is not HTTP-based. PubSubHubbub is a simple, publish/subscribe, HTTP-based protocol with latency kept to a minimum; however, it appears to have fewer implementations and libraries than XMPP. The Extensible Messaging and Presence Protocol (XMPP) is an open protocol for message-oriented middleware which can provide near real-time communication. It has many features that satisfy the above requirements: a push-based distribution style, minimal latency, a Web-based protocol, and widely deployed and tested libraries for both the server and client sides. It is also a decentralised system; anyone can run their own XMPP server.

Using XMPP as a communication protocol, each host runs as an XMPP server and client at the same time. These hosts subscribe to their preceding nodes in the network, as instructed by the optimiser. The interaction framework involves two stages: the handshaking stage and the data streaming stage. The first stage is more complicated and expensive than the second one, but it only happens once, while the lightweight second stage is long-lasting.

8.2.3.2 Dynamic Load Management

For a distributed dataflow network to perform efficiently, load needs to be balanced between the multiple hosts so that the processing capacity is used to its fullest advantage. Random allocation of tasks may result in some hosts being overloaded while other hosts are idle or lightly loaded. A load distribution mechanism is needed to transfer tasks from the poorly performing, overloaded hosts to lightly loaded hosts, so that tasks can take advantage of resources that would otherwise go unused. Furthermore, the dynamic characteristics of streaming data require an adaptive load balancing algorithm that takes account of system-state information that changes at run time.

As in the adaptive optimisation mechanism, a central network optimiser can be used to make decisions about when and where to move load between the participating hosts. These decisions are based on the run-time performance and load statistics of each host, collected and periodically reported by the local monitors of those hosts. As in the Borealis correlation load balancing algorithm (Abadi et al., 2005), we can define the load in terms of CPU utilisation for each host and operator. For a specific period of time, the load of an operator is the fraction of the CPU time needed by that operator over the length of the period. In other words, if the average tuple arrival rate in period i for operator o is λ(o) and the average tuple processing time for operator o is p(o), then the load of o in period i is λ(o)·p(o). The load of a host in a given period is defined as the sum of all its operators’ loads in that period.

We move now from the information policy of the load management mechanism to the transfer policy, i.e. deciding which hosts are suitable to participate in a task transfer, either as senders or as receivers. As used in both Flux (Shah et al., 2003) and the Borealis correlation algorithm, a relative pairwise policy can be followed. After the central optimiser receives information about the loads of all hosts, it orders the hosts by their average load. The first (most loaded) host is then paired with the last (least loaded) host, the second host with the penultimate host, and so on, such that the ith host in the ordered list is paired with the (n-i+1)th host. For each pair, the optimiser considers moving load from the first host to the second. However, to minimise the load migration overhead, which can nullify the possible benefits of the redistribution, some threshold tests can be applied to each pair before deciding to move load. If the donor’s load is less than the average load of all hosts, or if the load difference between the two hosts is less than a predefined threshold, or if the receiver’s load is above a threshold, then the pair is not considered for load redistribution and the optimiser moves to the next pair. When a pair is selected for load migration, operators are selected from the donor host to be moved based on their individual loads, such that the selected operators’ total load is less than (donor’s load – receiver’s load)/2. The load migration process then starts.
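
As a sketch of this pairwise policy under the stated threshold tests, the following assumes per-operator loads (λ(o)·p(o)) already reported by the local monitors; the Host and Operator containers, the default threshold values, and the function name are illustrative, not R4 components:

from dataclasses import dataclass, field

@dataclass
class Operator:
    name: str
    load: float              # lambda(o) * p(o) for the last reporting period

@dataclass
class Host:
    name: str
    operators: list = field(default_factory=list)

    @property
    def load(self) -> float:
        # Host load: the sum of the loads of the operators it runs.
        return sum(op.load for op in self.operators)

def plan_migrations(hosts, min_diff=0.1, max_receiver_load=0.8):
    """Pair the most loaded host with the least loaded, the second with the
    penultimate, and so on, and pick operators to move for qualifying pairs."""
    ordered = sorted(hosts, key=lambda h: h.load, reverse=True)
    avg = sum(h.load for h in ordered) / len(ordered)
    migrations = []
    for i in range(len(ordered) // 2):
        donor, receiver = ordered[i], ordered[-(i + 1)]
        # Threshold tests: skip pairs where migration is unlikely to pay off.
        if donor.load < avg or donor.load - receiver.load < min_diff \
                or receiver.load > max_receiver_load:
            continue
        budget = (donor.load - receiver.load) / 2
        moved, selected = 0.0, []
        for op in sorted(donor.operators, key=lambda o: o.load, reverse=True):
            if moved + op.load < budget:
                selected.append(op)
                moved += op.load
        if selected:
            migrations.append((donor, receiver, selected))
    return migrations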


[Figure: two hosts (Host1, Host2) running operators A, B, C, D, and F, shown before and after the move]

Figure 8.2: Moving operator upstream

The optimiser starts the load migration process by asking the donor host to suspend the execution of the selected operators to be moved. At the same time, the optimiser also asks the receiver host to instantiate these operators and provides it with the addresses of their predecessors so they can subscribe to them. The donor host is then asked to send the states of these operators to the receiver host. Once the transfer is complete, the optimiser sends alerts to the successor operators in the original plan to subscribe to the new locations and then asks the donor host to delete the moved operators.

As the above mechanism does not consider bandwidth issues, a simple optimisation technique for a distributed data flow network can also be employed as follows: for any connection with limited bandwidth, or with traffic greater than a predefined threshold, we consider moving the sending operator on the edge downstream, or moving the receiving operator on the other end of the connection upstream. First, we check the output throughput of the sending operator and compare it to the throughput of its predecessor. If the sender produces more triples than its predecessor, then the sender should be moved downstream. If it produces fewer triples, then we compare it to the receiver’s throughput. If the receiver produces less output than the sender, then we move the receiver upstream.
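
A compact sketch of this heuristic is given below; throughput here means output triples per second, and move_downstream/move_upstream stand in for whatever relocation mechanism the distributed optimiser would provide (both names are assumptions of the sketch):

def relieve_connection(sender, receiver, move_downstream, move_upstream):
    """Relieve an overloaded connection by relocating one of its endpoint operators."""
    if sender.throughput > sender.predecessor.throughput:
        # The sender amplifies traffic, so it is cheaper to ship its input instead:
        # move it to the receiving side of the connection.
        move_downstream(sender)
    elif receiver.throughput < sender.throughput:
        # The receiver reduces traffic, so apply it before the data crosses the
        # network: move it to the sending side of the connection.
        move_upstream(receiver)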

This technique is similar to the “Box Sliding” technique briefly described in Cherniack et al. (2003), where an operator with a selectivity value of more than one is moved downstream, while operators with low selectivity are moved upstream. However, in this simple form, the technique might only perform well (i.e. reduce network traffic) if both the sending and the receiving operators have only one input. If either of them has other inputs, then moving it to another site without taking those inputs into consideration may increase network traffic. An example is illustrated in Figure 8.2: the receiving operator C has another input on the same host, so moving C to Host1 because C is more selective than B will cause F to send its output using the same connection. A carefully designed load balancing algorithm is needed that takes these cases into account.


Appendices


Appendix A RDFS++ Background Reasoning

RDFS++ reasoning supports the main RDFS predicates (domain, range, subPropertyOf, and subClassOf) in addition to a number of lightweight but useful OWL predicates (sameAs, inverseOf, and TransitiveProperty). In this appendix, we present the entailment rules used to reason over these predicates, followed by a shared Rete network to evaluate these rules.

(* rdfs2 *) Forall ?x ?p ?y ?c( If And( ?p [rdfs:domain -> ?c] ?x [?p -> ?y]) Then ?x [rdf:type -> ?c])

(* rdfs3 *) Forall ?x ?p ?y ?c( If And( ?p [rdfs:range -> ?c] ?x [?p -> ?y]) Then ?y [rdf:type -> ?c])

(* rdfs5 *) Forall ?x ?y ?z( If And( ?x [rdfs:subPropertyOf -> ?y] ?y [rdfs:subPropertyOf -> ?z]) Then ?x [rdfs:subPropertyOf -> ?z])

(* rdfs6 *) Forall ?x ?p ?y ?q( If And( ?p [rdfs:subPropertyOf -> ?q] ?x [?p -> ?y]) Then ?x [?q -> ?y])

(* rdfs9 *) Forall ?x ?y ?a( If And( ?x [rdfs:subClassOf -> ?y] ?a [rdf:type -> ?x]) Then ?a [rdf:type -> ?y])

(* rdfs11 *) Forall ?x ?y ?z( If And( ?x [rdfs:subClassOf -> ?y] ?y [rdfs:subClassOf -> ?z]) Then ?x [rdfs:subClassOf -> ?z])

(* owlinv *) Forall ?x ?p ?y ?q( If And(?x [?p -> ?y] ?p [owl:inverseOf -> ?q]) Then ?y [?q -> ?x])

(* owlinv2 *) Forall ?p ?q( If ?p [owl:inverseOf -> ?q] Then ?q [owl:inverseOf -> ?p])


(* owltra *) Forall ?x ?p ?y ?z( If And(?x [?p -> ?y] ?y [?p -> ?z] ?p [rdf:type -> owl:TransitiveProperty]) Then ?x [?p -> ?z])

(* owlsame *) Forall ?x ?p ?y ?z( If And(?x [?p -> ?y] ?x [owl:sameAs -> ?z]) Then ?z [?p -> ?y])

(* owlsame2 *) Forall ?x ?y( If ?x [owl:sameAs -> ?y] Then ?y [owl:sameAs -> ?x])

Listing A.1: RDFS++ rules in RIF Core

[Figure: shared Rete network with selection nodes (σ) filtering the ontology and data triples (?s ?p ?o) by predicate (rdfs:domain, rdfs:range, rdfs:subPropertyOf, rdfs:subClassOf, rdf:type, owl:inverseOf, owl:sameAs, owl:TransitiveProperty), and join nodes (⋈) feeding the rules rdfs2, rdfs3, rdfs5, rdfs6, rdfs9, rdfs11, owltra, owlinv, owlinv2, owlsame, and owlsame2]

Figure A.1: RETE network for RDFS++ rules



Appendix B Raw Comparative Evaluation Data

This appendix contains the raw evaluation data for some of the experiments described in Chapter 5. Section B.1 contains the processing time results for the first and second experiments described in Section 5.2.1, while Section B.2 contains the processing times of the systems in the experiments of Section 5.2.2.

B.1 R4 vs. Jena vs. JenaRete

First experiment

Processing time (seconds)

System      Dataset size (triples)   Run 1    Run 2    Run 3    Run 4    Run 5    Average   Standard deviation
Jena        2366                      0.31     0.28     0.27     0.29     0.27     0.29      0.02
Jena        14098                     0.55     0.48     0.47     0.49     0.47     0.49      0.03
Jena        114562                    1.76     1.86     1.76     1.61     1.65     1.73      0.10
Jena        462560                    5.80     5.34     5.49     6.07     5.60     5.66      0.29
Jena        1121974                  14.49    14.49    14.31    14.60    14.25    14.43      0.15
JenaRETE    2366                      0.27     0.24     0.23     0.24     0.23     0.24      0.02
JenaRETE    14098                     0.46     0.41     0.41     0.41     0.41     0.42      0.03
JenaRETE    114562                    1.49     1.47     1.47     1.36     1.41     1.44      0.06
JenaRETE    462560                    4.83     4.78     4.85     4.69     4.78     4.79      0.06
JenaRETE    1121974                  11.98    13.61    12.74    13.13    12.80    12.85      0.60
R4          2366                      0.18     0.12     0.12     0.12     0.12     0.13      0.03
R4          14098                     0.45     0.36     0.35     0.35     0.36     0.37      0.04
R4          114562                    1.67     1.68     1.65     1.82     1.60     1.68      0.08
R4          462560                    4.05     3.97     3.97     4.12     3.81     3.98      0.12
R4          1121974                   8.10     8.03     7.67     8.19     8.17     8.03      0.21


Second experiment

Setting 1:

Update Response delay (milliseconds) Standard System number Run 1 Run 2 Run 3 Run 4 Run 5 Average deviation 1 188 163 165 165 159 168.00 11.45 2 70 63 61 64 59 63.40 4.16 3 49 46 44 44 46 45.80 2.05 4 45 40 37 34 39 39.00 4.06 5 36 34 33 32 33 33.60 1.52 6 34 36 35 34 33 34.40 1.14 7 42 39 41 39 41 40.40 1.34 8 52 53 49 50 53 51.40 1.82 9 51 53 48 49 56 51.40 3.21 10 33 33 33 29 32 32.00 1.73 11 32 33 32 29 33 31.80 1.64 12 33 32 33 28 33 31.80 2.17 13 33 33 33 30 33 32.40 1.34 14 34 34 54 53 54 45.80 10.78 15 49 49 31 29 31 37.80 10.26 16 44 42 44 45 43 43.60 1.14 17 32 29 31 25 31 29.60 2.79 18 31 30 32 28 32 30.60 1.67 19 36 31 35 30 35 33.40 2.70 20 29 25 30 25 39 29.60 5.73 21 279 291 273 341 346 306.00 34.89 22 214 254 216 231 232 229.40 16.06 23 214 224 218 223 242 224.20 10.73 24 210 236 221 206 218 218.20 11.63 25 178 183 186 185 185 183.40 3.21 26 182 186 194 193 198 190.60 6.47 27 174 174 181 166 180 175.00 6.00 28 228 197 216 229 232 220.40 14.43 29 244 231 240 246 256 243.40 9.10 30 162 203 165 175 174 175.80 16.21 31 162 166 167 168 169 166.40 2.70 32 165 167 170 168 168 167.60 1.82 Jena 33 176 166 179 182 180 176.60 6.31


34 162 176 168 171 168 169.00 5.10 35 163 162 169 164 169 165.40 3.36 36 162 162 170 161 171 165.20 4.87 37 187 160 190 178 189 180.80 12.56 38 209 195 210 202 191 201.40 8.39 39 174 188 199 166 160 177.40 15.99 40 174 164 181 162 167 169.60 7.83 41 172 162 234 159 164 178.20 31.56 42 188 163 171 166 178 173.20 10.04 43 159 162 182 178 159 168.00 11.11 44 164 180 188 165 166 172.60 10.81 45 165 164 162 165 166 164.40 1.52 46 167 166 166 168 167 166.80 0.84 47 171 162 193 181 180 177.40 11.63 48 160 177 160 185 188 174.00 13.40 1 205 147 149 151 148 160.00 25.20 2 48 52 51 57 53 52.20 3.27 3 38 35 38 45 35 38.20 4.09 4 32 35 35 37 34 34.60 1.82 5 38 40 39 40 39 39.20 0.84 6 27 29 28 27 26 27.40 1.14 7 26 27 30 29 26 27.60 1.82 8 38 40 43 38 38 39.40 2.19 9 30 32 34 32 31 31.80 1.48 10 31 32 34 32 27 31.20 2.59 11 34 32 34 33 28 32.20 2.49 12 30 31 34 35 29 31.80 2.59 13 30 31 31 33 29 30.80 1.48 14 54 50 61 59 58 56.40 4.39 15 24 26 26 27 25 25.60 1.14 16 37 39 39 37 38 38.00 1.00 17 23 26 26 24 25 24.80 1.30 18 25 27 27 26 26 26.20 0.84 19 30 31 31 27 31 30.00 1.73 20 24 26 26 21 25 24.40 2.07 21 41 43 44 45 42 43.00 1.58 22 38 40 44 40 37 39.80 2.68 JenaRETE 23 62 62 71 60 68 64.60 4.67


24 37 38 46 36 37 38.80 4.087 25 42 41 38 37 37 39.00 2.345 26 38 37 65 34 39 42.60 12.661 27 37 39 50 65 38 45.80 11.946 28 36 38 36 54 40 40.80 7.563 29 37 36 35 37 39 36.80 1.483 30 35 36 35 35 38 35.80 1.304 31 37 35 35 34 38 35.80 1.643 32 36 57 37 39 43 42.40 8.591 33 41 43 40 41 43 41.60 1.342 34 41 44 42 42 42 42.20 1.095 35 41 43 39 50 42 43.00 4.183 36 40 42 40 39 43 40.80 1.643 37 41 45 41 42 40 41.80 1.924 38 39 88 89 39 90 69.00 27.395 39 82 39 32 32 36 44.20 21.335 40 32 38 33 29 32 32.80 3.271 41 32 35 33 30 33 32.60 1.817 42 34 34 32 29 33 32.40 2.074 43 32 32 29 30 36 31.80 2.683 44 34 30 30 28 42 32.80 5.586 45 31 29 28 28 32 29.60 1.817 46 31 30 29 27 37 30.80 3.768 47 30 28 30 29 31 29.60 1.140 48 28 28 30 28 30 28.80 1.095 1 113 117 110 114 112 113.20 2.588 2 44 47 45 30 45 42.20 6.907 3 21 24 23 22 22 22.40 1.140 4 18 18 18 18 17 17.80 0.447 5 20 21 23 21 24 21.80 1.643 6 14 15 15 14 14 14.40 0.548 7 14 14 13 12 14 13.40 0.894 8 20 19 20 19 19 19.40 0.548 9 12 12 12 13 12 12.20 0.447 10 13 13 14 13 13 13.20 0.447 11 20 21 19 19 20 19.80 0.837 12 12 11 12 12 11 11.60 0.548 R4 13 12 12 12 11 12 11.80 0.447


14 12 11 11 11 11 11.20 0.45 15 19 19 20 19 19 19.20 0.45 16 11 12 12 11 12 11.60 0.55 17 12 12 12 12 13 12.20 0.45 18 20 20 21 20 25 21.20 2.17 19 14 14 13 14 13 13.60 0.55 20 12 12 11 11 11 11.40 0.55 21 34 35 38 42 36 37.00 3.16 22 17 17 17 18 17 17.20 0.45 23 17 17 17 18 17 17.20 0.48 24 28 32 27 15 29 26.20 6.54 25 16 15 15 26 16 17.60 4.72 26 15 14 15 14 15 14.60 0.55 27 26 26 14 14 27 21.40 6.77 28 14 16 28 25 15 19.60 6.43 29 14 15 16 15 15 15.00 0.71 30 26 27 13 29 27 24.40 6.47 31 15 14 26 15 14 16.80 5.17 32 14 15 14 15 15 14.60 0.55 33 24 24 25 22 24 23.80 1.10 34 15 15 14 14 14 14.40 0.55 35 37 36 37 38 47 39.00 4.53 36 21 19 20 20 13 18.60 3.21 37 13 14 13 13 14 13.40 0.55 38 13 14 13 13 21 14.80 3.49 39 20 21 20 20 13 18.80 3.27 40 14 14 15 13 13 13.80 0.84 41 13 15 14 14 20 15.20 2.78 42 21 21 21 19 14 19.20 3.03 43 13 14 14 14 21 15.20 3.27 44 20 20 13 19 13 17.00 3.67 45 14 13 21 13 14 15.00 3.39 46 14 14 14 14 21 15.40 3.13 47 19 20 13 20 13 17.00 3.67 48 14 14 20 14 21 16.60 3.58


Setting 7:

Update Response delay (milliseconds) Standard System number Run 1 Run 2 Run 3 Run 4 Run 5 Average deviation 1 1134 1045 989 1089 1003 1052.00 60.23 2 922 904 785 952 792 871.00 77.28 3 1107 1152 1013 1076 1081 1085.80 50.62 4 550 534 530 523 595 546.40 28.92 5 1382 1423 1440 1408 1433 1417.20 23.06 6 1801 1854 1823 1821 1791 1818.00 24.23 7 1174 1259 1242 1166 1254 1219.00 45.24 8 1215 1197 1171 1258 1159 1200.00 39.12 9 1264 1304 1410 1264 1335 1315.40 60.72 Jena 10 1221 1354 1221 1165 1328 1257.80 79.85 1 871 819 819 895 880 856.80 35.56 2 591 606 606 738 665 641.20 61.11 3 758 747 747 837 788 775.40 38.31 4 387 379 379 436 443 404.80 31.94 5 671 665 665 677 686 672.80 8.90 6 657 643 643 1268 672 776.60 274.96 7 494 485 485 479 477 484.00 6.63 8 1248 1150 1150 542 1150 1048.00 286.03 9 589 556 556 557 585 568.60 16.86 JenaRETE 10 491 497 497 517 486 497.60 11.78 1 658 687 654 672 678 669.80 13.76 2 464 464 467 481 575 490.20 47.92 3 319 291 296 284 306 299.20 13.66 4 524 531 324 568 547 498.80 99.17 5 387 391 672 407 398 451.00 123.78 6 331 329 333 349 345 337.40 8.99 7 671 390 410 418 401 458.00 119.53 8 372 687 320 719 727 565.00 201.32 9 313 319 672 325 329 391.60 156.87 R4 10 393 350 312 330 337 344.40 30.44


B.2 R4 vs. Sparkwave and Etalis

Experiment 1: Varying window size

Processing time (seconds)

System        Window size   Run 1       Run 2       Run 3       Run 4       Run 5       Average     Standard deviation
Etalis (seq)  0.1              465.72      459.49      455.15      466.94      462.27      461.91      4.78
Etalis (seq)  1                449.76      440.35      443.13      445.40      454.19      446.56      5.48
Etalis (seq)  2                446.02      453.88      449.32      448.29      442.39      447.98      4.24
Etalis (seq)  5                442.49      452.26      447.06      454.82      440.78      447.48      6.06
Etalis (seq)  10               450.85      452.71      445.60      447.68      451.87      449.74      3.00
Etalis (and)  1              11097.22    11402.08    11243.72    11534.86    11273.36    11310.25    165.87
Etalis (and)  2              11353.66    11479.98    11581.19    11959.85    11463.37    11567.61    233.64
Etalis (and)  5              11645.02    11711.14    11452.99    11550.24    11611.83    11594.24     98.03
Etalis (and)  10             11601.35    11482.45    11526.07    11397.74    11474.88    11496.50     74.64
Sparkwave     0.1               25.64       27.65       26.87       27.40       27.71       27.05      0.86
Sparkwave     1                 73.01       78.36       75.08       73.27       72.70       74.48      2.36
Sparkwave     2                103.43      105.41      105.24      103.77      104.01      104.37      0.90
Sparkwave     5                163.84      162.40      161.42      162.91      168.53      163.82      2.77
Sparkwave     10               223.97      210.54      233.86      235.70      223.97      228.81     11.78
R4            0.1                8.86        9.09        8.68        8.81        8.70        8.83      0.17
R4            1                 12.49       12.15       12.32       12.20       12.08       12.25      0.16
R4            2                 13.61       14.70       14.59       14.74       14.00       14.33      0.50
R4            5                 14.66       15.67       15.27       15.55       15.08       15.25      0.40
R4            10                17.79       18.11       16.93       17.23       16.45       17.30      0.66


Experiment 2: Varying schema size

Processing time (seconds)

System        No. of subclasses   Run 1     Run 2     Run 3     Run 4     Run 5     Average    Standard deviation
Etalis (seq)  40                  516.11    501.57    521.73    494.61    496.77    506.16     12.08
Etalis (seq)  200                 529.27    529.70    538.59    547.13    531.89    535.31      7.58
Etalis (seq)  585                 523.97    535.20    523.50    519.50    530.18    526.47      6.20
Etalis (seq)  1100                530.78    541.13    535.80    537.04    512.62    531.47     11.17
Sparkwave     40                  149.93    151.81    145.28    146.40    152.43    149.17      3.20
Sparkwave     200                 209.04    216.86    220.21    215.87    206.42    213.68      5.74
Sparkwave     585                 196.11    199.40    197.07    179.93    196.85    193.87      7.89
Sparkwave     1100                215.04    193.56    199.78    191.44    199.96    199.95      9.23
R4            40                   14.22     14.37     13.45     13.45     14.05     13.91      0.43
R4            200                  15.33     16.20     16.74     15.99     16.30     16.11      0.51
R4            585                  14.69     15.68     15.34     14.53     14.97     15.04      0.47
R4            1100                 14.54     14.99     14.64     15.25     14.27     14.74      0.38

[Figure: processing time (ms) of Sparkwave, R4, and Etalis against the number of subclasses (40, 200, 585, 1100)]

Figure B.1: Processing time of R4, Sparkwave, and Etalis with different schemas


Appendix C Raw Optimisation Evaluation Data

This appendix contains the raw evaluation data for some of the experiments described in Chapter 7. Section C.1 presents the average response time with different window sizes (described in Section 7.1). Section C.2 presents total and individual memory growth (experiments of Section 7.2). Section C.3 has some of the adaptive optimiser experiments (described in Section 7.3).

C.1 Window size vs. completeness

Average response time (milliseconds)

Dataset   Window size   Run 1   Run 2   Run 3   Run 4   Run 5   Average   Standard deviation
Ds2       1              2.60    2.40    2.60    2.20    2.80    2.52      0.23
Ds2       2              3.00    3.40    2.40    3.00    2.60    2.88      0.39
Ds2       4              3.40    3.80    2.80    2.40    2.80    3.04      0.56
Ds2       6              4.00    3.60    2.40    2.20    3.80    3.20      0.84
Ds2       12             3.40    3.80    2.60    3.80    2.40    3.20      0.66
Ds3       1              4.35    2.61    2.52    2.54    2.59    2.92      0.80
Ds3       2              5.26    3.52    3.59    3.61    3.44    3.88      0.77
Ds3       4              6.09    4.17    4.37    4.30    4.44    4.67      0.80
Ds3       6              7.67    5.54    5.57    5.70    5.09    5.91      1.01
Ds3       12             8.54    5.61    5.33    6.50    5.11    6.22      1.40
Ds3       24             9.39    6.83    8.67    7.07    8.02    8.00      1.08
Ds3       36             9.13   10.04    6.96    7.52    8.67    8.47      1.24
Ds3       48             7.59    8.15    5.44    6.11    5.98    6.65      1.16


C.2 Operator sharing

Experiment 1: Total memory growth

Running time Memories size (triples)

1 944 1523

2 1891 3049

3 2837 4574

4 3784 6100

5 4730 7625

6 5676 9150

7 6640 10702

8 7569 12201

9 7565 12194

10 7567 12196

11 7583 12221

12 7572 12201

13 7571 12203

14 7581 12216

15 7563 12189

16 7571 12200

17 7563 12189

18 7584 12222

19 7560 12183

20 7574 12209

21 7587 12225

22 7568 12197


Experiment 2: Individual memory growth

time a1 a2 a3 a4 a5 b1 b2 b3 b4 a6 b5

1 24 169 169 169 123 24 24 48 1 169 24

2 48 338 338 338 249 48 48 96 2 338 48

3 72 507 507 507 374 72 72 144 3 507 72

4 96 676 676 676 500 96 96 192 4 676 96

5 120 845 845 845 625 120 120 240 5 845 120

6 144 1014 1014 1014 750 144 144 288 6 1014 144

7 171 1183 1183 1183 874 171 171 342 8 1183 171

8 192 1352 1352 1352 1001 192 192 384 8 1352 192

9 191 1352 1352 1352 1003 191 191 382 8 1352 191

10 191 1352 1352 1352 1005 191 191 382 8 1352 191

11 194 1352 1352 1352 1003 194 194 388 8 1352 194

12 191 1352 1352 1352 1009 191 191 382 9 1352 191

13 192 1352 1352 1352 1002 192 192 384 9 1352 192

14 193 1352 1352 1352 1007 193 193 386 8 1352 193

15 190 1352 1352 1352 1008 190 190 380 7 1352 190

16 191 1352 1352 1352 1008 191 191 382 9 1352 191


C.3 Adaptive optimisation

C.3.1 Cost model experiments

The table shows the costs of five plans; for each plan, the measured cost (m) is followed by the estimated cost (e).

Running time (s)   P1-m   P1-e   P2-m   P2-e   P3-m   P3-e   P4-m   P4-e   P5-m   P5-e

1 0.00231 0.00239 0.00193 0.00199 0.00177 0.00185 0.00167 0.00174 0.00176 0.00185

2 0.00119 0.00125 0.00097 0.00105 0.00092 0.00098 0.00085 0.00092 0.00092 0.00102

3 0.00137 0.00132 0.00116 0.00113 0.00108 0.00103 0.00103 0.00099 0.00118 0.00108

4 0.00122 0.00130 0.00101 0.00112 0.00093 0.00102 0.00088 0.00099 0.00090 0.00108

5 0.00133 0.00129 0.00113 0.00110 0.00104 0.00100 0.00100 0.00097 0.00112 0.00104

6 0.00132 0.00128 0.00113 0.00109 0.00103 0.00099 0.00099 0.00096 0.00110 0.00102

7 0.00126 0.00127 0.00107 0.00109 0.00098 0.00098 0.00093 0.00095 0.00100 0.00101

8 0.00123 0.00129 0.00104 0.00109 0.00094 0.00100 0.00091 0.00096 0.00093 0.00102

9 0.00119 0.00130 0.00101 0.00110 0.00091 0.00101 0.00088 0.00097 0.00088 0.00105

10 0.00119 0.00134 0.00102 0.00113 0.00091 0.00105 0.00088 0.00099 0.00087 0.00109

11 0.00126 0.00136 0.00101 0.00113 0.00097 0.00108 0.00087 0.00100 0.00094 0.00111

12 0.00125 0.00135 0.00107 0.00114 0.00097 0.00106 0.00094 0.00101 0.00098 0.00111

13 0.00134 0.00133 0.00113 0.00112 0.00105 0.00104 0.00099 0.00099 0.00113 0.00108

14 0.00132 0.00132 0.00108 0.00111 0.00104 0.00104 0.00095 0.00098 0.00107 0.00106

15 0.00129 0.00134 0.00106 0.00113 0.00100 0.00105 0.00092 0.00099 0.00101 0.00109

16 0.00128 0.00134 0.00105 0.00110 0.00099 0.00105 0.00092 0.00097 0.00099 0.00105

17 0.00128 0.00133 0.00106 0.00110 0.00100 0.00104 0.00092 0.00097 0.00100 0.00105

18 0.00130 0.00133 0.00108 0.00111 0.00101 0.00104 0.00095 0.00097 0.00104 0.00106

19 0.00125 0.00134 0.00103 0.00112 0.00096 0.00105 0.00089 0.00098 0.00094 0.00107

20 0.00125 0.00134 0.00104 0.00112 0.00096 0.00105 0.00090 0.00099 0.00095 0.00108

21 0.00130 0.00133 0.00111 0.00112 0.00102 0.00105 0.00097 0.00098 0.00107 0.00107


22 0.00126 0.00134 0.00107 0.00112 0.00098 0.00105 0.00094 0.00099 0.00100 0.00109

23 0.00124 0.00130 0.00104 0.00109 0.00096 0.00101 0.00090 0.00096 0.00095 0.00103

24 0.00129 0.00129 0.00109 0.00108 0.00101 0.00101 0.00095 0.00095 0.00104 0.00101

25 0.00135 0.00129 0.00116 0.00109 0.00106 0.00100 0.00102 0.00096 0.00116 0.00102

26 0.00138 0.00131 0.00115 0.00110 0.00109 0.00103 0.00101 0.00097 0.00119 0.00105

27 0.00130 0.00134 0.00107 0.00113 0.00102 0.00106 0.00093 0.00099 0.00103 0.00109

28 0.00174 0.00140 0.00148 0.00117 0.00145 0.00111 0.00135 0.00104 0.00184 0.00117

29 0.00145 0.00137 0.00121 0.00116 0.00116 0.00108 0.00108 0.00102 0.00132 0.00114

30 0.00136 0.00133 0.00115 0.00113 0.00107 0.00105 0.00101 0.00099 0.00117 0.00109

31 0.00131 0.00136 0.00111 0.00115 0.00102 0.00107 0.00098 0.00101 0.00107 0.00112

32 0.00132 0.00132 0.00110 0.00111 0.00103 0.00104 0.00096 0.00098 0.00108 0.00107

33 0.00147 0.00134 0.00124 0.00111 0.00119 0.00105 0.00110 0.00098 0.00135 0.00106

34 0.00134 0.00133 0.00111 0.00111 0.00105 0.00105 0.00097 0.00097 0.00112 0.00106

35 0.00126 0.00132 0.00103 0.00110 0.00097 0.00104 0.00090 0.00096 0.00096 0.00104

36 0.00121 0.00131 0.00102 0.00110 0.00092 0.00103 0.00088 0.00096 0.00090 0.00105

37 0.00127 0.00134 0.00106 0.00111 0.00098 0.00106 0.00093 0.00098 0.00099 0.00106

38 0.00135 0.00135 0.00113 0.00110 0.00106 0.00106 0.00099 0.00097 0.00113 0.00104

39 0.00137 0.00134 0.00117 0.00112 0.00109 0.00106 0.00104 0.00099 0.00120 0.00109

40 0.00145 0.00135 0.00125 0.00114 0.00117 0.00107 0.00111 0.00100 0.00135 0.00111

41 0.00151 0.00135 0.00131 0.00115 0.00122 0.00107 0.00117 0.00101 0.00146 0.00113

42 0.00143 0.00133 0.00122 0.00113 0.00114 0.00104 0.00109 0.00100 0.00131 0.00110

43 0.00153 0.00135 0.00133 0.00115 0.00125 0.00106 0.00119 0.00101 0.00150 0.00112

44 0.00138 0.00129 0.00116 0.00110 0.00109 0.00100 0.00101 0.00096 0.00121 0.00104

45 0.00155 0.00131 0.00132 0.00111 0.00126 0.00102 0.00118 0.00097 0.00153 0.00106

46 0.00107 0.00091 0.00091 0.00078 0.00087 0.00072 0.00081 0.00068 0.00103 0.00074


[Figure: measured and estimated plan costs (seconds) of Plan1–Plan5 over running time (seconds)]

Figure C.1: Measured and estimated costs of five equivalent plans

C.3.2 Comparing static and adaptive plans

Varying window size/Single stream

Cost (seconds); plans: static, adaptive, adaptive with migration
Window size (s)   Run 1   Run 2   Run 3   Run 4   Run 5   Average   Standard deviation

1 3.3E-04 3.3E-04 3.3E-04 3.3E-04 3.3E-04 3.3E-04 1.2E-06

3 3.6E-04 3.6E-04 3.6E-04 3.6E-04 3.6E-04 3.6E-04 4.3E-07

5 3.8E-04 3.8E-04 3.8E-04 3.8E-04 3.8E-04 3.8E-04 0.0E+00

7 3.9E-04 3.9E-04 3.9E-04 3.8E-04 3.9E-04 3.9E-04 1.1E-06

11 4.1E-04 4.1E-04 4.1E-04 4.1E-04 4.1E-04 4.1E-04 0.0E+00 static 15 4.3E-04 4.4E-04 4.3E-04 4.3E-04 4.3E-04 4.3E-04 1.7E-08 adaptive 1 2.1E-04 2.1E-04 2.1E-04 2.1E-04 2.1E-04 2.1E-04 5.2E-07


3 2.3E-04 2.3E-04 2.3E-04 2.3E-04 2.3E-04 2.3E-04 3.4E-07

5 2.5E-04 2.5E-04 2.5E-04 2.5E-04 2.5E-04 2.5E-04 0.0E+00

7 2.5E-04 2.5E-04 2.5E-04 2.5E-04 2.5E-04 2.5E-04 1.3E-07

11 2.7E-04 2.7E-04 2.7E-04 2.7E-04 2.7E-04 2.7E-04 0.0E+00

15 2.8E-04 2.8E-04 2.8E-04 2.8E-04 2.8E-04 2.8E-04 1.4E-08

1 2.2E-04 2.1E-04 2.1E-04 2.1E-04 2.1E-04 2.1E-04 7.4E-07

3 2.4E-04 2.4E-04 2.4E-04 2.4E-04 2.4E-04 2.4E-04 3.0E-07

5 2.7E-04 2.7E-04 2.7E-04 2.7E-04 2.7E-04 2.7E-04 0.0E+00

7 2.8E-04 2.8E-04 2.8E-04 2.8E-04 2.8E-04 2.8E-04 2.6E-07 Adaptive 11 3.1E-04 3.1E-04 3.1E-04 3.1E-04 3.1E-04 3.1E-04 0.0E+00 with migration 15 3.4E-04 3.4E-04 3.4E-04 3.4E-04 3.4E-04 3.4E-04 1.1E-08

Varying input rate

Cost (seconds); plans: static, adaptive, adaptive with migration
Rate (t/s)   Run 1   Run 2   Run 3   Run 4   Run 5   Average   Standard deviation

1000 3.8E-04 3.9E-04 3.8E-04 3.9E-04 3.8E-04 3.8E-04 2.5E-06

3000 1.3E-03 1.3E-03 1.3E-03 1.3E-03 1.3E-03 1.3E-03 0.0E+00

5000 2.4E-03 2.4E-03 2.4E-03 2.4E-03 2.4E-03 2.4E-03 0.0E+00

7000 3.3E-03 3.3E-03 3.3E-03 3.3E-03 3.3E-03 3.3E-03 4.8E-19

static 11000 5.9E-03 5.9E-03 5.9E-03 5.9E-03 5.9E-03 5.9E-03 0.0E+00

1000 2.6E-04 2.6E-04 2.6E-04 2.6E-04 2.6E-04 2.6E-04 5.3E-07

3000 8.9E-04 8.9E-04 8.9E-04 8.9E-04 8.9E-04 8.9E-04 0.0E+00

5000 1.6E-03 1.6E-03 1.6E-03 1.6E-03 1.6E-03 1.6E-03 2.4E-19

7000 2.2E-03 2.2E-03 2.2E-03 2.2E-03 2.2E-03 2.2E-03 0.0E+00

adaptive 11000 3.8E-03 3.8E-03 3.8E-03 3.8E-03 3.8E-03 3.8E-03 4.8E-19

Adaptive with migration:

1000 3.1E-04 3.1E-04 3.1E-04 3.1E-04 3.1E-04 3.1E-04 8.7E-07

3000 1.1E-03 1.1E-03 1.1E-03 1.1E-03 1.1E-03 1.1E-03 0.0E+00

5000 1.9E-03 1.9E-03 1.9E-03 1.9E-03 1.9E-03 1.9E-03 0.0E+00

7000 2.7E-03 2.7E-03 2.7E-03 2.7E-03 2.7E-03 2.7E-03 0.0E+00

11000 4.6E-03 4.6E-03 4.6E-03 4.6E-03 4.6E-03 4.6E-03 0.0E+00

Varying selectivities

Cost (seconds); plans: static, adaptive, adaptive with migration
Selectivity factor   Run 1   Run 2   Run 3   Run 4   Run 5   Average   Standard deviation

0.01 3.8E-04 3.8E-04 3.8E-04 3.8E-04 3.8E-04 3.8E-04 3.3E-06

0.1 3.8E-04 3.8E-04 3.9E-04 3.8E-04 3.9E-04 3.8E-04 1.7E-06

0.25 4.0E-04 4.0E-04 4.1E-04 4.0E-04 4.0E-04 4.0E-04 2.2E-06

0.5 4.3E-04 4.3E-04 4.3E-04 4.3E-04 4.3E-04 4.3E-04 1.9E-06

0.75 4.7E-04 4.8E-04 4.7E-04 4.7E-04 4.7E-04 4.7E-04 1.5E-06 static 1 5.0E-04 5.0E-04 5.0E-04 5.0E-04 5.0E-04 5.0E-04 1.3E-06

0.01 2.6E-04 2.6E-04 2.5E-04 2.6E-04 2.6E-04 2.5E-04 1.0E-06

0.1 2.7E-04 2.7E-04 2.7E-04 2.7E-04 2.7E-04 2.7E-04 1.6E-06

0.25 3.1E-04 3.1E-04 3.1E-04 3.0E-04 3.0E-04 3.1E-04 1.0E-06

0.5 3.7E-04 3.6E-04 3.7E-04 3.7E-04 3.7E-04 3.7E-04 1.1E-06

0.75 4.4E-04 4.4E-04 4.4E-04 4.4E-04 4.4E-04 4.4E-04 1.5E-06 adaptive 1 4.9E-04 5.0E-04 5.0E-04 5.0E-04 5.0E-04 5.0E-04 9.3E-07

0.01 3.1E-04 3.1E-04 3.1E-04 3.1E-04 3.1E-04 3.1E-04 2.3E-06

0.1 3.2E-04 3.3E-04 3.3E-04 3.3E-04 3.3E-04 3.3E-04 1.6E-06

0.25 3.6E-04 3.6E-04 3.6E-04 3.6E-04 3.6E-04 3.6E-04 1.2E-06

0.5 4.3E-04 4.3E-04 4.3E-04 4.3E-04 4.3E-04 4.3E-04 8.0E-07 Adaptive 0.75 5.1E-04 5.1E-04 5.1E-04 5.1E-04 5.1E-04 5.1E-04 8.7E-07 with migration 1 3.1E-04 3.1E-04 3.1E-04 3.1E-04 3.1E-04 3.1E-04 2.3E-06


Varying number of joins

Cost (seconds); plans: static, adaptive, adaptive with migration
No. of joins   Run 1   Run 2   Run 3   Run 4   Run 5   Average   Standard deviation

2 3.6E-04 3.6E-04 3.6E-04 3.6E-04 3.6E-04 3.6E-04 5.3E-07

3 6.0E-04 6.1E-04 6.1E-04 6.1E-04 6.1E-04 6.1E-04 1.8E-06

4 8.5E-04 8.6E-04 8.5E-04 8.6E-04 8.6E-04 8.6E-04 1.2E-06

5 6.4E-04 6.4E-04 6.4E-04 6.4E-04 6.4E-04 6.4E-04 1.1E-06

static 6 6.9E-04 6.9E-04 6.8E-04 6.9E-04 6.9E-04 6.9E-04 1.3E-06

2 2.3E-04 2.3E-04 2.3E-04 2.3E-04 2.3E-04 2.3E-04 6.8E-07

3 3.5E-04 3.5E-04 3.5E-04 3.5E-04 3.5E-04 3.5E-04 2.7E-07

4 4.6E-04 4.6E-04 4.6E-04 4.6E-04 4.6E-04 4.6E-04 1.1E-06

5 4.7E-04 4.7E-04 4.7E-04 4.7E-04 4.7E-04 4.7E-04 1.3E-06

adaptive 6 5.2E-04 5.2E-04 5.2E-04 5.2E-04 5.2E-04 5.2E-04 1.2E-06

2 2.4E-04 2.4E-04 2.4E-04 2.5E-04 2.4E-04 2.4E-04 5.1E-07

3 3.6E-04 3.7E-04 3.7E-04 3.7E-04 3.7E-04 3.7E-04 8.6E-07

4 4.9E-04 4.9E-04 4.9E-04 4.9E-04 4.9E-04 4.9E-04 6.2E-07 Adaptive 5 4.9E-04 4.9E-04 4.9E-04 4.8E-04 4.9E-04 4.9E-04 1.1E-06 with migration 6 5.4E-04 5.5E-04 5.4E-04 5.5E-04 5.5E-04 5.5E-04 1.4E-06


Bibliography

Abadi, D., Carney, D., Çetintemel, U., Cherniack, M., Convey, C., Lee, S., Stonebraker, M., Tatbul, N. and Zdonik, S. (2003) Aurora: A new model and architecture for data stream management. The VLDB Journal, 12(2), 120–139.

Abadi, D., Ahmad, Y., Balazinska, M., Çetintemel, U., Cherniack, M., Hwang, J.-H., Lindner, W., Maskey, A., Rasin, A., Ryvkina, E., Tatbul, N., Xing, Y. and Zdonik, S. (2005) The design of the borealis stream processing engine. Paper presented at Second Biennial Conference on Innovative Data Systems Research (CIDR). California, USA 4-7 January.

Abele, A., McCrae, J., Buitelaar, P., Jentzsch, A. and Cyganiak, R. (2017) Linking Open Data cloud diagram. Available from: http://lod-cloud.net/ [Accessed 21 April 2017].

Adamer, C., Bannach, D., Klug, T., Lukowicz, P., Sbodio, M., Tresman, M., Zinnen, A. and Ziegert, T. (2008) Developing a wearable assistant for hospital ward rounds: An experience report IN: Floerkemeier, C., Langheinrich, M., Fleisch, E., Mattern, F. and Sarma, S. (eds.) The Internet of Things, Springer Berlin Heidelberg, 289-307.

Akyildiz, I.F., Su, W., Sankarasubramaniam, Y. and Cayirci, E. (2002) A survey on sensor networks. Communication Magazine, IEEE, 40(8), 102–114.

Ali, M.I., Gao, F. and Mileo, A. (2015) CityBench: A configurable benchmark to evaluate RSP engines using smart city datasets IN: The Semantic Web-ISWC 2015. PA, USA, Springer International Publishing, 374–389.

Allemang, D. and Hendler, J (2008) Semantic Web for the Working Ontologist: Effective Modeling in RDFS and OWL. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.

Andrade, H.C., Gedik, B. and Turaga, D.S. (2014) Fundamentals of Stream Processing: Application Design, Systems, and Analytics. Cambridge University Press.

Anicic, D., Fodor, P., Rudolph, S. and Stojanovic, N. (2011) EP-SPARQL: A unified language for event processing and stream reasoning IN: Proceedings of the 20th International Conference on World Wide Web. March 28 - April 01, 2011 New York, USA, Hyderabad, India: ACM. 635-644.

Anicic, D., Rudolph, S., Fodor, P. and Stojanovic, N. (2012) Stream reasoning and complex event processing in ETALIS. Semantic Web, 3(4), 397–407.

Antoniou, G. and Van Harmelen, F. (2004) A semantic web primer. MIT press.

Arasu, A., Babcock, B., Babu, S., Datar, M., Ito, K., Motwani, R., Nishizawa, I., Srivastava, U., Thomas, D., Varma, R. and Widom, J. (2003) STREAM: The Stanford stream data manager. IEEE Data Engineering Bulletin, 26.

Arasu, A., Cherniack, M., Galvez, E., Maier, D., Maskey, A.S., Ryvkina, E., Stonebraker, M. and Tibbetts, R. (2004) Linear road: A stream data management benchmark IN: Proceedings of the 30th International Conference on Very Large Data Bases, Toronto, Canada, 31 August, Vol.30, VLDB Endowment, 480–491,

Arasu, A., Babu, S. and Widom, J. (2006) The CQL continuous query language: semantic foundations and query execution. The VLDB Journal—The International Journal on Very Large Data Bases, 15(2), 121–142.

Arpaci-Dusseau, R., Anderson, E., Treuhaft, N., Culler, D., Hellerstein, J., Patterson, D. and Yelick, K. (1999) Cluster I/O with river: Making the fast case common IN: IOPADS ’99: Proceedings of the Sixth Workshop on I/O in Parallel and Distributed Systems, New York, NY, USA: ACM Press, 10–22.


Atzori, L., Iera, A. and Morabito, G. (2010) The Internet of Things: A survey. Computer Networks, 54(15), 2787–2805.

Avnur, R. and Hellerstein, J. (2000) Eddies: Continuously adaptive query processing IN: Proceedings of the ACM SIGMOD International Conference on Management of Data, Dallas, Texas, USA, May 16, (Vol. 29, No. 2, pp. 261-272).

Ayad, A.M. and Naughton, J.F. (2004, June) Static optimization of conjunctive queries with sliding windows over infinite streams IN: Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data. Paris, France Jun 13. New York, NY, USA: ACM, 419-430.

Baader, F. (2003) The description logic handbook: Theory, implementation and applications. Cambridge university press.

Babcock, B., Babu, S., Datar, M., Motwani, R. and Widom, J. (2002) Models and issues in data stream systems IN: Proceedings of the 21st ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems. Jun 3 Madison, WI, USA Wisconsin: ACM, 1-16.

Babu, S. and Widom, J. (2001) Continuous queries over data streams. SIGMOD Record, 30(3), 109–120.

Babu, S., Motwani, R., Munagala, K., Nishizawa, I. and Widom, J. (2004) Adaptive ordering of pipelined stream filters IN: Proceedings of the ACM SIGMOD International Conference on Management of Data, Paris, France. Jun 13. New York, NY, USA: ACM, 407-418.

Babu, S. and Widom, J. (2004) StreaMon: An adaptive engine for stream query processing IN: Proceedings of the ACM SIGMOD International Conference on Management of Data, Paris, France. Jun 13. New York, NY, USA: ACM, 931-932.

Babu, S., Munagala, K. and Widom, J. (2005) Adaptive caching for continuous queries IN: Proceedings of the 21st International Conference on Data Engineering ICDE. Apr 5. Washington, DC, USA: IEEE, 118–129.

Babu, S. and Bizarro, P. (2005) Adaptive query processing in the looking glass. In Proceedings of the Second Biennial Conference on Innovative Data Systems Research (CIDR), Jan. 2005.

Balduini, M., Celino, I., Dell’Aglio, D., Della Valle, E., Huang, Y., Lee, T., Kim, S. and Tresp, V. (2012) BOTTARI: An augmented reality mobile application to deliver personalized and location- based recommendations by continuous analysis of social media streams. Web Semantics: Science, Services and Agents on the World Wide Web, 16(5).

Balduini, M., Della Valle, E., Dell’Aglio, D., Tsytsarau, M., Palpanas, T. and Confalonieri, C. (2013, October) Social listening of city scale events using the streaming linked data framework. In International Semantic Web Conference (pp. 1-16). Springer Berlin Heidelberg.

Barbieri, D.F., Braga, D., Ceri, S., Della Valle, E. and Grossniklaus, M. (2009, April) C-SPARQL: SPARQL for continuous querying IN: Proceedings of the 18th International Conference on World Wide Web. Madrid, Spain, Apr 20. New York, NY, USA: ACM, 1061–1062.

Barbieri, D., Braga, D., Ceri, S. and Grossniklaus, M. (2010a) An execution environment for C- SPARQL queries IN: Proceedings of the 13th International Conference on Extending Database Technology. Lausanne, Switzerland, Mar 22 New York, NY, USA: ACM, 441-452.

Barbieri, D., Braga, D., Ceri, S., Della Valle, E. and Grossniklaus, M. (2010b) Incremental reasoning on streams and rich background knowledge IN: Aroyo, L., Antoniou, G., Hyvonen, E., Ten Teije, A., Stuckenschmidt, H., Cabral, L. and Tudorache, T. (eds.) The Semantic Web: Research and Applications. Springer Berlin/Heidelberg, 1-15.


Barbieri, D. and Della Valle, E. (2010c.) A proposal for publishing data streams as linked data IN: Presented at the International Workshop on Linked Data on the Web (LDOW 2010), Raleigh, North Carolina. Apr 27. CEUR Workshop Proceedings, Vol-628.

Barrasa, J., Corcho, O., Gómez-Pérez, A. (2004) R2O, an extensible and semantically based database-to-ontology mapping language. In: SWDB2004. 1069–1070

Beck, H., Dao-Tran, M., Eiter, T. and Fink, M. (2015, February) LARS: A logic-based framework for analyzing reasoning over streams. AAAI, 1431–1438.

Berners-Lee, T., Fielding, R. and Masinter, L. (1998) RFC2396: Uniform Resource Identifiers (URI). Generic Syntax.

Berners-Lee, T., Hendler, J. and Lassila, O. (2001) The Semantic Web. Scientific American, 284(5), 28–37.

Berners-Lee, T. (2006) Linked Data - W3C Design Issues. Available from: http://www.w3.org/DesignIssues/LinkedData.html [Accessed 20 December 2016].

Berstel, B. (2002) Extending the RETE algorithm for event management. IN Temporal Representation and Reasoning, 2002. TIME 2002. Proceedings. Ninth International Symposium. IEEE. 49-51

Bharathidasan, A. and Sai Ponduru, V. (2003) Sensor networks: An overview. IEEE Potentials, 22(2), 20–23.

Biron, P., Malhotra, A. and World Wide Web Consortium, (2004) XML schema part 2: Datatypes. World Wide Web Consortium Recommendation REC-xmlschema-2-20041028.

Bizer, C., Heath, T. and Berners-Lee, T. (2009) Linked data—The story so far. International Journal on Semantic Web and Information Systems (IJSWIS), 5(3), 1–22.

Bizer, C. and Schultz, A. (2009) The berlin SPARQL benchmark. International Journal on Semantic Web and Information Systems (IJSWIS), 5(2), 1–24.

Boley, H., Tabet, S. and Wagner, G. (2001) Design Rationale for RuleML: A Markup Language for Semantic Web Rules. In SWWS (Vol. 1, pp. 381-401).

Boley, H. and Kifer, M., (2013) RIF basic logic dialect. W3C recommendation. Available from http://www.w3.org/TR/rif-bld/ [Accessed 8 January 2017].

Boley, H., Hallmark, G., Kifer, M., Paschke, A., Polleres, A. and Reynolds, D. (2013) RIF Core Dialect. W3C Recommendation. Available from: http://www.w3.org/TR/2013/REC-rif-core- 20130205/ [Accessed 10 July 2015].

Bolles, A., Grawunder, M. and Jacobi, J. (2008) Streaming SPARQL: Extending SPARQL to process data streams. IN: Bechhofer, S., Hauswirth, M., Hoffmann, J. and Koubarakis, M. (eds.) The semantic web: Research and applications. Berlin/Heidelberg: Springer, 448-462.

Bonnet, P., Gehrke, J. and Seshadri, P. (2001) Towards Sensor Database Systems IN: Tan, K.-L., Franklin, M. and Lui, J. (eds.) Mobile Data Management. Berlin/Heidelberg: Springer, 3-14.

Botan, I., Derakhshan, R., Dindar, N., Haas, L., Miller, R.J. and Tatbul, N. (2010) SECRET: A model for analysis of the execution semantics of stream processing systems. Proceedings of the VLDB Endowment, 3(1–2), 232–243.

Brachman, R.J. and Levesque, H.J. (1985) Readings in knowledge representation. Morgan Kaufmann Publishers Inc.

Brickley, D., Guha, R. and McBride, B. (2014) RDF Schema 1.1. W3C Recommendation. Available from: http://www.w3.org/TR/rdf-schema [Accessed 4 December 2016].


Broekstra, J., Kampman, A. and Van Harmelen, F. (2002, June) Sesame: A generic architecture for storing and querying RDF and RDF Schema. In International Semantic Web Conference, 54–68. Springer Berlin Heidelberg.

Buchanan, B.G. and Duda, R.O. (1983) Principles of rule-based expert systems. Advances in computers, 22, 163–216.

Calbimonte, J.P., Corcho, O. and Gray, A. (2010) Enabling ontology-based access to streaming data sources IN: Patel-Schneider, P.F., Pan, Y., Hitzler, P., Mika, P., Zhang, L., Pan, J.Z., Horrocks, I. and Glimm, B. (eds.) ISWC 2010, Part I. LNCS, vol. 6496. Heidelberg: Springer, 96– 111.

Cammert, M., Krämer, J., Seeger, B. and Vaupel, S. (2008) A cost-based approach to adaptive resource management in data stream systems. IEEE Transactions in Knowledge and Data Engineering, 20(2), 230–245.

Cetintemel, U. (2003) The aurora and medusa projects. Data Engineering, 51(3).

Ceri, S., Gottlob, G. and Tanca, L. (1990) Logic Programming and Databases. Springer, New York, NY.

Chandrasekaran, S. and Franklin, M.J. (2002, August) Streaming queries over streaming data. In Proceedings of the 28th international conference on Very Large Data Bases, 203–214. VLDB Endowment.

Chandrasekaran, S., Cooper, O., Deshpande, A., Franklin, M., Hellerstein, J., Hong, W., Krishnamurthy, S., Madden, S., Raman, V., Reiss, F. and Shah, M. (2003) TelegraphCQ: Continuous dataflow processing for an uncertain world IN: Proceedings of the First Annual Conference on Innovative Database Research (CIDR). New York, NY, USA: ACM, 668-668

Chen, C. and Roussopoulos, N. (1994) Adaptive selectivity estimation using query feedback. IN: Proceedings of the 1994 SIGMOD international conference on Management of data, May 24-27. ACM, Minneapolis, Minnesota, USA. 161–172.

Cherniack, M., Balakrishnan, H., Balazinska, M., Carney, D., Cetintemel, U., Xing, Y. and Zdonik, S. (2003) Scalable Distributed Stream Processing. CIDR, 3, 257–268.

Compton, M., Barnaghi, P., Bermudez, L., GarcíA-Castro, R., Corcho, O., Cox, S., Graybeal, J., Hauswirth, M., Henson, C., Herzog, A. and Huang, V. (2012) The SSN ontology of the W3C semantic sensor network incubator group. Web Semantics: Science, Services and Agents on the World Wide Web, 17, 25–32.

Cugola, G. and Margara, A. (2012) Processing flows of information: From data stream to complex event processing. ACM Computing Surveys (CSUR), 44(3), 15.

Cyganiak, R. (2005) A Relational Algebra for SPARQL. HP Labs Technical Report HPL-2005-170. HP Laboratories, Bristol.

Cyganiak, R., Wood, D. and Lanthaler, M. (2014) RDF 1.1 concepts and abstract syntax. W3C Recommendation. Available from: http://www.w3.org/TR/rdf11-concepts/ [Accessed 06 December 2016].

Da Xu, L., He, W. and Li, S. (2014) Internet of things in industries: A survey. IEEE Transactions on industrial informatics, 10(4), 2233–2243.

Demaine, E., López-Ortiz, A. and Munro, J. (2002) Frequency estimation of internet packet streams with limited space. Algorithms—ESA 2002, 11–20.

de Bruijn, J. and Welty, C. (2013) RIF RDF and OWL compatibility. W3C Recommendation. Available from: http://www.w3.org/TR/rif-rdf-owl/ [Accessed 10 March 2016].

de Sainte Marie, C., Paschke, A. and Hallmark, G. (2013) RIF Production Rule Dialect. W3C Recommendation. Available from: http://www.w3.org/TR/rif-prd/ [Accessed 03 December 2016].

Della Valle, E., Ceri, S., Barbieri, D.F., Braga, D. and Campi, A. (2008) A first step towards stream reasoning IN: Domingue, J., Fensel, D. and Traverso, P. (eds.) Future Internet—FIS 2008. Berlin/Heidelberg: Springer, 72-81.

Della Valle, E., Ceri, S., Harmelen, F.V. and Fensel, D. (2009) It's a streaming world! Reasoning upon rapidly changing information. IEEE Intelligent Systems, 24(6), pp.83-89.

Dell’Aglio, D., Calbimonte, J.P., Balduini, M., Corcho, O. and Della Valle, E. (2013) On correctness in RDF stream processor benchmarking IN: The Semantic Web–ISWC 2013 Berlin Heidelberg: Springer, 326–342.

Dell’Aglio, D., Della Valle, E., Calbimonte, J.P. and Corcho, O. (2014) RSP-QL semantics: A unifying query model to explain heterogeneity of RDF stream processing systems. International Journal on Semantic Web and Information Systems (IJSWIS), 10(4), 17–44.

Dell’Aglio, D. and Della Valle, E. (2014) Incremental reasoning on linked data streams IN: Harth, A., Hose, K. and Schenkel, R. (eds.) Linked Data Management. Florida, USA: CRC Press, 413– 436.

Deshpande, A., Ives, Z. and Raman, V. (2007) Adaptive query processing. Foundations and Trends in Databases, 1(1), 1–140.

Do, T.M., Loke, S.W. and Liu, F. (2011) Answer set programming for stream reasoning IN: Advances in Artificial Intelligence, Berlin/Heidelberg: Springer, 104–109.

Domingue, J., Fensel, D. and Hendler, J. (eds). (2011) Handbook of semantic web technologies. New York: Springer.

Donini, F.M., Lenzerini, M., Nardi, D. and Schaerf, A. (1996) Reasoning in description logics. Principles of Knowledge representation, 1, 191–236.

Dunkels, A. and Vasseur, J. (2008) IP for Smart Objects, Internet Protocol for Smart Objects (IPSO) Alliance, White Paper #1.

Fang, S., Da Xu, L., Zhu, Y., Ahati, J., Pei, H., Yan, J. and Liu, Z. (2014) An integrated system for regional environmental monitoring and management based on internet of things. IEEE Transactions on Industrial Informatics, 10(2), 1596–1605.

Feldmeier, M. and Paradiso, J. (2010) Personalised HVAC control system IN: Proceedings of the Internet of Things 2010 Conference, Tokyo, Japan. Nov 29. IEEE, 1-8.

Finkenzeller, K. (2010) RFID Handbook: Fundamentals and applications in contactless smart cards, radio frequency identification and near-field communication. UK: Wiley.

Forgy, C. (1981) OPS5 User’s Manual, Report no. CMU-CS-81-135, Department of Computer Science, Carnegie-Mellon University.

Forgy, C. (1982) Rete: A fast algorithm for the many pattern/many object pattern match problem. Artificial Intelligence, 19, 17–37.

Frazer, A., De Roure, D., Martinez, K., Nagel, B., Page, K.R. and Sadler, J. (2011) Implementation and Deployment of a Library of the High-level Application Programming Interfaces (SemSorGrid4Env).

Friedman-Hill, E. (2002) Jess, The Shell for the Java Platform. Livermore, California, USA: Sandia National Laboratories, Technical Report SAND98-8206. Version 6.1a3 Available from: http://herzberg.ca.sandia.gov/jess/docs/61/


Galpin, I., Brenninkmeijer, C.Y., Gray, A.J., Jabeen, F., Fernandes, A.A. and Paton, N.W. (2011) SNEE: a query processor for wireless sensor networks. Distributed and Parallel Databases, 29(1- 2), 31–85.

Garcia-Castro, R., Hill, C. and Corcho, O. (2011) SemserGrid4Env Deliverable D4.3 v2: Sensor network ontology suite. Available from: http://linkeddata4.dia.fi.upm.es/ssg4env/files/deliverables/wp4/D4.3v2.pdf [Accessed 11 May 2017].

García-Castro, R., Corcho, O. and Hill, C. (2012) A core ontological model for semantic sensor web infrastructures. International Journal on Semantic Web and Information Systems, 8(1), 22–42.

Garcia-Molina, H., Ullman, J.D. and Widom, J., (2000). Database system implementation (Vol. 654). Upper Saddle River, NJ: Prentice Hall.

Gebser, M., Grote, T., Kaminski, R., Obermeier, P., Sabuncu, O. and Schaub, T. (2012) Stream reasoning with answer set programming: Preliminary report. KR, 12, 613–617.

Getta, J.R. and Vossough, E. (2004) Optimization of data stream processing. ACM SIGMOD Record, 33(3), 34-39.

Ghanem, T.M., Hammad, M.A., Mokbel, M.F., Aref, W.G. and Elmagarmid, A.K. (2007) Incremental evaluation of sliding-window queries over data streams. IEEE Transactions on Knowledge and Data Engineering, 19(1), 57–72.

Golab, L. and Ozsu, M. (2003) Issues in Data Stream Management. SIGMOD Record, 32(2), 5–14.

Gomes, J. and Choi, H.A. (2008) Adaptive optimization of join trees for multi-join queries over sensor streams. Information Fusion, 9(3), 412-424.

Gounaris, A., Paton, N., Fernandes, A. and Sakellariou, R. (2002) Adaptive query processing: A Survey IN: Eaglestone, B., North, S. and Poulovassilis, A. (eds.) BNCOD 2002. LNCS, vol. 2405. Heidelberg: Springer, 11–25.

Graefe, G. (1990) Encapsulation of parallelism in the Volcano query processing system IN: Proceedings of the ACM SIGMOD International Conference on Management of Data, June, Atlantic City, NJ. New York, NY, USA: ACM, 102–111.

Gray, A., Galpin, I., Fernandes, A., Paton, N., Page, K., Sadler, J., Koubarakis, M., Kyzirakos, K., Calbimonte, J.-P., Corcho, O. and Garcia, R. (2009) SemSorGrid4Env architecture – phase I. Deliverable D1.3v1, SemSorGrid4Env.

Gray, A., García-Castro, R., Kyzirakos, K., Karpathiotakis, M., Calbimonte, J., Page, K., Sadler, J., Frazer, A., Galpin I. and Fernandes, A. (2011) A semantically enabled service architecture for mashups over streaming and stored data IN: The semantic web: Research and applications. Heidelberg: Springer, 300–314.

Grosof, B.N., Horrocks, I., Volz, R. and Decker, S. (2003) Description logic programs: Combining logic programs with description logic IN: Proceedings of the 12th International World Wide Web Conference (WWW 2003), 20-24 May, Budapest, Hungary. ACM, 48–57.

Gruber, T. (1993) A Translation Approach to Portable Ontology Specifications. Knowledge Acquisition, 199–220.

Guinard, D., Trifa, V., Pham, T. and Liechti, O. (2009) Towards physical mashups in the Web of Things IN: Proceedings of the 6th International Conference on Networked Sensing Systems. Pittsburgh, USA. 17-19 June. IEEE, 1-4.

Guinard, D., Trifa, V. and Wilde, E. (2010) A resource oriented architecture for the Web of Things IN: Proceedings of the Internet of Things 2010 Conference. Tokyo, Japan. Nov 29. IEEE, 1-8.


Guo, Y., Pan, Z. and Heflin, J. (2005) LUBM: A benchmark for OWL knowledge based systems. Web Semantics: Science, Services and Agents on the World Wide Web, 3(2), 158–182.

Gupta, A., Mumick, I.S., Subrahmanian, V.S. (1993) Maintaining views incrementally. ACM SIGMOD Record, 22(2), 157–166.

Haarslev, V. and Möller, R. (2001) RACER System Description IN: Goré, R., Leitsch, A. and Nipkow, T. (eds.) Automated Reasoning (IJCAR 2001). LNCS, vol. 2083. Berlin/Heidelberg: Springer.

Hameed, B., Khan, I. and Durr, F. (2010) An RFID based consistency management framework for production monitoring in a smart real-time factory IN: Proceedings of the Internet of Things 2010 Conference, Tokyo, Japan. Nov 29. IEEE, 1-8.

Hammad, M., Aref, W., Franklin, M., Mokbel, M. and Elmagarmid, A. (2003) Efficient execution of sliding-window queries over data streams.

Hammad, M., Mokbel, M., Ali, M., Aref, W., Catlin, A., Elmagarmid, A., Eltabakh, M., Elfeky, M., Ghanem, T., Gwadera, R., Ilyas, I., Marzouk, M. and Xiong, X. (2004) Nile: A query processing engine for data streams IN: Proceedings of the 20th International Conference on Data Engineering. Boston, USA, Mar 30. IEEE, 851.

Halpern, J.Y. (2005) Reasoning about uncertainty. MIT Press.

Hanson, E. and Hasan, M. (1993) Gator: An optimized discrimination network for active database rule condition testing. University of Florida, Technical Report TR93-036.

Harth, A. and Decker, S. (2005) Optimized index structures for querying RDF from the web IN: Proceedings of the Third Latin American Web Congress (LA-WEB 2005), November. IEEE.

Hartig, O. and Heese, R. (2007) The SPARQL query graph model for query optimization. The Semantic Web: Research and Applications, 564–578.

Harris, S., Seaborne, A. and Prud’hommeaux, E. (2013) SPARQL 1.1 query language. W3C Recommendation. Available from: http://www.w3.org/TR/sparql11-query/ [Accessed 03 December 2016].

Hebeler, J., Fisher, M., Blace, R. and Perez-Lopez, A. (2009) Semantic Web Programming. 1st ed. USA: Wiley Publishing, Inc.

Heinze, T., Jerzak, Z., Hackenbroich, G. and Fetzer, C. (2014) Latency-aware elastic scaling for distributed data stream processing systems IN: Proceedings of the 8th ACM International Conference on Distributed Event-Based Systems, May. ACM, 13–22.

Hellerstein, J., Franklin, M., Chandrasekaran, S., Deshpande, A., Hildrum, K., Madden, S., Raman, V. and Shah, M. (2000) Adaptive query processing: Technology in evolution. IEEE Data Engineering Bulletin, 23(2), 7–18.

Hitzler, P., Krötzsch, M., Parsia, B., Patel-Schneider, P.F. and Rudolph, S. (2012) OWL 2 web ontology language primer. W3C recommendation. Available from: http://www.w3.org/TR/owl-primer/ [Accessed 05 December 2016].

Hoeksema, J. and Kotoulas, S. (2011) High-performance distributed stream reasoning using S4 IN: Proceedings of the First International Workshop on Ordering and Reasoning at ISWC 2011.

Horrocks, I. (2002) DAML+OIL: A Description Logic for the Semantic Web. IEEE Data Eng. Bull., 25(1), pp.4-9.

Horrocks, I., Patel-Schneider, P.F., Boley, H., Tabet, S., Grosof, B. and Dean, M. (2004) SWRL: A semantic web rule language combining OWL and RuleML. W3C Member Submission, 21 May 2004.


Ives, Z., Halevy, Y. and Weld, D. (2004) Adapting to source properties in processing data integration queries IN: Proceedings of ACM SIGMOD International Conference on Management of Data, Paris, France. Jun 13. New York, NY, USA: ACM, 395-406.

Jang, M. and Sohn, J. (2004) Bossam: An extended rule engine for OWL inferencing IN: Antoniou G. and Boley H. (eds.) Rules and Rule Markup Languages for the Semantic Web. Berlin/Heidelberg: Springer. 128-138.

Jara, A., Alcolea, A., Zamora, M., Skarmeta, A. and Alsaedy, M. (2010) Drug interaction checker based on IoT IN: Proceedings of the Internet of Things 2010 Conference, Tokyo, Japan. Nov 29. IEEE, 1-8.

Kabra, N. and DeWitt, D. (1998) Efficient mid-query re-optimization of sub-optimal query execution plans IN: Proceedings of the ACM SIGMOD International Conference on Management of Data. ACM, 106–117.

Kang, J., Naughton, J.F. and Viglas, S.D. (2003) Evaluating window joins over unbounded streams IN: Proceedings of the 19th International Conference on Data Engineering, March. IEEE, 341–352.

Kifer, M., Lausen, G. and Wu, J. (1995) Logical foundations of object-oriented and frame-based languages. Journal of ACM, 42(4), 741–843.

Kifer, M. and Boley, H. (2013) RIF overview. W3C working group note. Available from http://www.w3.org/TR/rif-overview/ [Accessed 8 January 2017].

Kipf, A. and Kemper, A. (2016) An Integration Platform for Temporal Geospatial Data. Digital Mobility Platforms and Ecosystems, p.207.

Komazec, S., Cerri, D. and Fensel, D. (2012) Sparkwave: Continuous schema-enhanced pattern matching over RDF data streams IN: Proceedings of the 6th ACM International Conference on Distributed Event-Based Systems, July. New York, USA: ACM, 58–68.

Kossmann, D. (2000) The state of the art in distributed query processing. ACM Computing Surveys (CSUR), 32(4), 422–469.

Krämer, J. and Seeger, B. (2005) A temporal foundation for continuous queries over data streams IN: Proceedings of COMAD, 6-8 January. Goa, India, Computer Society of India 70–82.

Leone, N., Pfeifer, G., Faber, W., Eiter, T., Gottlob, G., Perri, S. and Scarcello, F. (2006) The DLV system for knowledge representation and reasoning. ACM Transactions on Computational Logic (TOCL), 7(3), 499–562.

Le-Phuoc, D., Dao-Tran, M., Xavier Parreira, J. and Hauswirth, M. (2011) A native and adaptive approach for unified processing of linked streams and linked data IN: Aroyo, L., Welty, C., Alani, H., Taylor, J., Bernstein, A., Kagal, L., Noy, N. and Blomqvist, E. (eds.) ISWC (2011), Part I. LNCS, vol. 7031. Heidelberg: Springer, 370–388.

Le-Phuoc, D., Parreira, J. and Hauswirth, M. (2012a) Linked stream data processing IN: Reasoning web: Semantic technologies for advanced query answering, Heidelberg: Springer, 245–289.

Le-Phuoc, D., Nguyen-Mau, H.Q., Parreira, J.X. and Hauswirth, M. (2012b) A middleware framework for scalable management of linked streams. Web Semantics: Science, Services and Agents on the World Wide Web, 16, 42–51.

Le-Phuoc, D., Dao-Tran, M., Pham, M., Boncz, P., Eiter, T. and Fink, M. (2012c) Linked stream data processing engines: Facts and figures IN: The Semantic Web–ISWC 2012. Heidelberg: Springer, 300–312.


Le-Phuoc, D., Quoc, H.N.M., Le Van, C. and Hauswirth, M. (2013) Elastic and scalable processing of linked stream data in the cloud IN: The Semantic Web–ISWC 2013. Berlin/Heidelberg: Springer, 280–297.

Lerner, A. and Shasha, D. (2003) AQuery: Query language for ordered data, optimization techniques, and experiments IN: Proceedings of the 29th International Conference on Very Large Data Bases. VLDB Endowment, 345–356.

Liebig, T. and Opitz, M. (2011) Reasoning over dynamic data in expressive knowledge bases with rscale IN: The 10th International Semantic Web Conference, October 24. Bonn, Germany.

Lucas, P. and Van Der Gaag, L. (1991) Principles of expert systems. Wokingham: Addison-Wesley.

Margara, A., Urbani, J., van Harmelen, F. and Bal, H. (2014) Streaming the web: Reasoning over dynamic data. Web Semantics: Science, Services and Agents on the World Wide Web, 25, 24–44.

Matheus, C., Dionne, B., Parent, D., Baclawski, K. and Kokar, M. (2006) BaseVISor: A forward-chaining inference engine optimized for RDF/OWL triples IN: Digital Proceedings of the 5th International Semantic Web Conference (ISWC 2006), November 2006, Athens, GA, 263–271.

Manning, C., Raghavan, P. and Schutze, H. (2008) Introduction to Information Retrieval. Cambridge University Press.

McBride, B. (2002) Jena: A semantic web toolkit. IEEE Internet Computing, 6(6), 55–59.

Mehlhorn, K. and Sanders, P. (2008) Algorithms and data structures: The basic toolbox. Springer Science & Business Media.

Metakides, G. and Nerode, A. (1996) Principles of logic and logic programming. Elsevier.

Mileo, A., Abdelrahman, A., Policarpio, S. and Hauswirth, M. (2013) Streamrule: A nonmonotonic stream reasoning system for the semantic web IN: Web Reasoning and Rule Systems. Berlin/Heidelberg: Springer, 247–252.

Minsky, M. (1975) A Framework for Representing Knowledge, IN: Winston, P.H. (ed.), The Psychology of Computer Vision, McGraw-Hill, New York.

Miranker, D. (1987) Treat: A better match algorithm for AI production system matching IN: Proceedings of AAAI, California, USA. AAAI, 42–47.

Mishra, P. and Eich, M.H. (1992) Join processing in relational databases. ACM Computing Surveys (CSUR), 24(1), 63–113.

Motwani, R., Widom, J., Arasu, A., Babcock, B., Babu, S., Datar, M., Manku, G., Olston, C., Rosenstein, J. and Varma, R. (2003, January) Query processing, resource management, and approximation in a data stream management system. CIDR.

Myung, D., Duncan, B., Malan, D., Welsh, M., Gaynor, M. and Moulton, S. (2002) Vital dust: Wireless sensors and a sensor network for real-time patient monitoring IN: 8th Annual New England Regional Trauma Conference, Burlington, MA.

Nayak, P., Gupta, A. and Rosenbloom, P. (1988) Comparison of the Rete and Treat Production Matches for Soar IN: Proceedings of AAAI-88, 693–699.

Neumann, T. and Moerkotte, G. (2011) Characteristic sets: Accurate cardinality estimation for RDF queries with multiple joins IN: Proceedings of the 27th IEEE International Conference on Data Engineering (ICDE 2011), April, Vancouver. IEEE, 984–994.


Newell, A. (1973). Production systems: models of control structures, in: Chase, W.G. (ed.), Visual Information Processing, Academic Press, New York.

Ngai, E., Moon, K., Riggins, J. and Candace, Y. (2008) RFID research: An academic literature review (1995–2005) and future research directions. International Journal of Production Economics, 112(2), 510–520.

Oliya, M., Zhu, J., Pung, H. and Veillard, A. (2011) Incremental query answering over dynamic contextual information IN: Tools with Artificial Intelligence (ICTAI), 23rd IEEE International Conference, Nov 7. IEEE, 452–455.

Paganelli, F. and Giuli, D. (2007) An ontology-based context model for home health monitoring and alerting in chronic patient care networks IN: 21st International Conference on Advanced Information Networking and Applications Workshops (AINAW'07), Vol. 2, May. IEEE, 838–845.

Papataxiarhis, V., Tsetsos, V., Karali, I., Stamatopoulos, P. and Hadjiefthymiades, S. (2009) Developing rule-based applications for the web: Methodologies and tools IN: Giurca, A., Gasevic, D. and Taveter, K. (eds.) Handbook of research on emerging rule-based languages and technologies: Open solutions and approaches. Information Science Reference. IGI Global.

Patel-Schneider, P., Hayes, P., Horrocks, I. and van Harmelen. F. (2004) OWL web ontology language; semantics and abstract syntax. W3C recommendation. Available from: http://www.w3.org/TR/owl-semantics/ [Accessed 8 July 2014].

Pérez, J., Arenas, M. and Gutierrez, C. (2006, November). Semantics and Complexity of SPARQL. In International semantic web conference, 30-43. Springer Berlin Heidelberg.

Polleres, A., Boley, H. and Kifer, M., (2013) RIF datatypes and built-ins 1.0. W3C recommendation. Available from: https://www.w3.org/TR/rif-dtb/ [Accessed 8 January 2017].

Raskin, R.G. and Pan, M.J. (2005) Knowledge representation in the semantic web for Earth and environmental terminology (SWEET). Computers & geosciences, 31(9), 1119–1125.

Ren, Y., Pan, J.Z. and Zhao, Y. (2010) Towards scalable reasoning on ontology streams via syntactic approximation. Proc. of IWOD, Aug 23. Shanghai, China.

Ren, Y. and Pan, J.Z. (2011) Optimising ontology stream reasoning with truth maintenance systems IN: Proceedings of the 20th ACM International Conference on Information and Knowledge Management, October. ACM, 831–836.

Reynolds, D. (2013) OWL 2 RL in RIF. W3C Recommendation. Available from: http://www.w3.org/TR/rif-owl-rl/ [Accessed 10 March 2016].

Rinne, M., Nuutila, E. and Törmä, S. (2012a) Instans: High-performance event processing with standard RDF and SPARQL IN: Proceedings of the 11th International Semantic Web Conference (ISWC 2012), November. CEUR-WS.org, 101.

Rinne, M., Abdullah, H., Törmä, S. and Nuutila, E. (2012b) Processing heterogeneous RDF events with standing SPARQL update rules IN: On the Move to Meaningful Internet Systems: OTM 2012. Berlin/Heidelberg: Springer, 797–806.

Saint-Andre, P. (2004) Extensible Messaging and Presence Protocol (XMPP): Core, RFC 3920, The Internet Society. Available from: http://tools.ietf.org/html/rfc3920 [Accessed 12 July 2011].

Sakaki, T., Okazaki, M. and Matsuo, Y. (2010) Earthquake shakes Twitter users: Real-time event detection by social sensors IN: Proceedings of the 19th International Conference on World Wide Web, April. ACM, 851–860.

Scales, J. (1986) Efficient matching algorithms for the SOAR/OPS5 production system. Computer Science Dept., Knowledge Systems Laboratory: Stanford University, Technical Report KSL 86–47.


Scharrenbach, T., Urbani, J., Margara, A., Valle, E. D., Bernstein, A. (2013) Seven Commandments for Benchmarking Semantic Flow Processing Systems IN: Proceedings of the 10th International Conference ESWC 2013. Vol. 7882, 305–319.

Schleier-Smith, J., Krogen, E.T. and Hellerstein, J.M. (2016, December) ReStream: Accelerating Backtesting and Stream Replay with Serial-Equivalent Parallel Processing. In Proceedings of the Seventh ACM Symposium on Cloud Computing. ACM, 334–347.

Schmidt, S. (2006) Quality-of-service-aware data stream processing. DEng thesis. TU Dresden. Available from: http://www.qucosa.de/fileadmin/data/qucosa/documents/1895/1174255992231-5474.pdf [Accessed 23 January 2017].

Schmidt, M., Meier, M. and Lausen, G. (2010) Foundations of SPARQL query optimization IN: Proceedings of the 13th International Conference on Database Theory, March. ACM, 4–33.

Schulzrinne, H., Rao, A. and Lanphier, R. (1998) Real Time Streaming Protocol (RTSP), RFC 2326, IETF.

Selinger, P., Astrahan, M., Chamberlin, D., Lorie, R. and Price, T. (1979) Access path selection in relational database systems IN: Proceedings of SIGMOD. Boston, Mass., USA, 23–34.

Sequeda, J. and Corcho, O. (2009) Linked stream data: A position paper IN: Proceedings of the 2nd International Workshop on Semantic Sensor Networks (SSN 09), Washington, USA.

Shah, M., Hellerstein, J., Chandrasekaran, S. and Franklin, M. (2003) Flux: An adaptive partitioning operator for continuous query systems IN: Proceedings of the 19th International Conference on Data Engineering (ICDE 2003), March. IEEE, 25–36.

Shah, M.A., Hellerstein, J.M. and Brewer, E. (2004, June) Highly available, fault-tolerant, parallel dataflows. In Proceedings of the 2004 ACM SIGMOD international conference on Management of data. ACM, 827–838.

Sheth, A., Henson, C. and Sahoo, S. (2008) Semantic sensor web. IEEE Internet Computing 12(4), 78–83.

Shivaratri, N., Krueger, P. and Singhal, M. (1992) Load distributing for locally distributed systems. Computer, 25(12), 33–44.

Shortliffe, E.H. (1976) Computer-based Medical Consultations: MYCIN, Elsevier, New York.

Singh, S. and Karwayun, R. (2010) A comparative study of inference engines IN: Seventh International Conference on Information Technology: New Generation, Seventh International Conference, Apr 12. IEEE, 53-57.

Sirin, E., Parsia, B., Grau, B., Kalyanpur, A. and Katz, Y. (2007) Pellet: A practical OWL-DL reasoner. Journal of Web Semantics, 5(2), 51–53.

Sowa, J.F. ed. (2014) Principles of semantic networks: Explorations in the representation of knowledge. Morgan Kaufmann.

Stocker, M., Seaborne, A., Bernstein, A., Kiefer, C. and Reynolds, D. (2008, April) SPARQL basic graph pattern optimization using selectivity estimation. In Proceedings of the 17th international conference on World Wide Web, ACM, 595–604.

Stonebraker, M., Çetintemel, U. and Zdonik, S. (2005) The 8 requirements of real-time stream processing. ACM SIGMOD Record, 34(4), 42–47.

Stuckenschmidt, H., Ceri, S., Della Valle, E., van Harmelen, F. and di Milano, P. (2010) Towards expressive stream reasoning IN: Proceedings of the Dagstuhl Seminar on Semantic Aspects of Sensor Networks.


Sullivan, M. (1996) Tribeca: A stream database manager for network traffic analysis IN: Proceedings of VLDB 1996, September, 594.

Sullivan, M. and Heybey, A. (1998) Tribeca: A System for Managing Large Databases of Network Traffic IN: Proceedings of the USENIX Annual Technical Conference.

Sundmaeker, H., Guillemin, P., Friess, P. and Woelffle, S. (2010) Vision and challenges for realising the Internet of Things. Cluster of European Research Projects on the Internet of Things, European Commission. Brussels.

Taylor, K., Gledhill, R., Essex, J., Frey, J., Harris, S. and De Roure, D. (2006) Bringing chemical data onto the semantic web. Journal of Chemical Information and Modeling, 46(3), 939–952.

Terry, D., Goldberg, D., Nichols, D. and Oki, B. (1992) Continuous queries over append-only databases IN: Proceedings of the ACM SIGMOD International Conference on Management of Data. ACM, 321–330.

Tielert, T., Killat, M., Hartenstein, H., Luz, R., Hausberger, S. and Benz, T. (2010) The impact of traffic-light-to-vehicle communication on fuel consumption and emissions IN: Proceedings of the Internet of Things 2010 Conference. Tokyo, Japan. Nov 29. IEEE, 1-8.

Tsarkov, D. and Horrocks, I. (2006) FaCT++ Description logic reasoner: System description IN: Furbach, U. and Shankar, N. (eds.) Automated Reasoning. Berlin/Heidelberg, Springer.

Tsialiamanis, P., Sidirourgos, L., Fundulaki, I., Christophides, V. and Boncz, P. (2012) Heuristics-based query optimisation for SPARQL IN: Proceedings of the 15th International Conference on Extending Database Technology. ACM, 324–335.

Urbani, J., Kotoulas, S., Oren, E. and van Harmelen, F. (2009) Scalable distributed reasoning using mapReduce IN: Bernstein, A., Karger, D.R., Heath, T., Feigenbaum, L., Maynard, D., Motta, E. and Thirunarayan, K. (eds.) ISWC 2009. LNCS, vol. 5823. Heidelberg: Springer, 634–649.

Urhan, T. and Franklin, M. (2000) Xjoin: A reactively scheduled pipelined join operator. IEEE Data Eng. Bull. 23(2), 27–33.

Van Rijsbergen, C. (1979) Information Retrieval. London: Butterworths. Available at: http://www.dcs.gla.ac.uk/Keith/Chapter.7/Ch.7.html [Accessed 24 July 2013].

Vazquez, J. and Lopez-de-Ipina, D. (2008) Social devices: Autonomous artefacts that communicate on the Internet IN: Proceedings of the Internet of Things 2008 Conference. Zurich, Switzerland. Springer Berlin Heidelberg, 308-324.

Vermesan, O., Friess, P., Guillemin, P., Gusmeroli, S., Sundmaeker, H., Bassi, A., Jubert, I.S., Mazura, M., Harrison, M., Eisenhauer, M. and Doody, P. (2011) Internet of things strategic research roadmap. Internet of Things-Global Technological and Societal Trends, 1, 9–52.

Viglas, S.D. and Naughton, J.F. (2002, June) Rate-based query optimization for streaming information sources IN: Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data, June 3. ACM, 37–48.

Viglas, S., Naughton, J. and Burger, J. (2003) Maximizing the output rate of multi-way join queries over streaming information sources IN: Proceedings of the 29th International Conference on Very Large Data Bases, Sep 9. Berlin, Germany. VLDB Endowment, 285-296.

Vila, L. (1994) A survey on temporal reasoning in artificial intelligence. AI Communications, 7(1), 4–28.

Vitter, J. (1999) External memory algorithms and data structures IN: James, M. and Jeffrey Scott, V. (eds.) External memory algorithms. American Mathematical Society.


Volz, R., Staab, S. and Motik, B. (2005) Incrementally maintaining materializations of ontologies stored in logic databases IN: Journal on Data Semantics II. Berlin/Heidelberg: Springer, 1–34.

Walavalkar, O., Joshi, A., Finin, T. and Yesha, Y. (2008) Streaming knowledge bases IN: International Workshop on Scalable Semantic Web Knowledge Base Systems, Oct.

Wei, C. and Li, Y. (2011, September) Design of energy consumption monitoring and energy-saving management system of intelligent building based on the Internet of things. In 2011 International Conference on Electronics, Communications and Control (ICECC), IEEE, 3650–3652.

Whitehouse, K., Zhao, F. and Liu, J. (2006) Semantic streams: A framework for composable semantic interpretation of sensor data IN: Römer, K., Karl, H. and Mattern, F. (eds.) EWSN 2006, LNCS, vol. 3868. Heidelberg: Springer, 5–20.

Wilschut, A. and Apers, P. (1993) Dataflow query execution in a parallel main memory environment. Distributed Parallel Databases, 1(1), 103–128.

Wingerath, W., Gessert, F., Friedrich, S. and Ritter, N. (2016) Real-time stream processing for Big Data. it-Information Technology, 58(4), 186–194.

Woods, W. (1975) What's in a Link: Foundations for Semantic Networks, IN: Bobrow, D.G. and Collins, A.M. (eds.), Representation and Understanding: Studies in Cognitive Science, Academic Press, New York, 35–82.

Wright, I. and Marshall, J. (2003) The execution kernel of RC++: RETE*, a faster RETE with TREAT as a special case. International Journal of Intelligent Games and Simulation, 2(1), 36–48.

Wu, E., Diao, Y. and Rizvi, S. (2006) High-performance complex event processing over streams IN: Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data (SIGMOD '06). New York, NY, USA: ACM, 407–418.

Wygant, R.M. (1989) CLIPS—A powerful development and delivery expert system tool. Computers & industrial engineering, 17(1-4), 546–549.

Xing, Y., Zdonik, S. and Hwang, J. (2005) Dynamic load distribution in the borealis stream processor IN: Proceedings of the International Conference on Data Engineering (ICDE'05), Apr 5. IEEE, 791-802.

Yu, P., Chen, M.S., Wolf, J. and Turek, J. (1993) Parallel query processing. Advanced Database Systems, 229–258.

Zaharia, M., Chowdhury, M., Das, T., Dave, A., Ma, J., McCauley, M., Franklin, M.J., Shenker, S. and Stoica, I. (2012) Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing IN: Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation, April. USENIX Association.

Zaharia, M., Das, T., Li, H., Hunter, T., Shenker, S. and Stoica, I. (2013, November) Discretized streams: Fault-tolerant streaming computation at scale. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles. ACM, 423–438.

Zeitler, E. and Risch, T. (2011) Massive Scale-out of Expensive Continuous Queries. Proceedings of the VLDB Endowment, 4(11).

Zhang, Y., Duc, P., Corcho, O. and Calbimonte, J. (2012) SRBench: A streaming RDF/SPARQL benchmark IN: The Semantic Web–ISWC 2012. Heidelberg: Springer, 641–657.
