Interactive Event-Driven Knowledge Discovery from Data Streams
UC Irvine
UC Irvine Electronic Theses and Dissertations

Title: Interactive Event-driven Knowledge Discovery from Data Streams
Permalink: https://escholarship.org/uc/item/8bc5k0j3
Author: Jalali, Laleh
Publication Date: 2016
License: https://creativecommons.org/licenses/by/4.0/ (CC BY 4.0)
Peer reviewed | Thesis/dissertation

UNIVERSITY OF CALIFORNIA, IRVINE

Interactive Event-driven Knowledge Discovery from Data Streams

DISSERTATION

submitted in partial satisfaction of the requirements for the degree of

DOCTOR OF PHILOSOPHY in Computer Science

by Laleh Jalali

Dissertation Committee:
Professor Ramesh Jain, Chair
Professor Gopi Meenakshisundaram
Professor Nalini Venkatasubramanian

2016

© 2016 Laleh Jalali

TABLE OF CONTENTS

LIST OF FIGURES
LIST OF TABLES
LIST OF ALGORITHMS
ACKNOWLEDGMENTS
CURRICULUM VITAE
ABSTRACT OF THE DISSERTATION

1 Introduction
  1.1 Data-driven vs. Hypothesis-driven
  1.2 Knowledge Discovery from Temporal Data
  1.3 Contributions
  1.4 Thesis Outline

2 Understanding Knowledge Discovery Process
  2.1 Knowledge Discovery Definitions
  2.2 New Paradigm: Actionable Knowledge Extraction
  2.3 Data Mining and Knowledge Discovery Software Tools
  2.4 Knowledge Discovery in Healthcare Applications
  2.5 Design Requirements for an Interactive Knowledge Discovery Framework

3 Literature Review
  3.1 Temporal Data Mining
    3.1.1 Definitions and Concepts
    3.1.2 Pattern Discovery
    3.1.3 Temporal Association Rules
    3.1.4 Time Series Data Mining
    3.1.5 Temporal Classification and Clustering
  3.2 Temporal Reasoning
    3.2.1 Interval-based Temporal Logic
    3.2.2 Event Calculus
    3.2.3 Situation Calculus
  3.3 Qualitative Models and Qualitative Reasoning
    3.3.1 Qualitative Models
    3.3.2 Qualitative Simulation
    3.3.3 Qualitative Data Mining

4 Data Model and Pattern Operators
  4.1 Physical-World vs. Cyber World
    4.1.1 Events Perception in Human
    4.1.2 Events in Cyber World
    4.1.3 Bridge the Semantic Gap
  4.2 Time Model
  4.3 Event Model
  4.4 Hypothesis-Driven Pattern Operators
    4.4.1 Selection Operation
    4.4.2 Sequence Operation (ρ1; ρ2)
    4.4.3 Conditional Sequence Operation
    4.4.4 Concurrency Operation
    4.4.5 Alternation
    4.4.6 Time
  4.5 Data-driven Operators
    4.5.1 Sequential Co-occurrence SEQ CO[∆t](ES; ES′)
    4.5.2 Concurrent Co-occurrence CON CO(ES; ES′)

5 Overall Framework
  5.1 Interactive Knowledge Discovery and Data Mining Process
  5.2 Re-visiting Design Principles
    5.2.1 Human-centered Analysis
    5.2.2 Expressiveness of the Pattern Query Language
    5.2.3 Interactive Modeling Approach
    5.2.4 Extensibility
    5.2.5 Result Interpretation
  5.3 General System Architecture
  5.4 Pattern Formulation and Query Language
    5.4.1 Automata Model for Pattern Formulation
  5.5 Graphical User Interface

6 Significant Pattern Extraction
  6.1 Co-occurrence Patterns
  6.2 Processing Algorithms
    6.2.1 Sequential Pattern Mining
    6.2.2 Conditional Sequential Pattern Mining
    6.2.3 Concurrent Pattern Mining
  6.3 Visual Analytics Process
  6.4 Simulation Results

7 Objective Self
  7.1 Introduction
  7.2 Toward Objective Self
    7.2.1 Anecdotal Self
    7.2.2 Diarizing Self
    7.2.3 Quantified Self
    7.2.4 Objective Self Has Arrived
  7.3 An Architecture for Objective Self
  7.4 Life Events
    7.4.1 Life Log
    7.4.2 Life Event Recognition
    7.4.3 Formal Concept Analysis
  7.5 Frequent Behavior Pattern Extraction
    7.5.1 Co-occurrence Behavior Patterns
    7.5.2 Processing Co-occurrence Patterns
  7.6 Evaluation
    7.6.1 Data Collection
    7.6.2 Sequential Co-occurrence: Commute Behavior and Activity Trends
    7.6.3 Concurrent Co-occurrence: Multitasking Behavior
    7.6.4 Patterns Across a Group of Users
    7.6.5 The Effect of Environmental Factors on Behavior

8 Asthma Risk Management
  8.1 Introduction
  8.2 Motivation
  8.3 Related Work in Asthma Risk Factor Prediction
  8.4 Approach
  8.5 Data Pre-processing
    8.5.1 Topic Modeling
    8.5.2 Environmental Event Stream Modeling
  8.6 Experiments
    8.6.1 Data-driven Risk Factor Recognition
    8.6.2 Interactive Asthma Risk Factor Assessment

9 Conclusion and Future Work

Bibliography

LIST OF FIGURES

1.1 From data to abstractions with model building process. Models are used not only for prediction but for understanding and explaining.
1.2 The cycle of knowledge discovery.
2.1 The process of Knowledge Discovery in Databases.
2.2 Interactive Knowledge Discovery (IKDD) process.
3.1 Categorization of input data type and temporal data mining algorithms.
3.2 Taxonomy of temporal data mining.
3.3 Qualitative reasoning in action.
3.4 A qualitative tree induced from a set of examples for the function z = x^2 - y^2. The rightmost leaf, applying when attributes x and y are positive, says that z is strictly increasing in its dependence on x and strictly decreasing in its dependence on y [128].
3.5 The graphs present the data and the Q2Q-learned regression functions based on two different qualitative explanations of the data. Left, the case with a three-leaf qualitative tree; right, the case with a single-leaf qualitative tree saying y = M^-(x) [129].
4.1 Interaction between the physical world and the cyber world. Sensors act as the interface between these two worlds.
Objects and events are recognized in the cyber world, and effective models are built by understanding relations between cyber events.
4.2 Sample event media JSON for jogging and meeting events. Although events have general temporal, spatial, informational, structural, and experiential facets, their informational properties vary between different events.
4.3 Allen's interval relations between the intervals X and Y.
4.4 Eleven semi-interval relationships. Question marks (?) in the pictorial illustration stand for either the symbol denoting the event depicted in the same line (X or Y) or for a blank. The number of question marks reflects the number of qualitatively alternative implementations of the given relation [46].
4.5 (a) Example encoding of a sequence of events. E1^+ and E1^- represent the start and end times of event E1, respectively. Relational operators are used to indicate the ordered relations between start/end times. (b) Example of encoding a multi-event stream from two sequences of events.
4.6 Sample event streams ES(1), ES(2), and ES(3) and their corresponding event types. Patterns 1 and 2 are conditional sequential patterns, each with two occurrences.
5.1 Interactive Knowledge Discovery/Data Mining Process.
5.2 High-level architecture of the framework.
5.3 Basic building blocks of FSA in a high-level pattern formulation and query language.
5.4 The automaton corresponding to pattern ρ1 with 3 event components. It demonstrates 3 ordinary states, 2 time states, and the EVALUATE() and SET() functions associated with each state.