Mining moving flock patterns in large spatio-temporal datasets using a frequent pattern mining approach

Andres Oswaldo Calderon Romero
March 2011

Course Title: Geo-Information Science and Earth Observation for Environmental Modelling and Management
Level: Master of Science (MSc.)
Course Duration: September 2009 – March 2011
Consortium partners:
University of Southampton (UK)
Lund University (Sweden)
University of Warsaw (Poland)
University of Twente, Faculty ITC (The Netherlands)
GEM thesis number: 2011–

Mining moving flock patterns in large spatio-temporal datasets using a frequent pattern mining approach

by Andres Oswaldo Calderon Romero

Thesis submitted to the University of Twente, Faculty ITC, in partial fulfilment of the requirements for the degree of Master of Science in Geo-information Science and Earth Observation for Environmental Modelling and Management.

Thesis Assessment Board
Chairman: Prof. Dr. Menno-Jan Kraak
External Examiner: Dr. Jadu Dash
First Supervisor: Dr. Otto Huisman
Second Supervisor: Dr. Ulanbek Turdukulov

Disclaimer

This document describes work undertaken as part of a programme of study at the University of Twente, Faculty ITC. All views and opinions expressed therein remain the sole responsibility of the author, and do not necessarily represent those of the university.

Abstract

Modern data acquisition techniques such as the Global Positioning System (GPS), radio-frequency identification (RFID) and mobile phones have resulted in the collection of huge amounts of data in the form of trajectories in recent years. The popularity of these technologies and the ubiquity of mobile devices suggest that the amount of spatio-temporal data will increase at accelerated rates in the future. Many previous studies have focused on efficient techniques to store and query trajectory databases. Early approaches to recovering information from this kind of data include single predicate range and nearest neighbour queries.
However, they are unable to capture collective behaviour and correlations among moving objects. Recently, a new interest in querying patterns that capture ‘group’ or ‘common’ behaviours has emerged. One example of this type of pattern is the moving flock, defined as a group of moving objects that move together (within a predefined distance of each other) for a certain continuous period of time.

Current algorithms to discover moving flock patterns have problems with scalability and with the way the discovered patterns are reported. The field of frequent pattern mining has faced similar problems during the past decade, and has sought to provide efficient and scalable techniques which successfully deal with those issues. This research proposes a framework which integrates techniques for clustering, pattern mining detection, postprocessing and visualization in order to discover and analyse moving flock patterns in large trajectory datasets.

The proposed framework was tested and compared with a current method (the BFE algorithm). Synthetic datasets simulating trajectories generated by a large number of moving objects were used to test the scalability of the framework. Real datasets from different contexts and with different characteristics were used to assess the performance and analyse the discovered patterns. The framework proves to be efficient, scalable and modular. This research shows that moving flock patterns can be generalized as frequent patterns and that state-of-the-art algorithms for frequent pattern mining can be used to detect them. It also develops a preliminary visualization of the most relevant findings. Appropriate interpretation of the results demands further analysis in order to display the most relevant information.

Keywords: Frequent pattern mining, Flock patterns, Trajectory datasets.

Acknowledgements

I would like to express my sincere gratitude to my first supervisor, Dr. Otto Huisman, and second supervisor, Dr.
Ulanbek Turdukulov, for their great support and guidance during this research. I think I was the most fortunate student for having the chance to work with such great scientists. I very much appreciate your support, critical comments and suggestions. Thank you so much!!!

I would also like to thank Petter Pilesjo, Malgorzata Roge-Wisniewska, Andre Kooiman and Louise van Leeuwen for their valuable help at different stages of my studies.

A special “Thank you!!!” goes to all my GEM friends for the wonderful time we had together. You were my second family during the past months and I will never forget you. I will miss you a lot.

I would like to dedicate this thesis to my parents, Marcelo and Esperanza, my brother and sisters, Carlos, Paola and Carolina, and my little nephew and niece, Chris and Gabi. Thank you for believing in me even when I found it difficult to believe in myself. I owe you much more than this.

Finally, I want to thank my fiancee. Nancy, you are the love of my life. Thank you for all your infinite love, support and patience during all this time. I love you!!!

Contents

1 Introduction
1.1 Background
1.2 Problem statement
1.3 Research identification
1.3.1 Research objectives
1.3.2 Research questions
1.3.3 Innovation aimed at
1.3.4 Related work
1.4 Thesis structure
2 Framework Definition
2.1 Identifying patterns in moving objects
2.2 Basic Flock Pattern algorithm
2.3 Finding frequent patterns in traditional databases
2.3.1 Shopping basket analysis: an example
2.3.2 Maximal and Closed frequent patterns
2.4 Proposed Framework
2.4.1 Getting a final set of disks per timestamp
2.4.2 From trajectories to transactions
2.4.3 Frequent Pattern Mining Algorithms
2.4.4 Postprocessing Stage
2.5 Flock Interpretation
3 Implementation
3.1 BFE Implementation
3.2 Synthetic Generators
3.3 Synthetic Datasets
3.4 Internal Comparison
3.5 Framework Implementation
3.6 Computational Experiments
3.7 Validation
4 Study Cases
4.1 Tracking Icebergs in Antarctica
4.1.1 Implications and possible applications
4.1.2 Data cleaning and preparation
4.1.3 Computational experiments
4.1.4 Results
4.1.5 Findings in iceberg tracking
4.2 Pedestrian movement in Beijing
4.2.1 Implications and possible applications
4.2.2 Data cleaning and preparation
4.2.3 Computational experiments
4.2.4 Results
4.2.5 Findings in pedestrian movement
5 Discussion
5.1 Implementation and Performance Issues
5.1.1 Impact of size trajectory
5.1.2 Possible solutions
5.2 Interpretation Issues
5.2.1 Number of patterns and quality of the results
5.2.2 Overlapping problem and alternatives
6 Conclusions and Recommendations
6.1 Summary of the Research
6.2 Recommendation
References
Appendices
A Main source code of the framework implementation

List of Figures

1.1 A flock pattern example: {T1, T2, T3}. Ti illustrates different trajectories, ci encloses a disk in which trajectories are considered close to each other and ti represents consecutive time intervals (after [82]).
2.1 BFE algorithm for computing the set of final disks for each timestamp and for joining and reporting the final flock patterns (source: [82]).
2.2 BFE pruning stages. (a) The initial set of disks. (b) Just disks which overpass μ are retained (μ = 3). (c) Redundant disks with subset members are removed.
2.3 Shopping Basket Analysis example (source: [33]).
2.4 A trajectory dataset example.
2.5 Example of a flock where different interpretations can apply.
3.1 Oldenburg network representation.
3.2 San Joaquin network representation.
3.3 Comparison of internal execution time for the SJ25KT60 dataset.
3.4 Comparison of internal execution time for the SJ50KT55 dataset.
3.5 Systematic diagram for the proposed framework.
3.6 Overlapping problem during the generation of final disks.
3.7 Performance of the BFE algorithm and the proposed framework with different values for ε in the SJ25KT60 dataset. The additional parameters were set as μ = 5 and δ = 3.
3.8 Performance of the BFE algorithm and the proposed framework with different values for ε in the SJ50KT55 dataset. The additional parameters were set as μ = 9 and δ = 3.
3.9 Visualization of the results from BFE (left) and the proposed framework (right). BFE displays 448 flocks while the proposed framework displays 104.
4.1 Reported positions for all icebergs in the Iceberg dataset (1978, 1992-2009).
4.2