Discovering Contiguous Sequential Patterns in Network-Constrained Movement

Discovering Contiguous Sequential Patterns in Network-Constrained Movement CAN YANG Licentiate Thesis Stockholm, Sweden, 2017 TRITA SOM 2017-13 Division of Geoinformatics, ISRN KTH/SoM/2017-13/SE Royal Institute of Technology (KTH) ISBN 978-91-7729-589-1 SE-100 44 Stockholm © Can Yang, 2017 Tryck: Universitetsservice US AB iii Abstract A large proportion of movement in urban area is constrained to a road network such as pedestrian, bicycle and vehicle. That movement information is commonly collected by Global Positioning System (GPS) sensor, which has generated large collections of trajectories. A contiguous sequential pattern (CSP) in these trajectories represents a certain number of objects traversing a sequence of spatially contiguous edges in the network, which is an intuitive way to study regularities in network-constrained movement. CSPs are closely related to route choices and trafficflows and can be useful in travel demand modeling and transportation planning. However, the efficient and scalable extraction of CSPs and effective visualization of the heavily overlapping CSPs are remaining challenges. To address these challenges, the thesis develops two algorithms and a visual analytics system. Firstly, a fast map matching (FMM) algorithm is designed for matching a noisy trajectory to a sequence of edges traversed by the object with a high performance. Secondly, an algorithm called bidirectional pruning based closed contiguous sequential pattern mining (BP-CCSM) is de- veloped to extract sequential patterns with closeness and contiguity constraint from the map matched trajectories. Finally, a visual analytics system called sequential pattern explorer for trajectories (SPET) is designed for interactive mining and visualization of CSPs in a large collection of trajectories. Extensive experiments are performed on a real-world taxi trip GPS dataset to evaluate the algorithms and visual analytics system. The results demon- strate that FMM achieves a superior performance by replacing repeated routing queries with hash table lookups. BP-CCSM considerably outperforms three state-of-the-art algorithms in terms of running time and memory con- sumption. SPET enables the user to efficiently and conveniently explore spa- tial and temporal variations of CSPs in network-constrained movement. Keywords: map matching, trajectory pattern mining, closed contiguous sequential pattern mining, trajectory pattern visualization iv Sammanfattning En stor del av rörelser i urbana områden är begränsade till ett nätverk av vägar för fotgängare, cyklar och motorfordon. Information om rörelserna samlas vanligtvis av Global Positioning System (GPS) sensorer, vilka genere- rar stora mängder av rörelsebanor. Ett sammanhängande sekventiellt mönster (CSP) i dessa banor representerar ett visst antal objekt som rör sig utmed en sekvestill challengingns av rumsligt sammanhängande kanter i vägnätet, vilket är ett intuitivt sätt att studera regelbundenhet i nätverksbegränsad rörelse. CSPer är nära besläktat med ruttval och trafikflöde, och kan vara användbart i trafikmodellering och transportplanering. Effektiv och skalbar extraktion av CSPer och effektiv visualisering av deras överlappande är problem som återstår att lösas. För att lösa dessa problem utvecklas i avhandlingen två algoritmer och ett system för visuell analys. För det första utvecklas en algoritm för snabb matchning (FMM) för att med hög prestanda kunna omforma en oprecis bana till en sekvens av vägnätskanter som passeras av objektet. För det and- ra utvecklas en sluten sammanhängande sekventiell mönsterutvinningsalgo- ritm baserad på bidirektionell beskärning (BP-CCSM) för att kunna utvinna sekventiella mönster med slutenhets- och kontinuitetsbegränsningar från de map-matchade rörelsebanorna. Slutligen utformas ett visuellt analyssystem som kallas sekventiell mönsterutforskare för rörelsebanor (SPET) för interak- tiv utvinning och visualisering av CSPer från en stor mängd av rörelsebanor. Omfattande experiment utförs genom att använda ett GPS data set med verkliga taxiresor för att utvärdera algoritmerna och det visuella analyssy- stemet. Resultaten visar att FMM uppnår överlägsen prestanda genom att ersätta upprepade routing queries med lookups i en hash-tabell. BP-CCSM visar sig vara betydligt bättre än tre state-of-the-art algoritmer med avseende på körtid och minneskonsumtion. SPET gör det möjligt för användaren att effektivt utforska rumsliga och tidsmässiga variationer av CSPer i nätverks- begränsad rörelse. v List of Papers I Can Yang and Győző Gidófalvi (2017), Fast map matching, an algorithm inte- grating hidden Markov model with precomputation, International Journal of Geographical Information Science, DOI: 10.1080/13658816.2017.1400548 II Can Yang and Győző Gidófalvi (2017), Mining and visual exploration of closed contiguous sequential patterns in trajectories, International Journal of Geo- graphical Information Science, DOI: 10.1080/13658816.2017.1393542 vi Acknowledgements Firstly, I would like to express gratitude to my supervisors Győző Gidófalvi, Yifang Ban and Xiaoliang Ma for their guidance and support on various aspects of my research. Their valuable comments and suggestions have improved the quality of my work significantly. I would also like to thank my colleagues in the Geoinformatics division with whom I have shared innovative ideas. I would like to thank the Chinese Scholarship Council (CSC) and the Division of Geoinformatics at the School of Architecture and Built Environment (ABE) at KTH for providing me the scholarship and research opportunity. Last but not least, I would like to express my sincere gratitude to my parents, family and friends for their love and endless support. Contents Contents vii List of Figures ix List of Tables xi List of Abbreviations xii 1 Introduction 1 1.1 Background . ............... 1 1.2 Research objectives . ........... 3 1.3 Thesis structure . ............. 4 1.4 Declaration of contributions . 5 2 Literature review 7 2.1 Map matching . .............. 7 2.2 Mining sequential patterns in trajectories . 10 2.3 Trajectory pattern visualization . 11 3 Methodology 13 3.1 Fast Map Matching integrated with precomputation . 13 3.1.1 Preliminaries and problem formulation . 13 3.1.2 Hidden Markov Model based map matching . 14 3.1.3 Precomputation of upper bounded origin destination table . 17 3.1.4 Integration of precomputation with map matching . 20 3.1.5 Penalizing reverse movement in map matching . 23 3.2 Mining closed contiguous sequential patterns in trajectories . 24 3.2.1 Preliminaries . 25 3.2.2 Bidirectional pruning based closed contiguous sequential mining algorithm (BP-CCSM) . 26 3.2.3 Patternfiltering with BP-CCSM . 31 3.3 Sequential pattern explorer for trajectories (SPET) . 31 3.3.1 Visualization of CCSP . 31 vii viii CONTENTS 3.3.2 Web based visual analytics . 33 4 Case study 35 4.1 Data description and experiment setup . 35 4.2 Evaluation of FMM . 36 4.2.1 Performance assessment . 36 4.2.2 Accuracy assessment . 41 4.2.3 Evaluation of penalty for reverse movement . 43 4.3 Evaluation of BP-CCSM . 43 4.4 Pattern exploration with SPET . 46 4.4.1 Traffic map visualization . 47 4.4.2 Dynamic offset map visualization . 48 5 Conclusions and future work 53 5.1 Conclusions . 53 5.2 Future work . 54 Bibliography 57 List of Figures 1.1 Flowchart of the thesis . ............ 4 3.1 Process of Fast Map Matching algorithm . 14 3.2 Illustration of candidate selection. 16 3.3 Information stored in upper bounded origin destination table . 19 3.4 Illustration of querying SP distance from candidateC n toC n+1 using UBODT . 20 3.5 Illustration of unnecessary routing queries caused by false distant can- didates . 22 3.6 Two examples of reverse movements in matching real world GPS data without penalty. 23 3.7 Comparison of different types of sequential patterns. The second table only shows sequential patterns starting from item 1 due to space limit. 25 3.8 Illustration of BP-CCSM algorithm . 27 3.9 Illustration of dynamic offset map . 32 3.10 Conceptional architecture of SPET . 33 3.11 User interface of web based visual analytics. 34 4.1 Statistics of GPS data . 36 4.2 Study area and road network . 36 4.3 Investigation of running time for candidate search (CS) (a) and optimal path inference (OPI)(b) . 40 4.4 Accuracy of FMM and the cumulative distribution of matched edge distance . ................... 42 4.5 Examples of reverse movement corrected by applying a penalty with pf = 100. 43 4.6 Statistics of patterns mined from the one-month data . 44 4.7 Performance comparison of BP-CCSM with Inc-CCFR, CCSpan and CCPM with two dataset: Gazelle and GPS Trajectories . 45 4.8 Distribution of trajectories partitions with respect to day of week and hour of day. 46 4.9 Pattern exploration examples by varying min_sup and class scheme mined with 168 time windows . 47 4.10 Distribution of trajectory patterns in weekdays and weekends . 47 ix x List of Figures 4.11 Distribution of trajectory patterns in morning and afternoon rush hours from/to train station . 49 4.12 Dynamic offset map visualization with different kinds of patternfilters. 51 List of Tables 3.1 Four modes of precomputation integration in FMM . 22 3.2 Summary of data structures used by BP-CCSM . 31 4.1 Statistics of UBODT precomputation . 37 4.2 MM performance tested on a single day dataset. 38 4.3 MM performance tested on the one-month dataset . 38 4.4 Running time of all steps in FMM . 39 4.5 Performance comparison with existing literature by single processor’s MM speed . 41 xi List of Abbreviations ANPR Automatic

Discovering Contiguous Sequential Patterns in Network-Constrained Movement

Hardware Pattern Matching for Network Traffic Analysis in Gigabit

Comprehensive Examinations in Computer Science 1872 - 1878

Trusted Research Environments (TRE) a Strategy to Build Public Trust and Meet Changing Health Data Science Needs

A Personalized Cloud-Based Traffic Redundancy Elimination for Smartphones Vivekgautham Soundararaj Clemson University, [email protected]

Branchclust Tutorial

Approximate Pattern Matching with Index Structures

How to Think Like a Computer Scientist

A Best-First Anagram Hashing Filter for Approximate String Matching

The Answer Question in Question Answering Systems Engenharia

Comparing Namedcapture with Other R Packages for Regular Expressions by Toby Dylan Hocking

The Power of Prediction: Cloud Bandwidth and Cost Reduction

Necessary to Conduct Most Analyses Commercial Software Typically Have (Very Bad) Installers Open Source Software Typically Have