ALGORITHMS for the ANALYSIS of SPATIO-TEMPORAL DATA from TEAM SPORTS a Thesis Submitted in Fulfilment of the Requirements for Th
Total Page:16
File Type:pdf, Size:1020Kb
ALGORITHMS FOR THE ANALYSIS OF SPATIO-TEMPORAL DATA FROM TEAM SPORTS A thesis submitted in fulfilment of the requirements for the degree of Doctor of Philosophy in the School of Information Technologies at The University of Sydney Michael John Horton January 2018 c Copyright by Michael John Horton 2018 All Rights Reserved ii Abstract Modern object tracking systems are able to simultaneously record trajectories—sequences of time-stamped location points—for large numbers of objects with high frequency and accuracy. The availability of trajectory datasets has resulted in a consequent demand for algorithms and tools to extract information from these data. In this thesis, we present several contributions intended to do this, and in particular, to extract informa- tion from trajectories tracking football (soccer) players during matches. Football player trajectories have particular properties that both facilitate and present challenges for the algorithmic approaches to information extraction. The key property that we look to exploit is that the movement of the players reveals information about their objectives through cooperative and adversarial coordinated behaviour, and this, in turn, reveals the tactics and strategies employed to achieve the objectives. While the approaches presented here naturally deal with the application-specific properties of football player trajectories, they also apply to other domains where objects are tracked, for example behavioural ecology, traffic and urban planning. The research in this area is at a relatively early stage, and there is currently no consensus on the best approach to a number of key open problems. We present a detailed survey of the algorithmic approaches to mining sports trajectory data, and define a taxonomy for the tasks and problems that have been identified. Within this taxonomy, we make several individual contributions. We consider the task of automatically classifying passes made during football matches according to their quality, and present a framework that accepts player trajectory data as input and iii automatically makes such ratings with high accuracy. We find that the level of agree- ment of the assigned ratings between the automated classifier and an expert observer is similar to the level of agreement between two experts. Next, we observe that trajectories are a particular class of a more general data structure of state sequences, and we present a method of summarising sets of state sequences using flow diagrams that are minimal in the number of nodes, and where each state sequence appears as a path in the flow diagram. We prove that an exact algorithm for this problem is computationally intractable except for small inputs, and furthermore show that the exact solution is hard to approximate. As such we present two heuristic algorithms that perform well experimentally, and we also demonstrate the utility of this approach on two use cases on football trajectory data. We then consider two approaches to clustering of trajectories under the Frechet´ distance. First, we investigate the problems of clustering and outlier detection as an integrated task, and present improved heuristic algorithms derived from two distinct in- teger program formulations of the problem. Both the algorithms are iterative and main- tain their state in a set of auxiliary variables, and by monitoring these variables over the iterations of execution of the algorithm, time-series are captured. We claim that these time-series are an inherently two-dimensional view of the clustering and outliers that can be easily visualised and interpreted without suffering from the typical distortions implicit in low-dimensional visualisations of highly- or infinite-dimensional data. Finally, we investigate the problem of directly clustering trajectories such that each cluster contains an low-complexity exemplar that is representative of the cluster. Each exemplar is a trajectory with a bounded number of location points, and thus is robust to the noise that is typical in many trajectory data sets. We formalise this problem and present an algorithmic framework that decomposes the problem into distinct sim- plification and clustering tasks. Using previously known algorithms for these tasks, we present a family of approximation algorithms for the trajectory clustering prob- lem where the obtained clustering costs are bounded by a multiplicative factor of the optimal cost. iv Statement of Originality This is to certify that to the best of my knowledge, the content of this thesis is my own work. This thesis has not been submitted for any degree or other purposes. I certify that the intellectual content of this thesis is the product of my own work and that all the assistance received in preparing this thesis and sources have been ac- knowledged. Michael Horton v Acknowledgements I would like to first acknowledge the financial support that I received from the Com- monwealth of Australia under the Australian Postgraduate Award scheme, and from Data61, CSIRO under the NICTA Local Project Award and NICTA Research Project Award schemes. This support was gratefully received and helped to defray the costs of undertaking this research. I would also like to thank the collaborators I had on various parts of my research— your intuition, knowledge and analytic ability left a deep impression on me. Namely: Boris Aronov, Mark de Berg, Kevin Buchin, Maike Buchin, Anne Driemel, Serge Gaspers, Herman Haverkort, Bernard Mans, Ali Mehrabi, Stefan Rummele¨ and Stef Sijben. I had two very enjoyable and educational research visits, and thank Maike Buchin for hosting me at Ruhr-Universitat¨ Bochum for a week in 2014 and Mark de Berg for hosting a ten-week visit to TU Eindhoven in 2016. Finally, I want to acknowledge and thank my supervisors, Joachim Gudmundsson and Sanjay Chawla, for the time, resources, knowledge, belief and the patience that they invested in me as I struggled through the challenges that our research presented. I very much appreciate your contribution to my education, and I could not have achieved a fraction of what I have without your help. Michael Horton Sydney, January 17, 2018 vi List of Publications This thesis was based on these published works. I was the corresponding author and major contributor for all the works listed. Chapter 2 is based on this survey paper. • J. Gudmundsson and M. Horton, “Spatio-temporal analysis of team sports,” ACM Computing Surveys, vol. 50, no. 2, 22:1–22:34, 2017. DOI: 10 . 1145/3054132 Chapter 3 was published as a journal paper and a short version as a conference paper. • S. Chawla et al., “Classification of passes in football matches using spa- tiotemporal data,” ACM Transactions on Spatial Algorithms and Systems, vol. 3, no. 2, pp. 1–30, Aug. 2017. DOI: 10.1145/3105576 • M. Horton et al., “Automated classification of passing in football,” in Pro- ceedings of the 19th. Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining (PAKDD ’15), Part II, ser. Lecture Notes in Computer Science, vol. 9078, Springer, May 2015, pp. 319–330. DOI: 10.1007/978-3-319-18032-8_25 Chapter 4 is currently under review as a journal paper and a short version was pub- lished in conference proceedings. • K. Buchin et al., “Compact flow diagrams for state sequences,” Journal of Experimental Algorithmics, vol. 22, pp. 1–23, Dec. 2017. DOI: 10.1145/ 3150525 vii • K. Buchin et al., “Compact flow diagrams for state sequences,” in Proceed- ings of the 15th. International Symposium on Experimental Algorithms (SEA ’16), ser. Lecture Notes in Computer Science, vol. 9685, Springer, Jun. 2016, pp. 89–104. DOI: 10.1007/978-3-319-38851-9_7 viii Table of Contents Abstract iii Statement of Originality v Acknowledgements vi List of Publications vii Table of Contents ix List of Figures xiv List of Tables xvii 1 Introduction 1 1.1 Background . .2 1.2 Spatio-temporal Sports Data . .6 1.2.1 Preliminaries . .6 1.2.2 Object Trajectories . .7 1.2.3 Event Logs . .9 1.2.4 Mappings . 10 1.2.5 Distance Measures . 11 1.2.6 Experimental Data set . 12 1.3 Organisation and Contributions . 12 ix 2 Spatio-temporal Sports Analysis: A Survey 16 2.1 Playing Area Subdivision . 18 2.1.1 Intensity Matrices and Maps . 19 2.1.2 Low-rank Factor Matrices . 22 2.1.3 Movement Models and Dominant Regions . 24 2.1.3.1 Motion Model . 24 2.1.3.2 Dominant Regions . 26 2.1.3.3 Further Applications . 29 2.2 Network Techniques for Team Performance Analysis . 33 2.2.1 Centrality . 34 2.2.1.1 Degree centrality . 35 2.2.1.2 Betweenness Centrality . 36 2.2.1.3 Closeness Centrality . 37 2.2.1.4 Eigenvector Centrality and PageRank ........ 37 2.2.2 Clustering Coefficients . 38 2.2.3 Density and Heterogeneity . 40 2.2.4 Entropy, Topological Depth, Price-of-Anarchy and Power Law Distributions . 40 2.3 Data Mining . 41 2.3.1 Applying Labels to Events . 41 2.3.2 Predicting Future Event Types and Locations . 43 2.3.3 Identifying Formations . 44 2.3.4 Identifying Plays and Tactical Group Movement . 48 2.3.5 Temporally Segmenting the Game . 51 2.4 Performance Metrics . 53 2.4.1 Offensive Performance . 53 2.4.2 Defensive Performance . 56 2.5 Visualisation . 58 2.6 Applicability of Approaches to Other Sports . 59 x 2.7 Conclusion . 61 3 Classification of Passes in Football Matches Using Spatio-temporal Data 62 3.1 Related Work . 63 3.2 Preliminaries . 64 3.2.1 Predictor Variables . 66 3.2.2 Learning Algorithm and Classification Function . 67 3.2.3 Evaluation Functions . 68 3.2.4 Problem Statement . 68 3.3 Predictor Variables . 69 3.3.1 Feature Functions . 70 3.3.2 Player Motion Model . 72 3.3.3 The Dominant Region . 73 3.3.4 Discrete Algorithm to Approximate Dominant Region .