Ice Hockey Checking Detection from Indoor Localization Data

Juha-Pekka Parto
ICE HOCKEY CHECKING DETECTION FROM INDOOR LOCALIZATION DATA
Master of Science thesis
Faculty of Information Technology and Communication Sciences
Examiner: Prof. Joni Kämäräinen
January 2021

ABSTRACT

Juha-Pekka Parto: Ice hockey checking detection from indoor localization data
Master of Science thesis
Tampere University
Master’s Degree Programme in Information Technology
January 2021

The purpose of this thesis was to perform a preliminary investigation of detecting bodychecks between ice hockey players automatically based on indoor localization data. The objectives of the thesis were to create a bodycheck dataset, train a machine learning algorithm with the dataset and evaluate the performance of the algorithm on full match runs.

The location data was obtained from the Wisehockey sport analytics platform. The bodychecks of fourteen professional ice hockey matches were annotated manually using a custom annotation tool. The location data of players involved in the annotated bodychecks and in randomly selected gameplay moments was gathered into a dataset. A random forest machine learning algorithm was trained on the dataset. The performance of the classifier was measured with receiver operating characteristic (ROC) curves and the area under the curve (AUC) metric. These metrics were computed for cross-validation splits of the dataset and for the full matches that were used to create the dataset.

The trained classifier performs well by these metrics. It reaches an average AUC of 0.995 on the validation splits during the training phase and 0.992 on the full match runs. The classifier produces a small number of false positives relative to the number of all negative cases during the full match runs. However, the absolute number of false positives is still many times larger than the number of actual bodychecks that were annotated in the matches. The final system as such does not achieve sufficient performance to be used in a production environment.
Typical false positives are situations where the players are contesting the puck and are in close contact.

The outcome of this thesis is that the objectives have been met and the purpose has been fulfilled. The number of false positives can be lowered by further developing the methods presented in this thesis.

The performance of the learning system can be improved even without adding any new data sources. The attributes that were extracted from the location data are not ideal. For example, the representation only accounts for two players and ignores all other players on the ice. Another development direction is to supplement the location data with acceleration data, which would provide information about the impact forces that are present during bodychecks. A further option is to capture video footage of the detected bodychecks and analyze the footage with computer vision.

Keywords: machine learning, indoor localization, ice hockey, checking

The originality of this thesis has been checked using the Turnitin OriginalityCheck service.

TIIVISTELMÄ (FINNISH ABSTRACT)

Juha-Pekka Parto: Jääkiekon taklausten tunnistaminen sisäpaikannusdatasta (Ice hockey checking detection from indoor localization data)
Master of Science thesis (Diplomityö)
Tampere University
Degree Programme in Information Technology
January 2021

The aim of this thesis was to investigate the automatic detection of bodychecks from the location data of ice hockey players. The objectives were to create a dataset of bodychecks, train a machine learning algorithm with the dataset and measure the performance of the algorithm on full matches.

The location data was obtained from the Wisehockey sport analytics platform. The bodychecks of fourteen ice hockey matches were annotated from video material that was played back in a custom-made annotation tool. The location data of the annotated bodychecks and of randomly selected moments during the matches was gathered into the dataset. A random forest classifier was chosen to approximate the target function. The performance of the classifier was measured with receiver operating characteristic curves and the area under the curve metric. In this thesis, these metrics are reported for the validation data during the training phase of the classifier and for the full matches that were used to create the dataset.

The trained classifier performs well in light of the metrics. It reaches an average area under the curve of 0.995 in the training phase and 0.992 on the full matches. The classifier produces a small number of false alarms relative to the number of all non-bodycheck cases. However, the number of false alarms is many times larger than the number of annotated bodychecks. The final system achieved in this thesis does not perform well enough to be used in a production system for automatic bodycheck detection. A typical false alarm is a situation where the players are battling for the puck and are in close contact with each other.

As a result, the objectives of the thesis were met and its purpose was fulfilled. The number of false alarms can be reduced by developing the methods presented in this thesis.

Performance can be improved without adding any new kind of data. The features computed from the location data are not the best possible. For example, the features only take two players into account, and all other players in the rink are ignored. The system can also be developed by supplementing the location data with, for example, acceleration data. Acceleration data would provide information about the forces that occur during bodychecks. One possible development direction is to collect video material of the detected bodychecks and analyze the videos with computer vision.

Keywords: machine learning, indoor localization, ice hockey, bodychecking

The originality of this publication has been checked using the Turnitin OriginalityCheck service.

CONTENTS

1 Introduction
2 Related work
3 Theoretical background
  3.1 Indoor localization
    3.1.1 Bluetooth and Bluetooth Low Energy
    3.1.2 Angle of arrival
    3.1.3 Quuppa Intelligent Locating System
    3.1.4 Wisehockey
  3.2 Machine learning
    3.2.1 Receiver operating characteristics
    3.2.2 Decision tree learning
    3.2.3 Random forests
4 Bodycheck dataset
  4.1 Ground truth annotations
  4.2 Annotated data
  4.3 Dataset analysis
5 Experiments
  5.1 The representation and the target function
  5.2 The function approximation algorithm
  5.3 The full match evaluation
6 Summary
7 Conclusions
References

LIST OF FIGURES

1.1 An example bodycheck.
3.1 Quuppa indoor localization system overview [18].
3.2 Wisehockey data collection process in the ice hockey scenario.
3.3 A screenshot of a video generated from the location data.
3.4 The structure of a confusion matrix [25].
3.5 A ROC/AUC example [26].
3.6 A decision tree example [24].
3.7 An illustration of k-fold cross-validation splits.
3.8 An example of attribute importance.
4.1 A screenshot of the annotation tool.
4.2 Average bodychecks in each period.
4.3 Bodycheck distribution among player roles.
4.4 Bodycheck distribution between home and away teams.
4.5 A 2D histogram of all annotated bodychecks in the dataset.
4.6 A 2D histogram of all randomly selected non-bodychecks.
4.7 The bodychecks of the home team in match 429.
5.1 ROC/AUC over 14-fold cross-validation training.
5.2 Averaged and sorted feature importances.
5.3 ROC/AUC over 14 full matches.
6.1 Relative false positive trend vs. probability threshold.
6.2 False positive trend vs. probability threshold.
6.3 False positive example 1.
6.4 False positive example 2.
6.5 False negative example 1.
6.6 False negative example 2.

LIST OF TABLES

6.1 Confusion matrices from the full run on match 429.

LIST OF PROGRAMS AND ALGORITHMS

3.1 An example of combined location and clock data.
4.1 Example metadata file.
4.2 Example annotations.
LIST OF SYMBOLS AND ABBREVIATIONS

API   Application programming interface
FIR   Finite impulse response
ISM   Industrial, scientific and medical
RFID  Radio-frequency identification
UWB   Ultra-wideband

1 INTRODUCTION

In recent years, many different technologies have been adopted in team sports. Large amounts of data, including player movement, health statistics and performance from training and matches, are being collected. The players train and compete while being monitored by a variety of sensors. The collected data is used to gain information about the players, and that information can help coaches and managers optimize training to improve competition performance. [1]

Wearable technologies provide new opportunities for media, television and betting companies. The collected data can be combined into datasets which can be used to improve the performance of teams and, by extension, leagues. Improved on-field performance results in increased prize money and more sponsorship deals. Professional betting companies use these datasets to exploit inefficiencies in the market to maximize their profits. [1]

Perhaps the most important motivation for automatic bodycheck detection is to prevent player injury. According to Hootman et al., concussions amount to 7.9% of all injuries in men’s ice hockey [2]. Coaches and players are interested in injury situations because they want to learn how to avoid them. Automatic bodycheck detection could produce a vast collection of information about bodychecks without a lot of manual labor. This collection could then be analyzed by teams and coaches and thus be used to improve training practices. Overall, players could be trained better with more in-depth knowledge about the situations that
