Juha-Pekka Parto

ICE HOCKEY CHECKING DETECTION FROM INDOOR LOCALIZATION DATA

Master of Science thesis
Faculty of Information Technology and Communication Sciences
Examiner: Prof. Joni Kämäräinen
January 2021

ABSTRACT

Juha-Pekka Parto: Ice hockey checking detection from indoor localization data
Master of Science thesis
Tampere University
Master’s Degree Programme in Information Technology
January 2021

The purpose of this thesis was to perform a preliminary investigation of detecting bodychecks between ice hockey players automatically based on indoor localization data. The objectives of the thesis were to create a bodycheck dataset, train a machine learning algorithm with the dataset and evaluate the performance of the algorithm on full match runs.

The location data was obtained from the Wisehockey sport analytics platform. The bodychecks of fourteen professional ice hockey matches were annotated manually using a custom annotation tool. The location data of players involved in the annotated bodychecks and in randomly selected gameplay moments were gathered into a dataset. A random forest machine learning algorithm was trained on the dataset. The performance of the classifier was measured with receiver operating characteristics and area under the curve (AUC) metrics. These metrics were computed for cross-validation splits from the dataset and for the full matches that were used to create the dataset.

The trained classifier performs well in light of the metrics. It reaches an average AUC of 0.995 on the validation splits during the training phase and 0.992 on the full match runs. The classifier produces a small number of false positives relative to the number of all negative cases during the full match runs. However, the absolute number of false positives is still many times larger than the number of actual bodychecks that were annotated in the matches. The final system as such does not achieve sufficient performance to be used in a production environment. Typical false positives are situations where the players are contesting the puck and are in close contact.

The outcome of this thesis is that the objectives have been met and the purpose has been fulfilled. The number of false positives can be lowered by further developing the methods presented in this thesis. The performance of the learning system can be improved even without adding any new data sources. The attributes that were extracted from the location data are not ideal. For example, the representation only accounts for two players and ignores all other players on the ice. Another development direction is to supplement the location data with acceleration data, which would provide information about the impact forces that are present during bodychecks. A further option is to capture video footage of the detected bodychecks and analyze the footage with computer vision.

Keywords: machine learning, indoor localization, ice hockey, checking

The originality of this thesis has been checked using the Turnitin OriginalityCheck service.

TIIVISTELMÄ

Juha-Pekka Parto: Detecting ice hockey bodychecks from indoor localization data
Master of Science thesis
Tampere University
Degree Programme in Information Technology
January 2021

The aim of this thesis was to study the automatic detection of bodychecks from the positioning data of ice hockey players. The objectives were to create a dataset of bodychecks, train a machine learning algorithm with the dataset and measure the performance of the algorithm on full matches.

The positioning data was obtained from the Wisehockey sport analytics platform. The bodychecks of fourteen ice hockey matches were annotated from video material that was played back with a custom-made annotation tool. Positioning data from the annotated bodychecks and from randomly selected moments during the matches was collected into the dataset. A random forest classifier was chosen to approximate the target function. The performance of the classifier was measured with receiver operating characteristics curves and the area under the curve metric. In this work, these metrics are reported for the validation data in the training phase of the classifier and for the full matches used to create the dataset.

The trained classifier performs well in light of the metrics. It reaches an average area under the curve of 0.995 in the training phase and 0.992 on the full matches. The classifier produces a small number of false alarms relative to the number of all non-bodycheck cases. However, the number of false alarms is many times larger than the number of annotated bodychecks. The final system achieved in this work does not reach sufficient performance to be used in a production system for automatic bodycheck detection. A typical false alarm is a situation where the players are battling for the puck and are in close contact with each other.

As a result, the objectives of the work were met and its purpose was fulfilled. The number of false alarms can be reduced by developing the methods presented in this work. The performance can be improved without adding any new kind of data. The features computed from the positioning data are not the best possible. For example, the features only take two players into account, and all other players in the rink are ignored. The system can also be developed by supporting the positioning data with, for example, acceleration data. Acceleration data would provide information about the forces that arise during bodychecks. One possible development direction is to collect video material of the detected bodychecks and to analyze the videos with computer vision.

Keywords: machine learning, indoor localization, ice hockey, bodycheck

The originality of this publication has been checked using the Turnitin OriginalityCheck service.

CONTENTS

1 Introduction
2 Related work
3 Theoretical background
  3.1 Indoor localization
    3.1.1 Bluetooth and Bluetooth Low Energy
    3.1.2 Angle of arrival
    3.1.3 Quuppa Intelligent Locating System
    3.1.4 Wisehockey
  3.2 Machine learning
    3.2.1 Receiver operating characteristics
    3.2.2 Decision tree learning
    3.2.3 Random forests
4 Bodycheck dataset
  4.1 Ground truth annotations
  4.2 Annotated data
  4.3 Dataset analysis
5 Experiments
  5.1 The representation and the target function
  5.2 The function approximation algorithm
  5.3 The full match evaluation
6 Summary
7 Conclusions
References

LIST OF FIGURES

1.1 An example bodycheck.

3.1 Quuppa indoor localization system overview [18].
3.2 Wisehockey data collection process in the ice hockey scenario.
3.3 A screenshot of a video generated from the location data.
3.4 The structure of a confusion matrix [25].
3.5 A ROC/AUC example [26].
3.6 A decision tree example [24].
3.7 An illustration of k-fold cross-validation splits.
3.8 An example of attribute importance.

4.1 A screenshot of the annotation tool.
4.2 Average bodychecks in each period.
4.3 Bodycheck distribution among player roles.
4.4 Bodycheck distribution between home and away teams.
4.5 A 2D histogram of all annotated bodychecks in the dataset.
4.6 A 2D histogram of all randomly selected non-bodychecks.
4.7 The bodychecks of the home team in match 429.

5.1 ROC/AUC over 14-fold cross-validation training.
5.2 Averaged and sorted feature importances.
5.3 ROC/AUC over 14 full matches.

6.1 Relative false positive trend vs. probability threshold.
6.2 False positive trend vs. probability threshold.
6.3 False positive example 1.
6.4 False positive example 2.
6.5 False negative example 1.
6.6 False negative example 2.

LIST OF TABLES

6.1 Confusion matrices from the full run on match 429.

LIST OF PROGRAMS AND ALGORITHMS

3.1 An example of combined location and clock data.
4.1 Example metadata file.
4.2 Example annotations.

LIST OF SYMBOLS AND ABBREVIATIONS

API   Application programming interface
FIR   Finite impulse response
ISM   Industrial, scientific, medical
RFID  Radio-frequency identification
UWB   Ultra-wideband

1 INTRODUCTION

In recent years, many different technologies have been adopted in team sports. Large amounts of data, including player movement, health statistics and performance from training and matches, are being collected. The players train and compete while being monitored by a variety of sensors. The collected data is used to gain information about the players, and that information can help coaches and managers to optimize training to improve competition performance. [1]

Wearable technologies provide new opportunities for media, television and betting companies. The collected data can be combined into datasets which can be used to improve the performance of teams and, by extension, leagues. Improved on-field performance results in increased prize money and more sponsorship deals. Professional betting companies use these datasets to exploit inefficiencies in the market to maximize their profits. [1]

Perhaps the most important motivation for automatic bodycheck detection is to prevent player injury. According to Hootman et al., concussions amount to 7.9% of all injuries in men’s ice hockey [2]. Coaches and players are interested in injury situations because they want to learn how to avoid them. Automatic bodycheck detection could lead to a vast collection of information about bodychecks without a lot of manual labor. This collection could then be analyzed by teams and coaches and thus be used to improve training practices. Overall, players could be trained better with more in-depth knowledge about the situations that lead to harmful injuries, and the sport could be made safer for all.

The research question of this thesis is whether machine learning can be used to detect bodychecks from indoor localization data from professional ice hockey matches. This work has three contributions:

1. To create a dataset which contains bodychecks and random gameplay moments.
2. To train a machine learning model using the bodychecking dataset.
3. To evaluate the performance of the trained model on complete matches.

The International Ice Hockey Federation defines bodychecking as follows:

"A bodycheck represents contact by a skater on an opposing skater, so long as the objective is to separate the opponent from the puck. Any skater who is in control or possession of the puck can be bodychecked provided that:

(a) the bodycheck is made with the hips, body, or shoulder;

Figure 1.1. An example bodycheck.

(b) contact with the opponent is from in front or to the side and does not target the head or neck area or the lower body (below the hip).

There is no such thing as a clean bodycheck to the back, head, or lower body of an opponent. There is no such thing as a clean bodycheck made principally with the lower body, stick, or head. There is no such thing as a clean bodycheck on a goaltender." [3]

An example of what a bodycheck looks like is given in Figure 1.1. The figure contains 12 frames from a television broadcast video recording with the first frame in the upper left corner and frames progressing from left to right and top to bottom.

Outdoor sports which utilize wearable technology rely on the global positioning system to track player movements. Due to the inability of GPS to maintain connection to satellites in indoor settings, some indoor sports have relied on local positioning systems instead [4]. Ice hockey is an indoor sport and thus an indoor positioning system is used in this work.

Chapter 2 explores the previous works published on indoor localization systems and machine learning for wearable sensors with an emphasis on tackle and collision detection in team sports. Chapter 3 explains the theoretical background for the methods used in this thesis. Chapter 4 clarifies what the bodychecking dataset contains and how it was created. A brief analysis of the dataset is also presented in this chapter. A machine learning model is trained and applied to full match runs in Chapter 5. The results of the experiments are analyzed in Chapter 6. Finally, Chapter 7 concludes the work and gives practical future research directions.

2 RELATED WORK

Indoor localization systems have been implemented using several technologies and techniques [5]. One such system is developed by Quuppa, a privately owned company in Finland. Their real time location system has been used in a few sport scenarios. Swarén et al. investigated the possibility of using the Quuppa real time locating system to monitor the position and velocity of cross-country skiers in a competition setting [6]. They recorded position data from 70 skiers, fitted spline models to the data and analyzed skier velocities and distances. Both Figueira et al. and Colino et al. evaluated the accuracy of the NBN23 indoor tracking system [7, 8]. NBN23 is built on the same Quuppa hardware and is used in several basketball leagues. Douglas et al. used another commercial indoor localization system (ClearSky T6) by Catapult Sports to track the velocities and distances of ice hockey players during competition [4]. The players wore microsensors which communicated with an array of signal receivers using ultra-wideband technology. The location of the microsensors was computed by an algorithm based on time difference of arrival, two-way ranging and angle of arrival.

Machine learning and deep learning with wearable sensors have been investigated for a variety of sports [9, 10]. The human activity recognition literature contains several attempts at identifying collision and tackle events in various team sports. Kelly et al. investigated tackle modeling techniques to automatically detect player tackles and collisions using sensor technology [11]. Their system was able to identify collisions with high accuracy but it was unable to differentiate between collisions and tackles. Gastin et al. investigated the possibility of using a commercial microsensor device (MinimaxX S4) and a tackle detection algorithm for rugby to detect tackles and impact events in elite Australian football [12]. They suggest that sport and event-specific algorithms are required instead. The work of Chambers et al. supports the claim of using sport and event-specific algorithms [13]. Gastin et al. used another commercial microsensor device (Catapult S5) and designed an algorithm based on a random forest model to automate the detection of ruck and tackle events in rugby union. Hulin et al. studied the use of the same microsensor device as Gastin et al. to identify collision events during rugby matches [14]. Their work demonstrated that collisions detected using wearable technology are positively correlated with video coded collision events. Hardegger et al. used acceleration and rotation data from three inertial measurement units worn by ice hockey players to identify ice hockey hits [15]. They used a random forest classifier to detect several different gameplay events, including tackles, which were detected with high accuracy.

3 THEORETICAL BACKGROUND

This chapter explores the theory behind the methods used in this thesis. The theory is divided into two sections: indoor localization and machine learning. Both sections follow the same pattern of briefly going through the basics and then moving on to more specific theory relevant to this thesis.

3.1 Indoor localization

Indoor localization is the process of obtaining the location of a person or a device in an indoor setting. One of the main motivations of indoor localization is Location Based Services. A device and its user could be tracked inside a shopping mall or a hospital and given exact guidance on how to reach their destination. The user could also be rewarded by a store in the shopping mall based on their location. The location data of a large number of users could provide insightful movement patterns which could be used to increase sales. [5]

An indoor localization system must be able to transmit and receive wireless signals. The signals can be transmitted using any existing technology. WiFi, Radio-frequency identification (RFID), Ultra-wideband (UWB) and Bluetooth are commonly used. The system must also process the incoming signals to determine the location of the transmitter. There are many metrics which can be extracted from the signals. Angle of arrival, time of flight, return time of flight and received signal strength are commonly selected localization metrics. These technologies and metrics can be freely mixed to create new systems. Many kinds of combinations have been proposed in the literature. Different solutions have different benefits and downsides. The solutions can be evaluated based on energy efficiency, range, latency, scalability, tracking accuracy and cost. [5]

3.1.1 Bluetooth and Bluetooth Low Energy

Bluetooth is a wireless technology operating in the unlicensed 2.4 GHz radio band reserved for industrial, medical and scientific applications. Bluetooth is designed for short-range communication. Bluetooth is a replacement for cables connecting various personal devices to each other. Bluetooth technology enables ad hoc connectivity of devices without relying on an existing network infrastructure. Bluetooth supports asynchronous data transmission and synchronous audio streams. The development of Bluetooth is controlled by the Bluetooth Special Interest Group. The first version of the Bluetooth industry standard was published in 1999. [16]

Bluetooth Low Energy (BLE) is a low-power version of the Bluetooth technology introduced in version 4.0 of the Bluetooth specification. Its development was driven by the momentum of other low-power wireless technologies such as ZigBee. Theoretically, the lifetime of a BLE device powered by a coin cell battery ranges from 2 days to 14.1 years. This lifetime depends on a trade-off between energy consumption, latency and throughput. Healthcare, wellness and sports are domains where classic Bluetooth had already been used and where BLE can provide improvements. [17]

3.1.2 Angle of arrival

The position of an object can be estimated by measuring the angle at which a transmitted signal arrives at a sensing device. The angle of arrival method is based on measuring this angle when the signal arrives at a collection of receiver antennas. The angle information combined with the time difference of arrival at each element of the antenna collection can be used to calculate the transmitter’s position. [5]

The benefit of angle of arrival is that the location can be estimated with only a small number of antennas. Two-dimensional localization can be achieved with only two antennas, and three antennas are required for three-dimensional localization. However, angle of arrival requires carefully calibrated hardware that must be in a predetermined and fixed position. The estimated angles are most accurate over short distances and the accuracy deteriorates as the distance between the signal transmitter and the receivers grows. [5]
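The two-dimensional case can be illustrated with a short Python sketch. It is a simplified illustration and not the algorithm of any particular product: it assumes two receivers at known positions, each reporting an ideal, noise-free bearing to the transmitter, and it intersects the two bearing lines. The function name `locate_2d` and the example coordinates are hypothetical.

```python
import math


def locate_2d(p1, theta1, p2, theta2):
    """Estimate a transmitter position from two bearings.

    p1, p2: (x, y) positions of two receivers (assumed known and fixed).
    theta1, theta2: angles of arrival in radians, measured from the x-axis.
    """
    d1 = (math.cos(theta1), math.sin(theta1))
    d2 = (math.cos(theta2), math.sin(theta2))
    # 2D cross product of the two direction vectors.
    cross = d1[0] * d2[1] - d1[1] * d2[0]
    if abs(cross) < 1e-12:
        raise ValueError("Bearings are parallel; the position is ambiguous.")
    rx, ry = p2[0] - p1[0], p2[1] - p1[1]
    # Distance along the first bearing line to the intersection point.
    t1 = (rx * d2[1] - ry * d2[0]) / cross
    return (p1[0] + t1 * d1[0], p1[1] + t1 * d1[1])


# Hypothetical example: receivers 10 m apart, transmitter at (5, 5).
x, y = locate_2d((0.0, 0.0), math.pi / 4, (10.0, 0.0), 3 * math.pi / 4)
print(round(x, 6), round(y, 6))
```

With noisy angles the two lines still intersect, but the intersection moves further from the true position as the transmitter gets further from the receivers, which matches the accuracy behavior described above.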

3.1.3 Quuppa Intelligent Locating System

The Quuppa Intelligent Locating System is a real time indoor localization solution based on Bluetooth Low Energy and angle of arrival. The system measures signals transmitted by electronic tags with an array of locators installed in the monitored area. Based on the measurements, Quuppa Positioning Engine software computes the positions of the tags using a proprietary algorithm. The basic principle of the locating system is illustrated in Figure 3.1. [18]

The tags are small, battery-powered printed circuit boards with Bluetooth connectivity. The transmission rate of individual tags is configurable up to 200 Hz. Quuppa offers a standard tag design which is lightweight (1.5 g), shockproof and waterproof. However, the standard design can be modified to fit more use cases. The locators require a one-time installation in the environment that is monitored and they do not require any physical maintenance after the installation. The locators can be monitored remotely and the Quuppa system will notify users of any disturbances. [19]

Quuppa Positioning Engine is proprietary software that computes real time tag positions based on angle of arrival measurements from an array of locators. The position data consist of x, y and z coordinates inside the configured environment. Quuppa Positioning Engine can be run locally or on cloud platforms. It provides tools for monitoring tags and locators in a detailed manner and an Application programming interface (API) for integration into other systems. [20]

Figure 3.1. Quuppa indoor localization system overview [18].

3.1.4 Wisehockey

Wisehockey is an automatic, real time sport analytics platform. Wisehockey removes manual effort from calculating sport statistics by calculating and visualizing statistics automatically. Teams and coaches have access to statistics ranging from individual shots to the team’s performance during the whole season. Automated statistics calculation offers a lot of entertainment value as well as betting opportunities [21]. Wisehockey is being used in thousands of ice hockey matches by the Kontinental Hockey League and the Finnish Hockey League [22]. Wisehockey interfaces with the leagues’ computer systems to download match schedules, match rosters and player information [23].

Figure 3.2 depicts the data collection architecture in the ice hockey scenario. Grey and red colors are assigned to hardware and software components respectively. The information flow between the components is also presented. The architecture is divided into three parts:

1. The ice hockey rink.
2. The local server.
3. The cloud server.

An array of Quuppa LD-7L locators is installed on the ceiling above the ice hockey rink in each arena. During a match, Quuppa QT1 tags are worn by players in their protective shoulder pads and by officials in their pockets. Hockey pucks with similar tags embedded inside them are used in the matches. The tags emit Bluetooth signals which are captured by the locators. Quuppa Management Engine, running on a local server inside the arena, collects the angle of arrival measurements that the locators output. Quuppa Management Engine then computes the locations of the individual tags.

Figure 3.2. Wisehockey data collection process in the ice hockey scenario.

On the local server, controller software collects the tag location data. The software is interfaced with the official match clock in the arena to capture the match state and time during a match. The clock data is used to differentiate actual gameplay situations from stoppages of play. Unix timestamps are added to the location data and clock data. The data are then combined and streamed to a cloud server for long term storage and statistics calculation. An example of what the combined data contains is given in Program 3.1. The collected location data is noisy. Before computing statistics, the location data is low-pass filtered using a Finite impulse response (FIR) filter with a Hamming window.
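The low-pass filtering step can be sketched as follows. This is a minimal illustration of a windowed-sinc FIR design with a Hamming window, written in plain Python. The number of taps, the cutoff frequency and the sample values are assumptions made for the example; the filter parameters actually used by Wisehockey are not stated here.

```python
import math


def hamming_lowpass_taps(num_taps, cutoff):
    """Design low-pass FIR taps with a Hamming window.

    cutoff is a fraction of the sampling rate (0 < cutoff < 0.5).
    Both parameters are illustrative assumptions.
    """
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        # Ideal (sinc) low-pass impulse response.
        if k == 0:
            ideal = 2 * cutoff
        else:
            ideal = math.sin(2 * math.pi * cutoff * k) / (math.pi * k)
        # Hamming window coefficient.
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]  # normalize to unity gain at 0 Hz


def fir_filter(samples, taps):
    """Causal FIR filtering; early outputs only see a partial history."""
    out = []
    for i in range(len(samples)):
        acc = 0.0
        for j, tap in enumerate(taps):
            if i - j >= 0:
                acc += tap * samples[i - j]
        out.append(acc)
    return out


# Hypothetical noisy x coordinates of one tag.
noisy_x = [2.70, 2.81, 2.66, 2.79, 2.73, 2.90, 2.77, 2.85, 2.80, 2.93]
taps = hamming_lowpass_taps(5, 0.1)
smoothed_x = fir_filter(noisy_x, taps)
```

A causal filter of this kind delays the signal by roughly half the filter length, which would need to be taken into account when aligning filtered positions with clock data.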

The Wisehockey system generates video material based on the collected location and clock data. A screenshot of a video generated from the location data is presented in Figure 3.3. An ice hockey rink is drawn to each frame. Players and the puck are added to the positions received from the ice hockey rink. The period, match time and information are received from the actual clock used in the ice hockey match.

3.2 Machine learning

Machine learning is concerned with the question of how to build computer programs which improve from experience automatically. Example applications include detecting fraudulent credit card transactions, recommendation systems and autonomous vehicles. Machine learning combines concepts from statistics, artificial intelligence, philosophy, information theory, biology, cognitive science, computational complexity and control theory. Algorithms for various learning tasks have been invented. [24]

    {
      "data": [
        {
          "tagId": "playerTag01",
          "timestamp": 1584037800000,
          "x": 2.74,
          "y": 3.65,
          "z": 1.62
        },
        {
          "tagId": "playerTag02",
          "timestamp": 1584037800015,
          "x": -1.04,
          "y": -2.13,
          "z": 1.64
        },
        {
          "clock_time": "19:52",
          "match_state": "match_in_progress",
          "timestamp": 1584037800017
        },
        {
          "tagId": "puckTag01",
          "timestamp": 1584037800027,
          "x": 2.66,
          "y": 3.75,
          "z": 0.23
        }
      ]
    }

Program 3.1. An example of combined location and clock data.
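Data in the format of Program 3.1 could be separated into location records and clock records with a few lines of Python. The sketch below assumes that location records are the ones carrying a `tagId` field, as in the example; the function name `split_records` is hypothetical and the payload is a shortened copy of the example data.

```python
import json

# Shortened copy of the example in Program 3.1.
payload = """
{
  "data": [
    {"tagId": "playerTag01", "timestamp": 1584037800000,
     "x": 2.74, "y": 3.65, "z": 1.62},
    {"tagId": "playerTag02", "timestamp": 1584037800015,
     "x": -1.04, "y": -2.13, "z": 1.64},
    {"clock_time": "19:52", "match_state": "match_in_progress",
     "timestamp": 1584037800017},
    {"tagId": "puckTag01", "timestamp": 1584037800027,
     "x": 2.66, "y": 3.75, "z": 0.23}
  ]
}
"""


def split_records(text):
    """Split combined data into (location_records, clock_records)."""
    locations, clock = [], []
    for record in json.loads(text)["data"]:
        # Assumption: only location records have a "tagId" field.
        if "tagId" in record:
            locations.append(record)
        else:
            clock.append(record)
    return locations, clock


locations, clock = split_records(payload)
for record in locations:
    print(record["tagId"], record["timestamp"], record["x"], record["y"])
print(clock[0]["match_state"], clock[0]["clock_time"])
```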

Mitchell defines learning problems precisely as follows:

"A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E."

Many learning problems can be defined like this. For example, a computer program that learns to play checkers [24]:

• Task T: play checkers.
• Performance measure P: percent of games won.
• Experience E: playing games against itself.

Figure 3.3. A screenshot of a video generated from the location data. The team names, scores and clock information are displayed at the top of the generated frames. The player and puck positions are depicted by the blue, red and black circles. Team logos are drawn on the ice to indicate which team plays on which side of the rink.

The design of a learning system involves making several crucial decisions. The first is the type of training experience from which the system will learn. The training experience has a significant impact on the success or failure of the system. An important attribute of the training experience is what kind of feedback it provides. Direct feedback means that the learning system receives immediate feedback on the choices it makes. The alternative is indirect feedback, which is given only after a sequence of choices. For example, a checkers playing program might receive direct feedback from individual moves after they are made or indirect feedback after the game is over. Another important attribute of the training experience is the amount of control that the learning system has over it. The system might rely on a teacher to provide the experience and the correct answer. On the other end of the spectrum, the system may have complete control of the learning experience and it will experiment with novelties or exploit previously learned things. The third important attribute of the training experience is how well it represents the distribution of examples which the final system will encounter. Ideally, the training experience follows the same distribution as future test examples. Most machine learning theory relies on this assumption. [24]

The second design choice is the choice of a target function. The target function represents the type of knowledge which will be learned. A checkers playing program that can generate all legal moves only needs to learn how to select the best move. The target function in this case could be called ChooseMove : B → M, which accepts as an input any board configuration and outputs the best move for that configuration. Alternative target functions should be considered because some are easier to learn than others. For example, an alternative for ChooseMove is Evaluate : B → ℜ, which assigns a real-valued score to the given board configuration. This alternative could be used to evaluate the successor board configurations generated by each legal move. Reducing the learning problem to learning some particular function is key. It may be difficult to learn an exact representation of the target function, so the system is expected to learn some approximation instead. This process is called function approximation. [24]

Next, the representation that the learning system will use to represent the target function must be selected. There are often many options from which to choose. All of them involve a trade-off. A simple representation might not be able to capture all of the relevant information for the classification task, whereas a more expressive representation allows a closer approximation of the target function. However, an expressive representation requires more training data to enable the learning system to choose from the many alternative hypotheses it can represent. [24]

The last design choice involves choosing a function approximation algorithm. A set of training examples has to be obtained from the training experiences. Each training experience is paired together with a target value. The target values can be obtained from the direct or indirect feedback provided by a teacher or the learning system itself. The learning algorithm will weigh the different attributes of the representation to find the best fit for the training examples. A common way to find the best fit is to adjust the weighting so that the squared error between the target values and the values predicted by the learning algorithm is minimized. [24]
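As an illustration of adjusting weights to reduce the squared error, the following Python sketch applies a least mean squares style update to a linear representation. It is a generic example and not the method used later in this thesis; the function name `lms_fit`, the learning rate, the number of epochs and the toy data are assumptions chosen for the example.

```python
def lms_fit(examples, learning_rate=0.05, epochs=500):
    """Fit weights of a linear function with least mean squares updates.

    examples: list of (features, target) pairs, features being a tuple.
    Returns [bias, w1, w2, ...].
    """
    num_features = len(examples[0][0])
    weights = [0.0] * (num_features + 1)  # weights[0] is the bias term
    for _ in range(epochs):
        for features, target in examples:
            x = [1.0] + list(features)
            predicted = sum(w * xi for w, xi in zip(weights, x))
            error = target - predicted
            # Move each weight in the direction that reduces the
            # squared error for this training example.
            weights = [w + learning_rate * error * xi
                       for w, xi in zip(weights, x)]
    return weights


# Toy data generated from target = 2 * x + 1.
examples = [((x,), 2.0 * x + 1.0) for x in (0.0, 1.0, 2.0, 3.0)]
bias, slope = lms_fit(examples)
print(round(bias, 3), round(slope, 3))
```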

3.2.1 Receiver operating characteristics

Receiver operating characteristics (ROC) graphs are used in machine learning to visualize classifier performance. ROC graphs have properties which make them useful for problems with skewed class distribution or unequal classification error costs. Given a binary classifier, there are two possible output classes (true or false). A binary classifier can erroneously produce false positive or false negative outputs. False positives are situations where the ground truth of the input is false and the classifier output is true. In a false negative case, the ground truth is true and the classifier output is false. When a classifier predicts the correct output, the result of the prediction is either a true positive (ground truth and predicted output are true) or a true negative (ground truth and predicted output are false). A two-by-two confusion matrix can be formed from these four values. [25]

The confusion matrix is the basis of many other performance metrics used in machine learning. Figure 3.4 depicts the structure of a confusion matrix. There are four cells in the matrix. After a classifier has made its predictions, the number of occurrences of each outcome is counted and inserted into its respective cell. The cells along the major diagonal of the matrix represent the correctly predicted outcomes and the cells on the other diagonal represent the errors made by the classifier, or its ”confusion”. [25]

\[
\text{true positive rate} = \frac{\text{True positives}}{\text{True positives} + \text{False negatives}} \tag{3.1}
\]

Figure 3.4. The structure of a confusion matrix [25].

\[
\text{false positive rate} = \frac{\text{False positives}}{\text{False positives} + \text{True negatives}} \tag{3.2}
\]

ROC graphs are two-dimensional. The rate of true positives is plotted on the Y-axis and the rate of false positives on the X-axis. Equations 3.1 and 3.2 give the formulas for these values. Both rates are always in the range [0, 1]. The main idea of ROC graphs is to display the trade-off between true positives and false positives. Classifiers produce false positive rate and true positive rate pairs that can be thought of as coordinates in the ROC space. The closer these coordinates are to the point (0, 1), the better the performance of the classifier is. The diagonal y = x represents the performance of random classification. [25]
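The confusion matrix counts and Equations 3.1 and 3.2 can be illustrated with a small Python example. The function names and the toy labels below are made up for the illustration.

```python
def confusion_counts(truth, predicted):
    """Count the four confusion matrix cells for boolean labels."""
    tp = sum(1 for t, p in zip(truth, predicted) if t and p)
    fp = sum(1 for t, p in zip(truth, predicted) if not t and p)
    tn = sum(1 for t, p in zip(truth, predicted) if not t and not p)
    fn = sum(1 for t, p in zip(truth, predicted) if t and not p)
    return tp, fp, tn, fn


def rates(tp, fp, tn, fn):
    """Return (true positive rate, false positive rate)."""
    tpr = tp / (tp + fn)  # Equation 3.1
    fpr = fp / (fp + tn)  # Equation 3.2
    return tpr, fpr


# Toy example: 3 positive and 5 negative ground truth labels.
truth = [True, True, True, False, False, False, False, False]
predicted = [True, True, False, True, False, False, False, False]

tp, fp, tn, fn = confusion_counts(truth, predicted)
tpr, fpr = rates(tp, fp, tn, fn)
print(tp, fp, tn, fn)  # 2 1 4 1
print(tpr, fpr)
```

In this toy example the classifier corresponds to the single ROC point (0.2, 0.667), which lies above the diagonal y = x.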

Some classifiers produce only class labels as predictions. These kinds of classifiers can only produce a single confusion matrix. The matrix converts into a single point in the ROC space. To plot a curve in the ROC space, a classifier needs to output class probabilities. These classifiers are called ranking or scoring classifiers. Such a classifier can be used together with an adjustable probability threshold to produce many confusion matrices. A sequence of points in the ROC space can be obtained by setting the probability threshold to 0, making predictions with the classifier, computing the confusion matrix and then incrementing the threshold by a small amount. This process is repeated until the threshold reaches its maximum at 1. [25]
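The threshold sweep described above can be sketched in Python as follows. The sketch assumes that an instance is predicted positive when its score is greater than or equal to the threshold, that both classes are present in the data and that no score is exactly 1; the function name, the step count and the toy scores are hypothetical.

```python
def roc_points(truth, scores, steps=100):
    """Return (false positive rate, true positive rate) pairs obtained
    by sweeping the probability threshold from 0 to 1."""
    positives = sum(1 for t in truth if t)
    negatives = len(truth) - positives
    points = []
    for i in range(steps + 1):
        threshold = i / steps
        tp = sum(1 for t, s in zip(truth, scores) if t and s >= threshold)
        fp = sum(1 for t, s in zip(truth, scores)
                 if not t and s >= threshold)
        points.append((fp / negatives, tp / positives))
    return points


# Toy ground truth labels and predicted probabilities.
truth = [True, True, False, True, False, False]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]

points = roc_points(truth, scores, steps=10)
print(points[0])   # threshold 0: everything is predicted positive
print(points[5])   # threshold 0.5
print(points[-1])  # threshold 1: nothing is predicted positive
```

Many neighboring thresholds yield the same point, because the confusion matrix only changes when the threshold crosses one of the scores.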

The area under a ROC curve (AUC) is a single scalar value which can be used to compare the performance of classifiers. AUC is computed by adding together the areas of trapezoids formed by vertical slices of the graph between two successive ROC points. The summed area of trapezoids is divided by the total possible area to scale the value to a number between 0 and 1. In practice, no classifier should have an AUC less than 0.5 because an AUC of 0.5 is equivalent to random classification performance. AUC is statistically important because it is equivalent to the probability of a classifier ranking a randomly chosen positive instance higher than a randomly chosen negative instance. A higher AUC corresponds with a higher performance on average. An example ROC curve is displayed in Figure 3.5. The AUC for this curve is presented in the legend. [25]

Figure 3.5. A ROC/AUC example [26].
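As a hedged illustration of the two views of AUC described above, the following sketch computes the area with the trapezoidal rule and, separately, as the fraction of positive and negative pairs that are ranked correctly. The helper names and the toy data are assumptions for this example only. Because the ROC space is the unit square, the total possible area is 1 and no further scaling is needed here.

```python
def auc_trapezoid(points):
    """Area under (FPR, TPR) points using the trapezoidal rule."""
    pts = sorted(points)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


def auc_ranking(scores, labels):
    """Fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l]
    neg = [s for s, l in zip(scores, labels) if not l]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (assumed values): the ROC points below are obtained by placing
# one threshold at each distinct score of this small example.
scores = [0.9, 0.8, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0]
roc = [(0, 0), (0, 1 / 3), (0, 2 / 3), (0.5, 2 / 3), (0.5, 1), (1, 1)]

area = auc_trapezoid(roc)
rank_probability = auc_ranking(scores, labels)
```

For this toy example both computations give 5/6, which illustrates the equivalence between the area and the ranking probability.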

3.2.2 Decision tree learning

Decision tree learning is a function approximation method that represents the learned function as a decision tree. The tree can be imagined as sets of if-statements in specific sequences. Decision trees consist of nodes and branches. Each node specifies a test of some feature of the input data and each branch descending from a node is a possible value of the feature. Input data classification starts from the root node of the tree where the first test is specified. The value of the feature specified in the test determines which branch is travelled down to the next node. This process is repeated until it reaches a terminal node which gives a classification result. Figure 3.6 visualizes a simple decision tree which can classify whether a given Saturday morning is suitable for playing tennis. For example, the instance (Outlook = Sunny, Temperature = Hot, Humidity = High, Wind = Strong) would traverse the tree on its leftmost branches and be classified as a No. [24]
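The if-statement view of a decision tree can be made concrete with a short sketch. It assumes that Figure 3.6 reproduces the well-known tennis example of [24], where the root tests Outlook, the Sunny branch tests Humidity and the Rain branch tests Wind; the function name is hypothetical.

```python
def play_tennis(outlook, humidity, wind):
    """Classify a Saturday morning by walking the tree from the root."""
    if outlook == "Sunny":
        # Left subtree: the decision depends on humidity only.
        return "No" if humidity == "High" else "Yes"
    if outlook == "Overcast":
        return "Yes"
    # Remaining branch (Rain): the decision depends on wind only.
    return "No" if wind == "Strong" else "Yes"


# The instance from the text; Temperature is not tested by this tree.
result = play_tennis(outlook="Sunny", humidity="High", wind="Strong")
```

Note that Temperature does not appear in the function at all, which reflects the fact that a learned tree only tests the attributes it found useful.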

Decision tree learning is best suited for problems where:

• Instances are described by a fixed set of attributes and their values.
• The target function has discrete output values.
• The training data may contain errors or missing attribute values.

Figure 3.6. A decision tree example [24].

Extensions of the core decision tree learning algorithm allow the handling of real-valued attributes and output values. Many learning problems fit these characteristics and decision trees have been applied to practical applications in medicine, industry and banking. [24]

Most decision tree learning algorithms are based on a core algorithm which uses a top-down greedy search through the space of possible decision trees. The core algorithm starts by searching for the attribute which should be tested at the root of the tree. The best attribute is the one which maximizes information gain. This is measured by calculating the expected reduction in entropy caused by partitioning the training examples into separate branches of the tree. A descendant of the root node is created for each possible value of the attribute and the training examples are sorted to the descendant nodes which match their value of the tested attribute. The process is repeated for the descendants using the sorted training examples. [24]
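The attribute selection step can be illustrated with a small sketch of entropy and information gain. The function names and the four toy examples are assumptions made for this illustration; they are not taken from [24] or from the bodycheck dataset.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(examples, attribute, target):
    """Expected reduction in entropy from splitting on the attribute."""
    gain = entropy([e[target] for e in examples])
    for value in {e[attribute] for e in examples}:
        subset = [e[target] for e in examples if e[attribute] == value]
        gain -= len(subset) / len(examples) * entropy(subset)
    return gain


# Toy examples (assumed values).
examples = [
    {"wind": "Weak", "play": "Yes"},
    {"wind": "Weak", "play": "Yes"},
    {"wind": "Strong", "play": "No"},
    {"wind": "Strong", "play": "Yes"},
]
gain = information_gain(examples, "wind", "play")
```

The greedy search would evaluate this quantity for every candidate attribute and place the attribute with the largest gain at the current node.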

The core algorithm grows each branch of the tree just enough to classify the training examples perfectly. This can lead to difficulties when the data is noisy or when there is not enough training data to represent the target function. The basic decision tree algorithm can overfit to the training examples in these cases. A hypothesis is said to be overfitting when there exists another hypothesis which fits the training examples less well but performs better over the distribution of all examples, seen and unseen instances alike. An indication of overfitting is that while the tree grows larger and the training accuracy increases, the accuracy over validation examples decreases. [24]

Figure 3.7. An illustration of k-fold cross-validation splits.

One way to avoid overfitting is to split the available data into two sets: training and validation. The training set is used to form the hypothesis and the validation set is used to evaluate the accuracy of the hypothesis. The idea is that the random fluctuations which mislead the algorithm on the training set are unlikely to be present in the validation set as well. If the available dataset is very small, a more advanced method can be utilized. This method is called k-fold cross-validation. In k-fold cross-validation, the dataset is partitioned into k subsets. Training and validation of the hypothesis are also performed k times. Each time a different partition of the dataset is used as the validation set and the rest are used as the training examples. This method ensures that all of the available data is used exactly once in the hypothesis validation. The results of k-fold cross-validation are often averaged. Figure 3.7 presents an illustration of cross-validation. The dataset is split into three equal-sized sets. A different validation set is selected at each iteration. [24]
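The partitioning described above can be sketched as follows. This is a minimal illustration with a hypothetical helper name; it splits example indices in order and does not shuffle them, which a practical implementation usually would.

```python
def k_fold_splits(n_examples, k):
    """Return k (training_indices, validation_indices) pairs."""
    indices = list(range(n_examples))
    fold_size, remainder = divmod(n_examples, k)
    splits = []
    start = 0
    for fold in range(k):
        # The first `remainder` folds receive one extra example.
        stop = start + fold_size + (1 if fold < remainder else 0)
        validation = indices[start:stop]
        training = indices[:start] + indices[stop:]
        splits.append((training, validation))
        start = stop
    return splits


# Nine examples and three folds, mirroring the three-way split of Figure 3.7.
splits = k_fold_splits(9, 3)
```

Every index appears in exactly one validation set, so each example is used once for validation and k - 1 times for training.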

3.2.3 Random forests

Random forests are a modification of bagging, or bootstrap aggregation. Bagging is a technique that reduces the variance of the hypothesized target function by averaging multiple classifiers. Decision trees are ideal for bagging because they can capture complex behaviour and they have a low bias. A random forest is a collection of de-correlated tree-based classifiers. The idea in random forests is to improve the variance reduction of bagging by reducing the correlation between the individual trees. This is achieved through a process of randomly selecting the attributes which are used by an individual tree. At each node in a decision tree, a small subset of the attributes is selected randomly before deciding which attribute is the one that maximizes information gain. Commonly the square root of the number of attributes is used as the size of the subset. A smaller number of attributes reduces the correlation between the trees and the variance of the predicted output values. [27]

Figure 3.8. The Iris dataset contains measurements of sepals and petals of three Iris flower species. The information gain of each attribute, computed by the scikit-learn random forest classifier, is displayed on the y-axis.

In regression problems, where the output of the target function is a continuous value, a random forest can simply average the results of the individual decision trees. If a random forest is used in classification, the trees form a committee which votes for an output class. Random forests can also measure the importance of the input attributes. The amount of information gain produced by each attribute is accumulated when the individual decision trees are grown. For example, the importance of the attributes in the Iris dataset is plotted in Figure 3.8. [27]
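The two sources of randomness in a random forest can be sketched with the standard library alone. The helper names below are hypothetical and the sketch omits tree growing entirely; it only shows the bootstrap sample drawn for each tree and the random attribute subset drawn at each node, assuming the square-root rule mentioned above.

```python
import math
import random


def bootstrap_sample(examples, rng):
    """Draw len(examples) examples with replacement (one sample per tree)."""
    return [rng.choice(examples) for _ in examples]


def candidate_attributes(attributes, rng):
    """Pick about sqrt(n) attributes without replacement (one draw per node)."""
    k = max(1, int(math.sqrt(len(attributes))))
    return rng.sample(attributes, k)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
attributes = ["sepal length", "sepal width", "petal length", "petal width"]
examples = list(range(10))  # stand-ins for training examples

sample = bootstrap_sample(examples, rng)
candidates = candidate_attributes(attributes, rng)
```

With the four Iris attributes, each node would consider only two randomly chosen attributes when searching for the best split.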

4 BODYCHECK DATASET

The problem studied in this thesis can be defined using the learning problem definition explored in the previous chapter. The definition is:

• Task T: detect bodychecks from the indoor localization data of professional ice hockey players.
• Performance measure P: receiver operating characteristics graphs and area under the curve.
• Experience E: a dataset of bodycheck and non-bodycheck training examples and target values.

In this chapter, we focus on the training experience E by explaining our design choices, the process of obtaining the training examples and the contents of the resulting dataset. An analysis of the dataset is also provided in this chapter in the form of trends found in bodychecks. The creation of the dataset was the first objective of this thesis.

Ideally, the method investigated in this thesis should be a production-ready system that could be integrated as such into an automated statistics calculation software for ice hockey. Our first design choice was that the learning system would be trained using player and puck location data collected from professional ice hockey matches. The training examples are paired with human-labeled target outputs that provide direct feedback to the learning algorithm. The distributions of the training examples and all possible bodychecks are likely very close if enough data is collected. The Wisehockey system used in this work already has location data from thousands of ice hockey matches but there was a complete lack of bodycheck annotations. The only realistic solution for this problem was to manually annotate as many bodychecks as possible in a reasonable time frame. The data of these bodychecks together with that of randomly selected negative training examples were collected and put together into a single dataset.

Location and clock data of 14 Finnish Hockey League matches played during the 2019-2020 regular season were collected and processed by the Wisehockey system and then later obtained for this work. The matches were played between 07.03.2020 and 12.03.2020 and they are identified by ids 429-442 [28]. The 14 matches were selected so that all matches were played in different rinks. The ice hockey rinks used in the Finnish elite league have different dimensions and the indoor localization system hardware cannot be installed identically in every arena. These differences in the rinks add different amounts of measurement error and noise into the location data.

4.1 Ground truth annotations

Ice hockey players are in physical contact very often during a professional match. This makes the judging of what actually is a bodycheck somewhat subjective. Not all collisions or contacts between the players are bodychecks and this can be difficult to determine even for a human observer in some situations.

Television broadcasts of all matches were recorded into video files and then each recording was split into periods. The videos generated by the Wisehockey system from the location data of each match were also split into periods. The videos of a single period were synchronized using the match clock and the movements of the puck in both videos. The videos were synchronized using custom software that can adjust the start time of both videos, place individual frames of one video above the frames of the other video and then save them into a new video file. The videos must have the same framerate for this process to function properly. A framerate of 25 frames per second was used in this work.

The videos of each match were accompanied by a single metadata file. The metadata files contained two things:

1. Period start timestamps obtained from the combined location and clock data.
2. A mapping of players in the match roster and the tags worn by those players during the match.

The tag-to-player mapping was required because the location data is collected from the tags which are assigned to specific players during the match. The same mapping must be used during the annotation process. Otherwise there would be a mismatch between the tags and the players, which would result in incorrect annotations. An example of a metadata file is presented in Program 4.1.

A custom annotation tool was used to create annotations of bodychecks. A screenshot of the annotation tool is given in Figure 4.1. A total of 775 bodychecks were annotated during this work. The annotations were placed as accurately as possible at the moment where the players first make contact. Example annotations are given in Program 4.2. The annotation tool plays a period video file and reads the accompanying metadata file. The tool provides a user interface for easy creation, modification, browsing, and deletion of the annotations. The annotation tool saves the following information about bodychecks:

1. Unix timestamp.
2. Checking player tag id.
3. Target player tag id.
4. Team of the checking player.

    {
        "period_1_start_timestamp": 1584037799434,
        "period_2_start_timestamp": 1584037823456,
        "period_3_start_timestamp": 1584037915467,
        "tag_mapping": [
            {
                "player_number": 12,
                "player_team": "Home",
                "player_tag_id": "PlayerTag01",
                "player_name": "Juha-Matti Aaltonen"
            },
            {
                "player_number": 7,
                "player_team": "Away",
                "player_tag_id": "PlayerTag02",
                "player_name": "Lasse Kukkonen"
            }
        ]
    }

Program 4.1. Example metadata file.

Figure 4.1. A screenshot of the annotation tool.

    {
        "annotations": [
            {
                "checkingPlayer": "playerTag01",
                "targetPlayer": "playerTag02",
                "checkingTeam": "Home",
                "timestamp": 158403780020
            },
            {
                "checkingPlayer": "playerTag02",
                "targetPlayer": "playerTag01",
                "checkingTeam": "Away",
                "timestamp": 158403780614
            }
        ]
    }

Program 4.2. Example annotations.

4.2 Annotated data

After the bodychecks in the matches were annotated, the location data that matches the annotations was collected. Hardegger et al. determined that the optimal window size for their problem was 0.6 seconds [15]. Even though their data is different, the same window size is used in this work. The windows were positioned so that the annotated moments were in the middle of the window. The location data of the checking player, target player and the puck were saved. For each annotated bodycheck, the location data of a random moment in the same match was collected as a negative training example. The checking player and target player were randomly selected among the players, with goaltenders excluded. Special care was taken to prevent the random moments from overlapping with annotations, other random moments and stoppages of play.
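The window placement and the overlap check can be sketched as follows. This is an illustrative sketch under stated assumptions and not the collection code used in this work: the sample format (timestamp in milliseconds, x, y), the function names and the retry limit are all hypothetical, and the exclusion of stoppages of play is not modelled.

```python
import random

WINDOW_MS = 600  # 0.6 second window, as stated in the text


def window_around(samples, center_ms, window_ms=WINDOW_MS):
    """Keep (timestamp_ms, x, y) samples within half a window of the centre."""
    half = window_ms // 2
    return [s for s in samples if abs(s[0] - center_ms) <= half]


def overlaps(candidate_ms, taken_ms, window_ms=WINDOW_MS):
    """True if a window centred on candidate_ms overlaps any taken window."""
    return any(abs(candidate_ms - t) < window_ms for t in taken_ms)


def random_negative_moment(start_ms, end_ms, taken_ms, rng, max_tries=1000):
    """Draw a random moment whose window does not overlap a taken window."""
    for _ in range(max_tries):
        candidate = rng.randrange(start_ms, end_ms)
        if not overlaps(candidate, taken_ms):
            return candidate
    return None


# Toy track sampled every 100 ms (assumed sampling interval).
samples = [(t, 0.0, 0.0) for t in range(0, 1001, 100)]
window = window_around(samples, center_ms=500)

rng = random.Random(1)
negative = random_negative_moment(0, 10_000, taken_ms=[5000], rng=rng)
```

In this sketch, `taken_ms` would hold both the annotated moments and the previously drawn random moments of the same match.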

The location data was collected into a single dataset with proper labels for both classes. Some metadata was added to allow further analysis of the dataset. The resulting dataset had the following information:

1. Match id.
2. Timestamp.
3. Period.
4. Checking player tag id.
5. Checking player role.
6. Checking player team (home/away).
7. Checking player x data.
8. Checking player y data.
9. Target player tag id.
10. Target player role.
11. Target player team (home/away).
12. Target player x data.
13. Target player y data.
14. Puck x data.
15. Puck y data.
16. Rink length.
17. Rink width.
18. Label.

Player information and rink dimensions were obtained from the Wisehockey system. The resulting dataset contained 1550 training examples, 775 of which were annotated bodychecks and the rest random gameplay moments.

4.3 Dataset analysis

The number of bodychecks in each period was counted and the sums were grouped by period number and then averaged. There are three normal 20 minute periods in all matches. 5 matches were tied after the first three periods and an extra period was played in those matches. The extra period is played with 3 players on the ice from both teams. The maximum length of the extra period is 5 minutes and it ends if a goal is scored. The average number of bodychecks grouped by period number is presented in Figure 4.2. In these matches, there was a trend where the number of bodychecks slightly declined as the matches advanced. Ice hockey is a demanding sport and this trend could occur because the players get more tired towards the end of the match. Another reason for this trend could be that the players are not willing to risk getting a penalty from a bodycheck gone wrong.

The players in ice hockey matches play different positions or roles in the team. These roles are forward, defenseman and goaltender. Forwards can be further divided into left wing, right wing and center roles. Similarly, defensemen can be divided into left and right roles. The data was grouped by checking player roles, counted and plotted. The same was done by grouping by target player roles. Figure 4.3 displays the distribution of performed and received bodychecks among the player roles. Goaltenders were excluded as there are no clean bodychecks on goalkeepers according to the IIHF rule book [3]. Individually these distributions are not so interesting. However, the two distributions side-by-side reveal that forwards receive slightly more hits than they perform. The opposite is true for defensemen.

Similarly to the division with respect to player roles, the dataset can be divided among the home teams and the away teams. Here, the data were grouped by the team of the checking player. The results of this grouping are given in Figure 4.4. In these matches, the home teams performed somewhat more hits than the visiting teams. This could be because there are more home team fans in the arena and they tend to cheer when well-performed bodychecks happen. This could inspire the home team players to do more checking.

Figure 4.2. Average bodychecks in each period.

Figure 4.3. Bodycheck distribution among player roles.

Figure 4.4. Bodycheck distribution between home and away teams.

A very interesting way to analyze the data is by location on the ice. The ice hockey rinks used in the Finnish Hockey League often have different dimensions (length and width). This means that the raw x and y locations of the tags cannot reach the same values in two different sized rinks. To solve this issue, the dimensions of the rinks were obtained from the Wisehockey system and then the locations of the checking players at the annotated moment were normalized using the rink dimensions. The normalized checking player locations were then used to plot a 2-dimensional histogram with 5000 unique bins. The histogram is presented in Figure 4.5. A majority of the bodychecks happened near the boards of the ice hockey rink, with only individual checks performed on the open ice. The familiar shape of an ice hockey rink can be seen from the histogram. As a contrast to the locations of the annotated bodychecks, the same plot was produced for the randomly selected data. The resulting histogram is presented in Figure 4.6.
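The normalization and binning can be sketched as below. Several details are assumptions made only for this illustration: the coordinate origin is taken to be a corner of the rink so that x lies in [0, rink length] and y in [0, rink width], and the 5000 bins are assumed to be arranged as a 100 by 50 grid. The text does not specify either detail.

```python
def normalize(x, y, rink_length, rink_width):
    """Scale a location to [0, 1] x [0, 1] using the rink dimensions."""
    return x / rink_length, y / rink_width


def histogram_2d(points, x_bins=100, y_bins=50):
    """Count normalized points in an x_bins by y_bins grid."""
    counts = [[0] * x_bins for _ in range(y_bins)]
    for nx, ny in points:
        # Clamp so that a point exactly on the far edge lands in the last bin.
        col = min(int(nx * x_bins), x_bins - 1)
        row = min(int(ny * y_bins), y_bins - 1)
        counts[row][col] += 1
    return counts


# Two rinks of different size (assumed dimensions in meters).
center_a = normalize(30.0, 14.0, rink_length=60.0, rink_width=28.0)
center_b = normalize(29.0, 15.0, rink_length=58.0, rink_width=30.0)
counts = histogram_2d([center_a, center_b, (1.0, 1.0)])
```

After normalization, the center spots of both rinks fall into the same bin even though their raw coordinates differ.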

A similar analysis was performed on the bodychecks performed by the home team of a single match. The normalized x and y locations of bodychecks of the home team were plotted in a 2-dimensional histogram. The histogram is displayed in Figure 4.7. Unsurprisingly, the same trend is present in the case of this individual team.

Figure 4.5. A 2d histogram of all annotated bodychecks in the dataset.

Figure 4.6. A 2d histogram of all randomly selected non-bodychecks.

Figure 4.7. The bodychecks of the home team in match 429.

5 EXPERIMENTS

In Chapter 4, we defined the learning problem in this work and explored the design choices regarding the training experience. This chapter continues onto the other aspects of the problem definition: Task T and Performance measure P. This chapter also fulfills the next two objectives of this thesis: to train a machine learning algorithm that can classify the bodychecks and non-bodychecks from the bodycheck dataset and to evaluate the detection performance of the model on the location data of full matches. We explain the selected target function, representation and function approximation algorithm in this chapter.

5.1 The representation and the target function

The representation of the target function was already partially defined when the dataset was created. The training examples in the dataset were put together so that there was always a checking player and a target player. The input parameters of the target function we selected were the attributes extracted from the location data of the two players. These are the attributes that were extracted from the location data in the bodycheck dataset:

• Checking player x and y location at the checking moment.
• Target player x and y location at the checking moment.
• The distance covered by the checking player before and after checking.
• The distance covered by the target player before and after checking.
• Speed of checking player before and after checking.
• Speed of target player before and after checking.
• Direction of the checking player before and after checking.
• Direction of the target player before and after checking.
• Puck proximity.
• Distance between the players at the checking moment.
• Distance between the puck and the checking player at the checking moment.
• Distance between the puck and the target player at the checking moment.

The distance covered by each player was computed by summing the distances between adjacent points in the location data. The speed of the players was calculated in meters per second by dividing the covered distance in meters by the time it took in seconds. The window length was defined as 0.6 seconds in Chapter 4. Player directions were calculated as the angle in radians between a directional vector based on a player's movement and the x-axis. Puck proximity was used to indicate whether the puck was closer than two meters from the target player at the checking moment.
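The attribute computations described above can be sketched with the standard library. This is a hedged illustration rather than the extraction code of this work: the function names are hypothetical, a track is assumed to be a list of (x, y) points in meters, and the directional vector is assumed to run from the first to the last point of the track, which the text does not specify.

```python
import math


def distance_covered(track):
    """Sum of distances between adjacent (x, y) points, in meters."""
    return sum(math.dist(a, b) for a, b in zip(track, track[1:]))


def speed(track, duration_s):
    """Average speed in meters per second over the track."""
    return distance_covered(track) / duration_s


def direction(track):
    """Angle in radians between the movement vector and the x-axis."""
    dx = track[-1][0] - track[0][0]
    dy = track[-1][1] - track[0][1]
    return math.atan2(dy, dx)


def puck_is_close(puck_xy, target_xy, limit_m=2.0):
    """True if the puck is closer than limit_m to the target player."""
    return math.dist(puck_xy, target_xy) < limit_m


# Toy track for the half window before the checking moment (assumed values).
before = [(0.0, 0.0), (0.3, 0.4), (0.6, 0.8)]
covered = distance_covered(before)
avg_speed = speed(before, duration_s=0.3)
heading = direction(before)
```

The same functions would be applied to the half window after the checking moment to obtain the "after" variants of each attribute.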

The target function that we chose can be expressed simply as DetectBodycheck : A → P. The target function takes the attributes A extracted from the location data of two players from opposing teams and outputs the probability P of a bodycheck. If this function can be learned, it would provide the probability of a bodycheck at a single moment in the match and also information about which player bodychecked which player. An alternative representation might include information about the other players and even be able to provide more accurate predictions. However, it would then be impossible to tell who actually performed the bodycheck and who was the target. In full match runs, the checking player and the target player at any given moment can be selected by forming all possible player pairs from the players on the ice and alternating the roles inside the pairs. With this procedure, the bodychecking probability of all possible player combinations is checked for all moments inside the match.

5.2 The function approximation algorithm

The dataset was divided into 14 folds where 13 matches were in the training set and the last match was the test set. Each match was used as the test set exactly once. Hardegger et al. determined that a random forest classifier with at least 25 subtrees performed best in detecting short and long activities in ice hockey [15]. A random forest with 100 subtrees was trained using the training data of each fold. Figure 5.1 displays the receiver operating characteristic curves and the areas under the curves for the test data of each cross-validation fold. The trained random forest classifier achieves a 0.995 area under the curve averaged over the test data.

We used the RandomForestClassifier from the scikit-learn Python module as the random forest implementation. The implementation in scikit-learn differs from the original random forest publication by averaging the probabilistic predictions instead of having the individual decision trees vote for the target classes [26]. The scikit-learn implementation of the random forest classifier computes the importance of each attribute. The importances were averaged over the cross-validation folds and ordered by importance. The resulting histogram is presented in Figure 5.2. The distances between the players and the puck were the most important by a large margin. The others were less important but not completely insignificant.
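The difference between voting and averaging can be shown with a small standalone sketch. It does not call scikit-learn; the function names and the three per-tree probabilities are assumptions chosen so that the two combination rules disagree.

```python
def majority_vote(tree_probabilities, threshold=0.5):
    """Each tree votes for class 1 if its own probability reaches threshold."""
    votes = sum(1 for p in tree_probabilities if p >= threshold)
    return 1 if votes > len(tree_probabilities) / 2 else 0


def averaged_probability(tree_probabilities):
    """Mean of the per-tree probabilities of class 1."""
    return sum(tree_probabilities) / len(tree_probabilities)


# Assumed per-tree probabilities of the bodycheck class for one input.
tree_probabilities = [0.9, 0.45, 0.45]

voted_class = majority_vote(tree_probabilities)
mean_probability = averaged_probability(tree_probabilities)
averaged_class = 1 if mean_probability >= 0.5 else 0
```

Here two of the three trees vote against the positive class, yet the averaged probability of 0.6 is above 0.5. Averaging also yields a graded probability, which is what the threshold sweep behind a ROC curve requires.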

Figure 5.1. ROC/AUC over 14-fold cross-validation training.

5.3 The full match evaluation

The narrowing of the target function to only include two players means that during a full match run the target function must be invoked several times per time window to output a probability of a bodycheck between all possible player pairs. Ice hockey is a team sport where a maximum of 10 skaters are on the ice during normal even strength gameplay. This number can drop to 6 if enough penalties are given. Goalkeepers can be excluded because they are off limits according to the rules of ice hockey. During even strength, there are 5 × 5 = 25 possible player pairs when each home team player is paired once with each visiting team player. This number doubles if each pair is considered twice to obtain the additional information of which player did the checking and which player was the target. The home team player acts as the checking player and the away team player as the target player in the first scenario. The roles are reversed in the second scenario.
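The pairing procedure can be sketched with `itertools.product`. The function name and the tag identifiers are hypothetical and only serve to illustrate the counting argument above.

```python
from itertools import product


def ordered_pairs(home_skaters, away_skaters):
    """Return (checking player, target player) tuples for both role orders."""
    pairs = []
    for home, away in product(home_skaters, away_skaters):
        pairs.append((home, away))  # home player checks, away player is target
        pairs.append((away, home))  # roles reversed
    return pairs


# Hypothetical tag ids for five skaters per team (goaltenders excluded).
home = [f"H{i}" for i in range(1, 6)]
away = [f"A{i}" for i in range(1, 6)]
pairs = ordered_pairs(home, away)
```

At even strength this gives 50 ordered pairs per time window, and with three skaters per team, as in the extra period, it gives 18.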

Before running the detection for the full matches, the random forest classifier was trained with all of the available data. The classifier was then applied to the same 14 matches that were used to create the dataset. Each match was processed with the following configuration:

Figure 5.2. Averaged and sorted feature importances.

1. A 0.6 second window was slid across the location data of the whole match.
2. An overlap of 50% or 0.3 seconds was used.
3. Inside one time window, all players on the ice were paired with the other players on the ice from the opposing team. Goaltenders were excluded.
4. The attributes were extracted for each player pair with both players in the pair taking turns as being the checking player and the target player.
5. The input attribute vector was fed into the trained algorithm and a probability was obtained as an output.
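The first two steps of the configuration can be sketched as a list of window start times. The function name is hypothetical, and the estimate below assumes 60 minutes of processed data with 50 ordered player pairs in every window, which is a simplification of a real match.

```python
def window_starts(match_length_ms, window_ms=600, stride_ms=300):
    """Start times of 0.6 s windows advanced by 0.3 s (50% overlap)."""
    return list(range(0, match_length_ms - window_ms + 1, stride_ms))


# Assumed: 60 minutes of processed data and 50 ordered pairs per window.
starts = window_starts(60 * 60 * 1000)
estimated_predictions = len(starts) * 50
```

Under these assumptions the estimate is 599950 predictions, which is the same order of magnitude as the count observed in the actual runs. The real count depends on how many skaters are on the ice in each window and on how much of the match data is processed.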

Around 570000 predictions were executed during the run of each match. ROC curves and AUCs were computed and they are available in Figure 5.3. The ROC curves are identified by the Liiga match ids. The performance on whole matches is similar to the performance during the training phase. The average area under the curve was 0.992.

Figure 5.3. ROC/AUC over 14 full matches. 0.6 second window size and 0.3 second stride.

6 SUMMARY

Table 6.1 displays confusion matrices at different probability thresholds from a full run on match 429. Match 429 was part of the training data and it has 53 annotated bodychecks. All or most of those are detected at thresholds up to 0.9 while the number of false positives continues to decline rapidly. The other 13 matches showed similar trends. The number of false positives relative to the total number of negative cases at the highest threshold is only about 0.07% (378 of 572581) and at 0.9 it is about 1%. This trend is visualized in Figure 6.1. The machine learning model trained with the methods presented in this work can successfully detect all or most bodychecks with a relatively small number of false positives.

The trend of the absolute number of false positives with respect to the probability threshold is visualized in Figure 6.2. 378 and 5878 false positives are detected in match 429 at thresholds 1.0 and 0.9 respectively. The rate of true positive detections declines rapidly between the two thresholds. 83% of bodychecks are detected at 0.9 compared to only 28% at 1.0. Even at the highest threshold, the number of false positives is over seven times higher than the number of annotated bodychecks and about 25 times higher than the number of true positives. This makes the method presented in this thesis unattractive for deployment into a production system.
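The figures quoted above can be recomputed from the two highest-threshold rows of Table 6.1. The sketch below only restates the table values; the function and variable names are hypothetical, and precision is an additional derived quantity that is not reported elsewhere in this work.

```python
# (true positives, false positives, false negatives, true negatives)
# for match 429, copied from Table 6.1.
table = {
    0.9: (44, 5878, 9, 566703),
    1.0: (15, 378, 38, 572203),
}


def rates(tp, fp, fn, tn):
    """Return true positive rate, false positive rate and precision."""
    return tp / (tp + fn), fp / (fp + tn), tp / (tp + fp)


tpr_09, fpr_09, precision_09 = rates(*table[0.9])
tpr_10, fpr_10, precision_10 = rates(*table[1.0])

annotated = 53  # annotated bodychecks in match 429
fp_per_annotation = table[1.0][1] / annotated
```

The true positive rates are roughly 0.83 and 0.28, the false positive rates roughly 1% and 0.07%, and there are about 7.1 false positives per annotated bodycheck at threshold 1.0. The low precision at both thresholds expresses the same problem from the viewpoint of a user who has to review the detections.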

Bodychecks are a subset of all contact events in ice hockey. The players are constantly pushing, shoving, hitting, blocking or holding each other. These kinds of events would not be classified as bodychecks by an ice hockey fan but they might look a lot like bodychecks to a machine learning model which only has access to a few features derived from location data. The false positives are typically situations where the two players are contesting the puck. When players are battling for the control of the puck it is expected that there is some type of close contact between the players.

Examples of false positive situations are given in Figures 6.3 and 6.4. The figures contain similar screenshots from the television broadcast as Figure 1.1 and the plotted location data of the players and the puck inside the time window that was used in the prediction. False negative examples are presented in Figures 6.5 and 6.6 in the same format. All examples are taken from the set of false positives and false negatives produced by threshold 1.0 on match 429. Figure 6.3 shows a situation which is very hard to differentiate from a true positive case. The players are contesting for puck control and even make contact briefly. These kinds of cases are the most common false positives. Some false positives are produced even when the players do not come into contact. This can be seen in Figure 6.4.

Figure 6.1. Relative false positive trend vs. probability threshold. The number of relative false positives was computed by dividing the number of false positives at each threshold by the number of false positives at threshold 0.

Figure 6.2. False positive trend vs. probability threshold.

Table 6.1. Confusion matrices from the full run on match 429.

Probability threshold   True positive   False positive   False negative   True negative
0.0                     53              572581           0                0
0.1                     53              94329            0                478252
0.2                     53              45950            0                526631
0.3                     53              35689            0                536892
0.4                     52              29935            1                542646
0.5                     52              24190            1                548391
0.6                     50              19127            3                553454
0.7                     50              14816            3                557765
0.8                     48              10537            5                562044
0.9                     44              5878             9                566703
1.0                     15              378              38               572203

Figure 6.3. False positive example 1. The first part of the figure consists of screenshots from the television broadcast. The players come into contact but this situation was not annotated as a bodycheck. The second part of the figure displays the location data of the players and the puck. The plot axes display the whole range of possible values inside an ice hockey rink.

Figure 6.4. False positive example 2.

Figure 6.5. False negative example 1.

Figure 6.6. False negative example 2.

7 CONCLUSIONS

The purpose of this thesis was to investigate the use of machine learning to detect bodychecks from indoor localization data collected from professional ice hockey matches. We annotated hundreds of bodychecks and used the annotations together with randomly selected gameplay moments to create a bodychecking dataset. A random forest classifier was then trained with the dataset to create a detection model. The performance of the classifier was evaluated using receiver operating characteristics and area under the curve on the location data of several full matches. An average AUC of 0.992 was obtained from the full match runs.

The method presented in this thesis can detect most bodychecks and simultaneously filter almost all of the negative cases. The number of false positives relative to the number of negative cases is very low. However, the absolute number of false positives detected in a full match is in the hundreds. The average number of annotated bodychecks is several times lower than that. The typical false positive occurs when the two players are contesting the puck. Often the players are in contact with each other but not in a way that a human viewer would describe as a bodycheck. A very important detail in this work is that the annotations were made by a single person, which creates its own bias that could reflect upon the final classification performance. A fully automated detection system is expected to produce almost perfect results, and the users of such systems would not accept the performance that the method presented here achieves.

There are several options from which to continue this work. The first option is to simply continue exploring different methods with the current bodychecking dataset. The model is limited to using location data and even further limited to a small number of features. One could simply come up with new features from the location data to get better performance. For example, only two players are observed at a time. No information about the other players on the ice is included in the input features. Furthermore, only one machine learning algorithm was investigated in this work. The performance of a single classifier could be surpassed by training more classifiers and forming a committee of sorts. The use of deep learning methods should also be taken into consideration. The results achieved in this work might be improved by omitting the feature extraction process and feeding raw location data into a neural network.

The second option is to implement accelerometers into the Quuppa QT-1 tags which were used to collect the location data. This approach amounts to a considerable effort as the tags likely require changes on the hardware and the software levels. Many of the related works presented in Chapter 2 utilize inertial measurement units or accelerometers. For example, Kelly et al. used a sensing device which contained a GPS receiver and a three-axis accelerometer to detect collisions in professional rugby. They hint that finding informative features from accelerometer signals is a difficult task because peaks in the acceleration signal occur from many actions including running, jumping and falling. [11] It is difficult to imagine that not being the case for ice hockey. Nevertheless, acceleration data would reveal more information about the collisions happening between the players and would improve a classifier's ability to separate false positives from actual bodychecks. This is clearly a step in the right direction.

The third option is to install camera systems in the ice hockey arenas to provide another source of information for the classification task. Computer vision could then be used to analyze the video footage. The motivation for this addition is clear: vision is the primary way a human perceives an ice hockey match. A deep learning based computer vision system might learn to detect bodychecks in a similar fashion to the method presented in this work. That said, there are edge cases where the performance could deteriorate. Depending on the camera setup, there could be blind spots near the boards of the rink. In addition, the players are often clumped together in the corners and near the goals. These kinds of situations may lead to problems in the detection.

We created a bodychecking dataset which was successfully used to train a machine learning model. The model was evaluated on full match runs and the results of those runs were analyzed. This thesis has reached its main objectives and the research question has been answered even with the shortcomings of the presented method. Multiple concrete future research avenues were presented.

REFERENCES

[1] Beal, R., Norman, T. J. and Ramchurn, S. D. Artificial intelligence for team sports: a survey. eng. Knowledge engineering review 34 (2019). ISSN: 1469-8005.
[2] Hootman, J. M., Dick, R. and Agel, J. Epidemiology of collegiate injuries for 15 sports: summary and recommendations for injury prevention initiatives. eng. Journal of athletic training 42.2 (2007), 311–319. ISSN: 1062-6050.
[3] International Ice Hockey Federation. IIHF official rule book 2018-2022. URL: https://iihfstorage.blob.core.windows.net/iihf-media/iihfmvc/media/downloads/rule%20book/iihf_official_rule_book_2018_ih_191114.pdf (visited on 05/10/2020).
[4] Douglas, A. S. and Kennedy, C. R. Tracking In-Match Movement Demands Using Local Positioning System in World-Class Men’s Ice Hockey. Journal of Strength and Conditioning Research 34.3 (2020), 639–646. ISSN: 1064-8011.
[5] Zafari, F., Gkelias, A. and Leung, K. A survey of indoor localization systems and technologies. Communications Surveys and Tutorials (2019). ISSN: 1553-877X.
[6] Swarén, M., Stöggl, T., Supej, M. and Eriksson, A. Usage and validation of a tracking system to monitor position and velocity during cross-country skiing. eng. International Journal of Performance Analysis in Sport 16.2 (2016), 769–785. ISSN: 2474-8668. URL: http://www.tandfonline.com/doi/abs/10.1080/24748668.2016.11868922.
[7] Figueira, B., Gonçalves, B., Folgado, H., Masiulis, N., Calleja-González, J. and Sampaio, J. Accuracy of a Basketball Indoor Tracking System Based on Standard Bluetooth Low Energy Channels (NBN23). eng. Sensors (Basel, Switzerland) 18.6 (2018). ISSN: 1424-8220.
[8] Colino, E., Garcia-Unanue, J., Sanchez-Sanchez, J., Calvo-Monera, J., Leon, M., Carvalho, M. J., Gallardo, L., Felipe, J. L. and Navandar, A. Validity and Reliability of a Commercially Available Indoor Tracking System to Assess Distance and Time in Court-Based Sports. eng. Frontiers in Psychology 10 (2019), 2076. ISSN: 1664-1078. URL: https://doaj.org/article/726ca3c16f6c4592bd888a592cc140cb.
[9] Cust, E. E., Sweeting, A. J., Ball, K. and Robertson, S. Machine and deep learning for sport-specific movement recognition: a systematic review of model development and performance. eng. Journal of Sports Sciences 37.5 (2019), 568–600. ISSN: 0264-0414. URL: http://www.tandfonline.com/doi/abs/10.1080/02640414.2018.1521769.
[10] Wang, J., Chen, Y., Hao, S., Peng, X. and Hu, L. Deep learning for sensor-based activity recognition: A survey. eng. Pattern Recognition Letters 119 (2019), 3–11. ISSN: 0167-8655.
[11] Kelly, D., Coughlan, G., Green, B. and Caulfield, B. Automatic detection of collisions in elite level rugby union using a wearable sensing device. eng. Sports Engineering 15.2 (2012). ISSN: 1369-7072.
[12] Gastin, P. B., Mclean, O. C., Breed, R. V. and Spittle, M. Tackle and impact detection in elite Australian football using wearable microsensor technology. eng. Journal of Sports Sciences 32.10 (2014), 947–953. ISSN: 0264-0414. URL: http://www.tandfonline.com/doi/abs/10.1080/02640414.2013.868920.
[13] Chambers, R. M., Gabbett, T. J., Gupta, R., Josman, C., Bown, R., Stridgeon, P. and Cole, M. H. Automatic detection of one-on-one tackles and ruck events using microtechnology in rugby union. eng. Journal of Science and Medicine in Sport 22.7 (2019), 827–832. ISSN: 1440-2440.
[14] Hulin, B. T., Gabbett, T. J., Johnston, R. D. and Jenkins, D. G. Wearable microtechnology can accurately identify collision events during professional rugby league match-play. eng. Journal of Science and Medicine in Sport 20.7 (2017), 638–642. ISSN: 1440-2440.
[15] Hardegger, M., Ledergerber, B., Mutter, S., Vogt, C., Seiter, J., Calatroni, A. and Troster, G. Sensor technology for ice hockey and skating. eng. 2015 IEEE 12th International Conference on Wearable and Implantable Body Sensor Networks (BSN). IEEE, 2015, 1–6. ISBN: 9781467372015.
[16] Bisdikian, C. An overview of the Bluetooth wireless technology. eng. IEEE Communications Magazine 39.12 (2001), 86–94. ISSN: 0163-6804.
[17] Gomez, C., Oller, J. and Paradells, J. Overview and Evaluation of Bluetooth Low Energy: An Emerging Low-Power Wireless Technology. eng. Sensors 12.9 (2012), 11734–11753. ISSN: 1424-8220.
[18] Quuppa Oy. Technology. Overview. URL: https://quuppa.com/technology/overview/ (visited on 04/22/2020).
[19] Quuppa Oy. Benefits. Key features of the Quuppa system. URL: https://quuppa.com/technology/benefits/ (visited on 04/23/2020).
[20] Quuppa Oy. Quuppa Positioning Engine (QPE). URL: https://quuppa.com/quuppa-positioning-engine/ (visited on 04/23/2020).
[21] Wisehockey Oy. Wisehockey. URL: https://wisehockey.com/ (visited on 05/20/2020).
[22] Wisehockey Oy. Technical info. URL: https://wisehockey.com/technical-info/ (visited on 05/21/2020).
[23] Wisehockey Oy. Overview. URL: https://wisehockey.com/wp-content/uploads/2020/02/wh_overview_en.pdf (visited on 05/22/2020).
[24] Mitchell, T. M. Machine learning. eng. New York.
[25] Fawcett, T. An introduction to ROC analysis. eng. Pattern recognition letters 27.8 (2006), 861–874. ISSN: 0167-8655.
[26] Buitinck, L., Louppe, G., Blondel, M., Pedregosa, F., Mueller, A., Grisel, O., Niculae, V., Prettenhofer, P., Gramfort, A., Grobler, J., Layton, R., VanderPlas, J., Joly, A., Holt, B. and Varoquaux, G. API design for machine learning software: experiences from the scikit-learn project. ECML PKDD Workshop: Languages for Data Mining and Machine Learning. 2013, 108–122.
[27] Hastie, T., Tibshirani, R. and Friedman, J. The elements of statistical learning: Data mining, inference, and prediction. 2nd ed., corrected at 11th printing. eng. Springer Series in statistics. New York: Springer, 2016. ISBN: 9780387848570.
[28] Jääkiekon SM-liiga Oy. Games. URL: https://liiga.fi/en/ottelut/2019-2020/runkosarja/ (visited on 05/21/2020).