<<

Linköping University | Department of Computer and Information Science
Master thesis, 30 ECTS | Computer Science
2021 | LIU-IDA/LITH-EX-A--2021/014--SE

Using Reinforcement Learning to Evaluate Player Pair Performance in Ice Hockey

Dennis Ljung

Supervisor: Niklas Carlsson
Examiner: Patrick Lambrix


Copyright

The publishers will keep this document online on the Internet – or its possible replacement – for a period of 25 years starting from the date of publication barring exceptional circum- stances. The online availability of the document implies permanent permission for anyone to read, to download, or to print out single copies for his/hers own use and to use it unchanged for non-commercial research and educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional upon the con- sent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility. According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement. For additional information about the Linköping Uni- versity Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its www home page: http://www.ep.liu.se/.

© Dennis Ljung

Abstract

A recent study using reinforcement learning with a Q-function to quantify the impact of individual player actions in ice hockey has shown promising results. The model takes into account the context of the actions and captures internal dynamic features of the play which simple common metrics, e.g., counting goals or assists, do not. It also performs look-ahead, which is important in a low-scoring game like ice hockey. However, it does not capture the chemistry between the players, i.e., how well the players play together, which is important in a team sport like ice hockey. In this paper, we therefore extend this earlier work on individual player performance with new metrics for the impact of player pairs when on the ice together. Our resulting top pairings are compared to NHL's official statistics, and an extended analysis is performed that investigates the relationship with time on ice, which provides insights that could be of relevance to coaches.

Acknowledgments

First I would like to thank my partner Emelie Stengård, who helped me find the motivation to finish this thesis. Also Carles Sans Fuentes, whose ambition gave me new inspiration and an excellent thesis to reference. Last but certainly not least, my supervisor Niklas Carlsson and examiner Patrick Lambrix, who provided invaluable guidance during this thesis and continued to do so after my initial due date. I would also like to thank Patrick again for taking me to my first ice hockey game as an adult.

Contents

Abstract
Acknowledgments
Contents
List of Figures
List of Tables

1 Introduction
  1.1 Research questions
  1.2 Delimitations
  1.3 Research method
  1.4 Outline

2 Related work

3 Background
  3.1 Teams
  3.2 Playing area
  3.3 Game rules

4 Theory
  4.1 Reinforcement Learning
  4.2 Markov Decision Processes
  4.3 Markov Game Model
  4.4 Solutions to MDPs
  4.5 AD-Tree
  4.6 Relational database
  4.7 Cumulative distribution function
  4.8 Box plot
  4.9 Exponentially weighted moving average

5 Method
  5.1 Building the state space
  5.2 Learning the Q-values
  5.3 Action impacts
  5.4 Individual player evaluation
  5.5 Player pair evaluation

6 Results and discussion
  6.1 Top pairings
  6.2 Top pairings analysis
  6.3 TOI analysis

7 Conclusion

Bibliography

List of Figures

3.1 A typical layout of a hockey rink. Each zone is divided by the blue lines. The middle being the neutral zone and the left/right being the defensive/offensive zones. The dotted line marks the centre of the rink and the blue circle on it marks the starting face-off spot. The other four red circles mark the other face-off spots. The goalposts are located on the centre of the red lines in the defensive/offensive zones.
4.1 A simple Markov process with two states and transition functions between them.
4.2 A small example of an AD-tree where each entry has 3 binary attributes.
4.3 One box plot.
5.1 The final state space with context features and play sequences.
6.1 CDF/CCDF of all seasons. CDF is plotted bottom left to right and CCDF top left to right (log10 scale).
6.2 Box plots of seasons.
6.3 Box plots of pairs.
6.4 Impact per minute played.
6.5 Box plots of pairs.

List of Tables

5.1 Action events.
5.2 Start/end events.
5.3 Some rows and attributes from the play-by-play events table stored in the MySQL database.
5.4 Some rows from the node table, with its attributes being the context features, the player who performed the action and the occurrence of that node.
5.5 Some rows from the edge table, with its attributes being the node ids and the occurrence of that edge.
6.1 Top forward pairs 2011-2012.
6.2 Top defense pairs 2011-2012.
6.3 Top mixed pairs 2011-2012.
6.4 Top forward pairs 2013-2014.
6.5 Top defense pairs 2013-2014.
6.6 Top mixed pairs 2013-2014.

1 Introduction

There exist many studies on evaluating individual player performance in ice hockey, but not on the performance of player pairs. Ice hockey is a team sport and it is therefore important for coaches to identify which players work well together and which do not, for example when picking line-ups for upcoming games and seasons. Another problem with common metrics for player performance, e.g., counting goals, is that they are too simple and do not capture the internal dynamics of the play. In recent works [22] [19], there have been promising results from summing the impact of player actions using data mining techniques. The work by Routley and Schulte [19], with a Markov game model and reinforcement learning techniques, learns an action-value Q-function that takes into account the context of an action. This is important because different actions can have different impacts depending on the current situation of the game. The model Routley and Schulte use also performs look-ahead on the consequences of actions, which is important in a low-scoring game like ice hockey. Data mining techniques have the potential to be generalized to new situations and provide better aid to human domain experts (coaches, managers, and scouts) than just using statistics would do. They could also be used independently to make decisions without input from human experts. [25] In this thesis, we extend Routley and Schulte's work [19] with added measures for evaluating player pairs.

1.1 Research questions

In order to comply with the objective of this study, the following research questions will guide the investigation and be summarized in the conclusion:

1. Can the method used by Routley and Schulte, with our extension of player pair valuation, provide good pair matching?

2. Can it be used to provide other insights that could be of relevance for coaches?

1.2 Delimitations

The calculations will be performed on an Intel laptop with a 2.90 GHz Intel Core i7 and 32 GB of memory.


1.3 Research method

First, a literature study was conducted to find a suitable method for evaluating players and for working with the play-by-play, event-driven hockey data. This was also done to get a better understanding of the theory used in the field and of what could be extended upon. Routley and Schulte's work with a Markov game model and reinforcement learning [19] was finally chosen for several reasons. It was clear how we could extend the method to also include player pair evaluation, and the method as described in their report was for the most part easy to understand and follow. Also, they had included links to the hockey data, results from running their algorithm and (most of) the code described in the report. This made it easy to develop and test our implementation and compare our results to theirs. The implementation of the algorithms from Routley and Schulte's work was developed in programming languages I felt more comfortable with, rather than reusing their code, partly to get a better understanding of the algorithms. However, some exploration was done when choosing programming languages. The value iteration algorithm, for example, was first developed in the statistical programming language R in the hope that a suitable implementation would already exist, but I ended up following Routley and Schulte's algorithm instead because of its custom design and requirements. R was, however, later used for plotting graphs. The evaluation of our method was primarily done through comparisons with Routley and Schulte's results, as mentioned, but any new code regarding functions for evaluating player pairs was developed through a test-driven development (TDD) approach1. This helped to limit bugs and strengthen the confidence in our method. Analysing our final results regarding the player pairs was done by comparing the players to existing common metrics and rankings, by plotting the results against different metrics, and by checking whether the graphs seemed reasonable.

1.4 Outline

This thesis is structured as follows. In Chapter 2, the related work regarding evaluating players in ice hockey is presented. A brief overview of the game of hockey is given in Chapter 3. Chapter 4 presents all relevant theory, e.g., reinforcement learning elements, Markov decision processes and how to solve them, which supports the method described in Chapter 5. Chapter 6 then contains the results from our method, which are the top player pair rankings, and our discussion and analysis of these results. Finally, we conclude the work in Chapter 7 and answer our research questions.

1https://en.wikipedia.org/wiki/Test-driven_development

2 Related work

As mentioned in chapter 1, there has been promising work on the more advanced metric of summing the impact of players' actions with Markov game models by Routley and Schulte [19]. There have been continuations of this method: evaluating a team's performance by aggregating the values of actions performed by the team's players, also including position information about where the action took place [23], and using deep reinforcement learning with continuous context signals [10]. Another example of work with Markov game action impacts is one where the players are clustered based on the locations of their actions rather than their positions [24], see section 3.1. Kaplan et al. [8] also use a Markov model with the manpower differential state of the game to evaluate players based on their incremental contribution to the probability of winning. Extending the method by Routley and Schulte with added player impact measures has also been done by Sans Fuentes et al. [21]. There the authors add the notion of the indirect impact a player has on achieving goals for the team when on the ice, which is used in this study as well. Estimating the impact of player actions has also been done through regression techniques by Schuckers and Curro [22], and by Gramacy et al., who estimate the partial effect each player had towards achieving a goal through logistic regression [4]. There also exist studies on improving the existing common +/- metric, i.e., counting the goals when the player is on the ice minus the goals by the other team over a whole game1, with regression techniques by Macdonald [12] [13]. Pettigrew has developed a new statistic where he evaluates how much a player's goal contributions impact their team's chance of winning the game [18]. Different works on prediction systems based on ice hockey have been done, e.g., using machine learning techniques to predict whether a team will win a game [5] or players' ranking tiers based on attributes such as age, goals, position, etc. [17]. Looking at player pair performance studies in ice hockey, there are not many. The work by Gramacy et al., as mentioned earlier, actually explores the posterior probability that one particular configuration of players is more likely to score or be scored upon when facing another configuration. This type of calculation would allow coaches to explore specific line match-ups against opponents [4]. There is also one example by Thomas et al. [28] where they use a Markov game model with hazard functions to evaluate player performance, which also takes into account the quality of teammates.

1https://en.wikipedia.org/wiki/Plus-minus

One analysis of the top 1000 pairs during seasons 2007-2012 with the highest number of shifts together (i.e., the most joint playtime) generated insights into which pairs performed better or worse than their individual performances would suggest.

3 Background

This chapter will give a brief overview of the game of hockey, with its rules, according to the National Hockey League (NHL) [15].

3.1 Teams

Each team is composed of 20 players and, during the game, each team has 6 players on the ice during even-strength situations. These 6 players consist of one goalie and five skaters, usually two defensemen and three forwards (center, right wing and left wing).

3.2 Playing area

The game takes place on a surface known as the rink, see figure 3.1. The rink is divided into three zones: defensive (where the goal is situated for the defending team), neutral (the central portion) and offensive (furthest away from the defending goal). There are five different face-off spots, one of which is placed in the centre of the rink and also marks the game's starting location.

3.3 Game rules

A goal is scored for a team if the puck has been put between the goalposts by a skater in the offensive zone. An assist is awarded to the player or players (maximum two) who touch the puck prior to the goal scorer. The game time is divided into three 20-minute periods, with the addition of five minutes of overtime if the score is tied (during the regular season). A player can get a penalty if he violates a rule, e.g., by charging violently into an opponent or using the stick to restrain an opponent (hooking). The penalized player has to leave the ice and sit in a penalty box for 2, 4 or 5 minutes depending on the nature of the violation. Also, his team becomes short-handed, meaning that the team has fewer players on the ice. The opposing team goes into a powerplay, meaning that they have the advantage of more players over the short-handed team. Hockey is a fast-paced sport; for example, no player had more than 1 minute of time on ice (TOI) per shift during seasons 2007-2014.1

1http://www.nhl.com/stats/skaters


Figure 3.1: A typical layout of a hockey rink. Each zone is divided by the blue lines. The middle being the neutral zone and the left/right being the defensive/offensive zones. The dotted line marks the centre of the rink and the blue circle on it marks the starting face-off spot. The other four red circles mark the other face-off spots. The goalposts are located on the centre of the red lines in the defensive/offensive zones.

4 Theory

4.1 Reinforcement Learning

Reinforcement learning is a form of machine learning where the goal is to map situations to actions - learning what to do [27]. This section will describe the basic elements that are required.

Agent and environment

Anything that perceives and acts in an environment can be viewed as an agent. As soon as this agent is dropped into an environment, it generates a sequence of actions according to the percepts it receives. An action causes the environment to change its state. If the environment is deterministic, then an agent knows which state it will end up in after applying an action. If the outcome of an action depends on a probability distribution, then the environment is stochastic. Given an action and a state, a transition function returns the resulting state. Together with the initial state, the actions and the transition function, the complete state space can be defined. This is the set of all states reachable from the initial state by any sequence of actions. An agent also needs a goal, which in reinforcement learning is to maximize the total reward it receives over the long run. [20]

Rewards

In reinforcement learning, each state has an associated reward R(s) which returns a positive (good), negative (bad) or zero number. If the agent sums up all the rewards it receives over some sequence of state visits, R(s1) + R(s2) + · · · + R(sn), then the rewards are additive. A discounted reward is obtained if we introduce a discount factor 0 ≤ γ ≤ 1, giving R(s1) + γR(s2) + · · · + γ^(n−1)R(sn), which controls the preference for current rewards over future rewards. A γ closer to 0 makes the agent more shortsighted or greedy, whereas a value closer to 1 makes the agent more farsighted. Rewards are used to evaluate policies. [20]
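As a small illustration of the difference between additive and discounted rewards, the following Python sketch computes both for a short reward sequence; the sequence and the values of γ are made up for the example and do not come from the thesis.

```python
def additive_return(rewards):
    """Sum of rewards R(s1) + R(s2) + ... + R(sn)."""
    return sum(rewards)

def discounted_return(rewards, gamma):
    """Discounted sum R(s1) + gamma*R(s2) + ... + gamma^(n-1)*R(sn)."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))

# Example (made-up reward sequence): nothing until a reward at the end.
rewards = [0, 0, 0, 1]
print(additive_return(rewards))            # 1
print(discounted_return(rewards, 0.9))     # 0.9^3 = 0.729
print(discounted_return(rewards, 0.5))     # 0.5^3 = 0.125, a "greedier" agent
```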

Policies

A policy is a sequence of actions that the agent will follow starting from the initial state, i.e., a path in the state space. If this sequence of actions maximizes the total reward then it is an

optimal policy. If we have a stochastic environment, then the same policy cannot be executed every time because the agent will end up on different paths. Then the optimal policy is the one generating the highest expected total reward. [20] On-policy methods attempt to evaluate an existing policy, i.e., how high a reward it generates, whereas off-policy methods evaluate a policy different from the one used to generate the data. [27] If the environment contains terminal states and the agent is guaranteed to get to one eventually, then we never need to compare infinite sequences. A policy that is guaranteed to reach a terminal state is called a proper policy. [20]

4.2 Markov Decision Processes

A Markov decision process (MDP) is a way of modeling decision-making situations and can be used for the reinforcement learning described in section 4.1 [27]. An MDP is defined, according to Ronald A. Howard [6], by:

• A set of states S, the environment.

• A set of actions A.

• A transition function P(si, sj) that represents the probability of a state si being followed by state sj. The transition is an action a ∈ A performed by an agent. The environment occupies sj after the transition, and since the environment must be in some state after its next transition,

  Σ_{j=1}^{N} P(si, sj) = 1    (4.1)

meaning that the probabilities of all transitions from si sum up to 1.

• A reward function R(sj) that maps being in state sj to a number.

These definitions of an MDP are visualized in figure 4.1 (with the exception of rewards), which shows a very simple example of a discrete-time Markov process from Howard [6]. There are two states, s1 and s2, in this environment, and the arrows between them are the possible transition functions with associated probabilities. Let's say that an agent starts in s1. Then the agent would either stay in s1 with probability P(s1, s1) = 1/2 or move to s2 with probability P(s1, s2) = 1/2.

Figure 4.1: A simple Markov process with two states and transition functions between them.

A more graphic example of the process in figure 4.1 is a frog in a lily pond. The frog (agent) jumps between two pads in the pond (s1 and s2) or stays on one of them. As time goes by, one would observe that the frog makes the transitions with the probabilities in figure 4.1. [6]
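To make the example concrete, the sketch below simulates the two-state process. The probabilities out of s1 are the 1/2 values given above, while the probabilities out of s2 are assumed here purely for illustration, since only s1's transitions are stated in the text.

```python
import random

# Transition probabilities P(si, sj). The values for s1 follow the example
# above; the values for s2 are assumed for illustration.
P = {
    "s1": {"s1": 0.5, "s2": 0.5},
    "s2": {"s1": 0.4, "s2": 0.6},
}

def step(state):
    """Sample the next state from the transition distribution of `state`."""
    r = random.random()
    cumulative = 0.0
    for next_state, prob in P[state].items():
        cumulative += prob
        if r < cumulative:
            return next_state
    return next_state  # guard against floating point rounding

state = "s1"
counts = {"s1": 0, "s2": 0}
for _ in range(10_000):
    state = step(state)
    counts[state] += 1
print(counts)  # long-run visit frequencies of the frog's two lily pads
```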


4.3 Markov Game Model

A Markov game follows the same mathematical foundation as described in section 4.2, with the addition that there are two or more agents in the environment that share a set of states S. Each agent i has its own set of actions A1, ..., Ak and an associated transition function Ti : S × A1 × · · · × Ak → PD(S), which is controlled by one agent i at a time. [9]

4.4 Solutions to MDPs

To solve an MDP model, i.e., to maximize the rewards when applying actions in the state space, we describe what an optimal policy and an optimal value function are.

Value Iteration

In order to find an optimal policy (or evaluate an existing one), we need to first define the value of being in a state s following an optimal policy:

V(s) = R(s) + γ max_{a∈A(s)} Σ_{s'} P(s'|s, a) V(s')    (4.2)

This equation, called the Bellman equation after Richard Bellman (1957), or the state-value function, states that the value of being in state s is the immediate reward of that state, R(s), plus the discounted expected value V(s') of going to the next state when choosing the optimal action a from the available actions A(s). The discount factor γ can be eliminated (γ = 1) from the value function if we have proper policies. Also, P(s'|s, a) is the transition function from s to s' when applying action a, see section 4.2. [20] For each state in the MDP there would be a Bellman equation like equation 4.2, and to find the optimal policy we need to solve the simultaneous system of equations for the unknown value functions. For this, we can use dynamic programming, a term coined by Bellman [1]. The basic idea is that of viewing an optimal policy as one determining the decision required at each time in terms of the current state of the system. This lets us use an iterative approach where we update all of the states and their neighbours until an equilibrium or convergence criterion is reached. The pseudo-code, based on the algorithm from [20], is presented in algorithm 1:

Algorithm 1 Value iteration
1: function VALUEITERATION(mdp, c)
2:   states ← mdp.states
3:   R ← mdp.rewardFunction
4:   A ← mdp.actions
5:   lastValue ← 0
6:   currentValue ← 0
7:   V ← list of length(states) initialized to zeroes
8:   loop
9:     for s in states do
10:      V(s) ← R(s) + γ max_{a∈A(s)} Σ_{s'} P(s'|s, a) V(s')
11:      currentValue ← currentValue + |V(s)|
12:    end for
13:    if (currentValue − lastValue) / currentValue < c then
14:      break
15:    end if
16:    lastValue ← currentValue
17:  end loop
18:  return V
19: end function


We can think of the value iteration algorithm as propagating information through the state space by means of local updates. Returned is the optimal value function which assigns to each state the largest expected return achievable by any policy.
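A minimal Python sketch of value iteration on a tiny, made-up MDP is given below. It mirrors the structure of algorithm 1 but is not the thesis implementation; the dictionary-based data structures and the example MDP are assumptions for illustration.

```python
def value_iteration(states, R, P, A, gamma=0.9, c=1e-4, max_iter=10_000):
    """Minimal value iteration: `states` is an iterable of state ids, R[s] is
    the reward, A[s] the available actions in s, and P[(s, a)] a dictionary
    mapping successor states to probabilities."""
    V = {s: 0.0 for s in states}
    last_total = 0.0
    for _ in range(max_iter):
        for s in states:
            if A[s]:  # non-terminal state: one-step lookahead over actions
                V[s] = R[s] + gamma * max(
                    sum(p * V[s2] for s2, p in P[(s, a)].items()) for a in A[s]
                )
            else:     # absorbing state keeps its immediate reward
                V[s] = R[s]
        total = sum(abs(v) for v in V.values())
        if total and abs(total - last_total) / total < c:
            break
        last_total = total
    return V

# Tiny made-up example: from s0 the single action "a" reaches the rewarding s1.
states = ["s0", "s1"]
R = {"s0": 0.0, "s1": 1.0}
A = {"s0": ["a"], "s1": []}
P = {("s0", "a"): {"s1": 1.0}}
print(value_iteration(states, R, P, A))  # V(s0) ≈ 0.9, V(s1) = 1.0
```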

Q-function

Similar to the value function described in the previous section, a Q-function or action-value function Q(s, a) gives the expected discounted reward of applying action a in state s and then following a policy π. They are directly related as follows:

V(s) = Σ_a π(a|s) Q(s, a)    (4.3)

More specifically, the value function is an expectation over the action-values under a policy. [3]

4.5 AD-Tree

The AD-tree introduced by Moore and Lee [14] is a storage-efficient tree structure that caches the sufficient statistics needed for frequent counting queries on large data sets. A query is an entry in the database with one or multiple attributes (attribute = value), and its count is how many matching records exist in the database. An AD-node in the tree contains the count and a set of pointers, one for each varying value of its attributes, which in turn lead to new AD-nodes, see figure 4.2.

Figure 4.2: A small example of an AD-tree where each entry has 3 binary attributes.

As can be seen in figure 4.2, there is an extra value '*' for every attribute. This is an omitted value and is needed for queries where only one attribute is of interest, e.g., a1 = 1.


The tree never stores nodes with counts of zero, and never stores counts that can be deduced from other counts in the tree, meaning it is not necessary to store Vary nodes with indices below i + 1 because that information can be obtained from another path in the tree, see figure 4.2. If all records exist in the database and we do not store queries with '*' values, then the worst-case size of the tree (number of nodes) is the same as the number of all possible combinations of attribute values: V^M, where V is the number of values and M is the number of attributes. So, for example, if we have 40 different attributes that can each have 2 different values, then the worst-case size is 2^40 = 1,099,511,627,776 nodes. However, in real-world data sets the number of actual records R is much smaller than the mentioned upper bound, R ≪ V^M. [14]
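The following simplified Python sketch illustrates the idea of caching sufficient statistics for attribute-value queries. It enumerates and counts all observed attribute subsets instead of implementing the actual AD-tree pruning rules (omitted '*' values, zero-count and Vary-node elimination), so it should be read only as an illustration of the kind of counting queries an AD-tree answers.

```python
from collections import Counter
from itertools import combinations

def build_count_cache(records):
    """Cache counts for every subset of (attribute, value) pairs observed in
    the records. A real AD-tree prunes this structure heavily; here we simply
    enumerate subsets of each record."""
    cache = Counter()
    for record in records:
        items = sorted(record.items())
        for k in range(1, len(items) + 1):
            for subset in combinations(items, k):
                cache[subset] += 1
    return cache

def count(cache, **query):
    """Return how many records match the query, e.g. count(cache, a1=1)."""
    return cache[tuple(sorted(query.items()))]

records = [
    {"a1": 1, "a2": 0, "a3": 1},
    {"a1": 1, "a2": 1, "a3": 1},
    {"a1": 0, "a2": 1, "a3": 0},
]
cache = build_count_cache(records)
print(count(cache, a1=1))          # 2
print(count(cache, a1=1, a3=1))    # 2
```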

4.6 Relational database

A relation in a relational database is a set of rows, i.e., tuples, and columns, i.e., attributes, located in a table. The data must follow a set of rules and guidelines in order to be normalized and thus avoid anomalies and loss of data when maintained. One of these rules is that each entity must have a unique identifier, which is an attribute or a set of attributes that is used for identifying that entity. This is chosen by the designer of the database and becomes the primary key. Another key component is the relationships with other tables, which are usually expressed through foreign keys: (sets of) attributes in one table that are primary keys in other tables. Structured Query Language (SQL) is a database definition language which supports the creation and maintenance of the relational database and the management of data within that database. Further, it is also a database query language providing for the retrieval of data based on operations on tables. It also provides for aggregation queries (COUNT, SUM, AVG, etc.) and grouping queries (GROUP BY/JOIN). [16]
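As a small illustration of an aggregation and grouping query, the snippet below counts events per event type. The thesis stores its data in MySQL; the sqlite3 module from the Python standard library is used here only to keep the example self-contained, and the table and column names loosely mimic table 5.3 and are assumptions.

```python
import sqlite3

# A tiny in-memory table resembling the play-by-play events table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (game_id INTEGER, event_type TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, "faceoff"), (1, "hit"), (1, "hit"), (2, "faceoff"), (2, "goal")],
)

# Aggregation (COUNT) combined with grouping (GROUP BY).
for row in conn.execute(
    "SELECT event_type, COUNT(*) FROM events GROUP BY event_type"
):
    print(row)  # e.g. ('faceoff', 2), ('goal', 1), ('hit', 2)
```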

4.7 Cumulative distribution function

The cumulative distribution function (CDF) is a way of showing how a variable varies in a data set. It is defined by equation 4.4:

F_X(x) = P(X ≤ x), x ∈ R    (4.4)

For a given random variable X, the probability that X is less than or equal to a value x is calculated for every conceivable x, thus obtaining the function F_X(x). The probability that X is larger than a value x, P(X > x), is called the complementary cumulative distribution function (CCDF). [2]
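A minimal sketch of how an empirical CDF, and its complement the CCDF, can be computed from a data sample; the data values are arbitrary.

```python
def empirical_cdf(data):
    """Return (value, F(value)) pairs, where F(x) is the fraction of
    observations less than or equal to x."""
    xs = sorted(data)
    n = len(xs)
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]

data = [3.0, 1.0, 2.0, 2.0, 5.0]
for x, f in empirical_cdf(data):
    print(x, f, 1 - f)   # value, CDF F(x), CCDF P(X > x)
```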

4.8 Box plot

A box plot is another way to visually show the distribution of a data set, including the variation range R, also called the whisker line, the median x̃ and the 25%/75% quartiles x̃25 and x̃75. A box plot could look like the one in figure 4.3:


Figure 4.3: One box plot.

For example, if we have the ordered data set in (4.5), then R = xmax − xmin = 0.299,

2.328  2.404  2.476  2.504  2.506  2.541  2.616  2.618  2.627    (4.5)

x̃ = 2.506, x̃25 = 2.476 and x̃75 = 2.616. Box plots take up less space and are therefore particularly useful for comparing distributions between several groups or sets of data. [2]
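The sketch below recomputes the quantities of the example above in Python. Note that the 25%/75% values depend on the quartile convention; the linear-interpolation ("inclusive") method happens to reproduce the values stated here, while other conventions may differ slightly.

```python
import statistics

# The ordered data set from the example above (equation 4.5).
data = [2.328, 2.404, 2.476, 2.504, 2.506, 2.541, 2.616, 2.618, 2.627]

r = max(data) - min(data)                                  # variation range R
q1, med, q3 = statistics.quantiles(data, n=4, method="inclusive")
print(round(r, 3), q1, med, q3)                            # 0.299 2.476 2.506 2.616
```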

4.9 Exponentially weighted moving average

The exponentially weighted moving average (EWMA) is a statistic which gives less and less weight to data as it gets older and older.

ỹ_{t+1} = λ y_t + (1 − λ) ỹ_t,    (4.6)

where ỹ_{t+1} is the EWMA value at time t + 1, y_t is the observed value at time t, ỹ_t is the old EWMA value at time t, and λ is a constant (0 < λ < 1) that determines the weighting decrease. The smaller the value of λ, the greater the influence of the historical data. [7]
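A minimal Python sketch of the EWMA recursion in equation 4.6; the initialisation with the first observation and the example values are assumptions made only for illustration.

```python
def ewma(values, lam):
    """Exponentially weighted moving average: y~_{t+1} = lam*y_t + (1-lam)*y~_t.
    The series is initialised with the first observation (an assumption)."""
    smoothed = []
    y_tilde = values[0]
    for y in values:
        y_tilde = lam * y + (1 - lam) * y_tilde
        smoothed.append(y_tilde)
    return smoothed

print(ewma([1.0, 2.0, 3.0, 100.0, 3.0], lam=0.2))
# the spike at 100.0 is damped; a smaller lam would damp it even more
```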

5 Method

The methodology used in this thesis to gather the results will be described in four main subsections. The first three describe the method used by Routley and Schulte [19] to build the state space, learn the Q-values and gather individual player valuations, which is the method used in this thesis as well. The last describes the method to gather player pair valuations, which uses the individual player valuations from the first part.

5.1 Building the state space

The first step is to analyze the data and construct a state-space which is later used for value iteration.

Data format

Routley and Schulte use data from the 2007-2014 NHL regular seasons, which was collected by Sportlogiq©1 and stored in a relational database. This database was downloaded and used in this thesis with MySQL. The dataset is formed from play-by-play events, which are time-ordered events that happen during games and are recorded by the NHL, see table 5.3. One player from a team performs an event at a time. The events are categorized into action events and start/end events:

faceoff
missed shot
blocked shot
takeaway
giveaway
hit
goal
penalty

Table 5.1: Action events.

period start
period end
early intermission start
stoppage
shootout completed
game end
game off
early intermission end

Table 5.2: Start/end events.

1https://sportlogiq.com


game id | period | event nr | time | event type | event id
2014020513 | 1 | 1 | 00:00:00 | period start | 30634
2014020513 | 1 | 2 | 00:00:00 | faceoff | 850379
2014020513 | 1 | 3 | 00:00:30 | takeaway | 162045
2014020513 | 1 | 4 | 00:00:51 | hit | 504688
2014020513 | 1 | 5 | 00:01:24 | hit | 504689
2014020513 | 1 | 6 | 00:01:37 | shot | 282913
2014020513 | 1 | 7 | 00:01:37 | stoppage | 626266

Table 5.3: Some rows and attributes from the play-by-play events table stored in the MySQL database.

The action events in table 5.1 are performed by players, and the start/end events in table 5.2 mark the start or end of a play sequence. However, the action events goal and penalty can be of both types, since they also mark the end of a play sequence.

State Space

To be able to construct states from the play-by-play events that follow the rules of an MDP (see section 4.2) and can be applied to the hockey domain, Routley and Schulte introduce two concepts. The first is the concept of context features, which are properties of a game that are especially important and describe the dynamics of a hockey game most accurately. These features are goal differential gd, manpower differential md and period p, and they are calculated as follows:

• gd = nr of home goals - nr of away goals.

• md = nr of home players - nr of away players.

• p = current period of the game.

Each time any of these three context features changes, a new state is created. To follow the notion of an MDP, we also need actions that trigger state transitions. The actions are constructed from the action events defined in table 5.1 with the added dimensions of which team performed the action and which zone it occurred in. For example, faceoff(home, neutral) tells us that a face-off was won by the home team in the neutral zone. The start/end events from table 5.2 are also added to the state space, but only with the context features defined. Routley and Schulte then introduce the second concept, play sequences. A play sequence consists of one start event and one or more actions. If it also has an end event, it is a complete play sequence. A state is now a combination of context features and a unique (not necessarily complete) play sequence. This means that each time an action event is added that has not happened before in this context and sequence, a new state is created, see figure 5.1.


Figure 5.1: The final state space with context features and play sequences
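As a rough sketch of how such states can be represented in code, the snippet below keys a state by its context features together with the play sequence observed so far, and counts state occurrences. The field names and tuple layout are assumptions for illustration, not the thesis implementation.

```python
from collections import defaultdict

def state_key(gd, md, period, play_sequence):
    """A state is the combination of the context features and the play
    sequence observed so far (here a tuple of (event_type, team, zone))."""
    return (gd, md, period, tuple(play_sequence))

# Occurrence counts of states, built up while scanning play-by-play events.
occurrences = defaultdict(int)

play_sequence = [("faceoff", "home", "neutral")]
occurrences[state_key(0, 0, 1, play_sequence)] += 1

play_sequence.append(("shot", "home", "offensive"))
occurrences[state_key(0, 0, 1, play_sequence)] += 1

print(len(occurrences))  # 2 distinct states: one per unique play sequence
```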

The final piece to the MDP model is the transition function P(si, sj), see equation 5.1.

P(si, sj) = Occ(si, sj) / Occ(si)    (5.1)

Here, Occ(si) is the number of occurrences of state si in the database and Occ(si, sj) is the number of occurrences of state si being immediately followed by state sj in the database.
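A minimal sketch of how the transition probabilities of equation 5.1 can be estimated from the node and edge occurrence counts; the dictionary layout is an assumption, but the example numbers are taken from the sample rows in tables 5.4 and 5.5.

```python
def transition_probability(edge_occurrence, node_occurrence, s_i, s_j):
    """Estimate P(s_i, s_j) = Occ(s_i, s_j) / Occ(s_i) from occurrence counts,
    cf. equation 5.1. Both arguments are plain dictionaries here."""
    return edge_occurrence.get((s_i, s_j), 0) / node_occurrence[s_i]

# Counts taken from the sample rows in tables 5.4 and 5.5.
node_occurrence = {104: 105432, 105: 1120}
edge_occurrence = {(104, 105): 109}
print(transition_probability(edge_occurrence, node_occurrence, 104, 105))
# ≈ 0.00103, i.e., state 105 follows state 104 in about 0.1% of its occurrences
```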

Rewards

Routley and Schulte focus on two targets of interest to be learned: the next goal scored and the next penalty. In this paper, we only use the next-goal reward function, which is defined in equation 5.2.

R(s) =  1,   if s ends with a home goal
       −1,   if s ends with an away goal      (5.2)
        0,   otherwise

This means that only states with complete play sequences get a reward, because the action goal is an end event. Also, these states are absorbing states, meaning that there are no transitions to other states; see state s3 in figure 5.1.
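A small sketch of the reward function in equation 5.2, assuming a state exposes the last event of its play sequence; this state representation is an assumption made only for illustration.

```python
def reward(last_event):
    """Reward of equation 5.2: +1 for a state ending with a home goal, -1 for
    an away goal, 0 otherwise. `last_event` is assumed to be an
    (event_type, team) tuple, or None for incomplete play sequences."""
    if last_event and last_event[0] == "goal":
        return 1 if last_event[1] == "home" else -1
    return 0

print(reward(("goal", "home")))   # 1
print(reward(("goal", "away")))   # -1
print(reward(("shot", "home")))   # 0
```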

5.2 Learning the Q-values

Having defined the reward function for states, we now extend the MDP model from the previous section with the Q-function and solve it with value iteration.

State space graph

To be able to construct and manage the large state space described in the previous section, Routley and Schulte use a graph representation. This is done with the AD-tree structure, see section 4.5, which computes and stores the sufficient statistics for the state space. The nodes in the graph are states (play sequences) that are observed at least once, and state transitions become the edges. The process of building the state space graph is described in pseudo-code in algorithm 2. Some modifications are done to add more state transitions

and nodes to reflect the context-awareness. One of these modifications is the injection of a shot event before a goal event, see row 31 and row 95 in algorithm 2. Also, the function on row 79 reflects the context by comparing the features that define a state, see section 5.1. The result of running this algorithm on all the events in the NHL data is two MySQL tables describing the AD-tree structure. Small examples of these two tables can be seen in tables 5.4 and 5.5.

node id | type | gd | md | period | team | zone | player id | occurrence
104 | state | 0 | 1 | 1 | | | | 105432
105 | faceoff | 0 | 1 | 1 | away | defensive | 8467334 | 1120
106 | shot | 0 | 1 | 1 | home | offensive | 8476478 | 2011
107 | stoppage | 0 | 1 | 1 | | | | 1220

Table 5.4: Some rows from the node table, with its attributes being the context features, the player who performed the action and the occurrence count of that node.

from node id | to node id | occurrence
104 | 105 | 109
105 | 106 | 60
106 | 107 | 212
107 | 108 | 301

Table 5.5: Some rows from the edge table, with its attributes being the node ids and the occurrence count of that edge.

These tables are updated by the functions WriteNode, WriteEdge, IncrementNodeOccurrence and IncrementEdgeOccurrence in algorithm 2. The algorithm is the same as Routley and Schulte's Java implementation, with some minor code styling and structural differences; in this thesis, it was written in Python.


Algorithm 2 Build AD-Tree
1: currentNode ← Root
2: previousNode ← none
3: addedLeafLast ← false
4: function ADDEVENTSTOTREE(events)
5:   for event in Events do
6:     newNode ← CreateNode(event)
7:     if currentNode == Root then
8:       currentNode.Occurrence ← currentNode.Occurrence + 1
9:       stateNode ← CreateStateNode(event)
10:      nextNode ← currentNode.FindChild(stateNode)
11:      if nextNode == none then
12:        WriteNode(stateNode)
13:        currentNode.AddChild(stateNode)
14:        stateNode.parent ← currentNode
15:        nextNode ← stateNode
16:        WriteEdge(currentNode, nextNode)
17:      end if
18:      IncrementEdgeOccurrence(currentNode, nextNode)
19:      currentNode ← nextNode
20:      if addedLeafLast then
21:        nextNode ← previousNode.FindChild(currentNode)
22:        if nextNode == none then
23:          previousNode.AddChild(currentNode)
24:          WriteEdge(previousNode, currentNode)
25:        end if
26:        IncrementEdgeOccurrence(previousNode, currentNode)
27:        addedLeafLast ← false
28:      end if
29:      currentNode.Occurrence ← currentNode.Occurrence + 1
30:    end if
31:    if event.type == goal then
32:      event.type ← shot
33:      shotNode ← CreateNode(event)
34:      nextNode ← currentNode.FindChild(shotNode)
35:      if nextNode == none then
36:        WriteNode(shotNode)
37:        currentNode.AddChild(shotNode)
38:        shotNode.parent ← currentNode
39:        nextNode ← shotNode
40:        WriteEdge(currentNode, nextNode)
41:      end if
42:      IncrementEdgeOccurrence(currentNode, nextNode)
43:      previousNode ← currentNode
44:      currentNode ← nextNode
45:      currentNode.Occurrence ← currentNode.Occurrence + 1
46:    end if
47:    nextNode ← currentNode.FindChild(newNode)
48:    if nextNode == none then
49:      WriteNode(newNode)
50:      currentNode.AddChild(newNode)
51:      newNode.parent ← currentNode
52:      nextNode ← newNode
53:      WriteEdge(currentNode, nextNode)
54:    end if
55:    IncrementEdgeOccurrence(currentNode, nextNode)
56:    previousNode ← currentNode
57:    currentNode ← nextNode
58:    currentNode.Occurrence ← currentNode.Occurrence + 1
59:    if event == EndEvent then
60:      currentNode ← Root
61:      addedLeafLast ← true
62:    end if
63:  end for
64: end function


Algorithm 2 Build AD-Tree (continued)
65: function CREATENODE(event)
66:   return new Node
67:     type ← event.type
68:     gd ← event.gd
69:     md ← event.md
70:     p ← event.p
71:     zone ← event.zone
72:     team ← event.team
73: end function
74:
75: function CREATESTATENODE(event)
76:   return new Node
77:     type ← state
78:     gd ← event.gd
79:     md ← event.md
80:     p ← event.p
81: end function
82:
83: function FINDCHILD(self, node)
84:   for child in self.children do
85:     if child.CompareNode(node) then
86:       return child
87:     end if
88:   end for
89:   return none
90: end function
91:
92: function COMPARENODE(self, node)
93:   if self.type != node.type then
94:     return false
95:   end if
96:   if self.team != node.team or self.zone != node.zone then
97:     return false
98:   end if
99:   if self.gd != node.gd or self.md != node.md or self.p != node.p then
100:    return false
101:  end if
102:  return true
103: end function

Q-function

As described in section 4.4, a Q-function Q(s, a) gives the expected reward of taking action a in state s and then following a policy. Since the states in our state space also encode action histories, i.e., policies, learning the value of a state is equal to learning a Q-function. This can also be seen from equation 4.3, since all actions are encoded in the state and there is thus no need for the sum. This, in turn, means that the standard value iteration method can be used. [27]

Value iteration

The Q-function values are learned with the value iteration algorithm using the dynamic programming technique, see section 4.4. The algorithm in pseudo-code is presented in algorithm 3; the actual implementation is the same as Routley and Schulte's, but ours was written in C++. The states parameter is a combination of tables 5.4 and 5.5 from section 5.2 and stores the total number of states, the rewards and the links to next states. The reward function R(s), defined in equation 5.2, finds the current state in the node table 5.4 and returns the goal difference (gd). The transition function, defined in equation 5.1, finds the occurrence of the state and the occurrence of its successor state from the edge table 5.5.


Algorithm 3 Value iteration
1: function VALUEITERATION(states, m, c)
2:   lastValue ← 0
3:   Q ← list of length(states) initialized to zeroes
4:   for i ← 1 to m do
5:     for s in states do
6:       Q_{i+1}(s) ← R(s) + Σ_{s'∈S} P(s, s') × Q_i(s')
7:       currentValue ← currentValue + |Q_{i+1}(s)|
8:     end for
9:     if (currentValue − lastValue) / currentValue < c then
10:      break
11:    end if
12:    lastValue ← currentValue
13:  end for
14:  return Q*
15: end function

Since we have proper policies, due to the absorbing states, we can use γ = 1, see section 4.4. We use a relative convergence criterion c = 0.0001 and a maximum of m = 100000 iterations, which is the same as Routley and Schulte use. [19] Since we are in an on-policy setting, what is learned is based on the play of the whole NHL during 2007-2014.

5.3 Action impacts

Using the optimal action values stored in the resulting Q-table, we can now estimate individual players' contributions. Routley and Schulte introduce the notion of action impact values to measure what impact a player's action has towards achieving a goal. Let's say player p performs action ap in state s for team t, resulting in state s ∗ ap; the action impact value is then the difference between the resulting state's Q-value and the previous state's Q-value. The equation is formulated as:

impact(s, ap) = Qt(s ∗ ap) − Qt(s)    (5.3)

It measures the impact an action has towards scoring the next goal for the acting player's team, and it can be positive or negative.
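The impact calculation of equation 5.3 reduces to a Q-value lookup and a subtraction, as in the sketch below; the Q-values and state names are made up for illustration.

```python
def action_impact(q_values, state, next_state, team):
    """Action impact of equation 5.3: the change in team t's Q-value caused by
    the action that moves the game from `state` to `next_state`.
    `q_values[team][state]` is assumed to hold the learned Q-value."""
    return q_values[team][next_state] - q_values[team][state]

# Made-up Q-values for two states, for illustration only.
q_values = {"home": {"s": 0.12, "s*a": 0.19}}
print(round(action_impact(q_values, "s", "s*a", "home"), 2))  # 0.07
```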

5.4 Individual player evaluation

The action impact measure defined in equation 5.3 can also be viewed as the direct impact a player has when performing an action in a state. We introduce the notion of actions that happen when the player is on the ice but are not necessarily performed by the player, the indirect impact. This is because even though players do not perform the actions, they can still influence the course of the game, e.g., they could aid the other player in the pair by providing good passes or by leading an opponent away from the shooter, which is what we want to capture in this study. If Ap(pk) is the set of actions performed by player pk, which Routley and Schulte focus on, and Ai(pk) is the set of actions when player pk is on the ice, then Ap(pk) ⊆ Ai(pk). This can be formulated into our new impact measure for player pk:

impact(pk) = Σ_{a∈Ai(pk)} impact(s, a),    (5.4)

where Ai(pk) is the set of all actions when player pk is on the ice and impact(s, a) is the same as Routley and Schulte's equation 5.3. [11]
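A minimal sketch of equation 5.4, summing the impacts of all actions that occur while a given player is on the ice; the (players on ice, impact) event representation is an assumption for illustration.

```python
def player_impact(on_ice_impacts, player):
    """Equation 5.4: sum the impacts of all actions taken while `player` is on
    the ice. `on_ice_impacts` is assumed to be a list of
    (players_on_ice, impact_value) pairs, one per action."""
    return sum(impact for on_ice, impact in on_ice_impacts if player in on_ice)

events = [
    ({"p1", "p2", "p3"}, 0.05),   # p1 on the ice, positive impact
    ({"p1", "p4"}, -0.02),        # p1 on the ice, negative impact
    ({"p2", "p4"}, 0.03),         # p1 not on the ice, ignored
]
print(player_impact(events, "p1"))  # ≈ 0.03
```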


5.5 Player pair evaluation

The indirect impact from the previous section applies to player pairs as well, as we look at the actions when both players are on the ice. If we choose players pk and pl as our pair, then Ai(pk, pl) is the set of all actions when players pk and pl are both on the ice, and Ai(pk, pl) = Ai(pk) ∩ Ai(pl). [11] The impact measure for pairs then becomes:

pair_impact(pk, pl) = Σ_{a∈Ai(pk,pl)} impact(s, a)    (5.5)

The pairs were chosen by first generating all unique combinations of pairs and then keeping those on the same team with a joint time on ice (TOI) higher than one minute during a game. The threshold of a minimum of one minute was chosen because a lower threshold, or none at all, could result in many outlier pairs that are not of relevance. Many player pairs happen by accident, e.g., a player gets a minor injury and is replaced for a short period of time. This is also related to hockey in general being a fast-paced sport, see chapter 3. The impact measure defined in equation 5.5 can be used for data sets covering (parts of) games or (parts of) seasons. When the data set covers all games in a whole season, we call the pair_impact the total_pair_impact (TPI).
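A minimal sketch of equation 5.5 together with the one-minute joint-TOI filter described above; the data structures and the threshold parameterisation are assumptions made only for illustration.

```python
def pair_impact(on_ice_impacts, p_k, p_l):
    """Equation 5.5: sum the impacts of all actions taken while both players
    are on the ice together."""
    return sum(
        impact
        for on_ice, impact in on_ice_impacts
        if p_k in on_ice and p_l in on_ice
    )

def eligible_pairs(joint_toi_seconds, threshold=60):
    """Keep only pairs whose joint time on ice in a game exceeds the one-minute
    threshold used in the thesis. `joint_toi_seconds` maps pairs to seconds."""
    return [pair for pair, toi in joint_toi_seconds.items() if toi > threshold]

events = [({"p1", "p2"}, 0.05), ({"p1", "p3"}, 0.04), ({"p1", "p2"}, -0.01)]
print(pair_impact(events, "p1", "p2"))                         # ≈ 0.04
print(eligible_pairs({("p1", "p2"): 310, ("p1", "p3"): 45}))   # [('p1', 'p2')]
```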

6 Results and discussion

In this chapter, we first present the top pairings, which are based on the player pair indirect impact measure defined in section 5.5. Then we discuss the results for the top pairings and also add an extended TOI analysis.

6.1 Top pairings

The top pairing results are from the regular seasons 2011-2012 and 2013-2014 and are divided into forward, defense and mixed pairs, i.e., one forward and one defender. They are ranked based on the total_pair_impact (TPI) measure from equation 5.5. For each pair, the two players' respective goals (G), assists (A) and +/- scores are also shown. The total TOI the pair has played together is also displayed (in seconds).

Player 1 | G | A | +/- | Player 2 | G | A | +/- | Team | TPI | TOI
Ilya Kovalchuk | 37 | 46 | -9 | Zach Parise | 31 | 38 | -5 | NJD | 121.17 | 40,163
Ryan O'Reilly | 18 | 37 | -1 | Gabriel Landeskog | 22 | 30 | +20 | COL | 115.74 | 39,021
Joe Pavelski | 31 | 30 | +18 | Joe Thornton | 18 | 59 | +17 | SJS | 112.65 | 39,353
Steven Stamkos | 60 | 37 | +7 | Martin St. Louis | 25 | 49 | -3 | TBL | 111.77 | 35,941
Milan Michalek | 35 | 25 | +4 | Jason Spezza | 34 | 50 | +11 | OTT | 111.73 | 36,689

Table 6.1: Top forward pairs 2011-2012.

Player 1 | G | A | +/- | Player 2 | G | A | +/- | Team | TPI | TOI
Dan Girardi | 5 | 24 | +13 | Ryan McDonagh | 7 | 25 | +25 | NYR | 155.28 | 55,911
Filip Kuba | 6 | 26 | +26 | Erik Karlsson | 19 | 59 | +16 | OTT | 134.74 | 47,985
Francois Beauchemin | 8 | 14 | -14 | Cam Fowler | 5 | 24 | -28 | ANA | 125.54 | 45,795
Josh Gorges | 2 | 14 | +14 | P.K. Subban | 7 | 29 | +9 | MTL | 125.16 | 44,390
Carl Gunnarsson | 4 | 15 | -9 | | 12 | 32 | -10 | TOR | 123.06 | 36,181

Table 6.2: Top defense pairs 2011-2012.


Player 1 | G | A | +/- | Player 2 | G | A | +/- | Team | TPI | TOI
Jason Spezza | 34 | 50 | +11 | Erik Karlsson | 19 | 59 | +16 | OTT | 110.58 | 35,990
Joe Pavelski | 31 | 30 | +18 | Dan Boyle | 9 | 39 | +10 | SJS | 106.04 | 35,612
Joe Thornton | 18 | 59 | +17 | Dan Boyle | 9 | 39 | +10 | SJS | 102.96 | 35,160
Tomas Fleischmann | 27 | 34 | -7 | Brian Campbell | 4 | 49 | -9 | FLA | 98.08 | 31,804
Stephen Weiss | 20 | 27 | +5 | Brian Campbell | 4 | 49 | -9 | FLA | 96.79 | 32,995

Table 6.3: Top mixed pairs 2011-2012.

Player 1 | G | A | +/- | Player 2 | G | A | +/- | Team | TPI | TOI
James van Riemsdyk | 30 | 31 | -9 | Phil Kessel | 27 | 43 | -5 | TOR | 166.42 | 51,910
Alex Ovechkin | 21 | 38 | -35 | Nicklas Backstrom | 18 | 61 | -20 | WSH | 113.85 | 40,815
James van Riemsdyk | 30 | 31 | -9 | Tyler Bozak | 19 | 30 | +2 | TOR | 111.39 | 32,567
Phil Kessel | 27 | 43 | -5 | Tyler Bozak | 19 | 30 | +2 | TOR | 109.57 | 34,648
Chris Kunitz | 35 | 33 | +25 | Sidney Crosby | 36 | 68 | +18 | PIT | 107.04 | 36,296

Table 6.4: Top forward pairs 2013-2014.

Player 1 | G | A | +/- | Player 2 | G | A | +/- | Team | TPI | TOI
Duncan Keith | 6 | 55 | +22 | Brent Seabrook | 7 | 34 | 23 | CHI | 143.19 | 46,762
Dan Girardi | 5 | 19 | +6 | Ryan McDonagh | 14 | 29 | +11 | NYR | 129.03 | 50,102
Carl Gunnarsson | 3 | 14 | +12 | Dion Phaneuf | 8 | 23 | 31 | TOR | 124.39 | 41,219
Justin Faulk | 5 | 27 | -9 | Andrej Sekera | 11 | 33 | +4 | CAR | 121.08 | 42,966
Jan Hejda | 6 | 11 | +8 | | 9 | 30 | +5 | COL | 116.57 | 40,941

Table 6.5: Top defense pairs 2013-2014.

Player 1 | G | A | +/- | Player 2 | G | A | +/- | Team | TPI | TOI
Phil Kessel | 27 | 43 | -5 | Dion Phaneuf | 8 | 23 | 31 | TOR | 106.39 | 32,608
James van Riemsdyk | 30 | 31 | -9 | Dion Phaneuf | 8 | 23 | 31 | TOR | 100.82 | 30,847
Jason Spezza | 23 | 43 | -26 | Erik Karlsson | 20 | 54 | -15 | OTT | 92.91 | 31,898
Sean Couturier | 13 | 26 | +1 | Braydon Coburn | 5 | 12 | -6 | PHI | 90.43 | 26,893
Anze Kopitar | 29 | 41 | +34 | Drew Doughty | 10 | 27 | +17 | LAK | 88.74 | 33,209

Table 6.6: Top mixed pairs 2013-2014.

6.2 Top pairings analysis

The first thing to check is whether these players are indeed good players according to common statistics, i.e., points, which is the sum of goals and assists. Looking at the top players ranked on points in NHL's official statistics1 (ranking in parentheses), we find many of our top pairings' names. For the 2011-2012 results, we have Steven Stamkos (2nd), Jason Spezza (4th), Ilya Kovalchuk (5th), Erik Karlsson (11th) and Joe Thornton (14th). For the 2013-2014 results, we have Sidney Crosby (1st), Phil Kessel (6th), Alex Ovechkin (8th), Nicklas Backstrom (11th) and Erik Karlsson (14th). The second thing we look at is how well the pairs scored for their respective teams. Joe Pavelski (4th) and Joe Thornton (1st) were among the highest-scoring players for the Sharks in season 2011-2012. Another pair is Ryan O'Reilly (1st) and Gabriel Landeskog (3rd) for the Avalanche. For the 2013-2014 results, we notice the line James van Riemsdyk (2nd), Phil Kessel (1st) and Tyler Bozak (4th). Another important factor is how much TOI our pairs got, reflecting how relied upon they were by

1http://www.nhl.com/stats/skaters

their coaches. Looking at the forward pairs of 2011-2012, again Pavelski/Thornton and O'Reilly/Landeskog, but also Ilya Kovalchuk and Zach Parise, were in the top 5 in joint TOI. For the defensive/mixed pairs of 2011-2012, four out of five of the pairs were in the same top 5 of joint TOI [11].

6.3 TOI analysis

Analysing the relationship between our TPI measure and the TOI for the pairs is important because coaches typically give more ice time to their favourite choices to maximize the chance of success against the other team's lineup [11]. In the upcoming figures, TOI is converted to minutes for convenience rather than seconds as in the previous results. First, we analyse only the TOI data for the player pairs for all seasons 2007-2014. This is done with a CDF/CCDF plot, which is shown in figure 6.1:

Figure 6.1: CDF/CCDF of all seasons. CDF is plotted bottom left to right and CCDF top left to right (log10 scale).

As can be seen from figure 6.1, the distributions are overlapping, which suggests that the TOI result is invariant across seasons, and our chosen seasons 2011-2014 are no exceptions. Now we can analyse the relationship mentioned before. This is done with box plots of the TPI where the joint TOI is binned into time intervals [2^i, 2^(i+1)] during seasons 2011-2014. The box plots follow the definition in section 4.8, except that we use a 90% whisker line to compensate for outlier data that otherwise would have made the plots less readable. First is the box plot of all player pairs across seasons 2011-2012 and 2013-2014 in figure 6.2:


Figure 6.2: Box plots of seasons.

The seasons show very similar results and, as expected, the variation decreases as the pairs play more together. Comparing the individual box plot medians to the overall medians (0.194 and 0.190), we see that the highest medians are found around 16-256 minutes of playtime together. Also, the impact per minute decreases for some of our top pairs, e.g., Dan Girardi and Ryan McDonagh, who had the highest TPI during season 2011-2012 but a lower impact per minute than the overall median (0.167 < 0.194) with their 930 minutes of playtime together. This could be an indication that these pairs are relied upon more than is healthy for their performance. Another explanation could be that coaches rely on these pairs to play "tough minutes" against the other teams' top players. However, the overall high values for the significant numbers of minutes (16-256) played by pairs could suggest that the coaches do a good job of distributing the players. [11]


Figure 6.3: Box plots of pairs.

In the second box plot, in figure 6.3, we instead compare the different position combinations of the pairs. It is clear that the mixed pairs have a higher impact across joint TOI compared to both forward and defense pairings, which have similar impacts. This can be explained by good forwards being matched with good puck-moving defensemen, i.e., defensemen who play the puck up to the forwards and keep it in the offensive zone2. Jason Spezza and Erik Karlsson from the 2011-2012 mixed pairs results in section 6.1 are an example of such a pairing.

Another interesting analysis is whether the player pairs that spend the most time playing together produce better results than when they do not play together (i.e., with other players). This is done by plotting the pair impact per minute as a function of the fraction of time played together, i.e., 0 (0%) = no time together, 1 (100%) = all time played together. The results from seasons 2011-2014, together with subsets of player pairs with at least 30 and 300 minutes of TOI together, are plotted. We also use an exponentially weighted moving average (EWMA) with λ = 0.02 to smooth out the curves. Looking first at figure 6.4, it is hard to notice any clear signal in the noise despite the smoothing. The subsets of the pairs with over 30/300 minutes of play seem to lower their impact with an increasing fraction of time together. Plotting the relative impact, i.e., the ratio between the impact per minute when playing together and not playing together, also displays this trend (figure 6.5). This could again be the result of a matchup against the other team's top lines. It could also be an effect of fourth-line players taking their opportunities when on the ice with top-line players. [11]

2https://en.wikipedia.org/wiki/Defenceman


Figure 6.4: Impact per minute played.

Figure 6.5: Box plots of pairs.

7 Conclusion

In this thesis, we extend Routley and Schulte's method [19] of evaluating the performance of single players to player pairs. This includes choosing the pairs and extending the impact metrics to allow for action impacts when both players are on the ice, the total_pair_impact (TPI). The resulting top players from our method indeed show promising pairings, mainly because many of the names show up high in the NHL's official ranking based on points, both in general and for their respective teams. Also, some of the players placed high in joint TOI, which indicates that they are relied-upon players. The resulting top pairs are also those who perform well across multiple seasons. Using the extended analysis that investigates the relationship between our impact measure (TPI) and TOI, we find pairings that are perhaps relied upon more than is healthy for their performance, or that had to play more often against tougher opponents. Also, the mixed pair combination of forwards with defensemen has a higher impact than the other pairings, and players that play the most together often have a lower relative impact when playing together. These insights could be of relevance for coaches when analysing their choices of pairs. Continuing this thesis work, it would be interesting to look at action impact values and rewards for some of the other events, e.g., shots on goal, penalties, injuries or powerplays. Using other impact metrics than the on-ice (indirect) impact and comparing them, as Sans Fuentes et al. [21] did with direct vs. indirect impact, would also be interesting. An example of using the direct impact could be that only one of the players in the pair gets the credit. Another analysis would be to run our algorithm on subsets of the NHL data set, e.g., a subset containing only the games of one specific team, which would give information on the importance of actions for that team. Also, running it on a data set other than the NHL, e.g., the Swedish Hockey League (SHL) data set, and comparing the optimal play between them would be interesting.

I will end this thesis with a quote from a book which deals with sports analytics and statistics, “The key is to develop tools and habits so that you are more often looking for ideas and information in the right places - and in honing the skills to harness them into wins and losses once you found them.” [26]

Glossary

CCDF complementary cumulative distribution function.

CDF cumulative distribution function.

EWMA exponentially weighted moving average.

MDP Markov Decision Processes.

NHL National Hockey League.

SHL Swedish Hockey League.

SQL Structured Query Language.

TDD test-driven development.

TOI time on ice.

TPI total_pair_impact.

Bibliography

[1] Richard Bellman. "The theory of dynamic programming". In: Bulletin of the American Mathematical Society 60.6 (1954), pp. 503–515.
[2] Gunnar Blom, Jan Enger, Gunnar Englund, Jan Grandell, and Lars Holst. Sannolikhetsteori och statistikteori med tillämpningar. Studentlitteratur AB, 2017. ISBN: 9789144123561.
[3] Pratap Dangeti. Statistics for Machine Learning: Techniques for exploring supervised, unsupervised, and reinforcement learning models with Python and R. Packt Publishing, 2017. ISBN: 9781788295758.
[4] Robert B Gramacy, Shane T Jensen, and Matt Taddy. "Estimating player contribution in hockey with regularized logistic regression". In: Journal of Quantitative Analysis in Sports 9.1 (2013), pp. 97–111.
[5] Wei Gu, Krista Foster, Jennifer Shang, and Lirong Wei. "A game-predicting expert system using big data and machine learning". In: Expert Systems with Applications 130 (2019), pp. 293–305.
[6] Ronald A. Howard. Dynamic Programming and Markov Processes (Technology Press Research Monographs). The MIT Press, 1960. ISBN: 0262080095.
[7] J Stuart Hunter. "The exponentially weighted moving average". In: Journal of Quality Technology 18.4 (1986), pp. 203–210.
[8] Edward H Kaplan, Kevin Mongeon, and John T Ryan. "A Markov model for hockey: manpower differential and win probability added". In: INFOR: Information Systems and Operational Research 52.2 (2014), pp. 39–50.
[9] Michael L Littman. "Markov games as a framework for multi-agent reinforcement learning". In: Machine Learning Proceedings 1994. Elsevier, 1994, pp. 157–163.
[10] Guiliang Liu and Oliver Schulte. "Deep reinforcement learning in ice hockey for context-aware player evaluation". In: Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (2018), pp. 3442–3448.
[11] Dennis Ljung, Niklas Carlsson, and Patrick Lambrix. "Player Pairs Valuation in Ice Hockey". In: Machine Learning and Data Mining for Sports Analytics. Ed. by Ulf Brefeld, Jesse Davis, Jan Van Haaren, and Albrecht Zimmermann. Cham: Springer International Publishing, 2019, pp. 82–92. ISBN: 978-3-030-17274-9.
[12] Brian Macdonald. "A regression-based adjusted plus-minus statistic for NHL players". In: Journal of Quantitative Analysis in Sports 7.3 (2011).


[13] Brian Macdonald. "An improved adjusted plus-minus statistic for NHL players". In: Proceedings of the MIT Sloan Sports Analytics Conference. Vol. 3. 2011, pp. 1–8.
[14] Andrew Moore and Mary S Lee. "Cached sufficient statistics for efficient machine learning with large datasets". In: Journal of Artificial Intelligence Research 8 (1998), pp. 67–91.
[15] National Hockey League. National Hockey League Official rules 2018-2019. [Online; accessed 15-January-2019]. 2019. URL: http://www.nhl.com/nhl/en/v3/ext/rules/2018-2019-NHL-rulebook.pdf.
[16] Shamkant B Navathe and Ramez A Elmasri. Fundamentals of Database Systems. Pearson, 2016. ISBN: 0133970779.
[17] Timmy Lehmus Persson, Haris Kozlica, Niklas Carlsson, and Patrick Lambrix. "Prediction of tiers in the ranking of ice hockey players". In: Machine Learning and Data Mining for Sports Analytics. Ed. by Ulf Brefeld, Jesse Davis, Jan Van Haaren, and Albrecht Zimmermann. Communications in Computer and Information Science (2020).
[18] Stephen Pettigrew. "Assessing the offensive productivity of NHL players using in-game win probabilities". In: 9th Annual MIT Sloan Sports Analytics Conference. Vol. 2. 3. 2015, p. 8.
[19] Kurt Routley and Oliver Schulte. "A Markov game model for valuing player actions in ice hockey". In: Meila, M., Heskes, T. (eds.) Uncertainty in Artificial Intelligence (2015), pp. 782–791.
[20] Stuart J Russell and Peter Norvig. Artificial Intelligence: A Modern Approach. Pearson Education Limited, 2016.
[21] Carles Sans Fuentes, Niklas Carlsson, and Patrick Lambrix. "Player impact measures for scoring in ice hockey". In: MathSport International Conference, Athens, 1-3 July 2019. Athens University of Economics and Business. 2019, pp. 307–317.
[22] Michael Schuckers and James Curro. "Total Hockey Rating (THoR): A comprehensive statistical rating of National Hockey League forwards and defensemen based upon all on-ice events". In: 7th Annual MIT Sloan Sports Analytics Conference. 2013.
[23] Oliver Schulte, Mahmoud Khademi, Sajjad Gholami, Zeyu Zhao, Mehrsan Javan, and Philippe Desaulniers. "A Markov Game model for valuing actions, locations, and team performance in ice hockey". In: Data Mining and Knowledge Discovery 31.6 (2017), pp. 1735–1757.
[24] Oliver Schulte, Zeyu Zhao, Mehrsan Javan, and Philippe Desaulniers. "Apples-to-apples: Clustering and ranking NHL players using location information and scoring impact". In: Proceedings of the MIT Sloan Sports Analytics Conference. 2017.
[25] Robert P Schumaker, Osama K Solieman, and Hsinchun Chen. "Sports data mining: The field". In: Sports Data Mining. Springer, 2010, pp. 1–13.
[26] Nate Silver. The Signal and the Noise: The Art and Science of Prediction. Penguin UK, 2012.
[27] Richard S Sutton and Andrew G Barto. Reinforcement Learning: An Introduction. Vol. 1. 1. MIT Press, Cambridge, 1998.
[28] AC Thomas, Samuel L Ventura, Shane T Jensen, and Stephen Ma. "Competing process hazard function models for player ratings in ice hockey". In: The Annals of Applied Statistics (2013), pp. 1497–1524.
