Using Reinforcement Learning to Evaluate Player Pair Performance in Ice Hockey
Linköping University | Department of Computer and Information Science
Master thesis, 30 ECTS | Computer Science
2021 | LIU-IDA/LITH-EX-A--2021/014--SE

Dennis Ljung

Supervisor: Niklas Carlsson
Examiner: Patrick Lambrix

Linköpings universitet, SE–581 83 Linköping, +46 13 28 10 00, www.liu.se

Copyright

The publishers will keep this document online on the Internet – or its possible replacement – for a period of 25 years starting from the date of publication barring exceptional circumstances. The online availability of the document implies permanent permission for anyone to read, to download, or to print out single copies for his/her own use and to use it unchanged for non-commercial research and educational purposes. Subsequent transfers of copyright cannot revoke this permission.
All other uses of the document are conditional upon the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility. According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement. For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its www home page: http://www.ep.liu.se/.

© Dennis Ljung

Abstract

A recent study using reinforcement learning with a Q-function to quantify the impact of individual player actions in ice hockey has shown promising results. The model takes into account the context of the actions and captures internal dynamic features of the play which simple common metrics, e.g., counting goals or assists, do not. It also performs look-ahead, which is important in a low-scoring game like ice hockey. However, it does not capture the chemistry between the players, i.e., how well the players play together, which is important in a team sport like ice hockey. In this thesis, we therefore extend this earlier work on individual player performance with new metrics for the impact of player pairs when they are on the ice together. Our resulting top pairings are compared to the NHL's official statistics, and an extended analysis investigates the relationship with time on ice, which provides insights that could be of relevance to coaches.

Acknowledgments

First I would like to thank my partner Emelie Stengård, who is the source of my motivation to finish this thesis. I also thank Carles Sans Fuentes, whose ambition gave me new inspiration and an excellent thesis to reference. Last but certainly not least, I thank my supervisor Niklas Carlsson and examiner Patrick Lambrix, who provided invaluable guidance during this thesis and continued to do so after my initial due date.
I would also like to thank Patrick again for taking me to my first ice hockey game as an adult.

Contents

Abstract
Acknowledgments
Contents
List of Figures
List of Tables
1 Introduction
1.1 Research questions
1.2 Delimitations
1.3 Research method
1.4 Outline
2 Related work
3 Background
3.1 Teams
3.2 Playing area
3.3 Game rules
4 Theory
4.1 Reinforcement Learning
4.2 Markov Decision Processes
4.3 Markov Game Model
4.4 Solutions to MDPs
4.5 AD-Tree
4.6 Relational database
4.7 Cumulative distribution function
4.8 Box plot
4.9 Exponentially weighted moving average
5 Method
5.1 Building the state space
5.2 Learning the Q-values
5.3 Action impacts
5.4 Individual player evaluation
5.5 Player pair evaluation
6 Results and discussion
6.1 Top pairings
6.2 Top pairings analysis
6.3 TOI analysis
7 Conclusion
Bibliography

List of Figures

3.1 A typical layout of a hockey rink. Each zone is divided by the blue lines. The middle being the neutral zone and the left/right being the defensive/offensive zones. The dotted line marks the centre of the rink and the blue circle on it marks the starting face-off spot. The other four red circles mark the other face-off spots. The goalposts are located on the centre of the red lines in the defensive/offensive zones.
4.1 A simple Markov process with two states and transition functions between them.
4.2 A small example of an AD-tree where each entry has 3 binary attributes.
4.3 One box plot.
5.1 The final state space with context features and play sequences.
6.1 CDF/CCDF of all seasons. CDF is plotted bottom left to right and CCDF top left to right (log10 scale).
6.2 Box plots of seasons.
6.3 Box plots of pairs.
6.4 Impact per minute played.
6.5 Box plots of pairs.

List of Tables

5.1 Action events.
5.2 Start/end events.
5.3 Some rows and attributes from the play-by-play events table stored in the MySQL database.
5.4 Some rows from the node table, whose attributes are the context features, the player who performed the action, and the occurrence of that node.
5.5 Some rows from the edge table, whose attributes are the node ids and the occurrence of that edge.
6.1 Top forward pairs 2011-2012.
6.2 Top defense pairs 2011-2012.
6.3 Top mixed pairs 2011-2012.
6.4 Top forward pairs 2013-2014.
6.5 Top defense pairs 2013-2014.
6.6 Top mixed pairs 2013-2014.

1 Introduction

There exist many studies on evaluating individual player performance in ice hockey, but not on the performance of player pairs. Ice hockey is a team sport, and it is therefore important for coaches to identify which players work well together and which do not (e.g., when selecting players for next season's games). Another problem with common metrics for player performance, e.g., counting goals, is that they are too simple and do not capture the internal dynamics of the play.

Recent works [22] [19] have shown promising results by summing the impact of player actions using data mining techniques. The work by Routley and Schulte [19] uses a Markov game model and reinforcement learning techniques to learn an action-value Q-function that takes into account the context of an action. This is important because actions could have different impacts depending on the current situation of the game. The model Routley and Schulte use also performs look-ahead on the consequences of actions, which is important in a low-scoring game like ice hockey.

Data mining techniques have the potential to be generalized to new situations and to provide better aid to human domain experts (coaches, managers, and scouts) than statistics alone would. They could also be used independently to make decisions without input from human experts [25].

In this thesis we extend Routley and Schulte's work [19] with added measures for evaluating player pairs.

1.1 Research questions

To meet the objective of this study, the following research questions will guide the investigation and be summarized in the conclusion:

1. Can the method used by Routley and Schulte, with our extension of player pair validation, provide good pair matching?
2. Can it be used to provide other insights that could be of relevance for coaches?

1.2 Delimitations

The calculations will be performed on a laptop with a 2.90 GHz Intel Core i7 processor and 32 GB of memory.

1.3 Research method

First, a literature study was conducted to find a suitable method for evaluating players and working with the play-by-play event-driven hockey data. This was also done to get a better understanding of the theory used in the field and what could be extended upon. Routley and Schulte's work with a Markov game model and reinforcement learning [19] was finally chosen for several reasons. It was fairly clear how we could extend the method to also include player pair evaluation, and the method in their report was for the most part easy to understand and follow. They had also included links to the hockey data, the results from running their algorithm, and (most of) the code described in the report. This made it easy to develop and test our implementation and compare our results to theirs.
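
As a rough illustration of what extending a per-player impact sum to player pairs could look like, the following Python sketch aggregates action impacts first per player and then per pair of teammates who were on the ice together. It is only a sketch under stated assumptions: the event tuples, the impact numbers, and the rule that a pair is credited whenever one of its two players performs the action are all hypothetical and are not taken from the thesis or from Routley and Schulte's code.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical event records: (acting player, teammates on ice including the
# actor, impact of the action). The impact values are made-up placeholders
# standing in for Q-value-based action impacts.
events = [
    ("A", {"A", "B", "C"}, 0.25),
    ("B", {"A", "B"}, 0.5),
    ("C", {"B", "C"}, -0.125),
    ("A", {"A", "C"}, 0.125),
]


def player_totals(events):
    """Sum the impact of every action over the player who performed it."""
    totals = defaultdict(float)
    for actor, _on_ice, impact in events:
        totals[actor] += impact
    return dict(totals)


def pair_totals(events):
    """Sum action impacts per pair of teammates who were on the ice together.

    Assumption (for illustration only): a pair is credited with an action
    when one of its two players performed it while both were on the ice.
    """
    totals = defaultdict(float)
    for actor, on_ice, impact in events:
        for pair in combinations(sorted(on_ice), 2):
            if actor in pair:
                totals[pair] += impact
    return dict(totals)


if __name__ == "__main__":
    print(player_totals(events))
    print(pair_totals(events))
```

Sorting the on-ice set before forming combinations gives every pair a single canonical key, so ("A", "B") and ("B", "A") are never counted separately.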