Quantitative Assessment of Player Performance and Winner Prediction in ODI Cricket

Quantitative Assessment of Player Performance and Winner Prediction in ODI Cricket Thesis submitted in partial fulfillment of the requirements for the degree of Masters of Science by Research in Computer Science and Engineering by Madan Gopal Jhanwar 201202018 [email protected] International Institute of Information Technology Hyderabad - 500032, INDIA JULY 2017 Copyright c Madan Gopal Jhanwar, 2017 All Rights Reserved To my Father, for his unconditional love and support iv Acknowledgements IIIT-Hyderabad has been a family to me for the last 5 years. During this journey, I have met several people each of whom have taught me a new thing every day. As I submit my thesis today, I feel indebted to a lot of people around me without whom my stay at IIIT would not have been such a great memory. First of all, I would like to thank my guide, Dr. Vikram Pudi, who has been an excellent mentor throughout my research journey. He not only gave me freedom to explore my passions but also supported me when I showed interest in cricket analytics. His guidance not only helped me in my research but also made me a better person. I don’t have enough words to thank my father, mother and my family, who have shown immense faith and trust in me. They have given me continuous encouragement throughout my years of study. I couldn’t have accomplished it without the constant support from many of my friends. Saksham, thanks for motivating me in the early days of my research and helping me in getting started. Arjit, I have always seen a great friend in you. Thanks for going beyond the limits to always help me. Our discussions on cricket have given me many insights into the game. Remya, thanks for being around and always helping and motivating me. Nirat, thank you for putting in all the time and efforts. I have learnt the value of friendship and sharing from you guys. Arihant, Abhishek, Junaid, Shipra, Somya, Kalpit, Mohit, Shubham, Sarfaraz, Nazrul, Sanatan, Rishab and Shashank, a special thanks to you all. Thank you for motivating me all along and mak- ing the college life a lively one. I would also like to thank all my lab mates – Jaspreet, Tejas, Vamshi, Saket and Soumyajit, for the wonderful interactions and fun work sessions. Last, but not the least, thanks to IIIT community for giving me an inspiring environment and loads of opportunities to grow. v Abstract The statistics of professional sports, including players and teams, provide numerous opportunities for research. Cricket is one of the most popular team sports, with billions of fans all over the world. In this thesis, we address two problems related to the One Day International (ODI) format of the game. First, we propose a novel method to predict the winner of ODI cricket matches using a team-composition based approach at the start of the match. Second, we present a method to quantitatively assess the performances of individual players in a match of ODI cricket which incorporates the game situations under which the players performed. The player performances are further used to predict the player of the match award. Players are the fundamental unit of a team. Players of one team work against the players of the opponent team in order to win a match. The strengths and abilities of the players of a team play a key role in deciding the outcome of a match. However, a team changes its composition depending on the match conditions, venue, and opponent team, etc. Therefore, we propose a novel dynamic approach which takes into account the varying strengths of the individual players and reflects the changes in player combinations over time. Our work suggests that the relative team strength between the competing teams forms a distinctive feature for predicting the winner. Modeling the team strength boils down to modeling individual players’ batting and bowling performances, forming the basis of our approach. We use career statistics as well as the recent performances of a player to model him. Using the relative strength of one team versus the other, along with two player-independent features, namely, the toss outcome and the venue of the match, we evaluate multiple supervised machine learning algorithms to predict the winner of the match. We show that, for our approach, the k-Nearest Neighbor (kNN) algorithm yields better results as compared to other classifiers. Players have multiple roles in a game of cricket, predominantly as batsmen and bowlers. Over the generations, statistics such as batting and bowling averages, and strike and economy rates have been used to judge the performance of individual players. These measures, however, do not take into consideration the context of the game in which a player performed across the course of a match. Further, these types of statistics are incapable of comparing the performance of players across different roles. Therefore, we present an approach to quantitatively assess the performances of individual players in a single match of ODI cricket. We have developed a new measure, called the Work Index, which represents the amount of work that is yet to be done by a team to achieve its target. Our approach incorporates game situations and the team strengths to measure the player contributions. This not only helps us in vi vii evaluating the individual performances, but also enables us to compare players within and across various roles on a common scale. Using the player performances in a match, we predict the player of the match award for the ODI matches played between 2006 and 2016. We have achieved an accuracy of 86.80% for the top-3 positions in predicting the player of the match award, which is superior to previous works and other baseline models. This further proved the validity of our approach. Contents Chapter Page 1 Introduction :::::::::::::::::::::::::::::::::::::::::: 1 1.1 History of Cricket . .1 1.2 The Game of Cricket . .2 1.3 Aim and Contributions . .6 1.4 Thesis Workflow . .7 2 Related Work ::::::::::::::::::::::::::::::::::::::::: 8 2.1 Brief History of Cricket Analytics . .8 2.2 Literature on Winner Prediction . 10 2.3 Literature on Player Modeling . 11 2.4 Literature on Assessing Player Performances . 13 3 Winner Prediction: A Team Composition Based Approach :::::::::::::::::: 15 3.1 Motivation . 15 3.2 Dataset . 16 3.3 Problem Formulation . 18 3.4 Methodology . 20 3.4.1 Modeling Batsmen . 20 3.4.1.1 The Algorithm . 22 3.4.2 Modeling Bowlers . 22 3.4.2.1 The Algorithm . 24 3.4.3 Modeling Teams . 25 3.4.4 Feature Construction . 26 3.5 Experiments and Results . 26 3.5.1 Modeling Batsmen . 27 3.5.1.1 First Match . 28 3.5.1.2 Second Match . 28 3.5.1.3 Third Match . 29 3.5.2 Modeling Bowlers . 30 3.5.2.1 First Match . 30 3.5.2.2 Second Match . 31 3.5.3 Modeling Teams . 31 3.5.4 Prediction Model . 32 3.6 Discussions . 34 viii CONTENTS ix 4 Honest Mirror: Quantitative Assessment of Player Performances ::::::::::::::: 35 4.1 Introduction . 35 4.2 Dataset . 36 4.3 Methodology . 37 4.3.1 Target estimation for first innings . 37 4.3.2 Modeling Players . 39 4.3.2.1 Modeling Teams . 39 4.3.3 Work Index . 40 4.3.4 Assessing player performances . 42 4.4 Applications . 42 4.4.1 Player of the Match . 42 4.4.2 Player Rankings . 44 4.5 Experiments and Results . 44 4.5.1 Target estimation for first innings . 44 4.5.2 Player of the match . 45 4.5.3 Work Index . 47 4.5.4 Ranking Players . 48 4.6 Discussions . 48 5 Conclusions and Future Work ::::::::::::::::::::::::::::::::: 51 5.1 Winner Prediction . 51 5.2 Honest Mirror . 51 List of Figures Figure Page 1.1 A cricket ground and some of the common fielding positions. Source: Cricket-Australia 3 1.2 Dimensions of a cricket pitch and the related terminologies. Source: Australia-Cricket- News ............................................4 3.1 The average number of player changes per match for all the teams in the years 2010- 2014. We can see that on average, a team changes around 2 players per match. Thus, about a fifth of the team changes per match . 16 3.2 The total number of distinct players played for each team in the years 2010-2014. The high number of distinct players (35-50) in a team of 11, shows that the mentioned factors (Section 3.1) are prominent . 17 3.3 Total number of training samples for each team . 18 3.4 Total number of test samples for each team . 19 3.5 Figure depics the overall work-flow of our approach. We model the players into batsmen and bowlers using their career and recent performance statistics, which in-turn helps in modeling teams. Using toss outcome and venue details, along with the relative team strength as the features, we use supervised models to predict the winner of the match. 21 3.6 Figure depicts the accuracy for different supervised models. As it can be seen, kNN, with k=4, yields best results for our approach to predict the winner of a match. 32 3.7 F-Score comparison for different models. 33 3.8 Figure shows the accuracy of our model for different countries. It depicts that some countries like New Zealand depend heavily on their front-line players, while other countries like South Africa do not. 34 4.1 Average defendable scores in different countries. 38 4.2 Total number of wickets fell in each over in our dataset.

Load more