CRICTRS: EMBEDDINGS BASED STATISTICAL AND SEMI SUPERVISED CRICKET TEAM RECOMMENDATION SYSTEM

Prazwal Chhabra, Rizwan Ali and Vikram Pudi
International Institute of Information Technology, Hyderabad, India

ABSTRACT

Team recommendation has always been a challenging aspect of team sports. Such systems aim to recommend the player combination best suited against the opposition players, resulting in an optimal outcome. In this paper, we propose a semi-supervised statistical approach to build a team recommendation system for cricket by modelling players as embeddings. To build these embeddings, we design a qualitative and quantitative rating system which also considers the strength of the opposition when evaluating a player's performance. The resulting embeddings describe the strengths and weaknesses of the players based on their past performances. We also address a critical aspect of team composition: the number of batsmen and bowlers in the team. The team composition changes over time, depending on factors which are hard to predict, so we take it as input from the user and use the player embeddings to decide the best possible team combination for the given composition.

KEYWORDS

Cricket Analytics, Data Mining and Data Analytics

1. INTRODUCTION

The advent of statistical modelling has contributed significantly to the success of teams and players in different sports. Different methods have been developed to evaluate player performances in different sports, but team sports pose a different challenge, as it is difficult to compare two players of the same nature and to select a suitable team against a given opposition. For example, in cricket [1], a team sport which is discrete in nature, comparing two players of the same nature (a batsman with another batsman, or a bowler with another bowler), whether in the same team or in different teams, is a complex task.
Often, players are compared on quantitative aspects such as high scores, wickets taken and career averages (number of runs scored or conceded per dismissal), and teams are decided on these alone. The quantitative factors provide insights but miss some important aspects:

Quality of Runs Scored: Two players who played against different oppositions (which are ranked differently) and performed similarly will have similar statistics. In this case, the player who scored against the better opposition should be rated higher.

Quality of Dismissals: Dismissals of batsmen with higher career averages should be rated higher than dismissals of batsmen with lower career averages.

This paper keeps these two aspects in mind and builds a rating system called 'Quality Index of Player' ($\phi_{\text{Player}}$), which includes qualitative and quantitative aspects of player performance. Later, using $\phi_{\text{Player}}$, we represent players as embeddings to build a "Semi-Supervised Team Recommendation System". The resulting embeddings describe the strengths and weaknesses of the players based on their past performances.

David C. Wyld et al. (Eds): MLNLP, BDIoT, ITCCMA, CSITY, DTMN, AIFZ, SIGPRO - 2020 pp. 67-77, 2020. CS & IT - CSCP 2020 DOI: 10.5121/csit.2020.101207
Computer Science & Information Technology (CS & IT)

When designing a recommender system, the overall complexity and the set of parameters to be considered are major factors; a system with high complexity is of little use when results are needed instantaneously. If all possible valid team combinations from a pool of players are considered, the complexity is still polynomial, but very high. Proper selection of parameters, along with considering orderings that follow some constraints, makes instantaneous results feasible and also allows in-match use when a match is not going as predicted.
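To give a feel for the size of this search space, the following minimal sketch counts the candidate teams for a few pool sizes. The function names and the example pool sizes are hypothetical and are not taken from the paper; the second function only illustrates how fixing the number of batsmen and bowlers can shrink the count.

```python
from math import comb

def unconstrained_teams(pool_size: int, team_size: int = 11) -> int:
    """Number of ways to pick a team of `team_size` from `pool_size` players."""
    return comb(pool_size, team_size)

def constrained_teams(n_batsmen: int, n_bowlers: int,
                      pick_batsmen: int, pick_bowlers: int) -> int:
    """Number of teams when the composition (batsmen/bowlers) is fixed.

    Assumption for illustration: every player in the pool is either a
    batsman or a bowler, and the two groups are chosen independently.
    """
    return comb(n_batsmen, pick_batsmen) * comb(n_bowlers, pick_bowlers)

if __name__ == "__main__":
    for pool in (15, 20, 30):
        print(pool, unconstrained_teams(pool))
    # Hypothetical squad: 8 batsmen and 7 bowlers, playing 6 and 5 of them.
    print(constrained_teams(8, 7, 6, 5))
```

For a fixed team size of 11 the count grows as a degree-11 polynomial in the pool size, which is consistent with the "polynomial, but very high" remark above.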
The method, although proposed for cricket, can also be extended to other sports with some modifications.

2. RELATED WORK

In the literature, player rating methods such as those of A. Ramalingam [2], MG Jhawar et al. [3], S. Akhtar et al. [4] and Margaret I. Johnston et al. [5] are quantitative in nature and give high weight to batsmen with higher batting averages (runs scored per wicket) and bowlers with lower bowling averages (runs conceded per wicket). Some studies use graphical representations to compare players (A. Kimber [6] proposed a graphical method to compare bowlers). Q. Zhou et al. [7] explain how team recommendations should work, considering the aspects of expanding teams and substituting team members. L. Li, H. Tong et al. [8] also explain team member replacement, considering skill and structure matching.

The existing work on "Team Recommendation for Cricket" mainly ranks players using statistical measures or techniques such as clustering. Prakash, C. Deep [9] rank players using a clustering algorithm based on different batting and bowling parameters. In S.B. Jayanth et al. [10], K-Means and an SVM with an RBF kernel were used to recommend teams for the 2011 World Cup. F. Ahmed et al. [11] maximize the overall batting and bowling strength of a team by formulating a multi-objective optimization problem; the NSGA-II algorithm was used to optimize both strengths and to find the team members.

3. PROPOSED SOLUTION

In this paper, we propose a qualitative and quantitative rating mechanism called 'Quality Index of Player' ($\phi_{\text{Player}}$), which is used to build player embeddings. CRICTRS, a semi-supervised team recommendation system, uses the player embeddings and recommends a team based on the opponent's strengths and weaknesses. For each player in the opposition, the system exploits that player's weaknesses and finds a similar player in our team to recommend against them. This process is repeated for every player in the opposition.
For representational purposes, a bipartite graph can be used, with the opposition on one side and our players on the other.

3.1. Dataset

The Cricsheet dataset [12] contains data for over 1400 international ODI matches played between 2005 and 2019. For each match, ball-by-ball data is available with the following features: 'Inning', 'Over', 'Team Batting', 'Batsman', 'Non-Striker', 'Bowler', 'Runs-Scored', 'Extras', 'Wicket' and 'Dismissed Batsman'. Along with this, details such as the competing teams, date of match, venue of match, and match and toss results are also available. Cricinfo [13] was used to validate the information for each match.

3.2. Player Rating System for Cricket

Improving upon existing methods, we build a method that considers the quality of runs and wickets while rating players. A brief overview of CRICTRS is shown in Fig. 1.

3.2.1. Modelling a Match

We take the idea from [2] and model each delivery as a Bernoulli trial. The two possible outcomes of each Bernoulli trial (a delivery) are 'r' runs scored or a wicket, where 'r' is defined as the average runs scored by a batsman per ball. To evaluate a batsman's individual performance in a team, we evaluate the performance of a team that contains 11 replicas of the same player and calculate the expected score of that team with 10 wickets in hand. Thus, a team with 11 replicas of the batsman will, on average, score 300*r runs in a 50-over (300-ball) ODI match if it does not lose any wicket. If the team loses 'w' wickets, where w<10, it will score (300-w)*r runs in that match, since the w deliveries on which wickets fall yield no runs. In the case of an all-out, when the team loses all its wickets, the average runs scored will be (b-10)*r, where b is the number of balls the team faced in that match. Similarly, a bowler can be evaluated by taking 'r' as the average runs conceded per ball and evaluating the expected runs conceded by a team of 11 replicas of the bowler.
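The replica-team model described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, and the per-delivery dismissal probability `q` is passed in as an input. The first function reproduces the three score cases from the text; the second computes the expected score over all innings outcomes (all-out and not all-out together) with a small dynamic program over the number of wickets lost.

```python
def innings_runs(r: float, balls_faced: int, wickets: int) -> float:
    """Score of the replica team: every delivery that is not a wicket yields r runs.

    Covers the three cases in the text: 300*r (no wicket), (300-w)*r (w < 10)
    and (b-10)*r (all out after b balls).
    """
    return (balls_faced - wickets) * r

def expected_runs(r: float, q: float, balls: int = 300, max_wickets: int = 10) -> float:
    """Expected score when each delivery is a wicket with probability q.

    The innings stops after `balls` deliveries or `max_wickets` wickets.
    `q` is an assumed input of this sketch.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be a probability")
    # alive[k] = probability that the innings is still in progress with k wickets lost
    alive = [0.0] * max_wickets
    alive[0] = 1.0
    total = 0.0
    for _ in range(balls):
        p_bowled = sum(alive)              # probability this delivery is bowled at all
        total += p_bowled * (1.0 - q) * r  # it yields r runs unless it is a wicket
        nxt = [0.0] * max_wickets
        for k, mass in enumerate(alive):
            nxt[k] += mass * (1.0 - q)     # no wicket: wicket count unchanged
            if k + 1 < max_wickets:
                nxt[k + 1] += mass * q     # wicket falls, innings continues
            # otherwise the last wicket has fallen and the innings ends
        alive = nxt
    return total

if __name__ == "__main__":
    print(innings_runs(0.9, 300, 4))    # team loses 4 wickets in 300 balls
    print(expected_runs(0.9, 0.02))     # hypothetical r and q
```

Writing the score as (balls faced - wickets) * r makes the three cases in the text instances of one expression.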
Using the above, the expected outcome of a match can be written as:

$$r = \frac{\text{runs}}{\text{ball}}, \qquad avg = \frac{\text{runs}}{\text{wicket}}$$

$$P(\text{dismissal}) = 1 - p = \frac{1}{\text{balls\_per\_wicket}} = \frac{r}{avg}$$

$$E(\text{runs}) = E_{\text{all out}}(\text{runs}) + E_{\text{not all out}}(\text{runs})$$

The last equality for P(dismissal) holds because the number of balls per wicket is avg/r.

3.2.2. Quality of Runs and Dismissals

The approach in [2] is completely quantitative and misses the important aspect of the quality of the opposition. We replace the quantity metrics 'r' and 'avg' used in [2] with quality- and quantity-based metrics, which are re-evaluated as follows:

$C_{\text{batsman}}$ = Career Average of Batsman
$C_{\text{bowler}}$ = Career Average of Bowler

Quality Metrics of Batsman

$$\text{Quality of Dismissal } (\phi_{\text{Dismissal}}) = \frac{C_{\text{batsman}}}{C_{\text{bowler}}}$$

$$\text{Quality of Run Scored } (\phi_{\text{run}}) = \frac{C_{\text{batsman}}}{C_{\text{bowler}}}$$

Quality Metrics of Bowler

$$\text{Quality of Dismissal } (\phi_{\text{Dismissal}}) = \frac{C_{\text{batsman}}}{C_{\text{bowler}}}$$

$$\text{Total runs conceded} = \text{Runs Conceded} + \text{Extras}$$

$$\text{Quality of Run Scored } (\phi_{\text{run}}) = \frac{C_{\text{bowler}}}{C_{\text{batsman}}}$$

With this method we can consider some important aspects of the match that are difficult to capture otherwise. These include:

1) Dismissals of top-order batsmen matter more. Since top-order batsmen usually have higher career averages, a bowler who takes the wickets of in-form, high-average batsmen is rewarded more than a bowler who takes the wickets of tail-enders.

2) Extras were completely ignored by all previous metrics. If a bowler bowls more extras, he might be under pressure, so extras are also an important metric when considering bowlers. A bowler who gives away more extras poses a greater risk to the team by giving away runs.

After re-evaluating the 'r' and 'avg' of players using the above, we finally calculate the player rating, represented by the 'Quality Index of Player' ($\phi_{\text{Player}}$), which is evaluated as:

$$\phi_{\text{Player}} = \frac{E(\text{runs}) - \phi_{avg}}{\sigma_{\text{runs}}}$$

On evaluating the results, we observe the following:

1) In [2], spinners, and in general bowlers who bowled in the middle overs of the innings, had higher ratings than bowlers who bowled in the powerplay and death overs, but our method regularises this, as shown by the above examples. In the above table, Harbhajan Singh and Yuvraj Singh had significantly higher ratings than the others under [2], but our method regularises the ratings, and no such disparity remains.