AI Methodology for Automated Selection of Playing XI in IPL Cricket
International Journal of Engineering Technology Science and Research (IJETSR), www.ijetsr.com, ISSN 2394 – 3386, Volume 4, Issue 6, June 2017

C. Deep Prakash, Assistant System Engineer Trainee, TCS, CTO, NOIDA, India
C. Patvardhan, Dayalbagh Educational Institute, Agra, India
C. Vasantha Lakshmi, Dayalbagh Educational Institute, Agra, India

ABSTRACT

T20 cricket has revolutionized competitive cricket, with fans finding the shortest format ideal for an exciting evening. The Indian Premier League, whose 10th edition was completed in May 2017, is a case in point. The stakes in IPL tournaments are huge. Enormous sums of money are spent by franchises to acquire the best talent to represent them. This brings up the problem of selecting the best playing XI to benefit from the investment. A completely automated and objective procedure based on comprehensive analytics of performance data using state-of-the-art AI techniques is presented in this paper. The approach is validated on data from IPL 9. It is shown that in a high proportion of cases, i.e. 73.3% of cases, the team whose playing XI more closely matches the one selected by the proposed methodology wins. Thus the proposed approach is eminently suitable and can be gainfully utilized by team management for automated solution of this complex problem. The methodology is completely objective and free of bias.

Keywords

Machine Learning, Team Selection, Clustering, IPL, ReliefF

INTRODUCTION

The first T20 World Cup was organised in South Africa in 2007. India won the tournament after defeating Pakistan by 5 runs in a high-voltage final. This prompted the Board of Control for Cricket in India (BCCI) [1] to introduce a T20 league, named the Indian Premier League (IPL) [2], in 2008. In the first season, eight of the largest cities in India (Bangalore, Chennai, Delhi, Hyderabad, Jaipur, Kolkata, Mohali, Mumbai) were taken and eight franchises (Royal Challengers Bangalore, Chennai Super Kings, Delhi Daredevils, Deccan Chargers, Rajasthan Royals, Kolkata Knight Riders, Kings XI Punjab, Mumbai Indians) were assigned to them. The franchises selected their squads according to the rules of the IPL through competitive bidding from a pool of Indian and foreign players selected by BCCI. BCCI has been organizing the IPL T20 cricket tournament every year since then. Ten IPL tournaments have been held to date, the latest having been completed in May 2017.

The use of analytical and statistical modeling in every aspect of cricket, such as batting, bowling, fielding, team selection, result prediction, player ranking, team ranking and target revision in a rain-affected match, is very important. Analytics are popular because Indian fans are also followers of statistical records. Analysis of IPL data thus becomes more important.

Some data analytics studies related to cricket reported in the literature are as follows. Deep Prakash et al. presented the Deep Performance Index for ranking batsmen and bowlers [3]. Barr et al. used a weighted combination of average and strike rate for performance evaluation of both batsmen and bowlers who played in the first T20 World Cup and, based upon their ranking, presented the World XI [4]. Deep Prakash et al. presented a team selection methodology based on heuristics and the Random Forests algorithm for IPL season 9 [5] and presented another approach using a memetic genetic algorithm for selecting the best playing XI for each team in IPL season 9 [6]. However, no attempt was made to validate the approach. Amit Kumar and Sindhu [7] use a variety of detailed metrics to analyze batting performance in IPL.
The metrics are designed to reflect the impact of a player's performance on the match outcome. However, the detailed data necessary to perform the task are not easily available. Dey and Ghosh [8] employ an MCDM approach for evaluating bowlers' performance in IPL. In this approach the number of features is very small and almost 25% of the importance is assigned to the number of matches played, the number of overs bowled and the number of wickets taken. Thus players who have played more matches may get an undue benefit over young talented players.

There are many performance evaluation metrics but they are based on very limited considerations. There is a need for a more comprehensive metric which incorporates a larger set of cricketing attributes of the players. This is the motivation for developing an analytical framework of detailed features such that all cricketing attributes can be taken into account while evaluating the performance of the players. In this work a complete analysis of IPL season 9 is done. The data has been considered in three parts as follows.
(i) Overall T20 International career data of the player up to IPL season 9,
(ii) Previous year's international T20 data, in order to take the player's current form into account [9], and
(iii) IPL career data up to season 9 [10].
38 features are calculated for batsmen, out of which 22 features are extracted from their career data, 8 features from their previous year's data and 8 features from their IPL career data. Similarly, 37 features are calculated for bowlers, out of which 22 features are extracted from their career data, 8 features from their previous year's data and 7 features from their IPL career data. This is the first work in which such a comprehensive set of features has been considered. After extraction of these features, clustering of players is done on the basis of similarity between players. A new integrated and comprehensive performance index called the Cluster Based Index (CBI) is developed and computed. A team selection strategy is developed that uses information from these clusters and the CBI values and identifies the best playing eleven for each team.

The rest of the paper is organized as follows. In section II, details of features and their extraction are described. In section III, the clustering approach is described. In section IV, the new Cluster-based Performance Index (CBI) and the proposed team selection methodology are given. In section V, results are described. In section VI, conclusions and future work are given.

FEATURES EXTRACTION FOR BATSMEN AND BOWLERS

For every batsman who played in IPL season 9, 38 features are calculated from the respective datasets (Career, Previous Year, IPL). For every bowler who played in IPL season 9, 37 features are calculated from the respective datasets (Career, Previous Year, IPL). The details of the features and their mathematical definitions are given for batsmen and bowlers in tables 1 and 2 respectively.

CLUSTERING OF BATSMEN AND BOWLERS

The team needs to be balanced in terms of availability of players for different roles like openers, middle order batsmen, finishers, fast bowlers, spinners, wicketkeeper, captain etc. A team lacking in one particular type of player would find it difficult to win. A clustering based solution is proposed for the team selection problem. Players are clustered so that players with similar abilities and prospective roles can be compared against each other while selecting the best playing XI. Each player has a different role in the team and clustering gives relevant information.

K-Means Clustering [11] is used to cluster the batsmen and bowlers into different clusters. K-Means Clustering is an unsupervised learning algorithm [12] in Data Mining [13]. All the batsmen are categorized into K clusters, in which each batsman belongs to the cluster with the nearest mean. Let the feature vectors of the players be P(1), P(2), ..., P(m). In order to group them into K clusters, the training data consists of the feature vectors P(i), where i = 1, ..., m (m being the number of players). Since it is an unsupervised learning algorithm, there is no target variable. The goal of the K-Means Clustering algorithm is to calculate K centroids (one for each cluster) and assign a cluster C(i) to each player P(i). The Elbow Method [14] is used to determine the number of clusters K. In the Elbow Method the percentage of variance is examined as a function of the number of clusters K, and K is chosen such that the addition of another cluster doesn't give much better modeling of the data. The number of clusters for batsmen is determined using the Elbow Method in Figure 1.

Fig. 1. Determining number of clusters using Elbow method

Using K-means clustering on batsmen, 6 clusters of sizes 28, 22, 9, 8, 17 and 11 players are obtained.

The features for batsmen and their corresponding weights, according to their respective cluster, are given in table 3. In cluster 1, the ability of the batsman to hit boundaries is the most prominent feature, and the ability to maintain a good strike rate on Asian pitches as well as other continental pitches while maintaining consistency and getting big scores is also important. For cluster 2 players, strike rate matters most, along with their ability to stay not out, their consistency in winning matches as well as in IPL, and their experience. Ability to hit boundaries is the most prominent feature for players in cluster 3, and their ability to remain not out, their ability to rotate the strike and their performance in winning matches are also important. For cluster 4 batsmen, experience matters the most and along with that their consistency in the target chasing matches and in the winning matches and their strike
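The clustering step described in this section (assigning each feature vector P(i) to the cluster C(i) with the nearest mean, then reading K off the explained-variance curve) can be illustrated with a minimal pure-Python sketch. It is only an illustration under stated assumptions, not the authors' implementation: the farthest-point initialisation, the 0.05 gain threshold in `choose_k`, all function names and the toy two-feature data are hypothetical, and the paper itself works with the 38-feature (batsmen) and 37-feature (bowlers) vectors and reads the elbow from Figure 1.

```python
# Minimal K-means + Elbow Method sketch (pure Python, no external libraries).
# Assumptions NOT taken from the paper: farthest-point initialisation,
# the 0.05 gain threshold, all function names, and the toy 2-D data.

def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(points):
    """Component-wise mean of a non-empty list of feature vectors."""
    return [sum(column) / len(points) for column in zip(*points)]


def initial_centroids(points, k):
    """Deterministic farthest-point initialisation (an assumption of this sketch)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(
            points, key=lambda p: min(squared_distance(p, c) for c in centroids)
        )
        centroids.append(list(farthest))
    return centroids


def k_means(points, k, max_iterations=100):
    """Return (centroids, labels); labels[i] plays the role of C(i) for P(i)."""
    centroids = initial_centroids(points, k)
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each player goes to the cluster with the nearest mean.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = mean_vector(members)
    return centroids, labels


def explained_variance(points, centroids, labels):
    """Fraction of total variance explained by the clustering (1 - within/total)."""
    overall = mean_vector(points)
    total = sum(squared_distance(p, overall) for p in points)
    within = sum(
        squared_distance(p, centroids[label]) for p, label in zip(points, labels)
    )
    return 1.0 - within / total


def elbow_curve(points, max_k):
    """Explained variance for K = 1 .. max_k."""
    curve = {}
    for k in range(1, max_k + 1):
        centroids, labels = k_means(points, k)
        curve[k] = explained_variance(points, centroids, labels)
    return curve


def choose_k(curve, min_gain=0.05):
    """Smallest K for which adding one more cluster gains less than min_gain."""
    ks = sorted(curve)
    for k, next_k in zip(ks, ks[1:]):
        if curve[next_k] - curve[k] < min_gain:
            return k
    return ks[-1]


# Hypothetical toy data: 12 "players" with 2 features forming 3 obvious groups.
toy_players = [
    [0, 0], [1, 0], [0, 1], [1, 1],
    [10, 10], [11, 10], [10, 11], [11, 11],
    [20, 0], [21, 0], [20, 1], [21, 1],
]

curve = elbow_curve(toy_players, max_k=5)
best_k = choose_k(curve)
centroids, labels = k_means(toy_players, best_k)
cluster_sizes = sorted(labels.count(c) for c in range(best_k))
print(best_k, cluster_sizes)
```

On this toy data the curve rises sharply up to K = 3 and flattens afterwards, which is the "elbow" the method looks for. A numeric threshold is used here only to make the choice reproducible in code. With real feature vectors, features on very different scales would usually be standardised before computing distances, a step the text above does not discuss.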