KABADDI: from an Intuitive to an Quantitative Approach for Analysis, Predictions and Strategy∗†
Total Page:16
File Type:pdf, Size:1020Kb
KABADDI: From an intuitive to an quantitative approach for analysis, predictions and strategy∗† Manojkumar Parmar Bangalore, Karnataka, India [email protected] ABSTRACT CCS CONCEPTS Kabaddi is a contact team sport of Indian-origin. It is a highly strate- • Applied computing → Computers in other domains; • In- gic game and generates a significant amount of data due to its rules. formation systems → Data analytics; • Computing method- However, data generated from kabaddi tournaments has so far been ologies → Machine learning approaches; • Human-centered unused, and coaches and players rely heavily on intuitions to make computing → Information visualization; decisions and craft strategies. This paper provides a quantitative approach to the game of kabaddi. The research derives outlook from KEYWORDS an analysis performed on data from the 3rd Standard-style Kabaddi Kabaddi, Sports Analytics, Predictive Model, Team Profiling, Visu- World Cup 2016, organised by the International Kabaddi Federation. alisation, Hypotheses, Model selection The dataset, which consists of 66 entries over 31 variables from 33 matches, was manually curated. This paper discusses and provides ACM Reference Format: a quantitative perspective on traditional strategies and conceptions Manojkumar Parmar. 2018. KABADDI: From an intuitive to an quantitative related to the game of kabaddi such as attack and defence strate- approach for analysis, predictions and strategy. In Proceedings of . , 8 pages. gies. Multiple hypotheses are built and validated using studentâĂŹs https://doi.org/ t-test. This paper further provides a quantitative approach to pro- file an entire tournament to gain a general understanding ofthe 1 INTRODUCTION strengths of various teams. Additionally, team-specific profiling, Kabaddi is a team contact sport which has originated in India and through hypotheses testing and visualisation, is presented to gain has travelled to other countries in the region. Kabaddi’s uniqueness a deeper understanding of teamâĂŹs behaviour and performance. is that an entire team defends against a single attacking player This paper also provides multiple models to forecast the winner. from the opposition team. Appendix A provides further details The model-building includes automatic feature selection techniques about game of kabaddi. A majority of Indians either knows or plays and variable importance analysis techniques. Generalised linear kabaddi [28]. However, the use of technology to better the game is model with and without an elastic net, recursive partitioning and non-existent so far. Commercialisation of technology has led to the regression tree, conditional inference tree, random forest, support penetration of the same into kabaddi as well [1]. Kabaddi produces vector machine (linear and radial) and neural network-based mod- data which is under-utilised today, and in the best form used for els are built and presented. Ensemble models use generalised linear showing descriptive statistics. Traditionally, intuitive judgements model and random forest model techniques as ensemble method drove decision-making in kabaddi. This research aims at converting to combine outcome of a generalised linear model with the elastic "what we think" to "what we know", an approach famously pre- net, random forest, and neural network-based models. The research scribed by Sam Allardyce [8]. The research investigates established discusses the comparison between models and their performance claims and strategies of kabaddi and validates them using hypothe- parameters. Research also suggests that ensemble technique is not sis testing methodology. This validation starts at the tournament able to boost up accuracy. Models achieve 91:67% − 100% accuracy level, and team level granularity is introduced as a next step. Based on cross-validation dataset and 78:57% − 100% on test set. Results on the proven foundation, this research then ventures into pre- presented can be used to design in-game real-time winning predic- dictive modelling of the game outcome using supervised learning tions to improve decision-making. Results presented can be used method. Ensembling technique is used to make predictive models to design agent and environments to train artificial intelligence via robust, though results are not promising. Results from predictive reinforced learning model. models are highly satisfactory with 100% prediction accuracies demonstrated. Literature survey, research method, dataset preparation, descrip- ∗ The manuscript is a post-print in nature. The author is a single author and author tive statistics, hypotheses validation, model-building and discussion has agreed for the submission of paper. †This paper was presented in conference 5th International conference on Business of results, form the seven sections of the paper. Literature survey Analytics & Intelligence (link) held at IIM Bangalore, India during 11-13, December discusses the state-of-the-art for kabaddi sports analytics. Research 2017. In this conference only book of abstracts is published. method and dataset preparation discuss the data generation and ‡Author can be reached on stated email, over twitter using @mparmar47, and over phone using +91-9980866747. curation along with research method used. Descriptive statistics sec- tion provides a detailed understanding of dataset via numbers and visualisations. Hypothesis validation investigates common strate- ,, gies and claims of kabaddi. Model building forms the core of re- 2018. search and provides a detailed discussion of predictive models and ,, Manojkumar Parmar their accuracy. Discussion of results comprises of the prospects of analytics. Proprietary sports statistics may not help sports analyt- research findings along with the possible applications. ics movement in general [13] and this is another motivation for publishing research on kabaddi sports analytics. 2 LITERATURE SURVEY 3 RESEARCH METHOD Sports diversity is growing in India, and so is the monetary value at- The objective of this research is to build a model to predict the tached to each sport [15]. India is witnessing the rise of professional outcome of the game while validating the established claims and sports, banking on the success of various leagues [31]. Sports is all strategies. Research methodology consists of curating the dataset about decision-making on the field and off the field, considering and performing a variety of analytics on it to provide insights. multiple parameters. Kabaddi, as a sport, is not different in this re- The dataset is processed to produce a descriptive statistical sum- spect. Kabaddi can benefit from analytics as it produces a variety of mary along with visualisation to improve the understanding of data at the team level and individual player level. Devenport [8], in the dataset. Multiple hypotheses are validated using parametric his comprehensive article, recommends three types of analytics for hypothesis test. The predictive model building includes deriving sports - players’ injury and health analytics to predict their fitness multiple models and later choosing them based on performance and readiness for a game, business analytics to leverage business parameters. The research method applied here is the most common aspects of a game, and player and game performance analytics to method employed by analytics practitioners. help predict individual and game outcomes. Player’s injury and health indicators are measured meticulously, and various studies leverage this information. The research by De, 4 DATASET PREPARATION Dasgupta, Panda, and Bhattacharya [9] is one of the early efforts Data and dataset preparation is critical as it forms the core of any to test the physical efficiency of a male player and relate it to their analytics exercise. Dataset is prepared by collecting raw informa- health. The research by Khanna, Majumdar, Malik, Vrinda, and tion and then applying processing techniques to make the data Mandal [19] is another study to measure physiological responses of relevant for research. For this research, data is consolidated by man- players during the game to deepen understanding of player health. ually scraping through the website of 3rd Standard-style Kabaddi Additionally, specific injuries like knee injuries [11] of players and World Cup tournament. The post-processed dataset consists of 66 most common injuries [22] are subjects of research. These stud- entries over 31 variables from 33 matches. Curated dataset along ies also provide probable causes for injuries. There is a need to with its codebook, which describes variables and its properties, bring fragmented and piecemeal research on players’ injury and are published on Kaggle platform [25]. Appendix B provides more health together to consolidate the learnings and present a holistic information about dataset and codebook. This curated dataset has health and injury analytics for kabaddi players, coaches, and team received ’featured’ status on Kaggle platform as it is the first dataset managers. Business analytics, which includes fan programmes and published on kabaddi. Variables of the dataset represent the attack engagement, dynamic ticket pricing, and marketing optimisation, is and defence points acquired by teams as per standard kabaddi rules not into the exclusive focus of researchers. The research by Sanjeev like tackle points, raid points, bonus points, allout points, touch and Ankur [31] on the topic of "Constituents of Successful Sports points, total points and other points. Variables representing