Assessing Decision Rules for Stopped Cricket Games
Ahsan Bhatti, B.Sc., A.Stat

A project submitted to the School of Graduate Studies in Partial Fulfilment of the Requirements for the Degree of Master of Science

McMaster University
© 2015 Ahsan Bhatti

Master of Science (2015), Statistics
McMaster University, Hamilton, Ontario

TITLE: Assessing Decision Rules for Stopped Cricket Games
AUTHOR: Ahsan Bhatti
SUPERVISOR: Professor Ben Bolker
NUMBER OF PAGES: ix, 75

Abstract

In interrupted limited-overs cricket games, teams do not always get the chance to complete the full game. In such cases, different methods can be used to decide which team wins. The method currently in use, the Duckworth-Lewis method, was introduced at the international level in 1998. The purpose of this project is to assess the accuracy of the Duckworth-Lewis method and to compare it statistically with other proposed methods. Accuracy is assessed via bias estimation, Cohen's kappa and root mean square error (RMSE), and cross-validation is used to assess out-of-sample accuracy. The resource table summarizes the expected fraction of the total runs that remain to be scored from a given point in the game. It has missing values and monotonicity flaws. To improve it, statistical methods such as isotonic regression and Gibbs sampling are used to construct alternative resource surfaces. The results show that the Duckworth-Lewis method is the least accurate of all the methods compared. The improved Duckworth-Lewis method, by contrast, is more accurate at predicting the revised target or the result of stopped games. Thus, the accuracy of the Duckworth-Lewis method can be improved by ignoring older games and constructing the resource surface from modern data.

Acknowledgements

First of all, I would like to express my appreciation to my supervisor, Ben, for being a supportive, generous and compassionate person.
It was a wonderful experience to work under his supervision. I would also like to sincerely thank Dr. Viveros and Dr. Feng for serving on my committee. Special thanks to my family and friends for being supportive at all times; I appreciate it more than you can imagine.

Contents

1 Introduction
  1.1 History and Formats of Cricket
  1.2 Project Motivation
  1.3 Cricket Rules and Old Methods
    1.3.1 Average Run Rate
    1.3.2 Most Productive Overs
    1.3.3 Discounted Most Productive Overs
    1.3.4 PARAB
    1.3.5 Clark Curves
    1.3.6 Duckworth-Lewis (D/L) Method
2 Constructing Resource Table and Surfaces
  2.1 Data Collection
  2.2 New Resource Tables
    2.2.1 Mean of Ratios (R)
    2.2.2 Optimization via D/L Method (Improved D/L Method)
    2.2.3 Isotonic Regression
    2.2.4 Gibbs Sampling
  2.3 Comparison of Resource Tables
    2.3.1 R vs. Duckworth-Lewis
    2.3.2 Isotonic Regression vs. Duckworth-Lewis
    2.3.3 Gibbs Sampling vs. Duckworth-Lewis
    2.3.4 Improved Duckworth-Lewis vs. Isotonic Regression
    2.3.5 Improved Duckworth-Lewis vs. Gibbs Sampling
    2.3.6 Improved Duckworth-Lewis vs. Duckworth-Lewis
3 Accuracy Check
  3.1 Cohen's Kappa
  3.2 Root Mean Square Error
  3.3 Stoppage Distribution
4 Discussion
A Basic Rules of Cricket
  A.1 Ways to dismiss a batsman
  A.2 Ways to score runs
B Clark Curves
C Difference between T20I and 50 overs game
D Scraping Code
E Innings 2 Resource Table and Resource Surfaces for 50 overs games
F 20 over Resource Tables
  F.1 First Innings Tables
  F.2 Second Innings Tables
    F.2.1 Isotonic Regression
    F.2.2 Optimization on R

List of Figures

1.1 Runs Achievable vs Overs Remaining using PARAB Method
1.2 Plot of Duckworth-Lewis resource table for the average number of runs scorable with wickets lost and overs remaining
2.1 Heatmap (levelplot) of a Resource Table
2.2 Heatmap (levelplot) of Standard Deviation of a Resource Table
2.3 Heatmap (levelplot) of difference of Resource Table of both innings
2.4 Heatmap (levelplot) of Optimized R Resource Surface via Duckworth-Lewis Method
2.5 Heatmap (levelplot) of Isotonic Regression Resource Surface
2.6 Heatmap (levelplot) of Gibbs Sampling Resource Surface
2.7 Comparison of R Resource Table and Duckworth-Lewis Resource Surface
2.8 Comparison of Isotonic Regression and Duckworth-Lewis Resources
2.9 Comparison of Gibbs Sampling and Duckworth-Lewis Resources
2.10 Comparison of Isotonic Regression and Improved Duckworth-Lewis Resource Surfaces
2.11 Comparison of Gibbs Sampling and Improved Duckworth-Lewis Resource Surfaces
2.12 Comparison of Improved Duckworth-Lewis and Duckworth-Lewis Resource Tables
3.1 Kappa values along with the CI for different methods for 50 overs
3.2 Out of sample (unweighted) Kappa values along with the CI for different methods for all overs combined
3.3 RMSE values along with the CI for different methods for 50 overs
3.4 Out of sample (unweighted) RMSE values along with the CI for different methods for all overs combined
3.5 Bias values for different methods for 50 overs
3.6 Stoppage probability for all 50 overs
3.7 Average stoppage distribution for 50 overs
4.1 Heatmap (levelplot) of difference between D/L and improved Duckworth-Lewis Resource Surfaces
4.2 Difference between Low, Mid and High scoring runs Resource Tables
4.3 Out of sample RMSE values along with the CI for different methods for all overs combined
4.4 Kappa estimates along with the CI for different methods for 20 overs games
4.5 RMSE values along with the CI for different methods for 20 overs games
B.1 CLARK Curves
B.2 Stoppage Type 2
B.3 Stoppage Type 3
B.4 Stoppage Type 5
C.1 Resources available in T20I with wickets lost and overs remaining
F.1 Heatmap (levelplot) of R Resource Table for Innings 2
F.2 Heatmap (levelplot) of Isotonic Regression Resource Table for Innings 2
F.3 Heatmap (levelplot) of Optimized R Resource Table for Innings 2

List of Tables

3.1 Average stoppage distribution for 50 overs
3.2 Duckworth-Lewis Resource Table for 50 overs
3.3 R Resource Table using 50 over games
3.4 Optimized R Resource Surface using 50 over games
3.5 Isotonic Regression Resource Surface using 50 over games
3.6 Gibbs Sampling Resource Surface using 50 over games
C.1 Duckworth-Lewis Resource Table
E.1 R Resource Table using 50 over games for Innings 2
E.2 Standard Deviation of R Resource Table using 50 over games for Innings 1
E.3 Isotonic Regression Resource Table using 50 over games for Innings 2
E.4 Optimized R Resource Table using 50 over games for Innings 2
E.5 Gibbs Sampling Resource Table using 50 over games for Innings 2
F.1 R Resource Table
F.2 Isotonic Regression Resource Table
F.3 Optimized R Resource Table
F.4 Gibbs Sampling Resource Table
F.5 R Resource Table for Innings 2
F.6 Isotonic Regression Resource Table for Innings 2
F.7 Optimized R Resource Table for Innings 2

Chapter 1
Introduction

1.1 History and Formats of Cricket

Cricket is one of the most entertaining sports in the world and, after soccer, the second most watched. The game originated in England; the earliest evidence of cricket being played dates back to 1598 in Surrey [1, 3, 16]. This claim is supported by the testimony of an older man who had played cricket with his friends instead of attending church. Cricket became popular, however, after the Restoration of 1660, when gamblers began to take an interest in the game.
Through English colonization, the game was introduced to North America in the 17th century, and it had spread throughout the rest of the world by the 18th century [3]. Interestingly, the first international game was played in 1844 between Canada and the United States of America [1]. Three different formats of cricket are currently played at the international level. The first format to be introduced, ‘Test cricket’, can last up to five days. Many cricketers regard this format as the soul of cricket. Unlike any other sport, it combines tactical, technical, physical and mental elements in a single game [4]. Players (mainly batsmen) can also take their time to settle in, since there is no time-pressure constraint. Every game day, play starts at 9:30 in the.