Review to the Duckworth-Lewis Method Using Data Mining Techniques
Imperial Journal of Interdisciplinary Research (IJIR) Vol-2, Issue-5, 2016 ISSN: 2454-1362, http://www.onlinejournal.in

Rohan Brahme1*, Roshan Birar2, Poonam Kadnar3 & Prof. Suruchi Malao4
1,2,3,4Department of Computer Engineering, K. K. Wagh Institute of Engg. Education & Research, Savitribai Phule Pune University, India

Abstract- The Duckworth-Lewis system is a mathematical formulation used to set a target score for cricket matches interrupted by bad weather. The Duckworth-Lewis (D/L) method considers only two factors when providing an updated target: the number of runs that can be scored in the remainder of the innings is treated as a function of the number of overs remaining and the number of wickets in hand. We use the WEKA tool to find bias in the current D/L system and to illustrate it. The D/L system has been observed to be biased towards the team batting first and the team winning the toss, in scenarios such as multiple interruptions in the same match and the fall of wickets in the death overs while batting second. In this context, bias is defined as taking advantage of the properties of systems such as the D/L method. We also show that taking advantage of the system in this way permits prediction of the match winner at a rate better than chance. Using this analysis, we propose a modification to the existing D/L system that considers the patterns observed in the dataset as an additional resource, alongside the existing resources, to reduce the bias when predicting the target score.

Keywords- Cricket, Duckworth-Lewis, WEKA, C4.5, Decision Trees.

I. INTRODUCTION

A. The game of Cricket
As mentioned in [1], cricket is a bat-and-ball team sport that originated in England and is one of the most popular games in the world. Moving on from conventional Test cricket, it has gradually ventured into limited-overs formats such as ODI and T20 so that a definite result is obtained, making it more entertaining as a spectator sport. Sometimes, due to constraints such as bad weather (rain, sandstorms and bad light), floodlight failure or crowd issues, a certain number of overs are lost and a definite result is not obtained. To overcome these obstacles, methods have been devised to revise target scores and/or declare a winner.

B. Duckworth-Lewis Method
Two British statisticians, Frank Duckworth and Tony Lewis, developed the Duckworth-Lewis (D/L) method, a statistical method used to set the target score of the team batting second in a limited-overs game that is interrupted by unavoidable circumstances. The D/L method, a system based on a mathematical model, considers only two resources: wickets left and overs remaining. When overs are lost, setting an adjusted target is not as simple as reducing the batting team's target proportionally, because a team batting second with wickets in hand can be expected to play more aggressively than one with the full 50 overs and hence can achieve a higher run rate. Duckworth & Lewis (1998) therefore considered the most common situation, where two teams play a full-length game.

The above graph [1] shows the percentage of resources remaining for a team against the number of overs bowled. It is an exponential curve that drops as more wickets fall, approaching zero by the time the 9th wicket falls. Duckworth and Lewis observed a close connection between the availability of these resources and a team's final score, which the algorithm tries to exploit.

In the table, the remaining overs are plotted against wickets lost.

Resource Percentage Table

The above table [3] gives the percentage of resources left. The percentages are calculated beforehand from the overs left and the wickets lost and are stored in the table so that they are at hand when calculating the revised target. This is the table that is consulted whenever the D/L method is applied.

However, factors such as the toss may play a crucial role in deciding the winner, since a lot of speculation and research goes into the decision to bat or field first after winning the toss. For example, the pitch report, the previous history of the ground and the expected weather conditions all inform the decision. In rain-affected matches, batting first is the advantageous decision. After rain, the pitch becomes soft, the outfield becomes slow and the ball bounces unevenly, making it difficult to bat, as mentioned in [2].

C. Duckworth-Lewis Model
The objective of the D/L system was to find a method that meets the criteria given below.
1. It must maintain exact fairness to both sides.
2. It must give an appropriate result in all possible situations.
3. Team 1's scoring pattern should not affect the revised target for team 2 in an interrupted game.
Multiple interruptions during the same match and the fall of wickets in the death overs lead to unfair target prediction. Data mining should therefore be used to reduce such bias in these specific conditions.

D. Controversial D/L method decided matches [2]
Some actual scenarios from ODIs that highlight the shortcomings of the D/L method:

1. South Africa vs. New Zealand, Durban, November 2000
New Zealand batted first and were 81 for 5 after 27.2 overs when rain reduced the game to 49 overs per side. Then, with New Zealand on 114 for 5 in 32.4 overs, their innings was stopped due to rain, and the second innings was shortened to 32 overs. South Africa's new target according to the D/L charts was 153, but the modified version suggests that the target would have been 156. At the time the game was interrupted, New Zealand's run rate was 3.48, with five wickets down and 17.2 overs to spare. According to D/L's modified calculations, South Africa's required run rate would be 4.87.

2. West Indies vs. New Zealand, Port-of-Spain, 2002
New Zealand made 212 for 5 in 44.2 overs while batting first, when their innings was called off and West Indies' chase was truncated to 33 overs. D/L calculated their revised target at the time as 212. Again, a comparison of run rates raises a few questions. New Zealand's run rate at the end of their innings was 4.78; West Indies' required rate in 33 overs according to D/L is 6.36, an increase of 33%.

3. South Africa vs. New Zealand, Johannesburg, WC2003
Replying to South Africa's imposing 306 for 6 in 50 overs, New Zealand, riding on Stephen Fleming's outstanding century, were 182 for 1 in 30.2 overs when rain reduced the chase to 39 overs. According to the new D/L calculations, the revised target would have been 229 (it was 226 at the time). The point of contention is this: at the time of the interruption, New Zealand's required rate was 6.35 runs per over, stretching over a period of almost 20 overs. Going by the current D/L calculations, the required run rate on resumption is 5.42, over a period of just 8.4 overs. Obviously, the rain has simplified New Zealand's task enormously (though the D/L contention is that New Zealand are reaping the rewards of being well ahead of the par score at the point of interruption).

E. Formula for D/L score calculation [6]
Let,
S: Team 1's score
R1: Resources % available to team 1 (from the Resource Percentage table)
R2: Resources % available to team 2 (from the Resource Percentage table)
T: Target score for team 2

Case 1:
If R1 > R2,
T = S * (R2 / R1) + 1
Reduces team 1's score in proportion to the reduction in resources.

Case 2:
If R1 = R2,
T = S + 1
No adjustment required.

Case 3:
If R1 < R2,
T = S + [G50 * (R2 - R1) / 100] + 1
where G50, for matches involving ICC full member nations, is at present 235.
Increases team 2's target score by the extra runs that are predicted in accordance with the extra resources.

in death overs cause an unfair result if the innings gets truncated. [4] We are going to use these tools to extract such patterns and will try to minimise bias. This is the foundation for the evaluation part of the project and the base input for the extension part.
Following are some observed patterns [2]:
1. Pattern 1: The team winning the toss wins the match in 66% of cases.
2. Pattern 2: The team batting first wins the match in 64% of cases.
3. Pattern 3: 54% of teams winning the toss elect to field first in rain-affected matches.
4. Pattern 4: The average difference in run rate between the winning and losing teams is not significant.

A. Inferences from the above patterns [2]
1. The D/L method has been biased towards the team batting first.
2. The D/L method has been biased towards the team winning the toss.
3. The D/L method places more weight on wickets than on run rate and runs scored.

Based on the above observations, we have a heuristic model to predict the winner in a match
F.
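Returning to the target formula of Section E, the three cases can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function name `dl_target` is hypothetical, the par score is assumed to be rounded down before adding 1 (the text does not state a rounding rule), and Case 1 scales S by R2/R1, which is what "reduces the score in proportion to the reduction in resources" requires. G50 = 235 is the value quoted in the text.

```python
import math

# Average 50-over total for ICC full member nations, as quoted in the text.
G50 = 235


def dl_target(s, r1, r2, g50=G50):
    """Return team 2's target T.

    s  : team 1's score (S)
    r1 : resource percentage available to team 1 (R1)
    r2 : resource percentage available to team 2 (R2)
    """
    if not (0 < r1 <= 100 and 0 <= r2 <= 100):
        raise ValueError("resource percentages must lie in (0, 100] and [0, 100]")

    if r1 > r2:
        # Case 1: team 2 has fewer resources, so scale team 1's score down.
        par = s * r2 / r1
    elif r1 == r2:
        # Case 2: equal resources, no adjustment.
        par = s
    else:
        # Case 3: team 2 has more resources, so add the extra expected runs.
        par = s + g50 * (r2 - r1) / 100

    # Assumption: the par score is rounded down, then 1 run is added to win.
    return math.floor(par) + 1
```

For instance, with S = 250, R1 = 100 and R2 = 80, Case 1 gives a par score of 200 and a target of 201; with S = 200, R1 = 80 and R2 = 100, Case 3 adds 235 * 20 / 100 = 47 runs, giving a target of 248. The resource percentages themselves would still have to be looked up in the Resource Percentage table.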