Imperial Journal of Interdisciplinary Research (IJIR) Vol-2, Issue-5, 2016 ISSN: 2454-1362, http://www.onlinejournal.in

Review to the Duckworth-Lewis Method Using Data Mining Techniques

Rohan Brahme1*, Roshan Birar2, Poonam Kadnar3 & Prof. Suruchi Malao4
1,2,3,4 Department of Computer Engineering, K. K. Wagh Institute of Engg. Education & Research, Savitribai Phule Pune University, India

Abstract- The Duckworth-Lewis system is a mathematical formulation used to set a target score for matches interrupted by bad weather conditions. The Duckworth-Lewis (D/L) method considers only two factors to provide an updated target, i.e. the number of runs which can be scored in the remainder of the innings as a function of the number of overs remaining and the number of wickets in hand. We will use the WEKA tool to find bias in the current D/L system and illustrate it clearly. The Duckworth-Lewis system has been observed to be biased towards the team batting first and the team winning the toss in scenarios such as multiple interruptions of the game in the same match and the fall of wickets in the death overs while batting second. Bias in the context of this outline is defined as taking advantage of the assets of systems such as the D/L method. We also show that taking such advantage of the system permits prediction of the match winner that is better than chance. Using the above analysis, we propose a modification to the existing Duckworth-Lewis system that considers the observed patterns from the dataset as an additional resource, along with the existing resources, to reduce the bias when predicting the target score.

Keywords- Cricket, Duckworth-Lewis, WEKA, C4.5, Decision Trees.

I. INTRODUCTION

A. The game of Cricket
As mentioned in [1], cricket is a bat-and-ball team sport that originated in England and is one of the most popular games in the world. Moving on from the conventional Test format, it has slowly ventured into limited-overs formats like ODI and T20 so that a definite result is obtained, making it more entertaining as a spectator sport. Sometimes, due to constraints like bad weather (rain, sandstorms and bad light), floodlight failure and crowd issues, a certain number of overs are lost and hence a definite result isn't obtained. To overcome these obstacles, methods have been devised to revise target scores and/or declare a winner.

B. Duckworth-Lewis Method
Two British statisticians, Frank Duckworth and Tony Lewis, developed the Duckworth-Lewis (D/L) method, a statistical method used to predict the target score of the team batting second in a limited-overs game which is interrupted by unavoidable circumstances. The D/L method, a system based on a mathematical model, considers only two resources: wickets left and overs remaining. When overs are lost, setting an adjusted target is not as simple as reducing the batting team's target proportionally, because a team batting second with wickets in hand can be expected to play more aggressively than one with a full 50 overs, and hence can achieve a higher run rate. Duckworth & Lewis (1998) therefore considered the most common situation, where two teams play a full-length game.

The above graph [1] shows the percentage of resources remaining for a team against the number of overs bowled. It is an exponential curve that drops as more wickets fall and comes down to zero when the last wicket falls. Duckworth and Lewis observed a close connection between the availability of these resources and a team's final score, which this algorithm tries to exploit. In the above table the remaining overs are plotted against wickets lost.
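To make the idea of a resource curve concrete, the following is a minimal sketch of an exponential resource model of the kind the graph describes. The constants `Z0` and `B` and the linear `wicket_factor` are hypothetical values chosen only for illustration; they are not the official D/L parameters, which this paper does not give.

```python
import math

# Hypothetical constants for illustration only; not the official D/L values.
Z0 = 280.0           # assumed run ceiling with all ten wickets in hand
B = 0.035            # assumed exponential decay constant per over
TOTAL_OVERS = 50
TOTAL_WICKETS = 10


def wicket_factor(wickets_lost: int) -> float:
    """Assumed share of scoring potential left after losing some wickets."""
    return 1.0 - wickets_lost / TOTAL_WICKETS


def expected_runs(overs_left: float, wickets_lost: int) -> float:
    """Expected further runs under the assumed exponential model."""
    f = wicket_factor(wickets_lost)
    if f <= 0.0 or overs_left <= 0.0:
        return 0.0  # no overs or no wickets left means no resources
    return Z0 * f * (1.0 - math.exp(-B * overs_left / f))


def resource_percentage(overs_left: float, wickets_lost: int) -> float:
    """Resources left, as a percentage of a full innings (50 overs, 0 lost)."""
    full = expected_runs(TOTAL_OVERS, 0)
    return 100.0 * expected_runs(overs_left, wickets_lost) / full
```

Under these assumed values the percentage falls both as overs are used up and as wickets are lost, and reaches zero when either runs out, which is the qualitative behaviour the graph is described as showing.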


Resource Percentage Table

The above table [3] shows the calculated percentage of resources left. The percentages are calculated beforehand from the overs left and the wickets lost, and are stored in the table so that they come in handy when calculating the revised target. This table is referred to whenever Duckworth-Lewis comes into the picture.

However, factors like the toss may play a crucial role in deciding the winner, since a lot of speculation and research goes into deciding whether to bat or field first after winning the toss. For example, the analysis of the pitch report, the previous history of the ground and the expected weather conditions are factors that inform the decision. In rain-affected matches, batting first is the advantageous decision. After rain, the pitch becomes soft, the outfield becomes slow and the ball bounces unevenly, making it difficult to bat, as mentioned in [2].

C. Duckworth-Lewis Model
The objective of the D/L system was to find a method that meets the criteria given below.
1. It must maintain exact fairness to both sides.
2. It must give an appropriate result in all possible situations.
3. Team 1's scoring pattern should not affect the revised target for team 2 in an interrupted game.
Multiple interruptions of the game during the same match and the fall of wickets in the death overs lead to unfair target prediction, so data mining should be applied to reduce such bias in these specific conditions.

D. Controversial D/L method decided matches [2]
Some actual scenarios from ODIs that highlight the shortcomings in the D/L method:

1. South Africa vs. New Zealand, Durban, November 2000
New Zealand batted first and were 81 for 5 after 27.2 overs when rain reduced the game to 49 overs per side. Then, with New Zealand on 114 for 5 in 32.4 overs, their innings was stopped due to rain, and the second innings was shortened to 32 overs. South Africa's new target according to the D/L charts was 153, but the modified version suggests that the target would have been 156. At the time the game was interrupted, New Zealand's run rate was 3.48, with five wickets down and 17.2 overs to spare. According to D/L's modified calculations, South Africa's required run rate would be 4.87.

2. West Indies vs. New Zealand, Port-of-Spain, 2002
New Zealand made 212 for 5 in 44.2 overs while batting first, when their innings was called off and West Indies' chase was truncated to 33 overs. D/L calculated their revised target at the time as 212. Again, a comparison of run rates raises a few questions. New Zealand's run rate at the end of their innings was 4.78; West Indies' required rate in 33 overs according to D/L is 6.36, an increase of 33%.

3. South Africa vs. New Zealand, Johannesburg, WC2003
Replying to South Africa's imposing 306 for 6 in 50 overs, New Zealand, riding on Stephen Fleming's outstanding innings, were 182 for 1 in 30.2 overs when rain reduced the chase to 39 overs. According to the new D/L calculations, the revised target would have been 229 (it was 226 at the time). The point of contention is this: at the time of the interruption, New Zealand's required rate was 6.35 runs per over, stretching over a period of almost 20 overs. Going by the current D/L calculations, the required run rate on resumption is 5.42, over a period of just 8.4 overs - obviously, the rain has simplified New Zealand's task enormously (though the D/L contention is that New Zealand are reaping the rewards of being well ahead of the par score at the point of interruption).
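The run-rate comparisons in these scenarios depend on the cricket overs notation, in which "44.2" means 44 overs and 2 balls rather than a decimal fraction. The helper names below are hypothetical; this is a small sketch of the arithmetic, not part of the D/L method itself.

```python
def overs_to_balls(overs: str) -> int:
    """Convert overs notation such as '44.2' (44 overs, 2 balls) to balls."""
    whole, _, balls = overs.partition(".")
    extra = int(balls) if balls else 0
    if not 0 <= extra < 6:
        raise ValueError(f"invalid ball count in overs value: {overs!r}")
    return int(whole) * 6 + extra


def run_rate(runs: int, overs: str) -> float:
    """Runs per six-ball over for the given runs and overs notation."""
    return runs * 6 / overs_to_balls(overs)


# Scenario 2: New Zealand's 212 in 44.2 overs.
nz_rate = run_rate(212, "44.2")            # about 4.78

# Scenario 3: 229 - 182 = 47 runs still needed from 8.4 overs.
resumed_rate = run_rate(229 - 182, "8.4")  # about 5.42

# Scenario 2: relative increase from 4.78 to the quoted 6.36.
increase_pct = (6.36 / 4.78 - 1) * 100     # about 33
```

Passing the overs as a string avoids treating "8.4" as the decimal number 8.4, which would understate the number of balls.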


E. Formula for D/L score calculation [6]
Let,
S: Team 1's score
R1: Resources % available to team 1 (from the R. P. table)
R2: Resources % available to team 2 (from the R. P. table)
T: Target score for team 2

Case 1:
If R1 > R2,
T = S × (R2/R1) + 1;
Reduces team 1's score in proportion to the reduction in resources.

Case 2:
If R1 = R2,
T = S + 1;
No adjustment required.

Case 3:
If R1

in death overs cause an unfair result if an innings gets truncated. [4] We are going to use these tools to extract such patterns and will try to minimise the bias. This is the foundation for the evaluation part of the project and the base input for the extension part.
Following are some observed patterns [2]:
1. Pattern 1: The team winning the toss wins the match in 66% of cases.
2. Pattern 2: The team batting first wins the match in 64% of cases.
3. Pattern 3: 54% of teams winning the toss elect to field first in rain-affected matches.
4. Pattern 4: The average difference in run rate between winning and losing teams is not significant.

A. Inferences from the above patterns [2]
1. The D/L method has been biased towards the team batting 1st.
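Returning to the target formula of Section E, the two cases that are stated in full can be sketched as below. The function name is hypothetical. The sketch scales by R2/R1 so that the target falls with team 2's resources, as the description of Case 1 requires, and it assumes the scaled score is rounded down before adding one. Case 3 is not stated in full in the text, so it is deliberately left unimplemented.

```python
import math


def revised_target(team1_score: int, r1: float, r2: float) -> int:
    """Target T for team 2, given S and resource percentages R1 and R2."""
    if r1 <= 0 or r2 < 0:
        raise ValueError("R1 must be positive and R2 must be non-negative")
    if r1 > r2:
        # Case 1: scale team 1's score down in proportion to the resources.
        # Rounding down before adding 1 is an assumption of this sketch.
        return math.floor(team1_score * r2 / r1) + 1
    if r1 == r2:
        # Case 2: equal resources, so no adjustment is needed.
        return team1_score + 1
    # Case 3 (R1 < R2) is not stated in full in the text.
    raise NotImplementedError("Case 3 (R1 < R2) is not covered by this sketch")
```

For example, with S = 250, R1 = 100 and R2 = 80, the scaled score is 200 and the target is 201.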


The Duckworth-Lewis method was introduced in the late 1990s and has since been adopted by all major cricketing boards. No other sport uses a statistical method to select the winning target for a match, but the peculiarities of cricket and its susceptibility to bad weather have made it imperative to find such a solution for matches where a result is mandatory. Cricket is one of the most popular games in the world. With billions of followers around the world, the D/L method and extensions to it are probably the greatest contribution to the sporting world from a mathematical, statistical and operational research perspective.

IV. EVALUATION OF THE DUCKWORTH LEWIS METHOD AND IDENTIFICATION OF ITS LIMITATIONS

B. Description of Dataset [4]
The dataset consists of information on all the matches that were affected by inclement weather, along with the location, i.e. the country and the stadium, where the Duckworth-Lewis method came into use. The dataset consists mainly of One Day International (ODI) matches of teams from India, Pakistan, England, West Indies, South Africa, New Zealand, Sri Lanka, Australia, Bangladesh & Afghanistan. [2]

V. EXTENSION TO THE DUCKWORTH LEWIS METHOD
In the analysis part of the project we will show how D/L can be exploited in favour of the team batting first and the team winning the toss, using data mining techniques. We also have some sample examples to show the bias in the system. Several factors cause exploitation in the system, such as multiple interruptions of the game during the same match and the fall of wickets in the death overs, which lead to unfair changes in the target score prediction. We will therefore extract 'n' such factors using the WEKA tool, and these patterns will be provided to the Duckworth-Lewis method calculator as resources along with its former inputs, the number of overs to be played and the number of wickets.
Bias in the system makes it possible to guess the winner of the match in rain-affected games, which sometimes calls the D/L system into question.

Target score = Target score by D/L method * [p1 * p2 * … * pn]
Where:
pi is a bias-reducing parameter for the i-th pattern.

VI. CONCLUSIONS
This paper presents a novel approach to evaluate the Duckworth-Lewis system, which is used to predict the target score in rain-affected cricket matches when one or both teams have had their innings shortened. Using data mining techniques such as C4.5 with the help of the WEKA tool, we will discover the bias in the D/L method from different patterns extracted from a sample dataset. The observed bias will be in favour of the team batting first and the team winning the toss. The additional resources given as input to the system will be the patterns in which D/L gives an unfair prediction, such as multiple interruptions of the game in the same match and the fall of wickets in the death overs, which will help to reduce the bias.

VII. ACKNOWLEDGMENT
Our thanks to Prof. Dr. S. S. Sane and Mr. Sameer Mainkar for guidance and support.

VIII. REFERENCES
[1] http://en.wikipedia.org/wiki/Cricket
[2] Phanse V. & Deorah S. (2011, December). Evaluation and Extension to the Duckworth Lewis Method: A Dual Application of Data Mining Techniques. In Data Mining Workshop (ICDMW), 2011 IEEE 11th International Conference on (pp. 763-770). IEEE.
[3] Frank Duckworth, The Duckworth/Lewis method: an exercise in Maths, Stats, OR and communications, MSOR Connections Vol 8 No 3, August – October 2008.
[4] http://www.espncricinfo.com/
[5] Schall R. & Weatherall D. (2013). Accuracy and fairness of rain rules for interrupted one-day cricket matches. Journal of Applied Statistics, 40(11), 2462-2479.
[6] Harshil Shah, Jay Sampat, Rushabh & Kiran Bhowmick (2015). Review of Duckworth Lewis Method.
[7] Perera H. P. & Swartz T. B. (2013). Resource estimation in T20 cricket. IMA Journal of Management Mathematics, 24(3), 337-347.
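As an illustration of the adjusted-target rule in Section V (the D/L target multiplied by the bias-reducing parameters p1 to pn), the following is a minimal sketch. The function name and the sample parameter values are hypothetical. The paper does not state how the parameters are derived or how the result is rounded, so the sketch returns the unrounded product.

```python
import math
from typing import Sequence


def adjusted_target(dl_target: float, bias_params: Sequence[float]) -> float:
    """Multiply the D/L target by the bias-reducing parameters p1..pn."""
    if any(p <= 0 for p in bias_params):
        raise ValueError("bias-reducing parameters must be positive")
    # math.prod of an empty sequence is 1, so no patterns means no change.
    return dl_target * math.prod(bias_params)


# Hypothetical parameter values, for illustration only.
example = adjusted_target(200, [1.02, 0.99])  # about 201.96
```

A parameter above 1 raises the target and one below 1 lowers it, so each extracted pattern can push the D/L target in the direction that offsets its observed bias.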
