Correlation Between Average and Age in Cricket
Table Of Contents
● Hypothesis
● Statement Of Intent
● Data
● Graphs (1)
● Trend Line and Formula
● Cricket Background
● Second Claim
● Graphs (2)
● Interpolation
● Extrapolation
● Validity
● Area For Improvement
● Conclusion
● Bibliography

Hypothesis
If I compare the data for 7 different batsmen over the course of 12 years, it will be evident that there is a correlation between age and batting average in cricket. This is due to the belief that as sportsmen age, they become more mature and more aware of what their role requires. Also, a young sportsman is more likely to be shuffled around the batting order, which stops them from fully expressing themselves. Therefore, I further predict that as an individual ages, their average in cricket will also rise.
As mentioned earlier, this is due to them being aware of what their role requires and also the fact that they have been playing at the highest level for a long span of time, which gives them experience of conditions and oppositions. Essentially, I am expecting a positive correlation.

Statement Of Intent
In this exploration, my aim is to discover whether age plays a role in performance (batting average) in cricket. Many ‘pundits’ suggest that age is just a number in cricket, but I would like to find out for myself whether age is truly just a number or whether this idea is being carried by a few anomalies. In order to ensure variety in my results, I have chosen to compare 7 batsmen from across the playing countries. The batsmen I have selected are:
1. Sachin Tendulkar (India)
2. Jacques Kallis (South Africa)
3. Ricky Ponting (Australia)
4. Virender Sehwag (India)
5. Sanath Jayasuriya (Sri Lanka)
6. Mahela Jayawardene (Sri Lanka)
7. Kumar Sangakkara (Sri Lanka)

To ensure a fair test, I have selected batsmen that have played 40+ ODI (50-over) matches over the course of 12 years. For each batsman I have recorded their average (age 22-34) and their strike rate (runs per 100 balls). The information has been gathered from the largest cricket database, ESPNcricinfo.

To discover the relationship between age and average, if there is any, I will create 4 different types of graphs:
1. Line graph
2. Scatter plot
3. Trend line graph
4. Combination graph

To further examine the correlation, I will be applying the correlation coefficient and adding a regression equation where applicable. Using the regression equation, I will make a trend line graph to find the line of best fit, which will further help me predict and analyse my data. Through the data derived from the graphs, I will test my hypothesis and see whether age is truly just a number or whether it has an impact on performance.
I will then aim to look for similarities between data and also aim to explain any discrepancies found. To further my knowledge of the correlation, or lack of one, I will also be comparing a batsman’s strike rate to their average to see if it coincides with rises or falls in the average.

Data
[Table: Batsmen Average]
[Table: Batsmen Strike Rate]
[Table: Information In Regards To Age]
[Table: Information In Regards To Batsmen]

Data:
● There were no outliers in the information sorted by age. There was one outlier among the cricketers (13.2, Kumar Sangakkara), but it was a minor outlier that had no major impact on the data. Lower boundary check: 1.5 x IQR = 1.5 x 12.955 = 19.4325, and Q1 - 1.5 x IQR = 33.46 - 19.4325 = 14.0275. Since 13.2 is below the 14.0275 boundary, it counts as an outlier.
● Mean = sum of all data values / number of data values
● Median = the value in position (n+1)/2 of the ordered data, where n = number of values
● Mode = most common value
● IQR = Q3 - Q1
● Max Value = highest value in data
● Min Value = lowest value in data
● R = Correlation Coefficient
● R^2 = Coefficient of Determination = how much of the variation in the dependent variable can be explained by the independent variable

Graphs
Scatter Plot: Will allow me to see any visual correlation between the data sets.

This scatter plot depicts the batting average of all the batsmen in relation to their age. Looking at the graph, there does not seem to be a convincing relationship. The averages seem to fluctuate regardless of age, and there does not seem to be one age where all batsmen are at a consistent high or a consistent low.

Second graph: Line graph
Line graph: Unlike the scatter plot, the line graph helps the viewer follow the path of a cricketer’s average. With a line graph, ups and downs become more apparent, and it also gives us a better idea of where cricketers were at certain points in their career. The line between points also shows the slope, which highlights local changes and so emphasises rises in averages.

The line graph gives us a fairly good representation of the correlation between age and average.
As you can see, the beginning of a cricketer’s career is extremely volatile. Some individuals, such as Jayasuriya and Sehwag, average extremely low to begin with, but as time passes their averages increase. As the graph shows, there seems to be a cluster around the 40 run average at the age of 34, with only Ricky Ponting, Sanath Jayasuriya and Virender Sehwag falling below the mark. To the average statistician, it may seem that a correlation, although extremely weak, is evident. All the batsmen start volatile but average around 40 at the age of 34. But to test the true strength of the correlation, I will explore the regression equation.

Trend Line and Formula
As I aim to find the correlation between batting averages and age, I will be calculating the correlation coefficient for all 7 batsmen. This is the formula that I will be using:

r = (NΣxy - ΣxΣy) / √[(NΣx^2 - (Σx)^2)(NΣy^2 - (Σy)^2)]

1. N = number of pairs of scores
2. Σxy = sum of the products of paired scores
3. Σx = sum of x scores
4. Σy = sum of y scores
5. Σx^2 = sum of squared x scores
6. Σy^2 = sum of squared y scores

The results have been collected through a wide range of methods, including an online calculator, Google Sheets, a graphing calculator and calculation by hand. Here are my r values:
1. Sachin Tendulkar: r = -0.0768, r^2 = 0.0059
2. Jacques Kallis: r = 0.1744, r^2 = 0.0304
3. Ricky Ponting: r = -0.1269, r^2 = 0.0161
4. Virender Sehwag: r = 0.4411, r^2 = 0.1946
5. Sanath Jayasuriya: r = 0.7448, r^2 = 0.5547
6. Mahela Jayawardene: r = 0.3632, r^2 = 0.1319
7. Kumar Sangakkara: r = 0.4801, r^2 = 0.2305

To exhibit these values, I have created a trend line graph. This trend line graph will help show ‘the line of best fit’ for each batsman. The line of best fit shows us the overall trend in a batsman’s average as the years go on. The trend line also helps us make reasoned judgements when finding interpolated and extrapolated data.

This trend line graph shows us the ‘trend’ of all batsmen.
The graph highlights the fact that all the batsmen start very volatile but end up consistently around 40. What’s important to note is that all batsmen have positive correlations, except for Ricky Ponting and Sachin Tendulkar. A positive correlation means that larger scores on one scale are associated with larger scores on the other. For the batsmen, bar Tendulkar and Ponting, this means that they had relatively lower averages at a younger age and, as they got older, their averages increased. A negative correlation means that larger scores on one scale are associated with smaller scores on the other. What this means for Tendulkar and Ponting is that they started off with a relatively high average, and as they got older their average dropped. This is quite interesting, as I predicted that performance increases as an individual gets older. I was not expecting two cricketers to lose steam as they got older.

In this scenario, weak correlations are expected, as a near-perfect correlation is almost impossible. In the cricket world, many variables have an impact on average, so the chances of having a correlation of 1 or -1 are extremely low. But as you can see, some individuals have a stronger correlation than others. Virender Sehwag, Sanath Jayasuriya, Mahela Jayawardene and Kumar Sangakkara all have comparatively strong positive correlations (Jayasuriya’s, at r = 0.7448, is the strongest), while Kallis has a weaker positive correlation. Tendulkar and Ponting have weak negative correlations. From this, we can conclude that a cricketer does not necessarily improve as they get older.

But what intrigues me is the r^2 value. The r^2 value can be presented as a percentage, and it shows us how much of the variation in the dependent variable (average) can be explained by the independent variable (age). The low r^2 values tell me that there is more to cricket averages than just age, a lot more. Thus the next step in my investigation will be to compare cricket averages and cricket strike rate.
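The correlation coefficient calculation described in the Trend Line and Formula section can be sketched in Python. This is a minimal sketch, assuming the standard Pearson raw-score formula, which matches the sums listed there (N, Σxy, Σx, Σy, Σx^2, Σy^2). The function name pearson_r and the sample ages and averages are hypothetical and made up for illustration; they are not the data used in this exploration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from raw-score sums."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    n = len(xs)                                   # N: number of pairs of scores
    sum_x = sum(xs)                               # Σx
    sum_y = sum(ys)                               # Σy
    sum_xy = sum(x * y for x, y in zip(xs, ys))   # Σxy
    sum_x2 = sum(x * x for x in xs)               # Σx^2
    sum_y2 = sum(y * y for y in ys)               # Σy^2
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    if denominator == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return numerator / denominator


# Hypothetical values for illustration only (not the essay's data).
ages = [22, 23, 24, 25, 26]
averages = [30.0, 42.0, 35.0, 48.0, 40.0]

r = pearson_r(ages, averages)
# r^2 is the share of variation in average that a straight-line
# relationship with age can account for.
print(f"r = {r:.4f}, r^2 = {r * r:.4f}")
```

Squaring r gives the r^2 value discussed above, so one function call per batsman would be enough to rebuild the list of r and r^2 values from the raw tables.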

Cricket Background
In cricket, there is a statistic known as the strike rate. The strike rate is the average number of runs a batsman scores per 100 balls faced. The formula for strike rate is (total runs / total balls faced) x 100. To further explain the negative correlation of some of the batsmen, I'd like to compare the strike rate with the average to see if there is any relation.
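As a quick illustration of the strike rate formula, here is a minimal Python sketch. The function name strike_rate and the figures of 450 runs from 500 balls are hypothetical and chosen only to show the arithmetic; they do not belong to any of the seven batsmen.

```python
def strike_rate(total_runs, total_balls_faced):
    """Runs scored per 100 balls faced: (runs / balls) x 100."""
    if total_balls_faced <= 0:
        raise ValueError("balls faced must be positive")
    # Multiplying before dividing is mathematically the same formula
    # and keeps simple cases free of floating-point noise.
    return 100 * total_runs / total_balls_faced


# Hypothetical figures for illustration only.
example = strike_rate(450, 500)
print(f"Strike rate: {example:.2f}")
```

A strike rate below 100 means the batsman scores at less than a run a ball, while a value above 100 means more than a run a ball.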