Math Extended Essay
Outline, Introduction and Rationale

Research Question

How can the principles of probability and Markov chains be used in T20 cricket to find convergence patterns in the areas in which a particular batsman hits his shots during a particular innings, given his previous batting history, and how can these patterns in turn help in setting better field placements?

Aim

To find and predict convergence patterns in the areas in which runs are scored by a particular batsman (AB de Villiers) in a particular innings, given his past scoring patterns.

Introduction

Ever since my childhood, I have been an ardent sports fan. So when I started looking for potential IA and extended essay topics, I was sure that they had to be related to sports in some way. After some thought, I chose cricket for several reasons. Firstly, being a passionate cricket player and fan myself, I thought that I would be able to comprehend and relate to an investigation on cricket better than an investigation on any other sport. Furthermore, my personal involvement with the topic would give me additional motivation and interest in the investigation.

Even from a mathematician's perspective, I think that cricket is well suited to an investigation on statistics and Markov chain modelling because of the sheer volume of variables, data and statistics each game of cricket provides. One can investigate batting averages, bowling averages, strike rates, fielding averages, economy rates and much more. So it was decided: my extended essay would be on cricket. I still had to decide exactly which part of mathematics my investigation would cover, though. Within mathematics, I have always been particularly interested in probability and statistics.
As I started researching and looking at past extended essays, explorations and research papers, one thing that stood out was the simple but widely applicable probability concept of Markov chains (which is essentially an extension of the concept of conditional probability). After looking at a couple of studies that used Markov chains and eigenvectors as tools to calculate various probabilities in sports settings, I came up with the idea of using Markov chains, transition matrices and eigenvectors to investigate the probabilities of several variables in cricket. After much deliberation, I decided to use these concepts to calculate the probability of a particular batsman (AB de Villiers) hitting the ball into a particular zone of the cricket field (refer to the picture below), given his previous zone scoring pattern in a particular innings.

Cricket is played in three formats: Test cricket, One Day International cricket and T20 International cricket. The three formats vary in the number of overs each team gets to bat per innings. I had to choose one particular format because scoring patterns vary immensely across the three, which would make it very hard to find a common pattern. While One Day Internationals and especially T20 Internationals are dynamic and fast paced because of the limited overs each team has, Test cricket is the exact opposite, with a strategic but very slow progression. This is essentially because one Test match is spread over five days.

This investigation reflects a combination of my zeal for two vastly different but inter-related things: mathematics (in particular probability and statistics) and cricket.
Abstract

This investigation aims to use the concepts of Markov chains, conditional probability and eigenvectors to find a mathematically significant pattern in AB de Villiers's run scoring (in terms of the areas in which he scores his runs). Using historical data from ESPNcricinfo's cricket database, Statsguru, I have calculated the conditional probabilities of AB de Villiers hitting a shot into any particular "zone" of the ground for every combination of zones. These conditional probabilities are summarized in a diagram called a transition diagram. I will then use these conditional probabilities, along with the principles behind Markov chains, eigenvalues, eigenvectors and basic probability, to arrive at a limiting value. This limiting value will provide a mathematically significant pattern in AB de Villiers's scoring and will also act as a measure for predicting roughly where on the ground his next shot might go. These calculations will then be repeated for each possible combination. Given that there are 4 zones, there will be 12 values that I should get at the end, which will hopefully provide insights into convergence patterns at different points in the innings, for example the 60th ball or the 120th ball.

With the help of this investigation and the resulting values, I hope to provide international teams with insightful data and mathematical analysis on how to plan a field for AB de Villiers. The investigation can then be extended to any batsman in the world to find previously unknown scoring patterns and to plan fields accordingly, given the situation the match is poised in.

The Game of Cricket

[Figure: This is categorized as a shot in zone 1]

Cricket is a bat and ball sport played on a circular field with a diameter of anywhere between 120 and 180 m. A game of cricket is played between two teams of 11 players each, who take turns batting and bowling.
A game of cricket consists of two or four innings, with the teams alternating between batting and bowling. The batting team aims to score as many runs as possible in a particular innings, while the bowling team tries to restrict the batting team to as few runs as possible. The team that has scored more runs at the end of their respective innings wins. At any instant the bowling team has 1 bowler, 1 wicketkeeper and 9 fielders, all of whom try to stem the flow of runs. Runs are scored by running between the two ends of the pitch. In order to score as many runs as possible, batsmen usually try to hit the ball into gaps in the field, i.e. places in the field where there are no fielders.

Assumptions made for the investigation

• For the purpose of this investigation, I have divided the standard cricket field into 4 equal zones, as demonstrated in the above picture. Each zone therefore subtends a 90 degree angle at the centre of the circular field. To give some context, the shot in the picture on the previous page is categorized as a scoring shot in zone 1.
• The batsman I have chosen for my investigation is AB de Villiers. Most cricket analysts and experts consider AB de Villiers to be one of the most versatile batsmen in the modern game. He is often called Mr. 360 because of his ability to score runs all around the ground. Because he hits the ball into all parts of the ground, predicting the probability with which he will hit the ball into a particular zone is a bigger challenge, which gives me more scope for mathematical investigation.
• All the statistics I have taken for AB de Villiers's scoring pattern are from a website called Cricinfo.
My data sample is all the first class T20 matches AB de Villiers has played in the last 6 years (May 2010 to May 2016).
• Since AB de Villiers is a right handed batsman, the above mentioned zone distribution is used. For a left handed batsman the zones are inverted horizontally, i.e. Zone 2 for a right handed batsman would be Zone 4 for a left handed batsman.

What are Markov chains?

Markov chains, named after Andrey Markov, are mathematical systems that hop from one "state" (a situation or set of values) to another. For example, if you made a Markov chain model of a baby's behavior, you might include "playing," "eating," "sleeping," and "crying" as states, which together with other behaviors could form a "state space": a list of all possible states. On top of the state space, a Markov chain tells you the probability of hopping, or "transitioning," from one state to any other state, for example the chance that a baby currently playing will fall asleep in the next five minutes without crying first. In short, Markov chains are a very effective way of illustrating conditional probability between multiple states.

In line with this description, the Markov chain, or transition diagram, for my investigation has 4 states for the corresponding zones and one further state that accounts for exceptions, i.e. balls that are left alone or missed by the batsman. These do not come under any of the 4 zone states, which is one of the reasons why the sum of all the probabilities

The concept of conditional probability

Conditional probability is defined as the probability of a particular event given that another event has occurred. In mathematical terms, for events A and B with P(B) > 0, it is defined as:

P(A | B) = P(A ∩ B) / P(B)

Transition diagram and conditional probability table

The following table illustrates the conditional probabilities of all possible permutations of scoring patterns from Zone 1 to Zone 4.
Each row of the table below represents the condition, whereas each column represents the event whose probability is being found given that condition.
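To make the idea of a conditional probability table and its limiting value concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the calculation carried out in this essay: the zone sequence `sample_shots` is made up and is not real AB de Villiers data, the function names are hypothetical, the extra "exception" state is left out for brevity, and the limiting value is approximated by repeated multiplication (power iteration) rather than by solving for the eigenvector directly. For a chain in which every zone can eventually be reached from every other zone and at least one zone can repeat, the two approaches should give the same result.

```python
from collections import Counter

ZONES = [1, 2, 3, 4]  # the four 90 degree zones described in the essay


def transition_matrix(shots, states=ZONES):
    """Estimate P[i][j] = P(next shot in zone j | previous shot in zone i)."""
    # Count every consecutive (previous zone, next zone) pair in the sequence.
    counts = Counter(zip(shots, shots[1:]))
    matrix = []
    for i in states:
        row_total = sum(counts[(i, j)] for j in states)
        if row_total == 0:
            # No shots observed after zone i: use a uniform row (an assumption).
            matrix.append([1 / len(states)] * len(states))
        else:
            matrix.append([counts[(i, j)] / row_total for j in states])
    return matrix


def limiting_distribution(matrix, steps=1000, tol=1e-12):
    """Approximate the limiting distribution by iterating pi <- pi * P."""
    n = len(matrix)
    pi = [1 / n] * n  # start from a uniform guess
    for _ in range(steps):
        new_pi = [sum(pi[i] * matrix[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(pi, new_pi)) < tol:
            return new_pi
        pi = new_pi
    return pi


# Made-up zone sequence for illustration only; NOT real match data.
sample_shots = [1, 2, 2, 3, 1, 4, 2, 1, 1, 3, 4, 4, 2, 3, 1, 2]

P = transition_matrix(sample_shots)
pi = limiting_distribution(P)

for zone, row in zip(ZONES, P):
    print("from zone", zone, [round(p, 3) for p in row])
print("limiting distribution:", [round(p, 3) for p in pi])
```

Each printed row corresponds to one row of the conditional probability table (the condition), and each entry in that row to one column (the zone of the next shot). The final vector is the long-run share of shots in each zone implied by those probabilities.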