Math Extended Essay Outline, Introduction and Rationale

Research Question

How can the principles of probability and Markov chains be used in T20 cricket to find convergence patterns in the areas in which a particular batsman hits his shots during a particular innings, given his previous history, and how can these patterns in turn help in better field placements?

Aim

To find and predict convergence patterns in the areas in which runs are scored by a particular batsman (AB de Villiers) in a particular innings, given his past scoring patterns.

Introduction

Right from my childhood, I have been an ardent sports fan. So when I started looking for potential IA and extended essay topics, I was sure that they had to be related to sports in some way. After some thought, I chose cricket as my sport for several reasons. Firstly, being a passionate cricket player and fan myself, I thought that I would be able to understand and relate to an investigation on cricket better than an investigation on any other sport. Furthermore, my personal involvement with the topic would give me additional motivation and interest in the investigation. Even from a mathematician's perspective, I think that cricket is well suited to an investigation on statistics and Markov chain modelling because of the sheer volume of variables, data and statistics each game of cricket provides. One can investigate batting averages, bowling averages, strike rates, fielding averages, economy rates and so on. The list really is endless. So it was decided: my extended essay would be on cricket. I still had to decide exactly which part of mathematics my investigation on cricket would cover.

Within mathematics, I have always been particularly interested in probability and statistics. As I started researching and looking at past extended essays, explorations and research papers, one thing I noticed was the incredibly simple but very widely applicable probability concept of Markov chains (which is basically an extension of the concept of conditional probability). After looking at a couple of studies that used Markov chains and eigenvectors as tools to calculate various sorts of probabilities in sports settings, I came up with the idea of using Markov chains, transition matrices and eigenvectors to investigate the probability of several variables in cricket. After much deliberation, I decided to use this concept to calculate the probability of a particular batsman (AB de Villiers) hitting the ball into a particular zone of the cricket field (refer to picture below), given his previous zone scoring pattern in a particular innings.

Cricket is played in three formats, namely Test cricket, One Day International cricket and T20 international cricket. The three formats vary in the number of overs each team gets to bat per innings. I had to choose one particular format, because scoring patterns vary immensely across the three formats and it would be very hard to find a common pattern. While One Day Internationals and especially T20 internationals are very dynamic and fast paced because of the limited overs each team has, Test cricket is the exact opposite, with its strategic but very slow game progression. This is essentially because one Test match is spread over 5 days. I therefore restricted my investigation to T20 cricket. One can say that this investigation reflects a combination of my zeal for two vastly different but inter-related things: math (in particular probability and statistics) and cricket.

Abstract

This investigation aims to use the concepts of Markov chains, conditional probability and eigenvectors to find a mathematically significant pattern in AB de Villiers's scoring (in terms of the areas in which he scores his runs). Using historical data from Statsguru, ESPNcricinfo's cricket database, I have calculated the conditional probabilities of AB de Villiers hitting a shot in any particular "zone" of the ground, given the zone in which his previous shot was hit. These conditional probabilities are summarized in a diagram called a transition diagram. I will then use these conditional probabilities, along with the primary concepts and principles behind Markov chains, eigenvalues, eigenvectors and basic probability, to arrive at a limiting value. This limiting value will provide a mathematically significant pattern in AB de Villiers's scoring and will also act as a measure for future predictions of roughly where on the ground his next shot might go. These calculations will then be repeated for each possible combination of possibilities. Given that there are 4 zones, there will be 12 values that I should get at the end, which will hopefully provide insights into convergence patterns at different points in the innings, say for example the 60th ball or the 120th ball.

With the help of this investigation and the resulting values, I hope to provide international teams all over the world with insightful data and mathematical analysis on how to plan a field for AB de Villiers. This investigation can then be extended to any batsman in the world to find previously unknown scoring patterns and plan fields accordingly, given the situation of the match.

The Game of Cricket

This is categorized as a shot in zone 1

Cricket is a bat and ball sport played on a circular field with a diameter of anywhere between 120 and 180 m. A game of cricket is played between two teams of 11 players each and consists of two or four innings, with the teams alternating between batting and bowling. The batting team aims to score as many runs as possible in a particular innings, while the bowling team tries to restrict the batting team to as few runs as possible. The team that has scored more runs at the end of their respective innings wins.

At any instant the fielding team has 1 bowler, 1 wicketkeeper and 9 fielders, all of whom try to stem the flow of runs. Runs are scored by running between the two ends of the pitch. In order to score as many runs as possible, batsmen usually try to hit the ball into gaps in the field, i.e. places in the field where there are no fielders.

Assumptions made for the investigation

• For the purpose of this investigation, I have divided the standard cricket field into 4 equal zones as demonstrated in the above picture. Each zone therefore subtends a 90 degree angle at the centre of the circular field. To give some context, the shot in the picture on the previous page is categorized as a scoring shot in zone 1.
• The batsman I have chosen for my investigation is AB de Villiers. Most cricket analysts and experts consider AB de Villiers to be one of the most versatile batsmen in the modern game. He is often called "Mr. 360" due to his ability to score runs all around the ground. Because he hits the ball into all parts of the ground, I think it will be a bigger challenge to predict the probability with which de Villiers will hit the ball into a particular zone, which gives me more scope for mathematical investigation.
• All the statistics I have taken for AB de Villiers's scoring pattern are from a website called Cricinfo. My data sample is all the first class T20 matches AB de Villiers has played in the last 6 years (May 2010 to May 2016).
• Since AB de Villiers is a right handed batsman, the above mentioned zone distribution is used. For a left handed batsman the zones are inverted horizontally, i.e. Zone 2 for a right handed batsman would be Zone 4 for a left handed batsman.

What are Markov chains? Markov chains, named after Andrey Markov, are mathematical systems that hop from one "state" (a situation or set of values) to another. For example, if you made a Markov chain model of a baby's behavior, you might include "playing," "eating", "sleeping," and "crying" as states, which together with other behaviors could form a 'state space': a list of all possible states. In addition, on top of the state space, a Markov chain tells you the probability of hopping, or "transitioning," from one state to any other state---e.g., the chance that a baby currently playing will fall asleep in the next five minutes without crying first. In short Markov chains are a very effective way of illustrating the concept of conditional probability between multiple states.

In line with the above description, the Markov chain (or transition diagram) for my investigation would have 4 states for the corresponding zones and another state which accounts for exceptions, i.e. balls that are left alone or actually missed by the batsman. These balls do not come under any of the 4 zone states, which is one of the reasons why the sum of all the probabilities across the 4 zones is less than 1.

The concept of conditional probability

Conditional probability is defined as the probability of a particular event given that another event has occurred.

In mathematical terms, the probability of an event A given an event B (with P(B) > 0) is therefore defined as:

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$

Transition diagram and conditional probability table

The following table illustrates the conditional probabilities of all possible permutations of scoring patterns from Zone 1 to Zone 4. Each row of the table represents the condition (the zone of the previous shot), whereas each column represents the zone whose probability you are trying to find given that condition.

Condition    Zone 1    Zone 2    Zone 3    Zone 4
Zone 1       0.20      0.15      0.20      0.35
Zone 2       0.20      0.15      0.25      0.30
Zone 3       0.20      0.30      0.25      0.15
Zone 4       0.20      0.30      0.20      0.15

One would notice that the conditional probabilities do not add up to 1 in any of the rows (the rows add up to 0.90, 0.90, 0.90 and 0.85). To most people this would look like a mathematical fallacy: the probabilities of all possibilities under a given condition must always add up to 1, and the 4 zones I have defined cover the whole ground, so why would the probabilities not add up to 1? The reason is that some balls in cricket are left alone or missed. These are categorized as "no shots" but are still counted as legitimate balls. In order to restrict this investigation to 4 cases, and so keep the matrix calculations in the subsequent steps relatively doable, I have chosen not to take these "no shot" cases into account.
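As a rough illustration of how a table like the one above could be computed from ball-by-ball records, the sketch below counts how often a shot in one zone is followed by a shot in another. The function name `transition_table`, the sample list `balls` and the convention of writing 0 for a "no shot" are all assumptions made for this example; the list is made up and is not AB de Villiers's data.

```python
from collections import Counter

def transition_table(zones, n_zones=4):
    """Estimate P(next zone = j | current zone = i) from a ball-by-ball list.

    zones: list of ints, 1..n_zones for a shot hit into that zone and
    0 for a "no shot" (hypothetical encoding chosen for this sketch).
    Transitions into a "no shot" are counted in the row total but have no
    column of their own, so a row can add up to less than 1.
    """
    pair_counts = Counter(zip(zones[:-1], zones[1:]))  # (current, next) pairs
    from_counts = Counter(zones[:-1])                  # balls that have a successor
    table = {}
    for i in range(1, n_zones + 1):
        total = from_counts[i]
        table[i] = [
            pair_counts[(i, j)] / total if total else 0.0
            for j in range(1, n_zones + 1)
        ]
    return table

# Made-up sequence of zones for 14 consecutive balls (0 = no shot).
balls = [1, 3, 0, 4, 4, 2, 1, 3, 3, 0, 2, 4, 1, 4]

table = transition_table(balls)
for zone, row in table.items():
    print("Zone", zone, [round(p, 2) for p in row], "row sum:", round(sum(row), 2))
```

In this made-up sample, two of the three balls that follow a Zone 3 shot are "no shots", so the Zone 3 row adds up to less than 1, which is the same effect seen in the table above.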

Transition diagram

A transition diagram uses a flowchart to represent the set of conditional probabilities in a particular mathematical system diagrammatically. The conditional probabilities in the above table are therefore represented by the transition diagram below.

Concepts

Matrices

Initial state matrix

The initial probability of a batsman hitting a shot in any of the 4 zones is defined by the following matrix M

M = [0.3 0.2 0.25 0.15]

Transition Matrices

Transition matrices are another way of representing all the possible permutations of conditional probabilities in a particular mathematical system. Each entry is the conditional probability of the state given by its column, under the condition given by its row. This is defined by the diagram below.

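The transition matrix for this particular mathematical system, with its rows and columns ordered from Zone 1 to Zone 4 as in the conditional probability table, is given by:

$$T = \begin{pmatrix} 0.20 & 0.15 & 0.20 & 0.35 \\ 0.20 & 0.15 & 0.25 & 0.30 \\ 0.20 & 0.30 & 0.25 & 0.15 \\ 0.20 & 0.30 & 0.20 & 0.15 \end{pmatrix}$$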

Eigenvalues and Eigenvectors

In general, the eigenvector $\vec{v}$ of a matrix $A$ is the vector for which the following holds:

$$A\vec{v} = \lambda\vec{v} \qquad (1)$$

where $\lambda$ is a scalar value called the 'eigenvalue'. This means that the linear transformation $A$ on vector $\vec{v}$ is completely defined by $\lambda$. We can rewrite equation (1) as follows:

$$(A - \lambda I)\vec{v} = 0$$

where $I$ is the identity matrix of the same dimensions as $A$.
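The defining relation of an eigenvector can also be checked numerically. The sketch below uses a small made-up 2x2 matrix (not taken from the cricket data) and NumPy's `np.linalg.eig`; it is only meant to illustrate the definition.

```python
import numpy as np

# A small made-up 2x2 matrix, used only to illustrate the definition.
A = np.array([[0.9, 0.1],
              [0.5, 0.5]])

# np.linalg.eig returns the eigenvalues and a matrix whose COLUMNS
# are the corresponding eigenvectors.
eigenvalues, eigenvectors = np.linalg.eig(A)

for k in range(len(eigenvalues)):
    lam = eigenvalues[k]
    v = eigenvectors[:, k]
    # Check the defining relation A v = lambda v for each pair.
    print("eigenvalue:", lam, "relation holds:", np.allclose(A @ v, lam * v))
```

Each row of this example matrix adds up to 1, and one of its eigenvalues is 1. That is the eigenvalue of interest later on, when the same idea is applied to a stochastic matrix.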

We are going to use this concept of eigenvectors and eigenvalues, along with stochastic matrices, to finally find a (hopefully) converging pattern in AB de Villiers's shot selection.

Analysis

Stochastic state matrices

Now, given the initial state matrix M = [0.30 0.20 0.25 0.15], written here as $S_0$, and the transition matrix $T$ mentioned above, we can find the state matrix for any ball in the restricted match domain. The initial state matrix represents the probabilities before the match has begun:

$$S_0 = [0.30 \;\; 0.20 \;\; 0.25 \;\; 0.15]$$

For the first ball of the match the representative matrix would be as follows:

$$S_1 = S_0 T$$

Therefore, for the second ball,

$$S_2 = S_1 T = S_0 T^2$$

Given the above pattern we can now deduce the general formula for $S_{n+1}$, where n+1 is the ball in the match you want to find the stochastic state matrix for:

$$S_{n+1} = S_n T = S_0 T^{n+1}$$

Using the above general formula I have found the values for the 10th and 20th balls of the match. After long calculations, the values of these two matrices are as follows:

$S_{10}$ = [0.1605 0.1845 0.1803 0.1856]

$S_{20}$ = [0.12612375 0.143623125 0.1422234375 0.1472653125]
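The general formula above lends itself to a short computation. The sketch below is one possible way to do it with NumPy, assuming that the transition matrix is the conditional probability table read row by row (row = zone of the previous shot, column = zone of the next shot). The names `S0`, `T` and `state_after` are chosen for this example only.

```python
import numpy as np

# Initial state matrix as given in the text.
S0 = np.array([0.30, 0.20, 0.25, 0.15])

# ASSUMPTION: transition matrix copied from the conditional probability table
# (row = zone of the previous shot, column = zone of the next shot).
T = np.array([
    [0.20, 0.15, 0.20, 0.35],
    [0.20, 0.15, 0.25, 0.30],
    [0.20, 0.30, 0.25, 0.15],
    [0.20, 0.30, 0.20, 0.15],
])

def state_after(n, s0=S0, t=T):
    """Return the state matrix after n balls: s0 multiplied by t, n times."""
    return s0 @ np.linalg.matrix_power(t, n)

for n in (1, 10, 20):
    print("ball", n, state_after(n))
```

Because the "no shot" balls are left out, each row of this matrix adds up to less than 1, so the total of the state matrix shrinks with every ball. The printed numbers depend entirely on the matrix that is entered, and they are not offered as a check of the values quoted above.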

Analysing these values and the general formula from a mathematical perspective, I can say that they are broadly consistent with the trends I have personally observed in cricket matches. At the very beginning of a match, with the fielding restrictions in place, most batsmen favour the off side (Zone 3), as that is where the fielding restrictions affect them least. However, as the game progresses, batsmen begin to take more risks and start playing all around the ground to find opportunities to score runs. This is evident in the difference between the probabilities in the matrices for the 10th and 20th balls. It holds even more true for batsmen like AB de Villiers, who have a lot of variety in their batting and are known to use the whole ground for scoring runs.

WORK ON CLARITY HERE TO EXPLAIN THE DIFFERENCE IN TRENDS

Just to lend some perspective, I have compiled data for a more traditional T20 batsman who isn't as exuberant and exotic in his shot selection as AB de Villiers. I have chosen Murali Vijay for this comparison.

We use the same general formula but use the letters M and D to depict his matrices.

The transition matrix here is given below:

Calculating matrix values for him similarly, for the 10th and 20th balls, would yield these results:

[0.1235 0.09565 0.2755 0.1895]

[0.115463745 0.0876765 0.285556765 0.17873545]

Looking at these values at first glance would tell you that Murali Vijay has a considerably lower percentage of shots played in Zone 1 and Zone 2 than AB de Villiers, a much higher percentage in Zone 3 and a slightly higher percentage in Zone 4. His shot selection is skewed towards one or two zones of the ground, which is to be expected as most consider him a traditional cricket player. AB de Villiers's shot distribution, on the other hand, is much more spread out, as he is considered to be a much more versatile and inventive player.

Eigenvalues and Eigenvectors

We have already considered the concept of eigenvalues and eigenvectors earlier. We will now use eigenvectors, along with the matrices we have previously obtained, to come up with an eigenvector that helps us determine AB de Villiers's shot distribution.

As mentioned before, the eigenvector $\vec{v}$ of a matrix $A$ is defined by:

$$A\vec{v} = \lambda\vec{v}$$

where $\lambda$ is a scalar value called the 'eigenvalue'. This means that the linear transformation $A$ on vector $\vec{v}$ is completely defined by $\lambda$.

This can be rewritten as:

$$(A - \lambda I)\vec{v} = 0$$

where $I$ is the identity matrix of the same dimensions as $A$.

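Connecting this to the stochastic steady state matrix for AB de Villiers that I deduced previously, the eigenvector here is the vector equivalent of the steady state matrix,

$$\vec{v} = [W \;\; X \;\; Y \;\; Z]$$

Therefore the equivalent matrix now is:

$$A = \begin{pmatrix} 0.20 & 0.15 & 0.25 & 0.40 \\ 0.30 & 0.15 & 0.25 & 0.30 \\ 0.20 & 0.35 & 0.30 & 0.15 \\ 0.20 & 0.30 & 0.25 & 0.25 \end{pmatrix}$$

And, because the state is written as a row vector, the eigenvector relation takes the form

$$\vec{v} A = \lambda \vec{v}$$

where $\vec{v} = [W \;\; X \;\; Y \;\; Z]$ and λ = 1.

Therefore,

$$[W \;\; X \;\; Y \;\; Z] \, A = 1 \cdot [W \;\; X \;\; Y \;\; Z]$$

Using the laws of matrix multiplication, together with the condition that the four probabilities add up to 1, we get these equations:

(1) W + X + Y + Z = 1
(2) 0.20 W + 0.30 X + 0.20 Y + 0.20 Z = W
(3) 0.15 W + 0.15 X + 0.35 Y + 0.30 Z = X
(4) 0.25 W + 0.25 X + 0.30 Y + 0.25 Z = Y
(5) 0.40 W + 0.30 X + 0.15 Y + 0.25 Z = Z

As this system contains 5 equations and 4 variables, the simultaneous equation working is too long and tedious to be displayed here. I will instead use technology, in the form of a simultaneous equation solver, to find the values of W, X, Y and Z, which together form the eigenvector of the stochastic matrix for the eigenvalue λ = 1.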

We get:

W = 0.22430539868985766
X = 0.24305398689857696
Y = 0.2631578947368421
Z = 0.26948271967472326

Our resulting eigenvector is therefore (rounded off to four significant figures) [W X Y Z] = [0.2243 0.2431 0.2632 0.2695]
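The simultaneous equations can also be handed to a numerical solver. The sketch below is one way to do this with NumPy's least-squares routine. It assumes a matrix whose second, third and fourth columns are read off equations (3), (4) and (5), and whose first column is filled in so that every row adds up to 1. The names `A`, `steady_state` and `v` are chosen for this example only.

```python
import numpy as np

# ASSUMPTION: columns 2-4 are read off equations (3)-(5); column 1 is
# filled in so that every row adds up to 1.
A = np.array([
    [0.20, 0.15, 0.25, 0.40],
    [0.30, 0.15, 0.25, 0.30],
    [0.20, 0.35, 0.30, 0.15],
    [0.20, 0.30, 0.25, 0.25],
])

def steady_state(matrix):
    """Solve v @ matrix = v together with sum(v) = 1 (5 equations, 4 unknowns)."""
    n = matrix.shape[0]
    # v A = v is the same as (A^T - I) v^T = 0; the extra row of ones
    # adds the condition W + X + Y + Z = 1.
    coefficients = np.vstack([matrix.T - np.eye(n), np.ones((1, n))])
    right_hand_side = np.append(np.zeros(n), 1.0)
    solution, *_ = np.linalg.lstsq(coefficients, right_hand_side, rcond=None)
    return solution

v = steady_state(A)
print(np.round(v, 4))
```

Under these assumptions the result agrees with the values of W, X, Y and Z quoted above to four decimal places, and multiplying it by the matrix returns the same vector.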

This, I believe, is consistent with what I have observed in AB de Villiers's batting as the game progresses, as he tends to favour zones 3 and 4 towards the end of the game. In the next part of my investigation I will use these eigenvector values to investigate AB de Villiers's scoring patterns in case studies of particular tournaments or matches across his cricketing career. The results should in all probability be consistent with what the above math predicts, as I think I have come up with a simple but rather effective predictive model of AB de Villiers's batting.

Investigation to follow

Sources

http://www.visiondummy.com/2014/03/eigenvalues-eigenvectors/