A recurring theme in the mathematics of sports
Doug Ensley, Shippensburg University

April is Mathematics Awareness Month! April is also…

• Alcohol Awareness Month
• National Oral Health Month
• Stress Awareness Month
• Jazz Appreciation Month
• Train Safety Month

Esoterica(pedia)

The longest known singles game was one of 80 points between Anthony Fawcett (Rhodesia) and Keith Glass (Great Britain) in the first round of the Surrey, Great Britain Championships on 26 May 1975.

QUESTION: What is the probability of this happening by chance? What assumptions in a model of a tennis game can account for this freakish phenomenon?

Scoring in tennis

• Essentially, a game in tennis is won by the first player to win 4 points, but that player must win by 2 points.
• When the score is tied at 3-3, 4-4, etc., we say the score is at deuce.
• After a deuce score, when the server is up one point, we say the score is ad in, and when the receiver is up one point, the score is ad out.
• The 1975 Fawcett-Glass game had deuce 37 times.

Expected Value

Problem. What is the expected length of a tennis game which begins tied at deuce and in which player A wins a point with probability p?

Background: For a discrete quantitative random variable X (e.g., the number of points in a tennis game), the expected value of X is a weighted average of the possible values of X, each weighted by its probability. Specifically, if the possible values of X are v_0, v_1, v_2, …, then

$$E(X) = \sum_{k} v_k \Pr(X = v_k)$$

Aside: Average Value

Suppose for the experiment of “choosing a random member of the Ensley family” on 04/09/2010, we define the variable X = the age of the person chosen. The following table shows the four possible values of X as well as the probability each is chosen.

Value          14     17     45     46
Pr(X=Value)    0.25   0.25   0.25   0.25

What is the average age of people in the Ensley house today?

$$E(X) = (14)(0.25) + (17)(0.25) + (45)(0.25) + (46)(0.25) = 30.5$$

Tennis, anyone?

Problem. What is the expected length of a tennis game which begins tied at deuce and in which player A wins a point with probability p?

Let X = the number of points that are played after deuce. What is the set of all possible values of X?

According to the definition, the expected value is the infinite series

$$E(X) = \sum_{k=0}^{\infty} k \Pr(X = k)$$

NOTE: X takes only the even values 2, 4, 6, …, and the number of two-point rounds, X/2, follows what probability theory calls a geometric distribution.

Solution. Let S be the set of all outcomes of this experiment. That is,

S = {AA, BB, ABAA, ABBB, BAAA, BABB, ABBAAA, …}

Hence, every element of S is either
• AA alone, or
• BB alone, or
• of the form AB____, or
• of the form BA____,
where the blank is filled by any element of S.

Let L be the average length of a string in S.

Elements of S    Probability    Length
AA alone         p^2            2
BB alone         (1 – p)^2      2
AB____           p(1 – p)       2 + L
BA____           (1 – p)p       2 + L

The average length L of elements of S satisfies the equation

$$L = p^2 (2) + (1-p)^2 (2) + 2p(1-p)(2 + L)$$

Because $p^2 + 2p(1-p) + (1-p)^2 = 1$, this reduces to $L = 2 + 2p(1-p)L$, which has solution

$$L = \frac{2}{2p^2 - 2p + 1} = \frac{2}{p^2 + (1-p)^2}$$

This is the average length of a tennis game beyond “deuce”.
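The closed form for L can be checked numerically. The following is a minimal sketch, not part of the original talk: the function names are hypothetical, and it assumes points are independent with a fixed probability p. It compares the formula with a partial sum of the series for E(X) and with a simple simulation.

```python
import random


def expected_length(p):
    """Closed form from the recursion: L = 2 / (p^2 + (1 - p)^2)."""
    return 2 / (p**2 + (1 - p)**2)


def series_length(p, rounds=500):
    """Partial sum of E(X) = sum of k * Pr(X = k) over the even values k = 2n.

    The game lasts exactly 2n points when the first n - 1 two-point rounds
    are split (AB or BA) and round n is decided (AA or BB).
    """
    split = 2 * p * (1 - p)       # a round returns the score to deuce
    decide = p**2 + (1 - p)**2    # a round ends the game
    return sum(2 * n * split**(n - 1) * decide for n in range(1, rounds + 1))


def simulate_length(p, games=100_000, seed=1):
    """Monte Carlo estimate: play points from deuce until someone leads by 2."""
    rng = random.Random(seed)
    total = 0
    for _ in range(games):
        lead, points = 0, 0
        while abs(lead) < 2:
            lead += 1 if rng.random() < p else -1
            points += 1
        total += points
    return total / games


if __name__ == "__main__":
    for p in (0.5, 0.6, 2 / 3):
        print(p, expected_length(p), series_length(p), simulate_length(p))
```

The three columns should agree closely; for p = 0.5 the closed form gives exactly 4 points.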

NOTE: Probability theory tells us that the variance of the geometric distribution of X is given by $8p(1-p)/(p^2+(1-p)^2)^2$, which has a maximum value of 8 (attained at p = 1/2).

A more general problem

In tennis, a deuce point is always served from the right-hand service court; an ad point is always served from the left-hand service court. Tennis broadcasts often present data on players as if there is no difference. While this might be sound at the highest levels of tennis, it is certainly not true for amateur players. We will try the previous solution method, now allowing the probability p of winning a deuce point and the probability q of winning an ad point to differ.

Seriously?

Problem. What is the expected length of a tennis game which begins tied at deuce and in which player A wins a deuce point with probability p and an ad point with probability q?

Solution. S = {AA, BB, ABAA, ABBB, BAAA, BABB, …}

As before, every element of S is either AA alone, or BB alone, or of the form AB____ or BA____, where the blank is filled by any element of S. The first point of each pair is a deuce point and the second is an ad point.

Elements of S    Probability       Length
AA alone         pq                2
BB alone         (1 – p)(1 – q)    2
AB____           p(1 – q)          2 + L
BA____           (1 – p)q          2 + L

The average length L of elements of S satisfies the equation

$$L = 2pq + 2(1-p)(1-q) + p(1-q)(2+L) + (1-p)q(2+L)$$

which has solution

$$L = f(p,q) = \frac{2}{2pq - p - q + 1} = \frac{2}{pq + (1-p)(1-q)}$$

Examples

Theorem. The expected length L of a tennis game which begins tied at deuce and in which player A wins a deuce point with probability p and an ad point with probability q is given by

$$L = \frac{2}{2pq - p - q + 1} = \frac{2}{pq + (1-p)(1-q)}$$
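The theorem's formula is easy to explore numerically. The sketch below is illustrative only and its function names are hypothetical. The variance function rests on an assumption not stated in the theorem: that the number of two-point rounds is geometric with success probability s = pq + (1 – p)(1 – q). The grid search over p is likewise just one simple way to locate the largest expected length for a fixed q.

```python
def expected_length(p, q):
    """Theorem's formula: L = 2 / (pq + (1 - p)(1 - q))."""
    return 2 / (p * q + (1 - p) * (1 - q))


def length_variance(p, q):
    """Variance of the number of points played after deuce.

    Assumption: the number of two-point rounds is geometric with success
    probability s = pq + (1 - p)(1 - q), so the variance is 4(1 - s) / s^2.
    """
    s = p * q + (1 - p) * (1 - q)
    return 4 * (1 - s) / s**2


def max_length_over_p(q, steps=1000):
    """Grid search over p in [0, 1] for the largest expected length at fixed q."""
    grid = [i / steps for i in range(steps + 1)]
    best_p = max(grid, key=lambda p: expected_length(p, q))
    return best_p, expected_length(best_p, q)


if __name__ == "__main__":
    for q in (0.40, 0.30, 0.20, 0.10):
        best_p, best_len = max_length_over_p(q)
        print(q, best_p, round(best_len, 1), round(length_variance(best_p, q), 1))
```

For q below 1/2 the search lands on p = 1, where the formula reduces to L = 2/q; when p = q the functions reproduce the single-probability results.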

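• When q = 0.40, the maximum L is 5 with variance = 15.0.
• When q = 0.30, the maximum L is 6.7 with variance = 31.1.
• When q = 0.20, the maximum L is 10 with variance = 80.0.
• When q = 0.10, the maximum L is 20 with variance = 360.0.

Each maximum is taken over p and is attained at p = 1, where L = 2/q and the variance is 4(1 – q)/q^2. Even in the last, extreme case, a 74-point game (the 74 points that followed the first deuce in the 80-point Fawcett-Glass game) is about 2.8 standard deviations above the mean.

Alternative game scoring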

Some tennis matches or leagues employ "No-Ad" scoring. Each game proceeds as in regular tennis scoring, but if the score reaches deuce, then the winner of the next point, the seventh in the game, wins the game. The receiver selects which court to receive in. No-ad scoring is most notably used in World Team Tennis, in many recreational leagues, and some Major Mixed doubles events.

Note: This scoring system assumes that the server is not equally effective from deuce and ad courts.

Other problems to approach

Problem. What is the probability that player A (who has probability p of winning a point) wins a tennis game that begins tied at deuce?

Solution. Let t represent the probability of player A winning once the game is tied at deuce. Use recursive thinking to justify the equation

$$t = p^2 + p(1-p)\,t + (1-p)p\,t$$

Solving this equation yields

$$t = \frac{p^2}{(1-p)^2 + p^2}$$
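One way to sanity-check the solution for t is to iterate the recursive equation itself and compare with the closed form. This is a hedged sketch with hypothetical function names, again assuming independent points with fixed probability p; the sample values 0.71 and 0.78 are simply convenient service-point probabilities.

```python
def win_from_deuce(p):
    """Closed form: t = p^2 / (p^2 + (1 - p)^2)."""
    return p**2 / (p**2 + (1 - p)**2)


def win_by_iteration(p, steps=200):
    """Iterate t <- p^2 + 2p(1 - p) t from t = 0.

    The update shrinks the error by a factor 2p(1 - p) <= 1/2 at each step,
    so the iterates converge to the solution of the recursive equation.
    """
    t = 0.0
    for _ in range(steps):
        t = p**2 + 2 * p * (1 - p) * t
    return t


if __name__ == "__main__":
    for p in (0.5, 2 / 3, 0.71, 0.78):
        print(p, round(win_from_deuce(p), 3), round(win_by_iteration(p), 3))
```

With p = 0.71 the formula gives about 0.86, and with p = 0.78 about 0.93: a modest edge on each point becomes a large edge in the game.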

In summary, the probability that A wins a game from deuce is

$$\Pr(A) = \frac{p^2}{p^2 + (1-p)^2}$$

Tennis as a gambling problem

Suppose two players A and B have $2 each, and they play a sequence of games with $1 at stake each time until someone is out of money. This is also known as the Gambler’s Ruin; it generalizes nicely to other starting values. A tennis game tied at deuce has the same structure: each player is two points from winning, and every point moves the score one step toward one player or the other.

Tennis as a board game

State 1: A wins game
State 2: A up 1 point
State 3: DEUCE
State 4: B up 1 point
State 5: B wins game

Markov chains

Say the probability of A winning any point is 2/3. The transition matrix gives the probabilities of moving between states in one point: the entry in Row i, Column j is the probability of moving from State i to State j.

Transition Matrix.

$$M = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 2/3 & 0 & 1/3 & 0 & 0 \\ 0 & 2/3 & 0 & 1/3 & 0 \\ 0 & 0 & 2/3 & 0 & 1/3 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}$$

Matrix multiplication

$$\begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 2/3 & 0 & 1/3 & 0 & 0 \\ 0 & 2/3 & 0 & 1/3 & 0 \\ 0 & 0 & 2/3 & 0 & 1/3 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 2/3 & 0 & 1/3 & 0 & 0 \\ 0 & 2/3 & 0 & 1/3 & 0 \\ 0 & 0 & 2/3 & 0 & 1/3 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}$$

Row 3 times Column 3 …

$$\begin{pmatrix} 0 & 2/3 & 0 & 1/3 & 0 \end{pmatrix} \begin{pmatrix} 0 \\ 1/3 \\ 0 \\ 2/3 \\ 0 \end{pmatrix} = (0)(0) + (2/3)(1/3) + (0)(0) + (1/3)(2/3) + (0)(0) = 4/9$$

… gives the probability of going from State 3 to State 3 in two moves.

$$\begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 2/3 & 0 & 1/3 & 0 & 0 \\ 0 & 2/3 & 0 & 1/3 & 0 \\ 0 & 0 & 2/3 & 0 & 1/3 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}^{2} = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 2/3 & 2/9 & 0 & 1/9 & 0 \\ 4/9 & 0 & 4/9 & 0 & 1/9 \\ 0 & 4/9 & 0 & 2/9 & 1/3 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}$$

The entry in Row i, Column j of M^2 is the probability of the game progressing from State i to State j in exactly 2 moves.
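The entries of the squared matrix can be reproduced exactly with rational arithmetic. This is a small sketch rather than the talk's own computation: `transition_matrix` and `mat_mul` are hypothetical helper names, and states are indexed from 0, so DEUCE (State 3) is index 2.

```python
from fractions import Fraction


def transition_matrix(p):
    """Five states in order: A wins, A up 1, deuce, B up 1, B wins."""
    lose = 1 - p
    return [
        [1, 0, 0, 0, 0],
        [p, 0, lose, 0, 0],
        [0, p, 0, lose, 0],
        [0, 0, p, 0, lose],
        [0, 0, 0, 0, 1],
    ]


def mat_mul(a, b):
    """Plain row-by-column product of two square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


M = transition_matrix(Fraction(2, 3))
M2 = mat_mul(M, M)

if __name__ == "__main__":
    for row in M2:
        print([str(entry) for entry in row])
```

Here `M2[2][2]` is the Row 3 times Column 3 product, 4/9, and every row of `M2` still sums to 1, as a transition matrix must.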

General Matrix Powers

If M is a transition matrix for a game, then the entry in Row i, Column j of M^k is the probability of the game progressing from State i to State j in exactly k moves.

This allows us to compute the probability that a game lasts a specified number of points!

$$\begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 2/3 & 0 & 1/3 & 0 & 0 \\ 0 & 2/3 & 0 & 1/3 & 0 \\ 0 & 0 & 2/3 & 0 & 1/3 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}^{74} \approx \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 0.933 & 5 \times 10^{-14} & 0 & 2 \times 10^{-14} & 0.067 \\ 0.800 & 0 & 9 \times 10^{-14} & 0 & 0.200 \\ 0.533 & 9 \times 10^{-14} & 0 & 5 \times 10^{-14} & 0.467 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}$$

This shows that the probability that the game is still going on after 74 points is about 10^{-13}.

Probability of long games

With p = q = 0.5 …

$$\begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 1/2 & 0 & 1/2 & 0 & 0 \\ 0 & 1/2 & 0 & 1/2 & 0 \\ 0 & 0 & 1/2 & 0 & 1/2 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}^{74} \approx \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 0.750 & 4 \times 10^{-12} & 0 & 4 \times 10^{-12} & 0.250 \\ 0.500 & 0 & 7 \times 10^{-12} & 0 & 0.500 \\ 0.250 & 4 \times 10^{-12} & 0 & 4 \times 10^{-12} & 0.750 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}$$

This shows that the probability that the game is still going on after 74 points is about 10^{-11}. This is the most optimistic outcome for the case where p = q.
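The claim that p = 1/2 is the best case when p = q can be checked without matrices. When p = q, each two-point round returns the score to deuce with probability 2p(1 – p), so a game starting at deuce is still unfinished after 2n points with probability (2p(1 – p))^n. The sketch below uses a hypothetical function name and a coarse grid over p, both chosen only for illustration.

```python
def still_playing(p, points=74):
    """Probability the game is unfinished `points` points after deuce (p = q).

    Only whole two-point rounds matter: a single extra point cannot end a
    game that has just returned to deuce.
    """
    rounds = points // 2
    return (2 * p * (1 - p)) ** rounds


# Coarse grid search over p = 0.00, 0.01, ..., 1.00 for the most favorable value.
best_p = max((i / 100 for i in range(101)), key=still_playing)

if __name__ == "__main__":
    print(best_p, still_playing(best_p))
    print(2 / 3, still_playing(2 / 3))
```

The search returns p = 0.5, where the probability is (1/2)^37, roughly 7 × 10^{-12}.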

With p = 0.67 (entered as 2/3) and q = 0.25 …

$$\begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 1/4 & 0 & 3/4 & 0 & 0 \\ 0 & 2/3 & 0 & 1/3 & 0 \\ 0 & 0 & 1/4 & 0 & 3/4 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}^{74} \approx \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 0.550 & 2 \times 10^{-9} & 0 & 9 \times 10^{-10} & 0.450 \\ 0.400 & 0 & 2 \times 10^{-9} & 0 & 0.600 \\ 0.100 & 6 \times 10^{-10} & 0 & 3 \times 10^{-10} & 0.900 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}$$

This shows that the probability that the game is still going on after 74 points is about 10^{-9}.
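The matrix powers in this section can be reproduced with a few lines of code. The following is a minimal pure-Python sketch, assuming the five-state chain with deuce-point probability p and ad-point probability q; the helper names are hypothetical, and repeated squaring is just one convenient way to form the 74th power.

```python
def transition_matrix(p, q):
    """States in order: A wins, ad in (A up 1), deuce, ad out (B up 1), B wins.

    p = probability A wins a deuce point, q = probability A wins an ad point.
    """
    return [
        [1.0, 0.0, 0.0, 0.0, 0.0],
        [q, 0.0, 1 - q, 0.0, 0.0],
        [0.0, p, 0.0, 1 - p, 0.0],
        [0.0, 0.0, q, 0.0, 1 - q],
        [0.0, 0.0, 0.0, 0.0, 1.0],
    ]


def mat_mul(a, b):
    """Plain row-by-column product of two square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(m, k):
    """k-th power of a square matrix by repeated squaring."""
    n = len(m)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    base = m
    while k > 0:
        if k & 1:
            result = mat_mul(result, base)
        base = mat_mul(base, base)
        k >>= 1
    return result


def still_playing_from_deuce(p, q, points=74):
    """Probability left in the three non-final states, starting from deuce."""
    deuce_row = mat_pow(transition_matrix(p, q), points)[2]
    return deuce_row[1] + deuce_row[2] + deuce_row[3]


if __name__ == "__main__":
    for p, q in ((2 / 3, 2 / 3), (0.5, 0.5), (2 / 3, 0.25), (0.9, 0.05)):
        print(p, q, still_playing_from_deuce(p, q))
```

As a cross-check, each two-point round returns to deuce with probability p(1 – q) + (1 – p)q, so the value for 74 points should match that quantity raised to the 37th power.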

With p = 0.90 and q = 0.05 …

$$\begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 1/20 & 0 & 19/20 & 0 & 0 \\ 0 & 9/10 & 0 & 1/10 & 0 \\ 0 & 0 & 1/20 & 0 & 19/20 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}^{74} \approx \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 0.354 & 4 \times 10^{-3} & 0 & 4 \times 10^{-4} & 0.642 \\ 0.320 & 0 & 4 \times 10^{-3} & 0 & 0.676 \\ 0.02 & 2 \times 10^{-4} & 0 & 2 \times 10^{-5} & 0.984 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}$$

This shows that the probability that the game is still going on after 74 points is about 0.004.

What does the data say?

• In the 2009 Wimbledon Championship, Andy Roddick won 71% of his service points and Roger Federer won 78% of his service points. Federer won 95% (35 of 37) of his service games and Roddick won 97% (37 of 38) of his service games. (There were two tie-breakers played, split between the two players.)
• Based on these point probabilities, the model Pr(A) = p^2/(p^2 + (1 – p)^2) predicts 93% of service games won by Federer (p = 0.78) and 86% by Roddick (p = 0.71).

Other sports with recurrence

Baseball

A game cannot end in a tie, so additional whole innings are played until there is a winner. The longest professional baseball game was a 33-inning affair played in 1981 at McCoy Stadium in Pawtucket, Rhode Island.

Notes. Cal Ripken, Jr. was 2 for 13 for Rochester in this game. Wade Boggs went 4 for 12 for Pawtucket.

An “at bat” can last any number of pitches. We can list the possible states as counts of 0-0, 0-1, 1-0, 1-1, 0-2, 2-0, 1-2, 2-1, 3-0, 2-2, 3-1 or 3-2, together with the outcomes base hit, strikeout, and base on balls. We can then relate the probability p of getting a hit on any given pitch to the official batting average. There are no official records for the number of pitches in an “at bat,” but here is some baseball lore:

• Alex Cora had an 18-pitch at bat against Matt Clement in 2004.
• Roy Thomas (1901) supposedly had a 29-pitch at bat. His ability to foul away pitches supposedly brought about a rule change regarding foul balls.
• Luke Appling supposedly fouled off 17 straight pitches before hitting a triple.
• Phillies’ pitcher Brett Myers had a 9-pitch at bat against CC Sabathia in the 2008 playoffs.

More tennis esoterica

• Most games in a singles match before the introduction of the tiebreaker: In 1969 at Wimbledon, Pancho Gonzales took 112 games to defeat Charlie Pasarell in the first round, 22–24, 1–6, 16–14, 6–3, 11–9.
• Most games in a singles match after the introduction of the tiebreaker: In 2003 at the […], Andy Roddick took 83 games to defeat Younes El Aynaoui in the quarterfinals, 4–6, 7–6(5), 4–6, 6–4, 21–19.
• Most games in a doubles match before the introduction of the tiebreaker: In the American Zone Final of the 1973 […], the United States team of […] and […] took 122 games to defeat the Chile team of Patricio Cornejo and […], 7–9, 37–39, 8–6, 6–1, 6–3.
• Most games in a doubles match after the introduction of the tiebreaker: In 2007 at Wimbledon, the team of […] and André Sá took 102 games to defeat the team of […] and Kevin Ullyett, 5–7, 7–6(4), 4–6, 7–6(7), 28–26.

References

• Math Awareness Month at http://www.mathaware.org
• Tennis Statistics at http://www.atpworldtour.com/
• Baseball Statistics at http://www.baseball-reference.com/

Doug Ensley, Shippensburg University