1 Streaks of Successes in Sports
It is very important in probability problems to state the question carefully. For example, suppose that I plan to toss a fair coin five times and wonder, “What is the probability I will obtain 3 or more heads in succession?” A quick, but incorrect, answer goes as follows: the probability of three heads in succession is $(0.5)^3 = 0.125$. This is only the probability that three particular tosses are all heads; it ignores the fact that the streak can begin on any of the first three tosses.

Here is a solution which yields the correct answer. There are 32 possible sequences of H’s and T’s, and each of these sequences is equally likely. The sequences are listed below (1 is for H and 0 is for T), grouped by number of heads. Sequences with 3 or more heads in succession are marked with an asterisk.

11111*
11110* 11101* 11011 10111* 01111*
11100* 11010 10110 01110* 11001 10101 01101 10011 01011 00111*
11000 10100 01100 10010 01010 00110 10001 01001 00101 00011
10000 01000 00100 00010 00001
00000

The probability of the event of interest is 8/32 = 0.25.

Listing every possible sequence is not practical for a large number of tosses, for example when computing the probability of 8 or more heads in succession in a sequence of 100 tosses. There are two alternatives: math theory and computer simulation.

First, we will consider an approximate answer that comes from math theory. Consider again the problem with 5 tosses, where our interest is in obtaining 3 or more heads in succession. Let $A$ be the event of interest. Upon reflection, we see that $A = A_1 \cup A_2 \cup A_3$, where $A_i$ denotes the event that the streak begins on toss $i$. It is easy to calculate the probabilities of the events $A_i$, as follows. First, $P(A_1) = (0.5)^3 = 0.125$. Next, for $A_2$ to occur, the first toss must be a T and the next three tosses all H’s. Thus, $P(A_2) = (0.5)^4 = 0.0625$. Similarly, for $A_3$ to occur, the second toss must be a T and the next three tosses all H’s. Thus, $P(A_3) = (0.5)^4 = 0.0625$. You can verify that the three events $A_i$ are pairwise disjoint.
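The enumeration and the decomposition into $A_1$, $A_2$, $A_3$ can be checked by brute force. The following is a minimal Python sketch, not the author's code; the function names `has_streak` and `count_streak_sequences` are hypothetical, chosen only for this illustration.

```python
from itertools import product

def has_streak(seq, j):
    """Return True if seq contains j or more consecutive 1s (heads)."""
    run = 0
    for toss in seq:
        run = run + 1 if toss == 1 else 0
        if run >= j:
            return True
    return False

def count_streak_sequences(n, j):
    """Enumerate all 2**n sequences; return (number with a streak, total)."""
    sequences = list(product((0, 1), repeat=n))
    hits = sum(has_streak(seq, j) for seq in sequences)
    return hits, len(sequences)

hits, total = count_streak_sequences(5, 3)
print(hits, total, hits / total)  # 8 32 0.25

# The events A_1, A_2, A_3 for 5 tosses, written as sets of sequences.
seqs = list(product((0, 1), repeat=5))
a1 = {s for s in seqs if s[0:3] == (1, 1, 1)}     # HHH on tosses 1-3
a2 = {s for s in seqs if s[0:4] == (0, 1, 1, 1)}  # T, then HHH on tosses 2-4
a3 = {s for s in seqs if s[1:5] == (0, 1, 1, 1)}  # T on toss 2, HHH on tosses 3-5

# Pairwise disjoint, and together they cover every sequence with a streak.
print(len(a1), len(a2), len(a3))                  # 4 2 2
print(not (a1 & a2) and not (a1 & a3) and not (a2 & a3))  # True
```

Because the three sets do not overlap, their sizes (4, 2, and 2 out of 32) add up to the 8 sequences found by direct listing.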
Thus, $P(A) = P(A_1) + P(A_2) + P(A_3) = 0.125 + 2(0.0625) = 0.25$, as obtained earlier. For this problem, the answer from math theory is exact.

Next, we will generalize the math theory approach. Suppose that I am going to toss a fair coin 100 times and I want to know the probability that I obtain 8 or more heads in succession. Let $A$ denote the event of interest. Upon reflection, $A = \bigcup_{i=1}^{93} A_i$, where $A_i$ is the event that a streak of 8 or more heads begins on toss $i$. As above, $P(A_1) = (0.5)^8$ and, for $i \ge 2$, $P(A_i) = (0.5)^9$. Unlike the earlier example, however, the events $A_i$ are not all pairwise disjoint. For example, events $A_1$ and $A_{11}$ can both occur. But the idea behind the approximation is that it is very unlikely that two or more of the $A_i$’s will occur. In any event, it is true that

$$P(A) \le \sum_{i=1}^{93} P(A_i).$$

This upper bound is taken as our approximation to $P(A)$, and the hope is that the approximation will be good. For the problem at hand, the approximation is $(0.5)^8 (1 + 92(0.5)) = 0.1836$.

Is this approximation any good? I will embed this question into a collection of questions. I will continue to consider 100 tosses of a fair coin, but instead of focusing on 8 or more heads, I will focus on $j$ or more heads, for $j \ge 4$.

Table 1: The Probability of One or More Streaks of j or More Heads in 100 Tosses of a Fair Coin. The Simulation Approximation is Based on 10,000 Runs; it is Used to Obtain the Lower and Upper Bounds for the 95% Confidence Interval for the True Probability.

        Math       Simulation   Simulation   Simulation
  j     Approx.    Approx.      Lower Bd     Upper Bd
  4     3.0625     0.9747       0.9716       0.9778
  5     1.5156     0.8022       0.7944       0.8100
  6     0.7500     0.5420       0.5322       0.5518
  7     0.3711     0.3159       0.3068       0.3250
  8     0.1836     0.1743       0.1669       0.1817
  9     0.0908     0.0909       0.0853       0.0965
 10     0.0449     0.0444       0.0404       0.0484
 11     0.0222     0.0228       0.0199       0.0251
 12     0.0110     0.0107       0.0087       0.0127
 13     0.0054     0.0047       0.0034       0.0060
 14     0.0027     0.0018       0.0010       0.0026
 15     0.0013     0.0008       0.0002       0.0014
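Both kinds of approximation in Table 1 can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the author's program: the function names and the seed are hypothetical, and the confidence interval uses the usual normal approximation, which appears consistent with the tabulated bounds. Simulated values will differ slightly from the table because the random draws differ.

```python
import math
import random

def longest_run(seq):
    """Length of the longest run of successes (truthy values) in seq."""
    best = run = 0
    for toss in seq:
        run = run + 1 if toss else 0
        best = max(best, run)
    return best

def sum_of_start_probabilities(n, j):
    """Sum of P(A_i) over the n - j + 1 possible starting tosses (fair coin)."""
    probs = [0.5 ** j] + [0.5 ** (j + 1)] * (n - j)
    return sum(probs)

def simulate_streak_probability(n, j, runs, rng):
    """Monte Carlo estimate with a 95% normal-approximation interval."""
    hits = sum(
        longest_run([rng.random() < 0.5 for _ in range(n)]) >= j
        for _ in range(runs)
    )
    est = hits / runs
    half = 1.96 * math.sqrt(est * (1 - est) / runs)
    return est, est - half, est + half

rng = random.Random(2024)  # arbitrary seed, used only for reproducibility
print(f"math approximation, j = 8: {sum_of_start_probabilities(100, 8):.4f}")  # 0.1836

est, lower, upper = simulate_streak_probability(100, 8, runs=10_000, rng=rng)
print(f"simulation, j = 8: {est:.4f}  95% CI [{lower:.4f}, {upper:.4f}]")
```

Note that the sum of the $P(A_i)$ is only an upper bound; for small $j$ it can exceed 1, as the first two rows of the table show.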
Table 1 gives the math theory approximation of the probability of obtaining $j$ or more heads in succession. It also gives the simulation approximation based on 10,000 runs and the simulation-based lower and upper bounds of the 95% confidence interval for the true probability. Here is my quick assessment of the table. The math approximation seems to be good for $j \ge 9$, a bit too large for $j = 8$, and poor for $j \le 7$.

The math approximation above is for 100 trials, a streak of $j$ or more successes, and probability of success on each trial equal to 0.5. It can be generalized to $n$ trials and probability of success on each trial equal to $p$ (and probability of failure equal to $q = 1 - p$). Here $P(A_1) = p^j$ and, for each of the $n - j$ later starting positions, $P(A_i) = q p^j$. The result is the following, where $A$ is, as before, the event that the sequence of $n$ trials contains a streak of $j$ or more successes.

$$P(A) \le p^j (1 + (n - j) q). \qquad (1)$$

2 Free Throws

Of all the trials in all of sports, one of the leading candidates to be Bernoulli trials is the free throw in basketball games. After all, the shot is always from the same distance and there is never any defense. In this section we begin a study of streaks of successes of free throws in NBA regular season basketball games.

Between December 27, 1980, and February 28, 1981, Calvin Murphy of the Houston Rockets made 78 consecutive free throws, then an NBA record. (Murphy’s record was broken by two players in 1993, but more on that later. Murphy’s streak is the third longest in NBA history.) In this section we will use some ideas of probability to investigate this record performance.

It is natural to attempt to use the ideas of the previous section. But there is a problem: we don’t know the value of $p$ for Murphy’s career. In his 13-year NBA career, Murphy made 89.16% of his free throws, 3445 of 3864. A 95% confidence interval estimate of Murphy’s $p$ is [0.8818, 0.9014]. Thus, intuitively, using $p = 0.8916$ in the formula of the previous section seems reasonable. Here’s what we get.

$$P(A) \le (0.8916)^{78} (1 + 3786(0.1084)) = 0.0534.$$
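The arithmetic for Murphy can be reproduced directly from formula (1). The sketch below is illustrative only; the function name `streak_upper_bound` is hypothetical, and the confidence interval for $p$ is assumed to be the usual normal-approximation interval, which reproduces the endpoints quoted above.

```python
import math

def streak_upper_bound(n, j, p):
    """Formula (1): p^j (1 + (n - j) q), an upper bound on P(streak of j or more)."""
    q = 1.0 - p
    return p ** j * (1 + (n - j) * q)

made, attempts, streak = 3445, 3864, 78

# Point estimate and 95% normal-approximation interval for Murphy's p.
p_hat = made / attempts
half = 1.96 * math.sqrt(p_hat * (1 - p_hat) / attempts)
print(f"p_hat = {p_hat:.4f}")                                # the text reports 0.8916
print(f"95% CI = [{p_hat - half:.4f}, {p_hat + half:.4f}]")  # the text reports [0.8818, 0.9014]

# The text plugs in the rounded value p = 0.8916, so the same is done here.
bound = streak_upper_bound(attempts, streak, 0.8916)
print(f"upper bound = {bound:.4f}")                          # the text reports 0.0534
```

Using the rounded $p = 0.8916$ rather than the unrounded ratio matters slightly: the bound is sensitive to $p$ because $p$ is raised to the 78th power.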
Next, I simulated 10,000 careers for Murphy and found that 480 of them (4.80%) had a streak of 78 or more successes. The 95% confidence interval for the true probability is [0.0438, 0.0522]. I am a bit surprised that the probability is so large because Murphy was selected for study precisely because he held the record. More on this later.

I want to address a weakness in the above analysis, namely the assumption that $p$ is known. There is a way to attack our problem on streaks without this assumption. There is a very useful fact for Bernoulli trials. Suppose that you have $n$ Bernoulli trials. Then conditional on there being $x$ successes, all sequences that yield $x$ successes are equally likely. (Each such sequence has the same unconditional probability, $p^x (1 - p)^{n - x}$.) Note that the numerical value of $p$ is irrelevant.

Here is a simple example. Suppose we have $n = 4$ Bernoulli trials and observe $x = 2$ successes. Then the following six sequences are equally likely to have occurred.

1100 1010 1001 0110 0101 0011

This suggests another way to study Murphy’s record. Conditional on his making 3445 of 3864, all sequences of 3445 successes and 419 failures are equally likely. Moreover, it is very easy to simulate these sequences: simply have the computer randomly order a sequence of 3445 ones and 419 zeroes. I simulated 10,000 such sequences and found that 481 sequences had a longest streak of 78 or more! This is almost identical to what we found in the case with $p$ known.

In 1993 two players exceeded Murphy’s record of 78 consecutive free throws: Mahmoud Abdul-Rauf with 81 and Micheal (yes, this is how he spells it) Williams with 97. We will investigate these performances next.

Abdul-Rauf’s career totals are 1051 free throws made in 1161 attempts, giving $\hat{p} = 0.9053$. Let $A$ be the event that he would have a streak of 81 or more successes at some point in his career. First, using our mathematical approximation, the result is

$$P(A) \approx (0.9053)^{81} (1 + 1080(0.0947)) = 0.0327.$$
Second, I simulated 10,000 careers for Abdul-Rauf, conditioning on his actual performance of 1051 makes and 110 misses. In 231 simulated careers Abdul-Rauf achieved a streak of 81 or more successful free throws. This gives a point estimate of 0.0231 and a 95% confidence interval of [0.0202, 0.0260]. I conclude that the math approximation is not very good in this case.

The career totals for Williams are 1545 makes in 1780 attempts, for a success rate of 0.8680.
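The conditional simulation used above for Murphy and Abdul-Rauf, in which a fixed number of makes and misses is put in random order, can be sketched as follows. This is a minimal illustration rather than the author's program: the function names and the seed are hypothetical, and only 2,000 careers are simulated to keep the run short, so the estimate is noisier than the 10,000-run figure quoted in the text.

```python
import random

def has_streak(seq, j):
    """Return True if seq contains j or more consecutive 1s (made free throws)."""
    run = 0
    for shot in seq:
        run = run + 1 if shot == 1 else 0
        if run >= j:
            return True
    return False

def conditional_streak_estimate(makes, misses, streak, runs, rng):
    """Estimate P(streak of `streak` or more | career totals) by random reordering."""
    career = [1] * makes + [0] * misses
    hits = 0
    for _ in range(runs):
        rng.shuffle(career)  # every ordering of the fixed totals is equally likely
        if has_streak(career, streak):
            hits += 1
    return hits / runs

rng = random.Random(1993)  # arbitrary seed, used only for reproducibility

# Abdul-Rauf: 1051 makes, 110 misses, streak of 81 or more.
est = conditional_streak_estimate(1051, 110, 81, runs=2_000, rng=rng)
print(f"estimated P(streak >= 81) = {est:.4f}")
```

No value of $p$ appears anywhere in the code; the career totals alone determine the set of equally likely sequences.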