1 Streaks of Successes in Sports

In probability problems it is important to state the question carefully. For example, suppose that I plan to toss a fair coin five times and wonder, “What is the probability that I will obtain 3 or more heads in succession?” A quick, but incorrect, answer goes as follows.

The probability of three heads in succession is $(0.5)^3 = 0.125$.

Here is a solution which yields the correct answer.

There are 32 possible sequences of H’s and T’s, and each of these sequences is equally likely. The sequences are listed below (1 is for H and 0 is for T), grouped by the number of heads. Sequences with 3 or more heads in succession are marked with an asterisk.

11111*

11110* 11101* 11011 10111* 01111*

11100* 11010 10110 01110* 11001 10101 01101 10011 01011 00111*

11000 10100 01100 10010 01010 00110 10001 01001 00101 00011

10000 01000 00100 00010 00001

00000
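The count of qualifying sequences can be checked by brute force. The following is a minimal sketch rather than part of the original analysis; the function names `has_streak` and `streak_probability` are my own, chosen for illustration.

```python
from itertools import product


def has_streak(seq, length):
    """Return True if seq contains `length` or more consecutive 1s (heads)."""
    run = 0
    for toss in seq:
        run = run + 1 if toss == 1 else 0
        if run >= length:
            return True
    return False


def streak_probability(n_tosses, length):
    """Count equally likely sequences of n_tosses fair-coin tosses with a streak."""
    sequences = list(product((1, 0), repeat=n_tosses))
    favorable = [s for s in sequences if has_streak(s, length)]
    return len(favorable), len(sequences)


favorable, total = streak_probability(5, 3)
print(favorable, total, favorable / total)  # prints: 8 32 0.25
```

Because the number of sequences doubles with every extra toss, this approach is only workable for a small number of tosses.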

The probability of the event of interest is 8/32 = 0.25.

Listing every possible sequence is not practical for a large number of tosses. For example, if I want to compute the probability of 8 or more heads in succession in a sequence of 100 tosses, there are far too many sequences to list. There are two possible solutions: math theory and computer simulation. First, we will consider an approximate answer that comes from math theory.

Consider again the above problem with 5 tosses, where our interest is in obtaining 3 or more heads in succession. Let $A$ be the event of interest. Upon reflection, we see that

$A = A_1 \cup A_2 \cup A_3$, where $A_i$ denotes the event that the streak begins on toss $i$. It is easy to calculate the probabilities of the events $A_i$, as follows. First, $P(A_1) = (0.5)^3 = 0.125$. Next, for $A_2$ to occur, the first toss must be a T and the next three tosses all H’s. Thus, $P(A_2) = (0.5)^4 = 0.0625$. Similarly, for $A_3$ to occur, the second toss must be a T and the next three tosses all H’s. Thus, $P(A_3) = (0.5)^4 = 0.0625$. You can verify that the three events $A_i$ are pairwise disjoint. Thus,

$P(A) = P(A_1) + P(A_2) + P(A_3) = 0.125 + 2(0.0625) = 0.25$, as obtained earlier. For this problem, the answer from math theory is exact.

Next, we will generalize the math theory approach. Suppose that I am going to toss a fair coin 100 times and I want to know the probability that I obtain 8 or more heads in succession. Let $A$ denote the event of interest. Upon reflection, $A = \bigcup_{i=1}^{93} A_i$, where $A_i$ is the event that the streak of 8 or more heads begins on toss $i$. As above, $P(A_1) = (0.5)^8$ and, for $i \ge 2$, $P(A_i) = (0.5)^9$. Unlike the earlier example, however, the events $A_i$ are not all pairwise disjoint. For example, events $A_1$ and $A_{11}$ can both occur. But the idea behind the approximation is that it is very unlikely that two or more of the $A_i$’s will occur. In any event, it is true that

$$P(A) \le \sum_{i=1}^{93} P(A_i).$$

Table 1: The Probability of One or More Streaks of j or More Heads in 100 Tosses of a Fair Coin. The Simulation Approximation is Based on 10,000 Runs; it is Used to Obtain the Lower and Upper Bounds for the 95% Confidence Interval for the True Probability.

| j | Math Approx. | Simulation Approx. | Simulation Lower Bd | Simulation Upper Bd |
|---|---|---|---|---|
| 4 | 3.0625 | 0.9747 | 0.9716 | 0.9778 |
| 5 | 1.5156 | 0.8022 | 0.7944 | 0.8100 |
| 6 | 0.7500 | 0.5420 | 0.5322 | 0.5518 |
| 7 | 0.3711 | 0.3159 | 0.3068 | 0.3250 |
| 8 | 0.1836 | 0.1743 | 0.1669 | 0.1817 |
| 9 | 0.0908 | 0.0909 | 0.0853 | 0.0965 |
| 10 | 0.0449 | 0.0444 | 0.0404 | 0.0484 |
| 11 | 0.0222 | 0.0228 | 0.0199 | 0.0251 |
| 12 | 0.0110 | 0.0107 | 0.0087 | 0.0127 |
| 13 | 0.0054 | 0.0047 | 0.0034 | 0.0060 |
| 14 | 0.0027 | 0.0018 | 0.0010 | 0.0026 |
| 15 | 0.0013 | 0.0008 | 0.0002 | 0.0014 |

This upper bound is taken as our approximation to P (A) and the hope is that the approximation will be good. For the problem at hand, the approximation is

$$(0.5)^8 (1 + 92(0.5)) = 0.1836.$$
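The arithmetic behind this value can be spelled out using only the probabilities given above: one term for a streak beginning on toss 1 and 92 equal terms for streaks beginning on tosses 2 through 93.

$$\sum_{i=1}^{93} P(A_i) = (0.5)^8 + 92\,(0.5)^9 = (0.5)^8 \bigl(1 + 92(0.5)\bigr) = \frac{47}{256} \approx 0.1836.$$

Note that this quantity is a sum of probabilities, not itself a probability, so nothing prevents it from exceeding 1 when the streak length is short.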

Is this approximation any good? I will embed this question into a collection of questions. I will continue to consider 100 tosses of a fair coin, but instead of focusing on 8 or more heads, I will focus on $j$ or more heads, for $j \ge 4$. Table 1 gives the math theory approximation of the probability of obtaining $j$ or more heads in succession. It also gives the simulation approximation based on 10,000 runs and the simulation-based lower and upper bounds of the 95% confidence interval for the true probability.

Here is my quick assessment of the table. The math approximation seems to be good for $j \ge 9$, a bit too large for $j = 8$, and poor for $j \le 7$.

The math approximation above is for 100 trials, a streak of $j$ or more successes, and probability of success on each trial equal to 0.5. It can be generalized to $n$ trials and probability of success on each trial equal to $p$ (and probability of failure equal to $q = 1 - p$). The result is the following, where $A$ is, as before, the event that the sequence of $n$ trials contains a streak of $j$ or more successes.

$$P(A) \le p^j \bigl(1 + (n - j)q\bigr). \tag{1}$$
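Both columns of Table 1 can be approximated with a short script. What follows is a rough sketch, not the program used to produce the table: the function names are my own, the random seed is arbitrary, and the confidence interval is assumed to be the usual normal-approximation interval (which is consistent with the bounds shown in Table 1). Simulated values will differ somewhat from the table because the random numbers differ.

```python
import math
import random


def streak_bound(n, j, p):
    """Formula (1): p^j * (1 + (n - j) * q), with q = 1 - p."""
    q = 1.0 - p
    return p ** j * (1 + (n - j) * q)


def longest_run(trials):
    """Length of the longest run of successes (truthy values) in trials."""
    best = current = 0
    for t in trials:
        current = current + 1 if t else 0
        best = max(best, current)
    return best


def simulate_streak_prob(n, j, p, runs=10_000, seed=0):
    """Estimate P(streak of j or more in n Bernoulli trials) by simulation.

    Returns (estimate, lower, upper), where the bounds are an assumed
    normal-approximation 95% confidence interval.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(runs):
        tosses = [rng.random() < p for _ in range(n)]
        if longest_run(tosses) >= j:
            hits += 1
    est = hits / runs
    half = 1.96 * math.sqrt(est * (1 - est) / runs)
    return est, est - half, est + half


if __name__ == "__main__":
    for j in (6, 8, 10):
        est, lo, hi = simulate_streak_prob(100, j, 0.5)
        print(j, round(streak_bound(100, j, 0.5), 4),
              round(est, 4), round(lo, 4), round(hi, 4))
```

For example, `streak_bound(100, 8, 0.5)` reproduces the value 0.1836 obtained above.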

2 Free Throws

Of all the trials in all of sports, free throws in games are among the leading candidates to be Bernoulli trials. After all, a free throw is always from the same distance and there is never any defense. In this section we begin a study of streaks of successes of free throws in NBA regular season basketball games.

Between December 27, 1980, and February 28, 1981, Calvin Murphy made 78 consecutive free throws, then an NBA record. (Murphy’s record was broken by two players in 1993, but more on that later. Murphy’s streak is the third longest in NBA history.) We will use some ideas of probability to investigate this record performance.

It is natural to attempt to use the ideas of the previous section. But there is a problem. We don’t know the value of $p$ for Murphy’s career. In his 13-year NBA career, Murphy made 89.16% of his free throws, 3445 of 3864. A 95% confidence interval estimate of Murphy’s $p$ is [0.8818, 0.9014]. Thus, intuitively, using $p = 0.8916$ in the formula of the previous section seems reasonable. Here’s what we get.

$$P(A) \le (0.8916)^{78} \bigl(1 + 3786(0.1084)\bigr) = 0.0534.$$
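The two numbers quoted for Murphy can be reproduced as follows. This is an illustrative sketch: I am assuming the confidence interval is the standard normal-approximation (Wald) interval, which does reproduce the quoted endpoints, and the function names are my own.

```python
import math


def wald_interval(successes, trials, z=1.96):
    """Normal-approximation confidence interval for a proportion (assumed method)."""
    p_hat = successes / trials
    half = z * math.sqrt(p_hat * (1 - p_hat) / trials)
    return p_hat - half, p_hat + half


def streak_bound(n, j, p):
    """Formula (1): p^j * (1 + (n - j) * q), with q = 1 - p."""
    return p ** j * (1 + (n - j) * (1 - p))


made, attempts, streak = 3445, 3864, 78
low, high = wald_interval(made, attempts)
p = round(made / attempts, 4)  # 0.8916, rounded as in the text
bound = streak_bound(attempts, streak, p)

print(round(low, 4), round(high, 4))  # interval for Murphy's p
print(round(bound, 4))                # math approximation for a streak of 78
```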

Next, I simulated 10,000 careers for Murphy and found that 480 of them (4.80%) had a streak of 78 or more successes. The 95% confidence interval for the true probability is [0.0438, 0.0522]. I am a bit surprised that the probability is so large because Murphy was selected for study precisely because he held the record. More on this later.

I want to address a weakness in the above analysis, namely the assumption that $p$ is known. There is a way to attack our problem on streaks without this assumption. There is a very useful fact for Bernoulli trials. Suppose that you have $n$ Bernoulli trials. Then conditional on there being $x$ successes, all sequences that yield $x$ successes are equally likely. Note that the numerical value of $p$ is irrelevant.

Here is a simple example. Suppose we have $n = 4$ Bernoulli trials and observe $x = 2$ successes. Then the following six sequences are equally likely to have occurred.

1100 1010 1001 0110 0101 0011
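This fact is easy to check numerically for the small example. The sketch below (my own illustration, with a hypothetical helper name `conditional_probs`) uses exact fractions to compute the conditional probability of each sequence for two different values of $p$; in both cases every sequence gets probability 1/6.

```python
from fractions import Fraction
from itertools import product


def conditional_probs(n, x, p):
    """P(sequence | exactly x successes) for each length-n sequence with x successes."""
    q = 1 - p
    seqs = [s for s in product((1, 0), repeat=n) if sum(s) == x]
    # Every such sequence has the same unconditional probability p^x * q^(n - x).
    joint = {s: p ** x * q ** (n - x) for s in seqs}
    total = sum(joint.values())
    return {"".join(map(str, s)): pr / total for s, pr in joint.items()}


for p in (Fraction(1, 2), Fraction(9, 10)):
    print(p, conditional_probs(4, 2, p))
```

The value of $p$ cancels because each sequence with $x$ successes has the same unconditional probability, $p^x q^{n-x}$.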

This suggests another way to study Murphy’s record. Conditional on his making 3445 of 3864, all sequences of 3445 successes and 419 failures are equally likely. Moreover, it is very easy to simulate these sequences: simply have the computer randomly order a sequence of 3445 ones and 419 zeroes. I simulated 10,000 such sequences and found that 481 sequences had a longest streak of 78 or more! This is almost identical to what we found in the case with $p$ known.

In 1993 two players exceeded Murphy’s record of 78 consecutive free throws: Mahmoud Abdul-Rauf with 81 and Micheal (yes, this is how he spells it) Williams with 97. We will investigate these performances next.

Abdul-Rauf’s career totals are 1051 free throws made in 1161 attempts, giving $\hat{p} = 0.9053$. Let $A$ be the event that he would have a streak of 81 or more successes at some point in his career. First, using our mathematical approximation, the result is

$$P(A) \approx (0.9053)^{81} \bigl(1 + 1080(0.0947)\bigr) = 0.0327.$$
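The shuffle-based simulation described above for Murphy (randomly ordering a fixed number of ones and zeroes) might be sketched as follows. This is my own illustration rather than the program used for the reported results; the function names, the seed, and the reduced default number of runs in the usage line are assumptions, and the estimate will vary with the random numbers drawn.

```python
import random


def longest_run(seq):
    """Length of the longest run of 1s in seq."""
    best = current = 0
    for s in seq:
        current = current + 1 if s else 0
        if current > best:
            best = current
    return best


def shuffled_streak_fraction(makes, misses, streak, runs=10_000, seed=0):
    """Fraction of random orderings of makes/misses whose longest run >= streak."""
    rng = random.Random(seed)
    career = [1] * makes + [0] * misses
    hits = 0
    for _ in range(runs):
        rng.shuffle(career)  # every ordering is equally likely
        if longest_run(career) >= streak:
            hits += 1
    return hits / runs


if __name__ == "__main__":
    # Murphy's totals: 3445 makes, 419 misses, streak of 78 (fewer runs for speed).
    print(shuffled_streak_fraction(3445, 419, 78, runs=1_000))
```

Only the counts of makes and misses enter the calculation, so no value of $p$ has to be assumed.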

Second, I simulated 10,000 careers for Abdul-Rauf, conditioning on his actual performance of 1051 makes and 110 misses. In 231 simulated careers Abdul-Rauf achieved a streak of 81 or more successful free throws. This gives a point estimate of 0.0231 and a 95% confidence interval of [0.0202, 0.0260]. I conclude that the math approximation is not very good in this case.

The career totals for Williams are 1545 makes in 1780 attempts, for a success rate of 0.8680. Let $A$ be the event that he would have a streak of 97 or more successes at some point in his career. First, using our mathematical approximation, the result is

$$P(A) \approx (0.8680)^{97} \bigl(1 + 1683(0.1320)\bigr) = 0.000243.$$

Second, I simulated 10,000 careers for Williams, conditioning on his actual performance of 1545 makes and 235 misses. In one simulated career Williams achieved a streak of 97 or more successful free throws. This gives a point estimate of 0.0001.

In summary, the achievements of Murphy and Abdul-Rauf were mildly surprising, whereas the achievement of Williams was quite remarkable.

Table 2: Statistics and Approximate Probabilities of Streaks for 13 Selected NBA Players. The Math Approximation is Obtained from Formula 1 with $p = \hat{p}$. Columns headed “Theory” give the approximate probability from theory; columns headed “Simulation” give the estimate based on a 10,000 run simulation study.

| Name | Made | Total | Pct. | Theory ≥ 78 | Theory ≥ 81 | Theory ≥ 97 | Simulation ≥ 78 | Simulation ≥ 81 | Simulation ≥ 97 |
|---|---|---|---|---|---|---|---|---|---|
| Mahmoud Abdul-Rauf | 1051 | 1161 | 0.9053 | 0.0442 | 0.0327 | 0.0066 | 0.0334 | 0.0231 | 0.0031 |
| Rick Barry | 3818 | 4243 | 0.8998 | 0.1109 | 0.0807 | 0.0148 | 0.0951 | 0.0679 | 0.0128 |
| (name missing) | 3960 | 4471 | 0.8857 | 0.0389 | 0.0270 | 0.0039 | 0.0366 | 0.0253 | 0.0043 |
| (name missing) | 2973 | 3390 | 0.8770 | 0.0146 | 0.0099 | 0.0012 | 0.0124 | 0.0083 | 0.0015 |
| Karl Malone | 9619 | 12963 | 0.7420 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| (name missing) | 5841 | 6593 | 0.8859 | 0.0586 | 0.0407 | 0.0058 | 0.0524 | 0.0361 | 0.0056 |
| Calvin Murphy | 3445 | 3864 | 0.8916 | 0.0534 | 0.0378 | 0.0060 | 0.0481 | 0.0330 | 0.0058 |
| (name missing) | 3389 | 3871 | 0.8755 | 0.0148 | 0.0099 | 0.0012 | 0.0130 | 0.0088 | 0.0017 |
| (name missing) | 2135 | 2362 | 0.9039 | 0.0821 | 0.0605 | 0.0119 | 0.0750 | 0.0558 | 0.0112 |
| Bill Sharman | 3143 | 3559 | 0.8831 | 0.0251 | 0.0173 | 0.0024 | 0.0212 | 0.0140 | 0.0025 |
| (name missing) | 1548 | 1741 | 0.8891 | 0.0193 | 0.0136 | 0.0020 | 0.0150 | 0.0095 | 0.0008 |
| Kiki Vandeweghe | 3484 | 3997 | 0.8717 | 0.0112 | 0.0074 | 0.0008 | 0.0109 | 0.0068 | 0.0012 |
| Micheal Williams | 1545 | 1780 | 0.8680 | 0.0036 | 0.0024 | 0.0002 | 0.0045 | 0.0026 | 0.0001 |

But let me return to a statement I made earlier:

“I am a bit surprised that the probability is so large because Murphy was selected for study precisely because he holds the record.”

I now want to consider several of the truly great shooters in the NBA who have not been included in the analysis above. Table 2 presents selected statistics for some of the top free throw shooters in NBA history. In the table, “Made” is the total number of made free throws during the player’s career; “Total” is the total number of free throws attempted; and “Pct.” is Made divided by Total. The next three columns present the approximate probability that the player makes $j$ or more successive free throws at some point in his career, where $j$ = 78, 81, and 97, and $p$ is taken to equal Pct. The final three columns are the estimates of these three probabilities based on 10,000 run simulation studies that conditioned on the total number of makes and misses. The reader is encouraged to spend some time comparing the various entries in the table.

The unusual case in Table 2 is Karl Malone. He is included because he holds the NBA career records for most free throws attempted and made. Note, however, that his percentage made, 0.742, is not particularly large. For each of the methods of study, Malone’s chance of having a streak of 78 or more successes is 0.0000. Thus, simply having a large number of attempts does not make a long streak likely. In fact, the longest simulated streak in 10,000 careers for Malone was 62.

The values in Table 2 are for selected players. For example, using math theory, 0.1109 is the approximate probability that a player with Rick Barry’s career numbers would achieve a streak of 78 or more consecutive successes. Now we address a somewhat different question. For $j$ = 78, 81 and 97, what is the probability that at least one player in NBA history would have a streak of $j$ or more successes?
Each of these probabilities, whatever it is, is larger than the probability that at least one player in Table 2 would have a streak of $j$ or more successes.

Table 3: The Probability that at Least One Player in Table 2 Would Have a Streak of Specified Length.

| Method | Length of Streak: 78 | 81 | 97 |
|---|---|---|---|
| Math Theory | 0.3891 | 0.2940 | 0.0555 |
| Simulation | 0.3494 | 0.2572 | 0.0495 |

Finally, the probability that at least one player would have a particular streak is one minus the probability that no player would have such a streak. Table 3 presents the probabilities that at least one player in Table 2 would have a streak of $j$ or more successes. These probabilities are calculated two ways: using the math theory approximation and using the simulation study results.

Examine the numbers in Table 3. What do you conclude? Here is my analysis. All of the probabilities are below 0.5, so each of these streaks is greater than the median of the distribution of the longest streak. So, at least in this weak way, the three longest streaks are pretty long. The probabilities for 78 and 81 are respectably large, making the existence of such streaks not terribly surprising. The streak of 97, however, is borderline notable. (As mentioned earlier, this was a pretty amazing performance by Micheal Williams.)

What is the impact of my leaving out nearly all NBA players? First, I am sorry, but these are the only data I can obtain easily. The example of Karl Malone suggests that we can safely ignore any player who shot below 75% for his career. A notable absence from the list is Michael Jordan, who made 7327 of 8772 free throws in his career, or 83.53%. Using the math theory approach, Jordan’s approximate probabilities are low: 0.0011 for 78, 0.0007 for 81, and 0.0000 for 97. My guess is that if we included all NBA players, the probabilities for 78 would approach 0.5, the probabilities for 81 would increase a bit, and the probabilities for 97 would barely change. But these are just my guesses; a more complete analysis is needed.
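The “one minus the probability that no player would have such a streak” calculation can be sketched from the Table 2 columns for a streak of 78 or more. The sketch assumes that the players’ careers are independent, so that the probability of no streak for anyone is the product of the individual no-streak probabilities; the variable and function names are my own.

```python
import math

# Per-player probabilities of a streak of 78 or more, copied from Table 2.
theory_78 = [0.0442, 0.1109, 0.0389, 0.0146, 0.0000, 0.0586, 0.0534,
             0.0148, 0.0821, 0.0251, 0.0193, 0.0112, 0.0036]
simulation_78 = [0.0334, 0.0951, 0.0366, 0.0124, 0.0000, 0.0524, 0.0481,
                 0.0130, 0.0750, 0.0212, 0.0150, 0.0109, 0.0045]


def at_least_one(probs):
    """P(at least one event occurs), assuming the events are independent."""
    return 1.0 - math.prod(1.0 - p for p in probs)


print(round(at_least_one(theory_78), 4))      # compare with Table 3, Math Theory
print(round(at_least_one(simulation_78), 4))  # compare with Table 3, Simulation
```

With these inputs the results agree with the 78 column of Table 3 to within rounding.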

3 Three Point Baskets

As mentioned earlier, free throws seem a natural candidate for Bernoulli trials: every shot is the same and there is no defense. In this section we will examine three point attempts. Often, there is little direct defense on a three point shot, so perhaps Bernoulli trials make sense.

The most consecutive three point field goals made, no misses, is 13. It was achieved by Brent Price from January 15–19, 1996, and matched by Terry Mills from December 4–7, 1996. The third and fourth longest streaks are both 11, achieved by Scott Wedman from December 21, 1984 to March 31, 1985 (over three months without a miss!) and matched by Jeff Hornacek, December 30, 1994 to January 11, 1995. (For some insight into why it took Wedman so long to attempt 11 shots, see

http://www.stat.wisc.edu/~wardrop/articles/3point_html.)

My analysis will mimic the analysis of free throws. Table 4 presents regular season data for 16 of the best three point shooters in NBA history. The data are through the end of the 2002–03 regular season.

We will begin by looking at the individual accomplishments of the four men listed above. The probability that someone with Brent Price’s career would make 13 or more in succession is estimated to be 0.0025 (math approximation) or 0.0018 (simulation). For Mills, the numbers are 0.0036 or 0.0032. The probability that someone with Wedman’s career would make 11 or more in succession is estimated to be 0.0009 or 0.0010. For Hornacek, the numbers are 0.0555 or 0.0515. I conclude that Hornacek’s streak was mildly surprising, but that the other three streaks were of quite remarkable lengths. But we need to adjust for these players being selected from a pool of great shooters because of their streaks.

Table 4: Statistics and Approximate Probabilities of Streaks of Three Point Baskets for Selected NBA Players. The Math Approximation is Obtained from Formula 1 with $p = \hat{p}$. Columns headed “Theory” give the approximate probability from theory; columns headed “Simulation” give the estimate based on a 10,000 run simulation study.

| Name | Made | Total | Pct. | Theory ≥ 11 | Theory ≥ 13 | Theory ≥ 15 | Simulation ≥ 11 | Simulation ≥ 13 | Simulation ≥ 15 |
|---|---|---|---|---|---|---|---|---|---|
| B.J. Armstrong | 432 | 1020 | 0.4235 | 0.0458 | 0.0082 | 0.0015 | 0.0418 | 0.0076 | 0.0015 |
| (name missing) | 1090 | 2652 | 0.4110 | 0.0880 | 0.0149 | 0.0025 | 0.0838 | 0.0137 | 0.0027 |
| (name missing) | 728 | 1650 | 0.4412 | 0.1131 | 0.0220 | 0.0043 | 0.1009 | 0.0212 | 0.0041 |
| (name missing) | 1719 | 4266 | 0.4030 | 0.1157 | 0.0188 | 0.0030 | 0.1084 | 0.0188 | 0.0021 |
| Jeff Hornacek | 828 | 2055 | 0.4029 | 0.0555 | 0.0090 | 0.0015 | 0.0515 | 0.0077 | 0.0012 |
| Alan Houston | 1187 | 2965 | 0.4003 | 0.0750 | 0.0120 | 0.0019 | 0.0727 | 0.0118 | 0.0017 |
| (name missing) | 726 | 1599 | 0.4540 | 0.1466 | 0.0302 | 0.0062 | 0.1358 | 0.0316 | 0.0069 |
| (name missing) | 260 | 603 | 0.4312 | 0.0324 | 0.0060 | 0.0011 | 0.0264 | 0.0053 | 0.0007 |
| Reggie Miller | 2330 | 5854 | 0.3980 | 0.1397 | 0.0221 | 0.0035 | 0.1288 | 0.0221 | 0.0043 |
| Terry Mills | 530 | 1370 | 0.3869 | 0.0242 | 0.0036 | 0.0005 | 0.0224 | 0.0032 | 0.0003 |
| Wesley Person | 1054 | 2527 | 0.4171 | 0.0976 | 0.0170 | 0.0029 | 0.0932 | 0.0167 | 0.0036 |
| Drazen Petrovic | 255 | 583 | 0.4374 | 0.0362 | 0.0069 | 0.0013 | 0.0319 | 0.0050 | 0.0005 |
| Brent Price | 363 | 938 | 0.3870 | 0.0166 | 0.0025 | 0.0004 | 0.0140 | 0.0018 | 0.0004 |
| (name missing) | 1554 | 3868 | 0.4018 | 0.1017 | 0.0164 | 0.0026 | 0.0976 | 0.0165 | 0.0031 |
| Trent Tucker | 575 | 1410 | 0.4078 | 0.0430 | 0.0071 | 0.0012 | 0.0434 | 0.0080 | 0.0013 |
| Scott Wedman | 84 | 251 | 0.3347 | 0.0009 | 0.0001 | 0.0000 | 0.0010 | 0.0001 | 0.0000 |

Following the earlier argument for free throws, the probability that at least one of the players in Table 4 would achieve a streak of 11 or more is estimated to be 0.6964 (math approximation) or 0.6689 (simulation). Thus, it is not surprising that somebody achieved 11 consecutive successes. (I have not calculated the probability that four or more would do so.)

Similarly, the probability that at least one of the players in Table 4 would achieve a streak of 13 or more is estimated to be 0.1801 (math approximation) or 0.1754 (simulation). Thus, it is not too surprising that somebody achieved 13 consecutive successes. (I have not calculated the probability that two or more would do so; it might be interesting to do so.)

Finally, the probability that at least one of the players in Table 4 would achieve a streak of 15 or more is estimated to be 0.0339 (both methods). Thus, it is not at all surprising that nobody has done this.