A note on the exponentiation approximation of the birthday paradox

Kaiji Motegi∗ and Sejun Woo† Kobe University Kobe University July 3, 2021

Abstract

In this note, we shed new light on the exponentiation approximation of the that all K individuals have distinct birthdays across N calendar days. The exponen- tiation approximation imposes a pairwise independence assumption, which does not hold in general. We sidestep it by deriving the for each pair of individuals to have distinct birthdays given that previous pairs do. An interesting im- plication is that the conditional probability decreases in a step-function form —not in a strictly monotonical form— as the more pairs are restricted to have distinct birthdays. The source of the step-function structure is identified and illustrated. The equivalence between the proposed pairwise approach and the existing approach is also established.

MSC 2010 Classification Codes: 97K50, 60-01, 65C50.

Keywords: birthday problem, pairwise approach, permutation approach, probability.

∗Corresponding author. Graduate School of Economics, Kobe University. 2-1 Rokkodai-cho, Nada, Kobe, Hyogo 657-8501 Japan. E-mail: [email protected] †Department of Economics, Kobe University. 2-1 Rokkodai-cho, Nada, Kobe, Hyogo 657-8501 Japan. E-mail: [email protected]

1 1 Introduction

Suppose that each of K individuals has his/her birthday on one of N calendar days consid- ered. Let K¯ = {1,...,K} be the set of individuals, and let N¯ = {1,...,N} be the set of ¯ calendar days, where 2 ≤ K ≤ N. Let Ak,n be an event that individual k ∈ K has his/her birthday on day n ∈ N¯ . Assume that each of all N K events occurs with uniform probability: ( ) ∩K 1 Pr A = , ∀{n , . . . , n } ∈ N¯ K . (1) k,nk N K 1 K k=1

Define the target :

Q(N,K) = Pr(The birthdays of the K individuals are all distinct), P (N,K) = Pr(At least two of the K individuals share the same birthday) = 1 − Q(N,K).

Q(N,K) (resp. P (N,K)) takes a much smaller (resp. larger) value than it sounds, which is well known as the birthday paradox or the birthday problem [1–10]. Let P be the number ∏ N K K−1 − − of of choosing K out of N days, then N PK = i=0 (N i) = N!/(N K)!. Q(N,K) can be characterized exactly by

P Q∗(N,K) = N K . (2) N K

Let us call (2) the permutation approach, since it exhausts all possible permutations such that the K individuals have distinct birthdays. All K individuals have distinct birthdays if and only if all pairs of individuals do. Based on this insight, what we call the pairwise approach characterizes Q(N,K) as the product of the conditional probabilities that each pair have distinct birthdays given that previous pairs do. A conventional simplification called the exponentiation approximation imposes an (incorrect) assumption of pairwise independence, leading to an approximated value of Q(N,K)[1,5,9]. In this note, we sidestep the pairwise independence assumption by characterizing the key conditional probabilities explicitly. An interesting implication is that the conditional probability decreases in a step-function form —not in a strictly monotonical form— as the more pairs are restricted to have distinct birthdays. The source of the step-function structure

2 Table 1: The order of the K C2 pairs of individuals

i ki ℓi i ki ℓi ... i ki ℓi 1 1 2 --- ... - -- 2 1 3 K 2 3 ... - -- 3 1 4 K + 1 2 4 ... - -- ......

K − 1 1 K 2K − 3 2 K ... K C2 K − 1 K

{ } { ∈ This table presents the order of all possible pairs of individuals ki, ℓi i∈{1,...,K C2}, where ki = min k {1,...,K − 1} | i ≤ k{K − 0.5(k + 1)}} and ℓi = K − [ki{K − 0.5(ki + 1)} − i]. is identified and illustrated. The equivalence between the permutation approach and the proposed pairwise approach is also established. The rest of this note is organized as follows. In Section 2, the exponentiation approxi- mation is described. In Section 3, our main results are established. In Section 4, numerical examples which illustrate the main results are presented. In Section 5, brief concluding remarks are provided.

2 Exponentiation approximation

Let C be the number of combinations of choosing 2 out of K individuals, then C = K 2 K 2 ¯ ¯ K P2/2. Define the set of all possible pairs as B = {{k, ℓ} ⊆ K k < ℓ}. Let Bi = {ki, ℓi} th signify the i pair with i ∈ {1,..., K C2}. Assume without loss of generality that the pairs are ordered in an intuitively straightforward way: { ( )}

k + 1 ki = min k ∈ {1,...,K − 1} i ≤ k K − , (3) { ( ) } 2 k + 1 ℓ = K − k K − i − i . (4) i i 2

See Table 1 to visually understand the structures of (3) and (4). Note that ki increases in i in a step-function form.

Let DB be an event that a pair B ∈ B¯ has distinct birthdays. Note that ( ) K∩C2

Q(N,K) = Pr DBi (5) i=1

3 ( ) − K∏C2 i∩1 = Pr(DB ) Pr DB DB (6) 1 i j i=2 j=1

K∏C2 ≈ Pr(DB1 ) Pr(DBi ) (7) (i=2 ) N − 1 K∏C2 N − 1 = (8) N N ( )i=2 N − 1 K C2 = ≡ Qap(N,K), (9) N where (5) holds since all K individuals have distinct birthdays if and only if all K C2 pairs do; (6) follows by the definition of conditional probability; (7) approximately holds by assuming | ∩i−1 ≥ that Pr(DBi j=1 DBj ) = Pr(DBi ) for all i 2, what we call the pairwise independence assumption;(8) follows by (1). Qap(N,K) defined in (9) is called the exponentiation approx- imation of Q(N,K), since (N − 1)/N is raised by K C2.

3 Main results

The exponentiation approximation itself is a well-known approximation [1,5,9], but its prop- erties have not been fully explored. By ordering the K C2 pairs as in Table 1, we can obtain new and interesting implications on the exponentiation approximation. First, redefine the key conditional probability appearing in (6): ( ) − i∩1 Q (N,K) ≡ Pr DB DB , ∀i ∈ {1,..., C }, (10) i i j K 2 j=1

− where it is understood that Q1(N,K) = Pr(DB1 ) = (N 1)/N. Then, (6) can be rewritten as

K∏C2 ∗∗ Q(N,K) = Qi(N,K) ≡ Q (N,K). (11) i=1

Let us call (11) the pairwise approach, since it exploits all possible pairs of individuals. A main contribution of this note is that Qi(N,K) has been characterized in an exact, compact form.

4 Theorem 1. We have that

N − ki Qi(N,K) = , ∀i ∈ {1,..., K C2}, (12) N − ki + 1 where ki is defined in (3).

Proof. By (10) and the definition of conditional probability, we have that { } ∩ ¯ K i {n1, . . . , nK } ∈ N DB { j=1 j } ∀ ∈ { } Qi(N,K) = ∩ , i 1,..., K C2 . (13) { } ∈ N¯ K i−1 n1, . . . , nK j=1 DBj

Consider i ∈ {1,...,K − 1} so that ki = 1. Then, we have by (13) and Table 1 that

− − N(N − 1)ℓi 1N K ℓi N − 1 Qi(N,K) = − − = , N(N − 1)ℓi 2N K ℓi+1 N where ℓi is defined in (4). Similarly, consider i ∈ {K,..., 2K − 3} so that ki = 2. Then, we have that − − N(N − 1)(N − 2)ℓi 2(N − 1)K ℓi N − 2 Qi(N,K) = − − = . N(N − 1)(N − 2)ℓi 3(N − 1)K ℓi+1 N − 1

Generally, for i ∈ {1,..., K C2}, we have that {∏ } ki ℓi−ki−1 K−ℓi N (N − j) (N − ki) (N − ki + 1) − { j=1 } N ki Qi(N,K) = ∏ = . k − i − − ℓi−ki−2 − K−ℓi+1 N ki + 1 N j=1(N j) (N ki) (N ki + 1)

Given Theorem 1, it is straightforward to characterize the pairwise approach Q∗∗(N,K).

Corollary 1. We have that

K∏C2 N − k Q∗∗(N,K) = i . (14) N − k + 1 i=1 i

Proof. Substitute (12) into (11).

The permutation approach Q∗(N,K) and the pairwise approach Q∗∗(N,K) are mutually equivalent characterizations of Q(N,K).

5 Corollary 2. Q∗(N,K) = Q∗∗(N,K) for any (N,K).

Proof. We have that

− ( ) − K∏1 N − i K i Q∗∗(N,K) = (15) N − i + 1 i=1 P = N K (16) N K = Q∗(N,K), (17) where (15) follows by substituting (3) into (14) (also see Table 1); (16) follows by straight- forward rearrangements; (17) follows by (2).

Remark 1. Recall from (3) and Table 1 that ki increases in a step-function form as i increases. Hence, it is evident from (12) that Qi(N,K) decreases in a step-function form —not in a strictly monotonical form— as the more pairs of individuals are restricted to have distinct birthdays. This is a curious property that has not been documented in the existing literature.

An intuition behind Remark 1 is as follows. First, Qi−1(N,K) = Qi(N,K) when ki−1 = ki; the reason for the constant probability is that the birthday of individual ℓi is allowed to be the same as or different from the birthdays of individuals {ki + 1, . . . , ℓi−1}. Second,

Qi−1(N,K) > Qi(N,K) when ki−1 + 1 = ki; the reason for the decreased probability is that the birthday of individual ℓi is restricted be the same as the birthdays of individuals

{1, . . . , ℓi−1}. These two features result in the step-function decrease of Qi(N,K). Another direct implication from Theorem 1 is that the exponentiation approximation Qap(N,K) overestimates the true probability Q(N,K).

Corollary 3. Q(N,K) ≤ Qap(N,K) for any (N,K).

Proof. Equation (12) implies that

N − 1 Q (N,K) = , ∀i ∈ {1,...,K − 1}, (18) i N N − 1 Q (N,K) < , ∀i ∈ {K,..., C }. (19) i N K 2

Hence, we have by (9) and (14) that Q(N,K) ≤ Qap(N,K).

6 Schwarz [9] demonstrated that Q∗(N,K) ≤ Qap(N,K) by applying the Bernoulli in- equality and the mathematical induction. Corollary 3 demonstrates that Q∗∗(N,K) ≤ Qap(N,K). The two approaches are essentially equivalent to each other in view of Corollary 2: Q∗(N,K) = Q∗∗(N,K). The latter approach, however, is simpler than the former since Q∗∗(N,K) has a more similar structure with Qap(N,K) than Q∗(N,K) does.

4 Numerical examples

To illustrate the main results derived in Section 3, we provide a few numerical examples.

First, we visualize the step-function structure of Qi(N,K) with N ∈ {5, 6} and K ∈ {4, 5}. { } For each case, Qi(N,K) i∈{1,...,K C2} is computed by (12) and plotted in Figure 1. When N = K = 5, we have that   0.8 if i ∈ {1, 2, 3, 4},   0.75 if i ∈ {5, 6, 7}, Qi(N,K) =  (20) 0.667 if i ∈ {8, 9},   0.5 if i = 10.

Since N = 5, we have that (N − 1)/N = 0.8. Equation (20) implies that Qi(N,K) =

(N −1)/N for i ∈ {1,...,K−1} and Qi(N,K) < (N −1)/N for i ∈ {K,..., K C2}, confirming (18) and (19). The permutation approach, the pairwise approach, and the exponentiation approach lead to:

P Q∗(N,K) = N K = 0.0384, N K K∏C2 ∗∗ Q (N,K) = Qi(N,K) = 0.0384, (i=1 ) N − 1 K C2 Qap(N,K) = = 0.1074. N

The permutation and pairwise approaches coincide with each other, confirming Corollary 2. Further, as established in Corollary 3, the exponentiation approximation overestimates the true probability: Q(N,K) < Qap(N,K). Similar implications can be drawn from other cases of (N,K); see Figure 1. Fixing N, the larger value of K leads to the greater decay of Qi(N,K) in i, reflecting the harder situation

7 for individuals to have distinct birthdays. In Figure 2, Q(N,K) and Qap(N,K) are compared in larger-scale scenarios with N ∈ {10, 15, 20, 25} and K ∈ {2,...,N}. Q(N,K) can be computed via either the permutation approach (2) or the pairwise approach (14), while Qap(N,K) can be computed via (9). As expected from Corollary 3, Q(N,K) ≤ Qap(N,K) for any (N,K) considered. The difference between them is fairly small, indicating that the exponentiation approximation is accurate for sufficiently large N. Fix K = 8, then Q(N,K) ∈ {0.018, 0.101, 0.198, 0.286} or equivalently P (N,K) = 1 − Q(N,K) ∈ {0.982, 0.899, 0.802, 0.714} for N ∈ {10, 15, 20, 25}, respectively. These results indicate that the probability that at least 2 out of K = 8 individuals have the same birthday decreases as the number of available calendar days increases. Note, however, that P (N,K) takes a relatively large value for each N, confirming the birthday paradox.

5 Conclusion

In this note, we shed new light on the exponentiation approximation of the probability that all K individuals have distinct birthdays across N calendar days. The exponentiation ap- proximation imposes the pairwise independence assumption, which does not hold in general. We sidestep it by deriving the conditional probability for each pair of individuals to have distinct birthdays given that previous pairs do. An interesting implication is that the condi- tional probability decreases in a step-function form —not in a strictly monotonical form— as the more pairs are restricted to have distinct birthdays. The source of the step-function structure has been identified and illustrated. Further, we have demonstrated that the ex- ponentiation approximation always overestimates the true probability of distinct birthdays. The equivalence between the proposed pairwise approach and the existing permutation ap- proach has also been established.

Acknowledgements

We thank Jay Dennis, Shigeyuki Hamori, and seminar participants at Kobe University for helpful comments and discussions.

8 Figure 1: The conditional pairwise probability of distinct birthdays, Qi(N,K)

1 1

0.8 0.8

0.6 0.6

0.4 0.4

0.2 0.2

0 0 1 2 3 4 5 6 1 2 3 4 5 6 7 8 9 10

a) (N,K) = (5, 4) b) (N,K) = (5, 5)

1 1

0.8 0.8

0.6 0.6

0.4 0.4

0.2 0.2

0 0 1 2 3 4 5 6 1 2 3 4 5 6 7 8 9 10

c) (N,K) = (6, 4) d) (N,K) = (6, 5)

{ } This figure plots the conditional pairwise probability of distinct birthdays Qi(N,K) i∈{1,...,K C2}, where the number of calendar days is N ∈ {5, 6} and the number of individuals is K ∈ {4, 5}. When K = 4, the pairs are ordered as B1 = {1, 2}, B2 = {1, 3}, B3 = {1, 4}, B4 = {2, 3}, B5 = {2, 4}, B6 = {3, 4}. When K = 5, the pairs are ordered as B1 = {1, 2}, B2 = {1, 3}, B3 = {1, 4}, B4 = {1, 5}, B5 = {2, 3}, B6 = {2, 4}, B7 = {2, 5}, B { } B { } B { } | ∩i−1 8 = 3, 4 , 9 = 3, 5 , 10 = 4, 5 . By definition, Qi(N,K) = Pr(DBi j=1 DBj ), where DBi signifies the event that the ith pair have distinct birthdays.

9 Figure 2: The probability of distinct birthdays Q(N,K) and the exponentiation approxima- tion Qap(N,K)

1 1

0.8 0.8

0.6 0.6

0.4 0.4

0.2 0.2

0 0 2 4 6 8 10 2 4 6 8 10 12 14

a) N = 10 b) N = 15

1 1

0.8 0.8

0.6 0.6

0.4 0.4

0.2 0.2

0 0 5 10 15 20 5 10 15 20 25

c) N = 20 d) N = 25

This figure plots the probability of distinct birthdays Q(N,K) and its exponentiation approximation Qap(N,K), where the number of calendar days is N ∈ {10, 15, 20, 25} and the number of individuals is K ∈ {2, 3,...,N}. Q(N,K) can be computed via either the permutation approach Q∗(N,K) = P /N K ∏ N K ∗∗ K C2 − − −1 or the pairwise approach Q (N,K) = i=1 (N ki)(N ki + 1) , and it is plotted as a red line with circle markers. The exponentiation approximation leads to Qap(N,K) = {(N − 1)/N}K C2 , and it is plotted as a black line with triangle markers.

10 References

[1] G. Blom and L. Holst, Some properties of similar pairs, Adv. Appl. Prob., 21, (1989), 941–944.

[2] W. Feller, An Introduction to and Its Applications, Vol. 1, Wiley, (1968), 3rd edn.

[3] F. H. Mathis, A generalized birthday problem, SIAM Rev., 33, (1991), 265–270.

[4] E. H. Mckinney, Generalized birthday problem, Am. Math. Mon., 73, (1966), 385–387.

[5] F. Mosteller, Understanding the Birthday Problem, Math. Teacher, 55, (1962), 322– 325.

[6] A. G. Munford, A note on the uniformity assumption in the birthday problem, Am. Stat., 31, (1977), 119–119.

[7] J. I. Naus, An extension of the birthday problem, Am. Stat., 22, (1968), 27–29.

[8] B. Saperstein, The generalized birthday problem, J. Amer. Statist. Assoc., 67, (1972), 425–428.

[9] W. Schwarz, Approximating the birthday problem, Am. Stat., 42, (1988), 195–196.

[10] K. Suzuki, D. Tonien, K. Kurosawa and K. Toyota, Birthday paradox for multi- collisions, Information Security and Cryptology, ICISC 2006, M. S. Rhee, B. Lee (eds.), (2006), 29–40.

11