A Note on the Exponentiation Approximation of the Birthday Paradox
Total Page:16
File Type:pdf, Size:1020Kb
A note on the exponentiation approximation of the birthday paradox Kaiji Motegi∗ and Sejun Wooy Kobe University Kobe University July 3, 2021 Abstract In this note, we shed new light on the exponentiation approximation of the probability that all K individuals have distinct birthdays across N calendar days. The exponen- tiation approximation imposes a pairwise independence assumption, which does not hold in general. We sidestep it by deriving the conditional probability for each pair of individuals to have distinct birthdays given that previous pairs do. An interesting im- plication is that the conditional probability decreases in a step-function form |not in a strictly monotonical form| as the more pairs are restricted to have distinct birthdays. The source of the step-function structure is identified and illustrated. The equivalence between the proposed pairwise approach and the existing permutation approach is also established. MSC 2010 Classification Codes: 97K50, 60-01, 65C50. Keywords: birthday problem, pairwise approach, permutation approach, probability. ∗Corresponding author. Graduate School of Economics, Kobe University. 2-1 Rokkodai-cho, Nada, Kobe, Hyogo 657-8501 Japan. E-mail: [email protected] yDepartment of Economics, Kobe University. 2-1 Rokkodai-cho, Nada, Kobe, Hyogo 657-8501 Japan. E-mail: [email protected] 1 1 Introduction Suppose that each of K individuals has his/her birthday on one of N calendar days consid- ered. Let K¯ = f1;:::;Kg be the set of individuals, and let N¯ = f1;:::;Ng be the set of ¯ calendar days, where 2 ≤ K ≤ N. Let Ak;n be an event that individual k 2 K has his/her birthday on day n 2 N¯ . Assume that each of all N K events occurs with uniform probability: ! \K 1 Pr A = ; 8fn ; : : : ; n g 2 N¯ K : (1) k;nk N K 1 K k=1 Define the target probabilities: Q(N; K) = Pr(The birthdays of the K individuals are all distinct); P (N; K) = Pr(At least two of the K individuals share the same birthday) = 1 − Q(N; K): Q(N; K) (resp. P (N; K)) takes a much smaller (resp. larger) value than it sounds, which is well known as the birthday paradox or the birthday problem [1{10]. Let P be the number Q N K K−1 − − of permutations of choosing K out of N days, then N PK = i=0 (N i) = N!=(N K)!. Q(N; K) can be characterized exactly by P Q∗(N; K) = N K : (2) N K Let us call (2) the permutation approach, since it exhausts all possible permutations such that the K individuals have distinct birthdays. All K individuals have distinct birthdays if and only if all pairs of individuals do. Based on this insight, what we call the pairwise approach characterizes Q(N; K) as the product of the conditional probabilities that each pair have distinct birthdays given that previous pairs do. A conventional simplification called the exponentiation approximation imposes an (incorrect) assumption of pairwise independence, leading to an approximated value of Q(N; K)[1,5,9]. In this note, we sidestep the pairwise independence assumption by characterizing the key conditional probabilities explicitly. An interesting implication is that the conditional probability decreases in a step-function form |not in a strictly monotonical form| as the more pairs are restricted to have distinct birthdays. The source of the step-function structure 2 Table 1: The order of the K C2 pairs of individuals i ki `i i ki `i ... i ki `i 1 1 2 --- ... - -- 2 1 3 K 2 3 ... - -- 3 1 4 K + 1 2 4 ... - -- . K − 1 1 K 2K − 3 2 K ... K C2 K − 1 K f g f 2 This table presents the order of all possible pairs of individuals ki; `i i2f1;:::;K C2g, where ki = min k f1;:::;K − 1g j i ≤ kfK − 0:5(k + 1)gg and `i = K − [kifK − 0:5(ki + 1)g − i]. is identified and illustrated. The equivalence between the permutation approach and the proposed pairwise approach is also established. The rest of this note is organized as follows. In Section 2, the exponentiation approxi- mation is described. In Section 3, our main results are established. In Section 4, numerical examples which illustrate the main results are presented. In Section 5, brief concluding remarks are provided. 2 Exponentiation approximation Let C be the number of combinations of choosing 2 out of K individuals, then C = K 2 K 2 ¯ ¯ K P2=2. Define the set of all possible pairs as B = ffk; `g ⊆ K k < `g. Let Bi = fki; `ig th signify the i pair with i 2 f1;:::; K C2g. Assume without loss of generality that the pairs are ordered in an intuitively straightforward way: { ( )} k + 1 ki = min k 2 f1;:::;K − 1g i ≤ k K − ; (3) { ( ) } 2 k + 1 ` = K − k K − i − i : (4) i i 2 See Table 1 to visually understand the structures of (3) and (4). Note that ki increases in i in a step-function form. Let DB be an event that a pair B 2 B¯ has distinct birthdays. Note that ! K\C2 Q(N; K) = Pr DBi (5) i=1 3 ! − KYC2 i\1 = Pr(DB ) Pr DB DB (6) 1 i j i=2 j=1 KYC2 ≈ Pr(DB1 ) Pr(DBi ) (7) i=2 ! N − 1 KYC2 N − 1 = (8) N N ( )i=2 N − 1 K C2 = ≡ Qap(N; K); (9) N where (5) holds since all K individuals have distinct birthdays if and only if all K C2 pairs do; (6) follows by the definition of conditional probability; (7) approximately holds by assuming j \i−1 ≥ that Pr(DBi j=1 DBj ) = Pr(DBi ) for all i 2, what we call the pairwise independence assumption;(8) follows by (1). Qap(N; K) defined in (9) is called the exponentiation approx- imation of Q(N; K), since (N − 1)=N is raised by K C2. 3 Main results The exponentiation approximation itself is a well-known approximation [1,5,9], but its prop- erties have not been fully explored. By ordering the K C2 pairs as in Table 1, we can obtain new and interesting implications on the exponentiation approximation. First, redefine the key conditional probability appearing in (6): ! − i\1 Q (N; K) ≡ Pr DB DB ; 8i 2 f1;:::; C g; (10) i i j K 2 j=1 − where it is understood that Q1(N; K) = Pr(DB1 ) = (N 1)=N. Then, (6) can be rewritten as KYC2 ∗∗ Q(N; K) = Qi(N; K) ≡ Q (N; K): (11) i=1 Let us call (11) the pairwise approach, since it exploits all possible pairs of individuals. A main contribution of this note is that Qi(N; K) has been characterized in an exact, compact form. 4 Theorem 1. We have that N − ki Qi(N; K) = ; 8i 2 f1;:::; K C2g; (12) N − ki + 1 where ki is defined in (3). Proof. By (10) and the definition of conditional probability, we have that n o T ¯ K i fn1; : : : ; nK g 2 N DB n j=1 j o 8 2 f g Qi(N; K) = T ; i 1;:::; K C2 : (13) f g 2 N¯ K i−1 n1; : : : ; nK j=1 DBj Consider i 2 f1;:::;K − 1g so that ki = 1. Then, we have by (13) and Table 1 that − − N(N − 1)`i 1N K `i N − 1 Qi(N; K) = − − = ; N(N − 1)`i 2N K `i+1 N where `i is defined in (4). Similarly, consider i 2 fK; : : : ; 2K − 3g so that ki = 2. Then, we have that − − N(N − 1)(N − 2)`i 2(N − 1)K `i N − 2 Qi(N; K) = − − = : N(N − 1)(N − 2)`i 3(N − 1)K `i+1 N − 1 Generally, for i 2 f1;:::; K C2g, we have that nQ o ki `i−ki−1 K−`i N (N − j) (N − ki) (N − ki + 1) − n j=1 o N ki Qi(N; K) = Q = : k − i − − `i−ki−2 − K−`i+1 N ki + 1 N j=1(N j) (N ki) (N ki + 1) Given Theorem 1, it is straightforward to characterize the pairwise approach Q∗∗(N; K). Corollary 1. We have that KYC2 N − k Q∗∗(N; K) = i : (14) N − k + 1 i=1 i Proof. Substitute (12) into (11). The permutation approach Q∗(N; K) and the pairwise approach Q∗∗(N; K) are mutually equivalent characterizations of Q(N; K). 5 Corollary 2. Q∗(N; K) = Q∗∗(N; K) for any (N; K). Proof. We have that − ( ) − KY1 N − i K i Q∗∗(N; K) = (15) N − i + 1 i=1 P = N K (16) N K = Q∗(N; K); (17) where (15) follows by substituting (3) into (14) (also see Table 1); (16) follows by straight- forward rearrangements; (17) follows by (2). Remark 1. Recall from (3) and Table 1 that ki increases in a step-function form as i increases. Hence, it is evident from (12) that Qi(N; K) decreases in a step-function form |not in a strictly monotonical form| as the more pairs of individuals are restricted to have distinct birthdays. This is a curious property that has not been documented in the existing literature. An intuition behind Remark 1 is as follows. First, Qi−1(N; K) = Qi(N; K) when ki−1 = ki; the reason for the constant probability is that the birthday of individual `i is allowed to be the same as or different from the birthdays of individuals fki + 1; : : : ; `i−1g.