A (Probably) Exact Solution to the Birthday Problem

A (Probably) Exact Solution to the Birthday Problem David Brink July 2011 Abstract. Given a year with n ≥ 1 days, the Birthday Problem asks for the minimal number X (n) such that in a class of X (n) students, the probability of finding two students with the same birthday is at least 50 percent. We derive heuristically an exact formula for X (n) and argue that the probability that a counter-example to this formula exists is less than one in 45 billion. We then give a new derivation of the asymptotic expansion of Ramanujan's Q-function and note its curious resemblance to the formula for X (n). 1 Introduction It is a surprising fact, apparently first noticed by Davenport around 1927, that in a class of 23 students, the probability of finding two students with the same birthday is more than 50 percent. This observation, and the multitude of related problems that it raises, have been discussed by many distinguished authors, including von Mises [19], Gamow [9, pp. 204{206], Littlewood [17, p. 18], Halmos [11, pp. 103{104], and Davenport [6, pp. 174{175]. The interested reader is referred to [22] for a historical account and to [12] to experience the problem experimentally. Given a year with n ≥ 1 days, the variant of the Birthday Problem studied here asks for the minimal number X (n) such that in a class of X (n) students, the probability of a birthday coincidence is at least 50 percent. In other words, X (n) is the minimal integer x such that 1 2 x − 1 P (x) := 1 − 1 − ··· 1 − (1) n n n 1 is less than or equal to 2 . Clearly, X (n) is at least 2 and at most n + 1. The first 99 values of X (n), Sloane's A033810 [23], are given here: n 1{2 3{5 6{9 10{16 17{23 24{32 33{42 43{54 55{68 69{82 83{99 X (n) 2 3 4 5 6 7 8 9 10 11 12 The bounds q 1 1 q 1 3 n + 4 + 2 ≤ X (n) < 2n log 2 + 4 + 2 (2) 1 1 2 x−1 x2−x 1 2 follow easily from P (x) ≥ 1 − n + n + ··· + n = 1 − 2n and P (x) ≤ exp − n exp − n ··· x−1 x2−x 1 exp − n = exp − n . Mathis [18] showed the better lower bound q 1 1 2n log 2 + 4 − 2 < X (n): (3) p Ahmed and McIntosh [1] gave a short proof of the asymptotic approximation X (n) ∼ 2n log 2. In the following, we show 3 − 2 log 2 p p < X (n) − 2n log 2 ≤ 9 − 86 log 2: (4) 6 As will be seen, these bounds are optimal in the sense that (3 − 2 log 2)=6 ≈ 0:269 is the infimum p p of the sequence X (n) − 2n log 2, while 9 − 86 log 2 ≈ 1:279 is the maximum, taken for n = 43. Contrary to (2) and (3), the bounds (4) are sufficiently tight to give the exact value of X (n) in most cases, for example X (365) = 23. p p In general, it follows from (4) that X (n) always equals either 2n log 2 or 2n log 2 + 1 where dxe denotes the ceiling function. We prove that the formula lp m X (n) = 2n log 2 (5) holds for a set of integers n with asymptotic density (3 + 2 log 2)=6 ≈ 0:731, and moreover that p 3 − 2 log 2 X (n) = 2n log 2 + (6) 6 holds for \almost all" n, i.e., for a set of integers n with asymptotic density 1. Furthermore, we show that the formula & ' p 3 − 2 log 2 9 − 4(log 2)2 2(log 2)2 X (n) = 2n log 2 + + − (7) 6 72p2n log 2 135n referred to in the title of this paper holds for all n up to 1018, and we give a heuristic argument that the probability that a counter-example to this formula exists is less than one in 45 billion. Finally, we use the method developed here to give a new derivation of the asymptotic expansion of Ramanujan's Q-function which has a curious resemblance to (7). 2 Power Sum Polynomials It follows immediately from the definition that X (n) can be described as the minimal integer x such that − log P (x) ≥ log 2: (8) 2 1Cf. [11, p. 104], but note that the inequality P (x) < e−x =2n given there is not correct. 2 By (1) and the Taylor expansion of log(1 − z), we may write 1 2 x − 1 − log P (x) = − log 1 − − log 1 − − · · · − log 1 − n n n 1 1 1 X 1i X 2i X (x − 1)i = + + ··· + (9) i · ni i · ni i · ni i=1 i=1 i=1 1 X 1i + 2i + ··· + (x − 1)i = : i · ni i=1 This derivation is due to Tsaban [24]. For i ≥ 1, the sum 1i + 2i + ··· + (x − 1)i can be expressed as i 1 X i + 1 S (x) := B xi+1−t (10) i i + 1 t t t=0 where Bt is the t-th Bernoulli number, cf. [10]. It was this with this famous formula that Jakob Bernoulli, as related in his Ars Conjectandi [3], computed 110 + 210 + ··· + 100010 = 91409924241424243424241924242500 within half of a quarter of an hour! It appears that Si(x) is a polynomial of degree i+1 with leading 1 coefficient i+1 . The first few such power sum polynomials are 1 2 1 S1(x) = 2 x − 2 x; 1 3 1 2 1 S2(x) = 3 x − 2 x + 6 x; 1 4 1 3 1 2 S3(x) = 4 x − 2 x + 4 x : Incidentally, the beautiful identity 13 + 23 + ··· + x3 = (1 + 2 + ··· + x)2 was discovered by the Indian mathematician Aryabhata in the fifth century AD, cf. [13]. Using (9) and (10), we can express − log P (x) by the series 1 2 3 2 4 3 2 X Si(x) x − x 2x − 3x + x x − 2x + x L(x) := = + + + ··· (11) i · ni 2n 12n2 12n3 i=1 The function P (x) is only defined for integers x > 0; hence − log P (x) = L(x) holds for such x only. It seems we can now define L(x) for all real x > 0 and that, if we find a solution x to the equation L(x) = log 2, we may conclude X (n) = dxe. This idea, however, turns out not to work because (11) diverges for all real numbers x > 0 except x = 1; 2; : : : ; n. The reason is that for x fixed and i going to infinity, Si(x) grows much faster for non-integral values of x than for integral values. More precisely, the \normalized" functions (2π)i+1 · S (x) 2i! i converge (pointwise) to ± sin(2πx) or ±(cos(2πx) − 1) for i going to infinity in a fixed residue class modulo 4, cf. [7]. We will see later how to get around this problem by truncating L(x). 3 For later use, we note the following lemma. Lemma 1. The power sum polynomials Si(x) satisfy 1 i+1 Si(x) < i+1 x for all integral x > 0 (12) as well as 0 i Si(x) > 0 for all real x > 2 . (13) i i Proof. The first inequality follows immediately from Si(x) = 1 +···+(x−1) . To prove the second, write i i X i X i S0(x) = B xi−t = xi − i xi−1 + B xi−t: i t t 2 t t t=0 t=2 i i i−1 i Clearly, x − 2 x is positive for x > 2 . The Bernoulli numbers Bt are zero for t ≥ 3 odd, cf. [10]. For t ≥ 2 even, they are given by Euler's remarkable formula 2t! B = (−1)t=2+1 · ζ(t) · t (2π)t and hence satisfy Bt+2 ζ(t + 2) (t + 2)(t + 1) (t + 2)(t + 1) − = · 2 < 2 : Bt ζ(t) (2π) (2π) From i (i − t)(i − t − 1) t+2 = i (t + 2)(t + 1) t thus follows i i−t−2 2 Bt+2x (i − t)(i − t − 1) i − t+2 < < < 1 i i−t (2πx)2 2πx t Btx i Pi i i−t i for t even, 2 ≤ t ≤ i − 2, and x > 2π . It follows that t=2 t Btx is non-negative for x > 2π since the terms have decreasing absolute values and alternating signs starting with plus. i The assumption x > 2 in (13) in not quite optimal but good enough for our purposes. Figure 1 shows 1 100 a logarithmic plot of S99(x) and 100 x . It is striking how badly (12) fails for small, non-integral x 100 1 (roughly less than 2πe ). For these values of x, S99(x) is approximated well by 100 B100 ·(cos(2πx)−1). 4 Figure 1. 3 The Numbers ct Lemma 2. There is a unique sequence of real numbers c0; c1; c2;::: with c0 > 0 such that the expression p c c c x = c n + c + p2 + 3 + p4 + ··· (14) 1 0 1 n n n n formally satisfies L(x1) = log 2. Proof. Recall the definition of L(x) in (11). Inserting x1 into L(x) gives 1 1 3 c0c1 − c0 + c L(x ) = 1 c2 + p2 6 0 + ··· (15) 1 2 0 n P1 −t=2 Write (15) as L(x1) = t=0 atn .

A (Probably) Exact Solution to the Birthday Problem

The Birthday Problem (2.7)

A Note on the Exponentiation Approximation of the Birthday Paradox

Probability Theory

A Non-Uniform Birthday Problem with Applications to Discrete Logarithms

The Matching, Birthday and the Strong Birthday Problem

Approximations to the Birthday Problem with Unequal Occurrence Probabilities and Their Application to the Surname Problem in Japan*

A FIRST COURSE in PROBABILITY This Page Intentionally Left Blank a FIRST COURSE in PROBABILITY

A Monte Carlo Simulation of the Birthday Problem

This Lecture on Coincidences

Counting Birthday Collisions Using Partitions

Expected Number of Random Duplications Within Or Between Lists

Limited-Birthday Distinguishers for Hash Functions Collisions Beyond the Birthday Bound Can Be Meaningful