A Probabilistic Proof of Wallis's Formula for π
STEVEN J. MILLER

There are many beautiful formulas for $\pi$ (see for example [4]). The purpose of this note is to introduce an alternate derivation of Wallis's product formula, equation (1), which could be covered in a first course on probability, statistics, or number theory. We quickly review other famous formulas for $\pi$, recall some needed facts from probability, and then derive Wallis's formula. We conclude by combining some of the other famous formulas with Wallis's formula to derive an interesting expression for $\log(\pi/2)$ (equation (5)).

Often in a first-year calculus course students encounter the Gregory-Leibniz formula,
\[ \frac{\pi}{4} = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots = \sum_{n=0}^{\infty} \frac{(-1)^n}{2n+1}. \]
The proof uses the fact that the derivative of $\arctan x$ is $1/(1+x^2)$, so $\pi/4 = \int_0^1 dx/(1+x^2)$. To complete the proof, expand the integrand with the geometric series formula and then justify interchanging the order of integration and summation.

Another interesting formula involves Bernoulli numbers and the Riemann zeta function. The Bernoulli numbers $B_k$ are the coefficients in the Taylor series
\[ \frac{t}{e^t - 1} = 1 - \frac{t}{2} + \sum_{k=2}^{\infty} \frac{B_k t^k}{k!}; \]
each $B_k$ is rational. The Riemann zeta function is $\zeta(s) = \sum_{n=1}^{\infty} n^{-s}$, which converges for real part of $s$ greater than 1. Using complex analysis one finds (see for instance [10, p. 365] or [18, pp. 179–180]) that
\[ \zeta(2k) = -\frac{(-4)^k B_{2k}}{2 \cdot (2k)!} \cdot \pi^{2k}, \]
yielding formulas for $\pi$ to any even power. In particular, $\pi^2/6 = \sum_n n^{-2}$ and $\pi^4/90 = \sum_n n^{-4}$.$^{1}$

Date: June 24, 2008.
2000 Mathematics Subject Classification. 60E05 (primary), 60F05, 11Y60 (secondary).
Key words and phrases. Wallis's Formula, t-Distribution, Gamma Function.

$^{1}$ An amusing consequence of these formulas is a proof of the infinitude of primes. Using unique factorization, one can show that $\zeta(s)$ also equals $\prod_p (1 - p^{-s})^{-1}$, where $p$ runs over all primes. As $\pi^2$ is irrational and $\zeta(2) = \pi^2/6$, there must be infinitely many primes: if there were only finitely many then $\pi^2/6 = \prod_p (1 - p^{-2})^{-1}$ would be rational! See [13] for explicit lower bounds on $\pi(x)$ derivable from upper bounds for the irrationality measure of $\zeta(2)$, and [14] for more details on the numerous connections between $\zeta(s)$ and number theory.

One of the most interesting formulas for $\pi$ is a multiplicative one due to Wallis (1665):
\[ \frac{\pi}{2} = \frac{2 \cdot 2}{1 \cdot 3} \cdot \frac{4 \cdot 4}{3 \cdot 5} \cdot \frac{6 \cdot 6}{5 \cdot 7} \cdot \frac{8 \cdot 8}{7 \cdot 9} \cdots = \prod_{n=1}^{\infty} \frac{2n \cdot 2n}{(2n-1)(2n+1)}. \tag{1} \]
Common proofs use the infinite product expansion for $\sin x$ (see [18, p. 142]) or induction to prove formulas for integrals of powers of $\sin x$ (see [3, p. 115]). We present a mostly elementary proof using standard facts about probability distributions encountered in a first course on probability or statistics (and hence the title).$^{2}$ The reason we must write "mostly elementary" is that at one point we appeal to the Dominated Convergence Theorem. It is possible to bypass this and argue directly, and we sketch the main ideas for the interested reader.

Recall that a continuous function $f(x)$ is a continuous probability distribution if (1) $f(x) \ge 0$ and (2) $\int_{-\infty}^{\infty} f(x)\,dx = 1$. We immediately see that if $g(x)$ is a non-negative continuous function whose integral is finite and nonzero, then there exists an $a > 0$ such that $a g(x)$ is a continuous probability distribution (take $a = 1/\int_{-\infty}^{\infty} g(x)\,dx$). This simple observation is a key ingredient in our proof, and is an extremely important technique in mathematics; the proof of Wallis's formula is but one of many applications.$^{3}$ In fact, this observation greatly simplifies numerous calculations in random matrix theory, which has successfully modeled diverse systems ranging from energy levels of heavy nuclei to the prime numbers; see [5, 14] for introductions to random matrix theory and [11] for applications of this technique to the subject.
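As a numerical aside, both the product (1) and the normalization observation can be probed in a few lines. The following is only a minimal sketch: the function names, the truncation of the real line to a finite interval, and the test function $g(x) = e^{-|x|}$ (whose integral is 2, so $a = 1/2$) are illustrative assumptions and not part of the argument in this note.

```python
import math


def wallis_partial_product(num_factors: int) -> float:
    """Product of (2n)(2n) / ((2n-1)(2n+1)) for n = 1..num_factors, as in (1)."""
    product = 1.0
    for n in range(1, num_factors + 1):
        product *= (2.0 * n) * (2.0 * n) / ((2.0 * n - 1.0) * (2.0 * n + 1.0))
    return product


def normalizing_constant(g, lower, upper, num_steps=200_000):
    """Approximate a = 1 / (integral of g over [lower, upper]) by the midpoint rule.

    Assumption: g is negligible outside [lower, upper], so truncating the
    real line to this interval changes the integral very little.
    """
    width = (upper - lower) / num_steps
    total = sum(g(lower + (i + 0.5) * width) for i in range(num_steps))
    return 1.0 / (total * width)


def two_sided_exponential(x: float) -> float:
    """Illustrative choice g(x) = exp(-|x|); its integral over the real line is 2."""
    return math.exp(-abs(x))


if __name__ == "__main__":
    # Partial products of (1) increase slowly toward pi/2.
    for num_factors in (10, 1_000, 100_000):
        print(num_factors, wallis_partial_product(num_factors), math.pi / 2)
    # The constant a that turns g into a probability density.
    print("a for exp(-|x|):", normalizing_constant(two_sided_exponential, -50.0, 50.0))
```

The partial products approach $\pi/2$ from below and converge slowly, which is one reason the formula is more notable for its structure than as a way to compute $\pi$.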
One of the purposes of this paper is to introduce students to the consequences of this simple observation.

Our proof relies on two standard functions from probability, the Gamma function and the Student t-distribution. The Gamma function $\Gamma(x)$ is defined by
\[ \Gamma(x) = \int_0^{\infty} e^{-t} t^{x-1}\,dt. \]
Note that this integral is well defined if the real part of $x$ is positive. Integrating by parts yields $\Gamma(x+1) = x\Gamma(x)$. This implies that if $n$ is a nonnegative integer then $\Gamma(n+1) = n!$; thus the Gamma function generalizes the factorial function (see [17] for more on the Gamma function, including another proof of Wallis's formula involving the Gamma function). We need the following:

Claim: $\Gamma(1/2) = \sqrt{\pi}$.

$^{2}$ For a statistical proof involving an experiment and data, see the chapter on Buffon's needle in [1] (page 133): if you have infinitely many parallel lines $d$ units apart, then the probability that a "randomly" dropped rod of length $\ell \le d$ crosses one of the lines is $2\ell/(\pi d)$. Thus you can calculate $\pi$ by throwing many rods on the grid and counting the number of intersections.

$^{3}$ A nice application of Wallis's formula is in determining the universal constant in Stirling's formula for $n!$; see [15] for some history and applications.

Proof. In the integral for $\Gamma(1/2)$, change variables by setting $u = \sqrt{t}$ (so $dt = 2u\,du = 2\sqrt{t}\,du$). This yields
\[ \Gamma(1/2) = 2\int_0^{\infty} e^{-u^2}\,du = \int_{-\infty}^{\infty} e^{-u^2}\,du. \]
This integral is well-known to equal $\sqrt{\pi}$ (see page 542 of [2]). The standard proof is to square the integral and convert to polar coordinates:
\[ \Gamma(1/2)^2 = \int_{-\infty}^{\infty} e^{-u^2}\,du \int_{-\infty}^{\infty} e^{-v^2}\,dv = \int_0^{2\pi}\!\!\int_0^{\infty} e^{-r^2}\, r\,dr\,d\theta = \pi. \]
□

In fact, our proof above shows
\[ \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\,dt = 1. \tag{2} \]
This density is called the standard normal (or Gaussian). This is one of the most important probability distributions, and we shall see it again when we look at the Student t-distribution.
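The facts about the Gamma function and the Gaussian integral (2) are easy to sanity-check numerically. The sketch below uses Python's standard `math.gamma`; the helper `gaussian_mass`, its midpoint rule, and the truncation to $[-10, 10]$ are our own illustrative assumptions, not part of the proof.

```python
import math


def gaussian_mass(half_width: float = 10.0, num_steps: int = 20_000) -> float:
    """Midpoint-rule approximation of the integral in (2).

    Assumption: the real line is truncated to [-half_width, half_width];
    for half_width = 10 the neglected tails are far below double precision.
    """
    width = 2.0 * half_width / num_steps
    total = 0.0
    for i in range(num_steps):
        t = -half_width + (i + 0.5) * width
        total += math.exp(-t * t / 2.0) / math.sqrt(2.0 * math.pi)
    return total * width


if __name__ == "__main__":
    # Gamma(1/2)^2 should agree with pi, and Gamma(n + 1) with n!.
    print("Gamma(1/2)^2:", math.gamma(0.5) ** 2, "vs pi:", math.pi)
    print("Gamma(6):", math.gamma(6), "vs 5!:", math.factorial(5))
    # The standard normal density should integrate to 1.
    print("integral in (2):", gaussian_mass())
```

Such a check confirms nothing rigorously, but it is a quick way to catch a misplaced factor of 2 or $\sqrt{2}$ in the change of variables.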
If $g$ is a continuous probability density, then we say that the random variable $Y$ has distribution $g$ if for any interval $[a, b]$ the probability that $Y$ takes on a value in $[a, b]$ is $\int_a^b g(y)\,dy$. The celebrated Central Limit Theorem (see [6, p. 515] for a proof) states that for many continuous densities $g$, if $X_1, \dots, X_n$ are independent random variables, each with density $g$, then as $n \to \infty$ the distribution of $(Y_n - \mu)/(\sigma/\sqrt{n})$ converges to the standard normal (where $Y_n = (X_1 + \cdots + X_n)/n$ is the sample average, $\mu$ is the mean of $g$, and $\sigma$ is its standard deviation$^{4}$).

The second function we need is the Student$^{5}$ t-distribution (with $\nu$ degrees of freedom):
\[ f_\nu(t) = \frac{\Gamma\!\left(\frac{\nu+1}{2}\right)}{\sqrt{\pi\nu}\,\Gamma\!\left(\frac{\nu}{2}\right)} \left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}} = c_\nu \left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}}; \]
here $\nu$ is a positive integer and $t$ is any real number.

Claim: The Student t-distribution is a continuous probability density.

Proof. As $f_\nu(t)$ is clearly continuous and nonnegative, to show $f_\nu(t)$ is a probability density it suffices to show that it integrates to 1. We must therefore show that
\[ \int_{-\infty}^{\infty} \left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}} dt = \frac{\sqrt{\pi\nu}\,\Gamma\!\left(\frac{\nu}{2}\right)}{\Gamma\!\left(\frac{\nu+1}{2}\right)}. \]

$^{4}$ The mean $\mu$ of a distribution is its average value: $\mu = \int x\, g(x)\,dx$. The standard deviation $\sigma$ measures how spread out a distribution is about its average value: $\sigma^2 = \int (x - \mu)^2 g(x)\,dx$.

$^{5}$ Student was the pen name of William Gosset.

As the integrand is symmetric, we may integrate from 0 to infinity and double the result. Letting $t = \sqrt{\nu}\tan\theta$ (so $dt = \sqrt{\nu}\sec^2\theta\,d\theta$) we find
\[ \int_{-\infty}^{\infty} \left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}} dt = 2\sqrt{\nu} \int_0^{\pi/2} \frac{\sec^2\theta\,d\theta}{\sec^{\nu+1}\theta} = 2\sqrt{\nu} \int_0^{\pi/2} \cos^{\nu-1}\theta\,d\theta. \]
The proof follows immediately from two properties of the Beta function (see [2, p. 560]):
\[ B(p, q) = \frac{\Gamma(p)\Gamma(q)}{\Gamma(p+q)}, \qquad B(m+1, n+1) = 2\int_0^{\pi/2} \cos^{2m+1}(\theta)\,\sin^{2n+1}(\theta)\,d\theta; \tag{3} \]
indeed, taking $m = \nu/2 - 1$ and $n = -1/2$ in (3) gives $2\int_0^{\pi/2} \cos^{\nu-1}\theta\,d\theta = B(\nu/2, 1/2) = \Gamma(\nu/2)\Gamma(1/2)/\Gamma((\nu+1)/2)$, and $\Gamma(1/2) = \sqrt{\pi}$. An elementary proof without appealing to properties of the Beta function is given in Appendix A.
□

The Student t-distribution arises in statistical analyses where the sample size $\nu$ is small and each observation is normally distributed with the same mean and the same (unknown) variance (see [8, 12]). The reason the Student t-distribution is used only for small sample sizes is that as $\nu \to \infty$, $f_\nu(t)$ converges to the standard normal; proving this will yield Wallis's formula. While we can prove this by invoking the Central Limit Theorem, we may also see this directly by recalling that
\[ e^x = \lim_{N \to \infty} \left(1 + \frac{x}{N}\right)^{N}. \]
We therefore have
\[ \lim_{\nu \to \infty} \left(1 + \frac{t^2}{\nu}\right)^{-\nu/2} = \left(e^{t^2}\right)^{-1/2} = e^{-t^2/2}. \]
As $f_\nu(t)$ is a probability distribution for all positive integers $\nu$, it integrates to 1 for all such $\nu$, which is equivalent to
\[ \frac{1}{c_\nu} = \frac{\sqrt{\pi\nu}\,\Gamma\!\left(\frac{\nu}{2}\right)}{\Gamma\!\left(\frac{\nu+1}{2}\right)} = \int_{-\infty}^{\infty} \left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}} dt. \]
Taking the limit as $\nu \to \infty$ yields
\[ \lim_{\nu \to \infty} \frac{1}{c_\nu} = \lim_{\nu \to \infty} \int_{-\infty}^{\infty} \left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}} dt = \int_{-\infty}^{\infty} \lim_{\nu \to \infty} \left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}} dt = \int_{-\infty}^{\infty} e^{-t^2/2}\,dt = \sqrt{2\pi}. \]
Some work is necessary of course to justify interchanging the integral and the limit; this justification is why our argument is only "mostly elementary".
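The limit $1/c_\nu \to \sqrt{2\pi}$ and the pointwise convergence of $f_\nu$ to the standard normal density can be observed numerically. The following is a sketch under our own assumptions: the function names are illustrative, and we evaluate the Gamma quotient through Python's `math.lgamma` (log-Gamma) purely to avoid overflow for large $\nu$, which is a numerical convenience rather than anything used in the proof.

```python
import math


def inverse_c(nu: int) -> float:
    """Return 1/c_nu = sqrt(pi * nu) * Gamma(nu/2) / Gamma((nu+1)/2).

    The Gamma quotient is computed as exp(lgamma(nu/2) - lgamma((nu+1)/2)),
    since Gamma itself overflows a double once its argument exceeds about 171.
    """
    log_ratio = math.lgamma(nu / 2.0) - math.lgamma((nu + 1) / 2.0)
    return math.sqrt(math.pi * nu) * math.exp(log_ratio)


def student_density(t: float, nu: int) -> float:
    """Student t density f_nu(t) = c_nu * (1 + t^2/nu)^(-(nu+1)/2)."""
    return (1.0 + t * t / nu) ** (-(nu + 1) / 2.0) / inverse_c(nu)


def standard_normal_density(t: float) -> float:
    """Standard normal density exp(-t^2/2) / sqrt(2*pi), as in (2)."""
    return math.exp(-t * t / 2.0) / math.sqrt(2.0 * math.pi)


if __name__ == "__main__":
    target = math.sqrt(2.0 * math.pi)
    for nu in (1, 10, 100, 10_000, 1_000_000):
        # 1/c_nu should drift toward sqrt(2*pi) as nu grows.
        print(nu, inverse_c(nu), target)
    # Pointwise comparison of the two densities at t = 1 for large nu.
    print(student_density(1.0, 1_000_000), standard_normal_density(1.0))
```

For $\nu = 1$ the value of $1/c_\nu$ is exactly $\pi$, the normalizing constant of $1/(1+t^2)$, which gives a convenient check on the formula for $c_\nu$.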