Applied Mathematics, 2020, 11, 436-446
https://www.scirp.org/journal/am
ISSN Online: 2152-7393; ISSN Print: 2152-7385

Derivation of Gaussian Distribution: A New Approach

A. T. Adeniran1, O. Faweya2*, T. O. Ogunlade3, K. O. Balogun4

1 Department of Statistics, University of Ibadan, Ibadan, Nigeria
2 Department of Statistics, Ekiti State University (EKSU), Ado-Ekiti, Nigeria
3 Department of Mathematics, Ekiti State University (EKSU), Ado-Ekiti, Nigeria
4 Department of Statistics, Federal School of Statistics, Ibadan, Nigeria

How to cite this paper: Adeniran, A.T., Faweya, O., Ogunlade, T.O. and Balogun, K.O. (2020) Derivation of Gaussian Probability Distribution: A New Approach. Applied Mathematics, 11, 436-446. https://doi.org/10.4236/am.2020.116031

Received: March 17, 2020; Accepted: May 29, 2020; Published: June 1, 2020

Abstract

The famous de Moivre-Laplace limit theorem derives the probability density function of the Gaussian distribution from the binomial probability mass function under specified conditions. The de Moivre-Laplace approach is cumbersome, as it relies heavily on many lemmas and theorems. This paper presents an alternative, less rigorous method of deriving the Gaussian distribution from a basic random experiment, conditional on some assumptions.

Keywords

De Moivre-Laplace Limit Theorem, Binomial Probability Mass Function, Gaussian Distribution, Random Experiment

Copyright © 2020 by author(s) and Scientific Research Publishing Inc. This work is licensed under the Creative Commons Attribution International License (CC BY 4.0).

http://creativecommons.org/licenses/by/4.0/

Open Access

1. Introduction

A well-celebrated, fundamental probability distribution for the class of continuous functions is the classical Gaussian distribution, named after the German mathematician Carl Friedrich Gauss (1809).

Definition 1.1 Let μ and σ be constants with −∞ < μ < ∞ and σ > 0. The function

$$f(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}; \quad \text{for } -\infty < x < \infty \tag{1}$$

is called the probability density function of the Gaussian distribution with parameters μ and σ.

DOI: 10.4236/am.2020.116031 Jun. 1, 2020 436 Applied Mathematics


It is the most important distribution in statistics. The well-known method of deriving this distribution first appeared in the second edition of The Doctrine of Chances by Abraham de Moivre, published in 1738, and was later generalized by Laplace (hence, the de Moivre-Laplace limit theorem) ([1] [2] [3] [4] [5]). The mathematical statement of the popular theorem follows.

Theorem 1.1 (de Moivre-Laplace limit theorem) As n grows large (n → ∞), for x in the neighborhood of np and for moderate values of p (p ≠ 0 and p ≠ 1), we can approximate

$$\binom{n}{x} p^{x} q^{n-x} \approx \frac{1}{\sqrt{2\pi npq}}\, e^{-\frac{(x-np)^{2}}{2npq}}, \quad p + q = 1,\ p, q > 0. \tag{2}$$

Explicitly, the theorem asserts the following: suppose $n \in \mathbb{N}^{+}$, and let p and q be probabilities, with $p + q = 1$. The function

$$b(x; n, p) = \binom{n}{x} p^{x} (1-p)^{n-x} \quad \text{for } x = 0, 1, 2, \ldots, n \tag{3}$$

called the binomial probability function, converges to the probability density function of the normal distribution as n → ∞, with mean np and standard deviation $\sqrt{np(1-p)}$.

Although de Moivre proved the result for $p = \frac{1}{2}$ ([6] [7]), [8] extended and generalized the proof to all values of p (the probability of success in any trial) such that p is not too small and not too large. Feller's result was expounded by [9]. [10] [11] [12] [13] used the uniqueness property of the moment generating function technique to prove the same theorem.

In this paper, we attempt to answer the question: is there any alternative procedure for deriving the Gaussian probability density function apart from the de Moivre-Laplace limit theorem approach, which relies heavily on many lemmas and theorems (the Stirling approximation formula, the Maclaurin series expansion, etc.), as evidenced by the work of [8] and [9]?

2. Existing Technique

This section presents a summary proof of the existing de Moivre-Laplace limit theorem. First and foremost, the study states, with proof, the most important lemma underlying the theorem: the Stirling approximation principle.

Lemma 2.1 (Stirling Approximation Principle) Given an integer n > 0, the factorial of a large number n can be replaced with the approximation

$$n! \approx \sqrt{2\pi n}\left(\frac{n}{e}\right)^{n}$$

Proof 2.1 This lemma can be derived using the integral definition of the factorial,

$$n! = \Gamma(n+1) = \int_{0}^{\infty} x^{n} e^{-x}\,\mathrm{d}x \tag{4}$$

Note that the derivative of the logarithm of the integrand can be written as


$$\frac{\mathrm{d}}{\mathrm{d}x} \ln\left(x^{n} e^{-x}\right) = \frac{\mathrm{d}}{\mathrm{d}x}\left(n \ln x - x\right) = \frac{n}{x} - 1 \tag{5}$$

The integrand is sharply peaked, with the important contribution coming only from near x = n. Therefore, let x = n + δ where |δ| ≪ n, and write

$$\ln\left(x^{n} e^{-x}\right) = n \ln(n+\delta) - (n+\delta) = n \ln\left[n\left(1+\frac{\delta}{n}\right)\right] - (n+\delta) = n \ln n + n \ln\left(1+\frac{\delta}{n}\right) - (n+\delta) \tag{6}$$

Recall that the Maclaurin series of $f(x) = \ln(1+x)$ is $x - \frac{1}{2}x^{2} + O(x^{3})$. Therefore,

$$\ln\left(x^{n} e^{-x}\right) \approx n \ln n + n\left(\frac{\delta}{n} - \frac{\delta^{2}}{2n^{2}}\right) - (n+\delta) = n \ln n - n - \frac{\delta^{2}}{2n} \tag{7}$$

Taking the exponential of both sides of Equation (7) gives

$$x^{n} e^{-x} \approx e^{\,n \ln n - n - \frac{\delta^{2}}{2n}} = \left(\frac{n}{e}\right)^{n} e^{-\frac{\delta^{2}}{2n}} \tag{8}$$

Plugging (8) into the integral expression for n!, that is, (4), and extending the lower limit of integration from δ = −n to δ = −∞ (the integrand is negligible there for large n), gives

$$n! \approx \int_{0}^{\infty} \left(\frac{n}{e}\right)^{n} e^{-\frac{\delta^{2}}{2n}}\,\mathrm{d}\delta \approx \left(\frac{n}{e}\right)^{n} \int_{-\infty}^{\infty} e^{-\frac{\delta^{2}}{2n}}\,\mathrm{d}\delta \tag{9}$$

From (9), let $I = \int_{-\infty}^{\infty} e^{-\frac{\delta^{2}}{2n}}\,\mathrm{d}\delta$, and consider δ and κ as dummy variables, so that

$$I^{2} = \int_{-\infty}^{\infty} e^{-\frac{\delta^{2}}{2n}}\,\mathrm{d}\delta \times \int_{-\infty}^{\infty} e^{-\frac{\kappa^{2}}{2n}}\,\mathrm{d}\kappa = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-\frac{\delta^{2}+\kappa^{2}}{2n}}\,\mathrm{d}\delta\,\mathrm{d}\kappa \tag{10}$$

Transforming $I^{2}$ to polar coordinates, $\delta = \rho\cos\theta$ and $\kappa = \rho\sin\theta$, implies $\delta^{2} + \kappa^{2} = \rho^{2}$, with the Jacobian (J) of the transformation

$$J = \begin{vmatrix} \frac{\partial\delta}{\partial\rho} & \frac{\partial\delta}{\partial\theta} \\ \frac{\partial\kappa}{\partial\rho} & \frac{\partial\kappa}{\partial\theta} \end{vmatrix} = \begin{vmatrix} \cos\theta & -\rho\sin\theta \\ \sin\theta & \rho\cos\theta \end{vmatrix} = \rho \tag{11}$$

Hence,

$$I^{2} = \int_{0}^{2\pi}\int_{0}^{\infty} e^{-\frac{\rho^{2}}{2n}}\, J\,\mathrm{d}\rho\,\mathrm{d}\theta = \int_{0}^{2\pi}\int_{0}^{\infty} e^{-\frac{\rho^{2}}{2n}}\, \rho\,\mathrm{d}\rho\,\mathrm{d}\theta = \int_{0}^{2\pi} n\,\mathrm{d}\theta = 2\pi n \tag{12}$$

Therefore, $I = \sqrt{I^{2}} = \sqrt{2\pi n}$. Substituting for I in (9) gives

$$n! \approx \sqrt{2\pi n}\left(\frac{n}{e}\right)^{n} \tag{13}$$

We now prove Theorem 1.1 using the popular existing technique.


Proof 2.2 Using the result of Lemma 2.1, Equation (3) can be rewritten as

$$f(x; n, p) \approx \frac{\sqrt{2\pi n}\left(\frac{n}{e}\right)^{n}}{\sqrt{2\pi x}\left(\frac{x}{e}\right)^{x}\sqrt{2\pi(n-x)}\left(\frac{n-x}{e}\right)^{n-x}}\, p^{x}(1-p)^{n-x} = \frac{1}{\sqrt{2\pi}} \cdot \frac{n^{n+\frac{1}{2}}}{x^{x+\frac{1}{2}}(n-x)^{n-x+\frac{1}{2}}}\, p^{x}(1-p)^{n-x} \tag{14}$$

Dividing the numerator and denominator of Equation (14) by $n^{n+\frac{1}{2}}$ gives

$$f(x; n, p) \approx \frac{1}{\sqrt{2\pi n}} \left(\frac{x}{n}\right)^{-x-\frac{1}{2}} \left(\frac{n-x}{n}\right)^{-(n-x)-\frac{1}{2}} p^{x}(1-p)^{n-x} \tag{15}$$

Since x is in the neighborhood of np, change variables to x = np + ε, where ε measures the distance from the mean, np, of the binomial to the measured quantity x. Then $\frac{x}{n} = p\left(1+\frac{\varepsilon}{np}\right)$ and $\frac{n-x}{n} = (1-p)\left(1-\frac{\varepsilon}{n(1-p)}\right)$, and the resulting powers of p and (1−p) cancel against $p^{x}(1-p)^{n-x}$ up to a factor $p^{-\frac{1}{2}}(1-p)^{-\frac{1}{2}}$, so (15) simplifies to

$$f(x; n, p) \approx \frac{1}{\sqrt{2\pi np(1-p)}} \left(1+\frac{\varepsilon}{np}\right)^{-x-\frac{1}{2}} \left(1-\frac{\varepsilon}{n(1-p)}\right)^{-(n-x)-\frac{1}{2}} \tag{16}$$

Note that $x = e^{\ln x}$. Therefore, rewriting (16) in exponential form, with $x = np + \varepsilon$ and $n - x = n(1-p) - \varepsilon$, gives

$$f(x; n, p) \approx \frac{1}{\sqrt{2\pi np(1-p)}} \exp\left\{\left(-np-\varepsilon-\frac{1}{2}\right)\ln\left(1+\frac{\varepsilon}{np}\right) + \left(-n(1-p)+\varepsilon-\frac{1}{2}\right)\ln\left(1-\frac{\varepsilon}{n(1-p)}\right)\right\} \tag{17}$$


Suppose $f(x) = \ln(1+x)$; by the Maclaurin series, $f(x) = x - \frac{1}{2}x^{2} + O(x^{3})$, and similarly $\ln(1-x) = -x - \frac{1}{2}x^{2} + O(x^{3})$. So that

$$\ln\left(1+\frac{\varepsilon}{np}\right) \approx \frac{\varepsilon}{np} - \frac{\varepsilon^{2}}{2n^{2}p^{2}} \quad \text{and} \quad \ln\left(1-\frac{\varepsilon}{n(1-p)}\right) \approx -\frac{\varepsilon}{n(1-p)} - \frac{\varepsilon^{2}}{2n^{2}(1-p)^{2}}$$

As a result, discarding terms of order $\frac{\varepsilon^{3}}{n^{2}}$, $\frac{\varepsilon}{n}$ and smaller,

$$f(x; n, p) \approx \frac{1}{\sqrt{2\pi np(1-p)}} \exp\left\{-\varepsilon - \frac{\varepsilon^{2}}{2np} + \varepsilon - \frac{\varepsilon^{2}}{2n(1-p)}\right\} = \frac{1}{\sqrt{2\pi np(1-p)}} \exp\left\{-\frac{\varepsilon^{2}}{2np(1-p)}\right\} \tag{18}$$

Recall that x = np + ε, which implies that $\varepsilon^{2} = (x-np)^{2}$. From the binomial distribution, $np = \mu$ and $np(1-p) = \sigma^{2}$, which implies that $\sqrt{np(1-p)} = \sigma$. Making the appropriate substitutions in Equation (18) yields

$$f(x; n, p) \approx \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{-\frac{1}{2}\frac{(x-\mu)^{2}}{\sigma^{2}}\right\} = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}; \quad \text{for } -\infty < x < \infty \tag{19}$$

This confirms the theorem. We recommend that readers interested in the detailed proof of the theorem consult the study expounded by [9].
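Since the proof above leans on Lemma 2.1, it is instructive to see how accurate Stirling's formula already is for modest n. A minimal numerical check (the values of n are illustrative):

```python
# Relative error of Stirling's approximation n! ≈ sqrt(2*pi*n)*(n/e)^n,
# which shrinks roughly like 1/(12n).
import math

def stirling(n):
    return math.sqrt(2 * math.pi * n) * (n / math.e) ** n

errs = []
for n in (5, 10, 20):
    errs.append(abs(math.factorial(n) - stirling(n)) / math.factorial(n))
print(errs)  # ≈ [0.016, 0.008, 0.004]
```

The error halves each time n doubles, consistent with the leading correction term of Stirling's series.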

3. The Proposed Technique

Suppose a random experiment of throwing a needle or any other dart-like object at the origin of the Cartesian plane is performed, with the aim of hitting the centre (see Figure 1). Due to the human lack of perfect consistency, the varying results of the throws generate random errors. To make the derivation possible and less rigorous, we make the following assumptions:

1) The errors are independent of the orientation of the coordinate system.

2) Errors in perpendicular directions are independent. This means that being too high does not alter the probability of being off to the right.

Figure 1. The possible results of the dart experiment.


3) Small errors are more likely than large errors. That is, throws are more likely to land in region P than in either Q or R, since region P is closer to the target (origin). Similarly, for the same reason, region Q is more likely than region R. Furthermore, there is a higher tendency of hitting region V than either S or T, since V has the bigger surface area while the distances from the origin are approximately the same.

From Figure 2, let the probability of the needle falling in the vertical strip from x to x + Δx be denoted $p(x)\Delta x$. Similarly, let the probability of the needle falling in the horizontal strip from y to y + Δy be $p(y)\Delta y$. Obviously, the function cannot be constant, due to the stochastic nature of the experiment. In this study, our interest is to determine the form and characteristics of the function p(x). From the second assumption, the probability of the needle falling in the shaded region ABCD (see Figure 2) is $p(x)\Delta x \cdot p(y)\Delta y$.
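The second assumption can be illustrated by simulation. The sketch below assumes the throwing errors are standard Gaussian in each coordinate (the very form derived later in this section) and checks that the probability of a rectangular region ABCD factors into the product of the two strip probabilities; the rectangle's corner and width are arbitrary illustrative choices:

```python
# Monte Carlo illustration of assumption 2 with Gaussian throwing errors:
# the probability of hitting the rectangle [x0, x0+d] x [y0, y0+d] equals
# the product of the two strip probabilities, because perpendicular
# errors are independent.
import random

random.seed(1)
N = 200_000
throws = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(N)]

x0, y0, d = 0.5, -0.3, 0.4  # arbitrary corner and width of region ABCD
in_x = sum(x0 <= x < x0 + d for x, _ in throws) / N      # vertical strip
in_y = sum(y0 <= y < y0 + d for _, y in throws) / N      # horizontal strip
in_rect = sum(x0 <= x < x0 + d and y0 <= y < y0 + d
              for x, y in throws) / N                    # shaded region
print(in_rect, in_x * in_y)  # the two estimates agree closely
```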

Note that any region r units from the origin with area ΔxΔy has the same probability, which is a consequence of the assumption that the errors do not depend on the orientation. We can therefore write

$$p(x)\Delta x \cdot p(y)\Delta y = p(x)p(y)\,\Delta x\,\Delta y = g(r)\,\Delta x\,\Delta y \tag{20}$$

where

$$g(r) = p(x)p(y) \tag{21}$$

Differentiating (using the product rule) both sides of Equation (21) with respect to θ gives

$$0 = p(x)\frac{\mathrm{d}}{\mathrm{d}\theta}p(y) + p(y)\frac{\mathrm{d}}{\mathrm{d}\theta}p(x) \tag{22}$$

Here, $\frac{\mathrm{d}}{\mathrm{d}\theta}g(r) = 0$ since g(·) is independent of the orientation θ. By transformation to polar coordinates, $x = r\cos\theta$ and $y = r\sin\theta$, we can rewrite the derivatives in Equation (22) as

Figure 2. The typical example of the experiment.


$$0 = p(x)\frac{\mathrm{d}}{\mathrm{d}\theta}p(r\sin\theta) + p(y)\frac{\mathrm{d}}{\mathrm{d}\theta}p(r\cos\theta) \tag{23}$$

Using the chain rule of differentiation, (23) becomes

$$0 = p(x)p'(y)\,r\cos\theta - p(y)p'(x)\,r\sin\theta \tag{24}$$

Rewriting Equation (24) by replacing $r\cos\theta$ with x and $r\sin\theta$ with y yields

$$0 = p(x)p'(y)\,x - p(y)p'(x)\,y \tag{25}$$

The above differential equation can be put in a form that can be solved by the technique of separation of variables:

$$\frac{p'(x)}{x\,p(x)} = \frac{p'(y)}{y\,p(y)} \tag{26}$$

This differential equation can only hold for all x and y, with x and y independent, if and only if the ratio defined by (26) is a constant. That is,

$$\frac{p'(x)}{x\,p(x)} = \frac{p'(y)}{y\,p(y)} = c \tag{27}$$

Consider $\frac{p'(x)}{x\,p(x)} = c$ in (27) and rearrange to get

$$\frac{p'(x)}{p(x)} = cx \tag{28}$$

Integrating Equation (28) gives

$$\ln p(x) = \frac{cx^{2}}{2} + k_{1}, \quad \text{so that} \quad p(x) = k\,e^{\frac{cx^{2}}{2}}; \quad \text{where } k = e^{k_{1}} \tag{29}$$

By the third assumption, c must be negative, so we write the probability function (29) as

$$p(x) = k\,e^{-\frac{c}{2}x^{2}}; \quad \text{where } c \in \mathbb{R}^{+} \tag{30}$$

If there is a horizontal shift of the target from the origin to an arbitrary point μ, which now marks the new centre/target, then the probability function in (30) becomes

$$p(x) = k\,e^{-\frac{c}{2}(x-\mu)^{2}} \tag{31}$$
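The solution (29) can be cross-checked by integrating the differential Equation (28) numerically and comparing with the closed form. The sketch below uses a classical fourth-order Runge-Kutta step; the constants c = −2 (negative, in line with the third assumption) and k = 1 are illustrative, not values from the paper:

```python
# Integrate p'(x) = c*x*p(x) (Equation (28)) with 4th-order Runge-Kutta
# from p(0) = k out to x = 1, and compare against the closed-form
# solution p(x) = k*exp(c*x**2/2) from Equation (29).
import math

c, k = -2.0, 1.0            # illustrative constants; c < 0 by assumption 3
f = lambda x, p: c * x * p  # right-hand side of (28)

x, p, h = 0.0, k, 1e-3
while x < 1.0 - 1e-12:
    k1 = f(x, p)
    k2 = f(x + h / 2, p + h * k1 / 2)
    k3 = f(x + h / 2, p + h * k2 / 2)
    k4 = f(x + h, p + h * k3)
    p += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    x += h

closed_form = k * math.exp(c * x**2 / 2)
print(p, closed_form)  # both ≈ exp(-1) ≈ 0.3679
```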

Differentiating (31) and setting the derivative equal to zero gives

$$p'(x) = -ck(x-\mu)\,e^{-\frac{c}{2}(x-\mu)^{2}} = 0 \tag{32}$$

Since $e^{-\frac{c}{2}(x-\mu)^{2}} \neq 0$, this implies x = μ. Therefore, Equation (31) has its maximum value at x = μ and points of inflexion at $x = \mu \pm \frac{1}{\sqrt{c}}$. Obviously, (31) has given


us the basic form of the Gaussian distribution, with constants k and c, and with the domain of X running from −∞ to ∞. Therefore, for Equation (31) to be regarded as a proper probability density function, the total area under the curve must be 1. That is,

$$\int_{-\infty}^{\infty} k\,e^{-\frac{c}{2}(x-\mu)^{2}}\,\mathrm{d}x = 1 \tag{33}$$

For a function f(x) symmetric about x = μ, $\int_{-\infty}^{\infty} f(x)\,\mathrm{d}x = 2\int_{\mu}^{\infty} f(x)\,\mathrm{d}x$. Applying this property to Equation (33) yields

$$\int_{\mu}^{\infty} e^{-\frac{c}{2}(x-\mu)^{2}}\,\mathrm{d}x = \frac{1}{2k} \tag{34}$$

Squaring both sides of (34) gives

$$\int_{\mu}^{\infty} e^{-\frac{c}{2}(x-\mu)^{2}}\,\mathrm{d}x \cdot \int_{\mu}^{\infty} e^{-\frac{c}{2}(y-\mu)^{2}}\,\mathrm{d}y = \frac{1}{2k} \times \frac{1}{2k} \tag{35}$$

This is possible since x and y are just dummy variables. Recall that x and y are also independent, so we can write the product on the LHS of (35) as a double integral:

$$\int_{\mu}^{\infty}\int_{\mu}^{\infty} e^{-\frac{c}{2}\left[(x-\mu)^{2}+(y-\mu)^{2}\right]}\,\mathrm{d}x\,\mathrm{d}y = \frac{1}{4k^{2}} \tag{36}$$

Putting $z = x - \mu \Rightarrow \mathrm{d}x = \mathrm{d}z$ and $w = y - \mu \Rightarrow \mathrm{d}y = \mathrm{d}w$ in Equation (36) gives

$$\int_{0}^{\infty}\int_{0}^{\infty} e^{-\frac{c}{2}\left(z^{2}+w^{2}\right)}\,\mathrm{d}z\,\mathrm{d}w = \frac{1}{4k^{2}} \tag{37}$$

The double integral in (37) can be evaluated in polar coordinates with $z = r\cos\theta$ and $w = r\sin\theta$, with the Jacobian (J) of the transformation

$$J = \frac{\partial(z, w)}{\partial(r, \theta)} = \begin{vmatrix} \frac{\partial z}{\partial r} & \frac{\partial z}{\partial\theta} \\ \frac{\partial w}{\partial r} & \frac{\partial w}{\partial\theta} \end{vmatrix} = \begin{vmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{vmatrix} = r \tag{38}$$

and

$$z^{2} + w^{2} = (r\cos\theta)^{2} + (r\sin\theta)^{2} = r^{2} \tag{39}$$

So, since the first quadrant corresponds to $0 \le \theta \le \frac{\pi}{2}$, Equation (37) now becomes

$$\int_{0}^{\frac{\pi}{2}}\int_{0}^{\infty} e^{-\frac{c}{2}r^{2}}\, J\,\mathrm{d}r\,\mathrm{d}\theta = \int_{0}^{\frac{\pi}{2}}\int_{0}^{\infty} e^{-\frac{c}{2}r^{2}}\, r\,\mathrm{d}r\,\mathrm{d}\theta = \frac{1}{4k^{2}} \tag{40}$$

Evaluating the double integral in Equation (40) by first letting $u = \frac{cr^{2}}{2}$ (so that $\mathrm{d}u = cr\,\mathrm{d}r$), the inner integral equals $\frac{1}{c}$, giving $\frac{\pi}{2c} = \frac{1}{4k^{2}}$; solving for k in the resulting equation yields

$$k = \sqrt{\frac{c}{2\pi}} \tag{41}$$

Putting (41) in (31), the probability density function, p(x), becomes


$$p(x) = \sqrt{\frac{c}{2\pi}}\, e^{-\frac{c}{2}(x-\mu)^{2}} \tag{42}$$

Again, the integral of a probability function over its domain is 1. Therefore, from (42),

$$\int_{-\infty}^{\infty} p(x)\,\mathrm{d}x = \sqrt{\frac{c}{2\pi}}\int_{-\infty}^{\infty} e^{-\frac{c}{2}(x-\mu)^{2}}\,\mathrm{d}x = 2\sqrt{\frac{c}{2\pi}}\int_{\mu}^{\infty} e^{-\frac{c}{2}(x-\mu)^{2}}\,\mathrm{d}x = 1 \tag{43}$$

Further simplification of Equation (43) gives

$$\int_{\mu}^{\infty} e^{-\frac{c}{2}(x-\mu)^{2}}\,\mathrm{d}x = \sqrt{\frac{\pi}{2c}} \tag{44}$$

One of the important goals in the mathematical theory of statistics is to obtain the mean and variance of any probability function under study. The mean, μ, is defined to be the value of the integral $\int_{-\infty}^{\infty} x\,p(x)\,\mathrm{d}x$. The variance, σ², is the value of the integral $\int_{-\infty}^{\infty} (x-\mu)^{2}\,p(x)\,\mathrm{d}x$. Therefore, using Equation (42),

$$\sigma^{2} = \int_{-\infty}^{\infty} (x-\mu)^{2}\sqrt{\frac{c}{2\pi}}\, e^{-\frac{c}{2}(x-\mu)^{2}}\,\mathrm{d}x = 2\sqrt{\frac{c}{2\pi}}\int_{\mu}^{\infty} (x-\mu)^{2}\, e^{-\frac{c}{2}(x-\mu)^{2}}\,\mathrm{d}x \tag{45}$$

or equivalently,

$$\sigma^{2} = \sqrt{\frac{2c}{\pi}}\int_{\mu}^{\infty} (x-\mu)(x-\mu)\, e^{-\frac{c}{2}(x-\mu)^{2}}\,\mathrm{d}x \tag{46}$$

Consider Equation (46) and use integration by parts ($\int u\,\mathrm{d}v = uv - \int v\,\mathrm{d}u$) with $u = x - \mu \Rightarrow \mathrm{d}u = \mathrm{d}x$ and $\mathrm{d}v = (x-\mu)\,e^{-\frac{c}{2}(x-\mu)^{2}}\,\mathrm{d}x \Rightarrow v = -\frac{1}{c}\,e^{-\frac{c}{2}(x-\mu)^{2}}$. We have

$$\sigma^{2} = \sqrt{\frac{2c}{\pi}}\left\{\left[-\frac{x-\mu}{c}\, e^{-\frac{c}{2}(x-\mu)^{2}}\right]_{\mu}^{\infty} + \frac{1}{c}\int_{\mu}^{\infty} e^{-\frac{c}{2}(x-\mu)^{2}}\,\mathrm{d}x\right\} = \sqrt{\frac{2c}{\pi}}\left\{0 + \frac{1}{c}\int_{\mu}^{\infty} e^{-\frac{c}{2}(x-\mu)^{2}}\,\mathrm{d}x\right\}$$

since $\lim_{n\to\infty}(n-\mu)\,e^{-\frac{c}{2}(n-\mu)^{2}} = 0$. Putting (44) into the preceding equation gives

$$\sigma^{2} = \sqrt{\frac{2c}{\pi}} \times \frac{1}{c} \times \sqrt{\frac{\pi}{2c}} = \frac{1}{c} \quad \Rightarrow \quad c = \frac{1}{\sigma^{2}} \tag{47}$$

Substituting (47) into (42), the derived probability density function has the form

$$p(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2\sigma^{2}}(x-\mu)^{2}} = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}, \quad -\infty < x < \infty \tag{48}$$
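As a numerical sanity check on (48): for any choice of μ and σ the density should integrate to 1, have mean μ, and have variance σ². The values μ = 1.5 and σ = 0.7 below are illustrative; plain trapezoidal quadrature over [μ − 10σ, μ + 10σ] suffices:

```python
# Check that the derived density integrates to 1, has mean mu, and has
# variance sigma**2, using trapezoidal quadrature on [mu-10s, mu+10s].
import math

def p(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

mu, sigma = 1.5, 0.7        # illustrative parameters
a, b, n = mu - 10 * sigma, mu + 10 * sigma, 20_000
h = (b - a) / n
xs = [a + i * h for i in range(n + 1)]

def trap(vals):
    """Composite trapezoidal rule with uniform spacing h."""
    return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

total = trap([p(x, mu, sigma) for x in xs])
mean = trap([x * p(x, mu, sigma) for x in xs])
var = trap([(x - mu) ** 2 * p(x, mu, sigma) for x in xs])
print(total, mean, var)  # ≈ 1.0, 1.5, 0.49
```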


To verify that Equation (48) is a proper probability density function with parameters μ and σ, we show that the integral

$$I = \int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}\right\}\mathrm{d}x$$

is equal to 1. Change the variable of integration by letting $z = \frac{x-\mu}{\sigma}$, which implies that $\mathrm{d}x = \sigma\,\mathrm{d}z$. Then

$$I = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-\frac{z^{2}}{2}}\,\mathrm{d}z = \frac{2}{\sqrt{2\pi}}\int_{0}^{\infty} e^{-\frac{z^{2}}{2}}\,\mathrm{d}z = \sqrt{\frac{2}{\pi}}\int_{0}^{\infty} e^{-\frac{z^{2}}{2}}\,\mathrm{d}z$$

so that

$$I^{2} = \frac{2}{\pi}\left(\int_{0}^{\infty} e^{-\frac{x^{2}}{2}}\,\mathrm{d}x\right)\left(\int_{0}^{\infty} e^{-\frac{y^{2}}{2}}\,\mathrm{d}y\right) = \frac{2}{\pi}\int_{0}^{\infty}\int_{0}^{\infty} e^{-\frac{x^{2}+y^{2}}{2}}\,\mathrm{d}x\,\mathrm{d}y$$

Here x and y are dummy variables. Switching to polar coordinates by making the substitutions $x = r\cos\theta$ and $y = r\sin\theta$ produces r as the Jacobian of the transformation. So

$$I^{2} = \frac{2}{\pi}\int_{0}^{\frac{\pi}{2}}\int_{0}^{\infty} e^{-\frac{r^{2}}{2}}\, r\,\mathrm{d}r\,\mathrm{d}\theta$$

Put $a = \frac{r^{2}}{2} \Rightarrow \mathrm{d}a = r\,\mathrm{d}r$. Therefore,

$$I^{2} = \frac{2}{\pi}\int_{0}^{\frac{\pi}{2}}\int_{0}^{\infty} e^{-a}\,\mathrm{d}a\,\mathrm{d}\theta = \frac{2}{\pi}\int_{0}^{\frac{\pi}{2}}\mathrm{d}\theta = 1$$

Thus I = 1, indicating that (48) is a proper probability density function. Other properties of the distribution, such as moments, the moment generating function, the cumulant generating function, the characteristic function, and parameter estimation, can be found in [14] [15] [16].
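For instance, the moment generating function mentioned above has the well-known closed form $M(t) = e^{\mu t + \frac{\sigma^{2}t^{2}}{2}}$ for the Gaussian. A quadrature check at a few values of t (the parameters μ, σ and the integration range below are illustrative choices):

```python
# Quadrature check of the Gaussian moment generating function:
# M(t) = E[exp(t*X)] should equal exp(mu*t + sigma**2 * t**2 / 2).
import math

mu, sigma = 0.5, 1.2        # illustrative parameters

def pdf(x):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def mgf_numeric(t, a=-15.0, b=16.0, n=40_000):
    """Trapezoidal estimate of E[exp(t*X)] over a wide finite range."""
    h = (b - a) / n
    s = 0.5 * (math.exp(t * a) * pdf(a) + math.exp(t * b) * pdf(b))
    s += sum(math.exp(t * (a + i * h)) * pdf(a + i * h) for i in range(1, n))
    return h * s

for t in (0.0, 0.5, 1.0):
    print(t, mgf_numeric(t), math.exp(mu * t + 0.5 * (sigma * t) ** 2))
```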

4. Conclusion

While working towards the outlined objective, we were able to establish that there exists an approach that not only serves as an alternative derivation of the Gaussian probability density function but is also free from rigorous mathematical analysis and independent of auxiliary lemmas and theorems. This paper can be classified as a theoretical study of the Gaussian distribution and can serve as an excellent teaching reference in probability and statistics classes, where basic calculus, skill with algebraic expressions, the Maclaurin series expansion, and the Euler integral of the second kind (the gamma function) are the only background requirements.

Acknowledgements

The authors are highly grateful to the editor and anonymous referees for reading through the manuscript, constructive comments and suggestions that helped in


the improvement of the revised version of the paper.

Conflicts of Interest

The authors declare no conflicts of interest regarding the publication of this paper.

References

[1] Van der Vaart, A.W. (1998) Asymptotic Statistics. Cambridge Series in Statistical and Probabilistic Mathematics, Cambridge University Press, Cambridge. https://doi.org/10.1017/CBO9780511802256

[2] Blume, J.D. and Royall, R.M. (2003) Illustrating the Law of Large Numbers (and Confidence Intervals). The American Statistician, 57, 51-57. https://doi.org/10.1198/0003130031081

[3] Lesigne, E. (2005) Heads or Tails: An Introduction to Limit Theorems in Probability. Student Mathematical Library, Volume 28, American Mathematical Society, Providence. https://doi.org/10.1090/stml/028

[4] Walck, C. (2007) Handbook on Statistical Distributions for Experimentalists. Particle Physics Group, Fysikum, University of Stockholm, Stockholm.

[5] Proschan, M.A. (2008) The Normal Approximation to the Binomial. The American Statistician, 62, 62-63. https://doi.org/10.1198/000313008X267848

[6] Shao, J. (1999) Mathematical Statistics. Springer Texts in Statistics, Springer-Verlag, New York.

[7] Soong, T.T. (2004) Fundamentals of Probability and Statistics for Engineers. John Wiley & Sons Ltd., Chichester.

[8] Feller, W. (1973) An Introduction to Probability Theory and Its Applications. Volume 1, Third Edition, John Wiley and Sons, Hoboken.

[9] Adeniran, A.T., Ojo, J.F. and Olilima, J.O. (2018) A Note on the Asymptotic Convergence of Bernoulli Distribution. Research & Reviews: Journal of Statistics and Mathematical Sciences, 4, 19-32.

[10] Inlow, M. (2010) A Moment Generating Function Proof of the Lindeberg-Lévy Central Limit Theorem. The American Statistician, 64, 228-230. https://doi.org/10.1198/tast.2010.09159

[11] Bagui, S.C., Bhaumik, D.K. and Mehra, K.L. (2013) A Few Counter Examples Useful in Teaching Central Limit Theorem. The American Statistician, 67, 49-56. https://doi.org/10.1080/00031305.2012.755361

[12] Bagui, S.C., Bagui, S.S. and Hemasinha, R. (2013) Non-Rigorous Proofs of Stirling's Formula. Mathematics and Computer Education, 47, 115-125.

[13] Bagui, S.C. and Mehra, K.L. (2016) Convergence of Binomial, Poisson, Negative-Binomial, and Gamma to Normal Distribution: Moment Generating Functions Technique. American Journal of Mathematics and Statistics, 6, 115-121.

[14] Casella, G. and Berger, R.L. (2002) Statistical Inference. Second Edition, Duxbury Thomson Learning, Integre Technical Publishing Co., Albuquerque.

[15] Young, G.A. and Smith, R.L. (2005) Essentials of Statistical Inference. Cambridge Series in Statistical and Probabilistic Mathematics, Cambridge University Press, Cambridge. https://doi.org/10.1017/CBO9780511755392

[16] Podgorski, K. (2009) Lecture Notes on Statistical Inference. Department of Mathematics and Statistics, University of Limerick, Limerick.
