The Distribution of Errors

A Derivation of the Normal Distribution Using Calculus and Combinatorics

Dan Teague
NC School of Science and Mathematics

Consider a situation in which you are trying to generate the value $X$. In the process, there are $2n$ random errors, each of size $d$ and each either positive or negative with equal probability. The actual value generated, then, will vary from $X - 2nd$ to $X + 2nd$.

The probability of generating the value $X - 2nd$ is $\left(\frac{1}{2}\right)^{2n}$, since all of the errors of size $d$ had to be negative. The probability of generating the value $X - 2(n-3)d$ is $\binom{2n}{3}\left(\frac{1}{2}\right)^{3}\left(\frac{1}{2}\right)^{2n-3}$, since this value can only be generated if 3 of the errors were positive and the remaining $2n-3$ errors were negative. The order in which the positive and negative errors occur does not matter.

We want to investigate the probability of an error of size $R = 2rd$. An error of size $R$ requires that $(n+r)$ of the errors are positive and $(n-r)$ of the errors are negative. The probability that this occurs is

$$P(E) = \binom{2n}{n+r}\left(\frac{1}{2}\right)^{n+r}\left(\frac{1}{2}\right)^{n-r} = \frac{(2n)!\left(\frac{1}{2}\right)^{2n}}{(n+r)!\,(n-r)!}.$$

We want to approximate this probability using Stirling's approximation for large factorials, $n! \approx \sqrt{2\pi n}\, e^{-n} n^{n}$.
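The counting argument above can be sketched in a few lines of Python. This is only an illustration: the function name `exact_probability` and the sample value n = 5 are choices made here, not part of the original derivation.

```python
import math

def exact_probability(n, r):
    """Probability that 2n equally likely +d / -d errors sum to R = 2*r*d.

    This requires n + r positive errors and n - r negative errors.
    """
    return math.comb(2 * n, n + r) / 2 ** (2 * n)

n = 5  # illustrative value: 2n = 10 errors
probabilities = {r: exact_probability(n, r) for r in range(-n, n + 1)}

# The extreme value X - 2nd needs every error to be negative: (1/2)**(2n).
print(probabilities[-n], 0.5 ** (2 * n))

# The probabilities are symmetric in r and add up to 1.
print(probabilities[2], probabilities[-2])
print(sum(probabilities.values()))
```

Negative values of r correspond to a net negative error, so the dictionary covers every attainable value from $X - 2nd$ to $X + 2nd$.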

Applying Stirling's Formula to the Probability P(E)

From before, we have

$$P(E) = \binom{2n}{n+r}\left(\frac{1}{2}\right)^{n+r}\left(\frac{1}{2}\right)^{n-r} = \frac{(2n)!\left(\frac{1}{2}\right)^{2n}}{(n+r)!\,(n-r)!}.$$

Using Stirling's formula, $n! \approx \sqrt{2\pi n}\, e^{-n} n^{n}$, we can use the approximation

$$P(E) = \frac{(2n)!\left(\frac{1}{2}\right)^{2n}}{(n+r)!\,(n-r)!} \approx \frac{\sqrt{2\pi(2n)}\; e^{-2n}(2n)^{2n}\left(\frac{1}{2}\right)^{2n}}{\sqrt{2\pi(n+r)}\; e^{-(n+r)}(n+r)^{(n+r)}\,\sqrt{2\pi(n-r)}\; e^{-(n-r)}(n-r)^{(n-r)}}.$$
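To get a feel for how good Stirling's formula is, the following sketch compares it with the exact factorial. It works in log space (using `math.lgamma`, where `lgamma(m + 1)` equals $\ln m!$) so that large arguments do not overflow. The helper names are hypothetical.

```python
import math

def log_stirling(m):
    """Natural log of Stirling's approximation sqrt(2*pi*m) * e**(-m) * m**m."""
    return 0.5 * math.log(2 * math.pi * m) - m + m * math.log(m)

def stirling_over_exact(m):
    """Ratio of Stirling's approximation to m!, computed in log space."""
    return math.exp(log_stirling(m) - math.lgamma(m + 1))

# The ratio climbs toward 1 as m grows.
for m in (5, 10, 100, 1000):
    print(m, stirling_over_exact(m))
```

The ratio is slightly below 1 for every m shown, and the shortfall shrinks as m grows, which is why the approximation is reserved for large factorials.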

This expression can be simplified greatly. We will proceed in several steps. The first step is to remove the radicals by dividing out common factors and rewriting as exponents.

$$\frac{\sqrt{2\pi(2n)}\; e^{-2n}(2n)^{2n}\left(\frac{1}{2}\right)^{2n}}{\sqrt{2\pi(n+r)}\; e^{-(n+r)}(n+r)^{(n+r)}\,\sqrt{2\pi(n-r)}\; e^{-(n-r)}(n-r)^{(n-r)}} = \frac{e^{-2n}(2n)^{2n+\frac{1}{2}}\left(\frac{1}{2}\right)^{2n}}{e^{-(n+r)}(n+r)^{\left(n+r+\frac{1}{2}\right)}\,\sqrt{2\pi}\; e^{-(n-r)}(n-r)^{\left(n-r+\frac{1}{2}\right)}}$$

Next, combine common exponential expressions.

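$$\frac{e^{-2n}(2n)^{2n+\frac{1}{2}}\left(\frac{1}{2}\right)^{2n}}{e^{-(n+r)}(n+r)^{\left(n+r+\frac{1}{2}\right)}\,\sqrt{2\pi}\; e^{-(n-r)}(n-r)^{\left(n-r+\frac{1}{2}\right)}} = \frac{n^{2n+\frac{1}{2}}}{(n+r)^{\left(n+r+\frac{1}{2}\right)}\,\sqrt{\pi}\,(n-r)^{\left(n-r+\frac{1}{2}\right)}}.$$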

Next, we factor out an $n$ from each term in the denominator,

2n 1 2n 1 n 2 n 2 1 1  1 1 cnr 2 h cnr 2 h cnr 2 h cnr 2 h bn  rg  bn  rg F r I F r I nGF1 JI  nGF1 JI HGH nKKJ HGH nKKJ and dividing out common factors of n,

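$$\frac{n^{2n+\frac{1}{2}}}{\sqrt{\pi}\; n^{2n+1}\left(1+\frac{r}{n}\right)^{\left(n+r+\frac{1}{2}\right)}\left(1-\frac{r}{n}\right)^{\left(n-r+\frac{1}{2}\right)}} = \frac{1}{\sqrt{\pi n}\left(1+\frac{r}{n}\right)^{\left(n+r+\frac{1}{2}\right)}\left(1-\frac{r}{n}\right)^{\left(n-r+\frac{1}{2}\right)}}.$$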

This last expression can be written as

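$$\frac{1}{\sqrt{\pi n}\left(1+\frac{r}{n}\right)^{\left(n+r+\frac{1}{2}\right)}\left(1-\frac{r}{n}\right)^{\left(n-r+\frac{1}{2}\right)}} = \frac{1}{\sqrt{\pi n}\left(1-\frac{r^{2}}{n^{2}}\right)^{n}\left(1-\frac{r^{2}}{n^{2}}\right)^{\frac{1}{2}}\left(1+\frac{r}{n}\right)^{r}\left(1-\frac{r}{n}\right)^{-r}}.$$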

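So, the binomial probability satisfies

$$\binom{2n}{n+r}\left(\frac{1}{2}\right)^{n+r}\left(\frac{1}{2}\right)^{n-r} \approx \frac{1}{\sqrt{\pi n}\left(1-\frac{r^{2}}{n^{2}}\right)^{n}\left(1-\frac{r^{2}}{n^{2}}\right)^{\frac{1}{2}}\left(1+\frac{r}{n}\right)^{r}\left(1-\frac{r}{n}\right)^{-r}}.$$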
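The simplification steps are exact rearrangements of the Stirling-substituted ratio, so they can be sanity-checked numerically. The sketch below evaluates the ratio before simplification and the two simplified forms. The function names and the sample values n = 50, r = 10 are assumptions made for illustration.

```python
import math

def log_stirling(m):
    """Natural log of sqrt(2*pi*m) * e**(-m) * m**m."""
    return 0.5 * math.log(2 * math.pi * m) - m + m * math.log(m)

def stirling_ratio(n, r):
    """P(E) with every factorial replaced by Stirling's formula (needs |r| < n)."""
    log_value = (log_stirling(2 * n) - 2 * n * math.log(2)
                 - log_stirling(n + r) - log_stirling(n - r))
    return math.exp(log_value)

def simplified_form(n, r):
    """1 / (sqrt(pi*n) * (1 + r/n)**(n+r+1/2) * (1 - r/n)**(n-r+1/2))."""
    x = r / n
    log_value = (-0.5 * math.log(math.pi * n)
                 - (n + r + 0.5) * math.log1p(x)
                 - (n - r + 0.5) * math.log1p(-x))
    return math.exp(log_value)

def factored_form(n, r):
    """The same quantity with the powers regrouped in terms of 1 - r**2/n**2."""
    x = r / n
    denominator = (math.sqrt(math.pi * n) * (1 - x * x) ** n * (1 - x * x) ** 0.5
                   * (1 + x) ** r * (1 - x) ** (-r))
    return 1 / denominator

n, r = 50, 10  # illustrative values
print(stirling_ratio(n, r), simplified_form(n, r), factored_form(n, r))
```

All three printed values should agree to within rounding error. Any visible difference would point to an algebra slip rather than to the quality of Stirling's approximation.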

Now, as n increases, we can approximate the expressions in r.

1 r r r 2 2 r r F I F1 I  1 F1 I  1 As n gets larger and larger, we have 1 2  1, G J , G J , and HG n KJ H nK H nK

r2 Fn I n n n 2 2 F 2 Hr KI r2 2 F r I F r I 1 n  r 1  1  e  e n . G 2 J GG 2 J J c h H n K HGH n K KJ 1 2n 骣 2 (2n) !琪 - r The final result suggests that 桫2 e n for large n. P( E) = (n+ r)!( n - r) ! p n
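The limiting behavior of the $r$-dependent factors can be explored numerically. The sketch below assumes, purely for illustration, that $r$ is taken equal to $\sqrt{n}$ so that $r^{2}/n = 1$. It compares each factor with its exponential limit. The function name `r_factors` is hypothetical.

```python
import math

def r_factors(n, r):
    """Return the three r-dependent factors from the denominator of P(E)."""
    half_power = (1 - r**2 / n**2) ** 0.5
    nth_power = (1 - r**2 / n**2) ** n
    cross_terms = (1 + r / n) ** r * (1 - r / n) ** (-r)
    return half_power, nth_power, cross_terms

for n in (100, 10_000, 1_000_000):
    r = math.isqrt(n)  # assumption: r of order sqrt(n), so r*r/n equals 1
    half_power, nth_power, cross_terms = r_factors(n, r)
    product = half_power * nth_power * cross_terms
    print(
        n,
        half_power,                                # compared with 1
        nth_power / math.exp(-r * r / n),          # compared with exp(-r^2/n)
        cross_terms / math.exp(2 * r * r / n),     # compared with exp(2 r^2/n)
        product / math.exp(r * r / n),             # compared with exp(r^2/n)
    )
```

Each printed ratio should drift toward 1 as n grows. The overall product therefore behaves like $e^{r^{2}/n}$, whose reciprocal supplies the factor $e^{-r^{2}/n}$ in the final approximation.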

Checking a Few Values

We can easily check this approximation. What is the probability that 100 independent errors of size 0.1 result in a final error of size 2? Here $d = 0.1$, $R = 2$, $n = 50$, and since $R = 2rd$, $r = 10$. So

$$P(2) = \frac{(100)!\left(\frac{1}{2}\right)^{100}}{(60)!\,(40)!} \approx 0.0108 \quad\text{and}\quad P(2) \approx \frac{e^{-\frac{100}{50}}}{\sqrt{50\pi}} \approx 0.0108.$$

What is the probability that 24 errors of size 0.01 result in a final error of 0.06? In this case $d = 0.01$, $R = 0.06$, $n = 12$, and $r = 3$. So

$$P(0.06) = \frac{(24)!\left(\frac{1}{2}\right)^{24}}{(15)!\,(9)!} \approx 0.0779 \quad\text{and}\quad P(0.06) \approx \frac{e^{-\frac{9}{12}}}{\sqrt{12\pi}} \approx 0.0769.$$

The probability that 2000 errors of size 0.01 result in a final error between 1 and 2 can be found by letting $r$ vary from 50 to 100 and adding up all the terms,

$$\sum_{r=50}^{100} \frac{(2000)!\left(\frac{1}{2}\right)^{2000}}{(1000+r)!\,(1000-r)!}.$$

These values are all too big to handle. We can, however, find the sum

$$\sum_{r=50}^{100} \frac{e^{-\frac{r^{2}}{1000}}}{\sqrt{1000\pi}} \approx 0.0134.$$
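These checks can be reproduced with a short script. Python's integers have arbitrary precision, so the exact binomial terms can be evaluated even for 2000 errors. The function names below are illustrative and not part of the original paper.

```python
import math

def exact_probability(n, r):
    """Exact binomial probability of n + r positive errors out of 2n."""
    return math.comb(2 * n, n + r) / 2 ** (2 * n)

def approx_probability(n, r):
    """Large-n approximation exp(-r**2 / n) / sqrt(pi * n)."""
    return math.exp(-r * r / n) / math.sqrt(math.pi * n)

# 100 errors of size 0.1, final error 2: n = 50, r = 10.
print(exact_probability(50, 10), approx_probability(50, 10))

# 24 errors of size 0.01, final error 0.06: n = 12, r = 3.
print(exact_probability(12, 3), approx_probability(12, 3))

# 2000 errors of size 0.01, final error between 1 and 2: n = 1000, r = 50..100.
exact_sum = sum(exact_probability(1000, r) for r in range(50, 101))
approx_sum = sum(approx_probability(1000, r) for r in range(50, 101))
print(exact_sum, approx_sum)
```

The division of one large integer by another is carried out exactly before being rounded to a float, so the exact sum does not overflow even though the individual factorials are enormous.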

The Variance

Clearly, as the number of errors ($2n$) increases, the variability of the results also increases. In the special binomial case considered here, we have $p = \frac{1}{2}$. The standard notation for the variance of the binomial is $\sigma^{2} = np(1-p)$, where $n$ in this formula is the number of trials (in our case, this is actually $2n$). So $\sigma^{2} = 2n\left(\frac{1}{2}\right)\left(\frac{1}{2}\right) = \frac{n}{2}$ and $n = 2\sigma^{2}$. Substituting into $P(E)$, we have

$$P(E) \approx \frac{e^{-\frac{r^{2}}{n}}}{\sqrt{\pi n}} = \frac{e^{-\frac{r^{2}}{2\sigma^{2}}}}{\sigma\sqrt{2\pi}}.$$

This is more recognizable as the pdf for the normal distribution $N(0, \sigma)$.
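The substitution $n = 2\sigma^{2}$ can be confirmed against a library implementation of the normal density. This sketch uses `statistics.NormalDist` from the Python standard library. The sample values n = 50 and r = 10 are illustrative.

```python
import math
from statistics import NormalDist

def approx_probability(n, r):
    """Large-n approximation exp(-r**2 / n) / sqrt(pi * n)."""
    return math.exp(-r * r / n) / math.sqrt(math.pi * n)

n, r = 50, 10
sigma = math.sqrt(n / 2)  # from sigma**2 = 2n * (1/2) * (1/2) = n / 2

# Density of a normal distribution with mean 0 and standard deviation sigma.
normal_density = NormalDist(mu=0.0, sigma=sigma).pdf(r)
print(approx_probability(n, r), normal_density)
```

The two numbers should match to rounding error, since the two expressions are algebraically identical once $n = 2\sigma^{2}$.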

Reference: Baird, D. C., Experimentation: An Introduction to Measurement Theory and Experiment Design, 2nd Edition, Prentice-Hall, Englewood Cliffs, NJ, 1988.
