The Distribution of Errors: A Derivation of the Normal Distribution Using Calculus and Combinatorics

Dan Teague
NC School of Science and Mathematics
Consider a situation in which you are trying to generate the value $X$. In the process, there are $2n$ random errors, each of size $d$ and each positive or negative with equal probability. The actual value generated, then, will vary from $X - 2nd$ to $X + 2nd$.

The probability of generating the value $X - 2nd$ is $\left(\frac{1}{2}\right)^{2n}$, since all of the errors of size $d$ had to be negative. The probability of generating the value $X - (2n - 6)d$ is $\binom{2n}{3}\left(\frac{1}{2}\right)^{3}\left(\frac{1}{2}\right)^{2n-3}$, since this value can only be generated if 3 of the errors were positive and the remaining $2n - 3$ errors were negative. The order in which the positive and negative errors occur does not matter.

We want to investigate the probability of an error of size $R = 2rd$. An error of size $R$ requires that $n + r$ of the errors are positive and $n - r$ of the errors are negative. The probability that this occurs is

$$P(E) = \binom{2n}{n+r}\left(\frac{1}{2}\right)^{n+r}\left(\frac{1}{2}\right)^{n-r} = \frac{(2n)!\left(\frac{1}{2}\right)^{2n}}{(n+r)!\,(n-r)!}.$$

We want to approximate this probability using Stirling's approximation for large factorials, $n! \approx \sqrt{2\pi n}\; e^{-n} n^{n}$.
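For small cases the formula for $P(E)$ can be evaluated exactly. The following is a minimal Python sketch, not part of the original derivation; the function name `exact_probability` is a hypothetical choice, and the code assumes only the standard library (`math.comb` and `fractions.Fraction`).

```python
from fractions import Fraction
from math import comb


def exact_probability(n: int, r: int) -> Fraction:
    """Exact P(E): n + r of the 2n errors are positive, n - r are negative."""
    if abs(r) > n:
        return Fraction(0)
    # C(2n, n + r) arrangements, each occurring with probability (1/2)^(2n).
    return Fraction(comb(2 * n, n + r), 2 ** (2 * n))


if __name__ == "__main__":
    n = 3  # six errors in total
    for r in range(-n, n + 1):
        print(f"r = {r:2d}   P(E) = {exact_probability(n, r)}")
```

For $n = 3$ the seven probabilities add up to 1, and the extreme cases $r = \pm 3$ each have probability $1/64 = (1/2)^{6}$, which matches the all-negative case described above.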
Applying Stirling's Formula to the Probability P(E)

From before, we have

$$P(E) = \binom{2n}{n+r}\left(\frac{1}{2}\right)^{n+r}\left(\frac{1}{2}\right)^{n-r} = \frac{(2n)!\left(\frac{1}{2}\right)^{2n}}{(n+r)!\,(n-r)!}.$$

Using Stirling's formula, $n! \approx \sqrt{2\pi n}\; e^{-n} n^{n}$, we can use the approximation

$$P(E) = \frac{(2n)!\left(\frac{1}{2}\right)^{2n}}{(n+r)!\,(n-r)!} \approx \frac{\sqrt{2\pi(2n)}\; e^{-2n}(2n)^{2n}\left(\frac{1}{2}\right)^{2n}}{\sqrt{2\pi(n+r)}\; e^{-(n+r)}(n+r)^{n+r}\;\sqrt{2\pi(n-r)}\; e^{-(n-r)}(n-r)^{n-r}}.$$
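How good is Stirling's approximation at the sizes used here? The following is a minimal sketch for checking this, assuming nothing beyond Python's standard `math` module; `stirling` is a hypothetical helper name, not something from the original text.

```python
import math


def stirling(n: int) -> float:
    """Stirling's approximation: n! ~ sqrt(2*pi*n) * e^(-n) * n^n.

    Evaluated directly, so it is only suitable for moderate n
    (n^n overflows a float well before n = 200).
    """
    return math.sqrt(2 * math.pi * n) * math.exp(-n) * n ** n


if __name__ == "__main__":
    for n in (1, 5, 10, 50, 100):
        ratio = stirling(n) / math.factorial(n)
        print(f"n = {n:3d}   Stirling / n! = {ratio:.6f}")
```

The ratio approaches 1 as $n$ grows (the relative error shrinks roughly like $1/(12n)$), which is why the approximation is reserved for large factorials.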
This expression can be simplified greatly. We will proceed in several steps. The first step is to remove the radicals by dividing out common factors and rewriting as exponents.
$$\frac{\sqrt{2\pi(2n)}\; e^{-2n}(2n)^{2n}\left(\frac{1}{2}\right)^{2n}}{\sqrt{2\pi(n+r)}\; e^{-(n+r)}(n+r)^{n+r}\;\sqrt{2\pi(n-r)}\; e^{-(n-r)}(n-r)^{n-r}} = \frac{e^{-2n}(2n)^{2n+\frac{1}{2}}\left(\frac{1}{2}\right)^{2n}}{e^{-(n+r)}(n+r)^{n+r+\frac{1}{2}}\;\sqrt{2\pi}\; e^{-(n-r)}(n-r)^{n-r+\frac{1}{2}}}$$

Next, combine common exponential expressions.

$$\frac{e^{-2n}(2n)^{2n+\frac{1}{2}}\left(\frac{1}{2}\right)^{2n}}{e^{-(n+r)}(n+r)^{n+r+\frac{1}{2}}\;\sqrt{2\pi}\; e^{-(n-r)}(n-r)^{n-r+\frac{1}{2}}} = \frac{n^{2n+\frac{1}{2}}}{(n+r)^{n+r+\frac{1}{2}}\;\sqrt{\pi}\;(n-r)^{n-r+\frac{1}{2}}}.$$
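Next, we factor out an $n$ from the terms in the denominator,

$$\frac{n^{2n+\frac{1}{2}}}{\sqrt{\pi}\,(n+r)^{n+r+\frac{1}{2}}(n-r)^{n-r+\frac{1}{2}}} = \frac{n^{2n+\frac{1}{2}}}{\sqrt{\pi}\left[n\left(1+\frac{r}{n}\right)\right]^{n+r+\frac{1}{2}}\left[n\left(1-\frac{r}{n}\right)\right]^{n-r+\frac{1}{2}}},$$

and dividing out common factors of $n$,

$$\frac{n^{2n+\frac{1}{2}}}{\sqrt{\pi}\; n^{n+r+\frac{1}{2}}\left(1+\frac{r}{n}\right)^{n+r+\frac{1}{2}} n^{n-r+\frac{1}{2}}\left(1-\frac{r}{n}\right)^{n-r+\frac{1}{2}}} = \frac{1}{\sqrt{\pi n}\left(1+\frac{r}{n}\right)^{n+r+\frac{1}{2}}\left(1-\frac{r}{n}\right)^{n-r+\frac{1}{2}}}.$$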
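This last expression can be written as

$$\frac{1}{\sqrt{\pi n}\left(1+\frac{r}{n}\right)^{n+r+\frac{1}{2}}\left(1-\frac{r}{n}\right)^{n-r+\frac{1}{2}}} = \frac{1}{\sqrt{\pi n}\left(1-\frac{r^{2}}{n^{2}}\right)^{n}\left(1-\frac{r^{2}}{n^{2}}\right)^{\frac{1}{2}}\left(1+\frac{r}{n}\right)^{r}\left(1-\frac{r}{n}\right)^{-r}}.$$

So, the binomial probability satisfies

$$\binom{2n}{n+r}\left(\frac{1}{2}\right)^{n+r}\left(\frac{1}{2}\right)^{n-r} \approx \frac{1}{\sqrt{\pi n}\left(1-\frac{r^{2}}{n^{2}}\right)^{n}\left(1-\frac{r^{2}}{n^{2}}\right)^{\frac{1}{2}}\left(1+\frac{r}{n}\right)^{r}\left(1-\frac{r}{n}\right)^{-r}}.$$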
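The algebra up to this point can be sanity-checked numerically by comparing the exact binomial probability with the product form obtained from Stirling's formula. This is a minimal sketch under stated assumptions: the function names `exact_probability` and `stirling_form` are hypothetical, and the product form is coded as $1 / \left[\sqrt{\pi n}\,(1 - r^2/n^2)^{n}(1 - r^2/n^2)^{1/2}(1 + r/n)^{r}(1 - r/n)^{-r}\right]$.

```python
import math
from math import comb


def exact_probability(n: int, r: int) -> float:
    """Exact binomial probability (2n)! (1/2)^(2n) / ((n+r)! (n-r)!)."""
    return comb(2 * n, n + r) / 2 ** (2 * n)


def stirling_form(n: int, r: int) -> float:
    """Product form reached after applying Stirling's formula (requires r < n)."""
    x = r / n
    denominator = (
        math.sqrt(math.pi * n)
        * (1 - x * x) ** n
        * (1 - x * x) ** 0.5
        * (1 + x) ** r
        * (1 - x) ** (-r)
    )
    return 1 / denominator


if __name__ == "__main__":
    for n, r in [(12, 3), (50, 10), (200, 20)]:
        e, s = exact_probability(n, r), stirling_form(n, r)
        print(f"n = {n:3d}  r = {r:2d}  exact = {e:.6f}  product form = {s:.6f}")
```

The two columns should agree more closely as $n$ grows, since the only approximation made so far is Stirling's formula itself.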
Now, as n increases, we can approximate the expressions in r.
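As $n$ gets larger and larger, with $r$ small compared to $n$, we have $\left(1-\frac{r^{2}}{n^{2}}\right)^{\frac{1}{2}} \approx 1$ and

$$\left(1-\frac{r^{2}}{n^{2}}\right)^{n} = \left[\left(1-\frac{r^{2}}{n^{2}}\right)^{\frac{n^{2}}{r^{2}}}\right]^{\frac{r^{2}}{n}} \approx \left(e^{-1}\right)^{\frac{r^{2}}{n}} = e^{-r^{2}/n}.$$

The two remaining factors cannot simply be replaced by 1. Since $\ln\left(1 \pm \frac{r}{n}\right) \approx \pm\frac{r}{n}$,

$$\left(1+\frac{r}{n}\right)^{r}\left(1-\frac{r}{n}\right)^{-r} = \exp\left(r\left[\ln\left(1+\frac{r}{n}\right) - \ln\left(1-\frac{r}{n}\right)\right]\right) \approx e^{2r^{2}/n}.$$

The denominator is therefore approximately $\sqrt{\pi n}\; e^{-r^{2}/n}\, e^{2r^{2}/n} = \sqrt{\pi n}\; e^{r^{2}/n}$. The final result suggests that

$$P(E) = \frac{(2n)!\left(\frac{1}{2}\right)^{2n}}{(n+r)!\,(n-r)!} \approx \frac{e^{-r^{2}/n}}{\sqrt{\pi n}}$$

for large $n$.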
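The claim that the approximation improves for large $n$ can be illustrated numerically. The following is a minimal sketch, assuming only Python's standard library; the helper names are hypothetical, and the choice $r \approx \sqrt{n}$ (so that $r^{2}/n$ stays near 1) is an assumption made for this illustration only.

```python
import math
from math import comb


def exact_probability(n: int, r: int) -> float:
    """Exact binomial probability (2n)! (1/2)^(2n) / ((n+r)! (n-r)!)."""
    return comb(2 * n, n + r) / 2 ** (2 * n)


def approx_probability(n: int, r: int) -> float:
    """Large-n approximation e^(-r^2/n) / sqrt(pi * n)."""
    return math.exp(-r * r / n) / math.sqrt(math.pi * n)


def relative_error(n: int, r: int) -> float:
    """Relative difference between the approximation and the exact value."""
    return abs(approx_probability(n, r) / exact_probability(n, r) - 1)


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        r = round(math.sqrt(n))  # keeps r^2/n close to 1 as n grows
        print(f"n = {n:5d}  r = {r:3d}  relative error = {relative_error(n, r):.5f}")
```

The relative error should shrink steadily down the table, consistent with the approximation being a large-$n$ result.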
Checking a Few Values

We can easily check this approximation. What is the probability that 100 independent errors of size 0.1 result in a final error of size 2? Here $d = 0.1$, $R = 2$, $n = 50$ (so $2n = 100$), and since $R = 2rd$, $r = 10$. So,

$$P(2) = \frac{(100)!\left(\frac{1}{2}\right)^{100}}{(60)!\,(40)!} \approx 0.0108 \quad\text{and}\quad P(2) \approx \frac{e^{-100/50}}{\sqrt{50\pi}} \approx 0.0108.$$

What is the probability that 24 errors of size 0.01 result in a final error of 0.06? In this case $d = 0.01$, $R = 0.06$, $n = 12$, and $r = 3$. So

$$P(0.06) = \frac{(24)!\left(\frac{1}{2}\right)^{24}}{(15)!\,(9)!} \approx 0.0779 \quad\text{and}\quad P(0.06) \approx \frac{e^{-9/12}}{\sqrt{12\pi}} \approx 0.0769.$$

The probability that 2000 errors of size 0.01 result in a final error between 1 and 2 can be found by letting $r$ vary from 50 to 100 (with $n = 1000$) and adding up all the terms,

$$\sum_{r=50}^{100} \frac{(2000)!\left(\frac{1}{2}\right)^{2000}}{(1000+r)!\,(1000-r)!}.$$

These factorials are all too big to handle on a typical calculator. We can, however, find the sum

$$\sum_{r=50}^{100} \frac{e^{-r^{2}/1000}}{\sqrt{1000\pi}} \approx 0.0134.$$
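The checks in this section can be reproduced in a few lines. The following is a minimal sketch, assuming only Python's standard library; the helper names `exact_probability`, `approx_probability`, and `summed` are hypothetical. Python integers have arbitrary precision, so the exact sum can be evaluated directly alongside the approximate one.

```python
import math
from math import comb


def exact_probability(n: int, r: int) -> float:
    """Exact binomial probability (2n)! (1/2)^(2n) / ((n+r)! (n-r)!)."""
    # Integer division of big ints is rounded correctly to a float.
    return comb(2 * n, n + r) / 2 ** (2 * n)


def approx_probability(n: int, r: int) -> float:
    """Large-n approximation e^(-r^2/n) / sqrt(pi * n)."""
    return math.exp(-r * r / n) / math.sqrt(math.pi * n)


def summed(probability, n: int, r_low: int, r_high: int) -> float:
    """Add probability(n, r) over r = r_low, ..., r_high (inclusive)."""
    return sum(probability(n, r) for r in range(r_low, r_high + 1))


if __name__ == "__main__":
    for n, r in [(50, 10), (12, 3)]:
        print(f"n = {n}, r = {r}: exact = {exact_probability(n, r):.4f}, "
              f"approx = {approx_probability(n, r):.4f}")
    print("sum over r = 50..100 with n = 1000:")
    print(f"  exact  = {summed(exact_probability, 1000, 50, 100):.4f}")
    print(f"  approx = {summed(approx_probability, 1000, 50, 100):.4f}")
```

Passing the probability function as an argument to `summed` keeps the exact and approximate sums on the same code path, so any difference in the output comes from the formulas rather than from the summation.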
The Variance

Clearly, as the number of errors ($2n$) increases, the variability of the results also increases. In the special binomial case considered here, we have $p = \frac{1}{2}$. The standard notation for the variance of the binomial is $\sigma^{2} = np(1-p)$, where $n$ in this formula is the number of trials (in our case, this is actually $2n$). So, $\sigma^{2} = 2n\left(\frac{1}{2}\right)\left(\frac{1}{2}\right) = \frac{n}{2}$ and $n = 2\sigma^{2}$. Substituting into $P(E)$, we have

$$P(E) \approx \frac{e^{-r^{2}/n}}{\sqrt{\pi n}} = \frac{e^{-r^{2}/(2\sigma^{2})}}{\sigma\sqrt{2\pi}}.$$

This is more recognizable as the pdf for the normal distribution $N(0, \sigma^{2})$, with mean 0 and variance $\sigma^{2}$.
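The substitution $n = 2\sigma^{2}$ can be confirmed numerically against a library implementation of the normal density. This is a minimal sketch, assuming Python 3.8 or later for `statistics.NormalDist`; the function names are hypothetical.

```python
import math
from statistics import NormalDist


def approx_probability(n: int, r: int) -> float:
    """Large-n approximation e^(-r^2/n) / sqrt(pi * n)."""
    return math.exp(-r * r / n) / math.sqrt(math.pi * n)


def normal_pdf_form(n: int, r: int) -> float:
    """Normal pdf with mean 0 and variance sigma^2 = n/2, evaluated at r."""
    sigma = math.sqrt(n / 2)  # from sigma^2 = 2n * (1/2) * (1/2) = n/2
    return NormalDist(mu=0.0, sigma=sigma).pdf(r)


if __name__ == "__main__":
    for n, r in [(12, 3), (50, 10), (1000, 50)]:
        print(f"n = {n:4d}  r = {r:2d}  "
              f"approximation = {approx_probability(n, r):.6f}  "
              f"normal pdf = {normal_pdf_form(n, r):.6f}")
```

The two columns agree to floating-point precision, because the expressions are algebraically identical once $n = 2\sigma^{2}$ is substituted.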
Reference: Baird, D. C., Experimentation: An Introduction to Measurement Theory and Experiment Design, 2nd Edition, Prentice-Hall, Englewood Cliffs, NJ, 1988.