A review and comparison of continuity correction rules; the normal approximation to the Presenter: Yu-Ting Liao Advisor: Takeshi Emura Graduate Institute of Statistics, National Central University, Taiwan ABSTRACT: In applied statistics, the continuity correction is useful when the binomial distribution is approximated by the . In the first part of this thesis, we review the binomial distribution and the central limit theorem. If the sample size gets larger, the binomial distribution approaches to the normal distribution. The continuity correction is an adjustment that is made to further improve the normal approximation, also known as Yates’s correction for continuity (Yates, 1934; Cox, 1970). We also introduce Feller’s correction (Feller, 1968) and Cressie’s finely tuned continuity correction (Cressie, 1978), both of which are less known for statisticians. In particular, we review the mathematical derivations of the Cressie’s correction. Finally, we report some interesting results that these less known corrections are superior to Yates’s correction in practical settings. Theory that is, Design n( X   ) LetX be a having a probability n d N( 0,1). We choose pairs of( n, p ) for numerical function  n analyses. We follow the rule( np 15 ) or   k nk SinceX1, X 2 ,,X n is a sequence of independent pX ( k )  Pr( X  k )    p (1 p ) , ( np 10 and p  0.1) by Emura and Lin (2015) k  and identically distribution and mgfs exist, so we to choose the value of( n, p ) . For fixed p wheren { 0,1, 2,}, ,0  p 1 and 0  k  n. canuseCentralLimitTheorem(Casellaand Berger 2002), if the sample size is sufficient large. andn we try to findk such that The parameterp is the probability of an event.  k  np  The binomial distribution can be approximated to    0.9973. n The parameter is the number of trails. The the normal distribution.  np(1 p )  cumulative distribution function is X  np d This yields Z   N( 0,1), 1 k n np(1 p ) k  [ np   (  ) np(1 p ) ]. F ( k )  Pr( X  k )    pi (1 p )ni X  We consider  0.0027, 0.95, and 0.05. i0 i  the distribution ofX will be normal and for large The absolute error of the distribution The mean and ofX are n approximately normal, function,  k  d  np  Err( n, p, k )  F ( k )    . E( X )  np Var( X )  np(1 p ). FX ( k )  Pr( X  k ) X and  np(1 p )  n  X  np k  np  where without continuity correction, One can write , where X , X , , X  Pr   d  0 X   X j 1 2  n j1  np(1 p ) np(1 p )  d  0.3 with Feller’s correction,d  0.5 with with Yates’s correction, and  k  np  are independent Bernoulli random variable with 2  Pr Z   d 1/ 2  ( q  p )(  k1/ 2 1) / 6 with Cressie’s  np(1 p )  Pr( X j  0 ) 1 p and Pr( X j 1)  p. finely tuned correction.

n\p 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.1 0.11 0.12 0.13 0.14 0.15 0.160.170.180.19  k  np  10 20 30    , 40 50 Definition (Casella and Berger 2002) 60 np(1 p ) 70   80 90 100 150 200 250 300 2 350 X , X , , u 400 Let1 2 be a random variable with x  450  1 500 2 550 where ( x )  e du. 600 f 650 probability density functionX .The moment - 700 750 2 800 M ( t )  E( etx ), generating function (mgf) ofX , is X We choose the boundary of the rule. if the expectation exists fort, allt in  h  t  h, Continuity correction for some h  0. If the expectation does not exist in A continuity correction is an adjustment that is a neighborhood of 0, we say that the moment made when a discrete distribution is approximated generation function does not exist. by a continuous distribution. Since the binomial IfX is continuous, we can write the mgf ofX as  distribution is discrete and normal distribution is tx M X ( t )  e f X ( x ) dx, continuous, it is common practice to use  continuity correction in the approximation. The and ifX is discrete, we can write the mgf ofX as most popular continuity correction is the Yate tx tx M X ( t )  e Pr( X  x )  e f ( x ). correction (Yates, 1934; Cox, 1970) x x  k  0.5  np  Now we calculate the mgf’s of X j FX ( k )   . 1  np(1 p )  M ( t )  E( etX j )  etx p x (1 p )1x X j  x0 Another continuity correction is the Feller  et0  p0  (1 p )  et1 p1  (1 p )11 correction (Feller, 1968)  pet  (1 p ).  k  0.3  np  F ( k )   . X np(1 p ) The expectation exists for all t in h  t  h for   References any h  0. Thus, the moment generating function is exist. Casella, G and Berger, R. L. (2002). Statistical Inference, 2rd ed. Duxbury Press, Australia. Central Limit Theorem Cressie’s finely tuned continuity correction Cox, D. R. (1970). The continuity correction. Biometrika 57: 217-219. LetX1, X 2 ,, be a sequence of independent and We wish to approximate, 2 Cressie, N. (1978). A finely tuned continuity correction, identically distribution random variables whose d 1/ 2  ( q  p )(  k1/ 2 1)/ 6, by Ann. Inst. Statist. Math 30: 435-442. E( X )   1/ 2 mgfs exist in a neighborhoodn of 0. Let j ( ( k  np  d )( np(1 p ) ) ). Emura, T. and Lin, Y. S. (2015). A comparison of and define LetG ( x ) denote the d  0.5 normal approximation rules for attribute control charts. X n  X j / n. n Typically the value ofd chosen to be j1 (Yates, 1934; Cox, 1970). Cressie (1978) Quality and Reliability Engineering International 31 cumulative distribution function of (No.3): 411–418. n( X n   ) /. proposed a method to improve continuity Feller, W. (1968). An Introduction to Then, for any x,    x  , correction of d  0.5. and Its Applications, volume I, 3rd ed. John Wiley & x Sons, Inc. New York. 1  y 2 / 2 limGn ( x )  e dy; n   2