Degree Project

Edgeworth Expansion and Saddle Point Approximation for Discrete Data with Application to Chance Games.

Rani Basna 2010-09-27 Subject: Mathematics Level: Master Course code: 5MA11E

Edgeworth Expansion and Saddle Point Approximation for Discrete Data With Application to Chance Games

Abstract

We investigate two mathematical tools, the Edgeworth expansion and the saddle point method, which are approximation techniques for estimating the distribution function of the standardized mean of independent identically distributed random variables, with particular attention to the lattice case. We then describe an important application of these tools: game developing companies can use them to reduce the amount of time needed to meet their quality requirements before approving a game.

Keywords

Characteristic function, Edgeworth expansion, Lattice random variables, Saddle point approximation.

Acknowledgments

First I would like to show my gratitude to my supervisor Dr. Roger Pettersson for his continuous support and help. It is a pleasure to thank Boss Media® for giving me this problem. I would also like to thank my fiancée Hiba Nassar for her encouragement. Finally I want to thank my parents for their love, and I dedicate this thesis to my father, who inspired me.

Rani Basna Number of Pages: 45

Contents

1 Introduction
2 Notations and Definitions
  2.1 Characteristic Function
  2.2 Central Limit Theorem
  2.3 Definition of Lattice Distribution and Bounded Variation
3 Edgeworth Expansion
  3.1 First Case
  3.2 Second Case
    3.2.1 Auxiliary Theorems
    3.2.2 On The Remainder Term
    3.2.3 Main Theorem
    3.2.4 Simulation Edgeworth Expansion for Continuous Random Variables
  3.3 Lattice Edgeworth Expansion
    3.3.1 The Bernoulli Case
    3.3.2 Simulating for the Edgeworth Expansion Bernoulli Case
    3.3.3 General Lattice Case
    3.3.4 Continuity-Corrected Points Edgeworth Series
    3.3.5 Simulation on Triss Cards
4 Saddle Point Approximation
  4.1 Simulation With Saddle Point Approximation Method
A Matlab Codes
References

1 Introduction

The basic idea in this Master's Thesis is to define, adjust, and apply two mathematical tools: the Edgeworth expansion and the saddle point approximation. Mainly we want to estimate the cumulative distribution function for independent and identically distributed random variables, specifically lattice random variables. These approximating methods make it possible to work with fewer independent random variables than the usual normal approximation based on the central limit theorem requires, which makes them more applicable in real-life industry. More precisely, the chance game industry may use these methods to shorten the time needed to publish a new trusted game. We write Matlab code to verify the theoretical results, by simulating a Triss game similar to real ones with three boxes, and then apply the code to this game to see how accurate our methods are. In the second chapter we define some basic concepts and present important theorems that will help us in our work. In the third chapter we introduce the Edgeworth expansion series, together with the improvement of the remainder term that Esseen (1968) [11] presents; we then discuss the lattice random variables case, which is the most important one for us since we face it in the applications, and apply the Edgeworth method to our main problem. In the fourth chapter we describe the most important and useful formulas of the saddle point approximation technique, without theoretical details, and finally apply the saddle point approximation method to our problem.

2 Notations and Definitions

2.1 Characteristic Function

The definitions and theorems presented below can be found, for example, in [14], [18], and [19].

Definition 2.1 (Distribution Function). For a random variable $X$, $F_X(x) = P(X \le x)$ is called the distribution function of $X$.

Definition 2.2 (Characteristic Function). For a random variable $X$, let
$$\psi_X(t) = E\left[e^{itX}\right] = \int_{-\infty}^{\infty} e^{itx}\, dF_X(x),$$
called the characteristic function of $X$. Here the integral is in the usual Stieltjes integral sense.
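For a discrete random variable the Stieltjes integral reduces to a finite sum over the support, which is easy to evaluate numerically. The following minimal Matlab sketch is our own illustration (the support and probabilities are example values, not taken from the text above):

%Minimal sketch: characteristic function of a discrete random variable,
%psi(t) = sum_k p_k*exp(i*t*x_k). Support and probabilities are examples.
xvalues=[0 2 10];
pvalues=[0.53 0.46999 0.00001];
psi=@(t)sum(pvalues.*exp(1i*t*xvalues));
psi(0)        %equals 1, as every characteristic function must at t=0
abs(psi(0.5)) %is at most 1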

Theorem 2.1. Let $X$ be a random variable with distribution function $F$ and characteristic function $\psi_X(t)$.

1. If $E|X|^n < \infty$ for some $n = 1, 2, \ldots$, then
$$\left|\psi_X(t) - \sum_{j=0}^{n} \frac{(it)^j}{j!} E X^j\right| \le E \min\left\{\frac{2|t|^n |X|^n}{n!},\; \frac{|t|^{n+1}|X|^{n+1}}{(n+1)!}\right\}.$$
In particular,
$$|\psi_X(t) - 1| \le E\min\{2, |tX|\};$$
if $E|X| < \infty$, then
$$|\psi_X(t) - 1 - itEX| \le E\min\left\{2|tX|,\; t^2X^2/2\right\};$$
and if $EX^2 < \infty$, then
$$\left|\psi_X(t) - 1 - itEX + t^2 EX^2/2\right| \le E\min\left\{t^2X^2,\; |tX|^3/6\right\}.$$

2. If $E|X|^n < \infty$ for all $n$, and $\frac{|t|^n}{n!}E|X|^n \to 0$ as $n \to \infty$ for all $t \in \mathbb{R}$, then
$$\psi_X(t) = 1 + \sum_{j=1}^{\infty} \frac{(it)^j}{j!} E X^j.$$

Theorem 2.2 (Characteristic Function of Normal Random Variables). If $X \in N(\mu, \sigma)$, then its characteristic function is
$$\psi_X(t) = \exp\left(i\mu t - \frac{\sigma^2 t^2}{2}\right). \tag{1}$$

To make the paper more self-contained a proof is included.

Proof: We know that
$$\psi_X(t) = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{itx} e^{-\frac{(x-\mu)^2}{2\sigma^2}}\, dx
= \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{itx} e^{-\frac{x^2 - 2x\mu + \mu^2}{2\sigma^2}}\, dx
= \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-\frac{(x^2 - 2x\mu + \mu^2) - 2itx\sigma^2}{2\sigma^2}}\, dx.$$
Completing the square in the exponent,
$$\psi_X(t) = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-\frac{x^2 - 2(\mu + it\sigma^2)x + (\mu + it\sigma^2)^2}{2\sigma^2}}\, e^{\frac{(\mu + it\sigma^2)^2 - \mu^2}{2\sigma^2}}\, dx
= \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-\frac{(x - \mu - it\sigma^2)^2}{2\sigma^2}}\, e^{\mu it - \frac{t^2\sigma^2}{2}}\, dx
= \frac{\exp\left(\mu it - \frac{t^2\sigma^2}{2}\right)}{\sigma\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-\frac{(x - \mu - it\sigma^2)^2}{2\sigma^2}}\, dx.$$
Substituting $y = \frac{x - \mu - it\sigma^2}{\sigma}$, so that $dy = \frac{dx}{\sigma}$, we get
$$\psi_X(t) = \frac{\exp\left(it\mu - \frac{t^2\sigma^2}{2}\right)}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-\frac{y^2}{2}}\, dy,$$
and since $\frac{1}{\sqrt{2\pi}}\int e^{-\frac{y^2}{2}}\, dy = 1$,
$$\psi_X(t) = \exp\left(it\mu - \frac{t^2\sigma^2}{2}\right).$$

Using a Maclaurin expansion of (1) we get
$$\psi_X(t) = 1 + \left(\mu it - \frac{t^2\sigma^2}{2}\right) + \frac{1}{2}\left(\mu it - \frac{t^2\sigma^2}{2}\right)^2 + \cdots$$
In addition, if we have two independent normal random variables $X, Y$ with
$$\psi_X(t) = \exp\left(\mu_X it - \frac{t^2\sigma_X^2}{2}\right) \quad\text{and}\quad \psi_Y(t) = \exp\left(\mu_Y it - \frac{t^2\sigma_Y^2}{2}\right),$$
then we can easily prove that
$$\psi_{X+Y}(t) = \psi_X(t)\psi_Y(t) = \exp\left[(\mu_X + \mu_Y)it - \frac{t^2(\sigma_X^2 + \sigma_Y^2)}{2}\right].$$
For the special case $X \in N(0,1)$ we have the formula
$$\psi_X(t) = e^{-t^2/2}.$$
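As a quick numerical sanity check of formula (1), the following small sketch of our own (the parameter values are arbitrary) compares a Monte Carlo estimate of $E[e^{itX}]$ with the closed form:

%Sketch: Monte Carlo check of psi_X(t) = exp(i*mu*t - sigma^2*t^2/2).
mu=1; sigma=2; t=0.7;
x=mu+sigma*randn(1,1e6);           %samples of X in N(mu,sigma)
mcEstimate=mean(exp(1i*t*x));      %estimates E[exp(itX)]
exact=exp(1i*mu*t-sigma^2*t^2/2);  %formula (1)
abs(mcEstimate-exact)              %small, of order 1/sqrt(1e6)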

2.2 Central Limit Theorem

Theorem 2.3 (Central Limit Theorem). Let $X_1, X_2, \ldots$ be a sequence of independent and identically distributed random variables, each having mean $\mu$ and variance $\sigma^2 < \infty$. Then the distribution of
$$\frac{X_1 + \cdots + X_n - n\mu}{\sigma\sqrt{n}}$$
tends to the standard normal distribution as $n \to \infty$.

The theorem is fundamental in probability theory. One simple derivation is given in Blom [3], which we follow below.

Proof: Put $Y_k = (X_k - \mu)/\sigma$ and $S_n = \sum_{k=1}^{n} Y_k/\sqrt{n}$. We will show that
$$\psi_{S_n}(t) = E\left(e^{itS_n}\right)$$
converges to $e^{-t^2/2}$, the characteristic function of the standard normal distribution. By independence,
$$\psi_{S_n}(t) = \psi_{\sum Y_k}\left(t/\sqrt{n}\right) = \left[\psi_Y(t/\sqrt{n})\right]^n.$$
Furthermore,
$$\psi_Y(t) = 1 + iE(Y)t - \frac{t^2}{2}E(Y^2) + t^3 H(t),$$
where $H(t)$ is bounded in a neighborhood of $0$. We get
$$\psi_Y(t/\sqrt{n}) = 1 + iE(Y)t/\sqrt{n} - \frac{t^2}{2n}E(Y^2) + n^{-3/2}H_n,$$
where $H_n$ is finite, and $E(Y) = 0$, $V(Y) = 1$. Hence
$$\psi_{S_n}(t) = \left[1 - \frac{t^2}{2n} + n^{-3/2}H_n\right]^n$$
and
$$\ln \psi_{S_n}(t) = n\ln\left(1 - \frac{t^2}{2n} + n^{-3/2}H_n\right) = n\left(-\frac{t^2}{2n} + n^{-3/2}H_n\right)\cdot \frac{\ln\left(1 - t^2/2n + n^{-3/2}H_n\right)}{-\frac{t^2}{2n} + n^{-3/2}H_n}.$$
From the logarithm property, $\frac{\ln(1+x)}{x} \to 1$ as $x \to 0$. Thus
$$n\left(-\frac{t^2}{2n} + n^{-3/2}H_n\right) \to -\frac{t^2}{2}, \quad\text{as } n \to \infty,$$
and
$$\frac{\ln\left(1 - t^2/2n + n^{-3/2}H_n\right)}{-\frac{t^2}{2n} + n^{-3/2}H_n} \to 1, \quad\text{as } n \to \infty,$$
so $\ln\psi_{S_n}(t) \to -t^2/2$. Since the characteristic function of $S_n$ converges to the characteristic function of the standard normal distribution, $S_n$ converges in distribution to a standard normal random variable; see e.g. Cramér [4].

Theorem 2.4 (Lyapunov's Theorem). Suppose that for each $n$ the sequence $X_1, X_2, \ldots, X_n$ is independent and satisfies $E[X_n] = 0$, $\mathrm{Var}[X_n] = \sigma_n^2$, and put $S_N^2 = \sum_{n=1}^{N}\sigma_n^2$. If for some $\delta > 0$ the expected values $E\left[|X_k|^{2+\delta}\right]$ are finite for every $k$ and Lyapunov's condition
$$\lim_{N\to\infty} \frac{1}{S_N^{2+\delta}}\sum_{n=1}^{N} E\left[|X_n|^{2+\delta}\right] = 0$$
holds, then the central limit theorem holds.

Remark 2.5. This theorem can be considered a further development of Lyapunov's solution to the second problem of Chebyshev (see Gnedenko and Kolmogorov [13] for more details), which turned out to be much simpler and more useful in applications of the central limit theorem than earlier solutions.
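To see the theorem at work numerically, here is a small sketch of our own (the exponential distribution and the parameter values are just an example, chosen to match the simulation in Section 3.2.4):

%Sketch: the standardized mean of iid Exp(lambda) variables is close to N(0,1).
n=50; m=10000; lambda=2;
X=exprnd(1/lambda,n,m);              %exprnd takes the mean 1/lambda as input
S=(mean(X,1)-1/lambda)/((1/lambda)/sqrt(n));
[mean(S) var(S)]                     %close to 0 and 1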

2.3 Definition of Lattice Distribution and Bounded Variation

Definition 2.3 (Lattice distribution). A random variable $X$ is said to have a lattice distribution if with probability 1 it takes on values of the form $a + kh$ $(k = 0, \pm1, \pm2, \ldots)$, where $a$ and $h > 0$ are constants. We call the largest such number $h$ the span of the distribution.

Definition 2.4 (Function of Bounded Variation). Let $F(x)$ be a real- or complex-valued function of the real variable $x$. We say $F(x)$ has bounded variation on the whole real axis if
$$V(F) = \int_{-\infty}^{\infty} |dF(x)| < \infty.$$
For a function $F$ of bounded variation, define $F(x)$ at the discontinuity points in such a way that
$$F(x) = \frac{1}{2}\left[F(x+0) + F(x-0)\right],$$
where $F(x+0) = \lim_{\varepsilon \downarrow 0} F(x+\varepsilon)$ and $F(x-0) = \lim_{\varepsilon \downarrow 0} F(x-\varepsilon)$. If furthermore $F(-\infty) = 0$ and $F(\infty) = 1$, then $F(x)$ is said to be a distribution function.

3 Edgeworth Expansion

Let $X_1, X_2, \ldots, X_n$ be independent and identically distributed random variables with mean $\mu$ and variance $\sigma^2$. By the Central Limit Theorem,
$$S_n = \frac{\sum_{i=1}^{n} X_i/n - \mu}{\sigma/\sqrt{n}}$$
is asymptotically normally distributed with zero mean and unit variance. What we are interested in here is the asymptotic behavior of the difference between the normal distribution $\Phi(x)$ and the distribution function $F_n(x)$ of $S_n$. In other words, we want to describe the error of the normal approximation, and one way to do that is via characteristic functions. Three cases may appear, which together cover all possibilities:

1. The characteristic function $\psi_X(t)$ satisfies the condition
$$\limsup_{|t|\to\infty} |\psi_X(t)| < 1, \tag{C}$$
called the Cramér condition. Then the distribution function has the expansion
$$F_n(x) = \Phi(x) + \sum_{j=1}^{s} \frac{p_j(x)}{n^{j/2}}\, e^{-\frac{x^2}{2}} + O\left(\frac{1}{n^{\frac{s+1}{2}}}\right), \quad s \ge 1, \tag{2}$$
where $p_j(x)$ is a polynomial in $x$.

2. Condition (C) is not satisfied and the distribution is not of lattice type. It is found that
$$F_n(x) = \Phi(x) + \frac{\alpha_3}{6\sigma^3\sqrt{2\pi n}}\left(1 - x^2\right)e^{-\frac{x^2}{2}} + o\left(\frac{1}{\sqrt{n}}\right),$$
$\alpha_3$ being the third central moment of $X_i$, $\alpha_3 = E\left[(X - EX)^3\right]$.

3. $F_n(x)$ is a lattice distribution function. Even if all moments are finite, an expansion like the latter one is impossible: $F_n(x)$ has jumps at its discontinuity points of order of magnitude $\frac{1}{\sqrt{n}}$. By adding an extra term to the expansion we diminish the order of magnitude of the remainder term.

Note: when $s = 1$, the remainder $O\left(\frac1n\right)$ in (2) is of smaller order than $o\left(\frac{1}{\sqrt n}\right)$; that is, when (C) is satisfied the error is much smaller than when it is not.

Remark 3.1. If the distribution function of X is absolutely continuous, i.e. X is a continuous random variable, then (C) is valid.

Remark 3.2. Kolmogorov [13] noted that there are discrete distributions which are not lattice distributed. For instance, if $X_j$ takes only the values $\pm1$ and $\pm\sqrt{3}$, each with probability $\frac14$, then its distribution is not a lattice distribution, because the system of equations
$$-\sqrt{3} = a + k_1 h, \qquad -1 = a + k_2 h, \qquad 1 = a + k_3 h, \qquad \sqrt{3} = a + k_4 h,$$
where $k_i \in \{\pm1, \pm2, \ldots\}$, does not have a solution (if I understand Kolmogorov correctly).

3.1 First Case

Here a proof of (2) will be presented. However, we assume that the distribution function of $X$ is absolutely continuous, which implies (C); recall Remark 3.1. We know that $S_n$ is asymptotically normal $N(0,1)$: the characteristic function $\psi_n$ of $S_n$ converges to that of $N(0,1)$ as $n \to \infty$,
$$\psi_n(t) = E\left[\exp(itS_n)\right] \to E\left[\exp(itN)\right] = e^{-t^2/2}, \tag{3}$$
where $N$ denotes a standard normal random variable.

Now put $Y = (X-\mu)/\sigma$, where $X$ is equal in law to $X_i$, and let $\psi_Y$ be the characteristic function of $Y$. Then
$$\psi_{S_n}(t) = E\left[e^{itS_n}\right] = E\left[e^{it\frac{\sum_{i=1}^{n} X_i - n\mu}{\sigma\sqrt{n}}}\right] = E\left[e^{\frac{it}{\sqrt n}\frac{X_1-\mu}{\sigma}}\right]E\left[e^{\frac{it}{\sqrt n}\frac{X_2-\mu}{\sigma}}\right]\cdots E\left[e^{\frac{it}{\sqrt n}\frac{X_n-\mu}{\sigma}}\right] \tag{4}$$
$$= \left(E\left[e^{\frac{it}{\sqrt n}\frac{X-\mu}{\sigma}}\right]\right)^n = \left(E\left[e^{\frac{it}{\sqrt n}Y}\right]\right)^n = \left(\psi_Y(t/n^{1/2})\right)^n.$$

We can also write the characteristic function via an expansion of its logarithm:
$$\log \psi_Y(t) = \sum_{j=1}^{\infty} \frac{k_j (it)^j}{j!}.$$
Then
$$\psi_Y(t) = \exp\left(k_1 it + \frac{1}{2}k_2(it)^2 + \cdots + \frac{1}{j!}k_j(it)^j + \cdots\right). \tag{5}$$
Formula (5) follows by using the expansion $\log(1+z) = z - \frac{z^2}{2} + \cdots \pm \frac{z^j}{j} + O(z^{j+1})$, replacing $1+z$ by $\psi_Y(t)$, and rearranging. In addition, we have from the Maclaurin series of the characteristic function, for small values of $t$,
$$\psi_Y(t) = 1 + E[Y]\,it + \frac{1}{2}E[Y^2](it)^2 + \cdots + \frac{1}{j!}E[Y^j](it)^j + \cdots$$

We can define the $k_j$ using the formal identity
$$\sum_{j\ge1}\frac{1}{j!}k_j(it)^j = \log\left(1 + \sum_{j\ge1}\frac{1}{j!}E[Y^j](it)^j\right) = \sum_{i\ge1}\frac{(-1)^{i+1}}{i}\left(\sum_{j\ge1}\frac{1}{j!}E[Y^j](it)^j\right)^i,$$
and by equating coefficients of $(it)^j$,
$$k_1 = E[Y],$$
$$k_2 = E[Y^2] - E[Y]^2 = \mathrm{Var}(Y),$$
$$k_3 = E[Y^3] - 3E[Y^2]E[Y] + 2E[Y]^3 = E\left[(Y - E[Y])^3\right], \tag{6}$$
$$k_4 = E[Y^4] - 4E[Y^3]E[Y] - 3E[Y^2]^2 + 12E[Y^2]E[Y]^2 - 6E[Y]^4 = E\left[(Y - E[Y])^4\right] - 3(\mathrm{Var}(Y))^2,$$
and so on. We have thus expressed the cumulants as homogeneous polynomials of degree $j$ in the moments; the moments can likewise be expressed as homogeneous polynomials in the cumulants. We have standardized the random variable $Y$ for location and scale, so $E[Y] = k_1 = 0$ and $\mathrm{Var}(Y) = k_2 = 1$. Hence by (4) and (5), and using the series expansion of the exponential function, we get:

$$\psi_Y(t/n^{1/2}) = \exp\left(k_1\frac{it}{\sqrt n} + \frac{k_2}{2!}\left(\frac{it}{\sqrt n}\right)^2 + \cdots + \frac{k_j}{j!}\left(\frac{it}{\sqrt n}\right)^j + \cdots\right) = \exp\left(k_1\frac{it}{\sqrt n} + \frac{k_2}{2!}\frac{(it)^2}{n} + \cdots + \frac{k_j}{j!}\frac{(it)^j}{n^{j/2}} + \cdots\right).$$
Hence
$$\psi_{S_n}(t) = \left(e^{k_1\frac{it}{\sqrt n}}\, e^{\frac{k_2}{2!}\frac{(it)^2}{n}} \cdots e^{\frac{k_j}{j!}\frac{(it)^j}{n^{j/2}}} \cdots\right)^{n}.$$
Substituting $k_1 = 0$ and $k_2 = 1$ we get
$$\psi_{S_n}(t) = \exp\left(-\frac{t^2}{2} + n^{-1/2}\frac{k_3(it)^3}{3!} + \cdots + n^{-\frac{j-2}{2}}\frac{k_j(it)^j}{j!} + \cdots\right),$$
$$\psi_{S_n}(t) = e^{-t^2/2}\left(1 + n^{-1/2}r_1(it) + n^{-1}r_2(it) + \cdots + n^{-j/2}r_j(it) + \cdots\right), \tag{7}$$
where $r_j$ is a polynomial of degree $3j$ depending on the cumulants; this expansion is consistent with the central limit convergence (3). We can see that $r_j$ is an even polynomial when $j$ is even and an odd polynomial when $j$ is odd, and it is clear from (7) that
$$r_1(u) = \frac{1}{6}k_3 u^3 \tag{8}$$
and
$$r_2(u) = \frac{1}{24}k_4 u^4 + \frac{1}{72}k_3^2 u^6. \tag{9}$$
We can rewrite (7) in the form

$$\psi_{S_n}(t) = e^{-t^2/2} + n^{-1/2}r_1(it)e^{-t^2/2} + n^{-1}r_2(it)e^{-t^2/2} + \cdots + n^{-j/2}r_j(it)e^{-t^2/2} + \cdots \tag{10}$$
Now since
$$\psi_{S_n}(t) = \int_{-\infty}^{\infty} e^{itx}\, dP(S_n \le x)$$
and
$$e^{-t^2/2} = \int_{-\infty}^{\infty} e^{itx}\, d\Phi(x), \tag{11}$$
where $\Phi$ denotes the standard normal distribution function, applying the inverse Fourier–Stieltjes transform to (10) we get
$$P(S_n \le x) = \Phi(x) + n^{-1/2}R_1(x) + n^{-1}R_2(x) + \cdots + n^{-j/2}R_j(x) + \cdots, \tag{12}$$
where $R_j(x)$ is a function whose Fourier–Stieltjes transform equals $r_j(it)e^{-t^2/2}$:
$$\int_{-\infty}^{\infty} e^{itx}\, dR_j(x) = r_j(it)e^{-t^2/2}.$$

Our focus now is to compute $R_j$. Integrating by parts $j$ times in (11) gives
$$e^{-t^2/2} = (-it)^{-1}\int_{-\infty}^{\infty} e^{itx}\, d\Phi^{(1)}(x) = (-it)^{-2}\int_{-\infty}^{\infty} e^{itx}\, d\Phi^{(2)}(x) = \cdots = (-it)^{-j}\int_{-\infty}^{\infty} e^{itx}\, d\Phi^{(j)}(x), \tag{13}$$
where $\Phi^{(j)}(x) = (d/dx)^j\,\Phi(x)$. Writing $D$ for the differential operator $d/dx$, so that $r_j(-D)$ is a differential operator, we hence have
$$\int_{-\infty}^{\infty} e^{itx}\, d\left\{(-D)^j\Phi(x)\right\} = (it)^j e^{-t^2/2} \tag{14}$$
and
$$\int_{-\infty}^{\infty} e^{itx}\, d\left\{r_j(-D)\Phi(x)\right\} = r_j(it)e^{-t^2/2},$$
so $r_j(-D)\Phi(x)$ is the function we are looking for:
$$R_j(x) = r_j(-d/dx)\,\Phi(x), \quad j \ge 1. \tag{15}$$
The Chebyshev–Hermite polynomials can be defined by the formula
$$H_k(x) = (-1)^k e^{x^2/2}\frac{d^k}{dx^k}e^{-x^2/2},$$
which gives
$$H_0(x) = 1, \quad H_1(x) = x, \quad H_2(x) = x^2 - 1, \quad H_3(x) = x^3 - 3x, \quad \ldots \tag{16}$$
Using the Hermite polynomials we can write
$$(-D)^j\,\Phi(x) = -H_{j-1}(x)\,\phi(x).$$
We note that the Hermite polynomials are orthogonal with respect to the weight $\phi(x)$, and that $H_j$ is a polynomial of degree $j$ which is even when $j$ is even and odd when $j$ is odd. We now get from (8), (9), and (15) that
$$R_1(x) = -\frac{1}{6}k_3(x^2 - 1)\,\phi(x),$$
$$R_2(x) = -x\left[\frac{1}{24}k_4(x^2 - 3) + \frac{1}{72}k_3^2(x^4 - 10x^2 + 15)\right]\phi(x).$$
For general $j \ge 1$,
$$R_j(x) = p_j(x)\,\phi(x),$$
where the polynomial $p_j$ has degree $3j - 1$ and is odd for even $j$ and even for odd $j$; this is clear from the degree and parity of $r_j$. Hence
$$p_1(x) = -\frac{1}{6}k_3(x^2 - 1) \tag{17}$$
and
$$p_2(x) = -x\left[\frac{1}{24}k_4(x^2 - 3) + \frac{1}{72}k_3^2(x^4 - 10x^2 + 15)\right]. \tag{18}$$
Now we can rewrite formula (12):

$$P(S_n \le x) = \Phi(x) + n^{-1/2}p_1(x)\phi(x) + n^{-1}p_2(x)\phi(x) + \cdots + n^{-j/2}p_j(x)\phi(x) + \cdots, \tag{19}$$
called the Edgeworth expansion of the distribution of $S_n$. The third cumulant $k_3$ measures skewness, so the term of order $n^{-1/2}$ in this expansion improves the basic normal approximation of the cumulative distribution function of $S_n$ by performing a skewness correction. In the same way $k_4$ measures kurtosis, and the term of order $n^{-1}$ improves the normal approximation further by adjusting for kurtosis. In other words, the $O(1/\sqrt n)$ rate of the Berry–Esseen theorem is improved to uniform errors of order $n^{-1}$ and $n^{-3/2}$ by the one- and two-term Edgeworth expansions. It is very rare that this expansion converges, according to Hall [15]. In fact there is a condition on the convergence of (19), due to Cramér [5]: if $X$ has an absolutely continuous distribution function, then the condition is
$$E\left[\exp\left(\tfrac14 Y^2\right)\right] < \infty,$$
which is very severe. Usually (19) exists as an asymptotic series, meaning that if the series is stopped at a specific order, the remainder is of smaller order than the last included term:
$$P(S_n \le x) = \Phi(x) + n^{-1/2}p_1(x)\phi(x) + n^{-1}p_2(x)\phi(x) + \cdots + n^{-j/2}p_j(x)\phi(x) + o(n^{-j/2}). \tag{20}$$
In fact, the remainder term $o(n^{-j/2})$ is much smaller, namely $O(n^{-\frac{j+1}{2}})$ [10]. The restrictions for (20) are
$$E\left(|X|^{j+2}\right) < \infty \quad\text{and}\quad \limsup_{|t|\to\infty}|\psi(t)| < 1. \tag{21}$$
A proof of this fact can be found in Hall [15]. We derived the Edgeworth expansion from an expansion of the logarithm of the characteristic function of $S_n$. The Cramér condition (C) ensures that the characteristic function can be expanded in the required manner; the expansion of $F_n$ was then obtained by Fourier inversion of the expansion of the characteristic function.

3.2 Second Case

3.2.1 Auxiliary Theorems

We now move to the case where condition (C) is not satisfied and the distribution is not of lattice type. For this we need some auxiliary theorems before proving the main theorem, Theorem 3.10.

Theorem 3.3. Let $A$, $T$, and $\varepsilon > 0$ be constants, $F(x)$ a non-decreasing function, and $G(x)$ a function of bounded variation, with Fourier–Stieltjes transforms $f$ and $g$. If

1. $F(-\infty) = G(-\infty)$, $F(+\infty) = G(+\infty)$,

2. $\int |F(x) - G(x)|\, dx < \infty$,

3. $G'(x)$ exists for all $x$ and $|G'(x)| \le A$,

4. $\int_{-T}^{T}\left|\frac{f(t) - g(t)}{t}\right| dt = \varepsilon$,

then to every number $k > 1$ there corresponds a finite positive number $c(k)$, depending only on $k$, such that
$$|F(x) - G(x)| \le k\frac{\varepsilon}{2\pi} + c(k)\frac{A}{T}.$$

Theorem 3.4. Let $A$, $T$, $\varepsilon$ be arbitrary positive constants, $F(x)$ a non-decreasing, purely discontinuous function, and $G(x)$ a function of bounded variation, with Fourier–Stieltjes transforms $f$ and $g$. If

1. $F(-\infty) = G(-\infty) = 0$, $F(+\infty) = G(+\infty)$,

2. $\int |F(x) - G(x)|\, dx < \infty$,

3. the functions $F(x)$ and $G(x)$ have discontinuities only at the points $x = x_i$ $(x_i < x_{i+1};\ i = 0, \pm1, \pm2, \ldots)$, and there exists an $l$ such that $\min(x_{i+1} - x_i) \ge l$,

4. everywhere except at $x = x_i$ $(i = 0, \pm1, \pm2, \ldots)$, $|G'(x)| \le A$,

5. $\int_{-T}^{T}\left|\frac{f(t) - g(t)}{t}\right| dt = \varepsilon$,

then to every number $k > 1$ there correspond two finite numbers $c_1(k)$ and $c_2(k)$, depending only on $k$, such that
$$|F(x) - G(x)| \le k\frac{\varepsilon}{2\pi} + c_1(k)\frac{A}{T}$$
whenever $Tl \ge c_2(k)$.

3.2.2 On The Remainder Term

Theorem 3.5. If the random variables $X_1, X_2, \ldots, X_n, \ldots$ have finite third moments, then
$$|F_n(x) - \Phi(x)| \le c\,\frac{\rho_3}{\sqrt{n}},$$
where $\rho_3 = \beta_3/\sigma^3$ is the normalized absolute third moment of the summands and $c$ is a constant.

Theorem 3.6. If the random variables $X_1, X_2, \ldots, X_n, \ldots$ are identically distributed and have finite third moments, then for
$$|t| \le \frac{\sigma^3\sqrt{n}}{5\beta_3} = T_n$$
the following inequality holds:
$$\left|f_n(t) - e^{-\frac{t^2}{2}}\right| \le \frac{7}{6}\,\frac{\beta_3|t|^3}{\sigma^3\sqrt{n}}\, e^{-\frac{t^2}{4}},$$
where $f_n$ denotes the characteristic function of $S_n$, and $\beta_3$ denotes the absolute moment of order 3,
$$\beta_3 = \int_{-\infty}^{\infty} |x|^3\, dF(x).$$

We found that the characteristic function of $S_n = \frac{\sum_{i=1}^{n} X_i/n - \mu}{\sigma/\sqrt{n}}$, where the $X_i$ are independent and identically distributed, is
$$\psi_{S_n}(t) = e^{-t^2/2}\left(1 + \sum_{j=1}^{\infty} r_j(it)\left(\frac{1}{\sqrt{n}}\right)^j\right). \tag{22}$$
Below we also write $f_n(t)$ for $\psi_{S_n}(t)$.

Theorem 3.7. If in the sum (22) the summands have finite moments of order $s$, where $s \ge 3$, then for $|t| \le T_{sn} = \frac{\sqrt{n}}{8s\rho_s^{3/s}}$, where¹ $\rho_s = \beta_s/\sigma^s$, the inequality
$$\left|f_n(t) - e^{-\frac{t^2}{2}}\left(1 + \sum_{j=1}^{s-3} r_j(it)\left(\frac{1}{\sqrt{n}}\right)^j\right)\right| \le \frac{c_1(s)}{T_{sn}^{s-2}}\left(|t|^s + |t|^{3(s-2)}\right)e^{-\frac{t^2}{4}} \tag{23}$$
holds, where $c_1(s)$ depends only on $s$; also the inequality
$$\left|f_n(t) - e^{-\frac{t^2}{2}}\left(1 + \sum_{j=1}^{s-2} r_j(it)\left(\frac{1}{\sqrt{n}}\right)^j\right)\right| \le \frac{\delta(n)}{n^{\frac{s-2}{2}}}\left(|t|^s + |t|^{3(s-1)}\right)e^{-\frac{t^2}{4}} \tag{24}$$
holds, where $\delta(n)$ depends only on $n$ and
$$\lim_{n\to\infty}\delta(n) = 0.$$

Remark 3.8. By Gnedenko and Kolmogorov [13, p. 204, Theorem 1b] we instead have
$$\left|f_n(t) - e^{-\frac{t^2}{2}}\left(1 + \sum_{j=1}^{s-2} r_j(it)\left(\frac{1}{\sqrt{n}}\right)^j\right)\right| \le \frac{c_2(s)\delta(n)}{T_{sn}^{s-2}}\left(|t|^s + |t|^{3(s-2)}\right)e^{-\frac{t^2}{4}}.$$

Theorem 3.9. If the distribution function $F(x)$ is non-lattice, then whatever the number $w > 0$ may be, there exists a function $\lambda(n)$ such that
$$\lim_{n\to\infty}\lambda(n) = \infty$$
and
$$I = \int_{w}^{\lambda(n)} \frac{|f(t)|^n}{t}\, dt = o\left(\frac{1}{\sqrt{n}}\right),$$
where $f$ is the characteristic function corresponding to $F$.

The theorem we mention next was first proved by Cramér [5] under the condition (C), and later by Esseen [10], whose approach we present here.

¹Here $\rho_s = \frac{\beta_s}{\sigma^s}$.

3.2.3 Main Theorem

Theorem 3.10. If the independent random variables $X_1, X_2, \ldots, X_n$ are identically distributed, non-lattice, and have finite third moment, then
$$F_n(x) = \Phi(x) + \phi(x)\frac{p_1(x)}{\sqrt{n}} + o\left(\frac{1}{\sqrt{n}}\right) \tag{25}$$
uniformly in $x$, where $p_1(x) = \frac{k_3}{6}(1 - x^2)$.

Proof: Put $s = 3$ in formula (24) of Theorem 3.7. We then deduce that
$$\left|f_n(t) - e^{-\frac{t^2}{2}} - \frac{r_1(it)}{\sqrt{n}}e^{-\frac{t^2}{2}}\right| \le \frac{\delta(n)}{\sqrt{n}}\left(|t|^3 + |t|^6\right)e^{-\frac{t^2}{4}}. \tag{26}$$



The characteristic function (Fourier–Stieltjes transform) of the function
$$R_1(x) = -\frac{k_3}{6}\Phi^{(3)}(x) = \frac{k_3}{6\sqrt{2\pi}}\left(1 - x^2\right)e^{-\frac{x^2}{2}}$$
is equal to
$$\frac{k_3}{6}(it)^3 e^{-t^2/2} = r_1(it)e^{-t^2/2}.$$
Now apply Theorem 3.3 with
$$F(x) = F_n(x), \qquad G(x) = \Phi(x) + \frac{1}{\sqrt{n}}R_1(x),$$
$$A = \max_x |G'(x)| < +\infty, \qquad T = \lambda(n)\sqrt{n},$$
and let $g(t)$ denote the Fourier–Stieltjes transform of $G$.

Without loss of generality we can suppose that $T \ge T_{3n}$, and we then estimate the integral
$$\varepsilon = \int_{-T}^{T}\left|\frac{f_n(t) - g(t)}{t}\right| dt = \int_{-T}^{-T_{3n}} + \int_{-T_{3n}}^{T_{3n}} + \int_{T_{3n}}^{T}.$$
From (26) we get
$$\int_{-T_{3n}}^{T_{3n}}\left|\frac{f_n(t) - g(t)}{t}\right| dt \le \frac{\delta(n)}{\sqrt{n}}\int_{-\infty}^{\infty}\left(t^2 + |t|^5\right)e^{-t^2/4}\, dt = o\left(\frac{1}{\sqrt{n}}\right),$$
and according to [13, p. 202] we obtain
$$\int_{T_{3n}}^{T}\left|\frac{f_n(t) - g(t)}{t}\right| dt \le \int_{T_{3n}}^{T}\left|f\left(\frac{t}{B_n}\right)\right|^n \frac{dt}{t} + \int_{T_{3n}}^{T}|g(t)|\,\frac{dt}{t},$$
where $B_n = \sqrt{n\beta_2} = \sqrt{n}\,\sigma$. However,
$$\int_{T_{3n}}^{T}|g(t)|\,\frac{dt}{t} \le \frac{1}{\sqrt{2\pi}\,T_{3n}}\int_{T_{3n}}^{T} e^{-t^2/2}\left(1 + \frac{k_3|t|^3}{6\sqrt{n}}\right)dt = o\left(\frac{1}{\sqrt{n}}\right),$$
and by the previous theorem, after the substitution $t \mapsto B_n t$,
$$\int_{T_{3n}}^{T}\left|f\left(\frac{t}{B_n}\right)\right|^n \frac{dt}{t} = \int_{T_{3n}/B_n}^{\lambda(n)}|f(t)|^n\,\frac{dt}{t} = o\left(\frac{1}{\sqrt{n}}\right).$$
The integral $\int_{-T}^{-T_{3n}}$ can be estimated in the same way. Hence $\varepsilon = o\left(\frac{1}{\sqrt{n}}\right)$, and by applying Theorem 3.3 we get the inequality
$$\left|F_n(x) - \Phi(x) - \frac{R_1(x)}{\sqrt{n}}\right| \le k\frac{\varepsilon}{2\pi} + \frac{c(k)A}{\lambda(n)\sqrt{n}} = o\left(\frac{1}{\sqrt{n}}\right),$$
which proves the theorem.

3.2.4 Simulation Edgeworth Expansion for Continuous Random Variables

Let us test the accuracy of the Edgeworth expansion. For instance, we apply it to exponential random variables, which are continuous, and compare the standardized mean of the exponential random variables with the normal approximation. We choose $n = 5$ independent random variables $X_1, X_2, X_3, X_4, X_5$, identically exponentially distributed with $\lambda = 2$, i.e. mean $\frac12$, and then repeat the simulation 10000 times, using the Monte Carlo method to generate the exponential random outcomes. We can see from Figure 1 below that the Edgeworth approximation estimates the distribution function better than the normal approximation does.

3.3 Lattice Edgeworth Expansion

All absolutely continuous distributions satisfy the Cramér condition (C). Discrete distributions, on the other hand, do not; in particular, no lattice distribution satisfies (C). The expansion we have seen above therefore does not hold for lattice distributions. One motivating reason is that, in the lattice case, the distribution function of $S_n$ has jump points for every $n$, while the Edgeworth expansion is a smooth function, so it cannot approximate the distribution function of $S_n$ with the accuracy claimed in (25). To let lattice distributions admit an expansion similar to (25) we therefore need to add extra terms that account for the jumps.

Figure 1: The Edgeworth expansion, here of third order, approximates the empirical data better than the normal approximation.

3.3.1 The Bernoulli Case

The simplest lattice example is the Bernoulli case. Suppose we have an experiment whose outcome can be classified as either a success or a failure, and put $X = 1$ when the experiment is a success and $X = -1$ when it is a failure. Then the distribution function has jumps of sizes $p$ and $1-p$ at $X = \pm1$, where $0 \le p \le 1$ is the probability of success. A random variable with this probability distribution is called a Bernoulli random variable, and we call the distribution a symmetric Bernoulli distribution if $F(x)$ has jumps of size $\frac12$ at $X = \pm1$, i.e. $p = \frac12$. Here $F(x)$ is a lattice distribution and the characteristic function is $\psi_X(t) = E[e^{itX}] = \frac12 e^{it} + \frac12 e^{-it} = \cos t$. If we suppose that $n$ is an even number, then $F_n(x)$ is purely discontinuous with jumps at the points $x = \pm\frac{k}{\sqrt{n}}$, $(k = 0, 2, 4, \ldots, n)$.

We now look for an expression for this particular Bernoulli example. We will end up with an expansion of $F_n(x)$ formulated in Esseen [10] and in Gnedenko and Kolmogorov [13]; the arguments in the two sources differ somewhat, and omitted details are filled in here. To motivate the expansion formula for the Bernoulli example we first need the following theorem.

Theorem 3.11 (The de Moivre–Laplace Theorem). Let $X_1, X_2, \ldots, X_n$ be independent identically distributed random variables with $P(X_i = 1) = P(X_i = -1) = \frac12$, and let $Z_n = X_1 + X_2 + \cdots + X_n$. If $a < b$, then, as $n \to \infty$,
$$P\left(a \le \frac{Z_n}{\sqrt{n}} \le b\right) \to \int_a^b \frac{1}{\sqrt{2\pi}}\, e^{-\frac{x^2}{2}}\, dx.$$

By the de Moivre–Laplace theorem, the jump size of $F_n(x)$ at $x = \frac{k}{\sqrt{n}}$, for even $k$, is approximately
$$\int_{\frac{k-1}{\sqrt{n}}}^{\frac{k+1}{\sqrt{n}}} \phi(y)\, dy = \phi\left(\frac{k}{\sqrt{n}}\right)\frac{(k+1) - (k-1)}{\sqrt{n}} = \frac{2}{\sqrt{n}}\,\phi\left(\frac{k}{\sqrt{n}}\right) = \frac{2}{\sqrt{2\pi n}}\, e^{-\left(\frac{k}{\sqrt{n}}\right)^2/2}.$$
Let us investigate $F_n(x)$ for $x \in \left(-\frac{1}{\sqrt{n}}, \frac{1}{\sqrt{n}}\right)$; that is, we look at the distribution function $F_n$ of $S_n = Z_n/\sqrt{n}$ in this interval. We have from Theorem 3.11 that
$$P\left(-\frac{1}{\sqrt{n}} \le S_n \le \frac{1}{\sqrt{n}}\right) \approx \int_{-\frac{1}{\sqrt{n}}}^{\frac{1}{\sqrt{n}}} \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\, dx = 2\int_{0}^{\frac{1}{\sqrt{n}}} \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\, dx = \frac{2}{\sqrt{2\pi n}} + o\left(\frac{1}{\sqrt{n}}\right).$$
Inside the interval $\left(-\frac{1}{\sqrt{n}}, \frac{1}{\sqrt{n}}\right)$ the distribution function $F_n$ has a jump at zero asymptotically equal to $\frac{2}{\sqrt{2\pi n}}$, and in the neighborhood of $x = 0$ the normal distribution function $\Phi(x)$ is approximately $\Phi(0) + \phi(0)x \approx \frac12 + \frac{x}{\sqrt{2\pi}}$. Furthermore,
$$\frac12\left(F_n(+0) + F_n(-0)\right) = \Phi(0) = \frac12.$$
For the particular case $0 \le x \le \frac{1}{\sqrt{n}}$,
$$F_n(x) - \Phi(x) = F_n(0+) - \Phi(x) \approx F_n(0+) - \frac12 - \phi(0)x = \frac12\left[F_n(0+) - F_n(-0)\right] - \phi(0)x = \frac{1}{\sqrt{2\pi n}} - \frac{x}{\sqrt{2\pi}} = \frac{2}{\sqrt{2\pi n}}\left(\frac12 - \frac{x\sqrt{n}}{2}\right).$$
We can represent the behavior of $F_n$ and $\Phi(x)$ around $x = 0$ by Figures 2 and 3 below, neglecting the $o\left(\frac{1}{\sqrt{n}}\right)$ term. Generally, for $x \in \mathbb{R}$,
$$F_n(x) - \Phi(x) \approx \frac{2}{\sqrt{2\pi n}}\, Q\left(\frac{x\sqrt{n}}{2}\right),$$
where
$$Q(x) = [x] - x + \frac12$$
and $[x]$ is the integral part of $x$.

Figure 2: Behavior of $F_n(x)$ and $\Phi(x)$.

Esseen [10] suggests that we write
$$D_n(x) = \frac{2}{\sqrt{2\pi n}}\, Q\left(\frac{x\sqrt{n}}{2}\right) e^{-\frac{x^2}{2}}.$$
It is easy to see that $Q\left(\frac{x\sqrt{n}}{2}\right)$ is a periodic function taking values between $+\frac12$ and $-\frac12$; one then needs to study the expression
$$F_n(x) - \Phi(x) - D_n(x).$$
Here we will not go further with this particular Bernoulli example.
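As a concrete numerical illustration (our own arithmetic, following the formulas above): for $n = 100$ the jump of $F_n$ at $x = 0$ is approximately $\frac{2}{\sqrt{2\pi\cdot100}} \approx 0.0798$, and on $0 \le x \le \frac{1}{\sqrt{n}} = 0.1$ the difference $F_n(x) - \Phi(x)$ decreases approximately linearly from $\frac{1}{\sqrt{2\pi n}} \approx 0.0399$ down to about $0$ — exactly one tooth of the sawtooth $Q$.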

3.3.2 Simulating for the Edgeworth Expansion Bernoulli Case

Here we apply the Edgeworth approximation method using Matlab. We simulate 10 random variables 3000 times, and then apply the normal approximation as well as our Edgeworth series approximation. We can easily see that there is a big difference when we deal with lattice distributions: the normal distribution can never capture the staircase shape of the distribution function of $S_n$, while the lattice Edgeworth approximation does.

3.3.3 General Lattice Case

Now we will move to the general case of the lattice distribution.

Figure 3: Difference between $F_n(x)$ and $\Phi(x)$ for the Bernoulli case.

Figure 4: Kol1 is the Edgeworth expansion of order 3; normcdf is the normal approximation.

Let us consider a sequence of independent random variables $X_1, X_2, \ldots, X_n$, all with the same distribution function $F(x)$, the characteristic function $\psi_X(t)$, mean value $\mu$, standard deviation $\sigma \ne 0$, moments $\alpha_j$, and absolute moments $\beta_j$ $(j = 3, 4, \ldots)$. We denote by $F_n(x)$ the distribution function of
$$S_n = \frac{\sum_{j=1}^{n} X_j/n - \mu}{\sigma/\sqrt{n}}.$$
The possible values of the lattice random variable $X_n$ are $x = a + kh$ $(k = 0, \pm1, \pm2, \ldots)$, with $h$ the span of the distribution; see Definition 2.3. Then the standardized lattice random variable $Y_j = \frac{X_j - \mu}{\sigma}$, which has mean zero and standard deviation one, takes values of the form $\frac{a - \mu}{\sigma} + \frac{kh}{\sigma}$. Following the same steps as in the Bernoulli case we get the expression
$$D_n(x) = \frac{2}{\sqrt{2\pi n}}\, Q\left(\frac{x\sqrt{n}}{2}\right) e^{-\frac{x^2}{2}}.$$
Here we state the main theorem for the general lattice distribution in the way Esseen (1945) [10] presents it². The proof essentially starts by calculating the characteristic function of $D_n(x)$, applies Theorem 3.4, and then splits the resulting integral into three parts, each of which is estimated as $o\left(\frac{1}{\sqrt{n}}\right)$. That proves the theorem.

Theorem 3.12. If $X_1, X_2, \ldots, X_n$ are independent, identically distributed lattice random variables with finite third moments, then
$$F_n(x) = \Phi(x) + \frac{e^{-\frac{x^2}{2}}}{\sqrt{2\pi}}\left(\frac{P_1(x)}{\sqrt{n}} + \frac{Q_1(x)}{\sqrt{n}}\right) + o\left(\frac{1}{\sqrt{n}}\right) \tag{27}$$
uniformly in $x$, where
$$P_1(x) = -\frac{K_3}{6}\left(x^2 - 1\right) \quad\text{and}\quad Q_1(x) = \frac{h}{\sigma}\, Q\!\left(\frac{\left(x - \frac{a\sqrt{n}}{\sigma}\right)\sqrt{n}}{h/\sigma}\right).$$
Note that $P_1(x) = p_1(x)$ in (17).

Gnedenko and Kolmogorov [13] show the expansion in the lattice case for one term and neglect the terms of order $o\left(\frac{1}{\sqrt{n}}\right)$. Kolassa [17], on the other hand, states the expansion as a series with more terms, neglecting terms of order $o\left(n^{\frac{1-j}{2}}\right)$. Now denote the Edgeworth expansion for continuous random variables by $E_j(x, k^n)$, where $j$ represents the order of the first cumulants used and $k^n$ represents the cumulants of $S_n = \frac{\sum_{j=1}^{n} X_j/n - \mu}{\sigma/\sqrt{n}}$. In the same way we denote the discontinuous part of the Edgeworth expansion by $D_j(x, k^n)$. Then we can rewrite formula (27) in the form

$$F_n(x) = E_3(x, k^n) + D_3(x, k^n),$$
where
$$E_3(x, k^n) = \Phi(x) + \phi(x)\frac{P_1(x)}{\sqrt{n}} + o\left(\frac{1}{\sqrt{n}}\right)$$
and
$$D_3(x, k^n) = \phi(x)\frac{Q_1(x)}{\sqrt{n}} + o\left(\frac{1}{\sqrt{n}}\right).$$
We will use formula (27) in the applications later, but to obtain more accurate results we should compute more terms, so we also look at a formula in Kolassa [17] that contains what we need (more than one term); we can then compare it with the one-term version to see whether we get any improvement.

²We realized that there is an error here in the original Esseen formula; see also Hall [15] concerning that observation. The formula here is from Kolassa [17].
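Before turning to the higher-order formula, here is a condensed Matlab sketch of the one-term lattice expansion (27); it is our own compression of the Appendix A program, whose offset handling it follows:

%Sketch: one-term lattice Edgeworth approximation, formula (27).
n=1000; h=2;                                    %sample size and span
xvalues=[0 2 10]; pvalues=[.53 .46999 .00001];
m1=xvalues*pvalues'; sigma=sqrt((xvalues.^2)*pvalues'-m1^2);
gamma3=((xvalues-m1).^3)*pvalues'/sigma^3;      %K3 in P1(x)
S=@(x)x-floor(x)-1/2;                           %S(x)=-Q(x)
a=max(xvalues)/2-m1; an=-n*a/(sigma*sqrt(n));   %offset, as in Appendix A
S1=@(x)-h/sigma*S((x+an)*sigma*sqrt(n)/h);      %the Q1 term of (27)
P1=@(x)gamma3/6*(1-x.^2);
Kol1=@(x)normcdf(x)+normpdf(x).*(P1(x)+S1(x))/sqrt(n);
Kol1(0.1)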

Applying the Edgeworth cumulative distribution function approximation using the first four cumulants,

$$F_n(x) = E_4(x, k^n) + o\left(\frac1n\right) = \Phi(x) - \phi(x)\left[\frac{h_2(x)k_3^n}{6} + \frac{h_3(x)k_4^n}{24} + \frac{h_5(x)(k_3^n)^2}{72}\right] + o\left(\frac1n\right), \tag{28}$$
where $h_j$ denotes the Hermite polynomial $H_j$ of (16), $k_3^n$ is the cumulant of order 3 of $S_n$, and in general
$$k_j^n = \frac{k_j\, n^{-\frac{j-2}{2}}}{\sigma^j}. \tag{29}$$
We can prove (29) easily. For the moment generating function of $S_n$ we have
$$E\left[e^{\theta S_n}\right] = E\left[e^{\theta\frac{\sum_{j=1}^{n} X_j - n\mu}{\sigma\sqrt{n}}}\right] = \left(E\left[e^{\frac{\theta}{\sqrt{n}}\frac{X_j - \mu}{\sigma}}\right]\right)^n = e^{-\frac{\theta\mu\sqrt{n}}{\sigma}}\left(E\left[e^{\frac{\theta}{\sigma\sqrt{n}}X_j}\right]\right)^n.$$
Using (5), $E\left[e^{\theta X}\right] = e^{\sum_{j=1}^{\infty} k_j\frac{\theta^j}{j!}}$, so
$$E\left[e^{\theta S_n}\right] = e^{-\frac{\theta\mu\sqrt{n}}{\sigma}}\, e^{\,n\sum_{j=1}^{\infty}\frac{k_j}{j!}\left(\frac{\theta}{\sigma\sqrt{n}}\right)^j} = e^{-\frac{\theta\mu\sqrt{n}}{\sigma}}\, e^{\sum_{j=1}^{\infty}\frac{k_j n^{-\frac{j-2}{2}}}{\sigma^j}\frac{\theta^j}{j!}}.$$
Since $k_1 = \mu$, the two terms that are linear in $\theta$ cancel, and
$$E\left[e^{\theta S_n}\right] = e^{\sum_{j=2}^{\infty}\frac{k_j n^{-\frac{j-2}{2}}}{\sigma^j}\frac{\theta^j}{j!}} = e^{\sum_{j=2}^{\infty} k_j^n\frac{\theta^j}{j!}},$$
where $k_j^n = \frac{k_j n^{-\frac{j-2}{2}}}{\sigma^j}$, $j \ge 2$ (in particular $k_2^n = 1$).

r 2 l a√n − 1 h/σ (x )√n dl D (x, kn)= Q − σ ( 1)l E (x, kn), (30) 4 l! √n l h/σ − dxl j j=1 ! X   where

l 1 l (l 1)!gl j∞=1 cos(2πjy)/(2 − (πj) ) if l is even Ql = − l 1 l (31) (l 1)!g ∞ sin(2πjy)/(2 − (πj) ) if l is odd ( − l Pj=1 with the constant gl givenP by

+1 if l =4j +1 or l =4j +2 for some integer j gl = 1 if l =4j +3 or l =4j for some integer j (− so n n Fn(x)= E4(x, k )+ D4(x, k ).
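The periodic functions $Q_l$ can be evaluated by truncating the Fourier series in (31); the following small Matlab sketch is our own illustration (the truncation level is an arbitrary choice), and it also shows that $Q_1$ recovers the sawtooth $Q(x) = [x] - x + \frac12$ of Section 3.3.1:

%Sketch: evaluate Q_l of (31) with the Fourier series truncated at 500 terms.
gl=@(l)1-2*(mod(l,4)==3 | mod(l,4)==0);  %the sign g_l defined above
j=1:500;                                 %truncation level (arbitrary)
Ql=@(l,y)factorial(l-1)*gl(l)*sum((mod(l,2)==0)*cos(2*pi*j*y)./(2^(l-1)*(pi*j).^l)...
+(mod(l,2)==1)*sin(2*pi*j*y)./(2^(l-1)*(pi*j).^l));
Ql(1,0.25)                               %approx 0.25 = Q(0.25)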

3.3.4 Continuity-Corrected Points Edgeworth Series

Feller [12] shows that the Edgeworth series, estimated only at the continuity-corrected points
$$x^+ = x + \frac{h/\sigma}{2\sqrt{n}},$$
will yield results accurate to $o(1/\sqrt{n})$. In other words, Feller suggests that it is possible to add corrections to the standard Edgeworth series expansion for continuous random variables: by modifying the cumulants of the standard expansion we get a formula valid for lattice random variables. Specifically, this method uses the cumulants adjusted by Sheppard's corrections to approximate the lattice distribution. Kolassa and McCullagh [16] prove that by evaluating such an Edgeworth expansion at continuity-corrected points, the resulting errors are as small as usual for an Edgeworth approximation. The main purpose of their work was to show that the discrete part in Esseen's approximation for lattice random variables can be omitted by modifying the expansion for the continuous random variables. In equations, they put
$$E_r(x^+; \lambda) = E_r(x^+, k^n) + D_r(x^+, k^n) + o\left(n^{\frac{1-r}{2}}\right), \qquad x^+ \text{ at the lattice points}, \tag{32}$$
where
$$\lambda_i = \begin{cases} k_i & \text{for } i > 1 \text{ and } i \text{ odd},\\[2pt] k_i - \varepsilon_i\left(\frac{h}{\sigma}\right)^{i} n^{1-\frac{i}{2}} & \text{for } i \text{ even},\end{cases} \tag{33}$$
are the Sheppard-corrected cumulants (the even-order correction written in the scaling used in the Matlab code of Appendix A). Wold [21] describes these corrected cumulants further. The adjustment $\varepsilon_i$ is the $i$th cumulant of the uniform random variable $U$ on $\left(-\frac12, \frac12\right)$, i.e. $\varepsilon_2 = \frac{1}{12}$, $\varepsilon_3 = 0$ and $\varepsilon_4 = -\frac{1}{120}$.

Remark 3.13. If $x$ are not lattice points, then (32) is not true. Moreover, graphically $E_r(x^+, k^n)$ does produce the stair shape for the distribution function.

Remark 3.14. There are discrete distributions which are not lattice distributions. For those, none of the Edgeworth expansions presented here may work. This problem can be solved using the saddle point approximation method, which we describe later.
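The Sheppard adjustment in (33) is simple to carry out; the following Matlab lines are our own condensation of the continuity-corrected part of the Appendix A program, computing the corrected cumulant $\lambda_4$ and evaluating the corrected one-term expansion:

%Sketch: Sheppard-corrected cumulants (33) and continuity-corrected points.
n=1000; h=2;
xvalues=[0 2 50]; pvalues=[.53 .46999 .00001];
m1=xvalues*pvalues'; sigma=sqrt((xvalues.^2)*pvalues'-m1^2);
gamma3=((xvalues-m1).^3)*pvalues'/sigma^3;  %odd cumulants are not corrected
eps4=-1/120;                                %4th cumulant of U(-1/2,1/2)
lamda4=((xvalues-m1).^4)*pvalues'/sigma^4-eps4/n*(h/sigma)^4; %even order corrected
Zp=@(x)x+h/(sigma*2*sqrt(n));               %continuity-corrected point x+
E3c=@(x)normcdf(Zp(x))-normpdf(Zp(x)).*(gamma3*(Zp(x).^2-1)/(6*sqrt(n)));
E3c(0)                                      %corrected one-term value at x=0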

3.3.5 Simulation on Triss Cards

Our main task here is to apply the Edgeworth expansion series for lattice random variables to a real application that companies may use, and one such application is chance games, for example Triss. We consider the case where the company produces a Triss card with three boxes; when you scratch a box you may win 0 SEK, 2 SEK, or 10 SEK with probabilities (0.53, 0.46999, 0.00001), respectively. We write a Matlab program for this purpose. The aim of the program is to approximate the cumulative distribution function of $S_n$ better than the simple normal approximation does; one important benefit of such a method is to reduce the time a game company spends checking the statistical aspects when deciding whether a specific chance game is reliable according to its rules.

First we start with $n = 1000$ lattice random numbers taking values in $\{0, 2, 10\}$ with probabilities (0.53, 0.46999, 0.00001), and let 3000 be the number of simulations. Estimating the cumulative distribution function gives Figure 5.

Figure 5: Kol1 represents the Edgeworth series expansion with one term in (27); Kolasa1 is the same as Kol1 but derived from the Kolassa formula [17]; Kolasa2 represents the Edgeworth expansion with two terms [17]; Kolasa3 is the Edgeworth expansion at continuity-corrected points.

Figure 6: Edgeworth approximation of the distribution function, $n = 1000$.

Zooming in to get a better picture of our approximation methods, we obtain Figure 6.

As a direct conclusion we can see that the normal approximation does not give as accurate a result as the others; it does not even have the staircase shape. We can also see that Kol1, Kolasa1, and Kolasa2 are exactly the same curve, and they lie closer to the empirical distribution function. As for Kolasa3, the continuity-corrected method gives very accurate results, but only at the continuity-corrected points, which are the points where the distribution function jumps.

Our second attempt is with $n = 100$ and 3000 simulations. Here we reduce the number of lattice random numbers to test the ability of the Edgeworth expansion to give an accurate outcome even for small $n$; see Figure 7.

Figure 7: Edgeworth approximation of the distribution function, $n = 100$.

We notice that our Edgeworth approximation methods do well here, much better than the normal approximation, and the stairs are also close to the empirical distribution function; the same holds for the continuity-corrected method (Kolasa3), which gives an accurate estimate at the continuity-corrected points. This is clearer in the zoomed picture, Figure 8. We still want to try one more case: we change the winning values to 0 SEK, 2 SEK, and 50 SEK with the same probabilities as before, and apply the same program to the new lattice case. Running the program gives Figure 9, with 1000 generated independent identically distributed lattice random numbers and 3000 simulations.


Figure 8: Edgeworth approximation of the distribution function, $n = 100$.

Figure 9: Edgeworth approximation of the distribution function, $n = 1000$.

We notice that, similarly to the previous cases, the normal approximation does not give as accurate a result as the Edgeworth approximations do. We also see the same large difference between the continuity-corrected method and the normal method at the continuity-corrected points. Now if we change $n$, the number of generated identically distributed lattice random numbers, to 100, with the same number of simulations, running the program gives Figure 10.

Figure 10: Edgeworth approximation of the distribution function, $n = 100$.

We will now present another approximation method, focusing on the applied part without going through its theoretical aspects.

4 Saddle Point Approximation

The statistical use of the saddle point approximation started with the pioneering article by Daniels (1954) [6], which treated the case of the mean of identically distributed random variables. The saddle point approximation is derived by applying saddle point integration techniques to a density inversion integral. Here we consider the problem of estimating the density of a random variable whose distribution depends on $n$. As mentioned before, we do not go into theoretical details; instead we present the formulas and how to use them.

First let $X_1, X_2, \ldots, X_n$ be iid continuous observations having density function $f(x)$. Suppose the moment generating function (mgf) of $f$, $\varphi(t) = E\left[e^{tX}\right] = \int e^{tx}f(x)\,dx$, exists for $t$ in a neighborhood of the origin, and let $K(t) = \log \varphi(t)$ be the cumulant generating function (cgf). Then for given $x$, let $\hat\phi = \hat\phi(x)$ denote the saddle point of $K$, in other words the solution of the equation $K'(\hat\phi) = x$, which in many applications is unique. For given $n$ let $f_n(x)$ denote the density of the sample mean $\bar X$. Then the saddle point approximation to $f_n$ says the following.

Theorem 4.1.
$$f_n(x) = \sqrt{\frac{n}{2\pi K''(\hat\phi)}}\; e^{\,n\left[K(\hat\phi) - \hat\phi x\right]}\left(1 + O(n^{-1})\right)$$
for any given value $x$.

The expansion can be carried to higher-order terms, in which case the error of the approximation goes down in powers of $\frac1n$ rather than powers of $\frac{1}{\sqrt{n}}$. This hints that the saddle point approximation may be more accurate at small sample sizes; an explicit explanation of this fact is given in Daniels (1954) [6]. The result in Theorem 4.1 can be derived in at least two ways, for example by Laplace's method or by an Edgeworth expansion with exponential tilting, both of which can be found in DasGupta [8].

Example: Suppose the $X_i$ are independent identically distributed $N(0,1)$. Then $\varphi(t) = e^{t^2/2}$ and so $K(t) = \frac{t^2}{2}$. The unique saddle point $\hat\phi$ solves $K'(\hat\phi) = \hat\phi = x$, giving the approximation
$$f_n(x) = \sqrt{\frac{n}{2\pi}}\; e^{\,n\left[\frac{x^2}{2} - x^2\right]} = \sqrt{\frac{n}{2\pi}}\; e^{-\frac{n x^2}{2}},$$
which coincides with the exact density of $\bar X$.

Theorem 4.2 (The Lattice Case). Let $X_1, X_2, \ldots, X_n$ be independent identically distributed lattice random variables. With
$$\hat\zeta = \sqrt{2n\left[\hat\phi x - K(\hat\phi)\right]}\;\mathrm{sgn}(\hat\phi) \quad\text{and}\quad \hat Z = \left(1 - e^{-\hat\phi}\right)\sqrt{n K^{(2)}(\hat\phi)},$$
we have
$$Q_n(x) = P\left(\bar X > x\right) = 1 - \Phi(\hat\zeta) + \phi(\hat\zeta)\left(\frac{1}{\hat Z} - \frac{1}{\hat\zeta}\right) + O\left(n^{-3/2}\right). \tag{34}$$

Remark: A continuity-corrected version of this theorem is presented in Daniels (1987) [7], but its practical improvement over the uncorrected version appears questionable.

4.1 Simulation With Saddle Point Approximation Method

We apply the saddle point approximation method to the same chance game as before, simulating the game and then comparing the empirical distribution function with the distribution function obtained from the saddle point approximation technique. Programming the saddle point method in Matlab and running it for 1000 iid random numbers with 3000 simulations gives Figure 11.

Figure 11: Saddle point approximation (sad). The normal approximation, shown for comparison, may be hardly visible in black-and-white print.

In this figure we can see that the saddle point method gives a better estimate than the normal approximation at the discontinuity points, where the distribution function has jumps.

A Matlab Codes

%Rani Basna
%Program 1: Edgeworth approximation (E3) for the standardized mean of
%exponential random variables, Section 3.2.4.
N=10000; lambda=2; n=5;
u=unifrnd(0,1,n,N);
X=-1/lambda*log(1-u);                 %exponential(lambda) by inversion
m1=1/lambda; m2=2/(lambda)^2; m3=6/(lambda)^3; m4=24/(lambda)^4;
sigma=sqrt(m2-m1^2);
gamma3=(m3-3*m1*m2+2*m1^3)/sigma^3;   %standardized third cumulant
h2=@(x)x.^2-1;                        %Hermite polynomials, (16)
h3=@(x)x.^3-3*x;
h4=@(x)x.^4-6*x.^2+3;
h5=@(x)x.^5-10*x.^3+15*x;
E3=@(x)normcdf(x)-normpdf(x).*(gamma3*h2(x)/(sqrt(n)*6));
stand=(mean(X,1)-m1)/(sigma/sqrt(n));
ecdf(stand); hold on
y=linspace(-4,4);
plot(y,normcdf(y),'g',y,E3(y),'r')
legend('empirical distr','normcdf','E3')

%Rani Basna
%Program 2: lattice Edgeworth (Esseen) approximation, symmetric Bernoulli case.
n=10;   %n as in Gnedenko and Kolmogorov
m=3000; %number of simulations
win=2*unidrnd(2,n,m)-3;               %random +-1 outcomes
m1=[-1 1]*[.5 .5]'; m2=([-1 1].^2)*[.5 .5]';
sigma=sqrt(m2-m1^2);
S=@(x)floor(x)-x+1/2;                 %the sawtooth Q(x)=[x]-x+1/2
D=@(x)2/sqrt(n)*S(x*sqrt(n)/2).*normpdf(x);  %discontinuous correction term
Kol=@(x)normcdf(x)+D(x);
standwin=(mean(win,1)-m1)/(sigma/sqrt(n));
ecdf(standwin); hold on
x=linspace(-4,4);
plot(x,normcdf(x),'g',x,Kol(x),'r'); hold off
legend('empirical distr','normcdf','Kol1')

%Rani Basna and Roger Pettersson
%Program 3: Edgeworth approximations for the Triss game, Section 3.3.5.
clear
n=1000; %n as in Gnedenko and Kolmogorov
m=3000; %number of simulations
x=rand(n,m);
win=0*(x<=0.53)+2*(x>0.53).*(x<=0.53+0.46999)+50*(x>0.53+0.46999);
mwin=mean(win,1); %sample of mean wins
xvalues=[0 2 50];
pvalues=[.53 .46999 .00001];
m1=xvalues*pvalues';
m2=(xvalues.^2)*pvalues';
m3=(xvalues.^3)*pvalues';
m4=(xvalues.^4)*pvalues';
a=max(xvalues)/2-m1;
h=2;                                  %span of the lattice
sigma=sqrt(m2-m1^2);
gamma3=(m3-3*m1*m2+2*m1^3)/sigma^3;
gamma4=((xvalues-m1).^4)*pvalues'/sigma^4;
S=@(x)x-floor(x)-1/2;                 %S(x)=-Q(x)
S22=@(x)(x-floor(x)).^2-(x-floor(x))+1/6;
pbar=-a/h;
an=h*n*pbar/sigma/sqrt(n);
S1=@(x)-h/sigma*S((x+an)*sigma*sqrt(n)/h);   %Q1 term of (27)
S2=@(x)h^2/(2*sigma^2*n)*S22((x+an)*sigma*sqrt(n)/h);
Q1=@(x)1/6*gamma3*(1-x.^2);           %P1(x) of (27)
Q2=@(x)10*gamma3^2*x.^5/prod(1:6)+(gamma4/3-10*gamma3^2/9)...
*x.^3/8+(5*gamma3^2/24-gamma4/8)*x;
Kol1=@(x)normcdf(x)+normpdf(x).*(Q1(x)+S1(x))/sqrt(n);
Kol2=@(x)Kol1(x)+normpdf(x).*Q2(x)/n;
h2=@(x)x.^2-1;                        %Hermite polynomials, (16)
h3=@(x)x.^3-3*x;
h4=@(x)x.^4-6*x.^2+3;
h5=@(x)x.^5-10*x.^3+15*x;
h6=@(x)x.^6-15*x.^4+45*x.^2-15;
h7=@(x)x.^7-21*x.^5+105*x.^3-105*x;
E3=@(x)normcdf(x)-normpdf(x).*(gamma3*h2(x)/(sqrt(n)*6));
E31=@(x)normpdf(x).*(1+gamma3*h3(x)/(6*sqrt(n)));  %derivative of E3
D3=@(x)S1(x)/sqrt(n).*E31(x);
Kolasa1=@(x)D3(x)+E3(x);
E4=@(x)E3(x)-normpdf(x).*(gamma4*h3(x)/(24*n)+gamma3^2*h5(x)/(72*n));
E41=@(x)normpdf(x).*(1+gamma3*h3(x)/(6*sqrt(n))+...
gamma4*h4(x)/(24*n)+gamma3^2*h6(x)/(72*n));
E411=@(x)-normpdf(x).*(x+gamma3*h4(x)/(6*sqrt(n))+...
gamma4*h5(x)/(24*n)+gamma3^2*h7(x)/(72*n));
D4=@(x)D3(x)+S2(x).*x.*normpdf(x);
Kolasa2=@(x)D4(x)+E4(x);
%Sheppard-corrected cumulants (33) and continuity-corrected points
eps2=1/12; eps4=-1/120;
lamda3=gamma3;
lamda4=gamma4-eps4/n*(h/sigma)^4;
Z=@(x)x+h/(sigma*2*sqrt(n));          %continuity-corrected point x+
e3=@(x)normcdf(Z(x))-normpdf(Z(x)).*(lamda3*h2(Z(x))/(sqrt(n)*6));
e31=@(x)normpdf(Z(x)).*(1+lamda3*h3(Z(x))/(6*sqrt(n)));
d3=@(x)S1(Z(x))/sqrt(n).*e31(x);      %e31 already shifts by Z internally
Kolasa3=@(x)e3(x);
e4=@(x)e3(x)-normpdf(Z(x)).*(lamda4*h3(Z(x))/(24*n)+lamda3^2*h5(Z(x))/(72*n));
e41=@(x)normpdf(Z(x)).*(1+lamda3*h3(Z(x))/(6*sqrt(n))+...
lamda4*h4(Z(x))/(24*n)+lamda3^2*h6(Z(x))/(72*n));
e411=@(x)-normpdf(Z(x)).*(Z(x)+lamda3*h4(Z(x))/(6*sqrt(n))+...
lamda4*h5(Z(x))/(24*n)+lamda3^2*h7(Z(x))/(72*n));
d4=@(x)d3(x)+S2(Z(x)).*Z(x).*normpdf(Z(x));
Kolasa4=@(x)e4(x);
Kolasa11=@(x)normcdf(x)-normpdf(x)/sqrt(n).*((gamma3*h2(x))/6+...
(gamma4*h3(x))/(sqrt(n)*24)+(gamma3^2*h5(x))/(72*sqrt(n))+...
S1(x)+S1(x).*(gamma3*h3(x))/(6*sqrt(n))-(x.*S2(x))*h/(2*sqrt(n)*sigma));
standwin=(mean(win,1)-m1)/(sigma/sqrt(n)); %standardized mean data
ecdf(standwin); hold on
display('emp percentiles; normal perc')
[prctile(mwin,2.5) prctile(mwin,97.5); %empirical percentiles
m1+norminv(0.025)*sigma/sqrt(n) m1+norminv(0.975)*sigma/sqrt(n)]
alpha=0.025;
R=@(y)1-Kol1(y);
zalpha=norminv(1-alpha);
yalpha=fzero(@(y)R(y)-alpha,zalpha);
xalphaNormal=m1+zalpha*sigma/sqrt(n)
xalphaKol1=m1+yalpha*sigma/sqrt(n)
y=linspace(-2,2);
plot(y,normcdf(y),'g',y,Kol1(y),'r',y,Kolasa1(y),...
'c',y,Kolasa2(y),'m',y,Kolasa3(y),'y'); hold off
legend('empirical distr','normcdf','Kol1','Kolasa1','Kolasa2','Kolasa3')

%Rani Basna and Roger Pettersson
%Program 4: saddle point approximation (Theorem 4.2) for the Triss game.
clear
n=100; m=3000;
X=rand(n,m);
win=0*(X<=0.53)+2*(X>0.53).*(X<=0.53+0.46999)+10*(X>0.53+0.46999);
xvalues=[0 2 10];
pvalues=[.53 .46999 .00001];
mu=sum(xvalues.*pvalues);
sigma=sqrt(sum(((xvalues-mu).^2).*pvalues));
xbar=mean(win,1);
stand=(xbar-mu)/(sigma/sqrt(n));
psi=@(t)sum(exp(t*xvalues).*pvalues);      %moment generating function
psi1=@(t)sum(xvalues.*exp(t*xvalues).*pvalues);
psi11=@(t)sum(xvalues.^2.*exp(t*xvalues).*pvalues);
k=@(t)log(psi(t));                         %cumulant generating function K
k1=@(t)psi1(t)/psi(t);                     %K'
k11=@(t)(psi11(t).*psi(t)-psi1(t).^2)/psi(t).^2;  %K''
y=linspace(-2,2);
x=mu+sigma*y/sqrt(n);
for i=1:length(x)
theta(i)=fzero(@(t)k1(t)-x(i),max(x(i)-1,0));   %saddle point K'(theta)=x
Zetabar(i)=sqrt(2*n*(theta(i)*x(i)-k(theta(i))))*sign(theta(i));
Zbar(i)=(1-exp(-theta(i)))*sqrt(n*k11(theta(i)));
sad(i)=normcdf(Zetabar(i))-normpdf(Zetabar(i))*(1/Zbar(i)-1/Zetabar(i)); %CDF from (34)
end
ecdf(stand); hold on
plot(y,normcdf(y),'g',y,sad,'r')
legend('empirical distr','normcdf','sad')

References

[1] Bhattacharya, R.N. and Rao, R.R. (1976). Normal Approximation and Asymptotic Expansions. Wiley, New York.

[2] Billingsley, P. (1986). Probability and Measure. 2nd ed. Wiley, New York.

[3] Blom, G. (1984). Sannolikhetsteori med tillämpningar. Studentlitteratur.

[4] Cramér, H. (1946). Mathematical Methods of Statistics. Princeton Uni- versity Press, Princeton, NJ.

[5] Cramér, H. (1970). Random Variables and Probability Distributions, 3rd ed. Cambridge University Press, Cambridge, UK.

[6] Daniels, H.E. (1954). Saddle Point Approximations in Statistics. Ann. Math. Statist. 25, 631-649.

[7] Daniels, H.E.(1987). Tail Probability Approximations. Int. Stat. Rev., 55, 37-48.

[8] DasGupta, A. (2008). Asymptotic Theory of Statistics and Probability. Springer. New York.

[9] Durrett, R. (1996). Probability: Theory and Examples. Wadsworth.

[10] Esseen, C.G. (1945). Fourier Analysis for Distribution Functions, Acta Mathematica, 77, 1-125.

[11] Esseen, C.G. (1968). On The Remainder Term in The Central Limit Theorem. Arkiv för Matematik 8, 7-15.

[12] Feller, W. (1971). An Introduction to Probability Theory and Its Appli- cations,Vol 2, 2nd ed. Wiley, New York.

[13] Gnedenko, B.V. and Kolmogorov, A.N. (1954). Limit Distributions for Sums of Independent Random Variables. Addison-Wesley, Reading, MA.

[14] Gut, A. (2005). Probability: A Graduate Course. Springer.

[15] Hall,P. (1992). The Bootstrap and Edgeworth Expansion, Springer- Verlag, New York.

[16] Kolassa, J.E., and McCullagh, P.(1990), Edgeworth Series for Lattice Distributions, Annals of Statistics, 18, 981-985.

38 [17] Kolassa, J.E, (2003). Series Approximation Methods in Statistics. Springer.

[18] Petrov, V.V. (1975). Sums of Independent Random Variables. Springer- Verlag, New York.

[19] Ross, S. (2006). A First Course in Probability. 7th ed. Pearson Prentice Hall.

[20] Råde, L. and Westergren, B. (2004). Mathematics Handbook for Science and Engineering. Studentlitteratur.

[21] Wold, H. Sheppard's correction formulae in several variables. Skandinavisk Aktuarietidskrift 17, 248-255.


SE-351 95 Växjö / SE-391 82 Kalmar Tel +46-772-28 80 00 [email protected] Lnu.se/dfm