U.U.D.M. Project Report 2008:22

A simulation method for skewness correction

Måns Eriksson

Degree project in mathematical statistics, 30 credits. Supervisor and examiner: Silvelyn Zwanzig. December 2008

Department of Mathematics Uppsala University

Abstract

Let X_1, ..., X_n be i.i.d. random variables with known variance and skewness. A one-sided confidence interval for the mean with approximate confidence level α can be constructed using normal approximation. For skew distributions the actual confidence level will then be α + o(1). We propose a method for obtaining confidence intervals with confidence level α + o(n^{-1/2}) using skewness correcting pseudo-random variables. The method is compared with a known method, Edgeworth correction.

Acknowledgements

I would like to thank my advisor Silvelyn Zwanzig for introducing me to the subject, for mathematical and stylistic guidance and for always encouraging me.

I would also like to thank my friends and teachers at and around the department of mathematics for having inspired me to study mathematics, and for continuing to inspire me.

Contents

1 Introduction
  1.1 Skewness
  1.2 Setting and notation

2 The Edgeworth expansion
  2.1 Definition and formal conditions
  2.2 Derivations
    2.2.1 Edgeworth expansion for S_n
    2.2.2 Edgeworth expansions for more general statistics
    2.2.3 Edgeworth expansion for T_n
    2.2.4 Some remarks; skewness correction
  2.3 Cornish-Fisher expansions for quantiles

3 Methods for skewness correction
  3.1 Coverages of confidence intervals
  3.2 Edgeworth correction
  3.3 The bootstrap
    3.3.1 The bootstrap and S_n
    3.3.2 Bootstrap confidence intervals
  3.4 A new simulation method
    3.4.1 Skewness correction through addition of a random variable
    3.4.2 Simulation procedure

4 Comparison
  4.1 Coverages of confidence intervals
  4.2 Criteria for the comparison
  4.3 Comparisons of the upper limits
  4.4 Simulation results
  4.5 Discussion

A Appendix: Skewness and kurtosis
  A.1 The skewness of a sum of random variables
  A.2 The kurtosis of a sum of random variables

B Appendix: P(θ̂_new ≤ θ̂_Ecorr)

C Appendix: Simulation results

1 Introduction

1.1 Skewness

The notion of skewness has long been a part of statistics. It dates back to the 19th century, most notably to an article by Pearson from 1895 ([26]). Skew distributions are found in all areas of application, ranging from finance to biology and physics. It has been seen both in theory and in practice that deviations from normality in the form of skewness can have effects too big to ignore on the validity and performance of many statistical methods and procedures. This thesis discusses skewness in the context of the central limit theorem and normal approximation, in particular as applied to confidence intervals. Some methods for skewness correction are discussed and a new simulation method is proposed. We assume that the concept of skewness is known and refer to Appendix A for some basic facts about skewness.

1.2 Setting and notation

Throughout the thesis we assume that we have an i.i.d. sample X_1, ..., X_n, with EX = μ, Var(X) = σ^2 and E|X|^3 < ∞, such that X satisfies Cramér's condition lim sup_{t→∞} |ϕ(t)| < 1, where ϕ is the characteristic function of X. At times we will also assume that EX^4 < ∞. We use X to denote a generic X_i, that is, X is a random variable with the same distribution as the X_i. Thus, for instance, EX is the mean of the distribution of the observations; EX = EX_i for all i.

The α-quantile v_α of the distribution of some random variable X is defined to be such that P(X ≤ v_α) = α. When X ~ N(0,1) we denote the quantile λ_α, that is, Φ(λ_α) = α.

We use A_n to denote a general statistic, S_n = n^{1/2}(X̄ − μ)/σ to denote the standardized sample mean and T_n = n^{1/2}(X̄ − μ)/σ̂ to denote the studentized sample mean, where σ̂^2 = (1/n) Σ_i (X_i − X̄)^2.

The skewness E(X − μ)^3/σ^3 of a random variable X is denoted Skew(X), γ or γ_X if we need to distinguish between different random variables. The kurtosis of X, E(X − μ)^4/σ^4 − 3, is denoted Kurt(X), κ or κ_X. Basic facts about skewness and kurtosis are stated in Appendix A.

As for asymptotic notation, for real-valued sequences a_n and b_n we say that a_n = o(b_n) if a_n/b_n → 0 as n → ∞ and a_n = O(b_n) if a_n/b_n is bounded as n → ∞. Finally, we say that a sequence X_n of random variables is bounded in probability if lim_{c→∞} lim sup_{n→∞} P(|X_n| > c) = 0. We write this as X_n = O_P(1), and if, for some sequence a_n, a_n X_n = O_P(1), we write X_n = O_P(1/a_n).

2 The Edgeworth expansion

In this section we introduce our main tool, the Edgeworth expansion. Later we will use it to determine the coverage of confidence intervals.

2.1 Definition and formal conditions

Theorem 1. Assume that X_1, ..., X_n is an i.i.d. sample from a univariate distribution with mean μ, variance σ^2 and E|X|^{j+2} < ∞, that satisfies lim sup_{t→∞} |ϕ(t)| < 1. Let S_n = n^{1/2}(X̄ − μ)/σ. Then

P(S_n ≤ x) = Φ(x) + n^{-1/2}p_1(x)φ(x) + ... + n^{-j/2}p_j(x)φ(x) + o(n^{-j/2})   (1)

uniformly in x, where Φ(x) and φ(x) are the standard normal distribution function and density function and p_k is a polynomial of degree 3k − 1. In particular

p_1(x) = −(1/6)γ(x^2 − 1)   and
p_2(x) = −x((1/24)κ(x^2 − 3) + (1/72)γ^2(x^4 − 10x^2 + 15)).

Proof. A proof is given in Section 2.2.1. See also [7] and [8].

(1) is called an Edgeworth expansion for S_n. The condition lim sup_{t→∞} |ϕ(t)| < 1 is known as Cramér's condition and was derived by Cramér in [8]. Note that the condition holds whenever X is absolutely continuous. This is an immediate consequence of the Riemann-Lebesgue lemma (Theorem 1.5 in Chapter 4 of [18]). Moreover, if we limit the expansion to P(S_n ≤ x) = Φ(x) + n^{-1/2}p_1(x)φ(x) + o(n^{-1/2}), so that the remainder term is o(n^{-1/2}), then it suffices that X has a non-lattice distribution, as was shown by Esseen in [16]. The Edgeworth expansion was first developed for the statistic S_n but has later been extended to other statistics.
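To give a concrete sense of the first-order term, the following small R sketch (ours, not from the thesis; the choice X ~ Exp(1), for which μ = 1, σ = 1 and γ = 2, is an assumption made for the illustration) compares the simulated distribution function of S_n with the normal approximation and the one-term Edgeworth approximation:

# Compare P(S_n <= x) with Phi(x) and Phi(x) + n^(-1/2) p_1(x) phi(x)
# for X ~ Exp(1): mu = 1, sigma = 1, gamma = 2.
set.seed(1)
n <- 10; B <- 1e5; gamma <- 2
Sn <- replicate(B, sqrt(n) * (mean(rexp(n)) - 1))
x <- 1.645
p1 <- -gamma * (x^2 - 1) / 6
c(empirical = mean(Sn <= x),
  normal    = pnorm(x),
  edgeworth = pnorm(x) + p1 * dnorm(x) / sqrt(n))

In runs of this kind the one-term approximation typically removes most of the error of the plain normal approximation.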

Definition 1. Let A_n denote a statistic. Then if

P(A_n ≤ x) = Φ(x) + n^{-1/2}a_1(x)φ(x) + ... + n^{-j/2}a_j(x)φ(x) + o(n^{-j/2}),   (2)

where Φ(x) and φ(x) are the standard normal distribution function and density function and a_k is a polynomial of degree 3k − 1, (2) is called the Edgeworth expansion for A_n. If a_k = 0 for all k < i, the normal approximation of the distribution is said to be i:th order correct.

In general, the polynomials a_k will depend on the moments of the statistic. We can therefore, in a sense, view the Edgeworth expansion as an extension of the central limit theorem, where information about the higher moments of the involved random variables is used to obtain a better approximation of the distribution function of A_n. The expansion gives an expression for the size of the remainder term depending on the sample size.

We might want to compare this to the Berry-Esseen theorem (see for instance [16] or Section 7.6 of [18]), which essentially, in our context, says that the error in the normal approximation is of order n^{-1/2}.

The Edgeworth expansion for simple statistics was first introduced in papers by Chebyshev in 1890 ([4]) and Edgeworth in 1894, 1905 and 1907 ([12, 13, 14]). The idea was made mathematically rigorous by Cramér in 1928 ([7]) and Esseen in 1945 ([16]). The expansions and their applications for more general statistics were then developed in several papers, including [2], by various authors in the mid-1900s. A thorough treatment of the Edgeworth expansion is found in Chapter 2 of [22]. Chapter 13 of [10] gives a brief introduction to the Edgeworth expansion with some of the most important results, and Section 17.7 of [8] is a standard reference for the case where the expansion for S_n is considered.

Conditions for (2) to hold are given next. Although we will focus on expansions of quite simple statistics, the Edgeworth expansion can be used in very general circumstances. We state a theorem by Bhattacharya and Ghosh ([2]) that provides conditions for the Edgeworth expansion (2) to hold in a general case and illustrate how the theorem relates to the expansion for S_n.

Let X, X_1, X_2, ..., X_n be i.i.d. random column vectors in R^d with mean μ and let X̄ = n^{-1} Σ_{i=1}^n X_i. Let A : R^d → R be a function of the form A_S(x) = (g(x) − g(μ))/h(μ) or A_T(x) = (g(x) − g(μ))/h(x), where g and h are known, θ̂ = g(X̄) is an estimator of the scalar parameter θ = g(μ), h(μ)^2 is the asymptotic variance of n^{1/2}θ̂ and h(X̄) is an estimator of h(μ).

For t ∈ R^d, t = (t^{(1)}, t^{(2)}, ..., t^{(d)}), define ||t|| = ((t^{(1)})^2 + ... + (t^{(d)})^2)^{1/2} and, for a random d-vector X, ϕ(t) = E(exp(i Σ_{j=1}^d t^{(j)}X^{(j)})).

Theorem 2. Assume that A has j + 2 continuous derivatives in a neighbourhood of μ = E(X) and that A(μ) = 0. Furthermore, assume that E(||X||^{j+2}) < ∞ and that the characteristic function ϕ of X is such that lim sup_{||t||→∞} |ϕ(t)| < 1. Let σ_A be the asymptotic standard deviation of n^{1/2}A(X̄) and assume that σ_A > 0. Then, with A_n = n^{1/2}A(X̄)/σ_A and for j ≥ 1,

P(A_n ≤ x) = Φ(x) + n^{-1/2}a_1(x)φ(x) + ... + n^{-j/2}a_j(x)φ(x) + o(n^{-j/2})   (3)

uniformly in x. a_k is a polynomial of degree 3k − 1 with coefficients depending on A and on moments of X of order less than or equal to k + 2. a_k is odd for even k and even for odd k.

Proof. See [2].

Note that the above theorem is a summary of the results in Bhattacharya and Ghosh's 1978 paper and is thus not stated as one single theorem in the original work. Our summary largely resembles that in Chapter 2 of [22].

The class of functions that satisfy the conditions in Theorem 2 contains many functions of great interest. In particular, we can write any moment estimator, i.e. an estimator based on sample moments, as a function A of a vector mean in the same way that we will do below for the mean.

The Edgeworth expansion is sometimes written as an infinite series, P(A_n ≤ x) = Φ(x) + n^{-1/2}a_1(x)φ(x) + ... + n^{-j/2}a_j(x)φ(x) + .... However, for the series to converge it is required, for an absolutely continuous random variable, that E(exp((X − μ)^2/(4σ^2))) < ∞; a condition that fails even for exponentially distributed X (see [7]). Thus we prefer the truncated series used in (3), which also turns out to be more useful in practice.

Before we show how Theorem 2 relates to the expansion for S_n we state a slightly more general corollary.

Corollary 1. Let A_n be either A_S = n^{1/2}(θ̂ − θ)/σ or A_T = n^{1/2}(θ̂ − θ)/σ̂, where θ is some unknown scalar parameter, θ̂ is an asymptotically unbiased estimator of θ, σ^2 is the asymptotic variance of n^{1/2}θ̂ and σ̂^2 is some consistent estimator of σ^2. Then the first two polynomials in the Edgeworth expansion are

a_1(x) = −(k_{1,2} + (1/6)k_{3,1}(x^2 − 1))

and

a_2(x) = −x((1/2)(k_{2,2} + k_{1,2}^2) + (1/24)(k_{4,1} + 4k_{1,2}k_{3,1})(x^2 − 3) + (1/72)k_{3,1}^2(x^4 − 10x^2 + 15)),

where the k_{j,i} come from an expansion of the j:th cumulant of A_n:

κ_{j,n} = n^{-(j-2)/2}(k_{j,1} + n^{-1}k_{j,2} + n^{-2}k_{j,3} + ...).

Proof. Details are given in Section 2.2.2.

Theorem 2 can be used to show that S_n as well as the studentized sample mean T_n = n^{1/2}(X̄ − μ)/σ̂ admit Edgeworth expansions. Assume that X_1, ..., X_n is a sample from a univariate distribution, that the unknown parameter θ is the mean μ of the distribution and that the distribution has variance σ^2. Take d = 2, X_i = (X_i, X_i^2)^T and μ = E(X) = (EX, EX^2)^T, and let g(x^{(1)}, x^{(2)}) = x^{(1)} and h(x^{(1)}, x^{(2)}) = (x^{(2)} − (x^{(1)})^2)^{1/2}. Then g(μ) = μ and g(X̄) = X̄. Furthermore h(μ)^2 = σ^2 and

h(X̄)^2 = n^{-1} Σ_{i=1}^n X_i^2 − (n^{-1} Σ_{i=1}^n X_i)^2 = n^{-1} Σ_{i=1}^n (X_i − X̄)^2 = σ̂^2.

A_S = (g(x) − g(μ))/h(μ) and A_T = (g(x) − g(μ))/h(x) both fulfill the conditions in Theorem 2, and the asymptotic standard deviation σ_A of n^{1/2}A(X̄) is 1. Furthermore, the moment condition E((X^2 + (X^2)^2)^{(j+2)/2}) < ∞ can be reduced to E(|X|^{j+2}) < ∞, and Cramér's condition reduces to lim sup_{t→∞} |ϕ(t)| < 1. Thus the conditions for the existence of the expansions for S_n and T_n follow and turn out to be those in Theorem 1. Both statistics are of the form that is considered in Corollary 1 and the polynomials in their expansions can thus be found.

2.2 Derivations

Knowing under which conditions the Edgeworth expansion exists, we are ready to derive the expressions for the polynomials p_1 and p_2.

2.2.1 Edgeworth expansion for S_n

As before, let S_n = n^{1/2}(X̄ − μ)/σ. Assuming that E|X|^{j+2} < ∞ and that X fulfills Cramér's condition, by Corollary 1

P(S_n ≤ x) = Φ(x) + n^{-1/2}p_1(x)φ(x) + ... + n^{-j/2}p_j(x)φ(x) + o(n^{-j/2}).   (4)

To actually be able to make reasonable use of the expansion we need to find expressions for the polynomials p_k. The case where j = 2 will prove to be of special interest to us, so although we consider general j we will in the end only derive explicit expressions for p_1 and p_2. Our exposition is mainly based on those in [8] and [22] but aims to be somewhat more thorough than those that we have seen in the existing literature.

Since we will assume the existence of moments of order higher than 2, S_n is asymptotically N(0,1)-distributed, i.e. S_n converges in distribution to S ~ N(0,1), and thus the characteristic function ϕ_n of S_n converges to e^{-t^2/2}, the characteristic function of the standard normal distribution, as n tends to infinity. That is, as n → ∞,

ϕ_n(t) = E(exp(itS_n)) → E(exp(itS)) = e^{-t^2/2}   for −∞ < t < ∞,

where S ~ N(0,1).

Recall that if Y_1, ..., Y_n are i.i.d. and S_n = Y_1 + ... + Y_n then ϕ_{S_n}(t) = (ϕ_{Y_1}(t))^n (see Theorem 1.8 in Chapter 4 of [18]). Now, let Y_i = (X_i − μ)/σ. Then S_n = n^{1/2}(X̄ − μ)/σ = n^{1/2}(1/n) Σ_{i=1}^n Y_i = n^{-1/2} Σ_{i=1}^n Y_i and thus

ϕ_n(t) = E(exp(itS_n)) = E(exp(itn^{-1/2} Σ_{i=1}^n Y_i)) = (ϕ_Y(tn^{-1/2}))^n.

Next we define the cumulant generating function for Y as ln ϕ_Y(t). A MacLaurin expansion of ln ϕ_Y(t) shows that if E(|Y|^k) < ∞ then, after a rearrangement of the terms, we can write ln ϕ_Y(t) on the form

ln ϕ_Y(t) = Σ_{j=1}^k κ_j (it)^j/j! + o(|t|^k)   as t → 0

for some {κ_j}. We call the coefficients κ_j the cumulants of Y; in particular κ_j is the j:th cumulant. See Section 15.10 of [8] for details.

By Theorem 4.2 in Chapter 4 of [18] we have that ϕ_Y(t) = 1 + Σ_{j=1}^k EY^j (it)^j/j! + o(|t|^k) as t → 0 when E(|Y|^k) < ∞. If we for a moment don't worry about the existence of moments and convergence of the series, we conclude that

Σ_{j=1}^∞ κ_j (it)^j/j! = ln ϕ_Y(t) = ln(1 + Σ_{j=1}^∞ EY^j (it)^j/j!),

and by looking at the MacLaurin expansion of the right hand side (i.e. the MacLaurin expansion of the function ln(1 + x) where we replace x with Σ EY^j (it)^j/j!) it follows that

Σ_{j=1}^∞ κ_j (it)^j/j! = Σ_{k=1}^∞ ((−1)^{k+1}/k) (Σ_{j=1}^∞ EY^j (it)^j/j!)^k.

Comparing the coefficients of (it)^j we find that

κ_1 = EY = 0,
κ_2 = EY^2 − (EY)^2 = 1,
κ_3 = EY^3 − 3EY^2 EY + 2(EY)^3 = EY^3,
κ_4 = EY^4 − 3(EY^2)^2 − 4EY^3 EY + 12(EY)^2 EY^2 − 6(EY)^4 = EY^4 − 3.

The expression for κ_j holds whenever E(|Y|^j) < ∞. Note that the assumption that E(|X|^j) < ∞ implies that E(|Y|^j) < ∞.

Returning our attention to S_n, the relation ϕ_n(t) = (ϕ_Y(tn^{-1/2}))^n and the fact that κ_1 = 0 and κ_2 = 1 now give us that

ϕ_n(t) = exp(n Σ_{j=1}^∞ κ_j (itn^{-1/2})^j/j!) = exp(Σ_{j=1}^∞ n^{-(j-2)/2} κ_j (it)^j/j!)
       = exp(−t^2/2 + n^{-1/2}(1/3!)κ_3(it)^3 + ... + n^{-(j-2)/2}(1/j!)κ_j(it)^j + ...)
       = e^{-t^2/2} exp(n^{-1/2}(1/3!)κ_3(it)^3 + ... + n^{-(j-2)/2}(1/j!)κ_j(it)^j + ...)
       = e^{-t^2/2}(1 + n^{-1/2}r_1(it) + n^{-1}r_2(it) + ... + n^{-j/2}r_j(it) + ...),

where the last equality is obtained through the MacLaurin expansion e^x = 1 + x + x^2/2! + .... Here r_j is a polynomial of degree 3j that depends on κ_3, ..., κ_{j+2}. By comparing the coefficients of n^{-j/2} on the last two lines we find that

r_1(x) = (1/6)κ_3 x^3   and   r_2(x) = (1/24)κ_4 x^4 + (1/72)κ_3^2 x^6.

We can rewrite the expression for ϕ_n(t) above as

ϕ_n(t) = e^{-t^2/2} + n^{-1/2}r_1(it)e^{-t^2/2} + n^{-1}r_2(it)e^{-t^2/2} + ... + n^{-j/2}r_j(it)e^{-t^2/2} + ...   (5)

Now, since ϕ_n(t) = ∫_{-∞}^{∞} e^{itx} dP(S_n ≤ x) and e^{-t^2/2} = ∫_{-∞}^{∞} e^{itx} dΦ(x), it seems plausible that there is an inversion of (5) of the form

P(S_n ≤ x) = Φ(x) + n^{-1/2}R_1(x) + ... + n^{-j/2}R_j(x) + ...,

where R_k is a function such that ∫_{-∞}^{∞} e^{itx} dR_k(x) = r_k(it)e^{-t^2/2}, so that

ϕ_n(t) = ∫_{-∞}^{∞} e^{itx} dP(S_n ≤ x) = ∫_{-∞}^{∞} e^{itx} dΦ(x) + n^{-1/2} ∫_{-∞}^{∞} e^{itx} dR_1(x) + ... = (5).

We would thus like to try to find such R_k. By repeating integration by parts j times we find that

e^{-t^2/2} = ∫_{-∞}^{∞} e^{itx} dΦ(x) = (−it)^{-1} ∫_{-∞}^{∞} e^{itx} dΦ^{(1)}(x) = ... = (−it)^{-j} ∫_{-∞}^{∞} e^{itx} dΦ^{(j)}(x),

where Φ^{(k)}(x) = d^k Φ(x)/dx^k = D^k Φ(x). Hence ∫_{-∞}^{∞} e^{itx} d((−D)^k Φ(x)) = (it)^k e^{-t^2/2}. Interpreting r_k(−D) as a polynomial in D, making r_k(−D) a differential operator, we thus have that ∫_{-∞}^{∞} e^{itx} d(r_k(−D)Φ(x)) = r_k(it)e^{-t^2/2}. Thus

R_k(x) = r_k(−D)Φ(x).

By differentiating Φ(x) we find that, for k ≥ 1, (−D)^k Φ(x) = −H_{k-1}(x)φ(x), where φ(x) is the density function of the standard normal distribution and the H_k are the Hermite polynomials:

H_0(x) = 1,
H_1(x) = x,
H_2(x) = x^2 − 1,
H_3(x) = x(x^2 − 3),
H_4(x) = x^4 − 6x^2 + 3,
H_5(x) = x(x^4 − 10x^2 + 15), ...
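As a quick numeric sanity check of the identity (−D)^k Φ(x) = −H_{k-1}(x)φ(x) (an illustration of ours, here for k = 3, using Φ''' = φ''), one can differentiate φ numerically in R:

x <- 0.7; h <- 1e-3
phi2 <- (dnorm(x + h) - 2 * dnorm(x) + dnorm(x - h)) / h^2  # numeric phi''(x)
c(minus_D3_Phi = -phi2,                 # (-D)^3 Phi(x) = -phi''(x)
  hermite_form = -(x^2 - 1) * dnorm(x)) # -H_2(x) phi(x); both ~ 0.159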

We concluded above that r_1(x) = (1/6)κ_3 x^3 and r_2(x) = (1/24)κ_4 x^4 + (1/72)κ_3^2 x^6, and since R_k(x) = r_k(−D)Φ(x) we thus have that

R_1(x) = (1/6)κ_3(−D)^3Φ(x) = −(1/6)κ_3 H_2(x)φ(x) = −(1/6)κ_3(x^2 − 1)φ(x)

and

R_2(x) = −(1/24)κ_4 H_3(x)φ(x) − (1/72)κ_3^2 H_5(x)φ(x)
       = −x((1/24)κ_4(x^2 − 3) + (1/72)κ_3^2(x^4 − 10x^2 + 15))φ(x).

Thus

P(S_n ≤ x) = Φ(x) + n^{-1/2}R_1(x) + ... + n^{-j/2}R_j(x) + ...
           = Φ(x) + n^{-1/2}p_1(x)φ(x) + n^{-1}p_2(x)φ(x) + ... + n^{-j/2}p_j(x)φ(x) + ...   (6)

with

p_1(x) = −(1/6)κ_3(x^2 − 1)   and   p_2(x) = −x((1/24)κ_4(x^2 − 3) + (1/72)κ_3^2(x^4 − 10x^2 + 15)).

κ_3 is the skewness of X and κ_4 the kurtosis. It can be shown (see Section 2.4 of [22]) that the inversion of (5) leading to (6) is valid when X is nonsingular and E(|X|^{j+2}) < ∞ if we limit the series to j terms:

P(S_n ≤ x) = Φ(x) + n^{-1/2}R_1(x) + ... + n^{-j/2}R_j(x) + o(n^{-j/2})
           = Φ(x) + n^{-1/2}p_1(x)φ(x) + n^{-1}p_2(x)φ(x) + ... + n^{-j/2}p_j(x)φ(x) + o(n^{-j/2}).   (7)

This completes the proof of half of Corollary 1.

If X ~ N(μ, σ^2) then S_n ~ N(0,1), so we'd expect p_j to be 0 for all j. This is indeed the case, since κ_j = 0 for j ≥ 3 for the standard normal distribution. Thus the "expansion" still holds when P(S_n ≤ x) = Φ(x).

It is of interest to note that both the skewness and the kurtosis are scale and translation invariant, so that the third and fourth cumulants of X and Y coincide (we use Y in the calculations above because standardized variables are easier to handle). It can be shown, by looking at characteristic functions or by straightforward calculation (as is done in Appendix A), that the skewness of X̄ is n^{-1/2} times the skewness of X. Similarly the kurtosis of X̄ is n^{-1} times the kurtosis of X, and in general, for j ≥ 2, the j:th cumulant of X̄ will be the j:th cumulant of X times n^{-(j-2)/2}. Thus we can view the factors n^{-(j-2)/2} in the Edgeworth expansion for S_n as coming from the cumulants of X̄.

2.2.2 Edgeworth expansions for more general statistics

The procedure for finding the Edgeworth expansion for a more general statistic A_n is essentially the same as that in the previous section. We briefly mention the result. Let A_n be either A_S = n^{1/2}(θ̂ − θ)/σ or A_T = n^{1/2}(θ̂ − θ)/σ̂, where θ is some unknown scalar parameter, θ̂ is an asymptotically unbiased estimator of θ, σ^2 is the asymptotic variance of n^{1/2}θ̂ and σ̂^2 is some consistent estimator of σ^2. Denote by κ_{j,n} the j:th cumulant of A_n. Under the regularity conditions stated in Theorem 2, for j ≥ 1, we can expand κ_{j,n} as

κ_{j,n} = n^{-(j-2)/2}(k_{j,1} + n^{-1}k_{j,2} + n^{-2}k_{j,3} + ...)

for some k_{j,i}, where k_{1,1} = 0 and k_{2,1} = 1. It can be shown, through calculations that are completely analogous to the S_n case, where the cumulants of A_n are replaced by their expansions, that the first two polynomials in the Edgeworth expansion for A_n are

a_1(x) = −(k_{1,2} + (1/6)k_{3,1}H_2(x)) = −(k_{1,2} + (1/6)k_{3,1}(x^2 − 1))   (8)

and

a_2(x) = −((1/2)(k_{2,2} + k_{1,2}^2)H_1(x) + (1/24)(k_{4,1} + 4k_{1,2}k_{3,1})H_3(x) + (1/72)k_{3,1}^2H_5(x))
       = −x((1/2)(k_{2,2} + k_{1,2}^2) + (1/24)(k_{4,1} + 4k_{1,2}k_{3,1})(x^2 − 3) + (1/72)k_{3,1}^2(x^4 − 10x^2 + 15)).   (9)

As before the inversion is valid, giving the truncated expansion

P(A_n ≤ x) = Φ(x) + n^{-1/2}a_1(x)φ(x) + ... + n^{-j/2}a_j(x)φ(x) + o(n^{-j/2})

when E|X|^{j+2} < ∞ and X satisfies Cramér's condition.

Thus the problem of finding the Edgeworth expansion for a statistic A_n of the form n^{1/2}(θ̂ − θ)/σ or n^{1/2}(θ̂ − θ)/σ̂ amounts to finding the terms k_{j,i} in the expansion

κ_{j,n} = n^{-(j-2)/2}(k_{j,1} + n^{-1}k_{j,2} + n^{-2}k_{j,3} + ...)

of the cumulants of A_n. As we discussed in the previous section, if A_n = S_n then

κ_{j,n} = n^{-(j-2)/2} κ_j

for j ≥ 2, where κ_j is the j:th cumulant of Y = (X − μ)/σ. Thus k_{j,1} = κ_j and k_{j,i} = 0 for i ≥ 2. This reduces the expressions for a_1 and a_2 above to those for p_1 and p_2 in (6).

2.2.3 Edgeworth expansion for T_n

Finally, consider the statistic T_n = n^{1/2}(X̄ − μ)/σ̂, where σ̂^2 = (1/n) Σ (X_i − X̄)^2. It can be shown that the k_{j,i} in the expansion of the cumulants of T_n are

k_{1,2} = −(1/2)γ,
k_{2,2} = (1/4)(7γ^2 + 12),
k_{3,1} = −2γ   and
k_{4,1} = 12γ^2 − 2κ + 6.

Inserting these into (8) and (9) we get

q_1(x) = a_1(x) = −(−(1/2)γ − (1/3)γ(x^2 − 1)) = (1/6)γ(2x^2 + 1)

and

q_2(x) = a_2(x) = −x((1/2)((1/4)(7γ^2 + 12) + (1/4)γ^2) + (1/24)(12γ^2 − 2κ + 6 + 4(−(1/2)γ)(−2γ))(x^2 − 3) + (1/72)(−2γ)^2(x^4 − 10x^2 + 15))
       = x((1/12)κ(x^2 − 3) − (1/18)γ^2(x^4 + 2x^2 − 3) − (1/4)(x^2 + 3)).

We require that EX^4 < ∞ and that X satisfies Cramér's condition for the expansion

P(T_n ≤ x) = Φ(x) + n^{-1/2}q_1(x)φ(x) + n^{-1}q_2(x)φ(x) + o(n^{-1})

to hold.

2.2.4 Some remarks; skewness correction

We've seen above that for both S_n and T_n the first polynomial a_1 depends on the skewness κ_3 = γ = E(X − μ)^3/σ^3 and that the second polynomial a_2 depends on γ^2 and the kurtosis κ_4 = κ = E(X − μ)^4/σ^4 − 3. This is also true for many more general statistics A_n. In such cases, a_1 is said to describe the primary effect of skewness while a_2 is said to describe the primary effect of kurtosis and the secondary effect of skewness. A skewness corrected statistic is thus a statistic that has been modified in some way so that a_1 = 0.

2.3 Cornish-Fisher expansions for quantiles

An interesting use of the Edgeworth expansion is asymptotic expansions of the quantiles of A_n, obtained by what is essentially an inversion of the Edgeworth expansion. Such expansions are called Cornish-Fisher expansions and first appeared in [6] and [17].

Let A_n be a statistic with the Edgeworth expansion

P(A_n ≤ x) = Φ(x) + n^{-1/2}a_1(x)φ(x) + ... + n^{-j/2}a_j(x)φ(x) + ...   (10)

and let v_α be the α-quantile of A_n, so that P(A_n ≤ v_α) = α. Furthermore, let λ_α be the α-quantile of the N(0,1)-distribution, i.e. let Φ(λ_α) = α. Then there exists an expansion of v_α in terms of λ_α:

v_α = λ_α + n^{-1/2}s_1(λ_α) + n^{-1}s_2(λ_α) + ... + n^{-j/2}s_j(λ_α) + ...   (11)

(11) is called the Cornish-Fisher expansion of v_α. The functions s_k are polynomials of degree at most k + 1, odd for even k and even for odd k, that depend on cumulants of order at most k + 2. They are determined by the polynomials a_k in (10). [22] contains a short introduction to the Cornish-Fisher expansion, where it is shown that s_1(x) = −a_1(x) and s_2(x) = a_1(x)a_1'(x) − (1/2)x a_1(x)^2 − a_2(x).
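The following R lines (an illustration of ours, again under the assumption X ~ Exp(1) with γ = 2) compare the one-term Cornish-Fisher approximation of the 0.95-quantile of S_n with a simulated quantile:

set.seed(1)
n <- 20; B <- 1e5; gamma <- 2; alpha <- 0.95
Sn <- replicate(B, sqrt(n) * (mean(rexp(n)) - 1))
la <- qnorm(alpha)
s1 <- gamma * (la^2 - 1) / 6             # s_1(x) = -p_1(x)
c(simulated      = unname(quantile(Sn, alpha)),
  normal         = la,
  cornish_fisher = la + s1 / sqrt(n))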

3 Methods for skewness correction

We discuss the need for skewness correction. Some methods for obtaining second order correct confidence intervals are described. We assume throughout the section that X is absolutely continuous.

3.1 Coverages of confidence intervals

Definition 2. Let I_θ(α) be a confidence interval for an unknown parameter θ with approximate confidence level α. We call α the nominal coverage of I_θ(α). Furthermore we call α′ = P(θ ∈ I_θ(α)) the actual coverage of the confidence interval and define the coverage error as the difference between the actual coverage and the nominal coverage, α′ − α.

If I_θ(α) has been reasonably constructed then this difference will converge to zero as the sample size n increases. We will illustrate how the Edgeworth expansion can be used to estimate the order of coverage errors.

Let X_1, ..., X_n be an i.i.d. sample from a distribution, with mean μ and variance σ^2, such that the Edgeworth expansions for S_n and T_n exist. Consider the two-sided confidence interval J_μ(α) = (X̄ − n^{-1/2}σz_α, X̄ + n^{-1/2}σz_α), where P(|Z| ≤ z_α) = α when Z ~ N(0,1). We find that

P(μ ∈ J_μ(α)) = P(S_n > −z_α) − P(S_n > z_α)
             = P(S_n ≤ z_α) − P(S_n ≤ −z_α)
             = Φ(z_α) − Φ(−z_α) + n^{-1/2}(p_1(z_α)φ(z_α) − p_1(−z_α)φ(−z_α))
               + n^{-1}(p_2(z_α)φ(z_α) − p_2(−z_α)φ(−z_α))
               + n^{-3/2}(p_3(z_α)φ(z_α) − p_3(−z_α)φ(−z_α)) + o(n^{-3/2})
             = α + 2n^{-1}p_2(z_α)φ(z_α) + o(n^{-3/2}).

The last equality follows since p_2 is odd and p_1, p_3 and φ are even. The same result holds if we have σ̂ instead of σ, with q_2 instead of p_2. Thus the coverage error for two-sided normal approximation confidence intervals is of order n^{-1}. In some sense we can think of the two-sided confidence intervals as containing an implicit skewness correction.

The situation is not as good for one-sided confidence intervals. Consider the one-sided normal approximation confidence interval I_μ(α) = (−∞, X̄ + n^{-1/2}σλ_α), where Φ(λ_α) = α. The coverage of I_μ(α) is

P(μ ∈ I_μ(α)) = P(μ ≤ X̄ + n^{-1/2}σλ_α) = P(S_n ≥ −λ_α)
             = 1 − (Φ(−λ_α) + n^{-1/2}p_1(−λ_α)φ(−λ_α) + o(n^{-1/2}))
             = α − n^{-1/2}p_1(λ_α)φ(λ_α) + o(n^{-1/2}).

The coverage of the interval I′_μ(α) = (−∞, X̄ + n^{-1/2}σ̂λ_α) is analogously found to be

α − n^{-1/2}q_1(λ_α)φ(λ_α) + o(n^{-1/2}).

Thus, for one-sided confidence intervals, normal approximation gives a coverage error of order n^{-1/2}.

The polynomials p_1 and q_1 both contain the skewness γ. Thus we see that the skewness of X affects the actual coverage of the normal approximation confidence intervals. In particular, when the skewness is zero the n^{-1/2} term of the coverage error disappears, and when the skewness is large the coverage error might be large. When X is skew it is possible to obtain confidence intervals with better coverage by correcting for skewness. Some methods for this are presented next.

We will assume that the variance σ^2 and the skewness γ are known. It might seem like a bit of a contradiction that the second and third central moments are known, but not the mean. An example where such a situation could occur is when a measuring instrument that has been used sufficiently much, so that the variance and skewness of its measurements are known, is used to measure something that has not been measured before. One could of course argue that in that case the distribution, or at least the quantiles, of the measurement errors might be known as well and that a parametric confidence interval would make more sense. Let us however assume that the quantiles are unknown and that the density function is unknown or too complicated to work with for such procedures to be fruitful.

3.2 Edgeworth correction

In many cases we wish to derive a confidence interval using some statistic A_n. In cases where the Edgeworth expansion for A_n is known, we can obtain confidence intervals with a coverage error of smaller order than that of the normal approximation interval. In particular, we can make an explicit correction for skewness using the following theorem, various versions of which were proved in [27], [19], [29] and [1].

Theorem 3. Let A_n be either S_n = n^{1/2}(X̄ − μ)/σ or T_n = n^{1/2}(X̄ − μ)/σ̂ and assume that A_n admits the Edgeworth expansion

P(A_n ≤ x) = Φ(x) + n^{-1/2}a_1(x)φ(x) + o(n^{-1/2}).

Then

P(A_n ≤ x − n^{-1/2}a_1(x)) = Φ(x) + o(n^{-1/2})   (12)

and

P(A_n ≤ x − n^{-1/2}â_1(x)) = Φ(x) + o(n^{-1/2}),

where â_1 is the polynomial a_1 with population moments replaced by sample moments.

Under the assumptions of Theorem 3, if A_n = S_n and if the skewness of X is known, then I_Ecorr = (−∞, X̄ − n^{-1/2}σλ_{1−α} + n^{-1}σp_1(λ_{1−α})) has nominal coverage α and, by (12),

P(μ ∈ I_Ecorr) = P(X̄ − n^{-1/2}σλ_{1−α} + n^{-1}σp_1(λ_{1−α}) > μ)
             = P(S_n > λ_{1−α} − n^{-1/2}p_1(λ_{1−α})) = 1 − Φ(λ_{1−α}) + o(n^{-1/2}) = α + o(n^{-1/2}).

Thus the coverage error of the confidence interval is of order n^{-1}. We note that, in the notation of Section 2.3, λ_{1−α} − n^{-1/2}a_1(λ_{1−α}) = λ_{1−α} + n^{-1/2}s_1(λ_{1−α}) = v_{1−α} + o(n^{-1/2}). Heuristically, we can therefore consider the idea behind the explicit correction to be to replace the quantile of the normal distribution with the truncated Cornish-Fisher expansion of the corresponding quantile of A_n.

The interval can be corrected further, by analogously correcting for kurtosis using the n^{-1} term in the Edgeworth expansion for the skewness corrected statistic A_n + n^{-1/2}a_1(A_n), to obtain an even smaller coverage error. This iteration will however sometimes result in an over-corrected interval, particularly when n is small; see for instance [19] or [1]. Moreover, the expression is a lot harder to derive analytically for such corrections.
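In R, the upper limit of I_Ecorr is a one-liner; the following sketch (the function name is ours) assumes σ and γ known, as in the discussion above:

ecorr_upper <- function(x, sigma, gamma, alpha = 0.95) {
  n <- length(x)
  la1 <- qnorm(1 - alpha)            # lambda_{1-alpha} = -lambda_alpha
  p1 <- -gamma * (la1^2 - 1) / 6     # p_1(lambda_{1-alpha})
  mean(x) - sigma * la1 / sqrt(n) + sigma * p1 / n
}
# I_Ecorr = (-Inf, ecorr_upper(x, sigma, gamma, alpha))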

3.3 The bootstrap

The bootstrap is a popular tool for estimating the distributions of statistics. It was first introduced by Efron in 1979 in [15]. Some theoretical results, based on Edgeworth expansions and similar techniques, regarding the asymptotic performance of the bootstrap for the statistic S_n = n^{1/2}(X̄ − μ)/σ were provided in 1981 by Singh in [28] and Bickel and Freedman in [3]. Since then asymptotics and performance in more general situations have been studied. One of the most important situations is the construction of confidence intervals using the bootstrap; we will investigate the properties of bootstrap confidence intervals using the Edgeworth expansion.

3.3.1 The bootstrap and S_n

We briefly mention the results of [28] and [3] to motivate why the bootstrap might be used for skewness correction. We omit the details and only try to give a general idea of the result. The statistic

S_n = n^{1/2}(X̄ − μ)/σ

has the bootstrap analogue

S_n^* = n^{1/2}(X̄^* − x̄)/σ̂,

where X^* ~ F_n, i.e. the distribution function of X^* is the empirical distribution function, EX^* = x̄, Var(X^*) = σ̂^2 = (1/n) Σ_{i=1}^n (x_i − x̄)^2 and Skew(X^*) = γ̂ = (1/n) Σ_{i=1}^n (x_i − x̄)^3 / σ̂^3. Part D of Theorem 1 in [28] says that

n^{1/2} ||P(S_n ≤ x) − P^*(S_n^* ≤ x)||_∞ → 0   a.s.   (13)

The main idea behind this is the following. It is shown that the conditional Edgeworth expansion for S_n^* can be written as

P^*(S_n^* ≤ x) = Φ(x) − n^{-1/2}(1/6)γ̂(x^2 − 1)φ(x) + R_n(x)

uniformly in x, where n^{1/2}R_n(x) → 0 a.s. By looking at the Edgeworth expansions for S_n and S_n^* and considering the difference between γ and γ̂, (13) follows. In particular, this means that there is no n^{-1/2} error term (or term of lower order) present when the distribution of S_n is approximated with the bootstrap distribution.

It thus seems plausible that a confidence interval based on the quantiles of S_n^* will be second order correct. It has been shown that this indeed is the case.

3.3.2 Bootstrap confidence intervals

There are numerous bootstrap methods for constructing confidence intervals. A few papers in the 1980s have been important for the understanding of the theoretical properties of these methods. Abramovitch and Singh ([1]) and Hall ([20]) discussed the coverage of bootstrap confidence intervals and in [21] Hall developed a unified framework for the theory for different types of bootstrap confidence intervals. This allowed a comparison of the different methods. The discussion in the literature has mainly focused on the case where σ is unknown, but as before we consider the case where σ is known. We use a method that we will call percentile-s, in which we try to estimate the quantiles of S_n by those of S_n^*.

The idea is the following. Let v_{1−α} be such that P(S_n ≤ v_{1−α}) = 1 − α. A one-sided confidence interval with confidence level α is I′_μ(α) = (−∞, X̄ − v_{1−α}σn^{-1/2}). Since the distribution of S_n is unknown we would like to estimate v_{1−α} somehow. Looking at (13) it seems reasonable to use S_n^* for the estimation. The distribution function P^*(S_n^* ≤ x) is however also unknown.

It can be estimated by using B bootstrap replications of S_n^*: S_{n,1}^*, ..., S_{n,B}^*, where S_{n,i}^* = n^{1/2}(X̄_i^* − x̄)/σ̂. Looking at the bootstrap sample order statistics S_{n,(1)}^* ≤ ... ≤ S_{n,(B)}^* we can estimate v_{1−α} by v̂_{1−α} = S_{n,((1+B)(1−α))}^*.

The coverage of the percentile-s confidence interval I_μ^{(2)}(α) = (−∞, X̄ − v̂_{1−α}σn^{-1/2}) is

P(μ ∈ I_μ^{(2)}(α)) = α + o(n^{-1/2}).

This is a consequence of (13) (see also [21]). We have thus obtained a skewness corrected confidence interval.
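A sketch of the percentile-s upper limit in R (ours; σ is assumed known, and B is chosen so that (1 + B)(1 − α) is an integer, here B = 1999 for α = 0.95):

percentile_s_upper <- function(x, sigma, alpha = 0.95, B = 1999) {
  n <- length(x)
  sig_hat <- sqrt(mean((x - mean(x))^2))   # sigma-hat of the empirical distribution
  s_star <- replicate(B,
    sqrt(n) * (mean(sample(x, n, replace = TRUE)) - mean(x)) / sig_hat)
  v_hat <- sort(s_star)[(1 + B) * (1 - alpha)]  # order statistic S*_((1+B)(1-alpha))
  mean(x) - v_hat * sigma / sqrt(n)
}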

3.4 A new simulation method

Both Edgeworth correction and the bootstrap are methods for getting skewness corrected confidence intervals by approximating the quantile v_{1−α} of S_n. As we've seen, this is done by approximating v_{1−α} via a Cornish-Fisher expansion in the Edgeworth correction case and by estimating the quantiles using bootstrap replicates in the different bootstrap methods. Another idea for obtaining skewness corrected confidence intervals is to somehow transform the problem into one relating to a statistic with zero skewness. A method for doing this using transformations is discussed in [23]. Next we describe a simulation method that corrects for skewness by adding a simulated random variable. Inference can then be made about the obtained skewness corrected distribution.

3.4.1 Skewness correction through addition of a random variable

Lemma 1. Let X be a random variable with EX = μ, Var(X) = σ^2 and E(X − EX)^3 = μ_3, so that Skew(X) = γ = μ_3/σ^3. Now let Y be a random variable, independent of X, such that EY = 0, Var(Y) = aσ^2 and E(Y − EY)^3 = −μ_3. Then E(X + Y) = μ, Var(X + Y) = (1 + a)σ^2 and Skew(X + Y) = 0.

Proof. The results for the mean and variance of X + Y are well known. The result for the skewness follows from Corollary 4 in Appendix A.

Let X_1, ..., X_n be n observations from the distribution of X and Y_1, ..., Y_n be n observations from the distribution of Y. Define Z_i = X_i + Y_i, so that Z_1, ..., Z_n become n observations from the distribution of Z = X + Y.

It was shown in Section 2.2 that p_1(x) = −(1/6)γ(x^2 − 1) in the Edgeworth expansion for S_n = n^{1/2}(X̄ − μ)/σ. We see that, since γ_Z = Skew(X + Y) = 0, the first term in the Edgeworth expansion for S_n′ = n^{1/2}(Z̄ − μ)/(√(1+a) σ) will be zero.

Theorem 4. Let Z_i = X_i + Y_i where {X_i} and {Y_i} are as in Lemma 1. Then the confidence interval I_new = (−∞, Z̄ + n^{-1/2}√(1+a) σλ_α) is skewness corrected and has coverage α − n^{-1}λ_α((1/24)κ_Z(λ_α^2 − 3))φ(λ_α) + o(n^{-1}).

Proof. The Edgeworth expansion for S_n′ = n^{1/2}(Z̄ − μ)/(√(1+a) σ) is P(S_n′ ≤ x) = Φ(x) + n^{-1}p_2(x)φ(x) + o(n^{-1}), since p_1 = 0. Hence, as p_2 is odd,

P(μ ∈ I_new) = P(S_n′ ≥ −λ_α) = α + n^{-1}p_2(λ_α)φ(λ_α) + o(n^{-1})
            = α − n^{-1}λ_α((1/24)κ_Z(λ_α^2 − 3) + (1/72)γ_Z^2(λ_α^4 − 10λ_α^2 + 15))φ(λ_α) + o(n^{-1})   (14)
            = α − n^{-1}λ_α((1/24)κ_Z(λ_α^2 − 3))φ(λ_α) + o(n^{-1}),

since γ_Z = 0. Thus the coverage error of I_new is of order n^{-1}.

A simulation procedure that gives us the observations z_i will be described next.

21 3.4.2 Simulation procedure

Let X be as above and suppose that n observations x_1, ..., x_n from the distribution of X are given and that σ^2 and μ_3 (and thus γ) are known. Then, if we can find a random variable Y with the same properties as Y above, such that we can simulate pseudo-random numbers from the distribution of Y, we can obtain the confidence interval I_new as follows.

Simulate n numbers Y_1, ..., Y_n from the distribution of Y and let Z_i = x_i + Y_i. Then I_new = (−∞, Z̄ + n^{-1/2}√(1+a) σλ_α) has a coverage error of order n^{-1}.

A natural question at this point is whether or not there exists a distribution or a family of distributions that can be used for this kind of simulation in general circumstances. The answer is that yes, such distributions exist. A possible choice for the distribution of the simulation variable Y is (a shifted version of) the inverse Gaussian distribution. Its usefulness follows from the fact that its variance and skewness can easily be controlled by its two parameters, as we will show.

Definition 3. Y is inverse Gaussian distributed with parameters λ > 0 and μ > 0 if

F_Y(y) = P(Y ≤ y) = Φ(√(λ/y)(y/μ − 1)) + e^{2λ/μ} Φ(−√(λ/y)(y/μ + 1)),   y > 0,

with density function

f_Y(y) = √(λ/(2πy^3)) exp(−λ(y − μ)^2/(2yμ^2)),   y > 0.

Note that this is not the inverse of the normal distribution; the name is not to be taken literally.

Lemma 2. Let Y be inverse Gaussian distributed with parameters λ > 0 and μ > 0. Then

EY = μ,
Var(Y) = μ^3/λ,
Skew(Y) = 3(μ/λ)^{1/2}   and
Kurt(Y) = 15μ/λ.

See Chapter 2 of [5] for a proof and further discussion of the distribution. Using Lemma 2 we can now decide which values of λ and μ to choose for our skewness correcting simulation variable.

Lemma 3. Let μ_3 > 0 and σ > 0. Then the inverse Gaussian distribution with parameters λ = 27σ^{10}μ_3^{-3} and μ = 3σ^4 μ_3^{-1} has variance σ^2, third central moment μ_3 and skewness μ_3/σ^3.

Proof. By Lemma 2 the inverse Gaussian distribution has variance μ^3/λ and skewness 3√(μ/λ). We want the variance to equal σ^2 and the skewness to equal μ_3/σ^3. The first equality gives us that λ = μ^3 σ^{-2}, and putting this into the second equality yields μ_3/σ^3 = 3σ/μ, from which it follows that μ = 3σ^4 μ_3^{-1} and therefore that λ = 27σ^{10} μ_3^{-3}.

Now assume that Y′ is inverse Gaussian distributed with parameters λ and μ. Then EY′ = μ. Since we wanted to use a random variable with mean 0 for our simulation we choose Y = Y′ − μ as our simulation variable. Then EY = 0, Var(Y) = σ^2 and Skew(Y) = μ_3/σ^3, where σ and μ_3 can be chosen arbitrarily. Note that the skewness of Y is always positive, so if the skewness of X is positive we choose −Y as our simulation variable instead. Next we state a lemma that will prove useful later in our discussion.

Lemma 4. Let Y_1, ..., Y_n be i.i.d. inverse Gaussian distributed random variables with parameters λ > 0 and μ > 0. Then Ȳ is inverse Gaussian with parameters nλ and μ.

See Section 2.4 of [5] for a proof. Finally, we mention that the inverse Gaussian distribution is implemented in R in the SuppDists library.
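Putting Section 3.4.2 and Lemma 3 together, a minimal R implementation of the method might look as follows. This is a sketch under the assumptions that σ^2 and μ_3 are known and that a = 1, i.e. Var(Y) = σ^2, the variance produced by the parameter choice in Lemma 3. The inverse Gaussian variables are generated with the standard transformation method of Michael, Schucany and Haas (1976), written out explicitly so that the sketch does not depend on the parameterization used by any particular library:

# Inverse Gaussian sampler with mean mu and shape lambda
# (Michael-Schucany-Haas transformation method).
rinvgauss_basic <- function(n, mu, lambda) {
  y <- rnorm(n)^2
  x <- mu + mu^2 * y / (2 * lambda) -
    mu / (2 * lambda) * sqrt(4 * mu * lambda * y + mu^2 * y^2)
  u <- runif(n)
  ifelse(u <= mu / (mu + x), x, mu^2 / x)
}

# Upper limit of I_new given observations x and known sigma^2 and mu_3 (a = 1).
new_method_upper <- function(x, sigma2, mu3, alpha = 0.95) {
  n <- length(x)
  sigma <- sqrt(sigma2)
  m3 <- abs(mu3)
  lambdaY <- 27 * sigma^10 / m3^3              # Lemma 3
  muY <- 3 * sigma^4 / m3                      # Lemma 3
  Y <- rinvgauss_basic(n, muY, lambdaY) - muY  # EY = 0, Var(Y) = sigma^2
  if (mu3 > 0) Y <- -Y                         # third central moments must cancel
  z <- x + Y                                   # Z_i = x_i + Y_i, Skew(Z) = 0
  mean(z) + qnorm(alpha) * sqrt(1 + 1) * sigma / sqrt(n)  # sqrt(1 + a) with a = 1
}

Since the Y_i are simulated, the upper limit is itself randomized; the coverage statement of Theorem 4 is with respect to the joint distribution of the sample and the simulated variables.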

4 Comparison

Having looked at different confidence intervals we now want to compare them. We only compare the one-sided intervals I_normal, I_new and I_Ecorr.

4.1 Coverages of confidence intervals

Assume that the variance σ^2 and the skewness γ ≠ 0 of the i.i.d. random variables X_1, ..., X_n are known and that X satisfies Cramér's condition and is such that E|X|^3 < ∞. We consider the following three confidence intervals for the mean μ:

the normal approximation confidence interval I_normal(α) = (−∞, X̄ + n^{-1/2}σλ_α), where Φ(λ_α) = α, with

P(μ ∈ I_normal(α)) = α − n^{-1/2}p_1(λ_α)φ(λ_α) + o(n^{-1/2}),

the Edgeworth corrected confidence interval I_Ecorr = (−∞, X̄ − n^{-1/2}σλ_{1−α} + n^{-1}σp_1(λ_{1−α})), with

P(μ ∈ I_Ecorr(α)) = α + o(n^{-1/2}),

and the skewness corrected simulation confidence interval I_new = (−∞, Z̄ + n^{-1/2}σ_Z λ_α), where Skew(Z) = 0 and σ_Z = √(1+a) σ, with

P(μ ∈ I_new(α)) = α + o(n^{-1/2}).

4.2 Criteria for the comparison

It seems that I_normal has the worst coverage. However, coverage is not the only measure of the performance of a confidence interval. Another important property is the length of the interval. In general, if two confidence intervals have the same coverage we prefer the one that is shorter. Alternatively, if two confidence intervals are of the same length we prefer the one with the coverage closest to α. Things get a bit trickier if we have one short interval with bad coverage and one wide interval with good coverage; we must somehow choose between good length and good coverage. And then there is of course the computational cost of the computer intensive methods versus the algebraic work that the analytical methods require. Regardless of how we weigh these properties, it is of interest to compare the lengths of the confidence intervals above. Since the intervals are one-sided their length, in the usual sense of the word, is in fact infinite. We therefore arrive at the following definition.

Definition 4. Given two intervals, I_1 = (−∞, θ̂_1) and I_2 = (−∞, θ̂_2), both with coverage α + o(n^{-j/2}) for some j, we say that I_2 is better than I_1 if P(θ̂_2 ≤ θ̂_1) ≥ 1/2.

4.3 Comparisons of the upper limits

We will compare the upper limit obtained by our skewness correcting simulation method with those obtained by the bootstrap and explicit skewness correction. Throughout we let γ_X denote the skewness of X.

Consider the Edgeworth corrected confidence interval I_Ecorr = (−∞, X̄ − n^{-1/2}σλ_{1−α} + n^{-1}σp_1(λ_{1−α})) = (−∞, θ̂_Ecorr) and our simulation skewness corrected interval I_new = (−∞, Z̄ + n^{-1/2}√(1+a) σλ_α) = (−∞, θ̂_new). Both intervals have coverage α + o(n^{-1/2}). Recall that λ_{1−α} = −λ_α and that p_1(λ_{1−α}) = −(1/6)γ_X(λ_α^2 − 1). Thus

P(θ̂_new ≤ θ̂_Ecorr) = P(θ̂_new − θ̂_Ecorr ≤ 0)
 = P(Z̄ + n^{-1/2}√(1+a) σλ_α − X̄ + n^{-1/2}σλ_{1−α} − n^{-1}σp_1(λ_{1−α}) ≤ 0)
 = P(Z̄ − X̄ + n^{-1/2}σλ_α(√(1+a) − 1) + n^{-1}(1/6)σ(λ_α^2 − 1)γ_X ≤ 0)
 = P(Ȳ ≤ −n^{-1/2}σλ_α(√(1+a) − 1) − n^{-1}(1/6)σ(λ_α^2 − 1)γ_X),

since Z̄ − X̄ = Ȳ. Since the distribution of Y is known this probability is fully known. It is clearly dependent on the distribution of Y, but some words can be said about its general behaviour.

Theorem 5. If −n^{-1/2}σλ_α(√(1+a) − 1) − n^{-1}(1/6)σ(λ_α^2 − 1)γ_X is larger than the median of Ȳ, then I_new is better than I_Ecorr.

Proof. By the definition of the median, if −n^{-1/2}σλ_α(√(1+a) − 1) − n^{-1}(1/6)σ(λ_α^2 − 1)γ_X is larger than the median, the probability above will be at least 1/2.

For a large class of unimodal continuous densities, including the Pearson system and the inverse Gaussian distribution, it holds that

mean > median when γ > 0,
mean < median when γ < 0

(see [25]). Assume that Ȳ has such a density and note that the sign of the skewness of X determines the sign of the skewness of Ȳ. Then I_new is never better than I_Ecorr when γ_X > 0. It is sometimes better when γ_X < 0; in particular the method gets better as |γ_X| increases.

To see this, let γ_X > 0. Then γ_Ȳ < 0 and thus the mean of Ȳ is smaller than the median. Since EȲ = 0 the median must thus be positive. But in that case −n^{-1/2}σλ_α(√(1+a) − 1) − n^{-1}(1/6)σ(λ_α^2 − 1)γ_X < 0 for reasonable values of α, and it therefore seems that the new method is not to be recommended in such cases.

On the other hand, if γ_X < 0 then γ_Ȳ > 0 and the mean of Ȳ is larger than the median, so that the median must be negative. Moreover, the second part of the right hand expression, −n^{-1}(1/6)σ(λ_α^2 − 1)γ_X, is positive, and if |γ_X| is big enough the right hand side is therefore larger than the median of Ȳ. We see that in this case it is indeed possible that the new method gives intervals that are shorter in general than those obtained using Edgeworth correction.

For the special case considered in Section 3.4.2 we have the following corollary.

Corollary 2. Assume that Y is generated using an inverse Gaussian distributed random variable Y′, as described in Section 3.4.2, and let λ_Y and μ_Y be the parameters in the distribution of Y′. If γ_X < 0 then the probability that I_new is better than I_Ecorr is

Φ(√(nλ_Y/x_1)(x_1/μ_Y − 1)) + e^{2nλ_Y/μ_Y} Φ(−√(nλ_Y/x_1)(x_1/μ_Y + 1)),   (15)

where

x_1 = μ_Y − n^{-1/2}σλ_α(√(1+a) − 1) − n^{-1}(1/6)σ(λ_α^2 − 1)γ_X,

and if γ_X > 0 the probability is

1 − (Φ(√(nλ_Y/x_2)(x_2/μ_Y − 1)) + e^{2nλ_Y/μ_Y} Φ(−√(nλ_Y/x_2)(x_2/μ_Y + 1))),   (16)

where

x_2 = μ_Y + n^{-1/2}σλ_α(√(1+a) − 1) + n^{-1}(1/6)σ(λ_α^2 − 1)γ_X.

Proof. First, assume that γ_X < 0. Then it follows from Lemma 4 that Ȳ + μ_Y is inverse Gaussian with parameters nλ_Y and μ_Y, and we can rewrite the probability as

P(Ȳ + μ_Y ≤ μ_Y − n^{-1/2}σλ_α(√(1+a) − 1) − n^{-1}(1/6)σ(λ_α^2 − 1)γ_X) = P(Ȳ + μ_Y ≤ x_1)
 = Φ(√(nλ_Y/x_1)(x_1/μ_Y − 1)) + e^{2nλ_Y/μ_Y} Φ(−√(nλ_Y/x_1)(x_1/μ_Y + 1)).

Second, assume that γ_X > 0 and that Y is generated using an inverse Gaussian distributed random variable Y′. Then Y is defined as Y = −(Y′ − μ_Y) and −Ȳ + μ_Y is inverse Gaussian with parameters nλ_Y and μ_Y. Thus

P(Ȳ ≤ −n^{-1/2}σλ_α(√(1+a) − 1) − n^{-1}(1/6)σ(λ_α^2 − 1)γ_X)
 = P(−Ȳ + μ_Y > μ_Y + n^{-1/2}σλ_α(√(1+a) − 1) + n^{-1}(1/6)σ(λ_α^2 − 1)γ_X)
 = 1 − P(−Ȳ + μ_Y ≤ x_2)
 = 1 − (Φ(√(nλ_Y/x_2)(x_2/μ_Y − 1)) + e^{2nλ_Y/μ_Y} Φ(−√(nλ_Y/x_2)(x_2/μ_Y + 1))).

Somewhat surprisingly, (15) and (16) do not depend on σ. Although not easily seen algebraically, this is an effect of the parametrization of the inverse Gaussian distribution and of how λ_Y and μ_Y were defined in terms of σ and γ_X. Appendix B contains tables with explicit values of (15) and (16) for some combinations of α, n and γ_X, as well as some figures describing how (15) and (16) depend on α. From the discussion above, as well as from Tables 1-2 and Figures 1-3 in the appendix, we deduce that the sign and magnitude of γ_X are of great importance for the theoretical performance of our simulation method. In particular, the new method seems to perform better when γ_X is large and negative and n is small. This effect is even clearer when α is close to 1.
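The probabilities (15) and (16) can also be checked without the closed form by drawing Ȳ directly via Lemma 4. The following R sketch (ours; it assumes a = 1 as in Lemma 3, which appears to be the choice behind Tables 1 and 2, and repeats the inverse Gaussian sampler from the sketch in Section 3.4 to be self-contained):

rinvgauss_basic <- function(n, mu, lambda) {  # as in the sketch in Section 3.4
  y <- rnorm(n)^2
  x <- mu + mu^2 * y / (2 * lambda) -
    mu / (2 * lambda) * sqrt(4 * mu * lambda * y + mu^2 * y^2)
  u <- runif(n)
  ifelse(u <= mu / (mu + x), x, mu^2 / x)
}
prob_new_better <- function(alpha, n, gammaX, a = 1, sigma = 1, B = 1e6) {
  la <- qnorm(alpha)
  m3 <- abs(gammaX) * sigma^3
  lambdaY <- 27 * sigma^10 / m3^3
  muY <- 3 * sigma^4 / m3
  Ybar <- rinvgauss_basic(B, muY, n * lambdaY) - muY  # Lemma 4: Ybar' ~ IG(n*lambda, mu)
  if (gammaX > 0) Ybar <- -Ybar                       # Y = -(Y' - mu_Y) in this case
  thr <- -sigma * la * (sqrt(1 + a) - 1) / sqrt(n) -
          sigma * (la^2 - 1) * gammaX / (6 * n)
  mean(Ybar <= thr)
}
# prob_new_better(0.95, 30, -10)  # approximately 0.54, cf. Table 1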

4.4 Simulation results

Some simulation results regarding the performance of the new simulation method compared with Edgeworth correction and normal approximation are given in Appendix C. The performance of the new method is quite good when the nominal coverage is 0.95 and γ_X is −10.4 or −20.1. When γ_X = −40.3 the usual normal approximation interval seems preferable. This might seem surprising if we bear in mind the part that skewness plays in the Edgeworth expansion, but for distributions with a large skew, the skewness (and indeed the variance) will largely be due to infrequent big deviations from the mean. In sample sizes as small as those considered in the simulations there may not be any such observations, in which case the sample skewness will be close to zero. When the nominal coverage is 0.99 the Edgeworth correction method seems to be superior. The performance of the new method is however always better than that of the normal approximation. The choice of method is not independent of the distribution of the X_i. It would thus be of interest to perform further simulations with different distributions.

4.5 Discussion

The theoretical investigation, as well as the simulation study, gives slightly encouraging results. There do seem to be cases, albeit admittedly somewhat unusual, where the new method performs better than Edgeworth correction and normal approximation. It would be of interest to compare the new method with the percentile-s bootstrap as well. It can be shown that θ̂_boots = θ̂_Ecorr + O_P(n^{-3/2}), and thus we expect the result of such a comparison to be quite similar to the comparison with Edgeworth correction.

The assumption that both the second and the third central moments are known is perhaps somewhat unrealistic. It would therefore be desirable to extend the method to the situation where these are estimated from the x_i-sample. A problem that must be handled is that this introduces a dependence between X and Y.

It is possible that the correction might be extended to higher moments as well, using addition formulae like those for skewness. One idea is to use a SIMEX-like method to correct for kurtosis, since the fact that the kurtosis is always greater than −2 makes it hard to get rid of it in the same way that the new method gets rid of skewness. This could then also be used for two-sided confidence intervals.

It might also be desirable to obtain skewness corrected intervals that are shorter than those given by the new method. Letting a → 0 does not seem to be a good way of doing this, so we might consider adding a second pseudo-random variable that is used in SIMEX fashion to reduce the variance. We might also consider using different conditions on Y; for instance it is perhaps possible not to require that EY = 0. Another idea is to try to construct Y using a U-statistic where the differences X_i − X_j are considered. Once more, this would introduce a dependence between X and Y.

Different kinds of comparisons between the methods can be made. We can compare the mean length of the intervals or let the length be fixed and study the coverages given by the different methods under that condition. Finally, it is possible that the method will have a better performance for more general statistics A_n. The notion of skewness might have a somewhat different meaning for these, and the Edgeworth expansions have a different behaviour.

A Appendix: Skewness and kurtosis

Several measures of symmetry (or asymmetry) have been suggested throughout history. The most popular one is the standardized third central moment, commonly known as the skewness.

Definition 5. The skewness of a random variable X, henceforth denoted Skew(X), is its standardized third moment:

Skew(X) = γ = μ_3/σ^3 = E(X − EX)^3 / (E(X − EX)^2)^{3/2}

(see for instance Section 15.8 of [8] or Section 4.6 of [18]).

Let us note that the third central moment of aX + b is E(aX + b − aEX − b)^3 = a^3 E(X − EX)^3. We see that the skewness is translation invariant and that Skew(aX) = sign(a)·Skew(X). Furthermore, if X is symmetric then all odd central moments of X are zero and thus Skew(X) = 0. The converse does not necessarily hold; that is, skewness equal to zero does not imply symmetry.

The "peakedness" of a distribution is measured using the kurtosis of the distribution. It describes to what extent the variance of the distribution depends on uncommonly large deviations from the mean; if the kurtosis is high then these are the reason for much of the variance, and if it is low most of the variance comes from frequent moderately sized deviations.

Definition 6. The kurtosis (or excess kurtosis) of a random variable X, henceforth denoted Kurt(X), is

Kurt(X) = κ = μ_4/σ^4 − 3 = E(X − EX)^4 / (E(X − EX)^2)^2 − 3

(see for instance Section 15.8 of [8] or Section 4.6 of [18]).

We see that Kurt(aX + b) = Kurt(X). Kurt(X) = 0 when X is normal. Moreover, since μ_4/σ^4 ≥ 1, we have −2 ≤ Kurt(X) ≤ ∞.

A.1 The skewness of a sum of random variables

Although formulae for the expectation and variance of a sum of random variables are well known, the corresponding formulae for the third central moment and the skewness of a sum of random variables are rarely seen in the literature. The proofs amount to standard calculations of expectations, but they are nevertheless given here for reference. In the following calculations we will use μ_3^Z to denote the third central moment of a random variable Z whenever we need to distinguish between moments of different random variables.

Lemma 5. Let μ_3^Z = E(Z − EZ)^3 be the third central moment of the random variable Z and let X and Y be random variables. Then

μ_3^{X+Y} = μ_3^X + μ_3^Y + 3Cov(X^2, Y) + 3Cov(X, Y^2) − 6(EX + EY)Cov(X, Y).   (17)

Proof. We expand the expression for the third central moment of X + Y as follows:

E(X + Y − E(X + Y))^3 = E((X − EX) + (Y − EY))^3
 = E(X − EX)^3 + E(Y − EY)^3 + 3E(X − EX)^2(Y − EY) + 3E(X − EX)(Y − EY)^2.   (18)

Next we consider the third part of the last expression above:

E(X − EX)^2(Y − EY) = E(X^2 − 2XEX + (EX)^2)(Y − EY)
 = E(X^2Y − 2XY EX + Y(EX)^2 − X^2EY + 2XEXEY − (EX)^2EY)
 = EX^2Y − 2EXY EX + EY(EX)^2 − EX^2EY + 2(EX)^2EY − (EX)^2EY
 = EX^2Y − EX^2EY + 2(EX)^2EY − 2EXY EX
 = EX^2Y − EX^2EY + 2EX(EXEY − EXY)
 = Cov(X^2, Y) − 2EX·Cov(X, Y).

In exactly the same way we conclude that the fourth part of (18) can be written as

E(X − EX)(Y − EY)^2 = Cov(X, Y^2) − 2EY·Cov(X, Y),

and thus, since E(X − EX)^3 = μ_3^X, we have that

(18) = μ_3^{X+Y} = μ_3^X + μ_3^Y + 3Cov(X^2, Y) + 3Cov(X, Y^2) − 6(EX + EY)Cov(X, Y).

Using that Var(X + Y) = Var(X) + Var(Y) + 2Cov(X, Y), the addition formula for skewness follows.

Corollary 3. Let X and Y be random variables. Then

Skew(X + Y) = (μ_3^X + μ_3^Y + 3Cov(X^2, Y) + 3Cov(X, Y^2) − 6(EX + EY)Cov(X, Y)) / (Var(X) + Var(Y) + 2Cov(X, Y))^{3/2}.   (19)

Two important special cases of Corollary 3 immediately follow.

Corollary 4. Let X and Y be independent random variables. Then

Skew(X + Y) = (μ_3^X + μ_3^Y) / (σ_X^2 + σ_Y^2)^{3/2}.   (20)

The extension to the sum of n independent variables is straightforward. It is of interest to note that Corollary 4 holds regardless of the values of the expected values of X and Y; in particular, it holds when EX and/or EY are unknown.
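Formula (20) is easy to verify numerically. A small R check (ours): with X ~ Exp(1) (μ_3 = 2, σ^2 = 1) and Y = −Exp(1) (μ_3 = −2, σ^2 = 1), the sum should have skewness (2 − 2)/(1 + 1)^{3/2} = 0:

set.seed(1)
x <- rexp(1e6); y <- -rexp(1e6)
sk <- function(v) mean((v - mean(v))^3) / mean((v - mean(v))^2)^(3/2)
c(skew_x = sk(x), skew_y = sk(y), skew_sum = sk(x + y))  # ~ 2, -2, 0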

Corollary 5. Let X_1, X_2, ..., X_n be i.i.d. random variables. Then

Skew(X_1 + X_2 + ... + X_n) = (1/√n)·μ_3^X/σ_X^3 = (1/√n)·Skew(X).   (21)

Corollary 5 is proved by using Corollary 4 n − 1 times and noticing that Skew(X_1 + X_2 + ... + X_n) = (n·μ_3^X)/(n^{3/2}·σ_X^3) = (n/n^{3/2})·Skew(X) = (1/√n)·Skew(X). The i.i.d. condition can be somewhat weakened. It is clear from Corollary 3 that it is sufficient that the variables have equal variances and third central moments and that Cov(X_i^2, X_j) = Cov(X_i, X_j^2) = Cov(X_i, X_j) = 0 when i ≠ j.

A.2 The kurtosis of a sum of random variables

For higher moments the addition formulae become more complicated, which is one reason for introducing cumulants in the first place; they still have nice addition formulae. We are not as concerned with kurtosis as we are with skewness and therefore we mention only the following lemma, which follows from the remark at the end of Section 2.2.2.

Lemma 6. Let X and Y be independent random variables, with finite fourth central moments, such that Var(X) = Var(Y). Then

Kurt(X + Y) = (1/4)(Kurt(X) + Kurt(Y)).   (22)
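A corresponding numeric check of (22) in R (ours): for X, Y i.i.d. Exp(1), with Kurt(X) = Kurt(Y) = 6, the right hand side is 3, which agrees with X + Y ~ Gamma(2, 1) having excess kurtosis 6/2 = 3:

set.seed(1)
x <- rexp(1e6); y <- rexp(1e6)
ku <- function(v) mean((v - mean(v))^4) / mean((v - mean(v))^2)^2 - 3
c(lhs = ku(x + y), rhs = (ku(x) + ku(y)) / 4)  # both ~ 3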

B Appendix: P(θ̂_new ≤ θ̂_Ecorr)

This appendix contains some tables and figures relating to the comparison of our simulation method and the explicit correction method. Tables 1 and 2 give explicit values of (15) and (16) (see Section 4.3) for some combinations of α, n and γ_X. Figures 1-3 show these probabilities as functions of α for eight values of γ_X and three values of n. Finally, Tables 3 and 4 contain the coverage error terms of order n^{-1} for the confidence intervals obtained by the two methods.

Table 1: P(θ̂_new ≤ θ̂_Ecorr) when α = 0.95

γ_X      n = 15       n = 30       n = 75       n = 150      n = 500
0.1      0.2461585    0.2466465    0.2470814    0.2473014    0.2475421
0.5      0.2397286    0.2420325    0.2441252    0.2451973    0.2463814
1        0.2191781    0.2271931    0.2405524    0.2426289    0.2449492
2        0.1940548    0.2083596    0.2220317    0.2292854    0.2421463
4        0.1531201    0.1756957    0.1990799    0.2121904    0.2275920
6        0.1222838    0.1488952    0.1787612    0.1964792    0.2181277
10       0.08131108   0.10902326   0.14503125   0.16887748   0.20045674
15       0.05257063   0.07680105   0.11329497   0.14062359   0.18059000
20       0.03635416   0.05635855   0.09006719   0.11805326   0.16298357
50       0.00864711   0.01547122   0.03112579   0.04931212   0.09264032
−0.1     0.2495379    0.2490361    0.2485927    0.2483700    0.2481274
−0.5     0.2566349    0.2539837    0.2516825    0.2505409    0.2493081
−1       0.2802507    0.2703587    0.2556722    0.2533178    0.2508028
−2       0.3165029    0.2948042    0.2766470    0.2678903    0.2538560
−4       0.3996042    0.3493397    0.3085325    0.2894824    0.2698846
−6       0.4922321    0.4104844    0.3434120    0.3126079    0.2816008
−10      0.6701141    0.5427119    0.4208967    0.3632395    0.3064168
−15      0.8195909    0.6931626    0.5256882    0.4333398    0.3399659
−20      0.8963340    0.8001726    0.6269255    0.5075826    0.3761091
−50      0.9859162    0.9697800    0.9178636    0.8359562    0.6147019

Table 2: P(θ̂_new ≤ θ̂_Ecorr) when α = 0.99

γ_X      n = 15       n = 30       n = 75       n = 150      n = 500
0.1      0.1639229    0.1649940    0.1659530    0.1664393    0.1669728
0.5      0.1502835    0.1550681    0.1595138    0.1618278    0.1644116
1        0.1262885    0.1370188    0.1519811    0.1563255    0.1612902
2        0.0967257    0.1128146    0.1300353    0.1399258    0.1553075
4        0.06046637   0.07878596   0.10215891   0.11745456   0.13756958
6        0.04097709   0.05744783   0.08158069   0.09931798   0.12492985
10       0.02240486   0.03418956   0.05479085   0.07284335   0.10368843
15       0.01282831   0.02074561   0.03628517   0.05187652   0.08328521
20       0.008322046  0.013965375  0.025818012  0.038719667  0.067999284
50       0.001777641  0.003268604  0.006983243  0.011876996  0.026872164
−0.1     0.1714419    0.1703105    0.1693153    0.1688168    0.1682749
−0.5     0.1879753    0.1816847    0.1763341    0.1737183    0.1709231
−1       0.2240299    0.2057937    0.1856757    0.1801257    0.1743163
−2       0.2972269    0.2523213    0.2172884    0.2013779    0.1813850
−4       0.4814233    0.3693050    0.2804016    0.2417861    0.2049413
−6       0.6624912    0.5049659    0.3560668    0.2889614    0.2265836
−10      0.8704285    0.7391605    0.5270488    0.4005038    0.2759957
−15      0.9514307    0.8868428    0.7150916    0.5527714    0.3484026
−20      0.9763116    0.9437372    0.8343956    0.6875119    0.4293880
−50      0.9972413    0.9939490    0.9820008    0.9574301    0.8227825

[Figure 1: P(θ̂_new ≤ θ̂_Ecorr) as a function of α (0.5 to 1.0) when n = 30; upper panel: γ_X = 1, 10, 20, 50; lower panel: γ_X = −1, −10, −20, −50.]

[Figure 2: P(θ̂_new ≤ θ̂_Ecorr) as a function of α (0.5 to 1.0) when n = 75; upper panel: γ_X = 1, 10, 20, 50; lower panel: γ_X = −1, −10, −20, −50.]

[Figure 3: P(θ̂_new ≤ θ̂_Ecorr) as a function of α (0.5 to 1.0) when n = 500; upper panel: γ_X = 1, 10, 20, 50; lower panel: γ_X = −1, −10, −20, −50.]

C Appendix: Simulation results

This appendix contains some simulation results regarding the actual coverage of some confidence intervals for some combinations of α, n and γ_X. For each combination, R was used to simulate n random variables 1,000,000 times, and each time the confidence intervals were computed. The figures in the tables are the proportions of times that the mean was in the confidence interval. The distribution used for the random variables was the normal-inverse Gaussian distribution (the normal variance-mean mixture obtained when the mixing density is inverse Gaussian). It is found in the HyperbolicDist library in R. The inverse Gaussian variables were simulated using the SuppDists library.
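The thesis's exact simulation code is not reproduced here, but the following R sketch (ours) indicates the structure of one cell of the tables. It generates normal-inverse Gaussian variables directly as a normal variance-mean mixture with inverse Gaussian mixing, as described above, rather than through the HyperbolicDist library; the parameter values below are ours and are not calibrated to γ_X = −10.4 etc.:

# Inverse Gaussian sampler (Michael-Schucany-Haas), as in Section 3.4.
rinvgauss_basic <- function(n, mu, lambda) {
  y <- rnorm(n)^2
  x <- mu + mu^2 * y / (2 * lambda) -
    mu / (2 * lambda) * sqrt(4 * mu * lambda * y + mu^2 * y^2)
  u <- runif(n)
  ifelse(u <= mu / (mu + x), x, mu^2 / x)
}
rnig_mix <- function(n, beta, m, s) {  # NIG via a variance-mean mixture
  z <- rinvgauss_basic(n, m, s)        # mixing variable
  beta * z + sqrt(z) * rnorm(n)        # beta < 0 gives negative skewness
}
set.seed(1)
n <- 30; B <- 1e4; alpha <- 0.95; beta <- -2; m <- 1; s <- 0.5
mu_true <- beta * m                    # E(X) = beta * E(Z)
sigma2 <- m + beta^2 * m^3 / s         # Var(X) = E(Z) + beta^2 * Var(Z)
cover <- mean(replicate(B, {
  x <- rnig_mix(n, beta, m, s)
  mu_true <= mean(x) + qnorm(alpha) * sqrt(sigma2 / n)  # normal approx. interval
}))
cover  # estimated actual coverage of the nominal 95% interval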

α = 0.95                 Normal approx.   Edgeworth corr.   New method, a = 1   New method, a = 0.1
γ_X = −10.4, n = 30      0.933507         0.961781          0.95385             0.938311
γ_X = −10.4, n = 75      0.933039         0.955999          0.951491            0.93839
γ_X = −20.1, n = 30      0.941777         0.973632          0.959583            0.945165
γ_X = −20.1, n = 75      0.934541         0.964937          0.954753            0.939006
γ_X = −40.3, n = 30      0.959887         0.986872          0.971116            0.961884
γ_X = −40.3, n = 75      0.947646         0.978591          0.96308             0.950522

α = 0.99                 Normal approx.   Edgeworth corr.   New method, a = 1   New method, a = 0.1
γ_X = −10.4, n = 30      0.966634         0.991678          0.982543            0.96997
γ_X = −10.4, n = 75      0.971143         0.990596          0.985334            0.974616
γ_X = −20.1, n = 30      0.96562          0.994081          0.980019            0.968311
γ_X = −20.1, n = 75      0.965572         0.99233           0.981408            0.968813
γ_X = −40.3, n = 30      0.973474         0.997139          0.982817            0.975019
γ_X = −40.3, n = 75      0.967729         0.99532           0.980365            0.969895

36 References

[1] Abramovitch, L., Singh, K. (1985), Edgeworth Corrected Pivotal Statistics and the Bootstrap, The Annals of Statistics, Vol. 13, pp. 116-132

[2] Bhattacharya, R. N., Ghosh, J.K. (1978), On the Validity of the Formal Edgeworth Expansion, The Annals of Statistics, Vol. 6, pp. 434-451

[3] Bickel, P.J., Freedman, D.A. (1981), Some Asymptotic Theory for the Bootstrap, The Annals of Statistics, Vol. 9, pp. 1196-1217

[4] Chebyshev, P.L. (1890), Sur Deux Théorèmes Relatifs aux Probabilités, Acta Mathematica, Vol. 14, pp. 305-315

[5] Chhikara, R.S., Folks, J.L. (1989), The Inverse Gaussian Distribution, Marcel Dekker, ISBN 0-8247-7997-5

[6] Cornish, E.A., Fisher, R.A. (1938), Moments and Cumulants in the Specification of Distributions, Extrait de la Revue de l'Institut International de Statistique, Vol. 5, pp. 307-320

[7] Cramér, H. (1928), On the composition of elementary errors, Skandinavisk Aktuarietidskrift, Vol. 11, pp. 13-74 and 141-180

[8] Cramér, H. (1946), Mathematical Methods of Statistics, Princeton University Press, ISBN 0-691-00547-8

[9] Cramér, H. (1970), Random Variables and Probability Distributions, Cambridge University Press, ISBN 0-521-07685-4

[10] DasGupta, A. (2008), Asymptotic Theory of Statistics and Probability, Springer, ISBN 978-0-387-75970-8

[11] Dubkov, A. A., Malakhov, A. N. (1976), Properties and interdependence of the cumulants of a random variable, Radiophysics and Quantum Electronics, Vol. 19, pp. 833-839

[12] Edgeworth, F.Y. (1894), The Asymmetrical Probability Curve, Proceedings of the Royal Society of London, Vol. 56, pp. 271-272

[13] Edgeworth, F.Y. (1905), The Law of Error, Cambridge Philosophical Transactions, Vol. 20, pp. 36-65

[14] Edgeworth, F.Y. (1907), On the Representation of Statistical Frequency by a Series, Journal of the Royal Statistical Society, Vol. 70, pp. 102-106

[15] Efron, B. (1979), Bootstrap Methods: Another Look at the Jackknife, The Annals of Statistics, Vol. 7, pp. 1-26

[16] Esseen, C.-G. (1945), Fourier Analysis of Distribution Functions - A Mathematical Study of the Laplace-Gaussian Law, Acta Mathematica, Vol. 77, pp. 1-125

[17] Fisher, R.A., Cornish, E.A. (1960), The Percentile Points of Distributions Having Known Cumulants, Technometrics, Vol. 2, pp. 209-225

[18] Gut, A. (2005), Probability: A Graduate Course, Springer-Verlag, ISBN 978-0-387-22833-4

[19] Hall, P. (1983), Inverting an Edgeworth Expansion, The Annals of Statistics, Vol. 11, pp. 569-576

[20] Hall, P. (1986), On the Bootstrap and Confidence Intervals, The Annals of Statistics, Vol. 14, pp. 1431-1452

[21] Hall, P. (1988), Theoretical Comparison of Bootstrap Confidence Intervals, The Annals of Statistics, Vol. 16, pp. 927-953

[22] Hall, P. (1992), The Bootstrap and Edgeworth Expansion, Springer-Verlag, ISBN 0-387-97720-1

[23] Hall, P. (1992), On the Removal of Skewness by Transformation, Journal of the Royal Statistical Society B, Vol. 54, pp. 221-228

[24] Kreyszig, E. (1978), Introductory Functional Analysis with Applications, John Wiley & Sons, ISBN 0-471-50459-9

[25] MacGillivray, H.L. (1981), The Mean, Median, Mode Inequality and Skewness for a Class of Densities, Australian Journal of Statistics, Vol. 23, pp. 247-250

[26] Pearson, K. (1895), Skew Variation in Homogeneous Material, Philosophical Transactions of the Royal Society of London. A, Vol. 186, pp. 343-414

[27] Pfanzagl, J. (1979), Nonparametric Minimum Contrast Estimators, Selecta Statistica Canadiana, Vol. 5, pp. 105-140

[28] Singh, K. (1981), On the Asymptotic Accuracy of Efron’s Bootstrap, The Annals of Statistics, Vol. 9, pp. 1187-1195

[29] Withers, C.S. (1983), Expansions for the Distribution and Quantiles of a Regular Functional of the Empirical Distribution with Applications to Nonparametric Confidence Intervals, The Annals of Statistics, Vol. 11, pp. 577-587
