<<

arXiv:math/0410148v1 [math.PR] 6 Oct 2004 oano trcin deot xaso,rno om ra norm, random expansion, Studentize. Edgeworth sum, normalized attraction, of domain o ( ton ( Hyrenius 1419–1437 2, No. DOI: 32, Vol. Probability2004, of Annals The rpriso h ttsi.Wlae( Wallace . the of properties agd ntal,i h aeo Student’s infer of on effect case the the and in Initially, its gauged. by replaced inference, is adaptive to approaches common most eeoe usata ieaue owihGyn( Gayen bu which normal, to was literature, distribution substantial a sampled developed the that assumption the

c aiainadtpgahcdetail. typographic 1419–1437 and 2, pagination No. 32, publis Vol. article 2004, original the Mathematical of of reprint Institute electronic an is This nttt fMteaia Statistics Mathematical of Institute e od n phrases. and words Key .Introduction. 1. M 00sbetclassifications. subject 2003. 2000 January AMS revised 2002; March Received ETA II HOE O STUDENT’S FOR THEOREM LIMIT CENTRAL 10.1214/009117904000000252 XC OVREC AEADLAIGTR IN TERM LEADING AND RATE CONVERGENCE EXACT 1977 utainNtoa nvriy utainNtoa Univ National Australian University, National Australian hi aiiyuiomyo h hl elln sequivalen is points. line symmetric real in two some whole just in on the that, validity on shown uniformly is validity It given. their also character are of rates Examples convergence correspondi omitted. the fo is when constant normalized but centering , is cated of latter form identi the truncated is a when rate using sum exac exact non-Studentized the the the to Moreover, identical points. for is three line just real of whole sets the on uniformly rate ieteeatrt fcnegnei h eta ii theor limit central the in u convergence is of order approximation rate prop leading-term exact influence the The sho give extreme sum. is which Studentized term in the leading way the of the dom of in form the origin The its in law. have is a of sampled attraction the that being assumption fStudent’s of n rsi ( Cressie and ) 1950 h edn emi h omlapoiaint h distribu the to approximation normal the in term leading The n − eeerycnrbtr,o h ffc fnnomlt on nonnormality of effect the on contributors, early were ) 1 / 2 yPtrHl n iigWang Qiying and Hall Peter By where , t h tdnie eni neryeapeo n fthe of one of example early an is Studentized The ttsi sdrvdi eea etn,wt h sole the with setting, general a in derived is statistic er–senterm hrceiaino aeo converg of rate of characterization theorem, Berry–Esseen 1980 n nvriyo Sydney of University and n 2004 , eoe apesz.I spoe htteexact the that proved is It size. denotes aervee oki hsae.Ee nthe in Even area. this in work reviewed have ) hsrpitdffr rmteoiia in original the from differs reprint This . rmr 00;scnay62E20. secondary 60F05; Primary in 1958 h naso Probability of Annals The 1 ,Bwa,BacapadShen- and Beauchamp Bowman, ), t ttsi,ti a oeunder done was this statistic, e ythe by hed 1949 eo ovrec,self- convergence, of te T , a othat to cal ztosof izations hr nuisance a where otheir to t mu to up em 1950 aeon rate t gtrun- ng stances, , STATISTIC scale r e to sed necarefully ence i of ain nto wn erties ae there later t tion ersity , 1952 and ) ence, 2 P. HALL AND Q. WANG case of normal data, where tables of the exact distribution have long been readily available, the issue of convergence (to normality) of the distribution of the t statistic has been of both theoretical and practical interest for many years; see, for example, Anscombe (1950) and Gayen (1952). From a theoretical viewpoint the problem of determining exact conver- gence rates for the t statistic can be a particularly awkward one. Despite the statistic’s simple representation in terms of the mean and mean of squares of independent data, its distribution is surprisingly difficult to approximate using methods for sums of independent random variables. The problem has, of course, long been solved under sufficiently severe conditions, but its treatment in more theoretically interesting cases, when its distribution is asymptotically normal but few other assumptions are made, is far from straightforward. In a major advance, Bentkus and G¨otze (1996) gave bounds of general Berry–Esseen type for rates of convergence in the central limit theorem for Student’s t statistic when the data are independent and identically distributed. See also Chibisov (1980, 1984) and Slavova (1985). Bentkus, Bloznelis and G¨otze (1996) extended Bentkus and G¨otze’s arguments to nonidentically distributed summands. Hall (1987) had earlier established Edgeworth expansions under moment conditions that were no more severe than existence of the moments actually appearing in the expansions. See also van Zwet (1984), Friedrich (1989), Putter and van Zwet (1998), Ben- tkus, G¨otze and van Zwet (1997), Wang and Jing (1999), Wang, Jing and Zhao (2000) and Bloznelis and Putter (1998, 2002). However, moment conditions, even finite variance, are not the main pre- requisite for convergence of the distribution of Student’s t statistic. In partic- ular, Gin´e, G¨otze and Mason (1997) showed that a necessary and sufficient condition for the Studentized mean to have a limiting standard normal dis- tribution is that the sampled distribution lie in the domain of attraction of the normal law. See also Logan, Mallows, Rice and Shepp (1973), Griffin and Mason (1991) and Egorov (1996). Although it is not of direct relevance to our work, we mention that the case where the data are from a is more complex. There, convergence in the conventional, deterministically normalized central limit theorem is not equivalent to convergence in the randomly normalized case; see Hahn and Zhang (1998). In the present paper we assume no more than that the sampled distribu- tion lies in the domain of attraction of the normal law, and describe rates of convergence, in the independent-data case, without reference to moment properties. We give the leading term in a normal approximation to the distri- bution of Student’s t statistic, and show that its form is strongly influenced by the effects that large data have on the statistic. Using the leading term, we derive the exact convergence rate in the central limit theorem, up to CONVERGENCE RATE FOR STUDENT’S T STATISTIC 3 terms of order n−1/2 (where n denotes sample size), or up to order n−1 when the sampled distribution satisfies Cram´er’s continuity condition. We show that, if the third moment should happen to be finite, the leading term transforms into the conventional first term in an Edgeworth expansion of the distribution of Student’s t statistic. More generally, however, the lead- ing term can be used to show that the exact rate of convergence over the whole real line is equivalent to the exact rate of convergence over very small sets, containing no more than three points. The number of points can be reduced to two if we seek necessary and sufficient characterizations of the convergence rate, rather than the exact rate itself. We draw connections to the rate of convergence of the distribution of a conventionally normalized, non-Studentized mean.

2. Main results. Let X1, X2,... be independent and identically distributed random variables, and let X have the distribution of a generic Xi. Student’s t statistic, with numerator centered at its expectation, is defined to be

n n n 2 1/2 2 −1 (2.1) T = Xi Xi − n Xi . !,( ! ) Xi=1 Xi=1 Xi=1 An alternative, more classical definition of the Studentized mean, in which the sample variance has divisor n − 1 rather than n, has the formula (1 − n−1)−1/2T ; see Gossett (1908). All our results hold for this version of Stu- dent’s statistic, as well as that given by (2.1). The principal results are Theorems 2.1 and 2.2, which respectively describe the leading term and its role in a normal approximation to the distribution of T . Propositions 3.1 and 3.2 in the next section reveal the origins of the leading term, and in particular link it to the way in which extremes affect the distribution of T . Write Φ and φ for the standard normal distribution and density functions, −2 2 respectively. Put bn = sup{x : nx E[X I(|X|≤ x)] ≥ 1} and 2 1/2 (2.2) Ln(x)= nE(Φ[x{1 + (X/bn) } − (X/bn)] − Φ(x)).

Theorem 2.1. If the distribution of X is in the domain of attraction of the normal law, and E(X) = 0, then

−1/2 (2.3) sup |P (T ≤ x) − {Φ(x)+ Ln(x)}| = o(δn)+ O(n ). −∞

We noted in Section 1 that T has a limiting standard normal distribution if and only if the distribution of X is in the domain of attraction of the normal law and E(X) = 0. Theorem 2.1 argues that Ln(x) is a leading term in an expansion of the distribution of T . As Theorem 2.2 will show, the exact order of magnitude of Ln(x) is that of −1 δn = nP (|X| > bn)+ nbn |E{XI(|X|≤ bn)}| (2.4) −3 3 −4 4 + nbn |E{X I(|X|≤ bn)}| + nbn E{X I(|X|≤ bn)}.

Theorem 2.2. Assume the distribution of X is in the domain of attrac- tion of the normal law and E(X) = 0. Then δn → 0 and

(2.5) sup |Ln(x)|≍ δn −∞

0 < lim inf an/bn ≤ lim supan/bn < ∞. n→∞ n→∞ Property (2.5) continues to hold if the supremum over all x is replaced by the 1/2 supremum over x ∈ {−x0,x0,x1}, where x0 > 3 and x1 is any real number 3 2 3 not equal to ±x0. Furthermore, if E(|X| ) < ∞, E(X ) = 1 and E(X )= γ, then 1/2 1 2 (2.6) sup |n Ln(x) − 6 γ(2x + 1)φ(x)|→ 0 −∞

There exist examples of distributions in the domain of attraction of the normal law having zero mean and, for which any given one of the four components in the definition of δn, at (2.4), dominate all the others along a subsequence. It follows that none of the terms of which δn is composed can be dropped if we require a full account of the rate of convergence in the central limit theorem. Formula (2.6) shows that in the case of finite third moment, the leading term is asymptotic to its conventional form in an Edgeworth expansion. Together, properties (2.3) and (2.5) give concise results about the rate of convergence in the central limit theorem. For example, if X is in the domain of attraction of the normal law, and E(X) = 0, then (2.3) and (2.5) imply that −1/2 −1/2 (2.7) sup |P (T ≤ x) − Φ(x)| + n ≍ δn + n ; −∞

3. Proofs.

3.1. Proof of Theorem 2.1. Let α> 0 and define Yi = XiI(|Xi|≤ αbn), ρn = nP (|X| > αbn),

i≤n Yi Ψn(x)= P 2 −1 2 1/2 ≤ x , " { i≤n Yi −Pn ( i≤n Yi) } # (3.1) 2 1/2 Mn1(x)= nE{(Φ[P x{1 + (X/bn)P} − (X/bn)] − Φ(x))I(|X| > αbn)}, 2 1/2 Mn2(x)= nE{(Φ[x{1 + (X/bn) } − (X/bn)] − Φ(x))I(|X|≤ αbn)}. Theorem 2.1 is a direct consequence of the following two propositions, which will be proved in Sections 3.3 and 3.4.

Proposition 3.1. Assume the distribution of X is in the domain of attraction of the normal law, and E(X) = 0. Then, for each α> 0,

(3.2) sup |P (T ≤ x) − {Ψn(x)+ Mn1(x)}| = o(ρn). −∞

Proposition 3.2. Assume the distribution of X is in the domain of attraction of the normal law, and E(X) = 0. Then, for each ε> 0 we have, for all sufficiently small α> 0, −1/2 (3.3) sup |Ψn(x) − {Φ(x)+ Mn2(x)}| ≤ εδn + O(n ). −∞

We remark that our method for proving Proposition 3.1 will show clearly that the leading-term fragment Mn1 derives principally from the largest summand among X1,...,Xn, that is, from the value Xmax of Xi for which |Xi| is greatest. Indeed, it may be proved that 2 1/2 Mn1(x)= E{(Φ[x{1 + (Xmax/bn) } − (Xmax/bn)] − Φ(x))I(|Xmax| > αbn)}

+ o(ρn), uniformly in x. It follows that the leading term Ln(x), introduced at (2.2) and defined as the limit of Mn1 as α → 0, also has this origin. The connections to extremes arise in part through the major role that large summands play in convergence properties of series when the distribu- tion of the summands has infinite variance. See Darling (1952), Arov and Bobrov (1960), Dwass (1966), Hall (1978), LePage, Woodroofe and Zinn (1981) and Resnick (1986) for discussion of more conventional settings. In 2 the present case the main series where extremes cause difficulty is i≤n Xi , appearing in the definition of T at (2.1). The summands here have finite P variance if and only if the sampled distribution has finite fourth moment. However, extremes arising even from the series i≤n Xi play a role in the leading term and so too in the convergence rate; see Hall (1984) for discus- P sion of the latter issue.

3.2. Proof of Theorem 2.2. It is straightforward to show that δn → 0 and sup−∞

(3.4) δn = O sup|Ln(x)| , x∈S  where S = {−x0,x0,x1} is the set of three points in the statement of the theorem; and that (2.6) holds. This follows relatively straightforwardly.

3.3. Proof of Proposition 3.1. Put V = maxi≤n |Xi| and J = argmaxi≤n|Xi|; ties may be broken in any measurable way. Define S to be the sign of XJ 2 and let T1 = i≤n Xi, T2 = i≤n Xi , T3 = i≤n Yi + SVI(V > αbn) and P P P CONVERGENCE RATE FOR STUDENT’S T STATISTIC 7

2 2 T4 = i≤n Yi + V I(V > αbn). The probability that two or more values 2 of |Xi|, for 1 ≤ i ≤ n, exceed αbn equals O(ρn). Therefore, P {(T1, T2)= P 2 (T3, T4)} = 1 − O(ρn), whence it follows that, uniformly in x, T P (T ≤ x)= P 1 ≤ x (T − n−1T 2)1/2 (3.5)  2 1  T3 2 = P −1 2 1/2 ≤ x + O(ρn). (T4 − n T3 )  Put π(v)= P (X ≥ 0||X| = v). Conditional on X1,...,Xn, let S(V ) denote a that takes the values +1 and −1 with probabilities π(V ) and 1 − π(V ), respectively. Let T5 = i≤n Yi + S(V )VI(V > αbn). Then (T , T ) has the same joint distribution as (T , T ), and so by (3.5) we have, 3 4 P 5 4 uniformly in x, 2 (3.6) P (T ≤ x)= P (W ≤ x)+ O(ρn), −1 2 1/2 where W = T5/(T4 − n T5 ) . Define 2 2 TY = (Yi − EYi), TB = (Yi − EYi ), i≤n i≤n X 2 X 2 ν = E{XI(|X|≤ αbn)}, τ = E{X I(|X|≤ αbn)}.

Note that a formula for Ψn(x), equivalent to (3.1), is

TY + nν (3.7) Ψn(x)= P 2 −1 2 1/2 ≤ x . {TB + nτ − n (TY + nν) }  Let the random variable N1 have the standard normal distribution. The −1 −2 joint distribution of the vector (bn TY , bn TB), conditional on V > εbn, con- verges to the joint distribution of (N1, 0). In particular, the second compo- nent of the limiting distribution is degenerate at 0. The convergence has the following property: for all ε> 0, −1 −2 (3.8) sup sup |P (bn TY ≤ x; bn |TB|≤ ε|V = v) − P (N1 ≤ x)|→ 0. v>αbn −∞ αbn, equals the unconditional 2 −1 jointP distributionP of ( i≤n−1 Yi, i≤n−1 Yi ); and that bn i≤n−1(Yi −EYi) → −2 2 2 −1 N1 in distribution, bnP i≤n−1(YPi −EYi ) → 0 in probabilityP and bn |E(Y1)|+ −2 2 bn E(Y1 ) → 0. 2P 2 Since T4 = TB + nτ + V I(V > αbn), T5 = TY + nν + S(V )VI(V > αbn), −2 2 −1 bn nτ → 1 and bn nν → 0, then, for all ε> 0, we have from (3.8), −1 sup sup |P [bn {T5 − S(V )(V/bn)}≤ x; v>αbn −∞

Therefore, if N2 denotes a standard normal random variable that is inde- pendent of V , then

N2 + S(V )(V/bn) sup sup P (W ≤ x|V = v) − P 2 1/2 ≤ x V = v → 0. v>αbn −∞

Equivalently,

2 1/2 P (W ≤ x|V = v) − Φ[x{1 + (v/bn) } − S(v)(v/bn)] → 0, uniformly in v > αbn and −∞ αbn; integrate over v > αbn; and then multiply by P (V > αbn), to prove that, uniformly in x,

P (W ≤ x; V > αbn) 2 1/2 (3.9) = E(Φ[x{1 + (V/bn) } − S(V )(V/bn)]I(V > αbn)) + o(ρn) 2 1/2 = nE(Φ[x{1 + (X/bn) } − (X/bn)]I(X > αbn)) + o(ρn). To derive the last identity, reformulate the expectation using an integration by parts argument, and note that

P {S(V )V >y} = nP (X>y)+ O[{nP (X>y)}2], P {S(V )V ≤ y} = nP (X ≤ y)+ O[{nP (X ≤ y)}2], where both remainders are of the stated orders uniformly in y > αbn and y< −αbn, respectively. Furthermore,

E{P (W ≤ x|V ≤ αbn)I(V ≤ αbn)}

TB + nν = E I 2 −1 2 1/2 ≤ x I(V ≤ αbn)  {TY + nτ − n (TB + nν) }   TB + nν =Ψn(x) − E I 2 −1 2 1/2 ≤ x I(V > αbn) ,  {TY + nτ − n (TB + nν) }   using (3.7) to obtain the last identity. A simpler version of the argument leading to (3.9) may be used to prove that the subtracted term above equals Φ(x)ρn + o(ρn), uniformly in x. Therefore,

(3.10) P (W ≤ x; V ≤ αbn)=Ψn(x) − Φ(x)ρn + o(ρn), uniformly in x. Combining (3.10) with (3.6) and (3.9), we conclude that (3.2) holds. CONVERGENCE RATE FOR STUDENT’S T STATISTIC 9

∗ 2 3.4. Proof of Proposition 3.2. Define Sn = j Xj , Sn = j Yj, Vn = 2 ∗2 2 j Xj and Vn = j Yj . It is well known [e.g., EfronP (1969)] that,P for x ≥ 0, P P ∗ ∗ 2 1/2 (3.11) Ψn(x)= P [Sn/Vn ≤ x{n/(n + x )} ]. Noting also that for |u|≤ 1,

sup |Φ{x(1 + u2)1/2 − u} −∞ 0,

(3.12) sup |Mn2(x) − Qn1(x)|≤ Cαδn, −∞ 0, we have for all sufficiently small α> 0, ∗ ∗ −1/2 (3.13) sup |P (Sn/Vn ≤ x) − {Φ(x)+ Qn1(x)}| ≤ εδn + O(n ), −∞ δn ,

∗ ∗ −x ∗ ∗ −1/12 2 P (Sn ≥ xVn ) ≤ e sup E{exp(|Sn/Vn |)}≤ A exp(−δn ) ≤ Aδn. n (Here and below, A denotes a positive constant which might be different 2 at each appearance.) Moreover, |1 − Φ(x) − Qn1(x)| ≤ Aδn uniformly in −1/12 x > δn . Therefore, (3.13) will follow if, for each ε> 0, we have for all sufficiently small α> 0, ′ ∗ ∗ −1/2 (3.14) sup |P (Sn/Vn ≤ x) − {Φ(x)+ Qn1(x)}| ≤ εδn + O(n ), and O(n−1/2) may be replaced by O(n−1) if Cram´er’s condition is satisfied, ′ −1/12 where sup denotes the supremum over x ∈ [0, δn ]. 2 2 −2 2 2 1/2 Let Bn = nEY1 and Wn = Bn j(Yj − EYj ). Noting that (1 + y) = 1 1 2 1 3 4 1 1 1+ 2 y − 8 y + 16 y + θy , where Pθ = θ(y) satisfies |θ|≤ 16 for |y|≤ 2 , we 10 P. HALL AND Q. WANG may prove that ∗ ∗ P (Sn/Vn ≤ x) ∗ 1/2 = P {Sn ≤ xBn(1 + Wn) } 1 ∗ 1 1 2 1 3 1 4 ≥−P (|Wn|≥ 2 )+ P {Sn ≤ xBn(1 + 2 Wn − 8 Wn + 16 Wn − 16 Wn )}, ∗ ∗ P (Sn/Vn ≤ x) ∗ 1/2 = P {Sn ≤ xBn(1 + Wn) } 1 ∗ 1 1 2 1 3 1 4 ≤ P (|Wn|≥ 2 )+ P {Sn ≤ xBn(1+ 2 Wn − 8 Wn + 16 Wn + 16 Wn)}. In view of Markov’s inequality, it is readily seen that, for each ε> 0, we have for any sufficiently small α> 0 and all sufficiently large n, 4 4 2 P (|Wn|≥ 1/2) ≤ 16E(Wn ) ≤ A(α δn + δn) ≤ εδn. Hence, (3.14) will follow if we prove that, for each ε> 0, we have for |θ|≤ 1/16 and any sufficiently small α> 0, ′ ∗ 1 1 2 1 3 4 sup |P {Sn ≤ xBn(1+ 2 Wn − 8 Wn + 16 Wn + θWn)} (3.15) − {Φ(x)+ Qn1(x)}| −1/2 ≤ εδn + O(n ), and O(n−1/2) may be replaced by O(n−1) if Cram´er’s condition is satisfied. Let j6=k, j6=k6=l and j6=k6=l6=m denote summations over pairs, triples and quadruples, respectively, of distinct integers between 1 and n. Put Zj = 2 P 2 P P Yj − EYj . Simple calculations show that n 6 3 3 2 2 BnWn = Zj + 3 Zj(Zk − EZk )+ ZjZkZl + Wn1, jX=1 jX6=k j6=Xk6=l n 8 4 4 3 3 2 2 2 2 BnWn = Zj + 4 Zj(Zk − EZk ) + 12 (Zj − EZj )(Zk − EZk )+ Wn2, jX=1 jX6=k jX6=k 2 where Wn1 = 3(n − 1)E(Z1 ) j Zj and n P n 3 2 2 2 Wn2 = 4(n − 1)E(Z1 ) Zj + 24(n − 1)E(Z1 ) (Zj − EZj ) jX=1 jX=1 2 2 2 + 12n(n − 1)(EZ1 ) + 24 ZjZkZl + ZjZkZlZm. j6=Xk6=l j6=kX6=l6=m Therefore, 1 1 1 P S∗ ≤ xB 1+ W − W 2 + W 3 + θW 4 n n 2 n 8 n 16 n n    CONVERGENCE RATE FOR STUDENT’S T STATISTIC 11

1 n x x = P ξj(x)+ 4 ϕjk + 6 ψjkl (Bn Bn Bn jX=1 jX6=k j6=Xk6=l

nEη1(x) ≤ x(1 + Wn3) − , Bn ) 1 1 −6 −8 where ξj(x)= ηj(x)−Eηj(x), ψjkl = − 16 ZjZkZl, Wn3 = 16 Bn Wn1 +θBn Wn2,

x x 2 x 3 θx 4 ηj(x)= Yj − Zj + 3 Zj − 5 Zj − 7 Zj , 2Bn 8Bn 16Bn Bn

1 3 2 2 4θ 3 3 ϕjk = ZjZk − 2 Zj(Zk − EZk ) − 4 Zj(Zk − EZk ) 8 16Bn Bn

12θ 2 2 2 2 − 4 (Zj − EZj )(Zk − EZk ). Bn It is readily seen that −6 4 4 4 2 −8 2 2 2 2 E(Bn Wn1) ≤ Aδn(α δn + δn) and E(Bn Wn2) ≤ Aδn(α δn + δn). Hence, for each ε> 0, we have for any sufficiently small α> 0 and all suffi- ciently large n, −6 −8 P (|Wn3|≥ 2εδn) ≤ P (|Bn Wn1|≥ εδn)+ P (|Bn Wn2|≥ εδn) −4 4 2 −2 2 2 ≤ A{ε (α δn + δn)+ ε (α δn + δn)}≤ εδn. Result (3.15) now follows easily from the following three propositions. We will only prove Propositions 3.3 and 3.4 in subsequent sections. The proof of Proposition 3.5 is relatively straightforward although requiring tedious alge- bra, and hence details are omitted. The proof of Proposition 3.2 is therefore complete.

1 Proposition 3.3. For all 0 < α ≤ 2 , n ′ 1 x x sup sup P ξj(x)+ 4 ϕjk + 6 ψjkl ≤ y −∞

−1/2 = o(δn)+ O(n ),

1 (2) where Ln(y)= n[EΦ{y − ξ1(x)/Bn}− Φ(y)] − 2 Φ (y).

itX −1/2 Proposition 3.4. If lim sup|t|→∞ |Ee | < 1, then the term O(n ) on the right-hand side of (3.16) may be replaced by O(n−1). 12 P. HALL AND Q. WANG

Proposition 3.5. For each ε> 0, we have for any sufficiently small α> 0, ′ −1 (3.17) sup |Φ[x − {nEη1(x)/Bn}] − Φ(x)+ Qn2(x)|≤ εδn + O(n ), ′ −1 (3.18) sup |Ln[x − {nEη1(x)/Bn}] − Qn3(x)|≤ εδn + O(n ), j 1 where unj = nE{(X/Bn) I(|X|≤ αbn)}, Qn2(x)= un1φ(x)+ 8 xun4φ(x) and 1 2 1 2 Qn3(x)= un3 6 (2x + 1)φ(x)+ un4 24 x(2x − 3)φ(x).

3.5. Proof of Proposition 3.3. Standard methods based on Taylor’s ex- pansion, although requiring tedious algebra, may be used to establish the following lemmas. Define j unj = nE{(X/Bn) I(|X|≤ αbn)}, g(t,x)= E[exp{itξ1(x)/Bn}] and −t2/2 1 2 fn(t,x)= e [1 + n{g(t,x) − 1} + 2 t ].

1 Lemma 3.6. If 0 < α ≤ 2 , then for all sufficiently large n, −2 2 1 2 2 −1 (3.19) |nBn Eξ1 (x) − (1 − xun3 + 4 x un4)|≤ 2(1 + x )(αδn + n ), −3 3 3 3 −1 (3.20) |nBn Eξ1 (x) − (un3 − 2 xun4)|≤ 12(1 + |x| )(αδn + n ), −4 4 4 −1 (3.21) |nBn Eξ1 (x) − un4|≤ 32(1 + x )(αδn + n ), −5 5 5 (3.22) nBn E|ξ1(x)| ≤ 32(1 + |x| )αδn.

1 Lemma 3.7. There exists a constant c0 > 0 such that, for all α ∈ (0, 2 ], 1/2 −1/12 |t|≤ c0n , all x ∈ [0, δn ] and all sufficiently large n, 2 (3.23) |g(t,x)|≤ e−t /8n, n −t2/2 4 −1 2 4 −t2/8 (3.24) |g (t,x) − e |≤ A(1 + x )(1 + α )δn(t + t )e , n 8 −2 2 4 8 −1 4 −t2/8 (3.25) |g (t,x) − fn(t,x)| ≤ {A(1 + x )(1 + α )δn(t + t ) + 2n t }e .

1 Throughout the proof of Proposition 3.3, we assume that 0 < α ≤ 2 , −1/12 0 ≤ x ≤ δn and n is sufficiently large. Defineϕ ¯jk = ϕjk + ϕkj , Tn = −1 Bn j ξj(x) and P x m−1 n 6x m−2 n n ∆n,m = 4 ϕ¯jk + 6 ψjkl. Bn Bn jX=1 k=Xj+1 jX=1 k=Xj+1 l=Xk+1 2 2 2 Noting that Bn = nEY1 = bn for sufficiently large n, we obtain that |Yj|≤ 2 Bn/2 and |Zj|≤ Bn/2. Using these properties, E(ψ123|Xj) = 0, j = 1, 2, 3, CONVERGENCE RATE FOR STUDENT’S T STATISTIC 13

2 8 4 2 2 and E(ϕ ¯12|Xj) = 0, j = 1, 2, we may deduce that E(ϕ12) ≤ 2 (EY1 ) , E(ψ123) ≤ 4 3 (EY1 ) and, for 1 ≤ m ≤ n, 2 2 −8 2 2 −12 2 E(∆n,m) ≤ 2x {mnBn E(ϕ12)+ mn Bn E(ψ123)} (3.26) 10 −1 2 2 ≤ 2 mn x δn, −4 4 the last inequality following from the fact that nBn EY1 ≤ δn. Moreover, −1 −1 noting that n ≤ δn → 0, |un4|≤ δn and |un3|≤ (1+α )δn, it follows easily from (3.19)–(3.21) that 2 Eξ1 (x) 2 −1 1 (3.27) 2 − 1 ≤ 2(1 + x )(1 + α )δn ≤ , EY1 2

−3 3 3 −1 (3.28) nB n |Eξ1 (x) |≤ 32(1 + |x| )(1 + α )δn, −4 4 4 (3.29) nBn Eξ1 (x) ≤ 65(1 + x )δn. We now turn back to the proof of (3.16). Using the identities x x ∆n,n = 4 ϕjk + 6 ψjkl Bn Bn jX6=k j6=Xk6=l ity and e d{Φ(y)+ Ln(y)} = fn(t,x), and Esseen’s smoothing lemma [e.g., Petrov (1975), page 109], it may be shown that R sup |P (Tn +∆n,n ≤ y) − {Φ(y)+ Ln(y)}| −∞

I = 2|E exp(itT ) − f (t,x)||t|−1 dt 2n −1/4 n n Z|t|≤δn −1 + |E exp(itTn)||t| dt, −1/4 1/2 Zδn ≤|t|≤c0n I = |E{∆ exp(itT )}| dt, 3n −1/4 n,n n Z|t|≤δn −1 I4n = |E exp{it(Tn +∆n,n)}||t| dt, −1/4 −2 1/2 Zδn ≤|t|≤min{δn ,c0n } 14 P. HALL AND Q. WANG and we have used the property, implied by (3.27)–(3.29), that 1 nEξ2(x) n|Eξ3(x)| nEξ4(x) sup |L′ (y)|≤ A 1 − 1 + 1 + 1 n 2 B2 6B3 24B4 −∞

4 −1 −1 2/3 ≤ A(1 + x )(1 + α ) δn ≤ A(1 + α )δn . Using (3.26) and the fact that |eiu − 1 − iu|≤ u2/2, it can be shown that

(3.31) I ≤ 1 E(∆2 )|t| dt ≤ 29x2δ3/2 ≤ 29δ4/3. 1n 2 −1/4 n,n n n Z|t|≤δn Using Lemma 3.7, we obtain 8 −2 2 −1 −2 4/3 −1 (3.32) I2n ≤ A{(1 + x )(1 + α )δn + n }≤ A{(1 + α )δn + n }.

Next we estimate I3n and I4n. Treating the former first, note that E(ϕ12|X1)= 2 8 4 2 E(ϕ12|X2)=0 and Eϕ12 ≤ 2 (EY1 ) , and that as in Bickel, G¨otze and van Zwet (1986), t2 (3.33) E(ϕ12 exp[it{ξ1(x)+ ξ2(x)}/Bn]) = − 2 E{ξ1(x)ξ2(x)ϕ12} + l1(x), Bn where by using |eiu − 1 − iu|≤ u2/2, |eiu − 1| ≤ |u|, (3.27) and (3.29),

itξ1(x)/Bn itξ1(x) itξ2(x)/Bn |l1(x)|≤ E ϕ12 e − 1 − {e − 1}   Bn  

|t| itξ2(x)/Bn itξ2(x) + E ϕ12ξ1(x) e − 1 − Bn   Bn 

3 |t| 2 2 ≤ 3 E[|ϕ12|{|ξ1(x)|ξ2 (x)+ ξ1 (x)|ξ2(x)|}] 2Bn 3 |t| 2 1/2 2 1/2 4 1/2 ≤ 3 (Eϕ12) {Eξ1 (x)} {Eξ1 (x)} Bn 2 3 −1/2 −1 4 2 1/2 1/2 ≤ A(1 + x )|t| n Bn (EY1 )(EY1 ) δn 2 3 −2 3/2 4 ≤ A(1 + x )|t| n δn Bn, −4 4 2 2 since nBn EY1 ≤ δn and Bn = nEY1 . Tedious but elementary calculation shows that 2 −2 2 6 |E{ξ1(x)ξ2(x)ϕ12}| ≤ A(1 + x )n δnBn. Substituting into (3.33), we deduce that 2 2 3 −2 3/2 4 (3.34) |E(ϕ12 exp[it{ξ1(x)+ ξ2(x)}/Bn])|≤ A(1 + x )(t + |t| )n δn Bn.

Similarly, it follows from the identities E(ψ123|Xj)=0, for j = 1, 2, 3, and 2 4 3 from Eψ123 ≤ (EY1 ) and (3.27), that 3 −3 3/2 6 (3.35) |E(ψ123 exp[it{ξ1(x)+ ξ2(x)+ ξ3(x)}/Bn])|≤ A|t| n δn Bn. CONVERGENCE RATE FOR STUDENT’S T STATISTIC 15

From (3.34), (3.35) and (3.23), it can be seen that

|E{∆n,n exp(itTn)}| 2 −4 3 −6 ≤ |x|n Bn |E{ϕ12 exp(itTn)}| + |x|n Bn |E{ψ123 exp(itTn)}| 3 3/2 2 3 −t2/8 ≤ A(1 + |x| )δn (t + |t| )e , and hence (3.36) I = |E{∆ exp(itT )}| dt ≤ A(1 + |x|3)δ3/2 ≤ Aδ5/4. 3n −1/4 n,n n n n Z|t|≤δn ∗ We next estimate I4n. Put ∆n,m =∆n,n − ∆n,m. In view of (3.26), ∗ ∗ |E exp{it(Tn +∆n,n)}− E exp{it(Tn +∆n,m)}− itE∆n,m exp{it(Tn +∆n,m)}| 9 2 2 −1 2 ≤ 2 t x mn δn.

This inequality, together with the independence of the Xk’s, implies that for any 1 ≤ m ≤ n,

|E exp{it(Tn +∆n,n)}| (3.37) m−2 m−5 2 2 −1 2 ≤ |g(t,x)| + A|x|δn|t||g(t,x)| + At x mn δn, 2 1/2 where we have used the bound E|∆n,m|≤ (E|∆n,m| ) ≤ A|x|δn. −2 −1 Let n0 = [16nt log(δn )]+5, where [·] denotes the integer part function. −1/4 −2 1/2 It is clear that 1 ≤ n0 ≤ n for δn ≤ |t|≤ min{δn , c0n }, for n large enough. Hence, choosing m = n0 in (3.37) and using (3.23), we get

−1 I4n = |E exp{it(Tn +∆n,n)}||t| dt −1/4 −2 1/2 (3.38) δn ≤|t|≤min{δn ,c0n } Z 2 −1 2 2 4/3 ≤ A(1 + x )(log δn ) δn ≤ Aδn .

Substituting the bounds for I1n,...,I4n back into (3.30), and recalling that δn → 0, we obtain (3.16), and hence complete the proof of Proposition 3.3.

3.6. Proof of Proposition 3.4. Without loss of generality we assume that −1/3 −1/3 −1/2 δn ≤ n . Indeed, for δn ≥ n , it is obvious that the term O(n ) on −1 −1/3 the right-hand side of (2.3) can be replaced by O(n ). Note that δn ≤ n −1/3 implies that nP (|X|≥ bn) ≤ n . This, together with the fact that the distribution of X is in the domain of attraction of a normal law, implies that EX2 < ∞. We continue to use the notation in the proof of Proposition 3.3. Further, we put m n m n n ˜ (1) x 6x ∆n,m = 4 ϕ¯jk + 6 ψjkl, Bn Bn jX=1 k=Xm+1 jX=1 k=Xm+1 l=Xk+1 n−1 n n−2 n n ˜ (2) x 6x ∆n,m = 4 ϕ¯jk + 6 ψjkl. Bn Bn j=Xm+1 k=Xj+1 j=Xm+1 k=Xj+1 l=Xk+1 16 P. HALL AND Q. WANG

−1/12 As in the proof of (3.26), we have that, for 0 ≤ x ≤ δn , ˜ (1) ˜ (2) 2 10 2 −2 2 2 2 −2 11/6 (3.39) E(∆n,n − ∆n,m − ∆n,m) ≤ 2 m n x δn ≤ Am n δn .

Hence, for m0 = C log n, where C is a constant that we shall specify later, ˜ (1) ˜ (2) −1 2 2 11/6 2 3/2 P (|∆n,n − ∆n,m0 − ∆n,m0 |≥ n ) ≤ AC (log n) δn ≤ AC δn . −1/12 Proposition 3.4 will now follow if we show that, for 0 ≤ x ≤ δn , ˜ (1) ˜ (2) sup |P (Tn + ∆n,m0 + ∆n,m0 ≤ y) − {Φ(y)+ Ln(y)}| (3.40) −∞

Lemma 3.8. Let F be a distribution function with characteristic func- tion f. Then for all y ∈ R and T > 0, it holds that T (3.41) lim F (z) ≤ 1 +P.V. exp(−iyt)T −1K(t/T )f(t) dt, z↓y 2 Z−T T (3.42) lim F (z) ≥ 1 − P.V. exp(−iyt)T −1K(−t/T )f(t) dt, z↑y 2 Z−T where T −h T P.V. = lim + , h↓0 Z−T Z−T Zh  and 2K(s)= K1(s)+ iK2(s)/(πs),

K1(s) = 1 − |s|, K2(s)= πs(1 − |s|) cot πs + |s| for |s| < 1, and K(s) ≡ 0 for |s|≥ 1.

We shall give the proof of (3.40) by using Lemma 3.8 and some of the tech- niques of Bentkus, G¨otze and van Zwet (1997). By Ek(·)= E(·|Xk+1,...,Xn) we shall denote expectation conditional on Xk+1,...,Xn. Define n/2 n 1/2 −2/3 x 1/2 −2/3 x τ1 = n δn Em0 4 ϕ¯1k , τ2 = n δn Em0 4 ϕ¯1k , Bn Bn k=m0+1 k=n/2+1 X X itX and put τ0 = 1 − lim sup|t|→∞ |Ee |, n1/2δ−2/3τ H = n 0 . 16(1 + τ1 + τ2) CONVERGENCE RATE FOR STUDENT’S T STATISTIC 17

As in the proof of (3.26),

n/2 2 n 2 2 −4/3 x x E(τ1 + τ2) ≤ 2nδn E 4 ϕ¯1k + E 4 ϕ¯1k ( Bn ! Bn ! ) k=Xm0+1 k=Xn/2+1 2 2/3 ≤ Ax δn .

−1/12 This, together with the bound 0 ≤ x ≤ δn , implies that

−2 −1 4/3 2 −1 4/3 (3.43) E(H ) ≤ An δn E(1 + τ1 + τ2) ≤ An δn .

1/2 −2/3 Also, we have that H ≤ (τ0/16)n δn .

Returning to the proof of (3.40), note that H depends only on Xm0+1,...,Xn. Using (3.41), and arguing as in Bentkus, G¨otze and van Zwet (1997), we ob- tain ˜ (1) ˜ (2) (3.44) 2P (Tn + ∆n,m0 + ∆n,m0 ≤ y) ≤ 1+ EI1 + EI2, ˜ (1) ˜ (2) where, with f(t)= Em0 exp[it{Tn + ∆n,m0 + ∆n,m0 }],

−1 I1 = H exp(−iyt)K1(t/H)f(t) dt, ZR i I = P.V. exp(−iyt)K (t/H)f(t)t−1 dt. 2 π 2 ZR The following results are derived by Hall and Wang (2003), on which the present paper is based:

−1 (3.45) |EI1| = o(δn)+ O(n ), −1 (3.46) |EI2 + 1 − 2{Φ(y)+ Ln(y)}| = o(δn)+ O(n ).

It follows from (3.44)–(3.46) that

˜ (1) ˜ (2) −1 P (Tn + ∆n,m0 + ∆n,m0 ≤ y) ≤ Φ(y)+ Ln(y)+ o(δn)+ O(n ). Similarly, using (3.42) and symmetry arguments, one can show that

˜ (1) ˜ (2) −1 P (Tn + ∆n,m0 + ∆n,m0 >y) ≤ 1 − {Φ(y)+ Ln(y)} + o(δn)+ O(n ). Result (3.40) now follows, and hence the proof of Proposition 3.4 is complete.

Acknowledgment. The authors thank a referee and Editors for their valu- able comments and suggestions. 18 P. HALL AND Q. WANG

REFERENCES

Anscombe, F. J. (1950). Table of the hyperbolic transformation sinh−1 √x. J. Roy. Statist. Soc. Ser. A 113 228–229. MR37059 Arov, D. Z. and Bobrov, A. A. (1960). The extreme members of a sample and their role in the sum of independent variables. Teor. Veroyatnost. i Primenen. 5 415–435. MR151994 Bentkus, V., Bloznelis, M. and Gotze,¨ F. (1996). A Berry–Ess´een bound for Student’s statistic in the non-i.i.d. case. J. Theoret. Probab. 9 765–796. MR1400598 Bentkus, V. and Gotze,¨ F. (1996). The Berry–Esseen bound for Student’s statistic. Ann. Probab. 24 491–503. MR1387647 Bentkus, V., Gotze,¨ F. and van Zwet, W. R. (1997). An Edgeworth expansion for symmetric statistics. Ann. Statist. 25 851–896. MR1439326 Bickel, P. J., Gotze,¨ F. and van Zwet, W. R. (1986). The Edgeworth expansion for U-statistics of degree two. Ann. Statist. 14 1463–1484. MR868312 Bloznelis, M. and Putter, H. (1998). One term Edgeworth expansion for Student’s t statistic. In and . Proceedings of the Seventh Vilnius Conference (B. Grigelionis et al., eds.) 81–98. Bloznelis, M. and Putter, H. (2002). Second order and bootstrap approximation to Student’s statistic. Theor. Veroyatnost. i Primenen. 47 374–381. MR2003205 Bowman, K. O., Beauchamp, J. J. and Shenton, L. R. (1977). The distribution of the t-statistic under nonnormality. Internat. Statist. Rev. 45 233–242. MR488427 Chibisov, D. M. (1980). Asymptotic expansion for the distribution of statistics admitting a stochastic expansion. I. Teor. Veroyatnost. i Primenen. 25 745–756. MR595136 Chibisov, D. M. (1984). Asymptotic expansions and deficiencies of tests. In Proceedings of the International Congress of Mathematicians (Z. Ciesielski and C. Olech, eds.) 1 1063–1079. PWN, Warsaw. MR804758 Cressie, N. (1980). Relaxing assumptions in the one-sample t-test. Austral. J. Statist. 22 143–153. MR588228 Darling, D. A. (1952). The influence of the maximum term in the addition of independent random variables. Trans. Amer. Math. Soc. 73 95–107. MR48726 Dwass, M. (1966). Extremal processes. II. Illinois J. Math. 10 381–391. MR193661 Efron, B. (1969). Student’s t-test under symmetry conditions. J. Amer. Statist. Assoc. 64 1278–1302. MR251826 Egorov, V. A. (1996). On the asymptotic behavior of self-normalized sums of random variables. Teor. Veroyatnost. i Primenen. 41 643–650. MR1450081 Friedrich, K. O. (1989). A Berry–Esseen bound for functions of independent random variables. Ann. Statist. 17 170–183. MR981443 Gayen, A. K. (1949). The distribution of “Student’s” t in random samples of any size drawn from nonnormal universes. Biometrika 36 353–369. MR33496 Gayen, A. K. (1950). Significance of difference between the of two nonnormal samples. Biometrika 37 399–408. MR38036 Gayen, A. K. (1952). The inverse hyperbolic sine transformation on Student’s t for nonnormal samples. Sankhy¯a 12 105–108. MR54892 Gine,´ E., Gotze,¨ F. and Mason, D. (1997). When is the Student t-statistic asymptoti- cally standard normal? Ann. Probab. 25 1514–1531. MR1457629 Gossett, W. S. (1908). The probable error of a mean. Biometrika 6 1–25. Griffin, P. S. and Mason, D. M. (1991). On the asymptotic normality of self-normalized sums. Math. Proc. Cambridge Philos. Soc. 109 597–610. MR1094756 CONVERGENCE RATE FOR STUDENT’S T STATISTIC 19

Hahn, M. G. and Zhang, G. (1998). Distinctions between the regular and empirical central limit theorems for exchangeable random variables. Progr. Probab. 43 111–143. MR1652324 Hall, P. (1978). On the extreme terms of a sample from the domain of attraction of a stable law. J. London Math. Soc. 18 181–191. MR488231 Hall, P. (1982). Rates of Convergence in the Central Limit Theorem. Pitman, London. MR668197 Hall, P. (1984). On the influence of extremes on the rate of convergence in the central limit theorem. Ann. Probab. 12 154–172. MR723736 Hall, P. (1987). Edgeworth expansion for Student’s t statistic under minimal moment conditions. Ann. Probab. 15 920–931. MR893906 Hall, P. (1988). On the effect of random norming on the rate of convergence in the central limit theorem. Ann. Probab. 16 1265–1280. MR942767 Hall, P. and Heyde, C. C. (1980). Martingale Limit Theory and its Application. Aca- demic Press, New York. MR624435 Hall, P. and Wang, Q. (2003). Exact convergence rate and leading term in central limit theorem for Student’s t statistic. Statistics Research Reports SRR03-001, Centre for Mathematics and Its Applications, Australian National University. Available at wwwmaths. anu.edu.au/research.reports. Hyrenius, H. (1950). Distribution of “Student”–Fisher’s t in samples from compound normal functions. Biometrika 37 429–442. MR38034 Lepage, R., Woodroofe, M. and Zinn, J. (1981). Convergence to a via order statistics. Ann. Probab. 9 624–632. MR624688 Logan, B. F., Mallows, C. L., Rice, S. O. and Shepp, L. A. (1973). Limit distributions of self-normalized sums. Ann. Probab. 1 788–809. MR362449 Petrov, V. V. (1975). Sums of Independent Random Variables. Springer, New York. MR388499 Prawitz, H. (1972). Limits for a distribution, if the characteristic function is given in a finite domain. Skand. Aktuarietidskr. 55 138–154. MR375429 Putter, H. and van Zwet, W. R. (1998). Empirical Edgeworth expansions for symmet- ric statistics. Ann. Statist. 26 1540–1569. MR1647697 Resnick, S. I. (1986). Point processes, regular variation and weak convergence. Adv. in Appl. Probab. 18 66–138. MR827332 Slavova, V. V. (1985). On the Berry–Esseen bound for Student’s statistic. Stability Problems for Stochastic Models. Lecture Notes in Math. 1155 355–390. Springer, Berlin. MR825335 van Zwet, W. R. (1984). A Berry–Esseen bound for symmetric statistics. Z. Wahrsch. Verw. Gebiete 66 425–440. MR751580 Wallace, D. L. (1958). Asymptotic approximations to distributions. Ann. Math. Statist. 29 635–654. MR97109 Wang, Q. and Jing, B.-Y. (1999). An exponential nonuniform Berry–Esseen bound for self-normalized sums. Ann. Probab. 27 2068–2088. MR1742902 Wang, Q., Jing, B.-Y. and Zhao, L. C. (2000). The Berry–Esseen bound for Studentized statistics. Ann. Probab. 28 511–535. MR1756015 20 P. HALL AND Q. WANG

Centre for Mathematics Centre for Mathematics and its Applications and its Applications Mathematical Sciences Institute Mathematical Sciences Institute Australian National University Australian National University Canberra ACT 0200 Canberra ACT 0200 Australia Australia e-mail: [email protected] and School of Mathematics and Statistics University of Sydney Sydney NSW 2006 Australia e-mail: [email protected]