arXiv:1806.01431v2 [math.ST] 13 Aug 2019

A UNIFORM-IN-$P$ EDGEWORTH EXPANSION UNDER WEAK CRAMÉR CONDITIONS

Kyungchul Song
Vancouver School of Economics, University of British Columbia

Date: August 14, 2019.

ABSTRACT. This paper provides a finite sample bound for the error term in the Edgeworth expansion for a sum of independent, potentially discrete, nonlattice random vectors, using a uniform-in-$P$ version of the weaker Cramér condition in Angst and Poly (2017). This finite sample bound can be used to derive an Edgeworth expansion that is uniform over the distributions of the random vectors. Using this result, we derive a uniform-in-$P$ higher order expansion of resampling-based distributions.

KEY WORDS. Edgeworth Expansion; Normal Approximation; Bootstrap; Resampling; Weak Cramér Condition.

AMS 1991 CLASSIFICATION: 60E05, 62G20, 62E20.

I thank three anonymous referees for valuable comments on a previous version of this paper. All errors are mine. I acknowledge that this research was supported by the Social Sciences and Humanities Research Council of Canada. Corresponding address: Kyungchul Song, Vancouver School of Economics, University of British Columbia, Vancouver, BC, Canada. Email address: kysong@mail.ubc.ca.

1. Introduction

Suppose that $\{X_{i,n}\}_{i=1}^n$ is a triangular array of independent random vectors taking values in $\mathbb{R}^d$ and having mean zero. Let $\mathbf{P}_n$ be the collection of the joint distributions for $(X_{i,n})_{i=1}^n$. (One can view each joint distribution $P$ of $(X_{i,n})_{i=1}^n$ as being induced by a probability measure on a measurable space on which $(X_{i,n})_{i=1}^n$ live.) Define
$$V_{n,P} \equiv \frac{1}{n}\sum_{i=1}^n \mathrm{Var}_P(X_{i,n}),$$

where $\mathrm{Var}_P(X_{i,n}) \equiv E_P(X_{i,n}X_{i,n}')$ and $E_P$ denotes the expectation with respect to $P$. We are interested in the Edgeworth expansion of the distribution of
$$\frac{1}{\sqrt{n}} V_{n,P}^{-1/2} \sum_{i=1}^n X_{i,n},$$
which is uniform over $P \in \mathbf{P}_n$.

The Edgeworth expansion has long received attention in the literature. See Bhattacharya and Rao (2010) for a formal review of the results. The validity of the classical Edgeworth expansion is obtained under the Cramér condition, which says that the average of the characteristic functions over the sample units stays bounded below 1 in absolute value as the function is evaluated at a sequence of points increasing to infinity. The Cramér condition fails when the random variable has a support consisting of a finite number of points, and hence does not apply to resampling-based distributions such as the bootstrap.1 A standard approach to deal with this issue is to derive an Edgeworth expansion separately for the bootstrap distribution by using an expansion of the empirical characteristic functions. (See, e.g., Singh (1981) and Hall (1992).)

The main contribution of this paper is to provide a finite sample bound for the remainder term in the Edgeworth expansion for a sum of independent random vectors. Using the finite sample bound, one can immediately obtain a uniform-in-$P$ Edgeworth expansion, where the error bound for the remainder term in the expansion is uniform over a collection of probabilities. A notable feature of this Edgeworth expansion is that it admits random vectors whose distributions can be discrete and non-lattice. From this result, as shown in the paper, a uniform-in-$P$ Edgeworth expansion for various resampling-based discrete distributions follows as a corollary.
To obtain such an expansion, this paper uses a uniform-in-$P$ version of the weak Cramér conditions introduced by Angst and Poly (2017) and obtains a finite sample bound for the error term in the Edgeworth expansion by modifying the proofs of Theorems 20.1 and 20.6 of Bhattacharya and Rao (2010) and Theorem 4.3 of Angst and Poly (2017). This paper's finite sample bound reveals that we obtain a uniform-in-$P$ Edgeworth expansion whenever we have a uniform-in-$P$ bound for the moment of the same order as in the Edgeworth expansion.

A uniform-in-$P$ asymptotic approximation is naturally required in a testing set-up with a composite null hypothesis. By definition, a composite null hypothesis involves a collection of probabilities in the null hypothesis, and the size of a test in this case is its maximal

1Despite the failure of the Cramér condition, the Edgeworth expansion for lattice distributions is well known. See Bhattacharya and Rao (2010), Chapter 5. See Kolassa and McCullagh (1990) for a general result of Edgeworth expansion for lattice distributions. Booth, Hall, and Wood (1994) provide the Edgeworth expansion for discrete yet non-lattice distributions.

rejection probability over all the probabilities admitted in the null hypothesis. Asymptotic control of size requires a uniform-in-$P$ asymptotic approximation of the test statistic's distribution under the null hypothesis. One can apply the same notion to the coverage probability control of confidence intervals as well.

As for uniform-in-$P$ Gaussian approximation, one can obtain the result immediately from a Berry-Esseen bound once appropriate moments are bounded uniformly in $P$. It is worth noting that uniform-in-$P$ Gaussian approximation of empirical processes was studied by Giné and Zinn (1991) and Sheehy and Wellner (1992). There has been a growing interest in uniform-in-$P$ inference in various nonstandard set-ups in the econometrics literature, in connection with the finite sample stability of inference. (See Mikusheva (2007), Linton, Song, and Whang (2010), and Andrews and Shi (2013), among others.)

When a test based on a resampling procedure exhibits higher order asymptotic refinement properties, the uniform-in-$P$ Edgeworth expansion can be used to establish higher order asymptotic size control for the test. A related work is Hall and Jing (1995), who used a uniform-in-$P$ Edgeworth expansion to study the asymptotic behavior of confidence intervals based on a studentized t statistic. They used a certain smoothness condition for the distributions of the random vectors which excludes resampling-based distributions.

2. A Uniform-in-P Edgeworth Expansion

2.1. Uniform-in-$P$ Weak Cramér Conditions

Angst and Poly (2017) (hereafter, AP) introduced what they called a weak Cramér condition and a mean weak Cramér condition, which are weaker than the classical Cramér condition. Let us prepare notation. Let $\|\cdot\|$ be the Euclidean norm in $\mathbb{R}^d$, i.e., $\|a\|^2 = \mathrm{tr}(a'a)$. The following definition modifies the weak Cramér condition introduced by AP into a condition for a collection of probabilities.

Definition 2.1. (i) Given $b, c, R > 0$, a collection $\mathbf{P}$ of the distributions of a random vector $W$ taking values in $\mathbb{R}^d$ and having characteristic function $\varphi_{W,P}$ under $P \in \mathbf{P}$ is said to satisfy the weak Cramér condition with parameter $(b, c, R)$, if for all $t \in \mathbb{R}^d$ with $\|t\| > R$,

(1) $$\sup_{P \in \mathbf{P}} |\varphi_{W,P}(t)| \le 1 - \frac{c}{\|t\|^b}.$$

(ii) Given $b, c, R > 0$, a collection $\mathbf{P}_n$ of the joint distributions of a triangular array of random vectors $(W_{i,n})_{i=1}^n$ with each $W_{i,n}$ taking values in $\mathbb{R}^d$ and having characteristic function $\varphi_{W_{i,n},P}$ under $P \in \mathbf{P}_n$ is said to satisfy the mean weak Cramér condition with

parameter $(b, c, R)$, if for all $t \in \mathbb{R}^d$ with $\|t\| > R$,

(2) $$\sup_{P \in \mathbf{P}_n} \frac{1}{n}\sum_{i=1}^n |\varphi_{W_{i,n},P}(t)| \le 1 - \frac{c}{\|t\|^b}.$$

The definition of the weak Cramér condition modifies that of AP, making the constants $b, c, R$ independent of the probability $P$. This separation between $(b, c, R)$ and $P$ is crucial for our purpose because the probability $P$ in the context of resampling is typically the empirical measure of the data, which depends on the sample size. Since the remainder term in the Edgeworth expansion under the weak Cramér condition depends on the constants $(b, c, R)$, it is convenient to make these constants independent of the sample size.
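To see what Definition 2.1 rules in and out, consider a quick numerical sketch (an illustration only, not part of the formal development; the parameter values $(b, c, R) = (1, 0.5, 1)$ are arbitrary choices). A two-point lattice distribution violates any bound of the form (1), because its characteristic function returns to modulus one along the lattice frequencies, while a distribution with a smooth density, here $N(0,1)$, satisfies such a bound easily:

```python
import cmath
import math

def cf_two_point(t):
    # characteristic function of the lattice distribution with mass 1/2 at 0 and 1/2 at 1
    return 0.5 + 0.5 * cmath.exp(1j * t)

def cf_std_normal(t):
    # characteristic function of N(0, 1): exp(-t^2 / 2)
    return math.exp(-t * t / 2.0)

# Lattice case: |phi(2*pi*k)| = 1 for every integer k, so no bound
# |phi(t)| <= 1 - c / |t|^b can hold for all |t| > R.
lattice_vals = [abs(cf_two_point(2.0 * math.pi * k)) for k in (1, 5, 50)]

# Smooth case: check the weak Cramer bound with (b, c, R) = (1, 0.5, 1).
b, c, R = 1.0, 0.5, 1.0
normal_ok = all(cf_std_normal(t) <= 1.0 - c / t ** b for t in (1.5, 3.0, 10.0))

print(lattice_vals)  # each value is (numerically) 1
print(normal_ok)     # True
```

The same contrast reappears in Section 3, where the relevant measure is the empirical measure of a realized sample.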

2.2. A Uniform-in-$P$ Edgeworth Expansion under Weak Cramér Conditions

In this section, we present the main result that gives a finite sample bound for the error term in the Edgeworth expansion. Let us prepare notation first. Given each multi-index

$\nu = (\nu_1, \dots, \nu_d)$, with $\nu_k$ being a nonnegative integer, we let $\bar\chi_{\nu,P}$ be the average of the $\nu$-th cumulants of $V_{n,P}^{-1/2} X_{i,n}$ (over $i = 1, \dots, n$). Define for $z = (z_1, \dots, z_d) \in \mathbb{R}^d$ and a positive integer $j \ge 1$,
$$\bar\chi_{j,P}(z) = j! \sum_{|\nu| = j} \frac{\bar\chi_{\nu,P}}{\nu_1! \cdots \nu_d!} z_1^{\nu_1} \cdots z_d^{\nu_d}.$$
For each $j = 1, 2, \dots$, let

(3) $$\tilde P_j(z : \{\bar\chi_{\nu,P}\}) \equiv \sum_{m=1}^{j} \frac{1}{m!} \sum_{(j_1, \dots, j_m)}^{*} \frac{\bar\chi_{j_1+2,P}(z)}{(j_1+2)!} \frac{\bar\chi_{j_2+2,P}(z)}{(j_2+2)!} \cdots \frac{\bar\chi_{j_m+2,P}(z)}{(j_m+2)!},$$

where $\sum^*_{(j_1,\dots,j_m)}$ is the summation over all $m$-tuples $(j_1, \dots, j_m)$ of positive integers such that $\sum_{k=1}^m j_k = j$. (See (7.3) of Bhattacharya and Rao (2010) (BR, hereafter).) This polynomial has degree $3j$, the smallest order of the terms in the polynomial is $j+2$, and the coefficients in the polynomial involve only $\bar\chi_{\nu,P}$'s with $|\nu| \le j+2$. (Lemma 7.1 of BR, p.52.) Following the convention, we define the derivative operators as follows:

$$D_k \equiv \frac{\partial}{\partial x_k}, \quad k = 1, \dots, d, \quad \text{and} \quad D^\alpha \equiv D_1^{\alpha_1} \cdots D_d^{\alpha_d},$$
for $\alpha = (\alpha_1, \dots, \alpha_d) \in \{0, 1, 2, \dots\}^d$ and $D = (D_1, \dots, D_d)$. For each $P \in \mathbf{P}_n$, let $Q_{n,P}$ be the distribution of
$$\zeta_{n,P} \equiv \frac{1}{\sqrt{n}} V_{n,P}^{-1/2} \sum_{i=1}^n X_{i,n}$$

and define a signed measure $\tilde Q_{n,s,P}$ as follows: for any Borel set $A \subset \mathbb{R}^d$,
$$\tilde Q_{n,s,P}(A) \equiv \sum_{j=0}^{s-2} n^{-j/2} \int_A \tilde P_j(-D : \{\bar\chi_{\nu,P}\}) \phi(x) dx,$$
where $\phi$ is the density of the standard normal distribution on $\mathbb{R}^d$. For each $s \ge 1$, define
$$\rho_{n,s,P} \equiv \frac{1}{n}\sum_{i=1}^n E_P \|X_{i,n}\|^s.$$
We introduce notation for the modulus of continuity of functions: for any Borel measurable real function $f$ on $\mathbb{R}^d$ and any measure $\mu$, we define for $\varepsilon > 0$ and $x \in \mathbb{R}^d$,
$$\omega_f(x; \varepsilon) \equiv \sup\{|f(z) - f(y)| : z, y \in B(x; \varepsilon)\}, \quad \text{and} \quad \bar\omega_f(\varepsilon; \mu) \equiv \int \omega_f(x; \varepsilon) \mu(dx),$$
where $B(x; \varepsilon)$ is the $\varepsilon$-open ball in $\mathbb{R}^d$ centered at $x$. For any measurable function $f$ and for $s > 0$, we define
$$M_s(f) \equiv \sup_{x \in \mathbb{R}^d} \frac{|f(x)|}{1 + \|x\|^s}.$$
The theorem below is the main result of this paper, which is a modification of Theorem 4.3 of AP with the bound made explicit in finite samples. The proof follows the arguments in the proofs of Theorem 20.1 of BR and Theorem 4.3 of AP, making the bounds in the error explicit in finite samples at each step.

Theorem 2.1. Suppose that for each $P \in \mathbf{P}_n$, $V_{n,P}$ is positive definite, and the following conditions hold for some integer $s \ge 3$, and $\bar\rho, b, c, R > 0$ such that $0 < b < 2/\max\{s-3, 1\}$ and $R \ge 1/(16\bar\rho)$.

(i) $\sup_{n \ge 1} \sup_{P \in \mathbf{P}_n} \rho_{n,s,P} \le \bar\rho < \infty$.
(ii) $\mathbf{P}_n$ satisfies the mean weak Cramér condition with parameter $(b, c, R)$.

Then, there exist constants $n_0 \ge 1$ and $C > 0$ such that for any Borel measurable function $f$ on $\mathbb{R}^d$, for all $n \ge n_0$, and $P \in \mathbf{P}_n$, we have
$$\left| \int f \, d(Q_{n,P} - \tilde Q_{n,s,P}) \right| \le C n^{-(s-2)/2} (1 + M_s(f)) + C \bar\omega_f(n^{-(s-2)/2}; \Phi),$$

where $\Phi$ denotes the distribution of $N(0, I_d)$, and $n_0$ and $C$ depend only on $s$, $d$, $\bar\rho$, and $(b, c, R)$.

The theorem reveals that under the mean weak Cramér condition, the uniform-in-$P$ Edgeworth expansion of $Q_{n,P}$ is essentially obtained by the moment condition that is uniform in $P$, as in Condition (i) of Theorem 2.1.

It is worth noting that the bound in Theorem 2.1 depends on $f$ only through $M_s(f)$ and $\bar\omega_f(n^{-(s-2)/2}; \Phi)$. When $f$ is such that $|f| \le 1$, we have $M_s(f) \le 1$ and the bound depends on $f$ only through $\bar\omega_f(n^{-(s-2)/2}; \Phi)$. In particular, when we take $f$ to be the indicator function of convex subsets of $\mathbb{R}^d$, we obtain the following corollary, which is a version of Corollary 20.5 of BR (p.215) with the finite sample bound made explicit here.

Corollary 2.1. Suppose that the conditions of Theorem 2.1 hold. Then, there exist constants $n_0 \ge 1$ and $C > 0$ such that for all $n \ge n_0$, $P \in \mathbf{P}_n$, and for all convex $A \subset \mathbb{R}^d$,
$$\left| Q_{n,P}(A) - \tilde Q_{n,s,P}(A) \right| \le C n^{-(s-2)/2},$$
where $n_0$ and $C$ depend only on $s$, $d$, $\bar\rho$, and $(b, c, R)$.
The result follows because for any indicator $f(x) = 1\{x \in A\}$ on a convex set $A$, we have $\bar\omega_f(\varepsilon; \Phi) \le c(s,d)\varepsilon$ for any $\varepsilon > 0$, where $c(s,d) > 0$ is a constant that depends only on $s$ and $d$. (See Corollary 3.2 of BR, p.24.)
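For intuition about the size of the gain Corollary 2.1 quantifies, here is a scalar numerical check (illustrative only; the choice of centered $\mathrm{Exp}(1)$ summands, $n = 20$, and the evaluation point are arbitrary). With $s = 3$, the expansion adds the one-term skewness correction $-n^{-1/2}(\bar\chi_3/6)(x^2 - 1)\phi(x)$ to $\Phi(x)$, and for exponential summands the exact distribution of the standardized sum is available through the Gamma CDF:

```python
import math

def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def std_normal_pdf(x):
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def exact_cdf(x, n):
    # P((S_n - n) / sqrt(n) <= x) with S_n a sum of n i.i.d. Exp(1) variables,
    # i.e. S_n ~ Gamma(n, 1); for integer n,
    # P(S_n <= y) = 1 - exp(-y) * sum_{k=0}^{n-1} y^k / k!
    y = n + x * math.sqrt(n)
    term, acc = 1.0, 1.0
    for k in range(1, n):
        term *= y / k
        acc += term
    return 1.0 - math.exp(-y) * acc

def edgeworth_cdf(x, n, kappa3=2.0):
    # one-term (s = 3) Edgeworth expansion; kappa3 = 2 is the third
    # cumulant of a centered Exp(1) random variable
    return std_normal_cdf(x) - (kappa3 / (6.0 * math.sqrt(n))) * (x * x - 1.0) * std_normal_pdf(x)

n, x = 20, 0.5
err_normal = abs(std_normal_cdf(x) - exact_cdf(x, n))
err_edgeworth = abs(edgeworth_cdf(x, n) - exact_cdf(x, n))
print(err_normal, err_edgeworth)  # the Edgeworth error is far smaller
```

Here the normal approximation error is of order $n^{-1/2}$ while the Edgeworth error is of order $n^{-1}$, which is the pattern Theorem 2.1 makes uniform over $P$.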

3. An Edgeworth Expansion of Resampling Distributions

3.1. Preliminary Results on the Weak Cramér Condition

As noted by AP, the weak Cramér condition is useful for dealing with distributions obtained from a resampling procedure. To clarify this in our context, let us introduce some notation. For any integers $d, p \ge 1$, a number $h > 0$, and any given $d \times p$ matrix $u$ whose $j$-th column is given by the $d$-dimensional vector $u_j$, $j = 1, \dots, p$, let
$$M_{d,p}(u; h) \equiv \left\{ \sum_{j=1}^p c_j \delta_{u_j} : (c_j)_{j=1}^p \in (h, \infty)^p, \ \sum_{j=1}^p c_j = 1 \right\},$$
where $\delta_{u_j}$ denotes the Dirac measure at $u_j$. If we restrict $c_j = 1/n$ and $p = n$, the class

$M_{d,p}(u; h)$ becomes a singleton that contains

(4) $$\frac{1}{n}\sum_{j=1}^n \delta_{u_j}.$$

This is the resampling distribution (equivalently, the (nonparametric) bootstrap distribution or the empirical measure) of $(u_1, \dots, u_n)$, when $(u_1, \dots, u_n)$ is a realized value of the data $\mathbf{X}_n = (X_{1,n}, \dots, X_{n,n})$.
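The characteristic function of the resampling distribution (4) is simply the empirical characteristic function $\hat\varphi(t) = n^{-1}\sum_{j=1}^n e^{itu_j}$, so whether (4) satisfies a weak Cramér condition is a property of the realized sample. The sketch below (illustrative only; the sample size, seed, and frequency grid are arbitrary choices) shows that for a realized sample of distinct points, $|\hat\varphi(t)|$ stays strictly below one over a grid of moderate frequencies:

```python
import cmath
import random

random.seed(0)
n = 200
u = [random.gauss(0.0, 1.0) for _ in range(n)]  # realized data u_1, ..., u_n

def ecf_abs(t):
    # modulus of the characteristic function of (1/n) * sum_j delta_{u_j}
    return abs(sum(cmath.exp(1j * t * uj) for uj in u)) / n

grid = [0.5 + 0.1 * k for k in range(200)]  # frequencies t in [0.5, 20.4]
worst = max(ecf_abs(t) for t in grid)
print(worst)  # strictly below 1
```

Because the empirical measure is discrete, $|\hat\varphi(t)|$ eventually comes back near one at very large $t$; the weak Cramér condition accommodates this through the decaying bound $c/\|t\|^b$.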

Let $U_{d,p}(b, c, R, h)$ be the collection of $u$'s such that $M_{d,p}(u; h)$ does not satisfy the weak Cramér condition with parameter $(b, c, R)$. The following proposition is due to AP. (See Proposition 2.4 there.)

Proposition 3.1 (Angst and Poly (2017)). Suppose that $p \ge 3$ and $b > 1/(p-2)$. Then
$$\lambda\left( \bigcap_{c, R, h > 0} U_{d,p}(2b, c, R, h) \right) = 0,$$
where $\lambda$ is the Lebesgue measure on $\mathbb{R}^p$.

Therefore, the weak Cramér condition with parameter $(2b, c, R)$ is generically satisfied by $M_{d,p}(u; h)$ for some $(c, R, h)$ for Lebesgue almost all $u$'s. However, it is not a trivial task to apply this result to obtain a higher order refinement property of a resampling procedure

(even if the data $X_1, \dots, X_n$ are absolutely continuous with respect to Lebesgue measure). The main reason is that the error bound in the Edgeworth expansion of such a distribution involves the constants $b, c, R$, which appear in the definition of the weak Cramér condition. The constants $c, R, h$ which make the Lebesgue measure in Proposition 3.1 small may depend on the dimension $p$. In many resampling procedures, this $p$ grows to infinity as the size $n$ of the original sample does. Therefore, in order to determine the asymptotic behavior of the error bound in the Edgeworth expansion, we need to make explicit how the Lebesgue measure in Proposition 3.1 behaves as we change $c, R, h$ and $p$.

Instead of working with Lebesgue measure, we directly work with the distribution of the

data $\mathbf{X}_n = (X_{1,n}, \dots, X_{n,n})$ and, using part of the arguments in the proof of Lemma 5.1 of AP and an exponential tail bound for a U-statistic (Proposition 2.3(b) of Arcones and Giné (1993)), we obtain a finite sample bound for the probability that the empirical measure

of $\mathbf{X}_n$ does not satisfy the weak Cramér condition with parameter $(b, c, R)$. Later we use this result to obtain an Edgeworth expansion for a resampling distribution. This approach, as opposed to one that uses a result like Proposition 3.1, has the additional merit of not

having to require $\mathbf{X}_n$ to be continuous.

Let $U_n^*(b, c, R)$ be the collection of $(u_j)_{j=1}^n$, $u_j \in \mathbb{R}^d$, such that the resampling distribution (4) does not satisfy the weak Cramér condition with parameter $(b, c, R)$. Proposition 3.2 below gives an explicit finite sample bound for the probability that an i.i.d. random vector $\mathbf{X}_n = (X_{i,n})_{i=1}^n$ does not satisfy the weak Cramér condition.

Proposition 3.2. Let $\mathbf{X}_n = (X_{i,n})_{i=1}^n$ be i.i.d. across $i$'s under $P$ for any $P \in \mathbf{P}_n$. Suppose that there exists a constant $c_R > 0$ such that for all $n \ge 1$ and for all $P \in \mathbf{P}_n$,

(5) $$c_R \le \sup_{t \in \mathbb{R}^d : \|t\| > R} \frac{1}{(2\pi)^2} E_P\left[ \inf_{q \in \mathbb{Z}} \left( t'(X_{1,n} - X_{2,n}) - 2\pi q \right)^2 \right],$$

where $\mathbb{Z} \equiv \{\dots, -1, 0, 1, \dots\}$.

Then, for all $n \ge 1$,
$$\sup_{P \in \mathbf{P}_n} P\left\{ \mathbf{X}_n \in U_n^*\left(b, c_R R^b, R\right) \right\} \le \exp\left( -\frac{c_R^2 n}{2} \right).$$

The existence of the constant $c_R$ in Proposition 3.2 can be verified with lower level conditions for $\mathbf{X}_n = (X_{i,n})_{i=1}^n$. Since we can always choose $t \in \mathbb{R}^d$ in (5) such that all the entries except for the first one are zero, let us focus on the case with $d = 1$ and note that

$$E_P\left[ \inf_{q \in \mathbb{Z}} \left( t(X_{1,n} - X_{2,n}) - 2\pi q \right)^2 \right] \ge t^2 \sum_{k \in \mathbb{Z}} c_P(k; r) \ge R^2 \sum_{k \in \mathbb{Z}} c_P(k; r),$$
where $r \equiv \pi/t$, and
$$c_P(k; r) \equiv \begin{cases} E_P\left[ \left( (X_{1,n} - X_{2,n}) - rk \right)^2 1\{X_{1,n} - X_{2,n} \in (rk, r(k+1))\} \right], & \text{if } k \text{ is even} \\ E_P\left[ \left( (X_{1,n} - X_{2,n}) - r(k+1) \right)^2 1\{X_{1,n} - X_{2,n} \in (rk, r(k+1)]\} \right], & \text{if } k \text{ is odd.} \end{cases}$$
Suppose that there exist $r \in (0, \pi/R)$ and an integer $k \in \mathbb{Z}$ such that

(6) $$\inf_{n \ge 1} \inf_{P \in \mathbf{P}_n} c_P(k; r) > 0.$$

Then, the condition (5) of Proposition 3.2 holds. This latter condition is very weak. Below, we provide some sufficient conditions.
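As a quick numerical sanity check of (6) (a simulation sketch; the choices of uniform data, $r = 1$, and $k = 0$ are arbitrary, and the exact value is computed by hand for this special case), take $X_{1,n}, X_{2,n}$ i.i.d. $U[0,1]$, so that for even $k = 0$, $c_P(0;1) = E[(X_{1,n} - X_{2,n})^2 1\{X_{1,n} - X_{2,n} \in (0,1)\}] = \int_0^1 d^2(1-d)\,dd = 1/12 > 0$:

```python
import random

random.seed(1)
N = 200_000
r, k = 1.0, 0  # k even, so c_P(k; r) = E[(D - r k)^2 * 1{D in (r k, r (k + 1))}]

acc = 0.0
for _ in range(N):
    d = random.random() - random.random()  # D = X1 - X2 with X1, X2 ~ U[0, 1]
    if r * k < d < r * (k + 1):
        acc += (d - r * k) ** 2
est = acc / N

print(est)  # Monte Carlo estimate of c_P(0; 1), close to 1/12 ~ 0.0833
```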

Proposition 3.3. Suppose that $\mathbf{P}_n$ in Proposition 3.2 satisfies that there exist numbers $c_L < c_U$ and $\varepsilon > 0$ and $j \in \{1, \dots, d\}$ such that the $j$-th entry $X_{ij,n}$ of $X_{i,n}$ satisfies either of the following two conditions.
(i) $P\{X_{ij,n} = c_L\} > \varepsilon$ and $P\{X_{ij,n} = c_U\} > \varepsilon$, for all $P \in \mathbf{P}_n$.
(ii) $\inf_{x \in [c_L, c_U]} f_{j,P}(x) > \varepsilon$, for all $P \in \mathbf{P}_n$, where $f_{j,P} : \mathbb{R} \to [0, \infty)$ is a measurable map such that for any Borel $A \subset \mathbb{R}$,

$$P\{X_{ij,n} \in [c_L, c_U] \cap A\} = \int_{[c_L, c_U] \cap A} f_{j,P}(x) dx.$$
Then there exists $c_R > 0$ such that Condition (5) of Proposition 3.2 holds for all $n \ge 1$ and all $P \in \mathbf{P}_n$.

3.2. An Edgeworth Expansion of Resampling Distributions

3.2.1. A General Result. Let us illustrate how the previous results can be applied to obtain a uniform-in-$P$ Edgeworth expansion of a bootstrap distribution of a sum of independent random variables. The Edgeworth expansion of a bootstrap distribution for i.i.d. random variables is well known in the literature. (See, for example, Hall (1992).) The result in this paper is distinct for two reasons. First, the Edgeworth expansion is uniform in $P$,

where $P$ runs over a collection of the distributions of the random variable. Second, the Edgeworth expansion follows directly from the Edgeworth expansion for a sum of i.i.d. random variables, due to the use of the weak Cramér condition.

Suppose that we have a triangular array of i.i.d. random vectors $X_{i,n}$ taking values in $\mathbb{R}^d$, following distribution $P \in \mathbf{P}_n$, and the bootstrap samples $\{X_{i,n,b}^*\}_{i=1}^n$, $b = 1, \dots, B$ (i.e., i.i.d. draws from the empirical measure of $\{X_{i,n}\}_{i=1}^n$), and let the $\sigma$-field generated by $\{X_{i,n}\}_{i=1}^n$ be $\mathcal{F}_n$. Let
$$\hat V_n \equiv \frac{1}{n}\sum_{i=1}^n (X_{i,n} - \bar X_n)(X_{i,n} - \bar X_n)'$$
and let the conditional distribution of $\sqrt{n} \hat V_n^{-1/2} (\bar X_{n,b}^* - \bar X_n)$ (defined in (11)) given $\mathcal{F}_n$ be denoted by $Q_{n,\mathcal{F}_n}$, where $\bar X_{n,b}^* \equiv \frac{1}{n}\sum_{i=1}^n X_{i,n,b}^*$. Let $\chi_\nu^*$ be the $\nu$-th cumulant of the conditional distribution of $X_{i,n,b}^*$ given $\mathcal{F}_n$. Define for each Borel $A \subset \mathbb{R}^d$,

(7) $$\tilde Q_{n,\mathcal{F}_n}(A) \equiv \sum_{j=0}^{s-2} n^{-j/2} \int_A \tilde P_j(-D : \{\chi_\nu^*\}) \phi(x) dx.$$

Our purpose is to obtain a finite sample bound for the error term in the approximation of the bootstrap measure $Q_{n,\mathcal{F}_n}$ by its Edgeworth expansion $\tilde Q_{n,\mathcal{F}_n}$. Define the event
$$E_{n,0}(\bar\rho) \equiv \left\{ \frac{1}{n}\sum_{i=1}^n \|X_{i,n}\|^s \le \bar\rho \right\},$$
and let $E_{n,1}(c_1)$, $c_1 > 0$, be the event that the smallest eigenvalue of $\hat V_n$ is bounded from

below by $c_1$. Define

$$E_n(\bar\rho, c_1) \equiv E_{n,0}(\bar\rho) \cap E_{n,1}(c_1).$$

We are ready to provide an Edgeworth expansion of the bootstrap distribution $Q_{n,\mathcal{F}_n}$ in the theorem below. The result can be viewed as a variant of the form in (5.29) of Hall (1992), p.255, such that the constants are made explicit.2

Theorem 3.1. Let $s \ge 3$ be a given integer, $\bar\rho > 0$ a number, and $\mathcal{A}$ a given collection of Borel subsets of $\mathbb{R}^d$. Suppose that the $X_{i,n}$'s are such that (5) is satisfied for some $c_R > 0$ and $0 < b < 2/\max\{s-3, 1\}$.

2 Note that the constants in (5.29) of Hall (1992) potentially depend on $P \in \mathbf{P}_n$, for example, through $C_8$ that appears in (5.16) on page 248.

Then, there exist constants $n_0 \ge 1$ and $C > 0$ such that for all $n \ge n_0$, and for all $P \in \mathbf{P}_n$,

(8) $$P\left( \left\{ \sup_{A \in \mathcal{A}} \left( \left| Q_{n,\mathcal{F}_n}(A) - \tilde Q_{n,\mathcal{F}_n}(A) \right| - C \bar\omega_{1\{\cdot \in A\}}\left( n^{-\frac{s-2}{2}}; \Phi \right) \right) \le 2C n^{-\frac{s-2}{2}} \right\} \cap E_n(\bar\rho, c_1) \right) \ge P(E_n(\bar\rho, c_1)) - \exp\left( -\frac{c_R^2 n}{2} \right),$$

where $C$ and $n_0$ depend only on $s$, $d$, $\bar\rho$, $b$, $c_R$, and $R$.
Furthermore, if each set $A \in \mathcal{A}$ is convex, then (8) holds with $\bar\omega_{1\{\cdot \in A\}}(n^{-(s-2)/2}; \Phi)$ replaced by $c(s,d) n^{-\frac{s-2}{2}}$, where $c(s,d) > 0$ is a constant that depends only on $s$ and $d$.

In applying the theorem, it remains for us to determine $P(E_n^c(\bar\rho, c_1))$ and the last supremum in (8). Following the arguments on pages 245-246 of Hall (1992), it is not hard to show that $\sup_{P \in \mathbf{P}_n} P(E_n^c(\bar\rho, c_1)) \le C n^{-\lambda}$ for some constant $C > 0$ and $\lambda > (s-2)/2$ under an appropriate uniform-in-$P$ moment condition for $X_{i,n}$.
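The mechanism behind Theorem 3.1 in the scalar case can be checked by simulation (an illustrative sketch only; the data-generating distribution, $n$, and $B$ are arbitrary choices). The first correction term of $\tilde Q_{n,\mathcal{F}_n}$ is driven by the sample third cumulant, and indeed the conditional skewness of the bootstrap distribution of $\sqrt{n}(\bar X^*_{n,b} - \bar X_n)/\hat\sigma_n$ equals $\hat\mu_3/(\hat\sigma_n^3 \sqrt{n})$ exactly, which a Monte Carlo estimate reproduces:

```python
import math
import random

random.seed(2)
n, B = 60, 20_000
x = [random.expovariate(1.0) for _ in range(n)]  # original sample

xbar = sum(x) / n
sig = math.sqrt(sum((v - xbar) ** 2 for v in x) / n)  # hat-sigma_n
mu3 = sum((v - xbar) ** 3 for v in x) / n             # hat-mu_3
implied_skew = mu3 / (sig ** 3 * math.sqrt(n))        # exact conditional skewness

# bootstrap draws of sqrt(n) * (xbar* - xbar) / hat-sigma_n
t_vals = []
for _ in range(B):
    xb = [random.choice(x) for _ in range(n)]
    t_vals.append(math.sqrt(n) * (sum(xb) / n - xbar) / sig)

m = sum(t_vals) / B
s = math.sqrt(sum((t - m) ** 2 for t in t_vals) / B)
boot_skew = sum((t - m) ** 3 for t in t_vals) / (B * s ** 3)

print(implied_skew, boot_skew)  # the two values agree up to Monte Carlo error
```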

3.2.2. A Modulus of Continuity of Resampling Distributions. Our next result is related to the modulus of continuity of the resampling distribution $Q_{n,\mathcal{F}_n}$. As the distribution

$Q_{n,\mathcal{F}_n}$ is discrete, the modulus of continuity around convex sets can be established only up to $n^{-1/2}$ when we use a normal approximation of $Q_{n,\mathcal{F}_n}$. However, using a higher order approximation of $Q_{n,\mathcal{F}_n}$ by $\tilde Q_{n,\mathcal{F}_n}$, we can achieve the modulus of continuity up to the order $n^{-(s-2)/2}$, depending on the moment conditions.

Given $c_2 > 0$, define the event
$$E_{n,2}(c_2) \equiv \left\{ \max_{v = (v_1, \dots, v_d) \in \mathbb{Z}_+^d : |v| \le s} \left| \frac{1}{n}\sum_{i=1}^n \prod_{k=1}^d X_{ik,n}^{v_k} \right| \le c_2 \right\},$$
where $\mathbb{Z}_+ \equiv \{0, 1, 2, \dots\}$ and $X_{ik,n}$ denotes the $k$-th entry of $X_{i,n}$. Let
$$E_n(\bar\rho, c_1, c_2) \equiv E_n(\bar\rho, c_1) \cap E_{n,2}(c_2).$$
Under appropriate moment conditions, one can show that $P\{E_{n,2}^c(c_2)\} = O(n^{-\lambda})$ as $n \to \infty$, for some $\lambda > \delta$ and large $c_2 > 0$. (See Chapter 5 of Hall (1992) for details.)

Corollary 3.1. Suppose that the assumptions of Theorem 3.1 hold, and let $C$ and $n_0$ be as in Theorem 3.1. For any set $A \subset \mathbb{R}^d$ and $\eta > 0$, let $A^\eta$ be its $\eta$-enlargement, i.e., $A^\eta = \{x \in \mathbb{R}^d : \exists y \in A \text{ s.t. } \|x - y\| \le \eta\}$. Let $\mathcal{C}_d$ be the collection of convex sets.
Then for all $n \ge n_0$, $\eta > 0$, and $P \in \mathbf{P}_n$,
$$P\left( \left\{ \sup_{A \in \mathcal{C}_d} \left| Q_{n,\mathcal{F}_n}(A^\eta) - Q_{n,\mathcal{F}_n}(A) \right| \le C' n^{-(s-2)/2} + c_1(s,d) c_2 \eta \right\} \cap E_n(\bar\rho, c_1, c_2) \right) \ge P(E_n(\bar\rho, c_1, c_2)) - \exp\left( -\frac{c_R^2 n}{2} \right),$$
where $C' > 0$ is a constant that depends only on $C$, $s$ and $d$, and $c_1(s,d)$ is a constant that depends only on $s$ and $d$.

3.2.3. Uniform-in-$P$ Edgeworth Expansion of a t-Test Statistic. Suppose that $\{W_{i,n}\}_{i=1}^n$ is a triangular array of random variables which are i.i.d. drawn from a common distribution $P$. Let us assume that this distribution belongs to a collection $\mathbf{P}_n$ of distributions. Let $\{W_{i,n,b}^*\}_{i=1}^n$, $b = 1, \dots, B$, be the bootstrap sample drawn with replacement from the empirical distribution of $\{W_{i,n}\}_{i=1}^n$. Define the sample variance:
$$s_{n,b}^{*2} \equiv \frac{1}{n}\sum_{i=1}^n (W_{i,n,b}^* - \bar W_{n,b}^*)^2,$$
where $\bar W_{n,b}^* \equiv \frac{1}{n}\sum_{i=1}^n W_{i,n,b}^*$. Then, we are interested in the uniform-in-$P$ Edgeworth expansion of the bootstrap distribution of the following:
$$\frac{1}{s_{n,b}^* \sqrt{n}} \sum_{i=1}^n (W_{i,n,b}^* - \bar W_n),$$
where $\bar W_n \equiv \frac{1}{n}\sum_{i=1}^n W_{i,n}$. Following Chapter 5 of Hall (1992), we let

(9) $$X_{n,b}^* \equiv \left( \bar W_{n,b}^*, \ \frac{1}{n}\sum_{i=1}^n W_{i,n,b}^{*2} \right),$$

and write

(10) $$\frac{1}{s_{n,b}^* \sqrt{n}} \sum_{i=1}^n (W_{i,n,b}^* - \bar W_n) = \sqrt{n}\, g_n(X_{n,b}^*),$$

where $g_n(x) = (x_1 - \bar W_n)/\sqrt{x_2 - x_1^2}$, $x = (x_1, x_2)$, with $x_2 > x_1^2$. Our focus is on the Edgeworth expansion of the bootstrap distribution of the test statistic of the form:

(11) $$T_{n,b}^* \equiv \sqrt{n}\, g_n(X_{n,b}^*).$$

Let $X_{i,n} = (W_{i,n}, W_{i,n}^2)$ and $\mathcal{F}_n$ be the $\sigma$-field generated by $\mathbf{X}_n = (X_{i,n})_{i=1}^n$. Let $Q_{n,\mathcal{F}_n}$ be the conditional distribution of $\sqrt{n} \hat V_n^{-1/2} (X_{n,b}^* - \bar X_n)$ given $\mathcal{F}_n$. Define for any $t \in \mathbb{R}$,
$$\hat f_{n,t}(x) \equiv 1\left\{ \sqrt{n}\, g_n\left( \bar X_n + \frac{\hat V_n^{1/2} x}{\sqrt{n}} \right) \le t \right\}.$$
Then the map $Q_{n,\mathcal{F}_n}(\hat f_{n,t})$ represents the bootstrap distribution of $T_{n,b}^*$. Note that we cannot use the convex set approach to deal with the last term in (8) as before, because the set of $x$'s such that $\hat f_{n,t}(x) = 1$ is not convex. Instead, we invoke Lemma 5.3 of Hall (1992).
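The object $Q_{n,\mathcal{F}_n}(\hat f_{n,t})$ can be simulated directly from (9)-(11) by drawing bootstrap samples and recomputing the studentized statistic (the following is an illustrative sketch only; the data-generating distribution, $n$, and $B$ are arbitrary choices):

```python
import math
import random

random.seed(3)
n, B = 50, 2_000
w = [random.gauss(0.0, 1.0) for _ in range(n)]  # original sample W_1, ..., W_n
wbar = sum(w) / n

def g_n(x1, x2):
    # g_n(x) = (x_1 - wbar) / sqrt(x_2 - x_1^2), as in (10), defined for x_2 > x_1^2
    return (x1 - wbar) / math.sqrt(x2 - x1 * x1)

t_stars = []
for _ in range(B):
    wb = [random.choice(w) for _ in range(n)]   # bootstrap sample W*_{i,n,b}
    x1 = sum(wb) / n                            # first coordinate of X*_{n,b} in (9)
    x2 = sum(v * v for v in wb) / n             # second coordinate of X*_{n,b} in (9)
    t_stars.append(math.sqrt(n) * g_n(x1, x2))  # T*_{n,b} in (11)

def boot_cdf(t):
    # bootstrap distribution of T*_{n,b}, i.e., Q_{n,F_n}(f_hat_{n,t})
    return sum(1 for v in t_stars if v <= t) / B

print(boot_cdf(0.0))  # close to 1/2 for roughly symmetric data
```

Corollary 3.2 below says that, on a high-probability event, this simulated CDF is within $Cn^{-(s-2)/2}$ of its Edgeworth expansion, uniformly in $t$ and in $P$.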

We define, for a constant $c_3 > 0$,
$$E_{n,3}(c_3) \equiv \left\{ \max_{|\alpha| \le s+3} |D^\alpha g_n(\bar X_n)| \le c_3, \text{ and the largest eigenvalue of } \hat V_n \text{ is bounded by } c_3 \right\},$$
and define
$$E_n(\bar\rho, c_1, c_2, c_3) \equiv E_{n,0}(\bar\rho) \cap E_{n,1}(c_1) \cap E_{n,2}(c_2) \cap E_{n,3}(c_3).$$
Using Lemma 5.3 of Hall (1992), we can obtain the following corollary.

Corollary 3.2. Suppose that the conditions of Theorem 3.1 hold. Then, there exist constants $n_0' \ge 1$ and $C > 0$ such that for all $n \ge n_0'$ and all $P \in \mathbf{P}_n$,

(12) $$P\left( \left\{ \sup_{t \in \mathbb{R}} \left| Q_{n,\mathcal{F}_n}(\hat f_{n,t}) - \tilde Q_{n,\mathcal{F}_n}(\hat f_{n,t}) \right| \le C n^{-\frac{s-2}{2}} \right\} \cap E_n(\bar\rho, c_1, c_2, c_3) \right) \ge P(E_n(\bar\rho, c_1, c_2, c_3)) - \exp\left( -\frac{c_R^2 n}{2} \right),$$

where $C$ and $n_0'$ depend only on $s$, $d$, $\bar\rho$, $b$, $c_R$, $R$, $n_0$, and $c_1, c_2, c_3$.

As for the probability $P(E_n^c(\bar\rho, c_1, c_2, c_3))$, we can find its finite sample bound using standard arguments, and show it to be $o(n^{-(s-2)/2})$ uniformly over $P \in \mathbf{P}_n$, when there is a uniform bound for the population moments, and uniform lower and upper bounds for the eigenvalues of $V_{n,P} \equiv E_P[(X_{i,n} - E_P X_{i,n})(X_{i,n} - E_P X_{i,n})']$. Details can be furnished following the same arguments in Chapter 5 of Hall (1992).

4. Proofs

We introduce notation. For a measure or a signed measure $\mu$ and a measurable function $f$, we write
$$\mu(f) \equiv \int f d\mu,$$
for brevity. Recall that the Fourier-Stieltjes transform of a signed measure $\mu$ on $\mathbb{R}^d$ is a complex-valued function on $\mathbb{R}^d$:
$$\hat\mu(t) \equiv \int_{\mathbb{R}^d} e^{it'x} \mu(dx), \quad t \in \mathbb{R}^d,$$
where $i = \sqrt{-1}$. Recall that $\|\cdot\|$ denotes the Euclidean norm on $\mathbb{R}^d$, i.e., $\|a\|^2 = a'a$, $a \in \mathbb{R}^d$. We also define $|a| \equiv \sum_{k=1}^d |a_k|$, for any $a = (a_1, \dots, a_d)' \in \mathbb{R}^d$.

Define

(13) $$Y_{i,n} \equiv X_{i,n} 1\{\|X_{i,n}\| \le \sqrt{n}\}, \quad \text{and} \quad Z_{i,n} \equiv Y_{i,n} - E_P Y_{i,n},$$

and let $\varphi_{i,n,P}(t) \equiv E_P[\exp(it' Z_{i,n})]$ and $\varphi_{X_{i,n},P}(t) \equiv E_P[\exp(it' X_{i,n})]$. Let us begin with an auxiliary lemma which is a uniform-in-$P$ modification of Proposition 2.10 and Lemma 5.4 of AP.

Lemma 4.1. Let $\{X_{i,n}\}_{i=1}^n$ be a triangular array of independent random vectors taking values in $\mathbb{R}^d$, and let $\mathbf{P}_n$ be the collection of the joint distributions of $(X_{i,n})_{i=1}^n$ such that $\mathbf{P}_n$ satisfies the mean weak Cramér condition with parameter $(b, c, R)$ for some $b, c, R > 0$. Furthermore, assume that
$$\sup_{n \ge 1} \sup_{P \in \mathbf{P}_n} \frac{1}{n}\sum_{i=1}^n E_P[\|X_{i,n}\|] < \infty.$$
Then there exist $\varepsilon > 0$ and $n_0 \ge 1$ such that for all $0 < r < R$ and all $n \ge n_0$,
$$\sup_{P \in \mathbf{P}_n} \sup_{r \le \|u\| \le R} \frac{1}{n}\sum_{i=1}^n |\varphi_{X_{i,n},P}(u)| \le 1 - \varepsilon.$$
Furthermore, suppose that for some $s \ge 1$ and $\bar\rho > 0$,
$$\sup_{n \ge 1} \sup_{P \in \mathbf{P}_n} \frac{1}{n}\sum_{i=1}^n E_P[\|X_{i,n}\|^s] \le \bar\rho.$$
Then, there exist $\varepsilon > 0$ and $n_0 \ge 1$ such that for all $0 < r < R$ and all $n \ge n_0$,
$$\sup_{P \in \mathbf{P}_n} \sup_{r \le \|t\| \le R} \frac{1}{n}\sum_{i=1}^n |\varphi_{i,n,P}(t)| \le 1 - \varepsilon,$$
where $\varepsilon > 0$ and $n_0$ depend only on $s$, $\bar\rho$, $b$, $c$, $R$ and $r$.

Proof: The first statement is obtained by following the proof of Proposition 2.10 of AP. More specifically, suppose to the contrary that we have
$$\limsup_{n \to \infty} \sup_{P \in \mathbf{P}_n} \sup_{r \le \|u\| \le R} \frac{1}{n}\sum_{i=1}^n |\varphi_{X_{i,n},P}(u)| = 1.$$
Then by the definition of the supremum, there exists a sequence $(u_n, P_n)$ such that $r \le \|u_n\| \le R$ and $P_n \in \mathbf{P}_n$, and
$$\limsup_{n \to \infty} \sup_{P \in \mathbf{P}_n} \sup_{r \le \|u\| \le R} \frac{1}{n}\sum_{i=1}^n |\varphi_{X_{i,n},P}(u)| = \limsup_{n \to \infty} \frac{1}{n}\sum_{i=1}^n |\varphi_{X_{i,n},P_n}(u_n)|.$$
We can choose a subsequence $\{n(k)\}_{k=1}^\infty \subset \{n\}_{n \ge 1}$ such that
$$\limsup_{n \to \infty} \frac{1}{n}\sum_{i=1}^n |\varphi_{X_{i,n},P_n}(u_n)| = \lim_{k \to \infty} \frac{1}{n(k)} \sum_{i=1}^{n(k)} |\varphi_{X_{i,n(k)},P_{n(k)}}(u_{n(k)})|.$$
We can follow the rest of the proof in AP, pp.6-7, to arrive at a contradiction to the weak Cramér condition.
As for the second result, we take the supremum over $P \in \mathbf{P}_n$ on both sides of (5.17) in Lemma 5.4 of AP to obtain that
$$\sup_{P \in \mathbf{P}_n} \sup_{r \le \|t\| \le R} \frac{1}{n}\sum_{i=1}^n |\varphi_{i,n,P}(t)| \le \sup_{P \in \mathbf{P}_n} \sup_{r \le \|t\| \le R} \frac{1}{n}\sum_{i=1}^n |\varphi_{X_{i,n},P}(t)| + \frac{2\bar\rho}{n^{s/2}}.$$
The proof can be completed following the same arguments as in the proof of the first statement, using Definition 2.1. □

Proof of Theorem 2.1: We walk through the steps in the proofs of Theorem 20.1 of BR and Theorem 4.3 of AP, making the bounds in the error explicit in finite samples. Throughout the proof, we assume without loss of generality that $V_{n,P} = I_d$. While the proofs of Theorem 20.1 and Theorem 4.3 assume that the $X_{i,n}$'s are i.i.d., here we make it explicit that they are allowed to be non-identically distributed. In the proof, as in BR, we use the notation $c_j(x)$ to denote a constant that depends only on $x$, so that, for example, $c_1(s,d)$ is a constant that depends only on $s$ and $d$. In following the proof of Theorem 20.1 of BR, we take $s'$ in the theorem to be $s$ for simplicity.

Recall the definitions of $Y_{i,n}$ and $Z_{i,n}$ in (13). Let $Q'_{n,P}$ and $Q''_{n,P}$ be the distributions of $\frac{1}{\sqrt{n}}\sum_{i=1}^n Z_{i,n}$ and $\frac{1}{\sqrt{n}}\sum_{i=1}^n Y_{i,n}$, respectively. We take
$$a_n \equiv \frac{1}{\sqrt{n}}\sum_{i=1}^n E[Y_{i,n}],$$
and define

(14) $$f_{a_n}(x) \equiv f(x + a_n).$$

Define for any constant $a > 0$ and integer $s \ge 1$,
$$\bar\Delta_{n,s,P}(a) \equiv \frac{1}{n}\sum_{i=1}^n E_P\left[ \|X_{i,n}\|^s 1\{\|X_{i,n}\| \ge a\sqrt{n}\} \right].$$

Let $\delta > 0$ be such that

(15) $$\frac{s-2}{2} < \delta < \frac{1}{b} + \frac{1}{2}.$$

The existence of such a $\delta > 0$ is ensured by the condition $b < 2/\max\{s-3, 1\}$ made in the theorem.

Using the identity $Q''_{n,P}(f) = Q'_{n,P}(f_{a_n})$, we bound
$$|Q_{n,P}(f) - \tilde Q_{n,s,P}(f)| \le A_{1n} + A_{2n} + A_{3n},$$
where
$$A_{1n} \equiv |Q_{n,P}(f) - Q''_{n,P}(f)|, \qquad A_{2n} \equiv |Q'_{n,P}(f_{a_n}) - \tilde Q_{n,s,P}(f_{a_n})|, \quad \text{and} \quad A_{3n} \equiv |\tilde Q_{n,s,P}(f_{a_n}) - \tilde Q_{n,s,P}(f)|.$$

By (20.8) of BR, p.208, and (20.12) of BR, p.209, we find that (taking $s' = s$ there)

(16) $$A_{1n} + A_{3n} \le c_1(s,d) M_s(f) n^{-(s-2)/2} \bar\Delta_{n,s,P}(1) \le c_1(s,d) M_s(f) n^{-(s-2)/2} \bar\rho.$$

Let us focus on finding a bound for $A_{2n}$. Let $D_{n,P} \equiv \frac{1}{n}\sum_{i=1}^n \mathrm{Var}_P(Z_{i,n})$ and define a signed measure $\tilde Q'_{n,s,P}$ as

(17) $$\tilde Q'_{n,s,P}(A) \equiv \sum_{j=0}^{s-2} n^{-j/2} \int_A \tilde P_j(-D; \{\bar\chi_{\nu,P}\}) \phi_{0,D_{n,P}}(x) dx, \quad \text{for any Borel set } A,$$

where $\phi_{0,D_{n,P}}$ denotes the density of $N(0, D_{n,P})$. We bound

(18) $$A_{2n} \le |\tilde Q'_{n,s,P}(f_{a_n}) - \tilde Q_{n,s,P}(f_{a_n})| + A'_{2n} \le c_2(s,d) M_s(f) n^{-(s-2)/2} \bar\Delta_{n,s,P}(1) + A'_{2n},$$

where $A'_{2n} \equiv |Q'_{n,P}(f_{a_n}) - \tilde Q'_{n,s,P}(f_{a_n})|$, and the last inequality comes from (20.13) of BR, p.209 (again, taking $s' = s$). Let us focus on $A'_{2n}$. Define
$$H_{n,P} \equiv Q'_{n,P} - \tilde Q'_{n,s,P}.$$

For any positive number $0 < \varepsilon < 1$, we define $K_\varepsilon$ to be a probability measure such that
$$K_\varepsilon\left( \{x \in \mathbb{R}^d : \|x\| < \varepsilon\} \right) = 1,$$
and
$$|D^\alpha \hat K_\varepsilon(t)| \le c_3(s,d) \varepsilon^{|\alpha|} \exp\left( -(\varepsilon\|t\|)^{1/2} \right),$$

for all $t \in \mathbb{R}^d$ and all $|\alpha| \le s + d + 1$. Then, by (20.17) of BR, p.210, we find that

(19) $$|H_{n,P}(f_{a_n})| \le M_s(f) \int \left( 1 + (\|x\| + \varepsilon + \|a_n\|)^s \right) |H_{n,P} * K_\varepsilon|(dx) + \bar\omega_{f_{a_n}}\left( 2\varepsilon; |\tilde Q'_{n,s+d,P}| \right) \equiv B_{1n} + B_{2n}, \text{ say.}$$

From (20.19) of BR, p.210, we bound

$$B_{1n} \le c_4(s,d) \max_{0 \le |\beta| \le s+d+1} \int \left| D^\beta (\hat H_{n,P} \hat K_\varepsilon)(t) \right| dt.$$
We write

$$D^\beta (\hat H_{n,P} \hat K_\varepsilon) = \sum_{0 \le \alpha \le \beta} c_5(\alpha, \beta) (D^{\beta-\alpha} \hat H_{n,P})(D^\alpha \hat K_\varepsilon).$$
Now, let us focus on finding a bound for

(20) $$\int \left| (D^{\beta-\alpha} \hat H_{n,P}(t))(D^\alpha \hat K_\varepsilon(t)) \right| dt.$$

For any $d \times d$ matrix $A$, we define the operator norm $\|A\| \equiv \sup_{x \in \mathbb{R}^d : \|x\| \le 1} \|Ax\|$ as usual. When $A$ is symmetric, we have $\|A^2\| = \|A\|^2$. Define, for some constant $c_6(s,d)$ which depends only on $s$ and $d$,

(21) $$A_n \equiv \frac{c_6(s,d) n^{1/2}}{\left( \frac{1}{n}\sum_{i=1}^n E_P \|D_{n,P}^{-1/2} Z_{i,n}\|^{s+d+1} \right)^{1/(s+d-1)} \Lambda_n^{1/2}},$$

where $\Lambda_n$ denotes the maximum eigenvalue of $D_{n,P}$. We bound the integral in (20) by

(22) $$\int_{\|t\| \le A_n} \left| (D^{\beta-\alpha} \hat H_{n,P}(t))(D^\alpha \hat K_\varepsilon(t)) \right| dt + \int_{\|t\| > A_n} \left| (D^{\beta-\alpha} \hat H_{n,P}(t))(D^\alpha \hat K_\varepsilon(t)) \right| dt.$$

By (20.21) of BR, p.210, we can choose $c_6(s,d)$ in the definition of $A_n$ so that the leading term in (22) is bounded by

(23) $$c_7(s,d) n^{-(s+d-1)/2} \frac{1}{n}\sum_{i=1}^n E_P\left[ \|D_{n,P}^{-1/2} Z_{i,n}\|^{s+d+1} \right] \le c_7(s,d) n^{-(s+d-1)/2} \|D_{n,P}^{-1/2}\|^{s+d+1} \frac{1}{n}\sum_{i=1}^n E_P\left[ \|Z_{i,n}\|^{s+d+1} \right] = c_7(s,d) n^{-(s+d-1)/2} \|D_{n,P}^{-1}\|^{(s+d+1)/2} \frac{1}{n}\sum_{i=1}^n E_P\left[ \|Z_{i,n}\|^{s+d+1} \right].$$

From (14.23) and (14.24) of BR, p.125, our assumption that $V_{n,P} = I_d$, and the fact that $D_{n,P}$ is symmetric with $\|D_{n,P}\|$ denoting the operator norm of $D_{n,P}$, we find that

(24) $$\|D_{n,P} - I_d\| \equiv \sup_{\|t\| \le 1} |t'(D_{n,P} - I_d)t| \le 2d n^{-(s-2)/2} \bar\Delta_{n,s,P}(1) \le 2d n^{-(s-2)/2} \bar\rho,$$

and

(25) $$\|D_{n,P}^{-1}\| = \left\| (I_d + (D_{n,P} - I_d))^{-1} \right\| \le \frac{1}{1 - \|D_{n,P} - I_d\|} \le \frac{1}{1 - 2d n^{-(s-2)/2} \bar\Delta_{n,s,P}(1)}.$$

As noted in (20.23) of BR, p.211,

(26) $$E_P \|Z_{i,n}\|^{s+d+1} \le 2^{s+d+1} E_P \|Y_{i,n}\|^{s+d+1}.$$

For any $\xi \in (0, 1]$, we can bound

(27) $$\frac{1}{n}\sum_{i=1}^n E_P \|Y_{i,n}\|^{s+d+1} \le (\xi n^{1/2})^{d+1} \frac{1}{n}\sum_{i=1}^n E_P\left[ 1\{\|X_{i,n}\| \le \xi n^{1/2}\} \|X_{i,n}\|^s \right] + n^{(d+1)/2} \frac{1}{n}\sum_{i=1}^n E_P\left[ 1\{\xi n^{1/2} < \|X_{i,n}\| \le n^{1/2}\} \|X_{i,n}\|^s \right] \le n^{(d+1)/2}\left( \xi^{d+1} \rho_{n,s,P} + \bar\Delta_{n,s,P}(\xi) \right) \le 2 n^{(d+1)/2} \bar\rho.$$

This gives a bound for $\frac{1}{n}\sum_{i=1}^n E_P \|Z_{i,n}\|^{s+d+1}$ through (26). Using this bound and (25), we find that for any $\xi > 0$,

(28) $$\frac{1}{n}\sum_{i=1}^n E_P\left[ \|D_{n,P}^{-1/2} Z_{i,n}\|^{s+d+1} \right] \le \frac{2^{s+d+1} n^{(d+1)/2} \left( \xi^{d+1} \rho_{n,s,P} + \bar\Delta_{n,s,P}(\xi) \right)}{\left( 1 - 2d n^{-(s-2)/2} \bar\Delta_{n,s,P}(1) \right)^{(s+d+1)/2}},$$

and from (23), we obtain that

(29) $$\int_{\|t\| \le A_n} \left| (D^{\beta-\alpha} \hat H_{n,P}(t))(D^\alpha \hat K_\varepsilon(t)) \right| dt \le \frac{c_7'(s,d) n^{-(s-2)/2} \left( \xi^{d+1} \rho_{n,s,P} + \bar\Delta_{n,s,P}(\xi) \right)}{\left( 1 - 2d n^{-(s-2)/2} \bar\Delta_{n,s,P}(1) \right)^{(s+d+1)/2}},$$

where $c_7'(s,d) = c_7(s,d) 2^{s+d+1}$.
Let us turn to the second integral in (22). Note first that

$$\Lambda_n \le \|D_{n,P}\| \le \|D_{n,P} - I_d\| + \|I_d\| \le 1 + 2d n^{-(s-2)/2} \bar\Delta_{n,s,P}(1) \le 1 + 2d n^{-(s-2)/2} \bar\rho,$$

by (24). Using this and noting (28), we find from the definition of An in (21) that

(30) \[
A_n \ge \frac{c_6(s,d)\,n^{(s-2)/(2(s+d-1))}\,\big(1-2d\,n^{-(s-2)/2}\bar\Delta_{n,s,P}(1)\big)^{(s+d+1)/(2(s+d-1))}}{\big(\xi^{d+1}\rho_{n,s,P}+\bar\Delta_{n,s,P}(\xi)\big)^{1/(s+d-1)}\,\big(1+2d\,n^{-(s-2)/2}\bar\Delta_{n,s,P}(1)\big)^{1/2}}
\ge \frac{c_6(s,d)\,n^{(s-2)/(2(s+d-1))}\,\big(1-2d\,n^{-(s-2)/2}\bar\rho\big)^{(s+d+1)/(2(s+d-1))}}{(2\bar\rho)^{1/(s+d-1)}\,\big(1+2d\,n^{-(s-2)/2}\bar\rho\big)^{1/2}}.
\]

We take
\[q_n \equiv \frac{\sqrt{n}}{16\bar\rho},\]
and, as in (20.26) of BR, p.211, let us bound

\[\int_{\|t\|>A_n}\big(D^{\beta-\alpha}\hat H_{n,P}(t)\big)\big(D^{\alpha}\hat K_\varepsilon(t)\big)\,dt \le I_1+I_2+I_3,\]

where
\[
I_1 \equiv \int_{\|t\|>q_n}\big(D^{\beta-\alpha}\hat Q'_{n,P}(t)\big)\big(D^{\alpha}\hat K_\varepsilon(t)\big)\,dt,
\]
\[
I_2 \equiv c_8(s,d)\int_{\|t\|>A_n}\big(1+\|t\|^{|\beta-\alpha|+5}\big)\exp\Big(-\frac{\|t\|^2}{24}\Big)\,dt,\ \text{and}
\]
\[
I_3 \equiv \int_{\|t\|>A_n}D^{\beta-\alpha}\Big[\sum_{j=0}^{s+d-2}n^{-j/2}\tilde P_j\big(it;\{\bar\chi_{\nu,P}\}\big)\exp\Big(-\frac{1}{2}t'D_{n,P}t\Big)\Big]\,dt.
\]

Let us deal with $I_2$ first. Choose any large number $w\ge 1$. We bound
\[
I_2 \le c_8(s,d)\,A_n^{-w}\int_{\|t\|>A_n}\big(1+\|t\|^{|\beta-\alpha|+5}\big)\|t\|^{w}\exp\Big(-\frac{\|t\|^2}{24}\Big)\,dt.
\]
The last integral is bounded for any fixed $w\ge 1$. In the light of the lower bound for $A_n$ in (30), we find that for any $w\ge 1$, there exist a constant $C$ and a number $n_{0,1}\ge 1$ which depend only on $(s,d,\bar\rho,w)$ such that for all $n\ge n_{0,1}$,
\[
I_2 \le C n^{-(s-2)w/(2(s+d-1))}.
\]

Since the integrand in $I_3$ involves $\exp(-t'D_{n,P}t/2)$, we obtain the same conclusion for $I_3$. The rest of the proof focuses on finding a finite sample bound for $I_1$. In doing so, we switch to the proof of Theorem 4.3 of AP. By (5.15) of AP, p.19, we bound

\[
I_1 \le c_9(s,d)\,\varepsilon^{|\alpha|}\,2^{|\beta-\alpha|}\sum_{\gamma\in(\mathbf{Z}_+^d)^n:|\gamma|=|\beta-\alpha|}\binom{\beta-\alpha}{\gamma}J_\gamma(n,\varepsilon),
\]
where
\[
J_\gamma(n,\varepsilon) \equiv n^{d/2}\int_{\|t\|\ge q_n/\sqrt{n}}\prod_{i=1,\gamma_i=0}^{n}\big|\phi_{i,n,P}(t)\big|\,e^{-(\sqrt{n}\varepsilon\|t\|)^{1/2}}\,dt,
\]
and

\[
\binom{\beta-\alpha}{\gamma} = \frac{|\beta-\alpha|!}{\prod_{i=1}^n\gamma_i!}.
\]
We let
\[r \equiv \frac{1}{16\bar\rho},\]
so that $q_n/\sqrt{n}=r$ for all $n\ge 1$. With this choice of $r>0$, from (5.16) of AP, p.20, we deduce that for all $n\ge n_1$,
\[
J_\gamma(n,\varepsilon) \le n^{d/2}\exp\Big((n-|\gamma|)\log\frac{n}{n-|\gamma|}\Big)K_\gamma(n,\varepsilon),
\]
where
\[
K_\gamma(n,\varepsilon) \equiv \int_{\|t\|\ge r}\exp\Big((n-|\gamma|)\log\Big(\frac{1}{n}\sum_{i=1}^n\big|\phi_{i,n,P}(t)\big|\Big)\Big)e^{-(\varepsilon\sqrt{n}\|t\|)^{1/2}}\,dt.
\]
By Lemma 4.1, there exists $\bar n\ge 1$ such that for all $n\ge \bar n$ and all $t\in\mathbf{R}^d$ satisfying $R<\|t\|\le (c/(4\bar\rho))^{1/b}n^{s/(2b)}$,
\[
\sup_{P\in\mathbf{P}_n}\frac{1}{n}\sum_{i=1}^n\big|\phi_{i,n,P}(t)\big| \le 1-\frac{c}{2\|t\|^{b}}.
\]
Take $n_0'=\max\{\bar n,n_{0,1}\}$, and write
\[
K_\gamma(n,\varepsilon)=K_\gamma^1(n,\varepsilon)+K_\gamma^2(n,\varepsilon)+K_\gamma^3(n,\varepsilon),
\]
where
\[
K_\gamma^1(n,\varepsilon) \equiv \int_{r\le\|t\|\le R}\exp\Big((n-|\gamma|)\log\Big(\frac{1}{n}\sum_{i=1}^n\big|\phi_{i,n,P}(t)\big|\Big)\Big)e^{-(\varepsilon\sqrt{n}\|t\|)^{1/2}}\,dt,
\]
\[
K_\gamma^2(n,\varepsilon) \equiv \int_{R<\|t\|\le(c/(4\bar\rho))^{1/b}n^{s/(2b)}}\exp\Big((n-|\gamma|)\log\Big(\frac{1}{n}\sum_{i=1}^n\big|\phi_{i,n,P}(t)\big|\Big)\Big)e^{-(\varepsilon\sqrt{n}\|t\|)^{1/2}}\,dt,\ \text{and}
\]
\[
K_\gamma^3(n,\varepsilon) \equiv \int_{(c/(4\bar\rho))^{1/b}n^{s/(2b)}<\|t\|}\exp\Big((n-|\gamma|)\log\Big(\frac{1}{n}\sum_{i=1}^n\big|\phi_{i,n,P}(t)\big|\Big)\Big)e^{-(\varepsilon\sqrt{n}\|t\|)^{1/2}}\,dt.
\]

By following the proof of AP, dealing with $K_\gamma^1(n,\varepsilon)$, $K_\gamma^2(n,\varepsilon)$, and $K_\gamma^3(n,\varepsilon)$ one by one, we can show that

\[K_\gamma^1(n,\varepsilon)+K_\gamma^2(n,\varepsilon)+K_\gamma^3(n,\varepsilon)\]

converges to zero faster than any polynomial rate in $n$ uniformly over $P\in\mathbf{P}_n$ even when we take $\varepsilon=n^{-\delta}/2$. Thus we conclude that for any $w\ge 1$, there exist a constant $C$ and a number $n_{0,2}\ge 1$ which depend only on $(s,d,\bar\rho,b,c,R,\delta,w)$ such that for all $n\ge n_{0,2}$,
\[
I_1+I_2+I_3 \le Cn^{-w}.
\]
Thus, there exists $n_0\ge 1$ which depends only on $(s,d,\bar\rho,b,c,R,\delta,w)$ such that for all $n\ge n_0$, the first integral in (22) dominates the second one. Collecting the bounds in (16), (18), (19), and (29), we find that (taking $\varepsilon=n^{-\delta}/2$)

(31) \[\big|Q_{n,P}(f)-\tilde Q_{n,s,P}(f)\big| \le c_{12}(s,d)M_s(f)\,n^{-(s-2)/2}\bar\rho+\bar\omega_{f,a_n}\big(n^{-\delta};|\tilde Q'_{n,s+d,P}|\big).\]

Recall the definition of $\tilde Q'_{n,s,P}$ in (17) and define

\[
P_j\big(-\Phi_{0,D_{n,P}};\{\bar\chi_{\nu,P}\}\big)(A) \equiv \int_A \tilde P_j\big(-D;\{\bar\chi_{\nu,P}\}\big)\phi_{0,D_{n,P}}(x)\,dx,\ \text{for any Borel set}\ A,
\]

where $\Phi_{0,D_{n,P}}$ denotes the distribution of $N(0,D_{n,P})$. As for the last term in (31), we follow (20.36) and (20.37) of BR, p.213, to find that for absolute constants $C,C'>0$, for any $0\le j\le s-2$,
\[
\bar\omega_{f,a_n}\big(n^{-\delta};n^{-j/2}\big|P_j\big(-\Phi_{0,D_{n,P}};\{\bar\chi_{\nu,P}\}\big)\big|\big)
\le C\rho_{n,s,P}M_s(f)\Big(\int\big(1+\|x\|^{s'}\big)\big|\phi_{a_n,D_{n,P}}(x)-\phi(x)\big|\,dx+\bar\omega_f\big(n^{-\delta};\Phi\big)\Big)
+Cn^{-j/2}\rho_{n,s,P}\int_{\|x\|>n^{1/6}}\big(1+\|x\|^{3j+s'}\big)\phi_{a_n,D_{n,P}}(x)\,dx
\le c_{13}(s,d,j)M_s(f)\,n^{-(s-2)/2}\bar\Delta_{n,s,P}(1)+C'\rho_{n,s,P}\,\bar\omega_f\big(n^{-\delta};\Phi\big),
\]
where the last inequality uses Lemma 14.6 of BR, p.131. (Here $\phi_{a_n,D_{n,P}}$ denotes the density of $N(a_n,D_{n,P})$.) Also, from (20.39) of BR on p.213, for $s-1\le j\le s+d-2$,
\[
\bar\omega_{f,a_n}\big(n^{-\delta};n^{-j/2}\big|P_j\big(-\Phi_{0,D_{n,P}};\{\bar\chi_{\nu,P}\}\big)\big|\big)
\le c_{14}(s,d,j)\,n^{-j/2}\,\frac{1}{n}\sum_{i=1}^n E_P\|Z_{i,n}\|^{j+2}\,M_s(f)\int\big(1+\|x\|^{3j+s'}\big)\phi_{a_n,D_{n,P}}(x)\,dx.
\]

From (26) and (27), the last term is bounded by

\[
c_{15}(s,d,j)M_s(f)\,n^{((j+2-s)-j)/2}\big(\xi^{j+2-s}\rho_{n,s,P}+\bar\Delta_{n,s,P}(\xi)\big)
\le c_{16}(s,d,j)M_s(f)\,n^{-(s-2)/2}\big(\xi\rho_{n,s,P}+\bar\Delta_{n,s,P}(\xi)\big),
\]
because $0<\xi\le 1$. Hence
\[
\bar\omega_{f,a_n}\big(n^{-\delta};|\tilde Q'_{n,s+d,P}|\big)
\le c_{17}(s,d)M_s(f)\,n^{-(s-2)/2}\big(\xi\rho_{n,s,P}+\bar\Delta_{n,s,P}(\xi)+\bar\Delta_{n,s,P}(1)\big)+C'\rho_{n,s,P}\,\bar\omega_f\big(n^{-\delta};\Phi\big)
\le 3c_{17}(s,d)M_s(f)\,n^{-(s-2)/2}\bar\rho+C'\bar\rho\,\bar\omega_f\big(n^{-\delta};\Phi\big).
\]
Combining this with (31), we obtain that for all $n\ge n_0$,
\[
\int f\,d\big(Q_{n,P}-\tilde Q_{n,s,P}\big) \le Cn^{-(s-2)/2}\big(1+M_s(f)\big)+C\bar\omega_f\big(n^{-\delta};\Phi\big),
\]

for constants $n_0$ and $C$ that depend only on $s$, $d$, $\bar\rho$, and $(b,c,R)$. The desired bound follows by (15). □

We turn to proving Proposition 3.2. Define

\[
\xi(u_i,u_j;t) \equiv \inf_{q\in\mathbf{Z}}\big(t'(u_i-u_j)-2\pi q\big)^2,
\]
and let for $t\in\mathbf{R}^d$,
\[
A_n(c,r;t) \equiv \Big\{(u_i)_{i=1}^n\in\mathbf{R}^{nd}:\frac{1}{\pi^2 n(n-1)}\sum_{i\ne j}\xi(u_i,u_j;t)\,r^{b} \le c\Big\}.
\]

Lemma 4.2. For any $\tilde t\in\mathbf{R}^d$ such that $\|\tilde t\|>R$,
\[
\bigcup_{t\in\mathbf{R}^d:\|t\|>R}A_n(c,\|t\|;t) \subset A_n(c,R;\tilde t).
\]

Proof: Take $t,\tilde t\in\mathbf{R}^d$ such that $\|t\|,\|\tilde t\|>R$, and note that
\[
t'(u_i-u_j) = \frac{\tilde t'\tilde t\,t'(u_i-u_j)}{\|\tilde t\|^2} = \tilde t'(\tilde u_i-\tilde u_j),
\]
where $\tilde u_i \equiv \tilde t\,t'u_i/\|\tilde t\|^2$. As $t'u_i$ runs in $\mathbf{R}$, so does $\tilde t'\tilde u_i$. Hence for any $t_1,t_2\in\mathbf{R}^d$ such that $\|t_1\|,\|t_2\|>R$,
\[
A_n(c,\|t_1\|;t_1) = A_n(c,\|t_1\|;t_2).
\]

Furthermore, for any $a,a'>0$ such that $a\le a'$,
\[
A_n(c,a';\tilde t) \subset A_n(c,a;\tilde t).
\]

Thus

\[
\bigcup_{t\in\mathbf{R}^d:\|t\|>R}A_n(c,\|t\|;t) = \bigcup_{t\in\mathbf{R}^d:\|t\|>R}A_n(c,\|t\|;\tilde t) \subset A_n(c,R;\tilde t).\ \Box
\]

Proof of Proposition 3.2: Let $W$ be a random vector distributed as $n^{-1}\sum_{i=1}^n\delta_{u_i}$. From the proof of Lemma 5.1 of AP, we observe that for all $t\in\mathbf{R}^d$,
\[
1-\big|E[e^{it'W}]\big|^2 \ge \frac{2}{\pi^2 n^2}\sum_{i\ne j}\xi(u_i,u_j;t)
= \frac{2n(n-1)}{\pi^2 n^2}\cdot\frac{1}{n(n-1)}\sum_{i\ne j}\xi(u_i,u_j;t)
\ge \frac{2}{\pi^2}\cdot\frac{1}{n(n-1)}\sum_{i\ne j}\xi(u_i,u_j;t).
\]
Therefore,
\[
1-\big|E[e^{it'W}]\big| \ge \frac{2}{\pi^2\big(1+\big|E[e^{it'W}]\big|\big)}\cdot\frac{1}{n(n-1)}\sum_{i\ne j}\xi(u_i,u_j;t)
\ge \frac{1}{\pi^2 n(n-1)}\sum_{i\ne j}\xi(u_i,u_j;t).
\]
This means that if for all $\|t\|>R$,
\[
\frac{1}{\pi^2 n(n-1)}\sum_{i\ne j}\xi(u_i,u_j;t) \ge \frac{c}{\|t\|^{b}},
\]
then $n^{-1}\sum_{i=1}^n\delta_{u_i}$ satisfies the weak Cramér condition with parameter $(b,c,R)$. Hence for any $c>0$, and any $\|\tilde t\|>R$,
\[
U_n^*(b,c,R) \subset \bigcup_{t\in\mathbf{R}^d:\|t\|>R}A_n(c,\|t\|;t) \subset A_n(c,R;\tilde t),
\]
by Lemma 4.2. Let for $t\in\mathbf{R}^d$,
\[
c_P(t) \equiv \frac{1}{2\pi^2}E_P\Big[\inf_{q\in\mathbf{Z}}\big(t'(X_{1,n}-X_{2,n})-2\pi q\big)^2\Big].
\]
We take $\tilde t\in\mathbf{R}^d$ such that $\|\tilde t\|>R$ and for all $n\ge 1$ and $P\in\mathbf{P}_n$,

(32) \[c_R \le c_P(\tilde t).\]

Such a $\tilde t$ exists due to condition (5). We find that

(33) \[U_n^*\big(b,c_P(\tilde t)R^b,R\big) \subset A_n\big(c_P(\tilde t)R^b,R;\tilde t\big).\]

We write $P\big\{X_n\in A_n\big(c_P(\tilde t)R^b,R;\tilde t\big)\big\}$ as

(34) \[
P\Big\{\frac{1}{\pi^2 n(n-1)}\sum_{i\ne j}\big(\xi_{ij}(\tilde t)-E[\xi_{ij}(\tilde t)]\big)R^b \le -c_P(\tilde t)R^b\Big\}
= P\Big\{\frac{1}{n(n-1)}\sum_{i\ne j}\big(\xi_{ij}(\tilde t)-E[\xi_{ij}(\tilde t)]\big) \le -\pi^2 c_P(\tilde t)\Big\},
\]
where $\xi_{ij}(t) \equiv \xi(X_{i,n},X_{j,n};t)$. Note that
\[
\sup_{u_1,u_2\in\mathbf{R}^d}\xi(u_1,u_2;\tilde t) \le \pi^2.
\]
By Proposition 2.3(b) of Arcones and Giné (1993), the last probability in (34) is less than
\[
\exp\Big(-\frac{n\big(\pi^2 c_P(\tilde t)\big)^2}{2\pi^4}\Big) \le \exp\Big(-\frac{nc_P^2(\tilde t)}{2}\Big).
\]
We conclude that
\[
P\big\{X_n\in U_n^*\big(b,c_P(\tilde t)R^b,R\big)\big\} \le \exp\Big(-\frac{nc_P^2(\tilde t)}{2}\Big).
\]
Note that $U_n^*(b,c,R)$ is increasing in $c$ and $\exp(-nc^2/2)$ is decreasing in $c$. Thus we obtain the desired result from (32). □
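To make the quantities in this proof concrete, here is a minimal numerical sketch (an illustration only, not part of the paper; the function names are mine) of $\xi(u_i,u_j;t)=\inf_{q\in\mathbf{Z}}(t'(u_i-u_j)-2\pi q)^2$ and of the statistic $\frac{1}{\pi^2 n(n-1)}\sum_{i\ne j}\xi(u_i,u_j;t)$ that the event $A_n(c,r;t)$ thresholds:

```python
import math

def xi(u_i, u_j, t):
    """xi(u_i, u_j; t) = inf over integers q of (t'(u_i - u_j) - 2*pi*q)^2,
    i.e. the squared distance of t'(u_i - u_j) from the lattice 2*pi*Z."""
    s = sum(tk * (a - b) for tk, a, b in zip(t, u_i, u_j))
    q = round(s / (2 * math.pi))  # nearest lattice point attains the infimum
    return (s - 2 * math.pi * q) ** 2

def weak_cramer_statistic(points, t):
    """(1 / (pi^2 n(n-1))) * sum over ordered pairs i != j of xi(u_i, u_j; t),
    the quantity whose smallness defines the failure set A_n(c, r; t)."""
    n = len(points)
    total = sum(xi(points[i], points[j], t)
                for i in range(n) for j in range(n) if i != j)
    return total / (math.pi ** 2 * n * (n - 1))
```

For instance, two points separated by exactly $2\pi$ along the direction $t$ contribute $\xi = 0$ (the lattice point $q=1$ is hit exactly), so a sample concentrated on $2\pi\mathbf{Z}$ makes the statistic vanish, which is precisely the degenerate case the weak Cramér condition rules out.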

Proof of Proposition 3.3: We show (6) with $X_{i,n}$ there replaced by $X_{ij,n}$. Suppose that Condition (i) holds. We take $r\in(0,\pi/R)$ and an integer $k$ so that $rk<a_{j_2}-a_{j_1}<r(k+1)$. Then

\[
c_P(k;r) \ge \varepsilon^2\min\big\{\big(a_{j_2}-a_{j_1}-rk\big)^2,\ \big(a_{j_2}-a_{j_1}-r(k+1)\big)^2\big\},
\]
fulfilling condition (6) for $X_{ij,n}$.

Suppose now that Condition (ii) holds. We take $a'<a''<b''<b'$ and $r\in(0,\pi/R)$ and an integer $k$ such that $a\le a'<a''<b''<b'\le b$ and $0<rk<b''-a''<rk+r/2<b'-a'<r(k+1)$. Then, whenever $X_{2,n}\in[a',a'']$ and $X_{1,n}\in[b'',b']$, we have $b''-a''\le X_{1,n}-X_{2,n}\le b'-a'$. Hence
\[
c_P(k;r) \ge P\{X_{2,n}\in[a',a'']\}\,P\{X_{1,n}\in[b'',b']\}\min\big\{\big(b'-a'-r(k+1)\big)^2,\ \big(b''-a''-rk\big)^2\big\}
\ge \varepsilon^2(a''-a')(b'-b'')\min\big\{\big(b'-a'-r(k+1)\big)^2,\ \big(b''-a''-rk\big)^2\big\}.
\]

Therefore, again, condition (6) is fulfilled for $X_{ij,n}$. □

Proof of Theorem 3.1: Let $U$ be the set of $u=(u_j)_{j=1}^n\in\mathbf{R}^{2dn}$ such that the discrete measure $\frac{1}{n}\sum_{j=1}^n\delta_{u_j}$ (with $\delta_{u_j}$ denoting the Dirac measure at $u_j\in\mathbf{R}^{2d}$) fails to satisfy the weak Cramér condition in Definition 2.1 with parameter $(b,c_R R^b,R)$. We define
\[
\tilde E_{n,0} \equiv \big\{(X_{i,n})_{i=1}^n\in U\big\}.
\]
By Proposition 3.2,
\[
\sup_{P\in\mathbf{P}_n}P\big(\tilde E_{n,0}\big) \le \exp\Big(-\frac{c_R^2 n}{2}\Big),
\]
which decreases at a rate faster than any polynomial rate in $n$. Now, we focus on the event $\tilde E_{n,0}^c\cap E_n(\bar\rho,c_1)$. Note that by Theorem 2.1 (noting that $\delta>(s-2)/2$) we have
\[
\big|Q_{n,F_n}(A)-\tilde Q_{n,F_n}(A)\big| \le 2Cn^{-(s-2)/2}+C\bar\omega_{1\{\cdot\in A\}}\big(n^{-(s-2)/2};\Phi\big),
\]

where $C>0$ is the same constant that appears in Theorem 2.1. Thus we obtain the first statement of the theorem. The second statement follows by Corollary 3.2 of BR, p.24. □

Proof of Corollary 3.1: First, we bound

\[
\big|Q_{n,F_n}(A^{\eta})-Q_{n,F_n}(A)\big| \le A_{1n}+A_{2n}+A_{3n},
\]
where

\[
A_{1n} \equiv \big|Q_{n,F_n}(A^{\eta})-\tilde Q_{n,F_n}(A^{\eta})\big|,
\]
\[
A_{2n} \equiv \big|Q_{n,F_n}(A)-\tilde Q_{n,F_n}(A)\big|,\ \text{and}
\]
\[
A_{3n} \equiv \big|\tilde Q_{n,F_n}(A^{\eta})-\tilde Q_{n,F_n}(A)\big|.
\]

By Theorem 3.1, for all $n\ge n_0$,
\[
P\Big(\big\{A_{1n}+A_{2n}\le 2(2C+c(s,d))n^{-(s-2)/2}\big\}\cap E_n(\bar\rho,c_1,c_2)\Big)
\ge P\big(E_n(\bar\rho,c_1,c_2)\big)-\exp\Big(-\frac{c_R^2 n}{2}\Big).
\]
On the event $E_n(\bar\rho,c_1,c_2)$, we have

\[
\tilde Q_{n,F_n}(A^{\eta})-\tilde Q_{n,F_n}(A) \le \tilde Q_{n,F_n}(A^{\eta}\setminus A) \le c_1(s,d)\,c_2\,\eta,
\]

where $c_1(s,d)>0$ is a constant that depends only on $s$ and $d$. The last inequality follows by Corollary 3.2 of BR, p.24. Therefore, we obtain the desired result. □

Proof of Corollary 3.2: Let
\[
\eta_n \equiv n^{-(s-1)/2}\big(2\sqrt{c_3 d}\,\log n\big)^{s}.
\]

We follow the arguments on pages 253-254 of Hall (1992) and write

\[
\sqrt{n}\,g_n\big(X_n+n^{-1/2}x\big) = g_{n,1}(x)+g_{n,2}(x),
\]

where, with $x_k$ denoting the $k$-th entry of $x$,
\[
g_{n,1}(x) \equiv \sum_k\frac{\partial g_n(X_n)}{\partial x_k}x_k+\frac{1}{2}n^{-1/2}\sum_{k_1,k_2}\frac{\partial^2 g_n(X_n)}{\partial x_{k_1}\partial x_{k_2}}x_{k_1}x_{k_2}
+\cdots+\frac{1}{(s-1)!}n^{-(s-2)/2}\sum_{k_1,\dots,k_{s-1}}\frac{\partial^{s-1}g_n(X_n)}{\partial x_{k_1}\cdots\partial x_{k_{s-1}}}x_{k_1}\cdots x_{k_{s-1}},
\]
and $g_{n,2}(x)$ is such that $|g_{n,2}(x)|\le c'(c_1,s,d)\,\eta_n$, for some constant $c'(c_1,s,d)>0$ that depends only on $c_1$, $s$ and $d$, whenever $\|x\|\le 2\sqrt{c_3 d}\,\log n$. Let
\[
f_{n,t}(x) \equiv 1\big\{\sqrt{n}\,g_n\big(X_n+n^{-1/2}\hat V_n^{1/2}x\big)\le t\big\},
\]
\[
f_{n,t}^{A}(x) \equiv 1\big\{\sqrt{n}\,g_n\big(X_n+n^{-1/2}\hat V_n^{1/2}x\big)\le t,\ \|\hat V_n^{1/2}x\|\le 2\sqrt{c_3 d}\,\log n\big\},
\]
\[
f_{n,t}^{B}(x) \equiv 1\big\{\sqrt{n}\,g_n\big(X_n+n^{-1/2}\hat V_n^{1/2}x\big)\le t,\ \|\hat V_n^{1/2}x\|>2\sqrt{c_3 d}\,\log n\big\},\ \text{and}
\]
\[
f_{n,t}^{C}(x) \equiv 1\big\{g_{n,1}\big(\hat V_n^{1/2}x\big)\le t,\ \|\hat V_n^{1/2}x\|\le 2\sqrt{c_3 d}\,\log n\big\}.
\]
Let

\[
E_n' \equiv E_n(\bar\rho,c_1,c_2,c_3),
\]
for simplicity. We apply Theorem 3.1 to find that for all $n\ge n_0$,
\[
P\Big(\Big\{\sup_{t\in\mathbf{R}}\Big(\big|Q_{n,F_n}(f_{n,t})-\tilde Q_{n,F_n}(f_{n,t})\big|-C\bar\omega_{f_{n,t}}\big(n^{-(s-2)/2};\Phi\big)\Big)\le Cn^{-(s-2)/2}\Big\}\cap E_n'\Big)
\ge P(E_n')-\exp\Big(-\frac{c_R^2 n}{2}\Big),
\]
where $C$ and $n_0$ are constants that appear in Theorem 3.1. Observe that

\[
\bar\omega_{f_{n,t}}\big(n^{-(s-2)/2};\Phi\big) \le \bar\omega_{f_{n,t}^{A}}\big(n^{-(s-2)/2};\Phi\big)+\bar\omega_{f_{n,t}^{B}}\big(n^{-(s-2)/2};\Phi\big)
\le \bar\omega_{f_{n,t}^{C}}\big(n^{-(s-2)/2}+c'(c_1,s,d)\eta_n;\Phi\big)+\bar\omega_{f_{n,t}^{B}}\big(n^{-(s-2)/2};\Phi\big).
\]
By Lemma 5.3 of Hall (1992), we have

\[
\bar\omega_{f_{n,t}^{C}}\big(n^{-(s-2)/2}+c'(c_1,s,d)\eta_n;\Phi\big) \le c''(c_1,s,d)\big(n^{-(s-2)/2}+\eta_n\big) \le 2c''(c_1,s,d)\,n^{-(s-2)/2},
\]

for all $n\ge n_1$, where $c''(c_1,s,d)$ depends only on $c_1$, $s$, $d$, and $n_1$ depends only on $s$, $d$ and $c_3$. Note that on the event $E_n'$,

(35) \[
\|\hat V_n^{1/2}x\| \le \|\hat V_n^{1/2}\|\,\|x\| \le \sqrt{\operatorname{tr}(\hat V_n)}\,\|x\| \le \sqrt{c_3 d}\,\|x\|.
\]
Also, observe that for any real functions $a(x)$ and $b(x)$ of $x\in\mathbf{R}^d$,
\[
\big|1\{a(x)\le t,\ b(x)>c\}-1\{a(x+y)\le t,\ b(x+y)>c\}\big|
\le \big|1\{a(x)\le t,\ b(x)>c\}-1\{a(x+y)\le t,\ b(x)>c\}\big|
+\big|1\{a(x+y)\le t,\ b(x)>c\}-1\{a(x+y)\le t,\ b(x+y)>c\}\big|
\le 1\{b(x)>c\}+\big|1\{b(x)>c\}-1\{b(x+y)>c\}\big|.
\]
Hence, $\bar\omega_{f_{n,t}^{B}}\big(n^{-(s-2)/2};\Phi\big)$ is bounded by

\[
\int 1\big\{2\sqrt{c_3 d}\,\log n<\|\hat V_n^{1/2}x\|\big\}\,d\Phi(x)
+\int 1\big\{2\sqrt{c_3 d}\,\log n-n^{-(s-2)/2}<\|\hat V_n^{1/2}x\|<2\sqrt{c_3 d}\,\log n+n^{-(s-2)/2}\big\}\,d\Phi(x)
\le 2\int 1\big\{\sqrt{c_3 d}\,\log n<\|\hat V_n^{1/2}x\|\big\}\,d\Phi(x) \le 2\int_{x:\|x\|\ge\log n}d\Phi,
\]
for all $n\ge n_2$, where $n_2\ge 1$ is such that $\sqrt{c_3 d}\,\log n>n^{-(s-2)/2}$ and $n_2$ depends only on $s$, $c_3$ and $d$. The last inequality follows from (35). By Markov's inequality, the last integral is bounded by

(36) \[
\exp\Big(-\frac{(\log n)^2}{4}\Big)\int\exp\Big(\frac{\|x\|^2}{4}\Big)\,d\Phi(x) = 2^{d/2}\,n^{-(\log n)/4},
\]
as the integral on the left hand side is the moment generating function of $\chi_d^2$ at $1/4$. We can take $n_3\ge 1$ that depends only on $s$, $d$ such that the last term in (36) is bounded by $n^{-(s-2)/2}$. Taking $n_0'$ to be the maximum of $n_0$, $n_1$, $n_2$, $n_3$, we obtain the corollary. □
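The moment generating function evaluation invoked in (36) can be checked directly by factoring the $d$-dimensional standard normal integral into one-dimensional Gaussian integrals:

```latex
\int_{\mathbf{R}^d} \exp\!\left( \frac{\|x\|^2}{4} \right) d\Phi(x)
  = \prod_{k=1}^{d} \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty}
      e^{x_k^2/4} \, e^{-x_k^2/2} \, dx_k
  = \left( \frac{1}{\sqrt{2\pi}} \sqrt{\frac{2\pi}{1/2}} \right)^{d}
  = 2^{d/2},
```

which agrees with the $\chi_d^2$ moment generating function $(1-2t)^{-d/2}$ evaluated at $t=1/4$.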

References

ANDREWS, D. W. K., AND X. SHI (2013): “Inference Based on Conditional Moment Inequalities,” Econometrica, 81, 609–666.

ANGST, J., AND G. POLY (2017): “A Weak Cramér Condition and Application to Edgeworth Expansions,” Electronic Journal of Probability, pp. 1–24.

ARCONES, M. A., AND E. GINÉ (1993): “Limit Theorems for U-Processes,” Annals of Probability, 21, 1494–1542.

BHATTACHARYA, R. N., AND R. R. RAO (2010): Normal Approximation and Asymptotic Expansions. SIAM, Philadelphia.

BOOTH, J. G., P. HALL, AND A. T. A. WOOD (1994): “On the Validity of Edgeworth and Saddlepoint Approximations,” Journal of Multivariate Analysis, 51.

GINÉ, E., AND J. ZINN (1991): “Gaussian Characterization of Uniform Donsker Classes of Functions,” Annals of Statistics, 19.

HALL, P. (1992): The Bootstrap and Edgeworth Expansion. Springer, New York.

HALL, P., AND B.-Y. JING (1995): “Uniform Coverage Bounds for Confidence Intervals and Berry–Esseen Theorems for Edgeworth Expansion,” Annals of Statistics, 23, 363–375.

KOLASSA, J. E., AND P. MCCULLAGH (1990): “Edgeworth Series for Lattice Distributions,” Annals of Statistics, 18, 981–985.

LINTON, O., K. SONG, AND Y.-J. WHANG (2010): “An Improved Bootstrap Test of Stochastic Dominance,” Journal of Econometrics, 154, 186–202.

MIKUSHEVA, A. (2007): “Uniform Inference in Autoregressive Models,” Econometrica, 75, 1411–1452.

SHEEHY, A., AND J. A. WELLNER (1992): “Uniform Donsker Classes of Functions,” Annals of Statistics, 20, 1983–2030.

SINGH, K. (1981): “On the Asymptotic Accuracy of Efron’s Bootstrap,” Annals of Statistics, 9, 1187–1195.

VANCOUVER SCHOOL OF ECONOMICS, UNIVERSITY OF BRITISH COLUMBIA, 6000 IONA DRIVE, VANCOUVER, BC, V6T 1L4, CANADA
E-mail address: [email protected]