arXiv:1605.01440v2 [math.ST] 17 Dec 2017 admvralswt omndistribution common with variables random einvcosand vectors design : model regression linear multiple the Consider Introduction 1. Bootstra Wild Bootstrap, Generalized Bootstrap, Residual n dnial itiue admvariables, random distributed identically and httePrubto otta otne ob ...when S.O.C. be to continues Bootstrap Perturbation student the the of that bootst distribution modified the the approximating modifica for that innovative (S.O.C.) show and an introduce bootstrapped We the correct. version order studentized second classical be the that show We inferences. sasgicn mrvmn vraypoi omlt nth in normality Keywords: asymptotic over improvement significant establis a findings as These distributed. identically be not may Abstract b Parameter DAS DEBRAJ Regression Linear Multiple of Bootstrap M-Estimator Perturbation of Correctness Order Second Bernoulli to Submitted I576 S.E-mail: USA. 53706, WI where a orcns sipratfrrdcn h prxmto er hav approximation method the reducing bootstrap for non-naive important this is Boo correctness of Perturbation results the is order scaling, second and centering proper after etro aaees neetv a fapoiaigtedi the approximating of way effective An parameters. of vector 79-23 S.E-mail: USA. 27695-8203, eateto ttsis ot aoiaSaeUniversity, State Carolina North , of Department eateto ttsis nvriyo Wisconsin-Madiso of University Statistics, of Department ∗ eerhprilyspotdb S rnsn.DS1108 DMS 1310068, DMS no. grants NSF by supported partially Research y 1 y , . . . , osdrtemlil ierrgeso model regression linear multiple the Consider . -siain ... etrainBosrp Edgeworth Bootstrap, Perturbation S.O.C., M-Estimation, a n n .N LAHIRI N. S. and r responses, are β sthe is [email protected] [email protected] p dmninlvco fparameters. of vector -dimensional y i ǫ 1 = ǫ , . . . , b x i ′ β + n ǫ r needn n dnial itiue (IID) distributed identically and independent are i i , x i 1 saekondsg etr and vectors design known are ’s F 1 = (say), , 2 y etrainBosrpapproximation Bootstrap perturbation h zdMetmtr diinly eshow we Additionally, M-estimator. ized i n , . . . , ,10 nvriyAeu,Madison, Avenue, University 1300 n, p. o nfrl to uniformly ror ftebosrpe siao al to fails estimator bootstrapped the of = x 31SisnD,Rlih NC Raleigh, Dr, Stinson 2311 srpMto.I hscretwork, current this In Method. tstrap ersinM-estimation. regression e 1 apdpvti eododrcorrect order second is pivot rapped x h errors the , . . . , enivsiae.Scn order Second investigated. been e ini h tdnie eso of version studentized the in tion i ′ tiuino h M-estimator the of stribution β + 1613192 x ǫ i n where , xaso,Studentization, Expansion, r nw o random non known are ǫ i saeidpnet but independent, are ’s o ( n ǫ i − saeindependent are ’s 1 / 2 β ogtbetter get to ) sthe is p (1.1) ∗ × β ¯ n 1 , 2 Das D. and Lahiri S. N.

Suppose β¯ is the M-estimator of β corresponding to the objective function Λ( ) i.e. n · n ′ β¯n = arg min Λ(y x t). Now if ψ( ) is the derivative of Λ( ), then β¯n is the M- t i=1 i − i · · estimator correspondingP to the score function ψ( ) and is defined as the solution of the · vector equation

n ′ xiψ(y x β)= 0. i − i Xi=1 It is known [cf. Huber(1981)] that under some conditions on the objective function, design vectors and error distribution F ;(β¯n β) with proper scaling has an asymptotically normal − 2 2 2 2 ′ distribution with 0 and dispersion matrix σ Ip where σ = Eψ (ǫ1)/E ψ (ǫ1). After introduction of bootstrap by Efron in 1979 as a technique, it has been widely used as a distributional approximation method. Resampling from the naive empirical distribution of the centered residuals in a regression setup, called residual bootstrap, was introduced by Freedman (1981). Freedman (1981) and Bickel and Freedman (1981b) had ∗ shown that given data, the conditional distribution of √n(βn β¯n) converges to the same − as the distribution of √n(β¯n β) when β¯n is the usual least square − estimator of β, that is, when Λ(x)= x2. It implies that the residual bootstrap approximation to the exact distribution of the least square estimator is first order correct as in the case of normal approximation. The advantage of the residual bootstrap approximation over normal approximation for the distribution of linear contrasts of least square estimator for general p was first shown by Navidi (1989) by investigating the underlying Edgeworth Expansion (EE); although heuristics behind the same was given by Liu (1988) in restricted case p = 1. Consequently, EE for the general M-estimator of β was obtained by Lahiri (1989b) when p = 1; whereas the same for the multivariate least square estimator was found by Qumsiyeh (1990a). EE of standardized and studentized versions of the general M-estimator in multiple setup was first obtained by Lahiri (1992). Lahiri (1992) also established the second order results for residual bootstrap in regression M-estimation. A natural generalization of from the naive empirical distribution is to sample from a weighted empirical distribution to obtain the bootstrap sample residuals. Broadly, the resulting bootstrap procedure is called the weighted or generalized bootstrap. It was introduced by Mason and Newton (1992) for bootstrapping mean of a collection of IID ran- dom variables. Mason and Newton (1992) considered exchangeable weights and established its consistency. Lahiri (1992) established second order correctness of generalized bootstrap in approximating the distribution of the M-estimator for the model (1.1) when the weights are chosen in a particular fashion depending on the design vectors. Wellner and Zhan (1996) proved the consistency of infinite dimensional generalized bootstrapped M-estimators. Con- S.O.C. of Perturbation Bootstrap 3

sequently, Chatterjee and Bose (2005) established distributional consistency of generalized bootstrap in and showed that generalized bootstrap can be used in order to estimate the asymptotic of the original estimator. Chatterjee and Bose (2005) also mentioned the bias correction essential for achieving second order correctness. An important special case of generalized bootstrap is the bayesian bootstrap of Rubin (1981). Rao and Zhao (1992) showed that the distribution function of M-estimator for the model (1.1) can be approximated consistently by bayesian bootstrap. See the monograph of Barbe and Bertail (2012) for an extensive study of generalized bootstrap. A close relative to the generalized bootstrap procedure is the wild bootstrap. It was in-

troduced by Wu (1986) in multiple linear regression model (1.1) with errors ǫi’s being het- eroscedastic. Beran (1986) justified wild bootstrap method by pointing out that the distri- bution of the least square estimator can be approximated consistently by the wild bootstrap approximation. Second order results of wild bootstrap in heteroscedastic regression model was first established by Liu (1988) when p = 1. Liu (1988) also showed that usual residual bootstrap is not capable of approximating the distribution of the least square estimator upto second order in heteroscedastic setup and described a modification in resampling procedure which can establish second order correctness. For general p, the heuristics behind achieving second order correctness by wild bootstrap in homoscedastic least square regression were discussed in Mammen (1993). Recently, Kline and Santos (2011) developed a score based bootstrap method depending on wild bootstrap in M-estimation for the homoscedastic model (1.1) and established consistency of the procedure for Wald and Lagrange Multiplier type tests for a class of M-estimators under misspecification and clustering of data. A novel bootstrap technique, called the perturbation bootstrap was introduced by Jin, Ying, and Wei (2001) as a resampling procedure where the objective function having a U- process structure was perturbed by non-negative random quantities. Jin, Ying, and Wei (2001) showed that in standardized setup, the conditional distribution of the perturbation resampling estimator given the data and the distribution of the original estimator have the same limiting distribution which this resampling method is first order correct without studentization. In a recent work, Minnier, Tian, and Cai (2011) also applied this perturbation

resampling method in penalized regression setup such as Adaptive Lasso, SCAD, lq penalty and showed that the standardized perturbed penalized estimator is first order correct. But, second order properties of this new bootstrap method have remained largely unexplored in the context of multiple linear regression. In this current work, the perturbation bootstrap approximation is shown to be S.O.C. for the distribution of studentized M-estimator for the regression model (1.1). An extension to the case of independent and non-IID errors is also established, showing the robustness of perturbation bootstrap towards the presence 4 Das D. and Lahiri S. N. of . Therefore, besides the existing bootstrap methods, the perturbation bootstrap method can also be used in regression M-estimation for making inferences regard- ing the regression parameters and higher order accuracy can be achieved than the normal approximation. A classical way of studentization in bootstrap setup, in case of regression M-estimator and ∗ ∗ ∗−1 ∗ −1 n ′ ∗ for IID errors, is to consider the studentization factor to be σn = snτn , τn = n i=1 ψ (ǫi ), ∗2 −1 n 2 ∗ ∗ ′ ∗ ∗ s = n ψ (ǫ ) where ǫ = y x βn, i 1,...,n , with βn being the perturbationP n i=1 i i i − i ∈{ } bootstrappedP estimator of β, defined in Section 2. Although the residual bootstrapped es- timator is S.O.C. after straight-forward studentization, the same pivot fails to be S.O.C. in the case of perturbation bootstrap. Two important special cases are considered as examples in this respect. The reason behind this failure is that although the bootstrap residuals are sufficient in capturing the variability of the bootstrapped estimator in residual bootstrap, it is not enough in the case of perturbation resampling. Modifications have been proposed as remedies and are shown to be S.O.C. The modifications are based on the novel idea that the variability of the random perturbing quantities G∗ (1 i n) along with the bootstrap i ≤ ≤ residuals are required to capture the variability of the perturbation bootstrapped estimator; whereas individually they are not sufficient. For technical details, see Section 4.2 and Section 5.1. With a view to establish second order correctness, we start with the standardized setup and then proceed to studentization. First, we find a two-term EE of the conditional density of a suitable of the concerned bootstrapped pivot and then we show that it is the required two-term EE corresponding to the bootstrapped pivot. The result then follows by comparing the EE of the bootstrapped pivot with that of underlying original pivot. The techniques that are to be used in finding EE have been demonstrated and discussed in Bhattacharya and Ghosh (1978), Bhattacharya and Rao (1986), Navidi (1989) and Lahiri (1992). A significant volume of work is available in bootstrapping M-estimators. We will con- clude this section by briefly reviewing the literature. Bootstrapping M-estimators in linear model has been studied by Navidi(1989), Lahiri (1992, 1996), Rao and Zhao (1992), Qum- siyeh (1994), Karabulut and Lahiri (1997), Jin, Ying and Wei (2001), Hu (2001), El Bantli (2004) among others. And in the applications other than linear model, bootstrapping in M-estimation and its subclasses has been investigated by Arcones and Gin´e(1992), Lahiri (1994), Wellner and Zhan (1996), Allen and Datta (1999), Hu and Kalbfleisch (2000), Hlavka (2003), Wang and Zhou (2004), Chatterjee and Bose (2005), Ma and Kosorok (2005), Lahiri and Zhu (2006), Cheng and Huang (2010), Feng et. al. 
(2011), Lee (2012), Cheng (2015), among others. S.O.C. of Perturbation Bootstrap 5

The rest of the paper is organized as follows. Perturbation bootstrap is described briefly in Section 2. Section 3 states the assumptions and motivations behind considering those assumptions. Main results for IID case, along with the modification in bootstrap studenti- zation, are stated in Section 4. An extension to the case of independent and non-IID errors is proposed in Section 5. An outline of the proofs are given in Section 6. Section 7 states concluding remarks. The details of the proofs are available in a supplementary material Das and Lahiri (2017).

2. Description of Perturbation Bootstrap

In the perturbation bootstrap, the objective function Λ( ) has been perturbed several times · by a non-negative random quantity to get a bootstrapped estimate of β. It has nothing to do with residuals in resampling stage, unlike the residual and weighted bootstrap. More ∗ precisely, the perturbation bootstrap estimator βn is defined as

n ∗ ′ ∗ βn = arg min Λ(yi xit)Gi t − Xi=1 or in terms of the score function ψ( ), as the solution of the vector equation · n ′ ∗ xiψ(y x β)G = 0 (2.1) i − i i Xi=1 where G∗, i 1,...,n are non-negative and non-degenerate completely known random i ∈ { } ∗ ∗ ¯ variables, considered as perturbation quantities. Note that, if µG is the mean of G1, then βn

n ∗ n ∗ ′ ¯ is the solution of E i=1 xiψ(¯ǫi) Gi ǫ1,...,ǫn = i=1 xiψ(¯ǫi)µG = 0 whereǫ ¯i = yi xiβn,  |  − i 1,...,n , are theP residuals corresponding to theP M-estimator β¯ . This observation will ∈{ } n be helpful in finding a suitable stochastic approximation in bootstrap regime. For details, see Section 6. The central idea of the perturbation bootstrap is to draw a relatively large collection of ∗b ∗b ∗ IID random samples (G1 ,...,Gn ): b =1,...,B from the distribution of G1 and then to { } ∗ find the conditional empirical distribution of √n(βn β¯n) given data y : i =1,...,n, by − i solving n ′ ∗b xiψ(y x β)G = 0 i − i i Xi=1 for each b 1,...,B ; to approximate the distribution of √n(β¯n β) asymptotically. ∈ { } − As a result the bootstrapped distribution may be used as an approximation to the original distribution, just like the normal approximation, in constructing confidence intervals and testing of hypotheses regarding β. 6 Das D. and Lahiri S. N.

∗ Now, in the perturbation bootstrap M-estimation, Gi ’s can be thought of as weight cor- responding to the ith data point (xi,yi). To make it easier to understand, consider the least 2 ∗ square setup i.e. Λ(x)= x . In this case βn takes the form

n −1 n ∗ ′ ∗ ∗ βn = xixiGi xiyiGi (2.2)  Xi=1   Xi=1  ∗ indicating that the perturbing quantities Gi ’s can be thought of as weights.

Remark 2.1. Consider the least square estimator βˆn. Then keeping the asymptotic prop- ˆ∗ ˆ erties fixed, the perturbation bootstrap version β1n of βn can be defined alternatively as the solution of n n ′ ∗ ′ ∗ xi(y x β) G µ ∗ + xix (βˆn β) 2µ ∗ G = 0 i − i i − G i − G − i Xi=1   Xi=1   ˆ∗ which in turn implies that β1n is the solution of

n ∗ ′ xi(z x β)= 0 (2.3) i − i Xi=1 ∗ ′ −1 ∗ ′ where z = x βˆn +ˆǫ [µ ∗ (G µ ∗ )],ǫ ˆ = y x βˆn, i 1,...,n . On the other hand, the i i i G i − G i i − i ∈{ } ˆ∗ ˆ simple wild bootstrap version β2n of βn is defined as the solution of

n ∗ ′ xi(y x β)= 0 (2.4) i − i Xi=1 ∗ ′ where y = x βˆn +ˆǫ t , i 1,...,n and t ,...,t is a set of IID random variables i i i i ∈ { } { 1 n} independent of ǫ ,...,ǫ with Et = 0, Var(t ) = 1. Additionally, one needs E(t3)=1 { 1 n} 1 1 1 for establishing second order correctness of wild bootstrap approximation [cf. Liu (1988), Mammen (1993)]. Now Looking at (2.3) and (2.4) and in view of assumption (A.5)(ii), it can be said that the perturbation bootstrap coincides with the wild bootstrap in least square setup. Therefore one can view perturbation bootstrap as a generalization of the wild bootstrap in regression M-estimation.

Remark 2.2. There is a basic difference between perturbation bootstrap and weighted bootstrap with respect to the construction of the bootstrapped estimator. Whereas in the perturbation bootstrap, the bootstrapped estimator is defined through the non-negative and non-degenerate random perturbations of the objective function; in weighted bootstrap, the bootstrapped estimator is defined through bootstrap samples drawn from a weighted empir- ical distribution. See for example the construction of the weighted bootstrapped estimator corresponding to Theorem 2.3 of Lahiri (1992) and compare it with our construction as S.O.C. of Perturbation Bootstrap 7

stated in Section 2. However, as pointed out by a referee, one can think of the perturba- tion bootstrap, defined in Section 2, as the weighted bootstrap version of some statistical functional if the design vectors are random. Suppose, (x ,y ) ..., (x ,y ) are IID with { 1 1 n n } underlying probability measure Q. Then one can write

′ β = T (Q) = arg min EQ Λ(yi xit) t  −  −1 n for some statistical functional T ( ). Define empirical measures Q = n ½(x ,y ) and · n i=1 i i

−1 n ½ Q = n ½(x ,y )W where ( ) is the indicator function and PW ,...,W are n,W i=1 i i i · { 1 n} weights. ThenP we have β¯ = T (Q ) and β∗ = T (Q ) when W = G∗, i 1,...,n . The n n n n,W i i ∈{ } weighted bootstrap of general statistical functionals of only the IID random variables is con- sidered in the monograph of Barbe and Bertail (2012). Second order correctness of weighted bootstrap of standardized mean of IID random variables was established by Haeusler et. al. (1992) under two choices of weights. One choice is the non-negative IID weights and the other one is the self-normalized sum of non-negative IID random variables. Their results were extended by Barbe and Bertail (2012) for general statistical functionals in IID case when the weights are self-normalized sum of non-negative IID random variables [cf. Corol- lary 4.1 of Barbe and Bertail (2012)]. For general M-estimation, Chatterjee (1999) showed that weighted bootstrap estimator is generally biased and established its second order cor- rectness after properly correcting for the bias. To the best of our knowledge, there is no second order result available in the literature under studentized setup for general statistical functional. In this article, we have assumed the design vectors to be non-random, implying that our setup fits neither in the general statistical functional setup of Barbe and Bertail (2012) nor in the general M-estimation setup of Chatterjee (1999); although Theorem 5.1 continue to hold when the design is random. Throughout the article we consider weights to be non-negative IID. Our main motivation is to explore second order results in studentized setup which, unlike the standardized (i.e., the known variance) case, is applicable in practice. Further, we prove our results in the situation when errors are heteroscedastic. We establish all our second order correctness results without requiring any bias correction.

3. Assumptions

′ n ′ 1/2 −1 2 Suppose, xi = (x , x ,...,x ) . Define, Dn D = ( xix ) , An = n D , di = i1 i2 ip ≡ i=1 i −1 p(p + 1) 2 D xi, 1 i n and q = . Also define, q 1P vector zi = (x , x x ,...,x x ≤ ≤ 2 × i1 i1 i2 i1 ip 2 2 ′ n , x , x x ,... ,x x ,...,x ) . Note that for any constants a ,...,a R, a zi = 0 i2 i2 i3 i2 ip ip i n ∈ i=1 i n ′ which implies and is implied by a xix = 0. Hence, z1,..., zn are linearlyP independent i=1 i i { } P 8 Das D. and Lahiri S. N.

′ if and only if xix : 1 i n are linearly independent. Therefore, r = the rank of { i ≤ ≤ } n n ′ ziz is nondecreasing in n. So, if r = max r : n 1 then without loss of generality i=1 i { n ≥ } (w.l.g.),P we can assume that r = r for all n q. Consider canonical decomposition of n ≥ n ′ i=1 zizi as P n ′ ′ Ir 0 L( zizi)L = 0 0 Xi=1   ′ ′ ′ where L is a q q non-singular matrix. Partition L as L = [L L ], where L1 is of order × 1 2 r q. Define r 1 vector ˜zi by × ×

˜zi = L1zi, 1 i n ≤ ≤

n ′ n ′ ′ ′ ′ ′ ′ ′ −1 ′ Note that i=1 ˜zi˜zi = L1( i=1 zizi)L1 = Ir. Suppose, vi =(xiψ(ǫ1), ziψ (ǫ1)) . ˘zi =(zi, n ) .

Let, ΦVPdenotes the normalP distribution with mean 0 and dispersion matrix V and φV is ′ ′′ the density of ΦV. Write ΦV = Φ and φV = φ when V is the identity matrix. h , h denote respectively first and second derivatives of real valued function h that is twice differentiable. Also . denotes euclidean norm.For any set B Rp and ǫ > 0, δB denotes the boundary || || ∈ of B, B denotes the cardinality of B and Bǫ = x : x Rp and d(x, B) < ǫ where | | { ∈ } d(x, B) = inf x y : y B . For a function f : Rl R and a non-negative integral {|| − || ∈ } → ′ α α1 αl αj vector α = (α1,α2,...,αl) , D f = D1 ...Dl f, where Dj f denotes αj times partial derivative of f with respect to the jth component of its argument, 1 j l. Also assume ≤ ≤ ′ p that (e1,..., ep) is the standard basis of R . Let, P∗ and E∗ respectively denote conditional ∗ B bootstrap probability and conditional expectation of G1 given data. The class of sets denotes the collection of borel subsets of Rp satisfying

sup Φ((δB)ǫ)= O(ǫ) as ǫ 0 (3.1) B∈B ↓ Next we state the assumptions:

(A.1) ψ( ) is twice differentiable and ψ′′( ) satisfies a Lipschitz condition of order α for some · · 0 < 2α 1. ≤ (A.2) (i) An A1 as n for some positive definite matrix A1. → →∞ −1 n ′ (ii) E(n viv ) A2 as n for some non-singular matrix A2, where i=1 i → → ∞ expectationP is with respect to F . ′ −1 n ′ (ii) E(n ˜vi˜v ) A3 as n for some non-singular matrix A3 where ˜vi is i=1 i → → ∞ definedP as same way as vi with zi being replaced by ˘zi.

α/2 n 6+2α 1/2 n 4 −1 (iii) n ( di ) + ˜zi = O(n ) i=1 || || i=1 || || P P S.O.C. of Perturbation Bootstrap 9

(A.3) (i) Eψ(ǫ ) = 0 and σ2 = Eψ2(ǫ )/E(ψ′(ǫ )) (0, ). 1 1 1 ∈ ∞ (ii) E ψ(ǫ ) 4 + E ψ′(ǫ ) 4 + E ψ′′(ǫ ) 2 < . | 1 | | 1 | | 1 | ∞ (A.4) G∗ and ǫ are independent for all 1 i n. i i ≤ ≤ (A.5) (i) EG∗3 < 1 ∞ ∗ 2 ∗ 3 3 (ii) Var(G )= µ ∗ , E(G µ ∗ ) = µ ∗ . 1 G 1 − G G (iii) G∗ µ ∗ satisfies Cramer’s condition: 1 − G  lim sup E exp it G∗ µ ∗ < 1. |t|→∞ 1 − G  2   (iii)′ G∗ µ ∗ , G∗ µ ∗ satisfies Cramer’s condition: 1 − G 1 − G      2 ∗ ∗ ∗ ∗ lim sup||(t1,t2)||→∞ E exp it1 G1 µG + it2 G1 µG < 1 − − ′        (A.6) (i) ψ(ǫ1), ψ (ǫ1) satisfies Cramer’s condition:

  ′ lim sup||(t1,t2)||→∞ E exp it1ψ(ǫ1)+ it2ψ (ǫ1) < 1

′ ′ 2    (i) ψ(ǫ1), ψ (ǫ1), ψ (ǫ1) satisfies Cramer’s condition:

  ′ 2 lim sup||(t1,t2,t3)||→∞ E exp it1ψ(ǫ1)+ it2ψ (ǫ1)+ it3ψ (ǫ1) < 1

  

′ ′ ′ ′ Define ¯vi =(¯x , ¯z ) where ¯xi = xiψ(¯ǫ ), ¯zi = ziψ (¯ǫ ); ǫ¯ ,..., ǫ¯ being the set of residuals. i i i i { 1 n} ¯ −1 n ′ ¯ −1 n ′ ′ −1 n ′ Also, define A2n = n i=1 ¯xi¯xi and A1n = n i=1 xixiψ (¯ǫi). Note that n i=1 ¯vi¯vi is −1 n ′ n ′ an estimate of the matrixPE(n i=1 vivi) and dueP to assumption (A.2)(ii), i=1P¯vi¯vi is non- singular for sufficiently large n. Hence,P without loss of generality the canonicalP decomposition n ′ of i=1 ¯vi¯vi can be assumed as n P ′ ′ B ¯vi¯vi B = Ik  Xi=1  where k = p + q and B is a k k non-singular matrix. Define k 1 vector ˘vi by × ×

˘vi = B¯vi, 1 i n ≤ ≤ To find valid EE in the perturbation bootstrap regime, the following condition [cf. Navidi (1989)] is also required:

(A.7) There exists a δ > 0 such that K (δ)/logγ where Bn(δ) = 1 i n : − n n → ∞ { ≤ ≤ ′ 2 2 k 2 (˘v t) > δγ for all t R with t = 1 , K (δ) = Bn(δ) , the cardinality of i n ∈ || || } n | | n 4 1/2 the set Bn(δ), and γ =( ˘vi ) . n i=1 || || P But note that the condition (A.7) has already been satisfied in our set up due to Lemma 6.2 and the proposition in Lahiri (1992). 10 Das D. and Lahiri S. N.

Now we briefly explain the assumptions. Assumption (A.1) is smoothness condition on the score function ψ( ). This condition is essential for obtaining a Taylor’s expansion of ψ( ) · · around regression errors. Assumption (A.2) presents the regularity conditions on the design vectors necessary to find EE. For the validity of asymptotic normality of the regression M- estimator, only (A.2)(i) is enough [cf. Huber (1981)]; whereas additional condition (A.2)(ii) is required for the validity of the EE. (A.2)(iii) states atmost how fast the L2 norm of the design vectors can increase to get a valid EE. This condition is somewhat stronger than the condition (C.6) assumed in Lahiri (1992); although there was a reduction in accuracy of bootstrap approximation due to this relaxation. This type of conditions are quite common in the literature of edgeworth expansions in regression setup; see for example Navidi (1989), Qumsiyeh (1990a). We now state an example where assumption (A.2) (iii) is fulfilled.

Example 3.1. Suppose, X(1),..., X(p) is a set of independent random vectors where { } (j) ′ X = (X1j,..., Xnj) is a vector of n IID copies of the non-degenerate random variable X , j 1,...,p . Define, p p matrix M = ((m )) where m = E(X2 X2 ) and 1j ∈{ } × jk j,k=1,...,p jk 1j 1k n p matrix X = X(1),..., X(p) . Assume, E(X )= E(X3 )=0 and E X 8 < for all × 1j 1j | 1j| ∞ j 1,...,p anddet(M) =0. Then for the design matrix X, assumption (A.2) (iii) holds ∈{ } 6 with probability 1 (w.p. 1).

proof :

′ 2 2 For the design matrix X, xi = (Xi1,Xi2,...,Xip) and zi = (Xi1,Xi1Xi2,...,Xi1Xip,Xi2 ,X X ,...,X X ,...,X2 )′ for i 1,...,n . i2 i3 i2 ip ip ∈{ } First note that if all the entries of X are IID then the condition det(M) = 0 is redundant. 6 −1 2 2 2 By Kolmogorov strong law of large numbers, An = n D diag E(X ),..., E(X ) → 11 1p −1 n 6+2α 6+2α and n xi E x1 both w.p.1 and hence   i=1 || || → || || P n 1/2 n 1/2 α/2 6+2α α/2 −1 3+α 6+2α n di n D xi || || ≤ || || || ||  Xi=1   Xi=1  = O(n−1) w.p.1 (3.2)

Again, since M is a non-singular matrix, n−1 n z z′ N w.p.1, for some positive i=1 i i → definite matrix N. This implies that L = O(n−1P/2) w.p.1 and hence || || n n 4 4 4 ˜zi L zi || || ≤ || || || || Xi=1 Xi=1 = O(n−1) w.p.1 (3.3)

Therefore, our claim follows from (3.2) and (3.3). S.O.C. of Perturbation Bootstrap 11

Assumption (A.3) is the condition on the error variables through the score func- tion ψ( ). (A.3)(i) is generally assumed to establish asymptotic normality. Assumption (A.4) · ∗ is inherent in the present setup, since Gi ’s are introduced by us to define the bootstrapped estimator whereas ǫi’s are already present in the process of data generation. The conditions present in Assumption (A.5) are moment and smoothness conditions on the perturbing quan- ∗ tities Gi ’s, required for the valid two term EE in bootstrap setup. The Cramer’s condition is very common in the literature of edgeworth expansions. Cramer’s condition is satisfied when the distribution of (G∗ µ ∗ ) or ((G∗ µ∗ ), (G∗ EG∗)2) has a non-degenerate compo- 1 − G 1 − G 1 − 1 nent which is absolutely continuous with respect to Lebesgue measure [cf. Hall (1992)]. An ∗ immediate choice of the distribution of G1 is Beta(γ, δ) where 3γ = δ = 3/2. Also one can investigate Generalized Beta family of distributions for more choices of the distribution of ∗ G1. Assumption (A.6) is the Cramer’s condition on the errors. Although this assumption is not needed for obtaining EE of the bootstrapped estimators, it is needed for obtaining EE for the original M-estimator. Note that the condition (A.7) is somewhat abstract. Hence as pointed out by a referee, some clarification would be helpful. To this end, it is worth mentioning that to find formal EE for the standardized bootstrapped pivot (see section 4.1), the most difficult step is to show

α ′ ∗ max D E eit Tn dt = o n−1/2 (3.4) α ∗ p | |≤p+q+4 C1≤γn||t||≤C2 | | Z   ∗ n ˘ ∗ ˘ ∗ ˘ ∗ where C1,C2 are non-negative constants and Tn = i=1 Xi E∗(Xi ) , with Xi = ˘vi(Gi − ′ −∗ α it T µ ∗ )1 ˘vi(G µ ∗ ) 1 . Now it is easy to see thatP for any α p + q + 4, D E e n G || i − G || ≤ | | ≤ | ∗ | is bounded above by a sum of n|α|-terms, each of which is bounded above by

α ′ ˘ ∗ ˘ ∗ ˘ ∗ | | ∗ it Xi C(α) max E∗ Xi E∗(Xi ) : i In E∗e · { || − || ∈ }· ∗c | | i∈YIn where I∗ 1,...,n is of size α and I∗c = 1,...,n I∗ and C(α) is a constant which n ⊂ { } | | n { }\ n depends only on α. Now note that for all i 1,...,n , ∈{ } E X˘ ∗ E (X˘ ∗) |α| 2|α| ∗|| i − ∗ i || ≤ ′ ˘ ∗ ′ it X it ˘vi(Gi−µG∗ ) and E e i E e +2P ˘vi(G µ ∗ ) > 1 | ∗ |≤| ∗ | ∗ || i − G ||   Hence, in view of Cramer’s condition (A.5) (iii) and Lemma 6.2, if there exists a sequence −1 ′ of sets Jn such that Jn 1,...,n and for all i Jn, γ t ˘vi > ξ for some ξ > 0, { }n≥1 ⊂ { } ∈ n | | 12 Das D. and Lahiri S. N.

then for some 0 <θ< 1 we have

it′X˘ ∗ sup E∗e i : C1 γn t C2 ∗c | | ≤ || || ≤  i∈YIn  −1 ′ ∗ iγn t X˘ sup E∗e i : C1 t C2 ≤ ∗c | | ≤ || || ≤  i∈IYn ∩Jn  ∗c θ|In ∩Jn| (3.5) ≤ ∗c −1 Again I Jn Jn α and γ kn . Therefore, to achieve (3.4), it is enough to have | n ∩ |≥| |−| | n ≥ n2(p+q)+4 θ|Jn|−(p+q+4) = o(n−1/2) ·

Hence due to Lemma 6.2, it is enough to have Jn a C log γ for some positive constant | | ≥ n − · n C and a sequence of constants a increasing to . This observation together with (3.5) { n} ∞ justifies condition (A.7).

We will denote the assumptions (A.1)-(A.5) by (A.1)′-(A.5)′ when (A.2) and (A.5) are respectively defined with (ii)′ and (iii)′ instead of (ii) and (iii).

4. Main Results

4.1. Rate of Perturbation Bootstrap Approximation

Here we will state the approximation results both in standardized and studentized setup. It ¯ 2 −1 is well known that √nβn has asymptotic variance σ An . So, the standardized version of ¯ −1 1/2 ¯ the M-estimator βn is defined as Fn = √nσ An (βn β). Now to define the standardized ∗− version of the corresponding bootstrapped statistic βn, we need its conditional asymptotic variance, given the data. Using Taylor’s expansion, it is quite easy to get the conditional ∗ ¯ −1 ¯ ¯ −1 ¯ −1 asymptotic variance of √nβn as A1n A2nA1n . Note that inverse of the matrices A1n and ¯ −1 A2n are well defined for sufficiently large sample size n due to the assumption (A.2)(i) and ∗ (A.3)(ii). Hence, the standardized bootstrapped M-estimator Fn can be defined as

∗ −1/2 ∗ F = √nΣ¯ (βn β¯n) n n − ¯ −1/2 ¯ −1/2 ¯ ¯ 1/2 ¯ where Σn = A2n A1n, A2n being defined in terms of the spectral decomposition of A2n; although it can be defined in many different ways [cf. Lahiri (1994)]. Under some regularity ∗ conditions, both the distribution of Fn and the conditional distribution of Fn can be shown to be approximated asymptotically by a Normal distribution with mean 0 and variance Ip. Hence, it is straightforward that perturbation bootstrap approximation to the distribution S.O.C. of Perturbation Bootstrap 13

of the M-estimator is first order correct. The second order result in standardized case is formally stated in Theorem 4.1.

Proposition 4.1. Suppose, the assumptions (A.1)-(A4), (A.5)(i) hold. Then there exist n constant C > 0 and a sequence of Borel sets Q1n R , such that P((ǫ ,...,ǫ ) Q1n) 1 1 ⊆ 1 n ∈ → as n , and given (ǫ1,...,ǫn) Q1n, n C1 such that there exists a sequence of statistics ∗ →∞ ∈ ≥ βn such that { }n≥1 ∗ ∗ −1/2 1/2 −1/2 P βn solves (2.1) and βn β¯n C .n .(logn) 1 δ n ∗ || − || ≤ 1 ≥ − n   where δ δ (ǫ ,...,ǫ ) tends to 0. n ≡ n 1 n ∗ Theorem 4.1. Let βn be a sequence of statistics satisfying Proposition 4.1 depending { }n≥1 on (ǫ1,...,ǫn). Assume, the assumptions (A.1)-(A.5) hold.

n (a) Then there exist constant C > 0 and a sequence of Borel sets Q2n R and 2 ⊆ polynomial a∗ ( ,ψ,G∗) depending on first three moments of G∗ and on ψ( ), ψ′( ) n · 1 · · ′′ & ψ ( ) through the residuals ǫ¯ ,..., ¯ǫ such that given (ǫ ,...,ǫ ) Q2n, with · { 1 n} 1 n ∈ P((ǫ ,...,ǫ ) Q2n) 1, we have for n C , 1 n ∈ → ≥ 2 ∗ ∗ −1/2 sup P∗(Fn B) ξn(x)dx δnn B∈B | ∈ − ZB | ≤ where ξ∗(x)=(1+ n−1/2a∗ (x,ψ,G∗))φ(x) and δ δ (ǫ ,...,ǫ ) tends to 0. n n n ≡ n 1 n

(b) Suppose in addition assumption (A.6)(i) holds. Then we have,

∗ −1/2 sup P∗(Fn B) P(Fn B) = op(n ) B∈B ∈ − ∈

Now, the quantity σ2 is mostly unavailable in practical circumstances. Hence, the non-

like Fn is very rare in use in providing valid inferences. It is more reasonable to explore the asymptotic properties of a pivotal quantity, like the studentized version of the ′ M-estimator β¯n. Depending on the observed residualsǫ ¯ = y xi β¯n, i 1,...,n , the i i − ∈ { } 2 2 −1 −1 n ′ natural way to define an estimator of σ isσ ˆn whereσ ˆn = snτn , τn = n i=1 ψ (¯ǫi) 2 −1 n 2 and sn = n i=1 ψ (¯ǫi). Hence, the studentized M-estimator in regression setupP may be −1 1/2 defined as Hn P= √nσˆ A (β¯n β). Define the studentized version of the corresponding n n − bootstrapped estimator as

∗ ∗−1 −1/2 ∗ H = √nσ σˆ Σ¯ (βn β¯n) n n n n − ∗ ∗ ∗−1 ∗ −1 n ′ ∗ ∗2 −1 n 2 ∗ 2 ¯ −1/2 where σn = snτn , τn = n i=1 ψ (ǫi ), sn = n i=1 ψ (ǫi ) andσ ˆn and Σn are as defined earlier. P P 14 Das D. and Lahiri S. N.

Theorem 4.2. Suppose, the assumptions (A.1)-(A.5) hold.

n (a) Then there exist constant C > 0 and a sequence of Borel sets Q3n R and 3 ⊆ polynomial a˜∗ ( ,ψ,G∗) depending on first three moments of G∗ and on ψ( ), ψ′( ) n · 1 · · ′′ & ψ ( ) through the residuals ǫ¯ ,..., ¯ǫ , such that given (ǫ ,...,ǫ ) Q3n, with · { 1 n} 1 n ∈ P((ǫ ,...,ǫ ) Q3n) 1, we have for n C , 1 n ∈ → ≥ 3 ∗ ˜∗ −1/2 sup P∗(Hn B) ξn(x)dx δnn B∈B | ∈ − ZB | ≤ where ξ˜∗(x)=(1+ n−1/2a˜∗ (x,ψ,G∗))φ(x) and δ δ (ǫ ,...,ǫ ) tends to 0. n n n ≡ n 1 n

Suppose in addition assumption (A.6)(i)′ holds. Then

(b) for the collection of Borel sets B defined by (3.1),

∗ −1/2 sup P∗(Hn B) P(Hn B) = Op(n ) B∈B ∈ − ∈

(c) if 2Eψ2(ǫ )Eψ(ǫ )ψ′(ǫ ) = Eψ′(ǫ )Eψ3(ǫ ), then there exists ǫ> 0 such that, 1 1 1 6 1 1 ∗ P lim inf √n sup P∗(Hn B) P(Hn B) > ǫ =1 n→∞ B∈B ∈ − ∈    

Remark 4.1. Proposition 4.1 states that there exists a sequence of perturbation boot- ∗ −1/2 1/2 strapped estimator βn within a neighborhood of length C.n (logn) around the original −1/2 M-estimator β¯n outside a set of bootstrap probability op(n ). This existence result is es- sential in finding valid EEs in bootstrap regime. This can be compared with Theorem 2.3 (a) of Lahiri (1992), where similar kind of result was shown in case of residual and generalized bootstrap.

Remark 4.2. Note that, where as the error term in approximating the distribution of M- −1/2 estimator by perturbation bootstrap is of order Op(n ) in the prevalent studentize setup, −1/2 it reduces the order of the error of approximation to op(n ) in simple standardized setup. This means that the difference between coefficients corresponding to the term n−1/2 in the EEs of original and bootstrapped estimator can be made arbitrarily small in standardized setup, but not in usual studentized setup.

Remark 4.3. To understand part (c) of Theorem 4.2, consider the usual least square estimator. In least square setup, the condition in the Theorem 4.2 (c) reduces to Eǫ3 = 0. This 6 simply means that if the studentization in perturbation bootstrapped version is performed analogously as in case of original least square estimator, then the bootstrap distribution can S.O.C. of Perturbation Bootstrap 15

not correct the original distribution upto second order. If this is investigated more deeply, then it can be observed that the usual studentized perturbation bootstrap approximation can not correct for the of the error distribution F .

4.1.1. Examples

Theorem 4.2 concludes that the standard way of performing studentization of the boot- strapped estimator is first order correct. In order to show that the usual studentized setup is not second order correct, we consider following two important special cases with ψ(x)= x.

Example 4.1

Consider the observations y ,...,y are coming from the distribution F with a location { 1 n} shift µ. This in terms of regression model becomes

yi = µ + ǫi

Hence, in this setup p = 1, β = µ and x = 1 for all i 1,...,n . i ∈{ }

∗ ∗ It can be shown that in this setup, ξ˜ ( ) and ξ˜ ( ), the EE of Hn and H respectively, n · n · n turn out to be d d3 ξ˜ (x)= 1 n−1/2 ˜b +6−1˜b φ(x) n − 11 dx 31 dx3  n o d d3 ξ˜∗(x)= 1 n−1/2 ˜b∗ +6−1˜b∗ φ(x) n − 11 dx 31 dx3  n o where

˜b = 2−1σ−3Eǫ3, ˜b = 2σ−3Eǫ3 11 − 1 31 − 1 ˜b∗ = 2σ−1n−1 n ǫ¯ , ˜b∗ = σ−3 12σ−1n−1 n ¯ǫ 11 − n i=1 i 31 n − n i=1 i P P ˜∗ ˜∗ ˜ ˜ It is clear that b11 as well as b31 are not converging respectively to b11 and b31 in probability and hence the perturbation bootstrap method is not second order correct in the above setup when the bootstrapped estimator is studentized in the usual manner.

Example 4.2

Consider the model

yi = β0 + β1xi + ǫi 16 Das D. and Lahiri S. N.

where β0 and β1 are parameters of interest and ǫi’s are IID errors. This model, in terms of our ′ ′ multivariate linear regression structure, can be written as yi = ˜xiβ + ǫi where β =(β0, β1) ′ and ˜xi = (1, xi) . Hence, the EEs of the original and bootstrapped estimators upto the order o(n−1/2), after usual studentization, respectively becomes

2 3 ˜(j,3−j) ˜ −1/2 ˜∗(j) ∂ b31 (j,3−j) ξn(y1,y2)= 1 n b11 + D φ(y1,y2)  − ∂yj j!(3 j)!   jX=1 jX=0 −    2 3 ˜∗(j,3−j) ˜∗ −1/2 ˜∗(j) ∂ b31 (j,3−j) ξn(y1,y2)= 1 n b11 + D φ(y1,y2)  − ∂yj j!(3 j)!   jX=1 jX=0 −  where  

˜(j) −1 −1 n ′ −1/2 b11 = 2 n i=1 ejAn ˜xi γ1 −   P ˜∗(j) b11 = op(1)

′ p where e1,..., ep is the standard basis of R , j = 1 or 2, γ1 is the coefficient of skewness   1x ¯ −1 n ′ −1 n ¯2 −1 n 2 ¯ of ǫ1, An = n ˜xi˜xi = wherex ¯ = n xi and x = n x . A2n is as i=1 x¯ x¯2 i=1 i=1 i P P P defined in general setup with˜xi in place of x for i 1,...,n . The form of the coefficients i ∈{ } ˜(j1,j2) ˜∗(j1,j2) b31 and b31 are given in the supplementary material Das and Lahiri (2017) for all (j , j ) (a, b): a, b 0, 1, 2, 3 and a + b =3 . 1 2 ∈{ ∈{ } } Note that, the coefficients ˜b(j), 1 j p, all can not vanish together unless γ = 0 and 11 ≤ ≤ 1 ˜∗(j) ˜(j) hence b11 can not converge to b11 unless γ1 = 0. Similarly, it can be shown that same ˜(j,3−j) ˜∗(j,3−j) condition is required to have the closeness of the coefficients b31 and b31 . Hence, the

two EEs can not get closer unless γ1 = 0, similar to the Example 4.1. This is exactly what is stated in the part (c) of Theorem 4.2 in most general form.

4.2. Modification to the bootstrapped pivot

∗ As it has been seen that Hn, the usual studentized version of the perturbation bootstrapped −1/2 estimator is not attending the desired optimal rate op(n ), so in the perspective of statis- tical inference, perturbation bootstrap is not advantageous over asymptotic normal approx- imation. For the sake of obtaining second order correctness, define the modified studentized ∗ βn as ∗ ∗ −1 −1/2 ∗ H˜ = √n(˜σ ) σˆ Σ¯ (βn β¯n) (4.1) n n n n − S.O.C. of Perturbation Bootstrap 17

where

n n σ˜∗ =s ˜∗ τ˜∗−1,τ ˜∗ = n−1 ψ′(ǫ∗)G∗,s ˜∗2 = n−1 ψ2(ǫ∗)(G∗ µ ∗ )2. n n n n i=1 i i n i=1 i i − G P P ˜ ∗ The bootstrapped statistic Hn can be seen to be achieving the optimal rate, namely −1/2 op(n ), in approximating the original studentized M-estimator Hn, which is formally stated in the following theorem:

Theorem 4.3. Suppose, the assumptions (A.1)′-(A.5)′ hold. Also assume EG∗4 < . 1 ∞ n (a) Then there exist constant C > 0 and a sequence of Borel sets Q4n R and 4 ⊆ polynomial a¯∗ ( ,ψ,G∗) depending on first three moments of G∗ and on ψ( ), ψ′( ) n · 1 · · ′′ & ψ ( ) through the residuals ǫ¯ ,..., ¯ǫ , such that given (ǫ ,...,ǫ ) Q4n, with · { 1 n} 1 n ∈ P((ǫ ,...,ǫ ) Q4n) 1, we have for n C , 1 n ∈ → ≥ 4 ˜ ∗ ¯∗ −1/2 sup P∗(Hn B) ξn(x)dx δnn B∈B | ∈ − ZB | ≤ where ξ¯∗(x)=(1+ n−1/2a¯∗ (x,ψ,G∗))φ(x) and δ δ (ǫ ,...,ǫ ) tends to 0. n n n ≡ n 1 n

(b) Suppose, in addition (A.6)(i)′ holds. Then, for the collection of Borel sets defined by (3.1), ˜ ∗ −1/2 sup P∗(Hn B) P(Hn B) = op(n ) B∈B ∈ − ∈

Remark 4.4. The modification that is needed to make the perturbation bootstrap method correct upto second order, suggests that besides incorporating the effect of bootstrap ran- domization through ψ( ) and ψ′( ) in the studentization factor of the bootstrap estimator, it · · is also essential to blend properly the effect of randomization that is coming directly from ∗ the perturbing quantities Gi s.

Remark 4.5. As pointed out by a referee, the usefulness of the above results depend

critically on the rate of the probability P (ǫ ,...,ǫ Qin) , i = 1, 2, 3, 4. Following the 1 n ∈ −1/2 −2+γ2 steps of the proofs, it can be shown that P (ǫ ,...,ǫ Qn) = 1 O n (log n) 1 n ∈ − 4 where Qn = Qin, for some γ (0, 2), although the rate can be improved under moment ∩i=1 2 ∈ condition stronger than (A.3) (ii). In general, if E ψ(ǫ ) 2γ3 + E ψ′(ǫ ) 2γ3 + E ψ′′(ǫ ) γ3 < | 1 | | 1 | | 1 | ∞ for some natural number γ 2, then analogously it can be shown that P (ǫ ,...,ǫ 3 ≥ 1 n ∈ −(2γ3−3)/2 −γ3+γ2 Qn) =1 O n (log n) for some γ (0,γ ). This implies that second order − 2 ∈ 3 correctness of perturbation bootstrap can be established in almost sure sense under higher moment condition. 18 Das D. and Lahiri S. N.

Remark 4.6. The condition (3.1) on the collection of Borel subsets B of Rp, that is con- sidered in the above theorems, is somewhat abstract. This condition is needed for achieving two goals. One is to obtain valid EE for the normalized part of the underlying pivot [cf. Corollary 20.4 of Bhattacharya and Rao (1986)] and the other one is to bound the remain- der term with an order o(n−1/2) with probability (or bootstrap probability) 1 o(n−1/2). − These two together allow us to get EE for the underlying pivots. A natural choice for B is the collection of all Borel measurable convex subsets of Rp.

5. Extension to independent and non-identically distributed errors

In this section, we will extend second order results of perturbation bootstrap to the model (1.1) with independent and non-identically distributed [hereafter referred to as non-IID] errors. Clearly the case of non-IID errors includes the situation when the regression errors are heteroscedastic. In many practical situations, the measurements obtained have different variability due to a number of reasons and hence it is crucial for an inference procedure to be robust towards the presence of heteroscedasticity. We will show that perturbation bootstrap can approximate the exact distribution of the regression M-estimator β¯n upto second order even when the errors are non-IID. Before stating second order result in non-IID case, we describe briefly the literature avail- able on bootstrap methods in heteroscedastic regression. Although there is huge literature available on bootstrap in homoscedastic regression, literature on bootstrap in heteroscedastic regression models is limited. Wu (1986) mentioned the limitation of residual bootstrap in het- eroscedasticity and introduced wild bootstrap in least square regression. Beran (1986) gave justification behind consistency of wild bootstrap. Liu (1988) established second order cor- rectness of wild bootstrap in heteroscedastic least square regression when dimension p = 1. Liu (1988) proposed a modification of residual bootstrap in resampling stage and gave jus- tification behind second order correctness. You and Chen (2006) proved consistency of wild bootstrap in approximating the distribution of least square estimator in semiparametric het- eroscedastic regression model. Davidson and Flachaire (2008) and Davidson and Mackkinnon (2010) developed wild bootstrap procedure for testing the coefficients in heteroscedastic lin- ear regression. Arlot (2009) developed a resampling-based penalization procedure for based on exchangeable weighted bootstrap. We state some additional assumptions needed to establish second order correctness. De- −1 n ′ ′ −1 n ′ 2 fine, A1n = n i=1 xixi Eψ (ǫi) and A2n = n i=1 xixiEψ (ǫi). P P S.O.C. of Perturbation Bootstrap 19

′′ −2 n 12 n 4 ′ 4 −1 (A.2)(iii) n xi + ˜zi max 1, E ψ (ǫ ) = O(n ). i=1 || || i=1 || || { | i | } (A.3)(i)′′ Eψ(Pǫ ) = 0 for all iP 1h,...,n . i i ∈{ } ′′ −1 n 6+υ ′ 6+υ ′′ 4+υ (A.3)(ii) n i=1 E ψ(ǫi) + E ψ (ǫi) + E ψ (ǫi) = O(1) for some υ > 0. | | ∞ | | | | ′′ h′ 2 i (A.6)(i) ψ(Pǫn), ψ (ǫn), ψ (ǫn) satisfies Cramer’s condition in a uniform sense i.e. for n=1 any positive b, 

′ 2 lim sup sup E exp it1ψ(ǫn)+ it2ψ (ǫn)+ it3ψ (ǫn) < 1. n→∞ ||(t1,t2,t3)||>b   

(A.8) A1n and A2n both converge to non-singular matrices as n . →∞

We will denote the assumptions (A.1)-(A.4) by (A.1)′′-(A.4)′′ when (A.2) is defined with (iii)′′ instead of (iii) and (A.3) is defined with (i)′′, (ii)′′ in place of (i) and (ii) respectively.

5.1. Rate of Perturbation Bootstrap Approximation

Note that when the regression errors are non-identically distributed, √nβ¯n has asymptotic −1 −1 variance A1n A2nA1n . Hence, the natural way of defining studentized pivot corresponding to β¯n is −1/2 H˘ n = √nΣ¯ (β¯n β) n − ¯ −1/2 ¯ −1/2 ¯ ¯ −1 n ′ ′ ¯ −1 n ′ 2 where Σn = A2n A1n with A1n = n i=1 xixiψ (¯ǫi), A2n = n i=1 xixiψ (¯ǫi) and ′ ǫ¯ = y x β¯n, i 1,...,n . Define the correspondingP bootstrap pivotP as i i − i ∈{ } ∗ ∗−1/2 ∗ H˘ = √nΣ (βn β¯n) n n − ∗−1/2 ∗−1/2 ∗ ∗ ′ ∗ ∗ −1 n ′ ′ ∗ ∗ ∗ where Σ = A A with ǫ = y x βn, A = n xix ψ (ǫ )G and A = n 2n 1n i i − i 1n i=1 i i i 2n −1 n ′ 2 ∗ 2 n xix ψ (ǫ )(G µ ∗ ) , i 1,...,n . P i=1 i i i − G ∈{ } P Theorem 5.1. Suppose, the assumptions (A.1)′′-(A.4)′′ and (A.5)(i) hold.

n (a) Then there exist constant C > 0 and a sequence of Borel sets Q5n R , such that 5 ⊆ P((ǫ1,...,ǫn) Q5n) 1 as n , and given (ǫ1,...,ǫn) Q5n, n C5 such that ∈ → →∞ ∗ ∈ ≥ there exists a sequence of statistics βn such that { }n≥1 ∗ ∗ −1/2 1/2 −1/2 P βn solves (2.1) and βn β¯n C .n .(logn) 1 o n ∗ || − || ≤ 5 ≥ −     (b) Suppose in addition (A.5)(ii),(iii)′ and (A.8) hold. Then there exist polynomial a˘∗ ( ,ψ,G∗) n · depending on first three moments of G∗ and on ψ( ), ψ′( ) & ψ′′( ) through the residuals 1 · · · ǫ¯ ,..., ǫ¯ , such that given (ǫ , ....., ǫ ) Q5n, we have for n C , { 1 n} 1 n ∈ ≥ 5 ˘ ∗ ˘∗ −1/2 sup P∗(Hn B) ξn(x)dx δnn B∈B | ∈ − ZB | ≤ 20 Das D. and Lahiri S. N.

where ξ˘∗(x)=(1+ n−1/2a˘∗ (x,ψ,G∗))φ(x) and δ δ (ǫ ,...,ǫ ) tends to 0. n n n ≡ n 1 n

(c) Suppose, in addition to the assumptions (A.1)′′-(A.4)′′, (A.5)(i),(ii),(iii)′ and (A.8), (A.6)(i)′′ holds. Then, for the collection of Borel sets defined by (3.1),

˘ ∗ ˘ −1/2 sup P∗(Hn B) P(Hn B) = op(n ) B∈B ∈ − ∈

Remark . ˘ ∗ 5.1 The form of the studentized pivot Hn, defined for achieving second order ˜ ∗ correctness in non-IID case is different from Hn, due to the difference in asymptotic vari- ances of β¯n in two setups. In non-IID case, one cannot ignore computation of the negative square root of a matrix at each bootstrap iteration. But Theorem 5.1 is more general than Theorem 4.3 in the sense that it also includes the case when errors are IID. Note that ¯ ∗ ¯ ∗−1 ¯ ∗ ¯ ∗−1 ¯ ∗ −1 n ′ ′ ∗ ¯ ∗ −1 n ′ 2 ∗ Σn = A1n A2nA1n where A1n = n i=1 xixiψ (ǫi ) and A2n = n i=1 xixiψ (ǫi ) and ∗ ∗ ∗−1 ∗ −1 n ′ ∗ ∗2 −1 n 2 ∗ ¯ ∗ σn = snτn where τn = n i=1 ψ (ǫi ),Psn = n i=1 ψ (ǫi ). We needP to modify Σn and ∗ ∗ ∗ σn to Σn andσ ˜n respectivelyP to achieve second orderP correctness.

Remark 5.2. There is no difference in employing perturbation bootstrap and the usual residual bootstrap with respect to the accuracy of inference. Under some mild conditions, both are second order correct. But in view Theorem 5.1, the advantage of employing per- turbation bootstrap instead of residual counterpart is evident when the errors are no longer identically distributed. Perturbation bootstrap continues to be S.O.C. in non-IID case with- out any modification, whereas a modification in the resampling stage is required for residual bootstrap to achieve the same. To see this, consider the heteroscedastic simple linear regres- sion model

yi = βxi + ǫi (5.1)

2 2 where ǫi’s are independent, Eǫi = 0 and Eǫi = σi . The least square estimator of β is ˆ n n 2 ˆ n 2 2 n 2 2 β = i=1 xiyi/ i=1 xi and hence Var(β) = i=1 xi σi /( i=1 xi ) . The bootstrap observa- tionsP in residualP bootstrap are y∗∗ = x βˆ + e∗Pwhere e∗,...,eP ∗ is a random sample from i i i { 1 n} (e e¯),..., (e e¯) ,e ¯ = n−1 n e and e = y x βˆ, i 1,...,n , are least square { 1 − n − } i=1 i i i − i ∈ { } ˆ∗∗ n ∗∗ n 2 residuals. The residual bootstrappedP least square estimator is β = i=1 xiyi / i=1 xi . n n n Hence, Var(βˆ∗∗ ǫ ,...,ǫ ) = (e e¯)2/ x2 where n−1 [(e P e¯)2 σ2]P 0 as | 1 n i=1 i − i=1 i i=1 i − − i → n . Thus Var(βˆ∗∗ ǫ ,...,ǫP ) is not a consistentP estimator ofPVar(βˆ)and hence residual →∞ | 1 n bootstrap is not second order correct in approximating the distribution of βˆ when errors are heteroscedastic. For details see Liu (1988). On the other hand, if βˆ∗ is the pertur- bation bootstrapped least square estimator, then it is easy to show Var(βˆ∗ ǫ ,...,ǫ ) = | 1 n S.O.C. of Perturbation Bootstrap 21

n 2 2 n 2 2 −1 i=1 xi σi /( i=1 xi ) + Op(n ). Additionally, a centering adjustment is required in the def- Pinition of residualP bootstrapped version of the regression M-estimator to achieve second order correctness even when the regression errors are IID [cf. Lahiri (1992)]; whereas in the perturbation bootstrap no adjustment is needed.

Remark 5.3. In view of second order correctness of bootstrap in heteroscedastic linear regression, Theorem 5.1 is the most general result available. Nonparametric or residual boot- strap fails in heteroscedasticity, as shown by Liu (1988). Liu (1988) developed a weighted bootstrap method as a modification of residual bootstrap in least square setup for the simple n 2 linear regression model (5.1). She proposed the weight to be xi/ i=1 xi corresponding to ith centered residual (e e¯ ), i 1,...,n , to achieve second orderP correctness. There is i − n ∈{ } no general theory available on weighted bootstrap for the multiple linear regression model (1.1) even in heteroscedastic least square setup, to the best our knowledge.

6. Proofs

First we define some notations. Throughout this section, C,C1,C2,... will denote generic constants that do not depend on the variables like n, x, and so on. For a non-negative integral vector α = (α ,α ,...,α )′ and a function f = (f , f ,...,f ): Rl Rl, l 1, write 1 2 l 1 2 l → ≥ α = α + ... + α , α! = α ! ...α !, f α = (f α1 ) ... (f αl ). For t = (t ,...t )′ Rl and α as | | 1 l 1 l 1 l 1 l ∈ α α1 αl B above, define t = t1 ...tl . The collection will always be used to denote the collection

Rp ∗ 2 of Borel subsets of which satisfy (3.1). µG and σG∗ will respectively denote mean and ∗ variance of G1. We want to mention here that only the important steps are presented in the proofs of the proposition and the theorems. For further details see the supplementary material Das and Lahiri (2017). Although the proofs for second order results of perturbation bootstrap go through more or less same line as that for residual bootstrap in Lahiri (1992), the advantage in perturbation bootstrap is that the perturbing quantities are independent of the regression errors and hence it is much easier to obtain suitable stochastic approximation to the bootstrapped pivot and finally the EE than the same in case of residual bootstrap. On the negative side, in our proofs atleast we need Cramer’s condition separately on regression errors and on the perturbing quantities [see assumptions (A.5) and (A.6)], whereas for resid- ual bootstrap, one can derive a restricted Cramer’s condition on resampled residuals from the Cramer’s condition on regression errors to obtain second order correctness. Moreover, second order results can be established for residual bootstrap, after a modification, without any Cramer type condition in the case p = 1 [cf. Karabulut and Lahiri (1997)]. We do not know yet if similar conclusion can be drawn in case of perturbation bootstrap. 22 Das D. and Lahiri S. N.

Before coming to the proofs we state some lemmas:

′ Lemma 6.1. Let, Yi = (Y ,Y ) , 1 i n be a collection of mean zero independent { i1 i2 ≤ ≤ } random vectors. Define, for some non random vectors l1i and l2i of dimensions p1 and p2 n ′ 2 n 4 1/2 −1/2 respectively with ljil = Ip and γ˜ =( lji ) = O(n ), i=1 ji j n j=1 i=1 || || P P P n ′ ′ ′ ˜ −1/2 Ui =(l1iYi1, l2iYi2) , Vn = Cov Ui , Ui = Vn Ui  Xi=1  n −1 n 3 2 −1 for 1 i n, and Sn = U˜ i. Let α˜n = n E Yi I( Yi >λγ˜ ), where I( ) is ≤ ≤ i=1 i=1 || || || || n · the indicator function andPλ satisfies 0 <λ< limP inf λn, λi = the smallest eigen value of Σi, n→∞ Σi = Cov(Yi). Suppose, M0n , Min , i =1,...,p be (p + 1) sequence of matrices { }n≥1 { }n≥1 such that for each n 1, M0n is of order p (p + r). and Min, 1 i p, are of order ≥ × ≤ ≤ ′ ′ ′ (p + r) (p + r), p 1, r 1. Let, k = p + r, M¯ 0n = [0 : I ] and M˜ 0n = [M0n : M¯ ] . × ≥ ≥ r r×k 0n k p ′ ′ ′ k Define the functions g : R R by g (x) = M0nx +(x M1nx,..., x Mpnx) , x R , n → n ∈ n 1. Assume that ≥ −1 n 3 (a) there exists a constant k such that n E Yi

(c) the characteristic function g of Yn satisfies lim sup sup t g (t) < 1 for all n n→∞ ||( )||>b | n | b> 0.

(d) max Min :1 i p = O(˜γ ). {|| || ≤ ≤ } n k (e) M0n = O(1), lim inf inf M˜ 0nu : u = 1, u R δ for some constant || || n→∞ {|| || || || ∈ } ≥ δ > 0.

Then for the class B of Borel sets satisfying (3.1),

˚ sup P(gn(Sn) B) ξn(x)dx = o(˜γn) as n B∈B ∈ − B →∞ Z

˚ −1/2 ˚ ′ where ξn(.)= (1+ n ˚a( ))φD˚n ( ), Dn = M0nM0n and ˚a( ) is a polynomial whose coeffi- · · α · cients are continuous functions of E(Yi) , α 3 and i 1,...,n . | | ≤ ∈{ } proof :

The above Lemma follows from Theorem 20.6 of Bhattacharya and Rao (1986) and retracting the proofs of Lemma 3.1 and 3.2 of Lahiri (1992).

Lemma 6.2. Under the assumptions (A.1)-(A.3) or (A.1)′′-(A.3)′′, it follows that 1/2 n 4 −1/2 ˘vi = O (n ). i=1 || || p  P  S.O.C. of Perturbation Bootstrap 23

proof :

See supplementary material Das and Lahiri (2017).

Lemma 6.3. Under the assumptions (A.2) (i) and (A.2) (iii) or (A.2) (iii)′′, the following is true. 1/4 1/2 n 6 n 4 −1/2 (a) di + di = O(n ). i=1 || || i=1 || || n j (b)  P xi = O(n) Pfor j = 3,4, 5, 6, 6+2α when the errors are IID and for j = i=1 || || 6+2P α, 3,..., 12 when the errors are non-IID. proof :

This lemma follows from assumption (A.2) and by applying H¨olders inequality.

We present only outline of the proofs of the main results from Section 4 and 5 to save space. For details, see the supplementary material Das and Lahiri (2017).

6.1. Outline of the proof of Proposition 4.1

Suppose, n ′ ∗ ∗ xiψ(y x t )G = 0 i − i n i Xi=1 Then by Taylor’s expansion we have,

n n n ′ ¯ ∗ 2 ∗ ′ ∗ ′ ∗ [xi (βn tn)] ′′ ∗ xiψ(¯ǫ )G + xix (β¯n t )ψ (¯ǫ )G + xi − ψ (u )G = 0 (6.1) i i i − n i i 2 i i Xi=1 Xi=1 Xi=1 where for each i 1,...,n , u ǫ¯ ǫ∗ ¯ǫ . ∈{ } | i − i|≤| i − i| Now (6.1) can be written as ∗ ∗ ∗ ∗ L (t β¯n) = ∆ + R (6.2) n n − n n where n ∆∗ = n−1 x ψ(¯ǫ )(G∗ µ ∗ ) n i=1 i i i − G ∗ −1 n ′ ′ ∗ Ln = n Pi=1 xixiψ (¯ǫi)Gi

∗ −1 n ′ ′ ∗ E∗Ln = nP i=1 xixiψ (¯ǫi)µG [x′ (β¯ t∗ )]2 R∗ = n−1 nP x i n − n ψ′′(u )G∗ n i=1 i 2 i i By FukP and Nagaev inequality (1971) [hereafter referred to as FN(71)], lemma 6.3, the Lipschitz property of ψ′′( ) and the Taylor’s expansion of ψ( ) and ψ′( ), it follows that · · · 24 Das D. and Lahiri S. N.

there exist a constant C > 0 and a sequence of Borel sets Q Rn, such that given n ⊆ (ǫ , ....., ǫ ) Q with P((ǫ , ...... , ǫ ) Q ) 1 , for n C and any 0 <ǫ< 1, 1 n ∈ n 1 n ∈ n → ≥ n 3+α ∗ ∗ −1/2 P xi (G EG ) > nǫ = o(n ) (6.3) ∗ || || i − i  i=1  X

n P x x ψ′(¯ǫ )(G∗ EG∗) > nǫ = o(n−1/2), j,k 1,...,p (6.4) ∗ ij ik i i − i ∈{ }  i=1  X

∗ −1/2 1/2 −1/2 P∗ ∆n > C.n (logn) = o(n ) (6.5) || || 

Hence, from (6.3)-(6.5), on the set Qn and given (ǫ , ....., ǫ ) Qn with P((ǫ , ...... , ǫ ) 1 n ∈ 1 n ∗ ∗ Qn) 1, for n C , (6.2) can be rewritten as (t β¯ ) = f (t β¯ ), where f is ∈ → ≥ 1 n − n n n − n n p p ∗ −1/2 1/2 a continuous function from R to R satisfying P ( f (tn β¯ ) C .n (logn) ) = ∗ || n − n || ≤ 1 −1/2 ∗ −1/2 1/2 1 o(n ) as n whenever t β¯n C .n (logn) for some constants C > 0. − →∞ || n − || ≤ 1 1 Hence, Proposition 4.1 follows by Brouwer’s fixed point theorem.

6.2. Outline of the proof of Theorem 4.1

∗ Consider, the sequence of statistics βn which satisfies the proposition. Then (6.2) can { }n≥1 be written as

∗ ∗−1 ∗ ∗ ∗ √n(βn β¯n)= L √n[∆ +χ ˜ + R ] (6.6) − n n n 1n ∗−1 ∗ ∗ = Ln √n∆n + R2n (6.7)

′ ∗ ¯ 2 ∗ −1 n [xi (βn βn)] ′′ ∗ whereχ ˜ = n xi − ψ (¯ǫ )G n i=1 2 i i P Now, by FN(71), for some constant C > 0,

P ( R∗ > C.n−(2+α)/2(logn)(2+α)/2)= o (n−1/2) ∗ || 1n|| p and P ( R∗ > C.n−1/2(logn)) = o (n−1/2) ∗ || 2n|| p Again, ∗−1 ∗ −1 ∗ ˜∗ Ln =(E∗Ln) + Wn + Zn (6.8) where S.O.C. of Perturbation Bootstrap 25

W ∗ =(E L∗ )−1(E L∗ L∗ )(E L∗ )−1 n ∗ n ∗ n − n ∗ n Z˜∗ =(E L∗ )−1(E L∗ L∗ )(E L∗ )−1(E L∗ L∗ )L∗−1 n ∗ n ∗ n − n ∗ n ∗ n − n n

Now, it can be shown by FN(71) that for some constant C > 0, as n C , 1 ≥ 1 P ( Z˜∗ >C .n−1/2(logn)−1) ∗ || n|| 1 P ( L∗ E L∗ >C .n−1/4(logn)−1/2) ≤ ∗ || n − ∗ n|| 1 −1/2 = op(n ) (6.9)

Therefore, it follows that there exists C2 > 0 and a sequence of Borel sets Q2n, such that

P((ǫ , ...... , ǫ ) Q2n) 1 as n , and given (ǫ , ....., ǫ ) Q2n and n C , 1 n ∈ → →∞ 1 n ∈ ≥ 2

∗ ∗ −1 ∗ ∗ ∗ ∗ −1 ∗ ∗ √n(βn β¯n)=(E L ) √n∆ + W √n∆ +(E L ) √nχ + R (6.10) − ∗ n n n n ∗ n n 3n ′ ∗ −1 ∗ 2 n [xi ((E∗Ln) ∆n)] where χ∗ = n−1 x ψ′′(¯ǫ )µ ∗ n i=1 i 2 i G and P P ( R∗ = o(n−1/2))=1 o(n−1/2) ∗ || 3n|| − ¯ −1/2 Since Σn = Op(1), so by argument similar to (4.12) of Qumsiyeh (1990a), we have

\[
\sup_{B\in\mathcal{B}}\big|P_*(F_n^* \in B) - P_*(U_n^* \in B)\big| = o_p(n^{-1/2}) \tag{6.11}
\]

where $U_n^* = \sqrt{n}\,\bar\Sigma_n^{-1/2}\Big[(E_*L_n^*)^{-1}\Delta_n^* + W_n^*\Delta_n^* + (E_*L_n^*)^{-1}\chi_n^*\Big]$.

Now, for all $1 \le i \le n$, defining $Y_i^* = (G_i^* - \mu_{G^*})$, $X_i^* = \breve{v}_i Y_i^*$, $V_n^* = \sum_{i=1}^{n}\mathrm{Cov}_*(X_i^*)$, $\tilde X_i^* = V_n^{*-1/2}X_i^*$ and $S_n^* = \sum_{i=1}^{n}\tilde X_i^*$, it can be established that
\[
U_n^* = M_{0n}^* S_n^* + \big(S_n^{*\prime}M_{1n}^* S_n^*, \ldots, S_n^{*\prime}M_{pn}^* S_n^*\big)' \tag{6.12}
\]
where $\|M_{0n}^*\| = O_p(1)$ and $\|M_{jn}^*\| = O_p(n^{-1/2})$ for all $j \in \{1,\ldots,p\}$.

Therefore, by Lemmas 6.1 and 6.2,

\[
\sup_{B\in\mathcal{B}}\Big|P_*(U_n^* \in B) - \int_B \xi_n^*(x)\,dx\Big| = o_p(n^{-1/2}) \quad \text{as } n \to \infty \tag{6.13}
\]
where
\[
\xi_n^*(x) = \bigg[1 - n^{-1/2}\bigg\{\sum_{|\nu|=1} b_{11}^{*(\nu)}\,D^{\nu} + \sum_{|\nu|=3} \frac{b_{31}^{*(\nu)}}{\nu!}\,D^{\nu}\bigg\}\bigg]\phi(x) \tag{6.14}
\]
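For $p = 1$ the operator notation in (6.14) is easy to unpack: $D\phi = -\mathrm{He}_1\phi$ and $D^{3}\phi = -\mathrm{He}_3\phi$, where $\mathrm{He}_k$ denotes the probabilists' Hermite polynomials. The following sketch evaluates such a two-term EE density with hypothetical scalar coefficients (the actual $b_{11}^{*(\nu)}$ and $b_{31}^{*(\nu)}$ are the data-dependent quantities given in (6.15)–(6.16) below):

```python
import numpy as np
from scipy.stats import norm

def edgeworth_density_1d(x, n, b11, b31):
    # Two-term EE for p = 1:
    #   xi(x) = [1 - n^{-1/2} (b11 * D + (b31/3!) * D^3)] phi(x),
    # using D^k phi(x) = (-1)^k He_k(x) phi(x), He_k = probabilists' Hermite.
    he1 = x                      # He_1(x)
    he3 = x**3 - 3.0 * x         # He_3(x)
    return norm.pdf(x) * (1.0 + n**-0.5 * (b11 * he1 + (b31 / 6.0) * he3))

# Hypothetical coefficient values, purely for illustration
x = np.linspace(-4.0, 4.0, 9)
print(edgeworth_density_1d(x, n=100, b11=0.2, b31=0.8))
```

Since the Hermite corrections integrate to zero against $\phi$, the displayed density still integrates to one; the two correction terms tilt the normal density for bias and skewness at order $n^{-1/2}$.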

Now, the coefficients $b_{11}^{*(\nu)}$ and $b_{31}^{*(\nu)}$ can be computed using the transformation techniques of Bhattacharya and Ghosh (1978). If $\nu_1$ is a $p \times 1$ vector with all elements $0$ except the $j$th one, and $\nu_2$ is a $p \times 1$ vector with all elements $0$ except those at the $j_1$, $j_2$ and $j_3$ positions, then after some algebraic calculations it can be shown that

\[
b_{11}^{*(\nu_1)} = \sum_{k=1}^{p} h_{jkn}\bigg[n^{-1}\sum_{i=1}^{n} z_i' E_{kn}^* \bar A_{1n}^{-1} x_i\,\psi(\bar\epsilon_i)\psi'(\bar\epsilon_i)\bigg] + (2n)^{-1}\sum_{i=1}^{n} a_{jin}^*\, x_i'\bar A_{1n}^{-1}\bar A_{2n}\bar A_{1n}^{-1} x_i\,\psi''(\bar\epsilon_i) \tag{6.15}
\]

\begin{align}
b_{31}^{*(\nu_2)} =\;& n^{-1}\sum_{i=1}^{n}\bigg(\prod_{m=1}^{3} a_{j_m in}^*\bigg)\psi^{3}(\bar\epsilon_i)\notag\\
&+ 2n^{-2}\sum_{i,j=1}^{n} a_{j_1 in}^* a_{j_2 in}^*\bigg(\sum_{k=1}^{p} h_{j_3 kn}\, z_i' E_{kn}^*\bar A_{1n}^{-1} x_j\bigg)\psi^{2}(\bar\epsilon_i)\psi(\bar\epsilon_j)\psi'(\bar\epsilon_j)\notag\\
&+ 2n^{-2}\sum_{i,j=1}^{n} a_{j_1 in}^* a_{j_3 in}^*\bigg(\sum_{k=1}^{p} h_{j_2 kn}\, z_i' E_{kn}^*\bar A_{1n}^{-1} x_j\bigg)\psi^{2}(\bar\epsilon_i)\psi(\bar\epsilon_j)\psi'(\bar\epsilon_j)\notag\\
&+ 2n^{-2}\sum_{i,j=1}^{n} a_{j_2 in}^* a_{j_3 in}^*\bigg(\sum_{k=1}^{p} h_{j_1 kn}\, z_i' E_{kn}^*\bar A_{1n}^{-1} x_j\bigg)\psi^{2}(\bar\epsilon_i)\psi(\bar\epsilon_j)\psi'(\bar\epsilon_j)\notag\\
&+ 3n^{-3}\sum_{i,j,l=1}^{n} a_{j_1 in}^* a_{j_2 in}^* a_{j_3 in}^*\Big(x_j'\bar A_{1n}^{-1} x_l\, x_l'\bar A_{1n}^{-1} x_i\Big)\psi'(\bar\epsilon_l)\psi''(\bar\epsilon_i)\psi^{2}(\bar\epsilon_j) \tag{6.16}
\end{align}
where $\bar A_{1n}$ and $\bar A_{2n}$ are as defined earlier, $\bar A_{2n}^{-1/2} = (\bar h_{1n},\ldots,\bar h_{pn})$, $\bar h_{jn}' x_i = a_{jin}^*$, $\bar h_{jn} = (h_{1jn},\ldots,h_{pjn})'$, $j \in \{1,\ldots,p\}$, $i \in \{1,\ldots,n\}$, and $E_{kn}^*$ is a $q \times p$ matrix with $\|E_{kn}^*\| \le q$ for all $k \in \{1,\ldots,p\}$.

Now, one can find the two-term EE of $F_n = \sqrt{n}\,\sigma^{-1}A_n^{1/2}(\bar\beta_n - \beta)$ in a similar way (for details see Lahiri (1992)), namely

\[
\sup_{B\in\mathcal{B}}\Big|P(F_n \in B) - \int_B \xi_n(x)\,dx\Big| = o(n^{-1/2}) \quad \text{as } n \to \infty \tag{6.17}
\]
where
\[
\xi_n(x) = \bigg[1 - n^{-1/2}\bigg\{\sum_{|\nu|=1} b_{11}^{(\nu)}\,D^{\nu} + \sum_{|\nu|=3} \frac{b_{31}^{(\nu)}}{\nu!}\,D^{\nu}\bigg\}\bigg]\phi(x) \tag{6.18}
\]
and the coefficients $b_{11}^{(\nu)}$ and $b_{31}^{(\nu)}$ are such that for all $j, j_1, j_2, j_3 \in \{1,\ldots,p\}$, both $b_{11}^{*(\nu_1)} - b_{11}^{(\nu_1)}$ and $b_{31}^{*(\nu_2)} - b_{31}^{(\nu_2)}$ can be shown to converge in probability to $0$. Hence, by (6.12)–(6.18), Theorem 4.1 follows.

6.3. Outline of the proof of Theorem 4.2

We have,

\[
H_n^* = \sqrt{n}\,\sigma_n^{*-1}\hat\sigma_n\,\bar\Sigma_n^{-1/2}(\beta_n^* - \bar\beta_n) \tag{6.19}
\]
where $\sigma_n^*$ is as defined earlier. Now, using Taylor's expansion and the Lipschitz property of $\psi''(\cdot)$, it can be established that

\[
H_n^* = F_n^* - \sqrt{n}\,\hat\sigma_n\,\bar\Sigma_n^{-1/2}\,Z_n^*\big((E_*L_n^*)^{-1}\Delta_n^*\big) + R_{4n}^* \tag{6.20}
\]
where
\[
Z_n^* = \big(2 s_n^{3}\tau_n\big)^{-1}\bigg\{2\tau_n s_n^{2}\bigg[\frac{1}{n}\sum_{i=1}^{n}\psi''(\bar\epsilon_i)\,x_i'\big((E_*L_n^*)^{-1}\Delta_n^*\big)\bigg] - \tau_n^{2}\bigg[\frac{1}{n}\sum_{i=1}^{n}\psi(\bar\epsilon_i)\psi'(\bar\epsilon_i)\,x_i'\big((E_*L_n^*)^{-1}\Delta_n^*\big)\bigg]\bigg\}
\]

and there exist a constant $C_3 > 0$ and a sequence of Borel sets $Q_{3n}$ such that $P(Q_{3n}) \uparrow 1$ and, given $(\epsilon_1,\ldots,\epsilon_n) \in Q_{3n}$ and $n \ge C_3$,
\[
P_*\big(\|R_{4n}^*\| = o(n^{-1/2})\big) = 1 - o(n^{-1/2}) \tag{6.21}
\]
Therefore, writing $H_n^*$ as $H_n^* = \tilde U_n^* + R_{4n}^*$, we have

\[
\tilde U_n^* = \tilde M_{0n}^* S_n^* + \big(S_n^{*\prime}\tilde M_{1n}^* S_n^*, \ldots, S_n^{*\prime}\tilde M_{pn}^* S_n^*\big)' \tag{6.22}
\]

where $\|\tilde M_{0n}^*\| = O_p(1)$ and $\|\tilde M_{jn}^*\| = O_p(n^{-1/2})$ for all $j \in \{1,\ldots,p\}$.

Hence, by Lemma 6.1,

\[
\sup_{B\in\mathcal{B}}\Big|P_*(\tilde U_n^* \in B) - \int_B \tilde\xi_n^*(x)\,dx\Big| = o_p(n^{-1/2}) \quad \text{as } n \to \infty \tag{6.23}
\]
where
\[
\tilde\xi_n^*(x) = \bigg[1 - n^{-1/2}\bigg\{\sum_{|\nu|=1} \tilde b_{11}^{*(\nu)}\,D^{\nu} + \sum_{|\nu|=3} \frac{\tilde b_{31}^{*(\nu)}}{\nu!}\,D^{\nu}\bigg\}\bigg]\phi(x) \tag{6.24}
\]
Hence, part (a) follows by (4.12) of Qumsiyeh (1990a).

Suppose the two-term EE of the original studentized regression M-estimator $H_n = \sqrt{n}\,\hat\sigma_n^{-1}A_n^{1/2}(\bar\beta_n - \beta)$ is
\[
\tilde\xi_n(x) = \bigg[1 - n^{-1/2}\bigg\{\sum_{|\nu|=1} \tilde b_{11}^{(\nu)}\,D^{\nu} + \sum_{|\nu|=3} \frac{\tilde b_{31}^{(\nu)}}{\nu!}\,D^{\nu}\bigg\}\bigg]\phi(x) \tag{6.25}
\]

Now part (b) of Theorem 4.2 follows directly by comparing (6.24) and (6.25). Again, after some algebraic calculations, it can be shown that $\tilde b_{11}^{(\nu)}$ and $\tilde b_{31}^{(\nu)}$ both contain terms involving $\big[2E\psi^{2}(\epsilon_1)\,E\psi(\epsilon_1)\psi'(\epsilon_1) - E\psi'(\epsilon_1)\,E\psi^{3}(\epsilon_1)\big]$, which cannot be replicated by the terms present in $\tilde b_{11}^{*(\nu)}$ and $\tilde b_{31}^{*(\nu)}$ [cf. supplementary material Das and Lahiri (2017)]. Hence part (c) of Theorem 4.2 follows.
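To see why this quantity is a skewness effect (cf. the discussion in Section 7), specialize to least squares, $\psi(x) = x$, with $E\epsilon_1 = 0$:
\[
2E\psi^{2}(\epsilon_1)\,E\psi(\epsilon_1)\psi'(\epsilon_1) - E\psi'(\epsilon_1)\,E\psi^{3}(\epsilon_1) \;=\; 2E\epsilon_1^{2}\cdot E\epsilon_1 - E\epsilon_1^{3} \;=\; -E\epsilon_1^{3},
\]
the third moment of the error distribution, which the naive studentized perturbation bootstrap fails to reproduce.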

6.4. Outline of the proof of Theorem 4.3

We have the modified studentized bootstrapped M-estimator as,

\[
\tilde H_n^* = \sqrt{n}\,(\tilde\sigma_n^*)^{-1}\hat\sigma_n\,\bar\Sigma_n^{-1/2}(\beta_n^* - \bar\beta_n) \tag{6.26}
\]
where $\tilde\sigma_n^* = \tilde s_n^*\,\tilde\tau_n^{*-1}$, $\tilde\tau_n^* = n^{-1}\sum_{i=1}^{n}\psi'(\epsilon_i^*)G_i^*$ and $\tilde s_n^{*2} = n^{-1}\sum_{i=1}^{n}\psi^{2}(\epsilon_i^*)(G_i^* - \mu_{G^*})^{2}$. Also suppose $\bar\tau_n = \mu_{G^*}\tau_n$ and $\bar s_n^{2} = \sigma_{G^*}^{2}s_n^{2}$.
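The modified factor $\tilde\sigma_n^*$ is directly computable from the bootstrap residuals $\epsilon_i^* = y_i - x_i'\beta_n^*$ and the weights. A minimal sketch, reusing the illustrative Huber score of the earlier snippet (all names here are for illustration; the a.e. derivative of the Huber $\psi$ is hard-coded):

```python
import numpy as np

def modified_sigma_star(X, y, beta_star, G, mu_G, k=1.345):
    """sigma~_n* = s~_n* / tau~_n* as in (6.26), from bootstrap residuals."""
    eps_star = y - X @ beta_star                       # epsilon_i* = y_i - x_i' beta_n*
    psi = np.clip(eps_star, -k, k)                     # illustrative Huber psi
    psi_prime = (np.abs(eps_star) <= k).astype(float)  # its a.e. derivative psi'
    tau_star = np.mean(psi_prime * G)                  # tau~_n* = n^-1 sum psi'(eps*) G_i*
    s2_star = np.mean(psi**2 * (G - mu_G)**2)          # s~_n*^2 = n^-1 sum psi^2(eps*)(G_i*-mu)^2
    return np.sqrt(s2_star) / tau_star
```

Dividing $\sqrt{n}\,\hat\sigma_n\bar\Sigma_n^{-1/2}(\beta_n^* - \bar\beta_n)$ by this factor, rather than by $\sigma_n^*$, is exactly the modification that Theorem 4.3 shows to be second order correct.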

Now, using the same line of argument as that behind (6.20) in the proof of Theorem 4.2, it can be shown that

\[
\tilde H_n^* = F_n^* - \sqrt{n}\,\hat\sigma_n\,\bar\Sigma_n^{-1/2}\,(Z_n^* - \bar Z_n^*)\big((E_*L_n^*)^{-1}\Delta_n^*\big) + R_{5n}^* \tag{6.27}
\]
where $Z_n^*$ is as defined in the proof of Theorem 4.2 and $\bar Z_n^*$ is defined as

\[
\bar Z_n^* = 2^{-1}\bar\tau_n^{-1}\bar s_n^{-2}\bigg\{2\bar\tau_n\bar s_n\bigg[n^{-1}\sum_{i=1}^{n}\psi'(\bar\epsilon_i)(G_i^* - \mu_{G^*})\bigg] - \bar\tau_n^{2}\bigg[n^{-1}\sum_{i=1}^{n}\psi^{2}(\bar\epsilon_i)\big\{(G_i^* - \mu_{G^*})^{2} - \sigma_{G^*}^{2}\big\}\bigg]\bigg\}
\]
and there exist a constant $C_4 > 0$ and a sequence of Borel sets $Q_{4n}$ such that $P(Q_{4n}) \uparrow 1$ and, given $(\epsilon_1,\ldots,\epsilon_n) \in Q_{4n}$ and $n \ge C_4$,
\[
P_*\big(\|R_{5n}^*\| = o(n^{-1/2})\big) = 1 - o(n^{-1/2}) \tag{6.28}
\]

Therefore, defining $Y_{1i}^* = G_i^* - \mu_{G^*}$, $Y_{2i}^* = (G_i^* - \mu_{G^*})^{2} - \sigma_{G^*}^{2}$, $X_i^* = \big(\bar v_i'Y_{1i}^*,\; n^{-1/2}\psi^{2}(\bar\epsilon_i)Y_{2i}^*\big)'$, $V_n^* = \sum_{i=1}^{n}\mathrm{Cov}_*(X_i^*)$, $\tilde X_i^* = V_n^{*-1/2}X_i^*$ and $\bar S_n^* = \sum_{i=1}^{n}\tilde X_i^*$, with $\bar v_i$ defined as $\breve v_i$ but with $\breve z_i$ in place of $z_i$.

Hence, we have $\tilde H_n^* = \bar U_n^* + R_{5n}^*$, where

\[
\bar U_n^* = \bar M_{0n}^* \bar S_n^* + \big(\bar S_n^{*\prime}\bar M_{1n}^* \bar S_n^*, \ldots, \bar S_n^{*\prime}\bar M_{pn}^* \bar S_n^*\big)' \tag{6.29}
\]

with $\|\bar M_{0n}^*\| = O_p(1)$ and $\|\bar M_{jn}^*\| = O_p(n^{-1/2})$ for all $j \in \{1,\ldots,p\}$.

Hence, there exists a two-term EE $\bar\xi_n^*(\cdot)$, as in Theorem 4.2, such that
\[
\sup_{B\in\mathcal{B}}\Big|P_*(\tilde H_n^* \in B) - \int_B \bar\xi_n^*(x)\,dx\Big| = o_p(n^{-1/2}) \quad \text{as } n \to \infty \tag{6.30}
\]
Now, $\bar\xi_n^*(\cdot)$ can be found explicitly as in the standardized case; see the supplementary material Das and Lahiri (2017) for more details. If $\bar\xi_n^*(\cdot)$ is compared with $\tilde\xi_n(\cdot)$, given by (6.25), then it can be established that all the coefficients of $\bar\xi_n^*(\cdot)$ are close in probability to those of $\tilde\xi_n(\cdot)$, unlike in the case of the naive studentized bootstrapped estimator. One point we want to make here is that the term $\bar Z_n^*$, which is present in the expression of $\tilde H_n^*$ but not in that of $H_n^*$, introduces important third order terms that are crucial for achieving second order correctness. Therefore, Theorem 4.3 follows.

6.5. Outline of the proof of Theorem 5.1

See supplementary material Das and Lahiri (2017).

7. Conclusion

Second order results for the perturbation bootstrap method in regression M-estimation are established. It is shown that the classical way of studentization in the perturbation bootstrap setup is not sufficient for correcting the distribution of the regression M-estimator up to second order; this generalizes the fact that the usual studentized perturbation bootstrapped estimator cannot correct for the effect of the skewness of the error distribution in least squares regression. A novel modification is proposed in the general setup by properly incorporating the effect of the randomness of the perturbing quantities into the prevalent studentization factor, and this modification is shown to be second order correct in both the IID and the non-IID error setups. Thus, the results in this paper establish the perturbation bootstrap method as a refinement over asymptotic normality in approximating the exact distribution of the regression M-estimator. The second order result in the non-IID case establishes robustness of the perturbation bootstrap to the presence of heteroscedasticity, similar to the wild bootstrap, but in the more general setup of M-estimation. This is an important finding from the perspective of S.O.C. inferences regarding the regression parameters.

Acknowledgement

The authors would like to thank the two referees, the associate editor and the editor for many constructive comments. In particular, they encouraged the authors to add a section on the performance of the perturbation bootstrap when the errors are heteroscedastic (Section 5).

Supplementary Material

Supplement to “Second Order Correctness of Perturbation Bootstrap M-Estimator of Multiple Linear Regression Parameter” (.pdf). Details of the proofs are provided.

References

[1] ALLEN, M. and DATTA, S. (1999). A Note on Bootstrapping M-Estimators in ARMA Models. J. Time Series Analysis 20 365–379.
[2] ARCONES, M. A. and GINÉ, E. (1992). On the bootstrap of M-estimators and other statistical functionals. In Exploring the Limits of Bootstrap (R. LePage and L. Billard, eds.) 13–47. Wiley, New York.
[3] ARLOT, S. (2009). Model selection by resampling penalization. Electron. J. Statist. 3 557–624.
[4] BARBE, P. and BERTAIL, P. (2012). The Weighted Bootstrap. Lecture Notes in Statistics 98. Springer, New York.
[5] BERAN, R. (1986). Discussion: Jackknife, Bootstrap and Other Resampling Methods in Regression Analysis. Ann. Statist. 14 1295–1298.
[6] BHATTACHARYA, R. N. and GHOSH, J. K. (1978). On the validity of the formal Edgeworth expansion. Ann. Statist. 6 434–451.
[7] BHATTACHARYA, R. N. and RANGA RAO, R. (1986). Normal Approximation and Asymptotic Expansions. John Wiley & Sons.
[8] BICKEL, P. J. and FREEDMAN, D. A. (1981b). Some Asymptotic Theory for the Bootstrap. Ann. Statist. 9 1196–1217.
[9] CHATTERJEE, S. (1999). Generalised bootstrap techniques. Ph.D. dissertation, Indian Statistical Institute, Calcutta.
[10] CHATTERJEE, S. and BOSE, A. (2005). Generalized bootstrap for estimating equations. Ann. Statist. 33 414–436.
[11] CHATTERJEE, A. and LAHIRI, S. N. (2013). Rates of convergence of the adaptive LASSO estimators to the oracle distribution and higher order refinements by the bootstrap. Ann. Statist. 41 1232–1259.
[12] CHENG, G. and HUANG, J. Z. (2010). Bootstrap consistency for general semiparametric M-estimation. Ann. Statist. 38 2884–2915.
[13] CHENG, G. (2015). Moment Consistency of the Exchangeably Weighted Bootstrap for Semiparametric M-estimation. Scand. J. Statist. 42 665–684.
[14] DAS, D. and LAHIRI, S. N. (2017). Supplement to “Second Order Correctness of Perturbation Bootstrap M-Estimator of Multiple Linear Regression Parameter”.
[15] DAVIDSON, R. and FLACHAIRE, E. (2008). The wild bootstrap, tamed at last. Journal of Econometrics 146 162–169.
[16] DAVIDSON, R. and MACKINNON, J. G. (2010). Wild Bootstrap Tests for IV Regression. Journal of Business & Economic Statistics 28 128–144.
[17] EFRON, B. (1979). Bootstrap Methods: Another Look at the Jackknife. Ann. Statist. 7 1–26.
[18] EL BANTLI, F. (2004). M-estimation in linear models under nonstandard conditions. J. Statist. Plann. Inference 121 231–248.
[19] FENG, X., HE, X. and HU, J. (2011). Wild bootstrap for quantile regression. Biometrika 98 995–999.
[20] FREEDMAN, D. A. (1981). Bootstrapping Regression Models. Ann. Statist. 9 1218–1228.
[21] FUK, D. H. and NAGAEV, S. V. (1971). Probabilistic inequalities for sums of independent random variables. Teor. Verojatnost. i Primenen. 16 660–675.
[22] HAEUSLER, E., MASON, D. M. and NEWTON, M. A. (1991). Weighted Bootstrapping of Means. Centrum voor Wiskunde en Informatica Quarterly 4 213–228.
[23] HALL, P. (1992). The Bootstrap and Edgeworth Expansion. Springer Series in Statistics. Springer, New York.
[24] HLAVKA, Z. (2003). Asymptotic properties of robust three-stage procedure based on bootstrap for M-estimator. J. Statist. Plann. Inference 115 637–656.
[25] HU, F. (1996). Efficiency and Robustness of a Resampling M-Estimator in the Linear Model. J. Multivariate Analysis 78 252–271.
[26] HU, F. and KALBFLEISCH, J. D. (2000). The estimating function bootstrap. The Canadian Journal of Statistics 28 449–499.
[27] HUBER, P. (1981). Robust Statistics. Wiley, New York.
[28] JIN, Z., YING, Z. and WEI, L. J. (2001). A simple resampling method by perturbing the minimand. Biometrika 88 381–390.
[29] KARABULUT, I. K. and LAHIRI, S. N. (1997). Two-term Edgeworth expansion for M-estimators of a linear regression parameter without Cramér-type conditions and an application to the bootstrap. Proceedings of the Australian Mathematical Society, Ser. A 62 361–370.
[30] KLINE, P. and SANTOS, A. (2012). A Score Based Approach to Wild Bootstrap Inference. Journal of Econometric Methods 1 23–41.
[31] LAHIRI, S. N. (1989b). Bootstrap approximation and Edgeworth expansion for the distributions of the M-estimators of a regression parameter. Preprint 89-36, Dept. Statistics, Iowa State Univ.
[32] LAHIRI, S. N. (1992). Bootstrapping M-estimators of a multiple linear regression parameter. Ann. Statist. 20 1548–1570.
[33] LAHIRI, S. N. (1994). On two-term Edgeworth expansions and bootstrap approximations for Studentized multivariate M-estimators. Sankhya A 56 201–226.
[34] LAHIRI, S. N. (1996). On Edgeworth Expansion and Moving Block Bootstrap for Studentized M-Estimators in Multiple Linear Regression Models. J. Multivariate Analysis 56 42–59.
[35] LAHIRI, S. N. and ZHU, J. (2006). Resampling methods for spatial regression models under a class of stochastic designs. Ann. Statist. 34 1774–1813.
[36] LEE, S. M. S. (2012). General M-estimation and its bootstrap. J. Korean Statistical Society 41 471–490.
[37] LIU, R. Y. (1988). Bootstrap Procedures under some Non-IID Models. Ann. Statist. 16 1696–1708.
[38] MA, S. and KOSOROK, M. R. (2004). Robust semiparametric M-estimation and the weighted bootstrap. J. Multivariate Analysis 96 190–217.
[39] MAMMEN, E. (1993). Bootstrap and Wild Bootstrap for High Dimensional Linear Models. Ann. Statist. 21 255–285.
[40] MASON, D. M. and NEWTON, M. A. (1992). A Rank Statistics Approach to the Consistency of a General Bootstrap. Ann. Statist. 20 1611–1624.
[41] MINNIER, J., TIAN, L. and CAI, T. (2011). A perturbation method for inference on regularized regression estimates. J. Amer. Statist. Assoc. 106 1371–1382.
[42] NAVIDI, W. (1989). Edgeworth Expansions for Bootstrapping Regression Models. Ann. Statist. 17 1472–1478.
[43] QUMSIYEH, M. B. (1990a). Edgeworth expansion in regression models. J. Multivariate Analysis 35 86–101.
[44] QUMSIYEH, M. B. (1994). Bootstrapping and empirical Edgeworth expansions in multiple linear regression models. Comm. Statist. Theory Methods 23 3227–3239.
[45] RAO, C. and ZHAO, L. (1992). Approximation to the Distribution of M-Estimates in Linear Models by Randomly Weighted Bootstrap. Sankhya A 54 323–331.
[46] RUBIN, D. B. (1981). The Bayesian Bootstrap. Ann. Statist. 9 130–134.
[47] WANG, X. M. and ZHOU, W. (2004). Bootstrap Approximation to the Distribution of M-estimates in a Linear Model. Acta Math. Sinica 20 93–104.
[48] WELLNER, J. A. and ZHAN, Y. (1996). Bootstrapping Z-Estimators. Technical report, Dept. of Statistics, University of Washington.
[49] WU, C. F. J. (1986). Jackknife, Bootstrap and Other Resampling Methods in Regression Analysis. Ann. Statist. 14 1261–1295.
[50] YOU, J. and CHEN, G. (2006). Wild bootstrap estimation in partially linear models with heteroscedasticity. Statistics & Probability Letters 76 340–348.