Second Order Correctness of Perturbation Bootstrap M-Estimator of Multiple Linear Regression Parameter

Submitted to Bernoulli
arXiv:1605.01440v2 [math.ST] 17 Dec 2017

DEBRAJ DAS (a) and S. N. LAHIRI (b)

(a) Department of Statistics, University of Wisconsin-Madison, 1300 University Avenue, Madison, WI 53706, USA. E-mail: [email protected]
(b) Department of Statistics, North Carolina State University, 2311 Stinson Dr, Raleigh, NC 27695-8203, USA. E-mail: [email protected]

Research partially supported by NSF grants no. DMS 1310068, DMS 1613192.

Abstract

Consider the multiple linear regression model $y_i = x_i'\beta + \epsilon_i$, where the $\epsilon_i$'s are independent and identically distributed random variables, the $x_i$'s are known design vectors and $\beta$ is the $p \times 1$ vector of parameters. An effective way of approximating the distribution of the M-estimator $\bar\beta_n$, after proper centering and scaling, is the Perturbation Bootstrap Method. In this work, second order results for this non-naive bootstrap method are investigated. Second order correctness is important for reducing the approximation error uniformly to $o(n^{-1/2})$ to get better inferences. We show that the classical studentized version of the bootstrapped estimator fails to be second order correct. We introduce an innovative modification in the studentized version of the bootstrapped statistic and show that the modified bootstrapped pivot is second order correct (S.O.C.) for approximating the distribution of the studentized M-estimator. Additionally, we show that the Perturbation Bootstrap continues to be S.O.C. when the errors $\epsilon_i$ are independent, but may not be identically distributed. These findings establish the perturbation bootstrap approximation as a significant improvement over asymptotic normality in regression M-estimation.

Keywords: M-Estimation, S.O.C., Perturbation Bootstrap, Edgeworth Expansion, Studentization, Residual Bootstrap, Generalized Bootstrap, Wild Bootstrap.

1. Introduction

Consider the multiple linear regression model:

$$y_i = x_i'\beta + \epsilon_i, \quad i = 1, 2, \ldots, n \tag{1.1}$$

where $y_1, \ldots, y_n$ are responses, $\epsilon_1, \ldots, \epsilon_n$ are independent and identically distributed (IID) random variables with common distribution $F$ (say), $x_1, \ldots, x_n$ are known non-random design vectors and $\beta$ is the $p$-dimensional vector of parameters.

Suppose $\bar\beta_n$ is the M-estimator of $\beta$ corresponding to the objective function $\Lambda(\cdot)$, i.e. $\bar\beta_n = \arg\min_t \sum_{i=1}^n \Lambda(y_i - x_i't)$. Now if $\psi(\cdot)$ is the derivative of $\Lambda(\cdot)$, then $\bar\beta_n$ is the M-estimator corresponding to the score function $\psi(\cdot)$ and is defined as the solution of the vector equation

$$\sum_{i=1}^n x_i \psi(y_i - x_i'\beta) = 0.$$

It is known [cf. Huber (1981)] that under some conditions on the objective function, design vectors and error distribution $F$, $(\bar\beta_n - \beta)$ with proper scaling has an asymptotically normal distribution with mean $0$ and dispersion matrix $\sigma^2 I_p$, where $\sigma^2 = E\psi^2(\epsilon_1)/E^2\psi'(\epsilon_1)$.

Since its introduction by Efron in 1979 as a resampling technique, the bootstrap has been widely used as a distributional approximation method. Resampling from the naive empirical distribution of the centered residuals in a regression setup, called the residual bootstrap, was introduced by Freedman (1981). Freedman (1981) and Bickel and Freedman (1981b) showed that, given the data, the conditional distribution of $\sqrt{n}(\beta_n^* - \bar\beta_n)$ converges to the same normal distribution as the distribution of $\sqrt{n}(\bar\beta_n - \beta)$ when $\bar\beta_n$ is the usual least squares estimator of $\beta$, that is, when $\Lambda(x) = x^2$. This implies that the residual bootstrap approximation to the exact distribution of the least squares estimator is first order correct, as is the normal approximation.
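As a rough illustration of the residual bootstrap described above in the least squares case $\Lambda(x) = x^2$, the following Python sketch resamples the centered residuals and recomputes the estimator on each bootstrap sample. The function names (`ols`, `residual_bootstrap`), the simulated design, the t-distributed errors and the number of bootstrap replicates are assumptions made purely for illustration; they are not taken from the paper.

```python
import numpy as np

def ols(X, y):
    # Least squares estimate, i.e. the M-estimator with Lambda(x) = x^2.
    return np.linalg.lstsq(X, y, rcond=None)[0]

def residual_bootstrap(X, y, n_boot=500, seed=0):
    # Returns the least squares fit and n_boot draws of
    # sqrt(n) * (beta_star - beta_bar), conditional on the data.
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta_bar = ols(X, y)
    resid = y - X @ beta_bar
    resid_centered = resid - resid.mean()  # center the residuals
    draws = np.empty((n_boot, p))
    for b in range(n_boot):
        # Sample with replacement from the empirical distribution
        # of the centered residuals.
        eps_star = rng.choice(resid_centered, size=n, replace=True)
        y_star = X @ beta_bar + eps_star
        beta_star = ols(X, y_star)
        draws[b] = np.sqrt(n) * (beta_star - beta_bar)
    return beta_bar, draws

# Hypothetical simulated data (illustrative values only).
rng = np.random.default_rng(1)
n, p = 200, 3
X = rng.normal(size=(n, p))
beta = np.array([1.0, -2.0, 0.5])
y = X @ beta + rng.standard_t(df=5, size=n)

beta_bar, draws = residual_bootstrap(X, y)
```

The empirical distribution of the rows of `draws` plays the role of the bootstrap approximation to the distribution of $\sqrt{n}(\bar\beta_n - \beta)$; for example, `np.quantile(draws, [0.025, 0.975], axis=0)` gives componentwise bootstrap quantiles.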
The advantage of the residual bootstrap approximation over the normal approximation for the distribution of linear contrasts of the least squares estimator for general $p$ was first shown by Navidi (1989) by investigating the underlying Edgeworth Expansion (EE), although the heuristics behind it were given by Liu (1988) in the restricted case $p = 1$. Subsequently, the EE for the general M-estimator of $\beta$ was obtained by Lahiri (1989b) when $p = 1$, whereas the EE for the multivariate least squares estimator was found by Qumsiyeh (1990a). The EE of standardized and studentized versions of the general M-estimator in the multiple linear regression setup was first obtained by Lahiri (1992). Lahiri (1992) also established the second order results for the residual bootstrap in regression M-estimation.

A natural generalization of sampling from the naive empirical distribution is to sample from a weighted empirical distribution to obtain the bootstrap sample residuals. Broadly, the resulting bootstrap procedure is called the weighted or generalized bootstrap. It was introduced by Mason and Newton (1992) for bootstrapping the mean of a collection of IID random variables. Mason and Newton (1992) considered exchangeable weights and established its consistency. Lahiri (1992) established second order correctness of the generalized bootstrap in approximating the distribution of the M-estimator for the model (1.1) when the weights are chosen in a particular fashion depending on the design vectors. Wellner and Zhan (1996) proved the consistency of infinite dimensional generalized bootstrapped M-estimators. Subsequently, Chatterjee and Bose (2005) established distributional consistency of the generalized bootstrap in estimating equations and showed that the generalized bootstrap can be used to estimate the asymptotic variance of the original estimator. Chatterjee and Bose (2005) also mentioned the bias correction essential for achieving second order correctness.
An important special case of the generalized bootstrap is the Bayesian bootstrap of Rubin (1981). Rao and Zhao (1992) showed that the distribution function of the M-estimator for the model (1.1) can be approximated consistently by the Bayesian bootstrap. See the monograph of Barbe and Bertail (2012) for an extensive study of the generalized bootstrap.

A close relative of the generalized bootstrap procedure is the wild bootstrap. It was introduced by Wu (1986) in the multiple linear regression model (1.1) with the errors $\epsilon_i$ being heteroscedastic. Beran (1986) justified the wild bootstrap method by pointing out that the distribution of the least squares estimator can be approximated consistently by the wild bootstrap approximation. Second order results for the wild bootstrap in the heteroscedastic regression model were first established by Liu (1988) when $p = 1$. Liu (1988) also showed that the usual residual bootstrap is not capable of approximating the distribution of the least squares estimator up to second order in the heteroscedastic setup, and described a modification of the resampling procedure which can establish second order correctness. For general $p$, the heuristics behind achieving second order correctness by the wild bootstrap in homoscedastic least squares regression were discussed in Mammen (1993). Recently, Kline and Santos (2011) developed a score based bootstrap method depending on the wild bootstrap in M-estimation for the homoscedastic model (1.1) and established consistency of the procedure for Wald and Lagrange Multiplier type tests for a class of M-estimators under misspecification and clustering of data.

A novel bootstrap technique, called the perturbation bootstrap, was introduced by Jin, Ying, and Wei (2001) as a resampling procedure where the objective function having a U-process structure was perturbed by non-negative random quantities.
Jin, Ying, and Wei (2001) showed that in the standardized setup, the conditional distribution of the perturbation resampling estimator given the data and the distribution of the original estimator have the same limiting distribution, which means this resampling method is first order correct without studentization. In a recent work, Minnier, Tian, and Cai (2011) also applied this perturbation resampling method in penalized regression setups such as Adaptive Lasso, SCAD and the $\ell_q$ penalty, and showed that the standardized perturbed penalized estimator is first order correct. However, second order properties of this new bootstrap method have remained largely unexplored in the context of multiple linear regression. In this work, the perturbation bootstrap approximation is shown to be S.O.C. for the distribution of the studentized M-estimator for the regression model (1.1). An extension to the case of independent and non-IID errors is also established, showing the robustness of the perturbation bootstrap towards the presence of heteroscedasticity. Therefore, besides the existing bootstrap methods, the perturbation bootstrap method can also be used in regression M-estimation for making inferences regarding the regression parameters, and higher order accuracy can be achieved than with the normal approximation.

A classical way of studentization in the bootstrap setup, in the case of the regression M-estimator and for IID errors, is to consider the studentization factor to be $\sigma_n^* = s_n^* \tau_n^{*-1}$, where $\tau_n^* = n^{-1}\sum_{i=1}^n \psi'(\epsilon_i^*)$, $s_n^{*2} = n^{-1}\sum_{i=1}^n \psi^2(\epsilon_i^*)$ and $\epsilon_i^* = y_i - x_i'\beta_n^*$, $i \in \{1, \ldots, n\}$, with $\beta_n^*$ being the perturbation bootstrapped estimator of $\beta$, defined in Section 2. Although the residual bootstrapped estimator is S.O.C. after straightforward studentization, the same pivot fails to be S.O.C. in the case of the perturbation bootstrap. Two important special cases are considered as examples in this respect.
The reason behind this failure is that although the bootstrap residuals are sufficient for capturing the variability of the bootstrapped estimator in the residual bootstrap, they are not enough in the case of perturbation
