arXiv:2007.01615v2 [math.ST] 18 Sep 2020

Submitted to Bernoulli

On Second order correctness of Bootstrap in Logistic Regression

DEBRAJ DAS* and PRIYAM DAS

Department of Mathematics and Statistics, Indian Institute of Technology Kanpur, India.

Department of Biomedical Informatics, Harvard Medical School, Boston, USA.

*Research partially supported by DST Inspire fellowship DST/INSPIRE/04/2018/001290.

Abstract. In the fields of clinical trials, biomedical surveys, marketing, banking, with a dichotomous response variable, logistic regression is considered as a convenient alternative approach to linear regression. In this paper, we develop a novel perturbation Bootstrap technique for approximating the distribution of the maximum likelihood estimator (MLE) of the regression parameter vector. We establish second order correctness of the proposed Bootstrap method, which results in improved inference performance compared to that based on asymptotic normality. The main challenge in establishing second order correctness remains the fact that, the response variable being binary, the resulting MLE has a lattice structure. We show that the direct Bootstrapping approach fails even after studentization. We adopt the smoothing technique developed in Lahiri (1993) to ensure that the smoothed studentized version of the MLE has a density. A similar smoothing strategy is employed for the Bootstrap version to achieve a second order correct approximation. Good finite-sample properties of the proposed method are shown using simulation experiments. The proposed Bootstrap method is used to find confidence intervals of the coefficients of the covariates on a dataset in the field of healthcare operations decision.

Keywords: Logistic Regression, PEBBLE, SOC, Lattice, Smoothing, Perturbation Bootstrap.

1. Introduction

Logistic regression is one of the most widely used regression techniques when the response variable is binary. The use of the 'logit' function as a statistical tool dates back to Berkson (1944), followed by Cox (1958), who popularized its use in the field of logistic regression. Following those seminal works, numerous applications of logistic regression can be found in different fields, from banking sectors to clinical trials, biomedical surveys, among others (Hosmer, Lemeshow and Sturdivant (2013)). The logistic regression model is given as follows. Suppose y denotes the binary response variable and the value of y depends on the p independent variables x = (x_1, ..., x_p)'. Instead of capturing this dependence by modelling y directly on the covariates, in logistic regression the log-odds corresponding to the success of y is modeled as a linear function of the covariates. The probability of success, denoted by p(x), is given by p(x) = P(y = 1). The odds ratio for the event {y = 1} is given by p(x)/(1 − p(x)). The logistic regression model is given by

logit(p(x)) = log[ p(x) / (1 − p(x)) ] = x'β,   (1.1)

where β = (β_1, ..., β_p)' is the p-dimensional vector of regression parameters. By convention, the maximum likelihood estimator (MLE) of β is used for the purpose of inference. For a given sample {(x_i, y_i)}_{i=1}^n, the likelihood is given by

L(β | y_1, ..., y_n, x_1, ..., x_n) = Π_{i=1}^n p(x_i)^{y_i} (1 − p(x_i))^{1 − y_i},

where p(x_i) = e^{x_i'β} / (1 + e^{x_i'β}). The MLE β̂_n of β is defined as the maximizer of L(β | y_1, ..., y_n, x_1, ..., x_n), which is obtained by solving

Σ_{i=1}^n (y_i − p(β | x_i)) x_i = 0.   (1.2)

In order to find confidence intervals for different regression coefficients, or to test whether a certain covariate is of importance or not, it is required to find a good approximation of the distribution of β̂_n. β̂_n being the MLE, the distribution of β̂_n is approximately normal under certain regularity conditions. Asymptotic normality as well as other large sample properties of β̂_n have been studied extensively in the literature (cf. Haberman (1974), McFadden (1974), Amemiya (1976), Gourieroux and Monfort (1981), Fahrmeir and Kaufmann (1985)). As an alternative to asymptotic normality, Efron (1979) proposed the Bootstrap approximation, which has been shown to work in a wide class of models, especially in the case of multiple linear regression. In the last few decades, several variants of the Bootstrap have been developed for linear regression. Depending on whether the covariates are non-random or random in the linear regression setup, Freedman (1981) proposed the residual Bootstrap or the paired Bootstrap. A few other variants of Bootstrap methods in the linear regression setup are the wild Bootstrap (cf. Liu (1988), Mammen (1993)), the weighted Bootstrap (Lahiri (1992), Barbe and Bertail (2012)) and the perturbation Bootstrap (Das and Lahiri (2019)).
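The score equation (1.2) has no closed-form solution; since the log-likelihood is strictly concave, it is routinely solved by Newton-Raphson iteration on the score. A minimal self-contained sketch in Python (the synthetic data and all numerical choices are ours, for illustration only, not the authors' code):

```python
import math
import random

def logistic(u):
    return 1.0 / (1.0 + math.exp(-u))

def mle_newton(X, y, iters=25):
    """Solve the score equation (1.2): sum_i (y_i - p(beta|x_i)) x_i = 0
    by Newton-Raphson, for p = 2 covariates (2x2 system solved by hand)."""
    b = [0.0, 0.0]
    for _ in range(iters):
        g = [0.0, 0.0]                    # score vector
        h = [[0.0, 0.0], [0.0, 0.0]]      # information: sum_i p_i (1 - p_i) x_i x_i'
        for (x1, x2), yi in zip(X, y):
            pi = logistic(b[0] * x1 + b[1] * x2)
            g[0] += (yi - pi) * x1
            g[1] += (yi - pi) * x2
            w = pi * (1.0 - pi)
            h[0][0] += w * x1 * x1
            h[0][1] += w * x1 * x2
            h[1][1] += w * x2 * x2
        h[1][0] = h[0][1]
        det = h[0][0] * h[1][1] - h[0][1] * h[1][0]
        # Newton step: b <- b + h^{-1} g
        b[0] += (h[1][1] * g[0] - h[0][1] * g[1]) / det
        b[1] += (-h[1][0] * g[0] + h[0][0] * g[1]) / det
    return b

random.seed(1)
beta_true = [0.5, -1.0]                   # hypothetical truth, illustration only
X = [(1.0, random.gauss(0.0, 1.0)) for _ in range(500)]   # intercept + one covariate
y = [int(random.random() < logistic(beta_true[0] * x1 + beta_true[1] * x2))
     for x1, x2 in X]
beta_hat = mle_newton(X, y)
```

At convergence the score at beta_hat is numerically zero, confirming that beta_hat solves (1.2), and the estimate typically lands within a few standard errors of beta_true.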
Using a similar mechanism to the residual and the paired Bootstrap, Moulton and Zeger (1989, 1991) developed the standardized Pearson residual and the observation vector resampling Bootstrap methods in generalized linear models (GLM). Lee (1990) considered the logistic regression model and showed that the conditional distributions of these resampling-based Bootstrap estimators given the data are close to the distribution of the original estimator in the almost sure sense. Claeskens et al. (2003) proposed a couple of Bootstrap methods for logistic regression in the univariate case, namely the 'Linear one-step Bootstrap' and the 'Quadratic one-step Bootstrap'. The 'Linear one-step Bootstrap' was developed following the linearization principle proposed in Davison et al. (1986), whereas the 'Quadratic one-step Bootstrap' was constructed based on the quadratic approximation of the estimators as discussed in Ghosh (1994). The validity of these two Bootstrap methods for approximating the underlying distribution in the almost sure sense was established in

Claeskens et al. (2003). They also developed a finite sample bias correction of the logistic regression estimator using their quadratic one-step Bootstrap method.

In order to have an explicit understanding of the sample size requirement for practical implementations of any asymptotically valid method, it is essential to study the error rate of the approximation. The Bootstrap methods in linear regression have been shown to achieve second order correctness (SOC), i.e. to have the error rate o(n^{-1/2}). In order to draw more accurate inference results compared to those based on the asymptotic normal distribution, SOC is essential. An elaborate description of the results on SOC of the residual, generalized and perturbation Bootstrap methods in linear regression can be found in Lahiri (1992), Barbe and Bertail (2012) and Das and Lahiri (2019) and the references therein. However, to the best of our knowledge, for none of the existing Bootstrap methods for logistic regression in the literature has SOC been explored. In this paper, we propose the Perturbation Bootstrap in Logistic Regression (PEBBLE) as an alternative to the normal approximation approach. Whenever the underlying estimator is the minimizer of a certain objective function, the perturbation Bootstrap simply produces a Bootstrap version of the estimator by finding the minimizer of a random objective function, suitably developed by perturbing the original objective function using some non-negative random variables. We show that the perturbation Bootstrap attains SOC in approximating the distribution of β̂_n. For the sake of comparison with the proposed Bootstrap method, we also find the error rate for the normal approximation of the distribution of the studentized version of β̂_n, which comes out to be O(n^{-1/2} log n). The extra "log n" term in the error rate appears due to the underlying lattice structure. Therefore, the inference based on our Bootstrap method is more accurate than that based on asymptotic normality.
In order to establish SOC for the proposed method, we start with studentization of √n(β̂_n − β) and its perturbation Bootstrap version. We show that, unlike in the case of multiple linear regression, here SOC cannot be achieved only by studentization of √n(β̂_n − β), due to the lattice nature of the distribution of the logistic regression estimator β̂_n in general. The lattice nature of the distribution is induced by the binary nature of the response variable. It is a common practice to establish SOC by comparing the Edgeworth expansions in the original and Bootstrap cases (cf. Hall (1992)). However, the usual Edgeworth expansion does not exist when the underlying setup is lattice. Therefore, correction terms are required to take care of the lattice nature. For example, one can compare Theorem 20.8 and Corollary 23.2 in Bhattacharya and Rao (1986) [hereafter referred to as BR(86)] to learn the correction terms required in the Edgeworth expansions whenever the underlying structure is lattice. In general, these correction terms cannot be approximated with an error of o(n^{-1/2}), which makes SOC unachievable even with studentization. As a remedy we adopt the novel smoothing technique developed in Lahiri (1993). First, this smoothing technique is applied to transform the lattice nature of the distribution of the studentized version, making it absolutely continuous. Thus the resulting correction terms do not appear in the underlying Edgeworth expansion. Further, we use the same smoothing technique for the Bootstrap version and establish SOC by comparing the Edgeworth expansions across the original and the Bootstrap cases. Moreover, an interesting property of the smoothing is that it has negligible effect on the asymptotic variance of β̂_n; therefore it is not required to incorporate the effect of the smoothing in the form of the studentization.
In order to prove the results, we establish the Edgeworth expansion of a smoothed version of the sample mean of a sequence of independent random vectors, even if they are not identically distributed (cf. Lemma 3). Lemma 3 may be of independent interest for establishing SOC of the Bootstrap in other related problems.

The rest of the paper is organized as follows. The perturbation Bootstrap version of the logistic regression estimator is described in Section 2. The main results, including theoretical properties of the Bootstrap along with the normal approximation, are stated in Section 3. In Section 4, the finite-sample performance of PEBBLE is evaluated in comparison with other related existing methods by simulation experiments. Section 5 gives an illustration of PEBBLE on a healthcare operations decision dataset. Auxiliary lemmas and the proofs of the theorems are presented in Section 6. Finally, we conclude on the proposed methodology in Section 7.

2. Description of PEBBLE

In this section, we define the Perturbation Bootstrapped version of the logistic regression estimator.

Let G*_1, ..., G*_n be n independent copies of a non-negative and non-degenerate random variable G* with E(G*) = μ_{G*}, Var(G*) = μ²_{G*} and E(G* − μ_{G*})³ = μ³_{G*}. These quantities serve as perturbing random quantities in the construction of the perturbation Bootstrap version of the logistic regression estimator. We define the Bootstrap version as the maximizer of a carefully constructed objective function which involves the observed values y_1, ..., y_n as well as the estimated probabilities of success p̂(x_i) = e^{x_i'β̂_n} / (1 + e^{x_i'β̂_n}), i = 1, ..., n. Formally, the perturbation Bootstrapped logistic regression estimator β̂*_n is defined as

β̂*_n = argmax_t [ Σ_{i=1}^n { (y_i − p̂(x_i)) (x_i' t) } (G*_i − μ_{G*}) + μ_{G*} Σ_{i=1}^n { p̂(x_i) (x_i' t) − log(1 + e^{x_i' t}) } ].

In other words, β̂*_n is the solution of the equation

Σ_{i=1}^n (y_i − p̂(x_i)) x_i (G*_i − μ_{G*}) μ_{G*}^{-1} + Σ_{i=1}^n ( p̂(x_i) − p(t | x_i) ) x_i = 0,   (2.1)

since the derivative of the LHS of (2.1) with respect to t is negative definite. If the Bootstrap equation (2.1) is compared to the original equation (1.2), it is easy to note that the second part of the LHS of (2.1) is the estimated version of the LHS of (1.2). The Bootstrap randomness comes from the first part of the LHS in (2.1), i.e., Σ_{i=1}^n (y_i − p̂(x_i)) x_i (G*_i − μ_{G*}) μ_{G*}^{-1}. Also, the first part is the main contributing term in the asymptotic expansion of the studentized version of β̂*_n. One immediate choice for the distribution of G* is Beta(1/2, 3/2), since the required conditions on G* are satisfied for this distribution. Other choices can be found in Liu (1988), Mammen (1993) and Das et al. (2019). The above characteristics of G* are assumed to hold for the rest of this paper.

Any further assumptions on G* will be stated in the respective theorems.
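Both (1.2) and (2.1) have the common form Σ_i (a_i − p(t | x_i)) x_i + c = 0, so one Newton-type solver produces both the MLE and a perturbation-Bootstrap replicate. The sketch below is for p = 2 with G* ~ Beta(1/2, 3/2), for which μ_{G*} = 1/4; it is illustrative Python, not the paper's R implementation:

```python
import math
import random

def logistic(u):
    return 1.0 / (1.0 + math.exp(-u))

def solve_score(X, a, c, iters=25):
    """Solve sum_i (a_i - p(t|x_i)) x_i + c = 0 in t (p = 2) by Newton.
    a = y, c = 0 gives the MLE equation (1.2);
    a = phat, c = sum_i (y_i - phat_i) x_i (G*_i - mu)/mu gives (2.1)."""
    t = [0.0, 0.0]
    for _ in range(iters):
        g = [c[0], c[1]]
        h = [[0.0, 0.0], [0.0, 0.0]]
        for (x1, x2), ai in zip(X, a):
            pi = logistic(t[0] * x1 + t[1] * x2)
            g[0] += (ai - pi) * x1
            g[1] += (ai - pi) * x2
            w = pi * (1.0 - pi)
            h[0][0] += w * x1 * x1
            h[0][1] += w * x1 * x2
            h[1][1] += w * x2 * x2
        h[1][0] = h[0][1]
        det = h[0][0] * h[1][1] - h[0][1] * h[1][0]
        t[0] += (h[1][1] * g[0] - h[0][1] * g[1]) / det
        t[1] += (-h[1][0] * g[0] + h[0][0] * g[1]) / det
    return t

random.seed(2)
beta = [0.5, -1.0]                                        # hypothetical truth
X = [(1.0, random.gauss(0.0, 1.0)) for _ in range(400)]
y = [int(random.random() < logistic(beta[0] * x1 + beta[1] * x2)) for x1, x2 in X]

beta_hat = solve_score(X, y, [0.0, 0.0])                  # MLE, equation (1.2)
phat = [logistic(beta_hat[0] * x1 + beta_hat[1] * x2) for x1, x2 in X]

mu = 0.25                                                 # E(G*) for Beta(1/2, 3/2)
G = [random.betavariate(0.5, 1.5) for _ in range(400)]    # perturbations G*_i
c = [0.0, 0.0]
for (x1, x2), yi, pi, gi in zip(X, y, phat, G):
    r = (yi - pi) * (gi - mu) / mu
    c[0] += r * x1
    c[1] += r * x2

beta_star = solve_score(X, phat, c)                       # Bootstrap estimator, (2.1)
```

Repeating the last block with fresh draws of G*_1, ..., G*_n yields the conditional Bootstrap distribution of β̂*_n given the data.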

3. Main Results

In this section, we describe the theoretical results for the Bootstrap as well as the normal approximation. In 3.1 we state a Berry-Esseen type theorem for a studentized version of the logistic regression estimator β̂_n. In 3.2 we explore the effectiveness of the Bootstrap in approximating the distribution of the studentized version. Theorem 2 shows that SOC is not achievable solely by studentization, even when p = 1. As a remedy, we introduce a smoothing in the studentization and show that the proposed Bootstrap method achieves SOC. Before exploring the rate of normal approximation, first we define the class of sets that we would consider in the following theorems. For any natural number m, the class of sets A_m is the collection of Borel subsets of R^m satisfying

sup_{B ∈ A_m} Φ( (∂B)^ε ) = O(ε) as ε ↓ 0.

Here Φ denotes the normal distribution with mean 0 and dispersion matrix being the identity matrix. We are going to use the class A_p for the uniform asymptotic results on normal and Bootstrap approximations. P_* denotes the conditional Bootstrap probability of G* given the data {y_1, ..., y_n}.

3.1. Rate of Normal Approximation

In this sub-section we explore the rate of normal approximation of a suitable studentized version of the logistic regression estimator β̂_n, uniformly over the class of sets A_p. From the definition (1.2) of β̂_n, we have that Σ_{i=1}^n (y_i − p̂(x_i)) x_i = 0. Now using Taylor's expansion of √n(β̂_n − β), it is easy to see that the asymptotic variance of √n(β̂_n − β) is L_n^{-1}, where L_n = n^{-1} Σ_{i=1}^n x_i x_i' e^{x_i'β} (1 + e^{x_i'β})^{-2}. An estimator of L_n can be obtained by replacing β by β̂_n in the form of L_n. Hence we can define the studentized version of β̂_n as

H̃_n = √n L̂_n^{1/2} ( β̂_n − β ),

where L̂_n = n^{-1} Σ_{i=1}^n x_i x_i' e^{x_i'β̂_n} (1 + e^{x_i'β̂_n})^{-2}. Other studentized versions can be constructed by considering other estimators of L_n. For details of the construction of different studentized versions, one can look into Lahiri (1994). The result on normal approximation will hold for other studentized versions also, as long as the involved estimator of L_n is √n-consistent.

The Berry-Esseen theorem states that the error in normal approximation for the distribution of the mean of a sequence of independent random variables is O(n^{-1/2}), provided the average third absolute moment is bounded (cf. Theorem 12.4 in BR(86)). Note that in Theorem 1 below there is an extra multiplicative "log n" term besides the usual n^{-1/2} term in the error rate of the normal approximation, which is due to the error incurred in Taylor's approximation of √n(β̂_n − β). Since the underlying setup in logistic regression has lattice nature, in general this error cannot be corrected by higher order approximations, like Edgeworth expansions. Further, one important tool in deriving the error rate in normal approximation, and later for deriving the higher order result for the Bootstrap, is to find the rate of convergence of β̂_n to β. To this end, we state our first theorem as follows.

Theorem 1. Suppose n^{-1} Σ_{i=1}^n ‖x_i‖³ = O(1) and L_n → L as n → ∞, where L is a positive definite matrix. Then

(a) there exists a positive constant C_0 such that when n > C_0 we have

P( β̂_n solves (1.2) and ‖β̂_n − β‖ ≤ C_0 n^{-1/2} (log n)^{1/2} ) = 1 − o(n^{-1/2}).

(b) we have

sup_{B ∈ A_p} | P(H̃_n ∈ B) − Φ(B) | = O( n^{-1/2} log n ).

The proof of Theorem 1 is presented in Section 6. Theorem 1 shows that the normal approximation of the distribution of H̃_n, the studentized logistic regression estimator, has a near optimal Berry-Esseen rate. However, the rate can be improved significantly by the Bootstrap and an application of a smoothing, as described in 3.2.
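Computing H̃_n requires a square root of the p × p matrix L̂_n; note that e^{x_i'β̂_n}(1 + e^{x_i'β̂_n})^{-2} = p̂(x_i)(1 − p̂(x_i)), so L̂_n is a weighted Gram matrix. For p = 2 the symmetric positive definite square root has a closed form via Cayley-Hamilton: with s = √det(M) and t = √(tr(M) + 2s), M^{1/2} = (M + sI)/t. The sketch below uses small made-up values of x_i, β̂_n and β purely for illustration:

```python
import math

def logistic(u):
    return 1.0 / (1.0 + math.exp(-u))

def sqrt_2x2_spd(m):
    """Principal square root of a symmetric positive definite 2x2 matrix:
    sqrt(M) = (M + sqrt(det M) I) / sqrt(tr M + 2 sqrt(det M))."""
    s = math.sqrt(m[0][0] * m[1][1] - m[0][1] * m[1][0])
    t = math.sqrt(m[0][0] + m[1][1] + 2.0 * s)
    return [[(m[0][0] + s) / t, m[0][1] / t],
            [m[1][0] / t, (m[1][1] + s) / t]]

# assumed covariates, MLE and true parameter (hypothetical values)
X = [(1.0, -0.8), (1.0, 0.3), (1.0, 1.1), (1.0, -1.7), (1.0, 0.6), (1.0, 2.0)]
beta_hat = [0.4, -0.9]
beta0 = [0.5, -1.0]
n = len(X)

# Lhat_n = n^{-1} sum_i phat_i (1 - phat_i) x_i x_i'
L = [[0.0, 0.0], [0.0, 0.0]]
for x1, x2 in X:
    pi = logistic(beta_hat[0] * x1 + beta_hat[1] * x2)
    w = pi * (1.0 - pi)
    L[0][0] += w * x1 * x1 / n
    L[0][1] += w * x1 * x2 / n
    L[1][1] += w * x2 * x2 / n
L[1][0] = L[0][1]

R = sqrt_2x2_spd(L)                       # Lhat_n^{1/2}
d = [beta_hat[0] - beta0[0], beta_hat[1] - beta0[1]]
H_tilde = [math.sqrt(n) * (R[0][0] * d[0] + R[0][1] * d[1]),
           math.sqrt(n) * (R[1][0] * d[0] + R[1][1] * d[1])]
```

The Cayley-Hamilton identity guarantees R·R = L̂_n exactly, which is what the studentization needs.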

3.2. Rate of Bootstrap Approximation

In this sub-section, we extensively study the rate of Bootstrap approximation for the distribution of the logistic regression estimator. To that end, before exploring the rate of convergence of the Bootstrap we need to define suitable studentized versions in both the original and the Bootstrap settings. Similar to the original case, the asymptotic variance of the Bootstrapped logistic regression estimator β̂*_n needs to be found to define the studentized version in the Bootstrap setting. Using Taylor's expansion, from (2.1) it is easy to see that the asymptotic variance of √n(β̂*_n − β̂_n) is L̂_n^{-1} M̂_n L̂_n^{-1}, where L̂_n = n^{-1} Σ_{i=1}^n x_i x_i' e^{x_i'β̂_n} (1 + e^{x_i'β̂_n})^{-2} and M̂_n = n^{-1} Σ_{i=1}^n (y_i − p̂(x_i))² x_i x_i'. Therefore the studentized version in the Bootstrap setting can be defined as

H*_n = √n M̂*_n^{-1/2} L*_n ( β̂*_n − β̂_n ),

where L*_n = n^{-1} Σ_{i=1}^n x_i x_i' e^{x_i'β̂*_n} (1 + e^{x_i'β̂*_n})^{-2} and M̂*_n = n^{-1} Σ_{i=1}^n (y_i − p̂(x_i))² x_i x_i' μ_{G*}^{-2} (G*_i − μ_{G*})². Analogously, we define the original studentized version as

H_n = √n M̂_n^{-1/2} L̂_n ( β̂_n − β ),

which will be used for investigating SOC of the Bootstrap for the rest of this section. In the next theorem we show that H*_n fails to be SOC in approximating the distribution of H_n, even when p = 1.
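Before turning to that negative result, note that the Bootstrap studentization factor M̂*_n replaces each summand of M̂_n by the random weight μ_{G*}^{-2}(G*_i − μ_{G*})², which has expectation 1 under the moment condition Var(G*) = μ²_{G*}. A quick numerical sanity check (not part of the paper) with G* ~ Beta(1/2, 3/2), for which μ_{G*} = 1/4:

```python
import random

random.seed(3)
mu = 0.25                         # E(G*) for Beta(1/2, 3/2)
N = 20000
draws = [random.betavariate(0.5, 1.5) for _ in range(N)]

mean_g = sum(draws) / N                              # should be close to 1/4
weights = [((g - mu) / mu) ** 2 for g in draws]      # random weights in Mhat*_n
mean_w = sum(weights) / N                            # should be close to 1
```

Since the weights average to 1, M̂*_n is (conditionally on the data) centered at M̂_n, which is exactly what makes H*_n a correctly studentized pivot.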

Theorem 2. Suppose p = 1 and denote the only covariate by x in the model (1.1). Let x_1, ..., x_n be the observed values of x and β be the true value of the regression parameter. Define μ_n = n^{-1} Σ_{i=1}^n x_i p(β | x_i). Assume the following conditions hold:

(C.1) x_1, ..., x_n are non-random and are all integers.
(C.2) x_{i_1} = ... = x_{i_m} = 1, where {i_1, ..., i_m} ⊆ {1, ..., n} with m ≥ (log n)².
(C.3) max{ |x_i| : i = 1, ..., n } = O(1) and lim inf_{n→∞} n^{-1} Σ_{i=1}^n |x_i|^6 > 0.
(C.4) √n |μ_n| < M_1 for n ≥ M_1, where M_1 is a positive constant.
(C.5) The distribution of G* has an absolutely continuous component with respect to Lebesgue measure and E(G*)^4 < ∞.

Then there exist an interval B_n and a positive constant M_2 (not depending on n) such that

lim_{n→∞} P( √n | P_*(H*_n ∈ B_n) − P(H_n ∈ B_n) | ≥ M_2 ) = 1.

The proof of Theorem 2 is presented in Section 6. Theorem 2 shows that, unlike in the case of multiple linear regression, in general the Bootstrap cannot achieve SOC even with studentization.

Now we look further into the form of the set B_n. B_n is of the form f_n(E_n × R) with E_n = (−∞, z_n] and z_n = 3/(4n) − μ_n. Here f_n(·) is a continuous function which is obtained from the Taylor expansion of H_n. Since E_n × R is a convex subset of R², it is also a connected set. Since f_n(·) is a continuous function, B_n is a connected subset of R and hence is an interval.

Now, we define the smoothed versions of H_n and H*_n which are necessary for achieving SOC by the Bootstrap for general p. Note that the primary reason behind the Bootstrap's failure is the lattice nature of the distribution of √n(β̂_n − β). Hence if one can somehow smooth the distribution of √n(β̂_n − β), or more generally the distribution of H_n, so that the smoothed version has a density with respect to Lebesgue measure, then the Bootstrap may be shown to achieve SOC by employing the theory of Edgeworth expansions. To that end, suppose Z is a p-dimensional standard normal random vector, independent of y_1, ..., y_n. Define the smoothed version of H_n as

Ȟ_n = H_n + M̂_n^{-1/2} b_n Z,   (3.1)

where {b_n}_{n≥1} is a suitable sequence chosen so that it has negligible effect on the variance of √n(β̂_n − β) and hence on the studentization factor. See Theorem 3 for the conditions on {b_n}_{n≥1}. To define the smoothed studentized version in the Bootstrap setting, consider another p-dimensional standard normal vector Z*, which is independent of y_1, ..., y_n, G*_1, ..., G*_n and Z. Define the smoothed

version of H*_n as

Ȟ*_n = H*_n + M̂*_n^{-1/2} b_n Z*.   (3.2)

The following theorem can be distinguished as the main theorem of this section, as it shows that the smoothing does the trick for the Bootstrap to achieve SOC. Thus the inference on β based on the Bootstrap after smoothing is much more accurate than that based on the normal approximation. To state the main theorem, define W_i = ( ỹ_i x_i', (ỹ_i² − E ỹ_i²) z_i' )', where ỹ_i = (y_i − p(β | x_i)) and z_i = (x_{i1}², x_{i1}x_{i2}, ..., x_{i1}x_{ip}, x_{i2}², x_{i2}x_{i3}, ..., x_{i2}x_{ip}, ..., x_{ip}²)' with x_i = (x_{i1}, ..., x_{ip})', i ∈ {1, ..., n}.

Theorem 3. Suppose n^{-1} Σ_{i=1}^n ‖x_i‖^6 = O(1) and the matrix n^{-1} Σ_{i=1}^n Var(W_i) converges to some positive definite matrix as n → ∞. Also choose the sequence {b_n}_{n≥1} such that b_n = O(n^{-d}) and n^{-1/p_1} log n = o(b_n²), where d > 0 is a constant and p_1 = max{p + 1, 4}. Then

(a) there exists a positive constant C_2 such that when n > C_2 we have

P_*( β̂*_n solves (2.1) and ‖β̂*_n − β̂_n‖ ≤ C_2 n^{-1/2} (log n)^{1/2} ) = 1 − o_p(n^{-1/2}).

(b) we have

sup_{B ∈ A_p} | P_*(Ȟ*_n ∈ B) − P(Ȟ_n ∈ B) | = o_p( n^{-1/2} ).

The proof of Theorem 3 is presented in Section 6. Theorem 3 shows that SOC of PEBBLE can be achieved by a simple smoothing of the studentized pivotal quantities. As a result, much more accurate inference on β can be drawn based on the Bootstrap than based on the normal approximation, especially when n is not large compared to p. The finite-sample simulation results presented in Table 1 also confirm this fact.
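In practice, a level-(1 − α) confidence interval for the coefficient is obtained by replacing the unknown quantiles of Ȟ_n by the Bootstrap quantiles of Ȟ*_n and inverting the pivot. The univariate (p = 1) sketch below strings the whole pipeline together; the sample size, number of replicates B, and the randomized inversion step are our illustrative choices, not the paper's exact construction, which is detailed in Section 2 of the Supplementary Material:

```python
import math
import random

def logistic(u):
    return 1.0 / (1.0 + math.exp(-u))

def solve_scalar(x, a, c, iters=15):
    """Solve sum_i (a_i - logistic(t*x_i)) x_i + c = 0 in t by Newton (p = 1).
    a = y, c = 0 gives (1.2); a = phat, c = perturbation term gives (2.1)."""
    t = 0.0
    for _ in range(iters):
        g, h = c, 0.0
        for xi, ai in zip(x, a):
            pi = logistic(t * xi)
            g += (ai - pi) * xi
            h += pi * (1.0 - pi) * xi * xi
        t += g / h
    return t

random.seed(4)
n, beta0, B, mu = 200, 0.7, 200, 0.25
x = [random.gauss(0.0, 1.0) for _ in range(n)]
y = [int(random.random() < logistic(beta0 * xi)) for xi in x]

bhat = solve_scalar(x, y, 0.0)                              # MLE
phat = [logistic(bhat * xi) for xi in x]
Lh = sum(p * (1.0 - p) * xi * xi for xi, p in zip(x, phat)) / n
Mh = sum((yi - p) ** 2 * xi * xi for xi, yi, p in zip(x, y, phat)) / n

p1 = max(1 + 1, 4)                                          # p_1 = max(p + 1, 4)
bn = n ** (-1.0 / (2.0 * (p1 + 1)))                         # smoothing bandwidth
piv = []
for _ in range(B):
    G = [random.betavariate(0.5, 1.5) for _ in range(n)]    # G*_i, mu_{G*} = 1/4
    c = sum((yi - p) * xi * (g - mu) / mu for xi, yi, p, g in zip(x, y, phat, G))
    bstar = solve_scalar(x, phat, c)                        # solves (2.1)
    ps = [logistic(bstar * xi) for xi in x]
    Ls = sum(q * (1.0 - q) * xi * xi for xi, q in zip(x, ps)) / n
    Ms = sum((yi - p) ** 2 * xi * xi * ((g - mu) / mu) ** 2
             for xi, yi, p, g in zip(x, y, phat, G)) / n
    Hs = math.sqrt(n) * Ls * (bstar - bhat) / math.sqrt(Ms)
    piv.append(Hs + bn * random.gauss(0.0, 1.0) / math.sqrt(Ms))  # smoothed pivot (3.2)

piv.sort()
q_lo, q_hi = piv[int(0.05 * B)], piv[int(0.95 * B) - 1]
znoise = bn * random.gauss(0.0, 1.0) / math.sqrt(Mh)        # smoothing term of (3.1)
scale = math.sqrt(Mh) / (math.sqrt(n) * Lh)
ci = (bhat - (q_hi - znoise) * scale, bhat - (q_lo - znoise) * scale)
```

The inversion uses H_n = √n M̂_n^{-1/2} L̂_n (β̂_n − β): since Ȟ_n = H_n + znoise, the event Ȟ_n ∈ [q_lo, q_hi] translates to β lying in the reported interval.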

Remark 3.1. The class of sets A_p used to state the uniform asymptotic results is somewhat abstract. Note that there are two major reasons behind considering this class. The first reason is to obtain asymptotic normality, or valid Edgeworth expansions, for the normalized part of the underlying pivot, and the second one is to bound the remainder term by the required small magnitude with sufficiently large probability (or Bootstrap probability). A natural choice for A_p is the collection of all Borel measurable convex subsets of R^p, due to Theorem 3.1 in BR(86).

Remark 3.2. The results on Bootstrap approximation presented in Theorem 3 may be established in the almost sure sense also. In that case the only additional requirement is to have n^{-1} Σ_{i=1}^n ‖x_i‖^{12} = O(1), since y_1, ..., y_n can take only the values 0 or 1. Actually, an almost sure version of part (a) of Theorem 3 is necessary to establish Theorem 2. Note that the requirement for the almost sure version is met under the assumptions of Theorem 2.

Remark 3.3. Note that the random quantities Z and Z*, introduced in (3.1) and (3.2) respectively, are essential in achieving SOC of the Bootstrap. Z and Z* are both assumed to be distributed as N(0, I_p), I_p being the p × p identity matrix. However, Theorem 3 remains true if we replace I_p by any diagonal matrix, i.e., Theorem 3 is true even if we only assume that the components of Z (and of Z*) are independent and have normal distributions.
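The effect of the smoothing in (3.1) is easy to visualise on the simplest lattice statistic: a standardized Bernoulli mean takes at most n + 1 distinct values, while after adding b_n Z-type Gaussian noise every realisation is distinct, i.e. the smoothed statistic has a continuous distribution. A toy illustration (our own, not from the paper):

```python
import math
import random

random.seed(5)
n, reps = 50, 2000
p = 0.3
sd = math.sqrt(p * (1.0 - p))
bn = n ** (-1.0 / 10.0)          # b_n = n^{-1/(2(p_1+1))} with p_1 = 4

lattice_vals, smoothed_vals = [], []
for _ in range(reps):
    s = sum(int(random.random() < p) for _ in range(n))
    t = math.sqrt(n) * (s / n - p) / sd          # lattice-valued pivot
    lattice_vals.append(round(t, 9))
    smoothed_vals.append(t + bn * random.gauss(0.0, 1.0))

n_lattice = len(set(lattice_vals))   # at most n + 1 atoms
n_smooth = len(set(smoothed_vals))   # continuous: all values distinct (a.s.)
```

The unsmoothed pivot concentrates on a handful of atoms, which is exactly why its Edgeworth expansion needs lattice correction terms; the smoothed pivot does not.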

4. Simulation Study

In this section, we compare the performance of PEBBLE with other existing methods via simulation experiments. For the comparative study, we consider the Normal approximation, the Pearson Residual Resampling Bootstrap (PRRB, Moulton and Zeger (1991)), the One-Step Bootstrap (OSB) and the Quadratic Bootstrap (QB) (Claeskens et al. (2003)). We consider

b = (1, −0.5, 2, −0.75, 1.5, −1, 1.85, −1.6)'.

Note that b has length 8. For the scenarios where p ≤ 8, we take the true parameter vector β to be the first p-many elements of b. The covariate vector X is generated from the multivariate normal distribution with mean 0 and variance Σ = ((σ_ij))_{p×p}, where σ_ij = 0.5^{|i−j|}. Now, in order to assess the performance of all the methods for various dimensions of the coefficient vector and sample sizes, we consider the following cases: (n, p) = (30, 3), (50, 3), (50, 4), (100, 3), (100, 4), (100, 6), (200, 3), (200, 4), (200, 6) and (200, 8). In PEBBLE, we take p_1 = max{p + 1, 4} and b_n = n^{-1/(2(p_1+1))}. Both Z and Z* are drawn from independent multivariate normal distributions with mean 0 and variance (1/4) I_p. G*_i is generated from Beta(1/2, 3/2). Further details regarding the forms of the confidence sets for PEBBLE are provided in Section 2 of the Supplementary Material. PEBBLE is implemented in R. The other methods, namely the Normal approximation, PRRB, OSB and QB, are also implemented in R. For the resampling, we consider 1000 Bootstrap iterations. In order to find coverage, each such experiment is repeated 1000 times for each (n, p) scenario. In Table 1, we note down the empirical coverage of the lower 90% confidence region of β, and of the upper, middle and lower 90% confidence intervals (CIs) corresponding to the minimum and maximum components of β. We also note down the average over the empirical coverages of the upper, middle and lower 90% CIs corresponding to all components of β. Average widths of the 90% CIs corresponding to all applicable cases are also noted in parentheses. It is noted that, in general, PEBBLE performs better than the other methods; specifically, for lower n : p scenarios (small sample size, high dimension), i.e. the cases corresponding to (n, p) = (30, 3), (50, 4), (100, 6), (200, 8) in our study. For example, for (n, p) = (100, 6), (200, 8) it is noted that PEBBLE outperforms the other methods by a big margin.
As n increases for fixed p, the performance of PEBBLE is noted to improve and the widths of the CIs tend to decrease, as expected. PEBBLE performs better than the other methods by a comparatively bigger margin. It is also noted that, for all the simulation scenarios, the average coverage over all coordinates is much closer to 0.90 for PEBBLE compared to the other methods. We observe that for relatively smaller n : p scenarios the PEBBLE CIs are a little wider compared to the other methods, but, as n increases (for fixed p), the PEBBLE CI widths become closer to those observed for the other methods.
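The covariate design above, Σ = ((σ_ij)) with σ_ij = 0.5^{|i−j|}, is exactly the covariance of a stationary AR(1) process with unit marginal variance, so the covariate vectors can be generated sequentially without a Cholesky factor. A sketch (pure Python; the paper's experiments are in R):

```python
import math
import random

def ar1_normal(p, rho):
    """One draw from N_p(0, Sigma) with Sigma[i][j] = rho^{|i-j|}."""
    x = [random.gauss(0.0, 1.0)]
    for _ in range(p - 1):
        x.append(rho * x[-1] + math.sqrt(1.0 - rho * rho) * random.gauss(0.0, 1.0))
    return x

random.seed(6)
rho, p, N = 0.5, 3, 20000
draws = [ar1_normal(p, rho) for _ in range(N)]

# empirical moments: Var(X_1) ~ 1, Cov(X_1, X_2) ~ 0.5, Cov(X_1, X_3) ~ 0.25
m = [sum(d[j] for d in draws) / N for j in range(p)]
var1 = sum((d[0] - m[0]) ** 2 for d in draws) / N
cov12 = sum((d[0] - m[0]) * (d[1] - m[1]) for d in draws) / N
cov13 = sum((d[0] - m[0]) * (d[2] - m[2]) for d in draws) / N
```

The recursion keeps every marginal variance at 1, so covariances and correlations coincide and decay geometrically at rate rho, matching σ_ij = 0.5^{|i−j|}.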

5. Application to Healthcare Operations Decision

Vaginal delivery is the most common type of birth. However, due to several medical reasons, and with the advancement of medical procedures, caesarian delivery is often considered as an alternative way of delivery. Recently, a few studies showed how the recommended type of delivery may depend on various clinical aspects of the mother, including age, blood pressure and heart problems (Rydahl et al. (2019), Amorim et al. (2017), Pieper (2012)). We consider a dataset about caesarian section

Table 1 layout: for each (n, p) scenario and each method, the columns give the coverage of the lower 90% CI of ‖β‖; then, for β_min, β_max and the average over all components of β: the middle CI coverage (with average width in parentheses), the upper CI coverage and the lower CI coverage.

(30,3)
PEBBLE | 0.916 | 0.885 (2.82), 0.861, 0.918 | 0.936 (3.95), 0.928, 0.914 | 0.900 (3.09), 0.888, 0.913
Normal | 0.952 | 0.947 (2.31), 0.956, 0.896 | 0.964 (2.86), 0.909, 0.993 | 0.958 (2.42), 0.939, 0.935
PRRB   | 0.946 | 0.916 (2.17), 0.926, 0.873 | 0.943 (2.66), 0.914, 0.932 | 0.930 (2.27), 0.915, 0.905
OSB    | 0.953 | 0.942 (2.34), 0.940, 0.889 | 0.930 (2.67), 0.911, 0.939 | 0.930 (2.38), 0.916, 0.921
QB     | 0.976 | 0.952 (2.49), 0.950, 0.924 | 0.958 (3.07), 0.936, 0.965 | 0.936 (2.53), 0.920, 0.940

(50,3)
PEBBLE | 0.888 | 0.891 (2.07), 0.878, 0.923 | 0.909 (2.89), 0.925, 0.895 | 0.904 (2.20), 0.901, 0.912
Normal | 0.937 | 0.927 (1.76), 0.924, 0.901 | 0.948 (2.18), 0.906, 0.971 | 0.936 (1.80), 0.920, 0.930
PRRB   | 0.917 | 0.892 (1.68), 0.896, 0.880 | 0.912 (2.06), 0.899, 0.933 | 0.902 (1.71), 0.896, 0.909
OSB    | 0.925 | 0.911 (1.79), 0.907, 0.885 | 0.905 (2.04), 0.903, 0.928 | 0.913 (1.77), 0.908, 0.910
QB     | 0.932 | 0.915 (1.86), 0.904, 0.913 | 0.916 (2.12), 0.904, 0.935 | 0.922 (1.84), 0.914, 0.922

(50,4)
PEBBLE | 0.909 | 0.901 (2.92), 0.877, 0.936 | 0.909 (3.87), 0.936, 0.879 | 0.902 (2.71), 0.897, 0.910
Normal | 0.931 | 0.926 (2.14), 0.952, 0.902 | 0.951 (2.62), 0.906, 0.985 | 0.939 (2.03), 0.926, 0.926
PRRB   | 0.928 | 0.899 (1.99), 0.933, 0.860 | 0.938 (2.42), 0.899, 0.949 | 0.906 (1.88), 0.906, 0.894
OSB    | 0.958 | 0.928 (2.20), 0.943, 0.920 | 0.937 (2.44), 0.908, 0.952 | 0.928 (2.03), 0.926, 0.919
QB     | 0.954 | 0.924 (2.11), 0.931, 0.915 | 0.926 (2.40), 0.891, 0.954 | 0.924 (1.99), 0.923, 0.912

(100,3)
PEBBLE | 0.880 | 0.877 (1.19), 0.878, 0.896 | 0.896 (1.69), 0.912, 0.891 | 0.887 (1.35), 0.894, 0.894
Normal | 0.926 | 0.912 (1.08), 0.909, 0.904 | 0.918 (1.40), 0.911, 0.901 | 0.913 (1.18), 0.903, 0.903
PRRB   | 0.905 | 0.901 (1.08), 0.907, 0.901 | 0.912 (1.39), 0.916, 0.891 | 0.901 (1.18), 0.902, 0.898
OSB    | 0.906 | 0.897 (1.09), 0.900, 0.899 | 0.896 (1.39), 0.915, 0.877 | 0.897 (1.18), 0.900, 0.894
QB     | 0.899 | 0.897 (1.08), 0.889, 0.900 | 0.880 (1.33), 0.907, 0.873 | 0.894 (1.17), 0.895, 0.895

(100,4)
PEBBLE | 0.885 | 0.907 (1.79), 0.891, 0.927 | 0.900 (2.24), 0.920, 0.880 | 0.898 (1.71), 0.899, 0.902
Normal | 0.928 | 0.917 (1.39), 0.924, 0.903 | 0.942 (1.65), 0.912, 0.929 | 0.916 (1.35), 0.910, 0.904
PRRB   | 0.901 | 0.889 (1.35), 0.892, 0.900 | 0.896 (1.60), 0.905, 0.881 | 0.887 (1.32), 0.893, 0.887
OSB    | 0.915 | 0.904 (1.41), 0.918, 0.900 | 0.914 (1.63), 0.915, 0.899 | 0.904 (1.36), 0.906, 0.900
QB     | 0.940 | 0.920 (1.49), 0.934, 0.902 | 0.943 (1.86), 0.937, 0.926 | 0.912 (1.42), 0.913, 0.903

(100,6)
PEBBLE | 0.931 | 0.910 (1.77), 0.880, 0.917 | 0.907 (2.79), 0.929, 0.868 | 0.906 (2.08), 0.908, 0.902
Normal | 0.857 | 0.874 (1.23), 0.883, 0.871 | 0.903 (1.68), 0.882, 0.937 | 0.871 (1.34), 0.877, 0.887
PRRB   | 0.849 | 0.854 (1.22), 0.878, 0.870 | 0.884 (1.66), 0.869, 0.914 | 0.848 (1.33), 0.866, 0.874
OSB    | 0.933 | 0.797 (1.29), 0.848, 0.831 | 0.832 (1.66), 0.845, 0.872 | 0.791 (1.37), 0.837, 0.846
QB     | 0.953 | 0.819 (1.37), 0.865, 0.838 | 0.863 (1.84), 0.857, 0.902 | 0.807 (1.44), 0.848, 0.854

(200,3)
PEBBLE | 0.891 | 0.906 (0.86), 0.897, 0.905 | 0.918 (1.21), 0.908, 0.915 | 0.903 (1.01), 0.896, 0.906
Normal | 0.905 | 0.904 (0.78), 0.902, 0.910 | 0.910 (1.03), 0.936, 0.879 | 0.902 (0.89), 0.912, 0.894
PRRB   | 0.902 | 0.900 (0.77), 0.896, 0.904 | 0.899 (1.02), 0.930, 0.874 | 0.893 (0.88), 0.904, 0.892
OSB    | 0.905 | 0.902 (0.78), 0.900, 0.917 | 0.897 (1.01), 0.935, 0.870 | 0.895 (0.88), 0.910, 0.893
QB     | 0.867 | 0.890 (0.75), 0.889, 0.913 | 0.868 (0.93), 0.924, 0.842 | 0.871 (0.83), 0.893, 0.877

(200,4)
PEBBLE | 0.872 | 0.898 (1.08), 0.890, 0.908 | 0.912 (1.54), 0.922, 0.893 | 0.900 (1.11), 0.900, 0.905
Normal | 0.919 | 0.917 (0.89), 0.902, 0.917 | 0.910 (1.18), 0.918, 0.893 | 0.906 (0.92), 0.905, 0.902
PRRB   | 0.899 | 0.908 (0.88), 0.891, 0.915 | 0.891 (1.15), 0.916, 0.876 | 0.892 (0.91), 0.896, 0.890
OSB    | 0.905 | 0.911 (0.89), 0.897, 0.914 | 0.901 (1.16), 0.925, 0.880 | 0.900 (0.92), 0.905, 0.898
QB     | 0.926 | 0.924 (0.93), 0.905, 0.923 | 0.921 (1.23), 0.930, 0.892 | 0.917 (0.97), 0.912, 0.907

(200,6)
PEBBLE | 0.927 | 0.915 (1.32), 0.890, 0.930 | 0.921 (1.79), 0.933, 0.875 | 0.913 (1.59), 0.908, 0.906
Normal | 0.794 | 0.833 (0.89), 0.855, 0.868 | 0.892 (1.17), 0.915, 0.862 | 0.847 (1.01), 0.863, 0.868
PRRB   | 0.791 | 0.829 (0.90), 0.860, 0.872 | 0.872 (1.18), 0.911, 0.859 | 0.840 (1.02), 0.860, 0.865
OSB    | 0.904 | 0.751 (0.92), 0.813, 0.842 | 0.794 (1.18), 0.893, 0.780 | 0.741 (1.03), 0.814, 0.818
QB     | 0.902 | 0.738 (0.88), 0.804, 0.837 | 0.784 (1.15), 0.893, 0.768 | 0.736 (1.01), 0.814, 0.814

(200,8)
PEBBLE | 0.841 | 0.869 (1.75), 0.837, 0.948 | 0.866 (2.28), 0.965, 0.776 | 0.851 (1.94), 0.866, 0.877
Normal | 0.405 | 0.679 (0.94), 0.886, 0.676 | 0.734 (1.19), 0.696, 0.961 | 0.688 (1.00), 0.778, 0.800
PRRB   | 0.496 | 0.679 (0.98), 0.887, 0.673 | 0.731 (1.23), 0.701, 0.953 | 0.691 (1.03), 0.780, 0.803
OSB    | 0.861 | 0.468 (0.97), 0.810, 0.571 | 0.569 (1.17), 0.634, 0.843 | 0.486 (1.00), 0.680, 0.714
QB     | 0.852 | 0.470 (0.98), 0.805, 0.575 | 0.551 (1.15), 0.637, 0.837 | 0.480 (0.99), 0.680, 0.713

Table 1. Comparative performance study of the proposed method, Perturbation Bootstrap in Logistic Regression (PEBBLE), and the existing methods: Normal approximation (Normal), Pearson Residual Resampling Bootstrap (PRRB), One-Step Bootstrap (OSB) and Quadratic Bootstrap (QB). All coverage analyses are based on 90% confidence intervals (CIs); the average is taken over 1000 experiments, and the results for each experiment are evaluated based on 1000 Bootstrap iterations. We consider the coverage of the lower CI of the norm of β (column 1); the middle (with width), upper and lower CIs of the minimum absolute component of β (columns 2, 3, 4); the middle (with width), upper and lower CIs of the maximum absolute component of β (columns 5, 6, 7); and the middle (with width), upper and lower CIs averaged over all components of β (columns 8, 9, 10). The average widths of the middle CIs corresponding to the min, max and average components are given in parentheses in columns 2, 5, 8 respectively.

Variables        β̂        90% CI (mid)      90% CI (upper)   90% CI (lower)
Age             -0.010    (-0.151, 0.300)   > -0.100         < 0.237
Delivery number  0.263    (-0.544, 0.740)   > -0.398         < 0.601
Delivery time   -0.427    (-0.643, 0.466)   > -0.521         < 0.348
Blood pressure  -0.251    (-0.709, 0.680)   > -0.548         < 0.531
Heart problem    1.702    (-0.139, 2.327)   > 0.145          < 2.105

Table 2. Real data analysis: the estimated coefficients and the corresponding middle, upper and lower 90% CIs are reported for all the covariates; the type of delivery is the dependent variable, taking the value 1 or 0 according to whether the delivery was caesarian or not.

results of 80 pregnant women along with several important related clinical covariates. The dataset is available at the link in footnote 1. We regress the type of delivery (caesarian or not) on several related covariates, namely age, delivery number, delivery time, blood pressure and presence of heart problem. Delivery time takes three values: 0 (timely), 1 (premature) and 2 (latecomer). Blood pressure is coded 0, 1 and 2 for the low, normal and high cases respectively. The covariate indicating a heart problem is also binary, with 0 denoting a healthy condition and 1 denoting the presence of a heart problem. We perform a logistic regression, compute the corresponding CIs using PEBBLE, and report the results in Table 2. Although the middle 90% CIs for all the covariates contain zero, the 90% CI for heart problem lies mostly on the positive half-line; moreover, the upper 90% CI lies entirely on the positive half-line, which suggests that women with heart problems tend to have caesarian deliveries, coinciding with the findings in Yap et al. (2008) and Blaci et al. (2011).
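The interval estimates in Table 2 come from the proposed perturbation Bootstrap. The idea can be sketched for a logistic regression with one covariate plus intercept as below; the Exponential(1) weights, the Newton solver and the percentile construction are illustrative stand-ins, not the exact studentized, smoothed PEBBLE pivot developed in the paper:

```python
import random, math

def fit_logistic(xs, ys, ws=None, iters=30):
    """Weighted logistic MLE (intercept b0, slope b1) via Newton-Raphson.
    ws: positive perturbation weights; None means all ones."""
    n = len(xs)
    if ws is None:
        ws = [1.0] * n
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y, w in zip(xs, ys, ws):
            z = max(-30.0, min(30.0, b0 + b1 * x))  # guard against overflow
            p = 1.0 / (1.0 + math.exp(-z))
            g0 += w * (y - p); g1 += w * (y - p) * x
            v = w * p * (1.0 - p)
            h00 += v; h01 += v * x; h11 += v * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

def pebble_style_ci(xs, ys, B=300, alpha=0.10, seed=1):
    """Percentile CI for the slope from a perturbation bootstrap:
    each replicate re-solves the score equation with i.i.d. positive
    weights (here Exponential(1), mean 1) multiplying the summands."""
    rng = random.Random(seed)
    _, b1 = fit_logistic(xs, ys)
    reps = []
    for _ in range(B):
        ws = [rng.expovariate(1.0) for _ in xs]
        reps.append(fit_logistic(xs, ys, ws)[1])
    reps.sort()
    lo = reps[int((alpha / 2) * B)]
    hi = reps[int((1 - alpha / 2) * B) - 1]
    return b1, (lo, hi)
```

Unlike residual resampling, the perturbation scheme never has to regenerate binary responses; it only reweights the estimating equation, which is what makes it usable when the response is lattice-valued.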

6. Proof of the Results

6.1. Notations

Before going to the proofs we define a few notations. Suppose $\Phi_V$ and $\phi_V$ respectively denote the normal distribution and its density with mean $0$ and covariance matrix $V$. We write $\Phi_V = \Phi$ and $\phi_V = \phi$ when the dispersion matrix $V$ is the identity matrix. $C, C_1, C_2, \dots$ denote generic constants that do not depend on the variables like $n$, $x$, and so on. $\nu_1, \nu_2$ denote vectors in $\mathbb{R}^p$, sometimes with some specific structure (as mentioned in the proofs). $(e_1,\dots,e_p)'$ denotes the standard basis of $\mathbb{R}^p$. For a non-negative integral vector $\alpha = (\alpha_1,\alpha_2,\dots,\alpha_l)'$ and a function $f = (f_1,f_2,\dots,f_l) : \mathbb{R}^l \to \mathbb{R}^l$, $l \ge 1$, let $|\alpha| = \alpha_1 + \dots + \alpha_l$, $\alpha! = \alpha_1!\cdots\alpha_l!$, $f^\alpha = (f_1^{\alpha_1})\cdots(f_l^{\alpha_l})$, $D^\alpha f_1 = D_1^{\alpha_1}\cdots D_l^{\alpha_l}f_1$, where $D_j f_1$ denotes the partial derivative of $f_1$ with respect to the $j$th component, $1 \le j \le l$. We write $D^\alpha = D$ if $\alpha$ has all components equal to $1$. For $t = (t_1,t_2,\dots,t_l)' \in \mathbb{R}^l$ and $\alpha$ as above, define $t^\alpha = t_1^{\alpha_1}\cdots t_l^{\alpha_l}$. For any two vectors $\alpha, \beta \in \mathbb{R}^k$, $\alpha \le \beta$ means that each component of $\alpha$ is smaller than the corresponding component of $\beta$. For a set $A$ and real constants $a_1, a_2$, $a_1A + a_2 = \{a_1y + a_2 : y \in A\}$, $\partial A$ is the boundary of $A$ and $A^\epsilon$ denotes the $\epsilon$-neighbourhood of $A$ for any $\epsilon > 0$. $\mathbb{N}$ is the set of natural numbers. $C(\cdot), C_1(\cdot), \dots$ denote generic constants which depend only on their arguments. Given two probability measures $P_1$ and $P_2$ defined on the same space $(\Omega, \mathcal{F})$, $P_1 * P_2$ denotes the measure on $(\Omega, \mathcal{F})$ obtained by convolution of $P_1$ and $P_2$, and $\|P_1 - P_2\| = |P_1 - P_2|(\Omega)$, $|P_1 - P_2|$ being the total variation of $(P_1 - P_2)$. For a function $g : \mathbb{R}^k \to \mathbb{R}^m$ with $g = (g_1,\dots,g_m)'$,

$$\mathrm{Grad}[g(x)] = \Big(\frac{\partial g_i(x)}{\partial x_j}\Big)_{m \times k}.$$

Before moving to the proofs of the main theorems, we state some auxiliary lemmas. The proofs of Lemmas 3, 10 and 11 are relegated to the Supplementary Material file to save space. We also present the proof of Theorem 2 last, since some steps in the proof of Theorem 3 will be essential in proving Theorem 2.

1. https://archive.ics.uci.edu/ml/datasets/Caesarian+Section+Classification+Dataset

6.2. Auxiliary Lemmas

Lemma 1. Suppose $Y_1,\dots,Y_n$ are zero mean independent r.v.s with $E(|Y_i|^t) < \infty$ for $i = 1,\dots,n$ and $S_n = \sum_{i=1}^n Y_i$. Let $\sigma_t = \sum_{i=1}^n E(|Y_i|^t)$, $c_t^{(1)} = (1 + 2/t)^t$ and $c_t^{(2)} = 2(2+t)^{-1}e^{-t}$. Then, for any $t \ge 2$ and $x > 0$,

$$P\big[|S_n| > x\big] \le c_t^{(1)}\,\sigma_t\,x^{-t} + \exp\big(-c_t^{(2)}\,x^2/\sigma_2\big).$$

Proof of Lemma 1. This inequality was proved in Fuk and Nagaev (1971).
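As a quick numerical sanity check of Lemma 1, the sketch below compares a Monte Carlo estimate of $P(|S_n| > x)$ with the stated bound for Rademacher summands and $t = 3$; the constants follow the statement above, while the simulation settings are arbitrary choices for illustration:

```python
import random, math

def fuk_nagaev_check(n=100, x=30.0, t=3, reps=4000, seed=2):
    """Monte Carlo left side vs. the Lemma 1 bound, Y_i = +/-1 each w.p. 1/2.

    For Rademacher variables E|Y_i|^t = 1 for every t, so
    sigma_t = sigma_2 = n.
    """
    rng = random.Random(seed)
    exceed = 0
    for _ in range(reps):
        s = sum(rng.choice((-1, 1)) for _ in range(n))
        if abs(s) > x:
            exceed += 1
    lhs = exceed / reps
    sigma_t = float(n)
    sigma_2 = float(n)
    c1 = (1 + 2 / t) ** t
    c2 = 2 * (2 + t) ** (-1) * math.exp(-t)
    rhs = c1 * sigma_t * x ** (-t) + math.exp(-c2 * x * x / sigma_2)
    return lhs, rhs
```

Here the bound is loose (the exceedance probability is tiny while the bound is close to one), which is typical: Fuk-Nagaev-type inequalities are designed for uniformity over $x$, not for sharpness at any particular point.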

Lemma 2. For any $t > 0$, $\dfrac{1 - N(t)}{n(t)} \le \dfrac{1}{t}$, where $N(\cdot)$ and $n(\cdot)$ respectively denote the cdf and pdf of a real-valued standard normal r.v.

Proof of Lemma 2. This inequality is proved in Birnbaum (1942).
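The bound in Lemma 2 is the classical Mills-ratio inequality, and it can be checked numerically using only the error function; the following small script is illustrative and not part of the paper:

```python
import math

def mills_ratio(t):
    """(1 - N(t)) / n(t) for the standard normal cdf N and pdf n."""
    tail = 0.5 * math.erfc(t / math.sqrt(2.0))                 # 1 - N(t)
    dens = math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)   # n(t)
    return tail / dens

# The lemma asserts mills_ratio(t) <= 1/t for every t > 0.
for t in (0.1, 0.5, 1.0, 2.0, 5.0):
    assert mills_ratio(t) <= 1.0 / t
```

For large $t$ the ratio actually approaches $1/t$, so the bound is asymptotically tight in the regime where it is used.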

Lemma 3. Suppose $Y_1,\dots,Y_n$ are mean zero independent random vectors in $\mathbb{R}^k$ with $E_n = n^{-1}\sum_{i=1}^n \mathrm{Var}(Y_i)$ converging to some positive definite matrix $V$. Let $s \ge 3$ be an integer and $\bar\rho_{s+\delta} = O(1)$ for some $\delta > 0$. Additionally assume $Z$ to be a $N(0, I_k)$ random vector which is independent of $Y_1,\dots,Y_n$, and the sequence $\{c_n\}_{n\ge1}$ to be such that $c_n = O(n^{-d})$ and $n^{-(s-2)/\tilde k}\log^2 n = o(c_n)$, where $\tilde k = \max\{k+1, s+1\}$ and $d > 0$ is a constant. Then for any Borel set $B$ of $\mathbb{R}^k$,

$$\Big| P\big(\sqrt n\,\bar Y_n + c_nZ \in B\big) - \int_B \psi_{n,s}(x)\,dx \Big| = o\big(n^{-(s-2)/2}\big), \qquad (6.1)$$

where $\psi_{n,s}(\cdot)$ is defined above.

Proof of Lemma 3. See Section 1 of the Supplementary material file.

Lemma 4. Suppose all the assumptions of Lemma 3 are true. Define $d_n = n^{-1/2}c_n$ and $A_\delta = \{x \in \mathbb{R}^k : \|x\| < \delta\}$ for some $\delta > 0$. Let $H : \mathbb{R}^k \to \mathbb{R}^m$ ($k \ge m$) have continuous partial derivatives of all orders on $A_\delta$ and let $\mathrm{Grad}[H(0)]$ be of full row rank. Then for any Borel set $B$ of $\mathbb{R}^m$ we have

$$\Big| P\Big(\sqrt n\,\big\{H(\bar Y_n + d_nZ) - H(0)\big\} \in B\Big) - \int_B \check\psi_{n,s}(x)\,dx \Big| = o\big(n^{-(s-2)/2}\big), \qquad (6.2)$$

where $\check\psi_{n,s}(x) = \Big[1 + \sum_{r=1}^{s-2} n^{-r/2}a_{1,r}(Q_n, x) + \sum_{j=1}^{m_1-1} c_n^{2j}a_{2,j}(x)\Big]\phi_{\check M_n}(x)$, with $m_1 = \inf\{j : c_n^{2j} = o(n^{-(s-2)/2})\}$ and $Q_n$ being the distribution of $\sqrt n\,\bar Y_n$. $a_{1,r}(Q_n,\cdot)$, $r \in \{1,\dots,(s-2)\}$, are polynomials whose coefficients are continuous functions of the first $s$ average cumulants of $\{Y_1,\dots,Y_n\}$. $a_{2,j}(\cdot)$, $j \in \{1,\dots,(m_1-1)\}$, are polynomials whose coefficients are continuous functions of partial derivatives of $H$ of order $(s-1)$ or less. $\check M_n = \bar B E_n \bar B'$ with $\bar B = \mathrm{Grad}[H(0)]$ and $E_n = n^{-1}\sum_{i=1}^n \mathrm{Var}(Y_i)$.

Proof of Lemma 4. This follows exactly along the lines of the proof of Lemma 3.2 in Lahiri (1989).

Lemma 5. Let $Y_1,\dots,Y_n$ be mean zero independent random vectors in $\mathbb{R}^k$ with $n^{-1}\sum_{i=1}^n E\|Y_i\|^3 = O(1)$. Suppose $T_n = E_n^{-1/2}$, where $E_n = n^{-1}\sum_{i=1}^n \mathrm{Var}(Y_i)$ is the average positive definite covariance matrix and $E_n$ converges to some positive definite matrix as $n \to \infty$. Then for any Borel subset $B$ of $\mathbb{R}^k$ we have

$$\Big| P\Big( n^{-1/2}T_n\sum_{i=1}^n Y_i \in B \Big) - \Phi(B) \Big| \le C_{22}(k)\,n^{-1/2}\rho_3 + 2\,\Phi\Big(\big(\partial B\big)^{C_{22}(k)\rho_3 n^{-1/2}}\Big),$$

where $\rho_3 = n^{-1}\sum_{i=1}^n E\|T_nY_i\|^3$.

Proof of Lemma 5. This is a direct consequence of part (a) of Corollary 24.3 in BR(86).

Lemma 6. Suppose $A$, $B$ are matrices such that $(A - aI)$ and $(B - aI)$ are positive semi-definite matrices of the same order, for some $a > 0$. For $r > 0$, $A^r$, $B^r$ are defined in the usual way. Then for any $0 <$

Lemma 7. Suppose all the assumptions of Lemma 4 are true and $\check M_n = I_m$, the $m \times m$ identity matrix. Define $\hat H_n = \sqrt n\,\big\{H(\bar Y_n + d_nZ) - H(0)\big\} + R_n$, where $P\big(\|R_n\| = o(n^{-(s-2)/2})\big) = 1 - o\big(n^{-(s-2)/2}\big)$ and $s$ is as defined in Lemma 3. Then we have

$$\sup_{B \in \mathcal{A}_m}\Big| P\big(\hat H_n \in B\big) - \int_B \check\psi_{n,s}(x)\,dx \Big| = o\big(n^{-(s-2)/2}\big), \qquad (6.3)$$

where the class of sets $\mathcal{A}_m$ is as defined in Section 3.

Proof of Lemma 7. Recall the definition of $(\partial B)^\epsilon$, given in Section 3. For some $B \subseteq \mathbb{R}^m$ and $\delta > 0$, define $B^{n,s,\delta} = (\partial B)^{\delta n^{-(s-2)/2}}$. Hence using Lemma 4, for any $B \in \mathcal{A}_m$ we have

$$\Big| P\big(\hat H_n \in B\big) - \int_B \check\psi_{n,s}(x)\,dx \Big| \le \Big| P\big(\hat H_n \in B\big) - P\Big(\sqrt n\,\big\{H(\bar Y_n + d_nZ) - H(0)\big\} \in B\Big) \Big| + o\big(n^{-(s-2)/2}\big)$$
$$\le P\Big(\|R_n\| \ne o\big(n^{-(s-2)/2}\big)\Big) + 2\,P\Big(\sqrt n\,\big\{H(\bar Y_n + d_nZ) - H(0)\big\} \in B^{n,s,\delta}\Big) + o\big(n^{-(s-2)/2}\big)$$
$$= 2\,P\Big(\sqrt n\,\big\{H(\bar Y_n + d_nZ) - H(0)\big\} \in B^{n,s,\delta}\Big) + o\big(n^{-(s-2)/2}\big) = 2\int_{B^{n,s,\delta}}\check\psi_{n,s}(x)\,dx + o\big(n^{-(s-2)/2}\big) \qquad (6.4)$$

for any $\delta > 0$. Now the calculations at page 213 of BR(86) and the arguments at page 58 of Lahiri (1989) imply that for any $B \in \mathcal{A}_m$,

$$\int_{B^{n,s,\delta}}\check\psi_{n,s}(x)\,dx \le C_{21}(s)\sup_{B \in \mathcal{A}_m}\Phi\big(B^{n,s,\delta}\big) + o\big(n^{-(s-2)/2}\big) = o\big(n^{-(s-2)/2}\big),$$

since $\delta > 0$ is arbitrary. Therefore (6.3) follows from (6.4).

Lemma 8. Let $A$ and $B$ be positive definite matrices of the same order. Then for a given matrix $C$, the solution of the equation $AX + XB = C$ can be expressed as

$$X = \int_0^\infty e^{-tA}\,C\,e^{-tB}\,dt,$$

where $e^{-tA}$ and $e^{-tB}$ are defined in the usual way.

Proof of Lemma 8. This lemma is an easy consequence of Theorem VII.2.3 in Bhatia (1996).
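The integral representation in Lemma 8 is easy to verify directly in the diagonal case, where $e^{-tA}$ is diagonal with entries $e^{-ta_i}$ and the integral can be done in closed form, giving $X_{ij} = C_{ij}/(a_i + b_j)$. A small illustrative check (values arbitrary):

```python
# Diagonal A = diag(a), B = diag(b): the solution of AX + XB = C is
# X_ij = C_ij / (a_i + b_j), which is exactly the value of
# integral over (0, inf) of e^{-t a_i} C_ij e^{-t b_j} dt.
a = [1.0, 3.0]
b = [2.0, 5.0]
C = [[1.0, -2.0], [4.0, 0.5]]

X = [[C[i][j] / (a[i] + b[j]) for j in range(2)] for i in range(2)]

# Check AX + XB = C entrywise (diagonal A, B act by scalar multiplication).
for i in range(2):
    for j in range(2):
        assert abs(a[i] * X[i][j] + X[i][j] * b[j] - C[i][j]) < 1e-12
```

The positive definiteness of $A$ and $B$ is what guarantees $a_i + b_j > 0$, i.e. convergence of the integral; the general (non-diagonal) case is handled by Bhatia's theorem cited above.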

Lemma 9. Let $W_1,\dots,W_n$ be $n$ independent mean $0$ random variables with average variance $s_n^2 = n^{-1}\sum_{i=1}^n EW_i^2$ and $P\big(\max\{|W_i| : i \in \{1,\dots,n\}\} \le C_{30}\big) = 1$ for some positive constant $C_{30}$ and integer $s \ge 3$. $\bar\chi_{\nu,n}$ is the average $\nu$th cumulant. Recall the polynomial $\tilde P_r$, for any non-negative integer $r$, as defined in the beginning of this section. Then there exist two constants $C_{31}(s), C_{32}(s) > 0$ such that, whenever $|t| \le C_{31}(s)\sqrt n\min\big\{C_{30}^{-1}s_n,\; C_{30}^{-s/(s-2)}s_n^{s/(s-2)}\big\}$, we have

$$\Big| E\prod_{j=1}^n e^{in^{-1/2}tW_j} - \sum_{r=0}^{s-2} n^{-r/2}\tilde P_r\big(it : \{\bar\chi_{\nu,n}\}\big)e^{-(s_n^2t^2)/2} \Big| \le C_{32}(s)\,C_{30}^s\,s_n^{-s}\,n^{-(s-2)/2}\Big[(s_n|t|)^s + (s_n|t|)^{3(s-2)}\Big]e^{-(s_n^2t^2)/4}.$$

Proof of Lemma 9. In view of Theorem 9.9 of BR(86), it is enough to show that for any $j \in \{1,\dots,n\}$, $\big| E e^{its_n^{-1}n^{-1/2}W_j} - 1 \big| \le 1/2$ whenever $|t| \le C_{31}(s)\sqrt n\min\big\{C_{30}^{-1}s_n,\; C_{30}^{-s/(s-2)}s_n^{s/(s-2)}\big\}$. This is indeed the case due to the fact that

$$\Big| E e^{its_n^{-1}n^{-1/2}W_j} - 1 \Big| \le \frac{t^2\,EW_j^2}{2ns_n^2}.$$




Lemma 10. Assume the setup of Theorem 2 and let $X_i = y_ix_i$, $i \in \{1,\dots,n\}$. Define $\sigma_n^2 = n^{-1}\sum_{i=1}^n \mathrm{Var}(X_i)$ and $\bar\chi_{\nu,n}$ as the $\nu$th average cumulant of $\{(X_1 - E(X_1)),\dots,(X_n - E(X_n))\}$. $P_r\big({-\Phi_{\sigma_n^2}} : \{\bar\chi_{\nu,n}\}\big)$ is the finite signed measure on $\mathbb{R}$ whose density is $\tilde P_r\big({-D} : \{\bar\chi_{\nu,n}\}\big)\phi_{\sigma_n^2}(x)$. Let $S_0(x) = 1$ and $S_1(x) = x - \lfloor x\rfloor - 1/2$. Suppose $\sigma_n^2$ is bounded away from both $0$ and $\infty$, and assumptions (C.1)-(C.3) of Theorem 2 hold. Then we have

$$\sup_{x\in\mathbb{R}}\Big| P\Big(n^{-1/2}\sum_{i=1}^n\big(X_i - E(X_i)\big) \le x\Big) - \sum_{r=0}^{1} n^{-r/2}(-1)^r S_r\big(n\mu_n + n^{1/2}x\big)\frac{d^r}{dx^r}\Phi_{\sigma_n^2}(x) - n^{-1/2}P_1\big({-\Phi_{\sigma_n^2}} : \{\bar\chi_{\nu,n}\}\big)(x) \Big| = o\big(n^{-1/2}\big), \qquad (6.5)$$

where $P_r\big({-\Phi_{\sigma_n^2}} : \{\bar\chi_{\nu,n}\}\big)(x)$ is the $P_r\big({-\Phi_{\sigma_n^2}} : \{\bar\chi_{\nu,n}\}\big)$ measure of the set $(-\infty, x]$.

Proof of Lemma 10. See Section 1 in the Supplementary material file.

Lemma 11. Let $\breve W_1,\dots,\breve W_n$ be iid mean $0$ non-degenerate random vectors in $\mathbb{R}^l$ for some natural number $l$, with finite fourth absolute moment and $\limsup_{\|t\|\to\infty}\big|Ee^{it'\breve W_1}\big| < 1$ (i.e. Cramer's condition holds). Suppose $\breve W_i = (\breve W_{i1}',\dots,\breve W_{im}')'$, where $\breve W_{ij}$ is a random vector in $\mathbb{R}^{l_j}$ and $\sum_{j=1}^m l_j = l$, $m$ being a fixed natural number. Consider the sequence of random vectors $\tilde W_1,\dots,\tilde W_n$ where $\tilde W_i = (c_{i1}\breve W_{i1}',\dots,c_{im}\breve W_{im}')'$. $\{c_{ij} : i \in \{1,\dots,n\}, j \in \{1,\dots,m\}\}$ is a collection of real numbers such that for any $j \in \{1,\dots,m\}$, $n^{-1}\sum_{i=1}^n|c_{ij}|^4 = O(1)$ and $\liminf_{n\to\infty} n^{-1}\sum_{i=1}^n c_{ij}^2 > 0$. Also assume that $\tilde V_n = n^{-1}\sum_{i=1}^n \mathrm{Var}(\tilde W_i)$ converges to some positive definite matrix, and $\bar\chi_{\nu,n}$ denotes the average $\nu$th cumulant of $\tilde W_1,\dots,\tilde W_n$. Then we have

$$\sup_{B\in\mathcal{A}_l}\Big| P\Big(n^{-1/2}\sum_{i=1}^n\tilde W_i \in B\Big) - \int_B\Big[1 + n^{-1/2}\tilde P_1\big({-D} : \{\bar\chi_{\nu,n}\}\big)\Big]\phi_{\tilde V_n}(t)\,dt \Big| = o\big(n^{-1/2}\big), \qquad (6.6)$$

where the collection of sets $\mathcal{A}_l$ is as defined in Section 3.

Proof of Lemma 11. See Section 1 in the Supplementary material file.
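Cramer's condition in Lemma 11 is exactly what the lattice summands of the original problem violate: for an integer-valued variable the characteristic function is periodic and returns to modulus one, while for a continuous variable it decays. A small numerical illustration (the two distributions are chosen for the example only):

```python
import cmath, random

def abs_cf(samples, t):
    """Monte Carlo estimate of |E exp(i t W)|."""
    n = len(samples)
    return abs(sum(cmath.exp(1j * t * w) for w in samples) / n)

rng = random.Random(3)
lattice = [rng.choice((0, 1)) for _ in range(5000)]   # Bernoulli: lattice
smooth = [rng.gauss(0.0, 1.0) for _ in range(5000)]   # normal: Cramer holds

# At t = 2*pi every lattice summand contributes exp(2*pi*i*integer) = 1,
# so the lattice characteristic function has modulus one again, whereas
# the Gaussian characteristic function is essentially zero there.
t = 2 * cmath.pi
print(abs_cf(lattice, t), abs_cf(smooth, t))
```

This periodicity is the reason a classical Edgeworth expansion fails for the original responses and the paper instead smooths the pivot and works with the lattice correction term of Lemma 10.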

6.3. Proof of Main Results

Proof of Theorem 1. Recall that the studentized pivot is

$$\tilde H_n = \sqrt n\,\hat L_n^{1/2}\big(\hat\beta_n - \beta\big),$$

where $\hat L_n = n^{-1}\sum_{i=1}^n x_ix_i'\,e^{x_i'\hat\beta_n}\big(1 + e^{x_i'\hat\beta_n}\big)^{-2}$. $\hat\beta_n$ is the solution of (1.2). By Taylor's theorem, from (1.2) we have

$$L_n\big(\hat\beta_n - \beta\big) = n^{-1}\sum_{i=1}^n\big(y_i - p(\beta|x_i)\big)x_i - (2n)^{-1}\sum_{i=1}^n x_i\,e^{z_i}\big(1 - e^{z_i}\big)\big(1 + e^{z_i}\big)^{-3}\big(x_i'(\hat\beta_n - \beta)\big)^2, \qquad (6.7)$$

where $|z_i - x_i'\beta| \le |x_i'(\hat\beta_n - \beta)|$ for all $i \in \{1,\dots,n\}$. Now due to the assumption $n^{-1}\sum_{i=1}^n\|x_i\|^3 = O(1)$, by Lemma 1 (with $t = 3$) we have

$$P\Big(\Big| n^{-1}\sum_{i=1}^n\big(y_i - p(\beta|x_i)\big)x_{ij} \Big| \le C_{40}(p)\,n^{-1/2}(\log n)^{1/2}\Big) = 1 - o\big(n^{-1/2}\big), \qquad (6.8)$$

for any $j \in \{1,\dots,p\}$. Again, by assumption $L_n$ converges to some positive definite matrix $L$. Moreover,

$$\Big\| (2n)^{-1}\sum_{i=1}^n x_i\,e^{z_i}\big(1 - e^{z_i}\big)\big(1 + e^{z_i}\big)^{-3}\big(x_i'(\hat\beta_n - \beta)\big)^2 \Big\| \le n^{-1}\sum_{i=1}^n\|x_i\|^3\,\big\|\hat\beta_n - \beta\big\|^2.$$

Hence (6.7) can be rewritten as $(\hat\beta_n - \beta) = f_n(\hat\beta_n - \beta)$, where $f_n$ is a continuous function from $\mathbb{R}^p$ to $\mathbb{R}^p$ satisfying $P\big(\|f_n(\hat\beta_n - \beta)\| \le C_{40}n^{-1/2}(\log n)^{1/2}\big) = 1 - o\big(n^{-1/2}\big)$ whenever $\|\hat\beta_n - \beta\| \le C_{40}n^{-1/2}(\log n)^{1/2}$. Therefore, part (a) of Theorem 1 follows by Brouwer's fixed point theorem.

Now we are going to prove part (b). Note that from (6.7) and the fact that $L_n$ converges to some positive definite matrix $L$, we have for large enough $n$,

$$\tilde H_n = \hat L_n^{1/2}\big(L_n^{-1}\Lambda_n + R_{1n}\big). \qquad (6.9)$$

Here $\Lambda_n = n^{-1/2}\sum_{i=1}^n\big(y_i - p(\beta|x_i)\big)x_i$ and $R_{1n} = -L_n^{-1}\frac{1}{2\sqrt n}\sum_{i=1}^n x_i\,e^{z_i}\big(1 - e^{z_i}\big)\big(1 + e^{z_i}\big)^{-3}\big(x_i'(\hat\beta_n - \beta)\big)^2$, with $|z_i - x_i'\beta| \le |x_i'(\hat\beta_n - \beta)|$ for all $i \in \{1,\dots,n\}$. $L_n$ and $\hat L_n$ are as defined earlier. Now applying part (a) we have $P\big(\|R_{1n}\| = O(n^{-1/2}\log n)\big) = 1 - o\big(n^{-1/2}\big)$. Again by Taylor's theorem we have

$$\hat L_n - L_n = n^{-1}\sum_{i=1}^n x_ix_i'\,e^{x_i'\beta}\big(1 - e^{x_i'\beta}\big)\big(1 + e^{x_i'\beta}\big)^{-3}\,x_i'(\hat\beta_n - \beta) + L_{1n}, \qquad (6.10)$$

where by part (a) we have $P\big(\|L_{1n}\| = O(n^{-1}\log n)\big) = 1 - o\big(n^{-1/2}\big)$. Hence using Lemma 6, part (a) and Taylor's theorem, one can show that $P\big(\|\hat L_n^{1/2} - L_n^{1/2}\| = O(n^{-1/2}(\log n)^{1/2})\big) = 1 - o\big(n^{-1/2}\big)$. Therefore (6.8) and (6.10) imply that

$$\tilde H_n = L_n^{-1/2}\Lambda_n + R_{2n},$$

where $P\big(\|R_{2n}\| = O(n^{-1/2}\log n)\big) = 1 - o\big(n^{-1/2}\big)$. Hence for any set $B \in \mathcal{A}_p$, there exists a constant $C_{41}(p) > 0$ such that

$$\big| P\big(\tilde H_n \in B\big) - \Phi(B) \big| \le \big| P\big(\tilde H_n \in B\big) - P\big(L_n^{-1/2}\Lambda_n \in B\big) \big| + \big| P\big(L_n^{-1/2}\Lambda_n \in B\big) - \Phi(B) \big|$$
$$\le P\big(\|R_{2n}\| > C_{41}(p)\,n^{-1/2}\log n\big) + 2\,P\Big(L_n^{-1/2}\Lambda_n \in \big(\partial B\big)^{C_{41}(p)n^{-1/2}\log n}\Big) + \big| P\big(L_n^{-1/2}\Lambda_n \in B\big) - \Phi(B) \big|$$
$$= O\big(n^{-1/2}\log n\big).$$

The last equality is a consequence of Lemma 5 and the bound on $\|R_{2n}\|$. Therefore part (b) is proved.
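The pivot $\tilde H_n$ used throughout the proof above is simple to evaluate in the scalar-covariate case; the sketch below (with illustrative values, not taken from the paper) computes $\hat L_n$ and $\tilde H_n = \sqrt n\,\hat L_n^{1/2}(\hat\beta_n - \beta)$ for $p = 1$:

```python
import math

def studentized_pivot(xs, beta_hat, beta0):
    """Scalar version of the pivot sqrt(n) * Lhat^{1/2} * (beta_hat - beta0),
    with Lhat = n^{-1} sum x_i^2 e^{x_i b}(1 + e^{x_i b})^{-2}."""
    n = len(xs)
    lhat = sum(x * x * math.exp(x * beta_hat) / (1 + math.exp(x * beta_hat)) ** 2
               for x in xs) / n
    return math.sqrt(n * lhat) * (beta_hat - beta0)

xs = [0.5 * (i % 7) - 1.5 for i in range(50)]   # toy design points
print(studentized_pivot(xs, beta_hat=0.8, beta0=0.5))
```

Note that the studentization multiplies by $\hat L_n^{1/2}$ rather than dividing, because for the logistic MLE the asymptotic variance of $\sqrt n(\hat\beta_n - \beta)$ is the inverse information $L^{-1}$.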

Proof of Theorem 3. By applying Taylor’s theorem, it follows from (2.1) that

$$\hat L_n\big(\hat\beta_n^* - \hat\beta_n\big) = n^{-1}\sum_{i=1}^n\big(y_i - \hat p(x_i)\big)\big(G_i^* - \mu_{G^*}\big)\mu_{G^*}^{-1}x_i - (2n)^{-1}\sum_{i=1}^n x_i\,e^{z_i^*}\big(1 - e^{z_i^*}\big)\big(1 + e^{z_i^*}\big)^{-3}\big(x_i'(\hat\beta_n^* - \hat\beta_n)\big)^2, \qquad (6.11)$$

where $|z_i^* - x_i'\hat\beta_n| \le |x_i'(\hat\beta_n^* - \hat\beta_n)|$ for all $i \in \{1,\dots,n\}$. Now the rest of part (a) of Theorem 3 follows exactly along the lines of the proof of part (a) of Theorem 1. To establish part (b), write $\check y_i = (y_i - p(\beta|x_i))$ and $\hat Y_i = (y_i - \hat p(x_i))$, and define

$$W_i = \Big(\check y_i\,x_i',\;\big(\check y_i^2 - E\check y_i^2\big)z_i'\Big)' \quad\text{and}\quad W_i^* = \Big(\hat Y_i\big(G_i^* - \mu_{G^*}\big)\mu_{G^*}^{-1}x_i',\;\hat Y_i^2\big[\mu_{G^*}^{-2}\big(G_i^* - \mu_{G^*}\big)^2 - 1\big]z_i'\Big)'.$$

First we are going to show that

$$\check H_n = \sqrt n\,H\big(\bar W_n + n^{-1/2}b_nZ\big) + R_n \quad\text{and}\quad \check H_n^* = \sqrt n\,\hat H\big(\bar W_n^* + n^{-1/2}b_nZ\big) + R_n^*,$$

for some functions $H, \hat H : \mathbb{R}^k \to \mathbb{R}^p$, where $k = p + q$ with $q = \frac{p(p+1)}{2}$. $H(\cdot)$, $\hat H(\cdot)$ have continuous partial derivatives of all orders, $H(0) = \hat H(0) = 0$, and $P\big(\|R_n\| = o(n^{-1/2})\big) = 1 - o\big(n^{-1/2}\big)$ and $P_*\big(\|R_n^*\| = o(n^{-1/2})\big) = 1 - o_p\big(n^{-1/2}\big)$. The next step is to apply Lemma 3, Lemma 4 and Lemma 7 to claim that suitable Edgeworth expansions exist for both $\check H_n$ and $\check H_n^*$. The last step is to conclude SOC of the Bootstrap by comparing the Edgeworth expansions. Now (6.7) and part (a) of Theorem 1 imply that

$$\sqrt n\big(\hat\beta_n - \beta\big) = L_n^{-1}\big[\Lambda_n - \xi_n/2\big] + R_{3n}, \qquad (6.12)$$

where $P\big(\|R_{3n}\| \le C_{42}(p)\,n^{-1}(\log n)^{3/2}\big) = 1 - o\big(n^{-1/2}\big)$. Here $\Lambda_n = n^{-1/2}\sum_{i=1}^n\check y_ix_i$ and $\xi_n = n^{-3/2}\sum_{i=1}^n x_i\,e^{x_i'\beta}\big(1 - e^{x_i'\beta}\big)\big(1 + e^{x_i'\beta}\big)^{-3}\big[x_i'L_n^{-1}\Lambda_n\big]^2$. Clearly, $P\big(\|\xi_n\| \le C_{43}(p)\,n^{-1/2}\log n\big) = 1 - o\big(n^{-1/2}\big)$. Therefore, by Taylor's theorem we have

$$\sqrt n\big(\hat L_n - L_n\big)\big(\hat\beta_n - \beta\big) = \xi_n + R_{4n}, \qquad (6.13)$$

where $P\big(\|R_{4n}\| \le C_{44}(p)\,n^{-1}(\log n)^2\big) = 1 - o\big(n^{-1/2}\big)$. Again noting (6.13), by equation (5) at page 52 of Turnbull (1929) we have

$$\hat M_n^{-1/2} - L_n^{-1/2} = -L_n^{-1/2}Z_{1n}L_n^{-1/2} + Z_{2n}, \qquad (6.14)$$

where $\hat M_n - L_n = L_n^{1/2}Z_{1n} + Z_{1n}L_n^{1/2}$. Also it is easy to show that

$$P\big(\|\hat M_n - M_n\| \le C_{45}(p)\,n^{-1}\log n\big) = 1 - o\big(n^{-1/2}\big),$$

where $M_n = n^{-1}\sum_{i=1}^n\check y_i^2x_ix_i'$. Hence using Lemma 6 we have $P\big(\|Z_{2n}\| \le C_{46}(p)\,n^{-1}(\log n)^2\big) = 1 - o\big(n^{-1/2}\big)$. Therefore (6.12)-(6.14), Lemma 8 and the fact that $b_n = O(n^{-d})$ (for some

$d > 0$) imply that

$$\check H_n = L_n^{-1/2}\big[\Lambda_n + b_nZ + \xi_n/2\big] - L_n^{-1/2}\Big[\int_0^\infty e^{-tL_n^{1/2}}\big(M_n - L_n\big)e^{-tL_n^{1/2}}\,dt\Big]L_n^{-1/2}\Lambda_n + R_{5n}, \qquad (6.15)$$

where $P\big(\|R_{5n}\| \le C_{47}(p)\,n^{-1/2}(\log n)^{-1}\big) = 1 - o\big(n^{-1/2}\big)$. Now writing $W_i = (W_{i1}', W_{i2}')'$ and $\bar W_n = n^{-1}\sum_{i=1}^n W_i = (\bar W_{n1}', \bar W_{n2}')'$, where $W_{i1}$ consists of the first $p$ components of $W_i$ for all $i \in \{1,\dots,n\}$, we have

$$\Lambda_n + b_nZ = \sqrt n\big(\bar W_{n1} + n^{-1/2}b_nZ\big)$$
$$\xi_n = n^{-1/2}\sum_{i=1}^n x_i\,e^{x_i'\beta}\big(1 - e^{x_i'\beta}\big)\big(1 + e^{x_i'\beta}\big)^{-3}\Big[\bar W_{n1}'L_n^{-1}x_ix_i'L_n^{-1}\bar W_{n1}\Big] = \sqrt n\big(\bar W_{n1}'\tilde M_1\bar W_{n1},\dots,\bar W_{n1}'\tilde M_p\bar W_{n1}\big)',$$

where $\tilde M_k = n^{-1}\sum_{i=1}^n x_{ik}\,e^{x_i'\beta}\big(1 - e^{x_i'\beta}\big)\big(1 + e^{x_i'\beta}\big)^{-3}L_n^{-1}x_ix_i'L_n^{-1}$ for $k \in \{1,\dots,p\}$. Hence writing $\tilde W_{n1} = \bar W_{n1} + n^{-1/2}b_nZ$ we have

$$L_n^{-1/2}\big[\Lambda_n + b_nZ + \xi_n/2\big] = \sqrt n\Big[L_n^{-1/2}\tilde W_{n1} + \big(\tilde W_{n1}'\breve M_1\tilde W_{n1},\dots,\tilde W_{n1}'\breve M_p\tilde W_{n1}\big)'\Big], \qquad (6.16)$$

since $b_n = O(n^{-d})$ and $\|\tilde M_k\| = O(1)$ for any $k \in \{1,\dots,p\}$. Here $\breve M_k = \sum_{j=1}^p L_{kjn}^{-1/2}\tilde M_j/2$, $k \in \{1,\dots,p\}$, with $L_{kjn}^{-1/2}$ being the $(k,j)$th element of $L_n^{-1/2}$. Again, the $j$th row of $\big(M_n - L_n\big)$ is $\bar W_{n2}'E_{jn}$, where $E_{jn}$ is a matrix of order $q \times p$ with $\|E_{jn}\| \le q$, $j \in \{1,\dots,p\}$. Therefore from (6.7) and (6.10) we have

$$L_n^{-1/2}\Big[\int_0^\infty e^{-tL_n^{1/2}}\big(M_n - L_n\big)e^{-tL_n^{1/2}}\,dt\Big]L_n^{-1/2}\Lambda_n = \sqrt n\big(\tilde W_{n2}'\bar M_1\tilde W_{n1},\dots,\tilde W_{n2}'\bar M_p\tilde W_{n1}\big)' + R_{6n}, \qquad (6.17)$$

where $\tilde W_{n2} = \bar W_{n2} + n^{-1/2}b_nZ_1$ with $Z_1 \sim N_q\big(0, I_q\big)$, independent of $Z$ and $\{y_1,\dots,y_n\}$. $\bar M_k = \int_0^\infty\Big[\sum_{j=1}^p m_{kjn}(t)\check M_j(t)\Big]dt$, where $m_{kjn}(t)$ is the $(k,j)$th element of the matrix $L_n^{-1/2}e^{-tL_n^{1/2}}$ and $\check M_j(t) = E_{jn}e^{-tL_n^{1/2}}L_n^{-1/2}$, $k, j \in \{1,\dots,p\}$. Moreover, $P\big(\|R_{6n}\| \le C_{48}\,n^{-1}\log n\big) = 1 - o\big(n^{-1/2}\big)$. Now define the $(p+q)\times(p+q)$ matrices $M_1^\dagger,\dots,M_p^\dagger$ where

$$M_k^\dagger = \begin{bmatrix}\breve M_k & 0\\ \bar M_k & 0\end{bmatrix}.$$

Therefore from (6.15)-(6.17) we have

$$\check H_n = \sqrt n\Big[\big(L_n^{-1/2}\;\;0\big)\tilde W_n + \big(\tilde W_n'M_1^\dagger\tilde W_n,\dots,\tilde W_n'M_p^\dagger\tilde W_n\big)'\Big] + R_n = \sqrt n\,H\big(\tilde W_n\big) + R_n, \qquad (6.18)$$

where the function $H(\cdot)$ has continuous partial derivatives of all orders, $\tilde W_n = (\tilde W_{n1}', \tilde W_{n2}')'$ and $R_n = R_{5n} + R_{6n}$.

Through the same line of arguments, writing $\bar W_{n1}^* = n^{-1}\sum_{i=1}^n W_{i1}^* = n^{-1}\sum_{i=1}^n\hat Y_i\mu_{G^*}^{-1}\big(G_i^* - \mu_{G^*}\big)x_i$ and $\bar W_{n2}^* = n^{-1}\sum_{i=1}^n W_{i2}^* = n^{-1}\sum_{i=1}^n\hat Y_i^2\big[\mu_{G^*}^{-2}\big(G_i^* - \mu_{G^*}\big)^2 - 1\big]z_i$, it can be shown that

$$\check H_n^* = \sqrt n\Big[\big(\hat M_n^{-1/2}\;\;0\big)\tilde W_n^* + \big(\tilde W_n^{*\prime}M_1^{*\dagger}\tilde W_n^*,\dots,\tilde W_n^{*\prime}M_p^{*\dagger}\tilde W_n^*\big)'\Big] + R_n^* = \sqrt n\,\hat H\big(\tilde W_n^*\big) + R_n^*, \qquad (6.19)$$

where $\tilde W_n^* = (\tilde W_{n1}^{*\prime}, \tilde W_{n2}^{*\prime})'$ with $\tilde W_{n1}^* = \bar W_{n1}^* + n^{-1/2}b_nZ^*$ and $\tilde W_{n2}^* = \bar W_{n2}^* + n^{-1/2}b_nZ_1^*$, $Z_1^*$ being a $N_q\big(0, I_q\big)$ distributed random vector independent of $\{G_1^*,\dots,G_n^*\}$ and $Z^*$. $M_k^{*\dagger} = \begin{bmatrix}\breve M_k^* & 0\\ \bar M_k^* & 0\end{bmatrix}$, where $\breve M_k^* = \sum_{j=1}^p\hat M_{kjn}^{-1/2}\tilde M_j^*$ with $\hat M_{kjn}^{-1/2}$ being the $(k,j)$th element of $\hat M_n^{-1/2}$, $\tilde M_j^*$ being the same as $\tilde M_j$ after replacing $\beta$ by $\hat\beta_n$. $\bar M_k^* = \int_0^\infty\Big[\sum_{j=1}^p m_{kjn}^*(t)\check M_j^*(t)\Big]dt$, where $m_{kjn}^*(t)$ is the $(k,j)$th element of the matrix $\hat M_n^{-1/2}e^{-t\hat M_n^{1/2}}$ and $\check M_j^*(t) = E_{jn}e^{-t\hat M_n^{1/2}}\hat M_n^{-1/2}$. Also $P_*\big(\|R_n^*\| \le C_{49}\,n^{-1/2}(\log n)^{-1}\big) = 1 - o_p\big(n^{-1/2}\big)$. Now by applying Lemma 3, Lemma 4 and Lemma 7 with $s = 3$, Edgeworth expansions of the densities of $\check H_n$ and $\check H_n^*$ can be found uniformly over the class $\mathcal{A}_p$ up to errors $o\big(n^{-1/2}\big)$ and $o_p\big(n^{-1/2}\big)$ respectively. Call those Edgeworth expansions $\tilde\psi_{n,3}(\cdot)$ and $\tilde\psi_{n,3}^*(\cdot)$ respectively. Now if $\tilde\psi_{n,3}(\cdot)$ is compared with $\check\psi_{n,3}(\cdot)$ of Lemma 4, then $\check M_n = I_p$. Similarly for $\tilde\psi_{n,3}^*(\cdot)$ also $\check M_n = I_p$. Therefore, $\tilde\psi_{n,3}(\cdot)$ and $\tilde\psi_{n,3}^*(\cdot)$ have the forms

$$\tilde\psi_{n,3}(x) = \Big[1 + n^{-1/2}q_1\big(\beta, \mu_W, x\big) + \sum_{j=1}^{m_2-1}b_n^{2j}q_{2j}\big(\beta, L_n, x\big)\Big]\phi(x)$$
$$\tilde\psi_{n,3}^*(x) = \Big[1 + n^{-1/2}q_1\big(\hat\beta_n, \hat\mu_W, x\big) + \sum_{j=1}^{m_2-1}b_n^{2j}q_{2j}\big(\hat\beta_n, \hat M_n, x\big)\Big]\phi(x),$$

where $m_2 = \inf\{j : b_n^{2j} = o(n^{-1/2})\}$. $\mu_W$ is the vector of $\big\{n^{-1}\sum_{i=1}^n E\big(y_i - p(\beta|x_i)\big)^2x_{ij}^{l_1}x_{ij'}^{l_2} : j, j' \in \{1,\dots,p\},\ l_1, l_2 \in \{0,1,2\},\ l_1 + l_2 = 2\big\}$ and $\big\{n^{-1}\sum_{i=1}^n E\big(y_i - p(\beta|x_i)\big)^3x_{ij}^{l_1}x_{ij'}^{l_2}x_{ij''}^{l_3} : j, j', j'' \in \{1,\dots,p\},\ l_1, l_2, l_3 \in \{0,1,2,3\},\ l_1 + l_2 + l_3 = 3\big\}$. $\hat\mu_W$ is the vector of $\big\{n^{-1}\sum_{i=1}^n\big(y_i - \hat p(x_i)\big)^2x_{ij}^{l_1}x_{ij'}^{l_2} : j, j' \in \{1,\dots,p\},\ l_1, l_2 \in \{0,1,2\},\ l_1 + l_2 = 2\big\}$ and $\big\{n^{-1}\sum_{i=1}^n\big(y_i - \hat p(x_i)\big)^3x_{ij}^{l_1}x_{ij'}^{l_2}x_{ij''}^{l_3} : j, j', j'' \in \{1,\dots,p\},\ l_1, l_2, l_3 \in \{0,1,2,3\},\ l_1 + l_2 + l_3 = 3\big\}$. $q_1(a, b, c)$ is a polynomial in $c$ whose coefficients are continuous functions of $(a, b)'$. $q_{2j}(a, b, c)$ are polynomials in $c$ whose coefficients are continuous functions of $a$ and $b$. Now Theorem 3 follows by comparing $\tilde\psi_{n,3}(\cdot)$ and $\tilde\psi_{n,3}^*(\cdot)$ and using part (a) of Theorem 1.

Proof of Theorem 2. Recall that here $p = 1$ and hence $q = 1$. Define $B_n = \sqrt n\,H\big(E_n \times \mathbb{R}\big)$ with $E_n = (-\infty, z_n]$ and $z_n = \frac{3}{4n} - \mu_n$. Here $\mu_n = n^{-1}\sum_{i=1}^n x_i\,p(\beta|x_i)$. Note that $B_n$ is an interval, as argued in Section 3 just after the description of Theorem 2. The function $H(\cdot)$ is defined in (6.18). We are going to show that there exists a positive constant $M_2$ such that

$$\lim_{n\to\infty} P\Big(\sqrt n\,\big| P_*\big(H_n^* \in B_n\big) - P\big(H_n \in B_n\big)\big| \ge M_2\Big) = 1.$$


Define the set $Q_n = \big\{\hat\beta_n - \beta = o\big(n^{-1/2}\log n\big)\big\} \cap \big\{n^{-1}\sum_{i=1}^n\big[\big(y_i - p(\beta|x_i)\big)^2 - E\big(y_i - p(\beta|x_i)\big)^2\big]x_i^2 = o\big(n^{-1/2}\log n\big)\big\} \cap \big\{n^{-1}\sum_{i=1}^n\big[\big(y_i - p(\beta|x_i)\big)^3 - E\big(y_i - p(\beta|x_i)\big)^3\big]x_i^3 = o(1)\big\}$. Now due to a stronger version of (6.8), it is easy to see that $P\big(\hat\beta_n - \beta = o(n^{-1/2}\log n)\big) = 1$ for all but finitely many $n$, upon application of the Borel-Cantelli lemma and noting that $\max\{|x_i| : i \in \{1,\dots,n\}\} = O(1)$. Again by applying Lemma 1, it is easy to show that $P\big(\big\{n^{-1}\sum_{i=1}^n\big[\big(y_i - p(\beta|x_i)\big)^2 - E\big(y_i - p(\beta|x_i)\big)^2\big]x_i^2 = o\big(n^{-1/2}\log n\big)\big\} \cap \big\{n^{-1}\sum_{i=1}^n\big[\big(y_i - p(\beta|x_i)\big)^3 - E\big(y_i - p(\beta|x_i)\big)^3\big]x_i^3 = o(1)\big\}\big) = 1$ for large enough $n$. Hence $P\big(Q_n\big) = 1$ for large enough $n$. Similarly define the Bootstrap version of $Q_n$ as $Q_n^* = \big\{\hat\beta_n^* - \hat\beta_n = o\big(n^{-1/2}\log n\big)\big\} \cap \big\{n^{-1}\sum_{i=1}^n\big(y_i - \hat p(x_i)\big)^2\big[\mu_{G^*}^{-2}\big(G_i^* - \mu_{G^*}\big)^2 - 1\big]x_i^2 = o\big(n^{-1/2}\log n\big)\big\} \cap \big\{n^{-1}\sum_{i=1}^n\big(y_i - \hat p(x_i)\big)^3\big[\mu_{G^*}^{-3}\big(G_i^* - \mu_{G^*}\big)^3 - 1\big]x_i^3 = o(1)\big\}$. Along the same lines, it is easy to establish that $P\big(P_*\big(Q_n^*\big) = 1\big) = 1$ for large enough $n$. Hence it is enough to show

$$\lim_{n\to\infty} P\Big(\Big\{\sqrt n\,\big| P_*\big(\{H_n^* \in B_n\}\cap Q_n^*\big) - P\big(\{H_n \in B_n\}\cap Q_n\big)\big| \ge M_2\Big\}\cap Q_n\Big) = 1. \qquad (6.20)$$

Recall the definitions of $\bar W_n$ and $\bar W_n^*$ from the proof of Theorem 3. Similar to (6.18) and (6.19), it is easy to observe that

$$H_n = \sqrt n\,H\big(\bar W_n\big) + R_n \quad\text{and}\quad H_n^* = \sqrt n\,\hat H\big(\bar W_n^*\big) + R_n^*, \qquad (6.21)$$

where $Q_n \subseteq \big\{|R_n| = O\big(n^{-1/2}(\log n)^{-1}\big)\big\}$ and $Q_n^* \subseteq \big\{|R_n^*| = O\big(n^{-1/2}(\log n)^{-1}\big)\big\}$. To prove (6.20), first we are going to show that for large enough $n$,

$$\Big\{\sqrt n\,\big| P_*\big(\{H_n^* \in B_n\}\cap Q_n^*\big) - P\big(\{H_n \in B_n\}\cap Q_n\big)\big| \ge M_2\Big\}\cap Q_n$$
$$\supseteq \Big\{\sqrt n\,\big| P_*\big(\{\sqrt n\,\hat H(\bar W_n^*) \in B_n\}\cap Q_n^*\big) - P\big(\{\sqrt n\,H(\bar W_n) \in B_n\}\cap Q_n\big)\big| \ge 2M_2\Big\}\cap Q_n. \qquad (6.22)$$

Now due to (6.21), we have

$$\big| P\big(H_n \in B_n\big) - P\big(\sqrt n\,H(\bar W_n) \in B_n\big)\big| \le P\Big(\sqrt n\,H(\bar W_n) \in \big(\partial B_n\big)^{(n\log n)^{-1/2}}\Big) + P\Big(|R_n| \ne o\big(n^{-1/2}(\log n)^{-1}\big)\Big)$$
$$\big| P_*\big(H_n^* \in B_n\big) - P_*\big(\sqrt n\,\hat H(\bar W_n^*) \in B_n\big)\big| \le P_*\Big(\sqrt n\,\hat H(\bar W_n^*) \in \big(\partial B_n\big)^{(n\log n)^{-1/2}}\Big) + P_*\Big(|R_n^*| \ne o\big(n^{-1/2}(\log n)^{-1}\big)\Big).$$

To establish (6.22), it is enough to show that $P\big(\sqrt n\,H(\bar W_n) \in (\partial B_n)^{(n\log n)^{-1/2}}\big) = o\big(n^{-1/2}\big)$ and $P\big(\big\{P_*\big(\sqrt n\,\hat H(\bar W_n^*) \in (\partial B_n)^{(n\log n)^{-1/2}}\big) = o\big(n^{-1/2}\big)\big\}\cap Q_n\big) = 1$ for large enough $n$. An Edgeworth expansion of $\sqrt n\,\bar W_n^*$ with an error $o\big(n^{-1/2}\big)$ (in an almost sure sense) can be established using Lemma 11. Then we can use the transformation technique of Bhattacharya and Ghosh (1978) to find an Edgeworth expansion $\hat\eta_n(\cdot)$ of the density of $\sqrt n\,\hat H(\bar W_n^*)$ with an error $o\big(n^{-1/2}\big)$ (in an almost sure sense). Now calculations similar to those at page 213 of BR(86) imply that $P\big(\big\{P_*\big(\sqrt n\,\hat H(\bar W_n^*) \in (\partial B_n)^{(n\log n)^{-1/2}}\big) = o\big(n^{-1/2}\big)\big\}\cap Q_n\big) = 1$, since $B_n$ is an interval. Next we are going to show that $P\big(\sqrt n\,H(\bar W_n) \in (\partial B_n)^{(n\log n)^{-1/2}}\big) = 0$ for large enough $n$; to show this we need to utilize the form of $B_n$, as an Edgeworth expansion of $\sqrt n\,H(\bar W_n)$ similar to that of $\sqrt n\,\hat H(\bar W_n^*)$ does not exist due to the lattice nature of $W_1,\dots,W_n$. To this end define $k_n(x) = \big(\sqrt n\,H(x/\sqrt n), x_2\big)'$, where $x = (x_1, x_2)'$. Note that $k_n(\cdot)$ is a diffeomorphism (cf. proof of Lemma 3.2 in Lahiri (1989)). Hence $k_n(\cdot)$ is a bijection and $k_n(\cdot)$ and $k_n^{-1}(\cdot)$ have derivatives of all orders. Therefore, the arguments given between (2.15) and (2.18) at page 444 of Bhattacharya and Ghosh (1978) with $g_n$ replaced by $k_n^{-1}(\cdot)$ imply that

$$\big| P\big(H_n \in B_n\big) - P\big(\sqrt n\,H(\bar W_n) \in B_n\big)\big| \le P\Big(\sqrt n\,\bar W_n \in \big(\partial k_n^{-1}(B_n\times\mathbb{R})\big)^{d_n(n\log n)^{-1/2}}\Big) + o\big(n^{-1/2}\big) = P\Big(\sqrt n\,\bar W_{n1} \in \big(\partial E_n\big)^{d_n(n\log n)^{-1/2}}\Big) + o\big(n^{-1/2}\big),$$

where $d_n \le \max\big\{\big|\det\mathrm{Grad}[k_n(x)]\big|^{-1} : |x| = O(\sqrt{\log n})\big\}$. Now by looking into the form of $H(\cdot)$ in (6.18), it is easy to see that $d_n = O(1)$, say $d_n \le C_{44}$ for some positive constant $C_{44}$. Now note that

$$P\Big(\sqrt n\,\bar W_{n1} \in \big(\partial E_n\big)^{C_{44}(n\log n)^{-1/2}}\Big) = P\Big(n^{-1/2}\sum_{i=1}^n y_ix_i - \sqrt n\,\mu_n \in \big[\sqrt n\,z_n - C_{44}(n\log n)^{-1/2},\ \sqrt n\,z_n + C_{44}(n\log n)^{-1/2}\big]\Big)$$
$$= P\Big(\sum_{i=1}^n y_ix_i \in \big[3/4 - C_{44}(\log n)^{-1/2},\ 3/4 + C_{44}(\log n)^{-1/2}\big]\Big) = 0,$$

for large enough $n$, since $\sum_{i=1}^n y_ix_i$ can take only integer values. Therefore (6.22) is established. Now recalling that $\hat\eta_n(\cdot)$ is the Edgeworth expansion of the density of $\sqrt n\,\hat H(\bar W_n^*)$ with an almost sure error $o\big(n^{-1/2}\big)$, we have for large enough $n$,

$$P\Big(\sqrt n\,\Big| P_*\big(\sqrt n\,\hat H(\bar W_n^*) \in B_n\big) - \int_{B_n}\hat\eta_n(x)\,dx\Big| = o(1)\Big) = 1. \qquad (6.23)$$

Now define $U_i = \Big(\big(y_i - p(\beta|x_i)\big)x_iV_i,\ \big(y_i - p(\beta|x_i)\big)^2x_i^2\big(V_i^2 - 1\big)\Big)'$, $i \in \{1,\dots,n\}$, where $V_1,\dots,V_n$ are iid continuous random variables which are independent of $\{y_1,\dots,y_n\}$. Also $E(V_1) = 0$, $E(V_1^2) = E(V_1^3) = 1$ and $EV_1^8 < \infty$. An immediate choice of the distribution of $V_1$ is that of $\big(G_1^* - \mu_{G^*}\big)\mu_{G^*}^{-1}$. Other choices of $\{V_1,\dots,V_n\}$ can be found in Liu (1988), Mammen (1993) and Das et al. (2019). Now since $\max\{|x_i| : i \in \{1,\dots,n\}\} = O(1)$, there exist a natural number $n_0$ and constants $0 < \delta_2 \le \delta_1 < 1$ such that $\sup_{n\ge n_0} p(\beta|x_n) \le \delta_1$ and $\inf_{n\ge n_0} p(\beta|x_n) \ge \delta_2$. Again $V_1,\dots,V_n$ are iid continuous random variables. Hence writing $p_n = p(\beta|x_n)$, for any $b > 0$ we have

$$\sup_{n\ge n_0}\sup_{\|t\|>b}\big| Ee^{it'U_n}\big| \le \sup_{n\ge n_0}\Big[ p_n\sup_{\|t\|>b(1-\delta_1)}\big| Ee^{it_1(1-p_n)V_1 + it_2(1-p_n)^2[V_1^2-1]}\big| + (1 - p_n)\sup_{\|t\|>b\delta_2}\big| Ee^{-it_1p_nV_1 + it_2p_n^2[V_1^2-1]}\big|\Big] < 1,$$

i.e. the uniform Cramer condition holds. Also the minimum eigenvalue condition of Theorem 20.6 of BR(86) holds, due to $\max\{|x_i| : i \in \{1,\dots,n\}\} = O(1)$ and $\liminf_{n\to\infty} n^{-1}\sum_{i=1}^n x_i^6 > 0$. Hence applying Theorem 20.6 of BR(86) and then applying the transformation technique of Bhattacharya and Ghosh (1978) we have

$$\Big| P\big(\sqrt n\,H(\bar U_n) \in B_n\big) - \int_{B_n}\eta_n(x)\,dx\Big| = o\big(n^{-1/2}\big), \qquad (6.24)$$

where $\bar U_n = n^{-1}\sum_{i=1}^n U_i$. Note that in both the expansions $\eta_n(\cdot)$ and $\hat\eta_n(\cdot)$, the coefficients corresponding to the normal term are $1$. Also $\hat H(\cdot)$ can be obtained from $H(\cdot)$ by first replacing $L_n$ by $\hat M_n$ and then $\beta$ by $\hat\beta_n$ (cf. (6.18) and (6.19)). Hence we can conclude that for any Borel set $C$,

$$P\Big(\Big\{\sqrt n\,\Big|\int_C\eta_n(x)\,dx - \int_C\hat\eta_n(x)\,dx\Big| = o(1)\Big\}\cap Q_n\Big) = 1.$$

Hence from (6.23) and (6.24), we have

$$P\Big(\Big\{\sqrt n\,\big| P_*\big(\sqrt n\,\hat H(\bar W_n^*) \in B_n\big) - P\big(\sqrt n\,H(\bar U_n) \in B_n\big)\big| = o(1)\Big\}\cap Q_n\Big) = 1, \qquad (6.25)$$

for large enough $n$. To establish (6.20), in view of (6.22) and (6.25) it is enough to find a positive constant $M_2$ such that

$$\sqrt n\,\big| P\big(\sqrt n\,H(\bar W_n) \in B_n\big) - P\big(\sqrt n\,H(\bar U_n) \in B_n\big)\big| = \sqrt n\,\big| P\big(\sqrt n\,\bar W_{n1} \in E_n\big) - P\big(\sqrt n\,\bar U_{n1} \in E_n\big)\big| \ge 4M_2.$$

Note that since $EV_i^2 = EV_i^3 = 1$ for all $i \in \{1,\dots,n\}$, the first three average moments of $\{W_{11},\dots,W_{n1}\}$ are the same as those of $\{U_{11},\dots,U_{n1}\}$. However, $\{W_{11},\dots,W_{n1}\}$ are independent lattice random variables, whereas $\{U_{11},\dots,U_{n1}\}$ are independent random variables for which the uniform Cramer condition holds. Therefore by Lemma 10 and Theorem 20.6 of BR(86) we have

$$\sup_{x\in\mathbb{R}}\Big| P\big(\sqrt n\,\bar W_{n1} \le x\big) - \Phi_{\sigma_n^2}(x) - n^{-1/2}P_1\big({-\Phi_{\sigma_n^2}} : \{\bar\chi_{\nu,n}\}\big)(x) + n^{-1/2}S_1\big(n\mu_n + \sqrt n\,x\big)\frac{d}{dx}\Phi_{\sigma_n^2}(x)\Big| = o\big(n^{-1/2}\big)$$
$$\text{and}\quad \sup_{x\in\mathbb{R}}\Big| P\big(\sqrt n\,\bar U_{n1} \le x\big) - \Phi_{\sigma_n^2}(x) - n^{-1/2}P_1\big({-\Phi_{\sigma_n^2}} : \{\bar\chi_{\nu,n}\}\big)(x)\Big| = o\big(n^{-1/2}\big), \qquad (6.26)$$

where $P_1\big({-\Phi_{\sigma_n^2}} : \{\bar\chi_{\nu,n}\}\big)(x)$ is as defined in Lemma 10. Recall that $E_n = (-\infty, z_n]$, where $z_n = \frac{3}{4n} - \mu_n$. Therefore for some positive constants $C_{46}, C_{47}, C_{48}$ we have

$$\sqrt n\,\big| P\big(\sqrt n\,\bar W_{n1} \in E_n\big) - P\big(\sqrt n\,\bar U_{n1} \in E_n\big)\big| = \sqrt n\,\big| P\big(\sqrt n\,\bar W_{n1} \le \sqrt n\,z_n\big) - P\big(\sqrt n\,\bar U_{n1} \le \sqrt n\,z_n\big)\big|$$
$$\ge \big| n\mu_n + nz_n - 1/2\big|\big(\sqrt{2\pi}\,\sigma_n\big)^{-1}e^{-(nz_n^2)/(2\sigma_n^2)} - o(1) = \frac14\big(\sqrt{2\pi}\,\sigma_n\big)^{-1}e^{-(nz_n^2)/(2\sigma_n^2)} - o(1)$$
$$\ge C_{46}\exp\Big\{-C_{47}\Big(\frac{9}{16}n^{-1} + n\mu_n^2 - \frac{3\mu_n}{2}\Big)\Big\} - o(1) \ge C_{48}\exp\big\{-C_{47}M_1^2\big\}.$$

The first inequality follows due to (6.26). The second one is due to $\max\{|x_i| : i \in \{1,\dots,n\}\} = O(1)$, and the last one is due to the assumption $\sqrt n\,|\mu_n| < M_1$. Taking $4M_2 = C_{48}\exp\{-C_{47}M_1^2\}$, the proof of Theorem 2 is now complete.
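The key arithmetic step in the proof above is that $\sum y_ix_i$ is integer-valued when the $x_i$ are integers, so a shrinking window around the non-integer point $3/4$ eventually contains no attainable value at all. A toy illustration of this (constants arbitrary):

```python
import math

# With x_i = 1 the statistic sum(y_i * x_i) is a count, hence an integer.
# The window [3/4 - C/sqrt(log n), 3/4 + C/sqrt(log n)] has half-width
# below 1/4 for every n here, and then contains no integer at all, so
# the probability of landing in it is exactly zero, not merely small.
C = 0.1
for n in (10, 100, 10000):
    half = C / math.sqrt(math.log(n))
    lo, hi = 0.75 - half, 0.75 + half
    assert half < 0.25
    assert math.floor(hi) < math.ceil(lo)   # no integer in [lo, hi]
```

This exact-zero boundary probability is what lets the lattice side of the comparison be controlled without an Edgeworth expansion, which the lattice variables do not admit.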

7. Conclusion

In this paper we considered the studentized version of the logistic regression estimator and proposed a novel Bootstrap method called PEBBLE. The rate of convergence of the studentized version to the normal distribution is found to be sub-optimal with respect to the classical Berry-Esseen rate $O(n^{-1/2})$. We observe that the usual studentization also fails significantly in improving the error rate of the Bootstrap approximation, due to the underlying lattice structure. Therefore, a novel modification of the studentized pivot is proposed to achieve SOC by the Bootstrap in approximating the distribution of the studentized logistic regression estimator. The proposed Bootstrap method can be used for practical purposes to draw inferences about the regression parameter which are more accurate than those based on asymptotic normality. PEBBLE is shown to perform better than the other existing methods, in general, via simulation experiments. Specifically, in settings with larger p and smaller n, PEBBLE outperforms the other methods by a large margin. The proposed method is used to find the middle, upper and lower CIs for the covariates in a real data application concerning the dependence of the type of delivery on several related clinical variables. As a future extension, the SOC of the Bootstrap in the generalized linear model (GLM) can be explored. Additionally, one can also explore the high dimensional structure in GLM, that is, when the dimension p grows with n, by adding suitable penalty terms to the underlying objective function.

8. Supplementary Material

8.1. Supplementary Proof Details

8.1.1. Notation

Suppose $\Phi_V$ and $\phi_V$ respectively denote the normal distribution and its density with mean $0$ and covariance matrix $V$. We write $\Phi_V = \Phi$ and $\phi_V = \phi$ when the dispersion matrix $V$ is the identity matrix. $C, C_1, C_2, \dots$ denote generic constants that do not depend on the variables like $n$, $x$, and so on. $\nu_1, \nu_2$ denote vectors in $\mathbb{R}^p$, sometimes with some specific structures (as mentioned in the proofs). $(e_1,\dots,e_p)'$ denotes the standard basis of $\mathbb{R}^p$. For a non-negative integral vector $\alpha = (\alpha_1,\alpha_2,\dots,\alpha_l)'$ and a function $f = (f_1,f_2,\dots,f_l) : \mathbb{R}^l \to \mathbb{R}^l$, $l \ge 1$, let $|\alpha| = \alpha_1 + \dots + \alpha_l$, $\alpha! = \alpha_1!\cdots\alpha_l!$, $f^\alpha = (f_1^{\alpha_1})\cdots(f_l^{\alpha_l})$, $D^\alpha f_1 = D_1^{\alpha_1}\cdots D_l^{\alpha_l}f_1$, where $D_jf_1$ denotes the partial derivative of $f_1$ with respect to the $j$th component, $1 \le j \le l$. We write $D^\alpha = D$ if $\alpha$ has all components equal to $1$. For $t = (t_1,t_2,\dots,t_l)' \in \mathbb{R}^l$ and $\alpha$ as above, define $t^\alpha = t_1^{\alpha_1}\cdots t_l^{\alpha_l}$. For any two vectors $\alpha, \beta \in \mathbb{R}^k$, $\alpha \le \beta$ means that each component of $\alpha$ is smaller than the corresponding component of $\beta$. For a set $A$ and real constants $a_1, a_2$, $a_1A + a_2 = \{a_1y + a_2 : y \in A\}$, $\partial A$ is the boundary of $A$ and $A^\epsilon$ denotes the $\epsilon$-neighbourhood of $A$ for any $\epsilon > 0$. $\mathbb{N}$ is the set of natural numbers. $C(\cdot), C_1(\cdot), \dots$ denote generic constants which depend only on their arguments. Given two probability measures $P_1$ and $P_2$ defined on the same space $(\Omega, \mathcal{F})$, $P_1 * P_2$ denotes the measure on $(\Omega, \mathcal{F})$ obtained by convolution of $P_1$ and $P_2$, and $\|P_1 - P_2\| = |P_1 - P_2|(\Omega)$, $|P_1 - P_2|$ being the total variation of $(P_1 - P_2)$. For a function $g : \mathbb{R}^k \to \mathbb{R}^m$ with $g = (g_1,\dots,g_m)'$,

$$\mathrm{Grad}[g(x)] = \Big(\frac{\partial g_i(x)}{\partial x_j}\Big)_{m \times k}.$$

For any natural number $m$, the class of sets $\mathcal{A}_m$ is the collection of Borel subsets of $\mathbb{R}^m$ satisfying

$$\sup_{B\in\mathcal{A}_m}\Phi\big((\partial B)^\epsilon\big) = O(\epsilon) \quad\text{as } \epsilon \downarrow 0. \qquad (8.1)$$

For Lemma 3 below, define $\xi_{1,n,s}(t) = \Big[ 1 + \sum_{r=1}^{s-2} n^{-r/2} \tilde{P}_r\big( it : \bar{\chi}_{\nu,n} \big) \Big] \exp\big( -t' E_n t/2 \big)$, where $E_n = n^{-1} \sum_{i=1}^n \mathrm{Var}(Y_i)$ and $\bar{\chi}_{\nu,n}$ is the average $\nu$th cumulant of $Y_1, \dots, Y_n$. Define $\bar{\rho}_l = n^{-1} \sum_{i=1}^n E\|Y_i\|^l$, the average $l$th absolute moment of $Y_1, \dots, Y_n$. The polynomials $\tilde{P}_r\big( z : \bar{\chi}_{\nu,n} \big)$ are defined on pages 51-53 of Bhattacharya and Rao (1986). Define the identity

$$\xi_{1,n,s}(t) \sum_{j=0}^{\infty} \frac{\big( -\|t\|^2 c_n^2/2 \big)^j}{j!} = \xi_{n,s}(t) + o\big( n^{-(s-2)/2} \big),$$

uniformly in $\|t\| < 1$, where $c_n$ is defined in Lemma 3. $\psi_{n,s}(\cdot)$ is the Fourier inverse of $\xi_{n,s}(\cdot)$.

8.2. Proofs of Lemmas 3, 10 and 11

Lemma 3. Suppose $Y_1, \dots, Y_n$ are mean zero independent random vectors in $\mathbb{R}^k$ with $E_n = n^{-1} \sum_{i=1}^n \mathrm{Var}(Y_i)$ converging to some positive definite matrix $V$. Let $s \ge 3$ be an integer and $\bar{\rho}_{s+\delta} = O(1)$ for some $\delta > 0$. Additionally assume $Z$ to be a $N(0, I_k)$ random vector which is independent of $Y_1, \dots, Y_n$, and the sequence $\{c_n\}_{n \ge 1}$ to be such that $c_n = O(n^{-d})$ and $n^{-(s-2)/\tilde{k}} \log n = o(c_n^2)$, where $\tilde{k} = \max\{k+1, s+1\}$ and $d > 0$ is a constant. Then for any Borel set $B$ of $\mathbb{R}^k$,

$$P\big( \sqrt{n}\,\bar{Y}_n + c_n Z \in B \big) - \int_B \psi_{n,s}(x)\,dx = o\big( n^{-(s-2)/2} \big), \tag{8.2}$$

where $\psi_{n,s}(\cdot)$ is defined above.

Proof of Lemma 3. Define $V_i = Y_i\,\mathbf{1}\big( \|Y_i\| \le \sqrt{n} \big)$ and $W_i = V_i - E V_i$. Suppose $\bar{\tilde{\chi}}_{\nu,n}$ is the average $\nu$th cumulant of $W_1, \dots, W_n$ and $D_n = n^{-1} \sum_{i=1}^n \mathrm{Var}(W_i)$. Let $\tilde{\xi}_{1,n,s}$, $\tilde{\xi}_{n,s}$ and $\tilde{\psi}_{n,s}$ be respectively obtained from $\xi_{1,n,s}$, $\xi_{n,s}$ and $\psi_{n,s}$ with $\bar{\chi}_{\nu,n}$ replaced by $\bar{\tilde{\chi}}_{\nu,n}$ and $E_n$ replaced by $D_n$. For any Borel set $B \subseteq \mathbb{R}^k$, define $B_n = B - n^{-1/2} \sum_{i=1}^n E V_i$. Then we have

$$\begin{aligned}
\Big| P\big( \sqrt{n}\,\bar{Y}_n + c_n Z \in B \big) - \int_B \psi_{n,s}(x)\,dx \Big|
&\le \Big| P\big( \sqrt{n}\,\bar{Y}_n + c_n Z \in B \big) - P\big( \sqrt{n}\,\bar{V}_n + c_n Z \in B \big) \Big| \\
&\quad + \Big| P\big( \sqrt{n}\,\bar{W}_n + c_n Z \in B_n \big) - \int_{B_n} \tilde{\psi}_{n,s}(x)\,dx \Big| + \Big| \int_{B_n} \tilde{\psi}_{n,s}(x)\,dx - \int_B \psi_{n,s}(x)\,dx \Big| \\
&= I_1 + I_2 + I_3 \quad \text{(say)}. \tag{8.3}
\end{aligned}$$

First we are going to show that $I_1 = o\big( n^{-(s-2)/2} \big)$. Now writing $G_j$ and $G_j'$ for the distributions of $n^{-1/2} Y_j$ and $n^{-1/2} V_j$, $j \in \{1, \dots, n\}$, we have

$$I_1 \le \sum_{j=1}^n \big\| G_j - G_j' \big\| = 2 \sum_{j=1}^n P\big( \|Y_j\| > n^{1/2} \big) = o\big( n^{-(s-2)/2} \big), \tag{8.4}$$

due to the fact that $n^{-1} \sum_{j=1}^n E\|Y_j\|^{s+\delta} = O(1)$. Next we are going to show $I_3 = o\big( n^{-(s-2)/2} \big)$. Define $m_1 = \inf\big\{ j : c_n^{2j} = o\big( n^{-(s-2)/2} \big) \big\}$. Again note that the eigenvalues of $D_n$ are bounded away from $0$, due to (14.18) in Corollary 14.2 of Bhattacharya and Rao (1986) and the fact that $E_n$ converges to some positive definite matrix. Therefore we have

$$I_3 = \Big| \int_{B_n} \tilde{\psi}_{n,s}^{m_1}(x)\,dx - \int_B \psi_{n,s}^{m_1}(x)\,dx \Big| + o\big( n^{-(s-2)/2} \big) = I_{31} + o\big( n^{-(s-2)/2} \big) \quad \text{(say)}, \tag{8.5}$$

uniformly for any Borel set $B$ of $\mathbb{R}^k$, where

$$\psi_{n,s}^{m_1}(x) = \Big[ \sum_{r=0}^{s-2} n^{-r/2} \tilde{P}_r\big( -D : \bar{\chi}_{\nu,n} \big) \Big] \Big[ \sum_{j=0}^{m_1-1} 2^{-j} (j!)^{-1} c_n^{2j} (D'D)^j \Big] \phi_{E_n}(x) \quad \text{and}$$

$$\tilde{\psi}_{n,s}^{m_1}(x) = \Big[ \sum_{r=0}^{s-2} n^{-r/2} \tilde{P}_r\big( -D : \bar{\tilde{\chi}}_{\nu,n} \big) \Big] \Big[ \sum_{j=0}^{m_1-1} 2^{-j} (j!)^{-1} c_n^{2j} (D'D)^j \Big] \phi_{D_n}(x).$$

Now writing $l(u) = \|u\|^2/2$, $u \in \mathbb{R}^k$, and $a_n = n^{-1/2} \sum_{i=1}^n E V_i$, from (8.4) we have

$$\begin{aligned}
I_{31} &\le \sum_{r=0}^{s-2} \sum_{j=0}^{m_1-1} n^{-r/2} c_n^{2j} \bigg[ \int_{B_n} \Big| \tilde{P}_r\big( -D : \bar{\chi}_{\nu,n} \big) \frac{l(-D)^j}{j!} \phi_{E_n}(x) - \tilde{P}_r\big( -D : \bar{\tilde{\chi}}_{\nu,n} \big) \frac{l(-D)^j}{j!} \phi_{D_n}(x) \Big|\,dx \\
&\qquad + \int_{B_n} \Big| \tilde{P}_r\big( -D : \bar{\chi}_{\nu,n} \big) \frac{l(-D)^j}{j!} \phi_{E_n}(x) - \tilde{P}_r\big( -D : \bar{\chi}_{\nu,n} \big) \frac{l(-D)^j}{j!} \phi_{E_n}(x - a_n) \Big|\,dx \bigg] + o\big( n^{-(s-2)/2} \big) \\
&= I_{311} + I_{312} + o\big( n^{-(s-2)/2} \big) \quad \text{(say)}.
\end{aligned}$$

(8.6)

Now assume $E_n = I_k$, the $k \times k$ identity matrix. Then, following the proof of Lemma 14.6 of Bhattacharya and Rao (1986), it can be shown that $I_{311} + I_{312} = o\big( n^{-(s-2)/2} \big)$. The main ingredients of the proof are (14.74), (14.78), (14.79) and bounds similar to (14.80) and (14.86) in Bhattacharya and Rao (1986). The general case, when $E_n$ converges to a positive definite matrix, follows essentially along the same lines. Hence from (8.5) and (8.6), we have $I_3 = o\big( n^{-(s-2)/2} \big)$. The last step is to show $I_2 = o\big( n^{-(s-2)/2} \big)$. Now let us write $\Gamma_n = \sqrt{n}\,\bar{W}_n + c_n Z$. Then recall that

$$I_2 = \Big| P\big( \Gamma_n \in B_n \big) - \int_{B_n} \tilde{\psi}_{n,s}(x)\,dx \Big|.$$

By Theorem 4 of Chapter 5 of Feller (2014), $\Gamma_n$ has a density with respect to the Lebesgue measure. Let us call that density $q_n(\cdot)$. Then we have

$$I_2 \le \int \big| q_n(x) - \tilde{\psi}_{n,s}(x) \big|\,dx \le \int \big| q_n(x) - \tilde{\psi}_{n,(\tilde{k}-1)}(x) \big|\,dx + \int \big| \tilde{\psi}_{n,s}(x) - \tilde{\psi}_{n,(\tilde{k}-1)}(x) \big|\,dx, \tag{8.7}$$

where $\tilde{k} = \max\{k+1, s+1\}$. Note that $\int \|x\|^j \big| q_n(x) - \tilde{\psi}_{n,(\tilde{k}-1)}(x) \big|\,dx < \infty$ for any $j \in \mathbb{N}$, since $\tilde{\psi}_{n,(\tilde{k}-1)}(x)$ has a negative exponential term and $\bar{W}_n$ is bounded. Therefore by Lemma 11.6 of Bhattacharya and Rao (1986) we have

$$I_2 \le C(k) \Big[ \max_{|\beta| \in \{0, \dots, (k+1)\}} \int \Big| D^\beta\big( \hat{q}_n(t) - \tilde{\xi}_{n,(\tilde{k}-1)}(t) \big) \Big|\,dt \Big] + \int \big| \tilde{\psi}_{n,s}(x) - \tilde{\psi}_{n,(\tilde{k}-1)}(x) \big|\,dx$$

$$= I_{21} + I_{22} \quad \text{(say)}. \tag{8.8}$$

Here $\hat{q}_n(\cdot)$ is the Fourier transform of the density $q_n(\cdot)$. Clearly $I_{22} = o\big( n^{-(s-2)/2} \big)$ by looking into the definition of $\tilde{\psi}_{n,s}(\cdot)$. Now define

$$\breve{\xi}_{n,(\tilde{k}-1)}(t) = \Big[ \sum_{r=0}^{\tilde{k}-3} n^{-r/2} \tilde{P}_r\big( it : \bar{\tilde{\chi}}_{\nu,n} \big) \Big] \exp\bigg( \frac{-t' D_n t - c_n^2 \|t\|^2}{2} \bigg).$$

Then we have

$$I_{21} \le C(k) \max_{|\beta| \in \{0, \dots, (k+1)\}} \bigg[ \int \Big| D^\beta\big( \hat{q}_n(t) - \breve{\xi}_{n,(\tilde{k}-1)}(t) \big) \Big|\,dt + \int \Big| D^\beta\big( \breve{\xi}_{n,(\tilde{k}-1)}(t) - \tilde{\xi}_{n,(\tilde{k}-1)}(t) \big) \Big|\,dt \bigg]$$

$$= I_{211} + I_{212} \quad \text{(say)}.$$

(8.9)

First, we are going to show that $I_{212} = o\big( n^{-(s-2)/2} \big)$. Note that

$$\breve{\xi}_{n,(\tilde{k}-1)}(t) - \tilde{\xi}_{n,(\tilde{k}-1)}(t) = \Big[ \sum_{r=0}^{\tilde{k}-3} n^{-r/2} \tilde{P}_r\big( it : \bar{\tilde{\chi}}_{\nu,n} \big) \Big] \exp\bigg( \frac{-t' D_n t}{2} \bigg) \sum_{j=m_2}^{\infty} \frac{c_n^{2j} \|t\|^{2j} (-1)^j}{2^j j!},$$

where $m_2 = m_2(r) = (s-2)^{-1} m_1 (\tilde{k} - 3 - r)$. Therefore, for any $\beta \in \mathbb{N}^k$ with $|\beta| \in \{0, \dots, k+1\}$ we have

$$D^\beta\big( \breve{\xi}_{n,(\tilde{k}-1)}(t) - \tilde{\xi}_{n,(\tilde{k}-1)}(t) \big) = \sum\nolimits^{*} C_1(\alpha, \beta, \gamma) \sum_{r=0}^{\tilde{k}-3} \sum_{j=m_2}^{\infty} \frac{n^{-r/2} (-1)^j c_n^{2j}}{2^j j!}\, D^\alpha \tilde{P}_r\big( it : \bar{\tilde{\chi}}_{\nu,n} \big)\, D^\gamma \exp\bigg( \frac{-t' D_n t}{2} \bigg)\, D^{\beta-\alpha-\gamma} \|t\|^{2j},$$

(8.10)

where $\sum^{*}$ is over $\alpha, \gamma \in \mathbb{N}^k$ such that $0 \le \alpha, \gamma \le \beta$. Since the degree of the polynomial $\tilde{P}_r\big( it : \bar{\tilde{\chi}}_{\nu,n} \big)$ is $3r$, $D^\alpha \tilde{P}_r\big( it : \bar{\tilde{\chi}}_{\nu,n} \big) = 0$ if $|\alpha| > 3r$. When $|\alpha| \le 3r$, then recalling that $n^{-1} \sum_{i=1}^n E\|Y_i\|^s = O(1)$, by Lemma 9.5 and Lemma 14.1(v) of Bhattacharya and Rao (1986) we have

$$\Big| D^\alpha \tilde{P}_r\big( it : \bar{\tilde{\chi}}_{\nu,n} \big) \Big| \le
\begin{cases}
C_2(\alpha, r)\, \bar{\rho}_s^{\,r/(s-2)} \big( 1 + \bar{\rho}_2 \big)^{r(s-3)/(s-2)} \big( 1 + \|t\|^{3r - |\alpha|} \big), & \text{if } 0 \le r \le (s-2), \\[4pt]
C_3(\alpha, r)\, n^{(r+2-s)/2}\, \bar{\rho}_s \big( 1 + \bar{\rho}_2 \big)^{r-1} \big( 1 + \|t\|^{3r - |\alpha|} \big), & \text{if } r > (s-2).
\end{cases} \tag{8.11}$$

Again note that

$$\Big| D^\gamma \exp\bigg( \frac{-t' D_n t}{2} \bigg) \Big| \le C_4(\gamma)\, \big( 1 + \|t\| \big)^{|\gamma|}\, \|D_n\|^{|\gamma|} \exp\bigg( \frac{-t' D_n t}{2} \bigg) \tag{8.12}$$

$$\text{and} \quad \sum_{j=m_2}^{\infty} \frac{c_n^{2j}\, \big| D^{\beta-\alpha-\gamma} \|t\|^{2j} \big|}{2^j j!} \le C_5(\alpha, \beta, \gamma)\, c_n^{2m_3} \Big[ e^{c_n^2/2} + \|t\|^{m_3} \exp\big( c_n^2 \|t\|^2 / 2 \big) \Big], \tag{8.13}$$


where $m_3 = m_3(\alpha, \beta, \gamma, r) = \max\big\{ m_2, |\beta - \alpha - \gamma|/2 \big\}$. Now combining (8.11)-(8.13), from (8.10) we have $I_{212} = o\big( n^{-(s-2)/2} \big)$. The last step is to show $I_{211} = o\big( n^{-(s-2)/2} \big)$. Recall that

$$\begin{aligned}
I_{211} &= C(k) \max_{|\beta| \in \{0, \dots, (k+1)\}} \int \Big| D^\beta\big( \hat{q}_n(t) - \breve{\xi}_{n,(\tilde{k}-1)}(t) \big) \Big|\,dt \\
&\le C(k) \max_{|\beta| \in \{0, \dots, (k+1)\}} \bigg[ \int_{A_n} \Big| D^\beta\big( \hat{q}_n(t) - \breve{\xi}_{n,(\tilde{k}-1)}(t) \big) \Big|\,dt + \int_{A_n^c} \Big| D^\beta\big( \hat{q}_n(t) - \breve{\xi}_{n,(\tilde{k}-1)}(t) \big) \Big|\,dt \bigg]
\end{aligned}$$

$$= I_{2111} + I_{2112} \quad \text{(say)},$$

(8.14)

where

$$A_n = \Bigg\{ t \in \mathbb{R}^k : \|t\| \le C_6(k)\, \lambda_n^{-1/2} \bigg( \frac{n^{1/2}}{\eta_{\tilde{k}}^{1/(\tilde{k}-2)}} \bigg)^{(\tilde{k}-2)/\tilde{k}} \Bigg\},$$

with $C_6(k)$ being some fixed positive constant, $\lambda_n$ being the largest eigenvalue of $D_n$, $\eta_{\tilde{k}} = n^{-1} \sum_{i=1}^n E\big\| B_n W_i \big\|^{\tilde{k}}$ and $B_n^2 = D_n^{-1}$. Note that

$$D^\beta\big( \hat{q}_n(t) - \breve{\xi}_{n,(\tilde{k}-1)}(t) \big) = \sum_{0 \le \alpha \le \beta} C_7(\alpha, \beta)\, D^\alpha \bigg[ E\big( e^{i\sqrt{n}\,t'\bar{W}_n} \big) - \exp\bigg( \frac{-t' D_n t}{2} \bigg) \sum_{r=0}^{\tilde{k}-3} n^{-r/2} \tilde{P}_r\big( it : \bar{\tilde{\chi}}_{\nu,n} \big) \bigg]\, D^{\beta-\alpha} \exp\bigg( \frac{-c_n^2 \|t\|^2}{2} \bigg),$$

(8.15)

where

$$\Big| D^{\beta-\alpha} \exp\bigg( \frac{-c_n^2 \|t\|^2}{2} \bigg) \Big| \le C_8(\alpha, \beta)\, c_n^{|\beta-\alpha|}\, \big( 1 + \|t\| \big)^{|\beta-\alpha|} \exp\bigg( \frac{-c_n^2 \|t\|^2}{2} \bigg)$$

and by Theorem 9.11, and the remark following it, of Bhattacharya and Rao (1986) we have

$$\Bigg| D^\alpha \bigg[ E\big( e^{i\sqrt{n}\,t'\bar{W}_n} \big) - \exp\bigg( \frac{-t' D_n t}{2} \bigg) \sum_{r=0}^{\tilde{k}-3} n^{-r/2} \tilde{P}_r\big( it : \bar{\tilde{\chi}}_{\nu,n} \big) \bigg] \Bigg| \le C_9(k)\, \lambda_n^{|\alpha|/2}\, \eta_{\tilde{k}}\, n^{-(\tilde{k}-2)/2} \Big[ (t' D_n t)^{(\tilde{k}-|\alpha|)/2} + (t' D_n t)^{(3(\tilde{k}-2)+|\alpha|)/2} \Big] \exp\bigg( \frac{-t' D_n t}{4} \bigg). \tag{8.16}$$

Now note that $\bar{\rho}_{s+\delta} = O(1)$ and $E_n$ converges to a positive definite matrix $E$. Hence, applying

Lemma 14.1(v) (with $s' = \tilde{k}$) and Corollary 14.2 of Bhattacharya and Rao (1986), from (8.15) we have $I_{2111} = o\big( n^{-(s-2)/2} \big)$. Again applying Lemma 14.1(v) and Corollary 14.2 of Bhattacharya and Rao (1986), we have $\eta_{\tilde{k}} \le C_{10}(k, s)\, n^{(\tilde{k}-s)/2}\, \bar{\rho}_s$ for large enough $n$, with $\lambda_n$ converging to some positive number. Therefore we have, for large enough $n$,

$$A_n^c \subseteq B_n, \quad \text{where } B_n = \Big\{ t \in \mathbb{R}^k : \|t\| > C_{11}(k, E)\, n^{(s-2)/2\tilde{k}} \Big\},$$

implying

$$\begin{aligned}
I_{2112} &\le C(k) \max_{|\beta| \in \{0, \dots, (k+1)\}} \int_{B_n} \Big| D^\beta\big( \hat{q}_n(t) - \breve{\xi}_{n,(\tilde{k}-1)}(t) \big) \Big|\,dt \\
&\le C(k) \max_{|\beta| \in \{0, \dots, (k+1)\}} \bigg[ \int_{B_n} \big| D^\beta \hat{q}_n(t) \big|\,dt + \int_{B_n} \big| D^\beta \breve{\xi}_{n,(\tilde{k}-1)}(t) \big|\,dt \bigg]
\end{aligned}$$

$$= I_{21121} + I_{21122} \quad \text{(say)}, \tag{8.17}$$

for large enough $n$. To establish $I_{2112} = o\big( n^{-(s-2)/2} \big)$, first we are going to show $I_{21122} = o\big( n^{-(s-2)/2} \big)$. Note that

$$D^\beta \breve{\xi}_{n,(\tilde{k}-1)}(t) = \sum_{0 \le \alpha \le \beta} C_{12}(\alpha, \beta)\, D^\alpha \bigg[ \sum_{r=0}^{\tilde{k}-3} n^{-r/2} \tilde{P}_r\big( it : \bar{\tilde{\chi}}_{\nu,n} \big) \bigg]\, D^{\beta-\alpha} \exp\bigg( \frac{-t' \tilde{D}_n t}{2} \bigg),$$

where $\tilde{D}_n = D_n + c_n^2 I_k$. We are going to use the bounds (8.11) and (8.12) with $D_n$ replaced by $\tilde{D}_n$. Note that by Corollary 14.2 of Bhattacharya and Rao (1986) and the fact that $c_n = O(n^{-d})$,

$\tilde{D}_n$ converges to the positive definite matrix $E$, the limit of $E_n$. Hence those bounds imply that for large enough $n$,

$$\begin{aligned}
I_{21122} &= C(k) \max_{|\beta| \in \{0, \dots, (k+1)\}} \int_{B_n} \big| D^\beta \breve{\xi}_{n,(\tilde{k}-1)}(t) \big|\,dt \\
&\le C_{13}(k, E)\, n^{(\tilde{k}+1-s)/2} \int_{B_n} \big( 1 + \|t\|^{3(\tilde{k}-1)} \big) \exp\big( -C_{14}(E)\|t\|^2/2 \big)\,dt \\
&\le C_{15}(k, E)\, n^{(\tilde{k}+1-s)/2} \int_{B_n} \exp\big( -C_{14}(E)\|t\|^2/4 \big)\,dt. \tag{8.18}
\end{aligned}$$

Now apply Lemma 2 of the main paper to conclude that $I_{21122} = o\big( n^{-(s-2)/2} \big)$. The only remaining thing to show is $I_{21121} = o\big( n^{-(s-2)/2} \big)$. Note that

$$D^\beta \hat{q}_n(t) = \sum_{0 \le \alpha \le \beta} C_{16}(\alpha, \beta)\, D^\alpha E\big( e^{i\sqrt{n}\,t'\bar{W}_n} \big)\, D^{\beta-\alpha} \exp\bigg( \frac{-c_n^2 \|t\|^2}{2} \bigg), \tag{8.19}$$

where

$$\Big| D^\alpha E\big( e^{i\sqrt{n}\,t'\bar{W}_n} \big) \Big| = \Big| D^\alpha \prod_{i=1}^n E\big( e^{it'W_i/\sqrt{n}} \big) \Big| \quad \text{and} \quad \Big| D^{\beta-\alpha} \exp\bigg( \frac{-c_n^2 \|t\|^2}{2} \bigg) \Big| \le C_{17}(\alpha, \beta)\, \big( 1 + \|t\| \big)^{|\beta-\alpha|} \exp\bigg( \frac{-c_n^2 \|t\|^2}{2} \bigg).$$

Now by Leibniz's rule of differentiation, $D^\alpha \prod_{i=1}^n E\big( e^{it'W_i/\sqrt{n}} \big)$ is the sum of $n^{|\alpha|}$ terms. A typical term is of the form

$$\bigg[ \prod_{i \notin C_r} E\big( e^{it'W_i/\sqrt{n}} \big) \bigg] \prod_{l=1}^{r} D^{\beta_l} E\big( e^{it'W_{i_l}/\sqrt{n}} \big),$$

i Cr l=1 Y6∈   Y    where Cr = i1,...,ir 1,...,n , 1 r α . β1,..., βr are non-negative integral vectors { } ⊂ { } ≤ ≤ | | ′ r β it W /√n satisfying β 1 for all j 1, . . . , r and β = α. Note that D l E e il | j| ≥ ∈ { } j=1 i ≤    P

30 Das D. and Das P. n βl /2E W βl and W 2√n, which imply that −| | k il k| | jl ≤ r it′W /√n β it′W /√n Pr β α E e i D l E e il 2 l=1 | l| = 2| | ≤ i Cr l=1 Y6∈   Y    ′ α i√nt W¯ n α D E e (2n)| |. ⇒ ≤    (s 2)/2k˜ Let Kn = C11(k, E) n − . Therefore from (8.19), for large enough n we have

$$\begin{aligned}
I_{21121} &\le \max_{|\beta| \in \{0, \dots, (k+1)\}} \sum_{0 \le \alpha \le \beta} C_{16}(\alpha, \beta) \int_{B_n} (2n)^{k+1} \big( 1 + \|t\|^{k+1} \big) \exp\bigg( \frac{-c_n^2 \|t\|^2}{2} \bigg)\,dt \\
&\le C_{18}(k)\, (2n)^{k+1} \int_{r \ge K_n} r^{k-1} \big( 1 + r^{k+1} \big)\, e^{-c_n^2 r^2/2}\,dr \\
&\le C_{19}(k)\, (2n)^{k+1}\, c_n^{-1} \int_{r \ge K_n} \frac{1}{2\sqrt{\pi}\, c_n^{-1}}\, e^{-c_n^2 r^2/4}\,dr \\
&\le C_{20}(k)\, n^{k+d+1} \int_{c_n K_n/\sqrt{2}}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\,dz \\
&= o\big( n^{-(s-2)/2} \big). \tag{8.20}
\end{aligned}$$

The second inequality follows by considering the polar transformation. The third inequality follows due to the assumptions that $n^{-(s-2)/\tilde{k}} (\log n) = o(c_n^2)$ and $c_n = O(n^{-d})$. The last equality is an implication of Lemma 2 presented in the main paper. Therefore the proof of Lemma 3 is now complete.

Lemma 10. Assume the setup of Theorem 3 and let $X_i = y_i x_i$, $i \in \{1, \dots, n\}$. Define $\sigma_n^2 = n^{-1} \sum_{i=1}^n \mathrm{Var}(X_i)$ and $\bar{\chi}_{\nu,n}$ as the $\nu$th average cumulant of $\big\{ (X_1 - E(X_1)), \dots, (X_n - E(X_n)) \big\}$. $P_r\big( -\Phi_{\sigma_n^2} : \bar{\chi}_{\nu,n} \big)$ is the finite signed measure on $\mathbb{R}$ whose density is $\tilde{P}_r\big( -D : \bar{\chi}_{\nu,n} \big) \phi_{\sigma_n^2}(x)$. Let $S_0(x) = 1$ and $S_1(x) = x - 1/2$. Suppose $\sigma_n^2$ is bounded away from both $0$ and $\infty$, and assumptions (C.1)-(C.3) of Theorem 3 hold. Then we have

$$\sup_{x \in \mathbb{R}} \bigg| P\Big( n^{-1/2} \sum_{i=1}^n \big( X_i - E(X_i) \big) \le x \Big) - \sum_{r=0}^{1} n^{-r/2} (-1)^r S_r\big( n\mu_n + n^{1/2}x \big) \frac{d^r}{dx^r} \Phi_{\sigma_n^2}(x) - n^{-1/2} P_1\big( -\Phi_{\sigma_n^2} : \bar{\chi}_{\nu,n} \big)(x) \bigg| = o\big( n^{-1/2} \big), \tag{8.21}$$

where $P_r\big( -\Phi_{\sigma_n^2} : \bar{\chi}_{\nu,n} \big)(x)$ is the $P_r\big( -\Phi_{\sigma_n^2} : \bar{\chi}_{\nu,n} \big)$ measure of the set $(-\infty, x]$.

Proof of Lemma 10. For any integer $\alpha$, define $p_n(x_{\alpha,n}) = P\big( \sum_{i=1}^n X_i = \alpha \big)$, where $x_{\alpha,n} = n^{-1/2}(\alpha - n\mu_n)$. Also define $\tilde{X}_n = n^{-1/2} \sum_{i=1}^n \big( X_i - E(X_i) \big)$ and $q_{n,3}(x) = n^{-1/2} \sum_{r=0}^{1} n^{-r/2} \tilde{P}_r\big( -D : \bar{\chi}_{\nu,n} \big) \phi_{\sigma_n^2}(x)$. Note that

$$\begin{aligned}
&\sup_{x \in \mathbb{R}} \bigg| P\big( \tilde{X}_n \le x \big) - \sum_{r=0}^{1} n^{-r/2} (-1)^r S_r\big( n\mu_n + n^{1/2}x \big) \frac{d^r}{dx^r} \Phi_{\sigma_n^2}(x) - n^{-1/2} P_1\big( -\Phi_{\sigma_n^2} : \bar{\chi}_{\nu,n} \big)(x) \bigg| \\
&\le \sup_{x \in \mathbb{R}} \Big| P\big( \tilde{X}_n \le x \big) - Q_{n,3}(x) \Big| + \sup_{x \in \mathbb{R}} \bigg| Q_{n,3}(x) - \sum_{r=0}^{1} n^{-r/2} (-1)^r S_r\big( n\mu_n + n^{1/2}x \big) \frac{d^r}{dx^r} \Phi_{\sigma_n^2}(x) - n^{-1/2} P_1\big( -\Phi_{\sigma_n^2} : \bar{\chi}_{\nu,n} \big)(x) \bigg| \\
&= J_1 + J_2 \quad \text{(say)},
\end{aligned}$$

(8.22)

where $Q_{n,3}(x) = \sum_{\{\alpha : x_{\alpha,n} \le x\}} q_{n,3}(x_{\alpha,n})$. Now the fact that $J_2 = o\big( n^{-1/2} \big)$ follows from Theorem A.4.3 of Bhattacharya and Rao (1986), after dropping terms of order $n^{-1}$. Next we are going to show $J_1 = O\big( n^{-1} \big)$. Note that

$$J_1 \le \sum_{\alpha \in \Theta} \big| p_n(x_{\alpha,n}) - q_{n,3}(x_{\alpha,n}) \big| = J_3 \quad \text{(say)},$$

where $\Theta$ has cardinality $\le C_{33}\, n$, since $P\big( \big| n^{-1} \sum_{i=1}^n X_i \big| \le C_{33} \big) = 1$ for some constant $C_{33} > 0$, due to the assumption that $\max\big\{ |x_j| : j \in \{1, \dots, n\} \big\} = O(1)$. Hence $n^{-1} J_3 \le C_{33}\, \sup_{\alpha \in \Theta} \big| p_n(x_{\alpha,n}) - q_{n,3}(x_{\alpha,n}) \big| = C_{33}\, \sup_{\alpha \in \Theta} J_4(\alpha)$ (say). Hence it is enough to show $\sup_{\alpha \in \Theta} J_4(\alpha) = O\big( n^{-2} \big)$. Now define $g_j(t) = E\big( e^{itX_j} \big)$ and $f_n(t) = E\big( e^{it\tilde{X}_n} \big)$. Then we have

$$f_n\big( \sqrt{n}\,t \big) = \sum_{\alpha \in \Theta} p_n\big( x_{\alpha,n} \big)\, e^{i\sqrt{n}\,t\,x_{\alpha,n}}.$$

Hence by the Fourier inversion formula for lattice random variables (cf. page 230 of Bhattacharya and Rao (1986)), we have

$$p_n\big( x_{\alpha,n} \big) = (2\pi)^{-1} \int_{\mathcal{F}^*} e^{-i\sqrt{n}\,t\,x_{\alpha,n}} f_n\big( \sqrt{n}\,t \big)\,dt = (2\pi)^{-1} n^{-1/2} \int_{\sqrt{n}\mathcal{F}^*} e^{-it\,x_{\alpha,n}} f_n(t)\,dt, \tag{8.23}$$

where $\mathcal{F}^* = (-\pi, \pi)$, the fundamental domain corresponding to the lattice distribution of $\sum_{i=1}^n X_i$. Again note that

$$q_{n,3}\big( x_{\alpha,n} \big) = (2\pi)^{-1} n^{-1/2} \int_{\mathbb{R}} e^{-it\,x_{\alpha,n}} \bigg[ \sum_{r=0}^{1} n^{-r/2} \tilde{P}_r\big( it : \bar{\chi}_{\nu,n} \big) \bigg] e^{-\sigma_n^2 t^2/2}\,dt. \tag{8.24}$$

Now defining the set $E = \big\{ t \in \mathbb{R} : |t| \le C_{31}(s)\, \sqrt{n}\, \min\big\{ C_{33}^{-2}\, \sigma_n^{5/3},\ C_{33}^{-5/3}\, \sigma_n \big\} \big\}$, from (8.23) and (8.24) we have

$$\begin{aligned}
\sup_{\alpha \in \Theta} J_4(\alpha) &\le (2\pi)^{-1} n^{-1/2} \Bigg[ \int_E \bigg| f_n(t) - \sum_{r=0}^{1} n^{-r/2} \tilde{P}_r\big( it : \bar{\chi}_{\nu,n} \big)\, e^{-\sigma_n^2 t^2/2} \bigg|\,dt \\
&\qquad + \int_{\sqrt{n}\mathcal{F}^* \cap E^c} \big| f_n(t) \big|\,dt + \int_{\mathbb{R} \cap E^c} \bigg| \sum_{r=0}^{1} n^{-r/2} \tilde{P}_r\big( it : \bar{\chi}_{\nu,n} \big)\, e^{-\sigma_n^2 t^2/2} \bigg|\,dt \Bigg] \\
&= (2\pi)^{-1} n^{-1/2} \big[ J_{41} + J_{42} + J_{43} \big] \quad \text{(say)}. \tag{8.25}
\end{aligned}$$

Note that $J_{41} = O\big( n^{-3/2} \big)$ by applying Lemma 9 of the main paper with $s = 5$. $J_{43} = O\big( n^{-3/2} \big)$ due to the presence of the exponential term in the integrand and the form of the set $E$. Moreover, noting the form of the set $\mathcal{F}^*$, we can say that there exist constants $C_{34} > 0$, $0 < C_{35}, C_{36} < \pi$ such that

$$J_{42} \le C_{34} \sup_{t \in \sqrt{n}\mathcal{F}^* \cap E^c} \prod_{i=1}^n \big| g_i\big( n^{-1/2} t \big) \big| \le C_{34} \Big[ \sup_{C_{35} \le |t| \le C_{36}} \big| E\big( e^{ity_{i_1}} \big) \big| \Big]^{m} \le C_{34}\, \delta^{m}, \tag{8.26}$$

for some $0 < \delta < 1$. Recall that $x_{ij} = 1$ for all $j \in \{1, \dots, m\}$. The last inequality is due to the fact that there is no period of $E\big( e^{ity_{i_1}} \big)$ in the interval $[C_{35}, C_{36}] \cup [-C_{36}, -C_{35}]$. Now $J_{42} = O\big( n^{-3/2} \big)$ follows from (8.26) since $m \ge (\log n)^2$. Therefore the proof is complete.

Lemma 11. Let $\breve{W}_1, \dots, \breve{W}_n$ be iid mean $0$ non-degenerate random vectors in $\mathbb{R}^l$ for some natural number $l$, with finite fourth absolute moment and $\limsup_{\|t\| \to \infty} \big| E e^{it'\breve{W}_1} \big| < 1$ (i.e., Cramer's condition holds). Suppose $\breve{W}_i = \big( \breve{W}_{i1}', \dots, \breve{W}_{im}' \big)'$, where $\breve{W}_{ij}$ is a random vector in $\mathbb{R}^{l_j}$ and $\sum_{j=1}^m l_j = l$, $m$ being a fixed natural number. Consider the sequence of random variables $\tilde{W}_1, \dots, \tilde{W}_n$ where $\tilde{W}_i = \big( c_{i1}\breve{W}_{i1}', \dots, c_{im}\breve{W}_{im}' \big)'$. $\big\{ c_{ij} : i \in \{1, \dots, n\},\ j \in \{1, \dots, m\} \big\}$ is a collection of real numbers such that for any $j \in \{1, \dots, m\}$, $n^{-1} \sum_{i=1}^n |c_{ij}|^4 = O(1)$ and $\liminf_{n \to \infty} n^{-1} \sum_{i=1}^n c_{ij}^2 > 0$. Also assume that $\tilde{V}_n = n^{-1} \sum_{i=1}^n \mathrm{Var}\big( \tilde{W}_i \big)$ converges to some positive definite matrix and $\bar{\chi}_{\nu,n}$ denotes the average $\nu$th cumulant of $\tilde{W}_1, \dots, \tilde{W}_n$. Then we have

$$\sup_{B \in \mathcal{A}_l} \bigg| P\Big( n^{-1/2} \sum_{i=1}^n \tilde{W}_i \in B \Big) - \int_B \Big[ 1 + n^{-1/2} \tilde{P}_1\big( -D : \bar{\chi}_{\nu,n} \big) \Big] \phi_{\tilde{V}_n}(t)\,dt \bigg| = o\big( n^{-1/2} \big), \tag{8.27}$$

where the collection of sets $\mathcal{A}_l$ is as defined in (8.1).

Proof of Lemma 11. First note that $\tilde{W}_1, \dots, \tilde{W}_n$ is a sequence of independent random vectors. Hence (8.27) follows by Theorem 20.6 of Bhattacharya and Rao (1986), provided there exists $\delta_4 \in (0, 1)$, independent of $n$, such that for all $\upsilon \le \delta_4$,

$$n^{-1} \sum_{i=1}^n E\Big( \big\| \tilde{W}_i \big\|^3\, \mathbf{1}\big( \|\tilde{W}_i\| > \upsilon\sqrt{n} \big) \Big) = o(1) \tag{8.28}$$

and

$$\int_{\|t\| \ge \upsilon\sqrt{n}} \max_{|\alpha| \le l+2} \Big| D^\alpha E\big( \exp\big( it' R_{1n}^{\dagger} \big) \big) \Big|\,dt = o\big( n^{-1/2} \big), \tag{8.29}$$

where $R_{1n}^{\dagger} = n^{-1/2} \sum_{i=1}^n \big( Z_i - E Z_i \big)$ with

$$Z_i = \tilde{W}_i\, \mathbf{1}\big( \|\tilde{W}_i\| \le \upsilon\sqrt{n} \big).$$

First consider (8.28). Note that $\max\big\{ |c_{ij}| : i \in \{1, \dots, n\},\ j \in \{1, \dots, m\} \big\} = O\big( n^{1/4} \big)$. Therefore, we have for any $\upsilon > 0$,

$$\begin{aligned}
n^{-1} \sum_{i=1}^n E\Big( \big\| \tilde{W}_i \big\|^3\, \mathbf{1}\big( \|\tilde{W}_i\| > \upsilon\sqrt{n} \big) \Big)
&\le n^{-3/2} \sum_{i=1}^n E\bigg[ \Big( \sum_{j=1}^m c_{ij}^2 \big\| \breve{W}_{ij} \big\|^2 \Big)^{3/2}\, \mathbf{1}\Big( \sum_{j=1}^m c_{ij}^2 \big\| \breve{W}_{ij} \big\|^2 > \upsilon^2 n \Big) \bigg] \\
&\le n^{-3/2} \sum_{i=1}^n \Big( 1 + \sum_{j=1}^m c_{ij}^2 \Big)^{2} E\Big( \big\| \breve{W}_1 \big\|^3\, \mathbf{1}\big( \|\breve{W}_1\|^2 > C_{37}\, \upsilon^2 n^{1/2} \big) \Big) \\
&= o(1).
\end{aligned}$$

Now consider (8.29). Note that for any $|\alpha| \le l+2$, $\big| D^\alpha E\big( \exp\big( it' R_{1n}^{\dagger} \big) \big) \big|$ is bounded above by a sum of $n^{|\alpha|}$ terms, each of which is bounded above by

$$C_{38}(\alpha)\, n^{-|\alpha|/2}\, \max\Big\{ E\big\| Z_k - E Z_k \big\|^{|\alpha|} : k \in I_n \Big\} \cdot \prod_{i \in I_n^c} \Big| E\big( \exp\big( it' Z_i / \sqrt{n} \big) \big) \Big|, \tag{8.30}$$

where $I_n \subset \{1, \dots, n\}$ is of size $|\alpha|$ and $I_n^c = \{1, \dots, n\} \setminus I_n$. Now for any $\omega > 0$ and $t \in \mathbb{R}^{l_j}$, define the set

$$B_n^{(j)}(t, \omega) = \Big\{ i : 1 \le i \le n \ \text{and} \ |c_{ij}|\, \|t\| > \omega \Big\}.$$

Hence, for any $t \in \mathbb{R}^l$, writing $t = (t_1', \dots, t_m')'$ with $t_j$ of length $l_j$, we have

$$\begin{aligned}
&\sup\bigg\{ \prod_{i \in I_n^c} \Big| E\big( \exp\big( it'Z_i/\sqrt{n} \big) \big) \Big| : \|t\| \ge \upsilon\sqrt{n} \bigg\} = \sup\bigg\{ \prod_{i \in I_n^c} \Big| E\big( \exp\big( it'Z_i \big) \big) \Big| : \|t\| \ge \upsilon \bigg\} \\
&\le \max_{j \in \{1, \dots, m\}} \sup\Bigg\{ \prod_{i \in I_n^c \cap B_n^{(j)}\big( t_j/\|t_j\|,\ \upsilon/\sqrt{2} \big)} \Big[ \big| E\big( \exp\big( i c_{ij}\, t_j' \breve{W}_{1j} \big) \big) \big| + P\big( \|\breve{W}_1\|^2 > C_{37}\, \upsilon^2 n^{1/2} \big) \Big] : \|t_j\| \ge \upsilon/\sqrt{2} \Bigg\}.
\end{aligned}$$

Now since $\Big| I_n^c \cap B_n^{(j)}\big( t_j/\|t_j\|,\ \upsilon/\sqrt{2} \big) \Big| \ge \Big| B_n^{(j)}\big( t_j/\|t_j\|,\ \upsilon/\sqrt{2} \big) \Big| - |\alpha|$, due to Cramer's condition we have

$$\sup\Bigg\{ \prod_{i \in I_n^c \cap B_n^{(j)}\big( t_j/\|t_j\|,\ \upsilon/\sqrt{2} \big)} \Big[ \big| E\big( \exp\big( i c_{ij}\, t_j' \breve{W}_{1j} \big) \big) \big| + P\big( \|\breve{W}_1\|^2 > C_{37}\, \upsilon^2 n^{1/2} \big) \Big] : \|t_j\| \ge \upsilon/\sqrt{2} \Bigg\} \le \theta^{\big| B_n^{(j)}\big( t_j/\|t_j\|,\ \upsilon/\sqrt{2} \big) \big| - |\alpha|},$$

for some $0 < \theta < 1$.

(8.31)

Next note that $\liminf_{n \to \infty} n^{-1} \sum_{i=1}^n c_{ij}^2 > 0$ for all $j \in \{1, \dots, m\}$. Therefore, for any $j \in \{1, \dots, m\}$ and $u \in \mathbb{R}^{l_j}$ with $\|u\| = 1$, there exists $0 < \delta_5 < 1$ such that for sufficiently large $n$ we have

$$n\delta_5 \le \sum_{i=1}^n \big\| u\, c_{ij} \big\|^2 \le \max\big\{ c_{ij}^2 : 1 \le i \le n \big\} \cdot \big| B_n^{(j)}(u, \omega) \big| + \Big( n - \big| B_n^{(j)}(u, \omega) \big| \Big)\, \omega^2 \le C_{38}\, n^{1/2}\, \big| B_n^{(j)}(u, \omega) \big| + n\omega^2,$$

which implies $\big| B_n^{(j)}(u, \omega) \big| \ge C_{39}\, n^{1/2}$ whenever $\omega < \sqrt{\delta_5/2}$. Therefore, taking $\delta_4 = \sqrt{\delta_5/3}$, (8.29) follows from (8.30) and (8.31).

8.3. Supplementary Simulation Details

In this section we present expanded forms of the pivots and the forms of the confidence intervals obtained based on our proposed Bootstrap method. Code details for the reproduction of the results of Sections 6 and 7 of the main manuscript can be supplied if required. Recall that our model is

$$y_i = \begin{cases} 1, & \text{w.p. } p(\beta \mid x_i), \\ 0, & \text{w.p. } 1 - p(\beta \mid x_i), \end{cases}$$

where $p(\beta \mid x_i) = \dfrac{\exp(x_i^T \beta)}{1 + \exp(x_i^T \beta)}$, $i \in \{1, \dots, n\}$. Here $y_1, \dots, y_n$ are independent binary responses and $x_1, \dots, x_n$ are known non-random design vectors. $\beta = (\beta_{1,n}, \dots, \beta_{p,n})'$ is the $p$-dimensional vector of regression parameters. For the rest of this section, $x_{i,A}$ denotes the sub-vector of $x_i$ comprising only the components belonging to the set $A$, where $A \subseteq \{1, \dots, p\}$. For any vector $\gamma$ of length $p$, $\gamma_A$ is the sub-vector of $\gamma$ comprising only the components belonging to the set $A$.
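For concreteness, data from this model and one draw of the PEBBLE version of the estimator (defined next) can be sketched as follows. This is an illustrative sketch only: the helper names and the use of a generic numerical optimizer (`scipy.optimize.minimize`) are our choices, not the paper's code.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit  # expit(eta) = exp(eta) / (1 + exp(eta))

rng = np.random.default_rng(0)

# Simulate from the model: independent y_i with P(y_i = 1) = p(beta | x_i).
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # fixed design, intercept included
beta = np.array([0.5, 1.0, -1.0])
y = rng.binomial(1, expit(X @ beta))

def pebble_draw(X, y, beta_hat, rng):
    """One PEBBLE draw, maximizing the perturbed objective over t.

    G*_i are iid Beta(1/2, 3/2); their mean mu = 1/4 satisfies
    Var(G*) = mu^2 and E(G* - mu)^3 = mu^3, as the method requires.
    """
    mu = 0.25
    G = rng.beta(0.5, 1.5, size=len(y))
    p_hat = expit(X @ beta_hat)
    w = (y - p_hat) * (G - mu)  # perturbations acting on the score part

    def neg_objective(t):
        eta = X @ t
        # minus of: sum_i w_i x_i't + mu * sum_i { p_hat_i x_i't - log(1 + e^{x_i't}) }
        return -(w @ eta + mu * np.sum(p_hat * eta - np.logaddexp(0.0, eta)))

    return minimize(neg_objective, x0=beta_hat, method="BFGS").x
```

Repeating `pebble_draw` over many perturbation resamples (with `beta_hat` the fitted MLE) yields the Bootstrap distribution on which the pivots of this section are built.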

The logistic regression estimator $\hat{\beta}_n$ of $\beta$ is defined as

$$\hat{\beta}_n = \operatorname*{Argmax}_{\beta}\; L\big( \beta \mid y_1, \dots, y_n, x_1, \dots, x_n \big),$$

where $L\big( \beta \mid y_1, \dots, y_n, x_1, \dots, x_n \big) = \prod_{i=1}^n p(x_i)^{y_i} \big( 1 - p(x_i) \big)^{1-y_i}$ is the likelihood. The Bootstrap version [hereafter referred to as PEBBLE] $\hat{\beta}_n^*$ of $\hat{\beta}_n$ is defined as

$$\hat{\beta}_n^* = \operatorname*{argmax}_{t} \Bigg[ \sum_{i=1}^n \big( y_i - \hat{p}(x_i) \big) (x_i' t) \big( G_i^* - \mu_{G^*} \big) + \mu_{G^*} \sum_{i=1}^n \Big\{ \hat{p}(x_i)(x_i' t) - \log\big( 1 + e^{x_i' t} \big) \Big\} \Bigg],$$

where $G_1^*, \dots, G_n^*$ are iid copies of a non-negative and non-degenerate random variable $G^*$ with $\mathrm{Var}(G^*) = \mu_{G^*}^2$ and $E\big( G^* - \mu_{G^*} \big)^3 = \mu_{G^*}^3$. One example of the distribution of $G^*$ is Beta(1/2, 3/2).

8.3.1. Form of the confidence region for the parameter

The original studentized pivot for the parameter vector β is

$$\check{H}_n = \hat{M}_n^{-1/2} \hat{L}_n\, \sqrt{n}\big( \hat{\beta}_n - \beta \big) + \hat{M}_n^{-1/2} b_n Z,$$

where $\hat{L}_n = n^{-1} \sum_{i=1}^n x_i x_i'\, e^{x_i'\hat{\beta}_n} \big( 1 + e^{x_i'\hat{\beta}_n} \big)^{-2}$, $\hat{M}_n = n^{-1} \sum_{i=1}^n \big( y_i - \hat{p}(x_i) \big)^2 x_i x_i'$ and $\hat{p}(x_i) = \dfrac{\exp(x_i^T \hat{\beta}_n)}{1 + \exp(x_i^T \hat{\beta}_n)}$. $Z$ is distributed as $N(0, D)$, where $D$ is a $p \times p$ diagonal matrix, independent of $y_1, \dots, y_n$. $\{b_n\}_{n \ge 1}$ is a sequence of real numbers such that $b_n = O(n^{-d})$ and $n^{-1/p_1} \log n = o(b_n^2)$, where $d > 0$ is a constant and $p_1 = \max\{p+1, 4\}$. The corresponding PEBBLE version of the studentized pivot is defined as

$$\check{H}_n^* = \hat{M}_n^{*-1/2} L_n^*\, \sqrt{n}\big( \hat{\beta}_n^* - \hat{\beta}_n \big) + \hat{M}_n^{*-1/2} b_n Z^*,$$

where $L_n^* = n^{-1} \sum_{i=1}^n x_i x_i'\, e^{x_i'\hat{\beta}_n^*} \big( 1 + e^{x_i'\hat{\beta}_n^*} \big)^{-2}$ and $\hat{M}_n^* = n^{-1} \sum_{i=1}^n \big( y_i - \hat{p}(x_i) \big)^2 x_i x_i'\, \mu_{G^*}^{-2} \big( G_i^* - \mu_{G^*} \big)^2$. $Z^*$ has the same distribution as $Z$, independent of $y_1, \dots, y_n$ and $G_1^*, \dots, G_n^*$.

For some $\alpha \in (0,1)$, let $\big( \|\check{H}_n^*\| \big)_\alpha$ be the $\alpha$th quantile of the Bootstrap distribution of $\|\check{H}_n^*\|$. Then the $100(1-\alpha)\%$ confidence region of $\beta$ is given by

$$\Big\{ \beta : \|\check{H}_n\| \le \big( \|\check{H}_n^*\| \big)_{(1-\alpha)} \Big\}.$$

8.3.2. Form of the confidence intervals for the components of the parameter

The pivot for the $j$th component of $\beta$ is formulated as

$$\check{H}_{j,n} = \hat{\Sigma}_{j,n}^{-\frac{1}{2}} \Big[ \sqrt{n}\big( \hat{\beta}_{j,n} - \beta_j \big) + b_n \big( \hat{L}_n^{-1} \big)_{j\cdot} Z \Big],$$

where $\hat{\beta}_{j,n}$ and $\beta_j$ are respectively the $j$th components of $\hat{\beta}_n$ and $\beta$, $j \in \{1, \dots, p\}$. $\hat{\Sigma}_{j,n}$ is the $(j, j)$th element of $\hat{\Sigma}_n$, where $\hat{\Sigma}_n = \hat{L}_n^{-1} \hat{M}_n \hat{L}_n^{-1}$ and $\big( \hat{L}_n^{-1} \big)_{j\cdot}$ is the $j$th row of $\hat{L}_n^{-1}$. Similarly, the Bootstrap version corresponding to $\check{H}_{j,n}$ is defined as

$$\check{H}_{j,n}^* = \Sigma_{j,n}^{*-\frac{1}{2}} \Big[ \sqrt{n}\big( \hat{\beta}_{j,n}^* - \hat{\beta}_{j,n} \big) + b_n \big( L_n^{*-1} \big)_{j\cdot} Z^* \Big],$$

where $\hat{\beta}_{j,n}^*$ is the $j$th component of the vector $\hat{\beta}_n^*$, $j \in \{1, \dots, p\}$. $\Sigma_{j,n}^*$ is the $(j, j)$th element of $\Sigma_n^*$, where $\Sigma_n^* = L_n^{*-1} \hat{M}_n^* L_n^{*-1}$ and $\big( L_n^{*-1} \big)_{j\cdot}$ is the $j$th row of $L_n^{*-1}$.
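The component pivot above can be computed directly from its ingredients. A minimal sketch (illustrative, not the paper's code), taking $\mu_{G^*} = 1/4$ for the Beta(1/2, 3/2) perturbation and $D = I_p$, with `Z_star` passed in as one $N(0, I_p)$ draw:

```python
import numpy as np
from scipy.special import expit

def component_pivot_star(X, y, beta_hat, beta_star, G, b_n, Z_star, j):
    """One draw of the component pivot H*_{j,n} from the formulas above."""
    n = X.shape[0]
    mu = 0.25                                        # mean of Beta(1/2, 3/2)
    p_star = expit(X @ beta_star)
    # L*_n = n^{-1} sum_i x_i x_i' e^{x_i'b*} (1 + e^{x_i'b*})^{-2}
    L_star = (X * (p_star * (1.0 - p_star))[:, None]).T @ X / n
    # M*_n = n^{-1} sum_i (y_i - p_hat_i)^2 x_i x_i' (G_i - mu)^2 / mu^2
    r2 = ((y - expit(X @ beta_hat)) * (G - mu) / mu) ** 2
    M_star = (X * r2[:, None]).T @ X / n
    L_inv = np.linalg.inv(L_star)
    Sigma_star = L_inv @ M_star @ L_inv              # Sigma*_n = L*^{-1} M* L*^{-1}
    s = np.sqrt(Sigma_star[j, j])                    # (Sigma*_{j,n})^{1/2}
    return (np.sqrt(n) * (beta_star[j] - beta_hat[j]) + b_n * (L_inv[j] @ Z_star)) / s
```

Collecting such draws over many Bootstrap replicates gives the distribution whose quantiles are used next.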

Now define $\big( \check{H}_{j,n}^* \big)_\alpha$ to be the $\alpha$th quantile of the Bootstrap distribution of $\check{H}_{j,n}^*$ for some $\alpha \in (0, 1)$. Then the $100(1-\alpha)\%$ two-sided confidence interval of $\beta_j$ is given by

$$\Bigg[ \hat{\beta}_{j,n} - \frac{\hat{\Sigma}_{j,n}^{1/2}\, u_{1j}^*}{\sqrt{n}},\ \ \hat{\beta}_{j,n} - \frac{\hat{\Sigma}_{j,n}^{1/2}\, l_{1j}^*}{\sqrt{n}} \Bigg],$$

where $l_{1j}^* = \big( \check{H}_{j,n}^* \big)_{\alpha/2} - b_n \hat{\Sigma}_{j,n}^{-1/2} \big( \hat{L}_n^{-1} \big)_{j\cdot} Z$ and $u_{1j}^* = \big( \check{H}_{j,n}^* \big)_{(1-\alpha/2)} - b_n \hat{\Sigma}_{j,n}^{-1/2} \big( \hat{L}_n^{-1} \big)_{j\cdot} Z$. Again, the $100(1-\alpha)\%$ lower and upper confidence intervals of $\beta_j$ are respectively given by

$$\Bigg( -\infty,\ \hat{\beta}_{j,n} - \frac{\hat{\Sigma}_{j,n}^{1/2}\, l_{2j}^*}{\sqrt{n}} \Bigg] \quad \text{and} \quad \Bigg[ \hat{\beta}_{j,n} - \frac{\hat{\Sigma}_{j,n}^{1/2}\, u_{2j}^*}{\sqrt{n}},\ \infty \Bigg),$$

where $l_{2j}^* = \big( \check{H}_{j,n}^* \big)_{\alpha} - b_n \hat{\Sigma}_{j,n}^{-1/2} \big( \hat{L}_n^{-1} \big)_{j\cdot} Z$ and $u_{2j}^* = \big( \check{H}_{j,n}^* \big)_{(1-\alpha)} - b_n \hat{\Sigma}_{j,n}^{-1/2} \big( \hat{L}_n^{-1} \big)_{j\cdot} Z$.

References

[1] AMEMIYA, T. (1976). The maximum likelihood, the minimum chi-square, and the non-linear weighted least squares estimator in the general qualitative response model. Journal of the American Statistical Association. 71 347–351.
[2] AMORIM, M.M.R., SANDRO, A.S.R. and KATZ, L. (2017). Planned caesarean section versus planned vaginal birth for severe pre-eclampsia. Cochrane Database Syst Rev. 10 CD009430.
[3] BALCI, A., DRENTHEN, W., MULDER, B.J. et al. (2011). Pregnancy in women with corrected tetralogy of Fallot: occurrence and predictors of adverse events. Am Heart J. 161 307–313.
[4] BARBE, P. and BERTAIL, P. (2012). The Weighted Bootstrap. Lecture Notes in Statistics.
[5] BERKSON, J. (1944). Application of the Logistic Function to Bio-Assay. Journal of the American Statistical Association. 39 357–365.
[6] BIRNBAUM, Z. W. (1942). An Inequality for Mill's Ratio. Annals of Mathematical Statistics. 13(2) 245–246.
[7] BHATIA, R. (1996). Matrix Analysis. Springer.
[8] BHATTACHARYA, R. N. and GHOSH, J. K. (1978). On the validity of the formal Edgeworth expansion. Ann. Statist. 6 434–451.
[9] BHATTACHARYA, R. N. and RANGA RAO, R. (1986). Normal Approximation and Asymptotic Expansions. John Wiley & Sons.
[10] CLAESKENS, G., AERTS, M. and MOLENBERGHS, G. (2003). A quadratic Bootstrap method and improved estimation in logistic regression. Statistics & Probability Letters. 61(4) 383–394.
[11] COX, D. R. (1958). The Regression Analysis of Binary Sequences. Journal of the Royal Statistical Society: Series B. 20(2) 215–232.

[12] DAS, D., GREGORY, K. and LAHIRI, S. N. (2019). Perturbation Bootstrap in Adaptive Lasso. Ann. Statist. 47 2080–2116.
[13] DAS, D. and LAHIRI, S. N. (2019). Second Order Correctness of Perturbation Bootstrap M-Estimator of Multiple Linear Regression Parameter. Bernoulli. 25 654–682.
[14] DAVISON, A.C., HINKLEY, D.V. and SCHECHTMAN, E. (1986). Efficient Bootstrap simulations. Biometrika. 73 555–566.
[15] EFRON, B. (1979). Bootstrap Methods: Another Look at the Jackknife. Annals of Statistics. 7(1) 1–26.
[16] FAHRMEIR, L. and KAUFMANN, H. (1985). Consistency and Asymptotic Normality of the Maximum Likelihood Estimator in Generalized Linear Models. Annals of Statistics. 13(1) 342–368.
[17] FREEDMAN, D. A. (1981). Bootstrapping Regression Models. Ann. Statist. 9 1218–1228.
[18] FUK, D. H. and NAGAEV, S. V. (1971). Probabilistic inequalities for sums of independent random variables. Teor. Verojatnost. i Primenen. 16 660–675.
[19] GHOSH, J. K. (1994). Higher Order Asymptotics. NSF-CBMS Regional Conference Series in Probability and Statistics, Vol. 4.
[20] GOURIEROUX, C. and MONFORT, A. (1981). Asymptotic properties of the maximum likelihood estimator in dichotomous logit models. Journal of Econometrics. 17(1) 83–97.
[21] HABERMAN, S. J. (1974). Log-Linear Models for Frequency Tables Derived by Indirect Observation: Maximum Likelihood Equations. Annals of Statistics. 2(5) 911–924.
[22] HALL, P. (1992). The Bootstrap and Edgeworth Expansion. Springer Series in Statistics.
[23] HOSMER, D. W., LEMESHOW, S. and STURDIVANT, R. X. (2013). Applied Logistic Regression. Wiley Series in Probability and Statistics.
[24] LAHIRI, S. N. (1989). Bootstrap approximations to the distributions of m-estimators. Thesis.
[25] LAHIRI, S. N. (1992). Bootstrapping M-estimators of a multiple linear regression parameter. Ann. Statist. 20 1548–1570.
[26] LAHIRI, S. N. (1993). Bootstrapping the Studentized Sample Mean of Lattice Variables. Journal of Multivariate Analysis. 45(2) 247–256.
[27] LAHIRI, S. N. (1994). On two-term Edgeworth expansions and Bootstrap approximations for Studentized multivariate M-estimators. Sankhya A. 56 201–226.
[28] LEE, K-W. (1990). Bootstrapping logistic regression models with random regressors. Communications in Statistics - Theory and Methods. 19(7) 2527–2539.
[29] LIU, R. Y. (1988). Bootstrap Procedures under some Non-IID Models. Ann. Statist. 16 1696–1708.
[30] MAMMEN, E. (1993). Bootstrap and Wild Bootstrap for High Dimensional Linear Models. Ann. Statist. 21 255–285.
[31] McFADDEN, D. (1974). Conditional Logit Analysis of Qualitative Choice Behavior. In P. Zarembka, ed., Frontiers in Econometrics. New York: Academic Press.

[32] MOULTON, L. H. and ZEGER, S. L. (1989). Analyzing repeated measures on generalized linear models via the Bootstrap. Biometrics. 45(2) 381–394.
[33] MOULTON, L. H. and ZEGER, S. L. (1991). Bootstrapping generalized linear models. Computational Statistics and Data Analysis. 11(1) 53–63.
[34] PIEPER, P.G. (2012). The pregnant woman with heart disease: management of pregnancy and delivery. Neth Heart J. 20(1) 33–37.
[35] RYDAHL, E., DECLERCQ, E., JUHL, M. and MAIMBURG, R.D. (2019). Cesarean section on a rise—Does advanced maternal age explain the increase? A population register-based study. PLoS One. 14(1) e0210655.
[36] YAP, S.C., DRENTHEN, W., PIEPER, P.G. et al. (2008). On behalf of the ZAHARA Investigators. Risk of complications during pregnancy in women with congenital aortic stenosis. Int J Cardiol. 126 240–246.