Bootstrap

Nathaniel E. Helwig

Assistant Professor of Psychology and Statistics, University of Minnesota (Twin Cities)

Updated 04-Jan-2017

Copyright

Copyright © 2017 by Nathaniel E. Helwig

Outline of Notes

1) Background Information
   - Statistical inference
   - Sampling distributions
   - Need for resampling

2) Bootstrap Basics
   - Overview
   - Empirical distribution
   - Plug-in principle

3) Bootstrap in Practice
   - Bootstrap in R
   - Bias and mean-squared error
   - The Jackknife

4) Bootstrapping Regression
   - Regression review
   - Bootstrapping residuals
   - Bootstrapping pairs

For a thorough treatment see: Efron, B. & Tibshirani, R.J. (1993). An Introduction to the Bootstrap. Chapman & Hall/CRC.


Background Information

The Classic Statistical Paradigm

X is some variable of interest, e.g., age in years

X = {x1, x2, x3, ...} is some population of interest, e.g.,
- ages of all students at the University of Minnesota
- ages of all people in the state of Minnesota

At the population level:
- F(x) = P(X ≤ x) for all x ∈ X is the population CDF
- θ = t(F) is the population parameter, where t is some function of F

At the sample level:
- x = (x1, ..., xn)′ is a sample of data with $x_i \stackrel{iid}{\sim} F$ for i ∈ {1, ..., n}
- θ̂ = s(x) is the sample statistic, where s is some function of x

The Classic Statistical Paradigm (continued)

θ̂ is a random variable that depends on x (and thus F).

The sampling distribution of θ̂ refers to the CDF (or PDF) of θ̂.

If F is known (or assumed to be known), then the sampling distribution of θ̂ may have some known distribution.
- If $x_i \stackrel{iid}{\sim} N(\mu, \sigma^2)$, then $\bar{x} \sim N(\mu, \sigma^2/n)$, where $\bar{x} = \frac{1}{n}\sum_{i=1}^n x_i$
- Note that in the above example, θ ≡ µ and θ̂ ≡ x̄

How can we make inferences about θ using θ̂ when F is unknown?

The Hypothetical Ideal

Assume that X is too large to measure all members of the population.

If we had a really LARGE research budget, we could collect B independent samples from the population X:
- xj = (x1j, ..., xnj)′ is the j-th sample with $x_{ij} \stackrel{iid}{\sim} F$
- θ̂j = s(xj) is the statistic (parameter estimate) for the j-th sample

The sampling distribution of θ̂ can be estimated via the distribution of $\{\hat\theta_j\}_{j=1}^B$.

The Hypothetical Ideal: Example 1 (Normal Mean)

Sampling distribution of x̄ with $x_i \stackrel{iid}{\sim} N(0, 1)$ for n = 100:

[Figure: Six histograms of x̄ for B = 200, 500, 1000, 2000, 5000, and 10000 simulated samples, each overlaid with the theoretical N(0, 1/n) pdf of x̄; x-axis: xbar, y-axis: Density.]

The Hypothetical Ideal: Example 1 R Code

# hypothetical ideal: example 1 (normal mean)
set.seed(1)
n = 100
B = c(200, 500, 1000, 2000, 5000, 10000)
xseq = seq(-0.4, 0.4, length = 200)
quartz(width = 12, height = 8)
par(mfrow = c(2, 3))
for (k in 1:6) {
  X = replicate(B[k], rnorm(n))
  xbar = apply(X, 2, mean)
  hist(xbar, freq = F, xlim = c(-0.4, 0.4), ylim = c(0, 5),
       main = paste("Sampling Distribution: B =", B[k]))
  lines(xseq, dnorm(xseq, sd = 1/sqrt(n)))
  legend("topright", expression(bar(x)*" pdf"), lty = 1, bty = "n")
}

The Hypothetical Ideal: Example 2 (Normal Median)

Sampling distribution of median(x) with $x_i \stackrel{iid}{\sim} N(0, 1)$ for n = 100:

[Figure: Six histograms of median(x) for B = 200, 500, 1000, 2000, 5000, and 10000 simulated samples, each overlaid with the N(0, 1/n) pdf of x̄ for reference; x-axis: xmed, y-axis: Density.]

The Hypothetical Ideal: Example 2 R Code

# hypothetical ideal: example 2 (normal median)
set.seed(1)
n = 100
B = c(200, 500, 1000, 2000, 5000, 10000)
xseq = seq(-0.4, 0.4, length = 200)
quartz(width = 12, height = 8)
par(mfrow = c(2, 3))
for (k in 1:6) {
  X = replicate(B[k], rnorm(n))
  xmed = apply(X, 2, median)
  hist(xmed, freq = F, xlim = c(-0.4, 0.4), ylim = c(0, 5),
       main = paste("Sampling Distribution: B =", B[k]))
  lines(xseq, dnorm(xseq, sd = 1/sqrt(n)))
  legend("topright", expression(bar(x)*" pdf"), lty = 1, bty = "n")
}

Back to the Real World

In most cases, we only have one sample of data. What do we do?

If n is large and we only care about x̄, we can use the CLT.

Sampling distribution of x̄ with $x_i \stackrel{iid}{\sim} U[0, 1]$ for B = 10000:

[Figure: Six histograms of x̄ for n = 3, 5, 10, 20, 50, and 100 (B = 10000 each), each overlaid with the asymptotic normal pdf implied by the CLT; x-axis: xbar, y-axis: Density. The normal approximation improves as n increases.]

The Need for a Nonparametric Resampling Method

For most statistics other than the sample mean, there is no theoretical argument to derive the sampling distribution.

To make inferences, we need to somehow obtain (or approximate) the sampling distribution of any generic statistic θ̂.
- Parametric approaches overcome this issue by assuming some particular distribution for the data.
- The nonparametric bootstrap overcomes this problem by resampling the observed data to approximate the sampling distribution of θ̂.


Bootstrap Basics

Problem of Interest

In statistics, we typically want to know the properties of our estimates, e.g., precision, accuracy, etc.

In the parametric situation, we can often derive the distribution of our estimate given our assumptions about the data (or via MLE principles).

In the nonparametric situation, we can use the bootstrap to examine properties of our estimates in a variety of different situations.

Bootstrap Procedure

Suppose x = (x1, ..., xn)′ with $x_i \stackrel{iid}{\sim} F(x)$ for i ∈ {1, ..., n}, and we want to make inferences about some statistic θ̂ = s(x).

We can use the Monte Carlo bootstrap:

1. Sample $x_i^*$ with replacement from {x1, ..., xn} for i ∈ {1, ..., n}
2. Calculate $\hat\theta^* = s(x^*)$ for the b-th sample, where $x^* = (x_1^*, \ldots, x_n^*)'$
3. Repeat steps 1–2 a total of B times to get the bootstrap distribution of θ̂
4. Compare θ̂ = s(x) to the bootstrap distribution
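As a minimal sketch of these four steps in base R (the statistic, sample size, and number of bootstrap samples below are illustrative choices, not from the notes):

# Monte Carlo bootstrap for the sample median (illustrative sketch)
set.seed(1)
x = rnorm(100)                        # observed sample
B = 2000                              # number of bootstrap samples
thetahat = median(x)                  # statistic computed from the observed data
thetastar = replicate(B, {
  xstar = sample(x, replace = TRUE)   # step 1: resample with replacement
  median(xstar)                       # step 2: bootstrap replication
})                                    # step 3: repeat B times
c(thetahat, mean(thetastar), sd(thetastar))  # step 4: compare to bootstrap distribution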

The estimated standard error of θ̂ is the standard deviation of $\{\hat\theta^*_b\}_{b=1}^B$:

$\hat\sigma_B = \sqrt{\frac{1}{B - 1} \sum_{b=1}^B (\hat\theta^*_b - \bar\theta^*)^2}$

where $\bar\theta^* = \frac{1}{B}\sum_{b=1}^B \hat\theta^*_b$ is the mean of the bootstrap distribution of θ̂.

Empirical Cumulative Distribution Functions

Suppose x = (x1, ..., xn)′ with $x_i \stackrel{iid}{\sim} F(x)$ for i ∈ {1, ..., n}, and we want to estimate the cdf F.

The empirical cumulative distribution function (ecdf) F̂n is defined as

$\hat{F}_n(x) = \hat{P}(X \le x) = \frac{1}{n}\sum_{i=1}^n I_{\{x_i \le x\}}$

where I{·} denotes an indicator function.

The ecdf assigns probability 1/n to each value xi, which implies that

$\hat{P}_n(A) = \frac{1}{n}\sum_{i=1}^n I_{\{x_i \in A\}}$

for any set A in the sample space of X.
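As a quick illustrative check of this definition against R's built-in ecdf function (the toy data are arbitrary):

# the ecdf by hand versus R's built-in ecdf()
x = c(3, 1, 4, 1, 5)
Fn = function(t) mean(x <= t)   # proportion of sample values <= t
Fn(3)                           # 0.6
ecdf(x)(3)                      # 0.6 as well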

Some Properties of ECDFs

For any fixed value x, we have

$E[\hat{F}_n(x)] = F(x)$
$V[\hat{F}_n(x)] = \frac{1}{n} F(x)[1 - F(x)]$

(both follow because $n \hat{F}_n(x) \sim \mathrm{Binomial}(n, F(x))$).

As n → ∞, we have

$\sup_{x \in \mathbb{R}} |\hat{F}_n(x) - F(x)| \stackrel{a.s.}{\to} 0$

which is the Glivenko–Cantelli theorem.

ECDF Visualization for Normal Data

[Figure: ECDF plots of standard normal samples with n = 100, 500, and 1000, each overlaid (in blue) with the true normal CDF; x-axis: x, y-axis: Fn(x).]

set.seed(1)
par(mfrow = c(1, 3))
n = c(100, 500, 1000)
xseq = seq(-4, 4, length = 100)
for (j in 1:3) {
  x = rnorm(n[j])
  plot(ecdf(x), main = paste("n = ", n[j]))
  lines(xseq, pnorm(xseq), col = "blue")
}

ECDF Example

Table 3.1 of An Introduction to the Bootstrap (Efron & Tibshirani, 1993): the law school data.

School   LSAT (y)   GPA (z)
   1       576       3.39
   2       635       3.30
   3       558       2.81
   4       578       3.03
   5       666       3.44
   6       580       3.07
   7       555       3.00
   8       661       3.43
   9       651       3.36
  10       605       3.13
  11       653       3.12
  12       575       2.74
  13       545       2.76
  14       572       2.88
  15       594       2.96

Defining A = {(y, z) : 0 < y < 600, 0 < z < 3.00}, we have

$\hat{P}_{15}(A) = \frac{1}{15}\sum_{i=1}^{15} I_{\{(y_i, z_i) \in A\}} = 5/15$
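A quick sketch of this calculation in R, entering the table's values directly:

# plug-in probability of A = {(y, z): 0 < y < 600, 0 < z < 3.00}
lsat = c(576, 635, 558, 578, 666, 580, 555, 661, 651, 605, 653, 575, 545, 572, 594)
gpa = c(3.39, 3.30, 2.81, 3.03, 3.44, 3.07, 3.00, 3.43, 3.36, 3.13, 3.12, 2.74, 2.76, 2.88, 2.96)
mean(lsat < 600 & gpa < 3.00)   # 5/15 = 0.3333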

Plug-In Parameter Estimates

Suppose x = (x1, ..., xn)′ with $x_i \stackrel{iid}{\sim} F(x)$ for i ∈ {1, ..., n}, and we want to estimate some parameter θ = t(F) that depends on the cdf F. Example: we want to estimate $E(X) = \int x f(x)\,dx$.

The plug-in estimate of θ = t(F) is given by

θ̂ = t(F̂)

which is the statistic calculated using the ecdf in place of the cdf.

Plug-In Estimate of Mean

Suppose x = (x1, ..., xn)′ with $x_i \stackrel{iid}{\sim} F(x)$ for i ∈ {1, ..., n}, and we want to estimate the expected value $\theta = E(X) = \int x f(x)\,dx$.

The plug-in estimate of the expected value is the sample mean

$\hat\theta = E_{\hat{F}}(x) = \sum_{i=1}^n x_i \hat{f}_i = \frac{1}{n}\sum_{i=1}^n x_i = \bar{x}$

where $\hat{f}_i = 1/n$ is the sample probability from the ecdf.

Standard Error of Mean

Let $\mu_F = E_F(x)$ and $\sigma_F^2 = V_F(x) = E_F[(x - \mu_F)^2]$ denote the mean and variance of X, and denote this using the notation $X \sim (\mu_F, \sigma_F^2)$.

If x = (x1, ..., xn)′ with $x_i \stackrel{iid}{\sim} F(x)$ for i ∈ {1, ..., n}, then the sample mean $\bar{x} = \frac{1}{n}\sum_{i=1}^n x_i$ has mean and variance $\bar{x} \sim (\mu_F, \sigma_F^2/n)$.

The standard error of the mean x̄ is the square root of the variance of x̄:

$SE_F(\bar{x}) = \sigma_F / \sqrt{n}$

Plug-In Estimate of Standard Error of Mean

Suppose x = (x1, ..., xn)′ with $x_i \stackrel{iid}{\sim} F(x)$ for i ∈ {1, ..., n}, and we want to estimate the standard error of the mean $SE_F(\bar{x}) = \sigma_F/\sqrt{n} = \sqrt{E_F[(x - \mu_F)^2]/n}$.

The plug-in estimate of the standard deviation is given by

$\hat\sigma = \sigma_{\hat{F}} = \left[\frac{1}{n}\sum_{i=1}^n (x_i - \bar{x})^2\right]^{1/2}$

so the plug-in estimate of the standard error of the mean is

$\hat\sigma/\sqrt{n} = \sigma_{\hat{F}}/\sqrt{n} = \left[\frac{1}{n^2}\sum_{i=1}^n (x_i - \bar{x})^2\right]^{1/2}$
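A small sketch contrasting this plug-in estimate (which divides by n) with the usual estimate sd(x)/sqrt(n) (which divides by n − 1); the data here are arbitrary:

# plug-in standard error of the mean versus the usual estimate
set.seed(1)
x = rnorm(50)
n = length(x)
sqrt(sum((x - mean(x))^2)/n^2)   # plug-in estimate: divides by n
sd(x)/sqrt(n)                    # usual estimate: divides by n - 1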


Bootstrap in Practice

Bootstrap Standard Error (revisited)

Suppose x = (x1, ..., xn)′ with $x_i \stackrel{iid}{\sim} F(x)$ for i ∈ {1, ..., n}, and we want to make inferences about some statistic θ̂ = s(x).

The estimated standard error of θ̂ is the standard deviation of $\{\hat\theta^*_b\}_{b=1}^B$:

$\hat\sigma_B = \sqrt{\frac{1}{B - 1} \sum_{b=1}^B (\hat\theta^*_b - \bar\theta^*)^2}$

where $\bar\theta^* = \frac{1}{B}\sum_{b=1}^B \hat\theta^*_b$ is the mean of the bootstrap distribution of θ̂.

As the number of bootstrap samples goes to infinity, we have

$\lim_{B \to \infty} \hat\sigma_B = SE_{\hat{F}}(\hat\theta)$

where $SE_{\hat{F}}(\hat\theta)$ is the plug-in estimate of $SE_F(\hat\theta)$.
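A small self-contained sketch illustrating this convergence (the statistic and the B values are arbitrary choices):

# bootstrap SE of the median stabilizes as B grows (illustrative)
set.seed(1)
x = rnorm(100)
for (B in c(50, 500, 5000, 50000)) {
  thetastar = replicate(B, median(sample(x, replace = TRUE)))
  cat("B =", B, " bootstrap SE =", sd(thetastar), "\n")
}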

Illustration of Bootstrap Standard Error

[Figure 6.1 from An Introduction to the Bootstrap (Efron & Tibshirani, 1993): schematic of the bootstrap algorithm for estimating the standard error of a statistic θ̂ = s(x). Each bootstrap sample is an independent random sample of size n from F̂; each yields a bootstrap replication of θ̂, and the standard deviation of the B replications estimates the standard error. B is usually between 25 and 200, and as B → ∞ the estimate approaches the plug-in estimate of SE_F̂(θ̂).]

Illustration of Bootstrap Procedure

[Figure 8.1 from An Introduction to the Bootstrap (Efron & Tibshirani, 1993): schematic diagram of the bootstrap for one-sample problems. In the real world, the unknown probability distribution F gives the data x = (x1, ..., xn)′ by random sampling, and from x we calculate the statistic of interest θ̂ = s(x). In the bootstrap world, the empirical distribution F̂ generates x* by random sampling, giving θ̂* = s(x*). There is only one observed value of θ̂, but we can generate as many bootstrap replications θ̂* as we can afford; the crucial step is constructing the estimate F̂ of the unknown population F from x.]

An R Function for Bootstrap Resampling

We can design our own bootstrap sampling function:

bootsamp <- function(x, nsamp = 10000) {
  x = as.matrix(x)
  nx = nrow(x)
  bsamp = replicate(nsamp, x[sample.int(nx, replace = TRUE), ])
}

If x is a vector of length n, then bootsamp returns an n × B matrix, where B is the number of bootstrap samples (controlled via nsamp).

If x is a matrix of order n × p, then bootsamp returns an n × p × B array, where B is the number of bootstrap samples.
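A quick sketch verifying these dimensions (toy sizes chosen for readability):

# check bootsamp return dimensions for vector and matrix input
set.seed(1)
dim(bootsamp(rnorm(10), nsamp = 5))                  # 10 x 5 matrix
dim(bootsamp(matrix(rnorm(20), 10, 2), nsamp = 5))   # 10 x 2 x 5 array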

An R Function for Bootstrap Standard Error

We can design our own bootstrap standard error function:

bootse <- function(bsamp, myfun, ...) {
  if (is.matrix(bsamp)) {
    theta = apply(bsamp, 2, myfun, ...)
  } else {
    theta = apply(bsamp, 3, myfun, ...)
  }
  if (is.matrix(theta)) {
    return(list(theta = theta, cov = cov(t(theta))))
  } else {
    return(list(theta = theta, se = sd(theta)))
  }
}

Returns a list where theta contains the bootstrap statistics $\{\hat\theta^*_b\}_{b=1}^B$, and se contains the bootstrap standard error estimate (or cov contains the bootstrap covariance matrix).

Example 1: Sample Mean

> set.seed(1)
> x = rnorm(500, mean = 1)
> bsamp = bootsamp(x)
> bse = bootse(bsamp, mean)
> mean(x)
[1] 1.022644
> sd(x)/sqrt(500)
[1] 0.04525481
> bse$se
[1] 0.04530694
> hist(bse$theta)

[Figure: Histogram of bse$theta, roughly bell-shaped and centered near 1.02.]

Example 2: Sample Median

> set.seed(1)
> x = rnorm(500, mean = 1)
> bsamp = bootsamp(x)
> bse = bootse(bsamp, median)
> median(x)
[1] 0.9632217
> bse$se
[1] 0.04299574
> hist(bse$theta)

[Figure: Histogram of bse$theta, centered near 0.96.]

Example 3: Sample Variance

> set.seed(1)
> x = rnorm(500, sd = 2)
> bsamp = bootsamp(x)
> bse = bootse(bsamp, var)
> var(x)
[1] 4.095996
> bse$se
[1] 0.2690615
> hist(bse$theta)

[Figure: Histogram of bse$theta, centered near 4.1.]

Example 4: Mean Difference

> set.seed(1)
> x = rnorm(500, mean = 3)
> y = rnorm(500)
> z = cbind(x, y)
> bsamp = bootsamp(z)
> myfun = function(z) mean(z[,1]) - mean(z[,2])
> bse = bootse(bsamp, myfun)
> myfun(z)
[1] 3.068584
> sqrt( (var(z[,1]) + var(z[,2]))/nrow(z) )
[1] 0.06545061
> bse$se
[1] 0.06765369
> hist(bse$theta)

[Figure: Histogram of bse$theta, centered near 3.07.]

Example 5: Median Difference

> set.seed(1)
> x = rnorm(500, mean = 3)
> y = rnorm(500)
> z = cbind(x, y)
> bsamp = bootsamp(z)
> myfun = function(z) median(z[,1]) - median(z[,2])
> bse = bootse(bsamp, myfun)
> myfun(z)
[1] 2.984479
> bse$se
[1] 0.07699423
> hist(bse$theta)

[Figure: Histogram of bse$theta, centered near 2.98.]

Example 6: Correlation Coefficient

> set.seed(1)
> x = rnorm(500)
> y = rnorm(500)
> Amat = matrix(c(1, -0.25, -0.25, 1), 2, 2)
> Aeig = eigen(Amat, symmetric = TRUE)
> evec = Aeig$vec
> evalsqrt = diag(Aeig$val^0.5)
> Asqrt = evec %*% evalsqrt %*% t(evec)
> z = cbind(x, y) %*% Asqrt
> bsamp = bootsamp(z)
> myfun = function(z) cor(z[,1], z[,2])
> bse = bootse(bsamp, myfun)
> myfun(z)
[1] -0.2884766
> (1 - myfun(z)^2)/sqrt(nrow(z) - 3)
[1] 0.04112326
> bse$se
[1] 0.03959024
> hist(bse$theta)

[Figure: Histogram of bse$theta, centered near -0.29.]

Example 7: Uniform[0, θ] (a bootstrap failure)

> set.seed(1)
> x = runif(500)
> bsamp = bootsamp(x)
> myfun = function(x) max(x)
> bse = bootse(bsamp, myfun)
> myfun(x)
[1] 0.9960774
> bse$se
[1] 0.001472801
> hist(bse$theta)

[Figure: Histogram of bse$theta, piled up at and just below max(x).]

F̂ is not a good estimate of F in the extreme tails, so the bootstrap performs poorly for extreme order statistics such as the maximum.
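One way to see the failure: a bootstrap sample reproduces the observed maximum exactly whenever max(x) is drawn at least once, which happens with probability $1 - (1 - 1/n)^n \approx 1 - e^{-1}$. A one-line check:

# probability that a bootstrap sample of size n = 500 contains max(x)
1 - (1 - 1/500)^500   # about 0.632, a large atom at max(x) in the bootstrap distribution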

Measuring the Quality of an Estimator

We have focused on standard error to measure the precision of θ̂. Small standard error is good, but other qualities are important too!

Consider the following toy example:
- Suppose $\{x_i\}_{i=1}^n \stackrel{iid}{\sim} (\mu, \sigma^2)$ and we want to estimate the mean of X
- Define µ̂ = 10 + x̄ to be our estimate of µ
- The standard error of µ̂ is $\sigma/\sqrt{n}$, and $\lim_{n \to \infty} \sigma/\sqrt{n} = 0$
- But µ̂ = 10 + x̄ is clearly not an ideal estimate of µ

Bias

Suppose x = (x1, ..., xn)′ with $x_i \stackrel{iid}{\sim} F(x)$ for i ∈ {1, ..., n}, and we want to make inferences about some statistic θ̂ = s(x).

The bias of an estimate θ̂ = s(x) of θ = t(F) is defined as

$\mathrm{Bias}_F = E_F[s(x)] - t(F)$

where the expectation is taken with respect to F.
- Bias is the difference between the expectation of the estimate and the parameter.
- In the example on the previous slide, we have

$\mathrm{Bias}_F = E_F(\hat\mu) - \mu = E_F(10 + \bar{x}) - \mu = 10 + E_F(\bar{x}) - \mu = 10$

Bootstrap Estimate of Bias

The bootstrap estimate of bias substitutes F̂ for F in the bias definition:

$\mathrm{Bias}_{\hat{F}} = E_{\hat{F}}[s(x^*)] - t(\hat{F})$

where the expectation is taken with respect to the ecdf F̂.
- Note that t(F̂) is the plug-in estimate of θ.
- t(F̂) is not necessarily equal to θ̂ = s(x).

Given B bootstrap samples, we can estimate bias using

$\widehat{\mathrm{Bias}}_B = \bar\theta^* - t(\hat{F})$

where $\bar\theta^* = \frac{1}{B}\sum_{b=1}^B \hat\theta^*_b$ is the mean of the bootstrap distribution of θ̂.

An R Function for Bootstrap Bias

We can design our own bootstrap bias estimation function:

bootbias <- function(bse, theta, ...) {
  if (is.matrix(bse$theta)) {
    return(apply(bse$theta, 1, mean) - theta)
  } else {
    return(mean(bse$theta) - theta)
  }
}

The first input bse is the object output from bootse, and the second input theta is the plug-in estimate of theta used for bias calculation.

Sample Mean is Unbiased

> set.seed(1)
> x = rnorm(500, mean = 1)
> bsamp = bootsamp(x)
> bse = bootse(bsamp, mean)
> mybias = bootbias(bse, mean(x))
> mybias
[1] 0.0003689287
> mean(x)
[1] 1.022644
> sd(x)/sqrt(500)
[1] 0.04525481
> bse$se
[1] 0.04530694

Toy Example

> set.seed(1)
> x = rnorm(500, mean = 1)
> bsamp = bootsamp(x)
> bse = bootse(bsamp, function(x) mean(x) + 10)
> mybias = bootbias(bse, mean(x))
> mybias
[1] 10.00037
> mean(x)
[1] 1.022644
> sd(x)/sqrt(500)
[1] 0.04525481
> bse$se
[1] 0.04530694

Mean-Squared Error (MSE)

The mean-squared error (MSE) of an estimate θ̂ = s(x) of θ = t(F) is

$\mathrm{MSE}_F = E_F\{[s(x) - t(F)]^2\} = V_F(\hat\theta) + [\mathrm{Bias}_F(\hat\theta)]^2$

where the expectation is taken with respect to F.
- MSE is the expected squared difference between θ̂ and θ.
- In the toy example on the previous slide, we have

$\mathrm{MSE}_F = E_F\{(\hat\mu - \mu)^2\} = E_F\{(10 + \bar{x} - \mu)^2\} = E_F(10^2) + 2 E_F[10(\bar{x} - \mu)] + E_F[(\bar{x} - \mu)^2] = 100 + \sigma^2/n$

Toy Example (revisited)

> set.seed(1)
> x = rnorm(500, mean = 1)
> bsamp = bootsamp(x)
> bse = bootse(bsamp, function(x) mean(x) + 10)
> mybias = bootbias(bse, mean(x))
> c(bse$se, mybias)
[1] 0.04530694 10.00036893
> c(bse$se, mybias)^2
[1] 2.052718e-03 1.000074e+02
> mse = (bse$se^2) + (mybias^2)
> mse
[1] 100.0094
> 100 + 1/length(x)
[1] 100.002

Balance between Bias and Variance

MSE quantifies both the accuracy (bias) and precision (variance).

Ideal estimators are accurate (small bias) and precise (small variance).

Having some bias can be an ok (or even good) thing, despite the negative connotations of the word “biased”.

For example:
Q: Would you rather have an estimator that is biased by 1 unit with a standard error of 1 unit, or one that is unbiased but has a standard error of 1.5 units?
A: The first estimator is better with respect to MSE: 1² + 1² = 2 versus 0² + 1.5² = 2.25.

Accuracy and Precision Visualization

[Figure: Four scatterplots of estimates (points) around a true target value, showing the four combinations of low/high accuracy with low/high precision; legend: Truth, Estimates.]

Jackknife Sample

Before the bootstrap, the jackknife was used to estimate bias and SE.

The i-th jackknife sample of x = (x1,..., xn) is defined as

$x_{(i)} = (x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n)$

for i ∈ {1, ..., n}. Note that:

- $x_{(i)}$ is the original data vector without the i-th observation
- $x_{(i)}$ is a vector of length n − 1

Jackknife Replication

The i-th jackknife replication $\hat\theta_{(i)}$ of the statistic θ̂ = s(x) is

$\hat\theta_{(i)} = s(x_{(i)})$

which is the statistic calculated using the i-th jackknife sample.

For plug-in statistics θ̂ = t(F̂), we have

$\hat\theta_{(i)} = t(\hat{F}_{(i)})$

where $\hat{F}_{(i)}$ is the empirical distribution of $x_{(i)}$.

Jackknife Estimate of Standard Error

The jackknife estimate of the standard error is defined as

$\hat\sigma_{jack} = \sqrt{\frac{n-1}{n} \sum_{i=1}^n (\hat\theta_{(i)} - \hat\theta_{(\cdot)})^2}$

where $\hat\theta_{(\cdot)} = \frac{1}{n}\sum_{i=1}^n \hat\theta_{(i)}$ is the mean of the jackknife estimates of θ̂.

Note that the (n − 1)/n factor is derived by considering the special case θ̂ = x̄:

$\hat\sigma_{jack} = \sqrt{\frac{1}{(n-1)\,n} \sum_{i=1}^n (x_i - \bar{x})^2}$

which is an unbiased estimator of the standard error of x̄.

Jackknife Estimate of Bias

The jackknife estimate of bias is defined as

$\widehat{\mathrm{Bias}}_{jack} = (n - 1)(\hat\theta_{(\cdot)} - \hat\theta)$

where $\hat\theta_{(\cdot)} = \frac{1}{n}\sum_{i=1}^n \hat\theta_{(i)}$ is the mean of the jackknife estimates of θ̂.

This approach only works for plug-in statistics θ̂ = t(F̂).
- Only works if t(F̂) is smooth (e.g., mean or ratio)
- Doesn't work if t(F̂) is unsmooth (e.g., median)
- Gives a bias estimate using only n recomputations (typically n ≪ B)

Smooth versus Unsmooth t(F̂)

Suppose we have a sample of data (x1,..., xn), and consider the mean and median as a function of x1:

[Figure: The sample mean (left panel) and sample median (right panel) of (x1, ..., xn) plotted as functions of x1; the mean varies smoothly in x1, while the median is a step function of x1.]

Smooth versus Unsmooth t(F̂): R Code

# mean is smooth
meanfun <- function(x, z) mean(c(x, z))
set.seed(1)
z = rnorm(100)
x = seq(-4, 4, length = 200)
meanval = rep(0, 200)
for (j in 1:200) meanval[j] = meanfun(x[j], z)
quartz(width = 6, height = 6)
plot(x, meanval, xlab = expression(x[1]), main = "mean")

# median is unsmooth
medfun <- function(x, z) median(c(x, z))
set.seed(1)
z = rnorm(100)
x = seq(-4, 4, length = 200)
medval = rep(0, 200)
for (j in 1:200) medval[j] = medfun(x[j], z)
quartz(width = 6, height = 6)
plot(x, medval, xlab = expression(x[1]), main = "median")

Some R Functions for the Jackknife

We can design our own jackknife functions:

jacksamp <- function(x) {
  nx = length(x)
  jsamp = matrix(0, nx - 1, nx)
  for (j in 1:nx) jsamp[,j] = x[-j]
  jsamp
}

jackse <- function(jsamp, myfun, ...) {
  nx = ncol(jsamp)
  theta = apply(jsamp, 2, myfun, ...)
  se = sqrt( ((nx-1)/nx) * sum( (theta - mean(theta))^2 ) )
  list(theta = theta, se = se)
}

These functions work similarly to the bootsamp and bootse functions when x is a vector and the statistic produced by myfun is unidimensional.
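The notes do not include a jackknife bias function, but a minimal sketch following the bias formula from the previous slides might look like this (jackbias is a hypothetical name, not from the notes):

# jackknife bias: (n - 1) * (mean of jackknife replications - plug-in estimate)
jackbias <- function(jsamp, myfun, theta, ...) {
  nx = ncol(jsamp)                          # number of jackknife samples (n)
  thetajack = apply(jsamp, 2, myfun, ...)   # jackknife replications
  (nx - 1) * (mean(thetajack) - theta)
}
# e.g., jackbias(jacksamp(x), mean, mean(x)) is exactly 0 for the sample mean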

Example 1: Sample Mean (revisited)

> set.seed(1)
> x = rnorm(500, mean = 1)
> jsamp = jacksamp(x)
> jse = jackse(jsamp, mean)
> mean(x)
[1] 1.022644
> sd(x)/sqrt(500)
[1] 0.04525481
> jse$se
[1] 0.04525481
> hist(jse$theta)

[Figure: Histogram of jse$theta, roughly bell-shaped around 1.022.]

Example 2: Sample Median (revisited)

> set.seed(1)
> x = rnorm(500, mean = 1)
> jsamp = jacksamp(x)
> jse = jackse(jsamp, median)
> median(x)
[1] 0.9632217
> jse$se
[1] 0.01911879
> hist(jse$theta)

[Figure: Histogram of jse$theta; the jackknife replications of the median take only a few distinct values.]

Note that this jackknife estimate (0.0191) is far from the bootstrap estimate obtained earlier (0.0430), illustrating the jackknife's unreliability for unsmooth statistics such as the median.


Bootstrapping Regression

Simple Linear Regression Model: Scalar Form

The simple linear regression model has the form

$y_i = b_0 + b_1 x_i + e_i$

for i ∈ {1,..., n} where

- $y_i \in \mathbb{R}$ is the real-valued response for the i-th observation
- $b_0 \in \mathbb{R}$ is the regression intercept
- $b_1 \in \mathbb{R}$ is the regression slope
- $x_i \in \mathbb{R}$ is the predictor for the i-th observation
- $e_i \stackrel{iid}{\sim} (0, \sigma^2)$ is zero-mean measurement error

This implies that $(y_i | x_i) \stackrel{ind}{\sim} (b_0 + b_1 x_i, \sigma^2)$.

Simple Linear Regression Model: Matrix Form

The simple linear regression model has the form

y = Xb + e

where
- $y = (y_1, \ldots, y_n)' \in \mathbb{R}^n$ is the n × 1 response vector
- $X = [1_n, x] \in \mathbb{R}^{n \times 2}$ is the n × 2 design matrix
  - $1_n$ is an n × 1 vector of ones
  - $x = (x_1, \ldots, x_n)' \in \mathbb{R}^n$ is the n × 1 predictor vector
- $b = (b_0, b_1)' \in \mathbb{R}^2$ is the 2 × 1 vector of regression coefficients
- $e = (e_1, \ldots, e_n)' \sim (0_n, \sigma^2 I_n)$ is the n × 1 error vector

This implies that $(y | x) \sim (Xb, \sigma^2 I_n)$.

Ordinary Least Squares: Scalar Form

The ordinary least squares (OLS) problem is

$\min_{b_0, b_1 \in \mathbb{R}} \sum_{i=1}^n (y_i - b_0 - b_1 x_i)^2$

and the OLS solution has the form

$\hat{b}_0 = \bar{y} - \hat{b}_1 \bar{x}$
$\hat{b}_1 = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2}$

where $\bar{x} = (1/n)\sum_{i=1}^n x_i$ and $\bar{y} = (1/n)\sum_{i=1}^n y_i$.

Ordinary Least Squares: Matrix Form

The ordinary least squares (OLS) problem is

$\min_{b \in \mathbb{R}^2} \|y - Xb\|^2$

where ‖·‖ denotes the Frobenius norm; the OLS solution has the form

$\hat{b} = (X'X)^{-1} X'y$

where

$(X'X)^{-1} = \frac{1}{n \sum_{i=1}^n (x_i - \bar{x})^2} \begin{pmatrix} \sum_{i=1}^n x_i^2 & -\sum_{i=1}^n x_i \\ -\sum_{i=1}^n x_i & n \end{pmatrix}$

$X'y = \begin{pmatrix} \sum_{i=1}^n y_i \\ \sum_{i=1}^n x_i y_i \end{pmatrix}$
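A quick sketch verifying the matrix formula against R's lm function (the simulated data are arbitrary):

# compute (X'X)^{-1} X'y directly and compare with lm()
set.seed(1)
x = rexp(100)
y = 3 + 2*x + rnorm(100)
X = cbind(1, x)
solve(crossprod(X), crossprod(X, y))   # matrix-form OLS solution
coef(lm(y ~ x))                        # same coefficients from lm()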

OLS Coefficients are Random Variables

Note that b̂ is a linear function of y, so we can derive the following.

The expectation of b̂ is given by

$E(\hat{b}) = E[(X'X)^{-1}X'y] = E[(X'X)^{-1}X'(Xb + e)] = E[b] + (X'X)^{-1}X'E[e] = b$

and the covariance matrix is given by

$V(\hat{b}) = V[(X'X)^{-1}X'y] = (X'X)^{-1}X'V[y]X(X'X)^{-1} = (X'X)^{-1}X'(\sigma^2 I_n)X(X'X)^{-1} = \sigma^2 (X'X)^{-1}$

Fitted Values are Random Variables

Similarly, ŷ = Xb̂ is a linear function of y, so we can derive the following.

The expectation of ŷ is given by

$E(\hat{y}) = E[X(X'X)^{-1}X'y] = E[X(X'X)^{-1}X'(Xb + e)] = E[Xb] + X(X'X)^{-1}X'E[e] = Xb$

and the covariance matrix is given by

$V(\hat{y}) = V[X(X'X)^{-1}X'y] = X(X'X)^{-1}X'V[y]X(X'X)^{-1}X' = X(X'X)^{-1}X'(\sigma^2 I_n)X(X'X)^{-1}X' = \sigma^2 X(X'X)^{-1}X'$

Need for the Bootstrap

If the residuals are Gaussian, $e_i \stackrel{iid}{\sim} N(0, \sigma^2)$, then we have
- $\hat{b} \sim N(b, \sigma^2 (X'X)^{-1})$
- $\hat{y} \sim N(Xb, \sigma^2 X(X'X)^{-1}X')$
so it is possible to make probabilistic statements about b̂ and ŷ.

If $e_i \stackrel{iid}{\sim} F$ for some arbitrary distribution F with $E_F(e_i) = 0$, we can use the bootstrap to make inferences about b̂ and ŷ: use the bootstrap with the ecdf F̂ as the distribution of the ei.

Bootstrapping Regression Residuals

We can use the following bootstrap procedure (a minimal Monte Carlo sketch appears at the end of this slide):
1. Fit the regression model to obtain ŷ and ê = y − ŷ
2. Sample $e_i^*$ with replacement from $\{\hat{e}_1, \ldots, \hat{e}_n\}$ for i ∈ {1, ..., n}
3. Define $y_i^* = \hat{y}_i + e_i^*$ and $\hat{b}^* = (X'X)^{-1}X'y^*$
4. Repeat steps 2–3 a total of B times to get the bootstrap distribution of b̂

We don't need Monte Carlo simulation to get bootstrap standard errors:

$V(\hat{b}^*) = (X'X)^{-1}X'V(y^*)X(X'X)^{-1} = \hat\sigma_F^2 (X'X)^{-1}$

given that $V(y^*) = \hat\sigma_F^2 I_n$, where $\hat\sigma_F^2 = \frac{1}{n}\sum_{i=1}^n \hat{e}_i^2$.
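As a minimal Monte Carlo sketch of steps 1–4 above (illustrative data; the session on the next slide instead uses the bootsamp and bootse functions):

# bootstrapping residuals with an explicit loop (illustrative sketch)
set.seed(1)
x = rexp(100)
y = 3 + 2*x + runif(100, min = -2, max = 2)
fit = lm(y ~ x)
B = 2000
bcoef = replicate(B, {
  ystar = fitted(fit) + sample(resid(fit), replace = TRUE)   # steps 2-3
  coef(lm(ystar ~ x))
})
apply(bcoef, 1, sd)   # bootstrap standard errors of the two coefficients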

Bootstrapping Regression Residuals in R

> set.seed(1)
> n = 500
> x = rexp(n)
> e = runif(n, min = -2, max = 2)
> y = 3 + 2*x + e
> linmod = lm(y ~ x)
> linmod$coef
(Intercept)           x
   2.898325    2.004220
> yhat = linmod$fitted.values
> bsamp = bootsamp(linmod$residuals)
> bsamp = matrix(yhat, n, ncol(bsamp)) + bsamp
> myfun = function(y, x) lm(y ~ x)$coef
> bse = bootse(bsamp, myfun, x = x)
> bse$cov
             (Intercept)            x
(Intercept)  0.006105788 -0.003438571
x           -0.003438571  0.003605852
> sigsq = mean(linmod$residuals^2)
> solve(crossprod(cbind(1, x))) * sigsq
                          x
   0.006112136 -0.003412774
x -0.003412774  0.003573180
> par(mfcol = c(2, 1))
> hist(bse$theta[1,], main = expression(hat(b)[0]))
> hist(bse$theta[2,], main = expression(hat(b)[1]))

[Figure: Histograms of the bootstrap distributions of b̂0 (top, centered near 2.9) and b̂1 (bottom, centered near 2.0).]

Bootstrapping Pairs Instead of Residuals

We could also use the following bootstrap procedure:
1. Fit the regression model to obtain ŷ and ê = y − ŷ
2. Sample $z_i^* = (x_i^*, y_i^*)$ with replacement from {(x1, y1), ..., (xn, yn)} for i ∈ {1, ..., n}
3. Define $x^* = (x_1^*, \ldots, x_n^*)'$, $X_* = [1_n, x^*]$, $y^* = (y_1^*, \ldots, y_n^*)'$, and $\hat{b}^* = (X_*'X_*)^{-1}X_*'y^*$
4. Repeat steps 2–3 a total of B times to get the bootstrap distribution of b̂

Bootstrapping pairs only assumes (xi , yi ) are iid from some F.

Bootstrapping Regression Pairs in R

> set.seed(1)
> n = 500
> x = rexp(n)
> e = runif(n, min = -2, max = 2)
> y = 3 + 2*x + e
> linmod = lm(y ~ x)
> linmod$coef
(Intercept)           x
   2.898325    2.004220
> z = cbind(y, x)
> bsamp = bootsamp(z)
> myfun = function(z) lm(z[,1] ~ z[,2])$coef
> bse = bootse(bsamp, myfun)
> bse$cov
             (Intercept)       z[, 2]
(Intercept)  0.006376993 -0.003913989
z[, 2]      -0.003913989  0.004308720
> sigsq = mean(linmod$residuals^2)
> solve(crossprod(cbind(1, x))) * sigsq
                          x
   0.006112136 -0.003412774
x -0.003412774  0.003573180
> par(mfcol = c(2, 1))
> hist(bse$theta[1,], main = expression(hat(b)[0]))
> hist(bse$theta[2,], main = expression(hat(b)[1]))

[Figure: Histograms of the bootstrap distributions of b̂0 (top, centered near 2.9) and b̂1 (bottom, centered near 2.0).]

Bootstrapping Regression: Pairs or Residuals?

Bootstrapping pairs requires fewer assumptions about the data

- Bootstrapping pairs only assumes (xi, yi) are iid from some F.
- Bootstrapping residuals assumes $(y_i | x_i) \stackrel{ind}{\sim} (b_0 + b_1 x_i, \sigma^2)$.

Bootstrapping pairs can be dangerous when working with categorical predictors and/or continuous predictors with skewed distributions

Bootstrapping residuals is preferable when the regression model is reasonably specified (because X remains unchanged across bootstrap samples).
