Regression III Lecture 5: Resampling Techniques
Dave Armstrong
University of Western Ontario
Department of Political Science
Department of Statistics and Actuarial Science (by courtesy)
e: [email protected]
w: www.quantoid.net/teachicpsr/regression3/

Goals of the Lecture
• Introduce the general idea of resampling: techniques that draw new samples from the original data.
  • Bootstrapping
  • Cross-validation
• An extended example of bootstrapping local polynomial regression models.

Resampling: An Overview
• Resampling techniques sample from the original dataset.
• Some of the applications of these methods are:
  • computing standard errors and confidence intervals (when we have small sample sizes, dubious distributional assumptions, or a statistic that does not have an easily derivable asymptotic standard error);
  • subset selection in regression;
  • handling missing data;
  • selecting the degrees of freedom in nonparametric regression (especially GAMs).
• For the most part, this lecture will discuss resampling techniques in the context of computing confidence intervals and hypothesis tests for regression analysis.

Resampling and Regression: A Caution
• There is no need whatsoever for bootstrapping in regression analysis if the OLS assumptions are met. In such cases, OLS estimates are unbiased and maximally efficient.
• There are situations, however, where we cannot satisfy the assumptions and thus other methods are more helpful:
  • Robust regression (such as MM-estimation) often provides better estimates than OLS in the presence of influential cases, but only has reliable SEs asymptotically.
  • Local polynomial regression is often "better" (in the RSS sense) in the presence of non-linearity, but because of the unknown df, it only has an approximate sampling distribution.
  • Cross-validation is particularly helpful for validating models and choosing model-fitting parameters (a sketch follows this slide).
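To make the cross-validation idea concrete, here is a minimal sketch of k-fold cross-validation used to choose a model-fitting parameter. It is not from the lecture materials: the data are simulated, and the polynomial degree stands in for any tuning parameter (e.g., the span or df of a local polynomial fit).

```python
import numpy as np

# Minimal k-fold cross-validation sketch (illustrative): pick the polynomial
# degree that minimizes out-of-fold prediction error on simulated data.
rng = np.random.default_rng(42)
n = 100
x = rng.uniform(-2, 2, n)
y = np.sin(2 * x) + rng.normal(0, 0.3, n)   # non-linear truth plus noise

def cv_mse(x, y, degree, k=5):
    """Mean squared out-of-fold error for a degree-`degree` polynomial."""
    idx = rng.permutation(len(x))
    errors = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        coefs = np.polyfit(x[train], y[train], degree)  # fit on k-1 folds
        pred = np.polyval(coefs, x[fold])               # predict held-out fold
        errors.append(np.mean((y[fold] - pred) ** 2))
    return np.mean(errors)

for d in range(1, 8):
    print(f"degree {d}: CV MSE = {cv_mse(x, y, d):.4f}")
```

The degree with the smallest CV error is the one we would choose; the same logic applies to any fitting parameter we cannot select by in-sample fit alone.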
Bootstrapping: General Overview
• If we assume that a random variable X or a statistic has a particular population value, we can study how a statistical estimator computed from samples behaves.
• We don't always know, however, how a variable or statistic is distributed in the population.
• For example, there may be a statistic for which standard errors have not been formulated (e.g., imagine we wanted to test whether two additive scales have significantly different levels of internal consistency; Cronbach's α doesn't have an exact sampling distribution).
• Another example is the impact of missing data on a distribution: we don't know how the missing data differ from the observed data.
• Bootstrapping is a technique for estimating standard errors and confidence intervals (sets) without making assumptions about the distributions that give rise to the data.

Bootstrapping: General Overview (2)
• Assume that we have a sample of size n for which we require more reliable standard errors for our estimates.
• Perhaps n is small, or alternatively, we have a statistic for which there is no known sampling distribution.
• The bootstrap provides one "solution":
  • Take several new samples from the original sample, calculating the statistic each time.
  • Calculate the average and standard error (and maybe quantiles) from the empirical distribution of the bootstrap samples (see the sketch below).
• In other words, we find a standard error based on sampling (with replacement) from the original data.
• We apply principles of inference similar to those employed when sampling from the population: the population is to the sample as the sample is to the bootstrap samples.
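A minimal sketch of the resample-recompute-summarize procedure just described (illustrative, not the lecture's code; the data are made up and only numpy is assumed):

```python
import numpy as np

# Nonparametric bootstrap: resample with replacement R times, recompute the
# statistic each time, and summarize the empirical distribution of replicates.
rng = np.random.default_rng(1)

def bootstrap(data, statistic, R=2000):
    n = len(data)
    reps = np.empty(R)
    for b in range(R):
        resample = rng.choice(data, size=n, replace=True)  # sample WITH replacement
        reps[b] = statistic(resample)
    return reps

data = rng.normal(10, 3, size=25)           # hypothetical small sample
reps = bootstrap(data, np.mean)
print("bootstrap mean:", reps.mean())
print("bootstrap SE:  ", reps.std(ddof=1))  # SD of replicates = bootstrap SE
print("2.5% / 97.5% quantiles:", np.quantile(reps, [0.025, 0.975]))
```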
Bootstrapping: General Overview (3)
• There are several variants of the bootstrap. The two we are most interested in are:
  1. Nonparametric bootstrap
    • No underlying population distribution is assumed.
    • Most commonly used method.
  2. Parametric bootstrap
    • Assumes that the statistic has a particular parametric form (e.g., normal).

Bootstrapping the Mean
• Imagine, unrealistically, that we are interested in finding the confidence interval for the mean of a sample of only 4 observations.
• Specifically, assume that we are interested in the difference in income between husbands and wives.
• We have four cases, with the following mean differences (in $1000s): 6, -3, 5, 3, for a mean of 2.75 and a standard deviation of 4.031.
• From classical theory, we can calculate the standard error:

SE = \frac{S_x}{\sqrt{n}} = \frac{4.031129}{\sqrt{4}} = 2.015564

• Now we'll compare this confidence interval to the one calculated using bootstrapping.

Defining the Random Variable
• The first thing that bootstrapping does is estimate the population distribution of Y from the four observations in the sample.
• In other words, the random variable Y* is defined:

  Y*    p*(Y*)
   6     0.25
  -3     0.25
   5     0.25
   3     0.25

• The mean of Y* is then simply the mean of the sample:

E^*(Y^*) = \sum Y^* \, p^*(Y^*) = 2.75 = \bar{Y}

The Sample as the Population (1)
• We now treat the sample as if it were the population, and resample from it.
• In this case, we take all possible samples with replacement, meaning that we take n^n = 4^4 = 256 different samples (see the enumeration sketch below).
• Since we found all possible samples, the mean of these samples is simply the original mean.
• We then determine the standard error of \bar{Y} from these samples:

SE^*(\bar{Y}^*) = \sqrt{\frac{\sum_{b=1}^{n^n} (\bar{Y}^*_b - \bar{Y})^2}{n^n}} = 1.74553

• We now adjust for the sample size:

\widehat{SE}(\bar{Y}) = \sqrt{\frac{n}{n-1}} \; SE^*(\bar{Y}^*) = \sqrt{\frac{4}{3}} \times 1.74553 = 2.015564

The Sample as the Population (2)
• In this example, because we used all possible resamples of our sample, the bootstrap standard error (2.015564) is exactly the same as the original standard error.
• This approach can be used for statistics for which we do not have standard error formulas, or when we have small sample sizes.
• In summary, the following analogies can be made to sampling from the population:
  • Bootstrap observations → original observations
  • Bootstrap mean → original sample mean
  • Original sample mean → unknown population mean μ
  • Distribution of the bootstrap means → unknown sampling distribution of the mean from the original sample
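The worked example is small enough to check by brute force. This sketch (assuming numpy; not from the lecture materials) enumerates all 4^4 = 256 resamples and reproduces the numbers above:

```python
import numpy as np
from itertools import product

# Exhaustive bootstrap of the mean for the four income differences (in $1000s).
# With n = 4 we can enumerate all n^n = 256 equally likely resamples.
y = np.array([6, -3, 5, 3])
n = len(y)

boot_means = np.array([np.mean(s) for s in product(y, repeat=n)])

print(len(boot_means))                  # 256 resamples
print(boot_means.mean())                # 2.75, the original sample mean
se_star = np.sqrt(np.mean((boot_means - y.mean()) ** 2))
print(se_star)                          # 1.74553, the raw bootstrap SE
print(np.sqrt(n / (n - 1)) * se_star)   # 2.015564, matches the classical SE
```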
Characteristics of the Bootstrap Statistic
• The bootstrap sampling distribution around the original estimate of the statistic T is analogous to the sampling distribution of T around the population parameter θ.
• The average of the bootstrapped statistics is simply:

\bar{T}^* = E^*(T^*) \approx \frac{\sum_{b=1}^{R} T^*_b}{R}

where R is the number of bootstrap samples.
• The bias of T can be seen as its deviation from the bootstrap average (i.e., it estimates T − θ):

\hat{B}^* = \bar{T}^* - T

• The estimated bootstrap variance of T* is:

\hat{V}^*(T^*) = \frac{\sum_{b=1}^{R} (T^*_b - \bar{T}^*)^2}{R - 1}

Bootstrapping with Larger Samples
• The larger the sample, the more effort it is to calculate the bootstrap estimates.
• With large sample sizes, the possible number of bootstrap samples n^n gets very large and impractical (e.g., it would take a long time to calculate 1000^1000 bootstrap samples).
• Typically we want to take somewhere between 1000 and 2000 bootstrap samples in order to find a confidence interval for a statistic.
• After calculating the standard error, we can easily find the confidence interval. Three methods are commonly used:
  1. Normal theory intervals
  2. Percentile intervals
  3. Bias-corrected percentile intervals

Evaluating Confidence Intervals
• Accuracy: how quickly do coverage errors go to zero?
  • We want Prob\{\theta < \hat{T}_{lo}\} = \alpha and Prob\{\theta > \hat{T}_{up}\} = \alpha.
  • Errors go to zero at a rate of 1/n (second-order accurate) or 1/\sqrt{n} (first-order accurate).
• Transformation respecting: for any monotone transformation of θ, \varphi = m(\theta), can we obtain the right confidence interval on \hat{\varphi} from the confidence interval on \hat{\theta} mapped by m()? E.g.,

[\hat{\varphi}_{lo}, \hat{\varphi}_{up}] = [m(\hat{\theta}_{lo}), m(\hat{\theta}_{up})]

Bootstrap Confidence Intervals: Normal Theory Intervals
• Many statistics are asymptotically normally distributed.
• Therefore, in large samples, we may be able to use a normality assumption to characterize the bootstrap distribution, e.g.,

\hat{T}^* \sim N(\hat{T}, \hat{se}), \quad \text{where } \hat{se} = \sqrt{\hat{V}^*(T^*)}

so the interval with α in each tail is \hat{T} \pm z^{(1-\alpha)} \, \hat{se}.
• This approach works well for the bootstrap confidence interval, but only if the bootstrap sampling distribution is approximately normally distributed.
• In other words, it is important to look at the distribution before relying on the normal theory interval.

Bootstrap Confidence Intervals: Percentile Intervals
• Uses percentiles of the bootstrap sampling distribution to find the end-points of the confidence interval.
• If \hat{G} is the CDF of T*, then we can find the 100(1 − 2α)% confidence interval with:

[\hat{T}_{\%,lo}, \hat{T}_{\%,up}] = [\hat{G}^{-1}(\alpha), \hat{G}^{-1}(1-\alpha)]

• The (1 − 2α) percentile interval can be approximated with:

[\hat{T}_{\%,lo}, \hat{T}_{\%,up}] \approx [T^{*(\alpha)}_B, T^{*(1-\alpha)}_B]

where T^{*(\alpha)}_B and T^{*(1-\alpha)}_B are the ordered (B) bootstrap replicates such that 100α% of them fall below the former and 100α% fall above the latter.
• These intervals do not assume a normal distribution, but they do not perform well unless we have a large original sample and at least 1000 bootstrap samples.

Bootstrap Confidence Intervals: Bias-Corrected, Accelerated (BCa) Percentile Intervals
• The BCa CI adjusts the confidence intervals for bias due to small samples by employing a normalizing transformation through two correction factors.
• This is also a percentile interval, but the percentiles are not necessarily the ones you would think.
• Using strict percentile intervals, [\hat{T}_{lo}, \hat{T}_{up}] \approx [T^{*(\alpha)}_B, T^{*(1-\alpha)}_B].
• Here, [\hat{T}_{lo}, \hat{T}_{up}] \approx [T^{*(\alpha_1)}_B, T^{*(\alpha_2)}_B], where \alpha_1 \neq \alpha and \alpha_2 \neq 1 - \alpha_1.

Bias correction: \hat{z}_0
• The bias-correction factor is

\hat{z}_0 = \Phi^{-1}\left( \frac{\#\{T^*_b < T\}}{B} \right)

• This just gives the inverse of the normal CDF for the proportion of bootstrap replicates less than T.
• Note that if \#\{T^*_b < T\}/B = 0.5, then \hat{z}_0 = 0.
• If T is unbiased, the proportion will be close to 0.5, meaning that the correction is close to 0.
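To tie the interval types together, here is a hedged sketch of a percentile interval and the bias-correction factor \hat{z}_0. It sets the acceleration constant to 0 (so this is a bias-corrected interval, not the full BCa construction on the slides), uses simulated data, and assumes numpy and scipy are available:

```python
import numpy as np
from scipy.stats import norm

# Percentile interval and the BC bias-correction factor z0 for a bootstrapped
# mean (acceleration a = 0; illustrative only).
rng = np.random.default_rng(7)
data = rng.exponential(scale=2.0, size=30)   # hypothetical skewed sample
T = data.mean()                               # original statistic

B = 2000
reps = np.array([rng.choice(data, size=len(data), replace=True).mean()
                 for _ in range(B)])

alpha = 0.025
# Percentile interval: read off the alpha and 1 - alpha quantiles of the
# ordered replicates.
lo, up = np.quantile(reps, [alpha, 1 - alpha])
print(f"95% percentile interval: [{lo:.3f}, {up:.3f}]")

# Bias correction: z0 = Phi^{-1}( #{T*_b < T} / B ); 0 means no correction.
z0 = norm.ppf(np.mean(reps < T))
print(f"z0 = {z0:.4f}")

# Adjusted percentiles alpha1, alpha2 (with a = 0): note they differ from
# alpha and 1 - alpha whenever z0 != 0.
z_alpha = norm.ppf(alpha)
a1 = norm.cdf(2 * z0 + z_alpha)
a2 = norm.cdf(2 * z0 - z_alpha)
lo_bc, up_bc = np.quantile(reps, [a1, a2])
print(f"alpha1 = {a1:.4f}, alpha2 = {a2:.4f}")
print(f"95% bias-corrected interval: [{lo_bc:.3f}, {up_bc:.3f}]")
```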