Bootstrap Resampling


Nathaniel E. Helwig
Assistant Professor of Psychology and Statistics
University of Minnesota (Twin Cities)
Updated 04-Jan-2017

Copyright © 2017 by Nathaniel E. Helwig

Outline of Notes

1) Background Information
   - Statistical inference
   - Sampling distributions
   - Need for resampling

2) Bootstrap Basics
   - Overview
   - Empirical distribution
   - Plug-in principle

3) Bootstrap in Practice
   - Bootstrap in R
   - Bias and mean-squared error
   - The Jackknife

4) Bootstrapping Regression
   - Regression review
   - Bootstrapping residuals
   - Bootstrapping pairs

For a thorough treatment see: Efron, B. & Tibshirani, R.J. (1993). An Introduction to the Bootstrap. Chapman & Hall/CRC.

Background Information

Statistical Inference: The Classic Statistical Paradigm

$X$ is some random variable, e.g., age in years. $\mathcal{X} = \{x_1, x_2, x_3, \ldots\}$ is some population of interest, e.g., the ages of all students at the University of Minnesota, or the ages of all people in the state of Minnesota.

At the population level:
- $F(x) = P(X \leq x)$ for all $x \in \mathcal{X}$ is the population CDF.
- $\theta = t(F)$ is the population parameter, where $t$ is some function of $F$.

At the sample level:
- $\mathbf{x} = (x_1, \ldots, x_n)'$ is a sample of data with $x_i \overset{\mathrm{iid}}{\sim} F$ for $i \in \{1, \ldots, n\}$.
- $\hat{\theta} = s(\mathbf{x})$ is the sample statistic, where $s$ is some function of $\mathbf{x}$.

Note that $\hat{\theta}$ is a random variable that depends on $\mathbf{x}$ (and thus on $F$). The sampling distribution of $\hat{\theta}$ refers to the CDF (or PDF) of $\hat{\theta}$. If $F$ is known (or assumed to be known), then the sampling distribution of $\hat{\theta}$ may have some known form: if $x_i \overset{\mathrm{iid}}{\sim} N(\mu, \sigma^2)$, then $\bar{x} \sim N(\mu, \sigma^2/n)$, where $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$. In this example, $\theta \equiv \mu$ and $\hat{\theta} \equiv \bar{x}$. How can we make inferences about $\theta$ using $\hat{\theta}$ when $F$ is unknown?

Sampling Distributions: The Hypothetical Ideal

Assume that the population $\mathcal{X}$ is too large to measure all of its members. If we had a really LARGE research budget, we could collect $B$ independent samples from $\mathcal{X}$:
- $\mathbf{x}_j = (x_{1j}, \ldots, x_{nj})'$ is the $j$-th sample, with $x_{ij} \overset{\mathrm{iid}}{\sim} F$.
- $\hat{\theta}_j = s(\mathbf{x}_j)$ is the statistic (parameter estimate) for the $j$-th sample.

The sampling distribution of $\hat{\theta}$ can then be estimated via the distribution of $\{\hat{\theta}_j\}_{j=1}^{B}$, as formalized below.

The Hypothetical Ideal: Example 1 (Normal Mean)

Sampling distribution of $\bar{x}$ with $x_i \overset{\mathrm{iid}}{\sim} N(0, 1)$ for $n = 100$:

[Figure: six histograms of $\bar{x}$ for B = 200, 500, 1000, 2000, 5000, and 10000, each with the exact $N(0, 1/n)$ pdf of $\bar{x}$ overlaid; x-axis: xbar, y-axis: Density.]
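The phrase "estimated via the distribution of $\{\hat{\theta}_j\}_{j=1}^{B}$" can be made precise: the Monte Carlo estimate of the sampling CDF is simply the empirical CDF of the $B$ replicate statistics. This identity is implicit in, not printed on, the slides, and is spelled out here for clarity:

$$\hat{G}_B(t) \;=\; \frac{1}{B} \sum_{j=1}^{B} \mathbb{1}\{\hat{\theta}_j \leq t\} \;\longrightarrow\; P(\hat{\theta} \leq t) \quad \text{as } B \to \infty,$$

by the law of large numbers applied to the indicator variables. This is exactly what the histograms above are displaying.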
The Hypothetical Ideal: Example 1 R Code

```r
# hypothetical ideal: example 1 (normal mean)
set.seed(1)
n = 100
B = c(200, 500, 1000, 2000, 5000, 10000)
xseq = seq(-0.4, 0.4, length = 200)
quartz(width = 12, height = 8)  # macOS graphics device; use dev.new() on other platforms
par(mfrow = c(2, 3))
for (k in 1:6) {
  X = replicate(B[k], rnorm(n))   # n x B[k] matrix: B[k] samples of size n
  xbar = apply(X, 2, mean)        # sample mean of each column
  hist(xbar, freq = FALSE, xlim = c(-0.4, 0.4), ylim = c(0, 5),
       main = paste("Sampling Distribution: B =", B[k]))
  lines(xseq, dnorm(xseq, sd = 1/sqrt(n)))  # exact N(0, 1/n) pdf of xbar
  legend("topright", expression(bar(x)*" pdf"), lty = 1, bty = "n")
}
```

The Hypothetical Ideal: Example 2 (Normal Median)

Sampling distribution of $\mathrm{median}(\mathbf{x})$ with $x_i \overset{\mathrm{iid}}{\sim} N(0, 1)$ for $n = 100$:

[Figure: six histograms of the sample median for B = 200, 500, 1000, 2000, 5000, and 10000, each with the $N(0, 1/n)$ pdf of $\bar{x}$ overlaid for comparison; x-axis: xmed, y-axis: Density.]

The Hypothetical Ideal: Example 2 R Code

```r
# hypothetical ideal: example 2 (normal median)
set.seed(1)
n = 100
B = c(200, 500, 1000, 2000, 5000, 10000)
xseq = seq(-0.4, 0.4, length = 200)
quartz(width = 12, height = 8)  # macOS graphics device; use dev.new() on other platforms
par(mfrow = c(2, 3))
for (k in 1:6) {
  X = replicate(B[k], rnorm(n))
  xmed = apply(X, 2, median)      # sample median of each column
  hist(xmed, freq = FALSE, xlim = c(-0.4, 0.4), ylim = c(0, 5),
       main = paste("Sampling Distribution: B =", B[k]))
  lines(xseq, dnorm(xseq, sd = 1/sqrt(n)))  # pdf of xbar, overlaid for comparison
  legend("topright", expression(bar(x)*" pdf"), lty = 1, bty = "n")
}
```

Need for Resampling: Back to the Real World

In most cases, we only have one sample of data. What do we do? If $n$ is large and we only care about $\bar{x}$, we can use the CLT.

Sampling distribution of $\bar{x}$ with $x_i \overset{\mathrm{iid}}{\sim} U[0, 1]$ for $B = 10000$:

[Figure: six histograms of $\bar{x}$ for n = 3, 5, 10, 20, 50, and 100, each with the asymptotic normal pdf overlaid; x-axis: xbar, y-axis: Density. The normal approximation improves as n grows.]

Need for Resampling: The Need for a Nonparametric Resampling Method

For most statistics other than the sample mean, there is no theoretical argument with which to derive the sampling distribution. To make inferences, we need to somehow obtain (or approximate) the sampling distribution of any generic statistic $\hat{\theta}$. Note that parametric statistics overcome this issue by assuming some particular distribution for the data. The nonparametric bootstrap overcomes this problem by resampling the observed data to approximate the sampling distribution of $\hat{\theta}$.
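To make the CLT route concrete before turning to the bootstrap, here is a minimal sketch of one-sample inference for the mean. This is illustrative code, not from the slides; the uniform sample `x`, its size, and the 1.96 normal quantile are assumptions:

```r
# CLT-based inference from ONE observed sample: the standard error of xbar
# is estimated by sd(x)/sqrt(n). No analogous closed form is available for
# most other statistics (e.g., the sample median), motivating the bootstrap.
set.seed(1)
x = runif(50)                             # one observed sample (assumed data)
n = length(x)
se.clt = sd(x) / sqrt(n)                  # estimated standard error of xbar
ci95 = mean(x) + c(-1.96, 1.96) * se.clt  # approximate 95% CI for the mean
round(c(mean = mean(x), se = se.clt, lower = ci95[1], upper = ci95[2]), 3)
```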
Bootstrap Basics

Overview: Problem of Interest

In statistics, we typically want to know the properties of our estimates, e.g., precision, accuracy, etc. In the parametric situation, we can often derive the distribution of our estimate given our assumptions about the data (or from MLE principles). In the nonparametric situation, we can use the bootstrap to examine the properties of our estimates in a variety of different situations.

Overview: Bootstrap Procedure

Suppose $\mathbf{x} = (x_1, \ldots, x_n)'$ with $x_i \overset{\mathrm{iid}}{\sim} F(x)$ for $i \in \{1, \ldots, n\}$, and we want to make inferences about some statistic $\hat{\theta} = s(\mathbf{x})$. We can use the Monte Carlo bootstrap (an R sketch follows the standard-error formula below):

1. Sample $x_i^*$ with replacement from $\{x_1, \ldots, x_n\}$ for $i \in \{1, \ldots, n\}$.
2. Calculate $\hat{\theta}_b^* = s(\mathbf{x}_b^*)$ for the $b$-th bootstrap sample, where $\mathbf{x}_b^* = (x_1^*, \ldots, x_n^*)'$.
3. Repeat steps 1-2 a total of $B$ times to get the bootstrap distribution of $\hat{\theta}$.
4. Compare $\hat{\theta} = s(\mathbf{x})$ to the bootstrap distribution.

The estimated standard error of $\hat{\theta}$ is the standard deviation of $\{\hat{\theta}_b^*\}_{b=1}^{B}$:

$$\hat{\sigma}_B = \sqrt{\frac{1}{B-1} \sum_{b=1}^{B} \left( \hat{\theta}_b^* - \bar{\theta}^* \right)^2}$$

where $\bar{\theta}^* = \frac{1}{B} \sum_{b=1}^{B} \hat{\theta}_b^*$ is the mean of the bootstrap distribution of $\hat{\theta}$.
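Putting the four steps together for the sample median — a minimal sketch, assuming data `x` and `B = 10000` (neither is specified in the slides):

```r
# Monte Carlo bootstrap for the sample median (illustrative sketch).
set.seed(1)
x = rnorm(100)                  # one observed sample (assumed data)
B = 10000                       # number of bootstrap resamples (assumed)
theta.hat = median(x)           # observed statistic: theta-hat = s(x)
theta.boot = replicate(B, median(sample(x, replace = TRUE)))  # steps 1-3
se.boot = sd(theta.boot)        # sigma-hat_B: sd() uses the B - 1 denominator
c(estimate = theta.hat, boot.se = se.boot)                    # step 4
```

Note that `sd()` already divides by $B - 1$, so it matches $\hat{\sigma}_B$ above exactly.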