Notes for a Graduate-Level Course in Asymptotics for Statisticians
Total Page:16
File Type:pdf, Size:1020Kb
Notes for a graduate-level course in asymptotics for statisticians David R. Hunter Penn State University June 2014 Contents Preface 1 1 Mathematical and Statistical Preliminaries 3 1.1 Limits and Continuity . 4 1.1.1 Limit Superior and Limit Inferior . 6 1.1.2 Continuity . 8 1.2 Differentiability and Taylor's Theorem . 13 1.3 Order Notation . 18 1.4 Multivariate Extensions . 26 1.5 Expectation and Inequalities . 33 2 Weak Convergence 41 2.1 Modes of Convergence . 41 2.1.1 Convergence in Probability . 41 2.1.2 Probabilistic Order Notation . 43 2.1.3 Convergence in Distribution . 45 2.1.4 Convergence in Mean . 48 2.2 Consistent Estimates of the Mean . 51 2.2.1 The Weak Law of Large Numbers . 52 i 2.2.2 Independent but not Identically Distributed Variables . 52 2.2.3 Identically Distributed but not Independent Variables . 54 2.3 Convergence of Transformed Sequences . 58 2.3.1 Continuous Transformations: The Univariate Case . 58 2.3.2 Multivariate Extensions . 59 2.3.3 Slutsky's Theorem . 62 3 Strong convergence 70 3.1 Strong Consistency Defined . 70 3.1.1 Strong Consistency versus Consistency . 71 3.1.2 Multivariate Extensions . 73 3.2 The Strong Law of Large Numbers . 74 3.3 The Dominated Convergence Theorem . 79 3.3.1 Moments Do Not Always Converge . 79 3.3.2 Quantile Functions and the Skorohod Representation Theorem . 81 4 Central Limit Theorems 88 4.1 Characteristic Functions and Normal Distributions . 88 4.1.1 The Continuity Theorem . 89 4.1.2 Moments . 90 4.1.3 The Multivariate Normal Distribution . 91 4.1.4 Asymptotic Normality . 92 4.1.5 The Cram´er-Wold Theorem . 94 4.2 The Lindeberg-Feller Central Limit Theorem . 96 4.2.1 The Lindeberg and Lyapunov Conditions . 97 4.2.2 Independent and Identically Distributed Variables . 98 ii 4.2.3 Triangular Arrays . 99 4.3 Stationary m-Dependent Sequences . 108 4.4 Univariate extensions . 111 4.4.1 The Berry-Esseen theorem . 112 4.4.2 Edgeworth expansions . 113 5 The Delta Method and Applications 116 5.1 Local linear approximations . 116 5.1.1 Asymptotic distributions of transformed sequences . 116 5.1.2 Variance stabilizing transformations . 119 5.2 Sample Moments . 121 5.3 Sample Correlation . 123 6 Order Statistics and Quantiles 127 6.1 Extreme Order Statistics . 127 6.2 Sample Quantiles . 134 6.2.1 Uniform Order Statistics . 134 6.2.2 Uniform Sample Quantiles . 135 6.2.3 General sample quantiles . 137 7 Maximum Likelihood Estimation 140 7.1 Consistency . 140 7.2 Asymptotic normality of the MLE . 144 7.3 Asymptotic Efficiency and Superefficiency . 149 7.4 The multiparameter case . 154 7.5 Nuisance parameters . 159 iii 8 Hypothesis Testing 161 8.1 Wald, Rao, and Likelihood Ratio Tests . 161 8.2 Contiguity and Local Alternatives . 165 8.3 The Wilcoxon Rank-Sum Test . 175 9 Pearson's chi-square test 180 9.1 Null hypothesis asymptotics . 180 9.2 Power of Pearson's chi-square test . 187 10 U-statistics 190 10.1 Statistical Functionals and V-Statistics . 190 10.2 Asymptotic Normality . 194 10.3 Multivariate and multi-sample U-statistics . 201 10.4 Introduction to the Bootstrap . 205 iv Preface These notes are designed to accompany STAT 553, a graduate-level course in large-sample theory at Penn State intended for students who may not have had any exposure to measure- theoretic probability. While many excellent large-sample theory textbooks already exist, the majority (though not all) of them reflect a traditional view in graduate-level statistics education that students should learn measure-theoretic probability before large-sample the- ory. The philosophy of these notes is that these priorities are backwards, and that in fact statisticians have more to gain from an understanding of large-sample theory than of measure theory. The intended audience will have had a year-long sequence in mathematical statistics, along with the usual calculus and linear algebra prerequisites that usually accompany such a course, but no measure theory. Many exercises require students to do some computing, based on the notion that comput- ing skills should be emphasized in all statistics courses whenever possible, provided that the computing enhances the understanding of the subject matter. The study of large-sample the- ory lends itself very well to computing, since frequently the theoretical large-sample results we prove do not give any indication of how well asymptotic approximations work for finite samples. Thus, simulation for the purpose of checking the quality of asymptotic approxi- mations for small samples is very important in understanding the limitations of the results being learned. Of course, all computing activities will force students to choose a particular computing environment. Occasionally, hints are offered in the notes using R (http://www.r- project.org), though these exercises can be completed using other packages or languages, provided that they possess the necessary statistical and graphical capabilities. Credit where credit is due: These notes originally evolved as an accompaniment to the book Elements of Large-Sample Theory by the late Erich Lehmann; the strong influence of that great book, which shares the philosophy of these notes regarding the mathematical level at which an introductory large-sample theory course should be taught, is still very much evident here. I am fortunate to have had the chance to correspond with Professor Lehmann several times about his book, as my students and I provided lists of typographical errors that we had spotted. He was extremely gracious and I treasure the letters that he sent me, written out longhand and sent through the mail even though we were already well into the 1 era of electronic communication. I have also drawn on many other sources for ideas or for exercises. Among these are the fantastic and concise A Course in Large Sample Theory by Thomas Ferguson, the compre- hensive and beautifully written Asymptotic Statistics by A. W. van der Vaart, and the classic probability textbooks Probability and Measure by Patrick Billingsley and An Introduction to Probability Theory and Its Applications, Volumes 1 and 2 by William Feller. Arkady Tem- pelman at Penn State helped with some of the Strong-Law material in Chapter 3, and it was Tom Hettmansperger who originally convinced me to design this course at Penn State back in 2000 when I was a new assistant professor. My goal in doing so was to teach a course that I wished I had had as a graduate student, and I hope that these notes help to achieve that goal. 2 Chapter 1 Mathematical and Statistical Preliminaries We assume that many readers are familiar with much of the material presented in this chapter. However, we do not view this material as superfluous, and we feature it prominently as the first chapter of these notes for several reasons. First, some of these topics may have been learned long ago by readers, and a review of this chapter may remind them of knowledge they have forgotten. Second, including these preliminary topics as a separate chapter makes the notes more self-contained than if the topics were omitted: We do not have to refer readers to \a standard calculus textbook" or \a standard mathematical statistics textbook" whenever an advanced result relies on this preliminary material. Third, some of the topics here are likely to be new to some readers, particularly readers who have not taken a course in real analysis. Fourth, and perhaps most importantly, we wish to set the stage in this chapter for a math- ematically rigorous treatment of large-sample theory. By \mathematically rigorous," we do not mean “difficult” or \advanced"; rather, we mean logically sound, relying on argu- ments in which assumptions and definitions are unambiguously stated and assertions must be provable from these assumptions and definitions. Thus, even well-prepared readers who know the material in this chapter often benefit from reading it and attempting the exercises, particularly if they are new to rigorous mathematics and proof-writing. We strongly caution against the alluring idea of saving time by skipping this chapter when teaching a course, telling students \you can always refer to Chapter 1 when you need to"; we have learned the hard way that this is a dangerous approach that can waste more time in the long run than it saves! 3 1.1 Limits and Continuity Fundamental to the study of large-sample theory is the idea of the limit of a sequence. Much of these notes will be devoted to sequences of random variables; however, we begin here by focusing on sequences of real numbers. Technically, a sequence of real numbers is a function from the natural numbers f1; 2; 3;:::g into the real numbers R; yet we always write a1; a2;::: instead of the more traditional function notation a(1); a(2);:::. We begin by defining a limit of a sequence of real numbers. This is a concept that will be intuitively clear to readers familiar with calculus. For example, the fact that the sequence a1 = 1:3; a2 = 1:33; a3 = 1:333;::: has a limit equal to 4/3 is unsurprising. Yet there are some subtleties that arise with limits, and for this reason and also to set the stage for a rigorous treatment of the topic, we provide two separate definitions. It is important to remember that even these two definitions do not cover all possible sequences; that is, not every sequence has a well-defined limit. Definition 1.1 A sequence of real numbers a1; a2;::: has limit equal to the real num- ber a if for every > 0, there exists N such that jan − aj < for all n > N.