Asymptotic in Statistics Lecture Notes for Stat522b Jiahua Chen

Department of Statistics, University of British Columbia

Course Outline

A number of asymptotic results in statistics will be presented: concepts of stochastic order, the classical law of large numbers and central limit theorem; the large sample behaviour of the empirical distribution and sample quantiles.

Prerequisite: Stat 460/560 or permission of the instructor.

Topics:
• Review of probability theory, probability inequalities.
• Modes of convergence, stochastic order, laws of large numbers.
• Results on asymptotic normality.
• Empirical distribution, moments and quantiles.
• Smoothing method.
• Asymptotic results in finite mixture models.

Assessment: Students will be expected to work on 20 assignment problems plus a research report on a topic of their own choice.

Contents

1 Brief preparation in probability theory
   1.1 Measure and measurable space
   1.2 Probability measure and random variables
   1.3 Conditional expectation
   1.4 Independence
   1.5 Assignment problems
2 Fundamentals in Asymptotic Theory
   2.1 Mode of convergence
   2.2 Uniform Strong law of large numbers
   2.3 Convergence in distribution
   2.4 Central limit theorem
   2.5 Big and small o, Slutsky's theorem
   2.6 Asymptotic normality for functions of random variables
   2.7 Sum of random number of random variables
   2.8 Assignment problems
3 Empirical distributions, moments and quantiles
   3.1 Properties of sample moments
   3.2 Empirical distribution function
   3.3 Sample quantiles
   3.4 Inequalities on bounded random variables
   3.5 Bahadur's representation
4 Smoothing method
   4.1 Kernel density estimate
      4.1.1 Bias of the kernel density estimator
      4.1.2 Variance of the kernel density estimator
      4.1.3 Asymptotic normality of the kernel density estimator
   4.2 Non-parametric regression analysis
      4.2.1 Kernel regression estimator
      4.2.2 Local polynomial regression estimator
      4.2.3 Asymptotic bias and variance for fixed design
      4.2.4 Bias and variance under random design
   4.3 Assignment problems
5 Asymptotic Results in Finite Mixture Models
   5.1 Finite mixture model
   5.2 Test of homogeneity
   5.3 Binomial mixture example
   5.4 $C(\alpha)$ test
      5.4.1 The generic $C(\alpha)$ test
      5.4.2 $C(\alpha)$ test for homogeneity
      5.4.3 $C(\alpha)$ statistic under NEF-QVF
      5.4.4 Expressions of the $C(\alpha)$ statistics for NEF-VEF mixtures
   5.5 Brute-force likelihood ratio test for homogeneity
      5.5.1 Examples
      5.5.2 The proof of Theorem 5.2

Chapter 1
Brief preparation in probability theory

1.1 Measure and measurable space

Measure theory is motivated by the desire to measure the length, area or volume of subsets of a space $\Omega$ under consideration. However, unless $\Omega$ is finite, the number of possible subsets of $\Omega$ is very large. In most cases, it is not possible to define a measure on every subset so that it has the desirable properties and is consistent with the common notions of area and volume.

Consider the one-dimensional Euclidean space $R$, which consists of all real numbers, and suppose that we want to assign a length to each subset of $R$. For an ordinary interval $(a, b]$ with $b > a$, it is natural to define its length as
$$\mu((a, b]) = b - a,$$
where $\mu$ is the notation for measuring the length of a set. Let $I_i = (a_i, b_i]$ and $A = \cup_i I_i$, and suppose $a_i \le b_i < a_{i+1}$ for all $i = 1, 2, \ldots$. It is natural to require $\mu$ to have the property
$$\mu(A) = \sum_{i=1}^{\infty} (b_i - a_i).$$
That is, we are imposing a rule on measuring the length of the subsets of $R$.

Naturally, if the lengths of $A_i$, $i = 1, 2, \ldots$, have been defined, we want
$$\mu\left(\cup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} \mu(A_i) \tag{1.1}$$
when the $A_i$ are mutually exclusive.

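As a small illustration of the additive rule (1.1), the following sketch sums the lengths of finitely many disjoint half-open intervals using exact rational arithmetic. The intervals $(i, i + 2^{-i}]$ and the function names are hypothetical choices made for this example only; they are not part of the notes. The partial sums equal $1 - 2^{-n}$, so the countable union has total length 1.

```python
from fractions import Fraction


def interval_length(a, b):
    """Length of the half-open interval (a, b], defined as b - a for b >= a."""
    if b < a:
        raise ValueError("need b >= a")
    return b - a


def length_of_disjoint_union(intervals):
    """Sum the lengths of sorted, mutually exclusive intervals (a_i, b_i].

    Disjointness of consecutive half-open intervals is checked via
    b_i <= a_{i+1}; overlapping input is rejected rather than measured.
    """
    for (_, b_prev), (a_next, _) in zip(intervals, intervals[1:]):
        if b_prev > a_next:
            raise ValueError("intervals overlap")
    return sum((interval_length(a, b) for a, b in intervals), Fraction(0))


def partial_union_length(n):
    """Length of the union of (i, i + 2**-i] for i = 1, ..., n (illustrative)."""
    intervals = [
        (Fraction(i), Fraction(i) + Fraction(1, 2**i)) for i in range(1, n + 1)
    ]
    return length_of_disjoint_union(intervals)


print(partial_union_length(10))  # 1023/1024
```

The check for overlap matters: rule (1.1) is only required for mutually exclusive sets, so the sketch refuses to add lengths of overlapping intervals instead of silently double counting.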
The above discussion shows that a measure might be introduced by first assigning measurements to simple subsets, and then extended by applying the additive rule (1.1) to assign measurements to more complex subsets. Unfortunately, this procedure often does not extend the domain of the measure to all possible subsets of $\Omega$. Instead, we can identify the maximum collection of subsets to which a measure can be extended. This collection of sets is closed under countable union. The notion of a $\sigma$-algebra seems to be the result of such a consideration.

Definition 1.1 Let $\Omega$ be a space under consideration. A class of subsets $\mathcal{F}$ is called a $\sigma$-algebra if it satisfies the following three conditions:
(1) The empty set $\emptyset \in \mathcal{F}$;
(2) If $A \in \mathcal{F}$, then $A^c \in \mathcal{F}$;
(3) If $A_i \in \mathcal{F}$, $i = 1, 2, \ldots$, then their union $\cup_{i=1}^{\infty} A_i \in \mathcal{F}$.

Note that property (3) applies only to a countable number of sets. When $\Omega = R$, the smallest $\sigma$-algebra $\mathcal{F}$ that contains all intervals is called the Borel $\sigma$-algebra, and all the sets in $\mathcal{F}$ are called Borel sets. We denote the Borel $\sigma$-algebra as $\mathcal{B}$. Even though not every subset of the real numbers is a Borel set, statisticians rarely have to consider non-Borel sets in their research. As a side remark, the domain of a measure on $R$ such that $\mu((a, b]) = b - a$ can be extended beyond the Borel $\sigma$-algebra, for instance, to the Lebesgue $\sigma$-algebra.

When a space $\Omega$ is equipped with a $\sigma$-algebra $\mathcal{F}$, we call $(\Omega, \mathcal{F})$ a measurable space: it has the potential to be equipped with a measure. A measure is formally defined as a set function on $\mathcal{F}$ with some properties.

Definition 1.2 Let $(\Omega, \mathcal{F})$ be a measurable space. A set function $\mu$ defined on $\mathcal{F}$ is a measure if it satisfies the following three properties.
(1) For any $A \in \mathcal{F}$, $\mu(A) \ge 0$;
(2) The empty set $\emptyset$ has 0 measure;
(3) It is countably additive:
$$\mu\left(\cup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} \mu(A_i)$$
when the $A_i$ are mutually exclusive.

We have to restrict the additivity to a countable number of sets.
This restriction results in a strange fact in probability theory. If a random variable is continuous, then the probability that this random variable takes any specific real value is zero. At the same time, the chance for it to fall into some interval (which is made of individual values) can be larger than 0. The definition of a measure disallows adding up probabilities over all the real values in the interval to form the probability of the interval.

In measure theory, the measure of a subset is allowed to be infinity. We assume that $\infty + \infty = \infty$ and so on. If we let $\mu(A) = \infty$ for every non-empty set $A$, this set function satisfies the conditions for a measure. Such a measure is probably not useful. Even if some sets possess infinite measure, we would like to have a sequence of mutually exclusive sets such that every one of them has finite measure and their union covers the whole space. We call this kind of measure $\sigma$-finite. Naturally, $\sigma$-finite measures have many other mathematical properties that are convenient in applications.

When a space is equipped with a $\sigma$-algebra $\mathcal{F}$, the sets in $\mathcal{F}$ have the potential to be measured. Hence, we have a measurable space $(\Omega, \mathcal{F})$. After a measure $\nu$ is actually assigned, we obtain a measure space $(\Omega, \mathcal{F}, \nu)$.

1.2 Probability measure and random variables

To a mathematician, a probability measure $P$ is merely a specific measure: it assigns measure 1 to the whole space. The whole space is now called the sample space, which denotes the set of all possible outcomes of an experiment. Individual possible outcomes are called sample points. For theoretical discussion, a specific experimental setup is redundant in probability theory. In fact, we do not mention the sample space at all.

In statistics, the focus is on functions defined on the sample space $\Omega$, and these functions are called random variables. Let $X$ be a random variable. The desire to compute the probability of $\{\omega : X(\omega) \in B\}$ for a Borel set $B$ makes it necessary that $\{\omega : X(\omega) \in B\} \in \mathcal{F}$. These considerations motivate the definition of a random variable.

Definition 1.3 A random variable is a real-valued function $X$ on the probability space $(\Omega, \mathcal{F}, P)$ such that $\{\omega : X(\omega) \in B\} \in \mathcal{F}$ for all Borel sets $B$.

In plain words, random variables are $\mathcal{F}$-measurable functions. Interestingly, this definition rules out the possibility for $X$ to take infinity as its value and implies that the cumulative distribution function defined as
$$F(x) = P(X \le x)$$
has limit 1 when $x \to \infty$. A one-dimensional function $F(x)$ is the cumulative distribution function of some random variable if and only if
1. $\lim_{x \to -\infty} F(x) = 0$ and $\lim_{x \to \infty} F(x) = 1$;
2. $F(x)$ is a non-decreasing, right-continuous function.

Note also that with each random variable defined, we can define a corresponding probability measure $P_X$ on the real space such that
$$P_X(B) = P(X \in B).$$
We have hence obtained an induced measure on $R$. At the same time, the collection of sets $\{X \in B\}$, with $B$ ranging over the Borel sets, is also a $\sigma$-algebra.
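To make the induced measure and the cumulative distribution function concrete, here is a minimal sketch on a finite probability space. The space (two fair coin tosses), the random variable (number of heads) and all function names are assumptions chosen for illustration, not taken from the notes. On a finite space with all subsets measurable, every real-valued function is a random variable, so measurability needs no checking here.

```python
from fractions import Fraction
from itertools import product

# Hypothetical sample space: outcomes of two fair coin tosses, equally likely.
OMEGA = list(product("HT", repeat=2))
P = {w: Fraction(1, 4) for w in OMEGA}


def X(w):
    """Illustrative random variable: number of heads in the outcome w."""
    return w.count("H")


def induced_measure(B):
    """P_X(B) = P({w : X(w) in B}), with the set B given as a predicate."""
    return sum((P[w] for w in OMEGA if B(X(w))), Fraction(0))


def cdf(x):
    """F(x) = P(X <= x), i.e. the induced measure of (-infinity, x]."""
    return induced_measure(lambda v: v <= x)


print(cdf(1))  # 3/4
```

In this example $F$ equals 0 to the left of 0, jumps at 0, 1 and 2, and equals 1 from 2 onwards, which matches the two conditions listed for a cumulative distribution function.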