Mathematical Statistics, Lecture 16. Asymptotics: Consistency and the Delta Method


MIT 18.655, Dr. Kempthorne, Spring 2016

Outline:
  Asymptotics: Consistency
  Delta Method: Approximating Moments
  Delta Method: Approximating Distributions

Consistency: The Statistical Estimation Problem

$X_1, \ldots, X_n$ i.i.d. $P_\theta$, $\theta \in \Theta$.
$q(\theta)$: target of estimation.
$\hat{q}(X_1, \ldots, X_n)$: estimator of $q(\theta)$.

Definition: $\hat{q}_n$ is a consistent estimator of $q(\theta)$, written $\hat{q}_n(X_1, \ldots, X_n) \xrightarrow{P_\theta} q(\theta)$, if for every $\epsilon > 0$,
$$\lim_{n \to \infty} P_\theta\big(|\hat{q}_n(X_1, \ldots, X_n) - q(\theta)| > \epsilon\big) = 0.$$

Example: Consider $P_\theta$ such that $E[X_1 \mid \theta] = \theta$, with target $q(\theta) = \theta$ and estimator
$$\hat{q}_n = \frac{\sum_{i=1}^n X_i}{n} = \bar{X}_n.$$
When is $\hat{q}_n$ consistent for $\theta$?

Consistency: Example

Example: consistency of the sample mean, $\hat{q}_n(X_1, \ldots, X_n) = \bar{X}_n$.

If $\mathrm{Var}(X_1 \mid \theta) = \sigma^2(\theta) < \infty$, apply Chebychev's Inequality. For any $\epsilon > 0$:
$$P_\theta\big(|\bar{X}_n - \theta| \ge \epsilon\big) \le \frac{\mathrm{Var}(\bar{X}_n \mid \theta)}{\epsilon^2} = \frac{\sigma^2(\theta)}{n\,\epsilon^2} \xrightarrow{n \to \infty} 0.$$

If $\mathrm{Var}(X_1 \mid \theta) = \infty$, $\bar{X}_n$ is still consistent provided $E[|X_1| \mid \theta] < \infty$. Proof: Levy Continuity Theorem.

Consistency: A Stronger Definition

Definition: $\hat{q}_n$ is a uniformly consistent estimator of $q(\theta)$ if for every $\epsilon > 0$,
$$\lim_{n \to \infty} \sup_{\theta \in \Theta} P_\theta\big(|\hat{q}_n(X_1, \ldots, X_n) - q(\theta)| > \epsilon\big) = 0.$$

Example: consider the sample mean $\hat{q}_n = \bar{X}_n$, for which $E[\bar{X}_n \mid \theta] = \theta$ and $\mathrm{Var}[\bar{X}_n \mid \theta] = \sigma^2(\theta)/n$. The proof of consistency of $\hat{q}_n = \bar{X}_n$ extends to uniform consistency if
$$\sup_{\theta \in \Theta} \sigma^2(\theta) \le M < \infty. \quad (*)$$

Examples satisfying (*):
  $X_i$ i.i.d. Bernoulli($\theta$).
  $X_i$ i.i.d. Normal($\mu, \sigma^2$), where $\theta = (\mu, \sigma^2)$ and $\Theta = (-\infty, +\infty) \times [0, M]$ for finite $M < \infty$.

Consistency: The Strongest Definition

Definition: $\hat{q}_n$ is a strongly consistent estimator of $q(\theta)$ if
$$P_\theta\Big[\lim_{n \to \infty} |\hat{q}_n(X_1, \ldots, X_n) - q(\theta)| \le \epsilon\Big] = 1 \quad \text{for every } \epsilon > 0,$$
i.e., $\hat{q}_n \xrightarrow{a.s.} q(\theta)$ (a.s. = "almost surely").

Compare to: $\hat{q}_n$ is a (weakly) consistent estimator of $q(\theta)$ if for every $\epsilon > 0$, $\lim_{n \to \infty} P_\theta(|\hat{q}_n(X_1, \ldots, X_n) - q(\theta)| > \epsilon) = 0$.

Consistency of Plug-In Estimators: Discrete Case

Discrete outcome space of size $K$: $\mathcal{X} = \{x_1, \ldots, x_K\}$.
$X_1, \ldots, X_n$ i.i.d. $P_\theta$, $\theta \in \Theta$, where $\theta = (p_1, \ldots, p_K)$ and
$$P(X_1 = x_k \mid \theta) = p_k, \quad k = 1, \ldots, K, \qquad p_k \ge 0, \quad \sum_{k=1}^K p_k = 1.$$
$\Theta = \mathcal{S}_K$ (the $K$-dimensional simplex).

Define the empirical distribution $\hat{\theta}_n = (\hat{p}_1, \ldots, \hat{p}_K)$, where
$$\hat{p}_k = \frac{\sum_{i=1}^n 1(X_i = x_k)}{n} \equiv \frac{N_k}{n},$$
so that $\hat{\theta}_n \in \mathcal{S}_K$.
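The definitions above are easy to check numerically. Below is a minimal simulation sketch (not part of the lecture; the Bernoulli model, the helper name prob_deviation, and the sample sizes are illustrative choices) that estimates $P_\theta(|\bar{X}_n - \theta| > \epsilon)$ by Monte Carlo and compares it with the Chebychev bound $\sigma^2(\theta)/(n\epsilon^2)$ used in the consistency proof.

```python
import numpy as np

def prob_deviation(theta, n, eps, reps=20_000, rng=None):
    """Monte Carlo estimate of P_theta(|Xbar_n - theta| > eps) for i.i.d. Bernoulli(theta) data."""
    rng = np.random.default_rng(0) if rng is None else rng
    xbars = rng.binomial(n, theta, size=reps) / n   # 'reps' independent sample means
    return np.mean(np.abs(xbars - theta) > eps)

theta, eps = 0.3, 0.05
for n in [10, 100, 1_000, 10_000]:
    p_dev = prob_deviation(theta, n, eps)
    cheby = min(theta * (1 - theta) / (n * eps**2), 1.0)   # Chebychev upper bound, capped at 1
    print(f"n={n:6d}  P(|Xbar - theta| > eps) ~ {p_dev:.4f}   Chebychev bound {cheby:.4f}")
```

As $n$ grows, both the simulated deviation probability and the bound shrink toward 0, which is exactly the (weak) consistency statement; since the Bernoulli variance is at most $1/4$, the same bound also illustrates uniform consistency over $\theta \in [0, 1]$.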
Consistency of Plug-In Estimators

Proposition/Theorem 5.2.1. Suppose $\mathbf{X}_n = (X_1, \ldots, X_n)$ is a random sample of size $n$ from a discrete distribution with $\theta \in \mathcal{S}$. Then:
  $\hat{\theta}_n$ is uniformly consistent for $\theta \in \mathcal{S}$.
  For any continuous function $q: \mathcal{S} \to R^d$, $\hat{q} = q(\hat{\theta}_n)$ is uniformly consistent for $q(\theta)$.

Proof: For any $\epsilon > 0$, $P_\theta(|\hat{\theta}_n - \theta| \ge \epsilon) \to 0$. This follows upon noting that
$$\{\mathbf{x}_n : |\hat{\theta}_n(\mathbf{x}_n) - \theta|^2 < \epsilon^2\} \supset \bigcap_{k=1}^K \{\mathbf{x}_n : |(\hat{\theta}_n(\mathbf{x}_n) - \theta)_k|^2 < \epsilon^2/K\},$$
so that
$$P\big(\{\mathbf{x}_n : |\hat{\theta}_n(\mathbf{x}_n) - \theta|^2 \ge \epsilon^2\}\big) \le \sum_{k=1}^K P\big(\{\mathbf{x}_n : |(\hat{\theta}_n(\mathbf{x}_n) - \theta)_k|^2 \ge \epsilon^2/K\}\big) \le \sum_{k=1}^K \frac{1/(4n)}{\epsilon^2/K} = \frac{K^2}{4 n \epsilon^2},$$
using Chebychev's Inequality and $\mathrm{Var}(\hat{p}_k \mid \theta) = p_k(1 - p_k)/n \le 1/(4n)$. The bound does not depend on $\theta$ and tends to 0 as $n \to \infty$. $\square$

Proof (continued): $q(\cdot)$ continuous on the compact set $\mathcal{S}$ implies $q(\cdot)$ is uniformly continuous on $\mathcal{S}$: for every $\epsilon > 0$ there exists $\delta(\epsilon) > 0$ such that
$$|\theta_1 - \theta_0| < \delta(\epsilon) \implies |q(\theta_1) - q(\theta_0)| < \epsilon, \quad \text{uniformly for all } \theta_0, \theta_1 \in \Theta.$$
It follows that
$$\{\mathbf{x} : |q(\hat{\theta}_n(\mathbf{x})) - q(\theta)| < \epsilon\}^c \subseteq \{\mathbf{x} : |\hat{\theta}_n - \theta| < \delta(\epsilon)\}^c \implies P_\theta[|\hat{q}_n - q(\theta)| \ge \epsilon] \le P_\theta[|\hat{\theta}_n - \theta| \ge \delta(\epsilon)].$$
Note: uniform consistency can be shown; see Bickel & Doksum.

Consistency of Plug-In Estimators

Proposition 5.2.1. Suppose:
  $g = (g_1, \ldots, g_d): \mathcal{X} \to \mathcal{Y} \subset R^d$.
  $E[|g_j(X_1)| \mid \theta] < \infty$ for $j = 1, \ldots, d$ and all $\theta \in \Theta$.
  $m_j(\theta) \equiv E[g_j(X_1) \mid \theta]$ for $j = 1, \ldots, d$.
Define $q(\theta) = h(m(\theta))$, where $m(\theta) = (m_1(\theta), \ldots, m_d(\theta))$ and $h: \mathcal{Y} \to R^p$ is continuous. Then
$$\hat{q}_n = h(\bar{g}) = h\left(\frac{1}{n} \sum_{i=1}^n g(X_i)\right)$$
is a consistent estimator of $q(\theta)$.

Proposition 5.2.1 applied to non-parametric models: with $\mathcal{P} = \{P : E_P(|g(X_1)|) < \infty\}$ and $\nu(P) = h(E_P g(X_1))$,
$$\nu(\hat{P}_n) \xrightarrow{P} \nu(P),$$
where $\hat{P}_n$ is the empirical distribution.

Consistency of MLEs in Exponential Families

Theorem 5.2.2. Suppose:
  $\mathcal{P}$ is a canonical exponential family of rank $d$ generated by $\mathbf{T} = (T_1(X), \ldots, T_d(X))^T$:
  $$p(x \mid \eta) = h(x) \exp\{\mathbf{T}(x)^T \eta - A(\eta)\}.$$
  $\mathcal{E} = \{\eta\}$ is open.
  $X_1, \ldots, X_n$ are i.i.d. $P_\eta \in \mathcal{P}$.
Then
$$P_\eta[\text{the MLE } \hat{\eta} \text{ exists}] \xrightarrow{n \to \infty} 1,$$
and $\hat{\eta}$ is consistent.

Proof: $\hat{\eta}(X_1, \ldots, X_n)$ exists iff $\bar{\mathbf{T}}_n = \sum_{i=1}^n \mathbf{T}(X_i)/n \in C^o_{\mathbf{T}}$, the interior of the convex support of $\mathbf{T}$.
If $\eta_0$ is true, then $\mathbf{t}_0 = E[\mathbf{T}(X_1) \mid \eta_0] \in C^o_{\mathbf{T}}$ and $\dot{A}(\eta_0) = \mathbf{t}_0$.
By definition of the interior of the convex support, there exists $\delta > 0$ such that $S_\delta = \{\mathbf{t} : |\mathbf{t} - E_{\eta_0}[\mathbf{T}(X_1)]| < \delta\} \subset C^o_{\mathbf{T}}$.
By the WLLN,
$$\bar{\mathbf{T}}_n = \frac{1}{n} \sum_{i=1}^n \mathbf{T}(X_i) \xrightarrow{P_{\eta_0}} E_{\eta_0}[\mathbf{T}(X_1)] \implies P_{\eta_0}\big[\bar{\mathbf{T}}_n \in C^o_{\mathbf{T}}\big] \xrightarrow{n \to \infty} 1.$$
$\hat{\eta}$ exists if it solves $\dot{A}(\eta) = \bar{\mathbf{T}}_n$, i.e., if $\bar{\mathbf{T}}_n \in C^o_{\mathbf{T}}$. The map $\eta \mapsto \dot{A}(\eta)$ is 1-to-1 onto $C^o_{\mathbf{T}}$ and continuous on $\mathcal{E}$, so the inverse map $\dot{A}^{-1}$ is continuous, and Proposition 5.2.1 applies.
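As a concrete instance of Theorem 5.2.2, consider the Poisson($\lambda$) family in canonical form, with $T(x) = x$, $\eta = \log \lambda$ and $A(\eta) = e^\eta$: the MLE solves $\dot{A}(\eta) = \bar{T}_n$, so $\hat{\eta} = \log \bar{X}_n$, which exists iff $\bar{T}_n$ lies in the interior $(0, \infty)$ of the convex support. The sketch below is an illustration added here, not taken from the slides; the rate $\lambda = 0.2$ and the replication counts are arbitrary choices. It checks both conclusions of the theorem: the probability that the MLE exists tends to 1, and $\hat{\eta}$ concentrates around the true $\eta$.

```python
import numpy as np

rng = np.random.default_rng(1)
lam_true = 0.2                    # small rate, so all-zero samples (no MLE) are possible
eta_true = np.log(lam_true)       # canonical parameter

for n in [5, 20, 100, 1_000]:
    samples = rng.poisson(lam_true, size=(10_000, n))
    tbar = samples.mean(axis=1)               # T_bar_n for each replication
    exists = tbar > 0                         # MLE exists iff T_bar_n lies in (0, inf)
    eta_hat = np.log(tbar[exists])            # eta_hat = A_dot^{-1}(T_bar_n) = log(X_bar_n)
    print(f"n={n:5d}  P(MLE exists) ~ {exists.mean():.3f}  "
          f"mean |eta_hat - eta| ~ {np.mean(np.abs(eta_hat - eta_true)):.3f}")
```

For small $n$ an all-zero sample leaves $\bar{T}_n$ on the boundary and the MLE undefined, which is why the existence probability sits below 1 before climbing toward it as $n$ grows.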
Consistency of Minimum-Contrast Estimates

Minimum-contrast estimates:
  $X_1, \ldots, X_n$ i.i.d. $P_\theta$, $\theta \in \Theta \subset R^d$.
  $\rho(X, \theta): \mathcal{X} \times \Theta \to R$, a contrast function.
  $D(\theta_0, \theta) = E[\rho(X, \theta) \mid \theta_0]$: the discrepancy function, with $\theta = \theta_0$ uniquely minimizing $D(\theta_0, \theta)$.
The minimum-contrast estimate $\hat{\theta}$ minimizes
$$\rho_n(\mathbf{X}, \theta) = \frac{1}{n} \sum_{i=1}^n \rho(X_i, \theta).$$

Theorem 5.2.3. Suppose
$$\sup_{\theta \in \Theta} \Big| \frac{1}{n} \sum_{i=1}^n \rho(X_i, \theta) - D(\theta_0, \theta) \Big| \xrightarrow{P_{\theta_0}} 0
\quad \text{and} \quad
\inf_{|\theta - \theta_0| \ge \epsilon} D(\theta_0, \theta) > D(\theta_0, \theta_0) \text{ for all } \epsilon > 0.$$
Then $\hat{\theta}$ is consistent.

Delta Method: Approximating Moments

Theorem 5.3.1 (Applying Taylor Expansions). Suppose that:
  $X_1, \ldots, X_n$ are i.i.d. with outcome space $\mathcal{X} = R$.
  $E[X_1] = \mu$ and $\mathrm{Var}[X_1] = \sigma^2 < \infty$.
  $E[|X_1|^m] < \infty$.
  $h: R \to R$ is $m$-times differentiable on $R$, with $m \ge 2$, where
  $$h^{(j)}(x) = \frac{d^j h(x)}{dx^j}, \quad j = 1, 2, \ldots, m, \qquad \|h^{(m)}\|_\infty \equiv \sup_{x \in \mathcal{X}} |h^{(m)}(x)| \le M < \infty.$$
Then
$$E[h(\bar{X})] = h(\mu) + \sum_{j=1}^{m-1} \frac{h^{(j)}(\mu)}{j!} E[(\bar{X} - \mu)^j] + R_m,
\quad \text{where } |R_m| \le M \, \frac{E[|X_1|^m]}{m!} \, n^{-m/2}.$$

Proof: Apply a Taylor expansion to $h(\bar{X})$:
$$h(\bar{X}) = h(\mu) + \sum_{j=1}^{m-1} \frac{h^{(j)}(\mu)}{j!} (\bar{X} - \mu)^j + \frac{h^{(m)}(X^*)}{m!} (\bar{X} - \mu)^m,
\quad \text{where } |X^* - \mu| \le |\bar{X} - \mu|.$$
Take expectations and apply Lemma 5.3.1: if $E|X_1|^j < \infty$, $j \ge 2$, then there exist constants $C_j > 0$ and $D_j > 0$ such that
$$E|\bar{X} - \mu|^j \le C_j \, E|X_1|^j \, n^{-j/2}, \qquad |E[(\bar{X} - \mu)^j]| \le D_j \, E|X_1|^j \, n^{-(j+1)/2} \text{ for } j \text{ odd}.$$

Applying Taylor Expansions: Corollary 5.3.1 (a).
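Theorem 5.3.1 is easy to check numerically. The sketch below is illustrative only: the choices $h = \sin$ (all derivatives bounded by 1, so the hypothesis on $\|h^{(m)}\|_\infty$ holds with $M = 1$) and Exponential(1) data with $\mu = \sigma^2 = 1$ are assumptions made here. It compares a Monte Carlo estimate of $E[h(\bar{X})]$ with the two-term approximation $h(\mu) + \tfrac{1}{2} h''(\mu)\,\sigma^2/n$, which follows from the theorem since $E[\bar{X} - \mu] = 0$ and $E[(\bar{X} - \mu)^2] = \sigma^2/n$.

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma2 = 1.0, 1.0                   # Exponential(1): mean 1, variance 1
h = np.sin
h2 = lambda x: -np.sin(x)               # second derivative of sin

reps = 200_000
for n in [5, 20, 100, 1_000]:
    # X_bar for Exp(1) data: the sum of n Exp(1) draws is Gamma(n, 1), so X_bar = Gamma(n, 1) / n
    xbar = rng.gamma(shape=n, scale=1.0, size=reps) / n
    mc = h(xbar).mean()                            # Monte Carlo estimate of E[h(X_bar)]
    approx = h(mu) + 0.5 * h2(mu) * sigma2 / n     # two-term delta-method approximation
    print(f"n={n:5d}  MC E[h(Xbar)] ~ {mc:.5f}   approximation {approx:.5f}")
```

The two columns agree more and more closely as $n$ grows, with the error controlled by the remainder bound $|R_m| \le M\, E[|X_1|^m]\, n^{-m/2}/m!$ in the theorem.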