Asymptotic Concepts

L. Magee

January, 2010

1 Definitions of Terms Used in Asymptotic Theory

Let $a_n$ refer to a random variable that is a function of $n$ random variables. An example is a sample mean,

$$a_n = \bar{x} = n^{-1} \sum_{i=1}^{n} x_i$$

Convergence in Probability

The scalar $a_n$ converges in probability to a constant $\alpha$ if, for any positive values of $\epsilon$ and $\delta$, there is a sufficiently large $n^*$ such that

$$\mathrm{Prob}(|a_n - \alpha| > \epsilon) < \delta \quad \text{for all } n > n^*$$

$\alpha$ is called the probability limit, or plim, of $a_n$.

Consistency

If $a_n$ is an estimator of $\alpha$, and $\mathrm{plim}\, a_n = \alpha$, then $a_n$ is a (weakly) consistent estimator of $\alpha$.

Convergence in Distribution

$a_n$ converges in distribution to a random variable $y$ ($a_n \to y$) if, as $n \to \infty$, $\mathrm{Prob}(a_n \leq b) \to \mathrm{Prob}(y \leq b)$ for all $b$. In other words, the distribution of $a_n$ becomes the same as the distribution of $y$.

Examples

Let $x_i \sim N[\mu, \sigma^2]$, $i = 1, \ldots, n$, where the $x_i$'s are mutually independently distributed. Define the three statistics

1. $\bar{x} = n^{-1} \sum_{i=1}^{n} x_i$

2. $s^2 = (n-1)^{-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$

3. $t = \dfrac{\bar{x} - \mu}{(s^2/n)^{1/2}}$

Considering each statistic one by one,

1. As $n \to \infty$, $\mathrm{Var}(\bar{x}) \to 0$. This implies that $\mathrm{plim}(\bar{x}) = \mu$, and $\bar{x}$ is a consistent estimator of $\mu$. (Uses the facts that $\mathrm{Var}(\bar{x}) = \sigma^2/n$ and $E(\bar{x}) = \mu$.)

2. As $n \to \infty$, $\mathrm{Var}(s^2) \to 0$, so $\mathrm{plim}(s^2) = \sigma^2$ and $s^2$ is a consistent estimator of $\sigma^2$. (Uses the distributional result $(n-1)s^2/\sigma^2 \sim \chi^2_{n-1}$. Since $E(\chi^2_{n-1}) = n-1$ and $\mathrm{Var}(\chi^2_{n-1}) = 2(n-1)$, then $E s^2 = \sigma^2$ and $\mathrm{Var}(s^2) = 2\sigma^4/(n-1)$.)

3. $t$ converges in distribution to $z$, where $z \sim N[0,1]$. (Uses the distributional result $t \sim t_{n-1}$. Since $\mathrm{plim}(s^2) = \sigma^2$, then as $n \to \infty$, $t \to (\bar{x} - \mu)/(\sigma^2/n)^{1/2}$. This is a standardized normal random variable.)

Properties

(i) If $\mathrm{plim}(x_n) = \theta_x$, then $\mathrm{plim}(g(x_n)) = g(\theta_x)$, for any function $g(\cdot)$ that is continuous at $\theta_x$. This is sometimes called Slutsky's theorem.

(ii) If $x_n$ converges in distribution to some random variable $x$, i.e. $x_n \to x$, then, for any continuous function $g(\cdot)$, $g(x_n) \to g(x)$. That is, the distribution of $g(x_n)$ converges to the distribution of $g(x)$.
(This is like property (i), but for convergence in distribution instead of convergence in probability.)

(iii) If $\mathrm{plim}(x_n) = \theta_x$ and $\mathrm{plim}(y_n) = \theta_y$, then $\mathrm{plim}(x_n y_n) = \theta_x \theta_y$.

(iv) If $\mathrm{plim}(x_n) = \theta_x$ and $y_n \to y$, then $x_n y_n \to \theta_x y$. Often $x_n$ is a matrix and $y$ is a normally distributed vector. Similarly, $y_n'(x_n)^{-1} y_n \to y'(\theta_x)^{-1} y$, which relates to the asymptotic chi-square distribution often encountered in hypothesis testing.

2 Order Notation

It is useful to have notation that describes the rate at which a statistic converges to zero or goes off to infinity as the sample size $n$ grows. First, consider $f(n)$, some non-random function of $n$.

Definitions

(i) $f(n)$ is $O(n^d)$ ("is order $n^d$") if, as $n \to \infty$, $f(n)/n^d$ remains finite. (If $d > 0$, then $f(n)$ grows to infinity at the same rate as $n^d$, and if $d < 0$, $f(n)$ shrinks to zero at the same rate as $n^d$.)

(ii) $f(n)$ is $o(n^d)$ ("is order smaller than $n^d$") if, as $n \to \infty$, then $f(n)/n^d \to 0$.

2.1 Examples

(i) $-3$ is $O(1)$, or $O(n^0)$

(ii) $5n^3$ is $O(n^3)$, and it is $o(n^4)$ and $o(n^5)$

(iii) $\frac{3}{n}$ is $O(n^{-1})$ and is $o(1)$

(iv) $\frac{3}{n} - \frac{2}{n^{3/2}}$ is $O(n^{-1})$

Example (iv) illustrates that the order of a sum depends only on the order of the highest-order term (meaning the term with the largest "$d$" in $O(n^d)$), as long as the number of terms in the sum does not depend on $n$. If the number of terms in a sum itself depends on $n$, then the order of the sum can be affected. Let $x_i$ and $x_{ij}$ be $O(n^0)$ constants. For example, $x_i$ and $x_{ij}$ do not display a trend as $i$ or $j$ increase. Then, in general, $\sum_{i=1}^{n} x_i$ is $O(n)$ and $\sum_{i=1}^{n} \sum_{j=1}^{n} x_{ij}$ is $O(n^2)$. (For an example of what happens when $x_i$ is not an $O(n^0)$ constant, consider $\sum_{i=1}^{n} x_i$ when $x_i = a + bi$. Then $\sum_{i=1}^{n} x_i = na + bn(n+1)/2$, which is $O(n^2)$ when $b \neq 0$.)
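The closing example can be checked numerically. The sketch below uses NumPy; the constants $a = 3$ and $b = 0.5$ are arbitrary illustrative choices, not from the notes:

```python
import numpy as np

# Hypothetical constants for illustration; any b != 0 gives the same conclusion.
a, b = 3.0, 0.5

# With x_i = a + b*i, the sum is n*a + b*n*(n+1)/2, which is O(n^2):
# sum/n diverges as n grows, while sum/n^2 settles at the finite limit b/2.
for n in [10**2, 10**4, 10**6]:
    i = np.arange(1, n + 1)
    total = np.sum(a + b * i)
    print(n, total / n, total / n**2)
```

As $n$ grows, the last printed column approaches $b/2 = 0.25$, confirming the $O(n^2)$ rate of the sum.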
Here are some rules for operations involving order notation:

$O(n^p) + O(n^q)$ is $O(n^{\max(p,q)})$

$o(n^p) + o(n^q)$ is $o(n^{\max(p,q)})$

$O(n^p) + o(n^q)$ is $O(n^p)$ if $p \geq q$ (already mentioned in example (iv)) and is $o(n^q)$ if $p < q$

$O(n^p) \times O(n^q)$ is $O(n^{p+q})$

$O(n^p) \times o(n^q)$ is $o(n^{p+q})$

$o(n^p) \times o(n^q)$ is $o(n^{p+q})$

$(O(n^p))^{-1}$ is $O(n^{-p})$

$(o(n^p))^{-1}$ is of unknown order without more information

Combining these gives other results, such as: $O(n^p)/O(n^q)$ is $O(n^{p-q})$.

2.2 Order in Probability

Order notation can be applied to random variables, using a "p" subscript, so that $O_p$ denotes "order in probability". Let $a_n$ be a random variable that is a function of $n$ random variables, as on page 1.

Definitions

(i) $a_n$ is $O_p(n^d)$ if, for every $\epsilon > 0$, there is some $K > 0$ for which, as $n \to \infty$, $\mathrm{Prob}(|a_n/n^d| > K) < \epsilon$.

(ii) $a_n$ is $o_p(n^d)$ if, as $n \to \infty$, then $\mathrm{plim}(a_n/n^d) = 0$.

Except for special cases that usually do not apply to econometric models, (i) is equivalent to the condition that the mean and the standard deviation of $a_n/n^d$ stay bounded as $n \to \infty$. Therefore if $a_n$ is $O_p(n^d)$ then: (1) $E(a_n)$ is $O(n^d)$ and (2) $\mathrm{Var}(a_n)$ is $O(n^{2d})$. Another way to think about it is: if $E(a_n)$ is $O(n^d)$ and $\mathrm{Var}(a_n)$ is $O(n^f)$, then $a_n$ is $O_p(n^g)$, where $g = \max(d, f/2)$.

2.3 Examples of Order in Probability

Let $x_i$, $i = 1, \ldots, n$, be independent random variables with mean $\mu$ and variance $\sigma^2$.

(i) $x_i$ is $O_p(1)$

(ii) $\sum_{i=1}^{n} x_i$ is $O_p(n^{1/2})$ if $\mu = 0$, and it is $O_p(n)$ if $\mu \neq 0$

(iii) $\sum_{i=1}^{n} x_i^2$ is $O_p(n)$ unless $\mu = \sigma^2 = 0$

(ii) arises often in asymptotic theory. If $\mu = 0$, then $\sum_{i=1}^{n} x_i$ has a mean of 0 and variance $n\sigma^2$, implying $\sum_{i=1}^{n} x_i / n^{1/2}$ has a mean of 0 and a variance of $\sigma^2$. So if $\mu = 0$ then $\sum_{i=1}^{n} x_i$ is $O_p(n^{1/2})$. However, if $\mu \neq 0$, then the mean of $\sum_{i=1}^{n} x_i$ is $n\mu$, implying that $\sum_{i=1}^{n} x_i$ is $O_p(n)$, not $O_p(n^{1/2})$.

(iii) follows from the fact that $x_i^2$ has a finite non-trending mean, $m_1 = \mu^2 + \sigma^2$, and finite variance, $m_2$. Then $\sum_{i=1}^{n} x_i^2 / n$ has a finite mean $m_1$ and variance $m_2/n$.
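Example (ii) can be illustrated by simulation. This is a sketch, assuming normal draws; the values of $\mu$, $\sigma$, and the sample sizes are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
sigma, reps = 2.0, 2_000  # illustrative values

# Example (ii) with mu = 0: sum(x_i)/n^{1/2} keeps a stable spread
# (standard deviation sigma) as n grows, so sum(x_i) is O_p(n^{1/2}).
for n in [100, 2_500]:
    x = rng.normal(0.0, sigma, size=(reps, n))
    scaled = x.sum(axis=1) / np.sqrt(n)
    print(n, scaled.std())

# With mu != 0 the sum grows like n*mu, so the right scaling is 1/n:
# sum(x_i)/n settles at mu, making sum(x_i) O_p(n), not O_p(n^{1/2}).
mu, n = 1.5, 2_500
x = rng.normal(mu, sigma, size=(reps, n))
print((x.sum(axis=1) / n).mean())
```

The first two printed standard deviations stay near $\sigma = 2$ at both sample sizes, while the last line stays near $\mu = 1.5$, matching the two rates in example (ii).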
2.4 Asymptotic Expansions

Many consistent estimators are "root-n consistent", meaning that the sampling error is $O_p(n^{-1/2})$, as in $\hat{\theta} - \theta = O_p(n^{-1/2})$. Asymptotic expansions simplify the analysis of the distribution of $\hat{\theta}$ by ignoring the part of $\hat{\theta} - \theta$ that is $o_p(n^{-1/2})$. This involves decomposing $\hat{\theta}$ as

$$\hat{\theta} = \theta + \xi_{-1/2} + o_p(n^{-1/2}) \quad (1)$$

The right-hand side of (1) contains three terms of declining importance as $n \to \infty$. The first term is $O(1)$, and must equal the true parameter value if $\hat{\theta}$ is consistent. The second term is $O_p(n^{-1/2})$. It usually has mean zero, and often is simple enough to enable the derivation of $E(\xi_{-1/2}^2)$, which is then used for estimating $\mathrm{Var}(\hat{\theta})$. The third term is the remainder. It is left out in most asymptotic approximations. We hope it is not very big compared to the first two terms. Its importance often can be examined most easily by simulations.

3 Application to Ordinary Least Squares with Heteroskedastic Errors

Assume the true model is $y_i = x_i'\beta + u_i$, where $E x_i u_i = 0$, $\mathrm{Var}(u_i \mid x_i) = \sigma_i^2$, and the $u_i$'s are independent. Consider the OLS estimator of $\beta$, $b = (\sum x_i x_i')^{-1} \sum x_i y_i$. Unless indicated otherwise, the summations run over $i$ from 1 to $n$, where $n$ is the number of observations. Assume that the $x_i$'s are random, as in survey data where the randomness in both $x_i$ and $y_i$ derives from the random survey sampling.

Aside on notation: $x_i$ is a $k \times 1$ vector of observations on the RHS variables. $x_i'$ is the $i$th row of the usual $n \times k$ matrix $X$, and $y_i$ is the $i$th element of the usual $n \times 1$ vector $y$. So in this vector notation, a matrix product such as $X'X$ is written as $\sum x_i x_i'$.

Substituting out $y_i$ gives

$$b = \beta + \left( \sum x_i x_i' \right)^{-1} \sum x_i u_i \quad (2)$$

Relating this to (1), $\beta$ is $O(1)$, and there is no remainder term.
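The root-n rate in (2) can be checked by simulation. The sketch below assumes a two-coefficient model with one particular heteroskedastic error variance, $\sigma_i^2 = 0.5 + x_i^2$; these choices are illustrative, not from the notes. If $b - \beta$ is $O_p(n^{-1/2})$, the spread of $n^{1/2}(b - \beta)$ should stay roughly constant as $n$ grows:

```python
import numpy as np

rng = np.random.default_rng(2)
beta = np.array([1.0, -0.5])  # hypothetical true coefficients, k = 2

def scaled_ols_errors(n, reps=2_000):
    """Draw `reps` samples of size n and return sqrt(n)*(b - beta) for each."""
    draws = np.empty((reps, beta.size))
    for r in range(reps):
        x = np.column_stack([np.ones(n), rng.normal(size=n)])
        sd = np.sqrt(0.5 + x[:, 1] ** 2)   # sigma_i depends on x_i
        u = sd * rng.normal(size=n)        # heteroskedastic, E(x_i u_i) = 0
        y = x @ beta + u
        b = np.linalg.lstsq(x, y, rcond=None)[0]
        draws[r] = np.sqrt(n) * (b - beta)
    return draws

# The spread of sqrt(n)*(b - beta) is roughly the same at both sample
# sizes, consistent with b - beta being O_p(n^{-1/2}).
spreads = {n: scaled_ols_errors(n).std(axis=0) for n in [50, 500]}
for n, s in spreads.items():
    print(n, s)
```

Rescaling by $n^{1/2}$ rather than looking at $b - \beta$ directly is what makes the comparison informative: the unscaled error shrinks toward zero at both sample sizes, while the scaled error isolates the stable $O_p(1)$ component $\xi_{-1/2}$ of expansion (1).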