Stat 709: Mathematical Statistics Lecture 30


Lecture 18: U- and V-statistics

U-statistics

Let $X_1,\dots,X_n$ be i.i.d. from an unknown population $P$ in a nonparametric family $\mathcal{P}$. If the vector of order statistics is sufficient and complete for $P \in \mathcal{P}$, then a symmetric unbiased estimator of an estimable $\vartheta$ is the UMVUE of $\vartheta$.

In many problems, parameters to be estimated are of the form $\vartheta = E[h(X_1,\dots,X_m)]$ with a positive integer $m$ and a Borel function $h$ that is symmetric and satisfies $E|h(X_1,\dots,X_m)| < \infty$ for any $P \in \mathcal{P}$. An effective way of obtaining an unbiased estimator of $\vartheta$ (which is a UMVUE in some nonparametric problems) is to use

$$U_n = \binom{n}{m}^{-1} \sum_c h(X_{i_1},\dots,X_{i_m}), \qquad (1)$$

where $\sum_c$ denotes the summation over the $\binom{n}{m}$ combinations of $m$ distinct elements $\{i_1,\dots,i_m\}$ from $\{1,\dots,n\}$.

Definition 3.2

The statistic in (1) is called a U-statistic with kernel $h$ of order $m$.

Examples

Consider the estimation of $\mu^m$, where $\mu = EX_1$ and $m$ is an integer $> 0$. Using $h(x_1,\dots,x_m) = x_1 \cdots x_m$, we obtain the following U-statistic for $\mu^m$:

$$U_n = \binom{n}{m}^{-1} \sum_c X_{i_1} \cdots X_{i_m}.$$

Consider next the estimation of $\sigma^2 = [\mathrm{Var}(X_1) + \mathrm{Var}(X_2)]/2 = E[(X_1 - X_2)^2/2]$; we obtain the following U-statistic with kernel $h(x_1,x_2) = (x_1 - x_2)^2/2$:

$$U_n = \frac{2}{n(n-1)} \sum_{1 \le i < j \le n} \frac{(X_i - X_j)^2}{2} = \frac{1}{n-1}\left( \sum_{i=1}^n X_i^2 - n\bar{X}^2 \right) = S^2,$$

which is the sample variance.

In some cases, we would like to estimate $\vartheta = E|X_1 - X_2|$, a measure of concentration. Using kernel $h(x_1,x_2) = |x_1 - x_2|$, we obtain the following U-statistic unbiased for $\vartheta = E|X_1 - X_2|$:

$$U_n = \frac{2}{n(n-1)} \sum_{1 \le i < j \le n} |X_i - X_j|,$$

which is known as Gini's mean difference.

Let $\vartheta = P(X_1 + X_2 \le 0)$. Using kernel $h(x_1,x_2) = I_{(-\infty,0]}(x_1 + x_2)$, we obtain the following U-statistic unbiased for $\vartheta$:

$$U_n = \frac{2}{n(n-1)} \sum_{1 \le i < j \le n} I_{(-\infty,0]}(X_i + X_j),$$

which is known as the one-sample Wilcoxon statistic.

Variance of a U-statistic

The variance of a U-statistic $U_n$ with kernel $h$ has an explicit form. For $k = 1,\dots,m$, let

$$h_k(x_1,\dots,x_k) = E[h(X_1,\dots,X_m) \mid X_1 = x_1,\dots,X_k = x_k] = E[h(x_1,\dots,x_k,X_{k+1},\dots,X_m)]$$

and $\tilde{h}_k = h_k - E[h(X_1,\dots,X_m)]$. For any U-statistic with kernel $h$,

$$U_n - E(U_n) = \binom{n}{m}^{-1} \sum_c \tilde{h}(X_{i_1},\dots,X_{i_m}). \qquad (2)$$

Theorem 3.4 (Hoeffding's theorem)

For a U-statistic $U_n$ with $E[h(X_1,\dots,X_m)]^2 < \infty$,

$$\mathrm{Var}(U_n) = \binom{n}{m}^{-1} \sum_{k=1}^m \binom{m}{k} \binom{n-m}{m-k} \zeta_k,$$

where $\zeta_k = \mathrm{Var}(h_k(X_1,\dots,X_k))$.

Proof

Consider two sets $\{i_1,\dots,i_m\}$ and $\{j_1,\dots,j_m\}$ of $m$ distinct integers from $\{1,\dots,n\}$ with exactly $k$ integers in common. The number of distinct choices of two such sets is $\binom{n}{m}\binom{m}{k}\binom{n-m}{m-k}$. By the symmetry of $\tilde{h}_m$ and the independence of $X_1,\dots,X_n$,

$$E[\tilde{h}(X_{i_1},\dots,X_{i_m})\, \tilde{h}(X_{j_1},\dots,X_{j_m})] = \zeta_k \quad \text{for } k = 1,\dots,m.$$

Then, by (2),

$$\mathrm{Var}(U_n) = \binom{n}{m}^{-2} \sum_c \sum_c E[\tilde{h}(X_{i_1},\dots,X_{i_m})\, \tilde{h}(X_{j_1},\dots,X_{j_m})] = \binom{n}{m}^{-2} \sum_{k=1}^m \binom{n}{m} \binom{m}{k} \binom{n-m}{m-k} \zeta_k.$$

This proves the result.

Corollary 3.2

Under the condition of Theorem 3.4,
(i) $\frac{m^2}{n}\, \zeta_1 \le \mathrm{Var}(U_n) \le \frac{m}{n}\, \zeta_m$;
(ii) $(n+1)\, \mathrm{Var}(U_{n+1}) \le n\, \mathrm{Var}(U_n)$ for any $n > m$;
(iii) for any fixed $m$ and $k = 1,\dots,m$, if $\zeta_j = 0$ for $j < k$ and $\zeta_k > 0$, then

$$\mathrm{Var}(U_n) = \frac{k! \binom{m}{k}^2 \zeta_k}{n^k} + O\!\left(\frac{1}{n^{k+1}}\right).$$

For any fixed $m$, if $\zeta_j = 0$ for $j < k$ and $\zeta_k > 0$, then the mse of $U_n$ is of the order $n^{-k}$ and, therefore, $U_n$ is $n^{k/2}$-consistent.
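As a concrete illustration (a minimal sketch of our own, not part of the original notes), the following Python snippet evaluates (1) by brute force with `itertools.combinations` and reproduces the three order-2 kernels above; the function name `u_statistic` and all parameter choices are ours.

```python
import itertools
import math

import numpy as np

def u_statistic(x, h, m):
    """Average the kernel h over all (n choose m) combinations of m
    distinct observations, as in (1); O(n^m), so for illustration only."""
    n = len(x)
    total = sum(h(*c) for c in itertools.combinations(x, m))
    return total / math.comb(n, m)

rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=2.0, size=200)

# h(x1, x2) = (x1 - x2)^2 / 2 gives the sample variance S^2.
print(u_statistic(x, lambda a, b: (a - b) ** 2 / 2, 2), np.var(x, ddof=1))

# h(x1, x2) = |x1 - x2| gives Gini's mean difference.
print(u_statistic(x, lambda a, b: abs(a - b), 2))

# h(x1, x2) = 1{x1 + x2 <= 0} gives the one-sample Wilcoxon statistic.
print(u_statistic(x, lambda a, b: float(a + b <= 0), 2))
```

The first line printed should match `np.var(x, ddof=1)` exactly, which is the identity $U_n = S^2$ derived above; the brute-force enumeration is simply the literal transcription of (1), not an efficient implementation.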
Example 3.11

Consider $h(x_1,x_2) = x_1 x_2$, the U-statistic unbiased for $\mu^2$, $\mu = EX_1$. Note that $h_1(x_1) = \mu x_1$, $\tilde{h}_1(x_1) = \mu(x_1 - \mu)$, $\zeta_1 = E[\tilde{h}_1(X_1)^2] = \mu^2\, \mathrm{Var}(X_1) = \mu^2 \sigma^2$, $\tilde{h}(x_1,x_2) = x_1 x_2 - \mu^2$, and $\zeta_2 = \mathrm{Var}(X_1 X_2) = E[(X_1 X_2)^2] - \mu^4 = (\mu^2 + \sigma^2)^2 - \mu^4$.

By Theorem 3.4, for $U_n = \binom{n}{2}^{-1} \sum_{1 \le i < j \le n} X_i X_j$,

$$\mathrm{Var}(U_n) = \binom{n}{2}^{-1} \left[ \binom{2}{1}\binom{n-2}{1} \zeta_1 + \binom{2}{2}\binom{n-2}{0} \zeta_2 \right] = \frac{2}{n(n-1)} \left[ 2(n-2)\mu^2\sigma^2 + (\mu^2+\sigma^2)^2 - \mu^4 \right] = \frac{4\mu^2\sigma^2}{n} + \frac{2\sigma^4}{n(n-1)}.$$

Next, consider $h(x_1,x_2) = I_{(-\infty,0]}(x_1+x_2)$, which leads to the one-sample Wilcoxon statistic. Note that $h_1(x_1) = P(x_1 + X_2 \le 0) = F(-x_1)$, where $F$ is the c.d.f. of $P$. Then $\zeta_1 = \mathrm{Var}(F(-X_1))$. Let $\vartheta = E[h(X_1,X_2)]$. Then $\zeta_2 = \mathrm{Var}(h(X_1,X_2)) = \vartheta(1-\vartheta)$. Hence, for $U_n$ being the one-sample Wilcoxon statistic,

$$\mathrm{Var}(U_n) = \frac{2}{n(n-1)} \left[ 2(n-2)\zeta_1 + \vartheta(1-\vartheta) \right].$$

If $F$ is continuous and symmetric about 0, then $\zeta_1$ can be simplified: since $F(X_1)$ has the uniform distribution on $[0,1]$,

$$\zeta_1 = \mathrm{Var}(F(-X_1)) = \mathrm{Var}(1 - F(X_1)) = \mathrm{Var}(F(X_1)) = \tfrac{1}{12}.$$

Asymptotic distributions of U-statistics

For nonparametric $\mathcal{P}$, the exact distribution of $U_n$ is hard to derive. We study the method of projection, which is particularly effective for studying asymptotic distributions of U-statistics.

Definition 3.3

Let $T_n$ be a given statistic based on $X_1,\dots,X_n$. The projection of $T_n$ on $k_n$ random elements $Y_1,\dots,Y_{k_n}$ is defined to be

$$\check{T}_n = E(T_n) + \sum_{i=1}^{k_n} [E(T_n \mid Y_i) - E(T_n)].$$

Let $\check{T}_n$ be the projection of $T_n$ on $X_1,\dots,X_n$, and $\psi_n(X_i) = E(T_n \mid X_i)$. If $T_n$ is symmetric (as a function of $X_1,\dots,X_n$), then $\psi_n(X_1),\dots,\psi_n(X_n)$ are i.i.d. with mean $E[\psi_n(X_i)] = E[E(T_n \mid X_i)] = E(T_n)$. If $E(T_n^2) < \infty$ and $\mathrm{Var}(\psi_n(X_i)) > 0$, then, by the CLT,

$$\frac{1}{\sqrt{n\, \mathrm{Var}(\psi_n(X_1))}} \sum_{i=1}^n [\psi_n(X_i) - E(T_n)] \to_d N(0,1). \qquad (3)$$

If we can show that $T_n - \check{T}_n$ has a negligible order, then we can derive the asymptotic distribution of $T_n$ by using (3) and Slutsky's theorem.

Lemma 3.1

Let $T_n$ be a symmetric statistic with $\mathrm{Var}(T_n) < \infty$ for every $n$, and let $\check{T}_n$ be the projection of $T_n$ on $X_1,\dots,X_n$. Then $E(\check{T}_n) = E(T_n)$ and

$$E(T_n - \check{T}_n)^2 = \mathrm{Var}(T_n) - \mathrm{Var}(\check{T}_n).$$

Proof

Since $E(\check{T}_n) = E(T_n)$,

$$E(T_n - \check{T}_n)^2 = \mathrm{Var}(T_n) + \mathrm{Var}(\check{T}_n) - 2\, \mathrm{Cov}(T_n, \check{T}_n)$$

and

$$\mathrm{Cov}(T_n, \check{T}_n) = E(T_n \check{T}_n) - [E(T_n)]^2 = n E[T_n E(T_n \mid X_i)] - n[E(T_n)]^2 = n E\{E[T_n E(T_n \mid X_i) \mid X_i]\} - n[E(T_n)]^2 = n E\{[E(T_n \mid X_i)]^2\} - n[E(T_n)]^2 = n\, \mathrm{Var}(E(T_n \mid X_i)) = \mathrm{Var}(\check{T}_n).$$

For a U-statistic $U_n$, one can show (exercise) that

$$\check{U}_n = E(U_n) + \frac{m}{n} \sum_{i=1}^n \tilde{h}_1(X_i),$$

where $\check{U}_n$ is the projection of $U_n$ on $X_1,\dots,X_n$ and

$$\tilde{h}_1(x) = h_1(x) - E[h(X_1,\dots,X_m)], \qquad h_1(x) = E[h(x,X_2,\dots,X_m)].$$

Hence, if $\zeta_1 = \mathrm{Var}(\tilde{h}_1(X_i)) > 0$, then

$$\mathrm{Var}(\check{U}_n) = m^2 \zeta_1 / n$$

and, by Corollary 3.2 and Lemma 3.1,

$$E(U_n - \check{U}_n)^2 = O(n^{-2}).$$

This is enough for establishing the asymptotic distribution of $U_n$. If $\zeta_1 = 0$ but $\zeta_2 > 0$, then we can show that

$$E(U_n - \check{U}_n)^2 = O(n^{-3}).$$

One may derive results for the cases where $\zeta_2 = 0$, but the case of either $\zeta_1 > 0$ or $\zeta_2 > 0$ is the most interesting one in applications.

Theorem 3.5

Let $U_n$ be a U-statistic with $E[h(X_1,\dots,X_m)]^2 < \infty$.
(i) If $\zeta_1 > 0$, then

$$\sqrt{n}\,[U_n - E(U_n)] \to_d N(0, m^2 \zeta_1).$$

(ii) If $\zeta_1 = 0$ but $\zeta_2 > 0$, then

$$n[U_n - E(U_n)] \to_d \frac{m(m-1)}{2} \sum_{j=1}^{\infty} \lambda_j (\chi^2_{1j} - 1), \qquad (4)$$

where the $\chi^2_{1j}$'s are i.i.d. random variables having the chi-square distribution $\chi^2_1$ and the $\lambda_j$'s are some constants (which may depend on $P$) satisfying $\sum_{j=1}^{\infty} \lambda_j^2 = \zeta_2$.
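The variance formula in Example 3.11 is easy to check by simulation. Below is a sketch of our own (the helper name `u_mean_product` and the parameter values are assumptions): it exploits the algebraic identity $\sum_{i<j} X_i X_j = [(\sum_i X_i)^2 - \sum_i X_i^2]/2$ to compute $U_n$ in $O(n)$ time, then compares the Monte Carlo variance of $U_n$ with $4\mu^2\sigma^2/n + 2\sigma^4/(n(n-1))$.

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n, reps = 1.0, 2.0, 30, 20000

def u_mean_product(x):
    """U-statistic with kernel h(x1, x2) = x1 * x2 (unbiased for mu^2),
    computed via sum_{i<j} x_i x_j = ((sum x)^2 - sum x^2) / 2."""
    s = x.sum()
    return (s * s - (x * x).sum()) / (len(x) * (len(x) - 1))

sims = np.array([u_mean_product(rng.normal(mu, sigma, n)) for _ in range(reps)])

# Example 3.11: Var(U_n) = 4 mu^2 sigma^2 / n + 2 sigma^4 / (n (n - 1)).
exact = 4 * mu**2 * sigma**2 / n + 2 * sigma**4 / (n * (n - 1))
print(sims.var(), exact)   # should agree up to Monte Carlo error
print(sims.mean(), mu**2)  # unbiasedness: E(U_n) = mu^2
```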
Lemma 3.2

Let $Y$ be the random variable on the right-hand side of (4). Then $EY^2 = \frac{m^2(m-1)^2}{2}\, \zeta_2$.

It follows from Corollary 3.2(iii) and Lemma 3.2 that if $\zeta_1 = 0$, then

$$\mathrm{amse}_{U_n}(P) = \frac{m^2(m-1)^2}{2}\, \zeta_2 / n^2 = \mathrm{Var}(U_n) + O(n^{-3}).$$

Proof of Lemma 3.2

Since the $\chi^2_{1j}$'s are i.i.d. with $E(\chi^2_{1j} - 1) = 0$ and $\mathrm{Var}(\chi^2_{1j}) = 2$, the cross terms vanish and

$$EY^2 = \left[\frac{m(m-1)}{2}\right]^2 \sum_{j=1}^{\infty} \lambda_j^2\, E(\chi^2_{1j} - 1)^2 = \left[\frac{m(m-1)}{2}\right]^2 \cdot 2\zeta_2 = \frac{m^2(m-1)^2}{2}\, \zeta_2.$$
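To see Theorem 3.5(ii) and Lemma 3.2 in action, take the kernel $h(x_1,x_2) = x_1 x_2$ of Example 3.11 with $\mu = 0$, so that $\zeta_1 = \mu^2\sigma^2 = 0$ and $\zeta_2 = \sigma^4$. For this kernel, (4) reduces to the single term $\sigma^2(\chi^2_1 - 1)$; the claim $\lambda_1 = \sigma^2$ is our own worked special case ($\tilde{h}(x,y) = xy$ is rank one with eigenfunction $x/\sigma$), not stated in the notes. A simulation sketch, assuming SciPy is available for the two-sample Kolmogorov-Smirnov comparison:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
sigma, n, reps = 1.5, 500, 5000

def u_mean_product(x):
    # U-statistic with kernel h(x1, x2) = x1 * x2, as in Example 3.11.
    s = x.sum()
    return (s * s - (x * x).sum()) / (len(x) * (len(x) - 1))

# mu = 0 forces zeta_1 = 0: the degenerate case, so n * U_n (rather than
# sqrt(n) * U_n) has a nondegenerate limit.
sims = np.array([n * u_mean_product(rng.normal(0.0, sigma, n))
                 for _ in range(reps)])

# Limit law from (4) with m = 2 and the single weight lambda_1 = sigma^2.
limit = sigma**2 * (rng.chisquare(1, reps) - 1)

print(sims.var(), 2 * sigma**4)  # EY^2 = (m^2 (m-1)^2 / 2) zeta_2 = 2 sigma^4
print(stats.ks_2samp(sims, limit).pvalue)  # large p-value indicates a match
```

The empirical second moment of $nU_n$ should be close to $2\sigma^4$, the value given by Lemma 3.2 with $m = 2$; the KS p-value is only an informal check, since at finite $n$ the distribution of $nU_n$ matches the limit only approximately.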