Iowa State University Digital Repository
Retrospective Theses and Dissertations
2006

Recommended Citation: Li, Xiaoxi, "Applications of nonparametric regression in survey statistics" (2006). Retrospective Theses and Dissertations. 3055. https://lib.dr.iastate.edu/rtd/3055

Applications of nonparametric regression in survey statistics
by
Xiaoxi Li
A thesis submitted to the graduate faculty in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY
Major: Statistics
Program of Study Committee: Jean D. Opsomer, Major Professor Song X. Chen Barry L. Falk Wayne A. Fuller Michael D. Larsen
Iowa State University
Ames, Iowa
2006
Copyright © Xiaoxi Li, 2006. All rights reserved.
UMI Number: 3243537
To my family
TABLE OF CONTENTS
CHAPTER 1 Model-based variance estimation for systematic sampling ...... 1
1.1 Systematic sampling ...... 1
1.2 Variance estimation for systematic sampling ...... 3
1.3 Variance estimators under linear regression models ...... 6
1.4 Variance estimators under nonparametric models ...... 16
1.5 Simulation Study ...... 32
1.6 Simulation conclusions ...... 37
CHAPTER 2 Applications of model-based nonparametric variance estimator in Forest Inventory and Analysis (FIA) ...... 49
2.1 Introduction ...... 49
2.2 Variance estimation ...... 50
2.3 Conclusions ...... 55
CHAPTER 3 Model averaging in survey estimation ...... 74
3.1 Regression estimator ...... 74
3.2 Model averaging ...... 77
3.3 Simulation setup ...... 82
3.4 Simulation results ...... 84
3.5 Simulation conclusions ...... 90
APPENDIX A Theorems, Corollaries and Lemmas for Chapter 1 . . 116
APPENDIX B Using R to calculate αk’s in Chapter 3 ...... 121
BIBLIOGRAPHY ...... 124
LIST OF TABLES
Table 1.1 Systematic selection of n from population of N = nk elements. Same location b is selected from n rows ...... 2
Table 1.2 Relative biases when populations are sorted by x ...... 40
Table 1.3 Relative biases when populations are sorted by z1 ...... 41
Table 1.4 Relative biases when populations are sorted by z2 ...... 42
Table 1.5 Comparison of MSEs when populations are sorted by x ...... 43
Table 1.6 Comparison of MSEs when populations are sorted by z1 . . . . . 44
Table 1.7 Comparison of MSEs when populations are sorted by z2 ...... 45
Table 1.8 Comparison of MSPEs when populations are sorted by x ...... 46
Table 1.9 Comparison of MSPEs when populations are sorted by z1 . . . . 47
Table 1.10 Comparison of MSPEs when populations are sorted by z2 . . . . 48
Table 2.1 Mean and variance estimates under model (2.1) for FIA data ...... 52
Table 2.2 Mean and variance estimates under model (2.2) for FIA data ...... 53
Table 2.3 Mean and variance estimates under model (2.2) for FIA data. Different span for each predictor variable ...... 54
Table 3.1 List of population functions ...... 83
Table 3.2 List of case IDs and corresponding descriptions ...... 84
Table 3.3 Relative Bias, SRS design, R^2 = 0.75 and n = 500 ...... 92
Table 3.4 Relative Bias, SRS design, R^2 = 0.25 and n = 500 ...... 92
Table 3.5 Relative Bias, SRS design, R^2 = 0.75 and n = 100 ...... 93
Table 3.6 Relative Bias, SRS design, R^2 = 0.25 and n = 100 ...... 93
Table 3.7 Relative Bias, STSRS design, R^2 = 0.75 and n = 500 ...... 94
Table 3.8 Relative Bias, STSRS design, R^2 = 0.25 and n = 500 ...... 94
Table 3.9 Relative Bias, STSRS design, R^2 = 0.75 and n = 100 ...... 95
Table 3.10 Relative Bias, STSRS design, R^2 = 0.25 and n = 100 ...... 95
Table 3.11 Relative MSE, SRS design, R^2 = 0.75 and n = 500 ...... 96
Table 3.12 Relative MSE, SRS design, R^2 = 0.25 and n = 500 ...... 96
Table 3.13 Relative MSE, SRS design, R^2 = 0.75 and n = 100 ...... 97
Table 3.14 Relative MSE, SRS design, R^2 = 0.25 and n = 100 ...... 97
Table 3.15 Relative MSE, STSRS design, R^2 = 0.75 and n = 500 ...... 98
Table 3.16 Relative MSE, STSRS design, R^2 = 0.25 and n = 500 ...... 98
Table 3.17 Relative MSE, STSRS design, R^2 = 0.75 and n = 100 ...... 99
Table 3.18 Relative MSE, STSRS design, R^2 = 0.25 and n = 100 ...... 99
Table 3.19 Model Averaging coefficients and standard errors for population 'Linear' and SRS design ...... 100
Table 3.20 Model Averaging coefficients and standard errors for population 'Quadratic' and SRS design ...... 101
Table 3.21 Model Averaging coefficients and standard errors for population 'Bump' and SRS design ...... 102
Table 3.22 Model Averaging coefficients and standard errors for population 'Jump' and SRS design ...... 103
Table 3.23 Model Averaging coefficients and standard errors for population 'Normal CDF' and SRS design ...... 104
Table 3.24 Model Averaging coefficients and standard errors for population 'Exponential' and SRS design ...... 105
Table 3.25 Model Averaging coefficients and standard errors for population 'Slow sine' and SRS design ...... 106
Table 3.26 Model Averaging coefficients and standard errors for population 'Fast sine' and SRS design ...... 107
Table 3.27 Model Averaging coefficients and standard errors for population 'Linear' and STSRS design ...... 108
Table 3.28 Model Averaging coefficients and standard errors for population 'Quadratic' and STSRS design ...... 109
Table 3.29 Model Averaging coefficients and standard errors for population 'Bump' and STSRS design ...... 110
Table 3.30 Model Averaging coefficients and standard errors for population 'Jump' and STSRS design ...... 111
Table 3.31 Model Averaging coefficients and standard errors for population 'Normal CDF' and STSRS design ...... 112
Table 3.32 Model Averaging coefficients and standard errors for population 'Exponential' and STSRS design ...... 113
Table 3.33 Model Averaging coefficients and standard errors for population 'Slow sine' and STSRS design ...... 114
Table 3.34 Model Averaging coefficients and standard errors for population 'Fast sine' and STSRS design ...... 115
LIST OF FIGURES
Figure 2.1 Map of study region in northern Utah ...... 56
Figure 2.2 Illustration of 4-per-stratum design ...... 57
Figure 2.3 Contour plot of fitted BIOMASS vs. location, span=0.5 ...... 58
Figure 2.4 Contour plot of fitted BIOMASS vs. location, span=0.2 ...... 59
Figure 2.5 Contour plot of fitted BIOMASS vs. location, span=0.1 ...... 60
Figure 2.6 Contour plot of fitted CRCOV vs. location, span=0.1 ...... 61
Figure 2.7 Contour plot of fitted BA vs. location, span=0.1 ...... 62
Figure 2.8 Contour plot of fitted NVOLTOT vs. location, span=0.1 ...... 63
Figure 2.9 Contour plot of fitted FOREST vs. location, span=0.1 ...... 64
Figure 2.10 Contour plot of fitted BIOMASS (with elevation) vs. location, span=0.1 ...... 65
Figure 2.11 Contour plot of elevation vs. location ...... 66
Figure 2.12 Gam component plot for BIOMASS (contributed by elevation) vs. elevation, span=0.1 ...... 67
Figure 2.13 Gam component plot for BIOMASS (contributed by location) vs. location, span=0.1 ...... 68
Figure 2.14 Gam component plots for BIOMASS, span=0.1 and 0.3 ...... 69
Figure 2.15 Gam component plots for CRCOV, span=0.1 and 0.3 ...... 70
Figure 2.16 Gam component plots for BA, span=0.1 and 0.3 ...... 71
Figure 2.17 Gam component plots for NVOLTOT, span=0.1 and 0.3 ...... 72
Figure 2.18 Gam component plots for FOREST, span=0.1 and 0.3 ...... 73
ACKNOWLEDGEMENTS
I would like to gratefully acknowledge my adviser Dr. Jean Opsomer's enthusiastic supervision during this work. I greatly appreciate his help in suggesting the topic and guiding me through difficulties throughout the whole time. I also thank my PhD committee members, Dr. Wayne Fuller, Dr. Song Chen, Dr. Michael Larsen and Dr. Barry Falk, for investing their time in my work and giving me insightful advice. I am very thankful as well to the other faculty and staff members at the Center for Survey Statistics and Methodology (CSSM) for their support. I am also grateful to all my friends at Iowa State University for being my surrogate family during the five years I stayed there. Finally, I am forever indebted to my parents for their love and inspiration. Without them, I could not have imagined the completion of this work.
CHAPTER 1 Model-based variance estimation for systematic sampling
1.1 Systematic sampling
Systematic sampling is widely used in surveys of finite populations due to its appealing simplicity and efficiency. If applied properly, it can reflect stratifications in the population and thus can be more precise than random sampling.
Suppose that the population size is $N$ and that the study variable is $Y_j \in \mathbb{R}$, $j = 1, 2, \cdots, N$. Then the population mean is

$$\bar{Y}_N = \frac{1}{N} \sum_{j=1}^{N} Y_j.$$
Let $n$ denote the sample size and $k = N/n$ denote the sampling interval. For simplicity, we assume that $N$ is an integral multiple of $n$, i.e. $k$ is an integer. Let $X_j \in \mathbb{R}^p$ ($j = 1, 2, \cdots, N$) be vectors of $p$ auxiliary variables, where $Y_j$ is some continuous and finite function of the $X_j$'s.

To draw a systematic sample, we first sort the population by some criterion. For
example, we can sort by one of the auxiliary variables in $X_j$. If the study variable $Y$ and the auxiliary variables $X$ are related through a function, sorting by $X$ may provide a good 'spread' of the $Y$'s, so that a systematic sample can pick up hidden stratifications in the population. If we sort the population by some criterion that is not related to $Y$ at all, for instance a variable $Z$ that is independent of $Y$, then we will have a random permutation of the population. In this case, systematic sampling is equivalent to simple
random sampling without replacement (SRS).

After sorting the population, we randomly choose an element from the first $k$ elements, say the $b$th one; then this systematic sample, denoted by $S_b$, consists of the observations with labels

$$b, \ b+k, \ \ldots, \ b+(n-1)k.$$
Table 1.1 illustrates this procedure. Each column corresponds to a possible systematic sample. As we can see, the interval $k$ divides the population into $n$ rows of $k$ elements each. One element is selected from each row, and each selected element occupies the same location in its row.
Sample:     S_1            ...   S_b                ...   S_k
Y values:   Y_1            ...   Y_b                ...   Y_k
            Y_{1+k}        ...   Y_{b+k}            ...   Y_{2k}
            ...                  ...                      ...
            Y_{1+(n-1)k}   ...   Y_{b+(n-1)k}       ...   Y_N

Table 1.1: Systematic selection of n from a population of N = nk elements. The same location b is selected from each of the n rows.
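The selection procedure in Table 1.1 is straightforward to implement. The following minimal sketch (Python is used here purely for illustration; the function name is ours, not part of the dissertation) draws the labels $b, b+k, \ldots, b+(n-1)k$:

```python
import random

def systematic_sample(N, n, rng=random):
    """Draw one systematic sample of size n from a population of
    N = n*k elements (k = N/n must be an integer).
    Returns the 0-based labels b, b+k, ..., b+(n-1)*k."""
    if N % n != 0:
        raise ValueError("N must be an integral multiple of n")
    k = N // n                      # sampling interval
    b = rng.randrange(k)            # random start among the first k elements
    return [b + t * k for t in range(n)]
```

Because consecutive labels differ by exactly $k$, the draw picks one element from each of the $n$ rows of Table 1.1, always at the same within-row location $b$.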
Note that there are $k$ possible values for $b$, so there are $k$ possible samples for a population, and the selection probability of each element in the population is therefore $1/k$. In systematic sampling, the population mean is estimated by the $b$th sample mean:

$$\bar{Y}_{S_b} = \frac{1}{n} \sum_{j \in S_b} Y_j.$$

The design-based variance of the sample mean $\bar{Y}_S$, denoted by $\mathrm{Var}_p(\bar{Y}_S)$, was first developed by Madow and Madow (1944), where

$$\mathrm{Var}_p(\bar{Y}_S) = \frac{1}{k} \sum_{b=1}^{k} \left( \bar{Y}_{S_b} - \bar{Y}_N \right)^2. \qquad (1.1)$$

In $\mathrm{Var}_p(\bar{Y}_S)$, $S$ is a member of the set of possible samples $\{S_1, \cdots, S_k\}$. In the terminology of cluster sampling, systematic sampling is equivalent to grouping the population into $k$ clusters, each of size $n$, and drawing an SRS sample of one cluster. Thus, there is no unbiased estimator for the design variance $\mathrm{Var}_p(\bar{Y}_S)$, because there are not enough degrees of freedom to estimate this quantity (Iachan, 1982).
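Since a population of size $N = nk$ admits only $k$ distinct systematic samples, the design variance (1.1) can be computed exactly by enumeration for any small population. A sketch (Python for illustration; names are ours):

```python
def design_variance(Y, n):
    """Exact design-based variance (1.1) of the systematic sample mean:
    the average squared deviation of the k possible sample means from
    the population mean, each sample selected with probability 1/k."""
    N = len(Y)
    if N % n != 0:
        raise ValueError("N must be an integral multiple of n")
    k = N // n
    ybar_N = sum(Y) / N
    sample_means = [sum(Y[b::k]) / n for b in range(k)]  # the k possible samples
    return sum((m - ybar_N) ** 2 for m in sample_means) / k
```

For the sorted population $(0,0,0,1,1,1,2,2,2)$ with $n = 3$, every sample mean equals the population mean and the design variance is zero, while the $k$-periodic ordering $(0,1,2,0,1,2,0,1,2)$ makes each sample homogeneous and the design variance large; this illustrates numerically why the sort order matters.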
1.2 Variance estimation for systematic sampling
As we have shown, it is impossible to derive an unbiased randomization-based estimator for $\mathrm{Var}_p(\bar{Y}_S)$. Several alternative design-based approaches are discussed in the literature. One is to use biased variance estimators. Särndal et al. (1992) remarked that the estimator for the SRS design, i.e.

$$\hat{V}_{SRS}(\bar{Y}_S) = \frac{1-f}{n} \frac{1}{n-1} \sum_{j \in S} (Y_j - \bar{Y}_S)^2, \qquad (1.2)$$

where $f = n/N$, will often overestimate the variance. A more comprehensive review can be found in Wolter (1985), where eight biased variance estimators were described and guidelines for choosing among them were given. Wolter (1985) suggested that the estimators with good properties on average are the one based on overlapping differences (Kish 1965, sec. 4.1), i.e.

$$\hat{V}_{OL}(\bar{Y}_S) = \frac{1-f}{n} \frac{1}{2(n-1)} \sum_{j=2}^{n} (Y_j - Y_{j-1})^2, \qquad (1.3)$$

and the one based on nonoverlapping differences (Kish 1965, sec. 4.1), i.e.

$$\hat{V}_{NO}(\bar{Y}_S) = \frac{1-f}{n} \frac{1}{n} \sum_{j=1}^{n/2} (Y_{2j} - Y_{2j-1})^2. \qquad (1.4)$$
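The three biased estimators (1.2)-(1.4) use only the $n$ observed values from a single sample. A sketch (Python for illustration; function names are ours), with `y` holding the sampled values in their systematic order:

```python
def v_srs(y, f):
    """SRS-type estimator (1.2): (1-f)/n times the sample variance."""
    n = len(y)
    ybar = sum(y) / n
    return (1 - f) / n * sum((yj - ybar) ** 2 for yj in y) / (n - 1)

def v_ol(y, f):
    """Overlapping-differences estimator (1.3)."""
    n = len(y)
    return (1 - f) / n * sum((y[j] - y[j - 1]) ** 2 for j in range(1, n)) / (2 * (n - 1))

def v_no(y, f):
    """Nonoverlapping-differences estimator (1.4); assumes n is even."""
    n = len(y)
    return (1 - f) / n * sum((y[2 * j + 1] - y[2 * j]) ** 2 for j in range(n // 2)) / n
```

The difference-based estimators replace the overall sample variance by averages of squared successive differences, which is why they behave better when the sorted values carry a trend.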
In the case where the sample size is very small, the overlapping difference estimator $\hat{V}_{OL}(\bar{Y}_S)$ is preferred, as it has more degrees of freedom than the nonoverlapping difference estimator.

Another approach is to take more than one sample. Törnqvist (1963) discussed a method that follows the idea of interpenetrating samples introduced by Mahalanobis (1946). Let $\bar{Y}_{S_1}, \cdots, \bar{Y}_{S_c}$ be the means of $c$ independent systematic subsamples; then the variance of $\bar{Y}_S^* = c^{-1} \sum_{j=1}^{c} \bar{Y}_{S_j}$ is unbiasedly estimated by

$$\hat{V}(\bar{Y}_S^*) = \frac{1}{c(c-1)} \sum_{j=1}^{c} \left( \bar{Y}_{S_j} - \bar{Y}_S^* \right)^2.$$
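The interpenetrating-subsamples estimator depends only on the $c$ subsample means. A one-function sketch (Python for illustration; the name is ours):

```python
def interpenetrating_variance(subsample_means):
    """Unbiased variance estimator for the combined mean of c independent
    systematic subsamples: sum of squared deviations of the subsample
    means about their average, divided by c(c-1)."""
    c = len(subsample_means)
    ybar_star = sum(subsample_means) / c
    return sum((m - ybar_star) ** 2 for m in subsample_means) / (c * (c - 1))
```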
Note that the above design gives an unbiased variance estimator, but it is essentially a different design from the systematic sampling that will be discussed in this work. Zinger (1980) pursued an approach, defined as partially systematic sampling, that gives an unbiased and positive variance estimator by combining a systematic sample with a simple random sample from the remaining population. Let $\bar{Y}_S$ and $\bar{Y}_R$ denote the systematic sample mean and the simple random sample mean, respectively. Let $n_S$ and $n_R$ be the sample sizes of the systematic sample and the simple random sample, respectively. Then the combined sample mean, denoted by $\bar{Y}(\beta)$, is

$$\bar{Y}(\beta) = (1-\beta)\bar{Y}_S + \beta \bar{Y}_R,$$

with $\beta \geq 0$. The design-based variance is

$$\mathrm{Var}_p(\bar{Y}(\beta)) = b_1(\beta) S^2_{Y_U} + b_2(\beta) \mathrm{Var}_p(\bar{Y}_S),$$

where

$$S^2_{Y_U} = \frac{1}{N-1} \sum_{j \in U} (Y_j - \bar{Y}_U)^2,$$
$$b_1(\beta) = \beta^2 (N-1)(N - n_S - n_R)/[n_R N (N - n_S - 1)],$$
$$b_2(\beta) = (1 - k\beta/(k-1))^2 - (N - n_S - n_R)\beta^2/[n_R (k-1)^2 (N - n_S - 1)].$$

An unbiased estimator for $\mathrm{Var}_p(\bar{Y}(\beta))$ is then

$$\hat{V}(\bar{Y}(\beta)) = b_1(\beta) s^2 + b_2(\beta) v,$$

where

$$s^2 = \frac{N}{N-1} \cdot \frac{\alpha_2 Q(0) - n\alpha_1 (\bar{Y}_S - \bar{Y}_R)^2}{(n_S + n_R)\alpha_2 - \alpha_1 (N - n_S - n_R)/(N - n_S - 1)},$$

$$v = (k-1) \frac{(N - n_S - n_R) Q(0)/(N - n_S - 1) - n_R (n_S + n_R)(\bar{Y}_S - \bar{Y}_R)^2}{(N - n_S - n_R)\alpha_1/(N - n_S - 1) - (n_S + n_R)\alpha_2},$$

and

$$\alpha_1 = n_S + n_R + k n_R - N,$$
$$\alpha_2 = k n_R + n_R + [n_S(n_R - 1)/(N - n_S - 1)].$$
The above variance estimation methods are conditional on the design. In other words, they are design-based in the sense that we treat the finite population as fixed. There also exist model-based variance estimators, where the population is considered a random realization from a superpopulation model. Wolter (1985) described a general model-based approach to variance estimation, similar to the approach we will follow in this work. He pointed out that a practical difficulty with that model-based variance estimator is the correct specification of the superpopulation model. Montanari and Bartolucci (1998) proposed a linear model-based variance estimator, which is approximately unbiased for the anticipated variance, i.e. the expectation of the design-based variance of the sample mean under a linear superpopulation model. However, it may lack accuracy and efficiency due to a higher contribution of the bias if the systematic component of the superpopulation is significantly different from linear. A new class of unbiased estimators was proposed by Bartolucci and Montanari (2006), denoted by $\hat{V}_L(\bar{Y}_S)$ in what follows. Based on that approach, they proposed two new estimators that are unbiased under linear models.
We propose a model-based nonparametric variance estimator based on local polynomial regression. Section 1.3 reviews the model-based variance results under the linear superpopulation model. In section 1.4, we study the properties of the proposed local polynomial variance estimator under the nonparametric superpopulation model. Simulation results and conclusions are presented in section 1.5.
1.3 Variance estimators under linear regression models
Due to the fact that no unbiased estimator for the design variance $\mathrm{Var}_p(\bar{Y}_S)$ exists, we consider the model-based context, where the finite population is regarded as a random realization from a superpopulation model. The following section is a continuation of Bartolucci and Montanari (2006)'s work. First, let us introduce their model-based variance estimator.

Let $Y_j \in \mathbb{R}$ ($j = 1, 2, \cdots$) be a set of independent and identically distributed random variables. Let $X_j \in \mathbb{R}^p$ ($j = 1, 2, \cdots$) be vectors of auxiliary variables, which we consider to be fixed with respect to the superpopulation model. First, let us examine the case where this superpopulation model is linear. We use $L$ to denote the following linear model

$$Y = X^* \beta^* + \varepsilon, \qquad (1.5)$$

where $E_L(\varepsilon) = 0$ and $\mathrm{Var}_L(\varepsilon) = \sigma_L^2 \Omega$, with $E_L$ and $\mathrm{Var}_L$ denoting the expectation and variance under the model $L$, respectively. We assume the errors to be mutually independent, i.e. $\Omega$ is a diagonal matrix, $\Omega = \mathrm{diag}\{\omega_1, \omega_2, \cdots, \omega_N\}$. Each element of $\Omega$ is assumed to be known.

In model (1.5), $Y = (Y_1, Y_2, \cdots, Y_N)^T$, $\beta^* = (\beta_0, \beta_1, \cdots, \beta_p)^T$, and

$$X^* = \begin{pmatrix} 1 & X_{11} & \cdots & X_{1p} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & X_{N1} & \cdots & X_{Np} \end{pmatrix}.$$
We can write the design variance as

$$\mathrm{Var}_p(\bar{Y}_S) = \frac{1}{k} \sum_{b=1}^{k} (\bar{Y}_{S_b} - \bar{Y}_N)^2 = \frac{1}{kn^2} Y^T D Y, \qquad (1.6)$$

where $D = M^T H M$, with $M = 1_n^T \otimes I_k$ and $H = I_k - \frac{1}{k} 1_k 1_k^T$. Here $\otimes$ is the Kronecker product and $1_r$ is a column vector of 1's of length $r$. Specifically, $H$ is a $k \times k$ matrix with diagonal elements $1 - \frac{1}{k}$ and off-diagonal elements $-\frac{1}{k}$, and $D$ is an $N \times N$ matrix consisting of $n$ block-rows and $n$ block-columns of $H$, i.e.

$$D = \begin{pmatrix} H & H & \cdots & H \\ \vdots & \vdots & \ddots & \vdots \\ H & H & \cdots & H \end{pmatrix}, \qquad H = \begin{pmatrix} 1-\frac{1}{k} & -\frac{1}{k} & \cdots & -\frac{1}{k} \\ \vdots & \vdots & \ddots & \vdots \\ -\frac{1}{k} & -\frac{1}{k} & \cdots & 1-\frac{1}{k} \end{pmatrix}. \qquad (1.7)$$

Let $X$ denote the submatrix of $X^*$ without the first column and $\beta$ the subvector of $\beta^*$ without the first element $\beta_0$. Then the model anticipated variance of $\bar{Y}_S$ is

$$E_L[\mathrm{Var}_p(\bar{Y}_S)] = \frac{1}{kn^2} \beta^T X^T D X \beta + \frac{1}{kn^2} \mathrm{tr}(D\Omega) \sigma_L^2. \qquad (1.8)$$

Bartolucci and Montanari (2006) proposed an unbiased estimator for $E_L[\mathrm{Var}_p(\bar{Y}_S)]$, defined as

$$\hat{V}_L(\bar{Y}_S) = \frac{1}{kn^2} \hat{\beta}_b^T X^T D X \hat{\beta}_b - t\hat{\sigma}_{Lb}^2 + \frac{1}{kn^2} \mathrm{tr}(D\Omega) \hat{\sigma}_{Lb}^2, \qquad (1.9)$$

where $\hat{\beta}_b$ is the weighted least squares (WLS) estimator for $\beta$ from the $b$th sample and
$\hat{\sigma}_{Lb}^2$ is a model unbiased estimator for $\sigma_L^2$. Specifically,

$$\hat{\beta}_b = (X_b^T \Omega_b^{-1} X_b)^{-1} X_b^T \Omega_b^{-1} Y_b,$$

and

$$\hat{\sigma}_{Lb}^2 = \frac{(Y_b - X_b \hat{\beta}_b)^T \Omega_b^{-1} (Y_b - X_b \hat{\beta}_b)}{n - \mathrm{rank}(X)},$$

where $X_b$ is the submatrix of $X$ and $\Omega_b$ the submatrix of $\Omega$ corresponding to the elements in the $b$th sample.

In equation (1.9), $t\hat{\sigma}_{Lb}^2$ is a bias correction term, where

$$t = \frac{1}{(kn)^2} \sum_{b=1}^{k} \mathrm{tr}(P_b^T X^T D X P_b \Omega_b),$$

and a choice of $P_b$ is $P_b = (X_b^T \Omega_b^{-1} X_b)^{-1} X_b^T \Omega_b^{-1}$. We assume that $(X_b^T \Omega_b^{-1} X_b)^{-1}$ exists.

We are interested in the convergence properties of $\hat{V}_L(\bar{Y}_S)$. Note that $\hat{V}_L(\bar{Y}_S)$ is an estimator of $E_L[\mathrm{Var}_p(\bar{Y}_S)]$ and a predictor of $\mathrm{Var}_p(\bar{Y}_S)$. We call $\hat{V}_L(\bar{Y}_S)$ a predictor of $\mathrm{Var}_p(\bar{Y}_S)$ mainly for two reasons. One is that in this model-based context, $\mathrm{Var}_p(\bar{Y}_S)$ is no longer a fixed value; instead, it is a random variable associated with the random population. The other reason is that the difference between $E_L[\mathrm{Var}_p(\bar{Y}_S)]$ and $\mathrm{Var}_p(\bar{Y}_S)$ is of small order in probability, as we will show in Theorem 1.1. To prove our main results in this section, we make the following assumptions.
A1.1. The superpopulation model, denoted by $L$, is $Y = X\beta + \varepsilon$, where the errors $\varepsilon_j$ are mutually independent with mean zero and variance $\omega_j \sigma_L^2$.

A1.2. We consider all the elements of $X$, $\beta$ and $\Omega$ to be fixed with respect to the superpopulation model $L$. We also assume that all the elements of $X$, $\beta$ and $\Omega$ are bounded.

A1.3. The third and fourth moments of $\varepsilon_j$, denoted by $m_{jr} = E(\varepsilon_j^r)$, $r = 3$ and $4$, exist and are bounded.

A1.4. The sample size $n$ and sampling interval $k$ are positive integers with $nk = N$. We let $n \to \infty$ and $k = O(1)$ or $k \to \infty$.
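Before turning to the asymptotics, the quadratic-form identity in (1.6)-(1.7) can be checked numerically. The sketch below (Python with NumPy, purely illustrative and not part of the dissertation; names are ours) builds $D$ from its definition and verifies that $Y^T D Y/(kn^2)$ reproduces the enumerated design variance:

```python
import numpy as np

def D_matrix(n, k):
    """Build D = M^T H M from (1.7), with M = 1_n^T (x) I_k and
    H = I_k - (1/k) 1_k 1_k^T, so that Var_p(Ybar_S) = Y^T D Y / (k n^2)."""
    H = np.eye(k) - np.ones((k, k)) / k
    M = np.kron(np.ones((1, n)), np.eye(k))   # k x N, with N = n*k
    return M.T @ H @ M

rng = np.random.default_rng(0)
n, k = 5, 4
Y = rng.normal(size=n * k)

quad_form = Y @ D_matrix(n, k) @ Y / (k * n ** 2)
sample_means = [Y[b::k].mean() for b in range(k)]
direct = sum((m - Y.mean()) ** 2 for m in sample_means) / k
# quad_form and direct agree to machine precision
```

$M Y$ collects the $k$ systematic sample totals, $H$ centers them, and the quadratic form recovers $\frac{1}{k}\sum_b (\bar{Y}_{S_b} - \bar{Y}_N)^2$ exactly.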
The following theorem addresses the convergence properties of $\hat{V}_L(\bar{Y}_S)$. It shows that $\hat{V}_L(\bar{Y}_S)$ is a consistent estimator for $E_L[\mathrm{Var}_p(\bar{Y}_S)]$ and a consistent predictor for $\mathrm{Var}_p(\bar{Y}_S)$, with convergence rate $O_p(n^{-1/2})$.

Theorem 1.1. Let $\bar{Y}_S$ denote the systematic sample mean. Let $\mathrm{Var}_p(\bar{Y}_S)$ denote the design-based variance of $\bar{Y}_S$. Under superpopulation model $L$, let $E_L[\mathrm{Var}_p(\bar{Y}_S)]$ denote the model anticipated variance of $\bar{Y}_S$ and $\hat{V}_L(\bar{Y}_S)$ an estimator for $E_L[\mathrm{Var}_p(\bar{Y}_S)]$, where $\mathrm{Var}_p(\bar{Y}_S)$, $E_L[\mathrm{Var}_p(\bar{Y}_S)]$ and $\hat{V}_L(\bar{Y}_S)$ are defined as in (1.6), (1.8) and (1.9), respectively. Then, assuming A1.1-A1.4,

$$\mathrm{Var}_p(\bar{Y}_S) - E_L[\mathrm{Var}_p(\bar{Y}_S)] = O_p\left(\frac{1}{\sqrt{N}}\right), \qquad (1.10)$$

$$\hat{V}_L(\bar{Y}_S) - E_L[\mathrm{Var}_p(\bar{Y}_S)] = O_p\left(\frac{1}{\sqrt{n}}\right), \qquad (1.11)$$

$$\text{and} \quad \hat{V}_L(\bar{Y}_S) - \mathrm{Var}_p(\bar{Y}_S) = O_p\left(\frac{1}{\sqrt{n}}\right). \qquad (1.12)$$
Proof. First, let us prove (1.10). Denote

$$X\beta = \left( \sum_{i=1}^{p} X_{1i}\beta_i, \sum_{i=1}^{p} X_{2i}\beta_i, \cdots, \sum_{i=1}^{p} X_{Ni}\beta_i \right)^T \equiv (Q_1, Q_2, \cdots, Q_N)^T.$$

Note that $E(Y_j) = Q_j$ and $\mathrm{Var}(Y_j) = \sigma_L^2 \omega_j$. By assumption A1.3, for $r = 3, 4$, $m_{jr} = E(Y_j - E(Y_j))^r = E(\varepsilon_j^r) < \infty$. By Lemma A.1,

$$\mathrm{Var}_L(\mathrm{Var}_p(\bar{Y}_S)) = \mathrm{Var}_L\left( \frac{1}{kn^2} Y^T D Y \right)$$
$$= \frac{1}{k^2 n^4} d^T m_4 d - \frac{3\sigma_L^4}{k^2 n^4} d^T \Omega^2 d + \frac{2\sigma_L^4}{k^2 n^4} \mathrm{tr}\left( (D\Omega)^2 \right)$$
$$+ \frac{4\sigma_L^2}{k^2 n^4} \beta^T X^T D\Omega D X \beta + \frac{4}{k^2 n^4} \beta^T X^T D m_3 d, \qquad (1.13)$$

where $D \equiv (d_{ij})$ is defined as in (1.7), $d = (d_{11}, \ldots, d_{NN})^T$, and $m_3$ and $m_4$ are the vectors of third and fourth moments of $Y$, respectively. Now let us evaluate each term on the right-hand side of (1.13).
For the first term, note that the fourth moments $m_{4j}$ are finite, so

$$\frac{1}{k^2 n^4} d^T m_4 d = \frac{1}{k^2 n^4} \sum_{j=1}^{N} d_{jj}^2 m_{4j} = \frac{1}{k^2 n^4} \sum_{j=1}^{N} \left(1 - \frac{1}{k}\right)^2 m_{4j} \leq \frac{1}{k^2 n^4} N \max_j\{m_{4j}\} = O\left(\frac{1}{Nn^2}\right). \qquad (1.14)$$
For the second term, since the $\omega_j$'s are finite,

$$\frac{3\sigma_L^4}{k^2 n^4} d^T \Omega^2 d = \frac{3\sigma_L^4}{k^2 n^4} \sum_{j=1}^{N} d_{jj}^2 \omega_j^2 = \frac{3\sigma_L^4}{k^2 n^4} \sum_{j=1}^{N} \left(1 - \frac{1}{k}\right)^2 \omega_j^2 \leq \frac{3\sigma_L^4}{k^2 n^4} N \max_j\{\omega_j^2\} = O\left(\frac{1}{Nn^2}\right). \qquad (1.15)$$
For the third term, note that the $j$th diagonal element of $(D\Omega)^2$ is

$$\left(1 - \frac{2}{k}\right) \omega_j \sum_{i \in S_b} \omega_i + \frac{1}{k^2}\, \omega_j \sum_{i \in U} \omega_i,$$

where $b = 1, 2, \cdots, k$ and $S_b$ is the systematic sample that contains the $j$th observation. For instance, if $j = 1, k+1, 2k+1, \cdots, (n-1)k+1$, then $b = 1$; if $j = 2, k+2, 2k+2, \cdots, (n-1)k+2$, then $b = 2$, etc. So

$$\mathrm{tr}\left((D\Omega)^2\right) = \sum_{j=1}^{N} \left[ \left(1 - \frac{2}{k}\right) \omega_j \sum_{i \in S_b} \omega_i + \frac{1}{k^2}\, \omega_j \sum_{i \in U} \omega_i \right] = \left(1 - \frac{2}{k}\right) \sum_{b=1}^{k} \left( \sum_{j \in S_b} \omega_j \right)^2 + \frac{1}{k^2} \left( \sum_{j \in U} \omega_j \right)^2.$$

Thus,

$$\frac{2\sigma_L^4}{k^2 n^4} \mathrm{tr}\left((D\Omega)^2\right) = \frac{2\sigma_L^4}{kn^2}\left(1 - \frac{2}{k}\right) \frac{1}{k} \sum_{b=1}^{k} \left( \frac{1}{n} \sum_{j \in S_b} \omega_j \right)^2 + \frac{2\sigma_L^4}{k^2 n^2} \left( \frac{1}{kn} \sum_{j \in U} \omega_j \right)^2 = O\left(\frac{1}{kn^2}\right) + O\left(\frac{1}{k^2 n^2}\right) = O\left(\frac{1}{Nn}\right). \qquad (1.16)$$
Now let us investigate the fourth term:

$$\frac{4\sigma_L^2}{k^2 n^4} \beta^T X^T D\Omega D X \beta = \frac{4\sigma_L^2}{k^2 n^4} (Q_1, Q_2, \cdots, Q_N) D\Omega D (Q_1, Q_2, \cdots, Q_N)^T = \frac{4\sigma_L^2}{k^2 n^4} \sum_{b=1}^{k} \left( T_b^2 \sum_{j \in S_b} \omega_j \right),$$

where $T_b = \sum_{j \in S_b} Q_j - \frac{1}{k} \sum_{j \in U} Q_j$ and $b = 1, 2, \cdots, k$. Clearly, $\frac{1}{n} T_b = O(1)$ and $\frac{1}{n} \sum_{j \in S_b} \omega_j = O(1)$, so

$$\frac{4\sigma_L^2}{k^2 n^4} \beta^T X^T D\Omega D X \beta = \frac{4\sigma_L^2}{kn} \cdot \frac{1}{k} \sum_{b=1}^{k} \left( \frac{1}{n^2} T_b^2 \right)\left( \frac{1}{n} \sum_{j \in S_b} \omega_j \right) = O\left(\frac{1}{N}\right). \qquad (1.17)$$
Finally,

$$\frac{4}{k^2 n^4} \beta^T X^T D m_3 d = \frac{4}{k^2 n^4} (Q_1, Q_2, \cdots, Q_N) D m_3 d = \frac{4}{k^2 n^4} \left(1 - \frac{1}{k}\right) \sum_{b=1}^{k} \left( T_b \sum_{j \in S_b} m_{3j} \right) \leq \frac{4}{kn^2} \left(1 - \frac{1}{k}\right) \frac{1}{k} \sum_{b=1}^{k} \left( \frac{1}{n} T_b \right)\left( \frac{1}{n} \sum_{j \in S_b} m_{3j} \right) = O\left(\frac{1}{Nn}\right). \qquad (1.18)$$
By (1.14), (1.15), (1.16), (1.17) and (1.18),

$$\mathrm{Var}_L(\mathrm{Var}_p(\bar{Y}_S)) = O\left(\frac{1}{N}\right). \qquad (1.19)$$

Note that

$$\mathrm{Var}_L(\mathrm{Var}_p(\bar{Y}_S)) = E\left( \mathrm{Var}_p(\bar{Y}_S) - E_L[\mathrm{Var}_p(\bar{Y}_S)] \right)^2,$$

thus by Corollary A.1,

$$\mathrm{Var}_p(\bar{Y}_S) - E_L[\mathrm{Var}_p(\bar{Y}_S)] = O_p\left(\frac{1}{\sqrt{N}}\right), \qquad (1.20)$$

and thus (1.10) holds.

Secondly, let us prove (1.11). Note that
$$\hat{V}_L(\bar{Y}_S) - E_L[\mathrm{Var}_p(\bar{Y}_S)]$$
$$= \frac{1}{kn^2} \hat{\beta}_b^T X^T D X \hat{\beta}_b - t\hat{\sigma}_{Lb}^2 + \frac{1}{kn^2} \mathrm{tr}(D\Omega)\hat{\sigma}_{Lb}^2 - \frac{1}{kn^2} \beta^T X^T D X \beta - \frac{1}{kn^2} \mathrm{tr}(D\Omega)\sigma_L^2$$
$$= \frac{1}{kn^2} \left[ (\hat{\beta}_b - \beta)^T X^T D X (\hat{\beta}_b - \beta) + \beta^T X^T D X (\hat{\beta}_b - \beta) + (\hat{\beta}_b - \beta)^T X^T D X \beta \right]$$
$$+ \frac{1}{kn^2} \mathrm{tr}(D\Omega)(\hat{\sigma}_{Lb}^2 - \sigma_L^2) - t\hat{\sigma}_{Lb}^2. \qquad (1.21)$$
We will examine each term on the right-hand side of (1.21). For linear regression models, $\hat{\beta}_{bi} - \beta_i = O_p\left(\frac{1}{\sqrt{n}}\right)$ and $\hat{\sigma}_{Lb}^2 - \sigma_L^2 = O_p\left(\frac{1}{\sqrt{n}}\right)$. The element in the $i$th row and $j$th column of $\frac{1}{kn^2} X^T D X$ is

$$\frac{1}{k} \sum_{b=1}^{k} (\bar{X}_{S_b i} - \bar{X}_{Ui})(\bar{X}_{S_b j} - \bar{X}_{Uj}) = O(1), \qquad (1.22)$$

where $i, j = 1, \ldots, p$. So

$$\frac{1}{kn^2} (\hat{\beta}_b - \beta)^T X^T D X (\hat{\beta}_b - \beta) = O_p\left(\frac{1}{n}\right), \qquad (1.23)$$

and

$$\frac{1}{kn^2} \beta^T X^T D X (\hat{\beta}_b - \beta) = \frac{1}{kn^2} (\hat{\beta}_b - \beta)^T X^T D X \beta = O_p\left(\frac{1}{\sqrt{n}}\right). \qquad (1.24)$$
Also note that

$$\frac{1}{kn^2} \mathrm{tr}(D\Omega) = \frac{1}{kn^2} \mathrm{tr}\left[ \begin{pmatrix} H & \cdots & H \\ \vdots & \ddots & \vdots \\ H & \cdots & H \end{pmatrix} \begin{pmatrix} \omega_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \omega_N \end{pmatrix} \right] = \frac{1}{kn^2} \left(1 - \frac{1}{k}\right) \sum_{j=1}^{N} \omega_j = \frac{1}{n} \left(1 - \frac{1}{k}\right) \frac{1}{N} \sum_{j=1}^{N} \omega_j = O\left(\frac{1}{n}\right). \qquad (1.25)$$

Thus,

$$\frac{1}{kn^2} \mathrm{tr}(D\Omega)(\hat{\sigma}_{Lb}^2 - \sigma_L^2) = O_p\left(\frac{1}{n\sqrt{n}}\right). \qquad (1.26)$$
Now it remains to show the order in probability of $t\hat{\sigma}_{Lb}^2$. By the definition of $t$, with $P_b = (X_b^T \Omega_b^{-1} X_b)^{-1} X_b^T \Omega_b^{-1}$ and the cyclic property of the trace,

$$t = \frac{1}{(kn)^2} \sum_{b=1}^{k} \mathrm{tr}(P_b^T X^T D X P_b \Omega_b) = \frac{1}{(kn)^2} \sum_{b=1}^{k} \mathrm{tr}\left( (X_b^T \Omega_b^{-1} X_b)^{-1} X^T D X \right).$$

Let $\Omega_b$ denote the variance-covariance matrix of the observations in the $b$th sample; then $\Omega_b = \mathrm{diag}\{\omega_b, \omega_{b+k}, \cdots, \omega_{b+(n-1)k}\}$ and

$$X_b = \begin{pmatrix} X_{b1} & \cdots & X_{bp} \\ \vdots & \ddots & \vdots \\ X_{(b+(n-1)k)1} & \cdots & X_{(b+(n-1)k)p} \end{pmatrix}.$$

The element in the $i$th row and $j$th column of $\frac{1}{n} X_b^T \Omega_b^{-1} X_b$ is

$$\frac{1}{n} \sum_{t=0}^{n-1} \frac{X_{(b+tk)i} X_{(b+tk)j}}{\omega_{b+tk}} = O(1), \qquad (1.27)$$

where $i, j = 1, \ldots, p$.

Note that $\mathrm{tr}\left( (\frac{1}{n} X_b^T \Omega_b^{-1} X_b)^{-1} (\frac{1}{kn^2} X^T D X) \right)$ is a sum of $p$ terms. By (1.22) and (1.27), each of these $p$ terms is finite, so

$$\mathrm{tr}\left( \left(\frac{1}{n} X_b^T \Omega_b^{-1} X_b\right)^{-1} \left(\frac{1}{kn^2} X^T D X\right) \right) = O(1).$$

Thus

$$t = \frac{1}{(kn)^2} \sum_{b=1}^{k} \frac{kn^2}{n} \mathrm{tr}\left( \left(\frac{1}{n} X_b^T \Omega_b^{-1} X_b\right)^{-1} \frac{1}{kn^2} X^T D X \right) = O\left(\frac{1}{n}\right). \qquad (1.28)$$

Also note that $\hat{\sigma}_{Lb}^2 = \sigma_L^2 + O_p\left(\frac{1}{\sqrt{n}}\right) = O(1) + O_p\left(\frac{1}{\sqrt{n}}\right) = O_p(1)$.
Therefore,

$$t\hat{\sigma}_{Lb}^2 = O_p\left(\frac{1}{n}\right). \qquad (1.29)$$

By (1.23), (1.24), (1.26) and (1.29), equation (1.11) holds. Finally, (1.12) follows from (1.10) and (1.11).
Remark 1: Although we did not calculate the order of $E_L[\mathrm{Var}_p(\bar{Y}_S)]$ in the proof of Theorem 1.1, it is useful to describe it here. Note that $E_L Y = X\beta$, $\mathrm{Var}_L Y = \Omega\sigma_L^2$ and $D$ is the symmetric matrix defined in (1.7). By Theorem A.1,

$$E_L[\mathrm{Var}_p(\bar{Y}_S)] = E_L\left[ \frac{1}{kn^2} Y^T D Y \right] = \frac{1}{kn^2} \beta^T X^T D X \beta + \frac{1}{kn^2} \mathrm{tr}(D\Omega)\sigma_L^2. \qquad (1.30)$$

Let $X\beta = \left( \sum_{i=1}^{p} X_{1i}\beta_i, \sum_{i=1}^{p} X_{2i}\beta_i, \cdots, \sum_{i=1}^{p} X_{Ni}\beta_i \right)^T \equiv (Q_1, Q_2, \cdots, Q_N)^T$; then

$$\frac{1}{kn^2} \beta^T X^T D X \beta = \frac{1}{kn^2} (Q_1, Q_2, \cdots, Q_N) D (Q_1, Q_2, \cdots, Q_N)^T = \frac{1}{k} \sum_{b=1}^{k} \left( \frac{1}{n} \sum_{j \in S_b} Q_j - \frac{1}{N} \sum_{j \in U} Q_j \right)^2 = O(1). \qquad (1.31)$$

By (1.25),

$$\frac{1}{kn^2} \mathrm{tr}(D\Omega)\sigma_L^2 = O\left(\frac{1}{n}\right).$$

So

$$E_L[\mathrm{Var}_p(\bar{Y}_S)] = O(1). \qquad (1.32)$$
Equation (1.32) suggests that for systematic sampling, the design-based variance of the sample mean is bounded and does not generally decrease as the sample size increases.

Remark 2: Equation (1.32) implies that $\mathrm{Var}_p(\bar{Y}_S) = O_p(1)$ with respect to model $L$. This is because $\mathrm{Var}_p(\bar{Y}_S) - E_L[\mathrm{Var}_p(\bar{Y}_S)] = O_p\left(\frac{1}{\sqrt{N}}\right)$.

Remark 3: In equation (1.31), the order of $\frac{1}{kn^2} \beta^T X^T D X \beta$ is actually $O\left( \left(1 - \frac{1}{k}\right)^2 \right)$, because

$$\frac{1}{n} \sum_{j \in S_b} Q_j - \frac{1}{N} \sum_{j \in U} Q_j = \frac{1}{n} \sum_{j \in S_b} Q_j - \frac{1}{N} \sum_{j \in S_b} Q_j - \frac{1}{N} \sum_{j \in U \setminus S_b} Q_j$$
$$= \left( \frac{1}{n} - \frac{1}{N} \right) \sum_{j \in S_b} Q_j - \frac{1}{N} \sum_{j \in U \setminus S_b} Q_j = \left( \frac{1}{n} - \frac{1}{N} \right) n \bar{Q}_{S_b} - \frac{1}{N} (N - n) \bar{Q}_{U \setminus S_b}$$
$$= \left( 1 - \frac{n}{N} \right) \left( \bar{Q}_{S_b} - \bar{Q}_{U \setminus S_b} \right) = O\left( 1 - \frac{n}{N} \right) = O\left( 1 - \frac{1}{k} \right),$$

which implies that

$$\frac{1}{kn^2} \beta^T X^T D X \beta = O\left( \left( 1 - \frac{1}{k} \right)^2 \right).$$

Note that this is essentially $O(1)$, because in systematic sampling $k$ is either a constant or an increasing number; either way, $O\left( \left(1 - \frac{1}{k}\right)^2 \right) = O(1)$. However, this term could be of smaller order if the $X$'s are random variables and $\bar{Q}_{S_b}$ and $\bar{Q}_U$ have the same distribution, in which case $\bar{Q}_{S_b}$ estimates $\bar{Q}_U$ extremely well. For example, when the population is sorted by the model variable $X$, the difference between $\bar{Q}_{S_b}$ and $\bar{Q}_U$ depends on the size of the sampling interval $k$, which is also a function of $n$. So as $n$ increases, $|\bar{Q}_{S_b} - \bar{Q}_U|$ will decrease toward zero.
1.4 Variance estimators under nonparametric models
A parametric method is efficient when we correctly specify the superpopulation model. However, if the superpopulation model is incorrectly specified, a parametric method may result in biased or inefficient estimation.

In this section, we propose a consistent variance estimator under a nonparametric model. We study the case where $p = 1$, i.e. one predictor variable $X$. Let $X = (X_1, X_2, \cdots, X_N)$. Our nonparametric superpopulation model, denoted by $NP$, is

$$Y = m + \varepsilon, \qquad (1.33)$$

where $E_{NP}(Y_j | X_j = x_j) = m(x_j)$ and $\mathrm{Var}_{NP}(\varepsilon) = \sigma_{NP}^2 \,\mathrm{diag}\{\omega_1, \omega_2, \cdots, \omega_N\} \equiv \sigma_{NP}^2 \Omega$. Let $m(\cdot)$ be a continuous and bounded function, and define $m = (m(x_1), \cdots, m(x_N))$. Then $m$ is a vector of bounded numbers. We assume that the $\omega_j$'s are bounded and positive, $j = 1, \ldots, N$.
As shown in (1.6), $\mathrm{Var}_p(\bar{Y}_S) = \frac{1}{k} \sum_{b=1}^{k} (\bar{Y}_{S_b} - \bar{Y}_N)^2 = \frac{1}{kn^2} Y^T D Y$, so by Theorem A.1, the model anticipated variance of $\bar{Y}_S$ under model $NP$ is

$$E_{NP}[\mathrm{Var}_p(\bar{Y}_S)] = \frac{1}{kn^2} m^T D m + \frac{1}{kn^2} \mathrm{tr}(D\Omega)\sigma_{NP}^2. \qquad (1.34)$$

To estimate $E_{NP}[\mathrm{Var}_p(\bar{Y}_S)]$, we propose the following estimator:

$$\hat{V}_{NP}(\bar{Y}_S) = \frac{1}{kn^2} (\hat{m}_b^T D \hat{m}_b) + \frac{1}{kn^2} \mathrm{tr}(D\Omega)\hat{\sigma}_{NPb}^2, \qquad (1.35)$$

where $\hat{\sigma}_{NPb}^2$ is defined as

$$\hat{\sigma}_{NPb}^2 = \frac{(Y_b - \hat{m}_b)^T \Omega_b^{-1} (Y_b - \hat{m}_b)}{n}, \qquad (1.36)$$

and $\hat{m}_b = (\hat{m}(x_1), \cdots, \hat{m}(x_n))$, where $\hat{m}(x_j)$ is the local polynomial regression estimator obtained from the $b$th sample:

$$\hat{m}(x_j) = e_1^T (X_{bj}^T W_{bj} X_{bj})^{-1} X_{bj}^T W_{bj} Y_b.$$
Let $q$ be the degree of the local polynomial regression. We only consider the case where $q$ is odd; the most popular examples are local linear regression and local cubic regression. Then $e_1$ is the $(q+1) \times 1$ vector having 1 in the first entry and all other entries 0, and

$$X_{bj} = \begin{pmatrix} 1 & (x_1 - x_j) & \cdots & (x_1 - x_j)^q \\ \vdots & \vdots & \ddots & \vdots \\ 1 & (x_n - x_j) & \cdots & (x_n - x_j)^q \end{pmatrix}, \qquad W_{bj} = \mathrm{diag}\left\{ K\left( \frac{x_i - x_j}{h} \right), \ i = 1, \cdots, n \right\},$$

where $h$ is the bandwidth and $K$ is the kernel function. For simplicity, we will use
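For $q = 1$, the estimator above is local linear regression: at each point $x_j$, a weighted least-squares line is fitted and its intercept is read off. A minimal sketch (Python; the Epanechnikov kernel and all names are our illustrative choices, not prescribed by the text):

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel, one common choice for K."""
    return np.where(np.abs(u) <= 1, 0.75 * (1 - u ** 2), 0.0)

def local_linear(x, y, x0, h):
    """Local linear (q = 1) estimate m_hat(x0) = e1^T (X^T W X)^{-1} X^T W y,
    with design matrix X = [1, x - x0] and kernel weights W."""
    X = np.column_stack([np.ones_like(x), x - x0])
    w = epanechnikov((x - x0) / h)
    XtW = X.T * w                       # X^T W without forming diag(w)
    beta = np.linalg.solve(XtW @ X, XtW @ y)
    return beta[0]                      # e1^T picks out the intercept

x = np.linspace(0.0, 1.0, 41)
y_lin = 2.0 + 3.0 * x                   # exactly linear data
# a weighted least-squares line fits linear data without error, so
# local_linear reproduces a linear m at any interior point
```

This exact reproduction of linear functions is one reason local linear fits are attractive among the odd-degree choices discussed above.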