Catalogue no. 12-001-XIE

Survey Methodology

June 2002

Statistics Canada
Business Survey Methods Division


Published by authority of the Minister responsible for Statistics Canada

© Minister of Industry, 2006

All rights reserved. The content of this electronic publication may be reproduced, in whole or in part, and by any means, without further permission from Statistics Canada, subject to the following conditions: that it be done solely for the purposes of private study, research, criticism, review or newspaper summary, and/or for non-commercial purposes; and that Statistics Canada be fully acknowledged as follows: Source (or “Adapted from”, if appropriate): Statistics Canada, year of publication, name of product, catalogue number, volume and issue numbers, reference period and page(s). Otherwise, no part of this publication may be reproduced, stored in a retrieval system or transmitted in any form, by any means—electronic, mechanical or photocopy—or for any purposes without prior written permission of Licensing Services, Client Services Division, Statistics Canada, Ottawa, Ontario, Canada K1A 0T6.

October 2006

Catalogue no. 12-001-XIE
ISSN 1492-0921

Frequency: semi-annual

Ottawa

This publication is available in French upon request (catalogue no. 12-001-XIF).

Note of appreciation

Canada owes the success of its statistical system to a long-standing partnership between Statistics Canada, the citizens of Canada, its businesses, governments and other institutions. Accurate and timely statistical information could not be produced without their continued cooperation and goodwill.

Survey Methodology, June 2002, Vol. 28, No. 1, pp. 5-23
Statistics Canada, Catalogue No. 12-001-XIE

Regression Estimation for Survey Samples

Wayne A. Fuller 1

Abstract

Regression and regression related procedures have become common in survey estimation. We review the basic properties of regression estimators, discuss implementation of regression estimation, and investigate variance estimation for regression estimators. The role of models in constructing regression estimators and the use of regression in nonresponse adjustment are explored.

Key Words: Auxiliary information; Calibration; Least squares; Design consistency; Linear prediction.

1. Introduction

Design and estimation in survey sampling involve the use of information about the study population to construct efficient procedures. While design and estimation are intimately related, with estimators depending on the design, the two topics are often treated somewhat separately in the survey sampling literature. We follow tradition first studying estimation treating the design as given. The estimation task is to combine the available information about the population, with the sample data to produce good representations of characteristics of interest.

Regression estimation is one of the important procedures that use population information or information from a larger sample, to construct estimators with good efficiency. The information, sometimes called auxiliary information, may have been used in the design or may not have been available at the design stage. In surveys of the human population, the information often comes from official sources such as the national census. Similar sources may provide information for other types of surveys. For example, in a survey of land use the total surface area, the area owned by the national government, and the area in permanent water bodies may be available from national data archives.

Three distinct situations can be identified with respect to the nature of the auxiliary information that is available. In the first, the values of the auxiliary vector $x$ are known for each element in the population at the time of sample selection. In this case the auxiliary variable can be used in designing the sample selection procedure.

In the second situation all values of the vector $x$ are known, but a particular value cannot be associated with a particular element until the sample is observed. In this case, the auxiliary information cannot be used in design, but a wide range of estimation options are available once the observations are available. For example, the population census may give the age-sex distribution of the population, but a list of individuals and their characteristics is not available to non governmental institutions selecting samples.

In the third situation, only the population mean of $x$ is known, or known for a large sample. In this case, the auxiliary information cannot be used in design and the estimation options are limited. For example the U.S. Department of Agriculture might release an estimate of the total number of animals of a particular type on farms on a particular date. Our discussion concentrates on this situation.

Two estimation situations can also be identified. In one, a single variable and a parameter, or a very small number of parameters, is under consideration. The analyst is willing to invest a great deal of effort in the analysis, has a well formulated population model, and is prepared to support the estimation procedure on the basis of the reasonableness of the model. In the second situation, a large number of analyses of a large number of variables is anticipated. No single model is judged adequate for all variables. The prototypical example of the second situation is the case in which a data set is prepared by the survey sampler to be analyzed by others. Because the person preparing the data set does not have knowledge of the analysis variables, emphasis is placed on the use of estimators that can be defended with minimal recourse to models.

Regression estimators fall in the class of linear estimators. Linear estimators have a particular advantage in survey sampling because once the weights are calculated they are appropriate for any analysis variable. Several properties of estimators will be examined in our discussion. Given a model, we accept the classical goal of minimizing the mean square error in a class of estimators. That class may be the class of linear estimators that are unbiased under the model, but the class may be further restricted.

Estimators that are scale and location invariant can be used in general settings. Mickey (1959) suggested that the term regression estimator be restricted to linear estimators that are location and scale invariant. While we may not adhere strictly to this definition, we support the distinction

between estimators that are location and scale invariant and those that are not. We consider location invariance to be important for sampling designs where the unit of interest for analysis is also the sampling unit. For cluster and two stage designs in which weights are constructed for primary sampling unit totals, location invariance is less important.

Models play an important role in the construction of regression estimators. It is desirable that the estimators retain good properties if the model specification is not exact. Therefore properties conditional on the realized finite population, as well as properties under the model, are important.

Linear estimators that reproduce the known means of the auxiliary variables are said to be calibrated. This is a desirable property in that, for example, the marginals of tables with an auxiliary variable as an analysis variable agree with known totals. If the auxiliary variable is of no analytic interest, then calibration is less important.

Footnote 1. Wayne Fuller, Emeritus Distinguished Professor, Iowa State University, 221 Snedecor Hall, Ames, IA 50011-1210, U.S.A.

2. Background

The earliest references to the use of regression in survey sampling include Jessen (1942) and Cochran (1942). Regression in similar contexts would certainly have been used earlier and Cochran (1977, page 189) mentions a regression on leaf area by Watson (1937). It is interesting that Jessen's use of regression was essentially composite estimation where regression was used to improve estimates for two time points given samples at each point with some common elements in the two samples. Cochran (1942) gave the basic theory for regression in survey sampling relying heavily on linear model theory. He showed that the linear model did not need to hold in order for the regression estimator to perform well. He derived an expression for the $O(n^{-1})$ bias and an $O(n^{-2})$ approximation for the variance. He also showed that for the model with regression passing through the origin and error variances proportional to $x$, the ratio estimator is the generalized least squares estimator.

Regression estimation attracted theoretical interest in the 1950's, often in the form of studies of the bias. See Mickey (1959). Brewer (1963) is an early reference that considers linear estimation using a superpopulation model to determine an optimal procedure. He was concerned with finding the optimal design for the ratio estimator and discussed the possible conflict between an optimal design under the model and a design that is less model dependent. See also Brewer (1979). Royall (1970) argued for the use of models, that the conditional properties that are important are those conditional on the auxiliary information in the sample, and that the design should be chosen to optimize those properties. Royall and his coworkers, e.g., Royall and Cumberland (1981), studied the conditional properties of regression estimators, conditional on the realized sample of auxiliary variables.

A great deal of research was conducted in the 1970's and 1980's on the general nature of the regression estimator in survey samples and on the degree to which the model prediction approach can be reconciled with the design perspective. Fuller (1973, 1975) gave the large sample properties of a vector of regression coefficients computed from a survey sample. Isaki (1970) studied regression estimators and the results were published in expanded versions in Isaki and Fuller (1982) and Fuller and Isaki (1981). It was shown that a regression estimator constructed under a model is design consistent for the population mean if the model contains certain variables. Cassel, Särndal and Wretman (1976) considered both model and design principles in estimator construction and suggested the term "generalized regression estimator" for design consistent estimators of the total of the form

$$\hat T_{y,GREG} = \hat T_{y,HT} + (T_{x,N} - \hat T_{x,HT})\,\hat\beta,$$

where $\hat T_{y,HT}$ and $\hat T_{x,HT}$ are the Horvitz-Thompson estimators of the totals of $y$ and $x$, respectively, $T_{x,N}$ is the known population total of $x$ and $\hat\beta$ is an estimated regression coefficient. Särndal (1980), Wright (1983), and Särndal and Wright (1984) discussed classes of regression estimators. The text by Särndal, Swensson and Wretman (1992) contains an extensive discussion of regression estimation and Mukhopadhyay (1993) is a review.

It was the 1970's before the use of regression for general purpose, multiple characteristic, surveys appeared and it was the 1990's before the use of regression weighting could be called widespread. An early use of regression weights was at Doane Agricultural Services Inc., now Doane Marketing Research. During 1971-1972 a readership study of farmers was conducted under the direction of Mr. John Wilkin in which 6,920 farmers responded. Weights for the respondents were constructed using regression procedures, where the controls came from the U.S. Agricultural Census and from Department of Agriculture sources. Doane provided financial support to Iowa State University to develop a regression weight generation program. To guarantee positive weights in the Doane study, observations with small weights were grouped and assigned a common weight. Grouping continued until the common weight was positive. Later computer programs used modifications of the Huang and Fuller (1978) procedure to guarantee positive weights. Doane has used regression weights for their syndicated market research studies since 1972.

Regression estimation was first used at Statistics Canada in 1988 for the Canadian Labour Force Survey. In 1992 regression estimation was used by the 1991 Canadian Census of Population to ensure that the weighted sum of variables collected via the long form (a one in five systematic sample of all households in Canada) was equal to known household and population totals as collected in the 1991 Census. See Bankier, Rathwell and Majkowski (1992) and Bankier, Houle and Luc (1997). The regression estimator is also the key component of the Generalized


Estimation System (GES) developed at Statistics Canada and used in numerous business and social surveys since its release in 1992. The methodology is described in Estevao, Hidiroglou and Särndal (1995). See also Hidiroglou, Särndal and Binder (1995). Regression estimation is now used to construct composite estimators for the Canadian Labour Force Survey. See Singh, Kennedy and Wu (2001), Gambino, Kennedy and Singh (2001) and Fuller and Rao (2001).

Bethlehem and Keller (1987) report on the use of regression estimation at the Netherlands Central Bureau of Statistics (now Statistics Netherlands) in a program called LIN WEIGHT. Nieuwenbroek, Renssen and Hofman (2000) describe the software package Bascula, that has replaced LIN WEIGHT. Deville, Särndal and Sautory (1993) describe a computer program CALMAR developed at Institut National de la Statistique et des Etudes Economiques (I.N.S.E.E.) that computes weights of the regression type with options for different objective functions. A program developed at Statistics Sweden and called CLAN97 is documented in Anderson and Nordberg (1998). Folsom and Singh (2000) discuss a procedure developed at the Research Triangle Institute.

3. The Classical Linear Model

The classical linear model is the foundation for survey regression estimation, but the survey situation requires certain adaptations. To introduce regression estimation for survey samples, we review the classical linear model. Assume

$$y_i = x_i\beta + e_i, \qquad i = 1, 2, \ldots, n,$$
$$e_i \sim NI(0, \sigma_e^2), \tag{3.1}$$

where $e_i$ is independent of the $k$-dimensional row vectors $x_j$ for all $i$ and $j$, and $\beta$ is the unknown parameter column vector. We will also use matrix representations for the sample quantities. Thus, for a sample of $n$ elements, $X' = (x_1', x_2', \ldots, x_n')$ and $y' = (y_1, y_2, \ldots, y_n)$.

Given a sample of size $n$ and treating the $x_i$ as fixed, the best (minimum mean squared error) estimator of $\beta$ is

$$\hat\beta = \Big(\sum_{i\in A} x_i'x_i\Big)^{-1}\sum_{i\in A} x_i'y_i = (X'X)^{-1}X'y, \tag{3.2}$$

where $A$ is the set of indexes of the sample elements and we assume, as we will throughout, that the matrix to be inverted is nonsingular. If the $e_i$ are not normally distributed, $\hat\beta$ is the estimator with smallest variance in the class of linear unbiased estimators. The estimator of a linear combination of the coefficients, say $\theta_\alpha = \sum_{j=1}^{k}\alpha_j\beta_j$, can be written as

$$\hat\theta_\alpha = \sum_{i\in A} w_{\alpha i}\,y_i,$$

where the weights, $w_{\alpha i}$, minimize the Lagrangean

$$\sum_{i\in A} w_{\alpha i}^2 + \sum_{j=1}^{k}\lambda_j\Big(\sum_{i\in A} w_{\alpha i}x_{ij} - \alpha_j\Big)$$

and the $\lambda_j$ are Lagrange multipliers. The variance of $\hat\theta_\alpha$ is

$$V\{\hat\theta_\alpha\} = V\Big\{\sum_{i\in A} w_{\alpha i}e_i\Big\} = \sum_{i\in A} w_{\alpha i}^2\,\sigma_e^2$$

because the weights are functions of the $x_i$ and not of $y_i$. The covariance matrix of $\hat\beta$ is

$$V\{\hat\beta\} = \Big(\sum_{i\in A} x_i'x_i\Big)^{-1} V\Big\{\sum_{i\in A} b_i'\Big\}\Big(\sum_{i\in A} x_i'x_i\Big)^{-1} = V\Big\{\sum_{i\in A} c_i\Big\}, \tag{3.3}$$

where $b_i' = x_i'e_i$ and $c_i = (X'X)^{-1}x_i'e_i$. Because $e_i$ is independent of $x_j$ for all $i$ and $j$,

$$V\Big\{\sum_{i\in A} b_i'\Big\} = \sum_{i\in A} V\{b_i'\} = \sum_{i\in A} x_i'x_i\,\sigma_e^2$$

and we obtain the familiar expression,

$$V\{\hat\beta\} = \Big(\sum_{i\in A} x_i'x_i\Big)^{-1}\sigma_e^2.$$

The usual unbiased estimator of the covariance matrix of $\hat\beta$ is obtained by replacing $\sigma_e^2$ with the unbiased estimator of $\sigma_e^2$ obtained as the mean square of the residuals, $\hat e_i = y_i - x_i\hat\beta$. An estimator of the covariance matrix that estimates $V\{\sum_{i\in A} b_i'\}$ directly is

$$\tilde V_b\{\hat\beta\} = \Big(\sum_{i\in A} x_i'x_i\Big)^{-1}\sum_{i\in A}\hat b_i'\hat b_i\Big(\sum_{i\in A} x_i'x_i\Big)^{-1} = \sum_{i\in A}\hat c_i\hat c_i', \tag{3.4}$$

where $\hat b_i' = x_i'\hat e_i$ and $\hat c_i = (X'X)^{-1}x_i'\hat e_i$. In the same way

$$\tilde V_b\{\hat\theta_\alpha\} = \sum_{i\in A} w_{\alpha i}^2\,\hat e_i^2 \tag{3.5}$$

is a linear combination of the elements of (3.4) and is a consistent estimator of $V\{\hat\theta_\alpha\}$. The estimator (3.4) is a consistent estimator of $V\{\hat\beta\}$ when the covariance matrix of the $e_i$ is a diagonal matrix with bounded elements. Thus it is a more robust estimator. However, the estimator (3.4) is biased downward because the variance of $\hat e_i$ is usually less than the variance of $e_i$. Two methods are available for reducing the bias. The first is to make a degrees-of-freedom adjustment by multiplying $\tilde V_b\{\hat\beta\}$ by $(n-k)^{-1}n$, where $k$ is the dimension of $x_i$. An alternative adjustment is to replace $\hat e_i$ with

$$\tilde e_i = (1 - \psi_{ii})^{-0.5}\,\hat e_i.$$
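The classical-model quantities of this section can be checked numerically. The following is a minimal sketch in Python with NumPy, run on hypothetical simulated data; the function names `ols_weights` and `robust_cov` and the data-generating values are assumptions made for illustration and are not part of the paper. The weights are computed as $w_\alpha = X(X'X)^{-1}\alpha$, which is the solution of the Lagrangean above, and $\psi_{ii}$ is taken as the $i$th diagonal element of $X(X'X)^{-1}X'$.

```python
import numpy as np

def ols_weights(X, alpha):
    """Weights w_alpha with theta_hat = w_alpha @ y. They solve
    min sum(w**2) subject to X.T @ w = alpha (the Lagrangean in the text)."""
    return X @ np.linalg.solve(X.T @ X, alpha)

def robust_cov(X, y, adjust=None):
    """Estimator (3.4) of V{beta_hat}.
    adjust=None: raw (3.4); "df": multiply by n / (n - k);
    "leverage": replace e_hat_i by e_hat_i / sqrt(1 - psi_ii)."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    e_hat = y - X @ beta_hat
    if adjust == "leverage":
        # psi_ii: diagonal of the projection matrix X (X'X)^{-1} X'
        psi = np.einsum("ij,jk,ik->i", X, XtX_inv, X)
        e_hat = e_hat / np.sqrt(1.0 - psi)
    C = (XtX_inv @ X.T) * e_hat      # column i is c_hat_i
    V = C @ C.T                      # sum over i of c_hat_i c_hat_i'
    if adjust == "df":
        V = V * n / (n - k)
    return beta_hat, V

# Hypothetical data with unequal error variances, for illustration only.
rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = 1.0 + 2.0 * X[:, 1] + rng.normal(scale=1 + np.abs(X[:, 1]), size=n)
alpha = np.array([1.0, 0.5])

w = ols_weights(X, alpha)
beta_hat, V_raw = robust_cov(X, y)
_, V_df = robust_cov(X, y, adjust="df")
_, V_lev = robust_cov(X, y, adjust="leverage")
e_hat = y - X @ beta_hat
v_theta = np.sum(w**2 * e_hat**2)    # expression (3.5)
```

In this sketch `w @ y` reproduces `alpha @ beta_hat`, and `v_theta` equals `alpha @ V_raw @ alpha`, which is the sense in which (3.5) is a linear combination of the elements of (3.4).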


In the adjusted residual $\tilde e_i = (1-\psi_{ii})^{-0.5}\hat e_i$, $\psi_{ii}$ is the $i$th diagonal element of $X(X'X)^{-1}X'$. See Horn, Horn and Duncan (1975), Royall and Cumberland (1978) and Cook and Weisberg (1982, section 2.2).

If we observe the value $x_i$ for an element, but do not observe $y_i$, then the best predictor of $y_i$ for that element is $\hat y_i = x_i\hat\beta$. Likewise, if we know the sum of $x_i$ for a set of $x$'s, then the best predictor for the sum of the $y_i$ is the sum of $x_i\hat\beta$. Thus, given a set of $N$ elements that satisfy model (3.1), a set of observations $(y_i, x_i)$ on a subset denoted by $A$, and the known values of $x_i$ for the remaining $N - n$ elements,

$$\hat Y_{N-n,reg} = \sum_{i\in\bar A}\hat y_i = \sum_{i\in\bar A} x_i\hat\beta,$$

where $\bar A$ is the set of elements for which $y$ is not observed, is the best predictor of the sum of the unobserved $y$'s. See Goldberger (1962), Brewer (1963), Royall (1970), Harville (1976) and Graybill (1976, section 12.2). Hence

$$\hat T_{y,reg} = \sum_{i\in A} y_i + \hat Y_{N-n,reg} \tag{3.6}$$

is the best predictor for the total of $N$ observations.

If the first element in the $x$-vector is always one, we can partition the $x$-vector as $x_i = (1, x_{1,i})$ and write the regression estimator of the mean as

$$\bar y_{reg} = N^{-1}\hat T_{y,reg} = \bar x_N\hat\beta = \bar y_n + (\bar x_{1,N} - \bar x_{1,n})\,\hat\beta_1, \tag{3.7}$$

where $\hat\beta$ of (3.2) is partitioned as $(\hat\beta_0, \hat\beta_1')'$ and $(\bar y_n, \bar x_n)$ is the vector of simple sample means. We call $\bar x_N\hat\beta$ the regression estimator of the mean.

Given the model (3.1), the expected value of the mean of $y$ for the finite population of $N$ elements generated by the model is $\bar x_N\beta$ and $\bar x_N\hat\beta$ is an unbiased estimator of the finite population mean. This, we believe, is the point at which regression estimation for the finite population mean under more complex designs begins.

4. Design Based Estimation

The development of this section treats the finite population as a sample realization from an infinite population. The use of such models has a long history in survey sampling. Some references through 1970 are Cochran (1939, 1942, 1946), Deming and Stephan (1941), Madow and Madow (1944), Yates (1949), Godambe (1955), Hájek (1959), Rao, Hartley, and Cochran (1962), Konijn (1962), Brewer (1963), Godambe and Joshi (1965), Hanurav (1966), Ericson (1969), Isaki (1970), and Royall (1970).

To discuss the large sample properties of regression estimators we consider sequences of finite populations and associated probability samples. The set of indices of the elements in the $N$th finite population is $U_N = \{1, \ldots, N\}$, where $N = 1, 2, \ldots$. Associated with the $i$th element of the $N$th population is a row vector of characteristics $z_{iN} = (y_{iN}, x_{iN})$. Let

$$F_N = [(y_{1N}, x_{1N}), (y_{2N}, x_{2N}), \ldots, (y_{NN}, x_{NN})]$$

be the set of vectors for the $N$th finite population. The subscript $N$ on the vectors will often be omitted. The finite population mean is

$$\bar z_N = (\bar y_N, \bar x_N) = N^{-1}\sum_{i=1}^{N}(y_i, x_i). \tag{4.1}$$

We denote the set of indices appearing in the sample selected from the $N$th finite population by $A_N$.

When the finite population is a sample from an infinite superpopulation, the probability properties of a sample are determined by the properties of the superpopulation and the properties of the probability mechanism used to select the sample. One can consider the unconditional properties, the properties conditional on the particular finite population, or the properties conditional on some part of the realized sample.

Properties conditional on the finite population depend primarily on the survey design and are often called design properties. Thus an estimator $\hat\theta$ is said to be design consistent for the finite population parameter $\theta_N$ if, for all $\varepsilon > 0$,

$$\lim_{N,n\to\infty}\operatorname{prob}\{\,|\hat\theta - \theta_N| > \varepsilon \mid F_N\} = 0,$$

where the notation means that we condition on the realized finite population $F_N$ and, hence, the probability is with respect to the design.

Assume the finite population is generated as independent selections from a superpopulation for which $E\{z_i'z_i\}$ is positive definite, where $z_i = (y_i, x_i)$. We define a superpopulation vector of least squares regression coefficients by

$$\beta = [E\{x_i'x_i\}]^{-1}E\{x_i'y_i\}. \tag{4.2}$$

Given a sample of $n$ observations on $z_i$ we define the $n\times(k+1)$ matrix $Z = (y, X)$ of observations, where the $i$th row of $Z$ is $(y_i, x_i)$. If we assume the model

$$y = X\beta + u, \tag{4.3}$$
$$E\{u, uu'\} = (0, \Phi),$$

the generalized least squares estimator of $\beta$ is

$$\hat\beta = (X'\Phi^{-1}X)^{-1}X'\Phi^{-1}y. \tag{4.4}$$

The model (4.3) serves as motivation for estimators of the form (4.4) but we shall consider estimators where $\Phi$ is a general symmetric positive definite weight matrix, not necessarily the covariance matrix of the errors.

We give the large sample properties of the vector of estimated regression coefficients (4.4) following Fuller (1975). See also Hidiroglou (1974), Scott and Wu (1981), and Robinson and Särndal (1983).

Assume the superpopulation has eighth moments and that the sample design is such that the error in the Horvitz-Thompson estimator of the mean is $O_p(n^{-1/2})$, where the Horvitz-Thompson estimator of the mean is


$$\bar z_{HT} = (\bar y_{HT}, \bar x_{HT}) = N^{-1}\sum_{i\in A}\pi_i^{-1}z_i \tag{4.5}$$

and $\pi_i$ is the selection probability for element $i$. Then the error in the vector of regression coefficients is

$$\hat\beta - \beta_N \mid F_N = Q_{xxN}^{-1}\,\bar b_{HT}' + O_p(n^{-1}), \tag{4.6}$$

where

$$\beta_N = Q_{xxN}^{-1}Q_{xyN}, \tag{4.7}$$
$$(Q_{xxN}, Q_{xyN}) = E\{(\hat Q_{xx}, \hat Q_{xy}) \mid F_N\}, \tag{4.8}$$
$$(\hat Q_{xx}, \hat Q_{xy}) = n^{-1}(X'\Phi^{-1}X,\; X'\Phi^{-1}y),$$
$$\bar b_{HT} = N^{-1}\sum_{i\in A}\pi_i^{-1}b_i, \tag{4.9}$$

$b_i' = n^{-1}N\pi_i\zeta_ie_i$, $e_i = y_i - x_i\beta_N$, and $\zeta_i$ is column $i$ of $X'\Phi^{-1}$. By (4.9) the error in the estimator of $\beta_N$ is approximately the error in a Horvitz-Thompson estimator of the mean. In result (4.6), the $\beta_N$ is defined as a function of the expected values of the sample quantities $(\hat Q_{xx}, \hat Q_{xy})$. Thus $\beta_N$ is not necessarily the ordinary least squares finite population regression coefficient. The vector $b_i$ of (4.9) is the generalization of the vector $b_i$ of (3.3). If the limiting distribution of the properly standardized Horvitz-Thompson estimator is normal, and if there is a design consistent estimator of the variance of the Horvitz-Thompson estimator, then it is possible to construct tests and confidence intervals for the coefficients. Assume the design is such that

$$V_{\bar z\bar z}^{-1/2}(\bar z_{HT} - \bar z_N) \mid F_N \xrightarrow{L} N(0, I), \tag{4.10}$$

as $N, n \to \infty$, where $V_{\bar z\bar z}$ is the covariance matrix of $\bar z_{HT} - \bar z_N$. If $V_{\bar z\bar z}$ is $O(n^{-1})$ and the estimator $\hat V_{\bar z\bar z}$ is consistent for $V_{\bar z\bar z}$, then

$$[\hat V\{\hat\beta\}]^{-1/2}(\hat\beta - \beta_N) \mid F_N \xrightarrow{L} N(0, I), \tag{4.11}$$

where

$$\hat V\{\hat\beta\} = \hat Q_{xx}^{-1}\hat V_{bb}\hat Q_{xx}^{-1} = \hat V\{\bar c_{HT}'\}, \tag{4.12}$$

$\hat V_{bb} = \hat V\{\bar b_{HT}'\}$ is the estimated design variance of $\bar b_{HT}'$ calculated with $\hat b_i' = n^{-1}N\pi_i\zeta_i\hat e_i$, $\hat e_i = y_i - x_i\hat\beta$, and $\hat V\{\bar c_{HT}'\}$ is the estimated design variance of $\bar c_{HT}'$ calculated with $\hat c_i' = \hat Q_{xx}^{-1}\hat b_i'$. The limiting properties hold for stratified samples and for stratified two stage samples under mild restrictions on the sequence of populations.

By analogy to (3.7), a regression estimator of the finite population mean is obtained by evaluating the estimated regression function at the population mean of $x$ to obtain

$$\bar y_{reg} = \bar x_N\hat\beta, \tag{4.13}$$

where $\hat\beta$ is of the form (4.4) with a general $\Phi$ matrix. The estimator can be written as $w'y$, where the vector of weights can be constructed by minimizing the Lagrangean

$$w'\Phi w + (w'X - \bar x_N)\lambda$$

and $\lambda$ is the vector of Lagrange multipliers. If there is a column vector $\gamma$ such that

$$X\gamma = \Phi D_\pi^{-1}J \tag{4.14}$$

for all possible samples, where $D_\pi = \operatorname{diag}(\pi_1, \pi_2, \ldots, \pi_n)$ and $J$ is an $n$-dimensional column vector of ones, then the regression estimator $\bar x_N\hat\beta$ of (4.13) with $\hat\beta$ defined in (4.4) is a design consistent estimator of $\bar y_N$. It follows from (4.11) that

$$[\bar x_N\hat V\{\hat\beta\}\bar x_N']^{-1/2}(\bar x_N\hat\beta - \bar y_N) \xrightarrow{L} N(0, 1). \tag{4.15}$$

The requirement of (4.14) that $\Phi D_\pi^{-1}J$ be in the column space of $X$ is crucial for design consistency. Simple ways to satisfy this requirement are to let one column of $X$ be the column of ones and to use a multiple of $D_\pi$ as $\Phi$, or to let one column of $X$ be the elements $\pi_i^{-1}$ and set $\Phi = I$, or to let one column of $X$ be the elements $\pi_i$ and set $\Phi = D_\pi^2$. If $X$ is composed of the single column vector with elements $\pi_i$ and if $\Phi = D_\pi^2$, then the estimator (4.13) reduces to the Horvitz-Thompson estimator of (4.5) for fixed size designs. If $X = J$ and $\Phi = D_\pi$, the estimator (4.13) reduces to the ratio estimator,

$$\bar y_\pi = \Big(\sum_{i\in A}\pi_i^{-1}\Big)^{-1}\sum_{i\in A}\pi_i^{-1}y_i, \tag{4.16}$$

which is location and scale invariant.

To see the nature of the estimator when (4.14) is satisfied, let, with no loss of generality, $X = (x_0, X_1)$, where $x_0 = \Phi D_\pi^{-1}J$ and $x_i = (x_{0,i}, x_{1,i})$. Then

$$\bar y_{reg} = \bar x_{0,N}\bar x_{0,\pi}^{-1}\bar y_\pi + (\bar x_{1,N} - \bar x_{0,N}\bar x_{0,\pi}^{-1}\bar x_{1,\pi})\,\hat\beta_1, \tag{4.17}$$

where

$$\hat\beta_1 = [(X_1 - x_0\hat\mu_{x1})'\Phi^{-1}(X_1 - x_0\hat\mu_{x1})]^{-1}(X_1 - x_0\hat\mu_{x1})'\Phi^{-1}y,$$

$\hat\mu_{x1} = \bar x_{0,\pi}^{-1}\bar x_{1,\pi}$, and $(\bar y_\pi, \bar x_\pi)$ is defined in (4.16). The ratios, such as $\bar x_{0,\pi}^{-1}\bar y_\pi$, can also be written as ratios of Horvitz-Thompson estimators. If $J$ is in the column space of $X$, estimator (4.17) is location invariant. If $\Phi = D_\pi$, then $\bar x_{0,\pi}^{-1}\bar x_{0,N} = 1$, and

$$\bar y_{reg} = \bar x_N\hat\beta = \bar y_\pi + (\bar x_{1,N} - \bar x_{1,\pi})\,\hat\beta_1, \tag{4.18}$$

where $\hat\beta_1$ is given in (4.19).
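The special cases of (4.13) described above are algebraic identities and can be verified on any sample. The sketch below, in Python with NumPy, is an illustration under stated assumptions: the population, the size measure, and the function name `regression_estimator_mean` are hypothetical, $\Phi$ is taken to be diagonal, and the sample indices are drawn only to have some subset of fixed size $n$ (the draw is not claimed to realize the inclusion probabilities $\pi_i$, which does not matter for the identities being checked).

```python
import numpy as np

def regression_estimator_mean(X, y, phi_diag, xbar_N):
    """Return (xbar_N @ beta_hat, beta_hat), where
    beta_hat = (X' Phi^{-1} X)^{-1} X' Phi^{-1} y as in (4.4)
    and Phi = diag(phi_diag)."""
    PhiInvX = X / phi_diag[:, None]                  # Phi^{-1} X
    beta_hat = np.linalg.solve(X.T @ PhiInvX, PhiInvX.T @ y)
    return xbar_N @ beta_hat, beta_hat

# Hypothetical finite population; pi_U sums to n (fixed size design).
rng = np.random.default_rng(1)
N, n = 200, 40
size = rng.uniform(1.0, 5.0, size=N)
pi_U = n * size / size.sum()
x1_U = size + rng.normal(scale=0.5, size=N)
y_U = 3.0 + 2.0 * x1_U + rng.normal(size=N)

# Any subset of size n serves to check the identities.
A = rng.choice(N, size=n, replace=False)
pi, x1, y = pi_U[A], x1_U[A], y_U[A]
J = np.ones(n)

ht_mean = np.sum(y / pi) / N                          # (4.5), y component
ybar_pi = np.sum(y / pi) / np.sum(1.0 / pi)           # (4.16)
x1bar_pi = np.sum(x1 / pi) / np.sum(1.0 / pi)

# X = (pi_i), Phi = D_pi^2: Horvitz-Thompson estimator of the mean.
est_ht, _ = regression_estimator_mean(
    pi[:, None], y, pi**2, np.array([pi_U.mean()]))

# X = J, Phi = D_pi: ratio estimator (4.16).
est_ratio, _ = regression_estimator_mean(
    J[:, None], y, pi, np.array([1.0]))

# X = (1, x_1), Phi = D_pi: both forms of (4.18).
X = np.column_stack([J, x1])
xbar_N = np.array([1.0, x1_U.mean()])
est_reg, beta_hat = regression_estimator_mean(X, y, pi, xbar_N)
rhs_418 = ybar_pi + (xbar_N[1] - x1bar_pi) * beta_hat[1]
```

Here `est_ht` agrees with `ht_mean`, `est_ratio` agrees with `ybar_pi`, and `est_reg` agrees with `rhs_418`, up to floating point error.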


The coefficient $\hat\beta_1$ of (4.18) is

$$\hat\beta_1 = \Big[\sum_{i\in A}(x_{1,i} - \bar x_{1,\pi})'\pi_i^{-1}(x_{1,i} - \bar x_{1,\pi})\Big]^{-1}\sum_{i\in A}(x_{1,i} - \bar x_{1,\pi})'\pi_i^{-1}(y_i - \bar y_\pi). \tag{4.19}$$

Also, when $\Phi = D_\pi$, the $\beta_N$ of (4.7) is the population regression coefficient

$$\beta_N = \Big(\sum_{i\in U}x_i'x_i\Big)^{-1}\sum_{i\in U}x_i'y_i. \tag{4.20}$$

Because the regression estimator of the mean is a linear combination of regression coefficients, it is a regression coefficient for a linear combination of the original $x$-variables. To see this, let $x_i = (x_{0,i}, x_{1,i}) = (1, x_{1,i})$, and define a new vector with one in the first position and a second vector with population mean equal to zero obtained by subtracting the original population mean $\bar x_{1,N}$ from the original $x_{1,i}$ vector. Let $q_i = (1, x_{1,i} - \bar x_{1,N})$ be the transformed vector. Then the transformed regression model is

$$y_i = q_i\gamma + e_i, \tag{4.21}$$

where the finite population coefficient vector is

$$\gamma_N = (\bar y_N, \beta_{1,N}')' = \Big(\sum_{i\in U}q_i'q_i\Big)^{-1}\sum_{i\in U}q_i'y_i. \tag{4.22}$$

The expression for the regression estimator of the mean becomes

$$\bar y_{reg} = \bar q_N\hat\gamma = \hat\gamma_0, \tag{4.23}$$

where $\hat\gamma$ is obtained from (4.4) with $q_i$ replacing $x_i$. Because the estimator is a linear estimator of the form $w'y$, we can write

$$\bar y_{reg} = \sum_{i\in A}w_iy_i = \sum_{i\in A}\pi_i^{-1}g_iy_i, \tag{4.24}$$

where $w_i = \pi_i^{-1}g_i$. Furthermore, the estimated variance from (4.12) is

$$\hat V\{\bar y_{reg}\} = \hat V\{\hat\gamma_0\} = \hat V\Big\{\sum_{i\in A}\pi_i^{-1}(g_i\hat e_i)\Big\}, \tag{4.25}$$

where it is understood that the estimated design variance of (4.25) is computed for the variable $g_i\hat e_i$, $\hat e_i = y_i - x_i\hat\beta$, and $\hat\beta$ is defined in (4.4). The variance estimator (4.25) is a direct generalization of expression (3.5). By transforming the variables so that the population mean of the auxiliary vector is zero, the first element of the regression vector is the regression estimator of the mean and the first element of (4.12) is an estimator of the variance of the regression estimator that contains a component due to estimating $\beta$. This was pointed out in Hidiroglou, Fuller, and Hickman (1978). Also, see Särndal (1982). Särndal, Swensson and Wretman (1989) suggested the g-factor terminology for the calculation of the estimated variance of a regression estimated total.

From (4.17), we can write

$$\bar y_{reg} - \bar y_N = \bar x_{0,N}\bar x_{0,\pi}^{-1}\big[\bar y_\pi - \bar x_{1,\pi}\beta_{1N} - (\bar y_N - \bar x_{1,N}\beta_{1N})\big] + O_p(n^{-1}) = \bar e_\pi + O_p(n^{-1}),$$

where $e_i = y_i - x_i\beta_N$. Hence, the variance of the regression estimator can be estimated with

$$\hat V\{\bar e_\pi\} = \hat V\Big\{\Big(\sum_{i\in A}\pi_i^{-1}\Big)^{-1}\sum_{i\in A}\pi_i^{-1}\hat e_i\Big\}, \tag{4.26}$$

where $\hat e_i = y_i - x_i\hat\beta$. Because (4.25) is as easy to compute as (4.26), and is applicable when $\bar x_{1,\pi} - \bar x_{1,N}$ is not $O_p(n^{-1/2})$, the estimator (4.25) is recommended.

The variance of the regression estimator can also be computed using the jackknife or other replication methods, and the use of replication methods is becoming more common. See Frankel (1971), Kish and Frankel (1974), Woodruff and Causey (1976), Royall and Cumberland (1978), and Duchesne (2000). Yung and Rao (1996) showed that (4.25) is identical to a jackknife linearization estimator for stratified multistage designs.

The approach to regression estimation associated with (4.18) and (4.19) falls completely within a design formulation. No models of the population, beyond the existence of moments, are used, though one might argue that one would only consider regression when one feels there is some linear correlation between $x_{1,i}$ and $y_i$.

The estimator (4.19) is a very natural estimator because the estimated regression coefficient is a design consistent estimator of the population regression coefficient. It is mildly annoying that (4.18) does not always yield the smallest large sample design variance for the estimated mean. Treating $\hat\beta_1$ of (4.18) as a fixed vector, the value that minimizes the variance of the linear combination of means is

$$\beta_{1,dopt} = [V\{\bar x_{1,\pi} \mid F_N\}]^{-1}C\{\bar x_{1,\pi}, \bar y_\pi \mid F_N\}. \tag{4.27}$$

See Cochran (1977, page 201), Fuller and Isaki (1981), Montanari (1987, 1999) and Rao (1994). If there is a design consistent estimator of the variance of $\bar x_{1,\pi}$, then the $\beta_{1,d}$ that minimizes the estimated variance

$$\hat V\{\bar y_\pi - \bar x_{1,\pi}\beta_{1,d}\}, \tag{4.28}$$

denoted by $\hat\beta_{1,dopt}$, is a consistent estimator of $\beta_{1,dopt}$. It follows that the estimator

$$\bar y_{d,reg} = \bar y_\pi + (\bar x_{1,N} - \bar x_{1,\pi})\,\hat\beta_{1,dopt} \tag{4.29}$$

has the minimum limit variance for design consistent estimators of the form $\bar y_\pi + (\bar x_{1,N} - \bar x_{1,\pi})\beta_{1,d}$. Also

$$[\hat V\{\bar e_\pi\}]^{-1/2}(\bar y_{d,reg} - \bar y_N) \xrightarrow{L} N(0, 1). \tag{4.30}$$
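The weight representation (4.24) and the variance estimators (4.25) and (4.26) can be illustrated for the simplest design. The following sketch in Python with NumPy assumes simple random sampling without replacement, so that $\pi_i = n/N$ and the estimated design variance of a Horvitz-Thompson total of a variable $u_i$ is $N^2(1 - n/N)s_u^2/n$; that design, the simulated population, and the name `g_weights` are assumptions made for illustration and are not taken from the paper. With $\Phi = D_\pi$ the g-factors are computed as $g_i = x_i(X'D_\pi^{-1}X)^{-1}\bar x_N'$.

```python
import numpy as np

def g_weights(X, pi, xbar_N):
    """g_i with ybar_reg = sum_i pi_i^{-1} g_i y_i when Phi = D_pi:
    g_i = x_i (X' D_pi^{-1} X)^{-1} xbar_N'."""
    PiInvX = X / pi[:, None]                         # D_pi^{-1} X
    return X @ np.linalg.solve(X.T @ PiInvX, xbar_N)

# Hypothetical population and a simple random sample without replacement.
rng = np.random.default_rng(2)
N, n = 500, 60
x1_U = rng.gamma(shape=2.0, scale=2.0, size=N)
y_U = 1.0 + 1.5 * x1_U + rng.normal(size=N)
A = rng.choice(N, size=n, replace=False)
pi = np.full(n, n / N)

X = np.column_stack([np.ones(n), x1_U[A]])
y = y_U[A]
xbar_N = np.array([1.0, x1_U.mean()])               # known population means

g = g_weights(X, pi, xbar_N)
w = g / pi                                           # w_i = pi_i^{-1} g_i
ybar_reg = w @ y                                     # (4.24)

PiInvX = X / pi[:, None]
beta_hat = np.linalg.solve(X.T @ PiInvX, PiInvX.T @ y)
e_hat = y - X @ beta_hat

f = n / N
u = g * e_hat
v_425 = N**2 * (1 - f) * u.var(ddof=1) / n           # (4.25) under SRS
v_426 = (1 - f) * e_hat.var(ddof=1) / n              # (4.26) under SRS
```

In this sketch `w @ X` reproduces `xbar_N`, which is the calibration property discussed in the Introduction, and `ybar_reg` equals `xbar_N @ beta_hat`. For other designs the two variance lines would be replaced by the variance estimator appropriate to that design, applied to $g_i\hat e_i$ and $\hat e_i$ respectively.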

Statistics Canada, Catalogue No. 12­001­XIE Survey Methodology, June 2002 11

where $\hat{V}\{\bar{e}_{\pi}\}$ is the estimator of (4.26) constructed with $\hat{e}_i = y_i - \bar{y}_{\pi} - (x_{1,i} - \bar{x}_{1,\pi}) \hat{\beta}_{1,dopt}$.

In a large sample sense, (4.29) answers the question of how to construct a regression estimator with optimum design properties. In practice a number of questions remain. The estimator is obtained under the assumption of a large sample and a vector $x$ of fixed dimension. In practice there may be a number of potential auxiliary variables and if a large number are included in the regression, terms excluded in the large sample approximation become important. This is particularly true for cluster samples where the number of primary sampling units in the sample is small. In such cases, the number of degrees-of-freedom in $\hat{V}\{\bar{x}_{1,\pi}\}$ is small and the inverse can be unstable. These issues are discussed further in section 9.

The estimator $\hat{\beta}_{1,dopt}$ of (4.29) is linear in $y$ for most designs. See Rao (1994). For example, for a stratified design with simple random sampling within strata,

$$\hat{C}\{\bar{x}_{1,\pi}, \bar{y}_{\pi}\} = \sum_{h=1}^{H} K_h \sum_{j=1}^{n_h} (x_{1,hj} - \bar{x}_{1,h})' (y_{hj} - \bar{y}_h), \tag{4.31}$$

where

$$K_h = W_h^2 (1 - f_h)(n_h - 1)^{-1} n_h^{-1} = N^{-2} \pi_h^{-2} (1 - f_h)(n_h - 1)^{-1} n_h,$$

$N^{-1} N_h = W_h$, $N_h$ is the size of stratum $h$, $f_h = \pi_h = N_h^{-1} n_h$, and $n_h$ is the sample size in stratum $h$. It follows that the weights associated with estimator (4.29) are

$$w_{hi} = N^{-1} \pi_h^{-1} + (\bar{x}_{1,N} - \bar{x}_{1,\pi}) \Big[\sum_{t=1}^{H} K_t \sum_{j=1}^{n_t} (x_{1,tj} - \bar{x}_{1,t})' (x_{1,tj} - \bar{x}_{1,t})\Big]^{-1} K_h (x_{1,hi} - \bar{x}_{1,h})'. \tag{4.32}$$

See also Särndal (1996). The weights of (4.32) can be constructed by minimizing $\sum_{hi \in A} w_{hi}^2 K_h^{-1}$ subject to the constraints

$$\sum_{i \in A_h} w_{hi} = N^{-1} N_h, \quad h = 1, 2, ..., H,$$

and

$$\sum_{hi \in A} w_{hi} x_{1,hi} = \bar{x}_{1,N},$$

where $A_h$ is the set of sample elements in stratum $h$.

The estimator of (4.19) with $\Phi = D_{\pi}$ is a function of Horvitz-Thompson estimators of population moments. The estimator (4.17) with $\Phi^{-1} = \mathrm{diag}\{K_t\}$, the diagonal matrix with $K_t$ on the diagonal for elements in stratum $t$, and dummy variables for stratum effects, gives the estimator of the mean in the class

$$\bar{y}_{reg} = \bar{y}_{\pi} + (\bar{x}_{1,N} - \bar{x}_{1,\pi}) \hat{\beta}_1$$

with the smallest estimated design variance. If the true slopes in the strata are the same and if the selection probabilities are proportional to the square roots of the within-stratum variances, then the use of $\Phi = D_{\pi}^2$ gives a smaller small sample MSE than the use of $\Phi^{-1} = \mathrm{diag}\{K_t\}$ because the sum of $w_{hi}^2 \sigma_h^2$ is smaller. Fuller and Isaki (1981) noted that the design-optimum estimator is often well approximated by the estimator constructed with $\Phi = D_{\pi}^2$.

We have introduced regression estimation for the mean, but it is often the totals that are estimated and totals that are used as controls. Consider the regression estimator of the total of $y$ defined by

$$\hat{T}_{y,reg} = \hat{T}_{y,\pi} + (T_{x,N} - \hat{T}_{x,\pi}) \hat{\beta}_{y \cdot x}, \tag{4.33}$$

where $T_{x,N}$ is the known total of $x$ and $(\hat{T}_{y,\pi}, \hat{T}_{x,\pi})$ is a vector of design consistent estimators of $(T_{y,N}, T_{x,N})$. By analogy to (4.28), the estimator of the optimum $\beta$ is

$$\hat{\beta}_{y \cdot x} = [\hat{V}\{\hat{T}_{x,\pi}\}]^{-1} \hat{C}\{\hat{T}'_{x,\pi}, \hat{T}_{y,\pi}\}, \tag{4.34}$$

where $\hat{V}\{\hat{T}_{x,\pi}\}$ is a design consistent estimator of the variance of $\hat{T}_{x,\pi}$ and $\hat{C}\{\hat{T}'_{x,\pi}, \hat{T}_{y,\pi}\}$ is a design consistent estimator of the covariance of $\hat{T}_{x,\pi}$ and $\hat{T}_{y,\pi}$.

The estimator of the total is $N \bar{y}_{reg}$ for simple random sampling, but the exact equivalence may not hold in more complicated samples, because in such situations the estimated mean may be a ratio estimator. However, if the regression estimator of the two totals is constructed using (4.34), the ratio of the two estimated totals has large sample variance equal to that of the regression estimator of the mean. To see this write the error in the regression estimated totals of $y$ and $u$ as

$$\hat{T}_{y,reg} - T_{y,N} = \hat{T}_{y,\pi} - T_{y,N} + (T_{x,N} - \hat{T}_{x,\pi}) \beta_{y \cdot x,N} + O_p(N n^{-1})$$

and

$$\hat{T}_{u,reg} - T_{u,N} = \hat{T}_{u,\pi} - T_{u,N} + (T_{x,N} - \hat{T}_{x,\pi}) \beta_{u \cdot x,N} + O_p(N n^{-1}), \tag{4.35}$$

where we are assuming $\hat{T}_{y,\pi} - T_{y,N}$, $\hat{\beta}_{y \cdot x} - \beta_{y \cdot x,N}$ and the corresponding quantities for $u$, to be $O_p(N n^{-1/2})$ and $O_p(n^{-1/2})$, respectively. Then the error in $\hat{T}_{u,reg}^{-1} \hat{T}_{y,reg}$ is

$$\hat{T}_{u,reg}^{-1} \hat{T}_{y,reg} - T_{u,N}^{-1} T_{y,N} = T_{u,N}^{-1} [(\hat{T}_{y,\pi} - T_{y,N}) - R_N (\hat{T}_{u,\pi} - T_{u,N}) + (T_{x,N} - \hat{T}_{x,\pi})(\beta_{y \cdot x,N} - R_N \beta_{u \cdot x,N})] + O_p(N n^{-1}), \tag{4.36}$$

Statistics Canada, Catalogue No. 12­001­XIE 12 Fuller: Regression Estimation for Survey Samples

where $R_N = T_{u,N}^{-1} T_{y,N}$. If we construct the regression estimator for $R_N$ starting with $\hat{R} = \hat{T}_{u,\pi}^{-1} \hat{T}_{y,\pi}$, we have

$$\hat{R}_{reg} = \hat{R} + (T_{x,N} - \hat{T}_{x,\pi}) \hat{\beta}_{R \cdot x}, \tag{4.37}$$

$$\hat{\beta}_{R \cdot x} = [\hat{V}\{\hat{T}_{x,\pi}\}]^{-1} \hat{C}\{\hat{T}'_{x,\pi}, \hat{R}\}$$

and

$$\hat{C}\{\hat{T}_{x,\pi}, \hat{R}\} = \hat{C}\{\hat{T}_{x,\pi}, T_{u,N}^{-1} (\hat{T}_{y,\pi} - R_N \hat{T}_{u,\pi})\}.$$

It follows that the large-sample-design-optimum coefficient for the ratio is $T_{u,N}^{-1} (\beta_{y \cdot x,N} - R_N \beta_{u \cdot x,N})$ and the ratio of design-optimum regression estimators is the large sample design-optimum regression estimator of the ratio.

5. Models and Regression Estimation

In this section we assume that the analyst postulates a detailed superpopulation model. Assume also that the sample is an unequal probability sample and/or the specified error covariance structure is not a multiple of the identity matrix. Then, only in special cases will the design optimal estimator of (4.29) agree with the best estimator constructed under the model, conditioning on the sample x-values. To investigate this possible conflict, write the model for the population in matrix notation as

$$y_U = X_U \beta + e_U, \qquad e_U \sim (0, \Sigma_{eeUU}), \tag{5.1}$$

where $y_U = (y_1, y_2, ..., y_N)'$, $e_U = (e_1, e_2, ..., e_N)'$ and $X_U = (x'_1, x'_2, ..., x'_N)'$. It is assumed that $\Sigma_{eeUU}$ is known or known up to a multiple. The model for a sample of $n$ observations is

$$y_A = X_A \beta + e_A, \qquad e_A \sim (0, \Sigma_{eeAA}),$$

where $y_A = (y_1, y_2, ..., y_n)'$, $e_A = (e_1, e_2, ..., e_n)'$, $X_A = (x'_1, x'_2, ..., x'_n)'$, and we index the sample elements by 1, 2, ..., $n$, for convenience. We have used the subscript $U$ to identify population quantities, and the subscript $A$ to identify sample quantities, but we will often omit the subscript $A$ to simplify the notation. For example, we may sometimes write the $n \times n$ covariance matrix as $\Sigma_{ee}$. The unknown finite population mean is

$$\bar{y}_N = \bar{x}_N \beta + \bar{e}_N. \tag{5.2}$$

Under model (5.1), the best linear, conditionally unbiased predictor of $\theta_N = \bar{y}_N$, conditional on $X$ is

$$\hat{\theta} = N^{-1} \Big[\sum_{i \in A} y_i + (N - n) \bar{x}_{N-n} \hat{\beta} + J'_{N-n} \Gamma_{\bar{A}A} (y_A - X_A \hat{\beta})\Big], \tag{5.3}$$

where $\Gamma_{\bar{A}A} = \Sigma_{ee\bar{A}A} \Sigma_{eeAA}^{-1}$, $\bar{x}_{N-n} = (N - n)^{-1} (N \bar{x}_N - n \bar{x}_n)$, $\Sigma_{ee\bar{A}A} = E\{e_{\bar{A}} e'_A\}$,

$$\hat{\beta} = (X' \Sigma_{eeAA}^{-1} X)^{-1} X' \Sigma_{eeAA}^{-1} y,$$

$e_{\bar{A}} = (e_{n+1}, e_{n+2}, ..., e_N)'$, $J_{N-n}$ is an $N - n$ dimensional column vector of ones, $\bar{x}_n$ is the simple sample mean, and $\bar{A}$ is the set of elements in $U$ that are not in $A$. See Royall (1976). Under the model,

$$\hat{\theta} - \bar{y}_N = C_{x\bar{A}} (\hat{\beta} - \beta) + N^{-1} J'_{N-n} (\Gamma_{\bar{A}A} e_A - e_{\bar{A}})$$

and

$$V\{\hat{\theta} - \bar{y}_N \mid X_A\} = C_{x\bar{A}} V\{\hat{\beta}\} C'_{x\bar{A}} + N^{-2} J'_{N-n} (\Sigma_{ee\bar{A}\bar{A}} - \Gamma_{\bar{A}A} \Sigma_{eeA\bar{A}}) J_{N-n}, \tag{5.4}$$

where

$$C_{x\bar{A}} = N^{-1} [(N - n) \bar{x}_{N-n} - J'_{N-n} \Gamma_{\bar{A}A} X_A].$$

Design consistency of estimator (5.3) and the situations in which the model estimator reduces to the Horvitz-Thompson estimator have been considered by, among others, Isaki (1970), Royall (1970, 1976), Scott and Smith (1974), Cassel, Särndal, and Wretman (1976, 1979, 1983), Zyskind (1976), Tallis (1978), Isaki and Fuller (1982), Wright (1983), Pfefferman (1984), Tam (1986), Brewer, Hanif and Tam (1988), Montanari (1999), and Gerow and McCulloch (2000).

The estimator (5.3) reduces to $\bar{x}_N \hat{\beta}$ if there is an $\eta$ such that

$$X_A \eta = \Sigma_{eeAA} J_n + \Sigma_{eeA\bar{A}} J_{N-n}, \tag{5.5}$$

for all samples with positive probability. If there is also $\gamma$ such that

$$X_A \gamma = \Sigma_{eeAA} D_{\pi}^{-1} J_n \tag{5.6}$$

for all samples with positive probability, then $\hat{\theta}$ of (5.3) is design consistent, where $D_{\pi}$ was defined for (4.14). Given a $k$ such that

$$X_A k = \Sigma_{eeAA} (D_{\pi}^{-1} J_n - J_n) - \Sigma_{eeA\bar{A}} J_{N-n}, \tag{5.7}$$

then $\hat{\theta}$ of (5.3) is expressible as

$$\hat{\theta} = \bar{y}_{\pi} + (\bar{x}_N - \bar{x}_{\pi}) \hat{\beta} \tag{5.8}$$

and if the design is such that $\bar{y}_{\pi}$ is design consistent for $\bar{y}_N$, $\hat{\theta}$ of (5.8) is design consistent for $\bar{y}_N$.

We call a regression model of the form (5.1) for which (5.5) and (5.6), or (5.7), holds a full model. If (5.6) or (5.7) does not hold, we call the model a reduced model or a restricted model. We cannot expect the conditions for a full model to hold for every analysis variable in a general purpose survey because $\Sigma_{ee}$ will be different for different


y's. Therefore, given a reduced model, one might search for a good model estimator in the class of design consistent estimators.

To construct a design consistent estimator of the form $\bar{x}_N \hat{\beta}$ when model (5.1) is a reduced model, we can add a vector satisfying (5.7) to the X-matrix to create a full model. There are two possible situations associated with this approach. In the first, the population mean (or total) of the added variable is known. With known mean, one can construct the usual regression estimator and the usual design variance estimation formulas are appropriate.

To describe an estimation procedure for the situation in which the population mean of the added variable is not known, let $q = (q_1, q_2, ..., q_n)'$ denote the added vector, where $q$ is the vector on the right side of the equality in (5.7). Let $H = (X, q)$, where $X$ is the matrix of auxiliary variables with known population mean vector, $\bar{x}_N$. We write the full model for the sample as

$$y = H \beta_{y \cdot h} + e, \tag{5.9}$$

where $e \sim (0, \Sigma_{ee})$. The best linear conditionally unbiased estimator of $\beta_{y \cdot h}$ is

$$\hat{\beta}_{y \cdot h} = (H' \Sigma_{ee}^{-1} H)^{-1} H' \Sigma_{ee}^{-1} y. \tag{5.10}$$

If the coefficient for $q$ in (5.9) is not zero, it is not possible to construct a conditionally unbiased estimator of $\bar{h}_N \beta_{y \cdot h}$ because the $\bar{q}_N$ component of $\bar{h}_N$ is unknown. However, because $\hat{\beta}_{y \cdot h}$ is unbiased for $\beta_{y \cdot h}$, it is possible to construct a conditionally unbiased estimator of any linear function of $\beta_{y \cdot h}$. Thus, it is natural to replace the unknown $\bar{q}_N$ with the "best available" estimator of $\bar{q}_N$, and a reasonable choice is the regression estimator,

$$\bar{q}_{reg} = \bar{q}_{\pi} + (\bar{x}_N - \bar{x}_{\pi}) \hat{\beta}_{q \cdot x}, \tag{5.11}$$

where $\hat{\beta}_{q \cdot x} = (X' \Sigma_{ee}^{-1} X)^{-1} X' \Sigma_{ee}^{-1} q$. Then the estimator (5.3) becomes

$$\hat{\theta} = \bar{y}_{\pi} + [(\bar{x}_N, \bar{q}_{reg}) - (\bar{x}_{\pi}, \bar{q}_{\pi})] \hat{\beta}_{y \cdot h}. \tag{5.12}$$

The estimator (5.12) can be expressed in the familiar regression estimator form,

$$\bar{y}_{reg} = \bar{y}_{\pi} + (\bar{x}_N - \bar{x}_{\pi}) \hat{\beta}_{y \cdot x}. \tag{5.13}$$

That is, the regression estimator of the finite population mean of $y$ based on the full model, but with the mean of $q_i$ unknown and estimated with the regression estimator, is the regression estimator with $\beta_{y \cdot x}$ estimated by the generalized least squares regression of $y$ on $x$ using the covariance matrix $\Sigma_{ee}$. See Park (2002). The estimator is conditionally model unbiased under the reduced model containing only $x$ if the reduced model is true. If the population coefficient for $q_i$ is not zero, the reduced model is not true. Then the estimator is conditionally model biased, but the estimator is unbiased for the finite population mean under the full model and an unbiased design, because

$$E\{\bar{y}_{reg} - \bar{y}_N\} = E\{E[\bar{y}_{reg} - \bar{y}_N \mid H]\} = E\{(0, \bar{q}_{reg} - \bar{q}_N) \beta_{y \cdot h}\} \doteq 0, \tag{5.14}$$

where $\bar{y}_{reg}$ is defined in (5.12) and the approximation is due to the approximate design expectation of the regression estimator $\bar{q}_{reg}$.

The estimator (5.13) is a linear estimator, where the vector of weights, $w$, minimizes the Lagrangean

$$w' \Sigma_{ee} w + [w' H - (\bar{x}_N, \bar{q}_{reg})] \lambda. \tag{5.15}$$

The estimator is location invariant if the column of ones is in the column space of $X$.

Because the variable $q$ is the variable whose omission from the full model can produce a bias, it seems prudent to test the coefficient of $q$ before using the reduced model to construct an estimator for the mean of $y$. This can be done using a model estimator of the variance,

$$\hat{V}\{\hat{\beta}_{y \cdot h} \mid H\} = (H' \hat{\Sigma}_{ee}^{-1} H)^{-1}$$

or using the design estimator of variance of (4.12). See DuMouchel and Duncan (1983) and Fuller (1984).

A working specification for $\Sigma_{ee}$ may be particularly appropriate for two-stage samples; see Royall (1976, 1986) and Montanari (1987). A reasonable model is that in which there is common correlation among items in the same primary sampling unit and zero correlation between units in different primary sampling units. Because the associated $\Sigma_{ee}$ is block diagonal of a particular form, it is relatively easy to invert and hence the estimator based on such a working $\Phi$ is relatively easy to construct. The regression estimator using a $\Phi$ with a nonzero correlation for units in the same primary sampling unit is a combination of the estimator based on primary sampling unit totals and that based on elements. See Fuller and Battese (1973). Thus, the use of such a $\Phi$ can avoid variance problems associated with the use of primary sampling unit totals.

6. Maximum Likelihood and Raking Ratio

The theoretical foundation for the regression estimators discussed in section 3 and section 4 is maximum likelihood estimation for the linear model with normal errors. We now consider the likelihood for multinomial variables. Given a simple random sample from a multinomial defined by the entries in a two way table, the logarithm of the likelihood, except for a constant, is

$$\sum_{i=1}^{r} \sum_{j=1}^{c} a_{ij} \log p_{ij}, \tag{6.1}$$

where $a_{ij}$ is the estimated fraction in cell $ij$, $p_{ij}$ is the population fraction in cell $ij$, $r$ is the number of rows, and $c$ is the number of columns. If (6.1) is maximized subject to the restriction $\sum \sum p_{ij} = 1$, one obtains the maximum

likelihood estimators $\hat{p}_{ij} = a_{ij}$. If the marginal row fractions $p_{i \cdot, N}$ and the marginal column fractions $p_{\cdot j, N}$ are known, it is natural to maximize the likelihood subject to these constraints by using the Lagrangean

$$\sum_{i=1}^{r} \sum_{j=1}^{c} a_{ij} \log p_{ij} + \sum_{i=1}^{r} \lambda_i \Big(\sum_{j=1}^{c} p_{ij} - p_{i \cdot, N}\Big) + \sum_{j=1}^{c} \lambda_{r+j} \Big(\sum_{i=1}^{r} p_{ij} - p_{\cdot j, N}\Big), \tag{6.2}$$

where $\lambda_i$, $i = 1, 2, ..., r$, are for the row restrictions and $\lambda_{r+j}$, $j = 1, 2, ..., c$, are for the column restrictions. There is no explicit expression for the solution to (6.2) and there may be no solution if there are too many empty cells. A procedure that produces estimates close to the maximum likelihood solution is that called raking ratio or iterative proportional fitting. The procedure iterates, first making ratio adjustments for the row restrictions, then making ratio adjustments for the column restrictions, then making ratio adjustments for the row restrictions, etc. The method is generally credited to Deming and Stephan (1940). See, for example, Bishop, Fienberg and Holland (1975, Chapter 3).

Deville and Särndal (1992) considered a class of objective functions of the form $\sum_{i \in A} G(w_i, \alpha_i)$, where $G(w, \alpha)$ is a measure of distance between an initial weight $\alpha_i$ and a final weight $w_i$. The objective function is minimized subject to the constraints

$$\sum_{i \in A} w_i x_i = \bar{x}_N. \tag{6.3}$$

Deville and Särndal (1992) used the term calibrated to describe weights satisfying (6.3). If the initial weight is $\alpha_i = (\sum_{j \in A} \pi_j^{-1})^{-1} \pi_i^{-1}$ and if one is the first element of $x_i$, the solution to the minimization problem is approximated by a regression estimator of the mean of the form

$$\bar{y}_{reg} = \bar{y}_{\pi} + (\bar{x}_N - \bar{x}_{\pi}) \hat{\beta}, \tag{6.4}$$

where

$$\hat{\beta} = \Big(\sum_{i \in A} x'_i \varphi_{ii}^{-1} x_i\Big)^{-1} \sum_{i \in A} x'_i \varphi_{ii}^{-1} y_i,$$

and $\varphi_{ii}$ is the second derivative of $G(w, \alpha)$ with respect to $w$ evaluated at $(w, \alpha) = (\alpha_i, \alpha_i)$. Using this approach, Deville and Särndal (1992) showed that the maximum likelihood and raking ratio estimators have the same limiting distribution as the regression estimator (4.18) with $\Phi = D_{\pi}$. To obtain the raking ratio weights they used the objective function

$$\sum_{i \in A} [w_i \log(\alpha_i^{-1} w_i) + \alpha_i - w_i], \tag{6.5}$$

and to obtain the maximum likelihood weights they used the objective function

$$\sum_{i \in A} [w_i - \alpha_i - \alpha_i \log(\alpha_i^{-1} w_i)]. \tag{6.6}$$

Deville, Särndal and Sautory (1993) investigated four estimators in the class. Although weights constructed using different functions could differ considerably, the authors concluded that estimates were quite similar, a result consistent with the theory. Singh and Mohl (1996) and Théberge (1999, 2000) discuss estimators with the calibration property.

7. Population of Auxiliary Vectors Known at Estimation Step

If the x-vector is known for all of the population elements, the number of possible regression-type estimators is greatly expanded. Most procedures involve the fitting of an approximating function for the relationship between $y$ and the auxiliary variables. The most used procedure is to assign the population elements to categories on the basis of the auxiliary data and to use these categories as post strata. This procedure is equivalent to approximating the expected value of $y$ given $x$ by a step function. The estimator is formally equivalent to the regression estimator (4.19) where the x-vector is a vector of indicator variables for post-stratum membership.

The application of the procedure often requires the development of criteria to use in forming the post strata. Typically the post strata are formed so that each post stratum contains a minimum number of sample elements and so that the weights for any post stratum are not overly large. Estimation with post strata and the formation of post strata have been studied by Fuller (1966), Holt and Smith (1979), Tremblay (1986), Kalton and Maligalig (1991), Little (1993), Eltinge and Yansaneh (1997), and Lazzeroni and Little (1998), among others. Holt and Smith (1979) argued for the use of a conditional variance estimator for post stratification.

Given the population of x-vectors, one can use the sample to estimate a functional relationship between $y$ and $x$ and then predict the unobserved $y$. If the procedure is to be design consistent, then a condition similar to (4.14) must hold. One way to ensure design consistency is to require the fitted model to satisfy

$$\sum_{i \in A} \pi_i^{-1} [y_i - f(x_i, \hat{\beta})] = 0, \tag{7.1}$$

where $f(x_i, \hat{\beta})$ is the model estimated value for the $i$th observation.

Firth and Bennett (1998) pointed out that some nonlinear models satisfy (7.1). If the initial model does not satisfy (7.1), an estimated intercept term can be added to create an expanded full model,

$$\tilde{f}_F(x_i; \hat{\beta}) = f(x_i; \hat{\beta}) + \Big(\sum_{i \in A} \pi_i^{-1}\Big)^{-1} \sum_{i \in A} \pi_i^{-1} [y_i - f(x_i; \hat{\beta})].$$
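The intercept adjustment in the expanded model can be checked numerically. The following is a minimal sketch under stated assumptions: the data, the inclusion probabilities and the fitted values are made up for illustration, and the function name `expand_fit` is hypothetical and does not come from the paper. It only shows that adding the weighted mean residual to any fitted function forces condition (7.1) to hold.

```python
import math
import random


def expand_fit(y, f_hat, pi):
    """Shift fitted values by the pi^{-1}-weighted mean residual.

    After the shift, sum_i pi_i^{-1} * (y_i - f_full_i) is zero, which is
    the design-consistency condition (7.1).
    """
    w = [1.0 / p for p in pi]                        # design weights pi_i^{-1}
    resid = [yi - fi for yi, fi in zip(y, f_hat)]    # residuals of the initial fit
    shift = sum(wi * ri for wi, ri in zip(w, resid)) / sum(w)
    return [fi + shift for fi in f_hat]


# Hypothetical sample with unequal selection probabilities.
random.seed(0)
x = [random.uniform(1.0, 3.0) for _ in range(50)]
pi = [0.05 + 0.1 * xi / 3.0 for xi in x]             # assumed inclusion probabilities
y = [math.exp(0.5 * xi) + random.gauss(0.0, 0.2) for xi in x]
f_hat = [math.exp(0.45 * xi) for xi in x]            # assumed, slightly misfit, nonlinear fit

f_full = expand_fit(y, f_hat, pi)
weighted_resid_sum = sum((yi - fi) / p for yi, fi, p in zip(y, f_full, pi))
```

In this sketch `weighted_resid_sum` is zero up to floating-point error whatever initial fit is supplied, which is why the adjustment can be applied to a nonlinear model that does not satisfy (7.1) on its own.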


This is a direct extension of the ideas of difference estimation to the nonlinear case. See Isaki (1970), Cassel, Särndal and Wretman (1976) and Wright (1983). A closely related approach was suggested by Wu and Sitter (2001) in which the fitted function $f(x_i, \hat{\beta})$ is used as the auxiliary variable in a linear regression estimator.

A number of "local" procedures, other than step functions, can be used to approximate the functional relationship between $x$ and $y$. Spline functions and polynomials are linear models that fall within the class of section 4. Estimators that use some kind of local smoothing to estimate population quantities have been considered for finite populations from a model viewpoint by Kuo (1988), Dorfman (1993), Dorfman and Hall (1993), Chambers (1996), and Chambers, Dorfman and Wehrly (1993). Breidt and Opsomer (2000) showed that estimators based on local polynomial regression are design consistent. Firth and Bennett (1998) also considered local fit models.

8. Regression Estimation and Nonresponse

Regression estimation is frequently a part of procedures used to adjust data for unit nonresponse. Regression can be justified on the basis of a model such as (3.1) or on the basis that regression can adjust for unequal response probabilities. See Cassel, Särndal and Wretman (1979, 1983), Little (1982, 1986), Bethlehem (1988), Kott (1994), Fuller, Loughin and Baker (1994) and Fuller and An (1998).

Consider an estimator of the population regression vector of the form (4.4) with $\Phi = D_{\pi}$ constructed with the responding units. Denote the estimator by $\tilde{\beta}$ and let $p_i$ be the conditional probability of observing unit $i$ given that the unit is selected for the sample. Then under regularity conditions, the estimator $\tilde{\beta}$ is a consistent estimator of

$$\gamma_N = \Big(\sum_{i \in U} x'_i p_i x_i\Big)^{-1} \sum_{i \in U} x'_i p_i y_i. \tag{8.1}$$

The population mean of $y$ can be expressed as

$$\bar{y}_N = \bar{x}_N \gamma_N + \bar{a}_N \tag{8.2}$$

where $a_i = y_i - x_i \gamma_N$ and $\bar{a}_N$ is the population mean of the $a_i$. The regression estimator $\bar{y}_{reg} = \bar{x}_N \tilde{\beta}$ will be consistent for $\bar{y}_N$ if the probability limit of $\bar{a}_N$ is zero. The probability limit of $\bar{a}_N$ will be zero if the sequence of finite populations is a sequence of random samples from an infinite population in which

$$y_i = x_i \beta + e_i, \tag{8.3}$$

and the $e_i$ are independent of $x_i$ with $E\{e_i \mid x_i\} = 0$.

Alternatively, a sufficient condition for $\bar{a}_N$ to be zero is the existence of a column vector $\xi$ such that

$$x_i \xi = p_i^{-1} \tag{8.4}$$

for $i = 1, 2, ..., N$. Thus, if the reciprocal of the response probability is a linear function of the control variables, the regression estimator is a consistent estimator of the mean of $y$. One way in which (8.4) can be satisfied is for the elements of $x_i$ to be dummy variables that define subgroups and for the response probabilities to be constant in each subgroup.

If (8.4) holds and if the probability of responding is independent from unit to unit, then the estimated variance based on (4.12) is an appropriate estimator for the variance of the regression estimator of the mean. It is particularly important that a variance estimator of the form (4.12) or (4.25), and not of the form (4.26), be used, because $\bar{x}_N - \bar{x}_{\pi}$ is, in general, not $O_p(n^{-1/2})$ in the presence of nonresponse. Singh and Folsom (2000) make a similar argument for the variance estimator (4.25) when using regression to adjust for coverage error.

Often a preliminary adjustment to the selection probabilities is made for nonresponse and this is followed by regression estimation. The most frequently used response adjustment is to form adjustment cells (post strata) and to ratio adjust the weights of respondents in the cell so that the sum of the weights is equal to the estimated (or known) total for the cell. See, for example, Little and Rubin (1987, page 250). Procedures using an estimated response probability function are discussed by Cassel, Särndal and Wretman (1983), Rosenbaum and Rubin (1983), Folsom and Witt (1994), Fuller and An (1998), and Folsom and Singh (2000). Brick, Waksberg and Keeter (1996) use an estimated contact probability to adjust for frame coverage.

To consider procedures based on estimated response probabilities, assume that the inverse of the response probability for individual $i$ is given by

$$p_i^{-1} = g(z_i; \theta^0), \tag{8.5}$$

where $z_i$ is a vector of variables that can be observed for both respondents and nonrespondents, $\theta^0$ is the true value of $\theta$, and $g(z_i; \theta)$ is continuous in $\theta$ with continuous first and second derivatives in an open set containing $\theta^0$ for all $z_i$. The vector $(y_i, x_i, z_i)$ is observed, and we assume that $p_i$ is bounded below by a positive number.

Let $\delta_i$ be the indicator variable with $\delta_i = 1$ if a response is obtained and $\delta_i = 0$ if a response is not obtained. Using the vector $(\delta_i, z_i)$, the parameter $\theta^0$ of the response probability function is estimated. Assume that $\hat{\theta} - \theta^0 = O_p(n^{-1/2})$, where $\hat{\theta}$ is the estimator of $\theta$. Let $\beta_N$ denote the finite population regression vector for the regression of $y$ on $x$. Let

$$\tilde{\beta} = \Big(\sum_{i \in A} x'_i x_i \pi_i^{-1} \hat{p}_i^{-1} \delta_i\Big)^{-1} \sum_{i \in A} x'_i y_i \pi_i^{-1} \hat{p}_i^{-1} \delta_i, \tag{8.6}$$

where $\pi_i$ are the selection probabilities and $\hat{p}_i^{-1} = g(z_i; \hat{\theta})$. Under conditions of the type used in section 4,

$$\tilde{\beta} - \beta_N = M_{xx}^{-1} \sum_{i \in A} \delta_i \pi_i^{-1} p_i^{-1} x'_i a_i [1 + p_i g_{1,i} (\hat{\theta} - \theta^0)] + O_p(n^{-1}),$$

where $g_{1,i}$ is the row vector of first derivatives of $g(z_i; \theta)$ evaluated at $\theta = \theta^0$ and $M_{xx} = \sum_{i \in A} x'_i x_i \pi_i^{-1} p_i^{-1} \delta_i$. If $g_{1,i}$ is uncorrelated with $a_i$, then the term involving $g_{1,i} a_i$ is $O_p(n^{-1})$ and the variance estimator constructed as if $g(z_i; \theta^0)$ is known is appropriate. The conditions are satisfied if $z_i$ is a subvector of $x_i$ and $z_i$ defines imputation cells (adjustment cells) with equal response rates within a cell.

9. Practical Considerations

If the regression weights are to be used in a general purpose survey, no individual weight used in estimating a total should be less than one. Also, it seems reasonable, on robustness grounds, to avoid very large weights. We discuss some procedures that have been developed to accomplish these objectives.

A number of algorithms produce positive weights with a high probability. Raking ratio procedures produce positive weights for most data configurations. Deville, Särndal and Sautory (1993) discuss the extension of raking ratio to general x-variables and extensions to include bounds on the weights.

Tillé (1998) suggested the use of approximate conditional probabilities, conditional on $\bar{x}_{\pi}$, to compute an estimator. His approximation can be extended to produce regression weights that are positive with high probability. Let $\bar{x}_{\pi}^{(i)}$ be an estimator obtained by deleting element $i$, or primary sampling unit $i$, and modifying the remaining weights so that $\bar{x}_{\pi}^{(i)}$ is unbiased, or consistent to the same order as $\bar{x}_{\pi}$, for the population mean of all elements excluding $i$. The estimator $\bar{x}_{\pi}^{(i)}$ can be the estimator used to construct jackknife deviates. Let $\hat{\Sigma}_{xx}$ be an estimator of the covariance matrix of $\bar{x}_{\pi}$ and let $\hat{\Sigma}_{xx(i)}$ be an estimator of the conditional covariance matrix of $\bar{x}_{\pi}^{(i)}$ conditional on $i \in A$. Then, in large samples $\bar{x}_{\pi}$ and $\bar{x}_{\pi}^{(i)}$ are approximately normally distributed and an estimator of the probability that $i$ is in the sample given the estimated mean $\bar{x}_{\pi}$, is

$$\hat{\pi}_{i|A} = \hat{P}\{i \in A \mid F_N, \bar{x}_{\pi}\} = \pi_i |\hat{\Sigma}_{xx}|^{1/2} |\hat{\Sigma}_{xx(i)}|^{-1/2} \exp\{0.5 (G_{xx} - G_{xx(i)})\} \tag{9.1}$$

where

$$G_{xx} = (\bar{x}_{\pi} - \bar{x}_N) \hat{\Sigma}_{xx}^{-1} (\bar{x}_{\pi} - \bar{x}_N)',$$

$$G_{xx(i)} = (\bar{x}_{\pi}^{(i)} - \bar{x}_N^{(i)}) \hat{\Sigma}_{xx(i)}^{-1} (\bar{x}_{\pi}^{(i)} - \bar{x}_N^{(i)})',$$

and $\bar{x}_N^{(i)} = (N - 1)^{-1} (N \bar{x}_N - x_i)$. For simple random sampling, Tillé (1998) showed that the estimator

$$\bar{y}_{p\pi} = N^{-1} \sum_{i \in A} \pi_{i|A}^{-1} y_i, \tag{9.2}$$

where $\pi_{i|A}$ is the conditional probability calculated under the normality assumptions, is approximately equal to the regression estimator. Because the estimator is not calibrated, we suggest a calibrated version obtained by computing the regression estimator with $\hat{\pi}_{i|A}$ as initial weights. The difference between (9.2) and the regression estimator constructed with initial weights $\hat{\pi}_{i|A}$ is $O_p(n^{-1})$. Hence, there is a good chance that the regression weights so constructed will be positive. The variance estimator $\hat{\Sigma}_{xx(i)}$ is relatively simple to compute for stratified samples but may require considerable computation for other cases. Thus one may choose to approximate $\hat{\Sigma}_{xx(i)}$.

Given that the regression weights are being constructed by minimizing an objective function, one can add restrictions to the problem to place bounds on the weights. Huang and Fuller (1978) gave an iterative procedure equivalent to constructing a $\Phi$ matrix at each step that reduces the weight on observations whose current weight deviates from the average by a large absolute amount.

To discuss additional procedures associated with quadratic objective functions, assume we have a working covariance matrix, denoted by $\Phi_{ee}$, for the model (5.1) that is to be used to construct a regression estimator. Let $\alpha$ be the column vector of initial weights and assume $\Phi_{ee} \alpha$ is in the column space of $X$. Then the weights that minimize the conditional model variance are the weights that minimize $w' \Phi_{ee} w$ or, equivalently, that minimize

$$(w - \alpha)' \Phi_{ee} (w - \alpha) \tag{9.3}$$

subject to the constraint

$$w' X = \bar{x}_N. \tag{9.4}$$

Given an objective function, we can add restrictions on the $w_i$ such as

$$L_1 \le w_i \le L_2, \quad i \in A, \tag{9.5}$$

where $L_1$ and $L_2$ are nonnegative constants. Minimizing (9.3), subject to the constraints (9.4) and (9.5), is a quadratic programming problem. The use of quadratic programming was suggested by Husain (1969) and was used by Isaki, Tsay and Fuller (2000).

If a large number of control variables are used, it may not be possible to construct weights satisfying the calibration constraints and also falling within reasonable bounds. The practitioner is faced with making compromises. The most common practice is to drop variables from the model. See Bankier, Rathwell and Maijkowski (1992) and Silva and Skinner (1997). To discuss an alternative procedure, consider the situation in which some of the constraints are required but others can be relaxed. Let the matrix of observations on the auxiliary variables be partitioned as $(X_0, X_2)$, where $X_0$ is the set of variables for which exact constraints are required and $X_2$ is the set for which the constraints can be relaxed. Assume $\Phi_{ee} \alpha$ is in the column space of $X_0$. Then a generalization of (9.3) and (9.4) is the function

$$(w - \alpha)' \Phi_{ee} (w - \alpha) + (w' X_2 - \bar{x}_{2,N}) \Psi (w' X_2 - \bar{x}_{2,N})' \tag{9.6}$$

and the constraint

$$w' X_0 - \bar{x}_{0,N} = 0, \tag{9.7}$$

where $\Phi_{ee}$ and $\Psi$ are positive definite symmetric matrices and $\bar{x}_N = (\bar{x}_{0,N}, \bar{x}_{2,N})$. The $w$ that minimizes (9.6) subject to (9.7) minimizes the mean squared error of the unbiased linear predictor of $\bar{x}_N \beta$ under the mixed model

$$y = X_0 \beta_0 + X_2 \beta_2 + e,$$

where $\beta_2 \sim (0, \Psi)$, $e \sim (0, \Phi_{ee})$, the random vector $\beta_2$ is independent of $e$, and $\beta_0$ is a fixed vector. See Lazzeroni and Little (1998) for the use of random models for post stratification.

The vector $w'$ that minimizes (9.6) subject to restriction (9.7) is

$$w' = \alpha' + (\bar{x}_N - \bar{x}_{\pi}) H_{x \psi x}^{-1} X' \Phi_{ee}^{-1}, \tag{9.8}$$

where

$$H_{x \psi x} = \begin{pmatrix} X'_0 \Phi_{ee}^{-1} X_0 & X'_0 \Phi_{ee}^{-1} X_2 \\ X'_2 \Phi_{ee}^{-1} X_0 & \Psi^{-1} + X'_2 \Phi_{ee}^{-1} X_2 \end{pmatrix}. \tag{9.9}$$

The estimator can be written

$$\bar{y}_{r,reg} = w' y = \bar{y}_{\pi} + (\bar{x}_N - \bar{x}_{\pi}) \hat{\theta}, \tag{9.10}$$

where $\hat{\theta} = H_{x \psi x}^{-1} X' \Phi_{ee}^{-1} y$. See Henderson (1963), Robinson (1991), and Rao (2002, Chapter 6).

Husain (1969) considered (9.6) for a simple random sample from a normal distribution with $X_0 = J$, $\Phi_{ee} = I$, and $\Psi^{-1} = \gamma^{-1} \hat{\Sigma}_{x,22}$, where $\hat{\Sigma}_{x,22}$ is the estimated covariance matrix of $\bar{x}_{2,\pi}$, and $\gamma$ is a constant to be determined. For this case, Husain showed that the optimal $\gamma$ is

$$\gamma_{opt} = [k_2 (1 - R^2)]^{-1} (n - k_2 - 2) R^2, \tag{9.11}$$

where $k_2$ is the dimension of $x_2$ and $R^2$ is the squared multiple correlation coefficient. Bardsley and Chambers (1984) considered the function (9.6), the division of $x_i$ into two components, and studied the behavior of the estimator from a model perspective. The procedure associated with (9.5), (9.6) and (9.7) was used by Isaki, Tsay and Fuller (2000). In that application, the vector $\bar{x}_{0,N}$ contained marginal totals of a multiway table and $\bar{x}_{2,N}$ contained totals for interior cells. Rao and Singh (1997) studied a closely related estimator in which tolerances are given for the difference between the final estimates for elements of $\bar{x}_{2,N}$ and the corresponding elements of $\bar{x}_{2,N}$.

Park (2002) extended Husain's optimality results to a more general $\Psi$. The $x$ vector can be transformed so that $\hat{V}\{\bar{x}_{2,\pi}\}$ for the transformed vector is a diagonal matrix and so that $\tilde{X}'_2 \Phi_{ee}^{-1} \tilde{X}_2$ is a diagonal matrix, where $\tilde{X}_2$ is the part of $X_2$ that is orthogonal to $X_0$ in the metric $\Phi_{ee}$. That is,

$$\tilde{X}_2 = X_2 - X_0 (X'_0 \Phi_{ee}^{-1} X_0)^{-1} X'_0 \Phi_{ee}^{-1} X_2.$$

Then the diagonal $\Psi$ that minimizes the approximate variance has elements

$$\psi_{ii} = (m_{ii} V_{\beta\beta ii})^{-1} \beta_i^2, \tag{9.12}$$

where $m_{ii}$ is the $i$th element of the diagonal matrix $\tilde{X}'_2 \Phi_{ee}^{-1} \tilde{X}_2$ and $V_{\beta\beta ii}$ is the variance of $\hat{\beta}_i$ in the transformed scale. To implement the procedure one must estimate the population parameters or choose realistic values for a general purpose $\Psi$. If one postulates a superpopulation random model for $\beta$, then the $\beta_i^2$ of (9.12) is replaced with $E\{\beta_i^2\}$, where the expectation is the model expectation.

10. Comments

Regression estimation is a flexible and powerful tool for the incorporation of auxiliary information into the estimation process. Closely related procedures, such as raking ratio, have large sample properties equivalent to those of regression estimators. The linearity of such estimators is of paramount importance because it permits the construction of a general purpose data set that provides very good estimators for a wide range of parameters.

Given a concentrated interest in a single y-variable, efficiency gains may be possible by postulating a particular set of auxiliary variables and a particular error covariance matrix. Because of the simple nature of the design consistency requirement, it is easy to test such models for design consistency.

Acknowledgements

This research was partially supported by Cooperative Agreement 43-3AEU-0-80064 between Iowa State University, the U.S. National Agricultural Statistics Service and the U.S. Bureau of the Census. I am deeply indebted to Mingue Park for assistance in literature review, for comments on and repair of theoretical results, and for use of material from his thesis. I thank Michael Hidiroglou, J.N.K. Rao, Harold Mantel, and Jean Opsomer for useful comments on drafts of the manuscript. I thank the Associate Editor for numerous comments that improved the presentation.

Appendix

This appendix contains theorems supporting the limiting properties of the regression estimators discussed in section 4.

Theorem A.1. Let $\{U_N, F_N, A_N, n_N : N = k + 3, k + 4, ...\}$ be a sequence of finite populations and samples, where $F_N$ is a sample from an infinite population with eighth moments, $A_N$ is the sample of size $n_N$ selected from the $N$th population. Let $\hat{\beta}$ be defined by (4.4) of the text, and let

Statistics Canada, Catalogue No. 12­001­XIE 18 Fuller: Regression Estimation for Survey Samples

$$ \hat{Q}_{zz} = n^{-1} Z' \Phi^{-1} Z, $$

where $\Phi$ is a positive definite symmetric $n \times n$ matrix that may be a function of $X$ but not of $y$, $Z$ is defined following (4.2), and we omit the subscript $N$ on sample quantities. Assume $\hat{Q}_{zz}$ is positive definite with probability one. If $\Phi$ is random, assume the rows of $\Phi^{-1} Z$ have bounded fourth moments. Assume the selection probabilities satisfy

$$ 0 < K_1 < N n^{-1} \pi_i < K_2, $$

where $\pi_i$ are the selection probabilities. Assume the sample design is such that for any $z$ with bounded fourth moments

$$ [(\bar{z}_{HT} - \bar{z}_N)', (\hat{Q}_{zz} - Q_{zzN})] \mid F_N = O_p(n^{-1/2}), \tag{A.1} $$

where

$$ \bar{z}_{HT} = (\bar{y}_{HT}, \bar{x}_{HT}) = N^{-1} \sum_{i \in A} \pi_i^{-1} z_i, \tag{A.2} $$

$Q_{zzN} = E\{\hat{Q}_{zz} \mid F_N\}$, $\bar{z}_N$ is the finite population mean of $z$, $Q_{zzN}$ is a positive definite matrix for the $N$th population, and the limit of $Q_{zzN}$ is positive definite. Then

$$ \hat{\beta} - \beta_N \mid F_N = Q_{xxN}^{-1} \bar{b}_{HT}' + O_p(n^{-1}), \tag{A.3} $$

where $\beta_N = Q_{xxN}^{-1} Q_{xyN}$, $\bar{b}_{HT} = N^{-1} \sum_{i \in A} \pi_i^{-1} b_i$, $b_i' = n^{-1} N \pi_i \zeta_i' e_i$,

$$ Q_{zzN} = \begin{pmatrix} Q_{yyN} & Q_{yxN} \\ Q_{xyN} & Q_{xxN} \end{pmatrix}, \tag{A.4} $$

$e_i = y_i - x_i \beta_N$, and $\zeta_i'$ is column $i$ of $X' \Phi^{-1}$. Assume the design is such that

$$ V_{zz}^{-1/2} \{\bar{z}_{HT} - \bar{z}_N \mid F_N\} \xrightarrow{L} N(0, I), \tag{A.5} $$

as $n_N \to \infty$ for any $z$ with finite fourth moments, where $V_{zz}$ is the covariance matrix of $\bar{z}_{HT} - \bar{z}_N$. Assume that $V_{zz}$ is $O(n^{-1})$ and that the design admits an estimator $\hat{V}_{zz}$ such that

$$ n (\hat{V}_{zz} - V_{zz}) \mid F_N = o_p(1) \tag{A.6} $$

for any $z$ with bounded fourth moments. Then

$$ [\hat{V}\{\hat{\beta}\}]^{-1/2} [\hat{\beta} - \beta_N] \mid F_N \xrightarrow{L} N(0, I), \tag{A.7} $$

where

$$ \hat{V}\{\hat{\beta}\} = \hat{Q}_{xx}^{-1} \hat{V}_{bb} \hat{Q}_{xx}^{-1}, \tag{A.8} $$

$\hat{V}_{bb} = \hat{V}\{\bar{b}_{HT}'\}$ is the estimated design variance of $\bar{b}_{HT}'$ calculated with $\hat{b}_i' = n^{-1} N \pi_i \zeta_i' \hat{e}_i$ and $\hat{e}_i = y_i - x_i \hat{\beta}$.

Proof. The error in $\hat{\beta}$ is

$$ \hat{\beta} - \beta_N = (X' \Phi^{-1} X)^{-1} [X' \Phi^{-1} y - X' \Phi^{-1} X \beta_N] = \hat{Q}_{xx}^{-1} (n^{-1} X' \Phi^{-1} e). $$

Now $\hat{\beta}$ is a generalized least squares estimator. Therefore

$$ \hat{e}' \Phi^{-1} X = (y - X \hat{\beta})' \Phi^{-1} X = 0 $$

and $Q_{yxN} - \beta_N' Q_{xxN} = Q_{exN} = 0$. By assumption (A.1), $\hat{Q}_{ex}' = n^{-1} X' \Phi^{-1} e = O_p(n^{-1/2})$. Thus

$$ \hat{\beta} - \beta_N = Q_{xxN}^{-1} \Big( n^{-1} \sum_{i \in A} \zeta_i' e_i \Big) + O_p(n^{-1}) = Q_{xxN}^{-1} \Big( N^{-1} \sum_{i \in A} \pi_i^{-1} b_i' \Big) + O_p(n^{-1}). $$

The $b_i$ have bounded fourth moments by the assumptions. Thus, by assumption (A.5),

$$ V_{\beta\beta}^{-1/2} (\hat{\beta} - \beta_N) \xrightarrow{L} N(0, I), $$

where $V_{\beta\beta} = Q_{xxN}^{-1} V_{bb} Q_{xxN}^{-1}$ and $V_{bb} = V\{\bar{b}_{HT}\}$. Now

$$ n^{-1} X' \Phi^{-1} \hat{e} = n^{-1} X' \Phi^{-1} e + n^{-1} X' \Phi^{-1} X (\beta_N - \hat{\beta}) =: N^{-1} \sum_{i \in A} \pi_i^{-1} b_i' + N^{-1} \sum_{i \in A} \pi_i^{-1} h_i', $$

where

$$ h_i' = n^{-1} N \pi_i \zeta_i' x_i \delta_\beta $$

and $\delta_\beta = \beta_N - \hat{\beta}$. For any fixed $\delta$, by (A.6), the estimated variance of $N^{-1} \sum_{i \in A} \pi_i^{-1} (b_i' + h_i')$ is consistent for the variance of the estimator of the mean of $b + h$. By assumption, the elements of $\zeta_i' x_i$ have fourth moments. For a fixed $\delta$ the variance of $\bar{h}_{HT}$ is $O(n^{-1})$. For $\delta = \delta_\beta$,

$$ \hat{V}\{\bar{h}_{HT}'\} = o_p(n^{-1}) $$

and

$$ \hat{V}\{\bar{b}_{HT}'\} = V\{\bar{b}_{HT}'\} + o_p(n^{-1}) $$

because $\delta_\beta = O_p(n^{-1/2})$. Result (A.7) then follows from the asymptotic normality of $\hat{\beta} - \beta_N$.

Theorem A.2. Let $y' = (y_1, y_2, \ldots, y_n)$ and $X' = (x_1', x_2', \ldots, x_n')$. Let $\Phi$ be a nonsingular symmetric $n \times n$ matrix and let $\Phi_N$ be a nonsingular symmetric $N \times N$ matrix. Let

$$ \bar{y}_\pi, \quad \bar{x}_\pi, \quad n^{-1} (X' \Phi^{-1} X) \quad \text{and} \quad n^{-1} X' \Phi^{-1} y $$

be design consistent estimators for finite population characteristics $\bar{y}_N$, $\bar{x}_N$, $Q_{xxN}$ and $Q_{xyN}$, respectively, where


$$ [Q_{xxN}, Q_{xyN}] = [N^{-1} X_N' \Phi_N^{-1} X_N, \; N^{-1} X_N' \Phi_N^{-1} y_N]. \tag{A.9} $$

Let $\beta_N = Q_{xxN}^{-1} Q_{xyN}$. Let there be a sequence of column vectors $\{\gamma_N\}$ such that

$$ X \gamma_N = \Phi D_\pi^{-1} J_n \tag{A.10} $$

for all possible samples, where $D_\pi = \mathrm{diag}(\pi_1, \pi_2, \ldots, \pi_n)$ and $J_n$ is an $n$-dimensional column vector of ones. Then, the regression estimator $\bar{x}_N \hat{\beta}$ with

$$ \hat{\beta} = (X' \Phi^{-1} X)^{-1} X' \Phi^{-1} y, \tag{A.11} $$

is a design consistent estimator of $\bar{y}_N$.

Proof. If $\hat{\beta}$ is defined by (A.11), then by the properties of generalized least squares estimators,

$$ (y - X \hat{\beta})' \Phi^{-1} X = 0. $$

If (A.10) holds, then

$$ (y - X \hat{\beta})' D_\pi^{-1} J_n = \sum_{i \in A} \pi_i^{-1} (y_i - x_i \hat{\beta}) = 0. $$

It follows that $\bar{y}_{reg}$ is design consistent because

$$ 0 = \operatorname*{plim}_{N \to \infty} \{(\bar{y}_\pi - \bar{x}_\pi \hat{\beta}) \mid F_N\} = \operatorname*{plim}_{N \to \infty} \{(\bar{y}_\pi - \bar{x}_\pi \beta_N) \mid F_N\} = \operatorname*{plim}_{N \to \infty} \{(\bar{y}_N - \bar{x}_N \beta_N) \mid F_N\}. $$

Theorem A.3. Let a sequence of populations and samples be as defined in Theorem A.1. Let $z_i$ be a vector of the form $z_i = (y_i, 1, x_{1,i})$ and let $z_{1,i} = (y_i, x_{1,i})$. Assume $\bar{z}_{1,\pi}$ is a design consistent estimator of the population mean $\bar{z}_{1,N}$ with nonsingular covariance matrix

$$ V\{\bar{z}_{1,\pi} \mid F_N\} = O(n^{-1}) \tag{A.12} $$

and

$$ n^{1/2} (\bar{z}_{1,\pi} - \bar{z}_{1,N}) \mid F_N \xrightarrow{L} N(0, \Sigma_{zz}), \tag{A.13} $$

where $\Sigma_{zz}$ is the limit of $n V\{\bar{z}_{1,\pi} \mid F_N\}$. Assume there is an estimator of the variance of $\bar{z}_{1,\pi}$, denoted by $\hat{V}\{\bar{z}_{1,\pi}\}$, such that

$$ \operatorname*{plim}_{N \to \infty} n^{1+\delta} (\hat{V}\{\bar{z}_{1,\pi}\} - V\{\bar{z}_{1,\pi} \mid F_N\}) = 0 \tag{A.14} $$

for some $\delta > 0$. Let $\hat{\beta}_{1,dopt}$ be the vector that minimizes

$$ \hat{V}\{\bar{y}_\pi - \bar{x}_{1,\pi} \beta_{1,d}\} \tag{A.15} $$

and let $\beta_{1,dopt}$ be the vector that minimizes $V\{\bar{y}_\pi - \bar{x}_{1,\pi} \beta_{1,d}\}$. Let $\bar{y}_{d,reg}$ be defined by (4.29). Then $\bar{y}_{d,reg}$ has the minimum limit variance for design consistent estimators of the form $\bar{y}_\pi + (\bar{x}_{1,N} - \bar{x}_{1,\pi}) \beta_{1,d}$. Also

$$ [\hat{V}\{\bar{e}_\pi\}]^{-1/2} (\bar{y}_{d,reg} - \bar{y}_N) \xrightarrow{L} N(0, 1), \tag{A.16} $$

where $\hat{V}\{\bar{e}_\pi\}$ is the estimator of (A.14) constructed with $\hat{e}_i = y_i - \bar{y}_\pi - (x_{1,i} - \bar{x}_{1,\pi}) \hat{\beta}_{1,dopt}$.

Proof. The estimator

$$ \hat{\beta}_{1,dopt} = [\hat{V}\{\bar{x}_{1,\pi}\}]^{-1} \hat{C}\{\bar{x}_{1,\pi}, \bar{y}_\pi\} $$

minimizes the estimated variance of (A.15), and, by assumption (A.14), the estimated variance is consistent for the true variance. Hence, $\hat{\beta}_{1,dopt}$ is design consistent for $\beta_{1,dopt}$ and $\beta_{1,dopt}$ minimizes $V\{\bar{y}_\pi - \bar{x}_{1,\pi} \beta\}$. Therefore, no estimator of the form (4.29) has a limit distribution with smaller variance.

Now

$$ \bar{y}_{d,reg} - \bar{y}_N = \bar{y}_\pi - \bar{y}_N + (\bar{x}_{1,N} - \bar{x}_{1,\pi}) \hat{\beta}_{1,dopt} = \bar{e}_\pi + o_p(n^{-1/2}), $$

where $e_i = y_i - \bar{y}_N - (x_{1,i} - \bar{x}_{1,N}) \beta_{1,dopt}$. Therefore the variance of the limiting distribution of $n^{1/2} (\bar{y}_{d,reg} - \bar{y}_N)$ is the variance of $n^{1/2} (\bar{e}_\pi - \bar{e}_N)$. By assumption (A.14), the estimator $\hat{V}\{\bar{z}_\pi \gamma\}$ is a consistent variance estimator of $V\{\bar{z}_\pi \gamma\}$ for any fixed $\gamma$. Because $\hat{\beta}_{1,dopt} - \beta_{1,dopt} = o_p(1)$, the estimated variance based on $\hat{e}_i$ converges to the estimated variance based on $e_i$ and (A.16) holds.

References

Anderson, C., and Nordberg, L. (1998). A user's guide to CLAN97. Statistics Sweden, Orebro, Sweden.

Bankier, M.D., Rathwell, S. and Majkowski, M. (1992). Two step generalized least squares estimation in the 1991 Canadian Census. Working Paper-Methodology Branch, Census Operations Section, Social Survey Methods Division. Statistics Canada, Ottawa.

Bankier, M.D., Houle, A.M. and Luc, M. (1997). Calibration estimation in the 1991 and 1996 Canadian census. Statistics Canada (draft), 8 pages.

Bardsley, P., and Chambers, R.L. (1984). Multipurpose estimation from unbalanced samples. Applied Statistics, 33, 290-299.

Bethlehem, J.G. (1988). Reduction of nonresponse bias through regression estimation. Journal of Official Statistics, 4, 251-260.

Bethlehem, J.G., and Keller, W.J. (1987). Linear weighting of sample survey data. Journal of Official Statistics, 3, 141-153.

Bishop, Y.M.M., Fienberg, S.E. and Holland, P.W. (1975). Discrete Multivariate Analysis: Theory and Practice. MIT Press, Cambridge, MA.

Breidt, F.J., and Opsomer, J.D. (2000). Local polynomial regression estimators in survey sampling. The Annals of Statistics, 28, 1026-1053.

Brewer, K.R.W. (1963). Ratio estimation and finite populations: Some results deducible from the assumption of an underlying stochastic process. Australian Journal of Statistics, 5, 93-105.


Brewer, K.R.W. (1979). A class of robust sampling designs for large-scale surveys. Journal of the American Statistical Association, 74, 911-915.

Brewer, K.R.W., Hanif, M. and Tam, S.M. (1988). How nearly can model-based prediction and design-based estimation be reconciled? Journal of the American Statistical Association, 83, 128-132.

Brick, J.M., Waksberg, J. and Keeter, S. (1996). Using data on interruptions in telephone service as coverage adjustments. Survey Methodology, 22, 185-197.

Cassel, C.M., Särndal, C.-E. and Wretman, J.H. (1976). Some results on generalized difference estimation and generalized regression estimation for finite populations. Biometrika, 63, 615-620.

Cassel, C.M., Särndal, C.-E. and Wretman, J.H. (1979). Prediction theory for finite populations when model-based and design-based principles are combined. Scandinavian Journal of Statistics, 6, 97-106.

Cassel, C.M., Särndal, C.-E. and Wretman, J.H. (1983). Some uses of statistical models in connection with the nonresponse problem. In Incomplete Data in Sample Surveys, (Eds. W.G. Madow, I. Olkin, and D. Rubin). New York: Academic Press, 3, 143-160.

Chambers, R.L. (1996). Robust case-weighting for multipurpose establishment surveys. Journal of Official Statistics, 12, 3-32.

Chambers, R.L., Dorfman, A.H. and Wehrly, T.E. (1993). Bias robust estimation in finite populations using nonparametric calibration. Journal of the American Statistical Association, 88, 268-277.

Cochran, W.G. (1939). The use of the analysis of variance in enumeration by sampling. Journal of the American Statistical Association, 34, 492-510.

Cochran, W.G. (1942). Sampling theory when the sampling units are of unequal sizes. Journal of the American Statistical Association, 37, 199-212.

Cochran, W.G. (1946). Relative accuracy of systematic and stratified random samples for a certain class of populations. Annals of Mathematical Statistics, 17, 164-177.

Cochran, W.G. (1977). Sampling Techniques, 3rd ed., New York: John Wiley & Sons, Inc.

Cook, R.D., and Weisberg, S. (1982). Residuals and Influence in Regression. New York: Chapman and Hall.

Deming, W.E., and Stephan, F.F. (1940). On a least squares adjustment of a sampled frequency table when the expected marginal totals are known. Annals of Mathematical Statistics, 11, 427-444.

Deming, W.E., and Stephan, F.F. (1941). On the interpretation of censuses as samples. Journal of the American Statistical Association, 36, 45-49.

Deville, J., and Särndal, C.-E. (1992). Calibration estimators in survey sampling. Journal of the American Statistical Association, 87, 376-382.

Deville, J., Särndal, C.-E. and Sautory, O. (1993). Generalized raking procedures in survey sampling. Journal of the American Statistical Association, 88, 1013-1020.

Dorfman, A.H. (1993). A comparison of design-based and model-based estimators of the finite population distribution function. Australian Journal of Statistics, 35, 29-41.

Dorfman, A.H., and Hall, P. (1993). Estimators of the finite population distribution function using nonparametric regression. Annals of Statistics, 21, 1452-1475.

Duchesne, P. (2000). A note on jackknife variance estimation for the general regression estimator. Journal of Official Statistics, 16, 133-138.

Du Mouchel, W.H., and Duncan, G.J. (1983). Using survey weights in multiple regression analysis of stratified samples. Journal of the American Statistical Association, 78, 535-543.

Eltinge, J.L., and Yansaneh, I.S. (1997). Diagnostics for formation of nonresponse adjustment cells, with an application to income nonresponse in the U.S. Consumer Expenditure Survey. Survey Methodology, 23, 33-40.

Ericson, W.A. (1969). Subjective Bayesian models in sampling finite populations. Journal of the Royal Statistical Society, Series B, 31, 195-224.

Estevao, V., Hidiroglou, M.A. and Särndal, C.-E. (1995). Methodological principles for a generalized estimation system at Statistics Canada. Journal of Official Statistics, 11, 181-204.

Firth, D., and Bennett, K.E. (1998). Robust models in probability sampling. Journal of the Royal Statistical Society, Series B, 60, 3-21.

Folsom, R.E., and Witt, M.B. (1994). Testing a new attrition nonresponse adjustment method for SIPP. Proceedings of the Section on Survey Research Methods, American Statistical Association, 428-433.

Folsom, R.E., and Singh, A.C. (2000). The generalized exponential model for a unified approach to sampling weight calibration for outlier weight treatment, nonresponse adjustment and post-stratification. Proceedings of the Section on Survey Research Methods, American Statistical Association, 598-603.

Frankel, M.R. (1971). Inference from survey samples: An empirical investigation. Institute for Social Research, University of Michigan, Ann Arbor.

Fuller, W.A. (1966). Estimation employing post strata. Journal of the American Statistical Association, 61, 1172-1183.

Fuller, W.A. (1973). Regression for sample surveys. Paper presented at meeting of International Statistical Institute. August, 1973, Vienna, Austria.

Fuller, W.A. (1975). Regression analysis for sample survey. Sankhyā, Series C, 37, 117-132.

Fuller, W.A. (1984). Least squares and related analyses for complex survey designs. Survey Methodology, 10, 97-118.

Fuller, W.A., and An, A.B. (1998). Regression adjustments for nonresponse. Journal of the Indian Society of Agricultural Statistics, 51, 331-342.


Fuller, W.A., and Battese, G.E. (1973). Transformations for estimation of linear models with nested-error structure. Journal of the American Statistical Association, 68, 626-632.

Fuller, W.A., and Isaki, C.T. (1981). Survey design under superpopulation models. In Current Topics in Survey Sampling, (Eds., D. Krewski, J.N.K. Rao and R. Platek), New York: Academic Press, 199-226.

Fuller, W.A., Loughin, M.M. and Baker, H.D. (1994). Regression weighting for the 1987-88 National Food Consumption Survey. Survey Methodology, 20, 75-85.

Fuller, W.A., and Rao, J.N.K. (2001). A regression composite estimator with application to the Canadian labour force survey. Survey Methodology, 27, 45-52.

Gambino, J., Kennedy, B. and Singh, M.P. (2001). Regression composite estimation for the Canadian labour force survey: Evaluation and implementation. Survey Methodology, 27, 65-74.

Gerow, K., and McCulloch, C.E. (2000). Simultaneously model unbiased, design-unbiased estimation. Biometrics, 56, 873-878.

Godambe, V.P. (1955). A unified theory of sampling from finite populations. Journal of the Royal Statistical Society, Series B, 17, 269-278.

Godambe, V.P., and Joshi, V.M. (1965). Admissibility and Bayes estimation in sampling finite populations, 1. Annals of Mathematical Statistics, 36, 1707-1722.

Goldberger, A.S. (1962). Best linear unbiased prediction in the generalized linear regression model. Journal of the American Statistical Association, 57, 369-375.

Graybill, F.A. (1976). Theory and application of the linear model. Wadsworth, Belmont, CA.

Hájek, J. (1959). Optimum strategy and other problems in probability sampling. Casopis Pro Pestovani Matematiky, 84, 387-423.

Hanurav, T.V. (1966). Some aspects of unified sampling theory. Sankhyā, Series A, 28, 175-204.

Henderson, C.R. (1963). Selection index and expected genetic advance. In Statistical Genetics and Plant Breeding, 141-163. National Academy Sciences, National Research Council Publication 982, Washington, DC.

Harville, D.A. (1976). Extension of the Gauss-Markov Theorem to include estimation of random effects. Annals of Statistics, 4, 384-395.

Hidiroglou, M.A. (1974). Estimation of regression parameters for finite populations. Ph.D. thesis, Iowa State University, Ames, Iowa.

Hidiroglou, M.A., Fuller, W.A. and Hickman, R.D. (1978). Super Carp, (sixth edition, 1980) Survey Section, Statistical Laboratory, Iowa State University, Ames, Iowa.

Hidiroglou, M.A., Särndal, C.-E. and Binder, D.A. (1995). Weighting and estimation in business surveys. Business Survey Methods, (Eds. Cox, Binder, Chinnappa, Colledge and Kott) New York: John Wiley & Sons, Inc., 477-502.

Holt, D., and Smith, T.M.F. (1979). Post Stratification. Journal of the Royal Statistical Society, Series A, 142, 33-46.

Horn, S.D., Horn, R.A. and Duncan, D.B. (1975). Estimating heteroscedastic variances in linear models. Journal of the American Statistical Association, 70, 380-385.

Huang, E.T., and Fuller, W.A. (1978). Nonnegative regression estimation for sample survey data. Proceedings of the social statistics section, American Statistical Association, 300-305.

Husain, M. (1969). Construction of regression weights for estimation in sample surveys. Unpublished M.S. thesis, Iowa State University, Ames, Iowa.

Isaki, C.T. (1970). Survey designs utilizing prior information. Unpublished Ph.D. thesis. Iowa State University.

Isaki, C.T., and Fuller, W.A. (1982). Survey design under the regression superpopulation model. Journal of the American Statistical Association, 77, 89-96.

Isaki, C.T., Tsay, J.H. and Fuller, W.A. (2000). Estimation of census adjustment factors. Survey Methodology, 26, 31-42.

Jessen, R.J. (1942). Statistical investigation of a sample survey for obtaining farm facts. Iowa Agriculture Experiment Station Research Bulletin, 304.

Kalton, G., and Maligalig, D.S. (1991). A Comparison of Methods of Weighting Adjustment for Nonresponse. Proceedings of the 1991 Annual Research Conference, U.S. Bureau of the Census, 409-428.

Kish, L., and Frankel, M.R. (1974). Inference from complex samples (with discussion). Journal of the Royal Statistical Society, Series B, 36, 1-37.

Konijn, H.S. (1962). Regression analysis for sample surveys. Journal of the American Statistical Association, 57, 590-606.

Kott, P.S. (1994). A note on handling nonresponse in sample surveys. Journal of the American Statistical Association, 89, 693-696.

Kuo, L. (1988). Classical and prediction approaches to estimating distribution function from survey data. Proceedings of the Section on Survey Research Methods, American Statistical Association, 280-285.

Lazzeroni, L.C., and Little, R.J.A. (1998). Random-effects models for smoothing post-stratification weights. Journal of Official Statistics, 14, 61-78.

Little, R.J.A. (1982). Models for nonresponse in sample surveys. Journal of the American Statistical Association, 77, 237-250.

Little, R.J.A. (1986). Survey nonresponse adjustments for estimates of means. International Statistical Review, 54, 139-157.

Little, R.J.A., and Rubin, D.B. (1987). Statistical Analysis with Missing Data. New York: John Wiley & Sons, Inc.

Little, R.J.A. (1993). Post-stratification: A modeler's perspective. Journal of the American Statistical Association, 88, 1001-1012.

Madow, W.G., and Madow, L.H. (1944). On the theory of systematic sampling. Annals of Mathematical Statistics, 15, 1-24.


Mickey, M.R. (1959). Some finite population unbiased ratio and regression estimators. Journal of the American Statistical Association, 54, 594-612.

Montanari, G.E. (1987). Post-sampling efficient Q-R prediction in large-sample surveys. International Statistical Review, 55, 191-202.

Montanari, G.E. (1999). A study on the conditional properties of finite population mean estimators. Metron, 57, 21-35.

Mukhopadhyay, P. (1993). Estimation of a finite population total under regression models: A review. Sankhyā, 55, 141-155.

Nieuwenbroek, N., Renssen, R. and Hofman, L. (2000). Towards a generalized weighting system. In Proceedings of the Second International Conference on Establishment Surveys, American Statistical Association, Alexandria, Virginia.

Park, M. (2002). Regression estimation of the mean in Survey Sampling. Unpublished Ph.D. dissertation, Iowa State University, Ames, Iowa.

Pfeffermann, D. (1984). Note on large sample properties of balanced samples. Journal of the Royal Statistical Society, Series B, 46, 38-41.

Rao, J.N.K. (1994). Estimating totals and distribution functions using auxiliary information at the estimation stage. Journal of Official Statistics, 10, 153-165.

Rao, J.N.K. (2002). Small Area Estimation Theory and Methods, New York: John Wiley & Sons, Inc.

Rao, J.N.K., Hartley, H.O. and Cochran, W.G. (1962). On a simple procedure of unequal probability sampling without replacement. Journal of the Royal Statistical Society, Series B, 24, 482-491.

Rao, J.N.K., and Singh, A.C. (1997). A ridge shrinkage method for range restricted weight calibration in survey sampling. Proceedings of the section on survey research methods, American Statistical Association, 57-64.

Robinson, G.K. (1991). The BLUP is a good thing: The estimation of random effects. Statistical Science, 6, 15-32.

Robinson, P.M., and Särndal, C.-E. (1983). Asymptotic properties of the generalized regression estimator in probability sampling. Sankhyā, Series B, 45, 240-248.

Rosenbaum, P.R., and Rubin, D.B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70, 41-55.

Royall, R.M. (1970). On finite population sampling theory under certain linear regression models. Biometrika, 57, 377-387.

Royall, R.M. (1976). The linear least squares prediction approach to two-stage sampling. Journal of the American Statistical Association, 71, 657-664.

Royall, R.M. (1986). The prediction approach to robust variance estimation in two stage cluster sampling. Journal of the American Statistical Association, 81, 119-123.

Royall, R.M., and Cumberland, W.G. (1978). Variance estimation in finite population sampling. Journal of the American Statistical Association, 73, 351-358.

Royall, R.M., and Cumberland, W.G. (1981). The finite population linear regression estimator and estimators of its variance, an empirical study. Journal of the American Statistical Association, 76, 924-930.

Särndal, C.-E. (1980). On π-weighting versus best linear unbiased weighting in probability sampling. Biometrika, 67, 639-650.

Särndal, C.-E. (1982). Implications of survey design for generalized regression estimation of linear functions. Journal of Statistical Planning and Inference, 7, 155-170.

Särndal, C.-E. (1996). Efficient estimators with simple variance in unequal probability sampling. Journal of the American Statistical Association, 91, 1289-1300.

Särndal, C.-E., Swensson, B. and Wretman, J.H. (1989). The weighted residual technique for estimating the variance of the general regression estimator of the finite population total. Biometrika, 76, 527-537.

Särndal, C.-E., Swensson, B. and Wretman, J.H. (1992). Model Assisted Survey Sampling. New York: Springer-Verlag.

Särndal, C.-E., and Wright, R. (1984). Cosmetic form of estimators in survey sampling. Scandinavian Journal of Statistics, 11, 146-156.

Scott, A., and Smith, T.M.F. (1974). Linear superpopulation models in survey and sampling. Sankhyā, C, 36, 143-146.

Scott, A., and Wu, C.F. (1981). On the asymptotic distribution of ratio and regression estimators. Journal of the American Statistical Association, 76, 98-102.

Silva, P.L.D.N., and Skinner, C.J. (1997). Variable selection for regression estimation in finite populations. Survey Methodology, 23, 23-32.

Singh, A.C., and Folsom, R.E. (2000). Bias corrected estimating functions approach for variance estimation adjusted for poststratification. Proceedings of the Section on Survey Research Methods, American Statistical Association, 610-615.

Singh, A.C., Kennedy, B. and Wu, S. (2001). Regression composite estimation for the Canadian Labour Force Survey with a rotating design. Survey Methodology, 27, 33-44.

Singh, A.C., and Mohl, C.A. (1996). Understanding calibration estimators in survey sampling. Survey Methodology, 22, 107-115.

Tallis, G.M. (1978). Note on robust estimation in finite populations. Sankhyā, C, 40, 136-138.

Tam, S.M. (1986). Characterization of best model-based predictors in survey sampling. Biometrika, 73, 232-235.

Théberge, A. (1999). Extensions of calibration estimators in survey sampling. Journal of the American Statistical Association, 94, 635-644.

Théberge, A. (2000). Calibration and restricted weights. Survey Methodology, 26, 99-107.

Tillé, Y. (1998). Estimation in surveys using conditional inclusion probabilities: Simple random sampling. International Statistical Review, 66, 303-322.

Tremblay, V. (1986). Practical Criteria for Definition of Weighting Classes. Survey Methodology, 12, 85-97.


Watson, D.J. (1937). The estimation of leaf area in field crops. Journal of Agricultural Science, 27, 474-483.

Woodruff, R.S., and Causey, B.D. (1976). Computerized method for approximating the variance of a complicated estimate. Journal of the American Statistical Association, 71, 315-321.

Wright, R.L. (1983). Finite population sampling with multivariate auxiliary information. Journal of the American Statistical Association, 78, 879-884.

Wu, C., and Sitter, R.R. (2001). A model-calibration approach to using complete auxiliary information from survey data. Journal of the American Statistical Association, 96, 185-193.

Yates, F. (1949). Sampling Methods for Census and Surveys. London: Griffin.

Yung, W., and Rao, J.N.K. (1996). Jackknife linearization variance estimators under stratified multi-stage sampling. Survey Methodology, 22, 23-31.

Zyskind, G. (1976). On canonical forms, non-negative covariance matrices and best and simple least squares linear estimators in linear models. Annals of Mathematical Statistics, 38, 1092-1109.
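As a numerical illustration of the weight construction in (9.8) to (9.10), the following Python sketch computes the weights for a small simulated sample. It is a minimal sketch under stated assumptions, not the procedure used in any of the applications cited in the text: the function name `ridge_regression_weights`, the simulated data, the equal initial weights (alpha_i = 1/n, as for an equal-probability sample), the diagonal Psi, and the population means in `xbar_N` are all hypothetical values introduced for the example.

```python
import numpy as np


def ridge_regression_weights(X0, X2, Phi_ee, Psi, alpha, xbar_N):
    """Weights w of (9.8), with H built as in (9.9).

    X0, X2      : sample matrices for the exactly and loosely controlled variables
    Phi_ee, Psi : positive definite symmetric matrices
    alpha       : initial weights (assumed to give the design estimator of the mean)
    xbar_N      : population means (xbar_{0,N}, xbar_{2,N})
    """
    X = np.hstack([X0, X2])
    k0 = X0.shape[1]
    Phi_inv_X = np.linalg.solve(Phi_ee, X)       # Phi_ee^{-1} X
    H = X.T @ Phi_inv_X                          # X' Phi_ee^{-1} X
    H[k0:, k0:] += np.linalg.inv(Psi)            # Psi^{-1} enters the X2 block only
    xbar_pi = alpha @ X                          # initial estimate of the x means
    adj = np.linalg.solve(H, xbar_N - xbar_pi)   # H^{-1} (xbar_N - xbar_pi)'
    w = alpha + Phi_inv_X @ adj                  # (9.8), written as a column vector
    return w, H


# Hypothetical data: equal-probability sample of size n = 8.
rng = np.random.default_rng(0)
n = 8
X0 = np.ones((n, 1))                             # X0 = J
X2 = rng.normal(size=(n, 2))
y = 1.0 + X2 @ np.array([0.5, -0.3]) + rng.normal(scale=0.2, size=n)
alpha = np.full(n, 1.0 / n)                      # assumed initial weights
xbar_N = np.array([1.0, 0.1, -0.2])              # assumed population means
Phi_ee = np.eye(n)
Psi = np.diag([4.0, 4.0])                        # assumed value of Psi

w, H = ridge_regression_weights(X0, X2, Phi_ee, Psi, alpha, xbar_N)
X = np.hstack([X0, X2])
theta_hat = np.linalg.solve(H, X.T @ np.linalg.solve(Phi_ee, y))  # theta-hat of (9.10)
y_reg = w @ y

print("w'X0 =", w @ X0)   # reproduces xbar_{0,N}, as required by (9.7)
print("w'X2 =", w @ X2)   # generally close to, but not equal to, xbar_{2,N}
print("y_reg =", y_reg)
```

Because the first block column of H in (9.9) equals X' Phi_ee^{-1} X_0, the weights reproduce the controls for x_0 exactly, while the controls for x_2 are met only approximately, with the discrepancy shrinking as Psi grows.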

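The key step in the proof of Theorem A.2 can also be checked numerically: when condition (A.10) holds, the inverse-probability weighted sum of the generalized least squares residuals is zero. The Python sketch below is illustrative only and rests on assumptions introduced for the example: Phi = I, simulated selection probabilities and data, and a hypothetical helper `gls_beta`. With Phi = I, condition (A.10) is met by including the column of pi_i^{-1} in X.

```python
import numpy as np


def gls_beta(X, y, Phi):
    """Generalized least squares coefficient of (A.11)."""
    Phi_inv_X = np.linalg.solve(Phi, X)          # Phi^{-1} X
    return np.linalg.solve(X.T @ Phi_inv_X, Phi_inv_X.T @ y)


# Hypothetical sample with unequal selection probabilities.
rng = np.random.default_rng(1)
n = 12
pi = rng.uniform(0.05, 0.5, size=n)              # assumed selection probabilities
x1 = rng.normal(size=n)
y = 2.0 + 1.5 * x1 + rng.normal(size=n)
Phi = np.eye(n)

# Case 1: X contains pi_i^{-1}, so X gamma = Phi D_pi^{-1} J_n with gamma = (0, 0, 1)'.
X = np.column_stack([np.ones(n), x1, 1.0 / pi])
beta_hat = gls_beta(X, y, Phi)
weighted_residual_sum = np.sum((y - X @ beta_hat) / pi)

# Case 2: the pi_i^{-1} column is dropped, so (A.10) is not imposed.
X_without = X[:, :2]
beta_without = gls_beta(X_without, y, Phi)
weighted_residual_sum_without = np.sum((y - X_without @ beta_without) / pi)

print(weighted_residual_sum)           # zero up to rounding error
print(weighted_residual_sum_without)   # generally nonzero
```

The same identity holds for other choices that satisfy (A.10), for example Phi = D_pi with a column of ones in X.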