Under Which the Response Bias Could Be Eliminated from Sample Survey Estimates by Obtaining 'True Observations• for a Subsample of the Original Sample

SURVEY METHODOLOGY 144 VOLUME 1 - NUMBER 2 MEASUREMENT OF RESPONSE EFRORS IN CENSUSES AND SAMPLE SURVEYS G.J. Brackstone, J.F. Gosselin, B.E. Garton Census Survey Methods Division Madow [1968] has proposed a tvm-phase sampling scheme under which response bias can be eliminated from sample surveys by obtaining 't:r·ue' values for a subsample of the original sample. Often in cases of Censuses or ongoing surveys, the subsarnple data are not used to correct the main survey estimates but to assess their reliability. The main purpose of this paper is to present methods by which reliability estimates can be obtained when true values can be determined for a subsample of units. 1. INTRODUCTION A sample scheme was proposed by Madow [1965] under which the response bias could be eliminated from sample survey estimates by obtaining 'true observations• for a subsample of the original sample. This is achieved by using the estimate of bias from the subsample to correct the original estimate. There are some instances, however, where the subsample data are obtained for evaluation purposes after the survey data has been published. In this case the objective is the measurement of the overall reliability of previously published survey data for the purpose of a) informing the user of the data of its reliability (allowing him to make adjustments to it if he wishes), and, b) informing the survey-taker of the sources of error in the survey so that improvements can be made in future surveys. The purpose of this paper is therefore to present estimators of the reliability of Census and sample survey data when true values can be obtained for a sample or sub-sample of cases. 145 Two approaches to this problem are considered. In the first case, we consider the problem under the usual 'response variance- response bias' framework. This represents the rna in part of the paper. The second approach ignores the above model and attempts to measure only the net error in the particular survey observed. The results in this paper apply to any sampling scheme for the original survey and to any sub-sampling scheme for true values, provided only that estimators of sampling variance are available for the sampling schemes used. 2. APPROACH I 2.1 Response Error Model The response error model is based on the concept of independent repeti tion. We will first treat the Census case (i.e. 100% enumeration). Suppose, hypothetically, that many independent 'trials' of the Census could be made under the same general conditions. The Census 'estimate' for a given category of interest would then follow a certain frequency distribution. A particular Census figure may then be considered to be a random observation from a distribution of all possible Census estimates. Let X(t) denote the Census estimate obtained at trial t and let~ denote the corresponding true mean. The usual statistical parameter used to measure the reliability of an estimate in this situation is the Mean Square Error (MSE) of X(t): 2 MS E (X ( t) ) = E (X ( t) - ~) t 2 = v [X(t)] + B t where B denotes the bias of X(t), 146 i.e. B E [x (t)] - J.l t - = X - J.l where X = E [X(t)] t The response variance of x( t) is defined as v (x ( t)) = E (x ( t) - x)2 t t In general, therefore, bias results from errors that tend to occur in one direction rather than another. For example, if an error was present in the instructions accompanying the Census questionnaire, errors would tend to be systematic in one direction. Hence, bias measures the average net effect of all these possible factors. On the other hand, response variance measures the random component of the error. For instance, in the case of an ambiguous question, a self enumerated person may give different responses on different independent trials. These types of error depend on unknown factors that are impossible to control and may vary from trial to trial (e.g. the frame of mind of the respondent, the fatigue of the interviewer). The above discussion applies to any characteristic obtained from a Census. However, it can easily be extended to cover characteristics obtained from a sample survey. In this case, let (t) denote the xs estimate obtained from sample, s, at trial t. The MSE [x (t)] can then s be expressed as MSE (t)] = E (t) - ~] 2 [x s [x s where the expectation is taken over all possible trials and samples. 147 Letting X = E (x (t)) E E (x ( t) Is) s s s t and B X - J..l 2 2 MSE (t)] E t) - + s [x s (x s ( x) 2 = E v (x ( t) Is) + v E (x ( t) is) + s s s s t s t where the first term measures response variance, the second measures sampling variance, and the third measures bias. 2.2 Estimation In the response error model described above three statistical parameters are defined, the mean square error, the response variance and the bias. Ideally we would 1 ike to obtain estimates of all three of these parameters in order to assess the level of both random and systematic errors and to obtain an overall measure of reliability. Under the assumption that the true value of a sample characteristic is known for a random sub-sample of the survey sample, unbiased estimates of the MSE and bias are derived below. An unbiased estimator of the true proportion, J..l, is also given following Madow [1965]. However, under this framework, it is not possible to estimate the total response variance. 1 Suppose true values are known for a random sub-sample, s , from s. Let (t) denote the unbiased estimate made from the sub-sample s 1 using xs 1 the values observed on trial t, and let ~ denote the corresponding s 1 estimate using the true values. So E (x 1 (t)js,t) = x (t) S I S 5 E E (~ ) = J..l S 5 I S 148 Thus B c>< I ( t) ).J- ) is an unbiased estimator of B. s s 1 2 Consider first the estimator (x (t) - ) . Letting E denote s 1 ~ s 1 1 expectations over s , sand t (i.e. E = E E E). t s s 1 2 = E I ( t) - I> 1 t} + E I ( t) - I> 1 t} + B v{ (x s ~ s v { (x s ~ s t ss 1 t ss 1 Secondly, let v (t)) be a variance estimator such that s (x s E [v (t)) JtJ (t) ltJ s (x s = v [x s s s i.e. an unbiased estimator of the sampling variance over the survey sampling scheme for a given trial t. Thirdly, let vss 1 (Xs 1 (t) - ~s 1 ) be a variance estimator such E [v 1 (t)- ~ 1 )jt] = v [(x l(t)- ~s~)ltJ ss 1 (x s s ss 1 SS I S A Now, if we define MSE (x ( t)) by s 2 MsE(x (t)) (xsl(t) - > + v (x (t)) - v (x (t) - ) S = ~ S I S S SS I 5 I ~ S I ( 2. 1) then A E(MSE (x (t))) = V E {(xs 1 (t)- ~s 1 )jt} + E V S t SS I t ss 1 2 + s + E v (x ( t) 1 t) - E v { (x I ( t) - 0 I> 1 t} t S S t SS I S S 149 2 v E (xs ( t) 1 t) + E v (xs ( t) 1 t) + s t s t s 2 v (t)) + s = t,s (x s = MSE (t)) (x s Thus MSE (xs(t)) given by (2.1) is an unbiased estimator of MSE (xs(t)). In the case where the original survey is a Census, the middle term in MSE (x (t)) disappears and we have s (2. 2) Given the unbiased estimator, B, of the bias of x (t), an unbiased s estimator of the true proportion, ~. is given by ~ ~ = X (t) - B s x ( t) (x I ( t) = s - s - ~s I)' A A with V(~) = E E v (~) + E v E (~) + v E E (~) t s sl t s S I t s sl = E E v (x l(t) - ~ sl ) + E v E (~s 1) t s s I S t s sl since E (~lt,s) = E(~ 1 lt,s) sl s I S and v E E (~) = 0 t s s 1 V(~) = E E v (x I ( t) ~s~) + E v (~s I) - E E V (~s 1) t s S I S t ss 1 t s 5 I 150 ]J- sl ) is a variance estimator such that E{v (t)- ~ ~)1t,s} = v (x ,(t)- ~ ~), 5 1 (.x 5 1 5 5 I 5 5 if v (lJ- ) is a variance estimator such that 55 I 5 I E[v ,(~ )ltl= v 55 5 1 ss 1 and if v 51 (~ 51 ) is a variance estimator such that then an unbiased estimator of V(~) is given by In the case where the original survey is a Census, an unbiased estimator, B, of the bias of X(t) is again given by and an unbiased estimator of the true proportion is given by In the expression for the variance of lJ derived above for the sample survey case, s now becomes the total population. The second and third terms therefore cancel and we get E I ( t) - I) v(~) = v (x s ~ s 151 .. and therefore an unbiased estimator of V(]J) is given by v(~) = v s • rx s ,(t) - ~ s .1 2.3 Example The purpose of this section is to describe how the previous estimators have been applied to a small study that was carried out in connection with the 1971 Census, and to present some numerical results.

Under Which the Response Bias Could Be Eliminated from Sample Survey Estimates by Obtaining 'True Observations• for a Subsample of the Original Sample

Statistical Characterization of Tissue Images for Detec- Tion and Classification of Cervical Precancers

The Instat Guide to Choosing and Interpreting Statistical Tests

Estimating Suitable Probability Distribution Function for Multimodal Traffic Distribution Function

Continuous Dependent Variable Models

Fitting Population Models to Multiple Sources of Observed Data Author(S): Gary C

Pdf) of a Random Process X(T) and E() the Mean

Section 2, Basic Statistics and Econometrics 1 Statistical

Statistical Evidence of Central Moments As Fault Indicators in Ball Bearing Diagnostics

Targeted Maximum Likelihood Estimation in Safety Analysis

Parallel Approximation of the Maximum Likelihood Estimation for the Prediction of Large-Scale Geostatistics Simulations

Maximum Likelihood Vs. Bayesian Parameter Estimation

Guidance for Data Quality Assessment