Errors in Surveys: Overview

• Models of survey error: What, why, different perspectives
• Total survey error and mean square error
• Views of the components of error
• Types of error

What are models of survey error?

• Models of survey error are statements in mathematical notation about the factors that influence a survey answer
• All models make assumptions and simplifications
• Models from different traditions use different formulations and have different conceptualizations of validity and reliability

Why models of survey error?

These models serve at least two functions:
• Descriptive
  • Identify and label components of error
  • Indicate where in the process of survey measurement errors originate
• Theoretical and statistical
  • Imply conditions or designs under which errors can be estimated
  • Make assumptions about error explicit

Why models of survey error (continued)

When we have designs that allow us to measure the impact of errors on survey estimates, we can:
• Explore ways of reducing errors and test the success of the results
• Evaluate trade-offs between errors and costs

Three questions about models of error (Groves, 1991, pp. 3-5)

• What is the statistic of interest?
• Will constant errors affect the statistic of interest?
  • Errors can affect means and regression coefficients differently

Three questions about models of error (continued)

• Which features of the design are fixed and which are variable?
  • Affects which components can be estimated and the interpretation of observations
  • Example: A sampling statistician is concerned with variation over samples drawn from the same population with the same design; an analyst may be concerned with the values of a specific sample.
  • Example: Psychometric notions of reliability consider response error over repeated applications of the same item to the same sample, not over different samples.
• What assumptions are made about persons not measured or about the properties of observational errors? Are errors eliminated by model assumptions?

Perspectives on error (Groves 1991)

• Disciplines
  • Statistics
    • Population means and totals
    • Total survey error
  • Psychometrics
  • Econometrics
• Orientation
  • Reduce or measure
  • Collect or analyze
  • Describe or model

Example: Sampling distribution (Kish 1965, p. 12)

Y is the population quantity estimated by y under a set of essential survey conditions. The diagram shows the sampling distribution of y. Each "." represents a specific sample estimate, y.

Example: Sampling distribution (Kish 1965) (continued)

If there is bias in the estimator of Y provided by the sample design (sampling bias), E(y) will differ from Y_true. We will ignore the possibility of sampling bias. We will consider only the total distance between E(y) and Y_true, and we will think of that distance as total bias.
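To make the idea of a sampling distribution concrete, here is a small Python sketch (not part of the Kish example; the population values and sample size are invented for illustration) that draws repeated samples under the same design and collects the resulting estimates:

```python
# Illustrative sketch: the sampling distribution of a sample mean under
# repeated draws from the same population (hypothetical values throughout).
import random

random.seed(1)

# Hypothetical finite population of 10,000 values of Y.
population = [random.gauss(50, 10) for _ in range(10_000)]
Y_true = sum(population) / len(population)

def one_sample_mean(n=100):
    """Draw one simple random sample without replacement and return y."""
    sample = random.sample(population, n)
    return sum(sample) / n

# Each replication is one "." in the diagram: a specific sample estimate y.
estimates = [one_sample_mean() for _ in range(2_000)]
expected_y = sum(estimates) / len(estimates)  # approximates E(y)

print(f"Y_true          = {Y_true:.2f}")
print(f"E(y), simulated = {expected_y:.2f}")  # close to Y_true: no sampling bias
```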

Mean square error

The mean square error of an estimator includes both variable error and bias:

MSE = variance + bias²

MSE(y) = Var(y) + [E(y) - Y]²

• Deviations are taken from the population value, Y.
• Variable errors are those that vary over replications.
  • For example, we imagine conducting replications of the survey under the same essential survey conditions, drawing different respondents each time.
  • Each trial gives a different estimate, and the distribution of estimates is the sampling distribution of the statistic.
• Bias includes sources of error that are constant over repeated replications of the survey design; that is, Y may not be equal to Y_true.
• Models differ in what is considered fixed and what is considered variable.
• Which components of error can be estimated depends on the survey design.

Mean square error (continued)

• The variable component can be generalized to include variable errors in addition to sampling variance.
• We cannot usually estimate the bias component of MSE because we do not have information about Y, which we can think of here as equivalent to Y_true.
• Sometimes we can ask about something for which there is an external criterion that can be used as an estimate of Y_true, even if it is not a perfect estimate.
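As a quick check on the decomposition MSE(y) = Var(y) + [E(y) - Y]², the following Python sketch simulates replications of an estimator with an invented constant bias and confirms that the MSE computed directly equals variance plus squared bias (all numbers are hypothetical):

```python
# Illustrative sketch: MSE(y) = Var(y) + [E(y) - Y_true]^2, shown by simulating
# replications of an estimator with a constant (hypothetical) bias.
import random

random.seed(2)

Y_true = 50.0   # true population value (hypothetical)
bias = 2.0      # constant error present in every replication
reps = 5_000    # replications under the same essential survey conditions

# Each replication yields an estimate y = Y_true + bias + variable error.
estimates = [Y_true + bias + random.gauss(0, 3) for _ in range(reps)]

E_y = sum(estimates) / reps
variance = sum((y - E_y) ** 2 for y in estimates) / reps
mse_direct = sum((y - Y_true) ** 2 for y in estimates) / reps

print(f"variance          = {variance:.3f}")
print(f"bias^2            = {(E_y - Y_true) ** 2:.3f}")
print(f"variance + bias^2 = {variance + (E_y - Y_true) ** 2:.3f}")
print(f"MSE, direct       = {mse_direct:.3f}")  # equals variance + bias^2
```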

Mean square error: Variability and bias (Biemer and Lyberg 2003)

• Consider two estimators with different properties
  • One has high variance and low bias
  • One has low variance and high bias
• What is the total MSE of each?

Mean square error (continued): Variance and bias (Biemer and Lyberg 2003)

Composition of the MSE (Biemer and Lyberg 2003, p. 58)

Left target: low bias, high variance

Hit    Distance from hit to center of hits    Squared distance
1      (2.2 - 0.15) =  2.05                     4.20
2      (-3.6 - 0.15) = -3.75                   14.06
3                     -4.65                    21.62
4                      6.65                    44.22
5                      4.95                    24.50
6                     -7.35                    54.02
7                     -4.05                    16.40
8                      5.15                    26.52
9                     -1.95                     3.80
10                     2.95                     8.70
Avg.                   0.0                     21.81 (= variance)
Bias = (0.15 - 0.0) = 0.15; Bias² = 0.023
MSE = Bias² + Variance = 21.8

Right target: high bias, low variance

Hit    Distance from hit to center of hits    Squared distance
1      (3.1 - 4.5) = -1.40                      1.96
2      (3.7 - 4.5) = -0.80                      0.64
3                    0.80                       0.64
4                    0.40                       0.16
5                    1.60                       2.56
6                   -0.10                       0.01
7                   -1.70                       2.89
8                    1.60                       2.56
9                    0.00                       0.00
10                  -0.40                       0.16
Avg.                 0.0                        1.17 (= variance)
Bias = (4.5 - 0.0) = 4.5; Bias² = 20.25
MSE = Bias² + Variance = 21.4
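The target calculation can be reproduced in a few lines of Python. The hit positions below are recovered from the table (center of hits plus each tabled deviation), the bull's-eye (true value) is taken to be 0, and small rounding differences from the tabled figures are expected:

```python
# Sketch reproducing the target example above: variance, squared bias, and MSE
# computed from the hit positions, with the true value at the bull's-eye (0).
def decompose(hits, true_value=0.0):
    """Return (variance, squared bias, MSE) for a set of hits."""
    center = sum(hits) / len(hits)  # the center of the hits plays the role of E(y)
    variance = sum((h - center) ** 2 for h in hits) / len(hits)
    bias_sq = (center - true_value) ** 2
    return variance, bias_sq, variance + bias_sq

low_bias_high_variance = [2.2, -3.6, -4.5, 6.8, 5.1, -7.2, -3.9, 5.3, -1.8, 3.1]
high_bias_low_variance = [3.1, 3.7, 5.3, 4.9, 6.1, 4.4, 2.8, 6.1, 4.5, 4.1]

for label, hits in (("low bias, high variance", low_bias_high_variance),
                    ("high bias, low variance", high_bias_low_variance)):
    var, b2, mse = decompose(hits)
    print(f"{label}: variance = {var:.2f}, bias^2 = {b2:.2f}, MSE = {mse:.1f}")
```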

Example: Contribution of bias to total error (Kish 1965, p. 513)

Homeowners interviewed about home value:

y = mean in sample = 9,200
standard deviation = 5,700
Y_true = appraiser estimate = 8,880

If the appraiser estimate is the true mean, bias = 320 = 3.5% of the mean.

Without bias, RMSE = √(5.7²/n)
With bias, RMSE = √(5.7²/n + .32²)

Example: Contribution of bias to total error (continued)

We can see the relative contributions of variable error and bias to total error by examining RMSE under different sample sizes (figures in thousands of dollars):

                        n = 100    n = 1,000    n = 10,000
√(5.7²/n)                 .57        .18          .06
√(5.7²/n + .32²)          .65        .37          .32
5.7²/(5.7²/n + .32²)      76*        240*         308*

* Number of observations from an unbiased design that would yield the same total error as n observations from a design with a mean bias of .32; a measure of the effect of the bias on total error at different sample sizes.
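A short Python sketch of the same arithmetic, with figures in thousands of dollars (s = 5.7, bias = .32); small rounding differences from the slide's table are expected:

```python
# Sketch of the Kish calculation above: RMSE with and without a constant bias,
# and the unbiased sample size that would give the same total error.
import math

s = 5.7      # standard deviation (5,700 dollars, in thousands)
bias = 0.32  # bias if the appraiser estimate is the true mean (320 dollars)

for n in (100, 1_000, 10_000):
    rmse_without_bias = math.sqrt(s ** 2 / n)
    rmse_with_bias = math.sqrt(s ** 2 / n + bias ** 2)
    equivalent_n = s ** 2 / (s ** 2 / n + bias ** 2)  # unbiased n with same total error
    print(f"n = {n:>6}: RMSE without bias = {rmse_without_bias:.2f}, "
          f"with bias = {rmse_with_bias:.2f}, equivalent unbiased n = {equivalent_n:.0f}")
```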

Comparing MSE across designs (Biemer and Lyberg 2003, pp. 60-61)

• Sampling variance as a component of total MSE

Comparing MSE across designs – bias (continued)

• Design A - face-to-face
  • Frame bias: Area frame sample, so all houses will be listed and coverage error should be low
  • Response bias: Highest response rate, nonresponse bias probably lowest
  • Measurement bias: Likely to be lowest
• Design B - telephone interviewing
  • Frame bias: RDD frame omits those without phones
  • Response bias: Moderately low response rate, nonresponse bias probably intermediate
  • Measurement bias: Likely to be larger than A
• Design C - self-administered by mail
  • Frame bias: Telephone directory-type listing for obtaining addresses omits both nontelephone and unlisted households
  • Response bias: Lowest response rate, nonresponse bias probably highest
  • Measurement bias: Likely to be larger than A

Comparing MSE across designs – variance (continued)

• Design A - face-to-face
  • Measurement variance: Largest because of the presence of an interviewer
  • Sampling variance: Largest because the budget allows the smallest sample size
• Design B - telephone interviewing
  • Measurement variance: Intermediate because an interviewer is used, but the interviewer is not in the room
  • Sampling variance: Intermediate because the cost of interviews allows an intermediate sample size
• Design C - self-administered by mail
  • Measurement variance: Smallest because of the absence of an interviewer
  • Sampling variance: Smallest because the budget allows the largest sample size

Views of the components of error

• Error and the steps in a survey (Groves et al. 2004)
• Error in the survey process (Groves et al. 2004)
• Sampling and five major sources of nonsampling error (Biemer and Lyberg 2003)
• Total survey error perspective

Error and quality in the steps of a survey (Groves et al. 2004, p. 48)

Error and the survey process (Groves et al. 2004)

Major sources of error (Biemer and Lyberg 2003, p. 39)

• Sampling error
• Specification error
  • Concepts
  • Objectives
  • Data elements
• Frame error
  • Omissions
  • Erroneous inclusions
  • Duplications

Major sources of error (continued)

• Nonresponse error
  • Whole unit
  • Within unit
  • Item
  • Incomplete information from open questions
• Measurement error
  • Information systems or records consulted by respondents
  • Setting
  • Mode of data collection
  • Respondent
  • Interviewer
  • Instrument

Major sources of error (continued)

• Processing error
  • Editing
  • Data entry
  • Weighting
  • Tabulation

Total survey error: Types of error (based on Groves 1989)

Total Survey Error
• Errors of Nonobservation
  • Coverage
  • Sampling
  • Nonresponse
• Errors of Observation
  • Mode
  • Interviewer
  • Respondent
• Errors of Processing
  • Editing
  • Coding
  • Imputation

Total survey error: Types of error (Groves 1989)

             Bias    Variance
Coverage       ✓        ✓
Sampling       ✓        ✓
Nonresponse    ✓        ✓
Measurement    ✓        ✓
Processing     ✓        ✓

Methods of measuring errors

             Bias                           Variance
Coverage     Universe statistics            Replication of frame construction
Sampling     Frame statistics               Randomized, repeated selections
Nonresponse  Sample statistics              Randomized, multiple recruiters
Measurement  Records, true values on R's    Interpenetration of multiple measurers
Processing   Pre-post comparison            Interpenetration of processors
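As an illustration of what "interpenetration of multiple measurers" involves, here is a minimal Python sketch (not from the slides; the model and all numbers are invented) in which respondents are randomly assigned to interviewers and a one-way ANOVA decomposition separates between-interviewer from within-interviewer variance, yielding an estimate of the intra-interviewer correlation (often written rho_int):

```python
# Minimal sketch (invented model and numbers) of interpenetration of multiple
# measurers: random assignment of respondents to interviewers lets a one-way
# ANOVA separate between-interviewer variance (a measurement variance
# component) from within-interviewer variance.
import random

random.seed(3)

n_int = 20   # interviewers
m = 50       # respondents per interviewer (equal, randomized workloads)

# Hypothetical model: answer = true score + interviewer effect + noise.
data = []
for _ in range(n_int):
    effect = random.gauss(0, 1.0)  # systematic effect of this interviewer
    data.append([random.gauss(50, 5) + effect for _ in range(m)])

grand_mean = sum(sum(row) for row in data) / (n_int * m)
int_means = [sum(row) / m for row in data]

# One-way ANOVA mean squares: between interviewers and within interviewers.
msb = m * sum((ybar - grand_mean) ** 2 for ybar in int_means) / (n_int - 1)
msw = sum((y - ybar) ** 2
          for row, ybar in zip(data, int_means)
          for y in row) / (n_int * (m - 1))

sigma2_between = max((msb - msw) / m, 0.0)          # interviewer variance component
rho_int = sigma2_between / (sigma2_between + msw)   # intra-interviewer correlation

print(f"between-interviewer variance component = {sigma2_between:.2f}")
print(f"rho_int = {rho_int:.3f}")
```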