Errors and Bias 11
CONSUMER PRICE INDEX MANUAL: THEORY AND PRACTICE

Introduction

11.1 This chapter discusses the general types of potential error to which all price indices are subject. The literature on consumer price indices (CPIs) discusses these errors from two perspectives, and this chapter presents the two perspectives in turn. First, the chapter describes the sources of sampling and non-sampling error that arise in estimating a population CPI from a sample of observed prices. Second, the chapter reviews the arguments made in numerous recent studies that attribute bias to CPIs as a result of insufficiently accurate treatment of quality change, consumer substitution and other factors. It should be emphasized that many of the underlying issues discussed here are dealt with in much greater detail elsewhere in the manual.

Types of error

11.2 One of the main objectives of a sample survey is to compute estimates of population characteristics. Such estimates will never be exactly equal to the population characteristics. There will always be some error. Table 11.1 gives a taxonomy of the different types of error. See also Balk and Kersten (1986) and Dalén (1995) for overviews of the various sources of stochastic and non-stochastic errors experienced in calculating a CPI. Two broad categories can be distinguished: sampling errors and non-sampling errors.

Table 11.1. A taxonomy of errors in a consumer price index

Total error:
    Sampling error
        Selection error
        Estimation error
    Non-sampling error
        Observation error
            Overcoverage
            Response error
            Processing error
        Non-observation error
            Undercoverage
            Non-response

Sampling error

11.3 Sampling errors are due to the fact that an estimated CPI is based on samples and not on a complete enumeration of the populations involved. Sampling errors vanish if observations cover the complete population. As mentioned in previous chapters, statistical offices usually adopt a fixed weight price index as the object of estimation. A fixed weight index can be seen as a weighted average of partial indices of commodity groups, with weights being expenditure shares. The estimation procedures that most statistical offices apply to a CPI involve different kinds of samples. The most important kinds are:

- for each commodity group, a sample of commodities to calculate the partial price index of the commodity group;
- for each commodity, a sample of outlets to calculate the elementary price index of the commodity from individual price observations;
- a sample of households needed for the estimation of the average expenditure shares of the commodity groups. (Some countries use data from national accounts instead of a household expenditure survey to obtain the expenditure shares.)

11.4 The sampling error can be split into a selection error and an estimation error. A selection error occurs when the actual selection probabilities deviate from the selection probabilities as specified in the sample design. The estimation error denotes the effect caused by using a sample based on a random selection procedure. Every new selection of a sample will result in different elements, and thus in a possibly different value of the estimator.

Non-sampling error

11.5 Non-sampling errors may occur even when the whole population is observed. They can be subdivided into observation errors and non-observation errors. Observation errors are the errors made during the process of obtaining and recording the basic observations or responses.

11.6 Overcoverage means that some elements are included in the survey which do not belong to the target population. For outlets, statistical offices usually have inadequate sampling frames. In some countries, for instance, a business register is used as the sampling frame for outlets. In such a register, outlets are classified according to major activity. The register thus usually exhibits extensive overcoverage, because it contains numerous outlets which are out of scope from the CPI perspective (e.g. firms that sell to businesses rather than to households). In addition, there is usually no detailed information on all the commodities sold by an outlet, so it is possible that a sampled outlet may turn out not to sell a particular commodity at all.

11.7 Response errors in a household expenditure survey or price survey occur when the respondent does not understand the question, or does not want to give the right answer, or when the interviewer or price collector makes an error in recording the answer. In household expenditure surveys, for example, households appear to systematically underreport expenditures on commodity groups such as tobacco and alcoholic beverages. In most countries, the main price collection method is by persons who regularly visit outlets. They may return with prices of unwanted commodities.

11.8 The price data are processed in different stages, such as coding, entry, transfer and editing (control and correction). At each step mistakes, so-called processing errors, may occur. For example, at the outlets the price collectors write down the prices on paper forms. After the collectors have returned home, a computer is used as the input and transmission medium for the price information. It is clear that this way of processing prices is susceptible to errors.

11.9 Non-observation errors are made when the intended measurements cannot be carried out. Undercoverage occurs when elements in the target population do not appear in the sampling frame. The sampling frame of outlets can have undercoverage, which means that some outlets where relevant commodities are purchased cannot be contacted. Some statistical offices appear to exclude mail order firms and non-food market stalls from their outlet sampling frame.

11.10 Another non-observation error is non-response. Non-response errors may arise from the failure to obtain the required information in a timely manner from all the units selected in the sample. A distinction can be drawn between total and partial (or item) non-response. Total non-response occurs when selected outlets cannot be contacted or refuse to participate in the price survey. Another instance of total non-response occurs when mail questionnaires and collection forms are returned by the respondent and the price collector, respectively, after the deadline for processing has passed. Mail questionnaires and collection forms that are only partially filled in are examples of partial non-response. If the price changes of the non-responding outlets differ from those of the responding outlets, the results of the price survey will be biased.

11.11 Total and partial non-response may also be encountered in a household expenditure survey. Total non-response occurs when households drawn in the sample refuse to cooperate. Partial non-response occurs, for instance, when certain households refuse to give information about their expenditure on certain commodity groups.

Measuring error and bias

Estimation of variance

11.12 The variance estimator depends on both the chosen estimator of a CPI and the sampling design. Boon (1998) gives an overview of the sampling methods that are applied in the compilation of CPIs by various European statistical institutes. [...] so-called judgemental and cut-off selection methods are applied.

11.13 In view of the complexity of the (partially connected) sample designs in compiling a CPI, an integrated approach to variance estimation appears to be problematic. That is, it appears to be difficult to present a single formula for measuring the variance of a CPI, which captures all sources of sampling error. It is, however, feasible to develop partial (or conditional) measures, in which only the effect of a single source of variability is quantified. For instance, Balk and Kersten (1986) calculated the variance of a CPI resulting from the sampling variability of the household expenditure survey, conditional on the assumption that the partial price indices are known with certainty. Ideally, all the conditional sampling errors should be put together in a unifying framework in order to assess the relative importance of the various sources of error. Under rather restrictive assumptions, Balk (1989a) derived an integrated framework for the overall sampling error of a CPI.

11.14 There are various procedures for trying to estimate the sampling variance of a CPI. Design-based variance estimators (that is, variances of Horvitz–Thompson estimators) can be used, in combination with Taylor linearization procedures, for sampling errors arising from a probability sampling design. For instance, assuming a cross-classified sampling design, in which samples of commodities and outlets are drawn independently from a two-dimensional population, with probabilities proportional to size (PPS) in both dimensions, a design-based variance formula can be derived. In this way Dalén and Ohlsson (1995) found that the sampling error for a 12-month change of the all-commodity Swedish CPI was of the order of 0.1–0.2 per cent.

11.15 The main problem with non-probability sampling is that there is no theoretically acceptable way of knowing whether the dispersion in the sample data accurately reflects the dispersion in the population. It is then necessary to fall back on approximation techniques for variance estimation. One such technique is quasi-randomization (see Särndal, Swensson and Wretman (1992, p. 574)), in which assumptions are made about the probabilities of sampling commodities and outlets. The problem with this method is that it is difficult to find a probability model that adequately approximates the method actually used for outlet and item selection. Another possibility is to use a replication method, such as the method of random groups, balanced half-samples, jackknife, or bootstrap. This is a completely non-parametric class of methods to estimate sampling distributions and standard errors. Each replication method works by drawing a large number of sub-samples from the given sample. From each sub-sample the parameter of interest can be estimated. Under rather weak conditions, it can be shown that the distribution of the resulting esti-