Survey Methodology
Total Page:16
File Type:pdf, Size:1020Kb
Catalogue no. 12-001-XIE Survey Methodology June 2006 How to obtain more information Specific inquiries about this product and related statistics or services should be directed to: Business Survey Methods Division, Statistics Canada, Ottawa, Ontario, K1A 0T6 (telephone: 1 800 263-1136). For information on the wide range of data available from Statistics Canada, you can contact us by calling one of our toll-free numbers. You can also contact us by e-mail or by visiting our website. National inquiries line 1 800 263-1136 National telecommunications device for the hearing impaired 1 800 363-7629 Depository Services Program inquiries 1 800 700-1033 Fax line for Depository Services Program 1 800 889-9734 E-mail inquiries [email protected] Website www.statcan.ca Information to access the product This product, catalogue no. 12-001-XIE, is available for free. To obtain a single issue, visit our website at www.statcan.ca and select Our Products and Services. Standards of service to the public Statistics Canada is committed to serving its clients in a prompt, reliable and courteous manner and in the official language of their choice. To this end, the Agency has developed standards of service that its employees observe in serving its clients. To obtain a copy of these service standards, please contact Statistics Canada toll free at 1 800 263-1136. The service standards are also published on www.statcan.ca under About Statistics Canada > Providing services to Canadians. Statistics Canada Business Survey Methods Division Survey Methodology June 2006 Published by authority of the Minister responsible for Statistics Canada © Minister of Industry, 2006 All rights reserved. The content of this electronic publication may be reproduced, in whole or in part, and by any means, without further permission from Statistics Canada, subject to the following conditions: that it be done solely for the purposes of private study, research, criticism, review or newspaper summary, and/or for non-commercial purposes; and that Statistics Canada be fully acknowledged as follows: Source (or “Adapted from”, if appropriate): Statistics Canada, year of publication, name of product, catalogue number, volume and issue numbers, reference period and page(s). Otherwise, no part of this publication may be reproduced, stored in a retrieval system or transmitted in any form, by any means—electronic, mechanical or photocopy—or for any purposes without prior written permission of Licensing Services, Client Services Division, Statistics Canada, Ottawa, Ontario, Canada K1A 0T6. July 2006 Catalogue no. 12-001-XIE ISSN 1492-0921 Frequency: semi-annual Ottawa Cette publication est disponible en français sur demande (no 12-001-XIF au catalogue). Note of appreciation Canada owes the success of its statistical system to a long-standing partnership between Statistics Canada, the citizens of Canada, its businesses, governments and other institutions. Accurate and timely statistical information could not be produced without their continued cooperation and goodwill. 4 Longford: Sample Size Calculation for Small•Area Estimation Vol. 32, No. 1, pp. 8796 Stati stics Canada, Catalogue No. 12001 Sample Size Calculation for SmallArea Estimation Nicholas Tibor Longford 1 Abstract We describe a general approach to setting the sampling design in surveys that are planned for making inferences about small areas (subdomains). The approach requires a specification of the inferential priorities for the areas. Sample size allocation schemes are derived first for the direct estimator and then for composite and empirical Bayes estimators. The methods are illustrated on an example of planning a survey of the population of Switzerland and estimating the mean or proportion of a variable for each of its 26 cantons. Key Words: Efficiency; Inferential priority; Sample size allocation; Smallarea estimation. 1. Introduction designed for estimating national quantities but, sometimes almost as an afterthought, are used for inferences about Sampling design is a key device for efficient estimation small areas. This would be appropriate if the sampling and other forms of inference about a large population when designs optimal for smallarea and national inferences were the resources available do not permit collecting the relevant similar. We illustrate in this paper that this is not the case information from every member of the population. In this and that sampling design can be effectively targeted for context, efficiency is interpreted as the optimal combination smallarea estimation, taking into account the goal of of a sampling design and an estimator of a population quan efficient estimation of national quantities. To avoid the tity q. By optimum we understand minimum mean squared trivial case, we assume that the areas have unequal popu error, although the development presented in this paper can lation sizes. We apply the methods to the problem of be adapted for other criteria. The pool of the possible planning inferences about the 26 cantons of Switzerland; sampling designs is delimited by the resources, and these their population sizes range from 15,000 (Appenzell are usually expressed in terms of a fixed sample size. This is Innerrhoden) to 1.23 million (Zürich). The population of not always appropriate because the designs may not entail Switzerland is 7.26 million. identical average costs per subject. However, within a Literature on the subject of planning surveys for small limited range of designs, this issue can be ignored. area estimation is rather sparse. An important contribution is The problem of setting the sampling design for the Singh, Gambino and Mantel (1994). In one of the ap purpose of efficient estimation of a single quantity is well proaches they discuss, the planned sample size for the understood, and solutions are available for many commonly Canadian Labor Force Survey is split into two parts. One encountered settings. Most of them involve a univariate part is allocated optimally for the purpose of national constrained optimisation problem. Setting the sampling (domain) estimation and the remainder optimally for small design for estimating several quantities represents a quan area estimation. For the latter goal, equal subsample sizes tum leap in complexity, because the problem involves are allocated to each area when the areas have equal within several factors, typically one for each quantity. It is essential area variances, the finite population correction can be to optimise the design simultaneously for all the factors, ignored and the areas have equal survey costs per subject, because the goals of efficient inference about the target but also when the targets of inference are arealevel means. quantities may be in conflict. For example, in smallarea When the targets are population totals, equal allocation to estimation, a more generous allocation of the sample size to the areas is not efficient, because it handicaps estimation for one area has to be compensated by a less generous allo more populous areas. Even when proportions or rates (per cation to one or several other areas. centages) are estimated, the withinarea variances depend on Smallarea statistics have become an important research the population proportion, although the dependence is weak topic in survey methods in the last few decades (Fay and when all the proportions are distant from zero and unity. For Herriot 1979; Platek, Rao, Särndal and Singh 1987; Ghosh more recent developments in sampling design for smallarea and Rao (1994), Longford 1999; and Rao 2003), stimulated estimation, see Marker (2001). by increasing interest of government agencies, the adver The next section describes the proposed approach based tising and marketing industry and the financial and in on minimising the weighted sum of the sampling variances surance sector. At present, many largescale surveys are (mean squared errors) of the planned estimators, with the 1. Nicholas Tibor Longford, Departament d’Económia i Empresa, Universitat Pompeu Fabra, Ramón Trias Fargas 2527, 08005 Barcelona, Spain. Email: [email protected]. Survey Methodology, June 2006 5 weights specified to reflect the inferential priorities. It is ¶v P d = const. applied first to direct estimation of the arealevel quantities. d ¶n d Then it is extended to incorporate the goal of national estimation, and, finally, to composite estimation in section An analytical expression for the optimal subsample sizes 2 3. The concluding section 4 contains a discussion. nd cannot be obtained in general, but when vd = s d / nd , The remainder of this section introduces the notation as in simple random sampling within areas, the solution is used in the rest of the paper. We assume that arealevel proportional to s d Pd , that is, population quantities qd ,d = 1, ..., D , are estimated by ˆ † s d P d q d with respective mean squared errors (MSE) vd that are n = n . d s P + ... + s P functions of the withinarea subsample sizes nd; v d = 1 1 D D v( n ). , n d d The overall sample size is denoted by and is 2 2 assumed to be fixed. The population sizes are denoted by When the withinarea variances s d coincide, s1 = ... = 2 2 N (overall) and N (for area d ). For brevity, we denote sD = s , this simplifies further; the optimal sample sizes d 2 Τ are proportional to Pd and do not depend on s . = n (n1 , ..., nD ) . Most population quantities q are functions of a single variable, such as its mean, total, and the In most contexts, it is difficult to elicit a suitable set of priorities Pd , and so it is more constructive to propose a like. The variable may be recorded in the survey directly, or Τ constructed from one or several such variables. Although convenient parametric class of priorities P = (P1 , ..., PD ) our development is not restricted to such quantities, the and illustrate their impact on the sample size allocation.